On Tue, Apr 7, 2009 at 6:50 AM, Matt <[email protected]> wrote:

>
>
> 4. Check for duplicate tweets since more than 1 query may return the
> same result


The database will do this for you if you make statusID a unique column and
use INSERT IGNORE when you add rows. It will just ignore the duplicate rows.
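A minimal sketch of that idea, using SQLite for illustration (SQLite spells it `INSERT OR IGNORE`; in MySQL you would declare `statusID` UNIQUE and use `INSERT IGNORE` — table and column names here are made up):

```python
import sqlite3

# In-memory database for illustration; the UNIQUE constraint on
# statusID is what lets the database drop duplicates for us.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (statusID INTEGER UNIQUE, text TEXT)")

rows = [(1, "first"), (2, "second"), (1, "first")]  # statusID 1 appears twice
conn.executemany("INSERT OR IGNORE INTO tweets VALUES (?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM tweets").fetchone()[0]
print(count)  # the duplicate row is silently skipped -> 2
```

No duplicate check is needed in your own code; overlapping query results just collapse into one row.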


>
> 5. Since we can only return 100 results at a time I'll have to loop
> through pages and make additional queries


FYI, you're not really making additional queries to get more than 100
results; you're just asking for the next page of results from the same
query.
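The paging loop might look like this sketch, where `fetch_page` stands in for the real HTTP call (e.g. the search endpoint with 100 results per page and an incrementing page parameter); here it just serves canned data so the loop logic is clear:

```python
FAKE_RESULTS = list(range(250))  # pretend the query matches 250 tweets

def fetch_page(page, per_page=100):
    # Stand-in for the real API request for one page of results.
    start = (page - 1) * per_page
    return FAKE_RESULTS[start:start + per_page]

def fetch_all():
    results, page = [], 1
    while True:
        batch = fetch_page(page)
        results.extend(batch)
        if len(batch) < 100:  # a short page means we've hit the end
            break
        page += 1
    return results

print(len(fetch_all()))  # -> 250
```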


> 6. I'll store since_id in a config table as to not return redundant
> tweets.


You could just get the maximum statusID in your database, avoiding the need
to store it separately... just don't delete that row until you have
retrieved the next set.
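Deriving since_id from the data itself is a one-line query — a sketch, again using SQLite for illustration (the SQL is the same in MySQL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (statusID INTEGER UNIQUE)")
conn.executemany("INSERT INTO tweets VALUES (?)", [(101,), (105,), (103,)])

# The highest statusID already stored is the since_id for the next run.
since_id = conn.execute("SELECT MAX(statusID) FROM tweets").fetchone()[0]
print(since_id)  # -> 105
```

No separate config table to keep in sync — the tweets table is the source of truth.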

>
>
> Once the data is in the database I should easily be able to filter out
> most of the spam using other methods not available through the search
> api. This should also make twitter happy as I am cutting down on api
> request drastically. Even if I bump my cron up to 15mins I would only
> be making < 20 calls an hour. Does this sound like a reasonable basic
> plan? Is there anything I am overlooking?


You're counting each page as a separate API call, right?  If the cron runs
four times an hour with four queries per run, that's 16 queries an hour.  If
each query returns five pages of results, that's 80 API calls... but I
wouldn't worry about hitting the limits even at that rate.
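The arithmetic spelled out (the five-pages figure is just an illustrative worst case, not a measured number):

```python
runs_per_hour = 4       # cron every 15 minutes
queries_per_run = 4
pages_per_query = 5     # worst case: every query fills five pages

queries_per_hour = runs_per_hour * queries_per_run
api_calls_per_hour = queries_per_hour * pages_per_query
print(queries_per_hour, api_calls_per_hour)  # -> 16 80
```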

Nick
