On Tue, Apr 7, 2009 at 6:50 AM, Matt <[email protected]> wrote:
>
> 4. Check for duplicate tweets since more than 1 query may return the
> same result
The database will do this for you if you make statusID a unique column
and use INSERT IGNORE when you add rows. It will just ignore the
duplicate rows.

> 5. Since we can only return 100 results at a time I'll have to loop
> through pages and make additional queries

FYI, you're not really making additional queries to get more than 100
results, you're just asking for the next page of results from the same
query.

> 6. I'll store since_id in a config table as to not return redundant
> tweets.

You could just get the maximum statusID in your database, avoiding the
need to store it separately... just don't delete that row until you
have retrieved the next set.

> Once the data is in the database I should easily be able to filter out
> most of the spam using other methods not available through the search
> api. This should also make twitter happy as I am cutting down on api
> request drastically. Even if I bump my cron up to 15mins I would only
> be making < 20 calls an hour. Does this sound like a reasonable basic
> plan? Is there anything I am overlooking?

You're counting each page as a separate API call, right? If you do this
four times an hour, with four queries, that's 16 queries. If you get
five pages of results per query, that's 80 API calls... but I wouldn't
worry about hitting the limits even at that rate.

Nick
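A minimal sketch of the let-the-database-deduplicate approach described above. It uses Python's built-in sqlite3 so it runs standalone; SQLite's INSERT OR IGNORE plays the role of MySQL's INSERT IGNORE here. The table and column names (tweets, statusID, text) are illustrative, not from the thread.

```python
import sqlite3

# In-memory database for the sketch; a real setup would connect to MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (statusID INTEGER PRIMARY KEY, text TEXT)")

def store(rows):
    # Duplicate statusIDs are silently skipped because the column is unique,
    # so overlapping results from multiple queries/pages dedupe themselves.
    conn.executemany(
        "INSERT OR IGNORE INTO tweets (statusID, text) VALUES (?, ?)", rows
    )

store([(101, "first"), (102, "second")])
store([(102, "second"), (103, "third")])  # 102 overlaps a previous batch

count = conn.execute("SELECT COUNT(*) FROM tweets").fetchone()[0]

# since_id for the next search run: just the max statusID already stored,
# instead of tracking it in a separate config table.
since_id = conn.execute("SELECT MAX(statusID) FROM tweets").fetchone()[0]
```

After both inserts, count is 3 (not 4) and since_id is 103, which is what you would pass as since_id on the next cron run.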
