Hello all,

My app has quickly outgrown the basic functionality of the search API. I'm searching for a few specific hashtags along with some keywords to pull out a niche group of tweets, but about 35% of the returned tweets are unrelated and clutter up my results. I'm now looking into caching these results in a database so I can do some more advanced filtering without hammering the API with requests.
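To make "advanced filtering" a bit more concrete, here is a minimal sketch of one local rule that could run over the cached tweets. It's in Python for brevity even though the app is PHP. The function name `is_relevant` and the rule itself are hypothetical. The rule keeps a tweet only if it contains at least one target hashtag *and* at least one keyword as a whole word, and none of an optional list of blocked words. It assumes keywords and blocked terms are single words.

```python
import re

def is_relevant(text, hashtags, keywords, blocked=()):
    """Hypothetical post-cache filter (not part of any Twitter API).

    Keeps a tweet only if it has at least one target hashtag AND at least
    one keyword as a whole word, and contains none of the blocked words.
    Matching is case-insensitive; keywords/blocked terms are single words.
    """
    lower = text.lower()
    tags = set(re.findall(r"#(\w+)", lower))   # hashtags without the '#'
    words = set(re.findall(r"\w+", lower))     # all whole words in the tweet
    has_tag = any(tag.lstrip("#").lower() in tags for tag in hashtags)
    has_keyword = any(k.lower() in words for k in keywords)
    is_blocked = any(b.lower() in words for b in blocked)
    return has_tag and has_keyword and not is_blocked
```

Rules like this are easy to stack (blocklists, minimum length, excluded users) once the tweets live in your own tables, which is the main payoff of caching.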
I'm kind of thinking out loud, but here is what I have in mind (PHP + MySQL):

1. I have 4 main queries that I need to run to get all the results.
2. Set up a cron job that queries the API hourly.
3. Parse the results and store them in a temporary database.
4. Check for duplicate tweets, since more than one query may return the same tweet.
5. Since the API returns at most 100 results per request, loop through the pages with additional requests. I don't expect this to happen often because my search is very niche. I'll detect the last page by comparing the number of tweets returned with the page size: a page with fewer than 100 tweets is the last one.
6. Store the since_id in a config table so later runs don't return tweets I already have.

Once the data is in the database, I should be able to filter out most of the spam using methods that aren't available through the search API. This should also keep Twitter happy, since I'd be cutting down on API requests drastically. Even if I bump the cron job up to every 15 minutes, that's 4 queries × 4 runs = 16 calls an hour, so still fewer than 20 with the occasional extra page.

Does this sound like a reasonable basic plan? Is there anything I'm overlooking? Thanks for reading!
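The six steps above could be sketched roughly like this. It's a minimal illustration in Python, with the standard-library `sqlite3` module standing in for MySQL. `fetch_page` is a hypothetical wrapper around the search request. It takes a query, a since_id and a page number, and returns a list of tweets as dicts with `id` and `text`. The table and column names are also made up.

```python
import sqlite3

PAGE_SIZE = 100  # max results per request, per the post

def init_db(conn):
    # Hypothetical schema. The tweet id is the PRIMARY KEY, so a duplicate
    # returned by a second query is simply ignored (step 4).
    # (In MySQL, use BIGINT UNSIGNED: tweet ids are 64-bit.)
    conn.execute("CREATE TABLE IF NOT EXISTS tweets "
                 "(id INTEGER PRIMARY KEY, text TEXT, query TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS config "
                 "(name TEXT PRIMARY KEY, value TEXT)")

def get_since_id(conn, query):
    # One since_id per query, stored in the config table (step 6).
    row = conn.execute("SELECT value FROM config WHERE name = ?",
                       ("since_id:" + query,)).fetchone()
    return int(row[0]) if row else 0

def run_sync(conn, queries, fetch_page):
    """One cron run (step 2): fetch, store and dedupe new tweets for each query.

    fetch_page(query, since_id, page) is a hypothetical callable returning a
    list of {"id": int, "text": str}. Returns the number of API calls made.
    """
    calls = 0
    for query in queries:                     # step 1: the main queries
        since_id = get_since_id(conn, query)
        newest = since_id
        page = 1
        while True:                           # step 5: page through results
            results = fetch_page(query, since_id, page)
            calls += 1
            for tweet in results:             # step 3: store results
                conn.execute("INSERT OR IGNORE INTO tweets (id, text, query) "
                             "VALUES (?, ?, ?)",
                             (tweet["id"], tweet["text"], query))
                newest = max(newest, tweet["id"])
            if len(results) < PAGE_SIZE:      # short page: no more results
                break
            page += 1
        conn.execute("INSERT OR REPLACE INTO config (name, value) VALUES (?, ?)",
                     ("since_id:" + query, str(newest)))
    conn.commit()
    return calls
```

Keeping one since_id per query, rather than a single global one, means a query that returns fewer new tweets can't be pushed past tweets it hasn't fetched yet. In MySQL the equivalent statements would be `INSERT IGNORE` and `REPLACE INTO` (or `INSERT ... ON DUPLICATE KEY UPDATE`).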
