Hello all,

My app has quickly outgrown the basic functionality of the search API.
I am searching for a few specific hashtags along with some keywords
to pull out a niche group of tweets. About 35% of what comes back is
unrelated tweets that clutter up my results. I am now looking into a
way to cache these results in a database so I can do some advanced
filtering and not hammer the API with requests.

I'm kinda "talking out loud", but here is what I'm thinking (PHP /
MySQL):

1. I have 4 main queries that I need to run to get all results.
2. Set up a cron job that'll query the API hourly.
3. Parse through the results and store them in a temp database.
4. Check for duplicate tweets, since more than one query may return
the same result.
5. Since we can only get 100 results at a time, I'll have to loop
through pages and make additional queries. I don't think this will
happen much, as my search is very niche. To detect the end of the
results, compare the number of tweets returned against the page size:
a page with fewer than 100 tweets is the last one.
6. I'll store since_id in a config table so as not to fetch redundant
tweets on the next run.
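
A rough sketch of steps 3 to 6, just to make the idea concrete. It is
written in Python with an in-memory SQLite database standing in for
PHP/MySQL so that it is self-contained. The fetch_page callable, the
table names and the columns are all assumptions made up for
illustration; the real search call and response format would need to
be wired in.

```python
import sqlite3

PAGE_SIZE = 100  # maximum results per request, as described in step 5


def init_db(conn):
    # Hypothetical schema: one table for tweets, one for per-query since_id.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets (id INTEGER PRIMARY KEY, text TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS config "
        "(query TEXT PRIMARY KEY, since_id INTEGER NOT NULL)"
    )


def get_since_id(conn, query):
    row = conn.execute(
        "SELECT since_id FROM config WHERE query = ?", (query,)
    ).fetchone()
    return row[0] if row else 0


def poll_query(conn, query, fetch_page):
    """Fetch every new page for one query and store the tweets.

    fetch_page(query, since_id, page) is a hypothetical callable that
    returns a list of dicts with "id" and "text" keys.
    """
    since_id = get_since_id(conn, query)
    newest = since_id
    page = 1
    while True:
        results = fetch_page(query, since_id, page)
        for tweet in results:
            # The primary key plus INSERT OR IGNORE drops duplicates that
            # another query has already stored (step 4).
            conn.execute(
                "INSERT OR IGNORE INTO tweets (id, text) VALUES (?, ?)",
                (tweet["id"], tweet["text"]),
            )
            newest = max(newest, tweet["id"])
        # A short (or empty) page means there is nothing left (step 5).
        if len(results) < PAGE_SIZE:
            break
        page += 1
    # Remember the newest id so the next run skips old tweets (step 6).
    conn.execute(
        "INSERT OR REPLACE INTO config (query, since_id) VALUES (?, ?)",
        (query, newest),
    )
    conn.commit()
    return newest
```

The cron job would call poll_query once per search query. Letting the
database enforce uniqueness on the tweet id avoids a separate
"does this tweet exist" lookup before each insert; in MySQL the rough
equivalent would be INSERT IGNORE against a primary key.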

Once the data is in the database, I should easily be able to filter out
most of the spam using other methods not available through the search
API. This should also make Twitter happy, as I am cutting down on API
requests drastically. Even if I bump my cron up to every 15 minutes, I
would only be making fewer than 20 calls an hour (4 queries x 4 runs =
16, plus the occasional extra page). Does this sound like a reasonable
basic plan? Is there anything I am overlooking?

Thanks for reading!
