[twitter-dev] search api: best practice to capture all tweets.

maestrojed Sat, 16 Jan 2010 18:49:09 -0800

I would like to capture and store every tweet that matches a search
query, starting now and continuing indefinitely. My first attempt was to
query and store the matching results (tweets), then pass
since_id="the max id value already stored" on each later query.
However, the search API does not seem reliable enough to code against
this way. I am missing tweets because, apparently, the API does not
always return every match on every query. With this approach, if a
tweet is missed but the next one is captured, the missing tweet is never
recovered: the next one has a higher id, so since_id excludes it from
every later query.
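
To make the failure concrete, here is a minimal sketch of the since_id
logic replayed against made-up responses. It does not call the real
search API; the function name poll_with_since_id and the tweet ids are
hypothetical, and the assumption is simply that one matching tweet (102)
is absent from the first response and only appears in the second.

```python
def poll_with_since_id(responses):
    """Replay search responses, keeping only ids above the max id stored so far."""
    stored = set()
    since_id = 0
    for response in responses:
        # Emulates passing since_id: anything at or below it is never seen.
        fresh = [tweet_id for tweet_id in response if tweet_id > since_id]
        stored.update(fresh)
        if stored:
            since_id = max(stored)
    return stored


# Hypothetical data: tweet 102 is missing from the first response.
responses = [[101, 103], [101, 102, 103, 104]]
captured = poll_with_since_id(responses)
missed = {101, 102, 103, 104} - captured
print(sorted(captured), sorted(missed))
```

Once 103 is stored, since_id moves past 102, so the late tweet is
filtered out even though the second response contains it.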


This is discussed here:
http://groups.google.com/group/twitter-development-talk/browse_thread/thread/b7b6859620327bad/a31a88f8125c1c4e?lnk=gst&q=search+api+store+#a31a88f8125c1c4e

This is my code, which I run from cron once a minute:
http://pastebin.com/f6207f43

So if this is not a reliable method, what is?

I was thinking I could just remove the since_id parameter, which would
return the 100 most recent results. Then, in my code, I could check
whether each tweet is already stored and update/insert accordingly. If
a tweet is missing from one query, maybe it will be there next time and
get added. However, this approach would fail if there were more than
100 results a minute; the script would not keep up.
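
The check-then-insert idea could be sketched roughly like this, using an
in-memory SQLite table keyed on the tweet id so that re-fetched tweets
are skipped and late arrivals are added. The table layout, the helper
names (open_store, store_tweets) and the sample data are assumptions for
illustration only; they are not taken from the pastebin script or from
the search API.

```python
import sqlite3


def open_store():
    """Create an in-memory store with the tweet id as primary key."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tweets (id INTEGER PRIMARY KEY, text TEXT)")
    return conn


def count_tweets(conn):
    return conn.execute("SELECT COUNT(*) FROM tweets").fetchone()[0]


def store_tweets(conn, tweets):
    """Insert (id, text) pairs, skipping ids already stored; return rows added."""
    before = count_tweets(conn)
    conn.executemany(
        "INSERT OR IGNORE INTO tweets (id, text) VALUES (?, ?)", tweets
    )
    conn.commit()
    return count_tweets(conn) - before


# Hypothetical polls without since_id: the second one overlaps the first
# and also contains tweet 102, which the first one missed.
conn = open_store()
first_poll = [(101, "a"), (103, "c")]
second_poll = [(101, "a"), (102, "b"), (103, "c"), (104, "d")]
added_first = store_tweets(conn, first_poll)
added_second = store_tweets(conn, second_poll)
print(added_first, added_second, count_tweets(conn))
```

This only recovers a late tweet while it is still inside the window of
results returned, so it does not solve the more-than-100-per-minute case
described above.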

I really appreciate any advice.
