I would like to capture and store every tweet that matches a search query, from now on. My first attempt was to query the search API and store the matching tweets; each subsequent query passes since_id set to the highest tweet id already stored. However, the search API does not seem reliable enough to code this way. I am missing tweets, apparently because the API does not always return every match on every query. With this approach, if one tweet is missed but the next one is captured, the missed tweet is never recovered: the next tweet has a higher id, so since_id moves past the missed one.
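To illustrate the failure mode, here is a small simulation in Python. It makes no real API calls: `poll_with_since_id` and the tweet ids are made up, and each batch stands for whatever the search index was able to return at that moment.

```python
def poll_with_since_id(batches):
    """Simulate polling with a since_id high-water mark.

    Each batch is the list of tweet ids the (hypothetical) search call
    could return on that poll, before the since_id filter is applied.
    """
    stored = set()
    since_id = 0
    for batch in batches:
        # The API would only hand back ids above since_id.
        new_ids = [tweet_id for tweet_id in batch if tweet_id > since_id]
        stored.update(new_ids)
        if new_ids:
            since_id = max(new_ids)
    return stored


# Tweet 102 exists, but the first poll fails to return it.
# By the second poll since_id is already 103, so 102 is filtered out.
batches = [[100, 101, 103], [102, 103, 104]]
stored = poll_with_since_id(batches)
print(sorted(stored))
```

Tweet 102 never ends up in `stored`, even though the second poll could have returned it.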
This is discussed here: http://groups.google.com/group/twitter-development-talk/browse_thread/thread/b7b6859620327bad/a31a88f8125c1c4e?lnk=gst&q=search+api+store+#a31a88f8125c1c4e

This is my code, which I run from cron once a minute: http://pastebin.com/f6207f43

So if this is not a reliable method, what is? I was thinking I could remove the since_id parameter, so that each query returns the 100 most recent results. Then, in my code, I could check whether each tweet is already stored and update or insert accordingly. If a tweet is missing from one query, maybe it will be there next time and will be added then. However, this approach would fail if there were more than 100 results a minute; the script would not keep up.

I really appreciate any advice.
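A minimal sketch of the store-if-not-already-stored idea, in Python with sqlite3 purely for illustration (it is not the pastebin script). The table layout, the `store_tweets` helper and the `id`/`text` fields are assumptions; fetching the results from the search API is left out.

```python
import sqlite3


def store_tweets(conn, tweets):
    """Insert tweets keyed by id, skipping ids that are already stored.

    Returns the number of rows that were actually new.
    """
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO tweets (id, text) VALUES (?, ?)",
        [(t["id"], t["text"]) for t in tweets],
    )
    conn.commit()
    return conn.total_changes - before


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (id INTEGER PRIMARY KEY, text TEXT)")

# First poll misses tweet 2; the second poll overlaps and picks it up.
first_poll = [{"id": 1, "text": "a"}, {"id": 3, "text": "c"}]
second_poll = [{"id": 1, "text": "a"}, {"id": 2, "text": "b"}, {"id": 3, "text": "c"}]

new_in_first = store_tweets(conn, first_poll)
new_in_second = store_tweets(conn, second_poll)
total = conn.execute("SELECT COUNT(*) FROM tweets").fetchone()[0]
```

If the number of new rows ever equals the full page size (100 here), the poll did not overlap with the previous one, which is the sign that the script has fallen behind.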