Hi API Team,

A few of us have been discussing off-list a funky behavior we have been noticing, and now users are starting to notice it too.

There is a problem for sites/apps like TweetGrid and TweetChat that auto-refresh tweets based on the Search API using the since_id. People are noticing that these sites are "missing tweets" when compared to the search.twitter.com results page for the same query.

We believe what is happening is that the search servers are not indexing tweets in a serial manner, so a tweet with a higher id may sneak into a search server and be indexed before a tweet with a lower id. This means that when the since_id is sent back from the query (or derived from the first result in the results array), using that since_id to refresh the query will miss lower-id tweets when they finally do get indexed. So the illusion of "missing tweets" is created. You can run TweetGrid and TweetChat in separate tabs using the same query and see that sometimes the results don't match up because of this.

I'll try to give an example to be clear. Let's say for the sake of simplicity that I'm searching for "twitter" and that every 10th tweet in the public timeline matches. So, all tweets with ids ending in 0 match my query.

Search server 1 may index:
20 30 40 60 70 (notice 50 is missing)

At the same time, search server 2 may index:
20 30 40 50 (notice it hasn't indexed 60 or 70 yet)

I send a query, get a response from server 1, and get a since_id of 70. On my next request I use since_id=70, so I'll never see tweet 50, even after it is indexed. Thus the "missing tweets".

This is quite annoying, especially now that users are noticing and complaining to us (the app devs) that our apps are broken. I cannot think of a good workaround for this that would be simple enough to implement and be worth the effort.

Is this behavior something anyone else can confirm? Are tweets supposed to be indexed/replicated serially by the search servers?

-Chad
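
P.S. The example above can be sketched in a few lines of Python. This is only a toy simulation under stated assumptions: the `search` helper and the in-memory sets are hypothetical stand-ins for the search servers, not the real Search API, and it assumes tweet 50 reaches server 1 between my two polls.

```python
# Toy simulation of the scenario described above.
# Nothing here calls the real Search API; `search` is a hypothetical stand-in.

def search(index, since_id=0):
    """Return the indexed tweet ids greater than since_id, newest first."""
    return sorted((t for t in index if t > since_id), reverse=True)

# What each (hypothetical) search server has indexed at the first poll.
server1 = {20, 30, 40, 60, 70}  # 50 is not indexed here yet
server2 = {20, 30, 40, 50}      # 60 and 70 are not indexed here yet

# First poll happens to be answered by server 1.
first = search(server1)
since_id = first[0]             # derived from the first result
seen = set(first)

# Assumption: tweet 50 is indexed on server 1 before the next poll.
server1.add(50)

# Second poll reuses since_id, so anything with a lower id is filtered out.
second = search(server1, since_id=since_id)
seen.update(second)

# Tweets that are indexed but were never delivered to the client.
missing = server1 - seen

print("since_id after first poll:", since_id)  # 70
print("second poll results:", second)          # []
print("never delivered:", sorted(missing))     # [50]
```

Tweet 50 is in the index by the second poll, but since_id=70 filters it out, so the client never sees it.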