Hi All,
We have been noticing gaps in search results at times, particularly
when doing geocoded searches. For example, this search over
south-eastern Australia:
http://search.twitter.com/search.json?rpp=100&lang=en&page=1&geocode=-35.2,144.0,1000km
occasionally produces results with large gaps in the created_at values
of the tweets. For example, I just got these created_at values for the
tweets returned:
...
Tue, 17 Nov 2009 22:59:50 +0000
Tue, 17 Nov 2009 22:59:49 +0000
Tue, 17 Nov 2009 22:59:48 +0000
Tue, 17 Nov 2009 22:59:43 +0000
Tue, 17 Nov 2009 22:52:04 +0000
Tue, 17 Nov 2009 22:52:04 +0000
Tue, 17 Nov 2009 22:51:34 +0000
Tue, 17 Nov 2009 22:50:21 +0000
Tue, 17 Nov 2009 22:50:20 +0000
Tue, 17 Nov 2009 22:43:37 +0000
...
This area produces multiple tweets per second, yet the results above
contain gaps many minutes long. A subsequent search, tens of seconds to
a few minutes later, 'fills in' these gaps with many more tweets from
the periods in between, confirming that the initial search was in fact
sparse.
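For anyone wanting to check their own result sets for this, here is a
minimal sketch of how the gaps could be flagged. It assumes the
created_at values are RFC 2822 date strings ordered newest first, as in
the listing above. The function name find_gaps and the one-minute
threshold are my own choices for illustration, not part of any API.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime


def find_gaps(created_at, threshold=timedelta(minutes=1)):
    """Return (newer, older, gap) for adjacent results further apart than threshold.

    created_at: list of RFC 2822 date strings, newest first (assumed ordering).
    """
    times = [parsedate_to_datetime(s) for s in created_at]
    gaps = []
    for newer, older in zip(times, times[1:]):
        gap = newer - older
        if gap > threshold:
            gaps.append((newer, older, gap))
    return gaps


# The created_at values quoted in this post.
sample = [
    "Tue, 17 Nov 2009 22:59:50 +0000",
    "Tue, 17 Nov 2009 22:59:49 +0000",
    "Tue, 17 Nov 2009 22:59:48 +0000",
    "Tue, 17 Nov 2009 22:59:43 +0000",
    "Tue, 17 Nov 2009 22:52:04 +0000",
    "Tue, 17 Nov 2009 22:52:04 +0000",
    "Tue, 17 Nov 2009 22:51:34 +0000",
    "Tue, 17 Nov 2009 22:50:21 +0000",
    "Tue, 17 Nov 2009 22:50:20 +0000",
    "Tue, 17 Nov 2009 22:43:37 +0000",
]

for newer, older, gap in find_gaps(sample):
    print(f"{older:%H:%M:%S} -> {newer:%H:%M:%S}: gap of {gap}")
```

The threshold would need tuning to the usual tweet rate of the area
being searched; one minute is only a plausible value for an area that
normally produces several tweets per second.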
The same effect also occurs in the normal web search interface.
Previously it was possible to record the maximum id of the tweets in
the previous search result set, and then page through new results only
until you saw that id again. With results appearing out of order, this
method no longer works. You would have to search across all 15 pages x
100 rpp continuously to get something approaching a complete result
set, and even that will fall short if ~1500 results 'appear at once'.
This is not sustainable, either for the app doing the searching or in
terms of the many additional requests that would hit Twitter's
servers.
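To make the difference concrete, here is a rough sketch of the two
polling strategies run against simulated data. Everything in it is
hypothetical: make_fetch_page stands in for the real search call, the
ids are made up, and it assumes that ids increase with creation time
and that results come back newest first.

```python
def make_fetch_page(ids, rpp=100):
    """Fake fetch_page(page) over a newest-first id list (stand-in for the search call)."""
    def fetch_page(page):
        start = (page - 1) * rpp
        return [{"id": i} for i in ids[start:start + rpp]]
    return fetch_page


def new_since_max_id(fetch_page, last_max_id, max_pages=15):
    """Old method: page through results and stop at the first already-known id."""
    fresh = []
    for page in range(1, max_pages + 1):
        results = fetch_page(page)
        if not results:
            break
        for tweet in results:
            if tweet["id"] <= last_max_id:
                return fresh
            fresh.append(tweet)
    return fresh


def new_by_seen_ids(fetch_page, seen_ids, max_pages=15):
    """Fallback: read every page and keep whatever has not been seen before."""
    fresh = []
    for page in range(1, max_pages + 1):
        results = fetch_page(page)
        if not results:
            break
        fresh.extend(t for t in results if t["id"] not in seen_ids)
    return fresh


# Simulated index: ids 6, 7 and 8 were missing from the first poll and
# only appear in the second, below the previous maximum id of 10.
first_poll = [10, 9, 5, 4]
second_poll = [12, 11, 10, 9, 8, 7, 6, 5, 4]

fetch = make_fetch_page(second_poll)
by_max_id = [t["id"] for t in new_since_max_id(fetch, max(first_poll))]
by_seen = [t["id"] for t in new_by_seen_ids(fetch, set(first_poll))]
print("max id method:", by_max_id)
print("seen ids method:", by_seen)
```

The max id method stops as soon as it reaches id 10, so it never sees
the late arrivals. The seen ids method finds them, but only by reading
every available page on every poll, which is exactly the extra load
described above.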
Is this a problem that will be resolved, or, moving forward, will
search results continue to appear out of order relative to their
creation date like this?
Thanks,
JB.