I've been trying to write a script that uses the max_id parameter to
loop through all 15 pages of results (100 results per page) without
grabbing the same tweet multiple times.
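
Roughly, the loop looks like this (a minimal Python sketch using only
the standard library; the query parameters match the example URLs
below, and I'm assuming the search API's usual JSON shape with a
top-level "results" list of objects that each carry an "id"):

    import json
    import urllib.request

    BASE = "http://search.twitter.com/search.json"
    # Same query as in the example URLs below; max_id held fixed across pages.
    PARAMS = "q=&rpp=100&geocode=-40.900557,174.885971,1000km&max_id=5379894247"

    seen_ids = set()
    for page in range(1, 16):  # pages 1 through 15, 100 results each
        with urllib.request.urlopen(f"{BASE}?{PARAMS}&page={page}") as resp:
            results = json.load(resp)["results"]
        for tweet in results:
            if tweet["id"] in seen_ids:
                print(f"already saw tweet {tweet['id']} (page {page})")
            seen_ids.add(tweet["id"])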

Every time I do, I find that not only are there a couple of duplicates
between pages 1 and 2, but also that the last tweet on page 1 is
timestamped well after, and has a lower ID than, a bunch of tweets on
page 2.

For example, consider these two requests, both with the same max_id
but with page=1 and page=2 respectively:

http://search.twitter.com/search?rpp=100&page=1&geocode=-40.900557,174.885971,1000km&max_id=5379894247
http://search.twitter.com/search?rpp=100&page=2&geocode=-40.900557,174.885971,1000km&max_id=5379894247

(Or, if you prefer, the JSON links, which are what I am actually
using; I see the same thing on the HTML pages above, which are easier
to describe:
http://search.twitter.com/search.json?q=&rpp=100&geocode=-40.900557,174.885971,1000km&page=1&max_id=5379894247
http://search.twitter.com/search.json?q=&rpp=100&geocode=-40.900557,174.885971,1000km&page=2&max_id=5379894247)

The first result on page 2 above was posted about 4 hours before the
last tweet on page 1. There are also duplicates, e.g. AshleyGray00:
Fireworks!
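
To make that concrete, here's a rough sketch of how I'm checking it
(again Python with only the standard library; "results", "id", and
"created_at" are the field names the JSON responses use, as far as I
can tell):

    import json
    import urllib.request

    BASE = "http://search.twitter.com/search.json"
    PARAMS = "q=&rpp=100&geocode=-40.900557,174.885971,1000km&max_id=5379894247"

    def fetch(page):
        with urllib.request.urlopen(f"{BASE}?{PARAMS}&page={page}") as resp:
            return json.load(resp)["results"]

    page1, page2 = fetch(1), fetch(2)

    # Tweet IDs that appear on both pages -- should be empty, but isn't.
    dupes = {t["id"] for t in page1} & {t["id"] for t in page2}
    print("duplicate ids:", sorted(dupes))

    # The last tweet on page 1 ought to be older than everything on
    # page 2, yet its timestamp is hours later than page 2's first tweet.
    print("last of page 1: ", page1[-1]["created_at"])
    print("first of page 2:", page2[0]["created_at"])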

I've been trying to figure this bug out for a while. I'm sure I'm
missing something obvious, but I'm completely stumped. Does anyone
have any clue what is going on here? The only other threads I have
found are about people trying to combine since_id and max_id, which I
know is not allowed, so I can't find anyone else with a similar
problem.
