I've been using the Search API in a project and it's been working very
reliably.  Today I decided to add support for pagination so I could
pull in more results, and I think I've identified a couple of bugs in
the pagination code.

Bug 1)

The first few results of Page 2 for a query are sometimes duplicates
of the last few results of Page 1.  To verify this, do the following:

   1. Execute the query:
      http://search.twitter.com/search.atom?lang=en&q=http&rpp=100
   2. Grab the "next" link from the results and execute that.
   3. Compare the IDs at the end of set 1 with the IDs at the
      beginning of set 2.  They sometimes overlap.
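
A rough Python sketch of that comparison, in case it helps anyone
reproduce this.  It assumes you have already fetched the two Atom
responses as strings (the fetching is left out), that each entry
carries a standard Atom <id> element, and that the "next" link is a
feed-level <link rel="next">.  The function names are made up for
illustration.

```python
import xml.etree.ElementTree as ET

# Standard Atom namespace; the feed layout beyond that is an assumption.
ATOM = "{http://www.w3.org/2005/Atom}"


def entry_ids(atom_xml):
    """Return the <id> text of every <entry>, in document order."""
    root = ET.fromstring(atom_xml)
    return [entry.findtext(ATOM + "id") for entry in root.iter(ATOM + "entry")]


def next_link(atom_xml):
    """Return the href of the feed-level rel="next" link, or None."""
    root = ET.fromstring(atom_xml)
    for link in root.findall(ATOM + "link"):
        if link.get("rel") == "next":
            return link.get("href")
    return None


def overlapping_ids(page_one_xml, page_two_xml):
    """Return the IDs on page two that already appeared on page one."""
    first_page = set(entry_ids(page_one_xml))
    return [i for i in entry_ids(page_two_xml) if i in first_page]
```

If overlapping_ids() returns a non-empty list for the first page and
the page behind its "next" link, you are seeing Bug 1.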


Bug 2)

The second bug may be the cause of the first.  The link you get for
"next" in a result set is missing the "lang=en" query param, so you
end up getting non-English items in your result set.  You can manually
add the "lang=en" param to the "next" URL, and while you still get
dupes you get fewer.  If you do this, though, you start getting a
warning in the result set about an adjusted since_id.
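
The manual workaround of putting "lang=en" back on the "next" URL could
look something like this in Python.  This is only a sketch: the helper
name ensure_param is made up, and the other parameters in the example
URL below are placeholders, not what the API actually returns.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def ensure_param(url, name, value):
    """Append name=value to the URL's query string unless name is already set."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    if not any(key == name for key, _ in query):
        query.append((name, value))
    # Note: urlencode re-encodes the existing values as well.
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical "next" URL with the lang param dropped.
next_url = "http://search.twitter.com/search.atom?page=2&q=http&rpp=100"
fixed_url = ensure_param(next_url, "lang", "en")
```

A URL that already carries a "lang" param is left alone, so the same
call is safe on the first page's URL too.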

What's scarier is that the result set seemed to get weird on me
if I added the "lang" param and requested pages too fast.  By that I
mean I would sometimes get results for Page 2 that were (timewise)
hours before my original since_id, so my code would just stop
requesting pages, since it assumed it had reached the end of the set.
The scary part: adding a sleep of around 2 seconds between queries
seemed to make this issue go away.
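
For anyone who wants to try the same workaround, here is a minimal
sketch of a paging loop that pauses between requests and drops
duplicate IDs.  The 2-second default is just the value that seemed to
work above, not a documented limit.  fetch_page is a hypothetical
callable you supply; it takes a URL and returns a list of IDs plus the
next URL (or None).  The max_pages cap is an arbitrary safety limit.

```python
import time


def collect_pages(fetch_page, first_url, delay_seconds=2.0, max_pages=15,
                  sleep=time.sleep):
    """Follow "next" links, sleeping between requests and skipping repeated IDs."""
    seen = set()
    results = []
    url = first_url
    for page_number in range(max_pages):
        if url is None:
            break
        if page_number > 0:
            sleep(delay_seconds)  # pause before every request after the first
        ids, url = fetch_page(url)
        for item_id in ids:
            if item_id not in seen:
                seen.add(item_id)
                results.append(item_id)
    return results
```

This only hides the dupes on the client side; it does nothing about
results arriving in the wrong language or from the wrong time range.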


In general, pagination with the "next" link doesn't seem very
reliable to me.  You do seem to get fewer dupes than by just calling
search and incrementing the page number, but I'm still seeing dupes,
results for the wrong language, and sometimes totally weird results.

-steve
