I've been using the Search API in a project and it's been working very reliably. Today I decided to add support for pagination so I could pull in more results, and I think I've identified a couple of bugs in the pagination code.
Bug 1) The first few results of page 2 for a query are sometimes duplicates. To verify this, do the following:

1. Execute the query: http://search.twitter.com/search.atom?lang=en&q=http&rpp=100
2. Grab the "next" link from the results and execute that.
3. Compare the IDs at the end of set 1 with the IDs at the beginning of set 2. They sometimes overlap.

Bug 2) The second bug may be the cause of the first. The link you get for "next" in a result set is missing the "lang=en" query param, so you end up getting non-English items in your result set. You can manually add the "lang=en" param to your query, and while you still get dupes, you get fewer. If you do this, though, you start getting a warning in the result set about an adjusted since_id.

What's scarier is that the result set seemed to get weird on me if I added the "lang" param and requested pages too fast. By that I mean I would sometimes get results for page 2 that were (time-wise) hours before my original since_id, so my code would just stop requesting pages, since it assumed it had reached the end of the set. The scary part: adding a sleep of around 2 seconds between queries seemed to make this issue go away.

In general, pagination with the "next" link doesn't seem very reliable to me. You do seem to get fewer dupes than by just calling search and incrementing the page number, but I'm still seeing dupes, results for the wrong language, and sometimes totally weird results.

-steve
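
P.S. For anyone who wants to work around this on the client side, here is a minimal sketch in Python of the two things described above: putting "lang=en" back on a "next" link that lacks it, and skipping IDs already seen on an earlier page. The function names with_params and drop_seen are my own, the "next" URL in the usage note is made up for illustration, and fetching and parsing the Atom feed is left out. This only hides the symptoms; it does nothing about the since_id warning or the out-of-order results.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_params(url, required):
    """Return url with each required query param appended if it is missing."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    present = {name for name, _ in query}
    for name, value in required.items():
        if name not in present:
            query.append((name, value))
    return urlunsplit(parts._replace(query=urlencode(query)))


def drop_seen(ids, seen):
    """Return the IDs not seen on earlier pages, in order, and record them."""
    fresh = []
    for item_id in ids:
        if item_id not in seen:
            seen.add(item_id)
            fresh.append(item_id)
    return fresh
```

Usage would look something like with_params("http://search.twitter.com/search.atom?page=2&q=http", {"lang": "en"}) before each request, and drop_seen(ids_from_page, seen) after it, with a single seen = set() shared across all pages of one query. A time.sleep(2) between requests could be added in the calling loop, since that is what seemed to help in my case.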
