Thanks for the reply Matt...
Just as an FYI...
I updated my code to track duplicates and then did a sample run over a
5-minute period that paged in new results once a minute for the query
http filter:links. This resulted in about 11 pages of results each
minute and over the 11 pages I saw
The query http filter:links (which is a bit redundant) is such a
high volume query that I would doubt that the search servers would
ever be able to keep in sync even when things were running up to
speed.
Try with a less-trafficked query like twitter.
-Chad
On Thu, Apr 16, 2009 at 6:55 PM,
So my project is a sort of tweetmeme or twitturly type thing where I'm
looking to collect a sample of the links being shared through
Twitter. Unlike those projects, I don't have a firehose, so I have to
rely on search. Fortunately, I don't really need to see every link for
my project just a
I can't speak for twitter on the permission to do that side, but
that technique will work just fine, so you should be good to go
technically.
-chad
On Thu, Apr 16, 2009 at 9:34 PM, stevenic ick...@gmail.com wrote:
Matt... Another thought I just had...
As Chad points out, with my particular
It would be helpful if you could give some example output/results
where you are seeing duplicates across pages. I have spent a long
long time with the Search API and haven't ever had this problem (or
maybe I have and never noticed it).
-Chad
On Wed, Apr 15, 2009 at 9:07 PM, steve
Sure... It repros for me every time in IE using the steps I outlined
above. Do a query for lang=en&q=http. Open the next link in a
new tab of your browser and compare the IDs.
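
For anyone who would rather not compare the IDs by eye, here is a rough sketch of the same check in Python. It assumes you have already pulled the status IDs out of each page of results yourself; the function name and the sample IDs are made up for illustration and are not real search output.

```python
from collections import defaultdict


def find_cross_page_duplicates(pages):
    """Return {status_id: [page numbers]} for IDs seen on more than one page.

    `pages` is a list of lists of status IDs, with page 1 first.
    (Hypothetical helper, not part of any Twitter library.)
    """
    seen = defaultdict(list)
    for page_number, ids in enumerate(pages, start=1):
        # Use a set so a repeat within a single page is not counted twice.
        for status_id in set(ids):
            seen[status_id].append(page_number)
    return {sid: nums for sid, nums in seen.items() if len(nums) > 1}


# Made-up IDs purely for illustration.
page1 = [105, 104, 103, 102, 101]
page2 = [103, 102, 100, 99, 98]

dupes = find_cross_page_duplicates([page1, page2])
for status_id, page_numbers in sorted(dupes.items()):
    print(status_id, "appears on pages", page_numbers)
```

Anything it prints is an ID that showed up on more than one page, which is the symptom described above.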
So I just did this from my home PC and here's the condensed output.
Notice that on Page 2 not only do I get 3 dupes
Ok... So I think I know what's going on. Well, I don't know what's
causing the bug, obviously, but I think I've narrowed down where it
is...
I just issued the Page 1 or previous query for the above example and
the IDs don't match the IDs from the original query. There are
extra rows that come