There is no way that paging through a large and volatile data set can
ever return results that are 100% accurate.

Let's say you want to page through @aplusk's followers list. That's
going to take between 3 and 5 minutes just to collect the follower
ids with &page (or the new cursors).

It is likely that some of the followers whose ids you have already
paged past and collected will unfollow @aplusk while you are still
collecting the rest. I assume that the Twitter system does paging
with a standard SQL LIMIT clause. If you do LIMIT 1000000, 20 and
some of the rows you have already paged past have been deleted, the
result set is going to "shift to the left" and you are going to miss
the ids that were above 1000000 but have subsequently moved left to
below 1000000.
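
To make the "shift to the left" concrete, here is a toy sketch in
Python. Nothing in it is real Twitter code: the follower ids, the
page size and the fetch_page() stand-in are all made up. It pages
with a plain offset, lets a couple of already-collected followers
unfollow between requests, and ends up silently skipping ids:

PAGE_SIZE = 20

# Pretend follower ids 1..200, stored in the order the database
# would return them.
followers = list(range(1, 201))

def fetch_page(data, offset, count):
    """Stand-in for a SELECT ... LIMIT offset, count query."""
    return data[offset:offset + count]

collected = []
offset = 0
while True:
    page = fetch_page(followers, offset, PAGE_SIZE)
    if not page:
        break
    collected.extend(page)
    offset += PAGE_SIZE

    # Between page requests, two already-collected followers
    # unfollow. Their rows disappear and everything after them
    # shifts left, so the next offset skips over some ids.
    for gone in page[:2]:
        followers.remove(gone)

missed = set(range(1, 201)) - set(collected)
print(f"collected {len(collected)} ids, silently missed {len(missed)}")

With these made-up numbers it collects 182 of the 200 ids and never
sees the other 18, which is the same failure you hit at @aplusk's
scale, just with much bigger numbers.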

There really are only two solutions to this problem:

a) we need to have the capability to reliably retrieve the entire
result set in one API call, or

b) everyone has to accept that the result set cannot be guaranteed to
be 100% accurate.

Dewald
