Agreed. Is there a chance Twitter can return the full results in compressed
(gzip or similar) format to reduce load, leaving the burden of decompressing
on our end and reducing bandwidth?  I'm sure there are other areas where
this could apply as well.  I think you'll find that compressing the full
social graph of a user significantly reduces the amount of data you have to
push through the pipe - my tests have shown it to make a huge difference,
and you have to get well past tens of millions of ids before things slow
down at all.
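
A minimal sketch of what this could look like on the client side, using
only the Python standard library. The endpoint URL, the screen_name
parameter, and the absence of authentication are illustrative assumptions,
not the exact call our crawler makes:

    # Ask for a gzip-compressed response and decompress it on our side.
    # The URL and parameters below are placeholders for the followers/ids
    # call and may require authentication in practice.
    import gzip
    import json
    import urllib.request

    url = "https://api.twitter.com/1.1/followers/ids.json?screen_name=aplusk"
    req = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})

    with urllib.request.urlopen(req) as resp:
        body = resp.read()
        # Only decompress if the server actually honored the request.
        if resp.headers.get("Content-Encoding") == "gzip":
            body = gzip.decompress(body)

    data = json.loads(body)  # exact shape depends on the API version

Most HTTP libraries negotiate this transparently once the Accept-Encoding
header is sent, so the extra cost really is just CPU on the receiving end,
which is the trade we are asking for.
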
Jesse

On Sun, Sep 6, 2009 at 8:27 PM, Dewald Pretorius <dpr...@gmail.com> wrote:

>
> There is no way that paging through a large and volatile data set can
> ever return results that are 100% accurate.
>
> Let's say one wants to page through @aplusk's followers list. That's
> going to take between 3 and 5 minutes just to collect the follower ids
> with &page (or the new cursors).
>
> It is likely that some of the followers whose ids you have already
> collected will unfollow @aplusk while you are still collecting the
> rest. I assume that the Twitter system does paging with a standard
> SQL LIMIT clause. If you do LIMIT 1000000, 20 and
> some of the ids that you have already paged past have been deleted,
> the result set is going to "shift to the left" and you are going to
> miss the ones that were above 1000000 but have subsequently moved left
> to below 1000000.
>
> There really are only two solutions to this problem:
>
> a) we need to have the capability to reliably retrieve the entire
> result set in one API call, or
>
> b) everyone has to accept that the result set cannot be guaranteed to
> be 100% accurate.
>
> Dewald
>
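
The offset drift Dewald describes above is easy to reproduce with a small,
self-contained simulation - plain Python lists standing in for the real
follower table, with made-up ids and page size:

    # Simulate LIMIT/OFFSET paging over a data set that changes mid-crawl:
    # when a row earlier in the ordering disappears between page fetches,
    # later rows shift into offsets that were already read and get skipped.
    PAGE_SIZE = 3

    def fetch_page(rows, offset, limit=PAGE_SIZE):
        # Equivalent of "LIMIT offset, limit" against the current data.
        return rows[offset:offset + limit]

    followers = list(range(1, 11))           # ids 1..10
    collected = []

    collected += fetch_page(followers, 0)    # reads ids 1, 2, 3
    followers.remove(2)                      # id 2 unfollows between pages
    collected += fetch_page(followers, 3)    # reads 5, 6, 7 - id 4 shifted left and is skipped
    collected += fetch_page(followers, 6)    # reads 8, 9, 10

    print(collected)   # [1, 2, 3, 5, 6, 7, 8, 9, 10] - id 4 was never collected

Cursor-style paging only avoids this if the cursor is anchored to a stable
key (for example, the last id returned) rather than to a row offset;
otherwise it has the same failure mode.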
