The other solution would be to send it to us in batch results, attaching a timestamp to the request telling us "this is what the user's social graph looked like at x time". I personally would start with the compressed format though, as that makes it possible to retrieve everything in a single request.
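To make the bandwidth argument concrete, here is a minimal sketch (in Python, with synthetic ids — not Twitter's actual payload format) of what gzip-compressing a full follower-id list looks like: the server deflates the serialized id list once, and the client inflates it locally, trading a little CPU for a much smaller transfer.

```python
import gzip
import json

# Synthetic stand-in for a large social graph: 250k follower ids.
follower_ids = list(range(1_000_000, 1_250_000))

# Server side: serialize and compress the full result set.
raw = json.dumps(follower_ids).encode("utf-8")
compressed = gzip.compress(raw)

# Client side: decompress and parse, bearing the CPU cost locally.
restored = json.loads(gzip.decompress(compressed))
assert restored == follower_ids

print(f"raw: {len(raw):,} bytes, gzipped: {len(compressed):,} bytes")
```

Sequential numeric ids compress especially well, so the exact ratio will vary with real data, but the round trip stays a single request either way.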
On Sun, Sep 6, 2009 at 10:33 PM, Jesse Stay <[email protected]> wrote:

> Agreed. Is there a chance Twitter can return the full results in
> compressed (gzip or similar) format to reduce load, leaving the burden
> of decompressing on our end and reducing bandwidth? I'm sure there are
> other areas this could apply as well. I think you'll find compressing
> the full social graph of a user significantly reduces the size of the
> data you have to pass through the pipe - my tests have shown it to be a
> huge difference, and you'll have to get well past the tens of millions
> of ids before things slow down at all after that.
>
> Jesse
>
> On Sun, Sep 6, 2009 at 8:27 PM, Dewald Pretorius <[email protected]> wrote:
>
>> There is no way that paging through a large and volatile data set can
>> ever return results that are 100% accurate.
>>
>> Let's say one wants to page through @aplusk's followers list. That's
>> going to take between 3 and 5 minutes just to collect the follower ids
>> with &page (or the new cursors).
>>
>> It is likely that some of the follower ids that you have gone past and
>> have already collected will have unfollowed @aplusk while you are still
>> collecting the rest. I assume that the Twitter system does paging with
>> a standard SQL LIMIT clause. If you do LIMIT 1000000, 20 and some of
>> the ids that you have already paged past have been deleted, the result
>> set is going to "shift to the left" and you are going to miss the ones
>> that were above 1000000 but have subsequently moved left to below
>> 1000000.
>>
>> There really are only two solutions to this problem:
>>
>> a) we need to have the capability to reliably retrieve the entire
>> result set in one API call, or
>>
>> b) everyone has to accept that the result set cannot be guaranteed to
>> be 100% accurate.
>>
>> Dewald
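The "shift to the left" Dewald describes is easy to reproduce with a toy list in Python. Here, `page` stands in for a SQL `LIMIT offset, n` query, and the numbers are purely illustrative, not anything about Twitter's actual storage:

```python
def page(rows, offset, limit):
    """Stand-in for SQL: SELECT ... LIMIT offset, limit."""
    return rows[offset:offset + limit]

followers = list(range(100))  # 100 follower ids, oldest first
collected = []

offset = 0
while True:
    batch = page(followers, offset, 20)
    if not batch:
        break
    collected.extend(batch)
    offset += 20
    if offset == 40:
        # Two already-collected followers unfollow mid-crawl:
        # every row after them shifts left by two positions.
        followers.remove(5)
        followers.remove(7)

# Rows 40 and 41 slid below the offset and were never returned.
missed = [f for f in followers if f not in collected]
print(missed)  # → [40, 41]
```

A cursor that encodes a stable position in the ordering (rather than a row count) avoids this particular failure, though as the thread notes, no paging scheme can give a perfectly consistent snapshot of a set that changes while you read it.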
