Anyone else still confused about how this works? I'm still not clear on how this is any different from the way it was before with paging (other than one fewer API call).
Jesse

On Sun, Oct 4, 2009 at 10:57 PM, John Kalucki <[email protected]> wrote:
>
> If an API is untrusted, it must be treated as entirely untrusted. You
> should be adding defensive heuristics between the untrusted API
> results and your application. If a given fetch seems bad, then queue
> the results and don't act on them until otherwise corroborated,
> perhaps by some quorum of subsequent results. You should also be
> carefully checking HTTP result codes, and performing exhaustive
> field existence checking.
>
> In the end, if some results are untrusted, you cannot trust the
> suggested improvements, as the improvements will, by necessity, be
> served from the same data store.
>
> Finally, the suggested improvements take resources away from
> stabilizing and otherwise improving the API.
>
> The purpose of the cursored resource is to make retrieval of high-
> velocity, high-cardinality sets possible via a RESTful API. This
> scheme does not provide a snapshot view.
>
> The cursor scheme offers several useful properties, however. One such
> property is that if an edge exists at the beginning of a traversal and
> remains unmodified throughout the traversal, the edge will always(**)
> be returned in the result set, regardless of all other possible
> operations performed on all other edges in the set. Additions and
> modifications made after the first block is returned will tend not
> to be represented (and may never be present). Deletions made after the
> first block is returned may or may not be represented. This is a very
> strong and very useful form of consistency.
>
> ** = There remains an issue with cursor jitter that can, very rarely,
> result in minor loss and minor overdelivery. I don't know when this
> issue will be fully addressed. This jitter issue should only affect
> high-velocity sets, and rarely, if ever, affect ordinary users.
>
> -John Kalucki
> http://twitter.com/jkalucki
> Services, Twitter Inc.
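The traversal John describes — follow next_cursor until the end-of-list sentinel, with defensive checks before acting on each block — can be sketched roughly as below. fetch_page is a hypothetical stand-in for an HTTP GET of followers/ids?cursor=...; the real endpoint signals the last block with a next_cursor of 0, and -1 requests the first block.

```python
def fetch_all_ids(fetch_page):
    """Collect ids from a cursored endpoint, iterating until the
    end-of-list cursor (0) is returned.

    fetch_page(cursor) is a hypothetical callable standing in for an
    HTTP request; it returns (http_status, body_dict)."""
    ids = []
    cursor = -1  # -1 requests the first block
    while cursor != 0:
        status, body = fetch_page(cursor)
        # Defensive heuristics: don't act on a suspect response.
        if status != 200:
            raise RuntimeError("unexpected HTTP status %d" % status)
        if "ids" not in body or "next_cursor" not in body:
            raise ValueError("response missing expected fields")
        ids.extend(body["ids"])
        cursor = body["next_cursor"]
    return ids

# A fake two-block endpoint, purely for illustration.
_pages = {
    -1:   (200, {"ids": [1, 2, 3], "next_cursor": 1349}),
    1349: (200, {"ids": [4, 5],    "next_cursor": 0}),
}
all_ids = fetch_all_ids(lambda c: _pages[c])
```

This is only a sketch of the "iterate until" pattern; production code would also handle rate limits, retries, and transport errors.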
> On Oct 4, 10:45 am, Jesse Stay <[email protected]> wrote:
> > John, no offense, but frankly I don't trust the Twitter API. I've
> > been burned too many times by things that were "supposed to work",
> > code pushed into production that wasn't tested properly, etc., so I
> > know to do all I can to account for Twitter's mistakes. There's no
> > telling if at some point next_cursor returns nothing when in reality
> > it was supposed to return something, and my users accidentally
> > unfollow all their friends because of it when they weren't intending
> > to do so.
> > Having that number in there ensures, without a doubt (unless the
> > number itself is wrong, which I can't do anything about), that I know
> > whether Twitter is right or not when I retrieve that next_cursor
> > value. I hope that makes sense - it's nothing against Twitter, I've
> > just seen it too many times not to have backup error checking in
> > place to be sure Twitter's return data is correct.
> >
> > Regarding the user being removed before finished, I thought the whole
> > purpose of these cursors was to provide a snapshot of a social graph
> > at a given point in time, so unfollowed users don't show up until
> > after the list is retrieved - is that not the case? Also, my
> > experience has been that pulling the user's friend and follower count
> > ahead of time pulls a number that is not the same as the number of
> > followers/friends I actually pull from the API. Having you guys do a
> > count on the set ahead of time will help ensure that's the correct
> > number.
> >
> > Thanks,
> >
> > Jesse
> >
> > On Sun, Oct 4, 2009 at 8:24 AM, John Kalucki <[email protected]> wrote:
> >
> > > Curious -- why isn't the end-of-list indicator a reliable enough
> > > indication? "Iterate until" seems simple and reliable.
> >
> > > Can you request the denormalized count via the API before you begin?
> > > (Not familiar enough with the API, but the back-end store offers
> > > this for all sorts of purposes.) You'd have to apply some heuristic
> > > to allow for high-velocity sets.
> >
> > > The last user in the list could be removed before iteration
> > > completes, setting up a race condition that you'd have to allow for
> > > as well.
> >
> > > -John Kalucki
> > > http://twitter.com/jkalucki
> > > Services, Twitter Inc.
> >
> > > On Oct 4, 1:29 am, Jesse Stay <[email protected]> wrote:
> > > > I was wondering if it might be possible to include, at least in
> > > > the first page (or, if it's easier, on all pages), either a total
> > > > expected number of followers/friends or a total expected number of
> > > > returned pages when the cursor parameter is provided for
> > > > friends/ids and followers/ids? I'm assuming that since you're
> > > > moving to the cursor-based approach, you ought to be able to count
> > > > this accurately now, since it's a snapshot of the data at that
> > > > time.
> > > > The reason I think that would be useful is that occasionally
> > > > Twitter goes down or introduces code that could break this. This
> > > > would enable us to be absolutely sure we've hit the end of the
> > > > entire set. I guess another approach could also be to just list
> > > > the last expected cursor ID in the set, so we can be looking for
> > > > that.
> > > >
> > > > Thanks,
> > > >
> > > > Jesse
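The cross-check discussed above — fetch the denormalized follower count up front, then compare it against what the traversal actually returned, with some slack for high-velocity sets — might be sketched like this. The function name and the 2% tolerance are illustrative assumptions, not anything the API provides.

```python
def looks_complete(expected_count, retrieved_ids, tolerance=0.02):
    """Heuristic sanity check: trust a cursored traversal only if the
    number of ids retrieved is within `tolerance` (as a fraction) of
    the count reported before the traversal began.  The slack allows
    for follows/unfollows that happened mid-traversal on a
    high-velocity set."""
    if expected_count == 0:
        return len(retrieved_ids) == 0
    drift = abs(len(retrieved_ids) - expected_count) / float(expected_count)
    return drift <= tolerance

# Within 2% of the expected count: plausibly complete.
assert looks_complete(100, list(range(99)))
# Half the set missing: treat the result as suspect, don't act on it.
assert not looks_complete(100, list(range(50)))
```

Per John's advice, a failed check would mean queueing the result and re-fetching for corroboration rather than, say, unfollowing the "missing" users.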
