If an API is untrusted, it must be treated as entirely untrusted. You
should add defensive heuristics between the untrusted API results and
your application. If a given fetch seems bad, queue the results and
don't act on them until they are corroborated, perhaps by some quorum
of subsequent results. You should also check HTTP result codes
carefully and verify that every expected field exists.
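
As a rough illustration of the checks described above, here is a
minimal Python sketch. The function names, the 50% drop threshold, the
quorum size, and the assumption that a page carries "ids" and
"next_cursor" fields are all hypothetical choices for the example, not
anything the API prescribes.

```python
from collections import Counter

# Assumed field names for a cursored id page; adjust to the real payload.
REQUIRED_FIELDS = ("ids", "next_cursor")


def validate_page(status_code, payload):
    """Return True only if the HTTP status and payload shape look sound."""
    if status_code != 200:
        return False
    if not isinstance(payload, dict):
        return False
    # Exhaustive field existence check before touching any value.
    for field in REQUIRED_FIELDS:
        if field not in payload:
            return False
    if not isinstance(payload["ids"], list):
        return False
    if not isinstance(payload["next_cursor"], int):
        return False
    return True


def seems_bad(previous_ids, current_ids, max_drop=0.5):
    """Heuristic: flag a fetch that lost more than max_drop of the old set."""
    previous = set(previous_ids)
    if not previous:
        return False
    lost = len(previous - set(current_ids))
    return lost / len(previous) > max_drop


def corroborated(results, quorum=2):
    """Return the id set seen in at least `quorum` fetches, else None."""
    counts = Counter(frozenset(r) for r in results)
    if not counts:
        return None
    candidate, seen = counts.most_common(1)[0]
    return set(candidate) if seen >= quorum else None
```

A suspicious fetch would be appended to a queue of pending results, and
the application would act only once corroborated() returns a set.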

In the end, if some results are untrusted, you cannot trust the
suggested improvements, as the improvements will, by necessity, be
served from the same data store.

Finally, the suggested improvements take resources away from
stabilizing and otherwise improving the API.

The purpose of the cursored resource is to make retrieval of high-
velocity high-cardinality sets possible via a RESTful API. This scheme
does not provide a snapshot view.

The cursor scheme does offer several useful properties, however. One
such property is that if an edge exists at the beginning of a traversal
and remains unmodified throughout the traversal, the edge will
always(**) be returned in the result set, regardless of all other
possible operations performed on all other edges in the set. Additions
and modifications made after the first block is returned will tend not
to be represented (and may never be present). Deletions made after the
first block is returned may or may not be represented. This is a very
strong and very useful form of consistency.

** = There remains an issue with cursor jitter that can, very rarely,
result in minor loss and minor overdelivery. I don't know when this
issue will be fully addressed. This jitter issue should only effect
high velocity sets, and rarely, if ever, affect ordinary users.
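
For what it's worth, a client-side traversal that tolerates the
occasional overdelivery could look like the Python sketch below. It
assumes each page is a dict with "ids" and "next_cursor", that the
traversal starts at cursor -1, and that a next_cursor of 0 marks the
end of the list. fetch_page is a hypothetical callable standing in for
the real HTTP request, and max_pages is an arbitrary safety limit.

```python
def collect_ids(fetch_page, max_pages=10000):
    """Walk a cursored list until the end marker, dropping duplicate ids."""
    seen = set()
    ordered = []
    cursor = -1  # assumed cursor value for the first block
    for _ in range(max_pages):
        page = fetch_page(cursor)
        for item in page["ids"]:
            # Overdelivery can repeat an id across blocks; keep the first.
            if item not in seen:
                seen.add(item)
                ordered.append(item)
        cursor = page["next_cursor"]
        if cursor == 0:  # assumed end-of-list indicator
            return ordered
    raise RuntimeError("cursor traversal did not terminate")
```

De-duplicating covers overdelivery only. Minor loss cannot be detected
from a single traversal, so a later traversal is the only way to pick
up a missed edge.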

-John Kalucki
http://twitter.com/jkalucki
Services, Twitter Inc.


On Oct 4, 10:45 am, Jesse Stay <jesses...@gmail.com> wrote:
> John, because no offense, but frankly I don't trust the Twitter API. I've
> been burned too many times by things that were "supposed to work", code
> pushed into production that wasn't tested properly, etc. that I know better
> to do all I can to account for Twitter's mistakes.  There's no telling if at
> some point that next_cursor returns nothing, but in reality it was supposed
> to return something, and my users accidentally unfollow all their friends
> because of it when they weren't intending to do so.
> Having that number in there ensures, without a doubt (unless the number
> itself is wrong, which I can't do anything about), that I know if Twitter is
> right or not when I retrieve that next_cursor value.  I hope that makes
> sense - it's nothing against Twitter, I've just seen it too many times to
> know that I need to have backup error checking in place to be sure I know
> Twitter's return data is correct.
>
> Regarding the user being removed before finished, I thought the whole
> purpose of these cursors was to provide a snapshot of a social graph at a
> given point of time, so unfollowed users don't show up until after the list
> is retrieved - is that not the case?  Also, my experience has been that
> pulling the user's friend and follower count ahead of time pulls a number
> that is not the same as the number of followers/friends I actually pull from
> the API.  Having you guys do a count on the set ahead of time will help
> ensure that's the correct number.
>
> Thanks,
>
> Jesse
>
> On Sun, Oct 4, 2009 at 8:24 AM, John Kalucki <jkalu...@gmail.com> wrote:
>
> > Curious -- why isn't the end of list indicator a reliable enough
> > indication?  "Iterate until" seems simple and reliable.
>
> > Can you request the denormalized count via the API before you begin?
> > (Not familiar enough with the API, but the back-end store offers this
> > for all sorts of purposes.) You'd have to apply some heuristic to
> > allow for high-velocity sets.
>
> > The last user in the list could be removed before iteration completes,
> > setting up a race-condition that you'd have to allow for as well.
>
> > -John Kalucki
> > http://twitter.com/jkalucki
> > Services, Twitter Inc.
>
> > On Oct 4, 1:29 am, Jesse Stay <jesses...@gmail.com> wrote:
> > > > I was wondering if it might be possible to include, at least in the first
> > > > page, but if it's easier it could be on all pages, either a total expected
> > > > number of followers/friends, or a total expected number of returned pages
> > > > when the cursor parameter is provided for friends/ids and followers/ids? I'm
> > > > assuming since you're moving to the cursor-based approach you ought to be
> > > > able to accurately count this now since it's a snapshot of the data at that
> > > > time.
> > > > The reason I think that would be useful is that occasionally Twitter goes
> > > > down or introduces code that could break this.  This would enable us to be
> > > > absolutely sure we've hit the end of the entire set.  I guess another
> > > > approach could also be to just list the last expected cursor ID in the set
> > > > so we can be looking for that.
>
> > > Thanks,
>
> > > Jesse
