Ahh, yes, your workaround is a little better than mine, but it is still a workaround, and it requires changes to how since_id is currently used by what I assume are most applications. I understand the need for the change and am willing to work around it; I can imagine the scalability issues of trying to use a synchronized ID for all tweets.
However, I wanted to be clear, and feel it should be made obvious, that with this change there is a possibility that a tweet may never be delivered to the client if the way since_id is currently used is not updated to cover this case. I still see the situation as more likely than you seem to believe, and expect that as tweet velocity increases, the likelihood will also increase; but I assume you have better data to support your viewpoint than I do, and shall defer.

--Naveen Ayyagari
@knight9
@SocialScope

On Apr 8, 7:37 pm, Mark McBride <[email protected]> wrote:
> It's a possibility, but by no means a probability. Note that you can
> mitigate this by using the newest tweet that is outside your "danger zone".
> For example, in a sequence of tweets t1, t2 ... ti ... tn with creation
> times c1, c2 ... ci ... cn and a comfort threshold e, you could use
> since_id from the latest ti such that c1 - ci > e.
>
> ---Mark
> http://twitter.com/mccv
>
> On Thu, Apr 8, 2010 at 4:27 PM, Naveen <[email protected]> wrote:
> > This was my initial concern with the randomly generated IDs that I
> > brought up, though I think Brian described it better than I did.
> >
> > It simply seems very likely that when using since_id to fetch newer
> > tweets for the user, some tweets will never be seen, because the
> > since_id of the last message received will be larger than that of one
> > generated 1 ms later.
> >
> > With the random generation of IDs, I can see two ways to guarantee
> > delivery of all tweets in a user's timeline:
> >
> > 1. Page forwards and backwards to ensure that no tweet generated at
> > or near the same time as the newest one received a lower ID. This
> > will be very expensive for a mobile client, not to mention that it
> > complicates any refresh algorithm significantly.
> >
> > 2. Given that we know how IDs are generated (i.e. which bits
> > represent the time), we can simply over-request by decrementing the
> > since_id time bits by a second or two and filter out duplicates.
> > (Again, this is not really ideal for mobile clients where battery
> > life is an issue, plus it makes the implementation very dependent on
> > Twitter's ID format remaining stable.)
> >
> > Could anyone please explain whether Brian and I are misinterpreting
> > this as a very real possibility of never displaying some tweets in a
> > timeline without changing how we request data from Twitter (i.e. so
> > that since_id doesn't break)?
> >
> > --Naveen Ayyagari
> > @knight9
> > @SocialScope
> >
> > On Apr 8, 7:01 pm, "Brian Smith" <[email protected]> wrote:
> > > What does "within the caveats given above" mean? Either since_id
> > > will work or it won't. It seems to me that if IDs are only in a
> > > "rough" order, since_id won't work; in particular, there is a
> > > possibility that paging through tweets using since_id will
> > > completely skip over some tweets.
> > >
> > > My concern is that, since tweets will not be serialized at the
> > > time they are written, there will be a race condition between me
> > > making a request and users posting new statuses. That is, I could
> > > get a response whose largest ID is X, evaluated just before a tweet
> > > with a smaller ID (say X-1) has been saved in the database; if so,
> > > when I issue a request with since_id=X, my program will never see
> > > that newer tweet (X-1).
> > >
> > > Are you going to change the implementation of the timeline methods
> > > so that they never return a tweet with ID X until all nodes in the
> > > cluster guarantee that they won't create a new tweet with an ID
> > > less than X?
> > >
> > > I implement the following logic:
> > >
> > > 1. Let LATEST start out as the ID of the earliest tweet available
> > >    in the user's timeline.
> > > 2. Make a request with since_id={LATEST}, which returns a set of
> > >    tweets T.
> > > 3. If T is empty, then stop.
> > > 4. Let LATEST = max({ id(t) for all t in T }).
> > > 5. Goto 2.
> > >
> > > Will I be guaranteed not to skip over any tweets in the timeline
> > > using this logic? If not, what do I need to do to ensure I get them
> > > all?
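Brian's five-step loop can be sketched in a few lines of Python. This is a minimal illustration, not real client code: fetch_timeline stands in for a call such as "GET statuses/home_timeline?since_id=...", and the in-memory fake below exists only to exercise the loop.

```python
# Sketch of the five-step since_id pagination loop described above.
# fetch_timeline is a hypothetical stand-in for the REST timeline call;
# it takes since_id and returns a list of {"id": ...} dicts.

def drain_timeline(fetch_timeline, earliest_id):
    """Page forward with since_id until no newer tweets are returned.

    If IDs are strictly increasing, this sees every tweet exactly once.
    If IDs are only roughly sorted, a tweet committed late with an ID
    below `latest` is silently skipped -- the race this thread is about.
    """
    latest = earliest_id                          # step 1
    seen = []
    while True:
        batch = fetch_timeline(since_id=latest)   # step 2
        if not batch:                             # step 3
            return seen
        latest = max(t["id"] for t in batch)      # step 4
        seen.extend(batch)                        # step 5: goto 2

# Simulated check against an in-memory "timeline" of IDs 1..7,
# returned three at a time:
tweets = [{"id": i} for i in range(1, 8)]
def fake_fetch(since_id):
    return [t for t in tweets if t["id"] > since_id][:3]

assert [t["id"] for t in drain_timeline(fake_fetch, 0)] == list(range(1, 8))
```

With strictly increasing IDs the assertion holds; the skip Brian worries about appears only if an ID smaller than `latest` is committed after a batch has been returned.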
> > > Thanks,
> > > Brian
> > >
> > > From: [email protected]
> > > [mailto:[email protected]] On Behalf Of Mark McBride
> > > Sent: Thursday, April 08, 2010 5:10 PM
> > > To: [email protected]
> > > Subject: Re: [twitter-dev] Re: Upcoming changes to the way status
> > > IDs are sequenced
> > >
> > > Thank you for the feedback. It's great to hear about the variety
> > > of use cases people have for the API, and in particular all the
> > > different ways people are using IDs. To alleviate some of the
> > > concerns raised in this thread, we thought it would be useful to
> > > give more details about how we plan to generate IDs:
> > >
> > > 1) IDs are still 64-bit integers. This should minimize any
> > >    migration pains.
> > > 2) You can still sort on ID. Within a few milliseconds you may get
> > >    out-of-order results, but for most use cases this shouldn't be
> > >    an issue.
> > > 3) since_id will still work (within the caveats given above).
> > > 4) We will provide a way to backfill from the streaming API.
> > > 5) You cannot use the generated ID to reverse engineer tweet
> > >    velocity. Note that you can still use the streaming API to
> > >    determine the rate of public statuses.
> > >
> > > Additional items of interest:
> > >
> > > 1) At some point we will likely start using this as an ID for
> > >    direct messages too.
> > > 2) We will almost certainly open source the ID generation code,
> > >    probably before we actually cut over to using it.
> > > 3) We STRONGLY suggest that you treat IDs as roughly sorted
> > >    (roughly being within a few-millisecond buckets), opaque 64-bit
> > >    integers. We may need to change the scheme again at some point
> > >    in the future, and want to minimize migration pains should we
> > >    need to do this.
> > >
> > > Hopefully this puts you more at ease with the changes we're
> > > making. If it raises new concerns, please let us know!
> > >
> > > ---Mark
> > > http://twitter.com/mccv
> > >
> > > On Mon, Apr 5, 2010 at 4:18 PM, M.
Edward (Ed) Borasky <[email protected]> wrote:
> > > > On 04/05/2010 12:55 AM, Tim Haines wrote:
> > > > > This made me laugh. Hard.
> > > > >
> > > > > On Fri, Apr 2, 2010 at 6:47 AM, Dewald Pretorius
> > > > > <[email protected]> wrote:
> > > > >> Mark,
> > > > >>
> > > > >> It's extremely important where you have two bots that reply
> > > > >> to each other's tweets. With incorrectly sorted tweets, you
> > > > >> get conversations that look completely unnatural.
> > > > >>
> > > > >> On Apr 1, 1:39 pm, Mark McBride <[email protected]> wrote:
> > > > >>> Just out of curiosity, what applications are you building
> > > > >>> that require sub-second sorting resolution for tweets?
> > > >
> > > > Yeah - my bot laughed too ;-)
> > > >
> > > > --
> > > > M. Edward (Ed) Borasky
> > > > borasky-research.net/m-edward-ed-borasky
> > > >
> > > > "A mathematician is a device for turning coffee into theorems."
> > > > ~ Paul Erdős
> >
> > --
> > To unsubscribe, reply using "remove me" as the subject.
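Mark's "comfort threshold" mitigation quoted earlier in the thread can be sketched as follows. This is an illustrative sketch only: Tweet and safe_since_id are made-up names, not part of the Twitter API, and the caller is still responsible for filtering out the duplicates that re-fetching the window produces.

```python
# Sketch of the "danger zone" mitigation: instead of paging from the
# newest ID seen, page from the newest tweet whose creation time is
# safely older than the newest tweet, so any late-committed IDs inside
# that window get re-fetched (and must then be deduplicated).

from dataclasses import dataclass

@dataclass
class Tweet:
    id: int
    created_at: float   # seconds since the epoch

def safe_since_id(timeline, e):
    """Given tweets newest-first (times c1 >= c2 >= ... >= cn) and a
    comfort threshold e, return the ID of the latest tweet t_i such
    that c1 - c_i > e; fall back to the oldest tweet's ID if every
    tweet is still inside the window."""
    c1 = timeline[0].created_at
    for t in timeline:
        if c1 - t.created_at > e:
            return t.id
    return timeline[-1].id

# Newest-first timeline; with e = 2.0 seconds, the tweet created at
# t = 97.0 is the latest one outside the danger zone, so its ID is
# the safe since_id.
timeline = [Tweet(105, 100.0), Tweet(104, 99.9), Tweet(100, 97.0)]
assert safe_since_id(timeline, 2.0) == 100
```

The cost Naveen points out applies here too: each refresh re-downloads up to e seconds of already-seen tweets, which trades bandwidth and battery for the guarantee of not skipping any.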
