It's a possibility, but by no means a probability. Note that you can mitigate this by using the newest tweet that is outside your "danger zone". For example, in a sequence of tweets t1, t2 ... ti ... tn with creation times c1, c2 ... ci ... cn (listed newest first) and a comfort threshold e, you could take your since_id from the latest ti such that c1 - ci > e.
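Mark's mitigation can be sketched in a few lines. The representation here (a list of `(id, created_at)` pairs sorted newest first, matching the c1 ... cn ordering of a timeline response) and the two-second default threshold are illustrative assumptions, not part of the API:

```python
from datetime import datetime, timedelta

def safe_since_id(tweets, threshold=timedelta(seconds=2)):
    """Pick a since_id outside the 'danger zone'.

    `tweets` is assumed to be a list of (id, created_at) pairs sorted
    newest first. Returns the ID of the newest tweet whose creation time
    is more than `threshold` older than the newest tweet's creation time
    (c1 - ci > e), or None if every tweet is inside the danger zone.
    """
    if not tweets:
        return None
    newest_time = tweets[0][1]                    # c1: creation time of the newest tweet
    for tweet_id, created_at in tweets:           # scan from newest to oldest
        if newest_time - created_at > threshold:  # c1 - ci > e
            return tweet_id
    return None
```

On the next refresh you would pass this value as since_id; tweets inside the danger zone are simply fetched again and de-duplicated client-side.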
---Mark
http://twitter.com/mccv

On Thu, Apr 8, 2010 at 4:27 PM, Naveen <[email protected]> wrote:
> This was my initial concern with the randomly generated IDs that I
> brought up, though I think Brian described it better than I did.
>
> It simply seems very likely that, when using since_id to populate newer
> tweets for the user, some tweets will never be seen, because the
> since_id of the last message received will be larger than an ID
> generated 1 ms later.
>
> With the random generation of IDs, I can see two ways to guarantee
> delivery of all tweets in a user's timeline:
>
> 1. Page forwards and backwards to ensure that no tweet generated at or
> near the same time as the newest one received a lower ID. This will be
> very expensive for a mobile client, not to mention that it complicates
> any refresh algorithm significantly.
>
> 2. Given that we know how IDs are generated (i.e., which bits represent
> the time), we can simply over-request by decrementing the since_id time
> bits by a second or two and filtering out duplicates. (Again, not ideal
> for mobile clients where battery life is an issue; it also makes the
> implementation very dependent on Twitter's ID format remaining stable.)
>
> Please, anyone, explain if Brian and I are misinterpreting this as a
> very real possibility of never displaying some tweets in a timeline
> without changing how we request data from Twitter (i.e., that since_id
> doesn't break).
>
> --Naveen Ayyagari
> @knight9
> @SocialScope
>
> On Apr 8, 7:01 pm, "Brian Smith" <[email protected]> wrote:
> > What does "within the caveats given above" mean? Either since_id will
> > work or it won't. It seems to me that if IDs are only in a "rough"
> > order, since_id won't work; in particular, there is a possibility that
> > paging through tweets using since_id will completely skip over some
> > tweets.
> > My concern is that, since tweets will not be serialized at the time
> > they are written, there will be a race condition between me making a
> > request and users posting new statuses. That is, I could get a response
> > whose largest ID is X, evaluated just before a tweet with ID X-1 has
> > been saved to the database. If so, when I issue a request with
> > since_id=X, my program will never see that newer tweet (X-1).
> >
> > Are you going to change the implementation of the timeline methods so
> > that they never return a tweet with ID X until all nodes in the cluster
> > guarantee that they won't create a new tweet with an ID less than X?
> >
> > Suppose I implement the following logic:
> >
> > 1. Let LATEST start out as the ID of the earliest tweet available in
> > the user's timeline.
> > 2. Make a request with since_id={LATEST}, which returns a set of
> > tweets T.
> > 3. If T is empty, then stop.
> > 4. Let LATEST = max({ id(t) for all t in T }).
> > 5. Go to 2.
> >
> > Will I be guaranteed not to skip over any tweets in the timeline using
> > this logic? If not, what do I need to do to ensure I get them all?
> >
> > Thanks,
> > Brian
> >
> > From: [email protected]
> > [mailto:[email protected]] On Behalf Of Mark McBride
> > Sent: Thursday, April 08, 2010 5:10 PM
> > To: [email protected]
> > Subject: Re: [twitter-dev] Re: Upcoming changes to the way status IDs
> > are sequenced
> >
> > Thank you for the feedback. It's great to hear about the variety of
> > use cases people have for the API, and in particular all the different
> > ways people are using IDs. To alleviate some of the concerns raised in
> > this thread, we thought it would be useful to give more details about
> > how we plan to generate IDs.
> >
> > 1) IDs are still 64-bit integers. This should minimize any migration
> > pains.
> >
> > 2) You can still sort on ID. Within a few milliseconds you may get
> > out-of-order results, but for most use cases this shouldn't be an
> > issue.
> > 3) since_id will still work (within the caveats given above).
> >
> > 4) We will provide a way to backfill from the streaming API.
> >
> > 5) You cannot use the generated ID to reverse-engineer tweet velocity.
> > Note that you can still use the streaming API to determine the rate of
> > public statuses.
> >
> > Additional items of interest:
> >
> > 1) At some point we will likely start using this as an ID for direct
> > messages too.
> >
> > 2) We will almost certainly open-source the ID generation code,
> > probably before we actually cut over to using it.
> >
> > 3) We STRONGLY suggest that you treat IDs as roughly sorted (roughly
> > being within a few ms buckets), opaque 64-bit integers. We may need to
> > change the scheme again at some point in the future, and we want to
> > minimize migration pains should we need to do so.
> >
> > Hopefully this puts you more at ease with the changes we're making.
> > If it raises new concerns, please let us know!
> >
> > ---Mark
> > http://twitter.com/mccv
> >
> > On Mon, Apr 5, 2010 at 4:18 PM, M. Edward (Ed) Borasky
> > <[email protected]> wrote:
> > > On 04/05/2010 12:55 AM, Tim Haines wrote:
> > > > This made me laugh. Hard.
> > > >
> > > > On Fri, Apr 2, 2010 at 6:47 AM, Dewald Pretorius
> > > > <[email protected]> wrote:
> > > >> Mark,
> > > >>
> > > >> It's extremely important where you have two bots that reply to
> > > >> each other's tweets. With incorrectly sorted tweets, you get
> > > >> conversations that look completely unnatural.
> > > >>
> > > >> On Apr 1, 1:39 pm, Mark McBride <[email protected]> wrote:
> > > >>> Just out of curiosity, what applications are you building that
> > > >>> require sub-second sorting resolution for tweets?
> > >
> > > Yeah - my bot laughed too ;-)
> > >
> > > --
> > > M. Edward (Ed) Borasky
> > > borasky-research.net/m-edward-ed-borasky
> > >
> > > "A mathematician is a device for turning coffee into theorems."
> > > ~ Paul Erdős
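For reference, Naveen's second workaround (widening the since_id window and de-duplicating client-side) can be sketched as below. The bit layout is purely an assumption for illustration; Twitter had not published the real layout in this thread, and as Naveen notes, depending on it is fragile precisely because the format may change:

```python
# ASSUMED layout for illustration only: a millisecond timestamp in the
# high bits above bit 22. Do not rely on this in a real client.
TIMESTAMP_SHIFT = 22
MS_PER_SECOND = 1000

def widened_since_id(since_id, backoff_seconds=2):
    """Naveen's option 2: over-request by rolling the assumed timestamp
    bits of since_id back by a couple of seconds."""
    backoff = backoff_seconds * MS_PER_SECOND << TIMESTAMP_SHIFT
    return max(since_id - backoff, 0)

def merge_new_tweets(seen_ids, fetched):
    """Drop tweets already displayed; `fetched` is a list of (id, text)
    pairs returned by a request made with the widened since_id."""
    fresh = [(i, text) for i, text in fetched if i not in seen_ids]
    seen_ids.update(i for i, _ in fresh)
    return fresh
```

The cost Naveen mentions is visible here: every refresh re-downloads up to two seconds of already-seen tweets just to throw most of them away.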
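Brian's five-step polling loop above can be sketched as follows; `fetch_page` is a hypothetical stand-in for a timeline request and is assumed to return a list of tweet dicts with an `id` key (empty when nothing newer exists):

```python
def poll_timeline(fetch_page, initial_latest):
    """Brian's loop: poll with since_id until no newer tweets arrive.

    `initial_latest` is the ID of the earliest tweet available (step 1).
    """
    latest = initial_latest
    collected = []
    while True:
        page = fetch_page(since_id=latest)   # step 2: request newer tweets
        if not page:                         # step 3: empty result, stop
            return collected
        collected.extend(page)
        latest = max(t["id"] for t in page)  # step 4: advance the cursor
        # step 5: loop back to step 2
```

As Brian argues, this loop only sees every tweet if no tweet with an ID below `latest` is committed after the response containing `latest` was assembled; that is exactly the race the roughly-sorted scheme reintroduces at millisecond scale, and why Mark's danger-zone backoff matters.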
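Mark's description (roughly time-sorted, opaque 64-bit integers) implies a generator along these lines. This is a sketch, not Twitter's implementation; the specific widths below are assumptions here, though they match the layout Twitter later open-sourced as Snowflake (millisecond timestamp in the high bits, 10-bit worker ID, 12-bit per-millisecond sequence):

```python
import time

WORKER_BITS = 10
SEQUENCE_BITS = 12

class IdGenerator:
    """Sketch of a roughly time-ordered 64-bit ID generator."""

    def __init__(self, worker_id):
        assert 0 <= worker_id < (1 << WORKER_BITS)
        self.worker_id = worker_id
        self.last_ms = -1
        self.sequence = 0

    def next_id(self, now_ms=None):
        ms = int(time.time() * 1000) if now_ms is None else now_ms
        if ms == self.last_ms:
            # Same millisecond: bump the sequence so IDs stay unique.
            # (A real generator would wait for the next millisecond if
            # the sequence overflowed.)
            self.sequence = (self.sequence + 1) & ((1 << SEQUENCE_BITS) - 1)
        else:
            self.sequence = 0
            self.last_ms = ms
        return ((ms << (WORKER_BITS + SEQUENCE_BITS))
                | (self.worker_id << SEQUENCE_BITS)
                | self.sequence)
```

Two workers generating in the same millisecond produce IDs whose relative order says nothing about which tweet came first, which is exactly why Mark hedges sorting to "within a few ms buckets" and tells clients to treat the integers as opaque.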
