[twitter-dev] Re: Upcoming changes to the way status IDs are sequenced

Naveen Thu, 08 Apr 2010 16:27:32 -0700

This was my initial concern with the randomly generated ids that I
brought up, though I think Brian described it better than I.


It seems very likely that, when using since_id to fetch newer tweets
for the user, some tweets will never be seen: the id of the last
message received (which becomes the next since_id) can be larger than
the id of a tweet generated 1ms later.
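
To make the failure concrete, here is a minimal sketch of the since_id
polling loop run against two successive views of a timeline. Everything
in it is hypothetical: fetch_since stands in for a real timeline
request, and the ids are made up. It only illustrates the ordering
problem and says nothing about how Twitter actually serves requests.

```python
def fetch_since(store, since_id):
    """Hypothetical stand-in for a timeline request: ids greater than since_id."""
    return sorted(i for i in store if i > since_id)


def poll_all(snapshots):
    """Run the since_id loop over successive views of the timeline.

    Each element of `snapshots` is the set of tweet ids visible at the
    moment one request is made.
    """
    latest = 0
    seen = []
    for store in snapshots:
        batch = fetch_since(store, latest)
        if batch:
            seen.extend(batch)
            latest = max(batch)  # becomes the next since_id
    return seen


# Tweet 1005 is visible at the first request. Tweet 1004, created at about
# the same time on another node, only becomes visible afterwards.
snapshots = [{1001, 1005}, {1001, 1004, 1005, 1009}]
seen = poll_all(snapshots)
missed = sorted(snapshots[-1] - set(seen))
print(seen)    # [1001, 1005, 1009]
print(missed)  # [1004]
```

Once 1005 has been used as since_id, no later request can return 1004.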

With the random generation of ids, I can see two ways to guarantee
delivery of all tweets in a user's timeline:
1. Page forwards and backwards to ensure that no tweet generated at or
near the same time as the newest one received a lower id. This
will be very expensive for a mobile client, not to mention that it
complicates any refresh algorithm significantly.
2. Given that we know how IDs are generated (i.e. which bits represent
the time), we can simply over-request by decrementing the time bits of
the since_id by a second or two and filtering out duplicates. (Again,
not really ideal for mobile clients where battery life is an issue,
and it makes the implementation very dependent on Twitter's id format
remaining stable.)
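
Option 2 could be sketched roughly as follows. The bit layout is an
assumption: the thread does not say which bits hold the time, so
TIME_SHIFT = 22 (a millisecond timestamp stored above 22 low bits) is
chosen only for illustration, and make_id, widened_since_id and
merge_new are hypothetical helper names.

```python
TIME_SHIFT = 22  # ASSUMPTION: number of low bits below the timestamp field


def make_id(ms, low, time_shift=TIME_SHIFT):
    """Build an id under the assumed layout (for the example only)."""
    return (ms << time_shift) | low


def widened_since_id(last_id, overlap_ms=2000, time_shift=TIME_SHIFT):
    """Return a since_id whose timestamp is overlap_ms earlier than last_id's."""
    ts = last_id >> time_shift
    return max(ts - overlap_ms, 0) << time_shift


def merge_new(seen_ids, batch_ids):
    """Return the ids in batch_ids not seen before, and record them as seen."""
    fresh = [i for i in batch_ids if i not in seen_ids]
    seen_ids.update(fresh)
    return fresh


last = make_id(10_000, 7)   # newest id received so far
late = make_id(9_999, 3)    # created 1 ms earlier, became visible later
seen = {last}

since = widened_since_id(last)
# Stand-in for the server response: every visible id greater than since.
batch = [i for i in (late, last) if i > since]
fresh = merge_new(seen, batch)
print(fresh == [late])  # True: the late tweet is recovered, the duplicate dropped
```

The cost is the one noted above: every refresh re-downloads a second or
two of tweets, and the client breaks if the id layout changes.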

Could someone please explain whether Brian and I are misinterpreting
this? As we read it, unless we change how we request data from Twitter,
there is a very real possibility of never displaying some tweets in a
timeline (i.e. since_id, as used today, breaks).

--Naveen Ayyagari
@knight9
@SocialScope


On Apr 8, 7:01 pm, "Brian Smith" <[email protected]> wrote:
> What does “within the caveats given above” mean? Either since_id will work or 
> it won’t. It seems to me that if IDs are only in a “rough” order, since_id 
> won’t work—in particular, there is a possibility that paging through tweets 
> using since_id will completely skip over some tweets.
>
> My concern is that, since tweets will not be serialized at the time they are 
> written, there will be a race condition between me making a request and users 
> posting new statuses. That is, I could get a response with the largest id in 
> the response being X that gets evaluated just before a tweet (X-1) has been 
> saved in the database; If so, when I issue a request with since_id=X, my 
> program will never see the newer tweet (X-1).
>
> Are you going to change the implementation of the timeline methods so that 
> they never return a tweet with ID X until all nodes in the cluster guarantee 
> that they won’t create a new tweet with an ID less than X?
>
> I implement the following logic:
>
> 1.      Let LATEST start out as the earliest tweet available in the user’s 
> timeline.
>
> 2.      Make a request with since_id={LATEST}, which returns a set of tweets 
> T.
>
> 3.      If T is empty then stop.
>
> 4.      Let LATEST= max({ id(t), for all t in T}).
>
> 5.      Goto 2.
>
> Will I be guaranteed not to skip over any tweets in the timeline using this 
> logic? If not, what do I need to do to ensure I get them all?
>
> Thanks,
>
> Brian
>
> From: [email protected] 
> [mailto:[email protected]] On Behalf Of Mark McBride
> Sent: Thursday, April 08, 2010 5:10 PM
> To: [email protected]
> Subject: Re: [twitter-dev] Re: Upcoming changes to the way status IDs are 
> sequenced
>
> Thank you for the feedback.  It's great to hear about the variety of use 
> cases people have for the API, and in particular all the different ways 
> people are using IDs. To alleviate some of the concerns raised in this thread 
> we thought it would be useful to give more details about how we plan to 
> generate IDs
>
> 1) IDs are still 64-bit integers.  This should minimize any migration pains.
>
> 2) You can still sort on ID.  Within a few milliseconds you may get 
> out-of-order results, but for most use cases this shouldn't be an issue.
>
> 3) since_id will still work (within the caveats given above).  
>
> 4) We will provide a way to backfill from the streaming API.
>
> 5) You cannot use the generated ID to reverse engineer tweet velocity.  Note 
> that you can still use the streaming API to determine the rate of public 
> statuses.
>
> Additional items of interest
>
> 1) At some point we will likely start using this as an ID for direct messages 
> too
>
> 2) We will almost certainly open source the ID generation code, probably 
> before we actually cut over to using it.
>
> 3) We STRONGLY suggest that you treat IDs as roughly sorted (roughly being 
> within a few ms buckets), opaque 64-bit integers.  We may need to change the 
> scheme again at some point in the future, and want to minimize migration 
> pains should we need to do this.
>
> Hopefully this puts you more at ease with the changes we're making.  If it 
> raises new concerns, please let us know!
>
>   ---Mark
>
>  http://twitter.com/mccv
>
> On Mon, Apr 5, 2010 at 4:18 PM, M. Edward (Ed) Borasky <[email protected]> 
> wrote:
>
> On 04/05/2010 12:55 AM, Tim Haines wrote:
>
> > This made me laugh.  Hard.
>
> > On Fri, Apr 2, 2010 at 6:47 AM, Dewald Pretorius <[email protected]> wrote:
>
> >> Mark,
>
> >> It's extremely important where you have two bots that reply to each
> >> others' tweets. With incorrectly sorted tweets, you get conversations
> >> that look completely unnatural.
>
> >> On Apr 1, 1:39 pm, Mark McBride <[email protected]> wrote:
> >>> Just out of curiosity, what applications are you building that require
> >>> sub-second sorting resolution for tweets?
>
> Yeah - my bot laughed too ;-)
>
> --
> M. Edward (Ed) Borasky
> borasky-research.net/m-edward-ed-borasky
>
> "A mathematician is a device for turning coffee into theorems." ~ Paul Erdős
>
> --
>
> To unsubscribe, reply using "remove me" as the subject.
