It's a possibility, but by no means a probability. Note that you can mitigate this by using the newest tweet that is outside your "danger zone". For example, in a sequence of tweets t1, t2 ... ti ... tn with creation times c1, c2 ... ci ... cn (listed newest first) and a comfort threshold e, you could take your since_id from the latest ti such that c1 - ci > e.
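Mark's mitigation can be sketched in a few lines. The representation here (a list of `(id, created_at)` pairs sorted newest first, matching the c1 ... cn ordering of a timeline response) and the two-second default threshold are illustrative assumptions, not part of the API:

```python
from datetime import datetime, timedelta

def safe_since_id(tweets, threshold=timedelta(seconds=2)):
    """Pick a since_id outside the 'danger zone'.

    `tweets` is assumed to be a list of (id, created_at) pairs sorted
    newest first. Returns the ID of the newest tweet whose creation time
    is more than `threshold` older than the newest tweet's creation time
    (c1 - ci > e), or None if every tweet is inside the danger zone.
    """
    if not tweets:
        return None
    newest_time = tweets[0][1]                    # c1: creation time of the newest tweet
    for tweet_id, created_at in tweets:           # scan from newest to oldest
        if newest_time - created_at > threshold:  # c1 - ci > e
            return tweet_id
    return None
```

On the next refresh you would pass this value as since_id; tweets inside the danger zone are simply fetched again and de-duplicated client-side.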
---Mark
http://twitter.com/mccv

On Thu, Apr 8, 2010 at 4:27 PM, Naveen <[email protected]> wrote:
> This was my initial concern with the randomly generated IDs that I
> brought up, though I think Brian described it better than I did.
>
> It simply seems very likely that, when using since_id to populate newer
> tweets for the user, some tweets will never be seen, because the
> since_id of the last message received will be larger than an ID
> generated 1 ms later.
>
> With the random generation of IDs, I can see two ways to guarantee
> delivery of all tweets in a user's timeline:
>
> 1. Page forwards and backwards to ensure that no tweet generated at or
> near the same time as the newest one received a lower ID. This will be
> very expensive for a mobile client, not to mention that it complicates
> any refresh algorithm significantly.
>
> 2. Given that we know how IDs are generated (i.e., which bits represent
> the time), we can simply over-request by decrementing the since_id time
> bits by a second or two and filtering out duplicates. (Again, not ideal
> for mobile clients where battery life is an issue; it also makes the
> implementation very dependent on Twitter's ID format remaining stable.)
>
> Please, anyone, explain if Brian and I are misinterpreting this as a
> very real possibility of never displaying some tweets in a timeline
> without changing how we request data from Twitter (i.e., that since_id
> doesn't break).
>
> --Naveen Ayyagari
> @knight9
> @SocialScope
>
> On Apr 8, 7:01 pm, "Brian Smith" <[email protected]> wrote:
> > What does "within the caveats given above" mean? Either since_id will
> > work or it won't. It seems to me that if IDs are only in a "rough"
> > order, since_id won't work; in particular, there is a possibility that
> > paging through tweets using since_id will completely skip over some
> > tweets.
> > My concern is that, since tweets will not be serialized at the time
> > they are written, there will be a race condition between me making a
> > request and users posting new statuses. That is, I could get a response
> > whose largest ID is X, evaluated just before a tweet with ID X-1 has
> > been saved to the database. If so, when I issue a request with
> > since_id=X, my program will never see that newer tweet (X-1).
> >
> > Are you going to change the implementation of the timeline methods so
> > that they never return a tweet with ID X until all nodes in the cluster
> > guarantee that they won't create a new tweet with an ID less than X?
> >
> > Suppose I implement the following logic:
> >
> > 1. Let LATEST start out as the ID of the earliest tweet available in
> > the user's timeline.
> > 2. Make a request with since_id={LATEST}, which returns a set of
> > tweets T.
> > 3. If T is empty, then stop.
> > 4. Let LATEST = max({ id(t) for all t in T }).
> > 5. Go to 2.
> >
> > Will I be guaranteed not to skip over any tweets in the timeline using
> > this logic? If not, what do I need to do to ensure I get them all?
> >
> > Thanks,
> > Brian
> >
> > From: [email protected]
> > [mailto:[email protected]] On Behalf Of Mark McBride
> > Sent: Thursday, April 08, 2010 5:10 PM
> > To: [email protected]
> > Subject: Re: [twitter-dev] Re: Upcoming changes to the way status IDs
> > are sequenced
> >
> > Thank you for the feedback. It's great to hear about the variety of
> > use cases people have for the API, and in particular all the different
> > ways people are using IDs. To alleviate some of the concerns raised in
> > this thread, we thought it would be useful to give more details about
> > how we plan to generate IDs.
> >
> > 1) IDs are still 64-bit integers. This should minimize any migration
> > pains.
> >
> > 2) You can still sort on ID. Within a few milliseconds you may get
> > out-of-order results, but for most use cases this shouldn't be an
> > issue.
> > 3) since_id will still work (within the caveats given above).
> >
> > 4) We will provide a way to backfill from the streaming API.
> >
> > 5) You cannot use the generated ID to reverse-engineer tweet velocity.
> > Note that you can still use the streaming API to determine the rate of
> > public statuses.
> >
> > Additional items of interest:
> >
> > 1) At some point we will likely start using this as an ID for direct
> > messages too.
> >
> > 2) We will almost certainly open-source the ID generation code,
> > probably before we actually cut over to using it.
> >
> > 3) We STRONGLY suggest that you treat IDs as roughly sorted (roughly
> > being within a few ms buckets), opaque 64-bit integers. We may need to
> > change the scheme again at some point in the future, and we want to
> > minimize migration pains should we need to do so.
> >
> > Hopefully this puts you more at ease with the changes we're making.
> > If it raises new concerns, please let us know!
> >
> > ---Mark
> > http://twitter.com/mccv
> >
> > On Mon, Apr 5, 2010 at 4:18 PM, M. Edward (Ed) Borasky
> > <[email protected]> wrote:
> > > On 04/05/2010 12:55 AM, Tim Haines wrote:
> > > > This made me laugh. Hard.
> > > >
> > > > On Fri, Apr 2, 2010 at 6:47 AM, Dewald Pretorius
> > > > <[email protected]> wrote:
> > > >> Mark,
> > > >>
> > > >> It's extremely important where you have two bots that reply to
> > > >> each other's tweets. With incorrectly sorted tweets, you get
> > > >> conversations that look completely unnatural.
> > > >>
> > > >> On Apr 1, 1:39 pm, Mark McBride <[email protected]> wrote:
> > > >>> Just out of curiosity, what applications are you building that
> > > >>> require sub-second sorting resolution for tweets?
> > >
> > > Yeah - my bot laughed too ;-)
> > >
> > > --
> > > M. Edward (Ed) Borasky
> > > borasky-research.net/m-edward-ed-borasky
> > >
> > > "A mathematician is a device for turning coffee into theorems."
> > > ~ Paul Erdős
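For reference, Naveen's second workaround (widening the since_id window and de-duplicating client-side) can be sketched as below. The bit layout is purely an assumption for illustration; Twitter had not published the real layout in this thread, and as Naveen notes, depending on it is fragile precisely because the format may change:

```python
# ASSUMED layout for illustration only: a millisecond timestamp in the
# high bits above bit 22. Do not rely on this in a real client.
TIMESTAMP_SHIFT = 22
MS_PER_SECOND = 1000

def widened_since_id(since_id, backoff_seconds=2):
    """Naveen's option 2: over-request by rolling the assumed timestamp
    bits of since_id back by a couple of seconds."""
    backoff = backoff_seconds * MS_PER_SECOND << TIMESTAMP_SHIFT
    return max(since_id - backoff, 0)

def merge_new_tweets(seen_ids, fetched):
    """Drop tweets already displayed; `fetched` is a list of (id, text)
    pairs returned by a request made with the widened since_id."""
    fresh = [(i, text) for i, text in fetched if i not in seen_ids]
    seen_ids.update(i for i, _ in fresh)
    return fresh
```

The cost Naveen mentions is visible here: every refresh re-downloads up to two seconds of already-seen tweets just to throw most of them away.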
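Brian's five-step polling loop above can be sketched as follows; `fetch_page` is a hypothetical stand-in for a timeline request and is assumed to return a list of tweet dicts with an `id` key (empty when nothing newer exists):

```python
def poll_timeline(fetch_page, initial_latest):
    """Brian's loop: poll with since_id until no newer tweets arrive.

    `initial_latest` is the ID of the earliest tweet available (step 1).
    """
    latest = initial_latest
    collected = []
    while True:
        page = fetch_page(since_id=latest)   # step 2: request newer tweets
        if not page:                         # step 3: empty result, stop
            return collected
        collected.extend(page)
        latest = max(t["id"] for t in page)  # step 4: advance the cursor
        # step 5: loop back to step 2
```

As Brian argues, this loop only sees every tweet if no tweet with an ID below `latest` is committed after the response containing `latest` was assembled; that is exactly the race the roughly-sorted scheme reintroduces at millisecond scale, and why Mark's danger-zone backoff matters.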
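Mark's description (roughly time-sorted, opaque 64-bit integers) implies a generator along these lines. This is a sketch, not Twitter's implementation; the specific widths below are assumptions here, though they match the layout Twitter later open-sourced as Snowflake (millisecond timestamp in the high bits, 10-bit worker ID, 12-bit per-millisecond sequence):

```python
import time

WORKER_BITS = 10
SEQUENCE_BITS = 12

class IdGenerator:
    """Sketch of a roughly time-ordered 64-bit ID generator."""

    def __init__(self, worker_id):
        assert 0 <= worker_id < (1 << WORKER_BITS)
        self.worker_id = worker_id
        self.last_ms = -1
        self.sequence = 0

    def next_id(self, now_ms=None):
        ms = int(time.time() * 1000) if now_ms is None else now_ms
        if ms == self.last_ms:
            # Same millisecond: bump the sequence so IDs stay unique.
            # (A real generator would wait for the next millisecond if
            # the sequence overflowed.)
            self.sequence = (self.sequence + 1) & ((1 << SEQUENCE_BITS) - 1)
        else:
            self.sequence = 0
            self.last_ms = ms
        return ((ms << (WORKER_BITS + SEQUENCE_BITS))
                | (self.worker_id << SEQUENCE_BITS)
                | self.sequence)
```

Two workers generating in the same millisecond produce IDs whose relative order says nothing about which tweet came first, which is exactly why Mark hedges sorting to "within a few ms buckets" and tells clients to treat the integers as opaque.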
