Ahh, yes, your workaround is a little better than mine, but it is still a workaround, and it requires changes to how since_id is currently used by what I assume are most applications. I understand the need for the change and am willing to work around it; I can imagine the scalability issues of trying to use a synchronized ID for all tweets.
However, I wanted to be clear, and feel it should be made obvious, that with this change there is a possibility that a tweet may never be delivered to the client if the way since_id is currently used is not updated to cover this case. I still see the situation as more likely than you seem to believe, and expect that as tweet velocity increases, the likelihood will also increase; but I assume you have better data to support your viewpoint than I do, and shall defer.

--Naveen Ayyagari
@knight9
@SocialScope

On Apr 8, 7:37 pm, Mark McBride <[email protected]> wrote:
> It's a possibility, but by no means a probability. Note that you can
> mitigate this by using the newest tweet that is outside your "danger zone".
> For example, in a sequence of tweets t1, t2 ... ti ... tn with creation
> times c1, c2 ... ci ... cn and a comfort threshold e, you could use
> since_id from the latest ti such that c1 - ci > e.
>
> ---Mark
> http://twitter.com/mccv
>
> On Thu, Apr 8, 2010 at 4:27 PM, Naveen <[email protected]> wrote:
> > This was my initial concern with the randomly generated IDs that I
> > brought up, though I think Brian described it better than I did.
> >
> > It simply seems very likely that when using since_id to fetch newer
> > tweets for the user, some tweets will never be seen, because the
> > since_id of the last message received will be larger than that of one
> > generated 1 ms later.
> >
> > With the random generation of IDs, I can see two ways to guarantee
> > delivery of all tweets in a user's timeline:
> >
> > 1. Page forwards and backwards to ensure that no tweet generated at
> > or near the same time as the newest one received a lower ID. This
> > will be very expensive for a mobile client, not to mention that it
> > complicates any refresh algorithm significantly.
> >
> > 2. Given that we know how IDs are generated (i.e. which bits
> > represent the time), we can simply over-request by decrementing the
> > since_id time bits by a second or two and filter out duplicates.
> > (Again, this is not really ideal for mobile clients where battery
> > life is an issue, plus it makes the implementation very dependent on
> > Twitter's ID format remaining stable.)
> >
> > Could anyone please explain whether Brian and I are misinterpreting
> > this as a very real possibility of never displaying some tweets in a
> > timeline without changing how we request data from Twitter (i.e. so
> > that since_id doesn't break)?
> >
> > --Naveen Ayyagari
> > @knight9
> > @SocialScope
> >
> > On Apr 8, 7:01 pm, "Brian Smith" <[email protected]> wrote:
> > > What does "within the caveats given above" mean? Either since_id
> > > will work or it won't. It seems to me that if IDs are only in a
> > > "rough" order, since_id won't work; in particular, there is a
> > > possibility that paging through tweets using since_id will
> > > completely skip over some tweets.
> > >
> > > My concern is that, since tweets will not be serialized at the
> > > time they are written, there will be a race condition between me
> > > making a request and users posting new statuses. That is, I could
> > > get a response whose largest ID is X, evaluated just before a tweet
> > > with a smaller ID (say X-1) has been saved in the database; if so,
> > > when I issue a request with since_id=X, my program will never see
> > > that newer tweet (X-1).
> > >
> > > Are you going to change the implementation of the timeline methods
> > > so that they never return a tweet with ID X until all nodes in the
> > > cluster guarantee that they won't create a new tweet with an ID
> > > less than X?
> > >
> > > I implement the following logic:
> > >
> > > 1. Let LATEST start out as the ID of the earliest tweet available
> > >    in the user's timeline.
> > > 2. Make a request with since_id={LATEST}, which returns a set of
> > >    tweets T.
> > > 3. If T is empty, then stop.
> > > 4. Let LATEST = max({ id(t) for all t in T }).
> > > 5. Goto 2.
> > >
> > > Will I be guaranteed not to skip over any tweets in the timeline
> > > using this logic? If not, what do I need to do to ensure I get them
> > > all?
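Brian's five-step loop can be sketched in a few lines of Python. This is a minimal illustration, not real client code: fetch_timeline stands in for a call such as "GET statuses/home_timeline?since_id=...", and the in-memory fake below exists only to exercise the loop.

```python
# Sketch of the five-step since_id pagination loop described above.
# fetch_timeline is a hypothetical stand-in for the REST timeline call;
# it takes since_id and returns a list of {"id": ...} dicts.

def drain_timeline(fetch_timeline, earliest_id):
    """Page forward with since_id until no newer tweets are returned.

    If IDs are strictly increasing, this sees every tweet exactly once.
    If IDs are only roughly sorted, a tweet committed late with an ID
    below `latest` is silently skipped -- the race this thread is about.
    """
    latest = earliest_id                          # step 1
    seen = []
    while True:
        batch = fetch_timeline(since_id=latest)   # step 2
        if not batch:                             # step 3
            return seen
        latest = max(t["id"] for t in batch)      # step 4
        seen.extend(batch)                        # step 5: goto 2

# Simulated check against an in-memory "timeline" of IDs 1..7,
# returned three at a time:
tweets = [{"id": i} for i in range(1, 8)]
def fake_fetch(since_id):
    return [t for t in tweets if t["id"] > since_id][:3]

assert [t["id"] for t in drain_timeline(fake_fetch, 0)] == list(range(1, 8))
```

With strictly increasing IDs the assertion holds; the skip Brian worries about appears only if an ID smaller than `latest` is committed after a batch has been returned.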
> > > Thanks,
> > > Brian
> > >
> > > From: [email protected]
> > > [mailto:[email protected]] On Behalf Of Mark McBride
> > > Sent: Thursday, April 08, 2010 5:10 PM
> > > To: [email protected]
> > > Subject: Re: [twitter-dev] Re: Upcoming changes to the way status
> > > IDs are sequenced
> > >
> > > Thank you for the feedback. It's great to hear about the variety
> > > of use cases people have for the API, and in particular all the
> > > different ways people are using IDs. To alleviate some of the
> > > concerns raised in this thread, we thought it would be useful to
> > > give more details about how we plan to generate IDs:
> > >
> > > 1) IDs are still 64-bit integers. This should minimize any
> > >    migration pains.
> > > 2) You can still sort on ID. Within a few milliseconds you may get
> > >    out-of-order results, but for most use cases this shouldn't be
> > >    an issue.
> > > 3) since_id will still work (within the caveats given above).
> > > 4) We will provide a way to backfill from the streaming API.
> > > 5) You cannot use the generated ID to reverse engineer tweet
> > >    velocity. Note that you can still use the streaming API to
> > >    determine the rate of public statuses.
> > >
> > > Additional items of interest:
> > >
> > > 1) At some point we will likely start using this as an ID for
> > >    direct messages too.
> > > 2) We will almost certainly open source the ID generation code,
> > >    probably before we actually cut over to using it.
> > > 3) We STRONGLY suggest that you treat IDs as roughly sorted
> > >    (roughly being within a few-millisecond buckets), opaque 64-bit
> > >    integers. We may need to change the scheme again at some point
> > >    in the future, and want to minimize migration pains should we
> > >    need to do this.
> > >
> > > Hopefully this puts you more at ease with the changes we're
> > > making. If it raises new concerns, please let us know!
> > >
> > > ---Mark
> > > http://twitter.com/mccv
> > >
> > > On Mon, Apr 5, 2010 at 4:18 PM, M.
Edward (Ed) Borasky <[email protected]> wrote:
> > > > On 04/05/2010 12:55 AM, Tim Haines wrote:
> > > > > This made me laugh. Hard.
> > > > >
> > > > > On Fri, Apr 2, 2010 at 6:47 AM, Dewald Pretorius
> > > > > <[email protected]> wrote:
> > > > >> Mark,
> > > > >>
> > > > >> It's extremely important where you have two bots that reply
> > > > >> to each other's tweets. With incorrectly sorted tweets, you
> > > > >> get conversations that look completely unnatural.
> > > > >>
> > > > >> On Apr 1, 1:39 pm, Mark McBride <[email protected]> wrote:
> > > > >>> Just out of curiosity, what applications are you building
> > > > >>> that require sub-second sorting resolution for tweets?
> > > >
> > > > Yeah - my bot laughed too ;-)
> > > >
> > > > --
> > > > M. Edward (Ed) Borasky
> > > > borasky-research.net/m-edward-ed-borasky
> > > >
> > > > "A mathematician is a device for turning coffee into theorems."
> > > > ~ Paul Erdős
> >
> > --
> > To unsubscribe, reply using "remove me" as the subject.
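Mark's "comfort threshold" mitigation quoted earlier in the thread can be sketched as follows. This is an illustrative sketch only: Tweet and safe_since_id are made-up names, not part of the Twitter API, and the caller is still responsible for filtering out the duplicates that re-fetching the window produces.

```python
# Sketch of the "danger zone" mitigation: instead of paging from the
# newest ID seen, page from the newest tweet whose creation time is
# safely older than the newest tweet, so any late-committed IDs inside
# that window get re-fetched (and must then be deduplicated).

from dataclasses import dataclass

@dataclass
class Tweet:
    id: int
    created_at: float   # seconds since the epoch

def safe_since_id(timeline, e):
    """Given tweets newest-first (times c1 >= c2 >= ... >= cn) and a
    comfort threshold e, return the ID of the latest tweet t_i such
    that c1 - c_i > e; fall back to the oldest tweet's ID if every
    tweet is still inside the window."""
    c1 = timeline[0].created_at
    for t in timeline:
        if c1 - t.created_at > e:
            return t.id
    return timeline[-1].id

# Newest-first timeline; with e = 2.0 seconds, the tweet created at
# t = 97.0 is the latest one outside the danger zone, so its ID is
# the safe since_id.
timeline = [Tweet(105, 100.0), Tweet(104, 99.9), Tweet(100, 97.0)]
assert safe_since_id(timeline, 2.0) == 100
```

The cost Naveen points out applies here too: each refresh re-downloads up to e seconds of already-seen tweets, which trades bandwidth and battery for the guarantee of not skipping any.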
