What does “within the caveats given above” mean? Either since_id will work or
it won’t. It seems to me that if IDs are only in a “rough” order, since_id
won’t work—in particular, there is a possibility that paging through tweets
using since_id will completely skip over some tweets.
My concern is that, since tweets will not be serialized at the time they are
written, there will be a race condition between me making a request and users
posting new statuses. That is, I could get a response with the largest id in
the response being X that gets evaluated just before a tweet (X-1) has been
saved in the database. If so, when I issue a request with since_id=X, my
program will never see the newer tweet (X-1).
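To make the scenario concrete, here is a minimal Python sketch of the race I am worried about. The in-memory `store` and the `fetch_since` helper are hypothetical stand-ins I made up for illustration; they are not the Twitter API, and the specific ID values are arbitrary.

```python
# Hypothetical in-memory stand-in for the timeline; NOT the Twitter API.
def fetch_since(store, since_id):
    """Return the stored IDs strictly greater than since_id, ascending."""
    return sorted(i for i in store if i > since_id)

store = {100, 101}   # tweets already saved
X = 103              # an ID handed out slightly "early" (arbitrary value)

store.add(X)                          # tweet X becomes visible first
seen = fetch_since(store, 101)        # my request is evaluated now
latest = max(seen)                    # largest ID in the response is X

store.add(X - 1)                      # tweet X-1 is saved after my request
missed = fetch_since(store, latest)   # next request uses since_id=X

print(seen, missed)                   # X-1 appears in neither response
```

In this sketch the tweet with ID X-1 is in the store but is never returned to the client, because every later request asks only for IDs greater than X.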
Are you going to change the implementation of the timeline methods so that they
never return a tweet with ID X until all nodes in the cluster guarantee that
they won’t create a new tweet with an ID less than X?
I implement the following logic:
1. Let LATEST start out as the earliest tweet available in the user’s
timeline.
2. Make a request with since_id={LATEST}, which returns a set of tweets T.
3. If T is empty then stop.
4. Let LATEST = max({ id(t) for all t in T }).
5. Goto 2.
Will I be guaranteed not to skip over any tweets in the timeline using this
logic? If not, what do I need to do to ensure I get them all?
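For reference, the steps above can be sketched in Python roughly as follows. The `fetch` callable is a hypothetical placeholder for whatever timeline request is used; I am assuming it takes a since_id and returns a list of tweet IDs greater than it. Including the starting tweet in the result is my own choice for the sketch.

```python
from typing import Callable, List

def page_all(fetch: Callable[[int], List[int]], earliest: int) -> List[int]:
    """Page forward through a timeline using only since_id.

    `fetch(since_id)` is a hypothetical stand-in for a timeline request;
    it is assumed to return the IDs of tweets newer than since_id.
    """
    latest = earliest            # step 1
    collected = [earliest]       # assumption: keep the starting tweet too
    while True:
        batch = fetch(latest)    # step 2: request with since_id=LATEST
        if not batch:            # step 3: empty response, so stop
            break
        collected.extend(batch)
        latest = max(batch)      # step 4: advance LATEST
        # step 5: loop back to step 2
    return collected

# Usage with a fake timeline that returns at most two IDs per request.
timeline = [1, 2, 3, 5, 8, 13]
fake_fetch = lambda since_id: [i for i in timeline if i > since_id][:2]
result = page_all(fake_fetch, earliest=1)
```

With a fully ordered fake timeline like this one, nothing is skipped. My question is whether that still holds when a tweet with a smaller ID can be saved after a request has already returned a larger one.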
Thanks,
Brian
From: [email protected]
[mailto:[email protected]] On Behalf Of Mark McBride
Sent: Thursday, April 08, 2010 5:10 PM
To: [email protected]
Subject: Re: [twitter-dev] Re: Upcoming changes to the way status IDs are
sequenced
Thank you for the feedback. It's great to hear about the variety of use cases
people have for the API, and in particular all the different ways people are
using IDs. To alleviate some of the concerns raised in this thread we thought
it would be useful to give more details about how we plan to generate IDs:
1) IDs are still 64-bit integers. This should minimize any migration pains.
2) You can still sort on ID. Within a few milliseconds you may get out of order
results, but for most use cases this shouldn't be an issue.
3) since_id will still work (within the caveats given above).
4) We will provide a way to backfill from the streaming API.
5) You cannot use the generated ID to reverse engineer tweet velocity. Note
that you can still use the streaming API to determine the rate of public
statuses.
Additional items of interest:
1) At some point we will likely start using this as an ID for direct messages
too.
2) We will almost certainly open source the ID generation code, probably before
we actually cut over to using it.
3) We STRONGLY suggest that you treat IDs as roughly sorted (roughly being
within a few ms buckets), opaque 64-bit integers. We may need to change the
scheme again at some point in the future, and want to minimize migration pains
should we need to do this.
Hopefully this puts you more at ease with the changes we're making. If it
raises new concerns, please let us know!
---Mark
http://twitter.com/mccv
On Mon, Apr 5, 2010 at 4:18 PM, M. Edward (Ed) Borasky <[email protected]>
wrote:
On 04/05/2010 12:55 AM, Tim Haines wrote:
> This made me laugh. Hard.
>
> On Fri, Apr 2, 2010 at 6:47 AM, Dewald Pretorius <[email protected]> wrote:
>
>> Mark,
>>
>> It's extremely important where you have two bots that reply to each
>> others' tweets. With incorrectly sorted tweets, you get conversations
>> that look completely unnatural.
>>
>> On Apr 1, 1:39 pm, Mark McBride <[email protected]> wrote:
>>> Just out of curiosity, what applications are you building that require
>>> sub-second sorting resolution for tweets?
Yeah - my bot laughed too ;-)
--
M. Edward (Ed) Borasky
borasky-research.net/m-edward-ed-borasky
"A mathematician is a device for turning coffee into theorems." ~ Paul Erdős