Not complex, just not obvious. When things are done in an unconventional
way, they unfortunately need more explaining.
As mentioned before, the only difference between what you're doing now and
what I'm asking for is the order of the results. You return the top (newest)
page, and sometimes you need the bottom (oldest). Is that really hard to do
in a scalable way?

The disadvantage of not providing this is that you now have to buffer
possibly 3200 messages just to make sure things are correct. Also, there is
now a potentially large latency (16 calls) before processing can begin.

None of this is a huge deal. It's cool you guys provide an API. If it
can't be changed, it could at least be addressed in the docs.

I'm not whining, I'm just sayin...

Zero


On Fri, Mar 12, 2010 at 1:24 PM, Mark McBride <mmcbr...@twitter.com> wrote:

> Am I missing something regarding the complexity of doing this?
>
> Ruby pseudo-code:
>
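> # get_tweets is a placeholder: fetch the URL and parse the JSON array
> my_unread_tweets = []
> page = 1
> count = 200
> since_id = 123098485120985
>
> loop do
>   url = "http://api.twitter.com/1/statuses/home_timeline.json" \
>         "?page=#{page}&count=#{count}&since_id=#{since_id}"
>   page_of_tweets = get_tweets(url)
>   break if page_of_tweets.empty?   # an empty page means we're done
>   my_unread_tweets.concat(page_of_tweets)
>   page += 1
> end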
>
> I agree it's more complex than
> get_all_my_tweets_disregarding_the_size_of_the_actual_list_since(since_id),
> but implementing such a method in a scalable way is pretty rough.
>
>   ---Mark
>
> http://twitter.com/mccv
>
>
>
> On Fri, Mar 12, 2010 at 11:11 AM, Zero Hero <zeroh...@qoobly.com> wrote:
>
>> Brian,
>>
>> Thanks for your reply. I suspected that "freshness" was the reason this
>> was done, along with the fact that Twitter started as a service for humans
>> and is now being used programmatically.
>>
>> However, from an API standpoint this makes no sense. It's typical to want
>> to crawl forward through a stream without missing anything. The current API
>> makes that unreliable and forces a baroque implementation. For those
>> thinking of Twitter as a messaging API, it seems incredibly unnatural not to
>> be able to easily and reliably process things in chronological order
>> without worrying about the message rate being slightly too high. This shows
>> up as "messages dropping" once you have more than 200 in a sampling period.
>> True, you're not dropping messages, but that's the way it'll be perceived.
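To make the "dropped messages" effect concrete, here is a minimal Ruby sketch using the numbers from Zero's example in this thread. `fetch_newest_first` is a hypothetical in-memory stand-in for the home_timeline call, not Twitter's code, and it assumes since_id is exclusive.

```ruby
# Hypothetical in-memory stand-in for a newest-first timeline call:
# returns at most `count` ids greater than since_id, newest first.
def fetch_newest_first(timeline, since_id:, count: 200)
  timeline.select { |id| id > since_id }.sort.reverse.first(count)
end

# Zero's example: the client last saw id 1000, then 2000 tweets
# (ids 1001..3000) arrive before the next poll.
timeline = (1..3000).to_a
page = fetch_newest_first(timeline, since_id: 1000, count: 200)

# A naive poller stores the newest id as its next since_id ...
next_since_id = page.max
# ... and never fetches anything between the old since_id and this page.
missed = timeline.count { |id| id > 1000 && id < page.min }

puts page.size     # 200
puts next_since_id # 3000
puts missed        # 1800
```

Nothing is lost on the server side; the 1800 older tweets are simply never requested.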
>>
>> The fact that the ids are non-sequential (for a stream) means that you have
>> to bend over backward to do this simple thing. Note that the algorithm you
>> give actually has to be altered. Since the ids are non-sequential, we can't
>> compute the next max_id as N-200; we have to look back through the entire
>> previous page of 200 and find the oldest message in it (its id won't be
>> N-200). So we start out with the largest range and then revise it as we
>> discover each new low-water mark. This fact is hidden by the "simpler
>> numbers" I chose to use.
>>
>> Also note that 3200 >> 200, so I potentially have to do this backtracking
>> 16 times (3200 / 200) to get all my (undropped) messages.
>>
>> Anyone with a decent programming background will think this is lame. People
>> with less background will simply be confused (I've seen a fair number of
>> "Twitter drops my tweets" bug reports that could be due to this simple
>> misunderstanding). Also, if I wrote out the full algorithm for reliable
>> forward iteration, I'd bet most people would do a double take.
>>
>> Although I don't know the Twitter code, this is really just determined by
>> the sort order of your result set (whether you return the most recent or
>> the least recent results first). It would be easy enough to add another
>> switch that returns the least recent first, with most recent as the
>> default. That gives you the result you want (people automatically get the
>> most recent), but allows anyone who needs it (most programmers) to scan
>> forward easily.
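Zero's proposed switch can be sketched the same way. No such oldest-first option exists in the API discussed here, so `fetch_oldest_first` and `crawl_forward` are purely hypothetical names; the sketch only illustrates that each page could be handled as soon as it arrives while since_id simply advances, so at most one page (count ids) has to be held at a time.

```ruby
# Hypothetical oldest-first variant of the timeline call (no such switch
# exists in the API discussed here): returns the `count` OLDEST ids above
# since_id, in ascending order.
def fetch_oldest_first(timeline, since_id:, count: 200)
  timeline.select { |id| id > since_id }.sort.first(count)
end

# Forward crawl: each page can be processed as soon as it arrives, and
# since_id advances to the newest id seen so far.
def crawl_forward(timeline, since_id, count: 200)
  processed = []      # collected only so the result can be checked
  largest_batch = 0   # the most ids ever held at once
  loop do
    page = fetch_oldest_first(timeline, since_id: since_id, count: count)
    break if page.empty?
    processed.concat(page)   # stand-in for real per-message processing
    largest_batch = [largest_batch, page.size].max
    since_id = page.last     # page is ascending, so the last id is newest
  end
  [processed, largest_batch]
end

processed, largest_batch = crawl_forward((1..3000).to_a, 1000)
puts processed.first  # 1001
puts processed.last   # 3000
puts largest_batch    # 200
```

A real client would hand each page off instead of accumulating `processed`; the comparison with newest-first paging is that nothing has to wait for the final call.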
>>
>> Respectfully,
>>
>> Zero.
>>
>>
On Fri, Mar 12, 2010 at 8:47 AM, Brian Smith <br...@briansmith.org> wrote:
>>
>>> Zero wrote:
>>>
>>>> 1. Assume we are at since_id = 1000.  This was the last (highest)
>>>> message id we had previously seen, which we have saved.
>>>> 2. There is a sudden spike and 2000 tweets come in.
>>>> 3. We now try to query with since_id=1000, count=200 (the max).
>>>>     Unfortunately, we have missed 1800 tweets, because we only get the
>>>>     most recent 200 tweets.
>>>>
>>>>
>>> In step 3, you will get the 200 newest statuses, statuses 2801-3000. If
>>> you want 200 most recent statuses that are older than the ones you just got
>>> (that is, you want statuses 2601-2800), then you can query using
>>> max_id=2800, count=200, since_id=1000. You can keep doing this until Twitter
>>> returns zero tweets (which means it is refusing to give you any older
>>> tweets) or until Twitter returns the tweet with id=1000.
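A minimal Ruby sketch of the max_id paging Brian describes, run against a hypothetical in-memory timeline. `fetch_page` and `catch_up` are illustrative names, not Twitter's code. The sketch assumes since_id is exclusive and max_id inclusive, so it only uses Brian's first stopping rule (an empty page). Because each new max_id is derived from the oldest id actually received, it also works when ids are not consecutive.

```ruby
# Hypothetical in-memory stand-in for one timeline call. Assumes since_id
# is exclusive and max_id is inclusive; results come back newest first.
def fetch_page(timeline, since_id:, max_id: nil, count: 200)
  timeline.select { |id| id > since_id && (max_id.nil? || id <= max_id) }
          .sort.reverse.first(count)
end

# Walks backward from the newest tweet toward since_id using max_id, then
# returns everything oldest-first along with the number of calls made.
def catch_up(timeline, since_id, count: 200)
  buffer = []
  max_id = nil
  calls = 0
  loop do
    page = fetch_page(timeline, since_id: since_id, max_id: max_id, count: count)
    calls += 1
    break if page.empty?   # nothing older is left (or the server stopped)
    buffer.concat(page)
    # Ids need not be consecutive, so derive the next max_id from the
    # oldest id actually received instead of computing "newest - 200".
    max_id = page.min - 1
  end
  [buffer.reverse, calls]
end

tweets, calls = catch_up((1..3000).to_a, 1000)
puts tweets.size   # 2000
puts tweets.first  # 1001
puts calls         # 11
```

The first call returns 2801-3000 and the second uses max_id=2800, matching the example above. Nothing can be processed oldest-first until the final (empty) call returns, which is the buffering and latency cost raised elsewhere in the thread.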
>>>
>>> (Note: You might be tempted to set since_id=1001 in order to avoid
>>> downloading the tweet with id 1000 twice; however, doing so will just cause
>>> problems and complications, and I don't recommend it.)
>>>
>>> Twitter is designed to be about what is happening "right now," and not so
>>> much to be about everything that happened between the last time you checked
>>> (could be weeks ago) and right now. That's why there's no API call to get
>>> new tweets oldest-first, and that's why you can't even get access to tweets
>>> older than the most recent ~3000 or so.
>>>
>>> Although there are Twitter users that really want to read every tweet in
>>> their timelines, Twitter's design--especially the website UI--doesn't
>>> facilitate that behavior. If you are developing an end-user client, be aware
>>> that the user probably doesn't want to read every tweet and almost
>>> definitely doesn't want to wait for dozens of API calls to complete before
>>> they see the refreshed timeline. I recommend optimizing apps for showing
>>> what's happening right now, whenever it is practical to do so. When I first
>>> started using Twitter I treated it more like a self-organizing forum for
>>> having conversations with people (so reading every tweet would be
>>> important), but I gave up as Twitter simply doesn't work well for that now.
>>>
>>> Regards,
>>> Brian
>>>
>>
>>
>
