[twitter-dev] Re: Search API: since_id is now unreliable

Doug Williams Tue, 21 Jul 2009 13:46:04 -0700

Chad, your assessment is spot on.

At the heart of search there are a number of data stores that accept queries
(reads) while at the same time performing writes from an indexer. Heavy load --
large numbers of queries, large numbers of writes, or both -- can cause the
write replication between the indexer and the various data stores to grow
inconsistent when a particular data store is blocked on a read.


Unfortunately, there is no easy fix for this problem at the moment. The
search team has grown considerably in the last couple of weeks, so as they
get up to speed, the feature set and stability of search should continue to
improve.

Thanks,
Doug



On Tue, Jul 21, 2009 at 11:57 AM, Chad Etzel <jazzyc...@gmail.com> wrote:

>
> Hi API Team,
>
> A few of us have been discussing off list a funky behavior we have
> been noticing and now users are starting to notice.
>
> There is a problem for sites/apps like TweetGrid and TweetChat which
> auto-refresh tweets based on the Search API using the since_id. People
> are noticing that these sites are "missing tweets" when compared to
> the search.twitter.com results page for the same query.
>
> We believe what is happening is that the search servers are not
> indexing tweets in a serial manner, and so a tweet with a higher id
> may sneak into a search server and be indexed first before a tweet
> with a lower id. This means that when the since_id is sent back from
> the query (or derived from the first result in the results array),
> using that since_id to refresh the query will miss lower id tweets
> when they finally do get indexed. So the illusion of "missing tweets"
> is created. You can run TweetGrid and TweetChat in separate tabs using
> the same query and see that sometimes the results don't match up
> because of this.
>
> I'll try to give an example to be clear.
>
> Let's say for the sake of simplicity that I'm searching for "twitter"
> and that every 10th tweet in the public timeline matches. So, all
> tweets ending in 0 match my query.
>
> Search server 1 may index:
>
> 20
> 30
> 40
> 60
> 70
>
> (notice missing 50)
>
> At the same time, Search server 2 may index:
>
> 20
> 30
> 40
> 50
>
> (notice hasn't indexed 60 or 70 yet)
>
> I send a query and get a response from Server 1 and get a since_id of
> 70.  On my next request I use that since_id=70 and I'll never see
> tweet 50.  Thus the "missing tweets".
>
> This is quite annoying, especially now that users are noticing and
> complaining to us (the app devs) that our apps are broken.
>
> I cannot think of a good workaround for this that would be simple
> enough to implement and be worth the effort.
>
> Is this behavior something anyone else can confirm? Are tweets
> supposed to be indexed/replicated serially by the search servers?
>
> -Chad
>
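
Chad's example with tweets 20 through 70 can be reproduced with a small
simulation. What follows is a minimal Python sketch, not Twitter's
implementation: the server contents come straight from the example above,
while search(), poll_naive(), poll_with_overlap(), CAUGHT_UP and the lag
value are hypothetical names and numbers made up for illustration. The
second poller shows one possible client-side mitigation (re-query a little
behind the newest id and drop duplicates). Nobody in this thread proposed
or tested it, and it costs extra results per request.

```python
# Hypothetical model of Chad's example: two search servers that have
# indexed different subsets of the matching tweet ids.
SERVER_1 = [20, 30, 40, 60, 70]        # has not indexed 50 yet
SERVER_2 = [20, 30, 40, 50]            # has not indexed 60 or 70 yet
CAUGHT_UP = [20, 30, 40, 50, 60, 70]   # assumed state once replication finishes


def search(index, since_id):
    """Return the ids in `index` greater than since_id, newest first."""
    return sorted((i for i in index if i > since_id), reverse=True)


def poll_naive(responses):
    """Advance since_id to the newest id returned by each poll."""
    seen, since_id = [], 0
    for index in responses:
        results = search(index, since_id)
        seen.extend(results)
        if results:
            since_id = results[0]
    return sorted(seen)


def poll_with_overlap(responses, lag=30):
    """Query `lag` ids behind the newest id seen and de-duplicate.

    `lag` is an arbitrary illustrative value, not a recommended setting.
    """
    seen, newest = set(), 0
    for index in responses:
        results = search(index, max(newest - lag, 0))
        seen.update(results)
        if results:
            newest = max(newest, results[0])
    return sorted(seen)


# First poll is answered by server 1, the second by a caught-up server.
responses = [SERVER_1, CAUGHT_UP]
print(poll_naive(responses))         # tweet 50 never shows up
print(poll_with_overlap(responses))  # tweet 50 is picked up on the second poll
```

In this toy model the overlap only helps if the late tweet gets indexed
while its id is still inside the lag window. Tweets that arrive later than
that are still missed, so this narrows the gap rather than closing it.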
