The FEs keep a circular buffer of the last 150,000 tweets. The count
parameter controls how much of that buffer is examined to build the
historical dump before the stream transitions to live streaming. If the
current rate is, say, 600 tweets per second (tps), then the buffer holds
the last 250 seconds' worth of tweets. With a count of 150,000, a firehose
stream would receive all 150,000 tweets and then the very next live tweet,
effectively masking any disconnect of up to, say, 249 seconds. A filtered
stream gets exactly the same coverage, but you receive only those tweets
that match at least one predicate.
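
To make the arithmetic concrete, here is a rough Python sketch. The
function names are illustrative only, and a steady tweet rate is assumed;
the real rate varies, so treat the results as estimates.

```python
import math

BUFFER_SIZE = 150_000  # buffer size quoted above


def buffer_coverage_seconds(buffer_size: int, tweets_per_second: float) -> float:
    """Seconds of history a buffer holds at a steady overall tweet rate."""
    if tweets_per_second <= 0:
        raise ValueError("tweets_per_second must be positive")
    return buffer_size / tweets_per_second


def count_for_outage(outage_seconds: float, tweets_per_second: float,
                     buffer_size: int = BUFFER_SIZE) -> int:
    """Count needed to cover an outage, capped at the buffer size.

    The rate is the overall tweet rate, not the rate of matching tweets,
    because count is applied before the filter predicates.
    """
    needed = math.ceil(outage_seconds * tweets_per_second)
    return min(needed, buffer_size)


print(buffer_coverage_seconds(BUFFER_SIZE, 600))  # 250.0
print(count_for_outage(30, 600))                  # 18000
```

An outage longer than the buffer covers cannot be fully backfilled, which
is why the second function caps its result at the buffer size.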

Yes, it's hard for a non-firehose consumer to estimate the optimal count
size. However, given a highly selective predicate, there's usually little
harm in requesting too much. Perhaps just request the full historical result
set and de-duplicate the overlap. This request-it-all approach is less
practical for firehose and other high-volume streams, as receiving and
parsing through the duplicates adds latency to the first non-duplicated
tweet.
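
A minimal sketch of de-duplicating the overlap, in Python. It assumes each
tweet has already been parsed into a dict with a unique "id" field; the
function name and the in-memory set are illustrative, not part of any
Twitter client library.

```python
from typing import Iterable, Iterator, Set


def drop_seen(tweets: Iterable[dict], seen_ids: Set[int]) -> Iterator[dict]:
    """Yield only tweets whose id has not been seen, recording new ids."""
    for tweet in tweets:
        tweet_id = tweet["id"]  # assumed field name
        if tweet_id in seen_ids:
            continue  # overlap from the historical dump, skip it
        seen_ids.add(tweet_id)
        yield tweet


# Ids 1 and 2 arrived before the disconnect; the dump replays 2.
seen = {1, 2}
replayed = [{"id": 2}, {"id": 3}, {"id": 3}, {"id": 4}]
print([t["id"] for t in drop_seen(replayed, seen)])  # [3, 4]
```

The set grows without bound as written. A long-running consumer would
probably keep only ids from roughly the last buffer's worth of tweets.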

A negative count returns only the historical result set and does not
transition to live streaming. In this case, your HTTP client should see a
TCP close and exit gracefully immediately after receiving any matching
historical tweets. The whole transaction should take perhaps 60ms (west
coast + speed of light).

Other than the 10-minute hang (which is probably your client's default TCP
socket timeout setting), your scenarios describe the desired behavior. I
suspect that your client isn't detecting a TCP close in a timely manner.
This flaw will lead to data loss when connections are cycled on our end. I
strongly encourage all clients to detect a TCP close and reconnect within a
few tens to hundreds of milliseconds.
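
As a rough illustration of detecting a TCP close, here is a Python sketch
using the standard socket module. The function name and callback are
hypothetical, and a real client would also handle HTTP framing, read
timeouts and reconnect backoff.

```python
import socket
from typing import Callable


def read_until_close(sock: socket.socket,
                     on_data: Callable[[bytes], None],
                     bufsize: int = 4096) -> None:
    """Pass received bytes to on_data until the peer closes the connection."""
    while True:
        chunk = sock.recv(bufsize)
        if chunk == b"":
            # recv() returning no bytes means the peer closed the
            # connection; return so the caller can reconnect right away.
            return
        on_data(chunk)


# Local demonstration with a connected socket pair standing in for the
# streaming connection.
server, client = socket.socketpair()
server.sendall(b"historical tweets")
server.close()  # simulate the server-side TCP close

received = []
read_until_close(client, received.append)
client.close()
print(b"".join(received))  # b'historical tweets'
```

The caller would reconnect as soon as read_until_close returns. Setting a
read timeout with sock.settimeout() is one way to keep a connection that
has gone quiet from hanging for minutes.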

-John Kalucki
http://twitter.com/jkalucki
Infrastructure, Twitter Inc.


On Thu, Mar 4, 2010 at 1:14 AM, Jonathan Strauss <
jonat...@snowballfactory.com> wrote:

> First of all John, that may be the best Saturday night reply ever :-).
>
> We are trying to use the count parameter with the follow predicate on
> an account with shadow access role and have been getting some curious
> responses when testing.
>
> Here is a brief description of the testing scenarios:
> * following a single Twitter ID
> * using a count parameter of -1000
> * tweeting from the Twitter ID being followed and then immediately
> starting the connection described
> Scenario A - if the connection is started within 1-2 seconds of the
> tweet, it will show up in the historical results and then the
> connection will hang for ~10min before disconnecting
> Scenario B - if the connection is started >10 seconds from the tweet,
> it will not show up in the historical results
>
> Questions:
> 1) In reading "On filtered streams, the number requested is the number
> of statuses that are applied to the filter predicate, and not the
> number of statuses returned." from
> http://apiwiki.twitter.com/Streaming-API-Documentation#count,
> are we to understand that the count parameter for the follow predicate
> should be keying off the expected volume of *all* tweets in the
> reconnect period, not just the ones from users we're following?
> 2) If that is the case, won't our count parameter always need to be a
> function of total Streaming API tweet volume as opposed to anything we
> can measure within our app?
> 3) And finally, what would be the explanation for the hang we see in
> testing Scenario A above?
>
> Thanks,
> -jonathan
>
> =====
> Jonathan Strauss, Co-Founder
> http://snowballfactory.com
>
> Campaign tracking for social media - http://awe.sm
> A smarter way to update Facebook from Twitter - http://tweetpo.st
> Sharecount button for Facebook - http://www.fbshare.me
>
> On Feb 27, 8:31 pm, John Kalucki <j...@twitter.com> wrote:
> > Each developer will come to understand Fullness in a unique
> > inner-directed manner. One might decide that exhausting the predicate
> > list constitutes adequate Fullness. Another might decide that data loss
> > becomes unacceptable at another point, perhaps due to the rapid cycling.
> > A third might develop another Fullness heuristic. We should not judge
> > their reasons, rather their reasoning and purity of motive. And their
> > careful adherence to the connection guidelines, as offered in the Wiki
> > of Truth.
> >
> >
> >
> > On Sat, Feb 27, 2010 at 5:14 AM, Alam Sher <alamshe...@gmail.com> wrote:
> > > Okay, great.
> >
> > > When we say a default access account or elevated access is "TOO FULL".
> > > Does that mean, we have started getting rate limit messages in stream?
> > > Or it is something else?
> >
> > > Thanks,
> > > Alam Sher
> >
> > > On Sat, Feb 27, 2010 at 2:31 AM, John Kalucki <j...@twitter.com> wrote:
> >
> > >> The elevated access account can reconnect much less frequently by
> > >> adding new predicates to a default access stream that cycles based on
> > >> demand. When the default access account cycles, very little data will
> > >> be lost, as it receives a small fraction of your total feed. Once the
> > >> default access account is too full, the elevated access account can be
> > >> restarted with the current predicates.
> >
> > >> -John Kalucki
> > >> http://twitter.com/jkalucki
> > >> Infrastructure, Twitter Inc.
> >
> > >>> On Thu, Feb 25, 2010 at 12:25 PM, Alam Sher <alamshe...@gmail.com> wrote:
> >
> > >>> Sorry, but exactly this portion of the documentation goes above my
> > >>> head.
> >
> > >>> Can you please explain a bit more to me how a default access account
> > >>> can be used along with the elevated access account to minimize the
> > >>> data loss?
> >
> > >>> Thanks,
> > >>> Alam Sher
> >
> > >>> On Thu, Feb 25, 2010 at 7:15 PM, John Kalucki <j...@twitter.com> wrote:
> >
> > >>>> Yes, this is indeed what you should be doing. If you have a low
> > >>>> tolerance for data loss, you will then use a total of four accounts:
> > >>>> 2 elevated and 2 default access accounts. If you can tolerate a few
> > >>>> missing tweets on each reconnect, you can just use the two elevated
> > >>>> accounts.
> >
> > >>>> -John Kalucki
> > >>>> http://twitter.com/jkalucki
> > >>>> Infrastructure, Twitter Inc.
> >
> > >>>> On Thu, Feb 25, 2010 at 2:06 AM, Alam Sher <alamshe...@gmail.com> wrote:
> >
> > >>>>> So in case, if I have 20K users and I have to, say track 60K
> > >>>>> keywords for them + also have to follow all of them. I should be
> > >>>>> applying for 2 higher access accounts one for track predicates and
> > >>>>> other for follow predicate. Does this make sense?
> >
> > >>>>> Thanks,
> >
> > >>>>> On Feb 25, 8:44 am, John Kalucki <j...@twitter.com> wrote:
> > >>>>> > This technique works for updating any filter predicate. The count
> > >>>>> > parameter should work on a shadow account. It won't work on a
> > >>>>> > default access account. We have a number of very large
> > >>>>> > integrations using this technique with Birddog access -- it should
> > >>>>> > scale down to Shadow access just fine.
> > >>>>> > The documentation makes it clear which cases are supported and
> > >>>>> > which ones are not:
> > >>>>> > http://apiwiki.twitter.com/Streaming-API-Documentation#count
> >
> > >>>>> > The count parameter isn't supported on track streams for
> > >>>>> > computational complexity reasons, and it isn't supported on the
> > >>>>> > default access role for policy reasons.
> >
> > >>>>> > -John Kalucki
> > >>>>> > http://twitter.com/jkalucki
> > >>>>> > Infrastructure, Twitter Inc.
> >
> > >>>>> > On Wed, Feb 24, 2010 at 2:31 PM, Jonathan Strauss <
> > >>>>> > jonat...@snowballfactory.com> wrote:
> > >>>>> > > On Feb 24, 2:06 pm, John Kalucki <j...@twitter.com> wrote:
> > >>>>> > > > The documentation should be pretty clear on this topic. One
> > >>>>> > > > main connection, and perhaps an auxiliary connection to manage
> > >>>>> > > > query velocity.
> >
> > >>>>> > > Hey John,
> >
> > >>>>> > > Do you recommend this kind of 2 connection setup for updating
> > >>>>> > > our user list when using the follow predicate?
> >
> > >>>>> > > We've been trying unsuccessfully to use the count parameter
> > >>>>> > > when reconnecting to add new users to our follow list. I've
> > >>>>> > > found several oblique mentions of the count parameter only
> > >>>>> > > working in some cases, but no specifics on how or why.
> >
> > >>>>> > > We currently have shadow role access for the TweetPo.st app.
> > >>>>> > > We're trying to update our Streaming API connection when new
> > >>>>> > > users signup for TweetPo.st without losing tweets for existing
> > >>>>> > > users during reconnect. Any suggestions on the best way to do
> > >>>>> > > this would be greatly appreciated.
> >
> > >>>>> > > Thanks!
> > >>>>> > > -jonathan
> >
> > >>>>> > > =====
> > >>>>> > > Jonathan Strauss, Co-Founder
> > >>>>> > > http://snowballfactory.com
> >
> > >>>>> > > Campaign tracking for social media - http://awe.sm
> > >>>>> > > A smarter way to update Facebook from Twitter - http://tweetpo.st
> > >>>>> > > Sharecount button for Facebook - http://www.fbshare.me
> >
> > >>> --
> > >>> _______________
> > >>> Alam Sher Khan
> > >>> +92 331 505 5549
>
