It's possible that the track filter is missing something, but there are
probably other misunderstandings that are clouding things.
I don't know how Tweespeed comes up with their numbers, but the
Streaming API only makes available a proportion of all public
statuses. Spam accounts, for example, are filtered out, as are
protected accounts, direct messages, and so on. My guess is that
Tweespeed is assuming that status_ids are assigned sequentially and
is just reporting the velocity of that column.
Your estimate that 40% of tweets contain a link seems more than 2x too
high. You can come up with a very accurate number by collecting a
sampled feed for a few hours or days (there are diurnal and daily
patterns to everything on Twitter) and dividing out. Even 10 minutes
of the default sampled feed (the old "spritzer") will give you an
Without knowing your sample size, day of week, or time of day, I'd say
that your reported matches per minute and limited statuses per minute
look about right. I don't think you are missing much, if anything,
other than the statuses counted in the limit messages.
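
As a rough illustration of "collecting a sampled feed and dividing out", here
is a minimal Python sketch. It assumes the feed has been saved one JSON object
per line and that each status carries its body in a "text" field; the function
names are made up for this example. The substring test mirrors 'grep -i http'
and is not necessarily how track itself matches.

```python
import json


def link_fraction(lines):
    """Fraction of statuses whose text contains "http" (case-insensitive).

    Assumes one JSON object per line with the status body under "text".
    Blank keep-alive lines, unparseable lines, and non-status messages
    (limit notices, deletes) are skipped and do not count toward the total.
    """
    total = 0
    with_link = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            obj = json.loads(line)
        except ValueError:
            continue
        if not isinstance(obj, dict) or "text" not in obj:
            continue
        total += 1
        if "http" in obj["text"].lower():
            with_link += 1
    return with_link / total if total else 0.0


def matched_share(matched_per_min, limited_per_min, total_per_min):
    """Share of all statuses that matched the filter, delivered or limited."""
    return (matched_per_min + limited_per_min) / total_per_min
```

With the figures from the quoted message, matched_share(1000, 1500, 20000)
gives 0.125. That share only means something if the 20,000 per minute figure
is itself right, and it can be compared against link_fraction() measured on
the sampled feed.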
As a double check, I just ran a quick test with the highest level of
track and compared the result against the firehose. In a one-minute
sample, the track feed matched the same tweets as the firehose
piped to 'grep -i http'.
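
For anyone who wants to repeat that kind of comparison on two saved captures,
a sketch along these lines should work. It assumes one JSON status per line
with "id" and "text" fields; the helper names are hypothetical, and the
substring test stands in for 'grep -i http'.

```python
import json


def status_ids(lines, needle=None):
    """Collect status ids, optionally only those whose text contains needle.

    Assumes one JSON object per line with "id" and "text" fields; anything
    else (blank lines, limit notices, deletes) is ignored.
    """
    ids = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            obj = json.loads(line)
        except ValueError:
            continue
        if not isinstance(obj, dict) or "id" not in obj or "text" not in obj:
            continue
        if needle is None or needle in obj["text"].lower():
            ids.add(obj["id"])
    return ids


def missing_from_track(track_lines, firehose_lines, needle="http"):
    """Ids that the firehose capture matched but the track capture lacks."""
    return status_ids(firehose_lines, needle) - status_ids(track_lines)
```

An empty result over the same time window would suggest the track feed is not
dropping anything beyond what the limit messages already report.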
Services, Twitter Inc
On Sep 3, 7:23 pm, Zac Witte <zacwi...@gmail.com> wrote:
> I'm not sure the filter is actually catching everything that I'm
> supposedly tracking. There are ~20,000 tweets per minute right now
> according to tweespeed. I'm getting about 1000 tweets/m and skipping
> on average 1500 tweets/m according to the limit notifications. That
> means my filter is matching about 12.5% of all tweets, but I'm
> tracking "http" and supposedly 40% of all tweets contain a link so my
> filter would seem to be missing the majority of all links. Is this
> making sense?