Hi, I also have a question regarding throttling of the streaming API when tracking keywords.
We are successfully tracking keywords and reading messages, but would like to know when our query is too broad, and we are not receiving all the messages, so that we can back off. We would prefer to be getting all the messages for a finer-grained query than most of the messages for a broader one.

Is it possible for the client to tell whether its query is being throttled? I checked the rate-limit data on the returned statuses, but these didn't seem to give useful information for the streaming API - I guess they only give data about GET requests to other APIs. We are using the default access level.

regards,
Robert

On Sep 4, 4:20 am, John Kalucki <jkalu...@gmail.com> wrote:
> Zac,
>
> It's possible that the track filter is missing something, but there's
> probably other misunderstandings that are clouding things.
>
> I don't know how Tweespeed comes up with their numbers, but the Streaming API
> only makes available a proportion of all public
> statuses. Spam accounts, for example, are filtered out, as are
> protected accounts, direct messages, etc. etc. My guess is that
> Tweespeed is assuming that status_ids are assigned sequentially and
> they are just reporting the velocity of that column.
>
> Your estimate that 40% of tweets contain a link seems more than 2x too
> high. You can come up with a very accurate number by collecting a
> sampled feed for a few hours or days (there are diurnal and daily
> patterns to everything on Twitter) and dividing out. Even 10 minutes
> of the default sampled feed (the old "spritzer") will give you an
> idea.
>
> Without knowing your sample size, day of week, or time of day, I'd say
> that your reported matches per minute and limited statuses per minute
> are pretty good. I don't think you are missing much, if anything,
> other than the statuses reported by the limit message.
>
> As a double check, I just ran a quick test with the highest level of
> track and compared the result against the firehose. In a one minute
> sample, the track feed had matched the same tweets as the firehose
> piped to 'grep -i http'.
>
> -John Kalucki
> http://twitter.com/jkalucki
> Services, Twitter Inc
>
> On Sep 3, 7:23 pm, Zac Witte <zacwi...@gmail.com> wrote:
>
> > I'm not sure the filter is actually catching everything that I'm
> > supposedly tracking. There are ~20,000 tweets per minute right now
> > according to tweespeed. I'm getting about 1000 tweets/m and skipping
> > on average 1500 tweets/m according to the limit notifications. That
> > means my filter is matching about 12.5% of all tweets, but I'm
> > tracking "http" and supposedly 40% of all tweets contain a link so my
> > filter would seem to be missing the majority of all links. Is this
> > making sense?
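
For what it's worth, the limit notifications mentioned in the quoted messages look like the only in-band signal a client gets, so one way to detect throttling is to watch for them in the stream. Below is a rough Python sketch of that idea, not a tested client. It assumes each stream line is a JSON object, that a limit notice has the shape {"limit": {"track": N}}, and that N is a running total of skipped statuses since the connection opened; the thread confirms none of those details. The names summarize_stream and matched_share are made up for illustration.

```python
import json


def summarize_stream(lines):
    """Count delivered statuses and skipped statuses in an iterable of stream lines.

    Assumptions (not confirmed by the thread): each non-blank line is one JSON
    object, a limit notice looks like {"limit": {"track": N}}, and N is a
    cumulative count of undelivered statuses since the connection opened.
    """
    delivered = 0
    skipped = 0
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # blank keep-alive line, nothing to parse
        msg = json.loads(raw)
        if "limit" in msg:
            # Keep the largest value seen, since the count is assumed cumulative.
            # If the counts turn out to be per-notice, sum them instead.
            skipped = max(skipped, msg["limit"].get("track", 0))
        elif "text" in msg:
            delivered += 1  # an ordinary status
        # anything else (e.g. delete notices) is ignored
    return delivered, skipped


def matched_share(delivered, skipped, total):
    """Fraction of all statuses that matched the filter, delivered or not."""
    return (delivered + skipped) / total


if __name__ == "__main__":
    sample = [
        '{"text": "see http://example.com"}',
        "",
        '{"limit": {"track": 3}}',
        '{"text": "another http link"}',
        '{"limit": {"track": 7}}',
    ]
    delivered, skipped = summarize_stream(sample)
    print("delivered:", delivered, "skipped:", skipped)
    print("throttled:", skipped > 0)
    # Per-minute figures quoted in the thread: 1000 delivered, 1500 skipped, ~20,000 total
    print("matched share:", matched_share(1000, 1500, 20000))
```

With the per-minute figures quoted above, matched_share(1000, 1500, 20000) gives 0.125, which is the 12.5% in Zac's message. A skipped count above zero would be the cue to narrow the track terms.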