The track resource on the Streaming API is intended for just this sort of application. Yes, there will be some over-delivery, especially if you intend to logically AND low-frequency words with high-frequency words. In the end, this is a minor amount of additional bandwidth and processing cost. Processing 1, 10, or 100 tweets per second costs about the same. You should be able to do this volume of post-processing at your end on a single core.
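
As a rough illustration of that post-processing step, here is a minimal sketch in Python. It assumes each user's search is stored as a list of OR'ed alternatives, where a multi-word alternative must appear as a consecutive phrase; the QUERIES table and the names tokenize, contains_phrase, matches and route are all hypothetical, and reading tweets off the stream is left out. Each delivered tweet text is checked locally and anything that was over-delivered simply matches nobody.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9_']+", text.lower())

def contains_phrase(tokens, phrase):
    """True if the words of `phrase` appear consecutively in `tokens`."""
    words = tokenize(phrase)
    n = len(words)
    return n > 0 and any(
        tokens[i:i + n] == words for i in range(len(tokens) - n + 1)
    )

def matches(query, text):
    """A query is a list of alternatives (OR); each alternative is a phrase."""
    tokens = tokenize(text)
    return any(contains_phrase(tokens, alt) for alt in query)

# Hypothetical per-user queries, taken from the examples in this thread.
QUERIES = {
    "user1": ["twitter development", "twitterdev", "twitter api"],
    "user2": ["coke", "coca cola"],
    "user3": ["first lego league", "legoleague"],
}

def route(text, queries=QUERIES):
    """Return the users whose query matches one delivered tweet text."""
    return sorted(user for user, q in queries.items() if matches(q, text))
```

Calling route() once per delivered tweet keeps the per-tweet cost to a few list comparisons per stored query. A tweet that contains "twitter" and "api" but not next to each other matches no one and is dropped.
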
Search results will be increasingly filtered and ranked for relevance, which sounds like it is not what you want. Whitelisting won't prevent this filtering.

Additional track terms are not supported by opening additional connections to the Streaming API. Instead, you place more predicates on the same stream. The higher access levels support hundreds of thousands of predicates. Opening many connections to the Streaming API will look like an attempt to circumvent existing rate limits, and you are likely to be banned from all twitter.com access.

-John Kalucki
http://twitter.com/jkalucki
Infrastructure, Twitter Inc.

On Thu, Jan 28, 2010 at 6:15 PM, Jason Striegel <[email protected]> wrote:
> We started running into rate limiting issues today with one of our
> applications that uses the Search API (squawq.com). We're using it to
> track user-defined queries for a bunch of folks and provide analytics
> on those searches. It seems like developers are being asked to migrate
> to the Streaming API, but I'm worried it's going to be _way_ less
> efficient than how we're currently using the Search API.
>
> Most of the terms we are tracking are relatively low volume and
> contain complex search "AND" type keyword phrases. ex: ["twitter
> development" OR twitterdev OR "twitter api"]. Most of these are low
> volume and we can poll a couple times an hour very efficiently.
>
> The problem is that as we gain more users, the number of these
> low-volume terms increases. So a second user might be tracking [coke OR
> "coca cola"], and a third user might track ["first lego league" OR
> legoleague], and so on. To be able to support this with the Streaming
> API we would either have to pull a gigantor amount of tweets in
> through the firehose (assuming we had access) and implement another
> layer of indexing, or we'd have to set up a stream for each search a
> user has created, again pulling in way more data than we do currently,
> but also requiring many concurrent connections and needing to do the
> join behavior after the fact.
>
> Long story short, I totally see how the streaming api has made things
> super efficient for a number of applications. For our Squawq app,
> however, it seems to be the worst possible scenario: way more
> bandwidth intensive, requiring more connections to support all the
> different searches we are running on behalf of our users, and adding a
> huge amount of processing, storage and software complexity to the
> process. All for what seemed like a relatively lightweight,
> low-bandwidth process with the search api.
>
> Anyone have any ideas for making the streaming api work well in this
> scenario? Can the Twitter team still whitelist search api users that
> have this sort of need?
>
> Thanks in advance for any feedback or recommendations.
> @jmstriegel
>
