On Dec 15, 9:58 am, John Kalucki <j...@twitter.com> wrote:
> Bandwidth is likely to only be a small fraction of your total cost when
> consuming the firehose. If you want to focus on this small part and ignore
> all the other dominating costs, the prudent systems engineer would provision
> 2x to 3x daily peak to account for traffic spikes, growth, backlog
> retrieval, and to keep latency to a minimum. Not all have such requirements,
> though. So, somewhere between 5 and 15 mbit, very very roughly. Your
> requirements will certainly vary.
>
> The filtered and sampled streams are where virtually everyone will wind up.
>
> -John Kalucki
> http://twitter.com/jkalucki
> Services, Twitter Inc.
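Just to sanity-check John's numbers for myself (the peak figure below is a hypothetical, not a measurement), the 2x-3x headroom rule works out like this:

```python
# Back-of-the-envelope provisioning per John's rule of thumb:
# provision 2x to 3x daily peak. The 5 mbit/s peak is an assumed
# figure for illustration, not anything Twitter has published.
daily_peak_mbit = 5.0

for headroom in (2, 3):
    provisioned = daily_peak_mbit * headroom
    print(f"{headroom}x headroom -> provision {provisioned:.0f} mbit/s")
```

With an assumed 5 mbit/s peak that lands in the 10-15 mbit/s range, which is consistent with the "5 to 15 mbit, very very roughly" estimate above.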
I'm using the sampled stream at the moment and it's doing most of what I need. It's certainly more than enough for developing and testing the algorithms.

The filter stream, on the other hand, seems next to useless to me compared with the stream coming out of Twitter search. For one thing, I do a lot of location-based processing. I'm quite interested in what's happening in Portland, Oregon, and not so much in the rest of the world. As far as I can tell, there's no geocode parameter for "filter". In addition, I can search back in time with Twitter search - with filter, if I don't know what I'm looking for ahead of time, it's going to go right by me. ;-)

But really, I'm much more concerned about legal issues with the firehose than I am with technical issues. There are "resellers" of firehose data now. They have an advantage over random developers like myself, because they have a business relationship with Twitter and I don't. I can't make a credible business plan without knowing what I will and will not be able to legally do with firehose data, or how much access will cost me.

--
M. Edward (Ed) Borasky
http://borasky-research.net

"I've always regarded nature as the clothing of God." ~Alan Hovhaness
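P.S. Since "filter" has no geocode parameter, the workaround I've been sketching is to consume the sampled stream and do the location filtering client-side on whatever geotagged statuses come through. A rough sketch, assuming the status dicts are parsed stream JSON with the geo field as a [lat, lon] Point, and with a bounding box that's only my eyeball guess at Portland, Oregon:

```python
# Client-side geo filtering of statuses from the sampled stream.
# PORTLAND_BOX is a rough, hand-drawn bounding box, not an official one,
# and the sample statuses below are stand-ins for real stream output.

PORTLAND_BOX = {"min_lat": 45.40, "max_lat": 45.65,
                "min_lon": -122.85, "max_lon": -122.45}

def in_box(lat, lon, box=PORTLAND_BOX):
    """True if the coordinate falls inside the bounding box."""
    return (box["min_lat"] <= lat <= box["max_lat"] and
            box["min_lon"] <= lon <= box["max_lon"])

def local_statuses(statuses):
    """Yield only statuses carrying a geo Point inside the box."""
    for s in statuses:
        geo = s.get("geo")
        if geo and geo.get("type") == "Point":
            lat, lon = geo["coordinates"]
            if in_box(lat, lon):
                yield s

# Stand-in data: one downtown Portland point, one New York point, one untagged.
sample = [
    {"text": "hello from downtown",
     "geo": {"type": "Point", "coordinates": [45.52, -122.68]}},
    {"text": "hello from elsewhere",
     "geo": {"type": "Point", "coordinates": [40.71, -74.00]}},
    {"text": "no geo tag", "geo": None},
]

print([s["text"] for s in local_statuses(sample)])
# -> ['hello from downtown']
```

The obvious cost is that most geotagged traffic in the sample isn't local, so nearly everything gets thrown away client-side - which is exactly why a geocode parameter on "filter" would be so useful.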