I am of course not talking about the business side of things; however,
I am searching for solid information regarding the following.

- We want to access as many public tweets as possible (for a 3-man
startup working on user-friendly analytics apps) - Do I understand
correctly that access to the firehose stream would provide us with the
necessary data, and is getting access to that stream feasible (for a

The Firehose stream provides the most data you can get from outside Twitter. It consists of all public tweets except those from users on Twitter's list of "low quality users." It may not have everything "necessary" to satisfy your customers' needs, but it's literally "all you can get" without being Twitter or having some other arrangement (like a court order). ;-)

Is getting the Firehose feasible for a startup? At one time Twitter had a program for that, and I don't recall a public announcement that it had been cancelled, so unless Twitter states otherwise, I'd say it's feasible. But access is decided on a case-by-case basis, so send them a private email.

- Any best practices people would like to share? Most 'knowledge'
would be inferred using keywords.

If you *know* the keywords up front, you should connect to one of the "Filter" streams rather than connecting to the Firehose and doing all the filtering yourself. If you're finding the keywords from a statistical sample of tweets, you may be able to get a big enough sample using a "Spritzer" or "Gardenhose" level sample rather than the full Firehose.
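
To make the "find the keywords from a sample" idea concrete, here is a rough Python sketch. It assumes you have already collected tweet texts from a sample stream into a list; it makes no Twitter API calls. The names `top_keywords`, `matches_any`, and `STOP_WORDS` are made up for illustration, and the stop-word list is deliberately tiny.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "is", "it", "for", "on"}

def top_keywords(tweets, n=3):
    """Return the n most frequent non-stop-word terms in a sample of tweet texts."""
    counts = Counter()
    for text in tweets:
        # Lowercase and split into word-like tokens (keeps #hashtags and @mentions).
        for token in re.findall(r"[#@]?\w+", text.lower()):
            if token not in STOP_WORDS and len(token) > 2:
                counts[token] += 1
    return [term for term, _ in counts.most_common(n)]

def matches_any(text, keywords):
    """Client-side check: does the tweet contain any keyword as a whole token?"""
    tokens = set(re.findall(r"[#@]?\w+", text.lower()))
    return any(k.lower() in tokens for k in keywords)

# Stand-in for texts collected from a sample stream.
sample = [
    "Loving the new analytics dashboard",
    "Analytics for startups is hard",
    "The dashboard is down again",
    "Coffee time",
]
keywords = top_keywords(sample, n=2)
print(keywords)
print(matches_any("Our Dashboard rocks", keywords))
```

Once the keyword list settles down, it is the kind of thing you would hand to a "Filter" stream instead of matching on your side.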

Another way to make your life easier is to have a list of *users* who generate a representative sample of the tweets you wish to analyze. The "Follow", "Shadow" or "Bird Dog" filters will let you collect all of the tweets from a set of users. IIRC, you can "follow" at least 5000 users this way.
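
If your user list is longer than whatever limit applies to you, you will need to split it up. A small sketch, assuming the user IDs are passed as one comma-separated value per request and taking the 5000 figure above as the default (both are assumptions to check against the current docs); `follow_batches` is a made-up helper name:

```python
def follow_batches(user_ids, max_per_request=5000):
    """Split user IDs into comma-separated strings, each within the assumed limit."""
    unique_ids = sorted(set(user_ids))  # drop duplicates, keep a stable order
    return [
        ",".join(str(uid) for uid in unique_ids[i:i + max_per_request])
        for i in range(0, len(unique_ids), max_per_request)
    ]

# Tiny limit just to show the splitting.
batches = follow_batches([12, 34, 12, 56], max_per_request=2)
print(batches)
```

Whether you are allowed to run more than one such batch at a time is a question for Twitter.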

