Take a look at the Streaming API:
http://apiwiki.twitter.com/Streaming-API-Documentation
It's very easy to make a simple collection client to pull the
statuses/sample stream and gather a decent sample of all the tweets.
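As a rough sketch of the processing side of such a client (not a definitive implementation): it assumes the stream delivers one JSON status per line with the tweet body in a "text" field, and that blank keep-alive lines and non-status messages can simply be skipped. The function name collect_corpus and its parameters are made up for illustration. The HTTP connection and authentication are left out, so `lines` can be any iterable of strings.

```python
import json

def collect_corpus(lines, target_words):
    """Gather tweet texts from newline-delimited JSON until target_words is reached.

    Assumption: each non-empty line is one JSON object, and real statuses
    carry their body in a "text" field. Anything else is skipped.
    """
    tweets = []
    total_words = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank keep-alive line
        try:
            status = json.loads(line)
        except ValueError:
            continue  # truncated or malformed line
        text = status.get("text") if isinstance(status, dict) else None
        if not text:
            continue  # not a status (e.g. a deletion notice)
        tweets.append(text)
        total_words += len(text.split())  # crude whitespace word count
        if total_words >= target_words:
            break
    return tweets, total_words
```

In a real client, `lines` would be the line iterator of the open streaming connection, and you would write each text to disk rather than keep everything in memory.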
Tell your programmer to hop on the list and ask any questions that come up.
With the sample stream I got, on average, roughly 10 tweets/sec
and roughly 11 words/tweet, but take into account that the tweets
come in multiple languages.
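To put those numbers against the 100 million word target, here is a back-of-the-envelope estimate. It assumes the rates above stay constant. The usable_fraction parameter is hypothetical: it stands for the share of tweets in the language you want, which I did not measure.

```python
def days_to_collect(target_words, tweets_per_sec, words_per_tweet, usable_fraction=1.0):
    """Estimate the days of streaming needed to reach target_words.

    usable_fraction is an assumed share of tweets worth keeping
    (for example, those in the target language); 1.0 keeps everything.
    """
    words_per_sec = tweets_per_sec * words_per_tweet * usable_fraction
    seconds = target_words / words_per_sec
    return seconds / 86400  # seconds per day

# Rates observed on the sample stream: ~10 tweets/sec, ~11 words/tweet
print(round(days_to_collect(100_000_000, 10, 11), 1))  # 10.5
```

So at about 110 words/sec the raw stream reaches 100 million words in roughly ten and a half days. If only part of the stream is in the target language, the time scales up accordingly.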
Regards,
Rolando Espinoza La fuente
www.rolandoespinoza.info
On Thu, Feb 11, 2010 at 11:23 PM, Michael Ivey michael.i...@gmail.com wrote:
On 02/10/2010 10:03 PM, mzap wrote:
I am a linguist at the University of Sydney currently studying the
language of microblogging. I would like to build a 100 million word
corpus of tweets. I am trying to determine the best way of collecting
such a corpus. Does Twitter make data available