Re: [twitter-dev] Building a 100 million word Twitter corpus

2010-02-11 Thread Michael Ivey
Take a look at the Streaming API: http://apiwiki.twitter.com/Streaming-API-Documentation It's very easy to make a simple collection client to pull the statuses/sample stream and gather a decent sample of all the tweets. Tell your programmer to hop on the list and ask any questions that come
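A simple collection client of the kind described above might look roughly like the following sketch. It assumes the stream arrives as newline-delimited JSON, one status per line, with the tweet body in a "text" field. The function name collect_words and its parameters are hypothetical, and the connection to the statuses/sample stream itself (URL, authentication) is deliberately left out: any iterable of lines will do.

```python
import json


def collect_words(lines, target_words):
    """Gather tweet texts from newline-delimited JSON until target_words is reached.

    `lines` is any iterable of strings (assumed: one JSON status per line).
    Returns (list_of_texts, total_word_count).
    """
    texts = []
    total = 0
    for line in lines:
        line = line.strip()
        if not line:
            # Blank lines carry no status; skip them.
            continue
        try:
            status = json.loads(line)
        except ValueError:
            # Skip anything that is not valid JSON.
            continue
        text = status.get("text") if isinstance(status, dict) else None
        if not text:
            # Skip messages without a tweet body.
            continue
        texts.append(text)
        # Crude whitespace tokenization; a real corpus would need better.
        total += len(text.split())
        if total >= target_words:
            break
    return texts, total
```

In use, lines would be the response of an open connection to the stream, read line by line, and target_words would be 100_000_000 for the corpus discussed in this thread.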

Re: [twitter-dev] Building a 100 million word Twitter corpus

2010-02-11 Thread Rolando Espinoza La Fuente
With the sample stream I got an average of roughly 10 tweets/sec and roughly 11 words/tweet, but take into account that the tweets come in multiple languages. Regards, Rolando Espinoza La Fuente www.rolandoespinoza.info On Thu, Feb 11, 2010 at 11:23 PM, Michael Ivey michael.i...@gmail.com wrote:
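Taking the figures above at face value, a back-of-the-envelope estimate of how long the sample stream would need to run to reach 100 million words can be sketched as follows. The function name days_to_collect is hypothetical, and the estimate counts words in all languages, so a single-language corpus would take longer.

```python
def days_to_collect(target_words, tweets_per_sec, words_per_tweet):
    """Estimate the days of continuous streaming needed to reach target_words."""
    words_per_sec = tweets_per_sec * words_per_tweet
    seconds = target_words / words_per_sec
    return seconds / 86400  # 86400 seconds per day


# 10 tweets/sec * 11 words/tweet = 110 words/sec
days = days_to_collect(100_000_000, 10, 11)
print(round(days, 1))  # prints 10.5
```

At these rates the raw word count is reached in about a week and a half of uninterrupted collection.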

Re: [twitter-dev] Building a 100 million word Twitter corpus

2010-02-11 Thread M. Edward (Ed) Borasky
On 02/10/2010 10:03 PM, mzap wrote: I am a linguist at the University of Sydney currently studying the language of microblogging. I would like to build a 100 million word corpus of tweets. I am trying to determine the best way of collecting such a corpus. Does Twitter make data available