With the sample stream I got, on average, roughly 10 tweets/sec and roughly 11 words/tweet, but take into account that the tweets you get are in multiple languages.
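
As a rough back-of-the-envelope check, the figures above can be turned into a collection-time estimate for a 100 million word corpus. This is only a sketch: the function name and the `usable_fraction` parameter are made up for illustration, and the 10 tweets/sec and 11 words/tweet values are my approximate measurements, not guaranteed rates.

```python
SECONDS_PER_DAY = 86_400

def days_to_collect(target_words, tweets_per_sec, words_per_tweet,
                    usable_fraction=1.0):
    """Estimate how many days of streaming are needed to reach target_words.

    usable_fraction is a hypothetical knob for the share of tweets that
    are in the language you want (1.0 means every tweet is kept).
    """
    words_per_sec = tweets_per_sec * words_per_tweet * usable_fraction
    return target_words / words_per_sec / SECONDS_PER_DAY

# Every tweet kept, using the rough averages measured on the sample stream.
print(f"{days_to_collect(100_000_000, 10, 11):.1f} days")  # 10.5 days
```

So at those rates the sample stream would take about a week and a half if every tweet counted. If only part of the stream is in the target language, pass a smaller `usable_fraction` and the estimate grows in proportion.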

Regards,

Rolando Espinoza La fuente
www.rolandoespinoza.info

On Thu, Feb 11, 2010 at 11:23 PM, Michael Ivey <[email protected]> wrote:
> Take a look at the Streaming API:
> http://apiwiki.twitter.com/Streaming-API-Documentation
>
> It's very easy to make a simple collection client to pull the
> statuses/sample stream and gather a decent sample of all the tweets.
>
> Tell your programmer to hop on the list and ask any questions that come
> up...we're (usually) a pretty helpful bunch.
>
> -- ivey
>
>
> On Thu, Feb 11, 2010 at 12:03 AM, mzap <[email protected]> wrote:
>>
>> I am a linguist at the University of Sydney currently studying the
>> language of microblogging. I would like to build a 100 million word
>> corpus of tweets. I am trying to determine the best way of collecting
>> such a corpus. Does Twitter make data available directly or is the
>> only method scraping tweets using the API( I am not a programmer
>> myself although I do have access to a programmer who is able to use
>> the API)?
>>
>> If I was to use the API would rate limiting mean that it is going to
>> take ages to reach 100 million tweets?
>>
>> cheers,
>> Michele
>
>
