Take a look at the Streaming API: http://apiwiki.twitter.com/Streaming-API-Documentation
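
For what it's worth, here is a minimal sketch in Python of the processing side of a collection client for the statuses/sample stream. It assumes the stream arrives as newline-delimited JSON with occasional blank keep-alive lines, and that status messages carry a `text` field while other messages (such as delete notices) do not. The function names are made up for illustration, and connecting and authenticating are left out entirely.

```python
import json
from typing import Iterable, Iterator, List


def iter_tweet_texts(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text of each status found in a newline-delimited JSON stream."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank keep-alive line (assumed stream behaviour)
        try:
            message = json.loads(line)
        except ValueError:
            continue  # truncated or malformed line; skip it
        # Assumption: only status messages have a string "text" field.
        if isinstance(message, dict) and isinstance(message.get("text"), str):
            yield message["text"]


def collect_corpus(lines: Iterable[str], target_words: int) -> List[str]:
    """Gather tweet texts until roughly target_words whitespace-separated words."""
    corpus: List[str] = []
    words = 0
    for text in iter_tweet_texts(lines):
        corpus.append(text)
        words += len(text.split())
        if words >= target_words:
            break
    return corpus
```

The `lines` argument can be any iterable of lines, for example an open HTTP response for the sample stream. A real client would also write the texts to disk as it goes rather than hold a 100 million word corpus in memory.
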
It's very easy to make a simple collection client to pull the statuses/sample stream and gather a decent sample of all the tweets.

Tell your programmer to hop on the list and ask any questions that come up...we're (usually) a pretty helpful bunch.

-- ivey

On Thu, Feb 11, 2010 at 12:03 AM, mzap <[email protected]> wrote:
> I am a linguist at the University of Sydney currently studying the
> language of microblogging. I would like to build a 100 million word
> corpus of tweets. I am trying to determine the best way of collecting
> such a corpus. Does Twitter make data available directly or is the
> only method scraping tweets using the API( I am not a programmer
> myself although I do have access to a programmer who is able to use
> the API)?
>
> If I was to use the API would rate limiting mean that it is going to
> take ages to reach 100 million tweets?
>
> cheers,
> Michele
