I am a linguist at the University of Sydney currently studying the language of microblogging. I would like to build a 100-million-word corpus of tweets, and I am trying to determine the best way of collecting it. Does Twitter make data available directly, or is the only method scraping tweets using the API? (I am not a programmer myself, although I do have access to a programmer who is able to use the API.)
If I were to use the API, would rate limiting mean that it is going to take ages to reach 100 million tweets?

Cheers,
Michele
