Take a look at the Streaming API:

It's very easy to make a simple collection client to pull the
statuses/sample stream and gather a decent sample of all the tweets.
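
To give your programmer a rough idea of how small such a client can be, here is a minimal sketch in Python. It assumes the stream arrives as one JSON object per line, with occasional blank keep-alive lines and non-status messages such as delete notices mixed in. The function name collect_corpus and its parameters are made up for illustration and are not part of any Twitter library. Opening the connection (endpoint URL, authentication) is deliberately left out, so check the Streaming API docs for that part.

```python
import json


def collect_corpus(lines, out, word_target):
    """Copy tweet texts from a line-delimited JSON stream into `out`.

    `lines` is any iterable of str or bytes lines (hypothetical input:
    for example an open HTTP response, or a file of saved stream data).
    Stops once at least `word_target` whitespace-separated words have
    been written. Returns (tweets_written, words_written).
    """
    tweets = 0
    words = 0
    for raw in lines:
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8", errors="replace")
        raw = raw.strip()
        if not raw:
            continue  # blank keep-alive line
        try:
            message = json.loads(raw)
        except ValueError:
            continue  # truncated or malformed line, skip it
        if not isinstance(message, dict):
            continue
        text = message.get("text")
        if not isinstance(text, str):
            continue  # not a status (assumed: delete notices have no "text")
        # One tweet per output line, so flatten embedded newlines.
        out.write(text.replace("\n", " ") + "\n")
        tweets += 1
        words += len(text.split())
        if words >= word_target:
            break
    return tweets, words
```

In use, `lines` could be the response object returned by urllib.request.urlopen (it can be iterated line by line), `out` a file opened for writing in UTF-8, and `word_target` 100 million. A real client would also need to reconnect when the stream drops, which this sketch does not attempt.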

Tell your programmer to hop on the list and ask any questions that come
up...we're (usually) a pretty helpful bunch.

 -- ivey

On Thu, Feb 11, 2010 at 12:03 AM, mzap <michele.zappavi...@gmail.com> wrote:

> I am a linguist at the University of Sydney currently studying the
> language of microblogging. I would like to build a 100 million word
> corpus of tweets. I am trying to determine the best way of collecting
> such a corpus. Does Twitter make data available directly or is the
> only method scraping tweets using the API (I am not a programmer
> myself although I do have access to a programmer who is able to use
> the API)?
> If I were to use the API, would rate limiting mean that it is going to
> take ages to reach 100 million tweets?
> cheers,
> Michele

