Re: [twitter-dev] Building a 100 million word Twitter corpus

Rolando Espinoza La Fuente Thu, 11 Feb 2010 20:34:55 -0800

With the sample stream I got an average of roughly 10 tweets/sec
and roughly 11 words/tweet, but take into account that you get the
tweets in multiple languages.
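As a back-of-the-envelope check on the "will it take ages?" question quoted below, here is a minimal sketch that turns the rates reported above (about 10 tweets/sec, about 11 words/tweet) into a time estimate for a 100 million word corpus. The function name `seconds_to_target` is hypothetical. The rates are averages from a single observation, and because the sample includes many languages, the time needed to gather 100 million words in any one language would be longer.

```python
# Rough time-to-target estimate using the rates reported in this thread.
# These are observed averages, not guaranteed stream rates.

TARGET_WORDS = 100_000_000   # desired corpus size in words
TWEETS_PER_SEC = 10          # observed sample-stream rate (all languages)
WORDS_PER_TWEET = 11         # observed average words per tweet


def seconds_to_target(target_words: float, tweets_per_sec: float,
                      words_per_tweet: float) -> float:
    """Return the seconds needed to collect target_words at the given rates."""
    words_per_sec = tweets_per_sec * words_per_tweet
    return target_words / words_per_sec


secs = seconds_to_target(TARGET_WORDS, TWEETS_PER_SEC, WORDS_PER_TWEET)
days = secs / 86_400  # seconds per day
print(f"{secs:,.0f} seconds, about {days:.1f} days")
# prints "909,091 seconds, about 10.5 days"
```

At 110 words/sec the word target is reached in roughly ten and a half days of continuous collection. If the target were 100 million tweets rather than words, the same 10 tweets/sec rate would need 10,000,000 seconds, or about 116 days.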


Regards,

Rolando Espinoza La Fuente
www.rolandoespinoza.info



On Thu, Feb 11, 2010 at 11:23 PM, Michael Ivey <[email protected]> wrote:
> Take a look at the Streaming
> API: http://apiwiki.twitter.com/Streaming-API-Documentation
> It's very easy to make a simple collection client to pull the
> statuses/sample stream and gather a decent sample of all the tweets.
> Tell your programmer to hop on the list and ask any questions that come
> up...we're (usually) a pretty helpful bunch.
>  -- ivey
>
>
> On Thu, Feb 11, 2010 at 12:03 AM, mzap <[email protected]> wrote:
>>
>> I am a linguist at the University of Sydney currently studying the
>> language of microblogging. I would like to build a 100 million word
>> corpus of tweets. I am trying to determine the best way of collecting
>> such a corpus. Does Twitter make data available directly or is the
>> only method scraping tweets using the API (I am not a programmer
>> myself although I do have access to a programmer who is able to use
>> the API)?
>>
>> If I were to use the API, would rate limiting mean that it is going to
>> take ages to reach 100 million tweets?
>>
>> cheers,
>> Michele
>
>
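The "simple collection client" suggested above could look roughly like the following minimal sketch. It is not code from this thread, and it does not connect to Twitter: it assumes the stream delivers one JSON status object per line, with the tweet text in a `text` field, and it reads from any file-like object so that the network layer (endpoint, authentication) can be plugged in separately. The function name `collect_texts` is hypothetical, and splitting on whitespace is only a crude word count.

```python
import io
import json
from typing import List, TextIO


def collect_texts(stream: TextIO, target_words: int) -> List[str]:
    """Read newline-delimited JSON statuses until target_words words are gathered.

    Assumption: one JSON object per line. Blank keep-alive lines, malformed
    lines, and objects without a string "text" field (e.g. deletion notices)
    are skipped.
    """
    texts: List[str] = []
    words = 0
    for line in stream:
        line = line.strip()
        if not line:
            continue  # keep-alive newline
        try:
            status = json.loads(line)
        except json.JSONDecodeError:
            continue  # partial or malformed line; skip it
        if not isinstance(status, dict):
            continue  # not a status object
        text = status.get("text")
        if not isinstance(text, str):
            continue  # e.g. a deletion notice
        texts.append(text)
        words += len(text.split())  # crude whitespace word count
        if words >= target_words:
            break
    return texts


# Usage with an in-memory stand-in for the stream:
sample = io.StringIO(
    '{"text": "hello world"}\n'
    '\n'
    '{"delete": {"status": {"id": 1}}}\n'
    '{"text": "a b c"}\n'
)
print(collect_texts(sample, target_words=4))
# prints ['hello world', 'a b c']
```

Keeping the parsing separate from the connection makes it easy to test offline and to swap in whatever transport the API currently requires.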
