[twitter-dev] Tweet Corpus creation for NLP research

kanny Wed, 08 Apr 2009 09:21:32 -0700

I am interested in doing something deeper than surface-level
processing of a user's incoming tweets. For this, I will need to
create a corpus of the user's friends_timeline over, say, the past
month or any computationally feasible period: basically, a large
enough set of, say, 1-100 million tweets for someone following
100-1000 people. It would be only a one-time download, since
afterwards incremental downloads should suffice.
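
The one-time-download-then-incremental idea could look roughly like the
sketch below. It is only an illustration under assumptions: fetch_since
is a hypothetical stand-in for whatever API call returns statuses newer
than a given ID, and the in-memory dict stands in for real storage.

```python
def sync(corpus, fetch_since):
    """Add tweets newer than the newest stored one; return how many were added.

    corpus      -- dict mapping tweet ID (int) to tweet text
    fetch_since -- hypothetical callable: fetch_since(since_id) returns a
                   list of (tweet_id, text) pairs with tweet_id > since_id
    """
    # An empty corpus has no newest ID, so the first call fetches everything.
    since_id = max(corpus, default=0)
    new_tweets = fetch_since(since_id)
    for tweet_id, text in new_tweets:
        corpus[tweet_id] = text
    return len(new_tweets)


# Fake timeline used in place of a real API, for demonstration only.
fake_timeline = [(1, "first"), (2, "second"), (3, "third")]

def fake_fetch(since_id):
    return [t for t in fake_timeline if t[0] > since_id]

corpus = {}
initial = sync(corpus, fake_fetch)       # one-time bulk download
fake_timeline.append((4, "fourth"))
incremental = sync(corpus, fake_fetch)   # later runs fetch only what is new
```

Here the first call pulls the whole fake timeline and the second pulls
only the one tweet added since. A real client would also have to deal
with paging and rate limits, which this sketch ignores.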


This would translate into a 100 MB-10 GB download per user. It could
be less for people who follow fewer or less-active accounts. Does the
Twitter API provide support for such corpus creation? It could be very
helpful for Natural Language Processing research if Twitter created
a sample corpus of the public_timeline or of some selected users'
timelines.
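
The download estimate above can be reproduced with a small calculation.
The figure of about 100 bytes per tweet is an assumption: it is the
average the numbers in this post imply, and it leaves out any metadata
or response overhead.

```python
# Assumed average payload per tweet, in bytes (text only, no metadata).
BYTES_PER_TWEET = 100

def corpus_size_bytes(num_tweets, bytes_per_tweet=BYTES_PER_TWEET):
    """Estimated corpus size in bytes for a given number of tweets."""
    return num_tweets * bytes_per_tweet

def human(n_bytes):
    """Format a byte count using decimal units (1 KB = 1000 B)."""
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if n_bytes < 1000 or unit == "TB":
            return f"{n_bytes:g} {unit}"
        n_bytes /= 1000

low = corpus_size_bytes(1_000_000)       # 1 million tweets
high = corpus_size_bytes(100_000_000)    # 100 million tweets
print(human(low), "to", human(high))
```

If the responses carry per-tweet metadata (user, timestamp, IDs), the
real download would be correspondingly larger.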

Looking forward to some help in this regard.
Thanks
