I have been using the Twitter API for research purposes and have created an
n-gram dataset from a tweet corpus that I have collected over time. I
want to make this dataset public so that other researchers can carry
out their own studies without having to build a similar corpus. I read
the ToS and did not see any explicit statement that forbids this. I
just want to be sure that my interpretation is correct. Could anyone
tell me more about this?

The dataset I plan to share is a collection of frequently used n-gram
phrases and their frequencies in my corpus. I don't plan to keep
phrases longer than 5 words. For instance, here is a sample of the file
I plan to make public:

drinking a glass of wine          233
drinking a cup of coffee          398
drinking poison and waiting for   10
drinking a tea without sugar      98

In this case the phrases are 5-grams (each consists of 5 words, or
tokens), and the number next to each phrase is the number of times it
was observed in my corpus. As far as I can tell, I am not
redistributing the content of tweets: these samples are common phrases
already used in everyday language, and I am merely releasing their
frequencies in a sample of tweets.
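For anyone wondering how a file like the one above might be produced, here is a minimal Python sketch. It is illustrative only, not necessarily the procedure used for this corpus, and it rests on some assumptions: tokens come from lowercasing and splitting on whitespace, and "frequently used" is approximated by a hypothetical `min_count` cutoff. The function names `count_ngrams` and `to_release_lines` are hypothetical as well. Note that only phrases and counts are emitted; no tweet IDs, user names, or full tweet texts appear in the output.

```python
from collections import Counter
from typing import Iterable


def count_ngrams(texts: Iterable[str], max_n: int = 5) -> Counter:
    """Count every n-gram of length 1..max_n across the given texts.

    Assumption: tokens are obtained by lowercasing and splitting on whitespace.
    """
    counts: Counter = Counter()
    for text in texts:
        tokens = text.lower().split()
        for n in range(1, max_n + 1):
            # Slide a window of n tokens over the text.
            for i in range(len(tokens) - n + 1):
                counts[" ".join(tokens[i:i + n])] += 1
    return counts


def to_release_lines(counts: Counter, n: int, min_count: int) -> list[str]:
    """Return 'phrase<TAB>count' lines for n-grams of exactly n tokens
    observed at least min_count times (hypothetical frequency cutoff)."""
    rows = [(phrase, c) for phrase, c in counts.items()
            if len(phrase.split()) == n and c >= min_count]
    # Most frequent first; ties broken alphabetically.
    rows.sort(key=lambda row: (-row[1], row[0]))
    return [f"{phrase}\t{c}" for phrase, c in rows]


if __name__ == "__main__":
    sample = [
        "Drinking a glass of wine tonight",
        "drinking a glass of wine with friends",
        "drinking a cup of coffee",
    ]
    for line in to_release_lines(count_ngrams(sample), n=5, min_count=2):
        print(line)
```

With the three made-up sample texts, only "drinking a glass of wine" reaches the cutoff of 2, so a single tab-separated line is printed. Raising `min_count` on a real corpus drops rare phrases, which are the ones most likely to reproduce an individual tweet verbatim.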

Thanks in advance for your thoughts.

Amaç Herdağdelen
