Hi all,

I'm pleased to announce that Infochimps is making datasets from our massive
scrape of the Twitter corpus available to Chirp Hack Day devs.

There's a big opportunity for apps that draw on the historical record and
*structure* of Twitter -- apps that require a global perspective and intense
computation. The datasets below can be mashed up against other datasets from
infochimps.org, or simply used to seed the database for your Hack Day
application.  We also have a 30-machine cluster running for further
extractions, so if there's something really interesting you'd like to pull,
please let me know.

*Reputation Metrics from Reply and Follow Graphs* Uses an algorithm similar
to PageRank to derive reputation: one metric from the a_follows_b graph and
one from the a_replies_b graph
*Reply/Retweet/Mention Graph* Every reply, retweet, or mention observed in a
1.6B-tweet sample (about 15% of the historical record): a_[rel]_b,
user_a_id, user_b_id, tweet_id
*Twitter Users by Background Color* The number of users with each background
color: color code, user count
*Twitter Users by Friends Count* The number of users with a given number of
friends: number of friends, user count
*Twitter Users by Followers Count* The number of users with a given number
of followers: number of followers, user count
*Twitter Users by Created At* The number of users whose accounts were
created in a given month/day/hour along with the earliest seen ID in that
hour: timestamp to month/day/hour, user count
*Smileys* Smiley faces with user, date, tweet_id
*Hashtags* Hashtags with user, date, tweet_id
*TweetUrl* URLs with user, date, tweet_id
*Twitter Users by Location* The number of users with a given location string
(as provided by the user in their profile): location, user count
*Stock Tweets* Tweets that use the stock symbol tag convention of
$STOCKNAME or $$. A tweet is listed once for each tag used in it:
stock_tweet (resource name), symbol captured, tweet object (all things in a
tweet)
*Stock Prices* Daily stock prices for the NASDAQ, NYSE, and AMEX exchanges,
1970 to now: symbol, open, low, close, high, volume
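
For anyone curious what a pagerank-style reputation score over the
a_follows_b graph could look like, here is a minimal sketch. It is not our
production code: the function name, the damping factor of 0.85, the
iteration count, and the toy user ids are all assumptions made for
illustration, and it runs in memory, which the full 1.6B-edge graph would
not allow.

```python
from collections import defaultdict


def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over (follower_id, followed_id) pairs.

    Reputation flows from the follower (a) to the followed user (b).
    The damping and iterations defaults are illustrative assumptions.
    """
    out_links = defaultdict(set)
    nodes = set()
    for a, b in edges:
        out_links[a].add(b)
        nodes.update((a, b))
    if not nodes:
        return {}

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Users who follow nobody have no out-links; spread their rank evenly.
        dangling = sum(rank[u] for u in nodes if u not in out_links)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for a, targets in out_links.items():
            share = damping * rank[a] / len(targets)
            for b in targets:
                new_rank[b] += share
        rank = new_rank
    return rank


# Toy a_follows_b edges with made-up user ids: (user_a_id, user_b_id)
toy_edges = [(1, 3), (2, 3), (3, 1), (4, 3)]
scores = pagerank(toy_edges)
for user_id, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(user_id, round(score, 4))
```

The same function could be pointed at a_replies_b pairs to get the
reply-based variant, since it only needs (source, target) pairs.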

Parameters for what's available:

raw object              size       number of objs
a_follows_b             45.8 GB     1,587,838,568
a_mentions_b            29.5 GB       493,682,309
a_retweets_b             1.6 GB        36,022,061
twitter_user             3.1 GB        43,261,388
tweets                 376.0 GB     1,641,624,381
hashtag                  7.1 GB       139,916,844
smiley                   4.4 GB        99,272,082
tweet_url               29.5 GB       433,278,116
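
If you are sizing a database for any of these, a rough per-record footprint
can be worked out from the table above. The sketch below assumes the sizes
are decimal gigabytes (10^9 bytes), which is not stated anywhere in the
table, so treat the results as ballpark figures only.

```python
# (size in GB, number of objects), copied from the table above
datasets = {
    "a_follows_b": (45.8, 1_587_838_568),
    "a_mentions_b": (29.5, 493_682_309),
    "a_retweets_b": (1.6, 36_022_061),
    "twitter_user": (3.1, 43_261_388),
    "tweets": (376.0, 1_641_624_381),
    "hashtag": (7.1, 139_916_844),
    "smiley": (4.4, 99_272_082),
    "tweet_url": (29.5, 433_278_116),
}

GB = 10**9  # assumption: decimal gigabytes; the table does not say


def avg_bytes_per_object(size_gb, count):
    """Average on-disk bytes per object for one dataset."""
    return size_gb * GB / count


for name, (size_gb, count) in datasets.items():
    print(f"{name:<14}{avg_bytes_per_object(size_gb, count):8.1f} bytes/object")

# Combined size of everything listed in the table, in GB
total_gb = sum(size_gb for size_gb, _ in datasets.values())
print(f"total: {total_gb:.1f} GB")
```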

If you'd like access to any of these, or have an idea that needs something
/not/ here, please let me know (f...@infochimps.org).  We're only opening
access to Hack Day devs for now -- but please let us know your ideas so we
can show Twitter how much demand there is for aggregated access to data.


Find any dataset in the world
