Hi all,

I'm pleased to announce that Infochimps is making datasets from our massive
scrape of the Twitter corpus available for Chirp Hack day devs.

There's a big opportunity for apps that draw on the historical record and
*structure* of twitter -- apps that require a global perspective and intense
computation. The following are available to mash up against other datasets
from infochimps.org or even just to bootstrap-seed the database for your
Hack Day application.  We also have a 30-machine cluster up to do further
extractions, so if there's something really interesting you'd like to pull,
please let me know.

*Reputation Metrics from Reply and Follow graphs* Uses an algorithm similar
to pagerank to derive reputation: one score using the a_follows_b graph and
one using the a_replies_b graph
*Reply/retweet/mention graph* Every reply, retweet, or mention observed in
a 1.6B-tweet sample (about 15% of the historical record): a_[rel]_b,
user_a_id, user_b_id, tweet_id
*Twitter Users by Background Color* The number of users with each background
color: color code, user count
*Twitter Users by Friends Count* The number of users with a given number of
friends: number of friends, user count
*Twitter Users by Followers Count* The number of users with a given number
of followers: number of followers, user count
*Twitter Users by Created At* The number of users whose accounts were
created in a given month/day/hour along with the earliest seen ID in that
hour: timestamp to month/day/hour, user count
*Smileys* Smiley faces with user, date, tweet_id
*Hashtags* Hashtags with user, date, tweet_id
*TweetUrl* URLs with user, date, tweet_id
*Twitter Users by Location* The number of users with a given location
string (as provided by the user in their profile): location, user count
*Stock Tweets* Tweets that include the stock symbol tag convention of
$STOCKNAME or $$. A tweet is listed once for each time a tag is used in
it: stock_tweet (resource name), symbol captured, tweet object (all
things in a tweet)
*Stock Prices* Daily stock prices for the NASDAQ, NYSE, and AMEX exchanges,
1970-now: symbol, open, low, close, high, volume
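
For anyone curious what a pagerank-style reputation score over the follow
graph looks like, here is a minimal sketch in Python. It is not the code
that produced the reputation dataset; it assumes the edges arrive as plain
(user_a_id, user_b_id) pairs meaning "a follows b", and the function name,
damping factor, and iteration count are illustrative choices only.

```python
from collections import defaultdict


def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over (user_a_id, user_b_id) pairs.

    Each pair means "a follows b", so reputation flows from a to b.
    The name and defaults here are hypothetical, not the dataset's.
    """
    out_links = defaultdict(set)
    nodes = set()
    for a, b in edges:
        out_links[a].add(b)
        nodes.update((a, b))

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Rank held by users who follow nobody is spread evenly over everyone.
        dangling = sum(rank[u] for u in nodes if u not in out_links)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {u: base for u in nodes}
        # Each user splits their damped rank equally among the users they follow.
        for a, targets in out_links.items():
            share = damping * rank[a] / len(targets)
            for b in targets:
                new_rank[b] += share
        rank = new_rank
    return rank


# Toy follow graph: users 1, 3, and 4 follow user 2; user 2 follows user 1.
edges = [(1, 2), (3, 2), (2, 1), (4, 2)]
scores = pagerank(edges)
top_user = max(scores, key=scores.get)
```

At the scale of the real a_follows_b graph this in-memory version would not
fit on one machine; it is only meant to show the shape of the computation.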

Parameters for what's available:

raw object              size       number of objs
a_follows_b             45.8 GB     1,587,838,568
a_mentions_b            29.5 GB       493,682,309
a_retweets_b             1.6 GB        36,022,061
twitter_user             3.1 GB        43,261,388
tweets                 376.0 GB     1,641,624,381
hashtag                  7.1 GB       139,916,844
smiley                   4.4 GB        99,272,082
tweet_url               29.5 GB       433,278,116

If you'd like access to any of these, or have an idea that needs something
/not/ here, please let me know (f...@infochimps.org).  We're only opening
access to Hack Day devs for now -- but please let us know your ideas so we
can show twitter how much demand there is for aggregated access to data.

best,
flip
@mrflip
512-659-6846

----
http://infochimps.org
Find any dataset in the world


