Hi flip,
I'm hacking at Chirp and am interested in a_follows_b, hashtags, and
URLs. What would be extra sweet is timestamps on the follow
relations: if you crawl the same person over time, we could see how
that network evolves. Thanks for making the data available.
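The a_follows_b data as described carries no timestamps. One way to approximate them, assuming the same users are crawled at two known times, is to diff the two snapshots. A new edge can then only be dated to the window between crawls, not to an exact moment. The `FollowEvent` type, the function name and the example dates below are hypothetical, not part of the Infochimps data:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FollowEvent:
    """A follow/unfollow inferred from two crawls (hypothetical record type)."""
    follower_id: int
    followee_id: int
    kind: str                 # "follow" or "unfollow"
    window: tuple[str, str]   # (earlier crawl time, later crawl time)


def diff_follow_snapshots(earlier, later, t0, t1):
    """Infer follow changes between two crawls of a_follows_b edges.

    earlier, later: iterables of (follower_id, followee_id) pairs crawled at
    times t0 and t1. Each change is only known to have happened somewhere in
    (t0, t1], so that window is what gets recorded.
    """
    before, after = set(earlier), set(later)
    # Edges present only in the later crawl are new follows.
    events = [FollowEvent(a, b, "follow", (t0, t1)) for a, b in sorted(after - before)]
    # Edges present only in the earlier crawl were dropped.
    events += [FollowEvent(a, b, "unfollow", (t0, t1)) for a, b in sorted(before - after)]
    return events


if __name__ == "__main__":
    march = {(1, 2), (1, 3)}
    april = {(1, 3), (1, 4)}
    for event in diff_follow_snapshots(march, april, "2010-03-01", "2010-04-01"):
        print(event)
```

With more than two crawls, running this over each consecutive pair narrows every edge's creation time to one crawl interval.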
Kovas
Infoharmoni
On Apr 14, 2010, at 1:51 PM, "Philip (flip) Kromer"
<[email protected]> wrote:
Hi all,
I'm pleased to announce that Infochimps is making datasets from our
massive scrape of the Twitter corpus available for Chirp Hack Day
devs.
There's a big opportunity for apps that draw on the historical
record and *structure* of twitter -- apps that require a global
perspective and intense computation. The following are available to
mash up against other datasets from infochimps.org or even just to
bootstrap-seed the database for your Hack Day application. We also
have a 30-machine cluster up to do further extractions, so if you
have something really interesting you'd like to pull, please let me
know.
- Reputation Metrics from Reply and Follow graphs: uses an algorithm
  similar to PageRank to derive two reputation scores, one from the
  a_follows_b graph and one from the a_replies_b graph.
- Reply/retweet/mention graph: every reply, retweet, or mention seen
  in a 1.6B-tweet sample (about 15% of the historical record).
  Fields: a_[rel]_b, user_a_id, user_b_id, tweet_id
- Twitter Users by Background Color: the number of users with each
  background color. Fields: color code, user count
- Twitter Users by Friends Count: the number of users with a given
  number of friends. Fields: number of friends, user count
- Twitter Users by Followers Count: the number of users with a given
  number of followers. Fields: number of followers, user count
- Twitter Users by Created At: the number of users whose accounts were
  created in a given month/day/hour, along with the earliest seen ID in
  that hour. Fields: timestamp (to month/day/hour), user count
- Smileys: smiley faces with user, date, tweet_id
- Hashtags: hashtags with user, date, tweet_id
- TweetUrl: URLs with user, date, tweet_id
- Twitter Users by Location: the number of users per location string
  (as provided by the user in their profile). Fields: location, user count
- Stock Tweets: tweets that use the stock-symbol tag convention of
  $STOCKNAME or $$. A tweet is listed once for each tag it uses.
  Fields: stock_tweet (resource name), symbol captured, tweet object
  (all fields of the tweet)
- Stock Prices: daily stock prices for the NASDAQ, NYSE, and AMEX
  exchanges, 1970 to now. Fields: symbol, open, low, close, high, volume
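The reputation item above says only that the scoring is "similar to PageRank". For orientation, here is a minimal textbook PageRank (power iteration, damping 0.85) over (a, b) edge pairs, where a follows or replies to b and so passes part of its score to b. This is not Infochimps' actual scoring code; the function name, damping and iteration defaults are illustrative assumptions:

```python
def pagerank(edges, damping=0.85, iterations=50, tol=1e-10):
    """Plain PageRank over (a, b) pairs meaning "a follows (or replies to) b".

    Returns {user_id: score}; scores sum to 1.
    """
    out_links = {}
    nodes = set()
    for a, b in edges:
        out_links.setdefault(a, []).append(b)
        nodes.add(a)
        nodes.add(b)
    n = len(nodes)
    if n == 0:
        return {}

    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Score held by users who follow nobody is spread evenly over everyone.
        dangling = sum(rank[node] for node in nodes if node not in out_links)
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        # Each user splits its damped score equally among the users it points at.
        for a, targets in out_links.items():
            share = damping * rank[a] / len(targets)
            for b in targets:
                new_rank[b] += share
        delta = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if delta < tol:  # stop early once scores have settled
            break
    return rank


if __name__ == "__main__":
    # Users 1 and 2 both follow user 3.
    for user, score in sorted(pagerank([(1, 3), (2, 3)]).items()):
        print(user, round(score, 4))
```

On that toy graph user 3 ends up with the highest score. At the scale of the full follow graph (about 1.6 billion edges) a dict-based version won't fit in one process; each iteration would instead be a distributed pass over the edge list.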
Parameters for what's available:

    dataset        raw size  number of objects
    a_follows_b     45.8 GB      1,587,838,568
    a_mentions_b    29.5 GB        493,682,309
    a_retweets_b     1.6 GB         36,022,061
    twitter_user     3.1 GB         43,261,388
    tweets         376.0 GB      1,641,624,381
    hashtag          7.1 GB        139,916,844
    smiley           4.4 GB         99,272,082
    tweet_url       29.5 GB        433,278,116

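Dividing raw size by object count gives a rough per-record footprint, handy for estimating how much of a slice will fit on one machine. A quick sketch, assuming "GB" here means 10**9 bytes (the email doesn't say); the names below are illustrative:

```python
# (dataset, raw size in GB, number of objects), as listed in the table above
DATASETS = [
    ("a_follows_b", 45.8, 1_587_838_568),
    ("a_mentions_b", 29.5, 493_682_309),
    ("a_retweets_b", 1.6, 36_022_061),
    ("twitter_user", 3.1, 43_261_388),
    ("tweets", 376.0, 1_641_624_381),
    ("hashtag", 7.1, 139_916_844),
    ("smiley", 4.4, 99_272_082),
    ("tweet_url", 29.5, 433_278_116),
]

# Assumption: decimal gigabytes. Use 2**30 if the sizes are binary gigabytes.
BYTES_PER_GB = 10**9


def bytes_per_object(size_gb, count):
    """Average raw bytes per record for one dataset."""
    return size_gb * BYTES_PER_GB / count


if __name__ == "__main__":
    for name, size_gb, count in DATASETS:
        print(f"{name:<14}{bytes_per_object(size_gb, count):8.1f} bytes/object")
```

Under that assumption a follow edge averages about 29 bytes and a tweet about 229 bytes; with binary gigabytes each figure would be roughly 7% larger.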
If you'd like access to any of these, or have an idea that needs
something /not/ here, please let me know ([email protected]).
We're only opening access to Hack Day devs for now -- but please let
us know your ideas so we can show Twitter how much demand there is
for aggregated access to data.
best,
flip
@mrflip
512-659-6846
----
http://infochimps.org
Find any dataset in the world