Re: [twitter-dev] Infochimps Datasets available for Hack Day: drawn from 1.6B tweets, 40M+ users+reputation, ~0.5B reply links, more!
"Philip (flip) Kromer" wrote:
> Hi all,
>
> I'm pleased to announce that Infochimps is making datasets from our
> massive scrape of the Twitter corpus available for Chirp Hack day
> devs.
>
> There's a big opportunity for apps that draw on the historical record
> and *structure* of twitter -- apps that require a global perspective
> and intense computation. The following are available to mash up
> against other datasets from infochimps.org or even just to
> bootstrap-seed the database for your Hack Day application. We also
> have a 30-machine cluster up to do further extractions, so if you have
> something really interesting you'd like to pull please let me know.
>
> Reputation Metrics from Reply and Follow graphs
>   Uses algorithm similar to pagerank to derive reputation, one using
>   the a_follows_b graph and one using the a_replies_b graphs
> Reply/retweet/mention graph
>   Every observed Reply, retweet, or mention seen in a 1.6B-tweet
>   sample (about 15% of historical record):
>   a_[rel]_b, user_a_id, user_b_id, tweet_id
> Twitter Users by Background Color
>   The number of users with each background color:
>   color code, user count
> Twitter Users by Friends Count
>   The number of users with a given number of friends:
>   number of friends, user count
> Twitter Users by Followers Count
>   The number of users with a given number of followers:
>   number of followers, user count
> Twitter Users by Created At
>   The number of users whose accounts were created in a given
>   month/day/hour along with the earliest seen ID in that hour:
>   timestamp to month/day/hour, user count
> Smileys
>   Smiley faces with user, date, tweet_id
> Hashtags
>   Hashtags with user, date, tweet_id
> TweetUrl
>   URLs with user, date, tweet_id
> Twitter Users by Location
>   The number of users in a location string (as provided by the user
>   in their profile).
>   location, user count
> Stock Tweets
>   Tweets that include the stock symbol tag convention of $STOCKNAME
>   or $$. The tweet is listed for each time a tag is used in the tweet.
>   stock_tweet (resource name), symbol captured, tweet object (all
>   things in a tweet)
> Stock Prices
>   Daily stock prices for the NASDAQ, NYSE, AMEX exchanges 1970-now
>   symbol, open, low, close, high, volume
>
> Parameters for what's available:
>
>   raw object       size        number of objs
>   a_follows_b       45.8 GB    1,587,838,568
>   a_mentions_b      29.5 GB      493,682,309
>   a_retweets_b       1.6 GB       36,022,061
>   twitter_user       3.1 GB       43,261,388
>   tweets           376.0 GB    1,641,624,381
>   hashtag            7.1 GB      139,916,844
>   smiley             4.4 GB       99,272,082
>   tweet_url         29.5 GB      433,278,116
>
> If you'd like access to any of these, or have an idea that needs
> something /not/ here, please let me know ( f...@infochimps.org ).
> We're only opening access to Hack Day devs for now -- but please let
> us know your ideas so we can show twitter how much demand there is for
> aggregated access to data.
>
> best,
> flip
> @mrflip
> 512-659-6846
>
> http://infochimps.org
> Find any dataset in the world

This is too short notice for me to be able to come up with a use for these data. But for the future, do you by any chance have access to *intraday futures and options* time series? Daily stock data are more or less useless.
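
For anyone wondering what "the tweet is listed for each time a tag is used" might look like in practice, here is a minimal Python sketch of pulling $STOCKNAME tags out of tweet text and emitting one row per occurrence. This is not Infochimps' extraction code. The regex (a "$" followed by 1-5 ASCII letters), the function name stock_tweet_rows, and the row layout are all assumptions made for illustration, and the "$$" form of the convention is not handled because the announcement does not describe it.

```python
import re

# Assumption: a symbol is 1-5 ASCII letters right after a "$" that is not
# itself preceded by a word character or another "$". Not the real rule set.
CASHTAG = re.compile(r"(?<![\w$])\$([A-Za-z]{1,5})\b")


def stock_tweet_rows(tweet_id, text):
    """Return one (resource_name, symbol, tweet_id) row per tag occurrence.

    A tweet that uses two tags (or the same tag twice) yields two rows,
    mirroring "listed for each time a tag is used in the tweet".
    """
    return [
        ("stock_tweet", match.group(1).upper(), tweet_id)
        for match in CASHTAG.finditer(text)
    ]


# Hypothetical tweet, for illustration only.
rows = stock_tweet_rows(42, "Long $AAPL, short $msft, and more $AAPL. Lunch was $5.")
for row in rows:
    print(row)
```

Dollar amounts such as "$5" are skipped because the pattern requires letters, and strings longer than five letters are rejected outright rather than truncated.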
Re: [twitter-dev] Infochimps Datasets available for Hack Day: drawn from 1.6B tweets, 40M+ users+reputation, ~0.5B reply links, more!
Hi flip,

I'm hacking at chirp and am interested in a follows b, hashtags, and urls. What would be extra sweet would be timestamps on the follow relations, if you crawl the same person over time and we can see how that network evolves.

Thanks for making the data available.

Kovas
Infoharmoni

On Apr 14, 2010, at 1:51 PM, "Philip (flip) Kromer" wrote:
> Hi all,
>
> I'm pleased to announce that Infochimps is making datasets from our
> massive scrape of the Twitter corpus available for Chirp Hack day
> devs.
>
> There's a big opportunity for apps that draw on the historical record
> and *structure* of twitter -- apps that require a global perspective
> and intense computation. The following are available to mash up
> against other datasets from infochimps.org or even just to
> bootstrap-seed the database for your Hack Day application. We also
> have a 30-machine cluster up to do further extractions, so if you have
> something really interesting you'd like to pull please let me know.
>
> Reputation Metrics from Reply and Follow graphs
>   Uses algorithm similar to pagerank to derive reputation, one using
>   the a_follows_b graph and one using the a_replies_b graphs
> Reply/retweet/mention graph
>   Every observed Reply, retweet, or mention seen in a 1.6B-tweet
>   sample (about 15% of historical record):
>   a_[rel]_b, user_a_id, user_b_id, tweet_id
> Twitter Users by Background Color
>   The number of users with each background color:
>   color code, user count
> Twitter Users by Friends Count
>   The number of users with a given number of friends:
>   number of friends, user count
> Twitter Users by Followers Count
>   The number of users with a given number of followers:
>   number of followers, user count
> Twitter Users by Created At
>   The number of users whose accounts were created in a given
>   month/day/hour along with the earliest seen ID in that hour:
>   timestamp to month/day/hour, user count
> Smileys
>   Smiley faces with user, date, tweet_id
> Hashtags
>   Hashtags with user, date, tweet_id
> TweetUrl
>   URLs with user, date, tweet_id
> Twitter Users by Location
>   The number of users in a location string (as provided by the user
>   in their profile).
>   location, user count
> Stock Tweets
>   Tweets that include the stock symbol tag convention of $STOCKNAME
>   or $$. The tweet is listed for each time a tag is used in the tweet.
>   stock_tweet (resource name), symbol captured, tweet object (all
>   things in a tweet)
> Stock Prices
>   Daily stock prices for the NASDAQ, NYSE, AMEX exchanges 1970-now
>   symbol, open, low, close, high, volume
>
> Parameters for what's available:
>
>   raw object       size        number of objs
>   a_follows_b       45.8 GB    1,587,838,568
>   a_mentions_b      29.5 GB      493,682,309
>   a_retweets_b       1.6 GB       36,022,061
>   twitter_user       3.1 GB       43,261,388
>   tweets           376.0 GB    1,641,624,381
>   hashtag            7.1 GB      139,916,844
>   smiley             4.4 GB       99,272,082
>   tweet_url         29.5 GB      433,278,116
>
> If you'd like access to any of these, or have an idea that needs
> something /not/ here, please let me know (f...@infochimps.org).
> We're only opening access to Hack Day devs for now -- but please let
> us know your ideas so we can show twitter how much demand there is for
> aggregated access to data.
>
> best,
> flip
> @mrflip
> 512-659-6846
>
> http://infochimps.org
> Find any dataset in the world
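
Since the announcement describes the reputation metric only as "similar to pagerank", a rough Python sketch of textbook PageRank over a_[rel]_b edge records may help anyone planning to experiment with the graph files. It is not the Infochimps implementation. The tab-separated record layout, the function names parse_edges and pagerank, the damping factor of 0.85, and the assumption that a_follows_b records share the first three fields of the documented reply/retweet/mention schema are all guesses for illustration. It also holds the whole graph in memory, so it only suits small extracts, not the full 1.6B-edge file.

```python
from collections import defaultdict


def parse_edges(lines):
    """Yield (user_a_id, user_b_id) pairs from edge records.

    Assumed layout (tab-separated, not confirmed by the announcement):
    a_[rel]_b <TAB> user_a_id <TAB> user_b_id [<TAB> tweet_id]
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3:
            yield int(fields[1]), int(fields[2])


def pagerank(edges, damping=0.85, iterations=50):
    """Plain power-iteration PageRank; returns {user_id: score}, summing to 1."""
    out_links = defaultdict(set)
    nodes = set()
    for a, b in edges:
        out_links[a].add(b)
        nodes.update((a, b))
    n = len(nodes)
    if n == 0:
        return {}
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        # Rank held by users with no outgoing edges is spread evenly.
        dangling = sum(rank[u] for u in nodes if not out_links.get(u))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {u: base for u in nodes}
        for a, targets in out_links.items():
            share = damping * rank[a] / len(targets)
            for b in targets:
                new_rank[b] += share
        rank = new_rank
    return rank


# Hypothetical sample: users 1 and 2 follow user 3, and user 3 follows user 1.
sample = ["a_follows_b\t1\t3", "a_follows_b\t2\t3", "a_follows_b\t3\t1"]
scores = pagerank(parse_edges(sample))
for user_id, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(user_id, round(score, 4))
```

In this toy graph user 3 ranks highest because two users point at it, user 1 comes next because it is followed by the top-ranked user, and user 2 (followed by nobody) ranks last.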