Hi all,

I'm pleased to announce that Infochimps is making datasets from our massive scrape of the Twitter corpus available to Chirp Hack Day devs.
There's a big opportunity for apps that draw on the historical record and *structure* of Twitter: apps that require a global perspective and intense computation. You can mash the datasets below up against other datasets from infochimps.org, or simply use them to seed the database for your Hack Day application. We also have a 30-machine cluster up for further extractions, so if there's something really interesting you'd like to pull, please let me know.

*Reputation Metrics from Reply and Follow Graphs*
Two reputation scores derived with an algorithm similar to PageRank: one from the a_follows_b graph and one from the a_replies_b graph.

*Reply/Retweet/Mention Graph*
Every reply, retweet, or mention observed in a 1.6B-tweet sample (about 15% of the historical record).
Fields: a_[rel]_b, user_a_id, user_b_id, tweet_id

*Twitter Users by Background Color*
The number of users with each background color.
Fields: color code, user count

*Twitter Users by Friends Count*
The number of users with a given number of friends.
Fields: number of friends, user count

*Twitter Users by Followers Count*
The number of users with a given number of followers.
Fields: number of followers, user count

*Twitter Users by Created At*
The number of users whose accounts were created in a given month/day/hour, along with the earliest ID seen in that hour.
Fields: timestamp (truncated to month/day/hour), user count

*Smileys*
Smiley faces, with user, date, tweet_id

*Hashtags*
Hashtags, with user, date, tweet_id

*TweetUrl*
URLs, with user, date, tweet_id

*Twitter Users by Location*
The number of users per location string (as entered by users in their profiles).
Fields: location, user count

*Stock Tweets*
Tweets that use the stock-symbol tag convention $STOCKNAME or $$. A tweet is listed once for each tag it contains.
Fields: stock_tweet (resource name), symbol captured, tweet object (all fields of the tweet)

*Stock Prices*
Daily stock prices for the NASDAQ, NYSE, and AMEX exchanges, 1970 to the present.
Fields: symbol, open, low, close, high, volume

Parameters for what's available:

  object          raw size   number of objects
  a_follows_b      45.8 GB       1,587,838,568
  a_mentions_b     29.5 GB         493,682,309
  a_retweets_b      1.6 GB          36,022,061
  twitter_user      3.1 GB          43,261,388
  tweets          376.0 GB       1,641,624,381
  hashtag           7.1 GB         139,916,844
  smiley            4.4 GB          99,272,082
  tweet_url        29.5 GB         433,278,116

If you'd like access to any of these, or have an idea that needs something /not/ here, please let me know ([email protected]). We're only opening access to Hack Day devs for now, but please send us your ideas so we can show Twitter how much demand there is for aggregated access to data.

best,
flip
@mrflip
512-659-6846
----
http://infochimps.org
Find any dataset in the world
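The reputation metrics are described only as using "an algorithm similar to PageRank." To illustrate that family of algorithms, here is a minimal sketch of standard power-iteration PageRank over a follow graph. It is not Infochimps' actual method. The tab-separated record layout read by the hypothetical `read_edges` helper (relationship name, user_a_id, user_b_id) is an assumption modeled on the a_[rel]_b field list, and the damping factor and iteration count are conventional defaults rather than values from this announcement.

```python
from collections import defaultdict


def read_edges(lines, rel="a_follows_b"):
    """Yield (user_a_id, user_b_id) pairs for one relationship type.

    Assumed (hypothetical) layout: rel<TAB>user_a_id<TAB>user_b_id[<TAB>...]
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3 and fields[0] == rel:
            yield fields[1], fields[2]


def pagerank(edges, damping=0.85, iterations=50):
    """Standard power-iteration PageRank; an edge a -> b passes score from a to b."""
    out_links = defaultdict(list)
    nodes = set()
    for src, dst in edges:
        out_links[src].append(dst)
        nodes.update((src, dst))
    n = len(nodes)
    if n == 0:
        return {}
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Users who follow nobody ("dangling" nodes) spread their score evenly.
        dangling = sum(rank[node] for node in nodes if node not in out_links)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        # Each user splits its damped score equally among the users it follows.
        for src, targets in out_links.items():
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    sample = [
        "a_follows_b\t1\t2",
        "a_follows_b\t3\t2",
        "a_follows_b\t2\t1",
    ]
    scores = pagerank(read_edges(sample))
    for user, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(user, round(score, 3))
```

In the sample, user 2 is followed by two users and ends up with the highest score. This in-memory version counts duplicate edges as extra weight; at the scale of 1.6B follow edges it would need a distributed implementation instead of Python dictionaries.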
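The Stock Tweets dataset lists a tweet once for each stock tag it contains. A minimal sketch of that one-row-per-occurrence extraction follows, assuming a tag is `$` followed by one to five letters. This is a simplification: the `$$` convention is not handled, and `stock_tweet_rows` is a hypothetical helper, not part of the dataset.

```python
import re

# Assumed pattern: "$" plus 1-5 letters, not glued to a preceding word character
# or "$", and ending at a word boundary (so "$5" and "$TOOLONGNAME" do not match).
CASHTAG = re.compile(r"(?<![\w$])\$([A-Za-z]{1,5})\b")


def stock_tweet_rows(tweet_id, text):
    """Return one (symbol, tweet_id) row for every tag occurrence, repeats included."""
    return [(match.group(1).upper(), tweet_id) for match in CASHTAG.finditer(text)]


if __name__ == "__main__":
    print(stock_tweet_rows("42", "Buying $AAPL and $goog, selling $AAPL later"))
```

Emitting a row per occurrence, rather than per distinct symbol, mirrors the "listed once for each tag" rule, so a tweet that mentions $AAPL twice yields two AAPL rows.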
