Hadoop Fans,

I'm happy to announce a new tool from the Cloudera team.
We often found that our customers wanted to import data from RDBMSs into Hadoop so they could conduct deeper analysis. To facilitate this, we built a command-line tool that extracts data from any JDBC source and supports database-specific extensions to increase performance (we ship with an improved MySQL extension that leverages mysqldump, and we look forward to developing additional extensions with the community). We affectionately refer to this tool as Sqoop: SQL to Hadoop.

Sqoop is available with the most recent update to Cloudera's Distribution for Hadoop (http://www.cloudera.com/hadoop) and has been contributed to Apache as well.

You can use Sqoop to dump individual tables or entire databases to Hadoop. By default, it reads data using DBInputFormat. It also generates all of the Java classes needed to work with your records, and it can import data directly into Hive.

You can get more details and see a video of Aaron Kimball's presentation at last month's Hadoop User Group meeting at Y!:
http://www.cloudera.com/blog/2009/06/01/introducing-sqoop/

Also, our upcoming intermediate training session in Washington DC will cover Sqoop usage in detail:
http://www.eventbrite.com/event/351945679

Cheers,
Christophe and the Cloudera Team

--
get hadoop: cloudera.com/hadoop
online training: cloudera.com/hadoop-training
blog: cloudera.com/blog
twitter: twitter.com/cloudera