I'm happy to announce a new tool from the Cloudera team.
Our customers often want to import data from RDBMSs into Hadoop so
they can conduct deeper analysis. To facilitate this, we built a
command line tool that extracts data from any JDBC source and
supports database-specific extensions to increase performance (we
ship with an improved MySQL extension that leverages mysqldump and
look forward to developing additional extensions with the community).
We affectionately refer to this tool as Sqoop: SQL to Hadoop. Sqoop is
available with the most recent update to Cloudera's Distribution for
Hadoop (http://www.cloudera.com/hadoop) and has been contributed to
Apache as well.
You can use Sqoop to dump tables or entire databases to Hadoop. By
default, it reads data through DBInputFormat. It also generates all of
the necessary Java classes to work with your records and can import
data directly into Hive.
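To give a rough idea of what a per-table record class can look like,
here is a minimal sketch. It is not the code Sqoop generates: the
EMPLOYEES table, its three columns, and the EmployeeRecord class name
are all assumptions made up for illustration. It only shows the general
shape of a class that fills its fields from a JDBC ResultSet row and
renders the row as delimited text.

```java
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical record class for an assumed table
// EMPLOYEES(id INT, name VARCHAR, salary DOUBLE).
// Illustrative only; this is not output produced by Sqoop.
class EmployeeRecord {
    private int id;
    private String name;
    private double salary;

    EmployeeRecord() {}

    EmployeeRecord(int id, String name, double salary) {
        this.id = id;
        this.name = name;
        this.salary = salary;
    }

    // Populate the fields from the current row of a JDBC result set.
    // JDBC column indexes start at 1.
    void readFields(ResultSet rs) throws SQLException {
        this.id = rs.getInt(1);
        this.name = rs.getString(2);
        this.salary = rs.getDouble(3);
    }

    // Render the record as one comma-delimited line of text.
    @Override
    public String toString() {
        return id + "," + name + "," + salary;
    }

    public static void main(String[] args) {
        EmployeeRecord r = new EmployeeRecord(7, "Ada", 1200.5);
        System.out.println(r);
    }
}
```

The point of having the tool generate a class like this for you is
that the field list and types follow the table's schema, so you do not
have to write and maintain them by hand for every table you import.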
You can get more details and see a video of Aaron Kimball's
presentation at last month's Hadoop User Group meeting at Y!:
Also, our upcoming intermediate training session in Washington DC will
cover Sqoop usage in detail: http://www.eventbrite.com/event/351945679
Christophe and the Cloudera Team
get hadoop: cloudera.com/hadoop
online training: cloudera.com/hadoop-training