Sqoop is Java based, so you should have JDK 1.6 or higher available on your system. We will add this as a dependency for the package.
Regarding accessing MySQL from a cluster - it should not be a problem if you control the number of tasks that do that. Sqoop allows you to explicitly specify the number of mappers, where each mapper holds a connection to the database and effectively parallelizes the loading process. Apart from just the speed, Sqoop offers other advantages too, such as incremental loads, exporting data from HDFS back to the database, automatic creation of Hive tables, populating HBase, etc.

Arvind

2010/11/4 Anze <[email protected]>
>
> So Sqoop doesn't require JDK?
> It seemed weird to me too. Also, if it did require it, then JDK would
> probably have to be among the dependencies of the package Sqoop is in.
>
> I started working on DBLoader, but the learning curve seems quite steep
> and I don't have enough time for it right now. Also, as Ankur said, it
> might not be a good idea to hit MySQL from the cluster.
>
> The ideal solution IMHO would be loading data from MySQL to HDFS from a
> single machine (but within LoadFunc, of course) and working with the data
> from there (with the schema automatically converted from MySQL). But I
> don't know enough about Pig to do that kind of thing... yet. :)
>
> Anze
>
> On Wednesday 03 November 2010, [email protected] wrote:
> > Sorry that you ran into a problem. Typically something like a missing
> > required option causes this, and if you were to send a mail to
> > [email protected], you would get prompt assistance.
> >
> > Regardless, if you still have any use cases like this, I will be glad to
> > help you out in using Sqoop for that purpose.
> >
> > Arvind
> >
> > 2010/11/3 Anze <[email protected]>
> >
> > > I tried to run it, got a NullPointerException, searched the net, found
> > > that Sqoop requires a JDK (instead of a JRE) and gave up. I am working
> > > on a production cluster - so I'd rather not upgrade to a JDK if not
> > > necessary.
> > > :)
> > >
> > > But I was able to export from MySQL with a simple bash script:
> > > **********
> > > #!/bin/bash
> > >
> > > MYSQL_TABLES=( table1 table2 table3 )
> > > WHERE=/home/hadoop/pig
> > >
> > > for i in "${MYSQL_TABLES[@]}"
> > > do
> > >   mysql -BAN -h <mysql_host> -u <username> --password=<pass> <database> \
> > >     -e "select * from $i;" --skip-column-names > $WHERE/$i.csv
> > >   hadoop fs -copyFromLocal $WHERE/$i.csv /pig/mysql/
> > >   rm $WHERE/$i.csv
> > > done
> > > **********
> > >
> > > Of course, in my case the tables were small enough that I could do it.
> > > And of course I lost the schema in the process.
> > >
> > > Hope it helps someone else too...
> > >
> > > Anze
> > >
> > > On Wednesday 03 November 2010, [email protected] wrote:
> > > > Anze,
> > > >
> > > > Did you get a chance to try out Sqoop? If not, I would encourage you
> > > > to do so. Here is a link to the user guide:
> > > > http://archive.cloudera.com/cdh/3/sqoop/SqoopUserGuide.html
> > > >
> > > > Sqoop allows you to easily move data across from relational databases
> > > > and other enterprise systems to HDFS and back.
> > > >
> > > > Arvind
> > > >
> > > > 2010/11/3 Anze <[email protected]>
> > > >
> > > > > Alejandro, thanks for answering!
> > > > >
> > > > > I was hoping it could be done directly from Pig, but... :)
> > > > >
> > > > > I'll take a look at Sqoop then, and if that doesn't help, I'll just
> > > > > write a simple batch to export the data to TXT/CSV. Thanks for the
> > > > > pointer!
> > > > >
> > > > > Anze
> > > > >
> > > > > On Wednesday 03 November 2010, Alejandro Abdelnur wrote:
> > > > > > Not a 100% Pig solution, but you could use Sqoop to get the data
> > > > > > in as a pre-processing step. And if you want to handle it all as
> > > > > > a single job, you could use Oozie to create a workflow that does
> > > > > > Sqoop and then your Pig processing.
> > > > > >
> > > > > > Alejandro
> > > > > >
> > > > > > On Wed, Nov 3, 2010 at 3:22 PM, Anze <[email protected]> wrote:
> > > > > > > Hi!
> > > > > > >
> > > > > > > Part of the data I have resides in MySQL. Is there a loader
> > > > > > > that would allow loading directly from it?
> > > > > > >
> > > > > > > I can't find anything on the net, but it seems to me this must
> > > > > > > be a quite common problem.
> > > > > > > I checked piggybank but there is only DBStorage (and no
> > > > > > > DBLoader).
> > > > > > >
> > > > > > > Is some DBLoader out there too?
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Anze
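[Editor's note: the controlled-parallelism import Arvind describes above might look roughly like the sketch below. The host, database, credentials, table name, and target directory are placeholders, not taken from the thread; check the Sqoop user guide linked above for your version's exact options.]

```shell
#!/bin/bash
# Sketch of a Sqoop import with an explicit mapper count. --num-mappers
# caps the number of concurrent map tasks, and therefore the number of
# simultaneous MySQL connections, addressing the "hitting MySQL from the
# cluster" concern. All connection details below are placeholders.
SQOOP_IMPORT="sqoop import \
  --connect jdbc:mysql://mysql_host/database \
  --username username \
  --password pass \
  --table table1 \
  --num-mappers 4 \
  --target-dir /pig/mysql/table1"

# The other features Arvind mentions follow the same pattern, e.g.:
#   incremental load:  sqoop import ... --incremental append \
#                        --check-column id --last-value 0
#   HDFS back to DB:   sqoop export --connect ... --table table1 \
#                        --export-dir /pig/mysql/table1

echo "$SQOOP_IMPORT"
```

Unlike the bash/mysql script in the thread, this keeps the table's schema: Sqoop reads it from the database via JDBC and can carry it into generated Hive table definitions.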
