Anze - I just checked that our Sqoop packages do declare the JDK dependency. Which package did you see as not having this dependency?
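A quick way to double-check is to ask the package manager for the declared dependencies; the package name "sqoop" below is an assumption and may differ between distributions:

**********
# RPM-based systems: list what the installed package declares as required
rpm -q --requires sqoop | grep -i jdk

# Debian-based systems: show the package's dependency list
apt-cache depends sqoop
**********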
Arvind

On Thu, Nov 4, 2010 at 9:25 AM, [email protected] <[email protected]> wrote:

> Sqoop is Java based and you should have JDK 1.6 or higher available on your system. We will add this as a dependency for the package.
>
> Regarding accessing MySQL from a cluster - it should not be a problem if you control the number of tasks that do that. Sqoop allows you to explicitly specify the number of mappers, where each mapper holds a connection to the database and effectively parallelizes the loading process. Apart from just the speed, Sqoop offers many other advantages too, such as incremental loads, exporting data from HDFS back to the database, automatic creation of Hive tables, or populating HBase.
>
> Arvind
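To make the mapper setting concrete, a minimal parallel import of a single table might look like the sketch below - the connection details are placeholders, and four mappers is just an example value:

**********
# Import one MySQL table into HDFS using 4 parallel mappers,
# each holding its own connection to the database.
sqoop import \
  --connect jdbc:mysql://<mysql_host>/<database> \
  --username <username> --password <pass> \
  --table table1 \
  --target-dir /pig/mysql/table1 \
  -m 4
**********

By default Sqoop splits the rows on the table's primary key, so each mapper reads a disjoint slice of the table.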
> 2010/11/4 Anze <[email protected]>:
>
>> So Sqoop doesn't require JDK? It seemed weird to me too. Also, if it did require it, then the JDK would probably have to be among the dependencies of the package Sqoop is in.
>>
>> I started working on DBLoader, but the learning curve seems quite steep and I don't have enough time for it right now. Also, as Ankur said, it might not be a good idea to hit MySQL from the cluster.
>>
>> The ideal solution IMHO would be loading data from MySQL to HDFS from a single machine (but within LoadFunc, of course) and working with the data from there (with the schema automatically converted from MySQL). But I don't know enough about Pig to do that kind of thing... yet. :)
>>
>> Anze
>>
>> On Wednesday 03 November 2010, [email protected] wrote:
>>
>>> Sorry that you ran into a problem. Typically it is something like a missing required option that causes this, and if you were to send a mail to [email protected], you would get prompt assistance.
>>>
>>> Regardless, if you still have any use cases like this, I will be glad to help you out in using Sqoop for that purpose.
>>>
>>> Arvind
>>>
>>> 2010/11/3 Anze <[email protected]>:
>>>
>>>> I tried to run it, got a NullPointerException, searched the net, found that Sqoop requires the JDK (instead of the JRE) and gave up. I am working on a production cluster - so I'd rather not upgrade to the JDK if not necessary. :)
>>>>
>>>> But I was able to export MySQL with a simple bash script:
>>>>
>>>> **********
>>>> #!/bin/bash
>>>>
>>>> MYSQL_TABLES=( table1 table2 table3 )
>>>> WHERE=/home/hadoop/pig
>>>>
>>>> for i in "${MYSQL_TABLES[@]}"
>>>> do
>>>>   # Dump the table as tab-separated values, without column headers
>>>>   mysql -BAN -h <mysql_host> -u <username> --password=<pass> <database> \
>>>>     -e "select * from $i;" --skip-column-names > $WHERE/$i.csv
>>>>
>>>>   # Copy the dump into HDFS, then remove the local file
>>>>   hadoop fs -copyFromLocal $WHERE/$i.csv /pig/mysql/
>>>>   rm $WHERE/$i.csv
>>>> done
>>>> **********
>>>>
>>>> Of course, in my case the tables were small enough that I could do it. And of course I lost the schema in the process.
>>>>
>>>> Hope it helps someone else too...
>>>>
>>>> Anze
>>>>
>>>> On Wednesday 03 November 2010, [email protected] wrote:
>>>>
>>>>> Anze,
>>>>>
>>>>> Did you get a chance to try out Sqoop? If not, I would encourage you to do so. Here is a link to the user guide: <http://archive.cloudera.com/cdh/3/sqoop/SqoopUserGuide.html>
>>>>>
>>>>> Sqoop allows you to easily move data across from relational databases and other enterprise systems to HDFS and back.
>>>>>
>>>>> Arvind
>>>>>
>>>>> 2010/11/3 Anze <[email protected]>:
>>>>>
>>>>>> Alejandro, thanks for answering!
>>>>>>
>>>>>> I was hoping it could be done directly from Pig, but... :)
>>>>>>
>>>>>> I'll take a look at Sqoop then, and if that doesn't help, I'll just write a simple batch to export data to TXT/CSV. Thanks for the pointer!
>>>>>>
>>>>>> Anze
>>>>>>
>>>>>> On Wednesday 03 November 2010, Alejandro Abdelnur wrote:
>>>>>>
>>>>>>> Not a 100% Pig solution, but you could use Sqoop to get the data in as a pre-processing step. And if you want to handle it all as a single job, you could use Oozie to create a workflow that runs the Sqoop load and then your Pig processing.
>>>>>>>
>>>>>>> Alejandro
>>>>>>>
>>>>>>> On Wed, Nov 3, 2010 at 3:22 PM, Anze <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi!
>>>>>>>>
>>>>>>>> Part of the data I have resides in MySQL. Is there a loader that would allow loading directly from it?
>>>>>>>>
>>>>>>>> I can't find anything on the net, but it seems to me this must be a quite common problem. I checked piggybank but there is only DBStorage (and no DBLoader).
>>>>>>>>
>>>>>>>> Is some DBLoader out there too?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Anze
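As a footnote to the bash script quoted above: a rough sketch of the same per-table load done with Sqoop, which would also keep the schema by having Sqoop create a matching Hive table. The connection details are placeholders again, and --hive-import assumes Hive is available on the client:

**********
# For each table, import the rows into HDFS and generate a Hive table
# whose schema mirrors the MySQL definition.
for i in table1 table2 table3
do
  sqoop import \
    --connect jdbc:mysql://<mysql_host>/<database> \
    --username <username> --password <pass> \
    --table $i \
    --hive-import
done
**********

That addresses the schema loss mentioned in the thread, at the cost of the JDK requirement discussed above.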
