Anze - I just checked that our Sqoop packages do declare the JDK dependency. Which package did you see as not having this dependency?
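A quick way to double-check is to ask the package manager for the declared dependencies; the package name "sqoop" below is an assumption and may differ between distributions:

**********
# RPM-based systems: list what the installed package declares as required
rpm -q --requires sqoop | grep -i jdk

# Debian-based systems: show the package's dependency list
apt-cache depends sqoop
**********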
Arvind

On Thu, Nov 4, 2010 at 9:25 AM, [email protected] <[email protected]> wrote:

> Sqoop is Java based and you should have JDK 1.6 or higher available on your system. We will add this as a dependency for the package.
>
> Regarding accessing MySQL from a cluster - it should not be a problem if you control the number of tasks that do that. Sqoop allows you to explicitly specify the number of mappers, where each mapper holds a connection to the database and effectively parallelizes the loading process. Apart from just the speed, Sqoop offers many other advantages too, such as incremental loads, exporting data from HDFS back to the database, automatic creation of Hive tables, or populating HBase.
>
> Arvind
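To make the mapper setting concrete, a minimal parallel import of a single table might look like the sketch below - the connection details are placeholders, and four mappers is just an example value:

**********
# Import one MySQL table into HDFS using 4 parallel mappers,
# each holding its own connection to the database.
sqoop import \
  --connect jdbc:mysql://<mysql_host>/<database> \
  --username <username> --password <pass> \
  --table table1 \
  --target-dir /pig/mysql/table1 \
  -m 4
**********

By default Sqoop splits the rows on the table's primary key, so each mapper reads a disjoint slice of the table.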
> 2010/11/4 Anze <[email protected]>:
>
>> So Sqoop doesn't require JDK? It seemed weird to me too. Also, if it did require it, then the JDK would probably have to be among the dependencies of the package Sqoop is in.
>>
>> I started working on DBLoader, but the learning curve seems quite steep and I don't have enough time for it right now. Also, as Ankur said, it might not be a good idea to hit MySQL from the cluster.
>>
>> The ideal solution IMHO would be loading data from MySQL to HDFS from a single machine (but within LoadFunc, of course) and working with the data from there (with the schema automatically converted from MySQL). But I don't know enough about Pig to do that kind of thing... yet. :)
>>
>> Anze
>>
>> On Wednesday 03 November 2010, [email protected] wrote:
>>
>>> Sorry that you ran into a problem. Typically it is something like a missing required option that causes this, and if you were to send a mail to [email protected], you would get prompt assistance.
>>>
>>> Regardless, if you still have any use cases like this, I will be glad to help you out in using Sqoop for that purpose.
>>>
>>> Arvind
>>>
>>> 2010/11/3 Anze <[email protected]>:
>>>
>>>> I tried to run it, got a NullPointerException, searched the net, found that Sqoop requires the JDK (instead of the JRE) and gave up. I am working on a production cluster - so I'd rather not upgrade to the JDK if not necessary. :)
>>>>
>>>> But I was able to export MySQL with a simple bash script:
>>>>
>>>> **********
>>>> #!/bin/bash
>>>>
>>>> MYSQL_TABLES=( table1 table2 table3 )
>>>> WHERE=/home/hadoop/pig
>>>>
>>>> for i in "${MYSQL_TABLES[@]}"
>>>> do
>>>>   # Dump the table as tab-separated values, without column headers
>>>>   mysql -BAN -h <mysql_host> -u <username> --password=<pass> <database> \
>>>>     -e "select * from $i;" --skip-column-names > $WHERE/$i.csv
>>>>
>>>>   # Copy the dump into HDFS, then remove the local file
>>>>   hadoop fs -copyFromLocal $WHERE/$i.csv /pig/mysql/
>>>>   rm $WHERE/$i.csv
>>>> done
>>>> **********
>>>>
>>>> Of course, in my case the tables were small enough that I could do it. And of course I lost the schema in the process.
>>>>
>>>> Hope it helps someone else too...
>>>>
>>>> Anze
>>>>
>>>> On Wednesday 03 November 2010, [email protected] wrote:
>>>>
>>>>> Anze,
>>>>>
>>>>> Did you get a chance to try out Sqoop? If not, I would encourage you to do so. Here is a link to the user guide: <http://archive.cloudera.com/cdh/3/sqoop/SqoopUserGuide.html>
>>>>>
>>>>> Sqoop allows you to easily move data across from relational databases and other enterprise systems to HDFS and back.
>>>>>
>>>>> Arvind
>>>>>
>>>>> 2010/11/3 Anze <[email protected]>:
>>>>>
>>>>>> Alejandro, thanks for answering!
>>>>>>
>>>>>> I was hoping it could be done directly from Pig, but... :)
>>>>>>
>>>>>> I'll take a look at Sqoop then, and if that doesn't help, I'll just write a simple batch to export data to TXT/CSV. Thanks for the pointer!
>>>>>>
>>>>>> Anze
>>>>>>
>>>>>> On Wednesday 03 November 2010, Alejandro Abdelnur wrote:
>>>>>>
>>>>>>> Not a 100% Pig solution, but you could use Sqoop to get the data in as a pre-processing step. And if you want to handle it all as a single job, you could use Oozie to create a workflow that runs the Sqoop load and then your Pig processing.
>>>>>>>
>>>>>>> Alejandro
>>>>>>>
>>>>>>> On Wed, Nov 3, 2010 at 3:22 PM, Anze <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi!
>>>>>>>>
>>>>>>>> Part of the data I have resides in MySQL. Is there a loader that would allow loading directly from it?
>>>>>>>>
>>>>>>>> I can't find anything on the net, but it seems to me this must be a quite common problem. I checked piggybank but there is only DBStorage (and no DBLoader).
>>>>>>>>
>>>>>>>> Is some DBLoader out there too?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Anze
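As a footnote to the bash script quoted above: a rough sketch of the same per-table load done with Sqoop, which would also keep the schema by having Sqoop create a matching Hive table. The connection details are placeholders again, and --hive-import assumes Hive is available on the client:

**********
# For each table, import the rows into HDFS and generate a Hive table
# whose schema mirrors the MySQL definition.
for i in table1 table2 table3
do
  sqoop import \
    --connect jdbc:mysql://<mysql_host>/<database> \
    --username <username> --password <pass> \
    --table $i \
    --hive-import
done
**********

That addresses the schema loss mentioned in the thread, at the cost of the JDK requirement discussed above.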
