I imagine writing a Pig 0.7+ loader for the Sqoop files would be pretty
easy, since IIRC Sqoop does generate an input format for you.

Good project for someone looking to get started in contributing to Pig ...
:)

-D
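
(For anyone landing on this thread: a minimal sketch of what a basic Sqoop
import looks like. The host, database, credentials and split column below are
placeholders, not anything from this thread, and the script only assembles and
prints the command so nothing is actually executed.)

```shell
# Hypothetical values -- substitute your own host, database and credentials.
DB_URL="jdbc:mysql://mysql.example.com/mydb"
TABLE="table1"

# --num-mappers controls the parallelism discussed downthread; --split-by
# names the column Sqoop uses to partition the table across mappers
# (usually the primary key).
SQOOP_CMD="sqoop import --connect $DB_URL --username myuser -P \
  --table $TABLE --split-by id --num-mappers 4 \
  --target-dir /pig/mysql/$TABLE"

# Print rather than run, so the sketch is safe to paste anywhere.
echo "$SQOOP_CMD"
```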

2010/11/4 Anze <[email protected]>

> Hi Arvind!
>
> Should we take this discussion off the list? It is not really Pig-related
> anymore... Not sure what the custom is around here. :)
>
> > > process. Apart from just the speed, Sqoop offers many other advantages
> > > too such as incremental loads, exporting data from HDFS back to the
> > > database, automatic creation of Hive tables or populating hbase etc.
>
> Only Pig is missing then... >:-D
> Sorry, couldn't hold that back... ;)
>
> I would love to use Sqoop for another task (periodically importing MySQL
> tables to HBase) if the schema gets more or less preserved, but I don't dare
> upgrade the JRE to a JDK at the moment for fear of breaking things.
>
> > Anze - I just checked that our Sqoop packages do declare the JDK
> > dependency. Which package did you see as not having this dependency?
>
> We are using:
> -----
> deb http://archive.cloudera.com/debian lenny-cdh3b1 contrib
> -----
> But there is no sqoop package per se; I guess it is part of the hadoop package:
> -----
> $ aptitude show hadoop-0.20 | grep Depends
> Depends: adduser, sun-java6-jre, sun-java6-bin
> -----
> $ aptitude search sun-java6 | grep "jdk\|jre"
> p   sun-java6-jdk                   - Sun Java(TM) Development Kit (JDK) 6
> i A sun-java6-jre                   - Sun Java(TM) Runtime Environment
> (JRE) 6
> -----
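>
> (Side note on the JRE/JDK point: Sqoop generates Java record classes and, as
> far as I understand, needs javac available, which the sun-java6-jre package
> doesn't ship. A quick sketch to check what a box actually has:)

```shell
# If javac is on the PATH, a full JDK is installed; if not, it is
# presumably a JRE-only box, which matches the Sqoop NPE symptom
# described in this thread.
if command -v javac >/dev/null 2>&1; then
  JAVA_KIND="jdk"
else
  JAVA_KIND="jre-only"
fi
echo "$JAVA_KIND"
```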
>
>
> This is where aa...@cloudera advises that the JDK is needed (instead of the
> JRE) for successfully running Sqoop:
>
> http://getsatisfaction.com/cloudera/topics/error_sqoop_sqoop_got_exception_running_sqoop_java_lang_nullpointerexception_java_lang_nullpointerexception-j7ziz
>
> As I said, I am interested in Sqoop (and alternatives) as we will be facing
> this problem in the near future, so I appreciate your involvement in this
> thread!
>
> Anze
>
>
> On Thursday 04 November 2010, [email protected] wrote:
> > Anze - I just checked that our Sqoop packages do declare the JDK
> > dependency. Which package did you see as not having this dependency?
> >
> > Arvind
> >
> > On Thu, Nov 4, 2010 at 9:25 AM, [email protected] <[email protected]> wrote:
> > > Sqoop is Java based and you should have JDK 1.6 or higher available on
> > > your system. We will add this as a dependency for the package.
> > >
> > > Regarding accessing MySQL from a cluster - it should not be a problem if
> > > you control the number of tasks that do that. Sqoop allows you to
> > > explicitly specify the number of mappers, where each mapper holds a
> > > connection to the database and effectively parallelizes the loading
> > > process. Apart from just the speed, Sqoop offers many other advantages
> > > too, such as incremental loads, exporting data from HDFS back to the
> > > database, automatic creation of Hive tables, populating HBase, etc.
> > >
> > > Arvind
> > >
> > > 2010/11/4 Anze <[email protected]>
> > >
> > >> So Sqoop doesn't require the JDK?
> > >> It seemed weird to me too. Also, if it required it, then the JDK would
> > >> probably have to be among the dependencies of the package Sqoop is in.
> > >>
> > >> I started working on DBLoader, but the learning curve seems quite steep
> > >> and I don't have enough time for it right now. Also, as Ankur said, it
> > >> might not be a good idea to hit MySQL from the cluster.
> > >>
> > >> The ideal solution IMHO would be loading data from MySQL to HDFS from a
> > >> single machine (but within a LoadFunc, of course) and working with the
> > >> data from there (with the schema automatically converted from MySQL).
> > >> But I don't know enough about Pig to do that kind of thing... yet. :)
> > >>
> > >> Anze
> > >>
> > >> On Wednesday 03 November 2010, [email protected] wrote:
> > >> > Sorry that you ran into a problem. Typically, it is usually something
> > >> > like missing a required option etc. that could cause this, and if you
> > >> > were to send a mail to [email protected], you would get prompt
> > >> > assistance.
> > >> >
> > >> > Regardless, if you still have any use cases like this, I will be glad
> > >> > to help you out in using Sqoop for that purpose.
> > >> >
> > >> > Arvind
> > >> >
> > >> > 2010/11/3 Anze <[email protected]>
> > >> >
> > >> > > I tried to run it, got a NullPointerException, searched the net,
> > >> > > found that Sqoop requires the JDK (instead of the JRE) and gave up.
> > >> > > I am working on a production cluster - so I'd rather not upgrade to
> > >> > > the JDK if not necessary. :)
> > >> > >
> > >> > > But I was able to export MySQL with a simple bash script:
> > >> > > **********
> > >> > > #!/bin/bash
> > >> > >
> > >> > > MYSQL_TABLES=( table1 table2 table3 )
> > >> > > WHERE=/home/hadoop/pig
> > >> > >
> > >> > > for i in "${MYSQL_TABLES[@]}"
> > >> > > do
> > >> > >   # dump each table headerless and tab-separated, then push to HDFS
> > >> > >   mysql -BAN -h <mysql_host> -u <username> --password=<pass> <database> \
> > >> > >     -e "select * from $i;" --skip-column-names > "$WHERE/$i.csv"
> > >> > >
> > >> > >   hadoop fs -copyFromLocal "$WHERE/$i.csv" /pig/mysql/
> > >> > >   rm "$WHERE/$i.csv"
> > >> > > done
> > >> > > **********
> > >> > >
> > >> > > Of course, in my case the tables were small enough that I could do
> > >> > > it. And of course I lost the schema in the process.
> > >> > >
> > >> > > Hope it helps someone else too...
> > >> > >
> > >> > > Anze
> > >> > >
> > >> > > On Wednesday 03 November 2010, [email protected] wrote:
> > >> > > > Anze,
> > >> > > >
> > >> > > > Did you get a chance to try out Sqoop? If not, I would encourage
> > >> > > > you to do so. Here is a link to the user guide:
> > >> > > > http://archive.cloudera.com/cdh/3/sqoop/SqoopUserGuide.html
> > >> > > >
> > >> > > > Sqoop allows you to easily move data from relational databases
> > >> > > > and other enterprise systems to HDFS and back.
> > >> > > >
> > >> > > > Arvind
> > >> > > >
> > >> > > > 2010/11/3 Anze <[email protected]>
> > >> > > >
> > >> > > > > Alejandro, thanks for answering!
> > >> > > > >
> > >> > > > > I was hoping it could be done directly from Pig, but... :)
> > >> > > > >
> > >> > > > > I'll take a look at Sqoop then, and if that doesn't help, I'll
> > >> > > > > just write a simple batch to export the data to TXT/CSV. Thanks
> > >> > > > > for the pointer!
> > >> > > > >
> > >> > > > > Anze
> > >> > > > >
> > >> > > > > On Wednesday 03 November 2010, Alejandro Abdelnur wrote:
> > >> > > > > > Not a 100% Pig solution, but you could use Sqoop to get the
> > >> > > > > > data in as a pre-processing step. And if you want to handle it
> > >> > > > > > all as a single job, you could use Oozie to create a workflow
> > >> > > > > > that does Sqoop and then your Pig processing.
> > >> > > > > >
> > >> > > > > > Alejandro
> > >> > > > > >
> > >> > > > > > On Wed, Nov 3, 2010 at 3:22 PM, Anze <[email protected]>
> > >>
> > >> wrote:
> > >> > > > > > > Hi!
> > >> > > > > > >
> > >> > > > > > > Part of the data I have resides in MySQL. Is there a loader
> > >> > > > > > > that would allow loading directly from it?
> > >> > > > > > >
> > >> > > > > > > I can't find anything on the net, but it seems to me this
> > >> > > > > > > must be a quite common problem.
> > >> > > > > > > I checked piggybank but there is only DBStorage (and no
> > >> > > > > > > DBLoader).
> > >> > > > > > >
> > >> > > > > > > Is some DBLoader out there too?
> > >> > > > > > >
> > >> > > > > > > Thanks,
> > >> > > > > > >
> > >> > > > > > > Anze
>
>
