Alternatively, one can reconfigure Hive to use a different database backend other than Derby (example configurations are in [1]), or specify an absolute path for Derby to avoid the relative path issue.
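To make the second option concrete, here is a minimal hive-site.xml sketch (the databaseName below just reuses the Hive home path from this thread as a placeholder; any absolute path works). It pins the embedded Derby metastore to a fixed location instead of whatever directory Hive or Sqoop happens to be started from:

  <!-- goes inside the <configuration> element of hive-site.xml -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <!-- absolute databaseName, so the metastore no longer depends on the working directory -->
    <value>jdbc:derby:;databaseName=/root/siddharth/tools/hive-0.11.0-bin/metastore_db;create=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>

With that in place, Sqoop's --hive-import and an interactive Hive shell resolve the same metastore regardless of the working directory; a shared MySQL-backed (or other external database) metastore as described in [1] achieves the same thing and is the usual choice beyond a single-user setup.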
Jarcec

Links:
1: http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/latest/CDH4-Installation-Guide/cdh4ig_hive_metastore_configure.html

On Sat, Jul 06, 2013 at 12:39:53AM +0530, Siddharth Karandikar wrote:
> Hi Andy,
>
> I have Sqoop installed in '/root/siddharth/tools/sqoop-1.4.3.bin__hadoop-1.0.0'
> and Hive in '/root/siddharth/tools/hive-0.11.0-bin'.
>
> I used to run the following command from within Sqoop's directory:
> ./bin/sqoop import --connect jdbc:mysql://localhost/ClassicModels -table
> Customers -m 1 --hive-home /root/siddharth/tools/hive-0.11.0-bin
> --hive-import --verbose --mysql-delimiters
>
> This would create a temporary HDFS file of the given MySQL table and then
> start Hive from the same directory to create the table and load the data.
> Hive's default behavior is to look for metastore_db in the current directory
> and, if it is not there, to create and use it. The above command used to run
> without errors. I would then go into Hive's directory
> '/root/siddharth/tools/hive-0.11.0-bin' and run hive to check the table newly
> created by Sqoop, and never find it there. The issue was that when Hive is
> started from its own directory, it again checks for the existence of
> metastore_db, which is not there (as it was created in Sqoop's directory), so
> it creates a new one that has no reference to the previously created table.
>
> What could also have worked in this case is, after running Sqoop's command,
> simply copying metastore_db from there into Hive's directory before starting
> the Hive shell.
>
> -
> Siddharth
>
>
> On Fri, Jul 5, 2013 at 7:53 PM, Han Sen Tey <[email protected]> wrote:
>
> > Good day Siddharth,
> >
> > Can you kindly elaborate further on what the problem was and how you
> > resolved it (configure the location of metastore_db in Hive?)? Thank you.
> >
> > Regards,
> > Andy
> >
> > ------------------------------
> > *From:* Siddharth Karandikar <[email protected]>
> > *To:* [email protected]
> > *Sent:* Friday, July 5, 2013 10:06:27 AM
> > *Subject:* Re: Zero rows imported while doing Mysql to Hive import
> >
> > Hi Jarek,
> >
> > Problem solved. It was my error!
> >
> > I was rereading your response and looked closely into the Sqoop directory,
> > where I found a metastore_db directory. Then I realized what you were
> > talking about and what the problem was. My bad!
> > As you mentioned, Sqoop was doing its work perfectly fine; it was Hive
> > that did not know about the new table being created, because of the
> > separate metastore_db.
> >
> > Now I have configured the location of metastore_db in Hive and everything
> > works great! :)
> >
> > Thanks Jarek.
> >
> > Regards,
> > Siddharth
> >
> >
> > On Fri, Jul 5, 2013 at 7:17 PM, Siddharth Karandikar <[email protected]> wrote:
> >
> > Hi,
> >
> > When I tried doing this MySQL-into-Hive import in 3 separate steps, it
> > worked!
> >
> > Here is what I ran:
> >
> > 1) Import the MySQL table into HDFS with the command:
> > ssk01:~/siddharth/tools/sqoop-1.4.3.bin__hadoop-1.0.0 # ./bin/sqoop import
> > --connect jdbc:mysql://localhost/ClassicModels -table Customers -m 1
> > --verbose --fields-terminated-by '\t' --lines-terminated-by '\n'
> >
> > 2) Create a table in Hive with the query (using the same query found in the
> > --verbose log of the failing Sqoop run):
> > hive> CREATE TABLE IF NOT EXISTS `Customers` ( `customerNumber` INT,
> > `customerName` STRING, `contactLastName` STRING, `contactFirstName` STRING,
> > `phone` STRING, `addressLine1` STRING, `addressLine2` STRING, `city`
> > STRING, `state` STRING, `postalCode` STRING, `country` STRING,
> > `salesRepEmployeeNumber` INT, `creditLimit` DOUBLE) COMMENT 'Imported by
> > sqoop on 2013/07/04 00:41:14' ROW FORMAT DELIMITED FIELDS TERMINATED BY
> > '\t' LINES TERMINATED BY '\n' STORED AS TEXTFILE;
> > OK
> > Time taken: 0.214 seconds
> >
> > 3) Load the data from HDFS with the query (using the same query found in the
> > --verbose log of the failing Sqoop run):
> > hive> LOAD DATA INPATH 'hdfs://localhost:9000/user/root/Customers' INTO
> > TABLE `Customers`;
> > Loading data to table default.customers
> >
> > Table default.customers stats: [num_partitions: 0, num_files: 2, num_rows:
> > 0, total_size: 15556, raw_data_size: 0]
> > OK
> > Time taken: 0.317 seconds
> >
> > hive> select count(*) from Customers;
> > .......
> > Total MapReduce CPU Time Spent: 2 seconds 460 msec
> > OK
> > 122
> > Time taken: 26.873 seconds, Fetched: 1 row(s)
> >
> > So the table got created and populated properly. I now wonder why it is
> > not working when I use --hive-import.
> >
> > Any pointers to debug this issue?
> >
> > Thanks,
> > Siddharth
> >
> >
> > On Thu, Jul 4, 2013 at 12:21 PM, Siddharth Karandikar <[email protected]> wrote:
> >
> > Hi Jarek,
> >
> > I have not re-configured Hive. I am using the default settings/locations.
> > I am using --hive-home to tell Sqoop where to find Hive.
> >
> > Here are the locations of my Sqoop, Hive and Hadoop instances.
> > Hadoop: /root/siddharth/tools/hadoop-1.1.2
> > Hive: /root/siddharth/tools/hive-0.11.0-bin
> > Sqoop: /root/siddharth/tools/sqoop-1.4.3.bin__hadoop-1.0.0
> >
> > And here are a few more details after running it with --verbose.
> >
> > I am using the following command to import into Hive:
> > ssk01:~/siddharth/tools/sqoop-1.4.3.bin__hadoop-1.0.0 # ./bin/sqoop
> > import --connect jdbc:mysql://localhost/ClassicModels -table Customers
> > -m 1 --hive-home /root/siddharth/tools/hive-0.11.0-bin --hive-import
> > --verbose --mysql-delimiters
> >
> > Verbose output of the above command:
> > http://pastebin.com/TcYG8vkr
> >
> > After running this command, here is what I see in Hive and HDFS.
> >
> > HDFS
> > =====
> > ssk01:~/siddharth/tools/hadoop-1.1.2 # bin/hadoop fs -ls
> > hdfs://localhost:9000/user/hive/warehouse/*
> > Found 2 items
> > -rw-r--r-- 1 root supergroup 0 2013-07-04 00:41
> > /user/hive/warehouse/customers/_SUCCESS
> > -rw-r--r-- 1 root supergroup 15569 2013-07-04 00:41
> > /user/hive/warehouse/customers/part-m-00000
> >
> > Hive (I am running Hive from its own directory, so the metadata should be
> > accessible)
> > ===========================================================
> > ssk01:~/siddharth/tools/hive-0.11.0-bin # ./bin/hive
> >
> > Logging initialized using configuration in
> > jar:file:/root/siddharth/tools/hive-0.11.0-bin/lib/hive-common-0.11.0.jar!/hive-log4j.properties
> > Hive history
> > file=/tmp/root/[email protected]_201307040044_383137545.txt
> > hive> show databases;
> > OK
> > default
> > Time taken: 8.035 seconds, Fetched: 1 row(s)
> >
> > hive> use default;
> > OK
> > Time taken: 0.018 seconds
> >
> > hive> show tables;
> > OK
> > Time taken: 4.175 seconds
> > hive>
> >
> > The strange thing is that the table default.customers doesn't exist in Hive
> > even though the Sqoop output mentioned it.
> >
> > Thanks,
> > Siddharth
> >
> >
> > On Wed, Jul 3, 2013 at 9:36 PM, Jarek Jarcec Cecho <[email protected]> wrote:
> > > Hi Siddharth,
> > > using a directory in the LOAD DATA command is completely valid. You can
> > > find more information about the command in the Hive documentation [1].
> > > I would estimate that your issue is more with parsing the data than with
> > > accessing it, since you are able to see the rows, just with incorrect
> > > values.
> > >
> > > Jarcec
> > >
> > > Links:
> > > 1: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML
> > >
> > > On Wed, Jul 03, 2013 at 05:11:47PM +0530, Siddharth Karandikar wrote:
> > >> Hi,
> > >>
> > >> While looking into the Hive history file, I found this query:
> > >>
> > >> LOAD DATA INPATH 'hdfs://localhost:9000/user/root/Customers' INTO
> > >> TABLE `Customers`"
> > >> QUERY_ID="root_20130703050909_882c2484-e1c8-43a3-9eff-dd0f296fc560"
> > >> .....
> > >>
> > >> The HDFS location mentioned in this query is a directory, not a CSV
> > >> file. This directory contains the part-* file(s) which hold the actual
> > >> data. I don't know whether Sqoop understands this directory structure
> > >> and knows how to read those multiple part-* files. Or is this an issue?
> > >>
> > >> I was hit by a similar thing while creating an external table in Hive
> > >> where the location specified was such an HDFS directory (generated by
> > >> sqoop import) containing multiple part-* files. The Hive table got
> > >> created, but all the rows were NULL. That's why I started looking into
> > >> the --hive-import option available in Sqoop. But it looks like it is
> > >> also not working for me.
> > >>
> > >> Am I missing something?
> > >>
> > >> Thanks,
> > >> Siddharth
> > >>
> > >> On Wed, Jul 3, 2013 at 4:55 PM, Siddharth Karandikar
> > >> <[email protected]> wrote:
> > >> > Hi,
> > >> >
> > >> > I am facing some problems while importing a sample database from MySQL
> > >> > to Hive using Sqoop 1.4.3, Hive 0.11.0 and Hadoop 1.1.2 on a single
> > >> > node setup.
> > >> >
> > >> > While doing this, I always see the following message in the job logs:
> > >> > Table default.customers stats: [num_partitions: 0, num_files: 2,
> > >> > num_rows: 0, total_size: 15556, raw_data_size: 0]
> > >> >
> > >> > The job ends with a success message:
> > >> > 13/07/03 05:09:30 INFO hive.HiveImport: Time taken: 0.74 seconds
> > >> > 13/07/03 05:09:30 INFO hive.HiveImport: Hive import complete.
> > >> > 13/07/03 05:09:30 INFO hive.HiveImport: Export directory is empty,
> > >> > removing it.
> > >> >
> > >> > The full command and log can be found at http://pastebin.com/03f6Wdga
> > >> >
> > >> > I am using Sqoop for the first time and I could be missing a few
> > >> > things. Any pointers to solve this problem would really help.
> > >> >
> > >> > MySQL to HDFS is working fine, though.
> > >> >
> > >> > Thanks,
> > >> > Siddharth