Hi Siddharth,

I'm glad to see that everything is working for you!

Jarcec
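For anyone who finds this thread later: "configuring the location of
metastore_db" below usually means pointing Hive's embedded Derby metastore
at one absolute path in conf/hive-site.xml, so that the Hive CLI and Sqoop's
--hive-import use the same metastore regardless of which directory they are
started from. A minimal sketch, where the database path is only a placeholder
based on the Hive home mentioned further down:

  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=/root/siddharth/tools/hive-0.11.0-bin/metastore_db;create=true</value>
  </property>

Running a standalone metastore service backed by MySQL or PostgreSQL would
also avoid the per-working-directory metastore_db behaviour.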
On Fri, Jul 05, 2013 at 07:36:27PM +0530, Siddharth Karandikar wrote:
> Hi Jarek,
>
> Problem solved. It was my error!
>
> I was rereading your response and looked closely into the sqoop directory,
> and I found a metastore_db directory there. Then I realized what you were
> talking about and what the problem was. My bad!
> As you mentioned, sqoop was doing its work perfectly fine; it was Hive that
> did not know about the new table being created, because of the separate
> metastore_db.
>
> Now I have configured the location of metastore_db in Hive and everything
> works great! :)
>
> Thanks Jarek.
>
> Regards,
> Siddharth
>
> On Fri, Jul 5, 2013 at 7:17 PM, Siddharth Karandikar
> <[email protected]> wrote:
>
> > Hi,
> >
> > When I tried doing this import-mysql-into-hive in 3 separate steps, it
> > worked!
> >
> > Here is what I ran -
> >
> > 1) Import the MySQL table into HDFS with the command:
> > ssk01:~/siddharth/tools/sqoop-1.4.3.bin__hadoop-1.0.0 # ./bin/sqoop import
> > --connect jdbc:mysql://localhost/ClassicModels -table Customers -m 1
> > --verbose --fields-terminated-by '\t' --lines-terminated-by '\n'
> >
> > 2) Create a table in Hive with the query below (the same query found in
> > the --verbose log of the failing sqoop run):
> > hive> CREATE TABLE IF NOT EXISTS `Customers` ( `customerNumber` INT,
> > `customerName` STRING, `contactLastName` STRING, `contactFirstName` STRING,
> > `phone` STRING, `addressLine1` STRING, `addressLine2` STRING, `city`
> > STRING, `state` STRING, `postalCode` STRING, `country` STRING,
> > `salesRepEmployeeNumber` INT, `creditLimit` DOUBLE) COMMENT 'Imported by
> > sqoop on 2013/07/04 00:41:14' ROW FORMAT DELIMITED FIELDS TERMINATED BY
> > '\t' LINES TERMINATED BY '\n' STORED AS TEXTFILE;
> > OK
> > Time taken: 0.214 seconds
> >
> > 3) Load the data from HDFS with the query below (again the same query
> > found in the --verbose log of the failing sqoop run):
> > hive> LOAD DATA INPATH 'hdfs://localhost:9000/user/root/Customers' INTO
> > TABLE `Customers`;
> > Loading data to table default.customers
> > Table default.customers stats: [num_partitions: 0, num_files: 2,
> > num_rows: 0, total_size: 15556, raw_data_size: 0]
> > OK
> > Time taken: 0.317 seconds
> >
> > hive> select count(*) from Customers;
> > .......
> > Total MapReduce CPU Time Spent: 2 seconds 460 msec
> > OK
> > 122
> > Time taken: 26.873 seconds, Fetched: 1 row(s)
> >
> > So the table got created and populated properly. I now wonder why it is
> > not working when I use --hive-import.
> >
> > Any pointers to debug this issue?
> >
> > Thanks,
> > Siddharth
> >
> > On Thu, Jul 4, 2013 at 12:21 PM, Siddharth Karandikar
> > <[email protected]> wrote:
> >
> >> Hi Jarek,
> >>
> >> I have not re-configured Hive. I am using the default settings/locations,
> >> and I am using --hive-home to tell sqoop where to find Hive.
> >>
> >> Here are the locations of my Sqoop, Hive and Hadoop instances.
> >> Hadoop: /root/siddharth/tools/hadoop-1.1.2
> >> Hive: /root/siddharth/tools/hive-0.11.0-bin
> >> Sqoop: /root/siddharth/tools/sqoop-1.4.3.bin__hadoop-1.0.0
> >>
> >> And here are a few more details after running it with --verbose.
> >>
> >> I am using the following command to import into Hive:
> >> ssk01:~/siddharth/tools/sqoop-1.4.3.bin__hadoop-1.0.0 # ./bin/sqoop
> >> import --connect jdbc:mysql://localhost/ClassicModels -table Customers
> >> -m 1 --hive-home /root/siddharth/tools/hive-0.11.0-bin --hive-import
> >> --verbose --mysql-delimiters
> >>
> >> Verbose output of the above command:
> >> http://pastebin.com/TcYG8vkr
> >>
> >> After running this command, here is what I see in Hive and HDFS.
> >>
> >> HDFS
> >> =====
> >> ssk01:~/siddharth/tools/hadoop-1.1.2 # bin/hadoop fs -ls
> >> hdfs://localhost:9000/user/hive/warehouse/*
> >> Found 2 items
> >> -rw-r--r-- 1 root supergroup 0 2013-07-04 00:41
> >> /user/hive/warehouse/customers/_SUCCESS
> >> -rw-r--r-- 1 root supergroup 15569 2013-07-04 00:41
> >> /user/hive/warehouse/customers/part-m-00000
> >>
> >> Hive (I am running Hive from its own directory, so the metadata should
> >> be accessible)
> >> ===========================================================
> >> ssk01:~/siddharth/tools/hive-0.11.0-bin # ./bin/hive
> >>
> >> Logging initialized using configuration in
> >> jar:file:/root/siddharth/tools/hive-0.11.0-bin/lib/hive-common-0.11.0.jar!/hive-log4j.properties
> >> Hive history
> >> file=/tmp/root/[email protected]_201307040044_383137545.txt
> >> hive> show databases;
> >> OK
> >> default
> >> Time taken: 8.035 seconds, Fetched: 1 row(s)
> >>
> >> hive> use default;
> >> OK
> >> Time taken: 0.018 seconds
> >>
> >> hive> show tables;
> >> OK
> >> Time taken: 4.175 seconds
> >> hive>
> >>
> >> The strange thing is that a table named default.customers doesn't exist
> >> in Hive, even though the sqoop output mentioned it.
> >>
> >> Thanks,
> >> Siddharth
> >>
> >> On Wed, Jul 3, 2013 at 9:36 PM, Jarek Jarcec Cecho <[email protected]>
> >> wrote:
> >> > Hi Siddharth,
> >> > using a directory in the LOAD DATA command is completely valid. You
> >> > can find more information about the command in the Hive documentation
> >> > [1]. I would estimate that your issue is more with parsing the data
> >> > than with accessing it, since you are able to see the rows, just with
> >> > incorrect values.
> >> >
> >> > Jarcec
> >> >
> >> > Links:
> >> > 1: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML
> >> >
> >> > On Wed, Jul 03, 2013 at 05:11:47PM +0530, Siddharth Karandikar wrote:
> >> >> Hi,
> >> >>
> >> >> While looking into the Hive history file, I found this query.
> >> >>
> >> >> LOAD DATA INPATH 'hdfs://localhost:9000/user/root/Customers' INTO
> >> >> TABLE `Customers`"
> >> >> QUERY_ID="root_20130703050909_882c2484-e1c8-43a3-9eff-dd0f296fc560"
> >> >> .....
> >> >>
> >> >> The HDFS location mentioned in this query is a directory, not a csv
> >> >> file. This directory contains the part-* file(s) which hold the
> >> >> actual data. I don't know if Sqoop understands this directory
> >> >> structure and knows how to read those multiple part-* files. Or is
> >> >> this an issue?
> >> >>
> >> >> I was hit by a similar thing while creating an external table in Hive
> >> >> where the location specified was such an hdfs directory (generated by
> >> >> sqoop import) containing multiple part-* files. The Hive table got
> >> >> created, but all the rows were NULL. And that's why I started looking
> >> >> into the --hive-import option available in sqoop. But it looks like
> >> >> it is also not working for me.
> >> >>
> >> >> Am I missing something?
> >> >>
> >> >> Thanks,
> >> >> Siddharth
> >> >>
> >> >> On Wed, Jul 3, 2013 at 4:55 PM, Siddharth Karandikar
> >> >> <[email protected]> wrote:
> >> >> > Hi,
> >> >> >
> >> >> > I am facing some problems while importing a sample database from
> >> >> > MySQL to Hive using Sqoop 1.4.3, Hive 0.11.0 and Hadoop 1.1.2 on a
> >> >> > single node setup.
> >> >> >
> >> >> > While doing this, I always see the following message in the job
> >> >> > logs -
> >> >> > Table default.customers stats: [num_partitions: 0, num_files: 2,
> >> >> > num_rows: 0, total_size: 15556, raw_data_size: 0]
> >> >> >
> >> >> > The job ends with a success message -
> >> >> > 13/07/03 05:09:30 INFO hive.HiveImport: Time taken: 0.74 seconds
> >> >> > 13/07/03 05:09:30 INFO hive.HiveImport: Hive import complete.
> >> >> > 13/07/03 05:09:30 INFO hive.HiveImport: Export directory is empty,
> >> >> > removing it.
> >> >> >
> >> >> > The full command and log can be found at - http://pastebin.com/03f6Wdga
> >> >> >
> >> >> > I am using Sqoop for the first time and I could be missing a few
> >> >> > things. Any pointers to solve this problem would really help.
> >> >> >
> >> >> > MySQL to HDFS is working fine though.
> >> >> >
> >> >> > Thanks,
> >> >> > Siddharth
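A quick way to confirm this kind of metastore split, using the install paths
mentioned earlier in the thread (only a sketch; the two directories are simply
wherever Sqoop and the Hive CLI were started from):

  ssk01:~ # ls -d /root/siddharth/tools/sqoop-1.4.3.bin__hadoop-1.0.0/metastore_db
  ssk01:~ # ls -d /root/siddharth/tools/hive-0.11.0-bin/metastore_db

If both directories exist, the Sqoop-driven import and the interactive Hive
session were each writing to their own embedded Derby metastore, which matches
the behaviour described above: the import reports success, but "show tables"
run from the other directory comes back empty.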
