Hi Siddharth,

I'm glad to see that everything is working for you!

Jarcec
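For anyone who finds this thread later: "configuring the location of
metastore_db" below usually means pointing Hive's embedded Derby metastore
at one absolute path in conf/hive-site.xml, so that the Hive CLI and Sqoop's
--hive-import use the same metastore regardless of which directory they are
started from. A minimal sketch, where the database path is only a placeholder
based on the Hive home mentioned further down:

  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=/root/siddharth/tools/hive-0.11.0-bin/metastore_db;create=true</value>
  </property>

Running a standalone metastore service backed by MySQL or PostgreSQL would
also avoid the per-working-directory metastore_db behaviour.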
On Fri, Jul 05, 2013 at 07:36:27PM +0530, Siddharth Karandikar wrote:
> Hi Jarek,
>
> Problem solved. It was my error!
>
> I was rereading your response and looked closely into the sqoop directory,
> and I found a metastore_db directory there. Then I realized what you were
> talking about and what the problem was. My bad!
> As you mentioned, sqoop was doing its work perfectly fine; it was Hive that
> did not know about the new table being created, because of the separate
> metastore_db.
>
> Now I have configured the location of metastore_db in Hive and everything
> works great! :)
>
> Thanks Jarek.
>
> Regards,
> Siddharth
>
> On Fri, Jul 5, 2013 at 7:17 PM, Siddharth Karandikar
> <[email protected]> wrote:
>
> > Hi,
> >
> > When I tried doing this import-mysql-into-hive in 3 separate steps, it
> > worked!
> >
> > Here is what I ran -
> >
> > 1) Import the MySQL table into HDFS with the command:
> > ssk01:~/siddharth/tools/sqoop-1.4.3.bin__hadoop-1.0.0 # ./bin/sqoop import
> > --connect jdbc:mysql://localhost/ClassicModels -table Customers -m 1
> > --verbose --fields-terminated-by '\t' --lines-terminated-by '\n'
> >
> > 2) Create a table in Hive with the query below (the same query found in
> > the --verbose log of the failing sqoop run):
> > hive> CREATE TABLE IF NOT EXISTS `Customers` ( `customerNumber` INT,
> > `customerName` STRING, `contactLastName` STRING, `contactFirstName` STRING,
> > `phone` STRING, `addressLine1` STRING, `addressLine2` STRING, `city`
> > STRING, `state` STRING, `postalCode` STRING, `country` STRING,
> > `salesRepEmployeeNumber` INT, `creditLimit` DOUBLE) COMMENT 'Imported by
> > sqoop on 2013/07/04 00:41:14' ROW FORMAT DELIMITED FIELDS TERMINATED BY
> > '\t' LINES TERMINATED BY '\n' STORED AS TEXTFILE;
> > OK
> > Time taken: 0.214 seconds
> >
> > 3) Load the data from HDFS with the query below (again the same query
> > found in the --verbose log of the failing sqoop run):
> > hive> LOAD DATA INPATH 'hdfs://localhost:9000/user/root/Customers' INTO
> > TABLE `Customers`;
> > Loading data to table default.customers
> > Table default.customers stats: [num_partitions: 0, num_files: 2,
> > num_rows: 0, total_size: 15556, raw_data_size: 0]
> > OK
> > Time taken: 0.317 seconds
> >
> > hive> select count(*) from Customers;
> > .......
> > Total MapReduce CPU Time Spent: 2 seconds 460 msec
> > OK
> > 122
> > Time taken: 26.873 seconds, Fetched: 1 row(s)
> >
> > So the table got created and populated properly. I now wonder why it is
> > not working when I use --hive-import.
> >
> > Any pointers to debug this issue?
> >
> > Thanks,
> > Siddharth
> >
> > On Thu, Jul 4, 2013 at 12:21 PM, Siddharth Karandikar
> > <[email protected]> wrote:
> >
> >> Hi Jarek,
> >>
> >> I have not re-configured Hive. I am using the default settings/locations,
> >> and I am using --hive-home to tell sqoop where to find Hive.
> >>
> >> Here are the locations of my Sqoop, Hive and Hadoop instances.
> >> Hadoop: /root/siddharth/tools/hadoop-1.1.2
> >> Hive: /root/siddharth/tools/hive-0.11.0-bin
> >> Sqoop: /root/siddharth/tools/sqoop-1.4.3.bin__hadoop-1.0.0
> >>
> >> And here are a few more details after running it with --verbose.
> >>
> >> I am using the following command to import into Hive:
> >> ssk01:~/siddharth/tools/sqoop-1.4.3.bin__hadoop-1.0.0 # ./bin/sqoop
> >> import --connect jdbc:mysql://localhost/ClassicModels -table Customers
> >> -m 1 --hive-home /root/siddharth/tools/hive-0.11.0-bin --hive-import
> >> --verbose --mysql-delimiters
> >>
> >> Verbose output of the above command:
> >> http://pastebin.com/TcYG8vkr
> >>
> >> After running this command, here is what I see in Hive and HDFS.
> >>
> >> HDFS
> >> =====
> >> ssk01:~/siddharth/tools/hadoop-1.1.2 # bin/hadoop fs -ls
> >> hdfs://localhost:9000/user/hive/warehouse/*
> >> Found 2 items
> >> -rw-r--r-- 1 root supergroup 0 2013-07-04 00:41
> >> /user/hive/warehouse/customers/_SUCCESS
> >> -rw-r--r-- 1 root supergroup 15569 2013-07-04 00:41
> >> /user/hive/warehouse/customers/part-m-00000
> >>
> >> Hive (I am running Hive from its own directory, so the metadata should
> >> be accessible)
> >> ===========================================================
> >> ssk01:~/siddharth/tools/hive-0.11.0-bin # ./bin/hive
> >>
> >> Logging initialized using configuration in
> >> jar:file:/root/siddharth/tools/hive-0.11.0-bin/lib/hive-common-0.11.0.jar!/hive-log4j.properties
> >> Hive history
> >> file=/tmp/root/[email protected]_201307040044_383137545.txt
> >> hive> show databases;
> >> OK
> >> default
> >> Time taken: 8.035 seconds, Fetched: 1 row(s)
> >>
> >> hive> use default;
> >> OK
> >> Time taken: 0.018 seconds
> >>
> >> hive> show tables;
> >> OK
> >> Time taken: 4.175 seconds
> >> hive>
> >>
> >> The strange thing is that a table named default.customers doesn't exist
> >> in Hive, even though the sqoop output mentioned it.
> >>
> >> Thanks,
> >> Siddharth
> >>
> >> On Wed, Jul 3, 2013 at 9:36 PM, Jarek Jarcec Cecho <[email protected]>
> >> wrote:
> >> > Hi Siddharth,
> >> > using a directory in the LOAD DATA command is completely valid. You
> >> > can find more information about the command in the Hive documentation
> >> > [1]. I would estimate that your issue is more with parsing the data
> >> > than with accessing it, since you are able to see the rows, just with
> >> > incorrect values.
> >> >
> >> > Jarcec
> >> >
> >> > Links:
> >> > 1: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML
> >> >
> >> > On Wed, Jul 03, 2013 at 05:11:47PM +0530, Siddharth Karandikar wrote:
> >> >> Hi,
> >> >>
> >> >> While looking into the Hive history file, I found this query.
> >> >>
> >> >> LOAD DATA INPATH 'hdfs://localhost:9000/user/root/Customers' INTO
> >> >> TABLE `Customers`"
> >> >> QUERY_ID="root_20130703050909_882c2484-e1c8-43a3-9eff-dd0f296fc560"
> >> >> .....
> >> >>
> >> >> The HDFS location mentioned in this query is a directory, not a csv
> >> >> file. This directory contains the part-* file(s) which hold the
> >> >> actual data. I don't know if Sqoop understands this directory
> >> >> structure and knows how to read those multiple part-* files. Or is
> >> >> this an issue?
> >> >>
> >> >> I was hit by a similar thing while creating an external table in Hive
> >> >> where the location specified was such an hdfs directory (generated by
> >> >> sqoop import) containing multiple part-* files. The Hive table got
> >> >> created, but all the rows were NULL. And that's why I started looking
> >> >> into the --hive-import option available in sqoop. But it looks like
> >> >> it is also not working for me.
> >> >>
> >> >> Am I missing something?
> >> >>
> >> >> Thanks,
> >> >> Siddharth
> >> >>
> >> >> On Wed, Jul 3, 2013 at 4:55 PM, Siddharth Karandikar
> >> >> <[email protected]> wrote:
> >> >> > Hi,
> >> >> >
> >> >> > I am facing some problems while importing a sample database from
> >> >> > MySQL to Hive using Sqoop 1.4.3, Hive 0.11.0 and Hadoop 1.1.2 on a
> >> >> > single node setup.
> >> >> >
> >> >> > While doing this, I always see the following message in the job
> >> >> > logs -
> >> >> > Table default.customers stats: [num_partitions: 0, num_files: 2,
> >> >> > num_rows: 0, total_size: 15556, raw_data_size: 0]
> >> >> >
> >> >> > The job ends with a success message -
> >> >> > 13/07/03 05:09:30 INFO hive.HiveImport: Time taken: 0.74 seconds
> >> >> > 13/07/03 05:09:30 INFO hive.HiveImport: Hive import complete.
> >> >> > 13/07/03 05:09:30 INFO hive.HiveImport: Export directory is empty,
> >> >> > removing it.
> >> >> >
> >> >> > The full command and log can be found at - http://pastebin.com/03f6Wdga
> >> >> >
> >> >> > I am using Sqoop for the first time and I could be missing a few
> >> >> > things. Any pointers to solve this problem would really help.
> >> >> >
> >> >> > MySQL to HDFS is working fine though.
> >> >> >
> >> >> > Thanks,
> >> >> > Siddharth
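A quick way to confirm this kind of metastore split, using the install paths
mentioned earlier in the thread (only a sketch; the two directories are simply
wherever Sqoop and the Hive CLI were started from):

  ssk01:~ # ls -d /root/siddharth/tools/sqoop-1.4.3.bin__hadoop-1.0.0/metastore_db
  ssk01:~ # ls -d /root/siddharth/tools/hive-0.11.0-bin/metastore_db

If both directories exist, the Sqoop-driven import and the interactive Hive
session were each writing to their own embedded Derby metastore, which matches
the behaviour described above: the import reports success, but "show tables"
run from the other directory comes back empty.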
