Alternatively, one can reconfigure Hive to use a different database backend other than Derby (example configurations are in [1]), or specify an absolute path for Derby to avoid the relative path issue.
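To make the second option concrete, here is a minimal hive-site.xml sketch (the databaseName below just reuses the Hive home path from this thread as a placeholder; any absolute path works). It pins the embedded Derby metastore to a fixed location instead of whatever directory Hive or Sqoop happens to be started from:

  <!-- goes inside the <configuration> element of hive-site.xml -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <!-- absolute databaseName, so the metastore no longer depends on the working directory -->
    <value>jdbc:derby:;databaseName=/root/siddharth/tools/hive-0.11.0-bin/metastore_db;create=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>

With that in place, Sqoop's --hive-import and an interactive Hive shell resolve the same metastore regardless of the working directory; a shared MySQL-backed (or other external database) metastore as described in [1] achieves the same thing and is the usual choice beyond a single-user setup.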
Jarcec

Links:
1: http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/latest/CDH4-Installation-Guide/cdh4ig_hive_metastore_configure.html

On Sat, Jul 06, 2013 at 12:39:53AM +0530, Siddharth Karandikar wrote:
> Hi Andy,
>
> I have Sqoop installed in '/root/siddharth/tools/sqoop-1.4.3.bin__hadoop-1.0.0'
> and Hive in '/root/siddharth/tools/hive-0.11.0-bin'.
>
> I used to run the following command from within Sqoop's directory:
> ./bin/sqoop import --connect jdbc:mysql://localhost/ClassicModels -table
> Customers -m 1 --hive-home /root/siddharth/tools/hive-0.11.0-bin
> --hive-import --verbose --mysql-delimiters
>
> This would create a temporary HDFS file of the given MySQL table and then
> start Hive from the same directory to create the table and load the data.
> Hive's default behavior is to look for metastore_db in the current directory
> and, if it is not there, to create and use it. The above command used to run
> without errors. I would then go into Hive's directory
> '/root/siddharth/tools/hive-0.11.0-bin' and run hive to check the table newly
> created by Sqoop, and never find it there. The issue was that when Hive is
> started from its own directory, it again checks for the existence of
> metastore_db, which is not there (as it was created in Sqoop's directory), so
> it creates a new one that has no reference to the previously created table.
>
> What could also have worked in this case is, after running Sqoop's command,
> simply copying metastore_db from there into Hive's directory before starting
> the Hive shell.
>
> -
> Siddharth
>
>
> On Fri, Jul 5, 2013 at 7:53 PM, Han Sen Tey <[email protected]> wrote:
>
> > Good day Siddharth,
> >
> > Can you kindly elaborate further on what the problem was and how you
> > resolved it (configure the location of metastore_db in Hive?)? Thank you.
> >
> > Regards,
> > Andy
> >
> > ------------------------------
> > *From:* Siddharth Karandikar <[email protected]>
> > *To:* [email protected]
> > *Sent:* Friday, July 5, 2013 10:06:27 AM
> > *Subject:* Re: Zero rows imported while doing Mysql to Hive import
> >
> > Hi Jarek,
> >
> > Problem solved. It was my error!
> >
> > I was rereading your response and looked closely into the Sqoop directory,
> > where I found a metastore_db directory. Then I realized what you were
> > talking about and what the problem was. My bad!
> > As you mentioned, Sqoop was doing its work perfectly fine; it was Hive
> > that did not know about the new table being created, because of the
> > separate metastore_db.
> >
> > Now I have configured the location of metastore_db in Hive and everything
> > works great! :)
> >
> > Thanks Jarek.
> >
> > Regards,
> > Siddharth
> >
> >
> > On Fri, Jul 5, 2013 at 7:17 PM, Siddharth Karandikar <[email protected]> wrote:
> >
> > Hi,
> >
> > When I tried doing this MySQL-into-Hive import in 3 separate steps, it
> > worked!
> >
> > Here is what I ran:
> >
> > 1) Import the MySQL table into HDFS with the command:
> > ssk01:~/siddharth/tools/sqoop-1.4.3.bin__hadoop-1.0.0 # ./bin/sqoop import
> > --connect jdbc:mysql://localhost/ClassicModels -table Customers -m 1
> > --verbose --fields-terminated-by '\t' --lines-terminated-by '\n'
> >
> > 2) Create a table in Hive with the query (using the same query found in the
> > --verbose log of the failing Sqoop run):
> > hive> CREATE TABLE IF NOT EXISTS `Customers` ( `customerNumber` INT,
> > `customerName` STRING, `contactLastName` STRING, `contactFirstName` STRING,
> > `phone` STRING, `addressLine1` STRING, `addressLine2` STRING, `city`
> > STRING, `state` STRING, `postalCode` STRING, `country` STRING,
> > `salesRepEmployeeNumber` INT, `creditLimit` DOUBLE) COMMENT 'Imported by
> > sqoop on 2013/07/04 00:41:14' ROW FORMAT DELIMITED FIELDS TERMINATED BY
> > '\t' LINES TERMINATED BY '\n' STORED AS TEXTFILE;
> > OK
> > Time taken: 0.214 seconds
> >
> > 3) Load the data from HDFS with the query (using the same query found in the
> > --verbose log of the failing Sqoop run):
> > hive> LOAD DATA INPATH 'hdfs://localhost:9000/user/root/Customers' INTO
> > TABLE `Customers`;
> > Loading data to table default.customers
> >
> > Table default.customers stats: [num_partitions: 0, num_files: 2, num_rows:
> > 0, total_size: 15556, raw_data_size: 0]
> > OK
> > Time taken: 0.317 seconds
> >
> > hive> select count(*) from Customers;
> > .......
> > Total MapReduce CPU Time Spent: 2 seconds 460 msec
> > OK
> > 122
> > Time taken: 26.873 seconds, Fetched: 1 row(s)
> >
> > So the table got created and populated properly. I now wonder why it is
> > not working when I use --hive-import.
> >
> > Any pointers to debug this issue?
> >
> > Thanks,
> > Siddharth
> >
> >
> > On Thu, Jul 4, 2013 at 12:21 PM, Siddharth Karandikar <[email protected]> wrote:
> >
> > Hi Jarek,
> >
> > I have not re-configured Hive. I am using the default settings/locations.
> > I am using --hive-home to tell Sqoop where to find Hive.
> >
> > Here are the locations of my Sqoop, Hive and Hadoop instances.
> > Hadoop: /root/siddharth/tools/hadoop-1.1.2
> > Hive: /root/siddharth/tools/hive-0.11.0-bin
> > Sqoop: /root/siddharth/tools/sqoop-1.4.3.bin__hadoop-1.0.0
> >
> > And here are a few more details after running it with --verbose.
> >
> > I am using the following command to import into Hive:
> > ssk01:~/siddharth/tools/sqoop-1.4.3.bin__hadoop-1.0.0 # ./bin/sqoop
> > import --connect jdbc:mysql://localhost/ClassicModels -table Customers
> > -m 1 --hive-home /root/siddharth/tools/hive-0.11.0-bin --hive-import
> > --verbose --mysql-delimiters
> >
> > Verbose output of the above command:
> > http://pastebin.com/TcYG8vkr
> >
> > After running this command, here is what I see in Hive and HDFS.
> >
> > HDFS
> > =====
> > ssk01:~/siddharth/tools/hadoop-1.1.2 # bin/hadoop fs -ls
> > hdfs://localhost:9000/user/hive/warehouse/*
> > Found 2 items
> > -rw-r--r-- 1 root supergroup 0 2013-07-04 00:41
> > /user/hive/warehouse/customers/_SUCCESS
> > -rw-r--r-- 1 root supergroup 15569 2013-07-04 00:41
> > /user/hive/warehouse/customers/part-m-00000
> >
> > Hive (I am running Hive from its own directory, so the metadata should be
> > accessible)
> > ===========================================================
> > ssk01:~/siddharth/tools/hive-0.11.0-bin # ./bin/hive
> >
> > Logging initialized using configuration in
> > jar:file:/root/siddharth/tools/hive-0.11.0-bin/lib/hive-common-0.11.0.jar!/hive-log4j.properties
> > Hive history
> > file=/tmp/root/[email protected]_201307040044_383137545.txt
> > hive> show databases;
> > OK
> > default
> > Time taken: 8.035 seconds, Fetched: 1 row(s)
> >
> > hive> use default;
> > OK
> > Time taken: 0.018 seconds
> >
> > hive> show tables;
> > OK
> > Time taken: 4.175 seconds
> > hive>
> >
> > The strange thing is that the table default.customers doesn't exist in Hive
> > even though the Sqoop output mentioned it.
> >
> > Thanks,
> > Siddharth
> >
> >
> > On Wed, Jul 3, 2013 at 9:36 PM, Jarek Jarcec Cecho <[email protected]> wrote:
> > > Hi Siddharth,
> > > using a directory in the LOAD DATA command is completely valid. You can
> > > find more information about the command in the Hive documentation [1].
> > > I would estimate that your issue is more with parsing the data than with
> > > accessing it, since you are able to see the rows, just with incorrect
> > > values.
> > >
> > > Jarcec
> > >
> > > Links:
> > > 1: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML
> > >
> > > On Wed, Jul 03, 2013 at 05:11:47PM +0530, Siddharth Karandikar wrote:
> > >> Hi,
> > >>
> > >> While looking into the Hive history file, I found this query:
> > >>
> > >> LOAD DATA INPATH 'hdfs://localhost:9000/user/root/Customers' INTO
> > >> TABLE `Customers`"
> > >> QUERY_ID="root_20130703050909_882c2484-e1c8-43a3-9eff-dd0f296fc560"
> > >> .....
> > >>
> > >> The HDFS location mentioned in this query is a directory, not a CSV
> > >> file. This directory contains the part-* file(s) which hold the actual
> > >> data. I don't know whether Sqoop understands this directory structure
> > >> and knows how to read those multiple part-* files. Or is this an issue?
> > >>
> > >> I was hit by a similar thing while creating an external table in Hive
> > >> where the location specified was such an HDFS directory (generated by
> > >> sqoop import) containing multiple part-* files. The Hive table got
> > >> created, but all the rows were NULL. That's why I started looking into
> > >> the --hive-import option available in Sqoop. But it looks like it is
> > >> also not working for me.
> > >>
> > >> Am I missing something?
> > >>
> > >> Thanks,
> > >> Siddharth
> > >>
> > >> On Wed, Jul 3, 2013 at 4:55 PM, Siddharth Karandikar
> > >> <[email protected]> wrote:
> > >> > Hi,
> > >> >
> > >> > I am facing some problems while importing a sample database from MySQL
> > >> > to Hive using Sqoop 1.4.3, Hive 0.11.0 and Hadoop 1.1.2 on a single
> > >> > node setup.
> > >> >
> > >> > While doing this, I always see the following message in the job logs:
> > >> > Table default.customers stats: [num_partitions: 0, num_files: 2,
> > >> > num_rows: 0, total_size: 15556, raw_data_size: 0]
> > >> >
> > >> > The job ends with a success message:
> > >> > 13/07/03 05:09:30 INFO hive.HiveImport: Time taken: 0.74 seconds
> > >> > 13/07/03 05:09:30 INFO hive.HiveImport: Hive import complete.
> > >> > 13/07/03 05:09:30 INFO hive.HiveImport: Export directory is empty,
> > >> > removing it.
> > >> >
> > >> > The full command and log can be found at http://pastebin.com/03f6Wdga
> > >> >
> > >> > I am using Sqoop for the first time and I could be missing a few
> > >> > things. Any pointers to solve this problem would really help.
> > >> >
> > >> > MySQL to HDFS is working fine, though.
> > >> >
> > >> > Thanks,
> > >> > Siddharth