I am attaching portions of a document I wrote last year while investigating HBase and Hive. You may have already crossed that bridge... nevertheless...
Please forgive me :-) if some steps seem hacky and not very well explained... I was on a solo mission to build a Hive data platform from scratch, and QDBW (Quick and Dirty But Works) was my philosophy!

Good luck
Sanjay

=================================================================

Hive and Hbase integration on local Fedora desktop guide<https://wizecommerce.atlassian.net/wiki/display/traffic/Hive+and+Hbase+integration+on+local+Fedora+desktop+guide>

Pre-requisites

* Hadoop needs to be installed and HDFS needs to be running (Hadoop HDFS setup on local Fedora desktop guide<https://wizecommerce.atlassian.net/wiki/display/traffic/Hadoop+HDFS+setup+on+local+Fedora+desktop+guide>)
* Hive needs to be installed (Hive setup on local Fedora desktop guide<https://wizecommerce.atlassian.net/wiki/display/traffic/Hive+setup+on+local+Fedora+desktop+guide>)
* HBase needs to be installed and running (Hbase setup on local Fedora desktop guide<https://wizecommerce.atlassian.net/wiki/display/traffic/Hbase+setup+on+local+Fedora+desktop+guide>)
* Make sure ZooKeeper is running on port 2181. If not, stop HBase, change $HBASE_HOME/conf/hbase-site.xml, and restart HBase.

Copying JARs to HADOOP_CLASSPATH

Before you query tables, copy these jars from $HIVE_HOME/lib ----> $HADOOP_HOME/lib

1. Make sure zookeeper-3.4.3.jar is not already there
   * ls -latr $HADOOP_HOME/lib/zookeeper-3.4.3.jar
2. Copy zookeeper-3.4.3.jar
   * sudo cp -av $HIVE_HOME/lib/zookeeper-3.4.3.jar $HADOOP_HOME/lib
3. Make sure hive-common-0.9.0.jar is not already there
   * ls -latr $HADOOP_HOME/lib/hive-common-0.9.0.jar
4. Copy hive-common-0.9.0.jar
   * sudo cp -av $HIVE_HOME/lib/hive-common-0.9.0.jar $HADOOP_HOME/lib
5. Make sure hive-hbase-handler-0.9.0.jar is not already there
   * ls -latr $HADOOP_HOME/lib/hive-hbase-handler-0.9.0.jar
6. Copy hive-hbase-handler-0.9.0.jar
   * sudo cp -av $HIVE_HOME/lib/hive-hbase-handler-0.9.0.jar $HADOOP_HOME/lib
7. Exit from the Hive shell (type exit;)
8. Exit from the HBase shell
9. Stop HBase
   * $HBASE_HOME/bin/stop-hbase.sh
10. Stop Hadoop/HDFS
    * $HADOOP_HOME/bin/stop-all.sh
11. Check that NO java processes related to Hadoop/HDFS/HBase/Hive exist
    * ps auxw | grep java
12. Start Hadoop/HDFS
    * $HADOOP_HOME/bin/start-all.sh
13. Start HBase
    * $HBASE_HOME/bin/start-hbase.sh
14. Check that ALL java processes related to Hadoop/HDFS/HBase/Hive exist
    * ps auxw | grep java

Create tables in HBase

* Refer to Hbase setup on local Fedora desktop guide<https://wizecommerce.atlassian.net/wiki/display/traffic/Hbase+setup+on+local+Fedora+desktop+guide> and create the tables mentioned there:
  * hbase_2_hive_food
  * hbase_2_hive_names

Create tables in HIVE

To run Hive, type $HIVE_HOME/bin/hive. This will take you to the Hive shell. In the shell, create these two tables:

* CREATE EXTERNAL TABLE hbase_hive_names(hbid INT, id INT, fn STRING, ln STRING, age INT)
  STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
  WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,id:id,name:fn,name:ln,age:age")
  TBLPROPERTIES("hbase.table.name" = "hbase_2_hive_names");
  * This Hive table will map to the HBase table hbase_2_hive_names
* CREATE EXTERNAL TABLE hbase_hive_food(hbid INT, id INT, name STRING)
  STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
  WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,id:id,name:name")
  TBLPROPERTIES("hbase.table.name" = "hbase_2_hive_food");
  * This Hive table will map to the HBase table hbase_2_hive_food

Creating & loading tables in HBase through Hive

* Make sure there is no table in HBase called 'hive2hbase_names_table'
* In the Hive shell:
  * CREATE TABLE hive2hbase_names_table (hb_id int, fn string, ln string, age_dnq INT)
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,student:fn,student:ln,student:age")
    TBLPROPERTIES ("hbase.table.name" = "hive2hbase_names_table");
* Go to the HBase shell:
  * Check that the table hive2hbase_names_table has been created.
* In the Hive shell:
  * Create a plain Hive table and populate it with the data we will use to populate the Hive/HBase table:
  * CREATE TABLE names_tab (hb_id int, fn string, ln string, age_dnq INT) PARTITIONED BY (age INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
  * LOAD DATA LOCAL INPATH '/data/mycode/impressions/inputfiles/names1.tsv.4fields' OVERWRITE INTO TABLE names_tab PARTITION (age=60);
  * LOAD DATA LOCAL INPATH '/data/mycode/impressions/inputfiles/names2.tsv.4fields' OVERWRITE INTO TABLE names_tab PARTITION (age=70);
  * INSERT OVERWRITE TABLE hive2hbase_names_table SELECT hb_id, fn, ln, age_dnq FROM names_tab WHERE age=60;
    * The data file looks like this (fields separated by "\t"):
      1	paul	simon	60
      2	paul	mccartney	60
      3	paul	anka	60
  * INSERT OVERWRITE TABLE hive2hbase_names_table SELECT hb_id, fn, ln, age_dnq FROM names_tab WHERE age=70;
    * The data file looks like this (fields separated by "\t"):
      4	brian	may	70
      5	george	harrison	70
      6	john	glover	70
* Now you can query in the Hive shell:
  * select * from hive2hbase_names_table;
  * select * from hive2hbase_names_table where age_dnq=60;

To delete the HBase table

* Go to the HBase shell.
  * Disable 'hive2hbase_names_table'
* Go to the Hive shell.
  * Drop table 'hive2hbase_names_table' (this deletes the table from HBase as well)
* To check, go to the HBase shell.
  * You will see that the table has been deleted.

Running queries on Hive

Running queries on Hive is now transparent (as if the HBase tables were native Hive tables). As mentioned in Hive setup on local Fedora desktop guide<https://wizecommerce.atlassian.net/wiki/display/traffic/Hive+setup+on+local+Fedora+desktop+guide>, the Hive queries run M-R (map-reduce) jobs on Hadoop.
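The pairing between Hive columns and the hbase.columns.mapping string in the CREATE statements above is positional: the first Hive column lines up with the first mapping entry, and so on. A short Python sketch can illustrate this (it is purely illustrative, not Hive's actual parser):

```python
# Illustrative sketch only (not Hive's implementation): pair each Hive
# column positionally with one entry of the hbase.columns.mapping string.
def pair_columns(hive_columns, mapping):
    entries = [e.strip() for e in mapping.split(",")]
    if len(entries) != len(hive_columns):
        raise ValueError("mapping must have one entry per Hive column")
    return dict(zip(hive_columns, entries))

# The hbase_hive_names table from the guide above:
pairs = pair_columns(
    ["hbid", "id", "fn", "ln", "age"],
    ":key,id:id,name:fn,name:ln,age:age",
)
print(pairs["hbid"])  # ":key"    -> hbid is backed by the HBase row key
print(pairs["fn"])    # "name:fn" -> column family "name", qualifier "fn"
```

This also shows why a mapping with the wrong number of entries fails at table-creation time: every Hive column needs exactly one HBase counterpart.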
You can monitor the M-R jobs on your sandbox at http://localhost.localdomain:50030/jobtracker.jsp

Example queries

* select hbase_hive_names.fn, hbase_hive_names.ln, hbase_hive_food.name from hbase_hive_names JOIN hbase_hive_food ON hbase_hive_names.hbid = hbase_hive_food.hbid;
* select * from hbase_hive_names where age<>70;

=================================================================

From: "bejoy...@yahoo.com" <bejoy...@yahoo.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>, "bejoy...@yahoo.com" <bejoy...@yahoo.com>
Date: Monday, July 8, 2013 3:16 AM
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: Re: integration issue about hive and hbase

Hi

Can you try including the ZooKeeper quorum and port in your Hive configuration, as shown below:

hive --auxpath .../hbase-handler.jar, .../hbase.jar, ...zookeeper.jar, .../guava.jar -hiveconf hbase.zookeeper.quorum=<zk server names separated by comma> -hiveconf hbase.zookeeper.property.clientPort=<your custom port>

Substitute the placeholders in the above command with actual values. Also ensure that the zk and hbase jars specified above are the ones used in your hbase cluster, to avoid any version mismatches.
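(Editor's note: the same two settings can also be made permanent in $HBASE_HOME/conf/hbase-site.xml rather than passed per invocation. A minimal sketch follows; the host name and port are placeholder values, not taken from this thread.)

```xml
<!-- Minimal hbase-site.xml fragment. zk1.example.com and 2181 are
     placeholders; substitute your actual ZooKeeper quorum and port. -->
<configuration>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>zk1.example.com</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
</configuration>
```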
Regards
Bejoy KS

Sent from remote device, Please excuse typos

________________________________
From: ch huang <justlo...@gmail.com>
Date: Mon, 8 Jul 2013 16:40:59 +0800
To: <user@hive.apache.org>
Reply-To: user@hive.apache.org
Subject: Re: integration issue about hive and hbase

I replaced the zookeeper jar; the error is different now:

hive> CREATE TABLE hbase_table_1(key int, value string)
    > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    > WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
    > TBLPROPERTIES ("hbase.table.name" = "xyz");
FAILED: Error in metadata: MetaException(message:org.apache.hadoop.hbase.ZooKeeperConnectionException: HBase is able to connect to ZooKeeper but the connection closes immediately. This could be a sign that the server has too many connections (30 is the default). Consider inspecting your ZK server logs for that error and then make sure you are reusing HBaseConfiguration as often as you can. See HTable's javadoc for more information.
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:160)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getZooKeeperWatcher(HConnectionManager.java:1265)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:526)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.<init>(HConnectionManager.java:516)
at org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:173)
at org.apache.hadoop.hbase.client.HBaseAdmin.<init>(HBaseAdmin.java:93)
at org.apache.hadoop.hive.hbase.HBaseStorageHandler.getHBaseAdmin(HBaseStorageHandler.java:74)
at org.apache.hadoop.hive.hbase.HBaseStorageHandler.preCreateTable(HBaseStorageHandler.java:158)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:344)
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:470)
at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:3176)
at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:213)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:131)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1063)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:900)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:748)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:209)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:286)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:516)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase
at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:815)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:843)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndFailSilent(ZKUtil.java:930)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:138)
... 24 more
)
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask

On Mon, Jul 8, 2013 at 2:52 PM, Cheng Su <scarcer...@gmail.com> wrote:

Did your hbase cluster start up? The error message looks more like something is wrong with the classpath, so maybe you'd better also check that.

On Mon, Jul 8, 2013 at 1:54 PM, ch huang <justlo...@gmail.com> wrote:

I get an error when I try to create a table on hbase using hive. Can anyone help?

hive> CREATE TABLE hive_hbasetable_demo(key int,value string)
    > STORED BY 'ora.apache.hadoop.hive.hbase.HBaseStorageHandler'
    > WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
    > TBLPROPERTIES ("hbase.table.name" = "hivehbasedemo");
Failed with exception org.apache.hadoop.hive.ql.metadata.HiveException: Error in loading storage handler.ora.apache.hadoop.hive.hbase.HBaseStorageHandler
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask

--
Regards,
Cheng Su