Hi all,
I need some Hive+HBase functionality that is not in the released Hive
distribution but is available in Hive trunk, so I downloaded the trunk
tar.gz and built it with ant. Unfortunately, I'm not able to create an
external table using HBaseStorageHandler. HBase and Hadoop are installed
via Cloudera Manager with CDH3u3.
CDH3u3 ships its own builds of HBase 0.90.4 and Hive 0.7.1, while the
current hive-trunk is Hive 0.9.0 and bundles HBase 0.92.0. I'm guessing my
problem is a JAR version mismatch. I suspect I've configured Hive to use
the wrong JARs, but I'm in "trial-and-error" mode: for hbase, zookeeper,
guava, hive-hbase-handler etc. I have several versions to choose from, and
it's hard to tell whether I should use what is packaged with CDH3u3 or
what comes with hive-trunk.
Has anyone successfully used Hive 0.9.0 with HBase from CDH3u3? Any
suggestions?
I've tried several configurations; here's the current
hive-trunk/build/dist/conf/hive-site.xml:
<configuration>
<property>
<name>hive.aux.jars.path</name>
<value>file:///usr/local/hive-trunk/build/dist/lib/hive-contrib-0.9.0-SNAPSHOT.jar,file:///usr/local/hive-trunk/build/dist/lib/hbase-0.92.0.jar,file:///usr/local/hive-trunk/build/dist/lib/hive-hbase-handler-0.9.0-SNAPSHOT.jar,file:///usr/local/hive-trunk/build/dist/lib/zookeeper-3.4.3.jar,file:///usr/local/hive-trunk/build/dist/lib/guava-r09.jar</value>
<description>These JAR files are available to all users for all
jobs</description>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>myzookeeper</value>
</property>
<property>
<name>hive.zookeeper.client.port</name>
<value>2181</value>
<description>The port of zookeeper servers to talk to. This is only
needed for read/write locks.</description>
</property>
[...SNIP...]
</configuration>
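To make the trial-and-error over JAR versions less error-prone, the list in hive.aux.jars.path can be pulled out of hive-site.xml and printed as artifact/version pairs. The following is only a sketch: the function names aux_jars and jar_version are made up for illustration, the sample XML is a shortened copy of the config above, and the version is guessed purely from the file name (it does not inspect the JAR contents).

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Shortened copy of the hive-site.xml above (illustrative sample only).
SAMPLE = """<configuration>
<property>
<name>hive.aux.jars.path</name>
<value>file:///usr/local/hive-trunk/build/dist/lib/hbase-0.92.0.jar,file:///usr/local/hive-trunk/build/dist/lib/hive-hbase-handler-0.9.0-SNAPSHOT.jar,file:///usr/local/hive-trunk/build/dist/lib/zookeeper-3.4.3.jar,file:///usr/local/hive-trunk/build/dist/lib/guava-r09.jar</value>
</property>
</configuration>"""


def aux_jars(hive_site_xml):
    """Return the filesystem paths listed in hive.aux.jars.path."""
    root = ET.fromstring(hive_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == "hive.aux.jars.path":
            value = prop.findtext("value", "")
            # Entries are comma-separated file:// URLs; keep only the path part.
            return [urlparse(u.strip()).path for u in value.split(",") if u.strip()]
    return []


def jar_version(path):
    """Split e.g. '/lib/hbase-0.92.0.jar' into ('hbase', '0.92.0').

    The version is taken to start at the first '-' that is followed by a
    digit (or by 'r' plus a digit, as in guava-r09). This is a file-name
    heuristic, not an authoritative version check.
    """
    name = path.rsplit("/", 1)[-1]
    if name.endswith(".jar"):
        name = name[: -len(".jar")]
    match = re.match(r"^(.*?)-(r?\d.*)$", name)
    return (match.group(1), match.group(2)) if match else (name, "")


if __name__ == "__main__":
    for jar in aux_jars(SAMPLE):
        artifact, version = jar_version(jar)
        print(artifact, version)
```

Running the same function over the CDH3u3 lib directories would give a side-by-side view of which HBase, ZooKeeper and Guava versions each side expects.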
...and here's what I'm seeing:
/usr/local/hive-trunk/build/dist # bin/hive
Logging initialized using configuration in file:/usr/local/hive-trunk/build/dist/conf/hive-log4j.properties
Hive history file=/tmp/root/hive_job_log_root_201204061212_2013977734.txt
hive>
> CREATE EXTERNAL TABLE myhivetable(uid string, confidence map<string,string>)
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES ("hbase.columns.mapping" = "mycolfam:")
> TBLPROPERTIES("hbase.table.name" = "myhbasetable");
Interrupting... Be patient, this might take some time.
Press Ctrl+C again to kill JVM
FAILED: Error in metadata: java.lang.RuntimeException: Thread was interrupted while trying to connect to master.
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
...the interrupt is a ^C from me: the CREATE statement hangs and the CLI
becomes unresponsive.
Hive log output:
ip-10-0-2-116:/tmp/root # tail -n 50 hive.log
2012-04-06 12:12:10,583 WARN conf.HiveConf (HiveConf.java:<clinit>(63)) - DEPRECATED: Ignoring hive-default.xml found on the CLASSPATH at /usr/local/hive-trunk/build/dist/conf/hive-default.xml
2012-04-06 12:12:15,065 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.core.resources" but it cannot be resolved.
2012-04-06 12:12:15,065 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.core.resources" but it cannot be resolved.
2012-04-06 12:12:15,066 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.core.runtime" but it cannot be resolved.
2012-04-06 12:12:15,066 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.core.runtime" but it cannot be resolved.
2012-04-06 12:12:15,067 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.text" but it cannot be resolved.
2012-04-06 12:12:15,067 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.text" but it cannot be resolved.
2012-04-06 12:12:17,566 WARN zookeeper.ClientCnxnSocket (ClientCnxnSocket.java:readConnectResult(139)) - Connected to an old server; r-o mode will be unavailable
2012-04-06 12:12:44,440 ERROR exec.Task (SessionState.java:printError(397)) - FAILED: Error in metadata: java.lang.RuntimeException: Thread was interrupted while trying to connect to master.
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Thread was interrupted while trying to connect to master.
    at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:544)
    at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:3304)
    at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:241)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:134)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1325)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1117)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:951)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:215)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:689)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:557)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
Caused by: java.lang.RuntimeException: Thread was interrupted while trying to connect to master.
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getMaster(HConnectionManager.java:669)
    at org.apache.hadoop.hbase.client.HBaseAdmin.<init>(HBaseAdmin.java:106)
    at org.apache.hadoop.hive.hbase.HBaseStorageHandler.getHBaseAdmin(HBaseStorageHandler.java:73)
    at org.apache.hadoop.hive.hbase.HBaseStorageHandler.preCreateTable(HBaseStorageHandler.java:147)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:396)
    at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:538)
    ... 17 more
2012-04-06 12:12:44,441 ERROR ql.Driver (SessionState.java:printError(397)) - FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
...and HBase master logs:
ip-10-0-2-116:/var/log/hbase # tail -n 16 hbase-cmf-hbase1-MASTER-myhost.log.out
2012-04-06 12:57:29,048 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server listener on 60000: readAndProcess threw exception java.io.EOFException. Count of bytes read: 0
java.io.EOFException
    at java.io.DataInputStream.readFully(DataInputStream.java:180)
    at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:63)
    at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:101)
    at org.apache.hadoop.io.UTF8.readChars(UTF8.java:216)
    at org.apache.hadoop.io.UTF8.readString(UTF8.java:208)
    at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:179)
    at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:171)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processHeader(HBaseServer.java:966)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:950)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:522)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:316)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
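The master log shows its IPC listener on port 60000 failing while reading a connection header, which suggests that some client does reach that port. To separate a plain network problem from a client/server JAR mismatch, a basic TCP reachability check from the Hive host can help. This is a minimal sketch: can_connect is a made-up helper name, and the host passed to it is whatever the master's hostname is in your setup ("myhost" below is only a placeholder taken from the log file name).

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened.

    This only proves the port is reachable. It says nothing about whether
    the HBase client and server speak compatible RPC versions.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name-resolution failures.
        return False


# Example call (placeholder host, port taken from the master log above):
#   can_connect("myhost", 60000)
```

If this returns True while the CREATE EXTERNAL TABLE still hangs, the network path is probably fine and the JAR versions become the more likely suspect.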