I've created a simple HBase table (version 0.90.4-cdh3u3) and I'm
attempting to query its contents with Pig (version 0.8.1-cdh3u3):
grunt> A = load 'test' using
org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:a');
grunt> dump A;
(...)Success!
myhbasevalue1
This works when Pig runs in local mode, but when it runs in mapreduce
mode, the MR job fails with an all-too-familiar error message:
org.apache.hadoop.hbase.ZooKeeperConnectionException: HBase is able to
connect to ZooKeeper but the connection closes immediately
To make this work with Pig in local mode, I followed suggestions I found via
a web search and added the HBase classpath to PIG_CLASSPATH:
added to: /usr/lib/pig/bin/pig
export JAVA_HOME=/usr/java/latest
export HBASE_HOME=/usr/lib/hbase
export PIG_CLASSPATH="`${HBASE_HOME}/bin/hbase classpath`:$PIG_CLASSPATH"
added to: /etc/hbase/conf/hbase-site.xml
<property>
<name>hbase.zookeeper.quorum</name>
<value>myzookeeper1</value>
</property>
Again, this works with Pig in local mode. To run the job in mapreduce
mode, I added the target HDFS and JobTracker addresses to the Pig
properties:
added to: /etc/pig/conf/pig.properties
fs.default.name=hdfs://my-mr-cluster/
mapred.job.tracker=my-mr-cluster:8021
When I run the query again on the actual MR cluster, the job fails with the
ZooKeeper exception mentioned above.
When I examine the job.xml (in the MR dashboard as well as in the temporary
TaskTracker cache), I see that hbase.zookeeper.quorum is correctly set
(myzookeeper1). However, when I arbitrarily select a TaskTracker node and
examine the TT logs, I see that the TaskTracker thinks the ZooKeeper quorum
is "localhost".
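For anyone who wants to repeat the check on a node, here is a minimal sketch
of reading a single property out of a Hadoop-style XML file such as job.xml
or hbase-site.xml. The function name read_hadoop_property is hypothetical,
and the sketch assumes the usual <configuration>/<property>/<name>/<value>
layout. It only reports what a file contains; it does not show which files
the task JVM actually loads.

```python
import xml.etree.ElementTree as ET


def read_hadoop_property(xml_path, name):
    """Return the value of the first <property> whose <name> matches.

    Hypothetical helper for illustration. Returns None when the
    property is absent from the file.
    """
    root = ET.parse(xml_path).getroot()
    for prop in root.iter("property"):
        # Compare the stripped <name> text against the requested key.
        if prop.findtext("name", default="").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None
```

Calling read_hadoop_property("/etc/hbase/conf/hbase-site.xml",
"hbase.zookeeper.quorum") on a TaskTracker node and comparing the result
with the same lookup against the cached job.xml would show whether the two
files agree on that node.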
Any ideas? This is mindbending.
Neil Yalowitz