Pig against HBase table - successful in local mode, fails in mapreduce mode (Tasktracker thinks ZK is localhost)

Neil Yalowitz Tue, 15 May 2012 17:28:55 -0700

I've created a simple table in HBase (version 0.90.4-cdh3u3) and I'm
trying to query its contents with Pig (version 0.8.1-cdh3u3).



grunt> A = load 'test' using org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:a');
grunt> dump A;
(...)Success!
myhbasevalue1


This works when Pig runs in local mode, but when it is executed in
mapreduce mode, the MR job fails with an all-too-familiar error message:


    org.apache.hadoop.hbase.ZooKeeperConnectionException: HBase is able to connect to ZooKeeper but the connection closes immediately


To make this work with Pig in local mode, I followed suggestions I found via
a web search: I added the HBase classpath to PIG_CLASSPATH and set the
ZooKeeper quorum in hbase-site.xml.


added to: /usr/lib/pig/bin/pig

export JAVA_HOME=/usr/java/latest
export HBASE_HOME=/usr/lib/hbase
export PIG_CLASSPATH="`${HBASE_HOME}/bin/hbase classpath`:$PIG_CLASSPATH"
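
One way to sanity-check the PIG_CLASSPATH edit is to confirm that the HBase
configuration directory really ends up as an entry on it, since HBase loads
hbase-site.xml from the classpath. A minimal sketch, assuming the conf
directory is /etc/hbase/conf (the path `hbase classpath` actually prints may
differ, e.g. a symlinked conf dir); `classpath_contains` is a hypothetical
helper name:

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if directory $2 appears as an exact
# entry in the colon-separated classpath string $1, fail otherwise.
classpath_contains() {
  case ":$1:" in
    *":$2:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example check against the current environment (assumes /etc/hbase/conf).
if classpath_contains "${PIG_CLASSPATH:-}" /etc/hbase/conf; then
  echo "hbase conf dir is on PIG_CLASSPATH"
else
  echo "hbase conf dir is NOT on PIG_CLASSPATH"
fi
```

Wrapping both sides in colons makes the match exact, so /etc/hbase/conf2
does not count as a hit for /etc/hbase/conf.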


added to: /etc/hbase/conf/hbase-site.xml

<property>
  <name>hbase.zookeeper.quorum</name>
  <value>myzookeeper1</value>
</property>
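
To confirm which quorum a given config file actually carries (the
hbase-site.xml above, or a job.xml copied from the MR dashboard or the
TaskTracker cache), the value can be pulled out with a small awk sketch. This
is naive text matching rather than a real XML parser and assumes the `<name>`
and `<value>` tags contain no extra whitespace; `get_conf_value` is a
hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: print the <value> that follows <name>$1</name> in the
# Hadoop-style XML config file $2. Naive text matching, not an XML parser.
# Exits non-zero if the property (or its value) is not found.
get_conf_value() {
  awk -v key="$1" '
    { buf = buf $0 }                      # join all lines into one string
    END {
      i = index(buf, "<name>" key "</name>")
      if (i == 0) exit 1                  # property not present
      rest = substr(buf, i)
      s = index(rest, "<value>")
      if (s == 0) exit 1
      rest = substr(rest, s + 7)          # skip past the opening value tag
      e = index(rest, "</value>")
      if (e == 0) exit 1
      print substr(rest, 1, e - 1)
    }' "$2"
}

# Example: print the client-side quorum, if the file is readable here.
if [ -r /etc/hbase/conf/hbase-site.xml ]; then
  get_conf_value hbase.zookeeper.quorum /etc/hbase/conf/hbase-site.xml
fi
```

Because all lines are joined first, the same helper works on the multi-line
layout of hbase-site.xml and on single-line `<property>` entries.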


So again, this works with Pig in local mode. To run the job in mapreduce
mode, I added the target HDFS and JobTracker addresses to the Pig
properties:


added to: /etc/pig/conf/pig.properties

fs.default.name=hdfs://my-mr-cluster/
mapred.job.tracker=my-mr-cluster:8021
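
The two keys can be read back out of pig.properties to make sure they are set
as intended. A rough sketch that only understands simple `key=value` lines and
`#`/`!` comments (not the full Java properties syntax, such as `:` separators
or line continuations); `get_prop` is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: print the value of key $1 from properties file $2.
# Only handles "key=value" lines; the last occurrence of the key wins.
# Exits non-zero if the key is not found.
get_prop() {
  awk -F= -v key="$1" '
    /^[ \t]*[#!]/ { next }                # skip comment lines
    {
      k = $1
      gsub(/^[ \t]+|[ \t]+$/, "", k)      # trim whitespace around the key
      if (k == key) {
        v = substr($0, index($0, "=") + 1)
        sub(/^[ \t]+/, "", v)             # drop leading whitespace in value
        found = 1
      }
    }
    END {
      if (!found) exit 1
      print v
    }' "$2"
}

# Example: show the configured JobTracker, if the file is readable here.
if [ -r /etc/pig/conf/pig.properties ]; then
  get_prop mapred.job.tracker /etc/pig/conf/pig.properties
fi
```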


When I run the query again on the actual MR cluster, the job fails with the
ZooKeeper exception I mentioned above.

When I examine the job.xml (in the MR dashboard as well as in the temporary
TaskTracker cache), I see that hbase.zookeeper.quorum is correctly set
(myzookeeper1). However, when I arbitrarily select a TaskTracker node and
examine the TT logs, I see that the TaskTracker thinks the ZK is
"localhost".
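
To see which ZooKeeper ensemble the tasks really tried, the logs pulled from a
node can be scanned for the connect string. This assumes the logs contain the
ZooKeeper client's usual INFO line of the form
`Initiating client connection, connectString=host:port ...`; both function
names are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print each distinct ZooKeeper connect string found in
# the given log files, based on the "connectString=<hosts>" token.
zk_connect_strings() {
  sed -n 's/.*connectString=\([^ ]*\).*/\1/p' "$@" | sort -u
}

# Hypothetical check: warn on stderr and return 1 if any connect string
# starts with localhost (the default hbase.zookeeper.quorum) instead of
# the configured quorum.
check_zk_quorum() {
  if zk_connect_strings "$@" | grep -q '^localhost'; then
    echo "WARNING: ZooKeeper client connected to localhost" >&2
    return 1
  fi
  return 0
}
```

Running `check_zk_quorum` over the task logs from several nodes shows whether
the localhost fallback happens everywhere or only on some TaskTrackers.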

Any ideas?  This is mindbending.


Neil Yalowitz