I'm by no means an expert, but I think it's the latter. My rudimentary understanding is that Pig uses HBaseStorage to load the data from HBase and passes the input splits along to Hadoop/MapReduce. Feel free to correct me if I'm wrong.
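
If that is right, the splits are computed on the machine that launches Pig (the stack trace below fails in PigInputFormat.getSplits), so the Pig client itself has to be able to find hbase-site.xml to learn where ZooKeeper is. A minimal sketch of one way to try that, assuming your HBase configuration lives in /etc/hbase/conf (a guess; adjust it to your install) and that your bin/pig launcher picks up PIG_CLASSPATH:

```shell
# Assumption: hbase-site.xml lives in this directory; change it for your install.
HBASE_CONF_DIR="${HBASE_CONF_DIR:-/etc/hbase/conf}"

# Prepend the HBase config directory so the Pig client can read
# hbase.zookeeper.quorum from hbase-site.xml when it computes input splits.
# The ${VAR:+...} form avoids a trailing ':' when PIG_CLASSPATH is unset.
export PIG_CLASSPATH="${HBASE_CONF_DIR}${PIG_CLASSPATH:+:${PIG_CLASSPATH}}"

echo "PIG_CLASSPATH=${PIG_CLASSPATH}"
```

Run this in the same shell before starting pig. If the timeout persists, it may be worth checking that the quorum host named in hbase-site.xml is reachable from that machine.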
--
Jameson Lopp
Software Engineer
Bronto Software, Inc.

On 04/12/2011 10:50 AM, Daniel Eklund wrote:
As a follow-up to my own question: which of these accurately describes the
component call stack of the Pig script I included in my post?

pig ->  mapreduce/hadoop ->  Hbase
pig  ->  Hbase ->  mapreduce/hadoop



On Tue, Apr 12, 2011 at 9:53 AM, Daniel Eklund wrote:

This question might be better diagnosed as an HBase issue, but since it's
ultimately a Pig script I want to use, I figure someone on this group could
help me out. I tried asking in the IRC channel, but I think it was in a lull.

My scenario: I want to use Pig to read from an HBase store.
My installs: Apache Pig version 0.8.0-CDH3B4; HBase version hbase-0.90.1-CDH3B4.
My sample script:

-----------
A = load 'passwd' using PigStorage(':');
rawDocs = LOAD 'hbase://daniel_product'
         USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('base:testCol1');
vals = foreach rawDocs generate $0 as val;
dump vals;
store vals into 'daniel.out';
-----------

I am consistently getting the following failure:

Failed Jobs:
JobId   Alias   Feature Message Outputs
N/A     rawDocs,vals    MAP_ONLY        Message: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Timed out trying to locate root region
         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:280)


Googling shows me similar issues:

http://search-hadoop.com/m/RPLkD1bmY4l&subj=Re+Cannot+connect+HBase+to+Pig

My current understanding is that somewhere in the interaction between Pig,
Hadoop, HBase, and ZooKeeper, there is a configuration file that needs to be
included in a classpath or a configuration directory somewhere.  I have
tried various combinations of making Hadoop aware of HBase and vice versa.
I have tried ZooKeeper running on its own, and also managed by HBase.

Can someone explain the dependencies here?  Any insight as to what I am
missing?  What would your diagnosis of the above message be?

thanks,
daniel




