Yes, Pig's HBaseStorage uses the HBase client to read from and write to
HBase directly from within an MR job, but it chains to other Pig-generated
MR jobs as needed to do the transformations.
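For example, a minimal sketch of that flow (the table and column names here
are made up for illustration; with -loadKey the row key comes through as the
first field, and on a store the first field of the relation is used as the
row key):

  raw  = LOAD 'hbase://some_table'
         USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:col1', '-loadKey');
  vals = FOREACH raw GENERATE $0 AS rowkey, $1 AS col1;
  STORE vals INTO 'hbase://some_other_table'
         USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:col1');

Both the LOAD and the STORE go through the HBase client inside the map
tasks; any intermediate FOREACH/GROUP steps become additional MR jobs.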

Daniel, check that you have defined HBASE_CONF_DIR properly, or that you
have hbase-site.xml on your classpath. Then try to telnet to the configured
ZooKeeper host from the machine where the exception is being thrown. FYI,
there is some communication from Pig to HBase/ZooKeeper from the node the
client runs on before the MR jobs start on the cluster.
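For example, something along these lines on the machine you launch pig from
(the paths and the zkhost value are placeholders; use whatever
hbase.zookeeper.quorum points at in your hbase-site.xml):

  # point the Pig client at the directory containing hbase-site.xml
  export HBASE_CONF_DIR=/etc/hbase/conf

  # or put that directory (plus the hbase/zookeeper jars) on Pig's classpath
  export PIG_CLASSPATH=$HBASE_CONF_DIR:$PIG_CLASSPATH

  # check that this node can actually reach a quorum member
  # (2181 is the default ZooKeeper client port)
  telnet zkhost 2181

If the telnet hangs or is refused, the "Timed out trying to locate root
region" error is most likely a connectivity/configuration problem rather
than anything in the script itself.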


On Tue, Apr 12, 2011 at 8:40 AM, Jameson Lopp <[email protected]> wrote:
> I'm by no means an expert, but I think it's the latter. My rudimentary
> understanding is that Pig uses HBaseStorage to load the data from HBase and
> passes the input splits along to Hadoop/MR. Feel free to correct me if I'm
> wrong.
> --
> Jameson Lopp
> Software Engineer
> Bronto Software, Inc.
>
> On 04/12/2011 10:50 AM, Daniel Eklund wrote:
>>
>> As a follow-up to my own question: which of the following accurately
>> describes the component call-stack of the pig script I included in my
>> post?
>>
>> pig -> mapreduce/hadoop -> HBase
>> pig -> HBase -> mapreduce/hadoop
>>
>>
>>
>> On Tue, Apr 12, 2011 at 9:53 AM, Daniel Eklund<[email protected]>  wrote:
>>
>>> This question might be better diagnosed as an HBase issue, but since it's
>>> ultimately a Pig script I want to use, I figure someone on this group could
>>> help me out. I tried asking on the IRC channel, but I think it was in a
>>> lull.
>>>
>>> My scenario:  I want to use Pig to load data from an HBase table.
>>> My installs:  Apache Pig 0.8.0-CDH3B4, HBase 0.90.1-CDH3B4.
>>> My sample script:
>>>
>>> -----------
>>> A = load 'passwd' using PigStorage(':');
>>> rawDocs = LOAD 'hbase://daniel_product'
>>>           USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('base:testCol1');
>>> vals = foreach rawDocs generate $0 as val;
>>> dump vals;
>>> store vals into 'daniel.out';
>>> -----------
>>>
>>> I am consistently getting a failure like this:
>>>
>>> Failed Jobs:
>>> JobId   Alias   Feature Message Outputs
>>> N/A     rawDocs,vals    MAP_ONLY        Message:
>>> org.apache.pig.backend.executionengine.ExecException: ERROR 2118:
>>> Timed out trying to locate root region
>>>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:280)
>>>
>>>
>>> Googling shows me similar issues:
>>>
>>>
>>> http://search-hadoop.com/m/RPLkD1bmY4l&subj=Re+Cannot+connect+HBase+to+Pig
>>>
>>> My current understanding is that somewhere in the interaction between
>>> Pig, Hadoop, HBase, and ZooKeeper there is a configuration file that
>>> needs to be included in a classpath or a configuration directory
>>> somewhere. I have tried various combinations of making Hadoop aware of
>>> HBase and vice versa. I have tried ZooKeeper running on its own, and
>>> also managed by HBase.
>>>
>>> Can someone explain the dependencies here?  Any insight as to what I am
>>> missing?  What would your diagnosis of the above message be?
>>>
>>> thanks,
>>> daniel
