Yes. Pig's HBaseStorage uses the HBase client to read/write directly to HBase from within an MR job, and chains to other Pig-generated MR jobs as needed to transform the data.
Daniel, check that you have defined HBASE_CONF_DIR properly, or that you have
hbase-site.xml on your classpath. Then try to telnet to the configured
ZooKeeper host from the machine where the exception is being generated. FYI,
some communication from Pig to HBase/ZK happens on the node the client runs
on, before the MR jobs start on the cluster.

On Tue, Apr 12, 2011 at 8:40 AM, Jameson Lopp <[email protected]> wrote:
> I'm by no means an expert, but I think it's the latter. My rudimentary
> understanding is that Pig uses HBaseStorage to load the data from HBase and
> passes the input splits along to Hadoop/MR. Feel free to correct me if I'm
> wrong.
> --
> Jameson Lopp
> Software Engineer
> Bronto Software, Inc.
>
> On 04/12/2011 10:50 AM, Daniel Eklund wrote:
>>
>> As a follow-up to my own question: which of these accurately describes
>> the component call-stack of the Pig script I included in my post?
>>
>> pig -> mapreduce/hadoop -> Hbase
>> pig -> Hbase -> mapreduce/hadoop
>>
>> On Tue, Apr 12, 2011 at 9:53 AM, Daniel Eklund <[email protected]> wrote:
>>
>>> This question might be better diagnosed as an HBase issue, but since
>>> it's ultimately a Pig script I want to use, I figure someone on this
>>> group could help me out. I tried asking the IRC channel, but I think it
>>> was in a lull.
>>>
>>> My scenario: I want to use Pig to read from an HBase store.
>>> My installs: Apache Pig version 0.8.0-CDH3B4 --- HBase version:
>>> hbase-0.90.1-CDH3B4.
>>> My sample script:
>>>
>>> -----------
>>> A = load 'passwd' using PigStorage(':');
>>> rawDocs = LOAD 'hbase://daniel_product'
>>>     USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('base:testCol1');
>>> vals = foreach rawDocs generate $0 as val;
>>> dump vals;
>>> store vals into 'daniel.out';
>>> -----------
>>>
>>> I am consistently getting:
>>>
>>> Failed Jobs:
>>> JobId  Alias         Feature   Message  Outputs
>>> N/A    rawDocs,vals  MAP_ONLY  Message:
>>> org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Timed
>>> out trying to locate root region
>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:280)
>>>
>>> Googling shows me similar issues:
>>> http://search-hadoop.com/m/RPLkD1bmY4l&subj=Re+Cannot+connect+HBase+to+Pig
>>>
>>> My current understanding is that somewhere in the interaction between
>>> Pig, Hadoop, HBase, and ZooKeeper, there is a configuration file that
>>> needs to be included in a classpath or a configuration directory
>>> somewhere. I have tried various combinations of making Hadoop aware of
>>> HBase and vice versa. I have tried ZK running on its own, and also
>>> managed by HBase.
>>>
>>> Can someone explain the dependencies here? Any insight as to what I am
>>> missing? What would your diagnosis of the above message be?
>>>
>>> thanks,
>>> daniel
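The two checks suggested above (hbase-site.xml visible via HBASE_CONF_DIR or the
classpath, and ZooKeeper reachable from the client node) can be sketched as a
small diagnostic to run on the machine where you launch pig. This is only a
sketch: the function names are hypothetical, and it assumes ZooKeeper's default
client port 2181; substitute the quorum host/port from your hbase-site.xml.

```python
import os
import socket


def hbase_site_dir():
    """Return the first directory on HBASE_CONF_DIR or CLASSPATH that
    contains hbase-site.xml, or None if it is found nowhere.
    (Hypothetical helper name, not part of Pig or HBase.)"""
    candidates = []
    conf_dir = os.environ.get("HBASE_CONF_DIR")
    if conf_dir:
        candidates.append(conf_dir)
    candidates += os.environ.get("CLASSPATH", "").split(os.pathsep)
    for d in candidates:
        if d and os.path.isfile(os.path.join(d, "hbase-site.xml")):
            return d
    return None


def zk_reachable(host, port=2181, timeout=3.0):
    """True if a plain TCP connection to the ZooKeeper quorum member
    succeeds -- the same thing the telnet test above verifies."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If `hbase_site_dir()` returns None, the Pig client falls back to default HBase
settings (localhost ZooKeeper), which produces exactly the "Timed out trying to
locate root region" failure in getSplits, since that lookup happens on the
client node before the MR jobs are submitted.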
