Ah, ok. The reason I was surprised is that if you are using 0.90.1 and the latest 0.8 branch, the HBaseStorage code in Pig is supposed to auto-register the hbase, zookeeper, and google-collections jars, so you won't have to do that.
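[To illustrate the auto-registration point: under HBase 0.90.x with Pig 0.8 from the branch, a load like the one later in this thread should need no REGISTER statements at all. A sketch reusing the thread's 'track' table and 'open' column family, not a tested example:]

```pig
-- Assumes HBase 0.90.x plus Pig 0.8 branch (soon-to-be 0.8.1), where
-- HBaseStorage auto-registers the hbase, zookeeper, and
-- google-collections jars -- no REGISTER lines required.
data = LOAD 'hbase://track'
       USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
           'open:browser open:ip open:os', '-caching 1000')
       AS (browser:chararray, ipAddress:chararray, os:chararray);
DUMP data;
```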
fwiw, 0.90.1 has been MUCH more stable for us than any of the 0.20 releases. The upgrade is worth it.

D

On Tue, Mar 29, 2011 at 12:08 PM, Jameson Lopp <[email protected]> wrote:
> You're correct - I didn't mention that we have several environments.
> We're running HBase 0.20 in production and upgraded to 0.90.1 in
> development, but they ended up rolling back the upgrade due to other
> issues. My point is that the class-not-found errors appear to be
> unrelated to version incompatibilities - once I register the
> appropriate jars in my pig script, the MR jobs run.
>
> On 03/29/2011 12:47 PM, Dmitriy Ryaboy wrote:
>> There's something odd about this jar list. You said you are running
>> hbase 0.90.1, yet you register a Cloudera hbase 0.20.3 jar. You are
>> also registering an ancient zookeeper jar. It doesn't sound like you
>> are actually running either hbase 0.90.1 or Pig 0.8 from the tip of
>> the svn branch.
>>
>> D
>>
>> On Tue, Mar 29, 2011 at 6:34 AM, Jameson Lopp <[email protected]> wrote:
>>> Just to follow up: I'm running Pig 0.8 from SVN. I finally got it
>>> working, though I'm not sure why this was required. I resolved the
>>> ClassNotFound errors by manually registering the jars in my Pig
>>> script:
>>>
>>> REGISTER /path/to/pig_0.8/piggybank.jar;
>>> REGISTER /path/to/pig_0.8/lib/google-collections-1.0.jar;
>>> REGISTER /path/to/pig_0.8/lib/hbase-0.20.3-1.cloudera.jar;
>>> REGISTER /path/to/pig_0.8/lib/zookeeper-hbase-1329.jar;
>>>
>>> We had these jars placed in the hadoop /lib directory on all of our
>>> hadoop machines, and thus figured that they would get loaded for the
>>> map reduce jobs. Apparently this is not the case...
>>>
>>> --
>>> Jameson Lopp
>>> Software Engineer
>>> Bronto Software, Inc.
>>>
>>> On 03/25/2011 04:53 PM, Dmitriy Ryaboy wrote:
>>>> Pig 8 distribution or Pig 8 from svn?
>>>> You want the latter (soon-to-be Pig 0.8.1)
>>>>
>>>> D
>>>>
>>>> On Fri, Mar 25, 2011 at 1:02 PM, Jameson Lopp <[email protected]> wrote:
>>>>> Alright, I set up HBase 0.90.1 and Pig 0.8.0 and feel like
>>>>> everything is configured, but my pig script hangs after connecting
>>>>> to zookeeper... my map reduce job doesn't get scheduled and the
>>>>> process looks frozen. Some debug output:
>>>>>
>>>>> 2011-03-25 15:51:07,344 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged MR job 285 into MR job 282
>>>>> 2011-03-25 15:51:07,344 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged MR job 293 into MR job 282
>>>>> 2011-03-25 15:51:07,344 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged MR job 313 into MR job 282
>>>>> 2011-03-25 15:51:07,345 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Requested parallelism of splitter: -1
>>>>> 2011-03-25 15:51:07,345 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged 3 map-reduce splittees.
>>>>> 2011-03-25 15:51:07,345 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged 3 out of total 4 MR operators.
>>>>> 2011-03-25 15:51:07,345 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 8
>>>>> 2011-03-25 15:51:07,423 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
>>>>> 2011-03-25 15:51:07,434 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
>>>>> 2011-03-25 15:51:11,014 [main] DEBUG org.apache.pig.impl.io.InterStorage - Pig Internal storage in use
>>>>> 2011-03-25 15:51:11,014 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up multi store job
>>>>> 2011-03-25 15:51:11,021 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=0
>>>>> 2011-03-25 15:51:11,022 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Neither PARALLEL nor default parallelism is set for this job. Setting number of reducers to 1
>>>>> 2011-03-25 15:51:11,103 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
>>>>> 2011-03-25 15:51:11,504 [Thread-3] DEBUG org.apache.pig.impl.io.InterStorage - Pig Internal storage in use
>>>>> 2011-03-25 15:51:11,611 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
>>>>>
>>>>> [snipped] ...
>>>>> 2011-03-25 15:47:08,617 [Thread-3-SendThread] INFO org.apache.zookeeper.ClientCnxn - Attempting connection to server 10.202.61.184:2181
>>>>> 2011-03-25 15:47:08,625 [Thread-3-SendThread] INFO org.apache.zookeeper.ClientCnxn - Priming connection to java.nio.channels.SocketChannel[connected local=/10.220.25.162:34767 remote=10.202.61.184:2181]
>>>>> 2011-03-25 15:47:08,627 [Thread-3-SendThread] INFO org.apache.zookeeper.ClientCnxn - Server connection successful
>>>>>
>>>>> I found a few threads about people having problems connecting to
>>>>> hbase through zookeeper due to misconfiguration / network issues,
>>>>> but I don't see any where it claims to connect successfully and
>>>>> then hangs... weird.
>>>>>
>>>>> --
>>>>> Jameson Lopp
>>>>> Software Engineer
>>>>> Bronto Software, Inc.
>>>>>
>>>>> On 03/25/2011 12:06 PM, Bill Graham wrote:
>>>>>> The Pig trunk and Pig 0.8.0 branch both require HBase >= 0.89 (see
>>>>>> PIG-1680). The Pig 0.8.0 release requires < 0.89 though, so you
>>>>>> should focus on that version of Pig. Or better yet, upgrade HBase
>>>>>> to 0.90.1 if possible.
>>>>>>
>>>>>> On Fri, Mar 25, 2011 at 6:59 AM, Jameson Lopp <[email protected]> wrote:
>>>>>>> Running HBase 0.20-0.20.3-1.cloudera - I've tried running this
>>>>>>> with Pig 0.8 from August 2010 and from trunk on March 25 2011. Do
>>>>>>> I need to use an older version?
>>>>>>> My pig script is trying to load from hbase via this command:
>>>>>>>
>>>>>>> data = LOAD 'hbase://track' USING
>>>>>>>     org.apache.pig.backend.hadoop.hbase.HBaseStorage(
>>>>>>>         'open:browser open:ip open:os', '-caching 1000')
>>>>>>>     as (browser:chararray, ipAddress:chararray, os:chararray);
>>>>>>>
>>>>>>> But the job fails trying to load the data:
>>>>>>>
>>>>>>> Input(s):
>>>>>>> Failed to read data from "hbase://track"
>>>>>>>
>>>>>>> When I look at my map reduce job, it fails every time with a
>>>>>>> ClassNotFoundException:
>>>>>>>
>>>>>>> java.io.IOException: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.mapreduce.TableSplit
>>>>>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit.readFields(PigSplit.java:197)
>>>>>>>     at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
>>>>>>>     at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
>>>>>>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:586)
>>>>>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>>>>>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>>>>> Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.mapreduce.TableSplit
>>>>>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>>>>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>>>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>>>>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>>>>>>>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>>>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
>>>>>>>     at java.lang.Class.forName0(Native Method)
>>>>>>>     at java.lang.Class.forName(Class.java:247)
>>>>>>>     at
>>>>>>> org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:907)
>>>>>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit.readFields(PigSplit.java:185)
>>>>>>>     ... 5 more
>>>>>>>
>>>>>>> Now, perhaps this issue is better suited for a hadoop / map
>>>>>>> reduce / cloudera mailing list, but every node in my hadoop
>>>>>>> cluster has /usr/local/hadoop/lib/hbase-0.20.3-1.cloudera.jar,
>>>>>>> which includes the TableSplit class... so it seems to me that it
>>>>>>> should have no problem loading it.
>>>>>>>
>>>>>>> I've run out of ideas at this point - anyone have suggestions?
>>>>>>> Thanks!
>>>>>>>
>>>>>>> --
>>>>>>> Jameson Lopp
>>>>>>> Software Engineer
>>>>>>> Bronto Software, Inc.
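[A possible explanation for the ClassNotFoundException above, offered as an assumption rather than a confirmed diagnosis: jars dropped into hadoop/lib are typically picked up by TaskTrackers only after a restart, and per-node classpath edits can drift out of sync across the cluster, whereas REGISTER makes Pig ship the jar with the job itself. A sketch of that workaround using the cluster-local path from the thread:]

```pig
-- Ship the dependency jar with the MR job instead of relying on each
-- node's hadoop/lib classpath (path is the one mentioned above).
REGISTER /usr/local/hadoop/lib/hbase-0.20.3-1.cloudera.jar;

data = LOAD 'hbase://track'
       USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
           'open:browser open:ip open:os', '-caching 1000')
       AS (browser:chararray, ipAddress:chararray, os:chararray);
```

[Equivalently, Pig 0.8 can take the jars on the command line via -Dpig.additional.jars=jar1:jar2, which avoids editing the script.]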
