There's something odd about this jar list.
You said you are running HBase 0.90.1, yet you register a Cloudera HBase
0.20.3 jar. You are also registering an ancient ZooKeeper jar. It doesn't
sound like you are actually running either HBase 0.90.1 or Pig 0.8 from the
tip of the svn branch.

D

On Tue, Mar 29, 2011 at 6:34 AM, Jameson Lopp <[email protected]> wrote:

> Just to follow up: I'm running Pig 0.8 from SVN. I finally got it working
> though I'm not sure why this was required. I resolved the Class Not Found
> errors by manually registering the jars in my Pig script:
>
> REGISTER /path/to/pig_0.8/piggybank.jar;
> REGISTER /path/to/pig_0.8/lib/google-collections-1.0.jar;
> REGISTER /path/to/pig_0.8/lib/hbase-0.20.3-1.cloudera.jar;
> REGISTER /path/to/pig_0.8/lib/zookeeper-hbase-1329.jar;
>
> We had these jars placed in the hadoop /lib directory on all of our hadoop
> machines, and thus figured that they would get loaded for the map reduce
> jobs. Apparently this is not the case...
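An equivalent way to ship those jars, for anyone who prefers not to edit the
script, is Pig's pig.additional.jars property on the command line. This is
only a sketch; the /path/to prefix is the same placeholder used in the
REGISTER statements above, not a real path:

```shell
# Build the comma-separated jar list that pig.additional.jars expects.
# Paths are placeholders matching the REGISTER statements in the script.
PIG_HOME=/path/to/pig_0.8
JARS="$PIG_HOME/piggybank.jar"
JARS="$JARS,$PIG_HOME/lib/google-collections-1.0.jar"
JARS="$JARS,$PIG_HOME/lib/hbase-0.20.3-1.cloudera.jar"
JARS="$JARS,$PIG_HOME/lib/zookeeper-hbase-1329.jar"
echo "$JARS"
# Then invoke, e.g.: pig -Dpig.additional.jars="$JARS" myscript.pig
```

Either way the jars end up on the front-end classpath and get shipped with
the job, which is what dropping them into Hadoop's lib/ alone did not do.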
>
>
> --
> Jameson Lopp
> Software Engineer
> Bronto Software, Inc.
>
> On 03/25/2011 04:53 PM, Dmitriy Ryaboy wrote:
>
>> Pig 0.8 distribution or Pig 0.8 from svn?
>> You want the latter (soon-to-be Pig 0.8.1)
>>
>> D
>>
>> On Fri, Mar 25, 2011 at 1:02 PM, Jameson Lopp <[email protected]> wrote:
>>
>>> Alright, I set up hbase 0.90.1 and pig 0.8.0 and feel like everything is
>>> configured, but my pig script hangs after connecting to zookeeper... my
>>> map reduce job doesn't get scheduled and the process looks frozen. Some
>>> debug output:
>>>
>>> 2011-03-25 15:51:07,344 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged MR job 285 into MR job 282
>>> 2011-03-25 15:51:07,344 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged MR job 293 into MR job 282
>>> 2011-03-25 15:51:07,344 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged MR job 313 into MR job 282
>>> 2011-03-25 15:51:07,345 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Requested parallelism of splitter: -1
>>> 2011-03-25 15:51:07,345 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged 3 map-reduce splittees.
>>> 2011-03-25 15:51:07,345 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged 3 out of total 4 MR operators.
>>> 2011-03-25 15:51:07,345 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 8
>>> 2011-03-25 15:51:07,423 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
>>> 2011-03-25 15:51:07,434 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
>>> 2011-03-25 15:51:11,014 [main] DEBUG org.apache.pig.impl.io.InterStorage - Pig Internal storage in use
>>> 2011-03-25 15:51:11,014 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up multi store job
>>> 2011-03-25 15:51:11,021 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=0
>>> 2011-03-25 15:51:11,022 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Neither PARALLEL nor default parallelism is set for this job. Setting number of reducers to 1
>>> 2011-03-25 15:51:11,103 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
>>> 2011-03-25 15:51:11,504 [Thread-3] DEBUG org.apache.pig.impl.io.InterStorage - Pig Internal storage in use
>>> 2011-03-25 15:51:11,611 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
>>>
>>> [snipped] ...
>>>
>>> 2011-03-25 15:47:08,617 [Thread-3-SendThread] INFO  org.apache.zookeeper.ClientCnxn - Attempting connection to server 10.202.61.184:2181
>>> 2011-03-25 15:47:08,625 [Thread-3-SendThread] INFO  org.apache.zookeeper.ClientCnxn - Priming connection to java.nio.channels.SocketChannel[connected local=/10.220.25.162:34767 remote=10.202.61.184:2181]
>>> 2011-03-25 15:47:08,627 [Thread-3-SendThread] INFO  org.apache.zookeeper.ClientCnxn - Server connection successful
>>>
>>> I found a few threads about people having problems connecting to HBase
>>> through ZooKeeper due to misconfiguration or network issues, but none
>>> where the client claims to connect successfully and then hangs... weird.
>>>
>>> --
>>> Jameson Lopp
>>> Software Engineer
>>> Bronto Software, Inc.
>>>
>>> On 03/25/2011 12:06 PM, Bill Graham wrote:
>>>
>>>> The Pig trunk and Pig 0.8.0 branch both require HBase >= 0.89 (see
>>>> PIG-1680). The Pig 0.8.0 release requires < 0.89 though, so you should
>>>> focus on that version of Pig. Or better yet, upgrade HBase to 0.90.1
>>>> if possible.
>>>>
>>>> On Fri, Mar 25, 2011 at 6:59 AM, Jameson Lopp <[email protected]>
>>>> wrote:
>>>>
>>>>> Running HBase 0.20-0.20.3-1.cloudera - I've tried running this with
>>>>> Pig 0.8 from August 2010 and from trunk on March 25, 2011. Do I need
>>>>> to use an older version?
>>>>>
>>>>> My pig script is trying to load from hbase via this command:
>>>>>
>>>>>     data = LOAD 'hbase://track'
>>>>>         USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
>>>>>             'open:browser open:ip open:os', '-caching 1000')
>>>>>         AS (browser:chararray, ipAddress:chararray, os:chararray);
>>>>>
>>>>> But the job fails trying to load the data:
>>>>>        Input(s):
>>>>>        Failed to read data from "hbase://track"
>>>>>
>>>>> When I look at my map reduce job, it fails every time with a
>>>>> ClassNotFoundException:
>>>>> java.io.IOException: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.mapreduce.TableSplit
>>>>>        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit.readFields(PigSplit.java:197)
>>>>>        at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
>>>>>        at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
>>>>>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:586)
>>>>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>>>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>>> Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.mapreduce.TableSplit
>>>>>        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>>>>>        at java.security.AccessController.doPrivileged(Native Method)
>>>>>        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>>>>>        at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>>>>>        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>>>>        at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
>>>>>        at java.lang.Class.forName0(Native Method)
>>>>>        at java.lang.Class.forName(Class.java:247)
>>>>>        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:907)
>>>>>        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit.readFields(PigSplit.java:185)
>>>>>        ... 5 more
>>>>>
>>>>> Now, perhaps this issue is better suited for a Hadoop / MapReduce /
>>>>> Cloudera mailing list, but every node in my Hadoop cluster has
>>>>> /usr/local/hadoop/lib/hbase-0.20.3-1.cloudera.jar, which includes the
>>>>> TableSplit class... so it seems to me that it should have no problem
>>>>> loading it.
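One quick sanity check worth running on a task-tracker node is confirming
that the class really is in that jar at the quoted path. This is only a
sketch; it uses unzip rather than the JDK's jar tool so it works on nodes
without a JDK, and it prints a message instead of failing when run on a
machine that doesn't have the jar:

```shell
# Verify the jar at the path quoted above actually contains TableSplit.
JAR=/usr/local/hadoop/lib/hbase-0.20.3-1.cloudera.jar
if [ -f "$JAR" ]; then
  # List the jar's entries and look for the missing class.
  unzip -l "$JAR" | grep 'mapreduce/TableSplit'
else
  echo "jar not found at $JAR"
fi
```

If the class is present, the remaining suspect is the task JVM's classpath:
jars dropped into Hadoop's lib/ are only picked up after a daemon restart,
and they are not automatically shipped to the job the way REGISTER'd jars are.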
>>>>>
>>>>> I've run out of ideas at this point - anyone have suggestions? Thanks!
>>>>> --
>>>>> Jameson Lopp
>>>>> Software Engineer
>>>>> Bronto Software, Inc.
>>>>>
>>>>>
>>>>>
>>>
>>
>