Pig 0.8 distribution or Pig 0.8 from svn? You want the latter (soon-to-be Pig 0.8.1).
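[Archive note: grabbing the 0.8 branch from svn would look roughly like the sketch below. The branch URL follows the usual Apache svn layout but is an assumption here, not something stated in the thread; check the Pig site for the canonical path.]

```shell
# Assumed branch URL -- verify against the Pig project site before use.
svn checkout http://svn.apache.org/repos/asf/pig/branches/branch-0.8 pig-0.8
cd pig-0.8
ant jar   # builds pig.jar with the latest branch fixes
```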
D

On Fri, Mar 25, 2011 at 1:02 PM, Jameson Lopp <[email protected]> wrote:
> Alright, I set up hbase 0.90.1 and pig 0.8.0 and feel like everything is
> configured, but my pig script hangs after connecting to zookeeper... my
> map reduce job doesn't get scheduled and the process looks frozen. Some
> debug output:
>
> 2011-03-25 15:51:07,344 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
> - Merged MR job 285 into MR job 282
> 2011-03-25 15:51:07,344 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
> - Merged MR job 293 into MR job 282
> 2011-03-25 15:51:07,344 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
> - Merged MR job 313 into MR job 282
> 2011-03-25 15:51:07,345 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
> - Requested parallelism of splitter: -1
> 2011-03-25 15:51:07,345 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
> - Merged 3 map-reduce splittees.
> 2011-03-25 15:51:07,345 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
> - Merged 3 out of total 4 MR operators.
> 2011-03-25 15:51:07,345 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
> - MR plan size after optimization: 8
> 2011-03-25 15:51:07,423 [main] INFO
> org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added
> to the job
> 2011-03-25 15:51:07,434 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
> - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
> 2011-03-25 15:51:11,014 [main] DEBUG org.apache.pig.impl.io.InterStorage -
> Pig Internal storage in use
> 2011-03-25 15:51:11,014 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
> - Setting up multi store job
> 2011-03-25 15:51:11,021 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
> - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=0
> 2011-03-25 15:51:11,022 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
> - Neither PARALLEL nor default parallelism is set for this job. Setting
> number of reducers to 1
> 2011-03-25 15:51:11,103 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - 1 map-reduce job(s) waiting for submission.
> 2011-03-25 15:51:11,504 [Thread-3] DEBUG
> org.apache.pig.impl.io.InterStorage - Pig Internal storage in use
> 2011-03-25 15:51:11,611 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - 0% complete
>
> [snipped] ...
>
> 2011-03-25 15:47:08,617 [Thread-3-SendThread] INFO
> org.apache.zookeeper.ClientCnxn - Attempting connection to server
> 10.202.61.184:2181
> 2011-03-25 15:47:08,625 [Thread-3-SendThread] INFO
> org.apache.zookeeper.ClientCnxn - Priming connection to
> java.nio.channels.SocketChannel[connected local=/10.220.25.162:34767
> remote=10.202.61.184:2181]
> 2011-03-25 15:47:08,627 [Thread-3-SendThread] INFO
> org.apache.zookeeper.ClientCnxn - Server connection successful
>
> I found a few threads about people having problems connecting to hbase
> through zookeeper due to misconfiguration / network issues, but I don't
> see any where it claims to connect successfully and then hangs... weird.
>
> --
> Jameson Lopp
> Software Engineer
> Bronto Software, Inc.
>
> On 03/25/2011 12:06 PM, Bill Graham wrote:
>
>> The Pig trunk and Pig 0.8.0 branch both require HBase >= 0.89 (see
>> PIG-1680). The Pig 0.8.0 release requires < 0.89 though, so you should
>> focus on that version of Pig. Or better yet, upgrade HBase to 0.90.1
>> if possible.
>>
>> On Fri, Mar 25, 2011 at 6:59 AM, Jameson Lopp <[email protected]> wrote:
>>
>>> Running HBase 0.20-0.20.3-1.cloudera - I've tried running this with
>>> Pig 0.8 from August 2010 and from trunk on March 25 2011. Do I need
>>> to use an older version?
>>>
>>> My pig script is trying to load from hbase via this command:
>>>
>>> data = LOAD 'hbase://track' USING
>>> org.apache.pig.backend.hadoop.hbase.HBaseStorage('open:browser open:ip
>>> open:os', '-caching 1000') as (browser:chararray, ipAddress:chararray,
>>> os:chararray);
>>>
>>> But the job fails trying to load the data:
>>>
>>> Input(s):
>>> Failed to read data from "hbase://track"
>>>
>>> When I look at my map reduce job, it fails every time with a
>>> ClassNotFoundException:
>>>
>>> java.io.IOException: java.lang.ClassNotFoundException:
>>> org.apache.hadoop.hbase.mapreduce.TableSplit
>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit.readFields(PigSplit.java:197)
>>>     at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
>>>     at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
>>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:586)
>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>> Caused by: java.lang.ClassNotFoundException:
>>> org.apache.hadoop.hbase.mapreduce.TableSplit
>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>>>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
>>>     at java.lang.Class.forName0(Native Method)
>>>     at java.lang.Class.forName(Class.java:247)
>>>     at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:907)
>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit.readFields(PigSplit.java:185)
>>>     ... 5 more
>>>
>>> Now, perhaps this issue is better suited for a hadoop / map reduce /
>>> cloudera mailing list, but every node in my hadoop cluster has
>>> /usr/local/hadoop/lib/hbase-0.20.3-1.cloudera.jar which includes the
>>> TableSplit class... so it seems to me that it should have no problem
>>> loading it.
>>>
>>> I've run out of ideas at this point - anyone have suggestions? Thanks!
>>>
>>> --
>>> Jameson Lopp
>>> Software Engineer
>>> Bronto Software, Inc.
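[Archive note: the ClassNotFoundException for TableSplit above is thrown while deserializing the input split on a task node, i.e. the HBase classes were not on the classpath of the spawned map task even though the jar sat in /usr/local/hadoop/lib. One approach worth trying (a sketch only, not the confirmed resolution of this thread; the jar path is the one Jameson mentions) is to REGISTER the HBase jar in the Pig script, which makes Pig ship it with the job rather than relying on each node's Hadoop classpath:]

```pig
-- Sketch: REGISTER ships the jar to the backend with the job.
-- Path taken from the message above; adjust to your installation.
REGISTER /usr/local/hadoop/lib/hbase-0.20.3-1.cloudera.jar;

data = LOAD 'hbase://track'
    USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
        'open:browser open:ip open:os', '-caching 1000')
    AS (browser:chararray, ipAddress:chararray, os:chararray);
```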
