Alright, I set up HBase 0.90.1 and Pig 0.8.0 and everything seems configured, but my Pig
script hangs after connecting to ZooKeeper... the map-reduce job never gets scheduled and the
process looks frozen. Some debug output:
2011-03-25 15:51:07,344 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged MR job 285
into MR job 282
2011-03-25 15:51:07,344 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged MR job 293
into MR job 282
2011-03-25 15:51:07,344 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged MR job 313
into MR job 282
2011-03-25 15:51:07,345 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Requested
parallelism of splitter: -1
2011-03-25 15:51:07,345 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged 3
map-reduce splittees.
2011-03-25 15:51:07,345 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged 3 out of
total 4 MR operators.
2011-03-25 15:51:07,345 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size
after optimization: 8
2011-03-25 15:51:07,423 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings
are added to the job
2011-03-25 15:51:07,434 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler -
mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2011-03-25 15:51:11,014 [main] DEBUG org.apache.pig.impl.io.InterStorage - Pig
Internal storage in use
2011-03-25 15:51:11,014 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up multi
store job
2011-03-25 15:51:11,021 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler -
BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=0
2011-03-25 15:51:11,022 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Neither PARALLEL
nor default parallelism is set for this job. Setting number of reducers to 1
2011-03-25 15:51:11,103 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s)
waiting for submission.
2011-03-25 15:51:11,504 [Thread-3] DEBUG org.apache.pig.impl.io.InterStorage - Pig Internal storage
in use
2011-03-25 15:51:11,611 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
[snipped] ...
2011-03-25 15:47:08,617 [Thread-3-SendThread] INFO org.apache.zookeeper.ClientCnxn - Attempting
connection to server 10.202.61.184:2181
2011-03-25 15:47:08,625 [Thread-3-SendThread] INFO org.apache.zookeeper.ClientCnxn - Priming
connection to java.nio.channels.SocketChannel[connected local=/10.220.25.162:34767
remote=10.202.61.184:2181]
2011-03-25 15:47:08,627 [Thread-3-SendThread] INFO org.apache.zookeeper.ClientCnxn - Server
connection successful
I found a few threads about people having trouble connecting to HBase through ZooKeeper due to
misconfiguration or network issues, but none where the client claims to connect successfully and
then hangs... weird.
--
Jameson Lopp
Software Engineer
Bronto Software, Inc.
On 03/25/2011 12:06 PM, Bill Graham wrote:
The Pig trunk and Pig 0.8.0 branch both require HBase >= 0.89 (see
PIG-1680). The Pig 0.8.0 release requires HBase < 0.89 though, so you should
focus on that version of Pig. Or better yet, upgrade HBase to 0.90.1
if possible.
On Fri, Mar 25, 2011 at 6:59 AM, Jameson Lopp <[email protected]> wrote:
Running HBase 0.20-0.20.3-1.cloudera - I've tried running this with Pig 0.8
from August 2010 and with trunk from March 25, 2011. Do I need to use an older
version?
My Pig script is trying to load from HBase via this command:
data = LOAD 'hbase://track'
       USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
           'open:browser open:ip open:os', '-caching 1000')
       AS (browser:chararray, ipAddress:chararray, os:chararray);
But the job fails trying to load the data:
Input(s):
Failed to read data from "hbase://track"
When I look at my map-reduce job, it fails every time with a
ClassNotFoundException:
java.io.IOException: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.mapreduce.TableSplit
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit.readFields(PigSplit.java:197)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:586)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.mapreduce.TableSplit
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:247)
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:907)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit.readFields(PigSplit.java:185)
        ... 5 more
Now, perhaps this issue is better suited for a Hadoop / map-reduce / Cloudera
mailing list, but every node in my Hadoop cluster has
/usr/local/hadoop/lib/hbase-0.20.3-1.cloudera.jar, which includes the
TableSplit class... so it seems to me that the job should have no problem loading
it.
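For what it's worth, a sketch of one possible workaround, assuming the failure is that the jar never reaches the backend map tasks' classpath: a REGISTER statement in the script tells Pig to ship the jar with the submitted job. The jar path is the one from my cluster above; whether this resolves it here is unverified.

```
-- Sketch: REGISTER ships the named jar with the map-reduce job,
-- so backend tasks can resolve classes like TableSplit themselves
-- instead of relying on each node's local Hadoop lib directory.
REGISTER /usr/local/hadoop/lib/hbase-0.20.3-1.cloudera.jar;

data = LOAD 'hbase://track'
       USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
           'open:browser open:ip open:os', '-caching 1000')
       AS (browser:chararray, ipAddress:chararray, os:chararray);
```

An equivalent without editing the script would be passing the jar on the command line via the pig.additional.jars property.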
I've run out of ideas at this point - anyone have suggestions? Thanks!
--
Jameson Lopp
Software Engineer
Bronto Software, Inc.