Alright, I set up HBase 0.90.1 and Pig 0.8.0 and believe everything is configured, but my Pig script hangs after connecting to ZooKeeper... the map-reduce job never gets scheduled and the process looks frozen. Some debug output:

2011-03-25 15:51:07,344 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged MR job 285 into MR job 282
2011-03-25 15:51:07,344 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged MR job 293 into MR job 282
2011-03-25 15:51:07,344 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged MR job 313 into MR job 282
2011-03-25 15:51:07,345 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Requested parallelism of splitter: -1
2011-03-25 15:51:07,345 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged 3 map-reduce splittees.
2011-03-25 15:51:07,345 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged 3 out of total 4 MR operators.
2011-03-25 15:51:07,345 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 8
2011-03-25 15:51:07,423 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2011-03-25 15:51:07,434 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2011-03-25 15:51:11,014 [main] DEBUG org.apache.pig.impl.io.InterStorage - Pig Internal storage in use
2011-03-25 15:51:11,014 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up multi store job
2011-03-25 15:51:11,021 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=0
2011-03-25 15:51:11,022 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Neither PARALLEL nor default parallelism is set for this job. Setting number of reducers to 1
2011-03-25 15:51:11,103 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2011-03-25 15:51:11,504 [Thread-3] DEBUG org.apache.pig.impl.io.InterStorage - Pig Internal storage in use
2011-03-25 15:51:11,611 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete

[snipped] ...

2011-03-25 15:47:08,617 [Thread-3-SendThread] INFO org.apache.zookeeper.ClientCnxn - Attempting connection to server 10.202.61.184:2181
2011-03-25 15:47:08,625 [Thread-3-SendThread] INFO org.apache.zookeeper.ClientCnxn - Priming connection to java.nio.channels.SocketChannel[connected local=/10.220.25.162:34767 remote=10.202.61.184:2181]
2011-03-25 15:47:08,627 [Thread-3-SendThread] INFO org.apache.zookeeper.ClientCnxn - Server connection successful

I found a few threads about people having trouble connecting to HBase through ZooKeeper due to misconfiguration or network issues, but none where the client reports a successful connection and then hangs... weird.
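One sanity check for a hang after a successful ZooKeeper connect is that the hbase-site.xml on the Pig client's classpath points at the same quorum the cluster is actually using. A minimal sketch of the relevant fragment, with the host and port taken from the log above (adjust to your cluster):

```xml
<!-- hbase-site.xml fragment; host/port taken from the ZooKeeper log above -->
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>10.202.61.184</value>
</property>
<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>2181</value>
</property>
```

If the client is reading a stale or default hbase-site.xml, it can connect to one ZooKeeper ensemble while the cluster registered itself with another.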

--
Jameson Lopp
Software Engineer
Bronto Software, Inc.

On 03/25/2011 12:06 PM, Bill Graham wrote:
The Pig trunk and Pig 0.8.0 branch both require HBase >= 0.89 (see
PIG-1680). The Pig 0.8.0 release requires < 0.89 though, so you should
focus on that version of Pig. Or better yet, upgrade HBase to 0.90.1
if possible.

On Fri, Mar 25, 2011 at 6:59 AM, Jameson Lopp <[email protected]> wrote:
Running HBase 0.20-0.20.3-1.cloudera - I've tried running this with Pig 0.8
from August 2010 and from trunk on March 25, 2011. Do I need to use an older
version?

My pig script is trying to load from hbase via this command:
        data = LOAD 'hbase://track'
            USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
                'open:browser open:ip open:os', '-caching 1000')
            AS (browser:chararray, ipAddress:chararray, os:chararray);

But the job fails trying to load the data:
        Input(s):
        Failed to read data from "hbase://track"

When I look at my map-reduce job, it fails every time with a
ClassNotFoundException:
java.io.IOException: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.mapreduce.TableSplit
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit.readFields(PigSplit.java:197)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:586)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.mapreduce.TableSplit
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:247)
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:907)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit.readFields(PigSplit.java:185)
        ... 5 more

Now, perhaps this issue is better suited for a Hadoop / MapReduce /
Cloudera mailing list, but every node in my Hadoop cluster has
/usr/local/hadoop/lib/hbase-0.20.3-1.cloudera.jar, which includes the
TableSplit class... so it seems to me that loading it should not be a
problem.
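One thing worth trying: having the jar on each node's disk is not by itself enough, since the class has to be on the classpath of the child JVM that deserializes the PigSplit. A commonly suggested workaround is to put the HBase jar on Pig's classpath before launching the script (paths here are from this cluster; adjust to your install):

```shell
# The jar sitting in /usr/local/hadoop/lib is only on the node's disk;
# it also has to be on the classpath of the JVM that deserializes the
# split, so expose it to Pig before launching the script.
HBASE_JAR=/usr/local/hadoop/lib/hbase-0.20.3-1.cloudera.jar
export PIG_CLASSPATH="$HBASE_JAR:$PIG_CLASSPATH"
# Confirm the jar made it onto the classpath:
echo "$PIG_CLASSPATH" | tr ':' '\n' | grep 'hbase'
```

Registering the jar at the top of the Pig script (REGISTER /usr/local/hadoop/lib/hbase-0.20.3-1.cloudera.jar;) is another route people suggest, since REGISTER ships the jar with the job.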

I've run out of ideas at this point - anyone have suggestions? Thanks!
--
Jameson Lopp
Software Engineer
Bronto Software, Inc.

