Alright, I set up HBase 0.90.1 and Pig 0.8.0 and everything seems configured, but my Pig
script hangs after connecting to ZooKeeper... the map-reduce job never gets scheduled and the
process looks frozen. Some debug output:
2011-03-25 15:51:07,344 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged MR job 285
into MR job 282
2011-03-25 15:51:07,344 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged MR job 293
into MR job 282
2011-03-25 15:51:07,344 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged MR job 313
into MR job 282
2011-03-25 15:51:07,345 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Requested
parallelism of splitter: -1
2011-03-25 15:51:07,345 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged 3
map-reduce splittees.
2011-03-25 15:51:07,345 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged 3 out of
total 4 MR operators.
2011-03-25 15:51:07,345 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size
after optimization: 8
2011-03-25 15:51:07,423 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings
are added to the job
2011-03-25 15:51:07,434 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler -
mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2011-03-25 15:51:11,014 [main] DEBUG org.apache.pig.impl.io.InterStorage - Pig
Internal storage in use
2011-03-25 15:51:11,014 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up multi
store job
2011-03-25 15:51:11,021 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler -
BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=0
2011-03-25 15:51:11,022 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Neither PARALLEL
nor default parallelism is set for this job. Setting number of reducers to 1
2011-03-25 15:51:11,103 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s)
waiting for submission.
2011-03-25 15:51:11,504 [Thread-3] DEBUG org.apache.pig.impl.io.InterStorage - Pig Internal storage
in use
2011-03-25 15:51:11,611 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
[snipped] ...
2011-03-25 15:47:08,617 [Thread-3-SendThread] INFO org.apache.zookeeper.ClientCnxn - Attempting
connection to server 10.202.61.184:2181
2011-03-25 15:47:08,625 [Thread-3-SendThread] INFO org.apache.zookeeper.ClientCnxn - Priming
connection to java.nio.channels.SocketChannel[connected local=/10.220.25.162:34767
remote=10.202.61.184:2181]
2011-03-25 15:47:08,627 [Thread-3-SendThread] INFO org.apache.zookeeper.ClientCnxn - Server
connection successful
I found a few threads about people having trouble connecting to HBase through ZooKeeper due to
misconfiguration or network issues, but none where the client claims to connect successfully and
then hangs... weird.
--
Jameson Lopp
Software Engineer
Bronto Software, Inc.
On 03/25/2011 12:06 PM, Bill Graham wrote:
The Pig trunk and Pig 0.8.0 branch both require HBase >= 0.89 (see
PIG-1680). The Pig 0.8.0 release requires HBase < 0.89 though, so you should
focus on that version of Pig. Or better yet, upgrade HBase to 0.90.1
if possible.
On Fri, Mar 25, 2011 at 6:59 AM, Jameson Lopp <[email protected]> wrote:
Running HBase 0.20-0.20.3-1.cloudera - I've tried running this with Pig 0.8
from August 2010 and with trunk from March 25, 2011. Do I need to use an older
version?
My Pig script is trying to load from HBase via this command:
data = LOAD 'hbase://track'
       USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
           'open:browser open:ip open:os', '-caching 1000')
       AS (browser:chararray, ipAddress:chararray, os:chararray);
But the job fails trying to load the data:
Input(s):
Failed to read data from "hbase://track"
When I look at my map-reduce job, it fails every time with a
ClassNotFoundException:
java.io.IOException: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.mapreduce.TableSplit
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit.readFields(PigSplit.java:197)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:586)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.mapreduce.TableSplit
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:247)
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:907)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit.readFields(PigSplit.java:185)
        ... 5 more
Now, perhaps this issue is better suited for a Hadoop / map-reduce / Cloudera
mailing list, but every node in my Hadoop cluster has
/usr/local/hadoop/lib/hbase-0.20.3-1.cloudera.jar, which includes the
TableSplit class... so it seems to me that the job should have no problem loading
it.
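For what it's worth, a sketch of one possible workaround, assuming the failure is that the jar never reaches the backend map tasks' classpath: a REGISTER statement in the script tells Pig to ship the jar with the submitted job. The jar path is the one from my cluster above; whether this resolves it here is unverified.

```
-- Sketch: REGISTER ships the named jar with the map-reduce job,
-- so backend tasks can resolve classes like TableSplit themselves
-- instead of relying on each node's local Hadoop lib directory.
REGISTER /usr/local/hadoop/lib/hbase-0.20.3-1.cloudera.jar;

data = LOAD 'hbase://track'
       USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
           'open:browser open:ip open:os', '-caching 1000')
       AS (browser:chararray, ipAddress:chararray, os:chararray);
```

An equivalent without editing the script would be passing the jar on the command line via the pig.additional.jars property.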
I've run out of ideas at this point - anyone have suggestions? Thanks!
--
Jameson Lopp
Software Engineer
Bronto Software, Inc.