Ah, ok. The reason I was surprised is that if you are using 0.90.1 and the latest 0.8 branch, the HBaseStorage code in Pig is supposed to auto-register the hbase, zookeeper, and google-collections jars, so you won't have to do that.
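[To illustrate the auto-registration point: under HBase 0.90.x with Pig 0.8 from the branch, a load like the one later in this thread should need no REGISTER statements at all. A sketch reusing the thread's 'track' table and 'open' column family, not a tested example:]

```pig
-- Assumes HBase 0.90.x plus Pig 0.8 branch (soon-to-be 0.8.1), where
-- HBaseStorage auto-registers the hbase, zookeeper, and
-- google-collections jars -- no REGISTER lines required.
data = LOAD 'hbase://track'
       USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
           'open:browser open:ip open:os', '-caching 1000')
       AS (browser:chararray, ipAddress:chararray, os:chararray);
DUMP data;
```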
fwiw, 0.90.1 has been MUCH more stable for us than any of the 0.20 releases. The upgrade is worth it.

D

On Tue, Mar 29, 2011 at 12:08 PM, Jameson Lopp <[email protected]> wrote:
> You're correct - I didn't mention that we have several environments.
> We're running HBase 0.20 in production and upgraded to 0.90.1 in
> development, but they ended up rolling back the upgrade due to other
> issues. My point is that the class-not-found errors appear to be
> unrelated to version incompatibilities - once I register the
> appropriate jars in my pig script, the MR jobs run.
>
> On 03/29/2011 12:47 PM, Dmitriy Ryaboy wrote:
>> There's something odd about this jar list. You said you are running
>> hbase 0.90.1, yet you register a Cloudera hbase 0.20.3 jar. You are
>> also registering an ancient zookeeper jar. It doesn't sound like you
>> are actually running either hbase 0.90.1 or Pig 0.8 from the tip of
>> the svn branch.
>>
>> D
>>
>> On Tue, Mar 29, 2011 at 6:34 AM, Jameson Lopp <[email protected]> wrote:
>>> Just to follow up: I'm running Pig 0.8 from SVN. I finally got it
>>> working, though I'm not sure why this was required. I resolved the
>>> ClassNotFound errors by manually registering the jars in my Pig
>>> script:
>>>
>>> REGISTER /path/to/pig_0.8/piggybank.jar;
>>> REGISTER /path/to/pig_0.8/lib/google-collections-1.0.jar;
>>> REGISTER /path/to/pig_0.8/lib/hbase-0.20.3-1.cloudera.jar;
>>> REGISTER /path/to/pig_0.8/lib/zookeeper-hbase-1329.jar;
>>>
>>> We had these jars placed in the hadoop /lib directory on all of our
>>> hadoop machines, and thus figured that they would get loaded for the
>>> map reduce jobs. Apparently this is not the case...
>>>
>>> --
>>> Jameson Lopp
>>> Software Engineer
>>> Bronto Software, Inc.
>>>
>>> On 03/25/2011 04:53 PM, Dmitriy Ryaboy wrote:
>>>> Pig 8 distribution or Pig 8 from svn?
>>>> You want the latter (soon-to-be Pig 0.8.1)
>>>>
>>>> D
>>>>
>>>> On Fri, Mar 25, 2011 at 1:02 PM, Jameson Lopp <[email protected]> wrote:
>>>>> Alright, I set up HBase 0.90.1 and Pig 0.8.0 and feel like
>>>>> everything is configured, but my pig script hangs after connecting
>>>>> to zookeeper... my map reduce job doesn't get scheduled and the
>>>>> process looks frozen. Some debug output:
>>>>>
>>>>> 2011-03-25 15:51:07,344 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged MR job 285 into MR job 282
>>>>> 2011-03-25 15:51:07,344 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged MR job 293 into MR job 282
>>>>> 2011-03-25 15:51:07,344 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged MR job 313 into MR job 282
>>>>> 2011-03-25 15:51:07,345 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Requested parallelism of splitter: -1
>>>>> 2011-03-25 15:51:07,345 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged 3 map-reduce splittees.
>>>>> 2011-03-25 15:51:07,345 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged 3 out of total 4 MR operators.
>>>>> 2011-03-25 15:51:07,345 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 8
>>>>> 2011-03-25 15:51:07,423 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
>>>>> 2011-03-25 15:51:07,434 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
>>>>> 2011-03-25 15:51:11,014 [main] DEBUG org.apache.pig.impl.io.InterStorage - Pig Internal storage in use
>>>>> 2011-03-25 15:51:11,014 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up multi store job
>>>>> 2011-03-25 15:51:11,021 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=0
>>>>> 2011-03-25 15:51:11,022 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Neither PARALLEL nor default parallelism is set for this job. Setting number of reducers to 1
>>>>> 2011-03-25 15:51:11,103 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
>>>>> 2011-03-25 15:51:11,504 [Thread-3] DEBUG org.apache.pig.impl.io.InterStorage - Pig Internal storage in use
>>>>> 2011-03-25 15:51:11,611 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
>>>>>
>>>>> [snipped] ...
>>>>> 2011-03-25 15:47:08,617 [Thread-3-SendThread] INFO org.apache.zookeeper.ClientCnxn - Attempting connection to server 10.202.61.184:2181
>>>>> 2011-03-25 15:47:08,625 [Thread-3-SendThread] INFO org.apache.zookeeper.ClientCnxn - Priming connection to java.nio.channels.SocketChannel[connected local=/10.220.25.162:34767 remote=10.202.61.184:2181]
>>>>> 2011-03-25 15:47:08,627 [Thread-3-SendThread] INFO org.apache.zookeeper.ClientCnxn - Server connection successful
>>>>>
>>>>> I found a few threads about people having problems connecting to
>>>>> hbase through zookeeper due to misconfiguration / network issues,
>>>>> but I don't see any where it claims to connect successfully and
>>>>> then hangs... weird.
>>>>>
>>>>> --
>>>>> Jameson Lopp
>>>>> Software Engineer
>>>>> Bronto Software, Inc.
>>>>>
>>>>> On 03/25/2011 12:06 PM, Bill Graham wrote:
>>>>>> The Pig trunk and Pig 0.8.0 branch both require HBase >= 0.89 (see
>>>>>> PIG-1680). The Pig 0.8.0 release requires < 0.89 though, so you
>>>>>> should focus on that version of Pig. Or better yet, upgrade HBase
>>>>>> to 0.90.1 if possible.
>>>>>>
>>>>>> On Fri, Mar 25, 2011 at 6:59 AM, Jameson Lopp <[email protected]> wrote:
>>>>>>> Running HBase 0.20-0.20.3-1.cloudera - I've tried running this
>>>>>>> with Pig 0.8 from August 2010 and from trunk on March 25 2011. Do
>>>>>>> I need to use an older version?
>>>>>>> My pig script is trying to load from hbase via this command:
>>>>>>>
>>>>>>> data = LOAD 'hbase://track' USING
>>>>>>>     org.apache.pig.backend.hadoop.hbase.HBaseStorage(
>>>>>>>         'open:browser open:ip open:os', '-caching 1000')
>>>>>>>     as (browser:chararray, ipAddress:chararray, os:chararray);
>>>>>>>
>>>>>>> But the job fails trying to load the data:
>>>>>>>
>>>>>>> Input(s):
>>>>>>> Failed to read data from "hbase://track"
>>>>>>>
>>>>>>> When I look at my map reduce job, it fails every time with a
>>>>>>> ClassNotFoundException:
>>>>>>>
>>>>>>> java.io.IOException: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.mapreduce.TableSplit
>>>>>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit.readFields(PigSplit.java:197)
>>>>>>>     at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
>>>>>>>     at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
>>>>>>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:586)
>>>>>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>>>>>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>>>>> Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.mapreduce.TableSplit
>>>>>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>>>>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>>>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>>>>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>>>>>>>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>>>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
>>>>>>>     at java.lang.Class.forName0(Native Method)
>>>>>>>     at java.lang.Class.forName(Class.java:247)
>>>>>>>     at
>>>>>>> org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:907)
>>>>>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit.readFields(PigSplit.java:185)
>>>>>>>     ... 5 more
>>>>>>>
>>>>>>> Now, perhaps this issue is better suited for a hadoop / map
>>>>>>> reduce / cloudera mailing list, but every node in my hadoop
>>>>>>> cluster has /usr/local/hadoop/lib/hbase-0.20.3-1.cloudera.jar,
>>>>>>> which includes the TableSplit class... so it seems to me that it
>>>>>>> should have no problem loading it.
>>>>>>>
>>>>>>> I've run out of ideas at this point - anyone have suggestions?
>>>>>>> Thanks!
>>>>>>>
>>>>>>> --
>>>>>>> Jameson Lopp
>>>>>>> Software Engineer
>>>>>>> Bronto Software, Inc.
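[A possible explanation for the ClassNotFoundException above, offered as an assumption rather than a confirmed diagnosis: jars dropped into hadoop/lib are typically picked up by TaskTrackers only after a restart, and per-node classpath edits can drift out of sync across the cluster, whereas REGISTER makes Pig ship the jar with the job itself. A sketch of that workaround using the cluster-local path from the thread:]

```pig
-- Ship the dependency jar with the MR job instead of relying on each
-- node's hadoop/lib classpath (path is the one mentioned above).
REGISTER /usr/local/hadoop/lib/hbase-0.20.3-1.cloudera.jar;

data = LOAD 'hbase://track'
       USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
           'open:browser open:ip open:os', '-caching 1000')
       AS (browser:chararray, ipAddress:chararray, os:chararray);
```

[Equivalently, Pig 0.8 can take the jars on the command line via -Dpig.additional.jars=jar1:jar2, which avoids editing the script.]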
