Pig 0.8 distribution or Pig 0.8 from svn? You want the latter (soon-to-be Pig 0.8.1).
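[Archive note: grabbing the 0.8 branch from svn would look roughly like the sketch below. The branch URL follows the usual Apache svn layout but is an assumption here, not something stated in the thread; check the Pig site for the canonical path.]

```shell
# Assumed branch URL -- verify against the Pig project site before use.
svn checkout http://svn.apache.org/repos/asf/pig/branches/branch-0.8 pig-0.8
cd pig-0.8
ant jar   # builds pig.jar with the latest branch fixes
```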
D

On Fri, Mar 25, 2011 at 1:02 PM, Jameson Lopp <[email protected]> wrote:
> Alright, I set up hbase 0.90.1 and pig 0.8.0 and feel like everything is
> configured, but my pig script hangs after connecting to zookeeper... my
> map reduce job doesn't get scheduled and the process looks frozen. Some
> debug output:
>
> 2011-03-25 15:51:07,344 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
> - Merged MR job 285 into MR job 282
> 2011-03-25 15:51:07,344 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
> - Merged MR job 293 into MR job 282
> 2011-03-25 15:51:07,344 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
> - Merged MR job 313 into MR job 282
> 2011-03-25 15:51:07,345 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
> - Requested parallelism of splitter: -1
> 2011-03-25 15:51:07,345 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
> - Merged 3 map-reduce splittees.
> 2011-03-25 15:51:07,345 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
> - Merged 3 out of total 4 MR operators.
> 2011-03-25 15:51:07,345 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
> - MR plan size after optimization: 8
> 2011-03-25 15:51:07,423 [main] INFO
> org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added
> to the job
> 2011-03-25 15:51:07,434 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
> - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
> 2011-03-25 15:51:11,014 [main] DEBUG org.apache.pig.impl.io.InterStorage -
> Pig Internal storage in use
> 2011-03-25 15:51:11,014 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
> - Setting up multi store job
> 2011-03-25 15:51:11,021 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
> - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=0
> 2011-03-25 15:51:11,022 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
> - Neither PARALLEL nor default parallelism is set for this job. Setting
> number of reducers to 1
> 2011-03-25 15:51:11,103 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - 1 map-reduce job(s) waiting for submission.
> 2011-03-25 15:51:11,504 [Thread-3] DEBUG
> org.apache.pig.impl.io.InterStorage - Pig Internal storage in use
> 2011-03-25 15:51:11,611 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - 0% complete
>
> [snipped] ...
>
> 2011-03-25 15:47:08,617 [Thread-3-SendThread] INFO
> org.apache.zookeeper.ClientCnxn - Attempting connection to server
> 10.202.61.184:2181
> 2011-03-25 15:47:08,625 [Thread-3-SendThread] INFO
> org.apache.zookeeper.ClientCnxn - Priming connection to
> java.nio.channels.SocketChannel[connected local=/10.220.25.162:34767
> remote=10.202.61.184:2181]
> 2011-03-25 15:47:08,627 [Thread-3-SendThread] INFO
> org.apache.zookeeper.ClientCnxn - Server connection successful
>
> I found a few threads about people having problems connecting to hbase
> through zookeeper due to misconfiguration / network issues, but I don't
> see any where it claims to connect successfully and then hangs... weird.
>
> --
> Jameson Lopp
> Software Engineer
> Bronto Software, Inc.
>
> On 03/25/2011 12:06 PM, Bill Graham wrote:
>
>> The Pig trunk and Pig 0.8.0 branch both require HBase >= 0.89 (see
>> PIG-1680). The Pig 0.8.0 release requires < 0.89 though, so you should
>> focus on that version of Pig. Or better yet, upgrade HBase to 0.90.1
>> if possible.
>>
>> On Fri, Mar 25, 2011 at 6:59 AM, Jameson Lopp <[email protected]> wrote:
>>
>>> Running HBase 0.20-0.20.3-1.cloudera - I've tried running this with
>>> Pig 0.8 from August 2010 and from trunk on March 25 2011. Do I need
>>> to use an older version?
>>>
>>> My pig script is trying to load from hbase via this command:
>>>
>>> data = LOAD 'hbase://track' USING
>>> org.apache.pig.backend.hadoop.hbase.HBaseStorage('open:browser open:ip
>>> open:os', '-caching 1000') as (browser:chararray, ipAddress:chararray,
>>> os:chararray);
>>>
>>> But the job fails trying to load the data:
>>>
>>> Input(s):
>>> Failed to read data from "hbase://track"
>>>
>>> When I look at my map reduce job, it fails every time with a
>>> ClassNotFoundException:
>>>
>>> java.io.IOException: java.lang.ClassNotFoundException:
>>> org.apache.hadoop.hbase.mapreduce.TableSplit
>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit.readFields(PigSplit.java:197)
>>>     at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
>>>     at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
>>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:586)
>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>> Caused by: java.lang.ClassNotFoundException:
>>> org.apache.hadoop.hbase.mapreduce.TableSplit
>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>>>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
>>>     at java.lang.Class.forName0(Native Method)
>>>     at java.lang.Class.forName(Class.java:247)
>>>     at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:907)
>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit.readFields(PigSplit.java:185)
>>>     ... 5 more
>>>
>>> Now, perhaps this issue is better suited for a hadoop / map reduce /
>>> cloudera mailing list, but every node in my hadoop cluster has
>>> /usr/local/hadoop/lib/hbase-0.20.3-1.cloudera.jar which includes the
>>> TableSplit class... so it seems to me that it should have no problem
>>> loading it.
>>>
>>> I've run out of ideas at this point - anyone have suggestions? Thanks!
>>>
>>> --
>>> Jameson Lopp
>>> Software Engineer
>>> Bronto Software, Inc.
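[Archive note: the ClassNotFoundException for TableSplit above is thrown while deserializing the input split on a task node, i.e. the HBase classes were not on the classpath of the spawned map task even though the jar sat in /usr/local/hadoop/lib. One approach worth trying (a sketch only, not the confirmed resolution of this thread; the jar path is the one Jameson mentions) is to REGISTER the HBase jar in the Pig script, which makes Pig ship it with the job rather than relying on each node's Hadoop classpath:]

```pig
-- Sketch: REGISTER ships the jar to the backend with the job.
-- Path taken from the message above; adjust to your installation.
REGISTER /usr/local/hadoop/lib/hbase-0.20.3-1.cloudera.jar;

data = LOAD 'hbase://track'
    USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
        'open:browser open:ip open:os', '-caching 1000')
    AS (browser:chararray, ipAddress:chararray, os:chararray);
```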
