Just to follow up: I'm running Pig 0.8 from SVN. I finally got it working, though I'm not sure why this was required. I resolved the ClassNotFoundException errors by manually registering the jars in my Pig script:

REGISTER /path/to/pig_0.8/piggybank.jar;
REGISTER /path/to/pig_0.8/lib/google-collections-1.0.jar;
REGISTER /path/to/pig_0.8/lib/hbase-0.20.3-1.cloudera.jar;
REGISTER /path/to/pig_0.8/lib/zookeeper-hbase-1329.jar;

We had placed these jars in the Hadoop /lib directory on all of our Hadoop machines, and thus figured that they would get loaded for the MapReduce jobs. Apparently this is not the case...

--
Jameson Lopp
Software Engineer
Bronto Software, Inc.

On 03/25/2011 04:53 PM, Dmitriy Ryaboy wrote:
Pig 8 distribution or Pig 8 from svn?
You want the latter (soon-to-be-Pig 0.8.1)

D

On Fri, Mar 25, 2011 at 1:02 PM, Jameson Lopp <[email protected]> wrote:

Alright, I set up HBase 0.90.1 and Pig 0.8.0 and feel like everything is
configured, but my Pig script hangs after connecting to ZooKeeper... my
MapReduce job doesn't get scheduled and the process looks frozen. Some debug
output:

2011-03-25 15:51:07,344 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged MR job 285 into MR job 282
2011-03-25 15:51:07,344 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged MR job 293 into MR job 282
2011-03-25 15:51:07,344 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged MR job 313 into MR job 282
2011-03-25 15:51:07,345 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Requested parallelism of splitter: -1
2011-03-25 15:51:07,345 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged 3 map-reduce splittees.
2011-03-25 15:51:07,345 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged 3 out of total 4 MR operators.
2011-03-25 15:51:07,345 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 8
2011-03-25 15:51:07,423 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2011-03-25 15:51:07,434 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2011-03-25 15:51:11,014 [main] DEBUG org.apache.pig.impl.io.InterStorage - Pig Internal storage in use
2011-03-25 15:51:11,014 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up multi store job
2011-03-25 15:51:11,021 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=0
2011-03-25 15:51:11,022 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Neither PARALLEL nor default parallelism is set for this job. Setting number of reducers to 1
2011-03-25 15:51:11,103 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2011-03-25 15:51:11,504 [Thread-3] DEBUG org.apache.pig.impl.io.InterStorage - Pig Internal storage in use
2011-03-25 15:51:11,611 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete

[snipped] ...

2011-03-25 15:47:08,617 [Thread-3-SendThread] INFO org.apache.zookeeper.ClientCnxn - Attempting connection to server 10.202.61.184:2181
2011-03-25 15:47:08,625 [Thread-3-SendThread] INFO org.apache.zookeeper.ClientCnxn - Priming connection to java.nio.channels.SocketChannel[connected local=/10.220.25.162:34767 remote=10.202.61.184:2181]
2011-03-25 15:47:08,627 [Thread-3-SendThread] INFO org.apache.zookeeper.ClientCnxn - Server connection successful

I found a few threads about people having problems connecting to HBase
through ZooKeeper due to misconfiguration / network issues, but I don't see
any where the client claims to connect successfully and then hangs... weird.
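(Editor's aside: when the client connects and then hangs, one quick way to rule out ZooKeeper itself is its "ruok" four-letter command, which a healthy server answers with "imok". The Python below is only an illustrative sketch, not part of the original setup; the stub server in the demo merely stands in for a real quorum member such as 10.202.61.184:2181.)

```python
import socket
import threading

def zk_ruok(host, port, timeout=5.0):
    """Send ZooKeeper's 'ruok' four-letter command and return the raw reply.

    A healthy ZooKeeper server answers 'imok'; a hung or unreachable one
    will time out or return nothing.
    """
    with socket.create_connection((host, port), timeout=timeout) as s:
        s.sendall(b"ruok")
        s.shutdown(socket.SHUT_WR)
        return s.recv(16).decode()

# Demo against a local stub that answers like a healthy ZooKeeper node;
# in practice you would point host/port at your real quorum member.
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]

def stub():
    conn, _ = server.accept()
    conn.recv(16)
    conn.sendall(b"imok")
    conn.close()

threading.Thread(target=stub, daemon=True).start()
print(zk_ruok("127.0.0.1", port))  # prints: imok
```

If "ruok" answers but the job still hangs, the problem is more likely the quorum address the client resolved (hbase.zookeeper.quorum) than ZooKeeper's health.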

--
Jameson Lopp
Software Engineer
Bronto Software, Inc.

On 03/25/2011 12:06 PM, Bill Graham wrote:

The Pig trunk and Pig 0.8.0 branch both require HBase >= 0.89 (see
PIG-1680). The Pig 0.8.0 release requires < 0.89 though, so you should
focus on that version of Pig. Or better yet, upgrade HBase to 0.90.1
if possible.

On Fri, Mar 25, 2011 at 6:59 AM, Jameson Lopp <[email protected]> wrote:

Running HBase 0.20-0.20.3-1.cloudera - I've tried running this with Pig 0.8
from August 2010 and from trunk on March 25, 2011. Do I need to use an older
version?

My Pig script is trying to load from HBase via this command:

        data = LOAD 'hbase://track'
            USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
                'open:browser open:ip open:os', '-caching 1000')
            AS (browser:chararray, ipAddress:chararray, os:chararray);

But the job fails trying to load the data:
        Input(s):
        Failed to read data from "hbase://track"

When I look at my MapReduce job, it fails every time with a
ClassNotFoundException:
java.io.IOException: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.mapreduce.TableSplit
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit.readFields(PigSplit.java:197)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:586)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.mapreduce.TableSplit
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:247)
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:907)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit.readFields(PigSplit.java:185)
        ... 5 more

Now, perhaps this issue is better suited for a Hadoop / MapReduce /
Cloudera mailing list, but every node in my Hadoop cluster has
/usr/local/hadoop/lib/hbase-0.20.3-1.cloudera.jar, which includes the
TableSplit class... so it seems to me that it should have no problem
loading it.
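(Editor's aside: one way to double-check that assumption is to look inside the jar directly - a jar is just a zip archive, so the .class entry either is or isn't there. The Python below is only a sketch; the throwaway demo.jar it builds stands in for the real /usr/local/hadoop/lib/hbase-0.20.3-1.cloudera.jar on a cluster node.)

```python
import zipfile

def jar_contains_class(jar_path, dotted_name):
    """Return True if the jar (a plain zip) holds the .class entry
    for the given dotted class name."""
    entry = dotted_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()

# Demo with a throwaway jar standing in for the real HBase jar; on a
# cluster node you would point jar_contains_class at the actual path.
with zipfile.ZipFile("demo.jar", "w") as jar:
    jar.writestr("org/apache/hadoop/hbase/mapreduce/TableSplit.class", b"")

print(jar_contains_class("demo.jar",
                         "org.apache.hadoop.hbase.mapreduce.TableSplit"))  # prints: True
```

Even when the class is present in a jar under hadoop/lib, that only proves it's on disk - it says nothing about whether the MapReduce task's classpath actually includes that jar, which is what the REGISTER workaround later in this thread addresses.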

I've run out of ideas at this point - anyone have suggestions? Thanks!
--
Jameson Lopp
Software Engineer
Bronto Software, Inc.




