You're correct - I didn't mention that we have several environments. We're running HBase 0.20 in production and had upgraded development to 0.90.1, but that upgrade was rolled back due to unrelated issues. My point is that the ClassNotFound errors appear to be unrelated to version incompatibilities - once I register the appropriate jars in my Pig script, the MR jobs run.

On 03/29/2011 12:47 PM, Dmitriy Ryaboy wrote:
There's something odd about this jar list.
You said you are running hbase 91, yet you register a Cloudera hbase 20.3 jar. You are also registering an ancient zookeeper jar. It doesn't sound like you are actually running either hbase 91 or Pig 8 from the tip of the svn branch.

D

On Tue, Mar 29, 2011 at 6:34 AM, Jameson Lopp <[email protected]> wrote:

Just to follow up: I'm running Pig 0.8 from SVN. I finally got it working, though I'm not sure why this was required. I resolved the ClassNotFound errors by manually registering the jars in my Pig script:

REGISTER /path/to/pig_0.8/piggybank.jar;
REGISTER /path/to/pig_0.8/lib/google-collections-1.0.jar;
REGISTER /path/to/pig_0.8/lib/hbase-0.20.3-1.cloudera.jar;
REGISTER /path/to/pig_0.8/lib/zookeeper-hbase-1329.jar;

We had these jars placed in the hadoop /lib directory on all of our hadoop machines, and thus figured that they would get loaded for the map reduce jobs. Apparently this is not the case...
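For what it's worth, an alternative to in-script REGISTER statements that has worked in similar setups is shipping the jars from the command line via Pig's pig.additional.jars property. This is only a sketch: the jar paths mirror the ones registered above, and "myscript.pig" is a placeholder name.

```shell
# Sketch, not a verified command line for this cluster: pass the same jars
# via pig.additional.jars (colon-separated) so Pig ships them with the MR job.
# "myscript.pig" is a hypothetical script name.
pig -Dpig.additional.jars=/path/to/pig_0.8/piggybank.jar:/path/to/pig_0.8/lib/hbase-0.20.3-1.cloudera.jar:/path/to/pig_0.8/lib/zookeeper-hbase-1329.jar \
    myscript.pig
```

Either way the effect should be the same: the jars end up in the job's distributed cache instead of relying on what happens to sit in each node's hadoop /lib.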


--
Jameson Lopp
Software Engineer
Bronto Software, Inc.

On 03/25/2011 04:53 PM, Dmitriy Ryaboy wrote:

Pig 8 distribution or Pig 8 from svn?
You want the latter (soon-to-be-Pig 0.8.1)

D

On Fri, Mar 25, 2011 at 1:02 PM, Jameson Lopp <[email protected]> wrote:

Alright, I set up HBase 0.90.1 and Pig 0.8.0 and feel like everything is configured, but my pig script hangs after connecting to zookeeper... my map reduce job doesn't get scheduled and the process looks frozen. Some debug output:

2011-03-25 15:51:07,344 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged MR job 285 into MR job 282
2011-03-25 15:51:07,344 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged MR job 293 into MR job 282
2011-03-25 15:51:07,344 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged MR job 313 into MR job 282
2011-03-25 15:51:07,345 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Requested parallelism of splitter: -1
2011-03-25 15:51:07,345 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged 3 map-reduce splittees.
2011-03-25 15:51:07,345 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged 3 out of total 4 MR operators.
2011-03-25 15:51:07,345 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 8
2011-03-25 15:51:07,423 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2011-03-25 15:51:07,434 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2011-03-25 15:51:11,014 [main] DEBUG org.apache.pig.impl.io.InterStorage - Pig Internal storage in use
2011-03-25 15:51:11,014 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up multi store job
2011-03-25 15:51:11,021 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=0
2011-03-25 15:51:11,022 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Neither PARALLEL nor default parallelism is set for this job. Setting number of reducers to 1
2011-03-25 15:51:11,103 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2011-03-25 15:51:11,504 [Thread-3] DEBUG org.apache.pig.impl.io.InterStorage - Pig Internal storage in use
2011-03-25 15:51:11,611 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete

[snipped] ...

2011-03-25 15:47:08,617 [Thread-3-SendThread] INFO org.apache.zookeeper.ClientCnxn - Attempting connection to server 10.202.61.184:2181
2011-03-25 15:47:08,625 [Thread-3-SendThread] INFO org.apache.zookeeper.ClientCnxn - Priming connection to java.nio.channels.SocketChannel[connected local=/10.220.25.162:34767 remote=10.202.61.184:2181]
2011-03-25 15:47:08,627 [Thread-3-SendThread] INFO org.apache.zookeeper.ClientCnxn - Server connection successful

I found a few threads about people having problems connecting to HBase through zookeeper due to misconfiguration or network issues, but none where the client claims to connect successfully and then hangs... weird.
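When a client reports a successful connection and then stalls, one quick way to rule out the ensemble itself is ZooKeeper's built-in four-letter-word commands. The host and port below are taken from the log output above; this assumes nc (netcat) is available on the client machine.

```shell
# Hedged sanity check, not a fix: probe the ZooKeeper server the log shows
# Pig connecting to (10.202.61.184:2181 from the output above).
echo ruok | nc 10.202.61.184 2181   # a healthy server replies "imok"
echo stat | nc 10.202.61.184 2181   # lists connected clients and server mode
```

If the server answers normally, the hang is more likely on the HBase/Pig side (e.g. wrong hbase-site.xml on the client, or a region server issue) than in ZooKeeper itself.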

--
Jameson Lopp
Software Engineer
Bronto Software, Inc.

On 03/25/2011 12:06 PM, Bill Graham wrote:

The Pig trunk and Pig 0.8.0 branch both require HBase >= 0.89 (see PIG-1680). The Pig 0.8.0 release requires < 0.89 though, so you should focus on that version of Pig. Or better yet, upgrade HBase to 0.90.1 if possible.
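Given how easy it is to mix up release, branch, and trunk builds here, a quick check of what a client machine is actually running may help (this assumes the pig and hbase launcher scripts are on the PATH):

```shell
# Print the versions actually on this machine's classpath.
pig -version      # e.g. an Apache Pig version banner with the build revision
hbase version     # prints the HBase version banner
```

The build revision in the Pig banner also distinguishes an svn checkout from the released tarball.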

On Fri, Mar 25, 2011 at 6:59 AM, Jameson Lopp <[email protected]> wrote:

Running HBase 0.20-0.20.3-1.cloudera - I've tried running this with Pig 0.8 from August 2010 and from trunk on March 25 2011. Do I need to use an older version?

My pig script is trying to load from hbase via this command:
        data = LOAD 'hbase://track' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('open:browser open:ip open:os', '-caching 1000') AS (browser:chararray, ipAddress:chararray, os:chararray);

But the job fails trying to load the data:
        Input(s):
        Failed to read data from "hbase://track"

When I look at my map reduce job, it fails every time with a ClassNotFoundException:

java.io.IOException: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.mapreduce.TableSplit
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit.readFields(PigSplit.java:197)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:586)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.mapreduce.TableSplit
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:247)
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:907)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit.readFields(PigSplit.java:185)
        ... 5 more

Now, perhaps this issue is better suited for a hadoop / map reduce / cloudera mailing list, but every node in my hadoop cluster has /usr/local/hadoop/lib/hbase-0.20.3-1.cloudera.jar, which includes the TableSplit class... so it seems to me that it should have no problem loading it.
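One way to double-check that assumption on a node is to list the jar's contents with standard jar tooling; nothing here is cluster-specific beyond the path already mentioned above:

```shell
# Verify the jar on a node actually contains the missing class.
jar tf /usr/local/hadoop/lib/hbase-0.20.3-1.cloudera.jar | grep TableSplit
# A matching entry would look like:
#   org/apache/hadoop/hbase/mapreduce/TableSplit.class
```

If the class is present, the question becomes whether the child task JVMs actually get that jar on their classpath, which is a separate matter from it sitting in hadoop's /lib directory.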

I've run out of ideas at this point - anyone have suggestions? Thanks!
--
Jameson Lopp
Software Engineer
Bronto Software, Inc.

