This is the best documentation we've found for how to set up HBase on the Hadoop cluster. Basically, add the HBase jars and the hbase conf directory to the HADOOP_CLASSPATH:

http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/mapreduce/package-summary.html#package_description
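That classpath change is typically made in conf/hadoop-env.sh on each node. A minimal sketch, assuming HBase 0.20.6 installed under /usr/local/hbase (the paths and jar names are illustrative, not exact):

    # conf/hadoop-env.sh -- illustrative paths; adjust to your install.
    export HBASE_HOME=/usr/local/hbase
    # Put the HBase jar, its conf dir (so hbase-site.xml is found), and the
    # ZooKeeper jar HBase ships with ahead of the existing classpath.
    export HADOOP_CLASSPATH="$HBASE_HOME/hbase-0.20.6.jar:$HBASE_HOME/conf:$HBASE_HOME/lib/zookeeper-3.2.2.jar:$HADOOP_CLASSPATH"

You'd then restart the TaskTrackers so the new classpath takes effect.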
On Nov 22, 2010, at 12:54 PM, Corbin Hoenes wrote:

> Yes, this was the problem.
>
> I think the HBaseStorage class is fine. I just needed to configure our Hadoop
> cluster to "talk" to HBase correctly; if I were writing a Java MR job, I'd
> have to do the same thing.
>
> Some better documentation and an example of how to use HBaseStorage is all
> we need.
>
> On Nov 22, 2010, at 12:10 PM, Dmitriy Ryaboy wrote:
>
>> Why is it connecting to localhost?
>> Sounds like you don't have the appropriate config files on the path.
>> Hm, maybe we should serialize those in the constructor so that you don't
>> have to have them on the JT classpath (I have them on the JT classpath, so
>> this never came up). Can you confirm that this is the problem?
>>
>> D
>>
>> On Fri, Nov 19, 2010 at 10:33 PM, Corbin Hoenes <[email protected]> wrote:
>>
>>> Hey Jeff,
>>>
>>> It wasn't starting a job, but I got a bit further by registering the pig8
>>> jar in my Pig script. It seemed to have a bunch of dependencies (Google
>>> common collections, ZooKeeper, etc.) built into that jar.
>>>
>>> Now I am seeing this in the web UI logs:
>>>
>>> 2010-11-19 23:19:44,200 INFO org.apache.zookeeper.ClientCnxn: Attempting
>>> connection to server localhost/127.0.0.1:2181
>>> 2010-11-19 23:19:44,201 WARN org.apache.zookeeper.ClientCnxn: Exception
>>> closing session 0x0 to sun.nio.ch.selectionkeyi...@65efb4be
>>> java.net.ConnectException: Connection refused
>>>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
>>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:885)
>>> 2010-11-19 23:19:44,201 WARN org.apache.zookeeper.ClientCnxn: Ignoring
>>> exception during shutdown input
>>> java.nio.channels.ClosedChannelException
>>>     at sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:638)
>>>     at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360)
>>>     at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:951)
>>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:922)
>>> 2010-11-19 23:19:44,201 WARN org.apache.zookeeper.ClientCnxn: Ignoring
>>> exception during shutdown output
>>> java.nio.channels.ClosedChannelException
>>>     at sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:649)
>>>     at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368)
>>>     at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:956)
>>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:922)
>>> 2010-11-19 23:19:44,303 WARN org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper:
>>> Failed to create /hbase -- check quorum servers, currently=localhost:2181
>>> org.apache.zookeeper.KeeperException$ConnectionLossException:
>>> KeeperErrorCode = ConnectionLoss for /hbase
>>>
>>> Looks like it doesn't know where my hbase/conf/hbase-site.xml file is.
>>> Not sure how this would get passed to the HBaseStorage class?
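The "currently=localhost:2181" warning above is the telltale sign: when no hbase-site.xml is visible on the classpath, the HBase client falls back to a ZooKeeper quorum on localhost. A minimal sketch of the relevant setting, assuming hypothetical quorum hosts (zk1/zk2/zk3.example.com) and an illustrative conf path:

    # Hypothetical hosts and path -- substitute your real ZooKeeper ensemble.
    cat > /usr/local/hbase/conf/hbase-site.xml <<'EOF'
    <configuration>
      <property>
        <name>hbase.zookeeper.quorum</name>
        <value>zk1.example.com,zk2.example.com,zk3.example.com</value>
      </property>
    </configuration>
    EOF
    # Sanity-check that the conf dir is actually on the classpath in use:
    echo "$HADOOP_CLASSPATH" | tr ':' '\n' | grep -n hbase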
>>> On Nov 19, 2010, at 5:09 PM, Jeff Zhang wrote:
>>>
>>>> Does the MapReduce job start? Could you check the logs on the Hadoop side?
>>>>
>>>> On Sat, Nov 20, 2010 at 7:56 AM, Corbin Hoenes <[email protected]> wrote:
>>>>
>>>>> We are trying to use the HBaseStorage LoadFunc in Pig 0.8 and getting an
>>>>> exception:
>>>>>
>>>>> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable
>>>>> to open iterator for alias raw
>>>>>     at org.apache.pig.PigServer.openIterator(PigServer.java:754)
>>>>>     at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:612)
>>>>>     at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
>>>>>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
>>>>>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
>>>>>     at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76)
>>>>>     at org.apache.pig.Main.run(Main.java:465)
>>>>>     at org.apache.pig.Main.main(Main.java:107)
>>>>> Caused by: java.io.IOException: Couldn't retrieve job.
>>>>>     at org.apache.pig.PigServer.store(PigServer.java:818)
>>>>>     at org.apache.pig.PigServer.openIterator(PigServer.java:728)
>>>>>     ... 7 more
>>>>>
>>>>> Other jobs seem to work.
>>>>>
>>>>> What are the requirements for getting HBase storage to work?
>>>>>
>>>>> This is what I am doing:
>>>>> 1 - added the HBase config and Hadoop config to my PIG_CLASSPATH
>>>>> 2 - ran this Pig script:
>>>>>
>>>>> REGISTER ../lib/hbase-0.20.6.jar
>>>>>
>>>>> raw = LOAD 'hbase://piggytest' USING
>>>>>     org.apache.pig.backend.hadoop.hbase.HBaseStorage('content:field1
>>>>>     anchor:field1a anchor:field2a') as (content_field1, anchor_field1a,
>>>>>     anchor_field2a);
>>>>>
>>>>> dump raw;
>>>>>
>>>>> ---
>>>>> What else am I missing?
>>>>
>>>> --
>>>> Best Regards
>>>>
>>>> Jeff Zhang
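Putting the thread together: the HBaseStorage LoadFunc itself was fine; the job failed because the HBase configuration and dependency jars weren't visible to Pig and Hadoop. A minimal sketch of a working invocation under those assumptions (the paths and the script name piggytest.pig are illustrative):

    # Illustrative paths; piggytest.pig is a hypothetical script name.
    export HBASE_HOME=/usr/local/hbase
    export PIG_CLASSPATH="$HBASE_HOME/conf:$HADOOP_HOME/conf:$PIG_CLASSPATH"
    pig piggytest.pig

As Corbin found, registering the Pig 0.8 jar in the script also helps, since it bundles the Google common collections and ZooKeeper dependencies that HBaseStorage needs at runtime.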
