This is a config/classpath issue, no? At the lowest level, Hadoop MR tasks don't pick up settings from the HBase conf directory unless it is explicitly added to the task classpath, usually via hadoop/conf/hadoop-env.sh:
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/package-summary.html#classpath

Perhaps the classpath that's being added to your Java jobs is slightly different?

Norbert

On Wed, May 2, 2012 at 6:48 AM, Royston Sellman <[email protected]> wrote:
> Hi,
>
> We are still experiencing 40-60 minutes of task failures before our
> HBaseStorage jobs run, but we think we've narrowed the problem down to a
> specific ZooKeeper issue.
>
> The HBaseStorage map task only works when it lands on a machine that is
> actually running a ZooKeeper server as part of the quorum. It typically
> makes attempts from several different nodes in the cluster, failing
> repeatedly before it hits a ZooKeeper node.
>
> The logs show the failing task attempts are trying to connect to the
> localhost machine on port 2181 to make a ZooKeeper connection (as part of
> the Load/HBaseStorage map task):
>
> ...
> 2012-04-24 11:57:27,441 INFO org.apache.zookeeper.ClientCnxn: Opening
> socket connection to server /127.0.0.1:2181
> ...
> java.net.ConnectException: Connection refused
> ...
>
> This explains why the job succeeds eventually: we have a ZooKeeper quorum
> server running on one of our worker nodes, but not on the other three.
> The job therefore fails repeatedly until it is redistributed onto the node
> with the ZK server, at which point it succeeds immediately.
>
> We therefore suspect the issue is in our ZK configuration. Our
> hbase-site.xml defines the ZooKeeper quorum as follows:
>
> <property>
>   <name>hbase.zookeeper.quorum</name>
>   <value>namenode,jobtracker,slave0</value>
> </property>
>
> We would therefore expect the tasks to connect to one of those hosts when
> attempting a ZooKeeper connection; however, they appear to be connecting
> to "localhost" (which is the default). It is as if the HBase configuration
> settings here are not being used.
>
> Does anyone have any suggestions as to what might be the cause of this
> behaviour?
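A minimal sketch of the hadoop-env.sh change described above. The paths are illustrative (borrowed from the /opt/hbase/hbase-trunk layout mentioned elsewhere in this thread); adjust them to your install:

```shell
# In hadoop/conf/hadoop-env.sh on every tasktracker node: put the HBase conf
# directory (and the HBase jar) on the classpath that MR task JVMs inherit,
# so hbase-site.xml -- and with it hbase.zookeeper.quorum -- is visible to
# the tasks instead of the "localhost" default.
export HBASE_HOME=/opt/hbase/hbase-trunk
export HADOOP_CLASSPATH="$HBASE_HOME/conf:$HBASE_HOME/hbase-0.95-SNAPSHOT.jar:$HADOOP_CLASSPATH"
```

Note this has to be done on all the worker nodes (and the tasktrackers restarted) before it takes effect, which would also explain why only the node already running a ZK server behaves differently.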
>
> Sending this to both lists, although it is only Pig HBaseStorage jobs that
> suffer this problem on our cluster. HBase Java client jobs work normally.
>
> Thanks,
> Royston
>
> -----Original Message-----
> From: Subir S [mailto:[email protected]]
> Sent: 24 April 2012 13:29
> To: [email protected]; [email protected]
> Subject: Re: HBaseStorage not working
>
> Looping HBase group.
>
> On Tue, Apr 24, 2012 at 5:18 PM, Royston Sellman <[email protected]> wrote:
>
> > We still haven't cracked this, but a bit more info (HBase 0.95; Pig 0.11):
> >
> > The script below runs fine in a few seconds using Pig in local mode,
> > but with Pig in MR mode it sometimes works rapidly but usually takes
> > 40 minutes to an hour.
> >
> > --hbaseuploadtest.pig
> > register /opt/hbase/hbase-trunk/lib/protobuf-java-2.4.0a.jar
> > register /opt/hbase/hbase-trunk/lib/guava-r09.jar
> > register /opt/hbase/hbase-trunk/hbase-0.95-SNAPSHOT.jar
> > register /opt/zookeeper/zookeeper-3.4.3/zookeeper-3.4.3.jar
> > raw_data = LOAD '/data/sse.tbl1.HEADERLESS.csv' USING PigStorage(',') AS
> >     (mid : chararray, hid : chararray, mf : chararray, mt : chararray,
> >      mind : chararray, mimd : chararray, mst : chararray);
> > dump raw_data;
> > STORE raw_data INTO 'hbase://hbaseuploadtest' USING
> >     org.apache.pig.backend.hadoop.hbase.HBaseStorage(
> >     'info:hid info:mf info:mt info:mind info:mimd info:mst');
> >
> > i.e.
> > [hadoop1@namenode hadoop-1.0.2]$ pig -x local ../pig-scripts/hbaseuploadtest.pig
> > WORKS EVERY TIME!!
> > But
> > [hadoop1@namenode hadoop-1.0.2]$ pig -x mapreduce ../pig-scripts/hbaseuploadtest.pig
> > Sometimes (but rarely) runs in under a minute, often takes more than
> > 40 minutes to get to 50%, but then completes to 100% in seconds. The
> > dataset is very small.
> >
> > Note that the dump of raw_data works in both cases.
> > However, the STORE
> > command causes the MR job to stall, and the job setup task shows the
> > following errors:
> >
> > Task attempt_201204240854_0006_m_000002_0 failed to report status for 602 seconds. Killing!
> > Task attempt_201204240854_0006_m_000002_1 failed to report status for 601 seconds. Killing!
> >
> > And the task log shows the following stream of errors:
> >
> > 2012-04-24 11:57:27,427 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=180000 watcher=hconnection 0x5567d7fb
> > 2012-04-24 11:57:27,441 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server /127.0.0.1:2181
> > 2012-04-24 11:57:27,443 WARN org.apache.zookeeper.client.ZooKeeperSaslClient: SecurityException: java.lang.SecurityException: Unable to locate a login configuration occurred when trying to find JAAS configuration.
> > 2012-04-24 11:57:27,443 INFO org.apache.zookeeper.client.ZooKeeperSaslClient: Client will not SASL-authenticate because the default JAAS configuration section 'Client' could not be found. If you are not using SASL, you may ignore this. On the other hand, if you expected SASL to work, please fix your JAAS configuration.
> > 2012-04-24 11:57:27,444 WARN org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
> > java.net.ConnectException: Connection refused
> >     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> >     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
> >     at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:286)
> >     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1035)
> > 2012-04-24 11:57:27,445 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: The identifier of this process is 6846@slave2
> > 2012-04-24 11:57:27,551 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server /127.0.0.1:2181
> > 2012-04-24 11:57:27,552 WARN org.apache.zookeeper.client.ZooKeeperSaslClient: SecurityException: java.lang.SecurityException: Unable to locate a login configuration occurred when trying to find JAAS configuration.
> > 2012-04-24 11:57:27,552 INFO org.apache.zookeeper.client.ZooKeeperSaslClient: Client will not SASL-authenticate because the default JAAS configuration section 'Client' could not be found. If you are not using SASL, you may ignore this. On the other hand, if you expected SASL to work, please fix your JAAS configuration.
> > 2012-04-24 11:57:27,552 WARN org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
> > java.net.ConnectException: Connection refused
> >     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> >     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
> >     at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:286)
> >     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1035)
> > 2012-04-24 11:57:27,553 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
> > 2012-04-24 11:57:27,553 INFO org.apache.hadoop.hbase.util.RetryCounter: Sleeping 2000ms before retry #1...
> > 2012-04-24 11:57:28,652 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181
> > 2012-04-24 11:57:28,653 WARN org.apache.zookeeper.client.ZooKeeperSaslClient: SecurityException: java.lang.SecurityException: Unable to locate a login configuration occurred when trying to find JAAS configuration.
> > 2012-04-24 11:57:28,653 INFO org.apache.zookeeper.client.ZooKeeperSaslClient: Client will not SASL-authenticate because the default JAAS configuration section 'Client' could not be found. If you are not using SASL, you may ignore this. On the other hand, if you expected SASL to work, please fix your JAAS configuration.
> > 2012-04-24 11:57:28,653 WARN org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
> > java.net.ConnectException: Connection refused
> > etc. etc.
> >
> > Any ideas? Anyone else out there successfully running Pig 0.11 HBaseStorage() against HBase 0.95?
> >
> > Thanks,
> > Royston
> >
> > -----Original Message-----
> > From: Dmitriy Ryaboy [mailto:[email protected]]
> > Sent: 20 April 2012 00:03
> > To: [email protected]
> > Subject: Re: HBaseStorage not working
> >
> > Nothing significant changed in Pig trunk, so I am guessing HBase
> > changed something; you are more likely to get help from them (they
> > should at least be able to point at APIs that changed and are likely
> > to cause this sort of thing).
> >
> > You might also want to check if any of the started MR jobs have
> > anything interesting in their task logs.
> >
> > D
> >
> > On Thu, Apr 19, 2012 at 1:41 PM, Royston Sellman <[email protected]> wrote:
> > > Does HBaseStorage work with HBase 0.95?
> > >
> > > This code was working with HBase 0.92 and Pig 0.9 but fails on HBase
> > > 0.95 and Pig 0.11 (built from source):
> > >
> > > register /opt/hbase/hbase-trunk/hbase-0.95-SNAPSHOT.jar
> > > register /opt/zookeeper/zookeeper-3.4.3/zookeeper-3.4.3.jar
> > >
> > > tbl1 = LOAD 'input/sse.tbl1.HEADERLESS.csv' USING PigStorage(',') AS (
> > >     ID:chararray,
> > >     hp:chararray,
> > >     pf:chararray,
> > >     gz:chararray,
> > >     hid:chararray,
> > >     hst:chararray,
> > >     mgz:chararray,
> > >     gg:chararray,
> > >     epc:chararray );
> > >
> > > STORE tbl1 INTO 'hbase://sse.tbl1'
> > > USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('edrp:hp
> > > edrp:pf edrp:gz edrp:hid edrp:hst edrp:mgz edrp:gg edrp:epc');
> > >
> > > The job output (using either Grunt or PigServer makes no difference)
> > > shows the family:descriptors being added by HBaseStorage, then starts
> > > up the MR job, which (after a long pause) reports:
> > >
> > > ------------
> > > Input(s):
> > > Failed to read data from "hdfs://namenode:8020/user/hadoop1/input/sse.tbl1.HEADERLESS.csv"
> > >
> > > Output(s):
> > > Failed to produce result in "hbase://sse.tbl1"
> > >
> > > INFO mapReduceLayer.MapReduceLauncher: Failed!
> > > INFO hbase.HBaseStorage: Adding family:descriptor filters with values edrp:hp
> > > INFO hbase.HBaseStorage: Adding family:descriptor filters with values edrp:pf
> > > INFO hbase.HBaseStorage: Adding family:descriptor filters with values edrp:gz
> > > INFO hbase.HBaseStorage: Adding family:descriptor filters with values edrp:hid
> > > INFO hbase.HBaseStorage: Adding family:descriptor filters with values edrp:hst
> > > INFO hbase.HBaseStorage: Adding family:descriptor filters with values edrp:mgz
> > > INFO hbase.HBaseStorage: Adding family:descriptor filters with values edrp:gg
> > > INFO hbase.HBaseStorage: Adding family:descriptor filters with values edrp:epc
> > > ------------
> > >
> > > The "Failed to read" is misleading, I think, because a dump tbl1; in
> > > place of the STORE works fine.
> > >
> > > I get nothing in the HBase logs and nothing in the Pig log.
> > >
> > > HBase works fine from the shell and can read and write to the table.
> > > Pig works fine in and out of HDFS on CSVs.
> > >
> > > Any ideas?
> > >
> > > Royston
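One workaround that may help with the Pig runs quoted above until the classpath is sorted out: make hbase-site.xml visible to the Pig front end, and pass the quorum into the job configuration explicitly with -D. This is a sketch under assumptions, not a guaranteed fix (whether -D properties reach HBaseStorage in the task JVMs depends on the Pig/HBase versions in play); the conf path is illustrative and the quorum value is the one from the hbase-site.xml quoted earlier:

```shell
# Put the HBase conf directory on the Pig client's classpath
# (path illustrative; adjust to your install).
export PIG_CLASSPATH=/opt/hbase/hbase-trunk/conf:$PIG_CLASSPATH

# And/or hand the quorum straight to the job configuration
# (-D options must come before the other pig arguments):
# pig -Dhbase.zookeeper.quorum=namenode,jobtracker,slave0 -x mapreduce hbaseuploadtest.pig
```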

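Given that the failing tasks in the logs above all dial localhost:2181, a quick sanity check on any suspect node is to ask the HBase configuration machinery which quorum it resolves. This assumes the hbase launcher script is on PATH and that your HBase build ships the HBaseConfTool utility class:

```shell
# If hbase-site.xml is not on the classpath this prints the default,
# "localhost" -- matching the behaviour of the failing task attempts.
hbase org.apache.hadoop.hbase.util.HBaseConfTool hbase.zookeeper.quorum
```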