We still haven't cracked this, but here's a bit more info (HBase 0.95; Pig 0.11):
The script below runs fine in a few seconds with Pig in local mode, but with
Pig in MR mode it sometimes works rapidly and usually takes 40 minutes to an
hour.
--hbaseuploadtest.pig
register /opt/hbase/hbase-trunk/lib/protobuf-java-2.4.0a.jar
register /opt/hbase/hbase-trunk/lib/guava-r09.jar
register /opt/hbase/hbase-trunk/hbase-0.95-SNAPSHOT.jar
register /opt/zookeeper/zookeeper-3.4.3/zookeeper-3.4.3.jar
raw_data = LOAD '/data/sse.tbl1.HEADERLESS.csv' USING PigStorage( ',' ) AS
(mid : chararray, hid : chararray, mf : chararray, mt : chararray, mind :
chararray, mimd : chararray, mst : chararray );
dump raw_data;
STORE raw_data INTO 'hbase://hbaseuploadtest' USING
org.apache.pig.backend.hadoop.hbase.HBaseStorage('info:hid info:mf info:mt
info:mind info:mimd info:mst');
i.e.
[hadoop1@namenode hadoop-1.0.2]$ pig -x local
../pig-scripts/hbaseuploadtest.pig
WORKS EVERY TIME!!
But
[hadoop1@namenode hadoop-1.0.2]$ pig -x mapreduce
../pig-scripts/hbaseuploadtest.pig
Sometimes (but rarely) it runs in under a minute; usually it takes more than
40 minutes to reach 50%, then completes to 100% in seconds. The dataset is
very small.
Note that the dump of raw_data works in both cases. It is the STORE command
that causes the MR job to stall, and the job setup task shows the following
errors:
Task attempt_201204240854_0006_m_000002_0 failed to report status for 602
seconds. Killing!
Task attempt_201204240854_0006_m_000002_1 failed to report status for 601
seconds. Killing!
And task log shows the following stream of errors:
2012-04-24 11:57:27,427 INFO org.apache.zookeeper.ZooKeeper: Initiating
client connection, connectString=localhost:2181 sessionTimeout=180000
watcher=hconnection 0x5567d7fb
2012-04-24 11:57:27,441 INFO org.apache.zookeeper.ClientCnxn: Opening socket
connection to server /127.0.0.1:2181
2012-04-24 11:57:27,443 WARN
org.apache.zookeeper.client.ZooKeeperSaslClient: SecurityException:
java.lang.SecurityException: Unable to locate a login configuration occurred
when trying to find JAAS configuration.
2012-04-24 11:57:27,443 INFO
org.apache.zookeeper.client.ZooKeeperSaslClient: Client will not
SASL-authenticate because the default JAAS configuration section 'Client'
could not be found. If you are not using SASL, you may ignore this. On the
other hand, if you expected SASL to work, please fix your JAAS
configuration.
2012-04-24 11:57:27,444 WARN org.apache.zookeeper.ClientCnxn: Session 0x0
for server null, unexpected error, closing socket connection and attempting
reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:286)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1035)
2012-04-24 11:57:27,445 INFO
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: The identifier of
this process is 6846@slave2
2012-04-24 11:57:27,551 INFO org.apache.zookeeper.ClientCnxn: Opening socket
connection to server /127.0.0.1:2181
2012-04-24 11:57:27,552 WARN
org.apache.zookeeper.client.ZooKeeperSaslClient: SecurityException:
java.lang.SecurityException: Unable to locate a login configuration occurred
when trying to find JAAS configuration.
2012-04-24 11:57:27,552 INFO
org.apache.zookeeper.client.ZooKeeperSaslClient: Client will not
SASL-authenticate because the default JAAS configuration section 'Client'
could not be found. If you are not using SASL, you may ignore this. On the
other hand, if you expected SASL to work, please fix your JAAS
configuration.
2012-04-24 11:57:27,552 WARN org.apache.zookeeper.ClientCnxn: Session 0x0
for server null, unexpected error, closing socket connection and attempting
reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:286)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1035)
2012-04-24 11:57:27,553 WARN
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient
ZooKeeper exception:
org.apache.zookeeper.KeeperException$ConnectionLossException:
KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
2012-04-24 11:57:27,553 INFO org.apache.hadoop.hbase.util.RetryCounter:
Sleeping 2000ms before retry #1...
2012-04-24 11:57:28,652 INFO org.apache.zookeeper.ClientCnxn: Opening socket
connection to server localhost/127.0.0.1:2181
2012-04-24 11:57:28,653 WARN
org.apache.zookeeper.client.ZooKeeperSaslClient: SecurityException:
java.lang.SecurityException: Unable to locate a login configuration occurred
when trying to find JAAS configuration.
2012-04-24 11:57:28,653 INFO
org.apache.zookeeper.client.ZooKeeperSaslClient: Client will not
SASL-authenticate because the default JAAS configuration section 'Client'
could not be found. If you are not using SASL, you may ignore this. On the
other hand, if you expected SASL to work, please fix your JAAS
configuration.
2012-04-24 11:57:28,653 WARN org.apache.zookeeper.ClientCnxn: Session 0x0
for server null, unexpected error, closing socket connection and attempting
reconnect
java.net.ConnectException: Connection refused
...and so on, repeating.
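Incidentally, the task logs show the ZooKeeper client trying localhost:2181,
which suggests the MR tasks are not picking up our hbase-site.xml and are
falling back to the default quorum. One thing we may try next is forcing the
quorum from the top of the Pig script (the host name 'namenode' below is just
an assumption about our setup; substitute your actual ZooKeeper quorum):

-- hypothetical sketch: push the real ZooKeeper quorum into the job conf
SET hbase.zookeeper.quorum 'namenode';
SET hbase.zookeeper.property.clientPort '2181';

Alternatively, putting the HBase conf directory on PIG_CLASSPATH (e.g.
export PIG_CLASSPATH=/opt/hbase/hbase-trunk/conf:$PIG_CLASSPATH) should ship
the same settings to the tasks.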
Any ideas? Anyone else out there successfully running Pig 0.11
HBaseStorage() against HBase 0.95?
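(A quick way to check that the quorum is even reachable from a task node is
echo ruok | nc namenode 2181, which should print imok if ZooKeeper is
listening; again, the host name here is an assumption about our cluster.)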
Thanks,
Royston
-----Original Message-----
From: Dmitriy Ryaboy [mailto:[email protected]]
Sent: 20 April 2012 00:03
To: [email protected]
Subject: Re: HBaseStorage not working
Nothing significant changed in Pig trunk, so I am guessing HBase changed
something; you are more likely to get help from them (they should at least
be able to point at APIs that changed and are likely to cause this sort of
thing).
You might also want to check if any of the started MR jobs have anything
interesting in their task logs.
D
On Thu, Apr 19, 2012 at 1:41 PM, Royston Sellman
<[email protected]> wrote:
> Does HBaseStorage work with HBase 0.95?
>
>
>
> This code was working with HBase 0.92 and Pig 0.9 but fails on HBase
> 0.95 and Pig 0.11 (built from source):
>
>
>
> register /opt/hbase/hbase-trunk/hbase-0.95-SNAPSHOT.jar
> register /opt/zookeeper/zookeeper-3.4.3/zookeeper-3.4.3.jar
>
> tbl1 = LOAD 'input/sse.tbl1.HEADERLESS.csv' USING PigStorage( ',' ) AS (
>   ID:chararray,
>   hp:chararray,
>   pf:chararray,
>   gz:chararray,
>   hid:chararray,
>   hst:chararray,
>   mgz:chararray,
>   gg:chararray,
>   epc:chararray );
>
>
>
> STORE tbl1 INTO 'hbase://sse.tbl1'
>
> USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('edrp:hp
> edrp:pf edrp:gz edrp:hid edrp:hst edrp:mgz edrp:gg edrp:epc');
>
>
>
> The job output (using either Grunt or PigServer makes no difference)
> shows the family:descriptor filters being added by HBaseStorage, then
> starts the MR job, which (after a long pause) reports:
>
> ------------
>
> Input(s):
> Failed to read data from
> "hdfs://namenode:8020/user/hadoop1/input/sse.tbl1.HEADERLESS.csv"
>
> Output(s):
> Failed to produce result in "hbase://sse.tbl1"
>
> INFO mapReduceLayer.MapReduceLauncher: Failed!
> INFO hbase.HBaseStorage: Adding family:descriptor filters with values edrp:hp
> INFO hbase.HBaseStorage: Adding family:descriptor filters with values edrp:pf
> INFO hbase.HBaseStorage: Adding family:descriptor filters with values edrp:gz
> INFO hbase.HBaseStorage: Adding family:descriptor filters with values edrp:hid
> INFO hbase.HBaseStorage: Adding family:descriptor filters with values edrp:hst
> INFO hbase.HBaseStorage: Adding family:descriptor filters with values edrp:mgz
> INFO hbase.HBaseStorage: Adding family:descriptor filters with values edrp:gg
> INFO hbase.HBaseStorage: Adding family:descriptor filters with values edrp:epc
>
> ------------
>
>
>
> The "Failed to read" is misleading, I think, because a dump tbl1; in
> place of the STORE works fine.
>
>
>
> I get nothing in the HBase logs and nothing in the Pig log.
>
>
>
> HBase works fine from the shell: I can read and write to the table.
> Pig works fine in and out of HDFS with CSVs.
>
>
>
> Any ideas?
>
>
>
> Royston