[jira] [Resolved] (HBASE-10305) Batch update performance drops as the number of regions grows
[ https://issues.apache.org/jira/browse/HBASE-10305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Shi resolved HBASE-10305. -- Resolution: Not A Problem We're running 0.94.10. ASYNC_WAL does work for me. Thanks Lars. Closing this issue. Batch update performance drops as the number of regions grows - Key: HBASE-10305 URL: https://issues.apache.org/jira/browse/HBASE-10305 Project: HBase Issue Type: Bug Components: Performance Reporter: Chao Shi In our use case, we use a small number (~5) of proxy programs that read from a queue and batch-update to HBase. Our program is multi-threaded, and the HBase client batches mutations to each RS. We found we get lower TPS when there are more regions. I think the reason is that the RS syncs the HLog for each region. Suppose there is a single region: the batch update touches only one region and therefore syncs the HLog once. Now suppose there are 10 regions per server: in RS#multi() it has to process the update for each individual region and sync the HLog 10 times. Please note that in our scenario, batched mutations are usually independent of each other and touch a varying number of regions. We are using the 0.94 series, but after a quick look at the code I think trunk has the same problem. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10305) Batch update performance drops as the number of regions grows
[ https://issues.apache.org/jira/browse/HBASE-10305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867616#comment-13867616 ] Lars Hofhansl commented on HBASE-10305: --- Great. Maybe this should be documented more prominently in the HBase book.
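The arithmetic behind Chao's report can be sketched with a toy model (hypothetical names, not HBase code): if the region server syncs the WAL once per region touched, a batch spanning R regions pays R syncs, so per-batch sync cost grows with the number of regions the batch hits.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the cost described in HBASE-10305: a multi-region
// batch triggers one WAL sync per region touched, not one per batch.
public class WalSyncModel {

    // Count WAL syncs for a batch of row keys, assuming rows are
    // hashed across `regions` regions and each touched region
    // syncs the log once. Purely illustrative; not HBase's routing.
    public static int syncsPerBatch(List<String> rowKeys, int regions) {
        Map<Integer, Integer> perRegion = new HashMap<>();
        for (String row : rowKeys) {
            int region = Math.floorMod(row.hashCode(), regions);
            perRegion.merge(region, 1, Integer::sum);
        }
        return perRegion.size(); // one sync per touched region
    }

    public static void main(String[] args) {
        List<String> batch = List.of("r1", "r2", "r3", "r4", "r5",
                                     "r6", "r7", "r8", "r9", "r10");
        System.out.println("1 region:   " + syncsPerBatch(batch, 1) + " sync(s)");
        System.out.println("10 regions: " + syncsPerBatch(batch, 10) + " sync(s)");
    }
}
```

With a single region the batch costs exactly one sync; with many regions the cost approaches one sync per mutation, which matches the observed TPS drop. ASYNC_WAL sidesteps this by not forcing a sync on every write.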
[jira] [Updated] (HBASE-6581) Build with hadoop.profile=3.0
[ https://issues.apache.org/jira/browse/HBASE-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Charles updated HBASE-6581: Attachment: HBASE-6581-6.patch HBASE-6581-6.patch rebased to the latest trunk. Build with hadoop.profile=3.0 - Key: HBASE-6581 URL: https://issues.apache.org/jira/browse/HBASE-6581 Project: HBase Issue Type: Bug Reporter: Eric Charles Assignee: Eric Charles Fix For: 0.98.1 Attachments: HBASE-6581-1.patch, HBASE-6581-2.patch, HBASE-6581-20130821.patch, HBASE-6581-3.patch, HBASE-6581-4.patch, HBASE-6581-5.patch, HBASE-6581-6.patch, HBASE-6581.diff, HBASE-6581.diff Building trunk with hadoop.profile=3.0 gives exceptions (see [1]) due to a change in the hadoop maven module naming (and also the usage of 3.0-SNAPSHOT instead of 3.0.0-SNAPSHOT in hbase-common). I can provide a patch that would move most of the hadoop dependencies into their respective profiles and define the correct hadoop deps in the 3.0 profile. Please tell me if it's ok to go this way. Thx, Eric [1]
{code}
$ mvn clean install -Dhadoop.profile=3.0
[INFO] Scanning for projects...
[ERROR] The build could not read 3 projects - [Help 1]
[ERROR]
[ERROR] The project org.apache.hbase:hbase-server:0.95-SNAPSHOT (/d/hbase.svn/hbase-server/pom.xml) has 3 errors
[ERROR] 'dependencies.dependency.version' for org.apache.hadoop:hadoop-common:jar is missing. @ line 655, column 21
[ERROR] 'dependencies.dependency.version' for org.apache.hadoop:hadoop-annotations:jar is missing. @ line 659, column 21
[ERROR] 'dependencies.dependency.version' for org.apache.hadoop:hadoop-minicluster:jar is missing. @ line 663, column 21
[ERROR]
[ERROR] The project org.apache.hbase:hbase-common:0.95-SNAPSHOT (/d/hbase.svn/hbase-common/pom.xml) has 3 errors
[ERROR] 'dependencies.dependency.version' for org.apache.hadoop:hadoop-common:jar is missing. @ line 170, column 21
[ERROR] 'dependencies.dependency.version' for org.apache.hadoop:hadoop-annotations:jar is missing. @ line 174, column 21
[ERROR] 'dependencies.dependency.version' for org.apache.hadoop:hadoop-minicluster:jar is missing. @ line 178, column 21
[ERROR]
[ERROR] The project org.apache.hbase:hbase-it:0.95-SNAPSHOT (/d/hbase.svn/hbase-it/pom.xml) has 3 errors
[ERROR] 'dependencies.dependency.version' for org.apache.hadoop:hadoop-common:jar is missing. @ line 220, column 18
[ERROR] 'dependencies.dependency.version' for org.apache.hadoop:hadoop-annotations:jar is missing. @ line 224, column 21
[ERROR] 'dependencies.dependency.version' for org.apache.hadoop:hadoop-minicluster:jar is missing. @ line 228, column 21
[ERROR]
{code}
[jira] [Commented] (HBASE-6581) Build with hadoop.profile=3.0
[ https://issues.apache.org/jira/browse/HBASE-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867675#comment-13867675 ] Hadoop QA commented on HBASE-6581: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622372/HBASE-6581-6.patch against trunk revision . ATTACHMENT ID: 12622372 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8384//console This message is automatically generated.
[jira] [Created] (HBASE-10309) Add support to delete empty regions in 0.94.x series
AcCud created HBASE-10309: - Summary: Add support to delete empty regions in 0.94.x series Key: HBASE-10309 URL: https://issues.apache.org/jira/browse/HBASE-10309 Project: HBase Issue Type: New Feature Reporter: AcCud Fix For: 0.94.16 My use case: I have several tables whose keys start with a timestamp. Because of this, combined with the fact that I have set a 15-day retention period, empty regions accumulate over time. I am sure that no writes will occur in these regions. It would be nice to have a tool to delete regions without needing to stop the cluster. The easiest way for me would be a tool that deletes all empty regions, but it would also be fine to specify which region to delete. Something like: deleteRegion tableName region
[jira] [Commented] (HBASE-9977) Define C interface of HBase Client Asynchronous APIs
[ https://issues.apache.org/jira/browse/HBASE-9977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867782#comment-13867782 ] Cosmin Lehene commented on HBASE-9977: -- [~eclark] Is there a JIRA umbrella for the C++ (core)? It looks like HBASE-10168 is for JNI and HBASE-1015 suggests wrapping Thrift. Define C interface of HBase Client Asynchronous APIs Key: HBASE-9977 URL: https://issues.apache.org/jira/browse/HBASE-9977 Project: HBase Issue Type: Sub-task Components: Client Reporter: Elliott Clark Assignee: Elliott Clark Fix For: 0.99.0 Attachments: HBASE-9977-0.patch, HBASE-9977-1.patch, HBASE-9977-2.patch, HBASE-9977-3.patch, HBASE-9977-4.patch, HBASE-9977-5.patch
[jira] [Commented] (HBASE-1015) pure C and C++ client libraries
[ https://issues.apache.org/jira/browse/HBASE-1015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867787#comment-13867787 ] Cosmin Lehene commented on HBASE-1015: -- HBASE-9977 suggests a C++ async client and C sync/async wrappers. Given that HBase talks protobuf natively, is a native wrapper around Thrift still a goal? pure C and C++ client libraries --- Key: HBASE-1015 URL: https://issues.apache.org/jira/browse/HBASE-1015 Project: HBase Issue Type: New Feature Components: Client Affects Versions: 0.20.6 Reporter: Andrew Purtell Priority: Minor If, via HBASE-794, first-class support for talking via Thrift directly to the HMaster and HRS becomes available, then pure C and C++ client libraries are possible. The C client library would wrap a Thrift core. The C++ client library can provide a class hierarchy quite close to o.a.h.h.client and, ideally, identical semantics. It should be just a wrapper around the C API, for economy. Internally at my employer there is a lot of resistance to HBase because many dev teams have a strong C/C++ bias. The real issue, however, is client-side integration, not a fundamental objection. (What runs server side and how it is managed is a secondary consideration.)
[jira] [Created] (HBASE-10310) ZNodeCleaner.java KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master
Samir Ahmic created HBASE-10310: --- Summary: ZNodeCleaner.java KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master Key: HBASE-10310 URL: https://issues.apache.org/jira/browse/HBASE-10310 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.96.1.1 Environment: x86_64 GNU/Linux Reporter: Samir Ahmic I was testing the hbase master clear command while working on [HBASE-7386]; here are the command and the exception: {code} $ export HBASE_ZNODE_FILE=/tmp/hbase-hadoop-master.znode; ./hbase master clear 14/01/10 14:05:44 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=zk1:2181 sessionTimeout=9 watcher=clean znode for master, quorum=zk1:2181, baseZNode=/hbase 14/01/10 14:05:44 INFO zookeeper.RecoverableZooKeeper: Process identifier=clean znode for master connecting to ZooKeeper ensemble=zk1:2181 14/01/10 14:05:44 INFO zookeeper.ClientCnxn: Opening socket connection to server zk1/172.17.33.5:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration) 14/01/10 14:05:44 INFO zookeeper.ClientCnxn: Socket connection established to zk11/172.17.33.5:2181, initiating session 14/01/10 14:05:44 INFO zookeeper.ClientCnxn: Session establishment complete on server zk1/172.17.33.5:2181, sessionid = 0x1427a96bfea4a8a, negotiated timeout = 4 14/01/10 14:05:44 INFO zookeeper.ZooKeeper: Session: 0x1427a96bfea4a8a closed 14/01/10 14:05:44 INFO zookeeper.ClientCnxn: EventThread shut down 14/01/10 14:05:44 WARN zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=zk1:2181, exception=org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master 14/01/10 14:05:44 INFO util.RetryCounter: Sleeping 1000ms before retry #0... 
14/01/10 14:05:45 WARN zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=zk1:2181, exception=org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master 14/01/10 14:05:45 ERROR zookeeper.RecoverableZooKeeper: ZooKeeper getData failed after 1 attempts 14/01/10 14:05:45 WARN zookeeper.ZKUtil: clean znode for master-0x1427a96bfea4a8a, quorum=zk1:2181, baseZNode=/hbase Unable to get data of znode /hbase/master org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master at org.apache.zookeeper.KeeperException.create(KeeperException.java:127) at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151) at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:337) at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataNoWatch(ZKUtil.java:777) at org.apache.hadoop.hbase.zookeeper.MasterAddressTracker.deleteIfEquals(MasterAddressTracker.java:170) at org.apache.hadoop.hbase.ZNodeClearer.clear(ZNodeClearer.java:160) at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:138) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126) at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2779) 14/01/10 14:05:45 ERROR zookeeper.ZooKeeperWatcher: clean znode for master-0x1427a96bfea4a8a, quorum=zk1:2181, baseZNode=/hbase Received unexpected KeeperException, re-throwing exception org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master at org.apache.zookeeper.KeeperException.create(KeeperException.java:127) at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151) at 
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:337) at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataNoWatch(ZKUtil.java:777) at org.apache.hadoop.hbase.zookeeper.MasterAddressTracker.deleteIfEquals(MasterAddressTracker.java:170) at org.apache.hadoop.hbase.ZNodeClearer.clear(ZNodeClearer.java:160) at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:138) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126) at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2779) 14/01/10 14:05:45 WARN zookeeper.ZooKeeperNodeTracker: Can't get or delete the master znode org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master at
[jira] [Updated] (HBASE-10310) ZNodeCleaner.java KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master
[ https://issues.apache.org/jira/browse/HBASE-10310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Samir Ahmic updated HBASE-10310: Attachment: HBASE-10310.patch Here is the patch.
[jira] [Assigned] (HBASE-10310) ZNodeCleaner.java KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master
[ https://issues.apache.org/jira/browse/HBASE-10310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Samir Ahmic reassigned HBASE-10310: --- Assignee: Samir Ahmic
[jira] [Commented] (HBASE-10123) Change default ports; move them out of linux ephemeral port range
[ https://issues.apache.org/jira/browse/HBASE-10123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867936#comment-13867936 ] Jonathan Hsieh commented on HBASE-10123: Is this something that would need to wait for 1.0, or is there any chance of this in 0.98 [~apurtell]? Change default ports; move them out of linux ephemeral port range - Key: HBASE-10123 URL: https://issues.apache.org/jira/browse/HBASE-10123 Project: HBase Issue Type: Bug Reporter: stack Our defaults clash w/ the range linux assigns itself for creating come-and-go ephemeral ports; likely at some point in our history we've clashed w/ a random, short-lived process. While it is easy to change the defaults, we should just ship w/ defaults that make sense. We could hoist ourselves up into the 7k or 8k range. See http://www.ncftp.com/ncftpd/doc/misc/ephemeral_ports.html
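On Linux the ephemeral range mentioned above can be read from /proc/sys/net/ipv4/ip_local_port_range (commonly 32768 through 61000). A small sketch, assuming only the documented two-integer format of that file, that flags a default port falling inside the range:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Checks whether a daemon's default port falls inside the Linux
// ephemeral port range, the clash discussed in HBASE-10123.
// The /proc path is Linux-specific.
public class EphemeralPortCheck {

    // Parse the /proc file format, e.g. "32768\t61000", into [low, high].
    public static int[] parseRange(String procContents) {
        String[] parts = procContents.trim().split("\\s+");
        return new int[] { Integer.parseInt(parts[0]), Integer.parseInt(parts[1]) };
    }

    public static boolean clashes(int port, int[] range) {
        return port >= range[0] && port <= range[1];
    }

    public static void main(String[] args) throws IOException {
        String raw = Files.readString(Path.of("/proc/sys/net/ipv4/ip_local_port_range"));
        int[] range = parseRange(raw);
        // 60000/60020 were the HMaster/RegionServer defaults of the era.
        for (int port : new int[] { 60000, 60010, 60020, 60030 }) {
            System.out.println(port + (clashes(port, range) ? " clashes" : " ok"));
        }
    }
}
```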
[jira] [Created] (HBASE-10311) Add Scan object to preScannerNext and postScannerNext methods on RegionObserver
Neil Ferguson created HBASE-10311: - Summary: Add Scan object to preScannerNext and postScannerNext methods on RegionObserver Key: HBASE-10311 URL: https://issues.apache.org/jira/browse/HBASE-10311 Project: HBase Issue Type: New Feature Components: Coprocessors Affects Versions: 0.96.1.1 Reporter: Neil Ferguson I'd like to be able to access the org.apache.hadoop.hbase.client.Scan that was used to create a scanner in the RegionObserver.preScannerNext and RegionObserver.postScannerNext methods. The Scan object is available in the preScannerOpen method, but not in the preScannerNext or postScannerNext methods. The reason is that I'd like to access the attributes of the Scan object. I want to do some resource management in the coprocessor based on some attributes of the Scan object (like, who created it). Alternatively, does anybody know of another way to get hold of the Scan object in these methods without modifying things?
[jira] [Updated] (HBASE-10311) Add Scan object to preScannerNext and postScannerNext methods on RegionObserver
[ https://issues.apache.org/jira/browse/HBASE-10311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neil Ferguson updated HBASE-10311: -- Attachment: HBASE-10311.patch Patch attached
[jira] [Commented] (HBASE-10311) Add Scan object to preScannerNext and postScannerNext methods on RegionObserver
[ https://issues.apache.org/jira/browse/HBASE-10311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867960#comment-13867960 ] Neil Ferguson commented on HBASE-10311: --- Just made this patch, then realised that I can accomplish what I want by mapping RegionScanner to Scan in postScannerOpen, then looking up this map in preScannerNext and postScannerNext. VisibilityController seems to do something similar already using a weak hashmap. This approach seems a little brittle, since there's theoretically no guarantee that the scanner that is passed to postScannerOpen is the same one that is passed to preScannerNext and postScannerNext. Perhaps we should change the docs to explicitly specify that this will always be the case. Anyway, since it doesn't involve changing the coprocessor interface, I'll take this approach. The patch to modify the coprocessor interface is attached if anyone wants it. Feel free to close this ticket otherwise.
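The workaround Neil describes (remember each scanner's Scan in postScannerOpen, look it up in pre/postScannerNext) reduces to a weak map keyed by scanner identity. A minimal stand-in sketch with placeholder types, not the real coprocessor API:

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

// Stand-in for the pattern described in the comment: associate each
// scanner with the Scan that created it, without changing interfaces.
// Scanner and ScanSpec are hypothetical placeholder types, not HBase
// classes; the real hook points would be postScannerOpen and
// preScannerNext/postScannerNext.
public class ScannerScanRegistry {

    public static class Scanner {}            // stands in for RegionScanner
    public record ScanSpec(String creator) {} // stands in for Scan + its attributes

    // Weak keys: entries vanish once the scanner is garbage collected,
    // mirroring the weak hashmap VisibilityController reportedly uses.
    private final Map<Scanner, ScanSpec> scans =
        Collections.synchronizedMap(new WeakHashMap<>());

    // Analogous to postScannerOpen(ctx, scan, scanner): record the mapping.
    public void onScannerOpen(Scanner scanner, ScanSpec scan) {
        scans.put(scanner, scan);
    }

    // Analogous to preScannerNext/postScannerNext: recover the Scan.
    public ScanSpec lookup(Scanner scanner) {
        return scans.get(scanner);
    }
}
```

The caveat from the comment still applies: this only works if the scanner object handed to preScannerNext is the same reference as the one seen in postScannerOpen.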
[jira] [Commented] (HBASE-10304) Running an hbase job jar: IllegalAccessError: class com.google.protobuf.ZeroCopyLiteralByteString cannot access its superclass com.google.protobuf.LiteralByteString
[ https://issues.apache.org/jira/browse/HBASE-10304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867968#comment-13867968 ] stack commented on HBASE-10304: --- [~enis] That pointer helps. Running an hbase job jar: IllegalAccessError: class com.google.protobuf.ZeroCopyLiteralByteString cannot access its superclass com.google.protobuf.LiteralByteString Key: HBASE-10304 URL: https://issues.apache.org/jira/browse/HBASE-10304 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.98.0, 0.96.1.1 Reporter: stack Priority: Blocker Fix For: 0.98.0 Attachments: hbase-10304_not_tested.patch, jobjar.xml (Jimmy has been working on this one internally. I'm just the messenger raising this critical issue upstream). So, if you make job jar and bundle up hbase inside in it because you want to access hbase from your mapreduce task, the deploy of the job jar to the cluster fails with: {code} 14/01/05 08:59:19 INFO Configuration.deprecation: topology.node.switch.mapping.impl is deprecated. Instead, use net.topology.node.switch.mapping.impl 14/01/05 08:59:19 INFO Configuration.deprecation: io.bytes.per.checksum is deprecated. 
Instead, use dfs.bytes-per-checksum Exception in thread main java.lang.IllegalAccessError: class com.google.protobuf.ZeroCopyLiteralByteString cannot access its superclass com.google.protobuf.LiteralByteString at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(ClassLoader.java:792) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) at java.net.URLClassLoader.defineClass(URLClassLoader.java:449) at java.net.URLClassLoader.access$100(URLClassLoader.java:71) at java.net.URLClassLoader$1.run(URLClassLoader.java:361) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) at org.apache.hadoop.hbase.protobuf.ProtobufUtil.toScan(ProtobufUtil.java:818) at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.convertScanToString(TableMapReduceUtil.java:433) at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:186) at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:147) at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:270) at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:100) at com.ngdata.hbaseindexer.mr.HBaseMapReduceIndexerTool.run(HBaseMapReduceIndexerTool.java:124) at com.ngdata.hbaseindexer.mr.HBaseMapReduceIndexerTool.run(HBaseMapReduceIndexerTool.java:64) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at com.ngdata.hbaseindexer.mr.HBaseMapReduceIndexerTool.main(HBaseMapReduceIndexerTool.java:51) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) {code} So, ZCLBS is a hack. This class is in the hbase-protocol module. It is in the com.google.protobuf package. All is well and good usually. But when we make a job jar and bundle up hbase inside it, our 'trick' breaks. RunJar makes a new class loader to run the job jar. This URLClassLoader 'attaches' all the jars and classes that are in the job jar so they can be found when it goes to do a lookup. Only, classloaders work by always delegating to their parent first (unless you are a WAR file in a container where delegation is 'off' for the most part), and in this case the parent classloader will have access to a pb jar since pb is in the hadoop CLASSPATH. So, the parent loads the pb classes. We then load ZCLBS, only this is done in the classloader made by RunJar; ZCLBS has a different classloader from its superclass and we get the above IllegalAccessError. Now (Jimmy's work comes in here), this can't be fixed by reflection -- you can't setAccess on a 'Class' -- and though it probably could be fixed by hacking RunJar so it was somehow made
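The parent-first delegation stack describes can be seen with a toy URLClassLoader (illustrative only, not HBase code): a child loader with no URLs of its own resolves every class through its parent. The job-jar failure is the inverse case — the RunJar child loader does define ZeroCopyLiteralByteString itself while the parent has already defined LiteralByteString, so the two end up in the same package name but different runtime packages, and package-private superclass access fails.

```java
import java.net.URL;
import java.net.URLClassLoader;

public class DelegationDemo {
    // Returns true when the child loader hands back the very Class object the
    // parent already defined, i.e. parent-first delegation kicked in.
    public static boolean sameClassViaChild() throws Exception {
        ClassLoader parent = DelegationDemo.class.getClassLoader();
        try (URLClassLoader child = new URLClassLoader(new URL[0], parent)) {
            // The child has no URLs, so it cannot define this class itself; it
            // delegates upward and gets the parent's cached Class back.
            return child.loadClass(DelegationDemo.class.getName())
                    == DelegationDemo.class;
        }
    }
}
```

When delegation is short-circuited — as when the job jar carries its own copy of the subclass — the two loaders each define part of the com.google.protobuf package, which is exactly the IllegalAccessError shown in the trace above.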
[jira] [Commented] (HBASE-10311) Add Scan object to preScannerNext and postScannerNext methods on RegionObserver
[ https://issues.apache.org/jira/browse/HBASE-10311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867979#comment-13867979 ] Anoop Sam John commented on HBASE-10311: Actually it will be the same scanner object that is passed to the pre/post next() CP hooks. Yes, we already use this in VisibilityController. I was about to come here and suggest this, and then saw you realized it already on your own.. Good :) We cannot change the CP signatures in released major versions, only in trunk (if needed).. Here it looks not needed at all.. So I will close this issue.
[jira] [Resolved] (HBASE-10311) Add Scan object to preScannerNext and postScannerNext methods on RegionObserver
[ https://issues.apache.org/jira/browse/HBASE-10311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John resolved HBASE-10311. Resolution: Invalid
[jira] [Commented] (HBASE-10294) Some synchronization on ServerManager#onlineServers can be removed
[ https://issues.apache.org/jira/browse/HBASE-10294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13868027#comment-13868027 ] Sergey Shelukhin commented on HBASE-10294: -- Yes, above is unnecessary. Patch? :) Some synchronization on ServerManager#onlineServers can be removed -- Key: HBASE-10294 URL: https://issues.apache.org/jira/browse/HBASE-10294 Project: HBase Issue Type: Task Reporter: Ted Yu Priority: Minor ServerManager#onlineServers is a ConcurrentHashMap, yet I found that some accesses to it are synchronized and unnecessary. Here is one example: {code} public Map<ServerName, ServerLoad> getOnlineServers() { // Presumption is that iterating the returned Map is OK. synchronized (this.onlineServers) { return Collections.unmodifiableMap(this.onlineServers); } } {code} Note: not all accesses to ServerManager#onlineServers are synchronized.
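A minimal sketch of the change being proposed, with String/Integer standing in for ServerName/ServerLoad: ConcurrentHashMap already guarantees safe concurrent reads and writes, and its iterators are weakly consistent rather than fail-fast, so the unmodifiable wrapper needs no surrounding synchronized block.

```java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative stand-in for ServerManager: the field types are simplified.
public class OnlineServers {
    private final Map<String, Integer> onlineServers = new ConcurrentHashMap<>();

    public void add(String server, int load) {
        onlineServers.put(server, load);
    }

    // No synchronized (this.onlineServers) needed: the wrapper is only a
    // read-through view, and ConcurrentHashMap already handles concurrent
    // access; iteration over the view is weakly consistent, never throwing
    // ConcurrentModificationException.
    public Map<String, Integer> getOnlineServers() {
        return Collections.unmodifiableMap(onlineServers);
    }
}
```

The synchronized block in the original would only matter if callers needed a point-in-time snapshot, which the comment ("Presumption is that iterating the returned Map is OK") says they do not.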
[jira] [Assigned] (HBASE-10294) Some synchronization on ServerManager#onlineServers can be removed
[ https://issues.apache.org/jira/browse/HBASE-10294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu reassigned HBASE-10294: -- Assignee: Ted Yu
[jira] [Updated] (HBASE-10294) Some synchronization on ServerManager#onlineServers can be removed
[ https://issues.apache.org/jira/browse/HBASE-10294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-10294: --- Attachment: 10294-v1.txt
[jira] [Updated] (HBASE-10294) Some synchronization on ServerManager#onlineServers can be removed
[ https://issues.apache.org/jira/browse/HBASE-10294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-10294: --- Status: Patch Available (was: Open)
[jira] [Commented] (HBASE-10307) IntegrationTestIngestWithEncryption assumes localhost cluster
[ https://issues.apache.org/jira/browse/HBASE-10307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13868041#comment-13868041 ] Andrew Purtell commented on HBASE-10307: I need to commit this to move forward with 0.98, so will do so using CTR in a few hours. IntegrationTestIngestWithEncryption assumes localhost cluster - Key: HBASE-10307 URL: https://issues.apache.org/jira/browse/HBASE-10307 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.99.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.98.0, 0.99.0 Attachments: 10307.patch, 10307.patch We forgot to update IntegrationTestIngestWithEncryption to handle the distributed cluster case. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10307) IntegrationTestIngestWithEncryption assumes localhost cluster
[ https://issues.apache.org/jira/browse/HBASE-10307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13868053#comment-13868053 ] Ted Yu commented on HBASE-10307: +1
[jira] [Commented] (HBASE-10123) Change default ports; move them out of linux ephemeral port range
[ https://issues.apache.org/jira/browse/HBASE-10123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13868055#comment-13868055 ] Andrew Purtell commented on HBASE-10123: No problem at all [~jmhsieh], go for it. Change default ports; move them out of linux ephemeral port range - Key: HBASE-10123 URL: https://issues.apache.org/jira/browse/HBASE-10123 Project: HBase Issue Type: Bug Reporter: stack Our defaults clash w/ the range linux assigns itself for creating come-and-go ephemeral ports; likely in our history we've clashed w/ a random, short-lived process. While easy to change the defaults, we should just ship w/ defaults that make sense. We could host ourselves up into the 7 or 8k range. See http://www.ncftp.com/ncftpd/doc/misc/ephemeral_ports.html -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10304) Running an hbase job jar: IllegalAccessError: class com.google.protobuf.ZeroCopyLiteralByteString cannot access its superclass com.google.protobuf.LiteralByteString
[ https://issues.apache.org/jira/browse/HBASE-10304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13868061#comment-13868061 ] Andrew Purtell commented on HBASE-10304: {quote} I think something like below should be the standard way to launch an HBase job. Java developers are used to thinking about the classpath, so I don't think it's a burden on anyone. {noformat} $ HADOOP_CLASSPATH=$(hbase mapredcp) hadoop jar foo.jar MainClass {noformat} or perhaps, if you're fancy {noformat} $ HADOOP_CLASSPATH=/path/to/hbase_config:$(hbase mapredcp) hadoop jar foo.jar {noformat} {quote} So can we get away with a doc change / manual update as the fix for this issue?
[jira] [Resolved] (HBASE-10299) TestZKProcedure fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-10299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-10299. Resolution: Duplicate Fix Version/s: (was: 0.99.0) (was: 0.98.0) Assignee: (was: Andrew Purtell) Dup of HBASE-10308 TestZKProcedure fails occasionally -- Key: HBASE-10299 URL: https://issues.apache.org/jira/browse/HBASE-10299 Project: HBase Issue Type: Bug Affects Versions: 0.98.0 Reporter: Andrew Purtell I can reproduce this using JDK 6 on Ubuntu 12. {noformat} Running org.apache.hadoop.hbase.procedure.TestZKProcedure Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.941 sec FAILURE! [...] Failed tests: testMultiCohortWithMemberTimeoutDuringPrepare(org.apache.hadoop.hbase.procedure.TestZKProcedure): (..) {noformat} Not seen running the test standalone. Quite rare, seen after 46 previous successful test suite runs. No failure trace available yet. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10308) TestZKProcedure.testMultiCohortWithMemberTimeoutDuringPrepare fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-10308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13868063#comment-13868063 ] Andrew Purtell commented on HBASE-10308: I filed HBASE-10299 for this but will close that as a dup since this issue has more detail. TestZKProcedure.testMultiCohortWithMemberTimeoutDuringPrepare fails occasionally Key: HBASE-10308 URL: https://issues.apache.org/jira/browse/HBASE-10308 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Fix For: 0.94.16, 0.96.2, 0.98.1, 0.99.0 Seen in 0.94 (both JDK6 and JDK7 builds) {code} Error Message Wanted but not invoked: procedure.sendGlobalBarrierComplete(); - at org.apache.hadoop.hbase.procedure.TestZKProcedure.waitAndVerifyProc(TestZKProcedure.java:344) However, there were other interactions with this mock: - at org.apache.hadoop.hbase.procedure.TestZKProcedure.testMultiCohortWithMemberTimeoutDuringPrepare(TestZKProcedure.java:306) - at org.apache.hadoop.hbase.procedure.TestZKProcedure.testMultiCohortWithMemberTimeoutDuringPrepare(TestZKProcedure.java:311) - at org.apache.hadoop.hbase.procedure.Procedure.call(Procedure.java:204) - at org.apache.hadoop.hbase.procedure.Procedure.call(Procedure.java:204) - at org.apache.hadoop.hbase.procedure.ProcedureCoordinator.memberAcquiredBarrier(ProcedureCoordinator.java:262) - at org.apache.hadoop.hbase.procedure.ProcedureCoordinator.memberAcquiredBarrier(ProcedureCoordinator.java:262) - at org.apache.hadoop.hbase.procedure.ProcedureCoordinator.abortProcedure(ProcedureCoordinator.java:217) - at org.apache.hadoop.hbase.procedure.TestZKProcedure.waitAndVerifyProc(TestZKProcedure.java:337) Stacktrace Wanted but not invoked: procedure.sendGlobalBarrierComplete(); - at org.apache.hadoop.hbase.procedure.TestZKProcedure.waitAndVerifyProc(TestZKProcedure.java:344) However, there were other interactions with this mock: - at 
org.apache.hadoop.hbase.procedure.TestZKProcedure.testMultiCohortWithMemberTimeoutDuringPrepare(TestZKProcedure.java:306) - at org.apache.hadoop.hbase.procedure.TestZKProcedure.testMultiCohortWithMemberTimeoutDuringPrepare(TestZKProcedure.java:311) - at org.apache.hadoop.hbase.procedure.Procedure.call(Procedure.java:204) - at org.apache.hadoop.hbase.procedure.Procedure.call(Procedure.java:204) - at org.apache.hadoop.hbase.procedure.ProcedureCoordinator.memberAcquiredBarrier(ProcedureCoordinator.java:262) - at org.apache.hadoop.hbase.procedure.ProcedureCoordinator.memberAcquiredBarrier(ProcedureCoordinator.java:262) - at org.apache.hadoop.hbase.procedure.ProcedureCoordinator.abortProcedure(ProcedureCoordinator.java:217) - at org.apache.hadoop.hbase.procedure.TestZKProcedure.waitAndVerifyProc(TestZKProcedure.java:337) at org.apache.hadoop.hbase.procedure.TestZKProcedure.waitAndVerifyProc(TestZKProcedure.java:344) at org.apache.hadoop.hbase.procedure.TestZKProcedure.testMultiCohortWithMemberTimeoutDuringPrepare(TestZKProcedure.java:319) {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10308) TestZKProcedure.testMultiCohortWithMemberTimeoutDuringPrepare fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-10308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-10308: --- Fix Version/s: 0.99.0 0.98.1 0.96.2
[jira] [Commented] (HBASE-10307) IntegrationTestIngestWithEncryption assumes localhost cluster
[ https://issues.apache.org/jira/browse/HBASE-10307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13868071#comment-13868071 ] Andrew Purtell commented on HBASE-10307: Thanks [~te...@apache.org], appreciate you taking a look.
[jira] [Commented] (HBASE-10307) IntegrationTestIngestWithEncryption assumes localhost cluster
[ https://issues.apache.org/jira/browse/HBASE-10307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13868091#comment-13868091 ] Anoop Sam John commented on HBASE-10307: Patch LGTM.. +1
[jira] [Commented] (HBASE-10310) ZNodeCleaner.java KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master
[ https://issues.apache.org/jira/browse/HBASE-10310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13868092#comment-13868092 ] Andrew Purtell commented on HBASE-10310: +1, will commit this in a bit to trunk, 0.98, and 0.96 since it's an obvious fix.(Ping [~stack]). ZNodeCleaner.java KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master -- Key: HBASE-10310 URL: https://issues.apache.org/jira/browse/HBASE-10310 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.96.1.1 Environment: x86_64 GNU/Linux Reporter: Samir Ahmic Assignee: Samir Ahmic Attachments: HBASE-10310.patch I was testing hbase master clear command while working on [HBASE-7386] here is command and exception: {code} $ export HBASE_ZNODE_FILE=/tmp/hbase-hadoop-master.znode; ./hbase master clear 14/01/10 14:05:44 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=zk1:2181 sessionTimeout=9 watcher=clean znode for master, quorum=zk1:2181, baseZNode=/hbase 14/01/10 14:05:44 INFO zookeeper.RecoverableZooKeeper: Process identifier=clean znode for master connecting to ZooKeeper ensemble=zk1:2181 14/01/10 14:05:44 INFO zookeeper.ClientCnxn: Opening socket connection to server zk1/172.17.33.5:2181. 
Will not attempt to authenticate using SASL (Unable to locate a login configuration) 14/01/10 14:05:44 INFO zookeeper.ClientCnxn: Socket connection established to zk11/172.17.33.5:2181, initiating session 14/01/10 14:05:44 INFO zookeeper.ClientCnxn: Session establishment complete on server zk1/172.17.33.5:2181, sessionid = 0x1427a96bfea4a8a, negotiated timeout = 4 14/01/10 14:05:44 INFO zookeeper.ZooKeeper: Session: 0x1427a96bfea4a8a closed 14/01/10 14:05:44 INFO zookeeper.ClientCnxn: EventThread shut down 14/01/10 14:05:44 WARN zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=zk1:2181, exception=org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master 14/01/10 14:05:44 INFO util.RetryCounter: Sleeping 1000ms before retry #0... 14/01/10 14:05:45 WARN zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=zk1:2181, exception=org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master 14/01/10 14:05:45 ERROR zookeeper.RecoverableZooKeeper: ZooKeeper getData failed after 1 attempts 14/01/10 14:05:45 WARN zookeeper.ZKUtil: clean znode for master-0x1427a96bfea4a8a, quorum=zk1:2181, baseZNode=/hbase Unable to get data of znode /hbase/master org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master at org.apache.zookeeper.KeeperException.create(KeeperException.java:127) at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151) at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:337) at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataNoWatch(ZKUtil.java:777) at org.apache.hadoop.hbase.zookeeper.MasterAddressTracker.deleteIfEquals(MasterAddressTracker.java:170) at org.apache.hadoop.hbase.ZNodeClearer.clear(ZNodeClearer.java:160) at 
org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:138) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126) at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2779) 14/01/10 14:05:45 ERROR zookeeper.ZooKeeperWatcher: clean znode for master-0x1427a96bfea4a8a, quorum=zk1:2181, baseZNode=/hbase Received unexpected KeeperException, re-throwing exception org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master at org.apache.zookeeper.KeeperException.create(KeeperException.java:127) at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151) at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:337) at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataNoWatch(ZKUtil.java:777) at org.apache.hadoop.hbase.zookeeper.MasterAddressTracker.deleteIfEquals(MasterAddressTracker.java:170) at org.apache.hadoop.hbase.ZNodeClearer.clear(ZNodeClearer.java:160) at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:138)
[jira] [Commented] (HBASE-10308) TestZKProcedure.testMultiCohortWithMemberTimeoutDuringPrepare fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-10308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13868068#comment-13868068 ] Andrew Purtell commented on HBASE-10308: Updated fix versions since this happens AFAIK on all branches.
[jira] [Commented] (HBASE-10288) make mvcc an (optional) part of KV serialization
[ https://issues.apache.org/jira/browse/HBASE-10288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13868094#comment-13868094 ] Andrew Purtell commented on HBASE-10288: bq. It can be done using tags, but we may not want the overhead given that it will be in many KVs, so it might require HFileFormat vN+1 Only a minor version increment should be necessary. make mvcc an (optional) part of KV serialization Key: HBASE-10288 URL: https://issues.apache.org/jira/browse/HBASE-10288 Project: HBase Issue Type: Improvement Components: HFile Reporter: Sergey Shelukhin Priority: Minor This has been suggested in HBASE-10241. Mvcc can currently be serialized in HFile, but the mechanism is... magical. We might want to make it a part of proper serialization of the KV. It can be done using tags, but we may not want the overhead given that it will be in many KVs, so it might require HFileFormat vN+1. Regardless, the external mechanism would need to be removed while also preserving backward compat. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
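The optional-mvcc idea above can be sketched roughly as follows. This is a hedged illustration only, not the HFile format change itself; the class and method names are invented for this example, and a fixed-width long stands in for whatever encoding the real format would choose.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: serialize the mvcc (sequence id) as an optional
// trailing field of the KV bytes, gated by a writer-side flag, instead of
// the out-of-band mechanism described above. All names are illustrative.
public class KvMvccSketch {
    // Append the mvcc after the KV payload when the flag is set.
    static byte[] serialize(byte[] kvBytes, long mvcc, boolean includeMvcc) {
        if (!includeMvcc) {
            return kvBytes;
        }
        ByteBuffer buf = ByteBuffer.allocate(kvBytes.length + Long.BYTES);
        buf.put(kvBytes).putLong(mvcc);
        return buf.array();
    }

    // Read the trailing mvcc back; the caller must already know (e.g. from
    // an HFile minor version bump) whether the field is present.
    static long readMvcc(byte[] serialized) {
        return ByteBuffer.wrap(serialized, serialized.length - Long.BYTES, Long.BYTES).getLong();
    }
}
```

A minor version bump works here because readers that know the version can tell whether the trailing field exists, which matches Purtell's point that a full vN+1 should not be necessary.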
[jira] [Commented] (HBASE-10309) Add support to delete empty regions in 0.94.x series
[ https://issues.apache.org/jira/browse/HBASE-10309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13868085#comment-13868085 ] Andrew Purtell commented on HBASE-10309: Or you could create your keys based on a modulus of the timestamp? Could work if you can put an upper bound on the number of records stored within the retention period, call it N, then key = (timestamp mod N) ... Add support to delete empty regions in 0.94.x series Key: HBASE-10309 URL: https://issues.apache.org/jira/browse/HBASE-10309 Project: HBase Issue Type: New Feature Reporter: AcCud Fix For: 0.94.16 My use case: I have several tables where keys start with a timestamp. Because of this, combined with the fact that I have set a 15-day retention period, empty regions accumulate over time. I am sure that no writes will occur in these regions. It would be nice to have a tool to delete regions without having to stop the cluster. The easiest way for me is to have a tool that is able to delete all empty regions, but there wouldn't be any problem with specifying which region to delete. Something like: deleteRegion tableName region -- This message was sent by Atlassian JIRA (v6.1.5#6160)
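Purtell's modulus suggestion can be sketched like this. It is a hedged illustration; the value of N and the key shape are assumptions, not anything from the issue. With N an upper bound on records stored within the retention period, old slots are rewritten in place, so regions never empty out and never need deleting.

```java
// Hypothetical sketch of the key = (timestamp mod N) scheme suggested above.
// N and the key derivation here are invented for illustration.
public class ModKeySketch {
    static final long N = 1_000_000L; // assumed bound on records per retention period

    // Derive the row-key slot from the record timestamp.
    static long slot(long timestampMillis) {
        return timestampMillis % N;
    }
}
```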
[jira] [Commented] (HBASE-10304) Running an hbase job jar: IllegalAccessError: class com.google.protobuf.ZeroCopyLiteralByteString cannot access its superclass com.google.protobuf.LiteralByteString
[ https://issues.apache.org/jira/browse/HBASE-10304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13868095#comment-13868095 ] Jimmy Xiang commented on HBASE-10304: - I have verified the two workarounds to work fine. With the fat hbase jobjar, I can run row counter and get correct results. 1. run the job like HADOOP_CLASSPATH=/path/to/hbase_config:/path/to/hbase-protocol.jar hadoop jar fat-hbase-job.jar 2. put the hbase-protocol jar under hadoop/lib so that MR can pick it up, and run the job as before +1 on doc change as the fix for this issue. Running an hbase job jar: IllegalAccessError: class com.google.protobuf.ZeroCopyLiteralByteString cannot access its superclass com.google.protobuf.LiteralByteString Key: HBASE-10304 URL: https://issues.apache.org/jira/browse/HBASE-10304 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.98.0, 0.96.1.1 Reporter: stack Priority: Blocker Fix For: 0.98.0 Attachments: hbase-10304_not_tested.patch, jobjar.xml (Jimmy has been working on this one internally. I'm just the messenger raising this critical issue upstream). So, if you make job jar and bundle up hbase inside in it because you want to access hbase from your mapreduce task, the deploy of the job jar to the cluster fails with: {code} 14/01/05 08:59:19 INFO Configuration.deprecation: topology.node.switch.mapping.impl is deprecated. Instead, use net.topology.node.switch.mapping.impl 14/01/05 08:59:19 INFO Configuration.deprecation: io.bytes.per.checksum is deprecated. 
Instead, use dfs.bytes-per-checksum Exception in thread main java.lang.IllegalAccessError: class com.google.protobuf.ZeroCopyLiteralByteString cannot access its superclass com.google.protobuf.LiteralByteString at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(ClassLoader.java:792) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) at java.net.URLClassLoader.defineClass(URLClassLoader.java:449) at java.net.URLClassLoader.access$100(URLClassLoader.java:71) at java.net.URLClassLoader$1.run(URLClassLoader.java:361) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) at org.apache.hadoop.hbase.protobuf.ProtobufUtil.toScan(ProtobufUtil.java:818) at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.convertScanToString(TableMapReduceUtil.java:433) at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:186) at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:147) at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:270) at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:100) at com.ngdata.hbaseindexer.mr.HBaseMapReduceIndexerTool.run(HBaseMapReduceIndexerTool.java:124) at com.ngdata.hbaseindexer.mr.HBaseMapReduceIndexerTool.run(HBaseMapReduceIndexerTool.java:64) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at com.ngdata.hbaseindexer.mr.HBaseMapReduceIndexerTool.main(HBaseMapReduceIndexerTool.java:51) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) {code} So, ZCLBS is a hack. This class is in the hbase-protocol module. It is in the com.google.protobuf package. All is well and good usually. But when we make a job jar and bundle up hbase inside it, our 'trick' breaks. RunJar makes a new class loader to run the job jar. This URLClassLoader 'attaches' all the jars and classes that are in the jobjar so they can be found when it goes to do a lookup. Only, classloaders work by always delegating to their parent first (unless you are a WAR file in a container where delegation is 'off' for the most part), and in this case the parent classloader will have access to a pb jar since pb is in the hadoop CLASSPATH. So, the parent loads the pb
[jira] [Updated] (HBASE-10307) IntegrationTestIngestWithEncryption assumes localhost cluster
[ https://issues.apache.org/jira/browse/HBASE-10307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-10307: --- Attachment: 10307.patch What I committed to trunk and 0.98. It's the same patch as reviewed but with the addition of a main method. Confirmed to work on a small test cluster. IntegrationTestIngestWithEncryption assumes localhost cluster - Key: HBASE-10307 URL: https://issues.apache.org/jira/browse/HBASE-10307 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.99.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.98.0, 0.99.0 Attachments: 10307.patch, 10307.patch, 10307.patch We forgot to update IntegrationTestIngestWithEncryption to handle the distributed cluster case. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10200) Better error message when HttpServer fails to start due to java.net.BindException
[ https://issues.apache.org/jira/browse/HBASE-10200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-10200: --- Labels: noob (was: ) Better error message when HttpServer fails to start due to java.net.BindException - Key: HBASE-10200 URL: https://issues.apache.org/jira/browse/HBASE-10200 Project: HBase Issue Type: Task Reporter: Ted Yu Priority: Minor Labels: noob Starting HBase using Hoya, I saw the following in the log: {code} 2013-12-17 21:49:06,758 INFO [master:hor12n19:42587] http.HttpServer: HttpServer.start() threw a non Bind IOException java.net.BindException: Port in use: hor12n14.gq1.ygridcore.net:12432 at org.apache.hadoop.http.HttpServer.openListener(HttpServer.java:742) at org.apache.hadoop.http.HttpServer.start(HttpServer.java:686) at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:586) at java.lang.Thread.run(Thread.java:722) Caused by: java.net.BindException: Cannot assign requested address at sun.nio.ch.Net.bind0(Native Method) at sun.nio.ch.Net.bind(Net.java:344) at sun.nio.ch.Net.bind(Net.java:336) at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:199) at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) at org.mortbay.jetty.nio.SelectChannelConnector.open(SelectChannelConnector.java:216) at org.apache.hadoop.http.HttpServer.openListener(HttpServer.java:738) {code} This was due to hbase.master.info.bindAddress specifying a static address while Hoya allocates the master dynamically. A better error message should be provided: when bindAddress points to a host other than the local host, the message should remind the user to remove or adjust the hbase.master.info.bindAddress config param in hbase-site.xml -- This message was sent by Atlassian JIRA (v6.1.5#6160)
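The friendlier message the issue asks for might look something like this sketch. It is not the actual HBase patch; the class and method names are invented, and it only illustrates the check "is the configured address local?" that the improved message would hinge on.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.NetworkInterface;

// Hedged sketch of the requested behavior: on a bind failure, check whether
// the configured address resolves to a local interface and, if not, point
// the user at hbase.master.info.bindAddress. Not the actual HBase code.
public class BindHintSketch {
    static String hint(String configuredHost) {
        String extra = "";
        try {
            InetAddress addr = InetAddress.getByName(configuredHost);
            if (NetworkInterface.getByInetAddress(addr) == null) {
                extra = " (address is not local; remove or adjust"
                      + " hbase.master.info.bindAddress in hbase-site.xml)";
            }
        } catch (IOException e) {
            extra = " (address could not be resolved; check"
                  + " hbase.master.info.bindAddress in hbase-site.xml)";
        }
        return "Port in use or unassignable: " + configuredHost + extra;
    }
}
```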
[jira] [Resolved] (HBASE-10307) IntegrationTestIngestWithEncryption assumes localhost cluster
[ https://issues.apache.org/jira/browse/HBASE-10307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-10307. Resolution: Fixed Hadoop Flags: Reviewed IntegrationTestIngestWithEncryption assumes localhost cluster - Key: HBASE-10307 URL: https://issues.apache.org/jira/browse/HBASE-10307 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.99.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.98.0, 0.99.0 Attachments: 10307.patch, 10307.patch, 10307.patch We forgot to update IntegrationTestIngestWithEncryption to handle the distributed cluster case. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10294) Some synchronization on ServerManager#onlineServers can be removed
[ https://issues.apache.org/jira/browse/HBASE-10294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13868111#comment-13868111 ] Sergey Shelukhin commented on HBASE-10294: -- +1 Some synchronization on ServerManager#onlineServers can be removed -- Key: HBASE-10294 URL: https://issues.apache.org/jira/browse/HBASE-10294 Project: HBase Issue Type: Task Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: 10294-v1.txt ServerManager#onlineServers is a ConcurrentHashMap, yet some accesses to it are synchronized unnecessarily. Here is one example: {code} public Map<ServerName, ServerLoad> getOnlineServers() { // Presumption is that iterating the returned Map is OK. synchronized (this.onlineServers) { return Collections.unmodifiableMap(this.onlineServers); {code} Note: not all accesses to ServerManager#onlineServers are synchronized. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
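The change the patch proposes amounts to the following. This is a minimal sketch with placeholder key/value types standing in for ServerName/ServerLoad, not the actual ServerManager code: ConcurrentHashMap is already thread-safe for individual operations and its iterators are weakly consistent, so the synchronized block around the read-only view buys nothing.

```java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal sketch of the proposed change, with placeholder types standing in
// for ServerName/ServerLoad. ConcurrentHashMap needs no external locking and
// its weakly consistent iterators never throw ConcurrentModificationException.
public class OnlineServersSketch {
    private final Map<String, Integer> onlineServers = new ConcurrentHashMap<>();

    public void add(String server, int load) {
        onlineServers.put(server, load);
    }

    public Map<String, Integer> getOnlineServers() {
        // No synchronized (this.onlineServers) needed: callers may safely
        // iterate the returned read-only view while writers keep mutating.
        return Collections.unmodifiableMap(this.onlineServers);
    }
}
```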
[jira] [Commented] (HBASE-10304) Running an hbase job jar: IllegalAccessError: class com.google.protobuf.ZeroCopyLiteralByteString cannot access its superclass com.google.protobuf.LiteralByteString
[ https://issues.apache.org/jira/browse/HBASE-10304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13868114#comment-13868114 ] stack commented on HBASE-10304: --- Agree with [~jxiang] I could have a go at it, np, but [~ndimiduk], you have an opinion on where we should be going that I like (Deprecate fat job jar) and you are a better writer... do you want to do up a doc patch? Running an hbase job jar: IllegalAccessError: class com.google.protobuf.ZeroCopyLiteralByteString cannot access its superclass com.google.protobuf.LiteralByteString Key: HBASE-10304 URL: https://issues.apache.org/jira/browse/HBASE-10304 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.98.0, 0.96.1.1 Reporter: stack Priority: Blocker Fix For: 0.98.0 Attachments: hbase-10304_not_tested.patch, jobjar.xml (Jimmy has been working on this one internally. I'm just the messenger raising this critical issue upstream). So, if you make job jar and bundle up hbase inside in it because you want to access hbase from your mapreduce task, the deploy of the job jar to the cluster fails with: {code} 14/01/05 08:59:19 INFO Configuration.deprecation: topology.node.switch.mapping.impl is deprecated. Instead, use net.topology.node.switch.mapping.impl 14/01/05 08:59:19 INFO Configuration.deprecation: io.bytes.per.checksum is deprecated. 
Instead, use dfs.bytes-per-checksum Exception in thread main java.lang.IllegalAccessError: class com.google.protobuf.ZeroCopyLiteralByteString cannot access its superclass com.google.protobuf.LiteralByteString at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(ClassLoader.java:792) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) at java.net.URLClassLoader.defineClass(URLClassLoader.java:449) at java.net.URLClassLoader.access$100(URLClassLoader.java:71) at java.net.URLClassLoader$1.run(URLClassLoader.java:361) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) at org.apache.hadoop.hbase.protobuf.ProtobufUtil.toScan(ProtobufUtil.java:818) at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.convertScanToString(TableMapReduceUtil.java:433) at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:186) at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:147) at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:270) at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:100) at com.ngdata.hbaseindexer.mr.HBaseMapReduceIndexerTool.run(HBaseMapReduceIndexerTool.java:124) at com.ngdata.hbaseindexer.mr.HBaseMapReduceIndexerTool.run(HBaseMapReduceIndexerTool.java:64) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at com.ngdata.hbaseindexer.mr.HBaseMapReduceIndexerTool.main(HBaseMapReduceIndexerTool.java:51) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) {code} So, ZCLBS is a hack. This class is in the hbase-protocol module. It is in the com.google.protobuf package. All is well and good usually. But when we make a job jar and bundle up hbase inside it, our 'trick' breaks. RunJar makes a new class loader to run the job jar. This URLClassLoader 'attaches' all the jars and classes that are in the jobjar so they can be found when it goes to do a lookup. Only, classloaders work by always delegating to their parent first (unless you are a WAR file in a container where delegation is 'off' for the most part), and in this case the parent classloader will have access to a pb jar since pb is in the hadoop CLASSPATH. So, the parent loads the pb classes. We then load ZCLBS, only this is done in the classloader made by RunJar; ZCLBS has a different classloader from its superclass and we get the above IllegalAccessError.
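The parent-first delegation stack describes can be seen in a few lines. This is an illustration of standard classloader behavior, not HBase code; it shows why the parent's protobuf classes win over the copies bundled in the job jar, splitting ZCLBS from its superclass.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Illustration of parent-first delegation: a child URLClassLoader with no
// URLs of its own still resolves classes through its parent chain, so a
// class available to the parent is never defined by the child.
public class DelegationDemo {
    // Resolve a class through an empty child loader and return the loader
    // that actually defined it.
    static ClassLoader loaderOf(String className) {
        try (URLClassLoader child = new URLClassLoader(new URL[0],
                DelegationDemo.class.getClassLoader())) {
            return child.loadClass(className).getClassLoader();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // java.lang.String is defined by the bootstrap loader (reported as
        // null): the empty child delegated upward instead of defining it.
        System.out.println(loaderOf("java.lang.String"));
    }
}
```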
[jira] [Closed] (HBASE-10311) Add Scan object to preScannerNext and postScannerNext methods on RegionObserver
[ https://issues.apache.org/jira/browse/HBASE-10311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack closed HBASE-10311. - Closing at Mr [~neilf]'s suggestion. This seems like something we should doc in CP section of refguide given two of you fellas bumped into the prob. Let me try do that and point to this issue. Add Scan object to preScannerNext and postScannerNext methods on RegionObserver --- Key: HBASE-10311 URL: https://issues.apache.org/jira/browse/HBASE-10311 Project: HBase Issue Type: New Feature Components: Coprocessors Affects Versions: 0.96.1.1 Reporter: Neil Ferguson Attachments: HBASE-10311.patch I'd like to be able to access the org.apache.hadoop.hbase.client.Scan that was used to create a scanner in the RegionObserver.preScannerNext and RegionObserver.postScannerNext methods. The Scan object is available in the preScannerOpen method, but not in the preScannerNext or postScannerNext methods. The reason is that I'd like to access the attributes of the Scan object. I want to do some resource management in the coprocessor based on some attributes of the Scan object (like, who created it). Alternatively, does anybody know of another way to get hold of the Scan object in these methods without modifying things? -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10310) ZNodeCleaner.java KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master
[ https://issues.apache.org/jira/browse/HBASE-10310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13868126#comment-13868126 ] stack commented on HBASE-10310: --- +1 for 0.96. Bug fix. Thanks [~asamir] ZNodeCleaner.java KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master -- Key: HBASE-10310 URL: https://issues.apache.org/jira/browse/HBASE-10310 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.96.1.1 Environment: x86_64 GNU/Linux Reporter: Samir Ahmic Assignee: Samir Ahmic Attachments: HBASE-10310.patch I was testing hbase master clear command while working on [HBASE-7386] here is command and exception: {code} $ export HBASE_ZNODE_FILE=/tmp/hbase-hadoop-master.znode; ./hbase master clear 14/01/10 14:05:44 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=zk1:2181 sessionTimeout=9 watcher=clean znode for master, quorum=zk1:2181, baseZNode=/hbase 14/01/10 14:05:44 INFO zookeeper.RecoverableZooKeeper: Process identifier=clean znode for master connecting to ZooKeeper ensemble=zk1:2181 14/01/10 14:05:44 INFO zookeeper.ClientCnxn: Opening socket connection to server zk1/172.17.33.5:2181. 
Will not attempt to authenticate using SASL (Unable to locate a login configuration) 14/01/10 14:05:44 INFO zookeeper.ClientCnxn: Socket connection established to zk11/172.17.33.5:2181, initiating session 14/01/10 14:05:44 INFO zookeeper.ClientCnxn: Session establishment complete on server zk1/172.17.33.5:2181, sessionid = 0x1427a96bfea4a8a, negotiated timeout = 4 14/01/10 14:05:44 INFO zookeeper.ZooKeeper: Session: 0x1427a96bfea4a8a closed 14/01/10 14:05:44 INFO zookeeper.ClientCnxn: EventThread shut down 14/01/10 14:05:44 WARN zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=zk1:2181, exception=org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master 14/01/10 14:05:44 INFO util.RetryCounter: Sleeping 1000ms before retry #0... 14/01/10 14:05:45 WARN zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=zk1:2181, exception=org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master 14/01/10 14:05:45 ERROR zookeeper.RecoverableZooKeeper: ZooKeeper getData failed after 1 attempts 14/01/10 14:05:45 WARN zookeeper.ZKUtil: clean znode for master-0x1427a96bfea4a8a, quorum=zk1:2181, baseZNode=/hbase Unable to get data of znode /hbase/master org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master at org.apache.zookeeper.KeeperException.create(KeeperException.java:127) at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151) at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:337) at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataNoWatch(ZKUtil.java:777) at org.apache.hadoop.hbase.zookeeper.MasterAddressTracker.deleteIfEquals(MasterAddressTracker.java:170) at org.apache.hadoop.hbase.ZNodeClearer.clear(ZNodeClearer.java:160) at 
org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:138) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126) at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2779) 14/01/10 14:05:45 ERROR zookeeper.ZooKeeperWatcher: clean znode for master-0x1427a96bfea4a8a, quorum=zk1:2181, baseZNode=/hbase Received unexpected KeeperException, re-throwing exception org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master at org.apache.zookeeper.KeeperException.create(KeeperException.java:127) at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151) at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:337) at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataNoWatch(ZKUtil.java:777) at org.apache.hadoop.hbase.zookeeper.MasterAddressTracker.deleteIfEquals(MasterAddressTracker.java:170) at org.apache.hadoop.hbase.ZNodeClearer.clear(ZNodeClearer.java:160) at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:138) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at
[jira] [Updated] (HBASE-9914) Port fix for HBASE-9836 'Intermittent TestRegionObserverScannerOpenHook#testRegionObserverCompactionTimeStacking failure' to 0.94
[ https://issues.apache.org/jira/browse/HBASE-9914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-9914: -- Labels: noob (was: ) Port fix for HBASE-9836 'Intermittent TestRegionObserverScannerOpenHook#testRegionObserverCompactionTimeStacking failure' to 0.94 - Key: HBASE-9914 URL: https://issues.apache.org/jira/browse/HBASE-9914 Project: HBase Issue Type: Test Reporter: Ted Yu Labels: noob According to this thread: http://search-hadoop.com/m/3CzC31BQsDd , TestRegionObserverScannerOpenHook#testRegionObserverCompactionTimeStacking sometimes failed. This issue is to port the fix from HBASE-9836 to 0.94 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Reopened] (HBASE-10292) TestRegionServerCoprocessorExceptionWithAbort fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell reopened HBASE-10292: Still seeing this occasionally TestRegionServerCoprocessorExceptionWithAbort fails occasionally Key: HBASE-10292 URL: https://issues.apache.org/jira/browse/HBASE-10292 Project: HBase Issue Type: Bug Affects Versions: 0.98.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.98.0, 0.99.0 Attachments: 10292.patch, 10292.patch TestRegionServerCoprocessorExceptionWithAbort has occasionally failed for a very long time now. Fix or disable. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10309) Add support to delete empty regions in 0.94.x series
[ https://issues.apache.org/jira/browse/HBASE-10309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13868136#comment-13868136 ] stack commented on HBASE-10309: --- Removing a single empty region is not possible without the region merge facility. If there are multiple adjacent empty regions, you could replace them all with a single empty region that spans the deleted regions easily enough. Add support to delete empty regions in 0.94.x series Key: HBASE-10309 URL: https://issues.apache.org/jira/browse/HBASE-10309 Project: HBase Issue Type: New Feature Reporter: AcCud Fix For: 0.94.16 My use case: I have several tables where keys start with a timestamp. Because of this, combined with the fact that I have set a 15-day retention period, empty regions accumulate over time. I am sure that no writes will occur in these regions. It would be nice to have a tool to delete regions without having to stop the cluster. The easiest way for me is to have a tool that is able to delete all empty regions, but there wouldn't be any problem with specifying which region to delete. Something like: deleteRegion tableName region -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10305) Batch update performance drops as the number of regions grows
[ https://issues.apache.org/jira/browse/HBASE-10305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13868140#comment-13868140 ] stack commented on HBASE-10305: --- Does HBASE-8755 help? It batches up the sync invocations, making it so each filesystem sync satisfies more than just the one Handler API sync invocation. You'll get to a sync cadence that should be roughly independent of the number of times the Handler calls sync. Batch update performance drops as the number of regions grows - Key: HBASE-10305 URL: https://issues.apache.org/jira/browse/HBASE-10305 Project: HBase Issue Type: Bug Components: Performance Reporter: Chao Shi In our use case, we use a small number (~5) of proxy programs that read from a queue and batch update to HBase. Our program is multi-threaded and the HBase client will batch mutations to each RS. We found we're getting lower TPS when there are more regions. I think the reason is that the RS syncs the HLog for each region. Suppose there is a single region: the batch update will only touch one region and therefore syncs the HLog once. Now suppose there are 10 regions per server: in RS#multi() it has to process the update for each individual region and sync the HLog 10 times. Please note that in our scenario, batched mutations are usually independent of each other and touch a varying number of regions. We are using the 0.94 series, but I think the trunk should have the same problem after a quick look into the code. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
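The batching stack refers to can be sketched as follows. This is a simplified single-threaded model of the group-commit idea, not the HBASE-8755 implementation: handlers park a future on a pending list, and one real filesystem sync completes every future that accumulated, so the sync cadence is decoupled from how many handlers asked.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Simplified model of group commit: many sync requests are coalesced so
// that a single real sync satisfies all of them. Names are illustrative.
public class GroupCommitSketch {
    private final List<CompletableFuture<Void>> pending = new ArrayList<>();
    private int realSyncs = 0;

    // A handler asks for a sync and gets a future to wait on.
    public synchronized CompletableFuture<Void> requestSync() {
        CompletableFuture<Void> f = new CompletableFuture<>();
        pending.add(f);
        return f;
    }

    // The syncer performs one real sync and releases every waiter.
    public synchronized void syncOnce() {
        realSyncs++; // stand-in for one actual HDFS sync
        for (CompletableFuture<Void> f : pending) {
            f.complete(null);
        }
        pending.clear();
    }

    public synchronized int realSyncs() {
        return realSyncs;
    }
}
```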
[jira] [Commented] (HBASE-10294) Some synchronization on ServerManager#onlineServers can be removed
[ https://issues.apache.org/jira/browse/HBASE-10294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13868143#comment-13868143 ] Hadoop QA commented on HBASE-10294: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622406/10294-v1.txt against trunk revision . ATTACHMENT ID: 12622406 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8385//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8385//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8385//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8385//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8385//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8385//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8385//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8385//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8385//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8385//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8385//console This message is automatically generated. Some synchronization on ServerManager#onlineServers can be removed -- Key: HBASE-10294 URL: https://issues.apache.org/jira/browse/HBASE-10294 Project: HBase Issue Type: Task Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: 10294-v1.txt ServerManager#onlineServers is a ConcurrentHashMap Yet I found that some accesses to it are synchronized and unnecessary. 
Here is one example: {code} public Map<ServerName, ServerLoad> getOnlineServers() { // Presumption is that iterating the returned Map is OK. synchronized (this.onlineServers) { return Collections.unmodifiableMap(this.onlineServers); {code} Note: not all accesses to ServerManager#onlineServers are synchronized. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HBASE-10312) Flooding the cluster with administrative actions leads to collapse
Andrew Purtell created HBASE-10312: -- Summary: Flooding the cluster with administrative actions leads to collapse Key: HBASE-10312 URL: https://issues.apache.org/jira/browse/HBASE-10312 Project: HBase Issue Type: Bug Reporter: Andrew Purtell Steps to reproduce: 1. Start a cluster. 2. Start an ingest process. 3. In the HBase shell, do this: {noformat} while true ; do flush 'table' end {noformat} We should reject abuse via administrative requests like this. What happens on the cluster is the requests back up, leading to lots of these: {noformat} 2014-01-10 18:55:55,293 WARN [Priority.RpcServer.handler=2,port=8120] monitoring.TaskMonitor: Too many actions in action monitor! Purging some. {noformat} At this point we could lower a gate on further requests for actions until the backlog clears. Continuing, all of the regionservers will eventually die with a StackOverflowError of unknown origin because, stack overflow: {noformat} 2014-01-10 19:02:02,783 ERROR [Priority.RpcServer.handler=3,port=8120] ipc.RpcServer: Unexpected throwable object java.lang.StackOverflowError at java.util.ArrayList$SubList.add(ArrayList.java:965) [...] {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
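The "lower a gate" idea might look like this in miniature. It is a hedged sketch; the threshold, class, and method names are invented, and it only illustrates admitting administrative actions while the in-flight backlog is under a limit and rejecting the rest, rather than letting the TaskMonitor backlog grow until the server collapses.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of gating administrative actions behind a backlog
// limit. All names are illustrative, not HBase API.
public class AdminGateSketch {
    private final AtomicInteger inFlight = new AtomicInteger();
    private final int limit;

    public AdminGateSketch(int limit) {
        this.limit = limit;
    }

    // Admit an action only while the backlog is under the limit.
    public boolean tryAdmit() {
        if (inFlight.incrementAndGet() > limit) {
            inFlight.decrementAndGet(); // over the gate: reject
            return false;
        }
        return true;
    }

    // Called when an admitted action finishes, freeing a slot.
    public void done() {
        inFlight.decrementAndGet();
    }
}
```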
[jira] [Created] (HBASE-10313) Duplicate servlet-api jars in hbase 0.96.0
stack created HBASE-10313: - Summary: Duplicate servlet-api jars in hbase 0.96.0 Key: HBASE-10313 URL: https://issues.apache.org/jira/browse/HBASE-10313 Project: HBase Issue Type: Bug Reporter: stack Priority: Critical Fix For: 0.96.2 On mailing list, http://search-hadoop.com/m/wtCkHs5Ujq, [~jerryhe] reports we have doubled jars: {code} [biadmin@hdtest009 lib]$ ls -l jsp-api* -rw-rw-r-- 1 biadmin biadmin 134910 Sep 17 01:13 jsp-api-2.1-6.1.14.jar -rw-rw-r-- 1 biadmin biadmin 100636 Sep 17 01:27 jsp-api-2.1.jar [biadmin@hdtest009 lib]$ ls -l servlet-api* -rw-rw-r-- 1 biadmin biadmin 132368 Sep 17 01:13 servlet-api-2.5-6.1.14.jar -rw-rw-r-- 1 biadmin biadmin 105112 Sep 17 01:12 servlet-api-2.5.jar {code} Fix in 0.96.2. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
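The doubled jars above differ only in a version suffix (e.g. `servlet-api-2.5.jar` vs `servlet-api-2.5-6.1.14.jar`). A rough standalone check for such duplicates, grouping jar names by the artifact prefix before the first version digit (a heuristic of my own, not an HBase tool; it can mis-split artifacts whose names contain digits):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Heuristic duplicate-jar detector: strip everything from the first
// "-<digit>" onward to get an artifact name, then flag artifacts that
// appear under more than one file name.
class DupJarCheck {
    static Map<String, List<String>> findDuplicates(List<String> jarNames) {
        Map<String, List<String>> byArtifact = new TreeMap<>();
        for (String name : jarNames) {
            String artifact = name.replaceFirst("-[0-9].*$", "");
            byArtifact.computeIfAbsent(artifact, k -> new ArrayList<>()).add(name);
        }
        // Keep only artifacts present more than once.
        byArtifact.values().removeIf(v -> v.size() < 2);
        return byArtifact;
    }
}
```

Run against the `lib/` listing from the report, this flags both `jsp-api` and `servlet-api` as doubled.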
[jira] [Commented] (HBASE-10313) Duplicate servlet-api jars in hbase 0.96.0
[ https://issues.apache.org/jira/browse/HBASE-10313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13868190#comment-13868190 ] stack commented on HBASE-10313: --- stax-api also came up recently in offline discussion. Duplicate servlet-api jars in hbase 0.96.0 -- Key: HBASE-10313 URL: https://issues.apache.org/jira/browse/HBASE-10313 Project: HBase Issue Type: Bug Reporter: stack Priority: Critical Fix For: 0.96.2 On mailing list, http://search-hadoop.com/m/wtCkHs5Ujq, [~jerryhe] reports we have doubled jars: {code} [biadmin@hdtest009 lib]$ ls -l jsp-api* -rw-rw-r-- 1 biadmin biadmin 134910 Sep 17 01:13 jsp-api-2.1-6.1.14.jar -rw-rw-r-- 1 biadmin biadmin 100636 Sep 17 01:27 jsp-api-2.1.jar [biadmin@hdtest009 lib]$ ls -l servlet-api* -rw-rw-r-- 1 biadmin biadmin 132368 Sep 17 01:13 servlet-api-2.5-6.1.14.jar -rw-rw-r-- 1 biadmin biadmin 105112 Sep 17 01:12 servlet-api-2.5.jar {code} Fix in 0.96.2. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10312) Flooding the cluster with administrative actions leads to collapse
[ https://issues.apache.org/jira/browse/HBASE-10312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13868196#comment-13868196 ] Andrew Purtell commented on HBASE-10312: With the AccessController active only users granted ADMIN privilege can do this, so it's not a critical issue unless enabling security is not an option for the deployment. Flooding the cluster with administrative actions leads to collapse -- Key: HBASE-10312 URL: https://issues.apache.org/jira/browse/HBASE-10312 Project: HBase Issue Type: Bug Reporter: Andrew Purtell Steps to reproduce: 1. Start a cluster. 2. Start an ingest process. 3. In the HBase shell, do this: {noformat} while true ; do flush 'table' end {noformat} We should reject abuse via administrative requests like this. What happens on the cluster is the requests back up, leading to lots of these: {noformat} 2014-01-10 18:55:55,293 WARN [Priority.RpcServer.handler=2,port=8120] monitoring.TaskMonitor: Too many actions in action monitor! Purging some. {noformat} At this point we could lower a gate on further requests for actions until the backlog clears. Continuing, all of the regionservers will eventually die with a StackOverflowError of unknown origin because, stack overflow: {noformat} 2014-01-10 19:02:02,783 ERROR [Priority.RpcServer.handler=3,port=8120] ipc.RpcServer: Unexpected throwable object java.lang.StackOverflowError at java.util.ArrayList$SubList.add(ArrayList.java:965) [...] {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
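The "lower a gate on further requests" idea from the description could look something like the following standalone sketch: once in-flight administrative actions cross a threshold, new ones are rejected outright rather than queued until the backlog clears. The class and method names here are purely illustrative, not HBase APIs:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical admission gate for administrative actions: bounded count of
// in-flight actions, with rejection (not queueing) once the bound is hit.
class AdminActionGate {
    private final int maxPending;
    private final AtomicInteger pending = new AtomicInteger();

    AdminActionGate(int maxPending) {
        this.maxPending = maxPending;
    }

    // Returns false when the backlog is full; the caller would then reply
    // with a "server busy" style error instead of accepting the action.
    boolean tryEnter() {
        while (true) {
            int cur = pending.get();
            if (cur >= maxPending) {
                return false;
            }
            if (pending.compareAndSet(cur, cur + 1)) {
                return true;
            }
        }
    }

    // Called when an action completes, reopening the gate.
    void exit() {
        pending.decrementAndGet();
    }
}
```

A shell loop like the one in the reproduction would then get fast rejections instead of piling requests up until the TaskMonitor purges and handlers die.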
[jira] [Updated] (HBASE-10303) Have snappy support properly documented would be helpful to hadoop and hbase users
[ https://issues.apache.org/jira/browse/HBASE-10303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-10303: -- Priority: Blocker (was: Major) Fix Version/s: 0.96.2 Making this a blocker against 0.96.2. If anyone has anything to add to Rural's notes, it'd be appreciated. ([~jmspaggi] Do we need to integrate your page into the refguide?) Have snappy support properly documented would be helpful to hadoop and hbase users -- Key: HBASE-10303 URL: https://issues.apache.org/jira/browse/HBASE-10303 Project: HBase Issue Type: Task Components: documentation Reporter: Rural Hunter Priority: Blocker Fix For: 0.96.2 The current document for configuring snappy support (http://hbase.apache.org/book/snappy.compression.html) is not complete and it's a bit obscure. IMO, there are several improvements that can be made: 1. Describe the relationship among hadoop, hbase, and snappy. Is snappy actually needed by hadoop hdfs or by hbase itself? That would make clear whether you need to configure snappy support in hbase or in hadoop. 2. It doesn't mention that the default hadoop binary package is compiled without snappy support and that you need to compile it with the snappy option manually. Actually it doesn't work with any native libs on a 64-bit OS, as the libhadoop.so in the binary package is only for 32-bit OS (this of course is a hadoop issue, not hbase, but it's good to mention it). 3. In my experience, I actually needed to install both snappy and hadoop-snappy. So the doc lacks the steps to install hadoop-snappy. 4. During my setup, I found a difference in where hadoop and hbase pick up the native lib files: hadoop picks up those files in ./lib while hbase picks them up in ./lib/[PLATFORM]. If that's correct, it can also be mentioned. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HBASE-10314) Add Chaos Monkey that doesn't touch the master
Elliott Clark created HBASE-10314: - Summary: Add Chaos Monkey that doesn't touch the master Key: HBASE-10314 URL: https://issues.apache.org/jira/browse/HBASE-10314 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.96.1.1, 0.98.0, 0.99.0 Reporter: Elliott Clark Assignee: Elliott Clark -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-7386) Investigate providing some supervisor support for znode deletion
[ https://issues.apache.org/jira/browse/HBASE-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13868220#comment-13868220 ] stack commented on HBASE-7386: -- Related, from our boys at Xiaomi https://github.com/XiaoMi/minos Investigate providing some supervisor support for znode deletion Key: HBASE-7386 URL: https://issues.apache.org/jira/browse/HBASE-7386 Project: HBase Issue Type: Task Components: master, regionserver, scripts Reporter: Gregory Chanan Assignee: stack Priority: Blocker Attachments: HBASE-7386-bin-v2.patch, HBASE-7386-bin.patch, HBASE-7386-conf-v2.patch, HBASE-7386-conf.patch, HBASE-7386-src.patch, HBASE-7386-v0.patch, supervisordconfigs-v0.patch There are a couple of JIRAs for deleting the znode on a process failure: HBASE-5844 (RS) and HBASE-5926 (Master), which are pretty neat; on process failure, they delete the znode of the underlying process so HBase can recover faster. These JIRAs were implemented via the startup scripts; i.e. the script hangs around and waits for the process to exit, then deletes the znode. 
There are a few problems associated with this approach, as listed in the below JIRAs: 1) Hides startup output in script https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463401page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463401 2) two hbase processes listed per launched daemon https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463409page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463409 3) Not run by a real supervisor https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463409page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463409 4) Weird output after kill -9 actual process in standalone mode https://issues.apache.org/jira/browse/HBASE-5926?focusedCommentId=13506801page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13506801 5) Can kill existing RS if called again https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463401page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463401 6) Hides stdout/stderr[6] https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13506832page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13506832 I suspect running it via something like supervisord can solve these issues if we provide the right support. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10308) TestZKProcedure.testMultiCohortWithMemberTimeoutDuringPrepare fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-10308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13868218#comment-13868218 ] Lars Hofhansl commented on HBASE-10308: --- Sorry I missed the earlier issue. Do you have any hunches about what the problem might be? TestZKProcedure.testMultiCohortWithMemberTimeoutDuringPrepare fails occasionally Key: HBASE-10308 URL: https://issues.apache.org/jira/browse/HBASE-10308 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Fix For: 0.94.16, 0.96.2, 0.98.1, 0.99.0 Seen in 0.94 (both JDK6 and JDK7 builds) {code} Error Message Wanted but not invoked: procedure.sendGlobalBarrierComplete(); - at org.apache.hadoop.hbase.procedure.TestZKProcedure.waitAndVerifyProc(TestZKProcedure.java:344) However, there were other interactions with this mock: - at org.apache.hadoop.hbase.procedure.TestZKProcedure.testMultiCohortWithMemberTimeoutDuringPrepare(TestZKProcedure.java:306) - at org.apache.hadoop.hbase.procedure.TestZKProcedure.testMultiCohortWithMemberTimeoutDuringPrepare(TestZKProcedure.java:311) - at org.apache.hadoop.hbase.procedure.Procedure.call(Procedure.java:204) - at org.apache.hadoop.hbase.procedure.Procedure.call(Procedure.java:204) - at org.apache.hadoop.hbase.procedure.ProcedureCoordinator.memberAcquiredBarrier(ProcedureCoordinator.java:262) - at org.apache.hadoop.hbase.procedure.ProcedureCoordinator.memberAcquiredBarrier(ProcedureCoordinator.java:262) - at org.apache.hadoop.hbase.procedure.ProcedureCoordinator.abortProcedure(ProcedureCoordinator.java:217) - at org.apache.hadoop.hbase.procedure.TestZKProcedure.waitAndVerifyProc(TestZKProcedure.java:337) Stacktrace Wanted but not invoked: procedure.sendGlobalBarrierComplete(); - at org.apache.hadoop.hbase.procedure.TestZKProcedure.waitAndVerifyProc(TestZKProcedure.java:344) However, there were other interactions with this mock: - at 
org.apache.hadoop.hbase.procedure.TestZKProcedure.testMultiCohortWithMemberTimeoutDuringPrepare(TestZKProcedure.java:306) - at org.apache.hadoop.hbase.procedure.TestZKProcedure.testMultiCohortWithMemberTimeoutDuringPrepare(TestZKProcedure.java:311) - at org.apache.hadoop.hbase.procedure.Procedure.call(Procedure.java:204) - at org.apache.hadoop.hbase.procedure.Procedure.call(Procedure.java:204) - at org.apache.hadoop.hbase.procedure.ProcedureCoordinator.memberAcquiredBarrier(ProcedureCoordinator.java:262) - at org.apache.hadoop.hbase.procedure.ProcedureCoordinator.memberAcquiredBarrier(ProcedureCoordinator.java:262) - at org.apache.hadoop.hbase.procedure.ProcedureCoordinator.abortProcedure(ProcedureCoordinator.java:217) - at org.apache.hadoop.hbase.procedure.TestZKProcedure.waitAndVerifyProc(TestZKProcedure.java:337) at org.apache.hadoop.hbase.procedure.TestZKProcedure.waitAndVerifyProc(TestZKProcedure.java:344) at org.apache.hadoop.hbase.procedure.TestZKProcedure.testMultiCohortWithMemberTimeoutDuringPrepare(TestZKProcedure.java:319) {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10312) Flooding the cluster with administrative actions leads to collapse
[ https://issues.apache.org/jira/browse/HBASE-10312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-10312: --- Description: Steps to reproduce: 1. Start a cluster. 2. Start an ingest process. 3. In the HBase shell, do this: {noformat} while true do flush 'table' end {noformat} We should reject abuse via administrative requests like this. What happens on the cluster is the requests back up, leading to lots of these: {noformat} 2014-01-10 18:55:55,293 WARN [Priority.RpcServer.handler=2,port=8120] monitoring.TaskMonitor: Too many actions in action monitor! Purging some. {noformat} At this point we could lower a gate on further requests for actions until the backlog clears. Continuing, all of the regionservers will eventually die with a StackOverflowError of unknown origin because, stack overflow: {noformat} 2014-01-10 19:02:02,783 ERROR [Priority.RpcServer.handler=3,port=8120] ipc.RpcServer: Unexpected throwable object java.lang.StackOverflowError at java.util.ArrayList$SubList.add(ArrayList.java:965) [...] {noformat} was: Steps to reproduce: 1. Start a cluster. 2. Start an ingest process. 3. In the HBase shell, do this: {noformat} while true ; do flush 'table' end {noformat} We should reject abuse via administrative requests like this. What happens on the cluster is the requests back up, leading to lots of these: {noformat} 2014-01-10 18:55:55,293 WARN [Priority.RpcServer.handler=2,port=8120] monitoring.TaskMonitor: Too many actions in action monitor! Purging some. {noformat} At this point we could lower a gate on further requests for actions until the backlog clears. Continuing, all of the regionservers will eventually die with a StackOverflowError of unknown origin because, stack overflow: {noformat} 2014-01-10 19:02:02,783 ERROR [Priority.RpcServer.handler=3,port=8120] ipc.RpcServer: Unexpected throwable object java.lang.StackOverflowError at java.util.ArrayList$SubList.add(ArrayList.java:965) [...] 
{noformat} Flooding the cluster with administrative actions leads to collapse -- Key: HBASE-10312 URL: https://issues.apache.org/jira/browse/HBASE-10312 Project: HBase Issue Type: Bug Reporter: Andrew Purtell Steps to reproduce: 1. Start a cluster. 2. Start an ingest process. 3. In the HBase shell, do this: {noformat} while true do flush 'table' end {noformat} We should reject abuse via administrative requests like this. What happens on the cluster is the requests back up, leading to lots of these: {noformat} 2014-01-10 18:55:55,293 WARN [Priority.RpcServer.handler=2,port=8120] monitoring.TaskMonitor: Too many actions in action monitor! Purging some. {noformat} At this point we could lower a gate on further requests for actions until the backlog clears. Continuing, all of the regionservers will eventually die with a StackOverflowError of unknown origin because, stack overflow: {noformat} 2014-01-10 19:02:02,783 ERROR [Priority.RpcServer.handler=3,port=8120] ipc.RpcServer: Unexpected throwable object java.lang.StackOverflowError at java.util.ArrayList$SubList.add(ArrayList.java:965) [...] {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10309) Add support to delete empty regions in 0.94.x series
[ https://issues.apache.org/jira/browse/HBASE-10309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13868224#comment-13868224 ] Lars Hofhansl commented on HBASE-10309: --- Is it not simpler to remove an empty region? The equivalent of removing the directory from HDFS and fixing META the way HBCK does? This part of the code is not my area of expertise so I might be way off, but it seems it should be easier than actually merging regions with data. Add support to delete empty regions in 0.94.x series Key: HBASE-10309 URL: https://issues.apache.org/jira/browse/HBASE-10309 Project: HBase Issue Type: New Feature Reporter: AcCud Fix For: 0.94.17 My use case: I have several tables where keys start with a timestamp. Because of this and combined with the fact that I have set a 15 days retention period, after a period of time results empty regions. I am sure that no write will occur in these region. It would be nice to have a tool to delete regions without being necessary to stop the cluster. The easiest way for me is to have a tool that is able to delete all empty regions, but there wouldn't be any problem to specify which region to delete. Something like: deleteRegion tableName region -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10308) TestZKProcedure.testMultiCohortWithMemberTimeoutDuringPrepare fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-10308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13868221#comment-13868221 ] Andrew Purtell commented on HBASE-10308: No I haven't looked into it. TestZKProcedure.testMultiCohortWithMemberTimeoutDuringPrepare fails occasionally Key: HBASE-10308 URL: https://issues.apache.org/jira/browse/HBASE-10308 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Fix For: 0.94.16, 0.96.2, 0.98.1, 0.99.0 Seen in 0.94 (both JDK6 and JDK7 builds) {code} Error Message Wanted but not invoked: procedure.sendGlobalBarrierComplete(); - at org.apache.hadoop.hbase.procedure.TestZKProcedure.waitAndVerifyProc(TestZKProcedure.java:344) However, there were other interactions with this mock: - at org.apache.hadoop.hbase.procedure.TestZKProcedure.testMultiCohortWithMemberTimeoutDuringPrepare(TestZKProcedure.java:306) - at org.apache.hadoop.hbase.procedure.TestZKProcedure.testMultiCohortWithMemberTimeoutDuringPrepare(TestZKProcedure.java:311) - at org.apache.hadoop.hbase.procedure.Procedure.call(Procedure.java:204) - at org.apache.hadoop.hbase.procedure.Procedure.call(Procedure.java:204) - at org.apache.hadoop.hbase.procedure.ProcedureCoordinator.memberAcquiredBarrier(ProcedureCoordinator.java:262) - at org.apache.hadoop.hbase.procedure.ProcedureCoordinator.memberAcquiredBarrier(ProcedureCoordinator.java:262) - at org.apache.hadoop.hbase.procedure.ProcedureCoordinator.abortProcedure(ProcedureCoordinator.java:217) - at org.apache.hadoop.hbase.procedure.TestZKProcedure.waitAndVerifyProc(TestZKProcedure.java:337) Stacktrace Wanted but not invoked: procedure.sendGlobalBarrierComplete(); - at org.apache.hadoop.hbase.procedure.TestZKProcedure.waitAndVerifyProc(TestZKProcedure.java:344) However, there were other interactions with this mock: - at org.apache.hadoop.hbase.procedure.TestZKProcedure.testMultiCohortWithMemberTimeoutDuringPrepare(TestZKProcedure.java:306) - at 
org.apache.hadoop.hbase.procedure.TestZKProcedure.testMultiCohortWithMemberTimeoutDuringPrepare(TestZKProcedure.java:311) - at org.apache.hadoop.hbase.procedure.Procedure.call(Procedure.java:204) - at org.apache.hadoop.hbase.procedure.Procedure.call(Procedure.java:204) - at org.apache.hadoop.hbase.procedure.ProcedureCoordinator.memberAcquiredBarrier(ProcedureCoordinator.java:262) - at org.apache.hadoop.hbase.procedure.ProcedureCoordinator.memberAcquiredBarrier(ProcedureCoordinator.java:262) - at org.apache.hadoop.hbase.procedure.ProcedureCoordinator.abortProcedure(ProcedureCoordinator.java:217) - at org.apache.hadoop.hbase.procedure.TestZKProcedure.waitAndVerifyProc(TestZKProcedure.java:337) at org.apache.hadoop.hbase.procedure.TestZKProcedure.waitAndVerifyProc(TestZKProcedure.java:344) at org.apache.hadoop.hbase.procedure.TestZKProcedure.testMultiCohortWithMemberTimeoutDuringPrepare(TestZKProcedure.java:319) {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10309) Add support to delete empty regions in 0.94.x series
[ https://issues.apache.org/jira/browse/HBASE-10309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-10309: -- Fix Version/s: (was: 0.94.16) 0.94.17 Add support to delete empty regions in 0.94.x series Key: HBASE-10309 URL: https://issues.apache.org/jira/browse/HBASE-10309 Project: HBase Issue Type: New Feature Reporter: AcCud Fix For: 0.94.17 My use case: I have several tables where keys start with a timestamp. Because of this and combined with the fact that I have set a 15 days retention period, after a period of time results empty regions. I am sure that no write will occur in these region. It would be nice to have a tool to delete regions without being necessary to stop the cluster. The easiest way for me is to have a tool that is able to delete all empty regions, but there wouldn't be any problem to specify which region to delete. Something like: deleteRegion tableName region -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10308) TestZKProcedure.testMultiCohortWithMemberTimeoutDuringPrepare fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-10308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-10308: -- Fix Version/s: (was: 0.94.16) 0.94.17 TestZKProcedure.testMultiCohortWithMemberTimeoutDuringPrepare fails occasionally Key: HBASE-10308 URL: https://issues.apache.org/jira/browse/HBASE-10308 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Fix For: 0.96.2, 0.98.1, 0.99.0, 0.94.17 Seen in 0.94 (both JDK6 and JDK7 builds) {code} Error Message Wanted but not invoked: procedure.sendGlobalBarrierComplete(); - at org.apache.hadoop.hbase.procedure.TestZKProcedure.waitAndVerifyProc(TestZKProcedure.java:344) However, there were other interactions with this mock: - at org.apache.hadoop.hbase.procedure.TestZKProcedure.testMultiCohortWithMemberTimeoutDuringPrepare(TestZKProcedure.java:306) - at org.apache.hadoop.hbase.procedure.TestZKProcedure.testMultiCohortWithMemberTimeoutDuringPrepare(TestZKProcedure.java:311) - at org.apache.hadoop.hbase.procedure.Procedure.call(Procedure.java:204) - at org.apache.hadoop.hbase.procedure.Procedure.call(Procedure.java:204) - at org.apache.hadoop.hbase.procedure.ProcedureCoordinator.memberAcquiredBarrier(ProcedureCoordinator.java:262) - at org.apache.hadoop.hbase.procedure.ProcedureCoordinator.memberAcquiredBarrier(ProcedureCoordinator.java:262) - at org.apache.hadoop.hbase.procedure.ProcedureCoordinator.abortProcedure(ProcedureCoordinator.java:217) - at org.apache.hadoop.hbase.procedure.TestZKProcedure.waitAndVerifyProc(TestZKProcedure.java:337) Stacktrace Wanted but not invoked: procedure.sendGlobalBarrierComplete(); - at org.apache.hadoop.hbase.procedure.TestZKProcedure.waitAndVerifyProc(TestZKProcedure.java:344) However, there were other interactions with this mock: - at org.apache.hadoop.hbase.procedure.TestZKProcedure.testMultiCohortWithMemberTimeoutDuringPrepare(TestZKProcedure.java:306) - at 
org.apache.hadoop.hbase.procedure.TestZKProcedure.testMultiCohortWithMemberTimeoutDuringPrepare(TestZKProcedure.java:311) - at org.apache.hadoop.hbase.procedure.Procedure.call(Procedure.java:204) - at org.apache.hadoop.hbase.procedure.Procedure.call(Procedure.java:204) - at org.apache.hadoop.hbase.procedure.ProcedureCoordinator.memberAcquiredBarrier(ProcedureCoordinator.java:262) - at org.apache.hadoop.hbase.procedure.ProcedureCoordinator.memberAcquiredBarrier(ProcedureCoordinator.java:262) - at org.apache.hadoop.hbase.procedure.ProcedureCoordinator.abortProcedure(ProcedureCoordinator.java:217) - at org.apache.hadoop.hbase.procedure.TestZKProcedure.waitAndVerifyProc(TestZKProcedure.java:337) at org.apache.hadoop.hbase.procedure.TestZKProcedure.waitAndVerifyProc(TestZKProcedure.java:344) at org.apache.hadoop.hbase.procedure.TestZKProcedure.testMultiCohortWithMemberTimeoutDuringPrepare(TestZKProcedure.java:319) {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-9005) Improve documentation around KEEP_DELETED_CELLS, time range scans, and delete markers
[ https://issues.apache.org/jira/browse/HBASE-9005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13868229#comment-13868229 ] Lars Hofhansl commented on HBASE-9005: -- Only targeting 0.99 since this will go into the general documentation area on the site. Improve documentation around KEEP_DELETED_CELLS, time range scans, and delete markers - Key: HBASE-9005 URL: https://issues.apache.org/jira/browse/HBASE-9005 Project: HBase Issue Type: Bug Components: documentation Reporter: Lars Hofhansl Fix For: 0.99.0 Attachments: 9005.txt Without KEEP_DELETED_CELLS all timerange queries are broken if their range covers a delete marker. As some internal discussions with colleagues showed, this feature is not well understand and documented. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
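The semantics being documented here can be sketched with a toy single-cell model (my own illustration, not HBase code): a delete marker at time td masks all versions with ts <= td; without KEEP_DELETED_CELLS the masked versions are physically dropped at compaction, so a time-range read over an earlier window can no longer see them, while with the flag on they survive and the read still works.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model of delete-marker masking for one cell (illustrative only).
class ToyStore {
    private final NavigableMap<Long, String> versions = new TreeMap<>();
    private long deleteMarkerTs = Long.MIN_VALUE;

    void put(long ts, String value) { versions.put(ts, value); }

    void delete(long ts) { deleteMarkerTs = Math.max(deleteMarkerTs, ts); }

    // Compaction: without keepDeletedCells, physically remove every version
    // masked by the delete marker (ts <= deleteMarkerTs).
    void compact(boolean keepDeletedCells) {
        if (!keepDeletedCells) {
            versions.headMap(deleteMarkerTs, true).clear();
        }
    }

    // Time-range read: newest version with minTs <= ts < maxTs, or null.
    String get(long minTs, long maxTs) {
        Map.Entry<Long, String> e = versions.lowerEntry(maxTs);
        return (e != null && e.getKey() >= minTs) ? e.getValue() : null;
    }
}
```

With a put at ts=10 and a delete marker at ts=20, a scan over [0, 15) returns the value only if the store was compacted with keepDeletedCells on; otherwise the covered version is gone, which is the breakage the issue describes.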
[jira] [Updated] (HBASE-9005) Improve documentation around KEEP_DELETED_CELLS, time range scans, and delete markers
[ https://issues.apache.org/jira/browse/HBASE-9005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-9005: - Priority: Minor (was: Major) Improve documentation around KEEP_DELETED_CELLS, time range scans, and delete markers - Key: HBASE-9005 URL: https://issues.apache.org/jira/browse/HBASE-9005 Project: HBase Issue Type: Bug Components: documentation Reporter: Lars Hofhansl Priority: Minor Fix For: 0.99.0 Attachments: 9005.txt Without KEEP_DELETED_CELLS all timerange queries are broken if their range covers a delete marker. As some internal discussions with colleagues showed, this feature is not well understand and documented. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-9005) Improve documentation around KEEP_DELETED_CELLS, time range scans, and delete markers
[ https://issues.apache.org/jira/browse/HBASE-9005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-9005: - Fix Version/s: (was: 0.98.1) (was: 0.96.2) (was: 0.94.16) 0.99.0 Improve documentation around KEEP_DELETED_CELLS, time range scans, and delete markers - Key: HBASE-9005 URL: https://issues.apache.org/jira/browse/HBASE-9005 Project: HBase Issue Type: Bug Components: documentation Reporter: Lars Hofhansl Fix For: 0.99.0 Attachments: 9005.txt Without KEEP_DELETED_CELLS all timerange queries are broken if their range covers a delete marker. As some internal discussions with colleagues showed, this feature is not well understand and documented. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HBASE-10315) Canary shouldn't exit with 3 if there is no master running.
Elliott Clark created HBASE-10315: - Summary: Canary shouldn't exit with 3 if there is no master running. Key: HBASE-10315 URL: https://issues.apache.org/jira/browse/HBASE-10315 Project: HBase Issue Type: Bug Components: util Affects Versions: 0.96.1.1, 0.98.0 Reporter: Elliott Clark Assignee: Elliott Clark It's possible to time out (when the timeout is below the number of retries to the master) before even initializing if there is no master up. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10295) Refactor the replication implementation to eliminate permanent zk node
[ https://issues.apache.org/jira/browse/HBASE-10295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13868266#comment-13868266 ] stack commented on HBASE-10295: --- Make Master arbiter for these new system tables -- only the master can mod them -- and then add a response on the heartbeat to update regionservers on the last edit? Currently we return a void. See RegionServerReportResponse in http://svn.apache.org/viewvc/hbase/trunk/hbase-protocol/src/main/protobuf/RegionServerStatus.proto?view=markup Could be as simple as the master just replying w/ the timestamp of the last edit. If the RS has not seen the new edit, it goes and reads the table. Refactor the replication implementation to eliminate permanent zk node --- Key: HBASE-10295 URL: https://issues.apache.org/jira/browse/HBASE-10295 Project: HBase Issue Type: Bug Components: Replication Reporter: Feng Honghua Assignee: Feng Honghua Fix For: 0.99.0 Though this is a broader and bigger change, its original motivation derives from [HBASE-8751|https://issues.apache.org/jira/browse/HBASE-8751]: the newly introduced per-peer tableCFs attribute should be treated the same way as the peer-state, which is a permanent sub-node under the peer node, but using a permanent zk node is deemed an incorrect practice. So let's refactor to eliminate the permanent zk node. HBASE-8751 can then align its newly introduced per-peer tableCFs attribute with this *correct* implementation theme. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
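The heartbeat idea in the comment above — master replies with the timestamp of the last edit to the replication system table, and a regionserver re-reads the table only when that timestamp advances — can be sketched as follows. None of these class or method names are real HBase or protobuf APIs; they only illustrate the proposed flow:

```java
// Hypothetical regionserver-side tracker for the piggybacked timestamp
// carried on the heartbeat response (RegionServerReportResponse in the
// proposal). A reload is triggered only when the master reports a newer edit.
class ReplicationStateTracker {
    private long lastSeenEditTs = -1L;
    private int reloads = 0;

    // Called once per heartbeat with the master's last-edit timestamp.
    void onHeartbeat(long masterLastEditTs) {
        if (masterLastEditTs > lastSeenEditTs) {
            lastSeenEditTs = masterLastEditTs;
            reloadFromSystemTable();
        }
    }

    // Stand-in for re-reading the replication system table.
    private void reloadFromSystemTable() {
        reloads++;
    }

    int reloadCount() {
        return reloads;
    }
}
```

The design keeps the heartbeat payload to a single long: unchanged timestamps cost nothing, and a stale RS converges on the next report without any permanent zk watch.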
[jira] [Commented] (HBASE-10307) IntegrationTestIngestWithEncryption assumes localhost cluster
[ https://issues.apache.org/jira/browse/HBASE-10307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13868311#comment-13868311 ] Hudson commented on HBASE-10307: FAILURE: Integrated in HBase-TRUNK #4805 (See [https://builds.apache.org/job/HBase-TRUNK/4805/]) HBASE-10307. IntegrationTestIngestWithEncryption assumes localhost cluster (apurtell: rev 1557219) * /hbase/trunk/hbase-it/src/test/java/org/apache/hadoop/hbase/IntegrationTestIngestWithEncryption.java IntegrationTestIngestWithEncryption assumes localhost cluster - Key: HBASE-10307 URL: https://issues.apache.org/jira/browse/HBASE-10307 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.99.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.98.0, 0.99.0 Attachments: 10307.patch, 10307.patch, 10307.patch We forgot to update IntegrationTestIngestWithEncryption to handle the distributed cluster case. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10307) IntegrationTestIngestWithEncryption assumes localhost cluster
[ https://issues.apache.org/jira/browse/HBASE-10307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13868322#comment-13868322 ] Hudson commented on HBASE-10307: FAILURE: Integrated in HBase-0.98 #68 (See [https://builds.apache.org/job/HBase-0.98/68/]) HBASE-10307. IntegrationTestIngestWithEncryption assumes localhost cluster (apurtell: rev 1557220) * /hbase/branches/0.98/hbase-it/src/test/java/org/apache/hadoop/hbase/IntegrationTestIngestWithEncryption.java IntegrationTestIngestWithEncryption assumes localhost cluster - Key: HBASE-10307 URL: https://issues.apache.org/jira/browse/HBASE-10307 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.99.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.98.0, 0.99.0 Attachments: 10307.patch, 10307.patch, 10307.patch We forgot to update IntegrationTestIngestWithEncryption to handle the distributed cluster case. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10304) Running an hbase job jar: IllegalAccessError: class com.google.protobuf.ZeroCopyLiteralByteString cannot access its superclass com.google.protobuf.LiteralByteString
[ https://issues.apache.org/jira/browse/HBASE-10304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13868348#comment-13868348 ] Nick Dimiduk commented on HBASE-10304: -- Sure, I can write something up. I suppose there's no need to deprecate the fat jar approach so long as the docs are clear. Running an hbase job jar: IllegalAccessError: class com.google.protobuf.ZeroCopyLiteralByteString cannot access its superclass com.google.protobuf.LiteralByteString Key: HBASE-10304 URL: https://issues.apache.org/jira/browse/HBASE-10304 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.98.0, 0.96.1.1 Reporter: stack Priority: Blocker Fix For: 0.98.0 Attachments: hbase-10304_not_tested.patch, jobjar.xml (Jimmy has been working on this one internally. I'm just the messenger raising this critical issue upstream). So, if you make job jar and bundle up hbase inside in it because you want to access hbase from your mapreduce task, the deploy of the job jar to the cluster fails with: {code} 14/01/05 08:59:19 INFO Configuration.deprecation: topology.node.switch.mapping.impl is deprecated. Instead, use net.topology.node.switch.mapping.impl 14/01/05 08:59:19 INFO Configuration.deprecation: io.bytes.per.checksum is deprecated. 
Instead, use dfs.bytes-per-checksum Exception in thread main java.lang.IllegalAccessError: class com.google.protobuf.ZeroCopyLiteralByteString cannot access its superclass com.google.protobuf.LiteralByteString at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(ClassLoader.java:792) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) at java.net.URLClassLoader.defineClass(URLClassLoader.java:449) at java.net.URLClassLoader.access$100(URLClassLoader.java:71) at java.net.URLClassLoader$1.run(URLClassLoader.java:361) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) at org.apache.hadoop.hbase.protobuf.ProtobufUtil.toScan(ProtobufUtil.java:818) at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.convertScanToString(TableMapReduceUtil.java:433) at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:186) at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:147) at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:270) at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:100) at com.ngdata.hbaseindexer.mr.HBaseMapReduceIndexerTool.run(HBaseMapReduceIndexerTool.java:124) at com.ngdata.hbaseindexer.mr.HBaseMapReduceIndexerTool.run(HBaseMapReduceIndexerTool.java:64) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at com.ngdata.hbaseindexer.mr.HBaseMapReduceIndexerTool.main(HBaseMapReduceIndexerTool.java:51) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) {code} So, ZCLBS is a hack. This class is in the hbase-protocol module. It is in the com.google.protobuf package. All is well and good usually. But when we make a job jar and bundle hbase up inside it, our 'trick' breaks. RunJar makes a new class loader to run the job jar. This URLClassLoader 'attaches' all the jars and classes that are in the job jar so they can be found when it goes to do a lookup. Only, classloaders work by always delegating to their parent first (unless you are a WAR file in a container, where delegation is 'off' for the most part), and in this case the parent classloader has access to a pb jar since pb is in the hadoop CLASSPATH. So, the parent loads the pb classes. We then load ZCLBS, only this is done in the classloader made by RunJar; ZCLBS has a different classloader from its superclass and we get the above IllegalAccessError. Now (Jimmy's work comes in here), this can't be fixed by reflection -- you can't
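The parent-first delegation described above can be sketched with a toy simulation (all names here are hypothetical stand-ins; the real mechanics live in java.lang.ClassLoader and Hadoop's RunJar):

```java
import java.util.*;

// Hypothetical simulation of parent-first classloader delegation, illustrating
// why ZeroCopyLiteralByteString and its superclass end up with different
// defining loaders. SimLoader is a stand-in, not a real ClassLoader.
class SimLoader {
    final String name;
    final SimLoader parent;
    final Set<String> classes; // classes this loader can define itself

    SimLoader(String name, SimLoader parent, Set<String> classes) {
        this.name = name; this.parent = parent; this.classes = classes;
    }

    // Parent-first delegation: ask the parent before defining locally.
    String load(String className) {
        if (parent != null) {
            String fromParent = parent.load(className);
            if (fromParent != null) return fromParent;
        }
        return classes.contains(className) ? name : null;
    }
}

public class DelegationDemo {
    public static void main(String[] args) {
        // Hadoop's classpath (the parent) carries the stock protobuf jar.
        SimLoader hadoop = new SimLoader("hadoop", null,
                Set.of("com.google.protobuf.LiteralByteString"));
        // RunJar's URLClassLoader (the child) sees the fat job jar, which
        // bundles both the protobuf classes and HBase's ZCLBS hack.
        SimLoader jobJar = new SimLoader("jobjar", hadoop,
                Set.of("com.google.protobuf.LiteralByteString",
                       "com.google.protobuf.ZeroCopyLiteralByteString"));

        // Superclass resolves in the parent; subclass only in the child.
        System.out.println(jobJar.load("com.google.protobuf.LiteralByteString"));         // hadoop
        System.out.println(jobJar.load("com.google.protobuf.ZeroCopyLiteralByteString")); // jobjar
        // Different defining loaders => the subclass cannot see its
        // package-private superclass, hence the IllegalAccessError.
    }
}
```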
[jira] [Commented] (HBASE-10304) Running an hbase job jar: IllegalAccessError: class com.google.protobuf.ZeroCopyLiteralByteString cannot access its superclass com.google.protobuf.LiteralByteString
[ https://issues.apache.org/jira/browse/HBASE-10304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13868362#comment-13868362 ] Nick Dimiduk commented on HBASE-10304: -- Here's some copy we can use. Where in the book would you want something like this to live? I also suggest the package-info be updated as well. h3. Problem Mapreduce jobs submitted to the cluster via a fat jar, that is, a jar containing a 'lib' directory with their runtime dependencies, fail to launch. The symptom is an exception similar to the following: {noformat} Exception in thread main java.lang.IllegalAccessError: class com.google.protobuf.ZeroCopyLiteralByteString cannot access its superclass com.google.protobuf.LiteralByteString at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(ClassLoader.java:792) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) at java.net.URLClassLoader.defineClass(URLClassLoader.java:449) at java.net.URLClassLoader.access$100(URLClassLoader.java:71) at java.net.URLClassLoader$1.run(URLClassLoader.java:361) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) at org.apache.hadoop.hbase.protobuf.ProtobufUtil.toScan(ProtobufUtil.java:818) at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.convertScanToString(TableMapReduceUtil.java:433) at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:186) at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:147) at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:270) at 
org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:100) ... {noformat} This is because of an optimization introduced in [HBASE-9867|https://issues.apache.org/jira/browse/HBASE-9867] that inadvertently introduced a classloader dependency. Jobs submitted using a regular jar and specifying their runtime dependencies using the -libjars parameter are not affected by this regression. More details about using the -libjars parameter are available in this [blog post|http://grepalex.com/2013/02/25/hadoop-libjars/]. h3. Solution In order to satisfy the new classloader requirements, hbase-protocol.jar must be included in Hadoop's classpath. This can be resolved system-wide by including a reference to the hbase-protocol.jar in hadoop's lib directory, via a symlink or by copying the jar into the new location. This can also be achieved on a per-job launch basis by specifying a value for {{HADOOP_CLASSPATH}} at job submission time. All three of the following job launching commands satisfy this requirement: {noformat} $ HADOOP_CLASSPATH=/path/to/hbase-protocol.jar hadoop jar MyJob.jar MyJobMainClass $ HADOOP_CLASSPATH=$(hbase mapredcp) hadoop jar MyJob.jar MyJobMainClass $ HADOOP_CLASSPATH=$(hbase classpath) hadoop jar MyJob.jar MyJobMainClass {noformat} h3. Apache Reference JIRA See also [HBASE-10304|https://issues.apache.org/jira/browse/HBASE-10304]. Running an hbase job jar: IllegalAccessError: class com.google.protobuf.ZeroCopyLiteralByteString cannot access its superclass com.google.protobuf.LiteralByteString Key: HBASE-10304 URL: https://issues.apache.org/jira/browse/HBASE-10304 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.98.0, 0.96.1.1 Reporter: stack Priority: Blocker Fix For: 0.98.0 Attachments: hbase-10304_not_tested.patch, jobjar.xml (Jimmy has been working on this one internally. I'm just the messenger raising this critical issue upstream). 
So, if you make job jar and bundle up hbase inside in it because you want to access hbase from your mapreduce task, the deploy of the job jar to the cluster fails with: {code} 14/01/05 08:59:19 INFO Configuration.deprecation: topology.node.switch.mapping.impl is deprecated. Instead, use net.topology.node.switch.mapping.impl 14/01/05 08:59:19 INFO Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum Exception in thread main java.lang.IllegalAccessError: class com.google.protobuf.ZeroCopyLiteralByteString cannot access its superclass
[jira] [Updated] (HBASE-10292) TestRegionServerCoprocessorExceptionWithAbort fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-10292: --- Attachment: 10292-addendum-1.patch Testing this addendum. On another issue Sergey mentioned that AsyncProcess may return the error for a given put on the next one. I'm not familiar with this area of the code, but let's try it. TestRegionServerCoprocessorExceptionWithAbort fails occasionally Key: HBASE-10292 URL: https://issues.apache.org/jira/browse/HBASE-10292 Project: HBase Issue Type: Bug Affects Versions: 0.98.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.98.0, 0.99.0 Attachments: 10292-addendum-1.patch, 10292.patch, 10292.patch TestRegionServerCoprocessorExceptionWithAbort has occasionally failed for a very long time now. Fix or disable. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10314) Add Chaos Monkey that doesn't touch the master
[ https://issues.apache.org/jira/browse/HBASE-10314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13868376#comment-13868376 ] Nick Dimiduk commented on HBASE-10314: -- Instead of defining all these monkeys in code, is it possible to define them via configuration? I haven't looked closely at the implementation, but I'd think the actions should be composable. Add Chaos Monkey that doesn't touch the master -- Key: HBASE-10314 URL: https://issues.apache.org/jira/browse/HBASE-10314 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.98.0, 0.99.0, 0.96.1.1 Reporter: Elliott Clark Assignee: Elliott Clark -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-1015) pure C and C++ client libraries
[ https://issues.apache.org/jira/browse/HBASE-1015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13868382#comment-13868382 ] Ted Dunning commented on HBASE-1015: Another way to put this is that if nobody cares enough to even put up a patch after 5 years is this issue simply moot? Shouldn't reality be recognized? Shouldn't this be closed as WONT_FIX? pure C and C++ client libraries --- Key: HBASE-1015 URL: https://issues.apache.org/jira/browse/HBASE-1015 Project: HBase Issue Type: New Feature Components: Client Affects Versions: 0.20.6 Reporter: Andrew Purtell Priority: Minor If via HBASE-794 first class support for talking via Thrift directly to HMaster and HRS is available, then pure C and C++ client libraries are possible. The C client library would wrap a Thrift core. The C++ client library can provide a class hierarchy quite close to o.a.h.h.client and, ideally, identical semantics. It should be just a wrapper around the C API, for economy. Internally to my employer there is a lot of resistance to HBase because many dev teams have a strong C/C++ bias. The real issue however is really client side integration, not a fundamental objection. (What runs server side and how it is managed is a secondary consideration.) -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10307) IntegrationTestIngestWithEncryption assumes localhost cluster
[ https://issues.apache.org/jira/browse/HBASE-10307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13868387#comment-13868387 ] Hudson commented on HBASE-10307: SUCCESS: Integrated in HBase-0.98-on-Hadoop-1.1 #63 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/63/]) HBASE-10307. IntegrationTestIngestWithEncryption assumes localhost cluster (apurtell: rev 1557220) * /hbase/branches/0.98/hbase-it/src/test/java/org/apache/hadoop/hbase/IntegrationTestIngestWithEncryption.java IntegrationTestIngestWithEncryption assumes localhost cluster - Key: HBASE-10307 URL: https://issues.apache.org/jira/browse/HBASE-10307 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.99.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.98.0, 0.99.0 Attachments: 10307.patch, 10307.patch, 10307.patch We forgot to update IntegrationTestIngestWithEncryption to handle the distributed cluster case. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HBASE-10316) Canary#RegionServerMonitor#monitorRegionServers() should close the scanner returned by table.getScanner()
Ted Yu created HBASE-10316: -- Summary: Canary#RegionServerMonitor#monitorRegionServers() should close the scanner returned by table.getScanner() Key: HBASE-10316 URL: https://issues.apache.org/jira/browse/HBASE-10316 Project: HBase Issue Type: Bug Reporter: Ted Yu Priority: Minor At line 624, in the else block, ResultScanner returned by table.getScanner() is not closed. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
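The fix shape is the usual try-with-resources pattern; a stand-alone sketch (FakeScanner is an illustrative stand-in for HBase's ResultScanner, which is a Closeable -- the real fix would wrap the table.getScanner() call in the else block):

```java
// Minimal sketch of the fix shape for HBASE-10316: ensure the scanner from
// table.getScanner() is closed on every path. FakeScanner stands in for
// HBase's ResultScanner; all names here are illustrative only.
class FakeScanner implements AutoCloseable {
    static boolean closed = false;
    String next() { return "row-1"; }
    @Override public void close() { closed = true; }
}

public class ScannerCloseDemo {
    public static void main(String[] args) {
        // try-with-resources closes the scanner even if iteration throws,
        // unlike an else branch that never calls close().
        try (FakeScanner scanner = new FakeScanner()) {
            System.out.println(scanner.next());
        }
        System.out.println("closed: " + FakeScanner.closed); // true
    }
}
```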
[jira] [Assigned] (HBASE-10316) Canary#RegionServerMonitor#monitorRegionServers() should close the scanner returned by table.getScanner()
[ https://issues.apache.org/jira/browse/HBASE-10316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu reassigned HBASE-10316: -- Assignee: Ted Yu Canary#RegionServerMonitor#monitorRegionServers() should close the scanner returned by table.getScanner() - Key: HBASE-10316 URL: https://issues.apache.org/jira/browse/HBASE-10316 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: 10316.txt At line 624, in the else block, ResultScanner returned by table.getScanner() is not closed. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10316) Canary#RegionServerMonitor#monitorRegionServers() should close the scanner returned by table.getScanner()
[ https://issues.apache.org/jira/browse/HBASE-10316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-10316: --- Attachment: 10316.txt Canary#RegionServerMonitor#monitorRegionServers() should close the scanner returned by table.getScanner() - Key: HBASE-10316 URL: https://issues.apache.org/jira/browse/HBASE-10316 Project: HBase Issue Type: Bug Reporter: Ted Yu Priority: Minor Attachments: 10316.txt At line 624, in the else block, ResultScanner returned by table.getScanner() is not closed. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10316) Canary#RegionServerMonitor#monitorRegionServers() should close the scanner returned by table.getScanner()
[ https://issues.apache.org/jira/browse/HBASE-10316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-10316: --- Status: Patch Available (was: Open) Canary#RegionServerMonitor#monitorRegionServers() should close the scanner returned by table.getScanner() - Key: HBASE-10316 URL: https://issues.apache.org/jira/browse/HBASE-10316 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: 10316.txt At line 624, in the else block, ResultScanner returned by table.getScanner() is not closed. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10263) make LruBlockCache single/multi/in-memory ratio user-configurable and provide preemptive mode for in-memory type block
[ https://issues.apache.org/jira/browse/HBASE-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13868394#comment-13868394 ] Nick Dimiduk commented on HBASE-10263: -- Actually, [~xieliang007] do you mind committing also to 0.98? I don't want to steal your thunder on commit ;) make LruBlockCache single/multi/in-memory ratio user-configurable and provide preemptive mode for in-memory type block -- Key: HBASE-10263 URL: https://issues.apache.org/jira/browse/HBASE-10263 Project: HBase Issue Type: Improvement Components: io Reporter: Feng Honghua Assignee: Feng Honghua Fix For: 0.99.0 Attachments: HBASE-10263-trunk_v0.patch, HBASE-10263-trunk_v1.patch, HBASE-10263-trunk_v2.patch Currently the single/multi/in-memory ratio in LruBlockCache is hardcoded at 1:2:1, which can lead to counter-intuitive behavior in scenarios where an in-memory table's read performance is much worse than an ordinary table's, when the two tables' data sizes are almost equal and larger than the regionserver's cache size (we ran such an experiment and verified that random reads on an in-memory table were two times slower than on an ordinary table). This patch fixes the above issue and provides: 1. the single/multi/in-memory ratio is user-configurable; 2. a configurable switch that makes in-memory blocks preemptive; when this switch is on, an in-memory block can kick out any ordinary block to make room until no ordinary blocks remain, and when it is off (the default) the behavior is the same as before, using the single/multi/in-memory ratio to decide eviction. By default both changes are off and the behavior is unchanged from before this patch; it is the client's/user's choice which behavior to use by enabling one of these two enhancements. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
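The two eviction modes described can be illustrated with a toy cache (a sketch, not the actual LruBlockCache logic; the ratio-based mode is approximated here as evict-oldest):

```java
import java.util.*;

// Toy illustration of the two modes described above: the default ratio-based
// eviction (approximated as evict-oldest) versus the proposed preemptive mode,
// where an incoming in-memory block may evict any ordinary block first.
class CacheSketch {
    enum Pri { SINGLE, MULTI, MEMORY }

    private final int capacity;
    private final boolean preemptive;
    private final LinkedHashMap<String, Pri> blocks = new LinkedHashMap<>();

    CacheSketch(int capacity, boolean preemptive) {
        this.capacity = capacity;
        this.preemptive = preemptive;
    }

    void put(String key, Pri pri) {
        if (blocks.size() >= capacity) evictFor(pri);
        blocks.put(key, pri);
    }

    private void evictFor(Pri incoming) {
        if (preemptive && incoming == Pri.MEMORY) {
            // Preemptive mode: kick out an ordinary (non-MEMORY) block if any exists.
            for (Iterator<Map.Entry<String, Pri>> it = blocks.entrySet().iterator(); it.hasNext();) {
                if (it.next().getValue() != Pri.MEMORY) { it.remove(); return; }
            }
        }
        // Fallback: evict the oldest entry (stand-in for the 1:2:1 ratio logic).
        Iterator<String> it = blocks.keySet().iterator();
        if (it.hasNext()) { it.next(); it.remove(); }
    }

    boolean contains(String key) { return blocks.containsKey(key); }
}

public class CacheDemo {
    public static void main(String[] args) {
        CacheSketch cache = new CacheSketch(2, true); // preemptive mode on
        cache.put("mem-old", CacheSketch.Pri.MEMORY);  // oldest block
        cache.put("ordinary", CacheSketch.Pri.SINGLE);
        cache.put("mem-new", CacheSketch.Pri.MEMORY);  // evicts "ordinary", not "mem-old"
        System.out.println(cache.contains("mem-old"));  // true
        System.out.println(cache.contains("ordinary")); // false
    }
}
```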
[jira] [Commented] (HBASE-1015) pure C and C++ client libraries
[ https://issues.apache.org/jira/browse/HBASE-1015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13868410#comment-13868410 ] Andrew Purtell commented on HBASE-1015: --- bq. Another way to put this is that if nobody cares enough to even put up a patch after 5 years is this issue simply moot? This issue has been superseded by the use of protobuf in RPCs instead of Thrift and the commit of the start of a C/C++ client library, see HBASE-9977. Closing this issue in lieu of something else is fine, but WONTFIX is the incorrect resolution. pure C and C++ client libraries --- Key: HBASE-1015 URL: https://issues.apache.org/jira/browse/HBASE-1015 Project: HBase Issue Type: New Feature Components: Client Affects Versions: 0.20.6 Reporter: Andrew Purtell Priority: Minor If via HBASE-794 first class support for talking via Thrift directly to HMaster and HRS is available, then pure C and C++ client libraries are possible. The C client library would wrap a Thrift core. The C++ client library can provide a class hierarchy quite close to o.a.h.h.client and, ideally, identical semantics. It should be just a wrapper around the C API, for economy. Internally to my employer there is a lot of resistance to HBase because many dev teams have a strong C/C++ bias. The real issue however is really client side integration, not a fundamental objection. (What runs server side and how it is managed is a secondary consideration.) -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10310) ZNodeCleaner session expired for /hbase/master
[ https://issues.apache.org/jira/browse/HBASE-10310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-10310: --- Summary: ZNodeCleaner session expired for /hbase/master (was: ZNodeCleaner.java KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master) ZNodeCleaner session expired for /hbase/master -- Key: HBASE-10310 URL: https://issues.apache.org/jira/browse/HBASE-10310 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.96.1.1 Environment: x86_64 GNU/Linux Reporter: Samir Ahmic Assignee: Samir Ahmic Attachments: HBASE-10310.patch I was testing hbase master clear command while working on [HBASE-7386] here is command and exception: {code} $ export HBASE_ZNODE_FILE=/tmp/hbase-hadoop-master.znode; ./hbase master clear 14/01/10 14:05:44 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=zk1:2181 sessionTimeout=9 watcher=clean znode for master, quorum=zk1:2181, baseZNode=/hbase 14/01/10 14:05:44 INFO zookeeper.RecoverableZooKeeper: Process identifier=clean znode for master connecting to ZooKeeper ensemble=zk1:2181 14/01/10 14:05:44 INFO zookeeper.ClientCnxn: Opening socket connection to server zk1/172.17.33.5:2181. 
Will not attempt to authenticate using SASL (Unable to locate a login configuration) 14/01/10 14:05:44 INFO zookeeper.ClientCnxn: Socket connection established to zk11/172.17.33.5:2181, initiating session 14/01/10 14:05:44 INFO zookeeper.ClientCnxn: Session establishment complete on server zk1/172.17.33.5:2181, sessionid = 0x1427a96bfea4a8a, negotiated timeout = 4 14/01/10 14:05:44 INFO zookeeper.ZooKeeper: Session: 0x1427a96bfea4a8a closed 14/01/10 14:05:44 INFO zookeeper.ClientCnxn: EventThread shut down 14/01/10 14:05:44 WARN zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=zk1:2181, exception=org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master 14/01/10 14:05:44 INFO util.RetryCounter: Sleeping 1000ms before retry #0... 14/01/10 14:05:45 WARN zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=zk1:2181, exception=org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master 14/01/10 14:05:45 ERROR zookeeper.RecoverableZooKeeper: ZooKeeper getData failed after 1 attempts 14/01/10 14:05:45 WARN zookeeper.ZKUtil: clean znode for master-0x1427a96bfea4a8a, quorum=zk1:2181, baseZNode=/hbase Unable to get data of znode /hbase/master org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master at org.apache.zookeeper.KeeperException.create(KeeperException.java:127) at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151) at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:337) at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataNoWatch(ZKUtil.java:777) at org.apache.hadoop.hbase.zookeeper.MasterAddressTracker.deleteIfEquals(MasterAddressTracker.java:170) at org.apache.hadoop.hbase.ZNodeClearer.clear(ZNodeClearer.java:160) at 
org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:138) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126) at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2779) 14/01/10 14:05:45 ERROR zookeeper.ZooKeeperWatcher: clean znode for master-0x1427a96bfea4a8a, quorum=zk1:2181, baseZNode=/hbase Received unexpected KeeperException, re-throwing exception org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master at org.apache.zookeeper.KeeperException.create(KeeperException.java:127) at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151) at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:337) at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataNoWatch(ZKUtil.java:777) at org.apache.hadoop.hbase.zookeeper.MasterAddressTracker.deleteIfEquals(MasterAddressTracker.java:170) at org.apache.hadoop.hbase.ZNodeClearer.clear(ZNodeClearer.java:160) at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:138) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at
[jira] [Resolved] (HBASE-10310) ZNodeCleaner session expired for /hbase/master
[ https://issues.apache.org/jira/browse/HBASE-10310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-10310. Resolution: Fixed Fix Version/s: 0.99.0 0.96.2 0.98.0 Hadoop Flags: Reviewed Committed to trunk, 0.98, and 0.96. Thanks for the patch Samir! ZNodeCleaner session expired for /hbase/master -- Key: HBASE-10310 URL: https://issues.apache.org/jira/browse/HBASE-10310 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.96.1.1 Environment: x86_64 GNU/Linux Reporter: Samir Ahmic Assignee: Samir Ahmic Fix For: 0.98.0, 0.96.2, 0.99.0 Attachments: HBASE-10310.patch I was testing hbase master clear command while working on [HBASE-7386] here is command and exception: {code} $ export HBASE_ZNODE_FILE=/tmp/hbase-hadoop-master.znode; ./hbase master clear 14/01/10 14:05:44 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=zk1:2181 sessionTimeout=9 watcher=clean znode for master, quorum=zk1:2181, baseZNode=/hbase 14/01/10 14:05:44 INFO zookeeper.RecoverableZooKeeper: Process identifier=clean znode for master connecting to ZooKeeper ensemble=zk1:2181 14/01/10 14:05:44 INFO zookeeper.ClientCnxn: Opening socket connection to server zk1/172.17.33.5:2181. 
Will not attempt to authenticate using SASL (Unable to locate a login configuration) 14/01/10 14:05:44 INFO zookeeper.ClientCnxn: Socket connection established to zk11/172.17.33.5:2181, initiating session 14/01/10 14:05:44 INFO zookeeper.ClientCnxn: Session establishment complete on server zk1/172.17.33.5:2181, sessionid = 0x1427a96bfea4a8a, negotiated timeout = 4 14/01/10 14:05:44 INFO zookeeper.ZooKeeper: Session: 0x1427a96bfea4a8a closed 14/01/10 14:05:44 INFO zookeeper.ClientCnxn: EventThread shut down 14/01/10 14:05:44 WARN zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=zk1:2181, exception=org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master 14/01/10 14:05:44 INFO util.RetryCounter: Sleeping 1000ms before retry #0... 14/01/10 14:05:45 WARN zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=zk1:2181, exception=org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master 14/01/10 14:05:45 ERROR zookeeper.RecoverableZooKeeper: ZooKeeper getData failed after 1 attempts 14/01/10 14:05:45 WARN zookeeper.ZKUtil: clean znode for master-0x1427a96bfea4a8a, quorum=zk1:2181, baseZNode=/hbase Unable to get data of znode /hbase/master org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master at org.apache.zookeeper.KeeperException.create(KeeperException.java:127) at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151) at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:337) at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataNoWatch(ZKUtil.java:777) at org.apache.hadoop.hbase.zookeeper.MasterAddressTracker.deleteIfEquals(MasterAddressTracker.java:170) at org.apache.hadoop.hbase.ZNodeClearer.clear(ZNodeClearer.java:160) at 
org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:138) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126) at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2779) 14/01/10 14:05:45 ERROR zookeeper.ZooKeeperWatcher: clean znode for master-0x1427a96bfea4a8a, quorum=zk1:2181, baseZNode=/hbase Received unexpected KeeperException, re-throwing exception org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master at org.apache.zookeeper.KeeperException.create(KeeperException.java:127) at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151) at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:337) at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataNoWatch(ZKUtil.java:777) at org.apache.hadoop.hbase.zookeeper.MasterAddressTracker.deleteIfEquals(MasterAddressTracker.java:170) at org.apache.hadoop.hbase.ZNodeClearer.clear(ZNodeClearer.java:160) at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:138) at
[jira] [Commented] (HBASE-9426) Make custom distributed barrier procedure pluggable
[ https://issues.apache.org/jira/browse/HBASE-9426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13868447#comment-13868447 ] Richard Ding commented on HBASE-9426: - [~jmhsieh], can you please take a look at the new patch and let me know what you think? Thanks. Make custom distributed barrier procedure pluggable Key: HBASE-9426 URL: https://issues.apache.org/jira/browse/HBASE-9426 Project: HBase Issue Type: Improvement Affects Versions: 0.95.2, 0.94.11 Reporter: Richard Ding Assignee: Richard Ding Attachments: HBASE-9426-4.patch, HBASE-9426-4.patch, HBASE-9426-6.patch, HBASE-9426.patch.1, HBASE-9426.patch.2, HBASE-9426.patch.3 Currently if one wants to implement a custom distributed barrier procedure (e.g., distributed log roll or distributed table flush), the HBase core code needs to be modified in order for the procedure to work. Looking into the snapshot code (especially on the region server side), most of the code enabling the procedure is generic life-cycle management (i.e., init, start, stop). We can make this part pluggable. Here is the proposal. Following the coprocessor example, we define two properties: {code} hbase.procedure.regionserver.classes hbase.procedure.master.classes {code} The values for both are comma-delimited lists of classes. 
On the region server side, the classes implement the following interface: {code} public interface RegionServerProcedureManager { public void initialize(RegionServerServices rss) throws KeeperException; public void start(); public void stop(boolean force) throws IOException; public String getProcedureName(); } {code} While on the Master side, the classes implement the interface: {code} public interface MasterProcedureManager { public void initialize(MasterServices master) throws KeeperException, IOException, UnsupportedOperationException; public void stop(String why); public String getProcedureName(); public void execProcedure(ProcedureDescription desc) throws IOException; } {code} Where the ProcedureDescription is defined as {code} message ProcedureDescription { required string name = 1; required string instance = 2; optional int64 creationTime = 3 [default = 0]; message Property { required string tag = 1; optional string value = 2; } repeated Property props = 4; } {code} A generic API can be defined on HMaster to trigger a procedure: {code} public boolean execProcedure(ProcedureDescription desc) throws IOException; {code} _SnapshotManager_ and _RegionServerSnapshotManager_ are special examples of _MasterProcedureManager_ and _RegionServerProcedureManager_. They will be automatically included (users don't need to specify them in the conf file). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
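The comma-delimited-classes convention could be loaded reflectively, much like coprocessors are; a sketch (only the property name comes from the proposal -- the loader itself is hypothetical):

```java
import java.util.*;

// Hypothetical sketch of loading procedure managers from the proposed
// comma-delimited property, mirroring coprocessor-style loading. Only the
// property name "hbase.procedure.master.classes" comes from the proposal.
public class ManagerLoader {
    public static List<Object> load(Properties conf) throws Exception {
        String value = conf.getProperty("hbase.procedure.master.classes", "");
        List<Object> managers = new ArrayList<>();
        for (String cls : value.split(",")) {
            if (cls.trim().isEmpty()) continue;
            // Instantiate each configured manager via its no-arg constructor.
            managers.add(Class.forName(cls.trim()).getDeclaredConstructor().newInstance());
        }
        return managers;
    }
}
```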
[jira] [Created] (HBASE-10317) getClientPort method of MiniZooKeeperCluster does not always return the correct value
Vasu Mariyala created HBASE-10317: - Summary: getClientPort method of MiniZooKeeperCluster does not always return the correct value Key: HBASE-10317 URL: https://issues.apache.org/jira/browse/HBASE-10317 Project: HBase Issue Type: Bug Reporter: Vasu Mariyala Priority: Minor {code} //Starting 5 zk servers MiniZooKeeperCluster cluster = hbt.startMiniZKCluster(5); int defaultClientPort = 21818; cluster.setDefaultClientPort(defaultClientPort); cluster.killCurrentActiveZooKeeperServer(); cluster.getClientPort(); //Still returns the port of the zk server that was killed in the previous step {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
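One plausible fix direction -- answering with the currently active server's port rather than a stale cached value -- can be sketched stand-alone (this is illustrative only, not the actual MiniZooKeeperCluster code):

```java
import java.util.*;

// Illustrative sketch of the fix direction for HBASE-10317: getClientPort()
// should track the currently active server instead of returning the port of a
// server that has been killed. Not the real MiniZooKeeperCluster implementation.
class MiniClusterSketch {
    private final List<Integer> clientPorts = new ArrayList<>();
    private int activeIndex = 0;

    void startServer(int clientPort) { clientPorts.add(clientPort); }

    void killCurrentActiveServer() {
        clientPorts.remove(activeIndex); // drop the dead server's port
        activeIndex = 0;                 // promote the next surviving server
    }

    // Always answer with the active server's port.
    int getClientPort() { return clientPorts.get(activeIndex); }
}

public class PortDemo {
    public static void main(String[] args) {
        MiniClusterSketch cluster = new MiniClusterSketch();
        cluster.startServer(21818);
        cluster.startServer(21819);
        System.out.println(cluster.getClientPort()); // 21818
        cluster.killCurrentActiveServer();
        System.out.println(cluster.getClientPort()); // 21819, not the dead 21818
    }
}
```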
[jira] [Updated] (HBASE-10317) getClientPort method of MiniZooKeeperCluster does not always return the correct value
[ https://issues.apache.org/jira/browse/HBASE-10317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vasu Mariyala updated HBASE-10317: -- Attachment: HBASE-10317.patch getClientPort method of MiniZooKeeperCluster does not always return the correct value - Key: HBASE-10317 URL: https://issues.apache.org/jira/browse/HBASE-10317 Project: HBase Issue Type: Bug Reporter: Vasu Mariyala Priority: Minor Attachments: HBASE-10317.patch {code} //Starting 5 zk servers MiniZooKeeperCluster cluster = hbt.startMiniZKCluster(5); int defaultClientPort = 21818; cluster.setDefaultClientPort(defaultClientPort); cluster.killCurrentActiveZooKeeperServer(); cluster.getClientPort(); //Still returns the port of the zk server that was killed in the previous step {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10317) getClientPort method of MiniZooKeeperCluster does not always return the correct value
[ https://issues.apache.org/jira/browse/HBASE-10317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vasu Mariyala updated HBASE-10317: -- Status: Patch Available (was: Open) getClientPort method of MiniZooKeeperCluster does not always return the correct value - Key: HBASE-10317 URL: https://issues.apache.org/jira/browse/HBASE-10317 Project: HBase Issue Type: Bug Reporter: Vasu Mariyala Priority: Minor Attachments: HBASE-10317.patch {code} //Starting 5 zk servers MiniZooKeeperCluster cluster = hbt.startMiniZKCluster(5); int defaultClientPort = 21818; cluster.setDefaultClientPort(defaultClientPort); cluster.killCurrentActiveZooKeeperServer(); cluster.getClientPort(); //Still returns the port of the zk server that was killed in the previous step {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10304) Running an hbase job jar: IllegalAccessError: class com.google.protobuf.ZeroCopyLiteralByteString cannot access its superclass com.google.protobuf.LiteralByteString
[ https://issues.apache.org/jira/browse/HBASE-10304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13868472#comment-13868472 ] Jimmy Xiang commented on HBASE-10304: - I tried with -libjars, and it gave me the same problem. So it is not working for me. I also tried the three suggestions. The first two of them need some tweaking, while the third one works as-is. bq. $ HADOOP_CLASSPATH=/path/to/hbase-protocol.jar hadoop jar MyJob.jar MyJobMainClass I got this: {noformat} 14/01/10 15:31:05 WARN zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:708) at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068) {noformat} Basically, I can't connect to the ZK. I have to add the hbase conf dir as below: {noformat} $ HADOOP_CLASSPATH=/path/to/hbase-protocol.jar:/path/to/hbase-conf hadoop jar MyJob.jar MyJobMainClass {noformat} bq. $ HADOOP_CLASSPATH=$(hbase mapredcp) hadoop jar MyJob.jar MyJobMainClass Same as above. I need to add the hbase conf dir to the path: {noformat} $ HADOOP_CLASSPATH=$(hbase mapredcp):/path/to/hbase-conf hadoop jar MyJob.jar MyJobMainClass {noformat} bq. $ HADOOP_CLASSPATH=$(hbase classpath) hadoop jar MyJob.jar MyJobMainClass Works for me. 
Running an hbase job jar: IllegalAccessError: class com.google.protobuf.ZeroCopyLiteralByteString cannot access its superclass com.google.protobuf.LiteralByteString Key: HBASE-10304 URL: https://issues.apache.org/jira/browse/HBASE-10304 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.98.0, 0.96.1.1 Reporter: stack Priority: Blocker Fix For: 0.98.0 Attachments: hbase-10304_not_tested.patch, jobjar.xml (Jimmy has been working on this one internally. I'm just the messenger raising this critical issue upstream). So, if you make a job jar and bundle hbase up inside it because you want to access hbase from your mapreduce task, the deploy of the job jar to the cluster fails with: {code} 14/01/05 08:59:19 INFO Configuration.deprecation: topology.node.switch.mapping.impl is deprecated. Instead, use net.topology.node.switch.mapping.impl 14/01/05 08:59:19 INFO Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum Exception in thread main java.lang.IllegalAccessError: class com.google.protobuf.ZeroCopyLiteralByteString cannot access its superclass com.google.protobuf.LiteralByteString at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(ClassLoader.java:792) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) at java.net.URLClassLoader.defineClass(URLClassLoader.java:449) at java.net.URLClassLoader.access$100(URLClassLoader.java:71) at java.net.URLClassLoader$1.run(URLClassLoader.java:361) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) at org.apache.hadoop.hbase.protobuf.ProtobufUtil.toScan(ProtobufUtil.java:818) at 
org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.convertScanToString(TableMapReduceUtil.java:433) at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:186) at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:147) at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:270) at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:100) at com.ngdata.hbaseindexer.mr.HBaseMapReduceIndexerTool.run(HBaseMapReduceIndexerTool.java:124) at com.ngdata.hbaseindexer.mr.HBaseMapReduceIndexerTool.run(HBaseMapReduceIndexerTool.java:64) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at com.ngdata.hbaseindexer.mr.HBaseMapReduceIndexerTool.main(HBaseMapReduceIndexerTool.java:51) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
[jira] [Commented] (HBASE-10316) Canary#RegionServerMonitor#monitorRegionServers() should close the scanner returned by table.getScanner()
[ https://issues.apache.org/jira/browse/HBASE-10316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13868480#comment-13868480 ] Hadoop QA commented on HBASE-10316: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622448/10316.txt against trunk revision . ATTACHMENT ID: 12622448 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8386//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8386//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8386//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8386//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8386//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8386//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8386//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8386//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8386//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8386//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8386//console This message is automatically generated. Canary#RegionServerMonitor#monitorRegionServers() should close the scanner returned by table.getScanner() - Key: HBASE-10316 URL: https://issues.apache.org/jira/browse/HBASE-10316 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: 10316.txt At line 624, in the else block, ResultScanner returned by table.getScanner() is not closed. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
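The fix pattern for a leak like the one HBASE-10316 reports is the usual one: release the scanner on every exit path. A minimal, self-contained Java sketch of the idiom — note that `FakeScanner` and `ScanOnce` are hypothetical stand-ins for illustration, not HBase's `ResultScanner` or the Canary code itself:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a scanner-like resource; the real fix closes the
// ResultScanner returned by table.getScanner() in RegionServerMonitor.
class FakeScanner implements AutoCloseable {
    static final AtomicBoolean CLOSED = new AtomicBoolean(false);
    String next() { return null; } // pretend to fetch one row
    @Override public void close() { CLOSED.set(true); }
}

class ScanOnce {
    // try-with-resources closes the scanner even if next() throws or we
    // return early, which is exactly what the unpatched else block lacked.
    static boolean scanAndClose() {
        try (FakeScanner scanner = new FakeScanner()) {
            scanner.next();
        }
        return FakeScanner.CLOSED.get();
    }
}
```

On a pre-Java-7 codebase the same guarantee comes from a `finally { scanner.close(); }` block.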
[jira] [Updated] (HBASE-10312) Flooding the cluster with administrative actions leads to collapse
[ https://issues.apache.org/jira/browse/HBASE-10312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-10312: --- Fix Version/s: 0.99.0 Flooding the cluster with administrative actions leads to collapse -- Key: HBASE-10312 URL: https://issues.apache.org/jira/browse/HBASE-10312 Project: HBase Issue Type: Bug Reporter: Andrew Purtell Fix For: 0.99.0 Steps to reproduce: 1. Start a cluster. 2. Start an ingest process. 3. In the HBase shell, do this: {noformat} while true do flush 'table' end {noformat} We should reject abuse via administrative requests like this. What happens on the cluster is that the requests back up, leading to lots of these: {noformat} 2014-01-10 18:55:55,293 WARN [Priority.RpcServer.handler=2,port=8120] monitoring.TaskMonitor: Too many actions in action monitor! Purging some. {noformat} At this point we could lower a gate on further requests for actions until the backlog clears. Continuing, all of the regionservers will eventually die with a StackOverflowError whose origin is unknown (unknown because, well, stack overflow): {noformat} 2014-01-10 19:02:02,783 ERROR [Priority.RpcServer.handler=3,port=8120] ipc.RpcServer: Unexpected throwable object java.lang.StackOverflowError at java.util.ArrayList$SubList.add(ArrayList.java:965) [...] {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10314) Add Chaos Monkey that doesn't touch the master
[ https://issues.apache.org/jira/browse/HBASE-10314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13868519#comment-13868519 ] Elliott Clark commented on HBASE-10314: --- The actions are easily composable. But doing that on the command line is pretty awful so imo it's much better to have several of these already made and there for easy use. Add Chaos Monkey that doesn't touch the master -- Key: HBASE-10314 URL: https://issues.apache.org/jira/browse/HBASE-10314 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.98.0, 0.99.0, 0.96.1.1 Reporter: Elliott Clark Assignee: Elliott Clark -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10308) TestZKProcedure.testMultiCohortWithMemberTimeoutDuringPrepare fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-10308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13868521#comment-13868521 ] Lars Hofhansl commented on HBASE-10308: --- I ran the test locally in a loop while I was away for meetings. After 1658 times it has not failed once. :( TestZKProcedure.testMultiCohortWithMemberTimeoutDuringPrepare fails occasionally Key: HBASE-10308 URL: https://issues.apache.org/jira/browse/HBASE-10308 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Fix For: 0.96.2, 0.98.1, 0.99.0, 0.94.17 Seen in 0.94 (both JDK6 and JDK7 builds) {code} Error Message Wanted but not invoked: procedure.sendGlobalBarrierComplete(); - at org.apache.hadoop.hbase.procedure.TestZKProcedure.waitAndVerifyProc(TestZKProcedure.java:344) However, there were other interactions with this mock: - at org.apache.hadoop.hbase.procedure.TestZKProcedure.testMultiCohortWithMemberTimeoutDuringPrepare(TestZKProcedure.java:306) - at org.apache.hadoop.hbase.procedure.TestZKProcedure.testMultiCohortWithMemberTimeoutDuringPrepare(TestZKProcedure.java:311) - at org.apache.hadoop.hbase.procedure.Procedure.call(Procedure.java:204) - at org.apache.hadoop.hbase.procedure.Procedure.call(Procedure.java:204) - at org.apache.hadoop.hbase.procedure.ProcedureCoordinator.memberAcquiredBarrier(ProcedureCoordinator.java:262) - at org.apache.hadoop.hbase.procedure.ProcedureCoordinator.memberAcquiredBarrier(ProcedureCoordinator.java:262) - at org.apache.hadoop.hbase.procedure.ProcedureCoordinator.abortProcedure(ProcedureCoordinator.java:217) - at org.apache.hadoop.hbase.procedure.TestZKProcedure.waitAndVerifyProc(TestZKProcedure.java:337) Stacktrace Wanted but not invoked: procedure.sendGlobalBarrierComplete(); - at org.apache.hadoop.hbase.procedure.TestZKProcedure.waitAndVerifyProc(TestZKProcedure.java:344) However, there were other interactions with this mock: - at 
org.apache.hadoop.hbase.procedure.TestZKProcedure.testMultiCohortWithMemberTimeoutDuringPrepare(TestZKProcedure.java:306) - at org.apache.hadoop.hbase.procedure.TestZKProcedure.testMultiCohortWithMemberTimeoutDuringPrepare(TestZKProcedure.java:311) - at org.apache.hadoop.hbase.procedure.Procedure.call(Procedure.java:204) - at org.apache.hadoop.hbase.procedure.Procedure.call(Procedure.java:204) - at org.apache.hadoop.hbase.procedure.ProcedureCoordinator.memberAcquiredBarrier(ProcedureCoordinator.java:262) - at org.apache.hadoop.hbase.procedure.ProcedureCoordinator.memberAcquiredBarrier(ProcedureCoordinator.java:262) - at org.apache.hadoop.hbase.procedure.ProcedureCoordinator.abortProcedure(ProcedureCoordinator.java:217) - at org.apache.hadoop.hbase.procedure.TestZKProcedure.waitAndVerifyProc(TestZKProcedure.java:337) at org.apache.hadoop.hbase.procedure.TestZKProcedure.waitAndVerifyProc(TestZKProcedure.java:344) at org.apache.hadoop.hbase.procedure.TestZKProcedure.testMultiCohortWithMemberTimeoutDuringPrepare(TestZKProcedure.java:319) {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10317) getClientPort method of MiniZooKeeperCluster does not always return the correct value
[ https://issues.apache.org/jira/browse/HBASE-10317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13868535#comment-13868535 ] Lars Hofhansl commented on HBASE-10317: --- Looks good to me. getClientPort method of MiniZooKeeperCluster does not always return the correct value - Key: HBASE-10317 URL: https://issues.apache.org/jira/browse/HBASE-10317 Project: HBase Issue Type: Bug Reporter: Vasu Mariyala Priority: Minor Attachments: HBASE-10317.patch {code} //Starting 5 zk servers MiniZooKeeperCluster cluster = hbt.startMiniZKCluster(5); int defaultClientPort = 21818; cluster.setDefaultClientPort(defaultClientPort); cluster.killCurrentActiveZooKeeperServer(); cluster.getClientPort(); //Still returns the port of the zk server that was killed in the previous step {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
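The shape of the fix can be sketched without the HBase internals. In this illustrative Java model — `MiniQuorum` and its methods are invented names, not the actual `MiniZooKeeperCluster` API or the attached patch — the client port is derived from the live server list rather than cached, so killing the active server changes what `getClientPort()` reports:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-in for the mini-cluster's bookkeeping: keep the ports of
// the surviving servers and an index of the active one, and always answer
// getClientPort() from that live state instead of a stale cached value.
class MiniQuorum {
    private final List<Integer> clientPorts = new ArrayList<>();
    private int activeServer = 0;

    MiniQuorum(int numServers, int basePort) {
        for (int i = 0; i < numServers; i++) {
            clientPorts.add(basePort + i);
        }
    }

    /** Simulate killing the currently active server; a survivor takes over. */
    void killCurrentActiveServer() {
        clientPorts.remove(activeServer); // remove by index
        if (activeServer >= clientPorts.size()) {
            activeServer = 0; // wrap around to the first surviving server
        }
    }

    /** Derived from live state, so it never reports a dead server's port. */
    int getClientPort() {
        return clientPorts.get(activeServer);
    }
}
```

The bug in the report is exactly the opposite pattern: the port answered after `killCurrentActiveZooKeeperServer()` still belonged to the dead server.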
[jira] [Created] (HBASE-10318) generate-hadoopX-poms.sh expects the version to have one extra '-'
Raja Aluri created HBASE-10318: -- Summary: generate-hadoopX-poms.sh expects the version to have one extra '-' Key: HBASE-10318 URL: https://issues.apache.org/jira/browse/HBASE-10318 Project: HBase Issue Type: Bug Components: build Affects Versions: 0.98.0 Reporter: Raja Aluri This change is in 0.96 branch, but missing in 0.98. Including the commit that made this [change|https://github.com/apache/hbase/commit/09442ca] -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10318) generate-hadoopX-poms.sh expects the version to have one extra '-'
[ https://issues.apache.org/jira/browse/HBASE-10318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raja Aluri updated HBASE-10318: --- Status: Patch Available (was: Open) generate-hadoopX-poms.sh expects the version to have one extra '-' -- Key: HBASE-10318 URL: https://issues.apache.org/jira/browse/HBASE-10318 Project: HBase Issue Type: Bug Components: build Affects Versions: 0.98.0 Reporter: Raja Aluri Attachments: 0001-HBASE-10318-generate-hadoopX-poms.sh-expects-the-ver.patch This change is in 0.96 branch, but missing in 0.98. Including the commit that made this [change|https://github.com/apache/hbase/commit/09442ca] -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10318) generate-hadoopX-poms.sh expects the version to have one extra '-'
[ https://issues.apache.org/jira/browse/HBASE-10318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raja Aluri updated HBASE-10318: --- Attachment: 0001-HBASE-10318-generate-hadoopX-poms.sh-expects-the-ver.patch generate-hadoopX-poms.sh expects the version to have one extra '-' -- Key: HBASE-10318 URL: https://issues.apache.org/jira/browse/HBASE-10318 Project: HBase Issue Type: Bug Components: build Affects Versions: 0.98.0 Reporter: Raja Aluri Attachments: 0001-HBASE-10318-generate-hadoopX-poms.sh-expects-the-ver.patch This change is in 0.96 branch, but missing in 0.98. Including the commit that made this [change|https://github.com/apache/hbase/commit/09442ca] -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10310) ZNodeCleaner session expired for /hbase/master
[ https://issues.apache.org/jira/browse/HBASE-10310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13868545#comment-13868545 ] Hudson commented on HBASE-10310: FAILURE: Integrated in HBase-TRUNK #4806 (See [https://builds.apache.org/job/HBase-TRUNK/4806/]) HBASE-10310. ZNodeCleaner session expired for /hbase/master (Samir Ahmic) (apurtell: rev 1557273) * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/ZNodeClearer.java ZNodeCleaner session expired for /hbase/master -- Key: HBASE-10310 URL: https://issues.apache.org/jira/browse/HBASE-10310 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.96.1.1 Environment: x86_64 GNU/Linux Reporter: Samir Ahmic Assignee: Samir Ahmic Fix For: 0.98.0, 0.96.2, 0.99.0 Attachments: HBASE-10310.patch I was testing hbase master clear command while working on [HBASE-7386] here is command and exception: {code} $ export HBASE_ZNODE_FILE=/tmp/hbase-hadoop-master.znode; ./hbase master clear 14/01/10 14:05:44 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=zk1:2181 sessionTimeout=9 watcher=clean znode for master, quorum=zk1:2181, baseZNode=/hbase 14/01/10 14:05:44 INFO zookeeper.RecoverableZooKeeper: Process identifier=clean znode for master connecting to ZooKeeper ensemble=zk1:2181 14/01/10 14:05:44 INFO zookeeper.ClientCnxn: Opening socket connection to server zk1/172.17.33.5:2181. 
Will not attempt to authenticate using SASL (Unable to locate a login configuration) 14/01/10 14:05:44 INFO zookeeper.ClientCnxn: Socket connection established to zk11/172.17.33.5:2181, initiating session 14/01/10 14:05:44 INFO zookeeper.ClientCnxn: Session establishment complete on server zk1/172.17.33.5:2181, sessionid = 0x1427a96bfea4a8a, negotiated timeout = 4 14/01/10 14:05:44 INFO zookeeper.ZooKeeper: Session: 0x1427a96bfea4a8a closed 14/01/10 14:05:44 INFO zookeeper.ClientCnxn: EventThread shut down 14/01/10 14:05:44 WARN zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=zk1:2181, exception=org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master 14/01/10 14:05:44 INFO util.RetryCounter: Sleeping 1000ms before retry #0... 14/01/10 14:05:45 WARN zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=zk1:2181, exception=org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master 14/01/10 14:05:45 ERROR zookeeper.RecoverableZooKeeper: ZooKeeper getData failed after 1 attempts 14/01/10 14:05:45 WARN zookeeper.ZKUtil: clean znode for master-0x1427a96bfea4a8a, quorum=zk1:2181, baseZNode=/hbase Unable to get data of znode /hbase/master org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master at org.apache.zookeeper.KeeperException.create(KeeperException.java:127) at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151) at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:337) at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataNoWatch(ZKUtil.java:777) at org.apache.hadoop.hbase.zookeeper.MasterAddressTracker.deleteIfEquals(MasterAddressTracker.java:170) at org.apache.hadoop.hbase.ZNodeClearer.clear(ZNodeClearer.java:160) at 
org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:138) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126) at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2779) 14/01/10 14:05:45 ERROR zookeeper.ZooKeeperWatcher: clean znode for master-0x1427a96bfea4a8a, quorum=zk1:2181, baseZNode=/hbase Received unexpected KeeperException, re-throwing exception org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master at org.apache.zookeeper.KeeperException.create(KeeperException.java:127) at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151) at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:337) at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataNoWatch(ZKUtil.java:777) at org.apache.hadoop.hbase.zookeeper.MasterAddressTracker.deleteIfEquals(MasterAddressTracker.java:170) at org.apache.hadoop.hbase.ZNodeClearer.clear(ZNodeClearer.java:160) at
[jira] [Commented] (HBASE-10310) ZNodeCleaner session expired for /hbase/master
[ https://issues.apache.org/jira/browse/HBASE-10310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13868548#comment-13868548 ] Hudson commented on HBASE-10310: FAILURE: Integrated in hbase-0.96-hadoop2 #172 (See [https://builds.apache.org/job/hbase-0.96-hadoop2/172/]) HBASE-10310. ZNodeCleaner session expired for /hbase/master (Samir Ahmic) (apurtell: rev 1557275) * /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/ZNodeClearer.java ZNodeCleaner session expired for /hbase/master -- Key: HBASE-10310 URL: https://issues.apache.org/jira/browse/HBASE-10310 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.96.1.1 Environment: x86_64 GNU/Linux Reporter: Samir Ahmic Assignee: Samir Ahmic Fix For: 0.98.0, 0.96.2, 0.99.0 Attachments: HBASE-10310.patch I was testing hbase master clear command while working on [HBASE-7386] here is command and exception: {code} $ export HBASE_ZNODE_FILE=/tmp/hbase-hadoop-master.znode; ./hbase master clear 14/01/10 14:05:44 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=zk1:2181 sessionTimeout=9 watcher=clean znode for master, quorum=zk1:2181, baseZNode=/hbase 14/01/10 14:05:44 INFO zookeeper.RecoverableZooKeeper: Process identifier=clean znode for master connecting to ZooKeeper ensemble=zk1:2181 14/01/10 14:05:44 INFO zookeeper.ClientCnxn: Opening socket connection to server zk1/172.17.33.5:2181. 
Will not attempt to authenticate using SASL (Unable to locate a login configuration) 14/01/10 14:05:44 INFO zookeeper.ClientCnxn: Socket connection established to zk11/172.17.33.5:2181, initiating session 14/01/10 14:05:44 INFO zookeeper.ClientCnxn: Session establishment complete on server zk1/172.17.33.5:2181, sessionid = 0x1427a96bfea4a8a, negotiated timeout = 4 14/01/10 14:05:44 INFO zookeeper.ZooKeeper: Session: 0x1427a96bfea4a8a closed 14/01/10 14:05:44 INFO zookeeper.ClientCnxn: EventThread shut down 14/01/10 14:05:44 WARN zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=zk1:2181, exception=org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master 14/01/10 14:05:44 INFO util.RetryCounter: Sleeping 1000ms before retry #0... 14/01/10 14:05:45 WARN zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=zk1:2181, exception=org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master 14/01/10 14:05:45 ERROR zookeeper.RecoverableZooKeeper: ZooKeeper getData failed after 1 attempts 14/01/10 14:05:45 WARN zookeeper.ZKUtil: clean znode for master-0x1427a96bfea4a8a, quorum=zk1:2181, baseZNode=/hbase Unable to get data of znode /hbase/master org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master at org.apache.zookeeper.KeeperException.create(KeeperException.java:127) at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151) at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:337) at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataNoWatch(ZKUtil.java:777) at org.apache.hadoop.hbase.zookeeper.MasterAddressTracker.deleteIfEquals(MasterAddressTracker.java:170) at org.apache.hadoop.hbase.ZNodeClearer.clear(ZNodeClearer.java:160) at 
org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:138) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126) at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2779) 14/01/10 14:05:45 ERROR zookeeper.ZooKeeperWatcher: clean znode for master-0x1427a96bfea4a8a, quorum=zk1:2181, baseZNode=/hbase Received unexpected KeeperException, re-throwing exception org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master at org.apache.zookeeper.KeeperException.create(KeeperException.java:127) at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151) at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:337) at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataNoWatch(ZKUtil.java:777) at org.apache.hadoop.hbase.zookeeper.MasterAddressTracker.deleteIfEquals(MasterAddressTracker.java:170) at org.apache.hadoop.hbase.ZNodeClearer.clear(ZNodeClearer.java:160)
[jira] [Commented] (HBASE-10307) IntegrationTestIngestWithEncryption assumes localhost cluster
[ https://issues.apache.org/jira/browse/HBASE-10307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13868550#comment-13868550 ] Hudson commented on HBASE-10307: FAILURE: Integrated in HBase-TRUNK-on-Hadoop-1.1 #49 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-1.1/49/]) HBASE-10307. IntegrationTestIngestWithEncryption assumes localhost cluster (apurtell: rev 1557219) * /hbase/trunk/hbase-it/src/test/java/org/apache/hadoop/hbase/IntegrationTestIngestWithEncryption.java IntegrationTestIngestWithEncryption assumes localhost cluster - Key: HBASE-10307 URL: https://issues.apache.org/jira/browse/HBASE-10307 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.99.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.98.0, 0.99.0 Attachments: 10307.patch, 10307.patch, 10307.patch We forgot to update IntegrationTestIngestWithEncryption to handle the distributed cluster case. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10278) Provide better write predictability
[ https://issues.apache.org/jira/browse/HBASE-10278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13868551#comment-13868551 ] Himanshu Vashishtha commented on HBASE-10278: - Thanks for reviewing the doc, Liang (and sorry about the delay in replying). True, to handle longer outages (a rack down, for example), we could tune the switching policy to avoid tiny log files (e.g., based on the number of append ops since the last switch). Yes, 300ms is the avg time (the total time for 1k ops was about 30 sec). I didn't really dig into why it is better compared to the single-file scenario, but for me the interesting bit was that about 568/1000 ops took more than a second. Yes, the replication needs to handle two opened files. To minimize the impact on Replication, I am thinking of adding a separate ReplicationSource thread for the second WAL. But I still need to look into whether there is a better way to achieve this. Provide better write predictability --- Key: HBASE-10278 URL: https://issues.apache.org/jira/browse/HBASE-10278 Project: HBase Issue Type: New Feature Reporter: Himanshu Vashishtha Assignee: Himanshu Vashishtha Attachments: Multiwaldesigndoc.pdf Currently, HBase has one WAL per region server. Whenever there is any latency in the write pipeline (due to whatever reasons such as n/w blip, a node in the pipeline having a bad disk, etc), the overall write latency suffers. Jonathan Hsieh and I analyzed various approaches to tackle this issue. We also looked at HBASE-5699, which talks about adding concurrent multi WALs. Along with performance numbers, we also focussed on design simplicity, minimum impact on MTTR Replication, and compatibility with 0.96 and 0.98. Considering all these parameters, we propose a new HLog implementation with WAL Switching functionality. Please find attached the design doc for the same. 
It introduces the WAL Switching feature, along with experiments/results from a prototype implementation showing its benefits. The second goal of this work is to serve as a building block for the concurrent multiple-WALs feature. Please review the doc. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
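The switching idea discussed above can be sketched in a few lines of Java. This is only an illustration of the policy, under assumed names — `SwitchingLog`, `append`, and the threshold parameter are invented, not the design doc's or the patch's API: writes go to an active log, and when a sync stalls past a latency threshold the writer flips to the standby log so tail latency stays bounded.

```java
// Illustrative model of WAL switching: two logs, one active; a slow sync on
// the active log triggers a switch to the other, capping per-append latency.
class SwitchingLog {
    private final String[] logs = {"wal-0", "wal-1"};
    private int active = 0;
    int switches = 0; // how many times we flipped, e.g. for metrics

    /** Decide where the next append goes, given the last observed sync time. */
    String append(long lastSyncLatencyMs, long thresholdMs) {
        if (lastSyncLatencyMs > thresholdMs) {
            active = 1 - active; // switch to the other WAL
            switches++;
        }
        return logs[active];
    }
}
```

A real implementation also has to coordinate with log rolling and, as the comment notes, with replication reading from two open files at once.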
[jira] [Commented] (HBASE-10278) Provide better write predictability
[ https://issues.apache.org/jira/browse/HBASE-10278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13868556#comment-13868556 ] Himanshu Vashishtha commented on HBASE-10278: - As mentioned in the doc, I will work on this feature on a different branch and merge it into trunk when it is ready. I have created a branch at my github (https://github.com/HimanshuVashishtha/hbase/tree/HBASE-10278). Provide better write predictability --- Key: HBASE-10278 URL: https://issues.apache.org/jira/browse/HBASE-10278 Project: HBase Issue Type: New Feature Reporter: Himanshu Vashishtha Assignee: Himanshu Vashishtha Attachments: Multiwaldesigndoc.pdf Currently, HBase has one WAL per region server. Whenever there is any latency in the write pipeline (due to whatever reasons such as n/w blip, a node in the pipeline having a bad disk, etc), the overall write latency suffers. Jonathan Hsieh and I analyzed various approaches to tackle this issue. We also looked at HBASE-5699, which talks about adding concurrent multi WALs. Along with performance numbers, we also focussed on design simplicity, minimum impact on MTTR Replication, and compatibility with 0.96 and 0.98. Considering all these parameters, we propose a new HLog implementation with WAL Switching functionality. Please find attached the design doc for the same. It introduces the WAL Switching feature, and experiments/results of a prototype implementation, showing the benefits of this feature. The second goal of this work is to serve as a building block for concurrent multiple WALs feature. Please review the doc. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HBASE-10319) HLog should roll periodically to allow DN decommission to eventually complete.
Jonathan Hsieh created HBASE-10319: -- Summary: HLog should roll periodically to allow DN decommission to eventually complete. Key: HBASE-10319 URL: https://issues.apache.org/jira/browse/HBASE-10319 Project: HBase Issue Type: Bug Reporter: Jonathan Hsieh We encountered a situation where we had an essentially read-only table and attempted to do a clean HDFS DN decommission. DNs cannot decommission if they hold blocks that are currently open for write. Because the HBase HLog file was open and had some data (the HLog header), the DN could not decommission itself. Since no new data is ever written, the existing periodic check is not activated. After discussing with [~atm], it seems that although an HDFS semantics change would be ideal (e.g. HBase wouldn't have to be aware of HDFS decommission and the client would roll over), this would take much more effort than having HBase periodically force a log roll. This would allow the HDFS DN decommission to complete. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
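The proposed remedy is simply a time-based roll that fires whether or not new edits arrive, so an idle-but-open log file is eventually closed and its blocks finalized. A self-contained Java sketch of that scheduling pattern — `PeriodicRoller` and `rollLog()` are invented stand-ins for illustration, not HBase's HLog API:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Roll the log on a fixed timer, independent of write activity; in HBase this
// would close the current writer and open a new file, letting the old blocks
// finalize so a DataNode decommission can proceed.
class PeriodicRoller {
    static int rollCount = 0;

    static void rollLog() { rollCount++; } // stand-in for a real log roll

    /** Run the timer until it has fired `rolls` times, then stop it. */
    static void runFor(int rolls, long periodMs) {
        CountDownLatch done = new CountDownLatch(rolls);
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        ses.scheduleAtFixedRate(() -> { rollLog(); done.countDown(); },
            periodMs, periodMs, TimeUnit.MILLISECONDS);
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        ses.shutdownNow();
    }
}
```

In the real system the period would be a configuration value large enough (minutes to hours) not to produce a flood of tiny log files.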