[jira] [Commented] (HBASE-5787) Table owner can't disable/delete its own table
[ https://issues.apache.org/jira/browse/HBASE-5787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256747#comment-13256747 ] Matteo Bertozzi commented on HBASE-5787: any additional thoughts or comments on the patch? Table owner can't disable/delete its own table -- Key: HBASE-5787 URL: https://issues.apache.org/jira/browse/HBASE-5787 Project: HBase Issue Type: Bug Components: security Affects Versions: 0.92.1, 0.94.0, 0.96.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Minor Labels: acl, security Attachments: HBASE-5787-tests-wrong-names.patch, HBASE-5787-v0.patch, HBASE-5787-v1.patch An user with CREATE privileges can create a table, but can not disable it, because disable operation require ADMIN privileges. Also if a table is already disabled, anyone can remove it. {code} public void preDeleteTable(ObserverContextMasterCoprocessorEnvironment c, byte[] tableName) throws IOException { requirePermission(Permission.Action.CREATE); } public void preDisableTable(ObserverContextMasterCoprocessorEnvironment c, byte[] tableName) throws IOException { /* TODO: Allow for users with global CREATE permission and the table owner */ requirePermission(Permission.Action.ADMIN); } {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5787) Table owner can't disable/delete its own table
[ https://issues.apache.org/jira/browse/HBASE-5787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256785#comment-13256785 ] Matteo Bertozzi commented on HBASE-5787: except for a getConf() - getConfiguration() change nothing is changed in AccessController between 0.92 and trunk Table owner can't disable/delete its own table -- Key: HBASE-5787 URL: https://issues.apache.org/jira/browse/HBASE-5787 Project: HBase Issue Type: Bug Components: security Affects Versions: 0.92.1, 0.94.0, 0.96.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Minor Labels: acl, security Fix For: 0.92.2, 0.94.0, 0.96.0 Attachments: HBASE-5787-tests-wrong-names.patch, HBASE-5787-v0.patch, HBASE-5787-v1.patch An user with CREATE privileges can create a table, but can not disable it, because disable operation require ADMIN privileges. Also if a table is already disabled, anyone can remove it. {code} public void preDeleteTable(ObserverContextMasterCoprocessorEnvironment c, byte[] tableName) throws IOException { requirePermission(Permission.Action.CREATE); } public void preDisableTable(ObserverContextMasterCoprocessorEnvironment c, byte[] tableName) throws IOException { /* TODO: Allow for users with global CREATE permission and the table owner */ requirePermission(Permission.Action.ADMIN); } {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5792) HLog Performance Evaluation Tool
[ https://issues.apache.org/jira/browse/HBASE-5792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13254654#comment-13254654 ] Matteo Bertozzi commented on HBASE-5792: @Jonathan the previous version had a HBaseAdmin call at startup because the tool contains also a Put test, but the hbase up si not needed. HLog Performance Evaluation Tool Key: HBASE-5792 URL: https://issues.apache.org/jira/browse/HBASE-5792 Project: HBase Issue Type: Test Components: wal Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Minor Labels: performance, wal Attachments: HBASE-5792-v0.patch, HBASE-5792-v1.patch Related to HDFS-3280 and the HBase WAL slowdown on 0.23+ It would be nice to have a simple tool like HFilePerformanceEvaluation, ... to be able to check easily the HLog performance. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5792) HLog Performance Evaluation Tool
[ https://issues.apache.org/jira/browse/HBASE-5792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13254428#comment-13254428 ] Matteo Bertozzi commented on HBASE-5792: Don't know if it makes sense catch the InterruptException, since we're trying to test write performance with nThreads... if we lost one thread, the test doesn't reflect what we've asked for. HLog Performance Evaluation Tool Key: HBASE-5792 URL: https://issues.apache.org/jira/browse/HBASE-5792 Project: HBase Issue Type: Test Components: wal Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Minor Labels: performance, wal Attachments: HBASE-5792-v0.patch, HBASE-5792-v1.patch Related to HDFS-3280 and the HBase WAL slowdown on 0.23+ It would be nice to have a simple tool like HFilePerformanceEvaluation, ... to be able to check easily the HLog performance. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available
[ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13250868#comment-13250868 ] Matteo Bertozzi commented on HBASE-5666: The SOCKET_RETRY_WAIT_MS is 200ms but yes, is better sleeping with interrupt since the code can accept interrupt. The only real difference is that you've to wait the timeout if you want kill the inizialization. The retry loop is tricky to understand since RecoverableZookeeper is used... So if you give 0 as timeout, you're supposed to try once... but recoverableZookeeper.exists() retries in case of CONNECTIONLOSS, SESSIONEXPIRED and OPERATIONTIMEOUT. The idea here is to retry for x millisec until znode become available while (recoverableZookeeper.exists() == null) If the client comes up during this time I think that should crash anyway because the HRegion is still in the initialize() method... RegionServer doesn't retry to check if base node is available - Key: HBASE-5666 URL: https://issues.apache.org/jira/browse/HBASE-5666 Project: HBase Issue Type: Bug Components: regionserver, zookeeper Affects Versions: 0.92.1, 0.94.0, 0.96.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Attachments: HBASE-5666-v1.patch, HBASE-5666-v2.patch, HBASE-5666-v3.patch, HBASE-5666-v4.patch, HBASE-5666-v5.patch, HBASE-5666-v6.patch, HBASE-5666-v7.patch, hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log I've a script that starts hbase and a couple of region servers in distributed mode (hbase.cluster.distributed = true) {code} $HBASE_HOME/bin/start-hbase.sh $HBASE_HOME/bin/local-regionservers.sh start 1 2 3 {code} but the region servers are not able to start... It seems that during the RS start the the znode is still not available, and HRegionServer.initializeZooKeeper() check just once if the base not is available. {code} 2012-03-28 21:54:05,013 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master. 2012-03-28 21:54:08,598 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server localhost,60202,133296824: Initialization of RS failed. Hence aborting RS. java.io.IOException: Received the shutdown message while waiting. at org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626) at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596) at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672) at java.lang.Thread.run(Thread.java:662) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available
[ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13250907#comment-13250907 ] Matteo Bertozzi commented on HBASE-5666: {quote} Does this need to be public? + public boolean checkIfBaseNodeAvailable(int timeout) {? {quote} ZooKeeperNodeTracker.checkIfBaseNodeAvailable(timeout) is used by the HRegion so needs to be public RegionServer doesn't retry to check if base node is available - Key: HBASE-5666 URL: https://issues.apache.org/jira/browse/HBASE-5666 Project: HBase Issue Type: Bug Components: regionserver, zookeeper Affects Versions: 0.92.1, 0.94.0, 0.96.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Attachments: HBASE-5666-v1.patch, HBASE-5666-v2.patch, HBASE-5666-v3.patch, HBASE-5666-v4.patch, HBASE-5666-v5.patch, HBASE-5666-v6.patch, HBASE-5666-v7.patch, hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log I've a script that starts hbase and a couple of region servers in distributed mode (hbase.cluster.distributed = true) {code} $HBASE_HOME/bin/start-hbase.sh $HBASE_HOME/bin/local-regionservers.sh start 1 2 3 {code} but the region servers are not able to start... It seems that during the RS start the the znode is still not available, and HRegionServer.initializeZooKeeper() check just once if the base not is available. {code} 2012-03-28 21:54:05,013 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master. 2012-03-28 21:54:08,598 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server localhost,60202,133296824: Initialization of RS failed. Hence aborting RS. java.io.IOException: Received the shutdown message while waiting. at org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626) at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596) at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672) at java.lang.Thread.run(Thread.java:662) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available
[ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13250933#comment-13250933 ] Matteo Bertozzi commented on HBASE-5666: @Stack The client crash with this message (easy way to test put a long sleep in ZKUtil.createAndFailSilent() on /hbase node) {code} 12/04/10 20:29:36 ERROR client.HConnectionManager$HConnectionImplementation: The node /hbase is not in ZooKeeper. It should have been written by the master. Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master. {code} RegionServer doesn't retry to check if base node is available - Key: HBASE-5666 URL: https://issues.apache.org/jira/browse/HBASE-5666 Project: HBase Issue Type: Bug Components: regionserver, zookeeper Affects Versions: 0.92.1, 0.94.0, 0.96.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Attachments: HBASE-5666-v1.patch, HBASE-5666-v2.patch, HBASE-5666-v3.patch, HBASE-5666-v4.patch, HBASE-5666-v5.patch, HBASE-5666-v6.patch, HBASE-5666-v7.patch, hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log I've a script that starts hbase and a couple of region servers in distributed mode (hbase.cluster.distributed = true) {code} $HBASE_HOME/bin/start-hbase.sh $HBASE_HOME/bin/local-regionservers.sh start 1 2 3 {code} but the region servers are not able to start... It seems that during the RS start the the znode is still not available, and HRegionServer.initializeZooKeeper() check just once if the base not is available. {code} 2012-03-28 21:54:05,013 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master. 2012-03-28 21:54:08,598 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server localhost,60202,133296824: Initialization of RS failed. Hence aborting RS. java.io.IOException: Received the shutdown message while waiting. at org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626) at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596) at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672) at java.lang.Thread.run(Thread.java:662) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available
[ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13251100#comment-13251100 ] Matteo Bertozzi commented on HBASE-5666: @stack do you mean, change the log message in HConnectionManager? {code} Index: src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java === --- src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java (revision 1311874) +++ src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java (working copy) @@ -776,6 +776,7 @@ if (ZKUtil.checkExists(zkw, zkw.baseZNode) == -1) { errorMsg = The node + zkw.baseZNode+ is not in ZooKeeper. + It should have been written by the master. ++ Check the master is running? + Check the value configured in 'zookeeper.znode.parent'. + There could be a mismatch with the one configured in the master.; LOG.error(errorMsg); {code} RegionServer doesn't retry to check if base node is available - Key: HBASE-5666 URL: https://issues.apache.org/jira/browse/HBASE-5666 Project: HBase Issue Type: Bug Components: regionserver, zookeeper Affects Versions: 0.92.1, 0.94.0, 0.96.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Attachments: HBASE-5666-0.92.patch, HBASE-5666-v1.patch, HBASE-5666-v2.patch, HBASE-5666-v3.patch, HBASE-5666-v4.patch, HBASE-5666-v5.patch, HBASE-5666-v6.patch, HBASE-5666-v7.patch, hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log I've a script that starts hbase and a couple of region servers in distributed mode (hbase.cluster.distributed = true) {code} $HBASE_HOME/bin/start-hbase.sh $HBASE_HOME/bin/local-regionservers.sh start 1 2 3 {code} but the region servers are not able to start... It seems that during the RS start the the znode is still not available, and HRegionServer.initializeZooKeeper() check just once if the base not is available. {code} 2012-03-28 21:54:05,013 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master. 2012-03-28 21:54:08,598 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server localhost,60202,133296824: Initialization of RS failed. Hence aborting RS. java.io.IOException: Received the shutdown message while waiting. at org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626) at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596) at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672) at java.lang.Thread.run(Thread.java:662) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available
[ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13248666#comment-13248666 ] Matteo Bertozzi commented on HBASE-5666: Since the logical flow is: * HMaster start and create the base node * Region Servers wait for base node to be available I prefer to use HBASE-5666-v6.patch adding the retry logic to ZKUtil.checkExists() in this way we've the wait for base node step. In the other case (@stack: fixing the ZookeeperWatcher) the first (master|RS) that arrive create the base node... but doesn't sounds right. RegionServer doesn't retry to check if base node is available - Key: HBASE-5666 URL: https://issues.apache.org/jira/browse/HBASE-5666 Project: HBase Issue Type: Bug Components: regionserver, zookeeper Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Attachments: HBASE-5666-v1.patch, HBASE-5666-v2.patch, HBASE-5666-v3.patch, HBASE-5666-v4.patch, HBASE-5666-v5.patch, HBASE-5666-v6.patch, hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log I've a script that starts hbase and a couple of region servers in distributed mode (hbase.cluster.distributed = true) {code} $HBASE_HOME/bin/start-hbase.sh $HBASE_HOME/bin/local-regionservers.sh start 1 2 3 {code} but the region servers are not able to start... It seems that during the RS start the the znode is still not available, and HRegionServer.initializeZooKeeper() check just once if the base not is available. {code} 2012-03-28 21:54:05,013 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master. 2012-03-28 21:54:08,598 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server localhost,60202,133296824: Initialization of RS failed. Hence aborting RS. java.io.IOException: Received the shutdown message while waiting. at org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626) at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596) at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672) at java.lang.Thread.run(Thread.java:662) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available
[ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245720#comment-13245720 ] Matteo Bertozzi commented on HBASE-5666: Still looking at the 0.90 code... The new ZooKeeperWatcher (=0.92) calls the ZKUtil.createAndFailSilent(), to create base node and others, only if called by HMaster (canCreateBaseZNode = true), while before the code path was the same for everyone. So now, if HMaster has not reached the create base node point, before the HRegionServer checks the existence of base node... the region server crashes... If we want to keep the previous logic, the first one that arrives create the base node co, we can remove the canCreateBaseZNode flag, else we can use HBASE-5666-v4.patch to wait and retry on checkExists(). what do you think? RegionServer doesn't retry to check if base node is available - Key: HBASE-5666 URL: https://issues.apache.org/jira/browse/HBASE-5666 Project: HBase Issue Type: Bug Components: regionserver, zookeeper Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Attachments: HBASE-5666-v1.patch, HBASE-5666-v2.patch, HBASE-5666-v3.patch, HBASE-5666-v4.patch, hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log I've a script that starts hbase and a couple of region servers in distributed mode (hbase.cluster.distributed = true) {code} $HBASE_HOME/bin/start-hbase.sh $HBASE_HOME/bin/local-regionservers.sh start 1 2 3 {code} but the region servers are not able to start... It seems that during the RS start the the znode is still not available, and HRegionServer.initializeZooKeeper() check just once if the base not is available. {code} 2012-03-28 21:54:05,013 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master. 2012-03-28 21:54:08,598 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server localhost,60202,133296824: Initialization of RS failed. Hence aborting RS. java.io.IOException: Received the shutdown message while waiting. at org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626) at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596) at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672) at java.lang.Thread.run(Thread.java:662) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available
[ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245758#comment-13245758 ] Matteo Bertozzi commented on HBASE-5666: The problem that I see with the patch v4 and the if (timeout == 0) special case is that exists() is different for ZooKeeper and RecoverableZookeeper. RecoverableZookeeper has some internal retry logic for CONNECTIONLOSS, SESSIONEXPIRED, and OPERATIONTIMEOUT, to keep the code simple we can add this logic in ZKUtil.checkExist() in this way we can remove the special case, and remove the code in RecoverabeZK. RegionServer doesn't retry to check if base node is available - Key: HBASE-5666 URL: https://issues.apache.org/jira/browse/HBASE-5666 Project: HBase Issue Type: Bug Components: regionserver, zookeeper Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Attachments: HBASE-5666-v1.patch, HBASE-5666-v2.patch, HBASE-5666-v3.patch, HBASE-5666-v4.patch, hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log I've a script that starts hbase and a couple of region servers in distributed mode (hbase.cluster.distributed = true) {code} $HBASE_HOME/bin/start-hbase.sh $HBASE_HOME/bin/local-regionservers.sh start 1 2 3 {code} but the region servers are not able to start... It seems that during the RS start the the znode is still not available, and HRegionServer.initializeZooKeeper() check just once if the base not is available. {code} 2012-03-28 21:54:05,013 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master. 2012-03-28 21:54:08,598 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server localhost,60202,133296824: Initialization of RS failed. Hence aborting RS. java.io.IOException: Received the shutdown message while waiting. at org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626) at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596) at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672) at java.lang.Thread.run(Thread.java:662) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available
[ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245759#comment-13245759 ] Matteo Bertozzi commented on HBASE-5666: The problem that I see with the patch v4 and the if (timeout == 0) special case is that exists() is different for ZooKeeper and RecoverableZookeeper. RecoverableZookeeper has some internal retry logic for CONNECTIONLOSS, SESSIONEXPIRED, and OPERATIONTIMEOUT, to keep the code simple we can add this logic in ZKUtil.checkExist() in this way we can remove the special case, and remove the code in RecoverabeZK. RegionServer doesn't retry to check if base node is available - Key: HBASE-5666 URL: https://issues.apache.org/jira/browse/HBASE-5666 Project: HBase Issue Type: Bug Components: regionserver, zookeeper Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Attachments: HBASE-5666-v1.patch, HBASE-5666-v2.patch, HBASE-5666-v3.patch, HBASE-5666-v4.patch, hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log I've a script that starts hbase and a couple of region servers in distributed mode (hbase.cluster.distributed = true) {code} $HBASE_HOME/bin/start-hbase.sh $HBASE_HOME/bin/local-regionservers.sh start 1 2 3 {code} but the region servers are not able to start... It seems that during the RS start the the znode is still not available, and HRegionServer.initializeZooKeeper() check just once if the base not is available. {code} 2012-03-28 21:54:05,013 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master. 2012-03-28 21:54:08,598 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server localhost,60202,133296824: Initialization of RS failed. Hence aborting RS. java.io.IOException: Received the shutdown message while waiting. at org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626) at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596) at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672) at java.lang.Thread.run(Thread.java:662) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available
[ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13244488#comment-13244488 ] Matteo Bertozzi commented on HBASE-5666: The problem here is that there's no ConnectionLossException... if you take a look at the log you can see that there's no KeeperException but zookeeper respond that the base node doesn't exists. RegionServer doesn't retry to check if base node is available - Key: HBASE-5666 URL: https://issues.apache.org/jira/browse/HBASE-5666 Project: HBase Issue Type: Bug Components: regionserver, zookeeper Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Attachments: HBASE-5666-v1.patch, HBASE-5666-v2.patch, HBASE-5666-v3.patch, HBASE-5666-v4.patch, hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log I've a script that starts hbase and a couple of region servers in distributed mode (hbase.cluster.distributed = true) {code} $HBASE_HOME/bin/start-hbase.sh $HBASE_HOME/bin/local-regionservers.sh start 1 2 3 {code} but the region servers are not able to start... It seems that during the RS start the the znode is still not available, and HRegionServer.initializeZooKeeper() check just once if the base not is available. {code} 2012-03-28 21:54:05,013 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master. 2012-03-28 21:54:08,598 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server localhost,60202,133296824: Initialization of RS failed. Hence aborting RS. java.io.IOException: Received the shutdown message while waiting. at org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626) at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596) at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672) at java.lang.Thread.run(Thread.java:662) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available
[ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13244537#comment-13244537 ] Matteo Bertozzi commented on HBASE-5666: m... maybe i've lost something but, in 0.92 and trunk that code was removed and there's just a call to ZKUtil.createAndFailSilent() that doesn't retry. Any idea? https://github.com/apache/hbase/commit/6dc7ccf3779add13188bd73011e0d25bbab77a05 RegionServer doesn't retry to check if base node is available - Key: HBASE-5666 URL: https://issues.apache.org/jira/browse/HBASE-5666 Project: HBase Issue Type: Bug Components: regionserver, zookeeper Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Attachments: HBASE-5666-v1.patch, HBASE-5666-v2.patch, HBASE-5666-v3.patch, HBASE-5666-v4.patch, hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log I've a script that starts hbase and a couple of region servers in distributed mode (hbase.cluster.distributed = true) {code} $HBASE_HOME/bin/start-hbase.sh $HBASE_HOME/bin/local-regionservers.sh start 1 2 3 {code} but the region servers are not able to start... It seems that during the RS start the the znode is still not available, and HRegionServer.initializeZooKeeper() check just once if the base not is available. {code} 2012-03-28 21:54:05,013 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master. 2012-03-28 21:54:08,598 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server localhost,60202,133296824: Initialization of RS failed. Hence aborting RS. java.io.IOException: Received the shutdown message while waiting. at org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626) at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596) at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672) at java.lang.Thread.run(Thread.java:662) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5665) Repeated split causes HRegionServer failures and breaks table
[ https://issues.apache.org/jira/browse/HBASE-5665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243771#comment-13243771 ] Matteo Bertozzi commented on HBASE-5665: Can we also add a couple of methods to the region like isSplittable() and isAvailable() {code} boolean isAvailable() { return !isClosed() !isClosing(); } boolean isSplittable() { return isAvailable() !hasReferences(); } {code} just to avoid similar problems in future... For example in HRegionServer both getMostLoadedRegions() and closeUserRegions() does the same isAvailable() check... Repeated split causes HRegionServer failures and breaks table -- Key: HBASE-5665 URL: https://issues.apache.org/jira/browse/HBASE-5665 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.92.0, 0.92.1 Reporter: Cosmin Lehene Assignee: Cosmin Lehene Priority: Blocker Attachments: HBASE-5665-0.92.patch Repeated splits on large tables (2 consecutive would suffice) will essentially break the table (and the cluster), unrecoverable. The regionserver doing the split dies and the master will get into an infinite loop trying to assign regions that seem to have the files missing from HDFS. The table can be disabled once. upon trying to re-enable it, it will remain in an intermediary state forever. I was able to reproduce this on a smaller table consistently. {code} hbase(main):030:0 (0..1).each{|x| put 't1', #{x}, 'f1:t', 'dd'} hbase(main):030:0 (0..1000).each{|x| split 't1', #{x*10}} {code} Running overlapping splits in parallel (e.g. #{x*10+1}, #{x*10+2}... ) will reproduce the issue almost instantly and consistently. {code} 2012-03-28 10:57:16,320 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Offlined parent region t1,,1332957435767.2fb0473f4e71339e88dab0ee0d4dffa1. in META 2012-03-28 10:57:16,321 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Split requested for t1,5,1332957435767.648d30de55a5cec6fc2f56dcb3c7eee1.. compaction_queue=(0:1), split_queue=10 2012-03-28 10:57:16,343 INFO org.apache.hadoop.hbase.regionserver.SplitRequest: Running rollback/cleanup of failed split of t1,,1332957435767.2fb0473f4e71339e88dab0ee0d4dffa1.; Failed ld2,60020,1332957343833-daughterOpener=2469c5650ea2aeed631eb85d3cdc3124 java.io.IOException: Failed ld2,60020,1332957343833-daughterOpener=2469c5650ea2aeed631eb85d3cdc3124 at org.apache.hadoop.hbase.regionserver.SplitTransaction.openDaughters(SplitTransaction.java:363) at org.apache.hadoop.hbase.regionserver.SplitTransaction.execute(SplitTransaction.java:451) at org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:67) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: java.io.FileNotFoundException: File does not exist: /hbase/t1/589c44cabba419c6ad8c9b427e5894e3.2fb0473f4e71339e88dab0ee0d4dffa1/f1/d62a852c25ad44e09518e102ca557237 at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1822) at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.init(DFSClient.java:1813) at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:544) at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:187) at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:456) at org.apache.hadoop.hbase.io.hfile.HFile.createReader(HFile.java:341) at org.apache.hadoop.hbase.regionserver.StoreFile$Reader.init(StoreFile.java:1008) at org.apache.hadoop.hbase.io.HalfStoreFileReader.init(HalfStoreFileReader.java:65) at org.apache.hadoop.hbase.regionserver.StoreFile.open(StoreFile.java:467) at org.apache.hadoop.hbase.regionserver.StoreFile.createReader(StoreFile.java:548) at org.apache.hadoop.hbase.regionserver.Store.loadStoreFiles(Store.java:284) at org.apache.hadoop.hbase.regionserver.Store.init(Store.java:221) at org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:2511) at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:450) at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:3229) at org.apache.hadoop.hbase.regionserver.SplitTransaction.openDaughterRegion(SplitTransaction.java:504) at org.apache.hadoop.hbase.regionserver.SplitTransaction$DaughterOpener.run(SplitTransaction.java:484) ... 1 more 2012-03-28 10:57:16,345 FATAL
[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available
[ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243855#comment-13243855 ] Matteo Bertozzi commented on HBASE-5666: @Ted woo good catch I've just translated the method without thinking... and this simplified version emphasizes a problem already present in the previous version. If you take a look at the original version, (the LOG.info is under if, ok) but what happens if the method return and the znode is not available? no exception is raised... but I think that the caller of waitForXyz() expect some exception in case of timeout, in the other case the value that I'm looking for must be present... (this function is just called by one test) RegionServer doesn't retry to check if base node is available - Key: HBASE-5666 URL: https://issues.apache.org/jira/browse/HBASE-5666 Project: HBase Issue Type: Bug Components: regionserver, zookeeper Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Attachments: HBASE-5666-v1.patch, hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log I've a script that starts hbase and a couple of region servers in distributed mode (hbase.cluster.distributed = true) {code} $HBASE_HOME/bin/start-hbase.sh $HBASE_HOME/bin/local-regionservers.sh start 1 2 3 {code} but the region servers are not able to start... It seems that during the RS start the the znode is still not available, and HRegionServer.initializeZooKeeper() check just once if the base not is available. {code} 2012-03-28 21:54:05,013 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master. 2012-03-28 21:54:08,598 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server localhost,60202,133296824: Initialization of RS failed. Hence aborting RS. java.io.IOException: Received the shutdown message while waiting. at org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626) at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596) at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672) at java.lang.Thread.run(Thread.java:662) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available
[ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243236#comment-13243236 ] Matteo Bertozzi commented on HBASE-5666: diving into the code, I've also noticed that there's a ZKUtil.waitForBaseZNode() that has already the retry logic but: - It takes a Configuration object - The timeout is set internally to 1ms - It's only used by test/.../hbase/util/ProcessBasedLocalHBaseCluster.java RegionServer doesn't retry to check if base node is available - Key: HBASE-5666 URL: https://issues.apache.org/jira/browse/HBASE-5666 Project: HBase Issue Type: Bug Components: regionserver, zookeeper Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Attachments: hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log I've a script that starts hbase and a couple of region servers in distributed mode (hbase.cluster.distributed = true) {code} $HBASE_HOME/bin/start-hbase.sh $HBASE_HOME/bin/local-regionservers.sh start 1 2 3 {code} but the region servers are not able to start... It seems that during the RS start the the znode is still not available, and HRegionServer.initializeZooKeeper() check just once if the base not is available. {code} 2012-03-28 21:54:05,013 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master. 2012-03-28 21:54:08,598 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server localhost,60202,133296824: Initialization of RS failed. Hence aborting RS. java.io.IOException: Received the shutdown message while waiting. at org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626) at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596) at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672) at java.lang.Thread.run(Thread.java:662) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available
[ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243302#comment-13243302 ] Matteo Bertozzi commented on HBASE-5666: I was thinking to patch ZooKeeperNodeTracker.checkIfBaseNodeAvailable() and HConnectionImplementation.checkIfBaseNodeAvailable() instead of only the HRegionServer... what do you think? RegionServer doesn't retry to check if base node is available - Key: HBASE-5666 URL: https://issues.apache.org/jira/browse/HBASE-5666 Project: HBase Issue Type: Bug Components: regionserver, zookeeper Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Attachments: hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log, zk-exists-refactor-v0.patch I've a script that starts hbase and a couple of region servers in distributed mode (hbase.cluster.distributed = true) {code} $HBASE_HOME/bin/start-hbase.sh $HBASE_HOME/bin/local-regionservers.sh start 1 2 3 {code} but the region servers are not able to start... It seems that during the RS start the the znode is still not available, and HRegionServer.initializeZooKeeper() check just once if the base not is available. {code} 2012-03-28 21:54:05,013 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master. 2012-03-28 21:54:08,598 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server localhost,60202,133296824: Initialization of RS failed. Hence aborting RS. java.io.IOException: Received the shutdown message while waiting. at org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626) at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596) at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672) at java.lang.Thread.run(Thread.java:662) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available
[ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13242734#comment-13242734 ] Matteo Bertozzi commented on HBASE-5666: The exception seems to be present in 0.92 and trunk, but I've only looked at trunk code RegionServer doesn't retry to check if base node is available - Key: HBASE-5666 URL: https://issues.apache.org/jira/browse/HBASE-5666 Project: HBase Issue Type: Bug Components: regionserver, zookeeper Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Attachments: hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log I've a script that starts hbase and a couple of region servers in distributed mode (hbase.cluster.distributed = true) {code} $HBASE_HOME/bin/start-hbase.sh $HBASE_HOME/bin/local-regionservers.sh start 1 2 3 {code} but the region servers are not able to start... It seems that during the RS start the the znode is still not available, and HRegionServer.initializeZooKeeper() check just once if the base not is available. {code} 2012-03-28 21:54:05,013 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master. 2012-03-28 21:54:08,598 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server localhost,60202,133296824: Initialization of RS failed. Hence aborting RS. java.io.IOException: Received the shutdown message while waiting. at org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626) at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596) at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672) at java.lang.Thread.run(Thread.java:662) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5681) Split Region crash if region is still offline after a previous split
[ https://issues.apache.org/jira/browse/HBASE-5681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13242736#comment-13242736 ] Matteo Bertozzi commented on HBASE-5681: quite similar, i get both exception. The FileNotFound of HBASE-5665 and NotServingRegionException. I'll retry with 5665 patch. Split Region crash if region is still offline after a previous split Key: HBASE-5681 URL: https://issues.apache.org/jira/browse/HBASE-5681 Project: HBase Issue Type: Bug Components: regionserver, zookeeper Affects Versions: 0.92.1, 0.96.0, 0.94.1 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Minor Attachments: logs0-HBASE-5681.tar.bz2, logs1-HBASE-5681.tar.bz2 I've a script that starts hbase and a couple of region servers in distributed mode (hbase.cluster.distributed = true) due to HBASE-5666 I need a sleep to ensure that rs are up. {code} $HBASE_HOME/bin/start-hbase.sh sleep 5 # bug HBASE-5666 rs doesn't retry if znode is not available. $HBASE_HOME/bin/local-regionservers.sh start 1 2 3 {code} Once hbase is started I run an hbase shell script file (see below) everything is fine till last split operation. {code} # $HBASE_HOME/bin/hbase shell test.hbase # test.hbase create 'bugtb-t1', 'tcf11', 'tcf12' create 'bugtb-t2', 'tcf11', 'tcf12' put 'bugtb-t1', '10', 'tcf11:c1', 'a' put 'bugtb-t1', '15', 'tcf11:c2', 'b' put 'bugtb-t1', '20', 'tcf11:c1', 'c' put 'bugtb-t1', '30', 'tcf11:c2', 'd' put 'bugtb-t1', '35', 'tcf11:c1', 'e' put 'bugtb-t1', '40', 'tcf11:c2', 'f' put 'bugtb-t2', '10', 'tcf11:c1', 'a' put 'bugtb-t2', '15', 'tcf11:c2', 'b' put 'bugtb-t2', '20', 'tcf11:c1', 'c' put 'bugtb-t2', '30', 'tcf11:c2', 'd' put 'bugtb-t2', '35', 'tcf11:c1', 'e' put 'bugtb-t2', '40', 'tcf11:c2', 'f' split 'bugtb-t1', '20' split 'bugtb-t2', '20' split 'bugtb-t1', '40' {code} During the last split the region is still offline, and you get an exception (If you sleep a bit before executing the last split, everything is fine) {code} ERROR: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hbase.NotServingRegionException: Region is not online: bugtb-t1,,1333134892936.4e14c2cf4293156d5b099dc3d5c44890. at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3123) at org.apache.hadoop.hbase.regionserver.HRegionServer.splitRegion(HRegionServer.java:2926) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1383) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available
[ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13242801#comment-13242801 ] Matteo Bertozzi commented on HBASE-5666: yep, just checked and the code seems unchanged from trunk to 0.92... Anyone has a different idea on that? or I can try to implement a retry loop? RegionServer doesn't retry to check if base node is available - Key: HBASE-5666 URL: https://issues.apache.org/jira/browse/HBASE-5666 Project: HBase Issue Type: Bug Components: regionserver, zookeeper Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Attachments: hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log I've a script that starts hbase and a couple of region servers in distributed mode (hbase.cluster.distributed = true) {code} $HBASE_HOME/bin/start-hbase.sh $HBASE_HOME/bin/local-regionservers.sh start 1 2 3 {code} but the region servers are not able to start... It seems that during the RS start the the znode is still not available, and HRegionServer.initializeZooKeeper() check just once if the base not is available. {code} 2012-03-28 21:54:05,013 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master. 2012-03-28 21:54:08,598 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server localhost,60202,133296824: Initialization of RS failed. Hence aborting RS. java.io.IOException: Received the shutdown message while waiting. at org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626) at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596) at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672) at java.lang.Thread.run(Thread.java:662) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available
[ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13240757#comment-13240757 ] Matteo Bertozzi commented on HBASE-5666: @Zhihong Yu: Probably a retry loop is the simplest solution... But retry for how long? an infinite loop hoping that the node become available seems wrong, maybe we can add another parameter to define a ZK_RETRY_TIMEOUT. RegionServer doesn't retry to check if base node is available - Key: HBASE-5666 URL: https://issues.apache.org/jira/browse/HBASE-5666 Project: HBase Issue Type: Bug Components: regionserver, zookeeper Reporter: Matteo Bertozzi Attachments: hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log I've a script that starts hbase and a couple of region servers in distributed mode (hbase.cluster.distributed = true) {code} $HBASE_HOME/bin/start-hbase.sh $HBASE_HOME/bin/local-regionservers.sh start 1 2 3 {code} but the region servers are not able to start... It seems that during the RS start the the znode is still not available, and HRegionServer.initializeZooKeeper() check just once if the base not is available. {code} 2012-03-28 21:54:05,013 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master. 2012-03-28 21:54:08,598 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server localhost,60202,133296824: Initialization of RS failed. Hence aborting RS. java.io.IOException: Received the shutdown message while waiting. at org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626) at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596) at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672) at java.lang.Thread.run(Thread.java:662) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3929) Add option to HFile tool to produce basic stats
[ https://issues.apache.org/jira/browse/HBASE-3929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13129088#comment-13129088 ] Matteo Bertozzi commented on HBASE-3929: Currently HFilePrettyPrinter raise a couple of exceptions if the HFile is Empty, just because it doesn't check if seekTo() returns true or false, and the first call after seekTo() is a scanner.getKeyValue() so you get a NPE... I've added a v2 patch with the pkv rename, count == 0 handled, and seekTo checked to fix the NPE. Add option to HFile tool to produce basic stats --- Key: HBASE-3929 URL: https://issues.apache.org/jira/browse/HBASE-3929 Project: HBase Issue Type: New Feature Components: io Affects Versions: 0.92.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: 0.94.0 Attachments: HBASE-3929-v2.patch, hbase-3929-draft.patch, hbase-3929-draft.txt In looking at HBASE-3421 I wrote a small tool to scan an HFile and produce some basic statistics about it: - min/mean/max key size, value size (uncompressed) - min/mean/max number of columns per row (uncompressed) - min/mean/max number of bytes per row (uncompressed) - the key of the largest row -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3929) Add option to HFile tool to produce basic stats
[ https://issues.apache.org/jira/browse/HBASE-3929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13128224#comment-13128224 ] Matteo Bertozzi commented on HBASE-3929: From 0.92 the HFile.main() contains just a call to HFilePrettyPrinter.run() So there's no more the Tool code inside the HFile.java Probably was not the refactor that todd has in mind, but it solve the first todd's thought: 'we should refactor all of the HFile Tool stuff out of HFile into a new class.' Add option to HFile tool to produce basic stats --- Key: HBASE-3929 URL: https://issues.apache.org/jira/browse/HBASE-3929 Project: HBase Issue Type: New Feature Components: io Affects Versions: 0.92.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: 0.94.0 Attachments: hbase-3929-draft.patch, hbase-3929-draft.txt In looking at HBASE-3421 I wrote a small tool to scan an HFile and produce some basic statistics about it: - min/mean/max key size, value size (uncompressed) - min/mean/max number of columns per row (uncompressed) - min/mean/max number of bytes per row (uncompressed) - the key of the largest row -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira