[jira] [Updated] (HBASE-5849) On first cluster startup, RS aborts if root znode is not available
[ https://issues.apache.org/jira/browse/HBASE-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Enis Soztutar updated HBASE-5849:
---------------------------------
    Attachment: HBASE-5849_v4.patch
                HBASE-5849_v4-0.92.patch
                HBASE-5849_v4.patch

I have found 2 issues that caused timeouts in the 0.92 branch:
1. The hbase dir was not set up to use the temp dir under target/, but used the default one under /tmp/hadoop-${username}, so running the test on 0.92 causes the RS to not come up if you have dirty data under /tmp/.
2. Giving timeouts like @Test(timeout=xxx) causes the 0.92 master to not shut down properly. I could not inspect this further; there might be an issue with surefire.

As a result, I updated the patch to first boot up a mini DFS and set up the hbase dir. I also removed the timeouts (the test runner (maven) will time out instead if something goes wrong). All my tests for trunk, 0.94, and 0.92 seem to pass. @Ted, @Stack, can you please try the patch to see whether you can replicate?

On an unrelated note, the ResourceChecker reports that some of the daemon threads (like LruBlockCache.EvictionThread) are not shut down properly (even when using MiniHBaseCluster and shutting it down properly). Any idea whether we should dig into that?

On first cluster startup, RS aborts if root znode is not available
------------------------------------------------------------------

    Key: HBASE-5849
    URL: https://issues.apache.org/jira/browse/HBASE-5849
    Project: HBase
    Issue Type: Bug
    Components: master, regionserver, zookeeper
    Affects Versions: 0.92.2, 0.96.0, 0.94.1
    Reporter: Enis Soztutar
    Assignee: Enis Soztutar
    Fix For: 0.92.2, 0.94.0
    Attachments: 5849v3.txt, HBASE-5849_v1.patch, HBASE-5849_v2.patch, HBASE-5849_v4-0.92.patch, HBASE-5849_v4.patch, HBASE-5849_v4.patch, HBASE-5849_v4.patch

When launching a fresh new cluster, the master has to be started first, which might create race conditions when starting the master and RS at the same time.

Master startup code is something like this:
- establish zk connection
- create root znodes in zk (/hbase)
- create ephemeral node for master /hbase/master

Region server startup code is something like this:
- establish zk connection
- check whether the root znode (/hbase) is there. If not, shut down.
- wait for the master to create znodes /hbase/master

So the problem is that on the very first launch of the cluster, the RS aborts at startup, since the /hbase znode might not have been created yet (only the master creates it if needed). Since /hbase/ is not deleted on cluster shutdown, on subsequent cluster starts it does not matter in which order the servers are started. So this affects only first launches.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
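The boot-order race described in the issue above can be illustrated with a minimal, self-contained Java sketch. It is not the HBase patch and does not use the ZooKeeper client API: `FakeZk`, `startMaster`, `startRegionServerOld`, `startRegionServerPatched`, and `regionServerFirst` are hypothetical names invented for illustration, and the in-memory set only stands in for znodes. The bounded wait (`maxWaitMillis`) is also an assumption added so the sketch terminates; the discussion in this issue suggests the region server simply keeps waiting for an active master.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch only: hypothetical names, no real ZooKeeper or HBase calls.
final class BootOrderSketch {

    /** Hypothetical in-memory stand-in for a ZooKeeper ensemble. */
    static final class FakeZk {
        private final Set<String> znodes = ConcurrentHashMap.newKeySet();

        void create(String path) {
            znodes.add(path);
        }

        boolean exists(String path) {
            return znodes.contains(path);
        }
    }

    /** Master startup as described in the issue: create /hbase, then /hbase/master. */
    static void startMaster(FakeZk zk) {
        zk.create("/hbase");
        zk.create("/hbase/master");
    }

    /** Behaviour reported in the issue: the RS gives up if /hbase is missing. */
    static boolean startRegionServerOld(FakeZk zk) {
        return zk.exists("/hbase");
    }

    /**
     * Behaviour after the fix, as an assumption-laden sketch: poll until both
     * /hbase and /hbase/master exist. The deadline is an addition of this
     * sketch so that it always terminates.
     */
    static boolean startRegionServerPatched(FakeZk zk, long maxWaitMillis) {
        long deadline = System.currentTimeMillis() + maxWaitMillis;
        while (!(zk.exists("/hbase") && zk.exists("/hbase/master"))) {
            if (System.currentTimeMillis() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    /** Starts the RS first and the master after a delay; returns whether the RS came up. */
    static boolean regionServerFirst(long masterDelayMillis, long maxWaitMillis) {
        FakeZk zk = new FakeZk();
        Thread master = new Thread(() -> {
            try {
                Thread.sleep(masterDelayMillis);
                startMaster(zk);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        master.start();
        boolean up = startRegionServerPatched(zk, maxWaitMillis);
        try {
            master.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return up;
    }

    public static void main(String[] args) {
        System.out.println("old RS on empty zk comes up: " + startRegionServerOld(new FakeZk()));
        System.out.println("patched RS started before master comes up: " + regionServerFirst(100, 5000));
    }
}
```

In this sketch the old check fails on an empty store, while the polling variant succeeds once the delayed master thread has created both znodes, which mirrors the "RS started before master on a fresh cluster" scenario.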
[jira] [Updated] (HBASE-5849) On first cluster startup, RS aborts if root znode is not available
[ https://issues.apache.org/jira/browse/HBASE-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Enis Soztutar updated HBASE-5849:
---------------------------------
    Status: Patch Available (was: Reopened)
[jira] [Updated] (HBASE-5849) On first cluster startup, RS aborts if root znode is not available
[ https://issues.apache.org/jira/browse/HBASE-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Enis Soztutar updated HBASE-5849:
---------------------------------
    Attachment: HBASE-5849_v4.patch

Reattaching for Jenkins.
[jira] [Updated] (HBASE-5849) On first cluster startup, RS aborts if root znode is not available
[ https://issues.apache.org/jira/browse/HBASE-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-5849:
-------------------------
    Resolution: Fixed
    Status: Resolved (was: Patch Available)

Applied to 0.92, 0.94, and trunk (took Ted's word for it that it works -- thanks Ted). Thanks for the patch, Enis, and for digging in again.
[jira] [Updated] (HBASE-5849) On first cluster startup, RS aborts if root znode is not available
[ https://issues.apache.org/jira/browse/HBASE-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Enis Soztutar updated HBASE-5849:
---------------------------------
    Attachment: HBASE-5849_v1.patch

Attaching a simple patch. It applies to trunk and to the 0.92 and 0.94 branches. I tested this with a pseudo-distributed setup on my laptop by launching the regionserver first and observing that it does actually wait for the master to boot up instead of aborting. I'll try to come up with a boot order unit test shortly.
[jira] [Updated] (HBASE-5849) On first cluster startup, RS aborts if root znode is not available
[ https://issues.apache.org/jira/browse/HBASE-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-5849:
-------------------------
    Status: Patch Available (was: Open)
[jira] [Updated] (HBASE-5849) On first cluster startup, RS aborts if root znode is not available
[ https://issues.apache.org/jira/browse/HBASE-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Enis Soztutar updated HBASE-5849:
---------------------------------
    Status: Open (was: Patch Available)
[jira] [Updated] (HBASE-5849) On first cluster startup, RS aborts if root znode is not available
[ https://issues.apache.org/jira/browse/HBASE-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Enis Soztutar updated HBASE-5849:
---------------------------------
    Attachment: HBASE-5849_v2.patch

Thanks, Stack, for taking a look at this. I have added a unit test for the boot order of the cluster. To answer your earlier comment, I think the region server should just keep waiting until there is an active master.
[jira] [Updated] (HBASE-5849) On first cluster startup, RS aborts if root znode is not available
[ https://issues.apache.org/jira/browse/HBASE-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Enis Soztutar updated HBASE-5849:
---------------------------------
    Status: Patch Available (was: Open)

Rerunning hudson for patch v2.
[jira] [Updated] (HBASE-5849) On first cluster startup, RS aborts if root znode is not available
[ https://issues.apache.org/jira/browse/HBASE-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-5849:
-------------------------
    Attachment: 5849v3.txt

Enis's v2 patch with this added to end of test:
{code}
+  @org.junit.Rule
+  public org.apache.hadoop.hbase.ResourceCheckerJUnitRule cu =
+    new org.apache.hadoop.hbase.ResourceCheckerJUnitRule();
{code}

Nice test.
[jira] [Updated] (HBASE-5849) On first cluster startup, RS aborts if root znode is not available
[ https://issues.apache.org/jira/browse/HBASE-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-5849:
-------------------------
    Resolution: Fixed
    Fix Version/s: 0.94.0
                   0.92.2
    Release Note: Rather than exit, the regionserver will now wait even though the root directory in zookeeper has yet to be created.
    Hadoop Flags: Reviewed
    Status: Resolved (was: Patch Available)

Committed to 0.92, 0.94, and to trunk. Thanks for the patch Enis.
[jira] [Updated] (HBASE-5849) On first cluster startup, RS aborts if root znode is not available
[ https://issues.apache.org/jira/browse/HBASE-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5849:
------------------------------
    Attachment: 5849.jstack

I refreshed my workspace for trunk. TestClusterBootOrder seemed to be stuck. See attached jstack.
[jira] [Updated] (HBASE-5849) On first cluster startup, RS aborts if root znode is not available
[ https://issues.apache.org/jira/browse/HBASE-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5849:
------------------------------
    Attachment: (was: 5849.jstack)
[jira] [Updated] (HBASE-5849) On first cluster startup, RS aborts if root znode is not available
[ https://issues.apache.org/jira/browse/HBASE-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5849:
------------------------------
    Comment: was deleted

(was: I refreshed my workspace for trunk. TestClusterBootOrder seemed to be stuck. See attached jstack.)