[jira] [Updated] (HBASE-5849) On first cluster startup, RS aborts if root znode is not available

2012-04-24 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated HBASE-5849:
-

Attachment: HBASE-5849_v4.patch
HBASE-5849_v4-0.92.patch
HBASE-5849_v4.patch

I found two issues that caused timeouts on the 0.92 branch: 
1. The hbase dir was not set up to use the temp dir under target/, but used the 
default one under /tmp/hadoop-${username}, so running the test on 0.92 causes 
the rs to not come up if you have dirty data under /tmp/. 
2. Giving timeouts like @Test(timeout=xxx) causes the 0.92 master to not shut 
down properly. I could not investigate this further; there might be an issue 
with surefire. 

As a result, I updated the patch to first boot up a mini dfs and set up the 
hbase dir. I also removed the timeouts (the test runner (maven) will time out 
instead if something goes wrong).

All my tests for trunk, 0.94, and 0.92 seem to pass.

@Ted, @Stack, can you please try the patch to see whether you can replicate?

On an unrelated note, the ResourceChecker reports that some of the daemon 
threads (like LruBlockCache.EvictionThread) are not shut down properly (even 
when using MiniHBaseCluster and shutting it down properly). Any idea whether we 
should dig into that?

 On first cluster startup, RS aborts if root znode is not available
 --

 Key: HBASE-5849
 URL: https://issues.apache.org/jira/browse/HBASE-5849
 Project: HBase
  Issue Type: Bug
  Components: master, regionserver, zookeeper
Affects Versions: 0.92.2, 0.96.0, 0.94.1
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Fix For: 0.92.2, 0.94.0

 Attachments: 5849v3.txt, HBASE-5849_v1.patch, HBASE-5849_v2.patch, 
 HBASE-5849_v4-0.92.patch, HBASE-5849_v4.patch, HBASE-5849_v4.patch, 
 HBASE-5849_v4.patch


 When launching a fresh cluster, the master has to be started first, which 
 can create a race condition when the master and rs are started at the same 
 time. 
 Master startup code is something like this: 
  - establish zk connection
  - create root znodes in zk (/hbase)
  - create ephemeral node for master, /hbase/master
 Region server startup code is something like this: 
  - establish zk connection
  - check whether the root znode (/hbase) is there. If not, shut down. 
  - wait for the master to create the znode /hbase/master
 So the problem is that on the very first launch of the cluster, the RS aborts 
 on startup because the /hbase znode might not have been created yet (only the 
 master creates it if needed). Since /hbase is not deleted on cluster shutdown, 
 on subsequent cluster starts it does not matter in which order the servers are 
 started. So this affects only first launches. 
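The boot order problem described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the code from the attached patches: FakeZk is a hypothetical in-memory stand-in for ZooKeeper, and the class and method names are made up for the example. It contrasts a region server that gives up when /hbase is missing with one that keeps polling until the master has created /hbase/master.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

/** Sketch only: models the startup race with an in-memory znode set. */
final class BootOrderSketch {

    /** Hypothetical stand-in for ZooKeeper; a real server would query the ensemble. */
    static final class FakeZk {
        private final Set<String> znodes = ConcurrentHashMap.newKeySet();

        void create(String path) { znodes.add(path); }

        boolean exists(String path) { return znodes.contains(path); }
    }

    /** Master startup as listed in the report: root znode first, then the master znode. */
    static void startMaster(FakeZk zk) {
        zk.create("/hbase");
        zk.create("/hbase/master");
    }

    /** Reported behaviour: the region server gives up at once if /hbase is missing. */
    static boolean startRegionServerAborting(FakeZk zk) {
        return zk.exists("/hbase");
    }

    /** Alternative: poll until the master znode appears, or until the timeout expires. */
    static boolean startRegionServerWaiting(FakeZk zk, long timeoutMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (!zk.exists("/hbase/master")) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out without seeing a master
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop waiting
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        FakeZk zk = new FakeZk();
        // Start the "region server" first; the "master" comes up a little later.
        new Thread(() -> {
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                return;
            }
            startMaster(zk);
        }).start();
        System.out.println("aborting variant started: " + startRegionServerAborting(zk));
        System.out.println("waiting variant started: " + startRegionServerWaiting(zk, 5000));
    }
}
```

A real region server would more likely register a ZooKeeper watch than poll with sleep; polling is used here only to keep the sketch self-contained.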

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5849) On first cluster startup, RS aborts if root znode is not available

2012-04-24 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated HBASE-5849:
-

Status: Patch Available  (was: Reopened)

 On first cluster startup, RS aborts if root znode is not available
 --

 Key: HBASE-5849
 URL: https://issues.apache.org/jira/browse/HBASE-5849
 Project: HBase
  Issue Type: Bug
  Components: master, regionserver, zookeeper
Affects Versions: 0.92.2, 0.96.0, 0.94.1
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Fix For: 0.92.2, 0.94.0

 Attachments: 5849v3.txt, HBASE-5849_v1.patch, HBASE-5849_v2.patch, 
 HBASE-5849_v4-0.92.patch, HBASE-5849_v4.patch, HBASE-5849_v4.patch, 
 HBASE-5849_v4.patch






[jira] [Updated] (HBASE-5849) On first cluster startup, RS aborts if root znode is not available

2012-04-24 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated HBASE-5849:
-

Attachment: HBASE-5849_v4.patch

Reattaching for Jenkins. 

 On first cluster startup, RS aborts if root znode is not available
 --

 Key: HBASE-5849
 URL: https://issues.apache.org/jira/browse/HBASE-5849
 Project: HBase
  Issue Type: Bug
  Components: master, regionserver, zookeeper
Affects Versions: 0.92.2, 0.96.0, 0.94.1
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Fix For: 0.92.2, 0.94.0

 Attachments: 5849v3.txt, HBASE-5849_v1.patch, HBASE-5849_v2.patch, 
 HBASE-5849_v4-0.92.patch, HBASE-5849_v4.patch, HBASE-5849_v4.patch, 
 HBASE-5849_v4.patch






[jira] [Updated] (HBASE-5849) On first cluster startup, RS aborts if root znode is not available

2012-04-24 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5849:
-

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Applied to 0.92, 0.94, and trunk (took Ted's word for it that it works -- 
thanks Ted). Thanks for the patch Enis and for digging in again.

 On first cluster startup, RS aborts if root znode is not available
 --

 Key: HBASE-5849
 URL: https://issues.apache.org/jira/browse/HBASE-5849
 Project: HBase
  Issue Type: Bug
  Components: master, regionserver, zookeeper
Affects Versions: 0.92.2, 0.96.0, 0.94.1
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Fix For: 0.92.2, 0.94.0

 Attachments: 5849v3.txt, HBASE-5849_v1.patch, HBASE-5849_v2.patch, 
 HBASE-5849_v4-0.92.patch, HBASE-5849_v4.patch, HBASE-5849_v4.patch, 
 HBASE-5849_v4.patch






[jira] [Updated] (HBASE-5849) On first cluster startup, RS aborts if root znode is not available

2012-04-23 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated HBASE-5849:
-

Attachment: HBASE-5849_v1.patch

Attaching a simple patch. Applies to trunk, 0.92 and 0.94 branches. 

Tested this with a pseudo-distributed setup on my laptop by first launching 
the regionserver and observing that it does actually wait for the master to 
boot up instead of aborting. I'll try to come up with a boot order unit test 
shortly.

 On first cluster startup, RS aborts if root znode is not available
 --

 Key: HBASE-5849
 URL: https://issues.apache.org/jira/browse/HBASE-5849
 Project: HBase
  Issue Type: Bug
  Components: master, regionserver, zookeeper
Affects Versions: 0.92.2, 0.96.0, 0.94.1
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Attachments: HBASE-5849_v1.patch






[jira] [Updated] (HBASE-5849) On first cluster startup, RS aborts if root znode is not available

2012-04-23 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5849:
-

Status: Patch Available  (was: Open)

 On first cluster startup, RS aborts if root znode is not available
 --

 Key: HBASE-5849
 URL: https://issues.apache.org/jira/browse/HBASE-5849
 Project: HBase
  Issue Type: Bug
  Components: master, regionserver, zookeeper
Affects Versions: 0.92.2, 0.96.0, 0.94.1
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Attachments: HBASE-5849_v1.patch






[jira] [Updated] (HBASE-5849) On first cluster startup, RS aborts if root znode is not available

2012-04-23 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated HBASE-5849:
-

Status: Open  (was: Patch Available)

 On first cluster startup, RS aborts if root znode is not available
 --

 Key: HBASE-5849
 URL: https://issues.apache.org/jira/browse/HBASE-5849
 Project: HBase
  Issue Type: Bug
  Components: master, regionserver, zookeeper
Affects Versions: 0.92.2, 0.96.0, 0.94.1
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Attachments: HBASE-5849_v1.patch






[jira] [Updated] (HBASE-5849) On first cluster startup, RS aborts if root znode is not available

2012-04-23 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated HBASE-5849:
-

Attachment: HBASE-5849_v2.patch

Thanks Stack for taking a look into this. I have added a unit test for boot 
order for the cluster. 

To answer your earlier comment, I think the region server should just keep 
waiting until there is an active master. 

 On first cluster startup, RS aborts if root znode is not available
 --

 Key: HBASE-5849
 URL: https://issues.apache.org/jira/browse/HBASE-5849
 Project: HBase
  Issue Type: Bug
  Components: master, regionserver, zookeeper
Affects Versions: 0.92.2, 0.96.0, 0.94.1
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Attachments: HBASE-5849_v1.patch, HBASE-5849_v2.patch






[jira] [Updated] (HBASE-5849) On first cluster startup, RS aborts if root znode is not available

2012-04-23 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated HBASE-5849:
-

Status: Patch Available  (was: Open)

Rerunning hudson for patch v2. 

 On first cluster startup, RS aborts if root znode is not available
 --

 Key: HBASE-5849
 URL: https://issues.apache.org/jira/browse/HBASE-5849
 Project: HBase
  Issue Type: Bug
  Components: master, regionserver, zookeeper
Affects Versions: 0.92.2, 0.96.0, 0.94.1
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Attachments: HBASE-5849_v1.patch, HBASE-5849_v2.patch






[jira] [Updated] (HBASE-5849) On first cluster startup, RS aborts if root znode is not available

2012-04-23 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5849:
-

Attachment: 5849v3.txt

Enis's v2 patch with this added to end of test:

{code}
+  @org.junit.Rule
+  public org.apache.hadoop.hbase.ResourceCheckerJUnitRule cu =
+      new org.apache.hadoop.hbase.ResourceCheckerJUnitRule();
{code}

Nice test.

 On first cluster startup, RS aborts if root znode is not available
 --

 Key: HBASE-5849
 URL: https://issues.apache.org/jira/browse/HBASE-5849
 Project: HBase
  Issue Type: Bug
  Components: master, regionserver, zookeeper
Affects Versions: 0.92.2, 0.96.0, 0.94.1
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Attachments: 5849v3.txt, HBASE-5849_v1.patch, HBASE-5849_v2.patch






[jira] [Updated] (HBASE-5849) On first cluster startup, RS aborts if root znode is not available

2012-04-23 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5849:
-

   Resolution: Fixed
Fix Version/s: 0.94.0
               0.92.2
 Release Note: Rather than exit, the regionserver will now wait even though 
               the root directory in zookeeper has yet to be created.
 Hadoop Flags: Reviewed
       Status: Resolved  (was: Patch Available)

Committed to 0.92, 0.94, and to trunk.  Thanks for the patch Enis.

 On first cluster startup, RS aborts if root znode is not available
 --

 Key: HBASE-5849
 URL: https://issues.apache.org/jira/browse/HBASE-5849
 Project: HBase
  Issue Type: Bug
  Components: master, regionserver, zookeeper
Affects Versions: 0.92.2, 0.96.0, 0.94.1
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Fix For: 0.92.2, 0.94.0

 Attachments: 5849v3.txt, HBASE-5849_v1.patch, HBASE-5849_v2.patch






[jira] [Updated] (HBASE-5849) On first cluster startup, RS aborts if root znode is not available

2012-04-23 Thread Zhihong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5849:
--

Attachment: 5849.jstack

I refreshed my workspace for trunk.
TestClusterBootOrder seemed to be stuck.

See attached jstack.

 On first cluster startup, RS aborts if root znode is not available
 --

 Key: HBASE-5849
 URL: https://issues.apache.org/jira/browse/HBASE-5849
 Project: HBase
  Issue Type: Bug
  Components: master, regionserver, zookeeper
Affects Versions: 0.92.2, 0.96.0, 0.94.1
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Fix For: 0.92.2, 0.94.0

 Attachments: 5849.jstack, 5849v3.txt, HBASE-5849_v1.patch, 
 HBASE-5849_v2.patch






[jira] [Updated] (HBASE-5849) On first cluster startup, RS aborts if root znode is not available

2012-04-23 Thread Zhihong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5849:
--

Attachment: (was: 5849.jstack)

 On first cluster startup, RS aborts if root znode is not available
 --

 Key: HBASE-5849
 URL: https://issues.apache.org/jira/browse/HBASE-5849
 Project: HBase
  Issue Type: Bug
  Components: master, regionserver, zookeeper
Affects Versions: 0.92.2, 0.96.0, 0.94.1
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Fix For: 0.92.2, 0.94.0

 Attachments: 5849v3.txt, HBASE-5849_v1.patch, HBASE-5849_v2.patch






[jira] [Updated] (HBASE-5849) On first cluster startup, RS aborts if root znode is not available

2012-04-23 Thread Zhihong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5849:
--

Comment: was deleted

(was: I refreshed my workspace for trunk.
TestClusterBootOrder seemed to be stuck.

See attached jstack.)

 On first cluster startup, RS aborts if root znode is not available
 --

 Key: HBASE-5849
 URL: https://issues.apache.org/jira/browse/HBASE-5849
 Project: HBase
  Issue Type: Bug
  Components: master, regionserver, zookeeper
Affects Versions: 0.92.2, 0.96.0, 0.94.1
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Fix For: 0.92.2, 0.94.0

 Attachments: 5849v3.txt, HBASE-5849_v1.patch, HBASE-5849_v2.patch

