[jira] [Updated] (HBASE-5572) KeeperException.SessionExpiredException management could be improved in Master

2012-03-16 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5572:
--

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Resolved as part of HBASE-5549

 KeeperException.SessionExpiredException management could be improved in Master
 --

 Key: HBASE-5572
 URL: https://issues.apache.org/jira/browse/HBASE-5572
 Project: HBase
  Issue Type: Improvement
  Components: master
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Fix For: 0.96.0

 Attachments: 5572.v1.patch, 5572.v2.patch, 5572.v2.patch, 
 5572.v2.patch


 Synthesis:
  1) TestMasterZKSessionRecovery distinguish two cases on 
 SessionExpiredException. One is explicitly not managed. However, is seems 
 that there is no reason for this.
  2) The issue lies in ActiveMasterManager#blockUntilBecomingActiveMaster, a 
 quite complex function, with a useless recursive call.
  3) TestMasterZKSessionRecovery#testMasterZKSessionRecoverySuccess is 
 equivalent to TestZooKeeper#testMasterSessionExpired
  4) TestMasterZKSessionRecovery#testMasterZKSessionRecoveryFailure can be 
 removed if we merge the two cases mentioned above.
 Changes are:
  2) Changing ActiveMasterManager#blockUntilBecomingActiveMaster to have a 
 single case and remove recursion
  1) Removing TestMasterZKSessionRecovery
 Detailed justification:
 testMasterZKSessionRecoveryFailure says:
 {noformat}
   /**
* Negative test of master recovery from zk session expiry.
*
* Starts with one master. Fakes the master zk session expired.
* Ensures the master cannot recover the expired zk session since
* the master zk node is still there.
*/
   public void testMasterZKSessionRecoveryFailure() throws Exception {
 MiniHBaseCluster cluster = TEST_UTIL.getHBaseCluster();
 HMaster m = cluster.getMaster();
 m.abort(Test recovery from zk session expired,
   new KeeperException.SessionExpiredException());
 assertTrue(m.isStopped());
   }
 {noformat}
 This tests works, i.e. the assertion is always verified.
 But do we really want this behavior?
 When looking at the code, we see that this what's happening is strange:
 - HMaster#abort calls Master#abortNow. If HMaster#abortNow returns false 
 HMaster#abort stops the master.
 - HMaster#abortNow checks the exception type. As it's a 
 SessionExpiredException it will try to recover, calling 
 HMaster#tryRecoveringExpiredZKSession. If it cannot, it will return false 
 (and that will make HMaster#abort stopping the master)
 - HMaster#tryRecoveringExpiredZKSession recreates a ZooKeeperConnection and 
 then try to become the active master. If it cannot, it will return false (and 
 that will make HMaster#abort stopping the master).
 - HMaster#becomeActiveMaster returns the result of 
 ActiveMasterManager#blockUntilBecomingActiveMaster. 
 blockUntilBecomingActiveMaster says it will return false if there is any 
 error preventing it to become the active master.
 - ActiveMasterManager#blockUntilBecomingActiveMaster reads ZK for the master 
 address. If it's the same port  host, it deletes the nodes, that will start 
 a recursive call to blockUntilBecomingActiveMaster. This second call succeeds 
 (we became the active master) and return true. This result is ignored by the 
 first blockUntilBecomingActiveMaster: it return false (even if we actually 
 became the active master), hence the whole suite call returns false and 
 HMaster#abort stops the master.
 In other words, the comment says Ensures the master cannot recover the 
 expired zk session since the master zk node is still there. but we're 
 actually doing a check just for this and deleting the node. If we were not 
 ignoring the result, we would return true, so we would not stop the master, 
 so the test would fail.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5572) KeeperException.SessionExpiredException management could be improved in Master

2012-03-13 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-5572:
---

Attachment: 5572.v1.patch

 KeeperException.SessionExpiredException management could be improved in Master
 --

 Key: HBASE-5572
 URL: https://issues.apache.org/jira/browse/HBASE-5572
 Project: HBase
  Issue Type: Improvement
  Components: master
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5572.v1.patch


 Synthesis:
  1) TestMasterZKSessionRecovery distinguish two cases on 
 SessionExpiredException. One is explicitly not managed. However, is seems 
 that there is no reason for this.
  2) The issue lies in ActiveMasterManager#blockUntilBecomingActiveMaster, a 
 quite complex function, with a useless recursive call.
  3) TestMasterZKSessionRecovery#testMasterZKSessionRecoverySuccess is 
 equivalent to TestZooKeeper#testMasterSessionExpired
  4) TestMasterZKSessionRecovery#testMasterZKSessionRecoveryFailure can be 
 removed if we merge the two cases mentioned above.
 Changes are:
  2) Changing ActiveMasterManager#blockUntilBecomingActiveMaster to have a 
 single case and remove recursion
  1) Removing TestMasterZKSessionRecovery
 Detailed justification:
 testMasterZKSessionRecoveryFailure says:
 {noformat}
   /**
* Negative test of master recovery from zk session expiry.
*
* Starts with one master. Fakes the master zk session expired.
* Ensures the master cannot recover the expired zk session since
* the master zk node is still there.
*/
   public void testMasterZKSessionRecoveryFailure() throws Exception {
 MiniHBaseCluster cluster = TEST_UTIL.getHBaseCluster();
 HMaster m = cluster.getMaster();
 m.abort(Test recovery from zk session expired,
   new KeeperException.SessionExpiredException());
 assertTrue(m.isStopped());
   }
 {noformat}
 This tests works, i.e. the assertion is always verified.
 But do we really want this behavior?
 When looking at the code, we see that this what's happening is strange:
 - HMaster#abort calls Master#abortNow. If HMaster#abortNow returns false 
 HMaster#abort stops the master.
 - HMaster#abortNow checks the exception type. As it's a 
 SessionExpiredException it will try to recover, calling 
 HMaster#tryRecoveringExpiredZKSession. If it cannot, it will return false 
 (and that will make HMaster#abort stopping the master)
 - HMaster#tryRecoveringExpiredZKSession recreates a ZooKeeperConnection and 
 then try to become the active master. If it cannot, it will return false (and 
 that will make HMaster#abort stopping the master).
 - HMaster#becomeActiveMaster returns the result of 
 ActiveMasterManager#blockUntilBecomingActiveMaster. 
 blockUntilBecomingActiveMaster says it will return false if there is any 
 error preventing it to become the active master.
 - ActiveMasterManager#blockUntilBecomingActiveMaster reads ZK for the master 
 address. If it's the same port  host, it deletes the nodes, that will start 
 a recursive call to blockUntilBecomingActiveMaster. This second call succeeds 
 (we became the active master) and return true. This result is ignored by the 
 first blockUntilBecomingActiveMaster: it return false (even if we actually 
 became the active master), hence the whole suite call returns false and 
 HMaster#abort stops the master.
 In other words, the comment says Ensures the master cannot recover the 
 expired zk session since the master zk node is still there. but we're 
 actually doing a check just for this and deleting the node. If we were not 
 ignoring the result, we would return true, so we would not stop the master, 
 so the test would fail.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5572) KeeperException.SessionExpiredException management could be improved in Master

2012-03-13 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-5572:
---

Status: Patch Available  (was: Open)

 KeeperException.SessionExpiredException management could be improved in Master
 --

 Key: HBASE-5572
 URL: https://issues.apache.org/jira/browse/HBASE-5572
 Project: HBase
  Issue Type: Improvement
  Components: master
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5572.v1.patch


 Synthesis:
  1) TestMasterZKSessionRecovery distinguish two cases on 
 SessionExpiredException. One is explicitly not managed. However, is seems 
 that there is no reason for this.
  2) The issue lies in ActiveMasterManager#blockUntilBecomingActiveMaster, a 
 quite complex function, with a useless recursive call.
  3) TestMasterZKSessionRecovery#testMasterZKSessionRecoverySuccess is 
 equivalent to TestZooKeeper#testMasterSessionExpired
  4) TestMasterZKSessionRecovery#testMasterZKSessionRecoveryFailure can be 
 removed if we merge the two cases mentioned above.
 Changes are:
  2) Changing ActiveMasterManager#blockUntilBecomingActiveMaster to have a 
 single case and remove recursion
  1) Removing TestMasterZKSessionRecovery
 Detailed justification:
 testMasterZKSessionRecoveryFailure says:
 {noformat}
   /**
* Negative test of master recovery from zk session expiry.
*
* Starts with one master. Fakes the master zk session expired.
* Ensures the master cannot recover the expired zk session since
* the master zk node is still there.
*/
   public void testMasterZKSessionRecoveryFailure() throws Exception {
 MiniHBaseCluster cluster = TEST_UTIL.getHBaseCluster();
 HMaster m = cluster.getMaster();
 m.abort(Test recovery from zk session expired,
   new KeeperException.SessionExpiredException());
 assertTrue(m.isStopped());
   }
 {noformat}
 This tests works, i.e. the assertion is always verified.
 But do we really want this behavior?
 When looking at the code, we see that this what's happening is strange:
 - HMaster#abort calls Master#abortNow. If HMaster#abortNow returns false 
 HMaster#abort stops the master.
 - HMaster#abortNow checks the exception type. As it's a 
 SessionExpiredException it will try to recover, calling 
 HMaster#tryRecoveringExpiredZKSession. If it cannot, it will return false 
 (and that will make HMaster#abort stopping the master)
 - HMaster#tryRecoveringExpiredZKSession recreates a ZooKeeperConnection and 
 then try to become the active master. If it cannot, it will return false (and 
 that will make HMaster#abort stopping the master).
 - HMaster#becomeActiveMaster returns the result of 
 ActiveMasterManager#blockUntilBecomingActiveMaster. 
 blockUntilBecomingActiveMaster says it will return false if there is any 
 error preventing it to become the active master.
 - ActiveMasterManager#blockUntilBecomingActiveMaster reads ZK for the master 
 address. If it's the same port  host, it deletes the nodes, that will start 
 a recursive call to blockUntilBecomingActiveMaster. This second call succeeds 
 (we became the active master) and return true. This result is ignored by the 
 first blockUntilBecomingActiveMaster: it return false (even if we actually 
 became the active master), hence the whole suite call returns false and 
 HMaster#abort stops the master.
 In other words, the comment says Ensures the master cannot recover the 
 expired zk session since the master zk node is still there. but we're 
 actually doing a check just for this and deleting the node. If we were not 
 ignoring the result, we would return true, so we would not stop the master, 
 so the test would fail.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5572) KeeperException.SessionExpiredException management could be improved in Master

2012-03-13 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-5572:
---

Attachment: 5572.v2.patch

 KeeperException.SessionExpiredException management could be improved in Master
 --

 Key: HBASE-5572
 URL: https://issues.apache.org/jira/browse/HBASE-5572
 Project: HBase
  Issue Type: Improvement
  Components: master
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5572.v1.patch, 5572.v2.patch


 Synthesis:
  1) TestMasterZKSessionRecovery distinguish two cases on 
 SessionExpiredException. One is explicitly not managed. However, is seems 
 that there is no reason for this.
  2) The issue lies in ActiveMasterManager#blockUntilBecomingActiveMaster, a 
 quite complex function, with a useless recursive call.
  3) TestMasterZKSessionRecovery#testMasterZKSessionRecoverySuccess is 
 equivalent to TestZooKeeper#testMasterSessionExpired
  4) TestMasterZKSessionRecovery#testMasterZKSessionRecoveryFailure can be 
 removed if we merge the two cases mentioned above.
 Changes are:
  2) Changing ActiveMasterManager#blockUntilBecomingActiveMaster to have a 
 single case and remove recursion
  1) Removing TestMasterZKSessionRecovery
 Detailed justification:
 testMasterZKSessionRecoveryFailure says:
 {noformat}
   /**
* Negative test of master recovery from zk session expiry.
*
* Starts with one master. Fakes the master zk session expired.
* Ensures the master cannot recover the expired zk session since
* the master zk node is still there.
*/
   public void testMasterZKSessionRecoveryFailure() throws Exception {
 MiniHBaseCluster cluster = TEST_UTIL.getHBaseCluster();
 HMaster m = cluster.getMaster();
 m.abort(Test recovery from zk session expired,
   new KeeperException.SessionExpiredException());
 assertTrue(m.isStopped());
   }
 {noformat}
 This tests works, i.e. the assertion is always verified.
 But do we really want this behavior?
 When looking at the code, we see that this what's happening is strange:
 - HMaster#abort calls Master#abortNow. If HMaster#abortNow returns false 
 HMaster#abort stops the master.
 - HMaster#abortNow checks the exception type. As it's a 
 SessionExpiredException it will try to recover, calling 
 HMaster#tryRecoveringExpiredZKSession. If it cannot, it will return false 
 (and that will make HMaster#abort stopping the master)
 - HMaster#tryRecoveringExpiredZKSession recreates a ZooKeeperConnection and 
 then try to become the active master. If it cannot, it will return false (and 
 that will make HMaster#abort stopping the master).
 - HMaster#becomeActiveMaster returns the result of 
 ActiveMasterManager#blockUntilBecomingActiveMaster. 
 blockUntilBecomingActiveMaster says it will return false if there is any 
 error preventing it to become the active master.
 - ActiveMasterManager#blockUntilBecomingActiveMaster reads ZK for the master 
 address. If it's the same port  host, it deletes the nodes, that will start 
 a recursive call to blockUntilBecomingActiveMaster. This second call succeeds 
 (we became the active master) and return true. This result is ignored by the 
 first blockUntilBecomingActiveMaster: it return false (even if we actually 
 became the active master), hence the whole suite call returns false and 
 HMaster#abort stops the master.
 In other words, the comment says Ensures the master cannot recover the 
 expired zk session since the master zk node is still there. but we're 
 actually doing a check just for this and deleting the node. If we were not 
 ignoring the result, we would return true, so we would not stop the master, 
 so the test would fail.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5572) KeeperException.SessionExpiredException management could be improved in Master

2012-03-13 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-5572:
---

Status: Open  (was: Patch Available)

 KeeperException.SessionExpiredException management could be improved in Master
 --

 Key: HBASE-5572
 URL: https://issues.apache.org/jira/browse/HBASE-5572
 Project: HBase
  Issue Type: Improvement
  Components: master
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5572.v1.patch, 5572.v2.patch


 Synthesis:
  1) TestMasterZKSessionRecovery distinguish two cases on 
 SessionExpiredException. One is explicitly not managed. However, is seems 
 that there is no reason for this.
  2) The issue lies in ActiveMasterManager#blockUntilBecomingActiveMaster, a 
 quite complex function, with a useless recursive call.
  3) TestMasterZKSessionRecovery#testMasterZKSessionRecoverySuccess is 
 equivalent to TestZooKeeper#testMasterSessionExpired
  4) TestMasterZKSessionRecovery#testMasterZKSessionRecoveryFailure can be 
 removed if we merge the two cases mentioned above.
 Changes are:
  2) Changing ActiveMasterManager#blockUntilBecomingActiveMaster to have a 
 single case and remove recursion
  1) Removing TestMasterZKSessionRecovery
 Detailed justification:
 testMasterZKSessionRecoveryFailure says:
 {noformat}
   /**
* Negative test of master recovery from zk session expiry.
*
* Starts with one master. Fakes the master zk session expired.
* Ensures the master cannot recover the expired zk session since
* the master zk node is still there.
*/
   public void testMasterZKSessionRecoveryFailure() throws Exception {
 MiniHBaseCluster cluster = TEST_UTIL.getHBaseCluster();
 HMaster m = cluster.getMaster();
 m.abort(Test recovery from zk session expired,
   new KeeperException.SessionExpiredException());
 assertTrue(m.isStopped());
   }
 {noformat}
 This tests works, i.e. the assertion is always verified.
 But do we really want this behavior?
 When looking at the code, we see that this what's happening is strange:
 - HMaster#abort calls Master#abortNow. If HMaster#abortNow returns false 
 HMaster#abort stops the master.
 - HMaster#abortNow checks the exception type. As it's a 
 SessionExpiredException it will try to recover, calling 
 HMaster#tryRecoveringExpiredZKSession. If it cannot, it will return false 
 (and that will make HMaster#abort stopping the master)
 - HMaster#tryRecoveringExpiredZKSession recreates a ZooKeeperConnection and 
 then try to become the active master. If it cannot, it will return false (and 
 that will make HMaster#abort stopping the master).
 - HMaster#becomeActiveMaster returns the result of 
 ActiveMasterManager#blockUntilBecomingActiveMaster. 
 blockUntilBecomingActiveMaster says it will return false if there is any 
 error preventing it to become the active master.
 - ActiveMasterManager#blockUntilBecomingActiveMaster reads ZK for the master 
 address. If it's the same port  host, it deletes the nodes, that will start 
 a recursive call to blockUntilBecomingActiveMaster. This second call succeeds 
 (we became the active master) and return true. This result is ignored by the 
 first blockUntilBecomingActiveMaster: it return false (even if we actually 
 became the active master), hence the whole suite call returns false and 
 HMaster#abort stops the master.
 In other words, the comment says Ensures the master cannot recover the 
 expired zk session since the master zk node is still there. but we're 
 actually doing a check just for this and deleting the node. If we were not 
 ignoring the result, we would return true, so we would not stop the master, 
 so the test would fail.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5572) KeeperException.SessionExpiredException management could be improved in Master

2012-03-13 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-5572:
---

Status: Patch Available  (was: Open)

 KeeperException.SessionExpiredException management could be improved in Master
 --

 Key: HBASE-5572
 URL: https://issues.apache.org/jira/browse/HBASE-5572
 Project: HBase
  Issue Type: Improvement
  Components: master
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5572.v1.patch, 5572.v2.patch


 Synthesis:
  1) TestMasterZKSessionRecovery distinguish two cases on 
 SessionExpiredException. One is explicitly not managed. However, is seems 
 that there is no reason for this.
  2) The issue lies in ActiveMasterManager#blockUntilBecomingActiveMaster, a 
 quite complex function, with a useless recursive call.
  3) TestMasterZKSessionRecovery#testMasterZKSessionRecoverySuccess is 
 equivalent to TestZooKeeper#testMasterSessionExpired
  4) TestMasterZKSessionRecovery#testMasterZKSessionRecoveryFailure can be 
 removed if we merge the two cases mentioned above.
 Changes are:
  2) Changing ActiveMasterManager#blockUntilBecomingActiveMaster to have a 
 single case and remove recursion
  1) Removing TestMasterZKSessionRecovery
 Detailed justification:
 testMasterZKSessionRecoveryFailure says:
 {noformat}
   /**
* Negative test of master recovery from zk session expiry.
*
* Starts with one master. Fakes the master zk session expired.
* Ensures the master cannot recover the expired zk session since
* the master zk node is still there.
*/
   public void testMasterZKSessionRecoveryFailure() throws Exception {
 MiniHBaseCluster cluster = TEST_UTIL.getHBaseCluster();
 HMaster m = cluster.getMaster();
 m.abort(Test recovery from zk session expired,
   new KeeperException.SessionExpiredException());
 assertTrue(m.isStopped());
   }
 {noformat}
 This tests works, i.e. the assertion is always verified.
 But do we really want this behavior?
 When looking at the code, we see that this what's happening is strange:
 - HMaster#abort calls Master#abortNow. If HMaster#abortNow returns false 
 HMaster#abort stops the master.
 - HMaster#abortNow checks the exception type. As it's a 
 SessionExpiredException it will try to recover, calling 
 HMaster#tryRecoveringExpiredZKSession. If it cannot, it will return false 
 (and that will make HMaster#abort stopping the master)
 - HMaster#tryRecoveringExpiredZKSession recreates a ZooKeeperConnection and 
 then try to become the active master. If it cannot, it will return false (and 
 that will make HMaster#abort stopping the master).
 - HMaster#becomeActiveMaster returns the result of 
 ActiveMasterManager#blockUntilBecomingActiveMaster. 
 blockUntilBecomingActiveMaster says it will return false if there is any 
 error preventing it to become the active master.
 - ActiveMasterManager#blockUntilBecomingActiveMaster reads ZK for the master 
 address. If it's the same port  host, it deletes the nodes, that will start 
 a recursive call to blockUntilBecomingActiveMaster. This second call succeeds 
 (we became the active master) and return true. This result is ignored by the 
 first blockUntilBecomingActiveMaster: it return false (even if we actually 
 became the active master), hence the whole suite call returns false and 
 HMaster#abort stops the master.
 In other words, the comment says Ensures the master cannot recover the 
 expired zk session since the master zk node is still there. but we're 
 actually doing a check just for this and deleting the node. If we were not 
 ignoring the result, we would return true, so we would not stop the master, 
 so the test would fail.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5572) KeeperException.SessionExpiredException management could be improved in Master

2012-03-13 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-5572:
---

Status: Open  (was: Patch Available)

 KeeperException.SessionExpiredException management could be improved in Master
 --

 Key: HBASE-5572
 URL: https://issues.apache.org/jira/browse/HBASE-5572
 Project: HBase
  Issue Type: Improvement
  Components: master
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5572.v1.patch, 5572.v2.patch, 5572.v2.patch


 Synthesis:
  1) TestMasterZKSessionRecovery distinguish two cases on 
 SessionExpiredException. One is explicitly not managed. However, is seems 
 that there is no reason for this.
  2) The issue lies in ActiveMasterManager#blockUntilBecomingActiveMaster, a 
 quite complex function, with a useless recursive call.
  3) TestMasterZKSessionRecovery#testMasterZKSessionRecoverySuccess is 
 equivalent to TestZooKeeper#testMasterSessionExpired
  4) TestMasterZKSessionRecovery#testMasterZKSessionRecoveryFailure can be 
 removed if we merge the two cases mentioned above.
 Changes are:
  2) Changing ActiveMasterManager#blockUntilBecomingActiveMaster to have a 
 single case and remove recursion
  1) Removing TestMasterZKSessionRecovery
 Detailed justification:
 testMasterZKSessionRecoveryFailure says:
 {noformat}
   /**
* Negative test of master recovery from zk session expiry.
*
* Starts with one master. Fakes the master zk session expired.
* Ensures the master cannot recover the expired zk session since
* the master zk node is still there.
*/
   public void testMasterZKSessionRecoveryFailure() throws Exception {
 MiniHBaseCluster cluster = TEST_UTIL.getHBaseCluster();
 HMaster m = cluster.getMaster();
 m.abort(Test recovery from zk session expired,
   new KeeperException.SessionExpiredException());
 assertTrue(m.isStopped());
   }
 {noformat}
 This tests works, i.e. the assertion is always verified.
 But do we really want this behavior?
 When looking at the code, we see that this what's happening is strange:
 - HMaster#abort calls Master#abortNow. If HMaster#abortNow returns false 
 HMaster#abort stops the master.
 - HMaster#abortNow checks the exception type. As it's a 
 SessionExpiredException it will try to recover, calling 
 HMaster#tryRecoveringExpiredZKSession. If it cannot, it will return false 
 (and that will make HMaster#abort stopping the master)
 - HMaster#tryRecoveringExpiredZKSession recreates a ZooKeeperConnection and 
 then try to become the active master. If it cannot, it will return false (and 
 that will make HMaster#abort stopping the master).
 - HMaster#becomeActiveMaster returns the result of 
 ActiveMasterManager#blockUntilBecomingActiveMaster. 
 blockUntilBecomingActiveMaster says it will return false if there is any 
 error preventing it to become the active master.
 - ActiveMasterManager#blockUntilBecomingActiveMaster reads ZK for the master 
 address. If it's the same port  host, it deletes the nodes, that will start 
 a recursive call to blockUntilBecomingActiveMaster. This second call succeeds 
 (we became the active master) and return true. This result is ignored by the 
 first blockUntilBecomingActiveMaster: it return false (even if we actually 
 became the active master), hence the whole suite call returns false and 
 HMaster#abort stops the master.
 In other words, the comment says Ensures the master cannot recover the 
 expired zk session since the master zk node is still there. but we're 
 actually doing a check just for this and deleting the node. If we were not 
 ignoring the result, we would return true, so we would not stop the master, 
 so the test would fail.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5572) KeeperException.SessionExpiredException management could be improved in Master

2012-03-13 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-5572:
---

Attachment: 5572.v2.patch

 KeeperException.SessionExpiredException management could be improved in Master
 --

 Key: HBASE-5572
 URL: https://issues.apache.org/jira/browse/HBASE-5572
 Project: HBase
  Issue Type: Improvement
  Components: master
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Fix For: 0.96.0

 Attachments: 5572.v1.patch, 5572.v2.patch, 5572.v2.patch


 Synthesis:
  1) TestMasterZKSessionRecovery distinguish two cases on 
 SessionExpiredException. One is explicitly not managed. However, is seems 
 that there is no reason for this.
  2) The issue lies in ActiveMasterManager#blockUntilBecomingActiveMaster, a 
 quite complex function, with a useless recursive call.
  3) TestMasterZKSessionRecovery#testMasterZKSessionRecoverySuccess is 
 equivalent to TestZooKeeper#testMasterSessionExpired
  4) TestMasterZKSessionRecovery#testMasterZKSessionRecoveryFailure can be 
 removed if we merge the two cases mentioned above.
 Changes are:
  2) Changing ActiveMasterManager#blockUntilBecomingActiveMaster to have a 
 single case and remove recursion
  1) Removing TestMasterZKSessionRecovery
 Detailed justification:
 testMasterZKSessionRecoveryFailure says:
 {noformat}
   /**
* Negative test of master recovery from zk session expiry.
*
* Starts with one master. Fakes the master zk session expired.
* Ensures the master cannot recover the expired zk session since
* the master zk node is still there.
*/
   public void testMasterZKSessionRecoveryFailure() throws Exception {
 MiniHBaseCluster cluster = TEST_UTIL.getHBaseCluster();
 HMaster m = cluster.getMaster();
 m.abort(Test recovery from zk session expired,
   new KeeperException.SessionExpiredException());
 assertTrue(m.isStopped());
   }
 {noformat}
 This tests works, i.e. the assertion is always verified.
 But do we really want this behavior?
 When looking at the code, we see that this what's happening is strange:
 - HMaster#abort calls Master#abortNow. If HMaster#abortNow returns false 
 HMaster#abort stops the master.
 - HMaster#abortNow checks the exception type. As it's a 
 SessionExpiredException it will try to recover, calling 
 HMaster#tryRecoveringExpiredZKSession. If it cannot, it will return false 
 (and that will make HMaster#abort stopping the master)
 - HMaster#tryRecoveringExpiredZKSession recreates a ZooKeeperConnection and 
 then try to become the active master. If it cannot, it will return false (and 
 that will make HMaster#abort stopping the master).
 - HMaster#becomeActiveMaster returns the result of 
 ActiveMasterManager#blockUntilBecomingActiveMaster. 
 blockUntilBecomingActiveMaster says it will return false if there is any 
 error preventing it to become the active master.
 - ActiveMasterManager#blockUntilBecomingActiveMaster reads ZK for the master 
 address. If it's the same port  host, it deletes the nodes, that will start 
 a recursive call to blockUntilBecomingActiveMaster. This second call succeeds 
 (we became the active master) and return true. This result is ignored by the 
 first blockUntilBecomingActiveMaster: it return false (even if we actually 
 became the active master), hence the whole suite call returns false and 
 HMaster#abort stops the master.
 In other words, the comment says Ensures the master cannot recover the 
 expired zk session since the master zk node is still there. but we're 
 actually doing a check just for this and deleting the node. If we were not 
 ignoring the result, we would return true, so we would not stop the master, 
 so the test would fail.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5572) KeeperException.SessionExpiredException management could be improved in Master

2012-03-13 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-5572:
---

Fix Version/s: 0.96.0
   Status: Patch Available  (was: Open)

 KeeperException.SessionExpiredException management could be improved in Master
 --

 Key: HBASE-5572
 URL: https://issues.apache.org/jira/browse/HBASE-5572
 Project: HBase
  Issue Type: Improvement
  Components: master
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Fix For: 0.96.0

 Attachments: 5572.v1.patch, 5572.v2.patch, 5572.v2.patch


 Synthesis:
  1) TestMasterZKSessionRecovery distinguish two cases on 
 SessionExpiredException. One is explicitly not managed. However, is seems 
 that there is no reason for this.
  2) The issue lies in ActiveMasterManager#blockUntilBecomingActiveMaster, a 
 quite complex function, with a useless recursive call.
  3) TestMasterZKSessionRecovery#testMasterZKSessionRecoverySuccess is 
 equivalent to TestZooKeeper#testMasterSessionExpired
  4) TestMasterZKSessionRecovery#testMasterZKSessionRecoveryFailure can be 
 removed if we merge the two cases mentioned above.
 Changes are:
  2) Changing ActiveMasterManager#blockUntilBecomingActiveMaster to have a 
 single case and remove recursion
  1) Removing TestMasterZKSessionRecovery
 Detailed justification:
 testMasterZKSessionRecoveryFailure says:
 {noformat}
   /**
* Negative test of master recovery from zk session expiry.
*
* Starts with one master. Fakes the master zk session expired.
* Ensures the master cannot recover the expired zk session since
* the master zk node is still there.
*/
   public void testMasterZKSessionRecoveryFailure() throws Exception {
 MiniHBaseCluster cluster = TEST_UTIL.getHBaseCluster();
 HMaster m = cluster.getMaster();
 m.abort(Test recovery from zk session expired,
   new KeeperException.SessionExpiredException());
 assertTrue(m.isStopped());
   }
 {noformat}
 This tests works, i.e. the assertion is always verified.
 But do we really want this behavior?
 When looking at the code, we see that this what's happening is strange:
 - HMaster#abort calls Master#abortNow. If HMaster#abortNow returns false 
 HMaster#abort stops the master.
 - HMaster#abortNow checks the exception type. As it's a 
 SessionExpiredException it will try to recover, calling 
 HMaster#tryRecoveringExpiredZKSession. If it cannot, it will return false 
 (and that will make HMaster#abort stopping the master)
 - HMaster#tryRecoveringExpiredZKSession recreates a ZooKeeperConnection and 
 then try to become the active master. If it cannot, it will return false (and 
 that will make HMaster#abort stopping the master).
 - HMaster#becomeActiveMaster returns the result of 
 ActiveMasterManager#blockUntilBecomingActiveMaster. 
 blockUntilBecomingActiveMaster says it will return false if there is any 
 error preventing it to become the active master.
 - ActiveMasterManager#blockUntilBecomingActiveMaster reads ZK for the master 
 address. If it's the same port  host, it deletes the nodes, that will start 
 a recursive call to blockUntilBecomingActiveMaster. This second call succeeds 
 (we became the active master) and return true. This result is ignored by the 
 first blockUntilBecomingActiveMaster: it return false (even if we actually 
 became the active master), hence the whole suite call returns false and 
 HMaster#abort stops the master.
 In other words, the comment says Ensures the master cannot recover the 
 expired zk session since the master zk node is still there. but we're 
 actually doing a check just for this and deleting the node. If we were not 
 ignoring the result, we would return true, so we would not stop the master, 
 so the test would fail.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5572) KeeperException.SessionExpiredException management could be improved in Master

2012-03-13 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-5572:
---

Attachment: 5572.v2.patch

 KeeperException.SessionExpiredException management could be improved in Master
 --

 Key: HBASE-5572
 URL: https://issues.apache.org/jira/browse/HBASE-5572
 Project: HBase
  Issue Type: Improvement
  Components: master
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Fix For: 0.96.0

 Attachments: 5572.v1.patch, 5572.v2.patch, 5572.v2.patch, 
 5572.v2.patch


 Synthesis:
  1) TestMasterZKSessionRecovery distinguish two cases on 
 SessionExpiredException. One is explicitly not managed. However, is seems 
 that there is no reason for this.
  2) The issue lies in ActiveMasterManager#blockUntilBecomingActiveMaster, a 
 quite complex function, with a useless recursive call.
  3) TestMasterZKSessionRecovery#testMasterZKSessionRecoverySuccess is 
 equivalent to TestZooKeeper#testMasterSessionExpired
  4) TestMasterZKSessionRecovery#testMasterZKSessionRecoveryFailure can be 
 removed if we merge the two cases mentioned above.
 Changes are:
  2) Changing ActiveMasterManager#blockUntilBecomingActiveMaster to have a 
 single case and remove recursion
  1) Removing TestMasterZKSessionRecovery
 Detailed justification:
 testMasterZKSessionRecoveryFailure says:
 {noformat}
   /**
* Negative test of master recovery from zk session expiry.
*
* Starts with one master. Fakes the master zk session expired.
* Ensures the master cannot recover the expired zk session since
* the master zk node is still there.
*/
   public void testMasterZKSessionRecoveryFailure() throws Exception {
 MiniHBaseCluster cluster = TEST_UTIL.getHBaseCluster();
 HMaster m = cluster.getMaster();
 m.abort(Test recovery from zk session expired,
   new KeeperException.SessionExpiredException());
 assertTrue(m.isStopped());
   }
 {noformat}
 This tests works, i.e. the assertion is always verified.
 But do we really want this behavior?
 When looking at the code, we see that this what's happening is strange:
 - HMaster#abort calls Master#abortNow. If HMaster#abortNow returns false 
 HMaster#abort stops the master.
 - HMaster#abortNow checks the exception type. As it's a 
 SessionExpiredException it will try to recover, calling 
 HMaster#tryRecoveringExpiredZKSession. If it cannot, it will return false 
 (and that will make HMaster#abort stopping the master)
 - HMaster#tryRecoveringExpiredZKSession recreates a ZooKeeperConnection and 
 then try to become the active master. If it cannot, it will return false (and 
 that will make HMaster#abort stopping the master).
 - HMaster#becomeActiveMaster returns the result of 
 ActiveMasterManager#blockUntilBecomingActiveMaster. 
 blockUntilBecomingActiveMaster says it will return false if there is any 
 error preventing it to become the active master.
 - ActiveMasterManager#blockUntilBecomingActiveMaster reads ZK for the master 
 address. If it's the same port  host, it deletes the nodes, that will start 
 a recursive call to blockUntilBecomingActiveMaster. This second call succeeds 
 (we became the active master) and return true. This result is ignored by the 
 first blockUntilBecomingActiveMaster: it return false (even if we actually 
 became the active master), hence the whole suite call returns false and 
 HMaster#abort stops the master.
 In other words, the comment says Ensures the master cannot recover the 
 expired zk session since the master zk node is still there. but we're 
 actually doing a check just for this and deleting the node. If we were not 
 ignoring the result, we would return true, so we would not stop the master, 
 so the test would fail.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5572) KeeperException.SessionExpiredException management could be improved in Master

2012-03-13 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-5572:
---

Status: Open  (was: Patch Available)

 KeeperException.SessionExpiredException management could be improved in Master
 --

 Key: HBASE-5572
 URL: https://issues.apache.org/jira/browse/HBASE-5572
 Project: HBase
  Issue Type: Improvement
  Components: master
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Fix For: 0.96.0

 Attachments: 5572.v1.patch, 5572.v2.patch, 5572.v2.patch, 
 5572.v2.patch


 Synthesis:
  1) TestMasterZKSessionRecovery distinguish two cases on 
 SessionExpiredException. One is explicitly not managed. However, is seems 
 that there is no reason for this.
  2) The issue lies in ActiveMasterManager#blockUntilBecomingActiveMaster, a 
 quite complex function, with a useless recursive call.
  3) TestMasterZKSessionRecovery#testMasterZKSessionRecoverySuccess is 
 equivalent to TestZooKeeper#testMasterSessionExpired
  4) TestMasterZKSessionRecovery#testMasterZKSessionRecoveryFailure can be 
 removed if we merge the two cases mentioned above.
 Changes are:
  2) Changing ActiveMasterManager#blockUntilBecomingActiveMaster to have a 
 single case and remove recursion
  1) Removing TestMasterZKSessionRecovery
 Detailed justification:
 testMasterZKSessionRecoveryFailure says:
 {noformat}
   /**
* Negative test of master recovery from zk session expiry.
*
* Starts with one master. Fakes the master zk session expired.
* Ensures the master cannot recover the expired zk session since
* the master zk node is still there.
*/
   public void testMasterZKSessionRecoveryFailure() throws Exception {
 MiniHBaseCluster cluster = TEST_UTIL.getHBaseCluster();
 HMaster m = cluster.getMaster();
 m.abort(Test recovery from zk session expired,
   new KeeperException.SessionExpiredException());
 assertTrue(m.isStopped());
   }
 {noformat}
 This tests works, i.e. the assertion is always verified.
 But do we really want this behavior?
 When looking at the code, we see that this what's happening is strange:
 - HMaster#abort calls Master#abortNow. If HMaster#abortNow returns false 
 HMaster#abort stops the master.
 - HMaster#abortNow checks the exception type. As it's a 
 SessionExpiredException it will try to recover, calling 
 HMaster#tryRecoveringExpiredZKSession. If it cannot, it will return false 
 (and that will make HMaster#abort stopping the master)
 - HMaster#tryRecoveringExpiredZKSession recreates a ZooKeeperConnection and 
 then try to become the active master. If it cannot, it will return false (and 
 that will make HMaster#abort stopping the master).
 - HMaster#becomeActiveMaster returns the result of 
 ActiveMasterManager#blockUntilBecomingActiveMaster. 
 blockUntilBecomingActiveMaster says it will return false if there is any 
 error preventing it to become the active master.
 - ActiveMasterManager#blockUntilBecomingActiveMaster reads ZK for the master 
 address. If it's the same port  host, it deletes the nodes, that will start 
 a recursive call to blockUntilBecomingActiveMaster. This second call succeeds 
 (we became the active master) and return true. This result is ignored by the 
 first blockUntilBecomingActiveMaster: it return false (even if we actually 
 became the active master), hence the whole suite call returns false and 
 HMaster#abort stops the master.
 In other words, the comment says Ensures the master cannot recover the 
 expired zk session since the master zk node is still there. but we're 
 actually doing a check just for this and deleting the node. If we were not 
 ignoring the result, we would return true, so we would not stop the master, 
 so the test would fail.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira