[jira] [Commented] (HBASE-6122) Backup master does not become Active master after ZK exception
[ https://issues.apache.org/jira/browse/HBASE-6122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13287235#comment-13287235 ] Hudson commented on HBASE-6122: --- Integrated in HBase-0.92-security #109 (See [https://builds.apache.org/job/HBase-0.92-security/109/]) HBASE-6122 Backup master does not become Active master after ZK exception (Ram) (Revision 1344799) HBASE-6122 Backup master does not become Active master after ZK exception: REVERT (Revision 1344466) HBASE-6122 Backup master does not become Active master after ZK exception (Ram) (Revision 1344350) Result = SUCCESS ramkrishna : Files : * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java * /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/master/TestMasterZKSessionRecovery.java stack : Files : * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java ramkrishna : Files : * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java Backup master does not become Active master after ZK exception -- Key: HBASE-6122 URL: https://issues.apache.org/jira/browse/HBASE-6122 Project: HBase Issue Type: Bug Affects Versions: 0.94.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.92.2, 0.94.1 Attachments: HBASE-6122.patch, HBASE-6122_0.92.patch, HBASE-6122_0.94.patch, HBASE-6122_0.94.patch - Active master gets ZK expiry exception. - Backup master becomes active. - The previous active master retries and becomes the back up master. Now when the new active master goes down and the current back up master comes up, it goes down again with the zk expiry exception it got in the first step. {code} if (abortNow(msg, t)) { if (t != null) LOG.fatal(msg, t); else LOG.fatal(msg); this.abort = true; stop(Aborting); } {code} In ActiveMasterManager.blockUntilBecomingActiveMaster we try to wait till the back up master becomes active. {code} synchronized (this.clusterHasActiveMaster) { while (this.clusterHasActiveMaster.get() !this.master.isStopped()) { try { this.clusterHasActiveMaster.wait(); } catch (InterruptedException e) { // We expect to be interrupted when a master dies, will fall out if so LOG.debug(Interrupted waiting for master to die, e); } } if (!clusterStatusTracker.isClusterUp()) { this.master.stop(Cluster went down before this master became active); } if (this.master.isStopped()) { return cleanSetOfActiveMaster; } // Try to become active master again now that there is no active master blockUntilBecomingActiveMaster(startupStatus,clusterStatusTracker); } return cleanSetOfActiveMaster; {code} When the back up master (it is in back up mode as he got ZK exception), once again tries to come to active we don't get the return value that comes out from {code} // Try to become active master again now that there is no active master blockUntilBecomingActiveMaster(startupStatus,clusterStatusTracker); {code} We tend to return the 'cleanSetOfActiveMaster' which was previously false. Now because of this instead of again becoming active the back up master goes down in the abort() code. Thanks to Gopi,my colleague for reporting this issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6122) Backup master does not become Active master after ZK exception
[ https://issues.apache.org/jira/browse/HBASE-6122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13287278#comment-13287278 ] nkeywal commented on HBASE-6122: @ram bq. I found some changes in the trunk code. So not sure if it is applicable in trunk. Attached patches for 0.94 and 0.92. Do you mean that the problem is not reproducible on trunk? Backup master does not become Active master after ZK exception -- Key: HBASE-6122 URL: https://issues.apache.org/jira/browse/HBASE-6122 Project: HBase Issue Type: Bug Affects Versions: 0.94.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.92.2, 0.94.1 Attachments: HBASE-6122.patch, HBASE-6122_0.92.patch, HBASE-6122_0.94.patch, HBASE-6122_0.94.patch - Active master gets ZK expiry exception. - Backup master becomes active. - The previous active master retries and becomes the back up master. Now when the new active master goes down and the current back up master comes up, it goes down again with the zk expiry exception it got in the first step. {code} if (abortNow(msg, t)) { if (t != null) LOG.fatal(msg, t); else LOG.fatal(msg); this.abort = true; stop(Aborting); } {code} In ActiveMasterManager.blockUntilBecomingActiveMaster we try to wait till the back up master becomes active. {code} synchronized (this.clusterHasActiveMaster) { while (this.clusterHasActiveMaster.get() !this.master.isStopped()) { try { this.clusterHasActiveMaster.wait(); } catch (InterruptedException e) { // We expect to be interrupted when a master dies, will fall out if so LOG.debug(Interrupted waiting for master to die, e); } } if (!clusterStatusTracker.isClusterUp()) { this.master.stop(Cluster went down before this master became active); } if (this.master.isStopped()) { return cleanSetOfActiveMaster; } // Try to become active master again now that there is no active master blockUntilBecomingActiveMaster(startupStatus,clusterStatusTracker); } return cleanSetOfActiveMaster; {code} When the back up master (it is in back up mode as he got ZK exception), once again tries to come to active we don't get the return value that comes out from {code} // Try to become active master again now that there is no active master blockUntilBecomingActiveMaster(startupStatus,clusterStatusTracker); {code} We tend to return the 'cleanSetOfActiveMaster' which was previously false. Now because of this instead of again becoming active the back up master goes down in the abort() code. Thanks to Gopi,my colleague for reporting this issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6122) Backup master does not become Active master after ZK exception
[ https://issues.apache.org/jira/browse/HBASE-6122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13287296#comment-13287296 ] ramkrishna.s.vasudevan commented on HBASE-6122: --- @N The trunk code is different. Currently there is a while(true) loop and as far as i see it should be ok in trunk. I did not try to reproduce in trunk. Backup master does not become Active master after ZK exception -- Key: HBASE-6122 URL: https://issues.apache.org/jira/browse/HBASE-6122 Project: HBase Issue Type: Bug Affects Versions: 0.94.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.92.2, 0.94.1 Attachments: HBASE-6122.patch, HBASE-6122_0.92.patch, HBASE-6122_0.94.patch, HBASE-6122_0.94.patch - Active master gets ZK expiry exception. - Backup master becomes active. - The previous active master retries and becomes the back up master. Now when the new active master goes down and the current back up master comes up, it goes down again with the zk expiry exception it got in the first step. {code} if (abortNow(msg, t)) { if (t != null) LOG.fatal(msg, t); else LOG.fatal(msg); this.abort = true; stop(Aborting); } {code} In ActiveMasterManager.blockUntilBecomingActiveMaster we try to wait till the back up master becomes active. {code} synchronized (this.clusterHasActiveMaster) { while (this.clusterHasActiveMaster.get() !this.master.isStopped()) { try { this.clusterHasActiveMaster.wait(); } catch (InterruptedException e) { // We expect to be interrupted when a master dies, will fall out if so LOG.debug(Interrupted waiting for master to die, e); } } if (!clusterStatusTracker.isClusterUp()) { this.master.stop(Cluster went down before this master became active); } if (this.master.isStopped()) { return cleanSetOfActiveMaster; } // Try to become active master again now that there is no active master blockUntilBecomingActiveMaster(startupStatus,clusterStatusTracker); } return cleanSetOfActiveMaster; {code} When the back up master (it is in back up mode as he got ZK exception), once again tries to come to active we don't get the return value that comes out from {code} // Try to become active master again now that there is no active master blockUntilBecomingActiveMaster(startupStatus,clusterStatusTracker); {code} We tend to return the 'cleanSetOfActiveMaster' which was previously false. Now because of this instead of again becoming active the back up master goes down in the abort() code. Thanks to Gopi,my colleague for reporting this issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6122) Backup master does not become Active master after ZK exception
[ https://issues.apache.org/jira/browse/HBASE-6122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13287306#comment-13287306 ] nkeywal commented on HBASE-6122: Thanks, I will give it a try to be sure. Backup master does not become Active master after ZK exception -- Key: HBASE-6122 URL: https://issues.apache.org/jira/browse/HBASE-6122 Project: HBase Issue Type: Bug Affects Versions: 0.94.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.92.2, 0.94.1 Attachments: HBASE-6122.patch, HBASE-6122_0.92.patch, HBASE-6122_0.94.patch, HBASE-6122_0.94.patch - Active master gets ZK expiry exception. - Backup master becomes active. - The previous active master retries and becomes the back up master. Now when the new active master goes down and the current back up master comes up, it goes down again with the zk expiry exception it got in the first step. {code} if (abortNow(msg, t)) { if (t != null) LOG.fatal(msg, t); else LOG.fatal(msg); this.abort = true; stop(Aborting); } {code} In ActiveMasterManager.blockUntilBecomingActiveMaster we try to wait till the back up master becomes active. {code} synchronized (this.clusterHasActiveMaster) { while (this.clusterHasActiveMaster.get() !this.master.isStopped()) { try { this.clusterHasActiveMaster.wait(); } catch (InterruptedException e) { // We expect to be interrupted when a master dies, will fall out if so LOG.debug(Interrupted waiting for master to die, e); } } if (!clusterStatusTracker.isClusterUp()) { this.master.stop(Cluster went down before this master became active); } if (this.master.isStopped()) { return cleanSetOfActiveMaster; } // Try to become active master again now that there is no active master blockUntilBecomingActiveMaster(startupStatus,clusterStatusTracker); } return cleanSetOfActiveMaster; {code} When the back up master (it is in back up mode as he got ZK exception), once again tries to come to active we don't get the return value that comes out from {code} // Try to become active master again now that there is no active master blockUntilBecomingActiveMaster(startupStatus,clusterStatusTracker); {code} We tend to return the 'cleanSetOfActiveMaster' which was previously false. Now because of this instead of again becoming active the back up master goes down in the abort() code. Thanks to Gopi,my colleague for reporting this issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6122) Backup master does not become Active master after ZK exception
[ https://issues.apache.org/jira/browse/HBASE-6122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13286359#comment-13286359 ] stack commented on HBASE-6122: -- Looks good Ram. +1 Backup master does not become Active master after ZK exception -- Key: HBASE-6122 URL: https://issues.apache.org/jira/browse/HBASE-6122 Project: HBase Issue Type: Bug Affects Versions: 0.94.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.92.2, 0.94.1 Attachments: HBASE-6122.patch, HBASE-6122_0.92.patch, HBASE-6122_0.94.patch, HBASE-6122_0.94.patch - Active master gets ZK expiry exception. - Backup master becomes active. - The previous active master retries and becomes the back up master. Now when the new active master goes down and the current back up master comes up, it goes down again with the zk expiry exception it got in the first step. {code} if (abortNow(msg, t)) { if (t != null) LOG.fatal(msg, t); else LOG.fatal(msg); this.abort = true; stop(Aborting); } {code} In ActiveMasterManager.blockUntilBecomingActiveMaster we try to wait till the back up master becomes active. {code} synchronized (this.clusterHasActiveMaster) { while (this.clusterHasActiveMaster.get() !this.master.isStopped()) { try { this.clusterHasActiveMaster.wait(); } catch (InterruptedException e) { // We expect to be interrupted when a master dies, will fall out if so LOG.debug(Interrupted waiting for master to die, e); } } if (!clusterStatusTracker.isClusterUp()) { this.master.stop(Cluster went down before this master became active); } if (this.master.isStopped()) { return cleanSetOfActiveMaster; } // Try to become active master again now that there is no active master blockUntilBecomingActiveMaster(startupStatus,clusterStatusTracker); } return cleanSetOfActiveMaster; {code} When the back up master (it is in back up mode as he got ZK exception), once again tries to come to active we don't get the return value that comes out from {code} // Try to become active master again now that there is no active master blockUntilBecomingActiveMaster(startupStatus,clusterStatusTracker); {code} We tend to return the 'cleanSetOfActiveMaster' which was previously false. Now because of this instead of again becoming active the back up master goes down in the abort() code. Thanks to Gopi,my colleague for reporting this issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6122) Backup master does not become Active master after ZK exception
[ https://issues.apache.org/jira/browse/HBASE-6122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13286765#comment-13286765 ] Hudson commented on HBASE-6122: --- Integrated in HBase-0.94 #240 (See [https://builds.apache.org/job/HBase-0.94/240/]) HBASE-6122 Backup master does not become Active master after ZK exception (Ram) (Revision 1344798) Result = SUCCESS ramkrishna : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/master/TestMasterZKSessionRecovery.java Backup master does not become Active master after ZK exception -- Key: HBASE-6122 URL: https://issues.apache.org/jira/browse/HBASE-6122 Project: HBase Issue Type: Bug Affects Versions: 0.94.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.92.2, 0.94.1 Attachments: HBASE-6122.patch, HBASE-6122_0.92.patch, HBASE-6122_0.94.patch, HBASE-6122_0.94.patch - Active master gets ZK expiry exception. - Backup master becomes active. - The previous active master retries and becomes the back up master. Now when the new active master goes down and the current back up master comes up, it goes down again with the zk expiry exception it got in the first step. {code} if (abortNow(msg, t)) { if (t != null) LOG.fatal(msg, t); else LOG.fatal(msg); this.abort = true; stop(Aborting); } {code} In ActiveMasterManager.blockUntilBecomingActiveMaster we try to wait till the back up master becomes active. {code} synchronized (this.clusterHasActiveMaster) { while (this.clusterHasActiveMaster.get() !this.master.isStopped()) { try { this.clusterHasActiveMaster.wait(); } catch (InterruptedException e) { // We expect to be interrupted when a master dies, will fall out if so LOG.debug(Interrupted waiting for master to die, e); } } if (!clusterStatusTracker.isClusterUp()) { this.master.stop(Cluster went down before this master became active); } if (this.master.isStopped()) { return cleanSetOfActiveMaster; } // Try to become active master again now that there is no active master blockUntilBecomingActiveMaster(startupStatus,clusterStatusTracker); } return cleanSetOfActiveMaster; {code} When the back up master (it is in back up mode as he got ZK exception), once again tries to come to active we don't get the return value that comes out from {code} // Try to become active master again now that there is no active master blockUntilBecomingActiveMaster(startupStatus,clusterStatusTracker); {code} We tend to return the 'cleanSetOfActiveMaster' which was previously false. Now because of this instead of again becoming active the back up master goes down in the abort() code. Thanks to Gopi,my colleague for reporting this issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6122) Backup master does not become Active master after ZK exception
[ https://issues.apache.org/jira/browse/HBASE-6122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13286844#comment-13286844 ] Hudson commented on HBASE-6122: --- Integrated in HBase-0.92 #439 (See [https://builds.apache.org/job/HBase-0.92/439/]) HBASE-6122 Backup master does not become Active master after ZK exception (Ram) (Revision 1344799) Result = FAILURE ramkrishna : Files : * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java * /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/master/TestMasterZKSessionRecovery.java Backup master does not become Active master after ZK exception -- Key: HBASE-6122 URL: https://issues.apache.org/jira/browse/HBASE-6122 Project: HBase Issue Type: Bug Affects Versions: 0.94.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.92.2, 0.94.1 Attachments: HBASE-6122.patch, HBASE-6122_0.92.patch, HBASE-6122_0.94.patch, HBASE-6122_0.94.patch - Active master gets ZK expiry exception. - Backup master becomes active. - The previous active master retries and becomes the back up master. Now when the new active master goes down and the current back up master comes up, it goes down again with the zk expiry exception it got in the first step. {code} if (abortNow(msg, t)) { if (t != null) LOG.fatal(msg, t); else LOG.fatal(msg); this.abort = true; stop(Aborting); } {code} In ActiveMasterManager.blockUntilBecomingActiveMaster we try to wait till the back up master becomes active. {code} synchronized (this.clusterHasActiveMaster) { while (this.clusterHasActiveMaster.get() !this.master.isStopped()) { try { this.clusterHasActiveMaster.wait(); } catch (InterruptedException e) { // We expect to be interrupted when a master dies, will fall out if so LOG.debug(Interrupted waiting for master to die, e); } } if (!clusterStatusTracker.isClusterUp()) { this.master.stop(Cluster went down before this master became active); } if (this.master.isStopped()) { return cleanSetOfActiveMaster; } // Try to become active master again now that there is no active master blockUntilBecomingActiveMaster(startupStatus,clusterStatusTracker); } return cleanSetOfActiveMaster; {code} When the back up master (it is in back up mode as he got ZK exception), once again tries to come to active we don't get the return value that comes out from {code} // Try to become active master again now that there is no active master blockUntilBecomingActiveMaster(startupStatus,clusterStatusTracker); {code} We tend to return the 'cleanSetOfActiveMaster' which was previously false. Now because of this instead of again becoming active the back up master goes down in the abort() code. Thanks to Gopi,my colleague for reporting this issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6122) Backup master does not become Active master after ZK exception
[ https://issues.apache.org/jira/browse/HBASE-6122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13287164#comment-13287164 ] Hudson commented on HBASE-6122: --- Integrated in HBase-0.94-security #33 (See [https://builds.apache.org/job/HBase-0.94-security/33/]) HBASE-6122 Backup master does not become Active master after ZK exception (Ram) (Revision 1344798) HBASE-6122 Backup master does not become Active master after ZK exception: REVERT (Revision 1344467) HBASE-6122 Backup master does not become Active master after ZK exception (Ram) (Revision 1344348) Result = FAILURE ramkrishna : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/master/TestMasterZKSessionRecovery.java stack : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java ramkrishna : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java Backup master does not become Active master after ZK exception -- Key: HBASE-6122 URL: https://issues.apache.org/jira/browse/HBASE-6122 Project: HBase Issue Type: Bug Affects Versions: 0.94.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.92.2, 0.94.1 Attachments: HBASE-6122.patch, HBASE-6122_0.92.patch, HBASE-6122_0.94.patch, HBASE-6122_0.94.patch - Active master gets ZK expiry exception. - Backup master becomes active. - The previous active master retries and becomes the back up master. Now when the new active master goes down and the current back up master comes up, it goes down again with the zk expiry exception it got in the first step. {code} if (abortNow(msg, t)) { if (t != null) LOG.fatal(msg, t); else LOG.fatal(msg); this.abort = true; stop(Aborting); } {code} In ActiveMasterManager.blockUntilBecomingActiveMaster we try to wait till the back up master becomes active. {code} synchronized (this.clusterHasActiveMaster) { while (this.clusterHasActiveMaster.get() !this.master.isStopped()) { try { this.clusterHasActiveMaster.wait(); } catch (InterruptedException e) { // We expect to be interrupted when a master dies, will fall out if so LOG.debug(Interrupted waiting for master to die, e); } } if (!clusterStatusTracker.isClusterUp()) { this.master.stop(Cluster went down before this master became active); } if (this.master.isStopped()) { return cleanSetOfActiveMaster; } // Try to become active master again now that there is no active master blockUntilBecomingActiveMaster(startupStatus,clusterStatusTracker); } return cleanSetOfActiveMaster; {code} When the back up master (it is in back up mode as he got ZK exception), once again tries to come to active we don't get the return value that comes out from {code} // Try to become active master again now that there is no active master blockUntilBecomingActiveMaster(startupStatus,clusterStatusTracker); {code} We tend to return the 'cleanSetOfActiveMaster' which was previously false. Now because of this instead of again becoming active the back up master goes down in the abort() code. Thanks to Gopi,my colleague for reporting this issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6122) Backup master does not become Active master after ZK exception
[ https://issues.apache.org/jira/browse/HBASE-6122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13285823#comment-13285823 ] ramkrishna.s.vasudevan commented on HBASE-6122: --- Committed to 0.92 and 0.94. Thanks for the review Lars. Backup master does not become Active master after ZK exception -- Key: HBASE-6122 URL: https://issues.apache.org/jira/browse/HBASE-6122 Project: HBase Issue Type: Bug Affects Versions: 0.94.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.92.2, 0.94.1 Attachments: HBASE-6122_0.92.patch, HBASE-6122_0.94.patch - Active master gets ZK expiry exception. - Backup master becomes active. - The previous active master retries and becomes the back up master. Now when the new active master goes down and the current back up master comes up, it goes down again with the zk expiry exception it got in the first step. {code} if (abortNow(msg, t)) { if (t != null) LOG.fatal(msg, t); else LOG.fatal(msg); this.abort = true; stop(Aborting); } {code} In ActiveMasterManager.blockUntilBecomingActiveMaster we try to wait till the back up master becomes active. {code} synchronized (this.clusterHasActiveMaster) { while (this.clusterHasActiveMaster.get() !this.master.isStopped()) { try { this.clusterHasActiveMaster.wait(); } catch (InterruptedException e) { // We expect to be interrupted when a master dies, will fall out if so LOG.debug(Interrupted waiting for master to die, e); } } if (!clusterStatusTracker.isClusterUp()) { this.master.stop(Cluster went down before this master became active); } if (this.master.isStopped()) { return cleanSetOfActiveMaster; } // Try to become active master again now that there is no active master blockUntilBecomingActiveMaster(startupStatus,clusterStatusTracker); } return cleanSetOfActiveMaster; {code} When the back up master (it is in back up mode as he got ZK exception), once again tries to come to active we don't get the return value that comes out from {code} // Try to become active master again now that there is no active master blockUntilBecomingActiveMaster(startupStatus,clusterStatusTracker); {code} We tend to return the 'cleanSetOfActiveMaster' which was previously false. Now because of this instead of again becoming active the back up master goes down in the abort() code. Thanks to Gopi,my colleague for reporting this issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6122) Backup master does not become Active master after ZK exception
[ https://issues.apache.org/jira/browse/HBASE-6122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13285871#comment-13285871 ] Hudson commented on HBASE-6122: --- Integrated in HBase-0.94 #233 (See [https://builds.apache.org/job/HBase-0.94/233/]) HBASE-6122 Backup master does not become Active master after ZK exception (Ram) (Revision 1344348) Result = FAILURE ramkrishna : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java Backup master does not become Active master after ZK exception -- Key: HBASE-6122 URL: https://issues.apache.org/jira/browse/HBASE-6122 Project: HBase Issue Type: Bug Affects Versions: 0.94.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.92.2, 0.94.1 Attachments: HBASE-6122_0.92.patch, HBASE-6122_0.94.patch - Active master gets ZK expiry exception. - Backup master becomes active. - The previous active master retries and becomes the back up master. Now when the new active master goes down and the current back up master comes up, it goes down again with the zk expiry exception it got in the first step. {code} if (abortNow(msg, t)) { if (t != null) LOG.fatal(msg, t); else LOG.fatal(msg); this.abort = true; stop(Aborting); } {code} In ActiveMasterManager.blockUntilBecomingActiveMaster we try to wait till the back up master becomes active. {code} synchronized (this.clusterHasActiveMaster) { while (this.clusterHasActiveMaster.get() !this.master.isStopped()) { try { this.clusterHasActiveMaster.wait(); } catch (InterruptedException e) { // We expect to be interrupted when a master dies, will fall out if so LOG.debug(Interrupted waiting for master to die, e); } } if (!clusterStatusTracker.isClusterUp()) { this.master.stop(Cluster went down before this master became active); } if (this.master.isStopped()) { return cleanSetOfActiveMaster; } // Try to become active master again now that there is no active master blockUntilBecomingActiveMaster(startupStatus,clusterStatusTracker); } return cleanSetOfActiveMaster; {code} When the back up master (it is in back up mode as he got ZK exception), once again tries to come to active we don't get the return value that comes out from {code} // Try to become active master again now that there is no active master blockUntilBecomingActiveMaster(startupStatus,clusterStatusTracker); {code} We tend to return the 'cleanSetOfActiveMaster' which was previously false. Now because of this instead of again becoming active the back up master goes down in the abort() code. Thanks to Gopi,my colleague for reporting this issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6122) Backup master does not become Active master after ZK exception
[ https://issues.apache.org/jira/browse/HBASE-6122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13285959#comment-13285959 ] Hudson commented on HBASE-6122: --- Integrated in HBase-0.92 #433 (See [https://builds.apache.org/job/HBase-0.92/433/]) HBASE-6122 Backup master does not become Active master after ZK exception (Ram) (Revision 1344350) Result = FAILURE ramkrishna : Files : * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java Backup master does not become Active master after ZK exception -- Key: HBASE-6122 URL: https://issues.apache.org/jira/browse/HBASE-6122 Project: HBase Issue Type: Bug Affects Versions: 0.94.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.92.2, 0.94.1 Attachments: HBASE-6122_0.92.patch, HBASE-6122_0.94.patch - Active master gets ZK expiry exception. - Backup master becomes active. - The previous active master retries and becomes the back up master. Now when the new active master goes down and the current back up master comes up, it goes down again with the zk expiry exception it got in the first step. {code} if (abortNow(msg, t)) { if (t != null) LOG.fatal(msg, t); else LOG.fatal(msg); this.abort = true; stop(Aborting); } {code} In ActiveMasterManager.blockUntilBecomingActiveMaster we try to wait till the back up master becomes active. {code} synchronized (this.clusterHasActiveMaster) { while (this.clusterHasActiveMaster.get() !this.master.isStopped()) { try { this.clusterHasActiveMaster.wait(); } catch (InterruptedException e) { // We expect to be interrupted when a master dies, will fall out if so LOG.debug(Interrupted waiting for master to die, e); } } if (!clusterStatusTracker.isClusterUp()) { this.master.stop(Cluster went down before this master became active); } if (this.master.isStopped()) { return cleanSetOfActiveMaster; } // Try to become active master again now that there is no active master blockUntilBecomingActiveMaster(startupStatus,clusterStatusTracker); } return cleanSetOfActiveMaster; {code} When the back up master (it is in back up mode as he got ZK exception), once again tries to come to active we don't get the return value that comes out from {code} // Try to become active master again now that there is no active master blockUntilBecomingActiveMaster(startupStatus,clusterStatusTracker); {code} We tend to return the 'cleanSetOfActiveMaster' which was previously false. Now because of this instead of again becoming active the back up master goes down in the abort() code. Thanks to Gopi,my colleague for reporting this issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6122) Backup master does not become Active master after ZK exception
[ https://issues.apache.org/jira/browse/HBASE-6122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13286072#comment-13286072 ] stack commented on HBASE-6122: -- I reverted from 0.92 and 0.94 branches till we figure the failures. Backup master does not become Active master after ZK exception -- Key: HBASE-6122 URL: https://issues.apache.org/jira/browse/HBASE-6122 Project: HBase Issue Type: Bug Affects Versions: 0.94.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.92.2, 0.94.1 Attachments: HBASE-6122_0.92.patch, HBASE-6122_0.94.patch - Active master gets ZK expiry exception. - Backup master becomes active. - The previous active master retries and becomes the back up master. Now when the new active master goes down and the current back up master comes up, it goes down again with the zk expiry exception it got in the first step. {code} if (abortNow(msg, t)) { if (t != null) LOG.fatal(msg, t); else LOG.fatal(msg); this.abort = true; stop(Aborting); } {code} In ActiveMasterManager.blockUntilBecomingActiveMaster we try to wait till the back up master becomes active. {code} synchronized (this.clusterHasActiveMaster) { while (this.clusterHasActiveMaster.get() !this.master.isStopped()) { try { this.clusterHasActiveMaster.wait(); } catch (InterruptedException e) { // We expect to be interrupted when a master dies, will fall out if so LOG.debug(Interrupted waiting for master to die, e); } } if (!clusterStatusTracker.isClusterUp()) { this.master.stop(Cluster went down before this master became active); } if (this.master.isStopped()) { return cleanSetOfActiveMaster; } // Try to become active master again now that there is no active master blockUntilBecomingActiveMaster(startupStatus,clusterStatusTracker); } return cleanSetOfActiveMaster; {code} When the back up master (it is in back up mode as he got ZK exception), once again tries to come to active we don't get the return value that comes out from {code} // Try to become active master again now that there is no active master blockUntilBecomingActiveMaster(startupStatus,clusterStatusTracker); {code} We tend to return the 'cleanSetOfActiveMaster' which was previously false. Now because of this instead of again becoming active the back up master goes down in the abort() code. Thanks to Gopi,my colleague for reporting this issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6122) Backup master does not become Active master after ZK exception
[ https://issues.apache.org/jira/browse/HBASE-6122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13286166#comment-13286166 ] Hudson commented on HBASE-6122: --- Integrated in HBase-0.94 #236 (See [https://builds.apache.org/job/HBase-0.94/236/]) HBASE-6122 Backup master does not become Active master after ZK exception: REVERT (Revision 1344467) Result = SUCCESS stack : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java Backup master does not become Active master after ZK exception -- Key: HBASE-6122 URL: https://issues.apache.org/jira/browse/HBASE-6122 Project: HBase Issue Type: Bug Affects Versions: 0.94.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.92.2, 0.94.1 Attachments: HBASE-6122_0.92.patch, HBASE-6122_0.94.patch - Active master gets ZK expiry exception. - Backup master becomes active. - The previous active master retries and becomes the back up master. Now when the new active master goes down and the current back up master comes up, it goes down again with the zk expiry exception it got in the first step. {code} if (abortNow(msg, t)) { if (t != null) LOG.fatal(msg, t); else LOG.fatal(msg); this.abort = true; stop(Aborting); } {code} In ActiveMasterManager.blockUntilBecomingActiveMaster we try to wait till the back up master becomes active. {code} synchronized (this.clusterHasActiveMaster) { while (this.clusterHasActiveMaster.get() !this.master.isStopped()) { try { this.clusterHasActiveMaster.wait(); } catch (InterruptedException e) { // We expect to be interrupted when a master dies, will fall out if so LOG.debug(Interrupted waiting for master to die, e); } } if (!clusterStatusTracker.isClusterUp()) { this.master.stop(Cluster went down before this master became active); } if (this.master.isStopped()) { return cleanSetOfActiveMaster; } // Try to become active master again now that there is no active master blockUntilBecomingActiveMaster(startupStatus,clusterStatusTracker); } return cleanSetOfActiveMaster; {code} When the back up master (it is in back up mode as he got ZK exception), once again tries to come to active we don't get the return value that comes out from {code} // Try to become active master again now that there is no active master blockUntilBecomingActiveMaster(startupStatus,clusterStatusTracker); {code} We tend to return the 'cleanSetOfActiveMaster' which was previously false. Now because of this instead of again becoming active the back up master goes down in the abort() code. Thanks to Gopi,my colleague for reporting this issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6122) Backup master does not become Active master after ZK exception
[ https://issues.apache.org/jira/browse/HBASE-6122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13286235#comment-13286235 ] Hudson commented on HBASE-6122: --- Integrated in HBase-0.92 #435 (See [https://builds.apache.org/job/HBase-0.92/435/]) HBASE-6122 Backup master does not become Active master after ZK exception: REVERT (Revision 1344466) Result = SUCCESS stack : Files : * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java Backup master does not become Active master after ZK exception -- Key: HBASE-6122 URL: https://issues.apache.org/jira/browse/HBASE-6122 Project: HBase Issue Type: Bug Affects Versions: 0.94.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.92.2, 0.94.1 Attachments: HBASE-6122_0.92.patch, HBASE-6122_0.94.patch - Active master gets ZK expiry exception. - Backup master becomes active. - The previous active master retries and becomes the back up master. Now when the new active master goes down and the current back up master comes up, it goes down again with the zk expiry exception it got in the first step. {code} if (abortNow(msg, t)) { if (t != null) LOG.fatal(msg, t); else LOG.fatal(msg); this.abort = true; stop(Aborting); } {code} In ActiveMasterManager.blockUntilBecomingActiveMaster we try to wait till the back up master becomes active. {code} synchronized (this.clusterHasActiveMaster) { while (this.clusterHasActiveMaster.get() !this.master.isStopped()) { try { this.clusterHasActiveMaster.wait(); } catch (InterruptedException e) { // We expect to be interrupted when a master dies, will fall out if so LOG.debug(Interrupted waiting for master to die, e); } } if (!clusterStatusTracker.isClusterUp()) { this.master.stop(Cluster went down before this master became active); } if (this.master.isStopped()) { return cleanSetOfActiveMaster; } // Try to become active master again now that there is no active master blockUntilBecomingActiveMaster(startupStatus,clusterStatusTracker); } return cleanSetOfActiveMaster; {code} When the back up master (it is in back up mode as he got ZK exception), once again tries to come to active we don't get the return value that comes out from {code} // Try to become active master again now that there is no active master blockUntilBecomingActiveMaster(startupStatus,clusterStatusTracker); {code} We tend to return the 'cleanSetOfActiveMaster' which was previously false. Now because of this instead of again becoming active the back up master goes down in the abort() code. Thanks to Gopi,my colleague for reporting this issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6122) Backup master does not become Active master after ZK exception
[ https://issues.apache.org/jira/browse/HBASE-6122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13286298#comment-13286298 ] ramkrishna.s.vasudevan commented on HBASE-6122: --- Oh... Let me check out the reason for the failure. Sorry for the mess. Backup master does not become Active master after ZK exception -- Key: HBASE-6122 URL: https://issues.apache.org/jira/browse/HBASE-6122 Project: HBase Issue Type: Bug Affects Versions: 0.94.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.92.2, 0.94.1 Attachments: HBASE-6122_0.92.patch, HBASE-6122_0.94.patch - Active master gets ZK expiry exception. - Backup master becomes active. - The previous active master retries and becomes the back up master. Now when the new active master goes down and the current back up master comes up, it goes down again with the zk expiry exception it got in the first step. {code} if (abortNow(msg, t)) { if (t != null) LOG.fatal(msg, t); else LOG.fatal(msg); this.abort = true; stop(Aborting); } {code} In ActiveMasterManager.blockUntilBecomingActiveMaster we try to wait till the back up master becomes active. {code} synchronized (this.clusterHasActiveMaster) { while (this.clusterHasActiveMaster.get() !this.master.isStopped()) { try { this.clusterHasActiveMaster.wait(); } catch (InterruptedException e) { // We expect to be interrupted when a master dies, will fall out if so LOG.debug(Interrupted waiting for master to die, e); } } if (!clusterStatusTracker.isClusterUp()) { this.master.stop(Cluster went down before this master became active); } if (this.master.isStopped()) { return cleanSetOfActiveMaster; } // Try to become active master again now that there is no active master blockUntilBecomingActiveMaster(startupStatus,clusterStatusTracker); } return cleanSetOfActiveMaster; {code} When the back up master (it is in back up mode as he got ZK exception), once again tries to come to active we don't get the return value that comes out from {code} // Try to become active master again now that there is no active master blockUntilBecomingActiveMaster(startupStatus,clusterStatusTracker); {code} We tend to return the 'cleanSetOfActiveMaster' which was previously false. Now because of this instead of again becoming active the back up master goes down in the abort() code. Thanks to Gopi,my colleague for reporting this issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6122) Backup master does not become Active master after ZK exception
[ https://issues.apache.org/jira/browse/HBASE-6122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13286334#comment-13286334 ] ramkrishna.s.vasudevan commented on HBASE-6122: --- I checked the test case. Ideally the flow is making the master to become active but the problem as described in this JIRA still makes the master to go down. I added a log in ActiveMasterManager.blockUntilBecomingActiveMaster {code} {code} {code} 2012-05-31 10:52:29,050 INFO [pool-29-thread-1] master.ActiveMasterManager(149): Master is now available Htipl-01388.china.huawei.com,3569,1338441734226 2012-05-31 10:52:29,050 INFO [pool-29-thread-1] master.ActiveMasterManager(151): Master=Htipl-01388.china.huawei.com,3569,1338441734226 {code} Backup master does not become Active master after ZK exception -- Key: HBASE-6122 URL: https://issues.apache.org/jira/browse/HBASE-6122 Project: HBase Issue Type: Bug Affects Versions: 0.94.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.92.2, 0.94.1 Attachments: HBASE-6122_0.92.patch, HBASE-6122_0.94.patch - Active master gets ZK expiry exception. - Backup master becomes active. - The previous active master retries and becomes the back up master. Now when the new active master goes down and the current back up master comes up, it goes down again with the zk expiry exception it got in the first step. {code} if (abortNow(msg, t)) { if (t != null) LOG.fatal(msg, t); else LOG.fatal(msg); this.abort = true; stop(Aborting); } {code} In ActiveMasterManager.blockUntilBecomingActiveMaster we try to wait till the back up master becomes active. {code} synchronized (this.clusterHasActiveMaster) { while (this.clusterHasActiveMaster.get() !this.master.isStopped()) { try { this.clusterHasActiveMaster.wait(); } catch (InterruptedException e) { // We expect to be interrupted when a master dies, will fall out if so LOG.debug(Interrupted waiting for master to die, e); } } if (!clusterStatusTracker.isClusterUp()) { this.master.stop(Cluster went down before this master became active); } if (this.master.isStopped()) { return cleanSetOfActiveMaster; } // Try to become active master again now that there is no active master blockUntilBecomingActiveMaster(startupStatus,clusterStatusTracker); } return cleanSetOfActiveMaster; {code} When the back up master (it is in back up mode as he got ZK exception), once again tries to come to active we don't get the return value that comes out from {code} // Try to become active master again now that there is no active master blockUntilBecomingActiveMaster(startupStatus,clusterStatusTracker); {code} We tend to return the 'cleanSetOfActiveMaster' which was previously false. Now because of this instead of again becoming active the back up master goes down in the abort() code. Thanks to Gopi,my colleague for reporting this issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6122) Backup master does not become Active master after ZK exception
[ https://issues.apache.org/jira/browse/HBASE-6122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13286350#comment-13286350 ] stack commented on HBASE-6122: -- @Ram Which assert should be changed? Do you want to include the assert change in your patch? Or are you suggesting a previous test case is broke? If so, which? Thanks. Backup master does not become Active master after ZK exception -- Key: HBASE-6122 URL: https://issues.apache.org/jira/browse/HBASE-6122 Project: HBase Issue Type: Bug Affects Versions: 0.94.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.92.2, 0.94.1 Attachments: HBASE-6122.patch, HBASE-6122_0.92.patch, HBASE-6122_0.94.patch, HBASE-6122_0.94.patch - Active master gets ZK expiry exception. - Backup master becomes active. - The previous active master retries and becomes the back up master. Now when the new active master goes down and the current back up master comes up, it goes down again with the zk expiry exception it got in the first step. {code} if (abortNow(msg, t)) { if (t != null) LOG.fatal(msg, t); else LOG.fatal(msg); this.abort = true; stop(Aborting); } {code} In ActiveMasterManager.blockUntilBecomingActiveMaster we try to wait till the back up master becomes active. {code} synchronized (this.clusterHasActiveMaster) { while (this.clusterHasActiveMaster.get() !this.master.isStopped()) { try { this.clusterHasActiveMaster.wait(); } catch (InterruptedException e) { // We expect to be interrupted when a master dies, will fall out if so LOG.debug(Interrupted waiting for master to die, e); } } if (!clusterStatusTracker.isClusterUp()) { this.master.stop(Cluster went down before this master became active); } if (this.master.isStopped()) { return cleanSetOfActiveMaster; } // Try to become active master again now that there is no active master blockUntilBecomingActiveMaster(startupStatus,clusterStatusTracker); } return cleanSetOfActiveMaster; {code} When the back up master (it is in back up mode as he got ZK exception), once again tries to come to active we don't get the return value that comes out from {code} // Try to become active master again now that there is no active master blockUntilBecomingActiveMaster(startupStatus,clusterStatusTracker); {code} We tend to return the 'cleanSetOfActiveMaster' which was previously false. Now because of this instead of again becoming active the back up master goes down in the abort() code. Thanks to Gopi,my colleague for reporting this issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6122) Backup master does not become Active master after ZK exception
[ https://issues.apache.org/jira/browse/HBASE-6122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13286353#comment-13286353 ] ramkrishna.s.vasudevan commented on HBASE-6122: --- I have attached the patch Stack. It is changing the assert. Backup master does not become Active master after ZK exception -- Key: HBASE-6122 URL: https://issues.apache.org/jira/browse/HBASE-6122 Project: HBase Issue Type: Bug Affects Versions: 0.94.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.92.2, 0.94.1 Attachments: HBASE-6122.patch, HBASE-6122_0.92.patch, HBASE-6122_0.94.patch, HBASE-6122_0.94.patch - Active master gets ZK expiry exception. - Backup master becomes active. - The previous active master retries and becomes the back up master. Now when the new active master goes down and the current back up master comes up, it goes down again with the zk expiry exception it got in the first step. {code} if (abortNow(msg, t)) { if (t != null) LOG.fatal(msg, t); else LOG.fatal(msg); this.abort = true; stop(Aborting); } {code} In ActiveMasterManager.blockUntilBecomingActiveMaster we try to wait till the back up master becomes active. {code} synchronized (this.clusterHasActiveMaster) { while (this.clusterHasActiveMaster.get() !this.master.isStopped()) { try { this.clusterHasActiveMaster.wait(); } catch (InterruptedException e) { // We expect to be interrupted when a master dies, will fall out if so LOG.debug(Interrupted waiting for master to die, e); } } if (!clusterStatusTracker.isClusterUp()) { this.master.stop(Cluster went down before this master became active); } if (this.master.isStopped()) { return cleanSetOfActiveMaster; } // Try to become active master again now that there is no active master blockUntilBecomingActiveMaster(startupStatus,clusterStatusTracker); } return cleanSetOfActiveMaster; {code} When the back up master (it is in back up mode as he got ZK exception), once again tries to come to active we don't get the return value that comes out from {code} // Try to become active master again now that there is no active master blockUntilBecomingActiveMaster(startupStatus,clusterStatusTracker); {code} We tend to return the 'cleanSetOfActiveMaster' which was previously false. Now because of this instead of again becoming active the back up master goes down in the abort() code. Thanks to Gopi,my colleague for reporting this issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6122) Backup master does not become Active master after ZK exception
[ https://issues.apache.org/jira/browse/HBASE-6122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13284961#comment-13284961 ] Lars Hofhansl commented on HBASE-6122: -- +1 patch looks good to me. Backup master does not become Active master after ZK exception -- Key: HBASE-6122 URL: https://issues.apache.org/jira/browse/HBASE-6122 Project: HBase Issue Type: Bug Affects Versions: 0.94.0 Reporter: ramkrishna.s.vasudevan Fix For: 0.92.2, 0.96.0, 0.94.1 Attachments: HBASE-6122_0.92.patch, HBASE-6122_0.94.patch - Active master gets ZK expiry exception. - Backup master becomes active. - The previous active master retries and becomes the back up master. Now when the new active master goes down and the current back up master comes up, it goes down again with the zk expiry exception it got in the first step. {code} if (abortNow(msg, t)) { if (t != null) LOG.fatal(msg, t); else LOG.fatal(msg); this.abort = true; stop(Aborting); } {code} In ActiveMasterManager.blockUntilBecomingActiveMaster we try to wait till the back up master becomes active. {code} synchronized (this.clusterHasActiveMaster) { while (this.clusterHasActiveMaster.get() !this.master.isStopped()) { try { this.clusterHasActiveMaster.wait(); } catch (InterruptedException e) { // We expect to be interrupted when a master dies, will fall out if so LOG.debug(Interrupted waiting for master to die, e); } } if (!clusterStatusTracker.isClusterUp()) { this.master.stop(Cluster went down before this master became active); } if (this.master.isStopped()) { return cleanSetOfActiveMaster; } // Try to become active master again now that there is no active master blockUntilBecomingActiveMaster(startupStatus,clusterStatusTracker); } return cleanSetOfActiveMaster; {code} When the back up master (it is in back up mode as he got ZK exception), once again tries to come to active we don't get the return value that comes out from {code} // Try to become active master again now that there is no active master blockUntilBecomingActiveMaster(startupStatus,clusterStatusTracker); {code} We tend to return the 'cleanSetOfActiveMaster' which was previously false. Now because of this instead of again becoming active the back up master goes down in the abort() code. Thanks to Gopi,my colleague for reporting this issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira