[jira] [Commented] (HBASE-6122) Backup master does not become Active master after ZK exception

2012-06-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287235#comment-13287235
 ] 

Hudson commented on HBASE-6122:
---

Integrated in HBase-0.92-security #109 (See 
[https://builds.apache.org/job/HBase-0.92-security/109/])
HBASE-6122 Backup master does not become Active master after ZK exception 
(Ram) (Revision 1344799)
HBASE-6122 Backup master does not become Active master after ZK exception: 
REVERT (Revision 1344466)
HBASE-6122 Backup master does not become Active master after ZK exception (Ram) 
(Revision 1344350)

 Result = SUCCESS
ramkrishna : 
Files : 
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java
* 
/hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/master/TestMasterZKSessionRecovery.java

stack : 
Files : 
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java

ramkrishna : 
Files : 
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java


 Backup master does not become Active master after ZK exception
 --

 Key: HBASE-6122
 URL: https://issues.apache.org/jira/browse/HBASE-6122
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.2, 0.94.1

 Attachments: HBASE-6122.patch, HBASE-6122_0.92.patch, 
 HBASE-6122_0.94.patch, HBASE-6122_0.94.patch


 - The active master gets a ZK session expiry exception.
 - The backup master becomes active.
 - The previous active master retries and becomes the backup master.
 Now when the new active master goes down and the current backup master tries 
 to come up, it goes down again because of the ZK expiry exception it got in 
 the first step.
 {code}
 if (abortNow(msg, t)) {
   if (t != null) LOG.fatal(msg, t);
   else LOG.fatal(msg);
   this.abort = true;
   stop("Aborting");
 }
 {code}
 In ActiveMasterManager.blockUntilBecomingActiveMaster, the backup master waits 
 until there is no active master and then tries to become active itself.
 {code}
 synchronized (this.clusterHasActiveMaster) {
   while (this.clusterHasActiveMaster.get() && !this.master.isStopped()) {
     try {
       this.clusterHasActiveMaster.wait();
     } catch (InterruptedException e) {
       // We expect to be interrupted when a master dies, will fall out if so
       LOG.debug("Interrupted waiting for master to die", e);
     }
   }
   if (!clusterStatusTracker.isClusterUp()) {
     this.master.stop("Cluster went down before this master became active");
   }
   if (this.master.isStopped()) {
     return cleanSetOfActiveMaster;
   }
   // Try to become active master again now that there is no active master
   blockUntilBecomingActiveMaster(startupStatus,clusterStatusTracker);
 }
 return cleanSetOfActiveMaster;
 {code}
 When the backup master (it is in backup mode because it got the ZK exception) 
 once again tries to become active, we ignore the return value of the 
 recursive call:
 {code}
 // Try to become active master again now that there is no active master
 blockUntilBecomingActiveMaster(startupStatus,clusterStatusTracker);
 {code}
 Instead we return the outer call's 'cleanSetOfActiveMaster', which was 
 previously false.
 Because of this, instead of becoming active again, the backup master goes 
 down in the abort() code. Thanks to Gopi, my colleague, for reporting this 
 issue.
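
The effect of dropping the recursive call's result can be reproduced in isolation. The following is a minimal sketch, not the committed patch: the class name, the `attempt` and `winsOnAttempt` parameters and the simulated election are assumptions made for illustration, and the waiting on `clusterHasActiveMaster` is left out.

```java
public class BecomeActiveSketch {

    // Hypothetical stand-in for the election: an attempt "wins" (this master
    // becomes active) only when attempt == winsOnAttempt.

    /** Same shape as the reported code: the recursive result is discarded. */
    static boolean buggy(int attempt, int winsOnAttempt) {
        boolean cleanSetOfActiveMaster = (attempt == winsOnAttempt);
        if (cleanSetOfActiveMaster) {
            return true;
        }
        // Another master is active; (waiting omitted) try again.
        buggy(attempt + 1, winsOnAttempt);   // return value is dropped here
        return cleanSetOfActiveMaster;       // still the stale 'false'
    }

    /** One possible repair: propagate the result of the retry. */
    static boolean fixed(int attempt, int winsOnAttempt) {
        if (attempt == winsOnAttempt) {
            return true;
        }
        return fixed(attempt + 1, winsOnAttempt);
    }

    public static void main(String[] args) {
        // The first attempt loses, the second attempt wins.
        System.out.println("buggy: " + buggy(1, 2));
        System.out.println("fixed: " + fixed(1, 2));
    }
}
```

With the first attempt losing and the second winning, `buggy` reports failure even though the retry succeeded, which matches the symptom described above: the caller sees `false` and proceeds to abort.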

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6122) Backup master does not become Active master after ZK exception

2012-06-01 Thread nkeywal (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287278#comment-13287278
 ] 

nkeywal commented on HBASE-6122:


@ram
bq. I found some changes in the trunk code. So not sure if it is applicable in 
trunk. Attached patches for 0.94 and 0.92.

Do you mean that the problem is not reproducible on trunk?





[jira] [Commented] (HBASE-6122) Backup master does not become Active master after ZK exception

2012-06-01 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287296#comment-13287296
 ] 

ramkrishna.s.vasudevan commented on HBASE-6122:
---

@N
The trunk code is different. Currently there is a while(true) loop there, and 
as far as I can see it should be OK in trunk.
I did not try to reproduce it in trunk.
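
For illustration only, a loop-based retry has the following general shape. This is a sketch under assumptions and is not the trunk code: the class name, the method signature and the two `BooleanSupplier` parameters are hypothetical, and the real waiting and ZooKeeper interaction are left out. The point is that a loop has a single return path per outcome, so there is no stale value from an outer call to return by mistake.

```java
import java.util.function.BooleanSupplier;

public class RetryLoopSketch {

    /**
     * Keeps trying until this master becomes active or is stopped.
     * Both suppliers are hypothetical stand-ins for the real checks.
     */
    static boolean blockUntilActive(BooleanSupplier tryToBecomeActive,
                                    BooleanSupplier isStopped) {
        while (true) {
            if (tryToBecomeActive.getAsBoolean()) {
                return true;    // we are the active master now
            }
            if (isStopped.getAsBoolean()) {
                return false;   // stopped while still a backup
            }
            // Real code would block here until the active master goes away.
        }
    }

    public static void main(String[] args) {
        int[] attempts = {0};
        // Lose twice, then win on the third attempt.
        boolean active = blockUntilActive(() -> ++attempts[0] == 3, () -> false);
        System.out.println("active=" + active + " after " + attempts[0] + " attempts");
    }
}
```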





[jira] [Commented] (HBASE-6122) Backup master does not become Active master after ZK exception

2012-06-01 Thread nkeywal (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287306#comment-13287306
 ] 

nkeywal commented on HBASE-6122:


Thanks, I will give it a try to be sure.





[jira] [Commented] (HBASE-6122) Backup master does not become Active master after ZK exception

2012-05-31 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286359#comment-13286359
 ] 

stack commented on HBASE-6122:
--

Looks good Ram.  +1





[jira] [Commented] (HBASE-6122) Backup master does not become Active master after ZK exception

2012-05-31 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286765#comment-13286765
 ] 

Hudson commented on HBASE-6122:
---

Integrated in HBase-0.94 #240 (See 
[https://builds.apache.org/job/HBase-0.94/240/])
HBASE-6122 Backup master does not become Active master after ZK exception 
(Ram) (Revision 1344798)

 Result = SUCCESS
ramkrishna : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java
* 
/hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/master/TestMasterZKSessionRecovery.java






[jira] [Commented] (HBASE-6122) Backup master does not become Active master after ZK exception

2012-05-31 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286844#comment-13286844
 ] 

Hudson commented on HBASE-6122:
---

Integrated in HBase-0.92 #439 (See 
[https://builds.apache.org/job/HBase-0.92/439/])
HBASE-6122 Backup master does not become Active master after ZK exception 
(Ram) (Revision 1344799)

 Result = FAILURE
ramkrishna : 
Files : 
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java
* 
/hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/master/TestMasterZKSessionRecovery.java






[jira] [Commented] (HBASE-6122) Backup master does not become Active master after ZK exception

2012-05-31 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287164#comment-13287164
 ] 

Hudson commented on HBASE-6122:
---

Integrated in HBase-0.94-security #33 (See 
[https://builds.apache.org/job/HBase-0.94-security/33/])
HBASE-6122 Backup master does not become Active master after ZK exception 
(Ram) (Revision 1344798)
HBASE-6122 Backup master does not become Active master after ZK exception: 
REVERT (Revision 1344467)
HBASE-6122 Backup master does not become Active master after ZK exception (Ram) 
(Revision 1344348)

 Result = FAILURE
ramkrishna : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java
* 
/hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/master/TestMasterZKSessionRecovery.java

stack : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java

ramkrishna : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java






[jira] [Commented] (HBASE-6122) Backup master does not become Active master after ZK exception

2012-05-30 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13285823#comment-13285823
 ] 

ramkrishna.s.vasudevan commented on HBASE-6122:
---

Committed to 0.92 and 0.94.
Thanks for the review Lars.

 Backup master does not become Active master after ZK exception
 --

 Key: HBASE-6122
 URL: https://issues.apache.org/jira/browse/HBASE-6122
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.2, 0.94.1

 Attachments: HBASE-6122_0.92.patch, HBASE-6122_0.94.patch






[jira] [Commented] (HBASE-6122) Backup master does not become Active master after ZK exception

2012-05-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13285871#comment-13285871
 ] 

Hudson commented on HBASE-6122:
---

Integrated in HBase-0.94 #233 (See 
[https://builds.apache.org/job/HBase-0.94/233/])
HBASE-6122 Backup master does not become Active master after ZK exception 
(Ram) (Revision 1344348)

 Result = FAILURE
ramkrishna : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java


 Backup master does not become Active master after ZK exception
 --

 Key: HBASE-6122
 URL: https://issues.apache.org/jira/browse/HBASE-6122
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.2, 0.94.1

 Attachments: HBASE-6122_0.92.patch, HBASE-6122_0.94.patch






[jira] [Commented] (HBASE-6122) Backup master does not become Active master after ZK exception

2012-05-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13285959#comment-13285959
 ] 

Hudson commented on HBASE-6122:
---

Integrated in HBase-0.92 #433 (See 
[https://builds.apache.org/job/HBase-0.92/433/])
HBASE-6122 Backup master does not become Active master after ZK exception 
(Ram) (Revision 1344350)

 Result = FAILURE
ramkrishna : 
Files : 
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java


 Backup master does not become Active master after ZK exception
 --

 Key: HBASE-6122
 URL: https://issues.apache.org/jira/browse/HBASE-6122
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.2, 0.94.1

 Attachments: HBASE-6122_0.92.patch, HBASE-6122_0.94.patch







[jira] [Commented] (HBASE-6122) Backup master does not become Active master after ZK exception

2012-05-30 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286072#comment-13286072
 ] 

stack commented on HBASE-6122:
--

I reverted this from the 0.92 and 0.94 branches until we figure out the failures.

 Backup master does not become Active master after ZK exception
 --

 Key: HBASE-6122
 URL: https://issues.apache.org/jira/browse/HBASE-6122
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.2, 0.94.1

 Attachments: HBASE-6122_0.92.patch, HBASE-6122_0.94.patch







[jira] [Commented] (HBASE-6122) Backup master does not become Active master after ZK exception

2012-05-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286166#comment-13286166
 ] 

Hudson commented on HBASE-6122:
---

Integrated in HBase-0.94 #236 (See 
[https://builds.apache.org/job/HBase-0.94/236/])
HBASE-6122 Backup master does not become Active master after ZK exception: 
REVERT (Revision 1344467)

 Result = SUCCESS
stack : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java


 Backup master does not become Active master after ZK exception
 --

 Key: HBASE-6122
 URL: https://issues.apache.org/jira/browse/HBASE-6122
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.2, 0.94.1

 Attachments: HBASE-6122_0.92.patch, HBASE-6122_0.94.patch







[jira] [Commented] (HBASE-6122) Backup master does not become Active master after ZK exception

2012-05-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286235#comment-13286235
 ] 

Hudson commented on HBASE-6122:
---

Integrated in HBase-0.92 #435 (See 
[https://builds.apache.org/job/HBase-0.92/435/])
HBASE-6122 Backup master does not become Active master after ZK exception: 
REVERT (Revision 1344466)

 Result = SUCCESS
stack : 
Files : 
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java


 Backup master does not become Active master after ZK exception
 --

 Key: HBASE-6122
 URL: https://issues.apache.org/jira/browse/HBASE-6122
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.2, 0.94.1

 Attachments: HBASE-6122_0.92.patch, HBASE-6122_0.94.patch







[jira] [Commented] (HBASE-6122) Backup master does not become Active master after ZK exception

2012-05-30 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286298#comment-13286298
 ] 

ramkrishna.s.vasudevan commented on HBASE-6122:
---

Oh... Let me check out the reason for the failure.  Sorry for the mess.

 Backup master does not become Active master after ZK exception
 --

 Key: HBASE-6122
 URL: https://issues.apache.org/jira/browse/HBASE-6122
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.2, 0.94.1

 Attachments: HBASE-6122_0.92.patch, HBASE-6122_0.94.patch







[jira] [Commented] (HBASE-6122) Backup master does not become Active master after ZK exception

2012-05-30 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286334#comment-13286334
 ] 

ramkrishna.s.vasudevan commented on HBASE-6122:
---

I checked the test case.
Ideally the flow should make the master become active, but the problem 
described in this JIRA still makes the master go down.

I added a log in ActiveMasterManager.blockUntilBecomingActiveMaster
{code}
{code}
{code}
2012-05-31 10:52:29,050 INFO  [pool-29-thread-1] master.ActiveMasterManager(149): Master is now available Htipl-01388.china.huawei.com,3569,1338441734226
2012-05-31 10:52:29,050 INFO  [pool-29-thread-1] master.ActiveMasterManager(151): Master=Htipl-01388.china.huawei.com,3569,1338441734226
{code}

 Backup master does not become Active master after ZK exception
 --

 Key: HBASE-6122
 URL: https://issues.apache.org/jira/browse/HBASE-6122
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.2, 0.94.1

 Attachments: HBASE-6122_0.92.patch, HBASE-6122_0.94.patch







[jira] [Commented] (HBASE-6122) Backup master does not become Active master after ZK exception

2012-05-30 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286350#comment-13286350
 ] 

stack commented on HBASE-6122:
--

@Ram Which assert should be changed?  Do you want to include the assert change 
in your patch?  Or are you suggesting a previous test case is broken?  If so, 
which?  Thanks.

 Backup master does not become Active master after ZK exception
 --

 Key: HBASE-6122
 URL: https://issues.apache.org/jira/browse/HBASE-6122
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.2, 0.94.1

 Attachments: HBASE-6122.patch, HBASE-6122_0.92.patch, 
 HBASE-6122_0.94.patch, HBASE-6122_0.94.patch







[jira] [Commented] (HBASE-6122) Backup master does not become Active master after ZK exception

2012-05-30 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286353#comment-13286353
 ] 

ramkrishna.s.vasudevan commented on HBASE-6122:
---

I have attached the patch, Stack.  It changes the assert.

 Backup master does not become Active master after ZK exception
 --

 Key: HBASE-6122
 URL: https://issues.apache.org/jira/browse/HBASE-6122
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.2, 0.94.1

 Attachments: HBASE-6122.patch, HBASE-6122_0.92.patch, 
 HBASE-6122_0.94.patch, HBASE-6122_0.94.patch







[jira] [Commented] (HBASE-6122) Backup master does not become Active master after ZK exception

2012-05-29 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284961#comment-13284961
 ] 

Lars Hofhansl commented on HBASE-6122:
--

+1 patch looks good to me.

 Backup master does not become Active master after ZK exception
 --

 Key: HBASE-6122
 URL: https://issues.apache.org/jira/browse/HBASE-6122
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0
Reporter: ramkrishna.s.vasudevan
 Fix For: 0.92.2, 0.96.0, 0.94.1

 Attachments: HBASE-6122_0.92.patch, HBASE-6122_0.94.patch


