[jira] [Commented] (HBASE-2611) Handle RS that fails while processing the failure of another one

2013-02-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13570971#comment-13570971
 ] 

Hudson commented on HBASE-2611:
---

Integrated in HBase-0.94-security-on-Hadoop-23 #11 (See 
[https://builds.apache.org/job/HBase-0.94-security-on-Hadoop-23/11/])
HBASE-2611 Handle RS that fails while processing the failure of another one 
(Himanshu Vashishtha) (Revision 1440054)

 Result = FAILURE
larsh : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeper.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java


 Handle RS that fails while processing the failure of another one
 

 Key: HBASE-2611
 URL: https://issues.apache.org/jira/browse/HBASE-2611
 Project: HBase
  Issue Type: Sub-task
  Components: Replication
Reporter: Jean-Daniel Cryans
Assignee: Himanshu Vashishtha
Priority: Critical
 Fix For: 0.96.0, 0.94.5

 Attachments: 2611-0.94.txt, 2611-trunk-v3.patch, 2611-trunk-v4.patch, 
 2611-v3.patch, HBASE-2611-trunk-v2.patch, HBASE-2611-trunk-v3.patch, 
 HBase-2611-upstream-v1.patch, HBASE-2611-v2.patch


 HBASE-2223 doesn't manage region servers that fail while doing the transfer 
 of HLogs queues from other region servers that failed. Devise a reliable way 
 to do it.
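
The failure mode described above can be illustrated with a minimal in-memory sketch. This is not HBase code: the class `QueueTransferSketch`, its methods, and the `failAfter` parameter are all hypothetical, and plain Java maps stand in for whatever coordination store actually holds the queues. Review comments on this issue mention a `copyQueuesFromRSUsingMulti` method and a list of operations, which suggests an all-or-nothing transfer, but the patch itself is not reproduced here. The sketch only contrasts a one-by-one move, which can be interrupted halfway, with a staged move that is published in a single step.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model; none of these names come from HBase.
public class QueueTransferSketch {
    // Server name -> names of the log queues that server currently owns.
    private final Map<String, List<String>> queues = new HashMap<>();

    public void addQueue(String server, String queue) {
        queues.computeIfAbsent(server, k -> new ArrayList<>()).add(queue);
    }

    // Returns a copy so callers cannot mutate the internal state.
    public List<String> queuesOf(String server) {
        return new ArrayList<>(queues.getOrDefault(server, new ArrayList<>()));
    }

    // Moves queues one at a time. A non-negative failAfter simulates the
    // claiming server dying after that many moves; -1 means no failure.
    public void moveOneByOne(String dead, String claimer, int failAfter) {
        int moved = 0;
        for (String q : queuesOf(dead)) {
            if (moved == failAfter) {
                throw new IllegalStateException("claimer died mid-transfer");
            }
            queues.get(dead).remove(q);
            addQueue(claimer, q);
            moved++;
        }
        queues.remove(dead);
    }

    // Stages the new owner list on the side and publishes it only after
    // every step succeeded, so a failure leaves the original state intact.
    public void moveAtomically(String dead, String claimer, int failAfter) {
        List<String> staged = queuesOf(claimer);
        int moved = 0;
        for (String q : queuesOf(dead)) {
            if (moved == failAfter) {
                throw new IllegalStateException("claimer died mid-transfer");
            }
            staged.add(q);
            moved++;
        }
        queues.put(claimer, staged);
        queues.remove(dead);
    }

    public static void main(String[] args) {
        QueueTransferSketch s = new QueueTransferSketch();
        s.addQueue("rs1", "q1");
        s.addQueue("rs1", "q2");
        try {
            s.moveOneByOne("rs1", "rs2", 1);
        } catch (IllegalStateException e) {
            // The queues are now split between the two servers.
            System.out.println("rs1=" + s.queuesOf("rs1") + " rs2=" + s.queuesOf("rs2"));
        }
    }
}
```

In the one-by-one variant an interrupted transfer leaves the queues split across two failed servers, which is the situation this issue asks to handle; in the staged variant the dead server's queues are either all moved or all still in place.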

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-2611) Handle RS that fails while processing the failure of another one

2013-01-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13566296#comment-13566296
 ] 

Hudson commented on HBASE-2611:
---

Integrated in HBase-0.94-security #102 (See 
[https://builds.apache.org/job/HBase-0.94-security/102/])
HBASE-2611 Handle RS that fails while processing the failure of another one 
(Himanshu Vashishtha) (Revision 1440054)

 Result = SUCCESS
larsh : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeper.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java




[jira] [Commented] (HBASE-2611) Handle RS that fails while processing the failure of another one

2013-01-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13565341#comment-13565341
 ] 

Hudson commented on HBASE-2611:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #382 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/382/])
HBASE-2611 Handle RS that fails while processing the failure of another one 
(Himanshu) (Revision 1439744)

 Result = FAILURE
tedyu : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeper.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java




[jira] [Commented] (HBASE-2611) Handle RS that fails while processing the failure of another one

2013-01-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13565728#comment-13565728
 ] 

Hudson commented on HBASE-2611:
---

Integrated in HBase-0.94 #800 (See 
[https://builds.apache.org/job/HBase-0.94/800/])
HBASE-2611 Handle RS that fails while processing the failure of another one 
(Himanshu Vashishtha) (Revision 1440054)

 Result = SUCCESS
larsh : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeper.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java




[jira] [Commented] (HBASE-2611) Handle RS that fails while processing the failure of another one

2013-01-28 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13564910#comment-13564910
 ] 

Lars Hofhansl commented on HBASE-2611:
--

[~ted_yu] Let's commit this. +1 from me.



[jira] [Commented] (HBASE-2611) Handle RS that fails while processing the failure of another one

2013-01-28 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13564936#comment-13564936
 ] 

Jean-Daniel Cryans commented on HBASE-2611:
---

Some comments:

bq. LOG.info("Moving " + rsZnode + "'s hlogs to my queue");

This could be changed to say whether it's going to be done atomically or not.

bq. LOG.debug(" The multi list is: " + listOfOps + ", size: " + listOfOps.size());

This is going to print a lot of object references... not sure how useful this 
is. Maybe just keep the size?

bq. LOG.info("Atomically moved the dead regionserver logs. ");

With my first comment this becomes redundant, and somewhere else it will say 
when the move is done anyway.

bq. LOG.warn("Got exception in copyQueuesFromRSUsingMulti: " + e);

Put the e in the second parameter instead of appending it to the string.
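
The last point can be sketched as follows. The HBase code of this era logs through Apache Commons Logging, whose `Log.warn(Object message, Throwable t)` overload is presumably what the comment refers to; to stay self-contained, the sketch below uses `java.util.logging` from the JDK instead, where the equivalent call is `Logger.log(Level, String, Throwable)`. The class and method names are hypothetical, and the message text is taken from the quoted log line.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical illustration; not taken from the patch under review.
public class ExceptionLoggingSketch {
    static final Logger LOG = Logger.getLogger(ExceptionLoggingSketch.class.getName());

    // Appending the exception only records e.toString() inside the message;
    // the log record carries no Throwable, so no stack trace is available.
    static void warnByAppending(Exception e) {
        LOG.warning("Got exception in copyQueuesFromRSUsingMulti: " + e);
    }

    // Passing the exception as its own argument attaches it to the log
    // record, so handlers and formatters can print the full stack trace.
    static void warnWithThrowable(Exception e) {
        LOG.log(Level.WARNING, "Got exception in copyQueuesFromRSUsingMulti", e);
    }

    public static void main(String[] args) {
        Exception e = new IllegalStateException("simulated failure");
        warnByAppending(e);
        warnWithThrowable(e);
    }
}
```

With the second form the stack trace and any chained causes stay available to the logging backend, which is usually what is wanted when diagnosing a failed queue transfer.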



[jira] [Commented] (HBASE-2611) Handle RS that fails while processing the failure of another one

2013-01-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13565004#comment-13565004
 ] 

Hadoop QA commented on HBASE-2611:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12566875/HBASE-2611-trunk-v3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 
2.0 profile.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 1 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4225//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4225//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4225//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4225//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4225//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4225//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4225//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4225//console

This message is automatically generated.



[jira] [Commented] (HBASE-2611) Handle RS that fails while processing the failure of another one

2013-01-28 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13565005#comment-13565005
 ] 

Ted Yu commented on HBASE-2611:
---

@Himanshu:
Mind attaching patch for 0.94 ?



[jira] [Commented] (HBASE-2611) Handle RS that fails while processing the failure of another one

2013-01-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13565080#comment-13565080
 ] 

Hadoop QA commented on HBASE-2611:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12566899/2611-trunk-v4.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 
2.0 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4228//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4228//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4228//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4228//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4228//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4228//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4228//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4228//console

This message is automatically generated.



[jira] [Commented] (HBASE-2611) Handle RS that fails while processing the failure of another one

2013-01-28 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13565081#comment-13565081
 ] 

Ted Yu commented on HBASE-2611:
---

Patch v4 integrated to trunk.

Thanks for the patch, Himanshu.

Thanks for the reviews, Lars and J-D.



[jira] [Commented] (HBASE-2611) Handle RS that fails while processing the failure of another one

2013-01-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13565089#comment-13565089
 ] 

Hadoop QA commented on HBASE-2611:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12566908/2611-0.94.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4229//console

This message is automatically generated.



[jira] [Commented] (HBASE-2611) Handle RS that fails while processing the failure of another one

2013-01-28 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13565095#comment-13565095
 ] 

Lars Hofhansl commented on HBASE-2611:
--

Going to commit the 0.94 version tomorrow, unless I hear objections.



[jira] [Commented] (HBASE-2611) Handle RS that fails while processing the failure of another one

2013-01-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13565131#comment-13565131
 ] 

Hudson commented on HBASE-2611:
---

Integrated in HBase-TRUNK #3820 (See 
[https://builds.apache.org/job/HBase-TRUNK/3820/])
HBASE-2611 Handle RS that fails while processing the failure of another one 
(Himanshu) (Revision 1439744)

 Result = FAILURE
tedyu : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeper.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java




[jira] [Commented] (HBASE-2611) Handle RS that fails while processing the failure of another one

2013-01-27 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13564016#comment-13564016
 ] 

Lars Hofhansl commented on HBASE-2611:
--

[~jdcryans]?



[jira] [Commented] (HBASE-2611) Handle RS that fails while processing the failure of another one

2013-01-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562753#comment-13562753
 ] 

Hadoop QA commented on HBASE-2611:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12566510/2611-trunk-v3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 
2.0 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4181//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4181//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4181//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4181//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4181//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4181//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4181//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4181//console

This message is automatically generated.



[jira] [Commented] (HBASE-2611) Handle RS that fails while processing the failure of another one

2013-01-25 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562755#comment-13562755
 ] 

Ted Yu commented on HBASE-2611:
---

Will integrate patch v3 later today if there are no further review comments.



[jira] [Commented] (HBASE-2611) Handle RS that fails while processing the failure of another one

2013-01-25 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562847#comment-13562847
 ] 

Ted Yu commented on HBASE-2611:
---

[~jdcryans]:
It would be nice if you take a look at Himanshu's patch.



[jira] [Commented] (HBASE-2611) Handle RS that fails while processing the failure of another one

2013-01-24 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562383#comment-13562383
 ] 

Ted Yu commented on HBASE-2611:
---

{code}
p0 2611-upstream-v1.patch
patching file 
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeper.java
Hunk #1 succeeded at 25 (offset -1 lines).
Hunk #2 FAILED at 41.
Hunk #3 succeeded at 858 (offset 131 lines).
1 out of 3 hunks FAILED -- saving rejects to file 
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeper.java.rej
patching file 
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java
Hunk #1 succeeded at 579 (offset 19 lines).
patching file 
hbase-server/src/main/java/org/apache/hadoop/hbase/zookeeper/RecoverableZooKeeper.java
Reversed (or previously applied) patch detected!  Assume -R? [n] ^C
{code}
@Himanshu:
Can you update the upstream patch ?

Thanks

 Handle RS that fails while processing the failure of another one
 

 Key: HBASE-2611
 URL: https://issues.apache.org/jira/browse/HBASE-2611
 Project: HBase
  Issue Type: Sub-task
  Components: Replication
Reporter: Jean-Daniel Cryans
Assignee: Himanshu Vashishtha
Priority: Critical
 Fix For: 0.96.0, 0.94.5

 Attachments: 2611-v3.patch, HBase-2611-upstream-v1.patch, 
 HBASE-2611-v2.patch


 HBASE-2223 doesn't manage region servers that fail while doing the transfer 
 of HLogs queues from other region servers that failed. Devise a reliable way 
 to do it.



[jira] [Commented] (HBASE-2611) Handle RS that fails while processing the failure of another one

2013-01-24 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13562460#comment-13562460
 ] 

Lars Hofhansl commented on HBASE-2611:
--

Himanshu, you are officially my hero now. We've been discussing this for over a 
year, and it looks like we're finally fixing it.


 Handle RS that fails while processing the failure of another one
 

 Key: HBASE-2611
 URL: https://issues.apache.org/jira/browse/HBASE-2611
 Project: HBase
  Issue Type: Sub-task
  Components: Replication
Reporter: Jean-Daniel Cryans
Assignee: Himanshu Vashishtha
Priority: Critical
 Fix For: 0.96.0, 0.94.5

 Attachments: 2611-v3.patch, HBASE-2611-trunk-v2.patch, 
 HBase-2611-upstream-v1.patch, HBASE-2611-v2.patch


 HBASE-2223 doesn't manage region servers that fail while doing the transfer 
 of HLogs queues from other region servers that failed. Devise a reliable way 
 to do it.



[jira] [Commented] (HBASE-2611) Handle RS that fails while processing the failure of another one

2013-01-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13562479#comment-13562479
 ] 

Hadoop QA commented on HBASE-2611:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12566463/HBASE-2611-trunk-v2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 
2.0 profile.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 1 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4177//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4177//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4177//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4177//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4177//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4177//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4177//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4177//console

This message is automatically generated.

 Handle RS that fails while processing the failure of another one
 

 Key: HBASE-2611
 URL: https://issues.apache.org/jira/browse/HBASE-2611
 Project: HBase
  Issue Type: Sub-task
  Components: Replication
Reporter: Jean-Daniel Cryans
Assignee: Himanshu Vashishtha
Priority: Critical
 Fix For: 0.96.0, 0.94.5

 Attachments: 2611-v3.patch, HBASE-2611-trunk-v2.patch, 
 HBase-2611-upstream-v1.patch, HBASE-2611-v2.patch


 HBASE-2223 doesn't manage region servers that fail while doing the transfer 
 of HLogs queues from other region servers that failed. Devise a reliable way 
 to do it.



[jira] [Commented] (HBASE-2611) Handle RS that fails while processing the failure of another one

2013-01-22 Thread Himanshu Vashishtha (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13560005#comment-13560005
 ] 

Himanshu Vashishtha commented on HBASE-2611:


Thanks for the review Lars :), and Ted for updating the patch.



 Handle RS that fails while processing the failure of another one
 

 Key: HBASE-2611
 URL: https://issues.apache.org/jira/browse/HBASE-2611
 Project: HBase
  Issue Type: Sub-task
  Components: Replication
Reporter: Jean-Daniel Cryans
Assignee: Himanshu Vashishtha
 Fix For: 0.96.0, 0.94.5

 Attachments: 2611-v3.patch, HBase-2611-upstream-v1.patch, 
 HBASE-2611-v2.patch


 HBASE-2223 doesn't manage region servers that fail while doing the transfer 
 of HLogs queues from other region servers that failed. Devise a reliable way 
 to do it.



[jira] [Commented] (HBASE-2611) Handle RS that fails while processing the failure of another one

2013-01-22 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13560036#comment-13560036
 ] 

Ted Yu commented on HBASE-2611:
---

The trunk patch depends on HBASE-7382

@Himanshu:
Can you run the tests listed @ 28/Jun/12 04:07 ?

 Handle RS that fails while processing the failure of another one
 

 Key: HBASE-2611
 URL: https://issues.apache.org/jira/browse/HBASE-2611
 Project: HBase
  Issue Type: Sub-task
  Components: Replication
Reporter: Jean-Daniel Cryans
Assignee: Himanshu Vashishtha
 Fix For: 0.96.0, 0.94.5

 Attachments: 2611-v3.patch, HBase-2611-upstream-v1.patch, 
 HBASE-2611-v2.patch


 HBASE-2223 doesn't manage region servers that fail while doing the transfer 
 of HLogs queues from other region servers that failed. Devise a reliable way 
 to do it.



[jira] [Commented] (HBASE-2611) Handle RS that fails while processing the failure of another one

2013-01-21 Thread Himanshu Vashishtha (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13558938#comment-13558938
 ] 

Himanshu Vashishtha commented on HBASE-2611:


[~lhofhansl]: Yes, I followed the same approach in the attached patch.

 Handle RS that fails while processing the failure of another one
 

 Key: HBASE-2611
 URL: https://issues.apache.org/jira/browse/HBASE-2611
 Project: HBase
  Issue Type: Sub-task
  Components: Replication
Reporter: Jean-Daniel Cryans
Assignee: Himanshu Vashishtha
 Fix For: 0.96.0, 0.94.5

 Attachments: HBase-2611-upstream-v1.patch, HBASE-2611-v2.patch


 HBASE-2223 doesn't manage region servers that fail while doing the transfer 
 of HLogs queues from other region servers that failed. Devise a reliable way 
 to do it.



[jira] [Commented] (HBASE-2611) Handle RS that fails while processing the failure of another one

2013-01-21 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13559232#comment-13559232
 ] 

Lars Hofhansl commented on HBASE-2611:
--

Hmm... Yes, you did. Sorry, somehow missed it when I looked at it first.
Cool then, we came to the same conclusion. Just took me much longer to get to 
it :)

+1 on patch, it should indeed fix this problem.


 Handle RS that fails while processing the failure of another one
 

 Key: HBASE-2611
 URL: https://issues.apache.org/jira/browse/HBASE-2611
 Project: HBase
  Issue Type: Sub-task
  Components: Replication
Reporter: Jean-Daniel Cryans
Assignee: Himanshu Vashishtha
 Fix For: 0.96.0, 0.94.5

 Attachments: HBase-2611-upstream-v1.patch, HBASE-2611-v2.patch


 HBASE-2223 doesn't manage region servers that fail while doing the transfer 
 of HLogs queues from other region servers that failed. Devise a reliable way 
 to do it.



[jira] [Commented] (HBASE-2611) Handle RS that fails while processing the failure of another one

2013-01-20 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13558487#comment-13558487
 ] 

Lars Hofhansl commented on HBASE-2611:
--

Specifically, check out ReplicationSourceManager.NodeFailoverWorker.run().
First, all surviving RSs race to obtain the lock:
{code}
  if (!zkHelper.lockOtherRS(rsZnode)) {
    return;
  }
{code}
Only one RS will continue to move the failed RS's queues.

I think what we could do is this:
If multi is supported, we just have all surviving RSs attempt to move the queues 
(don't bother with the lock step). If multi is as atomic as advertised, all RSs 
will try, but only one of them will succeed in moving the queues atomically.
It seems like that should work.
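
The proposal above can be sketched with a toy in-memory model. This is only an illustration under stated assumptions: the class `MultiFailoverSketch` and its methods are hypothetical, a `synchronized` method stands in for the all-or-nothing behaviour of a ZooKeeper multi call, and the `queue-deadRS` naming of moved queues is assumed for illustration. It is not the HBase or ZooKeeper API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy in-memory stand-in for the replication znodes; not the HBase or ZooKeeper API. */
public class MultiFailoverSketch {
    // owning region server -> queue name -> HLog names
    private final Map<String, Map<String, List<String>>> queues = new TreeMap<>();

    public synchronized void addQueue(String rs, String queue, List<String> hlogs) {
        queues.computeIfAbsent(rs, k -> new TreeMap<>()).put(queue, new ArrayList<>(hlogs));
    }

    /**
     * Models one atomic multi(): either every queue of deadRs moves to claimer,
     * or nothing changes because another server already moved them.
     */
    public synchronized boolean moveAllQueues(String deadRs, String claimer) {
        Map<String, List<String>> dead = queues.remove(deadRs);
        if (dead == null) {
            return false; // someone else already won
        }
        Map<String, List<String>> mine = queues.computeIfAbsent(claimer, k -> new TreeMap<>());
        for (Map.Entry<String, List<String>> e : dead.entrySet()) {
            // assumed naming: remember which dead server the queue came from
            mine.put(e.getKey() + "-" + deadRs, e.getValue());
        }
        return true;
    }

    public synchronized Map<String, List<String>> queuesOf(String rs) {
        Map<String, List<String>> q = queues.get(rs);
        return q == null ? new TreeMap<>() : new TreeMap<>(q);
    }

    /** Every survivor attempts the move, with no lock step; returns how many succeeded. */
    public static int raceSurvivors(MultiFailoverSketch zk, String deadRs, List<String> survivors) {
        AtomicInteger winners = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();
        for (String rs : survivors) {
            Thread t = new Thread(() -> {
                if (zk.moveAllQueues(deadRs, rs)) {
                    winners.incrementAndGet();
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for survivors", e);
            }
        }
        return winners.get();
    }
}
```

In this model there is no window between "winning" and "moving": the server that wins is by definition the one whose move was applied.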

 Handle RS that fails while processing the failure of another one
 

 Key: HBASE-2611
 URL: https://issues.apache.org/jira/browse/HBASE-2611
 Project: HBase
  Issue Type: Sub-task
  Components: Replication
Reporter: Jean-Daniel Cryans
Assignee: Himanshu Vashishtha
 Fix For: 0.94.5

 Attachments: HBase-2611-upstream-v1.patch, HBASE-2611-v2.patch


 HBASE-2223 doesn't manage region servers that fail while doing the transfer 
 of HLogs queues from other region servers that failed. Devise a reliable way 
 to do it.



[jira] [Commented] (HBASE-2611) Handle RS that fails while processing the failure of another one

2013-01-16 Thread Himanshu Vashishtha (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13555213#comment-13555213
 ] 

Himanshu Vashishtha commented on HBASE-2611:


bq. But what can happen is that the region server who wins the race to take 
over the dead region server's queues could die before it even manages to call 
multi.
I am not following your question. How can a regionserver win a race before 
calling multi? If regionserver A fails, *all* regionservers will call multi to 
do the failover, and only one (let's say B) will succeed. Now, if B also dies 
meanwhile (after it has succeeded in transferring the queue from zk's 
perspective), the regionserver doing the failover for B will also process A's 
znodes (as they are with B now). Therefore, I don't see that we really need a 
retry. Did I miss anything?
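
A minimal sketch of the cascading case described above, under stated assumptions: the class `CascadingFailoverSketch`, its `claim` helper, and the `queue-deadRS` naming of moved queues are hypothetical and only model the idea that A's znodes live under B once B's move has succeeded.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of queue ownership; not the HBase or ZooKeeper API. */
public class CascadingFailoverSketch {

    /** Moves every queue under deadRs to claimer in one step, tagging each with its origin. */
    static boolean claim(Map<String, Map<String, List<String>>> zk, String deadRs, String claimer) {
        Map<String, List<String>> dead = zk.remove(deadRs);
        if (dead == null) {
            return false; // nothing left to claim
        }
        Map<String, List<String>> mine = zk.computeIfAbsent(claimer, k -> new TreeMap<>());
        for (Map.Entry<String, List<String>> e : dead.entrySet()) {
            mine.put(e.getKey() + "-" + deadRs, e.getValue());
        }
        return true;
    }

    /** A dies and B takes over; then B dies and C takes over everything B held. */
    static Map<String, Map<String, List<String>>> demo() {
        Map<String, Map<String, List<String>>> zk = new TreeMap<>();
        zk.computeIfAbsent("rsA", k -> new TreeMap<>())
          .put("peer1", new ArrayList<>(Arrays.asList("a.1", "a.2")));
        zk.computeIfAbsent("rsB", k -> new TreeMap<>())
          .put("peer1", new ArrayList<>(Arrays.asList("b.1")));

        claim(zk, "rsA", "rsB"); // B now holds "peer1" and "peer1-rsA"
        claim(zk, "rsB", "rsC"); // C inherits both, including A's original logs
        return zk;
    }
}
```

After `demo()`, the only owner left is `rsC`, and A's logs sit in the queue named `peer1-rsA-rsB`, so no separate retry is needed for A in this model.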


 Handle RS that fails while processing the failure of another one
 

 Key: HBASE-2611
 URL: https://issues.apache.org/jira/browse/HBASE-2611
 Project: HBase
  Issue Type: Sub-task
  Components: Replication
Reporter: Jean-Daniel Cryans
Assignee: Himanshu Vashishtha
 Fix For: 0.94.5

 Attachments: HBase-2611-upstream-v1.patch, HBASE-2611-v2.patch


 HBASE-2223 doesn't manage region servers that fail while doing the transfer 
 of HLogs queues from other region servers that failed. Devise a reliable way 
 to do it.



[jira] [Commented] (HBASE-2611) Handle RS that fails while processing the failure of another one

2013-01-16 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13555263#comment-13555263
 ] 

Lars Hofhansl commented on HBASE-2611:
--

But that is not the case (unless I am misunderstanding completely). All RSs race 
to get the lock to take over the dead RS's queues. Once there is a winner, that 
RS will move the queues. So if the winning RS dies after it learns that it is 
the winner but before it moves the queues, those queues are lost.

What you describe is one way to solve the problem: All RSs simply try to move 
the queues. That would work, but would lead to the herding effect (which I 
think is acceptable).
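
The lost-queue window in the lock-based flow can be sketched as follows. This is a toy model under stated assumptions: `LockThenCopySketch` is hypothetical, its `lockOtherRS` only borrows the name of the real helper, and the lock is assumed to stay in place after the winner dies.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of lock-then-copy failover; not the HBase or ZooKeeper API. */
public class LockThenCopySketch {
    final Map<String, Integer> queuedLogs = new HashMap<>(); // owner -> number of queued HLogs
    final Set<String> locks = new HashSet<>();               // dead servers already locked

    /** Step 1: only the first caller obtains the lock for a dead server. */
    boolean lockOtherRS(String deadRs) {
        return locks.add(deadRs);
    }

    /** Step 2: a separate operation, so a crash can fall between the two steps. */
    void copyQueues(String deadRs, String claimer) {
        Integer n = queuedLogs.remove(deadRs);
        if (n != null) {
            queuedLogs.merge(claimer, n, Integer::sum);
        }
    }

    /** crashAfterLock simulates the winner dying after step 1 and before step 2. */
    boolean failover(String deadRs, String claimer, boolean crashAfterLock) {
        if (!lockOtherRS(deadRs)) {
            return false; // lost the race, give up
        }
        if (crashAfterLock) {
            return false; // winner died; the lock is still recorded
        }
        copyQueues(deadRs, claimer);
        return true;
    }

    /** Returns how many of rsA's logs are left stranded after B wins and dies. */
    static int strandedLogs() {
        LockThenCopySketch zk = new LockThenCopySketch();
        zk.queuedLogs.put("rsA", 5);
        zk.failover("rsA", "rsB", true);  // B wins the lock, then dies
        zk.failover("rsA", "rsC", false); // C cannot get the lock and returns
        return zk.queuedLogs.getOrDefault("rsA", 0);
    }
}
```

In this model all five logs remain under `rsA` with nobody responsible for them, which is the gap that having every RS attempt the move is meant to close.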


 Handle RS that fails while processing the failure of another one
 

 Key: HBASE-2611
 URL: https://issues.apache.org/jira/browse/HBASE-2611
 Project: HBase
  Issue Type: Sub-task
  Components: Replication
Reporter: Jean-Daniel Cryans
Assignee: Himanshu Vashishtha
 Fix For: 0.94.5

 Attachments: HBase-2611-upstream-v1.patch, HBASE-2611-v2.patch


 HBASE-2223 doesn't manage region servers that fail while doing the transfer 
 of HLogs queues from other region servers that failed. Devise a reliable way 
 to do it.



[jira] [Commented] (HBASE-2611) Handle RS that fails while processing the failure of another one

2013-01-16 Thread Himanshu Vashishtha (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13555488#comment-13555488
 ] 

Himanshu Vashishtha commented on HBASE-2611:


Yes, your description is totally correct. So, you okay with the approach, Lars?


 Handle RS that fails while processing the failure of another one
 

 Key: HBASE-2611
 URL: https://issues.apache.org/jira/browse/HBASE-2611
 Project: HBase
  Issue Type: Sub-task
  Components: Replication
Reporter: Jean-Daniel Cryans
Assignee: Himanshu Vashishtha
 Fix For: 0.94.5

 Attachments: HBase-2611-upstream-v1.patch, HBASE-2611-v2.patch


 HBASE-2223 doesn't manage region servers that fail while doing the transfer 
 of HLogs queues from other region servers that failed. Devise a reliable way 
 to do it.



[jira] [Commented] (HBASE-2611) Handle RS that fails while processing the failure of another one

2013-01-16 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1322#comment-1322
 ] 

Lars Hofhansl commented on HBASE-2611:
--

This change is good (so +1), but it does not fix the whole problem (you're not 
having all RSs attempt the queue failover).
Maybe we do your patch in a subtask and leave this issue open.


 Handle RS that fails while processing the failure of another one
 

 Key: HBASE-2611
 URL: https://issues.apache.org/jira/browse/HBASE-2611
 Project: HBase
  Issue Type: Sub-task
  Components: Replication
Reporter: Jean-Daniel Cryans
Assignee: Himanshu Vashishtha
 Fix For: 0.94.5

 Attachments: HBase-2611-upstream-v1.patch, HBASE-2611-v2.patch


 HBASE-2223 doesn't manage region servers that fail while doing the transfer 
 of HLogs queues from other region servers that failed. Devise a reliable way 
 to do it.



[jira] [Commented] (HBASE-2611) Handle RS that fails while processing the failure of another one

2013-01-16 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1324#comment-1324
 ] 

Lars Hofhansl commented on HBASE-2611:
--

Or did you mean whether I'm OK with all RSs attempting to move the queues? I'm 
happy with that too. I think [~jdcryans] voiced some concerns over the incurred 
herding effect.


 Handle RS that fails while processing the failure of another one
 

 Key: HBASE-2611
 URL: https://issues.apache.org/jira/browse/HBASE-2611
 Project: HBase
  Issue Type: Sub-task
  Components: Replication
Reporter: Jean-Daniel Cryans
Assignee: Himanshu Vashishtha
 Fix For: 0.94.5

 Attachments: HBase-2611-upstream-v1.patch, HBASE-2611-v2.patch


 HBASE-2223 doesn't manage region servers that fail while doing the transfer 
 of HLogs queues from other region servers that failed. Devise a reliable way 
 to do it.



[jira] [Commented] (HBASE-2611) Handle RS that fails while processing the failure of another one

2013-01-15 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13554574#comment-13554574
 ] 

Lars Hofhansl commented on HBASE-2611:
--

So that call to multi better not fail, ever. Otherwise we'll still lose track 
of data to be replicated.
There are two problems currently:
# Transfer of queues is only attempted once
# Queues may be partially transferred

This patch addresses only #2.


 Handle RS that fails while processing the failure of another one
 

 Key: HBASE-2611
 URL: https://issues.apache.org/jira/browse/HBASE-2611
 Project: HBase
  Issue Type: Sub-task
  Components: Replication
Reporter: Jean-Daniel Cryans
Assignee: Himanshu Vashishtha
 Fix For: 0.94.5

 Attachments: HBase-2611-upstream-v1.patch, HBASE-2611-v2.patch


 HBASE-2223 doesn't manage region servers that fail while doing the transfer 
 of HLogs queues from other region servers that failed. Devise a reliable way 
 to do it.



[jira] [Commented] (HBASE-2611) Handle RS that fails while processing the failure of another one

2013-01-15 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13554582#comment-13554582
 ] 

Ted Yu commented on HBASE-2611:
---

{code}
+   * @param znode
+   * @return
{code}
Please finish javadoc.
The key of the SortedMap is the peer cluster Id, right?
{code}
+  LOG.warn("Got exception in copyQueuesFromRSUsingMulti: " + e);
{code}
If you use a comma in place of +, the log would include the stack trace with 
method names.

There is no empty line in copyQueuesFromRSUsingMulti(). Consider adding empty 
lines to separate sub-steps.
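
The logging point can be illustrated with a small sketch. It uses `java.util.logging` purely as a self-contained stand-in for the logger in the patch (an assumption), and the class `LogThrowableSketch` is hypothetical. Concatenating the exception only records its `toString()`, while passing it as a separate argument keeps the throwable, and with it the stack trace, attached to the log record.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

/** Compares "message + e" with "message, e" using java.util.logging as a stand-in. */
public class LogThrowableSketch {

    /** Collects log records so the two calls can be compared. */
    static class Capture extends Handler {
        final List<LogRecord> records = new ArrayList<>();
        @Override public void publish(LogRecord record) { records.add(record); }
        @Override public void flush() { }
        @Override public void close() { }
    }

    static List<LogRecord> logBothWays(Exception e) {
        Logger log = Logger.getLogger("LogThrowableSketch");
        log.setUseParentHandlers(false); // keep the demo output quiet
        Capture capture = new Capture();
        log.addHandler(capture);
        try {
            // Concatenation: the exception is flattened into the message text.
            log.log(Level.WARNING, "Got exception in copyQueuesFromRSUsingMulti: " + e);
            // Separate argument: the throwable travels with the record.
            log.log(Level.WARNING, "Got exception in copyQueuesFromRSUsingMulti", e);
        } finally {
            log.removeHandler(capture);
        }
        return capture.records;
    }
}
```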


 Handle RS that fails while processing the failure of another one
 

 Key: HBASE-2611
 URL: https://issues.apache.org/jira/browse/HBASE-2611
 Project: HBase
  Issue Type: Sub-task
  Components: Replication
Reporter: Jean-Daniel Cryans
Assignee: Himanshu Vashishtha
 Fix For: 0.94.5

 Attachments: HBase-2611-upstream-v1.patch, HBASE-2611-v2.patch


 HBASE-2223 doesn't manage region servers that fail while doing the transfer 
 of HLogs queues from other region servers that failed. Devise a reliable way 
 to do it.



[jira] [Commented] (HBASE-2611) Handle RS that fails while processing the failure of another one

2013-01-15 Thread Himanshu Vashishtha (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13554584#comment-13554584
 ] 

Himanshu Vashishtha commented on HBASE-2611:


The call to multi uses RecoverableZooKeeper#multi, which retries in 
case of 
{code}
  case CONNECTIONLOSS:
  case SESSIONEXPIRED:
  case OPERATIONTIMEOUT:
{code}
The number of retries is three by default. I find this approach better than the 
existing one.
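
The retry behaviour described above can be sketched as a bounded retry loop. This is an illustration under stated assumptions, not the RecoverableZooKeeper code: `RetrySketch`, its local `Code` enum and `ZkException` are hypothetical stand-ins, "three" is taken to mean three retries after the first attempt, and any sleep or backoff between attempts is left out.

```java
import java.util.function.Supplier;

/** Bounded retry on selected error codes; a stand-in, not the RecoverableZooKeeper code. */
public class RetrySketch {
    /** Local stand-in for the error codes named in the comment above. */
    public enum Code { CONNECTIONLOSS, SESSIONEXPIRED, OPERATIONTIMEOUT, NODEEXISTS }

    public static class ZkException extends RuntimeException {
        public final Code code;
        public ZkException(Code code) {
            super(code.name());
            this.code = code;
        }
    }

    static boolean isRetriable(Code code) {
        switch (code) {
            case CONNECTIONLOSS:
            case SESSIONEXPIRED:
            case OPERATIONTIMEOUT:
                return true;
            default:
                return false;
        }
    }

    /** Runs op, retrying up to maxRetries times on retriable codes, then rethrows. */
    public static <T> T withRetries(Supplier<T> op, int maxRetries) {
        int retries = 0;
        while (true) {
            try {
                return op.get();
            } catch (ZkException e) {
                if (!isRetriable(e.code) || retries >= maxRetries) {
                    throw e; // out of retries, or an error that retrying cannot fix
                }
                retries++;
            }
        }
    }
}
```

Once the retries are exhausted the exception still propagates to the caller, so a persistent ZooKeeper outage is not hidden by the loop.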


 Handle RS that fails while processing the failure of another one
 

 Key: HBASE-2611
 URL: https://issues.apache.org/jira/browse/HBASE-2611
 Project: HBase
  Issue Type: Sub-task
  Components: Replication
Reporter: Jean-Daniel Cryans
Assignee: Himanshu Vashishtha
 Fix For: 0.94.5

 Attachments: HBase-2611-upstream-v1.patch, HBASE-2611-v2.patch


 HBASE-2223 doesn't manage region servers that fail while doing the transfer 
 of HLogs queues from other region servers that failed. Devise a reliable way 
 to do it.



[jira] [Commented] (HBASE-2611) Handle RS that fails while processing the failure of another one

2013-01-15 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13554586#comment-13554586
 ] 

Chris Trezzo commented on HBASE-2611:
-

But the retries in RecoverableZooKeeper are not atomic... if the region server 
fails in the middle of RecoverableZooKeeper.multi, the queues will not get 
transferred.

 Handle RS that fails while processing the failure of another one
 

 Key: HBASE-2611
 URL: https://issues.apache.org/jira/browse/HBASE-2611
 Project: HBase
  Issue Type: Sub-task
  Components: Replication
Reporter: Jean-Daniel Cryans
Assignee: Himanshu Vashishtha
 Fix For: 0.94.5

 Attachments: HBase-2611-upstream-v1.patch, HBASE-2611-v2.patch


 HBASE-2223 doesn't manage region servers that fail while doing the transfer 
 of HLogs queues from other region servers that failed. Devise a reliable way 
 to do it.



[jira] [Commented] (HBASE-2611) Handle RS that fails while processing the failure of another one

2013-01-15 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13554588#comment-13554588
 ] 

Chris Trezzo commented on HBASE-2611:
-

Also, I don't think your manual test described above hits this corner case. You 
need at least two region server failures for this to happen. For example, 
region server A fails, region server B races and wins the failover of A, 
and then region server B fails before it finishes copying A's queue to its own 
queue. Then when someone picks up B, A's original queue will not get completely 
replicated.

Thanks for working on this though! It is a tricky one.

 Handle RS that fails while processing the failure of another one
 

 Key: HBASE-2611
 URL: https://issues.apache.org/jira/browse/HBASE-2611
 Project: HBase
  Issue Type: Sub-task
  Components: Replication
Reporter: Jean-Daniel Cryans
Assignee: Himanshu Vashishtha
 Fix For: 0.94.5

 Attachments: HBase-2611-upstream-v1.patch, HBASE-2611-v2.patch


 HBASE-2223 doesn't manage region servers that fail while doing the transfer 
 of HLogs queues from other region servers that failed. Devise a reliable way 
 to do it.



[jira] [Commented] (HBASE-2611) Handle RS that fails while processing the failure of another one

2013-01-15 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13554617#comment-13554617
 ] 

Chris Trezzo commented on HBASE-2611:
-

[~hvash...@cs.ualberta.ca] Hmm, I may have misspoken... atomic was not the 
right word choice.

bq. But the retries in RecoverableZooKeeper are not atomic... if the region 
server fails in the middle of RecoverableZooKeeper.multi, the queues will not 
get transferred.

I see that as long as a multi hasn't succeeded, all region servers will 
continue to try to fail over the queues. So the problem seems to be more this: 
if all region servers exhaust their multi retries, the queues would get lost.

Is there ever a case in practice where we would run into this and zookeeper is 
not down?

 Handle RS that fails while processing the failure of another one
 

 Key: HBASE-2611
 URL: https://issues.apache.org/jira/browse/HBASE-2611
 Project: HBase
  Issue Type: Sub-task
  Components: Replication
Reporter: Jean-Daniel Cryans
Assignee: Himanshu Vashishtha
 Fix For: 0.94.5

 Attachments: HBase-2611-upstream-v1.patch, HBASE-2611-v2.patch


 HBASE-2223 doesn't manage region servers that fail while doing the transfer 
 of HLogs queues from other region servers that failed. Devise a reliable way 
 to do it.



[jira] [Commented] (HBASE-2611) Handle RS that fails while processing the failure of another one

2013-01-15 Thread Himanshu Vashishtha (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13554724#comment-13554724
 ] 

Himanshu Vashishtha commented on HBASE-2611:


Chris: Thanks for taking a look.

bq. Is there ever a case in practice where we would run into this and zookeeper 
is not down?
Can't think of any. 
Even if that ever happens (let's say all regionservers can't connect to zk or 
whatever), then we need something different (possibly beyond the scope of this 
jira) so that any newly joining regionserver takes a look at existing log 
znodes, etc.

Re: Testing:
Yeah, I know. But, given that it is moved in one transaction, I can't think of 
how to replicate it in a testing environment. Therefore, I tested to see what 
happens when two regionservers try to copy the queue, and whether this 
approach scales well with the number of logs. 


 Handle RS that fails while processing the failure of another one
 

 Key: HBASE-2611
 URL: https://issues.apache.org/jira/browse/HBASE-2611
 Project: HBase
  Issue Type: Sub-task
  Components: Replication
Reporter: Jean-Daniel Cryans
Assignee: Himanshu Vashishtha
 Fix For: 0.94.5

 Attachments: HBase-2611-upstream-v1.patch, HBASE-2611-v2.patch


 HBASE-2223 doesn't manage region servers that fail while doing the transfer 
 of HLogs queues from other region servers that failed. Devise a reliable way 
 to do it.



[jira] [Commented] (HBASE-2611) Handle RS that fails while processing the failure of another one

2013-01-15 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13554747#comment-13554747
 ] 

Lars Hofhansl commented on HBASE-2611:
--

This is definitely an improvement.
What happens when a region server dies after it copied the queues but before it 
could finish shipping all the edits?

 Handle RS that fails while processing the failure of another one
 

 Key: HBASE-2611
 URL: https://issues.apache.org/jira/browse/HBASE-2611
 Project: HBase
  Issue Type: Sub-task
  Components: Replication
Reporter: Jean-Daniel Cryans
Assignee: Himanshu Vashishtha
 Fix For: 0.94.5

 Attachments: HBase-2611-upstream-v1.patch, HBASE-2611-v2.patch


 HBASE-2223 doesn't manage region servers that fail while doing the transfer 
 of HLogs queues from other region servers that failed. Devise a reliable way 
 to do it.



[jira] [Commented] (HBASE-2611) Handle RS that fails while processing the failure of another one

2013-01-15 Thread Himanshu Vashishtha (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13554752#comment-13554752
 ] 

Himanshu Vashishtha commented on HBASE-2611:


Lars: Then the regionserver that does the failover will also process the 
leftover znodes (just like what happens currently).

 Handle RS that fails while processing the failure of another one
 

 Key: HBASE-2611
 URL: https://issues.apache.org/jira/browse/HBASE-2611
 Project: HBase
  Issue Type: Sub-task
  Components: Replication
Reporter: Jean-Daniel Cryans
Assignee: Himanshu Vashishtha
 Fix For: 0.94.5

 Attachments: HBase-2611-upstream-v1.patch, HBASE-2611-v2.patch


 HBASE-2223 doesn't manage region servers that fail while doing the transfer 
 of HLogs queues from other region servers that failed. Devise a reliable way 
 to do it.



[jira] [Commented] (HBASE-2611) Handle RS that fails while processing the failure of another one

2013-01-15 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13554758#comment-13554758
 ] 

Lars Hofhansl commented on HBASE-2611:
--

Cool... So as long as the multi itself does not fail we're good.

 Handle RS that fails while processing the failure of another one
 

 Key: HBASE-2611
 URL: https://issues.apache.org/jira/browse/HBASE-2611
 Project: HBase
  Issue Type: Sub-task
  Components: Replication
Reporter: Jean-Daniel Cryans
Assignee: Himanshu Vashishtha
 Fix For: 0.94.5

 Attachments: HBase-2611-upstream-v1.patch, HBASE-2611-v2.patch


 HBASE-2223 doesn't manage region servers that fail while doing the transfer 
 of HLogs queues from other region servers that failed. Devise a reliable way 
 to do it.



[jira] [Commented] (HBASE-2611) Handle RS that fails while processing the failure of another one

2013-01-15 Thread Himanshu Vashishtha (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13554761#comment-13554761
 ] 

Himanshu Vashishtha commented on HBASE-2611:


Yes. I would ask, though, in what possible circumstances you foresee failure of 
multi()? 

 Handle RS that fails while processing the failure of another one
 

 Key: HBASE-2611
 URL: https://issues.apache.org/jira/browse/HBASE-2611
 Project: HBase
  Issue Type: Sub-task
  Components: Replication
Reporter: Jean-Daniel Cryans
Assignee: Himanshu Vashishtha
 Fix For: 0.94.5

 Attachments: HBase-2611-upstream-v1.patch, HBASE-2611-v2.patch


 HBASE-2223 doesn't manage region servers that fail while doing the transfer 
 of HLogs queues from other region servers that failed. Devise a reliable way 
 to do it.



[jira] [Commented] (HBASE-2611) Handle RS that fails while processing the failure of another one

2013-01-15 Thread Himanshu Vashishtha (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13554783#comment-13554783
 ] 

Himanshu Vashishtha commented on HBASE-2611:


[~lhofhansl]: I asked about possible failure scenarios because it would be good 
to address them beforehand.

 Handle RS that fails while processing the failure of another one
 

 Key: HBASE-2611
 URL: https://issues.apache.org/jira/browse/HBASE-2611
 Project: HBase
  Issue Type: Sub-task
  Components: Replication
Reporter: Jean-Daniel Cryans
Assignee: Himanshu Vashishtha
 Fix For: 0.94.5

 Attachments: HBase-2611-upstream-v1.patch, HBASE-2611-v2.patch


 HBASE-2223 doesn't manage region servers that fail while doing the transfer 
 of HLogs queues from other region servers that failed. Devise a reliable way 
 to do it.



[jira] [Commented] (HBASE-2611) Handle RS that fails while processing the failure of another one

2013-01-15 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13554789#comment-13554789
 ] 

Lars Hofhansl commented on HBASE-2611:
--

Yeah, I don't know.

But what can happen is that the region server that wins the race to take over 
the dead region server's queues could die before it even manages to call multi. 
In that case - since the ephemeral znode is only removed once - we won't ever 
retry moving that region server's queues again. Right?
So another part of the puzzle is to have a way to retry the takeover later. 
Back in the comments here there are various suggestions about how to do that, 
mostly centering around having all surviving RSs try to move a dead RS's queues.


 Handle RS that fails while processing the failure of another one
 

 Key: HBASE-2611
 URL: https://issues.apache.org/jira/browse/HBASE-2611
 Project: HBase
  Issue Type: Sub-task
  Components: Replication
Reporter: Jean-Daniel Cryans
Assignee: Himanshu Vashishtha
 Fix For: 0.94.5

 Attachments: HBase-2611-upstream-v1.patch, HBASE-2611-v2.patch


 HBASE-2223 doesn't manage region servers that fail while doing the transfer 
 of HLogs queues from other region servers that failed. Devise a reliable way 
 to do it.



[jira] [Commented] (HBASE-2611) Handle RS that fails while processing the failure of another one

2013-01-14 Thread Himanshu Vashishtha (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13553430#comment-13553430
 ] 

Himanshu Vashishtha commented on HBASE-2611:


Working on it; will provide a patch soon.

 Handle RS that fails while processing the failure of another one
 

 Key: HBASE-2611
 URL: https://issues.apache.org/jira/browse/HBASE-2611
 Project: HBase
  Issue Type: Sub-task
  Components: Replication
Reporter: Jean-Daniel Cryans
Assignee: Chris Trezzo
 Fix For: 0.94.5

 Attachments: HBase-2611-upstream-v1.patch


 HBASE-2223 doesn't manage region servers that fail while doing the transfer 
 of HLogs queues from other region servers that failed. Devise a reliable way 
 to do it.



[jira] [Commented] (HBASE-2611) Handle RS that fails while processing the failure of another one

2012-06-27 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402582#comment-13402582
 ] 

Zhihong Ted Yu commented on HBASE-2611:
---

Putting the patch on Review Board would help.

{code}
+   * @param opList: list of Op to be executed as one trx.
{code}
'trx' -> 'transaction'
{code}
+if(opList == null || opList.size() ==0)
{code}
Space between if and (.
{code}
+}catch (InterruptedException ie) {
+  LOG.warn("multi call interrupted; process failed!" + ie);
{code}
Restore interrupt status for the thread (same for doMultiAndWatch). Space 
between } and catch.
{code}
+  LOG.warn("multi call failed! One of the passed ops has failed which result in the rolled back.");
{code}
Line length beyond 100.
{code}
+   * @return
+   */
+  public SortedMap<String, SortedSet<String>> copyDeadRSLogsWithMulti(
+  String deadRSZnode) {
{code}
javadoc for the return value.
{code}
+  LOG.warn("This is us! Skipping the processing as we might be closing down.");
{code}
Add deadRSZnodePath to the log.
{code}
+RetryCounterFactory retryCounterFactory = new RetryCounterFactory(Integer.MAX_VALUE, 3 * 1000);
{code}
I don't think MAX_VALUE is a good choice.
{code}
+SortedSet<String> logQueue = new TreeSet<String>();
{code}
Why is logQueue backed by a TreeSet ?
{code}
+LOG.warn("KeeperException occurred in multi; " +
+"seems some other regionserver took the logs before us.");
{code}
Add ke to the above message.
{code}
+Op deleteOpForLog = Op.delete(zNodeForCurrentLog, -1);
+znodesToWatch.add(logZnode);
+opsList.add(createOpForLog);
+opsList.add(deleteOpForLog);
{code}
Please reorder the above calls so that znodesToWatch.add() is after 
opsList.add() calls. This would make code more readable.
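
To illustrate the remark about restoring the interrupt status, here is a minimal sketch. It is not taken from the patch: `InterruptRestoreSketch` and `runRestoringInterrupt` are hypothetical names, and the `Callable` merely stands in for whatever blocking ZooKeeper call the real method makes.

```java
import java.util.concurrent.Callable;

// Hypothetical helper, not part of the patch: shows the usual idiom of
// re-asserting the interrupt flag after catching InterruptedException.
final class InterruptRestoreSketch {

  /**
   * Runs the given call. Returns true on success. If the call is interrupted,
   * logs, restores the thread's interrupt status so callers further up the
   * stack can still observe it, and returns false.
   */
  static boolean runRestoringInterrupt(Callable<Void> call) {
    try {
      call.call();
      return true;
    } catch (InterruptedException ie) {
      System.err.println("multi call interrupted; process failed! " + ie);
      // A blocking call clears the flag when it throws; set it again.
      Thread.currentThread().interrupt();
      return false;
    } catch (Exception e) {
      // Any other failure is only logged in this sketch.
      System.err.println("call failed: " + e);
      return false;
    }
  }
}
```

The point is only that the catch block does not swallow the interrupt; how the caller reacts to the restored flag is left open.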

 Handle RS that fails while processing the failure of another one
 

 Key: HBASE-2611
 URL: https://issues.apache.org/jira/browse/HBASE-2611
 Project: HBase
  Issue Type: Sub-task
  Components: replication
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
 Attachments: HBase-2611-upstream-v1.patch


 HBASE-2223 doesn't manage region servers that fail while doing the transfer 
 of HLogs queues from other region servers that failed. Devise a reliable way 
 to do it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-2611) Handle RS that fails while processing the failure of another one

2012-06-27 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402594#comment-13402594
 ] 

Zhihong Ted Yu commented on HBASE-2611:
---

If opsList contains a (relatively) large number of Ops, the chance of 
collision between active region servers is high.

Some experiments should be performed to get an idea of how long this 
procedure takes to succeed.

 Handle RS that fails while processing the failure of another one
 

 Key: HBASE-2611
 URL: https://issues.apache.org/jira/browse/HBASE-2611
 Project: HBase
  Issue Type: Sub-task
  Components: replication
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
 Attachments: HBase-2611-upstream-v1.patch


 HBASE-2223 doesn't manage region servers that fail while doing the transfer 
 of HLogs queues from other region servers that failed. Devise a reliable way 
 to do it.





[jira] [Commented] (HBASE-2611) Handle RS that fails while processing the failure of another one

2012-06-27 Thread Himanshu Vashishtha (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402775#comment-13402775
 ] 

Himanshu Vashishtha commented on HBASE-2611:


Thanks for the review, Ted. I will upload a modified version on the rb. My 
initial idea of putting it here was to get some feedback on the approach.

Yes, it is zk intensive as all other regionservers are competing to do the 
transaction. But as soon as one is successful (the first one that creates the 
list and issues the multi command), other regionservers which haven't had a 
chance to do a listChildren call on the dead regionserver znode will not see 
anything; and for other regionservers which have created the Ops, the very 
first Op will fail as the znode has already moved. Zookeeper#multi is 
fail-fast: it rolls back the transaction on the first failure without trying 
the remaining Ops. I tested it on a 3 RS cluster with average load being 12-14 
logs, and it usually is done within seconds after the regionserver failure is 
noticed. What sort of experiments are you thinking about?
On another note, TestReplication passes.
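
The fail-fast, all-or-nothing behaviour described above can be sketched with an in-memory stand-in. This is an illustration under assumptions, not the ZooKeeper client API and not the patch: `MultiSketch`, its `Op` class, `moveOps`, the znode paths, and the delete-before-create ordering are all made up for the example, and the sketch reports a failure by returning the index of the failing Op rather than throwing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// In-memory stand-in for a znode tree; NOT the ZooKeeper client API.
final class MultiSketch {

  /** One step of a transaction: create a path or delete a path. */
  static final class Op {
    final boolean isCreate;
    final String path;

    private Op(boolean isCreate, String path) {
      this.isCreate = isCreate;
      this.path = path;
    }

    static Op create(String path) { return new Op(true, path); }
    static Op delete(String path) { return new Op(false, path); }
  }

  private final TreeSet<String> znodes = new TreeSet<>();

  synchronized void put(String path) { znodes.add(path); }

  synchronized boolean exists(String path) { return znodes.contains(path); }

  /**
   * Applies all ops or none. Returns -1 on success, otherwise the index of
   * the first op that failed; in that case nothing is committed.
   */
  synchronized int multi(List<Op> ops) {
    TreeSet<String> scratch = new TreeSet<>(znodes); // work on a copy
    for (int i = 0; i < ops.size(); i++) {
      Op op = ops.get(i);
      // create fails if the path exists; delete fails if it is missing
      boolean ok = op.isCreate ? scratch.add(op.path) : scratch.remove(op.path);
      if (!ok) {
        return i; // fail fast: the copy is discarded, remaining ops are skipped
      }
    }
    znodes.clear();
    znodes.addAll(scratch); // commit everything at once
    return -1;
  }

  /** Builds delete-old/create-new pairs that move each log under a new owner. */
  static List<Op> moveOps(String deadRs, String newOwner, List<String> logs) {
    List<Op> ops = new ArrayList<>();
    for (String log : logs) {
      ops.add(Op.delete(deadRs + "/" + log));
      ops.add(Op.create(newOwner + "/" + log));
    }
    return ops;
  }
}
```

With two hypothetical survivors issuing the same move one after the other, the first call returns -1 and the second returns 0, because its very first delete no longer finds the znode; none of the second caller's creates are applied.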

 Handle RS that fails while processing the failure of another one
 

 Key: HBASE-2611
 URL: https://issues.apache.org/jira/browse/HBASE-2611
 Project: HBase
  Issue Type: Sub-task
  Components: replication
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
 Attachments: HBase-2611-upstream-v1.patch


 HBASE-2223 doesn't manage region servers that fail while doing the transfer 
 of HLogs queues from other region servers that failed. Devise a reliable way 
 to do it.





[jira] [Commented] (HBASE-2611) Handle RS that fails while processing the failure of another one

2012-06-27 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402801#comment-13402801
 ] 

Zhihong Ted Yu commented on HBASE-2611:
---

bq. average load being 12-14 logs
Can you make the above 10x ?

Another consideration is when (which major release) zookeeper 3.4 would be 
listed as minimum requirement.
There hasn't been consensus so far.

Here're all the replication-related tests:
{code}
src/test/java/org/apache/hadoop/hbase/client/replication/TestReplicationAdmin.java
src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestReplicationSink.java
src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestReplicationSourceManager.java
src/test/java/org/apache/hadoop/hbase/replication/TestReplication.java
src/test/java/org/apache/hadoop/hbase/replication/TestReplicationDeleteTypes.java
src/test/java/org/apache/hadoop/hbase/replication/TestReplicationPeer.java
src/test/java/org/apache/hadoop/hbase/replication/TestReplicationSource.java
{code}

 Handle RS that fails while processing the failure of another one
 

 Key: HBASE-2611
 URL: https://issues.apache.org/jira/browse/HBASE-2611
 Project: HBase
  Issue Type: Sub-task
  Components: replication
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
 Attachments: HBase-2611-upstream-v1.patch


 HBASE-2223 doesn't manage region servers that fail while doing the transfer 
 of HLogs queues from other region servers that failed. Devise a reliable way 
 to do it.





[jira] [Commented] (HBASE-2611) Handle RS that fails while processing the failure of another one

2012-06-27 Thread Himanshu Vashishtha (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402811#comment-13402811
 ] 

Himanshu Vashishtha commented on HBASE-2611:


zookeeper 3.4 is there in 0.92+? What do you mean by minimum requirement? 
Please explain.

I found the related test, queuefailover, in TestReplication. Good to know about 
the other test classes.

 Handle RS that fails while processing the failure of another one
 

 Key: HBASE-2611
 URL: https://issues.apache.org/jira/browse/HBASE-2611
 Project: HBase
  Issue Type: Sub-task
  Components: replication
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
 Attachments: HBase-2611-upstream-v1.patch


 HBASE-2223 doesn't manage region servers that fail while doing the transfer 
 of HLogs queues from other region servers that failed. Devise a reliable way 
 to do it.





[jira] [Commented] (HBASE-2611) Handle RS that fails while processing the failure of another one

2012-06-27 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402816#comment-13402816
 ] 

Zhihong Ted Yu commented on HBASE-2611:
---

The 3.4 is only for zookeeper client.
Companies (such as StumbleUpon) run 3.3.x in production which doesn't support 
multi().

 Handle RS that fails while processing the failure of another one
 

 Key: HBASE-2611
 URL: https://issues.apache.org/jira/browse/HBASE-2611
 Project: HBase
  Issue Type: Sub-task
  Components: replication
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
 Attachments: HBase-2611-upstream-v1.patch


 HBASE-2223 doesn't manage region servers that fail while doing the transfer 
 of HLogs queues from other region servers that failed. Devise a reliable way 
 to do it.





[jira] [Commented] (HBASE-2611) Handle RS that fails while processing the failure of another one

2012-06-27 Thread Jesse Yates (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402824#comment-13402824
 ] 

Jesse Yates commented on HBASE-2611:


3.4 is currently only required for security and, further, is not yet a stable 
release of ZK. That said, if it does become stable it's likely to be adopted 
given that it's been pretty solid for many people.

 Handle RS that fails while processing the failure of another one
 

 Key: HBASE-2611
 URL: https://issues.apache.org/jira/browse/HBASE-2611
 Project: HBase
  Issue Type: Sub-task
  Components: replication
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
 Attachments: HBase-2611-upstream-v1.patch


 HBASE-2223 doesn't manage region servers that fail while doing the transfer 
 of HLogs queues from other region servers that failed. Devise a reliable way 
 to do it.





[jira] [Commented] (HBASE-2611) Handle RS that fails while processing the failure of another one

2012-06-27 Thread Himanshu Vashishtha (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402829#comment-13402829
 ] 

Himanshu Vashishtha commented on HBASE-2611:


bq. 3.4 is only for zookeeper client.

I find this a bit confusing. Why is it so? What do we gain by this?

@Jesse: TM is using secure hbase in their production (if I am not wrong). So, 
3.4 seems a pretty reasonable choice. Has there been any discussion on this? I 
would like to know more context on this.

 Handle RS that fails while processing the failure of another one
 

 Key: HBASE-2611
 URL: https://issues.apache.org/jira/browse/HBASE-2611
 Project: HBase
  Issue Type: Sub-task
  Components: replication
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
 Attachments: HBase-2611-upstream-v1.patch


 HBASE-2223 doesn't manage region servers that fail while doing the transfer 
 of HLogs queues from other region servers that failed. Devise a reliable way 
 to do it.





[jira] [Commented] (HBASE-2611) Handle RS that fails while processing the failure of another one

2012-06-27 Thread Jesse Yates (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402863#comment-13402863
 ] 

Jesse Yates commented on HBASE-2611:


@Himanshu here is the thread I started on this on dev@ a little while ago: 
http://search-hadoop.com/m/u2D7j1yRpi72 

It basically comes down to the fact that it would be irresponsible to do a 
release of HBase that requires an unstable dependency. Yeah, TM has it in 
production, but that doesn't mean their usage is representative of 
_everyone's_. If the ZK fellas decide that 3.4 is a stable release, then I'm 
all for making it the requirement in 0.96, but until the guys who write the 
software feel like it's stable, I don't think we are qualified to say it is 
stable. 

I do think it's weird that we make 3.4 a dependency, but it really would be too 
weird (and honestly a waste of effort) to support two versions of the protocol, 
especially considering the trickiness of dealing with ZK clusters that may be 
in the process of upgrade, etc. 

 Handle RS that fails while processing the failure of another one
 

 Key: HBASE-2611
 URL: https://issues.apache.org/jira/browse/HBASE-2611
 Project: HBase
  Issue Type: Sub-task
  Components: replication
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
 Attachments: HBase-2611-upstream-v1.patch


 HBASE-2223 doesn't manage region servers that fail while doing the transfer 
 of HLogs queues from other region servers that failed. Devise a reliable way 
 to do it.





[jira] [Commented] (HBASE-2611) Handle RS that fails while processing the failure of another one

2012-06-26 Thread Himanshu Vashishtha (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13401779#comment-13401779
 ] 

Himanshu Vashishtha commented on HBASE-2611:


I looked at this issue from the perspective of using the Zookeeper#multi 
operation (present in 3.4). This API guarantees to execute a list of Ops as a 
single transaction, rolling back all the Ops if any one of them fails. I tested 
this functionality as a standalone case (where the transaction was to move a 
bunch of znodes from one parent to another), and it works well (out of N 
threads which race to do the transfer, only 1 is successful). And in case of a 
failure, all the Ops done so far are rolled back. I can attach the sample code 
if required.

Here is the approach I used to utilize multi for this issue:
a) All the active region servers try to move the logs of peers under the 
dead regionserver znode. It involves creating Op objects for creating new 
znodes and deleting old ones. As per the multi API guarantee, only one 
regionserver will be successful in moving the znodes.

b) The regionservers will keep on trying to move the znodes from the dead 
regionserver until they are sure that the node is deleted (by the successful 
regionserver), or there is no log to process. This is to avoid any corner case 
so as not to miss the logs for the dead regionserver. The number of trials can 
be made configurable.

c) In case of cascading failure (when the successful regionserver dies before 
it gets the notification from zk about the successful move), other 
regionservers will get this new event and will proceed as normal (will try to 
move all the znodes from this newly dead regionserver znode).


It will be good to know what others think about this approach. Are there other 
rogue conditions that can happen?

Attached is a patch based on this approach. I tested it by manually killing 
regionservers at random (not totally random, but killing one and then killing 
the successful one when it has just transferred the logs); it's difficult to 
kill it while transferring as it's an atomic operation now. Any 
ideas/suggestions for more direct testing are welcome.
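
Steps a) and c) can be pictured with a toy model. This is only a sketch under assumptions: a synchronized in-memory map stands in for the replication znodes and for the atomicity that multi provides, and `FailoverRaceSketch`, `claim`, `race`, and the server names are hypothetical, not names from the patch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model of the failover race; the map stands in for the replication znodes.
final class FailoverRaceSketch {
  private final Map<String, List<String>> queues = new HashMap<>();

  synchronized void register(String rs, List<String> logs) {
    queues.put(rs, new ArrayList<>(logs));
  }

  synchronized List<String> logsOf(String rs) {
    List<String> logs = queues.get(rs);
    return logs == null ? new ArrayList<>() : new ArrayList<>(logs);
  }

  /** Atomically moves every log of deadRs under claimer; true only for the winner. */
  synchronized boolean claim(String deadRs, String claimer) {
    List<String> logs = queues.remove(deadRs);
    if (logs == null) {
      return false; // someone else already moved them
    }
    queues.computeIfAbsent(claimer, k -> new ArrayList<>()).addAll(logs);
    return true;
  }

  /** Lets all survivors race for deadRs and returns how many of them won. */
  int race(String deadRs, List<String> survivors) {
    AtomicInteger winners = new AtomicInteger();
    CountDownLatch start = new CountDownLatch(1);
    List<Thread> threads = new ArrayList<>();
    for (String rs : survivors) {
      Thread t = new Thread(() -> {
        try {
          start.await(); // release all racers at the same moment
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return;
        }
        if (claim(deadRs, rs)) {
          winners.incrementAndGet();
        }
      });
      threads.add(t);
      t.start();
    }
    start.countDown();
    for (Thread t : threads) {
      try {
        t.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      }
    }
    return winners.get();
  }
}
```

Because the move is atomic, the logs are always found under exactly one owner. If the winner then dies, racing for the winner's own entry hands the same logs to one of the remaining servers, which is the cascading case in c).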

 Handle RS that fails while processing the failure of another one
 

 Key: HBASE-2611
 URL: https://issues.apache.org/jira/browse/HBASE-2611
 Project: HBase
  Issue Type: Sub-task
  Components: replication
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
 Attachments: HBase-2611-upstream-v1.patch


 HBASE-2223 doesn't manage region servers that fail while doing the transfer 
 of HLogs queues from other region servers that failed. Devise a reliable way 
 to do it.





[jira] [Commented] (HBASE-2611) Handle RS that fails while processing the failure of another one

2011-12-08 Thread Chris Trezzo (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13165085#comment-13165085
 ] 

Chris Trezzo commented on HBASE-2611:
-

@J-D

LarsH and I were talking about another approach to region server replication 
hlog queue failover yesterday, and I wanted to get some feedback on it.

Currently when handling a nodeDeleted event, the live region servers only 
attempt to failover the node corresponding to the event. The nodeDeleted event 
is only fired once, so to protect ourselves from orphaning the znode state of 
the failed region server in a cascading failure scenario, we move the state to 
the znode of the region server that is performing the failover. Since we don't 
have an atomic way to move this state, it gets a little tricky.

Instead of this approach, we could have the region server attempt to failover 
all failed region servers every time it receives a nodeDeleted event. For 
example, the nodeDeleted method could go something like this: refresh the 
region server list, get the list of region servers in the replication znode 
structure, attempt to lock and failover any region server listed in the 
replication znode structure that is not currently alive.

The same race to lock the region server znode will occur. Only one region 
server will get the lock and handle the failover. Each NodeFailoverWorker that 
gets started could simply operate on the original dead region server znode 
structure. If the region server fails while performing the failover, then both 
the region servers will get picked up by another region server when the 
nodeDeleted event for the second failure is fired. Locks would have to be 
ephemeral nodes to prevent permanent locking of a region server when the 
failover region server dies. Once the replication hlog queues are successfully 
replicated, the znode for the dead region server can be deleted.  

On the cons side, this approach makes the handling of a nodeDeleted event a 
heavier weight operation.

On the pros side, it makes the failover code much simpler because we no longer 
have to worry about moving the region server znode state around in zookeeper.

Thoughts always appreciated.

Thanks,
Chris
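
As a rough illustration of the scan described above (refresh the live list, list the servers in the replication znode structure, pick those no longer alive), here is a small sketch. `OrphanScanSketch` and `findOrphans` are hypothetical names, plain sets stand in for the two ZooKeeper listings, and the locking step is left out.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: plain sets stand in for the two ZooKeeper listings.
final class OrphanScanSketch {

  /**
   * Returns the servers that still have replication state but are no longer
   * alive, i.e. the candidates a survivor would try to lock and fail over.
   */
  static Set<String> findOrphans(Set<String> liveServers, Set<String> serversWithQueues) {
    Set<String> orphans = new TreeSet<>(serversWithQueues); // copy, sorted for stable order
    orphans.removeAll(liveServers);
    return orphans;
  }
}
```

Running this on every nodeDeleted event is what makes the handler heavier, but it also means a server orphaned by an earlier, missed failover is picked up on the next event.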

 Handle RS that fails while processing the failure of another one
 

 Key: HBASE-2611
 URL: https://issues.apache.org/jira/browse/HBASE-2611
 Project: HBase
  Issue Type: Sub-task
  Components: replication
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans

 HBASE-2223 doesn't manage region servers that fail while doing the transfer 
 of HLogs queues from other region servers that failed. Devise a reliable way 
 to do it.





[jira] [Commented] (HBASE-2611) Handle RS that fails while processing the failure of another one

2011-10-28 Thread Chris Trezzo (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13138637#comment-13138637
 ] 

Chris Trezzo commented on HBASE-2611:
-

I think adding the ability to atomically move a znode and all its child znodes 
might be a pretty invasive change. I couldn't seem to find any utility package 
for this on the net, but there is a patch in Zookeeper 
([ZOOKEEPER-965|https://issues.apache.org/jira/browse/ZOOKEEPER-965]) 
implementing atomic batch operations that is scheduled for 3.4.

I thought about the problem a little bit, and after conferring with Lars, I 
think we might not need the atomic move (although it would definitely make it 
simpler).

Below is some pseudo code for the algorithm I came up with. It is very similar 
to what you suggested above. Both intentions and locks are tagged with the 
region server they point to (i.e. locks are tagged with the rs that holds them, 
and intentions are tagged with the rs they intend to lock). Intentions are at 
the same level in the znode structure as locks. It is a recursive, depth first 
algorithm.

Questions/comments/suggestions always appreciated.

Chris

{code}

//this method is the top-level failover method (i.e. NodeFailoverWorker.run())
failOverRun(FailedNode a) {
  recordIntention(a, this);
  if(getLock(a, this)) {
    //transfer all queues to local node
    moveState(a, this, this);
  }
  else {
    deleteIntention(a, this);
    return;
  }
  replicateQueues();
}

moveState(NodeToMove a, CurrentNode c, TargetNode t) {
  if(lock exists on a) {
    if(lock on a is owned by c) {
      moveStateHelper(a, c, t);
    }
    else {
      //someone else has the lock and is handling
      //the failover
      deleteIntention(a, c);
    }
  }
  else {
    if(queue znodes exist) {
      //we know that this node has queues to transfer
      if(getLock(a, c)) {
        moveStateHelper(a, c, t);
      }
      else {
        deleteIntention(a, c);
      }
    }
    else {
      //we know that this node is being deleted
      deleteState(a);
      deleteIntention(a, c);
    }
  }
}

moveStateHelper(NodeToMove a, CurrentNode c, TargetNode t) {
  for(every intention b of a) {
    moveState(b, a, t);
  }
  //we need to safely handle the case where we try to copy
  //queues that have already been copied
  copy all queues in a to t;
  deleteState(a);
  deleteIntention(a, c);
}

deleteState(NodeToDelete d) {
  //there is no need to traverse down the tree at all
  //because at this point everything below us should have
  //been deleted
  //
  //we need to safely handle the case where we attempt to delete
  //nodes that have already been deleted

  delete entire node;
}

{code}

 Handle RS that fails while processing the failure of another one
 

 Key: HBASE-2611
 URL: https://issues.apache.org/jira/browse/HBASE-2611
 Project: HBase
  Issue Type: Sub-task
  Components: replication
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans

 HBASE-2223 doesn't manage region servers that fail while doing the transfer 
 of HLogs queues from other region servers that failed. Devise a reliable way 
 to do it.





[jira] [Commented] (HBASE-2611) Handle RS that fails while processing the failure of another one

2011-10-28 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13138739#comment-13138739
 ] 

Ted Yu commented on HBASE-2611:
---

In moveState(), if lock on a is owned by c, should lock be released after 
moveStateHelper() returns ?
I guess lock release can also be done at the end of moveStateHelper().





[jira] [Commented] (HBASE-2611) Handle RS that fails while processing the failure of another one

2011-10-28 Thread Chris Trezzo (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13138762#comment-13138762
 ] 

Chris Trezzo commented on HBASE-2611:
-

I should have specified that in deleteState(), the line "delete entire node" 
deletes the entire znode replication hierarchy for that region server. This 
includes the lock znode, so the lock is effectively released at the end of 
moveStateHelper().
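
As a rough illustration of why no separate unlock step is needed, here is a small Java sketch that models znodes as a sorted set of paths. It is not the ZooKeeper client API; the class LockZnodeSketch, its methods, and the "<rs>/lock" path layout are hypothetical names chosen for this example.

```java
import java.util.TreeSet;

// Hypothetical model: znodes as a sorted set of paths; not the ZooKeeper client API.
final class LockZnodeSketch {
    private final TreeSet<String> znodes = new TreeSet<>();

    void create(String path) {
        znodes.add(path);
    }

    boolean exists(String path) {
        return znodes.contains(path);
    }

    // Taking the lock means creating <rs>/lock. It fails if the region server
    // node is gone or if the lock znode already exists.
    boolean tryLock(String rsPath) {
        return znodes.contains(rsPath) && znodes.add(rsPath + "/lock");
    }

    // deleteState(): drop the node and everything under it, lock included.
    // Calling it on a node that is already gone removes nothing.
    void deleteState(String rsPath) {
        znodes.removeIf(p -> p.equals(rsPath) || p.startsWith(rsPath + "/"));
    }

    public static void main(String[] args) {
        LockZnodeSketch zk = new LockZnodeSketch();
        zk.create("/rs/a");
        zk.create("/rs/a/peer1");
        System.out.println("locked: " + zk.tryLock("/rs/a"));
        zk.deleteState("/rs/a");
        System.out.println("lock znode still there: " + zk.exists("/rs/a/lock"));
    }
}
```

Because the lock lives under the node being deleted, a later server that finds the node missing cannot take the lock either, which matches the "everything below us should have been deleted" comment in deleteState().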

Thanks!
Chris





[jira] [Commented] (HBASE-2611) Handle RS that fails while processing the failure of another one

2011-10-25 Thread Jean-Daniel Cryans (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13135292#comment-13135292
 ] 

Jean-Daniel Cryans commented on HBASE-2611:
---

Actually, it would be nice if it were in a separate utility package, since 
atomically moving a znode folder recursively would be a very useful function 
in general. It might even already exist on the net.
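
One possible shape for such a utility, sketched in Java under stated assumptions: compute an ordered plan of create and delete steps for the subtree, parents first on the way in and children first on the way out. The class ZnodeMovePlanner and the string form of the plan are hypothetical. The sketch only builds the plan from a set of paths and does not talk to ZooKeeper; making the move atomic would additionally need something like ZooKeeper's multi-operation transactions (added in the 3.4 line), which is an assumption about fit rather than something settled in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical planner for moving a znode subtree; builds the steps only.
final class ZnodeMovePlanner {

    // Returns "create <path>" steps (parents before children) followed by
    // "delete <path>" steps (children before parents) for the subtree at 'from'.
    static List<String> planMove(SortedSet<String> znodes, String from, String to) {
        List<String> creates = new ArrayList<>();
        List<String> deletes = new ArrayList<>();
        // Sorted iteration visits a path before any path it is a prefix of,
        // so a parent is always seen before its children.
        for (String path : znodes) {
            if (path.equals(from) || path.startsWith(from + "/")) {
                creates.add("create " + to + path.substring(from.length()));
                deletes.add(0, "delete " + path); // prepend: children end up first
            }
        }
        List<String> plan = new ArrayList<>(creates);
        plan.addAll(deletes);
        return plan;
    }

    public static void main(String[] args) {
        SortedSet<String> znodes = new TreeSet<>(Arrays.asList(
            "/rs/a", "/rs/a/peer1", "/rs/a/peer1/hlog.1", "/rs/ab"));
        for (String step : planMove(znodes, "/rs/a", "/rs/t/a")) {
            System.out.println(step);
        }
    }
}
```

Matching on from + "/" rather than on the bare prefix keeps a sibling such as /rs/ab out of the plan. Node data is left out to keep the sketch short; a real version would carry each znode's payload along with its create step.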





[jira] [Commented] (HBASE-2611) Handle RS that fails while processing the failure of another one

2011-10-24 Thread Chris Trezzo (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13134549#comment-13134549
 ] 

Chris Trezzo commented on HBASE-2611:
-

@J-D

If you don't mind, I was thinking about taking a crack at this using your 
strategy of four znode types. I'll start working on a sketch patch.

At first glance, it seems that most of the code changes will be in 
ReplicationSourceManager.NodeFailoverWorker.run().





[jira] [Commented] (HBASE-2611) Handle RS that fails while processing the failure of another one

2011-10-24 Thread Chris Trezzo (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13134568#comment-13134568
 ] 

Chris Trezzo commented on HBASE-2611:
-

...and of course ReplicationZookeeper.
