[jira] [Commented] (HBASE-6070) AM.nodeDeleted and SSH races creating problems for regions under SPLIT
[ https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13485262#comment-13485262 ]

stack commented on HBASE-6070:
------------------------------

[~tychang] Would you mind making a new issue to remove the dead code? Thank you.

AM.nodeDeleted and SSH races creating problems for regions under SPLIT
----------------------------------------------------------------------

Key: HBASE-6070
URL: https://issues.apache.org/jira/browse/HBASE-6070
Project: HBase
Issue Type: Bug
Affects Versions: 0.92.1, 0.94.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Fix For: 0.92.2, 0.94.1, 0.96.0
Attachments: HBASE-6070_0.92_1.patch, HBASE-6070_0.92.patch, HBASE-6070_0.94_1.patch, HBASE-6070_0.94.patch, HBASE-6070_trunk_1.patch, HBASE-6070_trunk.patch

We tried to address the problems in Master restart and RS restart while a region SPLIT is in progress as part of HBASE-5806. While looking further, we found that one race condition still remains.

1. A split has just started and the znode is in RS_SPLIT state.
2. The RS goes down.
3. The first callback for SSH arrives.
4. As part of the fix for HBASE-5806, SSH knows that some region is in RIT.
5. But now the nodeDeleted event arrives for the SPLIT node, and there we try to delete the region from RIT.
6. After this, SSH checks whether any region is in RIT. As the region is no longer found in RIT, it is never assigned.

When we fixed HBASE-5806, step 6 happened first and then step 5, so we missed this case. Now we have found it. Will come up with a patch shortly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
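The ordering problem in the issue description can be illustrated with a small toy model. This is only a sketch and not HBase code: the class `SplitRaceSketch`, its methods, and the region name are hypothetical, and the regions-in-transition (RIT) bookkeeping is reduced to a plain set. It replays the nodeDeleted handling and the SSH check in both orders to show why the region is left unassigned only when nodeDeleted runs first.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of the master's regions-in-transition (RIT) set; hypothetical, not HBase code. */
public class SplitRaceSketch {
    private final Set<String> regionsInTransition = new HashSet<>();

    /** A split starts, so the parent region enters RIT. */
    void splitStarted(String region) {
        regionsInTransition.add(region);
    }

    /** The znode-deleted callback clears the region from RIT. */
    void nodeDeleted(String region) {
        regionsInTransition.remove(region);
    }

    /** The shutdown handler reassigns the region only if it is still in RIT. */
    boolean sshReassignsIfInRit(String region) {
        return regionsInTransition.contains(region);
    }

    /** Replays the two callbacks in the given order; returns whether the region was reassigned. */
    static boolean regionReassigned(boolean nodeDeletedFirst) {
        SplitRaceSketch master = new SplitRaceSketch();
        master.splitStarted("parent-region");
        if (nodeDeletedFirst) {
            master.nodeDeleted("parent-region");
            return master.sshReassignsIfInRit("parent-region");
        }
        boolean reassigned = master.sshReassignsIfInRit("parent-region");
        master.nodeDeleted("parent-region");
        return reassigned;
    }

    public static void main(String[] args) {
        System.out.println("SSH check first:   reassigned = " + regionReassigned(false));
        System.out.println("nodeDeleted first: reassigned = " + regionReassigned(true));
    }
}
```

In the real master the two callbacks run on different threads, so either order can occur; the sketch makes the order an explicit parameter purely to keep the outcome deterministic.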
[jira] [Commented] (HBASE-6070) AM.nodeDeleted and SSH races creating problems for regions under SPLIT
[ https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287233#comment-13287233 ]

Hudson commented on HBASE-6070:
-------------------------------

Integrated in HBase-0.92-security #109 (See [https://builds.apache.org/job/HBase-0.92-security/109/])
HBASE-6070 AM.nodeDeleted and SSH races creating problems for regions under SPLIT (Ramkrishna) (Revision 1342727)

Result = SUCCESS
ramkrishna :
Files :
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
[jira] [Commented] (HBASE-6070) AM.nodeDeleted and SSH races creating problems for regions under SPLIT
[ https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13283989#comment-13283989 ]

Hudson commented on HBASE-6070:
-------------------------------

Integrated in HBase-0.94-security #32 (See [https://builds.apache.org/job/HBase-0.94-security/32/])
HBASE-6070 AM.nodeDeleted and SSH races creating problems for regions under SPLIT (Ramkrishna) (Revision 1342725)

Result = SUCCESS
ramkrishna :
Files :
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java
[jira] [Commented] (HBASE-6070) AM.nodeDeleted and SSH races creating problems for regions under SPLIT
[ https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13283599#comment-13283599 ]

ramkrishna.s.vasudevan commented on HBASE-6070:
-----------------------------------------------

Committed to trunk, 0.94 and 0.92.
Thanks for the review, Ted.
[jira] [Commented] (HBASE-6070) AM.nodeDeleted and SSH races creating problems for regions under SPLIT
[ https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13283652#comment-13283652 ]

Hudson commented on HBASE-6070:
-------------------------------

Integrated in HBase-0.94 #217 (See [https://builds.apache.org/job/HBase-0.94/217/])
HBASE-6070 AM.nodeDeleted and SSH races creating problems for regions under SPLIT (Ramkrishna) (Revision 1342725)

Result = FAILURE
ramkrishna :
Files :
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java
[jira] [Commented] (HBASE-6070) AM.nodeDeleted and SSH races creating problems for regions under SPLIT
[ https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13283658#comment-13283658 ]

Hudson commented on HBASE-6070:
-------------------------------

Integrated in HBase-TRUNK #2922 (See [https://builds.apache.org/job/HBase-TRUNK/2922/])
HBASE-6070 AM.nodeDeleted and SSH races creating problems for regions under SPLIT (Ramkrishna) (Revision 1342724)

Result = FAILURE
ramkrishna :
Files :
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/Mocking.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java
[jira] [Commented] (HBASE-6070) AM.nodeDeleted and SSH races creating problems for regions under SPLIT
[ https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13283752#comment-13283752 ]

Hudson commented on HBASE-6070:
-------------------------------

Integrated in HBase-0.92 #421 (See [https://builds.apache.org/job/HBase-0.92/421/])
HBASE-6070 AM.nodeDeleted and SSH races creating problems for regions under SPLIT (Ramkrishna) (Revision 1342727)

Result = FAILURE
ramkrishna :
Files :
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
[jira] [Commented] (HBASE-6070) AM.nodeDeleted and SSH races creating problems for regions under SPLIT
[ https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13283830#comment-13283830 ]

Hudson commented on HBASE-6070:
-------------------------------

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #16 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/16/])
HBASE-6070 AM.nodeDeleted and SSH races creating problems for regions under SPLIT (Ramkrishna) (Revision 1342724)

Result = FAILURE
ramkrishna :
Files :
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/Mocking.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java
[jira] [Commented] (HBASE-6070) AM.nodeDeleted and SSH races creating problems for regions under SPLIT
[ https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13282516#comment-13282516 ]

Zhihong Yu commented on HBASE-6070:
-----------------------------------

{code}
+// but the RS had went down before completing the split process then will not try to
{code}
'had went down' should be 'had gone down'.

{code}
+ if(response == null) return null;
{code}
Add a space after 'if'.

{code}
+ static Result getMetaTableRowResultAsSplittedRegion(final HRegionInfo hri, final ServerName sn)
{code}
The method should be called getMetaTableRowResultAsSplitRegion().

The test failure in TestFromClientSide should be investigated.
[jira] [Commented] (HBASE-6070) AM.nodeDeleted and SSH races creating problems for regions under SPLIT
[ https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13282818#comment-13282818 ]

Zhihong Yu commented on HBASE-6070:
-----------------------------------

+1 on patch v2.

You may want to verify that the failed test below wasn't related to this change:
https://builds.apache.org/job/PreCommit-HBASE-Build/1987/console
[jira] [Commented] (HBASE-6070) AM.nodeDeleted and SSH races creating problems for regions under SPLIT
[ https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13283121#comment-13283121 ]

ramkrishna.s.vasudevan commented on HBASE-6070:
-----------------------------------------------

@Ted
TestServerCustomProtocol.testSingleMethod() passes with the patch. I saw that the same test has failed in some other precommit build as well.

-1 core tests. The patch failed these unit tests:
org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1993//testReport/
[jira] [Commented] (HBASE-6070) AM.nodeDeleted and SSH races creating problems for regions under SPLIT
[ https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13283123#comment-13283123 ]

Zhihong Yu commented on HBASE-6070:
-----------------------------------

All right.
[jira] [Commented] (HBASE-6070) AM.nodeDeleted and SSH races creating problems for regions under SPLIT
[ https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13281678#comment-13281678 ]

ramkrishna.s.vasudevan commented on HBASE-6070:
-----------------------------------------------

I plan to make the following change in AM.nodeDeleted. Since SSH already tries to handle regions in RIT that are in the splitting state, doing the same in AM.nodeDeleted leads to a race.

{code}
-if (rs.isSplitting() || rs.isSplit()) {
+if (rs.isSplit()) {
   LOG.debug("Ephemeral node deleted, regionserver crashed?, " +
     "clearing from RIT; rs=" + rs);
   regionOffline(rs.getRegion());
{code}

Please provide your suggestions.
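To make the effect of the proposed condition concrete, here is a minimal standalone sketch. It is not the AssignmentManager code: the `State` enum and both predicate methods are hypothetical stand-ins for `rs.isSplitting()` and `rs.isSplit()`, introduced only to compare which states nodeDeleted would clear from RIT before and after the change.

```java
/** Standalone comparison of the old and proposed nodeDeleted conditions; hypothetical names. */
public class NodeDeletedConditionSketch {
    /** Hypothetical stand-in for the states a region in transition can be in. */
    enum State { SPLITTING, SPLIT, OPENING }

    /** Condition before the change: both SPLITTING and SPLIT regions are cleared from RIT. */
    static boolean clearsRitBefore(State s) {
        return s == State.SPLITTING || s == State.SPLIT;
    }

    /** Condition after the change: only SPLIT is cleared; SPLITTING is left for SSH to handle. */
    static boolean clearsRitAfter(State s) {
        return s == State.SPLIT;
    }

    public static void main(String[] args) {
        for (State s : State.values()) {
            System.out.println(s + ": before=" + clearsRitBefore(s)
                + ", after=" + clearsRitAfter(s));
        }
    }
}
```

The only state whose handling differs is SPLITTING: with the narrower condition it stays in RIT when the ephemeral node disappears, so SSH can still find and reassign it.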