[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction
[ https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16071919#comment-16071919 ]

Hadoop QA commented on HBASE-12457:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} patch {color} | {color:blue} 0m 3s{color} | {color:blue} The patch file was not named according to hbase's naming conventions. Please see https://yetus.apache.org/documentation/0.4.0/precommit-patchnames for instructions. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 7s{color} | {color:red} HBASE-12457 does not apply to master. Rebase required? Wrong Branch? See https://yetus.apache.org/documentation/0.4.0/precommit-patchnames for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HBASE-12457 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12681485/12457-trunk-v3.txt |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/7468/console |
| Powered by | Apache Yetus 0.4.0 http://yetus.apache.org |

This message was automatically generated.

> Regions in transition for a long time when CLOSE interleaves with a slow compaction
> ---
>
> Key: HBASE-12457
> URL: https://issues.apache.org/jira/browse/HBASE-12457
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.98.7
> Reporter: Lars Hofhansl
> Assignee: Lars Hofhansl
> Attachments: 12457-combined-0.98.txt, 12457-combined-0.98-v2.txt,
> 12457-combined-trunk.txt, 12457.interrupt.txt, 12457.interrupt-v2.txt,
> 12457-minifix.txt, 12457-trunk-v3.txt, HBASE-12457_addendum.patch,
> HBASE-12457.patch, TestRegionReplicas-jstack.txt
>
> Under heavy load we have observed regions remaining in transition for 20
> minutes when the master requests a close while a slow compaction is running.
> The pattern is always something like this:
> # RS starts a compaction
> # HM requests the region to be closed on this RS
> # Compaction is not aborted for another 20 minutes
> # The region is in transition and not usable.
> In every case I tracked down so far the time between the requested CLOSE and
> abort of the compaction is almost exactly 20 minutes, which is suspicious.
> Of course part of the issue is having compactions that take over 20 minutes,
> but maybe we can do better here.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction
[ https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16071917#comment-16071917 ]

stack commented on HBASE-12457:
---

Unscheduling old issue from 2.0.0.
[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction
[ https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14232329#comment-14232329 ]

Lars Hofhansl commented on HBASE-12457:
---

For this to work we actually need the DFS write side to be interruptible, which it currently is not.

> Regions in transition for a long time when CLOSE interleaves with a slow compaction
> ---
>
> Key: HBASE-12457
> URL: https://issues.apache.org/jira/browse/HBASE-12457
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.98.7
> Reporter: Lars Hofhansl
> Assignee: Lars Hofhansl
> Fix For: 1.0.0, 2.0.0, 0.98.9
> Attachments: 12457-combined-0.98-v2.txt, 12457-combined-0.98.txt,
> 12457-combined-trunk.txt, 12457-minifix.txt, 12457-trunk-v3.txt,
> 12457.interrupt-v2.txt, 12457.interrupt.txt, HBASE-12457.patch,
> HBASE-12457_addendum.patch, TestRegionReplicas-jstack.txt
>
> Under heavy load we have observed regions remaining in transition for 20
> minutes when the master requests a close while a slow compaction is running.
> The pattern is always something like this:
> # RS starts a compaction
> # HM requests the region to be closed on this RS
> # Compaction is not aborted for another 20 minutes
> # The region is in transition and not usable.
> In every case I tracked down so far the time between the requested CLOSE and
> abort of the compaction is almost exactly 20 minutes, which is suspicious.
> Of course part of the issue is having compactions that take over 20 minutes,
> but maybe we can do better here.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
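As a rough illustration of what "interruptible" could mean on the write side, here is a minimal Java sketch. It is not HDFS or HBase code and does not describe how the DFS client behaves. The class `InterruptibleCopySketch` and its methods `copy` and `tryCopy` are hypothetical names, and the sketch assumes a writer that polls the thread's interrupt flag before each chunk and gives up with `java.io.InterruptedIOException`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InterruptedIOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical illustration only; not part of HBase or HDFS.
final class InterruptibleCopySketch {

  /**
   * Copies {@code in} to {@code out} in chunks, checking the interrupt flag
   * before each chunk so that Thread.interrupt() stops the write promptly.
   */
  static long copy(InputStream in, OutputStream out, int chunkSize) throws IOException {
    byte[] buf = new byte[chunkSize];
    long total = 0;
    while (true) {
      if (Thread.currentThread().isInterrupted()) {
        throw new InterruptedIOException("write interrupted after " + total + " bytes");
      }
      int n = in.read(buf);
      if (n == -1) {
        return total; // end of input: everything was written
      }
      out.write(buf, 0, n);
      total += n;
    }
  }

  /**
   * Copies {@code size} in-memory bytes on the current thread. Returns the
   * number of bytes copied, or -1 if the copy was aborted by an interrupt.
   */
  static long tryCopy(int size, boolean interruptFirst) {
    if (interruptFirst) {
      Thread.currentThread().interrupt(); // simulate a close request arriving
    }
    try {
      return copy(new ByteArrayInputStream(new byte[size]), new ByteArrayOutputStream(), 64);
    } catch (InterruptedIOException e) {
      return -1;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } finally {
      Thread.interrupted(); // clear the flag so the caller is not affected
    }
  }
}
```

With a loop of this shape, a thread that is interrupted stops within one chunk. A write path that never checks the flag or blocks in non-interruptible IO would keep going until the write completes.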
[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction
[ https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209470#comment-14209470 ]

Hadoop QA commented on HBASE-12457:
---

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12681269/HBASE-12457.patch
against trunk revision .
ATTACHMENT ID: 12681269

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 4 new or modified tests.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning messages.

{color:red}-1 checkstyle{color}. The applied patch generated 3787 checkstyle errors (more than the trunk's current 3786 errors).

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100

{color:green}+1 site{color}. The mvn site goal succeeds with this patch.

{color:red}-1 core tests{color}. The patch failed these unit tests:
org.apache.hadoop.hbase.regionserver.TestRegionReplicas

{color:red}-1 core zombie tests{color}. There are 1 zombie test(s):
at org.apache.hadoop.hbase.regionserver.TestRegionReplicas.testVerifySecondaryAbilityToReadWithOnFiles(TestRegionReplicas.java:421)
at org.apache.hadoop.hbase.ResourceCheckerJUnitListener.testFinished(ResourceCheckerJUnitListener.java:183)

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/11659//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11659//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11659//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11659//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11659//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11659//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11659//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11659//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11659//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11659//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11659//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11659//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/11659//artifact/patchprocess/checkstyle-aggregate.html
Javadoc warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11659//artifact/patchprocess/patchJavadocWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/11659//console

This message is automatically generated.

> Regions in transition for a long time when CLOSE interleaves with a slow compaction
> ---
>
> Key: HBASE-12457
> URL: https://issues.apache.org/jira/browse/HBASE-12457
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.98.7
> Reporter: Lars Hofhansl
> Assignee: Lars Hofhansl
> Fix For: 2.0.0, 0.98.8, 0.99.2
> Attachments: 12457-combined-0.98-v2.txt, 12457-combined-0.98.txt,
> 12457-combined-trunk.txt, 12457-minifix.txt, 12457.interrupt-v2.txt,
> 12457.interrupt.txt, HBASE-12457.patch
>
> Under heavy load we have observed regions remaining in transition for 20
> minutes when the master requests a close while a slow compaction is running.
> The pattern is always something like this:
> # RS starts a
[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction
[ https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209485#comment-14209485 ]

Hudson commented on HBASE-12457:

FAILURE: Integrated in HBase-TRUNK #5772 (See [https://builds.apache.org/job/HBase-TRUNK/5772/])
HBASE-12457 Regions in transition for a long time when CLOSE interleaves with a slow compaction. (larsh: rev 231d3ee2adbfc32dfe4f7d7cd7a96ac33968520e)
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactionIO.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/DefaultCompactor.java
[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction
[ https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209520#comment-14209520 ]

Hudson commented on HBASE-12457:

SUCCESS: Integrated in HBase-0.98 #674 (See [https://builds.apache.org/job/HBase-0.98/674/])
HBASE-12457 Regions in transition for a long time when CLOSE interleaves with a slow compaction. (larsh: rev 56af34831fc854c177697aefaf80d535996f87e8)
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/DefaultCompactor.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactionIO.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java
[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction
[ https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209531#comment-14209531 ]

Hudson commented on HBASE-12457:

FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #642 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/642/])
HBASE-12457 Regions in transition for a long time when CLOSE interleaves with a slow compaction. (larsh: rev 56af34831fc854c177697aefaf80d535996f87e8)
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactionIO.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/DefaultCompactor.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java
[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction
[ https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209660#comment-14209660 ]

Dima Spivak commented on HBASE-12457:
-

[~lhofhansl], this commit looks to be [breaking test-compile on branch-1|https://builds.apache.org/job/HBase-1.0/462/console] and is [causing 5 tests from TestRegionReplicas to fail on master|https://builds.apache.org/job/HBase-TRUNK/5772/testReport/] :(. FWIW, I reran on my local build machines and got the same errors.
[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction
[ https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209704#comment-14209704 ]

ramkrishna.s.vasudevan commented on HBASE-12457:

[~larsh]
{code}
writestate.wait(millis);
if (millis > 0 && EnvironmentEdgeManager.currentTime() - start >= millis) {
  // if we waited once for compactions to finish, interrupt them, and try again
  if (LOG.isDebugEnabled()) {
    LOG.debug("Waited for " + millis
      + " ms for compactions to finish on close. Interrupting "
      + currentCompactions.size() + " compactions.");
  }
  for (Thread t : currentCompactions.keySet()) {
    // interrupt any current IO in the currently running compactions.
    t.interrupt();
  }
  millis = 0;
}
{code}
In this code we interrupt all the threads and set millis = 0. The code then goes back to the outer loop and waits again in writestate.wait(0), expecting a notify to happen. But what if, by this time, all the threads have already been interrupted and notifyAll has already been called?
{code}
finally {
  if (wasStateSet) {
    synchronized (writestate) {
      --writestate.compacting;
      if (writestate.compacting <= 0) {
        writestate.notifyAll();
      }
    }
  }
}
{code}
Would we end up waiting forever? I may be wrong here; please correct me.
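The question in this comment, whether wait(0) can miss a notifyAll that was already sent, can be explored with a small standalone Java sketch. It is not the HBase implementation. The class `CloseWaitSketch`, its `started` latch and the `demo` method are hypothetical, and `Thread.sleep` stands in for compaction IO. The sketch assumes that the waiter re-checks `compacting > 0` while holding the `writestate` monitor before every wait, and that the compacting thread decrements and notifies under the same monitor.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

// Hypothetical illustration only: names echo the snippets above, but this is not HBase code.
final class CloseWaitSketch {
  private final Object writestate = new Object();
  private int compacting = 0; // guarded by writestate
  private final Set<Thread> currentCompactions = ConcurrentHashMap.newKeySet();
  private final CountDownLatch started = new CountDownLatch(1);

  /** Simulated compaction: sleeps in place of IO and always notifies under the lock when done. */
  void compact(long workMillis) {
    synchronized (writestate) {
      compacting++;
      currentCompactions.add(Thread.currentThread());
    }
    started.countDown();
    try {
      Thread.sleep(workMillis); // stands in for interruptible compaction IO
    } catch (InterruptedException e) {
      // interrupted by the closer: abandon the work and fall through to cleanup
    } finally {
      synchronized (writestate) {
        currentCompactions.remove(Thread.currentThread());
        if (--compacting <= 0) {
          writestate.notifyAll();
        }
      }
    }
  }

  /** Waits up to graceMillis, then interrupts compactions; returns true if it had to interrupt. */
  boolean waitForCompactions(long graceMillis) throws InterruptedException {
    boolean interrupted = false;
    long millis = graceMillis;
    synchronized (writestate) {
      while (compacting > 0) { // re-checked under the lock before every wait
        long start = System.nanoTime();
        writestate.wait(millis); // wait(0) means no timeout
        long waitedMs = (System.nanoTime() - start) / 1_000_000L;
        if (millis > 0 && waitedMs >= millis) {
          for (Thread t : currentCompactions) {
            t.interrupt();
          }
          interrupted = true;
          millis = 0;
        }
      }
    }
    return interrupted;
  }

  /** Runs one simulated compaction and one close; returns whether the close had to interrupt. */
  static boolean demo(long workMillis, long graceMillis) {
    CloseWaitSketch region = new CloseWaitSketch();
    Thread worker = new Thread(() -> region.compact(workMillis), "compaction-sketch");
    worker.setDaemon(true);
    worker.start();
    try {
      region.started.await(); // make sure the compaction is registered first
      boolean interrupted = region.waitForCompactions(graceMillis);
      worker.join();
      return interrupted;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }
}
```

Under these assumptions the interrupted thread cannot run its notifyAll until the waiter releases the monitor inside wait, and if the count has already dropped to zero the loop condition fails before wait(0) is reached. So in the sketch the second wait does not block forever. For example, `CloseWaitSketch.demo(60_000, 50)` interrupts the long "compaction" after roughly 50 ms, while `CloseWaitSketch.demo(10, 5_000)` returns without interrupting.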
[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction
[ https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209924#comment-14209924 ]

Andrew Purtell commented on HBASE-12457:

+1 on the addendum for fixing test annotation import paths
[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction
[ https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209952#comment-14209952 ]

Andrew Purtell commented on HBASE-12457:

I pushed the addendum to branch-1 and master.
[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction
[ https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209985#comment-14209985 ]

Andrew Purtell commented on HBASE-12457:

I can see a TestRegionReplicas hang. We are getting hung up on waiting for a HTable thread pool to terminate:
{noformat}
Thread-2297 prio=10 tid=0x7feee0d1c800 nid=0x6173 waiting on condition [0x7fee508c6000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for 0x00078e04d4c8 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
    at java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1468)
    at org.apache.hadoop.hbase.client.HTable.close(HTable.java:1490)
    at org.apache.hadoop.hbase.regionserver.TestRegionReplicas.afterClass(TestRegionReplicas.java:107)
    at org.apache.hadoop.hbase.regionserver.TestRegionReplicas.restartRegionServer(TestRegionReplicas.java:220)
    at org.apache.hadoop.hbase.regionserver.TestRegionReplicas.testVerifySecondaryAbilityToReadWithOnFiles(TestRegionReplicas.java:421)
{noformat}
A worker thread in the HTable thread pool is hung up trying to get table state:
{noformat}
htable-pool53-t2 daemon prio=10 tid=0x7feea454c000 nid=0x566e waiting on condition [0x7feec0365000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$StubMaker.makeStub(ConnectionManager.java:1487)
    - locked 0x00078cc03140 (a java.lang.Object)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$MasterServiceStubMaker.makeStub(ConnectionManager.java:1522)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getKeepAliveMasterService(ConnectionManager.java:1727)
    - locked 0x00078cc03140 (a java.lang.Object)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getTableState(ConnectionManager.java:2504)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.isTableDisabled(ConnectionManager.java:894)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.relocateRegion(ConnectionManager.java:1064)
    at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:289)
    at org.apache.hadoop.hbase.client.ScannerCallable.prepare(ScannerCallable.java:135)
    at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:124)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:294)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:275)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
{noformat}
Not sure how this relates to any compaction changes. At first glance it doesn't seem to.
[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction
[ https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14210020#comment-14210020 ]

Hudson commented on HBASE-12457:

FAILURE: Integrated in HBase-TRUNK #5773 (See [https://builds.apache.org/job/HBase-TRUNK/5773/])
Amend HBASE-12457 Regions in transition for a long time when CLOSE interleaves with a slow compaction; Test import fix (apurtell: rev f6d8cde1e4f67390a936e7bc9f8c70b65a808450)
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactionIO.java
[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction
[ https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14210037#comment-14210037 ]

Andrew Purtell commented on HBASE-12457:

Well for whatever reason this change does trigger the above condition, due to some kind of timing change, because if I go back two commits, before this patch and the addendum, the test makes progress and completes.
[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction
[ https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14210036#comment-14210036 ] Lars Hofhansl commented on HBASE-12457: --- Sorry about the build break on branch-1. I cherry-picked the patch. Usually I do a compile and run the relevant tests, but I spaced it this time. The hang will not happen since we only notify *after* we set writestate.compacting (or writestate.flushing) back to false, so there is no race. I looked at that part :) In the face of the test failures I am going to roll this back anyway, though.
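The ordering argument in the comment above (clear the flag first, notify second, both under the same monitor) can be illustrated with a minimal Java sketch. This is not the HBase code: `WriteStateSketch` and its method names are hypothetical stand-ins for the region's write state, and the timeout handling is only one plausible way to bound the wait.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a region's write state; not the HBase class. */
final class WriteStateSketch {
  private boolean compacting = false; // guarded by "this"

  /** Called by the compaction thread before it starts work. */
  synchronized void startCompaction() {
    compacting = true;
  }

  /** Clears the flag first, then notifies, while holding the monitor. */
  synchronized void finishCompaction() {
    compacting = false;
    notifyAll();
  }

  /**
   * Called by the closing thread. Returns true if the compaction finished
   * within timeoutMs, false if the deadline passed first.
   */
  synchronized boolean awaitNoCompaction(long timeoutMs) throws InterruptedException {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    // Re-check the flag in a loop: a waiter woken by notifyAll() (or spuriously)
    // only proceeds once it observes compacting == false.
    while (compacting) {
      long remainingNs = deadline - System.nanoTime();
      if (remainingNs <= 0) {
        return false;
      }
      // wait(0) would block forever, so round the remaining time up to 1 ms.
      wait(Math.max(1, TimeUnit.NANOSECONDS.toMillis(remainingNs)));
    }
    return true;
  }
}
```

Because the waiter re-checks `compacting` under the same lock that the notifier holds while clearing it, a notification cannot arrive while the flag is still true and then be missed.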
[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction
[ https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14210044#comment-14210044 ] Lars Hofhansl commented on HBASE-12457: --- reverted from all branches... sorry about the noise Regions in transition for a long time when CLOSE interleaves with a slow compaction --- Key: HBASE-12457 URL: https://issues.apache.org/jira/browse/HBASE-12457 Project: HBase Issue Type: Bug Affects Versions: 0.98.7 Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 2.0.0, 0.98.9, 0.99.2 Attachments: 12457-combined-0.98-v2.txt, 12457-combined-0.98.txt, 12457-combined-trunk.txt, 12457-minifix.txt, 12457.interrupt-v2.txt, 12457.interrupt.txt, HBASE-12457.patch, HBASE-12457_addendum.patch, TestRegionReplicas-jstack.txt
[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction
[ https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14210046#comment-14210046 ] Lars Hofhansl commented on HBASE-12457: --- [~apurtell], you mean the test condition, right? Or did you see it hanging specifically on that writestate.wait(...)?
[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction
[ https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14210054#comment-14210054 ] Andrew Purtell commented on HBASE-12457: I meant the minicluster shutdown sequencing issue. Thanks for trying to get this in for .8 Lars.
[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction
[ https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14210070#comment-14210070 ] Hudson commented on HBASE-12457: FAILURE: Integrated in HBase-1.0 #463 (See [https://builds.apache.org/job/HBase-1.0/463/]) Amend HBASE-12457 Regions in transition for a long time when CLOSE interleaves with a slow compaction; Test import fix (apurtell: rev 9d2ad55cfa6108718d785b5e71ab10e9fb75a988) * hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactionIO.java
[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction
[ https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14210073#comment-14210073 ] stack commented on HBASE-12457: --- Thanks for backing out breaking change promptly. Feel free to retry given you are watching the build results.
[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction
[ https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14210159#comment-14210159 ] Hudson commented on HBASE-12457: SUCCESS: Integrated in HBase-1.0 #464 (See [https://builds.apache.org/job/HBase-1.0/464/]) Revert Amend HBASE-12457 Regions in transition for a long time when CLOSE interleaves with a slow compaction; Test import fix (larsh: rev 880c7c35fc50f28ec3e072a4c62a348fc964e9e0) * hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactionIO.java Revert HBASE-12457 Regions in transition for a long time when CLOSE interleaves with a slow compaction. (larsh: rev 1861f9ce25bc8609629928a670fdf3566486ca25) * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java * hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactionIO.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/DefaultCompactor.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java
[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction
[ https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14210160#comment-14210160 ] Hudson commented on HBASE-12457: FAILURE: Integrated in HBase-0.98 #675 (See [https://builds.apache.org/job/HBase-0.98/675/]) Revert HBASE-12457 Regions in transition for a long time when CLOSE interleaves with a slow compaction. (larsh: rev 7f5f1570ce83c62ce9408701677994415b127b36) * hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactionIO.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/DefaultCompactor.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction
[ https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14210184#comment-14210184 ] Hudson commented on HBASE-12457: SUCCESS: Integrated in HBase-TRUNK #5774 (See [https://builds.apache.org/job/HBase-TRUNK/5774/]) Revert Amend HBASE-12457 Regions in transition for a long time when CLOSE interleaves with a slow compaction; Test import fix (larsh: rev 9d634772fa12e16b86b0218802b2e38cacdfd528) * hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactionIO.java Revert HBASE-12457 Regions in transition for a long time when CLOSE interleaves with a slow compaction. (larsh: rev c29318c038f0f310562dc8194506b504eae72c1b) * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/DefaultCompactor.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java * hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactionIO.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction
[ https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14211269#comment-14211269 ] Hudson commented on HBASE-12457: SUCCESS: Integrated in HBase-0.98-on-Hadoop-1.1 #643 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/643/]) Revert HBASE-12457 Regions in transition for a long time when CLOSE interleaves with a slow compaction. (larsh: rev 7f5f1570ce83c62ce9408701677994415b127b36) * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/DefaultCompactor.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java * hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactionIO.java
[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction
[ https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14211809#comment-14211809 ] Lars Hofhansl commented on HBASE-12457: --- OK... What caused TestRegionReplicas to hang was the change that moved {{this.parent.writestate.writesEnabled = true;}} from SplitTransaction to HRegion.initializeRegionInternals. That part is not needed anyway; it just looked like it would be more correct. Here's a patch for trunk that passes TestRegionReplicas.
[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction
[ https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14211906#comment-14211906 ] Hadoop QA commented on HBASE-12457: ---
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12681485/12457-trunk-v3.txt against trunk revision .
ATTACHMENT ID: 12681485
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 2 new or modified tests.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning messages.
{color:red}-1 checkstyle{color}. The applied patch generated 3787 checkstyle errors (more than the trunk's current 3786 errors).
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100
{color:green}+1 site{color}. The mvn site goal succeeds with this patch.
{color:red}-1 core tests{color}. The patch failed these unit tests:
{color:red}-1 core zombie tests{color}. There are 1 zombie test(s): at org.apache.hadoop.hbase.coprocessor.TestMasterObserver.testRegionTransitionOperations(TestMasterObserver.java:1488)
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/11673//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11673//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11673//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11673//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11673//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11673//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11673//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11673//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11673//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11673//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11673//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11673//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/11673//artifact/patchprocess/checkstyle-aggregate.html
Javadoc warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11673//artifact/patchprocess/patchJavadocWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/11673//console
This message is automatically generated.
[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction
[ https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14208293#comment-14208293 ] Andrew Purtell commented on HBASE-12457:
I looked at the 0.98 combined patch. Changes lgtm, except:
- In HRegion#waitForFlushesAndCompactions, add debug level logging when issuing thread interrupts so we can determine if we are trying to interrupt but nothing subsequently happens.
- Use EnvironmentEdgeManager#currentTime (EEM#currentTimeMillis on 0.98) instead of System.currentTimeMillis
Nice test.
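One common reason to route time reads through an injectable source rather than System.currentTimeMillis is that a test can then advance time by hand instead of sleeping. The sketch below shows that idea with a plain `LongSupplier`; it is an illustration only and does not reproduce EnvironmentEdgeManager's API. `DeadlineSketch` and its methods are hypothetical names.

```java
import java.util.function.LongSupplier;

/** Hypothetical deadline helper that reads time from an injected clock. */
final class DeadlineSketch {
  private final LongSupplier clockMs; // production code could pass System::currentTimeMillis
  private final long deadlineMs;

  DeadlineSketch(LongSupplier clockMs, long timeoutMs) {
    this.clockMs = clockMs;
    this.deadlineMs = clockMs.getAsLong() + timeoutMs;
  }

  /** True once the injected clock has reached the deadline. */
  boolean expired() {
    return clockMs.getAsLong() >= deadlineMs;
  }

  /** Milliseconds left before the deadline, never negative. */
  long remainingMs() {
    return Math.max(0, deadlineMs - clockMs.getAsLong());
  }
}
```

A test can pass a clock backed by a mutable value, for example `long[] now = {0}; new DeadlineSketch(() -> now[0], 30_000)`, and then move `now[0]` forward to exercise a 30s timeout without waiting.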
[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction
[ https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14208502#comment-14208502 ] Andrew Purtell commented on HBASE-12457: 30s is ok. We're trying to limit the time clients will see NSRE for a parent region going offline in a split transaction so we shouldn't be too conservative with waiting here. Under what circumstances would we not want to clean up files in tmp from a failed or aborted compaction? They're broken or redundant or both.
[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction
[ https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14208494#comment-14208494 ] Lars Hofhansl commented on HBASE-12457: --- bq. add debug level logging when issuing thread interrupts Was *just* thinking that :) Will. The new Test also needs a license header. You think 30s is good? Internally we found all compactions that do not have this issue are aborted within 8s. Could make it a minute - although it's not really hurting anything. The only thing it doesn't do (when interrupting the compaction) is clean up the files in tmp. Maybe it should do that...? (It might be a bit hard to distinguish this from other exceptions for which we presumably do not want to clean up the tmp file... Or do we?)
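The behaviour under discussion here (give the compaction a grace period, then interrupt it, and possibly remove its tmp output on abort) could look roughly like the following Java sketch. It is a toy model under stated assumptions, not the patch: the class, the method names, the chunked "compaction" loop and the choice to delete the tmp file on every failure are all hypothetical.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical model of "wait, then interrupt" with tmp-file cleanup on abort. */
final class InterruptibleCompactionSketch {

  /** Simulated compaction: appends chunks to tmpFile, pausing between chunks. */
  static void compact(Path tmpFile, int chunks, long pauseMs) throws IOException {
    boolean finished = false;
    try {
      for (int i = 0; i < chunks; i++) {
        Files.write(tmpFile, new byte[] {(byte) i},
            StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        try {
          Thread.sleep(pauseMs);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt(); // keep the interrupt visible to callers
          throw new InterruptedIOException("compaction aborted by close");
        }
      }
      finished = true;
    } finally {
      if (!finished) {
        Files.deleteIfExists(tmpFile); // assumption: partial output is never wanted
      }
    }
  }

  /** Waits up to graceMs for the worker, then interrupts it. Returns true if it interrupted. */
  static boolean closeWithInterrupt(Thread worker, long graceMs) throws InterruptedException {
    worker.join(Math.max(1, graceMs)); // join(0) would wait forever
    if (!worker.isAlive()) {
      return false; // finished on its own within the grace period
    }
    worker.interrupt();
    worker.join();
    return true;
  }

  /** Runs a slow compaction, closes after a short grace period, reports whether tmp is gone. */
  static boolean abortLeavesNoTmpFile() {
    try {
      Path tmp = Files.createTempFile("compaction-sketch", ".tmp");
      Thread worker = new Thread(() -> {
        try {
          compact(tmp, 10_000, 10);
        } catch (IOException expected) {
          // aborted; compact() has already removed the tmp file
        }
      });
      worker.start();
      boolean interrupted = closeWithInterrupt(worker, 50);
      return interrupted && !Files.exists(tmp);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }
}
```

In this model the cleanup runs for any failure, not only for interrupts, which sidesteps the question of telling the two apart; whether that is the right policy for real compactions is exactly what the comment leaves open.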
[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction
[ https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208542#comment-14208542 ]

Lars Hofhansl commented on HBASE-12457:
---

Seeing an assertion failure now in the test... Checking.
[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction
[ https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208574#comment-14208574 ]

Lars Hofhansl commented on HBASE-12457:
---

Turns out that's because of HBASE-12454 (which I think was committed due to a misunderstanding)
[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction
[ https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208637#comment-14208637 ]

Andrew Purtell commented on HBASE-12457:

I pushed reverts for HBASE-12454 so that should be good now.
[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction
[ https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208708#comment-14208708 ]

Lars Hofhansl commented on HBASE-12457:
---

We've seen this on two machines now. The wait on the other machine was also close to 20m (18m to be precise).

Last question: Is 30s wait time before we interrupt good enough? The compactions should cancel themselves (in our case we find that unless they hang in the described way, they cancel themselves after no more than 8s). Could maybe wait a minute too. Not sure.
[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction
[ https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208769#comment-14208769 ]

Andrew Purtell commented on HBASE-12457:

v2 patch lgtm
[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction
[ https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209142#comment-14209142 ]

Andrew Purtell commented on HBASE-12457:

Let's commit this if you feel comfortable with it [~lhofhansl] so the next 0.98.8 RC can get out the door. (Or we can try again later for .9)
[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction
[ https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209271#comment-14209271 ]

Lars Hofhansl commented on HBASE-12457:
---

Lemme make a quick trunk patch. I think this is quite safe. If compactions manage to cancel themselves within 30s, the behavior is functionally unchanged. And the change in SplitTransaction seems correct to me as well.
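The "functionally unchanged within 30s" property can be illustrated with a small stand-alone sketch. This is not the code from the patch: the class `GracefulInterruptSketch`, its method names, and the short grace period are all hypothetical, and a plain `Thread` stands in for a compaction. The worker first gets a grace period to finish by itself, and the interrupt is issued only if it is still alive afterwards.

```java
/** Hypothetical sketch; not HBase code. A plain Thread stands in for a compaction. */
final class GracefulInterruptSketch {

  /**
   * Waits up to graceMs (must be > 0) for the worker to finish by itself and
   * interrupts it only if it is still running. Returns true if an interrupt was needed.
   */
  static boolean awaitThenInterrupt(Thread worker, long graceMs) throws InterruptedException {
    worker.join(graceMs);
    if (!worker.isAlive()) {
      return false; // the worker cancelled itself in time: same behavior as without the interrupt
    }
    worker.interrupt(); // the worker is assumed to be stuck in blocking IO
    worker.join();
    return true;
  }

  /** Runs one scenario: a worker that returns at once, or one that hangs until interrupted. */
  static boolean demo(boolean stuck) {
    Thread worker = new Thread(() -> {
      if (!stuck) {
        return; // cooperative cancel: nothing left to do
      }
      try {
        Thread.sleep(60_000); // stand-in for a hung append
      } catch (InterruptedException e) {
        // interrupted by the closer: abort
      }
    }, "compaction-sketch");
    worker.start();
    try {
      return awaitThenInterrupt(worker, 500); // 500 ms plays the role of the 30s discussed above
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("interrupt needed (quick worker): " + demo(false));
    System.out.println("interrupt needed (stuck worker): " + demo(true));
  }
}
```

The grace period trades off how long a closing region stays unavailable against how often a healthy compaction is interrupted unnecessarily, which is the 30s versus one minute question raised in this thread.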
[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction
[ https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209282#comment-14209282 ]

Andrew Purtell commented on HBASE-12457:

Thanks Lars, appreciate it.
[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction
[ https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209338#comment-14209338 ]

Hadoop QA commented on HBASE-12457:
---

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12681251/12457-combined-trunk.txt
against trunk revision .
ATTACHMENT ID: 12681251

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 2 new or modified tests.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning messages.

{color:red}-1 checkstyle{color}. The applied patch generated 3787 checkstyle errors (more than the trunk's current 3786 errors).

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100

{color:green}+1 site{color}. The mvn site goal succeeds with this patch.

{color:red}-1 core tests{color}. The patch failed these unit tests:
org.apache.hadoop.hbase.io.TestHeapSize

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/11658//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11658//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11658//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11658//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11658//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11658//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11658//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11658//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11658//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11658//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11658//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11658//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/11658//artifact/patchprocess/checkstyle-aggregate.html
Javadoc warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11658//artifact/patchprocess/patchJavadocWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/11658//console

This message is automatically generated.
[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction
[ https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209348#comment-14209348 ]

Andrew Purtell commented on HBASE-12457:

New patch adjusting HRegion heap size estimate coming right up.
[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction
[ https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209354#comment-14209354 ]

Lars Hofhansl commented on HBASE-12457:
---

Cool. Thanks [~apurtell]. So this is good to go. Unless there are objections I'll commit this now.

[~stack], [~ram_krish], [~jxiang], if you guys have some time maybe put some extra sets of eyes on this (good even when done after commit).
[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction
[ https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209394#comment-14209394 ]

Lars Hofhansl commented on HBASE-12457:
---

Pushed to 0.98+.
[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction
[ https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209443#comment-14209443 ]

Hudson commented on HBASE-12457:

FAILURE: Integrated in HBase-1.0 #462 (See [https://builds.apache.org/job/HBase-1.0/462/])
HBASE-12457 Regions in transition for a long time when CLOSE interleaves with a slow compaction. (larsh: rev 0e795c1cf8621df2d33600f4b33a00344fe5de5a)
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/DefaultCompactor.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactionIO.java
[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction
[ https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14206880#comment-14206880 ]

stack commented on HBASE-12457:
---

We are aborting the compaction because we want to close. The compaction abort is not noticed for 20 minutes? We shouldn't close if there is an ongoing compaction (that has not yet aborted)?
[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction
[ https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14206941#comment-14206941 ]

Lars Hofhansl commented on HBASE-12457:
---

Right. The timing is hard though. It seems the master considers the region closed once it has sent the CLOSE.

One option I thought about is for HRegion.doClose() to interrupt any running compactions (i.e. interrupt the CompactSplitThread). Then, upon receiving an interrupted exception, the compactor would recheck writestate.writesEnabled rather than waiting for the next 10 MB chunk to finish writing.

The symptom here looks like the compactor just hanging in some IO (either scanner.next or writer.append; my bet is on the latter). An interrupt can break out of that and allow the compactor to recheck the condition.

Might be easiest to explain with a patch. :)
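The interrupt-then-recheck idea described above can be sketched in plain Java. This is an illustration only, not the patch: `InterruptRecheckSketch` and its members are hypothetical names, a volatile boolean plays the role of writestate.writesEnabled, and `Thread.sleep` stands in for an append that hangs in IO.

```java
import java.util.concurrent.CountDownLatch;

/** Hypothetical sketch; not HBase code. */
final class InterruptRecheckSketch {
  // Plays the role of writestate.writesEnabled from the comment above.
  private volatile boolean writesEnabled = true;
  private volatile boolean aborted = false;
  private final CountDownLatch started = new CountDownLatch(1);

  /** Simulated compaction loop whose "append" may block for a long time. */
  void compact() {
    started.countDown();
    while (writesEnabled) {
      try {
        Thread.sleep(60_000); // stand-in for a writer.append() stuck in slow IO
      } catch (InterruptedException e) {
        // Woken by the closer: the loop condition rechecks writesEnabled right away
        // instead of waiting for the current chunk to finish.
      }
    }
    aborted = true;
  }

  /** Simulated close path: disable writes first, then interrupt the worker. */
  static boolean closeAbortsCompaction() {
    InterruptRecheckSketch c = new InterruptRecheckSketch();
    Thread worker = new Thread(c::compact, "compaction-sketch");
    worker.start();
    try {
      c.started.await();
      c.writesEnabled = false; // must be visible before the interrupt lands
      worker.interrupt();
      worker.join(5_000);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
    return c.aborted && !worker.isAlive();
  }

  public static void main(String[] args) {
    System.out.println("compaction aborted promptly: " + closeAbortsCompaction());
  }
}
```

Setting the flag before interrupting matters in this sketch: a worker that wakes up and still sees the flag as true would simply go back to its blocking call.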
[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction
[ https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14206957#comment-14206957 ]

Lars Hofhansl commented on HBASE-12457:
---

Unfortunately there is no API to interrupt all threads in a thread pool without shutting down the pool.
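One possible way around that limitation, shown here purely as a sketch and not as what the patch does, is to have each task register the thread it runs on so that a closer can interrupt just those threads while the pool stays usable. `PoolInterruptSketch`, `tracked` and `interruptRunning` are hypothetical names; only standard `java.util.concurrent` calls are used.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch; not HBase code. */
final class PoolInterruptSketch {
  // Threads that are currently executing a tracked task.
  private final Set<Thread> running = ConcurrentHashMap.newKeySet();

  /** Wraps a task so that the executing thread is registered while the task runs. */
  Runnable tracked(Runnable task) {
    return () -> {
      Thread t = Thread.currentThread();
      running.add(t);
      try {
        task.run();
      } finally {
        running.remove(t);
        Thread.interrupted(); // clear a late interrupt so the pool thread is clean for reuse
      }
    };
  }

  /** Interrupts every thread that is inside a tracked task; the pool itself is left alone. */
  void interruptRunning() {
    for (Thread t : running) {
      t.interrupt();
    }
  }

  /** Returns true if the slow task was interrupted and the pool accepted more work afterwards. */
  static boolean demo() {
    ExecutorService pool = Executors.newFixedThreadPool(1);
    PoolInterruptSketch sketch = new PoolInterruptSketch();
    CountDownLatch started = new CountDownLatch(1);
    boolean[] wasInterrupted = new boolean[1];
    try {
      Future<?> slow = pool.submit(sketch.tracked(() -> {
        started.countDown();
        try {
          Thread.sleep(60_000); // stand-in for a hung compaction
        } catch (InterruptedException e) {
          wasInterrupted[0] = true;
        }
      }));
      started.await();
      sketch.interruptRunning();
      slow.get(5, TimeUnit.SECONDS);
      Future<String> next = pool.submit(() -> "ok"); // the pool was never shut down
      return wasInterrupted[0] && "ok".equals(next.get(5, TimeUnit.SECONDS));
    } catch (Exception e) {
      return false;
    } finally {
      pool.shutdownNow();
    }
  }

  public static void main(String[] args) {
    System.out.println("interrupted without pool shutdown: " + demo());
  }
}
```

The sketch leaves a small race open: a thread may finish its task just after the closer has read it from the set, so real code would need to tolerate a stray interrupt. Keeping the `Future` of each submitted task and calling `cancel(true)` on it is another standard option.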
[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction
[ https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207211#comment-14207211 ]

Andrew Purtell commented on HBASE-12457:

Patch looks ok to me, modulo those cleanups you mention, getting rid of the casting.
[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction
[ https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207214#comment-14207214 ]

Andrew Purtell commented on HBASE-12457:

I guess the question is .. does it work ? :-) We should also have a test in TestCompaction
[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction
[ https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207320#comment-14207320 ]

Lars Hofhansl commented on HBASE-12457:
---

Thinking about how to express this in a test without cut-and-pasting all of Compactor into a test class. I need to be able to block the writer during append.
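Blocking a writer during append can be sketched with a latch-based test double. This is a hypothetical illustration, not the test that was eventually written: `BlockingAppendSketch`, `CellSink` and `BlockingSink` are made-up names, and `CellSink` is only a stand-in for whatever writer interface the compactor appends to.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical test helper; none of these names come from HBase. */
final class BlockingAppendSketch {

  /** Minimal stand-in for the writer interface a compactor appends to. */
  interface CellSink {
    void append(byte[] cell) throws IOException;
  }

  /** A sink whose append() parks until it is released or the thread is interrupted. */
  static final class BlockingSink implements CellSink {
    final CountDownLatch entered = new CountDownLatch(1);
    final CountDownLatch release = new CountDownLatch(1);

    @Override
    public void append(byte[] cell) throws IOException {
      entered.countDown(); // tell the test that the writer is now "stuck"
      try {
        release.await();
      } catch (InterruptedException e) {
        // Surface the interrupt as an IOException so callers see an IO failure.
        throw new InterruptedIOException("append interrupted");
      }
    }
  }

  /** Returns true if interrupting the appending thread breaks it out of append(). */
  static boolean interruptBreaksAppend() {
    BlockingSink sink = new BlockingSink();
    boolean[] sawInterrupt = new boolean[1];
    Thread appender = new Thread(() -> {
      try {
        sink.append(new byte[] {1});
      } catch (InterruptedIOException e) {
        sawInterrupt[0] = true;
      } catch (IOException e) {
        // not expected in this sketch
      }
    }, "appender-sketch");
    appender.start();
    try {
      if (!sink.entered.await(5, TimeUnit.SECONDS)) {
        return false; // the appender never reached append()
      }
      appender.interrupt();
      appender.join(5_000);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
    return sawInterrupt[0] && !appender.isAlive();
  }

  public static void main(String[] args) {
    System.out.println("append interrupted: " + interruptBreaksAppend());
  }
}
```

The `entered` latch lets a test act only once the appender is really inside `append()`, which avoids sleeping for an arbitrary time and hoping the writer has blocked by then.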
[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction
[ https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207346#comment-14207346 ]

Lars Hofhansl commented on HBASE-12457:

Test coming soon.

Key: HBASE-12457
URL: https://issues.apache.org/jira/browse/HBASE-12457
Project: HBase
Issue Type: Bug
Affects Versions: 0.98.7
Reporter: Lars Hofhansl
Attachments: 12457-minifix.txt, 12457.interrupt.txt
[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction
[ https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207365#comment-14207365 ]

Andrew Purtell commented on HBASE-12457:

If you just want to test whether interrupting a compaction is possible, you might be able to use Mockito, something like:
{code}
// Get a reference to the DefaultCompactor instance as defaultCompactor somehow
DefaultCompactor compactor = spy(defaultCompactor);
// On a spy, stub with doAnswer() so that setting up the stub does not run the real compact().
// The answer() callback then executes whenever compact() is called on the spy.
doAnswer(new Answer<List<Path>>() {
  @Override
  public List<Path> answer(InvocationOnMock invocation) throws Throwable {
    Thread.sleep(6); // or whatever
    return ...; // need to return a List<Path>
  }
}).when(compactor).compact(anyObject());
// ...
// Trigger compaction
{code}
?

Key: HBASE-12457
URL: https://issues.apache.org/jira/browse/HBASE-12457
Project: HBase
Issue Type: Bug
Affects Versions: 0.98.7
Reporter: Lars Hofhansl
Attachments: 12457-minifix.txt, 12457.interrupt.txt
[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction
[ https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207693#comment-14207693 ]

Jerry He commented on HBASE-12457:

Is this somehow similar or related to HBASE-10492?
For that JIRA, I thought in the end it was probably the environment I had at one point. I was running on the IBM GPFS file system.

Key: HBASE-12457
URL: https://issues.apache.org/jira/browse/HBASE-12457
Project: HBase
Issue Type: Bug
Affects Versions: 0.98.7
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Fix For: 2.0.0, 0.98.8, 0.99.2
Attachments: 12457-combined-0.98.txt, 12457-minifix.txt, 12457.interrupt-v2.txt, 12457.interrupt.txt
[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction
[ https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207696#comment-14207696 ]

Lars Hofhansl commented on HBASE-12457:

That looks different to me. Here we cannot finish a compaction, and we fail to abort it when the master wants to close the region.

Key: HBASE-12457
URL: https://issues.apache.org/jira/browse/HBASE-12457
Project: HBase
Issue Type: Bug
Affects Versions: 0.98.7
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Fix For: 2.0.0, 0.98.8, 0.99.2
Attachments: 12457-combined-0.98.txt, 12457-minifix.txt, 12457.interrupt-v2.txt, 12457.interrupt.txt
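To make "abort the compaction when the master wants to close the region" concrete, here is a minimal sketch of a cooperative abort: the compaction loop checks a flag every few cells and gives up once a close has been requested. AbortableCompaction, requestClose and checkEvery are hypothetical names chosen for illustration, and this is not a description of the attached patches. Note that a check like this only fires while the loop keeps making progress.

```java
import java.io.InterruptedIOException;
import java.util.Iterator;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical model of a compaction loop that can be aborted by a region close.
final class AbortableCompaction {
  private final AtomicBoolean writesEnabled = new AtomicBoolean(true);

  // Called by the close path: from now on compactions should stop.
  void requestClose() {
    writesEnabled.set(false);
  }

  // Consumes all cells and returns how many were written,
  // or throws if a close was requested at one of the periodic checks.
  int compact(Iterator<String> cells, int checkEvery) throws InterruptedIOException {
    int written = 0;
    while (cells.hasNext()) {
      cells.next();                    // stands in for writing one cell to the new file
      written++;
      if (written % checkEvery == 0 && !writesEnabled.get()) {
        throw new InterruptedIOException(
            "Aborting compaction after " + written + " cells: close requested");
      }
    }
    return written;
  }
}
```

The smaller checkEvery is, the sooner a close is noticed, at the cost of more frequent flag reads.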
[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction
[ https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14206010#comment-14206010 ]

Lars Hofhansl commented on HBASE-12457:

Sometimes (but not always) we see splits interspersed with this. While I scoured the code I noticed the following:
* SplitTransaction writes CREATE_SPLIT_DIR after it has created the daughter dirs, and CLOSED_PARENT_REGION after the parent region is closed
* Upon rollback, writestate.writesEnabled is set back to true unconditionally at the CREATE_SPLIT_DIR stage. It seems that should only be done when we have journaled CLOSED_PARENT_REGION.

Key: HBASE-12457
URL: https://issues.apache.org/jira/browse/HBASE-12457
Project: HBase
Issue Type: Bug
Affects Versions: 0.98.7
Reporter: Lars Hofhansl
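The rollback behaviour suggested in the comment above can be sketched roughly as follows. This is a simplified model, not the SplitTransaction code: SplitJournalSketch and its methods are hypothetical, and the requestClose() call stands for the assumption that a concurrent CLOSE has already disabled writes before the split fails.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified, hypothetical model of a split journal and its rollback.
final class SplitJournalSketch {
  enum JournalEntry { CREATE_SPLIT_DIR, CLOSED_PARENT_REGION }

  private final Deque<JournalEntry> journal = new ArrayDeque<>();
  private boolean writesEnabled = true;

  // Assumption for illustration: a concurrent CLOSE disables writes on the region.
  void requestClose() {
    writesEnabled = false;
  }

  // Journaled after the daughter dirs have been created.
  void createSplitDir() {
    journal.push(JournalEntry.CREATE_SPLIT_DIR);
  }

  // Journaled after the parent region has been closed by the split.
  void closeParentRegion() {
    writesEnabled = false;
    journal.push(JournalEntry.CLOSED_PARENT_REGION);
  }

  // Undo journaled steps in reverse order.
  void rollback() {
    while (!journal.isEmpty()) {
      switch (journal.pop()) {
        case CLOSED_PARENT_REGION:
          writesEnabled = true;        // only the split's own close is undone here
          break;
        case CREATE_SPLIT_DIR:
          // clean up daughter dirs only; writesEnabled is deliberately left untouched
          break;
      }
    }
  }

  boolean writesEnabled() {
    return writesEnabled;
  }
}
```

With this shape, a split that fails right after CREATE_SPLIT_DIR leaves writes disabled if a close had already been requested, while a split that fails after CLOSED_PARENT_REGION re-enables them.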
[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction
[ https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14206059#comment-14206059 ]

Lars Hofhansl commented on HBASE-12457:

All that said, I have so far observed this on only a single region server, so it might be an environmental issue.

Key: HBASE-12457
URL: https://issues.apache.org/jira/browse/HBASE-12457
Project: HBase
Issue Type: Bug
Affects Versions: 0.98.7
Reporter: Lars Hofhansl
Attachments: 12457-minifix.txt