[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction

2017-07-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16071919#comment-16071919
 ] 

Hadoop QA commented on HBASE-12457:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} patch {color} | {color:blue} 0m 3s{color} | {color:blue} The patch file was not named according to hbase's naming conventions. Please see https://yetus.apache.org/documentation/0.4.0/precommit-patchnames for instructions. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 7s{color} | {color:red} HBASE-12457 does not apply to master. Rebase required? Wrong Branch? See https://yetus.apache.org/documentation/0.4.0/precommit-patchnames for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HBASE-12457 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12681485/12457-trunk-v3.txt |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/7468/console |
| Powered by | Apache Yetus 0.4.0   http://yetus.apache.org |


This message was automatically generated.



> Regions in transition for a long time when CLOSE interleaves with a slow 
> compaction
> ---
>
> Key: HBASE-12457
> URL: https://issues.apache.org/jira/browse/HBASE-12457
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.98.7
> Reporter: Lars Hofhansl
> Assignee: Lars Hofhansl
> Attachments: 12457-combined-0.98.txt, 12457-combined-0.98-v2.txt, 
> 12457-combined-trunk.txt, 12457.interrupt.txt, 12457.interrupt-v2.txt, 
> 12457-minifix.txt, 12457-trunk-v3.txt, HBASE-12457_addendum.patch, 
> HBASE-12457.patch, TestRegionReplicas-jstack.txt
>
>
> Under heavy load we have observed regions remaining in transition for 20 
> minutes when the master requests a close while a slow compaction is running.
> The pattern is always something like this:
> # RS starts a compaction
> # HM requests the region to be closed on this RS
> # Compaction is not aborted for another 20 minutes
> # The region is in transition and not usable.
> In every case I tracked down so far the time between the requested CLOSE and 
> abort of the compaction is almost exactly 20 minutes, which is suspicious.
> Of course part of the issue is having compactions that take over 20 minutes, 
> but maybe we can do better here.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction

2017-07-02 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16071917#comment-16071917
 ] 

stack commented on HBASE-12457:
---

Unscheduling old issue from 2.0.0.



[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction

2014-12-02 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14232329#comment-14232329
 ] 

Lars Hofhansl commented on HBASE-12457:
---

For this to work we actually need the DFS write side to be 
interruptible, which it currently is not.
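
To illustrate what an interruptible write side could mean, here is a minimal sketch, assuming a writer that checks the thread's interrupt flag between chunks. It is not how the DFS client is implemented; `InterruptibleWriteSketch`, `writeChunks` and `demo` are hypothetical names used only for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InterruptedIOException;
import java.io.OutputStream;

/** Hypothetical illustration only; not HBase or HDFS code. */
class InterruptibleWriteSketch {
  /**
   * Writes count copies of chunk, checking the interrupt flag before each one.
   * Returns the number of bytes written.
   */
  static int writeChunks(OutputStream out, byte[] chunk, int count) throws IOException {
    int written = 0;
    for (int i = 0; i < count; i++) {
      if (Thread.currentThread().isInterrupted()) {
        InterruptedIOException e = new InterruptedIOException("write interrupted");
        e.bytesTransferred = written;  // public field on InterruptedIOException
        throw e;
      }
      out.write(chunk, 0, chunk.length);
      written += chunk.length;
    }
    return written;
  }

  /** Simulates an interrupt arriving after 16 bytes; returns the bytes written before the abort. */
  static int demo() {
    OutputStream sink = new ByteArrayOutputStream() {
      @Override
      public synchronized void write(byte[] b, int off, int len) {
        super.write(b, off, len);
        if (size() >= 16) {
          // Stands in for another thread calling t.interrupt() on the writer.
          Thread.currentThread().interrupt();
        }
      }
    };
    try {
      return writeChunks(sink, new byte[8], 4);
    } catch (InterruptedIOException e) {
      return e.bytesTransferred;
    } catch (IOException e) {
      return -1;
    } finally {
      Thread.interrupted();  // clear the flag so the caller is unaffected
    }
  }

  public static void main(String[] args) {
    System.out.println("bytes written before abort: " + demo());
  }
}
```

The point of the sketch is only that an interrupt has an effect when the write path cooperates by checking for it; a writer blocked inside a non-interruptible call would not notice the flag until that call returns.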

 Regions in transition for a long time when CLOSE interleaves with a slow 
 compaction
 ---

 Key: HBASE-12457
 URL: https://issues.apache.org/jira/browse/HBASE-12457
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.7
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 1.0.0, 2.0.0, 0.98.9

 Attachments: 12457-combined-0.98-v2.txt, 12457-combined-0.98.txt, 
 12457-combined-trunk.txt, 12457-minifix.txt, 12457-trunk-v3.txt, 
 12457.interrupt-v2.txt, 12457.interrupt.txt, HBASE-12457.patch, 
 HBASE-12457_addendum.patch, TestRegionReplicas-jstack.txt




[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction

2014-11-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209470#comment-14209470
 ] 

Hadoop QA commented on HBASE-12457:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12681269/HBASE-12457.patch
  against trunk revision .
  ATTACHMENT ID: 12681269

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 1 
warning messages.

{color:red}-1 checkstyle{color}.  The applied patch generated 
3787 checkstyle errors (more than the trunk's current 3786 errors).

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.regionserver.TestRegionReplicas

 {color:red}-1 core zombie tests{color}.  There are 1 zombie test(s):
    at org.apache.hadoop.hbase.regionserver.TestRegionReplicas.testVerifySecondaryAbilityToReadWithOnFiles(TestRegionReplicas.java:421)
    at org.apache.hadoop.hbase.ResourceCheckerJUnitListener.testFinished(ResourceCheckerJUnitListener.java:183)

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/11659//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11659//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11659//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11659//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11659//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11659//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11659//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11659//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11659//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11659//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11659//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11659//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/11659//artifact/patchprocess/checkstyle-aggregate.html

Javadoc warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11659//artifact/patchprocess/patchJavadocWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/11659//console

This message is automatically generated.

 Regions in transition for a long time when CLOSE interleaves with a slow 
 compaction
 ---

 Key: HBASE-12457
 URL: https://issues.apache.org/jira/browse/HBASE-12457
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.7
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 2.0.0, 0.98.8, 0.99.2

 Attachments: 12457-combined-0.98-v2.txt, 12457-combined-0.98.txt, 
 12457-combined-trunk.txt, 12457-minifix.txt, 12457.interrupt-v2.txt, 
 12457.interrupt.txt, HBASE-12457.patch



[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction

2014-11-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209485#comment-14209485
 ] 

Hudson commented on HBASE-12457:


FAILURE: Integrated in HBase-TRUNK #5772 (See [https://builds.apache.org/job/HBase-TRUNK/5772/])
HBASE-12457 Regions in transition for a long time when CLOSE interleaves with a slow compaction. (larsh: rev 231d3ee2adbfc32dfe4f7d7cd7a96ac33968520e)
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactionIO.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/DefaultCompactor.java




[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction

2014-11-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209520#comment-14209520
 ] 

Hudson commented on HBASE-12457:


SUCCESS: Integrated in HBase-0.98 #674 (See [https://builds.apache.org/job/HBase-0.98/674/])
HBASE-12457 Regions in transition for a long time when CLOSE interleaves with a slow compaction. (larsh: rev 56af34831fc854c177697aefaf80d535996f87e8)
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/DefaultCompactor.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactionIO.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java




[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction

2014-11-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209531#comment-14209531
 ] 

Hudson commented on HBASE-12457:


FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #642 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/642/])
HBASE-12457 Regions in transition for a long time when CLOSE interleaves with a slow compaction. (larsh: rev 56af34831fc854c177697aefaf80d535996f87e8)
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactionIO.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/DefaultCompactor.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java




[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction

2014-11-13 Thread Dima Spivak (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209660#comment-14209660
 ] 

Dima Spivak commented on HBASE-12457:
-

[~lhofhansl], this commit looks to be [breaking test-compile on 
branch-1|https://builds.apache.org/job/HBase-1.0/462/console] and is [causing 5 
tests from TestRegionReplicas to fail on 
master|https://builds.apache.org/job/HBase-TRUNK/5772/testReport/] :(. FWIW, I 
reran on my local build machines and got the same errors.



[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction

2014-11-13 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209704#comment-14209704
 ] 

ramkrishna.s.vasudevan commented on HBASE-12457:


[~larsh]
{code}
writestate.wait(millis);
if (millis > 0 && EnvironmentEdgeManager.currentTime() - start >= millis) {
  // if we waited once for compactions to finish, interrupt them, and try again
  if (LOG.isDebugEnabled()) {
    LOG.debug("Waited for " + millis
      + " ms for compactions to finish on close. Interrupting "
      + currentCompactions.size() + " compactions.");
  }
  for (Thread t : currentCompactions.keySet()) {
    // interrupt any current IO in the currently running compactions.
    t.interrupt();
  }
  millis = 0;
}
{code}
This code interrupts all the compaction threads and sets millis = 0. Control 
then returns to the outer loop and calls writestate.wait(0), which blocks until 
a notify arrives. But what if, by that time, all the threads have already been 
interrupted and notifyAll has already been called?
{code}
finally {
  if (wasStateSet) {
    synchronized (writestate) {
      --writestate.compacting;
      if (writestate.compacting <= 0) {
        writestate.notifyAll();
      }
    }
  }
}
{code}
Would we then wait forever?
I may be wrong here; please correct me.
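
Whether the wait can hang depends on how the enclosing loop is written, which the excerpt above does not show. As a minimal sketch, assuming the loop re-checks the compaction counter while holding the writestate monitor, a notifyAll that fires before wait(0) is not lost: the waiter sees the counter at zero and never calls wait. `CloseWaitSketch` and its members are hypothetical stand-ins, not HBase code.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for the region's write state; not HBase code. */
class CloseWaitSketch {
  private final Object writestate = new Object();
  private int compacting = 0;  // guarded by writestate
  private final Set<Thread> currentCompactions = ConcurrentHashMap.newKeySet();

  /** Starts a fake compaction that only stops when it is interrupted. */
  Thread startCompaction() {
    synchronized (writestate) {
      compacting++;
    }
    Thread t = new Thread(() -> {
      try {
        Thread.sleep(60_000);  // pretend to do slow IO
      } catch (InterruptedException e) {
        // interrupted by the closing thread: fall through to cleanup
      } finally {
        synchronized (writestate) {
          if (--compacting <= 0) {
            writestate.notifyAll();
          }
        }
      }
    });
    t.setDaemon(true);
    currentCompactions.add(t);
    t.start();
    return t;
  }

  /** Waits up to millis, then interrupts compactions and waits without a timeout. */
  void waitForCompactions(long millis) throws InterruptedException {
    long start = System.currentTimeMillis();
    synchronized (writestate) {
      while (compacting > 0) {  // condition re-checked while holding the monitor
        writestate.wait(millis);
        if (millis > 0 && System.currentTimeMillis() - start >= millis) {
          for (Thread t : currentCompactions) {
            t.interrupt();
          }
          millis = 0;  // wait(0) blocks until notified
        }
      }
    }
  }

  /** Runs one fake compaction, closes after a 50 ms grace period, returns the remaining count. */
  static int demo() {
    CloseWaitSketch region = new CloseWaitSketch();
    Thread compaction = region.startCompaction();
    try {
      region.waitForCompactions(50);
      compaction.join(5_000);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    synchronized (region.writestate) {
      return region.compacting;
    }
  }

  public static void main(String[] args) {
    System.out.println("compactions still running: " + demo());
  }
}
```

In this sketch the compaction thread cannot decrement the counter and notify between the waiter's check and its call to wait, because both happen while the waiter holds the monitor. If the real loop checked the condition outside the lock, the concern raised above would apply.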



[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction

2014-11-13 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209924#comment-14209924
 ] 

Andrew Purtell commented on HBASE-12457:


+1 on the addendum for fixing test annotation import paths 

 Regions in transition for a long time when CLOSE interleaves with a slow 
 compaction
 ---

 Key: HBASE-12457
 URL: https://issues.apache.org/jira/browse/HBASE-12457
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.7
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 2.0.0, 0.98.8, 0.99.2

 Attachments: 12457-combined-0.98-v2.txt, 12457-combined-0.98.txt, 
 12457-combined-trunk.txt, 12457-minifix.txt, 12457.interrupt-v2.txt, 
 12457.interrupt.txt, HBASE-12457.patch, HBASE-12457_addendum.patch




[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction

2014-11-13 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209952#comment-14209952
 ] 

Andrew Purtell commented on HBASE-12457:


I pushed the addendum to branch-1 and master.



[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction

2014-11-13 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209985#comment-14209985
 ] 

Andrew Purtell commented on HBASE-12457:


I can see a TestRegionReplicas hang. We are getting hung up waiting for an 
HTable thread pool to terminate:
{noformat}
"Thread-2297" prio=10 tid=0x7feee0d1c800 nid=0x6173 waiting on condition [0x7fee508c6000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00078e04d4c8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
    at java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1468)
    at org.apache.hadoop.hbase.client.HTable.close(HTable.java:1490)
    at org.apache.hadoop.hbase.regionserver.TestRegionReplicas.afterClass(TestRegionReplicas.java:107)
    at org.apache.hadoop.hbase.regionserver.TestRegionReplicas.restartRegionServer(TestRegionReplicas.java:220)
    at org.apache.hadoop.hbase.regionserver.TestRegionReplicas.testVerifySecondaryAbilityToReadWithOnFiles(TestRegionReplicas.java:421)
{noformat}

A worker thread in the HTable thread pool is hung up trying to get table state:

{noformat}
"htable-pool53-t2" daemon prio=10 tid=0x7feea454c000 nid=0x566e waiting on condition [0x7feec0365000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$StubMaker.makeStub(ConnectionManager.java:1487)
    - locked <0x00078cc03140> (a java.lang.Object)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$MasterServiceStubMaker.makeStub(ConnectionManager.java:1522)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getKeepAliveMasterService(ConnectionManager.java:1727)
    - locked <0x00078cc03140> (a java.lang.Object)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getTableState(ConnectionManager.java:2504)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.isTableDisabled(ConnectionManager.java:894)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.relocateRegion(ConnectionManager.java:1064)
    at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:289)
    at org.apache.hadoop.hbase.client.ScannerCallable.prepare(ScannerCallable.java:135)
    at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:124)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:294)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:275)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
{noformat}

Not sure how this relates to any compaction changes. At first glance it doesn't 
seem to.
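
The two traces show a close blocked in awaitTermination while a pool worker sleeps inside a retry loop. As a general illustration only (this is not what HTable.close() does, and `PoolShutdownSketch`, `closePool` and `demo` are hypothetical names), a bounded shutdown that falls back to interrupting the workers can be sketched with the standard ExecutorService API:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical illustration only; not HBase client code. */
class PoolShutdownSketch {
  /**
   * Stops accepting work and waits for graceMillis. If workers are still busy,
   * interrupts them and waits once more. Returns true if the pool terminated.
   */
  static boolean closePool(ExecutorService pool, long graceMillis) throws InterruptedException {
    pool.shutdown();
    if (pool.awaitTermination(graceMillis, TimeUnit.MILLISECONDS)) {
      return true;
    }
    pool.shutdownNow();  // interrupts workers, e.g. one sleeping between retries
    return pool.awaitTermination(graceMillis, TimeUnit.MILLISECONDS);
  }

  /** Submits a task that sleeps like a retry backoff, then closes the pool. */
  static boolean demo() {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    pool.submit(() -> {
      try {
        Thread.sleep(60_000);  // stands in for a long sleep between retries
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();  // give up when interrupted
      }
    });
    try {
      return closePool(pool, 1_000);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println("pool terminated: " + demo());
  }
}
```

This only helps when the worker responds to interruption; a retry loop that swallows InterruptedException and keeps sleeping would still keep the pool alive.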



[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction

2014-11-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14210020#comment-14210020
 ] 

Hudson commented on HBASE-12457:


FAILURE: Integrated in HBase-TRUNK #5773 (See [https://builds.apache.org/job/HBase-TRUNK/5773/])
Amend HBASE-12457 Regions in transition for a long time when CLOSE interleaves with a slow compaction; Test import fix (apurtell: rev f6d8cde1e4f67390a936e7bc9f8c70b65a808450)
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactionIO.java


 Regions in transition for a long time when CLOSE interleaves with a slow 
 compaction
 ---

 Key: HBASE-12457
 URL: https://issues.apache.org/jira/browse/HBASE-12457
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.7
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 2.0.0, 0.98.8, 0.99.2

 Attachments: 12457-combined-0.98-v2.txt, 12457-combined-0.98.txt, 
 12457-combined-trunk.txt, 12457-minifix.txt, 12457.interrupt-v2.txt, 
 12457.interrupt.txt, HBASE-12457.patch, HBASE-12457_addendum.patch, 
 TestRegionReplicas-jstack.txt




[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction

2014-11-13 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14210037#comment-14210037
 ] 

Andrew Purtell commented on HBASE-12457:


Well, for whatever reason this change does trigger the above condition, 
presumably through some kind of timing change: if I go back two commits, to 
before this patch and the addendum, the test makes progress and completes.



[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction

2014-11-13 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14210036#comment-14210036
 ] 

Lars Hofhansl commented on HBASE-12457:
---

Sorry about the build break on branch-1. I cherry-picked the patch. Usually I 
do a compile and run the relevant tests, but I spaced it this time.

The hang will not happen since we only notify *after* we set 
writestate.compacting (or writestate.flushing) back to false, so there is no 
race. I looked at that part :)
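
A minimal sketch of the ordering described above, assuming a simplified stand-in for the region's write state. The class and method names here are hypothetical, not HBase's. The flag is cleared and the waiters are notified under the same monitor, and the waiter re-checks the flag in a loop before each wait, so a wakeup cannot be missed:

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a region's write state; illustrative only. */
final class WriteStateSketch {
    private boolean compacting = false;

    /** Marks a compaction as running. */
    synchronized void startCompaction() {
        compacting = true;
    }

    /** Clears the flag first, then wakes all waiters, under the same monitor. */
    synchronized void finishCompaction() {
        compacting = false;
        notifyAll();
    }

    /**
     * Waits until no compaction is running or the timeout elapses.
     * Returns true if the state is idle on return, false otherwise.
     */
    synchronized boolean awaitIdle(long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (compacting) {
            long remainingMillis = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
            if (remainingMillis <= 0) {
                return false; // timed out while a compaction is still running
            }
            try {
                // wait() releases the monitor; the loop re-checks the flag after each wakeup.
                wait(remainingMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return !compacting;
            }
        }
        return true;
    }
}
```

Because the waiter only blocks after seeing the flag set while holding the monitor, a notification issued after the flag is cleared cannot slip in between the check and the wait.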

In the face of the test failures I am going to roll this back anyway, though.




[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction

2014-11-13 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14210044#comment-14210044
 ] 

Lars Hofhansl commented on HBASE-12457:
---

Reverted from all branches... sorry about the noise.

 Regions in transition for a long time when CLOSE interleaves with a slow 
 compaction
 ---

 Key: HBASE-12457
 URL: https://issues.apache.org/jira/browse/HBASE-12457
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.7
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 2.0.0, 0.98.9, 0.99.2

 Attachments: 12457-combined-0.98-v2.txt, 12457-combined-0.98.txt, 
 12457-combined-trunk.txt, 12457-minifix.txt, 12457.interrupt-v2.txt, 
 12457.interrupt.txt, HBASE-12457.patch, HBASE-12457_addendum.patch, 
 TestRegionReplicas-jstack.txt




[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction

2014-11-13 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14210046#comment-14210046
 ] 

Lars Hofhansl commented on HBASE-12457:
---

[~apurtell], you mean the test condition, right? Or did you see it hanging 
specifically on that writestate.wait(...)?



[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction

2014-11-13 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14210054#comment-14210054
 ] 

Andrew Purtell commented on HBASE-12457:


I meant the minicluster shutdown sequencing issue. Thanks for trying to get 
this in for .8, Lars.



[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction

2014-11-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14210070#comment-14210070
 ] 

Hudson commented on HBASE-12457:


FAILURE: Integrated in HBase-1.0 #463 (See 
[https://builds.apache.org/job/HBase-1.0/463/])
Amend HBASE-12457 Regions in transition for a long time when CLOSE interleaves 
with a slow compaction; Test import fix (apurtell: rev 
9d2ad55cfa6108718d785b5e71ab10e9fb75a988)
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactionIO.java




[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction

2014-11-13 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14210073#comment-14210073
 ] 

stack commented on HBASE-12457:
---

Thanks for backing out the breaking change promptly. Feel free to retry, given 
you are watching the build results.



[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction

2014-11-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14210159#comment-14210159
 ] 

Hudson commented on HBASE-12457:


SUCCESS: Integrated in HBase-1.0 #464 (See 
[https://builds.apache.org/job/HBase-1.0/464/])
Revert Amend HBASE-12457 Regions in transition for a long time when CLOSE 
interleaves with a slow compaction; Test import fix (larsh: rev 
880c7c35fc50f28ec3e072a4c62a348fc964e9e0)
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactionIO.java
Revert HBASE-12457 Regions in transition for a long time when CLOSE 
interleaves with a slow compaction. (larsh: rev 
1861f9ce25bc8609629928a670fdf3566486ca25)
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactionIO.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/DefaultCompactor.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java




[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction

2014-11-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14210160#comment-14210160
 ] 

Hudson commented on HBASE-12457:


FAILURE: Integrated in HBase-0.98 #675 (See 
[https://builds.apache.org/job/HBase-0.98/675/])
Revert HBASE-12457 Regions in transition for a long time when CLOSE 
interleaves with a slow compaction. (larsh: rev 
7f5f1570ce83c62ce9408701677994415b127b36)
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactionIO.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/DefaultCompactor.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java




[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction

2014-11-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14210184#comment-14210184
 ] 

Hudson commented on HBASE-12457:


SUCCESS: Integrated in HBase-TRUNK #5774 (See 
[https://builds.apache.org/job/HBase-TRUNK/5774/])
Revert Amend HBASE-12457 Regions in transition for a long time when CLOSE 
interleaves with a slow compaction; Test import fix (larsh: rev 
9d634772fa12e16b86b0218802b2e38cacdfd528)
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactionIO.java
Revert HBASE-12457 Regions in transition for a long time when CLOSE 
interleaves with a slow compaction. (larsh: rev 
c29318c038f0f310562dc8194506b504eae72c1b)
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/DefaultCompactor.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactionIO.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java




[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction

2014-11-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14211269#comment-14211269
 ] 

Hudson commented on HBASE-12457:


SUCCESS: Integrated in HBase-0.98-on-Hadoop-1.1 #643 (See 
[https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/643/])
Revert HBASE-12457 Regions in transition for a long time when CLOSE 
interleaves with a slow compaction. (larsh: rev 
7f5f1570ce83c62ce9408701677994415b127b36)
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/DefaultCompactor.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactionIO.java




[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction

2014-11-13 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14211809#comment-14211809
 ] 

Lars Hofhansl commented on HBASE-12457:
---

OK... What caused TestRegionReplicas to hang was the change that moved 
{{this.parent.writestate.writesEnabled = true;}} from SplitTransaction to 
HRegion.initializeRegionInternals.

That part is not needed anyway; it just looked like it would be more correct. 
Here's a patch for trunk that passes TestRegionReplicas.




[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction

2014-11-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14211906#comment-14211906
 ] 

Hadoop QA commented on HBASE-12457:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12681485/12457-trunk-v3.txt
  against trunk revision .
  ATTACHMENT ID: 12681485

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 1 
warning messages.

{color:red}-1 checkstyle{color}.  The applied patch generated 
3787 checkstyle errors (more than the trunk's current 3786 errors).

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
 

 {color:red}-1 core zombie tests{color}.  There are 1 zombie test(s):   
at 
org.apache.hadoop.hbase.coprocessor.TestMasterObserver.testRegionTransitionOperations(TestMasterObserver.java:1488)

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11673//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11673//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11673//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11673//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11673//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11673//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11673//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11673//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11673//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11673//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11673//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11673//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11673//artifact/patchprocess/checkstyle-aggregate.html

Javadoc warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11673//artifact/patchprocess/patchJavadocWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11673//console

This message is automatically generated.

 Regions in transition for a long time when CLOSE interleaves with a slow 
 compaction
 ---

 Key: HBASE-12457
 URL: https://issues.apache.org/jira/browse/HBASE-12457
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.7
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 2.0.0, 0.98.9, 0.99.2

 Attachments: 12457-combined-0.98-v2.txt, 12457-combined-0.98.txt, 
 12457-combined-trunk.txt, 12457-minifix.txt, 12457-trunk-v3.txt, 
 12457.interrupt-v2.txt, 12457.interrupt.txt, HBASE-12457.patch, 
 HBASE-12457_addendum.patch, TestRegionReplicas-jstack.txt



[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction

2014-11-12 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208293#comment-14208293
 ] 

Andrew Purtell commented on HBASE-12457:


I looked at the 0.98 combined patch. Changes lgtm, except:

- In HRegion#waitForFlushesAndCompactions, add debug level logging when issuing 
thread interrupts so we can determine if we are trying to interrupt but nothing 
subsequently happens.

- Use EnvironmentEdgeManager#currentTime (EEM#currentTimeMillis on 0.98) 
instead of System.currentTimeMillis
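
The two review points above can be sketched together. This is only an illustration under stated assumptions: the class name and the `interruptAll` method are hypothetical, a `LongSupplier` stands in for an injectable clock such as EnvironmentEdgeManager, and `java.util.logging` at FINE level stands in for the project's debug-level logging.

```java
import java.util.List;
import java.util.function.LongSupplier;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Illustrative only; not the HBase implementation. */
final class InterruptingWaiterSketch {
    private static final Logger LOG =
        Logger.getLogger(InterruptingWaiterSketch.class.getName());

    // Hypothetical injectable time source, so tests can control the clock
    // instead of depending on System.currentTimeMillis.
    private final LongSupplier clock;

    InterruptingWaiterSketch(LongSupplier clock) {
        this.clock = clock;
    }

    /**
     * Interrupts each worker thread, logging every interrupt at debug (FINE)
     * level, and returns the elapsed time as measured by the injected clock.
     */
    long interruptAll(List<Thread> workers) {
        long start = clock.getAsLong();
        for (Thread t : workers) {
            if (LOG.isLoggable(Level.FINE)) {
                LOG.fine("Interrupting thread " + t.getName());
            }
            t.interrupt();
        }
        return clock.getAsLong() - start;
    }
}
```

With the log line in place, an interrupt that is issued but never acted on leaves a trace that can be matched against the later (or missing) abort of the compaction.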

Nice test.

 Regions in transition for a long time when CLOSE interleaves with a slow 
 compaction
 ---

 Key: HBASE-12457
 URL: https://issues.apache.org/jira/browse/HBASE-12457
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.7
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 2.0.0, 0.98.8, 0.99.2

 Attachments: 12457-combined-0.98.txt, 12457-minifix.txt, 
 12457.interrupt-v2.txt, 12457.interrupt.txt




[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction

2014-11-12 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208502#comment-14208502
 ] 

Andrew Purtell commented on HBASE-12457:


30s is ok. We're trying to limit the time clients will see NSRE for a parent 
region going offline in a split transaction so we shouldn't be too conservative 
with waiting here. 

Under what circumstances would we not want to clean up files in tmp from a 
failed or aborted compaction? They're broken or redundant or both.



[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction

2014-11-12 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208494#comment-14208494
 ] 

Lars Hofhansl commented on HBASE-12457:
---

bq. add debug level logging when issuing thread interrupts

Was *just* thinking that :) Will. The new Test also needs a license header.

You think 30s is good? Internally we found that all compactions that do not have 
this issue are aborted within 8s. Could make it a minute - although it's not 
really hurting anything. The only thing it (when interrupting the compaction) 
doesn't do is clean up the files in tmp. Maybe it should do that...? (Might 
be a bit hard to distinguish this from other exceptions, for which we presumably 
do not want to clean up the tmp file... Or do we?)




[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction

2014-11-12 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208542#comment-14208542
 ] 

Lars Hofhansl commented on HBASE-12457:
---

Seeing an assertion failure now in the test... Checking.



[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction

2014-11-12 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208574#comment-14208574
 ] 

Lars Hofhansl commented on HBASE-12457:
---

Turns out that's because of HBASE-12454 (which I think was committed due to a 
misunderstanding)



[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction

2014-11-12 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208637#comment-14208637
 ] 

Andrew Purtell commented on HBASE-12457:


I pushed reverts for HBASE-12454 so that should be good now. 



[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction

2014-11-12 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208708#comment-14208708
 ] 

Lars Hofhansl commented on HBASE-12457:
---

We've seen this on two machines now. The wait on the other machine was also 
close to 20m (18m to be precise).

Last question: Is 30s wait time before we interrupt good enough? The 
compactions should cancel themselves (in our case we find that unless they hang 
in the described way, they cancel themselves after no more than 8s). Could 
maybe wait a minute too. Not sure.
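
The "wait a while, then interrupt" policy discussed here can be sketched in plain Java as follows. This is only an illustration under stated assumptions, not the code from the patch: the class and method names (GracePeriodInterruptSketch, waitThenInterrupt, sleeper) are hypothetical, a sleeping thread stands in for a compaction, and java.util.logging's FINE level stands in for debug-level logging.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class GracePeriodInterruptSketch {
    private static final Logger LOG =
        Logger.getLogger(GracePeriodInterruptSketch.class.getName());

    /**
     * Gives the worker graceMillis to finish on its own and interrupts it only
     * if it is still running afterwards. Returns true if an interrupt was issued.
     * graceMillis must be positive: Thread.join(0) would wait forever.
     */
    public static boolean waitThenInterrupt(Thread worker, long graceMillis) {
        try {
            worker.join(graceMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            throw new IllegalStateException("interrupted while waiting for " + worker.getName(), e);
        }
        if (!worker.isAlive()) {
            return false; // cancelled itself within the grace period
        }
        // Debug-level trace, so a later "interrupted but nothing happened" is visible in logs.
        LOG.log(Level.FINE, "Interrupting {0} after waiting {1} ms",
            new Object[] {worker.getName(), graceMillis});
        worker.interrupt();
        return true;
    }

    /** Starts a thread that pretends to work for the given time (hypothetical stand-in). */
    public static Thread sleeper(String name, long millis) {
        Thread t = new Thread(() -> {
            try {
                Thread.sleep(millis);
            } catch (InterruptedException e) {
                // Interrupted: give up and let the thread exit.
            }
        }, name);
        t.start();
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread quick = sleeper("quick-compaction", 10L);
        System.out.println(waitThenInterrupt(quick, 5_000L)); // false

        Thread stuck = sleeper("stuck-compaction", 60_000L);
        System.out.println(waitThenInterrupt(stuck, 100L)); // true
        stuck.join(5_000L);
        System.out.println(stuck.isAlive()); // false
    }
}
```

The grace period only matters for workers that fail to stop on their own; a worker that cancels itself quickly never sees an interrupt, so the normal path is unchanged.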




[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction

2014-11-12 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208769#comment-14208769
 ] 

Andrew Purtell commented on HBASE-12457:


v2 patch lgtm



[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction

2014-11-12 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209142#comment-14209142
 ] 

Andrew Purtell commented on HBASE-12457:


Let's commit this if you feel comfortable with it [~lhofhansl] so the next 
0.98.8 RC can get out the door. (Or we can try again later for .9)



[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction

2014-11-12 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209271#comment-14209271
 ] 

Lars Hofhansl commented on HBASE-12457:
---

Lemme make a quick trunk patch. I think this is quite safe. If compactions 
manage to cancel themselves within 30s it is functionally unchanged. And the 
change in SplitTransaction seems correct to me as well.




[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction

2014-11-12 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209282#comment-14209282
 ] 

Andrew Purtell commented on HBASE-12457:


Thanks Lars, appreciate it.



[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction

2014-11-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209338#comment-14209338
 ] 

Hadoop QA commented on HBASE-12457:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12681251/12457-combined-trunk.txt
  against trunk revision .
  ATTACHMENT ID: 12681251

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 1 
warning messages.

{color:red}-1 checkstyle{color}.  The applied patch generated 
3787 checkstyle errors (more than the trunk's current 3786 errors).

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.io.TestHeapSize

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11658//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11658//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11658//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11658//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11658//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11658//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11658//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11658//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11658//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11658//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11658//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11658//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11658//artifact/patchprocess/checkstyle-aggregate.html

Javadoc warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11658//artifact/patchprocess/patchJavadocWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11658//console

This message is automatically generated.


[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction

2014-11-12 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209348#comment-14209348
 ] 

Andrew Purtell commented on HBASE-12457:


New patch adjusting HRegion heap size estimate coming right up.



[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction

2014-11-12 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209354#comment-14209354
 ] 

Lars Hofhansl commented on HBASE-12457:
---

Cool. Thanks [~apurtell]. So this is good to go.

Unless there are objections I'll commit this now.

[~stack], [~ram_krish], [~jxiang], if you guys have some time maybe put some 
extra sets of eyes on this (good even when done after commit).




[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction

2014-11-12 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209394#comment-14209394
 ] 

Lars Hofhansl commented on HBASE-12457:
---

Pushed to 0.98+.



[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction

2014-11-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209443#comment-14209443
 ] 

Hudson commented on HBASE-12457:


FAILURE: Integrated in HBase-1.0 #462 (See 
[https://builds.apache.org/job/HBase-1.0/462/])
HBASE-12457 Regions in transition for a long time when CLOSE interleaves with a 
slow compaction. (larsh: rev 0e795c1cf8621df2d33600f4b33a00344fe5de5a)
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/DefaultCompactor.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactionIO.java




[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction

2014-11-11 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14206880#comment-14206880
 ] 

stack commented on HBASE-12457:
---

We are aborting the compaction because we want to close. The compaction abort 
is not noticed for 20 minutes? We shouldn't close if there is an ongoing compaction 
(that has not yet aborted)?



[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction

2014-11-11 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14206941#comment-14206941
 ] 

Lars Hofhansl commented on HBASE-12457:
---

Right. The timing is hard though. It seems the master considers the region 
closed once it sent the CLOSE.

One option I thought about is for HRegion.doClose() to interrupt any 
running compactions (i.e. interrupt the CompactSplitThread). Then, upon 
receiving an interrupted exception, the compactor would recheck 
writestate.writesEnabled rather than waiting for the next 10 MB chunk to finish 
writing.
The symptom here looks like the compactor just hanging in some IO (either 
scanner.next or writer.append - my bet is on the latter). An interrupt can 
break out of that and allow the compactor to recheck the condition.
Might be easiest to explain with a patch. :)
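
Until a patch is attached, the idea can be sketched in plain Java. This is an illustration under stated assumptions, not HBase code: an AtomicBoolean stands in for writestate.writesEnabled, Thread.sleep stands in for a slow scanner.next or writer.append, and all class and method names are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

public class InterruptAndRecheckSketch {
    public enum Outcome { COMPLETED, ABORTED, INTERRUPTED }

    /**
     * Simulated compaction loop: the flag is checked once per chunk, so without
     * an interrupt a slow chunk delays the abort until that chunk finishes.
     */
    public static Outcome compact(AtomicBoolean writesEnabled, int chunks, long ioMillisPerChunk) {
        for (int chunk = 0; chunk < chunks; chunk++) {
            if (!writesEnabled.get()) {
                return Outcome.ABORTED; // regular per-chunk check
            }
            try {
                Thread.sleep(ioMillisPerChunk); // stand-in for slow, blocking IO
            } catch (InterruptedException e) {
                if (!writesEnabled.get()) {
                    return Outcome.ABORTED; // the interrupt let us recheck the flag early
                }
                Thread.currentThread().interrupt(); // not a close request: preserve status
                return Outcome.INTERRUPTED;
            }
        }
        return Outcome.COMPLETED;
    }

    /** Simulated close: disable writes first, then interrupt the compaction thread. */
    public static Outcome closeWhileCompacting() {
        AtomicBoolean writesEnabled = new AtomicBoolean(true);
        AtomicReference<Outcome> outcome = new AtomicReference<>();
        CountDownLatch started = new CountDownLatch(1);
        Thread compactor = new Thread(() -> {
            started.countDown();
            outcome.set(compact(writesEnabled, 3, 60_000L));
        }, "compactor");
        compactor.start();
        try {
            started.await();
            writesEnabled.set(false); // flag first, so the recheck sees it
            compactor.interrupt();
            compactor.join(5_000L);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return outcome.get();
    }

    public static void main(String[] args) {
        System.out.println(closeWhileCompacting()); // ABORTED
    }
}
```

The ordering matters: the flag is cleared before the interrupt is sent, so a compactor that wakes up with an InterruptedException is guaranteed to see writes disabled when it rechecks.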

 Regions in transition for a long time when CLOSE interleaves with a slow 
 compaction
 ---

 Key: HBASE-12457
 URL: https://issues.apache.org/jira/browse/HBASE-12457
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.7
Reporter: Lars Hofhansl
 Attachments: 12457-minifix.txt


 Under heave load we have observed regions remaining in transition for 20 
 minutes when the master requests a close while a slow compaction is running.
 The pattern is always something like this:
 # RS starts a compaction
 # HM request the region to be closed on this RS
 # Compaction is not aborted for another 20 minutes
 # The region is in transition and not usable.
 In every case I tracked down so far the time between the requested CLOSE and 
 abort of the compaction is almost exactly 20 minutes, which is suspicious.
 Of course part of the issue is having compactions that take over 20 minutes, 
 but maybe we can do better here.





[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction

2014-11-11 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14206957#comment-14206957
 ] 

Lars Hofhansl commented on HBASE-12457:
---

Unfortunately there is no API to interrupt all threads in a thread pool without 
shutting down the pool.
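
One possible workaround, sketched here under assumptions and not taken from any attached patch, is to have the pool itself track which threads are currently running a task, using the `beforeExecute`/`afterExecute` hooks of `java.util.concurrent.ThreadPoolExecutor`. The class name `InterruptiblePool` and the method `interruptRunning` are hypothetical.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical helper; not part of HBase or the JDK.
class InterruptiblePool extends ThreadPoolExecutor {
    // Threads that are currently executing a task.
    private final Set<Thread> running = ConcurrentHashMap.newKeySet();

    InterruptiblePool(int threads) {
        super(threads, threads, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
    }

    @Override
    protected void beforeExecute(Thread t, Runnable r) {
        running.add(t);
        super.beforeExecute(t, r);
    }

    @Override
    protected void afterExecute(Runnable r, Throwable t) {
        super.afterExecute(r, t);
        running.remove(Thread.currentThread());
    }

    /** Interrupts every thread that is running a task; the pool stays usable. */
    void interruptRunning() {
        for (Thread t : running) {
            t.interrupt();
        }
    }

    /** Returns true if both tasks were interrupted and the pool still accepts work. */
    static boolean demo() {
        InterruptiblePool pool = new InterruptiblePool(2);
        CountDownLatch started = new CountDownLatch(2);
        CountDownLatch interrupted = new CountDownLatch(2);
        try {
            for (int i = 0; i < 2; i++) {
                pool.execute(() -> {
                    started.countDown();
                    try {
                        Thread.sleep(60_000); // stand-in for a slow compaction
                    } catch (InterruptedException e) {
                        interrupted.countDown();
                    }
                });
            }
            started.await();
            pool.interruptRunning();
            boolean allInterrupted = interrupted.await(5, TimeUnit.SECONDS);
            // The pool was never shut down, so it still accepts new tasks.
            boolean reusable = "ok".equals(pool.submit(() -> "ok").get(5, TimeUnit.SECONDS));
            return allInterrupted && reusable;
        } catch (Exception e) {
            return false;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("interrupted without shutdown: " + demo());
    }
}
```

This interrupts every running task, not only those belonging to one region, so a real change would likely need a finer-grained mapping from region to worker thread.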

 Regions in transition for a long time when CLOSE interleaves with a slow 
 compaction
 ---

 Key: HBASE-12457
 URL: https://issues.apache.org/jira/browse/HBASE-12457
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.7
Reporter: Lars Hofhansl
 Attachments: 12457-minifix.txt




[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction

2014-11-11 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207211#comment-14207211
 ] 

Andrew Purtell commented on HBASE-12457:


Patch looks ok to me, modulo those cleanups you mention, getting rid of the 
casting.

 Regions in transition for a long time when CLOSE interleaves with a slow 
 compaction
 ---

 Key: HBASE-12457
 URL: https://issues.apache.org/jira/browse/HBASE-12457
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.7
Reporter: Lars Hofhansl
 Attachments: 12457-minifix.txt, 12457.interrupt.txt




[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction

2014-11-11 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207214#comment-14207214
 ] 

Andrew Purtell commented on HBASE-12457:


I guess the question is... does it work? :-) We should also have a test in 
TestCompaction.

 Regions in transition for a long time when CLOSE interleaves with a slow 
 compaction
 ---

 Key: HBASE-12457
 URL: https://issues.apache.org/jira/browse/HBASE-12457
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.7
Reporter: Lars Hofhansl
 Attachments: 12457-minifix.txt, 12457.interrupt.txt




[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction

2014-11-11 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207320#comment-14207320
 ] 

Lars Hofhansl commented on HBASE-12457:
---

Thinking about how to express this in a test without cut-and-pasting all of 
Compactor into a test class. I need to be able to block the writer during 
append.
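
One way to block a writer during append without copying Compactor is a small test double whose `append` parks on a latch. The sketch below is only an illustration of that idea: the `Writer` interface and the `BlockingWriter` class are hypothetical and do not match HBase's actual writer types.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical test double; the Writer interface is illustrative, not HBase's.
class BlockingWriterSketch {
    interface Writer {
        void append(byte[] cell) throws IOException;
    }

    /** A writer whose append() parks until it is released or interrupted. */
    static class BlockingWriter implements Writer {
        final CountDownLatch entered = new CountDownLatch(1);
        final CountDownLatch release = new CountDownLatch(1);

        @Override
        public void append(byte[] cell) throws IOException {
            entered.countDown(); // lets the test know append() has been reached
            try {
                release.await();
            } catch (InterruptedException e) {
                throw new InterruptedIOException("append interrupted");
            }
        }
    }

    /** Returns true if interrupting the appending thread surfaces as InterruptedIOException. */
    static boolean interruptUnblocksAppend() {
        BlockingWriter writer = new BlockingWriter();
        boolean[] sawInterrupt = {false};
        Thread t = new Thread(() -> {
            try {
                writer.append(new byte[0]);
            } catch (InterruptedIOException e) {
                sawInterrupt[0] = true;
            } catch (IOException e) {
                // not expected in this sketch
            }
        });
        t.setDaemon(true);
        t.start();
        try {
            writer.entered.await(); // wait until the thread is inside append()
            t.interrupt();
            t.join(TimeUnit.SECONDS.toMillis(5));
        } catch (InterruptedException e) {
            return false;
        }
        return !t.isAlive() && sawInterrupt[0];
    }

    public static void main(String[] args) {
        System.out.println("append unblocked by interrupt: " + interruptUnblocksAppend());
    }
}
```

The `entered` latch gives the test a deterministic point at which the compaction is known to be stuck in append, so the close can be issued without sleeping.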

 Regions in transition for a long time when CLOSE interleaves with a slow 
 compaction
 ---

 Key: HBASE-12457
 URL: https://issues.apache.org/jira/browse/HBASE-12457
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.7
Reporter: Lars Hofhansl
 Attachments: 12457-minifix.txt, 12457.interrupt.txt




[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction

2014-11-11 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207346#comment-14207346
 ] 

Lars Hofhansl commented on HBASE-12457:
---

Test coming soon.

 Regions in transition for a long time when CLOSE interleaves with a slow 
 compaction
 ---

 Key: HBASE-12457
 URL: https://issues.apache.org/jira/browse/HBASE-12457
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.7
Reporter: Lars Hofhansl
 Attachments: 12457-minifix.txt, 12457.interrupt.txt




[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction

2014-11-11 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207365#comment-14207365
 ] 

Andrew Purtell commented on HBASE-12457:


If you just want to test if interrupting compaction is possible, you might be 
able to use Mockito, something like:
{code}
// Get a reference to the DefaultCompactor instance as defaultCompactor somehow
DefaultCompactor compactor = spy(defaultCompactor);
// This should first call compactor.compact() and then execute the code in the
// answer() callback
when(compactor.compact(anyObject())).thenAnswer(new Answer<List<Path>>() {
  @Override
  public List<Path> answer(InvocationOnMock invocation) throws Throwable {
    Thread.sleep(6); // or whatever
    return ...; // need to return a List<Path>
  }
});
// ...
// Trigger compaction
{code}
?

 Regions in transition for a long time when CLOSE interleaves with a slow 
 compaction
 ---

 Key: HBASE-12457
 URL: https://issues.apache.org/jira/browse/HBASE-12457
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.7
Reporter: Lars Hofhansl
 Attachments: 12457-minifix.txt, 12457.interrupt.txt




[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction

2014-11-11 Thread Jerry He (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207693#comment-14207693
 ] 

Jerry He commented on HBASE-12457:
--

Is this somehow similar or related to HBASE-10492? 
For that JIRA, I thought in the end it was probably the environment I had at the time.
I was running on the IBM GPFS file system.

 Regions in transition for a long time when CLOSE interleaves with a slow 
 compaction
 ---

 Key: HBASE-12457
 URL: https://issues.apache.org/jira/browse/HBASE-12457
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.7
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 2.0.0, 0.98.8, 0.99.2

 Attachments: 12457-combined-0.98.txt, 12457-minifix.txt, 
 12457.interrupt-v2.txt, 12457.interrupt.txt




[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction

2014-11-11 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207696#comment-14207696
 ] 

Lars Hofhansl commented on HBASE-12457:
---

That looks different to me. Here we cannot finish a compaction and we fail to 
abort it when the master wants to close the region.

 Regions in transition for a long time when CLOSE interleaves with a slow 
 compaction
 ---

 Key: HBASE-12457
 URL: https://issues.apache.org/jira/browse/HBASE-12457
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.7
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 2.0.0, 0.98.8, 0.99.2

 Attachments: 12457-combined-0.98.txt, 12457-minifix.txt, 
 12457.interrupt-v2.txt, 12457.interrupt.txt




[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction

2014-11-10 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14206010#comment-14206010
 ] 

Lars Hofhansl commented on HBASE-12457:
---

Sometimes (but not always) we see splits interspersed with this.

While I scoured the code I noticed the following:
* SplitTransaction writes CREATE_SPLIT_DIR after it has created the daughter dirs 
and CLOSED_PARENT_REGION after the parent region is closed.
* Upon rollback, writestate.writesEnabled is set back to true unconditionally at 
the CREATE_SPLIT_DIR stage.

It seems that should only be done when we journaled CLOSED_PARENT_REGION.
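
The suggested ordering can be illustrated with a small model of the journal. This is a simplified sketch, not SplitTransaction itself: the class, the `rollback` method and its boolean parameter are hypothetical, and only the two journal entries named above are modelled.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the split journal; not HBase's SplitTransaction.
class SplitRollbackSketch {
    enum JournalEntry { CREATE_SPLIT_DIR, CLOSED_PARENT_REGION }

    /**
     * Walks the journal backwards, as a rollback would, and returns the
     * value writesEnabled should have afterwards.
     */
    static boolean rollback(List<JournalEntry> journal, boolean writesEnabled) {
        for (int i = journal.size() - 1; i >= 0; i--) {
            switch (journal.get(i)) {
                case CLOSED_PARENT_REGION:
                    // The split closed the parent itself, so undoing that
                    // step may re-enable writes.
                    writesEnabled = true;
                    break;
                case CREATE_SPLIT_DIR:
                    // Only the daughter dirs are cleaned up at this stage;
                    // writesEnabled is left as it was found.
                    break;
            }
        }
        return writesEnabled;
    }

    public static void main(String[] args) {
        // A close disabled writes before the split got as far as closing the parent.
        System.out.println(rollback(Arrays.asList(JournalEntry.CREATE_SPLIT_DIR), false));
        // The split had closed the parent before it was rolled back.
        System.out.println(rollback(
            Arrays.asList(JournalEntry.CREATE_SPLIT_DIR, JournalEntry.CLOSED_PARENT_REGION), false));
    }
}
```

In the first case the rollback leaves writes disabled, so a concurrent close that had already cleared the flag is not silently undone.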


 Regions in transition for a long time when CLOSE interleaves with a slow 
 compaction
 ---

 Key: HBASE-12457
 URL: https://issues.apache.org/jira/browse/HBASE-12457
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.7
Reporter: Lars Hofhansl



[jira] [Commented] (HBASE-12457) Regions in transition for a long time when CLOSE interleaves with a slow compaction

2014-11-10 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14206059#comment-14206059
 ] 

Lars Hofhansl commented on HBASE-12457:
---

That all said, so far I have observed this on only a single region server, 
so it might be an environmental issue.

 Regions in transition for a long time when CLOSE interleaves with a slow 
 compaction
 ---

 Key: HBASE-12457
 URL: https://issues.apache.org/jira/browse/HBASE-12457
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.7
Reporter: Lars Hofhansl
 Attachments: 12457-minifix.txt

