[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-03-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416679#comment-16416679
 ] 

Hudson commented on HBASE-20090:


Results for branch HBASE-19064
[build #77 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-19064/77/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(x) {color:red}-1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-19064/77//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-19064/77//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-19064/77//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


> Properly handle Preconditions check failure in 
> MemStoreFlusher$FlushHandler.run
> ---
>
> Key: HBASE-20090
> URL: https://issues.apache.org/jira/browse/HBASE-20090
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: 20090-server-61260-01-07.log, 20090.v10.txt, 
> 20090.v10.txt, 20090.v6.txt, 20090.v7.txt, 20090.v8.txt, 20090.v9.txt
>
>
> Copied the following from a comment since it was a better description of the 
> race condition.
> The original description was merged into the beginning of my first comment 
> below.
> Observed the following in the region server log (running on a hadoop3 cluster):
> {code}
> 2018-02-26 16:06:50,044 WARN  
> [RpcServer.default.FPBQ.Fifo.handler=26,queue=2,port=16020] 
> regionserver.MemStoreFlusher: Memstore is above high water mark and block 
> 135ms
> 2018-02-26 16:06:50,049 ERROR [MemStoreFlusher.1] 
> regionserver.MemStoreFlusher: Cache flusher failed for entry 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher$WakeupFlushThread@2adfadd7
> java.lang.IllegalStateException
> at 
> org.apache.hbase.thirdparty.com.google.common.base.Preconditions.checkState(Preconditions.java:441)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:174)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$600(MemStoreFlusher.java:69)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:237)
> at java.lang.Thread.run(Thread.java:748)
> {code}
> Here is the Precondition from MemStoreFlusher#flushOneForGlobalPressure():
> {code}
>   Preconditions.checkState(
> (regionToFlush != null && regionToFlushSize > 0) || 
> bestRegionReplicaSize > 0);
> {code}
> When the Precondition check fails, an IllegalStateException is raised.
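> As a minimal standalone illustration (not HBase code) of this mechanism: 
> Guava's no-message checkState overload throws a bare IllegalStateException, 
> matching the message-less exception in the stack trace above (HBase uses the 
> relocated org.apache.hbase.thirdparty copy of Guava).
> {code}
> import com.google.common.base.Preconditions;
> 
> public class CheckStateDemo {
>   public static void main(String[] args) {
>     // Passes: the condition holds, so nothing happens.
>     Preconditions.checkState(1 > 0);
>     // Fails: throws java.lang.IllegalStateException with no message.
>     Preconditions.checkState(false);
>   }
> }
> {code}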
> With more debug logging, we can see the scenario where the exception was 
> triggered.
> {code}
> 2018-03-02 17:28:30,097 DEBUG [MemStoreFlusher.0] regionserver.CompactSplit: 
> Splitting TestTable,,1520011528142.0453f29030757eedb6e6a1c57e88c085., 
> compaction_queue=(0:0), split_queue=1
> 2018-03-02 17:28:30,098 DEBUG 
> [RpcServer.priority.FPBQ.Fifo.handler=19,queue=1,port=16020] 
> regionserver.IncreasingToUpperBoundRegionSplitPolicy: ShouldSplit because 
> info  size=6.9G, sizeToCheck=256.0M, regionsWithCommonTable=1
> 2018-03-02 17:28:30,296 INFO  
> [RpcServer.default.FPBQ.Fifo.handler=24,queue=0,port=16020] 
> regionserver.MemStoreFlusher: wake up flusher due to ABOVE_ONHEAP_LOWER_MARK
> 2018-03-02 17:28:30,297 DEBUG [MemStoreFlusher.1] 
> regionserver.MemStoreFlusher: Flush thread woke up because memory above low 
> water=381.5 M
> 2018-03-02 17:28:30,297 INFO  
> [RpcServer.default.FPBQ.Fifo.handler=25,queue=1,port=16020] 
> regionserver.MemStoreFlusher: wake up flusher due to ABOVE_ONHEAP_LOWER_MARK
> 2018-03-02 17:28:30,298 DEBUG [MemStoreFlusher.1] 
> regionserver.MemStoreFlusher: region 
> TestTable,,1520011528142.0453f29030757eedb6e6a1c57e88c085. with size 400432696
> 2018-03-02 17:28:30,298 DEBUG [MemStoreFlusher.1] 
> regionserver.MemStoreFlusher: region 
> atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae. with size 0
> 2018-03-02 17:28:30,298 INFO  [MemStoreFlusher.1] 
> regionserver.MemStoreFlusher: Flush of region 
> atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae. due to global
>  heap pressure. Flush type=ABOVE_ONHEAP_LOWER_MARKTotal Memstore Heap 
> size=381.9 MTotal Memstore Off-Heap size=0, Region memstore size=0
> 2018-03-02 17:28:30,298 INFO  [MemStoreFlusher.1] 
> regionserver.MemStoreFlusher: wake up by WAKEUPFLUSH_INSTANCE
> 2018-03-02 17:28:30,298 INFO  [MemStoreFlusher.1] 
> regionserver.MemStoreFlusher: Nothing to flush for 
> atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae.
> 2018-03-02 17:28:30,298 INFO  [MemStoreFlusher.1] 
> regionserver.MemStoreFlusher: Excluding unflushable region 
> atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae. -trying to 
> find a different region to flush.
> {code}
> Region 0453f29030757eedb6e6a1c57e88c085 was being split.
> In HRegion#flushcache, the log from the else branch can be seen in 
> 20090-server-61260-01-07.log:
> {code}
>   synchronized (writestate) {
>     if (!writestate.flushing && writestate.writesEnabled) {
>       this.writestate.flushing = true;
>     } else {
>       if (LOG.isDebugEnabled()) {
>         LOG.debug("NOT flushing memstore for region " + this
>             + ", flushing=" + writestate.flushing + ", writesEnabled="
>             + writestate.writesEnabled);
>       }
> {code}
> Meaning, region 0453f29030757eedb6e6a1c57e88c085 couldn't flush, leaving 
> memory pressure at a high level.
> When MemStoreFlusher reached the following call, the region was no longer a 
> flush candidate:
> {code}
>   HRegion bestFlushableRegion =
>       getBiggestMemStoreRegion(regionsBySize, excludedRegions, true);
> {code}
> So the other region, 
> atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae., was examined 
> next. Since that region was not receiving writes, the (current) Precondition 
> check failed.
> The proposed fix is to convert the Precondition into a normal return.
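> For illustration, a sketch of what converting the Precondition into a normal 
> return could look like inside flushOneForGlobalPressure() (assuming its 
> boolean return; this is not the committed patch verbatim):
> {code}
>   // Instead of Preconditions.checkState(...), bail out gracefully when no
>   // flushable candidate exists; the flush handler then simply retries on
>   // the next wakeup rather than dying on an IllegalStateException.
>   if (!((regionToFlush != null && regionToFlushSize > 0)
>       || bestRegionReplicaSize > 0)) {
>     LOG.info("No region with a positive memstore size found to flush; "
>         + "will retry on the next wakeup");
>     return false;
>   }
> {code}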

[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-03-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16408915#comment-16408915
 ] 

Hudson commented on HBASE-20090:


Results for branch branch-2
[build #513 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/513/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/513//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/513//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/513//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.



[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-03-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16408727#comment-16408727
 ] 

Hudson commented on HBASE-20090:


Results for branch branch-2.0
[build #69 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/69/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/69//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/69//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/69//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.



[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-03-21 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16408156#comment-16408156
 ] 

stack commented on HBASE-20090:
---

Pushed to branch-2 and branch-2.0.


[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-03-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16407836#comment-16407836
 ] 

Hudson commented on HBASE-20090:


Results for branch master
[build #268 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/268/]: (x) 
*{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/268//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/268//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/268//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.



[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-03-20 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16407427#comment-16407427
 ] 

Ted Yu commented on HBASE-20090:


Integrated to master branch.


[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-03-20 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16407425#comment-16407425
 ] 

stack commented on HBASE-20090:
---

Now I get it. Thanks. Let me backport...


[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-03-20 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16407254#comment-16407254
 ] 

Ted Yu commented on HBASE-20090:



bq. but we are still above the high-water mark?

At the moment the Precondition was triggered, the lower water mark had been 
crossed. See the log:
{code}
2018-03-02 17:28:30,301 INFO  
[RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=16020] 
regionserver.MemStoreFlusher: wake up flusher due to ABOVE_ONHEAP_LOWER_MARK
{code}
bq. Do you know why that is?

The write load from PerformanceEvaluation (PE) produced the above condition, 
which lasted for some period of time.

bq. There were two regions on this server that were under pressure?

No. Of the two regions I mentioned, only one was under pressure. 
0453f29030757eedb6e6a1c57e88c085 for TestTable was receiving writes and under 
pressure (and splitting).
The region for the atlas_janus table was dormant while PE ran; it had a 
zero-sized memstore.
Since the existing Precondition didn't account for this combination, the 
assertion was triggered.
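
For concreteness, here is a hypothetical walk-through of the check with the 
values implied by the log above (the variable values are assumptions for 
illustration; regionToFlush is an HRegion in the real code):
{code}
// Implied by the quoted log: the splitting TestTable region has been
// excluded, and region replicas are not in use, so there is no replica
// memstore data either.
Object regionToFlush = null;        // no flushable candidate left
long regionToFlushSize = 0;
long bestRegionReplicaSize = 0;

// (regionToFlush != null && regionToFlushSize > 0) || bestRegionReplicaSize > 0
//   => (false && false) || false
//   => false  -> Preconditions.checkState(false) throws IllegalStateException
{code}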


[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-03-20 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16407204#comment-16407204
 ] 

stack commented on HBASE-20090:
---

Yeah, sorry, the RN still makes no sense to me, but the description update 
helped a bunch. Thanks.

So, we avoid the Precondition IllegalStateException (including this in the 
description helped most), but we are still above the high-water mark? Do you 
know why that is? There were two regions on this server that were under 
pressure? One that had a zero-sized memstore and another that was currently 
splitting?


[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-03-20 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16407182#comment-16407182
 ] 

Ted Yu commented on HBASE-20090:


I have updated both the description and the Release Note.

Thanks


[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-03-20 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16407149#comment-16407149
 ] 

stack commented on HBASE-20090:
---

So, to understand the release note I have to read the description ... which I 
have, and I still don't get it; I reread it and it talks about exceptions, of 
which there are none in the description, and it talks of Preconditions, but I 
don't see one in the code snippet provided. I look at the patch and it adds a 
new check. What are we avoiding with this patch? Please put it in the RN. I'm 
trying to avoid having to read this whole issue to figure out whether I need 
this patch in 2.0 or not. Thanks.






[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-03-19 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16405145#comment-16405145
 ] 

Ted Yu commented on HBASE-20090:


bq. Why would a region not receive writes?

Please see the description above:
Region 0453f29030757eedb6e6a1c57e88c085, belonging to TestTable, was being 
split.

The other region, atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae 
(of table atlas_janus), was not being written to; I used PerformanceEvaluation, 
which wrote to TestTable only.
Region fbcb5e495344542daf8b499e4bac03ae's memstore size was 0, which triggered 
the Precondition assertion (before the fix).
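
For reference, a typical way such a single-table write load is generated (the 
exact invocation below is an assumption, not taken from this issue; options 
vary by HBase version):
{code}
# Run PerformanceEvaluation with 10 client threads doing random writes;
# by default it creates and writes to a table named TestTable.
hbase org.apache.hadoop.hbase.PerformanceEvaluation randomWrite 10
{code}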


> Properly handle Preconditions check failure in 
> MemStoreFlusher$FlushHandler.run
> ---
>
> Key: HBASE-20090
> URL: https://issues.apache.org/jira/browse/HBASE-20090
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
> Attachments: 20090-server-61260-01-07.log, 20090.v10.txt, 
> 20090.v10.txt, 20090.v6.txt, 20090.v7.txt, 20090.v8.txt, 20090.v9.txt
>
>
> Copied the following from a comment since this was better description of the 
> race condition.
> The original description was merged to the beginning of my first comment 
> below.
> With more debug logging, we can see the scenario where the exception was 
> triggered.
> {code}
> 2018-03-02 17:28:30,097 DEBUG [MemStoreFlusher.0] regionserver.CompactSplit: 
> Splitting TestTable,,1520011528142.0453f29030757eedb6e6a1c57e88c085., 
> compaction_queue=(0:0), split_queue=1
> 2018-03-02 17:28:30,098 DEBUG 
> [RpcServer.priority.FPBQ.Fifo.handler=19,queue=1,port=16020] 
> regionserver.IncreasingToUpperBoundRegionSplitPolicy: ShouldSplit because 
> info  size=6.9G, sizeToCheck=256.0M, regionsWithCommonTable=1
> 2018-03-02 17:28:30,296 INFO  
> [RpcServer.default.FPBQ.Fifo.handler=24,queue=0,port=16020] 
> regionserver.MemStoreFlusher: wake up flusher due to ABOVE_ONHEAP_LOWER_MARK
> 2018-03-02 17:28:30,297 DEBUG [MemStoreFlusher.1] 
> regionserver.MemStoreFlusher: Flush thread woke up because memory above low 
> water=381.5 M
> 2018-03-02 17:28:30,297 INFO  
> [RpcServer.default.FPBQ.Fifo.handler=25,queue=1,port=16020] 
> regionserver.MemStoreFlusher: wake up flusher due to ABOVE_ONHEAP_LOWER_MARK
> 2018-03-02 17:28:30,298 DEBUG [MemStoreFlusher.1] 
> regionserver.MemStoreFlusher: region 
> TestTable,,1520011528142.0453f29030757eedb6e6a1c57e88c085. with size 400432696
> 2018-03-02 17:28:30,298 DEBUG [MemStoreFlusher.1] 
> regionserver.MemStoreFlusher: region 
> atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae. with size 0
> 2018-03-02 17:28:30,298 INFO  [MemStoreFlusher.1] 
> regionserver.MemStoreFlusher: Flush of region 
> atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae. due to global
>  heap pressure. Flush type=ABOVE_ONHEAP_LOWER_MARKTotal Memstore Heap 
> size=381.9 MTotal Memstore Off-Heap size=0, Region memstore size=0
> 2018-03-02 17:28:30,298 INFO  [MemStoreFlusher.1] 
> regionserver.MemStoreFlusher: wake up by WAKEUPFLUSH_INSTANCE
> 2018-03-02 17:28:30,298 INFO  [MemStoreFlusher.1] 
> regionserver.MemStoreFlusher: Nothing to flush for 
> atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae.
> 2018-03-02 17:28:30,298 INFO  [MemStoreFlusher.1] 
> regionserver.MemStoreFlusher: Excluding unflushable region 
> atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae. -trying to 
> find a different region to flush.
> {code}
> Region 0453f29030757eedb6e6a1c57e88c085 was being split.
> In HRegion#flushcache, the log from else branch can be seen in 
> 20090-server-61260-01-07.log :
> {code}
>   synchronized (writestate) {
> if (!writestate.flushing && writestate.writesEnabled) {
>   this.writestate.flushing = true;
> } else {
>   if (LOG.isDebugEnabled()) {
> LOG.debug("NOT flushing memstore for region " + this
> + ", flushing=" + writestate.flushing + ", writesEnabled="
> + writestate.writesEnabled);
>   }
> {code}
> Meaning, region 0453f29030757eedb6e6a1c57e88c085 couldn't flush, leaving 
> memory pressure at a high level.
> When MemStoreFlusher ran to the following call, the region was no longer a 
> flush candidate:
> {code}
>   HRegion bestFlushableRegion =
>   getBiggestMemStoreRegion(regionsBySize, excludedRegions, true);
> {code}
> So the other region, 
> atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae., was examined 
> next. Since the region was not receiving writes, the (current) Precondition 
> check failed.
> The proposed fix is to convert the Precondition to a normal return.





[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-03-19 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16405136#comment-16405136
 ] 

stack commented on HBASE-20090:
---

How does this happen: "... if the only candidate left doesn't receive writes"? 
Why would a region not receive writes?



[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-03-19 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16405066#comment-16405066
 ] 

stack commented on HBASE-20090:
---

Can we get a summary of what the problem is in the release note, and of the 
implications should the condition arise?



[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-03-18 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16404392#comment-16404392
 ] 

ramkrishna.s.vasudevan commented on HBASE-20090:


+1



[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-03-18 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16404377#comment-16404377
 ] 

Anoop Sam John commented on HBASE-20090:


+1



[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-03-18 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16404237#comment-16404237
 ] 

Ted Yu commented on HBASE-20090:


[~ram_krish] [~anoop.hbase]:
Mind taking a look at patch v10?

Thanks



[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-03-18 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16403908#comment-16403908
 ] 

Eshcar Hillel commented on HBASE-20090:
---

+1



[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-03-15 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16400881#comment-16400881
 ] 

Ted Yu commented on HBASE-20090:


The hadoopcheck failure seems to be related to the build environment:
{code}
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-install-plugin:2.5.2:install (default-install) 
on project hbase-thrift: Failed to install metadata 
org.apache.hbase:hbase-thrift:3.0.0-SNAPSHOT/maven-metadata.xml: Could not 
parse metadata 
/home/jenkins/.m2/repository/org/apache/hbase/hbase-thrift/3.0.0-SNAPSHOT/maven-metadata-local.xml:
 in epilog non whitespace content is not allowed but got / (position: END_TAG 
seen ...\n/... @25:2)  -> [Help 1]
{code}
Not caused by the patch.


[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-03-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16400872#comment-16400872
 ] 

Hadoop QA commented on HBASE-20090:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
22s{color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} patch {color} | {color:blue}  0m  
2s{color} | {color:blue} The patch file was not named according to hbase's 
naming conventions. Please see 
https://yetus.apache.org/documentation/0.7.0/precommit-patchnames for 
instructions. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
 2s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
28s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 0s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  5m 
 7s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
42s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
25s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  3m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
56s{color} | {color:green} hbase-server: The patch generated 0 new + 29 
unchanged - 1 fixed = 29 total (was 30) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
 7s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red}  6m  
1s{color} | {color:red} The patch causes 10 errors with Hadoop v2.6.5. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red}  7m 
58s{color} | {color:red} The patch causes 10 errors with Hadoop v2.7.4. {color} 
|
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 10m  
0s{color} | {color:red} The patch causes 10 errors with Hadoop v3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}108m 
37s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
15s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}140m 43s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-20090 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12914726/20090.v10.txt |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  shadedjars  
hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux d9d986ba910a 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 
13:48:03 UTC 2016 x86_64 GNU/Linux |
| Build tool | maven |

[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-03-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16400449#comment-16400449
 ] 

Hadoop QA commented on HBASE-20090:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
27s{color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} patch {color} | {color:blue}  0m  
2s{color} | {color:blue} The patch file was not named according to hbase's 
naming conventions. Please see 
https://yetus.apache.org/documentation/0.7.0/precommit-patchnames for 
instructions. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
19s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
45s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
10s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  5m 
48s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
46s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
11s{color} | {color:green} hbase-server: The patch generated 0 new + 29 
unchanged - 1 fixed = 29 total (was 30) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
50s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
19m 16s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.5 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}155m 42s{color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
30s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}201m 27s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.master.procedure.TestProcedurePriority |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-20090 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12914680/20090.v10.txt |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  shadedjars  
hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 19ec4a5a5b0d 3.13.0-137-generic #186-Ubuntu SMP Mon Dec 4 
19:09:19 UTC 2017 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 31da4d0bce |
| maven | version: Apache 

[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-03-15 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16400200#comment-16400200
 ] 

Ted Yu commented on HBASE-20090:


bq. was to replace the precondition check with a test that checks the sizes and 
if both are 0 returns false plus logs the warning message.

Patch v6 was along that line.
Patch v10 adds a DEBUG log - since the user has no actionable option, I think a 
debug log should be enough.
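
For illustration, a guard of that shape might look like the following sketch 
(hypothetical class name and method signature, assuming slf4j logging; this is 
a sketch of the approach described here, not the committed v10 patch):
{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Sketch of the "check the sizes, log, and return false" approach
// (hypothetical names; not the committed v10 patch).
public class FlushGuardSketch {
  private static final Logger LOG = LoggerFactory.getLogger(FlushGuardSketch.class);

  /** Returns true if a flush candidate with data was found, false otherwise. */
  static boolean flushOneForGlobalPressure(long regionToFlushSize,
      long bestRegionReplicaSize) {
    if (regionToFlushSize == 0 && bestRegionReplicaSize == 0) {
      // The user has no actionable option here, hence DEBUG rather than WARN.
      LOG.debug("Above memory mark but no flushable region has any data: "
          + "regionToFlushSize=0, bestRegionReplicaSize=0");
      return false;
    }
    // ... otherwise proceed to flush the selected candidate ...
    return true;
  }
}
{code}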



[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-03-15 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16400190#comment-16400190
 ] 

Eshcar Hillel commented on HBASE-20090:
---

I am still not convinced that the code in the last patch covers all possibilities.

Why check only {{bestAnyRegionSize}}?

checkState checks that at least one of {{regionToFlush}} and 
{{bestRegionReplica}} has a size greater than 0.

{{regionToFlush}} can be set to either {{bestAnyRegion}} or 
{{bestFlushableRegion}}.

So by checking only {{bestAnyRegionSize}} we may still end up not satisfying 
the precondition.
I think one of the suggestions in this JIRA was to replace the precondition 
check with a test that checks the sizes and, if both are 0, returns false and 
logs a warning message.
Can we try this solution?
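
As a self-contained illustration of that concern (hypothetical names and 
selection condition, paraphrasing the flow described in this thread rather 
than the actual HBase source):
{code}
// Hypothetical illustration of why guarding only bestAnyRegionSize is not
// enough: regionToFlush may be taken from bestFlushableRegion instead.
public class CandidateSelectionSketch {
  public static void main(String[] args) {
    long bestAnyRegionSize = 400432696L;     // e.g. the splitting TestTable region
    long bestFlushableRegionSize = 0L;       // e.g. the idle atlas_janus region
    boolean biggestRegionCannotFlush = true; // it is mid-split

    // When the biggest region cannot flush, regionToFlush falls back to
    // bestFlushableRegion, so its size can be 0 even though
    // bestAnyRegionSize > 0: a guard on bestAnyRegionSize alone passes
    // while the precondition on the selected candidate still fails.
    long regionToFlushSize =
        biggestRegionCannotFlush ? bestFlushableRegionSize : bestAnyRegionSize;
    System.out.println("regionToFlushSize = " + regionToFlushSize); // prints 0
  }
}
{code}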


[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-03-14 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16399605#comment-16399605
 ] 

Ted Yu commented on HBASE-20090:


[~stack]:
Do you want this to go into branch-2.0?



[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-03-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16399149#comment-16399149
 ] 

Hadoop QA commented on HBASE-20090:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
34s{color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} patch {color} | {color:blue}  0m  
7s{color} | {color:blue} The patch file was not named according to hbase's 
naming conventions. Please see 
https://yetus.apache.org/documentation/0.7.0/precommit-patchnames for 
instructions. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
58s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
43s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
10s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  5m 
46s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
53s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
31s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
36s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
19m 45s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.5 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}135m 
24s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}181m  5s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-20090 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12914503/20090.v9.txt |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  shadedjars  
hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 51ea5f891167 3.13.0-137-generic #186-Ubuntu SMP Mon Dec 4 
19:09:19 UTC 2017 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 84ee32c723 |
| maven | version: Apache Maven 3.5.3 
(3383c37e1f9e9b3bc3df5050c29c8aff9f295297; 2018-02-24T19:49:05Z) |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC3 |
|  Test Results | 

[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-03-14 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16398881#comment-16398881
 ] 

ramkrishna.s.vasudevan commented on HBASE-20090:


+1. Thanks [~yuzhih...@gmail.com] for working on this.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-03-14 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16398720#comment-16398720
 ] 

Ted Yu commented on HBASE-20090:


Logged HBASE-20196 for improving region size accounting.

Thanks [~eshcar]




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-03-14 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16398711#comment-16398711
 ] 

Ted Yu commented on HBASE-20090:


Patch v9 adds a comment explaining the race condition.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-03-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16398672#comment-16398672
 ] 

Hadoop QA commented on HBASE-20090:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
21s{color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} patch {color} | {color:blue}  0m  
1s{color} | {color:blue} The patch file was not named according to hbase's 
naming conventions. Please see 
https://yetus.apache.org/documentation/0.7.0/precommit-patchnames for 
instructions. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
35s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
38s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
16s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  7m 
42s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
37s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
10s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  5m 
52s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red}  8m 
29s{color} | {color:red} The patch causes 10 errors with Hadoop v2.6.5. {color} 
|
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 10m 
24s{color} | {color:red} The patch causes 10 errors with Hadoop v2.7.4. {color} 
|
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 12m 
32s{color} | {color:red} The patch causes 10 errors with Hadoop v3.0.0. {color} 
|
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}103m 
23s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
16s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}148m 25s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-20090 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12914447/20090.v8.txt |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  shadedjars  
hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 057af463c828 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 
13:48:03 UTC 2016 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 

[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-03-14 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16398504#comment-16398504
 ] 

Eshcar Hillel commented on HBASE-20090:
---

Yes, actually it makes sense to keep all candidates sorted by size and not to 
drop regions that have the same size.

This should be applied to all three methods: XXXbyoffheapsize, XXXbyonheapsize, 
and XXXbydatasize.
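
A hypothetical sketch of that suggestion (not the committed patch; {{allCandidateRegions}} and the size getter are stand-ins for the real source of candidates and the per-region on-heap size method):

{code}
// Keep every online region as a candidate, sorted by memstore size in
// descending order, so two regions with equal sizes no longer collide as
// keys of a SortedMap<Long, HRegion>.
List<HRegion> candidates = new ArrayList<>(allCandidateRegions);
candidates.sort(Comparator.comparingLong(HRegion::getMemStoreHeapSize).reversed());
for (HRegion region : candidates) {
  if (!excludedRegions.contains(region)) {
    return region;  // biggest non-excluded candidate, ties preserved
  }
}
return null;
{code}

The same ordering would apply to the off-heap and data-size variants.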




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-03-14 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16398497#comment-16398497
 ] 

Ted Yu commented on HBASE-20090:


[~eshcar]:
I had another suggestion earlier:

https://issues.apache.org/jira/browse/HBASE-20090?focusedCommentId=16388569&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16388569

If it makes sense, I can log a separate JIRA.

Thanks




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-03-14 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16398353#comment-16398353
 ] 

Anoop Sam John commented on HBASE-20090:


I think v8 looks very simple and most suitable. Do you want to add some more 
info in that WARN log, Ted? This will be a surprise for anyone examining the 
logs: the global barrier is breached, but the best region (by any measure) has 
no data to flush! This can happen only when all the regions with data are not 
in a position to be flushed right now; they are already being flushed or about 
to be split. Maybe some words like that would be helpful. Anyway, this is a 
very rare case; normally the RS will have more regions. If it does happen, 
there is a sleep of 1 sec in the Flusher thread, which is bad because writes 
are blocked. But considering this is very rare, I think it is ok. Please also 
add some fat comments about that along with the one-liner change.
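
One possible wording for such a WARN (illustrative only; the final text was left to the patch):

{code}
// Sketch: make the rare no-candidate case self-explanatory in the logs.
LOG.warn("Above memory mark but no region is flushable right now: all regions"
    + " with data are already being flushed or are about to be split;"
    + " the flusher will sleep and retry");
return false;
{code}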


[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-03-14 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16398347#comment-16398347
 ] 

Ted Yu commented on HBASE-20090:


See if patch v8 is better - it aligns with Eshcar's initial suggestion and 
takes flushType into account when considering region size.
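
As an illustration of what "takes flushType into account" could mean (a sketch, not the actual v8 diff; the ABOVE_OFFHEAP_* constants and the per-region size getters are assumptions based on the HBase 2.0 FlushType/Region APIs):

{code}
// Pick the memstore size metric that matches the kind of memory pressure.
long sizeForFlushType(HRegion region, FlushType flushType) {
  switch (flushType) {
    case ABOVE_OFFHEAP_LOWER_MARK:   // assumed constants, mirroring the
    case ABOVE_OFFHEAP_HIGHER_MARK:  // ABOVE_ONHEAP_* values in the logs
      return region.getMemStoreOffHeapSize();
    case ABOVE_ONHEAP_LOWER_MARK:
    case ABOVE_ONHEAP_HIGHER_MARK:
      return region.getMemStoreHeapSize();
    default:
      return region.getMemStoreDataSize();
  }
}
{code}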




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-03-14 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16398277#comment-16398277
 ] 

Eshcar Hillel commented on HBASE-20090:
---

I am still not convinced it is necessary. You are not controlling the timing of 
setting writestate.writesEnabled to false.

Consider the case where, during the first round of {{getBiggestMemStoreRegion}}, 
writestate.writesEnabled of region A is true, so A is chosen in that round, and 
then in the second round writestate.writesEnabled of region A is false.

I believe simply adding the region-to-flush to the excluded set, if it was not 
flushed by the end of the round, is sufficient.

Other than that, returning false when we identify that the size of the chosen 
region is 0 seems a legitimate solution to the original problem.
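
A sketch of that excluded-set idea (the loop structure and helper names such as {{flushRegion}} are illustrative, not the committed code):

{code}
// Retry candidate selection until something actually flushes or no candidate
// remains; a region that failed to flush this round is excluded from the
// next one, so a region that started splitting cannot be re-chosen.
Set<HRegion> excludedRegions = new HashSet<>();
while (true) {
  HRegion regionToFlush =
      getBiggestMemStoreRegion(regionsBySize, excludedRegions, false);
  if (regionToFlush == null) {
    return false;                     // nothing left to try
  }
  if (flushRegion(regionToFlush)) {   // flush really happened
    return true;
  }
  excludedRegions.add(regionToFlush); // e.g. it began splitting meanwhile
}
{code}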




--

[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-03-13 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397096#comment-16397096
 ] 

Ted Yu commented on HBASE-20090:


bq. What can go wrong if you do not add A into the Set
{code}
  excludedRegions.add(region);
{code}
If the above line is omitted, null would still be returned in the scenario we 
are discussing (region A splitting and region B having 0 size).
The above line is there to skip region A in the second call to 
{{getBiggestMemStoreRegion}}, since we know region A is splitting.
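
For context, the relevant skip logic inside {{getBiggestMemStoreRegion}} looks roughly like this (simplified):

{code}
// Inside the candidate loop: a region that is already flushing or has writes
// disabled (e.g. mid-split, like region A) cannot flush now, so it is both
// skipped in this round and excluded from later selection rounds.
if (excludedRegions.contains(region)) {
  continue;
}
if (region.writestate.flushing || !region.writestate.writesEnabled) {
  excludedRegions.add(region);
  continue;
}
return region;
{code}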




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-03-13 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397086#comment-16397086
 ] 

Eshcar Hillel commented on HBASE-20090:
---

Yes, to this set.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-03-13 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397080#comment-16397080
 ] 

Ted Yu commented on HBASE-20090:


bq. do not add A into the list

Can you be specific about which list?
{{excludedRegions}} is a Set.

Were you referring to the following addition in the second if statement of 
{{getBiggestMemStoreRegion}}?
{code}
  excludedRegions.add(region);
{code}




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-03-13 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397075#comment-16397075
 ] 

Eshcar Hillel commented on HBASE-20090:
---

What can go wrong if you do not add A into the list (just continue to the next 
entry) and then return null since B's size is zero?



[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-03-13 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397063#comment-16397063
 ] 

Ted Yu commented on HBASE-20090:


{{getBiggestMemStoreRegion}} is called twice with different values for 
{{checkStoreFileCount}}.
Suppose the region is added to exclusion in the following block during the 
first call:
{code}
  if (checkStoreFileCount && isTooManyStoreFiles(region))
{code}
In the second call to {{getBiggestMemStoreRegion}}, the region would be checked 
against {{excludedRegions}} at the beginning of the loop and not given a chance 
to execute the above check.
Note: the value for {{checkStoreFileCount}} is different during the second call.
That is why I didn't add the region to exclusion in the above block.
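
To make the interplay of the two calls concrete, here is a minimal, self-contained sketch (plain placeholder types rather than the real HBase classes; the store-file threshold is an assumption):
{code:java}
import java.util.*;

// Sketch: a region skipped by the store-file check in the first call must
// NOT be added to excludedRegions, because the second call passes
// checkStoreFileCount = false and still needs to see that region.
public class FlushSelectionSketch {
  static class Region {
    final String name; final long memstoreSize; final int storeFiles;
    Region(String n, long s, int f) { name = n; memstoreSize = s; storeFiles = f; }
    @Override public String toString() { return name; }
  }

  static final int TOO_MANY_STORE_FILES = 16; // assumed threshold

  static Region getBiggestMemStoreRegion(SortedMap<Long, Region> regionsBySize,
      Set<Region> excludedRegions, boolean checkStoreFileCount) {
    for (Region region : regionsBySize.values()) { // iterates largest-first
      if (excludedRegions.contains(region)) {
        continue; // excluded once means skipped in every subsequent call
      }
      if (checkStoreFileCount && region.storeFiles > TOO_MANY_STORE_FILES) {
        continue; // skip for this call only; do not poison excludedRegions
      }
      return region;
    }
    return null;
  }

  public static void main(String[] args) {
    SortedMap<Long, Region> regionsBySize = new TreeMap<>(Comparator.reverseOrder());
    Region a = new Region("A", 400L, 20); // big memstore, too many store files
    regionsBySize.put(a.memstoreSize, a);
    Set<Region> excludedRegions = new HashSet<>();
    // First call (checkStoreFileCount = true): A is skipped, prints null.
    System.out.println(getBiggestMemStoreRegion(regionsBySize, excludedRegions, true));
    // Second call (checkStoreFileCount = false): A must still be found, prints A.
    System.out.println(getBiggestMemStoreRegion(regionsBySize, excludedRegions, false));
  }
}
{code}
Had the first call added A to {{excludedRegions}}, the second call would also return null, and the flusher would see nothing to flush even though A holds the bulk of the memstore.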


[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-03-13 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397017#comment-16397017
 ] 

Eshcar Hillel commented on HBASE-20090:
---

So the suggested solution in v7 is to add region A into the excluded list and 
not to return B as a candidate since its size is 0, but instead to return null. 
This null is then identified in line 187, returning false to the caller since 
no flush was executed.

Question: why do we add A into the excluded list, while in the following if
{code:java}
if (checkStoreFileCount && isTooManyStoreFiles(region)){code}
we do not add the region into this list?



[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-03-12 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16396552#comment-16396552
 ] 

ramkrishna.s.vasudevan commented on HBASE-20090:


I like Ted's patch. In the case where you add to excluded regions, it may help 
to avoid that region in subsequent iterations as well, if for some reason 
other regions were not flushed.
Regarding the scenario of why this happened, I think it is a valid case. 
[~eshcar] - Do you want to have a look at v7?




[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-03-12 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16396158#comment-16396158
 ] 

Ted Yu commented on HBASE-20090:


Eshcar:

bq. calling flushOneForGlobalPressure region A is not selected (correct?),

Right. See the following condition in getBiggestMemStoreRegion():

{code}
if (region.writestate.flushing || !region.writestate.writesEnabled) {
  continue;
}
{code}



See also the log line from 20090-server-61260-01-07.log :

{code}
2018-03-02 17:28:30,096 DEBUG [MemStoreFlusher.0] regionserver.HRegion: NOT 
flushing memstore for region 
TestTable,,1520011528142.0453f29030757eedb6e6a1c57e88c085., flushing=false, 
writesEnabled=false
{code}
bq. Is this the right way to describe the problem?

Your description is mostly the same as my understanding (with a minor 
correction below).

bq. is it because its size is set to 0 or since it is marked as non-flushable?

See the following log line from 20090-server-61260-01-07.log :
{code}
2018-03-02 17:28:30,298 DEBUG [MemStoreFlusher.1] regionserver.MemStoreFlusher: 
region TestTable,,1520011528142.0453f29030757eedb6e6a1c57e88c085. with size 
400432696
{code}
The size of region (A) was non-zero, but writesEnabled being false resulted 
in the region not being included in the candidates.

bq. adding a check in line 187 would solve the problem?

That may work.
However, please take a look at patch v7 where the exclusion of ineligible 
regions is done in getBiggestMemStoreRegion().
With patch v7, the existing check on line 187 should suffice.

Thanks
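
For readers following the v7 discussion, here is a hedged, self-contained sketch of the exclude-and-retry idea (placeholder types; the structure is an assumption drawn from this thread and the "Excluding unflushable region" log lines, not copied from the patch):
{code:java}
import java.util.*;

// Sketch: an unflushable candidate is excluded and the search retried, so
// the caller gets a normal boolean instead of an IllegalStateException
// from the Precondition.
public class FlushRetrySketch {
  static class Region {
    final String name; final long memstoreSize;
    Region(String n, long s) { name = n; memstoreSize = s; }
  }

  static Region biggest(List<Region> regions, Set<Region> excluded) {
    return regions.stream()
        .filter(r -> !excluded.contains(r))
        .max(Comparator.comparingLong(r -> r.memstoreSize))
        .orElse(null);
  }

  static boolean flushOneForGlobalPressure(List<Region> regions) {
    Set<Region> excluded = new HashSet<>();
    while (true) {
      Region best = biggest(regions, excluded);
      if (best == null) {
        // The analogue of the existing check on line 187: nothing to flush.
        System.err.println("Above memory mark but there are no flushable regions!");
        return false;
      }
      if (best.memstoreSize == 0) {
        System.out.println("Excluding unflushable region " + best.name
            + " - trying to find a different region to flush.");
        excluded.add(best);
        continue; // look for the next-biggest candidate
      }
      System.out.println("Flushing " + best.name);
      return true;
    }
  }

  public static void main(String[] args) {
    // Region A (splitting) is assumed already filtered out by its
    // writestate; only region B with memstore size 0 remains visible.
    System.out.println(flushOneForGlobalPressure(List.of(new Region("B", 0))));
  }
}
{code}
Running the sketch excludes B, finds no other candidate, and returns false, which is the normal return that replaces the Precondition failure.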

[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-03-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16395817#comment-16395817
 ] 

Hadoop QA commented on HBASE-20090:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
19s{color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} patch {color} | {color:blue}  0m  
2s{color} | {color:blue} The patch file was not named according to hbase's 
naming conventions. Please see 
https://yetus.apache.org/documentation/0.7.0/precommit-patchnames for 
instructions. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
23s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
42s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
10s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  5m 
51s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
52s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
51s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
19m 28s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.5 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}109m 
26s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}152m  0s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-20090 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12914097/20090.v7.txt |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  shadedjars  
hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux e7481f8e84a3 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / c8fba7071e |
| maven | version: Apache Maven 3.5.3 
(3383c37e1f9e9b3bc3df5050c29c8aff9f295297; 2018-02-24T19:49:05Z) |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC3 |
|  Test Results | 

[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-03-12 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16395229#comment-16395229
 ] 

Eshcar Hillel commented on HBASE-20090:
---

I am trying to see if I understand the discussed scenario correctly:
 * Region A (TestTable) and region B (Atlas table) are the only two regions in 
the region server.
 * The region server exceeds the global pressure threshold (380MB).
 * Region A has 400MB while region B has 0MB.
 * Region A has more than 5GB of data on disk (6.9GB), hence a split is 
triggered.
 * Finally (this is what is not 100% clear): in the next loop calling 
flushOneForGlobalPressure region A is not selected (correct?), region B is 
selected but since it has size 0 the precondition test throws an exception.

Is this the right way to describe the problem? Can you explain again why region 
A is not selected: is it because its size is set to 0 or since it is marked as 
non-flushable?

To summarize, if I understand correctly: there is a period of time where the 
global memstore size is high (>380MB) and, at the same time, the region holding 
all this data is being split and hence is in the process of being flushed. As a 
result, the memstore flusher that is triggered due to global pressure finds 
itself in an inconsistent state:

{color:#008000}"Above memory mark but there are no flushable regions!"{color}

Do you think that adding a check in line 187 would solve the problem?
{code:java}
if (bestAnyRegion == null
    || bestAnyRegion.getMemStoreDataSize() == 0) { // NEW
  LOG.error("Above memory mark but there are no flushable regions!");
  return false;
}{code}
 

[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-03-06 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389057#comment-16389057
 ] 

Ted Yu commented on HBASE-20090:


I also thought about adding a test.

Since the race condition revolves around the online region traversal in 
flushOneForGlobalPressure(), it seems some knobs (which the test can control) 
would need to be added in flushOneForGlobalPressure(). I am not sure whether 
that can be achieved in a clean way.



[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-03-06 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389046#comment-16389046
 ] 

Ted Yu commented on HBASE-20090:


w.r.t. {{getBiggestMemStoreRegion}}, there is a 3rd parameter:
{code}
  boolean checkStoreFileCount) {
{code}
which is different for the two invocations.
I don't think the first invocation can add the region to {{excludedRegions}} 
simply because the region doesn't pass the check in the current call.
It seems more refactoring is needed to achieve what Ram suggested above.



[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-03-06 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16388601#comment-16388601
 ] 

stack commented on HBASE-20090:
---

Thanks.

While the Enis commit is from 3 years ago, in case you did not know, memory 
accounting has been redone lately by [~eshcar]. I would suggest you consult 
with the authority before you make an 'enhancement'.



[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-03-06 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16388591#comment-16388591
 ] 

Ted Yu commented on HBASE-20090:


I have given an example of the exception from the region server log above. 
Here is the snippet again:
{code}
2018-02-26 16:06:50,049 ERROR [MemStoreFlusher.1] regionserver.MemStoreFlusher: 
Cache flusher failed for entry org.apache.hadoop.hbase.regionserver.MemStoreFlusher$WakeupFlushThread@2adfadd7
java.lang.IllegalStateException
at 
org.apache.hbase.thirdparty.com.google.common.base.Preconditions.checkState(Preconditions.java:441)
at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:174)
at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$600(MemStoreFlusher.java:69)
at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:237)
at java.lang.Thread.run(Thread.java:748)
{code}
bq. Who wrote this code?
{code}
4ac42a2f 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreFlusher.java
 (Enis Soztutar   2015-03-06 14:32:05 -0800 256)   
Preconditions.checkState(
{code}
{code}
commit 4ac42a2f56b91ce864d1bcb04f1f9950e527aab1
Author: Enis Soztutar 
Date:   Fri Mar 6 14:32:05 2015 -0800

HBASE-12562 Handling memory pressure for secondary region replicas
{code}
From 3 years ago.


[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-03-06 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16388586#comment-16388586
 ] 

stack commented on HBASE-20090:
---

bq. Who wrote this code? Flag them. I'm sure they'd be interested. Do you have 
an example of the exception or this just splunking?

I asked this above, early in this issue. Were these questions answered?



[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-03-06 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16388569#comment-16388569
 ] 

Ted Yu commented on HBASE-20090:


Looking at the javadoc for getCopyOfOnlineRegionsSortedByOffHeapSize():
{code}
   * the biggest. If two regions are the same size, then the last one found wins; i.e. this
   * method may NOT return all regions.
{code}
Currently the value type is HRegion, so we only store one region per size.
One enhancement is to change the value type to a Collection so that we don't 
miss any region (potentially one with a big size).
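
To make the enhancement concrete, here is a minimal, runnable sketch (a 
hypothetical SizeIndexDemo class with a simplified Region stand-in, not HBase's 
HRegion) showing how a Collection-valued map retains same-size regions that a 
single-valued map silently drops:
{code}
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class SizeIndexDemo {
  // Hypothetical stand-in for HRegion; only a name and a size matter here.
  record Region(String name, long memStoreSize) {}

  public static void main(String[] args) {
    TreeMap<Long, Region> oneRegionPerSize = new TreeMap<>();        // current shape
    TreeMap<Long, List<Region>> allRegionsPerSize = new TreeMap<>(); // proposed shape

    for (Region r : List.of(new Region("regionA", 100L), new Region("regionB", 100L))) {
      oneRegionPerSize.put(r.memStoreSize(), r); // regionB silently replaces regionA
      allRegionsPerSize.computeIfAbsent(r.memStoreSize(), k -> new ArrayList<>()).add(r);
    }

    System.out.println(oneRegionPerSize.values());   // only regionB survives
    System.out.println(allRegionsPerSize.get(100L)); // both regions retained
  }
}
{code}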






[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-03-06 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387971#comment-16387971
 ] 

Ted Yu commented on HBASE-20090:


My understanding is that flushOneForGlobalPressure() has a better view of the 
regions' state than getBiggestMemStoreRegion() does.
getBiggestMemStoreRegion() returns null if no region is found.

Ram:
If you can be more specific about how the additional check could be placed 
inside getBiggestMemStoreRegion(), I would be happy to pursue it.

Thanks






[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-03-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387349#comment-16387349
 ] 

Hadoop QA commented on HBASE-20090:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
20s{color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} patch {color} | {color:blue}  0m  
3s{color} | {color:blue} The patch file was not named according to hbase's 
naming conventions. Please see 
https://yetus.apache.org/documentation/0.7.0/precommit-patchnames for 
instructions. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
 2s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
44s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 7s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  5m 
57s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
8s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 8s{color} | {color:green} hbase-server: The patch generated 0 new + 29 
unchanged - 1 fixed = 29 total (was 30) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
47s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
19m 52s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.5 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}136m 
52s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}181m 33s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-20090 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12913112/20090.v6.txt |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  shadedjars  
hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 190f8227d5cc 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 
11:55:51 UTC 2017 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 4a4c012049 |
| maven | version: Apache Maven 3.5.2 
(138edd61fd100ec658bfa2d307c43b76940a5d7d; 2017-10-18T07:58:13Z) |
| Default Java | 

[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-03-05 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387313#comment-16387313
 ] 

ramkrishna.s.vasudevan commented on HBASE-20090:


I think the case here is valid. As [~anoop.hbase] said, this region size 
calculation was changed recently, so we should be careful here.
Instead of changing the precondition to a normal 'if' clause, can we add a 
check for whether the region holds 0 data in this method:
{code}
getBiggestMemStoreRegion()
{code}
It already has code for the concurrency case you describe, where a split has 
happened and the region has been marked as not eligible for flush. So I think 
it would be better to add that check there?
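
For illustration only, here is a minimal, runnable sketch of that suggestion 
(a hypothetical BiggestRegionDemo with a simplified Region; the real 
getBiggestMemStoreRegion() works on HBase's own types and locking). Skipping 
zero-size regions inside the selection loop means the caller never sees an 
empty candidate:
{code}
import java.util.List;
import java.util.Set;

public class BiggestRegionDemo {
  // Simplified stand-ins; the real HBase types carry much more state.
  record Region(String name, long memStoreSize, boolean writesEnabled) {}

  static Region getBiggestMemStoreRegion(List<Region> regionsBySize, Set<String> excluded) {
    for (Region r : regionsBySize) {               // assumed sorted biggest-first
      if (excluded.contains(r.name())) continue;   // already tried and skipped
      if (!r.writesEnabled()) continue;            // e.g. mid-split
      if (r.memStoreSize() == 0) continue;         // the proposed extra check
      return r;
    }
    return null; // caller must treat "nothing flushable" as a normal outcome
  }

  public static void main(String[] args) {
    List<Region> regions = List.of(
        new Region("TestTable", 400_432_696L, false), // being split
        new Region("atlas_janus", 0L, true));         // empty memstore
    System.out.println(getBiggestMemStoreRegion(regions, Set.of())); // null, no ISE
  }
}
{code}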







[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-03-05 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385924#comment-16385924
 ] 

Ted Yu commented on HBASE-20090:


Here is how I produced the server log with the additional DEBUG information:

* Wipe out TestTable
* Add the DEBUG log, build a tarball, and load the tarball onto the cluster
* Observe that TestTable has 8 regions, then search for the new DEBUG log in 
each region server log

The above procedure was repeated several times until I got what you saw 
attached as 20090-server-61260-01-07.log

> Properly handle Preconditions check failure in 
> MemStoreFlusher$FlushHandler.run
> ---
>
> Key: HBASE-20090
> URL: https://issues.apache.org/jira/browse/HBASE-20090
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
> Attachments: 20090-server-61260-01-07.log, 20090.v1.txt, 
> 20090.v4.txt, 20090.v5.txt
>
>
> Here is the code in branch-2 :
> {code}
> try {
>   wakeupPending.set(false); // allow someone to wake us up again
>   fqe = flushQueue.poll(threadWakeFrequency, TimeUnit.MILLISECONDS);
>   if (fqe == null || fqe instanceof WakeupFlushThread) {
> ...
>   if (!flushOneForGlobalPressure()) {
> ...
>   FlushRegionEntry fre = (FlushRegionEntry) fqe;
>   if (!flushRegion(fre)) {
> break;
> ...
> } catch (Exception ex) {
>   LOG.error("Cache flusher failed for entry " + fqe, ex);
>   if (!server.checkFileSystem()) {
> break;
>   }
> }
> {code}
> Inside flushOneForGlobalPressure():
> {code}
>   Preconditions.checkState(
> (regionToFlush != null && regionToFlushSize > 0) ||
> (bestRegionReplica != null && bestRegionReplicaSize > 0));
> {code}
> When the Preconditions check fails, IllegalStateException is caught by the 
> catch block shown above.
> However, the fqe is not flushed, resulting in potential data loss.





[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-03-05 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385919#comment-16385919
 ] 

Ted Yu commented on HBASE-20090:


In another version of the patch I composed, fqe is passed to 
flushOneForGlobalPressure().
We can then add the condition that when fqe == WAKEUPFLUSH_INSTANCE, it is okay 
to return early from flushOneForGlobalPressure(), because of the race condition 
described above.
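
A minimal, runnable sketch of that idea (a hypothetical WakeupEarlyReturnDemo; 
WAKEUPFLUSH_INSTANCE is modeled as a plain sentinel object, and the real 
method's signature differs):
{code}
public class WakeupEarlyReturnDemo {
  // Sentinel modeled after WAKEUPFLUSH_INSTANCE; the real entry type differs.
  static final Object WAKEUPFLUSH_INSTANCE = new Object();

  // When invoked for a mere wakeup and no candidate survived the race (e.g.
  // the biggest region started splitting), returning early is safe: the split
  // finishes and the region becomes flushable again on a later wakeup.
  static boolean flushOneForGlobalPressure(Object fqe, Object candidate, long candidateSize) {
    boolean haveCandidate = candidate != null && candidateSize > 0;
    if (!haveCandidate && fqe == WAKEUPFLUSH_INSTANCE) {
      return false; // nothing flushable right now; no IllegalStateException
    }
    // ... a real implementation would flush the candidate here ...
    return haveCandidate;
  }

  public static void main(String[] args) {
    System.out.println(flushOneForGlobalPressure(WAKEUPFLUSH_INSTANCE, null, 0L)); // false
  }
}
{code}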






[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-03-05 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385904#comment-16385904
 ] 

Ted Yu commented on HBASE-20090:


Relevant region server log snippet attached.
See whether the snippet provides what you were looking for.






[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-03-04 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385709#comment-16385709
 ] 

Anoop Sam John commented on HBASE-20090:


I read only the description, not the whole comment thread, Ted. Sorry.. Ya, my 
doubt runs along the same lines.. But since there were recent changes to the 
size calculation and to the way regions are selected, I thought this needed a 
careful eye. Anyway, that whole area needs to be thoroughly reviewed once 
again. Fine - tomorrow, or whenever you have some time, try to reproduce it and 
gather all the nearby debug logs.






[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-03-04 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385681#comment-16385681
 ] 

Ted Yu commented on HBASE-20090:


bq. Is this reproducible?

Though not 100% reproducible, this has a high probability of happening when 
the data ingestion rate is high and regions split during ingestion.






[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-03-04 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385676#comment-16385676
 ] 

Ted Yu commented on HBASE-20090:


Thanks for taking a look, Anoop.

Did you read my explanation above the patch v01 QA result?

bq. by the time the global heap pressure flush try to do the flush, the size 
become zero

There were two regions on the region server of interest; region 
0453f29030757eedb6e6a1c57e88c085 was being split.
From the log I added, we can see that it appeared at 2018-03-02 17:28:30,298 
with a non-zero size.
However, by the time the following loop kicked in:
{code}
while (!flushedOne) {
{code}
it had started splitting. Therefore the other region, with memstore size 0, 
was picked up, and the Precondition check failed due to the 0 memstore size.
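
For illustration, here is a minimal sequential simulation of that window (a 
hypothetical FlushRaceDemo with a simplified Region; the real race interleaves 
threads, this just replays the ordering):
{code}
import java.util.List;

public class FlushRaceDemo {
  // Hypothetical, simplified region: a name, a memstore size, a splitting flag.
  static final class Region {
    final String name;
    final long memStoreSize;
    boolean splitting;
    Region(String name, long memStoreSize) {
      this.name = name;
      this.memStoreSize = memStoreSize;
    }
  }

  public static void main(String[] args) {
    Region testTable = new Region("TestTable", 400_432_696L);
    Region atlasJanus = new Region("atlas_janus", 0L);

    // Step 1: the flusher sees the regions sorted by size, biggest first.
    List<Region> regionsBySize = List.of(testTable, atlasJanus);

    // Step 2: inside the while (!flushedOne) loop, a split starts and marks
    // the biggest region unflushable -- this is the race window.
    testTable.splitting = true;

    // Step 3: selection skips the splitting region and lands on the empty one.
    Region candidate = regionsBySize.stream()
        .filter(r -> !r.splitting)
        .findFirst()
        .orElse(null);
    System.out.println(candidate.name + " size=" + candidate.memStoreSize);
    // Prints "atlas_janus size=0": checkState(... size > 0 ...) now fails.
  }
}
{code}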

I was thinking of other ways to fix this concurrency issue but ended up picking 
what you see in patch v4. The rationale is that the region being split will 
finish splitting and become eligible for future flushing; the temporary 
suspension of flushing is lifted later.

I can dig up more of the logs tomorrow - it is late in California.






[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-03-04 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385659#comment-16385659
 ] 

Anoop Sam John commented on HBASE-20090:


The thing to check is why a best region with a heap size of 0 was selected 
when it was about to flush. The selection of the best region for flush is 
already based on the heap size of the region. There were some changes in this 
area recently, about tracking heap size rather than data size for the flush 
decision, etc. But on first look that area does not seem to be doing any new 
harm. Is this reproducible? If so, we need DEBUG logs, and please paste the 
nearby logs as well. Is it that the selected region was about to flush anyway 
(because of the per-region flush decision), so that by the time the global 
heap pressure flush tried to do the flush, the size had become zero? There is 
a time gap between the place where the best region is selected and where we 
assign its heap size to a variable. We can only say all this clearly if we 
have the nearby logs. So IMO, rather than a simple fix, we should investigate 
the real reason. This fix may well end up being the one we actually make, but 
before that please do a thorough investigation. I think this is a good jira to 
look at and investigate.






[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385519#comment-16385519
 ] 

Hadoop QA commented on HBASE-20090:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} patch {color} | {color:blue}  0m  
1s{color} | {color:blue} The patch file was not named according to hbase's 
naming conventions. Please see 
https://yetus.apache.org/documentation/0.7.0/precommit-patchnames for 
instructions. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
18s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
44s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 5s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  5m 
49s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
56s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 5s{color} | {color:green} hbase-server: The patch generated 0 new + 29 
unchanged - 1 fixed = 29 total (was 30) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
51s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
19m 37s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.5 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}120m 
41s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}163m 34s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-20090 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12912958/20090.v5.txt |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  shadedjars  
hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux a269a8957eed 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 485af49e53 |
| maven | version: Apache Maven 3.5.2 
(138edd61fd100ec658bfa2d307c43b76940a5d7d; 2017-10-18T07:58:13Z) |
| Default Java | 

[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-03-04 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385441#comment-16385441
 ] 

Ted Yu commented on HBASE-20090:


Patch v5 addresses the checkstyle warning.






[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385434#comment-16385434
 ] 

Hadoop QA commented on HBASE-20090:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} patch {color} | {color:blue}  0m  
1s{color} | {color:blue} The patch file was not named according to hbase's 
naming conventions. Please see 
https://yetus.apache.org/documentation/0.7.0/precommit-patchnames for 
instructions. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
19s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
42s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 4s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  5m 
52s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
55s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  1m  
5s{color} | {color:red} hbase-server: The patch generated 1 new + 30 unchanged 
- 0 fixed = 31 total (was 30) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
41s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
18m 16s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.5 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}104m 
43s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
21s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}145m 57s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-20090 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12912954/20090.v4.txt |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  shadedjars  
hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux c8ae7df5c2b0 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 485af49e53 |
| maven | version: Apache Maven 3.5.2 
(138edd61fd100ec658bfa2d307c43b76940a5d7d; 2017-10-18T07:58:13Z) |
| Default Java | 1.8.0_151 |

[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-03-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16384221#comment-16384221
 ] 

Hadoop QA commented on HBASE-20090:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
23s{color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} patch {color} | {color:blue}  0m  
2s{color} | {color:blue} The patch file was not named according to hbase's 
naming conventions. Please see 
https://yetus.apache.org/documentation/0.7.0/precommit-patchnames for 
instructions. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
40s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
41s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 4s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  5m 
48s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
56s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
35s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  1m  
4s{color} | {color:red} hbase-server: The patch generated 1 new + 30 unchanged 
- 0 fixed = 31 total (was 30) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
40s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
18m 41s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.5 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}136m 
23s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
19s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}178m 26s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-20090 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12912811/20090.v1.txt |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  shadedjars  
hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 57fbe6304939 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 
11:55:51 UTC 2017 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 1d25b60831 |
| maven | version: Apache Maven 3.5.2 
(138edd61fd100ec658bfa2d307c43b76940a5d7d; 2017-10-18T07:58:13Z) |
| Default Java | 1.8.0_151 |

[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-03-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383979#comment-16383979
 ] 

Hadoop QA commented on HBASE-20090:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  3m 
11s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
52s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
20s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
17s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  5m 
16s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
37s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
21s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
17s{color} | {color:red} hbase-mapreduce: The patch generated 3 new + 51 
unchanged - 0 fixed = 54 total (was 51) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
36s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
19m 24s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.5 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
15s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 15m 
18s{color} | {color:green} hbase-mapreduce in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
14s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 56m  1s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-20090 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12912717/20094.v01.patch |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  shadedjars  
hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux d1c2d38fbd0e 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 
11:55:51 UTC 2017 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 1d25b60831 |
| maven | version: Apache Maven 3.5.2 
(138edd61fd100ec658bfa2d307c43b76940a5d7d; 2017-10-18T07:58:13Z) |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC3 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HBASE-Build/11782/artifact/patchprocess/diff-checkstyle-hbase-mapreduce.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/11782/testReport/ |
| Max. process+thread count | 4391 (vs. ulimit of 1) |
| modules | C: hbase-mapreduce U: hbase-mapreduce |
| Console output | 

[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-03-02 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383887#comment-16383887
 ] 

Ted Yu commented on HBASE-20090:


With more debug logging, we can see the scenario where the exception was 
triggered.
{code}
2018-03-02 17:28:30,097 DEBUG [MemStoreFlusher.0] regionserver.CompactSplit: 
Splitting TestTable,,1520011528142.0453f29030757eedb6e6a1c57e88c085., 
compaction_queue=(0:0), split_queue=1
2018-03-02 17:28:30,098 DEBUG 
[RpcServer.priority.FPBQ.Fifo.handler=19,queue=1,port=16020] 
regionserver.IncreasingToUpperBoundRegionSplitPolicy: ShouldSplit because info  
size=6.9G, sizeToCheck=256.0M, regionsWithCommonTable=1
2018-03-02 17:28:30,296 INFO  
[RpcServer.default.FPBQ.Fifo.handler=24,queue=0,port=16020] 
regionserver.MemStoreFlusher: wake up flusher due to ABOVE_ONHEAP_LOWER_MARK
2018-03-02 17:28:30,297 DEBUG [MemStoreFlusher.1] regionserver.MemStoreFlusher: 
Flush thread woke up because memory above low water=381.5 M
2018-03-02 17:28:30,297 INFO  
[RpcServer.default.FPBQ.Fifo.handler=25,queue=1,port=16020] 
regionserver.MemStoreFlusher: wake up flusher due to ABOVE_ONHEAP_LOWER_MARK
2018-03-02 17:28:30,298 DEBUG [MemStoreFlusher.1] regionserver.MemStoreFlusher: 
region TestTable,,1520011528142.0453f29030757eedb6e6a1c57e88c085. with size 
400432696
2018-03-02 17:28:30,298 DEBUG [MemStoreFlusher.1] regionserver.MemStoreFlusher: 
region atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae. with size 0
2018-03-02 17:28:30,298 INFO  [MemStoreFlusher.1] regionserver.MemStoreFlusher: 
Flush of region atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae. 
due to global heap pressure. Flush type=ABOVE_ONHEAP_LOWER_MARK, Total 
Memstore Heap size=381.9 M, Total Memstore Off-Heap size=0, Region memstore size=0
2018-03-02 17:28:30,298 INFO  [MemStoreFlusher.1] regionserver.MemStoreFlusher: 
wake up by WAKEUPFLUSH_INSTANCE
2018-03-02 17:28:30,298 INFO  [MemStoreFlusher.1] regionserver.MemStoreFlusher: 
Nothing to flush for 
atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae.
2018-03-02 17:28:30,298 INFO  [MemStoreFlusher.1] regionserver.MemStoreFlusher: 
Excluding unflushable region 
atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae. - trying to 
find a different region to flush.
{code}
Region 0453f29030757eedb6e6a1c57e88c085 was being split.
By the time MemStoreFlusher reached this call, that region was no longer a 
flush candidate:
{code}
  HRegion bestFlushableRegion =
  getBiggestMemStoreRegion(regionsBySize, excludedRegions, true);
{code}
So the other region, 
atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae., was examined 
next. Since that region was not receiving writes (its memstore size was 0), 
the (current) Precondition check failed.
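
Concretely, plugging the logged values into the check shows why it threw. The 
values below are a reconstruction from the log above, not an actual trace:
{code}
// Reconstructed scenario (assumed values, taken from the log above):
//   regionToFlush     = the atlas_janus region  (the next candidate examined)
//   regionToFlushSize = 0                       ("Region memstore size=0")
//   bestRegionReplica = null                    (no region replica involved)
Preconditions.checkState(
    (regionToFlush != null && regionToFlushSize > 0) ||        // true && false
    (bestRegionReplica != null && bestRegionReplicaSize > 0)); // false
// Both disjuncts are false, so checkState(false) throws the bare
// IllegalStateException seen in the stack trace.
{code}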

The proposed fix is to convert the Precondition check into a normal condition 
check that logs the situation instead of throwing.
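
For illustration, here is a minimal sketch of that conversion, reusing the 
method's existing local variables and LOG (an assumption for the sketch; this 
is not the committed patch):
{code}
// Sketch only: replace the hard assertion with a normal condition check
// that logs the situation and reports "no flush done" instead of throwing.
if (!((regionToFlush != null && regionToFlushSize > 0) ||
      (bestRegionReplica != null && bestRegionReplicaSize > 0))) {
  LOG.info("No flushable region with a positive memstore size was found; "
      + "unable to relieve global memstore pressure this round.");
  return false; // flushOneForGlobalPressure() signals nothing was flushed
}
{code}
As the quoted run() snippet shows, the caller already checks the return value 
of flushOneForGlobalPressure(), so returning false here keeps the flush 
handler running instead of bouncing through the catch block on an 
IllegalStateException.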

> Properly handle Preconditions check failure in 
> MemStoreFlusher$FlushHandler.run
> ---
>
> Key: HBASE-20090
> URL: https://issues.apache.org/jira/browse/HBASE-20090
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
> Attachments: 20094.v01.patch
>
>
> Here is the code in branch-2 :
> {code}
> try {
>   wakeupPending.set(false); // allow someone to wake us up again
>   fqe = flushQueue.poll(threadWakeFrequency, TimeUnit.MILLISECONDS);
>   if (fqe == null || fqe instanceof WakeupFlushThread) {
> ...
>   if (!flushOneForGlobalPressure()) {
> ...
>   FlushRegionEntry fre = (FlushRegionEntry) fqe;
>   if (!flushRegion(fre)) {
> break;
> ...
> } catch (Exception ex) {
>   LOG.error("Cache flusher failed for entry " + fqe, ex);
>   if (!server.checkFileSystem()) {
> break;
>   }
> }
> {code}
> Inside flushOneForGlobalPressure():
> {code}
>   Preconditions.checkState(
> (regionToFlush != null && regionToFlushSize > 0) ||
> (bestRegionReplica != null && bestRegionReplicaSize > 0));
> {code}
> When the Preconditions check fails, IllegalStateException is caught by the 
> catch block shown above.
> However, the fqe is not flushed, resulting in potential data loss.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-03-02 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383743#comment-16383743
 ] 

Ted Yu commented on HBASE-20090:


Did some more testing on a 5-region-server cluster with additional logging.
TestTable had 8 regions when PerformanceEvaluation (PE) stopped.

However, the additional log didn't show up in any of the region server logs.

Will conduct more testing.

> Properly handle Preconditions check failure in 
> MemStoreFlusher$FlushHandler.run
> ---
>
> Key: HBASE-20090
> URL: https://issues.apache.org/jira/browse/HBASE-20090
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
> Attachments: 20094.v01.patch
>
>
> Here is the code in branch-2 :
> {code}
> try {
>   wakeupPending.set(false); // allow someone to wake us up again
>   fqe = flushQueue.poll(threadWakeFrequency, TimeUnit.MILLISECONDS);
>   if (fqe == null || fqe instanceof WakeupFlushThread) {
> ...
>   if (!flushOneForGlobalPressure()) {
> ...
>   FlushRegionEntry fre = (FlushRegionEntry) fqe;
>   if (!flushRegion(fre)) {
> break;
> ...
> } catch (Exception ex) {
>   LOG.error("Cache flusher failed for entry " + fqe, ex);
>   if (!server.checkFileSystem()) {
> break;
>   }
> }
> {code}
> Inside flushOneForGlobalPressure():
> {code}
>   Preconditions.checkState(
> (regionToFlush != null && regionToFlushSize > 0) ||
> (bestRegionReplica != null && bestRegionReplicaSize > 0));
> {code}
> When the Preconditions check fails, IllegalStateException is caught by the 
> catch block shown above.
> However, the fqe is not flushed, resulting in potential data loss.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-03-01 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383215#comment-16383215
 ] 

stack commented on HBASE-20090:
---

Why are you posting a patch when you've not said what the problem 
is nor how the patch supposedly fixes it, 
[~yuzhih...@gmail.com]?

> Properly handle Preconditions check failure in 
> MemStoreFlusher$FlushHandler.run
> ---
>
> Key: HBASE-20090
> URL: https://issues.apache.org/jira/browse/HBASE-20090
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
> Attachments: 20094.v01.patch
>
>
> Here is the code in branch-2 :
> {code}
> try {
>   wakeupPending.set(false); // allow someone to wake us up again
>   fqe = flushQueue.poll(threadWakeFrequency, TimeUnit.MILLISECONDS);
>   if (fqe == null || fqe instanceof WakeupFlushThread) {
> ...
>   if (!flushOneForGlobalPressure()) {
> ...
>   FlushRegionEntry fre = (FlushRegionEntry) fqe;
>   if (!flushRegion(fre)) {
> break;
> ...
> } catch (Exception ex) {
>   LOG.error("Cache flusher failed for entry " + fqe, ex);
>   if (!server.checkFileSystem()) {
> break;
>   }
> }
> {code}
> Inside flushOneForGlobalPressure():
> {code}
>   Preconditions.checkState(
> (regionToFlush != null && regionToFlushSize > 0) ||
> (bestRegionReplica != null && bestRegionReplicaSize > 0));
> {code}
> When the Preconditions check fails, IllegalStateException is caught by the 
> catch block shown above.
> However, the fqe is not flushed, resulting in potential data loss.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-03-01 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383213#comment-16383213
 ] 

stack commented on HBASE-20090:
---

Patch is for the wrong issue?

> Properly handle Preconditions check failure in 
> MemStoreFlusher$FlushHandler.run
> ---
>
> Key: HBASE-20090
> URL: https://issues.apache.org/jira/browse/HBASE-20090
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
> Attachments: 20094.v01.patch
>
>
> Here is the code in branch-2 :
> {code}
> try {
>   wakeupPending.set(false); // allow someone to wake us up again
>   fqe = flushQueue.poll(threadWakeFrequency, TimeUnit.MILLISECONDS);
>   if (fqe == null || fqe instanceof WakeupFlushThread) {
> ...
>   if (!flushOneForGlobalPressure()) {
> ...
>   FlushRegionEntry fre = (FlushRegionEntry) fqe;
>   if (!flushRegion(fre)) {
> break;
> ...
> } catch (Exception ex) {
>   LOG.error("Cache flusher failed for entry " + fqe, ex);
>   if (!server.checkFileSystem()) {
> break;
>   }
> }
> {code}
> Inside flushOneForGlobalPressure():
> {code}
>   Preconditions.checkState(
> (regionToFlush != null && regionToFlushSize > 0) ||
> (bestRegionReplica != null && bestRegionReplicaSize > 0));
> {code}
> When the Preconditions check fails, IllegalStateException is caught by the 
> catch block shown above.
> However, the fqe is not flushed, resulting in potential data loss.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-03-01 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383145#comment-16383145
 ] 

Ted Yu commented on HBASE-20090:


It seems the Preconditions check can be converted to a normal condition check.
[~ram_krish] [~anoop.hbase] [~anastas]:
Can you take a look at the patch?
Here is a snippet from the region server log during PE randomWrite:
{code}
2018-03-02 03:55:19,232 INFO  [MemStoreFlusher.1] regionserver.MemStoreFlusher: 
Flush of region atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae. 
due to global heap pressure. Flush type=ABOVE_ONHEAP_HIGHER_MARK, Total 
Memstore Heap size=403.9 M, Total Memstore Off-Heap size=0, Region memstore size=0
2018-03-02 03:55:19,232 INFO  [MemStoreFlusher.1] regionserver.MemStoreFlusher: 
Nothing to flush for 
atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae.
2018-03-02 03:55:19,232 INFO  [MemStoreFlusher.1] regionserver.MemStoreFlusher: 
Excluding unflushable region 
atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae. - trying to 
find a different region to flush.
{code}
Note that atlas_janus was not the table being written to; TestTable was.

> Properly handle Preconditions check failure in 
> MemStoreFlusher$FlushHandler.run
> ---
>
> Key: HBASE-20090
> URL: https://issues.apache.org/jira/browse/HBASE-20090
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
> Attachments: 20094.v01.patch
>
>
> Here is the code in branch-2 :
> {code}
> try {
>   wakeupPending.set(false); // allow someone to wake us up again
>   fqe = flushQueue.poll(threadWakeFrequency, TimeUnit.MILLISECONDS);
>   if (fqe == null || fqe instanceof WakeupFlushThread) {
> ...
>   if (!flushOneForGlobalPressure()) {
> ...
>   FlushRegionEntry fre = (FlushRegionEntry) fqe;
>   if (!flushRegion(fre)) {
> break;
> ...
> } catch (Exception ex) {
>   LOG.error("Cache flusher failed for entry " + fqe, ex);
>   if (!server.checkFileSystem()) {
> break;
>   }
> }
> {code}
> Inside flushOneForGlobalPressure():
> {code}
>   Preconditions.checkState(
> (regionToFlush != null && regionToFlushSize > 0) ||
> (bestRegionReplica != null && bestRegionReplicaSize > 0));
> {code}
> When the Preconditions check fails, IllegalStateException is caught by the 
> catch block shown above.
> However, the fqe is not flushed, resulting in potential data loss.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-02-26 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377843#comment-16377843
 ] 

Ted Yu commented on HBASE-20090:


Observed the following in a region server log (on a hadoop3 cluster):
{code}
2018-02-26 16:06:49,962 INFO  [MemStoreFlusher.1] regionserver.HRegion: 
Flushing 1/1 column families, memstore=804.67 KB
2018-02-26 16:06:50,028 INFO  [MemStoreFlusher.1] 
regionserver.DefaultStoreFlusher: Flushed, sequenceid=5448, memsize=804.7 K, 
hasBloomFilter=true, into tmp file 
hdfs://mycluster/apps/hbase/data/data/default/TestTable/3552368c92476437cb96e357d2c7d618/.tmp/info/81721cc57fee43ebb55ba430f5730c25
2018-02-26 16:06:50,042 INFO  [MemStoreFlusher.1] regionserver.HStore: Added 
hdfs://mycluster/apps/hbase/data/data/default/TestTable/3552368c92476437cb96e357d2c7d618/info/81721cc57fee43ebb55ba430f5730c25, 
entries=784, sequenceid=5448, filesize=813.9 K
2018-02-26 16:06:50,044 INFO  [MemStoreFlusher.1] regionserver.HRegion: 
Finished memstore flush of ~804.67 KB/823984, currentsize=0 B/0 for region 
TestTable,00155728,1519661093622.3552368c92476437cb96e357d2c7d618. in 
82ms, sequenceid=5448, compaction requested=true
2018-02-26 16:06:50,044 WARN  
[RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=16020] 
regionserver.MemStoreFlusher: Memstore is above high water mark and block 185ms
2018-02-26 16:06:50,044 WARN  
[RpcServer.default.FPBQ.Fifo.handler=28,queue=1,port=16020] 
regionserver.MemStoreFlusher: Memstore is above high water mark and block 163ms
2018-02-26 16:06:50,044 WARN  
[RpcServer.default.FPBQ.Fifo.handler=22,queue=1,port=16020] 
regionserver.MemStoreFlusher: Memstore is above high water mark and block 160ms
2018-02-26 16:06:50,044 WARN  
[RpcServer.default.FPBQ.Fifo.handler=25,queue=1,port=16020] 
regionserver.MemStoreFlusher: Memstore is above high water mark and block 160ms
2018-02-26 16:06:50,044 WARN  
[RpcServer.default.FPBQ.Fifo.handler=23,queue=2,port=16020] 
regionserver.MemStoreFlusher: Memstore is above high water mark and block 158ms
2018-02-26 16:06:50,044 WARN  
[RpcServer.default.FPBQ.Fifo.handler=27,queue=0,port=16020] 
regionserver.MemStoreFlusher: Memstore is above high water mark and block 151ms
2018-02-26 16:06:50,044 WARN  
[RpcServer.default.FPBQ.Fifo.handler=24,queue=0,port=16020] 
regionserver.MemStoreFlusher: Memstore is above high water mark and block 147ms
2018-02-26 16:06:50,044 WARN  
[RpcServer.default.FPBQ.Fifo.handler=26,queue=2,port=16020] 
regionserver.MemStoreFlusher: Memstore is above high water mark and block 135ms

2018-02-26 16:06:50,049 ERROR [MemStoreFlusher.1] regionserver.MemStoreFlusher: 
Cache flusher failed for entry 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher$WakeupFlushThread@2adfadd7
java.lang.IllegalStateException
at 
org.apache.hbase.thirdparty.com.google.common.base.Preconditions.checkState(Preconditions.java:441)
at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:174)
at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$600(MemStoreFlusher.java:69)
at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:237)
at java.lang.Thread.run(Thread.java:748)
{code}
Unfortunately the DEBUG logging was not on.
Will see if I can reproduce the exception next time.

> Properly handle Preconditions check failure in 
> MemStoreFlusher$FlushHandler.run
> ---
>
> Key: HBASE-20090
> URL: https://issues.apache.org/jira/browse/HBASE-20090
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Major
>
> Here is the code in branch-2 :
> {code}
> try {
>   wakeupPending.set(false); // allow someone to wake us up again
>   fqe = flushQueue.poll(threadWakeFrequency, TimeUnit.MILLISECONDS);
>   if (fqe == null || fqe instanceof WakeupFlushThread) {
> ...
>   if (!flushOneForGlobalPressure()) {
> ...
>   FlushRegionEntry fre = (FlushRegionEntry) fqe;
>   if (!flushRegion(fre)) {
> break;
> ...
> } catch (Exception ex) {
>   LOG.error("Cache flusher failed for entry " + fqe, ex);
>   if (!server.checkFileSystem()) {
> break;
>   }
> }
> {code}
> Inside flushOneForGlobalPressure():
> {code}
>   Preconditions.checkState(
> (regionToFlush != null && regionToFlushSize > 0) ||
> (bestRegionReplica != null && bestRegionReplicaSize > 0));
> {code}
> When the Preconditions check fails, IllegalStateException is caught by the 
> catch block shown above.
> However, the fqe is not flushed, resulting in potential data loss.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

2018-02-26 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377838#comment-16377838
 ] 

stack commented on HBASE-20090:
---

Who wrote this code? Flag them. I'm sure they'd be interested. Do you have an 
example of the exception, or is this just splunking?

> Properly handle Preconditions check failure in 
> MemStoreFlusher$FlushHandler.run
> ---
>
> Key: HBASE-20090
> URL: https://issues.apache.org/jira/browse/HBASE-20090
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Major
>
> Here is the code in branch-2 :
> {code}
> try {
>   wakeupPending.set(false); // allow someone to wake us up again
>   fqe = flushQueue.poll(threadWakeFrequency, TimeUnit.MILLISECONDS);
>   if (fqe == null || fqe instanceof WakeupFlushThread) {
> ...
>   if (!flushOneForGlobalPressure()) {
> ...
>   FlushRegionEntry fre = (FlushRegionEntry) fqe;
>   if (!flushRegion(fre)) {
> break;
> ...
> } catch (Exception ex) {
>   LOG.error("Cache flusher failed for entry " + fqe, ex);
>   if (!server.checkFileSystem()) {
> break;
>   }
> }
> {code}
> Inside flushOneForGlobalPressure():
> {code}
>   Preconditions.checkState(
> (regionToFlush != null && regionToFlushSize > 0) ||
> (bestRegionReplica != null && bestRegionReplicaSize > 0));
> {code}
> When the Preconditions check fails, IllegalStateException is caught by the 
> catch block shown above.
> However, the fqe is not flushed, resulting in potential data loss.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)