[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416679#comment-16416679 ]

Hudson commented on HBASE-20090:
--------------------------------

Results for branch HBASE-19064 [build #77 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-19064/77/]: (x) *{color:red}-1 overall{color}*

details (if available):

(x) {color:red}-1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-19064/77//General_Nightly_Build_Report/]
(x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-19064/77//JDK8_Nightly_Build_Report_(Hadoop2)/]
(x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-19064/77//JDK8_Nightly_Build_Report_(Hadoop3)/]
(/) {color:green}+1 source release artifact{color} -- See build output for details.


> Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run
> -------------------------------------------------------------------------------
>
>                 Key: HBASE-20090
>                 URL: https://issues.apache.org/jira/browse/HBASE-20090
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ted Yu
>            Assignee: Ted Yu
>            Priority: Major
>             Fix For: 2.0.0
>
>         Attachments: 20090-server-61260-01-07.log, 20090.v10.txt, 20090.v10.txt, 20090.v6.txt, 20090.v7.txt, 20090.v8.txt, 20090.v9.txt
>
> Copied the following from a comment since it gives a better description of the race condition.
> The original description was merged into the beginning of my first comment below.
>
> Observed the following in the region server log (running on a hadoop3 cluster):
> {code}
> 2018-02-26 16:06:50,044 WARN [RpcServer.default.FPBQ.Fifo.handler=26,queue=2,port=16020] regionserver.MemStoreFlusher: Memstore is above high water mark and block 135ms
> 2018-02-26 16:06:50,049 ERROR [MemStoreFlusher.1] regionserver.MemStoreFlusher: Cache flusher failed for entry org.apache.hadoop.hbase.regionserver.MemStoreFlusher$WakeupFlushThread@2adfadd7
> java.lang.IllegalStateException
>   at org.apache.hbase.thirdparty.com.google.common.base.Preconditions.checkState(Preconditions.java:441)
>   at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:174)
>   at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$600(MemStoreFlusher.java:69)
>   at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:237)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> Here is the Precondition from MemStoreFlusher#flushOneForGlobalPressure():
> {code}
> Preconditions.checkState(
>     (regionToFlush != null && regionToFlushSize > 0) || bestRegionReplicaSize > 0);
> {code}
> When the Preconditions check fails, an IllegalStateException is raised.
> With more debug logging, we can see the scenario in which the exception was triggered.
> {code}
> 2018-03-02 17:28:30,097 DEBUG [MemStoreFlusher.0] regionserver.CompactSplit: Splitting TestTable,,1520011528142.0453f29030757eedb6e6a1c57e88c085., compaction_queue=(0:0), split_queue=1
> 2018-03-02 17:28:30,098 DEBUG [RpcServer.priority.FPBQ.Fifo.handler=19,queue=1,port=16020] regionserver.IncreasingToUpperBoundRegionSplitPolicy: ShouldSplit because info size=6.9G, sizeToCheck=256.0M, regionsWithCommonTable=1
> 2018-03-02 17:28:30,296 INFO [RpcServer.default.FPBQ.Fifo.handler=24,queue=0,port=16020] regionserver.MemStoreFlusher: wake up flusher due to ABOVE_ONHEAP_LOWER_MARK
> 2018-03-02 17:28:30,297 DEBUG [MemStoreFlusher.1] regionserver.MemStoreFlusher: Flush thread woke up because memory above low water=381.5 M
> 2018-03-02 17:28:30,297 INFO [RpcServer.default.FPBQ.Fifo.handler=25,queue=1,port=16020] regionserver.MemStoreFlusher: wake up flusher due to ABOVE_ONHEAP_LOWER_MARK
> 2018-03-02 17:28:30,298 DEBUG [MemStoreFlusher.1] regionserver.MemStoreFlusher: region TestTable,,1520011528142.0453f29030757eedb6e6a1c57e88c085. with size 400432696
> 2018-03-02 17:28:30,298 DEBUG [MemStoreFlusher.1] regionserver.MemStoreFlusher: region atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae. with size 0
> 2018-03-02 17:28:30,298 INFO [MemStoreFlusher.1] regionserver.MemStoreFlusher: Flush of region atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae. due to global heap pressure. Flush type=ABOVE_ONHEAP_LOWER_MARKTotal Memstore Heap size=381.9 MTotal Memstore Off-Heap size=0, Region memstore size=0
> 2018-03-02 17:28:30,298 INFO [MemStoreFlusher.1] regionserver.MemStoreFlusher: wake up by WAKEUPFLUSH_INSTANCE
> 2018-03-02 17:28:30,298 INFO [MemStoreFlusher.1] regionserver.MemStoreFlusher: Nothing to flush for atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae.
> 2018-03-02 17:28:30,298 INFO [MemStoreFlusher.1] regionserver.MemStoreFlusher: Excluding unflushable region atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae. - trying to find a different region to flush.
> {code}
> Region 0453f29030757eedb6e6a1c57e88c085 was being split.
> In HRegion#flushcache, the log from the else branch can be seen in 20090-server-61260-01-07.log:
> {code}
> synchronized (writestate) {
>   if (!writestate.flushing && writestate.writesEnabled) {
>     this.writestate.flushing = true;
>   } else {
>     if (LOG.isDebugEnabled()) {
>       LOG.debug("NOT flushing
> {code}
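The race described above can be condensed into a small sketch. All names here (`Region`, `pickLargestFlushable`, `invariantHolds`) are illustrative, not the actual MemStoreFlusher code: the flusher picks the largest region that is safe to flush, but when the only region with data (TestTable, 400 MB) is mid-split and therefore excluded, the best remaining candidate (atlas_janus) has a zero-sized memstore, so the checkState invariant "a candidate was chosen AND it has data" fails.

```java
import java.util.List;

// Minimal sketch (illustrative names, not the actual MemStoreFlusher code)
// of how flush-candidate selection can satisfy "a region was chosen"
// while violating "the chosen region has data to flush".
public class FlushRaceSketch {
  static final class Region {
    final String name;
    final long memstoreSize;
    final boolean splitting;
    Region(String name, long memstoreSize, boolean splitting) {
      this.name = name;
      this.memstoreSize = memstoreSize;
      this.splitting = splitting;
    }
  }

  // Pick the largest region that is safe to flush; a region that is
  // mid-split is skipped, mirroring "Excluding unflushable region".
  static Region pickLargestFlushable(List<Region> regions) {
    Region best = null;
    for (Region r : regions) {
      if (r.splitting) {
        continue; // unflushable while splitting
      }
      if (best == null || r.memstoreSize > best.memstoreSize) {
        best = r;
      }
    }
    return best;
  }

  // Shape of the Precondition in flushOneForGlobalPressure(), minus the
  // region-replica term, applied to the selected candidate.
  static boolean invariantHolds(List<Region> regions) {
    Region toFlush = pickLargestFlushable(regions);
    return toFlush != null && toFlush.memstoreSize > 0;
  }

  public static void main(String[] args) {
    // TestTable: 400 MB of memstore but mid-split; atlas_janus: dormant, size 0.
    List<Region> regions = List.of(
        new Region("TestTable", 400_432_696L, true),
        new Region("atlas_janus", 0L, false));
    // The only flushable candidate has an empty memstore, so a
    // Preconditions.checkState(...) on this condition would throw.
    System.out.println(invariantHolds(regions)); // prints: false
  }
}
```

Under write load the selection and the split proceed concurrently, which is why the window is narrow and the failure intermittent.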
[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16408915#comment-16408915 ]

Hudson commented on HBASE-20090:
--------------------------------

Results for branch branch-2 [build #513 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/513/]: (x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/513//General_Nightly_Build_Report/]
(x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/513//JDK8_Nightly_Build_Report_(Hadoop2)/]
(x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/513//JDK8_Nightly_Build_Report_(Hadoop3)/]
(/) {color:green}+1 source release artifact{color} -- See build output for details.
[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16408727#comment-16408727 ]

Hudson commented on HBASE-20090:
--------------------------------

Results for branch branch-2.0 [build #69 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/69/]: (x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/69//General_Nightly_Build_Report/]
(x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/69//JDK8_Nightly_Build_Report_(Hadoop2)/]
(x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/69//JDK8_Nightly_Build_Report_(Hadoop3)/]
(/) {color:green}+1 source release artifact{color} -- See build output for details.
[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16408156#comment-16408156 ]

stack commented on HBASE-20090:
-------------------------------

Pushed to branch-2 and branch-2.0.
[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16407836#comment-16407836 ]

Hudson commented on HBASE-20090:
--------------------------------

Results for branch master [build #268 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/268/]: (x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/master/268//General_Nightly_Build_Report/]
(x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/master/268//JDK8_Nightly_Build_Report_(Hadoop2)/]
(x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/master/268//JDK8_Nightly_Build_Report_(Hadoop3)/]
(/) {color:green}+1 source release artifact{color} -- See build output for details.
[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16407427#comment-16407427 ]

Ted Yu commented on HBASE-20090:
--------------------------------

Integrated to master branch.
[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16407425#comment-16407425 ]

stack commented on HBASE-20090:
-------------------------------

Now I get it. Thanks. Let me backport...
[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16407254#comment-16407254 ] Ted Yu commented on HBASE-20090: bq. but we are still above the high-water mark? At the moment the Precondition was triggered, the lower water mark had been crossed. See log: {code} 2018-03-02 17:28:30,301 INFO [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=16020] regionserver.MemStoreFlusher: wake up flusher due to ABOVE_ONHEAP_LOWER_MARK {code} bq. Do you know why that is? The write load from PE (PerformanceEvaluation) produced the above condition, which lasted for some period of time. bq. There were two regions in this server that were under pressure? No. Of the two regions I mentioned, only one was under pressure: 0453f29030757eedb6e6a1c57e88c085, belonging to TestTable, was receiving writes, under pressure, and splitting. The region of the atlas_janus table was dormant while PE ran; it had a zero-sized memstore. Since the existing Precondition didn't account for this combination, the assertion was triggered.
[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16407204#comment-16407204 ] stack commented on HBASE-20090: --- Yeah, sorry, the RN still makes no sense to me, but the description update helped a bunch. Thanks. So, we avoid the Precondition IllegalStateException (including this in the description helped most), but we are still above the high-water mark? Do you know why that is? There were two regions in this server that were under pressure? One that had a zero-sized memstore and another that was currently splitting?
[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16407182#comment-16407182 ] Ted Yu commented on HBASE-20090: I have updated both the description and the Release Note. Thanks.
[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16407149#comment-16407149 ] stack commented on HBASE-20090: --- So, to understand the release note I have to read the description ... which I have, and I still don't get it -- I reread it, and it talks about exceptions, of which there are none in the description, and it talks of Preconditions, but I don't see them in the code snippet provided. I look at the patch and it adds a new check. What are we avoiding with this patch? Please put it in the RN. I'm trying to avoid having to read this whole issue to figure out whether I need this patch in 2.0 or not. Thanks.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16405145#comment-16405145 ] Ted Yu commented on HBASE-20090: bq. Why would a region not receive writes? Please see the description above: Region 0453f29030757eedb6e6a1c57e88c085, belonging to TestTable, was being split. The other region, atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae (of table atlas_janus), was not being written to - I used PerformanceEvaluation, which wrote to TestTable only. Region fbcb5e495344542daf8b499e4bac03ae's data size was 0, triggering the Precondition assertion (before the fix).
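The selection step Ted describes can be illustrated with a minimal sketch. This is hypothetical code, not HBase's implementation (the real `getBiggestMemStoreRegion` takes `HRegion` maps and additional parameters): once the splitting region is excluded, the only remaining candidate is the dormant region with a zero-sized memstore, which is exactly the state the old Precondition (size > 0) did not anticipate. Region names and sizes below come from the log in the description.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical model of picking the biggest flushable memstore while
// excluding regions that cannot flush (e.g. mid-split). Not HBase code.
public class PickSketch {

  /** Returns the non-excluded region with the largest memstore, or null. */
  static String biggestFlushable(Map<String, Long> memstoreSizes,
                                 Set<String> excluded) {
    return memstoreSizes.entrySet().stream()
        .filter(e -> !excluded.contains(e.getKey()))
        .max(Map.Entry.comparingByValue())
        .map(Map.Entry::getKey)
        .orElse(null);
  }

  public static void main(String[] args) {
    Map<String, Long> sizes = new HashMap<>();
    sizes.put("TestTable,,1520011528142.0453f29030757eedb6e6a1c57e88c085.", 400432696L);
    sizes.put("atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae.", 0L);

    // The TestTable region is mid-split, so it is excluded; the only
    // candidate left is the dormant region whose memstore size is 0.
    String pick = biggestFlushable(sizes,
        Set.of("TestTable,,1520011528142.0453f29030757eedb6e6a1c57e88c085."));
    System.out.println(pick + " size=" + sizes.get(pick));
  }
}
```

With a zero-sized pick, a `checkState(... size > 0 ...)` style assertion trips even though nothing is actually wrong with the server, which is why handling it as "no candidate, retry later" is the safer behavior.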
[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16405136#comment-16405136 ] stack commented on HBASE-20090: --- How does this happen: "... if the only candidate left doesn't receive writes"? Why would a region not receive writes?
[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16405066#comment-16405066 ] stack commented on HBASE-20090: --- Can we get a summary in the release note of what the problem is, and the implications should the condition arise?
[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16404392#comment-16404392 ] ramkrishna.s.vasudevan commented on HBASE-20090: +1
[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16404377#comment-16404377 ] Anoop Sam John commented on HBASE-20090: +1
[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16404237#comment-16404237 ] Ted Yu commented on HBASE-20090: [~ram_krish] [~anoop.hbase]: Mind taking a look at patch v10 ? Thanks
[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16403908#comment-16403908 ] Eshcar Hillel commented on HBASE-20090: --- +1
[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16400881#comment-16400881 ] Ted Yu commented on HBASE-20090: hadoopcheck seems to be related to build environment:
{code}
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-install-plugin:2.5.2:install (default-install) on project hbase-thrift: Failed to install metadata org.apache.hbase:hbase-thrift:3.0.0-SNAPSHOT/maven-metadata.xml: Could not parse metadata /home/jenkins/.m2/repository/org/apache/hbase/hbase-thrift/3.0.0-SNAPSHOT/maven-metadata-local.xml: in epilog non whitespace content is not allowed but got / (position: END_TAG seen ...\n/... @25:2) -> [Help 1]
{code}
Not caused by the patch.
[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16400872#comment-16400872 ] Hadoop QA commented on HBASE-20090:
---
| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 22s{color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} patch {color} | {color:blue} 0m 2s{color} | {color:blue} The patch file was not named according to hbase's naming conventions. Please see https://yetus.apache.org/documentation/0.7.0/precommit-patchnames for instructions. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 2s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 28s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 0s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 5m 7s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 42s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 56s{color} | {color:green} hbase-server: The patch generated 0 new + 29 unchanged - 1 fixed = 29 total (was 30) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 7s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 6m 1s{color} | {color:red} The patch causes 10 errors with Hadoop v2.6.5. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 7m 58s{color} | {color:red} The patch causes 10 errors with Hadoop v2.7.4. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 10m 0s{color} | {color:red} The patch causes 10 errors with Hadoop v3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}108m 37s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 15s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}140m 43s{color} | {color:black} {color} |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-20090 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12914726/20090.v10.txt |
| Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux d9d986ba910a 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 GNU/Linux |
| Build tool | maven |
|
[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16400449#comment-16400449 ] Hadoop QA commented on HBASE-20090:
---
| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 27s{color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} patch {color} | {color:blue} 0m 2s{color} | {color:blue} The patch file was not named according to hbase's naming conventions. Please see https://yetus.apache.org/documentation/0.7.0/precommit-patchnames for instructions. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 19s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 45s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 10s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 5m 48s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 46s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 11s{color} | {color:green} hbase-server: The patch generated 0 new + 29 unchanged - 1 fixed = 29 total (was 30) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 50s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 19m 16s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}155m 42s{color} | {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 30s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}201m 27s{color} | {color:black} {color} |

|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.master.procedure.TestProcedurePriority |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-20090 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12914680/20090.v10.txt |
| Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 19ec4a5a5b0d 3.13.0-137-generic #186-Ubuntu SMP Mon Dec 4 19:09:19 UTC 2017 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 31da4d0bce |
| maven | version: Apache
[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16400200#comment-16400200 ] Ted Yu commented on HBASE-20090: bq. was to replace the precondition check with a test that checks the sizes and if both are 0 returns false plus logs the warning message. Patch v6 was along that line. Patch v10 adds DEBUG log - since user doesn't have actionable option so I think debug log should be enough. > Properly handle Preconditions check failure in > MemStoreFlusher$FlushHandler.run > --- > > Key: HBASE-20090 > URL: https://issues.apache.org/jira/browse/HBASE-20090 > Project: HBase > Issue Type: Bug >Reporter: Ted Yu >Assignee: Ted Yu >Priority: Major > Attachments: 20090-server-61260-01-07.log, 20090.v10.txt, > 20090.v6.txt, 20090.v7.txt, 20090.v8.txt, 20090.v9.txt > > > Copied the following from a comment since this was better description of the > race condition. > The original description was merged to the beginning of my first comment > below. > With more debug logging, we can see the scenario where the exception was > triggered. 
> {code} > 2018-03-02 17:28:30,097 DEBUG [MemStoreFlusher.0] regionserver.CompactSplit: > Splitting TestTable,,1520011528142.0453f29030757eedb6e6a1c57e88c085., > compaction_queue=(0:0), split_queue=1 > 2018-03-02 17:28:30,098 DEBUG > [RpcServer.priority.FPBQ.Fifo.handler=19,queue=1,port=16020] > regionserver.IncreasingToUpperBoundRegionSplitPolicy: ShouldSplit because > info size=6.9G, sizeToCheck=256.0M, regionsWithCommonTable=1 > 2018-03-02 17:28:30,296 INFO > [RpcServer.default.FPBQ.Fifo.handler=24,queue=0,port=16020] > regionserver.MemStoreFlusher: wake up flusher due to ABOVE_ONHEAP_LOWER_MARK > 2018-03-02 17:28:30,297 DEBUG [MemStoreFlusher.1] > regionserver.MemStoreFlusher: Flush thread woke up because memory above low > water=381.5 M > 2018-03-02 17:28:30,297 INFO > [RpcServer.default.FPBQ.Fifo.handler=25,queue=1,port=16020] > regionserver.MemStoreFlusher: wake up flusher due to ABOVE_ONHEAP_LOWER_MARK > 2018-03-02 17:28:30,298 DEBUG [MemStoreFlusher.1] > regionserver.MemStoreFlusher: region > TestTable,,1520011528142.0453f29030757eedb6e6a1c57e88c085. with size 400432696 > 2018-03-02 17:28:30,298 DEBUG [MemStoreFlusher.1] > regionserver.MemStoreFlusher: region > atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae. with size 0 > 2018-03-02 17:28:30,298 INFO [MemStoreFlusher.1] > regionserver.MemStoreFlusher: Flush of region > atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae. due to global > heap pressure. Flush type=ABOVE_ONHEAP_LOWER_MARKTotal Memstore Heap > size=381.9 MTotal Memstore Off-Heap size=0, Region memstore size=0 > 2018-03-02 17:28:30,298 INFO [MemStoreFlusher.1] > regionserver.MemStoreFlusher: wake up by WAKEUPFLUSH_INSTANCE > 2018-03-02 17:28:30,298 INFO [MemStoreFlusher.1] > regionserver.MemStoreFlusher: Nothing to flush for > atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae. 
> 2018-03-02 17:28:30,298 INFO [MemStoreFlusher.1] > regionserver.MemStoreFlusher: Excluding unflushable region > atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae. -trying to > find a different region to flush. > {code} > Region 0453f29030757eedb6e6a1c57e88c085 was being split. > In HRegion#flushcache, the log from else branch can be seen in > 20090-server-61260-01-07.log : > {code} > synchronized (writestate) { > if (!writestate.flushing && writestate.writesEnabled) { > this.writestate.flushing = true; > } else { > if (LOG.isDebugEnabled()) { > LOG.debug("NOT flushing memstore for region " + this > + ", flushing=" + writestate.flushing + ", writesEnabled=" > + writestate.writesEnabled); > } > {code} > Meaning, region 0453f29030757eedb6e6a1c57e88c085 couldn't flush, leaving > memory pressure at high level. > When MemStoreFlusher ran to the following call, the region was no longer a > flush candidate: > {code} > HRegion bestFlushableRegion = > getBiggestMemStoreRegion(regionsBySize, excludedRegions, true); > {code} > So the other region, > atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae. , was examined > next. Since the region was not receiving write, the (current) Precondition > check failed. > The proposed fix is to convert the Precondition to normal return. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
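The race described above can be sketched as a minimal, self-contained simulation. Note that `Region`, `pickBiggest`, and the sizes below are illustrative stand-ins for this sketch, not the real `MemStoreFlusher` internals: the biggest region is excluded after its flush is refused mid-split, leaving only a zero-size candidate, which is exactly the state the Preconditions check rejects.

```java
import java.util.*;

// Illustrative simulation of the race described above. The names here are
// assumptions for the sketch, not the actual HBase types or signatures.
public class FlushRaceDemo {
    static final class Region {
        final String name;
        final long memstoreSize;
        Region(String name, long memstoreSize) {
            this.name = name;
            this.memstoreSize = memstoreSize;
        }
    }

    // Analogous to getBiggestMemStoreRegion: largest non-excluded region, or null.
    static Region pickBiggest(List<Region> regions, Set<String> excluded) {
        return regions.stream()
            .filter(r -> !excluded.contains(r.name))
            .max(Comparator.comparingLong(r -> r.memstoreSize))
            .orElse(null);
    }

    public static void main(String[] args) {
        List<Region> regions = Arrays.asList(
            new Region("TestTable", 400432696L), // mid-split; its flush is refused
            new Region("atlas_janus", 0L));      // idle region, empty memstore
        Set<String> excluded = new HashSet<>();

        // First pass: the big region is chosen, but HRegion#flushcache refuses
        // (writestate.flushing / writesEnabled), so it lands in excludedRegions.
        excluded.add("TestTable");

        // Second pass: the only remaining candidate has memstore size 0, so
        // checkState((r != null && size > 0) || replicaSize > 0) would throw.
        Region candidate = pickBiggest(regions, excluded);
        boolean preconditionHolds = candidate != null && candidate.memstoreSize > 0;
        System.out.println(candidate.name + " size=" + candidate.memstoreSize
            + " preconditionHolds=" + preconditionHolds);
    }
}
```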
[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16400190#comment-16400190 ] Eshcar Hillel commented on HBASE-20090: --- Still I am not convinced the code in the last patch covers all possibilities. Why to check only {{bestAnyRegionSize}}? checkState checks that at least one of {{regionToFlush}} and {{bestRegionReplica}} has size greater than 0. {{regionToFlush}} can be set to {{bestAnyRegion}} or to {{bestFlushableRegion.}} So by checking only {{bestAnyRegionSize}} we may end up still not satisfying the precondition. I think one of the suggestion in this Jira was to replace the precondition check with a test that checks the sizes and if both are 0 returns false plus logs the warning message. Can we try this solution?
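The handling suggested in this comment can be sketched as follows. This is a simplified illustration with stand-in names, not the actual patch: compute both sizes, and if neither candidate has anything to flush, log a warning and return false instead of letting Preconditions.checkState throw IllegalStateException.

```java
// Simplified sketch of the suggested fix: turn the hard precondition into a
// normal "nothing to flush" return. Names and the logging call are
// illustrative, not the real MemStoreFlusher fields or signatures.
public class FlushPreconditionFix {
    // Returns true if a flush would proceed, false if there is no candidate.
    static boolean flushOneForGlobalPressure(Object regionToFlush, long regionToFlushSize,
                                             long bestRegionReplicaSize) {
        // Old code: Preconditions.checkState(
        //     (regionToFlush != null && regionToFlushSize > 0) || bestRegionReplicaSize > 0);
        // which throws IllegalStateException and kills the flush attempt.
        if (!((regionToFlush != null && regionToFlushSize > 0) || bestRegionReplicaSize > 0)) {
            // New behavior: log and back off; the flusher loop can retry later.
            System.out.println("WARN: above memory mark but no flushable region "
                + "(regionToFlushSize=" + regionToFlushSize
                + ", bestRegionReplicaSize=" + bestRegionReplicaSize + ")");
            return false;
        }
        // ... choose and flush the region as before ...
        return true;
    }

    public static void main(String[] args) {
        // The scenario from the log: remaining candidate has size 0, no replica.
        System.out.println(flushOneForGlobalPressure(null, 0L, 0L));
        System.out.println(flushOneForGlobalPressure(new Object(), 1024L, 0L));
    }
}
```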
[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16399605#comment-16399605 ] Ted Yu commented on HBASE-20090: [~stack]: Do you want this to go into branch-2.0 ?
[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16399149#comment-16399149 ] Hadoop QA commented on HBASE-20090: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 34s{color} | {color:blue} Docker mode activated. {color} | | {color:blue}0{color} | {color:blue} patch {color} | {color:blue} 0m 7s{color} | {color:blue} The patch file was not named according to hbase's naming conventions. Please see https://yetus.apache.org/documentation/0.7.0/precommit-patchnames for instructions. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 58s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 43s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 10s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 5m 46s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 53s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 36s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 19m 45s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 2s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green}135m 24s{color} | {color:green} hbase-server in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}181m 5s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 | | JIRA Issue | HBASE-20090 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12914503/20090.v9.txt | | Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile | | uname | Linux 51ea5f891167 3.13.0-137-generic #186-Ubuntu SMP Mon Dec 4 19:09:19 UTC 2017 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh | | git revision | master / 84ee32c723 | | maven | version: Apache Maven 3.5.3 (3383c37e1f9e9b3bc3df5050c29c8aff9f295297; 2018-02-24T19:49:05Z) | | Default Java | 1.8.0_151 | | findbugs | v3.1.0-RC3 | | Test Results |
[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16398881#comment-16398881 ] ramkrishna.s.vasudevan commented on HBASE-20090: +1. Thanks [~yuzhih...@gmail.com] for working on this.
[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16398720#comment-16398720 ] Ted Yu commented on HBASE-20090: Logged HBASE-20196 for improving region size accounting. Thanks [~eshcar]
[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16398711#comment-16398711 ] Ted Yu commented on HBASE-20090: Patch v9 adds a comment explaining the race condition.
[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16398672#comment-16398672 ]

Hadoop QA commented on HBASE-20090:
---
(x) *-1 overall*

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 21s | Docker mode activated. |
| 0 | patch | 0m 1s | The patch file was not named according to hbase's naming conventions. Please see https://yetus.apache.org/documentation/0.7.0/precommit-patchnames for instructions. |
|| || || || Prechecks ||
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || master Compile Tests ||
| +1 | mvninstall | 4m 35s | master passed |
| +1 | compile | 1m 38s | master passed |
| +1 | checkstyle | 1m 16s | master passed |
| +1 | shadedjars | 7m 42s | branch has no errors when building our shaded downstream artifacts. |
| +1 | findbugs | 3m 37s | master passed |
| +1 | javadoc | 1m 10s | master passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 6m 34s | the patch passed |
| +1 | compile | 2m 16s | the patch passed |
| +1 | javac | 2m 16s | the patch passed |
| +1 | checkstyle | 1m 25s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedjars | 5m 52s | patch has no errors when building our shaded downstream artifacts. |
| -1 | hadoopcheck | 8m 29s | The patch causes 10 errors with Hadoop v2.6.5. |
| -1 | hadoopcheck | 10m 24s | The patch causes 10 errors with Hadoop v2.7.4. |
| -1 | hadoopcheck | 12m 32s | The patch causes 10 errors with Hadoop v3.0.0. |
| +1 | findbugs | 1m 49s | the patch passed |
| +1 | javadoc | 0m 26s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 103m 23s | hbase-server in the patch passed. |
| +1 | asflicense | 0m 16s | The patch does not generate ASF License warnings. |
| | | 148m 25s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-20090 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12914447/20090.v8.txt |
| Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 057af463c828 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 GNU/Linux |
| Build tool | maven |
| Personality |
[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16398504#comment-16398504 ]

Eshcar Hillel commented on HBASE-20090:
---
Yes, actually it makes sense to have all candidates sorted by size and not to remove regions with the same size. This should be applied to all three methods: XXXbyoffheapsize, XXXbyonheapsize, XXXbydatasize.

> Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run
> ---
>
> Key: HBASE-20090
> URL: https://issues.apache.org/jira/browse/HBASE-20090
> Project: HBase
> Issue Type: Bug
> Reporter: Ted Yu
> Assignee: Ted Yu
> Priority: Major
> Attachments: 20090-server-61260-01-07.log, 20090.v6.txt, 20090.v7.txt, 20090.v8.txt
>
> Copied the following from a comment since it was a better description of the race condition. The original description was merged into the beginning of my first comment below.
> With more debug logging, we can see the scenario where the exception was triggered:
> {code}
> 2018-03-02 17:28:30,097 DEBUG [MemStoreFlusher.0] regionserver.CompactSplit: Splitting TestTable,,1520011528142.0453f29030757eedb6e6a1c57e88c085., compaction_queue=(0:0), split_queue=1
> 2018-03-02 17:28:30,098 DEBUG [RpcServer.priority.FPBQ.Fifo.handler=19,queue=1,port=16020] regionserver.IncreasingToUpperBoundRegionSplitPolicy: ShouldSplit because info size=6.9G, sizeToCheck=256.0M, regionsWithCommonTable=1
> 2018-03-02 17:28:30,296 INFO [RpcServer.default.FPBQ.Fifo.handler=24,queue=0,port=16020] regionserver.MemStoreFlusher: wake up flusher due to ABOVE_ONHEAP_LOWER_MARK
> 2018-03-02 17:28:30,297 DEBUG [MemStoreFlusher.1] regionserver.MemStoreFlusher: Flush thread woke up because memory above low water=381.5 M
> 2018-03-02 17:28:30,297 INFO [RpcServer.default.FPBQ.Fifo.handler=25,queue=1,port=16020] regionserver.MemStoreFlusher: wake up flusher due to ABOVE_ONHEAP_LOWER_MARK
> 2018-03-02 17:28:30,298 DEBUG [MemStoreFlusher.1] regionserver.MemStoreFlusher: region TestTable,,1520011528142.0453f29030757eedb6e6a1c57e88c085. with size 400432696
> 2018-03-02 17:28:30,298 DEBUG [MemStoreFlusher.1] regionserver.MemStoreFlusher: region atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae. with size 0
> 2018-03-02 17:28:30,298 INFO [MemStoreFlusher.1] regionserver.MemStoreFlusher: Flush of region atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae. due to global heap pressure. Flush type=ABOVE_ONHEAP_LOWER_MARKTotal Memstore Heap size=381.9 MTotal Memstore Off-Heap size=0, Region memstore size=0
> 2018-03-02 17:28:30,298 INFO [MemStoreFlusher.1] regionserver.MemStoreFlusher: wake up by WAKEUPFLUSH_INSTANCE
> 2018-03-02 17:28:30,298 INFO [MemStoreFlusher.1] regionserver.MemStoreFlusher: Nothing to flush for atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae.
> 2018-03-02 17:28:30,298 INFO [MemStoreFlusher.1] regionserver.MemStoreFlusher: Excluding unflushable region atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae. - trying to find a different region to flush.
> {code}
> Region 0453f29030757eedb6e6a1c57e88c085 was being split.
> In HRegion#flushcache, the log from the else branch can be seen in 20090-server-61260-01-07.log:
> {code}
> synchronized (writestate) {
>   if (!writestate.flushing && writestate.writesEnabled) {
>     this.writestate.flushing = true;
>   } else {
>     if (LOG.isDebugEnabled()) {
>       LOG.debug("NOT flushing memstore for region " + this
>           + ", flushing=" + writestate.flushing + ", writesEnabled="
>           + writestate.writesEnabled);
>     }
> {code}
> Meaning, region 0453f29030757eedb6e6a1c57e88c085 couldn't flush, leaving memory pressure at a high level.
> When MemStoreFlusher reached the following call, the region was no longer a flush candidate:
> {code}
> HRegion bestFlushableRegion =
>     getBiggestMemStoreRegion(regionsBySize, excludedRegions, true);
> {code}
> So the other region, atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae., was examined next. Since that region was not receiving writes, the (current) Precondition check failed.
> The proposed fix is to convert the Precondition to a normal return.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
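The shape of the proposed conversion can be sketched in plain Java. This is an illustrative simplification, not the actual MemStoreFlusher code: the method signature, parameter types, and log text below are stand-ins for the real HBase classes.

```java
import java.util.logging.Logger;

// Sketch of replacing the Preconditions.checkState(...) in
// flushOneForGlobalPressure() with a logged early return.
public class FlushSketch {
  private static final Logger LOG = Logger.getLogger(FlushSketch.class.getName());

  // Returns true if a flush was scheduled, false if no candidate region
  // had any data to flush (the case that used to throw IllegalStateException).
  static boolean flushOneForGlobalPressure(String regionToFlush, long regionToFlushSize,
                                           long bestRegionReplicaSize) {
    // Old behavior: Preconditions.checkState(
    //     (regionToFlush != null && regionToFlushSize > 0) || bestRegionReplicaSize > 0);
    if ((regionToFlush == null || regionToFlushSize <= 0) && bestRegionReplicaSize <= 0) {
      // New behavior: warn and bail out so the flush handler can retry,
      // instead of killing the run with an IllegalStateException.
      LOG.warning("Above memory mark but no flushable region: all regions with data"
          + " are already flushing or about to be split");
      return false;
    }
    // ... schedule the flush of the chosen region here ...
    return true;
  }
}
```

The caller can treat a false return like any other "nothing flushed this round" outcome, rather than handling an exception.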
[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16398497#comment-16398497 ]

Ted Yu commented on HBASE-20090:
---
[~eshcar]: I had another suggestion earlier:
https://issues.apache.org/jira/browse/HBASE-20090?focusedCommentId=16388569=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16388569
If it makes sense, I can log a separate JIRA. Thanks.
[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16398353#comment-16398353 ]

Anoop Sam John commented on HBASE-20090:
---
I think v8 looks very simple and most suitable. Do you want to add some more info to that WARN log, Ted? This will be a surprise for anyone examining the logs: the global barrier is breached, but the best region (by any measure) has no data to flush! This can happen only when all the regions with data are not in a position to be flushed right now, because they are already under flush or about to be split. Some words like that would be helpful. Anyway, this is a very rare case; normally the RS will have more regions. If this happens, there is a 1-second sleep in the flusher thread. That is bad, since the writes are blocked; but considering how rare this is, I think it is ok. Please add some fat comments about that as well, along with the one-liner change.
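The retry-with-sleep behavior described above can be sketched in plain Java. The interface, method names, and the (shortened) sleep interval are illustrative stand-ins for the real HBase flusher, which pauses about one second while writes stay blocked.

```java
// Sketch: when flushOneForGlobalPressure() finds nothing flushable, the
// flush handler backs off briefly and retries instead of dying.
public class FlushHandlerSketch {
  interface Flusher {
    boolean flushOneForGlobalPressure();
  }

  // Returns the number of attempts it took before a flush succeeded,
  // or maxAttempts if it never did.
  static int runUntilFlushed(Flusher flusher, int maxAttempts) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      if (flusher.flushOneForGlobalPressure()) {
        return attempt; // a region was flushed; memory pressure should drop
      }
      try {
        // Stand-in for the ~1s pause; writes remain blocked during it.
        Thread.sleep(10);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return attempt;
      }
    }
    return maxAttempts;
  }
}
```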
[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16398347#comment-16398347 ]

Ted Yu commented on HBASE-20090:
---
See if patch v8 is better - it aligns with Eshcar's initial suggestion and takes flushType into account when considering region size.
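Patch v8 is described as taking flushType into account when ranking region sizes. A minimal sketch of what such a selection could look like follows; the enum values and field names here are illustrative stand-ins, not the actual HBase API.

```java
// Sketch: pick which memstore size to compare based on what kind of
// memory pressure triggered the flush.
public class SizeByFlushType {
  enum FlushType { ABOVE_ONHEAP_LOWER_MARK, ABOVE_OFFHEAP_LOWER_MARK }

  static class RegionSizes {
    final long dataSize, onHeapSize, offHeapSize;
    RegionSizes(long data, long onHeap, long offHeap) {
      this.dataSize = data;
      this.onHeapSize = onHeap;
      this.offHeapSize = offHeap;
    }
  }

  // Under on-heap pressure, rank regions by on-heap occupancy;
  // under off-heap pressure, rank by off-heap occupancy.
  static long sizeToCompare(RegionSizes r, FlushType type) {
    switch (type) {
      case ABOVE_OFFHEAP_LOWER_MARK:
        return r.offHeapSize;
      case ABOVE_ONHEAP_LOWER_MARK:
      default:
        return r.onHeapSize;
    }
  }
}
```

This avoids flushing a region that looks big by one measure but contributes nothing to the kind of memory actually under pressure.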
[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16398277#comment-16398277 ]

Eshcar Hillel commented on HBASE-20090:
---
I am still not convinced it is necessary. You are not controlling the timing of setting writestate.writesEnabled to false. Consider the case where, during the first round of {{getBiggestMemStoreRegion}}, writestate.writesEnabled of region A is true, so A is chosen in the first round; then in the second round writestate.writesEnabled of region A is false. I believe simply adding the region-to-flush to the excluded set, if it was not flushed at the end of the round, is sufficient.
Other than that, returning false when we identify that the size of the chosen region is 0 seems a legitimate solution to the original problem.
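The excluded-set retry loop discussed in these comments can be sketched in plain Java. Everything below is a simplified illustration: regions are represented by name, sizes by a map, and "flushable" by a set, none of which matches the real HBase signatures.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch: if the chosen region cannot be flushed (e.g. it is mid-split),
// add it to the excluded set and pick again; give up when the best
// remaining candidate has nothing in its memstore.
public class ExcludeLoopSketch {
  // sizes: memstore size per region name; flushable: regions that will
  // accept a flush right now.
  static String pickFlushableRegion(Map<String, Long> sizes, Set<String> flushable) {
    Set<String> excluded = new HashSet<>();
    while (true) {
      String best = null;
      long bestSize = 0;
      for (Map.Entry<String, Long> e : sizes.entrySet()) {
        if (!excluded.contains(e.getKey()) && e.getValue() > bestSize) {
          best = e.getKey();
          bestSize = e.getValue();
        }
      }
      if (best == null) {
        // Every remaining candidate is empty: nothing to flush. This is the
        // case that used to trip the Precondition.
        return null;
      }
      if (flushable.contains(best)) {
        return best;
      }
      excluded.add(best); // skip this region on the next pass and retry
    }
  }
}
```

Note that zero-size regions are never chosen (bestSize starts at 0), which mirrors returning false when the chosen region's size is 0.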
[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397096#comment-16397096 ]

Ted Yu commented on HBASE-20090:
---
bq. What can go wrong if you do not add A into the Set

{code}
excludedRegions.add(region);
{code}

If the above line is omitted, null would still be returned in the scenario we discuss (region A splitting and region B having 0 size). The above line is for skipping region A in the second call to {{getBiggestMemStoreRegion}}, since we know region A is splitting.
[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397086#comment-16397086 ] Eshcar Hillel commented on HBASE-20090: --- yes to this set > Properly handle Preconditions check failure in > MemStoreFlusher$FlushHandler.run > --- > > Key: HBASE-20090 > URL: https://issues.apache.org/jira/browse/HBASE-20090 > Project: HBase > Issue Type: Bug >Reporter: Ted Yu >Assignee: Ted Yu >Priority: Major > Attachments: 20090-server-61260-01-07.log, 20090.v6.txt, > 20090.v7.txt > > > Copied the following from a comment since this was better description of the > race condition. > The original description was merged to the beginning of my first comment > below. > With more debug logging, we can see the scenario where the exception was > triggered. > {code} > 2018-03-02 17:28:30,097 DEBUG [MemStoreFlusher.0] regionserver.CompactSplit: > Splitting TestTable,,1520011528142.0453f29030757eedb6e6a1c57e88c085., > compaction_queue=(0:0), split_queue=1 > 2018-03-02 17:28:30,098 DEBUG > [RpcServer.priority.FPBQ.Fifo.handler=19,queue=1,port=16020] > regionserver.IncreasingToUpperBoundRegionSplitPolicy: ShouldSplit because > info size=6.9G, sizeToCheck=256.0M, regionsWithCommonTable=1 > 2018-03-02 17:28:30,296 INFO > [RpcServer.default.FPBQ.Fifo.handler=24,queue=0,port=16020] > regionserver.MemStoreFlusher: wake up flusher due to ABOVE_ONHEAP_LOWER_MARK > 2018-03-02 17:28:30,297 DEBUG [MemStoreFlusher.1] > regionserver.MemStoreFlusher: Flush thread woke up because memory above low > water=381.5 M > 2018-03-02 17:28:30,297 INFO > [RpcServer.default.FPBQ.Fifo.handler=25,queue=1,port=16020] > regionserver.MemStoreFlusher: wake up flusher due to ABOVE_ONHEAP_LOWER_MARK > 2018-03-02 17:28:30,298 DEBUG [MemStoreFlusher.1] > regionserver.MemStoreFlusher: region > TestTable,,1520011528142.0453f29030757eedb6e6a1c57e88c085. 
with size 400432696 > 2018-03-02 17:28:30,298 DEBUG [MemStoreFlusher.1] > regionserver.MemStoreFlusher: region > atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae. with size 0 > 2018-03-02 17:28:30,298 INFO [MemStoreFlusher.1] > regionserver.MemStoreFlusher: Flush of region > atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae. due to global > heap pressure. Flush type=ABOVE_ONHEAP_LOWER_MARKTotal Memstore Heap > size=381.9 MTotal Memstore Off-Heap size=0, Region memstore size=0 > 2018-03-02 17:28:30,298 INFO [MemStoreFlusher.1] > regionserver.MemStoreFlusher: wake up by WAKEUPFLUSH_INSTANCE > 2018-03-02 17:28:30,298 INFO [MemStoreFlusher.1] > regionserver.MemStoreFlusher: Nothing to flush for > atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae. > 2018-03-02 17:28:30,298 INFO [MemStoreFlusher.1] > regionserver.MemStoreFlusher: Excluding unflushable region > atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae. -trying to > find a different region to flush. > {code} > Region 0453f29030757eedb6e6a1c57e88c085 was being split. > In HRegion#flushcache, the log from else branch can be seen in > 20090-server-61260-01-07.log : > {code} > synchronized (writestate) { > if (!writestate.flushing && writestate.writesEnabled) { > this.writestate.flushing = true; > } else { > if (LOG.isDebugEnabled()) { > LOG.debug("NOT flushing memstore for region " + this > + ", flushing=" + writestate.flushing + ", writesEnabled=" > + writestate.writesEnabled); > } > {code} > Meaning, region 0453f29030757eedb6e6a1c57e88c085 couldn't flush, leaving > memory pressure at high level. > When MemStoreFlusher ran to the following call, the region was no longer a > flush candidate: > {code} > HRegion bestFlushableRegion = > getBiggestMemStoreRegion(regionsBySize, excludedRegions, true); > {code} > So the other region, > atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae. , was examined > next. 
Since the region was not receiving writes, the (current) Precondition > check failed. > The proposed fix is to convert the Precondition check into a normal return. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
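The proposed fix, converting the Precondition into a normal return, can be illustrated with a minimal sketch. This is a simplified stand-in, not the actual HBase source; the method shape and parameter names are assumptions for illustration:

```java
// Sketch: instead of Preconditions.checkState(...) throwing
// IllegalStateException when no flushable candidate exists,
// return false so the flush handler treats it as "nothing flushed"
// and can retry on the next wakeup.
public class FlushSketch {
    // Returns true if a flush candidate exists, false otherwise.
    // The old code threw IllegalStateException in the "no candidate" case.
    public static boolean flushOneForGlobalPressure(
            Long regionToFlushSize, long bestRegionReplicaSize) {
        boolean hasCandidate =
            (regionToFlushSize != null && regionToFlushSize > 0)
            || bestRegionReplicaSize > 0;
        if (!hasCandidate) {
            // Old: Preconditions.checkState(hasCandidate) -> IllegalStateException
            // New: log and return normally; the caller sees no flush was executed.
            System.err.println("Above memory mark but there are no flushable regions!");
            return false;
        }
        return true; // the real code proceeds to flush the chosen region here
    }

    public static void main(String[] args) {
        System.out.println(flushOneForGlobalPressure(null, 0L));       // false
        System.out.println(flushOneForGlobalPressure(400432696L, 0L)); // true
    }
}
```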
[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397080#comment-16397080 ] Ted Yu commented on HBASE-20090: bq. do not add A into the list Can you be specific about which list? {{excludedRegions}} is a Set. Were you referring to the following addition in the second if statement of {{getBiggestMemStoreRegion}}? {code} excludedRegions.add(region); {code}
[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397075#comment-16397075 ] Eshcar Hillel commented on HBASE-20090: --- What can go wrong if you do not add A into the list (just continue to the next entry) and then return null since B's size is zero?
[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397063#comment-16397063 ] Ted Yu commented on HBASE-20090: {{getBiggestMemStoreRegion}} is called twice with different values for {{checkStoreFileCount}}. Suppose the region is added to exclusion in the following block during the first call: {code} if (checkStoreFileCount && isTooManyStoreFiles(region)) {code} In the second call to {{getBiggestMemStoreRegion}}, the region would be checked against {{excludedRegions}} at the beginning of the loop and not given a chance to execute the above check. Note: the value of {{checkStoreFileCount}} is different during the second call. That is why I didn't add the region to exclusion in the above block.
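The two-pass reasoning above can be sketched in simplified form. This is not the actual HBase source; the class, field, and helper names are assumptions chosen to mirror the snippets quoted in the thread:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of the two-pass candidate selection discussed above.
// getBiggestMemStoreRegion is called twice with different values of
// checkStoreFileCount, so a region skipped for "too many store files"
// in pass one must stay eligible in pass two -- which is why the patch
// does NOT add it to excludedRegions in that branch.
public class SelectionSketch {
    public static class Region {
        public final String name;
        public final long memStoreSize;
        public final boolean tooManyStoreFiles;
        public final boolean flushable; // false when e.g. a split is in progress
        public Region(String name, long memStoreSize,
                      boolean tooManyStoreFiles, boolean flushable) {
            this.name = name;
            this.memStoreSize = memStoreSize;
            this.tooManyStoreFiles = tooManyStoreFiles;
            this.flushable = flushable;
        }
    }

    public static Region getBiggestMemStoreRegion(List<Region> regionsBySize,
            Set<Region> excludedRegions, boolean checkStoreFileCount) {
        for (Region region : regionsBySize) { // ordered largest memstore first
            if (excludedRegions.contains(region)) continue;
            if (!region.flushable) {
                // Unflushable region: exclude it for all later passes too.
                excludedRegions.add(region);
                continue;
            }
            if (checkStoreFileCount && region.tooManyStoreFiles) {
                // Deliberately NOT added to excludedRegions: pass two runs
                // with checkStoreFileCount == false and may still pick it.
                continue;
            }
            return region;
        }
        return null;
    }

    public static void main(String[] args) {
        Region a = new Region("A", 400432696L, true, true);
        Set<Region> excluded = new HashSet<>();
        List<Region> bySize = List.of(a);
        // Pass one (checkStoreFileCount = true) skips A...
        System.out.println(getBiggestMemStoreRegion(bySize, excluded, true));
        // ...but pass two (false) can still select it.
        System.out.println(getBiggestMemStoreRegion(bySize, excluded, false).name);
    }
}
```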
[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397017#comment-16397017 ] Eshcar Hillel commented on HBASE-20090: --- So the suggested solution in v7 is to add region A into the excluded list and not to return B as a candidate since its size is 0; instead return null. This null is then identified in line 187, returning false to the caller as no flush was executed. Question: why do we add A into the excluded list while in the following if {code:java} if (checkStoreFileCount && isTooManyStoreFiles(region)){code} we do not add the region into this list?
[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16396552#comment-16396552 ] ramkrishna.s.vasudevan commented on HBASE-20090: I like Ted's patch. In the case where you add to excluded regions, it may also help to avoid that region in subsequent iterations if for some reason other regions were not flushed. Regarding the scenario of why this happened, I think it is a valid case. [~eshcar] - You want to have a look at v7?
[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16396158#comment-16396158 ] Ted Yu commented on HBASE-20090: Eshcar: bq. calling to flushOneForGlobalPressure region A is not selected (correct?) Right. See the following condition in getBiggestMemStoreRegion(): {code} if (region.writestate.flushing || !region.writestate.writesEnabled) { continue; } {code} See also the log line from 20090-server-61260-01-07.log : {code} 2018-03-02 17:28:30,096 DEBUG [MemStoreFlusher.0] regionserver.HRegion: NOT flushing memstore for region TestTable,,1520011528142.0453f29030757eedb6e6a1c57e88c085., flushing=false, writesEnabled=false {code} bq. Is this the right way to describe the problem? Your description is mostly the same as my understanding (with minor correction below). bq. is it because its size is set to 0 or since it is marked as non-flushable? See the following log line from 20090-server-61260-01-07.log : {code} 2018-03-02 17:28:30,298 DEBUG [MemStoreFlusher.1] regionserver.MemStoreFlusher: region TestTable,,1520011528142.0453f29030757eedb6e6a1c57e88c085. with size 400432696 {code} The size of the region (A) was non-zero. But the writestate being false resulted in the region not being included in the candidates. bq. adding a check in line 187 would solve the problem? That may work. However, please take a look at patch v7, where the exclusion of ineligible regions is done in getBiggestMemStoreRegion(). With patch v7, the existing check on line 187 should suffice. Thanks
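The writestate filter quoted in the thread can be sketched as follows. This is a simplified stand-in for the HRegion/MemStoreFlusher interaction, not actual source; the class and field names are assumptions:

```java
// Sketch: why region A (mid-split, ~400MB memstore) is never selected.
// A region whose writes are disabled is skipped by the candidate loop,
// leaving only the zero-sized region B as a "candidate".
public class WriteStateSketch {
    public static class WriteState {
        public boolean flushing = false;
        public boolean writesEnabled = true;
    }

    // Mirrors the quoted condition:
    // if (region.writestate.flushing || !region.writestate.writesEnabled) continue;
    public static boolean isCandidate(WriteState ws) {
        return !ws.flushing && ws.writesEnabled;
    }

    public static void main(String[] args) {
        WriteState splitting = new WriteState();
        splitting.writesEnabled = false; // a split in progress disables writes
        System.out.println(isCandidate(splitting)); // skipped despite large memstore

        WriteState normal = new WriteState();
        System.out.println(isCandidate(normal));    // eligible
    }
}
```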
[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16395817#comment-16395817 ] Hadoop QA commented on HBASE-20090: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s{color} | {color:blue} Docker mode activated. {color} | | {color:blue}0{color} | {color:blue} patch {color} | {color:blue} 0m 2s{color} | {color:blue} The patch file was not named according to hbase's naming conventions. Please see https://yetus.apache.org/documentation/0.7.0/precommit-patchnames for instructions. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 23s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 10s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 5m 51s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 52s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 51s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 19m 28s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 4s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green}109m 26s{color} | {color:green} hbase-server in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}152m 0s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 | | JIRA Issue | HBASE-20090 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12914097/20090.v7.txt | | Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile | | uname | Linux e7481f8e84a3 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh | | git revision | master / c8fba7071e | | maven | version: Apache Maven 3.5.3 (3383c37e1f9e9b3bc3df5050c29c8aff9f295297; 2018-02-24T19:49:05Z) | | Default Java | 1.8.0_151 | | findbugs | v3.1.0-RC3 | | Test Results |
[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16395229#comment-16395229 ] Eshcar Hillel commented on HBASE-20090: --- I am trying to see whether I understand the discussed scenario correctly:
* Region A (TestTable) and region B (Atlas table) are the only two regions in the region server.
* The region server exceeds the global pressure threshold (380MB).
* Region A has 400MB while region B has 0MB.
* Region A has more than 5GB of data on disk (6.9GB), hence a split is triggered.
* Finally (this is the part that is not 100% clear): in the next loop calling flushOneForGlobalPressure, region A is not selected (correct?); region B is selected, but since it has size 0 the precondition check throws an exception.
Is this the right way to describe the problem? Can you explain again why region A is not selected - is it because its size is set to 0, or because it is marked as non-flushable? To summarize, if I understand correctly there is a period of time in which the global memstore size is high (>380MB) while the region holding all this data is being split, and hence the data is in the process of being flushed; as a result, the memstore flusher that is triggered due to global pressure finds itself in an inconsistent state: {color:#008000}"Above memory mark but there are no flushable regions!"{color} {color:#33}Do you think that adding a check in line 187 would solve the problem?
{color}
{code:java}
if (bestAnyRegion == null || bestAnyRegion.getMemStoreDataSize() == 0) { // NEW
  LOG.error("Above memory mark but there are no flushable regions!");
  return false;
}
{code}
> Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run
> ---
>
> Key: HBASE-20090
> URL: https://issues.apache.org/jira/browse/HBASE-20090
> Project: HBase
> Issue Type: Bug
> Reporter: Ted Yu
> Assignee: Ted Yu
> Priority: Major
> Attachments: 20090-server-61260-01-07.log, 20090.v6.txt
>
> Copied the following from a comment since this was a better description of the race condition.
> The original description was merged to the beginning of my first comment below.
> With more debug logging, we can see the scenario where the exception was triggered.
> {code}
> 2018-03-02 17:28:30,097 DEBUG [MemStoreFlusher.0] regionserver.CompactSplit: Splitting TestTable,,1520011528142.0453f29030757eedb6e6a1c57e88c085., compaction_queue=(0:0), split_queue=1
> 2018-03-02 17:28:30,098 DEBUG [RpcServer.priority.FPBQ.Fifo.handler=19,queue=1,port=16020] regionserver.IncreasingToUpperBoundRegionSplitPolicy: ShouldSplit because info size=6.9G, sizeToCheck=256.0M, regionsWithCommonTable=1
> 2018-03-02 17:28:30,296 INFO [RpcServer.default.FPBQ.Fifo.handler=24,queue=0,port=16020] regionserver.MemStoreFlusher: wake up flusher due to ABOVE_ONHEAP_LOWER_MARK
> 2018-03-02 17:28:30,297 DEBUG [MemStoreFlusher.1] regionserver.MemStoreFlusher: Flush thread woke up because memory above low water=381.5 M
> 2018-03-02 17:28:30,297 INFO [RpcServer.default.FPBQ.Fifo.handler=25,queue=1,port=16020] regionserver.MemStoreFlusher: wake up flusher due to ABOVE_ONHEAP_LOWER_MARK
> 2018-03-02 17:28:30,298 DEBUG [MemStoreFlusher.1] regionserver.MemStoreFlusher: region TestTable,,1520011528142.0453f29030757eedb6e6a1c57e88c085.
with size 400432696
> 2018-03-02 17:28:30,298 DEBUG [MemStoreFlusher.1] regionserver.MemStoreFlusher: region atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae. with size 0
> 2018-03-02 17:28:30,298 INFO [MemStoreFlusher.1] regionserver.MemStoreFlusher: Flush of region atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae. due to global heap pressure. Flush type=ABOVE_ONHEAP_LOWER_MARKTotal Memstore Heap size=381.9 MTotal Memstore Off-Heap size=0, Region memstore size=0
> 2018-03-02 17:28:30,298 INFO [MemStoreFlusher.1] regionserver.MemStoreFlusher: wake up by WAKEUPFLUSH_INSTANCE
> 2018-03-02 17:28:30,298 INFO [MemStoreFlusher.1] regionserver.MemStoreFlusher: Nothing to flush for atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae.
> 2018-03-02 17:28:30,298 INFO [MemStoreFlusher.1] regionserver.MemStoreFlusher: Excluding unflushable region atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae. -trying to find a different region to flush.
> {code}
> Region 0453f29030757eedb6e6a1c57e88c085 was being split.
> In HRegion#flushcache, the log from the else branch can be seen in 20090-server-61260-01-07.log :
> {code}
> synchronized (writestate) {
>   if (!writestate.flushing && writestate.writesEnabled) {
>     this.writestate.flushing = true;
>   } else {
>     if (LOG.isDebugEnabled()) {
>       LOG.debug("NOT flushing memstore for region " + this
>         + ", flushing=" + writestate.flushing + ", writesEnabled="
>         + writestate.writesEnabled);
>     }
> {code}
> Meaning, region 0453f29030757eedb6e6a1c57e88c085 couldn't flush, leaving memory pressure at a high level.
> When MemStoreFlusher ran to the following call, the region was no longer a flush candidate:
> {code}
> HRegion bestFlushableRegion = getBiggestMemStoreRegion(regionsBySize, excludedRegions, true);
> {code}
> So the other region, atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae., was examined next. Since the region was not receiving writes, the (current) Precondition check failed.
> The proposed fix is to convert the Precondition to a normal return.
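The failing check and the proposed fix can be sketched with plain Java. This is a minimal illustration only: the method names and the local checkState helper below are illustrative stand-ins (not HBase's actual code, and not the Guava class itself), chosen so the snippet runs self-contained. It shows how the precondition raises IllegalStateException when neither candidate region has flushable data, while a logged early return degrades gracefully:

```java
public class FlushSelectionSketch {
  // Local stand-in for Guava's Preconditions.checkState(boolean) contract:
  // throws IllegalStateException when the expression is false.
  static void checkState(boolean expression) {
    if (!expression) {
      throw new IllegalStateException();
    }
  }

  // Original shape: the failed check propagates out of the flush handler's loop.
  static boolean selectWithPrecondition(String regionToFlush, long regionToFlushSize,
                                        long bestRegionReplicaSize) {
    checkState((regionToFlush != null && regionToFlushSize > 0) || bestRegionReplicaSize > 0);
    return true;
  }

  // Proposed fix: convert the precondition into a logged normal return.
  static boolean selectWithEarlyReturn(String regionToFlush, long regionToFlushSize,
                                       long bestRegionReplicaSize) {
    if (!((regionToFlush != null && regionToFlushSize > 0) || bestRegionReplicaSize > 0)) {
      System.err.println("Above memory mark but there are no flushable regions!");
      return false;
    }
    return true;
  }

  public static void main(String[] args) {
    // Simulate the race: the big region is mid-split (excluded), the other has size 0.
    try {
      selectWithPrecondition(null, 0L, 0L);
      System.out.println("no exception");
    } catch (IllegalStateException e) {
      System.out.println("IllegalStateException"); // what the flush handler hit
    }
    System.out.println(selectWithEarlyReturn(null, 0L, 0L)); // false: the flusher survives
  }
}
```

The early-return variant leaves memory pressure unresolved for this iteration, but the flush handler thread stays alive to retry on the next wakeup.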
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389057#comment-16389057 ] Ted Yu commented on HBASE-20090: I also thought about adding a test. Since the race condition revolves around the online-region traversal in flushOneForGlobalPressure(), it seems some knobs (which a test can control) would need to be added there. I am not sure whether that can be achieved in a clean way.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
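One conceivable shape for such a knob is a protected seam around the region enumeration, so a test double can present a pathological region list to the selection logic. Everything below is an illustrative sketch under that assumption - the class, method names, and simplified String-valued map are not HBase's actual code:

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical seam: a test subclass overrides regionsSortedBySize() to
// simulate the race where the only visible region has memstore size 0.
public class FlusherSeamSketch {
  // Production code would ask the region server; a test overrides this.
  protected TreeMap<Long, String> regionsSortedBySize() {
    TreeMap<Long, String> m = new TreeMap<>();
    m.put(400_432_696L, "TestTable-region");
    return m;
  }

  public boolean flushOneForGlobalPressure() {
    Map.Entry<Long, String> biggest = regionsSortedBySize().lastEntry();
    if (biggest == null || biggest.getKey() == 0L) {
      // Nothing flushable: log and return instead of failing a precondition.
      System.err.println("Above memory mark but there are no flushable regions!");
      return false;
    }
    return true;
  }

  public static void main(String[] args) {
    FlusherSeamSketch prod = new FlusherSeamSketch();
    // Test double simulating the race: the only region left has size 0.
    FlusherSeamSketch test = new FlusherSeamSketch() {
      @Override protected TreeMap<Long, String> regionsSortedBySize() {
        TreeMap<Long, String> m = new TreeMap<>();
        m.put(0L, "atlas_janus-region");
        return m;
      }
    };
    System.out.println(prod.flushOneForGlobalPressure()); // true
    System.out.println(test.flushOneForGlobalPressure()); // false
  }
}
```

Whether such a seam can be threaded through the real MemStoreFlusher cleanly is exactly the open question in the comment above.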
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389046#comment-16389046 ] Ted Yu commented on HBASE-20090: w.r.t. getBiggestMemstoreRegion(), there is a 3rd parameter:
{code}
boolean checkStoreFileCount) {
{code}
which differs between the two invocations. I don't think the first invocation can add a region to excludedRegions simply because the region doesn't pass the check in the current call. It seems more refactoring is needed to achieve what Ram suggested above.
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16388601#comment-16388601 ] stack commented on HBASE-20090: --- Thanks. While the Enis commit is from 3 years ago, in case you did not know, memory accounting has been redone lately by [~eshcar]. I would suggest you consult with the authority before you make 'enhancements'.
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16388591#comment-16388591 ] Ted Yu commented on HBASE-20090: I have given an example of the exception from the region server log above. Here is the snippet again:
{code}
2018-02-26 16:06:50,049 ERROR [MemStoreFlusher.1] regionserver.MemStoreFlusher: Cache flusher failed for entry org.apache.hadoop.hbase.regionserver.MemStoreFlusher$WakeupFlushThread@2adfadd7
java.lang.IllegalStateException
        at org.apache.hbase.thirdparty.com.google.common.base.Preconditions.checkState(Preconditions.java:441)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:174)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$600(MemStoreFlusher.java:69)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:237)
        at java.lang.Thread.run(Thread.java:748)
{code}
bq. Who wrote this code?
{code}
4ac42a2f hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreFlusher.java (Enis Soztutar 2015-03-06 14:32:05 -0800 256) Preconditions.checkState(
{code}
{code}
commit 4ac42a2f56b91ce864d1bcb04f1f9950e527aab1
Author: Enis Soztutar
Date: Fri Mar 6 14:32:05 2015 -0800

    HBASE-12562 Handling memory pressure for secondary region replicas
{code}
From 3 years ago.
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16388586#comment-16388586 ] stack commented on HBASE-20090: --- bq. Who wrote this code? Flag them. I'm sure they'd be interested. Do you have an example of the exception, or is this just splunking? I asked the above early in this issue. Were these questions answered?
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16388569#comment-16388569 ] Ted Yu commented on HBASE-20090: Looking at the javadoc for getCopyOfOnlineRegionsSortedByOffHeapSize() :
{code}
 * the biggest. If two regions are the same size, then the last one found wins; i.e. this
 * method may NOT return all regions.
{code}
Currently the value type is HRegion - we only store one region per size. One enhancement is to change the value type to a Collection so that we don't miss any region (potentially one with a big size).
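The javadoc caveat quoted above can be demonstrated with a plain TreeMap standing in for the real size-sorted map (the region names and Long/String types here are illustrative, not HBase's actual map shape): keying by size keeps only the last region inserted at each size, while a Collection-valued map retains all of them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class SortedBySizeSketch {
  public static void main(String[] args) {
    // Current shape: one region per size key -- a same-size region
    // silently overwrites the previous one.
    TreeMap<Long, String> bySize = new TreeMap<>();
    bySize.put(400L, "regionA");
    bySize.put(400L, "regionB"); // regionA is lost
    System.out.println(bySize.size()); // prints 1

    // Suggested enhancement: keep every region that shares a size.
    TreeMap<Long, List<String>> bySizeAll = new TreeMap<>();
    bySizeAll.computeIfAbsent(400L, k -> new ArrayList<>()).add("regionA");
    bySizeAll.computeIfAbsent(400L, k -> new ArrayList<>()).add("regionB");
    System.out.println(bySizeAll.get(400L).size()); // prints 2
  }
}
```

With the Collection-valued map, iteration over candidates would need a nested loop, but no large region can be shadowed by an equal-sized one.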
[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387971#comment-16387971 ] Ted Yu commented on HBASE-20090: My understanding is that flushOneForGlobalPressure() has better view of the regions' state than getBiggestMemStoreRegion() does. For getBiggestMemStoreRegion(), null is returned if no region is found. Ram: If you can be more specific on how the additional check can be placed inside getBiggestMemStoreRegion(), I would be happy to pursue it. Thanks > Properly handle Preconditions check failure in > MemStoreFlusher$FlushHandler.run > --- > > Key: HBASE-20090 > URL: https://issues.apache.org/jira/browse/HBASE-20090 > Project: HBase > Issue Type: Bug >Reporter: Ted Yu >Assignee: Ted Yu >Priority: Major > Attachments: 20090-server-61260-01-07.log, 20090.v6.txt > > > Copied the following from a comment since this was better description of the > race condition. > The original description was merged to the beginning of my first comment > below. > With more debug logging, we can see the scenario where the exception was > triggered. 
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387349#comment-16387349 ] Hadoop QA commented on HBASE-20090:

| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 20s{color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} patch {color} | {color:blue} 0m 3s{color} | {color:blue} The patch file was not named according to hbase's naming conventions. Please see https://yetus.apache.org/documentation/0.7.0/precommit-patchnames for instructions. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 2s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 7s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 5m 57s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 8s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 8s{color} | {color:green} hbase-server: The patch generated 0 new + 29 unchanged - 1 fixed = 29 total (was 30) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 47s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 19m 52s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}136m 52s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 24s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}181m 33s{color} | {color:black} {color} |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-20090 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12913112/20090.v6.txt |
| Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 190f8227d5cc 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 11:55:51 UTC 2017 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 4a4c012049 |
| maven | version: Apache Maven 3.5.2 (138edd61fd100ec658bfa2d307c43b76940a5d7d; 2017-10-18T07:58:13Z) |
| Default Java |
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387313#comment-16387313 ] ramkrishna.s.vasudevan commented on HBASE-20090:

I think the case here is valid. As [~anoop.hbase] said, this region size calculation was changed recently, so we should be careful here. Instead of changing the precondition to a normal 'if' clause, can we add a check to see if the region has 0 data in this method:
{code}
getBiggestMemStoreRegion()
{code}
It already has code for the concurrency case you are describing, where a split has happened and marked the region as not eligible for flush. So I think it would be better to add that check there?
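A rough sketch of that alternative, with hypothetical stand-in types (the real getBiggestMemStoreRegion() works over HBase's region/size bookkeeping): walk regions from the largest memstore down, skipping excluded regions as the method already does, and additionally skip any region whose memstore size is already 0:

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;

// Illustrative stand-in, not the actual MemStoreFlusher helper.
public class BiggestRegionSketch {
  static String biggestFlushableRegion(NavigableMap<Long, String> regionsBySize,
                                       Set<String> excludedRegions) {
    for (Map.Entry<Long, String> e : regionsBySize.descendingMap().entrySet()) {
      if (excludedRegions.contains(e.getValue())) {
        continue; // e.g. mid-split: already not eligible for flush
      }
      if (e.getKey() <= 0) {
        continue; // the suggested extra check: empty memstore cannot relieve pressure
      }
      return e.getValue();
    }
    return null; // caller must handle "no candidate" without throwing
  }
}
```

With this guard, the atlas_janus region from the log (size 0) would never be returned as the "best" candidate in the first place.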
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385924#comment-16385924 ] Ted Yu commented on HBASE-20090:

Here is how I produced the server log with the additional DEBUG information:
* Wipe out TestTable
* Add DEBUG logging, build a tarball, and load the tarball onto the cluster
* Observe that TestTable has 8 regions, then search for the new DEBUG log in each region server log

The above procedure was repeated several times until I got what you saw as 20090-server-61260-01-07.log

> Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run
> ---
> Key: HBASE-20090
> URL: https://issues.apache.org/jira/browse/HBASE-20090
> Project: HBase
> Issue Type: Bug
> Reporter: Ted Yu
> Assignee: Ted Yu
> Priority: Major
> Attachments: 20090-server-61260-01-07.log, 20090.v1.txt, 20090.v4.txt, 20090.v5.txt
>
> Here is the code in branch-2 :
> {code}
> try {
>   wakeupPending.set(false); // allow someone to wake us up again
>   fqe = flushQueue.poll(threadWakeFrequency, TimeUnit.MILLISECONDS);
>   if (fqe == null || fqe instanceof WakeupFlushThread) {
>     ...
>     if (!flushOneForGlobalPressure()) {
>       ...
>   FlushRegionEntry fre = (FlushRegionEntry) fqe;
>   if (!flushRegion(fre)) {
>     break;
>   ...
> } catch (Exception ex) {
>   LOG.error("Cache flusher failed for entry " + fqe, ex);
>   if (!server.checkFileSystem()) {
>     break;
>   }
> }
> {code}
> Inside flushOneForGlobalPressure():
> {code}
> Preconditions.checkState(
>   (regionToFlush != null && regionToFlushSize > 0) ||
>   (bestRegionReplica != null && bestRegionReplicaSize > 0));
> {code}
> When the Preconditions check fails, IllegalStateException is caught by the catch block shown above.
> However, the fqe is not flushed, resulting in potential data loss.
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385919#comment-16385919 ] Ted Yu commented on HBASE-20090:

In another version of the patch I composed, fqe is passed to flushOneForGlobalPressure(). We can add a condition so that when fqe == WAKEUPFLUSH_INSTANCE, it is okay to return early from flushOneForGlobalPressure() due to the race condition described above.
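A hedged sketch of that variant; the sentinel object and the method signature here are illustrative assumptions, not the actual patch. The flush queue entry is passed down so the global-pressure path tolerates the race only for wakeup-driven calls, while a real flush entry hitting an empty candidate set still surfaces as an error:

```java
// Hypothetical stand-in for the fqe-passing variant of flushOneForGlobalPressure().
public class WakeupSketch {
  // Illustrative sentinel mirroring MemStoreFlusher's WAKEUPFLUSH_INSTANCE.
  static final Object WAKEUPFLUSH_INSTANCE = new Object();

  static boolean flushOneForGlobalPressure(Object fqe, long bestCandidateSize) {
    if (bestCandidateSize <= 0) {
      if (fqe == WAKEUPFLUSH_INSTANCE) {
        // Wakeup-driven call raced with a split/flush: benign, nothing to do.
        return false;
      }
      // A real FlushRegionEntry with no flushable candidate is still unexpected.
      throw new IllegalStateException("no flushable region under global pressure");
    }
    // ... flush the chosen candidate (elided in this sketch) ...
    return true;
  }
}
```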
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385904#comment-16385904 ] Ted Yu commented on HBASE-20090:

Relevant region server log snippet attached. See if the snippet provides what you were looking for.
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385709#comment-16385709 ] Anoop Sam John commented on HBASE-20090:

I read the description alone, not the whole comments, Ted. Sorry. Ya, my doubt was also in that direction. But as there were some recent changes in the size calculation and the way we select regions, I thought this needs a careful eye. Anyway, that whole area needs to be thoroughly reviewed once again. Fine; tomorrow, or whenever you have some time, try to reproduce it and gather all the nearby debug logs.
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385681#comment-16385681 ] Ted Yu commented on HBASE-20090:

bq. Is this reproducible?

Though not 100% reproducible, this has a high probability of happening when the data ingestion rate is high and regions split during ingestion.
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385676#comment-16385676 ] Ted Yu commented on HBASE-20090:

Thanks for taking a look, Anoop. Did you read my explanation above the patch v01 QA result?

bq. by the time the global heap pressure flush try to do the flush, the size become zero

There were two regions on the region server of interest; region 0453f29030757eedb6e6a1c57e88c085 was being split.
From the log I added, we can see that it appeared at 2018-03-02 17:28:30,298 with non-zero size. However, when the following loop kicked in:
{code}
while (!flushedOne) {
{code}
it started splitting. Therefore the other region, with memstore size 0, was picked up. The Precondition check failed due to the 0 memstore size.

I was thinking of other ways to fix this concurrency issue but ended up picking what you see in patch v4. The rationale is that the region being split will finish splitting and become eligible for future flushing; the temporary suspension of flushing will be lifted later.

I can dig up more of the logs tomorrow; it is late in California.
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385659#comment-16385659 ] Anoop Sam John commented on HBASE-20090:

The thing to be checked is why a best region was selected with a heap size of 0 when it was about to flush. The selection of the best region for flush is already based on the heap size of the region. There were some changes in this area recently, about tracking heap size rather than data size for the flush decision etc., but on first look that area does not seem to be doing any new harm. Is this reproducible? If so, we need to have DEBUG logs and paste the nearby logs as well. Is it that the selected region was anyway about to flush (because of the per-region flush decision itself), so that by the time the global heap pressure flush tried to do the flush, the size had become zero? There is a time gap between the place where the best region is selected and where we assign its heap size to a variable. All this we can say clearly if we have the nearby logs. So IMO, rather than a simple fix, we should investigate the real reason. Finally, this fix may be the one we can really do, but before that please do a thorough investigation. I think this will be a good jira to look at and investigate.
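The time gap described above is a classic check-then-act window, and can be illustrated with a toy simulation (the Region type and method here are hypothetical stand-ins, not MemStoreFlusher code): the size observed at selection time can differ from the size read when the flush actually runs.

```java
import java.util.concurrent.atomic.AtomicLong;

// Toy illustration of the selection-vs-flush time gap; not HBase code.
public class RaceSketch {
  static class Region {
    final AtomicLong memstoreSize = new AtomicLong();
  }

  // Returns {size at selection, size at flush}. A concurrent event (flush
  // completing, split starting) between the two reads can drop the second
  // value to 0 even though the region looked like the best candidate.
  static long[] observe(Region selected, Runnable concurrentEvent) {
    long atSelection = selected.memstoreSize.get(); // looks flushable here
    concurrentEvent.run();                          // split/flush races in
    long atFlush = selected.memstoreSize.get();     // may now be 0
    return new long[] { atSelection, atFlush };
  }
}
```

This is why nearby logs matter: they show which of the two reads each log line reflects.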
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385519#comment-16385519 ] Hadoop QA commented on HBASE-20090:

| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s{color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} patch {color} | {color:blue} 0m 1s{color} | {color:blue} The patch file was not named according to hbase's naming conventions. Please see https://yetus.apache.org/documentation/0.7.0/precommit-patchnames for instructions. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 18s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 5s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 5m 49s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 56s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 5s{color} | {color:green} hbase-server: The patch generated 0 new + 29 unchanged - 1 fixed = 29 total (was 30) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 51s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 19m 37s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}120m 41s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 25s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}163m 34s{color} | {color:black} {color} |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-20090 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12912958/20090.v5.txt |
| Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux a269a8957eed 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 485af49e53 |
| maven | version: Apache Maven 3.5.2 (138edd61fd100ec658bfa2d307c43b76940a5d7d; 2017-10-18T07:58:13Z) |
| Default Java |
[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385441#comment-16385441 ] Ted Yu commented on HBASE-20090: Patch v5 addresses checkstyle warning. > Properly handle Preconditions check failure in > MemStoreFlusher$FlushHandler.run > --- > > Key: HBASE-20090 > URL: https://issues.apache.org/jira/browse/HBASE-20090 > Project: HBase > Issue Type: Bug >Reporter: Ted Yu >Assignee: Ted Yu >Priority: Major > Attachments: 20090.v1.txt, 20090.v4.txt, 20090.v5.txt > > > Here is the code in branch-2 : > {code} > try { > wakeupPending.set(false); // allow someone to wake us up again > fqe = flushQueue.poll(threadWakeFrequency, TimeUnit.MILLISECONDS); > if (fqe == null || fqe instanceof WakeupFlushThread) { > ... > if (!flushOneForGlobalPressure()) { > ... > FlushRegionEntry fre = (FlushRegionEntry) fqe; > if (!flushRegion(fre)) { > break; > ... > } catch (Exception ex) { > LOG.error("Cache flusher failed for entry " + fqe, ex); > if (!server.checkFileSystem()) { > break; > } > } > {code} > Inside flushOneForGlobalPressure(): > {code} > Preconditions.checkState( > (regionToFlush != null && regionToFlushSize > 0) || > (bestRegionReplica != null && bestRegionReplicaSize > 0)); > {code} > When the Preconditions check fails, IllegalStateException is caught by the > catch block shown above. > However, the fqe is not flushed, resulting in potential data loss. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385434#comment-16385434 ] Hadoop QA commented on HBASE-20090: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s{color} | {color:blue} Docker mode activated. {color} | | {color:blue}0{color} | {color:blue} patch {color} | {color:blue} 0m 1s{color} | {color:blue} The patch file was not named according to hbase's naming conventions. Please see https://yetus.apache.org/documentation/0.7.0/precommit-patchnames for instructions. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 19s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 4s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 5m 52s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 55s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 5s{color} | {color:red} hbase-server: The patch generated 1 new + 30 unchanged - 0 fixed = 31 total (was 30) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 41s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 18m 16s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green}104m 43s{color} | {color:green} hbase-server in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 21s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}145m 57s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 | | JIRA Issue | HBASE-20090 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12912954/20090.v4.txt | | Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile | | uname | Linux c8ae7df5c2b0 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh | | git revision | master / 485af49e53 | | maven | version: Apache Maven 3.5.2 (138edd61fd100ec658bfa2d307c43b76940a5d7d; 2017-10-18T07:58:13Z) | | Default Java | 1.8.0_151 |
[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16384221#comment-16384221 ] Hadoop QA commented on HBASE-20090: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 23s{color} | {color:blue} Docker mode activated. {color} | | {color:blue}0{color} | {color:blue} patch {color} | {color:blue} 0m 2s{color} | {color:blue} The patch file was not named according to hbase's naming conventions. Please see https://yetus.apache.org/documentation/0.7.0/precommit-patchnames for instructions. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 40s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 4s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 5m 48s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 56s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 35s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 4s{color} | {color:red} hbase-server: The patch generated 1 new + 30 unchanged - 0 fixed = 31 total (was 30) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 40s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 18m 41s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green}136m 23s{color} | {color:green} hbase-server in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 19s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}178m 26s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 | | JIRA Issue | HBASE-20090 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12912811/20090.v1.txt | | Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile | | uname | Linux 57fbe6304939 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 11:55:51 UTC 2017 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh | | git revision | master / 1d25b60831 | | maven | version: Apache Maven 3.5.2 (138edd61fd100ec658bfa2d307c43b76940a5d7d; 2017-10-18T07:58:13Z) | | Default Java | 1.8.0_151 |
[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383979#comment-16383979 ] Hadoop QA commented on HBASE-20090: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 3m 11s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 52s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 17s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 5m 16s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 37s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 17s{color} | {color:red} hbase-mapreduce: The patch generated 3 new + 51 unchanged - 0 fixed = 54 total (was 51) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 36s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 19m 24s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 15m 18s{color} | {color:green} hbase-mapreduce in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 14s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 56m 1s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 | | JIRA Issue | HBASE-20090 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12912717/20094.v01.patch | | Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile | | uname | Linux d1c2d38fbd0e 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 11:55:51 UTC 2017 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh | | git revision | master / 1d25b60831 | | maven | version: Apache Maven 3.5.2 (138edd61fd100ec658bfa2d307c43b76940a5d7d; 2017-10-18T07:58:13Z) | | Default Java | 1.8.0_151 | | findbugs | v3.1.0-RC3 | | checkstyle | https://builds.apache.org/job/PreCommit-HBASE-Build/11782/artifact/patchprocess/diff-checkstyle-hbase-mapreduce.txt | | Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/11782/testReport/ | | Max. process+thread count | 4391 (vs. ulimit of 1) | | modules | C: hbase-mapreduce U: hbase-mapreduce | | Console output |
[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383887#comment-16383887 ] Ted Yu commented on HBASE-20090: With more debug logging, we can see the scenario where the exception was triggered. {code} 2018-03-02 17:28:30,097 DEBUG [MemStoreFlusher.0] regionserver.CompactSplit: Splitting TestTable,,1520011528142.0453f29030757eedb6e6a1c57e88c085., compaction_queue=(0:0), split_queue=1 2018-03-02 17:28:30,098 DEBUG [RpcServer.priority.FPBQ.Fifo.handler=19,queue=1,port=16020] regionserver.IncreasingToUpperBoundRegionSplitPolicy: ShouldSplit because info size=6.9G, sizeToCheck=256.0M, regionsWithCommonTable=1 2018-03-02 17:28:30,296 INFO [RpcServer.default.FPBQ.Fifo.handler=24,queue=0,port=16020] regionserver.MemStoreFlusher: wake up flusher due to ABOVE_ONHEAP_LOWER_MARK 2018-03-02 17:28:30,297 DEBUG [MemStoreFlusher.1] regionserver.MemStoreFlusher: Flush thread woke up because memory above low water=381.5 M 2018-03-02 17:28:30,297 INFO [RpcServer.default.FPBQ.Fifo.handler=25,queue=1,port=16020] regionserver.MemStoreFlusher: wake up flusher due to ABOVE_ONHEAP_LOWER_MARK 2018-03-02 17:28:30,298 DEBUG [MemStoreFlusher.1] regionserver.MemStoreFlusher: region TestTable,,1520011528142.0453f29030757eedb6e6a1c57e88c085. with size 400432696 2018-03-02 17:28:30,298 DEBUG [MemStoreFlusher.1] regionserver.MemStoreFlusher: region atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae. with size 0 2018-03-02 17:28:30,298 INFO [MemStoreFlusher.1] regionserver.MemStoreFlusher: Flush of region atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae. due to global heap pressure. 
Flush type=ABOVE_ONHEAP_LOWER_MARK Total Memstore Heap size=381.9 M Total Memstore Off-Heap size=0, Region memstore size=0 2018-03-02 17:28:30,298 INFO [MemStoreFlusher.1] regionserver.MemStoreFlusher: wake up by WAKEUPFLUSH_INSTANCE 2018-03-02 17:28:30,298 INFO [MemStoreFlusher.1] regionserver.MemStoreFlusher: Nothing to flush for atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae. 2018-03-02 17:28:30,298 INFO [MemStoreFlusher.1] regionserver.MemStoreFlusher: Excluding unflushable region atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae. - trying to find a different region to flush. {code} Region 0453f29030757eedb6e6a1c57e88c085 was being split. By the time MemStoreFlusher reached this call, the region was no longer a flush candidate: {code} HRegion bestFlushableRegion = getBiggestMemStoreRegion(regionsBySize, excludedRegions, true); {code} So the other region, atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae., was examined next. Since that region was not receiving writes, the (current) Precondition check failed. The proposed fix is to convert the Precondition into a normal log statement.
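The proposed change can be sketched in isolation. This is a hypothetical simplification, not the actual MemStoreFlusher code: candidate selection is reduced to a pure function with made-up parameter names, and the Precondition is replaced by the proposed warn-and-return path.

```java
// Hypothetical sketch of the proposed fix: instead of
// Preconditions.checkState(...), which throws IllegalStateException when a
// racing split leaves no flushable candidate, log and return false so the
// flusher can simply retry on its next wakeup.
final class FlushCandidateCheck {
  // Returns true only when there is something worth flushing; mirrors the
  // condition guarded by the Precondition in flushOneForGlobalPressure().
  static boolean hasFlushCandidate(boolean regionPresent, long regionSize,
      boolean replicaPresent, long replicaSize) {
    boolean flushable = (regionPresent && regionSize > 0)
        || (replicaPresent && replicaSize > 0);
    if (!flushable) {
      // Proposed behavior: a normal log line instead of an exception.
      System.out.println("Above memory mark but no flushable region; "
          + "a concurrent split may have removed the candidate.");
    }
    return flushable;
  }
}
```

With this shape, the scenario in the log above (the region under split excluded, the remaining region with memstore size 0) yields false rather than an IllegalStateException that unwinds FlushHandler.run.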
[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383743#comment-16383743 ] Ted Yu commented on HBASE-20090: Did some more testing on a 5 region server cluster with additional logging. TestTable had 8 regions when PE stopped. However, the additional log didn't show up in any of the region server logs. Will conduct more testing.
[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383215#comment-16383215 ] stack commented on HBASE-20090: --- Why you posting a patch when you've not said what the problem is nor how the patch supposedly fixes this problem [~yuzhih...@gmail.com]
[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383213#comment-16383213 ] stack commented on HBASE-20090: --- Patch is for the wrong issue?
[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383145#comment-16383145 ] Ted Yu commented on HBASE-20090: It seems the Preconditions check can be converted to a normal condition check. [~ram_krish] [~anoop.hbase] [~anastas] : Can you take a look at the patch ? Here is a snippet from the region server log during PE randomWrite: {code} 2018-03-02 03:55:19,232 INFO [MemStoreFlusher.1] regionserver.MemStoreFlusher: Flush of region atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae. due to global heap pressure. Flush type=ABOVE_ONHEAP_HIGHER_MARK Total Memstore Heap size=403.9 M Total Memstore Off-Heap size=0, Region memstore size=0 2018-03-02 03:55:19,232 INFO [MemStoreFlusher.1] regionserver.MemStoreFlusher: Nothing to flush for atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae. 2018-03-02 03:55:19,232 INFO [MemStoreFlusher.1] regionserver.MemStoreFlusher: Excluding unflushable region atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae. - trying to find a different region to flush. {code} Note that atlas_janus was not the table being written to; TestTable was.
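Why the thrown IllegalStateException matters for the flush queue can be shown with a toy model of the FlushHandler loop (hypothetical class and method names; the real loop lives in MemStoreFlusher$FlushHandler.run). The entry is polled before the flush is attempted, so an exception on the flush path means the entry is consumed but never flushed and never re-queued, which is the potential-data-loss hazard the issue describes.

```java
import java.util.Queue;

// Toy model of the FlushHandler loop: an entry is polled from the queue
// before the flush is attempted. If the flush path throws (e.g. the
// Precondition's IllegalStateException), the catch block only logs, so the
// polled entry is silently dropped.
final class FlushLoopSketch {
  static int processed(Queue<String> flushQueue, boolean flushThrows) {
    int flushed = 0;
    while (!flushQueue.isEmpty()) {
      String fqe = flushQueue.poll(); // entry is consumed here
      try {
        if (flushThrows) {
          throw new IllegalStateException(); // simulated Precondition failure
        }
        flushed++; // flush succeeded
      } catch (Exception ex) {
        // Mirrors the real catch block: log and carry on; fqe is lost.
        System.out.println("Cache flusher failed for entry " + fqe);
      }
    }
    return flushed;
  }
}
```

With flushThrows set, every queued entry is drained without a single flush, which is why converting the check into a plain condition that returns to the caller is safer than throwing.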
[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377843#comment-16377843 ] Ted Yu commented on HBASE-20090: Observed the following in region server log (in hadoop3 cluster): {code} 2018-02-26 16:06:49,962 INFO [MemStoreFlusher.1] regionserver.HRegion: Flushing 1/1 column families, memstore=804.67 KB 2018-02-26 16:06:50,028 INFO [MemStoreFlusher.1] regionserver.DefaultStoreFlusher: Flushed, sequenceid=5448, memsize=804.7 K, hasBloomFilter=true, into tmp file hdfs:// mycluster/apps/hbase/data/data/default/TestTable/3552368c92476437cb96e357d2c7d618/.tmp/info/81721cc57fee43ebb55ba430f5730c25 2018-02-26 16:06:50,042 INFO [MemStoreFlusher.1] regionserver.HStore: Added hdfs://mycluster/apps/hbase/data/data/default/TestTable/3552368c92476437cb96e357d2c7d618/info/ 81721cc57fee43ebb55ba430f5730c25, entries=784, sequenceid=5448, filesize=813.9 K 2018-02-26 16:06:50,044 INFO [MemStoreFlusher.1] regionserver.HRegion: Finished memstore flush of ~804.67 KB/823984, currentsize=0 B/0 for region TestTable, 00155728,1519661093622.3552368c92476437cb96e357d2c7d618. 
in 82ms, sequenceid=5448, compaction requested=true 2018-02-26 16:06:50,044 WARN [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=16020] regionserver.MemStoreFlusher: Memstore is above high water mark and block 185ms 2018-02-26 16:06:50,044 WARN [RpcServer.default.FPBQ.Fifo.handler=28,queue=1,port=16020] regionserver.MemStoreFlusher: Memstore is above high water mark and block 163ms 2018-02-26 16:06:50,044 WARN [RpcServer.default.FPBQ.Fifo.handler=22,queue=1,port=16020] regionserver.MemStoreFlusher: Memstore is above high water mark and block 160ms 2018-02-26 16:06:50,044 WARN [RpcServer.default.FPBQ.Fifo.handler=25,queue=1,port=16020] regionserver.MemStoreFlusher: Memstore is above high water mark and block 160ms 2018-02-26 16:06:50,044 WARN [RpcServer.default.FPBQ.Fifo.handler=23,queue=2,port=16020] regionserver.MemStoreFlusher: Memstore is above high water mark and block 158ms 2018-02-26 16:06:50,044 WARN [RpcServer.default.FPBQ.Fifo.handler=27,queue=0,port=16020] regionserver.MemStoreFlusher: Memstore is above high water mark and block 151ms 2018-02-26 16:06:50,044 WARN [RpcServer.default.FPBQ.Fifo.handler=24,queue=0,port=16020] regionserver.MemStoreFlusher: Memstore is above high water mark and block 147ms 2018-02-26 16:06:50,044 WARN [RpcServer.default.FPBQ.Fifo.handler=26,queue=2,port=16020] regionserver.MemStoreFlusher: Memstore is above high water mark and block 135ms 2018-02-26 16:06:50,049 ERROR [MemStoreFlusher.1] regionserver.MemStoreFlusher: Cache flusher failed for entry org.apache.hadoop.hbase.regionserver. 
MemStoreFlusher$WakeupFlushThread@2adfadd7 java.lang.IllegalStateException at org.apache.hbase.thirdparty.com.google.common.base.Preconditions.checkState(Preconditions.java:441) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:174) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$600(MemStoreFlusher.java:69) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:237) at java.lang.Thread.run(Thread.java:748) {code} Unfortunately the DEBUG logging was not on. Will see if I can reproduce the exception next time.
[jira] [Commented] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run
[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377838#comment-16377838 ] stack commented on HBASE-20090: --- Who wrote this code? Flag them. I'm sure they'd be interested. Do you have an example of the exception, or is this just splunking?