[jira] [Commented] (HBASE-22422) Retain an ByteBuff with refCnt=0 when getBlock from LRUCache
[ https://issues.apache.org/jira/browse/HBASE-22422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16874109#comment-16874109 ]

Hudson commented on HBASE-22422:

Results for branch branch-2 [build #2029 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/2029/]: (x) *{color:red}-1 overall{color}*

details (if available):

(x) {color:red}-1 general checks{color}
-- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/2029//General_Nightly_Build_Report/]

(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/2029//JDK8_Nightly_Build_Report_(Hadoop2)/]

(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/2029//JDK8_Nightly_Build_Report_(Hadoop3)/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.

(/) {color:green}+1 client integration test{color}

> Retain an ByteBuff with refCnt=0 when getBlock from LRUCache
>
> Key: HBASE-22422
> URL: https://issues.apache.org/jira/browse/HBASE-22422
> Project: HBase
> Issue Type: Sub-task
> Components: BlockCache
> Reporter: Zheng Hu
> Assignee: Zheng Hu
> Priority: Major
> Attachments: 0001-debug2.patch, 0001-debug2.patch, 0001-debug2.patch, 0001-debug3.patch, 0001-debug4.patch, HBASE-22422-qps-after-fix-the-zero-retain-bug.png, HBASE-22422.HBASE-21879.v01.patch, HBASE-22422.HBASE-21879.v02.patch, LRUBlockCache-getBlock.png, debug.patch, failed-to-check-positive-on-web-ui.png, image-2019-05-15-12-00-03-641.png
>
> After running a YCSB scan/get benchmark in our XiaoMi cluster, we found that the get QPS dropped from 25000/s to hundreds per second in a five-node cluster.
> After enabling the debug log on the YCSB client side, I found the following stack trace, see https://issues.apache.org/jira/secure/attachment/12968745/image-2019-05-15-12-00-03-641.png.
>
> After looking into the stack trace, I can confirm that the zero-refCnt block is an intermediate index block, see [2] http://hbase.apache.org/images/hfilev2.png
> Need a patch to fix this.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-22422) Retain an ByteBuff with refCnt=0 when getBlock from LRUCache
[ https://issues.apache.org/jira/browse/HBASE-22422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870934#comment-16870934 ]

Hudson commented on HBASE-22422:

Results for branch master [build #1168 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/1168/]: (x) *{color:red}-1 overall{color}*

details (if available):

(x) {color:red}-1 general checks{color}
-- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/master/1168//General_Nightly_Build_Report/]

(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/master/1168//JDK8_Nightly_Build_Report_(Hadoop2)/]

(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/master/1168//JDK8_Nightly_Build_Report_(Hadoop3)/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.

(/) {color:green}+1 client integration test{color}
[jira] [Commented] (HBASE-22422) Retain an ByteBuff with refCnt=0 when getBlock from LRUCache
[ https://issues.apache.org/jira/browse/HBASE-22422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16849899#comment-16849899 ]

ramkrishna.s.vasudevan commented on HBASE-22422:

[~openinx] I just asked a question in the PR.
[jira] [Commented] (HBASE-22422) Retain an ByteBuff with refCnt=0 when getBlock from LRUCache
[ https://issues.apache.org/jira/browse/HBASE-22422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16849286#comment-16849286 ]

Zheng Hu commented on HBASE-22422:

Pushed to the HBASE-21879 branch. Thanks [~Apache9] & [~ram_krish] for reviewing.
[jira] [Commented] (HBASE-22422) Retain an ByteBuff with refCnt=0 when getBlock from LRUCache
[ https://issues.apache.org/jira/browse/HBASE-22422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16848366#comment-16848366 ]

HBase QA commented on HBASE-22422:

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 2m 41s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 1s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} |
|| || || || {color:brown} HBASE-21879 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 26s{color} | {color:green} HBASE-21879 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 51s{color} | {color:green} HBASE-21879 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 11s{color} | {color:green} HBASE-21879 passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 23s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 2m 47s{color} | {color:blue} hbase-server in HBASE-21879 has 11 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 32s{color} | {color:green} HBASE-21879 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 50s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 12s{color} | {color:red} hbase-server: The patch generated 1 new + 130 unchanged - 2 fixed = 131 total (was 132) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 24s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 8m 25s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 32s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}130m 17s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 29s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}170m 49s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce base: https://builds.apache.org/job/PreCommit-HBASE-Build/419/artifact/patchprocess/Dockerfile |
| JIRA Issue | HBASE-22422 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12969803/HBASE-22422.HBASE-21879.v02.patch |
| Optional Tests | dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux a8b269ea44a8 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | HBASE-21879 / 111c95c11c |
| maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.11 |
| checkstyle | https://builds.apache.org/job/PreCommit-HBASE-Build/419/artifact/patchprocess/diff-checkstyle-hbase-server.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/419/testReport/ |
| Max. process+thread count | 4895 (vs. ulimit of 1) |
[jira] [Commented] (HBASE-22422) Retain an ByteBuff with refCnt=0 when getBlock from LRUCache
[ https://issues.apache.org/jira/browse/HBASE-22422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16848024#comment-16848024 ]

Zheng Hu commented on HBASE-22422:

Uploaded a picture showing the current YCSB result (see https://issues.apache.org/jira/secure/attachment/12969731/HBASE-22422-qps-after-fix-the-zero-retain-bug.png). At least the QPS no longer drops to hundreds per second. The sawtooth curve still looks a bit strange, though, so I will keep digging.
[jira] [Commented] (HBASE-22422) Retain an ByteBuff with refCnt=0 when getBlock from LRUCache
[ https://issues.apache.org/jira/browse/HBASE-22422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16847259#comment-16847259 ]

ramkrishna.s.vasudevan commented on HBASE-22422:

bq. Understand now, it's a concurrent bug in RAMCache, say if thread1 tries to getBlock as follows:

Good one.
[jira] [Commented] (HBASE-22422) Retain an ByteBuff with refCnt=0 when getBlock from LRUCache
[ https://issues.apache.org/jira/browse/HBASE-22422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16847199#comment-16847199 ]

Zheng Hu commented on HBASE-22422:

A data block read failure leads to an extra index-block release. That is to say, there will be an index block in LruBlockCache with refCnt=0, and every subsequent RPC that requests this zero-refCnt index block will get an IllegalReferenceCountException, which makes the QPS drop from 25000/s to hundreds per second.

Let me explain the details. See the method HFileBlockIndex#loadDataBlockWithScanInfo:

{code}
HFileBlock block = null;
boolean dataBlock = false;
KeyOnlyKeyValue tmpNextIndexKV = new KeyValue.KeyOnlyKeyValue();
while (true) {
  try {
    // ...
    block = cachingBlockReader.readBlock(currentOffset, currentOnDiskSize, shouldCache, pread,
      isCompaction, true, expectedBlockType, expectedDataBlockEncoding);
    // Loop until we get a data block
  } finally {
    if (!dataBlock && block != null) {
      // Release the block immediately if it is not the data block
      block.release();
    }
  }
}
{code}

In the first iteration of the while loop, the block is an index block, read successfully from the LruBlockCache. In the second iteration, we need to read a data block from the CombinedBlockCache, but the read fails because of the RAMCache concurrency issue described above, so cachingBlockReader#readBlock throws an exception. The block variable still references the index block from the first iteration, so the finally block releases it a second time.
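The double release described above can be reproduced in isolation. The following is a minimal sketch, not the HBase code: `Block`, `read`, and `lookup` are hypothetical stand-ins, and resetting the variable at the top of each iteration is shown only as one possible way to avoid the stale reference, not necessarily what the attached patch does.

```java
import java.util.concurrent.atomic.AtomicInteger;

class StaleReleaseSketch {
  /** Hypothetical stand-in for a reference-counted block (not HBase's HFileBlock). */
  static final class Block {
    final boolean isData;
    final AtomicInteger refCnt = new AtomicInteger(1); // the reference held by the cache
    Block(boolean isData) { this.isData = isData; }
    Block retain() { refCnt.incrementAndGet(); return this; }
    void release() { refCnt.decrementAndGet(); }
  }

  /** Hypothetical reader: step 0 returns the index block, step 1 fails like the data block read. */
  static Block read(Block index, int step) {
    if (step == 0) {
      return index.retain();
    }
    throw new IllegalStateException("simulated data block read failure");
  }

  /** Returns the index block's refCnt after the lookup; 1 means only the cache still holds it. */
  static int lookup(Block index, boolean resetPerIteration) {
    Block block = null;
    boolean dataBlock = false;
    try {
      for (int step = 0; ; step++) {
        try {
          if (resetPerIteration) {
            block = null; // forget the block that the previous iteration already released
          }
          block = read(index, step);
          dataBlock = block.isData;
          if (dataBlock) {
            break;
          }
        } finally {
          if (!dataBlock && block != null) {
            block.release(); // with a stale reference this releases the index block twice
          }
        }
      }
    } catch (IllegalStateException e) {
      // The real code would propagate the failure; it is swallowed here to inspect refCnt.
    }
    return index.refCnt.get();
  }

  public static void main(String[] args) {
    System.out.println(lookup(new Block(false), false)); // stale reference: prints 0
    System.out.println(lookup(new Block(false), true));  // reset per iteration: prints 1
  }
}
```

In the first call the index block ends with refCnt 0 even though the cache still holds it, which matches the symptom reported in this issue.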
[jira] [Commented] (HBASE-22422) Retain an ByteBuff with refCnt=0 when getBlock from LRUCache
[ https://issues.apache.org/jira/browse/HBASE-22422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16847185#comment-16847185 ]

Zheng Hu commented on HBASE-22422:

I understand now: it's a concurrency bug in RAMCache. Say thread1 tries to getBlock as follows:

Step.1: get block1 from RAMCache#delegate;
Step.2: call block1#retain to increase its refCnt.

Meanwhile, another thread2 has flushed the block into the IOEngine and starts to clear the block from RAMCache:

Step.a: get block1 via RAMCache#delegate.remove;
Step.b: call block1#release to decrease its refCnt.

If those steps are interleaved as follows:

Step.1: get block1 from RAMCache#delegate;
Step.a: get block1 via RAMCache#delegate.remove;
Step.b: call block1#release to decrease its refCnt; here the refCnt decreases from 1 to 0;
Step.2: call block1#retain to increase its refCnt;

then the concurrency bug occurs: retain is called on a block whose refCnt is already 0. One way to fix this is to make getAndRetain and removeAndRelease atomic.
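One way to make the two operations atomic, sketched here under assumptions, is to run both the lookup-plus-retain and the remove-plus-release inside `ConcurrentHashMap.computeIfPresent`, which applies its function atomically per key. The class and method names below (`AtomicRamCacheSketch`, `Block`, `getAndRetain`, `removeAndRelease`) are hypothetical and only illustrate the idea; this is not claimed to be the code in the attached patch.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class AtomicRamCacheSketch {
  /** Hypothetical reference-counted block; retain() refuses to revive a dead block. */
  static final class Block {
    private final AtomicInteger refCnt = new AtomicInteger(1); // the cache's own reference
    int refCnt() { return refCnt.get(); }
    void retain() {
      int current;
      do {
        current = refCnt.get();
        if (current == 0) {
          throw new IllegalStateException("refCnt: 0, increment: 1");
        }
      } while (!refCnt.compareAndSet(current, current + 1));
    }
    void release() { refCnt.decrementAndGet(); }
  }

  private final ConcurrentHashMap<String, Block> delegate = new ConcurrentHashMap<>();

  void put(String key, Block block) {
    delegate.put(key, block);
  }

  /** Lookup and retain happen inside one atomic per-key operation. */
  Block getAndRetain(String key) {
    return delegate.computeIfPresent(key, (k, block) -> {
      block.retain();
      return block; // keep the mapping unchanged
    });
  }

  /** Removal and release happen inside one atomic per-key operation. */
  boolean removeAndRelease(String key) {
    boolean[] removed = {false};
    delegate.computeIfPresent(key, (k, block) -> {
      block.release();
      removed[0] = true;
      return null; // returning null removes the mapping
    });
    return removed[0];
  }

  public static void main(String[] args) {
    AtomicRamCacheSketch cache = new AtomicRamCacheSketch();
    Block block = new Block();
    cache.put("block1", block);
    cache.getAndRetain("block1");     // reader now holds a reference
    cache.removeAndRelease("block1"); // cache drops its own reference
    System.out.println(block.refCnt());                       // 1: the reader's reference
    System.out.println(cache.getAndRetain("block1") == null); // true: nothing left to retain
  }
}
```

With this shape a reader either sees the mapping and retains the block before the remover can release it, or sees no mapping at all, so the Step.1, Step.a, Step.b, Step.2 interleaving cannot happen for the same key.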
[jira] [Commented] (HBASE-22422) Retain an ByteBuff with refCnt=0 when getBlock from LRUCache
[ https://issues.apache.org/jira/browse/HBASE-22422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16847172#comment-16847172 ]

Zheng Hu commented on HBASE-22422:

After running for some hours, the bug reproduced in my pressure-test cluster, with the following log:

{code}
2019-05-24,03:43:10,796 INFO org.apache.hadoop.hbase.nio.RefCnt: ===> Start to dump callerSet for #641783987
2019-05-24,03:43:10,796 INFO org.apache.hadoop.hbase.nio.RefCnt: --> #641783987 -> caller: HFileScannerImpl#returnBlocks: return curBlock, refCnt before release is: 2
2019-05-24,03:43:10,796 INFO org.apache.hadoop.hbase.nio.RefCnt: --> #641783987 -> caller: RAMCache#remove, refCnt before release is: 1
2019-05-24,03:43:10,796 INFO org.apache.hadoop.hbase.nio.RefCnt: ===> End to dump callerSet #641783987
2019-05-24,03:43:10,801 INFO org.apache.hadoop.hbase.regionserver.HRegion: Encountered an unknown exception in RegionScannerImpl:
org.apache.hbase.thirdparty.io.netty.util.IllegalReferenceCountException: refCnt: 0, increment: 1
  at org.apache.hbase.thirdparty.io.netty.util.AbstractReferenceCounted.retain0(AbstractReferenceCounted.java:87)
  at org.apache.hbase.thirdparty.io.netty.util.AbstractReferenceCounted.retain(AbstractReferenceCounted.java:74)
  at org.apache.hadoop.hbase.nio.RefCnt.retain(RefCnt.java:73)
  at org.apache.hadoop.hbase.nio.SingleByteBuff.retain(SingleByteBuff.java:398)
  at org.apache.hadoop.hbase.nio.SingleByteBuff.retain(SingleByteBuff.java:39)
  at org.apache.hadoop.hbase.io.hfile.HFileBlock.retain(HFileBlock.java:457)
  at org.apache.hadoop.hbase.io.hfile.HFileBlock.retain(HFileBlock.java:115)
  at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache$RAMCache.get(BucketCache.java:1539)
  at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.getBlock(BucketCache.java:483)
  at org.apache.hadoop.hbase.io.hfile.CombinedBlockCache.getBlock(CombinedBlockCache.java:85)
  at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.getCachedBlock(HFileReaderImpl.java:1306)
  at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.readBlock(HFileReaderImpl.java:1472)
  at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$CellBasedKeyBlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:339)
  at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.seekTo(HFileReaderImpl.java:843)
  at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.seekTo(HFileReaderImpl.java:794)
  at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:315)
  at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:216)
  at org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:394)
  at org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:249)
  at org.apache.hadoop.hbase.regionserver.HStore.createScanner(HStore.java:2063)
  at org.apache.hadoop.hbase.regionserver.HStore.getScanner(HStore.java:2054)
  at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.initializeScanners(HRegion.java:6493)
  at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.<init>(HRegion.java:6473)
  at org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:2999)
  at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2979)
  at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2961)
  at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2955)
  at org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2621)
  at org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2548)
  at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41998)
  at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:374)
  at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:132)
  at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
  at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
2019-05-24,03:43:10,813 INFO org.apache.hadoop.hbase.nio.RefCnt: ===> Start to dump callerSet for #312566113
2019-05-24,03:43:10,813 INFO org.apache.hadoop.hbase.nio.RefCnt: --> #312566113 -> caller: CellBasedKeyBlockIndexReader#loadDataBlockWithScanInfo, refCnt before release is: 1
2019-05-24,03:43:10,813 INFO org.apache.hadoop.hbase.nio.RefCnt: --> #312566113 -> caller: CellBasedKeyBlockIndexReader#loadDataBlockWithScanInfo, refCnt before release is: 2
2019-05-24,03:43:10,813 INFO org.apache.hadoop.hbase.nio.RefCnt: --> #312566113 -> caller: CellBasedKeyBlockIndexReader#loadDataBlockWithScanInfo, refCnt
[jira] [Commented] (HBASE-22422) Retain an ByteBuff with refCnt=0 when getBlock from LRUCache
[ https://issues.apache.org/jira/browse/HBASE-22422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16846593#comment-16846593 ]

Zheng Hu commented on HBASE-22422:

Attached debug4.patch as mentioned earlier; let's see what happens in my pressure-test cluster.
[jira] [Commented] (HBASE-22422) Retain an ByteBuff with refCnt=0 when getBlock from LRUCache
[ https://issues.apache.org/jira/browse/HBASE-22422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16846518#comment-16846518 ]

Zheng Hu commented on HBASE-22422:

The bug has not happened again since I applied debug3.patch, which is somewhat frustrating. I discussed this with [~Apache9]: it's possible that capturing the stack trace makes the release a bit slower, so the concurrency bug no longer shows up.

{code}
+  @Override
+  public boolean release() {
+    callerSet.add(debugString(Thread.currentThread().getStackTrace(), this.refCnt()));
+    return super.release();
+  }
{code}

So I plan to pass the caller's string message as an argument into release, which should cost much less time than capturing the stack trace. In the meantime, I will continue to check those code paths.
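The plan of passing the caller as a string can be sketched as follows. This is a minimal illustration under assumptions: `CallerTaggedRefCnt`, `release(String)`, and `dumpCallers` are hypothetical names, not the API of the debug patches, and the recorded message merely imitates the "refCnt before release is" wording seen in the logs of this issue.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class CallerTaggedRefCnt {
  private final AtomicInteger refCnt = new AtomicInteger(1);
  // Records who released the object; cheap compared with capturing a full stack trace.
  private final List<String> callers = Collections.synchronizedList(new ArrayList<>());

  int refCnt() {
    return refCnt.get();
  }

  void retain() {
    refCnt.incrementAndGet();
  }

  /** The caller passes a constant tag instead of the callee walking the stack. */
  boolean release(String caller) {
    // The count is read before the decrement without extra locking, so under
    // concurrency the recorded value is only a best-effort debugging hint.
    callers.add("caller: " + caller + ", refCnt before release is: " + refCnt.get());
    return refCnt.decrementAndGet() == 0;
  }

  /** Returns a snapshot of the recorded release callers. */
  List<String> dumpCallers() {
    synchronized (callers) {
      return new ArrayList<>(callers);
    }
  }

  public static void main(String[] args) {
    CallerTaggedRefCnt ref = new CallerTaggedRefCnt();
    ref.retain();
    ref.release("HFileScannerImpl#returnBlocks");
    ref.release("RAMCache#remove");
    ref.dumpCallers().forEach(System.out::println);
  }
}
```

Because the tag is a constant supplied by the call site, the release path does only a string concatenation and a list append, which should perturb the timing of the race far less than `Thread.currentThread().getStackTrace()`.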
[jira] [Commented] (HBASE-22422) Retain an ByteBuff with refCnt=0 when getBlock from LRUCache
[ https://issues.apache.org/jira/browse/HBASE-22422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845433#comment-16845433 ]

Zheng Hu commented on HBASE-22422:

After running for about 12 hours in my pressure-test cluster, still no IllegalReferenceCountException has happened. That's strange; I will wait some more time.
[jira] [Commented] (HBASE-22422) Retain an ByteBuff with refCnt=0 when getBlock from LRUCache
[ https://issues.apache.org/jira/browse/HBASE-22422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844851#comment-16844851 ]

Zheng Hu commented on HBASE-22422:

I've applied debug3.patch to my test cluster; still waiting for the IllegalReferenceCountException...
[jira] [Commented] (HBASE-22422) Retain an ByteBuff with refCnt=0 when getBlock from LRUCache
[ https://issues.apache.org/jira/browse/HBASE-22422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844564#comment-16844564 ]

Zheng Hu commented on HBASE-22422:

Updated the patch to debug2.patch, which only logs the release callers of non-data HFileBlocks.
[jira] [Commented] (HBASE-22422) Retain an ByteBuff with refCnt=0 when getBlock from LRUCache
[ https://issues.apache.org/jira/browse/HBASE-22422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844541#comment-16844541 ]

Zheng Hu commented on HBASE-22422:

After applying debug.patch, the RegionServer (RS) now easily runs into full GC and restarts because of the high Get throughput.
[jira] [Commented] (HBASE-22422) Retain an ByteBuff with refCnt=0 when getBlock from LRUCache
[ https://issues.apache.org/jira/browse/HBASE-22422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844460#comment-16844460 ]

Zheng Hu commented on HBASE-22422:

I've checked all the code and made some patches to confirm the bug, but they did not seem to work. So I wrote a simple patch that dumps the stack traces of all release callers if an IllegalReferenceCountException happens on retain. Let's see what it says.
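The debugging idea in this comment (record who called release, and dump those callers when a retain hits refCnt=0) might look roughly like the sketch below. This is not the attached debug.patch: the class `TrackedRefCnt` is hypothetical, and `IllegalStateException` stands in for Netty's IllegalReferenceCountException so that the example has no dependencies.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical reference counter that remembers every release() caller. */
final class TrackedRefCnt {
  private final AtomicInteger refCnt = new AtomicInteger(1);
  // One Throwable per release() call; its stack trace identifies the caller.
  private final List<Throwable> releaseCallers = new CopyOnWriteArrayList<>();

  void release() {
    releaseCallers.add(new Throwable("release() caller"));
    refCnt.decrementAndGet();
  }

  void retain() {
    int cur;
    do {
      cur = refCnt.get();
      if (cur <= 0) {
        // Stand-in for IllegalReferenceCountException (an assumption of this sketch).
        IllegalStateException e = new IllegalStateException("retain with refCnt=" + cur);
        // Attach every recorded release caller so the log shows who dropped the count.
        for (Throwable caller : releaseCallers) {
          e.addSuppressed(caller);
        }
        throw e;
      }
    } while (!refCnt.compareAndSet(cur, cur + 1));
  }

  int refCnt() {
    return refCnt.get();
  }
}
```

Capturing a Throwable on every release is expensive in both CPU and heap, so a tracker like this is only suitable for a short debugging run.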
[jira] [Commented] (HBASE-22422) Retain an ByteBuff with refCnt=0 when getBlock from LRUCache
[ https://issues.apache.org/jira/browse/HBASE-22422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16841249#comment-16841249 ]

Zheng Hu commented on HBASE-22422:

Uploaded an initial patch that fixes the issues raised in the comments above. I will design a UT for each case and run the benchmark again.
[jira] [Commented] (HBASE-22422) Retain an ByteBuff with refCnt=0 when getBlock from LRUCache
[ https://issues.apache.org/jira/browse/HBASE-22422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16840973#comment-16840973 ]

Zheng Hu commented on HBASE-22422:

Here we should also consider releasing the unpacked block if prepareDecoding fails:

{code}
HFileBlock unpack(HFileContext fileContext, FSReader reader) throws IOException {
  if (!fileContext.isCompressedOrEncrypted()) {
    // TODO: cannot use our own fileContext here because HFileBlock(ByteBuffer, boolean),
    // which is used for block serialization to L2 cache, does not preserve encoding and
    // encryption details.
    return this;
  }

  HFileBlock unpacked = new HFileBlock(this);
  unpacked.allocateBuffer(); // allocates space for the decompressed block

  HFileBlockDecodingContext ctx = blockType == BlockType.ENCODED_DATA
      ? reader.getBlockDecodingContext() : reader.getDefaultBlockDecodingContext();

  ByteBuff dup = this.buf.duplicate();
  dup.position(this.headerSize());
  dup = dup.slice();
  ctx.prepareDecoding(unpacked.getOnDiskSizeWithoutHeader(),
      unpacked.getUncompressedSizeWithoutHeader(), unpacked.getBufferWithoutHeader(true), dup);
  return unpacked;
}
{code}
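One way to handle the failure case described above is to release the freshly allocated block whenever decoding throws. The sketch below shows only that pattern; `UnpackSketch`, `Buf`, and `Decoder` are hypothetical stand-ins, not the HBase classes, and the actual patch may be structured differently.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

final class UnpackSketch {
  /** Hypothetical reference-counted buffer; starts with one reference held by its creator. */
  static final class Buf {
    private final AtomicInteger refCnt = new AtomicInteger(1);

    void release() {
      refCnt.decrementAndGet();
    }

    int refCnt() {
      return refCnt.get();
    }
  }

  /** Hypothetical stand-in for the decoding step (prepareDecoding in the quoted code). */
  interface Decoder {
    void decode(Buf src, Buf dst) throws IOException;
  }

  static Buf unpack(Buf packed, Decoder decoder) throws IOException {
    Buf unpacked = new Buf(); // corresponds to allocating the decompressed block
    boolean succeeded = false;
    try {
      decoder.decode(packed, unpacked);
      succeeded = true;
      return unpacked;
    } finally {
      if (!succeeded) {
        // Nobody else can see 'unpacked' yet, so it must be released here or it leaks.
        unpacked.release();
      }
    }
  }
}
```

The success flag together with `finally` covers unchecked exceptions as well as IOException, while the caller still sees the original exception.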
[jira] [Commented] (HBASE-22422) Retain an ByteBuff with refCnt=0 when getBlock from LRUCache
[ https://issues.apache.org/jira/browse/HBASE-22422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16840955#comment-16840955 ]

Zheng Hu commented on HBASE-22422:

Another risk is in LruBlockCache#evictBlock: we should move previous.getBuffer().release() to the last line before the return, because once the release decreases the refCnt to zero, nobody (such as the victimHandler) can access the buf any more.

{code}
protected long evictBlock(LruCachedBlock block, boolean evictedByEvictionProcess) {
  LruCachedBlock previous = map.remove(block.getCacheKey());
  if (previous == null) {
    return 0;
  }
  // Decrease the block's reference count, and if refCount is 0, then it'll auto-deallocate.
  previous.getBuffer().release();
  updateSizeMetrics(block, true);
  long val = elements.decrementAndGet();
  if (LOG.isTraceEnabled()) {
    long size = map.size();
    assertCounterSanity(size, val);
  }
  if (block.getBuffer().getBlockType().isData()) {
    dataBlockElements.decrement();
  }
  if (evictedByEvictionProcess) {
    // When the eviction of the block happened because of invalidation of HFiles, no need to
    // update the stats counter.
    stats.evicted(block.getCachedTime(), block.getCacheKey().isPrimary());
    if (victimHandler != null) {
      victimHandler.cacheBlock(block.getCacheKey(), block.getBuffer());
    }
  }
  return block.heapSize();
}
{code}
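The ordering proposed in this comment (use the buffer first, release it last) can be illustrated with the minimal sketch below. `EvictBuf` and `EvictOrderSketch` are hypothetical, and the `victimCache` list merely stands in for the victimHandler; the real LruBlockCache bookkeeping is omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical buffer whose content becomes inaccessible once refCnt reaches zero. */
final class EvictBuf {
  private final AtomicInteger refCnt = new AtomicInteger(1);
  private volatile byte[] data;

  EvictBuf(byte[] data) {
    this.data = data;
  }

  byte[] read() {
    if (refCnt.get() <= 0) {
      throw new IllegalStateException("access with refCnt=0");
    }
    return data;
  }

  void release() {
    if (refCnt.decrementAndGet() == 0) {
      data = null; // simulates deallocation
    }
  }

  int refCnt() {
    return refCnt.get();
  }
}

final class EvictOrderSketch {
  final ConcurrentHashMap<String, EvictBuf> map = new ConcurrentHashMap<>();
  final List<byte[]> victimCache = new ArrayList<>(); // stand-in for victimHandler

  long evictBlock(String key) {
    EvictBuf previous = map.remove(key);
    if (previous == null) {
      return 0;
    }
    try {
      // Every read of the buffer happens while the cache still holds its reference.
      byte[] bytes = previous.read();
      victimCache.add(bytes.clone());
      return bytes.length;
    } finally {
      // Drop the cache's reference only after all users of the buffer are done.
      previous.release();
    }
  }
}
```

Putting the release in `finally` keeps it as the last step even if the hand-off to the victim cache throws.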
[jira] [Commented] (HBASE-22422) Retain an ByteBuff with refCnt=0 when getBlock from LRUCache
[ https://issues.apache.org/jira/browse/HBASE-22422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16840167#comment-16840167 ]

Zheng Hu commented on HBASE-22422:

The following code may also be problematic (HFileReaderImpl#readBlock):

{code}
// Load block from filesystem.
HFileBlock hfileBlock = fsBlockReader.readBlockData(dataBlockOffset, onDiskBlockSize, pread,
    !isCompaction, shouldUseHeap(expectedBlockType));
validateBlockType(hfileBlock, expectedBlockType);
HFileBlock unpacked = hfileBlock.unpack(hfileContext, fsBlockReader);
BlockType.BlockCategory category = hfileBlock.getBlockType().getCategory();

// Cache the block if necessary
AtomicBoolean cachedRaw = new AtomicBoolean(false);
cacheConf.getBlockCache().ifPresent(cache -> {
  if (cacheBlock && cacheConf.shouldCacheBlockOnRead(category)) {
    cachedRaw.set(cacheConf.shouldCacheCompressed(category));
    cache.cacheBlock(cacheKey, cachedRaw.get() ? hfileBlock : unpacked,
        cacheConf.isInMemory());
  }
});
if (unpacked != hfileBlock && !cachedRaw.get()) {
  // End of life here if hfileBlock is an independent block.
  hfileBlock.release();
}
{code}
[jira] [Commented] (HBASE-22422) Retain an ByteBuff with refCnt=0 when getBlock from LRUCache
[ https://issues.apache.org/jira/browse/HBASE-22422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16840130#comment-16840130 ]

Zheng Hu commented on HBASE-22422:

A potential cause would be here: https://issues.apache.org/jira/secure/attachment/12968762/LRUBlockCache-getBlock.png

1. Get the block from the map first;
2. Retain the block.

If a release to zero happens between step 1 and step 2, we'll get the exception saying that we are retaining a block with refCnt=0.
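The window between the two steps can be closed by performing the lookup and the retain as one atomic map operation. The sketch below uses `ConcurrentHashMap.computeIfPresent` for that; `SketchBlock` and `SketchCache` are hypothetical classes for illustration, and `IllegalStateException` stands in for IllegalReferenceCountException. It is one possible approach, not necessarily what the final patch does.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical reference-counted block; the cache holds the initial reference. */
final class SketchBlock {
  private final AtomicInteger refCnt = new AtomicInteger(1);

  SketchBlock retain() {
    while (true) {
      int cur = refCnt.get();
      if (cur <= 0) {
        throw new IllegalStateException("retain with refCnt=" + cur);
      }
      if (refCnt.compareAndSet(cur, cur + 1)) {
        return this;
      }
    }
  }

  void release() {
    refCnt.decrementAndGet();
  }

  int refCnt() {
    return refCnt.get();
  }
}

final class SketchCache {
  private final ConcurrentHashMap<String, SketchBlock> map = new ConcurrentHashMap<>();

  void cacheBlock(String key, SketchBlock block) {
    map.put(key, block);
  }

  /** Racy: an eviction may remove and release the block between get() and retain(). */
  SketchBlock getBlockRacy(String key) {
    SketchBlock block = map.get(key);
    return block == null ? null : block.retain();
  }

  /** The lookup and the retain run as one atomic operation on this key. */
  SketchBlock getBlockAtomic(String key) {
    return map.computeIfPresent(key, (k, block) -> block.retain());
  }

  /** Removes the entry first, then drops the cache's own reference. */
  void evict(String key) {
    SketchBlock previous = map.remove(key);
    if (previous != null) {
      previous.release();
    }
  }
}
```

With `getBlockAtomic`, a concurrent `evict` either runs before the lookup (the reader gets null) or after the retain (the count goes from 2 to 1), so the reader never retains a block whose count has already reached zero.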