Re: [PR] HBASE-30038: RefCnt Leak error when caching [hbase]
wchevreuil merged PR #7995: URL: https://github.com/apache/hbase/pull/7995 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] HBASE-30038: RefCnt Leak error when caching [hbase]
dParikesit commented on code in PR #7995:
URL: https://github.com/apache/hbase/pull/7995#discussion_r3058161112
##
hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFile.java:
##
@@ -244,9 +244,10 @@ public void onLeak(String s, String s1) {
       Mockito.verify(cache).shouldCacheBlock(Mockito.any(), Mockito.anyLong(), Mockito.any());
       Mockito.verify(cache, Mockito.never()).cacheBlock(Mockito.any(), Mockito.any(),
         Mockito.anyBoolean(), Mockito.anyBoolean());
-      System.gc();
-      Thread.sleep(1000);
-      alloc.allocate(128 * 1024).release();
+      for (int i = 0; i < 15 && counter.get() == 0; i++) {
+        System.gc();
+        alloc.allocate(128 * 1024).release();
+      }
Review Comment:
Thanks for the feedback! I believe the sleep should be done after we run
allocate.release(), right? I have updated the commit accordingly.
Re: [PR] HBASE-30038: RefCnt Leak error when caching [hbase]
vaijosh commented on code in PR #7995:
URL: https://github.com/apache/hbase/pull/7995#discussion_r3055418544
##
hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFile.java:
##
@@ -244,9 +244,10 @@ public void onLeak(String s, String s1) {
       Mockito.verify(cache).shouldCacheBlock(Mockito.any(), Mockito.anyLong(), Mockito.any());
       Mockito.verify(cache, Mockito.never()).cacheBlock(Mockito.any(), Mockito.any(),
         Mockito.anyBoolean(), Mockito.anyBoolean());
-      System.gc();
-      Thread.sleep(1000);
-      alloc.allocate(128 * 1024).release();
+      for (int i = 0; i < 15 && counter.get() == 0; i++) {
+        System.gc();
+        alloc.allocate(128 * 1024).release();
+      }
Review Comment:
@dParikesit
You have not added a sleep inside the loop, so it will finish quickly and
defeat the purpose.
Please keep it something like below:
    for (int i = 0; i < 30 && counter.get() == 0; i++) {
      System.gc();
      alloc.allocate(128 * 1024).release();
      try {
        Thread.sleep(1000);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        break;
      }
    }
##
hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java:
##
@@ -258,9 +256,10 @@ public void onLeak(String s, String s1) {
     try {
       writeDataBlocksAndCreateIndex(hbw, outputStream, biw);
       assertTrue(biw.getNumLevels() >= 3);
-      System.gc();
-      Thread.sleep(1000);
-      allocator.allocate(128 * 1024).release();
+      for (int i = 0; i < 15 && counter.get() == 0; i++) {
+        System.gc();
+        allocator.allocate(128 * 1024).release();
+      }
Review Comment:
@dParikesit
Please add a sleep inside the loop and use 30 retries instead of 15.
Re: [PR] HBASE-30038: RefCnt Leak error when caching [hbase]
dParikesit commented on PR #7995:
URL: https://github.com/apache/hbase/pull/7995#issuecomment-4208552365
@wchevreuil thanks for the reminder. I have modified the tests to use a retry loop instead of a fixed sleep (including the test added in [HBASE-28890](https://issues.apache.org/jira/browse/HBASE-28890)). I've tested it on my machine. Can you help trigger the CI pipeline?
Re: [PR] HBASE-30038: RefCnt Leak error when caching [hbase]
wchevreuil commented on PR #7995:
URL: https://github.com/apache/hbase/pull/7995#issuecomment-4205473343
Any news on this, @dParikesit? Please get the reviews addressed so that we can move forward here, as this is an important fix you have found.
Re: [PR] HBASE-30038: RefCnt Leak error when caching [hbase]
dParikesit commented on PR #7995:
URL: https://github.com/apache/hbase/pull/7995#issuecomment-4155825124
Thanks for the suggestions. I'll get back to you after I fix it.
Re: [PR] HBASE-30038: RefCnt Leak error when caching [hbase]
wchevreuil commented on PR #7995:
URL: https://github.com/apache/hbase/pull/7995#issuecomment-4154337190
Please consider @vaijosh's suggestions and also fix the spotless issues.
Re: [PR] HBASE-30038: RefCnt Leak error when caching [hbase]
vaijosh commented on code in PR #7995:
URL: https://github.com/apache/hbase/pull/7995#discussion_r3007321459
##
hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java:
##
@@ -216,6 +216,60 @@ public void onLeak(String s, String s1) {
     assertEquals(0, counter.get());
   }
+  @Test
+  public void testIntermediateIndexCacheOnWriteDoesNotLeak() throws Exception {
+    Configuration localConf = new Configuration(TEST_UTIL.getConfiguration());
+    localConf.setInt(HFile.FORMAT_VERSION_KEY, HFile.MAX_FORMAT_VERSION);
+    localConf.setBoolean(CacheConfig.CACHE_INDEX_BLOCKS_ON_WRITE_KEY, true);
+    localConf.setInt(ByteBuffAllocator.BUFFER_SIZE_KEY, 4096);
+    localConf.setInt(ByteBuffAllocator.MAX_BUFFER_COUNT_KEY, 32);
+    localConf.setInt(ByteBuffAllocator.MIN_ALLOCATE_SIZE_KEY, 0);
+    ByteBuffAllocator allocator = ByteBuffAllocator.create(localConf, true);
+    List<ByteBuff> buffers = new ArrayList<>();
+    for (int i = 0; i < allocator.getTotalBufferCount(); i++) {
+      buffers.add(allocator.allocateOneBuffer());
+      assertEquals(0, allocator.getFreeBufferCount());
+    }
+    buffers.forEach(ByteBuff::release);
+    assertEquals(allocator.getTotalBufferCount(), allocator.getFreeBufferCount());
+    ResourceLeakDetector.setLevel(ResourceLeakDetector.Level.PARANOID);
+    final AtomicInteger counter = new AtomicInteger();
+    RefCnt.detector.setLeakListener(new ResourceLeakDetector.LeakListener() {
+      @Override
+      public void onLeak(String s, String s1) {
+        counter.incrementAndGet();
+      }
+    });
+
+    Path localPath = new Path(TEST_UTIL.getDataTestDir(),
+      "block_index_testIntermediateIndexCacheOnWriteDoesNotLeak_" + compr);
+    HFileContext meta = new HFileContextBuilder().withHBaseCheckSum(true)
+      .withIncludesMvcc(includesMemstoreTS).withIncludesTags(true).withCompression(compr)
+      .withBytesPerCheckSum(HFile.DEFAULT_BYTES_PER_CHECKSUM).build();
+    HFileBlock.Writer hbw = new HFileBlock.Writer(localConf, null, meta, allocator,
+      meta.getBlocksize());
+    FSDataOutputStream outputStream = fs.create(localPath);
+    LruBlockCache cache = new LruBlockCache(8 * 1024 * 1024, 1024, true, localConf);
+    CacheConfig cacheConfig = new CacheConfig(localConf, null, cache, allocator);
+    HFileBlockIndex.BlockIndexWriter biw =
+      new HFileBlockIndex.BlockIndexWriter(hbw, cacheConfig, localPath.getName(), null);
+    biw.setMaxChunkSize(512);
+
+    try {
+      writeDataBlocksAndCreateIndex(hbw, outputStream, biw);
+      assertTrue(biw.getNumLevels() >= 3);
+      System.gc();
+      Thread.sleep(1000);
Review Comment:
@dParikesit
I think the 1-second assumption might bite us on slower systems and make the
test flaky.
What if we wrap this in a loop instead? We could retry up to 15 times with a
small delay—that way it succeeds as soon as it's ready rather than relying on
luck.
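
A rough sketch of the bounded retry idea, for illustration only. The class `RetryUntil` and its `poll` method are hypothetical names and not part of HBase or this PR; the sketch only assumes the JDK. It runs an action, checks a condition, and sleeps between attempts, stopping as soon as the condition is observed.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper, not an HBase API: bounded polling with a delay between attempts. */
public final class RetryUntil {
  private RetryUntil() {
  }

  /**
   * Runs {@code action}, then checks {@code done}, at most {@code maxAttempts} times,
   * sleeping {@code delayMillis} after each attempt whose check fails.
   * @return true as soon as {@code done} holds; false if the attempts run out
   *         or the thread is interrupted while sleeping
   */
  public static boolean poll(int maxAttempts, long delayMillis, Runnable action,
    BooleanSupplier done) {
    for (int i = 0; i < maxAttempts; i++) {
      action.run();
      if (done.getAsBoolean()) {
        return true;
      }
      try {
        Thread.sleep(delayMillis);
      } catch (InterruptedException e) {
        // Restore the interrupt flag and give up early.
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return false;
  }
}
```

In these tests the condition would presumably be "a leak was reported" (`counter.get() > 0`), with the action being the `System.gc()` plus allocate-and-release step, so a `false` result after all attempts is the passing case.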
Re: [PR] HBASE-30038: RefCnt Leak error when caching [hbase]
vaijosh commented on code in PR #7995:
URL: https://github.com/apache/hbase/pull/7995#discussion_r3007309262
##
hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFile.java:
##
@@ -202,6 +207,53 @@ public void testReaderWithLRUBlockCache() throws Exception {
     lru.shutdown();
   }
+  @Test
+  public void testWriterCacheOnWriteSkipDoesNotLeak() throws Exception {
+    int bufCount = 32;
+    int blockSize = 4 * 1024;
+    ByteBuffAllocator alloc = initAllocator(true, blockSize, bufCount, 0);
+    fillByteBuffAllocator(alloc, bufCount);
+    ResourceLeakDetector.setLevel(ResourceLeakDetector.Level.PARANOID);
+    Configuration myConf = HBaseConfiguration.create(conf);
+    myConf.setBoolean(CacheConfig.CACHE_BLOCKS_ON_WRITE_KEY, true);
+    myConf.setBoolean(CacheConfig.CACHE_INDEX_BLOCKS_ON_WRITE_KEY, false);
+    myConf.setBoolean(CacheConfig.CACHE_BLOOM_BLOCKS_ON_WRITE_KEY, false);
+    final AtomicInteger counter = new AtomicInteger();
+    RefCnt.detector.setLeakListener(new ResourceLeakDetector.LeakListener() {
+      @Override
+      public void onLeak(String s, String s1) {
+        counter.incrementAndGet();
+      }
+    });
+    BlockCache cache = Mockito.mock(BlockCache.class);
+    Mockito.when(cache.shouldCacheBlock(Mockito.any(), Mockito.anyLong(), Mockito.any()))
+      .thenReturn(Optional.of(false));
+    Path hfilePath = new Path(TEST_UTIL.getDataTestDir(), "testWriterCacheOnWriteSkipDoesNotLeak");
+    HFileContext context = new HFileContextBuilder().withBlockSize(blockSize).build();
+
+    try {
+      Writer writer = new HFile.WriterFactory(myConf, new CacheConfig(myConf, null, cache, alloc))
+        .withPath(fs, hfilePath).withFileContext(context).create();
+      try {
+        writer.append(new KeyValue(Bytes.toBytes("row"), Bytes.toBytes("cf"), Bytes.toBytes("q"),
+          HConstants.LATEST_TIMESTAMP, Bytes.toBytes("value")));
+      } finally {
+        writer.close();
+      }
+
+      Mockito.verify(cache).shouldCacheBlock(Mockito.any(), Mockito.anyLong(), Mockito.any());
+      Mockito.verify(cache, Mockito.never()).cacheBlock(Mockito.any(), Mockito.any(),
+        Mockito.anyBoolean(), Mockito.anyBoolean());
+      System.gc();
+      Thread.sleep(1000);
Review Comment:
@dParikesit
I think the 1-second assumption might bite us on slower systems and make the
test flaky.
What if we wrap this in a loop instead? We could retry up to 15 times with a
small delay—that way it succeeds as soon as it's ready rather than relying on
luck.
[PR] HBASE-30038: RefCnt Leak error when caching [hbase]
dParikesit opened a new pull request, #7995:
URL: https://github.com/apache/hbase/pull/7995

JIRA: [HBASE-30038](https://issues.apache.org/jira/browse/HBASE-30038)

This bug is similar to [HBASE-28890](https://issues.apache.org/jira/browse/HBASE-28890).

HFileBlock.Writer.getBlockForCaching() creates a ref-counted HFileBlock, and the caller must release its own reference after the cache has taken the ownership it needs.

In HFileWriterImpl.java, the leak happened when shouldCacheBlock() returned before cacheFormatBlock.release() ran.
In HFileBlockIndex.java, the intermediate-index path cached blockForCaching but never released the reference at all.

Because these blocks can be backed by pooled off-heap ByteBuffs, repeated HFile writes could steadily drain allocator buffers and effectively leak memory, even though the blocks were only meant to live long enough to be considered for cache-on-write.
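
The ownership rule described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `Block`, `Cache`, `leakyOffer` and `safeOffer` are hypothetical stand-ins written for this example, not HBase classes, and the actual fix in the PR may be structured differently. It shows how an early return can skip the caller's release, and how a try/finally keeps the release on every path.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Toy model of the ref-count pattern; none of these types are HBase APIs. */
public final class RefCountSketch {
  private RefCountSketch() {
  }

  /** Hypothetical ref-counted block; starts with one reference owned by its creator. */
  public static final class Block {
    private final AtomicInteger refCnt = new AtomicInteger(1);

    public void retain() {
      refCnt.incrementAndGet();
    }

    public void release() {
      refCnt.decrementAndGet();
    }

    public int refCnt() {
      return refCnt.get();
    }
  }

  /** Hypothetical cache that takes its own reference to whatever it stores. */
  public static final class Cache {
    private Block held;

    public void put(Block block) {
      block.retain();
      held = block;
    }
  }

  /** Leaky shape: the early return skips the creator's release. */
  public static Block leakyOffer(Cache cache, boolean shouldCache) {
    Block block = new Block();
    if (!shouldCache) {
      return block; // creator's reference is never dropped on this path
    }
    cache.put(block);
    block.release();
    return block;
  }

  /** Safer shape: the creator's reference is released on every path. */
  public static Block safeOffer(Cache cache, boolean shouldCache) {
    Block block = new Block();
    try {
      if (shouldCache) {
        cache.put(block);
      }
    } finally {
      block.release();
    }
    return block;
  }
}
```

In this toy model the block is returned only so a caller can inspect its count: after `safeOffer` the count is 0 when the cache declined the block and 1 (held by the cache) when it accepted it, while `leakyOffer` leaves a count of 1 that nobody owns when the cache declines.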
