[GitHub] [hbase] ben-manes commented on a change in pull request #3215: HBASE-25698 Fixing IllegalReferenceCountException when using TinyLfuBlockCache
ben-manes commented on a change in pull request #3215:
URL: https://github.com/apache/hbase/pull/3215#discussion_r648556289

## File path: hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/TinyLfuBlockCache.java

## @@ -158,7 +158,13 @@ public boolean containsBlock(BlockCacheKey cacheKey)
 @Override
 public Cacheable getBlock(BlockCacheKey cacheKey, boolean caching, boolean repeat,
     boolean updateCacheMetrics) {
-  Cacheable value = cache.getIfPresent(cacheKey);
+  Cacheable value = cache.asMap().computeIfPresent(cacheKey, (blockCacheKey, cacheable) -> {
+    // It will be referenced by the RPC path, so increase the refCnt here. NOTICE: the retain
+    // must be done inside this block, because if it is done outside map#computeIfPresent,
+    // evictBlock may remove the block and release it first, and then we would be retaining a
+    // block with refCnt=0, which is disallowed.
+    cacheable.retain();
+    return cacheable;
+  });

Review comment:
I would switch to `map.remove(cacheKey, cb)` so that a race doesn't discard a new mapping. If my naive reading is correct, this `map.remove(cacheKey)` already occurs after the `cb.release()`, so a new block may have been computed in between and the unconditional remove would mistakenly discard it. You might not need the map removal here at all if you can rely on the release being performed only after the map operation has completed.

https://github.com/apache/hbase/blob/947c03cf7249dec09162da445df7d36b8dbd4bfc/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/TinyLfuBlockCache.java#L246-L252

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org
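The race the comment describes can be illustrated with a minimal sketch (hypothetical, not HBase code: a plain `ConcurrentHashMap` with `String` values stands in for the cache and its `Cacheable` blocks). If another thread recomputes the entry between this thread's release and its removal, the one-argument `remove(key)` throws away the fresh mapping, while the two-argument `remove(key, value)` only succeeds when the stale value is still mapped:

```java
import java.util.concurrent.ConcurrentHashMap;

public class ConditionalRemoveSketch {
  public static void main(String[] args) {
    ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>();
    String staleBlock = "block-v1";
    map.put("key", staleBlock);

    // Simulate another thread recomputing the entry between our
    // release() of staleBlock and our removal of the mapping.
    String freshBlock = "block-v2";
    map.put("key", freshBlock);

    // Unconditional removal would discard the fresh mapping:
    //   map.remove("key");              // wrong: removes block-v2
    // Conditional removal succeeds only if the stale value is still mapped:
    boolean removed = map.remove("key", staleBlock);
    System.out.println(removed);         // false: the stale value is gone already
    System.out.println(map.get("key"));  // block-v2 survives
  }
}
```

`ConcurrentHashMap.remove(key, value)` compares with `equals`, so the removal atomically no-ops when the mapping has moved on, which is exactly the property needed to avoid discarding a newly computed block.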
[GitHub] [hbase] ben-manes commented on a change in pull request #3215: HBASE-25698 Fixing IllegalReferenceCountException when using TinyLfuBlockCache
ben-manes commented on a change in pull request #3215:
URL: https://github.com/apache/hbase/pull/3215#discussion_r638239220

## File path: hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/TinyLfuBlockCache.java

## @@ -158,7 +158,13 @@ public boolean containsBlock(BlockCacheKey cacheKey)
 @Override
 public Cacheable getBlock(BlockCacheKey cacheKey, boolean caching, boolean repeat,
     boolean updateCacheMetrics) {
-  Cacheable value = cache.getIfPresent(cacheKey);
+  Cacheable value = cache.asMap().computeIfPresent(cacheKey, (blockCacheKey, cacheable) -> {
+    // It will be referenced by the RPC path, so increase the refCnt here. NOTICE: the retain
+    // must be done inside this block, because if it is done outside map#computeIfPresent,
+    // evictBlock may remove the block and release it first, and then we would be retaining a
+    // block with refCnt=0, which is disallowed.
+    cacheable.retain();
+    return cacheable;
+  });

Review comment:
Thanks @saintstack. This was from my analysis when contributing the original patch. Zipfian is wonderful for a perf benchmark because it stresses locks, etc. to find bottlenecks, but it isn't realistic for actual production performance. I'm not sure there is a great approach other than network record/replay or canarying. If you have a workload trace we can try to simulate it, where the hit rates should be better (e.g. the [database trace](https://github.com/ben-manes/caffeine/wiki/Efficiency#database)). That wouldn't show actual system behavior, just the cache's expected hit rates in isolation. HBase's LRU is similar-ish to SLRU, so ARC might be a good upper bound on expectations. Between the zipfian benchmark and trace simulations, we can get a rough idea of whether there is a benefit. Otherwise canarying is the best approach that I've seen so far, which is a bit heavy handed but simple.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hbase] ben-manes commented on a change in pull request #3215: HBASE-25698 Fixing IllegalReferenceCountException when using TinyLfuBlockCache
ben-manes commented on a change in pull request #3215:
URL: https://github.com/apache/hbase/pull/3215#discussion_r638213661

## File path: hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/TinyLfuBlockCache.java

## @@ -158,7 +158,13 @@ public boolean containsBlock(BlockCacheKey cacheKey)
 @Override
 public Cacheable getBlock(BlockCacheKey cacheKey, boolean caching, boolean repeat,
     boolean updateCacheMetrics) {
-  Cacheable value = cache.getIfPresent(cacheKey);
+  Cacheable value = cache.asMap().computeIfPresent(cacheKey, (blockCacheKey, cacheable) -> {
+    // It will be referenced by the RPC path, so increase the refCnt here. NOTICE: the retain
+    // must be done inside this block, because if it is done outside map#computeIfPresent,
+    // evictBlock may remove the block and release it first, and then we would be retaining a
+    // block with refCnt=0, which is disallowed.
+    cacheable.retain();
+    return cacheable;
+  });

Review comment:
I suspect that you might be able to switch to an optimistic read, where you validate that the `cacheable.retain()` was successful. My naive inspection of the code is that it will throw an `IllegalReferenceCountException`, which could be caught and handled by returning a cache miss. Because the count is decremented after the entry has been removed from the mapping (in `evictBlock`, though I don't see a release in the removal listener), the stale read should be fine, as a subsequent read or write would not observe the discarded entry.

In YCSB benchmarks, `TinyLfuBlockCache` is a little slower because Caffeine adds a small overhead on reads in order to have O(1) evictions with a higher hit rate. In comparison, `LruBlockCache` does no policy work on read and performs an O(n lg n) operation on eviction. Due to using an SLRU policy, the Zipfian workload has the same optimal hit rate in both, causing only the small per-read policy cost to be exposed.

In a realistic workload, however, one would expect the TinyLfu policy to have an improved hit rate, which reduces latency (more requests are served from cache, reducing I/O and CPU load). The O(1) eviction should be helpful for large caches when the system is under high load, as it avoids a spike of work and amortizes the cost.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org
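The optimistic-read idea above can be sketched with a minimal, hypothetical model (these are not HBase's types: the real code would catch netty's `IllegalReferenceCountException` from `Cacheable.retain()`, while here a plain `IllegalStateException` and a toy ref-counted block stand in). The read is done non-atomically, then validated by the retain; a failed retain means the block raced with eviction and is reported as a miss:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class OptimisticRetainSketch {
  // Toy stand-in for a ref-counted Cacheable block.
  static class RefCountedBlock {
    final AtomicInteger refCnt = new AtomicInteger(1);

    // CAS loop: fail if the count has already dropped to zero.
    void retain() {
      int c;
      do {
        c = refCnt.get();
        if (c == 0) {
          throw new IllegalStateException("refCnt=0, block already released");
        }
      } while (!refCnt.compareAndSet(c, c + 1));
    }

    void release() {
      refCnt.decrementAndGet();
    }
  }

  static final ConcurrentHashMap<String, RefCountedBlock> cache = new ConcurrentHashMap<>();

  // Optimistic read: validate the stale read via retain(); a failed retain
  // is surfaced as a cache miss rather than an error.
  static RefCountedBlock getBlock(String key) {
    RefCountedBlock block = cache.get(key);
    if (block == null) {
      return null;
    }
    try {
      block.retain();
      return block;
    } catch (IllegalStateException staleRead) {
      return null; // raced with eviction + release; treat as a miss
    }
  }

  public static void main(String[] args) {
    RefCountedBlock block = new RefCountedBlock();
    cache.put("k", block);
    System.out.println(getBlock("k") != null); // true: live block retained

    // Simulate eviction racing ahead: released to refCnt=0 while still mapped.
    block.release();
    block.release();
    System.out.println(getBlock("k")); // null: stale read surfaces as a miss
  }
}
```

This relies on the ordering property the comment calls out: the release only happens after the entry has been removed from the mapping, so the stale read is observable for at most a moment and no later lookup can hand out a dead block.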