[jira] [Commented] (HBASE-15545) org.apache.hadoop.io.compress.DecompressorStream allocates too much memory
[ https://issues.apache.org/jira/browse/HBASE-15545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804097#comment-16804097 ]

Eiichi Sato commented on HBASE-15545:
-------------------------------------

This is a picture comparing GC activity before and after these patches are applied. Note that the green RegionServer in this picture always has these patches applied; we wanted to test the patches on a small number of servers first.

!image-2019-03-29-01-20-56-863.png!

> org.apache.hadoop.io.compress.DecompressorStream allocates too much memory
> --------------------------------------------------------------------------
>
>                 Key: HBASE-15545
>                 URL: https://issues.apache.org/jira/browse/HBASE-15545
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Compaction
>            Reporter: Vladimir Rodionov
>            Assignee: Vladimir Rodionov
>            Priority: Major
>         Attachments: image-2019-03-29-01-20-56-863.png
>
> It accounts for ~11% of overall memory allocation during compaction when compression (GZ) is enabled.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Updated] (HBASE-15545) org.apache.hadoop.io.compress.DecompressorStream allocates too much memory
[ https://issues.apache.org/jira/browse/HBASE-15545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eiichi Sato updated HBASE-15545:
--------------------------------
    Attachment: image-2019-03-29-01-20-56-863.png

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (HBASE-15545) org.apache.hadoop.io.compress.DecompressorStream allocates too much memory
[ https://issues.apache.org/jira/browse/HBASE-15545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804066#comment-16804066 ]

Eiichi Sato commented on HBASE-15545:
-------------------------------------

We had the same issue and found that it gets more serious when Lz4Codec or SnappyCodec is used (both of which by default allocate 256 KiB for every decompress(), far more than GzipCodec's default of 4 KiB), or when compressed BlockCache is enabled (where every read needs to decompress() cached blocks). In our case, DecompressorStream accounted for about 48% of memory allocations in a compaction thread, and more than half of the RegionServer's memory allocations in total. The allocation rate was so high that we suffered occasional allocation stalls with ZGC.

https://github.com/eiiches/hbase/commit/ad1ec4081b0ec9af5e20befaa1d09d0852e60d02
https://github.com/eiiches/hadoop/commit/e3337840b6e34236342c039b8a0b9fb9fcccfa40

We applied these patches to our cluster and saw a 60-70% reduction in allocation rate. My approach is to cache DecompressorStreams "weakly" in a ThreadLocal and reuse them. A WeakReference is used so that the cache won't be retained too long; I thought many people (especially those who don't use compressed BlockCache) would prefer to keep heap usage minimal at the cost of slightly more frequent re-allocations. What do you think? As this requires a change to hadoop-common, I will propose the Hadoop part of the change to that community as a first step, if you like this fix.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
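The "cache weakly in a ThreadLocal" idea from the comment above can be sketched in isolation like this. This is a minimal illustration of the pattern, not the actual hbase/hadoop patch; `WeakThreadLocalCache` is a hypothetical name, and the `Supplier` stands in for whatever allocates the decompressor buffers:

```java
import java.lang.ref.WeakReference;
import java.util.function.Supplier;

/**
 * Sketch of a per-thread cache that holds its value only weakly: each
 * thread reuses its own cached instance across calls, but the GC may
 * reclaim it between uses instead of pinning it for the thread's lifetime.
 */
public final class WeakThreadLocalCache<T> {
    private final ThreadLocal<WeakReference<T>> cache = new ThreadLocal<>();
    private final Supplier<T> factory;

    public WeakThreadLocalCache(Supplier<T> factory) {
        this.factory = factory;
    }

    /** Returns this thread's cached instance, creating and re-caching one if needed. */
    public T get() {
        WeakReference<T> ref = cache.get();
        T value = (ref == null) ? null : ref.get();
        if (value == null) {                  // first use on this thread, or already reclaimed
            value = factory.get();
            cache.set(new WeakReference<>(value));
        }
        return value;
    }
}
```

As long as the caller still holds a strong reference to the returned object, repeated calls on the same thread return the same instance; once nothing else references it, the GC is free to clear the WeakReference, trading an occasional re-allocation for lower steady-state heap usage.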
[jira] [Commented] (HBASE-17072) CPU usage starts to climb up to 90-100% when using G1GC
[ https://issues.apache.org/jira/browse/HBASE-17072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15675394#comment-15675394 ]

Eiichi Sato commented on HBASE-17072:
-------------------------------------

I think ThreadLocal#remove() needs to be called on every handler thread, because it only removes the thread-local value of the calling thread.

> CPU usage starts to climb up to 90-100% when using G1GC
> -------------------------------------------------------
>
>                 Key: HBASE-17072
>                 URL: https://issues.apache.org/jira/browse/HBASE-17072
>             Project: HBase
>          Issue Type: Bug
>          Components: Performance, regionserver
>    Affects Versions: 1.0.0, 1.2.0
>            Reporter: Eiichi Sato
>         Attachments: disable-block-header-cache.patch, mat-threadlocals.png, mat-threads.png, metrics.png, slave1.svg, slave2.svg, slave3.svg, slave4.svg
>
> h5. Problem
>
> CPU usage of a region server in our CDH 5.4.5 cluster, at some point, starts to gradually climb to nearly 90-100% when using G1GC. We've also run into this problem on CDH 5.7.3 and CDH 5.8.2.
>
> In our production cluster, it normally takes a few weeks for this to happen after restarting a RS. We reproduced this on our test cluster and attached the results. Please note that, to make it easy to reproduce, we did some "anti-tuning" on a table when running the tests.
>
> In metrics.png, soon after we started running some workloads against a test cluster (CDH 5.8.2) at about 7 p.m., CPU usage of the two RSs started to rise. The Flame Graphs (slave1.svg to slave4.svg) were generated from jstack dumps of each RS process around 10:30 a.m. the next day.
>
> After investigating heap dumps from another occurrence on a test cluster running CDH 5.7.3, we found that the ThreadLocalMap contains a lot of contiguous entries of {{HFileBlock$PrefetchedHeader}}, probably due to primary clustering. This caused more loops in {{ThreadLocalMap#expungeStaleEntries()}}, consuming a certain amount of CPU time. What is worse is that the method is called from the RPC metrics code, which means even a small amount of per-RPC time soon adds up to a huge amount of CPU time.
>
> This is very similar to the issue in HBASE-16616, but we have many {{HFileBlock$PrefetchedHeader}} instances, not only {{Counter$IndexHolder}} instances. Here are some OQL counts from Eclipse Memory Analyzer (MAT). They show the number of ThreadLocal instances in the ThreadLocalMap of a single handler thread.
>
> {code}
> SELECT *
> FROM OBJECTS (SELECT AS RETAINED SET OBJECTS value
>               FROM OBJECTS 0x4ee380430) obj
> WHERE obj.@clazz.@name = "org.apache.hadoop.hbase.io.hfile.HFileBlock$PrefetchedHeader"
> #=> 10980 instances
> {code}
>
> {code}
> SELECT *
> FROM OBJECTS (SELECT AS RETAINED SET OBJECTS value
>               FROM OBJECTS 0x4ee380430) obj
> WHERE obj.@clazz.@name = "org.apache.hadoop.hbase.util.Counter$IndexHolder"
> #=> 2052 instances
> {code}
>
> Although, as described in HBASE-16616, this somewhat seems to be an issue on the G1GC side regarding weakly-reachable objects, we should keep ThreadLocal usage minimal and avoid creating an indefinite number (in this case, the number of HFiles) of ThreadLocal instances.
>
> HBASE-16146 removes ThreadLocals from the RPC metrics code. That may solve the issue (I just saw the patch, never tested it at all), but the {{HFileBlock$PrefetchedHeader}} instances are still there in the ThreadLocalMap, which may cause issues again in the future.
>
> h5. Our Solution
>
> We simply removed the whole {{HFileBlock$PrefetchedHeader}} caching and fortunately didn't notice any performance degradation for our production workloads.
>
> Because the PrefetchedHeader caching uses ThreadLocal and because RPCs are handled randomly by any of the handlers, small Get or small Scan RPCs do not benefit from the caching (see HBASE-10676 and HBASE-11402 for the details). Probably, we need to see how many reads are saved by the caching for large Scan or Get RPCs, and especially for compactions, if we really remove the caching. It's probably better if we can remove the ThreadLocals without breaking the current caching behavior.
>
> FWIW, I'm attaching the patch we applied. It's for CDH 5.4.5.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
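The per-thread scope of {{ThreadLocal#remove()}} noted in the comment above can be demonstrated with a small standalone snippet (illustrative only, not HBase code):

```java
public class ThreadLocalRemoveDemo {
    // Shared ThreadLocal: one slot per thread, "initial" until a thread sets it.
    static final ThreadLocal<String> TL = ThreadLocal.withInitial(() -> "initial");

    public static void main(String[] args) throws InterruptedException {
        TL.set("main-value");

        Thread other = new Thread(() -> {
            TL.set("other-value");
            TL.remove();  // clears only this thread's slot in its own ThreadLocalMap
            System.out.println("other after remove: " + TL.get()); // "initial"
        });
        other.start();
        other.join();

        // remove() on the other thread did not touch main's value
        System.out.println("main still sees: " + TL.get());        // "main-value"
    }
}
```

This is why stale entries left by handler threads have to be cleaned up on each handler thread individually; no single thread can purge them all.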
[jira] [Commented] (HBASE-17072) CPU usage starts to climb up to 90-100% when using G1GC
[ https://issues.apache.org/jira/browse/HBASE-17072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15657155#comment-15657155 ]

Eiichi Sato commented on HBASE-17072:
-------------------------------------

The problem is called [primary clustering|https://en.wikipedia.org/wiki/Primary_clustering]. {{java.lang.ThreadLocal}} as provided by the Oracle JDK is implemented with an open-addressing hash table using linear probing which, as described in the link, is known to be vulnerable to this phenomenon. {{java.util.HashMap}}, on the other hand, is backed by a chained hash table, which degrades gracefully under hash collisions. This is why I think we shouldn't use an indefinite number of ThreadLocal instances.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
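To make the primary-clustering point concrete, here is a toy open-addressing table with linear probing. It is not the JDK's actual ThreadLocalMap, just the same collision strategy, showing how the probe count grows linearly once colliding entries form one contiguous run:

```java
/**
 * Toy open-addressing hash table with linear probing, to illustrate primary
 * clustering: a key that hashes into a contiguous run of occupied slots must
 * walk past the entire run before finding a free slot.
 */
public class LinearProbingDemo {
    /** Inserts into the first free slot at or after {@code slot}; returns probes used. */
    static int probesToInsert(int[] table, int slot) {
        int n = table.length, probes = 1;
        while (table[slot] != 0) {        // 0 marks an empty slot
            slot = (slot + 1) % n;        // linear probing: step to the next slot
            probes++;
        }
        table[slot] = 1;
        return probes;
    }

    public static void main(String[] args) {
        int[] table = new int[64];
        int last = 0;
        // 16 keys that all hash to slot 0 build one contiguous cluster...
        for (int i = 0; i < 16; i++) {
            last = probesToInsert(table, 0);
        }
        // ...so each new colliding key probes the whole run before it.
        System.out.println("probes for 16th clustered insert: " + last); // 16
    }
}
```

A chained table pays only for the keys in one bucket's list; with linear probing, unrelated keys that merely land inside the cluster also pay, which is the gradual-vs-graceful difference the comment above draws.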
[jira] [Commented] (HBASE-17072) CPU usage starts to climb up to 90-100% when using G1GC
[ https://issues.apache.org/jira/browse/HBASE-17072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15657123#comment-15657123 ]

Eiichi Sato commented on HBASE-17072:
-------------------------------------

Thanks for the suggestion. I wasn't aware of HBASE-17017. However, the problem I want to discuss here is not that these ThreadLocals are expensive in themselves. They are interfered with by the thousands of {{ThreadLocal}} entries in the ThreadLocalMap, and eventually that makes innocent ThreadLocal users (in this case, the counters and histograms used prior to HBASE-16146) waste too much CPU time.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (HBASE-17072) CPU usage starts to climb up to 90-100% when using G1GC
[ https://issues.apache.org/jira/browse/HBASE-17072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eiichi Sato updated HBASE-17072:
--------------------------------
    Attachment: mat-threadlocals.png
                metrics.png
                disable-block-header-cache.patch
                mat-threads.png
                slave1.svg
                slave2.svg
                slave3.svg
                slave4.svg

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (HBASE-17072) CPU usage starts to climb up to 90-100% when using G1GC
Eiichi Sato created HBASE-17072:
-----------------------------------
             Summary: CPU usage starts to climb up to 90-100% when using G1GC
                 Key: HBASE-17072
                 URL: https://issues.apache.org/jira/browse/HBASE-17072
             Project: HBase
          Issue Type: Bug
          Components: Performance, regionserver
    Affects Versions: 1.2.0, 1.0.0
            Reporter: Eiichi Sato

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
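This issue's core recommendation, avoiding an unbounded number of ThreadLocal instances (one per HFile), can be sketched as a design alternative: keep a single ThreadLocal holding a per-thread map keyed by resource name, so the JDK's per-thread ThreadLocalMap gains one entry total instead of one per file. This is an illustrative pattern, not the HBase patch; `PerThreadRegistry` and `headerFor` are hypothetical names, and the byte array is a placeholder for a cached per-file value of arbitrary size:

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch: one ThreadLocal for all cached per-file values, instead of a
 * ThreadLocal per file. The ThreadLocalMap stays at one entry per thread
 * no matter how many files are opened, avoiding the clustering problem.
 */
public class PerThreadRegistry {
    private static final ThreadLocal<Map<String, byte[]>> CACHE =
            ThreadLocal.withInitial(HashMap::new);

    /** Returns this thread's cached value for the file, creating it on first use. */
    static byte[] headerFor(String fileName) {
        // computeIfAbsent keeps at most one cached value per file per thread
        return CACHE.get().computeIfAbsent(fileName, f -> new byte[64]);
    }
}
```

The trade-off is that lookups pay a HashMap probe instead of a direct ThreadLocal read, but HashMap's chaining degrades gracefully under collisions, which is exactly the property the reporter contrasts with ThreadLocalMap's linear probing.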