[ https://issues.apache.org/jira/browse/HBASE-24298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17158851#comment-17158851 ]
star edited comment on HBASE-24298 at 7/16/20, 3:08 AM:
--------------------------------------------------------

A prototype implementation for review. It is a configurable feature in the HBase client. You can enable the optimization per table by setting a property as follows.
{code:java}
// table 'user' is optimized with ByteCopyOnWriteArrayMap
conf.set("hbase.client.meta.cache.optimize.user", "4,2");{code}
The HBase client will cache the meta data of table 'user' and locate regions in an optimized way. "4,2" means skipping the first 4 bytes of the key and using the following 2 bytes to narrow the range of the binary search.

The optimization reduces CPU time by 50% in a simple performance test case taken from our production environment, TestByteCopyOnWriteMaps#testLowerKeyPerformance.


was (Author: starphin):
A prototype implementation for review. It is a configurable feature in the HBase client. You can enable the optimization per table by setting a property as follows.
{code:java}
// table 'user' is optimized with ByteCopyOnWriteArrayMap
conf.set("hbase.client.meta.cache.optimize.user", "4,2");{code}
The HBase client will cache the meta data of table 'user' and locate regions in an optimized way. "4,2" means skipping the first 4 bytes of the key and using the following 2 bytes to narrow the range of the binary search.

> Reduce cpu load of locating region especially in batch mode.
> ------------------------------------------------------------
>
>                 Key: HBASE-24298
>                 URL: https://issues.apache.org/jira/browse/HBASE-24298
>             Project: HBase
>          Issue Type: Bug
>            Reporter: star
>            Assignee: star
>            Priority: Major
>         Attachments: HBASE-24298.patch, locating region.svg
>
>
> Binary search is used to speed up the process of locating a region. It is
> already fast, but the CPU of the HBase client still becomes the bottleneck
> when running the YCSB benchmark. We can make region location faster and
> reduce CPU load in a special case, which happens to be our common case in
> the production environment:
>  1. Predefined splits in uniform distribution.
>  2. Load data in batch mode.
> The optimization is very simple: narrow the range of the binary search.
> Initially, for each value of the first one or two bytes of the keys, record
> the startIndex and endIndex of the keys that share it. When a key is looked
> up, find the narrowed startIndex and endIndex for that key, then fall back to
> the normal binary search restricted to that startIndex and endIndex.
> Ideally we can then reduce cpu to 1/8 with 1 byte or 1/16 with 2 bytes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
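
The range-narrowing idea in the issue description can be illustrated with a minimal sketch. This is not the code from HBASE-24298.patch or its ByteCopyOnWriteArrayMap; the class name PrefixIndexedKeys and the method floorIndex are hypothetical, the index uses only the first byte at offset 0 (rather than a configurable "skip,length" pair such as "4,2"), and Arrays.compareUnsigned assumes Java 9 or later.

```java
import java.util.Arrays;

/** Hypothetical sketch: binary search over sorted keys, narrowed by the first byte. */
final class PrefixIndexedKeys {
    private final byte[][] keys;                  // sorted, unsigned lexicographic order
    private final int[] firstIdx = new int[257];  // firstIdx[b] = #keys that are empty or whose first byte < b

    PrefixIndexedKeys(byte[][] sortedKeys) {
        this.keys = sortedKeys;
        int[] count = new int[257];               // slot 0: empty keys, slot b+1: keys whose first byte is b
        for (byte[] k : sortedKeys) {
            count[k.length == 0 ? 0 : (k[0] & 0xff) + 1]++;
        }
        firstIdx[0] = count[0];
        for (int b = 1; b <= 256; b++) {
            firstIdx[b] = firstIdx[b - 1] + count[b];
        }
    }

    /** Returns the index of the greatest key <= row, or -1 if there is none. */
    int floorIndex(byte[] row) {
        if (row.length == 0) {
            return firstIdx[0] - 1;               // only an empty key can be <= an empty row
        }
        int b = row[0] & 0xff;
        int lo = firstIdx[b];                     // first key sharing the row's first byte
        int hi = firstIdx[b + 1] - 1;             // last key sharing the row's first byte
        int ans = lo - 1;                         // keys before the bucket are all smaller than row
        while (lo <= hi) {                        // ordinary binary search on the narrowed range
            int mid = (lo + hi) >>> 1;
            if (Arrays.compareUnsigned(keys[mid], row) <= 0) {
                ans = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return ans;
    }
}
```

With uniformly distributed split keys, each first-byte bucket holds roughly 1/256 of the entries, so the search loop starts on a much smaller range; indexing on two bytes would narrow it further at the cost of a larger index table.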