[jira] [Updated] (HBASE-11811) Use binary search for seeking into a block
[ https://issues.apache.org/jira/browse/HBASE-11811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-11811: -- Resolution: Implemented Status: Resolved (was: Patch Available) This has been implemented now with the ROW_INDEX_V1 block encoding. Closing... > Use binary search for seeking into a block > -- > > Key: HBASE-11811 > URL: https://issues.apache.org/jira/browse/HBASE-11811 > Project: HBase > Issue Type: Brainstorming >Reporter: Lars Hofhansl >Assignee: Vladimir Rodionov >Priority: Major > Attachments: 11811-wip-v2.txt, 11811-wip-v4.txt, block_index-v2.txt > > > Currently upon every seek (including Gets) we need to linearly look through > the block from the beginning until we find the Cell we are looking for. > It should be possible to build a simple cache of offsets of Cells for each > block as it is loaded and then use binary search to find the Cell in question. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-11811) Use binary search for seeking into a block
[ https://issues.apache.org/jira/browse/HBASE-11811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-11811: -- Attachment: 11811-wip-v4.txt Slightly updated version. In all my local testing I see great improvement for Gets and no measurable penalty for Scans. In the worst case (if each block only receives a single Get request) we can expect that on average we now look through twice as many KV. If the block is indeed loaded from HDFS each time, this is negligible. I'll try to test with a real cluster and PE soon. Use binary search for seeking into a block -- Key: HBASE-11811 URL: https://issues.apache.org/jira/browse/HBASE-11811 Project: HBase Issue Type: Brainstorming Reporter: Lars Hofhansl Attachments: 11811-wip-v2.txt, 11811-wip-v4.txt, block_index-v2.txt Currently upon every seek (including Gets) we need to linearly look through the block from the beginning until we find the Cell we are looking for. It should be possible to build a simple cache of offsets of Cells for each block as it is loaded and then use binary search to find the Cell in question. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-11811) Use binary search for seeking into a block
[ https://issues.apache.org/jira/browse/HBASE-11811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-11811: -- Attachment: 11811-wip.txt Fixes TestHFileBlock. Possible improvements: * build index as we go to avoid building the complete index when only a single Cell is read from a block * persist the index. At write time we can pretty much get the offsets without extra cost * make this configurable...? We're adding 4 bytes per Cell of heap usage. Use binary search for seeking into a block -- Key: HBASE-11811 URL: https://issues.apache.org/jira/browse/HBASE-11811 Project: HBase Issue Type: Brainstorming Reporter: Lars Hofhansl Attachments: 11811-wip.txt, block_index-v2.txt Currently upon every seek (including Gets) we need to linearly look through the block from the beginning until we find the Cell we are looking for. It should be possible to build a simple cache of offsets of Cells for each block as it is loaded and then use binary search to find the Cell in question. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-11811) Use binary search for seeking into a block
[ https://issues.apache.org/jira/browse/HBASE-11811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-11811: -- Attachment: (was: 11811-wip.txt) Use binary search for seeking into a block -- Key: HBASE-11811 URL: https://issues.apache.org/jira/browse/HBASE-11811 Project: HBase Issue Type: Brainstorming Reporter: Lars Hofhansl Attachments: 11811-wip-v2.txt, block_index-v2.txt Currently upon every seek (including Gets) we need to linearly look through the block from the beginning until we find the Cell we are looking for. It should be possible to build a simple cache of offsets of Cells for each block as it is loaded and then use binary search to find the Cell in question. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-11811) Use binary search for seeking into a block
[ https://issues.apache.org/jira/browse/HBASE-11811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-11811: -- Attachment: 11811-wip-v2.txt This version gets rid of the unused code and also makes block seeking identical between HFileReaderV2 and HFileReaderV3. Use binary search for seeking into a block -- Key: HBASE-11811 URL: https://issues.apache.org/jira/browse/HBASE-11811 Project: HBase Issue Type: Brainstorming Reporter: Lars Hofhansl Attachments: 11811-wip-v2.txt, block_index-v2.txt Currently upon every seek (including Gets) we need to linearly look through the block from the beginning until we find the Cell we are looking for. It should be possible to build a simple cache of offsets of Cells for each block as it is loaded and then use binary search to find the Cell in question. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-11811) Use binary search for seeking into a block
[ https://issues.apache.org/jira/browse/HBASE-11811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-11811: -- Attachment: block_index-v2.txt Here's sample patch that reduce the time for 1m gets from 40s to 8.5s and there is probably more room for optimization. The data was simply generated by HBaseTestingUtility.loadTable(...) so the KVs are small. Some points: # the utility of this decreases as Cell get larger and only a few of them fit into a block # the index is not persisted, so when blocks are evicted and later reloaded the index needs to be build up again # not happy currently about the point where the index is built, as that needs to synchronize on the block (but only when the block actually had to be loaded) Use binary search for seeking into a block -- Key: HBASE-11811 URL: https://issues.apache.org/jira/browse/HBASE-11811 Project: HBase Issue Type: Brainstorming Reporter: Lars Hofhansl Attachments: block_index-v2.txt Currently upon every seek (including Gets) we need to linearly look through the block from the beginning until we find the Cell we are looking for. It should be possible to build a simple cache of offsets of Cells for each block as it is loaded and then use binary search to find the Cell in question. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-11811) Use binary search for seeking into a block
[ https://issues.apache.org/jira/browse/HBASE-11811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-11811: -- Status: Patch Available (was: Open) Use binary search for seeking into a block -- Key: HBASE-11811 URL: https://issues.apache.org/jira/browse/HBASE-11811 Project: HBase Issue Type: Brainstorming Reporter: Lars Hofhansl Attachments: block_index-v2.txt Currently upon every seek (including Gets) we need to linearly look through the block from the beginning until we find the Cell we are looking for. It should be possible to build a simple cache of offsets of Cells for each block as it is loaded and then use binary search to find the Cell in question. -- This message was sent by Atlassian JIRA (v6.2#6252)