[jira] [Updated] (HBASE-11811) Use binary search for seeking into a block

2019-05-06 Thread Lars Hofhansl (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-11811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-11811:
--
Resolution: Implemented
Status: Resolved  (was: Patch Available)

This has been implemented now with the ROW_INDEX_V1 block encoding. Closing...

> Use binary search for seeking into a block
> --
>
> Key: HBASE-11811
> URL: https://issues.apache.org/jira/browse/HBASE-11811
> Project: HBase
>  Issue Type: Brainstorming
>Reporter: Lars Hofhansl
>Assignee: Vladimir Rodionov
>Priority: Major
> Attachments: 11811-wip-v2.txt, 11811-wip-v4.txt, block_index-v2.txt
>
>
> Currently upon every seek (including Gets) we need to linearly look through 
> the block from the beginning until we find the Cell we are looking for.
> It should be possible to build a simple cache of offsets of Cells for each 
> block as it is loaded and then use binary search to find the Cell in question.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-11811) Use binary search for seeking into a block

2014-08-29 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-11811:
--

Attachment: 11811-wip-v4.txt

Slightly updated version.

In all my local testing I see great improvement for Gets and no measurable 
penalty for Scans.

In the worst case (if each block only receives a single Get request) we can 
expect that on average we now look through twice as many KV. If the block is 
indeed loaded from HDFS each time, this is negligible.

I'll try to test with a real cluster and PE soon.


 Use binary search for seeking into a block
 --

 Key: HBASE-11811
 URL: https://issues.apache.org/jira/browse/HBASE-11811
 Project: HBase
  Issue Type: Brainstorming
Reporter: Lars Hofhansl
 Attachments: 11811-wip-v2.txt, 11811-wip-v4.txt, block_index-v2.txt


 Currently upon every seek (including Gets) we need to linearly look through 
 the block from the beginning until we find the Cell we are looking for.
 It should be possible to build a simple cache of offsets of Cells for each 
 block as it is loaded and then use binary search to find the Cell in question.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-11811) Use binary search for seeking into a block

2014-08-24 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-11811:
--

Attachment: 11811-wip.txt

Fixes TestHFileBlock. Possible improvements:
* build index as we go to avoid building the complete index when only a single 
Cell is read from a block
* persist the index. At write time we can pretty much get the offsets without 
extra cost
* make this configurable...? We're adding 4 bytes per Cell of heap usage.


 Use binary search for seeking into a block
 --

 Key: HBASE-11811
 URL: https://issues.apache.org/jira/browse/HBASE-11811
 Project: HBase
  Issue Type: Brainstorming
Reporter: Lars Hofhansl
 Attachments: 11811-wip.txt, block_index-v2.txt


 Currently upon every seek (including Gets) we need to linearly look through 
 the block from the beginning until we find the Cell we are looking for.
 It should be possible to build a simple cache of offsets of Cells for each 
 block as it is loaded and then use binary search to find the Cell in question.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-11811) Use binary search for seeking into a block

2014-08-24 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-11811:
--

Attachment: (was: 11811-wip.txt)

 Use binary search for seeking into a block
 --

 Key: HBASE-11811
 URL: https://issues.apache.org/jira/browse/HBASE-11811
 Project: HBase
  Issue Type: Brainstorming
Reporter: Lars Hofhansl
 Attachments: 11811-wip-v2.txt, block_index-v2.txt


 Currently upon every seek (including Gets) we need to linearly look through 
 the block from the beginning until we find the Cell we are looking for.
 It should be possible to build a simple cache of offsets of Cells for each 
 block as it is loaded and then use binary search to find the Cell in question.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-11811) Use binary search for seeking into a block

2014-08-24 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-11811:
--

Attachment: 11811-wip-v2.txt

This version gets rid of the unused code and also makes block seeking identical 
between HFileReaderV2 and HFileReaderV3.

 Use binary search for seeking into a block
 --

 Key: HBASE-11811
 URL: https://issues.apache.org/jira/browse/HBASE-11811
 Project: HBase
  Issue Type: Brainstorming
Reporter: Lars Hofhansl
 Attachments: 11811-wip-v2.txt, block_index-v2.txt


 Currently upon every seek (including Gets) we need to linearly look through 
 the block from the beginning until we find the Cell we are looking for.
 It should be possible to build a simple cache of offsets of Cells for each 
 block as it is loaded and then use binary search to find the Cell in question.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-11811) Use binary search for seeking into a block

2014-08-23 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-11811:
--

Attachment: block_index-v2.txt

Here's sample patch that reduce the time for 1m gets from 40s to 8.5s and there 
is probably more room for optimization.
The data was simply generated by HBaseTestingUtility.loadTable(...) so the KVs 
are small.

Some points:
# the utility of this decreases as Cell get larger and only a few of them fit 
into a block
# the index is not persisted, so when blocks are evicted and later reloaded the 
index needs to be build up again
# not happy currently about the point where the index is built, as that needs 
to synchronize on the block (but only when the block actually had to be loaded)


 Use binary search for seeking into a block
 --

 Key: HBASE-11811
 URL: https://issues.apache.org/jira/browse/HBASE-11811
 Project: HBase
  Issue Type: Brainstorming
Reporter: Lars Hofhansl
 Attachments: block_index-v2.txt


 Currently upon every seek (including Gets) we need to linearly look through 
 the block from the beginning until we find the Cell we are looking for.
 It should be possible to build a simple cache of offsets of Cells for each 
 block as it is loaded and then use binary search to find the Cell in question.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-11811) Use binary search for seeking into a block

2014-08-23 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-11811:
--

Status: Patch Available  (was: Open)

 Use binary search for seeking into a block
 --

 Key: HBASE-11811
 URL: https://issues.apache.org/jira/browse/HBASE-11811
 Project: HBase
  Issue Type: Brainstorming
Reporter: Lars Hofhansl
 Attachments: block_index-v2.txt


 Currently upon every seek (including Gets) we need to linearly look through 
 the block from the beginning until we find the Cell we are looking for.
 It should be possible to build a simple cache of offsets of Cells for each 
 block as it is loaded and then use binary search to find the Cell in question.



--
This message was sent by Atlassian JIRA
(v6.2#6252)