[jira] [Updated] (HBASE-9969) Improve KeyValueHeap using loser tree

2017-06-30 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-9969:
-
Fix Version/s: (was: 2.0.0)

> Improve KeyValueHeap using loser tree
> -
>
> Key: HBASE-9969
> URL: https://issues.apache.org/jira/browse/HBASE-9969
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance, regionserver
>Reporter: Chao Shi
>Assignee: Chao Shi
> Attachments: 9969-0.94.txt, hbase-9969.patch, hbase-9969.patch, 
> hbase-9969-pq-v1.patch, hbase-9969-pq-v2.patch, hbase-9969-v2.patch, 
> hbase-9969-v3.patch, KeyValueHeapBenchmark_v1.ods, 
> KeyValueHeapBenchmark_v2.ods, kvheap-benchmark.png, kvheap-benchmark.txt
>
>
> LoserTree is the better data structure than binary heap. It saves half of the 
> comparisons on each next(), though the time complexity is on O(logN).
> Currently A scan or get will go through two KeyValueHeaps, one is merging KVs 
> read from multiple HFiles in a single store, the other is merging results 
> from multiple stores. This patch should improve the both cases whenever CPU 
> is the bottleneck (e.g. scan with filter over cached blocks, HBASE-9811).
> All of the optimization work is done in KeyValueHeap and does not change its 
> public interfaces. The new code looks more cleaner and simpler to understand.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HBASE-9969) Improve KeyValueHeap using loser tree

2014-09-08 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated HBASE-9969:
-
Fix Version/s: (was: 0.99.0)
   2.0.0

 Improve KeyValueHeap using loser tree
 -

 Key: HBASE-9969
 URL: https://issues.apache.org/jira/browse/HBASE-9969
 Project: HBase
  Issue Type: Improvement
  Components: Performance, regionserver
Reporter: Chao Shi
Assignee: Chao Shi
 Fix For: 2.0.0

 Attachments: 9969-0.94.txt, KeyValueHeapBenchmark_v1.ods, 
 KeyValueHeapBenchmark_v2.ods, hbase-9969-pq-v1.patch, hbase-9969-pq-v2.patch, 
 hbase-9969-v2.patch, hbase-9969-v3.patch, hbase-9969.patch, hbase-9969.patch, 
 kvheap-benchmark.png, kvheap-benchmark.txt


 LoserTree is the better data structure than binary heap. It saves half of the 
 comparisons on each next(), though the time complexity is on O(logN).
 Currently A scan or get will go through two KeyValueHeaps, one is merging KVs 
 read from multiple HFiles in a single store, the other is merging results 
 from multiple stores. This patch should improve the both cases whenever CPU 
 is the bottleneck (e.g. scan with filter over cached blocks, HBASE-9811).
 All of the optimization work is done in KeyValueHeap and does not change its 
 public interfaces. The new code looks more cleaner and simpler to understand.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-9969) Improve KeyValueHeap using loser tree

2014-03-12 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-9969:
--

Fix Version/s: (was: 0.98.1)

 Improve KeyValueHeap using loser tree
 -

 Key: HBASE-9969
 URL: https://issues.apache.org/jira/browse/HBASE-9969
 Project: HBase
  Issue Type: Improvement
  Components: Performance, regionserver
Reporter: Chao Shi
Assignee: Chao Shi
 Fix For: 0.99.0

 Attachments: 9969-0.94.txt, KeyValueHeapBenchmark_v1.ods, 
 KeyValueHeapBenchmark_v2.ods, hbase-9969-pq-v1.patch, hbase-9969-pq-v2.patch, 
 hbase-9969-v2.patch, hbase-9969-v3.patch, hbase-9969.patch, hbase-9969.patch, 
 kvheap-benchmark.png, kvheap-benchmark.txt


 LoserTree is the better data structure than binary heap. It saves half of the 
 comparisons on each next(), though the time complexity is on O(logN).
 Currently A scan or get will go through two KeyValueHeaps, one is merging KVs 
 read from multiple HFiles in a single store, the other is merging results 
 from multiple stores. This patch should improve the both cases whenever CPU 
 is the bottleneck (e.g. scan with filter over cached blocks, HBASE-9811).
 All of the optimization work is done in KeyValueHeap and does not change its 
 public interfaces. The new code looks more cleaner and simpler to understand.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-9969) Improve KeyValueHeap using loser tree

2014-02-03 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-9969:
--

Status: Open  (was: Patch Available)

Cancelling stale patch

 Improve KeyValueHeap using loser tree
 -

 Key: HBASE-9969
 URL: https://issues.apache.org/jira/browse/HBASE-9969
 Project: HBase
  Issue Type: Improvement
  Components: Performance, regionserver
Reporter: Chao Shi
Assignee: Chao Shi
 Fix For: 0.98.1, 0.99.0

 Attachments: 9969-0.94.txt, KeyValueHeapBenchmark_v1.ods, 
 KeyValueHeapBenchmark_v2.ods, hbase-9969-pq-v1.patch, hbase-9969-pq-v2.patch, 
 hbase-9969-v2.patch, hbase-9969-v3.patch, hbase-9969.patch, hbase-9969.patch, 
 kvheap-benchmark.png, kvheap-benchmark.txt


 LoserTree is the better data structure than binary heap. It saves half of the 
 comparisons on each next(), though the time complexity is on O(logN).
 Currently A scan or get will go through two KeyValueHeaps, one is merging KVs 
 read from multiple HFiles in a single store, the other is merging results 
 from multiple stores. This patch should improve the both cases whenever CPU 
 is the bottleneck (e.g. scan with filter over cached blocks, HBASE-9811).
 All of the optimization work is done in KeyValueHeap and does not change its 
 public interfaces. The new code looks more cleaner and simpler to understand.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-9969) Improve KeyValueHeap using loser tree

2013-12-16 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-9969:
-

Fix Version/s: (was: 0.96.1)
   0.99.0

 Improve KeyValueHeap using loser tree
 -

 Key: HBASE-9969
 URL: https://issues.apache.org/jira/browse/HBASE-9969
 Project: HBase
  Issue Type: Improvement
  Components: Performance, regionserver
Reporter: Chao Shi
Assignee: Chao Shi
 Fix For: 0.98.1, 0.99.0

 Attachments: 9969-0.94.txt, KeyValueHeapBenchmark_v1.ods, 
 KeyValueHeapBenchmark_v2.ods, hbase-9969-pq-v1.patch, hbase-9969-pq-v2.patch, 
 hbase-9969-v2.patch, hbase-9969-v3.patch, hbase-9969.patch, hbase-9969.patch, 
 kvheap-benchmark.png, kvheap-benchmark.txt


 LoserTree is the better data structure than binary heap. It saves half of the 
 comparisons on each next(), though the time complexity is on O(logN).
 Currently A scan or get will go through two KeyValueHeaps, one is merging KVs 
 read from multiple HFiles in a single store, the other is merging results 
 from multiple stores. This patch should improve the both cases whenever CPU 
 is the bottleneck (e.g. scan with filter over cached blocks, HBASE-9811).
 All of the optimization work is done in KeyValueHeap and does not change its 
 public interfaces. The new code looks more cleaner and simpler to understand.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HBASE-9969) Improve KeyValueHeap using loser tree

2013-12-01 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-9969:
--

Fix Version/s: (was: 0.98.0)
   0.98.1

 Improve KeyValueHeap using loser tree
 -

 Key: HBASE-9969
 URL: https://issues.apache.org/jira/browse/HBASE-9969
 Project: HBase
  Issue Type: Improvement
  Components: Performance, regionserver
Reporter: Chao Shi
Assignee: Chao Shi
 Fix For: 0.96.1, 0.98.1

 Attachments: 9969-0.94.txt, KeyValueHeapBenchmark_v1.ods, 
 KeyValueHeapBenchmark_v2.ods, hbase-9969-pq-v1.patch, hbase-9969-pq-v2.patch, 
 hbase-9969-v2.patch, hbase-9969-v3.patch, hbase-9969.patch, hbase-9969.patch, 
 kvheap-benchmark.png, kvheap-benchmark.txt


 LoserTree is the better data structure than binary heap. It saves half of the 
 comparisons on each next(), though the time complexity is on O(logN).
 Currently A scan or get will go through two KeyValueHeaps, one is merging KVs 
 read from multiple HFiles in a single store, the other is merging results 
 from multiple stores. This patch should improve the both cases whenever CPU 
 is the bottleneck (e.g. scan with filter over cached blocks, HBASE-9811).
 All of the optimization work is done in KeyValueHeap and does not change its 
 public interfaces. The new code looks more cleaner and simpler to understand.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HBASE-9969) Improve KeyValueHeap using loser tree

2013-11-19 Thread Matt Corgan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Corgan updated HBASE-9969:
---

Attachment: hbase-9969-pq-v2.patch
KeyValueHeapBenchmark_v2.ods

attaching hbase-9969-pq-v2.patch and KeyValueHeapBenchmark_v2.ods

* this patch will hopefully apply (previous one accidentally had 
BenchmarkableKeyValueHeap.java in the src/test directory)
* KeyValueScannerPriorityQueue is no longer based on PriorityQueue.  it's now a 
custom implementation with class comments explaining
* KeyValueScannerHeap (based on KeyValueScannerPriorityQueue) looks to beat the 
existing KeyValueHeap in all cases (tested up to 32 scanners)
* LoserTree is usually faster with more scanners, but still slower when many 
consecutive KVs come from the same scanner

 Improve KeyValueHeap using loser tree
 -

 Key: HBASE-9969
 URL: https://issues.apache.org/jira/browse/HBASE-9969
 Project: HBase
  Issue Type: Improvement
  Components: Performance, regionserver
Reporter: Chao Shi
Assignee: Chao Shi
 Fix For: 0.98.0, 0.96.1

 Attachments: 9969-0.94.txt, KeyValueHeapBenchmark_v1.ods, 
 KeyValueHeapBenchmark_v2.ods, hbase-9969-pq-v1.patch, hbase-9969-pq-v2.patch, 
 hbase-9969-v2.patch, hbase-9969-v3.patch, hbase-9969.patch, hbase-9969.patch, 
 kvheap-benchmark.png, kvheap-benchmark.txt


 LoserTree is the better data structure than binary heap. It saves half of the 
 comparisons on each next(), though the time complexity is on O(logN).
 Currently A scan or get will go through two KeyValueHeaps, one is merging KVs 
 read from multiple HFiles in a single store, the other is merging results 
 from multiple stores. This patch should improve the both cases whenever CPU 
 is the bottleneck (e.g. scan with filter over cached blocks, HBASE-9811).
 All of the optimization work is done in KeyValueHeap and does not change its 
 public interfaces. The new code looks more cleaner and simpler to understand.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HBASE-9969) Improve KeyValueHeap using loser tree

2013-11-18 Thread Chao Shi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Shi updated HBASE-9969:


Attachment: hbase-9969-v3.patch

upload v3
- add an assertion as Ted mentioned
- prepend a 16 bytes prefix to row keys in the benchmark (will update the 
benchmark result tomorrow when I have access to my devbox)

 Improve KeyValueHeap using loser tree
 -

 Key: HBASE-9969
 URL: https://issues.apache.org/jira/browse/HBASE-9969
 Project: HBase
  Issue Type: Improvement
  Components: Performance, regionserver
Reporter: Chao Shi
Assignee: Chao Shi
 Fix For: 0.98.0, 0.96.1, 0.94.15

 Attachments: 9969-0.94.txt, hbase-9969-v2.patch, hbase-9969-v3.patch, 
 hbase-9969.patch, hbase-9969.patch, kvheap-benchmark.png, kvheap-benchmark.txt


 LoserTree is the better data structure than binary heap. It saves half of the 
 comparisons on each next(), though the time complexity is on O(logN).
 Currently A scan or get will go through two KeyValueHeaps, one is merging KVs 
 read from multiple HFiles in a single store, the other is merging results 
 from multiple stores. This patch should improve the both cases whenever CPU 
 is the bottleneck (e.g. scan with filter over cached blocks, HBASE-9811).
 All of the optimization work is done in KeyValueHeap and does not change its 
 public interfaces. The new code looks more cleaner and simpler to understand.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HBASE-9969) Improve KeyValueHeap using loser tree

2013-11-18 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-9969:
-

Fix Version/s: (was: 0.94.15)

 Improve KeyValueHeap using loser tree
 -

 Key: HBASE-9969
 URL: https://issues.apache.org/jira/browse/HBASE-9969
 Project: HBase
  Issue Type: Improvement
  Components: Performance, regionserver
Reporter: Chao Shi
Assignee: Chao Shi
 Fix For: 0.98.0, 0.96.1

 Attachments: 9969-0.94.txt, hbase-9969-v2.patch, hbase-9969-v3.patch, 
 hbase-9969.patch, hbase-9969.patch, kvheap-benchmark.png, kvheap-benchmark.txt


 LoserTree is the better data structure than binary heap. It saves half of the 
 comparisons on each next(), though the time complexity is on O(logN).
 Currently A scan or get will go through two KeyValueHeaps, one is merging KVs 
 read from multiple HFiles in a single store, the other is merging results 
 from multiple stores. This patch should improve the both cases whenever CPU 
 is the bottleneck (e.g. scan with filter over cached blocks, HBASE-9811).
 All of the optimization work is done in KeyValueHeap and does not change its 
 public interfaces. The new code looks more cleaner and simpler to understand.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HBASE-9969) Improve KeyValueHeap using loser tree

2013-11-18 Thread Matt Corgan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Corgan updated HBASE-9969:
---

Attachment: hbase-9969-pq-v1.patch

attaching hbase-9969-pq-v1.patch

* adds KeyValueScannerPriorityQueue, which is a stripped down copy of 
PriorityQueue that we can play with
* adds KeyValueScannerHeap (almost identical to KeyValueHeap, but uses the 
above)
* includes LoserTreeKeyValueHeap and LoserTree
* each of the 3 heaps implments BenchmarkableKeyValueHeap (not complete)
* enhances KeyValueHeapBenchmark to benchmark all 3 implementations

* always tests 1mm KVs, no matter how many scanners
* sorts the input KVs, though that doesn't seem to matter much
* does a few warmup runs

One problem is that this goes from 1 to 3 implementations behind the same 
interface which may not get inlined as well.  Absolute performance will 
therefore be lower, but hopefully performance will be comparable across the 3 
implementations.

 Improve KeyValueHeap using loser tree
 -

 Key: HBASE-9969
 URL: https://issues.apache.org/jira/browse/HBASE-9969
 Project: HBase
  Issue Type: Improvement
  Components: Performance, regionserver
Reporter: Chao Shi
Assignee: Chao Shi
 Fix For: 0.98.0, 0.96.1

 Attachments: 9969-0.94.txt, hbase-9969-pq-v1.patch, 
 hbase-9969-v2.patch, hbase-9969-v3.patch, hbase-9969.patch, hbase-9969.patch, 
 kvheap-benchmark.png, kvheap-benchmark.txt


 LoserTree is the better data structure than binary heap. It saves half of the 
 comparisons on each next(), though the time complexity is on O(logN).
 Currently A scan or get will go through two KeyValueHeaps, one is merging KVs 
 read from multiple HFiles in a single store, the other is merging results 
 from multiple stores. This patch should improve the both cases whenever CPU 
 is the bottleneck (e.g. scan with filter over cached blocks, HBASE-9811).
 All of the optimization work is done in KeyValueHeap and does not change its 
 public interfaces. The new code looks more cleaner and simpler to understand.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HBASE-9969) Improve KeyValueHeap using loser tree

2013-11-18 Thread Matt Corgan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Corgan updated HBASE-9969:
---

Attachment: KeyValueHeapBenchmark_v1.ods

Attaching KeyValueHeapBenchmark_v1.ods with the benchmark output for both 1 
col/row and 16 cols/row.

Some of it's hard to explain.  Looks like LoserTree is often faster at next() 
when there is more heaping to do, but not when KVs are coming from the same 
scanner, like when numScanners=1 or colsPerRow=16.  Maybe because it doesn't 
have an optimization for that case?

Anyway, just putting it up there for people to poke holes in or continue to 
test.

 Improve KeyValueHeap using loser tree
 -

 Key: HBASE-9969
 URL: https://issues.apache.org/jira/browse/HBASE-9969
 Project: HBase
  Issue Type: Improvement
  Components: Performance, regionserver
Reporter: Chao Shi
Assignee: Chao Shi
 Fix For: 0.98.0, 0.96.1

 Attachments: 9969-0.94.txt, KeyValueHeapBenchmark_v1.ods, 
 hbase-9969-pq-v1.patch, hbase-9969-v2.patch, hbase-9969-v3.patch, 
 hbase-9969.patch, hbase-9969.patch, kvheap-benchmark.png, kvheap-benchmark.txt


 LoserTree is the better data structure than binary heap. It saves half of the 
 comparisons on each next(), though the time complexity is on O(logN).
 Currently A scan or get will go through two KeyValueHeaps, one is merging KVs 
 read from multiple HFiles in a single store, the other is merging results 
 from multiple stores. This patch should improve the both cases whenever CPU 
 is the bottleneck (e.g. scan with filter over cached blocks, HBASE-9811).
 All of the optimization work is done in KeyValueHeap and does not change its 
 public interfaces. The new code looks more cleaner and simpler to understand.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HBASE-9969) Improve KeyValueHeap using loser tree

2013-11-14 Thread Chao Shi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Shi updated HBASE-9969:


Attachment: kvheap-benchmark.txt
kvheap-benchmark.png
hbase-9969.patch

I ran two benchmarks:

A) KeyValueHeapBenchmark class included in the patch

It simply constructs a KeyValueHeap from several 
CollectionBackedKeyValueScanner and sees how many next/reseek calls per second.

||scanners|| lt-next || lt-reseek || pq-next || pq-reseek ||
|1|17543859.6|3058104|18181818.2|1798561.2|
|2|11299435|5102040.8|11173184.4|3053435.1|
|3|8547008.5|4854368.9|7915567.3|2859866.5|
|4|7936507.9|4866180|5891016.2|2507837|
|5|6711409.4|4739336.5|4748338.1|2296738.6|

lt- denotes LoserTree based KeyValueHeap.
pq- denotes PriorityQueue based KeyValueHeap.
A complete result (with up to 19 scanners) is attached.

B) ColumnPaginationFilter with offset=1M
I ran a mini-cluster and put a huge number of columns on a single row. Thes 
columns are uniformly written to several HFiles. Then query using 
ColumnPaginationFilter with offset = 1M. Blocks are cached, so it is CPU 
intensive. Qualifiers and values are 4 byte integers. Row key is test_row. 
Blocks are not compressed.

The below table shows how long the scan takes.

|| hfiles || lt || pq ||
| 1 | 749.8 ms | 986.69 ms |
| 2 |1511.28 ms | 2190.97 ms |
| 3 |2392.8 ms | 4029.8 ms |
| 4 | 3318.8 ms | 5760.22 ms



 Improve KeyValueHeap using loser tree
 -

 Key: HBASE-9969
 URL: https://issues.apache.org/jira/browse/HBASE-9969
 Project: HBase
  Issue Type: Improvement
Reporter: Chao Shi
 Attachments: hbase-9969.patch, kvheap-benchmark.png, 
 kvheap-benchmark.txt


 LoserTree is the better data structure than binary heap. It saves half of the 
 comparisons on each next(), though the time complexity is on O(logN).
 Currently A scan or get will go through two KeyValueHeaps, one is merging KVs 
 read from multiple HFiles in a single store, the other is merging results 
 from multiple stores. This patch should improve the both cases whenever CPU 
 is the bottleneck (e.g. scan with filter over cached blocks, HBASE-9811).
 All of the optimization work is done in KeyValueHeap and does not change its 
 public interfaces. The new code looks more cleaner and simpler to understand.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HBASE-9969) Improve KeyValueHeap using loser tree

2013-11-14 Thread Chao Shi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Shi updated HBASE-9969:


Status: Patch Available  (was: Open)

Submit to hadoop QA.

 Improve KeyValueHeap using loser tree
 -

 Key: HBASE-9969
 URL: https://issues.apache.org/jira/browse/HBASE-9969
 Project: HBase
  Issue Type: Improvement
Reporter: Chao Shi
 Attachments: hbase-9969.patch, kvheap-benchmark.png, 
 kvheap-benchmark.txt


 LoserTree is the better data structure than binary heap. It saves half of the 
 comparisons on each next(), though the time complexity is on O(logN).
 Currently A scan or get will go through two KeyValueHeaps, one is merging KVs 
 read from multiple HFiles in a single store, the other is merging results 
 from multiple stores. This patch should improve the both cases whenever CPU 
 is the bottleneck (e.g. scan with filter over cached blocks, HBASE-9811).
 All of the optimization work is done in KeyValueHeap and does not change its 
 public interfaces. The new code looks more cleaner and simpler to understand.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HBASE-9969) Improve KeyValueHeap using loser tree

2013-11-14 Thread Chao Shi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Shi updated HBASE-9969:


Attachment: hbase-9969.patch

Repost the patch, as hadoop QA thought the benchmark result is the patch

 Improve KeyValueHeap using loser tree
 -

 Key: HBASE-9969
 URL: https://issues.apache.org/jira/browse/HBASE-9969
 Project: HBase
  Issue Type: Improvement
Reporter: Chao Shi
 Attachments: hbase-9969.patch, hbase-9969.patch, 
 kvheap-benchmark.png, kvheap-benchmark.txt


 LoserTree is the better data structure than binary heap. It saves half of the 
 comparisons on each next(), though the time complexity is on O(logN).
 Currently A scan or get will go through two KeyValueHeaps, one is merging KVs 
 read from multiple HFiles in a single store, the other is merging results 
 from multiple stores. This patch should improve the both cases whenever CPU 
 is the bottleneck (e.g. scan with filter over cached blocks, HBASE-9811).
 All of the optimization work is done in KeyValueHeap and does not change its 
 public interfaces. The new code looks more cleaner and simpler to understand.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HBASE-9969) Improve KeyValueHeap using loser tree

2013-11-14 Thread Nick Dimiduk (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HBASE-9969:


Assignee: Chao Shi

 Improve KeyValueHeap using loser tree
 -

 Key: HBASE-9969
 URL: https://issues.apache.org/jira/browse/HBASE-9969
 Project: HBase
  Issue Type: Improvement
Reporter: Chao Shi
Assignee: Chao Shi
 Attachments: hbase-9969.patch, hbase-9969.patch, 
 kvheap-benchmark.png, kvheap-benchmark.txt


 LoserTree is the better data structure than binary heap. It saves half of the 
 comparisons on each next(), though the time complexity is on O(logN).
 Currently A scan or get will go through two KeyValueHeaps, one is merging KVs 
 read from multiple HFiles in a single store, the other is merging results 
 from multiple stores. This patch should improve the both cases whenever CPU 
 is the bottleneck (e.g. scan with filter over cached blocks, HBASE-9811).
 All of the optimization work is done in KeyValueHeap and does not change its 
 public interfaces. The new code looks more cleaner and simpler to understand.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HBASE-9969) Improve KeyValueHeap using loser tree

2013-11-14 Thread Nick Dimiduk (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HBASE-9969:


Component/s: regionserver

 Improve KeyValueHeap using loser tree
 -

 Key: HBASE-9969
 URL: https://issues.apache.org/jira/browse/HBASE-9969
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: Chao Shi
Assignee: Chao Shi
 Attachments: hbase-9969.patch, hbase-9969.patch, 
 kvheap-benchmark.png, kvheap-benchmark.txt


 LoserTree is the better data structure than binary heap. It saves half of the 
 comparisons on each next(), though the time complexity is on O(logN).
 Currently A scan or get will go through two KeyValueHeaps, one is merging KVs 
 read from multiple HFiles in a single store, the other is merging results 
 from multiple stores. This patch should improve the both cases whenever CPU 
 is the bottleneck (e.g. scan with filter over cached blocks, HBASE-9811).
 All of the optimization work is done in KeyValueHeap and does not change its 
 public interfaces. The new code looks more cleaner and simpler to understand.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HBASE-9969) Improve KeyValueHeap using loser tree

2013-11-14 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-9969:
-

  Component/s: Performance
Fix Version/s: 0.96.0
   0.96.1

 Improve KeyValueHeap using loser tree
 -

 Key: HBASE-9969
 URL: https://issues.apache.org/jira/browse/HBASE-9969
 Project: HBase
  Issue Type: Improvement
  Components: Performance, regionserver
Reporter: Chao Shi
Assignee: Chao Shi
 Fix For: 0.96.0, 0.96.1

 Attachments: hbase-9969.patch, hbase-9969.patch, 
 kvheap-benchmark.png, kvheap-benchmark.txt


 LoserTree is the better data structure than binary heap. It saves half of the 
 comparisons on each next(), though the time complexity is on O(logN).
 Currently A scan or get will go through two KeyValueHeaps, one is merging KVs 
 read from multiple HFiles in a single store, the other is merging results 
 from multiple stores. This patch should improve the both cases whenever CPU 
 is the bottleneck (e.g. scan with filter over cached blocks, HBASE-9811).
 All of the optimization work is done in KeyValueHeap and does not change its 
 public interfaces. The new code looks more cleaner and simpler to understand.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HBASE-9969) Improve KeyValueHeap using loser tree

2013-11-14 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-9969:
-

Fix Version/s: (was: 0.96.0)
   0.98.0

 Improve KeyValueHeap using loser tree
 -

 Key: HBASE-9969
 URL: https://issues.apache.org/jira/browse/HBASE-9969
 Project: HBase
  Issue Type: Improvement
  Components: Performance, regionserver
Reporter: Chao Shi
Assignee: Chao Shi
 Fix For: 0.98.0, 0.96.1

 Attachments: hbase-9969.patch, hbase-9969.patch, 
 kvheap-benchmark.png, kvheap-benchmark.txt


 LoserTree is the better data structure than binary heap. It saves half of the 
 comparisons on each next(), though the time complexity is on O(logN).
 Currently A scan or get will go through two KeyValueHeaps, one is merging KVs 
 read from multiple HFiles in a single store, the other is merging results 
 from multiple stores. This patch should improve the both cases whenever CPU 
 is the bottleneck (e.g. scan with filter over cached blocks, HBASE-9811).
 All of the optimization work is done in KeyValueHeap and does not change its 
 public interfaces. The new code looks more cleaner and simpler to understand.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HBASE-9969) Improve KeyValueHeap using loser tree

2013-11-14 Thread Chao Shi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Shi updated HBASE-9969:


Attachment: hbase-9969-v2.patch

v2 is attached, fixed Ted mentioned problems.

https://reviews.apache.org/r/15563

 Improve KeyValueHeap using loser tree
 -

 Key: HBASE-9969
 URL: https://issues.apache.org/jira/browse/HBASE-9969
 Project: HBase
  Issue Type: Improvement
  Components: Performance, regionserver
Reporter: Chao Shi
Assignee: Chao Shi
 Fix For: 0.98.0, 0.96.1

 Attachments: hbase-9969-v2.patch, hbase-9969.patch, hbase-9969.patch, 
 kvheap-benchmark.png, kvheap-benchmark.txt


 LoserTree is the better data structure than binary heap. It saves half of the 
 comparisons on each next(), though the time complexity is on O(logN).
 Currently A scan or get will go through two KeyValueHeaps, one is merging KVs 
 read from multiple HFiles in a single store, the other is merging results 
 from multiple stores. This patch should improve the both cases whenever CPU 
 is the bottleneck (e.g. scan with filter over cached blocks, HBASE-9811).
 All of the optimization work is done in KeyValueHeap and does not change its 
 public interfaces. The new code looks more cleaner and simpler to understand.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HBASE-9969) Improve KeyValueHeap using loser tree

2013-11-14 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-9969:
-

Fix Version/s: 0.94.15

 Improve KeyValueHeap using loser tree
 -

 Key: HBASE-9969
 URL: https://issues.apache.org/jira/browse/HBASE-9969
 Project: HBase
  Issue Type: Improvement
  Components: Performance, regionserver
Reporter: Chao Shi
Assignee: Chao Shi
 Fix For: 0.98.0, 0.96.1, 0.94.15

 Attachments: 9969-0.94.txt, hbase-9969-v2.patch, hbase-9969.patch, 
 hbase-9969.patch, kvheap-benchmark.png, kvheap-benchmark.txt


 LoserTree is the better data structure than binary heap. It saves half of the 
 comparisons on each next(), though the time complexity is on O(logN).
 Currently A scan or get will go through two KeyValueHeaps, one is merging KVs 
 read from multiple HFiles in a single store, the other is merging results 
 from multiple stores. This patch should improve the both cases whenever CPU 
 is the bottleneck (e.g. scan with filter over cached blocks, HBASE-9811).
 All of the optimization work is done in KeyValueHeap and does not change its 
 public interfaces. The new code looks more cleaner and simpler to understand.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HBASE-9969) Improve KeyValueHeap using loser tree

2013-11-14 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-9969:
-

Attachment: 9969-0.94.txt

0.94 patch for playing.

I ran my old tight loop scan test (with a real client and real server).
Time to scan through 20rows (1 col) went from 16.8s to 11.6s (using 
WildcardColumntracker - no columns specified in scan).
When using ExplicitColumnTracker it went from 26s to 21s.

That's pretty impressive. On top of that the code is easier to understand now.


 Improve KeyValueHeap using loser tree
 -

 Key: HBASE-9969
 URL: https://issues.apache.org/jira/browse/HBASE-9969
 Project: HBase
  Issue Type: Improvement
  Components: Performance, regionserver
Reporter: Chao Shi
Assignee: Chao Shi
 Fix For: 0.98.0, 0.96.1, 0.94.15

 Attachments: 9969-0.94.txt, hbase-9969-v2.patch, hbase-9969.patch, 
 hbase-9969.patch, kvheap-benchmark.png, kvheap-benchmark.txt


 LoserTree is the better data structure than binary heap. It saves half of the 
 comparisons on each next(), though the time complexity is on O(logN).
 Currently A scan or get will go through two KeyValueHeaps, one is merging KVs 
 read from multiple HFiles in a single store, the other is merging results 
 from multiple stores. This patch should improve the both cases whenever CPU 
 is the bottleneck (e.g. scan with filter over cached blocks, HBASE-9811).
 All of the optimization work is done in KeyValueHeap and does not change its 
 public interfaces. The new code looks more cleaner and simpler to understand.



--
This message was sent by Atlassian JIRA
(v6.1#6144)