[jira] [Commented] (HBASE-17339) Scan-Memory-First Optimization for Get Operations

2018-02-06 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16355000#comment-16355000
 ] 

Anoop Sam John commented on HBASE-17339:


The BF blocks need NOT always be in memory.  It depends on the cache size and 
access pattern.  By default, if we have the on-heap LRU cache alone, all the index, 
bloom and data blocks go there, and a miss is possible for any type of block.  
When one uses the Bucket cache (we called it L2, but it is not really L2 
any more), the data blocks will always be in BC and the on-heap cache will keep 
the index and bloom blocks.  More likely we will have those blocks always in cache, 
but this cache is also size-limited, so a miss is possible.   This issue is closed 
now.  Did not see much perf boost. This was/is an interesting issue.

> Scan-Memory-First Optimization for Get Operations
> -
>
> Key: HBASE-17339
> URL: https://issues.apache.org/jira/browse/HBASE-17339
> Project: HBase
> Issue Type: Improvement
> Reporter: Eshcar Hillel
> Assignee: Eshcar Hillel
> Priority: Major
> Attachments: HBASE-17339-V01.patch, HBASE-17339-V02.patch, 
> HBASE-17339-V03.patch, HBASE-17339-V03.patch, HBASE-17339-V04.patch, 
> HBASE-17339-V05.patch, HBASE-17339-V06.patch, read-latency-mixed-workload.jpg
>
>
> The current implementation of a get operation (to retrieve values for a 
> specific key) scans through all relevant stores of the region; for each store 
> both memory components (memstore segments) and disk components (hfiles) are 
> scanned in parallel.
> We suggest applying an optimization that speculatively scans memory-only 
> components first and, only if the result is incomplete, scans both memory and 
> disk.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-17339) Scan-Memory-First Optimization for Get Operations

2017-06-04 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16036219#comment-16036219
 ] 

Edward Bortnikov commented on HBASE-17339:
--

Thanks [~eshcar]. Maybe it makes sense to describe the experiment we used to 
figure out the current implementation, to provide the community with the full 
picture (smile). 

We looked at a workload with temporal (rather than spatial) locality, namely 
writes closely followed by reads. This pattern is quite frequent in pub-sub 
scenarios. Instead of seeing a performance benefit in reading from MemStore 
first, we saw a nearly 100% cache hit rate, and could not explain it for a while. 
The lazy evaluation procedure described by [~eshcar] sheds light on it. 

Obviously, explicitly prioritizing reads from MemStore, rather than simply 
deferring the data fetch from disk, could help avoid some accesses to Bloom 
filters that are made just to figure out whether the key has earlier versions 
on disk. The main practical impact is when the BF itself is not in memory, and 
accessing it triggers I/O. Is that a realistic scenario? We assume that 
normally, BFs are permanently cached for all HFiles managed by the RS. 
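
To make the role of the Bloom filter concrete, here is a minimal Python sketch (not HBase code) of how a per-file filter lets a get skip files that cannot contain the key. The class `BloomFilter`, its sizes, and the helper `files_to_read` are hypothetical names chosen for illustration only; HBase's real Bloom filter blocks are stored inside the HFile and read through the block cache, which is exactly the access discussed above.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, key):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


def files_to_read(key, blooms_by_file):
    """Return the names of files whose Bloom filter does not rule out the key."""
    return [name for name, bloom in blooms_by_file.items()
            if bloom.might_contain(key)]
```

Each call to `might_contain` stands in for one Bloom block lookup. If that block is cached the check is cheap; if it is not, the check itself costs an I/O, which is the case where skipping the check by reading memory first would matter.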

Dear community - please speak up. Thanks. 



[jira] [Commented] (HBASE-17339) Scan-Memory-First Optimization for Get Operations

2017-06-01 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033707#comment-16033707
 ] 

Eshcar Hillel commented on HBASE-17339:
---

After some time away from this Jira, and some additional experiments and 
digging into the code, here is our current understanding:
HBase already implements an optimization which makes the current suggestion 
less critical. I will try to explain it in a nutshell.
As mentioned, a get operation is divided into two main steps:
(1) creating and filtering all HFile scanners and memory scanners,
(2) applying the next operation, which retrieves the result for the operation.
HBase defers the seek operation of the scanners as much as possible. In step 
(1) all scanners are combined in a key-value heap which is sorted by the top 
key of each scanner. However, if there is more than one scanner, the HFile 
scanners do not perform a real seek. Instead they set the current cell to a 
fake cell which simulates a seek to the key.  In cases where the key can be 
found both in memory and on disk, the memory segments have higher timestamps 
and reside at the top of the heap. Finally, in step (2) the store scanner 
gets the result from the scanners heap. It starts querying the scanners at the 
top. Only at this point, if an HFile scanner is polled from the heap and no real 
seek was done, does HBase seek the key in the file. This seek might find 
the blocks in the cache, or it retrieves them from disk.
In addition, in step (1) filtering HFile scanners requires reading HFile 
metadata and bloom filters -- in most cases these can be found in cache.
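
The two steps above can be sketched in a few lines of Python. This is only a toy model, not HBase code: the names `MemScanner`, `LazyFileScanner` and `get_latest` are hypothetical, it handles a single row with a single version, and it assumes that the fake cell carries the newest timestamp stored in the file.

```python
import heapq


class MemScanner:
    """In-memory scanner: its cell is available without any I/O."""

    def __init__(self, cells):
        self.cells = cells  # dict: row -> (timestamp, value)

    def seek(self, row):
        return self.cells.get(row)


class LazyFileScanner:
    """File scanner that postpones the real seek until it is polled."""

    def __init__(self, cells, max_timestamp):
        self.cells = cells  # dict: row -> (timestamp, value)
        self.max_timestamp = max_timestamp  # newest timestamp in the file
        self.real_seeks = 0

    def fake_cell(self, row):
        # Pretend the file holds the row at its newest possible timestamp.
        return (self.max_timestamp, None)

    def real_seek(self, row):
        self.real_seeks += 1  # stands in for a block cache lookup or disk read
        return self.cells.get(row)


def get_latest(row, mem_scanners, file_scanners):
    """Return the newest value for row, seeking files only when needed."""
    heap = []  # entries: (-timestamp, tie_breaker, is_fake, scanner, value)
    counter = 0
    # Step (1): build the heap; file scanners contribute fake cells only.
    for scanner in mem_scanners:
        cell = scanner.seek(row)
        if cell is not None:
            heapq.heappush(heap, (-cell[0], counter, False, scanner, cell[1]))
            counter += 1
    for scanner in file_scanners:
        timestamp, _ = scanner.fake_cell(row)
        heapq.heappush(heap, (-timestamp, counter, True, scanner, None))
        counter += 1
    # Step (2): poll from the top; a fake cell triggers the real seek.
    while heap:
        _, _, is_fake, scanner, value = heapq.heappop(heap)
        if not is_fake:
            return value
        cell = scanner.real_seek(row)
        if cell is not None:
            heapq.heappush(heap, (-cell[0], counter, False, scanner, cell[1]))
            counter += 1
    return None
```

When the memory cell is newer than anything the file can hold, the loop returns before the file scanner is ever polled, so `real_seeks` stays at zero. That is the effect described above: the data fetch is deferred, but the scanner creation and filtering work of step (1) is still paid.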

The optimization implemented in this Jira takes a different approach by trying 
to look only in memory segments as a first step. When the data is found in memory 
this indeed reduces latency, since it avoids the need to read HFile metadata and 
bloom filters and to manage a bigger scanners heap, but when the data is only on 
disk it incurs the overhead of scanning the data twice (memory only and then 
full scan).
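
As a rough illustration of this trade-off, here is a Python sketch of a speculative memory-first get. It is not the code in the attached patches: `get_memory_first` is a hypothetical name, stores are modelled as plain dicts, `hfiles` is assumed to be ordered oldest to newest, and "complete" is assumed to mean that every requested column was found in memory.

```python
def get_memory_first(row, wanted_columns, memstore, hfiles):
    """Return (result, full_scan_used) for a speculative memory-first get."""
    # Pass 1: memory only.
    result = {c: v for c, v in memstore.get(row, {}).items()
              if c in wanted_columns}
    if set(result) == set(wanted_columns):
        return result, False
    # Pass 2: the result is incomplete, so scan memory and disk together.
    # Sources are visited oldest first so newer values overwrite older ones.
    result = {}
    for source in list(hfiles) + [memstore]:
        for c, v in source.get(row, {}).items():
            if c in wanted_columns:
                result[c] = v
    return result, True
```

The second return value makes the cost visible: when it is `True`, the memstore has been read twice, which is the overhead mentioned above for data that lives only on disk.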

The question is, given this understanding is there a point in having the new 
optimization, or are we satisfied with the current one?
Is there a known scenario where not all bloom filters and metadata blocks are 
found in the cache?



[jira] [Commented] (HBASE-17339) Scan-Memory-First Optimization for Get Operations

2017-03-27 Thread Ben Manes (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943573#comment-15943573
 ] 

Ben Manes commented on HBASE-17339:
---

I think it's really difficult to tell, but I'd guess that there might be a small 
gain.

Those 30M misses sound compulsory, meaning that they would occur regardless of 
the cache size. Therefore we'd expect an unbounded cache to have an 87% hit rate 
at 400M accesses or 90% at 300M. If you're observing 80%, then at best there is 
a 10% boost. If Bélády's optimal is lower, then there is even less of a 
difference to boost by. It could be that SLRU captures frequency well enough 
that both policies are equivalent.

The [MultiQueue 
paper|https://www.usenix.org/legacy/event/usenix01/full_papers/zhou/zhou.pdf] 
argues that 2nd level cache access patterns are frequency skewed. The 
LruBlockCache only tracks whether there were multiple accesses, not the counts, 
and tries to evict fairly across the buckets. Since TinyLFU captures a longer 
tail (the frequency of items outside of the cache), there is a chance that it 
can make a better prediction. But we wouldn't know without an access trace to 
simulate with.
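
For readers unfamiliar with the idea, the following Python sketch shows frequency-based admission in the spirit of TinyLFU. It is a simplified illustration, not the code from the TinyLFU patch: `CountMinSketch` and `admit` are hypothetical names, and counter aging and the other refinements of the real policy are left out.

```python
import hashlib


class CountMinSketch:
    """Approximate frequency counter; estimates never undercount."""

    def __init__(self, width=256, depth=4):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row, key):
        digest = hashlib.sha256(f"{row}:{key}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def increment(self, key):
        for r in range(self.depth):
            self.rows[r][self._index(r, key)] += 1

    def estimate(self, key):
        return min(self.rows[r][self._index(r, key)]
                   for r in range(self.depth))


def admit(sketch, candidate, victim):
    """Admit the candidate only if it looks more frequent than the victim."""
    return sketch.estimate(candidate) > sketch.estimate(victim)
```

The sketch is incremented on every access, whether or not the block is currently cached. That is what lets the policy remember the frequency of items outside of the cache, something a plain LRU with single-access and multi-access buckets cannot do.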

I suspect that the high hit rate means there isn't much cache pollution to 
lower the hit rate, so a good enough victim is chosen. At the tail most of the 
entries have a relatively similar frequency, too. It would be fun to find out, 
but you probably won't think it was worth the effort.



[jira] [Commented] (HBASE-17339) Scan-Memory-First Optimization for Get Operations

2017-03-27 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943134#comment-15943134
 ] 

Edward Bortnikov commented on HBASE-17339:
--

Can't see how TinyLFU can do a better job with stationary distributions (in 
which item popularity does not change over time). I'd imagine it being good 
under bursty workloads. 



[jira] [Commented] (HBASE-17339) Scan-Memory-First Optimization for Get Operations

2017-03-27 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943102#comment-15943102
 ] 

Eshcar Hillel commented on HBASE-17339:
---

Yes, sure I can run with your patch [~ben.manes] :)
I just wonder if you have any insight on whether or not TinyLFU *can* help 
before testing it.
So far memstore was considered the write cache and block cache the read cache. 
The optimization makes memstore a first-tier read cache and block cache a 
second-tier read cache. So with a zipfian distribution the head of the 
distribution is found in memstore and the tail is searched in the block cache. 
With the current LRU cache we see the same number of evictions from the cache 
with and without the optimization.
Do you think TinyLFU can do a better job in managing the blocks with smarter 
admission-eviction so the hit rate is increased? Or, since this is dealing with 
the "torso" and not the head of the distribution, can it not do a better job?
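
To get a feel for how much traffic the head of a zipfian distribution absorbs, here is a small Python calculation. It is only an illustration under stated assumptions: `zipf_head_share` is a hypothetical helper, the exponent of 1.0 is a default chosen here and not a parameter of the experiment, and it assumes the first tier happens to hold exactly the most popular keys.

```python
def zipf_head_share(num_keys, head_size, exponent=1.0):
    """Fraction of accesses that target the head_size most popular keys."""
    weights = [1.0 / (rank ** exponent) for rank in range(1, num_keys + 1)]
    return sum(weights[:head_size]) / sum(weights)
```

Whatever share the head takes, the second tier only sees the remaining accesses, whose popularity is much flatter. This is the "torso" the question above refers to.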



[jira] [Commented] (HBASE-17339) Scan-Memory-First Optimization for Get Operations

2017-03-26 Thread Ben Manes (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15942296#comment-15942296
 ] 

Ben Manes commented on HBASE-17339:
---

Can you compare runs with the TinyLFU patch to observe the impact? 



[jira] [Commented] (HBASE-17339) Scan-Memory-First Optimization for Get Operations

2017-03-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15942268#comment-15942268
 ] 

Hadoop QA commented on HBASE-17339:
---

+1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 17s | Docker mode activated. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 3 new or modified test files. |
| 0 | mvndep | 0m 23s | Maven dependency ordering for branch |
| +1 | mvninstall | 3m 17s | master passed |
| +1 | compile | 1m 14s | master passed |
| +1 | checkstyle | 1m 10s | master passed |
| +1 | mvneclipse | 0m 41s | master passed |
| +1 | findbugs | 2m 54s | master passed |
| +1 | javadoc | 0m 53s | master passed |
| 0 | mvndep | 0m 12s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 24s | the patch passed |
| +1 | compile | 1m 18s | the patch passed |
| +1 | javac | 1m 18s | the patch passed |
| +1 | checkstyle | 1m 13s | the patch passed |
| +1 | mvneclipse | 0m 43s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | hadoopcheck | 28m 37s | Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha2. |
| +1 | findbugs | 3m 7s | the patch passed |
| +1 | javadoc | 0m 56s | the patch passed |
| +1 | unit | 2m 21s | hbase-client in the patch passed. |
| +1 | unit | 103m 24s | hbase-server in the patch passed. |
| +1 | unit | 0m 25s | hbase-it in the patch passed. |
| +1 | asflicense | 0m 43s | The patch does not generate ASF License warnings. |
| | | 155m 55s | |

|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:8d52d23 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12860563/HBASE-17339-V06.patch |
| JIRA Issue | HBASE-17339 |
| Optional Tests | asflicense javac javadoc unit findbugs hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 83eadd35175c 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 4a076cd |
| Default Java | 1.8.0_121 |
| findbugs | v3.0.0 |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/6223/testReport/ |
| modules | C: hbase-client hbase-server hbase-it U: . |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/6223/console |

[jira] [Commented] (HBASE-17339) Scan-Memory-First Optimization for Get Operations

2017-03-26 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15942225#comment-15942225
 ] 

Eshcar Hillel commented on HBASE-17339:
---

I am attaching results of an experiment with a mixed workload, and also the most 
up-to-date patch in case anyone else wants to run their own experiments.
For the lower percentiles the optimization gains 8-9% in read latency; for high 
percentiles it ranges between -5% and +5%. 
The experiment ran 100M get operations. With no optimization this translates 
into 100M (full) scans and ~400M cache accesses, of which ~30M are misses.
With the optimization we have only 62M (full) scans (the rest scan only the 
memory for results), and only ~300M cache accesses, but the same number of 
misses, ~30M. 
In another experiment I ran, I saw the hit ratio dropping from 90% with no 
optimization to 80% with the optimization.
If we can reduce the number of misses we can reduce the read latency also in 
the high percentiles.
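
The arithmetic behind these counts is easy to check. The short Python snippet below uses only the approximate figures quoted above; `hit_ratio` is a helper name introduced here for illustration.

```python
def hit_ratio(accesses, misses):
    """Fraction of cache accesses that were served from the cache."""
    return (accesses - misses) / accesses


# Approximate counts quoted above for the 100M-get experiment.
baseline = hit_ratio(400_000_000, 30_000_000)
optimized = hit_ratio(300_000_000, 30_000_000)

# Share of gets answered from memory alone with the optimization.
memory_only_share = (100_000_000 - 62_000_000) / 100_000_000
```

With these inputs the hit ratio goes from about 92.5% to about 90%, and about 38% of the gets never reach the block cache. The ratio drops even though the number of misses is unchanged, because the accesses that were removed were almost all hits.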

Can we have a different caching policy that reduces misses when reading less 
from the cache? Perhaps TinyLFU (HBASE-15560) can help here [~ben.manes]?



[jira] [Commented] (HBASE-17339) Scan-Memory-First Optimization for Get Operations

2017-02-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15859787#comment-15859787
 ] 

Hadoop QA commented on HBASE-17339:
---

-1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 14s | Docker mode activated. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
| 0 | mvndep | 0m 30s | Maven dependency ordering for branch |
| +1 | mvninstall | 3m 44s | master passed |
| +1 | compile | 0m 57s | master passed |
| +1 | checkstyle | 0m 48s | master passed |
| +1 | mvneclipse | 0m 26s | master passed |
| +1 | findbugs | 2m 52s | master passed |
| +1 | javadoc | 0m 45s | master passed |
| 0 | mvndep | 0m 11s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 4s | the patch passed |
| +1 | compile | 0m 59s | the patch passed |
| +1 | javac | 0m 59s | the patch passed |
| +1 | checkstyle | 0m 47s | the patch passed |
| +1 | mvneclipse | 0m 24s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| -1 | hadoopcheck | 2m 9s | The patch causes 17 errors with Hadoop v2.6.1. |
| -1 | hadoopcheck | 4m 0s | The patch causes 17 errors with Hadoop v2.6.2. |
| -1 | hadoopcheck | 5m 52s | The patch causes 17 errors with Hadoop v2.6.3. |
| -1 | hadoopcheck | 7m 49s | The patch causes 17 errors with Hadoop v2.6.4. |
| -1 | hadoopcheck | 9m 40s | The patch causes 17 errors with Hadoop v2.6.5. |
| -1 | hadoopcheck | 11m 31s | The patch causes 17 errors with Hadoop v2.7.1. |
| -1 | hadoopcheck | 13m 17s | The patch causes 17 errors with Hadoop v2.7.2. |
| -1 | hadoopcheck | 15m 2s | The patch causes 17 errors with Hadoop v2.7.3. |
| -1 | hadoopcheck | 16m 46s | The patch causes 17 errors with Hadoop v3.0.0-alpha2. |
| +1 | findbugs | 2m 51s | the patch passed |
| +1 | javadoc | 0m 44s | the patch passed |
| +1 | unit | 2m 19s | hbase-client in the patch passed. |
| +1 | unit | 96m 34s | hbase-server in the patch passed. |
| +1 | asflicense | 0m 30s | The patch does not generate ASF License warnings. |

[jira] [Commented] (HBASE-17339) Scan-Memory-First Optimization for Get Operations

2017-02-08 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857939#comment-15857939
 ] 

Eshcar Hillel commented on HBASE-17339:
---

Currently the optimization is only on in TestAcidGuarantees but I will look 
into all failed tests.




[jira] [Commented] (HBASE-17339) Scan-Memory-First Optimization for Get Operations

2017-02-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857928#comment-15857928
 ] 

Hadoop QA commented on HBASE-17339:
---

-1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 18s | Docker mode activated. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
| 0 | mvndep | 0m 10s | Maven dependency ordering for branch |
| +1 | mvninstall | 3m 8s | master passed |
| +1 | compile | 0m 58s | master passed |
| +1 | checkstyle | 0m 45s | master passed |
| +1 | mvneclipse | 0m 24s | master passed |
| +1 | findbugs | 2m 36s | master passed |
| +1 | javadoc | 0m 45s | master passed |
| 0 | mvndep | 0m 11s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 6s | the patch passed |
| +1 | compile | 0m 58s | the patch passed |
| +1 | javac | 0m 58s | the patch passed |
| +1 | checkstyle | 0m 48s | the patch passed |
| +1 | mvneclipse | 0m 26s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| -1 | hadoopcheck | 1m 53s | The patch causes 17 errors with Hadoop v2.6.1. |
| -1 | hadoopcheck | 3m 45s | The patch causes 17 errors with Hadoop v2.6.2. |
| -1 | hadoopcheck | 5m 33s | The patch causes 17 errors with Hadoop v2.6.3. |
| -1 | hadoopcheck | 7m 20s | The patch causes 17 errors with Hadoop v2.6.4. |
| -1 | hadoopcheck | 9m 12s | The patch causes 17 errors with Hadoop v2.6.5. |
| -1 | hadoopcheck | 10m 57s | The patch causes 17 errors with Hadoop v2.7.1. |
| -1 | hadoopcheck | 12m 42s | The patch causes 17 errors with Hadoop v2.7.2. |
| -1 | hadoopcheck | 14m 28s | The patch causes 17 errors with Hadoop v2.7.3. |
| -1 | hadoopcheck | 16m 15s | The patch causes 17 errors with Hadoop v3.0.0-alpha2. |
| -1 | findbugs | 1m 55s | hbase-server generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) |
| +1 | javadoc | 0m 43s | the patch passed |
| +1 | unit | 2m 15s | hbase-client in the patch passed. |
| -1 | unit | 82m 6s | hbase-server in the patch failed. |
| +1 | asflicense | 0m 32s | The patch does not 

[jira] [Commented] (HBASE-17339) Scan-Memory-First Optimization for Get Operations

2017-02-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857712#comment-15857712
 ] 

Hadoop QA commented on HBASE-17339:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 2 new or modified test files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s {color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 39s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 48s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 40s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 23s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 18s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 39s {color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s {color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 53s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 47s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 47s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 41s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 23s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 1m 31s {color} | {color:red} The patch causes 17 errors with Hadoop v2.6.1. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 3m 3s {color} | {color:red} The patch causes 17 errors with Hadoop v2.6.2. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 4m 36s {color} | {color:red} The patch causes 17 errors with Hadoop v2.6.3. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 6m 7s {color} | {color:red} The patch causes 17 errors with Hadoop v2.6.4. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 7m 39s {color} | {color:red} The patch causes 17 errors with Hadoop v2.6.5. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 9m 10s {color} | {color:red} The patch causes 17 errors with Hadoop v2.7.1. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 10m 44s {color} | {color:red} The patch causes 17 errors with Hadoop v2.7.2. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 12m 17s {color} | {color:red} The patch causes 17 errors with Hadoop v2.7.3. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 13m 51s {color} | {color:red} The patch causes 17 errors with Hadoop v3.0.0-alpha2. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 45s {color} | {color:red} hbase-server generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 42s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 20s {color} | {color:green} hbase-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 17m 39s {color} | {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 15s {color} | {color:green} The patch does not 

[jira] [Commented] (HBASE-17339) Scan-Memory-First Optimization for Get Operations

2017-02-08 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857616#comment-15857616
 ] 

Eshcar Hillel commented on HBASE-17339:
---

A new patch is available.
Summary of changes:
* (TODO 1 above) init maxFlushedTimestamp from the store file timestamps (if any)
* (TODO 2) memoryScanOptimization is now a table property
* added tests through TestAcidGuarantees

> Scan-Memory-First Optimization for Get Operations
> -
>
> Key: HBASE-17339
> URL: https://issues.apache.org/jira/browse/HBASE-17339
> Project: HBase
>  Issue Type: Improvement
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Attachments: HBASE-17339-V01.patch, HBASE-17339-V02.patch, 
> HBASE-17339-V03.patch, HBASE-17339-V03.patch
>


[jira] [Commented] (HBASE-17339) Scan-Memory-First Optimization for Get Operations

2017-01-26 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15839463#comment-15839463
 ] 

Eshcar Hillel commented on HBASE-17339:
---

The patch is also available on Review Board.

> Scan-Memory-First Optimization for Get Operations
> -
>
> Key: HBASE-17339
> URL: https://issues.apache.org/jira/browse/HBASE-17339
> Project: HBase
>  Issue Type: Improvement
>Reporter: Eshcar Hillel
> Attachments: HBASE-17339-V01.patch, HBASE-17339-V02.patch
>


[jira] [Commented] (HBASE-17339) Scan-Memory-First Optimization for Get Operations

2017-01-26 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15839458#comment-15839458
 ] 

Eshcar Hillel commented on HBASE-17339:
---

The attached patch is incomplete and has not been properly tested, so it may 
have some bugs (but it compiles :) ).
I'm posting it to get feedback on the core logic.
The main property needed for this optimization is monotonicity. A store 
preserves *monotonicity* if all timestamps in its memstore are strictly greater 
than all timestamps in its store files.
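
To make the monotonicity property concrete, here is a minimal sketch in Java. It is not taken from the patch: the class name, the method name and the representation of timestamps as plain lists of longs are all assumptions made for illustration.

```java
import java.util.List;

public class MonotonicityCheck {
    /**
     * Hypothetical helper: returns true if every memstore timestamp is
     * strictly greater than every store file timestamp.
     */
    public static boolean isMonotonic(List<Long> memstoreTimestamps,
                                      List<Long> storeFileTimestamps) {
        // With nothing in memory or nothing on disk the property holds trivially.
        if (memstoreTimestamps.isEmpty() || storeFileTimestamps.isEmpty()) {
            return true;
        }
        long maxFlushed = Long.MIN_VALUE;
        for (long ts : storeFileTimestamps) {
            maxFlushed = Math.max(maxFlushed, ts);
        }
        long minInMemory = Long.MAX_VALUE;
        for (long ts : memstoreTimestamps) {
            minInMemory = Math.min(minInMemory, ts);
        }
        // Strict inequality: an equal timestamp already breaks monotonicity.
        return minInMemory > maxFlushed;
    }
}
```

Comparing the smallest in-memory timestamp with the largest flushed one is enough, which is why tracking a single max flushed timestamp per store suffices.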

The algorithm is as follows:
{code}
0. decide whether to apply the optimization: (1) the flag is on, and
   (2) the get operation is over a specific set of columns
if decided to apply the optimization then
  1. open all relevant *memory* scanners;
     while opening scanners, collect the max flushed timestamps in all stores (first collect);
     a null timestamp indicates the store does not maintain monotonicity
  2. if all stores are monotonic then
     2.1 get results
     2.2 validate monotonicity: validate that the max flushed timestamps have not changed in any store
         (double-collect ensures results are taken from a consistent view)
if decided not to apply the optimization
   *OR* stores are not monotonic
   *OR* decided to apply the optimization but the results do not satisfy the get operation
        (not enough versions per column)
then
  3. open all scanners
  4. get results
{code}
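
The steps above can be sketched in plain Java as follows. This is only an illustration of the control flow, not the code in the patch: the Store class, its fields and the helper names are hypothetical, each column holds a single version, and "results satisfy the get" is simplified to "every requested column was found".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class MemoryFirstGet {
    /** Hypothetical stand-in for a store: one in-memory map, one on-disk map. */
    public static class Store {
        public final Map<String, String> memory = new HashMap<>();
        public final Map<String, String> disk = new HashMap<>();
        /** null means the store does not maintain monotonicity. */
        public volatile Long maxFlushedTimestamp = 0L;
    }

    /** Collects the max flushed timestamp of every store. */
    static List<Long> collect(List<Store> stores) {
        List<Long> timestamps = new ArrayList<>();
        for (Store s : stores) {
            timestamps.add(s.maxFlushedTimestamp);
        }
        return timestamps;
    }

    /** Reads the requested columns; memory wins over disk when both have a value. */
    static Map<String, String> scan(List<Store> stores, Set<String> columns, boolean includeDisk) {
        Map<String, String> result = new HashMap<>();
        for (Store s : stores) {
            for (String column : columns) {
                String value = s.memory.get(column);
                if (value == null && includeDisk) {
                    value = s.disk.get(column);
                }
                if (value != null) {
                    result.put(column, value);
                }
            }
        }
        return result;
    }

    public static Map<String, String> get(List<Store> stores, Set<String> columns,
                                          boolean optimizationEnabled) {
        // Step 0: the flag is on and the get names a specific set of columns.
        if (optimizationEnabled && !columns.isEmpty()) {
            // Step 1: first collect of the max flushed timestamps.
            List<Long> first = collect(stores);
            // Step 2: proceed only if every store is monotonic (no null timestamp).
            if (!first.contains(null)) {
                Map<String, String> result = scan(stores, columns, false); // 2.1
                boolean unchanged = first.equals(collect(stores));         // 2.2
                if (unchanged && result.keySet().containsAll(columns)) {
                    return result;
                }
            }
        }
        // Steps 3-4: fall back to scanning both memory and disk.
        return scan(stores, columns, true);
    }
}
```

In this sketch a failed validation in step 2.2 is treated like an incomplete result and falls through to the full scan; that handling is an assumption, since the steps above do not spell it out.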

Missing parts (TODOs):
- properly init maxFlushedTimestamp (in AbstractMemStore) when recovering; this 
requires traversing all existing store files
- make memoryScanOptimization a table property instead of a global property; 
set to true by default
- (Optional) add a flag in the Get operation which indicates whether the user 
wants to apply the optimization (per operation!); set to true by default
- (Optional) check if we can change the implementation of getScanners in 
XXXMemstore to return multiple scanners, so that we can later filter out each 
one of them individually instead of either keeping all or eliminating all. 
Currently the implementation (both in default and compacting) returns a 
singleton list with one MemStoreScanner which comprises one or a few segment 
scanners.
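
The first TODO (initializing maxFlushedTimestamp on recovery) might look roughly like the sketch below. It assumes that each existing store file can report its own maximum timestamp as a long; the class and method names are hypothetical and are not part of the patch.

```java
import java.util.List;
import java.util.OptionalLong;

public class MaxFlushedTimestampInit {
    /**
     * Hypothetical helper: derives the max flushed timestamp from the maximum
     * timestamps reported by the existing store files. The result is empty
     * when the store has no files yet.
     */
    public static OptionalLong fromStoreFiles(List<Long> storeFileMaxTimestamps) {
        return storeFileMaxTimestamps.stream()
                .mapToLong(Long::longValue)
                .max();
    }
}
```

Returning an OptionalLong leaves the "no store files" case explicit, so the caller decides which initial value to use for a store that has never been flushed.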


> Scan-Memory-First Optimization for Get Operations
> -
>
> Key: HBASE-17339
> URL: https://issues.apache.org/jira/browse/HBASE-17339
> Project: HBase
>  Issue Type: Improvement
>Reporter: Eshcar Hillel
> Attachments: HBASE-17339-V01.patch, HBASE-17339-V02.patch
>