[jira] [Commented] (HBASE-17339) Scan-Memory-First Optimization for Get Operations
[ https://issues.apache.org/jira/browse/HBASE-17339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16355000#comment-16355000 ] Anoop Sam John commented on HBASE-17339:
The BF blocks need NOT always be in memory; it depends on the cache size and the access pattern. By default, if we have the on-heap LRU cache alone, all index, bloom, and data blocks go there, and a miss on any type of block is possible. When one uses the BucketCache (we called it L2, but it is not really L2 any more), the data blocks will always be in the BucketCache and the on-heap cache will keep the index and bloom blocks. Those blocks will more likely always be in the cache, but this cache is also size-limited, so misses are possible. This issue is closed now; we did not see much of a perf boost. This was/is an interesting issue.

> Scan-Memory-First Optimization for Get Operations
> -------------------------------------------------
>
>                 Key: HBASE-17339
>                 URL: https://issues.apache.org/jira/browse/HBASE-17339
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Eshcar Hillel
>            Assignee: Eshcar Hillel
>            Priority: Major
>         Attachments: HBASE-17339-V01.patch, HBASE-17339-V02.patch, HBASE-17339-V03.patch, HBASE-17339-V03.patch, HBASE-17339-V04.patch, HBASE-17339-V05.patch, HBASE-17339-V06.patch, read-latency-mixed-workload.jpg
>
> The current implementation of a get operation (to retrieve values for a specific key) scans through all relevant stores of the region; for each store, both memory components (memstore segments) and disk components (HFiles) are scanned in parallel.
> We suggest applying an optimization that speculatively scans memory-only components first, and only if the result is incomplete scans both memory and disk.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
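[Editor's note] The cache split Anoop describes above (data blocks off-heap in the BucketCache, index and bloom blocks in the on-heap LruBlockCache) is what HBase's CombinedBlockCache gives you. A minimal hbase-site.xml sketch; the property names come from the HBase reference guide, but the sizes are purely illustrative, not recommendations:

```xml
<!-- Enable an off-heap BucketCache. With ioengine set, HBase uses a
     CombinedBlockCache: data blocks go to the BucketCache, while
     index and bloom blocks stay in the on-heap LruBlockCache. -->
<property>
  <name>hbase.bucketcache.ioengine</name>
  <value>offheap</value>
</property>
<property>
  <name>hbase.bucketcache.size</name>
  <value>4096</value> <!-- off-heap cache size in MB; illustrative -->
</property>
<property>
  <name>hfile.block.cache.size</name>
  <value>0.4</value> <!-- fraction of heap for the on-heap LRU cache -->
</property>
```

Both caches are size-limited, so (as the comment above notes) bloom and index block misses remain possible even with this layout.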
[ https://issues.apache.org/jira/browse/HBASE-17339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16036219#comment-16036219 ] Edward Bortnikov commented on HBASE-17339:
Thanks [~eshcar]. Maybe it makes sense to describe the experiment we used to figure out the current implementation, to give the community the full picture (smile). We looked at a workload with temporal (rather than spatial) locality, namely writes closely followed by reads; this pattern is quite frequent in pub-sub scenarios. Instead of seeing a performance benefit from reading the MemStore first, we saw a nearly 100% cache hit rate, and could not explain it for a while. The lazy-evaluation procedure described by [~eshcar] sheds light on it.
Obviously, explicitly prioritizing reads from the MemStore, rather than simply deferring the data fetch from disk, could avoid some accesses to Bloom filters made just to figure out whether the key has earlier versions on disk. The main practical impact is when the BF itself is not in memory, so accessing it triggers I/O. Is that a realistic scenario? We assume that normally BFs are permanently cached for all HFiles managed by the RS. Dear community - please speak up. Thanks.
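[Editor's note] The speculative flow proposed in the issue description can be sketched as follows. This is a toy model, not HBase code: plain dicts stand in for memstore segments and HFiles, and "the result is incomplete" is collapsed to "not found at all".

```python
def memory_first_get(key, memstore, hfiles, stats):
    """Toy model of the proposed get path: try the in-memory segments
    first, and fall back to a combined memory+disk scan only when the
    speculative pass comes back incomplete (here: not found at all)."""
    stats["memory_only_scans"] += 1
    if key in memstore:                  # result complete from memory alone
        return memstore[key]
    stats["full_scans"] += 1             # rescan memory and disk together
    for hfile in hfiles:                 # in HBase each probe would touch
        if key in hfile:                 # bloom filters / cached blocks
            return hfile[key]
    return None
```

A recently written key is served without ever touching the HFiles, while a disk-only key pays for two passes; that is exactly the trade-off discussed in this thread.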
[ https://issues.apache.org/jira/browse/HBASE-17339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033707#comment-16033707 ] Eshcar Hillel commented on HBASE-17339:
After some time away from this Jira, and some additional experiments and digging into the code, here is our current understanding: HBase already implements an optimization that makes the current suggestion less critical. I will try to explain it in a nutshell.
As mentioned, a get operation is divided into two main steps: (1) creating and filtering all HFile scanners and memory scanners, and (2) applying the next operation, which retrieves the result of the operation. HBase defers the seek operation of the scanners as much as possible. In step (1) all scanners are combined in a key-value heap sorted by the top key of each scanner. However, if there is more than one scanner, the HFile scanners do not perform a real seek; instead, they set their current cell to a fake cell that simulates a completed seek to the key. In cases where the key can be found both in memory and on disk, the memory segments have higher timestamps, so they reside at the top of the heap. Finally, in step (2) the store scanner pulls the result from the scanners heap, querying the scanners at the top first. Only at this point, if an HFile scanner is polled from the heap and no real seek was done, does HBase seek the key in the file; this seek might find the blocks in the cache, or it retrieves them from disk. In addition, in step (1) filtering the HFile scanners requires reading HFile metadata and bloom filters - in most cases these can be found in the cache.
The optimization implemented in this Jira takes a different approach, trying to look only at the memory segments as a first step. When the data is found in memory this indeed reduces latency, since it avoids reading HFile metadata and bloom filters and managing a bigger scanners heap; but when the data is only on disk it incurs the overhead of scanning the data twice (memory only, then a full scan). The question is, given this understanding, is there a point in having the new optimization, or are we satisfied with the current one? Is there a known scenario where not all bloom filters and metadata blocks are found in the cache?
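[Editor's note] The lazy-seek behavior described in the comment above can be caricatured like this. It is a deliberately simplified sketch: the real code involves StoreScanner, KeyValueHeap, and fake cells, none of which are modeled faithfully here, and the step-(1) bloom/metadata reads that the Jira's optimization would skip are not modeled at all.

```python
class LazyHFileScanner:
    """Defers the expensive seek: until polled, the scanner merely
    pretends (via a fake top cell) that it has seeked to the target."""
    def __init__(self, data):
        self.data = data          # dict stands in for an HFile
        self.real_seeks = 0       # each real seek may touch disk

    def real_seek(self, key):
        self.real_seeks += 1      # here HBase would read index/data blocks
        return self.data.get(key)

def store_get(key, memstore, hfile_scanners):
    # Step (1): all scanners enter the heap; HFile scanners only fake-seek.
    # Memory segments carry newer timestamps, so they sort to the top.
    if key in memstore:
        return memstore[key]      # satisfied before any real seek happens
    # Step (2): only when an HFile scanner reaches the top of the heap
    # is the real (potentially disk-touching) seek performed.
    for scanner in hfile_scanners:
        value = scanner.real_seek(key)
        if value is not None:
            return value
    return None
```

The point of the sketch: when the key is in the memstore, no HFile scanner is ever really seeked, which is why the existing lazy-seek mechanism already captures much of the benefit the new optimization targets.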
[ https://issues.apache.org/jira/browse/HBASE-17339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943573#comment-15943573 ] Ben Manes commented on HBASE-17339:
I think it's really difficult to tell, but I'd guess that there might be a small gain. Those 30M misses sound compulsory, meaning that they would occur regardless of the cache size. Therefore we'd expect an unbounded cache to have an 87% hit rate at 400M accesses, or 90% at 300M. If you're observing 80%, then at best there is a 10% boost; if Bélády's optimal is lower, then there is even less of a difference to boost by. It could be that SLRU captures frequency well enough that both policies are equivalent. The [MultiQueue paper|https://www.usenix.org/legacy/event/usenix01/full_papers/zhou/zhou.pdf] argues that 2nd-level cache access patterns are frequency-skewed. The LruBlockCache only retains whether there were multiple accesses, not the counts, and tries to evict fairly across the buckets. Since TinyLFU captures a longer tail (frequencies of items outside the cache), there is a chance that it can make a better prediction, but we wouldn't know without an access trace to simulate with. I suspect that the high hit rate means there isn't much cache pollution to lower the hit rate, so a good-enough victim is chosen. At the tail, most of the entries have a relatively similar frequency, too. It would be fun to find out, but you probably won't think it was worth the effort.
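[Editor's note] The compulsory-miss argument is simple arithmetic: if ~30M misses are unavoidable first-touch misses, even an unbounded cache cannot do better than 1 - misses/accesses. A quick check of the 300M-access case (numbers in millions):

```python
def hit_rate_ceiling(accesses, compulsory_misses):
    """Best achievable hit rate when `compulsory_misses` of the
    accesses would miss in any cache, regardless of its size."""
    return 1 - compulsory_misses / accesses

# 30M compulsory misses out of 300M accesses -> at best a 90% hit rate,
# so an observed 80% leaves at most ~10 percentage points of headroom
# for a smarter eviction/admission policy.
```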
[ https://issues.apache.org/jira/browse/HBASE-17339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943134#comment-15943134 ] Edward Bortnikov commented on HBASE-17339:
Can't see how TinyLFU can do a better job with stationary distributions (in which item popularity does not change over time). I'd imagine it being good under bursty workloads.
[ https://issues.apache.org/jira/browse/HBASE-17339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943102#comment-15943102 ] Eshcar Hillel commented on HBASE-17339:
Yes, sure, I can run with your patch [~ben.manes] :) I just wonder if you have any insight on whether or not TinyLFU *can* help before testing it. So far the memstore was considered the write cache and the block cache the read cache. The optimization makes the memstore a first-tier read cache and the block cache a second-tier read cache, so with a zipfian distribution the head of the distribution is found in the memstore and the tail is searched in the block cache. With the current LRU cache we see the same number of evictions from the cache with and without the optimization. Do you think TinyLFU can do a better job of managing the blocks, with smarter admission/eviction, so that the hit rate increases? Or, since this deals with the "torso" and not the head of the distribution, can it not do a better job?
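[Editor's note] For intuition on the admission question raised above, here is a toy frequency-based admission filter in the spirit of TinyLFU. This is not the HBASE-15560 / Caffeine implementation: the real W-TinyLFU uses a CountMin sketch with aging, a doorkeeper, and an admission window, whereas this toy keeps exact counters.

```python
from collections import Counter

class AdmissionCache:
    """Toy LFU-admission cache: a new block may evict the coldest
    resident block only if it has been seen more often. Frequencies
    of items *outside* the cache are remembered too, which is what
    lets one-off "torso" traffic be filtered out instead of
    polluting the cache."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = {}
        self.freq = Counter()    # exact counters; TinyLFU approximates

    def get(self, key):
        self.freq[key] += 1
        return self.cache.get(key)

    def put(self, key, value):
        self.freq[key] += 1
        if key in self.cache or len(self.cache) < self.capacity:
            self.cache[key] = value
            return
        victim = min(self.cache, key=lambda k: self.freq[k])
        if self.freq[key] > self.freq[victim]:   # admit only if hotter
            del self.cache[victim]
            self.cache[key] = value              # otherwise reject newcomer
```

Under a plain LRU, the cold newcomer would always be admitted and would evict the hot block; the admission check is the difference being debated in this thread.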
[ https://issues.apache.org/jira/browse/HBASE-17339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15942296#comment-15942296 ] Ben Manes commented on HBASE-17339:
Can you compare runs with the TinyLFU patch to observe the impact?
[ https://issues.apache.org/jira/browse/HBASE-17339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15942268#comment-15942268 ] Hadoop QA commented on HBASE-17339:
+1 overall. Docker mode activated; the patch appears to include 3 new or modified test files and has no anti-patterns, @author tags, or whitespace issues.
All checks passed on both master and the patch: mvninstall, compile, javac, checkstyle, mvneclipse, findbugs, javadoc, and hadoopcheck (no errors with Hadoop 2.6.1, 2.6.2, 2.6.3, 2.6.4, 2.6.5, 2.7.1, 2.7.2, 2.7.3, or 3.0.0-alpha2).
Unit tests passed: hbase-client (2m 21s), hbase-server (103m 24s), hbase-it (0m 25s). The patch does not generate ASF License warnings. Total runtime: 155m 55s.
JIRA Patch URL: https://issues.apache.org/jira/secure/attachment/12860563/HBASE-17339-V06.patch
Environment: git revision master / 4a076cd; Java 1.8.0_121; findbugs v3.0.0; build tool maven; Docker Client=1.12.3 Server=1.12.3, image yetus/hbase:8d52d23.
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/6223/testReport/
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6223/console
[ https://issues.apache.org/jira/browse/HBASE-17339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15942225#comment-15942225 ] Eshcar Hillel commented on HBASE-17339:
I am attaching results of an experiment with a mixed workload, and also the most recent patch, if anyone else wants to run their own experiments. For the lower percentiles the optimization gains 8-9% in read latency; for the high percentiles it ranges between -5% and +5%. The experiment ran 100M get operations. With no optimization this translates into 100M (full) scans and ~400M cache accesses, of which ~30M are misses. With the optimization we have only 62M (full) scans (the rest scan only the memory for results) and only ~300M cache accesses, but the same number of misses, ~30M. In another experiment I saw the hit ratio drop from 90% with no optimization to 80% with the optimization. If we can reduce the number of misses, we can reduce the read latency in the high percentiles as well. Can we have a different caching policy that reduces misses when reading less from the cache? Perhaps TinyLFU (HBASE-15560) can help here, [~ben.manes]?
[ https://issues.apache.org/jira/browse/HBASE-17339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15859787#comment-15859787 ] Hadoop QA commented on HBASE-17339:
-1 overall. Docker mode activated; the patch appears to include 2 new or modified test files and has no anti-patterns, @author tags, or whitespace issues. All master branch checks passed (mvninstall, compile, checkstyle, mvneclipse, findbugs, javadoc).
Patch checks passed: mvninstall, compile, javac, checkstyle, mvneclipse, findbugs, javadoc.
hadoopcheck failed: the patch causes 17 errors with each of Hadoop v2.6.1, v2.6.2, v2.6.3, v2.6.4, v2.6.5, v2.7.1, v2.7.2, v2.7.3, and v3.0.0-alpha2.
Unit tests passed: hbase-client (2m 19s), hbase-server (96m 34s). The patch does not generate ASF License warnings.
[ https://issues.apache.org/jira/browse/HBASE-17339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857939#comment-15857939 ] Eshcar Hillel commented on HBASE-17339:
Currently the optimization is only on in TestAcidGuarantees, but I will look into all the failed tests.
[ https://issues.apache.org/jira/browse/HBASE-17339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857928#comment-15857928 ] Hadoop QA commented on HBASE-17339:
-1 overall. Docker mode activated; the patch appears to include 2 new or modified test files and has no anti-patterns, @author tags, or whitespace issues. All master branch checks passed (mvninstall, compile, checkstyle, mvneclipse, findbugs, javadoc).
Patch checks passed: mvninstall, compile, javac, checkstyle, mvneclipse, javadoc.
hadoopcheck failed: the patch causes 17 errors with each of Hadoop v2.6.1, v2.6.2, v2.6.3, v2.6.4, v2.6.5, v2.7.1, v2.7.2, v2.7.3, and v3.0.0-alpha2.
findbugs failed: hbase-server generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0).
Unit tests: hbase-client passed (2m 15s); hbase-server failed (82m 6s). The patch does not generate ASF License warnings.
[jira] [Commented] (HBASE-17339) Scan-Memory-First Optimization for Get Operations
[ https://issues.apache.org/jira/browse/HBASE-17339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857712#comment-15857712 ] Hadoop QA commented on HBASE-17339:
---
| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 2 new or modified test files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s {color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 39s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 48s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 40s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 23s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 18s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 39s {color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s {color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 53s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 47s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 47s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 41s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 23s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 1m 31s {color} | {color:red} The patch causes 17 errors with Hadoop v2.6.1. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 3m 3s {color} | {color:red} The patch causes 17 errors with Hadoop v2.6.2. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 4m 36s {color} | {color:red} The patch causes 17 errors with Hadoop v2.6.3. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 6m 7s {color} | {color:red} The patch causes 17 errors with Hadoop v2.6.4. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 7m 39s {color} | {color:red} The patch causes 17 errors with Hadoop v2.6.5. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 9m 10s {color} | {color:red} The patch causes 17 errors with Hadoop v2.7.1. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 10m 44s {color} | {color:red} The patch causes 17 errors with Hadoop v2.7.2. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 12m 17s {color} | {color:red} The patch causes 17 errors with Hadoop v2.7.3. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 13m 51s {color} | {color:red} The patch causes 17 errors with Hadoop v3.0.0-alpha2. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 45s {color} | {color:red} hbase-server generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 42s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 20s {color} | {color:green} hbase-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 17m 39s {color} | {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 15s {color} | {color:green} The patch does not
[jira] [Commented] (HBASE-17339) Scan-Memory-First Optimization for Get Operations
[ https://issues.apache.org/jira/browse/HBASE-17339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857616#comment-15857616 ] Eshcar Hillel commented on HBASE-17339:
---
A new patch is available. Summary of changes:
* (TODO 1 above) init maxFlushedTimestamp from the store file timestamps (if any)
* (TODO 2) memoryScanOptimization is now a table property
* added tests through TestAcidGuarantees
> Scan-Memory-First Optimization for Get Operations
> -
>
> Key: HBASE-17339
> URL: https://issues.apache.org/jira/browse/HBASE-17339
> Project: HBase
> Issue Type: Improvement
> Reporter: Eshcar Hillel
> Assignee: Eshcar Hillel
> Attachments: HBASE-17339-V01.patch, HBASE-17339-V02.patch,
> HBASE-17339-V03.patch, HBASE-17339-V03.patch
>
>
> The current implementation of a get operation (to retrieve values for a
> specific key) scans through all relevant stores of the region; for each store
> both memory components (memstore segments) and disk components (hfiles) are
> scanned in parallel.
> We suggest to apply an optimization that speculatively scans memory-only
> components first and only if the result is incomplete scans both memory and
> disk.
--
This message was sent by Atlassian JIRA (v6.3.15#6346)
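The first change above (recovering maxFlushedTimestamp from the existing store files) can be sketched as follows. This is a minimal illustration under assumed names, not HBase code: StoreFileInfo and recoverMaxFlushedTimestamp are hypothetical stand-ins for the per-file metadata (such as the upper bound of a file's time range) that a store would traverse on recovery.

```java
import java.util.List;

public class MaxFlushedTimestampRecovery {

    /** Hypothetical stand-in for per-store-file metadata (e.g. a time-range entry). */
    static final class StoreFileInfo {
        final long maxTimestamp;
        StoreFileInfo(long maxTimestamp) { this.maxTimestamp = maxTimestamp; }
    }

    /**
     * On recovery the memstore's maxFlushedTimestamp is unknown, so a safe
     * reconstruction is the maximum timestamp over all existing store files:
     * every previously flushed cell is bounded by some file's max timestamp.
     */
    static long recoverMaxFlushedTimestamp(List<StoreFileInfo> storeFiles) {
        long max = Long.MIN_VALUE; // no store files: nothing was ever flushed
        for (StoreFileInfo f : storeFiles) {
            max = Math.max(max, f.maxTimestamp);
        }
        return max;
    }

    public static void main(String[] args) {
        List<StoreFileInfo> files = List.of(
            new StoreFileInfo(100L), new StoreFileInfo(250L), new StoreFileInfo(180L));
        System.out.println(recoverMaxFlushedTimestamp(files)); // prints 250
    }
}
```

Taking the maximum over the files keeps the monotonicity invariant conservative: any memstore entry written after recovery carries a timestamp compared against this bound.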
[jira] [Commented] (HBASE-17339) Scan-Memory-First Optimization for Get Operations
[ https://issues.apache.org/jira/browse/HBASE-17339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15839463#comment-15839463 ] Eshcar Hillel commented on HBASE-17339:
---
Patch also available in review board.
[jira] [Commented] (HBASE-17339) Scan-Memory-First Optimization for Get Operations
[ https://issues.apache.org/jira/browse/HBASE-17339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15839458#comment-15839458 ] Eshcar Hillel commented on HBASE-17339:
---
The attached patch is not complete and not properly tested, so it may have some bugs (but it compiles :) ). I'm posting it to get feedback on the core logic.
The main property needed for this optimization is monotonicity. A store preserves *monotonicity* if all timestamps in its memstore are strictly greater than all timestamps in its store files.
The algorithm is as follows:
{code}
0. decide if we should apply the optimization:
   (1) the flag is on
   (2) the get operation is over a specific set of columns
if we decided to apply the optimization then
  1. open all relevant *memory* scanners; while opening the scanners, collect
     the max flushed timestamp of every store (first collect); a null timestamp
     indicates the store does not maintain monotonicity
  2. if all stores are monotonic then
     2.1 get results
     2.2 validate monotonicity: check that the max flushed timestamps have not
         changed in any store (the double collect ensures the results are taken
         from a consistent view)
if we decided not to apply the optimization *OR* some store is not monotonic
*OR* we applied the optimization but the results do not satisfy the get
operation (not enough versions per column) then
  3. open all scanners
  4. get results
{code}
Missing parts (TODOs):
- properly init maxFlushedTimestamp (in AbstractMemStore) when recovering -- need to traverse all existing store files
- make memoryScanOptimization a table property instead of a global property; set to true by default
- (Optional) add a flag to the Get operation which indicates whether the user wants to apply the optimization (per operation!); set to true by default
- (Optional) check if we can change the implementation of getScanners in XXXMemstore to return multiple scanners, so that we can later filter out each one of them rather than either keeping all or eliminating all.
Currently the implementation (in both the default and the compacting memstore) returns a singleton list with one MemStoreScanner which comprises one or a few segment scanners.
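The double-collect protocol in the pseudocode above can be sketched in a self-contained way. Everything here is an illustrative assumption, not HBase code: Store, FakeStore, getFromMemory and getFromMemoryAndDisk are hypothetical stand-ins, while the real patch works with scanners over memstore segments and hfiles.

```java
import java.util.ArrayList;
import java.util.List;

public class ScanMemoryFirst {

    interface Store {
        /** Max flushed timestamp, or null if the store is not monotonic. */
        Long maxFlushedTimestamp();
        List<String> getFromMemory(String row);
        List<String> getFromMemoryAndDisk(String row);
    }

    /** Speculative memory-first get with the double-collect validation. */
    static List<String> get(List<Store> stores, String row, int requiredVersions) {
        // 1. first collect: snapshot the max flushed timestamp of every store
        List<Long> before = new ArrayList<>();
        for (Store s : stores) before.add(s.maxFlushedTimestamp());

        // a null timestamp marks a store that does not preserve monotonicity
        if (!before.contains(null)) {
            // 2.1 speculative memory-only read
            List<String> result = new ArrayList<>();
            for (Store s : stores) result.addAll(s.getFromMemory(row));

            // 2.2 second collect: if no flush moved data to disk in between,
            // the memory-only view was consistent and strictly newer than disk
            List<Long> after = new ArrayList<>();
            for (Store s : stores) after.add(s.maxFlushedTimestamp());
            if (before.equals(after) && result.size() >= requiredVersions) {
                return result;
            }
        }
        // 3.-4. fall back to the ordinary memory-plus-disk scan
        List<String> result = new ArrayList<>();
        for (Store s : stores) result.addAll(s.getFromMemoryAndDisk(row));
        return result;
    }

    /** Toy store: memory holds the recent versions, disk the older ones. */
    static final class FakeStore implements Store {
        final Long flushedTs;
        final List<String> memory;
        final List<String> disk;
        FakeStore(Long flushedTs, List<String> memory, List<String> disk) {
            this.flushedTs = flushedTs;
            this.memory = memory;
            this.disk = disk;
        }
        public Long maxFlushedTimestamp() { return flushedTs; }
        public List<String> getFromMemory(String row) { return memory; }
        public List<String> getFromMemoryAndDisk(String row) {
            List<String> all = new ArrayList<>(memory);
            all.addAll(disk);
            return all;
        }
    }

    public static void main(String[] args) {
        List<Store> stores =
            List.of(new FakeStore(42L, List.of("v3", "v2"), List.of("v1")));
        // two versions are already in memory: disk is never consulted
        System.out.println(get(stores, "row1", 2)); // prints [v3, v2]
        // three versions force the fallback scan over memory and disk
        System.out.println(get(stores, "row1", 3)); // prints [v3, v2, v1]
    }
}
```

The design point the sketch tries to capture is that the timestamps are collected twice, once while opening the memory scanners and once after reading, so that a concurrent flush (which would move newer data to disk and break the monotonicity assumption for this read) invalidates the speculative result and triggers the ordinary memory-plus-disk scan.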