[jira] [Commented] (HBASE-9000) Linear reseek in Memstore
[ https://issues.apache.org/jira/browse/HBASE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13930909#comment-13930909 ]

Lars Hofhansl commented on HBASE-9000:
--------------------------------------

HBASE-9778 is committed now. It allows a configurable, optional look-ahead in ExplicitColumnTracker, so it applies to both Memstore and StoreScanner.
I did look into doing that in StoreFileScanner, but most of the time of a reseek is spent before we even get to StoreFileScanner (checking whether we are forward seeking, checking whether we're still on the same block, seeking every HFile forward, etc).

Linear reseek in Memstore
-------------------------

Key: HBASE-9000
URL: https://issues.apache.org/jira/browse/HBASE-9000
Project: HBase
Issue Type: Improvement
Components: Performance
Affects Versions: 0.89-fb
Reporter: Shane Hogan
Priority: Minor
Fix For: 0.89-fb
Attachments: hbase-9000-benchmark-program.patch, hbase-9000-port-fb.patch, hbase-9000.patch

This is to address the linear reseek in MemStoreScanner. Currently reseek iterates over the kvset and the snapshot linearly by just calling next repeatedly. The new solution is to do this linear seek up to a configurable maximum number of times; then, if the seek is not yet complete, fall back to a logarithmic seek.

--
This message was sent by Atlassian JIRA (v6.2#6252)
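The approach in the issue description (try a bounded number of linear next steps, then fall back to a logarithmic seek) can be sketched roughly as follows. This is a hypothetical illustration over a plain java.util.concurrent.ConcurrentSkipListSet of strings, not the code from the attached patches; the class name BoundedLinearReseek, the linearLimit parameter, and the fallbacks() counter are assumptions made for the example.

```java
import java.util.Iterator;
import java.util.NavigableSet;
import java.util.concurrent.ConcurrentSkipListSet;

/** Hypothetical sketch of "linear first, then logarithmic" reseek. Not the HBase patch. */
final class BoundedLinearReseek {
    private final NavigableSet<String> set;
    private final int linearLimit;   // assumed knob: max next() steps before falling back
    private Iterator<String> it;
    private String current;          // element the scanner is positioned on, or null at end
    private int fallbacks;           // how often the logarithmic path was taken

    BoundedLinearReseek(NavigableSet<String> set, int linearLimit) {
        this.set = set;
        this.linearLimit = linearLimit;
        this.it = set.iterator();
        this.current = it.hasNext() ? it.next() : null;
    }

    String peek() { return current; }

    int fallbacks() { return fallbacks; }

    /** Moves forward to the first element >= key. Returns false if there is none. */
    boolean reseek(String key) {
        int steps = 0;
        while (current != null && current.compareTo(key) < 0) {
            if (steps++ >= linearLimit) {
                // Target is not nearby: one O(log n) lookup instead of more next() calls.
                fallbacks++;
                it = set.tailSet(key, true).iterator();
                current = it.hasNext() ? it.next() : null;
                return current != null;
            }
            // Cheap linear step, good when the target is only a few entries away.
            current = it.hasNext() ? it.next() : null;
        }
        return current != null;
    }

    public static void main(String[] args) {
        NavigableSet<String> kvs = new ConcurrentSkipListSet<>();
        for (char c = 'a'; c <= 'g'; c++) {
            kvs.add(String.valueOf(c));
        }
        BoundedLinearReseek scanner = new BoundedLinearReseek(kvs, 2);
        scanner.reseek("b"); // near target, reached by stepping
        scanner.reseek("g"); // far target, reached through tailSet
        System.out.println(scanner.peek() + " after " + scanner.fallbacks() + " fallback(s)");
    }
}
```

With a limit of 0 every forward reseek goes straight to the logarithmic path, so the limit only trades a few cheap comparisons against one skip-list lookup.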
[jira] [Commented] (HBASE-9000) Linear reseek in Memstore
[ https://issues.apache.org/jira/browse/HBASE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13930974#comment-13930974 ]

Hadoop QA commented on HBASE-9000:
----------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12611241/hbase-9000.patch
against trunk revision .
ATTACHMENT ID: 12611241

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 2 new or modified tests.
{color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile.
{color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 lineLengths{color}. The patch introduces the following lines longer than 100:
+ public final static String MEMSTORE_RESEEK_LINEAR_SEARCH_LIMIT_KEY = "hbase.hregion.memstore.linear.search.limit";
+ private static final Option NUM_COLUMNS_PER_ROW_OPTION = new Option("c", "columns-per-row", true, "columns per row");
+ private static final Option NUM_VERSIONS_OPTION = new Option("v", "versions", true, "number of versions");
+ private static final Option ROW_KEY_SIZE_OPTION = new Option("R", "row-key-size", true, "size of row-key");
+ private static final Option QUALIFIER_SIZE_OPTION = new Option("q", "qualifier-size", true, "size of qualifier");
+ private static final Option VALUE_SIZE_OPTION = new Option("V", "value-size", true, "size of value");
+ private static final Option RANDOM_SEED_OPTION = new Option("s", "random-seed", true, "random seed");
+ private void benchmarkReseeks(String name, KeyValueScanner scanner, List<KeyValue> keys) throws IOException {
{color:green}+1 site{color}. The mvn site goal succeeds with this patch.
{color:red}-1 core tests{color}. The patch failed these unit tests:
org.apache.hadoop.hbase.io.TestHeapSize

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8950//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8950//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8950//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8950//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8950//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8950//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8950//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8950//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8950//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8950//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8950//console

This message is automatically generated.
[jira] [Commented] (HBASE-9000) Linear reseek in Memstore
[ https://issues.apache.org/jira/browse/HBASE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13819616#comment-13819616 ]

Lars Hofhansl commented on HBASE-9000:
--------------------------------------

Just to add some data here. I loaded some data into a table, then scanned it with Phoenix: that took 34s. Then I flushed the table and scanned again: 5s.
There might have been some other factors at work, so this is a bit anecdotal, but this definitely needs some work.
[jira] [Commented] (HBASE-9000) Linear reseek in Memstore
[ https://issues.apache.org/jira/browse/HBASE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13819630#comment-13819630 ]

Lars Hofhansl commented on HBASE-9000:
--------------------------------------

Never mind, that was because of caching. When I major compact (and thus wipe the cache) it takes 34s again.
I would delete my previous comment, but that would be confusing.
[jira] [Commented] (HBASE-9000) Linear reseek in Memstore
[ https://issues.apache.org/jira/browse/HBASE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814700#comment-13814700 ]

Chao Shi commented on HBASE-9000:
---------------------------------

bq. Should we do the same thing in StoreFileScanner?
Yes, I think so.

bq. If so, why not do this in StoreScanner, for example, call next some times before call reseek...
This is because StoreScanner does not have enough knowledge to decide between a reseek and several calls to next. As discussed earlier in this thread, an attempt to do a linear reseek by calling next may hit an uncached block, whose cost is huge compared to a logarithmic reseek that touches only cached index blocks.
[jira] [Commented] (HBASE-9000) Linear reseek in Memstore
[ https://issues.apache.org/jira/browse/HBASE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814608#comment-13814608 ]

chunhui shen commented on HBASE-9000:
-------------------------------------

[~stepinto]
I understand the scenario the patch is meant for.

Should we do the same thing in StoreFileScanner?
If so, why not do this in StoreScanner, for example, by calling next a few times before calling reseek?

In my personal view, such an approach seems a little crude.

+0 from me
[jira] [Commented] (HBASE-9000) Linear reseek in Memstore
[ https://issues.apache.org/jira/browse/HBASE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814609#comment-13814609 ]

chunhui shen commented on HBASE-9000:
-------------------------------------

I'm sorry, I have no better idea for optimizing performance in this scenario.
[jira] [Commented] (HBASE-9000) Linear reseek in Memstore
[ https://issues.apache.org/jira/browse/HBASE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813676#comment-13813676 ]

Chao Shi commented on HBASE-9000:
---------------------------------

Hi [~zjushch],

bq. How to decide the config value?
I think most users should use the default value. This patch optimizes performance in scenarios where reseek on the MemStore is the bottleneck, for example a scan with a filter that skips a lot of KVs. In this case, you need to tweak this value and make sure linear seeks happen rather than reseeks (i.e., set this value to be greater than #versions in MemStore, if you are using SEEK_NEXT_COL). As most users don't have many versions per KV in their MemStore, I think the default value should work well.

bq. IMPO, the new code in MemStore seems not friendly
Could you please explain more about what can be improved?
[jira] [Commented] (HBASE-9000) Linear reseek in Memstore
[ https://issues.apache.org/jira/browse/HBASE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13810444#comment-13810444 ]

Ted Yu commented on HBASE-9000:
-------------------------------

Patch looks good.
[jira] [Commented] (HBASE-9000) Linear reseek in Memstore
[ https://issues.apache.org/jira/browse/HBASE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13810980#comment-13810980 ]

chunhui shen commented on HBASE-9000:
-------------------------------------

As the performance tests above show, the patch doesn't fit all cases.

How to decide the config value?

IMPO, the new code in MemStore seems not friendly
[jira] [Commented] (HBASE-9000) Linear reseek in Memstore
[ https://issues.apache.org/jira/browse/HBASE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13808781#comment-13808781 ]

Chao Shi commented on HBASE-9000:
---------------------------------

bq. Hah. I just did something very similar, but actually in StoreScanner. Not quite ready, but assumes a seek within a row or to the next row is a near seek, and then tries next() a few times (I picked 10 as a default).
Did you mean to call next at the StoreScanner level? I'm not sure it is possible to do this there, as it may not have the necessary knowledge about the cost of next vs. reseek. For example, a next attempt that causes another uncached block to be read from the FS is probably worse than doing a reseek. I may be wrong, as I didn't read your implementation.
[jira] [Commented] (HBASE-9000) Linear reseek in Memstore
[ https://issues.apache.org/jira/browse/HBASE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809327#comment-13809327 ]

Lars Hofhansl commented on HBASE-9000:
--------------------------------------

Yeah, I agree. It's actually better to do that at the MemStoreScanner and StoreFileScanner level.
[jira] [Commented] (HBASE-9000) Linear reseek in Memstore
[ https://issues.apache.org/jira/browse/HBASE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809366#comment-13809366 ]

Ted Yu commented on HBASE-9000:
-------------------------------

'reseek to next row' gets slower with the patch.
If linear.search.limit is lowered, would the slowdown be smaller?
[jira] [Commented] (HBASE-9000) Linear reseek in Memstore
[ https://issues.apache.org/jira/browse/HBASE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809383#comment-13809383 ]

Ted Yu commented on HBASE-9000:
-------------------------------

Some comments on the benchmark:

Please add the Apache license header.
Add class javadoc for MemStoreReseekBenchmark.
{code}
+ private final Random random = new Random();
{code}
Can you use a seed for the above call and log the value of the seed?
{code}
+ private List<byte[]> rows; private List<byte[]> qualifiers;
{code}
Please put the above on two lines.
{code}
+ boolean ok = scanner.reseek(key);
+ if (!ok) {
+   throw new AssertionError("!ok");
{code}
Please print the key which caused the assertion.
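Two of the review suggestions above (seed the Random and log the seed, and name the failing key in the assertion) might look something like the sketch below. The class SeededKeys, its constructor argument, and checkReseek are hypothetical names chosen for illustration; they are not taken from hbase-9000-benchmark-program.patch.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical helper showing a logged seed and a key-bearing assertion message. */
final class SeededKeys {
    private final long seed;
    private final Random random;

    /** Pass null to pick a fresh seed; the seed actually used is always logged. */
    SeededKeys(Long seedOrNull) {
        this.seed = (seedOrNull != null) ? seedOrNull : System.nanoTime();
        this.random = new Random(seed);
        // Logging the seed lets a surprising run be repeated with the same keys.
        System.out.println("benchmark random seed: " + seed);
    }

    long seed() { return seed; }

    /** Generates a pseudo-random key of the given size from the seeded generator. */
    byte[] nextKey(int size) {
        byte[] key = new byte[size];
        random.nextBytes(key);
        return key;
    }

    /** Fails with a message that names the key, instead of a bare "!ok". */
    static void checkReseek(boolean ok, byte[] key) {
        if (!ok) {
            throw new AssertionError("reseek failed for key " + Arrays.toString(key));
        }
    }

    public static void main(String[] args) {
        SeededKeys keys = new SeededKeys(42L);
        byte[] key = keys.nextKey(8);
        checkReseek(true, key); // passes; a false result would report the key
    }
}
```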
[jira] [Commented] (HBASE-9000) Linear reseek in Memstore
[ https://issues.apache.org/jira/browse/HBASE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809402#comment-13809402 ]

Lars Hofhansl commented on HBASE-9000:
--------------------------------------

bq. 'reseek to next row' gets slower with patch.
This is because the limit of 20 is not optimal for this particular case. We'll never get it 100% right for all cases; what we need to do is avoid the worst cases.
In this case it'll be faster per op if the limit is increased to 30 (my guess) or lowered.
[jira] [Commented] (HBASE-9000) Linear reseek in Memstore
[ https://issues.apache.org/jira/browse/HBASE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809870#comment-13809870 ]

Chao Shi commented on HBASE-9000:
---------------------------------

bq. 'reseek to next row' gets slower with patch. If linear.search.limit is lowered, would the slow down be less ?
Yes, the optimal value varies case by case. I think a default value of 5 should not add much observable overhead.

I think a long-term solution would be to implement our own version of a lock-free skip list, where we can access its higher-level next pointers (i.e. to skip) from the current position. This patch could be a temporary solution for now, as it is very simple.
[jira] [Commented] (HBASE-9000) Linear reseek in Memstore
[ https://issues.apache.org/jira/browse/HBASE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809889#comment-13809889 ]

Ted Yu commented on HBASE-9000:
-------------------------------

@Chao:
Thanks for the quick response.
Can you update the table using the new patch to see whether 'reseek to next row' gets faster?
[jira] [Commented] (HBASE-9000) Linear reseek in Memstore
[ https://issues.apache.org/jira/browse/HBASE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809906#comment-13809906 ]

Hadoop QA commented on HBASE-9000:
----------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12611241/hbase-9000.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 2 new or modified tests.
{color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile.
{color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100
{color:red}-1 site{color}. The patch appears to cause mvn site goal to fail.
{color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/7684//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7684//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7684//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7684//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7684//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7684//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7684//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7684//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7684//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7684//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/7684//console

This message is automatically generated.
[jira] [Commented] (HBASE-9000) Linear reseek in Memstore
[ https://issues.apache.org/jira/browse/HBASE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809958#comment-13809958 ]

Chao Shi commented on HBASE-9000:
---------------------------------

I re-ran the benchmark program and got the following numbers. (As the overhead is not significant, the numbers below are the median of 5 runs.)

||operation||trunk||w/ patch (n=5)||w/ patch (n=20)||
|reseek to next row|5.82 us|5.76 us|6.14 us|
|reseek to next column|3.397 us|0.596 us|0.572 us|

(where n is the limit on the number of linear seeks)
The numbers vary within +-10% between runs.

bq. -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.
bq. -1 site. The patch appears to cause mvn site goal to fail.
These two QA -1s do not seem to be related to my patch.
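To read the benchmark table in the comment above as ratios rather than absolute times, a small calculation like the following can help. It only divides the trunk time by the patched time for each cell; the class and method names are made up for this sketch. By that measure "reseek to next column" comes out roughly 5.7x (n=5) and 5.9x (n=20) faster, while "reseek to next row" stays within the +-10% run-to-run variation reported with the table.

```java
/** Hypothetical helper: per-operation speedup from the posted median timings. */
final class ReseekSpeedup {
    /** Ratio of trunk time to patched time; values above 1 mean the patch is faster. */
    static double speedup(double trunkMicros, double patchedMicros) {
        return trunkMicros / patchedMicros;
    }

    public static void main(String[] args) {
        // Timings in microseconds, copied from the table in the comment.
        System.out.printf("next row,    n=5:  %.2fx%n", speedup(5.82, 5.76));
        System.out.printf("next row,    n=20: %.2fx%n", speedup(5.82, 6.14));
        System.out.printf("next column, n=5:  %.2fx%n", speedup(3.397, 0.596));
        System.out.printf("next column, n=20: %.2fx%n", speedup(3.397, 0.572));
    }
}
```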
[jira] [Commented] (HBASE-9000) Linear reseek in Memstore
[ https://issues.apache.org/jira/browse/HBASE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807774#comment-13807774 ]

Liang Xie commented on HBASE-9000:
----------------------------------

[~stepinto], nice work!
Hi [~lhofhansl], would you like to implement your idea about far/near reseek in this JIRA? Or should we just do a backport: change the current always-far behavior to something similar to 0.89-fb's here? Thanks!
[jira] [Commented] (HBASE-9000) Linear reseek in Memstore
[ https://issues.apache.org/jira/browse/HBASE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13808517#comment-13808517 ]

Lars Hofhansl commented on HBASE-9000:
--------------------------------------

bq. or let's just do a backport: change the current always far behavior into a similar manner like 0.89-fb's here
What does this entail? Reseeking can be useful in many cases; I would not want to generally go back and disable that.

We should also investigate other data structures for the memstore.

And lastly, in a typical system only a small fraction of the data is in the memstore; the majority of the data will be in HFiles and hence be scanned with StoreFileScanners. If that were not the case I would not advocate the use of HBase and would suggest something like memcached instead.

That all said, opportunistically performing a few nexts and only then issuing a reseek would be a good addition. We could use MAX_VERSIONS as a guidepost here, or make it configurable (there are use cases where many versions/columns might be kept in HBase).
[jira] [Commented] (HBASE-9000) Linear reseek in Memstore
[ https://issues.apache.org/jira/browse/HBASE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13808569#comment-13808569 ] Lars Hofhansl commented on HBASE-9000: -- In all fairness, we should not divide the runtime by the number of ops. The whole point of seeking is to reduce the number of ops. In that case it looks like this: {quote} Next: 136000.0 us ReseekToNextRow: 443000.0 us ReseekToNextColumn: 2243000.0 us {quote} Linear reseek in Memstore - Key: HBASE-9000 URL: https://issues.apache.org/jira/browse/HBASE-9000 Project: HBase Issue Type: Improvement Affects Versions: 0.89-fb Reporter: Shane Hogan Priority: Minor Fix For: 0.89-fb Attachments: hbase-9000-benchmark-program.patch This is to address the linear reseek in MemStoreScanner. Currently reseek iterates over the kvset and the snapshot linearly by just calling next repeatedly. The new solution is to do this linear seek up to a configurable maximum amount of times then if the seek is not yet complete fall back to logarithmic seek. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-9000) Linear reseek in Memstore
[ https://issues.apache.org/jira/browse/HBASE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13808771#comment-13808771 ] Lars Hofhansl commented on HBASE-9000: -- Hah. I just did something very similar, but actually in StoreScanner. It is not quite ready, but it assumes a seek within a row or to the next row is a near seek, and then tries next() a few times (I picked 10 as a default). HBASE-9778 is yet another approach. Rather than doing a series of next() calls when a reseek was requested, that change would avoid requesting a reseek in the first place where that makes sense. Linear reseek in Memstore - Key: HBASE-9000 URL: https://issues.apache.org/jira/browse/HBASE-9000 Project: HBase Issue Type: Improvement Affects Versions: 0.89-fb Reporter: Shane Hogan Priority: Minor Fix For: 0.89-fb Attachments: hbase-9000-benchmark-program.patch, hbase-9000-port-fb.patch This is to address the linear reseek in MemStoreScanner. Currently reseek iterates over the kvset and the snapshot linearly by just calling next repeatedly. The new solution is to do this linear seek up to a configurable maximum amount of times then if the seek is not yet complete fall back to logarithmic seek. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-9000) Linear reseek in Memstore
[ https://issues.apache.org/jira/browse/HBASE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13806669#comment-13806669 ] Chao Shi commented on HBASE-9000: - I don't think tailSet is efficient. Consider a scenario where a filter is present and keeps returning SEEK_NEXT_COL. A call to tailSet does not make use of the current position and has to relocate to it by searching the skip list again. In most cases, where maxVersions of a table is set to a small value, the scanner could instead skip forward over at most maxVersions keys. Linear reseek in Memstore - Key: HBASE-9000 URL: https://issues.apache.org/jira/browse/HBASE-9000 Project: HBase Issue Type: Improvement Affects Versions: 0.89-fb Reporter: Shane Hogan Priority: Minor Fix For: 0.89-fb This is to address the linear reseek in MemStoreScanner. Currently reseek iterates over the kvset and the snapshot linearly by just calling next repeatedly. The new solution is to do this linear seek up to a configurable maximum amount of times then if the seek is not yet complete fall back to logarithmic seek. -- This message was sent by Atlassian JIRA (v6.1#6144)
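One rough way to observe the cost difference described in the comment above is to count comparator calls. The sketch below is not HBase code: it uses a plain `ConcurrentSkipListSet<Integer>` in place of the memstore's KeyValue set, and the class name `TailSetCost` and its methods are made up for illustration. The exact number of comparisons for the `tailSet` path depends on the randomized skip-list levels, so only the linear count is predictable.

```java
import java.util.Comparator;
import java.util.Iterator;
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration only; Integer keys stand in for KeyValue.
final class TailSetCost {
    static final AtomicLong comparisons = new AtomicLong();

    // Comparator that counts how often it is invoked.
    static final Comparator<Integer> COUNTING = (a, b) -> {
        comparisons.incrementAndGet();
        return Integer.compare(a, b);
    };

    // Comparisons spent re-locating `key` from the head of the skip list.
    // The set must have been built with COUNTING for this to measure anything.
    static long costOfTailSet(ConcurrentSkipListSet<Integer> set, int key) {
        comparisons.set(0);
        Iterator<Integer> it = set.tailSet(key, true).iterator();
        it.hasNext();
        return comparisons.get();
    }

    // Comparisons spent walking forward from an already positioned iterator
    // until an element >= key is reached (one comparison per step).
    static long costOfLinear(Iterator<Integer> positioned, int key) {
        comparisons.set(0);
        while (positioned.hasNext()) {
            int next = positioned.next();
            if (COUNTING.compare(next, key) >= 0) {
                break;
            }
        }
        return comparisons.get();
    }
}
```

When the target is only a handful of keys ahead, as with SEEK_NEXT_COL on a table with few versions, the linear walk costs about one comparison per skipped key, whereas the `tailSet` path pays for a fresh descent through the skip list every time.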
[jira] [Commented] (HBASE-9000) Linear reseek in Memstore
[ https://issues.apache.org/jira/browse/HBASE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13807468#comment-13807468 ] Lars Hofhansl commented on HBASE-9000: -- We should quantify the difference in some microbenchmarks. Linear reseek in Memstore - Key: HBASE-9000 URL: https://issues.apache.org/jira/browse/HBASE-9000 Project: HBase Issue Type: Improvement Affects Versions: 0.89-fb Reporter: Shane Hogan Priority: Minor Fix For: 0.89-fb This is to address the linear reseek in MemStoreScanner. Currently reseek iterates over the kvset and the snapshot linearly by just calling next repeatedly. The new solution is to do this linear seek up to a configurable maximum amount of times then if the seek is not yet complete fall back to logarithmic seek. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-9000) Linear reseek in Memstore
[ https://issues.apache.org/jira/browse/HBASE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13807651#comment-13807651 ] Liang Xie commented on HBASE-9000: -- Just my understanding: neither our current impl in trunk nor the old 0.89-fb impl seems optimized for all cases. One type of case is probably friendly to the current logarithmic seek impl (tailSet), e.g. when lots of KVs need to be skipped; another type is friendly to linear reseek, e.g. when the target KV is near the previous position, as in 0.89-fb's old impl: {code} public boolean reseek(KeyValue key) { while (kvsetNextRow != null && comparator.compare(kvsetNextRow, key) < 0) { kvsetNextRow = getNext(kvsetIt); } ... {code} To me, it's reasonable to have a config setting that lets the cheaper linear reseek go first and then falls back to the more expensive seek() once the configured number of steps is reached. Linear reseek in Memstore - Key: HBASE-9000 URL: https://issues.apache.org/jira/browse/HBASE-9000 Project: HBase Issue Type: Improvement Affects Versions: 0.89-fb Reporter: Shane Hogan Priority: Minor Fix For: 0.89-fb This is to address the linear reseek in MemStoreScanner. Currently reseek iterates over the kvset and the snapshot linearly by just calling next repeatedly. The new solution is to do this linear seek up to a configurable maximum amount of times then if the seek is not yet complete fall back to logarithmic seek. -- This message was sent by Atlassian JIRA (v6.1#6144)
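The "linear first, then fall back" idea from the comment above could look roughly like the following. This is a minimal sketch under stated assumptions, not the attached patch: it scans a `NavigableSet<Integer>` rather than KeyValues, and `BoundedLinearReseek`, `maxLinearSteps` and `fallbacks` are hypothetical names standing in for whatever the real scanner and configuration key would be.

```java
import java.util.Iterator;
import java.util.NavigableSet;
import java.util.concurrent.ConcurrentSkipListSet;

// Hypothetical sketch; Integer keys stand in for KeyValue.
final class BoundedLinearReseek {
    private final NavigableSet<Integer> set;
    private final int maxLinearSteps; // assumed configurable limit
    private Iterator<Integer> it;
    private Integer current;          // element the scanner is on, null if exhausted
    int fallbacks = 0;                // how often the logarithmic path was taken

    BoundedLinearReseek(NavigableSet<Integer> set, int maxLinearSteps) {
        this.set = set;
        this.maxLinearSteps = maxLinearSteps;
        this.it = set.iterator();
        this.current = it.hasNext() ? it.next() : null;
    }

    // Positions the scanner on the first element >= key; false if there is none.
    boolean reseek(int key) {
        int steps = 0;
        while (current != null && current < key) {
            if (steps++ >= maxLinearSteps) {
                // Linear budget used up: re-locate through the skip list.
                it = set.tailSet(key, true).iterator();
                current = it.hasNext() ? it.next() : null;
                fallbacks++;
                return current != null;
            }
            current = it.hasNext() ? it.next() : null; // cheap linear step
        }
        return current != null;
    }

    Integer peek() {
        return current;
    }

    public static void main(String[] args) {
        NavigableSet<Integer> keys = new ConcurrentSkipListSet<>();
        for (int i = 0; i < 1000; i++) {
            keys.add(i);
        }
        BoundedLinearReseek scanner = new BoundedLinearReseek(keys, 10);
        scanner.reseek(5);   // near target: reached by stepping
        scanner.reseek(500); // far target: falls back to tailSet
        System.out.println(scanner.peek() + " fallbacks=" + scanner.fallbacks);
    }
}
```

A limit of 0 makes this behave like the always-logarithmic trunk code, and a very large limit approximates the old always-linear 0.89-fb behavior, which is why a single setting can cover both.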
[jira] [Commented] (HBASE-9000) Linear reseek in Memstore
[ https://issues.apache.org/jira/browse/HBASE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13807674#comment-13807674 ] Lars Hofhansl commented on HBASE-9000: -- I've been thinking about declaring a reseek as near or far. A near reseek would be to the next column or row, whereas a far reseek could be the result of a seek hint from the Filter. In the near case we could try next() a few times first and only then seek; in the far case we'd seek immediately, as we expect to be able to skip a lot of KVs. Linear reseek in Memstore - Key: HBASE-9000 URL: https://issues.apache.org/jira/browse/HBASE-9000 Project: HBase Issue Type: Improvement Affects Versions: 0.89-fb Reporter: Shane Hogan Priority: Minor Fix For: 0.89-fb This is to address the linear reseek in MemStoreScanner. Currently reseek iterates over the kvset and the snapshot linearly by just calling next repeatedly. The new solution is to do this linear seek up to a configurable maximum amount of times then if the seek is not yet complete fall back to logarithmic seek. -- This message was sent by Atlassian JIRA (v6.1#6144)
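The near/far distinction proposed above amounts to a small policy decision made before any seeking happens. The sketch below only shows that decision; the enums and the `ReseekPolicy` class are invented for illustration and do not correspond to existing HBase types.

```java
// Hypothetical sketch of the near/far classification; not HBase API.
final class ReseekPolicy {
    // Why the scanner wants to move forward.
    enum Reason { NEXT_COLUMN, NEXT_ROW, FILTER_HINT }

    // How far away the target is expected to be.
    enum Distance { NEAR, FAR }

    // Next column or next row is assumed to be close by; a filter-supplied
    // seek hint is assumed to skip many KVs.
    static Distance classify(Reason reason) {
        switch (reason) {
            case NEXT_COLUMN:
            case NEXT_ROW:
                return Distance.NEAR;
            default:
                return Distance.FAR;
        }
    }

    // Number of next() calls to try before issuing a real seek:
    // nearLimit for near reseeks, none for far reseeks.
    static int nextAttempts(Reason reason, int nearLimit) {
        return classify(reason) == Distance.NEAR ? nearLimit : 0;
    }
}
```

The value passed as `nearLimit` is an assumption left to the caller; it could be a fixed default or derived from the column family's maximum versions, as suggested elsewhere in this thread.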
[jira] [Commented] (HBASE-9000) Linear reseek in Memstore
[ https://issues.apache.org/jira/browse/HBASE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13807685#comment-13807685 ] Liang Xie commented on HBASE-9000: -- Sounds great :) Linear reseek in Memstore - Key: HBASE-9000 URL: https://issues.apache.org/jira/browse/HBASE-9000 Project: HBase Issue Type: Improvement Affects Versions: 0.89-fb Reporter: Shane Hogan Priority: Minor Fix For: 0.89-fb This is to address the linear reseek in MemStoreScanner. Currently reseek iterates over the kvset and the snapshot linearly by just calling next repeatedly. The new solution is to do this linear seek up to a configurable maximum amount of times then if the seek is not yet complete fall back to logarithmic seek. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-9000) Linear reseek in Memstore
[ https://issues.apache.org/jira/browse/HBASE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13713995#comment-13713995 ] Lars Hofhansl commented on HBASE-9000: -- Is that an FB issue only? In current HBase (0.94+) I see: {code} kvsetIt = kvsetAtCreation.tailSet(getHighest(key, kvsetItRow)).iterator(); snapshotIt = snapshotAtCreation.tailSet(getHighest(key, snapshotItRow)).iterator(); {code} In reseek. Linear reseek in Memstore - Key: HBASE-9000 URL: https://issues.apache.org/jira/browse/HBASE-9000 Project: HBase Issue Type: Improvement Affects Versions: 0.89-fb Reporter: Shane Hogan Priority: Minor Fix For: 0.89-fb This is to address the linear reseek in MemStoreScanner. Currently reseek iterates over the kvset and the snapshot linearly by just calling next repeatedly. The new solution is to do this linear seek up to a configurable maximum amount of times then if the seek is not yet complete fall back to logarithmic seek. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira