[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14381180#comment-14381180 ] Hudson commented on HBASE-13109:
ABORTED: Integrated in Phoenix-master #638 (See [https://builds.apache.org/job/Phoenix-master/638/])
PHOENIX-1642 Make Phoenix Master Branch pointing to HBase1.0.0 - ADDENDUM for HBASE-13109 (enis: rev ad2ad0cefd5d19a9bc8434555a9ecbb55c78)
* phoenix-core/src/main/java/org/apache/phoenix/hbase/index/scanner/FilteredKeyValueScanner.java
* phoenix-core/src/main/java/org/apache/hadoop/hbase/regionserver/IndexHalfStoreFileReader.java

Make better SEEK vs SKIP decisions during scanning
Key: HBASE-13109
URL: https://issues.apache.org/jira/browse/HBASE-13109
Project: HBase
Issue Type: Improvement
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Fix For: 2.0.0, 1.0.1, 1.1.0, 0.98.12
Attachments: 13109-0.98-v4.txt, 13109-0.98-v5.txt, 13109-trunk-v2.txt, 13109-trunk-v3.txt, 13109-trunk-v4.txt, 13109-trunk-v5.txt, 13109-trunk.txt, nextIndexKVChange_new.patch

I'm re-purposing this issue to add a heuristic as to when to SEEK and when to SKIP Cells. This has come up in various issues, and I think I have a way to finally fix this now. HBASE-9778, HBASE-12311, and friends are related.
--- Old description ---
This is a continuation of HBASE-9778. We've seen a scenario of a very slow scan over a region using a timerange that happens to fall after the ts of any Cell in the region. Turns out we spend a lot of time seeking. Tested with a 5 column table, and the scan is 5x faster when the timerange falls before all Cells' ts. We can use the lookahead hint introduced in HBASE-9778 to do opportunistic SKIPping before we actually seek.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
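The SEEK vs SKIP idea in the description can be pictured with a toy model. This is a minimal sketch, not HBase code: `ToyBlockScanner`, `nextIndexedKey()`, `reposition()` and the integer keys are hypothetical stand-ins, assuming a sorted file split into fixed-size blocks where a SEEK goes through the (simulated) block index and a SKIP is a cheap `next()` inside the current block. The lookahead hint here is the first key of the next block: if the target key is smaller than that, all keys still below the target sit in the current block, so the scanner steps forward instead of seeking.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model; not HBase's HFileScanner or StoreScanner API.
class ToyBlockScanner {
    private final int[] keys;          // all cell keys, sorted ascending, no duplicates
    private final int blockSize;       // number of keys per "block"
    private final boolean useLookahead;
    private int pos = 0;               // index of the current key
    int seeks = 0;                     // repositionings done through the simulated index
    int skips = 0;                     // single sequential next() steps

    ToyBlockScanner(int[] sortedKeys, int blockSize, boolean useLookahead) {
        this.keys = sortedKeys;
        this.blockSize = blockSize;
        this.useLookahead = useLookahead;
    }

    boolean hasCurrent() { return pos < keys.length; }

    int peek() { return keys[pos]; }

    // Lookahead hint: first key of the block after the current one,
    // or Integer.MAX_VALUE when the current block is the last one.
    int nextIndexedKey() {
        int nextBlockStart = (pos / blockSize + 1) * blockSize;
        return nextBlockStart < keys.length ? keys[nextBlockStart] : Integer.MAX_VALUE;
    }

    private void next() { pos++; skips++; }

    // SEEK: binary search for the first key >= target (stands in for an index lookup).
    private void seek(int target) {
        seeks++;
        int lo = 0, hi = keys.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (keys[mid] < target) lo = mid + 1; else hi = mid;
        }
        pos = lo;
    }

    // Move forward to the first key >= target.
    void reposition(int target) {
        if (useLookahead && target < nextIndexedKey()) {
            // Every key still below target is in the current block: SKIP with next().
            while (hasCurrent() && peek() < target) next();
        } else {
            seek(target);
        }
    }
}

class SeekSkipDemo {
    // Visit every other key, like a scan that only wants some of the columns.
    static List<Integer> scanEveryOther(ToyBlockScanner s) {
        List<Integer> visited = new ArrayList<>();
        while (s.hasCurrent()) {
            int k = s.peek();
            visited.add(k);
            s.reposition(k + 2);
        }
        return visited;
    }

    public static void main(String[] args) {
        int[] keys = new int[20];
        for (int i = 0; i < keys.length; i++) keys[i] = i;
        for (boolean lookahead : new boolean[] {false, true}) {
            ToyBlockScanner s = new ToyBlockScanner(keys, 5, lookahead);
            List<Integer> visited = scanEveryOther(s);
            System.out.println("lookahead=" + lookahead + " visited=" + visited
                    + " seeks=" + s.seeks + " skips=" + s.skips);
        }
    }
}
```

With 20 keys in blocks of 5, both variants visit the same keys, but the always-seek variant performs 10 seeks while the lookahead variant performs 3 seeks plus 14 `next()` calls, seeking only when the target crosses a block boundary. How much that saves on a real region server depends on block size, caching and Cell layout, which this model ignores.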
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14372491#comment-14372491 ] Lars Hofhansl commented on HBASE-13109:
[~vik.karma] confirmed that the scan mentioned in the description is about 3x faster. That's a 3x end-to-end improvement in an M/R job!
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14366055#comment-14366055 ] Lars Hofhansl commented on HBASE-13109:
But the HBase code would call into the scanners injected by the Phoenix coprocessors... Anyway, since it works fine, there's something I am missing, which is just fine :)
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14365866#comment-14365866 ] Mujtaba Chohan commented on HBASE-13109:
[~jamestaylor] Checked. Mutable and local indexes using the existing 4.3.0 release work fine with 0.98.12-SNAPSHOT.
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14366046#comment-14366046 ] Lars Hofhansl commented on HBASE-13109:
Pffeeewww... Good. (I admit I am surprised, since the HBase core code would call the new method on the HFileScanner and KeyValueScanner interfaces.)
The fact remains, though, that we have to get new versions of Phoenix 4.3 and 4.2 out before we ship 0.98.12; otherwise there would be no released version of Phoenix that compiles against the current version of 0.98. I discussed this offline with [~apurtell], and we think that might be the best option. I assume that if we can't make that, we'll delay this until 0.98.13.
(Sorry for the pain caused here. It's instructive, though, and worth the performance gains.)
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14366050#comment-14366050 ] Andrew Purtell commented on HBASE-13109:
bq. I admit I am surprised, since the HBase core code would call the new method on the HFileScanner and KeyValueScanner interfaces

Phoenix code does not call the new method, of course, which is why the binary compatibility checker didn't flag this as a problem. It would have been a different story if there had been a removal or rename of a specific method used by Phoenix.
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14364428#comment-14364428 ] James Taylor commented on HBASE-13109:
Thanks, [~apurtell]. [~mujtabachohan], would you mind trying an existing Phoenix binary release (4.3.0 is fine) against this snapshot? In particular, do mutable and local indexes still work correctly?
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14364356#comment-14364356 ] Andrew Purtell commented on HBASE-13109:
I've confirmed 0.98.12 snapshots are available in Apache Maven now. Be sure the Apache snapshots repository is included in your POM:
{noformat}
<repository>
  <id>apache.snapshots</id>
  <url>http://repository.apache.org/snapshots/</url>
  <snapshots>
    <enabled>true</enabled>
  </snapshots>
</repository>
{noformat}
Use version 0.98.12-hadoop2-SNAPSHOT for the Hadoop 2 build and 0.98.12-hadoop1-SNAPSHOT for the Hadoop 1 build.
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14363659#comment-14363659 ] stack commented on HBASE-13109:
bq. Would it be possible to get the 0.98.12 snapshot into maven so we can see if/how Phoenix will work with it?

Can't you build it locally? That will install it in your local repo, and you can then check Phoenix against the locally installed version.
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14363670#comment-14363670 ] Andrew Purtell commented on HBASE-13109:
I'll publish a snapshot today, so you can do either. :-)
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14362243#comment-14362243 ] Lars Hofhansl commented on HBASE-13109:
Ultimately I only need this method in KeyValueHeap, StoreScanner, StoreFileScanner, and AbstractHFileReader.Scanner. But adding it only there would litter class casts all over the code, along with class checks in hot code paths. I do not think that makes sense.
So, we can undo this from 0.98 altogether (but -1 on that from me). Or we can delay this until 0.98.13; by then Phoenix would need new minor versions of 4.2 and 4.3. I'd be +0 on that. And just in case, -1 on delaying this further than 0.98.13...
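The trade-off about class casts can be sketched with two hypothetical designs. All names here (`ScannerA`, `ScannerB`, `IndexedScannerB`, `HintCallSites`) are illustrative assumptions that only mimic the shape of the problem; they are not the HBase `KeyValueScanner` or `HFileScanner` types. Option A declares the lookahead method on the shared interface, so every implementation, including wrappers written outside HBase, has to provide it. Option B keeps it on a side interface implemented only by the classes that need it, so every call site has to test the type and cast first.

```java
// Option A: the lookahead method lives on the shared interface.
interface ScannerA {
    Integer nextIndexedKey();   // null = no hint available, caller just seeks
}

// Option B: the base interface is unchanged (other scanner methods omitted);
// only some implementations add the hint through a side interface.
interface ScannerB {
}

interface IndexedScannerB extends ScannerB {
    Integer nextIndexedKey();
}

class FileScannerA implements ScannerA {
    private final int hint;
    FileScannerA(int hint) { this.hint = hint; }
    public Integer nextIndexedKey() { return hint; }
}

class MemScannerA implements ScannerA {
    public Integer nextIndexedKey() { return null; }  // no block index, so no hint
}

class FileScannerB implements IndexedScannerB {
    private final int hint;
    FileScannerB(int hint) { this.hint = hint; }
    public Integer nextIndexedKey() { return hint; }
}

class MemScannerB implements ScannerB {
}

class HintCallSites {
    // Option A: a single interface call at every call site.
    static Integer hintA(ScannerA s) {
        return s.nextIndexedKey();
    }

    // Option B: each call site in the per-Cell loop needs a type check and a cast.
    static Integer hintB(ScannerB s) {
        if (s instanceof IndexedScannerB) {
            return ((IndexedScannerB) s).nextIndexedKey();
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(hintA(new FileScannerA(42)) + " " + hintA(new MemScannerA()));
        System.out.println(hintB(new FileScannerB(42)) + " " + hintB(new MemScannerB()));
    }
}
```

Both options return the same hints; option A is what forces outside implementors to add the method when they recompile, while option B avoids that at the cost of `instanceof` checks and casts repeated on a hot path.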
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14362250#comment-14362250 ] Andrew Purtell commented on HBASE-13109:
Thanks for checking. I'd be +0 on a delay as well.
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14362251#comment-14362251 ] Andrew Purtell commented on HBASE-13109:
And -1 on a permanent revert.
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14362609#comment-14362609 ] Lars Hofhansl commented on HBASE-13109:
[~giacomotaylor], what do you think? If we delay to 0.98.13, I think we can have new versions of Phoenix by then. (If not, we might as well leave it in 0.98.12.) We should also check a version of Phoenix built against t
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14362744#comment-14362744 ] James Taylor commented on HBASE-13109:
Would it be possible to get the 0.98.12 snapshot into Maven so we can see if/how Phoenix will work with it? We plan on releasing a 4.3.1 soon; perhaps we can start a vote in a week. What's your time frame for 0.98.12?
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14362123#comment-14362123 ] James Taylor commented on HBASE-13109:
I'm not positive it's broken, so we should definitely try it. Is it available in Maven? Yes, I meant the combined HBase+Phoenix community. Of course it's absolutely your call; it would just be good to know in advance.
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14362121#comment-14362121 ] Andrew Purtell commented on HBASE-13109:
[~giacomotaylor] JavaACC says the changes have no binary compatibility impact on already compiled code. I don't see how previous releases of Phoenix are broken; I think that states the problem too strongly. It is true that recompilation will be problematic without accommodation (see above) or local patching, but that's not the same thing as broken, right?
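The distinction between binary compatibility and recompilation can be sketched with a wrapper scanner. In this hypothetical example, `KeyScanner` stands in for a scanner interface that has just gained a `nextIndexedKey()` method, and `EvenKeysScanner` plays the role of a wrapper supplied from outside the core code. A wrapper class compiled against the old interface still loads (if code ever invoked the missing method on it, the JVM would throw `AbstractMethodError`), but its source no longer compiles until it implements the new method. Forwarding to the delegate is one possible accommodation, assumed here; it is not necessarily what the Phoenix addendum does.

```java
// Hypothetical interface standing in for a scanner interface that gained a method.
interface KeyScanner {
    Integer peek();            // current key, or null when exhausted
    Integer next();            // return the current key and advance, or null when exhausted
    Integer nextIndexedKey();  // newly added: lookahead hint, or null if none
}

// Base implementation over a sorted array split into fixed-size blocks.
class ArrayKeyScanner implements KeyScanner {
    private final int[] keys;
    private final int blockSize;
    private int pos = 0;

    ArrayKeyScanner(int[] sortedKeys, int blockSize) {
        this.keys = sortedKeys;
        this.blockSize = blockSize;
    }

    public Integer peek() { return pos < keys.length ? keys[pos] : null; }

    public Integer next() { return pos < keys.length ? keys[pos++] : null; }

    // First key of the next block, or null when the current block is the last one.
    public Integer nextIndexedKey() {
        int nextBlockStart = (pos / blockSize + 1) * blockSize;
        return nextBlockStart < keys.length ? keys[nextBlockStart] : null;
    }
}

// Wrapper that hides odd keys. Before the interface change it only needed
// peek() and next(); recompiling against the new interface fails until
// nextIndexedKey() is added.
class EvenKeysScanner implements KeyScanner {
    private final KeyScanner delegate;

    EvenKeysScanner(KeyScanner delegate) {
        this.delegate = delegate;
        skipOdd();
    }

    private void skipOdd() {
        while (delegate.peek() != null && delegate.peek() % 2 != 0) delegate.next();
    }

    public Integer peek() { return delegate.peek(); }

    public Integer next() {
        Integer k = delegate.next();
        skipOdd();
        return k;
    }

    // Accommodation for the new method: forward the delegate's hint.
    public Integer nextIndexedKey() { return delegate.nextIndexedKey(); }
}

class WrapperDemo {
    public static void main(String[] args) {
        KeyScanner s = new EvenKeysScanner(new ArrayKeyScanner(new int[] {1, 2, 3, 4, 5, 6, 7, 8}, 4));
        for (Integer k = s.next(); k != null; k = s.next()) {
            System.out.println("key=" + k + " nextIndexedKey=" + s.nextIndexedKey());
        }
    }
}
```

Forwarding is only reasonable when the wrapper reads the same underlying blocks as its delegate; a wrapper that cannot vouch for the hint could return `null` instead, which in this sketch means "no hint".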
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14362125#comment-14362125 ] Andrew Purtell commented on HBASE-13109:
Let's see what Lars says first. I'd like to accommodate Phoenix if we can.
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14362197#comment-14362197 ] Lars Hofhansl commented on HBASE-13109:
I do not think I can fix this without the extra methods. Lemme have a look. We can also undo this for 0.98.12 and put it into 0.98.13 instead, by which time there should be new point versions of Phoenix.
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14361526#comment-14361526 ] Andrew Purtell commented on HBASE-13109:
So here's where I think we are:
- This commit is fine.
- No new base classes or inheritance hierarchy changes.
- Handle updates to Phoenix on a Phoenix JIRA.
I can push a 0.98.12 SNAPSHOT to Maven if that would help. Any issues with this, [~giacomotaylor] [~lhofhansl]?
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14361604#comment-14361604 ] James Taylor commented on HBASE-13109:
I believe this essentially breaks all existing 4.x Phoenix releases (at a minimum, it would break mutable and local secondary indexes). The only way Phoenix users will be able to use 0.98.12+ is to wait for the next Phoenix release and upgrade to that one (at a minimum on the server side). Not sure if this is a problem for the user community.
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358923#comment-14358923 ] Andrew Purtell commented on HBASE-13109:
I would rather not impose a performance penalty on 0.98 for the sake of compatibility with Phoenix where they've used a private interface. When I do a 0.98 RC, I check whether Phoenix compiles against the heads of master, and sometimes branch 4.0 if I have time. We could actually get that working today: I could publish a 0.98.12 SNAPSHOT including this change to Maven now, update the Phoenix POMs on branches master and 4.0 to use the snapshot, and add the necessary methods there. [~jamestaylor]?
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358892#comment-14358892 ] Lars Hofhansl commented on HBASE-13109:
Hmm... Not sure what we can do other than (1) say that by using a private/evolving interface you're on your own, or (2) roll this back from 0.98. I'd be fine with #2. I suppose we could add implementations of these methods in Phoenix now (it's perfectly OK to just return null; that just means this optimization will not be used).
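To illustrate why returning null is safe, here is a minimal, hypothetical Java sketch of a SEEK vs SKIP decision. It is not HBase's StoreScanner code: distinct sorted long keys stand in for Cells, fixed-size groups stand in for HFile blocks, and getNextIndexedKey() models the hint as the first key of the next block. If the seek target sorts before that key, the target must be in the current block, so stepping forward (SKIP) avoids a full index seek; a null hint simply falls back to SEEK, i.e. the optimization is not used.

```java
import java.util.Arrays;

/**
 * Hypothetical, simplified model of a SEEK vs SKIP decision. Distinct sorted
 * long keys stand in for Cells and fixed-size groups stand in for HFile blocks.
 * None of these names are HBase APIs.
 */
class SeekOrSkipSketch {
    private final long[] keys;      // distinct keys in ascending order
    private final int blockSize;    // keys per toy "block"
    private final boolean hasHint;  // false models a scanner whose hint method returns null
    private int pos = 0;            // index of the current key
    int seeks = 0;
    int skips = 0;

    SeekOrSkipSketch(long[] keys, int blockSize, boolean hasHint) {
        this.keys = keys;
        this.blockSize = blockSize;
        this.hasHint = hasHint;
    }

    /**
     * First key of the next block, Long.MAX_VALUE if the current block is the
     * last one, or null when no hint is available.
     */
    Long getNextIndexedKey() {
        if (!hasHint) {
            return null;
        }
        int nextBlockStart = (pos / blockSize + 1) * blockSize;
        return nextBlockStart < keys.length ? keys[nextBlockStart] : Long.MAX_VALUE;
    }

    /** Positions the scanner at the first key >= target, choosing SKIP or SEEK. */
    void advanceTo(long target) {
        Long nextIndexedKey = getNextIndexedKey();
        if (nextIndexedKey != null && target < nextIndexedKey) {
            // Target is within the current block: SKIP by stepping forward.
            while (pos < keys.length && keys[pos] < target) {
                pos++;
            }
            skips++;
        } else {
            // No hint, or target is past this block: SEEK. A binary search
            // stands in for a block-index lookup.
            int idx = Arrays.binarySearch(keys, pos, keys.length, target);
            pos = idx >= 0 ? idx : -idx - 1;
            seeks++;
        }
    }

    long current() {
        return keys[pos];
    }

    public static void main(String[] args) {
        long[] keys = {10, 20, 30, 40, 50, 60, 70, 80};
        SeekOrSkipSketch hinted = new SeekOrSkipSketch(keys, 4, true);
        hinted.advanceTo(30); // same block as 10: SKIP
        hinted.advanceTo(70); // beyond the first block: SEEK
        SeekOrSkipSketch unhinted = new SeekOrSkipSketch(keys, 4, false);
        unhinted.advanceTo(30); // null hint: always SEEK
        System.out.println(hinted.skips + " " + hinted.seeks + " " + unhinted.seeks);
    }
}
```

Running main prints 1 1 1: the hinted scanner skipped once and sought once, while the unhinted one had to seek even though the target was in the same block.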
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359047#comment-14359047 ] James Taylor commented on HBASE-13109:
Would it be possible to have base classes for these that we can extend, to shield us from interface additions? The FilteredKeyValueScanner class is deep in the bowels of mutable secondary indexing. [~jesse_yates], any ideas for how to get this onto interfaces that are not private/evolving? The HFileScanner anonymous implementation is in the bowels of local indexes. Same question, [~rajeshbabu]: any ideas for how to get this onto interfaces that are not private/evolving?
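A rough sketch of the base-class idea, assuming a pre-Java 8 target where interfaces cannot carry default methods. ScannerLike, ScannerBase and ListScanner are hypothetical stand-ins for an interface such as KeyValueScanner and an implementation such as FilteredKeyValueScanner; they are not HBase or Phoenix classes, and String is a placeholder key type. The library would give each newly added method a safe default in the base class, so subclasses keep compiling. As Andrew Purtell's reply points out, moving existing implementations onto such a base class is itself a one-time incompatible change.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for a private scanner interface such as KeyValueScanner. */
interface ScannerLike {
    String peek();
    String next();
    /** Method added in a later release; null means "no hint". */
    String getNextIndexedKey();
}

/**
 * Hypothetical base class shipped next to the interface. It gives every
 * newly added method a safe default, so subclasses keep compiling.
 */
abstract class ScannerBase implements ScannerLike {
    @Override
    public String getNextIndexedKey() {
        return null; // no hint: callers fall back to seeking
    }
}

/** Downstream scanner that implements only the methods it cares about. */
class ListScanner extends ScannerBase {
    private final Iterator<String> it;
    private String current;

    ListScanner(List<String> items) {
        this.it = items.iterator();
        this.current = it.hasNext() ? it.next() : null;
    }

    @Override
    public String peek() {
        return current;
    }

    @Override
    public String next() {
        String result = current;
        current = it.hasNext() ? it.next() : null;
        return result;
    }
}

class BaseClassShimDemo {
    public static void main(String[] args) {
        ScannerLike s = new ListScanner(Arrays.asList("a", "b"));
        String first = s.next();
        System.out.println(first + s.peek() + " hint=" + s.getNextIndexedKey());
    }
}
```

main prints ab hint=null: ListScanner never mentions getNextIndexedKey(), yet the inherited default satisfies the interface.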
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359067#comment-14359067 ] Andrew Purtell commented on HBASE-13109:
bq. Would it be possible to have base classes for these we can extend from to shield us from interface additions?
Yes, we can add these, but moving to the base classes is still one incompatible change. OK?
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359074#comment-14359074 ] Andrew Purtell commented on HBASE-13109:
Also, see my suggestion above to move to a 0.98 SNAPSHOT after making the change. Alternatively, when working on the next release I can just ignore that Phoenix won't compile until it is updated with the 0.98.12 RC.
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359115#comment-14359115 ] Andrew Purtell commented on HBASE-13109:
bq. Want to make sure I understand the b/w compat implications. Does this mean that our current 4.3 and below releases will no longer work with 0.98.12 and above? And that 4.4 and above will only work with 0.98.12 and above?
<= 4.3 won't compile. I ran the binary compatibility checker, and it reports that adding the abstract method getNextIndexedKey() to the interfaces has no effect on binary compatibility.
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359130#comment-14359130 ] Andrew Purtell commented on HBASE-13109:
If we add base classes and change the inheritance hierarchy, that may impact binary compat.
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359189#comment-14359189 ] Lars Hofhansl commented on HBASE-13109:
It'd be best to add the methods to Phoenix. If we do not add an @Override annotation, that would work with both old and new versions of HBase.
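A minimal sketch of the no-@Override approach. ScannerApiBefore and ScannerApiAfter are hypothetical interfaces modeling KeyValueScanner/HFileScanner before and after the change; seek(long) and the byte[] return type are placeholders, not the real HBase signatures. The same class body compiles against either version: against the old interface getNextIndexedKey() is just an extra public method, and against the new one it implements the interface method. Returning null means the scanner provides no hint, so the optimization is simply not used.

```java
/** Models the scanner interface before getNextIndexedKey() existed (hypothetical). */
interface ScannerApiBefore {
    boolean seek(long key);
}

/** Models the same interface after the method was added (hypothetical). */
interface ScannerApiAfter {
    boolean seek(long key);
    byte[] getNextIndexedKey();
}

/** Compiled against the old interface: the extra method is simply an unused public method. */
class ShimOnOldApi implements ScannerApiBefore {
    @Override
    public boolean seek(long key) {
        return true;
    }

    // Deliberately no @Override: adding it here would fail to compile, since
    // ScannerApiBefore declares no such method.
    public byte[] getNextIndexedKey() {
        return null; // no hint, so the caller keeps seeking as before
    }
}

/** Identical body compiled against the new interface: the method now implements it. */
class ShimOnNewApi implements ScannerApiAfter {
    @Override
    public boolean seek(long key) {
        return true;
    }

    public byte[] getNextIndexedKey() {
        return null;
    }
}

class NoOverrideDemo {
    public static void main(String[] args) {
        ScannerApiAfter scanner = new ShimOnNewApi();
        System.out.println(scanner.getNextIndexedKey() == null);
        System.out.println(new ShimOnOldApi().getNextIndexedKey() == null);
    }
}
```

main prints true twice. Adding @Override to getNextIndexedKey() in ShimOnOldApi would be a compile error, which is why the annotation has to be left off for the code to build against both old and new interface versions.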
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359085#comment-14359085 ] James Taylor commented on HBASE-13109:
Want to make sure I understand the b/w compat implications. Does this mean that our current 4.3 and below releases will no longer work with 0.98.12 and above? And that 4.4 and above will only work with 0.98.12 and above?
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357707#comment-14357707 ] Andrew Purtell commented on HBASE-13109:
[~jesse_yates]
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357703#comment-14357703 ] Andrew Purtell commented on HBASE-13109:
The commit of this to the 0.98 branch breaks Phoenix compilation when using -Dhbase.version=0.98.12-SNAPSHOT (after installing the latest 0.98 into the local Maven cache):
{noformat}
[ERROR] /Users/apurtell/src/phoenix/phoenix-core/src/main/java/org/apache/hadoop/hbase/regionserver/IndexHalfStoreFileReader.java:[141,35] anonymous org.apache.hadoop.hbase.regionserver.IndexHalfStoreFileReader$1 is not abstract and does not override abstract method getNextIndexedKey() in org.apache.hadoop.hbase.io.hfile.HFileScanner
[ERROR] /Users/apurtell/src/phoenix/phoenix-core/src/main/java/org/apache/phoenix/hbase/index/scanner/FilteredKeyValueScanner.java:[37,8] org.apache.phoenix.hbase.index.scanner.FilteredKeyValueScanner is not abstract and does not override abstract method getNextIndexedKey() in org.apache.hadoop.hbase.regionserver.KeyValueScanner
{noformat}
KeyValueScanner and HFileScanner are both marked as InterfaceAudience.Private. What should we do here? [~jamestaylor]
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347025#comment-14347025 ] Ted Yu commented on HBASE-13109: +1 Make better SEEK vs SKIP decisions during scanning -- Key: HBASE-13109 URL: https://issues.apache.org/jira/browse/HBASE-13109 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Priority: Minor Fix For: 2.0.0, 1.0.1, 1.1.0, 0.98.12 Attachments: 13109-0.98-v4.txt, 13109-trunk-v2.txt, 13109-trunk-v3.txt, 13109-trunk-v4.txt, 13109-trunk-v5.txt, 13109-trunk.txt, nextIndexKVChange_new.patch I'm re-purposing this issue to add a heuristic as to when to SEEK and when to SKIP Cells. This has come up in various issues, and I think I have a way to finally fix this now. HBASE-9778, HBASE-12311, and friends are related. --- Old description --- This is a continuation of HBASE-9778. We've seen a scenario of a very slow scan over a region using a timerange that happens to fall after the ts of any Cell in the region. Turns out we spend a lot of time seeking. Tested with a 5 column table, and the scan is 5x faster when the timerange falls before all Cells' ts. We can use the lookahead hint introduced in HBASE-9778 to do opportunistic SKIPing before we actually seek. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347800#comment-14347800 ] Hudson commented on HBASE-13109:
FAILURE: Integrated in HBase-1.1 #247 (See [https://builds.apache.org/job/HBase-1.1/247/])
HBASE-13109 Make better SEEK vs SKIP decisions during scanning. (larsh: rev f5020e9c1a98727cb100f24294df50072d599bf8)
* hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileScanner.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
* hbase-client/src/main/java/org/apache/hadoop/hbase/client/Scan.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/io/HalfStoreFileReader.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/BlockWithScanInfo.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestQueryMatcher.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/NonLazyKeyValueScanner.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueScanner.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java
* hbase-common/src/main/java/org/apache/hadoop/hbase/KeyValue.java
* hbase-common/src/main/java/org/apache/hadoop/hbase/HConstants.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347797#comment-14347797 ] Hudson commented on HBASE-13109:
FAILURE: Integrated in HBase-1.0 #788 (See [https://builds.apache.org/job/HBase-1.0/788/])
HBASE-13109 Make better SEEK vs SKIP decisions during scanning. (larsh: rev a3e9325150de4ad89f3032535be8e20fb352f182)
* hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueScanner.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileScanner.java
* hbase-common/src/main/java/org/apache/hadoop/hbase/KeyValue.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/BlockWithScanInfo.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/NonLazyKeyValueScanner.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestQueryMatcher.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
* hbase-client/src/main/java/org/apache/hadoop/hbase/client/Scan.java
* hbase-common/src/main/java/org/apache/hadoop/hbase/HConstants.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/io/HalfStoreFileReader.java
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347894#comment-14347894 ] Jonathan Lawlor commented on HBASE-13109:
Looks like this introduced some new javadoc warnings that are being called out in other precommit build checks:
{quote}
[WARNING] /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java:82: warning - @param argument lookAhead is not a parameter name.
[WARNING] /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java:587: warning - @param argument off is not a parameter name.
[WARNING] /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java:587: warning - @param argument len is not a parameter name.
[WARNING] /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java:602: warning - @param argument off is not a parameter name.
[WARNING] /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java:602: warning - @param argument len is not a parameter name.
{quote}
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348033#comment-14348033 ] Hudson commented on HBASE-13109:
FAILURE: Integrated in HBase-1.1 #249 (See [https://builds.apache.org/job/HBase-1.1/249/])
HBASE-13109 Fix Javadoc warning; and some misc checkstyle warnings (larsh: rev 1cdcb6e9b8d386d43b482ff8a5aa6f1c0e3c6791)
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348043#comment-14348043 ] Hudson commented on HBASE-13109:
SUCCESS: Integrated in HBase-0.98-on-Hadoop-1.1 #840 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/840/])
HBASE-13109 Make better SEEK vs SKIP decisions during scanning. (larsh: rev b3bd0016492eb99e3a83353f0879bfddebff4ec1)
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
* hbase-client/src/main/java/org/apache/hadoop/hbase/client/Scan.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/NonLazyKeyValueScanner.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueScanner.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestQueryMatcher.java
* hbase-common/src/main/java/org/apache/hadoop/hbase/KeyValue.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/io/HalfStoreFileReader.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileScanner.java
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348065#comment-14348065 ] Hudson commented on HBASE-13109:
SUCCESS: Integrated in HBase-TRUNK #6207 (See [https://builds.apache.org/job/HBase-TRUNK/6207/])
HBASE-13109 Fix Javadoc warning; and some misc checkstyle warnings (larsh: rev 0bdab85b065bd0876152ac30c2ec6d08adae8006)
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347964#comment-14347964 ] Hudson commented on HBASE-13109:
SUCCESS: Integrated in HBase-0.98 #883 (See [https://builds.apache.org/job/HBase-0.98/883/])
HBASE-13109 Make better SEEK vs SKIP decisions during scanning. (larsh: rev b3bd0016492eb99e3a83353f0879bfddebff4ec1)
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileScanner.java
* hbase-client/src/main/java/org/apache/hadoop/hbase/client/Scan.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueScanner.java
* hbase-common/src/main/java/org/apache/hadoop/hbase/KeyValue.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestQueryMatcher.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/io/HalfStoreFileReader.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/NonLazyKeyValueScanner.java
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348150#comment-14348150 ] Hudson commented on HBASE-13109:
FAILURE: Integrated in HBase-0.98 #884 (See [https://builds.apache.org/job/HBase-0.98/884/])
HBASE-13109 Fix Javadoc warning; and some misc checkstyle warnings (larsh: rev 2eda262dfee9889a008cb53d5c8a2a73959934e4)
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348086#comment-14348086 ] Hudson commented on HBASE-13109:
FAILURE: Integrated in HBase-1.0 #789 (See [https://builds.apache.org/job/HBase-1.0/789/])
HBASE-13109 Fix Javadoc warning; and some misc checkstyle warnings (larsh: rev d72bb2f6a60bdf2ac9daf639f18030eee2ea9773)
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347788#comment-14347788 ] Hudson commented on HBASE-13109:
SUCCESS: Integrated in HBase-TRUNK #6205 (See [https://builds.apache.org/job/HBase-TRUNK/6205/])
HBASE-13109 Make better SEEK vs SKIP decisions during scanning. (larsh: rev 464e7ce685486e3ede13ec2351b45b0a0b65696c)
* hbase-client/src/main/java/org/apache/hadoop/hbase/client/Scan.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/BlockWithScanInfo.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileScanner.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
* hbase-common/src/main/java/org/apache/hadoop/hbase/KeyValue.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/NonLazyKeyValueScanner.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java
* hbase-common/src/main/java/org/apache/hadoop/hbase/HConstants.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/io/HalfStoreFileReader.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueScanner.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestQueryMatcher.java
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347928#comment-14347928 ] Lars Hofhansl commented on HBASE-13109:
Uh oh... Lemme fix those.
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347979#comment-14347979 ] Lars Hofhansl commented on HBASE-13109:
Updated all branches.
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348327#comment-14348327 ] Hudson commented on HBASE-13109:
FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #841 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/841/])
HBASE-13109 Fix Javadoc warning; and some misc checkstyle warnings (larsh: rev 2eda262dfee9889a008cb53d5c8a2a73959934e4)
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14344810#comment-14344810 ] ramkrishna.s.vasudevan commented on HBASE-13109:
The optimize() logic makes sense. I think it will be particularly useful when there is only one version of a cell and the filters/trackers say SEEK_TO_ROW/COL. Changing the HFileBlock index to Cell is fine unless you are concerned about the number of objects being created and thrown away. In that case we may need a different approach, but for the NO_INDEX_KEY case we cannot go with an '==' check. In this case that problem does not arise.
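As a rough illustration of the optimize() idea discussed here, the sketch below downgrades a SEEK to a SKIP when the seek target is still inside the current block. It is a minimal, simplified Java sketch, not the patch's code. It assumes that keys are reduced to (row, column) strings and that `nextIndexedKey` is the first key of the next block (the lookahead hint from HBASE-9778), with `null` meaning no hint. All class and method names are hypothetical, and the MatchCode names are only modeled on ScanQueryMatcher's.

```java
// Hypothetical, simplified model of a SEEK-vs-SKIP heuristic; not HBase source code.
// Keys are reduced to (row, column) strings compared lexicographically; timestamps,
// key types and HBase's real comparators are deliberately left out.
final class SeekOrSkipSketch {

  /** Illustrative subset of matcher decisions (names modeled on ScanQueryMatcher.MatchCode). */
  enum MatchCode {
    INCLUDE, SKIP, SEEK_NEXT_COL, SEEK_NEXT_ROW, INCLUDE_AND_SEEK_NEXT_COL, INCLUDE_AND_SEEK_NEXT_ROW
  }

  /** Hypothetical stand-in for a Cell key: just row and column. */
  static final class Key {
    final String row;
    final String column;

    Key(String row, String column) {
      this.row = row;
      this.column = column;
    }
  }

  /** Orders keys by row, then column. */
  static int compareRowCol(Key a, Key b) {
    int c = a.row.compareTo(b.row);
    return c != 0 ? c : a.column.compareTo(b.column);
  }

  /**
   * Turns a SEEK into a SKIP (and INCLUDE_AND_SEEK into INCLUDE) when the next block starts
   * at or after the seek target, so stepping forward never leaves the current block early.
   * A null nextIndexedKey models "no lookahead hint": the seek is kept.
   */
  static MatchCode optimize(MatchCode qcode, Key cell, Key nextIndexedKey) {
    if (nextIndexedKey == null) {
      return qcode;
    }
    switch (qcode) {
      case SEEK_NEXT_COL:
      case INCLUDE_AND_SEEK_NEXT_COL:
        // Next block starts in a later column/row: the rest of this column is in the
        // current block, and the next column starts here or exactly at the next block.
        if (compareRowCol(nextIndexedKey, cell) > 0) {
          return qcode == MatchCode.SEEK_NEXT_COL ? MatchCode.SKIP : MatchCode.INCLUDE;
        }
        break;
      case SEEK_NEXT_ROW:
      case INCLUDE_AND_SEEK_NEXT_ROW:
        // Same reasoning at row granularity.
        if (nextIndexedKey.row.compareTo(cell.row) > 0) {
          return qcode == MatchCode.SEEK_NEXT_ROW ? MatchCode.SKIP : MatchCode.INCLUDE;
        }
        break;
      default:
        break;
    }
    return qcode;
  }

  public static void main(String[] args) {
    Key cell = new Key("row1", "colB");
    // Next block starts at a later row, so the rest of row1 is in this block.
    System.out.println(optimize(MatchCode.SEEK_NEXT_ROW, cell, new Key("row2", "colA"))); // prints SKIP
    // Next block starts inside row1, so a seek may jump over whole blocks.
    System.out.println(optimize(MatchCode.SEEK_NEXT_ROW, cell, new Key("row1", "colZ"))); // prints SEEK_NEXT_ROW
  }
}
```

In this simplified model, with one version per column the next column is the very next cell. A SEEK_NEXT_COL then almost always becomes a SKIP, which matches the case called out in the comment above.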
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14345276#comment-14345276 ] stack commented on HBASE-13109:
[~lhofhansl] if you describe the test you'd like, I can try it here.
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14345553#comment-14345553 ] Lars Hofhansl commented on HBASE-13109:
Actually, PE only writes single-column rows; this needs many columns (or deletes) to show any improvement.
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14346481#comment-14346481 ] ramkrishna.s.vasudevan commented on HBASE-13109:
+1 on patch. Nice work!!
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14346204#comment-14346204 ] Lars Hofhansl commented on HBASE-13109:
Did some more tests with Phoenix against 0.98, including some of the tests they used to validate their optimization: always using the WildcardColumnMatcher and doing the filtering themselves, to avoid the cost of the ExplicitColumnTracker (which does the seeking).
Testing with 7 columns. In one scenario all 7 columns were in the same CF; in the other, each column was in its own column family.
Ran two queries:
q1 = select count(1) where v3 = and v5 =
q2 = select avg(v2) where v3 = and v5 =
1CF case:
|| ||q1 w/ Phoenix opt||q1 w/o Phoenix opt||q2 w/ Phoenix opt||q2 w/o Phoenix opt||
|w/o patch|12.9|8.4|18.0|8.3|
|w/ patch|7.5|7.2|7.5|7.1|
Two observations:
# Even with the Phoenix optimization this is faster, because a bunch of SEEK_NEXT_ROWs are avoided unless they're necessary.
# The whole optimization is unnecessary now; it saves less than 10% in the *best* case, with only one version per cell.
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14346217#comment-14346217 ] Lars Hofhansl commented on HBASE-13109:
Same, but with each column in its own CF (in this case Phoenix does not use its WildcardTracker + Filter optimization).
6CF case:
|| ||q1||q2||
|w/o patch|15.3|15.5|
|w/ patch|9.14|9.19|
Any objection to committing this to all branches?
[~giacomotaylor], FYI (we can probably remove the ColumnProjectionFilter optimization once this is in).
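The 6CF numbers can be expressed as speedups: the time without the patch divided by the time with it. This is only arithmetic on the reported figures, and the comment does not state their units.

```latex
\frac{t_{\text{w/o patch}}}{t_{\text{w/ patch}}}:\qquad
\text{q1: } \frac{15.3}{9.14} \approx 1.67, \qquad
\text{q2: } \frac{15.5}{9.19} \approx 1.69
```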
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14346235#comment-14346235 ] James Taylor commented on HBASE-13109:
Awesome - nice work, [~lhofhansl]!
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14344622#comment-14344622 ] ramkrishna.s.vasudevan commented on HBASE-13109:
{code}
this.nextIndexedKV == HConstants.NO_NEXT_INDEXED_KV
{code}
This change in the patch is not quite right. Sorry about that. Here we may have to do a compare only if we change to Cell.
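One general way an `==` check against a sentinel can go wrong is when the key reaching the check is a copy rather than the sentinel object itself. The sketch below is a generic Java illustration of that pitfall, not HBase code. `NO_NEXT_INDEXED_KEY` is a hypothetical stand-in for the constant discussed above, and a `byte[]` key is assumed for illustration.

```java
import java.util.Arrays;

// Generic Java illustration, not HBase code. NO_NEXT_INDEXED_KEY is a hypothetical
// stand-in for a "no next indexed key" sentinel held as a byte[] key.
final class SentinelSketch {
  static final byte[] NO_NEXT_INDEXED_KEY = new byte[0];

  /** True only if the caller handed us the sentinel object itself. */
  static boolean isSentinelByIdentity(byte[] key) {
    return key == NO_NEXT_INDEXED_KEY;
  }

  /** True for any array with the same contents (here: any empty array). */
  static boolean isSentinelByContent(byte[] key) {
    return Arrays.equals(key, NO_NEXT_INDEXED_KEY);
  }

  public static void main(String[] args) {
    byte[] passedThrough = NO_NEXT_INDEXED_KEY;
    byte[] copied = Arrays.copyOf(NO_NEXT_INDEXED_KEY, NO_NEXT_INDEXED_KEY.length);
    System.out.println(isSentinelByIdentity(passedThrough)); // prints true
    System.out.println(isSentinelByIdentity(copied));        // prints false: a copy is a new object
    System.out.println(isSentinelByContent(copied));         // prints true
  }
}
```

An identity check is cheap and cannot collide with a real key, but it only works if the sentinel object is propagated unchanged. A content comparison survives copying, but it costs a compare and would also match any real key with identical bytes.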
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14344671#comment-14344671 ] Lars Hofhansl commented on HBASE-13109: --- Oh and saw your sample patch. I think my version is even more radical... I change indexed key to Cell everywhere above the HFileBlockIndex :) Make better SEEK vs SKIP decisions during scanning -- Key: HBASE-13109 URL: https://issues.apache.org/jira/browse/HBASE-13109 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Priority: Minor Attachments: 13109-0.98-v4.txt, 13109-trunk-v2.txt, 13109-trunk-v3.txt, 13109-trunk-v4.txt, 13109-trunk-v5.txt, 13109-trunk.txt, nextIndexKVChange_new.patch
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14344598#comment-14344598 ] Lars Hofhansl commented on HBASE-13109: --- Ok... With deletes. Same as above but with an additional 400k deletes (deleted all columns every 10th row).
Without patch:
||Wildcard||Col 2+4||
|4.38|12.0|
With patch:
||Wildcard||Col 2+4||
|4.39|4.74|
Make better SEEK vs SKIP decisions during scanning -- Key: HBASE-13109 URL: https://issues.apache.org/jira/browse/HBASE-13109 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Priority: Minor Attachments: 13109-trunk-v2.txt, 13109-trunk-v3.txt, 13109-trunk-v4.txt, 13109-trunk.txt
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14344663#comment-14344663 ] Lars Hofhansl commented on HBASE-13109: --- Ah. Too late, made a patch already :) And it does make things nicer. I'm fine with either -v4 or -v5. Make better SEEK vs SKIP decisions during scanning -- Key: HBASE-13109 URL: https://issues.apache.org/jira/browse/HBASE-13109 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Priority: Minor Attachments: 13109-0.98-v4.txt, 13109-trunk-v2.txt, 13109-trunk-v3.txt, 13109-trunk-v4.txt, 13109-trunk-v5.txt, 13109-trunk.txt, nextIndexKVChange_new.patch
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14344510#comment-14344510 ] Lars Hofhansl commented on HBASE-13109: --- Some more tests with many HFiles (4m rows, 5 cols, 1 version - as above). No compactions at all - 25 HFiles of 10 MB each.
Without patch:
||Wildcard||Col 2+4||
|4.59|13.5|
With patch:
||Wildcard||Col 2+4||
|4.38|4.89|
Almost a 3x improvement for Col 2+4. I have convinced myself that this is good. Is somebody up for independent tests? I have a 0.98 patch as well (in fact, that's the one I used for testing).
Make better SEEK vs SKIP decisions during scanning -- Key: HBASE-13109 URL: https://issues.apache.org/jira/browse/HBASE-13109 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Priority: Minor Attachments: 13109-trunk-v2.txt, 13109-trunk-v3.txt, 13109-trunk-v4.txt, 13109-trunk.txt
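For reference, here are the ratios from the two tables in this comment. Each is the without-patch time divided by the with-patch time. The comment gives no units, though the 0.7s saving quoted elsewhere in the thread suggests seconds.

```latex
\frac{13.5}{4.89} \approx 2.76 \quad (\text{Col 2+4}), \qquad \frac{4.59}{4.38} \approx 1.05 \quad (\text{Wildcard})
```

That is roughly a 2.8x speedup for the explicit-column scan. The wildcard scan is essentially unchanged.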
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14344656#comment-14344656 ] Hadoop QA commented on HBASE-13109: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12702089/nextIndexKVChange_new.patch against master branch at commit daed00fc98167870463e77b620e9adb6ce9b204d. ATTACHMENT ID: 12702089 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.1 2.5.2 2.6.0) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:red}-1 core tests{color}. The patch failed these unit tests: org.apache.hadoop.hbase.regionserver.TestBlocksScanned {color:red}-1 core zombie tests{color}. 
There are 1 zombie test(s): at org.apache.camel.test.junit4.CamelTestSupport.doStopCamelContext(CamelTestSupport.java:450) at org.apache.camel.test.junit4.CamelTestSupport.tearDown(CamelTestSupport.java:351) Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/13052//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13052//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13052//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13052//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13052//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13052//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13052//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13052//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13052//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13052//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13052//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13052//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13052//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/13052//console This message is automatically generated. Make better SEEK vs SKIP decisions during scanning -- Key: HBASE-13109 URL: https://issues.apache.org/jira/browse/HBASE-13109 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Priority: Minor Attachments: 13109-0.98-v4.txt, 13109-trunk-v2.txt, 13109-trunk-v3.txt, 13109-trunk-v4.txt, 13109-trunk.txt, nextIndexKVChange_new.patch
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14344668#comment-14344668 ] Lars Hofhansl commented on HBASE-13109: --- TestBlocksScanned passed locally. Make better SEEK vs SKIP decisions during scanning -- Key: HBASE-13109 URL: https://issues.apache.org/jira/browse/HBASE-13109 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Priority: Minor Attachments: 13109-0.98-v4.txt, 13109-trunk-v2.txt, 13109-trunk-v3.txt, 13109-trunk-v4.txt, 13109-trunk-v5.txt, 13109-trunk.txt, nextIndexKVChange_new.patch
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14344605#comment-14344605 ] ramkrishna.s.vasudevan commented on HBASE-13109: I'm not hijacking the issue, just trying to convey my point here. Make better SEEK vs SKIP decisions during scanning -- Key: HBASE-13109 URL: https://issues.apache.org/jira/browse/HBASE-13109 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Priority: Minor Attachments: 13109-trunk-v2.txt, 13109-trunk-v3.txt, 13109-trunk-v4.txt, 13109-trunk.txt, nextIndexKVChange_new.patch
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14344626#comment-14344626 ] Lars Hofhansl commented on HBASE-13109: --- NP [~ram_krish]. Thanks for taking a look! I am not actually too concerned about the KeyOnlyKeyValue object (actually, could you have a look at the first patch I attached here, where I optimized it a bit?). I can make a Cell from the indexed key (in trunk at least). But KeyValue.KVComparator.compareOnlyKeyPortion(Cell, Cell) will not work, because I cannot make a Cell from the seek Cell in SQM without materializing the byte[]... That's the part I have to avoid. What I could do (in trunk) is... Instead of:
{code}
public int compareKey(byte[] key, int koff, int klen, byte[] row, int roff, int rlen, byte[] fam, int foff, int flen, byte[] col, int coff, int clen, long ts, byte type)
{code}
We'd wrap the indexed key in a KeyOnlyKeyValue and have:
{code}
public int compareKey(Cell cell, byte[] row, int roff, int rlen, byte[] fam, int foff, int flen, byte[] col, int coff, int clen, long ts, byte type)
{code}
Then I actually think we should do it all the way down at AbstractHFileScanner and store the nextIndexedKey as a Cell instead of a byte[]. Lemme do that.
Make better SEEK vs SKIP decisions during scanning -- Key: HBASE-13109 URL: https://issues.apache.org/jira/browse/HBASE-13109 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Priority: Minor Attachments: 13109-trunk-v2.txt, 13109-trunk-v3.txt, 13109-trunk-v4.txt, 13109-trunk.txt, nextIndexKVChange_new.patch
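The sketch below shows how an in-place comparison like the byte[]-based compareKey signature above might work. It assumes the serialized KeyValue key layout: 2-byte row length, row, 1-byte family length, family, qualifier, 8-byte timestamp, 1-byte type. The class and helper names are hypothetical and this is not HBase's implementation. It also ignores comparator special cases such as "last on row" keys. What it illustrates is that the indexed key can be compared with the seek coordinates field by field, without allocating a KeyValue, a Cell, or a new byte[].

```java
/**
 * Hypothetical sketch (not HBase code): compare a serialized key against a key given
 * as separate components, in place and without allocating.
 * Assumed layout: rowLen(2) | row | famLen(1) | family | qualifier | ts(8) | type(1).
 * Simplified: ignores special cases such as "last on row" keys.
 */
final class InPlaceKeyCompare {
  private static final int TS_AND_TYPE = 8 + 1;

  /** Lexicographic comparison of byte ranges, treating bytes as unsigned. */
  static int compareBytes(byte[] a, int aOff, int aLen, byte[] b, int bOff, int bLen) {
    int n = Math.min(aLen, bLen);
    for (int i = 0; i < n; i++) {
      int x = a[aOff + i] & 0xff;
      int y = b[bOff + i] & 0xff;
      if (x != y) {
        return x - y;
      }
    }
    return aLen - bLen;
  }

  /** Returns <0, 0 or >0 as the serialized key sorts before, equal to, or after the components. */
  static int compareKey(byte[] key, int koff, int klen,
      byte[] row, int roff, int rlen,
      byte[] fam, int foff, int flen,
      byte[] col, int coff, int clen,
      long ts, byte type) {
    int keyRowLen = ((key[koff] & 0xff) << 8) | (key[koff + 1] & 0xff);
    int c = compareBytes(key, koff + 2, keyRowLen, row, roff, rlen);
    if (c != 0) {
      return c;
    }
    int famLenPos = koff + 2 + keyRowLen;
    int keyFamLen = key[famLenPos] & 0xff;
    c = compareBytes(key, famLenPos + 1, keyFamLen, fam, foff, flen);
    if (c != 0) {
      return c;
    }
    int qualPos = famLenPos + 1 + keyFamLen;
    int keyQualLen = koff + klen - TS_AND_TYPE - qualPos;
    c = compareBytes(key, qualPos, keyQualLen, col, coff, clen);
    if (c != 0) {
      return c;
    }
    // Decode the big-endian timestamp by hand so that no ByteBuffer is allocated.
    int tsPos = qualPos + keyQualLen;
    long keyTs = 0;
    for (int i = 0; i < 8; i++) {
      keyTs = (keyTs << 8) | (key[tsPos + i] & 0xff);
    }
    // Timestamps sort descending: a newer key sorts first.
    c = Long.compare(ts, keyTs);
    if (c != 0) {
      return c;
    }
    // Type codes also sort descending.
    return (type & 0xff) - (key[tsPos + 8] & 0xff);
  }

  /** Test helper: serialize components into the assumed key layout. */
  static byte[] makeKey(byte[] row, byte[] fam, byte[] qual, long ts, byte type) {
    byte[] k = new byte[2 + row.length + 1 + fam.length + qual.length + TS_AND_TYPE];
    int p = 0;
    k[p++] = (byte) (row.length >>> 8);
    k[p++] = (byte) row.length;
    System.arraycopy(row, 0, k, p, row.length);
    p += row.length;
    k[p++] = (byte) fam.length;
    System.arraycopy(fam, 0, k, p, fam.length);
    p += fam.length;
    System.arraycopy(qual, 0, k, p, qual.length);
    p += qual.length;
    for (int i = 7; i >= 0; i--) {
      k[p++] = (byte) (ts >>> (8 * i));
    }
    k[p] = type;
    return k;
  }
}
```

The Cell-based variant in the second signature would read the same fields through accessor methods instead of decoding offsets. The comparison order stays the same.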
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14344524#comment-14344524 ] Lars Hofhansl commented on HBASE-13109: --- One more test I'll do is with deletes. Make better SEEK vs SKIP decisions during scanning -- Key: HBASE-13109 URL: https://issues.apache.org/jira/browse/HBASE-13109 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Priority: Minor Attachments: 13109-trunk-v2.txt, 13109-trunk-v3.txt, 13109-trunk-v4.txt, 13109-trunk.txt
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14344647#comment-14344647 ] ramkrishna.s.vasudevan commented on HBASE-13109:
bq. But KeyValue.KVComparator.compareOnlyKeyPortion(Cell, Cell) will not work, because I cannot make a Cell from the seek Cell in SQM without materializing the byte[]... That's the part I have to avoid.
I understand why you have to avoid that: you cannot use the Cell from SQM directly, because the ts and type are the ones you will be passing. In our internal branch we tried an approach for cases where we want FirstOnRow, LastOnCol or FirstOnCol type KVs: we created a new FirstOnCol cell object that wraps the passed cell, but whose getTS and getType return LATEST_TIMESTAMP/MIN_TIMESTAMP and a MAX/MIN type, depending on what we want, so that the comparison is between two cells. Anyway, I think changing nextIndexedKey to a Cell does not matter, except that there is a new compare() API. I'm fine with carrying on as it is now in your patches. Thanks, Lars.
Make better SEEK vs SKIP decisions during scanning -- Key: HBASE-13109 URL: https://issues.apache.org/jira/browse/HBASE-13109 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Priority: Minor Attachments: 13109-0.98-v4.txt, 13109-trunk-v2.txt, 13109-trunk-v3.txt, 13109-trunk-v4.txt, 13109-trunk.txt, nextIndexKVChange_new.patch
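A rough sketch of the wrapper approach described above follows. The wrapper reuses another cell's row, family and qualifier, but reports a fixed timestamp and type so that it sorts at a chosen edge of that column. Everything here is hypothetical. SimpleCell is a minimal stand-in, not HBase's Cell interface. The constants only mirror the ideas of HBase's LATEST_TIMESTAMP (Long.MAX_VALUE) and the Maximum type code (255). The ordering assumed is row, family and qualifier ascending, then timestamp and type descending.

```java
import java.util.Comparator;

/** Minimal stand-in for a cell (hypothetical; not org.apache.hadoop.hbase.Cell). */
interface SimpleCell {
  byte[] row();
  byte[] family();
  byte[] qualifier();
  long timestamp();
  int type(); // unsigned type code, 0..255
}

/** An ordinary cell that owns its fields. */
final class PlainCell implements SimpleCell {
  private final byte[] row, family, qualifier;
  private final long timestamp;
  private final int type;

  PlainCell(byte[] row, byte[] family, byte[] qualifier, long timestamp, int type) {
    this.row = row;
    this.family = family;
    this.qualifier = qualifier;
    this.timestamp = timestamp;
    this.type = type;
  }

  public byte[] row() { return row; }
  public byte[] family() { return family; }
  public byte[] qualifier() { return qualifier; }
  public long timestamp() { return timestamp; }
  public int type() { return type; }
}

/**
 * "First on column" view of another cell: shares the delegate's arrays (no copy) but
 * reports the newest timestamp and the highest type code, so it sorts before every
 * real cell in the same column.
 */
final class FirstOnColumnView implements SimpleCell {
  static final long LATEST_TIMESTAMP = Long.MAX_VALUE;
  static final int MAXIMUM_TYPE = 255;
  private final SimpleCell delegate;

  FirstOnColumnView(SimpleCell delegate) { this.delegate = delegate; }

  public byte[] row() { return delegate.row(); }
  public byte[] family() { return delegate.family(); }
  public byte[] qualifier() { return delegate.qualifier(); }
  public long timestamp() { return LATEST_TIMESTAMP; }
  public int type() { return MAXIMUM_TYPE; }
}

final class CellOrder {
  /** Lexicographic comparison treating bytes as unsigned. */
  static int compareBytes(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int d = (a[i] & 0xff) - (b[i] & 0xff);
      if (d != 0) {
        return d;
      }
    }
    return a.length - b.length;
  }

  /** Row, family, qualifier ascending; then timestamp and type descending. */
  static final Comparator<SimpleCell> ORDER = (x, y) -> {
    int c = compareBytes(x.row(), y.row());
    if (c == 0) c = compareBytes(x.family(), y.family());
    if (c == 0) c = compareBytes(x.qualifier(), y.qualifier());
    if (c == 0) c = Long.compare(y.timestamp(), x.timestamp());
    if (c == 0) c = Integer.compare(y.type(), x.type());
    return c;
  };
}
```

A "last on column" view would do the opposite: report the oldest timestamp and the lowest type code. In both cases no new byte[] is materialized, because the view hands out the delegate's arrays.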
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14342951#comment-14342951 ] Hadoop QA commented on HBASE-13109: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12701810/13109-trunk-v3.txt against master branch at commit 4fb6f91cbad7d9b3c18f897ee3a4f70dc7c21595. ATTACHMENT ID: 12701810 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.1 2.5.2 2.6.0) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning messages. {color:red}-1 checkstyle{color}. The applied patch generated 1938 checkstyle errors (more than the master's current 1937 errors). {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:red}-1 core tests{color}. 
The patch failed these unit tests: org.apache.hadoop.hbase.security.access.TestScanEarlyTermination Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/13032//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13032//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13032//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13032//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13032//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13032//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13032//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13032//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13032//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13032//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13032//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13032//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/13032//artifact/patchprocess/checkstyle-aggregate.html Javadoc warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13032//artifact/patchprocess/patchJavadocWarnings.txt Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/13032//console This message is automatically generated. Make better SEEK vs SKIP decisions during scanning -- Key: HBASE-13109 URL: https://issues.apache.org/jira/browse/HBASE-13109 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Priority: Minor Attachments: 13109-trunk-v2.txt, 13109-trunk-v3.txt, 13109-trunk.txt
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14343534#comment-14343534 ] ramkrishna.s.vasudevan commented on HBASE-13109: [~larsh] Still going through the logic of the patch. I verified the comparator API that was added; it seems fine to me. But one suggestion: make nextIndexedKey a Cell and create a KeyOnlyKeyValue out of it, then use the normal CellComparator.compare(cell, cell). Would that not work? What do you think? Make better SEEK vs SKIP decisions during scanning -- Key: HBASE-13109 URL: https://issues.apache.org/jira/browse/HBASE-13109 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Priority: Minor Attachments: 13109-trunk-v2.txt, 13109-trunk-v3.txt, 13109-trunk.txt
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14343587#comment-14343587 ] Lars Hofhansl commented on HBASE-13109: --- TestScanEarlyTermination passed locally. Also, thanks [~ram_krish] for looking at the comparator. And re: the KeyOnlyKeyValue, I saw that this is done down in AbstractHFileScanner.reseekTo; there it's not time-critical, since it only happens when we have issued a seek anyway. Make better SEEK vs SKIP decisions during scanning -- Key: HBASE-13109 URL: https://issues.apache.org/jira/browse/HBASE-13109 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Priority: Minor Attachments: 13109-trunk-v2.txt, 13109-trunk-v3.txt, 13109-trunk.txt
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14343575#comment-14343575 ] Lars Hofhansl commented on HBASE-13109: ---
bq. make the nextIndexedKey to a Cell and create a KeyOnlyKeyValue out of it. Then use the normal CellComparator.compare(cell, cell).
I agree that would be nicer. But that would be *very* slow. This decision is made for every single KeyValue (if the SQM decides that it wants to seek). That's why I added a special compare, so that the key can be compared in place without creating a KV object or even a new byte[]. Note that there would be two Cells to be created: (1) the Cell representing the indexed key, and (2) the Cell representing the seek key in the SQM. With this patch no new objects are created at all. Just avoiding creating the key array saved 0.7s over 4m rows (5 cols). Making KeyValues or Cells would be more expensive.
Make better SEEK vs SKIP decisions during scanning -- Key: HBASE-13109 URL: https://issues.apache.org/jira/browse/HBASE-13109 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Priority: Minor Attachments: 13109-trunk-v2.txt, 13109-trunk-v3.txt, 13109-trunk.txt
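The per-KeyValue decision described here can be sketched as a small function. The sketch assumes, as the discussion suggests, that nextIndexedKey is the first key of the next HFile block and that keys compare as unsigned byte strings. The class, method and sentinel names are hypothetical, and this is a simplification of the idea, not the patch's code. For brevity the seek target is passed as a materialized byte[], which is exactly what the real patch avoids by comparing in place. The reasoning: if the target sorts strictly before the next block's first key, it must lie in the block the scanner is already positioned in. Stepping forward (SKIP) is then likely cheaper than a full seek.

```java
/** Hypothetical sketch of a SEEK-vs-SKIP decision based on the next indexed key. */
final class SeekOrSkip {
  enum Action { SEEK, SKIP }

  /** Sentinel meaning "no next indexed key is known"; compared by reference. */
  static final byte[] NO_NEXT_INDEXED_KEY = new byte[0];

  /** Lexicographic comparison treating bytes as unsigned. */
  static int compareUnsigned(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int d = (a[i] & 0xff) - (b[i] & 0xff);
      if (d != 0) {
        return d;
      }
    }
    return a.length - b.length;
  }

  /**
   * @param seekKey        key the query matcher wants to seek to
   * @param nextIndexedKey first key of the next block, or null / NO_NEXT_INDEXED_KEY if unknown
   */
  static Action decide(byte[] seekKey, byte[] nextIndexedKey) {
    if (nextIndexedKey == null || nextIndexedKey == NO_NEXT_INDEXED_KEY) {
      // Nothing to compare against: fall back to a real seek.
      return Action.SEEK;
    }
    // Target strictly before the next block's first key means it is in the current
    // block, so stepping cell by cell avoids a seek through the block index.
    return compareUnsigned(seekKey, nextIndexedKey) < 0 ? Action.SKIP : Action.SEEK;
  }
}
```

The reference check against the sentinel only works while that exact sentinel object is what gets stored. A freshly built wrapper would need an explicit comparison instead.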
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14342597#comment-14342597 ] Lars Hofhansl commented on HBASE-13109: --- All the test failures have the same cause (using KeyOnlyKeyValue). I removed that part of the code (it was just there to avoid some code duplication). Make better SEEK vs SKIP decisions during scanning -- Key: HBASE-13109 URL: https://issues.apache.org/jira/browse/HBASE-13109 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Priority: Minor Attachments: 13109-trunk.txt
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14342754#comment-14342754 ] Lars Hofhansl commented on HBASE-13109: --- Numbers with new patch (avoiding the array creation helps):
With patch:
||Wildcard||Col 2+4||
|3.9|4.4|
The ExplicitColumnTracker is now only 10% slower than the wildcard column tracker (was almost 2x before).
Make better SEEK vs SKIP decisions during scanning -- Key: HBASE-13109 URL: https://issues.apache.org/jira/browse/HBASE-13109 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Priority: Minor Attachments: 13109-trunk-v2.txt, 13109-trunk.txt
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14342803#comment-14342803 ] Hadoop QA commented on HBASE-13109: ---
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12701779/13109-trunk-v2.txt against master branch at commit 70ecf18817ef219389a9e024ff21ffb99b6615d9.
ATTACHMENT ID: 12701779
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 6 new or modified tests.
{color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.1 2.5.2 2.6.0)
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning messages.
{color:red}-1 checkstyle{color}. The applied patch generated 1938 checkstyle errors (more than the master's current 1937 errors).
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100
{color:green}+1 site{color}. The mvn site goal succeeds with this patch.
{color:red}-1 core tests{color}. The patch failed these unit tests:
org.apache.hadoop.hbase.regionserver.TestBlocksRead
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/13030//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13030//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13030//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13030//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13030//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13030//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13030//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13030//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13030//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13030//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13030//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13030//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/13030//artifact/patchprocess/checkstyle-aggregate.html
Javadoc warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13030//artifact/patchprocess/patchJavadocWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/13030//console
This message is automatically generated.
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14342839#comment-14342839 ] stack commented on HBASE-13109: ---
Should Scan.LOOK_AHEAD be deprecated/become a no-op in case someone is using it?
We need to add more compares to KV? There ain't enough going on in there already (smile)?
getNextIndexedKey makes sense, but should we be returning byte[]? Why not Cell? Doesn't byte[] presume a certain format?
getKeyForNextRow is commented out. Remove?
I like the way you add in this optimize method; it either works or it doesn't. When will optimize be optimal? When will it not add value? (You say selecting columns 2 and 4 above is the worst case, but what about in general?)
Sorry for the dumb questions. I don't know this stuff well.
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14342854#comment-14342854 ] Lars Hofhansl commented on HBASE-13109: ---
We should deprecate Scan.LOOK_AHEAD in 1.0.1 so that we can remove it in 1.1 (per our policy that is possible).
The indexed key comes out of the HFile as a key, and yes, it presumes a KeyValue key all over the place. :( Translating it into a Cell would be measurably slower; we could try to record it as a Cell in the first place.
Unfortunately, the compare in KV is needed to avoid materializing the seek key just for this check. I did not like writing that part.
Yeah, I need to remove the commented-out code.
Optimize is optimizing heuristically:
* Many versions of KVs are spread all over the HFiles. The heuristic of checking the top scanner might not be optimal in that case. But then we'd also need to seek into many files for the rest, so by comparison the cost should be low.
* SQM says SEEK, and optimize does not change this. In that case we wasted a compare; that's OK, a seek is *way* more expensive.
* It *is* a heuristic. In some one-off cases we might do some SKIPs before we end up seeking.
I would not be afraid to deploy this in production for us. (I am most worried that I got the new compare method wrong... Any chance of eyeballing that, [~stack]?)
New patch coming to fix the test. The test is weird: it sets the block size to 1 (yes, 1 byte) and then counts the blocks loaded for Bloom filters; of course this change throws that off. I will disable this optimization for Gets anyway; there's no point there.
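The optimize cases above can be made concrete with a minimal, hypothetical sketch. It assumes String keys in place of KeyValue keys, a self-contained MatchCode enum modeled on the matcher's codes (not HBase's actual class), a null next indexed key meaning "unknown", and that both next-column and next-row seeks are eligible. The Get bypass follows the plan stated in the comment.

```java
/**
 * Hypothetical sketch of an "optimize" step that may turn a matcher's SEEK into a
 * SKIP (or an INCLUDE_AND_SEEK into a plain INCLUDE). String keys stand in for
 * KeyValue keys; nextIndexedKey stands in for the first key of the next block of
 * the top scanner, or null if unknown.
 */
class OptimizeSketch {
  enum MatchCode {
    INCLUDE, SKIP, SEEK_NEXT_COL, SEEK_NEXT_ROW,
    INCLUDE_AND_SEEK_NEXT_COL, INCLUDE_AND_SEEK_NEXT_ROW
  }

  static MatchCode optimize(MatchCode qcode, String seekTarget,
                            String nextIndexedKey, boolean isGet) {
    // Disabled for Gets, and left alone when there is no block-boundary information.
    if (isGet || nextIndexedKey == null) {
      return qcode;
    }
    switch (qcode) {
      case SEEK_NEXT_COL:
      case SEEK_NEXT_ROW:
      case INCLUDE_AND_SEEK_NEXT_COL:
      case INCLUDE_AND_SEEK_NEXT_ROW:
        // One compare decides. If it says "seek anyway", the compare was wasted,
        // which is cheap compared with the seek itself.
        if (seekTarget.compareTo(nextIndexedKey) < 0) {
          boolean include = qcode == MatchCode.INCLUDE_AND_SEEK_NEXT_COL
              || qcode == MatchCode.INCLUDE_AND_SEEK_NEXT_ROW;
          return include ? MatchCode.INCLUDE : MatchCode.SKIP;
        }
        return qcode;
      default:
        return qcode;
    }
  }

  public static void main(String[] args) {
    System.out.println(optimize(MatchCode.SEEK_NEXT_COL, "r1/c3", "r2/c1", false));
    System.out.println(optimize(MatchCode.INCLUDE_AND_SEEK_NEXT_COL, "r1/c3", "r2/c1", false));
    System.out.println(optimize(MatchCode.SEEK_NEXT_COL, "r1/c3", "r2/c1", true));
    System.out.println(optimize(MatchCode.SEEK_NEXT_ROW, "r5", "r2/c1", false));
  }
}
```

In order, main prints SKIP (target in the current block), INCLUDE, SEEK_NEXT_COL (a Get, so untouched) and SEEK_NEXT_ROW (target beyond the next block).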
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14341966#comment-14341966 ] Lars Hofhansl commented on HBASE-13109: ---
(I might work on optimizing away the need to call matcher.getKeyForNextColumn, since that produces a new byte[] that I'd rather avoid.)
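The idea of comparing against the next-column key without materializing it can be shown with a minimal sketch that compares field by field. It assumes simplified keys (row, qualifier, timestamp; no family or type) ordered by row and qualifier ascending and timestamp descending, and it takes the next-column key to be the last possible key of the current column. NextColumnCompare and its methods are hypothetical, not HBase APIs.

```java
/**
 * Hypothetical illustration: compare a key against "the key for the next column"
 * without building that key. Simplified keys are ordered by row and qualifier
 * ascending, then timestamp descending (newest first).
 */
class NextColumnCompare {
  static final class Key {
    final String row;
    final String qualifier;
    final long ts;

    Key(String row, String qualifier, long ts) {
      this.row = row;
      this.qualifier = qualifier;
      this.ts = ts;
    }
  }

  static int compare(Key a, Key b) {
    int c = a.row.compareTo(b.row);
    if (c != 0) return c;
    c = a.qualifier.compareTo(b.qualifier);
    if (c != 0) return c;
    return Long.compare(b.ts, a.ts); // newer timestamps sort first
  }

  /** Allocating version: materialize the seek key, then compare. */
  static int compareToNextColumnAllocating(Key k, Key cell) {
    // The smallest timestamp sorts after every real version of this column.
    Key lastOnColumn = new Key(cell.row, cell.qualifier, Long.MIN_VALUE);
    return compare(k, lastOnColumn);
  }

  /** Allocation-free version: same result, computed field by field. */
  static int compareToNextColumn(Key k, Key cell) {
    int c = k.row.compareTo(cell.row);
    if (c != 0) return c;
    c = k.qualifier.compareTo(cell.qualifier);
    if (c != 0) return c;
    return Long.compare(Long.MIN_VALUE, k.ts);
  }

  public static void main(String[] args) {
    Key cell = new Key("r1", "c2", 100L);
    Key olderVersion = new Key("r1", "c2", 50L); // still inside column c2
    Key nextColumn = new Key("r1", "c3", 100L);  // already past column c2
    System.out.println(compareToNextColumn(olderVersion, cell) < 0);
    System.out.println(compareToNextColumn(nextColumn, cell) > 0);
  }
}
```

Both methods return the same sign for every input; only the allocating one creates an object per call, which is the per-cell garbage the comment wants to avoid.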
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14342017#comment-14342017 ] Lars Hofhansl commented on HBASE-13109: ---
Will look at those tomorrow.
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14341894#comment-14341894 ] Lars Hofhansl commented on HBASE-13109: ---
The first patch does a bunch of things:
# Gets rid of Scan.LOOK_AHEAD, which is no longer needed.
# Optimizes KeyValue.KeyOnlyKeyValue a bit (to reuse the fields from KeyValue and save a few bytes of heap). I could remove that part, since I'm not using it anymore, but it seemed good anyway.
# Uses the next indexed key that the HFileScanner already maintains to decide whether a seek would likely land in a new block. If so, the StoreScanner will continue to issue SEEKs; otherwise it will SKIP instead.
# Adds some helpers to KeyValueUtil to build a key array, to avoid creating KeyValue objects unnecessarily.
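Item 4 mentions building a key array without creating KeyValue objects. A minimal sketch of that idea, assuming the KeyValue key layout (2-byte row length, row, 1-byte family length, family, qualifier, 8-byte timestamp, 1-byte type); keyArray is a hypothetical helper, not the method the patch actually added to KeyValueUtil.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/**
 * Sketch: write a KeyValue-format key straight into a byte[].
 * Assumed layout: [row length: 2][row][family length: 1][family][qualifier]
 *                 [timestamp: 8][type: 1]
 */
class KeyArraySketch {
  static byte[] keyArray(byte[] row, byte[] family, byte[] qualifier, long ts, byte type) {
    int len = 2 + row.length + 1 + family.length + qualifier.length + 8 + 1;
    ByteBuffer buf = ByteBuffer.allocate(len); // big-endian by default
    buf.putShort((short) row.length);
    buf.put(row);
    buf.put((byte) family.length);
    buf.put(family);
    buf.put(qualifier);
    buf.putLong(ts);
    buf.put(type);
    return buf.array();
  }

  public static void main(String[] args) {
    // Assumed: oldest timestamp plus type code 0 (Minimum) gives a key that sorts
    // after every real version of column f:c2 in row r1.
    byte[] key = keyArray("r1".getBytes(StandardCharsets.UTF_8),
        "f".getBytes(StandardCharsets.UTF_8),
        "c2".getBytes(StandardCharsets.UTF_8),
        Long.MIN_VALUE, (byte) 0);
    System.out.println(key.length); // 2 + 2 + 1 + 1 + 2 + 8 + 1 = 17
  }
}
```

Only one byte[] is allocated per key, instead of a KeyValue object wrapping a copy of the same bytes.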
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14341959#comment-14341959 ] Lars Hofhansl commented on HBASE-13109: ---
Some more tests (similar to those in HBASE-9778, but this is a different machine, so don't compare the absolute values): 4m rows, 5 columns, 1 version.
Without patch:
||Wildcard||Col 2+4||
|3.9|7.27|
With patch:
||Wildcard||Col 2+4||
|3.9|5.1|
(Selecting columns 2 and 4 is the worst case.)
So this patch improves the ExplicitColumnTracker by almost 1/3, and the beauty of this change is that it will still work with very many versions, because it uses whether a seek would land in another block as the metric for deciding whether to seek.
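Why the block-based metric still works with very many versions can be illustrated with a toy simulation. It assumes a single sorted file split into fixed-size blocks of 8 keys, string keys of the form row/column/version, and columns 2 and 4 selected; the block size, key format and all names are hypothetical. With one version per column the next wanted column is still in the current block, so the scanner skips; with 100 versions it lies several blocks ahead, so it seeks.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy simulation (hypothetical, not HBase code): the keys of one sorted file are
 * split into fixed-size blocks. To jump from pos to target, the scanner compares
 * target with the first key of the next block (the "next indexed key").
 */
class ManyVersionsDemo {
  static List<String> buildKeys(int rows, int columns, int versions) {
    List<String> keys = new ArrayList<>();
    for (int r = 1; r <= rows; r++) {
      for (int c = 1; c <= columns; c++) {
        for (int v = 0; v < versions; v++) {
          keys.add(String.format("r%d/c%d/v%04d", r, c, v));
        }
      }
    }
    return keys; // sorted, as long as rows and columns stay single-digit
  }

  /** "SKIP" if target sorts before the next block's first key, else "SEEK". */
  static String decide(List<String> keys, int blockSize, int pos, String target) {
    int nextBlockStart = (pos / blockSize + 1) * blockSize;
    if (nextBlockStart >= keys.size()) {
      return "SEEK"; // last block: this toy model has no next indexed key
    }
    return target.compareTo(keys.get(nextBlockStart)) < 0 ? "SKIP" : "SEEK";
  }

  /** Number of next() calls a SKIP strategy would need to reach target from pos. */
  static int nextCallsToReach(List<String> keys, int pos, String target) {
    int i = pos;
    while (i < keys.size() && keys.get(i).compareTo(target) < 0) {
      i++;
    }
    return i - pos;
  }

  public static void main(String[] args) {
    int blockSize = 8;
    for (int versions : new int[] {1, 100}) {
      List<String> keys = buildKeys(4, 5, versions);
      int pos = keys.indexOf("r1/c2/v0000"); // newest version of column 2 just included
      String target = "r1/c4";               // next wanted column
      System.out.println(versions + " version(s): " + decide(keys, blockSize, pos, target)
          + ", next() calls a SKIP would need: " + nextCallsToReach(keys, pos, target));
    }
  }
}
```

Running main prints "1 version(s): SKIP, next() calls a SKIP would need: 2" and "100 version(s): SEEK, next() calls a SKIP would need: 200", so the same rule that saves seeks in the narrow case avoids long skip runs in the wide one.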
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14341961#comment-14341961 ] Lars Hofhansl commented on HBASE-13109: ---
I'll stop here until I get some feedback. Any thoughts ([~stack])?
[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning
[ https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14342012#comment-14342012 ] Hadoop QA commented on HBASE-13109: ---
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12701657/13109-trunk.txt against master branch at commit dd78f459e8f10e4587742a049e38d8c6b50dd0cb.
ATTACHMENT ID: 12701657
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 6 new or modified tests.
{color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.1 2.5.2 2.6.0)
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning messages.
{color:red}-1 checkstyle{color}. The applied patch generated 1938 checkstyle errors (more than the master's current 1937 errors).
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100
{color:green}+1 site{color}. The mvn site goal succeeds with this patch.
{color:red}-1 core tests{color}. The patch failed these unit tests:
org.apache.hadoop.hbase.regionserver.TestScanWithBloomError
org.apache.hadoop.hbase.regionserver.TestScanner
org.apache.hadoop.hbase.regionserver.TestStoreFileRefresherChore
org.apache.hadoop.hbase.regionserver.TestColumnSeeking
org.apache.hadoop.hbase.regionserver.TestMinVersions
org.apache.hadoop.hbase.regionserver.TestResettingCounters
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/13020//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13020//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13020//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13020//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13020//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13020//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13020//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13020//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13020//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13020//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13020//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13020//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/13020//artifact/patchprocess/checkstyle-aggregate.html
Javadoc warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13020//artifact/patchprocess/patchJavadocWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/13020//console
This message is automatically generated.