[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14381180#comment-14381180
 ] 

Hudson commented on HBASE-13109:


ABORTED: Integrated in Phoenix-master #638 (See 
[https://builds.apache.org/job/Phoenix-master/638/])
PHOENIX-1642 Make Phoenix Master Branch pointing to HBase1.0.0 - ADDENDUM for 
HBASE-13109 (enis: rev ad2ad0cefd5d19a9bc8434555a9ecbb55c78)
* phoenix-core/src/main/java/org/apache/phoenix/hbase/index/scanner/FilteredKeyValueScanner.java
* phoenix-core/src/main/java/org/apache/hadoop/hbase/regionserver/IndexHalfStoreFileReader.java


 Make better SEEK vs SKIP decisions during scanning
 --

 Key: HBASE-13109
 URL: https://issues.apache.org/jira/browse/HBASE-13109
 Project: HBase
  Issue Type: Improvement
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 2.0.0, 1.0.1, 1.1.0, 0.98.12

 Attachments: 13109-0.98-v4.txt, 13109-0.98-v5.txt, 
 13109-trunk-v2.txt, 13109-trunk-v3.txt, 13109-trunk-v4.txt, 
 13109-trunk-v5.txt, 13109-trunk.txt, nextIndexKVChange_new.patch


 I'm re-purposing this issue to add a heuristic as to when to SEEK and when to 
 SKIP Cells. This has come up in various issues, and I think I have a way to 
 finally fix this now. HBASE-9778, HBASE-12311, and friends are related.
 --- Old description ---
 This is a continuation of HBASE-9778.
 We've seen a scenario of a very slow scan over a region using a timerange 
 that happens to fall after the ts of any Cell in the region.
 Turns out we spend a lot of time seeking.
 Tested with a 5-column table, and the scan is 5x faster when the timerange 
 falls before all Cells' ts.
 We can use the lookahead hint introduced in HBASE-9778 to do opportunistic 
 SKIPping before we actually seek.
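A minimal Java sketch of the idea follows. It rests on loudly stated assumptions. This is not HBase code, and the names SeekVsSkipSketch, match, nextIndexedKey and optimize are hypothetical. Keys are plain Strings, and "blocks" are fixed-size slices of one sorted list. The lookahead hint from HBASE-9778 is modeled simply as the first key of the next block.

The sketch shows two things:
* Why a timerange that falls after every Cell's ts leads to a seek per column. A Cell older than the range means all remaining versions of that column are older too.
* How such a seek can be turned into a cheap SKIP when the target is still inside the block the scanner is already positioned in.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical, simplified model of the SEEK vs SKIP decision (not HBase code).
 * Keys are plain Strings; "blocks" are fixed-size slices of one sorted list.
 */
public class SeekVsSkipSketch {

    enum MatchCode { INCLUDE, SKIP, SEEK_NEXT_COL }

    // Assumed toy block size; real HFile blocks are sized in bytes, not keys.
    static final int BLOCK_SIZE = 4;

    /**
     * Simplified timerange check on one Cell timestamp, with the range [minStamp, maxStamp).
     * Newer than the range: SKIP, because older versions of the same column may still match.
     * Older than the range: every remaining version of this column is older too, so seek to
     * the next column.
     */
    static MatchCode match(long ts, long minStamp, long maxStamp) {
        if (ts >= maxStamp) {
            return MatchCode.SKIP;
        }
        if (ts < minStamp) {
            return MatchCode.SEEK_NEXT_COL;
        }
        return MatchCode.INCLUDE;
    }

    /** Lookahead hint: first key of the block after position pos, or null in the last block. */
    static String nextIndexedKey(List<String> keys, int pos) {
        int nextBlockStart = (pos / BLOCK_SIZE + 1) * BLOCK_SIZE;
        return nextBlockStart < keys.size() ? keys.get(nextBlockStart) : null;
    }

    /**
     * Heuristic: a seek target that sorts before the next block's first key lies in the block
     * already loaded, so SKIPping forward (plain next() calls) is cheaper than a real seek.
     * Without a hint we conservatively keep the seek.
     */
    static MatchCode optimize(MatchCode code, List<String> keys, int pos, String seekTarget) {
        if (code != MatchCode.SEEK_NEXT_COL) {
            return code;
        }
        String hint = nextIndexedKey(keys, pos);
        if (hint != null && seekTarget.compareTo(hint) < 0) {
            return MatchCode.SKIP;
        }
        return code;
    }

    public static void main(String[] args) {
        // Toy store file: two blocks of four keys each.
        List<String> keys = Arrays.asList(
                "r1/a", "r1/b", "r1/c", "r1/d",   // block 0
                "r2/a", "r2/b", "r2/c", "r2/d");  // block 1

        // Cell at ts=100, timerange [200, 300) falls after it: the matcher wants to seek.
        MatchCode code = match(100L, 200L, 300L);
        System.out.println(code);                             // SEEK_NEXT_COL
        // Next column "r1/b" is still in block 0: the seek becomes a SKIP.
        System.out.println(optimize(code, keys, 0, "r1/b")); // SKIP
        // Target "r2/a" starts block 1: keep the real seek.
        System.out.println(optimize(code, keys, 3, "r2/a")); // SEEK_NEXT_COL
    }
}
```

The real scanner compares full Cells and also has to handle row-level seeks. The sketch only illustrates the shape of the decision: a seek pays off only when its target is likely outside the block that is already loaded.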



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-20 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14372491#comment-14372491
 ] 

Lars Hofhansl commented on HBASE-13109:
---

[~vik.karma] confirmed that the scan mentioned in the description is about 3x 
faster. That's a 3x end-to-end improvement in an M/R job!



[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-17 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14366055#comment-14366055
 ] 

Lars Hofhansl commented on HBASE-13109:
---

But the HBase code would call into the scanners injected by the Phoenix 
coprocessors... Anyway, since it works fine, there's something I am missing, 
which is just fine :)


[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-17 Thread Mujtaba Chohan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14365866#comment-14365866
 ] 

Mujtaba Chohan commented on HBASE-13109:


[~jamestaylor] Checked. Mutable/local index using existing 4.3.0 release works 
fine with 0.98.12-SNAPSHOT.


[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-17 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14366046#comment-14366046
 ] 

Lars Hofhansl commented on HBASE-13109:
---

Pffeeewww... Good. (I admit I am surprised, since the HBase core code would 
call the new method on the HFileScanner and KeyValueScanner interfaces)

The fact remains, though, that we have to get new versions of Phoenix 4.3 and 
4.2 out before we ship 0.98.12, else there would be no released version of 
Phoenix to compile against the current version of 0.98. I discussed this offline 
with [~apurtell], and we think that might be the best option. I assume if we can't 
make that, we'll delay this until 0.98.13.

(Sorry for the pain caused here. It's instructive, though, and worth the 
performance gains)


[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-17 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14366050#comment-14366050
 ] 

Andrew Purtell commented on HBASE-13109:


bq. I admit I am surprised, since the HBase core code would call the new method 
on the HFileScanner and KeyValueScanner interfaces

Phoenix code does not call the new method, of course, which is why the binary 
compatibility checker didn't flag this as a problem. It would have been a 
different story if there was a removal or rename of a specific method used by 
Phoenix.


[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-16 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14364428#comment-14364428
 ] 

James Taylor commented on HBASE-13109:
--

Thanks, [~apurtell]. [~mujtabachohan], would you mind trying an existing Phoenix 
binary release (4.3.0 is fine) against this snapshot? In particular, do mutable 
and local indexing still work correctly?


[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-16 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14364356#comment-14364356
 ] 

Andrew Purtell commented on HBASE-13109:


I've confirmed 0.98.12 snapshots are available in Apache Maven now.

Be sure the Apache snapshots repository is included in your POM:
{noformat}
repository
  idapache.snapshots/id
  urlhttp://repository.apache.org/snapshots//url
  snapshots
enabledtrue/enabled
  /snapshots
/repository
{noformat}

Use version 0.98.12-hadoop2-SNAPSHOT for the Hadoop 2 build and 
0.98.12-hadoop1-SNAPSHOT for the Hadoop 1 build.


[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-16 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14363659#comment-14363659
 ] 

stack commented on HBASE-13109:
---

bq. Would it be possible to get the 0.98.12 snapshot into maven so we can see 
if/how Phoenix will work with it?

Can't you build it locally? That will install it in your local repo, and you can 
then check Phoenix against the locally installed version.


[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-16 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14363670#comment-14363670
 ] 

Andrew Purtell commented on HBASE-13109:


I'll publish a snapshot today, so you can do either. :-)


[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-15 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14362243#comment-14362243
 ] 

Lars Hofhansl commented on HBASE-13109:
---

Ultimately I only need this method in KeyValueHeap, StoreScanner, 
StoreFileScanner, and AbstractHFileReader.Scanner. But doing only that would litter 
the code with class casts, along with class checks in hot code paths. I do 
not think that makes sense.

So we can undo this in 0.98 altogether (but -1 on that from me), or we can 
delay this until 0.98.13. By then Phoenix would need to have new minor versions of 
4.2 and 4.3. I'd be +0 on that. And just in case, -1 on delaying this further 
than 0.98.13...
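The class-cast alternative Lars rejects can be sketched in a few lines. The idea is to leave the shared scanner interface alone, put the new lookahead method on a separate sub-interface, and have callers discover it at runtime.

This is a hypothetical sketch, not HBase's API. SimpleScanner, LookaheadScanner and nextIndexedKeyOrNull are made-up names standing in for interfaces like KeyValueScanner and HFileScanner. It shows where the cost lands: every hot-path caller pays an instanceof check and a cast, and scanners that do not implement the extension provide no hint at all.

```java
/**
 * Hypothetical sketch (illustrative names, not HBase's API) of adding a method via an
 * opt-in sub-interface instead of changing the shared scanner interface.
 */
public class LookaheadCastSketch {

    /** Stand-in for an existing scanner interface that third parties also implement. */
    interface SimpleScanner {
        String peek();
    }

    /** Opt-in extension carrying the new lookahead method; old implementations never see it. */
    interface LookaheadScanner extends SimpleScanner {
        String getNextIndexedKey();
    }

    /** Every hot-path caller needs this type check and cast. */
    static String nextIndexedKeyOrNull(SimpleScanner scanner) {
        if (scanner instanceof LookaheadScanner) {
            return ((LookaheadScanner) scanner).getNextIndexedKey();
        }
        return null; // no hint available: the caller must fall back to always seeking
    }

    public static void main(String[] args) {
        // A scanner written against the old interface only.
        SimpleScanner legacy = () -> "r1/a";
        // A scanner that opts in to the lookahead extension.
        LookaheadScanner aware = new LookaheadScanner() {
            public String peek() { return "r1/a"; }
            public String getNextIndexedKey() { return "r2/a"; }
        };
        System.out.println(nextIndexedKeyOrNull(legacy)); // null
        System.out.println(nextIndexedKeyOrNull(aware));  // r2/a
    }
}
```

The upside of this approach is that an implementation written against the old interface keeps compiling unchanged. Adding the method to the interface itself keeps the callers clean. As discussed in this thread, however, it leaves already-compiled code working while forcing every implementer to add the method before it can be recompiled.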



[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-15 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14362250#comment-14362250
 ] 

Andrew Purtell commented on HBASE-13109:


Thanks for checking. I'd be +0 on a delay as well.


[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-15 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14362251#comment-14362251
 ] 

Andrew Purtell commented on HBASE-13109:


And -1 for permanent revert


[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-15 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14362609#comment-14362609
 ] 

Lars Hofhansl commented on HBASE-13109:
---

[~giacomotaylor], what do you think? If we delay to 0.98.13, I think we can 
have new versions of Phoenix by then.
(If not, might as well leave it in 0.98.12)

We should also check a version Phoenix built against t


[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-15 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14362744#comment-14362744
 ] 

James Taylor commented on HBASE-13109:
--

Would it be possible to get the 0.98.12 snapshot into maven so we can see 
if/how Phoenix will work with it? We plan on releasing a 4.3.1 soon - perhaps 
we can start a vote in a week. What's your time frame for 0.98.12?




[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-14 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14362123#comment-14362123
 ] 

James Taylor commented on HBASE-13109:
--

Not positive it's broken, so we should try it for sure. Is it available in 
Maven? 

Yes, I meant the combined HBase+Phoenix community. Of course it's absolutely 
your call - would just be good if we knew in advance.


[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-14 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14362121#comment-14362121
 ] 

Andrew Purtell commented on HBASE-13109:


[~giacomotaylor] JavaACC says the changes have no binary compat impact with 
already compiled code. I don't see how previous releases of Phoenix are broken. 
I think that states the problem too strongly. It is true that recompilation 
will be problematic without accommodation (see above) or local patching, but 
that's not the same thing as broken, right?


[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-14 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14362125#comment-14362125
 ] 

Andrew Purtell commented on HBASE-13109:


Let's see what Lars says first. I'd like to accommodate Phoenix if we can. 


[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-14 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14362197#comment-14362197
 ] 

Lars Hofhansl commented on HBASE-13109:
---

I do not think I can fix this without the extra methods. Let me have a look.
We can also undo this for 0.98.12 and put it into 0.98.13 instead, by which time
there should be new point versions of Phoenix.


[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-13 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14361526#comment-14361526
 ] 

Andrew Purtell commented on HBASE-13109:


So here's where I think we are:
- This commit is fine.
- No new base classes or inheritance hierarchy changes.
- Handle updates to Phoenix on a Phoenix JIRA. I can push a 0.98.12 SNAPSHOT to
Maven if that would help.
Any issues with this, [~giacomotaylor] [~lhofhansl]?


[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-13 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14361604#comment-14361604
 ] 

James Taylor commented on HBASE-13109:
--

I believe this essentially breaks all existing 4.x Phoenix releases (at a 
minimum, it would break mutable and local secondary indexes). The only way 
Phoenix users will be able to use 0.98.12+ is to wait for the next Phoenix 
release and upgrade to that one (at a minimum on the server side). Not sure if 
this is a problem for the user community. 


[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-12 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358923#comment-14358923
 ] 

Andrew Purtell commented on HBASE-13109:


I would rather not impose a performance penalty on 0.98 for the sake of
compatibility with Phoenix, where they've used a private interface. When I do a
0.98 RC, I check if Phoenix compiles using the heads of master and sometimes
branch 4.0 if I have time. We could get that working today, actually. I could
publish a 0.98.12 SNAPSHOT to Maven now including this change, update the
Phoenix POMs on branches master and 4.0 to use the snapshot, and add the
necessary methods there. [~jamestaylor]?


[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-12 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358892#comment-14358892
 ] 

Lars Hofhansl commented on HBASE-13109:
---

Hmm... Not sure what we can do other than (1) say that by using a private/evolving
interface you're on your own, or (2) roll this back from 0.98.
I'd be fine with #2.

I suppose we could add implementations of these methods in Phoenix now (it's
perfectly OK to just return null; that just means this optimization will not be
used).
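Why returning null is harmless can be shown with a small sketch. It assumes a simplified interface with String keys; `LookaheadScanner`, `FixedHintScanner` and `SeekOrSkip` are hypothetical names, not the real HBase `KeyValueScanner` or `HFileScanner`. A caller that receives no hint simply falls back to seeking, so the optimization is skipped but the scan result is unchanged.

```java
// Hypothetical, simplified stand-in for a scanner interface that gained a
// lookahead method. NOT the real HBase KeyValueScanner/HFileScanner.
interface LookaheadScanner {
    String peek();

    /** First key of the next block, or null if the scanner cannot tell. */
    String getNextIndexedKey();
}

// A scanner that either knows its next block boundary or (hint == null)
// does not, like a wrapper with no access to the underlying block index.
final class FixedHintScanner implements LookaheadScanner {
    private final String current;
    private final String hint;

    FixedHintScanner(String current, String hint) {
        this.current = current;
        this.hint = hint;
    }

    @Override
    public String peek() { return current; }

    @Override
    public String getNextIndexedKey() { return hint; }
}

final class SeekOrSkip {
    enum Action { SKIP, SEEK }

    /** A null hint means "no information", so fall back to a plain SEEK. */
    static Action decide(LookaheadScanner scanner, String target) {
        String hint = scanner.getNextIndexedKey();
        if (hint != null && target.compareTo(hint) < 0) {
            return Action.SKIP;
        }
        return Action.SEEK;
    }
}
```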


[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-12 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359047#comment-14359047
 ] 

James Taylor commented on HBASE-13109:
--

Would it be possible to have base classes for these we can extend from to 
shield us from interface additions? 

The FilteredKeyValueScanner class is deep in the bowels of mutable secondary
indexing. [~jesse_yates], any ideas for how to get this onto non-private/evolving
interfaces?

The HFileScanner anonymous implementation is in the bowels of local indexes.
Same question, [~rajeshbabu]: any ideas for how to get this onto
non-private/evolving interfaces?
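The base-class idea can be sketched as follows for code that cannot rely on Java 8 default interface methods. All names (`EvolvingScanner`, `EvolvingScannerBase`, `ListScanner`) are hypothetical. The maintainers ship an abstract base class next to the interface and add a safe default there whenever the interface grows; downstream code extends the base class instead of implementing the interface directly. Moving existing implementations onto such a base class would itself be a one-time incompatible change.

```java
// Hypothetical interface standing in for a Private/Evolving scanner interface.
interface EvolvingScanner {
    String peek();

    String next();

    // Added in a later release: every direct implementor must now define it
    // or fail to compile.
    String getNextIndexedKey();
}

// Base class shipped alongside the interface. When the interface grows,
// the maintainers add a safe default here, so subclasses keep compiling.
abstract class EvolvingScannerBase implements EvolvingScanner {
    @Override
    public String getNextIndexedKey() {
        return null; // "no hint": callers fall back to seeking
    }
}

// Downstream code extends the base class rather than implementing the
// interface directly, so it only overrides the methods it actually needs.
final class ListScanner extends EvolvingScannerBase {
    private final String[] keys;
    private int pos = 0;

    ListScanner(String... keys) { this.keys = keys; }

    @Override
    public String peek() { return pos < keys.length ? keys[pos] : null; }

    @Override
    public String next() { return pos < keys.length ? keys[pos++] : null; }
}
```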


[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-12 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359067#comment-14359067
 ] 

Andrew Purtell commented on HBASE-13109:


bq. Would it be possible to have base classes for these we can extend from to 
shield us from interface additions?

Yes, we can add these, but moving to the base classes would still be one
incompatible change. OK?


[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-12 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359074#comment-14359074
 ] 

Andrew Purtell commented on HBASE-13109:


Also, see my suggestion above to move to a 0.98 SNAPSHOT after making the
change. Or I can just ignore that Phoenix won't compile until it is updated with
the 0.98.12 RC when working on the next release.


[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-12 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359115#comment-14359115
 ] 

Andrew Purtell commented on HBASE-13109:


bq. Want to make sure I understand the b/w compat implications. Does this mean 
that our current 4.3 and below releases will no longer work with 0.98.12 and 
above? And that 4.4 and above will only work with 0.98.12 and above?

4.3 and below won't compile.

I ran the binary compatibility checker, and it says the addition of the abstract
method getNextIndexedKey() to the interfaces has no effect on binary compatibility.


[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-12 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359130#comment-14359130
 ] 

Andrew Purtell commented on HBASE-13109:


If we add base classes and change the inheritance hierarchy, that may impact
binary compatibility.


[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-12 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359189#comment-14359189
 ] 

Lars Hofhansl commented on HBASE-13109:
---

It'd be best to add the methods to Phoenix. If we do not add an @Override
annotation, that would work with both old and new versions of HBase.
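The no-@Override approach can be illustrated with two hypothetical versions of the same interface: `ScannerV1` without the new method and `ScannerV2` with it. In practice the same Phoenix source would be compiled against two HBase releases; here the two compilations are simulated by two classes with identical bodies. Because `getNextIndexedKey()` carries no `@Override` annotation, it compiles in both cases: against V2 it implements the new abstract method, and against V1 it is just an extra public method.

```java
// Two hypothetical versions of the same interface, as seen by code compiled
// against an older and a newer release.
interface ScannerV1 {
    String peek();
}

interface ScannerV2 {
    String peek();

    String getNextIndexedKey(); // added in the newer release
}

// Identical bodies, with NO @Override on getNextIndexedKey(). Against V2 it
// implements the new abstract method; against V1 it is an ordinary extra
// method. Adding @Override would make the V1 build fail with "method does
// not override or implement a method from a supertype".
final class CompatScannerV1 implements ScannerV1 {
    public String peek() { return "row1"; }

    public String getNextIndexedKey() { return null; } // null = no hint
}

final class CompatScannerV2 implements ScannerV2 {
    public String peek() { return "row1"; }

    public String getNextIndexedKey() { return null; } // null = no hint
}
```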


[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-12 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359085#comment-14359085
 ] 

James Taylor commented on HBASE-13109:
--

Want to make sure I understand the b/w compat implications. Does this mean that 
our current 4.3 and below releases will no longer work with 0.98.12 and above? 
And that 4.4 and above will only work with 0.98.12 and above?


[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-11 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357707#comment-14357707
 ] 

Andrew Purtell commented on HBASE-13109:


[~jesse_yates]


[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-11 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357703#comment-14357703
 ] 

Andrew Purtell commented on HBASE-13109:


The commit of this to 0.98 branch breaks Phoenix compilation if using 
-Dhbase.version=0.98.12-SNAPSHOT (after installing latest 0.98 into the local 
Maven cache):
{noformat}
[ERROR] /Users/apurtell/src/phoenix/phoenix-core/src/main/java/org/apache/hadoop/hbase/regionserver/IndexHalfStoreFileReader.java:[141,35] anonymous org.apache.hadoop.hbase.regionserver.IndexHalfStoreFileReader$1 is not abstract and does not override abstract method getNextIndexedKey() in org.apache.hadoop.hbase.io.hfile.HFileScanner
[ERROR] /Users/apurtell/src/phoenix/phoenix-core/src/main/java/org/apache/phoenix/hbase/index/scanner/FilteredKeyValueScanner.java:[37,8] org.apache.phoenix.hbase.index.scanner.FilteredKeyValueScanner is not abstract and does not override abstract method getNextIndexedKey() in org.apache.hadoop.hbase.regionserver.KeyValueScanner
{noformat}

KeyValueScanner and HFileScanner are both marked as InterfaceAudience.Private. 
What should we do here?

[~jamestaylor]


[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-04 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347025#comment-14347025
 ] 

Ted Yu commented on HBASE-13109:


+1

 Make better SEEK vs SKIP decisions during scanning
 --

 Key: HBASE-13109
 URL: https://issues.apache.org/jira/browse/HBASE-13109
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Minor
 Fix For: 2.0.0, 1.0.1, 1.1.0, 0.98.12

 Attachments: 13109-0.98-v4.txt, 13109-trunk-v2.txt, 
 13109-trunk-v3.txt, 13109-trunk-v4.txt, 13109-trunk-v5.txt, 13109-trunk.txt, 
 nextIndexKVChange_new.patch


 I'm re-purposing this issue to add a heuristic as to when to SEEK and when to 
 SKIP Cells. This has come up in various issues, and I think I have a way to 
 finally fix this now. HBASE-9778, HBASE-12311, and friends are related.
 --- Old description ---
 This is a continuation of HBASE-9778.
 We've seen a scenario of a very slow scan over a region using a timerange 
 that happens to fall after the ts of any Cell in the region.
 Turns out we spend a lot of time seeking.
 Tested with a 5 column table, and the scan is 5x faster when the timerange 
 falls before all Cells' ts.
 We can use the lookahead hint introduced in HBASE-9778 to do opportunistic
 SKIPping before we actually seek.



[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347800#comment-14347800
 ] 

Hudson commented on HBASE-13109:


FAILURE: Integrated in HBase-1.1 #247 (See 
[https://builds.apache.org/job/HBase-1.1/247/])
HBASE-13109 Make better SEEK vs SKIP decisions during scanning. (larsh: rev f5020e9c1a98727cb100f24294df50072d599bf8)
* hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileScanner.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
* hbase-client/src/main/java/org/apache/hadoop/hbase/client/Scan.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/io/HalfStoreFileReader.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/BlockWithScanInfo.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestQueryMatcher.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/NonLazyKeyValueScanner.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueScanner.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java
* hbase-common/src/main/java/org/apache/hadoop/hbase/KeyValue.java
* hbase-common/src/main/java/org/apache/hadoop/hbase/HConstants.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java


[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347797#comment-14347797
 ] 

Hudson commented on HBASE-13109:


FAILURE: Integrated in HBase-1.0 #788 (See 
[https://builds.apache.org/job/HBase-1.0/788/])
HBASE-13109 Make better SEEK vs SKIP decisions during scanning. (larsh: rev a3e9325150de4ad89f3032535be8e20fb352f182)
* hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueScanner.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileScanner.java
* hbase-common/src/main/java/org/apache/hadoop/hbase/KeyValue.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/BlockWithScanInfo.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/NonLazyKeyValueScanner.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestQueryMatcher.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
* hbase-client/src/main/java/org/apache/hadoop/hbase/client/Scan.java
* hbase-common/src/main/java/org/apache/hadoop/hbase/HConstants.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/io/HalfStoreFileReader.java


 Make better SEEK vs SKIP decisions during scanning
 --

 Key: HBASE-13109
 URL: https://issues.apache.org/jira/browse/HBASE-13109
 Project: HBase
  Issue Type: Improvement
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 2.0.0, 1.0.1, 1.1.0, 0.98.12

 Attachments: 13109-0.98-v4.txt, 13109-trunk-v2.txt, 
 13109-trunk-v3.txt, 13109-trunk-v4.txt, 13109-trunk-v5.txt, 13109-trunk.txt, 
 nextIndexKVChange_new.patch


 I'm re-purposing this issue to add a heuristic as to when to SEEK and when to 
 SKIP Cells. This has come up in various issues, and I think I have a way to 
 finally fix this now. HBASE-9778, HBASE-12311, and friends are related.
 --- Old description ---
 This is a continuation of HBASE-9778.
 We've seen a scenario with a very slow scan over a region using a timerange 
 that happens to fall after the ts of every Cell in the region.
 It turns out we spend a lot of time seeking.
 In a test with a 5-column table, the scan is 5x faster when the timerange 
 falls before all Cells' ts.
 We can use the lookahead hint introduced in HBASE-9778 to do opportunistic 
 SKIPping before we actually seek.
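The opportunistic SKIP-before-SEEK idea can be sketched as a small decision function. This is a simplified, hypothetical model, not HBase's implementation: `SeekOrSkip`, `Key`, `MatchCode`, `NO_NEXT_INDEXED_KEY`, and `optimize` are illustrative names, and keys are reduced to (row, column, timestamp). It assumes the scanner can learn the first key of the next block (called the "next indexed key" here). If that key already lies in a later column (or row), the seek target is either inside the current block or is the next block's first key, so stepping forward with SKIP reaches it without jumping over any block and a real SEEK buys nothing.

```java
/** Hypothetical, simplified model of a SEEK-vs-SKIP decision; not HBase's implementation. */
public final class SeekOrSkip {

  /** Simplified cell key: row, column, timestamp (newer versions would sort first). */
  static final class Key {
    final String row;
    final String col;
    final long ts;

    Key(String row, String col, long ts) {
      this.row = row;
      this.col = col;
      this.ts = ts;
    }
  }

  /** Subset of decisions a scan matcher might return (names are illustrative). */
  enum MatchCode {
    INCLUDE, SKIP, SEEK_NEXT_COL, SEEK_NEXT_ROW, INCLUDE_AND_SEEK_NEXT_COL, INCLUDE_AND_SEEK_NEXT_ROW
  }

  /** Hypothetical sentinel for "next block's first key unknown"; compared by identity. */
  static final Key NO_NEXT_INDEXED_KEY = new Key("", "", Long.MIN_VALUE);

  private SeekOrSkip() {}

  /** Orders two keys by (row, column) only, ignoring timestamps. */
  static int compareRowCol(Key a, Key b) {
    int c = a.row.compareTo(b.row);
    return c != 0 ? c : a.col.compareTo(b.col);
  }

  /** Downgrades a seek to a skip when the seek target cannot lie beyond the next block start. */
  static MatchCode optimize(MatchCode code, Key current, Key nextIndexedKey) {
    if (nextIndexedKey == null || nextIndexedKey == NO_NEXT_INDEXED_KEY) {
      return code; // no lookahead hint: keep the seek
    }
    switch (code) {
      case SEEK_NEXT_COL:
      case INCLUDE_AND_SEEK_NEXT_COL:
        // Next block starts in a later column or row: the next column begins in this
        // block or exactly at the next block's first key, so stepping forward is enough.
        if (compareRowCol(nextIndexedKey, current) > 0) {
          return code == MatchCode.SEEK_NEXT_COL ? MatchCode.SKIP : MatchCode.INCLUDE;
        }
        return code;
      case SEEK_NEXT_ROW:
      case INCLUDE_AND_SEEK_NEXT_ROW:
        // Same reasoning one level up: the next block starts in a later row.
        if (nextIndexedKey.row.compareTo(current.row) > 0) {
          return code == MatchCode.SEEK_NEXT_ROW ? MatchCode.SKIP : MatchCode.INCLUDE;
        }
        return code;
      default:
        return code;
    }
  }

  public static void main(String[] args) {
    Key current = new Key("row1", "colA", 100L);
    // Next block begins with another column: a SKIP suffices.
    System.out.println(optimize(MatchCode.SEEK_NEXT_COL, current, new Key("row1", "colB", 50L)));
    // Next block still holds older versions of colA: a real seek can jump over them.
    System.out.println(optimize(MatchCode.SEEK_NEXT_COL, current, new Key("row1", "colA", 10L)));
  }
}
```

Running `main` prints `SKIP` and then `SEEK_NEXT_COL`. Because each call only downgrades one decision, a scanner using this would issue a run of cheap SKIPs within a block and fall back to a real SEEK only when the target may sit in a later block. The sentinel check uses `==`, which works only if the same sentinel instance is always handed back.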



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-04 Thread Jonathan Lawlor (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347894#comment-14347894
 ] 

Jonathan Lawlor commented on HBASE-13109:
-

Looks like this introduced some new javadoc warnings that are being called out 
in other precommit build checks:

{quote}
[WARNING] /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java:82: warning - @param argument lookAhead is not a parameter name.
[WARNING] /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java:587: warning - @param argument off is not a parameter name.
[WARNING] /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java:587: warning - @param argument len is not a parameter name.
[WARNING] /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java:602: warning - @param argument off is not a parameter name.
[WARNING] /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java:602: warning - @param argument len is not a parameter name.
{quote}
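The javadoc warning "@param argument X is not a parameter name" is reported when a `@param` tag names something that does not appear in the method signature, which typically happens after a parameter is renamed or removed while its tag is left behind. A minimal, hypothetical illustration (the class and method are invented, not taken from HBase): every tag below matches a real parameter, so javadoc stays quiet. Had the tags read `@param off` and `@param len` while the signature used `offset` and `length`, it would report warnings of exactly the kind quoted above.

```java
/** Hypothetical example of matching @param tags; not taken from HBase. */
public final class ParamTagExample {

  private ParamTagExample() {}

  /**
   * Counts the zero bytes in a slice of an array.
   *
   * <p>Each {@code @param} tag must name a real parameter; writing {@code @param off}
   * here while the parameter is called {@code offset} would trigger a javadoc warning.
   *
   * @param buf the backing array
   * @param offset index of the first byte to examine
   * @param length number of bytes to examine
   * @return the number of zero bytes in the slice
   */
  static int countZeros(byte[] buf, int offset, int length) {
    int zeros = 0;
    for (int i = offset; i < offset + length; i++) {
      if (buf[i] == 0) {
        zeros++;
      }
    }
    return zeros;
  }

  public static void main(String[] args) {
    System.out.println(countZeros(new byte[] {0, 1, 0, 2}, 0, 4));
  }
}
```

The fix for such warnings is either renaming the tag to the current parameter name or deleting the stale tag; the code itself does not change.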




[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348033#comment-14348033
 ] 

Hudson commented on HBASE-13109:


FAILURE: Integrated in HBase-1.1 #249 (See 
[https://builds.apache.org/job/HBase-1.1/249/])
HBASE-13109 Fix Javadoc warning; and some misc checkstyle warnings (larsh: rev 
1cdcb6e9b8d386d43b482ff8a5aa6f1c0e3c6791)
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java




[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348043#comment-14348043
 ] 

Hudson commented on HBASE-13109:


SUCCESS: Integrated in HBase-0.98-on-Hadoop-1.1 #840 (See 
[https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/840/])
HBASE-13109 Make better SEEK vs SKIP decisions during scanning. (larsh: rev 
b3bd0016492eb99e3a83353f0879bfddebff4ec1)
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
* hbase-client/src/main/java/org/apache/hadoop/hbase/client/Scan.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/NonLazyKeyValueScanner.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueScanner.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestQueryMatcher.java
* hbase-common/src/main/java/org/apache/hadoop/hbase/KeyValue.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/io/HalfStoreFileReader.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileScanner.java




[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348065#comment-14348065
 ] 

Hudson commented on HBASE-13109:


SUCCESS: Integrated in HBase-TRUNK #6207 (See 
[https://builds.apache.org/job/HBase-TRUNK/6207/])
HBASE-13109 Fix Javadoc warning; and some misc checkstyle warnings (larsh: rev 
0bdab85b065bd0876152ac30c2ec6d08adae8006)
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java




[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347964#comment-14347964
 ] 

Hudson commented on HBASE-13109:


SUCCESS: Integrated in HBase-0.98 #883 (See 
[https://builds.apache.org/job/HBase-0.98/883/])
HBASE-13109 Make better SEEK vs SKIP decisions during scanning. (larsh: rev 
b3bd0016492eb99e3a83353f0879bfddebff4ec1)
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileScanner.java
* hbase-client/src/main/java/org/apache/hadoop/hbase/client/Scan.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueScanner.java
* hbase-common/src/main/java/org/apache/hadoop/hbase/KeyValue.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestQueryMatcher.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/io/HalfStoreFileReader.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/NonLazyKeyValueScanner.java




[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348150#comment-14348150
 ] 

Hudson commented on HBASE-13109:


FAILURE: Integrated in HBase-0.98 #884 (See 
[https://builds.apache.org/job/HBase-0.98/884/])
HBASE-13109 Fix Javadoc warning; and some misc checkstyle warnings (larsh: rev 
2eda262dfee9889a008cb53d5c8a2a73959934e4)
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java




[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348086#comment-14348086
 ] 

Hudson commented on HBASE-13109:


FAILURE: Integrated in HBase-1.0 #789 (See 
[https://builds.apache.org/job/HBase-1.0/789/])
HBASE-13109 Fix Javadoc warning; and some misc checkstyle warnings (larsh: rev 
d72bb2f6a60bdf2ac9daf639f18030eee2ea9773)
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java




[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347788#comment-14347788
 ] 

Hudson commented on HBASE-13109:


SUCCESS: Integrated in HBase-TRUNK #6205 (See 
[https://builds.apache.org/job/HBase-TRUNK/6205/])
HBASE-13109 Make better SEEK vs SKIP decisions during scanning. (larsh: rev 
464e7ce685486e3ede13ec2351b45b0a0b65696c)
* hbase-client/src/main/java/org/apache/hadoop/hbase/client/Scan.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/BlockWithScanInfo.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileScanner.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
* hbase-common/src/main/java/org/apache/hadoop/hbase/KeyValue.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/NonLazyKeyValueScanner.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java
* hbase-common/src/main/java/org/apache/hadoop/hbase/HConstants.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/io/HalfStoreFileReader.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueScanner.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestQueryMatcher.java




[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-04 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347928#comment-14347928
 ] 

Lars Hofhansl commented on HBASE-13109:
---

Uh oh... Lemme fix those.



[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-04 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347979#comment-14347979
 ] 

Lars Hofhansl commented on HBASE-13109:
---

Updated all branches.



[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348327#comment-14348327
 ] 

Hudson commented on HBASE-13109:


FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #841 (See 
[https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/841/])
HBASE-13109 Fix Javadoc warning; and some misc checkstyle warnings (larsh: rev 
2eda262dfee9889a008cb53d5c8a2a73959934e4)
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java




[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-03 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14344810#comment-14344810
 ] 

ramkrishna.s.vasudevan commented on HBASE-13109:


The optimize() logic makes sense. I think it is going to be particularly 
useful when there is only one version of a cell and the filter/trackers say 
SEEK_TO_ROW/COL.
Changing the HFileBlock index to Cell is fine, unless you have a concern about 
the number of objects being created and thrown away. In that case we may need 
a different approach, but for the NO_INDEX_KEY case we cannot use an '==' 
check. Here that check is not present.


 Make better SEEK vs SKIP decisions during scanning
 --

 Key: HBASE-13109
 URL: https://issues.apache.org/jira/browse/HBASE-13109
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Minor
 Attachments: 13109-0.98-v4.txt, 13109-trunk-v2.txt, 
 13109-trunk-v3.txt, 13109-trunk-v4.txt, 13109-trunk-v5.txt, 13109-trunk.txt, 
 nextIndexKVChange_new.patch




[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-03 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14345276#comment-14345276
 ] 

stack commented on HBASE-13109:
---

[~lhofhansl] if you describe the test you'd like, I can try it here.



[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-03 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14345553#comment-14345553
 ] 

Lars Hofhansl commented on HBASE-13109:
---

Actually, PE (PerformanceEvaluation) only writes single-column rows; this needs 
many columns (or deletes) to show any improvement.



[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-03 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14346481#comment-14346481
 ] 

ramkrishna.s.vasudevan commented on HBASE-13109:


+1 on patch. Nice work!!



[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-03 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14346204#comment-14346204
 ] 

Lars Hofhansl commented on HBASE-13109:
---

Did some more tests with Phoenix against 0.98, including some of the tests they 
used to validate their optimization to always use the WildcardColumnMatcher and 
do the filtering themselves, which avoids the cost of the ExplicitColumnTracker 
that does the seeking. Testing with 7 columns. One scenario had all 7 columns in 
the same CF, the other had each column in its own column family:

Ran two queries: q1 = select count(1) where v3 =  and v5 =  and q2 = select 
avg(v2) where v3 =  and v5 = 

1CF case:
|| ||q1 w/ Phoenix opt||q1 w/o Phoenix opt||q2 w/ Phoenix opt||q2 w/o Phoenix opt||
|w/o patch|12.9|8.4|18.0|8.3|
|w/ patch|7.5|7.2|7.5|7.1|

Two observations:
# Even with the Phoenix optimization this is faster, because a bunch of 
SEEK_NEXT_ROWs are avoided unless they're necessary.
# The whole optimization is unnecessary now; it saves less than 10% in the 
*best* case, with only one version per cell.


[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-03 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14346217#comment-14346217
 ] 

Lars Hofhansl commented on HBASE-13109:
---

Same, but with each column in its own CF (in this case Phoenix does not use its 
WildcardTracker + Filter optimization)

6CF case:
|| ||q1||q2||
|w/o patch|15.3|15.5|
|w/ patch|9.14|9.19|

Any objections to committing this to all branches?

[~giacomotaylor], FYI (we can probably remove the ColumnProjectionFilter 
optimization when this is in)


[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-03 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14346235#comment-14346235
 ] 

James Taylor commented on HBASE-13109:
--

Awesome - nice work, [~lhofhansl]!


[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-02 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14344622#comment-14344622
 ] 

ramkrishna.s.vasudevan commented on HBASE-13109:


{code}
this.nextIndexedKV == HConstants.NO_NEXT_INDEXED_KV
{code}
This change in the patch is not quite right, sorry about that. We may have to 
do a compare here, but only if we change to Cell.
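A hypothetical illustration of the general pitfall with an `==` check against a sentinel: reference equality only recognizes the exact same object, so if the sentinel is ever copied or re-wrapped (for example, while turning a byte[] key into a Cell-like object), the identity check silently returns false and a content compare is needed. This is one possible reading of the concern, not a reproduction of the patch; `WrappedKey`, `SentinelCheck` and `NO_NEXT_KEY` are invented names.

```java
import java.util.Arrays;

// Hypothetical stand-in for a Cell built around a serialized key.
final class WrappedKey {
    final byte[] bytes;
    WrappedKey(byte[] bytes) { this.bytes = bytes; }
}

final class SentinelCheck {
    // Hypothetical sentinel meaning "there is no next indexed key".
    static final byte[] NO_NEXT_KEY = new byte[0];

    // Identity check: only true if the very same array instance is used.
    static boolean isSentinelByIdentity(WrappedKey k) {
        return k.bytes == NO_NEXT_KEY;
    }

    // Content check: true for any key whose bytes equal the sentinel's bytes.
    static boolean isSentinelByContent(WrappedKey k) {
        return Arrays.equals(k.bytes, NO_NEXT_KEY);
    }

    public static void main(String[] args) {
        WrappedKey shared = new WrappedKey(NO_NEXT_KEY);
        WrappedKey copied = new WrappedKey(NO_NEXT_KEY.clone()); // new array instance
        System.out.println(isSentinelByIdentity(shared) + " " + isSentinelByContent(shared)); // true true
        System.out.println(isSentinelByIdentity(copied) + " " + isSentinelByContent(copied)); // false true
    }
}
```

A content compare only works if the sentinel's content can never occur as a real key; the alternative is to keep one sentinel object and make sure it is never copied.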


[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-02 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14344671#comment-14344671
 ] 

Lars Hofhansl commented on HBASE-13109:
---

Oh, and I saw your sample patch. I think my version is even more radical... I 
change the indexed key to a Cell everywhere above the HFileBlockIndex :)


[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-02 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14344598#comment-14344598
 ] 

Lars Hofhansl commented on HBASE-13109:
---

OK... with deletes. Same as above, but with an additional 400k deletes (all 
columns deleted in every 10th row).

Without patch:
||Wildcard||Col 2+4||
|4.38|12.0|

With patch:
||Wildcard||Col 2+4||
|4.39|4.74|



[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-02 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14344663#comment-14344663
 ] 

Lars Hofhansl commented on HBASE-13109:
---

Ah, too late, I made a patch already :) And it does make things nicer. I'm fine 
with either -v4 or -v5.


[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-02 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14344510#comment-14344510
 ] 

Lars Hofhansl commented on HBASE-13109:
---

Some more tests with many HFiles (4M rows, 5 columns, 1 version, as above). No 
compactions at all: 25 HFiles of 10MB each.

Without patch:
||Wildcard||Col 2+4||
|4.59|13.5|

With patch:
||Wildcard||Col 2+4||
|4.38|4.89|

Almost a 3x improvement. I have convinced myself that this is good.

Is anybody up for independent tests? I have a 0.98 patch as well (in fact, 
that's the one I used for testing).



[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14344656#comment-14344656
 ] 

Hadoop QA commented on HBASE-13109:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12702089/nextIndexKVChange_new.patch
  against master branch at commit daed00fc98167870463e77b620e9adb6ce9b204d.
  ATTACHMENT ID: 12702089

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.
{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.1 2.5.2 2.6.0)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.regionserver.TestBlocksScanned

 {color:red}-1 core zombie tests{color}.  There are 1 zombie test(s):
    at org.apache.camel.test.junit4.CamelTestSupport.doStopCamelContext(CamelTestSupport.java:450)
    at org.apache.camel.test.junit4.CamelTestSupport.tearDown(CamelTestSupport.java:351)

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13052//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13052//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13052//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13052//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13052//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13052//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13052//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13052//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13052//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13052//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13052//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13052//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13052//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13052//console

This message is automatically generated.


[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-02 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14344668#comment-14344668
 ] 

Lars Hofhansl commented on HBASE-13109:
---

TestBlocksScanned passed locally.


[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-02 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14344605#comment-14344605
 ] 

ramkrishna.s.vasudevan commented on HBASE-13109:


I'm not hijacking the issue, just trying to convey my point here. 


[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-02 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14344626#comment-14344626
 ] 

Lars Hofhansl commented on HBASE-13109:
---

NP [~ram_krish]. Thanks for taking a look!

I am not actually too concerned about the KeyOnlyKeyValue object (actually, 
could you have a look at the first patch I attached here, where I optimized it 
a bit?).
I can make a Cell from the indexed key (in trunk at least). But 
KeyValue.KVComparator.compareOnlyKeyPortion(Cell, Cell) will not work, because 
I cannot make a Cell from the seek Cell in SQM without materializing the 
byte[]... That's the part I have to avoid.

What I could do (in trunk) is... Instead of:
{code}
public int compareKey(byte[] key, int koff, int klen,
byte[] row, int roff, int rlen,
byte[] fam, int foff, int flen,
byte[] col, int coff, int clen,
long ts, byte type)
{code}

We'd wrap the indexed key in a KeyOnlyKeyValue and have:
{code}
public int compareKey(Cell cell,
byte[] row, int roff, int rlen,
byte[] fam, int foff, int flen,
byte[] col, int coff, int clen,
long ts, byte type)
{code}

Actually, I think we should then do it all the way down in AbstractHFileScanner 
and store the nextIndexedKey as a Cell instead of a byte[].

Let me do that.
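A minimal sketch of the proposed `compareKey(Cell, row..., ts, type)` shape: an existing cell is compared directly against a key given as components, so the seek key never has to be serialized into a byte[]. `SimpleCell`, `ArrayCell` and `KeyCompare` are hypothetical stand-ins, not HBase classes. The ordering assumed here is row, family and qualifier ascending (unsigned bytes), then timestamp descending and type code descending; the sketch compares family and qualifier separately and omits special cases the real HBase comparator handles.

```java
// Hypothetical minimal cell view; NOT org.apache.hadoop.hbase.Cell.
interface SimpleCell {
    byte[] getRow();
    byte[] getFamily();
    byte[] getQualifier();
    long getTimestamp();
    byte getTypeByte();
}

// Hypothetical array-backed implementation, e.g. wrapping an indexed key.
final class ArrayCell implements SimpleCell {
    private final byte[] row, family, qualifier;
    private final long ts;
    private final byte type;

    ArrayCell(byte[] row, byte[] family, byte[] qualifier, long ts, byte type) {
        this.row = row; this.family = family; this.qualifier = qualifier;
        this.ts = ts; this.type = type;
    }
    public byte[] getRow() { return row; }
    public byte[] getFamily() { return family; }
    public byte[] getQualifier() { return qualifier; }
    public long getTimestamp() { return ts; }
    public byte getTypeByte() { return type; }
}

final class KeyCompare {
    // Unsigned lexicographic comparison of two byte ranges.
    static int compareBytes(byte[] a, int aoff, int alen, byte[] b, int boff, int blen) {
        int n = Math.min(alen, blen);
        for (int i = 0; i < n; i++) {
            int x = a[aoff + i] & 0xff;
            int y = b[boff + i] & 0xff;
            if (x != y) return x - y;
        }
        return alen - blen;
    }

    // Negative if cell sorts before the component key, 0 if equal, positive if after.
    static int compareKey(SimpleCell cell,
                          byte[] row, int roff, int rlen,
                          byte[] fam, int foff, int flen,
                          byte[] col, int coff, int clen,
                          long ts, byte type) {
        byte[] r = cell.getRow();
        int c = compareBytes(r, 0, r.length, row, roff, rlen);
        if (c != 0) return c;
        byte[] f = cell.getFamily();
        c = compareBytes(f, 0, f.length, fam, foff, flen);
        if (c != 0) return c;
        byte[] q = cell.getQualifier();
        c = compareBytes(q, 0, q.length, col, coff, clen);
        if (c != 0) return c;
        // Newer timestamps sort first (descending).
        c = Long.compare(ts, cell.getTimestamp());
        if (c != 0) return c;
        // Larger unsigned type codes sort first (descending).
        return (type & 0xff) - (cell.getTypeByte() & 0xff);
    }
}
```

Because the component arguments carry offsets and lengths, the caller can pass slices of existing buffers without copying them.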


[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-02 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14344524#comment-14344524
 ] 

Lars Hofhansl commented on HBASE-13109:
---

One more test I'll do is with deletes.


[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-02 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14344647#comment-14344647
 ] 

ramkrishna.s.vasudevan commented on HBASE-13109:


bq. But KeyValue.KVComparator.compareOnlyKeyPortion(Cell, Cell) will not work, 
because I cannot make a Cell from the seek Cell in SQM without materializing 
the byte[]... That's the part I have to avoid.
I understand why you have to avoid this: you will not use the Cell from SQM 
directly, as the ts and type are the ones that you will be passing. 
In our internal branch we tried an approach for cases where we want FirstOnRow, 
LastOnCol, or firstOnCol types of KVs: we created a new FirstOnCol cell object 
wrapping the passed cell, but its getTS and getType would return 
LATEST_TIMESTAMP/MIN_TIMESTAMP and type MAX/MIN, depending on what we want, so 
that the comparison is between two cells. Anyway, I think changing nextIndexKey 
to a Cell does not matter, except that there is a new compare() API. I'm fine 
with carrying on as it is now in your patches. Thanks, Lars.
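The "fake cell" idea can be sketched as a wrapper that delegates row and column to a real key but overrides timestamp and type, so it sorts before (or after) every real version of that column. The internal-branch code is not shown in this thread; this is a hypothetical sketch of the idea only. `KeyView` and `FakeKeys` are invented names, the family is omitted for brevity, and the assumed ordering is row and qualifier ascending, timestamp and type descending.

```java
// Hypothetical minimal key view (family omitted); NOT an HBase type.
interface KeyView {
    String row();
    String qualifier();
    long timestamp();
    int type(); // unsigned type code, 0..255
}

final class FakeKeys {
    static final long NEWEST_TS = Long.MAX_VALUE; // newest possible timestamp
    static final long OLDEST_TS = Long.MIN_VALUE; // oldest possible timestamp
    static final int TYPE_MAX = 255;              // comparable to KeyValue.Type.Maximum
    static final int TYPE_MIN = 0;                // comparable to KeyValue.Type.Minimum

    // A plain key with its own values.
    static KeyView real(String r, String q, long ts, int type) {
        return new KeyView() {
            public String row() { return r; }
            public String qualifier() { return q; }
            public long timestamp() { return ts; }
            public int type() { return type; }
        };
    }

    // Same row/column as k, but sorts before every real version of that column.
    static KeyView firstOnCol(KeyView k) { return wrap(k, NEWEST_TS, TYPE_MAX); }

    // Same row/column as k, but sorts after every real version of that column.
    static KeyView lastOnCol(KeyView k) { return wrap(k, OLDEST_TS, TYPE_MIN); }

    private static KeyView wrap(KeyView k, long ts, int type) {
        return new KeyView() {
            public String row() { return k.row(); }             // delegated
            public String qualifier() { return k.qualifier(); } // delegated
            public long timestamp() { return ts; }              // overridden
            public int type() { return type; }                  // overridden
        };
    }

    // Row and qualifier ascending; timestamp and type descending.
    static int compare(KeyView a, KeyView b) {
        int c = a.row().compareTo(b.row());
        if (c != 0) return c;
        c = a.qualifier().compareTo(b.qualifier());
        if (c != 0) return c;
        c = Long.compare(b.timestamp(), a.timestamp());
        if (c != 0) return c;
        return Integer.compare(b.type(), a.type());
    }
}
```

Since the wrapper is itself a key, a single `compare(key, key)` method covers both real and fake keys, which matches the "two cells" comparison described above.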


[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14342951#comment-14342951
 ] 

Hadoop QA commented on HBASE-13109:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12701810/13109-trunk-v3.txt
  against master branch at commit 4fb6f91cbad7d9b3c18f897ee3a4f70dc7c21595.
  ATTACHMENT ID: 12701810

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified tests.
{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.1 2.5.2 2.6.0)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 1 
warning messages.

{color:red}-1 checkstyle{color}.  The applied patch generated 
1938 checkstyle errors (more than the master's current 1937 errors).

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.security.access.TestScanEarlyTermination

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13032//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13032//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13032//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13032//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13032//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13032//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13032//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13032//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13032//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13032//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13032//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13032//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13032//artifact/patchprocess/checkstyle-aggregate.html

Javadoc warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13032//artifact/patchprocess/patchJavadocWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13032//console

This message is automatically generated.

 Make better SEEK vs SKIP decisions during scanning
 --

 Key: HBASE-13109
 URL: https://issues.apache.org/jira/browse/HBASE-13109
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Priority: Minor
 Attachments: 13109-trunk-v2.txt, 13109-trunk-v3.txt, 13109-trunk.txt


[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-02 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14343534#comment-14343534
 ] 

ramkrishna.s.vasudevan commented on HBASE-13109:


[~larsh]
Still going through the logic of the patch. I verified the comparator API that was 
added; it seems fine to me.
But one suggestion: make the nextIndexedKey a Cell and create a 
KeyOnlyKeyValue out of it, then use the normal CellComparator.compare(cell, 
cell). Won't that work? What do you think? 

 Make better SEEK vs SKIP decisions during scanning
 --

 Key: HBASE-13109
 URL: https://issues.apache.org/jira/browse/HBASE-13109
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Priority: Minor
 Attachments: 13109-trunk-v2.txt, 13109-trunk-v3.txt, 13109-trunk.txt


[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-02 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14343587#comment-14343587
 ] 

Lars Hofhansl commented on HBASE-13109:
---

TestScanEarlyTermination passed locally.

Also, thanks [~ram_krish] for looking at the comparator.
And re: the KeyOnlyKeyValue, I saw that was done down in 
AbstrackHFileScanner.reseekTo; there it's not time critical, since it only 
happens when we have issued a seek anyway.


 Make better SEEK vs SKIP decisions during scanning
 --

 Key: HBASE-13109
 URL: https://issues.apache.org/jira/browse/HBASE-13109
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Priority: Minor
 Attachments: 13109-trunk-v2.txt, 13109-trunk-v3.txt, 13109-trunk.txt


[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-02 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14343575#comment-14343575
 ] 

Lars Hofhansl commented on HBASE-13109:
---

bq. make the nextIndexedKEy to a Cell and create a KeyOnlyKeyValue out of it. 
Then use the normal CellComparator.compare(cell, cell).

I agree that would be nicer. But it would be *very* slow. This decision is 
made for every single KeyValue (if the SQM decides that it wants to seek). 
That's why I added a special compare, so that the key can be compared in place 
without creating a KV object or even a new byte[].

Note that two Cells would have to be created: (1) the Cell representing the 
indexed key, and (2) the Cell representing the seek key in the SQM. With this 
patch no new objects are created at all.

Just avoiding creating the key array saved 0.7s over 4M rows (5 columns). 
Making KeyValues or Cells would be more expensive.
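To illustrate what comparing "in place" means, here is a minimal sketch. It assumes the KeyValue-style serialized key layout: a 2-byte row length, the row, a 1-byte family length, the family, the qualifier, an 8-byte timestamp, and a 1-byte type. The class and method names are hypothetical and are not the patch's actual code. The sketch compares a seek target, given as separate row/family/qualifier arrays, against a key sitting at an offset inside a larger buffer, without allocating a KeyValue, a Cell, or a new byte[]. For brevity it ignores timestamp and type ordering.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public final class InPlaceKeyCompare {
    // Assumed serialized key layout (KeyValue-style):
    // [rowLen:2 bytes][row][famLen:1 byte][family][qualifier][timestamp:8 bytes][type:1 byte]
    private static final int ROW_LEN_SIZE = 2;
    private static final int FAM_LEN_SIZE = 1;
    private static final int TS_AND_TYPE_SIZE = 8 + 1;

    private InPlaceKeyCompare() {}

    /** Unsigned lexicographic comparison of a[aOff..aOff+aLen) and b[bOff..bOff+bLen). */
    public static int compareBytes(byte[] a, int aOff, int aLen, byte[] b, int bOff, int bLen) {
        int n = Math.min(aLen, bLen);
        for (int i = 0; i < n; i++) {
            int x = a[aOff + i] & 0xff;
            int y = b[bOff + i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return aLen - bLen;
    }

    /**
     * Compares a seek target (row, family, qualifier) with the serialized key stored in
     * key[keyOff..keyOff+keyLen). Nothing is copied or allocated. Timestamp and type are
     * ignored in this simplified sketch.
     */
    public static int compareRowFamQual(byte[] row, byte[] fam, byte[] qual,
                                        byte[] key, int keyOff, int keyLen) {
        // Row length is a big-endian 2-byte value at the start of the key.
        int rowLen = ((key[keyOff] & 0xff) << 8) | (key[keyOff + 1] & 0xff);
        int pos = keyOff + ROW_LEN_SIZE;
        int c = compareBytes(row, 0, row.length, key, pos, rowLen);
        if (c != 0) {
            return c;
        }
        pos += rowLen;
        int famLen = key[pos] & 0xff;
        pos += FAM_LEN_SIZE;
        c = compareBytes(fam, 0, fam.length, key, pos, famLen);
        if (c != 0) {
            return c;
        }
        pos += famLen;
        // Whatever remains before the timestamp and type is the qualifier.
        int qualLen = keyOff + keyLen - pos - TS_AND_TYPE_SIZE;
        return compareBytes(qual, 0, qual.length, key, pos, qualLen);
    }

    /** Demo helper only: builds a key in the assumed layout (this one does allocate). */
    public static byte[] buildKey(byte[] row, byte[] fam, byte[] qual, long ts, byte type) {
        ByteBuffer bb = ByteBuffer.allocate(
            ROW_LEN_SIZE + row.length + FAM_LEN_SIZE + fam.length + qual.length + TS_AND_TYPE_SIZE);
        bb.putShort((short) row.length);
        bb.put(row).put((byte) fam.length).put(fam).put(qual).putLong(ts).put(type);
        return bb.array();
    }

    public static void main(String[] args) {
        byte[] key = buildKey(b("row2"), b("f"), b("c3"), 1L, (byte) 4);
        // Seek target row2/f:c2 sorts before the stored key row2/f:c3, so this prints true.
        System.out.println(compareRowFamQual(b("row2"), b("f"), b("c2"), key, 0, key.length) < 0);
    }

    private static byte[] b(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }
}
```

The only state the comparison needs is offsets into existing arrays. That is the property the comment above relies on when it says no new objects are created per KeyValue.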


 Make better SEEK vs SKIP decisions during scanning
 --

 Key: HBASE-13109
 URL: https://issues.apache.org/jira/browse/HBASE-13109
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Priority: Minor
 Attachments: 13109-trunk-v2.txt, 13109-trunk-v3.txt, 13109-trunk.txt


[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-01 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14342597#comment-14342597
 ] 

Lars Hofhansl commented on HBASE-13109:
---

All the test failures have the same cause (using KeyOnlyKeyValue). I removed 
that part of the code (it was only there to avoid some code duplication).

 Make better SEEK vs SKIP decisions during scanning
 --

 Key: HBASE-13109
 URL: https://issues.apache.org/jira/browse/HBASE-13109
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Priority: Minor
 Attachments: 13109-trunk.txt


[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-01 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14342754#comment-14342754
 ] 

Lars Hofhansl commented on HBASE-13109:
---

Numbers with the new patch (avoiding the array creation helps):
With patch:
||Wildcard||Col 2+4||
|3.9|4.4|

The ExplicitColumnTracker is now only ~13% slower than the wildcard column 
tracker (4.4 vs 3.9; it was almost 2x slower before).
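The ratios behind these statements are worked out below from the timings in this thread. The thread gives no units. The 5.1 and 4.4 Col 2+4 runs differ by 0.7, which matches the 0.7s saving reported for avoiding the key-array creation, so seconds are assumed.

```latex
\frac{7.27}{3.9} \approx 1.86 \ \text{(before the patch)}, \qquad
\frac{4.4}{3.9} \approx 1.13 \ \text{(with the new patch)}, \qquad
5.1 - 4.4 = 0.7
```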


 Make better SEEK vs SKIP decisions during scanning
 --

 Key: HBASE-13109
 URL: https://issues.apache.org/jira/browse/HBASE-13109
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Priority: Minor
 Attachments: 13109-trunk-v2.txt, 13109-trunk.txt


[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14342803#comment-14342803
 ] 

Hadoop QA commented on HBASE-13109:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12701779/13109-trunk-v2.txt
  against master branch at commit 70ecf18817ef219389a9e024ff21ffb99b6615d9.
  ATTACHMENT ID: 12701779

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified tests.
{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.1 2.5.2 2.6.0)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 1 
warning messages.

{color:red}-1 checkstyle{color}.  The applied patch generated 
1938 checkstyle errors (more than the master's current 1937 errors).

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.regionserver.TestBlocksRead

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13030//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13030//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13030//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13030//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13030//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13030//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13030//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13030//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13030//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13030//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13030//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13030//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13030//artifact/patchprocess/checkstyle-aggregate.html

Javadoc warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13030//artifact/patchprocess/patchJavadocWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13030//console

This message is automatically generated.

 Make better SEEK vs SKIP decisions during scanning
 --

 Key: HBASE-13109
 URL: https://issues.apache.org/jira/browse/HBASE-13109
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Priority: Minor
 Attachments: 13109-trunk-v2.txt, 13109-trunk.txt


[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-01 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14342839#comment-14342839
 ] 

stack commented on HBASE-13109:
---

Should Scan.LOOK_AHEAD be deprecated/become a no-op in case someone is using it?

We need to add more compares to KV? There ain't enough going on in there already 
(smile)?

getNextIndexedKey makes sense, but should we be returning byte[]? Why not a 
Cell? Doesn't byte[] presume a certain format?

getKeyForNextRow is commented out. Remove?

I like the way you add in this optimize method and it works or it doesn't.

When will optimize be optimal? When will it not add value (you say selecting 
2 and 4 above is the worst case, but generally)? Sorry for the dumb questions; I 
don't know this stuff well.

 Make better SEEK vs SKIP decisions during scanning
 --

 Key: HBASE-13109
 URL: https://issues.apache.org/jira/browse/HBASE-13109
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Priority: Minor
 Attachments: 13109-trunk-v2.txt, 13109-trunk.txt


[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-03-01 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14342854#comment-14342854
 ] 

Lars Hofhansl commented on HBASE-13109:
---

Should deprecate Scan.LOOK_AHEAD in 1.0.1, so that we can remove it in 1.1. 
(per our policy that is possible)
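One way to follow the deprecate-then-remove path described here is sketched below. Old callers keep compiling and running, and the server quietly ignores the setting. `ScanSettings`, `LOOK_AHEAD`, its string value, and `effectiveLookAhead()` are hypothetical stand-ins for illustration only, not HBase's `Scan` API.

```java
import java.util.HashMap;
import java.util.Map;

public final class ScanSettings {
    /**
     * @deprecated No longer needed: the scanner now decides between SEEK and SKIP itself.
     *             The value is still accepted but ignored; scheduled for removal in a
     *             later minor release.
     */
    @Deprecated
    public static final String LOOK_AHEAD = "_look_ahead_"; // hypothetical attribute name

    private final Map<String, byte[]> attributes = new HashMap<>();

    /** Old clients may still pass LOOK_AHEAD here; it is stored but has no effect. */
    public ScanSettings setAttribute(String name, byte[] value) {
        attributes.put(name, value);
        return this;
    }

    public byte[] getAttribute(String name) {
        return attributes.get(name);
    }

    /** Server side: the look-ahead hint is deliberately treated as a no-op. */
    public int effectiveLookAhead() {
        return 0;
    }
}
```

Keeping the constant for one release lets existing code build with only a deprecation warning, which is what makes removing it in the following minor release safe.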

The indexed key comes out of the HFile as a key, and yes, it presumes a 
KeyValue key format all over the place. :(
Translating it into a Cell would be measurably slower; we could try to record it 
as a Cell in the first place.

The compare in KV is unfortunately needed to avoid materializing the seek key 
just for this check. I did not like writing that part.

Yeah, need to remove the commented-out stuff.

Optimize is optimizing heuristically:
* Many versions of KVs are spread all over the HFiles. The heuristic of 
checking the top scanner might not be optimal in that case. But then again, we'd 
need to seek into many files for the rest, so by comparison the cost should be low.
* SQM says SEEK, and optimize does not change this. In that case we wasted a 
compare; that's OK, a seek is *way* more expensive.
* It *is* a heuristic. In some one-off cases we might do some SKIPs before 
we end up seeking.

I'd not be afraid to deploy this for us in production (I am most worried that I got 
the new compare method wrong... Any chance of eyeballing that, [~stack]?)

New patch coming to fix the test. The test is weird: it sets the block size to 
1 (yes, 1 byte) and then counts the blocks loaded for Bloom filters, so of 
course this throws the count off. I will disable this optimization for Gets anyway; 
there's no point.


 Make better SEEK vs SKIP decisions during scanning
 --

 Key: HBASE-13109
 URL: https://issues.apache.org/jira/browse/HBASE-13109
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Priority: Minor
 Attachments: 13109-trunk-v2.txt, 13109-trunk.txt


[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-02-28 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14341966#comment-14341966
 ] 

Lars Hofhansl commented on HBASE-13109:
---

(I might work on optimizing away the need to call matcher.getKeyForNextColumn, 
since that produces a new byte[] that I'd rather avoid.)

 Make better SEEK vs SKIP decisions during scanning
 --

 Key: HBASE-13109
 URL: https://issues.apache.org/jira/browse/HBASE-13109
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Priority: Minor
 Attachments: 13109-trunk.txt


[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-02-28 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14342017#comment-14342017
 ] 

Lars Hofhansl commented on HBASE-13109:
---

Will look at those tomorrow.

 Make better SEEK vs SKIP decisions during scanning
 --

 Key: HBASE-13109
 URL: https://issues.apache.org/jira/browse/HBASE-13109
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Priority: Minor
 Attachments: 13109-trunk.txt


[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-02-28 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14341894#comment-14341894
 ] 

Lars Hofhansl commented on HBASE-13109:
---

The first patch does a bunch of things:
# Gets rid of Scan.LOOK_AHEAD, which is no longer needed.
# Optimizes KeyValue.KeyOnlyKeyValue a bit (reusing the fields from KeyValue 
to save a few bytes of heap). I could remove that part since I'm not using it 
anymore, but it seemed good anyway.
# Uses the next indexed key that the HFileScanner already maintains to decide 
whether a seek would be likely to land in a new block. If so, the StoreScanner 
will continue to issue SEEKs; otherwise it will SKIP instead.
# Adds some helpers to KeyValueUtil to build a key array, to avoid creating 
KeyValue objects unnecessarily.
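The decision in the third point is sketched below under several simplifying assumptions. Keys are modeled as flat byte[] values compared lexicographically (unsigned). The match codes are a hypothetical two-value enum, not HBase's `ScanQueryMatcher.MatchCode`. `seekKey` stands for the key the matcher would seek to, and `nextIndexedKey` for the first key of the next block. The `isGet` flag reflects the comment in this thread that the optimization will be disabled for Gets. This is an illustration of the heuristic, not the patch's code.

```java
import java.util.Arrays;

public final class SeekOrSkip {
    /** Hypothetical, simplified match codes; not HBase's ScanQueryMatcher.MatchCode. */
    public enum Code { SEEK, SKIP }

    private SeekOrSkip() {}

    /**
     * Downgrades a SEEK to a SKIP when the seek target is reached no later than the
     * start of the next block.
     *
     * @param proposed       decision from the (hypothetical) query matcher
     * @param seekKey        key the matcher wants to seek to, as a flat byte[]
     * @param nextIndexedKey first key of the next block, or null if unknown (e.g. last block)
     * @param isGet          true for single-row Gets, where the heuristic is not applied
     */
    public static Code optimize(Code proposed, byte[] seekKey, byte[] nextIndexedKey, boolean isGet) {
        if (proposed != Code.SEEK || isGet || nextIndexedKey == null) {
            return proposed; // nothing to optimize, or no block boundary to compare against
        }
        // nextIndexedKey >= seekKey: stepping forward cell by cell reaches the target no later
        // than the first cell of the next block, so cheap SKIPs beat a full seek.
        // nextIndexedKey < seekKey: the target lies in a later block, so a real SEEK pays off.
        return Arrays.compareUnsigned(nextIndexedKey, seekKey) >= 0 ? Code.SKIP : Code.SEEK;
    }

    public static void main(String[] args) {
        byte[] next = "row5".getBytes();
        System.out.println(optimize(Code.SEEK, "row3".getBytes(), next, false)); // SKIP
        System.out.println(optimize(Code.SEEK, "row7".getBytes(), next, false)); // SEEK
    }
}
```

When the heuristic guesses wrong, the cost is a few extra SKIPs or one wasted compare. A seek that lands in the same block is far more expensive than either, which matches the reasoning given elsewhere in this thread.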


 Make better SEEK vs SKIP decisions during scanning
 --

 Key: HBASE-13109
 URL: https://issues.apache.org/jira/browse/HBASE-13109
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Priority: Minor
 Attachments: 13109-trunk.txt


[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-02-28 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14341959#comment-14341959
 ] 

Lars Hofhansl commented on HBASE-13109:
---

Some more tests (similar to those in HBASE-9778, but this is a different machine, 
so don't compare absolute values): 4M rows, 5 columns, 1 version.

Without patch:
||Wildcard||Col 2+4||
|3.9|7.27|

With patch:
||Wildcard||Col 2+4||
|3.9|5.1|
(selecting columns 2 and 4 is the worst case)

So this patch improves the ExplicitColumnTracker by almost a third, and the 
beauty of this change is that it still works with very many versions, 
because it uses whether we can seek into another block as the metric to decide 
whether to seek or not.
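The "almost a third" figure follows from the Col 2+4 timings in the two tables above (7.27 without the patch, 5.1 with it):

```latex
1 - \frac{5.1}{7.27} \approx 1 - 0.70 = 0.30
```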


 Make better SEEK vs SKIP decisions during scanning
 --

 Key: HBASE-13109
 URL: https://issues.apache.org/jira/browse/HBASE-13109
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Priority: Minor
 Attachments: 13109-trunk.txt


[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-02-28 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14341961#comment-14341961
 ] 

Lars Hofhansl commented on HBASE-13109:
---

I'll stop here until I get some feedback. Any thoughts ([~stack]?)

 Make better SEEK vs SKIP decisions during scanning
 --

 Key: HBASE-13109
 URL: https://issues.apache.org/jira/browse/HBASE-13109
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Priority: Minor
 Attachments: 13109-trunk.txt


[jira] [Commented] (HBASE-13109) Make better SEEK vs SKIP decisions during scanning

2015-02-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14342012#comment-14342012
 ] 

Hadoop QA commented on HBASE-13109:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12701657/13109-trunk.txt
  against master branch at commit dd78f459e8f10e4587742a049e38d8c6b50dd0cb.
  ATTACHMENT ID: 12701657

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified tests.
{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.1 2.5.2 2.6.0)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 1 
warning messages.

{color:red}-1 checkstyle{color}.  The applied patch generated 
1938 checkstyle errors (more than the master's current 1937 errors).

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.regionserver.TestScanWithBloomError
   org.apache.hadoop.hbase.regionserver.TestScanner
   org.apache.hadoop.hbase.regionserver.TestStoreFileRefresherChore
   org.apache.hadoop.hbase.regionserver.TestColumnSeeking
   org.apache.hadoop.hbase.regionserver.TestMinVersions
   org.apache.hadoop.hbase.regionserver.TestResettingCounters

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13020//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13020//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13020//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13020//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13020//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13020//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13020//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13020//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13020//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13020//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13020//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13020//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13020//artifact/patchprocess/checkstyle-aggregate.html

Javadoc warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13020//artifact/patchprocess/patchJavadocWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13020//console

This message is automatically generated.

 Make better SEEK vs SKIP decisions during scanning
 --

 Key: HBASE-13109
 URL: https://issues.apache.org/jira/browse/HBASE-13109
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Priority: Minor
 Attachments: 13109-trunk.txt

