[jira] [Commented] (HBASE-9000) Linear reseek in Memstore

2014-03-11 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13930909#comment-13930909
 ] 

Lars Hofhansl commented on HBASE-9000:
--

HBASE-9778 is committed now. It adds a configurable, optional look-ahead to
ExplicitColumnTracker, so it takes effect in both the MemStore and StoreScanner. I
did look into doing the same in StoreFileScanner, but most of the time of a reseek
is spent before we even get to StoreFileScanner (checking whether we are
seeking forward, checking whether we're still on the same block, seeking every
HFile forward, etc.).


 Linear reseek in Memstore
 -

 Key: HBASE-9000
 URL: https://issues.apache.org/jira/browse/HBASE-9000
 Project: HBase
  Issue Type: Improvement
  Components: Performance
Affects Versions: 0.89-fb
Reporter: Shane Hogan
Priority: Minor
 Fix For: 0.89-fb

 Attachments: hbase-9000-benchmark-program.patch, 
 hbase-9000-port-fb.patch, hbase-9000.patch


 This is to address the linear reseek in MemStoreScanner. Currently reseek
 iterates over the kvset and the snapshot linearly by just calling next
 repeatedly. The new solution is to do this linear seek up to a configurable
 maximum number of times and then, if the seek is not yet complete, fall back
 to a logarithmic seek.
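
The description above can be sketched roughly as follows. This is only an
illustration under stated assumptions, not the code in the attached patches:
the class HybridReseekSketch, its fields, and the use of plain Comparable
elements in place of KeyValues are all hypothetical.

```java
import java.util.Iterator;
import java.util.NavigableSet;
import java.util.concurrent.ConcurrentSkipListSet;

// Hypothetical sketch: a scanner over a sorted set that tries a bounded
// number of next() calls before falling back to a logarithmic seek.
class HybridReseekSketch<T extends Comparable<T>> {
  private final NavigableSet<T> set;  // stands in for the kvset or the snapshot
  private final int linearLimit;      // maximum next() calls before falling back
  private Iterator<T> it;
  private T current;                  // element the scanner is positioned on, or null
  int fallbacks = 0;                  // how often the logarithmic path was taken

  HybridReseekSketch(NavigableSet<T> set, int linearLimit) {
    this.set = set;
    this.linearLimit = linearLimit;
    this.it = set.iterator();
    this.current = it.hasNext() ? it.next() : null;
  }

  T peek() {
    return current;
  }

  /** Moves forward to the first element >= key; returns false if there is none. */
  boolean reseek(T key) {
    // Phase 1: bounded linear scan from the current position.
    for (int i = 0; i < linearLimit && current != null; i++) {
      if (current.compareTo(key) >= 0) {
        return true;
      }
      current = it.hasNext() ? it.next() : null;
    }
    if (current == null) {
      return false;  // ran off the end during the linear phase
    }
    if (current.compareTo(key) >= 0) {
      return true;   // the last linear step landed on the target
    }
    // Phase 2: logarithmic seek through the sorted set.
    fallbacks++;
    it = set.tailSet(key, true).iterator();
    current = it.hasNext() ? it.next() : null;
    return current != null;
  }

  public static void main(String[] args) {
    ConcurrentSkipListSet<Integer> kvset = new ConcurrentSkipListSet<>();
    for (int i = 0; i < 100; i++) {
      kvset.add(i * 2);
    }
    HybridReseekSketch<Integer> scanner = new HybridReseekSketch<>(kvset, 5);
    scanner.reseek(6);    // near target: reached by the linear phase
    scanner.reseek(101);  // far target: the linear phase gives up, tailSet() takes over
    System.out.println(scanner.peek() + " after " + scanner.fallbacks + " fallback(s)");
  }
}
```

A near target costs only a few iterator steps, while a far target pays for
the wasted linear steps plus one tailSet() lookup, which is the overhead the
limit is meant to bound.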



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-9000) Linear reseek in Memstore

2014-03-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13930974#comment-13930974
 ] 

Hadoop QA commented on HBASE-9000:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12611241/hbase-9000.patch
  against trunk revision .
  ATTACHMENT ID: 12611241

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified tests.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop1.1{color}.  The patch compiles against the hadoop 
1.1 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 lineLengths{color}.  The patch introduces the following lines 
longer than 100:
+  public final static String MEMSTORE_RESEEK_LINEAR_SEARCH_LIMIT_KEY = "hbase.hregion.memstore.linear.search.limit";
+  private static final Option NUM_COLUMNS_PER_ROW_OPTION = new Option("c", "columns-per-row", true, "columns per row");
+  private static final Option NUM_VERSIONS_OPTION = new Option("v", "versions", true, "number of versions");
+  private static final Option ROW_KEY_SIZE_OPTION = new Option("R", "row-key-size", true, "size of row-key");
+  private static final Option QUALIFIER_SIZE_OPTION = new Option("q", "qualifier-size", true, "size of qualifier");
+  private static final Option VALUE_SIZE_OPTION = new Option("V", "value-size", true, "size of value");
+  private static final Option RANDOM_SEED_OPTION = new Option("s", "random-seed", true, "random seed");
+  private void benchmarkReseeks(String name, KeyValueScanner scanner, List<KeyValue> keys) throws IOException {

{color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

{color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.io.TestHeapSize

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8950//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8950//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8950//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8950//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8950//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8950//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8950//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8950//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8950//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8950//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8950//console

This message is automatically generated.


[jira] [Commented] (HBASE-9000) Linear reseek in Memstore

2013-11-11 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13819616#comment-13819616
 ] 

Lars Hofhansl commented on HBASE-9000:
--

Just to add some data here. I loaded some data into a table. Then scanned it 
with Phoenix: took 34s. Then I flushed the table, scanned again: 5s. There 
might have been some other factors at work so this is a bit anecdotal, but this 
definitely needs some work.


[jira] [Commented] (HBASE-9000) Linear reseek in Memstore

2013-11-11 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13819630#comment-13819630
 ] 

Lars Hofhansl commented on HBASE-9000:
--

Never mind, that was because of caching. When I major compact (and thus wipe the
cache) it takes 34s again. I would delete my previous comment, but that would
be confusing.


[jira] [Commented] (HBASE-9000) Linear reseek in Memstore

2013-11-06 Thread Chao Shi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814700#comment-13814700
 ] 

Chao Shi commented on HBASE-9000:
-

bq. Should we do the same thing in StoreFileScanner? 
Yes, I think so.

bq. If so, why not do this in StoreScanner, for example, call next a few times
before calling reseek...
This is because StoreScanner does not have enough knowledge to judge whether to
do a reseek or several calls to next. As discussed earlier in this thread, an
attempt to do a linear reseek by calling next may hit an uncached block, whose
cost is huge compared to a logarithmic reseek that touches only cached index
blocks.


[jira] [Commented] (HBASE-9000) Linear reseek in Memstore

2013-11-05 Thread chunhui shen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814608#comment-13814608
 ] 

chunhui shen commented on HBASE-9000:
-

[~stepinto]
I understand the scenario that the patch is meant for.
Should we do the same thing in StoreFileScanner?
If so, why not do this in StoreScanner, for example, call next a few times
before calling reseek...

In my personal view, such an action seems a little rude.

+0 from me


[jira] [Commented] (HBASE-9000) Linear reseek in Memstore

2013-11-05 Thread chunhui shen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814609#comment-13814609
 ] 

chunhui shen commented on HBASE-9000:
-

I'm sorry, I have no better idea to optimize performance for this scenario.


[jira] [Commented] (HBASE-9000) Linear reseek in Memstore

2013-11-04 Thread Chao Shi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13813676#comment-13813676
 ] 

Chao Shi commented on HBASE-9000:
-

Hi [~zjushch],

bq. How to decide the config value?
I think most users should use the default value. This patch optimizes
performance in scenarios where reseek on the MemStore is the bottleneck, for
example, a scan with a filter that skips a lot of KVs. In that case, you need
to tweak this value to make sure linear seeks happen rather than reseeks (i.e.,
set this value to be greater than #versions in the MemStore, if you are using
SEEK_NEXT_COL). As most users don't have many versions per KV in their
MemStore, I think the default value should work well.
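
To make the rule of thumb above concrete, here is a small sketch that counts
how many next() calls a SEEK_NEXT_COL-style move needs when a column holds
several versions. Everything in it is an assumption for illustration: the
class LinearLimitSketch, the {column, version} pairs standing in for KVs, and
the `limit >= calls` comparison (the thread does not say whether the patch
compares with > or >=).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of one row: columns ascending, versions newest first.
class LinearLimitSketch {
  /** Builds the KVs of one row in scan order; each entry is {column, version}. */
  static List<int[]> rowInScanOrder(int columns, int versionsPerColumn) {
    List<int[]> kvs = new ArrayList<>();
    for (int c = 0; c < columns; c++) {
      for (int v = versionsPerColumn; v >= 1; v--) {
        kvs.add(new int[] {c, v});
      }
    }
    return kvs;
  }

  /** Counts the next() calls needed to leave the column at position 'from'. */
  static int nextCallsToNextColumn(List<int[]> kvs, int from) {
    int column = kvs.get(from)[0];
    int calls = 0;
    for (int i = from; i < kvs.size() && kvs.get(i)[0] == column; i++) {
      calls++;
    }
    return calls;
  }

  /** Assumed rule: the linear phase succeeds if it may take at least 'calls' steps. */
  static boolean linearSeekSuffices(int limit, int calls) {
    return limit >= calls;
  }

  public static void main(String[] args) {
    List<int[]> row = rowInScanOrder(3, 4);
    int calls = nextCallsToNextColumn(row, 0);
    System.out.println("next() calls needed: " + calls
        + ", limit 5 suffices: " + linearSeekSuffices(5, calls)
        + ", limit 2 suffices: " + linearSeekSuffices(2, calls));
  }
}
```

In this model, starting on the newest version of a column, the number of
next() calls equals the number of versions held for that column, which is why
the limit has to be sized against #versions.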

bq. IMPO, the new code in MemStore seems not friendly
Could you please explain more on what can be improved?


[jira] [Commented] (HBASE-9000) Linear reseek in Memstore

2013-10-31 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13810444#comment-13810444
 ] 

Ted Yu commented on HBASE-9000:
---

Patch looks good.


[jira] [Commented] (HBASE-9000) Linear reseek in Memstore

2013-10-31 Thread chunhui shen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13810980#comment-13810980
 ] 

chunhui shen commented on HBASE-9000:
-

As the performance tests above show, the patch doesn't fit all cases.

How should the config value be chosen?

IMPO, the new code in MemStore does not seem friendly.


[jira] [Commented] (HBASE-9000) Linear reseek in Memstore

2013-10-30 Thread Chao Shi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13808781#comment-13808781
 ] 

Chao Shi commented on HBASE-9000:
-

bq. Hah. I just did something very similar, but actually in StoreScanner.  Not
quite ready, but assumes a seek within a row or to the next row is a near seek,
and then tries next() a few times (I picked 10 as a default).

Did you mean to call next at the StoreScanner level? I'm not sure it is
possible to do this there, as it may not have the necessary knowledge about the
cost of next vs. reseek. For example, a next attempt that causes reading
another uncached block from the FS is probably worse than doing a reseek. I may
be wrong, as I didn't read your implementation.


[jira] [Commented] (HBASE-9000) Linear reseek in Memstore

2013-10-30 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809327#comment-13809327
 ] 

Lars Hofhansl commented on HBASE-9000:
--

Yeah, I agree. It's actually better to do that at the MemStoreScanner and
StoreFileScanner level.


[jira] [Commented] (HBASE-9000) Linear reseek in Memstore

2013-10-30 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809366#comment-13809366
 ] 

Ted Yu commented on HBASE-9000:
---

'reseek to next row' gets slower with the patch.
If linear.search.limit is lowered, would the slowdown be smaller?


[jira] [Commented] (HBASE-9000) Linear reseek in Memstore

2013-10-30 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809383#comment-13809383
 ] 

Ted Yu commented on HBASE-9000:
---

Some comments on the benchmark:
Please add the Apache license header.
Add class javadoc for MemStoreReseekBenchmark.
{code}
+  private final Random random = new Random();
{code}
Can you use a seed for the above call and log the value of the seed?
{code}
+  private List<byte[]> rows; private List<byte[]> qualifiers;
{code}
Please put the above on two lines.
{code}
+  boolean ok = scanner.reseek(key);
+  if (!ok) {
+    throw new AssertionError(!ok);
{code}
Please print the key which caused the assertion.
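
The two requests about the seed and the assertion message could look roughly
like this. It is a sketch only, not the benchmark's actual code: the class
SeededBenchmarkSketch and its methods are hypothetical, and a plain String
stands in for the KeyValue key.

```java
import java.util.Random;

// Hypothetical sketch of a reproducible benchmark helper.
class SeededBenchmarkSketch {
  private final long seed;
  private final Random random;

  SeededBenchmarkSketch(long seed) {
    this.seed = seed;
    this.random = new Random(seed);
    // Log the seed so that a surprising run can be repeated exactly.
    System.out.println("random seed = " + seed);
  }

  long seed() {
    return seed;
  }

  /** Picks a pseudo-random index in [0, bound), e.g. to choose a row or qualifier. */
  int nextIndex(int bound) {
    return random.nextInt(bound);
  }

  /** Fails with a message that names the key, instead of a bare "!ok". */
  static void checkReseek(boolean ok, String key) {
    if (!ok) {
      throw new AssertionError("reseek failed for key: " + key);
    }
  }

  public static void main(String[] args) {
    SeededBenchmarkSketch bench = new SeededBenchmarkSketch(12345L);
    int index = bench.nextIndex(100);
    checkReseek(true, "row-" + index);
  }
}
```

Two instances built with the same seed draw the same sequence of indexes,
which is what makes a logged seed useful for reproducing a run.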


[jira] [Commented] (HBASE-9000) Linear reseek in Memstore

2013-10-30 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809402#comment-13809402
 ] 

Lars Hofhansl commented on HBASE-9000:
--

bq. 'reseek to next row' gets slower with patch.

This is because the limit of 20 is not optimal for this particular case. We'll
never get it 100% right for all cases; what we need to do is avoid the worst
cases. In this case it'll be faster per op if the limit is increased to 30 (my
guess) or lowered.


[jira] [Commented] (HBASE-9000) Linear reseek in Memstore

2013-10-30 Thread Chao Shi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809870#comment-13809870
 ] 

Chao Shi commented on HBASE-9000:
-

bq. 'reseek to next row' gets slower with patch. If linear.search.limit is
lowered, would the slow down be less ?

Yes, the optimal value should vary case by case. I think a default value of 5
should not add much observable overhead.

I think a long-term solution would be implementing our own version of a
lock-free skip list, where we can access the higher-level next pointers (i.e.
to skip ahead) from the current position. This patch could be a temporary
solution for now, as it is very simple.
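
As a rough illustration of what "accessing the higher-level next pointers
from the current position" could mean, here is a deliberately simplified
skip list. It is single-threaded and NOT lock-free, it is not taken from any
patch, and all names (SkipListSeekSketch, Node, seekFrom) are hypothetical.

```java
import java.util.Random;

// Hypothetical, single-threaded skip list; keys must be > Integer.MIN_VALUE.
class SkipListSeekSketch {
  static final int MAX_LEVEL = 16;

  static final class Node {
    final int key;
    final Node[] next;  // next[i] is the successor at level i

    Node(int key, int height) {
      this.key = key;
      this.next = new Node[height];
    }
  }

  private final Node head = new Node(Integer.MIN_VALUE, MAX_LEVEL);
  private final Random random = new Random(42);  // fixed seed keeps the demo repeatable

  private int randomHeight() {
    int h = 1;
    while (h < MAX_LEVEL && random.nextBoolean()) {
      h++;
    }
    return h;
  }

  void insert(int key) {
    Node[] update = new Node[MAX_LEVEL];
    Node x = head;
    for (int i = MAX_LEVEL - 1; i >= 0; i--) {
      while (x.next[i] != null && x.next[i].key < key) {
        x = x.next[i];
      }
      update[i] = x;  // last node before 'key' at level i
    }
    if (x.next[0] != null && x.next[0].key == key) {
      return;  // ignore duplicates
    }
    Node n = new Node(key, randomHeight());
    for (int i = 0; i < n.next.length; i++) {
      n.next[i] = update[i].next[i];
      update[i].next[i] = n;
    }
  }

  Node first() {
    return head.next[0];
  }

  /**
   * Seeks forward from 'from' to the first node with key >= target, using
   * the tallest usable pointer of each visited node instead of restarting
   * from the head. Returns null if no such node exists.
   */
  static Node seekFrom(Node from, int target) {
    Node x = from;
    if (x.key >= target) {
      return x;
    }
    while (true) {
      int i = x.next.length - 1;
      while (i >= 0 && (x.next[i] == null || x.next[i].key >= target)) {
        i--;
      }
      if (i < 0) {
        return x.next[0];  // no pointer stays below target: the successor is the answer
      }
      x = x.next[i];
    }
  }

  public static void main(String[] args) {
    SkipListSeekSketch list = new SkipListSeekSketch();
    for (int k = 0; k < 1000; k += 3) {
      list.insert(k);
    }
    Node pos = seekFrom(list.first(), 500);
    System.out.println("first key >= 500: " + pos.key);
  }
}
```

A near seek degenerates into a few level-0 steps and a far seek climbs
whatever towers it meets, so no separate linear limit is needed. The JDK's
ConcurrentSkipListMap does not expose its internal nodes, which is presumably
why the comment talks about a custom implementation.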


[jira] [Commented] (HBASE-9000) Linear reseek in Memstore

2013-10-30 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809889#comment-13809889
 ] 

Ted Yu commented on HBASE-9000:
---

@Chao:
Thanks for the quick response.

Can you update the table using the new patch to see whether 'reseek to next
row' gets faster?


[jira] [Commented] (HBASE-9000) Linear reseek in Memstore

2013-10-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809906#comment-13809906
 ] 

Hadoop QA commented on HBASE-9000:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12611241/hbase-9000.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified tests.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 
2.0 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

{color:red}-1 site{color}.  The patch appears to cause mvn site goal to 
fail.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7684//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7684//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7684//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7684//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7684//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7684//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7684//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7684//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7684//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7684//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7684//console

This message is automatically generated.


[jira] [Commented] (HBASE-9000) Linear reseek in Memstore

2013-10-30 Thread Chao Shi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809958#comment-13809958
 ] 

Chao Shi commented on HBASE-9000:
-

I re-ran the benchmark program and got the following numbers. (As the
overhead is not significant, the numbers below are the median of 5 runs.)

||operation||trunk||w/ patch (n=5)||w/ patch (n=20)||
|reseek to next row|5.82 us|5.76 us|6.14 us|
|reseek to next column|3.397 us|0.596 us|0.572 us|

(where n is the limit on the number of linear seeks)

The numbers vary within +-10% between runs.
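
For readers who want the ratios behind the table, the following sketch just
divides the reported medians. The class name ReseekSpeedupSketch is
hypothetical and no new measurements are involved.

```java
import java.util.Locale;

// Worked arithmetic over the medians reported in the table above.
class ReseekSpeedupSketch {
  /** Trunk time divided by patched time; values above 1 mean the patch is faster. */
  static double speedup(double trunkMicros, double patchedMicros) {
    return trunkMicros / patchedMicros;
  }

  public static void main(String[] args) {
    String fmt = "%-28s %.2fx%n";
    System.out.printf(Locale.ROOT, fmt, "next column, n=5:", speedup(3.397, 0.596));
    System.out.printf(Locale.ROOT, fmt, "next column, n=20:", speedup(3.397, 0.572));
    System.out.printf(Locale.ROOT, fmt, "next row, n=5:", speedup(5.82, 5.76));
    System.out.printf(Locale.ROOT, fmt, "next row, n=20:", speedup(5.82, 6.14));
  }
}
```

The next-column reseek comes out at roughly 5.7x (n=5) and 5.9x (n=20) faster
than trunk, while both next-row ratios stay within the stated +-10%
run-to-run variation.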

bq. -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) 
warnings.
bq. -1 site. The patch appears to cause mvn site goal to fail.

These two QA -1s do not seem to be related to my patch.


[jira] [Commented] (HBASE-9000) Linear reseek in Memstore

2013-10-29 Thread Liang Xie (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13807774#comment-13807774
 ] 

Liang Xie commented on HBASE-9000:
--

[~stepinto], nice work!

Hi [~lhofhansl], would you like to implement your idea about far/near reseek in
this jira? Or let's just do a backport: change the current always-far
behavior to work in a manner similar to 0.89-fb's here? Thanks!


[jira] [Commented] (HBASE-9000) Linear reseek in Memstore

2013-10-29 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13808517#comment-13808517
 ] 

Lars Hofhansl commented on HBASE-9000:
--

bq. or let's just do a backport: change the current always far behavior into a 
similar manner like 0.89-fb's here
What does this entail? Reseeking can be useful in many cases; I would not want 
to go back and disable it in general.
We should also investigate other data structures for the memstore.
And lastly, in a typical system only a small fraction of the data is in the 
memstore; the majority of the data will be in HFiles and hence be scanned with 
StoreFileScanners. If that were not the case I would not advocate the use 
of HBase and would suggest something like memcached instead.

That all said, opportunistically performing a few next() calls and only then 
issuing a reseek would be a good addition. We could use MAX_VERSIONS as a 
guidepost here, or make it configurable (there are use cases where many 
versions/columns might be kept in HBase).



[jira] [Commented] (HBASE-9000) Linear reseek in Memstore

2013-10-29 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13808569#comment-13808569
 ] 

Lars Hofhansl commented on HBASE-9000:
--

In all fairness, we should not divide the runtime by the number of ops. The 
whole point of seeking is to reduce the number of ops. In that case it looks 
like this:
{quote}
Next: 136000.0 us
ReseekToNextRow: 443000.0 us
ReseekToNextColumn: 2243000.0 us
{quote}




[jira] [Commented] (HBASE-9000) Linear reseek in Memstore

2013-10-29 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13808771#comment-13808771
 ] 

Lars Hofhansl commented on HBASE-9000:
--

Hah. I just did something very similar, but actually in StoreScanner. It is not 
quite ready, but it assumes a seek within a row or to the next row is a near 
seek, and then tries next() a few times (I picked 10 as a default).

HBASE-9778 is yet another approach. Rather than doing a series of next() calls 
when a reseek was requested, that change would not request a reseek in the first 
place when that makes sense.



[jira] [Commented] (HBASE-9000) Linear reseek in Memstore

2013-10-28 Thread Chao Shi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13806669#comment-13806669
 ] 

Chao Shi commented on HBASE-9000:
-

I don't think tailSet is efficient. Consider a scenario in which a filter is 
present and keeps returning SEEK_NEXT_COL. A call to tailSet does not make use 
of the current position and has to relocate to the target through the skip 
list. In most cases, where maxVersions of a table is set to a small value, the 
scanner can instead skip at most maxVersions keys linearly.
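
One way to put rough numbers on this is to count comparator calls rather than wall-clock time. The sketch below is only an illustration using plain JDK collections: Integer keys stand in for KeyValue, and the class and method names (ReseekCost, measure) are made up for this example and are not HBase code. It counts the comparisons needed to reach a nearby key by walking forward versus by a fresh tailSet lookup.

```java
import java.util.Comparator;
import java.util.Iterator;
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper, not HBase code: Integer keys stand in for KeyValue.
class ReseekCost {
    // Returns {comparisons when walking forward, comparisons for a tailSet lookup}
    // when moving from key 'from' to key 'from + skip'.
    // Assumes 0 <= from and from + skip < size.
    static long[] measure(int size, int from, int skip) {
        AtomicLong count = new AtomicLong();
        Comparator<Integer> cmp = (a, b) -> {
            count.incrementAndGet();
            return Integer.compare(a, b);
        };
        ConcurrentSkipListSet<Integer> set = new ConcurrentSkipListSet<>(cmp);
        for (int i = 0; i < size; i++) {
            set.add(i);
        }
        int target = from + skip;

        // Position an iterator on 'from' first; this setup is not counted.
        Iterator<Integer> it = set.iterator();
        Integer kv = it.next();
        while (kv < from) {
            kv = it.next();
        }

        // Linear reseek: one comparison per key inspected.
        count.set(0);
        while (kv != null && cmp.compare(kv, target) < 0) {
            kv = it.hasNext() ? it.next() : null;
        }
        long linear = count.get();

        // tailSet reseek: ignores the current position and searches the skip list.
        count.set(0);
        Integer found = set.tailSet(target).first();
        long viaTailSet = count.get();
        if (found != target) {
            throw new IllegalStateException("unexpected key " + found);
        }
        return new long[] { linear, viaTailSet };
    }
}
```

The linear count is always skip + 1 here, while the tailSet count depends on the set size and on the randomized skip-list levels, so it differs from run to run.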



[jira] [Commented] (HBASE-9000) Linear reseek in Memstore

2013-10-28 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13807468#comment-13807468
 ] 

Lars Hofhansl commented on HBASE-9000:
--

We should quantify the difference in some microbenchmarks.



[jira] [Commented] (HBASE-9000) Linear reseek in Memstore

2013-10-28 Thread Liang Xie (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13807651#comment-13807651
 ] 

Liang Xie commented on HBASE-9000:
--

Just my understanding: neither our current impl in trunk nor the old 0.89-fb 
impl seems optimized for all cases. One type of case is friendly to the 
current logarithmic seek impl (tailSet), e.g. when lots of KVs need to be 
skipped; another type of case is friendly to linear reseek, e.g. when the 
target KV is near the previous position, as in 0.89-fb's old impl:
{code}
  public boolean reseek(KeyValue key) {
    // Walk forward linearly until the next KV is no longer before the key.
    while (kvsetNextRow != null &&
        comparator.compare(kvsetNextRow, key) < 0) {
      kvsetNextRow = getNext(kvsetIt);
    }
...
{code}

To me, it's reasonable to have a setting that lets the cheaper linear reseek 
go first and then falls back to the more expensive seek() after reaching a 
configured number of steps.
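
A minimal sketch of that "linear first, then fall back" idea, using only JDK collections. It assumes String keys in place of KeyValue; HybridReseek, maxLinearSteps and the fallbacks counter are hypothetical names for illustration and do not come from any HBase patch.

```java
import java.util.Iterator;
import java.util.NavigableSet;

// Hypothetical sketch, not HBase code: String keys stand in for KeyValue and
// maxLinearSteps stands in for the proposed configuration value.
class HybridReseek {
    private final NavigableSet<String> kvset;
    private final int maxLinearSteps;
    private Iterator<String> it;
    private String nextRow;   // next key the scanner would return, or null
    int fallbacks = 0;        // how often the skip-list seek was needed

    HybridReseek(NavigableSet<String> kvset, int maxLinearSteps) {
        this.kvset = kvset;
        this.maxLinearSteps = maxLinearSteps;
        this.it = kvset.iterator();
        this.nextRow = advance();
    }

    private String advance() {
        return it.hasNext() ? it.next() : null;
    }

    // Moves forward to the first key >= target; never moves backwards.
    String reseek(String target) {
        int steps = 0;
        // Cheap path: walk forward while the target is still ahead.
        while (nextRow != null && nextRow.compareTo(target) < 0) {
            if (steps++ == maxLinearSteps) {
                // Expensive path: re-locate through the skip list.
                fallbacks++;
                it = kvset.tailSet(target, true).iterator();
                nextRow = advance();
                break;
            }
            nextRow = advance();
        }
        return nextRow;
    }
}
```

With a limit of 2 and keys "a" through "j", a reseek from "a" to "c" stays on the linear path, while a following reseek to "h" exhausts the limit and uses tailSet once.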



[jira] [Commented] (HBASE-9000) Linear reseek in Memstore

2013-10-28 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13807674#comment-13807674
 ] 

Lars Hofhansl commented on HBASE-9000:
--

I've been thinking about declaring a reseek as near or far. A near reseek 
would be to the next column or row, whereas a far reseek could be the result 
of a seek hint from the Filter. In the near case we could try next() a few 
times without seeking and only then seek; in the far case we'd seek 
immediately as we expect to be able to skip a lot of KVs.
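
A rough sketch of how such a near/far distinction might look. Everything here is hypothetical: String keys replace KeyValue, and NearFarScanner, Kind and nearTries are illustrative names rather than existing HBase APIs.

```java
import java.util.Iterator;
import java.util.NavigableSet;

// Hypothetical sketch of the near/far idea; none of these names exist in HBase.
class NearFarScanner {
    enum Kind { NEAR, FAR }   // NEAR: next column or row; FAR: seek hint from a filter

    private final NavigableSet<String> kvs;
    private final int nearTries;  // assumed tunable number of next() attempts
    private Iterator<String> it;
    private String cur;           // key the scanner is positioned on, or null
    int seeks = 0;                // number of real seeks performed

    NearFarScanner(NavigableSet<String> kvs, int nearTries) {
        this.kvs = kvs;
        this.nearTries = nearTries;
        this.it = kvs.iterator();
        this.cur = step();
    }

    private String step() {
        return it.hasNext() ? it.next() : null;
    }

    // Positions the scanner on the first key >= target and returns it.
    String reseek(String target, Kind kind) {
        // A far reseek gets no linear attempts at all.
        int tries = (kind == Kind.NEAR) ? nearTries : 0;
        while (tries-- > 0 && cur != null && cur.compareTo(target) < 0) {
            cur = step();
        }
        // Still short of the target: do a real seek through the skip list.
        if (cur != null && cur.compareTo(target) < 0) {
            seeks++;
            it = kvs.tailSet(target, true).iterator();
            cur = step();
        }
        return cur;
    }
}
```

The caller, not the scanner, decides whether a reseek is near or far, which matches the idea that the hint comes from where the reseek originated.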




[jira] [Commented] (HBASE-9000) Linear reseek in Memstore

2013-10-28 Thread Liang Xie (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13807685#comment-13807685
 ] 

Liang Xie commented on HBASE-9000:
--

Sounds great :)



[jira] [Commented] (HBASE-9000) Linear reseek in Memstore

2013-07-19 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13713995#comment-13713995
 ] 

Lars Hofhansl commented on HBASE-9000:
--

Is that an FB issue only?
In current HBase (0.94+) I see:
{code}
  kvsetIt = kvsetAtCreation.tailSet(getHighest(key, kvsetItRow)).iterator();
  snapshotIt = snapshotAtCreation.tailSet(
      getHighest(key, snapshotItRow)).iterator();
{code}
In reseek.
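
For readers without the source at hand, the pattern in that snippet can be approximated with JDK collections alone. This is a simplified sketch under stated assumptions: String keys replace KeyValue, a single set replaces kvset plus snapshot, and TailSetReseek is a made-up class name. The max of the requested key and the current position plays the role of getHighest.

```java
import java.util.Iterator;
import java.util.concurrent.ConcurrentSkipListSet;

// Hypothetical stand-in for the memstore scanner; String keys replace KeyValue.
class TailSetReseek {
    private final ConcurrentSkipListSet<String> kvset;
    private Iterator<String> it;
    private String current;  // last key returned, or null before the first next()

    TailSetReseek(ConcurrentSkipListSet<String> kvset) {
        this.kvset = kvset;
        this.it = kvset.iterator();
    }

    String next() {
        current = it.hasNext() ? it.next() : null;
        return current;
    }

    // Repositions with a skip-list lookup; taking the higher of the requested
    // key and the current key ensures the scanner never moves backwards.
    String reseek(String key) {
        String target = (current != null && current.compareTo(key) > 0) ? current : key;
        it = kvset.tailSet(target).iterator();
        return next();
    }
}
```

Each reseek here costs a full skip-list lookup no matter how close the target is to the current position.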
