[jira] [Commented] (HBASE-4241) Optimize flushing of the Store cache for max versions and (new) min versions

2011-10-05 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13121568#comment-13121568
 ] 

Lars Hofhansl commented on HBASE-4241:
--

Hi Liyin, just seeing this now. Let me have a look.

 Optimize flushing of the Store cache for max versions and (new) min versions
 

 Key: HBASE-4241
 URL: https://issues.apache.org/jira/browse/HBASE-4241
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Affects Versions: 0.92.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.92.0

 Attachments: 4241-v2.txt, 4241-v8.txt, 4241.txt


 As discussed with with Jon, there is room for improvement in how the memstore 
 is flushed to disk.
 Currently only expired KVs are pruned before flushing, but we can also prune 
 versions if we find at least maxVersions versions in the memstore.
 The same holds for the new minversion feature: If we find at least minVersion 
 versions in the store we can remove all further versions that are expired.
 Generally we should use the same mechanism here that is used for Compaction. 
 I.e. StoreScanner. We only need to add a scanner to Memstore that can scan 
 along the current snapshot.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4241) Optimize flushing of the Store cache for max versions and (new) min versions

2011-10-05 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13121594#comment-13121594
 ] 

Lars Hofhansl commented on HBASE-4241:
--

During memstore flush we only want to read from the snapshot. There is no 
principal reason why using memstore scanner for this wouldn't work (need to 
make sure that kvsetIt never has anything to iterate).

The memstore scanner now also mucks with readpoints (although I think that is 
new). It seemed easier to just wrap a simple scanner wrapper instance around 
the snapshot.

Wanna try using memstore scanner? If it turns out to be simpler I am happy to 
change it.

 Optimize flushing of the Store cache for max versions and (new) min versions
 

 Key: HBASE-4241
 URL: https://issues.apache.org/jira/browse/HBASE-4241
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Affects Versions: 0.92.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.92.0

 Attachments: 4241-v2.txt, 4241-v8.txt, 4241.txt


 As discussed with with Jon, there is room for improvement in how the memstore 
 is flushed to disk.
 Currently only expired KVs are pruned before flushing, but we can also prune 
 versions if we find at least maxVersions versions in the memstore.
 The same holds for the new minversion feature: If we find at least minVersion 
 versions in the store we can remove all further versions that are expired.
 Generally we should use the same mechanism here that is used for Compaction. 
 I.e. StoreScanner. We only need to add a scanner to Memstore that can scan 
 along the current snapshot.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4241) Optimize flushing of the Store cache for max versions and (new) min versions

2011-10-03 Thread Liyin Tang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13119629#comment-13119629
 ] 

Liyin Tang commented on HBASE-4241:
---

Hi Lars, Thanks for your patch. I am trying to back port this feature for 
hbase-89
I have a quick question:) 

why we use CollectionBackedScanner but not reuse memstore scanner?

Thanks a lot

 Optimize flushing of the Store cache for max versions and (new) min versions
 

 Key: HBASE-4241
 URL: https://issues.apache.org/jira/browse/HBASE-4241
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Affects Versions: 0.92.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.92.0

 Attachments: 4241-v2.txt, 4241-v8.txt, 4241.txt


 As discussed with with Jon, there is room for improvement in how the memstore 
 is flushed to disk.
 Currently only expired KVs are pruned before flushing, but we can also prune 
 versions if we find at least maxVersions versions in the memstore.
 The same holds for the new minversion feature: If we find at least minVersion 
 versions in the store we can remove all further versions that are expired.
 Generally we should use the same mechanism here that is used for Compaction. 
 I.e. StoreScanner. We only need to add a scanner to Memstore that can scan 
 along the current snapshot.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4241) Optimize flushing of the Store cache for max versions and (new) min versions

2011-08-25 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13090921#comment-13090921
 ] 

jirapos...@reviews.apache.org commented on HBASE-4241:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1650/#review1636
---



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
https://reviews.apache.org/r/1650/#comment3677

The more we put into finally block, the higher the chance that the last 
close() may be skipped by exception from previous calls to writer (lines 518 
and 520).
Ideally calls to writer should be enclosed in try/catch blocks.


- Ted


On 2011-08-25 05:38:13, Lars Hofhansl wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/1650/
bq.  ---
bq.  
bq.  (Updated 2011-08-25 05:38:13)
bq.  
bq.  
bq.  Review request for hbase, Ted Yu, Michael Stack, and Jonathan Gray.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  This avoids flushing row versions to disk that are known to be GC'd by the 
next compaction anyway.
bq.  This covers two scenarios:
bq.  1. maxVersions=N and we find at least N versions in the memstore. We can 
safely avoid flushing any further versions to disk.
bq.  2. similarly minVersions=N and we find at least N versions in the 
memstore. Now we can safely avoid flushing any further *expired* versions to 
disk.
bq.  
bq.  This changes the Store flush to use the same mechanism that used for 
compactions.
bq.  I borrowed some code from the tests and refactored the test code to use a 
new utility class that wraps a sorted collection and then behaves like 
KeyValueScanner. The same class is used to create scanner over the memstore's 
snapshot.
bq.  
bq.  
bq.  This addresses bug HBASE-4241.
bq.  https://issues.apache.org/jira/browse/HBASE-4241
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
 1161347 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java
 PRE-CREATION 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/KeyValueScanFixture.java
 1161347 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java
 1161347 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueScanFixture.java
 1161347 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java
 1161347 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java
 1161347 
bq.  
bq.  Diff: https://reviews.apache.org/r/1650/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  Ran all tests. TestHTablePool and TestDistributedLogSplitting error out 
(with or without my change).
bq.  I had to change three tests that incorrectly relied on old rows hanging 
around after a flush (or were otherwise incorrect).
bq.  
bq.  No new test, as this should cause no functional change.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Lars
bq.  
bq.



 Optimize flushing of the Store cache for max versions and (new) min versions
 

 Key: HBASE-4241
 URL: https://issues.apache.org/jira/browse/HBASE-4241
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Affects Versions: 0.92.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Attachments: 4241-v2.txt, 4241.txt


 As discussed with with Jon, there is room for improvement in how the memstore 
 is flushed to disk.
 Currently only expired KVs are pruned before flushing, but we can also prune 
 versions if we find at least maxVersions versions in the memstore.
 The same holds for the new minversion feature: If we find at least minVersion 
 versions in the store we can remove all further versions that are expired.
 Generally we should use the same mechanism here that is used for Compaction. 
 I.e. StoreScanner. We only need to add a scanner to Memstore that can scan 
 along the current snapshot.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4241) Optimize flushing of the Store cache for max versions and (new) min versions

2011-08-25 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13091076#comment-13091076
 ] 

jirapos...@reviews.apache.org commented on HBASE-4241:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1650/
---

(Updated 2011-08-25 15:43:47.733133)


Review request for hbase, Ted Yu, Michael Stack, and Jonathan Gray.


Changes
---

Make sure the StoreScanner is always closed (in analogy what 
Store.compactStore(...) does).


Summary
---

This avoids flushing row versions to disk that are known to be GC'd by the next 
compaction anyway.
This covers two scenarios:
1. maxVersions=N and we find at least N versions in the memstore. We can safely 
avoid flushing any further versions to disk.
2. similarly minVersions=N and we find at least N versions in the memstore. Now 
we can safely avoid flushing any further *expired* versions to disk.

This changes the Store flush to use the same mechanism that used for 
compactions.
I borrowed some code from the tests and refactored the test code to use a new 
utility class that wraps a sorted collection and then behaves like 
KeyValueScanner. The same class is used to create scanner over the memstore's 
snapshot.


This addresses bug HBASE-4241.
https://issues.apache.org/jira/browse/HBASE-4241


Diffs (updated)
-

  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
 1161347 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/KeyValueScanFixture.java
 1161347 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java
 1161347 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueScanFixture.java
 1161347 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java
 1161347 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java
 1161347 

Diff: https://reviews.apache.org/r/1650/diff


Testing
---

Ran all tests. TestHTablePool and TestDistributedLogSplitting error out (with 
or without my change).
I had to change three tests that incorrectly relied on old rows hanging around 
after a flush (or were otherwise incorrect).

No new test, as this should cause no functional change.


Thanks,

Lars



 Optimize flushing of the Store cache for max versions and (new) min versions
 

 Key: HBASE-4241
 URL: https://issues.apache.org/jira/browse/HBASE-4241
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Affects Versions: 0.92.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Attachments: 4241-v2.txt, 4241.txt


 As discussed with with Jon, there is room for improvement in how the memstore 
 is flushed to disk.
 Currently only expired KVs are pruned before flushing, but we can also prune 
 versions if we find at least maxVersions versions in the memstore.
 The same holds for the new minversion feature: If we find at least minVersion 
 versions in the store we can remove all further versions that are expired.
 Generally we should use the same mechanism here that is used for Compaction. 
 I.e. StoreScanner. We only need to add a scanner to Memstore that can scan 
 along the current snapshot.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4241) Optimize flushing of the Store cache for max versions and (new) min versions

2011-08-25 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13091114#comment-13091114
 ] 

jirapos...@reviews.apache.org commented on HBASE-4241:
--



bq.  On 2011-08-25 16:08:05, Ted Yu wrote:
bq.   
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java,
 line 523
bq.   https://reviews.apache.org/r/1650/diff/5/?file=35531#file35531line523
bq.  
bq.   Whitespace.

I got this on the commit


- Michael


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1650/#review1645
---


On 2011-08-25 15:43:47, Lars Hofhansl wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/1650/
bq.  ---
bq.  
bq.  (Updated 2011-08-25 15:43:47)
bq.  
bq.  
bq.  Review request for hbase, Ted Yu, Michael Stack, and Jonathan Gray.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  This avoids flushing row versions to disk that are known to be GC'd by the 
next compaction anyway.
bq.  This covers two scenarios:
bq.  1. maxVersions=N and we find at least N versions in the memstore. We can 
safely avoid flushing any further versions to disk.
bq.  2. similarly minVersions=N and we find at least N versions in the 
memstore. Now we can safely avoid flushing any further *expired* versions to 
disk.
bq.  
bq.  This changes the Store flush to use the same mechanism that used for 
compactions.
bq.  I borrowed some code from the tests and refactored the test code to use a 
new utility class that wraps a sorted collection and then behaves like 
KeyValueScanner. The same class is used to create scanner over the memstore's 
snapshot.
bq.  
bq.  
bq.  This addresses bug HBASE-4241.
bq.  https://issues.apache.org/jira/browse/HBASE-4241
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
 1161347 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java
 PRE-CREATION 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/KeyValueScanFixture.java
 1161347 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java
 1161347 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueScanFixture.java
 1161347 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java
 1161347 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java
 1161347 
bq.  
bq.  Diff: https://reviews.apache.org/r/1650/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  Ran all tests. TestHTablePool and TestDistributedLogSplitting error out 
(with or without my change).
bq.  I had to change three tests that incorrectly relied on old rows hanging 
around after a flush (or were otherwise incorrect).
bq.  
bq.  No new test, as this should cause no functional change.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Lars
bq.  
bq.



 Optimize flushing of the Store cache for max versions and (new) min versions
 

 Key: HBASE-4241
 URL: https://issues.apache.org/jira/browse/HBASE-4241
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Affects Versions: 0.92.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.92.0

 Attachments: 4241-v2.txt, 4241.txt


 As discussed with with Jon, there is room for improvement in how the memstore 
 is flushed to disk.
 Currently only expired KVs are pruned before flushing, but we can also prune 
 versions if we find at least maxVersions versions in the memstore.
 The same holds for the new minversion feature: If we find at least minVersion 
 versions in the store we can remove all further versions that are expired.
 Generally we should use the same mechanism here that is used for Compaction. 
 I.e. StoreScanner. We only need to add a scanner to Memstore that can scan 
 along the current snapshot.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4241) Optimize flushing of the Store cache for max versions and (new) min versions

2011-08-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13091219#comment-13091219
 ] 

Hudson commented on HBASE-4241:
---

Integrated in HBase-TRUNK #2146 (See 
[https://builds.apache.org/job/HBase-TRUNK/2146/])
HBASE-4241  Optimize flushing of the Memstore

stack : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/KeyValueScanFixture.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueScanFixture.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java


 Optimize flushing of the Store cache for max versions and (new) min versions
 

 Key: HBASE-4241
 URL: https://issues.apache.org/jira/browse/HBASE-4241
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Affects Versions: 0.92.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.92.0

 Attachments: 4241-v2.txt, 4241.txt


 As discussed with with Jon, there is room for improvement in how the memstore 
 is flushed to disk.
 Currently only expired KVs are pruned before flushing, but we can also prune 
 versions if we find at least maxVersions versions in the memstore.
 The same holds for the new minversion feature: If we find at least minVersion 
 versions in the store we can remove all further versions that are expired.
 Generally we should use the same mechanism here that is used for Compaction. 
 I.e. StoreScanner. We only need to add a scanner to Memstore that can scan 
 along the current snapshot.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4241) Optimize flushing of the Store cache for max versions and (new) min versions

2011-08-24 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13090426#comment-13090426
 ] 

Lars Hofhansl commented on HBASE-4241:
--

Might as well keep using this one... 

Please let me know what you think of the patch in general, then I'll clean it 
up, add my local unitests, and post for formal review.


 Optimize flushing of the Store cache for max versions and (new) min versions
 

 Key: HBASE-4241
 URL: https://issues.apache.org/jira/browse/HBASE-4241
 Project: HBase
  Issue Type: Sub-task
  Components: regionserver
Affects Versions: 0.92.0
Reporter: Lars Hofhansl
Priority: Minor
 Attachments: 4241.txt


 As discussed with with Jon, there is room for improvement in how the memstore 
 is flushed to disk.
 Currently only expired KVs are pruned before flushing, but we can also prune 
 versions if we find at least maxVersions versions in the memstore.
 The same holds for the new minversion feature: If we find at least minVersion 
 versions in the store we can remove all further versions that are expired.
 Generally we should use the same mechanism here that is used for Compaction. 
 I.e. StoreScanner. We only need to add a scanner to Memstore that can scan 
 along the current snapshot.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4241) Optimize flushing of the Store cache for max versions and (new) min versions

2011-08-24 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13090437#comment-13090437
 ] 

Ted Yu commented on HBASE-4241:
---

Interesting idea.
{code}
+/**
+ * Utility scanner that can wrap a collection and pretend to
+ * be a KeyValueScanner.
+ */
+public class ScannerWrapper implements KeyValueScanner {
{code}
This class might be used elsewhere.
How about naming it CollectionBackedScanner ?

 Optimize flushing of the Store cache for max versions and (new) min versions
 

 Key: HBASE-4241
 URL: https://issues.apache.org/jira/browse/HBASE-4241
 Project: HBase
  Issue Type: Sub-task
  Components: regionserver
Affects Versions: 0.92.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Attachments: 4241.txt


 As discussed with with Jon, there is room for improvement in how the memstore 
 is flushed to disk.
 Currently only expired KVs are pruned before flushing, but we can also prune 
 versions if we find at least maxVersions versions in the memstore.
 The same holds for the new minversion feature: If we find at least minVersion 
 versions in the store we can remove all further versions that are expired.
 Generally we should use the same mechanism here that is used for Compaction. 
 I.e. StoreScanner. We only need to add a scanner to Memstore that can scan 
 along the current snapshot.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4241) Optimize flushing of the Store cache for max versions and (new) min versions

2011-08-24 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13090439#comment-13090439
 ] 

Lars Hofhansl commented on HBASE-4241:
--

I like that. I had been struggling with a name for it.


 Optimize flushing of the Store cache for max versions and (new) min versions
 

 Key: HBASE-4241
 URL: https://issues.apache.org/jira/browse/HBASE-4241
 Project: HBase
  Issue Type: Sub-task
  Components: regionserver
Affects Versions: 0.92.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Attachments: 4241.txt


 As discussed with with Jon, there is room for improvement in how the memstore 
 is flushed to disk.
 Currently only expired KVs are pruned before flushing, but we can also prune 
 versions if we find at least maxVersions versions in the memstore.
 The same holds for the new minversion feature: If we find at least minVersion 
 versions in the store we can remove all further versions that are expired.
 Generally we should use the same mechanism here that is used for Compaction. 
 I.e. StoreScanner. We only need to add a scanner to Memstore that can scan 
 along the current snapshot.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4241) Optimize flushing of the Store cache for max versions and (new) min versions

2011-08-24 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13090495#comment-13090495
 ] 

Ted Yu commented on HBASE-4241:
---

Year should be 2011.
{code}
+ * Copyright 2009 The Apache Software Foundation
{code}
I think a better description:
{code}
+ * Utility scanner that can wrap a collection (anything iterable)and pretend to
+ * be a KeyValueScanner.
{code}
would be:
{code}
+ * Utility scanner that wraps a collection (anything iterable) and serves
+ * as a KeyValueScanner.
{code}
Please run through unit tests.
If they pass, please use reviewboard for further comments.

Good work.


 Optimize flushing of the Store cache for max versions and (new) min versions
 

 Key: HBASE-4241
 URL: https://issues.apache.org/jira/browse/HBASE-4241
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.92.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Attachments: 4241-v2.txt, 4241.txt


 As discussed with with Jon, there is room for improvement in how the memstore 
 is flushed to disk.
 Currently only expired KVs are pruned before flushing, but we can also prune 
 versions if we find at least maxVersions versions in the memstore.
 The same holds for the new minversion feature: If we find at least minVersion 
 versions in the store we can remove all further versions that are expired.
 Generally we should use the same mechanism here that is used for Compaction. 
 I.e. StoreScanner. We only need to add a scanner to Memstore that can scan 
 along the current snapshot.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4241) Optimize flushing of the Store cache for max versions and (new) min versions

2011-08-24 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13090513#comment-13090513
 ] 

Ted Yu commented on HBASE-4241:
---

Since HBASE-4071 was New Feature, this JIRA should be something similar.

Looking at CollectionBackedScanner again, all ctor's (implicitly) take a 
comparator. So the description of (anything iterable) is inaccurate.
How about the following:
{code}
+ * Utility scanner that wraps a sortable collection and serves
{code}

 Optimize flushing of the Store cache for max versions and (new) min versions
 

 Key: HBASE-4241
 URL: https://issues.apache.org/jira/browse/HBASE-4241
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.92.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Attachments: 4241-v2.txt, 4241.txt


 As discussed with with Jon, there is room for improvement in how the memstore 
 is flushed to disk.
 Currently only expired KVs are pruned before flushing, but we can also prune 
 versions if we find at least maxVersions versions in the memstore.
 The same holds for the new minversion feature: If we find at least minVersion 
 versions in the store we can remove all further versions that are expired.
 Generally we should use the same mechanism here that is used for Compaction. 
 I.e. StoreScanner. We only need to add a scanner to Memstore that can scan 
 along the current snapshot.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4241) Optimize flushing of the Store cache for max versions and (new) min versions

2011-08-24 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13090522#comment-13090522
 ] 

Lars Hofhansl commented on HBASE-4241:
--

Good point that would make it more clear. I also discovered another problem 
that broken many of the tests. Fixed now.
Once I get a complete test run through without failures I'll file a 
review-request.

 Optimize flushing of the Store cache for max versions and (new) min versions
 

 Key: HBASE-4241
 URL: https://issues.apache.org/jira/browse/HBASE-4241
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Affects Versions: 0.92.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Attachments: 4241-v2.txt, 4241.txt


 As discussed with with Jon, there is room for improvement in how the memstore 
 is flushed to disk.
 Currently only expired KVs are pruned before flushing, but we can also prune 
 versions if we find at least maxVersions versions in the memstore.
 The same holds for the new minversion feature: If we find at least minVersion 
 versions in the store we can remove all further versions that are expired.
 Generally we should use the same mechanism here that is used for Compaction. 
 I.e. StoreScanner. We only need to add a scanner to Memstore that can scan 
 along the current snapshot.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4241) Optimize flushing of the Store cache for max versions and (new) min versions

2011-08-24 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13090670#comment-13090670
 ] 

jirapos...@reviews.apache.org commented on HBASE-4241:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1650/
---

Review request for hbase, Ted Yu, Michael Stack, and Jonathan Gray.


Summary
---

This avoids flushing row versions to disk that are known to be GC'd by the next 
compaction anyway.
This covers two scenarios:
1. maxVersions=N and we find at least N versions in the memstore. We can safely 
avoid flushing any further versions to disk.
2. similarly minVersions=N and we find at least N versions in the memstore. Now 
we can safely avoid flushing any further *expired* versions to disk.

This changes the Store flush to use the same mechanism that used for 
compactions.
I borrowed some code from the tests and refactored the test code to use a new 
utility class that wraps a sorted collection and then behaves like 
KeyValueScanner. The same class is used to create scanner over the memstore's 
snapshot.


This addresses bug HBASE-4241.
https://issues.apache.org/jira/browse/HBASE-4241


Diffs
-

  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
 1161347 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/KeyValueScanFixture.java
 1161347 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java
 1161347 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueScanFixture.java
 1161347 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java
 1161347 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java
 1161347 

Diff: https://reviews.apache.org/r/1650/diff


Testing
---

Ran all tests. TestHTablePool and TestDistributedLogSplitting error out (with 
or without my change).
I had to change three tests that incorrectly relied on old rows hanging around 
after a flush (or were otherwise incorrect).

No new test, as this should cause no functional change.


Thanks,

Lars



 Optimize flushing of the Store cache for max versions and (new) min versions
 

 Key: HBASE-4241
 URL: https://issues.apache.org/jira/browse/HBASE-4241
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Affects Versions: 0.92.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Attachments: 4241-v2.txt, 4241.txt


 As discussed with with Jon, there is room for improvement in how the memstore 
 is flushed to disk.
 Currently only expired KVs are pruned before flushing, but we can also prune 
 versions if we find at least maxVersions versions in the memstore.
 The same holds for the new minversion feature: If we find at least minVersion 
 versions in the store we can remove all further versions that are expired.
 Generally we should use the same mechanism here that is used for Compaction. 
 I.e. StoreScanner. We only need to add a scanner to Memstore that can scan 
 along the current snapshot.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4241) Optimize flushing of the Store cache for max versions and (new) min versions

2011-08-24 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13090676#comment-13090676
 ] 

Lars Hofhansl commented on HBASE-4241:
--

I have an (unrealistic) local test scenario that basically updates the same 
row in a state table. VERSIONS is set to 1.

In this scenario 1m updates produced  50mb of garbage that is completely 
avoided with this change.


 Optimize flushing of the Store cache for max versions and (new) min versions
 

 Key: HBASE-4241
 URL: https://issues.apache.org/jira/browse/HBASE-4241
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Affects Versions: 0.92.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Attachments: 4241-v2.txt, 4241.txt


 As discussed with with Jon, there is room for improvement in how the memstore 
 is flushed to disk.
 Currently only expired KVs are pruned before flushing, but we can also prune 
 versions if we find at least maxVersions versions in the memstore.
 The same holds for the new minversion feature: If we find at least minVersion 
 versions in the store we can remove all further versions that are expired.
 Generally we should use the same mechanism here that is used for Compaction. 
 I.e. StoreScanner. We only need to add a scanner to Memstore that can scan 
 along the current snapshot.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4241) Optimize flushing of the Store cache for max versions and (new) min versions

2011-08-24 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13090761#comment-13090761
 ] 

jirapos...@reviews.apache.org commented on HBASE-4241:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1650/#review1625
---


Nice work.


http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
https://reviews.apache.org/r/1650/#comment3654

Line wrap is needed for this long line.



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
https://reviews.apache.org/r/1650/#comment3655

This should be false.



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java
https://reviews.apache.org/r/1650/#comment3656

Minor comment:
This ctor is used by KeyValueScanFixture below. Putting comparator as first 
parameter would make it more readable.


- Ted


On 2011-08-25 01:28:34, Lars Hofhansl wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/1650/
bq.  ---
bq.  
bq.  (Updated 2011-08-25 01:28:34)
bq.  
bq.  
bq.  Review request for hbase, Ted Yu, Michael Stack, and Jonathan Gray.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  This avoids flushing row versions to disk that are known to be GC'd by the 
next compaction anyway.
bq.  This covers two scenarios:
bq.  1. maxVersions=N and we find at least N versions in the memstore. We can 
safely avoid flushing any further versions to disk.
bq.  2. similarly minVersions=N and we find at least N versions in the 
memstore. Now we can safely avoid flushing any further *expired* versions to 
disk.
bq.  
bq.  This changes the Store flush to use the same mechanism that used for 
compactions.
bq.  I borrowed some code from the tests and refactored the test code to use a 
new utility class that wraps a sorted collection and then behaves like 
KeyValueScanner. The same class is used to create scanner over the memstore's 
snapshot.
bq.  
bq.  
bq.  This addresses bug HBASE-4241.
bq.  https://issues.apache.org/jira/browse/HBASE-4241
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
 1161347 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java
 PRE-CREATION 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/KeyValueScanFixture.java
 1161347 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java
 1161347 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueScanFixture.java
 1161347 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java
 1161347 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java
 1161347 
bq.  
bq.  Diff: https://reviews.apache.org/r/1650/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  Ran all tests. TestHTablePool and TestDistributedLogSplitting error out 
(with or without my change).
bq.  I had to change three tests that incorrectly relied on old rows hanging 
around after a flush (or were otherwise incorrect).
bq.  
bq.  No new test, as this should cause no functional change.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Lars
bq.  
bq.



 Optimize flushing of the Store cache for max versions and (new) min versions
 

 Key: HBASE-4241
 URL: https://issues.apache.org/jira/browse/HBASE-4241
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Affects Versions: 0.92.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Attachments: 4241-v2.txt, 4241.txt


 As discussed with with Jon, there is room for improvement in how the memstore 
 is flushed to disk.
 Currently only expired KVs are pruned before flushing, but we can also prune 
 versions if we find at least maxVersions versions in the memstore.
 The same holds for the new minversion feature: If we find at least minVersion 
 versions in the store we can remove all further versions that are expired.
 Generally we should use the same mechanism here that is used for Compaction. 
 I.e. StoreScanner. We only 

[jira] [Commented] (HBASE-4241) Optimize flushing of the Store cache for max versions and (new) min versions

2011-08-24 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13090765#comment-13090765
 ] 

jirapos...@reviews.apache.org commented on HBASE-4241:
--



bq.  On 2011-08-25 04:29:19, Ted Yu wrote:
bq.   
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java,
 line 490
bq.   https://reviews.apache.org/r/1650/diff/2/?file=35429#file35429line490
bq.  
bq.   Line wrap is needed for this long line.

Will do.


bq.  On 2011-08-25 04:29:19, Ted Yu wrote:
bq.   
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java,
 line 491
bq.   https://reviews.apache.org/r/1650/diff/2/?file=35429#file35429line491
bq.  
bq.   This should be false.

This definitely needs to be true. Deletes need to be retained when the cache is 
flushed.
True, means to retain deletes.

(I had this the wrong way in my first attempt as well and then wondered why all 
the deleted rows were not removed)


bq.  On 2011-08-25 04:29:19, Ted Yu wrote:
bq.   
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java,
 line 64
bq.   https://reviews.apache.org/r/1650/diff/2/?file=35430#file35430line64
bq.  
bq.   Minor comment:
bq.   This ctor is used by KeyValueScanFixture below. Putting comparator 
as first parameter would make it more readable.

Agreed, I'll change the order.


- Lars


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1650/#review1625
---


On 2011-08-25 01:28:34, Lars Hofhansl wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/1650/
bq.  ---
bq.  
bq.  (Updated 2011-08-25 01:28:34)
bq.  
bq.  
bq.  Review request for hbase, Ted Yu, Michael Stack, and Jonathan Gray.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  This avoids flushing row versions to disk that are known to be GC'd by the 
next compaction anyway.
bq.  This covers two scenarios:
bq.  1. maxVersions=N and we find at least N versions in the memstore. We can 
safely avoid flushing any further versions to disk.
bq.  2. similarly minVersions=N and we find at least N versions in the 
memstore. Now we can safely avoid flushing any further *expired* versions to 
disk.
bq.  
bq.  This changes the Store flush to use the same mechanism that used for 
compactions.
bq.  I borrowed some code from the tests and refactored the test code to use a 
new utility class that wraps a sorted collection and then behaves like 
KeyValueScanner. The same class is used to create scanner over the memstore's 
snapshot.
bq.  
bq.  
bq.  This addresses bug HBASE-4241.
bq.  https://issues.apache.org/jira/browse/HBASE-4241
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
 1161347 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java
 PRE-CREATION 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/KeyValueScanFixture.java
 1161347 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java
 1161347 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueScanFixture.java
 1161347 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java
 1161347 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java
 1161347 
bq.  
bq.  Diff: https://reviews.apache.org/r/1650/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  Ran all tests. TestHTablePool and TestDistributedLogSplitting error out 
(with or without my change).
bq.  I had to change three tests that incorrectly relied on old rows hanging 
around after a flush (or were otherwise incorrect).
bq.  
bq.  No new test, as this should cause no functional change.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Lars
bq.  
bq.



 Optimize flushing of the Store cache for max versions and (new) min versions
 

 Key: HBASE-4241
 URL: https://issues.apache.org/jira/browse/HBASE-4241
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Affects Versions: 0.92.0
Reporter: Lars Hofhansl

[jira] [Commented] (HBASE-4241) Optimize flushing of the Store cache for max versions and (new) min versions

2011-08-24 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13090767#comment-13090767
 ] 

jirapos...@reviews.apache.org commented on HBASE-4241:
--



bq.  On 2011-08-25 04:29:19, Ted Yu wrote:
bq.   
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java,
 line 64
bq.   https://reviews.apache.org/r/1650/diff/2/?file=35430#file35430line64
bq.  
bq.   Minor comment:
bq.   This ctor is used by KeyValueScanFixture below. Putting comparator 
as first parameter would make it more readable.
bq.  
bq.  Lars Hofhansl wrote:
bq.  Agreed, I'll change the order.

Actually all the other constructors take the comparator as the 2nd argument.
I think I either change them all or leave them all in this order.


- Lars


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1650/#review1625
---


On 2011-08-25 01:28:34, Lars Hofhansl wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/1650/
bq.  ---
bq.  
bq.  (Updated 2011-08-25 01:28:34)
bq.  
bq.  
bq.  Review request for hbase, Ted Yu, Michael Stack, and Jonathan Gray.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  This avoids flushing row versions to disk that are known to be GC'd by the 
next compaction anyway.
bq.  This covers two scenarios:
bq.  1. maxVersions=N and we find at least N versions in the memstore. We can 
safely avoid flushing any further versions to disk.
bq.  2. similarly minVersions=N and we find at least N versions in the 
memstore. Now we can safely avoid flushing any further *expired* versions to 
disk.
bq.  
bq.  This changes the Store flush to use the same mechanism that used for 
compactions.
bq.  I borrowed some code from the tests and refactored the test code to use a 
new utility class that wraps a sorted collection and then behaves like 
KeyValueScanner. The same class is used to create scanner over the memstore's 
snapshot.
bq.  
bq.  
bq.  This addresses bug HBASE-4241.
bq.  https://issues.apache.org/jira/browse/HBASE-4241
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
 1161347 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java
 PRE-CREATION 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/KeyValueScanFixture.java
 1161347 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java
 1161347 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueScanFixture.java
 1161347 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java
 1161347 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java
 1161347 
bq.  
bq.  Diff: https://reviews.apache.org/r/1650/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  Ran all tests. TestHTablePool and TestDistributedLogSplitting error out 
(with or without my change).
bq.  I had to change three tests that incorrectly relied on old rows hanging 
around after a flush (or were otherwise incorrect).
bq.  
bq.  No new test, as this should cause no functional change.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Lars
bq.  
bq.



 Optimize flushing of the Store cache for max versions and (new) min versions
 

 Key: HBASE-4241
 URL: https://issues.apache.org/jira/browse/HBASE-4241
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Affects Versions: 0.92.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Attachments: 4241-v2.txt, 4241.txt


 As discussed with with Jon, there is room for improvement in how the memstore 
 is flushed to disk.
 Currently only expired KVs are pruned before flushing, but we can also prune 
 versions if we find at least maxVersions versions in the memstore.
 The same holds for the new minversion feature: If we find at least minVersion 
 versions in the store we can remove all further versions that are expired.
 Generally we should use the same mechanism here that is used for Compaction. 
 I.e. StoreScanner. We only need to add a scanner to Memstore that can scan 
 along the current 

[jira] [Commented] (HBASE-4241) Optimize flushing of the Store cache for max versions and (new) min versions

2011-08-24 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13090773#comment-13090773
 ] 

jirapos...@reviews.apache.org commented on HBASE-4241:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1650/
---

(Updated 2011-08-25 04:49:36.340305)


Review request for hbase, Ted Yu, Michael Stack, and Jonathan Gray.


Changes
---

Added linebreak.
Added comment that deletes need to be retained.


Summary
---

This avoids flushing row versions to disk that are known to be GC'd by the next 
compaction anyway.
This covers two scenarios:
1. maxVersions=N and we find at least N versions in the memstore. We can safely 
avoid flushing any further versions to disk.
2. similarly minVersions=N and we find at least N versions in the memstore. Now 
we can safely avoid flushing any further *expired* versions to disk.

This changes the Store flush to use the same mechanism that used for 
compactions.
I borrowed some code from the tests and refactored the test code to use a new 
utility class that wraps a sorted collection and then behaves like 
KeyValueScanner. The same class is used to create scanner over the memstore's 
snapshot.


This addresses bug HBASE-4241.
https://issues.apache.org/jira/browse/HBASE-4241


Diffs (updated)
-

  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
 1161347 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/KeyValueScanFixture.java
 1161347 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java
 1161347 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueScanFixture.java
 1161347 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java
 1161347 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java
 1161347 

Diff: https://reviews.apache.org/r/1650/diff


Testing
---

Ran all tests. TestHTablePool and TestDistributedLogSplitting error out (with 
or without my change).
I had to change three tests that incorrectly relied on old rows hanging around 
after a flush (or were otherwise incorrect).

No new test, as this should cause no functional change.


Thanks,

Lars



 Optimize flushing of the Store cache for max versions and (new) min versions
 

 Key: HBASE-4241
 URL: https://issues.apache.org/jira/browse/HBASE-4241
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Affects Versions: 0.92.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Attachments: 4241-v2.txt, 4241.txt


 As discussed with with Jon, there is room for improvement in how the memstore 
 is flushed to disk.
 Currently only expired KVs are pruned before flushing, but we can also prune 
 versions if we find at least maxVersions versions in the memstore.
 The same holds for the new minversion feature: If we find at least minVersion 
 versions in the store we can remove all further versions that are expired.
 Generally we should use the same mechanism here that is used for Compaction. 
 I.e. StoreScanner. We only need to add a scanner to Memstore that can scan 
 along the current snapshot.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4241) Optimize flushing of the Store cache for max versions and (new) min versions

2011-08-24 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13090779#comment-13090779
 ] 

jirapos...@reviews.apache.org commented on HBASE-4241:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1650/#review1629
---



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java
https://reviews.apache.org/r/1650/#comment3662

I think vararg form (KeyValue... incData) is more flexible than array form 
([]).
vararg has to be the last parameter.


- Ted


On 2011-08-25 04:49:36, Lars Hofhansl wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/1650/
bq.  ---
bq.  
bq.  (Updated 2011-08-25 04:49:36)
bq.  
bq.  
bq.  Review request for hbase, Ted Yu, Michael Stack, and Jonathan Gray.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  This avoids flushing row versions to disk that are known to be GC'd by the 
next compaction anyway.
bq.  This covers two scenarios:
bq.  1. maxVersions=N and we find at least N versions in the memstore. We can 
safely avoid flushing any further versions to disk.
bq.  2. similarly minVersions=N and we find at least N versions in the 
memstore. Now we can safely avoid flushing any further *expired* versions to 
disk.
bq.  
bq.  This changes the Store flush to use the same mechanism that used for 
compactions.
bq.  I borrowed some code from the tests and refactored the test code to use a 
new utility class that wraps a sorted collection and then behaves like 
KeyValueScanner. The same class is used to create scanner over the memstore's 
snapshot.
bq.  
bq.  
bq.  This addresses bug HBASE-4241.
bq.  https://issues.apache.org/jira/browse/HBASE-4241
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
 1161347 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java
 PRE-CREATION 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/KeyValueScanFixture.java
 1161347 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java
 1161347 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueScanFixture.java
 1161347 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java
 1161347 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java
 1161347 
bq.  
bq.  Diff: https://reviews.apache.org/r/1650/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  Ran all tests. TestHTablePool and TestDistributedLogSplitting error out 
(with or without my change).
bq.  I had to change three tests that incorrectly relied on old rows hanging 
around after a flush (or were otherwise incorrect).
bq.  
bq.  No new test, as this should cause no functional change.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Lars
bq.  
bq.



 Optimize flushing of the Store cache for max versions and (new) min versions
 

 Key: HBASE-4241
 URL: https://issues.apache.org/jira/browse/HBASE-4241
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Affects Versions: 0.92.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Attachments: 4241-v2.txt, 4241.txt


 As discussed with with Jon, there is room for improvement in how the memstore 
 is flushed to disk.
 Currently only expired KVs are pruned before flushing, but we can also prune 
 versions if we find at least maxVersions versions in the memstore.
 The same holds for the new minversion feature: If we find at least minVersion 
 versions in the store we can remove all further versions that are expired.
 Generally we should use the same mechanism here that is used for Compaction. 
 I.e. StoreScanner. We only need to add a scanner to Memstore that can scan 
 along the current snapshot.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4241) Optimize flushing of the Store cache for max versions and (new) min versions

2011-08-24 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13090783#comment-13090783
 ] 

jirapos...@reviews.apache.org commented on HBASE-4241:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1650/#review1628
---

Ship it!


Minor comments below.  Will wait on your feedback before commit.  There is one 
thing to fix I think -- the finally thing.  Let me know and I can do it on 
commit.


http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
https://reviews.apache.org/r/1650/#comment3661

oh.  so we never cared about this before?  we'd just keep doing versions 
even if well in excess of max?  We were pruning versions though?  Higher in the 
stack?  In the scanner that called this one?



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
https://reviews.apache.org/r/1650/#comment3663

This is nice, the way you are going to the scanner to next rather than what 
we did before where we'd iterate a set.



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
https://reviews.apache.org/r/1650/#comment3664

Should this be in a finally block?  Would it make a diff if this was left 
hanging open because of some exception?  We can get one up out of the 
scanner.next, right?

oh, there is a finally just after... maybe move the scanner.close in there? 
 



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java
https://reviews.apache.org/r/1650/#comment3665

Nice


- Michael


On 2011-08-25 04:49:36, Lars Hofhansl wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/1650/
bq.  ---
bq.  
bq.  (Updated 2011-08-25 04:49:36)
bq.  
bq.  
bq.  Review request for hbase, Ted Yu, Michael Stack, and Jonathan Gray.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  This avoids flushing row versions to disk that are known to be GC'd by the 
next compaction anyway.
bq.  This covers two scenarios:
bq.  1. maxVersions=N and we find at least N versions in the memstore. We can 
safely avoid flushing any further versions to disk.
bq.  2. similarly minVersions=N and we find at least N versions in the 
memstore. Now we can safely avoid flushing any further *expired* versions to 
disk.
bq.  
bq.  This changes the Store flush to use the same mechanism that used for 
compactions.
bq.  I borrowed some code from the tests and refactored the test code to use a 
new utility class that wraps a sorted collection and then behaves like 
KeyValueScanner. The same class is used to create scanner over the memstore's 
snapshot.
bq.  
bq.  
bq.  This addresses bug HBASE-4241.
bq.  https://issues.apache.org/jira/browse/HBASE-4241
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
 1161347 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java
 PRE-CREATION 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/KeyValueScanFixture.java
 1161347 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java
 1161347 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueScanFixture.java
 1161347 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java
 1161347 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java
 1161347 
bq.  
bq.  Diff: https://reviews.apache.org/r/1650/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  Ran all tests. TestHTablePool and TestDistributedLogSplitting error out 
(with or without my change).
bq.  I had to change three tests that incorrectly relied on old rows hanging 
around after a flush (or were otherwise incorrect).
bq.  
bq.  No new test, as this should cause no functional change.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Lars
bq.  
bq.



 Optimize flushing of the Store cache for max versions and (new) min versions
 

 Key: HBASE-4241
 URL: https://issues.apache.org/jira/browse/HBASE-4241
 Project: HBase
  Issue 

[jira] [Commented] (HBASE-4241) Optimize flushing of the Store cache for max versions and (new) min versions

2011-08-24 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13090782#comment-13090782
 ] 

jirapos...@reviews.apache.org commented on HBASE-4241:
--



bq.  On 2011-08-25 05:12:20, Ted Yu wrote:
bq.   
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java,
 line 64
bq.   https://reviews.apache.org/r/1650/diff/3/?file=35483#file35483line64
bq.  
bq.   I think vararg form (KeyValue... incData) is more flexible than 
array form ([]).
bq.   vararg has to be the last parameter.

Oh I see what you mean. Agreed, will upload a new version in minute.


- Lars


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1650/#review1629
---


On 2011-08-25 04:49:36, Lars Hofhansl wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/1650/
bq.  ---
bq.  
bq.  (Updated 2011-08-25 04:49:36)
bq.  
bq.  
bq.  Review request for hbase, Ted Yu, Michael Stack, and Jonathan Gray.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  This avoids flushing row versions to disk that are known to be GC'd by the 
next compaction anyway.
bq.  This covers two scenarios:
bq.  1. maxVersions=N and we find at least N versions in the memstore. We can 
safely avoid flushing any further versions to disk.
bq.  2. similarly minVersions=N and we find at least N versions in the 
memstore. Now we can safely avoid flushing any further *expired* versions to 
disk.
bq.  
bq.  This changes the Store flush to use the same mechanism that used for 
compactions.
bq.  I borrowed some code from the tests and refactored the test code to use a 
new utility class that wraps a sorted collection and then behaves like 
KeyValueScanner. The same class is used to create scanner over the memstore's 
snapshot.
bq.  
bq.  
bq.  This addresses bug HBASE-4241.
bq.  https://issues.apache.org/jira/browse/HBASE-4241
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
 1161347 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java
 PRE-CREATION 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/KeyValueScanFixture.java
 1161347 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java
 1161347 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueScanFixture.java
 1161347 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java
 1161347 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java
 1161347 
bq.  
bq.  Diff: https://reviews.apache.org/r/1650/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  Ran all tests. TestHTablePool and TestDistributedLogSplitting error out 
(with or without my change).
bq.  I had to change three tests that incorrectly relied on old rows hanging 
around after a flush (or were otherwise incorrect).
bq.  
bq.  No new test, as this should cause no functional change.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Lars
bq.  
bq.



 Optimize flushing of the Store cache for max versions and (new) min versions
 

 Key: HBASE-4241
 URL: https://issues.apache.org/jira/browse/HBASE-4241
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Affects Versions: 0.92.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Attachments: 4241-v2.txt, 4241.txt


 As discussed with with Jon, there is room for improvement in how the memstore 
 is flushed to disk.
 Currently only expired KVs are pruned before flushing, but we can also prune 
 versions if we find at least maxVersions versions in the memstore.
 The same holds for the new minversion feature: If we find at least minVersion 
 versions in the store we can remove all further versions that are expired.
 Generally we should use the same mechanism here that is used for Compaction. 
 I.e. StoreScanner. We only need to add a scanner to Memstore that can scan 
 along the current snapshot.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4241) Optimize flushing of the Store cache for max versions and (new) min versions

2011-08-24 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13090790#comment-13090790
 ] 

jirapos...@reviews.apache.org commented on HBASE-4241:
--



bq.  On 2011-08-25 05:16:29, Michael Stack wrote:
bq.   
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java,
 line 33
bq.   https://reviews.apache.org/r/1650/diff/3/?file=35483#file35483line33
bq.  
bq.   Nice

I stole most of this from the test code :)


bq.  On 2011-08-25 05:16:29, Michael Stack wrote:
bq.   
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java,
 line 505
bq.   https://reviews.apache.org/r/1650/diff/3/?file=35482#file35482line505
bq.  
bq.   This is nice, the way you are going to the scanner to next rather 
than what we did before where we'd iterate a set.

It's also nice because now it uses the exact same logic that is used during 
compaction.


bq.  On 2011-08-25 05:16:29, Michael Stack wrote:
bq.   
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java,
 line 488
bq.   https://reviews.apache.org/r/1650/diff/3/?file=35482#file35482line488
bq.  
bq.   oh.  so we never cared about this before?  we'd just keep doing 
versions even if well in excess of max?  We were pruning versions though?  
Higher in the stack?  In the scanner that called this one?

Versions were only ever pruned as part of compactions. (A StoreScanner is also 
in Store.compactStore(...))

We can't get rid of all excess versions at flush time (as not all of them are 
in the memstore), but we can make a good effort to avoid flushing some of the 
versions.


bq.  On 2011-08-25 05:16:29, Michael Stack wrote:
bq.   
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java,
 line 516
bq.   https://reviews.apache.org/r/1650/diff/3/?file=35482#file35482line516
bq.  
bq.   Should this be in a finally block?  Would it make a diff if this was 
left hanging open because of some exception?  We can get one up out of the 
scanner.next, right?
bq.   
bq.   oh, there is a finally just after... maybe move the scanner.close in 
there?

Hmm... I was thinking since KeyValueScanner.close() can throw IOException I 
don't want it to be in the finally clause.
I also think this close this close() is not actually needed.
Looking at StoreScanner.close(), though, I see that it does not throw anything.

Not sure about this one, I'll move it into the finally block, after the close 
of the writer, just to be save.

I'll upload a new version soon and also includes suggestions from Ted.


- Lars


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1650/#review1628
---


On 2011-08-25 04:49:36, Lars Hofhansl wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/1650/
bq.  ---
bq.  
bq.  (Updated 2011-08-25 04:49:36)
bq.  
bq.  
bq.  Review request for hbase, Ted Yu, Michael Stack, and Jonathan Gray.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  This avoids flushing row versions to disk that are known to be GC'd by the 
next compaction anyway.
bq.  This covers two scenarios:
bq.  1. maxVersions=N and we find at least N versions in the memstore. We can 
safely avoid flushing any further versions to disk.
bq.  2. similarly minVersions=N and we find at least N versions in the 
memstore. Now we can safely avoid flushing any further *expired* versions to 
disk.
bq.  
bq.  This changes the Store flush to use the same mechanism that used for 
compactions.
bq.  I borrowed some code from the tests and refactored the test code to use a 
new utility class that wraps a sorted collection and then behaves like 
KeyValueScanner. The same class is used to create scanner over the memstore's 
snapshot.
bq.  
bq.  
bq.  This addresses bug HBASE-4241.
bq.  https://issues.apache.org/jira/browse/HBASE-4241
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
 1161347 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java
 PRE-CREATION 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/KeyValueScanFixture.java
 1161347 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java
 1161347 
bq.

[jira] [Commented] (HBASE-4241) Optimize flushing of the Store cache for max versions and (new) min versions

2011-08-24 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13090792#comment-13090792
 ] 

jirapos...@reviews.apache.org commented on HBASE-4241:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1650/
---

(Updated 2011-08-25 05:38:13.482663)


Review request for hbase, Ted Yu, Michael Stack, and Jonathan Gray.


Summary
---

This avoids flushing row versions to disk that are known to be GC'd by the next 
compaction anyway.
This covers two scenarios:
1. maxVersions=N and we find at least N versions in the memstore. We can safely 
avoid flushing any further versions to disk.
2. similarly minVersions=N and we find at least N versions in the memstore. Now 
we can safely avoid flushing any further *expired* versions to disk.

This changes the Store flush to use the same mechanism that used for 
compactions.
I borrowed some code from the tests and refactored the test code to use a new 
utility class that wraps a sorted collection and then behaves like 
KeyValueScanner. The same class is used to create scanner over the memstore's 
snapshot.


This addresses bug HBASE-4241.
https://issues.apache.org/jira/browse/HBASE-4241


Diffs (updated)
-

  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
 1161347 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/KeyValueScanFixture.java
 1161347 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java
 1161347 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueScanFixture.java
 1161347 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java
 1161347 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java
 1161347 

Diff: https://reviews.apache.org/r/1650/diff


Testing
---

Ran all tests. TestHTablePool and TestDistributedLogSplitting error out (with 
or without my change).
I had to change three tests that incorrectly relied on old rows hanging around 
after a flush (or were otherwise incorrect).

No new test, as this should cause no functional change.


Thanks,

Lars



 Optimize flushing of the Store cache for max versions and (new) min versions
 

 Key: HBASE-4241
 URL: https://issues.apache.org/jira/browse/HBASE-4241
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Affects Versions: 0.92.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Attachments: 4241-v2.txt, 4241.txt


 As discussed with with Jon, there is room for improvement in how the memstore 
 is flushed to disk.
 Currently only expired KVs are pruned before flushing, but we can also prune 
 versions if we find at least maxVersions versions in the memstore.
 The same holds for the new minversion feature: If we find at least minVersion 
 versions in the store we can remove all further versions that are expired.
 Generally we should use the same mechanism here that is used for Compaction. 
 I.e. StoreScanner. We only need to add a scanner to Memstore that can scan 
 along the current snapshot.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira