[jira] [Commented] (HBASE-4241) Optimize flushing of the Store cache for max versions and (new) min versions
[ https://issues.apache.org/jira/browse/HBASE-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13121568#comment-13121568 ] Lars Hofhansl commented on HBASE-4241: -- Hi Liyin, just seeing this now. Let me have a look. Optimize flushing of the Store cache for max versions and (new) min versions Key: HBASE-4241 URL: https://issues.apache.org/jira/browse/HBASE-4241 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.92.0 Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.92.0 Attachments: 4241-v2.txt, 4241-v8.txt, 4241.txt As discussed with with Jon, there is room for improvement in how the memstore is flushed to disk. Currently only expired KVs are pruned before flushing, but we can also prune versions if we find at least maxVersions versions in the memstore. The same holds for the new minversion feature: If we find at least minVersion versions in the store we can remove all further versions that are expired. Generally we should use the same mechanism here that is used for Compaction. I.e. StoreScanner. We only need to add a scanner to Memstore that can scan along the current snapshot. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4241) Optimize flushing of the Store cache for max versions and (new) min versions
[ https://issues.apache.org/jira/browse/HBASE-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13121594#comment-13121594 ] Lars Hofhansl commented on HBASE-4241: -- During memstore flush we only want to read from the snapshot. There is no principal reason why using memstore scanner for this wouldn't work (need to make sure that kvsetIt never has anything to iterate). The memstore scanner now also mucks with readpoints (although I think that is new). It seemed easier to just wrap a simple scanner wrapper instance around the snapshot. Wanna try using memstore scanner? If it turns out to be simpler I am happy to change it. Optimize flushing of the Store cache for max versions and (new) min versions Key: HBASE-4241 URL: https://issues.apache.org/jira/browse/HBASE-4241 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.92.0 Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.92.0 Attachments: 4241-v2.txt, 4241-v8.txt, 4241.txt As discussed with with Jon, there is room for improvement in how the memstore is flushed to disk. Currently only expired KVs are pruned before flushing, but we can also prune versions if we find at least maxVersions versions in the memstore. The same holds for the new minversion feature: If we find at least minVersion versions in the store we can remove all further versions that are expired. Generally we should use the same mechanism here that is used for Compaction. I.e. StoreScanner. We only need to add a scanner to Memstore that can scan along the current snapshot. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4241) Optimize flushing of the Store cache for max versions and (new) min versions
[ https://issues.apache.org/jira/browse/HBASE-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13119629#comment-13119629 ] Liyin Tang commented on HBASE-4241: --- Hi Lars, Thanks for your patch. I am trying to back port this feature for hbase-89 I have a quick question:) why we use CollectionBackedScanner but not reuse memstore scanner? Thanks a lot Optimize flushing of the Store cache for max versions and (new) min versions Key: HBASE-4241 URL: https://issues.apache.org/jira/browse/HBASE-4241 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.92.0 Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.92.0 Attachments: 4241-v2.txt, 4241-v8.txt, 4241.txt As discussed with with Jon, there is room for improvement in how the memstore is flushed to disk. Currently only expired KVs are pruned before flushing, but we can also prune versions if we find at least maxVersions versions in the memstore. The same holds for the new minversion feature: If we find at least minVersion versions in the store we can remove all further versions that are expired. Generally we should use the same mechanism here that is used for Compaction. I.e. StoreScanner. We only need to add a scanner to Memstore that can scan along the current snapshot. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4241) Optimize flushing of the Store cache for max versions and (new) min versions
[ https://issues.apache.org/jira/browse/HBASE-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13090921#comment-13090921 ] jirapos...@reviews.apache.org commented on HBASE-4241: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1650/#review1636 --- http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java https://reviews.apache.org/r/1650/#comment3677 The more we put into finally block, the higher the chance that the last close() may be skipped by exception from previous calls to writer (lines 518 and 520). Ideally calls to writer should be enclosed in try/catch blocks. - Ted On 2011-08-25 05:38:13, Lars Hofhansl wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/1650/ bq. --- bq. bq. (Updated 2011-08-25 05:38:13) bq. bq. bq. Review request for hbase, Ted Yu, Michael Stack, and Jonathan Gray. bq. bq. bq. Summary bq. --- bq. bq. This avoids flushing row versions to disk that are known to be GC'd by the next compaction anyway. bq. This covers two scenarios: bq. 1. maxVersions=N and we find at least N versions in the memstore. We can safely avoid flushing any further versions to disk. bq. 2. similarly minVersions=N and we find at least N versions in the memstore. Now we can safely avoid flushing any further *expired* versions to disk. bq. bq. This changes the Store flush to use the same mechanism that used for compactions. bq. I borrowed some code from the tests and refactored the test code to use a new utility class that wraps a sorted collection and then behaves like KeyValueScanner. The same class is used to create scanner over the memstore's snapshot. bq. bq. bq. This addresses bug HBASE-4241. bq. https://issues.apache.org/jira/browse/HBASE-4241 bq. bq. bq. Diffs bq. - bq. bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1161347 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java PRE-CREATION bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/KeyValueScanFixture.java 1161347 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java 1161347 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueScanFixture.java 1161347 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java 1161347 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java 1161347 bq. bq. Diff: https://reviews.apache.org/r/1650/diff bq. bq. bq. Testing bq. --- bq. bq. Ran all tests. TestHTablePool and TestDistributedLogSplitting error out (with or without my change). bq. I had to change three tests that incorrectly relied on old rows hanging around after a flush (or were otherwise incorrect). bq. bq. No new test, as this should cause no functional change. bq. bq. bq. Thanks, bq. bq. Lars bq. bq. Optimize flushing of the Store cache for max versions and (new) min versions Key: HBASE-4241 URL: https://issues.apache.org/jira/browse/HBASE-4241 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.92.0 Reporter: Lars Hofhansl Assignee: Lars Hofhansl Attachments: 4241-v2.txt, 4241.txt As discussed with with Jon, there is room for improvement in how the memstore is flushed to disk. Currently only expired KVs are pruned before flushing, but we can also prune versions if we find at least maxVersions versions in the memstore. The same holds for the new minversion feature: If we find at least minVersion versions in the store we can remove all further versions that are expired. Generally we should use the same mechanism here that is used for Compaction. I.e. StoreScanner. We only need to add a scanner to Memstore that can scan along the current snapshot. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4241) Optimize flushing of the Store cache for max versions and (new) min versions
[ https://issues.apache.org/jira/browse/HBASE-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13091076#comment-13091076 ] jirapos...@reviews.apache.org commented on HBASE-4241: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1650/ --- (Updated 2011-08-25 15:43:47.733133) Review request for hbase, Ted Yu, Michael Stack, and Jonathan Gray. Changes --- Make sure the StoreScanner is always closed (in analogy what Store.compactStore(...) does). Summary --- This avoids flushing row versions to disk that are known to be GC'd by the next compaction anyway. This covers two scenarios: 1. maxVersions=N and we find at least N versions in the memstore. We can safely avoid flushing any further versions to disk. 2. similarly minVersions=N and we find at least N versions in the memstore. Now we can safely avoid flushing any further *expired* versions to disk. This changes the Store flush to use the same mechanism that used for compactions. I borrowed some code from the tests and refactored the test code to use a new utility class that wraps a sorted collection and then behaves like KeyValueScanner. The same class is used to create scanner over the memstore's snapshot. This addresses bug HBASE-4241. https://issues.apache.org/jira/browse/HBASE-4241 Diffs (updated) - http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java PRE-CREATION http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/KeyValueScanFixture.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueScanFixture.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java 1161347 Diff: https://reviews.apache.org/r/1650/diff Testing --- Ran all tests. TestHTablePool and TestDistributedLogSplitting error out (with or without my change). I had to change three tests that incorrectly relied on old rows hanging around after a flush (or were otherwise incorrect). No new test, as this should cause no functional change. Thanks, Lars Optimize flushing of the Store cache for max versions and (new) min versions Key: HBASE-4241 URL: https://issues.apache.org/jira/browse/HBASE-4241 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.92.0 Reporter: Lars Hofhansl Assignee: Lars Hofhansl Attachments: 4241-v2.txt, 4241.txt As discussed with with Jon, there is room for improvement in how the memstore is flushed to disk. Currently only expired KVs are pruned before flushing, but we can also prune versions if we find at least maxVersions versions in the memstore. The same holds for the new minversion feature: If we find at least minVersion versions in the store we can remove all further versions that are expired. Generally we should use the same mechanism here that is used for Compaction. I.e. StoreScanner. We only need to add a scanner to Memstore that can scan along the current snapshot. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4241) Optimize flushing of the Store cache for max versions and (new) min versions
[ https://issues.apache.org/jira/browse/HBASE-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13091114#comment-13091114 ] jirapos...@reviews.apache.org commented on HBASE-4241: -- bq. On 2011-08-25 16:08:05, Ted Yu wrote: bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java, line 523 bq. https://reviews.apache.org/r/1650/diff/5/?file=35531#file35531line523 bq. bq. Whitespace. I got this on the commit - Michael --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1650/#review1645 --- On 2011-08-25 15:43:47, Lars Hofhansl wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/1650/ bq. --- bq. bq. (Updated 2011-08-25 15:43:47) bq. bq. bq. Review request for hbase, Ted Yu, Michael Stack, and Jonathan Gray. bq. bq. bq. Summary bq. --- bq. bq. This avoids flushing row versions to disk that are known to be GC'd by the next compaction anyway. bq. This covers two scenarios: bq. 1. maxVersions=N and we find at least N versions in the memstore. We can safely avoid flushing any further versions to disk. bq. 2. similarly minVersions=N and we find at least N versions in the memstore. Now we can safely avoid flushing any further *expired* versions to disk. bq. bq. This changes the Store flush to use the same mechanism that used for compactions. bq. I borrowed some code from the tests and refactored the test code to use a new utility class that wraps a sorted collection and then behaves like KeyValueScanner. The same class is used to create scanner over the memstore's snapshot. bq. bq. bq. This addresses bug HBASE-4241. bq. https://issues.apache.org/jira/browse/HBASE-4241 bq. bq. bq. Diffs bq. - bq. bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1161347 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java PRE-CREATION bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/KeyValueScanFixture.java 1161347 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java 1161347 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueScanFixture.java 1161347 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java 1161347 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java 1161347 bq. bq. Diff: https://reviews.apache.org/r/1650/diff bq. bq. bq. Testing bq. --- bq. bq. Ran all tests. TestHTablePool and TestDistributedLogSplitting error out (with or without my change). bq. I had to change three tests that incorrectly relied on old rows hanging around after a flush (or were otherwise incorrect). bq. bq. No new test, as this should cause no functional change. bq. bq. bq. Thanks, bq. bq. Lars bq. bq. Optimize flushing of the Store cache for max versions and (new) min versions Key: HBASE-4241 URL: https://issues.apache.org/jira/browse/HBASE-4241 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.92.0 Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.92.0 Attachments: 4241-v2.txt, 4241.txt As discussed with with Jon, there is room for improvement in how the memstore is flushed to disk. Currently only expired KVs are pruned before flushing, but we can also prune versions if we find at least maxVersions versions in the memstore. The same holds for the new minversion feature: If we find at least minVersion versions in the store we can remove all further versions that are expired. Generally we should use the same mechanism here that is used for Compaction. I.e. StoreScanner. We only need to add a scanner to Memstore that can scan along the current snapshot. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4241) Optimize flushing of the Store cache for max versions and (new) min versions
[ https://issues.apache.org/jira/browse/HBASE-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13091219#comment-13091219 ] Hudson commented on HBASE-4241: --- Integrated in HBase-TRUNK #2146 (See [https://builds.apache.org/job/HBase-TRUNK/2146/]) HBASE-4241 Optimize flushing of the Memstore stack : Files : * /hbase/trunk/CHANGES.txt * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/KeyValueScanFixture.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueScanFixture.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java Optimize flushing of the Store cache for max versions and (new) min versions Key: HBASE-4241 URL: https://issues.apache.org/jira/browse/HBASE-4241 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.92.0 Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.92.0 Attachments: 4241-v2.txt, 4241.txt As discussed with with Jon, there is room for improvement in how the memstore is flushed to disk. Currently only expired KVs are pruned before flushing, but we can also prune versions if we find at least maxVersions versions in the memstore. The same holds for the new minversion feature: If we find at least minVersion versions in the store we can remove all further versions that are expired. Generally we should use the same mechanism here that is used for Compaction. I.e. StoreScanner. We only need to add a scanner to Memstore that can scan along the current snapshot. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4241) Optimize flushing of the Store cache for max versions and (new) min versions
[ https://issues.apache.org/jira/browse/HBASE-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13090426#comment-13090426 ] Lars Hofhansl commented on HBASE-4241: -- Might as well keep using this one... Please let me know what you think of the patch in general, then I'll clean it up, add my local unitests, and post for formal review. Optimize flushing of the Store cache for max versions and (new) min versions Key: HBASE-4241 URL: https://issues.apache.org/jira/browse/HBASE-4241 Project: HBase Issue Type: Sub-task Components: regionserver Affects Versions: 0.92.0 Reporter: Lars Hofhansl Priority: Minor Attachments: 4241.txt As discussed with with Jon, there is room for improvement in how the memstore is flushed to disk. Currently only expired KVs are pruned before flushing, but we can also prune versions if we find at least maxVersions versions in the memstore. The same holds for the new minversion feature: If we find at least minVersion versions in the store we can remove all further versions that are expired. Generally we should use the same mechanism here that is used for Compaction. I.e. StoreScanner. We only need to add a scanner to Memstore that can scan along the current snapshot. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4241) Optimize flushing of the Store cache for max versions and (new) min versions
[ https://issues.apache.org/jira/browse/HBASE-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13090437#comment-13090437 ] Ted Yu commented on HBASE-4241: --- Interesting idea. {code} +/** + * Utility scanner that can wrap a collection and pretend to + * be a KeyValueScanner. + */ +public class ScannerWrapper implements KeyValueScanner { {code} This class might be used elsewhere. How about naming it CollectionBackedScanner ? Optimize flushing of the Store cache for max versions and (new) min versions Key: HBASE-4241 URL: https://issues.apache.org/jira/browse/HBASE-4241 Project: HBase Issue Type: Sub-task Components: regionserver Affects Versions: 0.92.0 Reporter: Lars Hofhansl Assignee: Lars Hofhansl Attachments: 4241.txt As discussed with with Jon, there is room for improvement in how the memstore is flushed to disk. Currently only expired KVs are pruned before flushing, but we can also prune versions if we find at least maxVersions versions in the memstore. The same holds for the new minversion feature: If we find at least minVersion versions in the store we can remove all further versions that are expired. Generally we should use the same mechanism here that is used for Compaction. I.e. StoreScanner. We only need to add a scanner to Memstore that can scan along the current snapshot. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4241) Optimize flushing of the Store cache for max versions and (new) min versions
[ https://issues.apache.org/jira/browse/HBASE-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13090439#comment-13090439 ] Lars Hofhansl commented on HBASE-4241: -- I like that. I had been struggling with a name for it. Optimize flushing of the Store cache for max versions and (new) min versions Key: HBASE-4241 URL: https://issues.apache.org/jira/browse/HBASE-4241 Project: HBase Issue Type: Sub-task Components: regionserver Affects Versions: 0.92.0 Reporter: Lars Hofhansl Assignee: Lars Hofhansl Attachments: 4241.txt As discussed with with Jon, there is room for improvement in how the memstore is flushed to disk. Currently only expired KVs are pruned before flushing, but we can also prune versions if we find at least maxVersions versions in the memstore. The same holds for the new minversion feature: If we find at least minVersion versions in the store we can remove all further versions that are expired. Generally we should use the same mechanism here that is used for Compaction. I.e. StoreScanner. We only need to add a scanner to Memstore that can scan along the current snapshot. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4241) Optimize flushing of the Store cache for max versions and (new) min versions
[ https://issues.apache.org/jira/browse/HBASE-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13090495#comment-13090495 ] Ted Yu commented on HBASE-4241: --- Year should be 2011. {code} + * Copyright 2009 The Apache Software Foundation {code} I think a better description: {code} + * Utility scanner that can wrap a collection (anything iterable)and pretend to + * be a KeyValueScanner. {code} would be: {code} + * Utility scanner that wraps a collection (anything iterable) and serves + * as a KeyValueScanner. {code} Please run through unit tests. If they pass, please use reviewboard for further comments. Good work. Optimize flushing of the Store cache for max versions and (new) min versions Key: HBASE-4241 URL: https://issues.apache.org/jira/browse/HBASE-4241 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.92.0 Reporter: Lars Hofhansl Assignee: Lars Hofhansl Attachments: 4241-v2.txt, 4241.txt As discussed with with Jon, there is room for improvement in how the memstore is flushed to disk. Currently only expired KVs are pruned before flushing, but we can also prune versions if we find at least maxVersions versions in the memstore. The same holds for the new minversion feature: If we find at least minVersion versions in the store we can remove all further versions that are expired. Generally we should use the same mechanism here that is used for Compaction. I.e. StoreScanner. We only need to add a scanner to Memstore that can scan along the current snapshot. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4241) Optimize flushing of the Store cache for max versions and (new) min versions
[ https://issues.apache.org/jira/browse/HBASE-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13090513#comment-13090513 ] Ted Yu commented on HBASE-4241: --- Since HBASE-4071 was New Feature, this JIRA should be something similar. Looking at CollectionBackedScanner again, all ctor's (implicitly) take a comparator. So the description of (anything iterable) is inaccurate. How about the following: {code} + * Utility scanner that wraps a sortable collection and serves {code} Optimize flushing of the Store cache for max versions and (new) min versions Key: HBASE-4241 URL: https://issues.apache.org/jira/browse/HBASE-4241 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.92.0 Reporter: Lars Hofhansl Assignee: Lars Hofhansl Attachments: 4241-v2.txt, 4241.txt As discussed with with Jon, there is room for improvement in how the memstore is flushed to disk. Currently only expired KVs are pruned before flushing, but we can also prune versions if we find at least maxVersions versions in the memstore. The same holds for the new minversion feature: If we find at least minVersion versions in the store we can remove all further versions that are expired. Generally we should use the same mechanism here that is used for Compaction. I.e. StoreScanner. We only need to add a scanner to Memstore that can scan along the current snapshot. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4241) Optimize flushing of the Store cache for max versions and (new) min versions
[ https://issues.apache.org/jira/browse/HBASE-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13090522#comment-13090522 ] Lars Hofhansl commented on HBASE-4241: -- Good point that would make it more clear. I also discovered another problem that broken many of the tests. Fixed now. Once I get a complete test run through without failures I'll file a review-request. Optimize flushing of the Store cache for max versions and (new) min versions Key: HBASE-4241 URL: https://issues.apache.org/jira/browse/HBASE-4241 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.92.0 Reporter: Lars Hofhansl Assignee: Lars Hofhansl Attachments: 4241-v2.txt, 4241.txt As discussed with with Jon, there is room for improvement in how the memstore is flushed to disk. Currently only expired KVs are pruned before flushing, but we can also prune versions if we find at least maxVersions versions in the memstore. The same holds for the new minversion feature: If we find at least minVersion versions in the store we can remove all further versions that are expired. Generally we should use the same mechanism here that is used for Compaction. I.e. StoreScanner. We only need to add a scanner to Memstore that can scan along the current snapshot. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4241) Optimize flushing of the Store cache for max versions and (new) min versions
[ https://issues.apache.org/jira/browse/HBASE-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13090670#comment-13090670 ] jirapos...@reviews.apache.org commented on HBASE-4241: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1650/ --- Review request for hbase, Ted Yu, Michael Stack, and Jonathan Gray. Summary --- This avoids flushing row versions to disk that are known to be GC'd by the next compaction anyway. This covers two scenarios: 1. maxVersions=N and we find at least N versions in the memstore. We can safely avoid flushing any further versions to disk. 2. similarly minVersions=N and we find at least N versions in the memstore. Now we can safely avoid flushing any further *expired* versions to disk. This changes the Store flush to use the same mechanism that used for compactions. I borrowed some code from the tests and refactored the test code to use a new utility class that wraps a sorted collection and then behaves like KeyValueScanner. The same class is used to create scanner over the memstore's snapshot. This addresses bug HBASE-4241. https://issues.apache.org/jira/browse/HBASE-4241 Diffs - http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java PRE-CREATION http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/KeyValueScanFixture.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueScanFixture.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java 1161347 Diff: https://reviews.apache.org/r/1650/diff Testing --- Ran all tests. TestHTablePool and TestDistributedLogSplitting error out (with or without my change). I had to change three tests that incorrectly relied on old rows hanging around after a flush (or were otherwise incorrect). No new test, as this should cause no functional change. Thanks, Lars Optimize flushing of the Store cache for max versions and (new) min versions Key: HBASE-4241 URL: https://issues.apache.org/jira/browse/HBASE-4241 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.92.0 Reporter: Lars Hofhansl Assignee: Lars Hofhansl Attachments: 4241-v2.txt, 4241.txt As discussed with with Jon, there is room for improvement in how the memstore is flushed to disk. Currently only expired KVs are pruned before flushing, but we can also prune versions if we find at least maxVersions versions in the memstore. The same holds for the new minversion feature: If we find at least minVersion versions in the store we can remove all further versions that are expired. Generally we should use the same mechanism here that is used for Compaction. I.e. StoreScanner. We only need to add a scanner to Memstore that can scan along the current snapshot. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4241) Optimize flushing of the Store cache for max versions and (new) min versions
[ https://issues.apache.org/jira/browse/HBASE-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13090676#comment-13090676 ] Lars Hofhansl commented on HBASE-4241: -- I have an (unrealistic) local test scenario that basically updates the same row in a state table. VERSIONS is set to 1. In this scenario 1m updates produced 50mb of garbage that is completely avoided with this change. Optimize flushing of the Store cache for max versions and (new) min versions Key: HBASE-4241 URL: https://issues.apache.org/jira/browse/HBASE-4241 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.92.0 Reporter: Lars Hofhansl Assignee: Lars Hofhansl Attachments: 4241-v2.txt, 4241.txt As discussed with with Jon, there is room for improvement in how the memstore is flushed to disk. Currently only expired KVs are pruned before flushing, but we can also prune versions if we find at least maxVersions versions in the memstore. The same holds for the new minversion feature: If we find at least minVersion versions in the store we can remove all further versions that are expired. Generally we should use the same mechanism here that is used for Compaction. I.e. StoreScanner. We only need to add a scanner to Memstore that can scan along the current snapshot. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4241) Optimize flushing of the Store cache for max versions and (new) min versions
[ https://issues.apache.org/jira/browse/HBASE-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13090761#comment-13090761 ] jirapos...@reviews.apache.org commented on HBASE-4241: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1650/#review1625 --- Nice work. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java https://reviews.apache.org/r/1650/#comment3654 Line wrap is needed for this long line. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java https://reviews.apache.org/r/1650/#comment3655 This should be false. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java https://reviews.apache.org/r/1650/#comment3656 Minor comment: This ctor is used by KeyValueScanFixture below. Putting comparator as first parameter would make it more readable. - Ted On 2011-08-25 01:28:34, Lars Hofhansl wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/1650/ bq. --- bq. bq. (Updated 2011-08-25 01:28:34) bq. bq. bq. Review request for hbase, Ted Yu, Michael Stack, and Jonathan Gray. bq. bq. bq. Summary bq. --- bq. bq. This avoids flushing row versions to disk that are known to be GC'd by the next compaction anyway. bq. This covers two scenarios: bq. 1. maxVersions=N and we find at least N versions in the memstore. We can safely avoid flushing any further versions to disk. bq. 2. similarly minVersions=N and we find at least N versions in the memstore. Now we can safely avoid flushing any further *expired* versions to disk. bq. bq. This changes the Store flush to use the same mechanism that used for compactions. bq. I borrowed some code from the tests and refactored the test code to use a new utility class that wraps a sorted collection and then behaves like KeyValueScanner. The same class is used to create scanner over the memstore's snapshot. bq. bq. bq. This addresses bug HBASE-4241. bq. https://issues.apache.org/jira/browse/HBASE-4241 bq. bq. bq. Diffs bq. - bq. bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1161347 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java PRE-CREATION bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/KeyValueScanFixture.java 1161347 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java 1161347 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueScanFixture.java 1161347 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java 1161347 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java 1161347 bq. bq. Diff: https://reviews.apache.org/r/1650/diff bq. bq. bq. Testing bq. --- bq. bq. Ran all tests. TestHTablePool and TestDistributedLogSplitting error out (with or without my change). bq. I had to change three tests that incorrectly relied on old rows hanging around after a flush (or were otherwise incorrect). bq. bq. No new test, as this should cause no functional change. bq. bq. bq. Thanks, bq. bq. Lars bq. bq. Optimize flushing of the Store cache for max versions and (new) min versions Key: HBASE-4241 URL: https://issues.apache.org/jira/browse/HBASE-4241 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.92.0 Reporter: Lars Hofhansl Assignee: Lars Hofhansl Attachments: 4241-v2.txt, 4241.txt As discussed with with Jon, there is room for improvement in how the memstore is flushed to disk. Currently only expired KVs are pruned before flushing, but we can also prune versions if we find at least maxVersions versions in the memstore. The same holds for the new minversion feature: If we find at least minVersion versions in the store we can remove all further versions that are expired. Generally we should use the same mechanism here that is used for Compaction. I.e. StoreScanner. We only
[jira] [Commented] (HBASE-4241) Optimize flushing of the Store cache for max versions and (new) min versions
[ https://issues.apache.org/jira/browse/HBASE-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13090765#comment-13090765 ] jirapos...@reviews.apache.org commented on HBASE-4241: -- bq. On 2011-08-25 04:29:19, Ted Yu wrote: bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java, line 490 bq. https://reviews.apache.org/r/1650/diff/2/?file=35429#file35429line490 bq. bq. Line wrap is needed for this long line. Will do. bq. On 2011-08-25 04:29:19, Ted Yu wrote: bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java, line 491 bq. https://reviews.apache.org/r/1650/diff/2/?file=35429#file35429line491 bq. bq. This should be false. This definitely needs to be true. Deletes need to be retained when the cache is flushed. True, means to retain deletes. (I had this the wrong way in my first attempt as well and then wondered why all the deleted rows were not removed) bq. On 2011-08-25 04:29:19, Ted Yu wrote: bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java, line 64 bq. https://reviews.apache.org/r/1650/diff/2/?file=35430#file35430line64 bq. bq. Minor comment: bq. This ctor is used by KeyValueScanFixture below. Putting comparator as first parameter would make it more readable. Agreed, I'll change the order. - Lars --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1650/#review1625 --- On 2011-08-25 01:28:34, Lars Hofhansl wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/1650/ bq. --- bq. bq. (Updated 2011-08-25 01:28:34) bq. bq. bq. Review request for hbase, Ted Yu, Michael Stack, and Jonathan Gray. bq. bq. bq. Summary bq. --- bq. bq. This avoids flushing row versions to disk that are known to be GC'd by the next compaction anyway. bq. This covers two scenarios: bq. 1. maxVersions=N and we find at least N versions in the memstore. We can safely avoid flushing any further versions to disk. bq. 2. similarly minVersions=N and we find at least N versions in the memstore. Now we can safely avoid flushing any further *expired* versions to disk. bq. bq. This changes the Store flush to use the same mechanism that used for compactions. bq. I borrowed some code from the tests and refactored the test code to use a new utility class that wraps a sorted collection and then behaves like KeyValueScanner. The same class is used to create scanner over the memstore's snapshot. bq. bq. bq. This addresses bug HBASE-4241. bq. https://issues.apache.org/jira/browse/HBASE-4241 bq. bq. bq. Diffs bq. - bq. bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1161347 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java PRE-CREATION bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/KeyValueScanFixture.java 1161347 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java 1161347 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueScanFixture.java 1161347 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java 1161347 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java 1161347 bq. bq. Diff: https://reviews.apache.org/r/1650/diff bq. bq. bq. Testing bq. --- bq. bq. Ran all tests. TestHTablePool and TestDistributedLogSplitting error out (with or without my change). bq. I had to change three tests that incorrectly relied on old rows hanging around after a flush (or were otherwise incorrect). bq. bq. No new test, as this should cause no functional change. bq. bq. bq. Thanks, bq. bq. Lars bq. bq. Optimize flushing of the Store cache for max versions and (new) min versions Key: HBASE-4241 URL: https://issues.apache.org/jira/browse/HBASE-4241 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.92.0 Reporter: Lars Hofhansl
[jira] [Commented] (HBASE-4241) Optimize flushing of the Store cache for max versions and (new) min versions
[ https://issues.apache.org/jira/browse/HBASE-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13090767#comment-13090767 ] jirapos...@reviews.apache.org commented on HBASE-4241: -- bq. On 2011-08-25 04:29:19, Ted Yu wrote: bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java, line 64 bq. https://reviews.apache.org/r/1650/diff/2/?file=35430#file35430line64 bq. bq. Minor comment: bq. This ctor is used by KeyValueScanFixture below. Putting comparator as first parameter would make it more readable. bq. bq. Lars Hofhansl wrote: bq. Agreed, I'll change the order. Actually all the other constructors take the comparator as the 2nd argument. I think I either change them all or leave them all in this order. - Lars --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1650/#review1625 --- On 2011-08-25 01:28:34, Lars Hofhansl wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/1650/ bq. --- bq. bq. (Updated 2011-08-25 01:28:34) bq. bq. bq. Review request for hbase, Ted Yu, Michael Stack, and Jonathan Gray. bq. bq. bq. Summary bq. --- bq. bq. This avoids flushing row versions to disk that are known to be GC'd by the next compaction anyway. bq. This covers two scenarios: bq. 1. maxVersions=N and we find at least N versions in the memstore. We can safely avoid flushing any further versions to disk. bq. 2. similarly minVersions=N and we find at least N versions in the memstore. Now we can safely avoid flushing any further *expired* versions to disk. bq. bq. This changes the Store flush to use the same mechanism that used for compactions. bq. I borrowed some code from the tests and refactored the test code to use a new utility class that wraps a sorted collection and then behaves like KeyValueScanner. The same class is used to create scanner over the memstore's snapshot. bq. bq. bq. This addresses bug HBASE-4241. bq. https://issues.apache.org/jira/browse/HBASE-4241 bq. bq. bq. Diffs bq. - bq. bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1161347 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java PRE-CREATION bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/KeyValueScanFixture.java 1161347 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java 1161347 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueScanFixture.java 1161347 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java 1161347 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java 1161347 bq. bq. Diff: https://reviews.apache.org/r/1650/diff bq. bq. bq. Testing bq. --- bq. bq. Ran all tests. TestHTablePool and TestDistributedLogSplitting error out (with or without my change). bq. I had to change three tests that incorrectly relied on old rows hanging around after a flush (or were otherwise incorrect). bq. bq. No new test, as this should cause no functional change. bq. bq. bq. Thanks, bq. bq. Lars bq. bq. Optimize flushing of the Store cache for max versions and (new) min versions Key: HBASE-4241 URL: https://issues.apache.org/jira/browse/HBASE-4241 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.92.0 Reporter: Lars Hofhansl Assignee: Lars Hofhansl Attachments: 4241-v2.txt, 4241.txt As discussed with with Jon, there is room for improvement in how the memstore is flushed to disk. Currently only expired KVs are pruned before flushing, but we can also prune versions if we find at least maxVersions versions in the memstore. The same holds for the new minversion feature: If we find at least minVersion versions in the store we can remove all further versions that are expired. Generally we should use the same mechanism here that is used for Compaction. I.e. StoreScanner. We only need to add a scanner to Memstore that can scan along the current
[jira] [Commented] (HBASE-4241) Optimize flushing of the Store cache for max versions and (new) min versions
[ https://issues.apache.org/jira/browse/HBASE-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13090773#comment-13090773 ] jirapos...@reviews.apache.org commented on HBASE-4241: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1650/ --- (Updated 2011-08-25 04:49:36.340305) Review request for hbase, Ted Yu, Michael Stack, and Jonathan Gray. Changes --- Added linebreak. Added comment that deletes need to be retained. Summary --- This avoids flushing row versions to disk that are known to be GC'd by the next compaction anyway. This covers two scenarios: 1. maxVersions=N and we find at least N versions in the memstore. We can safely avoid flushing any further versions to disk. 2. similarly minVersions=N and we find at least N versions in the memstore. Now we can safely avoid flushing any further *expired* versions to disk. This changes the Store flush to use the same mechanism that used for compactions. I borrowed some code from the tests and refactored the test code to use a new utility class that wraps a sorted collection and then behaves like KeyValueScanner. The same class is used to create scanner over the memstore's snapshot. This addresses bug HBASE-4241. https://issues.apache.org/jira/browse/HBASE-4241 Diffs (updated) - http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java PRE-CREATION http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/KeyValueScanFixture.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueScanFixture.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java 1161347 Diff: https://reviews.apache.org/r/1650/diff Testing --- Ran all tests. TestHTablePool and TestDistributedLogSplitting error out (with or without my change). I had to change three tests that incorrectly relied on old rows hanging around after a flush (or were otherwise incorrect). No new test, as this should cause no functional change. Thanks, Lars Optimize flushing of the Store cache for max versions and (new) min versions Key: HBASE-4241 URL: https://issues.apache.org/jira/browse/HBASE-4241 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.92.0 Reporter: Lars Hofhansl Assignee: Lars Hofhansl Attachments: 4241-v2.txt, 4241.txt As discussed with with Jon, there is room for improvement in how the memstore is flushed to disk. Currently only expired KVs are pruned before flushing, but we can also prune versions if we find at least maxVersions versions in the memstore. The same holds for the new minversion feature: If we find at least minVersion versions in the store we can remove all further versions that are expired. Generally we should use the same mechanism here that is used for Compaction. I.e. StoreScanner. We only need to add a scanner to Memstore that can scan along the current snapshot. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4241) Optimize flushing of the Store cache for max versions and (new) min versions
[ https://issues.apache.org/jira/browse/HBASE-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13090779#comment-13090779 ] jirapos...@reviews.apache.org commented on HBASE-4241: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1650/#review1629 --- http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java https://reviews.apache.org/r/1650/#comment3662 I think vararg form (KeyValue... incData) is more flexible than array form ([]). vararg has to be the last parameter. - Ted On 2011-08-25 04:49:36, Lars Hofhansl wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/1650/ bq. --- bq. bq. (Updated 2011-08-25 04:49:36) bq. bq. bq. Review request for hbase, Ted Yu, Michael Stack, and Jonathan Gray. bq. bq. bq. Summary bq. --- bq. bq. This avoids flushing row versions to disk that are known to be GC'd by the next compaction anyway. bq. This covers two scenarios: bq. 1. maxVersions=N and we find at least N versions in the memstore. We can safely avoid flushing any further versions to disk. bq. 2. similarly minVersions=N and we find at least N versions in the memstore. Now we can safely avoid flushing any further *expired* versions to disk. bq. bq. This changes the Store flush to use the same mechanism that used for compactions. bq. I borrowed some code from the tests and refactored the test code to use a new utility class that wraps a sorted collection and then behaves like KeyValueScanner. The same class is used to create scanner over the memstore's snapshot. bq. bq. bq. This addresses bug HBASE-4241. bq. https://issues.apache.org/jira/browse/HBASE-4241 bq. bq. bq. Diffs bq. - bq. bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1161347 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java PRE-CREATION bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/KeyValueScanFixture.java 1161347 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java 1161347 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueScanFixture.java 1161347 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java 1161347 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java 1161347 bq. bq. Diff: https://reviews.apache.org/r/1650/diff bq. bq. bq. Testing bq. --- bq. bq. Ran all tests. TestHTablePool and TestDistributedLogSplitting error out (with or without my change). bq. I had to change three tests that incorrectly relied on old rows hanging around after a flush (or were otherwise incorrect). bq. bq. No new test, as this should cause no functional change. bq. bq. bq. Thanks, bq. bq. Lars bq. bq. Optimize flushing of the Store cache for max versions and (new) min versions Key: HBASE-4241 URL: https://issues.apache.org/jira/browse/HBASE-4241 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.92.0 Reporter: Lars Hofhansl Assignee: Lars Hofhansl Attachments: 4241-v2.txt, 4241.txt As discussed with with Jon, there is room for improvement in how the memstore is flushed to disk. Currently only expired KVs are pruned before flushing, but we can also prune versions if we find at least maxVersions versions in the memstore. The same holds for the new minversion feature: If we find at least minVersion versions in the store we can remove all further versions that are expired. Generally we should use the same mechanism here that is used for Compaction. I.e. StoreScanner. We only need to add a scanner to Memstore that can scan along the current snapshot. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4241) Optimize flushing of the Store cache for max versions and (new) min versions
[ https://issues.apache.org/jira/browse/HBASE-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13090783#comment-13090783 ] jirapos...@reviews.apache.org commented on HBASE-4241: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1650/#review1628 --- Ship it! Minor comments below. Will wait on your feedback before commit. There is one thing to fix I think -- the finally thing. Let me know and I can do it on commit. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java https://reviews.apache.org/r/1650/#comment3661 oh. so we never cared about this before? we'd just keep doing versions even if well in excess of max? We were pruning versions though? Higher in the stack? In the scanner that called this one? http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java https://reviews.apache.org/r/1650/#comment3663 This is nice, the way you are going to the scanner to next rather than what we did before where we'd iterate a set. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java https://reviews.apache.org/r/1650/#comment3664 Should this be in a finally block? Would it make a diff if this was left hanging open because of some exception? We can get one up out of the scanner.next, right? oh, there is a finally just after... maybe move the scanner.close in there? http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java https://reviews.apache.org/r/1650/#comment3665 Nice - Michael On 2011-08-25 04:49:36, Lars Hofhansl wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/1650/ bq. --- bq. bq. (Updated 2011-08-25 04:49:36) bq. bq. bq. Review request for hbase, Ted Yu, Michael Stack, and Jonathan Gray. bq. bq. bq. Summary bq. --- bq. bq. This avoids flushing row versions to disk that are known to be GC'd by the next compaction anyway. bq. This covers two scenarios: bq. 1. maxVersions=N and we find at least N versions in the memstore. We can safely avoid flushing any further versions to disk. bq. 2. similarly minVersions=N and we find at least N versions in the memstore. Now we can safely avoid flushing any further *expired* versions to disk. bq. bq. This changes the Store flush to use the same mechanism that used for compactions. bq. I borrowed some code from the tests and refactored the test code to use a new utility class that wraps a sorted collection and then behaves like KeyValueScanner. The same class is used to create scanner over the memstore's snapshot. bq. bq. bq. This addresses bug HBASE-4241. bq. https://issues.apache.org/jira/browse/HBASE-4241 bq. bq. bq. Diffs bq. - bq. bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1161347 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java PRE-CREATION bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/KeyValueScanFixture.java 1161347 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java 1161347 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueScanFixture.java 1161347 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java 1161347 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java 1161347 bq. bq. Diff: https://reviews.apache.org/r/1650/diff bq. bq. bq. Testing bq. --- bq. bq. Ran all tests. TestHTablePool and TestDistributedLogSplitting error out (with or without my change). bq. I had to change three tests that incorrectly relied on old rows hanging around after a flush (or were otherwise incorrect). bq. bq. No new test, as this should cause no functional change. bq. bq. bq. Thanks, bq. bq. Lars bq. bq. Optimize flushing of the Store cache for max versions and (new) min versions Key: HBASE-4241 URL: https://issues.apache.org/jira/browse/HBASE-4241 Project: HBase Issue
[jira] [Commented] (HBASE-4241) Optimize flushing of the Store cache for max versions and (new) min versions
[ https://issues.apache.org/jira/browse/HBASE-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13090782#comment-13090782 ] jirapos...@reviews.apache.org commented on HBASE-4241: -- bq. On 2011-08-25 05:12:20, Ted Yu wrote: bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java, line 64 bq. https://reviews.apache.org/r/1650/diff/3/?file=35483#file35483line64 bq. bq. I think vararg form (KeyValue... incData) is more flexible than array form ([]). bq. vararg has to be the last parameter. Oh I see what you mean. Agreed, will upload a new version in minute. - Lars --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1650/#review1629 --- On 2011-08-25 04:49:36, Lars Hofhansl wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/1650/ bq. --- bq. bq. (Updated 2011-08-25 04:49:36) bq. bq. bq. Review request for hbase, Ted Yu, Michael Stack, and Jonathan Gray. bq. bq. bq. Summary bq. --- bq. bq. This avoids flushing row versions to disk that are known to be GC'd by the next compaction anyway. bq. This covers two scenarios: bq. 1. maxVersions=N and we find at least N versions in the memstore. We can safely avoid flushing any further versions to disk. bq. 2. similarly minVersions=N and we find at least N versions in the memstore. Now we can safely avoid flushing any further *expired* versions to disk. bq. bq. This changes the Store flush to use the same mechanism that used for compactions. bq. I borrowed some code from the tests and refactored the test code to use a new utility class that wraps a sorted collection and then behaves like KeyValueScanner. The same class is used to create scanner over the memstore's snapshot. bq. bq. bq. This addresses bug HBASE-4241. bq. https://issues.apache.org/jira/browse/HBASE-4241 bq. bq. bq. Diffs bq. - bq. bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1161347 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java PRE-CREATION bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/KeyValueScanFixture.java 1161347 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java 1161347 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueScanFixture.java 1161347 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java 1161347 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java 1161347 bq. bq. Diff: https://reviews.apache.org/r/1650/diff bq. bq. bq. Testing bq. --- bq. bq. Ran all tests. TestHTablePool and TestDistributedLogSplitting error out (with or without my change). bq. I had to change three tests that incorrectly relied on old rows hanging around after a flush (or were otherwise incorrect). bq. bq. No new test, as this should cause no functional change. bq. bq. bq. Thanks, bq. bq. Lars bq. bq. Optimize flushing of the Store cache for max versions and (new) min versions Key: HBASE-4241 URL: https://issues.apache.org/jira/browse/HBASE-4241 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.92.0 Reporter: Lars Hofhansl Assignee: Lars Hofhansl Attachments: 4241-v2.txt, 4241.txt As discussed with with Jon, there is room for improvement in how the memstore is flushed to disk. Currently only expired KVs are pruned before flushing, but we can also prune versions if we find at least maxVersions versions in the memstore. The same holds for the new minversion feature: If we find at least minVersion versions in the store we can remove all further versions that are expired. Generally we should use the same mechanism here that is used for Compaction. I.e. StoreScanner. We only need to add a scanner to Memstore that can scan along the current snapshot. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4241) Optimize flushing of the Store cache for max versions and (new) min versions
[ https://issues.apache.org/jira/browse/HBASE-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13090790#comment-13090790 ] jirapos...@reviews.apache.org commented on HBASE-4241: -- bq. On 2011-08-25 05:16:29, Michael Stack wrote: bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java, line 33 bq. https://reviews.apache.org/r/1650/diff/3/?file=35483#file35483line33 bq. bq. Nice I stole most of this from the test code :) bq. On 2011-08-25 05:16:29, Michael Stack wrote: bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java, line 505 bq. https://reviews.apache.org/r/1650/diff/3/?file=35482#file35482line505 bq. bq. This is nice, the way you are going to the scanner to next rather than what we did before where we'd iterate a set. It's also nice because now it uses the exact same logic that is used during compaction. bq. On 2011-08-25 05:16:29, Michael Stack wrote: bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java, line 488 bq. https://reviews.apache.org/r/1650/diff/3/?file=35482#file35482line488 bq. bq. oh. so we never cared about this before? we'd just keep doing versions even if well in excess of max? We were pruning versions though? Higher in the stack? In the scanner that called this one? Versions were only ever pruned as part of compactions. (A StoreScanner is also in Store.compactStore(...)) We can't get rid of all excess versions at flush time (as not all of them are in the memstore), but we can make a good effort to avoid flushing some of the versions. bq. On 2011-08-25 05:16:29, Michael Stack wrote: bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java, line 516 bq. https://reviews.apache.org/r/1650/diff/3/?file=35482#file35482line516 bq. bq. Should this be in a finally block? Would it make a diff if this was left hanging open because of some exception? We can get one up out of the scanner.next, right? bq. bq. oh, there is a finally just after... maybe move the scanner.close in there? Hmm... I was thinking since KeyValueScanner.close() can throw IOException I don't want it to be in the finally clause. I also think this close this close() is not actually needed. Looking at StoreScanner.close(), though, I see that it does not throw anything. Not sure about this one, I'll move it into the finally block, after the close of the writer, just to be save. I'll upload a new version soon and also includes suggestions from Ted. - Lars --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1650/#review1628 --- On 2011-08-25 04:49:36, Lars Hofhansl wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/1650/ bq. --- bq. bq. (Updated 2011-08-25 04:49:36) bq. bq. bq. Review request for hbase, Ted Yu, Michael Stack, and Jonathan Gray. bq. bq. bq. Summary bq. --- bq. bq. This avoids flushing row versions to disk that are known to be GC'd by the next compaction anyway. bq. This covers two scenarios: bq. 1. maxVersions=N and we find at least N versions in the memstore. We can safely avoid flushing any further versions to disk. bq. 2. similarly minVersions=N and we find at least N versions in the memstore. Now we can safely avoid flushing any further *expired* versions to disk. bq. bq. This changes the Store flush to use the same mechanism that used for compactions. bq. I borrowed some code from the tests and refactored the test code to use a new utility class that wraps a sorted collection and then behaves like KeyValueScanner. The same class is used to create scanner over the memstore's snapshot. bq. bq. bq. This addresses bug HBASE-4241. bq. https://issues.apache.org/jira/browse/HBASE-4241 bq. bq. bq. Diffs bq. - bq. bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1161347 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java PRE-CREATION bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/KeyValueScanFixture.java 1161347 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java 1161347 bq.
[jira] [Commented] (HBASE-4241) Optimize flushing of the Store cache for max versions and (new) min versions
[ https://issues.apache.org/jira/browse/HBASE-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13090792#comment-13090792 ] jirapos...@reviews.apache.org commented on HBASE-4241: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1650/ --- (Updated 2011-08-25 05:38:13.482663) Review request for hbase, Ted Yu, Michael Stack, and Jonathan Gray. Summary --- This avoids flushing row versions to disk that are known to be GC'd by the next compaction anyway. This covers two scenarios: 1. maxVersions=N and we find at least N versions in the memstore. We can safely avoid flushing any further versions to disk. 2. similarly minVersions=N and we find at least N versions in the memstore. Now we can safely avoid flushing any further *expired* versions to disk. This changes the Store flush to use the same mechanism that used for compactions. I borrowed some code from the tests and refactored the test code to use a new utility class that wraps a sorted collection and then behaves like KeyValueScanner. The same class is used to create scanner over the memstore's snapshot. This addresses bug HBASE-4241. https://issues.apache.org/jira/browse/HBASE-4241 Diffs (updated) - http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java PRE-CREATION http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/KeyValueScanFixture.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueScanFixture.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java 1161347 Diff: https://reviews.apache.org/r/1650/diff Testing --- Ran all tests. TestHTablePool and TestDistributedLogSplitting error out (with or without my change). I had to change three tests that incorrectly relied on old rows hanging around after a flush (or were otherwise incorrect). No new test, as this should cause no functional change. Thanks, Lars Optimize flushing of the Store cache for max versions and (new) min versions Key: HBASE-4241 URL: https://issues.apache.org/jira/browse/HBASE-4241 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.92.0 Reporter: Lars Hofhansl Assignee: Lars Hofhansl Attachments: 4241-v2.txt, 4241.txt As discussed with with Jon, there is room for improvement in how the memstore is flushed to disk. Currently only expired KVs are pruned before flushing, but we can also prune versions if we find at least maxVersions versions in the memstore. The same holds for the new minversion feature: If we find at least minVersion versions in the store we can remove all further versions that are expired. Generally we should use the same mechanism here that is used for Compaction. I.e. StoreScanner. We only need to add a scanner to Memstore that can scan along the current snapshot. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira