[jira] [Commented] (HBASE-14906) Improvements on FlushLargeStoresPolicy
[ https://issues.apache.org/jira/browse/HBASE-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051291#comment-15051291 ]

Yu Li commented on HBASE-14906:
---

Ping [~stack] [~Apache9] :-)

> Improvements on FlushLargeStoresPolicy
> --
>
> Key: HBASE-14906
> URL: https://issues.apache.org/jira/browse/HBASE-14906
> Project: HBase
> Issue Type: Improvement
> Affects Versions: 2.0.0
> Reporter: Yu Li
> Assignee: Yu Li
> Fix For: 2.0.0
>
> Attachments: HBASE-14906.patch, HBASE-14906.v2.patch,
> HBASE-14906.v3.patch, HBASE-14906.v4.patch, HBASE-14906.v4.patch
>
>
> While checking FlushLargeStoresPolicy, I found the following possible improvements:
> 1. selectStoresToFlush currently performs the selection regardless of how many
> families the region actually has, which is unnecessary when there is only a single family.
> 2. The default value of hbase.hregion.percolumnfamilyflush.size.lower.bound
> cannot fit all cases, and setting it properly requires the user to know the
> implementation details. We propose to use
> "hbase.hregion.memstore.flush.size / column_family_number" instead:
> {noformat}
> <property>
>   <name>hbase.hregion.percolumnfamilyflush.size.lower.bound</name>
>   <value>16777216</value>
>   <description>
>   If FlushLargeStoresPolicy is used and there are multiple column families,
>   then every time that we hit the total memstore limit, we find out all the
>   column families whose memstores exceed a "lower bound" and only flush them
>   while retaining the others in memory. The "lower bound" will be
>   "hbase.hregion.memstore.flush.size / column_family_number" by default
>   unless value of this property is larger than that. If none of the families
>   have their memstore size more than lower bound, all the memstores will be
>   flushed (just as usual).
>   </description>
> </property>
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
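To make the two proposals concrete, here is a minimal Java sketch of the selection logic as the description states it: skip selection when there is at most one family, use `hbase.hregion.memstore.flush.size / column_family_number` as the lower bound unless the configured property value is larger, flush only the families above that bound, and flush everything if none qualify. `FlushSelectionSketch` and its methods are hypothetical names, and plain byte counts stand in for real memstores; this is an illustration under those assumptions, not the code in the attached patches.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical, simplified model of the two proposals in the issue description.
 * Class and method names are illustrative only; they are not the HBase API.
 */
final class FlushSelectionSketch {

  /** Proposal 2: average flush size per family, unless the configured value is larger. */
  static long lowerBound(long memstoreFlushSize, int familyCount, long configuredLowerBound) {
    long perFamilyAverage = memstoreFlushSize / familyCount; // assumes familyCount >= 1
    return Math.max(perFamilyAverage, configuredLowerBound);
  }

  /**
   * Picks the families to flush once the region-wide memstore limit is hit.
   *
   * @param memstoreSizes family name -> current memstore size in bytes
   */
  static List<String> selectFamiliesToFlush(Map<String, Long> memstoreSizes,
      long memstoreFlushSize, long configuredLowerBound) {
    List<String> all = new ArrayList<>(memstoreSizes.keySet());
    // Proposal 1: with zero or one family there is nothing to choose, so skip selection.
    if (memstoreSizes.size() <= 1) {
      return all;
    }
    long bound = lowerBound(memstoreFlushSize, memstoreSizes.size(), configuredLowerBound);
    List<String> selected = new ArrayList<>();
    for (Map.Entry<String, Long> entry : memstoreSizes.entrySet()) {
      if (entry.getValue() > bound) {
        selected.add(entry.getKey());
      }
    }
    // If no family exceeds the lower bound, flush all memstores as usual.
    return selected.isEmpty() ? all : selected;
  }

  public static void main(String[] args) {
    long mb = 1024L * 1024L;
    Map<String, Long> sizes = new LinkedHashMap<>();
    sizes.put("cf1", 40 * mb);
    sizes.put("cf2", 1 * mb);
    sizes.put("cf3", 2 * mb);
    sizes.put("cf4", 1 * mb);
    // 128 MB / 4 families = 32 MB, which is larger than the 16 MB configured value.
    System.out.println(selectFamiliesToFlush(sizes, 128 * mb, 16 * mb)); // prints [cf1]
  }
}
```

With many families the per-family average can drop below the configured value (for example 128 MB / 16 = 8 MB), in which case the configured 16 MB wins, matching the "unless value of this property is larger than that" clause.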
[jira] [Commented] (HBASE-14906) Improvements on FlushLargeStoresPolicy
[ https://issues.apache.org/jira/browse/HBASE-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052117#comment-15052117 ]

Hudson commented on HBASE-14906:

FAILURE: Integrated in HBase-Trunk_matrix #545 (See [https://builds.apache.org/job/HBase-Trunk_matrix/545/])
HBASE-14906 Improvements on FlushLargeStoresPolicy (Yu Li) (stack: rev c15e0af84aeb4ab992482a957c2b242d2ab57d76)
* hbase-common/src/main/resources/hbase-default.xml
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/FlushLargeStoresPolicy.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestPerColumnFamilyFlush.java
[jira] [Commented] (HBASE-14906) Improvements on FlushLargeStoresPolicy
[ https://issues.apache.org/jira/browse/HBASE-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052062#comment-15052062 ]

Yu Li commented on HBASE-14906:
---

Thanks for helping review and commit, sir! [~stack]
[jira] [Commented] (HBASE-14906) Improvements on FlushLargeStoresPolicy
[ https://issues.apache.org/jira/browse/HBASE-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15043916#comment-15043916 ]

Yu Li commented on HBASE-14906:
---

[~stack] and [~gaurav.menghani], any comments on this one? Thanks.
[jira] [Commented] (HBASE-14906) Improvements on FlushLargeStoresPolicy
[ https://issues.apache.org/jira/browse/HBASE-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15039665#comment-15039665 ]

Yu Li commented on HBASE-14906:
---

Thanks for the review and comments Duo!
[jira] [Commented] (HBASE-14906) Improvements on FlushLargeStoresPolicy
[ https://issues.apache.org/jira/browse/HBASE-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037731#comment-15037731 ]

Hadoop QA commented on HBASE-14906:
---

{color:green}+1 overall{color}.

{color:green}+1 core zombie tests -- no zombies!{color}.

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/16750//testReport/
Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/16750//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/16750//artifact/patchprocess/checkstyle-aggregate.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/16750//console

This message is automatically generated.
[jira] [Commented] (HBASE-14906) Improvements on FlushLargeStoresPolicy
[ https://issues.apache.org/jira/browse/HBASE-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037519#comment-15037519 ]

Hadoop QA commented on HBASE-14906:
---

{color:green}+1 overall{color}.

{color:green}+1 core zombie tests -- no zombies!{color}.

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/16749//testReport/
Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/16749//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/16749//artifact/patchprocess/checkstyle-aggregate.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/16749//console

This message is automatically generated.
[jira] [Commented] (HBASE-14906) Improvements on FlushLargeStoresPolicy
[ https://issues.apache.org/jira/browse/HBASE-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037561#comment-15037561 ]

Yu Li commented on HBASE-14906:
---

From the HadoopQA report, I observe the failures (errors) below, although it says +1:
{noformat}
Tests in error:
  org.apache.hadoop.hbase.regionserver.TestBulkLoad.bulkHLogShouldThrowErrorWhenFamilySpecifiedAndHFileExistsButNotInTableDescriptor(org.apache.hadoop.hbase.regionserver.TestBulkLoad)
    Run 1: TestBulkLoad.bulkHLogShouldThrowErrorWhenFamilySpecifiedAndHFileExistsButNotInTableDescriptor »
    Run 2: TestBulkLoad.bulkHLogShouldThrowErrorWhenFamilySpecifiedAndHFileExistsButNotInTableDescriptor »
    Run 3: TestBulkLoad.bulkHLogShouldThrowErrorWhenFamilySpecifiedAndHFileExistsButNotInTableDescriptor »
  org.apache.hadoop.hbase.regionserver.TestBulkLoad.shouldThrowErrorIfBadFamilySpecifiedAsFamilyPath(org.apache.hadoop.hbase.regionserver.TestBulkLoad)
    Run 1: TestBulkLoad.shouldThrowErrorIfBadFamilySpecifiedAsFamilyPath » Unexpected ex...
    Run 2: TestBulkLoad.shouldThrowErrorIfBadFamilySpecifiedAsFamilyPath » Unexpected ex...
    Run 3: TestBulkLoad.shouldThrowErrorIfBadFamilySpecifiedAsFamilyPath » Unexpected ex...
{noformat}
And the detailed exception:
{noformat}
shouldThrowErrorIfBadFamilySpecifiedAsFamilyPath(org.apache.hadoop.hbase.regionserver.TestBulkLoad)  Time elapsed: 0.043 sec  <<< ERROR!
java.lang.Exception: Unexpected exception, expected but was
    at org.apache.hadoop.hbase.regionserver.FlushLargeStoresPolicy.configureForRegion(FlushLargeStoresPolicy.java:59)
    at org.apache.hadoop.hbase.regionserver.FlushPolicyFactory.create(FlushPolicyFactory.java:52)
    at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:845)
    at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:786)
    at org.apache.hadoop.hbase.regionserver.HRegion.createHRegion(HRegion.java:6195)
    at org.apache.hadoop.hbase.regionserver.HRegion.createHRegion(HRegion.java:6204)
    at org.apache.hadoop.hbase.regionserver.TestBulkLoad.testRegionWithFamiliesAndSpecifiedTableName(TestBulkLoad.java:239)
    at org.apache.hadoop.hbase.regionserver.TestBulkLoad.testRegionWithFamilies(TestBulkLoad.java:249)
    at org.apache.hadoop.hbase.regionserver.TestBulkLoad.shouldThrowErrorIfBadFamilySpecifiedAsFamilyPath(TestBulkLoad.java:207)
{noformat}
Since these are errors rather than failures, the tests stopped partway through. The issue is caused by the patch here: it doesn't handle the case where the number of column families is zero. Although this won't happen in the real world, it is possible in unit test cases such as TestBulkLoad.
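The zero-family case can be illustrated with a small hypothetical sketch. In Java, dividing a `long` by an integer zero throws `ArithmeticException`; the exception types are not visible in the archived trace above, so this only shows why a guard is needed, not what the test actually reported. The guard shown here, skipping the computation when a region has at most one family because no selection is needed then, is an assumption of the sketch rather than code taken from the later patches, and `ZeroFamilyGuardSketch` is an illustrative name.

```java
import java.util.OptionalLong;

/**
 * Hypothetical sketch of guarding the per-family lower-bound computation against a
 * region whose table descriptor has no column families. Not the committed patch.
 */
final class ZeroFamilyGuardSketch {

  /** Unguarded version: a family count of 0 makes the division throw ArithmeticException. */
  static long unguardedLowerBound(long memstoreFlushSize, int familyCount, long configured) {
    return Math.max(memstoreFlushSize / familyCount, configured);
  }

  /**
   * Guarded version: with zero or one family no selection is needed, so no lower bound
   * is computed (empty result) and the caller simply flushes everything.
   */
  static OptionalLong guardedLowerBound(long memstoreFlushSize, int familyCount, long configured) {
    if (familyCount <= 1) {
      return OptionalLong.empty();
    }
    return OptionalLong.of(Math.max(memstoreFlushSize / familyCount, configured));
  }

  public static void main(String[] args) {
    long mb = 1024L * 1024L;
    System.out.println(guardedLowerBound(128 * mb, 0, 16 * mb).isPresent()); // prints false
    try {
      unguardedLowerBound(128 * mb, 0, 16 * mb);
    } catch (ArithmeticException e) {
      System.out.println("unguarded: " + e.getMessage()); // prints "unguarded: / by zero"
    }
  }
}
```

Returning an empty `OptionalLong` keeps "no bound needed" distinct from any real byte value, so a caller cannot mistake the zero-family case for a genuine lower bound.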
[jira] [Commented] (HBASE-14906) Improvements on FlushLargeStoresPolicy
[ https://issues.apache.org/jira/browse/HBASE-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15038063#comment-15038063 ]

Hadoop QA commented on HBASE-14906:
---

{color:green}+1 overall{color}.

{color:green}+1 core zombie tests -- no zombies!{color}.

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/16755//testReport/
Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/16755//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/16755//artifact/patchprocess/checkstyle-aggregate.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/16755//console

This message is automatically generated.
[jira] [Commented] (HBASE-14906) Improvements on FlushLargeStoresPolicy
[ https://issues.apache.org/jira/browse/HBASE-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15038139#comment-15038139 ]

Yu Li commented on HBASE-14906:
---

Confirmed from the testReport that there are no more UT failures. However, the report still looks strange: the summary of core tests, javadoc, etc. seems to have disappeared. [~stack], could you please take a look here, sir?
[jira] [Commented] (HBASE-14906) Improvements on FlushLargeStoresPolicy
[ https://issues.apache.org/jira/browse/HBASE-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15038608#comment-15038608 ]

Hadoop QA commented on HBASE-14906:
---

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12775599/HBASE-14906.v4.patch
against master branch at commit a154ecda00d9d9a58e83d322dae7ffd3518b633c.
ATTACHMENT ID: 12775599

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests.
{color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 2.7.1)
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 checkstyle{color}. The applied patch does not generate new checkstyle errors.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100
{color:green}+1 site{color}. The mvn post-site goal succeeds with this patch.
{color:green}+1 core tests{color}. The patch passed unit tests in .
{color:green}+1 core zombie tests -- no zombies!{color}.

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/16758//testReport/
Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/16758//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/16758//artifact/patchprocess/checkstyle-aggregate.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/16758//console

This message is automatically generated.
[jira] [Commented] (HBASE-14906) Improvements on FlushLargeStoresPolicy
[ https://issues.apache.org/jira/browse/HBASE-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15038605#comment-15038605 ]

Hadoop QA commented on HBASE-14906:
---

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12775599/HBASE-14906.v4.patch
against master branch at commit a154ecda00d9d9a58e83d322dae7ffd3518b633c.
ATTACHMENT ID: 12775599

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests.
{color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 2.7.1)
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 checkstyle{color}. The applied patch does not generate new checkstyle errors.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100
{color:green}+1 site{color}. The mvn post-site goal succeeds with this patch.
{color:green}+1 core tests{color}. The patch passed unit tests in .
{color:green}+1 core zombie tests -- no zombies!{color}.

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/16757//testReport/
Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/16757//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/16757//artifact/patchprocess/checkstyle-aggregate.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/16757//console

This message is automatically generated.
[jira] [Commented] (HBASE-14906) Improvements on FlushLargeStoresPolicy
[ https://issues.apache.org/jira/browse/HBASE-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15038201#comment-15038201 ]

stack commented on HBASE-14906:
---

This is my fault, [~carp84]. Will rerun the patch in a minute after I fix the reporting...
[jira] [Commented] (HBASE-14906) Improvements on FlushLargeStoresPolicy
[ https://issues.apache.org/jira/browse/HBASE-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15039587#comment-15039587 ] Duo Zhang commented on HBASE-14906: --- So there is no exact lower bound in global config?
[jira] [Commented] (HBASE-14906) Improvements on FlushLargeStoresPolicy
[ https://issues.apache.org/jira/browse/HBASE-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15039630#comment-15039630 ] Yu Li commented on HBASE-14906: --- {quote} So there is no exact lower bound in global config? {quote} Yes, there's only a "minimum lower bound" in the global config. In other words, the default lower bound is now "dynamic" and table-specific rather than a fixed value, unless the auto-computed value is smaller than the configured minimum. After the change there's no way for users to specify a _uniform small_ lower bound for *all* tables any more, which is unnecessary anyway IMHO. An expert user who wants a smaller lower bound for a specific table can still achieve that by setting {{hbase.hregion.percolumnfamilyflush.size.lower.bound}} in the table descriptor. Feel free to let me know if you have any concerns. Thanks.
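The resolution order described in this thread (a dynamic, per-table default of {{region.getMemstoreFlushSize() / familyNumber}}, floored by the global minimum, with an optional table-descriptor override that wins even when smaller) can be sketched in plain Java. This is a hypothetical illustration, not the code in FlushLargeStoresPolicy.java: the class and method names are made up, plain {{Map}}s stand in for HBase's Configuration and table descriptor, and the 16 MB default for the global minimum is an assumption borrowed from the old 16777216 default.

```java
import java.util.Map;

// Hypothetical sketch of the lower-bound resolution discussed in HBASE-14906.
// Plain Maps stand in for the global Configuration and the table descriptor values.
final class FlushLowerBoundSketch {

  // Property names as they appear in this thread.
  static final String TABLE_KEY = "hbase.hregion.percolumnfamilyflush.size.lower.bound";
  static final String GLOBAL_MIN_KEY = "hbase.hregion.percolumnfamilyflush.size.lower.bound.min";

  // ASSUMPTION: 16 MB default floor, borrowed from the old 16777216 default.
  static final long DEFAULT_MIN = 16L * 1024 * 1024;

  static long resolveLowerBound(long memstoreFlushSize, int familyCount,
      Map<String, String> globalConf, Map<String, String> tableValues) {
    if (familyCount <= 0) {
      throw new IllegalArgumentException("familyCount must be positive");
    }
    // 1. Dynamic, table-specific default: the average flush size per family.
    long lowerBound = memstoreFlushSize / familyCount;
    // 2. The global setting only acts as a floor ("minimum lower bound").
    long globalMin = parseOrDefault(globalConf.get(GLOBAL_MIN_KEY), DEFAULT_MIN);
    lowerBound = Math.max(lowerBound, globalMin);
    // 3. An expert override in the table descriptor wins, even if it is smaller.
    return parseOrDefault(tableValues.get(TABLE_KEY), lowerBound);
  }

  // Returns the parsed value, or the fallback when the value is absent or malformed.
  private static long parseOrDefault(String value, long fallback) {
    if (value == null) {
      return fallback;
    }
    try {
      return Long.parseLong(value.trim());
    } catch (NumberFormatException e) {
      return fallback;
    }
  }

  public static void main(String[] args) {
    long flushSize = 128L * 1024 * 1024; // 128 MB region memstore flush size
    System.out.println(resolveLowerBound(flushSize, 3, Map.of(), Map.of()));
    System.out.println(resolveLowerBound(flushSize, 10, Map.of(), Map.of()));
    System.out.println(resolveLowerBound(flushSize, 3, Map.of(), Map.of(TABLE_KEY, "8388608")));
  }
}
```

With a 128 MB flush size and 3 families the sketch yields 44739242 bytes (about 42.7 MB); with 10 families the average (about 12.8 MB) falls below the assumed 16 MB floor, so the floor wins; and a table-level value of 8388608 overrides both.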
[jira] [Commented] (HBASE-14906) Improvements on FlushLargeStoresPolicy
[ https://issues.apache.org/jira/browse/HBASE-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15039657#comment-15039657 ] Duo Zhang commented on HBASE-14906: --- I'm OK with it. +1 on the newest patch.
[jira] [Commented] (HBASE-14906) Improvements on FlushLargeStoresPolicy
[ https://issues.apache.org/jira/browse/HBASE-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036035#comment-15036035 ] Yu Li commented on HBASE-14906: --- Thanks for the quick response [~Apache9]! About the config in the table descriptor, I think we need to preserve a way for experts to override the default value, in case they really need to set a *smaller* value than {{region.getMemstoreFlushSize() / familyNumber}} for their use case (setting the global config to a smaller value won't take effect with the current design). Agree?
[jira] [Commented] (HBASE-14906) Improvements on FlushLargeStoresPolicy
[ https://issues.apache.org/jira/browse/HBASE-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037148#comment-15037148 ] Duo Zhang commented on HBASE-14906: --- {quote} About config in table description, I think we need to preserve a way for experts to override the default value, in case they really need to set a smaller value than region.getMemstoreFlushSize() / familyNumber for their use case {quote} Sounds reasonable. What about introducing two configurations here? One for the lower bound of the lower bound, and one for the exact lower bound? Making the global config and the table config have different meanings is a little confusing, I think.
[jira] [Commented] (HBASE-14906) Improvements on FlushLargeStoresPolicy
[ https://issues.apache.org/jira/browse/HBASE-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037238#comment-15037238 ] Yu Li commented on HBASE-14906: --- {quote} What about introduce two configurations here? One is the lower bound of lower bound, one is the exact lower bound? {quote} Makes sense. I will rename the global config to {{hbase.hregion.percolumnfamilyflush.size.lower.bound.min}} to better reflect what the property means; if you have a better name in mind, just let me know. :-) This change will invalidate users' previous settings of {{hbase.hregion.percolumnfamilyflush.size.lower.bound}}, but that is by design since we now have a better default ({{region.getMemstoreFlushSize() / familyNumber}}). Will add a note to the release note.
[jira] [Commented] (HBASE-14906) Improvements on FlushLargeStoresPolicy
[ https://issues.apache.org/jira/browse/HBASE-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15035610#comment-15035610 ] Duo Zhang commented on HBASE-14906: --- +1 on the idea. Only one thing: shouldn't we still use {{region.getMemstoreFlushSize() / familyNumber}} even if there is a table config in the table descriptor? In the patch you only compare the value with the global config. Thanks.
[jira] [Commented] (HBASE-14906) Improvements on FlushLargeStoresPolicy
[ https://issues.apache.org/jira/browse/HBASE-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15035452#comment-15035452 ] Yu Li commented on HBASE-14906: --- [~Apache9] and [~gaurav.menghani], could you take a look here and share your thoughts? Thanks.
[jira] [Commented] (HBASE-14906) Improvements on FlushLargeStoresPolicy
[ https://issues.apache.org/jira/browse/HBASE-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15035442#comment-15035442 ] Yu Li commented on HBASE-14906: --- The result of TestPerColumnFamilyFlush#testCompareStoreFileCount shows a promising improvement (even fewer flushes for the small CFs):
w/o patch:
{noformat}
2015-12-01 22:15:39,749 INFO [Thread-1] regionserver.TestPerColumnFamilyFlush(637): disable selective flush: f1=>11, f2=>11, f3=>11
2015-12-01 22:15:39,749 INFO [Thread-1] regionserver.TestPerColumnFamilyFlush(640): enable selective flush: f1=>6, f2=>9, f3=>12
{noformat}
w/ patch:
{noformat}
2015-12-01 22:23:21,649 INFO [Thread-1] regionserver.TestPerColumnFamilyFlush(634): disable selective flush: f1=>11, f2=>11, f3=>11
2015-12-01 22:23:21,649 INFO [Thread-1] regionserver.TestPerColumnFamilyFlush(637): enable selective flush: f1=>6, f2=>7, f3=>13
{noformat}
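The "selective flush" in the TestPerColumnFamilyFlush log lines above refers to the FlushLargeStoresPolicy rule from the issue description: flush only the families whose memstore exceeds the lower bound, flush everything if none does, and skip selection entirely when there is a single family. A minimal Java sketch of that rule follows. It is an illustration under assumptions, not HBase's selectStoresToFlush(): family names mapped to byte sizes in a plain Map stand in for the region's stores, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the selective-flush rule; not HBase's actual code.
final class SelectiveFlushSketch {

  // memstoreSizes: family name -> current memstore size in bytes (iteration order is kept).
  static List<String> selectFamiliesToFlush(Map<String, Long> memstoreSizes, long lowerBound) {
    // With a single family there is nothing to select: just flush it.
    if (memstoreSizes.size() <= 1) {
      return new ArrayList<>(memstoreSizes.keySet());
    }
    List<String> selected = new ArrayList<>();
    for (Map.Entry<String, Long> e : memstoreSizes.entrySet()) {
      if (e.getValue() > lowerBound) { // only "large" families are flushed
        selected.add(e.getKey());
      }
    }
    // If no family exceeds the bound, flush all of them (as without selective flush).
    return selected.isEmpty() ? new ArrayList<>(memstoreSizes.keySet()) : selected;
  }

  public static void main(String[] args) {
    long mb = 1024L * 1024;
    Map<String, Long> sizes = new LinkedHashMap<>();
    sizes.put("f1", 2 * mb);
    sizes.put("f2", 20 * mb);
    sizes.put("f3", 106 * mb);
    // 128 MB flush size across 3 families -> bound of about 42.7 MB: only f3 is selected.
    System.out.println(selectFamiliesToFlush(sizes, 128 * mb / 3));
    // Old fixed 16 MB bound: f2 is selected as well.
    System.out.println(selectFamiliesToFlush(sizes, 16 * mb));
  }
}
```

With the same input, the old fixed 16 MB bound also picks f2, while the averaged bound keeps f2 in memory; this illustrates why a larger, per-table bound tends to reduce flushes of the smaller families.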
[jira] [Commented] (HBASE-14906) Improvements on FlushLargeStoresPolicy
[ https://issues.apache.org/jira/browse/HBASE-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15035450#comment-15035450 ] Yu Li commented on HBASE-14906: --- I also applied the same test case as [HBASE-10201|https://issues.apache.org/jira/browse/HBASE-10201?focusedCommentId=14171950=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14171950] on a real cluster. To repeat the test method: 3 CFs, 16B values for CF1, 256B values for CF2 and 4K values for CF3, 1M rows, 128M memstore flush size, 16M CF flush size. The results are:
w/o patch:
{noformat}
metric_storeCount : 3,
metric_storeFileCount : 9,
metric_memStoreSize : 112519968,
metric_storeFileSize : 4396528692,
metric_compactionsCompletedCount : 17,
metric_numBytesCompactedCount : 18891018964,
metric_numFilesCompactedCount : 89
{noformat}
w/ patch:
{noformat}
metric_storeCount : 3,
metric_storeFileCount : 13,
metric_memStoreSize : 58168928,
metric_storeFileSize : 4446829180,
metric_compactionsCompletedCount : 15,
metric_numBytesCompactedCount : 15101162833,
metric_numFilesCompactedCount : 82
{noformat}
Flush counts per column family:
w/o patch:
{noformat}
CF1: 9 times
CF2: 19 times
CF3: 39 times
{noformat}
w/ patch:
{noformat}
CF1: 4 times
CF2: 8 times
CF3: 53 times
{noformat}
From the metrics we can see that both the number of compactions and the bytes involved in compaction are reduced. We can also see fewer flushes of the small CFs but more of the large CF. This makes sense in theory: because more of the small CFs' memstore data is retained in memory, the region reaches its total memstore flush size sooner and the large CF gets flushed more often, until the small CFs also reach the flush line.
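To express the cluster numbers in this comment as relative changes, here is a small Java helper. The inputs are copied from the metrics and per-CF flush counts above; only the percentages and totals are computed, and the class and method names are illustrative.

```java
import java.util.Locale;

// Relative change of the cluster metrics quoted above (w/o patch -> w/ patch).
final class FlushMetricsDelta {

  // Percentage change from before to after; a negative result means a reduction.
  static double percentChange(long before, long after) {
    return 100.0 * (after - before) / before;
  }

  public static void main(String[] args) {
    System.out.printf(Locale.ROOT, "compactionsCompleted: %.1f%%%n", percentChange(17, 15));
    System.out.printf(Locale.ROOT, "numBytesCompacted:    %.1f%%%n",
        percentChange(18891018964L, 15101162833L));
    System.out.printf(Locale.ROOT, "numFilesCompacted:    %.1f%%%n", percentChange(89, 82));
    // Flushes summed over CF1..CF3.
    System.out.printf(Locale.ROOT, "total flushes:        %d -> %d%n", 9 + 19 + 39, 4 + 8 + 53);
  }
}
```

This works out to roughly 11.8% fewer completed compactions, about 20.1% fewer bytes compacted and about 7.9% fewer files compacted, while total flushes across the three CFs go from 67 to 65.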
[jira] [Commented] (HBASE-14906) Improvements on FlushLargeStoresPolicy
[ https://issues.apache.org/jira/browse/HBASE-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037101#comment-15037101 ] stack commented on HBASE-14906: --- This is nice work. +1 on patch. I like the test results.