[jira] [Commented] (HBASE-14906) Improvements on FlushLargeStoresPolicy

2015-12-10 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051291#comment-15051291
 ] 

Yu Li commented on HBASE-14906:
---

Ping [~stack] [~Apache9] :-)

> Improvements on FlushLargeStoresPolicy
> --
>
> Key: HBASE-14906
> URL: https://issues.apache.org/jira/browse/HBASE-14906
> Project: HBase
> Issue Type: Improvement
> Affects Versions: 2.0.0
> Reporter: Yu Li
> Assignee: Yu Li
> Fix For: 2.0.0
>
> Attachments: HBASE-14906.patch, HBASE-14906.v2.patch, 
> HBASE-14906.v3.patch, HBASE-14906.v4.patch, HBASE-14906.v4.patch
>
>
> While reviewing FlushLargeStoresPolicy, we found the following possible improvements:
> 1. Currently, selectStoresToFlush performs the selection regardless of how 
> many column families there actually are, which is unnecessary for a single family.
> 2. The default value of hbase.hregion.percolumnfamilyflush.size.lower.bound 
> does not fit all cases, and setting it properly requires the user to know 
> details of the implementation. We propose to use 
> "hbase.hregion.memstore.flush.size/column_family_number" instead:
> {noformat}
>   <property>
>     <name>hbase.hregion.percolumnfamilyflush.size.lower.bound</name>
>     <value>16777216</value>
>     <description>
>     If FlushLargeStoresPolicy is used and there are multiple column families,
>     then every time that we hit the total memstore limit, we find out all the
>     column families whose memstores exceed a "lower bound" and only flush them
>     while retaining the others in memory. The "lower bound" will be
>     "hbase.hregion.memstore.flush.size / column_family_number" by default
>     unless value of this property is larger than that. If none of the families
>     have their memstore size more than lower bound, all the memstores will be
>     flushed (just as usual).
>     </description>
>   </property>
> {noformat}
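
The selection rule in the description above can be sketched in plain Java. This is only an illustrative sketch, not the code from the attached patches: the class FlushLowerBoundSketch, its method names, and the map from family name to memstore size are assumptions made for the example, and 128 MB is used merely as a sample total flush size.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, self-contained model of the rule; not HBase code.
final class FlushLowerBoundSketch {

  // Effective bound: flushSize / familyCount, unless the configured value is larger.
  // The caller must pass familyCount > 0.
  static long lowerBound(long memstoreFlushSize, int familyCount, long configuredMin) {
    long perFamily = memstoreFlushSize / familyCount;
    return Math.max(perFamily, configuredMin);
  }

  // Returns the families to flush, given each family's current memstore size in bytes.
  static List<String> selectStoresToFlush(
      Map<String, Long> memstoreSizes, long memstoreFlushSize, long configuredMin) {
    // Point 1 of the description: with a single family there is nothing to select.
    if (memstoreSizes.size() <= 1) {
      return new ArrayList<>(memstoreSizes.keySet());
    }
    long bound = lowerBound(memstoreFlushSize, memstoreSizes.size(), configuredMin);
    List<String> selected = new ArrayList<>();
    for (Map.Entry<String, Long> e : memstoreSizes.entrySet()) {
      if (e.getValue() > bound) {
        selected.add(e.getKey());
      }
    }
    // If no family exceeds the bound, flush everything, just as usual.
    return selected.isEmpty() ? new ArrayList<>(memstoreSizes.keySet()) : selected;
  }

  public static void main(String[] args) {
    Map<String, Long> sizes = new LinkedHashMap<>();
    sizes.put("cf1", 100L << 20); // 100 MB
    sizes.put("cf2", 20L << 20);  // 20 MB
    sizes.put("cf3", 8L << 20);   // 8 MB
    // 128 MB / 3 families is larger than the 16 MB property value, so it is the bound.
    System.out.println(selectStoresToFlush(sizes, 128L << 20, 16L << 20)); // prints [cf1]
  }
}
```

With 16 families and the same sample sizes, 128 MB / 16 = 8 MB falls below the 16777216 property value, so in this sketch the property value becomes the bound instead.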



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14906) Improvements on FlushLargeStoresPolicy

2015-12-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052117#comment-15052117
 ] 

Hudson commented on HBASE-14906:


FAILURE: Integrated in HBase-Trunk_matrix #545 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/545/])
HBASE-14906 Improvements on FlushLargeStoresPolicy (Yu Li) (stack: rev 
c15e0af84aeb4ab992482a957c2b242d2ab57d76)
* hbase-common/src/main/resources/hbase-default.xml
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/FlushLargeStoresPolicy.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestPerColumnFamilyFlush.java




[jira] [Commented] (HBASE-14906) Improvements on FlushLargeStoresPolicy

2015-12-10 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052062#comment-15052062
 ] 

Yu Li commented on HBASE-14906:
---

Thanks for helping review and commit, sir! [~stack]



[jira] [Commented] (HBASE-14906) Improvements on FlushLargeStoresPolicy

2015-12-06 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15043916#comment-15043916
 ] 

Yu Li commented on HBASE-14906:
---

[~stack] and [~gaurav.menghani], any comments on this one? Thanks.



[jira] [Commented] (HBASE-14906) Improvements on FlushLargeStoresPolicy

2015-12-03 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15039665#comment-15039665
 ] 

Yu Li commented on HBASE-14906:
---

Thanks for the review and comments, Duo!



[jira] [Commented] (HBASE-14906) Improvements on FlushLargeStoresPolicy

2015-12-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037731#comment-15037731
 ] 

Hadoop QA commented on HBASE-14906:
---

{color:green}+1 overall{color}.  
{color:green}+1 core zombie tests -- no zombies!{color}.

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/16750//testReport/
Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/16750//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/16750//artifact/patchprocess/checkstyle-aggregate.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/16750//console

This message is automatically generated.

> Improvements on FlushLargeStoresPolicy
> --
>
> Key: HBASE-14906
> URL: https://issues.apache.org/jira/browse/HBASE-14906
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: Yu Li
>Assignee: Yu Li
> Fix For: 2.0.0
>
> Attachments: HBASE-14906.patch, HBASE-14906.v2.patch, 
> HBASE-14906.v3.patch


[jira] [Commented] (HBASE-14906) Improvements on FlushLargeStoresPolicy

2015-12-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037519#comment-15037519
 ] 

Hadoop QA commented on HBASE-14906:
---

{color:green}+1 overall{color}.  
{color:green}+1 core zombie tests -- no zombies!{color}.

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/16749//testReport/
Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/16749//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/16749//artifact/patchprocess/checkstyle-aggregate.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/16749//console

This message is automatically generated.

> Improvements on FlushLargeStoresPolicy
> --
>
> Key: HBASE-14906
> URL: https://issues.apache.org/jira/browse/HBASE-14906
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: Yu Li
>Assignee: Yu Li
> Fix For: 2.0.0
>
> Attachments: HBASE-14906.patch, HBASE-14906.v2.patch


[jira] [Commented] (HBASE-14906) Improvements on FlushLargeStoresPolicy

2015-12-03 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037561#comment-15037561
 ] 

Yu Li commented on HBASE-14906:
---

From the Hadoop QA report, I observe the failures (errors) below, although it says +1:
{noformat}
Tests in error: 
org.apache.hadoop.hbase.regionserver.TestBulkLoad.bulkHLogShouldThrowErrorWhenFamilySpecifiedAndHFileExistsButNotInTableDescriptor(org.apache.hadoop.hbase.regionserver.TestBulkLoad)
  Run 1: TestBulkLoad.bulkHLogShouldThrowErrorWhenFamilySpecifiedAndHFileExistsButNotInTableDescriptor »
  Run 2: TestBulkLoad.bulkHLogShouldThrowErrorWhenFamilySpecifiedAndHFileExistsButNotInTableDescriptor »
  Run 3: TestBulkLoad.bulkHLogShouldThrowErrorWhenFamilySpecifiedAndHFileExistsButNotInTableDescriptor »

org.apache.hadoop.hbase.regionserver.TestBulkLoad.shouldThrowErrorIfBadFamilySpecifiedAsFamilyPath(org.apache.hadoop.hbase.regionserver.TestBulkLoad)
  Run 1: TestBulkLoad.shouldThrowErrorIfBadFamilySpecifiedAsFamilyPath » Unexpected ex...
  Run 2: TestBulkLoad.shouldThrowErrorIfBadFamilySpecifiedAsFamilyPath » Unexpected ex...
  Run 3: TestBulkLoad.shouldThrowErrorIfBadFamilySpecifiedAsFamilyPath » Unexpected ex...
{noformat}

And detailed exception:
{noformat}
shouldThrowErrorIfBadFamilySpecifiedAsFamilyPath(org.apache.hadoop.hbase.regionserver.TestBulkLoad)  Time elapsed: 0.043 sec  <<< ERROR!
java.lang.Exception: Unexpected exception, expected but was
	at org.apache.hadoop.hbase.regionserver.FlushLargeStoresPolicy.configureForRegion(FlushLargeStoresPolicy.java:59)
	at org.apache.hadoop.hbase.regionserver.FlushPolicyFactory.create(FlushPolicyFactory.java:52)
	at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:845)
	at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:786)
	at org.apache.hadoop.hbase.regionserver.HRegion.createHRegion(HRegion.java:6195)
	at org.apache.hadoop.hbase.regionserver.HRegion.createHRegion(HRegion.java:6204)
	at org.apache.hadoop.hbase.regionserver.TestBulkLoad.testRegionWithFamiliesAndSpecifiedTableName(TestBulkLoad.java:239)
	at org.apache.hadoop.hbase.regionserver.TestBulkLoad.testRegionWithFamilies(TestBulkLoad.java:249)
	at org.apache.hadoop.hbase.regionserver.TestBulkLoad.shouldThrowErrorIfBadFamilySpecifiedAsFamilyPath(TestBulkLoad.java:207)
{noformat}

Since these are errors rather than failures, the tests stop midway.

The issue is caused by the patch here, since it doesn't handle the case where 
the number of column families is zero. Although this won't happen in the real 
world, it is possible in unit tests such as TestBulkLoad.
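
The missing case can be illustrated with a small Java sketch. It is an assumption that the error at FlushLargeStoresPolicy.java:59 came from dividing the flush size by a family count of zero, since the exception type is not preserved in the report above. ZeroFamilyGuardSketch, NO_LOWER_BOUND, unguarded and configureLowerBound are hypothetical names for illustration, not taken from the patch.

```java
// Hypothetical, self-contained illustration; not HBase code.
final class ZeroFamilyGuardSketch {

  // Marker meaning "no per-family bound was computed" (an assumption of this sketch).
  static final long NO_LOWER_BOUND = -1L;

  // Unguarded computation: integer division throws ArithmeticException when familyCount is 0.
  static long unguarded(long memstoreFlushSize, int familyCount) {
    return memstoreFlushSize / familyCount;
  }

  // Guarded computation: skips the division for zero or one family.
  static long configureLowerBound(long memstoreFlushSize, int familyCount, long configuredMin) {
    if (familyCount <= 1) {
      // Zero families can occur in unit tests; one family needs no selection at all.
      return NO_LOWER_BOUND;
    }
    return Math.max(memstoreFlushSize / familyCount, configuredMin);
  }

  public static void main(String[] args) {
    try {
      unguarded(134217728L, 0);
    } catch (ArithmeticException e) {
      System.out.println("unguarded division failed: " + e.getMessage());
    }
    System.out.println(configureLowerBound(134217728L, 0, 16777216L)); // prints -1
    System.out.println(configureLowerBound(134217728L, 4, 16777216L)); // prints 33554432
  }
}
```

Returning early for a family count of at most one also covers the single-family case from the issue description, where no selection is needed.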



[jira] [Commented] (HBASE-14906) Improvements on FlushLargeStoresPolicy

2015-12-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15038063#comment-15038063
 ] 

Hadoop QA commented on HBASE-14906:
---

{color:green}+1 overall{color}.  
{color:green}+1 core zombie tests -- no zombies!{color}.

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/16755//testReport/
Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/16755//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/16755//artifact/patchprocess/checkstyle-aggregate.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/16755//console

This message is automatically generated.

> Improvements on FlushLargeStoresPolicy
> --
>
> Key: HBASE-14906
> URL: https://issues.apache.org/jira/browse/HBASE-14906
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: Yu Li
>Assignee: Yu Li
> Fix For: 2.0.0
>
> Attachments: HBASE-14906.patch, HBASE-14906.v2.patch, 
> HBASE-14906.v3.patch, HBASE-14906.v4.patch


[jira] [Commented] (HBASE-14906) Improvements on FlushLargeStoresPolicy

2015-12-03 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15038139#comment-15038139
 ] 

Yu Li commented on HBASE-14906:
---

Confirmed that the testReport shows no more UT failures. However, the report 
still looks strange: the summary of core tests, javadoc, etc. seems to have 
disappeared. [~stack], could you please take a look here, sir?



[jira] [Commented] (HBASE-14906) Improvements on FlushLargeStoresPolicy

2015-12-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15038608#comment-15038608
 ] 

Hadoop QA commented on HBASE-14906:
---

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12775599/HBASE-14906.v4.patch
  against master branch at commit a154ecda00d9d9a58e83d322dae7ffd3518b633c.
  ATTACHMENT ID: 12775599

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 
2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}. The applied patch does not generate new 
checkstyle errors.

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .
{color:green}+1 core zombie tests -- no zombies!{color}.

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/16758//testReport/
Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/16758//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/16758//artifact/patchprocess/checkstyle-aggregate.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/16758//console

This message is automatically generated.



[jira] [Commented] (HBASE-14906) Improvements on FlushLargeStoresPolicy

2015-12-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15038605#comment-15038605
 ] 

Hadoop QA commented on HBASE-14906:
---

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12775599/HBASE-14906.v4.patch
  against master branch at commit a154ecda00d9d9a58e83d322dae7ffd3518b633c.
  ATTACHMENT ID: 12775599

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 
2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}. The applied patch does not generate new 
checkstyle errors.

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .
{color:green}+1 core zombie tests -- no zombies!{color}.

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/16757//testReport/
Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/16757//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/16757//artifact/patchprocess/checkstyle-aggregate.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/16757//console

This message is automatically generated.



[jira] [Commented] (HBASE-14906) Improvements on FlushLargeStoresPolicy

2015-12-03 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15038201#comment-15038201
 ] 

stack commented on HBASE-14906:
---

This is my fault, [~carp84]. Will rerun the patch in a minute after I fix the 
reporting...



[jira] [Commented] (HBASE-14906) Improvements on FlushLargeStoresPolicy

2015-12-03 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15039587#comment-15039587
 ] 

Duo Zhang commented on HBASE-14906:
---

So there is no exact lower bound in global config?



[jira] [Commented] (HBASE-14906) Improvements on FlushLargeStoresPolicy

2015-12-03 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15039630#comment-15039630
 ] 

Yu Li commented on HBASE-14906:
---

{quote}
So there is no exact lower bound in global config?
{quote}
Right, the global config now holds only a "minimum lower bound". The default 
lower bound is "dynamic" and table-specific rather than a fixed value, unless 
the auto-computed one is smaller than the configured minimum.

After the change there is no way for a user to specify a _uniform small_ lower 
bound for *all* tables any more, which is also unnecessary IMHO. An expert user 
who wants a smaller lower bound for a specific table can still get one by 
setting {{hbase.hregion.percolumnfamilyflush.size.lower.bound}} in the table 
descriptor.

Let me know if you have any concerns. Thanks.
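
The rule described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the code from the patch: the class name, method name and parameters are hypothetical, and it assumes a table-level value is taken as-is while the global value acts only as a floor for the computed default.

```java
import java.util.OptionalLong;

public class FlushLowerBoundSketch {

    /**
     * Hypothetical helper (not HBase's actual API): picks the per-family flush
     * lower bound as described in the comment above.
     *
     * @param memstoreFlushSize   region memstore flush size in bytes
     * @param familyCount         number of column families in the table
     * @param globalMinLowerBound global "minimum lower bound" in bytes
     * @param tableLowerBound     optional table-descriptor override in bytes
     */
    public static long effectiveLowerBound(long memstoreFlushSize, int familyCount,
                                           long globalMinLowerBound,
                                           OptionalLong tableLowerBound) {
        if (familyCount <= 0) {
            throw new IllegalArgumentException("familyCount must be positive");
        }
        // Assumption: a table-level setting wins outright, even if it is smaller.
        if (tableLowerBound.isPresent()) {
            return tableLowerBound.getAsLong();
        }
        // Default: flush size divided by family count, never below the global minimum.
        long computed = memstoreFlushSize / familyCount;
        return Math.max(computed, globalMinLowerBound);
    }

    public static void main(String[] args) {
        long flushSize = 128L * 1024 * 1024; // 128M memstore flush size
        long globalMin = 16L * 1024 * 1024;  // 16777216, the value quoted in the issue

        // 3 families: the computed default is above the minimum.
        System.out.println(effectiveLowerBound(flushSize, 3, globalMin, OptionalLong.empty()));  // 44739242
        // 16 families: the computed default (8M) falls below the minimum.
        System.out.println(effectiveLowerBound(flushSize, 16, globalMin, OptionalLong.empty())); // 16777216
        // Table-level override of 4M is used directly.
        System.out.println(effectiveLowerBound(flushSize, 3, globalMin,
                OptionalLong.of(4L * 1024 * 1024)));                                             // 4194304
    }
}
```

With these inputs the global value only matters once the family count is large enough that the per-family share drops below it.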



[jira] [Commented] (HBASE-14906) Improvements on FlushLargeStoresPolicy

2015-12-03 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15039657#comment-15039657
 ] 

Duo Zhang commented on HBASE-14906:
---

I'm OK with it.

+1 on the newest patch.



[jira] [Commented] (HBASE-14906) Improvements on FlushLargeStoresPolicy

2015-12-02 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036035#comment-15036035
 ] 

Yu Li commented on HBASE-14906:
---

Thanks for the quick response [~Apache9]!

About config in table description, I think we need to preserve a way for 
experts to override the default value, in case they really need to set a 
*smaller* value than {{region.getMemstoreFlushSize() / familyNumber}} for their 
use case (setting the global config to a smaller value won't take effect under 
the current design). Agree?



[jira] [Commented] (HBASE-14906) Improvements on FlushLargeStoresPolicy

2015-12-02 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037148#comment-15037148
 ] 

Duo Zhang commented on HBASE-14906:
---

{quote}
About config in table description, I think we need to preserve a way for 
experts to override the default value, in case they really need to set a 
smaller value than region.getMemstoreFlushSize() / familyNumber for their use 
case
{quote}
Sounds reasonable. What about introducing two configurations here? One is the 
lower bound of the lower bound, the other is the exact lower bound. Making the 
global config and the table config have different meanings is a little 
confusing, I think.



[jira] [Commented] (HBASE-14906) Improvements on FlushLargeStoresPolicy

2015-12-02 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037238#comment-15037238
 ] 

Yu Li commented on HBASE-14906:
---

{quote}
What about introducing two configurations here? One is the lower bound of the 
lower bound, the other is the exact lower bound.
{quote}
Makes sense. I will rename the global config to 
{{hbase.hregion.percolumnfamilyflush.size.lower.bound.min}} to better reflect 
what the property means. If you have a better name in mind, just let me 
know. :-)

This change will invalidate users' previous settings of 
{{hbase.hregion.percolumnfamilyflush.size.lower.bound}}, but this is 
by design, since we now have a better default ({{region.getMemstoreFlushSize() / 
familyNumber}}). I will add a note about this to the release note.



[jira] [Commented] (HBASE-14906) Improvements on FlushLargeStoresPolicy

2015-12-02 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15035610#comment-15035610
 ] 

Duo Zhang commented on HBASE-14906:
---

+1 on the idea.

Just one thing: I think we should still use {{region.getMemstoreFlushSize() / 
familyNumber}} even if there is a table config in the table description. In the 
patch you only compare the value with the global config.

Thanks.




[jira] [Commented] (HBASE-14906) Improvements on FlushLargeStoresPolicy

2015-12-02 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15035452#comment-15035452
 ] 

Yu Li commented on HBASE-14906:
---

[~Apache9] and [~gaurav.menghani], could you take a look here and share your 
thoughts? Thanks.



[jira] [Commented] (HBASE-14906) Improvements on FlushLargeStoresPolicy

2015-12-02 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15035442#comment-15035442
 ] 

Yu Li commented on HBASE-14906:
---

The result of TestPerColumnFamilyFlush#testCompareStoreFileCount shows a 
promising improvement (even fewer flushes for the small CFs):

w/o patch:
2015-12-01 22:15:39,749 INFO  [Thread-1] 
regionserver.TestPerColumnFamilyFlush(637): disable selective flush: f1=>11, 
f2=>11, f3=>11
2015-12-01 22:15:39,749 INFO  [Thread-1] 
regionserver.TestPerColumnFamilyFlush(640): enable selective flush: f1=>6, 
f2=>9, f3=>12

w/ patch:
2015-12-01 22:23:21,649 INFO  [Thread-1] 
regionserver.TestPerColumnFamilyFlush(634): disable selective flush: f1=>11, 
f2=>11, f3=>11
2015-12-01 22:23:21,649 INFO  [Thread-1] 
regionserver.TestPerColumnFamilyFlush(637): enable selective flush: f1=>6, 
f2=>7, f3=>13



[jira] [Commented] (HBASE-14906) Improvements on FlushLargeStoresPolicy

2015-12-02 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15035450#comment-15035450
 ] 

Yu Li commented on HBASE-14906:
---

Also applied the same test case as 
[HBASE-10201|https://issues.apache.org/jira/browse/HBASE-10201?focusedCommentId=14171950=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14171950]
 on a real cluster. Allow me to repeat the test method:

3 CFs, 16B value for CF1, 256B value for CF2 and 4K value for CF3, 1M rows, 
128M memstore flush size, 16M CF flush size.

And the result is:

w/o patch:
{noformat}
metric_storeCount : 3,
metric_storeFileCount : 9,
metric_memStoreSize : 112519968,
metric_storeFileSize : 4396528692,
metric_compactionsCompletedCount : 17,
metric_numBytesCompactedCount : 18891018964,
metric_numFilesCompactedCount : 89
{noformat}

w/ patch:
{noformat}
metric_storeCount : 3,
metric_storeFileCount : 13,
metric_memStoreSize : 58168928,
metric_storeFileSize : 4446829180,
metric_compactionsCompletedCount : 15,
metric_numBytesCompactedCount : 15101162833,
metric_numFilesCompactedCount : 82
{noformat}

Flush counts per column family:
w/o patch:
{noformat}
CF1: 9 times
CF2: 19 times
CF3: 39 times
{noformat}

w/ patch:
{noformat}
CF1: 4 times
CF2: 8 times
CF3: 53 times
{noformat}

From the metrics we can see that both the number of compactions and the bytes 
involved in compaction are reduced.
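
To put numbers on that, the small Java sketch below computes the relative reductions from the metrics listed above. The class and method names are made up for illustration; only the input values come from the test results. It works out to roughly 12% fewer completed compactions, 20% fewer bytes compacted and 8% fewer files compacted.

```java
public class CompactionMetricsDelta {

    /** Percentage reduction from 'before' to 'after' (positive means the value went down). */
    public static double percentReduction(long before, long after) {
        return 100.0 * (before - after) / before;
    }

    public static void main(String[] args) {
        // Values copied from the w/o patch and w/ patch metrics above.
        System.out.printf("compactionsCompletedCount: %.1f%% lower%n",
                percentReduction(17, 15));
        System.out.printf("numBytesCompactedCount:    %.1f%% lower%n",
                percentReduction(18891018964L, 15101162833L));
        System.out.printf("numFilesCompactedCount:    %.1f%% lower%n",
                percentReduction(89, 82));
    }
}
```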

We can also see fewer flushes of the small CFs but more of the large CF. This 
makes sense in theory: more memstores of the small CFs are retained in memory, 
so flushes of the large CF become more frequent until the small CFs also reach 
the flush threshold.
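
The effect can be illustrated with a simplified Java sketch of the selection rule from the issue description: flush only the families whose memstore exceeds the lower bound, and fall back to flushing everything if none does. This is not the HBase implementation. The class name, the map-based signature and the memstore sizes in main are hypothetical, and any other criteria the real policy may apply are left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SelectiveFlushSketch {

    /**
     * Simplified stand-in for the selection step: returns the families to flush
     * given each family's memstore size in bytes and the flush lower bound.
     */
    public static List<String> selectStoresToFlush(Map<String, Long> memstoreSizes,
                                                   long lowerBound) {
        // With a single family there is nothing to select.
        if (memstoreSizes.size() <= 1) {
            return new ArrayList<>(memstoreSizes.keySet());
        }
        List<String> selected = new ArrayList<>();
        for (Map.Entry<String, Long> e : memstoreSizes.entrySet()) {
            if (e.getValue() > lowerBound) {
                selected.add(e.getKey());
            }
        }
        // If no family exceeds the lower bound, flush all of them as usual.
        return selected.isEmpty() ? new ArrayList<>(memstoreSizes.keySet()) : selected;
    }

    public static void main(String[] args) {
        final long mb = 1024L * 1024;
        // Hypothetical snapshot at the moment the 128M region limit is hit.
        Map<String, Long> sizes = new LinkedHashMap<>();
        sizes.put("CF1", 2 * mb);
        sizes.put("CF2", 20 * mb);
        sizes.put("CF3", 106 * mb);

        // Fixed 16M lower bound: the mid-sized family is flushed too.
        System.out.println(selectStoresToFlush(sizes, 16 * mb));       // [CF2, CF3]
        // 128M / 3 families as lower bound: only the large family is flushed.
        System.out.println(selectStoresToFlush(sizes, 128 * mb / 3));  // [CF3]
    }
}
```

With the higher bound CF2 stays in memory, so the region reaches its limit again sooner and CF3 is flushed more often, which matches the direction of the flush counts reported above.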



[jira] [Commented] (HBASE-14906) Improvements on FlushLargeStoresPolicy

2015-12-02 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037101#comment-15037101
 ] 

stack commented on HBASE-14906:
---

This is nice work. +1 on patch. I like the test results.
