[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14255185#comment-14255185 ]

Ted Yu commented on HBASE-10201:

Addendum integrated to master and branch-1.

Thanks Duo.

Port 'Make flush decisions per column family' to trunk
--
Key: HBASE-10201
URL: https://issues.apache.org/jira/browse/HBASE-10201
Project: HBase
Issue Type: Improvement
Components: wal
Reporter: Ted Yu
Assignee: zhangduo
Fix For: 2.0.0, 1.1.0
Attachments: 10201-addendum.txt, 3149-trunk-v1.txt, HBASE-10201-0.98.patch, HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, HBASE-10201-addendum_1.patch, HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_10.patch, HBASE-10201_11.patch, HBASE-10201_12.patch, HBASE-10201_13.patch, HBASE-10201_13.patch, HBASE-10201_14.patch, HBASE-10201_15.patch, HBASE-10201_16.patch, HBASE-10201_17.patch, HBASE-10201_18.patch, HBASE-10201_19.patch, HBASE-10201_2.patch, HBASE-10201_3.patch, HBASE-10201_4.patch, HBASE-10201_5.patch, HBASE-10201_6.patch, HBASE-10201_7.patch, HBASE-10201_8.patch, HBASE-10201_9.patch, compactions.png, count.png, io.png, memstore.png

Currently the flush decision is made using the aggregate size of all column families. When large and small column families co-exist, this causes many small flushes of the smaller CF. We need to make per-CF flush decisions.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
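The difference between the aggregate decision and a per-CF decision described in the issue summary above can be sketched roughly as follows. This is only an illustration: the class name, method names, and thresholds are hypothetical and are not the HBase FlushPolicy API, and the fallback to flushing every family when none is large enough is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: class, method, and parameter names are hypothetical,
// not the HBase FlushPolicy API.
class PerFamilyFlushSketch {

    // Aggregate decision: once the region-wide memstore total reaches the
    // limit, every column family is flushed, however small it is.
    static List<String> selectAllFamilies(Map<String, Long> memstoreBytes, long regionFlushSize) {
        long total = 0;
        for (long size : memstoreBytes.values()) {
            total += size;
        }
        List<String> selected = new ArrayList<>();
        if (total >= regionFlushSize) {
            selected.addAll(memstoreBytes.keySet());
        }
        return selected;
    }

    // Per-family decision: same trigger, but only families at or above the
    // per-family lower bound are flushed. Falling back to all families when
    // none qualifies is an assumption of this sketch.
    static List<String> selectLargeFamilies(Map<String, Long> memstoreBytes,
                                            long regionFlushSize, long familyLowerBound) {
        List<String> all = selectAllFamilies(memstoreBytes, regionFlushSize);
        List<String> large = new ArrayList<>();
        for (String family : all) {
            if (memstoreBytes.get(family) >= familyLowerBound) {
                large.add(family);
            }
        }
        return large.isEmpty() ? all : large;
    }

    public static void main(String[] args) {
        Map<String, Long> memstoreBytes = new LinkedHashMap<>();
        memstoreBytes.put("big", 120L << 20);   // 120 MB in the large family
        memstoreBytes.put("small", 8L << 20);   // 8 MB in the small family
        long regionFlushSize = 128L << 20;      // hypothetical region-wide limit
        long familyLowerBound = 16L << 20;      // hypothetical per-family lower bound
        System.out.println(selectAllFamilies(memstoreBytes, regionFlushSize));
        System.out.println(selectLargeFamilies(memstoreBytes, regionFlushSize, familyLowerBound));
    }
}
```

With these made-up sizes the aggregate decision selects both families, so the 8 MB family is written out as a small store file, while the per-family decision selects only the large one.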
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14255212#comment-14255212 ]

Hudson commented on HBASE-10201:

SUCCESS: Integrated in HBase-1.1 #16 (See [https://builds.apache.org/job/HBase-1.1/16/])
HBASE-10201 Addendum fixes typo of putIfAbsent (Duo Zhang) (tedyu: rev fbc852b6809184bdba0bbccb8ef3e1fe848d6f22)
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14255222#comment-14255222 ]

Hudson commented on HBASE-10201:

SUCCESS: Integrated in HBase-TRUNK #5955 (See [https://builds.apache.org/job/HBase-TRUNK/5955/])
HBASE-10201 Addendum fixes typo of putIfAbsent (Duo Zhang) (tedyu: rev 51334fb951232aa56add118d142e6b82da204494)
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14254418#comment-14254418 ]

stack commented on HBASE-10201:

[~Apache9] Any chance of your taking a look at the test failure here: https://builds.apache.org/job/PreCommit-HBASE-Build/12165//testReport/

It is per column family flushing: https://builds.apache.org/job/PreCommit-HBASE-Build/12165/artifact/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.regionserver.TestPerColumnFamilyFlush-output.txt

Says this:

---
Test set: org.apache.hadoop.hbase.regionserver.TestPerColumnFamilyFlush
---
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 248.398 sec FAILURE! - in org.apache.hadoop.hbase.regionserver.TestPerColumnFamilyFlush
testCompareStoreFileCount(org.apache.hadoop.hbase.regionserver.TestPerColumnFamilyFlush) Time elapsed: 53.153 sec FAILURE!
java.lang.AssertionError: null
    at org.junit.Assert.fail(Assert.java:86)
    at org.junit.Assert.assertTrue(Assert.java:41)
    at org.junit.Assert.assertTrue(Assert.java:52)
    at org.apache.hadoop.hbase.regionserver.TestPerColumnFamilyFlush.testCompareStoreFileCount(TestPerColumnFamilyFlush.java:589)
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14254432#comment-14254432 ]

zhangduo commented on HBASE-10201:

[~stack] Yeah, the test case is flaky. It is used to confirm that per column family flush generates fewer store files. But flush is asynchronous, so there may be a chance that the original flush is delayed more than in the per column family flush scenario and generates fewer store files; it depends on the state of the machine running the test case.

I think we can make an addendum to remove it for now to get a stable testing result. I will try to find a more stable way to confirm that per column family flush does work.

Thanks~
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14254450#comment-14254450 ]

stack commented on HBASE-10201:

Thanks [~Apache9]. Do it in a new issue when you get a chance, since this one is long enough already (smile). Thanks.
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252596#comment-14252596 ]

stack commented on HBASE-10201:

Committed to branch-1 so will be in 1.1.0.
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252761#comment-14252761 ]

Hudson commented on HBASE-10201:

SUCCESS: Integrated in HBase-1.1 #5 (See [https://builds.apache.org/job/HBase-1.1/5/])
HBASE-10201 Port 'Make flush decisions per column family' to trunk (stack: rev e55ef7a663dd9a18fa88a506afd8fe0ced10563d)
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/FlushRequester.java
* hbase-client/src/main/java/org/apache/hadoop/hbase/HTableDescriptor.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/LogRoller.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestPerColumnFamilyFlush.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/wal/TestWALFactory.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/FlushLargeStoresPolicy.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestFlushRegionEntry.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/FlushAllStoresPolicy.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/wal/TestDefaultWALProvider.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/FlushPolicy.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHeapMemoryManager.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSWALEntry.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/TestIOFencing.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/wal/DisabledWALProvider.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/FlushPolicyFactory.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreFlusher.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestFSHLog.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/wal/WAL.java
* hbase-common/src/main/resources/hbase-default.xml
HBASE-10201 Addendum changes TestPerColumnFamilyFlush to LargeTest (stack: rev 5d34d2d02af39037a2426fe4fb5be9a447202bd7)
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestPerColumnFamilyFlush.java
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14248724#comment-14248724 ]

Enis Soztutar commented on HBASE-10201:

I don't think we should have this in 1.0.0. I am planning on cutting the RC tomorrow, and this seems to be a huge change for the last minute. Can we target 1.1 instead?
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14248725#comment-14248725 ]

Jeffrey Zhong commented on HBASE-10201:

Looks good to me (+1) for master branch. Branch-1 should rely on [~enis]'s feedback.
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14248793#comment-14248793 ]

stack commented on HBASE-10201:

bq. Can we target 1.1 instead?

Sure.
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14248816#comment-14248816 ]

stack commented on HBASE-10201:

I forgot to say thank you [~Apache9] for your persistence on getting this in.
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14248943#comment-14248943 ]

Hudson commented on HBASE-10201:

FAILURE: Integrated in HBase-TRUNK #5930 (See [https://builds.apache.org/job/HBase-TRUNK/5930/])
HBASE-10201 Port 'Make flush decisions per column family' to trunk (stack: rev c7fad665f34fd3c17999d5cc60b04d3faff6a7f5)
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestFSHLog.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/wal/WAL.java
* hbase-common/src/main/resources/hbase-default.xml
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreFlusher.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestFlushRegionEntry.java
* hbase-client/src/main/java/org/apache/hadoop/hbase/HTableDescriptor.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/wal/TestWALFactory.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHeapMemoryManager.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/FlushLargeStoresPolicy.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/FlushRequester.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestPerColumnFamilyFlush.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/wal/DisabledWALProvider.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/FlushAllStoresPolicy.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/LogRoller.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/wal/TestDefaultWALProvider.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/FlushPolicy.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/TestIOFencing.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSWALEntry.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/FlushPolicyFactory.java
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249188#comment-14249188 ]

stack commented on HBASE-10201:

That'll work. Thanks.
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249191#comment-14249191 ]

Ted Yu commented on HBASE-10201:

Addendum pushed to master branch.
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249199#comment-14249199 ]

zhangduo commented on HBASE-10201:

{quote}
I forgot to say thank you zhangduo for your persistence on getting this in.
{quote}

It's my pleasure to contribute code to a famous project :)

Thanks
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249275#comment-14249275 ]

Hudson commented on HBASE-10201:

FAILURE: Integrated in HBase-TRUNK #5933 (See [https://builds.apache.org/job/HBase-TRUNK/5933/])
HBASE-10201 Addendum changes TestPerColumnFamilyFlush to LargeTest (tedyu: rev 885b065683499540f467cb54086a3f60e64b9c8a)
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestPerColumnFamilyFlush.java
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246385#comment-14246385 ] stack commented on HBASE-10201: --- I'm +1 on this going into master branch. I am +1 on this going into branch-1 but with it disabled by default as an experimental feature; users would have to enable the FlushLargeStoresPolicy explicitly (you ok w/ that, [~enis]?). Any chance of more +1s? [~jeffreyz]? Any other reviews out there? This is an old issue, nicely addressed, that can make a nice dent in our i/o profile when there is more than one column family, but it would be good to get more eyes on it given it's messing with sequenceids. Thanks.
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14245245#comment-14245245 ] Hadoop QA commented on HBASE-10201: --- {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12687023/HBASE-10201_19.patch against master branch at commit a0e473730e2cd819e7442dbd2b332d7833755ba2. ATTACHMENT ID: 12687023 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 32 new or modified tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/12065//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12065//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12065//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12065//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12065//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12065//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12065//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12065//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12065//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12065//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12065//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12065//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/12065//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/12065//console This message is automatically generated. 
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14245187#comment-14245187 ] Hadoop QA commented on HBASE-10201: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12687009/HBASE-10201_19.patch against master branch at commit a0e473730e2cd819e7442dbd2b332d7833755ba2. ATTACHMENT ID: 12687009 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 32 new or modified tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/12064//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12064//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12064//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12064//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12064//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12064//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12064//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12064//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12064//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12064//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12064//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12064//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/12064//artifact/patchprocess/checkstyle-aggregate.html Javadoc warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12064//artifact/patchprocess/patchJavadocWarnings.txt Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/12064//console This message is automatically generated.
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243274#comment-14243274 ] Jeffrey Zhong commented on HBASE-10201: --- {quote} Now I always generate a new flushSeqId and use this as the seqId of flushed StoreFiles. And use a maxFlushedSeqId to record completeSequenceId that passed to HMaster. Is it OK? {quote} Sounds good to me.
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243345#comment-14243345 ] stack commented on HBASE-10201: --- [~jeffreyz] What about the comment on issue w/ 1. above? See https://issues.apache.org/jira/browse/HBASE-10201?focusedCommentId=14240737&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14240737
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243528#comment-14243528 ] Jeffrey Zhong commented on HBASE-10201: --- [~saint@gmail.com] {quote} Are you referring to the following: Will this mean we drop edits because region thinks its sequenceid is higher than it should be? {quote} Yes. As of today, when replaying edits in both modes, we drop WAL edits whose seqId is less than the seqId of the related store. There are some edge cases (like a new PUT, region move to a different RS, DELETE on the new PUT, major compaction, move back to the original RS, and then the RS crashes) where we have to know the HFile seqId accurately, otherwise the PUT may be restored after recovery. We need to pass flushed seqIds per store to master so that we can optimize the recovery process, but that doesn't impact correctness.
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243546#comment-14243546 ] stack commented on HBASE-10201: --- [~jeffreyz] I'm referring to the fact that if there are three column families, and one has edit #1, another edit #2 (which came later), and the third has edit #3, and then the policy decides to flush the third CF, we'll write it out with a seqid of #3 but edits #1 and #2 are still in memory. We report to the master that our lowest number is #1 but the master crashes (so we lose the info that #1 is the earliest safe edit number). The RS hosting the three column families also crashes. On recovery, we open the region and see an hfile with seqid #3 so we set the region's current seqid to #4, even though #1 and #2 were never persisted. This is possible with this patch as is, especially when policy is disconnected from flush. bq. We need to pass flushed seqIds per store to master so that we can optimize recovery process but doesn't impact correctness. This would not fix the above case? The master might know that #3 was persisted and that column families 1 and 2 had edits less than #3, but if it crashes, we're back in the scenario described above (unless we persist the flush reports?) Thanks.
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243610#comment-14243610 ] zhangduo commented on HBASE-10201: -- [~stack] In your scenario, I think we will use #1 to skip edits, not #4. As I see in the code of replayRecoveredEditsIfAny:
{code}
long minSeqIdForTheRegion = -1;
for (Long maxSeqIdInStore : maxSeqIdInStores.values()) {
  if (maxSeqIdInStore < minSeqIdForTheRegion || minSeqIdForTheRegion == -1) {
    minSeqIdForTheRegion = maxSeqIdInStore;
  }
}
{code}
And this:
{code}
maxSeqId = Math.abs(Long.parseLong(fileName));
if (maxSeqId <= minSeqIdForTheRegion) {
  if (LOG.isDebugEnabled()) {
    String msg = "Maximum sequenceid for this wal is " + maxSeqId
        + " and minimum sequenceid for the region is " + minSeqIdForTheRegion
        + ", skipped the whole file, path=" + edits;
    LOG.debug(msg);
  }
  continue;
}
{code}
And in replayRecoveredEdits, we skip edit cells using the per-store seqId:
{code}
// Now, figure if we should skip this edit.
if (key.getLogSeqNum() <= maxSeqIdInStores.get(store.getFamily().getName())) {
  skippedEdits++;
  continue;
}
{code}
And when splitting the log, we use a lastSeqId obtained from HMaster to skip edits. If master crashes and loses the information, then we will not skip any edits? I'm not sure, but I didn't find code that gets lastSeqId from any place other than HMaster.
[~jeffreyz]
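The per-store skip check discussed in this comment can be illustrated with a small standalone sketch. It is not HBase code: the class PerStoreReplaySketch, the Edit type, and the method replayedSeqIds are hypothetical names, and the use of -1 as the max sequence id of a store that has never flushed is an assumption made for the example. It models the three-column-family scenario from this thread, where only the third family was flushed (with seqid 3) before the crash.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PerStoreReplaySketch {
  /** Hypothetical stand-in for a WAL entry: a sequence id and the column family it touches. */
  static final class Edit {
    final long seqId;
    final String family;

    Edit(long seqId, String family) {
      this.seqId = seqId;
      this.family = family;
    }
  }

  /**
   * Returns the sequence ids of the edits that would be replayed. An edit is skipped
   * when its sequence id is <= the max sequence id already persisted for its own store.
   */
  static List<Long> replayedSeqIds(List<Edit> edits, Map<String, Long> maxSeqIdInStores) {
    List<Long> replayed = new ArrayList<>();
    for (Edit edit : edits) {
      if (edit.seqId <= maxSeqIdInStores.get(edit.family)) {
        continue; // already covered by a flushed file of this store
      }
      replayed.add(edit.seqId);
    }
    return replayed;
  }

  public static void main(String[] args) {
    Map<String, Long> maxSeqIdInStores = new HashMap<>();
    maxSeqIdInStores.put("cf1", -1L); // never flushed (sentinel assumed for this sketch)
    maxSeqIdInStores.put("cf2", -1L); // never flushed
    maxSeqIdInStores.put("cf3", 3L);  // flushed with seqid 3

    List<Edit> wal = new ArrayList<>();
    wal.add(new Edit(1, "cf1"));
    wal.add(new Edit(2, "cf2"));
    wal.add(new Edit(3, "cf3"));

    // Edits 1 and 2 are replayed; edit 3 is skipped because cf3 already persisted it.
    System.out.println(replayedSeqIds(wal, maxSeqIdInStores)); // prints [1, 2]
  }
}
```

Because the comparison is made against the store that owns each edit, a high sequence id in one flushed family does not cause edits of the unflushed families to be dropped, which is the point made above.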
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243631#comment-14243631 ] Jeffrey Zhong commented on HBASE-10201: --- [~saint@gmail.com] Besides what [~Apache9] mentioned (we skip edits using the seqId of each related store), the #4 (which is #3) is only set after the region is fully recovered (i.e. all WAL edits are already replayed). {quote} If master crash and loss the information, then we will not skip any edits? {quote} Yes, we'll lose the info and will replay more edits.
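A rough sketch of why losing the master's lastSeqId costs extra work rather than data, under stated assumptions: the class SplitFilterSketch, the method survivingSplit, and the UNKNOWN sentinel of -1 are hypothetical and only model the behaviour described in this exchange, namely that log splitting skips edits at or below the lastSeqId obtained from HMaster and skips nothing when that value is unavailable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplitFilterSketch {
  /** Sentinel used in this sketch when the master has no record for the region. */
  static final long UNKNOWN = -1L;

  /** Keeps the sequence ids that log splitting would pass on: those above lastSeqIdFromMaster. */
  static List<Long> survivingSplit(List<Long> walSeqIds, long lastSeqIdFromMaster) {
    List<Long> kept = new ArrayList<>();
    for (long seqId : walSeqIds) {
      if (seqId > lastSeqIdFromMaster) {
        kept.add(seqId);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    List<Long> wal = Arrays.asList(1L, 2L, 3L, 4L, 5L);

    List<Long> masterKnows = survivingSplit(wal, 3L);
    List<Long> masterLostInfo = survivingSplit(wal, UNKNOWN);

    System.out.println(masterKnows);    // prints [4, 5]
    System.out.println(masterLostInfo); // prints [1, 2, 3, 4, 5]

    // Without the master's value the split output is a superset, so nothing is lost.
    System.out.println(masterLostInfo.containsAll(masterKnows)); // prints true
  }
}
```

The additional edits that survive splitting in the second case are still subject to the per-store check at replay time, so the cost is replaying more edits, as stated above.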
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243714#comment-14243714 ] stack commented on HBASE-10201: --- Yes. I think it is going to be ok. I missed the 'skip edits using seqid of each relating store' bit. My calc was region based. Thanks for entertaining my question. In my scenario, the first column family that had edit #1 should have a store seqid of -1, which would mean we'd not skip edit #1 when it came into replayRecoveredEditsIfAny. I'm wondering how to make a unit test. One thought was to stand up a single HRegion of multiple column families and populate it in various ways, out of balance, and then add a means of 'killing' the region. Then create a 'recovered.edits' file and reopen the region to verify edits are as expected (and do the same for the DLR replay scenario)?
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243728#comment-14243728 ] zhangduo commented on HBASE-10201: -- {quote} I'm wondering how to make a unit test. {quote} TestPerColumnFamilyFlush.testLogReplay has tested log replay for selective flush. I think the only thing it misses is that it does not kill HMaster during log replay. I can add a testcase for the scenario where we cannot get an up-to-date lastSeqId from HMaster (kill master first, then kill regionserver, then restart master). [~stack], is this OK?
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243758#comment-14243758 ] stack commented on HBASE-10201: --- bq. TestPerColumnFamilyFlush.testLogReplay has tested log replay for selective flush. Woah. That's a nice test. How long has that been around? I missed it in previous reviews if it was present. I think this test is enough to give us confidence in this radical change. The kill of master so we don't have the latest seqid is a nice-to-have but not necessary; we just over-replay the edits. Let me go over your last posted patch. Seems like a bunch of new stuff has shown up (or I was blind last time I read through the patch).
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14241121#comment-14241121 ] zhangduo commented on HBASE-10201: -- I ran the performance test in TestPerColumnFamilyFlush to confirm the patch still works after I changed the behavior of FlushPolicy. The result is the same as in the previous test: metric_storeCount: 3, metric_storeFileCount: 9, metric_memStoreSize: 1272, metric_storeFileSize: 4509402744, metric_compactionsCompletedCount: 56, metric_numBytesCompactedCount: 20654822724, metric_numFilesCompactedCount: 184
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14241695#comment-14241695 ] Hadoop QA commented on HBASE-10201: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12686240/HBASE-10201_18.patch against master branch at commit 84b41f8029fd5822832255daeee73ff2283a622a. ATTACHMENT ID: 12686240 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 32 new or modified tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/12045//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12045//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12045//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12045//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12045//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12045//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12045//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12045//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12045//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12045//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12045//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12045//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/12045//artifact/patchprocess/checkstyle-aggregate.html
Javadoc warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/12045//artifact/patchprocess/patchJavadocWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/12045//console
This message is automatically generated.
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14241806#comment-14241806 ]
Jeffrey Zhong commented on HBASE-10201:
{quote} because we are not doing DLR in 0.98 or for some other reason? This patch is unlikely to make it back to 0.98 I'd say. {quote}
It's because we defer the cleanup of mvcc values (by HBASE-11315), but in any case we should maintain the semantics that an HStore file's seqId is the largest flushed seqId for the file.
{quote} And do I need to change original log split policy to also use a familyName-seqId map to filter out cells that already flushed? {quote}
Yes, we should, but you could do that in a separate issue.
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14241985#comment-14241985 ]
zhangduo commented on HBASE-10201:
{quote} but anyway we should maintain the semantics that HStore file seqId is the largest flushed SeqId for the file. {quote}
I modified the code to
{code}
flushSeqId = getNextSequenceId(wal);
long oldestUnflushedSeqId = wal.getEarliestMemstoreSeqNum(encodedRegionName);
// no oldestUnflushedSeqId means we flushed all stores.
// or the unflushed stores are all empty.
maxFlushedSeqId = oldestUnflushedSeqId == HConstants.NO_SEQNUM ? flushSeqId : oldestUnflushedSeqId - 1;
{code}
Now I always generate a new flushSeqId and use it as the seqId of the flushed StoreFiles, and I use maxFlushedSeqId to record the completeSequenceId that is passed to HMaster. Is that OK?
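The distinction between the two ids in the comment above can be sketched in isolation. This is only an illustration, not the patch itself: the class name FlushSeqIdSketch is hypothetical, and NO_SEQNUM is a local stand-in for HConstants.NO_SEQNUM whose value of -1 is assumed here.

```java
// Hypothetical sketch, not HBase code: derives the id reported to the master
// from the id of the flush operation and the oldest edit still in a memstore.
final class FlushSeqIdSketch {
    // Local stand-in for HConstants.NO_SEQNUM; the value -1 is an assumption.
    static final long NO_SEQNUM = -1L;

    /**
     * @param flushSeqId           freshly generated id of this flush operation
     * @param oldestUnflushedSeqId oldest edit remaining in any memstore of the
     *                             region, or NO_SEQNUM if nothing remains
     * @return the largest seqId such that every edit at or below it is flushed
     */
    static long maxFlushedSeqId(long flushSeqId, long oldestUnflushedSeqId) {
        // If nothing is left unflushed, everything up to the flush id is on disk.
        // Otherwise only edits strictly older than the oldest unflushed one are.
        return oldestUnflushedSeqId == NO_SEQNUM ? flushSeqId : oldestUnflushedSeqId - 1;
    }
}
```

With a flush id of 100, a region that flushed every store would report 100, while a region that still holds an edit with seqId 42 in some memstore would report 41.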
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14239588#comment-14239588 ]
stack commented on HBASE-10201:
bq. I followed RegionSplitPolicy to write FlushPolicy
I see. Yes you did, it seems.
bq. ...Ted Yu suggested using FlushPolicyFactory and placing the factory method in it instead of FlushPolicy.
OK. It's his fault then.
bq. Maybe the code of RegionSplitPolicy is old and need refactoring too...
My comments above apply to it too, it seems, yes. The master is checking that it can load the split policy by reaching across into the regionserver package. I suppose the idea is to check the split policy in one central place. We should just load the default on the regionserver if we can't find the configured one.
bq. ReflectionUtils.newInstance(clazz, conf) will call setConf.
ok
bq. Can be fixed later.
ok. Yeah, looks like you followed the pattern in the code base, so not a problem of your making. Can fix both in a followup issue.
bq. flushSeqId will not be bumped if we do not flush all stores.
Because?
bq. And actually I do not know where we use FlushMarker so I do not know the meaning of flushSeqId in the Marker...
It may not be used just yet, but it will be used soon by the following read replicas.
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14239658#comment-14239658 ]
Ted Yu commented on HBASE-10201:
w.r.t. FlushPolicyFactory, I made the comment before sanityCheckTableDescriptor() was added. It was not my intention that the master directly reference a class in the region server module.
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14239958#comment-14239958 ]
Jeffrey Zhong commented on HBASE-10201:
This is a nice feature. I scanned through the patch and below are my comments:
1) There may be a correctness issue for same-version (same row key version) updates. Because you use the following code as the store file flush id, we could end up with multiple hstore files that have exactly the same flush seq id, while HBase resolves same-version updates by the store files' seqid (flush id). Therefore, we may end up with incorrect results. This issue may only happen in 0.98 though.
{code}
+ long oldestUnflushedSeqId = wal
+ .getEarliestMemstoreSeqNum(encodedRegionName);
{code}
In order to fix the issue, we should use the current store's max flushed seq id as its real hstore seq id. We also need to change HRegion.lastFlushSeqId to use oldestUnflushedSeqId when reporting back to the Master, otherwise we may have a data loss issue.
2) We have a feature where we force a flush by hbase.regionserver.optionalcacheflushinterval or hbase.regionserver.flush.per.changes, but I didn't see you handle either case in the selectStoresToFlush() function. This may cause HRegion.shouldFlush() to always return true and leave us with small hstore files.
3) For region server recovery, we have an optimization that uses the lastFlushSeqId reported by region servers to skip writing edits into recovered.edits files. With this feature, we may unnecessarily write much more data into recovered.edits. This issue doesn't happen in the log replay case.
4) Relating to your FlushMarker question, FlushMarker (or the similar RegionEventWALEdit) is used for the region replica feature and for reasoning about region/store state. As you can see (in the WALEdit class), those special events use the special column family METAFAMILY, which doesn't exist for data regions. You should handle those events specially in getFamilyNames(), otherwise they may affect your bookkeeping of the oldest unflushed seqid.
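Point 4) can be illustrated with a small bookkeeping sketch. Everything here is hypothetical: the class OldestUnflushedTracker, its method names, and the use of plain strings for family names are assumptions made for illustration, and the string "METAFAMILY" merely stands in for the special family used by WALEdit marker events. It is not the FSHLog implementation.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of per-family "oldest unflushed seqId" bookkeeping.
final class OldestUnflushedTracker {
    // Stand-in for the special family of marker edits (an assumption).
    static final String META_FAMILY = "METAFAMILY";
    // Stand-in for "no sequence number" (an assumption).
    static final long NO_SEQNUM = -1L;

    private final Map<String, Long> oldestUnflushedSeqIds = new HashMap<>();

    // Records an appended edit that touches the given families.
    void append(List<String> families, long seqId) {
        for (String family : families) {
            if (META_FAMILY.equals(family)) {
                // Marker events belong to no real store and are never flushed,
                // so tracking them would pin the oldest unflushed id forever.
                continue;
            }
            // Only the first (oldest) seqId per family matters.
            oldestUnflushedSeqIds.putIfAbsent(family, seqId);
        }
    }

    // Called once the memstore of a family has been flushed.
    void flushed(String family) {
        oldestUnflushedSeqIds.remove(family);
    }

    // Oldest seqId still held in any memstore, or NO_SEQNUM if none.
    long earliestUnflushedSeqId() {
        long min = Long.MAX_VALUE;
        for (long seqId : oldestUnflushedSeqIds.values()) {
            min = Math.min(min, seqId);
        }
        return oldestUnflushedSeqIds.isEmpty() ? NO_SEQNUM : min;
    }
}
```

If the marker family were tracked like a data family, a region whose stores had all been flushed would still report an unflushed edit, which is the bookkeeping problem the comment warns about.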
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14240419#comment-14240419 ]
zhangduo commented on HBASE-10201:
{quote} 1) There may be a correctness issue for same version(same row key version) updates... {quote}
I think you mean that KVScannerComparator will use the sequenceId to compare if we get the same key. Yes, this is a problem I missed. I think we need to change the code below as you suggested and use the store's max seqId instead of flushSeqId here.
{code}
for (Store s : storesToFlush) {
  totalFlushableSizeOfFlushableStores += s.getFlushableSize();
  storeFlushCtxs.add(s.createFlushContext(flushSeqId));
  committedFiles.put(s.getFamily().getName(), null); // for writing stores to WAL
}
{code}
{quote} 2) We have a feature where we force a flush... {quote}
That's why I introduced a FlushPolicy. For now the policy is simple: we only consider the size of a store. So if we keep a store for a long time, there will be a request to force-flush all stores, which may generate unnecessary small files. I think we can introduce a new FlushPolicy later to handle it better.
{quote} 3) For region server recovery... {quote}
I think the issue in 1) makes the problem even worse, because the flushSeqId passed to createFlushContext will be used as the maxSeqId of a store... I will fix it in the next patch. And if we want to skip WAL entries exactly, then we need to report a familyName-seqId map to the master, which will change the rpc protocol (and the format of zk data in distributed log replay). This is a big change, so I think we can reopen HBASE-12405 to handle it after HBASE-10201 gets in.
{quote} 4) Relating to your FlushMarker question... {quote}
I will fix getFamilyNames(), thanks. And is there anything else that would break read replicas? I'm not familiar with read replicas, so I may have missed something.
Thanks~
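A size-only policy of the kind described in the reply to 2) might look roughly like the sketch below. The class SizeBasedFlushPolicySketch, the lowerBoundBytes parameter, and the fallback of flushing every store when none exceeds the bound are all assumptions made for illustration; the thread only states that the policy considers the size of a store.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a size-only flush policy; not the patch's FlushPolicy.
final class SizeBasedFlushPolicySketch {
    /**
     * @param memstoreSizes   family name mapped to current memstore size in bytes
     * @param lowerBoundBytes stores larger than this are selected (assumed knob)
     * @return the families whose stores should be flushed
     */
    static List<String> selectStoresToFlush(Map<String, Long> memstoreSizes,
                                            long lowerBoundBytes) {
        List<String> selected = new ArrayList<>();
        for (Map.Entry<String, Long> entry : memstoreSizes.entrySet()) {
            if (entry.getValue() > lowerBoundBytes) {
                selected.add(entry.getKey());
            }
        }
        // Assumed fallback: if no store is large enough, flush everything so
        // that the flush request still frees memory.
        if (selected.isEmpty()) {
            selected.addAll(memstoreSizes.keySet());
        }
        return selected;
    }
}
```

Because such a policy ignores how long a store has gone unflushed, a small store can sit in memory until a forced flush of all stores is requested, which is the cost acknowledged in the comment.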
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14240436#comment-14240436 ]
zhangduo commented on HBASE-10201:
{quote} OK. It's his fault then. {quote}
It's not his fault, as he just asked me to use a factory class and didn't tell me to reference regionserver in HMaster... I think we need to find a way to do the sanity check when loading tables without referencing regionserver...
{quote} flushSeqId will not be bumped if we do not flush all stores. Because? {quote}
The flushSeqId will be one less than the oldest edit still in a memstore if we do not flush all stores.
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14240441#comment-14240441 ]
stack commented on HBASE-10201:
Made it critical again. Removed it from 0.98 (not critical for 0.98). Trying to get it into 1.0 even if it is turned off by default. That's why I had it critical. Speak up if you think otherwise [~enis]. Doing the testing for this feature as though it were critical. The benefit is nice and it's an old issue getting fixed.
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14240461#comment-14240461 ]
Jeffrey Zhong commented on HBASE-10201:
{quote} (and the format of zk data in distributed log replay) {quote}
You don't have to change this because log replay already gets max seqId per store before sending edits for replay.
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14240465#comment-14240465 ]
stack commented on HBASE-10201:
Changed my mind. Set it to Major. Doesn't need to be critical. If it gets done in time, well and good.
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14240528#comment-14240528 ]
zhangduo commented on HBASE-10201:
[~jeffreyz] I think flushSeqId is ambiguous here. We actually have two things: one is maxFlushedSeqId, and the other is seqIdOfFlushOperation. Before this patch, maxFlushedSeqId is equal to seqIdOfFlushOperation, because a flush operation flushes all the data written before it. After this patch, they are different. So I think we need to introduce a new field called maxFlushSeqId instead of lastFlushSeqId in HRegion, and generate flushSeqId in the old way (by increasing the sequenceId of the region).
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14240568#comment-14240568 ]
zhangduo commented on HBASE-10201:
{quote} You don't have to change this because log replay already gets max seqId per store before sending edits for replay. {quote}
There may be some misunderstanding. I passed a wrong value to createFlushContext, so the maxSeqId of a store is less than it should be. This will cause unnecessary log splitting and replay. And if I fix this, then the problem will be what HBASE-12405 describes. We need to store a map to solve HBASE-12405 completely.
For distributed log replay, postOpenDeployTasks will call updateRecoveringRegionLastFlushedSequenceId to store the maxSeqId in zk, and WALSplitter.splitLogFile will use it to skip WAL entries and then pass the WAL to a regionserver to replay. We can use a map when replaying the WAL to skip unnecessary cells (this is what we do in the patch). But if we store a map in zk, then we can skip WAL entries earlier.
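The difference between filtering by one region-wide id and filtering by a familyName-seqId map can be sketched as follows. The class WalCellFilterSketch and both method names are hypothetical, and string family names are a simplification; this is an illustration of the idea discussed above, not WALSplitter code.

```java
import java.util.Map;

// Hypothetical sketch comparing region-wide and per-store skip decisions.
final class WalCellFilterSketch {
    // Region-wide rule: skip only edits at or below the single reported id.
    static boolean skipByRegion(long editSeqId, long lastFlushedSeqId) {
        return editSeqId <= lastFlushedSeqId;
    }

    // Per-store rule: skip an edit if its own family has already persisted it.
    static boolean skipByStore(String family, long editSeqId,
                               Map<String, Long> maxSeqIdInStores) {
        Long maxFlushed = maxSeqIdInStores.get(family);
        // Unknown families are never skipped, to stay on the safe side.
        return maxFlushed != null && editSeqId <= maxFlushed;
    }
}
```

Suppose a large family has flushed up to seqId 90 while a small one has flushed only up to 10, so the region-wide id is 10. An edit with seqId 50 for the large family is replayed under the region-wide rule but skipped under the per-store rule, while the same edit for the small family is kept by both.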
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14240619#comment-14240619 ]
zhangduo commented on HBASE-10201:
Oh, after digging into the code I found that the file stored on zk already has the seqId of each store, but WALSplitter.splitLogFile only uses the LastFlushedSequenceId and ignores the store sequence ids. So it is easy to change the split work to use the sequence id of each store. Let me try. Thanks.
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14240737#comment-14240737 ]
stack commented on HBASE-10201:
[~jeffreyz] When you say...
bq. ... This issue may only happen in 0.98 though.
because we are not doing DLR in 0.98 or for some other reason? This patch is unlikely to make it back to 0.98 I'd say.
On the fix for 1.) above: hfiles will be written out with the store's flushed seqid, but we will keep on telling the master the oldest unflushed edit (oldestUnflushedSeqId). Since flush policies can return any set of Stores without regard to sequenceid, we could have edits in memstores with sequenceids that are earlier than those of persisted hfiles. Since telling the master oldestUnflushedSeqId does not guarantee that oldestUnflushedSeqId will be available at recovery time (it is in the master's memory only IIRC, and the master may crash and lose it), when the region opens post-recovery, we look at sequenceids from hfiles to figure out the region's sequenceid. Will this mean we drop edits because the region thinks its sequenceid is higher than it should be?
3. is a 'known' cost. Good to know that DLR won't have this issue.
4. is a good point (as is 2.)
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14240754#comment-14240754 ] zhangduo commented on HBASE-10201: -- {quote} 3. is a 'known' cost. Good to know that DLR won't have this issue. {quote} Yeah, I finally found that LogReplayOutputSink filters out cells using regionMaxSeqIdInStores in the groupEditsByServer method. This is actually what we want. Do I also need to change the original log split policy to use a familyName-seqId map to filter out cells that have already been flushed? Thanks~ Port 'Make flush decisions per column family' to trunk -- Key: HBASE-10201 URL: https://issues.apache.org/jira/browse/HBASE-10201 Project: HBase Issue Type: Improvement Components: wal Reporter: Ted Yu Assignee: zhangduo Fix For: 1.0.0, 2.0.0 Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_10.patch, HBASE-10201_11.patch, HBASE-10201_12.patch, HBASE-10201_13.patch, HBASE-10201_13.patch, HBASE-10201_14.patch, HBASE-10201_15.patch, HBASE-10201_16.patch, HBASE-10201_17.patch, HBASE-10201_2.patch, HBASE-10201_3.patch, HBASE-10201_4.patch, HBASE-10201_5.patch, HBASE-10201_6.patch, HBASE-10201_7.patch, HBASE-10201_8.patch, HBASE-10201_9.patch, compactions.png, count.png, io.png, memstore.png Currently the flush decision is made using the aggregate size of all column families. When large and small column families co-exist, this causes many small flushes of the smaller CF. We need to make per-CF flush decisions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
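The familyName-seqId idea raised in the comment above can be illustrated with a small sketch. This is not HBase's log-splitting code: the class `PerFamilyReplayFilterSketch`, the `Edit` type, and both filter methods are hypothetical names made up for illustration, and the sequence ids are invented. It only shows why a single region-wide sequence id can skip an edit that is still unflushed in a smaller column family, while a per-family map keeps it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; not taken from the HBASE-10201 patch. */
class PerFamilyReplayFilterSketch {

  /** A simplified WAL cell: its column family and the sequence id of the edit. */
  static final class Edit {
    final String family;
    final long seqId;

    Edit(String family, long seqId) {
      this.family = family;
      this.seqId = seqId;
    }

    @Override
    public String toString() {
      return family + "@" + seqId;
    }
  }

  /** Region-level filter: skip every edit at or below one region-wide sequence id. */
  static List<Edit> filterByRegionSeqId(List<Edit> edits, long regionMaxFlushedSeqId) {
    List<Edit> kept = new ArrayList<>();
    for (Edit e : edits) {
      if (e.seqId > regionMaxFlushedSeqId) {
        kept.add(e);
      }
    }
    return kept;
  }

  /** Per-family filter: skip an edit only if its own family has flushed past it. */
  static List<Edit> filterByFamilySeqId(List<Edit> edits,
      Map<String, Long> maxFlushedSeqIdByFamily) {
    List<Edit> kept = new ArrayList<>();
    for (Edit e : edits) {
      // Assumption: a family with no entry has never flushed, so its edits are all kept.
      long flushed = maxFlushedSeqIdByFamily.getOrDefault(e.family, -1L);
      if (e.seqId > flushed) {
        kept.add(e);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    // "big" has flushed up to seqId 10; "small" was last flushed at seqId 3.
    Map<String, Long> flushed = new HashMap<>();
    flushed.put("big", 10L);
    flushed.put("small", 3L);

    List<Edit> wal = Arrays.asList(
        new Edit("small", 5L), new Edit("big", 8L), new Edit("big", 12L));

    System.out.println(filterByRegionSeqId(wal, 10L));     // [big@12], small@5 is dropped
    System.out.println(filterByFamilySeqId(wal, flushed)); // [small@5, big@12]
  }
}
```

With the region-wide id taken from the largest hfile sequence id (10 here), the unflushed edit `small@5` would be skipped on replay; the per-family map keeps it.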
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14240779#comment-14240779 ] zhangduo commented on HBASE-10201: -- {quote} 2) We have a feature where we force a flush by hbase.regionserver.optionalcacheflushinterval or hbase.regionserver.flush.per.changes while I didn't see you handle both cases in selectStoresToFlush() function. This may cause HRegion.shouldFlush() always return true and end up with small hstore files. {quote} I think I get the point. Actually, I use forceFlushAllStores=true when shouldFlush returns true, so HRegion.shouldFlush() cannot keep returning true, because we flush all stores. But I think we can pass forceFlushAllStores=false in that case and add old stores to specificStoresToFlush in selectStoresToFlush to handle it better. I will fix it. Thanks. Port 'Make flush decisions per column family' to trunk -- Key: HBASE-10201 URL: https://issues.apache.org/jira/browse/HBASE-10201 Project: HBase Issue Type: Improvement Components: wal Reporter: Ted Yu Assignee: zhangduo Fix For: 1.0.0, 2.0.0 Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_10.patch, HBASE-10201_11.patch, HBASE-10201_12.patch, HBASE-10201_13.patch, HBASE-10201_13.patch, HBASE-10201_14.patch, HBASE-10201_15.patch, HBASE-10201_16.patch, HBASE-10201_17.patch, HBASE-10201_2.patch, HBASE-10201_3.patch, HBASE-10201_4.patch, HBASE-10201_5.patch, HBASE-10201_6.patch, HBASE-10201_7.patch, HBASE-10201_8.patch, HBASE-10201_9.patch, compactions.png, count.png, io.png, memstore.png Currently the flush decision is made using the aggregate size of all column families. When large and small column families co-exist, this causes many small flushes of the smaller CF. We need to make per-CF flush decisions. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
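The proposal in the comment above (select large stores, and also add old stores instead of forcing a flush of everything) might look roughly like the following. This is a sketch under stated assumptions, not the patch's selectStoresToFlush: `FlushSelectionSketch`, `StoreInfo`, and the parameter names are hypothetical, the thresholds in `main` are example values, and the fall-back to flushing all stores when nothing qualifies is an assumption added so the sketch always makes progress.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration only; not the FlushPolicy code from the patch. */
class FlushSelectionSketch {

  /** Simplified per-store state: memstore size and age of the oldest unflushed edit. */
  static final class StoreInfo {
    final String family;
    final long memstoreBytes;
    final long oldestEditAgeMs;

    StoreInfo(String family, long memstoreBytes, long oldestEditAgeMs) {
      this.family = family;
      this.memstoreBytes = memstoreBytes;
      this.oldestEditAgeMs = oldestEditAgeMs;
    }
  }

  /**
   * Selects a store if its memstore exceeds the size lower bound, or if its oldest
   * edit is older than maxAgeMs (standing in for a periodic-flush interval).
   */
  static List<String> selectStoresToFlush(List<StoreInfo> stores, long lowerBoundBytes,
      long maxAgeMs) {
    List<String> selected = new ArrayList<>();
    for (StoreInfo s : stores) {
      if (s.memstoreBytes > lowerBoundBytes || s.oldestEditAgeMs > maxAgeMs) {
        selected.add(s.family);
      }
    }
    // Assumption: if no store qualifies, flush all of them rather than none.
    if (selected.isEmpty()) {
      for (StoreInfo s : stores) {
        selected.add(s.family);
      }
    }
    return selected;
  }

  public static void main(String[] args) {
    long mb = 1024L * 1024L;
    List<StoreInfo> stores = Arrays.asList(
        new StoreInfo("big", 64 * mb, 1_000L),
        new StoreInfo("small", 1 * mb, 1_000L),
        new StoreInfo("stale", 1 * mb, 7_200_000L));
    // 16 MB lower bound, one-hour age limit (example values).
    System.out.println(selectStoresToFlush(stores, 16 * mb, 3_600_000L)); // [big, stale]
  }
}
```

Selecting the old store by age means a periodic flush no longer has to flush every store, which is the small-hstore-file concern raised in the quoted review point.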
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14237602#comment-14237602 ] Hadoop QA commented on HBASE-10201: --- {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12685679/HBASE-10201_17.patch against master branch at commit 87e44140040ab9a864e592c13f164dcde6ed6c03. ATTACHMENT ID: 12685679 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 32 new or modified tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/11993//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11993//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11993//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11993//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11993//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11993//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11993//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11993//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11993//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11993//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11993//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11993//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/11993//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/11993//console This message is automatically generated. 
Port 'Make flush decisions per column family' to trunk -- Key: HBASE-10201 URL: https://issues.apache.org/jira/browse/HBASE-10201 Project: HBase Issue Type: Improvement Components: wal Reporter: Ted Yu Assignee: zhangduo Priority: Critical Fix For: 1.0.0, 2.0.0, 0.98.9 Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_10.patch, HBASE-10201_11.patch, HBASE-10201_12.patch, HBASE-10201_13.patch, HBASE-10201_13.patch, HBASE-10201_14.patch, HBASE-10201_15.patch, HBASE-10201_16.patch, HBASE-10201_17.patch, HBASE-10201_2.patch, HBASE-10201_3.patch, HBASE-10201_4.patch, HBASE-10201_5.patch, HBASE-10201_6.patch, HBASE-10201_7.patch, HBASE-10201_8.patch, HBASE-10201_9.patch, compactions.png, count.png, io.png, memstore.png Currently the flush decision is made using the aggregate size of all column families. When large and small column families co-exist, this causes many small flushes of the smaller CF. We need to make per-CF flush decisions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14238808#comment-14238808 ] stack commented on HBASE-10201: --- [~jeffreyz] Would you mind taking a look at the sequenceid accounting that is going on in this patch? I am currently testing. Would be good to get the view of another with a seqid fixation. Port 'Make flush decisions per column family' to trunk -- Key: HBASE-10201 URL: https://issues.apache.org/jira/browse/HBASE-10201 Project: HBase Issue Type: Improvement Components: wal Reporter: Ted Yu Assignee: zhangduo Priority: Critical Fix For: 1.0.0, 2.0.0, 0.98.9 Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_10.patch, HBASE-10201_11.patch, HBASE-10201_12.patch, HBASE-10201_13.patch, HBASE-10201_13.patch, HBASE-10201_14.patch, HBASE-10201_15.patch, HBASE-10201_16.patch, HBASE-10201_17.patch, HBASE-10201_2.patch, HBASE-10201_3.patch, HBASE-10201_4.patch, HBASE-10201_5.patch, HBASE-10201_6.patch, HBASE-10201_7.patch, HBASE-10201_8.patch, HBASE-10201_9.patch, compactions.png, count.png, io.png, memstore.png Currently the flush decision is made using the aggregate size of all column families. When large and small column families co-exist, this causes many small flushes of the smaller CF. We need to make per-CF flush decisions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14239008#comment-14239008 ] Jeffrey Zhong commented on HBASE-10201: --- [~saint@gmail.com] Sure. Let me take a look at this patch! Port 'Make flush decisions per column family' to trunk -- Key: HBASE-10201 URL: https://issues.apache.org/jira/browse/HBASE-10201 Project: HBase Issue Type: Improvement Components: wal Reporter: Ted Yu Assignee: zhangduo Priority: Critical Fix For: 1.0.0, 2.0.0, 0.98.9 Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_10.patch, HBASE-10201_11.patch, HBASE-10201_12.patch, HBASE-10201_13.patch, HBASE-10201_13.patch, HBASE-10201_14.patch, HBASE-10201_15.patch, HBASE-10201_16.patch, HBASE-10201_17.patch, HBASE-10201_2.patch, HBASE-10201_3.patch, HBASE-10201_4.patch, HBASE-10201_5.patch, HBASE-10201_6.patch, HBASE-10201_7.patch, HBASE-10201_8.patch, HBASE-10201_9.patch, compactions.png, count.png, io.png, memstore.png Currently the flush decision is made using the aggregate size of all column families. When large and small column families co-exist, this causes many small flushes of the smaller CF. We need to make per-CF flush decisions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14239062#comment-14239062 ] stack commented on HBASE-10201: --- Here is some feedback from reading through the latest version of the patch. Let's address these minor items after we are sure the most important part, the sequenceid handling, is working (I'm running tests here but it's taking a while -- first I need to prove that the hbase 1.0 branch is healthy, then I intro your patch... and [~jeffreyz], Mr SequenceId, is going to take a look too). So hold off on making a patch till testing and Jeffrey's review are done. Sorry this is taking so long to get in. On the one hand, you can tell we are excited about getting this patch in because the improvement is really nice, but it touches a very sensitive part of hbase, the region sequenceid'ing, so we need to exercise extra caution. Thanks for your patience [~Apache9] Nits to be addressed on commit or if you make a new version of the patch (you've done enough as it is -- smile -- and I could do the below on commit np). I can go over the javadoc on commit. A small edit would fix it all up nicely. Below is a nit that can be addressed in a follow-on: This config is not general. It belongs to a particular policy (If FlushLargeStoresPolicy is used...): hbase.hregion.percolumnfamilyflush.size.lower.bound It should probably have the policy it is for in its name. Maybe just don't mention it in hbase-default.xml. Let users find it if they need it (16MB is a nice default low-bound). It is odd that this is public: public static Class<? extends FlushPolicy> getFlushPolicyClass( It is nice that the master tests that we can load a policy, but it does not even use the flush policy (if we fail to load, fall back to the default with a big warning?). And flush policies are over in the regionserver package, so here we have the master reaching over and into the regionserver package. 
Would be good to avoid doing this x-package reach, especially when it does not seem to be needed. I would think this would be an internal method for the factory to use? Also in HTD, you call it getFlushPolicyClassName but here you call it getFlushPolicyClass... would be good for them to be the same. This policy stuff you've added is nicer than what was here previously. Good one. Should these two strings just be the same? FLUSH_SIZE_LOWER_BOUND_KEY and DEFAULT_FLUSH_SIZE_LOWER_BOUND even though they are read from different places? No harm in the key being the same, especially since in HTD you hide the key by providing getters/setters. The FlushPolicy api is a little odd. It implements Configured but where do you do a setConf on it? Then in the configureForRegion method, you take a Region but all it is used for is to emit the region name in Strings and to get an instance of HTableDescriptor. The flush takes a list of stores. Can't it get them from the region it was given in configureForRegion? This is a nit comment. Ignore for now. ... Stopped at sequence id changes; will be back. Thanks. 
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14239084#comment-14239084 ] zhangduo commented on HBASE-10201: -- [~stack] I followed RegionSplitPolicy to write FlushPolicy, except that [~tedyu] suggested using FlushPolicyFactory and placing the factory method in it instead of in FlushPolicy. Maybe the code of RegionSplitPolicy is old and needs refactoring too... {quote} The FlushPolicy api is a little odd. It implements Configured but where do you do a setConf on it? Then in the configureForRegion method, you take a Region but all it is used for is to emit region name on Strings and to get instance of HTableDescriptor. The flush takes a list of stores. Can't it get them from the region it was given when configuredForRegion? This is a nit comment. Ignore for now. {quote} ReflectionUtils.newInstance(clazz, conf) will call setConf. And I agree that if we implement configureForRegion, then the list of stores is not necessary when doing selection. That can be fixed later. [~jeffreyz] I think the biggest problem is that this patch changes the flushSeqId generation. flushSeqId will not be bumped if we do not flush all stores. I think flushSeqId should be called highestFlushedToDiskSeqId in this patch. And actually I do not know where we use FlushMarker, so I do not know the meaning of flushSeqId in the Marker... Thanks. 
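The point that the policy gets its configuration through the reflective factory call, not through an explicit setConf at the call site, can be shown with a self-contained stand-in. This sketch does not use Hadoop: `Configurable` here is a local interface mimicking the Hadoop one, `newInstance` is a simplified imitation of what ReflectionUtils.newInstance(clazz, conf) does, a plain `Map` replaces Configuration, and `SizeBoundPolicy` is a hypothetical policy class. The key name and the 16MB default are the ones mentioned in the review above.

```java
import java.lang.reflect.Constructor;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration only; a stand-in for the Hadoop classes it imitates. */
class ConfiguredPolicySketch {

  /** Local stand-in for Hadoop's Configurable interface. */
  interface Configurable {
    void setConf(Map<String, String> conf);

    Map<String, String> getConf();
  }

  /** Hypothetical policy that reads its size lower bound from the injected conf. */
  static class SizeBoundPolicy implements Configurable {
    static final String LOWER_BOUND_KEY = "hbase.hregion.percolumnfamilyflush.size.lower.bound";
    static final long DEFAULT_LOWER_BOUND = 16L * 1024L * 1024L; // 16MB

    private Map<String, String> conf;

    @Override
    public void setConf(Map<String, String> conf) {
      this.conf = conf;
    }

    @Override
    public Map<String, String> getConf() {
      return conf;
    }

    long lowerBound() {
      String value = conf.get(LOWER_BOUND_KEY);
      return value == null ? DEFAULT_LOWER_BOUND : Long.parseLong(value);
    }
  }

  /** Creates an instance via its no-arg constructor and injects conf if it is Configurable. */
  static <T> T newInstance(Class<T> clazz, Map<String, String> conf) {
    try {
      Constructor<T> ctor = clazz.getDeclaredConstructor();
      ctor.setAccessible(true);
      T instance = ctor.newInstance();
      if (instance instanceof Configurable) {
        ((Configurable) instance).setConf(conf);
      }
      return instance;
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Cannot instantiate " + clazz.getName(), e);
    }
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    SizeBoundPolicy byDefault = newInstance(SizeBoundPolicy.class, conf);
    System.out.println(byDefault.lowerBound()); // 16777216

    conf.put(SizeBoundPolicy.LOWER_BOUND_KEY, "1024");
    SizeBoundPolicy tuned = newInstance(SizeBoundPolicy.class, conf);
    System.out.println(tuned.lowerBound()); // 1024
  }
}
```

Because the factory performs the injection, a caller never sees a setConf call, which is why the review question about where setConf happens is a fair one to ask when reading the policy class alone.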
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14237213#comment-14237213 ] Hadoop QA commented on HBASE-10201: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12685611/HBASE-10201_14.patch against master branch at commit bb15fd5fe0a89e647cd9cefa0ceae342578f0833. ATTACHMENT ID: 12685611 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 32 new or modified tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 6 warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/11982//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11982//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11982//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11982//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11982//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11982//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11982//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11982//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11982//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11982//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11982//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11982//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/11982//artifact/patchprocess/checkstyle-aggregate.html Javadoc warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11982//artifact/patchprocess/patchJavadocWarnings.txt Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11982//console This message is automatically generated. Port 'Make flush decisions per column family' to trunk -- Key: HBASE-10201 URL: https://issues.apache.org/jira/browse/HBASE-10201 Project: HBase Issue Type: Improvement Components: wal Reporter: Ted Yu Assignee: zhangduo Priority: Critical Fix For: 1.0.0, 2.0.0, 0.98.9 Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_10.patch, HBASE-10201_11.patch, HBASE-10201_12.patch, HBASE-10201_13.patch, HBASE-10201_13.patch, HBASE-10201_14.patch, HBASE-10201_15.patch, HBASE-10201_2.patch, HBASE-10201_3.patch, HBASE-10201_4.patch, HBASE-10201_5.patch, HBASE-10201_6.patch, HBASE-10201_7.patch, HBASE-10201_8.patch, HBASE-10201_9.patch, compactions.png, count.png, io.png, memstore.png Currently the flush decision is made using the aggregate size of all column families. When large and small column families co-exist, this causes many small flushes of the smaller CF. We need to make per-CF flush decisions. -- This message was sent by Atlassian JIRA
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14237214#comment-14237214 ] Hadoop QA commented on HBASE-10201: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12685612/HBASE-10201_15.patch against master branch at commit bb15fd5fe0a89e647cd9cefa0ceae342578f0833. ATTACHMENT ID: 12685612 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 32 new or modified tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 6 warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/11983//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11983//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11983//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11983//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11983//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11983//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11983//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11983//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11983//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11983//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11983//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11983//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/11983//artifact/patchprocess/checkstyle-aggregate.html Javadoc warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11983//artifact/patchprocess/patchJavadocWarnings.txt Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11983//console This message is automatically generated. Port 'Make flush decisions per column family' to trunk -- Key: HBASE-10201 URL: https://issues.apache.org/jira/browse/HBASE-10201 Project: HBase Issue Type: Improvement Components: wal Reporter: Ted Yu Assignee: zhangduo Priority: Critical Fix For: 1.0.0, 2.0.0, 0.98.9 Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_10.patch, HBASE-10201_11.patch, HBASE-10201_12.patch, HBASE-10201_13.patch, HBASE-10201_13.patch, HBASE-10201_14.patch, HBASE-10201_15.patch, HBASE-10201_2.patch, HBASE-10201_3.patch, HBASE-10201_4.patch, HBASE-10201_5.patch, HBASE-10201_6.patch, HBASE-10201_7.patch, HBASE-10201_8.patch, HBASE-10201_9.patch, compactions.png, count.png, io.png, memstore.png Currently the flush decision is made using the aggregate size of all column families. When large and small column families co-exist, this causes many small flushes of the smaller CF. We need to make per-CF flush decisions. -- This message was sent by Atlassian JIRA
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14237370#comment-14237370 ] Hadoop QA commented on HBASE-10201: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12685645/HBASE-10201_16.patch against master branch at commit 9fd6db3703d3e7ec50b32b1e96c65ed9f2b1456d. ATTACHMENT ID: 12685645 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 32 new or modified tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 lineLengths{color}. The patch introduces the following lines longer than 100: + private static final Class<? extends FlushPolicy> DEFAULT_FLUSH_POLICY_CLASS = FlushLargeStoresPolicy.class; +new WALKey(info.getEncodedNameAsBytes(), htd.getTableName(), System.currentTimeMillis()), {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/11990//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11990//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11990//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11990//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11990//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11990//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11990//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11990//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11990//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11990//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11990//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11990//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/11990//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/11990//console This message is automatically generated. 
Port 'Make flush decisions per column family' to trunk -- Key: HBASE-10201 URL: https://issues.apache.org/jira/browse/HBASE-10201 Project: HBase Issue Type: Improvement Components: wal Reporter: Ted Yu Assignee: zhangduo Priority: Critical Fix For: 1.0.0, 2.0.0, 0.98.9 Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_10.patch, HBASE-10201_11.patch, HBASE-10201_12.patch, HBASE-10201_13.patch, HBASE-10201_13.patch, HBASE-10201_14.patch, HBASE-10201_15.patch, HBASE-10201_16.patch, HBASE-10201_2.patch, HBASE-10201_3.patch, HBASE-10201_4.patch, HBASE-10201_5.patch, HBASE-10201_6.patch, HBASE-10201_7.patch, HBASE-10201_8.patch, HBASE-10201_9.patch, compactions.png, count.png, io.png, memstore.png Currently the flush decision is made using the aggregate size of all column families. When large and small column families co-exist, this causes many small
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14235277#comment-14235277 ] Hadoop QA commented on HBASE-10201: ---
{color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12685263/HBASE-10201_13.patch against master branch at commit 08754f2c431b829b0d6269bdb23284dd679ed8ca.
ATTACHMENT ID: 12685263
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 33 new or modified tests.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100
{color:green}+1 site{color}. The mvn site goal succeeds with this patch.
{color:green}+1 core tests{color}. The patch passed unit tests in .
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/11945//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11945//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11945//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11945//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11945//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11945//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11945//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11945//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11945//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11945//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11945//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11945//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/11945//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/11945//console This message is automatically generated. 
Port 'Make flush decisions per column family' to trunk -- Key: HBASE-10201 URL: https://issues.apache.org/jira/browse/HBASE-10201 Project: HBase Issue Type: Improvement Components: wal Reporter: Ted Yu Assignee: zhangduo Priority: Critical Fix For: 1.0.0, 2.0.0, 0.98.9 Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_10.patch, HBASE-10201_11.patch, HBASE-10201_12.patch, HBASE-10201_13.patch, HBASE-10201_13.patch, HBASE-10201_2.patch, HBASE-10201_3.patch, HBASE-10201_4.patch, HBASE-10201_5.patch, HBASE-10201_6.patch, HBASE-10201_7.patch, HBASE-10201_8.patch, HBASE-10201_9.patch Currently the flush decision is made using the aggregate size of all column families. When large and small column families co-exist, this causes many small flushes of the smaller CF. We need to make per-CF flush decisions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14236647#comment-14236647 ] zhangduo commented on HBASE-10201: --
{quote}
I see fewer compactions and fewer hfiles (so less i/o), memstores carrying more (it's hard to see but you should be able to make out memstore sizes do not go to zero or near zero when the patch is enabled)
{quote}
Glad to see it does help :). Thanks~
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14236671#comment-14236671 ] Hadoop QA commented on HBASE-10201: ---
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12685524/HBASE-10201_14.patch against master branch at commit 4a36f662c2738a61535cf188f27d478d72c5a38a.
ATTACHMENT ID: 12685524
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 32 new or modified tests.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:red}-1 javadoc{color}. The javadoc tool appears to have generated 15 warning messages.
{color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100
{color:green}+1 site{color}. The mvn site goal succeeds with this patch.
{color:red}-1 core tests{color}.
The patch failed these unit tests: Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/11969//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11969//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11969//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11969//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11969//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11969//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11969//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11969//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11969//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11969//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11969//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11969//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/11969//artifact/patchprocess/checkstyle-aggregate.html Javadoc warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11969//artifact/patchprocess/patchJavadocWarnings.txt Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11969//console This message is automatically generated. Port 'Make flush decisions per column family' to trunk -- Key: HBASE-10201 URL: https://issues.apache.org/jira/browse/HBASE-10201 Project: HBase Issue Type: Improvement Components: wal Reporter: Ted Yu Assignee: zhangduo Priority: Critical Fix For: 1.0.0, 2.0.0, 0.98.9 Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_10.patch, HBASE-10201_11.patch, HBASE-10201_12.patch, HBASE-10201_13.patch, HBASE-10201_13.patch, HBASE-10201_14.patch, HBASE-10201_2.patch, HBASE-10201_3.patch, HBASE-10201_4.patch, HBASE-10201_5.patch, HBASE-10201_6.patch, HBASE-10201_7.patch, HBASE-10201_8.patch, HBASE-10201_9.patch, compactions.png, count.png, io.png, memstore.png Currently the flush decision is made using the aggregate size of all column families. When large and small column families co-exist, this causes many small flushes of the smaller CF. We need to make per-CF flush decisions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14234975#comment-14234975 ] zhangduo commented on HBASE-10201: --
{quote}
Suggest you drop memstore from the names of these configs: hbase.hregion.memstore.percolumnfamilyflush.enabled hbase.hregion.memstore.percolumnfamilyflush.size.lower.bound
{quote}
Done.
{quote}
Importing HRegion into HMaster should be avoided – we are reaching across packages – especially just to get at a define. Move this config up into HConstants since it is used by two major subpackages or probably better, put it into HRegionInfo.
{quote}
Now there is only a FlushPolicy.getFlushPolicyClass(htd, conf); call in HMaster. This is the same as what is done for RegionSplitPolicy.
{quote}
Why do we have to change the API on FlushRequest? Can the flush implementation not do all the necessary figuring of what to flush reading necessary configs., etc.? Maybe you need the flag to 'force' a full region flush? If so, should it be a force flag rather than the effete 'selectiveFlushRequest'?
{quote}
I changed selectiveFlushRequest to forceFlushAllStores and inverted the true/false values accordingly. Hope I didn't miss something.
{quote}
Add the fact that we are doing per col flushing as an attribute on summary line printed out on region instantiation rather than give it its own log line:
{quote}
Sorry, I couldn't find the summary log line. In any case, that log line has disappeared since the introduction of FlushPolicy.
And is the QA bot still broken?
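As a rough illustration of the selection logic discussed in this reply, the sketch below picks which column families to flush from their memstore sizes. It is not the patch's code: the class `PerFamilyFlushSketch`, the method `selectStoresToFlush`, and the map-of-sizes representation are hypothetical. Only the `forceFlushAllStores` flag name and the idea of a flush-size lower bound come from the thread, and the "flush everything if no family exceeds the bound" fallback is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this is not HBase's FlushPolicy API.
final class PerFamilyFlushSketch {

    /**
     * Returns the names of the column families whose memstores should be flushed.
     *
     * @param memstoreSizes       memstore size in bytes per column family (hypothetical input)
     * @param lowerBoundBytes     families above this size are flushed selectively
     * @param forceFlushAllStores when true, every family is flushed regardless of size
     */
    static List<String> selectStoresToFlush(Map<String, Long> memstoreSizes,
                                            long lowerBoundBytes,
                                            boolean forceFlushAllStores) {
        if (forceFlushAllStores) {
            return new ArrayList<>(memstoreSizes.keySet());
        }
        List<String> selected = new ArrayList<>();
        for (Map.Entry<String, Long> entry : memstoreSizes.entrySet()) {
            if (entry.getValue() > lowerBoundBytes) {
                selected.add(entry.getKey());
            }
        }
        // Assumed fallback: if no family is large enough, flush all of them
        // so that the flush still releases memory.
        return selected.isEmpty() ? new ArrayList<>(memstoreSizes.keySet()) : selected;
    }

    public static void main(String[] args) {
        Map<String, Long> sizes = new LinkedHashMap<>();
        sizes.put("big", 120L << 20);   // 120 MB
        sizes.put("small", 2L << 20);   // 2 MB
        long lowerBound = 16L << 20;    // 16 MB, an illustrative value

        System.out.println(selectStoresToFlush(sizes, lowerBound, false)); // [big]
        System.out.println(selectStoresToFlush(sizes, lowerBound, true));  // [big, small]
    }
}
```

With an aggregate-size decision, both families in this example would be flushed together, producing a tiny file for the small family each time. Selecting per family leaves the small memstore in place until it is worth flushing or a full flush is forced.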
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14233701#comment-14233701 ] stack commented on HBASE-10201: ---
Trying this patch out; reviewing v12.
Suggest you drop memstore from the names of these configs:
hbase.hregion.memstore.percolumnfamilyflush.enabled
hbase.hregion.memstore.percolumnfamilyflush.size.lower.bound
It's only memstores that are flushed, so having memstore in the name is redundant. Same here: MEMSTORE_COLUMNFAMILY_FLUSHSIZE_LOWER_BOUND, and here: getMemStoreColumnFamilyFlushSizeLowerBound. This is a nit. Can do later if we make more patches.
Importing HRegion into HMaster should be avoided -- we are reaching across packages -- especially just to get at a define. Move this config up into HConstants since it is used by two major subpackages or, probably better, put it into HRegionInfo.
So I enable this feature in hbase-site.xml, and I can enable it globally also in hbase-site.xml, but I can also enable it on a per-table basis? That's good.
Why do we have to change the API on FlushRequest? Can the flush implementation not do all the necessary figuring of what to flush, reading necessary configs, etc.? Maybe you need the flag to 'force' a full region flush? If so, should it be a force flag rather than the effete 'selectiveFlushRequest'?
Add the fact that we are doing per col flushing as an attribute on the summary line printed out on region instantiation rather than give it its own log line:
+if (LOG.isDebugEnabled()) {
+  LOG.debug("Per Column Family Flushing: " + perColumnFamilyFlushEnabled);
+}
More review later. This patch is great. There were some rejects but easy to fix. Let me try and get some numbers to help make the case for this patch going in.
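The review above mentions enabling the feature site-wide in hbase-site.xml and, apparently, per table as well. The following is a minimal sketch of how such a lookup could be layered, assuming table-level settings take precedence over site-level ones. The two keys are the v12 names quoted in the review (which the review suggests renaming). The class `FlushConfigSketch`, the `resolve` helper, the plain `Map` stand-ins for the real configuration objects, and the example values are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; plain maps stand in for the site and table configuration.
final class FlushConfigSketch {
    // Key names as quoted in the review of patch v12.
    static final String ENABLED_KEY = "hbase.hregion.memstore.percolumnfamilyflush.enabled";
    static final String LOWER_BOUND_KEY =
        "hbase.hregion.memstore.percolumnfamilyflush.size.lower.bound";

    /** Looks a key up in the table settings first, then the site settings, then a fallback. */
    static String resolve(String key, Map<String, String> tableProps,
                          Map<String, String> siteProps, String fallback) {
        String value = tableProps.get(key);
        if (value == null) {
            value = siteProps.get(key);
        }
        return value != null ? value : fallback;
    }

    public static void main(String[] args) {
        // Site-wide settings (illustrative values, not documented defaults).
        Map<String, String> site = new HashMap<>();
        site.put(ENABLED_KEY, "false");
        site.put(LOWER_BOUND_KEY, "16777216");

        // One table opts in without overriding the lower bound.
        Map<String, String> table = new HashMap<>();
        table.put(ENABLED_KEY, "true");

        boolean enabled = Boolean.parseBoolean(resolve(ENABLED_KEY, table, site, "false"));
        long lowerBound = Long.parseLong(resolve(LOWER_BOUND_KEY, table, site, "0"));
        System.out.println(enabled + " " + lowerBound); // prints "true 16777216"
    }
}
```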
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14231134#comment-14231134 ] Hadoop QA commented on HBASE-10201: ---
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12684565/HBASE-10201_12.patch against master branch at commit 94d57f81dc114feba14906b05b3d2c6b78bf3299.
ATTACHMENT ID: 12684565
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 33 new or modified tests.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors
{color:red}-1 findbugs{color}. The patch appears to introduce 8 new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100
{color:green}+1 site{color}. The mvn site goal succeeds with this patch.
{color:green}+1 core tests{color}. The patch passed unit tests in .
{color:red}-1 core zombie tests{color}.
There are 1 zombie test(s): at org.apache.camel.component.jms.JmsDefaultTaskExecutorTypeTest.testSimpleAsyncTaskExecutor(JmsDefaultTaskExecutorTypeTest.java:70) Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/11891//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11891//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11891//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11891//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11891//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11891//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11891//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11891//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11891//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11891//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11891//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11891//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/11891//artifact/patchprocess/checkstyle-aggregate.html Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11891//console This message is automatically generated. Port 'Make flush decisions per column family' to trunk -- Key: HBASE-10201 URL: https://issues.apache.org/jira/browse/HBASE-10201 Project: HBase Issue Type: Improvement Components: wal Reporter: Ted Yu Assignee: zhangduo Priority: Critical Fix For: 2.0.0, 0.98.9, 0.99.2 Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_10.patch, HBASE-10201_11.patch, HBASE-10201_12.patch, HBASE-10201_2.patch, HBASE-10201_3.patch, HBASE-10201_4.patch, HBASE-10201_5.patch, HBASE-10201_6.patch, HBASE-10201_7.patch, HBASE-10201_8.patch, HBASE-10201_9.patch Currently the flush decision is made using the aggregate size of all column families. When large and small column families co-exist, this causes many small flushes of the smaller CF. We need to make per-CF flush decisions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14228057#comment-14228057 ] Hadoop QA commented on HBASE-10201: ---
{color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12684125/HBASE-10201_11.patch against master branch at commit 0f8894cd6435ed6962ec3d7c81be4cb0d4f7657e.
ATTACHMENT ID: 12684125
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 33 new or modified tests.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100
{color:green}+1 site{color}. The mvn site goal succeeds with this patch.
{color:green}+1 core tests{color}. The patch passed unit tests in .
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/11857//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11857//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11857//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11857//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11857//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11857//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11857//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11857//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11857//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11857//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11857//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11857//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/11857//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/11857//console This message is automatically generated. 
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14220880#comment-14220880 ] Hadoop QA commented on HBASE-10201: ---
{color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12682853/HBASE-10201_10.patch against master branch at commit 325cdc0987f8176ac46695f5b0c93b0fc6605ab9.
ATTACHMENT ID: 12682853
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 33 new or modified tests.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100
{color:green}+1 site{color}. The mvn site goal succeeds with this patch.
{color:green}+1 core tests{color}. The patch passed unit tests in .
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/11774//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11774//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11774//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11774//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11774//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11774//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11774//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11774//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11774//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11774//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11774//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11774//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/11774//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/11774//console This message is automatically generated. 
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14220602#comment-14220602 ] Hadoop QA commented on HBASE-10201: ---
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12682818/HBASE-10201_9.patch against master branch at commit c5690b1be3ae84efa52ee3c4589248c447e12f3f.
ATTACHMENT ID: 12682818
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 33 new or modified tests.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 lineLengths{color}. The patch introduces the following lines longer than 100:
+final WAL wal, final long myseqid, Collection<Store> storesToFlush, MonitoredTask status)
+new WALKey(info.getEncodedNameAsBytes(), htd.getTableName(), System.currentTimeMillis()),
{color:green}+1 site{color}. The mvn site goal succeeds with this patch.
{color:red}-1 core tests{color}.
The patch failed these unit tests: org.apache.hadoop.hbase.master.balancer.TestBaseLoadBalancer Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/11771//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11771//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11771//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11771//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11771//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11771//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11771//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11771//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11771//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11771//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11771//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11771//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/11771//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/11771//console This message is 
automatically generated. Port 'Make flush decisions per column family' to trunk -- Key: HBASE-10201 URL: https://issues.apache.org/jira/browse/HBASE-10201 Project: HBase Issue Type: Improvement Components: wal Reporter: Ted Yu Assignee: zhangduo Priority: Critical Fix For: 2.0.0, 0.98.9, 0.99.2 Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_2.patch, HBASE-10201_3.patch, HBASE-10201_4.patch, HBASE-10201_5.patch, HBASE-10201_6.patch, HBASE-10201_7.patch, HBASE-10201_8.patch, HBASE-10201_9.patch Currently the flush decision is made using the aggregate size of all column families. When large and small column families co-exist, this causes many small flushes of the smaller CF. We need to make per-CF flush decisions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14214707#comment-14214707 ]

Hadoop QA commented on HBASE-10201:
---

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12681891/HBASE-10201_7.patch
against trunk revision .
ATTACHMENT ID: 12681891

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 25 new or modified tests.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning messages.
{color:red}-1 checkstyle{color}. The applied patch generated 3793 checkstyle errors (more than the trunk's current 3788 errors).
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 lineLengths{color}. The patch introduces the following lines longer than 100:
+ private final ConcurrentMap<byte[], ConcurrentMap<byte[], Long>> oldestUnflushedStoreSequenceIds = new ConcurrentSkipListMap<byte[], ConcurrentMap<byte[], Long>>(
+ ConcurrentMap<byte[], Long> oldestUnflushedStoreSequenceIdsOfRegion = oldestUnflushedStoreSequenceIds
+ // assert not empty. Less rigorous, but safer, alternative is telling the caller to stop.
+ Map<byte[], Long> storeSeqNumsBeforeFlushStarts = this.lowestFlushingStoreSequenceIds.remove(encodedRegionName);
+ if (currentSeqNum != null && currentSeqNum.longValue() <= familyNameAndSeqId.getValue().longValue()) {
+ConcurrentMap<byte[], Long> oldestUnflushedStoreSequenceIdsOfRegion = this.oldestUnflushedStoreSequenceIds
+return oldestUnflushedStoreSequenceIdsOfRegion != null ? getLowestSeqId(oldestUnflushedStoreSequenceIdsOfRegion)
+ConcurrentMap<byte[], Long> oldestUnflushedStoreSequenceIdsOfRegion = this.oldestUnflushedStoreSequenceIds
+final HLog wal, final long myseqid, Collection<Store> storesToFlush, MonitoredTask status)
{color:green}+1 site{color}. The mvn site goal succeeds with this patch.
{color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/11705//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11705//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11705//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11705//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11705//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11705//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11705//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11705//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11705//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11705//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11705//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11705//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/11705//artifact/patchprocess/checkstyle-aggregate.html
Javadoc warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11705//artifact/patchprocess/patchJavadocWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/11705//console

This message is automatically generated.

Port 'Make flush decisions per column family' to trunk -- Key: HBASE-10201 URL: https://issues.apache.org/jira/browse/HBASE-10201
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14214971#comment-14214971 ]

Ted Yu commented on HBASE-10201:

@Duo: Can you briefly describe the addition in patch v7 w.r.t. per store sequence Id?

{code}
+ @Test
+ public void testCompareStoreFileCount() throws Exception {
{code}
Mind adding a comment describing what the above test verifies?

{code}
+ public static void main(String[] args) throws Exception {
{code}
Why is main() needed in the unit test?

Port 'Make flush decisions per column family' to trunk -- Key: HBASE-10201 URL: https://issues.apache.org/jira/browse/HBASE-10201 Project: HBase Issue Type: Improvement Components: wal Reporter: Ted Yu Assignee: zhangduo Priority: Critical Fix For: 2.0.0, 0.98.9, 0.99.2 Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_2.patch, HBASE-10201_3.patch, HBASE-10201_4.patch, HBASE-10201_5.patch, HBASE-10201_6.patch, HBASE-10201_7.patch Currently the flush decision is made using the aggregate size of all column families. When large and small column families co-exist, this causes many small flushes of the smaller CF. We need to make per-CF flush decisions.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215113#comment-14215113 ]

stack commented on HBASE-10201:
---

bq. We need to change protobuf definition

We could add extra fields in pb and write to two places for the life of an hbase version to support rolling upgrade.

I hope you do not mind me surfacing here questions asked off-list; it's best to keep the discussion up here rather than off-list so others can participate too.

You described off-list how distributed log replay opens a region, puts the highest *sequenceid* found up in zk, and then uses this to figure out which edits to replay. You also talk of how regionServerReport includes the last flush id of each region we carry, and that the master keeps this around so that on log replay we can skip edits already flushed. You then ask:

bq. I think I need to change all these places to use a map which stored familyName-maxSeqId instead of a single SeqId. Am I right?

The sequenceid is *region-scoped*: i.e. we keep a running sequenceid per region. For the above to work out, we'd need to change the sequenceid scope to be column family rather than region. Since our memstore is by column family, and since the memstore now uses the region sequenceid as its MVCC, this might be a good direction to go in, but it is not what we have now. You cannot have discontinuities in the progress of the flush sequenceid. If there are four column families, the edits can go in to any of the four families in any order.

You could do something like what [~gaurav.menghani] suggests above (see https://issues.apache.org/jira/browse/HBASE-10201?focusedCommentId=14191203&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14191203): rather than reporting, on successful flush, the highest sequenceid of all of a region's memstores involved in the flush, when you flush only a column family you'd have to report one less than the oldest outstanding edit still alive in a column family memstore.

What if you did something much less involved: when there is pressure to flush, flush the stores with the oldest edits until you've freed enough memory? Upsides are that you'd clear out old edits from memory and we might let go of WALs a little faster. Also, you might not flush all of the content in a region, because flushing just a few stores might be enough to get you back under the threshold, so we might make fewer small storefiles. Downsides are that we'd still make some small storefiles (e.g. for those stores that have a few old edits in them and little else) and we'd do the flush in series rather than in parallel. Because of sequenceid accounting, we might replay more edits than we have to.

Port 'Make flush decisions per column family' to trunk -- Key: HBASE-10201 URL: https://issues.apache.org/jira/browse/HBASE-10201 Project: HBase Issue Type: Improvement Components: wal Reporter: Ted Yu Assignee: zhangduo Priority: Critical Fix For: 2.0.0, 0.98.9, 0.99.2 Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_2.patch, HBASE-10201_3.patch, HBASE-10201_4.patch, HBASE-10201_5.patch, HBASE-10201_6.patch, HBASE-10201_7.patch Currently the flush decision is made using the aggregate size of all column families. When large and small column families co-exist, this causes many small flushes of the smaller CF. We need to make per-CF flush decisions.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
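Editor's note: the "flush the stores with the oldest edits until you've freed enough memory" idea in the comment above can be sketched roughly as follows. This is only an illustration under stated assumptions: `OldestFirstFlushSketch`, `StoreInfo` and `pickStoresToFlush` are hypothetical names, not HBase classes or methods, and family names are plain strings rather than the byte arrays HBase uses.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; none of these names are HBase APIs.
final class OldestFirstFlushSketch {

  /** Minimal view of one store: its family, memstore size, and oldest unflushed edit. */
  static final class StoreInfo {
    final String family;
    final long memstoreBytes;
    final long oldestSeqId;

    StoreInfo(String family, long memstoreBytes, long oldestSeqId) {
      this.family = family;
      this.memstoreBytes = memstoreBytes;
      this.oldestSeqId = oldestSeqId;
    }
  }

  /**
   * Selects stores in order of their oldest unflushed edit until flushing them
   * would release at least bytesToFree, or until every store has been selected.
   */
  static List<String> pickStoresToFlush(List<StoreInfo> stores, long bytesToFree) {
    List<StoreInfo> byAge = new ArrayList<>(stores);
    byAge.sort(Comparator.comparingLong((StoreInfo s) -> s.oldestSeqId));
    List<String> picked = new ArrayList<>();
    long freed = 0;
    for (StoreInfo s : byAge) {
      if (freed >= bytesToFree) {
        break; // enough memory would already be released
      }
      picked.add(s.family);
      freed += s.memstoreBytes;
    }
    return picked;
  }
}
```

The sketch makes the trade-off in the comment visible: a store holding a few old edits and little else is still picked first, which is how the small storefiles mentioned as a downside would arise.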
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215472#comment-14215472 ]

zhangduo commented on HBASE-10201:
--

{quote} Can you briefly describe the addition in patch v7 w.r.t. per store sequence Id {quote}
I moved the map which stores familyName-oldestSeqIdInStore to FSHLog. When a flush starts, the familyNames that will be flushed are passed to HLog. On replay, WAL cells are skipped using a per-store seqId instead of a single seqId for the region.

{quote} Why is main() needed in the unit test ? {quote}
It is used to benchmark a real cluster.

Port 'Make flush decisions per column family' to trunk -- Key: HBASE-10201 URL: https://issues.apache.org/jira/browse/HBASE-10201 Project: HBase Issue Type: Improvement Components: wal Reporter: Ted Yu Assignee: zhangduo Priority: Critical Fix For: 2.0.0, 0.98.9, 0.99.2 Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_2.patch, HBASE-10201_3.patch, HBASE-10201_4.patch, HBASE-10201_5.patch, HBASE-10201_6.patch, HBASE-10201_7.patch Currently the flush decision is made using the aggregate size of all column families. When large and small column families co-exist, this causes many small flushes of the smaller CF. We need to make per-CF flush decisions.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
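Editor's note: the per-store skip on replay described in the comment above might look something like the sketch below. It is an assumption-laden illustration, not the patch's code: `PerStoreReplaySketch` and `canSkip` are hypothetical names, and the map is keyed by `String` for brevity where HBase keys such maps by `byte[]` family names.

```java
import java.util.Map;

// Hypothetical illustration only; not the code from the patch.
final class PerStoreReplaySketch {

  /**
   * Returns true when a WAL cell can be skipped during replay because its
   * column family has already persisted every edit up to and including
   * cellSeqId. A family missing from the map is treated as never flushed,
   * so none of its cells are skipped.
   */
  static boolean canSkip(Map<String, Long> maxFlushedSeqIdByFamily, String family,
      long cellSeqId) {
    Long flushed = maxFlushedSeqIdByFamily.get(family);
    return flushed != null && cellSeqId <= flushed.longValue();
  }
}
```

With a single region-wide seqId the same check would have to use the minimum over all families, so a recently flushed large family would still have its edits replayed; the per-family map avoids that.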
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215501#comment-14215501 ]

zhangduo commented on HBASE-10201:
--

{quote} You could do something like what Gaurav Menghani suggests above (see https://issues.apache.org/jira/browse/HBASE-10201?focusedCommentId=14191203&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14191203): rather than reporting, on successful flush, the highest sequenceid of all of a region's memstores involved in the flush, when you flush only a column family you'd have to report one less than the oldest outstanding edit still alive in a column family memstore. {quote}
Yes, this is what the patch is doing now. It is the approach with the least impact on existing code.

{quote} What if you did something much less involved; when there is pressure to flush, flush the stores with the oldest edits until you've freed enough memory? {quote}
I think we need to identify the reason why we need a flush. If we need a flush due to large memstore size, then flushing the large store is enough. If we need a flush because the oldest seqId still alive in the memstore is far behind the current one (which means we have lots of WALs that cannot be archived), then we need to flush the store which has the oldest seqId in its memstore (or maybe just flush all the stores? simple but useful). Maybe I can change the return value of shouldFlush from boolean to an enum to indicate the reason why we need a flush.

Port 'Make flush decisions per column family' to trunk -- Key: HBASE-10201 URL: https://issues.apache.org/jira/browse/HBASE-10201 Project: HBase Issue Type: Improvement Components: wal Reporter: Ted Yu Assignee: zhangduo Priority: Critical Fix For: 2.0.0, 0.98.9, 0.99.2 Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_2.patch, HBASE-10201_3.patch, HBASE-10201_4.patch, HBASE-10201_5.patch, HBASE-10201_6.patch, HBASE-10201_7.patch Currently the flush decision is made using the aggregate size of all column families. When large and small column families co-exist, this causes many small flushes of the smaller CF. We need to make per-CF flush decisions.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
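Editor's note: the suggestion above to have shouldFlush return an enum rather than a boolean could be sketched as follows. Everything here is an assumption made for illustration: the enum constants, the `StoreState` type, the thresholds and the `selectStores` helper are hypothetical and are not taken from the patch. The sketch picks the "just flush all the stores" option for the old-edits case.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; these names and thresholds are not HBase APIs.
final class FlushReasonSketch {

  /** Why a flush is needed; replaces a plain boolean result. */
  enum FlushReason { NONE, MEMSTORE_SIZE, OLD_EDITS }

  static final class StoreState {
    final String family;
    final long memstoreBytes;

    StoreState(String family, long memstoreBytes) {
      this.family = family;
      this.memstoreBytes = memstoreBytes;
    }
  }

  /** Memstore size is checked first; otherwise a large seqId lag signals pinned WALs. */
  static FlushReason shouldFlush(long regionMemstoreBytes, long flushSizeBytes,
      long oldestUnflushedSeqId, long currentSeqId, long maxSeqIdLag) {
    if (regionMemstoreBytes >= flushSizeBytes) {
      return FlushReason.MEMSTORE_SIZE;
    }
    if (currentSeqId - oldestUnflushedSeqId > maxSeqIdLag) {
      return FlushReason.OLD_EDITS;
    }
    return FlushReason.NONE;
  }

  /** MEMSTORE_SIZE flushes only large stores; OLD_EDITS flushes every store. */
  static List<String> selectStores(FlushReason reason, List<StoreState> stores,
      long largeStoreBytes) {
    List<String> selected = new ArrayList<>();
    for (StoreState s : stores) {
      boolean pick = reason == FlushReason.OLD_EDITS
          || (reason == FlushReason.MEMSTORE_SIZE && s.memstoreBytes >= largeStoreBytes);
      if (pick) {
        selected.add(s.family);
      }
    }
    return selected;
  }
}
```

Carrying the reason through to store selection is the point of the enum: the caller no longer has to re-derive why the flush was requested.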
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14215702#comment-14215702 ] stack commented on HBASE-10201: --- bq. If we need a flush due to large memstore size, then flush large store is enough. To be clear, you are suggesting that we would flush the big store but we would not move the sequenceid forward; it would still be one less than the oldest edit still in a memstore? Then, our other flush forcing function, the one that wants to clear up old WALs, would come along and force the flushing of the old memstores? That is an interesting idea. Port 'Make flush decisions per column family' to trunk -- Key: HBASE-10201 URL: https://issues.apache.org/jira/browse/HBASE-10201 Project: HBase Issue Type: Improvement Components: wal Reporter: Ted Yu Assignee: zhangduo Priority: Critical Fix For: 2.0.0, 0.98.9, 0.99.2 Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_2.patch, HBASE-10201_3.patch, HBASE-10201_4.patch, HBASE-10201_5.patch, HBASE-10201_6.patch, HBASE-10201_7.patch, HBASE-10201_8.patch Currently the flush decision is made using the aggregate size of all column families. When large and small column families co-exist, this causes many small flushes of the smaller CF. We need to make per-CF flush decisions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215724#comment-14215724 ]

Hadoop QA commented on HBASE-10201:
---

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12682076/HBASE-10201_8.patch
against trunk revision .
ATTACHMENT ID: 12682076

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 25 new or modified tests.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100
{color:green}+1 site{color}. The mvn site goal succeeds with this patch.
{color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/11727//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11727//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11727//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11727//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11727//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11727//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11727//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11727//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11727//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11727//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11727//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11727//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/11727//artifact/patchprocess/checkstyle-aggregate.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/11727//console

This message is automatically generated.

Port 'Make flush decisions per column family' to trunk -- Key: HBASE-10201 URL: https://issues.apache.org/jira/browse/HBASE-10201 Project: HBase Issue Type: Improvement Components: wal Reporter: Ted Yu Assignee: zhangduo Priority: Critical Fix For: 2.0.0, 0.98.9, 0.99.2 Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_2.patch, HBASE-10201_3.patch, HBASE-10201_4.patch, HBASE-10201_5.patch, HBASE-10201_6.patch, HBASE-10201_7.patch, HBASE-10201_8.patch Currently the flush decision is made using the aggregate size of all column families. When large and small column families co-exist, this causes many small flushes of the smaller CF. We need to make per-CF flush decisions.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215740#comment-14215740 ]

zhangduo commented on HBASE-10201:
--

{quote} To be clear, you are suggesting that we would flush the big store but we would not move the sequenceid forward; it would still be one less than the oldest edit still in a memstore? {quote}
Yes, the completed sequenceId is always the oldest edit still in a memstore minus one, unless we flush all stores.

{quote} Then, our other flush forcing function, the one that wants to clear up old WALs, would come along and force the flushing of the old memstores? {quote}
Yes, for LogRoller and PeriodicMemstoreFlusher.

Port 'Make flush decisions per column family' to trunk -- Key: HBASE-10201 URL: https://issues.apache.org/jira/browse/HBASE-10201 Project: HBase Issue Type: Improvement Components: wal Reporter: Ted Yu Assignee: zhangduo Priority: Critical Fix For: 2.0.0, 0.98.9, 0.99.2 Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_2.patch, HBASE-10201_3.patch, HBASE-10201_4.patch, HBASE-10201_5.patch, HBASE-10201_6.patch, HBASE-10201_7.patch, HBASE-10201_8.patch Currently the flush decision is made using the aggregate size of all column families. When large and small column families co-exist, this causes many small flushes of the smaller CF. We need to make per-CF flush decisions.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
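Editor's note: the rule stated in the comment above (report the oldest edit still in a memstore minus one, unless every store was flushed) can be written down as a small function. This is a sketch under assumptions: `CompletedSeqIdSketch` and `completedSeqId` are hypothetical names, the map is assumed to contain only families whose memstores still hold unflushed edits after the flush, and `flushSeqId` is assumed to be the sequence id at which the flush was taken.

```java
import java.util.Map;

// Hypothetical illustration only; not the code from the patch.
final class CompletedSeqIdSketch {

  /**
   * @param oldestUnflushedSeqIdByFamily oldest edit still held in memory, per
   *        family, after the flush; families with empty memstores are absent
   * @param flushSeqId sequence id at which the flush was taken
   * @return the sequence id that can safely be reported as completed
   */
  static long completedSeqId(Map<String, Long> oldestUnflushedSeqIdByFamily,
      long flushSeqId) {
    if (oldestUnflushedSeqIdByFamily.isEmpty()) {
      return flushSeqId; // every store was flushed, nothing older remains in memory
    }
    long oldest = Long.MAX_VALUE;
    for (long seqId : oldestUnflushedSeqIdByFamily.values()) {
      oldest = Math.min(oldest, seqId);
    }
    return oldest - 1; // one less than the oldest edit still alive in a memstore
  }
}
```

For example, if only the large family is flushed at seqId 500 while two small families still hold edits from seqIds 120 and 300, the reported value stays at 119 until those memstores are flushed too.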
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14215743#comment-14215743 ] stack commented on HBASE-10201: --- You have any means of trying out your patch to get rough numbers to see if it helps [~Apache9]? Port 'Make flush decisions per column family' to trunk -- Key: HBASE-10201 URL: https://issues.apache.org/jira/browse/HBASE-10201 Project: HBase Issue Type: Improvement Components: wal Reporter: Ted Yu Assignee: zhangduo Priority: Critical Fix For: 2.0.0, 0.98.9, 0.99.2 Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_2.patch, HBASE-10201_3.patch, HBASE-10201_4.patch, HBASE-10201_5.patch, HBASE-10201_6.patch, HBASE-10201_7.patch, HBASE-10201_8.patch Currently the flush decision is made using the aggregate size of all column families. When large and small column families co-exist, this causes many small flushes of the smaller CF. We need to make per-CF flush decisions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14215745#comment-14215745 ] stack commented on HBASE-10201: --- [~Apache9] Thanks for working on this. Port 'Make flush decisions per column family' to trunk -- Key: HBASE-10201 URL: https://issues.apache.org/jira/browse/HBASE-10201 Project: HBase Issue Type: Improvement Components: wal Reporter: Ted Yu Assignee: zhangduo Priority: Critical Fix For: 2.0.0, 0.98.9, 0.99.2 Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_2.patch, HBASE-10201_3.patch, HBASE-10201_4.patch, HBASE-10201_5.patch, HBASE-10201_6.patch, HBASE-10201_7.patch, HBASE-10201_8.patch Currently the flush decision is made using the aggregate size of all column families. When large and small column families co-exist, this causes many small flushes of the smaller CF. We need to make per-CF flush decisions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207872#comment-14207872 ]

Hadoop QA commented on HBASE-10201:
---

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12681029/HBASE-10201_6.patch
against trunk revision .
ATTACHMENT ID: 12681029

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 25 new or modified tests.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning messages.
{color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100
{color:green}+1 site{color}. The mvn site goal succeeds with this patch.
{color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/11649//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11649//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11649//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11649//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11649//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11649//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11649//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11649//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11649//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11649//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11649//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11649//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/11649//artifact/patchprocess/checkstyle-aggregate.html
Javadoc warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11649//artifact/patchprocess/patchJavadocWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/11649//console

This message is automatically generated.

Port 'Make flush decisions per column family' to trunk -- Key: HBASE-10201 URL: https://issues.apache.org/jira/browse/HBASE-10201 Project: HBase Issue Type: Improvement Components: wal Reporter: Ted Yu Assignee: zhangduo Priority: Critical Fix For: 2.0.0, 0.98.9, 0.99.2 Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_2.patch, HBASE-10201_3.patch, HBASE-10201_4.patch, HBASE-10201_5.patch, HBASE-10201_6.patch Currently the flush decision is made using the aggregate size of all column families. When large and small column families co-exist, this causes many small flushes of the smaller CF. We need to make per-CF flush decisions.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14208277#comment-14208277 ] stack commented on HBASE-10201: --- This cannot go in till HBASE-12405 is done, right? (I'm trying to write up a doc on how sequenceid is used in hbase to help). How'd you generate the numbers and what is WAF? Thanks [~Apache9] Port 'Make flush decisions per column family' to trunk -- Key: HBASE-10201 URL: https://issues.apache.org/jira/browse/HBASE-10201 Project: HBase Issue Type: Improvement Components: wal Reporter: Ted Yu Assignee: zhangduo Priority: Critical Fix For: 2.0.0, 0.98.9, 0.99.2 Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_2.patch, HBASE-10201_3.patch, HBASE-10201_4.patch, HBASE-10201_5.patch, HBASE-10201_6.patch Currently the flush decision is made using the aggregate size of all column families. When large and small column families co-exist, this causes many small flushes of the smaller CF. We need to make per-CF flush decisions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208962#comment-14208962 ]

zhangduo commented on HBASE-10201:
--

I used to think this should go into master first as an experimental feature, with HBASE-12405 based on this issue. Now I think you are right, stack: it seems more natural to base HBASE-10201 on the work of HBASE-12405, not the reverse. Never mind, I will finish HBASE-12405 as soon as possible.

{quote} How'd you generate the numbers and what is WAF? {quote}
I created a table with 3 CFs, disabled splits (by using a large constant split size), and put 1M rows into the table. The key is 16B, with a 16B value for CF1, a 256B value for CF2, and a 4KB value for CF3. The result numbers are copied from the JMX web page of the regionserver. WAF is short for write amplification, and I calculate it simply as numBytesCompactedCount/storeFileSize.

Thanks.

Port 'Make flush decisions per column family' to trunk -- Key: HBASE-10201 URL: https://issues.apache.org/jira/browse/HBASE-10201 Project: HBase Issue Type: Improvement Components: wal Reporter: Ted Yu Assignee: zhangduo Priority: Critical Fix For: 2.0.0, 0.98.9, 0.99.2 Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_2.patch, HBASE-10201_3.patch, HBASE-10201_4.patch, HBASE-10201_5.patch, HBASE-10201_6.patch Currently the flush decision is made using the aggregate size of all column families. When large and small column families co-exist, this causes many small flushes of the smaller CF. We need to make per-CF flush decisions.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
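Editor's note: the WAF calculation given in the comment above, numBytesCompactedCount/storeFileSize, is simple enough to show as a worked example. The class and method names below are hypothetical; the two input values are the per-CF flush metrics quoted elsewhere in this thread (metric_numBytesCompactedCount and metric_storeFileSize).

```java
// Hypothetical helper for illustration; the formula is the one stated in the comment.
final class WriteAmplificationSketch {

  /** WAF = bytes rewritten by compactions / bytes currently held in store files. */
  static double writeAmplification(long numBytesCompactedCount, long storeFileSize) {
    if (storeFileSize <= 0) {
      throw new IllegalArgumentException("storeFileSize must be positive");
    }
    return (double) numBytesCompactedCount / storeFileSize;
  }

  public static void main(String[] args) {
    // Metrics reported in this thread for the per-CF flush run.
    double waf = writeAmplification(10353718691L, 4369570622L);
    System.out.printf("Write amplification: %.2f%n", waf);
  }
}
```

Rounded to two decimals, these inputs give the 2.37 reported with those metrics.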
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14198411#comment-14198411 ] Jean-Marc Spaggiari commented on HBASE-10201: - Do you have the metrics for the latest version of the patch? With a previous version it was: With per CF flush: metric_storeCount: 3, metric_storeFileCount: 7, metric_memStoreSize: 110195648, metric_storeFileSize: 4369570622, metric_compactionsCompletedCount: 27, metric_numBytesCompactedCount: 10353718691, metric_numFilesCompactedCount: 89, Write amplification: 2.37 Has it changed?
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14198501#comment-14198501 ] zhangduo commented on HBASE-10201: -- No, I have not run the test on the master branch yet. I will run it tomorrow because it is already 23:15 in China... Sad about the time zone difference...
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14199822#comment-14199822 ] zhangduo commented on HBASE-10201: --
Results on master branch

2.0.0-SNAPSHOT, revision=ecd708671c135052a175c88603d5215a0434e4fa
metric_storeCount: 3, metric_storeFileCount: 9, metric_memStoreSize: 40117320, metric_storeFileSize: 4461018704, metric_compactionsCompletedCount: 92, metric_numBytesCompactedCount: 22091556672, metric_numFilesCompactedCount: 290
Write amplification (numBytesCompactedCount/storeFileSize): 4.95
Elapsed time: 23m32s

2.0.0-SNAPSHOT, revision=ecd708671c135052a175c88603d5215a0434e4fa with HBASE-10201
metric_storeCount: 3, metric_storeFileCount: 8, metric_memStoreSize: 16400424, metric_storeFileSize: 4483028246, metric_compactionsCompletedCount: 54, metric_numBytesCompactedCount: 20497293164, metric_numFilesCompactedCount: 178
Write amplification (numBytesCompactedCount/storeFileSize): 4.57
Elapsed time: 23m5s

2.0.0-SNAPSHOT, revision=ecd708671c135052a175c88603d5215a0434e4fa with HBASE-10201 but selective flush disabled
metric_storeCount: 3, metric_storeFileCount: 9, metric_memStoreSize: 39937056, metric_storeFileSize: 4461185232, metric_compactionsCompletedCount: 92, metric_numBytesCompactedCount: 22092540348, metric_numFilesCompactedCount: 290
Write amplification (numBytesCompactedCount/storeFileSize): 4.95
Elapsed time: 22m51s

It seems the default config on master compacts more aggressively, but the WAF decrease has not changed much.
(4.95-4.57)/4.95=7.68%
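The write amplification figures in this comment can be re-derived from the posted counters. The following is a minimal sketch for checking the arithmetic, not code from the patch: the class and method names are made up for illustration, and only the numeric inputs are taken from the results above.

```java
import java.util.Locale;

/** Hypothetical helper (not part of HBase) that re-derives WAF from the posted JMX counters. */
final class WafCheck {

    /** WAF as defined in this thread: numBytesCompactedCount / storeFileSize. */
    static double waf(long numBytesCompacted, long storeFileSize) {
        return (double) numBytesCompacted / storeFileSize;
    }

    /** Relative decrease of 'after' with respect to 'before', as a fraction. */
    static double relativeDecrease(double before, double after) {
        return (before - after) / before;
    }

    public static void main(String[] args) {
        // Counters copied from the results above.
        double master = waf(22091556672L, 4461018704L); // master without the patch
        double perCf = waf(20497293164L, 4483028246L);  // master with HBASE-10201
        System.out.printf(Locale.ROOT, "master=%.2f perCF=%.2f decrease=%.1f%%%n",
            master, perCf, 100 * relativeDecrease(master, perCf));
    }
}
```

Working from the unrounded ratios gives a relative decrease of roughly 7.7%, in line with the 7.68% obtained above from the rounded values 4.95 and 4.57.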
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14190775#comment-14190775 ] stack commented on HBASE-10201: --- Let me review this patch once more. That all tests pass with it enabled is encouraging. We can work on these hbase-it test failures separately. It is not your issue.
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14190804#comment-14190804 ] Gaurav Menghani commented on HBASE-10201: - [~Apache9] Great work porting this patch! Glad to see this getting ported from 0.89-fb to trunk :) Please let me know if you need any help.
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14190830#comment-14190830 ] Ted Yu commented on HBASE-10201: +1 on turning on this in master branch.
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14191008#comment-14191008 ] stack commented on HBASE-10201: ---
I wrote the list to get another opinion on the patch pre-commit. Has this patch been deployed somewhere in production (smile)? If so, it would be good to know. Does it help in production?

On re-review:

This value should be less than half of the total memstore
+threshold (hbase.hregion.memstore.flush.size).

Do we ensure this in code? If not, should we?

bq. I think it is better to open another issue to handle the duplication.

Can you do this for the accounting fixup, so that it is done by-Store in HLog?

We should log when we do this:

+long columnfamilyFlushSize = this.htableDescriptor
+.getMemStoreColumnFamilyFlushSize();
+if (columnfamilyFlushSize <= 0) {
+ columnfamilyFlushSize = conf.getLong(
+ HConstants.HREGION_MEMSTORE_COLUMNFAMILY_FLUSH_SIZE_LOWER_BOUND,
+ HTableDescriptor.DEFAULT_MEMSTORE_COLUMNFAMILY_FLUSH_SIZE_LOWER_BOUND);

I can add that on commit unless we are doing a new version.

This does not have to be public since it is used from the same package:

+ public long getEarliestFlushTimeForAllStores() {

Ditto for getLatestFlushTimeForAllStores, and for isPerColumnFamilyFlushEnabled.

nit: Guard debug logging with an if LOG.isDebugEnabled...

+ LOG.debug("Since none of the CFs were above the size, flushing all.");

When we flush, we write the flush sequenceid to the WAL. This patch should have no effect on it.

Sequenceids are region scoped. If we flush by Store, will there be holes in our accounting? For example, take 3 column families, A, B, and C. I write sequenceid 1 to A, sequenceid 2 to B, and sequenceid 3 to C. I then write sequenceid 4 to A. The edit at sequenceid 4 is big, pushes us over the threshold, and brings on a flush. We flush A, and with it edits 1 and 4. Is the fact that edits 2 and 3 are still up in memory going to mess us up? Say the server crashes: at replay time we see we flushed up to edit 4. Will we think that edits 2 and 3 were persisted? If you don't have an answer, I can work on the answer.
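Two of the review points above (fall back to the configured lower bound and log when that happens, and check that the lower bound stays below half of the memstore flush size) could be sketched as follows. This is a standalone illustration with hypothetical names, using plain long parameters in place of the HTableDescriptor and Configuration lookups. It is not the patch's code, and whether the patch should enforce the half-size constraint is exactly the open question raised in the review.

```java
import java.util.logging.Logger;

/** Hypothetical sketch (not HBase code) of resolving the per-CF flush lower bound. */
final class FlushLowerBound {
    private static final Logger LOG = Logger.getLogger(FlushLowerBound.class.getName());

    /**
     * Returns the table-level value when it is set (positive); otherwise falls back
     * to the configured default and logs that the fallback happened.
     */
    static long resolve(long tableValue, long configuredDefault) {
        if (tableValue > 0) {
            return tableValue;
        }
        LOG.info("Per-CF flush lower bound not set on table; using configured value "
            + configuredDefault);
        return configuredDefault;
    }

    /** The documented constraint: the lower bound should be less than half the flush size. */
    static boolean isBelowHalfOfFlushSize(long lowerBound, long memstoreFlushSize) {
        return lowerBound < memstoreFlushSize / 2;
    }

    public static void main(String[] args) {
        long flushSize = 128L * 1024 * 1024;           // example region flush size, assumed
        long lowerBound = resolve(-1L, 16L * 1024 * 1024); // unset on table, example default
        System.out.println("lowerBound=" + lowerBound
            + " valid=" + isBelowHalfOfFlushSize(lowerBound, flushSize));
    }
}
```

Keeping the check separate from the fallback leaves open what to do on a violation (warn, clamp, or reject), which the thread does not settle.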
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14191015#comment-14191015 ] Gaurav Menghani commented on HBASE-10201: - [~stack] In my design, in this case, 1 and 4 are flushed, but 2 and 3 are retained in memory. However, we can only mark 1 as safe: 2, 3 and 4 will all be replayed if the server crashes. I am not sure if this has changed in the patch. The per-CF change is not running in prod right now. I didn't see any big difference when deploying it out of the box with the biggest customer, where we have a lot of CFs (probably also highlighted by the small difference in WAF). But I can try running it internally on a shadow cluster again. Let me know if there are some interesting metrics you want me to look at.
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14191091#comment-14191091 ] zhangduo commented on HBASE-10201: --
{quote}
Sequenceids are region scoped. If we flush by Store, will there be holes in our accounting? I write sequenceid 1 to A, sequenceid 2 to B, and sequenceid 3 to C. I then write sequence 4 to A. The edit at sequenceid 4 is big and pushes us over and brings on a flush. We flush A and edits 1 and 4. Is the fact that edits 2 and 3 are still up in memory going to mess us up? Say the server crashes, at replay time we see we flushed up to edit 4, will we think that edits 2 and 3 persisted? If you don't have an answer, I can work on the answer.
{quote}
Yes. We should write flush seqId 1 in this case. (Oh, I made a mistake: I currently write seqId 2 in this case. flushSeqId = oldestSeqIdInStoresNotToFlush should be flushSeqId = oldestSeqIdInStoresNotToFlush - 1. I will fix it.) So there will be holes, and some WAL replay during recovery will be unnecessary. To solve this we need to store a map of seqId per store instead of a single seqId, and it also needs some work on log truncation and log replay.
{quote}
Has this patch been deployed somewhere in production (smile?). If so, would be good to know. In production, it helps?
{quote}
For me, no. I am using 0.98.6.1 with HBASE-12078 patched in right now (which is why I first tried to port it to 0.98 in this issue...). Some test results are posted above. And in our production, I always see logs like this:
{quote}
2014-09-29 13:16:25,061 INFO [MemStoreFlusher.0] regionserver.HRegion: Started memstore flush for sync:Snapshot,\x00\x00\x00\x00\x02$\x0CC,1411782012686.50aba6be7ff3150be983cb6fd77fc686., current region memstore size 128.3 M
2014-09-29 13:16:25,121 INFO [MemStoreFlusher.0] regionserver.DefaultStoreFlusher: Flushed, sequenceid=10932563, memsize=265.7 K, hasBloomFilter=true, into tmp file hdfs://online-hbase/hbase/data/sync/Snapshot/50aba6be7ff3150be983cb6fd77fc686/.tmp/129e5ef69d7449fea9c2357aa6c4340a
2014-09-29 13:16:25,192 INFO [MemStoreFlusher.0] regionserver.DefaultStoreFlusher: Flushed, sequenceid=10932563, memsize=2.2 M, hasBloomFilter=true, into tmp file hdfs://online-hbase/hbase/data/sync/Snapshot/50aba6be7ff3150be983cb6fd77fc686/.tmp/316fee39423142e09cdb767de9f9bc5d
2014-09-29 13:16:25,528 INFO [MemStoreFlusher.0] regionserver.DefaultStoreFlusher: Flushed, sequenceid=10932563, memsize=27.9 M, hasBloomFilter=true, into tmp file hdfs://online-hbase/hbase/data/sync/Snapshot/50aba6be7ff3150be983cb6fd77fc686/.tmp/a886c1e39565468fbf93be6c434f5fc5
2014-09-29 13:16:26,190 INFO [MemStoreFlusher.0] regionserver.DefaultStoreFlusher: Flushed, sequenceid=10932563, memsize=98.0 M, hasBloomFilter=true, into tmp file hdfs://online-hbase/hbase/data/sync/Snapshot/50aba6be7ff3150be983cb6fd77fc686/.tmp/ec722497c6e14d0fa732c2a9d29e3391
{quote}
The smallest store is always flushed with only KBs. That's the reason why I found this issue and started working on it...
{quote}
Can you do this for the accounting fixup so by-Store in HLog.
{quote}
Yes, I can open another issue to work on this. Thanks.
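The A/B/C scenario discussed in this exchange can be modelled with a small toy example. It is an illustration only, assuming a hypothetical map from column family name to the oldest sequence id still held in that family's memstore; it is not the HRegion or FSHLog code, and all names are invented. It applies the corrected rule stated in the comment, flushSeqId = oldestSeqIdInStoresNotToFlush - 1. The behaviour when every store is flushed (report the region's highest sequence id) is an assumption of this sketch.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Toy model (not HBase code) of choosing the flush sequence id for a selective flush. */
final class SelectiveFlushSeqId {

    /**
     * @param oldestUnflushedSeqId per store, the oldest sequence id still in its memstore
     * @param storesToFlush        stores selected for this flush
     * @param maxSeqIdInRegion     highest sequence id assigned so far in the region
     * @return the sequence id up to which every edit is known to be persisted
     */
    static long flushSeqId(Map<String, Long> oldestUnflushedSeqId,
                           Set<String> storesToFlush,
                           long maxSeqIdInRegion) {
        long oldestInStoresNotToFlush = Long.MAX_VALUE;
        for (Map.Entry<String, Long> e : oldestUnflushedSeqId.entrySet()) {
            if (!storesToFlush.contains(e.getKey())) {
                oldestInStoresNotToFlush = Math.min(oldestInStoresNotToFlush, e.getValue());
            }
        }
        if (oldestInStoresNotToFlush == Long.MAX_VALUE) {
            // Assumption of this sketch: nothing stays in memory, so everything is persisted.
            return maxSeqIdInRegion;
        }
        // Edits at or after the oldest retained edit may still be only in memory.
        return oldestInStoresNotToFlush - 1;
    }

    public static void main(String[] args) {
        Map<String, Long> oldest = new LinkedHashMap<>();
        oldest.put("A", 1L); // A holds edits 1 and 4
        oldest.put("B", 2L); // B holds edit 2
        oldest.put("C", 3L); // C holds edit 3
        long seqId = flushSeqId(oldest, Collections.singleton("A"), 4L);
        System.out.println("flushSeqId=" + seqId);
    }
}
```

With only A flushed, the sketch reports 1, so edits 2, 3 and 4 would all be replayed after a crash even though edit 4 is already on disk. That is the unnecessary replay mentioned above, which a per-store sequence id map would avoid.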
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14191117#comment-14191117 ] Hadoop QA commented on HBASE-10201: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12678367/HBASE-10201_4.patch against trunk revision . ATTACHMENT ID: 12678367 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 25 new or modified tests. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/11531//console This message is automatically generated.
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14191203#comment-14191203 ] stack commented on HBASE-10201: --- [~gaurav.menghani] Thank you for helping land this upstream and thanks for the update on its state at your shop. What about the recording of last-flushed-sequenceid at the master, which tells it which edits it can safely skip replaying for a region after a crash; would that only report '1' in our scenario above? Thanks. [~Apache9] Thanks for the new patch. I think I need to go through and check sequenceid accounting.
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14191302#comment-14191302 ] Hadoop QA commented on HBASE-10201: ---
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12678382/HBASE-10201_5.patch against trunk revision .
ATTACHMENT ID: 12678382

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 25 new or modified tests.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100
{color:green}+1 site{color}. The mvn site goal succeeds with this patch.
{color:red}-1 core tests{color}. The patch failed these unit tests: org.apache.hadoop.hbase.coprocessor.TestCoprocessorHConnection

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/11533//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11533//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11533//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11533//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11533//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11533//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11533//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11533//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11533//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11533//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11533//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11533//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/11533//artifact/patchprocess/checkstyle-aggregate.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/11533//console

This message is automatically generated.
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188070#comment-14188070 ] stack commented on HBASE-10201: ---
bq. I think it is better to open another issue to handle the duplication.
OK.
bq. getEarliestFlushTimeForAllStore should be public because TestIOFencing uses it (which is in another package).
FYI, we mark these with the @VisibleForTesting annotation. I can do that on commit.
bq. but I see lots of other similar methods declared as public...
Yeah, sorry about that; we ain't always consistent, despite trying.
bq. Does this meet the requirement?
Yes. Out of interest, are you using the hbase formatter?
bq. I tried but failed to make dev-support/test-patch.sh work properly...
Yeah, this stuff is focused on the master. Unit tests passing on branch-1 would be great. Just note it here in the issue. Are you going to try hbase-it? Thanks.
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188093#comment-14188093 ] zhangduo commented on HBASE-10201: -- {quote} Out of interest, are you using the hbase formatter? {quote} No, I just use the default formatter with the indent and max line length changed. Only new code is auto-formatted; old code is formatted manually to keep the patch clean... I will try the hbase formatter later. I found it when looking for test-patch.sh, thanks. {quote} You going to try hbase-it? {quote} Yes, I have run it with 'mvn verify' under hbase-it. There are some failures and errors; I need to look at the source code to identify the reason. Thanks.
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189417#comment-14189417 ] stack commented on HBASE-10201: --- bq. Yes I have run it with 'mvn verify' under hbase-it. There are some fails and errors, I need to see the source code to identify the reason. Suggest you first run it before your patch is applied. It may not be in a healthy state currently.
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189517#comment-14189517 ] zhangduo commented on HBASE-10201: --
Yes, I ran it without the patch first. The result is:
{quote}
Results :

Failed tests:
  IntegrationTestIngestWithACL>IntegrationTestBase.setUp:122->setUpCluster:64->IntegrationTestIngest.setUpCluster:88->IntegrationTestIngest.initTable:93 Failed to initialize LoadTestTool expected:<0> but was:<1>

Tests in error:
  IntegrationTestMTTR.testRestartRsHoldingTable:261->run:305 » Execution org.apa...

Tests run: 20, Failures: 1, Errors: 1, Skipped: 1
{quote}
The result with the patch is:
{quote}
Results :

Failed tests:
  IntegrationTestIngestWithVisibilityLabels>IntegrationTestIngest.testIngest:104->IntegrationTestIngest.runIngestTest:166 Update failed with error code 1
  IntegrationTestIngestWithACL>IntegrationTestBase.setUp:122->setUpCluster:64->IntegrationTestIngest.setUpCluster:88->IntegrationTestIngest.initTable:93 Failed to initialize LoadTestTool expected:<0> but was:<1>
  IntegrationTestIngestWithTags>IntegrationTestIngest.testIngest:104->IntegrationTestIngest.runIngestTest:174 Verification failed with error code 1

Tests in error:
  IntegrationTestMTTR.testRestartRsHoldingTable:261->run:305 » Execution org.apa...
  IntegrationTestMTTR.testMoveRegion:271->run:305 » Execution org.apache.hadoop

Tests run: 20, Failures: 3, Errors: 2, Skipped: 1
{quote}
IntegrationTestMTTR.testMoveRegion passes if I run it separately, with the other methods in the same class commented out, using the command mvn clean test-compile failsafe:integration-test -Dit.test=IntegrationTestMTTR -DfailIfNoTests=false.

Now I'm debugging IntegrationTestIngestWithVisibilityLabels, but the log is flooded with
{quote}
java.io.IOException: Compression algorithm 'lz4' previously failed test.
 at org.apache.hadoop.hbase.util.CompressionTest.testCompression(CompressionTest.java:90)
 at org.apache.hadoop.hbase.regionserver.HRegion.checkCompressionCodecs(HRegion.java:4936)
 at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4923)
 at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4896)
 at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4868)
 at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4824)
 at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4775)
 at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:276)
 at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:103)
 at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:103)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
java.io.IOException: Compression algorithm 'snappy' previously failed test.
 at org.apache.hadoop.hbase.util.CompressionTest.testCompression(CompressionTest.java:90)
 at org.apache.hadoop.hbase.regionserver.HRegion.checkCompressionCodecs(HRegion.java:4936)
 at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4923)
 at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4896)
 at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4868)
 at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4824)
 at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4775)
 at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:276)
 at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:103)
 at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:103)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
{quote}
and it is hard to find useful information in the output. I have compiled the hadoop native libs, but I do not know where to place them when running tests... Or is there a way to disable compression when running integration tests? I think the result will not change, since the patch has nothing to do with compression... Thanks.
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186735#comment-14186735 ]

zhangduo commented on HBASE-10201:
--

{quote}
Oh, seems like this is lower bound. Only families larger than this size get flushed. That's nice. Need to rename I'd say. DEFAULT_MEMSTORE_COLUMNFAMILY_FLUSH_SIZE_LOWER_BOUND?
{quote}
Done

{quote}
nit: '+ (just as usual). This value should less than half of the total memstore' missing a 'be'
{quote}
Done

{quote}
Are we double-accounting here?
+ // keep track of oldest sequence id of edit in a store.
+ private final ConcurrentMap<Store, AtomicLong> oldestSeqIdOfStore =
+ new ConcurrentHashMap<Store, AtomicLong>();
The oldest seqid to region is being done by the WAL subsystem IIRC. Now we are doing it in here by Store. Should we do it in the one place only? The WAL is keeping accounting so it knows when to release WALs that no longer have edits. Does this accounting interfere?
{quote}
I added a comment noting that there is duplication. There is other duplication in HRegion as well, such as lastFlushSeqId.
As for oldestSeqIdOfStore, I think it is better to store it in FSHLog. FSHLog is single-threaded and the sequence id is generated in that thread, so the update logic would be more straightforward with no performance issue, and only one place would need to be modified instead of the four places in the current solution. But this means changing FSHLog's tracking unit from Region to Store, which may take a lot of work that is not related to this issue. I think it is better to open another issue to handle the duplication.

{quote}
Add a log WARN here:
+ if (columnfamilyFlushSize <= 0) {
?
{quote}
This is the same as the initialization code for memstoreFlushSize above. 0 is possible if it is not set in HTableDescriptor.

{quote}
Name 'getMinFlushTimeForAllStores' as 'getEarliestFlushTimeForAllStore'? I think it clearer that it does if you have 'earlier' in there (as you have it in your comments).
In sympathy add 'latest' into this method name getMaxFlushTimeForAllStores
Do these two above methods need to be public? Can they be package private? Do they need to be exposed at all?
Ditto for this isPerColumnFamilyFlushEnabled and flushcache
{quote}
Renaming is done. getEarliestFlushTimeForAllStore should be public because TestIOFencing uses it (and it is in another package). The other methods can be package private, but I see lots of other similar methods declared as public...

{quote}
Should be a WARN:
+ LOG.debug("Disabling selective flushing of Column Families' memstores.");
?
This comment right? Should it be 'region' rather than 'memstore' in some of the below?
+ // We now have to flush the memstore since it has
+ // reached the threshold, however, we might not need
+ // to flush the entire memstore. If there are certain
{quote}
Done

{quote}
Make one log line rather than two:
+ LOG.info("Started memstore flush for " + this + ", current region memstore size "
+ + StringUtils.byteDesc(this.memstoreSize.get()) + ", and " + storesToFlush.size() + "/"
+ + stores.size() + " column families' memstores are being flushed."
+ + ((wal != null) ? "" : "; wal is null, using passed sequenceid=" + myseqid));
+ for (Store store: storesToFlush) {
{quote}
I used a formatter to wrap it automatically. I have now modified it manually to
{code:title=HRegion.java|borderStyle=solid}
LOG.info("Started memstore flush for " + this + ", current region memstore size "
    + StringUtils.byteDesc(this.memstoreSize.get()) + ", and " + storesToFlush.size() + "/"
    + stores.size() + " column families' memstores are being flushed."
    + ((wal != null) ? "" : "; wal is null, using passed sequenceid=" + myseqid));
{code}
Does this meet the requirement?

{quote}
How you justify removing this?
flushSeqId = getNextSequenceId(wal);
{quote}
It is not removed. I moved it into startCacheFlush.

{quote}
nit: in below
if ((now - getLastFlushTime() < flushCheckInterval)) {
+ if ((now - getMinFlushTimeForAllStores() < flushCheckInterval)) {
The former is 'LastFlushTime' and the new code is 'MinFlushTime'... which should it be? Do we intend same thing here?
{quote}
I think it should be getLatestFlushTimeForAllStores, not getEarliestFlushTimeForAllStores, although we may not flush all the stores.

{quote}
Instead of
+ for (AtomicLong oldestSeqId: needToUpdate) { ...
can you use what is in AtomicUtils?
{quote}
I now use AtomicUtils.updateMin instead. Thanks.

{quote}
This can't be package private: getOldestSeqIdOfStore?
{quote}
TestPerColumnFamilyFlush needs it and is in another package.

{quote}
What is difference between oldest and lowest in below?
Map<byte[], Long> oldestFlushingSeqNumsLocal = null;
Map<byte[], Long> oldestUnflushedSeqNumsLocal = null;
+ Map<byte[], Long> lowestFlushingRegionSequenceIdsLocal = null;
+ Map<byte[], Long> oldestUnflushedRegionSequenceIdsLocal = null;
{quote}
They are just copies of class fields, so
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186762#comment-14186762 ]

Hadoop QA commented on HBASE-10201:
---

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12677580/HBASE-10201_2.patch
against trunk revision .
ATTACHMENT ID: 12677580

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 25 new or modified tests.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

{color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:red}-1 lineLengths{color}. The patch introduces the following lines longer than 100:
+ + ", and " + storesToFlush.size() + "/" + stores.size() + " column families' memstores are being flushed."
+ * oldestUnflushedRegionSequenceIds. We use these Maps to find out the low bound regions sequence id, or
+ lowestFlushingRegionSequenceIdsLocal = new HashMap<byte[], Long>(this.lowestFlushingRegionSequenceIds);
+ Long oldValue = this.lowestFlushingRegionSequenceIds.put(encodedRegionName, oldRegionSeqNum);
+ hlog.startCacheFlush(hri1.getEncodedNameAsBytes(), Long.MAX_VALUE, Long.MAX_VALUE, sequenceId1);
+ private void flushRegion(HLog hlog, byte[] regionEncodedName, AtomicLong sequenceId) throws IOException {
+ final HLog wal, final long myseqid, Collection<Store> storesToFlush, MonitoredTask status)

{color:green}+1 site{color}. The mvn site goal succeeds with this patch.

{color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/11490//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11490//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11490//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11490//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11490//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11490//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11490//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11490//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11490//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11490//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11490//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11490//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/11490//artifact/patchprocess/checkstyle-aggregate.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/11490//console

This message is automatically generated.

Port 'Make flush decisions per column family' to trunk -- Key: HBASE-10201 URL: https://issues.apache.org/jira/browse/HBASE-10201 Project: HBase Issue Type: Improvement Components: wal Reporter: Ted Yu Assignee: zhangduo Priority: Critical Fix For: 2.0.0, 0.99.2 Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_2.patch Currently the flush decision is made using the aggregate size of all column
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14187090#comment-14187090 ]

Hadoop QA commented on HBASE-10201:
---

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12677610/HBASE-10201_3.patch
against trunk revision .
ATTACHMENT ID: 12677610

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 25 new or modified tests.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

{color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100

{color:green}+1 site{color}. The mvn site goal succeeds with this patch.

{color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/11492//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11492//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11492//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11492//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11492//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11492//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11492//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11492//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11492//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11492//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11492//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11492//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/11492//artifact/patchprocess/checkstyle-aggregate.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/11492//console

This message is automatically generated.
Port 'Make flush decisions per column family' to trunk -- Key: HBASE-10201 URL: https://issues.apache.org/jira/browse/HBASE-10201 Project: HBase Issue Type: Improvement Components: wal Reporter: Ted Yu Assignee: zhangduo Priority: Critical Fix For: 2.0.0, 0.99.2 Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_2.patch, HBASE-10201_3.patch Currently the flush decision is made using the aggregate size of all column families. When large and small column families co-exist, this causes many small flushes of the smaller CF. We need to make per-CF flush decisions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
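As an illustration of the per-CF decision described in the issue summary, here is a minimal sketch, not the code from the patch. It assumes that only families whose memstore exceeds a lower bound are selected, as the review comments in this thread suggest. The class name PerFamilyFlushSketch, the method selectFamiliesToFlush, and the fallback of flushing every family when none exceeds the bound are hypothetical choices made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of a per-column-family flush decision; not HBase code. */
final class PerFamilyFlushSketch {
    private PerFamilyFlushSketch() {}

    /**
     * Picks the families whose memstore size is above lowerBoundBytes.
     * Assumption for this sketch: if no family is above the bound,
     * every family is selected so that a flush still frees memory.
     */
    static List<String> selectFamiliesToFlush(Map<String, Long> memstoreSizes,
                                              long lowerBoundBytes) {
        List<String> selected = new ArrayList<>();
        for (Map.Entry<String, Long> entry : memstoreSizes.entrySet()) {
            if (entry.getValue() > lowerBoundBytes) {
                selected.add(entry.getKey());
            }
        }
        if (selected.isEmpty()) {
            // Fallback (assumed): behave like the aggregate, whole-region flush.
            selected.addAll(memstoreSizes.keySet());
        }
        return selected;
    }
}
```

With a large and a small family in the same region, only the large one is picked, so the small family is not flushed into many tiny files.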
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188024#comment-14188024 ]

Hadoop QA commented on HBASE-10201:
---

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12677829/HBASE-10201-0.99.patch
against trunk revision .
ATTACHMENT ID: 12677829

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 25 new or modified tests.

{color:red}-1 patch{color}. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/11501//console

This message is automatically generated.

Port 'Make flush decisions per column family' to trunk -- Key: HBASE-10201 URL: https://issues.apache.org/jira/browse/HBASE-10201 Project: HBase Issue Type: Improvement Components: wal Reporter: Ted Yu Assignee: zhangduo Priority: Critical Fix For: 2.0.0, 0.99.2 Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_2.patch, HBASE-10201_3.patch Currently the flush decision is made using the aggregate size of all column families. When large and small column families co-exist, this causes many small flushes of the smaller CF. We need to make per-CF flush decisions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14184997#comment-14184997 ]

Anoop Sam John commented on HBASE-10201:

{code}
+ // --
+ // STEP 6. Record oldest sequence id of memstore
+ // --
+ long seqId = walKey.getSequenceId();
+ for (Store store: storesUpdated) {
+   store.setSeqIdOfOldestEdit(seqId);
+ }
+ // ---
- // STEP 6. Release row locks, etc.
+ // STEP 7. Release row locks, etc.
  // ---
  if (locked) {
    this.updatesLock.readLock().unlock();
{code}
Adding this step here, under the row locks, will make us wait longer and will affect write throughput. [~stack] has done some tests related to this in another JIRA.

Port 'Make flush decisions per column family' to trunk -- Key: HBASE-10201 URL: https://issues.apache.org/jira/browse/HBASE-10201 Project: HBase Issue Type: Improvement Reporter: Ted Yu Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201.patch Currently the flush decision is made using the aggregate size of all column families. When large and small column families co-exist, this causes many small flushes of the smaller CF. We need to make per-CF flush decisions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14185013#comment-14185013 ]

zhangduo commented on HBASE-10201:
--

This step must be done here unless we change the behavior of internalFlushcache. In internalFlushcache, we do the following steps:

acquire updatesLock -> begin mvcc insert -> prepare flush -> release updatesLock -> advance mvcc -> flushing

seqIdOfOldestEdit is used by prepare flush to get a flushSeqId (because we do not flush all stores). If we do not record seqIdOfOldestEdit under updatesLock, we may get an inconsistent view of the memstore's snapshot and seqIdOfOldestEdit.

Maybe I can record it only when perColumnFamilyFlushEnabled is true?

Port 'Make flush decisions per column family' to trunk -- Key: HBASE-10201 URL: https://issues.apache.org/jira/browse/HBASE-10201 Project: HBase Issue Type: Improvement Reporter: Ted Yu Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201.patch Currently the flush decision is made using the aggregate size of all column families. When large and small column families co-exist, this causes many small flushes of the smaller CF. We need to make per-CF flush decisions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
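The consistency argument in the comment above can be sketched as follows. This is a simplified model, not HBase code: the class FlushSnapshotSketch, its fields, and the NO_EDITS sentinel are hypothetical, and the sketch assumes that writers hold the read side of updatesLock while prepare-flush holds the write side. Because the sequence id is recorded before the read lock is released, the flusher can never snapshot an edit whose sequence id has not been recorded yet.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical model of recording the oldest sequence id under updatesLock. */
final class FlushSnapshotSketch {
    /** Sentinel meaning "no unflushed edits" (an assumption of this sketch). */
    static final long NO_EDITS = Long.MAX_VALUE;

    private final ReentrantReadWriteLock updatesLock = new ReentrantReadWriteLock();
    // Stand-in for the memstore contents: just a count of pending edits.
    private final AtomicLong pendingEdits = new AtomicLong();
    private final AtomicLong seqIdOfOldestEdit = new AtomicLong(NO_EDITS);

    /** Write path: apply the edit and record its sequence id under the read lock. */
    void applyEdit(long seqId) {
        updatesLock.readLock().lock();
        try {
            pendingEdits.incrementAndGet();
            // Keep the smallest sequence id seen since the last flush.
            seqIdOfOldestEdit.accumulateAndGet(seqId, Math::min);
        } finally {
            updatesLock.readLock().unlock();
        }
    }

    /**
     * Prepare-flush path: the write lock excludes all writers, so the edit count
     * and the oldest sequence id are read as one consistent pair.
     * Returns {editsInSnapshot, oldestSeqIdInSnapshot}.
     */
    long[] prepareFlush() {
        updatesLock.writeLock().lock();
        try {
            long edits = pendingEdits.getAndSet(0);
            long oldest = seqIdOfOldestEdit.getAndSet(NO_EDITS);
            return new long[] {edits, oldest};
        } finally {
            updatesLock.writeLock().unlock();
        }
    }
}
```

If the sequence id were recorded after the read lock was released, prepareFlush could run in between and return a snapshot that contains the edit but an oldest sequence id that does not account for it.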
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14185023#comment-14185023 ]

zhangduo commented on HBASE-10201:
--

Oh, I think there is a trick to optimize it. In most cases, store.setSeqIdOfOldestEdit(seqId) will not change the value of seqIdOfOldestEdit, because sequence ids are increasing and we only need to record the smallest one.

Maybe we can read the current sequence id before appendNoSync and compare it with seqIdOfOldestEdit. If it is already larger than seqIdOfOldestEdit, we can skip the setSeqIdOfOldestEdit call, because the actual sequence id must be larger than the current sequence id value.

Port 'Make flush decisions per column family' to trunk -- Key: HBASE-10201 URL: https://issues.apache.org/jira/browse/HBASE-10201 Project: HBase Issue Type: Improvement Reporter: Ted Yu Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201.patch Currently the flush decision is made using the aggregate size of all column families. When large and small column families co-exist, this causes many small flushes of the smaller CF. We need to make per-CF flush decisions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
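The skip described in the comment above might look roughly like this. It is a sketch under stated assumptions, not the patch itself: OldestSeqIdSketch, recordEdit, and oldest are hypothetical names, and updateMin here is a local compare-and-set loop written for the example rather than the HBase AtomicUtils helper. Resetting the value after a flush is left out.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of skipping the oldest-sequence-id update on the hot path. */
final class OldestSeqIdSketch {
    private final AtomicLong seqIdOfOldestEdit = new AtomicLong(Long.MAX_VALUE);

    /** Lowers target to candidate if candidate is smaller; returns true if it changed. */
    static boolean updateMin(AtomicLong target, long candidate) {
        while (true) {
            long current = target.get();
            if (candidate >= current) {
                return false; // the existing value is already the minimum
            }
            if (target.compareAndSet(current, candidate)) {
                return true;
            }
            // Another thread changed the value; re-read and retry.
        }
    }

    /**
     * seqIdBeforeAppend is the sequence id read before the append, so the id
     * assigned to the edit is strictly larger. If even that earlier value is
     * above the recorded minimum, the edit cannot lower it and the update is
     * skipped. Returns true when the slow path (the real update) ran.
     */
    boolean recordEdit(long seqIdBeforeAppend, long assignedSeqId) {
        if (seqIdBeforeAppend > seqIdOfOldestEdit.get()) {
            return false; // fast path: nothing to record
        }
        updateMin(seqIdOfOldestEdit, assignedSeqId);
        return true;
    }

    long oldest() {
        return seqIdOfOldestEdit.get();
    }
}
```

Only the first edit after the minimum is reset pays for the update; later edits return after a single volatile read.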