[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-12-21 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14255185#comment-14255185
 ] 

Ted Yu commented on HBASE-10201:


Addendum integrated to master and branch-1

Thanks Duo.

 Port 'Make flush decisions per column family' to trunk
 --

 Key: HBASE-10201
 URL: https://issues.apache.org/jira/browse/HBASE-10201
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: Ted Yu
Assignee: zhangduo
 Fix For: 2.0.0, 1.1.0

 Attachments: 10201-addendum.txt, 3149-trunk-v1.txt, 
 HBASE-10201-0.98.patch, HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, 
 HBASE-10201-0.99.patch, HBASE-10201-addendum_1.patch, HBASE-10201.patch, 
 HBASE-10201_1.patch, HBASE-10201_10.patch, HBASE-10201_11.patch, 
 HBASE-10201_12.patch, HBASE-10201_13.patch, HBASE-10201_13.patch, 
 HBASE-10201_14.patch, HBASE-10201_15.patch, HBASE-10201_16.patch, 
 HBASE-10201_17.patch, HBASE-10201_18.patch, HBASE-10201_19.patch, 
 HBASE-10201_2.patch, HBASE-10201_3.patch, HBASE-10201_4.patch, 
 HBASE-10201_5.patch, HBASE-10201_6.patch, HBASE-10201_7.patch, 
 HBASE-10201_8.patch, HBASE-10201_9.patch, compactions.png, count.png, io.png, 
 memstore.png


 Currently the flush decision is made using the aggregate size of all column 
 families. When large and small column families co-exist, this causes many 
 small flushes of the smaller CF. We need to make per-CF flush decisions.
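
The difference between the two rules in the description can be sketched as follows. This is only an illustration under stated assumptions: the class, method, and parameter names (`FlushDecisionSketch`, `selectAll`, `selectLarge`, `perFamilyLowerBound`) and the sizes are hypothetical and are not taken from the patch, and falling back to flushing every family when none is large enough is an assumption made for the sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class FlushDecisionSketch {

    // Aggregate rule: once the summed memstore size reaches the region
    // threshold, every column family is flushed, however small it is.
    static List<String> selectAll(Map<String, Long> memstoreBytes, long regionThreshold) {
        long total = 0;
        for (long size : memstoreBytes.values()) {
            total += size;
        }
        return total >= regionThreshold
                ? new ArrayList<>(memstoreBytes.keySet())
                : new ArrayList<>();
    }

    // Per-family rule: the flush is still triggered by the aggregate size,
    // but only families at or above perFamilyLowerBound are selected.
    // Assumption for this sketch: if no family qualifies, flush them all.
    static List<String> selectLarge(Map<String, Long> memstoreBytes,
                                    long regionThreshold,
                                    long perFamilyLowerBound) {
        List<String> all = selectAll(memstoreBytes, regionThreshold);
        if (all.isEmpty()) {
            return all; // no flush needed yet
        }
        List<String> large = new ArrayList<>();
        for (Map.Entry<String, Long> e : memstoreBytes.entrySet()) {
            if (e.getValue() >= perFamilyLowerBound) {
                large.add(e.getKey());
            }
        }
        return large.isEmpty() ? all : large;
    }

    public static void main(String[] args) {
        Map<String, Long> sizes = new LinkedHashMap<>();
        sizes.put("big", 120L << 20);   // 120 MB
        sizes.put("small", 8L << 20);   // 8 MB
        long regionThreshold = 128L << 20;
        long perFamilyLowerBound = 16L << 20;

        System.out.println(selectAll(sizes, regionThreshold));                        // [big, small]
        System.out.println(selectLarge(sizes, regionThreshold, perFamilyLowerBound)); // [big]
    }
}
```

With the aggregate rule the 8 MB family is written out as a small store file each time the large family fills the region; with the per-family rule it stays in memory until it grows past the lower bound.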



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-12-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14255212#comment-14255212
 ] 

Hudson commented on HBASE-10201:


SUCCESS: Integrated in HBase-1.1 #16 (See 
[https://builds.apache.org/job/HBase-1.1/16/])
HBASE-10201 Addendum fixes typo of putIfAbsent (Duo Zhang) (tedyu: rev fbc852b6809184bdba0bbccb8ef3e1fe848d6f22)
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java




[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-12-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14255222#comment-14255222
 ] 

Hudson commented on HBASE-10201:


SUCCESS: Integrated in HBase-TRUNK #5955 (See 
[https://builds.apache.org/job/HBase-TRUNK/5955/])
HBASE-10201 Addendum fixes typo of putIfAbsent (Duo Zhang) (tedyu: rev 51334fb951232aa56add118d142e6b82da204494)
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java




[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-12-19 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14254418#comment-14254418
 ] 

stack commented on HBASE-10201:
---

[~Apache9] Any chance of you taking a look at the test failure here: 
https://builds.apache.org/job/PreCommit-HBASE-Build/12165//testReport/ It is 
in the per-column-family flushing test.

https://builds.apache.org/job/PreCommit-HBASE-Build/12165/artifact/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.regionserver.TestPerColumnFamilyFlush-output.txt

Says this:

---
Test set: org.apache.hadoop.hbase.regionserver.TestPerColumnFamilyFlush
---
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 248.398 sec  FAILURE! - in org.apache.hadoop.hbase.regionserver.TestPerColumnFamilyFlush
testCompareStoreFileCount(org.apache.hadoop.hbase.regionserver.TestPerColumnFamilyFlush)  Time elapsed: 53.153 sec   FAILURE!
java.lang.AssertionError: null
    at org.junit.Assert.fail(Assert.java:86)
    at org.junit.Assert.assertTrue(Assert.java:41)
    at org.junit.Assert.assertTrue(Assert.java:52)
    at org.apache.hadoop.hbase.regionserver.TestPerColumnFamilyFlush.testCompareStoreFileCount(TestPerColumnFamilyFlush.java:589)

 Port 'Make flush decisions per column family' to trunk
 --

 Key: HBASE-10201
 URL: https://issues.apache.org/jira/browse/HBASE-10201
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: Ted Yu
Assignee: zhangduo
 Fix For: 2.0.0, 1.1.0

 Attachments: 10201-addendum.txt, 3149-trunk-v1.txt, 
 HBASE-10201-0.98.patch, HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, 
 HBASE-10201-0.99.patch, HBASE-10201.patch, HBASE-10201_1.patch, 
 HBASE-10201_10.patch, HBASE-10201_11.patch, HBASE-10201_12.patch, 
 HBASE-10201_13.patch, HBASE-10201_13.patch, HBASE-10201_14.patch, 
 HBASE-10201_15.patch, HBASE-10201_16.patch, HBASE-10201_17.patch, 
 HBASE-10201_18.patch, HBASE-10201_19.patch, HBASE-10201_2.patch, 
 HBASE-10201_3.patch, HBASE-10201_4.patch, HBASE-10201_5.patch, 
 HBASE-10201_6.patch, HBASE-10201_7.patch, HBASE-10201_8.patch, 
 HBASE-10201_9.patch, compactions.png, count.png, io.png, memstore.png


 Currently the flush decision is made using the aggregate size of all column 
 families. When large and small column families co-exist, this causes many 
 small flushes of the smaller CF. We need to make per-CF flush decisions.





[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-12-19 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14254432#comment-14254432
 ] 

zhangduo commented on HBASE-10201:
--

[~stack] Yeah, the test case is flaky. It is used to confirm that 
per-column-family flush generates fewer store files.

But flush is asynchronous, so there is a chance that the original flush is 
delayed more than in the per-column-family flush scenario and so generates 
fewer store files. It depends on the state of the machine running the test case.

I think we can make an addendum to remove it for now to get stable test 
results. I will try to find a more stable way to confirm that per-column-family 
flush works.

Thanks~
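
One common way to make a test on asynchronous work less sensitive to machine state is to poll for a condition with a deadline rather than assert on a value that depends on timing. The following is a minimal sketch of that idea only, not the approach taken in the patch or its addendum; `AwaitSketch`, `waitFor`, and the simulated flusher thread are hypothetical names made up for the illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

final class AwaitSketch {

    // Polls the condition until it holds or the timeout elapses.
    // Returns true if the condition was observed, false on timeout or interrupt.
    static boolean waitFor(long timeoutMillis, long intervalMillis, BooleanSupplier condition) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger flushedFiles = new AtomicInteger();

        // Stand-in for an asynchronous flush that finishes at an unpredictable time.
        Thread flusher = new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            flushedFiles.incrementAndGet();
        });
        flusher.start();

        // Wait for the observable effect instead of assuming it already happened.
        boolean observed = waitFor(5_000, 10, () -> flushedFiles.get() == 1);
        flusher.join();
        System.out.println("flush observed: " + observed);
    }
}
```

Polling removes the dependence on when the flush happens, but it does not by itself fix a comparison between two runs whose flush counts legitimately differ with timing, which is the problem described above.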



[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-12-19 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14254450#comment-14254450
 ] 

stack commented on HBASE-10201:
---

Thanks [~Apache9]. Do it in a new issue when you get a chance, since this one is 
long enough already (smile).  Thanks.



[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-12-18 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252596#comment-14252596
 ] 

stack commented on HBASE-10201:
---

Committed to branch-1 so will be in 1.1.0.



[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252761#comment-14252761
 ] 

Hudson commented on HBASE-10201:


SUCCESS: Integrated in HBase-1.1 #5 (See 
[https://builds.apache.org/job/HBase-1.1/5/])
HBASE-10201 Port 'Make flush decisions per column family' to trunk (stack: rev e55ef7a663dd9a18fa88a506afd8fe0ced10563d)
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/FlushRequester.java
* hbase-client/src/main/java/org/apache/hadoop/hbase/HTableDescriptor.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/LogRoller.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestPerColumnFamilyFlush.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/wal/TestWALFactory.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/FlushLargeStoresPolicy.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestFlushRegionEntry.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/FlushAllStoresPolicy.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/wal/TestDefaultWALProvider.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/FlushPolicy.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHeapMemoryManager.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSWALEntry.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/TestIOFencing.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/wal/DisabledWALProvider.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/FlushPolicyFactory.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreFlusher.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestFSHLog.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/wal/WAL.java
* hbase-common/src/main/resources/hbase-default.xml
HBASE-10201 Addendum changes TestPerColumnFamilyFlush to LargeTest (stack: rev 5d34d2d02af39037a2426fe4fb5be9a447202bd7)
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestPerColumnFamilyFlush.java




[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-12-16 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14248724#comment-14248724
 ] 

Enis Soztutar commented on HBASE-10201:
---

I don't think we should have this in 1.0.0. I am planning on cutting the RC 
tomorrow, and this seems to be a huge change for the last minute. Can we target 
1.1 instead? 

 Port 'Make flush decisions per column family' to trunk
 --

 Key: HBASE-10201
 URL: https://issues.apache.org/jira/browse/HBASE-10201
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: Ted Yu
Assignee: zhangduo
 Fix For: 1.0.0, 2.0.0

 Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, 
 HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, 
 HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_10.patch, 
 HBASE-10201_11.patch, HBASE-10201_12.patch, HBASE-10201_13.patch, 
 HBASE-10201_13.patch, HBASE-10201_14.patch, HBASE-10201_15.patch, 
 HBASE-10201_16.patch, HBASE-10201_17.patch, HBASE-10201_18.patch, 
 HBASE-10201_19.patch, HBASE-10201_2.patch, HBASE-10201_3.patch, 
 HBASE-10201_4.patch, HBASE-10201_5.patch, HBASE-10201_6.patch, 
 HBASE-10201_7.patch, HBASE-10201_8.patch, HBASE-10201_9.patch, 
 compactions.png, count.png, io.png, memstore.png


 Currently the flush decision is made using the aggregate size of all column 
 families. When large and small column families co-exist, this causes many 
 small flushes of the smaller CF. We need to make per-CF flush decisions.





[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-12-16 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14248725#comment-14248725
 ] 

Jeffrey Zhong commented on HBASE-10201:
---

Looks good to me (+1) for the master branch.  Branch-1 should rely on [~enis]'s 
feedback.



[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-12-16 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14248793#comment-14248793
 ] 

stack commented on HBASE-10201:
---

bq. Can we target 1.1 instead?

Sure.



[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-12-16 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14248816#comment-14248816
 ] 

stack commented on HBASE-10201:
---

I forgot to say thank you [~Apache9] for your persistence on getting this in.

 Port 'Make flush decisions per column family' to trunk
 --

 Key: HBASE-10201
 URL: https://issues.apache.org/jira/browse/HBASE-10201
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: Ted Yu
Assignee: zhangduo
 Fix For: 2.0.0

 Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, 
 HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, 
 HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_10.patch, 
 HBASE-10201_11.patch, HBASE-10201_12.patch, HBASE-10201_13.patch, 
 HBASE-10201_13.patch, HBASE-10201_14.patch, HBASE-10201_15.patch, 
 HBASE-10201_16.patch, HBASE-10201_17.patch, HBASE-10201_18.patch, 
 HBASE-10201_19.patch, HBASE-10201_2.patch, HBASE-10201_3.patch, 
 HBASE-10201_4.patch, HBASE-10201_5.patch, HBASE-10201_6.patch, 
 HBASE-10201_7.patch, HBASE-10201_8.patch, HBASE-10201_9.patch, 
 compactions.png, count.png, io.png, memstore.png


 Currently the flush decision is made using the aggregate size of all column 
 families. When large and small column families co-exist, this causes many 
 small flushes of the smaller CF. We need to make per-CF flush decisions.





[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-12-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14248943#comment-14248943
 ] 

Hudson commented on HBASE-10201:


FAILURE: Integrated in HBase-TRUNK #5930 (See 
[https://builds.apache.org/job/HBase-TRUNK/5930/])
HBASE-10201 Port 'Make flush decisions per column family' to trunk (stack: rev c7fad665f34fd3c17999d5cc60b04d3faff6a7f5)
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestFSHLog.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/wal/WAL.java
* hbase-common/src/main/resources/hbase-default.xml
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreFlusher.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestFlushRegionEntry.java
* hbase-client/src/main/java/org/apache/hadoop/hbase/HTableDescriptor.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/wal/TestWALFactory.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHeapMemoryManager.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/FlushLargeStoresPolicy.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/FlushRequester.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestPerColumnFamilyFlush.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/wal/DisabledWALProvider.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/FlushAllStoresPolicy.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/LogRoller.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/wal/TestDefaultWALProvider.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/FlushPolicy.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/TestIOFencing.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSWALEntry.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/FlushPolicyFactory.java




[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-12-16 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249188#comment-14249188
 ] 

stack commented on HBASE-10201:
---

That'll work. Thanks.

 Port 'Make flush decisions per column family' to trunk
 --

 Key: HBASE-10201
 URL: https://issues.apache.org/jira/browse/HBASE-10201
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: Ted Yu
Assignee: zhangduo
 Fix For: 2.0.0

 Attachments: 10201-addendum.txt, 3149-trunk-v1.txt, 
 HBASE-10201-0.98.patch, HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, 
 HBASE-10201-0.99.patch, HBASE-10201.patch, HBASE-10201_1.patch, 
 HBASE-10201_10.patch, HBASE-10201_11.patch, HBASE-10201_12.patch, 
 HBASE-10201_13.patch, HBASE-10201_13.patch, HBASE-10201_14.patch, 
 HBASE-10201_15.patch, HBASE-10201_16.patch, HBASE-10201_17.patch, 
 HBASE-10201_18.patch, HBASE-10201_19.patch, HBASE-10201_2.patch, 
 HBASE-10201_3.patch, HBASE-10201_4.patch, HBASE-10201_5.patch, 
 HBASE-10201_6.patch, HBASE-10201_7.patch, HBASE-10201_8.patch, 
 HBASE-10201_9.patch, compactions.png, count.png, io.png, memstore.png


 Currently the flush decision is made using the aggregate size of all column 
 families. When large and small column families co-exist, this causes many 
 small flushes of the smaller CF. We need to make per-CF flush decisions.





[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-12-16 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249191#comment-14249191
 ] 

Ted Yu commented on HBASE-10201:


Addendum pushed to master branch.



[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-12-16 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249199#comment-14249199
 ] 

zhangduo commented on HBASE-10201:
--

{quote}
I forgot to say thank you zhangduo for your persistence on getting this in.
{quote}

It's my pleasure to contribute code to a famous project:)
Thanks




[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-12-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249275#comment-14249275
 ] 

Hudson commented on HBASE-10201:


FAILURE: Integrated in HBase-TRUNK #5933 (See 
[https://builds.apache.org/job/HBase-TRUNK/5933/])
HBASE-10201 Addendum changes TestPerColumnFamilyFlush to LargeTest (tedyu: rev 
885b065683499540f467cb54086a3f60e64b9c8a)
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestPerColumnFamilyFlush.java




[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-12-14 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246385#comment-14246385
 ] 

stack commented on HBASE-10201:
---

I'm +1 on this going into master branch.  I am +1 on this going into branch-1 
but with it disabled by default as an experimental feature; users would have to 
enable the FlushLargeStoresPolicy explicitly (You ok w/ that [~enis])?

Any chance of more +1s?  [~jeffreyz]? Any other reviews out there? This is an 
old issue, nicely addressed, that can make a nice dent in our i/o profile when 
there is more than one column family, but it would be good to get more eyes on 
it given it's messing with sequenceids. Thanks.
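
To make the per-CF flush decision concrete, here is a minimal sketch of the idea in the issue description. It is not the FlushLargeStoresPolicy code from the patch: the class name, the method name, the threshold parameter, and the fall-back to flushing every family when none is large enough are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not HBase code: decide which column families to flush.
final class FlushSelectionSketch {

  /**
   * Returns the families whose memstore size is at least perFamilyLowerBound.
   * Assumption: if no family qualifies, every family is flushed, which matches
   * the aggregate-size behaviour described in the issue.
   */
  static List<String> selectFamiliesToFlush(Map<String, Long> memstoreSizeByFamily,
      long perFamilyLowerBound) {
    List<String> selected = new ArrayList<>();
    for (Map.Entry<String, Long> entry : memstoreSizeByFamily.entrySet()) {
      if (entry.getValue() >= perFamilyLowerBound) {
        selected.add(entry.getKey());
      }
    }
    if (selected.isEmpty()) {
      // Nothing is large on its own, so fall back to flushing everything.
      selected.addAll(memstoreSizeByFamily.keySet());
    }
    return selected;
  }

  public static void main(String[] args) {
    Map<String, Long> sizes = new LinkedHashMap<>();
    sizes.put("big", 100L * 1024 * 1024); // 100 MB memstore
    sizes.put("small", 64L * 1024);       // 64 KB memstore
    // With a 16 MB lower bound only the large family is selected.
    System.out.println(selectFamiliesToFlush(sizes, 16L * 1024 * 1024));
  }
}
```

With a selection like this, the small family keeps accumulating edits in memory instead of being written out as many tiny files each time the large family fills up.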



[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-12-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14245245#comment-14245245
 ] 

Hadoop QA commented on HBASE-10201:
---

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12687023/HBASE-10201_19.patch
  against master branch at commit a0e473730e2cd819e7442dbd2b332d7833755ba2.
  ATTACHMENT ID: 12687023

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 32 new 
or modified tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/12065//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/12065//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/12065//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/12065//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/12065//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/12065//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/12065//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/12065//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/12065//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/12065//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/12065//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/12065//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/12065//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/12065//console

This message is automatically generated.



[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-12-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14245187#comment-14245187
 ] 

Hadoop QA commented on HBASE-10201:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12687009/HBASE-10201_19.patch
  against master branch at commit a0e473730e2cd819e7442dbd2b332d7833755ba2.
  ATTACHMENT ID: 12687009

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 32 new 
or modified tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 1 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/12064//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/12064//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/12064//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/12064//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/12064//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/12064//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/12064//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/12064//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/12064//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/12064//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/12064//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/12064//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/12064//artifact/patchprocess/checkstyle-aggregate.html

  Javadoc warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/12064//artifact/patchprocess/patchJavadocWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/12064//console

This message is automatically generated.





[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-12-11 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243274#comment-14243274
 ] 

Jeffrey Zhong commented on HBASE-10201:
---

{quote}
Now I always generate a new flushSeqId and use it as the seqId of the flushed 
StoreFiles, and use a maxFlushedSeqId to record the completeSequenceId that is 
passed to HMaster. Is it OK?
{quote}
Sounds good to me. 



[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-12-11 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243345#comment-14243345
 ] 

stack commented on HBASE-10201:
---

[~jeffreyz] What about the comment on issue w/ 1. above? See 
https://issues.apache.org/jira/browse/HBASE-10201?focusedCommentId=14240737&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14240737



[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-12-11 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243528#comment-14243528
 ] 

Jeffrey Zhong commented on HBASE-10201:
---

[~saint@gmail.com] 
{quote}
Are you referring to the following: Will this mean we drop edits because 
region thinks its sequenceid is higher than it should be?
{quote}
Yes, as of today, during replay of edits in both modes, we drop WAL edits whose 
seqId is less than the seqId of the relating store. There are some edge cases 
(like a new PUT, region move to a different RS, DELETE on the new PUT, major 
compaction, move back to the original RS, and then the RS crashes) where we 
have to know the HFile seqId accurately; otherwise the PUT may be restored 
after recovery. 

We need to pass flushed seqIds per store to the master so that we can optimize 
the recovery process, but this doesn't impact correctness. 



[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-12-11 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243546#comment-14243546
 ] 

stack commented on HBASE-10201:
---

[~jeffreyz] I'm referring to the fact that if there are three column families, 
and one has edit #1, another edit #2 (which came later) and the third has edit 
#3, and then the policy decides to flush the third CF, we'll write it out with 
a seqid of #3 but edits #1 and #2 are still in memory. We report to the master 
that our lowest number is #1 but the master crashes (so we lose the info that 
#1 is the earliest safe edit number).  The RS hosting the three column families 
also crashes.  On recovery, we open the region and see an hfile with seqid #3 
so we set the region's current seqid to #4, even though #1 and #2 were never 
persisted.  This is possible with this patch as is, especially when policy is 
disconnected from flush.

bq. We need to pass flushed seqIds per store to master so that we can optimize 
recovery process but doesn't impact correctness.

This would not fix the above case?  The master might know that #3 was persisted 
and that column family 1 and 2 had edits less than #3 but if it crashes, we're 
back in the scenario described above (unless we persist the flush reports?)

Thanks.
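
The concern above can be reproduced with a small sketch. It assumes, purely for illustration, a recovery path that filters WAL edits against one region-wide cutoff taken from the newest HFile; the class and method names are hypothetical and nothing here is HBase code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not HBase code: a single region-wide replay cutoff.
final class RegionWideCutoffSketch {

  /** Keeps only the WAL edits whose sequence id is above the region-wide cutoff. */
  static List<Long> replayAbove(long regionWideCutoff, long... walSeqIds) {
    List<Long> replayed = new ArrayList<>();
    for (long seqId : walSeqIds) {
      if (seqId > regionWideCutoff) {
        replayed.add(seqId);
      }
    }
    return replayed;
  }

  public static void main(String[] args) {
    // Only the third family was flushed, so the newest HFile carries seqid 3.
    long cutoffFromNewestHFile = 3L;
    // Edits #1 and #2 were still in memory at crash time, yet both fall
    // at or below the cutoff and would not be replayed.
    System.out.println(replayAbove(cutoffFromNewestHFile, 1L, 2L, 3L));
  }
}
```

Under that assumption `replayAbove` returns an empty list for the three-family scenario, which is the data loss being asked about.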




[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-12-11 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243610#comment-14243610
 ] 

zhangduo commented on HBASE-10201:
--

[~stack] In your scenario, I think we will use #1 to skip edits, not #4.
This is the code I see in replayRecoveredEditsIfAny:
{code}
long minSeqIdForTheRegion = -1;
for (Long maxSeqIdInStore : maxSeqIdInStores.values()) {
  if (maxSeqIdInStore < minSeqIdForTheRegion || minSeqIdForTheRegion == -1) {
    minSeqIdForTheRegion = maxSeqIdInStore;
  }
}
{code}
And this
{code}
maxSeqId = Math.abs(Long.parseLong(fileName));
if (maxSeqId <= minSeqIdForTheRegion) {
  if (LOG.isDebugEnabled()) {
    String msg = "Maximum sequenceid for this wal is " + maxSeqId
        + " and minimum sequenceid for the region is " + minSeqIdForTheRegion
        + ", skipped the whole file, path=" + edits;
    LOG.debug(msg);
  }
  continue;
}
{code}
And in replayRecoveredEdits, we skip edit cells using per store seqId
{code}
// Now, figure if we should skip this edit.
if (key.getLogSeqNum() <= maxSeqIdInStores.get(store.getFamily().getName())) {
  skippedEdits++;
  continue;
}
{code}

And when splitting the log, we use a lastSeqId obtained from HMaster to skip 
edits. If the master crashes and loses that information, then we will not skip 
any edits? I'm not sure, but I didn't find code that gets lastSeqId from 
anywhere other than HMaster. [~jeffreyz]
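
The per-store check quoted above can be tried against the three-family scenario with a self-contained sketch. Everything in it is an assumption made for illustration rather than the HRegion implementation: the `Edit` class, the method names, the use of -1 for a store with no flushed HFile, and `Collections.min` standing in for the loop that computes minSeqIdForTheRegion.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not HBase code: per-store skip decisions during replay.
final class PerStoreReplaySketch {

  /** A WAL edit reduced to the two fields the skip decision needs. */
  static final class Edit {
    final String family;
    final long seqId;

    Edit(String family, long seqId) {
      this.family = family;
      this.seqId = seqId;
    }
  }

  /** A whole recovered-edits file is skipped only if every store is already past it. */
  static boolean canSkipWholeFile(long maxSeqIdInFile, Map<String, Long> maxSeqIdInStores) {
    long minSeqIdForTheRegion = Collections.min(maxSeqIdInStores.values());
    return maxSeqIdInFile <= minSeqIdForTheRegion;
  }

  /** Returns the sequence ids of the edits that get replayed, judged store by store. */
  static List<Long> replay(List<Edit> edits, Map<String, Long> maxSeqIdInStores) {
    List<Long> replayed = new ArrayList<>();
    for (Edit edit : edits) {
      // Assumes every edit's family has an entry in the map.
      long maxSeqIdInStore = maxSeqIdInStores.get(edit.family);
      if (edit.seqId <= maxSeqIdInStore) {
        continue; // already persisted in this store's HFiles
      }
      replayed.add(edit.seqId);
    }
    return replayed;
  }

  public static void main(String[] args) {
    Map<String, Long> maxSeqIdInStores = new LinkedHashMap<>();
    maxSeqIdInStores.put("cf1", -1L); // assumption: -1 means no flushed HFile
    maxSeqIdInStores.put("cf2", -1L);
    maxSeqIdInStores.put("cf3", 3L);  // only cf3 was flushed, at seqid 3

    List<Edit> edits = new ArrayList<>();
    edits.add(new Edit("cf1", 1L));
    edits.add(new Edit("cf2", 2L));
    edits.add(new Edit("cf3", 3L));

    System.out.println(canSkipWholeFile(3L, maxSeqIdInStores));
    System.out.println(replay(edits, maxSeqIdInStores));
  }
}
```

In this sketch the file is not skipped as a whole, edits #1 and #2 are replayed, and only edit #3 is dropped because cf3 already holds it on disk.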



[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-12-11 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243631#comment-14243631
 ] 

Jeffrey Zhong commented on HBASE-10201:
---

[~saint@gmail.com] Besides what [~Apache9] mentioned (we skip edits using the 
seqId of each relating store), the #4 (which is > #3) is only set after the 
region is fully recovered (i.e. all WAL edits are already replayed).

{quote}
 If master crash and loss the information, then we will not skip any edits?
{quote}
yes, we'll lose the info and will replay more edits. 



[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-12-11 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243714#comment-14243714
 ] 

stack commented on HBASE-10201:
---

Yes. I think it is going to be ok. I missed the 'skip edits using seqid of each 
relating store' bit. My calc was region based.  Thanks for entertaining my 
question.  In my scenario, the first column family that had edit #1 should have 
a store seqid of -1, which would mean we'd not skip edit #1 when it came into 
replayRecoveredEditsIfAny.

I'm wondering how to make a unit test.  One thought was to stand up a single 
HRegion of multiple column families and populate it in various ways, out of 
balance, and then add a means of 'killing' the region.  Then create a 
'recovered.edits' file and reopen the region to verify edits are as expected 
(and do the same for the DLR replay scenario)?





 



[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-12-11 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243728#comment-14243728
 ] 

zhangduo commented on HBASE-10201:
--

{quote}
I'm wondering how to make a unit test.
{quote}
TestPerColumnFamilyFlush.testLogReplay already tests log replay for selective 
flush. I think the only thing it misses is that it does not kill HMaster during 
log replay. I can add a testcase for the scenario where we cannot get an 
up-to-date lastSeqId from HMaster (kill master first, then kill regionserver, 
then restart master). [~stack], is this OK?



[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-12-11 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243758#comment-14243758
 ] 

stack commented on HBASE-10201:
---

bq. TestPerColumnFamilyFlush.testLogReplay has tested log replay for selective 
flush.

Woah. That's a nice test.  How long has that been around?  I missed it in 
previous reviews if it was present.  I think this test is enough to give us 
confidence in this radical change.  The kill of master so we don't have latest 
seqid is a nice to have but not necessary; we just over replay the edits.

Let me go over your last posted patch.  Seems like a bunch of new stuff has 
shown up (or I was blind last time I read through the patch).



 Port 'Make flush decisions per column family' to trunk
 --

 Key: HBASE-10201
 URL: https://issues.apache.org/jira/browse/HBASE-10201
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: Ted Yu
Assignee: zhangduo
 Fix For: 1.0.0, 2.0.0

 Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, 
 HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, 
 HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_10.patch, 
 HBASE-10201_11.patch, HBASE-10201_12.patch, HBASE-10201_13.patch, 
 HBASE-10201_13.patch, HBASE-10201_14.patch, HBASE-10201_15.patch, 
 HBASE-10201_16.patch, HBASE-10201_17.patch, HBASE-10201_18.patch, 
 HBASE-10201_2.patch, HBASE-10201_3.patch, HBASE-10201_4.patch, 
 HBASE-10201_5.patch, HBASE-10201_6.patch, HBASE-10201_7.patch, 
 HBASE-10201_8.patch, HBASE-10201_9.patch, compactions.png, count.png, io.png, 
 memstore.png




[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-12-10 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14241121#comment-14241121
 ] 

zhangduo commented on HBASE-10201:
--

I ran the performance test in TestPerColumnFamilyFlush to confirm the patch 
still works after I changed the behavior of FlushPolicy.

The result is the same as in the previous test:

metric_storeCount: 3,
metric_storeFileCount: 9,
metric_memStoreSize: 1272,
metric_storeFileSize: 4509402744,
metric_compactionsCompletedCount: 56,
metric_numBytesCompactedCount: 20654822724,
metric_numFilesCompactedCount: 184,

 Port 'Make flush decisions per column family' to trunk
 --

 Key: HBASE-10201
 URL: https://issues.apache.org/jira/browse/HBASE-10201
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: Ted Yu
Assignee: zhangduo
 Fix For: 1.0.0, 2.0.0

 Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, 
 HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, 
 HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_10.patch, 
 HBASE-10201_11.patch, HBASE-10201_12.patch, HBASE-10201_13.patch, 
 HBASE-10201_13.patch, HBASE-10201_14.patch, HBASE-10201_15.patch, 
 HBASE-10201_16.patch, HBASE-10201_17.patch, HBASE-10201_18.patch, 
 HBASE-10201_2.patch, HBASE-10201_3.patch, HBASE-10201_4.patch, 
 HBASE-10201_5.patch, HBASE-10201_6.patch, HBASE-10201_7.patch, 
 HBASE-10201_8.patch, HBASE-10201_9.patch, compactions.png, count.png, io.png, 
 memstore.png




[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-12-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14241695#comment-14241695
 ] 

Hadoop QA commented on HBASE-10201:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12686240/HBASE-10201_18.patch
  against master branch at commit 84b41f8029fd5822832255daeee73ff2283a622a.
  ATTACHMENT ID: 12686240

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 32 new 
or modified tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 1 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/12045//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/12045//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/12045//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/12045//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/12045//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/12045//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/12045//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/12045//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/12045//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/12045//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/12045//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/12045//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/12045//artifact/patchprocess/checkstyle-aggregate.html

  Javadoc warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/12045//artifact/patchprocess/patchJavadocWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/12045//console

This message is automatically generated.

 Port 'Make flush decisions per column family' to trunk
 --

 Key: HBASE-10201
 URL: https://issues.apache.org/jira/browse/HBASE-10201
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: Ted Yu
Assignee: zhangduo
 Fix For: 1.0.0, 2.0.0

 Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, 
 HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, 
 HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_10.patch, 
 HBASE-10201_11.patch, HBASE-10201_12.patch, HBASE-10201_13.patch, 
 HBASE-10201_13.patch, HBASE-10201_14.patch, HBASE-10201_15.patch, 
 HBASE-10201_16.patch, HBASE-10201_17.patch, HBASE-10201_18.patch, 
 HBASE-10201_2.patch, HBASE-10201_3.patch, HBASE-10201_4.patch, 
 HBASE-10201_5.patch, HBASE-10201_6.patch, HBASE-10201_7.patch, 
 HBASE-10201_8.patch, HBASE-10201_9.patch, compactions.png, count.png, io.png, 
 memstore.png



[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-12-10 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14241806#comment-14241806
 ] 

Jeffrey Zhong commented on HBASE-10201:
---

{quote}
because we are not doing DLR in 0.98 or for some other reason? This patch is 
unlikely to make it back to 0.98 I'd say.
{quote}
It's because we defer the cleanup of mvcc values (by HBASE-11315). In any case, 
we should maintain the semantics that an HStore file's seqId is the largest 
flushed seqId for that file.

{quote}
And do I need to change original log split policy to also use a 
familyName-seqId map to filter out cells that already flushed? 
{quote}
Yes, we should, but you could do that in a separate issue.


 Port 'Make flush decisions per column family' to trunk
 --

 Key: HBASE-10201
 URL: https://issues.apache.org/jira/browse/HBASE-10201
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: Ted Yu
Assignee: zhangduo
 Fix For: 1.0.0, 2.0.0

 Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, 
 HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, 
 HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_10.patch, 
 HBASE-10201_11.patch, HBASE-10201_12.patch, HBASE-10201_13.patch, 
 HBASE-10201_13.patch, HBASE-10201_14.patch, HBASE-10201_15.patch, 
 HBASE-10201_16.patch, HBASE-10201_17.patch, HBASE-10201_18.patch, 
 HBASE-10201_2.patch, HBASE-10201_3.patch, HBASE-10201_4.patch, 
 HBASE-10201_5.patch, HBASE-10201_6.patch, HBASE-10201_7.patch, 
 HBASE-10201_8.patch, HBASE-10201_9.patch, compactions.png, count.png, io.png, 
 memstore.png




[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-12-10 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14241985#comment-14241985
 ] 

zhangduo commented on HBASE-10201:
--

{quote}
but anyway we should maintain the semantics that HStore file seqId is the 
largest flushed SeqId for the file.
{quote}
I modified the code to:
{code}
  flushSeqId = getNextSequenceId(wal);
  long oldestUnflushedSeqId = wal.getEarliestMemstoreSeqNum(encodedRegionName);
  // No oldestUnflushedSeqId means we flushed all stores,
  // or the unflushed stores are all empty.
  maxFlushedSeqId =
      oldestUnflushedSeqId == HConstants.NO_SEQNUM ? flushSeqId : oldestUnflushedSeqId - 1;
{code}
Now I always generate a new flushSeqId and use it as the seqId of the flushed 
StoreFiles, and I use maxFlushedSeqId to record the completeSequenceId that is 
passed to HMaster. Is that OK?
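
To make the relationship between the two ids concrete, here is a minimal 
stand-alone sketch of the computation. It is not the HBase source: the class 
and method names are hypothetical, and NO_SEQNUM is assumed to be -1 to mirror 
HConstants.NO_SEQNUM.

```java
// Hypothetical stand-alone sketch; not the HBase implementation.
final class FlushSeqIdSketch {
    // Assumption: mirrors HConstants.NO_SEQNUM.
    static final long NO_SEQNUM = -1L;

    /**
     * Returns the highest sequence id for which every edit in the region
     * is known to be flushed.
     *
     * @param flushSeqId           the fresh sequence id taken for this flush
     * @param oldestUnflushedSeqId the oldest edit still in some memstore,
     *                             or NO_SEQNUM if nothing is left unflushed
     */
    static long maxFlushedSeqId(long flushSeqId, long oldestUnflushedSeqId) {
        return oldestUnflushedSeqId == NO_SEQNUM
            ? flushSeqId                 // all stores flushed (or the rest are empty)
            : oldestUnflushedSeqId - 1;  // stop just below the oldest unflushed edit
    }

    public static void main(String[] args) {
        // Every store was flushed: the two ids coincide.
        System.out.println(maxFlushedSeqId(100L, NO_SEQNUM)); // prints 100
        // A store still holds an edit with seqId 42: only ids up to 41 are safe.
        System.out.println(maxFlushedSeqId(100L, 42L));       // prints 41
    }
}
```

In this sketch the flushed files would still be stamped with 100 in both 
cases; only the value reported upwards differs.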

 Port 'Make flush decisions per column family' to trunk
 --

 Key: HBASE-10201
 URL: https://issues.apache.org/jira/browse/HBASE-10201
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: Ted Yu
Assignee: zhangduo
 Fix For: 1.0.0, 2.0.0

 Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, 
 HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, 
 HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_10.patch, 
 HBASE-10201_11.patch, HBASE-10201_12.patch, HBASE-10201_13.patch, 
 HBASE-10201_13.patch, HBASE-10201_14.patch, HBASE-10201_15.patch, 
 HBASE-10201_16.patch, HBASE-10201_17.patch, HBASE-10201_18.patch, 
 HBASE-10201_2.patch, HBASE-10201_3.patch, HBASE-10201_4.patch, 
 HBASE-10201_5.patch, HBASE-10201_6.patch, HBASE-10201_7.patch, 
 HBASE-10201_8.patch, HBASE-10201_9.patch, compactions.png, count.png, io.png, 
 memstore.png




[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-12-09 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14239588#comment-14239588
 ] 

stack commented on HBASE-10201:
---

bq. I followed RegionSplitPolicy to write FlushPolicy

I see. Yes you did it seems.

bq. ...Ted Yu suggested using FlushPolicyFactory and placing the factory method 
in it instead of FlushPolicy. 

OK. It's his fault then.

bq. Maybe the code of RegionSplitPolicy is old and need refactoring too...

My comments above apply to it too it seems yes.  The master is checking it can 
load the split policy reaching across into the regionserver package. I suppose 
the idea is checking split policy in one central place.  Should just load 
default on regionserver if we can't find the configured one.

bq. ReflectionUtils.newInstance(clazz, conf) will call setConf. 

ok

bq. Can be fixed later.

ok

Yeah, looks like you followed pattern in code base so not a problem of your 
making. Can fix both in a followup issue.

bq. flushSeqId will not be bumped if we do not flush all stores.

Because?

bq. And actually I do not know where we use FlushMarker so I do not know the 
meaning of flushSeqId in the Marker...

It may not be used just yet but it will be used soon by following read replicas.

 Port 'Make flush decisions per column family' to trunk
 --

 Key: HBASE-10201
 URL: https://issues.apache.org/jira/browse/HBASE-10201
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: Ted Yu
Assignee: zhangduo
Priority: Critical
 Fix For: 1.0.0, 2.0.0, 0.98.9

 Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, 
 HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, 
 HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_10.patch, 
 HBASE-10201_11.patch, HBASE-10201_12.patch, HBASE-10201_13.patch, 
 HBASE-10201_13.patch, HBASE-10201_14.patch, HBASE-10201_15.patch, 
 HBASE-10201_16.patch, HBASE-10201_17.patch, HBASE-10201_2.patch, 
 HBASE-10201_3.patch, HBASE-10201_4.patch, HBASE-10201_5.patch, 
 HBASE-10201_6.patch, HBASE-10201_7.patch, HBASE-10201_8.patch, 
 HBASE-10201_9.patch, compactions.png, count.png, io.png, memstore.png




[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-12-09 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14239658#comment-14239658
 ] 

Ted Yu commented on HBASE-10201:


w.r.t. FlushPolicyFactory, I made the comment before 
sanityCheckTableDescriptor() was added.

It was not my intention that master directly references class in region server 
module.

 Port 'Make flush decisions per column family' to trunk
 --

 Key: HBASE-10201
 URL: https://issues.apache.org/jira/browse/HBASE-10201
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: Ted Yu
Assignee: zhangduo
Priority: Critical
 Fix For: 1.0.0, 2.0.0, 0.98.9

 Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, 
 HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, 
 HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_10.patch, 
 HBASE-10201_11.patch, HBASE-10201_12.patch, HBASE-10201_13.patch, 
 HBASE-10201_13.patch, HBASE-10201_14.patch, HBASE-10201_15.patch, 
 HBASE-10201_16.patch, HBASE-10201_17.patch, HBASE-10201_2.patch, 
 HBASE-10201_3.patch, HBASE-10201_4.patch, HBASE-10201_5.patch, 
 HBASE-10201_6.patch, HBASE-10201_7.patch, HBASE-10201_8.patch, 
 HBASE-10201_9.patch, compactions.png, count.png, io.png, memstore.png




[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-12-09 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14239958#comment-14239958
 ] 

Jeffrey Zhong commented on HBASE-10201:
---

This is a nice feature. I scanned through the patch and below are my comments:

1) There may be a correctness issue for same-version updates (same row key and 
version). Because you use the following code to get the store file flush id, we 
could end up with multiple HStore files that have exactly the same flush seq 
id, while HBase resolves same-version updates by the store files' seqid (flush 
id). Therefore, we may end up with incorrect results.  This issue may only 
happen in 0.98 though.
{code}
+  long oldestUnflushedSeqId = wal
+  .getEarliestMemstoreSeqNum(encodedRegionName);
{code} 
In order to fix the issue, we should use the current store's max flushed seq id 
as its real HStore seq id. We also need to change HRegion.lastFlushSeqId to use 
oldestUnflushedSeqId when reporting back to Master; otherwise we may have a 
data loss issue.

2)  We have a feature where we force a flush via 
hbase.regionserver.optionalcacheflushinterval or 
hbase.regionserver.flush.per.changes, but I didn't see you handle either case 
in the selectStoresToFlush() function. This may cause HRegion.shouldFlush() to 
always return true, and we would end up with small HStore files.

3) For region server recovery, we have an optimization that uses the 
lastFlushSeqId reported by region servers to skip writing edits into 
recovered.edits files. With this feature, we may unnecessarily write much more 
data into recovered.edits. This issue doesn't happen in the log replay case.

4) Relating to your FlushMarker question, FlushMarker (or the similar 
RegionEventWALEdit) is used for the region replica feature and for reasoning 
about region/store state. As you can see (in the WALEdit class), those special 
events use a special column family, METAFAMILY, which doesn't exist for data 
regions. You should handle those events specially in getFamilyNames(); 
otherwise they may affect your bookkeeping of the oldest unflushed seqid.
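
Point 4 can be illustrated with a small stand-alone sketch. It is only an 
assumption of how such a filter might look, not the patch itself: families are 
modelled as plain strings, and the constant METAFAMILY below merely stands in 
for the marker family defined in WALEdit.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-alone sketch; not the HBase implementation.
final class FamilyNamesSketch {
    // Assumption: stands in for the special marker family in WALEdit.
    static final String METAFAMILY = "METAFAMILY";

    /**
     * Collects the distinct column families touched by an edit, skipping the
     * marker family so that flush/region event markers do not show up in the
     * per-family "oldest unflushed seqid" bookkeeping.
     */
    static Set<String> getFamilyNames(List<String> cellFamilies) {
        Set<String> names = new LinkedHashSet<>();
        for (String family : cellFamilies) {
            if (!METAFAMILY.equals(family)) {
                names.add(family);
            }
        }
        return names;
    }
}
```

With this filter, an edit that only carries a marker yields an empty set, so 
no data family's oldest unflushed seqid is touched by it.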


 Port 'Make flush decisions per column family' to trunk
 --

 Key: HBASE-10201
 URL: https://issues.apache.org/jira/browse/HBASE-10201
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: Ted Yu
Assignee: zhangduo
Priority: Critical
 Fix For: 1.0.0, 2.0.0, 0.98.9

 Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, 
 HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, 
 HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_10.patch, 
 HBASE-10201_11.patch, HBASE-10201_12.patch, HBASE-10201_13.patch, 
 HBASE-10201_13.patch, HBASE-10201_14.patch, HBASE-10201_15.patch, 
 HBASE-10201_16.patch, HBASE-10201_17.patch, HBASE-10201_2.patch, 
 HBASE-10201_3.patch, HBASE-10201_4.patch, HBASE-10201_5.patch, 
 HBASE-10201_6.patch, HBASE-10201_7.patch, HBASE-10201_8.patch, 
 HBASE-10201_9.patch, compactions.png, count.png, io.png, memstore.png




[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-12-09 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14240419#comment-14240419
 ] 

zhangduo commented on HBASE-10201:
--

{quote}
1) There may be a correctness issue for same version(same row key  version) 
updates...
{quote}
I think you mean that KVScannerComparator will use the sequenceId to compare 
when we get the same key. Yes, this is a problem I missed. I think we need to 
change the code below as you suggested and use the store's max seqId instead of 
flushSeqId here.
{code}
for (Store s : storesToFlush) {
  totalFlushableSizeOfFlushableStores += s.getFlushableSize();
  storeFlushCtxs.add(s.createFlushContext(flushSeqId));
  committedFiles.put(s.getFamily().getName(), null); // for writing stores to WAL
}
{code}
{quote}
2) We have a feature where we force a flush...
{quote}
That's why I introduced a FlushPolicy. For now the policy is simple: we only 
consider the size of a store. So if we keep a store unflushed for a long time, 
there will eventually be a forced flush request for all stores, which may 
generate unnecessary small files. I think we can introduce a new FlushPolicy 
later to handle this better.
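
A size-only policy of this kind might be sketched as follows. This is an 
illustration under assumptions, not the patch: the class name, the 
lowerBoundBytes threshold, the forceAll flag, and the fallback to flushing 
everything when no store is large enough are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-alone sketch; not the HBase FlushPolicy.
final class SizeBasedFlushSketch {
    /**
     * Picks the column families whose memstore is larger than lowerBoundBytes.
     * A forced (for example periodic) flush selects every family. If no family
     * is above the bound, every family is selected as well, so that a flush
     * request always makes progress (an assumption of this sketch).
     */
    static List<String> selectStoresToFlush(
            Map<String, Long> memstoreSizeByFamily, long lowerBoundBytes, boolean forceAll) {
        List<String> all = new ArrayList<>(memstoreSizeByFamily.keySet());
        if (forceAll) {
            return all;
        }
        List<String> selected = new ArrayList<>();
        for (Map.Entry<String, Long> e : memstoreSizeByFamily.entrySet()) {
            if (e.getValue() > lowerBoundBytes) {
                selected.add(e.getKey());
            }
        }
        return selected.isEmpty() ? all : selected;
    }
}
```

With one 120 MB family and two families of a few MB each, a 16 MB bound 
selects only the large family, while a forced flush still writes out the small 
ones.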
{quote}
3) For region server recovery...
{quote}
I think the issue in 1) makes this problem even worse, because the flushSeqId 
passed to createFlushContext will be used as the maxSeqId of a store... I will 
fix it in the next patch. And if we want to skip WAL entries exactly, then we 
need to report a familyName-seqId map to the master, which will change the rpc 
protocol (and the format of the zk data in distributed log replay). This is a 
big change, so I think we can reopen HBASE-12405 to handle it after HBASE-10201 
gets in.
{quote}
4) Relating to your FlushMarker question...
{quote}
I will fix getFamilyNames(), thanks. Is there anything else that would break 
read replicas? I'm not familiar with read replicas so I may miss something.

Thanks~

 Port 'Make flush decisions per column family' to trunk
 --

 Key: HBASE-10201
 URL: https://issues.apache.org/jira/browse/HBASE-10201
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: Ted Yu
Assignee: zhangduo
 Fix For: 1.0.0, 2.0.0, 0.98.10

 Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, 
 HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, 
 HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_10.patch, 
 HBASE-10201_11.patch, HBASE-10201_12.patch, HBASE-10201_13.patch, 
 HBASE-10201_13.patch, HBASE-10201_14.patch, HBASE-10201_15.patch, 
 HBASE-10201_16.patch, HBASE-10201_17.patch, HBASE-10201_2.patch, 
 HBASE-10201_3.patch, HBASE-10201_4.patch, HBASE-10201_5.patch, 
 HBASE-10201_6.patch, HBASE-10201_7.patch, HBASE-10201_8.patch, 
 HBASE-10201_9.patch, compactions.png, count.png, io.png, memstore.png




[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-12-09 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14240436#comment-14240436
 ] 

zhangduo commented on HBASE-10201:
--

{quote}
OK. Its his fault then.
{quote}
It's not his fault, as he only asked me to use a factory class and didn't tell 
me to reference regionserver from HMaster... 
I think we need to find a way to do the sanity check when loading tables 
without referencing regionserver...

{quote}
flushSeqId will not be bumped if we do not flush all stores.
Because?
{quote}
The flushSeqId will be one less than the oldest edit still in a memstore if we 
do not flush all stores.

 Port 'Make flush decisions per column family' to trunk
 --

 Key: HBASE-10201
 URL: https://issues.apache.org/jira/browse/HBASE-10201
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: Ted Yu
Assignee: zhangduo
 Fix For: 1.0.0, 2.0.0, 0.98.10

 Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, 
 HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, 
 HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_10.patch, 
 HBASE-10201_11.patch, HBASE-10201_12.patch, HBASE-10201_13.patch, 
 HBASE-10201_13.patch, HBASE-10201_14.patch, HBASE-10201_15.patch, 
 HBASE-10201_16.patch, HBASE-10201_17.patch, HBASE-10201_2.patch, 
 HBASE-10201_3.patch, HBASE-10201_4.patch, HBASE-10201_5.patch, 
 HBASE-10201_6.patch, HBASE-10201_7.patch, HBASE-10201_8.patch, 
 HBASE-10201_9.patch, compactions.png, count.png, io.png, memstore.png




[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-12-09 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14240441#comment-14240441
 ] 

stack commented on HBASE-10201:
---

Made it critical again. Removed it from 0.98 (not critical for 0.98).  Trying 
to get it into 1.0 even if it is turned off by default.  That's why I had it 
critical.  Speak up if you think otherwise [~enis]  Doing the testing for this 
feature as though it were critical.  Benefit is nice and it's an old issue 
getting fixed.

 Port 'Make flush decisions per column family' to trunk
 --

 Key: HBASE-10201
 URL: https://issues.apache.org/jira/browse/HBASE-10201
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: Ted Yu
Assignee: zhangduo
Priority: Critical
 Fix For: 1.0.0, 2.0.0

 Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, 
 HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, 
 HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_10.patch, 
 HBASE-10201_11.patch, HBASE-10201_12.patch, HBASE-10201_13.patch, 
 HBASE-10201_13.patch, HBASE-10201_14.patch, HBASE-10201_15.patch, 
 HBASE-10201_16.patch, HBASE-10201_17.patch, HBASE-10201_2.patch, 
 HBASE-10201_3.patch, HBASE-10201_4.patch, HBASE-10201_5.patch, 
 HBASE-10201_6.patch, HBASE-10201_7.patch, HBASE-10201_8.patch, 
 HBASE-10201_9.patch, compactions.png, count.png, io.png, memstore.png




[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-12-09 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14240461#comment-14240461
 ] 

Jeffrey Zhong commented on HBASE-10201:
---

{quote}
(and the format of zk data in distributed log replay)
{quote}
You don't have to change this because log replay already gets max seqId per 
store before sending edits for replay.

 Port 'Make flush decisions per column family' to trunk
 --

 Key: HBASE-10201
 URL: https://issues.apache.org/jira/browse/HBASE-10201
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: Ted Yu
Assignee: zhangduo
 Fix For: 1.0.0, 2.0.0

 Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, 
 HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, 
 HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_10.patch, 
 HBASE-10201_11.patch, HBASE-10201_12.patch, HBASE-10201_13.patch, 
 HBASE-10201_13.patch, HBASE-10201_14.patch, HBASE-10201_15.patch, 
 HBASE-10201_16.patch, HBASE-10201_17.patch, HBASE-10201_2.patch, 
 HBASE-10201_3.patch, HBASE-10201_4.patch, HBASE-10201_5.patch, 
 HBASE-10201_6.patch, HBASE-10201_7.patch, HBASE-10201_8.patch, 
 HBASE-10201_9.patch, compactions.png, count.png, io.png, memstore.png




[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-12-09 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14240465#comment-14240465
 ] 

stack commented on HBASE-10201:
---

Changed my mind. Set it to Major. Doesn't need to be critical.  If it gets done 
in time, well and good.

 Port 'Make flush decisions per column family' to trunk
 --

 Key: HBASE-10201
 URL: https://issues.apache.org/jira/browse/HBASE-10201
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: Ted Yu
Assignee: zhangduo
 Fix For: 1.0.0, 2.0.0

 Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, 
 HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, 
 HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_10.patch, 
 HBASE-10201_11.patch, HBASE-10201_12.patch, HBASE-10201_13.patch, 
 HBASE-10201_13.patch, HBASE-10201_14.patch, HBASE-10201_15.patch, 
 HBASE-10201_16.patch, HBASE-10201_17.patch, HBASE-10201_2.patch, 
 HBASE-10201_3.patch, HBASE-10201_4.patch, HBASE-10201_5.patch, 
 HBASE-10201_6.patch, HBASE-10201_7.patch, HBASE-10201_8.patch, 
 HBASE-10201_9.patch, compactions.png, count.png, io.png, memstore.png




[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-12-09 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14240528#comment-14240528
 ] 

zhangduo commented on HBASE-10201:
--

[~jeffreyz] I think flushSeqId is ambiguous here. We actually have two things: 
one is maxFlushedSeqId, and the other is seqIdOfFlushOperation.

Before this patch, maxFlushedSeqId is equal to seqIdOfFlushOperation because we 
flush all the data before the flush operation. After this patch, they are 
different.
So I think we need to introduce a new field called maxFlushSeqId instead of 
lastFlushSeqId in HRegion, and generate flushSeqId in the old way (by 
increasing the sequenceId of the region).

 Port 'Make flush decisions per column family' to trunk
 --

 Key: HBASE-10201
 URL: https://issues.apache.org/jira/browse/HBASE-10201
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: Ted Yu
Assignee: zhangduo
 Fix For: 1.0.0, 2.0.0

 Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, 
 HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, 
 HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_10.patch, 
 HBASE-10201_11.patch, HBASE-10201_12.patch, HBASE-10201_13.patch, 
 HBASE-10201_13.patch, HBASE-10201_14.patch, HBASE-10201_15.patch, 
 HBASE-10201_16.patch, HBASE-10201_17.patch, HBASE-10201_2.patch, 
 HBASE-10201_3.patch, HBASE-10201_4.patch, HBASE-10201_5.patch, 
 HBASE-10201_6.patch, HBASE-10201_7.patch, HBASE-10201_8.patch, 
 HBASE-10201_9.patch, compactions.png, count.png, io.png, memstore.png




[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-12-09 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14240568#comment-14240568
 ] 

zhangduo commented on HBASE-10201:
--

{quote}
You don't have to change this because log replay already gets max seqId per 
store before sending edits for replay.
{quote}
There may be some misunderstanding.
I passed a wrong value to createFlushContext, so the maxSeqId of the store is 
less than it should be. This causes unnecessary log splitting and replay.
And if I fix this, then the problem becomes the one described in HBASE-12405. 
We need to store a map to solve HBASE-12405 completely.

And for distributed log replay, postOpenDeployTasks calls 
updateRecoveringRegionLastFlushedSequenceId to store the maxSeqId in zk, and 
WALSplitter.splitLogFile uses it to skip WAL entries before passing the WAL to 
a regionserver for replay. We can use a map when replaying the WAL to skip 
unnecessary cells (this is what we do in the patch). But if we store a map on 
zk, then we can skip those WAL entries earlier.
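
A minimal sketch of the map-based filtering idea, under stated assumptions: 
WalCell and ReplayFilterSketch are hypothetical stand-ins, not the classes used 
in the patch, and a family missing from the map is treated as never flushed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical WAL cell: only the column family and the sequence id of its edit.
final class WalCell {
  final String family;
  final long seqId;

  WalCell(String family, long seqId) {
    this.family = family;
    this.seqId = seqId;
  }
}

final class ReplayFilterSketch {
  // Keeps only cells whose sequence id is above the max flushed id of their
  // own family. Families absent from the map are assumed never flushed.
  static List<WalCell> cellsToReplay(List<WalCell> wal,
                                     Map<String, Long> maxFlushedByFamily) {
    List<WalCell> kept = new ArrayList<>();
    for (WalCell c : wal) {
      long flushed = maxFlushedByFamily.getOrDefault(c.family, -1L);
      if (c.seqId > flushed) {
        kept.add(c);
      }
    }
    return kept;
  }
}
```

With a single region-wide value, either already-flushed cells of the large 
family get replayed again or unflushed cells of the small family are at risk; 
the per-family map avoids both.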


[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-12-09 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14240619#comment-14240619
 ] 

zhangduo commented on HBASE-10201:
--

Oh, after digging into the code I found that the file stored on zk already has 
the seqId of each store, but WALSplitter.splitLogFile only uses the 
LastFlushedSequenceId and ignores the store sequence ids. So it is easy to 
change the split work to use the sequence id of each store. Let me try. Thanks.


[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-12-09 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14240737#comment-14240737
 ] 

stack commented on HBASE-10201:
---

[~jeffreyz] When you say...

bq. ... This issue may only happen in 0.98 though.

is that because we are not doing DLR in 0.98, or for some other reason?  This 
patch is unlikely to make it back to 0.98, I'd say.

On the fix for 1.) above, hfiles will be written out with the store's flushed 
seqid but we will keep on telling master the oldest unflushed edit 
(oldestUnflushedSeqId).  Since flush policies can return any set of Stores 
without regard to sequenceid, we could have edits in memstores with sequenceids 
that are earlier than those of persisted hfiles.  Since telling the master 
oldestUnflushedSeqId does not guarantee that oldestUnflushedSeqId will be 
available at recovery time (it is in the master memory only IIRC, and master 
may crash and lose it), when region opens post-recovery, we look at sequenceids 
from hfiles to figure the region's sequenceid.  Will this mean we drop edits 
because region thinks its sequenceid is higher than it should be?
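
The concern can be put in numbers with a small hypothetical sketch. It does not 
describe what HBase actually does at region open; RecoveryCutoffSketch and its 
methods are invented for illustration. Say store A has an hfile with seqid 100, 
store B has an hfile with seqid 40, and B still held an unflushed edit with 
seqid 50 when the server died.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; not how HBase computes its recovery point.
final class RecoveryCutoffSketch {
  // Region-wide cutoff taken as the highest sequence id found in any hfile.
  static long cutoffFromMax(List<Long> maxSeqIdPerStore) {
    return Collections.max(maxSeqIdPerStore);
  }

  // Conservative cutoff: the lowest of the per-store maxima.
  static long cutoffFromMin(List<Long> maxSeqIdPerStore) {
    return Collections.min(maxSeqIdPerStore);
  }

  // An edit is replayed only if its sequence id is above the cutoff.
  static boolean replayed(long editSeqId, long cutoff) {
    return editSeqId > cutoff;
  }
}
```

With the max-based cutoff (100) the edit with seqid 50 is skipped, which is the 
data loss asked about above; with the min-based cutoff (40) it is replayed, at 
the cost of replaying some edits of store A that are already on disk.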

3. is a 'known' cost.  Good to know that DLR won't have this issue.

4. is a good point (as is 2.)



[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-12-09 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14240754#comment-14240754
 ] 

zhangduo commented on HBASE-10201:
--

{quote}
3. is a 'known' cost. Good to know that DLR won't have this issue.
{quote}
Yeah, I finally found that LogReplayOutputSink filters out cells using 
regionMaxSeqIdInStores in the groupEditsByServer method. This is actually what 
we want.

And do I need to change the original log split policy to also use a 
familyName-seqId map to filter out cells that are already flushed?  
Thanks~


[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-12-09 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14240779#comment-14240779
 ] 

zhangduo commented on HBASE-10201:
--

{quote}
2) We have a feature where we force a flush by 
hbase.regionserver.optionalcacheflushinterval or 
hbase.regionserver.flush.per.changes while I didn't see you handle both cases 
in selectStoresToFlush() function. This may cause HRegion.shouldFlush() always 
return true and end up with small hstore files.
{quote}
I think I get the point. Actually I use forceFlushAllStores=true when 
shouldFlush returns true, so there will not be a situation where 
HRegion.shouldFlush() always returns true, because we will flush all stores.
But I think we can pass forceFlushAllStores=false in that case and add old 
stores to the specificStoresToFlush in selectStoresToFlush to handle it better.
I will fix it. Thanks.
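
A rough sketch of the proposed selection, under stated assumptions: StoreInfo, 
its fields and the parameter names are hypothetical, and only the time-based 
trigger is modeled (hbase.regionserver.flush.per.changes is left out).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-store snapshot; field names are illustrative only.
final class StoreInfo {
  final String family;
  final long memstoreBytes;
  final long oldestEditAgeMs;

  StoreInfo(String family, long memstoreBytes, long oldestEditAgeMs) {
    this.family = family;
    this.memstoreBytes = memstoreBytes;
    this.oldestEditAgeMs = oldestEditAgeMs;
  }
}

final class SelectStoresSketch {
  // Picks large stores, plus non-empty stores whose oldest edit is too old,
  // so a periodic flush does not have to flush every store.
  static List<StoreInfo> selectStoresToFlush(List<StoreInfo> stores,
                                             boolean forceFlushAllStores,
                                             long lowerBoundBytes,
                                             long maxEditAgeMs) {
    if (forceFlushAllStores) {
      return new ArrayList<>(stores);
    }
    List<StoreInfo> selected = new ArrayList<>();
    for (StoreInfo s : stores) {
      boolean large = s.memstoreBytes > lowerBoundBytes;
      boolean old = s.memstoreBytes > 0 && s.oldestEditAgeMs > maxEditAgeMs;
      if (large || old) {
        selected.add(s);
      }
    }
    return selected;
  }
}
```

Because old stores are always selected, the condition that made shouldFlush 
return true is cleared by the flush even when forceFlushAllStores is false.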


[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-12-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14237602#comment-14237602
 ] 

Hadoop QA commented on HBASE-10201:
---

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12685679/HBASE-10201_17.patch
  against master branch at commit 87e44140040ab9a864e592c13f164dcde6ed6c03.
  ATTACHMENT ID: 12685679

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 32 new 
or modified tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11993//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11993//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11993//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11993//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11993//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11993//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11993//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11993//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11993//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11993//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11993//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11993//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11993//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11993//console

This message is automatically generated.

 Port 'Make flush decisions per column family' to trunk
 --

 Key: HBASE-10201
 URL: https://issues.apache.org/jira/browse/HBASE-10201
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: Ted Yu
Assignee: zhangduo
Priority: Critical
 Fix For: 1.0.0, 2.0.0, 0.98.9

 Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, 
 HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, 
 HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_10.patch, 
 HBASE-10201_11.patch, HBASE-10201_12.patch, HBASE-10201_13.patch, 
 HBASE-10201_13.patch, HBASE-10201_14.patch, HBASE-10201_15.patch, 
 HBASE-10201_16.patch, HBASE-10201_17.patch, HBASE-10201_2.patch, 
 HBASE-10201_3.patch, HBASE-10201_4.patch, HBASE-10201_5.patch, 
 HBASE-10201_6.patch, HBASE-10201_7.patch, HBASE-10201_8.patch, 
 HBASE-10201_9.patch, compactions.png, count.png, io.png, memstore.png


 Currently the flush decision is made using the aggregate size of all column 
 families. When large and small column families co-exist, this causes many 
 small flushes of the smaller CF. We need to make per-CF flush decisions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-12-08 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14238808#comment-14238808
 ] 

stack commented on HBASE-10201:
---

[~jeffreyz] Would you mind taking a look at the sequenceid accounting that is 
going on in this patch? I am currently testing.  Would be good to get the view 
of another with a seqid fixation.


[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-12-08 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14239008#comment-14239008
 ] 

Jeffrey Zhong commented on HBASE-10201:
---

[~saint@gmail.com] Sure. Let me take a look at this patch! 


[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-12-08 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14239062#comment-14239062
 ] 

stack commented on HBASE-10201:
---

Here is some feedback from reading through the latest version of the patch. 
Let's address these minor items after we are sure the most important part, the 
sequenceid handling, is working (I'm running tests here but it's taking a 
while: first I need to prove that the hbase 1.0 branch is healthy, then I intro 
your patch... and [~jeffreyz], Mr SequenceId, is going to take a look too).  So 
hold off on making a patch till testing and Jeffrey's review are done.

Sorry this is taking so long to get in.  On the one hand, you can tell we are 
excited about getting this patch in because the improvement is really nice, but 
on the other it touches a very sensitive part of hbase, the region 
sequenceid'ing, so we need to exercise extra caution.  Thanks for your patience 
[~Apache9]

Nits to be addressed on commit or if you make a new version of the patch 
(you've done enough as it is -- smile -- and I could do below on commit np).

I can go over the javadoc on commit. Small edit would fix it all up nicely.

Below is a nit that can be addressed in a follow-on:

This config is not general.  It belongs to a particular policy (If 
FlushLargeStoresPolicy is used...): 
hbase.hregion.percolumnfamilyflush.size.lower.bound  It should probably have 
the policy it is for in its name.  Maybe just don't mention it in 
hbase-default.xml.  Let users find it if they need it (16MB is a nice default 
low-bound).

It is odd that this is public:

public static Class<? extends FlushPolicy> getFlushPolicyClass(

It is nice that the master tests that we can load a policy but it does not even 
use flush policy (if we fail to load fall back to default with big warning?)  
And flush policies are over in regionserver package so here we have master 
reaching over and into the regionserver package. Would be good to avoid doing 
this x-package reach especially when it does not seem to be needed.

I would think this would be an internal method for the factory to use?

Also in HTD, you call it getFlushPolicyClassName but here you call it 
getFlushPolicyClass... would be good to be same.

This policy stuff you've added is nicer than what was here previously.  Good one.

Should these two strings just be the same?

FLUSH_SIZE_LOWER_BOUND_KEY and DEFAULT_FLUSH_SIZE_LOWER_BOUND even though they 
are read from different places?  No harm the key being the same especially 
since in HTD, you hide the key by providing getter/setters.
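
One way to picture a single key read from two places is the sketch below. It 
is an assumption-laden illustration: plain maps stand in for the table 
descriptor values and the site configuration, the class name is hypothetical, 
and the precedence (table value first, then site config, then the 16MB default) 
is assumed rather than taken from the patch.

```java
import java.util.Map;

// Hypothetical lookup; maps stand in for HTD values and the site configuration.
final class LowerBoundLookupSketch {
  static final String KEY = "hbase.hregion.percolumnfamilyflush.size.lower.bound";
  static final long DEFAULT_BYTES = 16L * 1024 * 1024; // 16MB default low-bound

  // Assumed precedence: table-level value, then site config, then the default.
  static long lowerBound(Map<String, String> tableValues,
                         Map<String, String> siteConf) {
    String v = tableValues.get(KEY);
    if (v == null) {
      v = siteConf.get(KEY);
    }
    return v == null ? DEFAULT_BYTES : Long.parseLong(v);
  }
}
```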

The FlushPolicy api is a little odd.  It implements Configured but where do you 
do a setConf on it? Then in the configureForRegion method, you take a Region 
but all it is used for is to emit the region name in Strings and to get an 
instance of HTableDescriptor.  The flush takes a list of stores.  Can't it get 
them from the region it was given in configureForRegion?  This is a nit 
comment.  Ignore for now.

... Stopped at the sequence id changes; will be back. Thanks.


[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-12-08 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14239084#comment-14239084
 ] 

zhangduo commented on HBASE-10201:
--

[~stack] I followed RegionSplitPolicy to write FlushPolicy, except that 
[~tedyu] suggested using FlushPolicyFactory and placing the factory method in 
it instead of FlushPolicy. Maybe the code of RegionSplitPolicy is old and needs 
refactoring too...

{quote}
The FlushPolicy api is a little odd. It implements Configured but where do you 
do a setConf on it? Then in the configureForRegion method, you take a Region 
but all it is used for is to emit region name on Strings and to get instance of 
HTableDescriptor. The flush takes a list of stores. Can't it get them from the 
region it was given in configureForRegion? This is a nit comment. Ignore for 
now.
{quote}
ReflectionUtils.newInstance(clazz, conf) will call setConf. And I agree that 
if we implement configureForRegion, then the list of stores is not necessary 
when doing the selection. This can be fixed later.
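
The pattern being described, where a reflective factory creates the instance 
and then hands it the configuration, can be sketched as follows. This is a 
self-contained imitation, not Hadoop's ReflectionUtils: every type here is a 
hypothetical stand-in and java.util.Properties plays the role of the 
configuration object.

```java
import java.util.Properties;

// Hypothetical stand-in for a "configurable" contract.
interface ConfigurableSketch {
  void setConf(Properties conf);
}

// Hypothetical base policy that simply remembers the configuration it is given.
abstract class FlushPolicySketch implements ConfigurableSketch {
  protected Properties conf;

  @Override
  public void setConf(Properties conf) {
    this.conf = conf;
  }

  Properties getConf() {
    return conf;
  }
}

final class FlushAllSketch extends FlushPolicySketch {
}

final class PolicyFactorySketch {
  // Instantiates clazz via its no-arg constructor and, if the instance is
  // configurable, passes it the configuration before returning it.
  static <T> T newInstance(Class<T> clazz, Properties conf) {
    try {
      T instance = clazz.getDeclaredConstructor().newInstance();
      if (instance instanceof ConfigurableSketch) {
        ((ConfigurableSketch) instance).setConf(conf);
      }
      return instance;
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Cannot instantiate " + clazz.getName(), e);
    }
  }
}
```

So the policy never calls setConf itself; the caller of the factory gets back 
an instance that is already configured.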

[~jeffreyz] I think the biggest problem is that this patch changes the 
flushSeqId generation. flushSeqId will not be bumped if we do not flush all 
stores. I think the flushSeqId should be called highestFlushedToDiskSeqId 
in this patch. And actually I do not know where we use FlushMarker, so I do not 
know the meaning of flushSeqId in the Marker...

Thanks.


[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-12-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14237213#comment-14237213
 ] 

Hadoop QA commented on HBASE-10201:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12685611/HBASE-10201_14.patch
  against master branch at commit bb15fd5fe0a89e647cd9cefa0ceae342578f0833.
  ATTACHMENT ID: 12685611

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 32 new 
or modified tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 6 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11982//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11982//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11982//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11982//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11982//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11982//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11982//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11982//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11982//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11982//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11982//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11982//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11982//artifact/patchprocess/checkstyle-aggregate.html

  Javadoc warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11982//artifact/patchprocess/patchJavadocWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11982//console

This message is automatically generated.

 Port 'Make flush decisions per column family' to trunk
 --

 Key: HBASE-10201
 URL: https://issues.apache.org/jira/browse/HBASE-10201
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: Ted Yu
Assignee: zhangduo
Priority: Critical
 Fix For: 1.0.0, 2.0.0, 0.98.9

 Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, 
 HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, 
 HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_10.patch, 
 HBASE-10201_11.patch, HBASE-10201_12.patch, HBASE-10201_13.patch, 
 HBASE-10201_13.patch, HBASE-10201_14.patch, HBASE-10201_15.patch, 
 HBASE-10201_2.patch, HBASE-10201_3.patch, HBASE-10201_4.patch, 
 HBASE-10201_5.patch, HBASE-10201_6.patch, HBASE-10201_7.patch, 
 HBASE-10201_8.patch, HBASE-10201_9.patch, compactions.png, count.png, io.png, 
 memstore.png


 Currently the flush decision is made using the aggregate size of all column 
 families. When large and small column families co-exist, this causes many 
 small flushes of the smaller CF. We need to make per-CF flush decisions.



--
This message was sent by Atlassian JIRA

[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-12-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14237214#comment-14237214
 ] 

Hadoop QA commented on HBASE-10201:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12685612/HBASE-10201_15.patch
  against master branch at commit bb15fd5fe0a89e647cd9cefa0ceae342578f0833.
  ATTACHMENT ID: 12685612

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 32 new 
or modified tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 6 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11983//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11983//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11983//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11983//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11983//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11983//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11983//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11983//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11983//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11983//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11983//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11983//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11983//artifact/patchprocess/checkstyle-aggregate.html

  Javadoc warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11983//artifact/patchprocess/patchJavadocWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11983//console

This message is automatically generated.

 Port 'Make flush decisions per column family' to trunk
 --

 Key: HBASE-10201
 URL: https://issues.apache.org/jira/browse/HBASE-10201
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: Ted Yu
Assignee: zhangduo
Priority: Critical
 Fix For: 1.0.0, 2.0.0, 0.98.9

 Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, 
 HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, 
 HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_10.patch, 
 HBASE-10201_11.patch, HBASE-10201_12.patch, HBASE-10201_13.patch, 
 HBASE-10201_13.patch, HBASE-10201_14.patch, HBASE-10201_15.patch, 
 HBASE-10201_2.patch, HBASE-10201_3.patch, HBASE-10201_4.patch, 
 HBASE-10201_5.patch, HBASE-10201_6.patch, HBASE-10201_7.patch, 
 HBASE-10201_8.patch, HBASE-10201_9.patch, compactions.png, count.png, io.png, 
 memstore.png


 Currently the flush decision is made using the aggregate size of all column 
 families. When large and small column families co-exist, this causes many 
 small flushes of the smaller CF. We need to make per-CF flush decisions.




[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-12-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14237370#comment-14237370
 ] 

Hadoop QA commented on HBASE-10201:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12685645/HBASE-10201_16.patch
  against master branch at commit 9fd6db3703d3e7ec50b32b1e96c65ed9f2b1456d.
  ATTACHMENT ID: 12685645

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 32 new 
or modified tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 lineLengths{color}.  The patch introduces the following lines 
longer than 100:
+  private static final Class<? extends FlushPolicy> 
DEFAULT_FLUSH_POLICY_CLASS = FlushLargeStoresPolicy.class;
+new WALKey(info.getEncodedNameAsBytes(), htd.getTableName(), 
System.currentTimeMillis()),

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11990//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11990//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11990//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11990//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11990//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11990//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11990//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11990//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11990//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11990//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11990//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11990//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11990//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11990//console

This message is automatically generated.

 Port 'Make flush decisions per column family' to trunk
 --

 Key: HBASE-10201
 URL: https://issues.apache.org/jira/browse/HBASE-10201
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: Ted Yu
Assignee: zhangduo
Priority: Critical
 Fix For: 1.0.0, 2.0.0, 0.98.9

 Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, 
 HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, 
 HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_10.patch, 
 HBASE-10201_11.patch, HBASE-10201_12.patch, HBASE-10201_13.patch, 
 HBASE-10201_13.patch, HBASE-10201_14.patch, HBASE-10201_15.patch, 
 HBASE-10201_16.patch, HBASE-10201_2.patch, HBASE-10201_3.patch, 
 HBASE-10201_4.patch, HBASE-10201_5.patch, HBASE-10201_6.patch, 
 HBASE-10201_7.patch, HBASE-10201_8.patch, HBASE-10201_9.patch, 
 compactions.png, count.png, io.png, memstore.png


 Currently the flush decision is made using the aggregate size of all column 
 families. When large and small column families co-exist, this causes many 
 small 

[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-12-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14235277#comment-14235277
 ] 

Hadoop QA commented on HBASE-10201:
---

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12685263/HBASE-10201_13.patch
  against master branch at commit 08754f2c431b829b0d6269bdb23284dd679ed8ca.
  ATTACHMENT ID: 12685263

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 33 new 
or modified tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11945//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11945//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11945//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11945//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11945//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11945//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11945//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11945//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11945//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11945//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11945//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11945//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11945//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11945//console

This message is automatically generated.

 Port 'Make flush decisions per column family' to trunk
 --

 Key: HBASE-10201
 URL: https://issues.apache.org/jira/browse/HBASE-10201
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: Ted Yu
Assignee: zhangduo
Priority: Critical
 Fix For: 1.0.0, 2.0.0, 0.98.9

 Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, 
 HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, 
 HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_10.patch, 
 HBASE-10201_11.patch, HBASE-10201_12.patch, HBASE-10201_13.patch, 
 HBASE-10201_13.patch, HBASE-10201_2.patch, HBASE-10201_3.patch, 
 HBASE-10201_4.patch, HBASE-10201_5.patch, HBASE-10201_6.patch, 
 HBASE-10201_7.patch, HBASE-10201_8.patch, HBASE-10201_9.patch


 Currently the flush decision is made using the aggregate size of all column 
 families. When large and small column families co-exist, this causes many 
 small flushes of the smaller CF. We need to make per-CF flush decisions.





[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-12-05 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14236647#comment-14236647
 ] 

zhangduo commented on HBASE-10201:
--

{quote}
I see fewer compactions and fewer hfiles (so less i/o), memstores carrying more 
(it's hard to see but you should be able to make out memstore sizes do not go to 
zero or near zero when the patch is enabled)
{quote}
Glad to see it does help:). Thanks~

 Port 'Make flush decisions per column family' to trunk
 --

 Key: HBASE-10201
 URL: https://issues.apache.org/jira/browse/HBASE-10201
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: Ted Yu
Assignee: zhangduo
Priority: Critical
 Fix For: 1.0.0, 2.0.0, 0.98.9

 Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, 
 HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, 
 HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_10.patch, 
 HBASE-10201_11.patch, HBASE-10201_12.patch, HBASE-10201_13.patch, 
 HBASE-10201_13.patch, HBASE-10201_14.patch, HBASE-10201_2.patch, 
 HBASE-10201_3.patch, HBASE-10201_4.patch, HBASE-10201_5.patch, 
 HBASE-10201_6.patch, HBASE-10201_7.patch, HBASE-10201_8.patch, 
 HBASE-10201_9.patch, compactions.png, count.png, io.png, memstore.png


 Currently the flush decision is made using the aggregate size of all column 
 families. When large and small column families co-exist, this causes many 
 small flushes of the smaller CF. We need to make per-CF flush decisions.





[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-12-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14236671#comment-14236671
 ] 

Hadoop QA commented on HBASE-10201:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12685524/HBASE-10201_14.patch
  against master branch at commit 4a36f662c2738a61535cf188f27d478d72c5a38a.
  ATTACHMENT ID: 12685524

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 32 new 
or modified tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 
15 warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
 

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11969//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11969//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11969//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11969//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11969//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11969//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11969//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11969//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11969//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11969//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11969//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11969//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11969//artifact/patchprocess/checkstyle-aggregate.html

  Javadoc warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11969//artifact/patchprocess/patchJavadocWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11969//console

This message is automatically generated.

 Port 'Make flush decisions per column family' to trunk
 --

 Key: HBASE-10201
 URL: https://issues.apache.org/jira/browse/HBASE-10201
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: Ted Yu
Assignee: zhangduo
Priority: Critical
 Fix For: 1.0.0, 2.0.0, 0.98.9

 Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, 
 HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, 
 HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_10.patch, 
 HBASE-10201_11.patch, HBASE-10201_12.patch, HBASE-10201_13.patch, 
 HBASE-10201_13.patch, HBASE-10201_14.patch, HBASE-10201_2.patch, 
 HBASE-10201_3.patch, HBASE-10201_4.patch, HBASE-10201_5.patch, 
 HBASE-10201_6.patch, HBASE-10201_7.patch, HBASE-10201_8.patch, 
 HBASE-10201_9.patch, compactions.png, count.png, io.png, memstore.png


 Currently the flush decision is made using the aggregate size of all column 
 families. When large and small column families co-exist, this causes many 
 small flushes of the smaller CF. We need to make per-CF flush decisions.





[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-12-04 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14234975#comment-14234975
 ] 

zhangduo commented on HBASE-10201:
--

{quote}
Suggest you drop memstore from the names of these configs:
hbase.hregion.memstore.percolumnfamilyflush.enabled
hbase.hregion.memstore.percolumnfamilyflush.size.lower.bound
{quote}
done.
{quote}
Importing HRegion into HMaster should be avoided – we are reaching across 
packages – especially just to get at a define. Move this config up into 
HConstants since it is used by two major subpackages or probably better, put it 
into HRegionInfo.
{quote}
Now HMaster only contains a FlushPolicy.getFlushPolicyClass(htd, conf); call. 
This is the same as with RegionSplitPolicy.
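
For readers unfamiliar with this kind of lookup, a minimal sketch follows. It assumes that a table-level setting takes precedence over the cluster configuration, with a default when neither is set; that ordering is an assumption for illustration, and the thread does not spell it out. The maps stand in for the table descriptor and the configuration, and `POLICY_KEY` and `resolvePolicyName` are hypothetical names. Only the default class name comes from elsewhere in this thread.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; not the signature or body of FlushPolicy.getFlushPolicyClass.
public class PolicyLookupSketch {
    // Made-up key used only for this illustration.
    static final String POLICY_KEY = "example.flush.policy.class";
    // Default class name as quoted in this thread's lineLengths report.
    static final String DEFAULT_POLICY = "FlushLargeStoresPolicy";

    /** Resolves the policy name: table attribute first, then cluster config, then default. */
    static String resolvePolicyName(Map<String, String> tableAttrs,
                                    Map<String, String> clusterConf) {
        String name = tableAttrs.get(POLICY_KEY);
        if (name == null) {
            name = clusterConf.getOrDefault(POLICY_KEY, DEFAULT_POLICY);
        }
        return name;
    }

    public static void main(String[] args) {
        Map<String, String> table = new HashMap<>();
        Map<String, String> conf = new HashMap<>();
        System.out.println(resolvePolicyName(table, conf)); // FlushLargeStoresPolicy
        conf.put(POLICY_KEY, "ClusterWidePolicy");
        System.out.println(resolvePolicyName(table, conf)); // ClusterWidePolicy
        table.put(POLICY_KEY, "PerTablePolicy");
        System.out.println(resolvePolicyName(table, conf)); // PerTablePolicy
    }
}
```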

{quote}
Why do we have to change the API on FlushRequest? Can the flush implementation 
not do all the necessary figuring of what to flush reading necessary configs., 
etc.? Maybe you need the flag to 'force' a full region flush? If so, should it 
be a force flag rather than the effete 'selectiveFlushRequest'?
{quote}
I changed selectiveFlushRequest to forceFlushAllStores and inverted the 
true/false values accordingly. Hope I didn't miss something.
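
A rough sketch of how such a force flag might interact with a selection policy is shown below. Everything except the flag name `forceFlushAllStores` is hypothetical: `StoreSelector` is a stand-in interface, not the HBase `FlushPolicy` class, and `storesToFlush` is not a method from the patch.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of a force flag that bypasses per-store selection.
public class ForceFlushSketch {

    /** Stand-in for a pluggable selection policy (assumed for illustration). */
    interface StoreSelector {
        List<String> selectStoresToFlush(List<String> allStores);
    }

    /** Returns every store when forced, otherwise whatever the selector picks. */
    static List<String> storesToFlush(List<String> allStores,
                                      StoreSelector selector,
                                      boolean forceFlushAllStores) {
        return forceFlushAllStores ? allStores : selector.selectStoresToFlush(allStores);
    }

    public static void main(String[] args) {
        List<String> stores = Arrays.asList("cf1", "cf2");
        StoreSelector onlyFirst = all -> Collections.singletonList(all.get(0));
        System.out.println(storesToFlush(stores, onlyFirst, false)); // [cf1]
        System.out.println(storesToFlush(stores, onlyFirst, true));  // [cf1, cf2]
    }
}
```

Naming the flag for what it forces, rather than for what it permits, means the default value `false` leaves the decision to the policy.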

{quote}
Add the fact that we are doing per col flushing as an attribute on summary line 
printed out on region instantiation rather than give it its own log line:
{quote}
Sorry, I couldn't find the summary log line. In any case, this log statement 
is gone now that FlushPolicy has been introduced.

And is the QA bot still broken?

 Port 'Make flush decisions per column family' to trunk
 --

 Key: HBASE-10201
 URL: https://issues.apache.org/jira/browse/HBASE-10201
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: Ted Yu
Assignee: zhangduo
Priority: Critical
 Fix For: 1.0.0, 2.0.0, 0.98.9

 Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, 
 HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, 
 HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_10.patch, 
 HBASE-10201_11.patch, HBASE-10201_12.patch, HBASE-10201_13.patch, 
 HBASE-10201_2.patch, HBASE-10201_3.patch, HBASE-10201_4.patch, 
 HBASE-10201_5.patch, HBASE-10201_6.patch, HBASE-10201_7.patch, 
 HBASE-10201_8.patch, HBASE-10201_9.patch


 Currently the flush decision is made using the aggregate size of all column 
 families. When large and small column families co-exist, this causes many 
 small flushes of the smaller CF. We need to make per-CF flush decisions.





[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-12-03 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233701#comment-14233701
 ] 

stack commented on HBASE-10201:
---

Trying this patch out while reviewing v12.

Suggest you drop memstore from the names of these configs:

hbase.hregion.memstore.percolumnfamilyflush.enabled
hbase.hregion.memstore.percolumnfamilyflush.size.lower.bound

It's only memstores that are flushed, so having memstore in the name is redundant.

Same here MEMSTORE_COLUMNFAMILY_FLUSHSIZE_LOWER_BOUND and here 
getMemStoreColumnFamilyFlushSizeLowerBound

This is a nit. Can do later if we make more patches.

Importing HRegion into HMaster should be avoided -- we are reaching across 
packages -- especially just to get at a define. Move this config up into 
HConstants since it is used by two major subpackages or probably better, put it 
into HRegionInfo.

So I can enable this feature globally in hbase-site.xml, and I can also enable 
it on a per-table basis?  That's good.

Why do we have to change the API on FlushRequest?  Can the flush implementation 
not do all the necessary figuring of what to flush reading necessary configs., 
etc.?  Maybe you need the flag to 'force' a full region flush?  If so, should 
it be a force flag rather than the effete 'selectiveFlushRequest'?

Add the fact that we are doing per col flushing as an attribute on summary line 
printed out on region instantiation rather than give it its own log line:

+if (LOG.isDebugEnabled()) {
+  LOG.debug("Per Column Family Flushing: " + perColumnFamilyFlushEnabled);
+}

More review later. This patch is great.  There were some rejects, but they were 
easy to fix. Let me try and get some numbers to help make the case for this 
patch going in.






 Port 'Make flush decisions per column family' to trunk
 --

 Key: HBASE-10201
 URL: https://issues.apache.org/jira/browse/HBASE-10201
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: Ted Yu
Assignee: zhangduo
Priority: Critical
 Fix For: 1.0.0, 2.0.0, 0.98.9

 Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, 
 HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, 
 HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_10.patch, 
 HBASE-10201_11.patch, HBASE-10201_12.patch, HBASE-10201_2.patch, 
 HBASE-10201_3.patch, HBASE-10201_4.patch, HBASE-10201_5.patch, 
 HBASE-10201_6.patch, HBASE-10201_7.patch, HBASE-10201_8.patch, 
 HBASE-10201_9.patch


 Currently the flush decision is made using the aggregate size of all column 
 families. When large and small column families co-exist, this causes many 
 small flushes of the smaller CF. We need to make per-CF flush decisions.





[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-12-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14231134#comment-14231134
 ] 

Hadoop QA commented on HBASE-10201:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12684565/HBASE-10201_12.patch
  against master branch at commit 94d57f81dc114feba14906b05b3d2c6b78bf3299.
  ATTACHMENT ID: 12684565

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 33 new 
or modified tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:red}-1 findbugs{color}.  The patch appears to introduce 8 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

 {color:red}-1 core zombie tests{color}.  There are 1 zombie test(s):   
at 
org.apache.camel.component.jms.JmsDefaultTaskExecutorTypeTest.testSimpleAsyncTaskExecutor(JmsDefaultTaskExecutorTypeTest.java:70)

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11891//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11891//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11891//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11891//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11891//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11891//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11891//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11891//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11891//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11891//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11891//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11891//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11891//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11891//console

This message is automatically generated.

 Port 'Make flush decisions per column family' to trunk
 --

 Key: HBASE-10201
 URL: https://issues.apache.org/jira/browse/HBASE-10201
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: Ted Yu
Assignee: zhangduo
Priority: Critical
 Fix For: 2.0.0, 0.98.9, 0.99.2

 Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, 
 HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, 
 HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_10.patch, 
 HBASE-10201_11.patch, HBASE-10201_12.patch, HBASE-10201_2.patch, 
 HBASE-10201_3.patch, HBASE-10201_4.patch, HBASE-10201_5.patch, 
 HBASE-10201_6.patch, HBASE-10201_7.patch, HBASE-10201_8.patch, 
 HBASE-10201_9.patch


 Currently the flush decision is made using the aggregate size of all column 
 families. When large and small column families co-exist, this causes many 
 small flushes of the smaller CF. We need to make per-CF flush decisions.





[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-11-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14228057#comment-14228057
 ] 

Hadoop QA commented on HBASE-10201:
---

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12684125/HBASE-10201_11.patch
  against master branch at commit 0f8894cd6435ed6962ec3d7c81be4cb0d4f7657e.
  ATTACHMENT ID: 12684125

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 33 new 
or modified tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11857//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11857//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11857//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11857//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11857//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11857//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11857//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11857//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11857//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11857//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11857//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11857//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11857//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11857//console

This message is automatically generated.

 Port 'Make flush decisions per column family' to trunk
 --

 Key: HBASE-10201
 URL: https://issues.apache.org/jira/browse/HBASE-10201
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: Ted Yu
Assignee: zhangduo
Priority: Critical
 Fix For: 2.0.0, 0.98.9, 0.99.2

 Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, 
 HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, 
 HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_10.patch, 
 HBASE-10201_11.patch, HBASE-10201_2.patch, HBASE-10201_3.patch, 
 HBASE-10201_4.patch, HBASE-10201_5.patch, HBASE-10201_6.patch, 
 HBASE-10201_7.patch, HBASE-10201_8.patch, HBASE-10201_9.patch


 Currently the flush decision is made using the aggregate size of all column 
 families. When large and small column families co-exist, this causes many 
 small flushes of the smaller CF. We need to make per-CF flush decisions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-11-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14220880#comment-14220880
 ] 

Hadoop QA commented on HBASE-10201:
---

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12682853/HBASE-10201_10.patch
  against master branch at commit 325cdc0987f8176ac46695f5b0c93b0fc6605ab9.
  ATTACHMENT ID: 12682853

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 33 new 
or modified tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11774//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11774//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11774//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11774//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11774//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11774//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11774//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11774//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11774//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11774//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11774//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11774//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11774//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11774//console

This message is automatically generated.


[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-11-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14220602#comment-14220602
 ] 

Hadoop QA commented on HBASE-10201:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12682818/HBASE-10201_9.patch
  against master branch at commit c5690b1be3ae84efa52ee3c4589248c447e12f3f.
  ATTACHMENT ID: 12682818

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 33 new 
or modified tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 lineLengths{color}.  The patch introduces the following lines 
longer than 100:
+final WAL wal, final long myseqid, Collection<Store> 
storesToFlush, MonitoredTask status)
+new WALKey(info.getEncodedNameAsBytes(), htd.getTableName(), 
System.currentTimeMillis()),

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   
org.apache.hadoop.hbase.master.balancer.TestBaseLoadBalancer

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11771//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11771//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11771//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11771//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11771//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11771//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11771//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11771//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11771//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11771//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11771//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11771//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11771//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11771//console

This message is automatically generated.


[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-11-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14214707#comment-14214707
 ] 

Hadoop QA commented on HBASE-10201:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12681891/HBASE-10201_7.patch
  against trunk revision .
  ATTACHMENT ID: 12681891

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 25 new 
or modified tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 1 
warning messages.

{color:red}-1 checkstyle{color}.  The applied patch generated 
3793 checkstyle errors (more than the trunk's current 3788 errors).

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 lineLengths{color}.  The patch introduces the following lines 
longer than 100:
+  private final ConcurrentMap<byte[], ConcurrentMap<byte[], Long>> 
oldestUnflushedStoreSequenceIds = new ConcurrentSkipListMap<byte[], 
ConcurrentMap<byte[], Long>>(
+  ConcurrentMap<byte[], Long> oldestUnflushedStoreSequenceIdsOfRegion = 
oldestUnflushedStoreSequenceIds
+  //   assert not empty. Less rigorous, but safer, alternative is 
telling the caller to stop.
+  Map<byte[], Long> storeSeqNumsBeforeFlushStarts = 
this.lowestFlushingStoreSequenceIds.remove(encodedRegionName);
+  if (currentSeqNum != null && currentSeqNum.longValue() <= 
familyNameAndSeqId.getValue().longValue()) {
+ConcurrentMap<byte[], Long> oldestUnflushedStoreSequenceIdsOfRegion = 
this.oldestUnflushedStoreSequenceIds
+return oldestUnflushedStoreSequenceIdsOfRegion != null ? 
getLowestSeqId(oldestUnflushedStoreSequenceIdsOfRegion)
+ConcurrentMap<byte[], Long> oldestUnflushedStoreSequenceIdsOfRegion = 
this.oldestUnflushedStoreSequenceIds
+final HLog wal, final long myseqid, Collection<Store> 
storesToFlush, MonitoredTask status)

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11705//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11705//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11705//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11705//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11705//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11705//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11705//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11705//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11705//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11705//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11705//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11705//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11705//artifact/patchprocess/checkstyle-aggregate.html

Javadoc warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11705//artifact/patchprocess/patchJavadocWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11705//console

This message is automatically generated.


[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-11-17 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14214971#comment-14214971
 ] 

Ted Yu commented on HBASE-10201:


@Duo:
Can you briefly describe the addition in patch v7 w.r.t. per store sequence Id?
{code}
+  @Test
+  public void testCompareStoreFileCount() throws Exception {
{code}
Mind adding a comment describing what the above test verifies?
{code}
+  public static void main(String[] args) throws Exception {
{code}
Why is main() needed in the unit test?


[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-11-17 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215113#comment-14215113
 ] 

stack commented on HBASE-10201:
---

bq. We need to change protobuf definition

We could add extra fields in pb and write to two places for the life of an 
hbase version to support rolling upgrade.

I hope you do not mind me surfacing here questions asked off-list -- it's best 
to keep the discussion up here rather than off-list so others can participate 
too. 

You described off-list how the distributed log replay opens a region and puts 
the highest *sequenceid* found up in zk and then uses this to figure which 
edits to replay. You also talk of how regionServerReport includes the last 
flush id of each region we carry and that the master keeps this around so on 
log replay we can skip edits already flushed. You then ask:

bq. I think I need to change all these places to use a map which stored 
familyName->maxSeqId instead of a single SeqId. Am I right?

The sequenceid is *region-scoped*: i.e. we keep a running sequenceid per 
region. For the above to work, we'd need to change the sequenceid scope to be 
column family rather than region.  Since our memstore is per column family, and 
since the memstore now uses the region sequenceid as its MVCC, this might be a 
good direction to go in, but it is not what we have now.

You cannot have discontinuities in the progress of the flush sequenceid. If 
there are four column families, the edits can go into any of the four families 
in any order. 

You could do something like what [~gaurav.menghani] suggests above (see 
https://issues.apache.org/jira/browse/HBASE-10201?focusedCommentId=14191203&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14191203):
rather than reporting, on successful flush, the highest sequenceid of all of a 
region's memstores involved in the flush, when you flush only a column family 
you would report one less than the oldest outstanding edit still alive in a 
column family memstore.

What if you did something much less involved; when there is pressure to flush, 
flush the stores with the oldest edits until you've freed enough memory?

Upsides are that you'd clear out old edits from memory and we might let go of 
WALs a little faster.  Also, you might not flush all of the content in a region 
-- because flushing just a few stores might be enough to get you back under the 
threshold -- so we might make fewer small storefiles?

Downsides are we'd make some small storefiles (e.g. for those stores that have 
a few old edits in them and little else) and we'd do the flushes in series 
rather than in parallel.  Because of sequenceid accounting, we might replay 
more edits than we have to.
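
A minimal sketch of the "flush the stores with the oldest edits until enough memory is freed" idea, for illustration only. StoreInfo and OldestFirstFlushSelector are hypothetical names and are not part of HBase; the sketch assumes each store can report the sequenceid of its oldest unflushed edit and its memstore size.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in for a column family store; not the HBase Store interface. */
final class StoreInfo {
  final String family;
  final long oldestSeqId;   // sequenceid of the oldest edit still in the memstore
  final long memstoreBytes; // current memstore size in bytes

  StoreInfo(String family, long oldestSeqId, long memstoreBytes) {
    this.family = family;
    this.oldestSeqId = oldestSeqId;
    this.memstoreBytes = memstoreBytes;
  }
}

final class OldestFirstFlushSelector {
  /** Picks stores in order of oldest edit until at least bytesToFree would be released. */
  static List<StoreInfo> select(List<StoreInfo> stores, long bytesToFree) {
    List<StoreInfo> sorted = new ArrayList<>(stores);
    sorted.sort(Comparator.comparingLong((StoreInfo s) -> s.oldestSeqId));
    List<StoreInfo> chosen = new ArrayList<>();
    long freed = 0;
    for (StoreInfo s : sorted) {
      if (freed >= bytesToFree) {
        break; // enough memory would be released; leave the younger stores alone
      }
      chosen.add(s);
      freed += s.memstoreBytes;
    }
    return chosen;
  }
}
```

Note that a store holding one old edit and little else is still picked first here, which is the small-storefile downside mentioned above.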


[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-11-17 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215472#comment-14215472
 ] 

zhangduo commented on HBASE-10201:
--

{quote}
Can you briefly describe the addition in patch v7 w.r.t. per store sequence Id
{quote}
I moved the map that stores familyName->oldestSeqIdInStore to FSHLog, and when 
a flush starts, the names of the families to be flushed are passed to the HLog.
On replay, WAL cells are skipped using a per-store seqId instead of a single 
seqId for the region.
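
The replay rule described above could look roughly like the following sketch. It is not the code from the patch: PerStoreReplayFilter is a hypothetical class, and it keys the map by String for brevity where HBase would use byte[] family names.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper that decides, per store, whether a WAL edit was already flushed. */
final class PerStoreReplayFilter {
  // family name -> highest sequenceid already persisted in that family's store files
  private final Map<String, Long> maxFlushedSeqIdByFamily = new TreeMap<>();

  /** Records that everything up to seqId has been flushed for the given family. */
  void recordFlushed(String family, long seqId) {
    maxFlushedSeqIdByFamily.merge(family, seqId, Long::max);
  }

  /** An edit may be skipped only if its own family has already persisted it. */
  boolean shouldSkip(String family, long editSeqId) {
    Long flushed = maxFlushedSeqIdByFamily.get(family);
    return flushed != null && editSeqId <= flushed;
  }
}
```

With a single region-wide seqId, the same edit would have to be replayed for every family unless all of them had flushed past it.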

{quote}
Why is main() needed in the unit test ?
{quote}
It is used to benchmark a real cluster.


[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-11-17 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215501#comment-14215501
 ] 

zhangduo commented on HBASE-10201:
--

{quote}
You could do something like what Gaurav Menghani suggests above (see 
https://issues.apache.org/jira/browse/HBASE-10201?focusedCommentId=14191203&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14191203):
rather than reporting, on successful flush, the highest sequenceid of all of a 
region's memstores involved in the flush, when you flush only a column family 
you would report one less than the oldest outstanding edit still alive in a 
column family memstore.
{quote}
Yes, this is what the patch does now. It is the approach with minimal impact 
on existing code.

{quote}
What if you did something much less involved; when there is pressure to flush, 
flush the stores with the oldest edits until you've freed enough memory?
{quote}

I think we need to identify the reason why we need a flush. If we need a flush 
because the memstore is large, then flushing the large store is enough. If we 
need a flush because the oldest seqId still alive in a memstore is far behind 
the current one (which means we have lots of WALs that cannot be archived), 
then we need to flush the store that has the oldest seqId in its memstore (or 
maybe just flush all the stores? simple but useful). Maybe I can change the 
return value of shouldFlush from boolean to an enum to indicate the reason why 
we need a flush.
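
As a rough illustration of the enum idea, and not the actual patch: FlushReason, FlushReasonSketch, the thresholds, and the fall-back to flushing every store when no single store is large are all assumptions made for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reasons a flush might be requested; the names are illustrative only. */
enum FlushReason { NONE, MEMSTORE_SIZE, OLD_EDITS }

final class FlushReasonSketch {
  /** Simplified per-store state; not the HBase Store interface. */
  static final class StoreState {
    final String family;
    final long memstoreBytes;

    StoreState(String family, long memstoreBytes) {
      this.family = family;
      this.memstoreBytes = memstoreBytes;
    }
  }

  /** Returns why a flush is needed instead of a plain boolean. */
  static FlushReason shouldFlush(long regionMemstoreBytes, long flushSizeBytes,
      long oldestUnflushedSeqId, long currentSeqId, long maxSeqIdLag) {
    if (regionMemstoreBytes >= flushSizeBytes) {
      return FlushReason.MEMSTORE_SIZE;
    }
    if (currentSeqId - oldestUnflushedSeqId > maxSeqIdLag) {
      return FlushReason.OLD_EDITS; // old edits are pinning WALs that cannot be archived
    }
    return FlushReason.NONE;
  }

  /** Chooses which families to flush for the given reason. */
  static List<String> familiesToFlush(FlushReason reason, List<StoreState> stores,
      long largeStoreBytes) {
    List<String> families = new ArrayList<>();
    if (reason == FlushReason.NONE) {
      return families;
    }
    if (reason == FlushReason.MEMSTORE_SIZE) {
      for (StoreState s : stores) {
        if (s.memstoreBytes >= largeStoreBytes) {
          families.add(s.family);
        }
      }
      if (!families.isEmpty()) {
        return families;
      }
    }
    // OLD_EDITS, or no single store is large: take the simple option and flush all stores.
    for (StoreState s : stores) {
      families.add(s.family);
    }
    return families;
  }
}
```

The OLD_EDITS branch takes the "just flush all the stores" option; flushing only the store with the oldest seqId would be a refinement.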


[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-11-17 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215702#comment-14215702
 ] 

stack commented on HBASE-10201:
---

bq. If we need a flush because the memstore is large, then flushing the large 
store is enough.

To be clear, you are suggesting that we would flush the big store but we would 
not move the sequenceid forward; it would still be one less than the oldest 
edit still in a memstore?

Then, our other flush forcing function, the one that wants to clear up old 
WALs, would come along and force the flushing of the old memstores?

That is an interesting idea.


[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-11-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215724#comment-14215724
 ] 

Hadoop QA commented on HBASE-10201:
---

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12682076/HBASE-10201_8.patch
  against trunk revision .
  ATTACHMENT ID: 12682076

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 25 new 
or modified tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11727//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11727//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11727//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11727//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11727//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11727//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11727//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11727//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11727//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11727//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11727//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11727//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11727//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11727//console

This message is automatically generated.


[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-11-17 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14215740#comment-14215740
 ] 

zhangduo commented on HBASE-10201:
--

{quote}
To be clear, you are suggesting that we would flush the big store but we would 
not move the sequenceid forward; it would still be one less than the oldest 
edit still in a memstore?
{quote}
Yes, the completed sequenceId is always that of the oldest edit still in a 
memstore, minus one, unless we flush all stores.

{quote}
Then, our other flush forcing function, the one that wants to clear up old 
WALs, would come along and force the flushing of the old memstores?
{quote}
Yes, for LogRoller and PeriodicMemstoreFlusher.





[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-11-17 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14215743#comment-14215743
 ] 

stack commented on HBASE-10201:
---

You have any means of trying out your patch to get rough numbers to see if it 
helps [~Apache9]?



[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-11-17 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14215745#comment-14215745
 ] 

stack commented on HBASE-10201:
---

[~Apache9] Thanks for working on this.



[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-11-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14207872#comment-14207872
 ] 

Hadoop QA commented on HBASE-10201:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12681029/HBASE-10201_6.patch
  against trunk revision .
  ATTACHMENT ID: 12681029

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 25 new 
or modified tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 1 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11649//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11649//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11649//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11649//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11649//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11649//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11649//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11649//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11649//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11649//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11649//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11649//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11649//artifact/patchprocess/checkstyle-aggregate.html

  Javadoc warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11649//artifact/patchprocess/patchJavadocWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11649//console

This message is automatically generated.

 Port 'Make flush decisions per column family' to trunk
 --

 Key: HBASE-10201
 URL: https://issues.apache.org/jira/browse/HBASE-10201
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: Ted Yu
Assignee: zhangduo
Priority: Critical
 Fix For: 2.0.0, 0.98.9, 0.99.2

 Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, 
 HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, 
 HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_2.patch, 
 HBASE-10201_3.patch, HBASE-10201_4.patch, HBASE-10201_5.patch, 
 HBASE-10201_6.patch


 Currently the flush decision is made using the aggregate size of all column 
 families. When large and small column families co-exist, this causes many 
 small flushes of the smaller CF. We need to make per-CF flush decisions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-11-12 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14208277#comment-14208277
 ] 

stack commented on HBASE-10201:
---

This cannot go in till HBASE-12405 is done, right? (I'm trying to write up a 
doc on how sequenceid is used in hbase to help).

How'd you generate the numbers and what is WAF?

Thanks [~Apache9]



[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-11-12 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14208962#comment-14208962
 ] 

zhangduo commented on HBASE-10201:
--

I used to think this should go into master first as an experimental feature, 
and HBASE-12405 is based on this issue.

Now I think you are right, stack: basing HBASE-10201 on the work of 
HBASE-12405 seems more natural than the reverse.

Never mind, I will finish HBASE-12405 as soon as possible.

{quote}
How'd you generate the numbers and what is WAF?
{quote}

I created a table with 3 CFs, disabled splits (by using a large constant split 
size), and put 1M rows into the table.
Each key is 16 B; the value is 16 B for CF1, 256 B for CF2, and 4 KB for CF3.
The resulting numbers are copied from the JMX web page of the regionserver.

WAF is short for Write Amplification Factor, and I calculate it simply as 
numBytesCompactedCount/storeFileSize.

Thanks.




[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-11-05 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14198411#comment-14198411
 ] 

Jean-Marc Spaggiari commented on HBASE-10201:
-

Do you have the metrics with the last version of the patch?

From a previous version it was:
With per CF flush:
metric_storeCount: 3,
metric_storeFileCount: 7,
metric_memStoreSize: 110195648,
metric_storeFileSize: 4369570622,
metric_compactionsCompletedCount: 27,
metric_numBytesCompactedCount: 10353718691,
metric_numFilesCompactedCount: 89,
Write amplification: 2.37

Has it changed?

 Port 'Make flush decisions per column family' to trunk
 --

 Key: HBASE-10201
 URL: https://issues.apache.org/jira/browse/HBASE-10201
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: Ted Yu
Assignee: zhangduo
Priority: Critical
 Fix For: 2.0.0, 0.99.2

 Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, 
 HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, 
 HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_2.patch, 
 HBASE-10201_3.patch, HBASE-10201_4.patch, HBASE-10201_5.patch


 Currently the flush decision is made using the aggregate size of all column 
 families. When large and small column families co-exist, this causes many 
 small flushes of the smaller CF. We need to make per-CF flush decisions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-11-05 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14198501#comment-14198501
 ] 

zhangduo commented on HBASE-10201:
--

No, I have not run the test on master branch yet. I will run it tomorrow cause 
it is already 23:15 in China...

Sad for the time zone difference...



[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-11-05 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14199822#comment-14199822
 ] 

zhangduo commented on HBASE-10201:
--

Results on master branch

2.0.0-SNAPSHOT, revision=ecd708671c135052a175c88603d5215a0434e4fa

metric_storeCount: 3,
metric_storeFileCount: 9,
metric_memStoreSize: 40117320,
metric_storeFileSize: 4461018704,
metric_compactionsCompletedCount: 92,
metric_numBytesCompactedCount: 22091556672,
metric_numFilesCompactedCount: 290
Write amplification(numBytesCompactedCount/storeFileSize): 4.95
Elapsed time: 23m32s

2.0.0-SNAPSHOT, revision=ecd708671c135052a175c88603d5215a0434e4fa with 
HBASE-10201
metric_storeCount: 3,
metric_storeFileCount: 8,
metric_memStoreSize: 16400424,
metric_storeFileSize: 4483028246,
metric_compactionsCompletedCount: 54,
metric_numBytesCompactedCount: 20497293164,
metric_numFilesCompactedCount: 178
Write amplification(numBytesCompactedCount/storeFileSize): 4.57
Elapsed time: 23m5s

2.0.0-SNAPSHOT, revision=ecd708671c135052a175c88603d5215a0434e4fa with 
HBASE-10201 but disable selective flush
metric_storeCount: 3,
metric_storeFileCount: 9,
metric_memStoreSize: 39937056,
metric_storeFileSize: 4461185232,
metric_compactionsCompletedCount: 92,
metric_numBytesCompactedCount: 22092540348,
metric_numFilesCompactedCount: 290
Write amplification(numBytesCompactedCount/storeFileSize): 4.95
Elapsed time: 22m51s

It seems the default config on master compacts more aggressively, but the 
decrease in WAF has not changed much.

(4.95-4.57)/4.95=7.68%
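
The arithmetic above can be reproduced with a few lines of Java. This is only a sketch: the class and method names are hypothetical and are not part of HBase, and the inputs are the metric values quoted in this comment for the baseline run and the run with HBASE-10201.

```java
public class WriteAmplification {
    /**
     * WAF as defined in this thread: bytes rewritten by compactions
     * divided by the bytes currently held in store files.
     */
    public static double waf(long numBytesCompactedCount, long storeFileSize) {
        if (storeFileSize <= 0) {
            throw new IllegalArgumentException("storeFileSize must be positive");
        }
        return (double) numBytesCompactedCount / storeFileSize;
    }

    /** Relative decrease from baseline to candidate, as a fraction of baseline. */
    public static double relativeDecrease(double baseline, double candidate) {
        return (baseline - candidate) / baseline;
    }

    public static void main(String[] args) {
        // Values copied from the two runs reported above.
        double baseline = waf(22091556672L, 4461018704L);
        double perCfFlush = waf(20497293164L, 4483028246L);
        System.out.printf("baseline WAF:     %.2f%n", baseline);
        System.out.printf("per-CF flush WAF: %.2f%n", perCfFlush);
        System.out.printf("decrease:         %.1f%%%n",
                100 * relativeDecrease(baseline, perCfFlush));
    }
}
```

Computing the decrease from the unrounded ratios gives a figure within a hundredth of a percentage point of the one above, which was derived from the rounded values 4.95 and 4.57.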



[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-10-30 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14190775#comment-14190775
 ] 

stack commented on HBASE-10201:
---

Let me review this patch once more. That all tests pass with it enabled is 
encouraging.  Can work on these it test failures separately. It is not your 
issue.

 Port 'Make flush decisions per column family' to trunk
 --

 Key: HBASE-10201
 URL: https://issues.apache.org/jira/browse/HBASE-10201
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: Ted Yu
Assignee: zhangduo
Priority: Critical
 Fix For: 2.0.0, 0.99.2

 Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, 
 HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, 
 HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_2.patch, 
 HBASE-10201_3.patch


 Currently the flush decision is made using the aggregate size of all column 
 families. When large and small column families co-exist, this causes many 
 small flushes of the smaller CF. We need to make per-CF flush decisions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-10-30 Thread Gaurav Menghani (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14190804#comment-14190804
 ] 

Gaurav Menghani commented on HBASE-10201:
-

[~Apache9] Great work porting this patch! Glad to see this getting ported from 
0.89-fb to trunk :) Please let me know if you need any help.



[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-10-30 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14190830#comment-14190830
 ] 

Ted Yu commented on HBASE-10201:


+1 on turning on this in master branch.



[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-10-30 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14191008#comment-14191008
 ] 

stack commented on HBASE-10201:
---

I wrote the list to get another opinion on the patch pre-commit.

Has this patch been deployed somewhere in production (smile?).  If so, would be 
good to know.  In production, it helps?

On rereview:

 This value should be less than half of the total memstore
+threshold (hbase.hregion.memstore.flush.size).

Do we ensure this in code?  If not, should we?

bq.  I think it is better to open another issue to handle the duplication.

Can you do this for the accounting fixup so by-Store in HLog.

Should log when we do this:

+long columnfamilyFlushSize = this.htableDescriptor
+.getMemStoreColumnFamilyFlushSize();
+if (columnfamilyFlushSize <= 0) {
+  columnfamilyFlushSize = conf.getLong(
+  HConstants.HREGION_MEMSTORE_COLUMNFAMILY_FLUSH_SIZE_LOWER_BOUND,
+  HTableDescriptor.DEFAULT_MEMSTORE_COLUMNFAMILY_FLUSH_SIZE_LOWER_BOUND);

I can add on commit unless we are doing a new version.

This does not have to be public since it is used from same package:

+  public long getEarliestFlushTimeForAllStores() {

ditto this

getLatestFlushTimeForAllStores

And this ... isPerColumnFamilyFlushEnabled

nit: Guard debug logging with an if LOG.isDebugEnabled...
+ LOG.debug("Since none of the CFs were above the size, flushing all.");
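
For reference, the selection behaviour implied by that log message can be sketched as below. This is a minimal illustration, not the patch's code: the class, the method, and the sizes are hypothetical, and whether the real comparison is strict or inclusive is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FlushSelectionSketch {
    /**
     * Hypothetical helper, not an HBase API. Picks the column families whose
     * memstore size is above the per-CF lower bound; if none qualifies,
     * falls back to flushing every column family.
     */
    public static List<String> selectStoresToFlush(Map<String, Long> memstoreSizes,
                                                   long lowerBound) {
        List<String> selected = new ArrayList<>();
        for (Map.Entry<String, Long> e : memstoreSizes.entrySet()) {
            if (e.getValue() > lowerBound) { // strict comparison is an assumption
                selected.add(e.getKey());
            }
        }
        if (selected.isEmpty()) {
            // "Since none of the CFs were above the size, flushing all."
            selected.addAll(memstoreSizes.keySet());
        }
        return selected;
    }

    public static void main(String[] args) {
        long lowerBound = 16L * 1024 * 1024; // illustrative value only
        Map<String, Long> sizes = new LinkedHashMap<>();
        sizes.put("small", 256L * 1024);
        sizes.put("medium", 2L * 1024 * 1024);
        sizes.put("large", 98L * 1024 * 1024);
        System.out.println(selectStoresToFlush(sizes, lowerBound));

        sizes.put("large", 1L * 1024 * 1024); // now nothing is above the bound
        System.out.println(selectStoresToFlush(sizes, lowerBound));
    }
}
```

With the first set of sizes only the large family is selected; once no family exceeds the bound, all three are returned.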

When we flush, we write the sequenceid flush to WAL.  This patch should have no 
effect on it.

Sequenceids are region scoped.  If we flush by Store, will there be holes in 
our accounting?

For example, given 3 column families, A, B, and C.

I write sequenceid 1 to A, sequenceid 2 to B, and sequenceid 3 to C.  I then 
write sequenceid 4 to A.  The edit at sequenceid 4 is big and pushes us over and 
brings on a flush.  We flush A and edits 1 and 4.  Is the fact that edits 2 and 
3 are still up in memory going to mess us up?  Say the server crashes; at 
replay time we see we flushed up to edit 4. Will we think that edits 2 and 3 
were persisted? If you don't have an answer, I can work on the answer.





[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-10-30 Thread Gaurav Menghani (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14191015#comment-14191015
 ] 

Gaurav Menghani commented on HBASE-10201:
-

[~stack] From my design, in this case, 1 and 4 are flushed, but 2 and 3 are 
retained in memory. But we can only mark 1 as safe. 2, 3 and 4 will all be 
replayed if the server crashes. I am not sure whether this has changed in the 
patch.

The per-CF change is not running in prod right now. I didn't see any big 
difference deploying it out of the box with the biggest customer where we have 
a lot of CFs (probably also highlighted by the small difference in WAF). But I 
can try running it internally on a shadow cluster again. Let me know if there 
are some interesting metrics you want me to look at.



[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-10-30 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14191091#comment-14191091
 ] 

zhangduo commented on HBASE-10201:
--

{quote}
Sequenceids are region scoped. If we flush by Store, will there be holes in our 
accounting?
I write sequenceid 1 to A, sequenceid 2 to B, and sequenceid 3 to C. I then 
write sequence 4 to A. The edit at sequenceid 4 is big and pushes us over and 
brings on a flush. We flush A and edits 1 and 4. Is the fact that edits 2 and 3 
are still up in memory going to mess us up? Say the server crashes; at 
replay time we see we flushed up to edit 4. Will we think that edits 2 and 3 
were persisted? If you don't have an answer, I can work on the answer.
{quote}
Yes, we should write flush seqId 1 in this case. (Oh, I made a mistake: the 
patch currently writes seqId 2, because flushSeqId = 
oldestSeqIdInStoresNotToFlush should be flushSeqId = 
oldestSeqIdInStoresNotToFlush - 1. I will fix it.) So there will be holes, and 
some WAL replay during recovery will be unnecessary.

To solve this, we need to store a map of seqId per store instead of a single 
seqId, and log truncation and log replay also need some work.
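
The rule discussed here (flushSeqId = oldestSeqIdInStoresNotToFlush - 1, unless all stores are flushed) can be sketched in Java as follows. This is an illustration under stated assumptions rather than the patch's code: the class and method are hypothetical, and stores with an empty memstore are assumed to be absent from the map.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class FlushSeqIdSketch {
    /**
     * Hypothetical helper, not an HBase API.
     *
     * @param oldestSeqIdByStore oldest sequence id still held in each store's memstore
     * @param storesToFlush      stores selected for this flush
     * @param latestSeqId        highest sequence id written to the region so far
     * @return the sequence id that is safe to record for this flush
     */
    public static long flushSeqId(Map<String, Long> oldestSeqIdByStore,
                                  Set<String> storesToFlush,
                                  long latestSeqId) {
        long oldestNotFlushed = Long.MAX_VALUE;
        for (Map.Entry<String, Long> e : oldestSeqIdByStore.entrySet()) {
            if (!storesToFlush.contains(e.getKey())) {
                oldestNotFlushed = Math.min(oldestNotFlushed, e.getValue());
            }
        }
        if (oldestNotFlushed == Long.MAX_VALUE) {
            // Every store is being flushed, so nothing older remains in memory.
            return latestSeqId;
        }
        // Edits at or after the oldest unflushed one must still be replayable.
        return oldestNotFlushed - 1;
    }

    public static void main(String[] args) {
        // Scenario from the review: seqId 1 -> A, 2 -> B, 3 -> C, 4 -> A.
        Map<String, Long> oldest = new LinkedHashMap<>();
        oldest.put("A", 1L);
        oldest.put("B", 2L);
        oldest.put("C", 3L);
        System.out.println(flushSeqId(oldest, Collections.singleton("A"), 4L)); // prints 1
        System.out.println(flushSeqId(oldest, oldest.keySet(), 4L));            // prints 4
    }
}
```

Flushing only A records 1, so edits 2, 3 and 4 are replayed after a crash even though 4 is already on disk; that redundant replay is the cost a per-store seqId map would remove.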

{quote}
Has this patch been deployed somewhere in production (smile?). If so, would be 
good to know. In production, it helps?
{quote}
For me, no. I am using 0.98.6.1 with HBASE-12078 patched right now (which is 
why I first tried to port it to 0.98 in this issue...).
Some test results are posted above. And in our production, I always see logs 
like this:
{quote}
2014-09-29 13:16:25,061 INFO  [MemStoreFlusher.0] regionserver.HRegion: Started memstore flush for sync:Snapshot,\x00\x00\x00\x00\x02$\x0CC,1411782012686.50aba6be7ff3150be983cb6fd77fc686., current region memstore size 128.3 M
2014-09-29 13:16:25,121 INFO  [MemStoreFlusher.0] regionserver.DefaultStoreFlusher: Flushed, sequenceid=10932563, memsize=265.7 K, hasBloomFilter=true, into tmp file hdfs://online-hbase/hbase/data/sync/Snapshot/50aba6be7ff3150be983cb6fd77fc686/.tmp/129e5ef69d7449fea9c2357aa6c4340a
2014-09-29 13:16:25,192 INFO  [MemStoreFlusher.0] regionserver.DefaultStoreFlusher: Flushed, sequenceid=10932563, memsize=2.2 M, hasBloomFilter=true, into tmp file hdfs://online-hbase/hbase/data/sync/Snapshot/50aba6be7ff3150be983cb6fd77fc686/.tmp/316fee39423142e09cdb767de9f9bc5d
2014-09-29 13:16:25,528 INFO  [MemStoreFlusher.0] regionserver.DefaultStoreFlusher: Flushed, sequenceid=10932563, memsize=27.9 M, hasBloomFilter=true, into tmp file hdfs://online-hbase/hbase/data/sync/Snapshot/50aba6be7ff3150be983cb6fd77fc686/.tmp/a886c1e39565468fbf93be6c434f5fc5
2014-09-29 13:16:26,190 INFO  [MemStoreFlusher.0] regionserver.DefaultStoreFlusher: Flushed, sequenceid=10932563, memsize=98.0 M, hasBloomFilter=true, into tmp file hdfs://online-hbase/hbase/data/sync/Snapshot/50aba6be7ff3150be983cb6fd77fc686/.tmp/ec722497c6e14d0fa732c2a9d29e3391
{quote}
The smallest store is always flushed with only KBs. That's the reason why I 
found this issue and started working on it...

{quote}
Can you do this for the accounting fixup so by-Store in HLog.
{quote}
Yes, I can open another issue to work on this.

Thanks.



[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-10-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14191117#comment-14191117
 ] 

Hadoop QA commented on HBASE-10201:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12678367/HBASE-10201_4.patch
  against trunk revision .
  ATTACHMENT ID: 12678367

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 25 new 
or modified tests.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11531//console

This message is automatically generated.

 Port 'Make flush decisions per column family' to trunk
 --

 Key: HBASE-10201
 URL: https://issues.apache.org/jira/browse/HBASE-10201
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: Ted Yu
Assignee: zhangduo
Priority: Critical
 Fix For: 2.0.0, 0.99.2

 Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, 
 HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, 
 HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_2.patch, 
 HBASE-10201_3.patch, HBASE-10201_4.patch


 Currently the flush decision is made using the aggregate size of all column 
 families. When large and small column families co-exist, this causes many 
 small flushes of the smaller CF. We need to make per-CF flush decisions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-10-30 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14191203#comment-14191203
 ] 

stack commented on HBASE-10201:
---

[~gaurav.menghani] Thank you for helping land this upstream and thanks for the 
update on its state at your shop.  What about the recording of the 
last-flushed-sequenceid at the master, which tells it which edits it can safely 
skip replaying for a region after a crash; would that only report '1' in our 
scenario above?  Thanks.

[~Apache9] Thanks for the new patch. I think I need to go through and check 
sequenceid accounting.

 Port 'Make flush decisions per column family' to trunk
 --

 Key: HBASE-10201
 URL: https://issues.apache.org/jira/browse/HBASE-10201
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: Ted Yu
Assignee: zhangduo
Priority: Critical
 Fix For: 2.0.0, 0.99.2

 Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, 
 HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, 
 HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_2.patch, 
 HBASE-10201_3.patch, HBASE-10201_4.patch, HBASE-10201_5.patch


 Currently the flush decision is made using the aggregate size of all column 
 families. When large and small column families co-exist, this causes many 
 small flushes of the smaller CF. We need to make per-CF flush decisions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-10-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14191302#comment-14191302
 ] 

Hadoop QA commented on HBASE-10201:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12678382/HBASE-10201_5.patch
  against trunk revision .
  ATTACHMENT ID: 12678382

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 25 new 
or modified tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   
org.apache.hadoop.hbase.coprocessor.TestCoprocessorHConnection

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11533//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11533//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11533//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11533//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11533//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11533//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11533//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11533//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11533//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11533//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11533//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11533//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11533//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11533//console

This message is automatically generated.



[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-10-29 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188070#comment-14188070
 ] 

stack commented on HBASE-10201:
---

bq.  I think it is better to open another issue to handle the duplication.

OK.

bq.  getEarliestFlushTimeForAllStore should be public because TestIOFencing uses 
it (and it is in another package). 

FYI, we mark these with the @VisibleForTesting annotation.  I can do that on commit.

bq. but I see lots of other similar methods declared as public...

Yeah, sorry about that; we aren't always consistent, though we try.

bq. Does this meet the requirement?

Yes. Out of interest, are you using the hbase formatter?

bq. I tried but failed to make dev-support/test-patch.sh work properly...

Yeah, this stuff is focused on the master.  Unit tests passing on branch-1 
would be great. Just note it here in the issue.

You going to try hbase-it?

Thanks.

 Port 'Make flush decisions per column family' to trunk
 --

 Key: HBASE-10201
 URL: https://issues.apache.org/jira/browse/HBASE-10201
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: Ted Yu
Assignee: zhangduo
Priority: Critical
 Fix For: 2.0.0, 0.99.2

 Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, 
 HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, 
 HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_2.patch, 
 HBASE-10201_3.patch


 Currently the flush decision is made using the aggregate size of all column 
 families. When large and small column families co-exist, this causes many 
 small flushes of the smaller CF. We need to make per-CF flush decisions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-10-29 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188093#comment-14188093
 ] 

zhangduo commented on HBASE-10201:
--

{quote}
Out of interest, are you using the hbase formatter?
{quote}
No, I just use the default formatter with the indent and max line length 
changed. Only new code is auto-formatted; old code is formatted manually to 
keep the patch clean. I will try the hbase formatter later. I found it when 
looking for test-patch.sh, thanks.

{quote}
You going to try hbase-it?
{quote}
Yes, I have run it with 'mvn verify' under hbase-it. There are some failures 
and errors; I need to look at the source code to identify the reason.

Thanks.




[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-10-29 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189417#comment-14189417
 ] 

stack commented on HBASE-10201:
---

bq. Yes, I have run it with 'mvn verify' under hbase-it. There are some failures 
and errors; I need to look at the source code to identify the reason.

Suggest you first run it before your patch is applied.  It may not be in a 
healthy state currently.



[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-10-29 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189517#comment-14189517
 ] 

zhangduo commented on HBASE-10201:
--

Yes, I ran it without the patch first. The result is:

{quote}
Results :

Failed tests:
  IntegrationTestIngestWithACL>IntegrationTestBase.setUp:122->setUpCluster:64->IntegrationTestIngest.setUpCluster:88->IntegrationTestIngest.initTable:93 Failed to initialize LoadTestTool expected:<0> but was:<1>

Tests in error:
  IntegrationTestMTTR.testRestartRsHoldingTable:261->run:305 » Execution org.apa...

Tests run: 20, Failures: 1, Errors: 1, Skipped: 1
{quote}

Result with patch is
{quote}
Results :

Failed tests:
  IntegrationTestIngestWithVisibilityLabels>IntegrationTestIngest.testIngest:104->IntegrationTestIngest.runIngestTest:166 Update failed with error code 1
  IntegrationTestIngestWithACL>IntegrationTestBase.setUp:122->setUpCluster:64->IntegrationTestIngest.setUpCluster:88->IntegrationTestIngest.initTable:93 Failed to initialize LoadTestTool expected:<0> but was:<1>
  IntegrationTestIngestWithTags>IntegrationTestIngest.testIngest:104->IntegrationTestIngest.runIngestTest:174 Verification failed with error code 1

Tests in error:
  IntegrationTestMTTR.testRestartRsHoldingTable:261->run:305 » Execution org.apa...
  IntegrationTestMTTR.testMoveRegion:271->run:305 » Execution org.apache.hadoop

Tests run: 20, Failures: 3, Errors: 2, Skipped: 1
{quote}

IntegrationTestMTTR.testMoveRegion passes if I run it on its own, with the 
other test methods in the same class commented out, using the command 
mvn clean test-compile failsafe:integration-test 
-Dit.test=IntegrationTestMTTR -DfailIfNoTests=false.

Now I'm debugging IntegrationTestIngestWithVisibilityLabels, but the log is 
flooded with
{quote}
java.io.IOException: Compression algorithm 'lz4' previously failed test.
    at org.apache.hadoop.hbase.util.CompressionTest.testCompression(CompressionTest.java:90)
    at org.apache.hadoop.hbase.regionserver.HRegion.checkCompressionCodecs(HRegion.java:4936)
    at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4923)
    at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4896)
    at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4868)
    at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4824)
    at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4775)
    at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:276)
    at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:103)
    at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:103)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
java.io.IOException: Compression algorithm 'snappy' previously failed test.
    at org.apache.hadoop.hbase.util.CompressionTest.testCompression(CompressionTest.java:90)
    at org.apache.hadoop.hbase.regionserver.HRegion.checkCompressionCodecs(HRegion.java:4936)
    at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4923)
    at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4896)
    at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4868)
    at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4824)
    at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4775)
    at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:276)
    at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:103)
    at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:103)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
{quote}
and it is hard to find useful information in the output.

I have compiled the hadoop native libs, but I do not know where to place them 
when running tests...

Or is there a way to disable compression when running integration tests? I 
think the result would not change, since the patch has nothing to do with 
compression...

Thanks.


[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-10-28 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186735#comment-14186735
 ] 

zhangduo commented on HBASE-10201:
--

{quote}
Oh, seems like this is a lower bound. Only families > this size get flushed. 
That's nice. Need to rename, I'd say.  
DEFAULT_MEMSTORE_COLUMNFAMILY_FLUSH_SIZE_LOWER_BOUND?
{quote}
Done
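
For readers following the lower-bound discussion, here is a minimal sketch of the selection rule: flush only the column families whose memstore is larger than the lower bound. The class name, method name, and the fall-back of flushing every family when none qualifies are assumptions made for illustration; this is not the patch's code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of per-column-family flush selection; not the HBase implementation. */
final class FlushSelectionSketch {

  /**
   * Returns the families whose memstore size is strictly greater than lowerBound.
   * If no family qualifies, every family is returned (assumed fall-back: flush the whole region).
   */
  static List<String> selectFamiliesToFlush(Map<String, Long> memstoreSizes, long lowerBound) {
    List<String> selected = new ArrayList<>();
    for (Map.Entry<String, Long> e : memstoreSizes.entrySet()) {
      if (e.getValue() > lowerBound) {
        selected.add(e.getKey());
      }
    }
    if (selected.isEmpty()) {
      selected.addAll(memstoreSizes.keySet());
    }
    return selected;
  }

  public static void main(String[] args) {
    Map<String, Long> sizes = new LinkedHashMap<>();
    sizes.put("big", 100L << 20);  // 100 MB memstore
    sizes.put("small", 1L << 20);  // 1 MB memstore
    // With a 16 MB lower bound only "big" is selected.
    System.out.println(selectFamiliesToFlush(sizes, 16L << 20));
  }
}
```

With this rule the small family is not flushed each time the large one fills up, which is the behaviour the issue description asks for.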

{quote}
nit: '+ (just as usual). This value should less than half of the total 
memstore' missing a 'be'
{quote}
Done

{quote}
Are we double-accounting here?
+ // keep track of oldest sequence id of edit in a store.
+ private final ConcurrentMap<Store, AtomicLong> oldestSeqIdOfStore =
+ new ConcurrentHashMap<Store, AtomicLong>();
The oldest seqid to region is being done by the WAL subsystem IIRC. Now we are 
doing it in here by Store. Should we do it in the one place only? The WAL is 
keeping accounting so it knows when to release WALs that no longer have edits. 
Does this accounting interfere?
{quote}

I added a comment noting that there is duplication. There is also other 
duplication in HRegion, such as lastFlushSeqId. As for oldestSeqIdOfStore, I 
think it is better to store it in FSHLog, because it is single-threaded and the 
sequence id is generated in that thread, so the update logic can be more 
straightforward without any performance issue, and only one place needs to be 
modified instead of the four places in the current solution. But this means 
changing FSHLog's tracking unit from Region to Store, which may require a lot of 
work that is not related to this issue. I think it is better to open 
another issue to handle the duplication.
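
To make the accounting under discussion concrete, here is a minimal sketch of tracking the oldest unflushed sequence id per store and deriving a region-level value from it. Store names are plain strings, and the class, methods, and the Long.MAX_VALUE sentinel are all assumptions for illustration; neither HRegion nor FSHLog is claimed to look like this.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of per-store oldest-sequence-id accounting. */
final class OldestSeqIdSketch {
  /** Sentinel meaning "this store has no unflushed edits". */
  static final long NONE = Long.MAX_VALUE;

  private final ConcurrentMap<String, AtomicLong> oldestSeqIdOfStore =
      new ConcurrentHashMap<>();

  /** Keeps the smallest sequence id seen for the store since its last flush. */
  void recordEdit(String store, long seqId) {
    AtomicLong oldest = oldestSeqIdOfStore.computeIfAbsent(store, k -> new AtomicLong(NONE));
    oldest.accumulateAndGet(seqId, Math::min);
  }

  /** Resets the store after its memstore has been flushed. */
  void storeFlushed(String store) {
    AtomicLong oldest = oldestSeqIdOfStore.get(store);
    if (oldest != null) {
      oldest.set(NONE);
    }
  }

  /** Smallest unflushed sequence id across all stores, or NONE if nothing is unflushed. */
  long oldestUnflushedInRegion() {
    long min = NONE;
    for (AtomicLong v : oldestSeqIdOfStore.values()) {
      min = Math.min(min, v.get());
    }
    return min;
  }
}
```

Note that storeFlushed can race with a concurrent recordEdit in this sketch; a real implementation would have to coordinate the reset with writers.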

{quote}
Add a log WARN here: + if (columnfamilyFlushSize <= 0) { ?
{quote}

This is the same as memstoreFlushSize's initialization code above.  0 is 
possible if it is not set in HTableDescriptor.

{quote}
Name 'getMinFlushTimeForAllStores' as 'getEarliestFlushTimeForAllStore'? I 
think it clearer that it does if you have 'earlier' in there (as you have it in 
your comments). In sympathy add 'latest' into this method name 
getMaxFlushTimeForAllStores
Do these two above methods need to be public? Can they be package private? Do 
they need to be exposed at all? Ditto for this isPerColumnFamilyFlushEnabled 
and flushcache
{quote}

Renaming is done. getEarliestFlushTimeForAllStore should be public because 
TestIOFencing uses it (and it is in another package). The other methods could be 
package private, but I see lots of other similar methods declared as public...

{quote}
Should be a WARN: + LOG.debug("Disabling selective flushing of Column Families' 
memstores."); ?
This comment right? Should it be 'region' rather than 'memstore' in some of the 
below?
+ // We now have to flush the memstore since it has
+ // reached the threshold, however, we might not need
+ // to flush the entire memstore. If there are certain
{quote}
Done

{quote}
Make one log line rather than two:
+ LOG.info("Started memstore flush for " + this + ", current region memstore size "
+ + StringUtils.byteDesc(this.memstoreSize.get()) + ", and " + storesToFlush.size() + "/"
+ + stores.size() + " column families' memstores are being flushed."
+ + ((wal != null) ? "" : "; wal is null, using passed sequenceid=" + myseqid));
+ for (Store store: storesToFlush) {
{quote}
I used a formatter to wrap it automatically. I have now modified it manually to
{code:title=HRegion.java|borderStyle=solid}
LOG.info("Started memstore flush for " + this
+ ", current region memstore size " + StringUtils.byteDesc(this.memstoreSize.get())
+ ", and " + storesToFlush.size() + "/ " + stores.size() + " column families' memstores are being flushed."
+ ((wal != null) ? "" : "; wal is null, using passed sequenceid=" + myseqid));
{code}
Does this meet the requirement?

{quote}
How you justify removing this?
flushSeqId = getNextSequenceId(wal);
{quote}

It is not removed. I moved it to startCacheFlush.

{quote}
nit: in below
if ((now - getLastFlushTime() < flushCheckInterval)) {
+ if ((now - getMinFlushTimeForAllStores() < flushCheckInterval)) {
The former is 'LastFlushTime' and the new code is 'MinFlushTime'... which 
should it be? Do we intend same thing here?
{quote}
I think it should be getLatestFlushTimeForAllStores, not 
getEarliestFlushTimeForAllStores. Although we may not flush all the stores.
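
The difference between the two candidates can be seen in a small sketch. The helper names and the `<` comparison against flushCheckInterval are assumptions for illustration (the operator is not legible in the quoted snippet), and the collections are assumed to be non-empty.

```java
import java.util.Arrays;
import java.util.Collection;

/** Hypothetical sketch comparing earliest vs. latest per-store flush time. */
final class FlushTimeSketch {

  /** Oldest last-flush timestamp across stores (assumes a non-empty collection). */
  static long earliestFlushTime(Collection<Long> lastFlushTimes) {
    long min = Long.MAX_VALUE;
    for (long t : lastFlushTimes) {
      min = Math.min(min, t);
    }
    return min;
  }

  /** Most recent last-flush timestamp across stores (assumes a non-empty collection). */
  static long latestFlushTime(Collection<Long> lastFlushTimes) {
    long max = Long.MIN_VALUE;
    for (long t : lastFlushTimes) {
      max = Math.max(max, t);
    }
    return max;
  }

  /** True when the reference flush time falls within the check interval. */
  static boolean flushedRecently(long now, long referenceFlushTime, long flushCheckInterval) {
    return now - referenceFlushTime < flushCheckInterval;
  }

  public static void main(String[] args) {
    Collection<Long> times = Arrays.asList(100L, 900L); // one stale store, one fresh store
    long now = 1000L;
    long interval = 500L;
    System.out.println(flushedRecently(now, latestFlushTime(times), interval));   // true
    System.out.println(flushedRecently(now, earliestFlushTime(times), interval)); // false
  }
}
```

Using the latest time treats the region as recently flushed as soon as any one store was flushed, while using the earliest time keeps the periodic check pending until every store has been flushed within the interval.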

{quote}
Instead of + for (AtomicLong oldestSeqId: needToUpdate) {
... can you use what is in AtomicUtils?
{quote}
Use AtomicUtils.updateMin instead. Thanks.
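
For context, an atomic "update to minimum" is usually a compare-and-set retry loop. The sketch below is a stand-alone illustration of that technique with a hypothetical class name; it is not copied from HBase's AtomicUtils and makes no claim about its exact code.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in showing a lock-free minimum update. */
final class UpdateMinSketch {

  /** Lowers min to value if value is smaller, retrying when another thread wins the race. */
  static void updateMin(AtomicLong min, long value) {
    while (true) {
      long current = min.get();
      if (value >= current) {
        return; // nothing to do, the stored value is already small enough
      }
      if (min.compareAndSet(current, value)) {
        return; // our smaller value was installed
      }
      // Another thread changed min in between; re-read and try again.
    }
  }

  public static void main(String[] args) {
    AtomicLong oldestSeqId = new AtomicLong(Long.MAX_VALUE);
    updateMin(oldestSeqId, 42L);
    updateMin(oldestSeqId, 57L); // larger value, ignored
    System.out.println(oldestSeqId.get()); // 42
  }
}
```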

{quote}
This can't be package private getOldestSeqIdOfStore ?
{quote}
TestPerColumnFamilyFlush needs it and is in another package.

{quote}
What is difference between oldest and lowest in below?
Map<byte[], Long> oldestFlushingSeqNumsLocal = null;
Map<byte[], Long> oldestUnflushedSeqNumsLocal = null;
+ Map<byte[], Long> lowestFlushingRegionSequenceIdsLocal = null;
+ Map<byte[], Long> oldestUnflushedRegionSequenceIdsLocal = null;
{quote}
They are just copies of class fields, so 

[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-10-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186762#comment-14186762
 ] 

Hadoop QA commented on HBASE-10201:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12677580/HBASE-10201_2.patch
  against trunk revision .
  ATTACHMENT ID: 12677580

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 25 new 
or modified tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 lineLengths{color}.  The patch introduces the following lines 
longer than 100:
++ ", and " + storesToFlush.size() + "/ " + stores.size() + " column families' memstores are being flushed."
+   * oldestUnflushedRegionSequenceIds. We use these Maps to find out the low bound regions sequence id, or
+  lowestFlushingRegionSequenceIdsLocal = new HashMap<byte[], Long>(this.lowestFlushingRegionSequenceIds);
+  Long oldValue = this.lowestFlushingRegionSequenceIds.put(encodedRegionName, oldRegionSeqNum);
+  hlog.startCacheFlush(hri1.getEncodedNameAsBytes(), Long.MAX_VALUE, Long.MAX_VALUE, sequenceId1);
+  private void flushRegion(HLog hlog, byte[] regionEncodedName, AtomicLong sequenceId) throws IOException {
+final HLog wal, final long myseqid, Collection<Store> storesToFlush, MonitoredTask status)

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11490//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11490//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11490//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11490//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11490//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11490//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11490//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11490//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11490//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11490//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11490//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11490//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11490//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11490//console

This message is automatically generated.


[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-10-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14187090#comment-14187090
 ] 

Hadoop QA commented on HBASE-10201:
---

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12677610/HBASE-10201_3.patch
  against trunk revision .
  ATTACHMENT ID: 12677610

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 25 new 
or modified tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11492//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11492//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11492//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11492//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11492//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11492//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11492//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11492//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11492//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11492//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11492//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11492//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11492//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11492//console

This message is automatically generated.

 Port 'Make flush decisions per column family' to trunk
 --

 Key: HBASE-10201
 URL: https://issues.apache.org/jira/browse/HBASE-10201
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: Ted Yu
Assignee: zhangduo
Priority: Critical
 Fix For: 2.0.0, 0.99.2

 Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, 
 HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201.patch, 
 HBASE-10201_1.patch, HBASE-10201_2.patch, HBASE-10201_3.patch


 Currently the flush decision is made using the aggregate size of all column 
 families. When large and small column families co-exist, this causes many 
 small flushes of the smaller CF. We need to make per-CF flush decisions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-10-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188024#comment-14188024
 ] 

Hadoop QA commented on HBASE-10201:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12677829/HBASE-10201-0.99.patch
  against trunk revision .
  ATTACHMENT ID: 12677829

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 25 new 
or modified tests.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11501//console

This message is automatically generated.



[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-10-27 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14184997#comment-14184997
 ] 

Anoop Sam John commented on HBASE-10201:


{code}
+  // --
+  // STEP 6. Record oldest sequence id of memstore
+  // --
+  long seqId = walKey.getSequenceId();
+  for (Store store: storesUpdated) {
+    store.setSeqIdOfOldestEdit(seqId);
+  }
+
   // ---
-  // STEP 6. Release row locks, etc.
+  // STEP 7. Release row locks, etc.
   // ---
   if (locked) {
     this.updatesLock.readLock().unlock();
{code}
Adding this step here, under the row locks, makes writers wait longer and will 
hurt write throughput. [~stack] has done some tests related to this in another JIRA.


[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-10-27 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14185013#comment-14185013
 ] 

zhangduo commented on HBASE-10201:
--

This step must be done here unless we change the behavior of internalFlushcache.

In internalFlushcache, we do the following steps:

acquire updatesLock -> begin mvcc insert -> prepare flush -> release 
updatesLock -> advance mvcc -> flushing

seqIdOfOldestEdit is used by prepare flush to get a flushSeqId (because we do 
not flush all stores). If we do not record seqIdOfOldestEdit under updatesLock, 
we may get an inconsistent view of the memstore's snapshot and 
seqIdOfOldestEdit.

Maybe I can record it only when perColumnFamilyFlushEnabled is true?
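
To make the locking argument above concrete, here is a minimal toy model, not the HBase implementation. ToyStore and ToyRegion are hypothetical names; the sketch assumes writers hold the read side of updatesLock while they add an edit and record its sequence id, and that prepare flush takes the write side before snapshotting.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Toy stand-in for one store's memstore; not the HBase Store class. */
class ToyStore {
  static final long NO_EDITS = Long.MAX_VALUE;
  private int unflushedEdits = 0;
  private long seqIdOfOldestEdit = NO_EDITS;

  synchronized void addEdit() {
    unflushedEdits++;
  }

  /** Keeps the smallest sequence id seen since the last snapshot. */
  synchronized void setSeqIdOfOldestEdit(long seqId) {
    seqIdOfOldestEdit = Math.min(seqIdOfOldestEdit, seqId);
  }

  /** Snapshots and resets the store; returns {edit count, oldest sequence id}. */
  synchronized long[] snapshot() {
    long[] view = {unflushedEdits, seqIdOfOldestEdit};
    unflushedEdits = 0;
    seqIdOfOldestEdit = NO_EDITS;
    return view;
  }
}

/** Toy region with a single store and an updatesLock. */
class ToyRegion {
  private final ReentrantReadWriteLock updatesLock = new ReentrantReadWriteLock();
  final ToyStore store = new ToyStore();

  /** Writer path: the edit and its sequence id are recorded under one lock hold. */
  void write(long seqId) {
    updatesLock.readLock().lock();
    try {
      store.addEdit();
      store.setSeqIdOfOldestEdit(seqId);
    } finally {
      updatesLock.readLock().unlock();
    }
  }

  /** Flush path: the write lock excludes writers, so the pair is consistent. */
  long[] prepareFlush() {
    updatesLock.writeLock().lock();
    try {
      return store.snapshot();
    } finally {
      updatesLock.writeLock().unlock();
    }
  }
}
```

If setSeqIdOfOldestEdit were called after the read lock was released, prepareFlush could run between the two calls and snapshot an edit whose sequence id has not been recorded yet.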





[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

2014-10-27 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14185023#comment-14185023
 ] 

zhangduo commented on HBASE-10201:
--

Oh, I think there is a trick to optimize it.

In most cases, store.setSeqIdOfOldestEdit(seqId) will not change the value of 
seqIdOfOldestEdit, because sequence ids are increasing and we need to record 
the smallest one.

Maybe we can read the current sequence id before appendNoSync and compare it 
with seqIdOfOldestEdit. If it is already larger than seqIdOfOldestEdit, we can 
skip the setSeqIdOfOldestEdit call, because the actual sequence id must be 
larger than the current sequence id value.
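
A minimal sketch of this pre-check, under stated assumptions: OldestEditTracker and mayNeedUpdate are hypothetical names, not part of the patch, and the sketch assumes the check and the later update both run while the writer holds updatesLock, so a flush cannot reset the value in between.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical tracker for the oldest unflushed sequence id of one store. */
class OldestEditTracker {
  static final long NO_EDITS = Long.MAX_VALUE;
  private final AtomicLong seqIdOfOldestEdit = new AtomicLong(NO_EDITS);

  /**
   * Cheap check made before the WAL append. The id assigned by the append
   * will be larger than seqIdBeforeAppend, so if that value is already
   * larger than the recorded minimum, the later update cannot change it.
   */
  boolean mayNeedUpdate(long seqIdBeforeAppend) {
    return seqIdBeforeAppend <= seqIdOfOldestEdit.get();
  }

  /** Records seqId only if it is smaller than the current minimum. */
  void setSeqIdOfOldestEdit(long seqId) {
    seqIdOfOldestEdit.accumulateAndGet(seqId, Math::min);
  }

  long getSeqIdOfOldestEdit() {
    return seqIdOfOldestEdit.get();
  }

  /** Called when the store is flushed and holds no edits. */
  void reset() {
    seqIdOfOldestEdit.set(NO_EDITS);
  }
}
```

A writer would call mayNeedUpdate before the append and call setSeqIdOfOldestEdit afterwards only when it returned true; after a flush resets the tracker, the first write takes the slow path again.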


