[jira] [Commented] (HBASE-5778) Turn on WAL compression by default

2012-12-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13536029#comment-13536029
 ] 

Hadoop QA commented on HBASE-5778:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12561641/HBASE-5778-0.94-v7.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 16 new 
or modified tests.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3611//console

This message is automatically generated.

 Turn on WAL compression by default
 --

 Key: HBASE-5778
 URL: https://issues.apache.org/jira/browse/HBASE-5778
 Project: HBase
  Issue Type: Improvement
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Priority: Blocker
 Fix For: 0.96.0

 Attachments: 5778.addendum, 5778-addendum.txt, HBASE-5778-0.94.patch, 
 HBASE-5778-0.94-v2.patch, HBASE-5778-0.94-v3.patch, HBASE-5778-0.94-v4.patch, 
 HBASE-5778-0.94-v5.patch, HBASE-5778-0.94-v6.patch, HBASE-5778-0.94-v7.patch, 
 HBASE-5778.patch, HBASE-5778-trunk-v6.patch, HBASE-5778-trunk-v7.patch


 I ran some tests to verify whether WAL compression should be turned on by default.
 For a use case where it's not very useful (values two orders of magnitude 
 bigger than the keys), the insert time wasn't different and the CPU usage was 
 15% higher (150% CPU usage vs. 130% when not compressing the WAL).
 When values are smaller than the keys, I saw a 38% improvement in the insert 
 run time and CPU usage was 33% higher (600% CPU usage vs. 450%). I'm not sure 
 WAL compression accounts for all the additional CPU usage; it might just be 
 that we're able to insert faster and we spend more time in the MemStore per 
 second (because our MemStores are bad when they contain tens of thousands of 
 values).
 Those are two extremes, but they show that for the price of some CPU we can 
 save a lot. My machines have 2 quads with HT, so I still had a lot of idle 
 CPUs.
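
 The CPU percentages quoted above are relative increases over the uncompressed 
 run. As a minimal sketch of that arithmetic (the class and method names are 
 made up for illustration and are not part of HBase), the two cases work out 
 to roughly 15% and 33%:

```java
// Illustrative only: recomputes the relative CPU increase from the figures
// quoted in the issue description. Names here are hypothetical.
class CpuOverheadSketch {

    /** Relative increase of 'measured' over 'baseline', in percent. */
    static double relativeIncreasePercent(double baseline, double measured) {
        return (measured - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        // Large values: 130% CPU without WAL compression, 150% with it.
        System.out.printf("large values: +%.1f%%%n", relativeIncreasePercent(130, 150));
        // Small values: 450% CPU without WAL compression, 600% with it.
        System.out.printf("small values: +%.1f%%%n", relativeIncreasePercent(450, 600));
    }
}
```

 The figures above 100% are summed across cores, which is why 600% still left 
 idle CPUs on a machine with 2 quads and HT.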

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-5778) Turn on WAL compression by default

2012-12-19 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13536047#comment-13536047
 ] 

Jean-Daniel Cryans commented on HBASE-5778:
---

[~ram_krish] I wasn't aware of it so I guess it's still an issue. Will open a 
Jira.

[~stack] Ok thanks, also I'll change the title.



[jira] [Commented] (HBASE-5778) Turn on WAL compression by default

2012-12-18 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13535650#comment-13535650
 ] 

ramkrishna.s.vasudevan commented on HBASE-5778:
---

@JD
http://mail-archives.apache.org/mod_mbox/hbase-dev/201205.mbox/%3C00bc01cd31e6$7caf1320$760d3960$%25vasude...@huawei.com%3E.
Do we still get the OOME with WAL compression?



[jira] [Commented] (HBASE-5778) Turn on WAL compression by default

2012-12-18 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13535665#comment-13535665
 ] 

stack commented on HBASE-5778:
--

I'm ok w/ committing it but I think it should be off in 0.96.  It looks too 
flakey to be on by default (thanks for the OOME reminder Ram).



[jira] [Commented] (HBASE-5778) Turn on WAL compression by default

2012-12-15 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13533212#comment-13533212
 ] 

stack commented on HBASE-5778:
--

Good finds [~jdcryans]. Sounds like WAL compression could do w/ some more 
testing especially if it becomes default in 0.96.



[jira] [Commented] (HBASE-5778) Turn on WAL compression by default

2012-12-14 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13532765#comment-13532765
 ] 

Ted Yu commented on HBASE-5778:
---

TestWALReplayCompressed passes with the rest of trunk patch:
{code}
Running org.apache.hadoop.hbase.regionserver.wal.TestWALReplayCompressed
2012-12-14 14:56:28.308 java[85149:1903] Unable to load realm mapping info from SCDynamicStore
Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 65.388 sec
{code}
Shall we keep this test ?



[jira] [Commented] (HBASE-5778) Turn on WAL compression by default

2012-12-14 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13532766#comment-13532766
 ] 

Jean-Daniel Cryans commented on HBASE-5778:
---

bq. Shall we keep this test ?

TestWALReplayCompressed is TestWALReplay with compression enabled, which in 
trunk will now be the default, so it would be redundant to keep it.



[jira] [Commented] (HBASE-5778) Turn on WAL compression by default

2012-12-14 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13532772#comment-13532772
 ] 

Ted Yu commented on HBASE-5778:
---

Thanks for the reminder, J-D.
My question becomes: shall we introduce TestWALReplayUncompressed ?
Running the patch on Linux I got:
{code}
testSimplePutDelete(org.apache.hadoop.hbase.replication.TestMasterReplication)  Time elapsed: 0.12 sec   FAILURE!
java.lang.AssertionError: Waited too much time for put replication
  at org.junit.Assert.fail(Assert.java:93)
  at org.apache.hadoop.hbase.replication.TestMasterReplication.putAndWait(TestMasterReplication.java:276)
  at org.apache.hadoop.hbase.replication.TestMasterReplication.testSimplePutDelete(TestMasterReplication.java:213)
queueFailover(org.apache.hadoop.hbase.replication.TestReplication)  Time elapsed: 0.119 sec   FAILURE!
java.lang.AssertionError: Waited too much time for queueFailover replication. Waited 17533ms.
  at org.junit.Assert.fail(Assert.java:93)
  at org.apache.hadoop.hbase.replication.TestReplication.queueFailover(TestReplication.java:765)
{code}
For ReplicationHLogReaderManager.java:
{code}
+public class ReplicationHLogReaderManager {
{code}
Please add annotation for audience and stability.
For readNextAndSetPosition():
{code}
+   * Get the next entry, returned and also added in the array
{code}
Please rephrase the above line so that it is easier to understand.



[jira] [Commented] (HBASE-5778) Turn on WAL compression by default

2012-12-14 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13532783#comment-13532783
 ] 

Ted Yu commented on HBASE-5778:
---

{code}
+  public HLog.Entry readNextAndSetPosition(HLog.Entry[] entriesArray, int currentNbEntries) throws IOException {
...
+HLog.Entry entry = this.repLogReader.readNextAndSetPosition(this.entriesArray, this.currentNbEntries);
{code}
nit: line too long.




[jira] [Commented] (HBASE-5778) Turn on WAL compression by default

2012-12-14 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13532784#comment-13532784
 ] 

Jean-Daniel Cryans commented on HBASE-5778:
---

bq. My question becomes: shall we introduce TestWALReplayUncompressed

That makes sense.

bq. Running the patch on Linux I got:

If you change SLEEP_TIME to 1500, does it still fail? If not, that's the IPv6 
problem.

bq. Please add annotation for audience and stability.

Thanks, forgot about that.

bq. Please rephrase the above line so that it is easier to understand.

It works the same way as reader.next, is there anything in particular you think 
needs more explanation?



[jira] [Commented] (HBASE-5778) Turn on WAL compression by default

2012-12-14 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13532795#comment-13532795
 ] 

Ted Yu commented on HBASE-5778:
---

bq. is there anything in particular you think needs more explanation?
No.
bq. If you change SLEEP_TIME to 1500, does it still fail? If not, that's the 
IPv6 problem.
{code}
Running org.apache.hadoop.hbase.replication.TestMasterReplication
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 80.499 sec
{code}



[jira] [Commented] (HBASE-5778) Turn on WAL compression by default

2012-12-14 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13532840#comment-13532840
 ] 

Jean-Daniel Cryans commented on HBASE-5778:
---

Turns out the patch fails 4 tests in TestHLogSplit on trunk. Probably code 
that is messing with the HLog internals and is not expecting compression. 
Investigating.



[jira] [Commented] (HBASE-5778) Turn on WAL compression by default

2012-12-14 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13532866#comment-13532866
 ] 

Jean-Daniel Cryans commented on HBASE-5778:
---

Figured it out.

First there's Compressor.uncompressIntoArray that doesn't protect itself 
against bad dict indexes. It comes out as an IndexOutOfBoundsException and 
kills log splitting.

Then there's FaultySequenceFileLogReader that doesn't speak compression and 
basically just needs to pass the compressionContext down to the HLog.Entry else 
it fails on a NegativeArraySizeException.

The test passes now with those fixes. Will post new patches later.
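
The first problem can be illustrated with a small sketch. This is not the code from the patch: CheckedDictionarySketch is a hypothetical stand-in for the WAL dictionary, and only the method name uncompressIntoArray is borrowed from the comment above. The idea is to validate the dictionary index before using it, so that a corrupt index is reported as an IOException that the caller can handle, instead of an unchecked IndexOutOfBoundsException escaping.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a WAL compression dictionary; not HBase's class.
class CheckedDictionarySketch {
    private final List<byte[]> entries = new ArrayList<>();

    /** Adds an entry and returns the index assigned to it. */
    int add(byte[] data) {
        entries.add(data.clone());
        return entries.size() - 1;
    }

    /**
     * Copies the dictionary entry at dictIdx into 'to', starting at 'offset',
     * and returns the number of bytes copied. A bad index or a destination
     * that is too small is reported as an IOException rather than an
     * unchecked IndexOutOfBoundsException.
     */
    int uncompressIntoArray(byte[] to, int offset, int dictIdx) throws IOException {
        if (dictIdx < 0 || dictIdx >= entries.size()) {
            throw new IOException("Bad dictionary index " + dictIdx
                + " (dictionary holds " + entries.size() + " entries)");
        }
        byte[] entry = entries.get(dictIdx);
        if (offset < 0 || entry.length > to.length - offset) {
            throw new IOException("Entry of " + entry.length
                + " bytes does not fit at offset " + offset);
        }
        System.arraycopy(entry, 0, to, offset, entry.length);
        return entry.length;
    }

    public static void main(String[] args) {
        CheckedDictionarySketch dict = new CheckedDictionarySketch();
        int idx = dict.add("row1".getBytes(StandardCharsets.UTF_8));
        byte[] out = new byte[8];
        try {
            int copied = dict.uncompressIntoArray(out, 0, idx);
            System.out.println("copied " + copied + " bytes");
            // An index that was never written simulates a corrupt entry.
            dict.uncompressIntoArray(out, 0, 99);
        } catch (IOException e) {
            System.out.println("recoverable failure: " + e.getMessage());
        }
    }
}
```

Whether the caller then skips the entry or aborts the operation is a separate policy decision; the sketch only shows the error being surfaced in a form the caller can catch.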



[jira] [Commented] (HBASE-5778) Turn on WAL compression by default

2012-12-10 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13528430#comment-13528430
 ] 

Jean-Daniel Cryans commented on HBASE-5778:
---

bq. You have to explain the difference between a reopen and a getReader 
somewhere...

Can do.

bq. If it is a compressed WAL, then we'd reopen the file... if not compressed, 
the reset is a noop

Getting a new WALReader on a file that's being written to will let us see the 
new length, so it's not a noop. Will add comments on that.

bq. This javadoc is on the wrong method:

Actually that method used to have a positionToSkipTo parameter but yeah, 
removing.

bq. I think this patch is almost there.

Thanks for your patience and guidance.



[jira] [Commented] (HBASE-5778) Turn on WAL compression by default

2012-12-09 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13527669#comment-13527669
 ] 

stack commented on HBASE-5778:
--

reopen may not be too bad.  You have to explain the difference between a reopen 
and a getReader somewhere... as is, there is none.  I don't think it would take 
much to explain why you'd reopen.  Would 'reset' be a better name, as in 
'resetting the reader'?  What resetting does is implementation specific: if it 
is a compressed WAL, then we'd reopen the file; if not compressed, the reset is 
a noop, right?
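
To make the question concrete, here is a minimal sketch of what a 'reset' on 
the reader could look like. It only spells out the premise of the question 
(whether the uncompressed case really is a noop is exactly what is being 
asked). ToyReader, PlainToyReader and CompressedToyReader are hypothetical 
names made up for this example, not HBase's HLog.Reader.

```java
// Hypothetical sketch; ToyReader and both implementations are made up for
// illustration and are not HBase's HLog.Reader.
interface ToyReader {
  // Make data appended since the last read visible to this reader.
  void reset();

  // How many times the underlying file was reopened (for demonstration only).
  int reopenCount();
}

// Uncompressed log, under the assumption from the question: reset() does nothing.
class PlainToyReader implements ToyReader {
  @Override
  public void reset() {
    // intentionally a no-op
  }

  @Override
  public int reopenCount() {
    return 0;
  }
}

// Compressed log: reset() stands in for closing and reopening the file.
class CompressedToyReader implements ToyReader {
  private int reopens = 0;

  @Override
  public void reset() {
    reopens++; // a real reader would reopen the stream here
  }

  @Override
  public int reopenCount() {
    return reopens;
  }
}
```

The caller only ever sees reset(); what happens behind it stays an 
implementation detail of each reader.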

ReplicationHLogReader does not implement the WAL HLog.Reader interface.  Should it?

This javadoc is on the wrong method:

+   * if a positionToSkipTo was specified, this method will take care of seeking there

I think this patch is almost there.




 Turn on WAL compression by default
 --

 Key: HBASE-5778
 URL: https://issues.apache.org/jira/browse/HBASE-5778
 Project: HBase
  Issue Type: Improvement
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Priority: Blocker
 Fix For: 0.96.0

 Attachments: 5778.addendum, 5778-addendum.txt, HBASE-5778-0.94.patch, 
 HBASE-5778-0.94-v2.patch, HBASE-5778-0.94-v3.patch, HBASE-5778-0.94-v4.patch, 
 HBASE-5778.patch




[jira] [Commented] (HBASE-5778) Turn on WAL compression by default

2012-11-14 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13497443#comment-13497443
 ] 

stack commented on HBASE-5778:
--

This is better.  Here are some comments:

Does CompressionContext class have to be public?  Can it stay pkg private?  
You'll have to move your new class into wal package but that seems fine to me.

Does the base Reader interface have to know about a compression context?  Can 
this not be internal to the implementation?

You call it ReplicationHLogReader but is it a replication only class?  If so, 
it does not belong in regionserver package but over in replication package.

My sense though is that this is a generally useful WAL reader?  One that can do 
compressed or non-compressed WAL?  One that can be used by replication but also 
by fellas who want to index hbase, etc.

Missing a license

Can it be in the wal package?  Then don't have to open up so much of HLog?

It's unfortunate that you can't tell it's a compressed wal from reading, say, some 
magic or metadata at the head of the file.  It seems a bit broken to consult 
configuration.

Yeah, why can't an implementation of HLog.Reader manage the compression context 
internally?  Why does it have to be out here in this ReplicationHLogReader class?  
After all, isn't the dictionary reconstructed on read?  You don't save it around?

So, a HLog.ReaderFactory that looks at configuration and returns a HLog.Reader 
that either does compressed or not by looking at configs?
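
A minimal sketch of that factory idea, assuming a plain Map stands in for the 
configuration. The key name and every class below are hypothetical and only 
illustrate the shape of the suggestion; none of them are HBase's.

```java
import java.util.Map;

// Hypothetical reader abstraction; callers never see a compression context.
interface ToyLogReader {
  String describe();
}

class PlainLogReader implements ToyLogReader {
  @Override
  public String describe() {
    return "plain";
  }
}

class CompressedLogReader implements ToyLogReader {
  @Override
  public String describe() {
    return "compressed";
  }
}

class ToyReaderFactory {
  // Made-up configuration key, used only in this example.
  static final String COMPRESSION_KEY = "toy.wal.compression.enabled";

  // Picks the reader implementation from configuration; defaults to plain.
  static ToyLogReader create(Map<String, String> conf) {
    boolean compressed =
        Boolean.parseBoolean(conf.getOrDefault(COMPRESSION_KEY, "false"));
    return compressed ? new CompressedLogReader() : new PlainLogReader();
  }
}
```

With this shape, the code asking for a reader does not need to import or 
carry anything compression specific.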

Is this right:

+if (entry != null) {
+  entry.setCompressionContext(null);




 Turn on WAL compression by default
 --

 Key: HBASE-5778
 URL: https://issues.apache.org/jira/browse/HBASE-5778
 Project: HBase
  Issue Type: Improvement
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Priority: Blocker
 Fix For: 0.96.0

 Attachments: 5778.addendum, 5778-addendum.txt, HBASE-5778-0.94.patch, 
 HBASE-5778-0.94-v2.patch, HBASE-5778-0.94-v3.patch, HBASE-5778.patch




[jira] [Commented] (HBASE-5778) Turn on WAL compression by default

2012-11-14 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13497528#comment-13497528
 ] 

Jean-Daniel Cryans commented on HBASE-5778:
---

bq. Does CompressionContext class have to be public? Can it stay pkg private? 
You'll have to move your new class into wal package but that seems fine to me.
bq. My sense though is that this is a generally useful WAL reader? One that can 
do compressed or non-compressed WAL? One that can be used by replication but 
also by fellas who want to index hbase, etc.
bq. Can it be in the wal package? Then don't have to open up so much of HLog?

You're right that it can be a generally useful WAL reader, for users that need 
to be able to seek directly into newly opened files without having to scan 
everything that comes before if compression is on. Right now its API is 
tailored to replication's needs. We could make it more general but, unless we 
have another use case for it right now, I don't see the point.

So I'll move it to wal and rename the class/methods a bit.

bq. Does the base Reader interface have to know about a compression context? 
Can this not be internal to the implementation?

Until HDFS lets us tail a file under construction we need to pass the dict back 
when opening the file.

bq. Yeah, why can't an implementation of HLog.Reader manage the compression 
context internally? Why does it have to be out here in this ReplicationHLogReader 
class? After all, isn't the dictionary reconstructed on read? You don't save it 
around?

It would be fine if we didn't have to:

 - seek into a file we never read before (after a region server died and we 
pick up the queue)
 - reopen files in order to tail them (when normally replicating)

We could augment HLog.Reader to support reopening of files, basically pushing 
down even more of what ReplicationHLogReader is doing. That way we could hide all 
the dirty details? I haven't thought about modifying that before.

bq. Missing a license

Oh thanks.

bq. It's unfortunate that you can't tell it's a compressed wal from reading, say, 
some magic or metadata at the head of the file. It seems a bit broken to consult 
configuration.

Good point, it would simplify a lot of things. HLog compression was implemented 
at the HLog.Entry level though so technically it's not even the WAL itself 
that's compressed. My next comment shows what that means.

bq. Is this right:

Yes, if you keep the compression context in there it'll replicate the 
HLog.Entry[] compressed with a dictionary that the slave has no knowledge of. I 
had this comment in my first patch and I think I forgot to move it over:

bq. // Setting it to null prevents from sending compressed edits that the sink 
wouldn't parse
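
A small sketch of why the context gets cleared before shipping. This is a toy 
model under stated assumptions: ToyEntry and its "L:"/"R:" serialization are 
made up for the example and do not reflect how HLog.Entry is actually written.

```java
import java.util.Map;

// Hypothetical entry: with a compression context it serializes known values as
// dictionary references ("R:<index>"), otherwise as literals ("L:<text>").
class ToyEntry {
  private final String edit;
  private Map<String, Integer> compressionContext;

  ToyEntry(String edit, Map<String, Integer> compressionContext) {
    this.edit = edit;
    this.compressionContext = compressionContext;
  }

  void setCompressionContext(Map<String, Integer> compressionContext) {
    this.compressionContext = compressionContext;
  }

  // A reference is only meaningful to a receiver holding the same dictionary.
  String serialize() {
    if (compressionContext != null && compressionContext.containsKey(edit)) {
      return "R:" + compressionContext.get(edit);
    }
    return "L:" + edit;
  }
}
```

In this model, an entry serialized with the source's dictionary goes out as a 
bare index that the sink cannot resolve; once the context is set to null, the 
same entry goes out as a self-contained literal.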

 Turn on WAL compression by default
 --

 Key: HBASE-5778
 URL: https://issues.apache.org/jira/browse/HBASE-5778
 Project: HBase
  Issue Type: Improvement
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Priority: Blocker
 Fix For: 0.96.0

 Attachments: 5778.addendum, 5778-addendum.txt, HBASE-5778-0.94.patch, 
 HBASE-5778-0.94-v2.patch, HBASE-5778-0.94-v3.patch, HBASE-5778.patch




[jira] [Commented] (HBASE-5778) Turn on WAL compression by default

2012-11-08 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13493453#comment-13493453
 ] 

stack commented on HBASE-5778:
--

Adding compression context to the general HLog Interface seems incorrect to me. 
 This kind of thing will not make sense for all implementations of HLog.   We 
are going against the effort which tries to turn HLog into an Interface with 
this patch as is.

Ditto on ReplicationSource having to know anything about HLog compression, 
carrying compression context (This seems 'off' having to do this in 
ReplicationSource -- +import 
org.apache.hadoop.hbase.regionserver.wal.CompressionContext;).  What happens if 
HLog has a different kind of compression than our current type?  All will break?

This seems wrong having to do this over in ReplicationSource:

{code}
+// If we're compressing logs and the oldest recovered log's last position is greater
+// than 0, we need to rebuild the dictionary up to that point without replicating
+// the edits again. The rebuilding part is simply done by reading the log.
{code}

Why can't the internal implementation do the skipping if dictionary is empty 
and we are at an offset > 0?

Rather than passing compression context to SequenceFileLogReader, can we not 
have a CompressedSequenceLogReader and internally it manages compression 
contexts not letting them outside of CSLR?


 Turn on WAL compression by default
 --

 Key: HBASE-5778
 URL: https://issues.apache.org/jira/browse/HBASE-5778
 Project: HBase
  Issue Type: Improvement
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Priority: Blocker
 Fix For: 0.96.0

 Attachments: 5778.addendum, 5778-addendum.txt, HBASE-5778-0.94.patch, 
 HBASE-5778-0.94-v2.patch, HBASE-5778.patch




[jira] [Commented] (HBASE-5778) Turn on WAL compression by default

2012-11-07 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13492967#comment-13492967
 ] 

Ted Yu commented on HBASE-5778:
---

Is the goal to turn on WAL compression by default?
If so, do you plan to address the test failure mentioned @ 13/Apr/12 02:53?
{code}
testAppendClose(org.apache.hadoop.hbase.regionserver.wal.TestHLog)  Time elapsed: 0.104 sec   ERROR!
java.lang.NegativeArraySizeException
  at org.apache.hadoop.hbase.regionserver.wal.HLogKey.readFields(HLogKey.java:303)
  at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1894)
  at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1934)
  at org.apache.hadoop.hbase.regionserver.wal.TestHLog.testAppendClose(TestHLog.java:483)
{code}

 Turn on WAL compression by default
 --

 Key: HBASE-5778
 URL: https://issues.apache.org/jira/browse/HBASE-5778
 Project: HBase
  Issue Type: Improvement
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Priority: Blocker
 Fix For: 0.96.0

 Attachments: 5778.addendum, 5778-addendum.txt, HBASE-5778-0.94.patch, 
 HBASE-5778.patch




[jira] [Commented] (HBASE-5778) Turn on WAL compression by default

2012-05-21 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280311#comment-13280311
 ] 

Jean-Daniel Cryans commented on HBASE-5778:
---

I don't see how in theory the seek can be a problem when tail'ing a log from 
the start since we read the whole file. The only case where it will need to be 
handled differently is when a region server needs to replicate a log that 
another RS started working on but died. In that case we can just read the file 
up to the last seek position but don't replicate anything.

 Turn on WAL compression by default
 --

 Key: HBASE-5778
 URL: https://issues.apache.org/jira/browse/HBASE-5778
 Project: HBase
  Issue Type: Improvement
Reporter: Jean-Daniel Cryans
Assignee: Lars Hofhansl
Priority: Blocker
 Attachments: 5778-addendum.txt, 5778.addendum, HBASE-5778.patch






[jira] [Commented] (HBASE-5778) Turn on WAL compression by default

2012-05-20 Thread Li Pi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13279927#comment-13279927
 ] 

Li Pi commented on HBASE-5778:
--

How far can replication lag behind our writes? If we can guarantee that an 
entry won't be evicted before replication, we can simply consult the main 
dictionary to decompress it.

 Turn on WAL compression by default
 --

 Key: HBASE-5778
 URL: https://issues.apache.org/jira/browse/HBASE-5778
 Project: HBase
  Issue Type: Improvement
Reporter: Jean-Daniel Cryans
Assignee: Lars Hofhansl
Priority: Blocker
 Attachments: 5778-addendum.txt, 5778.addendum, HBASE-5778.patch






[jira] [Commented] (HBASE-5778) Turn on WAL compression by default

2012-05-20 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13279939#comment-13279939
 ] 

Lars Hofhansl commented on HBASE-5778:
--

We do have that guarantee (J-D, please correct me if I'm wrong).
The problem - I think - is that replication directly seeks to the position 
indicated in ZK and starts playing logs from there. That would no longer be 
possible; instead we'd have to start from the beginning of the WAL file and 
scan all the way to the position that we want to replicate.
Again, I think that is what the problem is, J-D will probably know more here.


 Turn on WAL compression by default
 --

 Key: HBASE-5778
 URL: https://issues.apache.org/jira/browse/HBASE-5778
 Project: HBase
  Issue Type: Improvement
Reporter: Jean-Daniel Cryans
Assignee: Lars Hofhansl
Priority: Blocker
 Attachments: 5778-addendum.txt, 5778.addendum, HBASE-5778.patch






[jira] [Commented] (HBASE-5778) Turn on WAL compression by default

2012-04-16 Thread Jean-Daniel Cryans (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13255203#comment-13255203
 ] 

Jean-Daniel Cryans commented on HBASE-5778:
---

bq. The files need to be read from the beginning to build up the dictionary.

Aren't the dictionary entries spread out in the log? If so, it should be 
possible to slowly build it up as we tail the log (that's another feature 
that's broken, tailing).

Then if you replay a WAL from another region server, for the first log you'd 
read from the beginning in order to build up the dict, then when you hit the 
offset that's in ZK you start shipping.
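
The read-from-the-start-then-ship idea can be sketched roughly like this. It 
is a toy model and not HBase code: ToyRecoveredLogReader is a hypothetical 
name, records are strings where "L:<text>" adds to the dictionary and 
"R:<index>" refers back to it, and savedPosition is a record index standing in 
for the byte offset kept in ZK.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of replaying a recovered, dictionary-compressed log.
class ToyRecoveredLogReader {
  // Decodes every record from the start of the log so the dictionary is
  // complete, but only returns (ships) entries at or after savedPosition.
  static List<String> entriesToShip(List<String> records, int savedPosition) {
    List<String> dict = new ArrayList<>();
    List<String> toShip = new ArrayList<>();
    for (int pos = 0; pos < records.size(); pos++) {
      String rec = records.get(pos);
      String value;
      if (rec.startsWith("L:")) {
        value = rec.substring(2);
        dict.add(value); // literal: remember it for later references
      } else {
        value = dict.get(Integer.parseInt(rec.substring(2)));
      }
      if (pos >= savedPosition) {
        toShip.add(value); // past the saved offset: safe to replicate
      }
    }
    return toShip;
  }
}
```

Everything before the saved position is read only for its side effect on the 
dictionary, so nothing gets replicated twice.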

 Turn on WAL compression by default
 --

 Key: HBASE-5778
 URL: https://issues.apache.org/jira/browse/HBASE-5778
 Project: HBase
  Issue Type: Improvement
Reporter: Jean-Daniel Cryans
Assignee: Lars Hofhansl
Priority: Blocker
 Fix For: 0.96.0, 0.94.1

 Attachments: 5778-addendum.txt, 5778.addendum, HBASE-5778.patch






[jira] [Commented] (HBASE-5778) Turn on WAL compression by default

2012-04-16 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13255214#comment-13255214
 ] 

Lars Hofhansl commented on HBASE-5778:
--

If we could tail the logs it would work. We just cannot seek into an HLog in 
the middle and start reading from it.

 Turn on WAL compression by default
 --

 Key: HBASE-5778
 URL: https://issues.apache.org/jira/browse/HBASE-5778
 Project: HBase
  Issue Type: Improvement
Reporter: Jean-Daniel Cryans
Assignee: Lars Hofhansl
Priority: Blocker
 Fix For: 0.96.0, 0.94.1

 Attachments: 5778-addendum.txt, 5778.addendum, HBASE-5778.patch






[jira] [Commented] (HBASE-5778) Turn on WAL compression by default

2012-04-16 Thread Jean-Daniel Cryans (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13255225#comment-13255225
 ] 

Jean-Daniel Cryans commented on HBASE-5778:
---

I think everything is fine then :)

 Turn on WAL compression by default
 --

 Key: HBASE-5778
 URL: https://issues.apache.org/jira/browse/HBASE-5778
 Project: HBase
  Issue Type: Improvement
Reporter: Jean-Daniel Cryans
Assignee: Lars Hofhansl
Priority: Blocker
 Fix For: 0.96.0, 0.94.1

 Attachments: 5778-addendum.txt, 5778.addendum, HBASE-5778.patch






[jira] [Commented] (HBASE-5778) Turn on WAL compression by default

2012-04-16 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13255230#comment-13255230
 ] 

Lars Hofhansl commented on HBASE-5778:
--

Hmm... Then that does not explain what I saw. I saw the ReplicationSource 
trying to read from a position in the file (indicated by ZK) and then the read 
failing because the dictionary was not built up.

 Turn on WAL compression by default
 --

 Key: HBASE-5778
 URL: https://issues.apache.org/jira/browse/HBASE-5778
 Project: HBase
  Issue Type: Improvement
Reporter: Jean-Daniel Cryans
Assignee: Lars Hofhansl
Priority: Blocker
 Fix For: 0.96.0, 0.94.1

 Attachments: 5778-addendum.txt, 5778.addendum, HBASE-5778.patch






[jira] [Commented] (HBASE-5778) Turn on WAL compression by default

2012-04-14 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13254136#comment-13254136
 ] 

Zhihong Yu commented on HBASE-5778:
---

I think ReplicationSource now has the additional responsibility of shipping 
dictionaries to replication sink.
We just need to find a clean way of exposing 
SequenceFileLogWriter.compressionContext to ReplicationSource.

 Turn on WAL compression by default
 --

 Key: HBASE-5778
 URL: https://issues.apache.org/jira/browse/HBASE-5778
 Project: HBase
  Issue Type: Improvement
Reporter: Jean-Daniel Cryans
Assignee: Lars Hofhansl
Priority: Blocker
 Fix For: 0.96.0, 0.94.1

 Attachments: 5778-addendum.txt, 5778.addendum, HBASE-5778.patch






[jira] [Commented] (HBASE-5778) Turn on WAL compression by default

2012-04-14 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13254204#comment-13254204
 ] 

Lars Hofhansl commented on HBASE-5778:
--

@Ted: Unfortunately it is not as simple as that. As I tried to explain above, 
the ReplicationSource reads from the WAL files at offsets that are stored in 
ZK. This does not work any longer, as you can no longer start reading the WAL 
at an offset. The files need to be read from the beginning to build up the 
dictionary.
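
To illustrate why an offset alone is no longer enough, here is a minimal 
sketch of dictionary-style compression. It is not HBase's actual 
CompressionContext: the class ToyDictLog and its encoding (a literal the first 
time a string is seen, an index afterwards) are assumptions made up for this 
example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model: each record is either a literal ("L:<text>") that
// is also added to the dictionary, or a reference ("R:<index>") to a literal
// seen earlier in the same file.
class ToyDictLog {
  // Writer side: replace repeated values with dictionary indexes.
  static List<String> encode(List<String> values) {
    Map<String, Integer> dict = new HashMap<>();
    List<String> out = new ArrayList<>();
    for (String v : values) {
      Integer idx = dict.get(v);
      if (idx == null) {
        dict.put(v, dict.size());
        out.add("L:" + v);
      } else {
        out.add("R:" + idx);
      }
    }
    return out;
  }

  // Reader side: decode starting at record 'start' with an empty dictionary.
  // Starting in the middle either throws (the index is unknown) or, worse,
  // may resolve an index to the wrong entry.
  static List<String> decodeFrom(List<String> records, int start) {
    List<String> dict = new ArrayList<>();
    List<String> out = new ArrayList<>();
    for (int i = start; i < records.size(); i++) {
      String rec = records.get(i);
      if (rec.startsWith("L:")) {
        String v = rec.substring(2);
        dict.add(v);
        out.add(v);
      } else {
        int idx = Integer.parseInt(rec.substring(2));
        if (idx >= dict.size()) {
          throw new IllegalStateException("unknown dictionary index " + idx);
        }
        out.add(dict.get(idx));
      }
    }
    return out;
  }
}
```

Decoding from record 0 works because every reference is preceded by its 
literal; decoding from a later record hits a reference whose literal was 
never read.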


 Turn on WAL compression by default
 --

 Key: HBASE-5778
 URL: https://issues.apache.org/jira/browse/HBASE-5778
 Project: HBase
  Issue Type: Improvement
Reporter: Jean-Daniel Cryans
Assignee: Lars Hofhansl
Priority: Blocker
 Fix For: 0.96.0, 0.94.1

 Attachments: 5778-addendum.txt, 5778.addendum, HBASE-5778.patch






[jira] [Commented] (HBASE-5778) Turn on WAL compression by default

2012-04-13 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13253454#comment-13253454
 ] 

stack commented on HBASE-5778:
--

I backed it out of 0.94 and trunk.

 Turn on WAL compression by default
 --

 Key: HBASE-5778
 URL: https://issues.apache.org/jira/browse/HBASE-5778
 Project: HBase
  Issue Type: Improvement
Reporter: Jean-Daniel Cryans
Assignee: Lars Hofhansl
Priority: Blocker
 Fix For: 0.94.0, 0.96.0

 Attachments: 5778-addendum.txt, 5778.addendum, HBASE-5778.patch






[jira] [Commented] (HBASE-5778) Turn on WAL compression by default

2012-04-13 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13253522#comment-13253522
 ] 

Hudson commented on HBASE-5778:
---

Integrated in HBase-TRUNK #2752 (See 
[https://builds.apache.org/job/HBase-TRUNK/2752/])
HBASE-5778 Turn on WAL compression by default (Revision 1325801)

 Result = SUCCESS
stack : 
Files : 
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java


 Turn on WAL compression by default
 --

 Key: HBASE-5778
 URL: https://issues.apache.org/jira/browse/HBASE-5778
 Project: HBase
  Issue Type: Improvement
Reporter: Jean-Daniel Cryans
Assignee: Lars Hofhansl
Priority: Blocker
 Fix For: 0.94.0, 0.96.0

 Attachments: 5778-addendum.txt, 5778.addendum, HBASE-5778.patch






[jira] [Commented] (HBASE-5778) Turn on WAL compression by default

2012-04-13 Thread Jean-Daniel Cryans (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13253529#comment-13253529
 ] 

Jean-Daniel Cryans commented on HBASE-5778:
---

Sorry for all the trouble guys, I thought the feature was more tested than that 
:(

 Turn on WAL compression by default
 --

 Key: HBASE-5778
 URL: https://issues.apache.org/jira/browse/HBASE-5778
 Project: HBase
  Issue Type: Improvement
Reporter: Jean-Daniel Cryans
Assignee: Lars Hofhansl
Priority: Blocker
 Fix For: 0.94.0, 0.96.0

 Attachments: 5778-addendum.txt, 5778.addendum, HBASE-5778.patch






[jira] [Commented] (HBASE-5778) Turn on WAL compression by default

2012-04-13 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13253528#comment-13253528
 ] 

Hudson commented on HBASE-5778:
---

Integrated in HBase-0.94 #113 (See 
[https://builds.apache.org/job/HBase-0.94/113/])
HBASE-5778 Turn on WAL compression by default: REVERT (Revision 1325803)

 Result = SUCCESS
stack : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java


 Turn on WAL compression by default
 --

 Key: HBASE-5778
 URL: https://issues.apache.org/jira/browse/HBASE-5778
 Project: HBase
  Issue Type: Improvement
Reporter: Jean-Daniel Cryans
Assignee: Lars Hofhansl
Priority: Blocker
 Fix For: 0.94.0, 0.96.0

 Attachments: 5778-addendum.txt, 5778.addendum, HBASE-5778.patch






[jira] [Commented] (HBASE-5778) Turn on WAL compression by default

2012-04-13 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13253588#comment-13253588
 ] 

Lars Hofhansl commented on HBASE-5778:
--

I still don't understand why this is a problem with replication. J-D, do you 
have any insights?

 Turn on WAL compression by default
 --

 Key: HBASE-5778
 URL: https://issues.apache.org/jira/browse/HBASE-5778
 Project: HBase
  Issue Type: Improvement
Reporter: Jean-Daniel Cryans
Assignee: Lars Hofhansl
Priority: Blocker
 Fix For: 0.96.0, 0.94.1

 Attachments: 5778-addendum.txt, 5778.addendum, HBASE-5778.patch






[jira] [Commented] (HBASE-5778) Turn on WAL compression by default

2012-04-13 Thread Jean-Daniel Cryans (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13253611#comment-13253611
 ] 

Jean-Daniel Cryans commented on HBASE-5778:
---

I haven't had a look, but I'd guess that if we're reading files that are being 
written then we don't have access to the dict.

 Turn on WAL compression by default
 --

 Key: HBASE-5778
 URL: https://issues.apache.org/jira/browse/HBASE-5778
 Project: HBase
  Issue Type: Improvement
Reporter: Jean-Daniel Cryans
Assignee: Lars Hofhansl
Priority: Blocker
 Fix For: 0.96.0, 0.94.1

 Attachments: 5778-addendum.txt, 5778.addendum, HBASE-5778.patch






[jira] [Commented] (HBASE-5778) Turn on WAL compression by default

2012-04-13 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13253614#comment-13253614
 ] 

Lars Hofhansl commented on HBASE-5778:
--

Oh I see. The KVs are only decompressed when read.

 Turn on WAL compression by default
 --

 Key: HBASE-5778
 URL: https://issues.apache.org/jira/browse/HBASE-5778
 Project: HBase
  Issue Type: Improvement
Reporter: Jean-Daniel Cryans
Assignee: Lars Hofhansl
Priority: Blocker
 Fix For: 0.96.0, 0.94.1

 Attachments: 5778-addendum.txt, 5778.addendum, HBASE-5778.patch






[jira] [Commented] (HBASE-5778) Turn on WAL compression by default

2012-04-13 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13253615#comment-13253615
 ] 

Hudson commented on HBASE-5778:
---

Integrated in HBase-0.94-security #9 (See 
[https://builds.apache.org/job/HBase-0.94-security/9/])
HBASE-5778 Turn on WAL compression by default: REVERT (Revision 1325803)
HBASE-5778  Turn on WAL compression by default (Revision 1325567)

 Result = SUCCESS
stack : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java

jdcryans : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java


 Turn on WAL compression by default
 --

 Key: HBASE-5778
 URL: https://issues.apache.org/jira/browse/HBASE-5778
 Project: HBase
  Issue Type: Improvement
Reporter: Jean-Daniel Cryans
Assignee: Lars Hofhansl
Priority: Blocker
 Fix For: 0.96.0, 0.94.1

 Attachments: 5778-addendum.txt, 5778.addendum, HBASE-5778.patch






[jira] [Commented] (HBASE-5778) Turn on WAL compression by default

2012-04-13 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13253636#comment-13253636
 ] 

Hadoop QA commented on HBASE-5778:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12522527/5778-addendum.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 3 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these unit tests:
  org.apache.hadoop.hbase.mapreduce.TestWALPlayer
  org.apache.hadoop.hbase.coprocessor.TestClassLoading

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1515//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1515//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1515//console

This message is automatically generated.

 Turn on WAL compression by default
 --

 Key: HBASE-5778
 URL: https://issues.apache.org/jira/browse/HBASE-5778
 Project: HBase
  Issue Type: Improvement
Reporter: Jean-Daniel Cryans
Assignee: Lars Hofhansl
Priority: Blocker
 Fix For: 0.96.0, 0.94.1

 Attachments: 5778-addendum.txt, 5778.addendum, HBASE-5778.patch






[jira] [Commented] (HBASE-5778) Turn on WAL compression by default

2012-04-13 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13253750#comment-13253750
 ] 

Hudson commented on HBASE-5778:
---

Integrated in HBase-TRUNK-security #170 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/170/])
HBASE-5778 Turn on WAL compression by default (Revision 1325801)

 Result = FAILURE
stack : 
Files : 
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java


 Turn on WAL compression by default
 --

 Key: HBASE-5778
 URL: https://issues.apache.org/jira/browse/HBASE-5778
 Project: HBase
  Issue Type: Improvement
Reporter: Jean-Daniel Cryans
Assignee: Lars Hofhansl
Priority: Blocker
 Fix For: 0.96.0, 0.94.1

 Attachments: 5778-addendum.txt, 5778.addendum, HBASE-5778.patch






[jira] [Commented] (HBASE-5778) Turn on WAL compression by default

2012-04-13 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13253995#comment-13253995
 ] 

Lars Hofhansl commented on HBASE-5778:
--

This fundamentally breaks replication.

The problem above is actually that the HLogKey and WALEdit, after being read 
from a compressed HLog, have the compression context set, and hence it will be 
used to compress them when they are sent over the wire to the sink. Of course 
the sink does not know how to uncompress them.

So I just set the compression context to null in ReplicationSource.

With that hurdle out of the way, I find that seeking to a specific position in 
the HLog (the position stored in ZK) does not work, because the dictionary is 
not built up (compressed HLogs always need to be read from the beginning).

Not sure how to fix the second part.
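
The first problem described in this comment (an edit that still carries its compression context gets compressed again on the wire) can be sketched with a toy model. Everything here is hypothetical: `ToyEdit`, its `Context` class, and the string-based wire format are stand-ins invented for illustration and do not mirror the real HLogKey/WALEdit serialization.

```java
import java.util.HashMap;
import java.util.Map;

public class ToyEdit {
    /** Hypothetical stand-in for per-file compression state: a string-to-index dictionary. */
    static class Context {
        final Map<String, Integer> dict = new HashMap<>();
    }

    final String row;
    Context context; // set when the edit was read from a compressed log; null means "write plain"

    ToyEdit(String row, Context context) {
        this.row = row;
        this.context = context;
    }

    /** Serializes the edit as it would be shipped to a receiver. */
    String serialize() {
        if (context == null) {
            return "plain:" + row;            // readable without any shared state
        }
        Integer idx = context.dict.get(row);
        if (idx == null) {
            idx = context.dict.size();
            context.dict.put(row, idx);
            return "dict-add:" + idx + ":" + row;
        }
        return "dict-ref:" + idx;             // only meaningful to a holder of the dictionary
    }

    public static void main(String[] args) {
        Context readerContext = new Context();
        readerContext.dict.put("row1", 0);    // state built up while reading the compressed log

        ToyEdit edit = new ToyEdit("row1", readerContext);
        System.out.println(edit.serialize()); // a receiver without the dictionary cannot decode this

        edit.context = null;                  // the workaround: clear the context before shipping
        System.out.println(edit.serialize());
    }
}
```

Clearing the context before shipping means the receiver needs no shared dictionary, at the cost of sending the uncompressed form. It does nothing for the second problem, which concerns reading the file rather than sending the edits.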

 Turn on WAL compression by default
 --

 Key: HBASE-5778
 URL: https://issues.apache.org/jira/browse/HBASE-5778
 Project: HBase
  Issue Type: Improvement
Reporter: Jean-Daniel Cryans
Assignee: Lars Hofhansl
Priority: Blocker
 Fix For: 0.96.0, 0.94.1

 Attachments: 5778-addendum.txt, 5778.addendum, HBASE-5778.patch






[jira] [Commented] (HBASE-5778) Turn on WAL compression by default

2012-04-12 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252924#comment-13252924
 ] 

Todd Lipcon commented on HBASE-5778:


Do we have this in hbase-default.xml as well? If not, +1.

 Turn on WAL compression by default
 --

 Key: HBASE-5778
 URL: https://issues.apache.org/jira/browse/HBASE-5778
 Project: HBase
  Issue Type: Improvement
Reporter: Jean-Daniel Cryans
Priority: Blocker
 Fix For: 0.94.0, 0.96.0

 Attachments: HBASE-5778.patch






[jira] [Commented] (HBASE-5778) Turn on WAL compression by default

2012-04-12 Thread Jean-Daniel Cryans (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252927#comment-13252927
 ] 

Jean-Daniel Cryans commented on HBASE-5778:
---

It's not in there. Do we want it now that we turn it on? Or do we act like we 
always had it? :)

 Turn on WAL compression by default
 --

 Key: HBASE-5778
 URL: https://issues.apache.org/jira/browse/HBASE-5778
 Project: HBase
  Issue Type: Improvement
Reporter: Jean-Daniel Cryans
Priority: Blocker
 Fix For: 0.94.0, 0.96.0

 Attachments: HBASE-5778.patch






[jira] [Commented] (HBASE-5778) Turn on WAL compression by default

2012-04-12 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252939#comment-13252939
 ] 

Lars Hofhansl commented on HBASE-5778:
--

+1 on patch

 Turn on WAL compression by default
 --

 Key: HBASE-5778
 URL: https://issues.apache.org/jira/browse/HBASE-5778
 Project: HBase
  Issue Type: Improvement
Reporter: Jean-Daniel Cryans
Priority: Blocker
 Fix For: 0.94.0, 0.96.0

 Attachments: HBASE-5778.patch






[jira] [Commented] (HBASE-5778) Turn on WAL compression by default

2012-04-12 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252947#comment-13252947
 ] 

stack commented on HBASE-5778:
--

+1  Add release note w/ how to turn it off

 Turn on WAL compression by default
 --

 Key: HBASE-5778
 URL: https://issues.apache.org/jira/browse/HBASE-5778
 Project: HBase
  Issue Type: Improvement
Reporter: Jean-Daniel Cryans
Priority: Blocker
 Fix For: 0.94.0, 0.96.0

 Attachments: HBASE-5778.patch






[jira] [Commented] (HBASE-5778) Turn on WAL compression by default

2012-04-12 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252976#comment-13252976
 ] 

Hudson commented on HBASE-5778:
---

Integrated in HBase-TRUNK #2749 (See 
[https://builds.apache.org/job/HBase-TRUNK/2749/])
HBASE-5778  Turn on WAL compression by default (Revision 1325566)

 Result = FAILURE
jdcryans : 
Files : 
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java


 Turn on WAL compression by default
 --

 Key: HBASE-5778
 URL: https://issues.apache.org/jira/browse/HBASE-5778
 Project: HBase
  Issue Type: Improvement
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Priority: Blocker
 Fix For: 0.94.0, 0.96.0

 Attachments: HBASE-5778.patch






[jira] [Commented] (HBASE-5778) Turn on WAL compression by default

2012-04-12 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13253001#comment-13253001
 ] 

Hudson commented on HBASE-5778:
---

Integrated in HBase-0.94 #109 (See 
[https://builds.apache.org/job/HBase-0.94/109/])
HBASE-5778  Turn on WAL compression by default (Revision 1325567)

 Result = SUCCESS
jdcryans : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java


 Turn on WAL compression by default
 --

 Key: HBASE-5778
 URL: https://issues.apache.org/jira/browse/HBASE-5778
 Project: HBase
  Issue Type: Improvement
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Priority: Blocker
 Fix For: 0.94.0, 0.96.0

 Attachments: HBASE-5778.patch






[jira] [Commented] (HBASE-5778) Turn on WAL compression by default

2012-04-12 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13253053#comment-13253053
 ] 

Lars Hofhansl commented on HBASE-5778:
--

I see a bunch of suspicious test failures now:
{code}
java.lang.NegativeArraySizeException
at 
org.apache.hadoop.hbase.regionserver.wal.HLogKey.readFields(HLogKey.java:305)
{code}
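For readers unfamiliar with this failure mode, here is a minimal sketch of one way a NegativeArraySizeException can come out of a readFields-style method. It assumes (this is not confirmed by the stack trace) that a reader expecting a plain length-prefixed field is handed bytes written in a different layout, so the "length" it reads is garbage. The class LengthPrefixDemo and both layouts are hypothetical and are not the HLogKey code.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration; this is not the HLogKey implementation.
final class LengthPrefixDemo {

  // Plain layout: 4-byte big-endian length, then the field bytes.
  static byte[] writePlain(byte[] field) {
    ByteBuffer buf = ByteBuffer.allocate(4 + field.length);
    buf.putInt(field.length).put(field);
    return buf.array();
  }

  // Made-up "compressed" layout: a marker byte (-1), a 2-byte dictionary
  // index, and one padding byte. There is no length prefix at all.
  static byte[] writeCompressed(short dictIndex) {
    ByteBuffer buf = ByteBuffer.allocate(4);
    buf.put((byte) -1).putShort(dictIndex).put((byte) 0);
    return buf.array();
  }

  // Reader that only understands the plain layout.
  static byte[] readPlain(byte[] data) {
    ByteBuffer buf = ByteBuffer.wrap(data);
    int len = buf.getInt();       // garbage if the data is not in the plain layout
    byte[] field = new byte[len]; // NegativeArraySizeException when len < 0
    buf.get(field);
    return field;
  }
}
```

Calling readPlain(writePlain(...)) round-trips the field, while readPlain(writeCompressed(...)) interprets the marker byte as the top byte of the length and fails on the array allocation.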


 Turn on WAL compression by default
 --

 Key: HBASE-5778
 URL: https://issues.apache.org/jira/browse/HBASE-5778
 Project: HBase
  Issue Type: Improvement
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Priority: Blocker
 Fix For: 0.94.0, 0.96.0

 Attachments: HBASE-5778.patch






[jira] [Commented] (HBASE-5778) Turn on WAL compression by default

2012-04-12 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13253058#comment-13253058
 ] 

Lars Hofhansl commented on HBASE-5778:
--

Yeah... The failures in TestHLog are because of this. We need to roll back or 
figure out what the problem is. It is probably test-related.

 Turn on WAL compression by default
 --

 Key: HBASE-5778
 URL: https://issues.apache.org/jira/browse/HBASE-5778
 Project: HBase
  Issue Type: Improvement
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Priority: Blocker
 Fix For: 0.94.0, 0.96.0

 Attachments: HBASE-5778.patch






[jira] [Commented] (HBASE-5778) Turn on WAL compression by default

2012-04-12 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13253061#comment-13253061
 ] 

Lars Hofhansl commented on HBASE-5778:
--

TestHLog.testAppendClose() uses this to read back the WALEdits:
{code}
// Make sure you can read all the content
SequenceFile.Reader reader
  = new SequenceFile.Reader(this.fs, walPath, this.conf);
{code}
Well, duh, that does not work.

 Turn on WAL compression by default
 --

 Key: HBASE-5778
 URL: https://issues.apache.org/jira/browse/HBASE-5778
 Project: HBase
  Issue Type: Improvement
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Priority: Blocker
 Fix For: 0.94.0, 0.96.0

 Attachments: HBASE-5778.patch






[jira] [Commented] (HBASE-5778) Turn on WAL compression by default

2012-04-12 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13253063#comment-13253063
 ] 

Lars Hofhansl commented on HBASE-5778:
--

Then there's FaultySequenceFileLogReader, which does not do the right thing.

 Turn on WAL compression by default
 --

 Key: HBASE-5778
 URL: https://issues.apache.org/jira/browse/HBASE-5778
 Project: HBase
  Issue Type: Improvement
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Priority: Blocker
 Fix For: 0.94.0, 0.96.0

 Attachments: HBASE-5778.patch






[jira] [Commented] (HBASE-5778) Turn on WAL compression by default

2012-04-12 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13253065#comment-13253065
 ] 

Lars Hofhansl commented on HBASE-5778:
--

Have a fix for TestHLog, working on TestHLogSplit

 Turn on WAL compression by default
 --

 Key: HBASE-5778
 URL: https://issues.apache.org/jira/browse/HBASE-5778
 Project: HBase
  Issue Type: Improvement
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Priority: Blocker
 Fix For: 0.94.0, 0.96.0

 Attachments: HBASE-5778.patch






[jira] [Commented] (HBASE-5778) Turn on WAL compression by default

2012-04-12 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13253091#comment-13253091
 ] 

Lars Hofhansl commented on HBASE-5778:
--

Similar... There're some other issues. I'll have a patch soon.

 Turn on WAL compression by default
 --

 Key: HBASE-5778
 URL: https://issues.apache.org/jira/browse/HBASE-5778
 Project: HBase
  Issue Type: Improvement
Reporter: Jean-Daniel Cryans
Assignee: Lars Hofhansl
Priority: Blocker
 Fix For: 0.94.0, 0.96.0

 Attachments: 5778.addendum, HBASE-5778.patch






[jira] [Commented] (HBASE-5778) Turn on WAL compression by default

2012-04-12 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13253107#comment-13253107
 ] 

Ted Yu commented on HBASE-5778:
---

{code}
+  } catch (IndexOutOfBoundsException iobe) {
+    // this can happen with a corrupted file, fall through
+  }
{code}
I think we should record the cause of the failure to retrieve the dictionary 
entry and provide a clearer message in the IOException below:
{code}
  if (entry == null) {
    throw new IOException("Missing dictionary entry for index "
        + dictIdx);
  }
{code}
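A minimal sketch of what this suggestion could look like, assuming the dictionary is backed by an indexable list. The class DictionaryLookupDemo, its add() helper, and the extra "dictionary size" detail in the message are hypothetical and are not taken from the patch.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the WAL dictionary; names are illustrative only.
final class DictionaryLookupDemo {
  private final List<byte[]> entries = new ArrayList<>();

  // Adds a value and returns the index it was stored under.
  short add(byte[] value) {
    entries.add(value);
    return (short) (entries.size() - 1);
  }

  // Looks up an entry, chaining the original failure into the IOException.
  byte[] getEntry(short dictIdx) throws IOException {
    byte[] entry = null;
    IndexOutOfBoundsException cause = null;
    try {
      entry = entries.get(dictIdx);
    } catch (IndexOutOfBoundsException iobe) {
      // this can happen with a corrupted file; remember why and fall through
      cause = iobe;
    }
    if (entry == null) {
      throw new IOException("Missing dictionary entry for index " + dictIdx
          + " (dictionary size " + entries.size() + ")", cause);
    }
    return entry;
  }
}
```

Keeping the IndexOutOfBoundsException as the cause means the stack trace shows both the high-level symptom and the underlying lookup failure.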

 Turn on WAL compression by default
 --

 Key: HBASE-5778
 URL: https://issues.apache.org/jira/browse/HBASE-5778
 Project: HBase
  Issue Type: Improvement
Reporter: Jean-Daniel Cryans
Assignee: Lars Hofhansl
Priority: Blocker
 Fix For: 0.94.0, 0.96.0

 Attachments: 5778-addendum.txt, 5778.addendum, HBASE-5778.patch






[jira] [Commented] (HBASE-5778) Turn on WAL compression by default

2012-04-12 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13253110#comment-13253110
 ] 

Hadoop QA commented on HBASE-5778:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12522527/5778-addendum.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 3 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these unit tests:
  org.apache.hadoop.hbase.replication.TestReplication
  org.apache.hadoop.hbase.replication.TestMultiSlaveReplication
  org.apache.hadoop.hbase.replication.TestMasterReplication

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1507//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1507//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1507//console

This message is automatically generated.

 Turn on WAL compression by default
 --

 Key: HBASE-5778
 URL: https://issues.apache.org/jira/browse/HBASE-5778
 Project: HBase
  Issue Type: Improvement
Reporter: Jean-Daniel Cryans
Assignee: Lars Hofhansl
Priority: Blocker
 Fix For: 0.94.0, 0.96.0

 Attachments: 5778-addendum.txt, 5778.addendum, HBASE-5778.patch






[jira] [Commented] (HBASE-5778) Turn on WAL compression by default

2012-04-12 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13253118#comment-13253118
 ] 

Lars Hofhansl commented on HBASE-5778:
--

mvn failed with an OutOfMemoryError. Let's revert this change until we track 
these issues down.

 Turn on WAL compression by default
 --

 Key: HBASE-5778
 URL: https://issues.apache.org/jira/browse/HBASE-5778
 Project: HBase
  Issue Type: Improvement
Reporter: Jean-Daniel Cryans
Assignee: Lars Hofhansl
Priority: Blocker
 Fix For: 0.94.0, 0.96.0

 Attachments: 5778-addendum.txt, 5778.addendum, HBASE-5778.patch






[jira] [Commented] (HBASE-5778) Turn on WAL compression by default

2012-04-12 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13253127#comment-13253127
 ] 

Ted Yu commented on HBASE-5778:
---

The remaining issue is how the replication sink can correctly decompress the 
WAL.
From the test output, I saw:
{code}
java.io.EOFException
  at java.io.DataInputStream.readFully(DataInputStream.java:180)
  at org.apache.hadoop.hbase.KeyValue.readFields(KeyValue.java:2243)
  at org.apache.hadoop.hbase.KeyValue.readFields(KeyValue.java:2249)
  at 
org.apache.hadoop.hbase.regionserver.wal.WALEdit.readFields(WALEdit.java:129)
  at 
org.apache.hadoop.hbase.regionserver.wal.HLog$Entry.readFields(HLog.java:1700)
{code}
On the replication sink, HLog$Entry has no CompressionContext that could be 
used to perform decompression.
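The effect can be reproduced with a toy dictionary codec. This is only a sketch under stated assumptions: the class CompressionContextDemo, its flag-byte layout, and the boolean standing in for "a compression context is present" are all hypothetical and do not reflect the real HBase wire format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not the HBase CompressionContext or HLog$Entry code.
final class CompressionContextDemo {

  // Encodes values; with compress=true, repeated values become dictionary references.
  static byte[] encode(List<String> values, boolean compress) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bos);
    List<String> dict = new ArrayList<>(); // writer-side context
    for (String value : values) {
      int idx = compress ? dict.indexOf(value) : -1;
      if (idx >= 0) {
        out.writeByte(1);   // flag: a 2-byte dictionary index follows
        out.writeShort(idx);
        continue;
      }
      if (compress) {
        out.writeByte(0);   // flag: a literal follows and joins the dictionary
        dict.add(value);
      }
      byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
      out.writeShort(bytes.length);
      out.write(bytes);
    }
    out.flush();
    return bos.toByteArray();
  }

  // Decodes count values; withContext must match how the data was written.
  static List<String> decode(byte[] data, int count, boolean withContext)
      throws IOException {
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
    List<String> dict = new ArrayList<>(); // reader-side context
    List<String> result = new ArrayList<>();
    for (int i = 0; i < count; i++) {
      if (withContext && in.readByte() == 1) {
        result.add(dict.get(in.readShort()));
        continue;
      }
      byte[] bytes = new byte[in.readUnsignedShort()];
      in.readFully(bytes); // EOFException if the length field was misread
      String value = new String(bytes, StandardCharsets.UTF_8);
      if (withContext) {
        dict.add(value);
      }
      result.add(value);
    }
    return result;
  }
}
```

Decoding compressed data with withContext=true rebuilds the same dictionary the writer used and returns the original values. Decoding the same bytes with withContext=false misreads the flag bytes as part of a length field: the first value comes back wrong, and the next readFully runs past the end of the stream with an EOFException, much like the trace above.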

I agree the change should be reverted.

 Turn on WAL compression by default
 --

 Key: HBASE-5778
 URL: https://issues.apache.org/jira/browse/HBASE-5778
 Project: HBase
  Issue Type: Improvement
Reporter: Jean-Daniel Cryans
Assignee: Lars Hofhansl
Priority: Blocker
 Fix For: 0.94.0, 0.96.0

 Attachments: 5778-addendum.txt, 5778.addendum, HBASE-5778.patch






[jira] [Commented] (HBASE-5778) Turn on WAL compression by default

2012-04-12 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13253138#comment-13253138
 ] 

Hudson commented on HBASE-5778:
---

Integrated in HBase-TRUNK-security #169 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/169/])
HBASE-5778  Turn on WAL compression by default (Revision 1325566)

 Result = FAILURE
jdcryans : 
Files : 
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java


 Turn on WAL compression by default
 --

 Key: HBASE-5778
 URL: https://issues.apache.org/jira/browse/HBASE-5778
 Project: HBase
  Issue Type: Improvement
Reporter: Jean-Daniel Cryans
Assignee: Lars Hofhansl
Priority: Blocker
 Fix For: 0.94.0, 0.96.0

 Attachments: 5778-addendum.txt, 5778.addendum, HBASE-5778.patch

