[jira] [Updated] (HBASE-11927) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)

2016-02-22 Thread Sean Busbey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-11927:

Fix Version/s: (was: 1.1.4)

> Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to 
> CRC32C)
> 
>
> Key: HBASE-11927
> URL: https://issues.apache.org/jira/browse/HBASE-11927
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance
>Reporter: stack
>Assignee: Appy
> Fix For: 2.0.0, 1.2.0
>
> Attachments: HBASE-11927-branch-1.1.patch, HBASE-11927-v1.patch, 
> HBASE-11927-v2.patch, HBASE-11927-v4.patch, HBASE-11927-v5.patch, 
> HBASE-11927-v6.patch, HBASE-11927-v7.patch, HBASE-11927-v8.patch, 
> HBASE-11927-v8.patch, HBASE-11927.patch, after-compact-2%.svg, 
> after-randomWrite1M-0.5%.svg, before-compact-22%.svg, 
> before-randomWrite1M-5%.svg, c2021.crc2.svg, c2021.write.2.svg, 
> c2021.zip.svg, crc32ct.svg
>
>
> Up in hadoop they have this change. Let me publish some graphs to show that 
> it makes a difference (CRC is a massive amount of our CPU usage in my 
> profiling of an upload because of compacting, flushing, etc.).  We should 
> also make use of native CRCings -- especially the 2.6 HDFS-6865 and ilk -- in 
> hbase but that is another issue for now.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-11927) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)

2016-02-22 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-11927:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Re-resolving. I'd say it's time for a new JIRA if you fellas are pulling it back.

> Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to 
> CRC32C)
> 
>
> Key: HBASE-11927
> URL: https://issues.apache.org/jira/browse/HBASE-11927
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance
>Reporter: stack
>Assignee: Appy
> Fix For: 2.0.0, 1.1.4, 1.2.0
>
> Attachments: HBASE-11927-branch-1.1.patch, HBASE-11927-v1.patch, 
> HBASE-11927-v2.patch, HBASE-11927-v4.patch, HBASE-11927-v5.patch, 
> HBASE-11927-v6.patch, HBASE-11927-v7.patch, HBASE-11927-v8.patch, 
> HBASE-11927-v8.patch, HBASE-11927.patch, after-compact-2%.svg, 
> after-randomWrite1M-0.5%.svg, before-compact-22%.svg, 
> before-randomWrite1M-5%.svg, c2021.crc2.svg, c2021.write.2.svg, 
> c2021.zip.svg, crc32ct.svg
>
>
> Up in hadoop they have this change. Let me publish some graphs to show that 
> it makes a difference (CRC is a massive amount of our CPU usage in my 
> profiling of an upload because of compacting, flushing, etc.).  We should 
> also make use of native CRCings -- especially the 2.6 HDFS-6865 and ilk -- in 
> hbase but that is another issue for now.





[jira] [Updated] (HBASE-11927) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)

2016-02-11 Thread Nick Dimiduk (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HBASE-11927:
-
Attachment: HBASE-11927-branch-1.1.patch

This looks like a good one to bring back to branch-1.1. Here's Appy's patch but 
with the default held at CRC32. A review is appreciated.

> Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to 
> CRC32C)
> 
>
> Key: HBASE-11927
> URL: https://issues.apache.org/jira/browse/HBASE-11927
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance
>Reporter: stack
>Assignee: Appy
> Fix For: 2.0.0, 1.2.0
>
> Attachments: HBASE-11927-branch-1.1.patch, HBASE-11927-v1.patch, 
> HBASE-11927-v2.patch, HBASE-11927-v4.patch, HBASE-11927-v5.patch, 
> HBASE-11927-v6.patch, HBASE-11927-v7.patch, HBASE-11927-v8.patch, 
> HBASE-11927-v8.patch, HBASE-11927.patch, after-compact-2%.svg, 
> after-randomWrite1M-0.5%.svg, before-compact-22%.svg, 
> before-randomWrite1M-5%.svg, c2021.crc2.svg, c2021.write.2.svg, 
> c2021.zip.svg, crc32ct.svg
>
>
> Up in hadoop they have this change. Let me publish some graphs to show that 
> it makes a difference (CRC is a massive amount of our CPU usage in my 
> profiling of an upload because of compacting, flushing, etc.).  We should 
> also make use of native CRCings -- especially the 2.6 HDFS-6865 and ilk -- in 
> hbase but that is another issue for now.





[jira] [Updated] (HBASE-11927) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)

2016-02-11 Thread Nick Dimiduk (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HBASE-11927:
-
Fix Version/s: 1.1.4

> Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to 
> CRC32C)
> 
>
> Key: HBASE-11927
> URL: https://issues.apache.org/jira/browse/HBASE-11927
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance
>Reporter: stack
>Assignee: Appy
> Fix For: 2.0.0, 1.2.0, 1.1.4
>
> Attachments: HBASE-11927-branch-1.1.patch, HBASE-11927-v1.patch, 
> HBASE-11927-v2.patch, HBASE-11927-v4.patch, HBASE-11927-v5.patch, 
> HBASE-11927-v6.patch, HBASE-11927-v7.patch, HBASE-11927-v8.patch, 
> HBASE-11927-v8.patch, HBASE-11927.patch, after-compact-2%.svg, 
> after-randomWrite1M-0.5%.svg, before-compact-22%.svg, 
> before-randomWrite1M-5%.svg, c2021.crc2.svg, c2021.write.2.svg, 
> c2021.zip.svg, crc32ct.svg
>
>
> Up in hadoop they have this change. Let me publish some graphs to show that 
> it makes a difference (CRC is a massive amount of our CPU usage in my 
> profiling of an upload because of compacting, flushing, etc.).  We should 
> also make use of native CRCings -- especially the 2.6 HDFS-6865 and ilk -- in 
> hbase but that is another issue for now.





[jira] [Updated] (HBASE-11927) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)

2016-02-11 Thread Nick Dimiduk (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HBASE-11927:
-
Status: Patch Available  (was: Reopened)

> Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to 
> CRC32C)
> 
>
> Key: HBASE-11927
> URL: https://issues.apache.org/jira/browse/HBASE-11927
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance
>Reporter: stack
>Assignee: Appy
> Fix For: 2.0.0, 1.2.0, 1.1.4
>
> Attachments: HBASE-11927-branch-1.1.patch, HBASE-11927-v1.patch, 
> HBASE-11927-v2.patch, HBASE-11927-v4.patch, HBASE-11927-v5.patch, 
> HBASE-11927-v6.patch, HBASE-11927-v7.patch, HBASE-11927-v8.patch, 
> HBASE-11927-v8.patch, HBASE-11927.patch, after-compact-2%.svg, 
> after-randomWrite1M-0.5%.svg, before-compact-22%.svg, 
> before-randomWrite1M-5%.svg, c2021.crc2.svg, c2021.write.2.svg, 
> c2021.zip.svg, crc32ct.svg
>
>
> Up in hadoop they have this change. Let me publish some graphs to show that 
> it makes a difference (CRC is a massive amount of our CPU usage in my 
> profiling of an upload because of compacting, flushing, etc.).  We should 
> also make use of native CRCings -- especially the 2.6 HDFS-6865 and ilk -- in 
> hbase but that is another issue for now.





[jira] [Updated] (HBASE-11927) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)

2015-05-15 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-11927:
--
Attachment: HBASE-11927-v8.patch

 Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to 
 CRC32C)
 

 Key: HBASE-11927
 URL: https://issues.apache.org/jira/browse/HBASE-11927
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: Apekshit Sharma
 Attachments: HBASE-11927-v1.patch, HBASE-11927-v2.patch, 
 HBASE-11927-v4.patch, HBASE-11927-v5.patch, HBASE-11927-v6.patch, 
 HBASE-11927-v7.patch, HBASE-11927-v8.patch, HBASE-11927-v8.patch, 
 HBASE-11927.patch, after-compact-2%.svg, after-randomWrite1M-0.5%.svg, 
 before-compact-22%.svg, before-randomWrite1M-5%.svg, c2021.crc2.svg, 
 c2021.write.2.svg, c2021.zip.svg, crc32ct.svg


 Up in hadoop they have this change. Let me publish some graphs to show that 
 it makes a difference (CRC is a massive amount of our CPU usage in my 
 profiling of an upload because of compacting, flushing, etc.).  We should 
 also make use of native CRCings -- especially the 2.6 HDFS-6865 and ilk -- in 
 hbase but that is another issue for now.





[jira] [Updated] (HBASE-11927) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)

2015-05-15 Thread Apekshit Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apekshit Sharma updated HBASE-11927:

Issue Type: Improvement  (was: Bug)

 Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to 
 CRC32C)
 

 Key: HBASE-11927
 URL: https://issues.apache.org/jira/browse/HBASE-11927
 Project: HBase
  Issue Type: Improvement
Reporter: stack
Assignee: Apekshit Sharma
 Fix For: 2.0.0, 1.2.0

 Attachments: HBASE-11927-v1.patch, HBASE-11927-v2.patch, 
 HBASE-11927-v4.patch, HBASE-11927-v5.patch, HBASE-11927-v6.patch, 
 HBASE-11927-v7.patch, HBASE-11927-v8.patch, HBASE-11927-v8.patch, 
 HBASE-11927.patch, after-compact-2%.svg, after-randomWrite1M-0.5%.svg, 
 before-compact-22%.svg, before-randomWrite1M-5%.svg, c2021.crc2.svg, 
 c2021.write.2.svg, c2021.zip.svg, crc32ct.svg


 Up in hadoop they have this change. Let me publish some graphs to show that 
 it makes a difference (CRC is a massive amount of our CPU usage in my 
 profiling of an upload because of compacting, flushing, etc.).  We should 
 also make use of native CRCings -- especially the 2.6 HDFS-6865 and ilk -- in 
 hbase but that is another issue for now.





[jira] [Updated] (HBASE-11927) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)

2015-05-15 Thread Apekshit Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apekshit Sharma updated HBASE-11927:

Component/s: Performance

 Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to 
 CRC32C)
 

 Key: HBASE-11927
 URL: https://issues.apache.org/jira/browse/HBASE-11927
 Project: HBase
  Issue Type: Improvement
  Components: Performance
Reporter: stack
Assignee: Apekshit Sharma
 Fix For: 2.0.0, 1.2.0

 Attachments: HBASE-11927-v1.patch, HBASE-11927-v2.patch, 
 HBASE-11927-v4.patch, HBASE-11927-v5.patch, HBASE-11927-v6.patch, 
 HBASE-11927-v7.patch, HBASE-11927-v8.patch, HBASE-11927-v8.patch, 
 HBASE-11927.patch, after-compact-2%.svg, after-randomWrite1M-0.5%.svg, 
 before-compact-22%.svg, before-randomWrite1M-5%.svg, c2021.crc2.svg, 
 c2021.write.2.svg, c2021.zip.svg, crc32ct.svg


 Up in hadoop they have this change. Let me publish some graphs to show that 
 it makes a difference (CRC is a massive amount of our CPU usage in my 
 profiling of an upload because of compacting, flushing, etc.).  We should 
 also make use of native CRCings -- especially the 2.6 HDFS-6865 and ilk -- in 
 hbase but that is another issue for now.





[jira] [Updated] (HBASE-11927) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)

2015-05-15 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-11927:
--
   Resolution: Fixed
Fix Version/s: 1.2.0, 2.0.0
 Hadoop Flags: Reviewed
       Status: Resolved  (was: Patch Available)

Pushed to branch-1 and master. Thanks for the nice patch [~appy]

 Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to 
 CRC32C)
 

 Key: HBASE-11927
 URL: https://issues.apache.org/jira/browse/HBASE-11927
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: Apekshit Sharma
 Fix For: 2.0.0, 1.2.0

 Attachments: HBASE-11927-v1.patch, HBASE-11927-v2.patch, 
 HBASE-11927-v4.patch, HBASE-11927-v5.patch, HBASE-11927-v6.patch, 
 HBASE-11927-v7.patch, HBASE-11927-v8.patch, HBASE-11927-v8.patch, 
 HBASE-11927.patch, after-compact-2%.svg, after-randomWrite1M-0.5%.svg, 
 before-compact-22%.svg, before-randomWrite1M-5%.svg, c2021.crc2.svg, 
 c2021.write.2.svg, c2021.zip.svg, crc32ct.svg


 Up in hadoop they have this change. Let me publish some graphs to show that 
 it makes a difference (CRC is a massive amount of our CPU usage in my 
 profiling of an upload because of compacting, flushing, etc.).  We should 
 also make use of native CRCings -- especially the 2.6 HDFS-6865 and ilk -- in 
 hbase but that is another issue for now.





[jira] [Updated] (HBASE-11927) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)

2015-05-15 Thread Apekshit Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apekshit Sharma updated HBASE-11927:

Release Note: 
Checksumming is CPU intensive. HBase computes additional checksums for HFiles 
(HDFS does checksums too) and stores them inline with file data. During 
reading, these checksums are verified to ensure data is not corrupted. This 
patch tries to use the Hadoop Native Library (NHL) for checksum computation if 
it’s available, and otherwise falls back to standard Java libraries. 
Instructions to load NHL in HBase can be found here 
(http://hbase.apache.org/book.html#hadoop.native.lib).

The default checksum algorithm has been changed from CRC32 to CRC32C, primarily 
for two reasons: 1) CRC32C has better error detection properties, and 2) 
new Intel processors have a dedicated instruction for CRC32C computation 
(SSE4.2 instruction set)*. This change is fully backward compatible. Also, 
users should not see any differences except a decrease in CPU usage. To keep 
the old setting, set configuration ‘hbase.hstore.checksum.algorithm’ to ‘CRC32’.

* On Linux, run ‘cat /proc/cpuinfo’ and look for sse4_2 in the list of flags to 
see if your processor supports SSE4.2.
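
To make the configuration switch described in the note concrete, here is a minimal sketch of selecting a checksum algorithm by name with CRC32C as the default. It is not HBase's code: java.util.Properties stands in for the Hadoop Configuration object, the class and method names are hypothetical, and only the two algorithm names mentioned in the note are handled. It uses the JDK's own java.util.zip.CRC32 and java.util.zip.CRC32C (the latter needs Java 9 or later), not the Hadoop Native Library.

```java
import java.nio.charset.StandardCharsets;
import java.util.Properties;
import java.util.zip.CRC32;
import java.util.zip.CRC32C; // available since Java 9
import java.util.zip.Checksum;

// Hypothetical illustration; names other than the config key are assumptions.
final class ChecksumAlgorithmSketch {
    // Configuration key quoted in the release note.
    static final String KEY = "hbase.hstore.checksum.algorithm";
    // New default described in the release note.
    static final String DEFAULT = "CRC32C";

    // Picks a JDK checksum implementation from the configured name.
    static Checksum forConfig(Properties conf) {
        String name = conf.getProperty(KEY, DEFAULT);
        switch (name) {
            case "CRC32":
                return new CRC32();
            case "CRC32C":
                return new CRC32C();
            default:
                throw new IllegalArgumentException("Unsupported checksum algorithm: " + name);
        }
    }

    // Feeds ASCII text through the given checksum and returns its value.
    static long checksum(Checksum c, String text) {
        byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
        c.update(bytes, 0, bytes.length);
        return c.getValue();
    }

    public static void main(String[] args) {
        Properties defaults = new Properties();          // key unset: CRC32C
        Properties legacy = new Properties();
        legacy.setProperty(KEY, "CRC32");                // keep the old setting

        // "123456789" is the conventional CRC check string.
        System.out.printf("default: %08X%n", checksum(forConfig(defaults), "123456789")); // E3069283
        System.out.printf("legacy:  %08X%n", checksum(forConfig(legacy), "123456789"));   // CBF43926
    }
}
```

The two algorithms use different polynomials, so they produce different values for the same bytes; a reader has to know which algorithm wrote a checksum in order to verify it.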

  was:
Checksumming is cpu intensive. HBase computes additional checksums for HFiles 
(hdfs does checksums too) and stores them inline with file data. During 
reading, these checksums are verified to ensure data is not corrupted. This 
patch tries to use Hadoop Native Library for checksum computation, if it’s 
available, otherwise falls back to standard Java libraries. Instructions to 
load NHL in HBase can be found here 
(http://hbase.apache.org/book.html#hadoop.native.lib).

Default checksum algorithm has been changed from CRC32 to CRC32C primarily 
because of two reasons: 1) CRC32C has better error detection properties, and 2) 
New Intel processors have a dedicated instruction for crc32c computation 
(SSE4.2 instruction set)*. This changes is fully backward compatible. Also, 
users should not see any differences except decrease in cpu usage. To keep old 
settings, set configuration ‘hbase.hstore.checksum.algorithm’ to ‘CRC32’.

* On linux, run 'cat /proc/cpuinfo’ and look for sse4_2 in list of flags to see 
if your processor supports SSE4.2.


 Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to 
 CRC32C)
 

 Key: HBASE-11927
 URL: https://issues.apache.org/jira/browse/HBASE-11927
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: Apekshit Sharma
 Attachments: HBASE-11927-v1.patch, HBASE-11927-v2.patch, 
 HBASE-11927-v4.patch, HBASE-11927-v5.patch, HBASE-11927-v6.patch, 
 HBASE-11927-v7.patch, HBASE-11927-v8.patch, HBASE-11927-v8.patch, 
 HBASE-11927.patch, after-compact-2%.svg, after-randomWrite1M-0.5%.svg, 
 before-compact-22%.svg, before-randomWrite1M-5%.svg, c2021.crc2.svg, 
 c2021.write.2.svg, c2021.zip.svg, crc32ct.svg


 Up in hadoop they have this change. Let me publish some graphs to show that 
 it makes a difference (CRC is a massive amount of our CPU usage in my 
 profiling of an upload because of compacting, flushing, etc.).  We should 
 also make use of native CRCings -- especially the 2.6 HDFS-6865 and ilk -- in 
 hbase but that is another issue for now.





[jira] [Updated] (HBASE-11927) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)

2015-05-14 Thread Apekshit Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apekshit Sharma updated HBASE-11927:

Status: Open  (was: Patch Available)

 Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to 
 CRC32C)
 

 Key: HBASE-11927
 URL: https://issues.apache.org/jira/browse/HBASE-11927
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: Apekshit Sharma
 Attachments: HBASE-11927-v1.patch, HBASE-11927-v2.patch, 
 HBASE-11927-v4.patch, HBASE-11927-v5.patch, HBASE-11927.patch, 
 after-compact-2%.svg, after-randomWrite1M-0.5%.svg, before-compact-22%.svg, 
 before-randomWrite1M-5%.svg, c2021.crc2.svg, c2021.write.2.svg, 
 c2021.zip.svg, crc32ct.svg


 Up in hadoop they have this change. Let me publish some graphs to show that 
 it makes a difference (CRC is a massive amount of our CPU usage in my 
 profiling of an upload because of compacting, flushing, etc.).  We should 
 also make use of native CRCings -- especially the 2.6 HDFS-6865 and ilk -- in 
 hbase but that is another issue for now.





[jira] [Updated] (HBASE-11927) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)

2015-05-14 Thread Apekshit Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apekshit Sharma updated HBASE-11927:

Status: Patch Available  (was: Open)

 Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to 
 CRC32C)
 

 Key: HBASE-11927
 URL: https://issues.apache.org/jira/browse/HBASE-11927
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: Apekshit Sharma
 Attachments: HBASE-11927-v1.patch, HBASE-11927-v2.patch, 
 HBASE-11927-v4.patch, HBASE-11927-v5.patch, HBASE-11927.patch, 
 after-compact-2%.svg, after-randomWrite1M-0.5%.svg, before-compact-22%.svg, 
 before-randomWrite1M-5%.svg, c2021.crc2.svg, c2021.write.2.svg, 
 c2021.zip.svg, crc32ct.svg


 Up in hadoop they have this change. Let me publish some graphs to show that 
 it makes a difference (CRC is a massive amount of our CPU usage in my 
 profiling of an upload because of compacting, flushing, etc.).  We should 
 also make use of native CRCings -- especially the 2.6 HDFS-6865 and ilk -- in 
 hbase but that is another issue for now.





[jira] [Updated] (HBASE-11927) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)

2015-05-14 Thread Apekshit Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apekshit Sharma updated HBASE-11927:

Attachment: HBASE-11927-v5.patch

v5 patch:
- Moved DEFAULT_CHECKSUM_TYPE to ChecksumType.
- Added a function to get the DataChecksum type for a ChecksumType.
- Added one more test.
- Created a JIRA for ChecksumType cleanup: HBASE-13692.

 Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to 
 CRC32C)
 

 Key: HBASE-11927
 URL: https://issues.apache.org/jira/browse/HBASE-11927
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: Apekshit Sharma
 Attachments: HBASE-11927-v1.patch, HBASE-11927-v2.patch, 
 HBASE-11927-v4.patch, HBASE-11927-v5.patch, HBASE-11927.patch, 
 after-compact-2%.svg, after-randomWrite1M-0.5%.svg, before-compact-22%.svg, 
 before-randomWrite1M-5%.svg, c2021.crc2.svg, c2021.write.2.svg, 
 c2021.zip.svg, crc32ct.svg


 Up in hadoop they have this change. Let me publish some graphs to show that 
 it makes a difference (CRC is a massive amount of our CPU usage in my 
 profiling of an upload because of compacting, flushing, etc.).  We should 
 also make use of native CRCings -- especially the 2.6 HDFS-6865 and ilk -- in 
 hbase but that is another issue for now.





[jira] [Updated] (HBASE-11927) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)

2015-05-14 Thread Apekshit Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apekshit Sharma updated HBASE-11927:

Release Note: 
Checksumming is cpu intensive. HBase computes additional checksums for HFiles 
(hdfs does checksums too) and stores them inline with file data. During 
reading, these checksums are verified to ensure data is not corrupted. This 
patch tries to use Hadoop Native Library for checksum computation, if it’s 
available, otherwise falls back to standard Java libraries. Instructions to 
load NHL in HBase can be found here 
(http://hbase.apache.org/book.html#hadoop.native.lib).
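
The idea of storing a checksum inline with the data and verifying it on read can be sketched as follows. This is an illustration only, assuming a made-up layout of "payload followed by a 4-byte big-endian CRC32C"; it is not the actual HFile block format, the class and method names are hypothetical, and it uses the pure-Java java.util.zip.CRC32C (Java 9+) rather than the native library.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32C; // available since Java 9
import java.util.zip.Checksum;

// Hypothetical illustration; not HBase's on-disk layout.
final class InlineChecksumSketch {
    // Returns the payload followed by a 4-byte big-endian CRC32C of the payload.
    static byte[] writeBlock(byte[] data) {
        Checksum crc = new CRC32C();
        crc.update(data, 0, data.length);
        return ByteBuffer.allocate(data.length + 4)
                .put(data)
                .putInt((int) crc.getValue())
                .array();
    }

    // Recomputes the CRC32C over the payload and compares it to the stored trailer.
    static boolean verifyBlock(byte[] block) {
        if (block.length < 4) {
            return false; // too short to carry a checksum
        }
        int dataLen = block.length - 4;
        Checksum crc = new CRC32C();
        crc.update(block, 0, dataLen);
        int stored = ByteBuffer.wrap(block, dataLen, 4).getInt();
        return stored == (int) crc.getValue();
    }

    public static void main(String[] args) {
        byte[] block = writeBlock("row1/cf:q=value".getBytes(StandardCharsets.UTF_8));
        System.out.println("intact block verifies: " + verifyBlock(block));
        block[0] ^= 0x01; // flip one bit to simulate corruption
        System.out.println("corrupted block verifies: " + verifyBlock(block));
    }
}
```

Every read pays for a full pass over the payload, which is why the cost of the checksum routine shows up so prominently in CPU profiles.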

Default checksum algorithm has been changed from CRC32 to CRC32C primarily 
because of two reasons: 1) CRC32C has better error detection properties, and 2) 
New Intel processors have a dedicated instruction for crc32c computation 
(SSE4.2 instruction set)*. This changes is fully backward compatible. Also, 
users should not see any differences except decrease in cpu usage. To keep old 
settings, set configuration ‘hbase.hstore.checksum.algorithm’ to ‘CRC32’.

* On linux, run 'cat /proc/cpuinfo’ and look for sse4_2 in list of flags to see 
if your processor supports SSE4.2.
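
The /proc/cpuinfo check from the footnote can also be done programmatically. The sketch below is a hypothetical helper, not part of HBase or Hadoop: it scans the "flags" lines of /proc/cpuinfo for an exact flag name and assumes the usual Linux layout of "flags : flag1 flag2 ...".

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only.
final class CpuFlagCheck {
    // Returns true if any "flags" line lists the given flag as a whole word.
    static boolean hasFlag(List<String> cpuinfoLines, String flag) {
        for (String line : cpuinfoLines) {
            if (!line.startsWith("flags")) {
                continue;
            }
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue;
            }
            String[] flags = line.substring(colon + 1).trim().split("\\s+");
            if (Arrays.asList(flags).contains(flag)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        Path cpuinfo = Paths.get("/proc/cpuinfo");
        if (!Files.isReadable(cpuinfo)) {
            System.out.println("/proc/cpuinfo is not readable (probably not Linux)");
            return;
        }
        System.out.println("sse4_2 supported: " + hasFlag(Files.readAllLines(cpuinfo), "sse4_2"));
    }
}
```

Matching whole words matters here because flag names share prefixes, for example sse4_1 and sse4_2.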

 Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to 
 CRC32C)
 

 Key: HBASE-11927
 URL: https://issues.apache.org/jira/browse/HBASE-11927
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: Apekshit Sharma
 Attachments: HBASE-11927-v1.patch, HBASE-11927-v2.patch, 
 HBASE-11927-v4.patch, HBASE-11927-v5.patch, HBASE-11927.patch, 
 after-compact-2%.svg, after-randomWrite1M-0.5%.svg, before-compact-22%.svg, 
 before-randomWrite1M-5%.svg, c2021.crc2.svg, c2021.write.2.svg, 
 c2021.zip.svg, crc32ct.svg


 Up in hadoop they have this change. Let me publish some graphs to show that 
 it makes a difference (CRC is a massive amount of our CPU usage in my 
 profiling of an upload because of compacting, flushing, etc.).  We should 
 also make use of native CRCings -- especially the 2.6 HDFS-6865 and ilk -- in 
 hbase but that is another issue for now.





[jira] [Updated] (HBASE-11927) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)

2015-05-14 Thread Apekshit Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apekshit Sharma updated HBASE-11927:

Status: Patch Available  (was: Open)

 Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to 
 CRC32C)
 

 Key: HBASE-11927
 URL: https://issues.apache.org/jira/browse/HBASE-11927
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: Apekshit Sharma
 Attachments: HBASE-11927-v1.patch, HBASE-11927-v2.patch, 
 HBASE-11927-v4.patch, HBASE-11927-v5.patch, HBASE-11927-v6.patch, 
 HBASE-11927-v7.patch, HBASE-11927.patch, after-compact-2%.svg, 
 after-randomWrite1M-0.5%.svg, before-compact-22%.svg, 
 before-randomWrite1M-5%.svg, c2021.crc2.svg, c2021.write.2.svg, 
 c2021.zip.svg, crc32ct.svg


 Up in hadoop they have this change. Let me publish some graphs to show that 
 it makes a difference (CRC is a massive amount of our CPU usage in my 
 profiling of an upload because of compacting, flushing, etc.).  We should 
 also make use of native CRCings -- especially the 2.6 HDFS-6865 and ilk -- in 
 hbase but that is another issue for now.





[jira] [Updated] (HBASE-11927) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)

2015-05-14 Thread Apekshit Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apekshit Sharma updated HBASE-11927:

Status: Open  (was: Patch Available)

 Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to 
 CRC32C)
 

 Key: HBASE-11927
 URL: https://issues.apache.org/jira/browse/HBASE-11927
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: Apekshit Sharma
 Attachments: HBASE-11927-v1.patch, HBASE-11927-v2.patch, 
 HBASE-11927-v4.patch, HBASE-11927-v5.patch, HBASE-11927.patch, 
 after-compact-2%.svg, after-randomWrite1M-0.5%.svg, before-compact-22%.svg, 
 before-randomWrite1M-5%.svg, c2021.crc2.svg, c2021.write.2.svg, 
 c2021.zip.svg, crc32ct.svg


 Up in hadoop they have this change. Let me publish some graphs to show that 
 it makes a difference (CRC is a massive amount of our CPU usage in my 
 profiling of an upload because of compacting, flushing, etc.).  We should 
 also make use of native CRCings -- especially the 2.6 HDFS-6865 and ilk -- in 
 hbase but that is another issue for now.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-11927) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)

2015-05-14 Thread Apekshit Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apekshit Sharma updated HBASE-11927:

Attachment: HBASE-11927-v6.patch

 Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to 
 CRC32C)
 

 Key: HBASE-11927
 URL: https://issues.apache.org/jira/browse/HBASE-11927
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: Apekshit Sharma
 Attachments: HBASE-11927-v1.patch, HBASE-11927-v2.patch, 
 HBASE-11927-v4.patch, HBASE-11927-v5.patch, HBASE-11927-v6.patch, 
 HBASE-11927.patch, after-compact-2%.svg, after-randomWrite1M-0.5%.svg, 
 before-compact-22%.svg, before-randomWrite1M-5%.svg, c2021.crc2.svg, 
 c2021.write.2.svg, c2021.zip.svg, crc32ct.svg


 Up in hadoop they have this change. Let me publish some graphs to show that 
 it makes a difference (CRC is a massive amount of our CPU usage in my 
 profiling of an upload because of compacting, flushing, etc.).  We should 
 also make use of native CRCings -- especially the 2.6 HDFS-6865 and ilk -- in 
 hbase but that is another issue for now.





[jira] [Updated] (HBASE-11927) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)

2015-05-14 Thread Apekshit Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apekshit Sharma updated HBASE-11927:

Status: Patch Available  (was: Open)

 Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to 
 CRC32C)
 

 Key: HBASE-11927
 URL: https://issues.apache.org/jira/browse/HBASE-11927
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: Apekshit Sharma
 Attachments: HBASE-11927-v1.patch, HBASE-11927-v2.patch, 
 HBASE-11927-v4.patch, HBASE-11927-v5.patch, HBASE-11927-v6.patch, 
 HBASE-11927.patch, after-compact-2%.svg, after-randomWrite1M-0.5%.svg, 
 before-compact-22%.svg, before-randomWrite1M-5%.svg, c2021.crc2.svg, 
 c2021.write.2.svg, c2021.zip.svg, crc32ct.svg


 Up in hadoop they have this change. Let me publish some graphs to show that 
 it makes a difference (CRC is a massive amount of our CPU usage in my 
 profiling of an upload because of compacting, flushing, etc.).  We should 
 also make use of native CRCings -- especially the 2.6 HDFS-6865 and ilk -- in 
 hbase but that is another issue for now.





[jira] [Updated] (HBASE-11927) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)

2015-05-14 Thread Apekshit Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apekshit Sharma updated HBASE-11927:

Attachment: HBASE-11927-v7.patch

Fixing hudson issues.
Thanks [~stack].

 Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to 
 CRC32C)
 

 Key: HBASE-11927
 URL: https://issues.apache.org/jira/browse/HBASE-11927
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: Apekshit Sharma
 Attachments: HBASE-11927-v1.patch, HBASE-11927-v2.patch, 
 HBASE-11927-v4.patch, HBASE-11927-v5.patch, HBASE-11927-v6.patch, 
 HBASE-11927-v7.patch, HBASE-11927.patch, after-compact-2%.svg, 
 after-randomWrite1M-0.5%.svg, before-compact-22%.svg, 
 before-randomWrite1M-5%.svg, c2021.crc2.svg, c2021.write.2.svg, 
 c2021.zip.svg, crc32ct.svg


 Up in hadoop they have this change. Let me publish some graphs to show that 
 it makes a difference (CRC is a massive amount of our CPU usage in my 
 profiling of an upload because of compacting, flushing, etc.).  We should 
 also make use of native CRCings -- especially the 2.6 HDFS-6865 and ilk -- in 
 hbase but that is another issue for now.





[jira] [Updated] (HBASE-11927) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)

2015-05-14 Thread Apekshit Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apekshit Sharma updated HBASE-11927:

Status: Open  (was: Patch Available)

 Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to 
 CRC32C)
 

 Key: HBASE-11927
 URL: https://issues.apache.org/jira/browse/HBASE-11927
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: Apekshit Sharma
 Attachments: HBASE-11927-v1.patch, HBASE-11927-v2.patch, 
 HBASE-11927-v4.patch, HBASE-11927-v5.patch, HBASE-11927-v6.patch, 
 HBASE-11927-v7.patch, HBASE-11927.patch, after-compact-2%.svg, 
 after-randomWrite1M-0.5%.svg, before-compact-22%.svg, 
 before-randomWrite1M-5%.svg, c2021.crc2.svg, c2021.write.2.svg, 
 c2021.zip.svg, crc32ct.svg


 Up in hadoop they have this change. Let me publish some graphs to show that 
 it makes a difference (CRC is a massive amount of our CPU usage in my 
 profiling of an upload because of compacting, flushing, etc.).  We should 
 also make use of native CRCings -- especially the 2.6 HDFS-6865 and ilk -- in 
 hbase but that is another issue for now.





[jira] [Updated] (HBASE-11927) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)

2015-05-14 Thread Apekshit Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apekshit Sharma updated HBASE-11927:

Status: Patch Available  (was: Open)

 Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to 
 CRC32C)
 

 Key: HBASE-11927
 URL: https://issues.apache.org/jira/browse/HBASE-11927
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: Apekshit Sharma
 Attachments: HBASE-11927-v1.patch, HBASE-11927-v2.patch, 
 HBASE-11927-v4.patch, HBASE-11927-v5.patch, HBASE-11927-v6.patch, 
 HBASE-11927-v7.patch, HBASE-11927-v8.patch, HBASE-11927.patch, 
 after-compact-2%.svg, after-randomWrite1M-0.5%.svg, before-compact-22%.svg, 
 before-randomWrite1M-5%.svg, c2021.crc2.svg, c2021.write.2.svg, 
 c2021.zip.svg, crc32ct.svg







[jira] [Updated] (HBASE-11927) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)

2015-05-14 Thread Apekshit Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apekshit Sharma updated HBASE-11927:

Status: Open  (was: Patch Available)

 Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to 
 CRC32C)
 

 Key: HBASE-11927
 URL: https://issues.apache.org/jira/browse/HBASE-11927
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: Apekshit Sharma
 Attachments: HBASE-11927-v1.patch, HBASE-11927-v2.patch, 
 HBASE-11927-v4.patch, HBASE-11927-v5.patch, HBASE-11927-v6.patch, 
 HBASE-11927-v7.patch, HBASE-11927.patch, after-compact-2%.svg, 
 after-randomWrite1M-0.5%.svg, before-compact-22%.svg, 
 before-randomWrite1M-5%.svg, c2021.crc2.svg, c2021.write.2.svg, 
 c2021.zip.svg, crc32ct.svg







[jira] [Updated] (HBASE-11927) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)

2015-05-14 Thread Apekshit Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apekshit Sharma updated HBASE-11927:

Status: Patch Available  (was: Open)

 Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to 
 CRC32C)
 

 Key: HBASE-11927
 URL: https://issues.apache.org/jira/browse/HBASE-11927
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: Apekshit Sharma
 Attachments: HBASE-11927-v1.patch, HBASE-11927-v2.patch, 
 HBASE-11927-v4.patch, HBASE-11927-v5.patch, HBASE-11927-v6.patch, 
 HBASE-11927-v7.patch, HBASE-11927.patch, after-compact-2%.svg, 
 after-randomWrite1M-0.5%.svg, before-compact-22%.svg, 
 before-randomWrite1M-5%.svg, c2021.crc2.svg, c2021.write.2.svg, 
 c2021.zip.svg, crc32ct.svg







[jira] [Updated] (HBASE-11927) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)

2015-05-14 Thread Apekshit Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apekshit Sharma updated HBASE-11927:

Attachment: HBASE-11927-v8.patch

Fixes TestHFileBlock, which hard-coded the checksum code.

 Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to 
 CRC32C)
 

 Key: HBASE-11927
 URL: https://issues.apache.org/jira/browse/HBASE-11927
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: Apekshit Sharma
 Attachments: HBASE-11927-v1.patch, HBASE-11927-v2.patch, 
 HBASE-11927-v4.patch, HBASE-11927-v5.patch, HBASE-11927-v6.patch, 
 HBASE-11927-v7.patch, HBASE-11927-v8.patch, HBASE-11927.patch, 
 after-compact-2%.svg, after-randomWrite1M-0.5%.svg, before-compact-22%.svg, 
 before-randomWrite1M-5%.svg, c2021.crc2.svg, c2021.write.2.svg, 
 c2021.zip.svg, crc32ct.svg







[jira] [Updated] (HBASE-11927) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)

2015-05-14 Thread Apekshit Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apekshit Sharma updated HBASE-11927:

Status: Open  (was: Patch Available)

 Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to 
 CRC32C)
 

 Key: HBASE-11927
 URL: https://issues.apache.org/jira/browse/HBASE-11927
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: Apekshit Sharma
 Attachments: HBASE-11927-v1.patch, HBASE-11927-v2.patch, 
 HBASE-11927-v4.patch, HBASE-11927-v5.patch, HBASE-11927-v6.patch, 
 HBASE-11927-v7.patch, HBASE-11927-v8.patch, HBASE-11927.patch, 
 after-compact-2%.svg, after-randomWrite1M-0.5%.svg, before-compact-22%.svg, 
 before-randomWrite1M-5%.svg, c2021.crc2.svg, c2021.write.2.svg, 
 c2021.zip.svg, crc32ct.svg







[jira] [Updated] (HBASE-11927) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)

2015-05-12 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-11927:
--
Summary: Use Native Hadoop Library for HFile checksum (And flip default 
from CRC32 to CRC32C)  (was: Use Native Hadoop Library for HFile checksum)

 Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to 
 CRC32C)
 

 Key: HBASE-11927
 URL: https://issues.apache.org/jira/browse/HBASE-11927
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: Apekshit Sharma
 Attachments: HBASE-11927-v1.patch, HBASE-11927-v2.patch, 
 HBASE-11927.patch, c2021.crc2.svg, c2021.write.2.svg, c2021.zip.svg, 
 compact-with-native.svg, compact-without-native.svg, crc32ct.svg







[jira] [Updated] (HBASE-11927) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)

2015-05-12 Thread Apekshit Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apekshit Sharma updated HBASE-11927:

Attachment: after-randomWrite1M-0.5%.svg
before-randomWrite1M-5%.svg
before-compact-22%.svg
after-compact-2%.svg
HBASE-11927-v4.patch

- Removed duplicated defaults; there is now a single one in HConstants.
- Testing: added a test and removed an obsolete test.
- Cleaned up dead code in ChecksumUtil.

- Attaching flame graphs.
TODO: write release note.
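
For readers unfamiliar with the two algorithms named in the summary, here is a minimal sketch that uses only the JDK's java.util.zip classes. It is not the Hadoop native library and not the code in the attached patches. It only illustrates that CRC32 and CRC32C are different polynomials and yield different values for the same bytes, which is why flipping the default is a visible change. The class name ChecksumSketch and its helper methods are hypothetical, and java.util.zip.CRC32C assumes Java 9 or later.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.CRC32C;
import java.util.zip.Checksum;

// Hypothetical illustration class; not part of HBase or Hadoop.
final class ChecksumSketch {

    // Pick a JDK Checksum implementation by algorithm name.
    static Checksum newChecksum(String algorithm) {
        switch (algorithm) {
            case "CRC32":
                return new CRC32();   // IEEE 802.3 polynomial
            case "CRC32C":
                return new CRC32C();  // Castagnoli polynomial (Java 9+)
            default:
                throw new IllegalArgumentException("Unknown algorithm: " + algorithm);
        }
    }

    // Compute the checksum of the whole byte array with the named algorithm.
    static long checksum(String algorithm, byte[] data) {
        Checksum c = newChecksum(algorithm);
        c.update(data, 0, data.length);
        return c.getValue();
    }

    public static void main(String[] args) {
        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
        System.out.printf("CRC32  = 0x%08X%n", checksum("CRC32", data));   // 0xCBF43926
        System.out.printf("CRC32C = 0x%08X%n", checksum("CRC32C", data));  // 0xE3069283
    }
}
```

The values in the comments are the standard check values of each algorithm for the ASCII string "123456789".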
 

 Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to 
 CRC32C)
 

 Key: HBASE-11927
 URL: https://issues.apache.org/jira/browse/HBASE-11927
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: Apekshit Sharma
 Attachments: HBASE-11927-v1.patch, HBASE-11927-v2.patch, 
 HBASE-11927-v4.patch, HBASE-11927.patch, after-compact-2%.svg, 
 after-randomWrite1M-0.5%.svg, before-compact-22%.svg, 
 before-randomWrite1M-5%.svg, c2021.crc2.svg, c2021.write.2.svg, 
 c2021.zip.svg, compact-with-native.svg, compact-without-native.svg, 
 crc32ct.svg







[jira] [Updated] (HBASE-11927) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)

2015-05-12 Thread Apekshit Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apekshit Sharma updated HBASE-11927:

Attachment: (was: compact-with-native.svg)

 Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to 
 CRC32C)
 

 Key: HBASE-11927
 URL: https://issues.apache.org/jira/browse/HBASE-11927
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: Apekshit Sharma
 Attachments: HBASE-11927-v1.patch, HBASE-11927-v2.patch, 
 HBASE-11927-v4.patch, HBASE-11927.patch, after-compact-2%.svg, 
 after-randomWrite1M-0.5%.svg, before-compact-22%.svg, 
 before-randomWrite1M-5%.svg, c2021.crc2.svg, c2021.write.2.svg, 
 c2021.zip.svg, crc32ct.svg







[jira] [Updated] (HBASE-11927) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)

2015-05-12 Thread Apekshit Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apekshit Sharma updated HBASE-11927:

Attachment: (was: compact-without-native.svg)

 Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to 
 CRC32C)
 

 Key: HBASE-11927
 URL: https://issues.apache.org/jira/browse/HBASE-11927
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: Apekshit Sharma
 Attachments: HBASE-11927-v1.patch, HBASE-11927-v2.patch, 
 HBASE-11927-v4.patch, HBASE-11927.patch, after-compact-2%.svg, 
 after-randomWrite1M-0.5%.svg, before-compact-22%.svg, 
 before-randomWrite1M-5%.svg, c2021.crc2.svg, c2021.write.2.svg, 
 c2021.zip.svg, crc32ct.svg




