[jira] [Updated] (HBASE-11927) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)

2016-02-22 Thread Sean Busbey (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-11927:

Fix Version/s: (was: 1.1.4)

> Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to 
> CRC32C)
> 
>
> Key: HBASE-11927
> URL: https://issues.apache.org/jira/browse/HBASE-11927
> Project: HBase
> Issue Type: Improvement
> Components: Performance
> Reporter: stack
> Assignee: Appy
> Fix For: 2.0.0, 1.2.0
>
> Attachments: HBASE-11927-branch-1.1.patch, HBASE-11927-v1.patch, 
> HBASE-11927-v2.patch, HBASE-11927-v4.patch, HBASE-11927-v5.patch, 
> HBASE-11927-v6.patch, HBASE-11927-v7.patch, HBASE-11927-v8.patch, 
> HBASE-11927-v8.patch, HBASE-11927.patch, after-compact-2%.svg, 
> after-randomWrite1M-0.5%.svg, before-compact-22%.svg, 
> before-randomWrite1M-5%.svg, c2021.crc2.svg, c2021.write.2.svg, 
> c2021.zip.svg, crc32ct.svg
>
>
> Up in hadoop they have this change. Let me publish some graphs to show that 
> it makes a difference (CRC is a massive amount of our CPU usage in my 
> profiling of an upload because of compacting, flushing, etc.).  We should 
> also make use of native CRCings -- especially the 2.6 HDFS-6865 and ilk -- in 
> hbase but that is another issue for now.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HBASE-11927) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)

2016-02-22 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-11927:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Re-resolving. I'd say it's time for a new JIRA if you fellas are pulling it back.

> Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to 
> CRC32C)
> 
>
> Key: HBASE-11927
> URL: https://issues.apache.org/jira/browse/HBASE-11927
> Project: HBase
> Issue Type: Improvement
> Components: Performance
> Reporter: stack
> Assignee: Appy
> Fix For: 2.0.0, 1.1.4, 1.2.0
>
> Attachments: HBASE-11927-branch-1.1.patch, HBASE-11927-v1.patch, 
> HBASE-11927-v2.patch, HBASE-11927-v4.patch, HBASE-11927-v5.patch, 
> HBASE-11927-v6.patch, HBASE-11927-v7.patch, HBASE-11927-v8.patch, 
> HBASE-11927-v8.patch, HBASE-11927.patch, after-compact-2%.svg, 
> after-randomWrite1M-0.5%.svg, before-compact-22%.svg, 
> before-randomWrite1M-5%.svg, c2021.crc2.svg, c2021.write.2.svg, 
> c2021.zip.svg, crc32ct.svg

[jira] [Updated] (HBASE-11927) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)

2016-02-11 Thread Nick Dimiduk (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HBASE-11927:
-
Attachment: HBASE-11927-branch-1.1.patch

This looks like a good one to bring back to branch-1.1. Here's Appy's patch but 
with the default held at CRC32. A review is appreciated.

> Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to 
> CRC32C)
> 
>
> Key: HBASE-11927
> URL: https://issues.apache.org/jira/browse/HBASE-11927
> Project: HBase
> Issue Type: Improvement
> Components: Performance
> Reporter: stack
> Assignee: Appy
> Fix For: 2.0.0, 1.2.0
>
> Attachments: HBASE-11927-branch-1.1.patch, HBASE-11927-v1.patch, 
> HBASE-11927-v2.patch, HBASE-11927-v4.patch, HBASE-11927-v5.patch, 
> HBASE-11927-v6.patch, HBASE-11927-v7.patch, HBASE-11927-v8.patch, 
> HBASE-11927-v8.patch, HBASE-11927.patch, after-compact-2%.svg, 
> after-randomWrite1M-0.5%.svg, before-compact-22%.svg, 
> before-randomWrite1M-5%.svg, c2021.crc2.svg, c2021.write.2.svg, 
> c2021.zip.svg, crc32ct.svg

[jira] [Updated] (HBASE-11927) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)

2016-02-11 Thread Nick Dimiduk (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HBASE-11927:
-
Fix Version/s: 1.1.4

> Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to 
> CRC32C)
> 
>
> Key: HBASE-11927
> URL: https://issues.apache.org/jira/browse/HBASE-11927
> Project: HBase
> Issue Type: Improvement
> Components: Performance
> Reporter: stack
> Assignee: Appy
> Fix For: 2.0.0, 1.2.0, 1.1.4
>
> Attachments: HBASE-11927-branch-1.1.patch, HBASE-11927-v1.patch, 
> HBASE-11927-v2.patch, HBASE-11927-v4.patch, HBASE-11927-v5.patch, 
> HBASE-11927-v6.patch, HBASE-11927-v7.patch, HBASE-11927-v8.patch, 
> HBASE-11927-v8.patch, HBASE-11927.patch, after-compact-2%.svg, 
> after-randomWrite1M-0.5%.svg, before-compact-22%.svg, 
> before-randomWrite1M-5%.svg, c2021.crc2.svg, c2021.write.2.svg, 
> c2021.zip.svg, crc32ct.svg

[jira] [Updated] (HBASE-11927) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)

2016-02-11 Thread Nick Dimiduk (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HBASE-11927:
-
Status: Patch Available  (was: Reopened)

> Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to 
> CRC32C)
> 
>
> Key: HBASE-11927
> URL: https://issues.apache.org/jira/browse/HBASE-11927
> Project: HBase
> Issue Type: Improvement
> Components: Performance
> Reporter: stack
> Assignee: Appy
> Fix For: 2.0.0, 1.2.0, 1.1.4
>
> Attachments: HBASE-11927-branch-1.1.patch, HBASE-11927-v1.patch, 
> HBASE-11927-v2.patch, HBASE-11927-v4.patch, HBASE-11927-v5.patch, 
> HBASE-11927-v6.patch, HBASE-11927-v7.patch, HBASE-11927-v8.patch, 
> HBASE-11927-v8.patch, HBASE-11927.patch, after-compact-2%.svg, 
> after-randomWrite1M-0.5%.svg, before-compact-22%.svg, 
> before-randomWrite1M-5%.svg, c2021.crc2.svg, c2021.write.2.svg, 
> c2021.zip.svg, crc32ct.svg

[jira] [Updated] (HBASE-11927) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)

2015-05-15 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-11927:
--
Attachment: HBASE-11927-v8.patch

 Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to 
 CRC32C)
 

 Key: HBASE-11927
 URL: https://issues.apache.org/jira/browse/HBASE-11927
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: Apekshit Sharma
 Attachments: HBASE-11927-v1.patch, HBASE-11927-v2.patch, 
 HBASE-11927-v4.patch, HBASE-11927-v5.patch, HBASE-11927-v6.patch, 
 HBASE-11927-v7.patch, HBASE-11927-v8.patch, HBASE-11927-v8.patch, 
 HBASE-11927.patch, after-compact-2%.svg, after-randomWrite1M-0.5%.svg, 
 before-compact-22%.svg, before-randomWrite1M-5%.svg, c2021.crc2.svg, 
 c2021.write.2.svg, c2021.zip.svg, crc32ct.svg



[jira] [Updated] (HBASE-11927) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)

2015-05-15 Thread Apekshit Sharma (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apekshit Sharma updated HBASE-11927:

Issue Type: Improvement  (was: Bug)

 Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to 
 CRC32C)
 

 Key: HBASE-11927
 URL: https://issues.apache.org/jira/browse/HBASE-11927
 Project: HBase
  Issue Type: Improvement
Reporter: stack
Assignee: Apekshit Sharma
 Fix For: 2.0.0, 1.2.0

 Attachments: HBASE-11927-v1.patch, HBASE-11927-v2.patch, 
 HBASE-11927-v4.patch, HBASE-11927-v5.patch, HBASE-11927-v6.patch, 
 HBASE-11927-v7.patch, HBASE-11927-v8.patch, HBASE-11927-v8.patch, 
 HBASE-11927.patch, after-compact-2%.svg, after-randomWrite1M-0.5%.svg, 
 before-compact-22%.svg, before-randomWrite1M-5%.svg, c2021.crc2.svg, 
 c2021.write.2.svg, c2021.zip.svg, crc32ct.svg



[jira] [Updated] (HBASE-11927) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)

2015-05-15 Thread Apekshit Sharma (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apekshit Sharma updated HBASE-11927:

Component/s: Performance

 Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to 
 CRC32C)
 

 Key: HBASE-11927
 URL: https://issues.apache.org/jira/browse/HBASE-11927
 Project: HBase
  Issue Type: Improvement
  Components: Performance
Reporter: stack
Assignee: Apekshit Sharma
 Fix For: 2.0.0, 1.2.0

 Attachments: HBASE-11927-v1.patch, HBASE-11927-v2.patch, 
 HBASE-11927-v4.patch, HBASE-11927-v5.patch, HBASE-11927-v6.patch, 
 HBASE-11927-v7.patch, HBASE-11927-v8.patch, HBASE-11927-v8.patch, 
 HBASE-11927.patch, after-compact-2%.svg, after-randomWrite1M-0.5%.svg, 
 before-compact-22%.svg, before-randomWrite1M-5%.svg, c2021.crc2.svg, 
 c2021.write.2.svg, c2021.zip.svg, crc32ct.svg



[jira] [Updated] (HBASE-11927) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)

2015-05-15 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-11927:
--
Resolution: Fixed
Fix Version/s: 1.2.0, 2.0.0
Hadoop Flags: Reviewed
Status: Resolved  (was: Patch Available)

Pushed to branch-1 and master. Thanks for the nice patch [~appy]

 Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to 
 CRC32C)
 

 Key: HBASE-11927
 URL: https://issues.apache.org/jira/browse/HBASE-11927
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: Apekshit Sharma
 Fix For: 2.0.0, 1.2.0

 Attachments: HBASE-11927-v1.patch, HBASE-11927-v2.patch, 
 HBASE-11927-v4.patch, HBASE-11927-v5.patch, HBASE-11927-v6.patch, 
 HBASE-11927-v7.patch, HBASE-11927-v8.patch, HBASE-11927-v8.patch, 
 HBASE-11927.patch, after-compact-2%.svg, after-randomWrite1M-0.5%.svg, 
 before-compact-22%.svg, before-randomWrite1M-5%.svg, c2021.crc2.svg, 
 c2021.write.2.svg, c2021.zip.svg, crc32ct.svg



[jira] [Updated] (HBASE-11927) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)

2015-05-15 Thread Apekshit Sharma (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Apekshit Sharma updated HBASE-11927:

Release Note:
Checksumming is CPU intensive. HBase computes additional checksums for HFiles
(HDFS does checksums too) and stores them inline with the file data. On read,
these checksums are verified to ensure the data is not corrupted. This patch
uses the Hadoop Native Library (NHL) for checksum computation if it is
available, and otherwise falls back to the standard Java libraries.
Instructions for loading NHL in HBase can be found here
(http://hbase.apache.org/book.html#hadoop.native.lib).

The default checksum algorithm has been changed from CRC32 to CRC32C, primarily
for two reasons: 1) CRC32C has better error detection properties, and 2) newer
Intel processors have a dedicated instruction for CRC32C computation (part of
the SSE4.2 instruction set)*. This change is fully backward compatible. Users
should not see any difference except a decrease in CPU usage. To keep the old
setting, set the configuration 'hbase.hstore.checksum.algorithm' to 'CRC32'.

* On Linux, run 'cat /proc/cpuinfo' and look for sse4_2 in the list of flags to
see if your processor supports SSE4.2.

was:
Checksumming is cpu intensive. HBase computes additional checksums for HFiles
(hdfs does checksums too) and stores them inline with file data. During
reading, these checksums are verified to ensure data is not corrupted. This
patch tries to use Hadoop Native Library for checksum computation, if it’s
available, otherwise falls back to standard Java libraries. Instructions to
load NHL in HBase can be found here
(http://hbase.apache.org/book.html#hadoop.native.lib).

Default checksum algorithm has been changed from CRC32 to CRC32C primarily
because of two reasons: 1) CRC32C has better error detection properties, and 2)
New Intel processors have a dedicated instruction for crc32c computation
(SSE4.2 instruction set)*. This changes is fully backward compatible. Also,
users should not see any differences except decrease in cpu usage. To keep old
settings, set configuration ‘hbase.hstore.checksum.algorithm’ to ‘CRC32’.

* On linux, run 'cat /proc/cpuinfo’ and look for sse4_2 in list of flags to see
if your processor supports SSE4.2.
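
For anyone who wants to try the two points from the release note above, here
is a minimal sketch in plain Java. It is not HBase code: the class
ChecksumSketch and its methods forName, checksum and hasSse42 are hypothetical
names made up for illustration. It relies only on the JDK's java.util.zip
classes (CRC32C assumes Java 9 or later), not on the Hadoop Native Library. It
computes both checksums over the same bytes and scans /proc/cpuinfo-style text
for the sse4_2 flag.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.CRC32;
import java.util.zip.CRC32C;
import java.util.zip.Checksum;

/** Illustrative only; not HBase code. All names here are hypothetical. */
final class ChecksumSketch {

    /** Maps an algorithm name such as "CRC32" or "CRC32C" to a JDK Checksum. */
    static Checksum forName(String algorithm) {
        switch (algorithm) {
            case "CRC32":
                return new CRC32();
            case "CRC32C":
                return new CRC32C(); // java.util.zip.CRC32C needs Java 9 or later
            default:
                throw new IllegalArgumentException(
                        "Unknown checksum algorithm: " + algorithm);
        }
    }

    /** Computes the checksum of the given bytes with the named algorithm. */
    static long checksum(String algorithm, byte[] data) {
        Checksum c = forName(algorithm);
        c.update(data, 0, data.length);
        return c.getValue();
    }

    /** Returns true if /proc/cpuinfo-style text lists sse4_2 on a "flags" line. */
    static boolean hasSse42(String cpuinfo) {
        for (String line : cpuinfo.split("\n")) {
            if (!line.startsWith("flags")) {
                continue;
            }
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue;
            }
            String[] flags = line.substring(colon + 1).trim().split("\\s+");
            if (Arrays.asList(flags).contains("sse4_2")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
        System.out.printf("CRC32  = %08x%n", checksum("CRC32", data));
        System.out.printf("CRC32C = %08x%n", checksum("CRC32C", data));
    }
}
```

The string "123456789" is the conventional CRC check input; for it the program
prints cbf43926 for CRC32 and e3069283 for CRC32C. On Linux, hasSse42 could be
fed the contents of /proc/cpuinfo.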

Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to
CRC32C)

Key: HBASE-11927
URL: https://issues.apache.org/jira/browse/HBASE-11927
Project: HBase
Issue Type: Bug
Reporter: stack
Assignee: Apekshit Sharma
Attachments: HBASE-11927-v1.patch, HBASE-11927-v2.patch,
HBASE-11927-v4.patch, HBASE-11927-v5.patch, HBASE-11927-v6.patch,
HBASE-11927-v7.patch, HBASE-11927-v8.patch, HBASE-11927-v8.patch,
HBASE-11927.patch, after-compact-2%.svg, after-randomWrite1M-0.5%.svg,
before-compact-22%.svg, before-randomWrite1M-5%.svg, c2021.crc2.svg,
c2021.write.2.svg, c2021.zip.svg, crc32ct.svg

