[jira] [Updated] (HBASE-11927) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)
[ https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Busbey updated HBASE-11927: Fix Version/s: (was: 1.1.4) > Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to > CRC32C) > > > Key: HBASE-11927 > URL: https://issues.apache.org/jira/browse/HBASE-11927 > Project: HBase > Issue Type: Improvement > Components: Performance >Reporter: stack >Assignee: Appy > Fix For: 2.0.0, 1.2.0 > > Attachments: HBASE-11927-branch-1.1.patch, HBASE-11927-v1.patch, > HBASE-11927-v2.patch, HBASE-11927-v4.patch, HBASE-11927-v5.patch, > HBASE-11927-v6.patch, HBASE-11927-v7.patch, HBASE-11927-v8.patch, > HBASE-11927-v8.patch, HBASE-11927.patch, after-compact-2%.svg, > after-randomWrite1M-0.5%.svg, before-compact-22%.svg, > before-randomWrite1M-5%.svg, c2021.crc2.svg, c2021.write.2.svg, > c2021.zip.svg, crc32ct.svg > > > Up in hadoop they have this change. Let me publish some graphs to show that > it makes a difference (CRC is a massive amount of our CPU usage in my > profiling of an upload because of compacting, flushing, etc.). We should > also make use of native CRCings -- especially the 2.6 HDFS-6865 and ilk -- in > hbase but that is another issue for now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-11927) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)
[ https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-11927: -- Resolution: Fixed Status: Resolved (was: Patch Available) Re-resolving. I'd say time for new JIRA if you fellas pulling it back. > Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to > CRC32C) > > > Key: HBASE-11927 > URL: https://issues.apache.org/jira/browse/HBASE-11927 > Project: HBase > Issue Type: Improvement > Components: Performance >Reporter: stack >Assignee: Appy > Fix For: 2.0.0, 1.1.4, 1.2.0 > > Attachments: HBASE-11927-branch-1.1.patch, HBASE-11927-v1.patch, > HBASE-11927-v2.patch, HBASE-11927-v4.patch, HBASE-11927-v5.patch, > HBASE-11927-v6.patch, HBASE-11927-v7.patch, HBASE-11927-v8.patch, > HBASE-11927-v8.patch, HBASE-11927.patch, after-compact-2%.svg, > after-randomWrite1M-0.5%.svg, before-compact-22%.svg, > before-randomWrite1M-5%.svg, c2021.crc2.svg, c2021.write.2.svg, > c2021.zip.svg, crc32ct.svg > > > Up in hadoop they have this change. Let me publish some graphs to show that > it makes a difference (CRC is a massive amount of our CPU usage in my > profiling of an upload because of compacting, flushing, etc.). We should > also make use of native CRCings -- especially the 2.6 HDFS-6865 and ilk -- in > hbase but that is another issue for now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-11927) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)
[ https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nick Dimiduk updated HBASE-11927:
    Attachment: HBASE-11927-branch-1.1.patch

This looks like a good one to bring back to branch-1.1. Here's Appy's patch but with the default held at CRC32. A review is appreciated.

> Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)
>
> Key: HBASE-11927
> URL: https://issues.apache.org/jira/browse/HBASE-11927
> Project: HBase
> Issue Type: Improvement
> Components: Performance
> Reporter: stack
> Assignee: Appy
> Fix For: 2.0.0, 1.2.0
>
> Attachments: HBASE-11927-branch-1.1.patch, HBASE-11927-v1.patch, HBASE-11927-v2.patch, HBASE-11927-v4.patch, HBASE-11927-v5.patch, HBASE-11927-v6.patch, HBASE-11927-v7.patch, HBASE-11927-v8.patch, HBASE-11927-v8.patch, HBASE-11927.patch, after-compact-2%.svg, after-randomWrite1M-0.5%.svg, before-compact-22%.svg, before-randomWrite1M-5%.svg, c2021.crc2.svg, c2021.write.2.svg, c2021.zip.svg, crc32ct.svg
>
> Up in hadoop they have this change. Let me publish some graphs to show that it makes a difference (CRC is a massive amount of our CPU usage in my profiling of an upload because of compacting, flushing, etc.). We should also make use of native CRCings -- especially the 2.6 HDFS-6865 and ilk -- in hbase but that is another issue for now.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-11927) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)
[ https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HBASE-11927: - Fix Version/s: 1.1.4 > Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to > CRC32C) > > > Key: HBASE-11927 > URL: https://issues.apache.org/jira/browse/HBASE-11927 > Project: HBase > Issue Type: Improvement > Components: Performance >Reporter: stack >Assignee: Appy > Fix For: 2.0.0, 1.2.0, 1.1.4 > > Attachments: HBASE-11927-branch-1.1.patch, HBASE-11927-v1.patch, > HBASE-11927-v2.patch, HBASE-11927-v4.patch, HBASE-11927-v5.patch, > HBASE-11927-v6.patch, HBASE-11927-v7.patch, HBASE-11927-v8.patch, > HBASE-11927-v8.patch, HBASE-11927.patch, after-compact-2%.svg, > after-randomWrite1M-0.5%.svg, before-compact-22%.svg, > before-randomWrite1M-5%.svg, c2021.crc2.svg, c2021.write.2.svg, > c2021.zip.svg, crc32ct.svg > > > Up in hadoop they have this change. Let me publish some graphs to show that > it makes a difference (CRC is a massive amount of our CPU usage in my > profiling of an upload because of compacting, flushing, etc.). We should > also make use of native CRCings -- especially the 2.6 HDFS-6865 and ilk -- in > hbase but that is another issue for now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-11927) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)
[ https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HBASE-11927: - Status: Patch Available (was: Reopened) > Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to > CRC32C) > > > Key: HBASE-11927 > URL: https://issues.apache.org/jira/browse/HBASE-11927 > Project: HBase > Issue Type: Improvement > Components: Performance >Reporter: stack >Assignee: Appy > Fix For: 2.0.0, 1.2.0, 1.1.4 > > Attachments: HBASE-11927-branch-1.1.patch, HBASE-11927-v1.patch, > HBASE-11927-v2.patch, HBASE-11927-v4.patch, HBASE-11927-v5.patch, > HBASE-11927-v6.patch, HBASE-11927-v7.patch, HBASE-11927-v8.patch, > HBASE-11927-v8.patch, HBASE-11927.patch, after-compact-2%.svg, > after-randomWrite1M-0.5%.svg, before-compact-22%.svg, > before-randomWrite1M-5%.svg, c2021.crc2.svg, c2021.write.2.svg, > c2021.zip.svg, crc32ct.svg > > > Up in hadoop they have this change. Let me publish some graphs to show that > it makes a difference (CRC is a massive amount of our CPU usage in my > profiling of an upload because of compacting, flushing, etc.). We should > also make use of native CRCings -- especially the 2.6 HDFS-6865 and ilk -- in > hbase but that is another issue for now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-11927) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)
[ https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-11927: -- Attachment: HBASE-11927-v8.patch Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C) Key: HBASE-11927 URL: https://issues.apache.org/jira/browse/HBASE-11927 Project: HBase Issue Type: Bug Reporter: stack Assignee: Apekshit Sharma Attachments: HBASE-11927-v1.patch, HBASE-11927-v2.patch, HBASE-11927-v4.patch, HBASE-11927-v5.patch, HBASE-11927-v6.patch, HBASE-11927-v7.patch, HBASE-11927-v8.patch, HBASE-11927-v8.patch, HBASE-11927.patch, after-compact-2%.svg, after-randomWrite1M-0.5%.svg, before-compact-22%.svg, before-randomWrite1M-5%.svg, c2021.crc2.svg, c2021.write.2.svg, c2021.zip.svg, crc32ct.svg Up in hadoop they have this change. Let me publish some graphs to show that it makes a difference (CRC is a massive amount of our CPU usage in my profiling of an upload because of compacting, flushing, etc.). We should also make use of native CRCings -- especially the 2.6 HDFS-6865 and ilk -- in hbase but that is another issue for now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-11927) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)
[ https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apekshit Sharma updated HBASE-11927: Issue Type: Improvement (was: Bug) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C) Key: HBASE-11927 URL: https://issues.apache.org/jira/browse/HBASE-11927 Project: HBase Issue Type: Improvement Reporter: stack Assignee: Apekshit Sharma Fix For: 2.0.0, 1.2.0 Attachments: HBASE-11927-v1.patch, HBASE-11927-v2.patch, HBASE-11927-v4.patch, HBASE-11927-v5.patch, HBASE-11927-v6.patch, HBASE-11927-v7.patch, HBASE-11927-v8.patch, HBASE-11927-v8.patch, HBASE-11927.patch, after-compact-2%.svg, after-randomWrite1M-0.5%.svg, before-compact-22%.svg, before-randomWrite1M-5%.svg, c2021.crc2.svg, c2021.write.2.svg, c2021.zip.svg, crc32ct.svg Up in hadoop they have this change. Let me publish some graphs to show that it makes a difference (CRC is a massive amount of our CPU usage in my profiling of an upload because of compacting, flushing, etc.). We should also make use of native CRCings -- especially the 2.6 HDFS-6865 and ilk -- in hbase but that is another issue for now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-11927) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)
[ https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apekshit Sharma updated HBASE-11927: Component/s: Performance Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C) Key: HBASE-11927 URL: https://issues.apache.org/jira/browse/HBASE-11927 Project: HBase Issue Type: Improvement Components: Performance Reporter: stack Assignee: Apekshit Sharma Fix For: 2.0.0, 1.2.0 Attachments: HBASE-11927-v1.patch, HBASE-11927-v2.patch, HBASE-11927-v4.patch, HBASE-11927-v5.patch, HBASE-11927-v6.patch, HBASE-11927-v7.patch, HBASE-11927-v8.patch, HBASE-11927-v8.patch, HBASE-11927.patch, after-compact-2%.svg, after-randomWrite1M-0.5%.svg, before-compact-22%.svg, before-randomWrite1M-5%.svg, c2021.crc2.svg, c2021.write.2.svg, c2021.zip.svg, crc32ct.svg Up in hadoop they have this change. Let me publish some graphs to show that it makes a difference (CRC is a massive amount of our CPU usage in my profiling of an upload because of compacting, flushing, etc.). We should also make use of native CRCings -- especially the 2.6 HDFS-6865 and ilk -- in hbase but that is another issue for now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-11927) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)
[ https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-11927:
    Resolution: Fixed
    Fix Version/s: 1.2.0
                   2.0.0
    Hadoop Flags: Reviewed
    Status: Resolved (was: Patch Available)

Pushed to branch-1 and master. Thanks for the nice patch [~appy]

Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)

Key: HBASE-11927
URL: https://issues.apache.org/jira/browse/HBASE-11927
Project: HBase
Issue Type: Bug
Reporter: stack
Assignee: Apekshit Sharma
Fix For: 2.0.0, 1.2.0

Attachments: HBASE-11927-v1.patch, HBASE-11927-v2.patch, HBASE-11927-v4.patch, HBASE-11927-v5.patch, HBASE-11927-v6.patch, HBASE-11927-v7.patch, HBASE-11927-v8.patch, HBASE-11927-v8.patch, HBASE-11927.patch, after-compact-2%.svg, after-randomWrite1M-0.5%.svg, before-compact-22%.svg, before-randomWrite1M-5%.svg, c2021.crc2.svg, c2021.write.2.svg, c2021.zip.svg, crc32ct.svg

Up in hadoop they have this change. Let me publish some graphs to show that it makes a difference (CRC is a massive amount of our CPU usage in my profiling of an upload because of compacting, flushing, etc.). We should also make use of native CRCings -- especially the 2.6 HDFS-6865 and ilk -- in hbase but that is another issue for now.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-11927) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)
[ https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apekshit Sharma updated HBASE-11927:
    Release Note:

Checksumming is cpu intensive. HBase computes additional checksums for HFiles (hdfs does checksums too) and stores them inline with file data. During reading, these checksums are verified to ensure data is not corrupted. This patch tries to use Hadoop Native Library for checksum computation, if it’s available, otherwise falls back to standard Java libraries. Instructions to load NHL in HBase can be found here (http://hbase.apache.org/book.html#hadoop.native.lib).

Default checksum algorithm has been changed from CRC32 to CRC32C primarily because of two reasons: 1) CRC32C has better error detection properties, and 2) New Intel processors have a dedicated instruction for crc32c computation (SSE4.2 instruction set)*.

This change is fully backward compatible. Also, users should not see any differences except decrease in cpu usage. To keep old settings, set configuration ‘hbase.hstore.checksum.algorithm’ to ‘CRC32’.

* On linux, run 'cat /proc/cpuinfo' and look for sse4_2 in list of flags to see if your processor supports SSE4.2.

was:

Checksumming is cpu intensive. HBase computes additional checksums for HFiles (hdfs does checksums too) and stores them inline with file data. During reading, these checksums are verified to ensure data is not corrupted. This patch tries to use Hadoop Native Library for checksum computation, if it’s available, otherwise falls back to standard Java libraries. Instructions to load NHL in HBase can be found here (http://hbase.apache.org/book.html#hadoop.native.lib).

Default checksum algorithm has been changed from CRC32 to CRC32C primarily because of two reasons: 1) CRC32C has better error detection properties, and 2) New Intel processors have a dedicated instruction for crc32c computation (SSE4.2 instruction set)*.

This changes is fully backward compatible. Also, users should not see any differences except decrease in cpu usage. To keep old settings, set configuration ‘hbase.hstore.checksum.algorithm’ to ‘CRC32’.

* On linux, run 'cat /proc/cpuinfo' and look for sse4_2 in list of flags to see if your processor supports SSE4.2.

Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)

Key: HBASE-11927
URL: https://issues.apache.org/jira/browse/HBASE-11927
Project: HBase
Issue Type: Bug
Reporter: stack
Assignee: Apekshit Sharma

Attachments: HBASE-11927-v1.patch, HBASE-11927-v2.patch, HBASE-11927-v4.patch, HBASE-11927-v5.patch, HBASE-11927-v6.patch, HBASE-11927-v7.patch, HBASE-11927-v8.patch, HBASE-11927-v8.patch, HBASE-11927.patch, after-compact-2%.svg, after-randomWrite1M-0.5%.svg, before-compact-22%.svg, before-randomWrite1M-5%.svg, c2021.crc2.svg, c2021.write.2.svg, c2021.zip.svg, crc32ct.svg

Up in hadoop they have this change. Let me publish some graphs to show that it makes a difference (CRC is a massive amount of our CPU usage in my profiling of an upload because of compacting, flushing, etc.). We should also make use of native CRCings -- especially the 2.6 HDFS-6865 and ilk -- in hbase but that is another issue for now.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
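The footnote in the release note above says to look for sse4_2 in the flags listed by /proc/cpuinfo. A minimal sketch of that check in Python follows. The helper names has_cpu_flag and host_supports_sse42 are hypothetical and are not part of HBase or Hadoop; the parsing assumes the usual Linux layout in which each processor block carries a "flags : ..." line.

```python
from pathlib import Path


def has_cpu_flag(cpuinfo_text: str, flag: str = "sse4_2") -> bool:
    """Return True if `flag` appears on any 'flags' line of cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only the 'flags' key is inspected; other keys are ignored so that
        # a model name containing the same word cannot cause a false match.
        if sep and key.strip() == "flags" and flag in value.split():
            return True
    return False


def host_supports_sse42(path: str = "/proc/cpuinfo") -> bool:
    """Check the local machine; returns False if the file cannot be read."""
    try:
        return has_cpu_flag(Path(path).read_text())
    except OSError:
        # Assumption: a missing or unreadable file (e.g. non-Linux host)
        # is treated as "flag not found" rather than as an error.
        return False


if __name__ == "__main__":
    print("SSE4.2 available:", host_supports_sse42())
```

This only reports whether the processor advertises the instruction set. Whether the Hadoop Native Library is actually loaded is a separate question covered by the instructions linked in the release note.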
[jira] [Updated] (HBASE-11927) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)
[ https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apekshit Sharma updated HBASE-11927: Status: Open (was: Patch Available) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C) Key: HBASE-11927 URL: https://issues.apache.org/jira/browse/HBASE-11927 Project: HBase Issue Type: Bug Reporter: stack Assignee: Apekshit Sharma Attachments: HBASE-11927-v1.patch, HBASE-11927-v2.patch, HBASE-11927-v4.patch, HBASE-11927-v5.patch, HBASE-11927.patch, after-compact-2%.svg, after-randomWrite1M-0.5%.svg, before-compact-22%.svg, before-randomWrite1M-5%.svg, c2021.crc2.svg, c2021.write.2.svg, c2021.zip.svg, crc32ct.svg Up in hadoop they have this change. Let me publish some graphs to show that it makes a difference (CRC is a massive amount of our CPU usage in my profiling of an upload because of compacting, flushing, etc.). We should also make use of native CRCings -- especially the 2.6 HDFS-6865 and ilk -- in hbase but that is another issue for now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-11927) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)
[ https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apekshit Sharma updated HBASE-11927: Status: Patch Available (was: Open) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C) Key: HBASE-11927 URL: https://issues.apache.org/jira/browse/HBASE-11927 Project: HBase Issue Type: Bug Reporter: stack Assignee: Apekshit Sharma Attachments: HBASE-11927-v1.patch, HBASE-11927-v2.patch, HBASE-11927-v4.patch, HBASE-11927-v5.patch, HBASE-11927.patch, after-compact-2%.svg, after-randomWrite1M-0.5%.svg, before-compact-22%.svg, before-randomWrite1M-5%.svg, c2021.crc2.svg, c2021.write.2.svg, c2021.zip.svg, crc32ct.svg Up in hadoop they have this change. Let me publish some graphs to show that it makes a difference (CRC is a massive amount of our CPU usage in my profiling of an upload because of compacting, flushing, etc.). We should also make use of native CRCings -- especially the 2.6 HDFS-6865 and ilk -- in hbase but that is another issue for now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-11927) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)
[ https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apekshit Sharma updated HBASE-11927:
    Attachment: HBASE-11927-v5.patch

v5 patch
- moved DEFAULT_CHECKSUM_TYPE to ChecksumType.
- added fn to get datachecksum type for ChecksumType
- added one more test.
- created jira for ChecksumType cleanup. HBASE-13692

Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)

Key: HBASE-11927
URL: https://issues.apache.org/jira/browse/HBASE-11927
Project: HBase
Issue Type: Bug
Reporter: stack
Assignee: Apekshit Sharma

Attachments: HBASE-11927-v1.patch, HBASE-11927-v2.patch, HBASE-11927-v4.patch, HBASE-11927-v5.patch, HBASE-11927.patch, after-compact-2%.svg, after-randomWrite1M-0.5%.svg, before-compact-22%.svg, before-randomWrite1M-5%.svg, c2021.crc2.svg, c2021.write.2.svg, c2021.zip.svg, crc32ct.svg

Up in hadoop they have this change. Let me publish some graphs to show that it makes a difference (CRC is a massive amount of our CPU usage in my profiling of an upload because of compacting, flushing, etc.). We should also make use of native CRCings -- especially the 2.6 HDFS-6865 and ilk -- in hbase but that is another issue for now.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-11927) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)
[ https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apekshit Sharma updated HBASE-11927:
    Release Note:

Checksumming is cpu intensive. HBase computes additional checksums for HFiles (hdfs does checksums too) and stores them inline with file data. During reading, these checksums are verified to ensure data is not corrupted. This patch tries to use Hadoop Native Library for checksum computation, if it’s available, otherwise falls back to standard Java libraries. Instructions to load NHL in HBase can be found here (http://hbase.apache.org/book.html#hadoop.native.lib).

Default checksum algorithm has been changed from CRC32 to CRC32C primarily because of two reasons: 1) CRC32C has better error detection properties, and 2) New Intel processors have a dedicated instruction for crc32c computation (SSE4.2 instruction set)*.

This changes is fully backward compatible. Also, users should not see any differences except decrease in cpu usage. To keep old settings, set configuration ‘hbase.hstore.checksum.algorithm’ to ‘CRC32’.

* On linux, run 'cat /proc/cpuinfo' and look for sse4_2 in list of flags to see if your processor supports SSE4.2.

Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)

Key: HBASE-11927
URL: https://issues.apache.org/jira/browse/HBASE-11927
Project: HBase
Issue Type: Bug
Reporter: stack
Assignee: Apekshit Sharma

Attachments: HBASE-11927-v1.patch, HBASE-11927-v2.patch, HBASE-11927-v4.patch, HBASE-11927-v5.patch, HBASE-11927.patch, after-compact-2%.svg, after-randomWrite1M-0.5%.svg, before-compact-22%.svg, before-randomWrite1M-5%.svg, c2021.crc2.svg, c2021.write.2.svg, c2021.zip.svg, crc32ct.svg

Up in hadoop they have this change. Let me publish some graphs to show that it makes a difference (CRC is a massive amount of our CPU usage in my profiling of an upload because of compacting, flushing, etc.). We should also make use of native CRCings -- especially the 2.6 HDFS-6865 and ilk -- in hbase but that is another issue for now.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
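The release note above switches the default from CRC32 to CRC32C. These are two distinct checksums built on different polynomials, so the same bytes yield different values under each. The Python sketch below illustrates that difference only. The crc32c function here is a slow bit-by-bit version written for illustration; it is not the Hadoop native or Java implementation the patch uses, and the input string is the conventional "123456789" test vector rather than HFile data.

```python
import zlib

# Reflected form of the Castagnoli polynomial used by CRC32C.
CRC32C_POLY_REFLECTED = 0x82F63B78


def crc32c(data: bytes) -> int:
    """Bit-by-bit CRC32C (Castagnoli). Illustrative only, not optimized."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right one bit; fold in the polynomial when the low bit is set.
            crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


if __name__ == "__main__":
    sample = b"123456789"
    print("CRC32 :", hex(zlib.crc32(sample)))   # 0xcbf43926
    print("CRC32C:", hex(crc32c(sample)))       # 0xe3069283
```

Because the two values differ for identical input, a reader has to know which algorithm produced a stored checksum before it can verify it.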
[jira] [Updated] (HBASE-11927) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)
[ https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apekshit Sharma updated HBASE-11927: Status: Patch Available (was: Open) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C) Key: HBASE-11927 URL: https://issues.apache.org/jira/browse/HBASE-11927 Project: HBase Issue Type: Bug Reporter: stack Assignee: Apekshit Sharma Attachments: HBASE-11927-v1.patch, HBASE-11927-v2.patch, HBASE-11927-v4.patch, HBASE-11927-v5.patch, HBASE-11927-v6.patch, HBASE-11927-v7.patch, HBASE-11927.patch, after-compact-2%.svg, after-randomWrite1M-0.5%.svg, before-compact-22%.svg, before-randomWrite1M-5%.svg, c2021.crc2.svg, c2021.write.2.svg, c2021.zip.svg, crc32ct.svg Up in hadoop they have this change. Let me publish some graphs to show that it makes a difference (CRC is a massive amount of our CPU usage in my profiling of an upload because of compacting, flushing, etc.). We should also make use of native CRCings -- especially the 2.6 HDFS-6865 and ilk -- in hbase but that is another issue for now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-11927) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)
[ https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apekshit Sharma updated HBASE-11927: Status: Open (was: Patch Available) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C) Key: HBASE-11927 URL: https://issues.apache.org/jira/browse/HBASE-11927 Project: HBase Issue Type: Bug Reporter: stack Assignee: Apekshit Sharma Attachments: HBASE-11927-v1.patch, HBASE-11927-v2.patch, HBASE-11927-v4.patch, HBASE-11927-v5.patch, HBASE-11927.patch, after-compact-2%.svg, after-randomWrite1M-0.5%.svg, before-compact-22%.svg, before-randomWrite1M-5%.svg, c2021.crc2.svg, c2021.write.2.svg, c2021.zip.svg, crc32ct.svg Up in hadoop they have this change. Let me publish some graphs to show that it makes a difference (CRC is a massive amount of our CPU usage in my profiling of an upload because of compacting, flushing, etc.). We should also make use of native CRCings -- especially the 2.6 HDFS-6865 and ilk -- in hbase but that is another issue for now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-11927) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)
[ https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apekshit Sharma updated HBASE-11927: Attachment: HBASE-11927-v6.patch Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C) Key: HBASE-11927 URL: https://issues.apache.org/jira/browse/HBASE-11927 Project: HBase Issue Type: Bug Reporter: stack Assignee: Apekshit Sharma Attachments: HBASE-11927-v1.patch, HBASE-11927-v2.patch, HBASE-11927-v4.patch, HBASE-11927-v5.patch, HBASE-11927-v6.patch, HBASE-11927.patch, after-compact-2%.svg, after-randomWrite1M-0.5%.svg, before-compact-22%.svg, before-randomWrite1M-5%.svg, c2021.crc2.svg, c2021.write.2.svg, c2021.zip.svg, crc32ct.svg Up in hadoop they have this change. Let me publish some graphs to show that it makes a difference (CRC is a massive amount of our CPU usage in my profiling of an upload because of compacting, flushing, etc.). We should also make use of native CRCings -- especially the 2.6 HDFS-6865 and ilk -- in hbase but that is another issue for now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-11927) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)
[ https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apekshit Sharma updated HBASE-11927: Status: Patch Available (was: Open) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C) Key: HBASE-11927 URL: https://issues.apache.org/jira/browse/HBASE-11927 Project: HBase Issue Type: Bug Reporter: stack Assignee: Apekshit Sharma Attachments: HBASE-11927-v1.patch, HBASE-11927-v2.patch, HBASE-11927-v4.patch, HBASE-11927-v5.patch, HBASE-11927-v6.patch, HBASE-11927.patch, after-compact-2%.svg, after-randomWrite1M-0.5%.svg, before-compact-22%.svg, before-randomWrite1M-5%.svg, c2021.crc2.svg, c2021.write.2.svg, c2021.zip.svg, crc32ct.svg Up in hadoop they have this change. Let me publish some graphs to show that it makes a difference (CRC is a massive amount of our CPU usage in my profiling of an upload because of compacting, flushing, etc.). We should also make use of native CRCings -- especially the 2.6 HDFS-6865 and ilk -- in hbase but that is another issue for now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-11927) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)
[ https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apekshit Sharma updated HBASE-11927: Attachment: HBASE-11927-v7.patch Fixing hudson issues. Thanks [~stack]. Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C) Key: HBASE-11927 URL: https://issues.apache.org/jira/browse/HBASE-11927 Project: HBase Issue Type: Bug Reporter: stack Assignee: Apekshit Sharma Attachments: HBASE-11927-v1.patch, HBASE-11927-v2.patch, HBASE-11927-v4.patch, HBASE-11927-v5.patch, HBASE-11927-v6.patch, HBASE-11927-v7.patch, HBASE-11927.patch, after-compact-2%.svg, after-randomWrite1M-0.5%.svg, before-compact-22%.svg, before-randomWrite1M-5%.svg, c2021.crc2.svg, c2021.write.2.svg, c2021.zip.svg, crc32ct.svg Up in hadoop they have this change. Let me publish some graphs to show that it makes a difference (CRC is a massive amount of our CPU usage in my profiling of an upload because of compacting, flushing, etc.). We should also make use of native CRCings -- especially the 2.6 HDFS-6865 and ilk -- in hbase but that is another issue for now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-11927) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)
[ https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apekshit Sharma updated HBASE-11927: Status: Open (was: Patch Available) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C) Key: HBASE-11927 URL: https://issues.apache.org/jira/browse/HBASE-11927 Project: HBase Issue Type: Bug Reporter: stack Assignee: Apekshit Sharma Attachments: HBASE-11927-v1.patch, HBASE-11927-v2.patch, HBASE-11927-v4.patch, HBASE-11927-v5.patch, HBASE-11927-v6.patch, HBASE-11927-v7.patch, HBASE-11927.patch, after-compact-2%.svg, after-randomWrite1M-0.5%.svg, before-compact-22%.svg, before-randomWrite1M-5%.svg, c2021.crc2.svg, c2021.write.2.svg, c2021.zip.svg, crc32ct.svg Up in hadoop they have this change. Let me publish some graphs to show that it makes a difference (CRC is a massive amount of our CPU usage in my profiling of an upload because of compacting, flushing, etc.). We should also make use of native CRCings -- especially the 2.6 HDFS-6865 and ilk -- in hbase but that is another issue for now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-11927) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)
[ https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apekshit Sharma updated HBASE-11927: Status: Patch Available (was: Open) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C) Key: HBASE-11927 URL: https://issues.apache.org/jira/browse/HBASE-11927 Project: HBase Issue Type: Bug Reporter: stack Assignee: Apekshit Sharma Attachments: HBASE-11927-v1.patch, HBASE-11927-v2.patch, HBASE-11927-v4.patch, HBASE-11927-v5.patch, HBASE-11927-v6.patch, HBASE-11927-v7.patch, HBASE-11927-v8.patch, HBASE-11927.patch, after-compact-2%.svg, after-randomWrite1M-0.5%.svg, before-compact-22%.svg, before-randomWrite1M-5%.svg, c2021.crc2.svg, c2021.write.2.svg, c2021.zip.svg, crc32ct.svg Up in hadoop they have this change. Let me publish some graphs to show that it makes a difference (CRC is a massive amount of our CPU usage in my profiling of an upload because of compacting, flushing, etc.). We should also make use of native CRCings -- especially the 2.6 HDFS-6865 and ilk -- in hbase but that is another issue for now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-11927) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)
[ https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apekshit Sharma updated HBASE-11927: Status: Open (was: Patch Available) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C) Key: HBASE-11927 URL: https://issues.apache.org/jira/browse/HBASE-11927 Project: HBase Issue Type: Bug Reporter: stack Assignee: Apekshit Sharma Attachments: HBASE-11927-v1.patch, HBASE-11927-v2.patch, HBASE-11927-v4.patch, HBASE-11927-v5.patch, HBASE-11927-v6.patch, HBASE-11927-v7.patch, HBASE-11927.patch, after-compact-2%.svg, after-randomWrite1M-0.5%.svg, before-compact-22%.svg, before-randomWrite1M-5%.svg, c2021.crc2.svg, c2021.write.2.svg, c2021.zip.svg, crc32ct.svg Up in hadoop they have this change. Let me publish some graphs to show that it makes a difference (CRC is a massive amount of our CPU usage in my profiling of an upload because of compacting, flushing, etc.). We should also make use of native CRCings -- especially the 2.6 HDFS-6865 and ilk -- in hbase but that is another issue for now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-11927) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)
[ https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apekshit Sharma updated HBASE-11927: Status: Patch Available (was: Open) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C) Key: HBASE-11927 URL: https://issues.apache.org/jira/browse/HBASE-11927 Project: HBase Issue Type: Bug Reporter: stack Assignee: Apekshit Sharma Attachments: HBASE-11927-v1.patch, HBASE-11927-v2.patch, HBASE-11927-v4.patch, HBASE-11927-v5.patch, HBASE-11927-v6.patch, HBASE-11927-v7.patch, HBASE-11927.patch, after-compact-2%.svg, after-randomWrite1M-0.5%.svg, before-compact-22%.svg, before-randomWrite1M-5%.svg, c2021.crc2.svg, c2021.write.2.svg, c2021.zip.svg, crc32ct.svg
[jira] [Updated] (HBASE-11927) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)
[ https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apekshit Sharma updated HBASE-11927: Attachment: HBASE-11927-v8.patch Fixes TestHFileBlock, which hard-coded the checksum code. Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C) Key: HBASE-11927 URL: https://issues.apache.org/jira/browse/HBASE-11927 Project: HBase Issue Type: Bug Reporter: stack Assignee: Apekshit Sharma Attachments: HBASE-11927-v1.patch, HBASE-11927-v2.patch, HBASE-11927-v4.patch, HBASE-11927-v5.patch, HBASE-11927-v6.patch, HBASE-11927-v7.patch, HBASE-11927-v8.patch, HBASE-11927.patch, after-compact-2%.svg, after-randomWrite1M-0.5%.svg, before-compact-22%.svg, before-randomWrite1M-5%.svg, c2021.crc2.svg, c2021.write.2.svg, c2021.zip.svg, crc32ct.svg
[jira] [Updated] (HBASE-11927) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)
[ https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apekshit Sharma updated HBASE-11927: Status: Open (was: Patch Available) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C) Key: HBASE-11927 URL: https://issues.apache.org/jira/browse/HBASE-11927 Project: HBase Issue Type: Bug Reporter: stack Assignee: Apekshit Sharma Attachments: HBASE-11927-v1.patch, HBASE-11927-v2.patch, HBASE-11927-v4.patch, HBASE-11927-v5.patch, HBASE-11927-v6.patch, HBASE-11927-v7.patch, HBASE-11927-v8.patch, HBASE-11927.patch, after-compact-2%.svg, after-randomWrite1M-0.5%.svg, before-compact-22%.svg, before-randomWrite1M-5%.svg, c2021.crc2.svg, c2021.write.2.svg, c2021.zip.svg, crc32ct.svg
[jira] [Updated] (HBASE-11927) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)
[ https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-11927: Summary: Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C) (was: Use Native Hadoop Library for HFile checksum) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C) Key: HBASE-11927 URL: https://issues.apache.org/jira/browse/HBASE-11927 Project: HBase Issue Type: Bug Reporter: stack Assignee: Apekshit Sharma Attachments: HBASE-11927-v1.patch, HBASE-11927-v2.patch, HBASE-11927.patch, c2021.crc2.svg, c2021.write.2.svg, c2021.zip.svg, compact-with-native.svg, compact-without-native.svg, crc32ct.svg
[jira] [Updated] (HBASE-11927) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)
[ https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apekshit Sharma updated HBASE-11927: Attachment: after-randomWrite1M-0.5%.svg before-randomWrite1M-5%.svg before-compact-22%.svg after-compact-2%.svg HBASE-11927-v4.patch - Removed duplicated defaults; there is now a single one in HConstants. - Testing: added a test and removed an obsolete test. - Cleaned up dead code from ChecksumUtil. - Attaching flame graphs. TODO: write release note. Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C) Key: HBASE-11927 URL: https://issues.apache.org/jira/browse/HBASE-11927 Project: HBase Issue Type: Bug Reporter: stack Assignee: Apekshit Sharma Attachments: HBASE-11927-v1.patch, HBASE-11927-v2.patch, HBASE-11927-v4.patch, HBASE-11927.patch, after-compact-2%.svg, after-randomWrite1M-0.5%.svg, before-compact-22%.svg, before-randomWrite1M-5%.svg, c2021.crc2.svg, c2021.write.2.svg, c2021.zip.svg, compact-with-native.svg, compact-without-native.svg, crc32ct.svg
[jira] [Updated] (HBASE-11927) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)
[ https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apekshit Sharma updated HBASE-11927: Attachment: (was: compact-with-native.svg) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C) Key: HBASE-11927 URL: https://issues.apache.org/jira/browse/HBASE-11927 Project: HBase Issue Type: Bug Reporter: stack Assignee: Apekshit Sharma Attachments: HBASE-11927-v1.patch, HBASE-11927-v2.patch, HBASE-11927-v4.patch, HBASE-11927.patch, after-compact-2%.svg, after-randomWrite1M-0.5%.svg, before-compact-22%.svg, before-randomWrite1M-5%.svg, c2021.crc2.svg, c2021.write.2.svg, c2021.zip.svg, crc32ct.svg
[jira] [Updated] (HBASE-11927) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)
[ https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apekshit Sharma updated HBASE-11927: Attachment: (was: compact-without-native.svg) Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C) Key: HBASE-11927 URL: https://issues.apache.org/jira/browse/HBASE-11927 Project: HBase Issue Type: Bug Reporter: stack Assignee: Apekshit Sharma Attachments: HBASE-11927-v1.patch, HBASE-11927-v2.patch, HBASE-11927-v4.patch, HBASE-11927.patch, after-compact-2%.svg, after-randomWrite1M-0.5%.svg, before-compact-22%.svg, before-randomWrite1M-5%.svg, c2021.crc2.svg, c2021.write.2.svg, c2021.zip.svg, crc32ct.svg