[jira] [Commented] (ORC-175) ZLIB performance
[ https://issues.apache.org/jira/browse/ORC-175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16575394#comment-16575394 ] ASF GitHub Bot commented on ORC-175: Github user asfgit closed the pull request at: https://github.com/apache/orc/pull/159 > ZLIB performance > > > Key: ORC-175 > URL: https://issues.apache.org/jira/browse/ORC-175 > Project: ORC > Issue Type: Improvement > Components: Java >Reporter: iamhumanbeing >Assignee: iamhumanbeing >Priority: Major > Labels: performance > Original Estimate: 336h > Remaining Estimate: 336h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ORC-175) ZLIB performance
[ https://issues.apache.org/jira/browse/ORC-175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16160952#comment-16160952 ] ASF GitHub Bot commented on ORC-175: Github user xndai commented on the issue: https://github.com/apache/orc/pull/159 @iamhumanbeing did you compare it with zstd? Based on my experience, zstd is way better than igzip. I would expect a similar result with ISA-L. It doesn't seem to add a lot of value if we plan to support zstd in the near future.
[jira] [Commented] (ORC-175) ZLIB performance
[ https://issues.apache.org/jira/browse/ORC-175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16152021#comment-16152021 ] ASF GitHub Bot commented on ORC-175: Github user iamhumanbeing commented on the issue: https://github.com/apache/orc/pull/159 @omalley: How about we add a compile-time option that uses ISA-L's ZLIB support for ZLIB compression and decompression? This option is just a performance optimization.
[jira] [Commented] (ORC-175) ZLIB performance
[ https://issues.apache.org/jira/browse/ORC-175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146185#comment-16146185 ]
Gopal V commented on ORC-175:
The fact that the underlying codec is compatible means that we should be able to extend the decompression speedups to all users who already use Zlib for their storage - which is a big advantage if it can be pulled off.
If Hadoop's ZlibCodec could be built to work against ISA-L, then all of the projects would get ISA-L support at the same time. That might be much easier than trying to build it into ORC alone, since hadoop-common already has a partial dependency on ISA-L and JNI code which depends on it.
https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/CMakeLists.txt#L126
{code}
find_library(ISAL_LIBRARY
    NAMES isal
    PATHS ${CUSTOM_ISAL_PREFIX} ${CUSTOM_ISAL_PREFIX}/lib
          ${CUSTOM_ISAL_PREFIX}/lib64 ${CUSTOM_ISAL_LIB} /usr/lib)
{code}
[jira] [Commented] (ORC-175) ZLIB performance
[ https://issues.apache.org/jira/browse/ORC-175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146136#comment-16146136 ] ASF GitHub Bot commented on ORC-175: Github user omalley commented on the issue: https://github.com/apache/orc/pull/159 Is the ISA-L zlib support sufficient to read and write the ORC files with zlib compression? I agree with Gopal that it doesn't feel like a separate compression codec. I don't think it is a good idea to build in support for proprietary compression formats. If it is just an optimization on performance, that might be workable. As a unique codec, it isn't.
[jira] [Commented] (ORC-175) ZLIB performance
[ https://issues.apache.org/jira/browse/ORC-175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16139500#comment-16139500 ] ASF GitHub Bot commented on ORC-175: Github user 10110346 commented on the issue: https://github.com/apache/orc/pull/159 LGTM
[jira] [Commented] (ORC-175) ZLIB performance
[ https://issues.apache.org/jira/browse/ORC-175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16137820#comment-16137820 ]
ASF GitHub Bot commented on ORC-175:
Github user iamhumanbeing commented on a diff in the pull request: https://github.com/apache/orc/pull/159#discussion_r134650087
--- Diff: java/bench/src/java/org/apache/orc/bench/CompressionKind.java ---
@@ -53,6 +54,8 @@ public OutputStream create(OutputStream out) throws IOException {
         return new GZIPOutputStream(out);
       case SNAPPY:
         return new SnappyCodec().createOutputStream(out);
+      case ISAL:
--- End diff --
1. igzip is only one part of ISA-L.
2. igzip only supports compression levels 0 and 1, so it cannot fully replace libz.
3. igzip's API cannot directly replace libz's API.
[jira] [Commented] (ORC-175) ZLIB performance
[ https://issues.apache.org/jira/browse/ORC-175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16136559#comment-16136559 ]
ASF GitHub Bot commented on ORC-175:
Github user t3rmin4t0r commented on a diff in the pull request: https://github.com/apache/orc/pull/159#discussion_r134427978
--- Diff: java/bench/src/java/org/apache/orc/bench/CompressionKind.java ---
@@ -53,6 +54,8 @@ public OutputStream create(OutputStream out) throws IOException {
         return new GZIPOutputStream(out);
       case SNAPPY:
         return new SnappyCodec().createOutputStream(out);
+      case ISAL:
--- End diff --
ISAL is just the library for gzip? That doesn't need a new codec - have you tried LD_PRELOAD to load ISAL instead of libz?
[jira] [Commented] (ORC-175) ZLIB performance
[ https://issues.apache.org/jira/browse/ORC-175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16136466#comment-16136466 ] iamhumanbeing commented on ORC-175: https://github.com/apache/orc/pull/159
[jira] [Commented] (ORC-175) ZLIB performance
[ https://issues.apache.org/jira/browse/ORC-175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16128514#comment-16128514 ] iamhumanbeing commented on ORC-175: [~owen.omalley]
[jira] [Commented] (ORC-175) ZLIB performance
[ https://issues.apache.org/jira/browse/ORC-175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16126936#comment-16126936 ]
iamhumanbeing commented on ORC-175:
[~gopalv]: I got orc-benchmark data for ISA-L:
ColumnProjectionBenchmark.orc  isal  taxi  avgt  3    944503.179 ±  43834.261  us/op
ColumnProjectionBenchmark.orc  zlib  taxi  avgt  3   1029682.551 ±  26364.565  us/op
FullReadBenchmark.orc          isal  taxi  avgt  3  14192224.371 ± 180230.436  us/op
FullReadBenchmark.orc          zlib  taxi  avgt  3  16234465.657 ± 415264.953  us/op
That looks like an 8%-14% speedup.
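The 8%-14% range quoted in the message above can be reproduced from the mean scores with a little arithmetic. The following is a minimal sketch, not part of orc-benchmark: the class and method names are hypothetical, and it ignores the reported error bars. It computes two common definitions of "speedup", since the thread does not say which one was used.

```java
// Illustrative arithmetic only; BenchmarkSpeedup and its methods are hypothetical names.
final class BenchmarkSpeedup {
    /** Share of wall time saved relative to the baseline, in percent. */
    static double timeReductionPercent(double baselineUs, double candidateUs) {
        return (baselineUs - candidateUs) / baselineUs * 100.0;
    }

    /** Extra work done per unit time relative to the baseline, in percent. */
    static double throughputGainPercent(double baselineUs, double candidateUs) {
        return (baselineUs / candidateUs - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        String[] names = {"ColumnProjectionBenchmark", "FullReadBenchmark"};
        // Mean us/op copied from the message above: {zlib, isal}.
        double[][] scores = {
            {1029682.551, 944503.179},
            {16234465.657, 14192224.371},
        };
        for (int i = 0; i < names.length; i++) {
            double zlib = scores[i][0];
            double isal = scores[i][1];
            System.out.printf("%s: %.1f%% less time, %.1f%% more throughput%n",
                names[i],
                timeReductionPercent(zlib, isal),
                throughputGainPercent(zlib, isal));
        }
    }
}
```

With these inputs the results span roughly 8% to 14%: the low end is the time saved on column projection and the high end is the throughput gained on the full read, so the two ends of the quoted range appear to come from different definitions.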
[jira] [Commented] (ORC-175) ZLIB performance
[ https://issues.apache.org/jira/browse/ORC-175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16119622#comment-16119622 ] Gopal V commented on ORC-175: igzip is not testing the same codepath as ORC, because ORC does use a mix of LZ77 widths and tries to use faster decode loops, at lower compression levels (it uses different zlib combinations for different column types). https://github.com/apache/orc/blob/master/java/core/src/java/org/apache/orc/impl/ZlibCodec.java#L138 Does ISA-L offer any meaningful speedups for those combinations of Zlib-compatible algorithms?
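To make "different zlib combinations" concrete, here is a minimal sketch using only the JDK's java.util.zip classes. It is not the ORC ZlibCodec code: the class name, the helper methods, and the particular level/strategy pairs are assumptions chosen for illustration. It shows that streams written with different levels and strategies are all read back by the same inflater, which is the compatibility property the thread relies on.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper class; not taken from ORC.
final class ZlibCombinations {
    /** Compresses input as a zlib stream using the given level and strategy. */
    static byte[] compress(byte[] input, int level, int strategy) {
        Deflater deflater = new Deflater(level);
        deflater.setStrategy(strategy);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(chunk);
            out.write(chunk, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Inflates a zlib stream whose uncompressed length is known in advance. */
    static byte[] decompress(byte[] compressed, int originalLength) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        byte[] result = new byte[originalLength];
        int offset = 0;
        try {
            while (!inflater.finished() && offset < result.length) {
                int n = inflater.inflate(result, offset, result.length - offset);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // truncated or dictionary-dependent input
                }
                offset += n;
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not a valid zlib stream", e);
        } finally {
            inflater.end();
        }
        return Arrays.copyOf(result, offset);
    }

    public static void main(String[] args) {
        byte[] data = "taxi,taxi,taxi,1,2,3,1,2,3,taxi,taxi"
            .getBytes(StandardCharsets.UTF_8);
        // Assumed example combinations: {level, strategy}.
        int[][] combos = {
            {Deflater.BEST_SPEED, Deflater.DEFAULT_STRATEGY},
            {Deflater.BEST_SPEED, Deflater.FILTERED},
            {Deflater.DEFAULT_COMPRESSION, Deflater.DEFAULT_STRATEGY},
            {Deflater.DEFAULT_COMPRESSION, Deflater.FILTERED},
        };
        for (int[] c : combos) {
            byte[] packed = compress(data, c[0], c[1]);
            boolean ok = Arrays.equals(data, decompress(packed, data.length));
            System.out.printf("level=%d strategy=%d bytes=%d roundTrip=%b%n",
                c[0], c[1], packed.length, ok);
        }
    }
}
```

A benchmark meant to answer the question above would need to time the inflate side separately for each combination, rather than for a single default setting.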
[jira] [Commented] (ORC-175) ZLIB performance
[ https://issues.apache.org/jira/browse/ORC-175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16119607#comment-16119607 ]
iamhumanbeing commented on ORC-175:
Gopal V: inflate performance -> ZLIB: 197 MB/s; ISA-L: 356 MB/s
perf@perf-master:~/git/isa-l/igzip$ ./igzip_inflate_perf part-00127
isal_inflate_perf: Using igzip compression
igzip_zlib_inflate_perf: part-00127 215 iterations
file part-00127 - in_size=4632710 out_size=4632710 iter=215
igzip_file: runtime =5053893 usecs, bandwidth 949 MB in 5.0539 sec = 197.08 MB/s
End of igzip_zlib_inflate_perf
isal_inflate_stateless_perf: part-00127 215 iterations
file part-00127 - in_size=4632710 out_size=4632710 iter=215
igzip_file: runtime =2794267 usecs, bandwidth 949 MB in 2.7943 sec = 356.46 MB/s
End of isal_inflate_stateless_perf
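The bandwidth figures in the log above follow directly from the file size, iteration count, and runtime it reports. As a rough check, the sketch below recomputes them; the class and method names are hypothetical, and it assumes the tool's MB/s figure is total bytes divided by microseconds (decimal megabytes per second), which is what the reported numbers are consistent with.

```java
// Illustrative arithmetic only; InflateThroughput is a hypothetical name.
final class InflateThroughput {
    /** Bytes per microsecond, which is numerically decimal megabytes per second. */
    static double mbPerSec(long bytesPerIteration, int iterations, long runtimeUsecs) {
        return (double) bytesPerIteration * iterations / runtimeUsecs;
    }

    public static void main(String[] args) {
        long size = 4632710L;     // out_size from the log above
        int iterations = 215;     // iter from the log above
        double zlib = mbPerSec(size, iterations, 5053893L);  // igzip_zlib_inflate_perf
        double isal = mbPerSec(size, iterations, 2794267L);  // isal_inflate_stateless_perf
        System.out.printf("zlib: %.2f MB/s%n", zlib);
        System.out.printf("ISA-L: %.2f MB/s%n", isal);
        System.out.printf("ratio: %.2fx%n", isal / zlib);
    }
}
```

On this one file the ISA-L inflate path comes out at roughly 1.8 times the zlib throughput, which is a larger gap than the end-to-end ORC read benchmarks show.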
[jira] [Commented] (ORC-175) ZLIB performance
[ https://issues.apache.org/jira/browse/ORC-175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16119392#comment-16119392 ] Gopal V commented on ORC-175: [~iamhumanbeing]: the really interesting part is not the compressor, but the inflate_fast() performance - do you have any numbers on the throughput gains?
[jira] [Commented] (ORC-175) ZLIB performance
[ https://issues.apache.org/jira/browse/ORC-175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16119368#comment-16119368 ] iamhumanbeing commented on ORC-175: The ISA-L API is compatible with the ZLIB C API. We have tested it on Hive 1.2.1. For "insert table" SQL, we saw a 10% performance improvement.
[jira] [Commented] (ORC-175) ZLIB performance
[ https://issues.apache.org/jira/browse/ORC-175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16116796#comment-16116796 ] Owen O'Malley commented on ORC-175: Is the ISA-L API-compatible with zlib or is the change bigger than that?