[jira] [Commented] (HADOOP-13849) Bzip2 java-builtin and system-native have almost the same compress speed
[ https://issues.apache.org/jira/browse/HADOOP-13849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15721131#comment-15721131 ]

Tao Li commented on HADOOP-13849:
---------------------------------

[~ste...@apache.org] Thanks, Steve. I will do some profiling in the future to find the bottleneck.

> Bzip2 java-builtin and system-native have almost the same compress speed
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-13849
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13849
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: common
>    Affects Versions: 2.6.0
>         Environment: os version: redhat6
>                      hadoop version: 2.6.0
>                      native bzip2 version: bzip2-devel-1.0.5-7.el6_0.x86_64
>            Reporter: Tao Li
>
> I tested bzip2 java-builtin and system-native compression, and I found that the
> compress speed is almost the same. (I think system-native should have
> better compress speed than java-builtin.)
>
> My test case:
> 1. input file: 2.7GB text file without compression
> 2. after bzip2 java-builtin compress: 457MB, 12min 4sec
> 3. after bzip2 system-native compress: 457MB, 12min 19sec
>
> My MapReduce Config:
> conf.set("mapreduce.fileoutputcommitter.marksuccessfuljobs", "false");
> conf.set("mapreduce.output.fileoutputformat.compress", "true");
> conf.set("mapreduce.output.fileoutputformat.compress.type", "BLOCK");
> conf.set("mapreduce.output.fileoutputformat.compress.codec", "org.apache.hadoop.io.compress.BZip2Codec");
> conf.set("io.compression.codec.bzip2.library", "java-builtin"); // for java-builtin
> conf.set("io.compression.codec.bzip2.library", "system-native"); // for system-native
>
> And I am sure I have enabled native bzip2; the output of the command
> "hadoop checknative -a" is as follows:
> Native library checking:
> hadoop:  true /usr/lib/hadoop/lib/native/libhadoop.so.1.0.0
> zlib:    true /lib64/libz.so.1
> snappy:  true /usr/lib/hadoop/lib/native/libsnappy.so.1
> lz4:     true revision:99
> bzip2:   true /lib64/libbz2.so.1
> openssl: true /usr/lib64/libcrypto.so

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
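For reference, the timings in the issue description can be turned into throughput figures. The sketch below is only arithmetic on the reported numbers. The class and method names are hypothetical, and it assumes that "2.7GB" and "457MB" are binary units (1 GB = 1024 MB), which the report does not state. Under that assumption both runs come out a little under 4 MB/s of job-level throughput and about 2% apart. These are wall-clock figures for the whole MapReduce job, not for the codec alone.

```java
import java.util.Locale;

// Hypothetical helper, not part of Hadoop: arithmetic on the timings
// reported in the issue description.
class Bzip2TimingSummary {
    // Assumption: the reported sizes use binary units (1 GB = 1024 MB).
    static final double INPUT_MB = 2.7 * 1024;
    static final double OUTPUT_MB = 457;

    /** Converts a "min, sec" duration into seconds. */
    static int seconds(int minutes, int secs) {
        return minutes * 60 + secs;
    }

    /** Uncompressed megabytes processed per wall-clock second. */
    static double throughputMBps(double megabytes, int elapsedSeconds) {
        return megabytes / elapsedSeconds;
    }

    /** How much slower "other" is than "baseline", in percent. */
    static double slowdownPercent(int baselineSeconds, int otherSeconds) {
        return 100.0 * (otherSeconds - baselineSeconds) / baselineSeconds;
    }

    public static void main(String[] args) {
        int javaBuiltin = seconds(12, 4);   // 724 s
        int systemNative = seconds(12, 19); // 739 s

        System.out.printf(Locale.ROOT, "java-builtin : %.2f MB/s%n",
                throughputMBps(INPUT_MB, javaBuiltin));
        System.out.printf(Locale.ROOT, "system-native: %.2f MB/s%n",
                throughputMBps(INPUT_MB, systemNative));
        System.out.printf(Locale.ROOT, "native slower by %.1f%%%n",
                slowdownPercent(javaBuiltin, systemNative));
        System.out.printf(Locale.ROOT, "compressed/uncompressed: %.3f%n",
                OUTPUT_MB / INPUT_MB);
    }
}
```

A gap this small between two independent implementations is consistent with the time being dominated by something other than the codec, though these numbers alone cannot show that.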
[jira] [Commented] (HADOOP-13849) Bzip2 java-builtin and system-native have almost the same compress speed
[ https://issues.apache.org/jira/browse/HADOOP-13849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15712862#comment-15712862 ]

Ravi Prakash commented on HADOOP-13849:
---------------------------------------

That makes sense Steve and Tao Li! Thanks for your efforts. Please keep us updated if you find any bottlenecks.
[jira] [Commented] (HADOOP-13849) Bzip2 java-builtin and system-native have almost the same compress speed
[ https://issues.apache.org/jira/browse/HADOOP-13849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15711831#comment-15711831 ]

Steve Loughran commented on HADOOP-13849:
-----------------------------------------

Well, if you want to work on it, feel free. However, know that the native codec uses the standard {{libbz2}}; there's not much that can be done in the Hadoop code to speed that up, other than improving how data is moved between the Java memory structures and those of libbz... if there are memory copies taking place, then that could be hurting performance. Anything that can help there would be good.

bq. I think the "system native" should have better compress/decompress performance than "java builtin".

That's something to explore. The latest Java 8 compilers are fast, and if the algorithms aren't doing lots of object creation, then bit operations in Java should be on a par with C-language actions against general registers. Where you would expect differences is if the native code uses special CPU registers and operations (for example, Intel SSE2) for significant performance gains. I don't know if bzip does that.

The fun part in benchmarking is isolating things. For codec performance, maybe have some test data pre-generated and cached in RAM, in standard formats (Avro, ORC) and with the different codecs, then compress that to RAM, not HDD, so that the compression code is isolated from disk IO, etc. If the isolated native code is faster than the Java one, then the implication is that the bottleneck is elsewhere in the workflow, not the codec. Again: that's interesting information.

bq. My hardware CPU/Memory/Network bandwidth/Disk bandwidth are not bottleneck

One of them is. Always. And it can be things like CPU cache latencies or excess synchronization in the code; even branch misprediction in the CPU can hurt efficiency.

FWIW, flame graphs are currently the tool of choice for visualising performance during microbenchmarks.
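The isolation idea in the comment above (data generated in RAM, compressed to RAM, no disk involved) could be sketched roughly as follows. This is a minimal illustration, not Hadoop code: the class, the `StreamCodec` interface and the generated word list are all hypothetical, and because the JDK ships no bzip2 implementation, `java.util.zip.GZIPOutputStream` is swapped in purely as a stand-in codec. It is also a hand-rolled timing loop; a proper microbenchmark harness would be more trustworthy.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Locale;
import java.util.Random;
import java.util.zip.GZIPOutputStream;

// Hypothetical harness: times a compressing stream with input and output in RAM.
class InMemoryCodecBench {
    /** Wraps a raw sink in a compressing stream; a stand-in for a real codec. */
    interface StreamCodec {
        OutputStream wrap(OutputStream sink) throws IOException;
    }

    /** Builds pseudo-text in RAM so that no disk read is part of the timing. */
    static byte[] generateText(int size, long seed) {
        String[] words = {"hadoop", "bzip2", "native", "java", "codec", "block", "speed"};
        Random rnd = new Random(seed);
        ByteArrayOutputStream out = new ByteArrayOutputStream(size + 16);
        while (out.size() < size) {
            byte[] w = words[rnd.nextInt(words.length)].getBytes(StandardCharsets.US_ASCII);
            out.write(w, 0, w.length);
            out.write(' ');
        }
        return Arrays.copyOf(out.toByteArray(), size);
    }

    /** Compresses the input into a RAM buffer and returns the compressed size. */
    static int compressOnce(StreamCodec codec, byte[] input) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream(input.length / 4);
        try (OutputStream out = codec.wrap(sink)) {
            out.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.size(); // read after close, so the stream trailer is included
    }

    /** Best-of-N elapsed nanoseconds, after warm-up runs to let the JIT settle. */
    static long bestNanos(StreamCodec codec, byte[] input, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            compressOnce(codec, input);
        }
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            compressOnce(codec, input);
            best = Math.min(best, System.nanoTime() - start);
        }
        return best;
    }

    public static void main(String[] args) {
        byte[] data = generateText(8 * 1024 * 1024, 42L);
        StreamCodec gzip = GZIPOutputStream::new; // stand-in, NOT bzip2
        int compressed = compressOnce(gzip, data);
        long nanos = bestNanos(gzip, data, 3, 5);
        double mbPerSec = (data.length / (1024.0 * 1024.0)) / (nanos / 1e9);
        System.out.printf(Locale.ROOT, "gzip stand-in: %d -> %d bytes, %.1f MB/s%n",
                data.length, compressed, mbPerSec);
    }
}
```

To compare the two bzip2 implementations this way, one would presumably supply a `StreamCodec` backed by a configured `BZip2Codec` (via `CompressionCodec.createOutputStream(OutputStream)`) with `io.compression.codec.bzip2.library` set to each value in turn; that wiring needs Hadoop on the classpath and is left out here.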
[jira] [Commented] (HADOOP-13849) Bzip2 java-builtin and system-native have almost the same compress speed
[ https://issues.apache.org/jira/browse/HADOOP-13849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15710658#comment-15710658 ]

Tao Li commented on HADOOP-13849:
---------------------------------

Yes. I think the "system native" should have better compress/decompress performance than "java builtin".
[jira] [Commented] (HADOOP-13849) Bzip2 java-builtin and system-native have almost the same compress speed
[ https://issues.apache.org/jira/browse/HADOOP-13849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15710655#comment-15710655 ]

Tao Li commented on HADOOP-13849:
---------------------------------

[~ste...@apache.org]
1. I saw "using java builtin" or "using system-native" in the logs of my test cases, so I am sure my test cases are correct.
2. My hardware (CPU, memory, network bandwidth, disk bandwidth) is not the bottleneck.
3. I have also tested decompress speed. I even found that "java builtin" is faster than "system native".
[jira] [Commented] (HADOOP-13849) Bzip2 java-builtin and system-native have almost the same compress speed
[ https://issues.apache.org/jira/browse/HADOOP-13849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15709305#comment-15709305 ]

Ravi Prakash commented on HADOOP-13849:
---------------------------------------

Hi Tao Li! Thanks for your effort to benchmark the two implementations. Are you proposing to make one faster than the other?
[jira] [Commented] (HADOOP-13849) Bzip2 java-builtin and system-native have almost the same compress speed
[ https://issues.apache.org/jira/browse/HADOOP-13849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15709247#comment-15709247 ]

Steve Loughran commented on HADOOP-13849:
-----------------------------------------

1. You can see which implementation is used in the logs: if it says "java builtin", then it's using the Java one; if it says system, then it's using the system-native one.
2. There are other factors in performance, like disk bandwidth, so you may not get a speedup.
3. Compare the decompress times too.

Closing as invalid, sorry: https://wiki.apache.org/hadoop/InvalidJiraIssues