[jira] [Commented] (HADOOP-13849) Bzip2 java-builtin and system-native have almost the same compress speed

2016-12-04 Thread Tao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15721131#comment-15721131
 ] 

Tao Li commented on HADOOP-13849:
-

[~ste...@apache.org] Thanks Steve. I will do some profiling in the future to 
find the bottleneck.

> Bzip2 java-builtin and system-native have almost the same compress speed
> 
>
> Key: HADOOP-13849
> URL: https://issues.apache.org/jira/browse/HADOOP-13849
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common
>Affects Versions: 2.6.0
> Environment: os version: redhat6
> hadoop version: 2.6.0
> native bzip2 version: bzip2-devel-1.0.5-7.el6_0.x86_64
>Reporter: Tao Li
>
> I tested bzip2 java-builtin and system-native compression, and I found the 
> compress speed is almost the same. (I think the system-native should have 
> better compress speed than java-builtin)
> My test case:
> 1. input file: 2.7GB text file without compression
> 2. after bzip2 java-builtin compress: 457MB, 12min 4sec
> 3. after bzip2 system-native compress: 457MB, 12min 19sec
> My MapReduce Config:
> conf.set("mapreduce.fileoutputcommitter.marksuccessfuljobs", "false");
> conf.set("mapreduce.output.fileoutputformat.compress", "true");
> conf.set("mapreduce.output.fileoutputformat.compress.type", "BLOCK");
> conf.set("mapreduce.output.fileoutputformat.compress.codec", 
> "org.apache.hadoop.io.compress.BZip2Codec");
> conf.set("io.compression.codec.bzip2.library", "java-builtin"); // for 
> java-builtin
> conf.set("io.compression.codec.bzip2.library", "system-native"); // for 
> system-native
> And I am sure I have enable the bzip2 native, the output of command "hadoop 
> checknative -a" is as follows:
> Native library checking:
> hadoop:  true /usr/lib/hadoop/lib/native/libhadoop.so.1.0.0
> zlib:true /lib64/libz.so.1
> snappy:  true /usr/lib/hadoop/lib/native/libsnappy.so.1
> lz4: true revision:99
> bzip2:   true /lib64/libbz2.so.1
> openssl: true /usr/lib64/libcrypto.so



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13849) Bzip2 java-builtin and system-native have almost the same compress speed

2016-12-01 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15712862#comment-15712862
 ] 

Ravi Prakash commented on HADOOP-13849:
---

That makes sense Steve and Tao Li! Thanks for your efforts. Please keep us 
updated if you find any bottlenecks. 

> Bzip2 java-builtin and system-native have almost the same compress speed
> 
>
> Key: HADOOP-13849
> URL: https://issues.apache.org/jira/browse/HADOOP-13849
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common
>Affects Versions: 2.6.0
> Environment: os version: redhat6
> hadoop version: 2.6.0
> native bzip2 version: bzip2-devel-1.0.5-7.el6_0.x86_64
>Reporter: Tao Li
>
> I tested bzip2 java-builtin and system-native compression, and I found the 
> compress speed is almost the same. (I think the system-native should have 
> better compress speed than java-builtin)
> My test case:
> 1. input file: 2.7GB text file without compression
> 2. after bzip2 java-builtin compress: 457MB, 12min 4sec
> 3. after bzip2 system-native compress: 457MB, 12min 19sec
> My MapReduce Config:
> conf.set("mapreduce.fileoutputcommitter.marksuccessfuljobs", "false");
> conf.set("mapreduce.output.fileoutputformat.compress", "true");
> conf.set("mapreduce.output.fileoutputformat.compress.type", "BLOCK");
> conf.set("mapreduce.output.fileoutputformat.compress.codec", 
> "org.apache.hadoop.io.compress.BZip2Codec");
> conf.set("io.compression.codec.bzip2.library", "java-builtin"); // for 
> java-builtin
> conf.set("io.compression.codec.bzip2.library", "system-native"); // for 
> system-native
> And I am sure I have enable the bzip2 native, the output of command "hadoop 
> checknative -a" is as follows:
> Native library checking:
> hadoop:  true /usr/lib/hadoop/lib/native/libhadoop.so.1.0.0
> zlib:true /lib64/libz.so.1
> snappy:  true /usr/lib/hadoop/lib/native/libsnappy.so.1
> lz4: true revision:99
> bzip2:   true /lib64/libbz2.so.1
> openssl: true /usr/lib64/libcrypto.so



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13849) Bzip2 java-builtin and system-native have almost the same compress speed

2016-12-01 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15711831#comment-15711831
 ] 

Steve Loughran commented on HADOOP-13849:
-

Well, if you want to work on it, feel free. 

however, know that the native codec uses the standard {{libbz2}}; there's not 
much that can be done in the Hadoop code to speed that up other than any 
improvements in how data is moved between the Java memory structures and those 
of libbz...if there are memory copies taking place then that could be hurting 
performance. Anything that can help there would be good.


bq. I think the "system native" should have better compress/decompress 
performance than "java builtin".

That's something to explore. The latest Java 8 compilers are fast, and if the 
algorithms aren't doing lots of object creation, then bit operations in Java 
should be on a par with C-language actions against general registers. Where you 
would expect differences is if the native code uses some special CPU registers 
and operations (example, Intel SSE2) for significant performance. I don't know 
if bzip does that.

The fun part in benchmarking is isolating things. For codec performance, maybe 
have some test data being pre generated in CPU & cached in RAM. in standard 
formats (avro, orc), and the different codecs, then compressing that to RAM not 
HDD, so that the compression code is isolated from Disk IO, etc, etc. 

If the isolated native code is faster than the java one, then the implication 
is that the bottleneck is elsewhere in the workflow, not the codec. Again: 
that's interesting information.

bq. My hardware CPU/Memory/Network bandwidh/Disk bandwidh are not bottleneck

one of them is. Always —and it can be things like CPU cache latencies, excess 
synchronization in the code, even branch-misprediction in the CPU can hurt 
efficiency. FWIW, Flamegraphs are current the tool of choice for visualising 
performance during microbenchmarks





> Bzip2 java-builtin and system-native have almost the same compress speed
> 
>
> Key: HADOOP-13849
> URL: https://issues.apache.org/jira/browse/HADOOP-13849
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common
>Affects Versions: 2.6.0
> Environment: os version: redhat6
> hadoop version: 2.6.0
> native bzip2 version: bzip2-devel-1.0.5-7.el6_0.x86_64
>Reporter: Tao Li
>
> I tested bzip2 java-builtin and system-native compression, and I found the 
> compress speed is almost the same. (I think the system-native should have 
> better compress speed than java-builtin)
> My test case:
> 1. input file: 2.7GB text file without compression
> 2. after bzip2 java-builtin compress: 457MB, 12min 4sec
> 3. after bzip2 system-native compress: 457MB, 12min 19sec
> My MapReduce Config:
> conf.set("mapreduce.fileoutputcommitter.marksuccessfuljobs", "false");
> conf.set("mapreduce.output.fileoutputformat.compress", "true");
> conf.set("mapreduce.output.fileoutputformat.compress.type", "BLOCK");
> conf.set("mapreduce.output.fileoutputformat.compress.codec", 
> "org.apache.hadoop.io.compress.BZip2Codec");
> conf.set("io.compression.codec.bzip2.library", "java-builtin"); // for 
> java-builtin
> conf.set("io.compression.codec.bzip2.library", "system-native"); // for 
> system-native
> And I am sure I have enable the bzip2 native, the output of command "hadoop 
> checknative -a" is as follows:
> Native library checking:
> hadoop:  true /usr/lib/hadoop/lib/native/libhadoop.so.1.0.0
> zlib:true /lib64/libz.so.1
> snappy:  true /usr/lib/hadoop/lib/native/libsnappy.so.1
> lz4: true revision:99
> bzip2:   true /lib64/libbz2.so.1
> openssl: true /usr/lib64/libcrypto.so



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13849) Bzip2 java-builtin and system-native have almost the same compress speed

2016-11-30 Thread Tao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15710658#comment-15710658
 ] 

Tao Li commented on HADOOP-13849:
-

Yes. I think the "system native" should have better compress/decompress 
performance than "java builtin".

> Bzip2 java-builtin and system-native have almost the same compress speed
> 
>
> Key: HADOOP-13849
> URL: https://issues.apache.org/jira/browse/HADOOP-13849
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common
>Affects Versions: 2.6.0
> Environment: os version: redhat6
> hadoop version: 2.6.0
> native bzip2 version: bzip2-devel-1.0.5-7.el6_0.x86_64
>Reporter: Tao Li
>
> I tested bzip2 java-builtin and system-native compression, and I found the 
> compress speed is almost the same. (I think the system-native should have 
> better compress speed than java-builtin)
> My test case:
> 1. input file: 2.7GB text file without compression
> 2. after bzip2 java-builtin compress: 457MB, 12min 4sec
> 3. after bzip2 system-native compress: 457MB, 12min 19sec
> My MapReduce Config:
> conf.set("mapreduce.fileoutputcommitter.marksuccessfuljobs", "false");
> conf.set("mapreduce.output.fileoutputformat.compress", "true");
> conf.set("mapreduce.output.fileoutputformat.compress.type", "BLOCK");
> conf.set("mapreduce.output.fileoutputformat.compress.codec", 
> "org.apache.hadoop.io.compress.BZip2Codec");
> conf.set("io.compression.codec.bzip2.library", "java-builtin"); // for 
> java-builtin
> conf.set("io.compression.codec.bzip2.library", "system-native"); // for 
> system-native
> And I am sure I have enable the bzip2 native, the output of command "hadoop 
> checknative -a" is as follows:
> Native library checking:
> hadoop:  true /usr/lib/hadoop/lib/native/libhadoop.so.1.0.0
> zlib:true /lib64/libz.so.1
> snappy:  true /usr/lib/hadoop/lib/native/libsnappy.so.1
> lz4: true revision:99
> bzip2:   true /lib64/libbz2.so.1
> openssl: true /usr/lib64/libcrypto.so



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13849) Bzip2 java-builtin and system-native have almost the same compress speed

2016-11-30 Thread Tao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15710655#comment-15710655
 ] 

Tao Li commented on HADOOP-13849:
-

[~ste...@apache.org]
1. I saw the "using java builtin" or "using system-native" in my test cases 
log, so I am sure my test cases are correct.
2. My hardware CPU/Memory/Network bandwidh/Disk bandwidh are not bottleneck
3. I have also tested decompress speed. I even found that the "java builtin" is 
faster than "system native"

> Bzip2 java-builtin and system-native have almost the same compress speed
> 
>
> Key: HADOOP-13849
> URL: https://issues.apache.org/jira/browse/HADOOP-13849
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common
>Affects Versions: 2.6.0
> Environment: os version: redhat6
> hadoop version: 2.6.0
> native bzip2 version: bzip2-devel-1.0.5-7.el6_0.x86_64
>Reporter: Tao Li
>
> I tested bzip2 java-builtin and system-native compression, and I found the 
> compress speed is almost the same. (I think the system-native should have 
> better compress speed than java-builtin)
> My test case:
> 1. input file: 2.7GB text file without compression
> 2. after bzip2 java-builtin compress: 457MB, 12min 4sec
> 3. after bzip2 system-native compress: 457MB, 12min 19sec
> My MapReduce Config:
> conf.set("mapreduce.fileoutputcommitter.marksuccessfuljobs", "false");
> conf.set("mapreduce.output.fileoutputformat.compress", "true");
> conf.set("mapreduce.output.fileoutputformat.compress.type", "BLOCK");
> conf.set("mapreduce.output.fileoutputformat.compress.codec", 
> "org.apache.hadoop.io.compress.BZip2Codec");
> conf.set("io.compression.codec.bzip2.library", "java-builtin"); // for 
> java-builtin
> conf.set("io.compression.codec.bzip2.library", "system-native"); // for 
> system-native
> And I am sure I have enable the bzip2 native, the output of command "hadoop 
> checknative -a" is as follows:
> Native library checking:
> hadoop:  true /usr/lib/hadoop/lib/native/libhadoop.so.1.0.0
> zlib:true /lib64/libz.so.1
> snappy:  true /usr/lib/hadoop/lib/native/libsnappy.so.1
> lz4: true revision:99
> bzip2:   true /lib64/libbz2.so.1
> openssl: true /usr/lib64/libcrypto.so



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13849) Bzip2 java-builtin and system-native have almost the same compress speed

2016-11-30 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15709305#comment-15709305
 ] 

Ravi Prakash commented on HADOOP-13849:
---

Hi Tao Li!

Thanks for your effort to benchmark the two implementations. Are you proposing 
to make one faster than the other?

> Bzip2 java-builtin and system-native have almost the same compress speed
> 
>
> Key: HADOOP-13849
> URL: https://issues.apache.org/jira/browse/HADOOP-13849
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common
>Affects Versions: 2.6.0
> Environment: os version: redhat6
> hadoop version: 2.6.0
> native bzip2 version: bzip2-devel-1.0.5-7.el6_0.x86_64
>Reporter: Tao Li
>
> I tested bzip2 java-builtin and system-native compression, and I found the 
> compress speed is almost the same. (I think the system-native should have 
> better compress speed than java-builtin)
> My test case:
> 1. input file: 2.7GB text file without compression
> 2. after bzip2 java-builtin compress: 457MB, 12min 4sec
> 3. after bzip2 system-native compress: 457MB, 12min 19sec
> My MapReduce Config:
> conf.set("mapreduce.fileoutputcommitter.marksuccessfuljobs", "false");
> conf.set("mapreduce.output.fileoutputformat.compress", "true");
> conf.set("mapreduce.output.fileoutputformat.compress.type", "BLOCK");
> conf.set("mapreduce.output.fileoutputformat.compress.codec", 
> "org.apache.hadoop.io.compress.BZip2Codec");
> conf.set("io.compression.codec.bzip2.library", "java-builtin"); // for 
> java-builtin
> conf.set("io.compression.codec.bzip2.library", "system-native"); // for 
> system-native
> And I am sure I have enable the bzip2 native, the output of command "hadoop 
> checknative -a" is as follows:
> Native library checking:
> hadoop:  true /usr/lib/hadoop/lib/native/libhadoop.so.1.0.0
> zlib:true /lib64/libz.so.1
> snappy:  true /usr/lib/hadoop/lib/native/libsnappy.so.1
> lz4: true revision:99
> bzip2:   true /lib64/libbz2.so.1
> openssl: true /usr/lib64/libcrypto.so



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13849) Bzip2 java-builtin and system-native have almost the same compress speed

2016-11-30 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15709247#comment-15709247
 ] 

Steve Loughran commented on HADOOP-13849:
-

# you can see what code is used in the logs; if it says "java builtin" then 
it's using the java one; if it says system, then its using system.
# there are other factors in performance, like disk bandwidth. you may not get 
speedup.
# compare the decompress times too.

Closing as invalid, sorry
https://wiki.apache.org/hadoop/InvalidJiraIssues 

> Bzip2 java-builtin and system-native have almost the same compress speed
> 
>
> Key: HADOOP-13849
> URL: https://issues.apache.org/jira/browse/HADOOP-13849
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common
>Affects Versions: 2.6.0
> Environment: os version: redhat6
> hadoop version: 2.6.0
> native bzip2 version: bzip2-devel-1.0.5-7.el6_0.x86_64
>Reporter: Tao Li
>
> I tested bzip2 java-builtin and system-native compression, and I found the 
> compress speed is almost the same. (I think the system-native should have 
> better compress speed than java-builtin)
> My test case:
> 1. input file: 2.7GB text file without compression
> 2. after bzip2 java-builtin compress: 457MB, 12min 4sec
> 3. after bzip2 system-native compress: 457MB, 12min 19sec
> My MapReduce Config:
> conf.set("mapreduce.fileoutputcommitter.marksuccessfuljobs", "false");
> conf.set("mapreduce.output.fileoutputformat.compress", "true");
> conf.set("mapreduce.output.fileoutputformat.compress.type", "BLOCK");
> conf.set("mapreduce.output.fileoutputformat.compress.codec", 
> "org.apache.hadoop.io.compress.BZip2Codec");
> conf.set("io.compression.codec.bzip2.library", "java-builtin"); // for 
> java-builtin
> conf.set("io.compression.codec.bzip2.library", "system-native"); // for 
> system-native
> And I am sure I have enable the bzip2 native, the output of command "hadoop 
> checknative -a" is as follows:
> Native library checking:
> hadoop:  true /usr/lib/hadoop/lib/native/libhadoop.so.1.0.0
> zlib:true /lib64/libz.so.1
> snappy:  true /usr/lib/hadoop/lib/native/libsnappy.so.1
> lz4: true revision:99
> bzip2:   true /lib64/libbz2.so.1
> openssl: true /usr/lib64/libcrypto.so



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org