Can someone tell me why native bzip2 compression/decompression works in
Hadoop 2.4.1 for map output compression, but the Java bzip2 implementation
is used for input file decompression? Is this expected?

While profiling some Hadoop wordcount jobs using a bzip2-compressed input
file, it looks like bzip2 decompression is using the Java implementation
rather than the native library for input file decompression. Output from
the Linux perf tool (see below) shows that the Java bzip2 implementation
is used.
1.83% java perf-12473.map [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.read0()I
1.42% java perf-11567.map [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.read0()I
1.16% java perf-12473.map [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.getAndMoveToFrontDecode()V
1.05% java perf-12174.map [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.read0()I
0.99% java perf-11770.map [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.read0()I
0.98% java perf-12826.map [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.read0()I
0.89% java perf-12174.map [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.getAndMoveToFrontDecode()V
0.79% java perf-12739.map [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.read0()I
0.79% java perf-12544.map [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.read0()I
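To get a rough idea of how much of the sampled time lands in the Java bzip2 code, the rows above can be summed per symbol. The following is only a minimal sketch: the helper name total_by_symbol and the regular expression are illustrative assumptions, not part of perf or Hadoop, and it covers only the rows quoted in this post, not the full profile.

```python
import re
from collections import defaultdict

# Rows copied from the perf output in this post, each sample joined onto one line.
PERF_ROWS = """\
1.83% java perf-12473.map [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.read0()I
1.42% java perf-11567.map [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.read0()I
1.16% java perf-12473.map [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.getAndMoveToFrontDecode()V
1.05% java perf-12174.map [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.read0()I
0.99% java perf-11770.map [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.read0()I
0.98% java perf-12826.map [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.read0()I
0.89% java perf-12174.map [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.getAndMoveToFrontDecode()V
0.79% java perf-12739.map [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.read0()I
0.79% java perf-12544.map [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.read0()I
"""

# Matches "<pct>% <command> <map file> [.] <symbol>" (layout assumed from the rows above).
ROW = re.compile(r"^\s*(\d+(?:\.\d+)?)%\s+\S+\s+\S+\s+\[\.\]\s+(\S+)\s*$")


def total_by_symbol(perf_text):
    """Sum the sample percentages of all rows that share a symbol."""
    totals = defaultdict(float)
    for line in perf_text.splitlines():
        match = ROW.match(line)
        if match:
            totals[match.group(2)] += float(match.group(1))
    return dict(totals)


if __name__ == "__main__":
    ranked = sorted(total_by_symbol(PERF_ROWS).items(), key=lambda kv: -kv[1])
    for symbol, pct in ranked:
        print(f"{pct:5.2f}%  {symbol}")
```

For the rows listed, this attributes about 7.85% of samples to read0() and about 2.05% to getAndMoveToFrontDecode(), spread across several JVM processes.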
When using the perf tool to check map output compression, it shows that the
native library version is correctly used.

This cluster is running Apache Hadoop version 2.4.1, which has been compiled
from source to include native compression libraries for bzip2 et al. on
64-bit Ubuntu 12.04. Checknative shows that the native compression libraries
should be used:
hadoop checknative -a
14/10/07 15:15:57 INFO bzip2.Bzip2Factory: Successfully loaded & initialized native-bzip2 library system-native
14/10/07 15:15:57 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
Native library checking:
hadoop: true /usr/local/hadoop-local-build/hadoop-2.4.1/lib/native/libhadoop.so.1.0.0
zlib: true /lib/x86_64-linux-gnu/libz.so.1
snappy: true /usr/lib/libsnappy.so.1
lz4: true revision:99
bzip2: true /lib/x86_64-linux-gnu/libbz2.so.1
I have verified that the io.compression.codec.bzip2.library configuration
property is left at its default value, system-native.
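One way to double-check that setting outside of Hadoop is to read it straight from a site configuration file. This is a minimal sketch, assuming the usual Hadoop <configuration>/<property>/<name>/<value> XML layout; the function name bzip2_library and the sample XML are hypothetical, and the sketch ignores `final` properties and the layering of multiple configuration files.

```python
import xml.etree.ElementTree as ET

KEY = "io.compression.codec.bzip2.library"
DEFAULT_BZIP2_LIBRARY = "system-native"  # default value, as stated in this post


def bzip2_library(site_xml_text, key=KEY):
    """Return the configured bzip2 library, or the default if the key is absent."""
    root = ET.fromstring(site_xml_text)
    value = DEFAULT_BZIP2_LIBRARY
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == key:
            # Simplification: a later entry in the same file wins.
            value = (prop.findtext("value") or "").strip()
    return value


if __name__ == "__main__":
    # Hypothetical file content; a real cluster would read its own core-site.xml.
    sample = """
    <configuration>
      <property>
        <name>io.compression.codec.bzip2.library</name>
        <value>system-native</value>
      </property>
    </configuration>
    """
    print(bzip2_library(sample))
```

A value of java-builtin here would force the pure-Java codec, so it is worth ruling out before looking elsewhere.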
Thanks,
Don