Hi, I am using Apache Whirr 0.6.0-incubating to create small (10-20 node) Hadoop clusters on Amazon EC2 for testing.
In my hadoop-ec2.properties I have:

```
whirr.hardware-id=m1.large
# See http://alestic.com/
whirr.image-id=eu-west-1/ami-8293a5f6
whirr.location-id=eu-west-1
whirr.hadoop.version=0.20.204.0
whirr.hadoop.tarball.url=http://archive.apache.org/dist/hadoop/core/hadoop-${whirr.hadoop.version}/hadoop-${whirr.hadoop.version}.tar.gz
```

I am able to run MapReduce jobs successfully, but I see errors as soon as I try to enable compression (either for the map output or for the SequenceFile produced at the end of a job). In the task logs I see this warning, which I think is relevant:

```
WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
```

This is what I have in my driver(s), for the intermediate map output:

```java
if (useCompression) {
    configuration.setBoolean("mapred.compress.map.output", true);
    configuration.set("mapred.output.compression.type", "BLOCK");
    configuration.set("mapred.map.output.compression.codec",
                      "org.apache.hadoop.io.compress.GzipCodec");
}
```

And for the final job output:

```java
if (useCompression) {
    SequenceFileOutputFormat.setCompressOutput(job, true);
    SequenceFileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
    SequenceFileOutputFormat.setOutputCompressionType(job, CompressionType.BLOCK);
}
```

As you can see, I am trying to use a simple GzipCodec, nothing strange. What am I doing wrong? What should I do in order to be able to use compression in my MapReduce jobs?

Thank you in advance for your help,
Paolo
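In case it matters, I believe the same map-output compression settings from my driver could also be applied cluster-wide via mapred-site.xml instead of per job; this is just a sketch using the same deprecated 0.20.x property names as in my code above:

```xml
<!-- Sketch only: cluster-wide equivalent of the per-job driver settings,
     using the old 0.20.x-era property names. -->
<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
</property>
<property>
  <name>mapred.map.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.GzipCodec</value>
</property>
```

I have not tried this variant on the Whirr-provisioned cluster yet; I mention it only to show that the settings themselves look standard.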
