Hi Andrei,

thank you for your reply. As I wrote in my previous message, I have this in my
hadoop-ec2.properties file:

  whirr.hadoop.version=0.20.204.0
  whirr.hadoop.tarball.url=http://archive.apache.org/dist/hadoop/core/hadoop-${whirr.hadoop.version}/hadoop-${whirr.hadoop.version}.tar.gz
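For reference, I bring the cluster up with the standard Whirr CLI, roughly like
this (just a sketch; paths and the AWS credentials in whirr.identity /
whirr.credential are set up separately):

  bin/whirr launch-cluster --config hadoop-ec2.properties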
Therefore I am using the Apache Hadoop 0.20.204.0 release.

Can anyone confirm that compression cannot be used with Hadoop clusters
installed via Whirr?

Paolo

On 7 October 2011 14:23, Andrei Savu <[email protected]> wrote:
> I am not sure but I think you need a custom Hadoop build to support
> compression. Are you using the Apache release or CDH?
>
> On Oct 7, 2011 3:57 PM, "Paolo Castagna" <[email protected]> wrote:
>>
>> Hi,
>> I am using Apache Whirr 0.6.0-incubating to create small (i.e. 10-20 nodes)
>> Hadoop clusters on Amazon EC2 for testing.
>>
>> In my hadoop-ec2.properties I have:
>>
>>   whirr.hardware-id=m1.large
>>   # See http://alestic.com/
>>   whirr.image-id=eu-west-1/ami-8293a5f6
>>   whirr.location-id=eu-west-1
>>   whirr.hadoop.version=0.20.204.0
>>   whirr.hadoop.tarball.url=http://archive.apache.org/dist/hadoop/core/hadoop-${whirr.hadoop.version}/hadoop-${whirr.hadoop.version}.tar.gz
>>
>> I am able to successfully run MapReduce jobs, but I see errors as soon as I
>> try to enable compression (either for the map output or for the SequenceFile
>> at the end of a job).
>>
>> In the task logs I see this warning, which I think is relevant:
>>
>>   WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load
>>   native-hadoop library for your platform... using builtin-java classes
>>   where applicable
>>
>> This is what I have in my "driver(s)", for the intermediate map output:
>>
>>   if ( useCompression ) {
>>       configuration.setBoolean("mapred.compress.map.output", true);
>>       configuration.set("mapred.output.compression.type", "BLOCK");
>>       configuration.set("mapred.map.output.compression.codec",
>>                         "org.apache.hadoop.io.compress.GzipCodec");
>>   }
>>
>> For the final job output:
>>
>>   if ( useCompression ) {
>>       SequenceFileOutputFormat.setCompressOutput(job, true);
>>       SequenceFileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
>>       SequenceFileOutputFormat.setOutputCompressionType(job, CompressionType.BLOCK);
>>   }
>>
>> As you can see, I am trying to use a simple GzipCodec, nothing strange.
>>
>> What am I doing wrong?
>> What should I do in order to be able to use compression in my MapReduce jobs?
>>
>> Thank you in advance for your help,
>> Paolo
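
For completeness, this is a quick check I could run on one of the nodes to see
whether the native-hadoop library and native zlib are actually available there
(a minimal sketch of mine, assuming the NativeCodeLoader and ZlibFactory APIs
shipped with Hadoop 0.20.x; the NativeCheck class name is just for illustration):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.io.compress.zlib.ZlibFactory;
  import org.apache.hadoop.util.NativeCodeLoader;

  // Prints whether libhadoop was found on java.library.path and whether
  // native zlib can be used for compression/decompression.
  public class NativeCheck {
      public static void main(String[] args) {
          Configuration conf = new Configuration();
          System.out.println("native-hadoop loaded: " + NativeCodeLoader.isNativeCodeLoaded());
          System.out.println("native zlib loaded:   " + ZlibFactory.isNativeZlibLoaded(conf));
      }
  }

I could compile it against the Hadoop core jar on a node and run it with the
same classpath and java.library.path the task JVMs use.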
