Also, if you want the final job outputs compressed with LZO, set "mapred.output.compression.codec" to the LZO codec class. You currently have it set to Snappy.
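
As a rough sketch only: assuming the hadoop-lzo classes are on the job's classpath (your properties already reference com.hadoop.compression.lzo.LzoCodec, so I am reusing that class name here), the output codec entry in your pig job properties would look something like this:

    <!-- Assumption: hadoop-lzo is installed; class name taken from your own config -->
    <property>
      <name>mapred.output.compress</name>
      <value>true</value>
    </property>
    <property>
      <name>mapred.output.compression.codec</name>
      <value>com.hadoop.compression.lzo.LzoCodec</value>
    </property>

If you need the output files to be readable by the lzop tool, hadoop-lzo also ships com.hadoop.compression.lzo.LzopCodec, which may be the better fit; which one you want depends on how the outputs are consumed downstream.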

On Wed, Feb 1, 2012 at 2:04 PM, Marek Miglinski <[email protected]> wrote:
> Hello guys,
>
> I have Cloudera's CDH3U2 package installed on a 3 node cluster and I've
> added to mapred-site:
> <property>
>   <name>mapred.compress.map.output</name>
>   <value>true</value>
> </property>
>
> <property>
>   <name>mapred.map.output.compression.codec</name>
>   <value>org.apache.hadoop.io.compress.SnappyCodec</value>
> </property>
>
> Also to my pig job properties:
> <property>
>   <name>io.compression.codec.lzo.class</name>
>   <value>com.hadoop.compression.lzo.LzoCodec</value>
> </property>
> <property>
>   <name>pig.tmpfilecompression</name>
>   <value>true</value>
> </property>
> <property>
>   <name>pig.tmpfilecompression.codec</name>
>   <value>lzo</value>
> </property>
> <property>
>   <name>mapred.output.compress</name>
>   <value>true</value>
> </property>
> <property>
>   <name>mapred.output.compression.codec</name>
>   <value>org.apache.hadoop.io.compress.SnappyCodec</value>
> </property>
> <property>
>   <name>mapred.output.compression.type</name>
>   <value>BLOCK</value>
> </property>
> <property>
>   <name>mapred.compress.map.output</name>
>   <value>true</value>
> </property>
> <property>
>   <name>mapred.map.output.compression.codec</name>
>   <value>org.apache.hadoop.io.compress.SnappyCodec</value>
> </property>
> <property>
>   <name>mapreduce.map.output.compress</name>
>   <value>true</value>
> </property>
> <property>
>   <name>mapreduce.map.output.compress.codec</name>
>   <value>org.apache.hadoop.io.compress.SnappyCodec</value>
> </property>
>
> So I want PIG to compress its data with LZO but mapreduce with Snappy, but
> as I see in the tasktracker details (Map Bytes Out) data is not compressed at
> all, which reduces performance a lot (IO is 100% most of the time)... What am
> I doing wrong and how do I fix it?
>
>
> Thanks,
> Marek M.

-- 
Harsh J
Customer Ops. Engineer
Cloudera | http://tiny.cloudera.com/about
