OK, but you're talking about reducer output, not mapper output. It should not be compressed in the first place.

On Mar 7, 2012 12:29 AM, "Luke Forehand" < [email protected]> wrote:
> I want the results of the kmeans clustering to be uncompressed or
> compressed in a way that my users can natively decompress on their
> machines. All our other hadoop jobs use Snappy compression when writing
> output, but our users don't have Snappy and don't particularly want to
> install it (especially because of problems installing on mac). I'll try
> adding this param to the HADOOP_OPTS and in the longterm probably come up
> with a cleaner way to do this. Thanks!
>
> -Luke
>
> On 3/6/12 6:24 PM, "Sean Owen" <[email protected]> wrote:
>
> >-D arguments are to the JVM so need to be set in HADOOP_OPTS (as I
> >recall). Or you configure this in your Hadoop config files. It has no
> >meaning to the driver script. Why do you want to disable compression
> >after the mapper?
> >
> >On Wed, Mar 7, 2012 at 12:11 AM, Luke Forehand
> ><[email protected]> wrote:
> >> I tried the following and it does not work:
> >>
> >> mahout kmeans -i /mahout/sparse/test1/tfidf-vectors -c
> >> /mahout/initial-clusters/test1 -o /mahout/kmeans/test1 -k 10000 -cd 0.01
> >> -x 100 \
> >> -Dmapreduce.map.output.compress=false
> >>
> >> mahout kmeans -i /mahout/sparse/test1/tfidf-vectors -c
> >> /mahout/initial-clusters/test1 -o /mahout/kmeans/test1 -k 10000 -cd 0.01
> >> -x 100 \
> >> -Dmapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec
> >>
> >> And still getting the default codec being used (which is Snappy in this
> >> case and I don't want the users to have to install native snappy which is
> >> why I'm trying to override this param). Passing -Dkey=value on the mahout
> >> command line does not seem to have any effect on the mapreduce job
> >> configuration from what I can tell. Any ideas?
> >>
> >> -Luke
> >>
> >> On 3/6/12 3:48 PM, "Sean Owen" <[email protected]> wrote:
> >>
> >>>Mapper compression? -Dmapreduce.map.output.compress=false. I think the
> >>>key was mapred.output.compress in Hadoop 0.20.0.
> >>>I am not sure if there is reducer compression built-in, but, I could
> >>>have missed it.
> >>>
> >>>On Tue, Mar 6, 2012 at 9:40 PM, Luke Forehand
> >>><[email protected]> wrote:
> >>>> Hello,
> >>>>
> >>>> Is there a way to run the mahout kmeans program from the command line,
> >>>> with a parameter that will override (and disable) the reducer task
> >>>> compression? I have tried several different ways of specifying -D
> >>>> parameter but I can't seem to get any options to pass through to the
> >>>> hadoop mapreduce configuration.
> >>>>
> >>>> Thanks!
> >>>> Luke
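
For reference, one thing that may be worth trying, as a sketch rather than a confirmed fix: Hadoop's generic option parsing expects -D overrides to come before the job-specific options, so the commands quoted above (which put -D last) may simply have had the override ignored. The sketch below assumes the Mahout kmeans driver hands its arguments to that parser, and it uses mapred.output.compress, the Hadoop 0.20/1.x key for final job output compression (newer releases call it mapreduce.output.fileoutputformat.compress). Paths and option values are the ones from the thread. The script only prints the command so it can be reviewed first.

```shell
# Sketch only. Assumption: the Mahout driver forwards -D generic options to
# the Hadoop job configuration when they appear before the job options.
# mapred.output.compress is the Hadoop 0.20/1.x property for job (reducer)
# output compression; adjust the name for newer Hadoop releases.

INPUT=/mahout/sparse/test1/tfidf-vectors
CLUSTERS=/mahout/initial-clusters/test1
OUTPUT=/mahout/kmeans/test1

# Put the -D override first, then the kmeans options from the thread.
CMD="mahout kmeans -Dmapred.output.compress=false -i $INPUT -c $CLUSTERS -o $OUTPUT -k 10000 -cd 0.01 -x 100"

# Print the command for review; run it by hand once it looks right.
echo "$CMD"
```

If the override is picked up, the job configuration shown in the JobTracker UI should list mapred.output.compress as false; if it still shows the cluster default, the option is not reaching the job and the Hadoop config files are the fallback.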
