I tried the following and it does not work:

    mahout kmeans -i /mahout/sparse/test1/tfidf-vectors -c /mahout/initial-clusters/test1 -o /mahout/kmeans/test1 -k 10000 -cd 0.01 -x 100 \
        -Dmapreduce.map.output.compress=false

    mahout kmeans -i /mahout/sparse/test1/tfidf-vectors -c /mahout/initial-clusters/test1 -o /mahout/kmeans/test1 -k 10000 -cd 0.01 -x 100 \
        -Dmapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec

I'm still getting the default codec (Snappy in this case). I don't want users to have to install native Snappy, which is why I'm trying to override this parameter. As far as I can tell, passing -Dkey=value on the mahout command line has no effect on the MapReduce job configuration.

Any ideas?

-Luke

On 3/6/12 3:48 PM, "Sean Owen" wrote:

>Mapper compression? -Dmapreduce.map.output.compress=false. I think the
>key was mapred.output.compress in Hadoop 0.20.0.
>I am not sure if there is reducer compression built in, but I could
>have missed it.
>
>On Tue, Mar 6, 2012 at 9:40 PM, Luke Forehand wrote:
>> Hello,
>>
>> Is there a way to run the mahout kmeans program from the command line
>> with a parameter that will override (and disable) the reducer task
>> compression? I have tried several different ways of specifying the -D
>> parameter, but I can't seem to get any options to pass through to the
>> Hadoop MapReduce configuration.
>>
>> Thanks!
>> Luke
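One thing worth checking is argument order. Hadoop's documented command syntax is `command [genericOptions] [commandOptions]`, so if the kmeans driver parses its arguments through Hadoop's ToolRunner/GenericOptionsParser (an assumption, not confirmed in this thread), a -D setting placed after -i/-c/-o/-x would not be treated as a generic option. The sketch below is a hypothetical helper, `reorder_generic_opts`, not part of Mahout: it only prints the kmeans options with any attached-form -Dkey=value settings moved to the front, as a dry run.

```shell
#!/bin/sh
# Hypothetical helper (not part of Mahout or Hadoop): prints its arguments
# with every -Dkey=value option moved ahead of the program-specific options,
# matching the "command [genericOptions] [commandOptions]" order.
# Assumptions: options use the attached form -Dkey=value (not "-D key=value"),
# and no argument contains spaces.
reorder_generic_opts() {
  out=""
  # First pass: collect the generic -D options in their original order.
  for arg in "$@"; do
    case "$arg" in
      -D*) out="$out $arg" ;;
    esac
  done
  # Second pass: append everything else in its original order.
  for arg in "$@"; do
    case "$arg" in
      -D*) ;;
      *) out="$out $arg" ;;
    esac
  done
  # Strip the single leading space and print (dry run, nothing is executed).
  printf '%s\n' "${out# }"
}
```

For the second command above, `reorder_generic_opts -i /mahout/sparse/test1/tfidf-vectors -x 100 -Dmapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec` prints the codec setting first, so the resulting line would read `mahout kmeans -Dmapred.output.compression.codec=... -i ... -x 100`.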
