I tried both of the following, and neither works:

mahout kmeans -i /mahout/sparse/test1/tfidf-vectors \
  -c /mahout/initial-clusters/test1 -o /mahout/kmeans/test1 \
  -k 10000 -cd 0.01 -x 100 \
  -Dmapreduce.map.output.compress=false

mahout kmeans -i /mahout/sparse/test1/tfidf-vectors \
  -c /mahout/initial-clusters/test1 -o /mahout/kmeans/test1 \
  -k 10000 -cd 0.01 -x 100 \
  -Dmapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec


In both cases the default codec is still used, which is Snappy here. I
don't want users to have to install native Snappy, which is why I'm trying
to override this parameter. As far as I can tell, passing -Dkey=value on
the mahout command line has no effect on the MapReduce job configuration.
Any ideas?
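
One variation that may be worth trying is to move the -D pair ahead of the
job-specific options. This rests on an assumption that is not confirmed
anywhere in this thread: that the kmeans driver is launched through Hadoop's
ToolRunner, whose generic-option parsing stops at the first argument it does
not recognize, so a -D placed after -i would never reach the job
configuration. The sketch below only prints the reordered command as a dry
run; the variable names are illustrative.

```shell
#!/bin/sh
# Dry-run sketch: prints the reordered command instead of running it.
# Assumption (unverified here): generic Hadoop options such as -Dkey=value
# are only honored when they appear before the job-specific options.

# Generic Hadoop option, placed first.
GENERIC_OPTS="-Dmapreduce.map.output.compress=false"

# Job-specific kmeans options, copied unchanged from the commands above.
JOB_OPTS="-i /mahout/sparse/test1/tfidf-vectors -c /mahout/initial-clusters/test1 -o /mahout/kmeans/test1 -k 10000 -cd 0.01 -x 100"

CMD="mahout kmeans $GENERIC_OPTS $JOB_OPTS"
echo "$CMD"
```

If the printed command looks right, it can be run directly; whether it
actually changes the codec depends on the assumption above holding for the
Mahout and Hadoop versions in use.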

-Luke

On 3/6/12 3:48 PM, "Sean Owen" <[email protected]> wrote:

>Mapper compression? -Dmapreduce.map.output.compress=false. I think the
>key was mapred.output.compress in Hadoop 0.20.0.
>I am not sure if there is reducer compression built-in, but, I could
>have missed it.
>
>On Tue, Mar 6, 2012 at 9:40 PM, Luke Forehand
><[email protected]> wrote:
>> Hello,
>>
>> Is there a way to run the mahout kmeans program from the command line,
>>with a parameter that will override (and disable) the reducer task
>>compression?  I have tried several different ways of specifying -D
>>parameter but I can't seem to get any options to pass through to the
>>hadoop mapreduce configuration.
>>
>> Thanks!
>> Luke
