Re: override mapreduce compression?

Sean Owen Tue, 06 Mar 2012 16:24:43 -0800

-D arguments are JVM arguments, so they need to be set in HADOOP_OPTS
(as I recall), or you can configure this in your Hadoop config files.
They have no meaning to the driver script. Why do you want to disable
compression after the mapper?
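
For the config-file route, a minimal sketch might look like the fragment
below. It assumes the property goes in mapred-site.xml on the machine that
submits the job, and that your Hadoop version uses the key name
mapreduce.map.output.compress; older releases name this key differently, so
check the defaults file shipped with your version.

```xml
<!-- Sketch only: assumed to sit inside <configuration> in mapred-site.xml -->
<property>
  <!-- Key name taken from this thread; older Hadoop releases use another name -->
  <name>mapreduce.map.output.compress</name>
  <value>false</value>
</property>
```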


On Wed, Mar 7, 2012 at 12:11 AM, Luke Forehand
<[email protected]> wrote:
> I tried the following and it does not work:
>
> mahout kmeans -i /mahout/sparse/test1/tfidf-vectors -c
> /mahout/initial-clusters/test1 -o /mahout/kmeans/test1 -k 10000 -cd 0.01
> -x 100 \
> -Dmapreduce.map.output.compress=false
>
> mahout kmeans -i /mahout/sparse/test1/tfidf-vectors -c
> /mahout/initial-clusters/test1 -o /mahout/kmeans/test1 -k 10000 -cd 0.01
> -x 100 \
> -Dmapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec
>
>
> I am still seeing the default codec being used (Snappy in this case; I
> don't want users to have to install native Snappy, which is why I'm
> trying to override this parameter).  Passing -Dkey=value on the mahout
> command line does not seem to have any effect on the mapreduce job
> configuration, from what I can tell.  Any ideas?
>
> -Luke
>
> On 3/6/12 3:48 PM, "Sean Owen" <[email protected]> wrote:
>
>>Mapper compression? -Dmapreduce.map.output.compress=false. I think the
>>key was mapred.compress.map.output in Hadoop 0.20.0.
>>I am not sure if there is reducer compression built in, but I could
>>have missed it.
>>
>>On Tue, Mar 6, 2012 at 9:40 PM, Luke Forehand
>><[email protected]> wrote:
>>> Hello,
>>>
>>> Is there a way to run the mahout kmeans program from the command line,
>>>with a parameter that will override (and disable) the reducer task
>>>compression?  I have tried several different ways of specifying -D
>>>parameter but I can't seem to get any options to pass through to the
>>>hadoop mapreduce configuration.
>>>
>>> Thanks!
>>> Luke
>
