Why should it not be compressed in the first place?

Here is the header of one of the reducer part files that was written to
/mahout/kmeans/clusters-5-final:

SEQorg.apache.hadoop.io.Text+org.apache.mahout.clustering.kmeans.Cluster
)org.apache.hadoop.io.compress.SnappyCodec
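
For anyone who wants to check which codec a part file uses without
eyeballing the raw bytes, here is a minimal sketch that decodes the same
header fields. It assumes the usual SequenceFile header layout (the "SEQ"
magic, a version byte, the key and value class names, two compression
flags, then the codec class name) and a file version of 5 or later; the
function names are my own, not part of Hadoop. The "+" and ")" in the dump
above appear to be the one-byte length prefixes (43 and 41) of the strings
that follow them.

```python
import io


def read_vint(stream):
    """Decode one Hadoop variable-length integer (WritableUtils format)."""
    first = int.from_bytes(stream.read(1), "big", signed=True)
    if first >= -112:
        return first  # value fits in the single leading byte
    negative = first < -120
    # Number of payload bytes that follow the leading byte.
    size = (-120 - first) if negative else (-112 - first)
    value = int.from_bytes(stream.read(size), "big")
    return ~value if negative else value


def read_text(stream):
    """Read a length-prefixed UTF-8 string (a vint length, then the bytes)."""
    length = read_vint(stream)
    return stream.read(length).decode("utf-8")


def read_header(stream):
    """Parse the leading SequenceFile header fields (assumes version >= 5)."""
    if stream.read(3) != b"SEQ":
        raise ValueError("not a SequenceFile")
    version = stream.read(1)[0]
    key_class = read_text(stream)
    value_class = read_text(stream)
    compressed = stream.read(1) != b"\x00"
    block_compressed = stream.read(1) != b"\x00"
    # The codec class name is only present when compression is enabled.
    codec = read_text(stream) if compressed else None
    return {
        "version": version,
        "key_class": key_class,
        "value_class": value_class,
        "compressed": compressed,
        "block_compressed": block_compressed,
        "codec": codec,
    }


# Hand-built bytes mirroring the header shown above (illustrative only;
# the version byte and the block-compression flag are assumed values).
sample = (
    b"SEQ\x06"
    + bytes([25]) + b"org.apache.hadoop.io.Text"
    + bytes([43]) + b"org.apache.mahout.clustering.kmeans.Cluster"
    + b"\x01\x00"
    + bytes([41]) + b"org.apache.hadoop.io.compress.SnappyCodec"
)
header = read_header(io.BytesIO(sample))
print(header["codec"])
```

To use it on a real file, copy one of the part files to local disk and
pass it in opened in binary mode, e.g. read_header(open(path, "rb")).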


On 3/6/12 6:33 PM, "Sean Owen" <[email protected]> wrote:

>Ok but you're talking about reducer output not mapper. It should not be
>compressed in the first place.
>On Mar 7, 2012 12:29 AM, "Luke Forehand" <
>[email protected]> wrote:
>
>> I want the results of the kmeans clustering to be uncompressed or
>> compressed in a way that my users can natively decompress on their
>> machines.  All our other hadoop jobs use Snappy compression when writing
>> output, but our users don't have Snappy and don't particularly want to
>> install it (especially because of problems installing on mac).  I'll try
>> adding this param to the HADOOP_OPTS and in the long term probably come up
>> with a cleaner way to do this.  Thanks!
>>
>> -Luke
>>
>> On 3/6/12 6:24 PM, "Sean Owen" <[email protected]> wrote:
>>
>> >-D arguments are to the JVM so need to be set in HADOOP_OPTS (as I
>> >recall). Or you configure this in your Hadoop config files.  It has no
>> >meaning to the driver script. Why do you want to disable compression
>> >after the mapper?
>> >
>> >On Wed, Mar 7, 2012 at 12:11 AM, Luke Forehand
>> ><[email protected]> wrote:
>> >> I tried the following and it does not work:
>> >>
>> >> mahout kmeans -i /mahout/sparse/test1/tfidf-vectors \
>> >>   -c /mahout/initial-clusters/test1 -o /mahout/kmeans/test1 \
>> >>   -k 10000 -cd 0.01 -x 100 \
>> >>   -Dmapreduce.map.output.compress=false
>> >>
>> >> mahout kmeans -i /mahout/sparse/test1/tfidf-vectors \
>> >>   -c /mahout/initial-clusters/test1 -o /mahout/kmeans/test1 \
>> >>   -k 10000 -cd 0.01 -x 100 \
>> >>   -Dmapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec
>> >>
>> >>
>> >> I am still getting the default codec (which is Snappy in this case;
>> >> I don't want the users to have to install native Snappy, which is
>> >> why I'm trying to override this param).  Passing -Dkey=value on the
>> >> mahout command line does not seem to have any effect on the mapreduce
>> >> job configuration from what I can tell.  Any ideas?
>> >>
>> >> -Luke
>> >>
>> >> On 3/6/12 3:48 PM, "Sean Owen" <[email protected]> wrote:
>> >>
>> >>>Mapper compression? -Dmapreduce.map.output.compress=false. I think the
>> >>>key was mapred.output.compress in Hadoop 0.20.0.
>> >>>I am not sure if there is reducer compression built in, but I could
>> >>>have missed it.
>> >>>
>> >>>On Tue, Mar 6, 2012 at 9:40 PM, Luke Forehand
>> >>><[email protected]> wrote:
>> >>>> Hello,
>> >>>>
>> >>>> Is there a way to run the mahout kmeans program from the command line
>> >>>> with a parameter that will override (and disable) the reducer task
>> >>>> compression?  I have tried several different ways of specifying -D
>> >>>> parameters, but I can't seem to get any options to pass through to the
>> >>>> hadoop mapreduce configuration.
>> >>>>
>> >>>> Thanks!
>> >>>> Luke
>> >>
>>
>>
