On Mon, Oct 13, 2014 at 12:32 PM, Reinis Vicups <[email protected]> wrote:
>> Do you think that simply increasing this parameter is a safe and sane
>> thing to do?
>
> Why would it be unsafe?
>
> In my own implementation I am using 400 tasks on my 4-node, 2-CPU cluster,
> and the execution times of the largest shuffle stage have dropped around
> 10 times.
>
> I have a number of test values back from the time when I used the "old"
> RowSimilarityJob, and with some exceptions (I guess due to randomized
> sparsification) I still get approximately the same values with my own row
> similarity implementation.

Splitting things too far can make processes much less efficient, and setting
parameters like this may propagate further than desired. That said, I asked
because I don't know.
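
For what it's worth, here is a minimal sketch of what raising that value
looks like. I am assuming the parameter in question is Spark's
spark.default.parallelism (or an explicit numPartitions argument on the
shuffle operator itself); the app name and input path are placeholders:

    import org.apache.spark.{SparkConf, SparkContext}

    object ParallelismSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("row-similarity-sketch")  // placeholder name
          // Default number of tasks for shuffle stages (reduceByKey,
          // groupByKey, join, ...) when no count is given explicitly.
          .set("spark.default.parallelism", "400")
        val sc = new SparkContext(conf)

        // Alternatively, set the task count on one shuffle only:
        val counts = sc.textFile("hdfs:///some/input")  // placeholder path
          .flatMap(_.split("\\s+"))
          .map(w => (w, 1))
          .reduceByKey(_ + _, 400)  // 400 reduce tasks for this stage

        counts.take(5).foreach(println)
        sc.stop()
      }
    }

On a 4-node, 2-CPU cluster, 400 tasks works out to roughly 50 task waves
per core, which keeps each task's shuffle slice small; pushing the count
much higher mostly adds scheduling overhead.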
