On Mon, Oct 13, 2014 at 12:32 PM, Reinis Vicups <[email protected]> wrote:
>> Do you think that simply increasing this parameter is a safe and sane
>> thing to do?
>
> Why would it be unsafe?
>
> In my own implementation I am using 400 tasks on my 4-node, 2-CPU cluster,
> and the execution times of the largest shuffle stage have dropped around
> 10 times.
>
> I have a number of test values back from the time when I used the "old"
> RowSimilarityJob, and with some exceptions (I guess due to randomized
> sparsification) I still get approximately the same values with my own row
> similarity implementation.

Splitting things too far can make processes much less efficient, and setting
parameters like this may propagate further than desired. That said, I asked
because I don't know.
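
For what it's worth, here is a minimal sketch of what raising that value
looks like. I am assuming the parameter in question is Spark's
spark.default.parallelism (or an explicit numPartitions argument on the
shuffle operator itself); the app name and input path are placeholders:

    import org.apache.spark.{SparkConf, SparkContext}

    object ParallelismSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("row-similarity-sketch")  // placeholder name
          // Default number of tasks for shuffle stages (reduceByKey,
          // groupByKey, join, ...) when no count is given explicitly.
          .set("spark.default.parallelism", "400")
        val sc = new SparkContext(conf)

        // Alternatively, set the task count on one shuffle only:
        val counts = sc.textFile("hdfs:///some/input")  // placeholder path
          .flatMap(_.split("\\s+"))
          .map(w => (w, 1))
          .reduceByKey(_ + _, 400)  // 400 reduce tasks for this stage

        counts.take(5).foreach(println)
        sc.stop()
      }
    }

On a 4-node, 2-CPU cluster, 400 tasks works out to roughly 50 task waves
per core, which keeps each task's shuffle slice small; pushing the count
much higher mostly adds scheduling overhead.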
