On Mon, Oct 13, 2014 at 11:56 AM, Reinis Vicups <mah...@orbit-x.de> wrote:
> I have my own implementation of SimilarityAnalysis and by tuning the number
> of tasks I have reached HUGE performance gains.
>
> Since I couldn't find how to pass the number of tasks to shuffle
> operations directly, I have set the following in the Spark config:
>
>     configuration = new SparkConf().setAppName(jobConfig.jobName)
>       .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
>       .set("spark.kryo.registrator", "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator")
>       .set("spark.kryo.referenceTracking", "false")
>       .set("spark.kryoserializer.buffer.mb", "200")
>       .set("spark.default.parallelism", "400") // <- this line is supposed to
>                                                //    set default parallelism
>                                                //    to some high number
>
> Thank you for your help

Thank you for YOUR help! Do you think that simply increasing this parameter is a safe and sane thing to do?

(Note: SparkConf.set takes string values, so the parallelism setting above is written as "400" rather than the bare integer 400, which would not compile.)
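For comparison, here is a minimal sketch of the two ways to control shuffle task counts; the input path and app name are placeholders, and this assumes a standard Spark build on the classpath:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: app name and settings are illustrative placeholders.
val conf = new SparkConf()
  .setAppName("similarity-analysis-sketch")
  // SparkConf values are always strings, even for numeric settings.
  .set("spark.default.parallelism", "400")

val sc = new SparkContext(conf)

// spark.default.parallelism is only the fallback. Shuffle operations
// such as reduceByKey also take an explicit numPartitions argument
// that overrides the default for that one shuffle:
val counts = sc.textFile("input.txt")          // placeholder path
  .flatMap(_.split("\\s+"))
  .map(word => (word, 1))
  .reduceByKey(_ + _, 400)                     // explicit task count here
```

Passing `numPartitions` per operation avoids globally inflating parallelism for stages that do not need it, which is one answer to whether raising the global default is "safe and sane".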