On Mon, Oct 13, 2014 at 11:56 AM, Reinis Vicups <mah...@orbit-x.de> wrote:
> I have my own implementation of SimilarityAnalysis and by tuning the number
> of tasks I have reached HUGE performance gains.
>
> Since I couldn't find how to pass the number of tasks to shuffle
> operations directly, I have set the following in the Spark config:
>
>     configuration = new SparkConf().setAppName(jobConfig.jobName)
>       .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
>       .set("spark.kryo.registrator", "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator")
>       .set("spark.kryo.referenceTracking", "false")
>       .set("spark.kryoserializer.buffer.mb", "200")
>       .set("spark.default.parallelism", "400") // <- this line is supposed to
>                                                //    set default parallelism
>                                                //    to some high number
>
> Thank you for your help

Thank you for YOUR help! Do you think that simply increasing this parameter is a safe and sane thing to do?

(Note: SparkConf.set takes string values, so the parallelism setting above is written as "400" rather than the bare integer 400, which would not compile.)
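For comparison, here is a minimal sketch of the two ways to control shuffle task counts; the input path and app name are placeholders, and this assumes a standard Spark build on the classpath:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: app name and settings are illustrative placeholders.
val conf = new SparkConf()
  .setAppName("similarity-analysis-sketch")
  // SparkConf values are always strings, even for numeric settings.
  .set("spark.default.parallelism", "400")

val sc = new SparkContext(conf)

// spark.default.parallelism is only the fallback. Shuffle operations
// such as reduceByKey also take an explicit numPartitions argument
// that overrides the default for that one shuffle:
val counts = sc.textFile("input.txt")          // placeholder path
  .flatMap(_.split("\\s+"))
  .map(word => (word, 1))
  .reduceByKey(_ + _, 400)                     // explicit task count here
```

Passing `numPartitions` per operation avoids globally inflating parallelism for stages that do not need it, which is one answer to whether raising the global default is "safe and sane".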