Please use local[24].

Sent from my iPad
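For reference, here is a minimal Java sketch of the two behaviours discussed in the thread below (assuming the Spark 0.8.x Java API; the class name and output paths are illustrative):

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import java.util.Arrays;
import java.util.List;

public class ParallelismDemo {
    public static void main(String[] args) {
        // Optional: raise the fallback partition count used when no
        // explicit count is given. Note the value must be a String.
        System.setProperty("spark.default.parallelism", "24");

        // "local[24]" gives the single-node scheduler 24 task slots;
        // plain "local" has only 1, so tasks run one at a time no
        // matter how many partitions an RDD has.
        JavaSparkContext sc =
            new JavaSparkContext("local[24]", "parallelism-demo");

        List<Integer> mydata = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);

        // An explicit numSlices pins the partition count: saveAsTextFile()
        // launches one task per partition, i.e. exactly 2 tasks here,
        // regardless of spark.default.parallelism.
        JavaRDD<Integer> two = sc.parallelize(mydata, 2);
        two.saveAsTextFile("/tmp/two-partitions");

        // With no explicit count, parallelize() falls back to the default
        // parallelism (24 above), so this save launches 24 tasks.
        JavaRDD<Integer> many = sc.parallelize(mydata);
        many.saveAsTextFile("/tmp/default-partitions");

        sc.stop();
    }
}

In short: spark.default.parallelism only sets the fallback partition count, an explicit count passed to parallelize() always wins, and the master URL (local vs. local[N]) decides how many of those tasks can actually run at once.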
> On Jan 15, 2014, at 2:35 AM, [email protected] wrote:
>
> I am using local
>
> Thanks,
> Hussam
>
> From: Huangguowei [mailto:[email protected]]
> Sent: Tuesday, January 14, 2014 4:43 AM
> To: [email protected]
> Subject: Re: question on using spark parallelism vs using num partitions in spark api
>
> "Using Spark 0.8.1 … Java code running on 8 CPUs with 16 GB RAM, single node"
>
> Local or standalone (single node)?
>
> From: [email protected] [mailto:[email protected]]
> Sent: January 14, 2014 13:42
> To: user
> Subject: Re: question on using spark parallelism vs using num partitions in spark api
>
> I think the parallelism parameter just controls how many tasks can run
> concurrently on each worker; it can't control how many tasks a job is
> split into.
>
> [email protected]
>
> From: [email protected]
> Date: 2014-01-14 09:17
> To: [email protected]
> Subject: question on using spark parallelism vs using num partitions in spark api
>
> Hi,
>
> Using Spark 0.8.1 … Java code running on 8 CPUs with 16 GB RAM, single node.
>
> It looks like setting the Spark parallelism with
> System.setProperty("spark.default.parallelism", "24") before creating my
> Spark context, as described in
> http://spark.incubator.apache.org/docs/latest/tuning.html#level-of-parallelism,
> has no effect on the default number of partitions that Spark uses in its
> APIs such as saveAsTextFile().
>
> For example, if I set spark.default.parallelism to 24, I was expecting 24
> tasks to be launched when calling saveAsTextFile(), but that is not the
> case: I am seeing only 1 task get launched.
>
> If I set my RDD's parallelize() to 2 as
> dataSetRDD = SparkDriver.getSparkContext().parallelize(mydata, 2);
> and then invoke
> dataSetRDD.saveAsTextFile(JavaRddFilePath);
>
> I am seeing 2 tasks get launched, even though spark.default.parallelism
> was set to 24.
>
> Can someone explain the above behavior?
>
> Thanks,
> Hussam
