Please use local[24].

Sent from my iPad
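For reference, here is a minimal Java sketch of the two behaviours discussed in the thread below (assuming the Spark 0.8.x Java API; the class name and output paths are illustrative):

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import java.util.Arrays;
import java.util.List;

public class ParallelismDemo {
    public static void main(String[] args) {
        // Optional: raise the fallback partition count used when no
        // explicit count is given. Note the value must be a String.
        System.setProperty("spark.default.parallelism", "24");

        // "local[24]" gives the single-node scheduler 24 task slots;
        // plain "local" has only 1, so tasks run one at a time no
        // matter how many partitions an RDD has.
        JavaSparkContext sc =
            new JavaSparkContext("local[24]", "parallelism-demo");

        List<Integer> mydata = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);

        // An explicit numSlices pins the partition count: saveAsTextFile()
        // launches one task per partition, i.e. exactly 2 tasks here,
        // regardless of spark.default.parallelism.
        JavaRDD<Integer> two = sc.parallelize(mydata, 2);
        two.saveAsTextFile("/tmp/two-partitions");

        // With no explicit count, parallelize() falls back to the default
        // parallelism (24 above), so this save launches 24 tasks.
        JavaRDD<Integer> many = sc.parallelize(mydata);
        many.saveAsTextFile("/tmp/default-partitions");

        sc.stop();
    }
}

In short: spark.default.parallelism only sets the fallback partition count, an explicit count passed to parallelize() always wins, and the master URL (local vs. local[N]) decides how many of those tasks can actually run at once.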
> On Jan 15, 2014, at 2:35 AM, [email protected] wrote:
>
> I am using local
>
> Thanks,
> Hussam
>
> From: Huangguowei [mailto:[email protected]]
> Sent: Tuesday, January 14, 2014 4:43 AM
> To: [email protected]
> Subject: Re: question on using spark parallelism vs using num partitions in spark api
>
> "Using Spark 0.8.1 … Java code running on 8 CPUs with 16 GB RAM, single node"
>
> Local or standalone (single node)?
>
> From: [email protected] [mailto:[email protected]]
> Sent: January 14, 2014 13:42
> To: user
> Subject: Re: question on using spark parallelism vs using num partitions in spark api
>
> I think the parallelism parameter just controls how many tasks can run
> concurrently on each worker; it can't control how many tasks a job is
> split into.
>
> [email protected]
>
> From: [email protected]
> Date: 2014-01-14 09:17
> To: [email protected]
> Subject: question on using spark parallelism vs using num partitions in spark api
>
> Hi,
>
> Using Spark 0.8.1 … Java code running on 8 CPUs with 16 GB RAM, single node.
>
> It looks like setting the Spark parallelism with
> System.setProperty("spark.default.parallelism", "24") before creating my
> Spark context, as described in
> http://spark.incubator.apache.org/docs/latest/tuning.html#level-of-parallelism,
> has no effect on the default number of partitions that Spark uses in its
> APIs such as saveAsTextFile().
>
> For example, if I set spark.default.parallelism to 24, I was expecting 24
> tasks to be launched when calling saveAsTextFile(), but that is not the
> case: I am seeing only 1 task get launched.
>
> If I set my RDD's parallelize() to 2 as
> dataSetRDD = SparkDriver.getSparkContext().parallelize(mydata, 2);
> and then invoke
> dataSetRDD.saveAsTextFile(JavaRddFilePath);
>
> I am seeing 2 tasks get launched, even though spark.default.parallelism
> was set to 24.
>
> Can someone explain the above behavior?
>
> Thanks,
> Hussam
