Check the number of partitions in your input; it may be far smaller than the available parallelism of your cluster. For example, input that lives in just 1 partition will spawn just 1 task.
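As a minimal sketch (the master URL, input path, and jar name below are placeholders, not from your setup -- on a standalone master you can also cap cores with --total-executor-cores):

```shell
# 1) Check how many partitions (and hence tasks) your input produces.
#    From spark-shell connected to the standalone master:
#      scala> sc.textFile("hdfs:///path/to/input").partitions.size
#    1 partition means only 1 task per stage, no matter how many cores exist.
#    rdd.repartition(4) spreads the data across 4 partitions / 4 tasks.

# 2) Submit against the standalone master instead of local[4]:
spark-submit \
  --master spark://10.125.21.15:7070 \
  --total-executor-cores 4 \
  your-app.jar
```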
Beyond that, parallelism just happens. You can see the parallelism of each operation in the Spark UI.

On Thu, Jan 15, 2015 at 10:53 PM, Wang, Ningjun (LNG-NPV) <[email protected]> wrote:
> Spark Standalone cluster.
>
> My program is running very slowly, and I suspect it is not doing parallel
> processing of the RDD. How can I force it to run in parallel? Is there any
> way to check whether it is processed in parallel?
>
> Regards,
>
> Ningjun Wang
> Consulting Software Engineer
> LexisNexis
> 121 Chanlon Road
> New Providence, NJ 07974-1541
>
>
> -----Original Message-----
> From: Sean Owen [mailto:[email protected]]
> Sent: Thursday, January 15, 2015 4:29 PM
> To: Wang, Ningjun (LNG-NPV)
> Cc: [email protected]
> Subject: Re: How to force parallel processing of RDD using multiple thread
>
> What is your cluster manager? For example, on YARN you would specify
> --executor-cores. Read:
> http://spark.apache.org/docs/latest/running-on-yarn.html
>
> On Thu, Jan 15, 2015 at 8:54 PM, Wang, Ningjun (LNG-NPV)
> <[email protected]> wrote:
>> I have a standalone Spark cluster with only one node with 4 CPU cores.
>> How can I force Spark to do parallel processing of my RDD using
>> multiple threads? For example, I can do the following:
>>
>> spark-submit --master local[4]
>>
>> However, I really want to use the cluster as follows:
>>
>> spark-submit --master spark://10.125.21.15:7070
>>
>> In that case, how can I make sure the RDD is processed with multiple
>> threads/cores?
>>
>> Thanks
>>
>> Ningjun
