Re: Spark #cores

Saliya Ekanayake Wed, 18 Jan 2017 12:22:43 -0800

So, I should be using spark.sql.shuffle.partitions to control the
parallelism? Is there a guide on how to tune this?


Thank you,
Saliya

On Wed, Jan 18, 2017 at 2:01 PM, Yong Zhang <java8...@hotmail.com> wrote:

> spark.sql.shuffle.partitions controls not only Spark SQL, but also any
> implementation based on Spark DataFrame.
>
>
> If you are using the "spark.ml" package, then most ML libraries in it are
> based on DataFrame. So you should use "spark.sql.shuffle.partitions"
> instead of "spark.default.parallelism".
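
To make the distinction between the two settings concrete, here is a minimal
Scala sketch that sets both when building a session. The app name, the
local[4] master, and the value 32 are placeholders chosen for illustration and
are not taken from this thread. spark.sql.shuffle.partitions (default 200)
applies to DataFrame/SQL shuffles, while spark.default.parallelism applies to
RDD operations that are not given an explicit partition count.

```scala
import org.apache.spark.sql.SparkSession

object ShufflePartitionsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("shuffle-partitions-sketch")        // hypothetical name
      .master("local[4]")                          // assumption: local run for testing
      .config("spark.sql.shuffle.partitions", "32") // DataFrame/SQL shuffles
      .config("spark.default.parallelism", "32")    // RDD shuffles and parallelize
      .getOrCreate()

    // Read the value back to confirm what the session is actually using.
    println(spark.conf.get("spark.sql.shuffle.partitions"))

    spark.stop()
  }
}
```

The same keys can be passed on the command line with --conf, as in the
spark-submit line quoted further down.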
>
>
> Yong
>
>
> ------------------------------
> *From:* Saliya Ekanayake <esal...@gmail.com>
> *Sent:* Wednesday, January 18, 2017 12:33 PM
> *To:* spline_pal...@yahoo.com
> *Cc:* jasbir.s...@accenture.com; User
> *Subject:* Re: Spark #cores
>
> The Spark version I am using is 2.10. The language is Scala. This is
> running in standalone cluster mode.
>
> Each worker is able to use all physical CPU cores in the cluster as is the
> default case.
>
> I was using the following parameters to spark-submit
>
> --conf spark.executor.cores=1 --conf spark.default.parallelism=32
>
> Later, I read that the term "cores" doesn't mean physical CPU cores but
> rather the number of tasks that an executor can run concurrently.
>
> Anyway, I don't have a clear idea how to set the number of executors per
> physical node. I see there's an option in YARN mode, but it's not
> available for standalone cluster mode.
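
As a rough way to reason about this, the sketch below models how many
executors and task slots an application can get in standalone mode. It
assumes the documented behaviour that a worker can host several executors
when spark.executor.cores is set, and that spark.cores.max (or
--total-executor-cores on spark-submit) caps the total cores for the
application. Neither of those two settings is mentioned in this thread, memory
limits are ignored, and the object name, function names, and the
24-cores-per-worker figure are hypothetical.

```scala
// Sketch only: models standalone-mode core limits, ignoring memory.
object ExecutorSlots {
  /** Executors a single worker can host for one application. */
  def executorsPerWorker(workerCores: Int, executorCores: Int): Int = {
    require(executorCores > 0, "executorCores must be positive")
    workerCores / executorCores
  }

  /** Executors across the cluster, capped by spark.cores.max. */
  def totalExecutors(workers: Int, workerCores: Int,
                     executorCores: Int, coresMax: Int): Int = {
    val byWorkers = workers * executorsPerWorker(workerCores, executorCores)
    val byCap = coresMax / executorCores
    math.min(byWorkers, byCap)
  }

  /** Tasks that can run at the same time. */
  def taskSlots(executors: Int, executorCores: Int): Int =
    executors * executorCores
}
```

For example, with 8 workers, a hypothetical 24 cores per worker,
spark.executor.cores=1 and spark.cores.max=32,
ExecutorSlots.totalExecutors(8, 24, 1, 32) gives 32 executors and therefore
32 concurrent task slots. Without a cap, the application may take every core
the workers offer.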
>
> Thank you,
> Saliya
>
> On Wed, Jan 18, 2017 at 12:13 PM, Palash Gupta <spline_pal...@yahoo.com>
> wrote:
>
>> Hi,
>>
>> Can you please share how you are assigning CPU cores, and tell us the
>> Spark version and language you are using?
>>
>> //Palash
>>
>> On Wed, 18 Jan, 2017 at 10:16 pm, Saliya Ekanayake
>> <esal...@gmail.com> wrote:
>> Thank you for the quick response. No, this is not Spark SQL. I am
>> running the built-in PageRank.
>>
>> On Wed, Jan 18, 2017 at 10:33 AM, <jasbir.s...@accenture.com> wrote:
>>
>>> Are you talking about Spark SQL?
>>>
>>> If yes, spark.sql.shuffle.partitions needs to be changed.
>>>
>>>
>>>
>>> *From:* Saliya Ekanayake [mailto:esal...@gmail.com]
>>> *Sent:* Wednesday, January 18, 2017 8:56 PM
>>> *To:* User <user@spark.apache.org>
>>> *Subject:* Spark #cores
>>>
>>>
>>>
>>> Hi,
>>>
>>>
>>>
>>> I am running a Spark application with the number of executor cores set
>>> to 1 and a default parallelism of 32 over 8 physical nodes.
>>>
>>>
>>>
>>> The web UI shows it's running on 200 cores. I can't relate this number
>>> to the parameters I've used. How can I control the parallelism in a more
>>> deterministic way?
>>>
>>>
>>>
>>> Thank you,
>>>
>>> Saliya
>>>
>>>
>>>
>>> --
>>>
>>> Saliya Ekanayake, Ph.D
>>>
>>> Applied Computer Scientist
>>>
>>> Network Dynamics and Simulation Science Laboratory (NDSSL)
>>>
>>> Virginia Tech, Blacksburg
>>>
>>
>>
>
>


-- 
Saliya Ekanayake, Ph.D
Applied Computer Scientist
Network Dynamics and Simulation Science Laboratory (NDSSL)
Virginia Tech, Blacksburg
