So, I should be using spark.sql.shuffle.partitions to control the parallelism? Is there a guide on how to tune it?
Thank you,
Saliya

On Wed, Jan 18, 2017 at 2:01 PM, Yong Zhang <java8...@hotmail.com> wrote:

> spark.sql.shuffle.partitions controls not only Spark SQL, but also
> any implementation based on Spark DataFrame.
>
> If you are using the "spark.ml" package, then most ML libraries in it are
> based on DataFrame. So you should use "spark.sql.shuffle.partitions"
> instead of "spark.default.parallelism".
>
> Yong
>
> ------------------------------
> *From:* Saliya Ekanayake <esal...@gmail.com>
> *Sent:* Wednesday, January 18, 2017 12:33 PM
> *To:* spline_pal...@yahoo.com
> *Cc:* jasbir.s...@accenture.com; User
> *Subject:* Re: Spark #cores
>
> The Spark version I am using is 2.10. The language is Scala. This is
> running in standalone cluster mode.
>
> Each worker is able to use all physical CPU cores in the cluster as is the
> default case.
>
> I was passing the following parameters to spark-submit:
>
> --conf spark.executor.cores=1 --conf spark.default.parallelism=32
>
> Later, I read that the term "cores" doesn't mean physical CPU cores but
> rather the number of tasks that an executor can run concurrently.
>
> Anyway, I don't have a clear idea how to set the number of executors per
> physical node. I see there's an option in YARN mode, but it's not
> available for standalone cluster mode.
>
> Thank you,
> Saliya
>
> On Wed, Jan 18, 2017 at 12:13 PM, Palash Gupta <spline_pal...@yahoo.com>
> wrote:
>
>> Hi,
>>
>> Can you please share how you are assigning CPU cores, and tell us the
>> Spark version and language you are using?
>>
>> //Palash
>>
>> On Wed, 18 Jan, 2017 at 10:16 pm, Saliya Ekanayake
>> <esal...@gmail.com> wrote:
>> Thank you for the quick response. No, this is not Spark SQL. I am
>> running the built-in PageRank.
>>
>> On Wed, Jan 18, 2017 at 10:33 AM, <jasbir.s...@accenture.com> wrote:
>>
>>> Are you talking about Spark SQL?
>>>
>>> If yes, spark.sql.shuffle.partitions needs to be changed.
>>>
>>> *From:* Saliya Ekanayake [mailto:esal...@gmail.com]
>>> *Sent:* Wednesday, January 18, 2017 8:56 PM
>>> *To:* User <user@spark.apache.org>
>>> *Subject:* Spark #cores
>>>
>>> Hi,
>>>
>>> I am running a Spark application with the number of executor cores set
>>> to 1 and a default parallelism of 32 over 8 physical nodes.
>>>
>>> The web UI shows it's running on 200 cores. I can't relate this number
>>> to the parameters I've used. How can I control the parallelism in a more
>>> deterministic way?
>>>
>>> Thank you,
>>> Saliya
>>>
>>> --
>>> Saliya Ekanayake, Ph.D
>>> Applied Computer Scientist
>>> Network Dynamics and Simulation Science Laboratory (NDSSL)
>>> Virginia Tech, Blacksburg
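For reference, a minimal sketch of how the settings discussed in this thread might be combined on a standalone cluster, assuming the goal is 32 concurrent tasks across the 8 nodes. `spark.cores.max` is not mentioned in the thread; it is the documented standalone-mode cap on the total cores an application may claim, so with one core per executor it also bounds the number of executors. The master URL, class name, and jar path are hypothetical placeholders, and 32 for `spark.sql.shuffle.partitions` is only an illustrative value (its default is 200).

```shell
# Sketch only: master URL, class name, and jar path are hypothetical placeholders.
# spark.cores.max caps the total cores the application claims in standalone mode;
# with spark.executor.cores=1 this allows at most 32 single-core executors.
spark-submit \
  --master spark://master-host:7077 \
  --class com.example.PageRankApp \
  --conf spark.executor.cores=1 \
  --conf spark.cores.max=32 \
  --conf spark.default.parallelism=32 \
  --conf spark.sql.shuffle.partitions=32 \
  path/to/app.jar
```

To limit how many cores each worker offers (and so how many single-core executors can land on one node), the standalone documentation describes `SPARK_WORKER_CORES` in `conf/spark-env.sh`. Setting it to 4 on each of the 8 workers would be one way to get an even 4-per-node layout, though that layout is an assumption here and not something stated in the thread.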