Try it first, to see whether it actually changes the parallelism you want to control in the PageRank job you are running.
Start with the number of cores you want to give your job, and increase it if the job fails with a GC OOM error.

Yong

________________________________
From: Saliya Ekanayake <esal...@gmail.com>
Sent: Wednesday, January 18, 2017 3:21 PM
To: Yong Zhang
Cc: spline_pal...@yahoo.com; jasbir.s...@accenture.com; User
Subject: Re: Spark #cores

So, I should be using spark.sql.shuffle.partitions to control the parallelism? Is there a guide to how to tune this?

Thank you,
Saliya

On Wed, Jan 18, 2017 at 2:01 PM, Yong Zhang <java8...@hotmail.com> wrote:

spark.sql.shuffle.partitions controls not only Spark SQL, but also any implementation based on Spark DataFrame.

If you are using the "spark.ml" package, then most ML libraries in it are based on DataFrame. So you shouldn't use "spark.default.parallelism"; use "spark.sql.shuffle.partitions" instead.

Yong

________________________________
From: Saliya Ekanayake <esal...@gmail.com>
Sent: Wednesday, January 18, 2017 12:33 PM
To: spline_pal...@yahoo.com
Cc: jasbir.s...@accenture.com; User
Subject: Re: Spark #cores

The Spark version I am using is 2.10. The language is Scala. This is running in standalone cluster mode.

Each worker is able to use all physical CPU cores in the cluster, as is the default case.

I was using the following parameters to spark-submit:

    --conf spark.executor.cores=1 --conf spark.default.parallelism=32

Later, I read that the term "cores" doesn't mean physical CPU cores but rather the number of tasks that an executor can run at the same time.

Anyway, I don't have a clear idea how to set the number of executors per physical node. I see there's an option in YARN mode, but it's not available for standalone cluster mode.

Thank you,
Saliya

On Wed, Jan 18, 2017 at 12:13 PM, Palash Gupta <spline_pal...@yahoo.com> wrote:

Hi,

Can you please share how you are assigning CPU cores, and tell us the Spark version and language you are using?

//Palash

On Wed, 18 Jan, 2017 at 10:16 pm, Saliya Ekanayake <esal...@gmail.com> wrote:

Thank you for the quick response. No, this is not Spark SQL. I am running the built-in PageRank.

On Wed, Jan 18, 2017 at 10:33 AM, <jasbir.s...@accenture.com> wrote:

Are you talking about Spark SQL here? If yes, spark.sql.shuffle.partitions needs to be changed.

From: Saliya Ekanayake [mailto:esal...@gmail.com]
Sent: Wednesday, January 18, 2017 8:56 PM
To: User <user@spark.apache.org>
Subject: Spark #cores

Hi,

I am running a Spark application with the number of executor cores set to 1 and a default parallelism of 32, over 8 physical nodes. The web UI shows it's running on 200 cores. I can't relate this number to the parameters I've used. How can I control the parallelism in a more deterministic way?

Thank you,
Saliya

--
Saliya Ekanayake, Ph.D
Applied Computer Scientist
Network Dynamics and Simulation Science Laboratory (NDSSL)
Virginia Tech, Blacksburg
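For anyone reading this thread later, the settings discussed above can be combined in a single spark-submit call. The following is only a dry-run sketch that prints the command rather than running it. The master URL, class name, jar name, and the choice of 4 tasks per node are hypothetical placeholders, not values from the thread. It also adds spark.cores.max, which was not mentioned above: in standalone mode it caps the total cores an application requests, and without it an application takes all available cores by default.

```shell
# Dry-run sketch: build a spark-submit command line and print it.
# Hypothetical placeholders: master-host, example.PageRankApp,
# pagerank-app.jar, and TASKS_PER_NODE.

NODES=8             # physical nodes, as in the original question
TASKS_PER_NODE=4    # assumed number of concurrent tasks wanted per node
TOTAL_CORES=$((NODES * TASKS_PER_NODE))

# spark.executor.cores         task slots per executor
# spark.cores.max              cap on total cores for this application
# spark.default.parallelism    default partition count for RDD shuffles
# spark.sql.shuffle.partitions partition count for DataFrame shuffles
SUBMIT_CMD="spark-submit \
  --master spark://master-host:7077 \
  --conf spark.executor.cores=1 \
  --conf spark.cores.max=${TOTAL_CORES} \
  --conf spark.default.parallelism=${TOTAL_CORES} \
  --conf spark.sql.shuffle.partitions=${TOTAL_CORES} \
  --class example.PageRankApp \
  pagerank-app.jar"

# Print the command so it can be reviewed before it is actually run.
echo "${SUBMIT_CMD}"
```

Setting both partition properties to the same value is just one way to keep RDD-based and DataFrame-based stages aligned; whether either one affects a given PageRank implementation is worth checking in the web UI, as suggested at the top of the thread.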