When you click on a stage in the Spark UI on port 4040, you can see how many
tasks are running concurrently.

How many tasks should I expect to see running concurrently, assuming my
cluster is set up optimally and my RDDs are partitioned properly?

Is it the total number of virtual cores across all my slaves?
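
As a cross-check from within Spark itself, something like the following in
the spark-shell on the driver should report what Spark considers the default
level of parallelism. (This is just a sketch, assuming a standalone/spark-ec2
deployment with no explicit spark.default.parallelism or spark.cores.max set.)

// run in the spark-shell on the driver node
// on a standalone cluster this should come back as the total number of
// cores across all executors (or 2, whichever is larger)
scala> sc.defaultParallelism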

I devised the following script to give me that number for a cluster created
by spark-ec2.

# spark-ec2 cluster
# run on driver node
# total number of virtual cores across all slaves
yum install -y pssh
{ nproc; pssh -i -h /root/spark-ec2/slaves nproc; } |  # nproc on this node and on every slave
  grep -v "SUCCESS" |                                   # drop pssh's per-host status lines
  paste -sd+ | bc                                       # sum the counts
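
For example, on a hypothetical cluster with a 2-core driver and four 2-core
slaves, the braced group would print five lines of "2", paste -sd+ would join
them into 2+2+2+2+2, and bc would report 10.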

Nick



