When you click on a stage in the Spark UI at port 4040, you can see how many tasks are running concurrently.
How many tasks should I expect to see running concurrently, assuming my cluster is set up optimally and my RDDs are partitioned properly? Is it the total number of virtual cores across all my slaves?

I devised the following script to give me that number for a cluster created by spark-ec2:

    # spark-ec2 cluster
    # run on the driver node
    # total number of virtual cores across all slaves
    yum install -y pssh
    { nproc; pssh -i -h /root/spark-ec2/slaves nproc; } | grep -v "SUCCESS" | paste -sd+ | bc

Nick
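
For a rough cross-check (just a sketch, assuming a standard spark-ec2 standalone deployment where defaultParallelism defaults to the total executor core count), you can compare that nproc sum against what Spark itself reports from spark-shell on the driver:

    // run in spark-shell on the driver
    // On a standalone cluster, defaultParallelism should roughly equal the
    // total number of cores across all executors, i.e. the nproc sum above.
    sc.defaultParallelism

    // A job with that many partitions should show about one running task per
    // core in the stage view of the UI at port 4040.
    sc.parallelize(1 to 1000000, sc.defaultParallelism).count()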