I have a cluster with 3 nodes (8 cores each) running Spark 1.0.1.

I have an RDD<String> that I've repartitioned into 100 partitions (hoping to 
increase the parallelism).
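
For reference, this is roughly what I'm doing (the input path is just a 
placeholder, not my real data):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class RepartitionTest {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf().setAppName("RepartitionTest");
    JavaSparkContext sc = new JavaSparkContext(conf);

    // Load the data and spread it across 100 partitions.
    JavaRDD<String> lines = sc.textFile("hdfs:///some/input/path");  // placeholder path
    JavaRDD<String> repartitioned = lines.repartition(100);

    // Sanity check: this does report 100 partitions.
    System.out.println(repartitioned.rdd().partitions().length);

    sc.stop();
  }
}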

When I do a transformation (such as a filter) on this RDD, I can't seem to get 
more than 24 tasks (my total number of cores across the 3 nodes) running at any 
one time.  By tasks, I mean the tasks that appear in the Application UI.  I 
tried explicitly setting spark.default.parallelism to 48 (hoping I would get 48 
tasks running concurrently) and verified the setting in the Application UI for 
the running application, but it had no effect.  Perhaps this setting is ignored 
for a filter and the default is simply the total number of cores available.
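
This is roughly how I'm setting the parallelism and running the filter (again, 
the path and the predicate are just placeholders):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

public class FilterParallelismTest {
  public static void main(String[] args) {
    // Set the parallelism explicitly; the value shows up under the
    // Environment tab of the Application UI.
    SparkConf conf = new SparkConf()
        .setAppName("FilterParallelismTest")
        .set("spark.default.parallelism", "48");
    JavaSparkContext sc = new JavaSparkContext(conf);

    JavaRDD<String> lines = sc.textFile("hdfs:///some/input/path");  // placeholder path
    JavaRDD<String> repartitioned = lines.repartition(100);

    // The filter transformation; the predicate here is just an example.
    JavaRDD<String> filtered = repartitioned.filter(
        new Function<String, Boolean>() {
          public Boolean call(String s) {
            return s.contains("ERROR");
          }
        });

    // count() forces the job to run so the tasks appear in the UI.
    System.out.println(filtered.count());

    sc.stop();
  }
}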

I'm fairly new to Spark, so maybe I'm just missing or misunderstanding 
something fundamental.  Any help would be appreciated.

Thanks.

Darin.
