I have a happy, healthy Mesos cluster (0.24) running in my lab. I've compiled Spark 1.5.0 and it seems to be working fine, except for one small issue: my tasks all seem to run on one node (I have six in the cluster).
Basically, I have a directory of compressed text files; compressed, these 25 files add up to 1.2 GB of data. In bin/pyspark I do:

```python
txtfiles = sc.textFile("/path/to/my/data/*")
txtfiles.count()
```

This goes through and gives me the correct count, but all 25 tasks run on one node, let's call it node4. Interesting. I was running Spark from node4, but I would have thought it would hit more nodes, so I ran it from node5 instead. In the Executors tab of the Spark UI there is still only one executor registered, and it's node4, and once again all the tasks ran on node4.

I am running in fine-grained mode. Is there a setting somewhere to allow for more executors? This seems weird. I've been away from Spark since 1.2.x, but I don't remember this behavior...
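For completeness, this is roughly how I'm launching the shell (the master host here is a stand-in, not my actual hostname):

```shell
# Roughly my launch command; the master host:port is a placeholder for my real one
./bin/pyspark --master mesos://node1:5050

# Fine-grained mode is the default on Mesos in Spark 1.5,
# so I have not set spark.mesos.coarse anywhere
```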