Alright, so I guess I understand now why spark-ec2 allows you to select different instance types for the driver node and worker nodes. If the driver node is just driving and not doing any large collect()s or heavy processing, it can be much smaller than the worker nodes.
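To make that concrete, here's a rough Scala sketch of the distinction I'm picturing. It's only an illustration: the S3 paths and app name are made up, and it assumes the master URL comes from the environment (e.g. spark-ec2's defaults) rather than being hard-coded.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.SparkContext._  // for reduceByKey on pair RDDs (needed in older Spark)

    // Rough sketch only: bucket names are hypothetical, and the master URL
    // is assumed to be supplied by the environment / launch scripts.
    object DriverVsWorkers {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("driver-vs-workers")
        val sc = new SparkContext(conf)

        val lines = sc.textFile("s3n://my-bucket/input/*")  // hypothetical path

        // This pipeline runs entirely on the executors; the driver only
        // schedules tasks, so a small driver instance is fine.
        val counts = lines
          .flatMap(_.split("\\s+"))
          .map(word => (word, 1))
          .reduceByKey(_ + _)
        counts.saveAsTextFile("s3n://my-bucket/output")  // hypothetical path

        // By contrast, collect() pulls every result into driver memory,
        // which is the one case where the driver machine needs to be big.
        // val everything = counts.collect()

        sc.stop()
      }
    }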
With regards to data locality, that may not be an issue in my usage pattern if, in theory, I wanted to make the driver node also do work. I launch clusters using spark-ec2 and source data from S3, so I'm missing out on that data locality benefit from the get-go.

The firewall may be an issue if spark-ec2 doesn't punch open the appropriate holes. And it may well not, since it doesn't seem to have an option to configure the driver node to also do work.

Anyway, I'll definitely leave things the way they are. If I want a beefier cluster, it's probably much easier to just launch a cluster with more slaves using spark-ec2 than it is to set the driver node to a non-default configuration.


On Tue, Apr 8, 2014 at 4:48 PM, Sean Owen <so...@cloudera.com> wrote:

> If you want the machine that hosts the driver to also do work, you can
> designate it as a worker too, if I'm not mistaken. I don't think the
> driver should do work, logically, but, that's not to say that the
> machine it's on shouldn't do work.
> --
> Sean Owen | Director, Data Science | London
>
>
> On Tue, Apr 8, 2014 at 8:24 PM, Nicholas Chammas
> <nicholas.cham...@gmail.com> wrote:
> > So I have a cluster in EC2 doing some work, and when I take a look here
> >
> > http://driver-node:4040/executors/
> >
> > I see that my driver node is snoozing on the job: no tasks, no memory
> > used, and no RDD blocks cached.
> >
> > I'm assuming that it was a conscious design choice not to have the
> > driver node partake in the cluster's workload.
> >
> > Why is that? It seems like a wasted resource.
> >
> > What's more, the slaves may rise up one day and overthrow the driver
> > out of resentment.
> >
> > Nick
>