Hello, I have a few questions about configuring memory usage on standalone clusters. Can someone help me out?
1) The terms "slave" in ./bin/start-slaves.sh and "worker" in the docs seem to be used interchangeably. Are they the same thing?

2) On a worker/slave, is there a single JVM running that holds all the data, or is a separate JVM spun up for each application (Hadoop style)? (Ignoring the SPARK_WORKER_INSTANCES setting.)

3) There are lots of configuration options for defining memory usage. What do they each mean? Is my summary below correct?

a) SPARK_WORKER_MEMORY -- the maximum amount of memory the Spark worker will ever use, regardless of which applications are started; this sets the maximum heap size of the Spark worker JVM.

b) ./bin/start-worker.sh --memory -- the same as SPARK_WORKER_MEMORY? If both are set, which takes priority?

c) SPARK_JAVA_OPTS="-Xmx512m -Xms512m" -- the same as (a) and (b)?

d) -Dspark.executor.memory -- the maximum amount of memory this particular application will use on a single worker.

If that's correct, I'll send a PR to the docs that would have clarified these points for me.

http://spark.incubator.apache.org/docs/latest/configuration.html#system-properties
http://spark.incubator.apache.org/docs/latest/spark-standalone.html#cluster-launch-scripts

4) If I set SPARK_WORKER_MEMORY really high, I think I then also have to set spark.executor.memory really high to take advantage of it, since spark.executor.memory defaults to 512m. Is there a better way to optimize a cluster for the "one big application" scenario than manually keeping these two settings in sync? Ideally, I think, spark.executor.memory would default to SPARK_WORKER_MEMORY when it's not set.

Thanks!
Andrew
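P.S. For concreteness, here's the sort of conf/spark-env.sh I'm imagining for the one-big-application case in question 4. The values are placeholders, and whether manually mirroring the worker memory into spark.executor.memory like this is actually the right approach is exactly what I'm asking:

```shell
# conf/spark-env.sh -- illustrative values, not recommendations

# (a) Total memory this node's worker can hand out to executors
export SPARK_WORKER_MEMORY=16g

# (d) Per-application executor memory, kept in sync by hand with the
# worker memory above -- this duplication is what I'd like to avoid
export SPARK_JAVA_OPTS="-Dspark.executor.memory=16g"
```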