Hello spark-users,

I would like to use a Spark standalone cluster for multiple tenants, running several apps at the same time. The issue is that when submitting an app to the standalone cluster you cannot pass "--num-executors" as on YARN, only "--total-executor-cores". *This may cause starvation when submitting multiple apps.*

Here's an example. Say I have a cluster of 4 machines, each with 20GB RAM and 4 cores. If I submit with --total-executor-cores=4 and --executor-memory=20GB, I may get either of these two extreme resource allocations:

- 4 executors (one per machine) with 1 core and 20GB each, blocking the entire cluster
- 1 executor (on 1 machine) with 4 cores and 20GB, leaving 3 machines free for other apps
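For concreteness, the submission looks roughly like this (the master URL, class name, and jar are placeholders for our actual app):

    spark-submit \
      --master spark://<master-host>:7077 \
      --total-executor-cores 4 \
      --executor-memory 20G \
      --class com.example.MyApp \
      myapp.jar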
Is there a way to restrict, or at least push, the standalone cluster towards the second strategy (use all cores of a given worker before allocating on a second worker)?

A workaround we tried is to set SPARK_WORKER_CORES=1, SPARK_WORKER_MEMORY=5GB and SPARK_WORKER_INSTANCES=4 on each machine, which ties memory to cores (5GB per core) so a single-core executor can no longer hold a machine's entire memory. But this is suboptimal: it runs 4 worker JVMs per machine, which adds per-JVM overhead and prevents sharing memory across partitions running on the same machine.
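Concretely, the workaround is the following conf/spark-env.sh on each of the 4 machines (values match the example cluster above):

    # conf/spark-env.sh -- run 4 single-core worker JVMs per machine
    export SPARK_WORKER_INSTANCES=4   # 4 worker daemons on this machine
    export SPARK_WORKER_CORES=1       # each worker offers 1 core
    export SPARK_WORKER_MEMORY=5g     # each worker offers 5GB (4 x 5GB = 20GB)

Thanks,
Tomer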