In standalone mode, workers and executors run as separate JVM processes.

Whether to use multiple workers on a single machine depends on how you will be
using the cluster. If you run multiple Spark applications simultaneously, each
application gets its own executor. So, for example, if you allocate 8GB to
each application, you can run 192/8 = 24 Spark applications simultaneously
(assuming you also have a large number of cores). Each executor has only an
8GB heap, so GC should not be an issue. Alternatively, if you know that you
will have only a few applications running simultaneously on that cluster,
running multiple workers on each machine will allow you to avoid the GC issues
associated with allocating a large heap to a single JVM process. This option
allows you to run multiple executors for an application on a single machine,
and each executor can be configured with optimal memory.
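To make this concrete, here is a minimal sketch of the multiple-workers option.
The specific numbers (6 workers, 8 cores, 30g) are only illustrative for a
192GB machine with roughly 48 cores, not tuned recommendations, and the master
host name is a placeholder:

  # conf/spark-env.sh on each worker machine
  SPARK_WORKER_INSTANCES=6    # start 6 worker JVMs per machine instead of one large one
  SPARK_WORKER_CORES=8        # cores each worker manages
  SPARK_WORKER_MEMORY=30g     # memory each worker can grant to executors

An application can then request moderately sized executors, for example:

  spark-submit --master spark://<master-host>:7077 \
    --executor-memory 28g \
    --executor-cores 8 \
    ...

Each executor JVM then has a ~28GB heap rather than one huge heap per machine,
which keeps GC pauses manageable.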


Mohammed
Author: Big Data Analytics with Spark <http://www.amazon.com/Big-Data-Analytics-Spark-Practitioners/dp/1484209656/>

From: Simone Franzini [mailto:captainfr...@gmail.com]
Sent: Monday, May 2, 2016 9:27 AM
To: user
Subject: Fwd: Spark standalone workers, executors and JVMs

I am still a little bit confused about workers, executors and JVMs in 
standalone mode.
Are worker processes and executors independent JVMs or do executors run within 
the worker JVM?
I have some memory-rich nodes (192GB) and I would like to avoid deploying
massive JVMs due to well-known performance issues (GC and such).
As of Spark 1.4 it is possible to either deploy multiple workers 
(SPARK_WORKER_INSTANCES + SPARK_WORKER_CORES) or multiple executors per worker 
(--executor-cores). Which option is preferable and why?

Thanks,
Simone Franzini, PhD

http://www.linkedin.com/in/simonefranzini
