I am still a bit confused about workers, executors, and JVMs in standalone mode. Are worker processes and executors independent JVMs, or do executors run within the worker JVM? I have some memory-rich nodes (192GB) and I would like to avoid deploying massive JVMs because of well-known performance issues (GC pauses and such).

As of Spark 1.4 it is possible to either deploy multiple workers per node (SPARK_WORKER_INSTANCES + SPARK_WORKER_CORES) or multiple executors per worker (--executor-cores). Which option is preferable, and why?
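For concreteness, here is a sketch of what the two configurations might look like. The specific values (worker count, core and memory splits), the master URL, and the application jar name are all illustrative placeholders for a 192GB node, not recommendations:

```shell
# Option A: multiple small workers per node (set in conf/spark-env.sh on each node).
# Illustrative split: 8 worker daemons, each offering 4 cores and 22g,
# leaving headroom for the OS and daemon overhead.
SPARK_WORKER_INSTANCES=8
SPARK_WORKER_CORES=4
SPARK_WORKER_MEMORY=22g

# Option B: one worker per node, with smaller executors requested per application
# (flags passed to spark-submit). spark://master:7077 and my-app.jar are placeholders.
spark-submit \
  --master spark://master:7077 \
  --executor-cores 4 \
  --executor-memory 22g \
  my-app.jar
```

In both cases the intent is the same: several modest JVM heaps per node instead of one huge one.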
Thanks, Simone Franzini, PhD http://www.linkedin.com/in/simonefranzini