As I was writing a long-ish message to explain how it doesn't work, it
dawned on me that maybe the driver connects to executors only after there is
some work to do (while I was trying to find the number of executors BEFORE
starting the actual work).
So the solution was to simply execute a dummy task first, so that the
executors register with the driver.
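A minimal sketch of that workaround (my own hedged reconstruction, not the poster's exact code; `warmUpAndCountExecutors` and the dummy-job parameters are hypothetical, and it assumes `getExecutorMemoryStatus` also lists the driver, as the method quoted later in this thread suggests):

```scala
import org.apache.spark.SparkContext

// Run a trivial job first so executors register with the driver.
// Before any job runs, getExecutorMemoryStatus may only report the driver.
def warmUpAndCountExecutors(sc: SparkContext): Int = {
  sc.parallelize(1 to 10000, 100).count() // dummy task: any cheap action works
  // getExecutorMemoryStatus includes the driver's own entry, hence the -1
  sc.getExecutorMemoryStatus.size - 1
}
```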
Is there any reliable way to find out the number of executors
programmatically, regardless of how the job is run? A method that
preferably works for Spark standalone, YARN, and Mesos, regardless of
whether the code runs from the shell or not?
Things I tried that don't work:
-
Hi Akhil,
I'm using Spark 1.4.1.
The number of executors is not in the command line and not in
getExecutorMemoryStatus
(I already mentioned that I tried that; it works in spark-shell but not when
executed via spark-submit). I tried looking at defaultParallelism too;
it's 112 (7 executors * 16 cores).
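For what it's worth, the 112 = 7 * 16 arithmetic above suggests a rough way to back the executor count out of defaultParallelism. This is only a sketch under strong assumptions (spark.executor.cores is explicitly set, all executors have registered, and defaultParallelism equals total executor cores, which is not guaranteed on every cluster manager):

```scala
import org.apache.spark.SparkContext

// Rough estimate: on standalone/YARN, defaultParallelism is often the
// total number of executor cores once executors are up, so dividing by
// cores-per-executor can recover the executor count. Fragile if
// spark.executor.cores is unset or defaultParallelism is overridden.
def estimateExecutors(sc: SparkContext): Int = {
  val coresPerExecutor = sc.getConf.getInt("spark.executor.cores", 1)
  sc.defaultParallelism / coresPerExecutor
}
```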
Which version of Spark are you using? There was a discussion about this
here:
http://apache-spark-user-list.1001560.n3.nabble.com/Determine-number-of-running-executors-td19453.html
Following is a method that retrieves the list of executors registered to a
spark context. It worked perfectly with spark-submit in standalone mode for my
project.
/**
 * A simplified method that just returns the current active/registered
 * executors, excluding the driver.
 * @param sc
 */
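The method body is cut off in the message above; a sketch consistent with that description might look like the following (my reconstruction, assuming the keys of getExecutorMemoryStatus are "host:port" block manager addresses and that spark.driver.host is set on the driver):

```scala
import org.apache.spark.SparkContext

/**
 * A simplified method that just returns the current active/registered
 * executors, excluding the driver.
 * @param sc the SparkContext to query
 * @return executor addresses, assumed to be in host:port form
 */
def currentActiveExecutors(sc: SparkContext): Seq[String] = {
  // getExecutorMemoryStatus keys are block manager addresses and include
  // the driver's own entry, so filter the driver out by host name.
  val allExecutors = sc.getExecutorMemoryStatus.map(_._1)
  val driverHost: String = sc.getConf.get("spark.driver.host")
  allExecutors.filter(addr => addr.split(":")(0) != driverHost).toList
}
```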