As I was writing a long-ish message to explain how it doesn't work, it
dawned on me that maybe the driver connects to the executors only after
there's some work to do (while I was trying to find the number of executors
BEFORE starting the actual work).

So the solution was to simply execute a dummy task (
sparkContext.parallelize(1 until 1000, 200).reduce(_+_) ) before attempting
to retrieve the executors. It works now :)
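
For anyone who finds this thread later, here is a minimal sketch of the
workaround, assuming Spark 1.4.x. The object and method names (ExecutorCount,
warmUp, executorCount) are made up for illustration. The "- 1" assumes that
getExecutorMemoryStatus includes one entry for the driver, as in the attempt
quoted below. Running the dummy job does not guarantee that every executor
has registered by the time it finishes, so treat the result as a lower bound.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ExecutorCount {
  // Run a small throwaway job so the driver has a reason to talk to the
  // executors before we ask how many there are.
  def warmUp(sc: SparkContext): Int =
    sc.parallelize(1 until 1000, 200).reduce(_ + _)

  // Count the entries in getExecutorMemoryStatus, minus one for the driver
  // (assumption: the driver is always listed). Never returns less than 0.
  def executorCount(sc: SparkContext): Int = {
    warmUp(sc)
    math.max(sc.getExecutorMemoryStatus.size - 1, 0)
  }
}
```

Call ExecutorCount.executorCount(sc) once, right after creating the
SparkContext and before any code that depends on the number of executors.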


On Sat, Aug 22, 2015 at 12:44 AM, Du Li <> wrote:

> Following is a method that retrieves the list of executors registered to a
> spark context. It worked perfectly with spark-submit in standalone mode for
> my project.
> /**
>    * A simplified method that just returns the current active/registered
>    * executors, excluding the driver.
>    * @param sc
>    *           The spark context to retrieve registered executors.
>    * @return
>    *         A list of executors each in the form of host:port.
>    */
>   def currentActiveExecutors(sc: SparkContext): Seq[String] = {
>     val allExecutors =
>     val driverHost: String = sc.getConf.get("")
>     allExecutors.filter(! _.split(":")(0).equals(driverHost)).toList
>   }
> On Friday, August 21, 2015 1:53 PM, Virgil Palanciuc <>
> wrote:
> Hi Akhil,
> I'm using spark 1.4.1.
> The number of executors is not in the command line, and not in
> getExecutorMemoryStatus (I already mentioned that I tried that; it works
> in spark-shell but not when executed via spark-submit). I tried looking at
> "defaultParallelism" too: it's 112 (7 executors * 16 cores) when run via
> spark-shell, but just 2 when run via spark-submit.
> But the scheduler obviously knows this information. It *must* know it. How
> can I access it? Other than parsing the HTML of the WebUI, that is...
> that's pretty much guaranteed to work, and maybe I'll do that, but it's
> extremely convoluted.
> Regards,
> Virgil.
> On Fri, Aug 21, 2015 at 11:35 PM, Akhil Das <>
> wrote:
> Which version of Spark are you using? There was a discussion about this
> over here
> On Aug 21, 2015 7:42 AM, "Virgil Palanciuc" <> wrote:
> Is there any reliable way to find out the number of executors
> programmatically, regardless of how the job is run? Preferably a method
> that works for spark-standalone, yarn, and mesos, regardless of whether the
> code runs from the shell or not?
> Things that I tried and don't work:
> - sparkContext.getExecutorMemoryStatus.size - 1 // works from the shell,
> does not work if the job is submitted via spark-submit
> - sparkContext.getConf.getInt("spark.executor.instances", 1) - doesn't
> work unless explicitly configured
> - call to http://master:8080/json (this used to work, but doesn't
> anymore?)
> I guess I could parse the output HTML from the Spark UI... but that seems
> dumb. Is there really no better way?
> Thanks,
> Virgil.
