As I was writing a long-ish message to explain how it doesn't work, it dawned on me that maybe the driver connects to executors only after there's some work to do (whereas I was trying to find the number of executors BEFORE starting the actual work).
So the solution was simply to execute a dummy task (sparkContext.parallelize(1 until 1000, 200).reduce(_ + _)) before attempting to retrieve the executors. It works now :)

Virgil.

On Sat, Aug 22, 2015 at 12:44 AM, Du Li <l...@yahoo-inc.com> wrote:
> Following is a method that retrieves the list of executors registered to a
> spark context. It worked perfectly with spark-submit in standalone mode for
> my project.
>
>   /**
>    * A simplified method that just returns the current active/registered
>    * executors excluding the driver.
>    * @param sc
>    *   The spark context to retrieve registered executors.
>    * @return
>    *   A list of executors each in the form of host:port.
>    */
>   def currentActiveExecutors(sc: SparkContext): Seq[String] = {
>     val allExecutors = sc.getExecutorMemoryStatus.map(_._1)
>     val driverHost: String = sc.getConf.get("spark.driver.host")
>     allExecutors.filter(! _.split(":")(0).equals(driverHost)).toList
>   }
>
> On Friday, August 21, 2015 1:53 PM, Virgil Palanciuc <virg...@gmail.com>
> wrote:
>
> Hi Akhil,
>
> I'm using spark 1.4.1.
> Number of executors is not in the command line, not in the
> getExecutorMemoryStatus (I already mentioned that I tried that, works in
> spark-shell but not when executed via spark-submit). I tried looking at
> "defaultParallelism" too, it's 112 (7 executors * 16 cores) when run via
> spark-shell, but just 2 when run via spark-submit.
>
> But the scheduler obviously knows this information. It *must* know it. How
> can I access it? Other than parsing the HTML of the WebUI, that is...
> that's pretty much guaranteed to work, and maybe I'll do that, but it's
> extremely convoluted.
>
> Regards,
> Virgil.
>
> On Fri, Aug 21, 2015 at 11:35 PM, Akhil Das <ak...@sigmoidanalytics.com>
> wrote:
>
> Which version of Spark are you using?
> There was a discussion about this here:
>
> http://apache-spark-user-list.1001560.n3.nabble.com/Determine-number-of-running-executors-td19453.html
>
> http://mail-archives.us.apache.org/mod_mbox/spark-user/201411.mbox/%3ccacbyxk+ya1rbbnkwjheekpnbsbh10rykuzt-laqgpdanvhm...@mail.gmail.com%3E
>
> On Aug 21, 2015 7:42 AM, "Virgil Palanciuc" <vir...@palanciuc.eu> wrote:
>
> Is there any reliable way to find out the number of executors
> programmatically, regardless of how the job is run? A method that
> preferably works for spark-standalone, yarn, mesos, regardless whether the
> code runs from the shell or not?
>
> Things that I tried and don't work:
> - sparkContext.getExecutorMemoryStatus.size - 1 // works from the shell,
>   does not work if task submitted via spark-submit
> - sparkContext.getConf.getInt("spark.executor.instances", 1) - doesn't
>   work unless explicitly configured
> - call to http://master:8080/json (this used to work, but doesn't
>   anymore?)
>
> I guess I could parse the output html from the Spark UI... but that seems
> dumb. Is there really no better way?
>
> Thanks,
> Virgil.
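For illustration only, here is a small sketch of the two pieces of this thread: the host filter from Du Li's currentActiveExecutors, pulled out into a plain Scala function so it can run without a cluster, fed with two made-up snapshots of what getExecutorMemoryStatus might contain before and after the dummy task. The object name ExecutorFilter and all host:port strings are hypothetical. With a live SparkContext one could pass sc.getExecutorMemoryStatus.keys and sc.getConf.get("spark.driver.host") instead. Note that filtering by host alone also drops any executor that happens to run on the driver's machine.

```scala
// Sketch only: the filtering step from Du Li's currentActiveExecutors,
// rewritten as a pure function. Assumption: entries are "host:port" strings,
// as in the keys of SparkContext.getExecutorMemoryStatus, and the driver's
// own entry is among them (hence the "size - 1" tried earlier in the thread).
object ExecutorFilter {

  /** Keeps every "host:port" entry whose host differs from driverHost. */
  def executorsExcludingDriver(hostPorts: Iterable[String], driverHost: String): List[String] =
    hostPorts
      .filter(hp => hp.split(":")(0) != driverHost) // compare the host part only
      .toList
      .sorted // deterministic order; the real status is a Map with no fixed order

  def main(args: Array[String]): Unit = {
    // Hypothetical snapshot before any work under spark-submit, matching what
    // the thread reports: only the driver is listed.
    val beforeWork = Seq("driver-host:50123")
    // Hypothetical snapshot after a dummy task such as
    //   sc.parallelize(1 until 1000, 200).reduce(_ + _)
    // has run and the executors show up.
    val afterWork = Seq("driver-host:50123", "worker-2:41002", "worker-1:41001")

    println(executorsExcludingDriver(beforeWork, "driver-host").size) // 0
    println(executorsExcludingDriver(afterWork, "driver-host"))       // List(worker-1:41001, worker-2:41002)
  }
}
```

The sort is purely cosmetic; drop it if you want the order getExecutorMemoryStatus happens to return.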