As I was writing a long-ish message to explain how it doesn't work, it
dawned on me that maybe the driver connects to executors only after there
is some work to do (while I was trying to find the number of executors
BEFORE starting the actual work).

So the solution was simply to execute a dummy task
( sparkContext.parallelize(1 until 1000, 200).reduce(_ + _) ) before
attempting to retrieve the executors. It works now :)
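For reference, here is a minimal sketch of the full workaround in one place
(assuming a live SparkContext named sparkContext; the 200-partition dummy
job exists only to force executor registration):

  // Run a trivial job first so the driver connects to all executors.
  sparkContext.parallelize(1 until 1000, 200).reduce(_ + _)
  // Now the memory-status map has one entry per executor, plus the driver.
  val executorCount = sparkContext.getExecutorMemoryStatus.size - 1
  println(s"Registered executors: $executorCount")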

Virgil.

On Sat, Aug 22, 2015 at 12:44 AM, Du Li <l...@yahoo-inc.com> wrote:

> Following is a method that retrieves the list of executors registered to a
> SparkContext. It worked perfectly with spark-submit in standalone mode for
> my project.
>
> /**
>  * A simplified method that just returns the current active/registered
>  * executors, excluding the driver.
>  * @param sc The Spark context to retrieve registered executors from.
>  * @return A list of executors, each in the form host:port.
>  */
> def currentActiveExecutors(sc: SparkContext): Seq[String] = {
>   val allExecutors = sc.getExecutorMemoryStatus.map(_._1)
>   val driverHost: String = sc.getConf.get("spark.driver.host")
>   allExecutors.filter(!_.split(":")(0).equals(driverHost)).toList
> }
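>
> A hypothetical usage sketch (assuming a SparkContext named sc, and that a
> small job has already run so the executors have registered):
>
>   sc.parallelize(1 to 1000, 100).count()  // warm-up job, forces registration
>   val executors = currentActiveExecutors(sc)
>   println(s"${executors.size} executors: ${executors.mkString(", ")}")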
>
>
>
>
> On Friday, August 21, 2015 1:53 PM, Virgil Palanciuc <virg...@gmail.com>
> wrote:
>
>
> Hi Akhil,
>
> I'm using Spark 1.4.1.
> The number of executors is not on the command line, nor in
> getExecutorMemoryStatus (I already mentioned that I tried that; it works in
> spark-shell but not when executed via spark-submit). I tried looking at
> "defaultParallelism" too: it's 112 (7 executors * 16 cores) when run via
> spark-shell, but just 2 when run via spark-submit.
>
> But the scheduler obviously knows this information. It *must* know it. How
> can I access it? Other than parsing the HTML of the WebUI, that is...
> that's pretty much guaranteed to work, and maybe I'll do that, but it's
> extremely convoluted.
>
> Regards,
> Virgil.
>
> On Fri, Aug 21, 2015 at 11:35 PM, Akhil Das <ak...@sigmoidanalytics.com>
> wrote:
>
> Which version of Spark are you using? There was a discussion about this
> over here:
>
> http://apache-spark-user-list.1001560.n3.nabble.com/Determine-number-of-running-executors-td19453.html
>
> http://mail-archives.us.apache.org/mod_mbox/spark-user/201411.mbox/%3ccacbyxk+ya1rbbnkwjheekpnbsbh10rykuzt-laqgpdanvhm...@mail.gmail.com%3E
> On Aug 21, 2015 7:42 AM, "Virgil Palanciuc" <vir...@palanciuc.eu> wrote:
>
> Is there any reliable way to find out the number of executors
> programmatically, regardless of how the job is run? A method that
> preferably works for Spark standalone, YARN, and Mesos, regardless of
> whether the code runs from the shell or not?
>
> Things that I tried that don't work (sketched in code after this list):
> - sparkContext.getExecutorMemoryStatus.size - 1 // works from the shell,
> does not work if the task is submitted via spark-submit
> - sparkContext.getConf.getInt("spark.executor.instances", 1) - doesn't
> work unless explicitly configured
> - a call to http://master:8080/json (this used to work, but doesn't
> anymore?)
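>
> A sketch of the first two probes (assuming an active SparkContext named
> sc); as described above, both return misleading values in this scenario:
>
>   // Counts entries in the memory-status map, minus the driver's own entry.
>   val fromMemoryStatus = sc.getExecutorMemoryStatus.size - 1
>   // Reads the configured count; defaults to 1 unless explicitly set.
>   val fromConf = sc.getConf.getInt("spark.executor.instances", 1)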
>
> I guess I could parse the output HTML from the Spark UI... but that seems
> dumb. Is there really no better way?
>
> Thanks,
> Virgil.
>
