Thank you, Nishkam, I have read your code. So, just to check my understanding: for each SparkContext there is one executor per node? Can anyone confirm this?
--
Martin Goodson | VP Data Science
(0)20 3397 1240
[image: Inline image 1]

On Thu, Jul 24, 2014 at 6:12 AM, Nishkam Ravi <nr...@cloudera.com> wrote:

> See if this helps:
>
> https://github.com/nishkamravi2/SparkAutoConfig/
>
> It's a very simple tool for auto-configuring default parameters in Spark.
> Takes as input high-level parameters (like number of nodes, cores per
> node, memory per node, etc.) and spits out a default configuration, user
> advice and a command line. Compile (javac SparkConfigure.java) and run
> (java SparkConfigure).
>
> Also cc'ing dev in case others are interested in helping evolve this over
> time (by refining the heuristics and adding more parameters).
>
> On Wed, Jul 23, 2014 at 8:31 AM, Martin Goodson <mar...@skimlinks.com>
> wrote:
>
>> Thanks Andrew,
>>
>> So if there is only one SparkContext, is there only one executor per
>> machine? This seems to contradict Aaron's message from the link above:
>>
>> "If each machine has 16 GB of RAM and 4 cores, for example, you might set
>> spark.executor.memory between 2 and 3 GB, totaling 8-12 GB used by Spark."
>>
>> Am I reading this incorrectly?
>>
>> Anyway, our configuration is 21 machines (one master and 20 slaves), each
>> with 60 GB. We would like to use 4 cores per machine. This is pyspark, so
>> we want to leave, say, 16 GB on each machine for Python processes.
>>
>> Thanks again for the advice!
>>
>> --
>> Martin Goodson | VP Data Science
>> (0)20 3397 1240
>> [image: Inline image 1]
>>
>> On Wed, Jul 23, 2014 at 4:19 PM, Andrew Ash <and...@andrewash.com> wrote:
>>
>>> Hi Martin,
>>>
>>> In standalone mode, each SparkContext you initialize gets its own set of
>>> executors across the cluster. So, for example, if you have two shells
>>> open, they'll each get two JVMs on each worker machine in the cluster.
>>>
>>> As far as the other docs, you can configure the total number of cores
>>> requested for the SparkContext, the amount of memory for the executor
>>> JVM on each machine, the amount of memory for the Master/Worker daemons
>>> (little needed, since work is done in executors), and several other
>>> settings.
>>>
>>> Which of those are you interested in? What spec hardware do you have,
>>> and how do you want to configure it?
>>>
>>> Andrew
>>>
>>> On Wed, Jul 23, 2014 at 6:10 AM, Martin Goodson <mar...@skimlinks.com>
>>> wrote:
>>>
>>>> We are having difficulties configuring Spark, partly because we still
>>>> don't understand some key concepts. For instance, how many executors
>>>> are there per machine in standalone mode? This is after having closely
>>>> read the documentation several times:
>>>>
>>>> http://spark.apache.org/docs/latest/configuration.html
>>>> http://spark.apache.org/docs/latest/spark-standalone.html
>>>> http://spark.apache.org/docs/latest/tuning.html
>>>> http://spark.apache.org/docs/latest/cluster-overview.html
>>>>
>>>> The cluster overview has some information here about executors, but it
>>>> is ambiguous about whether there are single or multiple executors on
>>>> each machine.
>>>>
>>>> This message from Aaron Davidson implies that the executor memory
>>>> should be set to the total available memory on the machine divided by
>>>> the number of cores:
>>>>
>>>> http://mail-archives.apache.org/mod_mbox/spark-user/201312.mbox/%3CCANGvG8o5K1SxgnFMT_9DK=vj_plbve6zh_dn5sjwpznpbcp...@mail.gmail.com%3E
>>>>
>>>> But other messages imply that the executor memory should be set to the
>>>> *total* available memory of each machine.
>>>>
>>>> We would very much appreciate some clarity on this and on the myriad of
>>>> other memory settings available (daemon memory, worker memory, etc.).
>>>> Perhaps a worked example could be added to the docs? I would be happy
>>>> to provide some text as soon as someone can enlighten me on the
>>>> technicalities!
>>>>
>>>> Thank you
>>>>
>>>> --
>>>> Martin Goodson | VP Data Science
>>>> (0)20 3397 1240
>>>> [image: Inline image 1]
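The worked example Martin asks for might look something like the sketch below, sized for the cluster he describes (20 workers, 60 GB RAM and 4 cores each, ~16 GB reserved for Python). The function name, the per-core split, and the 0.9 headroom factor are illustrative assumptions, not Spark defaults:

```python
# Toy heuristic for sizing spark.executor.memory on the cluster described
# in the thread. All names and the 0.9 headroom factor are assumptions for
# illustration -- they are not Spark defaults.

def suggest_executor_memory_gb(ram_gb, reserved_gb=0.0, headroom=0.9):
    """Memory for the single executor JVM on a worker, after setting
    aside `reserved_gb` (e.g. for Python worker processes) and leaving
    some headroom for the OS and daemons."""
    return int((ram_gb - reserved_gb) * headroom)

# 60 GB machines, 16 GB reserved for Python processes:
executor_gb = suggest_executor_memory_gb(60, reserved_gb=16)
print(f"spark.executor.memory  {executor_gb}g")  # under these assumptions: 39g
```

If Aaron's per-core reading were the correct one instead, this figure would be divided by the number of cores (39 / 4, so roughly 9 GB per executor); the sketch only makes the two competing interpretations in the thread concrete.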