+1 for such a document.

---- Eric Friedman
> On Aug 15, 2014, at 1:10 PM, Kevin Markey <kevin.mar...@oracle.com> wrote:
>
> Sandy and others:
>
> Is there a single source of Yarn/Hadoop properties that should be set or
> reset for running Spark on Yarn? We've sort of stumbled through one
> property after another, and (unless there's an update I've not yet seen)
> the CDH5 Spark-related properties are for running the Spark Master
> instead of Yarn.
>
> Thanks
> Kevin
>
>> On 08/15/2014 12:47 PM, Sandy Ryza wrote:
>> We generally recommend setting yarn.scheduler.maximum-allocation-mb to
>> the maximum node capacity.
>>
>> -Sandy
>>
>>> On Fri, Aug 15, 2014 at 11:41 AM, Soumya Simanta
>>> <soumya.sima...@gmail.com> wrote:
>>> I just checked the YARN config, and it looks like I need to change this
>>> value. Should it be upgraded to 48G (the max memory allocated to YARN)
>>> per node?
>>>
>>> <property>
>>>   <name>yarn.scheduler.maximum-allocation-mb</name>
>>>   <value>6144</value>
>>>   <source>java.io.BufferedInputStream@2e7e1ee</source>
>>> </property>
>>>
>>>> On Fri, Aug 15, 2014 at 2:37 PM, Soumya Simanta
>>>> <soumya.sima...@gmail.com> wrote:
>>>> Andrew,
>>>>
>>>> Thanks for your response.
>>>>
>>>> When I try the following:
>>>>
>>>> ./spark-shell --executor-memory 46g --master yarn
>>>>
>>>> I get this error:
>>>>
>>>> Exception in thread "main" java.lang.Exception: When running with
>>>> master 'yarn' either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in
>>>> the environment.
>>>>   at org.apache.spark.deploy.SparkSubmitArguments.checkRequiredArguments(SparkSubmitArguments.scala:166)
>>>>   at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:61)
>>>>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:50)
>>>>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>>>
>>>> After this I set the following environment variable:
>>>>
>>>> export YARN_CONF_DIR=/usr/lib/hadoop-yarn/etc/hadoop/
>>>>
>>>> The program launches but then halts with this error:
>>>>
>>>> 14/08/15 14:33:22 ERROR yarn.Client: Required executor memory (47104
>>>> MB), is above the max threshold (6144 MB) of this cluster.
>>>>
>>>> I guess this is some YARN setting that is not set correctly.
>>>>
>>>> Thanks
>>>> -Soumya
>>>>
>>>>> On Fri, Aug 15, 2014 at 2:19 PM, Andrew Or <and...@databricks.com> wrote:
>>>>> Hi Soumya,
>>>>>
>>>>> The driver's console output prints how much memory is actually granted
>>>>> to each executor, so from there you can verify how much memory the
>>>>> executors are actually getting. You should use the '--executor-memory'
>>>>> argument in spark-shell. For instance, assuming each node has 48G of
>>>>> memory:
>>>>>
>>>>> bin/spark-shell --executor-memory 46g --master yarn
>>>>>
>>>>> We leave a small cushion for the OS so we don't take up the entire
>>>>> system's memory. This option also applies to the standalone mode you've
>>>>> been using, but if you have been using the ec2 scripts, we set
>>>>> "spark.executor.memory" in conf/spark-defaults.conf for you
>>>>> automatically, so you don't have to specify it each time on the command
>>>>> line. Of course, you can also do the same in YARN.
>>>>>
>>>>> -Andrew
>>>>>
>>>>> 2014-08-15 10:45 GMT-07:00 Soumya Simanta <soumya.sima...@gmail.com>:
>>>>>
>>>>>> I've been using the standalone cluster all this time and it worked
>>>>>> fine. Recently I've been using another Spark cluster that is based on
>>>>>> YARN, and I have no experience with YARN.
>>>>>>
>>>>>> The YARN cluster has 10 nodes and a total memory of 480G.
>>>>>>
>>>>>> I'm having trouble starting the spark-shell with enough memory. I'm
>>>>>> doing a very simple operation - reading a 100GB file from HDFS and
>>>>>> running a count on it. This fails due to out of memory on the
>>>>>> executors.
>>>>>>
>>>>>> Can someone point me to the command-line parameters that I should use
>>>>>> for spark-shell so that it works?
>>>>>>
>>>>>> Thanks
>>>>>> -Soumya
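
Until such a document exists, a minimal sketch of the YARN side of what Sandy
and the error above point to. The 49152 value is illustrative only, assuming
48G is handed to YARN on each node; it should match whatever is actually set
for yarn.nodemanager.resource.memory-mb on your NodeManagers:

  <!-- yarn-site.xml: raise the per-container ceiling so a single executor
       can be as large as the memory YARN manages on one node -->
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>49152</value>   <!-- 48G, illustrative -->
  </property>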
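
And the client side, again only a sketch using the paths and sizes already
mentioned in this thread. Keep in mind that Spark on YARN also reserves some
per-executor overhead on top of the executor heap, so leave a little headroom
below the YARN cap:

  # point spark-shell at the cluster's YARN configuration
  export YARN_CONF_DIR=/usr/lib/hadoop-yarn/etc/hadoop/

  # either pass the executor size on the command line ...
  ./bin/spark-shell --master yarn --executor-memory 46g

  # ... or set it once in conf/spark-defaults.conf:
  #   spark.executor.memory   46g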