We generally recommend setting yarn.scheduler.maximum-allocation-mb to the maximum node capacity.
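For example, a minimal yarn-site.xml sketch (assuming 48G nodes, i.e. 49152 MB; adjust the value to your actual node capacity):

  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>49152</value>
  </property>

You may also need to raise yarn.nodemanager.resource.memory-mb to match, since no single container can be larger than what a NodeManager advertises.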
-Sandy

On Fri, Aug 15, 2014 at 11:41 AM, Soumya Simanta <soumya.sima...@gmail.com> wrote:
> I just checked the YARN config and it looks like I need to change this value.
> Should it be upgraded to 48G (the max memory allocated to YARN) per node?
>
>   <property>
>     <name>yarn.scheduler.maximum-allocation-mb</name>
>     <value>6144</value>
>     <source>java.io.BufferedInputStream@2e7e1ee</source>
>   </property>
>
>
> On Fri, Aug 15, 2014 at 2:37 PM, Soumya Simanta <soumya.sima...@gmail.com> wrote:
>
>> Andrew,
>>
>> Thanks for your response.
>>
>> When I try to do the following:
>>
>>   ./spark-shell --executor-memory 46g --master yarn
>>
>> I get the following error:
>>
>>   Exception in thread "main" java.lang.Exception: When running with master
>>   'yarn' either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in the environment.
>>     at org.apache.spark.deploy.SparkSubmitArguments.checkRequiredArguments(SparkSubmitArguments.scala:166)
>>     at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:61)
>>     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:50)
>>     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>
>> After this I set the following env variable:
>>
>>   export YARN_CONF_DIR=/usr/lib/hadoop-yarn/etc/hadoop/
>>
>> The program launches but then halts with the following error:
>>
>>   14/08/15 14:33:22 ERROR yarn.Client: Required executor memory (47104 MB), is above the max threshold (6144 MB) of this cluster.
>>
>> I guess this is some YARN setting that is not set correctly.
>>
>> Thanks
>> -Soumya
>>
>>
>> On Fri, Aug 15, 2014 at 2:19 PM, Andrew Or <and...@databricks.com> wrote:
>>
>>> Hi Soumya,
>>>
>>> The driver's console output prints out how much memory is actually
>>> granted to each executor, so from there you can verify how much memory the
>>> executors are actually getting. You should use the '--executor-memory'
>>> argument in spark-shell. For instance, assuming each node has 48G of memory:
>>>
>>>   bin/spark-shell --executor-memory 46g --master yarn
>>>
>>> We leave a small cushion for the OS so we don't take up all of the
>>> system's memory. This option also applies to the standalone mode
>>> you've been using, but if you have been using the ec2 scripts, we set
>>> "spark.executor.memory" in conf/spark-defaults.conf for you automatically
>>> so you don't have to specify it each time on the command line. Of course,
>>> you can also do the same in YARN.
>>>
>>> -Andrew
>>>
>>>
>>> 2014-08-15 10:45 GMT-07:00 Soumya Simanta <soumya.sima...@gmail.com>:
>>>
>>>> I've been using the standalone cluster all this time and it worked fine.
>>>> Recently I've been using another Spark cluster that is based on YARN,
>>>> and I have no experience with YARN.
>>>>
>>>> The YARN cluster has 10 nodes and a total memory of 480G.
>>>>
>>>> I'm having trouble starting the spark-shell with enough memory.
>>>> I'm doing a very simple operation - reading a 100GB file from HDFS and
>>>> running a count on it. This fails due to out of memory on the executors.
>>>>
>>>> Can someone point me to the command line parameters that I should use for
>>>> spark-shell so that it works?
>>>>
>>>>
>>>> Thanks
>>>> -Soumya
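Following up on Andrew's note about conf/spark-defaults.conf: a minimal sketch of setting the executor memory there once instead of passing --executor-memory on every invocation (46g is just the figure discussed above, not a general recommendation):

  spark.executor.memory  46g

With that line in place, bin/spark-shell --master yarn picks up the setting without any extra flags.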