Hi Soumya,

The driver's console output prints how much memory is actually granted to
each executor, so that is where you can verify how much memory the
executors are getting. You should pass the '--executor-memory' argument to
spark-shell. For instance, assuming each node has 48G of memory:

bin/spark-shell --executor-memory 46g --master yarn
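
Once the shell comes up with that setting, the count you described is just
the usual textFile + count; for example (the HDFS path here is only a
placeholder for your own file):

val lines = sc.textFile("hdfs:///path/to/your/100g/file")  // RDD of lines from the 100GB input
lines.count()  // triggers the read and returns the number of lines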

We leave a small cushion for the OS so that we don't take up the entire
system's memory. This option also applies to the standalone mode you've
been using, but if you have been using the ec2 scripts, we set
"spark.executor.memory" in conf/spark-defaults.conf for you automatically
so you don't have to specify it each time on the command line. Of course,
you can also do the same in YARN.
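
In case it helps, the entry in conf/spark-defaults.conf is just a
whitespace-separated key-value line; for 48G nodes it would be something
like:

spark.executor.memory   46g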

-Andrew



2014-08-15 10:45 GMT-07:00 Soumya Simanta <soumya.sima...@gmail.com>:

> I've been using the standalone cluster all this time and it worked fine.
> Recently I've been using another Spark cluster that is based on YARN, and
> I have no experience with YARN.
>
> The YARN cluster has 10 nodes and a total memory of 480G.
>
> I'm having trouble starting the spark-shell with enough memory.
> I'm doing a very simple operation - reading a 100GB file from HDFS and
> running a count on it. This fails with out-of-memory errors on the
> executors.
>
> Can someone point me to the command-line parameters that I should use for
> spark-shell so that this doesn't run out of memory?
>
>
> Thanks
> -Soumya
>
>
