We generally recommend setting yarn.scheduler.maximum-allocation-mb to the maximum node capacity.
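For example, a minimal yarn-site.xml sketch (assuming 48G nodes, i.e. 49152 MB; adjust the value to your actual node capacity):

  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>49152</value>
  </property>

You may also need to raise yarn.nodemanager.resource.memory-mb to match, since no single container can be larger than what a NodeManager advertises.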
-Sandy

On Fri, Aug 15, 2014 at 11:41 AM, Soumya Simanta <soumya.sima...@gmail.com> wrote:
> I just checked the YARN config and it looks like I need to change this value.
> Should it be upgraded to 48G (the max memory allocated to YARN) per node?
>
>   <property>
>     <name>yarn.scheduler.maximum-allocation-mb</name>
>     <value>6144</value>
>     <source>java.io.BufferedInputStream@2e7e1ee</source>
>   </property>
>
>
> On Fri, Aug 15, 2014 at 2:37 PM, Soumya Simanta <soumya.sima...@gmail.com> wrote:
>
>> Andrew,
>>
>> Thanks for your response.
>>
>> When I try to do the following:
>>
>>   ./spark-shell --executor-memory 46g --master yarn
>>
>> I get the following error:
>>
>>   Exception in thread "main" java.lang.Exception: When running with master
>>   'yarn' either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in the environment.
>>     at org.apache.spark.deploy.SparkSubmitArguments.checkRequiredArguments(SparkSubmitArguments.scala:166)
>>     at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:61)
>>     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:50)
>>     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>
>> After this I set the following env variable:
>>
>>   export YARN_CONF_DIR=/usr/lib/hadoop-yarn/etc/hadoop/
>>
>> The program launches but then halts with the following error:
>>
>>   14/08/15 14:33:22 ERROR yarn.Client: Required executor memory (47104 MB), is above the max threshold (6144 MB) of this cluster.
>>
>> I guess this is some YARN setting that is not set correctly.
>>
>> Thanks
>> -Soumya
>>
>>
>> On Fri, Aug 15, 2014 at 2:19 PM, Andrew Or <and...@databricks.com> wrote:
>>
>>> Hi Soumya,
>>>
>>> The driver's console output prints out how much memory is actually
>>> granted to each executor, so from there you can verify how much memory the
>>> executors are actually getting. You should use the '--executor-memory'
>>> argument in spark-shell. For instance, assuming each node has 48G of memory:
>>>
>>>   bin/spark-shell --executor-memory 46g --master yarn
>>>
>>> We leave a small cushion for the OS so we don't take up all of the
>>> system's memory. This option also applies to the standalone mode
>>> you've been using, but if you have been using the ec2 scripts, we set
>>> "spark.executor.memory" in conf/spark-defaults.conf for you automatically
>>> so you don't have to specify it each time on the command line. Of course,
>>> you can also do the same in YARN.
>>>
>>> -Andrew
>>>
>>>
>>> 2014-08-15 10:45 GMT-07:00 Soumya Simanta <soumya.sima...@gmail.com>:
>>>
>>>> I've been using the standalone cluster all this time and it worked fine.
>>>> Recently I've been using another Spark cluster that is based on YARN,
>>>> and I have no experience with YARN.
>>>>
>>>> The YARN cluster has 10 nodes and a total memory of 480G.
>>>>
>>>> I'm having trouble starting the spark-shell with enough memory.
>>>> I'm doing a very simple operation - reading a 100GB file from HDFS and
>>>> running a count on it. This fails due to out of memory on the executors.
>>>>
>>>> Can someone point me to the command line parameters that I should use for
>>>> spark-shell so that it works?
>>>>
>>>>
>>>> Thanks
>>>> -Soumya
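Following up on Andrew's note about conf/spark-defaults.conf: a minimal sketch of setting the executor memory there once instead of passing --executor-memory on every invocation (46g is just the figure discussed above, not a general recommendation):

  spark.executor.memory  46g

With that line in place, bin/spark-shell --master yarn picks up the setting without any extra flags.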