+1 for such a document. 

----
Eric Friedman

> On Aug 15, 2014, at 1:10 PM, Kevin Markey <kevin.mar...@oracle.com> wrote:
> 
> Sandy and others:
> 
> Is there a single source of Yarn/Hadoop properties that should be set or 
> reset for running Spark on Yarn?
> We've sort of stumbled through one property after another, and (unless 
> there's an update I've not yet seen) the CDH5 Spark-related properties are 
> documented for running the standalone Spark Master rather than Spark on Yarn.
> 
> Thanks
> Kevin
> 
>> On 08/15/2014 12:47 PM, Sandy Ryza wrote:
>> We generally recommend setting yarn.scheduler.maximum-allocation-mb to the 
>> maximum node capacity.
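>> 
>> For example, assuming 48G nodes, that would mean something like the 
>> following in yarn-site.xml (48G = 49152 MB; the exact value is just a 
>> sketch, and yarn.nodemanager.resource.memory-mb needs to be at least as 
>> large):
>> 
>> <property>
>>   <name>yarn.scheduler.maximum-allocation-mb</name>
>>   <value>49152</value>
>> </property>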
>> 
>> -Sandy
>> 
>> 
>>> On Fri, Aug 15, 2014 at 11:41 AM, Soumya Simanta <soumya.sima...@gmail.com> 
>>> wrote:
>>> I just checked the YARN config, and it looks like I need to change this 
>>> value. Should it be raised to 48G (the max memory allocated to YARN) per 
>>> node?
>>> 
>>> <property>
>>>   <name>yarn.scheduler.maximum-allocation-mb</name>
>>>   <value>6144</value>
>>>   <source>java.io.BufferedInputStream@2e7e1ee</source>
>>> </property>
>>> 
>>> 
>>>> On Fri, Aug 15, 2014 at 2:37 PM, Soumya Simanta <soumya.sima...@gmail.com> 
>>>> wrote:
>>>> Andrew, 
>>>> 
>>>> Thanks for your response. 
>>>> 
>>>> When I try to do the following:
>>>>  ./spark-shell --executor-memory 46g --master yarn
>>>> 
>>>> I get the following error. 
>>>> 
>>>> Exception in thread "main" java.lang.Exception: When running with master 
>>>> 'yarn' either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in the 
>>>> environment.
>>>> 
>>>> at 
>>>> org.apache.spark.deploy.SparkSubmitArguments.checkRequiredArguments(SparkSubmitArguments.scala:166)
>>>> 
>>>> at 
>>>> org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:61)
>>>> 
>>>> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:50)
>>>> 
>>>> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>>> 
>>>> After this, I set the following environment variable:
>>>> 
>>>> export YARN_CONF_DIR=/usr/lib/hadoop-yarn/etc/hadoop/
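>>>> 
>>>> (Per the error message, setting HADOOP_CONF_DIR instead would also work, 
>>>> e.g. export HADOOP_CONF_DIR=/etc/hadoop/conf -- that path is just 
>>>> illustrative and depends on the distribution.)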
>>>> 
>>>> The program launches but then halts with the following error. 
>>>> 
>>>> 
>>>> 14/08/15 14:33:22 ERROR yarn.Client: Required executor memory (47104 MB), 
>>>> is above the max threshold (6144 MB) of this cluster.
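>>>> 
>>>> (47104 MB is just the 46g I asked for, converted to MB: 46 x 1024 = 47104.)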
>>>> 
>>>> I guess this is some YARN setting that is not set correctly. 
>>>> 
>>>> Thanks
>>>> 
>>>> -Soumya
>>>> 
>>>> 
>>>> 
>>>>> On Fri, Aug 15, 2014 at 2:19 PM, Andrew Or <and...@databricks.com> wrote:
>>>>> Hi Soumya,
>>>>> 
>>>>> The driver's console output prints out how much memory is actually 
>>>>> granted to each executor, so from there you can verify how much memory 
>>>>> the executors are actually getting. You should use the 
>>>>> '--executor-memory' argument in spark-shell. For instance, assuming each 
>>>>> node has 48G of memory,
>>>>> 
>>>>> bin/spark-shell --executor-memory 46g --master yarn
>>>>> 
>>>>> We leave a small cushion for the OS so we don't take up the entire 
>>>>> system's memory. This option also applies to the standalone mode you've 
>>>>> been using, but if you have been using the ec2 scripts, we set 
>>>>> "spark.executor.memory" in conf/spark-defaults.conf for you automatically 
>>>>> so you don't have to specify it each time on the command line. Of course, 
>>>>> you can also do the same in YARN.
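>>>>> 
>>>>> For example, to make 46g the default rather than passing it on the command 
>>>>> line each time, a single line in conf/spark-defaults.conf would do it:
>>>>> 
>>>>> spark.executor.memory 46g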
>>>>> 
>>>>> -Andrew
>>>>> 
>>>>> 
>>>>> 
>>>>> 2014-08-15 10:45 GMT-07:00 Soumya Simanta <soumya.sima...@gmail.com>:
>>>>> 
>>>>>> I've been using a standalone cluster all this time and it worked fine. 
>>>>>> Recently I've been using another Spark cluster that is based on YARN, and 
>>>>>> I have no experience with YARN. 
>>>>>> 
>>>>>> The YARN cluster has 10 nodes and a total memory of 480G. 
>>>>>> 
>>>>>> I'm having trouble starting the spark-shell with enough memory. 
>>>>>> I'm doing a very simple operation - reading a file 100GB from HDFS and 
>>>>>> running a count on it. This fails due to out of memory on the executors. 
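>>>>>> 
>>>>>> For reference, the whole job is just something like this in spark-shell 
>>>>>> (the HDFS path here is made up):
>>>>>> 
>>>>>> val lines = sc.textFile("hdfs:///path/to/the/100gb/file")
>>>>>> lines.count()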
>>>>>> 
>>>>>> Can someone point me to the command-line parameters I should use for 
>>>>>> spark-shell so that this job can run without the executors running out 
>>>>>> of memory?
>>>>>> 
>>>>>> 
>>>>>> Thanks
>>>>>> -Soumya
> 
