After changing the allocation I'm getting the following in my logs. No idea what this means.
14/08/15 15:44:33 INFO cluster.YarnClientSchedulerBackend: Application report from ASM: appMasterRpcPort: -1 appStartTime: 1408131861372 yarnAppState: ACCEPTED
14/08/15 15:44:34 INFO cluster.YarnClientSchedulerBackend: Application report from ASM: appMasterRpcPort: -1 appStartTime: 1408131861372 yarnAppState: ACCEPTED
14/08/15 15:44:35 INFO cluster.YarnClientSchedulerBackend: Application report from ASM: appMasterRpcPort: -1 appStartTime: 1408131861372 yarnAppState: ACCEPTED
(the same report repeats once per second, still in yarnAppState: ACCEPTED, through 15:44:51)

On Fri, Aug 15, 2014 at 2:47 PM, Sandy Ryza <sandy.r...@cloudera.com> wrote:

> We generally recommend setting yarn.scheduler.maximum-allocation-mb to the
> maximum node capacity.
>
> -Sandy
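(For reference, the yarn-site.xml entry Sandy describes would presumably end up looking something like the sketch below, assuming each node really does give 48G to YARN, i.e. 48 x 1024 = 49152 MB; the value the cluster currently carries is quoted further down the thread.)

<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <!-- yarn.nodemanager.resource.memory-mb on each node must also be at least
       this large for a single 46-48G executor container to fit on one node -->
  <value>49152</value>
</property>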
> On Fri, Aug 15, 2014 at 11:41 AM, Soumya Simanta <soumya.sima...@gmail.com> wrote:
>
>> I just checked the YARN config and it looks like I need to change this
>> value. Should it be upgraded to 48G (the max memory allocated to YARN) per
>> node?
>>
>> <property>
>>   <name>yarn.scheduler.maximum-allocation-mb</name>
>>   <value>6144</value>
>>   <source>java.io.BufferedInputStream@2e7e1ee</source>
>> </property>
>>
>> On Fri, Aug 15, 2014 at 2:37 PM, Soumya Simanta <soumya.sima...@gmail.com> wrote:
>>
>>> Andrew,
>>>
>>> Thanks for your response.
>>>
>>> When I try to do the following:
>>>
>>> ./spark-shell --executor-memory 46g --master yarn
>>>
>>> I get the following error:
>>>
>>> Exception in thread "main" java.lang.Exception: When running with master
>>> 'yarn' either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in the
>>> environment.
>>>   at org.apache.spark.deploy.SparkSubmitArguments.checkRequiredArguments(SparkSubmitArguments.scala:166)
>>>   at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:61)
>>>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:50)
>>>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>>
>>> After this I set the following env variable:
>>>
>>> export YARN_CONF_DIR=/usr/lib/hadoop-yarn/etc/hadoop/
>>>
>>> The program now launches but then halts with the following error:
>>>
>>> 14/08/15 14:33:22 ERROR yarn.Client: Required executor memory (47104 MB),
>>> is above the max threshold (6144 MB) of this cluster.
>>>
>>> I guess this is some YARN setting that is not set correctly.
>>>
>>> Thanks
>>> -Soumya
>>>
>>> On Fri, Aug 15, 2014 at 2:19 PM, Andrew Or <and...@databricks.com> wrote:
>>>
>>>> Hi Soumya,
>>>>
>>>> The driver's console output prints how much memory is actually granted
>>>> to each executor, so from there you can verify how much memory the
>>>> executors are actually getting. You should use the '--executor-memory'
>>>> argument in spark-shell. For instance, assuming each node has 48G of
>>>> memory:
>>>>
>>>> bin/spark-shell --executor-memory 46g --master yarn
>>>>
>>>> We leave a small cushion for the OS so we don't take up the entire
>>>> system's memory. This option also applies to the standalone mode you've
>>>> been using, but if you have been using the ec2 scripts, we set
>>>> "spark.executor.memory" in conf/spark-defaults.conf for you automatically
>>>> so you don't have to specify it each time on the command line. Of course,
>>>> you can also do the same in YARN.
>>>>
>>>> -Andrew
>>>>
>>>> 2014-08-15 10:45 GMT-07:00 Soumya Simanta <soumya.sima...@gmail.com>:
>>>>
>>>>> I've been using the standalone cluster all this time and it worked
>>>>> fine. Recently I'm using another Spark cluster that is based on YARN,
>>>>> and I have no experience with YARN.
>>>>>
>>>>> The YARN cluster has 10 nodes and a total memory of 480G.
>>>>>
>>>>> I'm having trouble starting the spark-shell with enough memory.
>>>>> I'm doing a very simple operation - reading a 100GB file from HDFS and
>>>>> running a count on it. This fails with out-of-memory errors on the
>>>>> executors.
>>>>>
>>>>> Can someone point me to the command-line parameters I should use for
>>>>> spark-shell so that this works?
>>>>>
>>>>> Thanks
>>>>> -Soumya
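(Putting the advice in this thread together: once yarn.scheduler.maximum-allocation-mb is raised, the launch and the count itself are short. A rough sketch, not verified on this cluster, with a hypothetical HDFS path:)

# optional: set once in conf/spark-defaults.conf instead of passing --executor-memory every time
spark.executor.memory   46g

# launch against YARN (HADOOP_CONF_DIR or YARN_CONF_DIR must point at the Hadoop config)
export YARN_CONF_DIR=/usr/lib/hadoop-yarn/etc/hadoop/
bin/spark-shell --executor-memory 46g --master yarn

// then, at the spark-shell prompt, the original job is a one-liner
val lines = sc.textFile("hdfs:///path/to/the/100GB/file")   // hypothetical path
lines.count()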