GitHub user attilapiros opened a pull request:
https://github.com/apache/spark/pull/21635
[SPARK-24594][YARN] Introducing metrics for YARN executor allocation problems
## What changes were proposed in this pull request?
This PR introduces metrics for YARN allocation failures. Because the YARN
module had no metrics until now, a new metric system named "yarn" is created.
To support both client and cluster mode, the metric system's lifecycle is
bound to the AM.
## How was this patch tested?
Both client and cluster modes were tested manually.
Before the test, spark-core was removed from one of the YARN nodes to cause
allocation failures.
Spark was started as follows (client mode shown):
```
spark2-submit \
  --class org.apache.spark.examples.SparkPi \
  --conf "spark.yarn.blacklist.executor.launch.blacklisting.enabled=true" \
  --conf "spark.blacklist.application.maxFailedExecutorsPerNode=2" \
  --conf "spark.dynamicAllocation.enabled=true" \
  --conf "spark.metrics.conf.*.sink.console.class=org.apache.spark.metrics.sink.ConsoleSink" \
  --master yarn \
  --deploy-mode client \
  original-spark-examples_2.11-2.4.0-SNAPSHOT.jar \
  1000
```
In both modes, the YARN logs contained the new metrics:
```
$ yarn logs --applicationId application_1529926424933_0015 | grep -A1 -B1 yarn.numFailedExecutors
18/06/25 07:08:29 INFO client.RMProxy: Connecting to ResourceManager at ...
-- Gauges
----------------------------------------------------------------------
yarn.numFailedExecutors
value = 0
--
-- Gauges
----------------------------------------------------------------------
yarn.numFailedExecutors
value = 3
--
-- Gauges
----------------------------------------------------------------------
yarn.numFailedExecutors
value = 3
--
-- Gauges
----------------------------------------------------------------------
yarn.numFailedExecutors
value = 3
```
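To follow the gauge over time without reading the full report, the values can be pulled out of the log stream. The following is a minimal sketch, assuming the ConsoleSink layout shown in the log excerpt (the gauge name on one line, `value = N` on the next). The function name `extract_failed_executors` is hypothetical and not part of this PR.

```shell
# Hypothetical helper: reads ConsoleSink output on stdin and prints every
# reported value of the yarn.numFailedExecutors gauge, one per line.
extract_failed_executors() {
  awk '
    # Remember that the gauge name line was seen, then move to the next line.
    /yarn\.numFailedExecutors/ { seen = 1; next }
    # The first "value = N" line after the name carries the gauge value.
    seen && /value = /         { print $NF; seen = 0 }
  '
}
```

For example, `yarn logs --applicationId <appId> | extract_failed_executors | tail -n 1` would print only the most recently reported value.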
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/attilapiros/spark SPARK-24594
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21635.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21635
----
commit 9b033ccfa572c93d7c2dc7bca06f9be1e363f88a
Author: "attilapiros" <piros.attila.zsolt@...>
Date: 2018-06-19T19:40:20Z
Initial commit (yarn metrics)
----