GitHub user morenn520 opened a pull request:
https://github.com/apache/spark/pull/17238
getRackForHost returns None if host is unknown by driver
## What changes were proposed in this pull request?
https://issues.apache.org/jira/browse/SPARK-19894
## How was this patch tested?
Tested on our production YARN cluster in yarn-cluster mode. Applying this
patch resolved the rack-local problem our users were seeing.
Problem:
In our production YARN cluster, one node (called the missing-rack-info
node) lacks rack information for some of the other nodes. A Spark Streaming
program (data source: Kafka, mode: yarn-cluster) runs its driver on this
missing-rack-info node.
The nodes whose hosts are missing on the driver node, and the Kafka broker
nodes, whose hosts are also unknown to YARN, are both resolved to
"/default-rack" by the YARN scheduler. As a result, all tasks are assigned
to those nodes at the RACK_LOCAL locality level.
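
The effect described above can be illustrated with a small, language-neutral
sketch. It is written in Python for brevity and is not the Spark (Scala)
code from this patch. The host names, the `KNOWN_RACKS` table, and the helper
functions are hypothetical. Only the "/default-rack" name and the idea of
`getRackForHost` returning None for an unknown host come from the description.

```python
from typing import Callable, Dict, Optional

DEFAULT_RACK = "/default-rack"  # fallback rack name quoted in the description

# Hypothetical topology as seen from the driver node; host names are made up.
KNOWN_RACKS: Dict[str, str] = {
    "worker-1": "/rack-a",
    "worker-2": "/rack-b",
}


def resolve_with_default(host: str) -> str:
    """Problem case: every unknown host falls back to the default rack."""
    return KNOWN_RACKS.get(host, DEFAULT_RACK)


def get_rack_for_host(host: str) -> Optional[str]:
    """Proposed behaviour: return None when the host is unknown."""
    return KNOWN_RACKS.get(host)


def is_rack_local(task_host: str, executor_host: str,
                  rack_of: Callable[[str], Optional[str]]) -> bool:
    """Two hosts count as rack-local only if both map to the same known rack."""
    task_rack = rack_of(task_host)
    executor_rack = rack_of(executor_host)
    return task_rack is not None and task_rack == executor_rack


# An unknown Kafka broker and an unknown worker share the default rack,
# so they look rack-local even though nothing is known about either host.
print(is_rack_local("kafka-broker-1", "worker-3", resolve_with_default))
# With None for unknown hosts, no rack-level match is claimed.
print(is_rack_local("kafka-broker-1", "worker-3", get_rack_for_host))
```

In this sketch, returning None keeps unrelated unknown hosts from being
grouped into one artificial rack, which appears to be the intent of the
change.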
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/morenn520/spark master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/17238.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #17238
----
commit 6630e747efa52bff5ca48bb0a5610357c7754c10
Author: Chen Yuechen <[email protected]>
Date: 2017-03-10T07:24:48Z
getRackForHost returns None if host is unknown by driver
----