GitHub user morenn520 opened a pull request:

    https://github.com/apache/spark/pull/17238

    getRackForHost returns None if host is unknown by driver

    ## What changes were proposed in this pull request?
    
    https://issues.apache.org/jira/browse/SPARK-19894
    
    ## How was this patch tested?
    
    Tested on our production YARN cluster in yarn-cluster mode; applying this 
patch resolved our users' rack-local problems.
    Problem:
    In our production YARN cluster, one node (called the missing-rack-info 
node) lacks rack information for some of the other nodes. A Spark Streaming 
program (data source: Kafka, mode: yarn-cluster) runs its driver on this 
missing-rack-info node.
    Nodes whose hosts cannot be resolved on the driver node, and the Kafka 
broker nodes whose hosts are also unknown to YARN, are both recognized as 
"/default-rack" by the YARN scheduler, so all tasks are assigned to those 
nodes as RACK_LOCAL.
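
    The behavior in the title can be illustrated with a minimal sketch. This 
is not the Spark code touched by the patch: the function names, the 
dictionary-based topology, and the host names are all hypothetical, and the 
only detail taken from the description above is that unresolved hosts fall 
back to "/default-rack". The idea is that a lookup returning None for an 
unknown host keeps two unrelated unknown hosts from being treated as sharing 
a rack.

```python
from typing import Dict, Optional

# Fallback rack name reported for hosts that cannot be resolved.
DEFAULT_RACK = "/default-rack"


def get_rack_for_host(host_port: str, topology: Dict[str, str]) -> Optional[str]:
    """Return the rack for a host, or None if the host is unknown.

    `topology` is a hypothetical stand-in for the driver's rack resolver:
    a plain mapping from host name to rack name.
    """
    # Drop an optional ":port" suffix (IPv6 literals are not handled here).
    host = host_port.rsplit(":", 1)[0]
    rack = topology.get(host, DEFAULT_RACK)
    # Treat the fallback rack as "unknown" rather than as a real rack.
    return None if rack == DEFAULT_RACK else rack


def is_rack_local(host_a: str, host_b: str, topology: Dict[str, str]) -> bool:
    """Two hosts are rack-local only if both racks are known and equal."""
    rack_a = get_rack_for_host(host_a, topology)
    rack_b = get_rack_for_host(host_b, topology)
    return rack_a is not None and rack_a == rack_b


# Hypothetical view from a driver that knows only two of the hosts.
driver_view = {"node-a": "/rack-1", "node-b": "/rack-1"}

print(is_rack_local("node-a:7337", "node-b:7337", driver_view))
print(is_rack_local("node-c:7337", "kafka-broker-1:9092", driver_view))
```

    With a lookup that returned "/default-rack" for both unknown hosts, the 
second comparison would match, which corresponds to the RACK_LOCAL 
assignment described in the problem statement.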

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/morenn520/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17238.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17238
    
----
commit 6630e747efa52bff5ca48bb0a5610357c7754c10
Author: Chen Yuechen <[email protected]>
Date:   2017-03-10T07:24:48Z

    getRackForHost returns None if host is unknown by driver

----

