[ 
https://issues.apache.org/jira/browse/YARN-9399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri reassigned YARN-9399:
---------------------------------

    Assignee: Íñigo Goiri

> Yarn Client may use stale DNS to connect to RM
> ----------------------------------------------
>
>                 Key: YARN-9399
>                 URL: https://issues.apache.org/jira/browse/YARN-9399
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.9.1
>            Reporter: Leon zhang
>            Assignee: Íñigo Goiri
>            Priority: Major
>              Labels: patch
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> This happens more frequently when running YARN in Kubernetes. When the YARN 
> client tries to connect to the RM and the RM's DNS name is not resolvable 
> (because kube-dns has failed or is not ready), the client initializes itself 
> with an unresolved InetSocketAddress in RMProxy#newProxyInstance(). The 
> connection to the RM then fails with UnknownHostException. The client retries 
> the connection through RetryProxy, but it always uses the cached unresolved 
> InetSocketAddress, so the retry never succeeds. The bug also occurs when the 
> RM is rescheduled to another Kubernetes node, which changes the RM's IP. 
> Currently the workaround is to restart the YARN client. 
> This issue happens in both HA and non-HA RM deployments. HDFS has similar 
> issues: 
> [https://github.com/apache-spark-on-k8s/kubernetes-HDFS/issues/48]
> I propose adding a new RMFailoverProxyProvider called 
> AutoRefreshRMFailoverProxyProvider, which resolves the DNS name in the 
> overridden getProxy() method. This way, RetryProxy can re-resolve the DNS 
> name each time it retries. 
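
The JDK behaviour behind the description above can be sketched as follows. This is only an illustration under stated assumptions, not the proposed AutoRefreshRMFailoverProxyProvider: the class name RefreshAddressSketch and the helper reResolve() are hypothetical, and only java.net.InetSocketAddress is used. An InetSocketAddress never re-resolves itself once constructed, so a retry loop has to build a new instance to pick up a changed or newly available DNS record.

```java
import java.net.InetSocketAddress;

// Hypothetical sketch: shows why a cached InetSocketAddress stays stale
// and how building a new one forces a fresh lookup.
public class RefreshAddressSketch {

    // Builds a new InetSocketAddress from the host string and port of the
    // given one. The constructor attempts resolution again, so the result
    // reflects the current DNS answer (or is unresolved if lookup fails).
    static InetSocketAddress reResolve(InetSocketAddress addr) {
        // getHostString() returns the stored name or literal without
        // triggering a reverse lookup.
        return new InetSocketAddress(addr.getHostString(), addr.getPort());
    }

    public static void main(String[] args) {
        // Simulates an address captured while DNS was unavailable.
        // Port 8032 is used only as an example value.
        InetSocketAddress cached =
            InetSocketAddress.createUnresolved("localhost", 8032);
        System.out.println("cached unresolved: " + cached.isUnresolved());

        // What a retry would need to do before reconnecting.
        InetSocketAddress fresh = reResolve(cached);
        System.out.println("fresh unresolved:  " + fresh.isUnresolved());
    }
}
```

Re-resolving unconditionally, rather than only when isUnresolved() is true, also covers the case where the RM moved to a new IP and the cached address is resolved but stale.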



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

