[jira] [Commented] (YARN-9399) Yarn Client may use stale DNS to connect to RM

2019-03-22 Thread Fengnan Li (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16799531#comment-16799531
 ] 

Fengnan Li commented on YARN-9399:
--

[~elgoiri] [~xianliangz] This is an interesting issue.

I think the solution depends on where the cache is kept. After a little 
research I found this article: 
[https://www-01.ibm.com/support/docview.wss?uid=swg21207534]

It seems the cache lives inside InetAddress, which InetSocketAddress also uses.

[~xianliangz] Can we try setting the JVM's DNS cache TTL so that cached 
entries expire much sooner?

Some operating systems also appear to cache DNS results themselves; if that is 
the case here, we will probably need to find a system-level setting to tune.
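
A minimal sketch of the JVM-side TTL idea, assuming the standard 
networkaddress.cache.ttl and networkaddress.cache.negative.ttl security 
properties are what controls the InetAddress cache. The class name 
DnsCacheTtlSketch and the values 30 and 5 are hypothetical and only for 
illustration. The properties typically need to be set before the JVM performs 
its first lookup, because the cache policy is read once.

```java
import java.security.Security;

final class DnsCacheTtlSketch {

    /**
     * Sets the JVM-wide InetAddress cache lifetimes, in seconds.
     * positiveTtlSeconds applies to successful lookups,
     * negativeTtlSeconds to failed ones.
     */
    static void configure(int positiveTtlSeconds, int negativeTtlSeconds) {
        Security.setProperty("networkaddress.cache.ttl",
                Integer.toString(positiveTtlSeconds));
        Security.setProperty("networkaddress.cache.negative.ttl",
                Integer.toString(negativeTtlSeconds));
    }

    public static void main(String[] args) {
        // Hypothetical values: keep successful lookups for 30 s, failures for 5 s.
        configure(30, 5);
        System.out.println("networkaddress.cache.ttl = "
                + Security.getProperty("networkaddress.cache.ttl"));
        System.out.println("networkaddress.cache.negative.ttl = "
                + Security.getProperty("networkaddress.cache.negative.ttl"));
    }
}
```

This only affects the cache inside the JVM. It does nothing for an 
InetSocketAddress object that was already created unresolved, and nothing for 
any caching done by the OS resolver.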

> Yarn Client may use stale DNS to connect to RM
> --
>
> Key: YARN-9399
> URL: https://issues.apache.org/jira/browse/YARN-9399
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.1
>Reporter: Leon zhang
>Assignee: Íñigo Goiri
>Priority: Major
>  Labels: patch
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> This happens more frequently when running YARN in Kubernetes. When the YARN 
> client tries to connect to the RM and the RM's DNS name is not resolvable 
> because kube-dns has failed or is not ready, the client initializes itself 
> with an unresolved InetSocketAddress in RMProxy#newProxyInstance(). The 
> connection to the RM then fails with UnknownHostException. The YARN client 
> retries the connection through RetryProxy, but it always uses the cached 
> unresolved InetSocketAddress, so the retry never succeeds. The bug also 
> occurs when the RM is rescheduled to another Kubernetes node, which changes 
> the RM's IP. Currently the workaround is to restart the YARN client. 
> This issue happens in both HA and non-HA RM setups. HDFS has similar issues: 
> [https://github.com/apache-spark-on-k8s/kubernetes-HDFS/issues/48]
> I propose adding a new RMFailoverProxyProvider called 
> AutoRefreshRMFailoverProxyProvider, which will resolve the DNS name in the 
> overridden getProxy() method. This way, RetryProxy can re-resolve DNS each 
> time it retries. 
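
The behaviour the description relies on can be sketched in plain Java: an 
InetSocketAddress never re-resolves itself, so a retry has to build a new 
instance to get a fresh lookup. This is not the proposed 
AutoRefreshRMFailoverProxyProvider; the class name ReResolveSketch, the 
method refresh() and port 8032 are hypothetical and only for illustration.

```java
import java.net.InetSocketAddress;

final class ReResolveSketch {

    /**
     * Builds a new InetSocketAddress from the host string and port of an
     * existing one. The (String, int) constructor attempts resolution again;
     * if that fails, the returned address is still flagged as unresolved.
     */
    static InetSocketAddress refresh(InetSocketAddress addr) {
        // getHostString() returns the stored host without a reverse lookup.
        return new InetSocketAddress(addr.getHostString(), addr.getPort());
    }

    public static void main(String[] args) {
        // An IP literal keeps this demo independent of any DNS server;
        // with a real hostname, refresh() would go through the resolver.
        InetSocketAddress stale =
                InetSocketAddress.createUnresolved("127.0.0.1", 8032);
        InetSocketAddress fresh = refresh(stale);
        System.out.println("stale unresolved: " + stale.isUnresolved());
        System.out.println("fresh unresolved: " + fresh.isUnresolved());
    }
}
```

A retry loop that keeps reusing the original object would see the same 
unresolved state on every attempt, which matches the failure described above.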



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9399) Yarn Client may use stale DNS to connect to RM

2019-03-19 Thread JIRA


[ 
https://issues.apache.org/jira/browse/YARN-9399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16796289#comment-16796289
 ] 

Íñigo Goiri commented on YARN-9399:
---

This is somewhat related to HDFS-4957 on the HDFS side.
The discussion here also seems closely related:
https://github.com/apache-spark-on-k8s/kubernetes-HDFS/issues/48
[~fengnanli], in HDFS-14327 you are using FQDN addresses.
Should we cover this scenario there?








[jira] [Commented] (YARN-9399) Yarn Client may use stale DNS to connect to RM

2019-03-18 Thread JIRA


[ 
https://issues.apache.org/jira/browse/YARN-9399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16795559#comment-16795559
 ] 

Íñigo Goiri commented on YARN-9399:
---

Moved from HDFS to YARN. 



