[ 
https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699908#comment-14699908
 ] 

Wangda Tan commented on YARN-4024:
----------------------------------

Hi [~zhiguohong],

Thanks for working on this.
For your comments:
bq. I think that's too complicated...
Agreed, I changed my mind; see 
https://issues.apache.org/jira/browse/YARN-4024?focusedCommentId=14660607&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14660607.

The approach in your patch generally looks good to me. A few suggestions:
1) When a node becomes NODE_UNUSABLE/NODE_USABLE, I suggest removing it from 
the cache to force an update of its IP, since a node status change will 
(likely) change its IP. This may require updating the Resolver interface.
2) Rename "seconds" to something like normalizedHostnameCacheTimeout.
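
To illustrate suggestion 1), here is a minimal sketch of a resolver cache with 
both a timeout and explicit removal. It is not taken from the patch: the class 
name CachedHostResolver, the method removeFromCache, and the injected lookup 
and clock are all hypothetical, and the real Resolver interface may look 
different.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical sketch only; names are illustrative and not from the patch.
class CachedHostResolver {
  // A cached IP together with the time it was resolved.
  private static final class Entry {
    final String ip;
    final long resolvedAtMs;

    Entry(String ip, long resolvedAtMs) {
      this.ip = ip;
      this.resolvedAtMs = resolvedAtMs;
    }
  }

  private final Map<String, Entry> cache = new ConcurrentHashMap<>();
  private final Function<String, String> lookup; // assumed: performs the real DNS lookup
  private final LongSupplier clockMs;            // assumed: current time in milliseconds
  private final long cacheTimeoutMs;

  CachedHostResolver(Function<String, String> lookup, LongSupplier clockMs,
      long cacheTimeoutMs) {
    this.lookup = lookup;
    this.clockMs = clockMs;
    this.cacheTimeoutMs = cacheTimeoutMs;
  }

  // Returns the cached IP, resolving again only if the entry is missing or expired.
  String resolve(String hostname) {
    long now = clockMs.getAsLong();
    Entry e = cache.get(hostname);
    if (e == null || now - e.resolvedAtMs >= cacheTimeoutMs) {
      e = new Entry(lookup.apply(hostname), now);
      cache.put(hostname, e);
    }
    return e.ip;
  }

  // Called when a node becomes NODE_UNUSABLE/NODE_USABLE, so that the next
  // heartbeat from that host triggers a fresh lookup.
  void removeFromCache(String hostname) {
    cache.remove(hostname);
  }
}
```

With something like this, heartbeats hit the cache in the common case, and only 
a node status change or an expired entry causes a new DNS lookup.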

[~wilsoncraft],
bq. When a nodemanager is decommissioned, is the IP cached for that host 
flushed out of the cache? Normally when a host gets a new IP its because it 
gets moved or some other deliberate maintenance which would normally be 
preceded by a decommission. If the IP is flushed when decommissioned or a IP is 
always resolved from the host name when a new or recommissioned nodemanager is 
added to the cluster I think that would be adequate IMHO.
I'm not quite sure what you meant. Does my comment quoted here solve the 
problem you mentioned?
bq. 1) When node becomes NODE_UNUSABLE/NODE_USABLE, I suggest remove them from 
the cache to force update its ip, since a node status change will (likely) 
update its ip. So this may require update the Resolver interface

bq. Also, it may be worthwhile or adequate to expose the method in a yarn 
rmadin command to force a flush of the IP cache. Is this IP cache the same used 
for Rack Awareness by the RM?
I prefer to keep this as internal behavior. IIUC, this cache won't be used to 
determine the rack.

Please let me know your thoughts.

Thanks,
Wangda

> YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
> ----------------------------------------------------------------------
>
>                 Key: YARN-4024
>                 URL: https://issues.apache.org/jira/browse/YARN-4024
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Wangda Tan
>            Assignee: Hong Zhiguo
>         Attachments: YARN-4024-draft.patch
>
>
> Currently, the YARN RM NodesListManager resolves the IP address every time a 
> node sends a heartbeat. When the DNS server becomes slow, NM heartbeats are 
> blocked and cannot make progress.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
