[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699908#comment-14699908 ]
Wangda Tan commented on YARN-4024:
----------------------------------

Hi [~zhiguohong],

Thanks for working on this. For your comments:

bq. I think that's too complicated...
Agreed, I changed my idea, see https://issues.apache.org/jira/browse/YARN-4024?focusedCommentId=14660607&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14660607.

The approach in your patch generally looks good to me. A few suggestions:
1) When a node becomes NODE_UNUSABLE/NODE_USABLE, I suggest removing it from the cache to force an update of its IP, since a node status change will (likely) come with an IP change. This may require updating the Resolver interface.
2) Rename "seconds" to something like normalizedHostnameCacheTimeout.

[~wilsoncraft],
bq. When a nodemanager is decommissioned, is the IP cached for that host flushed out of the cache? Normally when a host gets a new IP it's because it gets moved or some other deliberate maintenance which would normally be preceded by a decommission. If the IP is flushed when decommissioned or an IP is always resolved from the host name when a new or recommissioned nodemanager is added to the cluster I think that would be adequate IMHO.

I'm not quite sure what you meant. Does my comment below solve the problem you mentioned?
bq. 1) When node becomes NODE_UNUSABLE/NODE_USABLE, I suggest remove them from the cache to force update its ip, since a node status change will (likely) update its ip. So this may require update the Resolver interface

bq. Also, it may be worthwhile or adequate to expose the method in a yarn rmadmin command to force a flush of the IP cache. Is this IP cache the same used for Rack Awareness by the RM?
I'd prefer to keep this as internal behavior; IIUC, this cache won't be used to determine racks.

Please let me know your thoughts.
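A minimal sketch of the two suggestions above, under stated assumptions: the `Resolver` interface here is a hypothetical stand-in (its `removeFromCache` method is invented for illustration and is not Hadoop's real signature), `TimedCachedResolver` and `NodeEvent` are hypothetical names, and the lookup function and clock are injected so the caching logic can be exercised without real DNS. It shows a hostname-to-IP cache that expires entries after `normalizedHostnameCacheTimeoutMs` and drops an entry when a node turns usable or unusable.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for the Resolver interface, extended with an explicit
// invalidation method (suggestion 1). Not Hadoop's actual signature.
interface Resolver {
    String resolve(String hostName);
    void removeFromCache(String hostName);
}

// Hypothetical events mirroring the NODE_USABLE / NODE_UNUSABLE transitions.
enum NodeEvent { NODE_USABLE, NODE_UNUSABLE }

// Caches hostname -> IP so a heartbeat only hits DNS once per timeout window.
class TimedCachedResolver implements Resolver {
    private static final class Entry {
        final String ip;
        final long resolvedAtMs;

        Entry(String ip, long resolvedAtMs) {
            this.ip = ip;
            this.resolvedAtMs = resolvedAtMs;
        }
    }

    private final Map<String, Entry> cache = new ConcurrentHashMap<>();
    private final long normalizedHostnameCacheTimeoutMs; // name per suggestion 2)
    private final LongSupplier clockMs;                  // injectable for tests
    private final UnaryOperator<String> lookup;          // the real, possibly slow lookup

    TimedCachedResolver(long timeoutMs, LongSupplier clockMs, UnaryOperator<String> lookup) {
        this.normalizedHostnameCacheTimeoutMs = timeoutMs;
        this.clockMs = clockMs;
        this.lookup = lookup;
    }

    // Plain JDK lookup; assumption: fall back to the hostname if it cannot be resolved.
    static String dnsLookup(String hostName) {
        try {
            return InetAddress.getByName(hostName).getHostAddress();
        } catch (UnknownHostException e) {
            return hostName;
        }
    }

    @Override
    public String resolve(String hostName) {
        long now = clockMs.getAsLong();
        Entry cached = cache.get(hostName);
        if (cached != null && now - cached.resolvedAtMs < normalizedHostnameCacheTimeoutMs) {
            return cached.ip; // fresh hit: no DNS round trip on this heartbeat
        }
        // Miss or expired entry: resolve again. Two threads may race here and both
        // resolve; the last put wins, which is harmless for a cache.
        String ip = lookup.apply(hostName);
        cache.put(hostName, new Entry(ip, now));
        return ip;
    }

    @Override
    public void removeFromCache(String hostName) {
        cache.remove(hostName);
    }

    // Suggestion 1): a usable/unusable transition drops the entry so the next
    // resolve() picks up the node's possibly new IP.
    void onNodeEvent(NodeEvent event, String hostName) {
        switch (event) {
            case NODE_USABLE:
            case NODE_UNUSABLE:
                removeFromCache(hostName);
                break;
        }
    }
}
```

In a real deployment one would pass `TimedCachedResolver::dnsLookup` and a monotonic clock such as `() -> System.nanoTime() / 1_000_000`; the timeout value is left to configuration. Note that a cache miss still resolves on the calling thread, so the cache bounds how often a slow DNS server can stall a heartbeat rather than removing the stall entirely. A decommission, as raised in the thread, could call `removeFromCache` in the same way.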
Thanks,
Wangda

> YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
> ----------------------------------------------------------------------
>
>                 Key: YARN-4024
>                 URL: https://issues.apache.org/jira/browse/YARN-4024
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Wangda Tan
>            Assignee: Hong Zhiguo
>         Attachments: YARN-4024-draft.patch
>
>
> Currently, the YARN RM NodesListManager resolves the IP address every time a
> node heartbeats. When the DNS server becomes slow, NM heartbeats are blocked
> and cannot make progress.