Wangda Tan commented on YARN-4024:

Thanks for the comment. I'm not sure what "cache lookup fails" means here.

There are two different kinds of cache lookup failure. One is that the IP isn't
in the cache, in which case we definitely need to re-resolve the address. The
other is that the resolved IP is not a valid host according to hostsReader,
which has two different cases:

1) host_a has IP=IP1, and IP1 is on the whitelist. If host_a's IP changes to
IP2, and IP2 is on the blacklist, we won't re-resolve because the cached IP1 is
on the whitelist.
2) host_a has IP=IP1, and IP1 is on the blacklist. We may need to re-resolve on
every heartbeat from the node, since its IP may change to one that is not on
the blacklist.

So my thinking on this is: there should be a switch to control this. When a
node's IP won't change OR there's no black/white node list, we should cache;
otherwise we need to resolve on every node heartbeat.
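
To make the proposed switch concrete, here is a minimal sketch. It is not the
actual NodesListManager code: the class name SwitchableResolver, the
cachingEnabled flag, and the dnsLookup function (a stand-in for a real DNS
call) are all hypothetical names chosen for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical sketch of a switch-controlled resolver; not YARN's real API. */
class SwitchableResolver {
    // The proposed switch: true when node IPs are static or no
    // black/white node list is configured (assumption for this sketch).
    private final boolean cachingEnabled;
    // Stand-in for the real DNS lookup, injected so it can be swapped out.
    private final Function<String, String> dnsLookup;
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    SwitchableResolver(boolean cachingEnabled, Function<String, String> dnsLookup) {
        this.cachingEnabled = cachingEnabled;
        this.dnsLookup = dnsLookup;
    }

    /** Intended to be called on every node heartbeat. */
    String resolve(String hostName) {
        if (!cachingEnabled) {
            // IPs may change and node lists are in use: always re-resolve.
            return dnsLookup.apply(hostName);
        }
        // Otherwise go to DNS only on a cache miss.
        return cache.computeIfAbsent(hostName, dnsLookup);
    }
}
```

With caching enabled, a slow DNS server only affects the first heartbeat from
each host. With caching disabled, both cases 1) and 2) above are handled
correctly, at the cost of one lookup per heartbeat.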


> YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
> ----------------------------------------------------------------------
>                 Key: YARN-4024
>                 URL: https://issues.apache.org/jira/browse/YARN-4024
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
> Currently, the YARN RM NodesListManager resolves the IP address every time a 
> node heartbeats. When the DNS server becomes slow, NM heartbeats are 
> blocked and cannot make progress.

This message was sent by Atlassian JIRA
