Xu Cang created YARN-10516:
------------------------------

             Summary: In HA mode, when one Resource Manager has networking 
issue, getTokenService() should not throw runtime exception
                 Key: YARN-10516
                 URL: https://issues.apache.org/jira/browse/YARN-10516
             Project: Hadoop YARN
          Issue Type: Improvement
          Components: client
            Reporter: Xu Cang


We have observed an issue in the YARN client around this piece of code:

[https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/ClientRMProxy.java#L145]

 

The failure occurs in this call:

{code:java}
services.add(SecurityUtil.buildTokenService(
    yarnConf.getSocketAddr(address, defaultAddr, defaultPort)).toString());
{code}
Here "yarnConf.getSocketAddr" throws a runtime exception, more specifically 
an UnknownHostException, from here: 
[https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/SecurityUtil.java#L466]
This happened while one of the RM hosts had a networking issue and its IP 
could not be resolved.

This runtime exception then propagates all the way up into our application 
and causes MR job submission to fail.

In my opinion, since HA is enabled here, the other RMs are still alive and 
available, so we should catch this exception in getTokenService() and handle 
it properly instead of failing the submission.
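
To make the proposal concrete, here is a minimal, self-contained sketch of one way the handling could look. It is not the actual ClientRMProxy code and not a proposed patch: the class TokenServiceSketch and both of its methods are hypothetical stand-ins, the local buildTokenService only imitates the failure described above (a runtime exception caused by UnknownHostException for an unresolved host), and joining the per-RM entries with commas is an assumption.

```java
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch only; names and behavior are assumptions, not Hadoop APIs. */
final class TokenServiceSketch {

  /** Stand-in for the real token service builder: fails on an unresolved host. */
  static String buildTokenService(InetSocketAddress addr) {
    if (addr.isUnresolved()) {
      throw new IllegalArgumentException(
          new UnknownHostException(addr.getHostName()));
    }
    return addr.getAddress().getHostAddress() + ":" + addr.getPort();
  }

  /** Builds the service name, skipping RMs whose host cannot be resolved. */
  static String getTokenService(List<InetSocketAddress> rmAddresses) {
    List<String> services = new ArrayList<>();
    IllegalArgumentException lastFailure = null;
    for (InetSocketAddress addr : rmAddresses) {
      try {
        services.add(buildTokenService(addr));
      } catch (IllegalArgumentException e) {
        // Only swallow resolution failures; rethrow anything else.
        if (!(e.getCause() instanceof UnknownHostException)) {
          throw e;
        }
        lastFailure = e;
      }
    }
    // Fail only when no RM at all could be resolved.
    if (services.isEmpty()) {
      throw new IllegalStateException("No resolvable RM address", lastFailure);
    }
    return String.join(",", services);
  }

  public static void main(String[] args) {
    List<InetSocketAddress> rms = Arrays.asList(
        new InetSocketAddress("10.0.0.1", 8032),                   // IP literal, no DNS lookup
        InetSocketAddress.createUnresolved("rm2.invalid", 8032));  // simulates the broken RM
    System.out.println(getTokenService(rms));
  }
}
```

With this shape, submission fails only if none of the RM addresses can be resolved. Whether leaving an unresolved RM out of the service name is acceptable for token selection after a failover is an open question that a real patch would need to settle.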

I would like to hear your opinion on this. If you agree, I will provide a 
patch. Thank you.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

