[https://issues.apache.org/jira/browse/YARN-10516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
Xu Cang updated YARN-10516:
---------------------------
Description:
We have observed an issue in the YARN client around this piece of code:
[https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/ClientRMProxy.java#L145]
While
{code:java}
services.add(SecurityUtil.buildTokenService(
    yarnConf.getSocketAddr(address, defaultAddr, defaultPort)).toString());
{code}
is being called, it throws a runtime exception, more specifically an
IllegalArgumentException wrapping an UnknownHostException, from here:
[https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/SecurityUtil.java#L466]
This happened while one of the RM hosts had a networking issue and its IP
could not be resolved.
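To make the failure path easier to reproduce without a cluster, here is a minimal JDK-only sketch. It does not use Hadoop classes: `buildTokenServiceSketch` is a hypothetical stand-in that assumes `SecurityUtil.buildTokenService` rejects an unresolved address (when token services are built from IPs) by throwing an IllegalArgumentException that wraps an UnknownHostException. The host name `rm2.example.invalid` and port 8032 are only example values.

```java
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

// Hypothetical JDK-only sketch, NOT the Hadoop source. It assumes that
// getSocketAddr() hands back an unresolved address instead of failing, and
// that the token-service builder rejects such an address.
class TokenServiceFailureSketch {

  // Assumed stand-in for SecurityUtil.buildTokenService(InetSocketAddress)
  // when token services are built from IP addresses.
  static String buildTokenServiceSketch(InetSocketAddress addr) {
    if (addr.isUnresolved()) {
      // The checked UnknownHostException is wrapped in an unchecked
      // exception, so callers are not forced to handle it.
      throw new IllegalArgumentException(
          new UnknownHostException(addr.getHostName()));
    }
    return addr.getAddress().getHostAddress() + ":" + addr.getPort();
  }

  public static void main(String[] args) {
    // createUnresolved() never attempts a DNS lookup, so this simulates an
    // RM host whose name cannot be resolved.
    InetSocketAddress badRm =
        InetSocketAddress.createUnresolved("rm2.example.invalid", 8032);
    try {
      buildTokenServiceSketch(badRm);
    } catch (IllegalArgumentException e) {
      // Nothing in the original loop catches this, so it escapes to the caller.
      System.out.println("caught: " + e.getCause());
    }
  }
}
```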
This runtime exception then propagates all the way up to our application and
causes the MR job submission to fail.
In my opinion, since this is an HA setup and the other RMs are still alive and
available, we should catch this exception in getTokenService() and handle it
properly instead of failing the whole action.
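A minimal sketch of the kind of handling proposed here, assuming we log and skip an RM whose address cannot be resolved and fail only when none of the RMs resolve. `getTokenServiceSketch` and `buildTokenServiceSketch` are hypothetical names, not a Hadoop API and not the eventual patch; the comma-joined result assumes the HA token service is formed by joining the per-RM services with commas.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the proposed behavior, NOT an existing Hadoop patch:
// build one token service per RM, skip RMs whose address cannot be resolved,
// and fail only if no RM yields a usable token service.
class TolerantTokenServiceSketch {

  // Assumed stand-in for SecurityUtil.buildTokenService: unresolved addresses
  // are rejected with an IllegalArgumentException wrapping UnknownHostException.
  static String buildTokenServiceSketch(InetSocketAddress addr) {
    if (addr.isUnresolved()) {
      throw new IllegalArgumentException(
          new UnknownHostException(addr.getHostName()));
    }
    return addr.getAddress().getHostAddress() + ":" + addr.getPort();
  }

  // rmAddresses stands in for the per-RM-id address lookups in getTokenService().
  static String getTokenServiceSketch(List<InetSocketAddress> rmAddresses) {
    List<String> services = new ArrayList<>();
    for (InetSocketAddress addr : rmAddresses) {
      try {
        services.add(buildTokenServiceSketch(addr));
      } catch (IllegalArgumentException e) {
        // In HA mode one unresolvable RM should not fail the whole
        // submission: log it and continue with the remaining RMs.
        System.err.println("Skipping RM " + addr + ": " + e.getCause());
      }
    }
    if (services.isEmpty()) {
      // No RM could be resolved at all, so there is nothing to fall back on.
      throw new IllegalStateException(
          "No ResourceManager address could be resolved");
    }
    return String.join(",", services);
  }

  public static void main(String[] args) {
    List<InetSocketAddress> rms = List.of(
        new InetSocketAddress(InetAddress.getLoopbackAddress(), 8032),
        InetSocketAddress.createUnresolved("rm2.example.invalid", 8032));
    System.out.println(getTokenServiceSketch(rms));
  }
}
```

One trade-off to discuss: the resulting token service string would not list the skipped RM, so behavior after a later failover to that RM would need to be checked.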
I would like to hear your opinion on this. If you agree, I will provide a
patch. Thank you.
> In HA mode, when one Resource Manager has a networking issue,
> getTokenService() should not throw a runtime exception
> ----------------------------------------------------------------------------------------------------------------
>
> Key: YARN-10516
> URL: https://issues.apache.org/jira/browse/YARN-10516
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: client
> Reporter: Xu Cang
> Priority: Minor
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)