[
https://issues.apache.org/jira/browse/YARN-713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Maysam Yabandeh updated YARN-713:
---------------------------------
Attachment: YARN-713.patch
In the attached patch, the exception is handled in
RMContainerTokenSecretManager#createContainerToken by returning null. A null
value is supposed to trigger a retry, as in FifoScheduler#assignContainer:
{code:java}
if (containerToken == null) {
  return i; // Try again later.
}
{code}
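As a rough sketch of that null-and-retry contract: the class and method names below are hypothetical stand-ins rather than the patch itself, and the wrapping exception is assumed to be java.lang.IllegalArgumentException with an UnknownHostException as its cause.

```java
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

class TokenRetrySketch {

    // Hypothetical stand-in for the SecurityUtil call: an unresolved address
    // is reported as an IllegalArgumentException wrapping UnknownHostException.
    static String buildTokenService(InetSocketAddress addr) {
        if (addr.isUnresolved()) {
            throw new IllegalArgumentException(
                new UnknownHostException(addr.getHostName()));
        }
        return addr.getAddress().getHostAddress() + ":" + addr.getPort();
    }

    // Returns null when the failure is caused by DNS, so that the caller
    // can skip this round instead of letting the exception propagate.
    static String createContainerToken(InetSocketAddress addr) {
        try {
            return buildTokenService(addr);
        } catch (IllegalArgumentException e) {
            if (e.getCause() instanceof UnknownHostException) {
                return null;
            }
            throw e; // Not a DNS problem: keep the original behaviour.
        }
    }

    // Returns how many of the requested containers were assigned.
    static int assignContainers(InetSocketAddress addr, int requested) {
        for (int i = 0; i < requested; i++) {
            String containerToken = createContainerToken(addr);
            if (containerToken == null) {
                return i; // Try again later.
            }
        }
        return requested;
    }
}
```

With an address built by InetSocketAddress.createUnresolved(...), assignContainers returns 0 and nothing is thrown, so the dispatcher thread stays alive.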
Regarding the sweep of the RM to find other places where a DNS failure should be
handled properly, I think a cleaner approach is to throw UnknownHostException
directly instead of hiding it in an IllegalArgumentException, which is also
semantically confusing. This, however, would result in widespread changes
all over the project, as each user of SecurityUtil must either handle the
exception or declare it for its callers to catch. If this approach is fine
with you guys, I can give it a go.
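To illustrate what the checked-exception alternative would mean for callers, here is a minimal sketch. The names CheckedDnsFailureSketch, buildTokenService and tryIssueToken are hypothetical and do not reflect the actual SecurityUtil signatures.

```java
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

class CheckedDnsFailureSketch {

    // Hypothetical variant that surfaces the DNS failure as a checked
    // exception instead of wrapping it in an unchecked one.
    static String buildTokenService(InetSocketAddress addr)
            throws UnknownHostException {
        if (addr.isUnresolved()) {
            throw new UnknownHostException(addr.getHostName());
        }
        return addr.getAddress().getHostAddress() + ":" + addr.getPort();
    }

    // Every caller now has to either catch the exception, as here,
    // or add "throws UnknownHostException" to its own signature.
    static boolean tryIssueToken(InetSocketAddress addr) {
        try {
            buildTokenService(addr);
            return true;
        } catch (UnknownHostException e) {
            return false; // DNS hiccup: the caller can retry later.
        }
    }
}
```

The compiler then flags every call site that ignores the failure, which is the source of the widespread changes mentioned above.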
> ResourceManager can exit unexpectedly if DNS is unavailable
> -----------------------------------------------------------
>
> Key: YARN-713
> URL: https://issues.apache.org/jira/browse/YARN-713
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.1.0-beta
> Reporter: Jason Lowe
> Priority: Critical
> Attachments: YARN-713.patch, YARN-713.patch
>
>
> As discussed in MAPREDUCE-5261, there's a possibility that a DNS outage could
> lead to an unhandled exception in the ResourceManager's AsyncDispatcher, and
> that ultimately would cause the RM to exit. The RM should not exit during
> DNS hiccups.