[
https://issues.apache.org/jira/browse/YARN-713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Maysam Yabandeh updated YARN-713:
---------------------------------
Attachment: YARN-713.patch
A preliminary patch is attached. Three points:
# With the recent changes from YARN-571, I could not find a way to write a unit
test that simulates a DNS error. I am temporarily changing the visibility of
setTokenServiceUserIp to allow the test to go through.
# The patch catches the IllegalArgumentException and checks whether it is
related to IP resolution. I guess a cleaner way would be to throw the
UnknownHostException directly in SecurityUtil.java.
# The current patch simply logs the exception. I am wondering whether a more
elaborate reaction is desired, such as recirculating the event after a
timeout. In general, we should determine which component is responsible for
retrying a failed event.
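
To make the second and third points concrete, here is a minimal standalone sketch of the "catch, check, log" approach. It is not the code in the attached patch and it does not use any Hadoop classes: DnsFailureSketch, isDnsFailure and dispatchSafely are hypothetical names, and the sketch assumes that the IllegalArgumentException carries an UnknownHostException somewhere in its cause chain, which should be verified against SecurityUtil.java.

```java
import java.net.UnknownHostException;
import java.util.function.Consumer;

public class DnsFailureSketch {

    /**
     * Returns true if the throwable, or any of its causes, is an
     * UnknownHostException. The depth limit guards against cyclic chains.
     */
    public static boolean isDnsFailure(Throwable t) {
        int depth = 0;
        for (Throwable c = t; c != null && depth < 16; c = c.getCause(), depth++) {
            if (c instanceof UnknownHostException) {
                return true;
            }
        }
        return false;
    }

    /**
     * Runs the handler for one event. A DNS-related IllegalArgumentException
     * is logged and swallowed so the calling loop keeps running; any other
     * IllegalArgumentException is rethrown unchanged.
     * Returns false when the event was not handled and may need a retry.
     */
    public static <E> boolean dispatchSafely(E event, Consumer<E> handler) {
        try {
            handler.accept(event);
            return true;
        } catch (IllegalArgumentException e) {
            if (!isDnsFailure(e)) {
                throw e;
            }
            System.err.println("DNS failure while handling event " + event + ": " + e);
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical handler that fails the way an unresolved address might.
        Consumer<String> failing = ev -> {
            throw new IllegalArgumentException(new UnknownHostException("nm-host.example"));
        };
        boolean handled = dispatchSafely("LAUNCH_CONTAINER", failing);
        System.out.println("handled=" + handled);
    }
}
```

The boolean return value leaves the retry decision to the caller, which is the open question in the third point: the dispatcher only reports that the event was not handled, and whichever component owns the event can decide whether to re-enqueue it after a timeout.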
> ResourceManager can exit unexpectedly if DNS is unavailable
> -----------------------------------------------------------
>
> Key: YARN-713
> URL: https://issues.apache.org/jira/browse/YARN-713
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.1.0-beta
> Reporter: Jason Lowe
> Priority: Critical
> Attachments: YARN-713.patch
>
>
> As discussed in MAPREDUCE-5261, there's a possibility that a DNS outage could
> lead to an unhandled exception in the ResourceManager's AsyncDispatcher, and
> that ultimately would cause the RM to exit. The RM should not exit during
> DNS hiccups.