Jim Brennan created YARN-10479:
----------------------------------
Summary: RMProxy should retry on SocketTimeout Exceptions
Key: YARN-10479
URL: https://issues.apache.org/jira/browse/YARN-10479
Project: Hadoop YARN
Issue Type: Improvement
Components: yarn
Affects Versions: 2.10.1, 3.4.1
Reporter: Jim Brennan
Assignee: Jim Brennan
During an incident involving a DNS outage, a large number of nodemanagers
failed to come back into service because they hit a socket timeout when trying
to re-register with the RM.
SocketTimeoutException is not currently one of the exceptions that RMProxy
retries on. Based on this incident, it seems that it should be. We made this
change internally about a year ago, and it has been running in production since.
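
To illustrate the idea, here is a minimal, self-contained sketch of retrying
only on a chosen set of exception types, with SocketTimeoutException included
in that set. It is not the RMProxy code or the internal patch. The class name
RetryOnSocketTimeout, the RETRIABLE set, and the helpers isRetriable and
callWithRetries are all hypothetical and use only the JDK.

```java
import java.net.ConnectException;
import java.net.SocketTimeoutException;
import java.util.Set;
import java.util.concurrent.Callable;

public class RetryOnSocketTimeout {
    // Hypothetical set of exception types treated as retriable.
    // The real RMProxy list is longer; this only shows where
    // SocketTimeoutException would be added.
    static final Set<Class<? extends Exception>> RETRIABLE = Set.of(
        ConnectException.class,
        SocketTimeoutException.class);

    // True if e is an instance of any retriable type (subclasses included).
    static boolean isRetriable(Exception e) {
        for (Class<? extends Exception> c : RETRIABLE) {
            if (c.isInstance(e)) {
                return true;
            }
        }
        return false;
    }

    // Invokes call up to maxAttempts times, sleeping a fixed interval
    // between attempts. Non-retriable exceptions are rethrown immediately;
    // the last retriable exception is rethrown once attempts run out.
    static <T> T callWithRetries(Callable<T> call, int maxAttempts,
                                 long sleepMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                if (!isRetriable(e)) {
                    throw e;
                }
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(sleepMillis);
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated re-registration that times out twice, then succeeds.
        String result = callWithRetries(() -> {
            if (++calls[0] < 3) {
                throw new SocketTimeoutException("simulated timeout");
            }
            return "registered";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

With SocketTimeoutException left out of the set, the first simulated timeout
would propagate to the caller instead of being retried, which is analogous to
the behavior described above.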