[ https://issues.apache.org/jira/browse/YARN-10479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227063#comment-17227063 ]
Eric Payne commented on YARN-10479:
-----------------------------------

Thanks [~Jim_Brennan]. I committed this to 3.1 through trunk.

> RMProxy should retry on SocketTimeout Exceptions
> ------------------------------------------------
>
>                 Key: YARN-10479
>                 URL: https://issues.apache.org/jira/browse/YARN-10479
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: yarn
>    Affects Versions: 2.10.1, 3.4.1
>            Reporter: Jim Brennan
>            Assignee: Jim Brennan
>            Priority: Major
>             Fix For: 3.4.0, 3.3.1, 3.1.5, 3.2.3
>
>         Attachments: YARN-10479.001.patch, YARN-10479.002.patch,
> YARN-10479.003.patch
>
>
> During an incident involving a DNS outage, a large number of nodemanagers
> failed to come back into service because they hit a socket timeout when
> trying to re-register with the RM.
> SocketTimeoutException is not currently one of the exceptions that the
> RMProxy will retry. Based on this incident, it seems like it should be. We
> made this change internally about a year ago and it has been running in
> production since.
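
For readers unfamiliar with the idea, the following is a minimal, self-contained sketch of retrying by exception type. It is not the attached patch and does not use Hadoop's retry classes. The class name RetrySketch, the contents of the RETRIABLE list, and the fixed sleep between attempts are all assumptions made for illustration. It only shows how adding SocketTimeoutException to a list of retriable exceptions changes the outcome of a re-registration attempt that times out.

```java
import java.net.ConnectException;
import java.net.SocketTimeoutException;
import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;

// Hypothetical sketch; not Hadoop's RMProxy and not the YARN-10479 patch.
final class RetrySketch {

    // Illustrative list of retriable exception types (an assumption, not
    // the list RMProxy uses). SocketTimeoutException is the entry whose
    // absence the issue describes.
    static final List<Class<? extends Exception>> RETRIABLE = Arrays.asList(
            ConnectException.class,
            UnknownHostException.class,
            SocketTimeoutException.class);

    // Returns true if the exception is an instance of any retriable type.
    static boolean shouldRetry(Exception e) {
        for (Class<? extends Exception> type : RETRIABLE) {
            if (type.isInstance(e)) {
                return true;
            }
        }
        return false;
    }

    // Invokes the call up to maxAttempts times, sleeping a fixed interval
    // between attempts. Non-retriable exceptions, and the exception from
    // the final attempt, are rethrown to the caller.
    static <T> T callWithRetries(Callable<T> call, int maxAttempts,
                                 long sleepMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                if (!shouldRetry(e) || attempt >= maxAttempts) {
                    throw e;
                }
                Thread.sleep(sleepMillis);
            }
        }
    }
}
```

With this sketch, a call that throws SocketTimeoutException on its first attempts and then succeeds returns normally, while an exception type not in the list propagates on the first failure.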