[ https://issues.apache.org/jira/browse/YARN-4079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711462#comment-14711462 ]
Varun Saxena commented on YARN-4079:
------------------------------------

Hmm... I wasn't necessarily thinking of making it public. I was just thinking of adding a way for it to be read from config, so that it can temporarily be set to false if required (in rare scenarios).
But is there something else we can do? Maybe we can add an exclusion list of exceptions to be ignored. But the same exception might be a very critical bug in one area of the code and not in another, so that may not be a viable alternative either.

> Retrospect on the decision of making yarn.dispatcher.exit-on-error as true explicitly in daemons
> ------------------------------------------------------------------------------------------------
>
>                 Key: YARN-4079
>                 URL: https://issues.apache.org/jira/browse/YARN-4079
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>    Affects Versions: 2.7.1
>            Reporter: Varun Saxena
>            Assignee: Varun Saxena
>
> Currently, in all daemons this config is explicitly set to true so that
> daemons crash instead of hanging around. This seems to be correct, as a
> recoverable exception should be caught and handled and NOT leaked through
> to AsyncDispatcher, and a non-recoverable one should lead to a crash anyway.
> But this can make the system more fragile if we fail to catch all
> recoverable exceptions.
> Currently we do not even have the option of setting it to false in
> configuration, even if we wanted to.
> Probably we can read this value from configuration and set it to true in
> daemons only if it is not configured.
> This way, in production clusters, if there is an exception which is causing
> the daemon to crash frequently and we find that it's unavoidable but not a
> very big issue (i.e. the daemon can still work normally for the most part),
> we can at least set the configuration to false in the config file.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
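
For illustration, the proposal in the issue description (read yarn.dispatcher.exit-on-error from configuration, and fall back to true in daemons only when it is not configured) could look roughly like the sketch below. This is not the actual YARN code: java.util.Properties stands in for Hadoop's Configuration so the example runs on its own, and the class name ExitOnErrorSketch, the method resolveExitOnError, and the constant names are hypothetical.

```java
import java.util.Properties;

// Standalone sketch; java.util.Properties is a stand-in for Hadoop's Configuration.
// All names except the config key itself are hypothetical.
final class ExitOnErrorSketch {
    // The key discussed in YARN-4079.
    static final String EXIT_ON_ERROR_KEY = "yarn.dispatcher.exit-on-error";
    // Assumed daemon-side default: crash rather than hang around.
    static final boolean DAEMON_DEFAULT = true;

    // Returns the configured value if present, otherwise the daemon default.
    // Note: Boolean.parseBoolean treats anything other than "true"
    // (case-insensitive) as false, so a malformed value disables exit-on-error.
    static boolean resolveExitOnError(Properties conf) {
        String raw = conf.getProperty(EXIT_ON_ERROR_KEY);
        if (raw == null || raw.trim().isEmpty()) {
            return DAEMON_DEFAULT;
        }
        return Boolean.parseBoolean(raw.trim());
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // Not configured: the daemon keeps today's behaviour.
        System.out.println(resolveExitOnError(conf));

        // Operator override for the rare scenario described in the issue.
        conf.setProperty(EXIT_ON_ERROR_KEY, "false");
        System.out.println(resolveExitOnError(conf));
    }
}
```

The point of the design is that daemons stop overwriting the value unconditionally, so an explicit setting in the config file wins over the daemon default.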