[ 
https://issues.apache.org/jira/browse/YARN-4079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4079:
-------------------------------
    Description: 
Currently, all daemons explicitly set yarn.dispatcher.exit-on-error to true so 
that they crash instead of hanging around. This seems correct in principle: a 
recoverable exception should be caught and handled and NOT leaked through to 
AsyncDispatcher, and a non-recoverable one should lead to a crash anyway.

However, this can make the system more fragile if we fail to catch every 
recoverable exception.

Currently we do not even have the option of setting it to false in the 
configuration, even if we wanted to.

We could probably read this value from the configuration and set it to true in 
daemons only if it is not configured.
This way, in production clusters, if an exception causes a daemon to crash 
frequently and we find that it is unavoidable but not a very big issue (i.e. 
the daemon can still work normally for the most part), we can at least set the 
configuration to false in the config file.
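
To make the proposal concrete, here is a minimal sketch of the "read from 
configuration, fall back to true" behaviour. It is an illustration only, not 
the actual patch: java.util.Properties stands in for the Hadoop configuration 
object, and the class name ExitOnErrorSketch and the helper resolveExitOnError 
are hypothetical. Only the key name comes from this issue.

```java
import java.util.Properties;

// Illustrative sketch only; not code from the Hadoop source tree.
class ExitOnErrorSketch {
    // Key name as given in the issue title.
    static final String EXIT_ON_ERROR_KEY = "yarn.dispatcher.exit-on-error";

    // Hypothetical helper: returns the configured value if the key is
    // present, and otherwise falls back to true (the value the daemons
    // currently hard-code).
    static boolean resolveExitOnError(Properties conf) {
        String raw = conf.getProperty(EXIT_ON_ERROR_KEY);
        if (raw == null) {
            // Not configured: keep the existing crash-on-error behaviour.
            return true;
        }
        return Boolean.parseBoolean(raw.trim());
    }

    public static void main(String[] args) {
        // Key not set: the daemon default applies.
        Properties unset = new Properties();
        System.out.println(resolveExitOnError(unset));      // prints "true"

        // Operator explicitly opts out in the config file.
        Properties overridden = new Properties();
        overridden.setProperty(EXIT_ON_ERROR_KEY, "false");
        System.out.println(resolveExitOnError(overridden)); // prints "false"
    }
}
```

The point of the sketch is that the daemon no longer overwrites the value 
unconditionally, so an explicit false in the config file survives.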

> Retrospect on the decision of making yarn.dispatcher.exit-on-error as true 
> explicitly in daemons
> ------------------------------------------------------------------------------------------------
>
>                 Key: YARN-4079
>                 URL: https://issues.apache.org/jira/browse/YARN-4079
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>    Affects Versions: 2.7.1
>            Reporter: Varun Saxena
>            Assignee: Varun Saxena
>
> Currently, all daemons explicitly set yarn.dispatcher.exit-on-error to true 
> so that they crash instead of hanging around. This seems correct in 
> principle: a recoverable exception should be caught and handled and NOT 
> leaked through to AsyncDispatcher, and a non-recoverable one should lead to 
> a crash anyway.
> However, this can make the system more fragile if we fail to catch every 
> recoverable exception.
> Currently we do not even have the option of setting it to false in the 
> configuration, even if we wanted to.
> We could probably read this value from the configuration and set it to true 
> in daemons only if it is not configured.
> This way, in production clusters, if an exception causes a daemon to crash 
> frequently and we find that it is unavoidable but not a very big issue 
> (i.e. the daemon can still work normally for the most part), we can at 
> least set the configuration to false in the config file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
