[ https://issues.apache.org/jira/browse/YARN-5635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15487889#comment-15487889 ]

Allen Wittenauer edited comment on YARN-5635 at 9/13/16 5:49 PM:
-----------------------------------------------------------------

bq. does that hold true for even making it an option via a configuration 
setting?

Yes.

I don't know how many ways I can tell you that depending on an exit code 
here is extremely dangerous and has proven to be unreliable due to the 
constantly shifting nature of the state of the node on busy clusters. Throw in 
all of these "magically expanding/shrinking" task resource management bits that 
have gone in, and the situation gets even worse.

Besides, if you REALLY REALLY REALLY want to do this, all you need to do is 
wrap your existing health check in something else that, upon failure, prints 
the ERROR message.  
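A minimal sketch of such a wrapper, written in Python purely for illustration. It assumes the NodeManager health-checker convention that a node is reported unhealthy when the health script prints a line beginning with `ERROR` to stdout; the function name `wrap_health_check`, the 60-second default timeout, and passing the real check as command-line arguments are all hypothetical choices, not anything defined by YARN.

```python
#!/usr/bin/env python3
"""Hypothetical wrapper around an existing node health check.

Assumption: the NodeManager treats a stdout line starting with "ERROR"
as "node unhealthy". This wrapper turns a non-zero exit status, a
timeout, or a failure to launch the wrapped check into such a line.
"""
import subprocess
import sys
from typing import List, Sequence


def wrap_health_check(cmd: Sequence[str], timeout: float = 60.0) -> List[str]:
    """Run the wrapped check and return the lines the wrapper should print."""
    label = " ".join(cmd)
    try:
        result = subprocess.run(
            list(cmd), capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising the timeout.
        return [f"ERROR health check timed out after {timeout}s: {label}"]
    except OSError as exc:
        # Missing or non-executable script: the "bad script" case.
        return [f"ERROR could not run health check {label}: {exc}"]

    # Pass the wrapped check's own output through unchanged.
    lines = result.stdout.splitlines()
    already_reported = any(line.startswith("ERROR") for line in lines)
    if result.returncode != 0 and not already_reported:
        lines.append(f"ERROR health check exited with status {result.returncode}")
    return lines


def main(argv: Sequence[str]) -> int:
    if not argv:
        print("ERROR no health check command given")
        return 0
    for line in wrap_health_check(argv):
        print(line)
    # Exit 0 so that, under the assumption above, only the ERROR line
    # carries the verdict and the wrapper's own exit code is irrelevant.
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```

The site opts in by pointing its configured health script at the wrapper and handing it the real check as arguments (for example through `yarn.nodemanager.health-checker.script.path` and `yarn.nodemanager.health-checker.script.opts`; verify those property names against your Hadoop version). The exit-code policy then lives in site-owned code rather than in the NodeManager.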


> Better handling when bad script is configured as Node's HealthScript
> --------------------------------------------------------------------
>
>                 Key: YARN-5635
>                 URL: https://issues.apache.org/jira/browse/YARN-5635
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Allen Wittenauer
>            Assignee: Yufei Gu
>
> The earlier fix for YARN-5567 was reverted because it is not ideal to take the 
> whole cluster down because of a bad script. At the same time, it is important 
> to report when the script configured as the node health script is erroneous, 
> as such a script might fail to detect bad health on a node.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
