[
https://issues.apache.org/jira/browse/YARN-5635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15487889#comment-15487889
]
Allen Wittenauer commented on YARN-5635:
----------------------------------------
bq. does that hold true for even making it an option via a configuration
setting?
Yes.
I don't know how many ways I can tell you that depending on an error code
here is extremely dangerous and has proven to be unreliable due to the
constantly shifting state of the node on busy clusters. Throw in
all of the "magically expanding/shrinking" task resource management bits that
have gone in, and the situation gets even worse.
Besides, if you REALLY REALLY REALLY want to do this, all you need to do is
wrap your existing health check in something else that, upon failure, prints
the ERROR message.
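
A minimal sketch of such a wrapper, in Python. The function name run_wrapped
and the wording of the messages are hypothetical; the only thing taken from
the above is the idea: run the existing check, pass its output through, and
print a line starting with ERROR when it fails. Treating "could not be
launched" as a failure is an assumption here, and it is exactly the behavior
being argued about, so drop that branch if you don't want it.

```python
#!/usr/bin/env python3
"""Hypothetical wrapper around an existing node health check script."""
import subprocess
import sys


def run_wrapped(cmd):
    """Run cmd (a list of arguments) and return the lines the wrapper prints.

    The wrapped script's stdout is passed through unchanged. If the script
    exits non-zero, an extra line starting with ERROR is appended.
    """
    try:
        result = subprocess.run(cmd, capture_output=True, text=True)
    except OSError as exc:
        # Assumption: a script that cannot be launched counts as a failure.
        return [f"ERROR health check could not be started: {exc}"]

    lines = result.stdout.splitlines()
    if result.returncode != 0:
        lines.append(f"ERROR health check exited with status {result.returncode}")
    return lines


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (paths are placeholders): wrapper.py /path/to/existing-check [args...]
    for line in run_wrapped(sys.argv[1:]):
        print(line)
```

You would then point the health script setting at the wrapper instead of the
original script. This keeps the exit-code policy in your own script, where
you opted into it, rather than in the NodeManager.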
> Better handling when bad script is configured as Node's HealthScript
> --------------------------------------------------------------------
>
> Key: YARN-5635
> URL: https://issues.apache.org/jira/browse/YARN-5635
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Allen Wittenauer
> Assignee: Yufei Gu
>
> The earlier fix for YARN-5567 was reverted because it's not ideal to bring the
> whole cluster down because of a bad script. At the same time, it's important
> to report that the script configured as the node health script is erroneous,
> as it might fail to detect bad health of a node.