[jira] [Commented] (YARN-5567) Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus
[ https://issues.apache.org/jira/browse/YARN-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15487635#comment-15487635 ] Naganarasimha G R commented on YARN-5567: - Reopened YARN-5635 and put some comments there for further discussion.
> Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus
> --
>
> Key: YARN-5567
> URL: https://issues.apache.org/jira/browse/YARN-5567
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Affects Versions: 2.8.0, 3.0.0-alpha1
> Reporter: Yufei Gu
> Assignee: Yufei Gu
> Fix For: 3.0.0-alpha1
>
> Attachments: YARN-5567.001.patch
>
>
> In case of FAILED_WITH_EXIT_CODE, health status should be false.
> {code}
> case FAILED_WITH_EXIT_CODE:
>   setHealthStatus(true, "", now);
>   break;
> {code}
> should be
> {code}
> case FAILED_WITH_EXIT_CODE:
>   setHealthStatus(false, "", now);
>   break;
> {code}
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5567) Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus
[ https://issues.apache.org/jira/browse/YARN-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15487355#comment-15487355 ] Allen Wittenauer commented on YARN-5567: It needs to be available via metrics2; otherwise it's invisible to most large-scale ops teams. Someone open a new JIRA or rework YARN-5635 for this discussion. This JIRA is effectively dead for any new development. :(
[jira] [Commented] (YARN-5567) Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus
[ https://issues.apache.org/jira/browse/YARN-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15486556#comment-15486556 ] Naganarasimha G R commented on YARN-5567: - bq. standardizing on a specific error code for "detected bad Node" vs "bad script" I was thinking along the same lines when I mentioned earlier ??"Should we think of some other state which could warn the admin about this(which is captured in webui/Rest)"?? If the NM can report Healthy/UnHealthy/HealthValidationError, this can be sent in the heartbeat to the RM, and the RM can record the state of this NM as something other than Running and UnHealthy (a new state). This can be displayed in the web UI and can also be queried using {{./yarn node -list -states}}.
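A minimal sketch of the Healthy/UnHealthy/HealthValidationError idea, assuming the convention referred to in this thread (a stdout line beginning with "ERROR" means the script judged the node bad) and adding a third state for a script that exits non-zero without printing such a line. The enum, `classify()`, and the rule that an ERROR line wins over a non-zero exit are hypothetical illustrations, not Hadoop APIs; the heartbeat and RM-side node state are not shown.

```java
import java.util.List;

// Hypothetical sketch of the tri-state proposal; none of these names
// exist in Hadoop, and the precedence rules are assumptions.
class TriStateHealth {

  enum NodeHealthState { HEALTHY, UNHEALTHY, HEALTH_VALIDATION_ERROR }

  // A stdout line starting with "ERROR" is the script's own verdict
  // that the node is bad.
  static boolean reportsError(List<String> outputLines) {
    return outputLines.stream().anyMatch(line -> line.startsWith("ERROR"));
  }

  /**
   * Maps one script run to a state.
   *
   * @param exitCode    process exit code (0 means a clean exit)
   * @param outputLines lines the script printed to stdout
   */
  static NodeHealthState classify(int exitCode, List<String> outputLines) {
    if (reportsError(outputLines)) {
      // Explicit "detected bad Node" verdict: fail the node
      // (assumed to win even if the exit code is also non-zero).
      return NodeHealthState.UNHEALTHY;
    }
    if (exitCode != 0) {
      // "Bad script" (typo, missing tool, ...): surface it to the
      // admin, but keep the node in service.
      return NodeHealthState.HEALTH_VALIDATION_ERROR;
    }
    return NodeHealthState.HEALTHY;
  }

  public static void main(String[] args) {
    System.out.println(classify(0, List.of("disk ok")));           // HEALTHY
    System.out.println(classify(0, List.of("ERROR - disk full"))); // UNHEALTHY
    System.out.println(classify(2, List.of()));                    // HEALTH_VALIDATION_ERROR
  }
}
```

Keeping the "bad script" case separate from UNHEALTHY is what would let the RM show it as a distinct node state without taking the node out of scheduling.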
[jira] [Commented] (YARN-5567) Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus
[ https://issues.apache.org/jira/browse/YARN-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484820#comment-15484820 ] Allen Wittenauer commented on YARN-5567: bq. would you prefer this be a config setting to choose the behavior? The history of the health check script is interesting, but long. But not trusting the exit code was one of the key lessons the ops team learned from the HOD experience. It fails a lot more often than people realize, mainly due to users doing crazy things, especially on insecure systems. This is one of those times where it's going to be extremely difficult to convince me otherwise. I can't think of a reason to ever trust the exit code enough to bring down the NodeManager. In this particular environment, there are many conditions under which the script can fail for reasons that may be temporary or pointless. Now it could be argued that those temporary failures should cause the NM to come down, but then you get into a race condition between heartbeats and actual issues. HDFS worked around it by basically saying "it has to fail for X long". Ignoring the exit code avoids that problem because one can be sure that "ERROR -" really did come from the script. bq. Alternatively, would you be okay with standardizing on a specific error code for "detected bad Node" vs "bad script"? If by error code you specifically mean the value the NM reports back to the RM, yes, that makes sense. It just can't fail the node.
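The comment contrasts two ways of coping with scripts that fail for temporary reasons. As a hedged illustration (hypothetical class and method names, not Hadoop or HDFS code), the sketch below trusts a line beginning with "ERROR" immediately, and applies the HDFS-style "it has to fail for X long" rule only to non-zero exits that come without such a line. The window length and the reset-on-clean-run behaviour are assumptions.

```java
import java.util.List;

// Hypothetical sketch; names and thresholds are illustrative only.
class DebouncedHealthCheck {
  private final long minFailureMillis; // the "X" in "it has to fail for X long"
  private long exitFailureSince = -1;  // -1 = no ongoing exit-code failure streak

  DebouncedHealthCheck(long minFailureMillis) {
    this.minFailureMillis = minFailureMillis;
  }

  // A line starting with "ERROR" can only come from the script's own
  // logic, so it is trusted immediately.
  static boolean reportsError(List<String> outputLines) {
    return outputLines.stream().anyMatch(line -> line.startsWith("ERROR"));
  }

  /** Returns true if the node should be treated as unhealthy after this run. */
  boolean isUnhealthy(int exitCode, List<String> outputLines, long nowMillis) {
    if (reportsError(outputLines)) {
      return true;
    }
    if (exitCode == 0) {
      exitFailureSince = -1; // a clean run ends any failure streak
      return false;
    }
    // Non-zero exit without an ERROR line: possibly a typo, a missing tool
    // or a transient hiccup, so only act once it has persisted long enough.
    if (exitFailureSince < 0) {
      exitFailureSince = nowMillis;
    }
    return nowMillis - exitFailureSince >= minFailureMillis;
  }

  public static void main(String[] args) {
    DebouncedHealthCheck check = new DebouncedHealthCheck(60_000);
    System.out.println(check.isUnhealthy(0, List.of("ERROR - /data1 read-only"), 0)); // true
    System.out.println(check.isUnhealthy(1, List.of(), 10_000)); // false: streak just started
    System.out.println(check.isUnhealthy(1, List.of(), 70_000)); // true: failing for 60 s
    System.out.println(check.isUnhealthy(0, List.of(), 80_000)); // false: streak reset
  }
}
```

The position taken in the comment is stricter than this sketch: a non-zero exit code should never fail the node, which corresponds to constructing the check with `Long.MAX_VALUE` as the window.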
[jira] [Commented] (YARN-5567) Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus
[ https://issues.apache.org/jira/browse/YARN-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484619#comment-15484619 ] Ray Chiang commented on YARN-5567: -- [~aw], would you prefer this be a config setting to choose the behavior?
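One way to read this question is an operator-controlled flag. A minimal sketch, assuming a boolean property whose key is invented here (it is not a real YARN setting) and a trimmed-down status enum in which only FAILED_WITH_EXIT_CODE comes from the issue description: with the flag off, the FAILED_WITH_EXIT_CODE branch behaves like the original `setHealthStatus(true, "", now)` line; with it on, like the patched `setHealthStatus(false, "", now)` line.

```java
import java.util.Properties;

// Hypothetical sketch: lets operators choose between the original
// behaviour (exit code ignored) and the YARN-5567 patch.
// The property key is invented for illustration; it is not a real YARN setting.
class ExitCodePolicy {
  static final String FAIL_ON_EXIT_CODE_KEY =
      "example.health-checker.script.fail-on-exit-code";

  // Only FAILED_WITH_EXIT_CODE is taken from the issue; SUCCESS is assumed.
  enum ScriptStatus { SUCCESS, FAILED_WITH_EXIT_CODE }

  private final boolean failOnExitCode;

  ExitCodePolicy(Properties conf) {
    // Defaulting to false keeps the pre-patch (and post-revert) behaviour.
    this.failOnExitCode =
        Boolean.parseBoolean(conf.getProperty(FAIL_ON_EXIT_CODE_KEY, "false"));
  }

  /** The value that would be passed as the first argument of setHealthStatus. */
  boolean isHealthy(ScriptStatus status) {
    switch (status) {
      case FAILED_WITH_EXIT_CODE:
        // true = the original line, false = the patched line
        return !failOnExitCode;
      case SUCCESS:
      default:
        return true;
    }
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    System.out.println(new ExitCodePolicy(conf).isHealthy(ScriptStatus.FAILED_WITH_EXIT_CODE)); // true
    conf.setProperty(FAIL_ON_EXIT_CODE_KEY, "true");
    System.out.println(new ExitCodePolicy(conf).isHealthy(ScriptStatus.FAILED_WITH_EXIT_CODE)); // false
  }
}
```

Defaulting to the old behaviour means existing clusters keep running unchanged after an upgrade, which is the compatibility concern raised elsewhere in the thread.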
[jira] [Commented] (YARN-5567) Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus
[ https://issues.apache.org/jira/browse/YARN-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484429#comment-15484429 ] Allen Wittenauer commented on YARN-5567: bq. Should we think of some other state which could warn the admin about this(which is captured in webui/Rest)? Probably. The key problem is going to be putting it somewhere that admins will actually notice. (Hint: most folks in ops that I know don't actually look at the web UIs...) If folks want to pursue that, they'll need to do it in another JIRA since this one has been in a release. :(
[jira] [Commented] (YARN-5567) Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus
[ https://issues.apache.org/jira/browse/YARN-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15481979#comment-15481979 ] Naganarasimha G R commented on YARN-5567: - [~aw], I understand that a typo in the health check script can bring down the whole cluster, hence we need to revert this. But at the same time, with an erroneous script there is a possibility that the script fails to detect some health check failures on the node. Should we think of some other state which could warn the admin about this (which is captured in webui/Rest)?
[jira] [Commented] (YARN-5567) Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus
[ https://issues.apache.org/jira/browse/YARN-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15481755#comment-15481755 ] Allen Wittenauer commented on YARN-5567: I'm going to mark this as fixed so that the release notes for alpha1 reflect that this change is present in it. I've opened and closed YARN-5635 so that alpha2's release notes reflect this change being reverted.
[jira] [Commented] (YARN-5567) Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus
[ https://issues.apache.org/jira/browse/YARN-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15474844#comment-15474844 ] Hudson commented on YARN-5567: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10411 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/10411/]) Revert "YARN-5567. Fix script exit code checking in (aw: rev cae331186da266eea1b0a6fc2c82604907ab0153) * (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/NodeHealthScriptRunner.java * (edit) hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestNodeHealthScriptRunner.java
[jira] [Commented] (YARN-5567) Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus
[ https://issues.apache.org/jira/browse/YARN-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15474805#comment-15474805 ] Allen Wittenauer commented on YARN-5567: I've reverted this change.
[jira] [Commented] (YARN-5567) Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus
[ https://issues.apache.org/jira/browse/YARN-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15472384#comment-15472384 ] Allen Wittenauer commented on YARN-5567: -1 Please revert this change. The exit code getting ignored is *intentional*. We don't want to bring the nodemanager down in case the script has a syntax error in it. Such a condition would bring down *entire clusters* at once, instantaneously.
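To make the "entire clusters at once" argument concrete, here is a simplified model with hypothetical names. A syntax error is modelled as a non-zero exit code (2, chosen only for illustration) with no ERROR line, and the same broken script runs on every node. Under the patched policy every node is marked unhealthy at once; under the pre-patch policy none is. Treating an ERROR line on a clean exit as unhealthy in both policies is an assumption about how the statuses combine.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical model of the failure mode; not Hadoop code.
class BlastRadius {
  // One health-script run on one node.
  static final class RunResult {
    final int exitCode;
    final List<String> output;

    RunResult(int exitCode, List<String> output) {
      this.exitCode = exitCode;
      this.output = output;
    }
  }

  static boolean hasErrorLine(List<String> output) {
    return output.stream().anyMatch(line -> line.startsWith("ERROR"));
  }

  // Patched policy: a non-zero exit marks the node unhealthy.
  static boolean unhealthyIfExitCodeTrusted(RunResult r) {
    return r.exitCode != 0 || hasErrorLine(r.output);
  }

  // Pre-patch policy: only an "ERROR" line from the script counts.
  static boolean unhealthyIfExitCodeIgnored(RunResult r) {
    return hasErrorLine(r.output);
  }

  static long countUnhealthy(List<RunResult> runs, Predicate<RunResult> policy) {
    return runs.stream().filter(policy).count();
  }

  public static void main(String[] args) {
    List<RunResult> runs = new ArrayList<>();
    for (int i = 0; i < 1000; i++) {
      // Same broken script on every node: non-zero exit, no ERROR line.
      runs.add(new RunResult(2, List.of()));
    }
    System.out.println(countUnhealthy(runs, BlastRadius::unhealthyIfExitCodeTrusted)); // 1000
    System.out.println(countUnhealthy(runs, BlastRadius::unhealthyIfExitCodeIgnored)); // 0
  }
}
```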
[jira] [Commented] (YARN-5567) Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus
[ https://issues.apache.org/jira/browse/YARN-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15456763#comment-15456763 ] Andrew Wang commented on YARN-5567: --- Looking at git log, it looks like this will also be included in alpha1. I rebranched right before sending the RC, and we picked up this JIRA as part of that. So, I think we're good?
[jira] [Commented] (YARN-5567) Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus
[ https://issues.apache.org/jira/browse/YARN-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15456719#comment-15456719 ] Ray Chiang commented on YARN-5567: -- I think they've already started the vote on alpha-1 RC0. I believe it will show up in alpha2 automatically. [~andrew.wang], let me know how to handle this situation. Thanks.
[jira] [Commented] (YARN-5567) Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus
[ https://issues.apache.org/jira/browse/YARN-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15456445#comment-15456445 ] Yufei Gu commented on YARN-5567: I think we should push it to 3.0.0-alpha1; otherwise there will be an incompatibility in Hadoop 3.
[jira] [Commented] (YARN-5567) Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus
[ https://issues.apache.org/jira/browse/YARN-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15451647#comment-15451647 ] Naganarasimha G R commented on YARN-5567: - Well, I would have liked to port it to 2.9 at least, as in my view it's better to flag a script syntax error as an error rather than silently passing it as success, but I am fine with taking the less risky approach too! bq. I would be OK with the change in trunk. It does make the behaviour clearer. Well, the catch here again is that 3.0.0-alpha1 is a separate branch from trunk, so you guys plan to push it to the 3.0.0-alpha1 branch too, right?
[jira] [Commented] (YARN-5567) Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus
[ https://issues.apache.org/jira/browse/YARN-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15450469#comment-15450469 ] Wilfred Spiegelenburg commented on YARN-5567: - I would be OK with the change in trunk. It does make the behaviour clearer. We do need the additional changes on the comments and the javadoc on top of what we have now.
> Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus
> --
>
> Key: YARN-5567
> URL: https://issues.apache.org/jira/browse/YARN-5567
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Affects Versions: 2.8.0, 3.0.0-alpha1
> Reporter: Yufei Gu
> Assignee: Yufei Gu
> Fix For: 2.8.1
>
> Attachments: YARN-5567.001.patch
>
>
> In case of FAILED_WITH_EXIT_CODE, health status should be false.
> {code}
> case FAILED_WITH_EXIT_CODE:
>   setHealthStatus(true, "", now);
>   break;
> {code}
> should be
> {code}
> case FAILED_WITH_EXIT_CODE:
>   setHealthStatus(false, "", now);
>   break;
> {code}
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5567) Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus
[ https://issues.apache.org/jira/browse/YARN-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15450114#comment-15450114 ] Ray Chiang commented on YARN-5567: -- So, that's two votes for trunk only. [~wilfreds], [~Naganarasimha], are both of you okay with that?
[jira] [Commented] (YARN-5567) Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus
[ https://issues.apache.org/jira/browse/YARN-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15449979#comment-15449979 ] Yufei Gu commented on YARN-5567: Thanks [~wilfreds] for pointing that out. Nice catch. My bad for missing that part of the Javadoc. Thanks [~Naganarasimha] and [~rchiang] for the comments. I prefer to revert it in branch-2.8 and branch-2 for compatibility reasons, and keep it in trunk with the documentation updated. Thanks [~rchiang] for filing the follow-up JIRA.
[jira] [Commented] (YARN-5567) Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus
[ https://issues.apache.org/jira/browse/YARN-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15449735#comment-15449735 ] Ray Chiang commented on YARN-5567: -- One point of clarification. While this *is* an incompatible change, I was debating the "hardness" of it. It will break on broken health checking scripts (assuming anyone is even using the feature). If we want to treat this as a hard incompatibility, then I'd go with my earlier suggestion. In general, I prefer being conservative along these lines. If others are of the mind that this is a "softer" incompatibility, we could keep it in branch-2.8. Either way, I agree the documentation and Javadoc need to be updated to match. I've filed YARN-5595 as a follow-up.
[jira] [Commented] (YARN-5567) Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus
[ https://issues.apache.org/jira/browse/YARN-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15449561#comment-15449561 ] Ray Chiang commented on YARN-5567: -- Thanks [~wilfreds] for that. For incompatible changes, I'd prefer to leave it in trunk, pull it out of branch-2.8, and debate branch-2 (effectively 2.9).
[jira] [Commented] (YARN-5567) Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus
[ https://issues.apache.org/jira/browse/YARN-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15448000#comment-15448000 ] Naganarasimha G R commented on YARN-5567: - Thanks for pointing it out [~wilfreds], but it was not completely overlooked by me. My reasoning for going ahead with this issue is that if the script has a syntax error, it might not execute properly and so might fail to detect issues with the node's health (if any). So I felt that warning the user that the script has an error (or returning an error code) is better than just passing the evaluation as successful.
bq. If we are going to change the behaviour that is documented we should not do it in release 2.8.1 and also update all related documentation.
I agree that the required documentation and comments need to be modified/updated (which we missed in the patch). But not doing it in the 2.8.1 release is debatable and can be discussed further. A few points in favor of doing it:
# we are making the change in a minor version rather than a major version (2.8.0 is not yet released, and if possible we can incorporate it there too);
# as mentioned above, if there is an issue in the script, it is better to flag it as an error than to silently pass it as success, so script issues should be flagged as errors even on existing clusters.
Thoughts? I would also like to see others' input.
[jira] [Commented] (YARN-5567) Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus
[ https://issues.apache.org/jira/browse/YARN-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15447465#comment-15447465 ] Yufei Gu commented on YARN-5567: Thanks [~rchiang] for the review and commit! Thanks [~Naganarasimha] for the review!
[jira] [Commented] (YARN-5567) Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus
[ https://issues.apache.org/jira/browse/YARN-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15447442#comment-15447442 ] Hudson commented on YARN-5567: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10371 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/10371/]) YARN-5567. Fix script exit code checking in (rchiang: rev 05ede003868871addc30162e9707c3dc14ed6b7a) * (edit) hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestNodeHealthScriptRunner.java * (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/NodeHealthScriptRunner.java