[jira] [Commented] (YARN-5567) Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus

2016-09-13 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15487635#comment-15487635
 ] 

Naganarasimha G R commented on YARN-5567:
-

I reopened YARN-5635 and put some comments there for further discussion.

> Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus
> --
>
> Key: YARN-5567
> URL: https://issues.apache.org/jira/browse/YARN-5567
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Affects Versions: 2.8.0, 3.0.0-alpha1
> Reporter: Yufei Gu
> Assignee: Yufei Gu
> Fix For: 3.0.0-alpha1
>
> Attachments: YARN-5567.001.patch
>
>
> In case of FAILED_WITH_EXIT_CODE, health status should be false.
> {code}
>   case FAILED_WITH_EXIT_CODE:
>     setHealthStatus(true, "", now);
>     break;
> {code}
> should be
> {code}
>   case FAILED_WITH_EXIT_CODE:
>     setHealthStatus(false, "", now);
>     break;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5567) Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus

2016-09-13 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15487355#comment-15487355
 ] 

Allen Wittenauer commented on YARN-5567:


It needs to be available via metrics2; otherwise it's invisible to most 
large-scale ops teams.

Someone should open a new JIRA or rework YARN-5635 for this discussion.  This 
JIRA is effectively dead for any new development. :(







[jira] [Commented] (YARN-5567) Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus

2016-09-13 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15486556#comment-15486556
 ] 

Naganarasimha G R commented on YARN-5567:
-

bq. standardizing on a specific error code for "detected bad Node" vs "bad 
script"
I was thinking along the same lines when I mentioned earlier ??"Should 
we think of some other state which could warn the admin about this(which is 
captured in webui/Rest)"?? 
The NM could report Healthy/UnHealthy/HealthValidationError and send this to 
the RM across the heartbeat, and the RM could then record this NM in a new 
state other than Running and UnHealthy.  This state could be displayed in the 
WebUI and also queried using {{./yarn node -list -state}}
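
A rough sketch of the three-state idea proposed above, assuming the script signals a bad node with an output line starting with ERROR (the "ERROR -" convention mentioned elsewhere in this thread). None of these types exist in YARN; the class, enum and method names are hypothetical and only illustrate how a "bad script" result could be kept apart from a "bad node" result without failing the node.

```java
// Hypothetical illustration of the proposal; not part of Hadoop or YARN.
class ThreeStateHealthSketch {
    // The three states named in the comment above.
    enum NodeHealth { HEALTHY, UNHEALTHY, HEALTH_VALIDATION_ERROR }

    // Map one run of the health script to a state.
    static NodeHealth evaluate(int exitCode, String output) {
        for (String line : output.split("\n")) {
            if (line.startsWith("ERROR")) {
                // The script deliberately reported a problem with the node.
                return NodeHealth.UNHEALTHY;
            }
        }
        if (exitCode != 0) {
            // The script itself misbehaved (syntax error, missing binary, ...).
            return NodeHealth.HEALTH_VALIDATION_ERROR;
        }
        return NodeHealth.HEALTHY;
    }

    // Assumption: only UNHEALTHY takes the node out of service; a validation
    // error is surfaced to admins but the node keeps running.
    static boolean shouldFailNode(NodeHealth health) {
        return health == NodeHealth.UNHEALTHY;
    }

    public static void main(String[] args) {
        NodeHealth h = evaluate(127, "bash: line 3: syntax error");
        System.out.println(h + " failNode=" + shouldFailNode(h));
    }
}
```

In this sketch a broken script is visible as its own state, while the decision to stop using the node still depends only on what the script printed.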







[jira] [Commented] (YARN-5567) Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus

2016-09-12 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484820#comment-15484820
 ] 

Allen Wittenauer commented on YARN-5567:


bq. would you prefer this be a config setting to choose the behavior?

The history of the health check script is interesting, but long.  Not 
trusting the exit code was one of the key learnings by the ops team from the 
HOD experience. The script fails a lot more often than people realize, mainly 
due to users doing crazy things, especially on insecure systems.

This is one of those times where it's going to be extremely difficult to 
convince me otherwise.  I can't think of a reason to ever trust the exit code 
enough to bring down the NodeManager.   In this particular environment, there 
are many conditions under which the script can fail for reasons that may be 
temporary or pointless.  

Now it could be argued that those temporary failures should cause the NM to 
come down, but then you get into a race condition between heartbeats and actual 
issues.  HDFS worked around it by basically saying "it has to fail for X long". 
Ignoring the exit code avoids that problem because one can be sure that "ERROR 
-" really did come from the script.
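
For illustration, the "it has to fail for X long" approach could look roughly like the sketch below. This is not the HDFS implementation; the class name, method name and millisecond threshold are all assumptions made for the example.

```java
// Hypothetical sketch of a "must fail continuously for X long" rule.
class SustainedFailureSketch {
    private final long thresholdMillis;
    // Time of the first failure in the current unbroken run, or -1 if none.
    private long firstFailureAt = -1;

    SustainedFailureSketch(long thresholdMillis) {
        this.thresholdMillis = thresholdMillis;
    }

    // Record one check result. Returns true only once the check has been
    // failing without interruption for at least thresholdMillis.
    boolean record(boolean checkPassed, long nowMillis) {
        if (checkPassed) {
            firstFailureAt = -1; // any success resets the window
            return false;
        }
        if (firstFailureAt < 0) {
            firstFailureAt = nowMillis;
        }
        return nowMillis - firstFailureAt >= thresholdMillis;
    }

    public static void main(String[] args) {
        SustainedFailureSketch tracker = new SustainedFailureSketch(3000);
        System.out.println(tracker.record(false, 0));    // first failure, window starts
        System.out.println(tracker.record(true, 2000));  // success resets the window
        System.out.println(tracker.record(false, 2500)); // new window starts
        System.out.println(tracker.record(false, 6000)); // 3500 ms of continuous failure
    }
}
```

The cost of this design is the extra state and the delay before a real problem is acted on, which is the trade-off the comment above contrasts with simply ignoring the exit code.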

bq. Alternatively, would you be okay with standardizing on a specific error 
code for "detected bad Node" vs "bad script"?

If by error code you specifically mean the value the NM reports back to the RM, 
yes that makes sense.  It just can't fail the node.  







[jira] [Commented] (YARN-5567) Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus

2016-09-12 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484619#comment-15484619
 ] 

Ray Chiang commented on YARN-5567:
--

[~aw], would you prefer this be a config setting to choose the behavior?







[jira] [Commented] (YARN-5567) Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus

2016-09-12 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484429#comment-15484429
 ] 

Allen Wittenauer commented on YARN-5567:


bq. Should we think of some other state which could warn the admin about 
this(which is captured in webui/Rest)?

Probably. The key problem is going to be putting it some place that admins will 
actually notice it. (Hint: most folks in ops that I know don't actually look at 
the web UIs...)

If folks want to pursue that, they'll need to do it in another JIRA since this 
one has been in a release. :(







[jira] [Commented] (YARN-5567) Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus

2016-09-11 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15481979#comment-15481979
 ] 

Naganarasimha G R commented on YARN-5567:
-

[~aw], I understand that a typo in the health check script can bring down 
the whole cluster and hence we need to revert this, but at the same time, with 
an erroneous script, isn't there a possibility that the script fails to detect 
some health check failures on the node?
Should we think of some other state which could warn the admin about this 
(which is captured in webui/Rest)?







[jira] [Commented] (YARN-5567) Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus

2016-09-11 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15481755#comment-15481755
 ] 

Allen Wittenauer commented on YARN-5567:


I'm going to mark this as fixed so that the release notes for alpha1 reflect 
that this change is present in it.  I've opened and closed YARN-5635 so that 
alpha2's release notes reflect this change being reverted.







[jira] [Commented] (YARN-5567) Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus

2016-09-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15474844#comment-15474844
 ] 

Hudson commented on YARN-5567:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10411 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/10411/])
Revert "YARN-5567. Fix script exit code checking in (aw: rev 
cae331186da266eea1b0a6fc2c82604907ab0153)
* (edit) 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/NodeHealthScriptRunner.java
* (edit) 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestNodeHealthScriptRunner.java








[jira] [Commented] (YARN-5567) Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus

2016-09-08 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15474805#comment-15474805
 ] 

Allen Wittenauer commented on YARN-5567:


I've reverted this change.







[jira] [Commented] (YARN-5567) Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus

2016-09-07 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15472384#comment-15472384
 ] 

Allen Wittenauer commented on YARN-5567:


-1  Please revert this change.

The exit code getting ignored is *intentional*.  We don't want to bring the 
nodemanager down in case the script has a syntax error in it.  Such a condition 
would bring down *entire clusters* at once, instantaneously.
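
A minimal sketch of the policy described above, not the Hadoop source: a non-zero exit code on its own leaves the node healthy, and only output that the script printed deliberately marks it unhealthy. The ERROR line prefix follows the "ERROR -" convention mentioned elsewhere in this thread; every class, enum and method name other than FAILED_WITH_EXIT_CODE is hypothetical.

```java
// Illustrative only; names other than FAILED_WITH_EXIT_CODE are made up.
class ExitCodePolicySketch {
    enum ScriptResult { SUCCESS, FAILED_WITH_EXIT_CODE, REPORTED_ERROR }

    // Classify one run of the health script from its exit code and output.
    static ScriptResult classify(int exitCode, String output) {
        for (String line : output.split("\n")) {
            if (line.startsWith("ERROR")) {
                return ScriptResult.REPORTED_ERROR;
            }
        }
        return exitCode == 0
            ? ScriptResult.SUCCESS
            : ScriptResult.FAILED_WITH_EXIT_CODE;
    }

    // Only an explicit ERROR line marks the node unhealthy. A bare non-zero
    // exit code (for example from a syntax error) is deliberately ignored.
    static boolean isHealthy(ScriptResult result) {
        return result != ScriptResult.REPORTED_ERROR;
    }

    public static void main(String[] args) {
        // A script with a syntax error: non-zero exit, no ERROR line.
        System.out.println(isHealthy(classify(2, "syntax error near unexpected token")));
        // A working script that found a real problem.
        System.out.println(isHealthy(classify(0, "ERROR - disk is read-only")));
    }
}
```

With this rule, pushing a broken script to every node at once leaves all nodes in service, which is the failure mode the comment above is guarding against.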









[jira] [Commented] (YARN-5567) Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus

2016-09-01 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15456763#comment-15456763
 ] 

Andrew Wang commented on YARN-5567:
---

Looking at git log, it looks like this will also be included in alpha1. I 
rebranched right before sending the RC, and we picked up this JIRA as part of 
that.

So, I think we're good?







[jira] [Commented] (YARN-5567) Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus

2016-09-01 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15456719#comment-15456719
 ] 

Ray Chiang commented on YARN-5567:
--

I think they've already started the vote on alpha-1 RC0.  I believe it will 
show up in alpha2 automatically.

[~andrew.wang], let me know how to handle this situation.  Thanks.








[jira] [Commented] (YARN-5567) Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus

2016-09-01 Thread Yufei Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15456445#comment-15456445
 ] 

Yufei Gu commented on YARN-5567:


I think we should push it to 3.0.0-alpha1; otherwise there will be an 
incompatibility in Hadoop 3. 







[jira] [Commented] (YARN-5567) Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus

2016-08-31 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15451647#comment-15451647
 ] 

Naganarasimha G R commented on YARN-5567:
-

Well, I would have liked to port it to 2.9 at least, as in my view it's better 
to flag a script syntax error as an error rather than silently passing it as 
success, but I am OK with taking the less risky approach too!
bq. I would be OK with the change in trunk. It does make the behaviour clearer.
Well, the catch again here is that 3.0.0-alpha1 is a separate branch from 
trunk, so you guys plan to push it to the 3.0.0-alpha1 branch too, right?







[jira] [Commented] (YARN-5567) Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus

2016-08-30 Thread Wilfred Spiegelenburg (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15450469#comment-15450469
 ] 

Wilfred Spiegelenburg commented on YARN-5567:
-

I would be OK with the change in trunk. It does make the behaviour clearer.
We do need the additional changes on the comments and the javadoc on top of 
what we have now.

> Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus
> --
>
> Key: YARN-5567
> URL: https://issues.apache.org/jira/browse/YARN-5567
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.8.0, 3.0.0-alpha1
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Fix For: 2.8.1
>
> Attachments: YARN-5567.001.patch
>
>
> In case of FAILED_WITH_EXIT_CODE, health status should be false.
> {code}
>   case FAILED_WITH_EXIT_CODE:
>     setHealthStatus(true, "", now);
>     break;
> {code}
> should be
> {code}
>   case FAILED_WITH_EXIT_CODE:
>     setHealthStatus(false, "", now);
>     break;
> {code}






[jira] [Commented] (YARN-5567) Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus

2016-08-30 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15450114#comment-15450114
 ] 

Ray Chiang commented on YARN-5567:
--

So, that's two votes for trunk only.  [~wilfreds] or [~Naganarasimha], are both 
of you okay with that?







[jira] [Commented] (YARN-5567) Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus

2016-08-30 Thread Yufei Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15449979#comment-15449979
 ] 

Yufei Gu commented on YARN-5567:


Thanks [~wilfreds] for pointing that out. Nice catch. My bad for missing the 
Javadoc part. Thanks [~Naganarasimha] and [~rchiang] for the comments. I prefer 
to revert it in branch-2.8 and branch-2 for compatibility reasons, and keep it 
in trunk with updated documentation. Thanks [~rchiang] for filing the follow-up 
JIRA.








[jira] [Commented] (YARN-5567) Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus

2016-08-30 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15449735#comment-15449735
 ] 

Ray Chiang commented on YARN-5567:
--

One point of clarification.  While this *is* an incompatible change, I was 
debating about the "hardness" of it.  It will break on broken health checking 
scripts (assuming anyone is even using the feature).  If we want to treat this 
as a hard incompatibility, then I'd go with my earlier suggestion.  In general, 
I prefer being conservative along these lines.

If others are of the mind that this is a "softer" incompatibility, we could 
keep it in branch-2.8.

Either way, I agree the documentation and Javadoc need to be updated to match.  
I've filed YARN-5595 as a follow-up.








[jira] [Commented] (YARN-5567) Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus

2016-08-30 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15449561#comment-15449561
 ] 

Ray Chiang commented on YARN-5567:
--

Thanks [~wilfreds] for that.  For incompatible changes, I'd prefer to leave it 
in trunk, pull it out of branch-2.8 and debate about branch-2 (effectively 2.9).








[jira] [Commented] (YARN-5567) Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus

2016-08-29 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15448000#comment-15448000
 ] 

Naganarasimha G R commented on YARN-5567:
-

Thanks for pointing it out [~wilfreds], but it was not completely overlooked by 
me. My thinking behind going ahead with this issue is that if the script has a 
syntax error, there is a possibility that it might not execute properly and so 
might fail to detect issues with the node's health (if any). So I felt that 
warning the user that the script has an error (or returning an error code) is 
better than just passing the evaluation as successful. 
bq. If we are going to change the behaviour that is documented we should not do 
it in release 2.8.1 and also update all related documentation.
Agreed that the required documentation and comments need to be updated (which 
we missed in the patch). But whether to do it in the 2.8.1 release is a 
debatable topic which can be discussed further. A few points in favor of doing 
it: 
# We are making the change in a minor version rather than a major version. 
(2.8.0 is not yet released, and if possible we can incorporate it there too.)
# As mentioned above, if there is an issue in the script, it is better to flag 
it as an error rather than silently pass it as success, so script issues should 
be flagged as errors even on existing clusters.

Thoughts? I would also like to see others' input.








[jira] [Commented] (YARN-5567) Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus

2016-08-29 Thread Yufei Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15447465#comment-15447465
 ] 

Yufei Gu commented on YARN-5567:


Thanks [~rchiang] for the review and commit! Thanks [~Naganarasimha] for the 
review!







[jira] [Commented] (YARN-5567) Fix script exit code checking in NodeHealthScriptRunner#reportHealthStatus

2016-08-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15447442#comment-15447442
 ] 

Hudson commented on YARN-5567:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10371 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/10371/])
YARN-5567. Fix script exit code checking in (rchiang: rev 
05ede003868871addc30162e9707c3dc14ed6b7a)
* (edit) 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestNodeHealthScriptRunner.java
* (edit) 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/NodeHealthScriptRunner.java




