STORM-3240 any non-zero exit code causes health check failure

Project: http://git-wip-us.apache.org/repos/asf/storm/repo
Commit: http://git-wip-us.apache.org/repos/asf/storm/commit/0b32a295
Tree: http://git-wip-us.apache.org/repos/asf/storm/tree/0b32a295
Diff: http://git-wip-us.apache.org/repos/asf/storm/diff/0b32a295

Branch: refs/heads/master
Commit: 0b32a2950c61814ec6f9a9d73a82242559bb003f
Parents: 9e84142
Author: Aaron Gresch <agre...@yahoo-inc.com>
Authored: Tue Oct 2 15:35:59 2018 -0500
Committer: Aaron Gresch <agre...@yahoo-inc.com>
Committed: Tue Oct 2 15:35:59 2018 -0500

----------------------------------------------------------------------
 docs/Setting-up-a-Storm-cluster.md                               | 2 +-
 .../main/java/org/apache/storm/healthcheck/HealthChecker.java    | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/storm/blob/0b32a295/docs/Setting-up-a-Storm-cluster.md
----------------------------------------------------------------------
diff --git a/docs/Setting-up-a-Storm-cluster.md b/docs/Setting-up-a-Storm-cluster.md
index c4a637c..d770a58 100644
--- a/docs/Setting-up-a-Storm-cluster.md
+++ b/docs/Setting-up-a-Storm-cluster.md
@@ -92,7 +92,7 @@ drpc.servers: ["111.222.333.44"]
 
 ### Monitoring Health of Supervisors
 
-Storm provides a mechanism by which administrators can configure the supervisor to run administrator supplied scripts periodically to determine if a node is healthy or not. Administrators can have the supervisor determine if the node is in a healthy state by performing any checks of their choice in scripts located in storm.health.check.dir. If a script detects the node to be in an unhealthy state, it must print a line to standard output beginning with the string ERROR and return a non-zero exit code. In pre-Storm 2.x releases, a bug considered a script exit value of 0 to be a failure.  This has now been fixed.  The supervisor will periodically run the scripts in the health check dir and check the output. If the script’s output contains the string ERROR, as described above, the supervisor will shut down any workers and exit.
+Storm provides a mechanism by which administrators can configure the supervisor to run administrator supplied scripts periodically to determine if a node is healthy or not. Administrators can have the supervisor determine if the node is in a healthy state by performing any checks of their choice in scripts located in storm.health.check.dir. If a script detects the node to be in an unhealthy state, it must return a non-zero exit code. In pre-Storm 2.x releases, a bug considered a script exit value of 0 to be a failure.  This has now been fixed.  The supervisor will periodically run the scripts in the health check dir and check the output. If the script’s output contains the string ERROR, as described above, the supervisor will shut down any workers and exit.
 
 If the supervisor is running with supervision "/bin/storm node-health-check" can be called to determine if the supervisor should be launched or if the node is unhealthy.
 

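The updated documentation reduces the script contract to one rule: a non-zero exit code means the node is unhealthy. Below is a minimal sketch of that rule. It assumes a Unix host on which every executable file in the health check directory is run directly. The class `HealthDirRunner`, the method `failingScripts`, and the choice to count a timeout as a failure are hypothetical illustrations. They are not Storm's `HealthChecker` API. In Storm itself the directory comes from `storm.health.check.dir`.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not Storm's implementation: run every executable
// file in a directory and report the ones that indicate an unhealthy node.
public class HealthDirRunner {

    /** Returns names of scripts that exited non-zero or exceeded timeoutMs. */
    public static List<String> failingScripts(File dir, long timeoutMs)
            throws IOException, InterruptedException {
        List<String> failures = new ArrayList<>();
        File[] files = dir.listFiles();
        if (files == null) {
            return failures; // not a directory, or unreadable: nothing to run
        }
        Arrays.sort(files); // run scripts in a deterministic (name) order
        for (File script : files) {
            if (!script.isFile() || !script.canExecute()) {
                continue; // skip subdirectories and non-executable files
            }
            Process p = new ProcessBuilder(script.getAbsolutePath())
                    .redirectErrorStream(true)                    // merge stderr into stdout
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD) // output unused by this rule
                    .start();
            if (!p.waitFor(timeoutMs, TimeUnit.MILLISECONDS)) {
                p.destroyForcibly();
                failures.add(script.getName()); // assumption: a hung script counts as a failure
            } else if (p.exitValue() != 0) {
                failures.add(script.getName()); // any non-zero exit code means unhealthy
            }
        }
        return failures;
    }
}
```

Under these assumptions, a directory holding one script that exits 0 and one that exits 3 yields a list containing only the second script's name. Whether that script prints an `ERROR` line makes no difference.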
http://git-wip-us.apache.org/repos/asf/storm/blob/0b32a295/storm-server/src/main/java/org/apache/storm/healthcheck/HealthChecker.java
----------------------------------------------------------------------
diff --git a/storm-server/src/main/java/org/apache/storm/healthcheck/HealthChecker.java b/storm-server/src/main/java/org/apache/storm/healthcheck/HealthChecker.java
index 38bcf64..b5f3655 100644
--- a/storm-server/src/main/java/org/apache/storm/healthcheck/HealthChecker.java
+++ b/storm-server/src/main/java/org/apache/storm/healthcheck/HealthChecker.java
@@ -107,10 +107,10 @@ public class HealthChecker {
                 while ((str = reader.readLine()) != null) {
                     if (str.startsWith("ERROR")) {
                         LOG.warn("The healthcheck process {} exited with code {}", script, process.exitValue());
-                        return FAILED_WITH_EXIT_CODE;
+                        return FAILED;
                     }
                 }
-                return SUCCESS;
+                return FAILED_WITH_EXIT_CODE;
             }
             return SUCCESS;
         } catch (InterruptedException | ClosedByInterruptException e) {

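The hunk shows only the inner branch of the check. The commit title implies that this branch is reached only when `process.exitValue()` is non-zero; the enclosing condition itself lies outside the hunk. Under that assumption, the change can be modeled as a pure function from exit value and output lines to a result. The enum and method names below are hypothetical stand-ins for the `SUCCESS`, `FAILED` and `FAILED_WITH_EXIT_CODE` constants in the diff. How the real caller counts each result is not shown here.

```java
import java.util.List;

// Hypothetical model of the hunk above, assuming the enclosing branch
// runs only for a non-zero exit value.
public class HealthResultSketch {

    public enum Result { SUCCESS, FAILED, FAILED_WITH_EXIT_CODE }

    /** Classification after this commit: any non-zero exit is a failure. */
    public static Result classifyAfter(int exitValue, List<String> outputLines) {
        if (exitValue != 0) {
            for (String line : outputLines) {
                if (line.startsWith("ERROR")) {
                    return Result.FAILED; // non-zero exit with an ERROR line
                }
            }
            return Result.FAILED_WITH_EXIT_CODE; // non-zero exit, no ERROR line
        }
        return Result.SUCCESS;
    }

    /** Classification before this commit, kept for comparison. */
    public static Result classifyBefore(int exitValue, List<String> outputLines) {
        if (exitValue != 0) {
            for (String line : outputLines) {
                if (line.startsWith("ERROR")) {
                    return Result.FAILED_WITH_EXIT_CODE;
                }
            }
            return Result.SUCCESS; // non-zero exit without ERROR was treated as healthy
        }
        return Result.SUCCESS;
    }
}
```

With exit value 1 and no `ERROR` line, `classifyBefore` returns `SUCCESS` and `classifyAfter` returns `FAILED_WITH_EXIT_CODE`. That case is the behavior change named in the commit title. An exit value of 0 maps to `SUCCESS` in both versions, even when the output contains an `ERROR` line.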