Sorry if my initial question was misleading. The "storm kill" command itself 
returned 143; there was no exit code from our topology. Our topology was never 
shut down and never received a command to shut down. As far as I can tell, 
Nimbus never received a command from running storm kill in this case, so the 
process created to carry out the kill command was the one that was terminated. 
As Derek mentioned, it seems like something killed the process. I am wondering 
whether, since so many topologies were being brought down at once, the process 
took a long time to communicate with Nimbus and timed out or was terminated. Is 
something like this possible? As far as I can tell, there was no external 
command at the time to kill the process.

From: [email protected] At: 05/06/19 13:14:02 To: [email protected]
Subject: Re: Storm kill fails with exit code 143

I would assume that what actually happened is that most of your workers don't 
manage to finish shutting down gracefully, and so exit with code 20 due to the 
1 second time limit imposed by the shutdown hook. One of your workers happened 
to run the entire shutdown sequence within the 1 second time limit, and so 
returned 143.

Basically what is happening is that the supervisor sends SIGTERM to the worker 
to get it to shut down. The worker then runs its shutdown sequence to shut down 
gracefully. Before starting the shutdown sequence, the worker sets up a new 
thread that sleeps for 1 second, then halts the JVM with exit code 20. If the 
shutdown exceeds the time limit, you get exit code 20. If the shutdown is 
finished within the time limit, you get 143 in response to the original 
SIGTERM. 
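
The sequence described above can be sketched roughly as follows. This is only 
an illustration of the pattern, not Storm's actual worker code: the class name 
ShutdownWatchdogSketch and the methods installHook and expectedExitCode are 
hypothetical, and the values 20, 143 and 1 second are taken from the 
description in this message.

```java
public final class ShutdownWatchdogSketch {
    // Values taken from the description in this thread (assumptions, not read from Storm's source).
    static final int TIMEOUT_EXIT_CODE = 20;   // reported when the watchdog halts the JVM
    static final int SIGTERM_EXIT_CODE = 143;  // 128 + 15, reported after a clean SIGTERM shutdown
    static final long TIME_LIMIT_MS = 1000;

    private ShutdownWatchdogSketch() {}

    /** Hypothetical helper: registers a shutdown hook that races the graceful shutdown against a timer. */
    static void installHook(Runnable gracefulShutdown) {
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            Thread watchdog = new Thread(() -> {
                try {
                    Thread.sleep(TIME_LIMIT_MS);
                    // Still shutting down after the limit: stop the JVM immediately.
                    Runtime.getRuntime().halt(TIMEOUT_EXIT_CODE);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "shutdown-watchdog");
            watchdog.setDaemon(true);
            watchdog.start();
            // If this returns before the watchdog fires, the JVM finishes its normal
            // SIGTERM handling and the parent process sees 143.
            gracefulShutdown.run();
        }, "graceful-shutdown"));
    }

    /** Simplified model of which exit code the supervisor would observe. */
    static int expectedExitCode(long shutdownMillis, long limitMillis) {
        return shutdownMillis > limitMillis ? TIMEOUT_EXIT_CODE : SIGTERM_EXIT_CODE;
    }
}
```

Under this model, a worker whose shutdown takes 1500 ms is reported as 20, and 
one that finishes in 300 ms is reported as 143.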

On Mon, May 6, 2019 at 18:22, Derek Dagit <[email protected]> wrote:

An exit code of 143 indicates a SIGTERM was received. (143 - 128 = 15).

It seems like something killed the shutdown script.

https://www.tldp.org/LDP/abs/html/exitcodes.html
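
The arithmetic above (143 - 128 = 15) follows the common shell convention that 
a process killed by signal N is reported with exit status 128 + N. A minimal 
sketch of decoding such a status, where the class name ExitCodes and the method 
signalFromExitCode are hypothetical names chosen for illustration:

```java
public final class ExitCodes {
    private ExitCodes() {}

    /**
     * Returns the signal number encoded in a shell-style exit status,
     * or -1 if the status does not follow the 128 + N convention.
     */
    public static int signalFromExitCode(int exitCode) {
        return (exitCode > 128 && exitCode < 256) ? exitCode - 128 : -1;
    }

    public static void main(String[] args) {
        // 143 decodes to signal 15, which is SIGTERM on Linux.
        System.out.println(signalFromExitCode(143));
    }
}
```

An exit code of 20, as mentioned elsewhere in this thread, is below 128 and so 
does not decode to a signal under this convention.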

On Sun, May 5, 2019 at 8:19 PM JF Chen <[email protected]> wrote:

Do you run your storm application on yarn? 

Regards,
Junfeng Chen


On Mon, May 6, 2019 at 4:53 AM Mitchell Rathbun (BLOOMBERG/ 731 LEX) 
<[email protected]> wrote:

Recently our shutdown script failed when calling storm kill with a return code 
of 143. Typically this means that SIGTERM was received and the process was 
terminated. I see in https://issues.apache.org/jira/browse/STORM-2176 that it 
is possible to get this exit code if a topology takes too long to come down. 
However, we are running version 1.2.1 of Storm, which should have the fix 
mentioned in the issue. Is it possible that we have the same cause for our 
error? When this occurred, many topologies were brought down at once, but only 
this one topology seemed to have an issue.

