Ah, sorry, I got off on the wrong track because of the linked issue, which
is about worker JVM exit codes, not the exit code of the storm kill command.
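
For reference, the worker-side race I described in my earlier message (quoted
below) can be sketched roughly like this. It is only a Python simulation of the
timing, not Storm's actual code: the function name and the use of threads in
place of a real JVM shutdown hook are my own assumptions, and only the 1 second
limit and the codes 20 and 143 come from the description.

```python
import threading
import time

SIGTERM_EXIT = 128 + 15   # 143: conventional status after dying from SIGTERM
FORCED_HALT_EXIT = 20     # code used by the watchdog in the description


def simulate_worker_shutdown(shutdown_seconds, limit_seconds=1.0):
    """Return the exit code the worker would report (hypothetical helper).

    A watchdog fires after `limit_seconds` and "halts" with code 20.
    If the graceful shutdown finishes first, the result is 143 instead.
    """
    done = threading.Event()
    lock = threading.Lock()
    result = []

    def finish(code):
        # Only the first caller decides the exit code.
        with lock:
            if not result:
                result.append(code)
        done.set()

    def graceful_shutdown():
        time.sleep(shutdown_seconds)   # stands in for the real shutdown work
        finish(SIGTERM_EXIT)

    watchdog = threading.Timer(limit_seconds, finish, args=(FORCED_HALT_EXIT,))
    watchdog.daemon = True
    watchdog.start()
    threading.Thread(target=graceful_shutdown, daemon=True).start()

    done.wait()
    watchdog.cancel()
    return result[0]
```

A shutdown that beats the limit yields 143 and one that overruns it yields 20,
which is why both codes can show up among workers on the same cluster.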

On Mon, May 6, 2019 at 19:34, Mitchell Rathbun (BLOOMBERG/ 731 LEX) <
[email protected]> wrote:

> Sorry if my initial question was misleading. The "storm kill" command
> itself returned 143; there was no exit code from our topology. Our topology
> was never shut down and never received a command to shut down. As far as I
> can tell, Nimbus never received a request from the storm kill command in
> this case, so the process created to carry out the kill command was the one
> that was terminated. As Derek mentioned, it seems like something killed that
> process. Since so many topologies were being brought down at once, I am
> wondering whether the process took a long time to communicate with Nimbus
> and timed out or was terminated. Is something like this possible? As far as
> I can tell, no external command was issued at the time to kill the process.
>
> From: [email protected] At: 05/06/19 13:14:02
> To: [email protected]
> Subject: Re: Storm kill fails with exit code 143
>
> I would assume that what actually happened is that most of your workers
> don't manage to finish shutting down gracefully, and so exit with code 20
> due to the 1 second time limit imposed by the shutdown hook. One of your
> workers happened to run the entire shutdown sequence within the 1 second
> time limit, and so returned 143.
>
> Basically, what is happening is that the supervisor sends SIGTERM to the
> worker to get it to shut down. The worker then runs its shutdown sequence
> to shut down gracefully. Before starting the shutdown sequence, the worker
> sets up a new thread that sleeps for 1 second, then halts the JVM with exit
> code 20. If the shutdown exceeds the time limit, you get exit code 20. If
> the shutdown finishes within the time limit, you get 143 in response to
> the original SIGTERM.
>
> On Mon, May 6, 2019 at 18:22, Derek Dagit <[email protected]> wrote:
>
>> An exit code of 143 indicates that a SIGTERM was received (143 - 128 = 15,
>> and signal 15 is SIGTERM).
>>
>> It seems like something killed the shutdown script.
>>
>> https://www.tldp.org/LDP/abs/html/exitcodes.html
>>
>> On Sun, May 5, 2019 at 8:19 PM JF Chen <[email protected]> wrote:
>>
>>> Do you run your Storm application on YARN?
>>>
>>> Regards,
>>> Junfeng Chen
>>>
>>>
>>> On Mon, May 6, 2019 at 4:53 AM Mitchell Rathbun (BLOOMBERG/ 731 LEX) <
>>> [email protected]> wrote:
>>>
>>>> Recently our shutdown script failed when its call to storm kill exited
>>>> with a return code of 143. Typically this means that SIGTERM was received
>>>> and the process was terminated. I see in
>>>> https://issues.apache.org/jira/browse/STORM-2176 that it is possible
>>>> to get this exit code if a topology takes too long to come down. However,
>>>> we are running version 1.2.1 of Storm, which should have the fix mentioned
>>>> in the issue. Is it possible that our error has the same cause?
>>>> When this occurred, many topologies were brought down at once, but only
>>>> this one topology seemed to have an issue.
>>>>
>>>
>
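
The arithmetic in Derek's message above (143 - 128 = 15) follows the usual
shell convention that a status above 128 means "killed by signal (status - 128)".
A small sketch for decoding such a status, for example in a wrapper around the
shutdown script, might look like the following. The helper name is made up for
illustration, and a program can also choose to exit with one of these values
itself, so treat the result as a hint rather than proof.

```python
import signal


def describe_exit_status(status):
    """Describe a shell-style exit status (hypothetical helper).

    Statuses above 128 are read as 128 + signal number, per shell convention.
    """
    if status > 128:
        signum = status - 128
        try:
            return f"terminated by {signal.Signals(signum).name}"
        except ValueError:
            # Not a signal number known on this platform.
            return f"terminated by unknown signal {signum}"
    return f"exited with status {status}"


print(describe_exit_status(143))  # terminated by SIGTERM
```

Logging this alongside the raw return code of storm kill would make it clearer
whether the client process was signalled or exited on its own.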
