I would assume that what actually happened is that most of your workers don't manage to finish shutting down gracefully, and so exit with code 20 because of the 1 second time limit imposed by the shutdown hook. One of your workers happened to complete the entire shutdown sequence within the 1 second limit, and so returned 143.
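
To make the race concrete, here is a minimal Java sketch of that kind of shutdown hook. It is not Storm's actual code: the class and method names (`ShutdownSketch`, `startWatchdog`, `installHook`, `simulate`) are made up for illustration, and only the 1 second limit and the exit codes 20 and 143 come from the behaviour described here. `simulate` replaces the real `halt` with a flag so that the two outcomes can be observed without killing the JVM.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch, not Storm's implementation.
class ShutdownSketch {
    static final int FORCED_HALT_CODE = 20;        // reported when shutdown overruns
    static final int SIGTERM_EXIT_CODE = 128 + 15; // 143, normal exit after SIGTERM

    // Starts a daemon thread that sleeps for timeoutMs, then runs onTimeout.
    static Thread startWatchdog(long timeoutMs, Runnable onTimeout) {
        Thread t = new Thread(() -> {
            try {
                Thread.sleep(timeoutMs);
                onTimeout.run();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.setDaemon(true);
        t.start();
        return t;
    }

    // Real-world shape: the hook arms the watchdog, then shuts down gracefully.
    // If gracefulShutdown takes longer than 1 second, the JVM is halted with 20.
    static void installHook(Runnable gracefulShutdown) {
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            startWatchdog(1000, () -> Runtime.getRuntime().halt(FORCED_HALT_CODE));
            gracefulShutdown.run();
        }));
    }

    // Simulation: returns the exit code the process would report, using a flag
    // instead of halt(). shutdownMs stands in for the graceful shutdown work.
    static int simulate(long timeoutMs, long shutdownMs) {
        AtomicBoolean halted = new AtomicBoolean(false);
        startWatchdog(timeoutMs, () -> halted.set(true));
        try {
            Thread.sleep(shutdownMs);
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return halted.get() ? FORCED_HALT_CODE : SIGTERM_EXIT_CODE;
    }
}
```

In this sketch, `ShutdownSketch.simulate(1000, 10)` corresponds to the worker that finished in time, and `ShutdownSketch.simulate(50, 500)` to a worker whose shutdown overran the limit.
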
Basically what is happening is that the supervisor sends SIGTERM to the worker to get it to shut down. The worker then runs its shutdown sequence to shut down gracefully. Before starting the shutdown sequence, the worker sets up a new thread that sleeps for 1 second, then halts the JVM with exit code 20. If the shutdown exceeds the time limit, you get exit code 20. If the shutdown finishes within the time limit, you get 143 in response to the original SIGTERM.

On Mon, 6 May 2019 at 18:22, Derek Dagit <[email protected]> wrote:

> An exit code of 143 indicates a SIGTERM was received. (143 - 128 = 15).
>
> It seems like something killed the shutdown script.
>
> https://www.tldp.org/LDP/abs/html/exitcodes.html
>
> On Sun, May 5, 2019 at 8:19 PM JF Chen <[email protected]> wrote:
>
>> Do you run your storm application on yarn?
>>
>> Regard,
>> Junfeng Chen
>>
>>
>> On Mon, May 6, 2019 at 4:53 AM Mitchell Rathbun (BLOOMBERG/ 731 LEX) <
>> [email protected]> wrote:
>>
>>> Recently our shutdown script failed when calling storm kill with a
>>> return code of 143. Typically this means that SIGTERM was received and the
>>> process was terminated. I see in
>>> https://issues.apache.org/jira/browse/STORM-2176 that it is possible to
>>> get this exit code if a topology takes too long to come down. However, we
>>> are running version 1.2.1 of Storm, which should have the fix mentioned in
>>> the issue. Is it possible that we have the same cause for our error? When
>>> this occurred, many topologies were brought down at once, but only this one
>>> topology seemed to have an issue.
>>>
>>
