Say that we want to kill all topologies when a machine is brought down. The machine will be brought back up shortly afterwards, which includes restarting the supervisor. If the supervisor always restarts worker processes after kill_workers, won't they also be restarted when the supervisor comes back up, since the active topologies are kept in ZooKeeper? And if the supervisor is required for this command to work, then it must keep running until kill_workers completes. How can we guarantee that the supervisor is then killed before the worker processes are restarted?
From: user@storm.apache.org
At: 04/30/19 16:58:17
To: user@storm.apache.org
Subject: Re: Kill_workers cli not working as expected

I believe kill_workers is for cleaning up workers if e.g. you want to shut down a supervisor node, or if you have an unstable machine you want to take out of the cluster. The command was introduced because simply killing the supervisor process would leave the workers alive. If you want to kill the workers and keep them dead, you should also kill the supervisor on that machine.

More context at https://issues.apache.org/jira/browse/STORM-1058

On Tue, 30 Apr 2019 at 22:28, Mitchell Rathbun (BLOOMBERG/ 731 LEX) <mrathb...@bloomberg.net> wrote:

We currently run both Nimbus and Supervisor on the same cluster. When running 'storm kill_workers', I have noticed that all of the workers are killed, but then are restarted. In the supervisor log I see the following for each topology:

2019-04-30 16:21:17,571 INFO Slot [SLOT_19227] STATE KILL_AND_RELAUNCH msInState: 5 topo:WingmanTopology998-1-1556594165 worker:f0de554d-81a1-48ce-82e8-9beef009969b -> WAITING_FOR_WORKER_START msInState: 0 topo:WingmanTopology998-1-1556594165 worker:f0de554d-81a1-48ce-82e8-9beef009969b
2019-04-30 16:21:25,574 INFO Slot [SLOT_19227] STATE WAITING_FOR_WORKER_START msInState: 8003 topo:WingmanTopology998-1-1556594165 worker:f0de554d-81a1-48ce-82e8-9beef009969b -> RUNNING msInState: 0 topo:WingmanTopology998-1-1556594165 worker:f0de554d-81a1-48ce-82e8-9beef009969b

Is this the expected behavior (the worker process is bounced, not killed)? I thought that kill_workers would essentially run 'storm kill' for each of the worker processes.