When a worker process dies, the Supervisor is responsible for restarting it. If
the worker process keeps failing, Nimbus will eventually try to reschedule the
tasks to a different worker node. This is all explained here:
https://storm.apache.org/releases/1.2.1/Daemon-Fault-Tolerance.html

Is there a limit to how often workers will be restarted? If there is a bug on 
startup that causes repeated crashes, at some point it no longer makes sense to
keep restarting the worker. I saw that there is a configuration
property called STORM_NIMBUS_RETRY_TIMES, but this doesn't seem to apply to 
restarting worker processes.
