When you look at the worker logs, do any of the workers kill themselves because of a missing stormconf.ser file? If so, grab that error message and have fun googling.
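If it helps, here is a rough sketch of one way to pull those lines out of the logs. It only does a plain substring search. The function name find_log_mentions and the default file pattern "worker-*.log*" are my own assumptions for illustration, so adjust the pattern and the directory to match how your installation names and stores its worker logs.

```python
from pathlib import Path


def find_log_mentions(log_dir, needle="stormconf.ser", pattern="worker-*.log*"):
    """Return (file name, line number, text) for each log line containing needle.

    The default pattern is an assumption about how worker logs are named;
    change it to fit your setup.
    """
    hits = []
    for path in sorted(Path(log_dir).glob(pattern)):
        if not path.is_file():
            continue
        # errors="replace" keeps the scan going if a log has stray bytes.
        with path.open(errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if needle in line:
                    hits.append((path.name, number, line.rstrip("\n")))
    return hits
```

Call it with the directory that holds your worker logs, for example find_log_mentions("/path/to/storm/logs") (a placeholder path), and print each tuple it returns. The line numbers make it easy to open the log at the right spot and read the stack trace around the message.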
Some will say those problems went away with the latest release. Apparently, it is complicated. My best advice is to avoid anything that results in workers dying or being killed: resource contention issues (failed heartbeats) or unhandled exceptions (the worker commits suicide). Either can lead to a state that I describe as the fubar worker loop.

I haven't figured it out, but basically: a worker comes up, the supervisor is looking for the wrong worker, the supervisor decides the worker is dead and cleans up, the cleanup means the worker no longer sees its stormconf.ser, so it commits suicide. That's one version of events. Good luck.

Thank you for your time!

+++++++++++++++++++++
Jeff Maass <[email protected]>
linkedin.com/in/jeffmaass
stackoverflow.com/users/373418/maassql
+++++++++++++++++++++

On Fri, May 29, 2015 at 12:36 PM, Grant Overby (groverby) <[email protected]> wrote:

> Supervisor is reporting that the worker “still hasn’t started” and
> eventually kills and restarts the worker. However, the worker has started
> and is processing tuples. This repeats indefinitely.
>
> Debugging steps?
>
> *Grant Overby*
> Software Engineer
> Cisco.com <http://www.cisco.com/>
> [email protected]
> Mobile: *865 724 4910*
