When you look at the worker logs, do you see workers occasionally killing
themselves because a stormconf.ser file is missing?  If so, grab that
error message and have fun googling.
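
A minimal Python sketch for collecting those messages. It assumes the worker
logs are plain-text files matching worker-*.log in one directory; the glob,
the directory and the function name find_stormconf_errors are illustrative,
not part of Storm, so adjust them to your version and log layout.

```python
from pathlib import Path

# Assumption: worker logs are plain-text files named like worker-6700.log
# in a single directory. Change the glob if your layout differs.
def find_stormconf_errors(log_dir, glob="worker-*.log"):
    """Return {log file name: [lines mentioning stormconf.ser]}."""
    hits = {}
    for path in sorted(Path(log_dir).glob(glob)):
        # errors="replace" keeps a stray bad byte from aborting the scan
        with path.open(errors="replace") as fh:
            matches = [line.rstrip("\n") for line in fh if "stormconf.ser" in line]
        if matches:
            hits[path.name] = matches
    return hits
```

For example, find_stormconf_errors("/path/to/storm/logs") returns a dict keyed
by log file name (the path is a placeholder); the matching lines are the
error messages worth searching for.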

Some will say those problems went away with the latest release.
Apparently, it is complicated.

My best advice is to avoid the resource contention issues (failed heartbeats)
and unhandled exceptions (the worker commits suicide) that result in workers
dying or being killed.  Either can lead to a state that I call the fubar
worker loop.  I haven't figured it out, but basically: a worker starts up,
the supervisor is looking for the wrong worker, decides that worker is dead
and cleans up, and the cleanup means the worker can no longer see its
stormconf.ser, so it commits suicide.  That's one version of events.
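
To make that sequence concrete, here is a toy Python model of it. It is not
Storm's supervisor code: ToyNode, run_loop and the "stale-id" mismatch are
hypothetical stand-ins for whatever actually makes the supervisor look for
the wrong worker.

```python
from dataclasses import dataclass, field

# Toy model of the "fubar worker loop". NOT Storm's implementation; the
# supervisor/worker id mismatch is simply assumed.
@dataclass
class ToyNode:
    conf_files: set = field(default_factory=set)  # ids whose stormconf.ser exists
    heartbeats: set = field(default_factory=set)  # ids that have heartbeated
    log: list = field(default_factory=list)

    def launch(self, worker_id):
        self.conf_files.add(worker_id)  # supervisor writes the worker's conf
        self.heartbeats.add(worker_id)  # the worker gets up and heartbeats
        self.log.append(f"launch {worker_id}")

    def supervisor_check(self, expected_id, launched_id):
        # The supervisor looks for the wrong id, sees no heartbeat for it,
        # and its cleanup removes the files of the worker that IS running.
        if expected_id not in self.heartbeats:
            self.conf_files.discard(launched_id)
            self.log.append(f"{expected_id} presumed dead; cleaned up {launched_id}")

    def worker_reads_conf(self, worker_id):
        # The live worker can no longer find its conf and kills itself.
        if worker_id not in self.conf_files:
            self.heartbeats.discard(worker_id)
            self.log.append(f"{worker_id} cannot find stormconf.ser; suicide")
            return False
        return True


def run_loop(rounds):
    node = ToyNode()
    for i in range(rounds):
        launched = f"worker-{i}"
        node.launch(launched)
        node.supervisor_check(expected_id="stale-id", launched_id=launched)
        node.worker_reads_conf(launched)
    return node.log
```

run_loop(3) produces three launch / presumed dead / suicide cycles: each worker
was alive and heartbeating, but the supervisor was checking the wrong id, so
the loop never settles.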

Good luck.

Thank you for your time!

+++++++++++++++++++++
Jeff Maass <[email protected]>
linkedin.com/in/jeffmaass
stackoverflow.com/users/373418/maassql
+++++++++++++++++++++


On Fri, May 29, 2015 at 12:36 PM, Grant Overby (groverby) <
[email protected]> wrote:

>  Supervisor is reporting that the worker “still hasn’t started” and
> eventually kills and restarts the worker. However, the worker has started
> and is processing tuples. This repeats indefinitely.
>
>  Debugging steps?
>
>
>         *Grant Overby*
> Software Engineer
> Cisco.com <http://www.cisco.com/>
> [email protected]
> Mobile: *865 724 4910*
