We are using Storm 0.9.0.1 with Netty and Trident topologies on a single 
machine (nimbus, supervisor, and drpc all run on the same machine).  The 
supervisor keeps dying and is restarted 7-8 seconds later by Supervisord (the 
service we use to restart the Storm and ZooKeeper processes).  Here is the 
error we see over and over in supervisor.log:

2014-04-15 21:13:13 b.s.event [ERROR] Error when processing event
java.lang.RuntimeException: java.io.EOFException
        at backtype.storm.utils.Utils.deserialize(Utils.java:69) ~[storm-core-0.9.0.1.jar:na]
        at backtype.storm.utils.LocalState.snapshot(LocalState.java:28) ~[storm-core-0.9.0.1.jar:na]
        at backtype.storm.utils.LocalState.get(LocalState.java:39) ~[storm-core-0.9.0.1.jar:na]
        at backtype.storm.daemon.supervisor$sync_processes.invoke(supervisor.clj:187) ~[storm-core-0.9.0.1.jar:na]
        at clojure.lang.AFn.applyToHelper(AFn.java:161) [clojure-1.4.0.jar:na]
        at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.4.0.jar:na]
        at clojure.core$apply.invoke(core.clj:603) ~[clojure-1.4.0.jar:na]
        at clojure.core$partial$fn__4070.doInvoke(core.clj:2343) ~[clojure-1.4.0.jar:na]
        at clojure.lang.RestFn.invoke(RestFn.java:397) ~[clojure-1.4.0.jar:na]
        at backtype.storm.event$event_manager$fn__3072.invoke(event.clj:24) ~[storm-core-0.9.0.1.jar:na]
        at clojure.lang.AFn.run(AFn.java:24) [clojure-1.4.0.jar:na]
        at java.lang.Thread.run(Unknown Source) [na:1.7.0_45]
Caused by: java.io.EOFException: null
        at java.io.ObjectInputStream$PeekInputStream.readFully(Unknown Source) ~[na:1.7.0_45]
        at java.io.ObjectInputStream$BlockDataInputStream.readShort(Unknown Source) ~[na:1.7.0_45]
        at java.io.ObjectInputStream.readStreamHeader(Unknown Source) ~[na:1.7.0_45]
        at java.io.ObjectInputStream.<init>(Unknown Source) ~[na:1.7.0_45]
        at backtype.storm.utils.Utils.deserialize(Utils.java:64) ~[storm-core-0.9.0.1.jar:na]
        ... 11 common frames omitted
2014-04-15 21:13:13 b.s.util [INFO] Halting process: ("Error when processing an event")
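
One guess on our side: the innermost frames in the trace above fail in 
ObjectInputStream.readStreamHeader, which is where Java serialization fails 
when the input contains no bytes at all.  The sketch below reproduces only 
that JDK behavior.  The class and method names are hypothetical, it does not 
use any Storm code, and we have not confirmed that the supervisor's local 
state file was actually empty or truncated.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

// Hypothetical demo class: shows how plain Java deserialization reacts to
// empty input. It is not part of Storm.
class EmptyStateDemo {

    // Returns true if deserializing the given bytes fails with EOFException.
    static boolean failsWithEof(byte[] bytes) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            in.readObject();
            return false;
        } catch (EOFException e) {
            // With zero bytes, the constructor cannot read the stream header.
            return true;
        } catch (IOException | ClassNotFoundException e) {
            return false;
        }
    }

    // Serializes an object with standard Java serialization.
    static byte[] serialize(Object o) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(o);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        // Empty input, as a zero-length file would provide.
        System.out.println(failsWithEof(new byte[0]));
        // A properly serialized object, for comparison.
        System.out.println(failsWithEof(serialize("hello")));
    }
}
```

If that guess is right, a zero-length state file on disk would produce the 
same "Caused by" frames, but we have not verified this against our data 
directory.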

Any ideas why the supervisor might be dying?

Following the recommendation in the post "Supervisor throwing error on start 
up" (https://groups.google.com/forum/#!topic/storm-user/2gapTYTRrX8), we 
stopped the Storm processes and cleared the Storm and ZooKeeper data 
directories.  After we reloaded the topologies, the supervisor ran fine.  
However, we would like to know how to prevent this from happening in a 
production environment.

We are also getting a large number of "Connection refused" errors in the 
Nimbus and Worker logs.  I would expect that if the supervisor can't start up.

Thank you,
Randy
