Hi Josh,

We have had the same issue for a long time, and the only solution we have found is to restart the whole Storm cluster. (I actually asked the same question on 12 May but got no response.)
In the meantime, we are evaluating a switch to Apache Spark for streaming; you might want to have a look at it as well.

On Wed, May 14, 2014 at 11:25 PM, Josh Walton <[email protected]> wrote:

> Recently, we have had a couple of power failures for the servers running
> our zookeeper cluster. When zookeeper dies, the nimbus and supervisor
> processes eventually die as well. After the zookeeper failure, the only way
> I have gotten the supervisor processes to start back up is to delete the
> supervisor and worker directories as specified in the storm.yaml file. Is
> there a better/cleaner way to restart them?
>
> I have also noticed that when I start nimbus and the UI process back up,
> and navigate to the storm status page, the topologies we had started are
> still shown as active (even though they are not).
>
> This is the exception in the supervisor logs when I try to start them up
> after the zookeeper failure:
>
> 2014-05-14 09:16:03 b.s.event [ERROR] Error when processing event
> java.lang.RuntimeException: java.io.EOFException
>     at backtype.storm.utils.Utils.deserialize(Utils.java:69) ~[storm-core-0.9.0-rc3.jar:na]
>     at backtype.storm.utils.LocalState.snapshot(LocalState.java:28) ~[storm-core-0.9.0-rc3.jar:na]
>     at backtype.storm.utils.LocalState.get(LocalState.java:39) ~[storm-core-0.9.0-rc3.jar:na]
>     at backtype.storm.daemon.supervisor$sync_processes.invoke(supervisor.clj:187) ~[storm-core-0.9.0-rc3.jar:na]
>     at clojure.lang.AFn.applyToHelper(AFn.java:161) [clojure-1.4.0.jar:na]
>     at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.4.0.jar:na]
>     at clojure.core$apply.invoke(core.clj:603) ~[clojure-1.4.0.jar:na]
>     at clojure.core$partial$fn__4070.doInvoke(core.clj:2343) ~[clojure-1.4.0.jar:na]
>     at clojure.lang.RestFn.invoke(RestFn.java:397) ~[clojure-1.4.0.jar:na]
>     at backtype.storm.event$event_manager$fn__3070.invoke(event.clj:24) ~[storm-core-0.9.0-rc3.jar:na]
>     at clojure.lang.AFn.run(AFn.java:24) [clojure-1.4.0.jar:na]
>     at java.lang.Thread.run(Thread.java:722) [na:1.7.0_21]
> Caused by: java.io.EOFException: null
>     at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2323) ~[na:1.7.0_21]
>     at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2792) ~[na:1.7.0_21]
>     at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:799) ~[na:1.7.0_21]
>     at java.io.ObjectInputStream.<init>(ObjectInputStream.java:299) ~[na:1.7.0_21]
>     at backtype.storm.utils.Utils.deserialize(Utils.java:64) ~[storm-core-0.9.0-rc3.jar:na]
>     ... 11 common frames omitted
> 2014-05-14 09:16:03 b.s.util [INFO] Halting process: ("Error when
> processing an event")
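
One note on the trace, for what it is worth: the EOFException is thrown from readStreamHeader inside the ObjectInputStream constructor, which means the bytes being deserialized ended before the serialization header could be read. That is consistent with a local state file left empty or truncated by the power failure, though I have not confirmed it on an affected machine. The standalone sketch below reproduces the same exception using only the JDK; the class and method names are made up for illustration and are not part of Storm.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical demo class, not Storm code: shows that opening an
// ObjectInputStream over empty input fails in the constructor.
final class LocalStateEofDemo {

    // Serialize a value with plain Java serialization.
    static byte[] serialize(Serializable value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    // Returns true if the stream header cannot be read because the
    // input ends too early (the case seen in the supervisor log).
    static boolean failsWithEof(byte[] serialized) throws IOException {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(serialized))) {
            return false; // header was read successfully
        } catch (EOFException e) {
            return true;  // input ended before the header was complete
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("empty input: " + failsWithEof(new byte[0]));
        System.out.println("valid input: " + failsWithEof(serialize("state")));
    }
}
```

If that is what happened, checking the supervisor's local directory for zero-length files before deleting everything might narrow down which file is the problem.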
