Hi,

We have been running Storm in production for a few months.

We have started seeing workers crash constantly, with the following error:

2016-11-23 11:56:51.818 o.a.s.util [ERROR] Halting process: ("Worker died")
java.lang.RuntimeException: ("Worker died")
    at org.apache.storm.util$exit_process_BANG_.doInvoke(util.clj:341) [storm-core-1.0.0.jar:1.0.0]
    at clojure.lang.RestFn.invoke(RestFn.java:423) [clojure-1.7.0.jar:?]
    at org.apache.storm.daemon.worker$fn__8831$fn__8832.invoke(worker.clj:762) [storm-core-1.0.0.jar:1.0.0]
    at org.apache.storm.daemon.executor$mk_executor_data$fn__8046$fn__8047.invoke(executor.clj:271) [storm-core-1.0.0.jar:1.0.0]
    at org.apache.storm.util$async_loop$fn__554.invoke(util.clj:494) [storm-core-1.0.0.jar:1.0.0]
    at clojure.lang.AFn.run(AFn.java:22) [clojure-1.7.0.jar:?]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_65]

We suspect that it is caused by the following error:

java.lang.RuntimeException: java.lang.RuntimeException: org.apache.storm.shade.org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /partition_2/145665
    at org.apache.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:448) ~[storm-core-1.0.0.jar:1.0.0]
    at org.apache.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:414) ~[storm-core-1.0.0.jar:1.0.0]
    at org.apache.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:73) ~[storm-core-1.0.0.jar:1.0.0]
    at org.apache.storm.daemon.executor$fn__8226$fn__8239$fn__8292.invoke(executor.clj:851) ~[storm-core-1.0.0.jar:1.0.0]
    at org.apache.storm.util$async_loop$fn__554.invoke(util.clj:484) [storm-core-1.0.0.jar:1.0.0]
    at clojure.lang.AFn.run(AFn.java:22) [clojure-1.7.0.jar:?]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_65]
Caused by: java.lang.RuntimeException: org.apache.storm.shade.org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /partition_2/145665
    at org.apache.storm.trident.topology.state.TransactionalState.setData(TransactionalState.java:119) ~[storm-core-1.0.0.jar:1.0.0]
    at org.apache.storm.trident.topology.state.RotatingTransactionalState.overrideState(RotatingTransactionalState.java:52) ~[storm-core-1.0.0.jar:1.0.0]
    at org.apache.storm.trident.spout.OpaquePartitionedTridentSpoutExecutor$Emitter.commit(OpaquePartitionedTridentSpoutExecutor.java:167) ~[storm-core-1.0.0.jar:1.0.0]
    at org.apache.storm.trident.spout.TridentSpoutExecutor.execute(TridentSpoutExecutor.java:70) ~[storm-core-1.0.0.jar:1.0.0]
    at org.apache.storm.trident.topology.TridentBoltExecutor.execute(TridentBoltExecutor.java:328) ~[storm-core-1.0.0.jar:1.0.0]

This error occurs constantly; workers are currently crashing every few
minutes in our production cluster.

We found the following JIRA issue, which looks similar:

https://issues.apache.org/jira/browse/STORM-1114

However, we do not see the problem in our beta or alpha clusters.

Any help would be highly appreciated.

Uday
