Hi Storm users,

While testing a topology in LocalCluster, I saw the topology fail with the following RuntimeException (sorry for the ugly formatting):

ERROR [Thread-3] (NO_SOURCE_FILE:0) - Error when processing event
java.lang.RuntimeException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /storms/metrics-test-1-1429301826
    at backtype.storm.util$wrap_in_runtime.invoke(util.clj:28)
    at backtype.storm.zookeeper$exists_node_QMARK_$fn__991.invoke(zookeeper.clj:82)
    at backtype.storm.zookeeper$exists_node_QMARK_.invoke(zookeeper.clj:78)
    at backtype.storm.zookeeper$get_data.invoke(zookeeper.clj:104)
    at backtype.storm.cluster$mk_distributed_cluster_state$reify__1696.get_data(cluster.clj:82)
    at backtype.storm.cluster$mk_storm_cluster_state$reify__2115.storm_base(cluster.clj:319)
    at sun.reflect.GeneratedMethodAccessor39.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93)
    at clojure.lang.Reflector.invokeInstanceMethod(Reflector.java:28)
    at backtype.storm.daemon.nimbus$read_topology_details.invoke(nimbus.clj:313)
    at backtype.storm.daemon.nimbus$mk_assignments$iter__5348__5352$fn__5353.invoke(nimbus.clj:639)
    at clojure.lang.LazySeq.sval(LazySeq.java:42)
    at clojure.lang.LazySeq.seq(LazySeq.java:60)
    at clojure.lang.RT.seq(RT.java:473)
    at clojure.core$seq.invoke(core.clj:133)
    at clojure.core.protocols$seq_reduce.invoke(protocols.clj:30)
    at clojure.core.protocols$fn__5875.invoke(protocols.clj:54)
    at clojure.core.protocols$fn__5828$G__5823__5841.invoke(protocols.clj:13)
    at clojure.core$reduce.invoke(core.clj:6030)
    at clojure.core$into.invoke(core.clj:6077)
    at backtype.storm.daemon.nimbus$mk_assignments.doInvoke(nimbus.clj:638)
    at clojure.lang.RestFn.invoke(RestFn.java:410)
    at backtype.storm.daemon.nimbus$fn__5528$exec_fn__1229__auto____5529$fn__5534$fn__5535.invoke(nimbus.clj:895)
    at backtype.storm.daemon.nimbus$fn__5528$exec_fn__1229__auto____5529$fn__5534.invoke(nimbus.clj:894)
    at backtype.storm.timer$schedule_recurring$this__3019.invoke(timer.clj:77)
    at backtype.storm.timer$mk_timer$fn__3002$fn__3003.invoke(timer.clj:33)
    at backtype.storm.timer$mk_timer$fn__3002.invoke(timer.clj:26)
    at clojure.lang.AFn.run(AFn.java:24)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /storms/metrics-test-1-1429301826
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:815)
    at com.netflix.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:149)
    at com.netflix.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:138)
    at com.netflix.curator.RetryLoop.callWithRetry(RetryLoop.java:85)
    at com.netflix.curator.framework.imps.ExistsBuilderImpl.pathInForeground(ExistsBuilderImpl.java:134)
    at com.netflix.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:125)
    at com.netflix.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:34)
    at backtype.storm.zookeeper$exists_node_QMARK_$fn__991.invoke(zookeeper.clj:81)
    ... 29 more

I am worried that this could also happen in our production environment. What should I do to keep the whole topology from failing when the ZooKeeper connection is lost or the session times out?

Thank you.

Best,
Jae
