Hi all,
I have been running Storm 1.1.1 with ZooKeeper 3.4.11 without problems for a
long time.
A few days ago the ZooKeeper service failed, and Storm's connections to it
timed out for about 2 minutes.
As a result, all supervisors halted and the Storm service was down for a long
time.
The supervisor log is below.
How can I make Storm fault tolerant even when a ZooKeeper timeout occurs?
My Storm configuration uses the defaults for everything related to the
ZooKeeper connection.
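For reference, these are the ZooKeeper-related keys I understand to be relevant in storm.yaml. The values shown are what I believe the 1.x defaults to be (I have not overridden any of them), so please treat them as my assumption and correct me if they are wrong. I suspect the retry and timeout settings are the ones to raise, but I am not sure which values would survive a 2-minute outage.

```yaml
# storm.yaml (sketch; values are my understanding of the defaults, not verified)
storm.zookeeper.session.timeout: 20000                 # ms
storm.zookeeper.connection.timeout: 15000              # ms
storm.zookeeper.retry.times: 5                         # retries before giving up
storm.zookeeper.retry.interval: 1000                   # ms, base backoff interval
storm.zookeeper.retry.intervalceiling.millis: 30000    # ms, upper bound on backoff
```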
============================================
2018-05-18 01:11:19.348 o.a.s.u.Utils [ERROR] Halting process: Error when
processing an event
java.lang.RuntimeException: Halting process: Error when processing an event
at org.apache.storm.utils.Utils.exitProcess(Utils.java:1773)
~[storm-core-1.1.1.jar:1.1.1]
at
org.apache.storm.daemon.supervisor.DefaultUncaughtExceptionHandler.uncaughtException(DefaultUncaughtExceptionHandler.java:29)
~[storm-core-1.1.1.jar:1.1.1]
at org.apache.storm.StormTimer$StormTimerTask.run(StormTimer.java:104)
~[storm-core-1.1.1.jar:1.1.1]
2018-05-18 01:11:19.348 o.a.s.e.EventManagerImp [ERROR] {} Error when
processing event
java.lang.RuntimeException: java.lang.RuntimeException:
org.apache.storm.shade.org.apache.zookeeper.KeeperException$ConnectionLossException:
KeeperErrorCode = ConnectionLoss for /assignments
at
org.apache.storm.daemon.supervisor.ReadClusterState.run(ReadClusterState.java:182)
~[storm-core-1.1.1.jar:1.1.1]
at org.apache.storm.event.EventManagerImp$1.run(EventManagerImp.java:54)
~[storm-core-1.1.1.jar:1.1.1]
Caused by: java.lang.RuntimeException:
org.apache.storm.shade.org.apache.zookeeper.KeeperException$ConnectionLossException:
KeeperErrorCode = ConnectionLoss for /assignments
at org.apache.storm.utils.Utils.wrapInRuntime(Utils.java:1531)
~[storm-core-1.1.1.jar:1.1.1]
at org.apache.storm.zookeeper.Zookeeper.getChildren(Zookeeper.java:265)
~[storm-core-1.1.1.jar:1.1.1]
at
org.apache.storm.cluster.ZKStateStorage.get_children(ZKStateStorage.java:174)
~[storm-core-1.1.1.jar:1.1.1]
at
org.apache.storm.cluster.StormClusterStateImpl.assignments(StormClusterStateImpl.java:153)
~[storm-core-1.1.1.jar:1.1.1]
at
org.apache.storm.daemon.supervisor.ReadClusterState.run(ReadClusterState.java:126)
~[storm-core-1.1.1.jar:1.1.1]
... 1 more
Caused by:
org.apache.storm.shade.org.apache.zookeeper.KeeperException$ConnectionLossException:
KeeperErrorCode = ConnectionLoss for /assignments
at
org.apache.storm.shade.org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
~[storm-core-1.1.1.jar:1.1.1]
at
org.apache.storm.shade.org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
~[storm-core-1.1.1.jar:1.1.1]
at
org.apache.storm.shade.org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1590)
~[storm-core-1.1.1.jar:1.1.1]
at
org.apache.storm.shade.org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1625)
~[storm-core-1.1.1.jar:1.1.1]
at
org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:226)
~[storm-core-1.1.1.jar:1.1.1]
at
org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:219)
~[storm-core-1.1.1.jar:1.1.1]
at
org.apache.storm.shade.org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:109)
~[storm-core-1.1.1.jar:1.1.1]
at
org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl.pathInForeground(GetChildrenBuilderImpl.java:216)
~[storm-core-1.1.1.jar:1.1.1]
at
org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:207)
~[storm-core-1.1.1.jar:1.1.1]
at
org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:40)
~[storm-core-1.1.1.jar:1.1.1]
at org.apache.storm.zookeeper.Zookeeper.getChildren(Zookeeper.java:260)
~[storm-core-1.1.1.jar:1.1.1]
at
org.apache.storm.cluster.ZKStateStorage.get_children(ZKStateStorage.java:174)
~[storm-core-1.1.1.jar:1.1.1]
at
org.apache.storm.cluster.StormClusterStateImpl.assignments(StormClusterStateImpl.java:153)
~[storm-core-1.1.1.jar:1.1.1]
at
org.apache.storm.daemon.supervisor.ReadClusterState.run(ReadClusterState.java:126)
~[storm-core-1.1.1.jar:1.1.1]
... 1 more
2018-05-18 01:11:19.348 o.a.s.u.Utils [ERROR] Halting process: Error when
processing an event
java.lang.RuntimeException: Halting process: Error when processing an event
at org.apache.storm.utils.Utils.exitProcess(Utils.java:1773)
~[storm-core-1.1.1.jar:1.1.1]
at org.apache.storm.event.EventManagerImp$1.run(EventManagerImp.java:63)
~[storm-core-1.1.1.jar:1.1.1]
2018-05-18 01:11:19.350 o.a.s.s.o.a.z.ClientCnxn [INFO] Opening socket
connection to server 1.2.3.4/1.2.3.4:10013. Will not attempt to authenticate
using SASL (unknown error)
2018-05-18 01:11:19.351 o.a.s.d.s.Supervisor [INFO] Shutting down supervisor
43e735b5-f39d-493f-bd25-990e85812a8d
=======================================