Hi all,
I have been running Storm 1.1.1 with ZooKeeper 3.4.11 without problems for a
long time.
A few days ago the ZooKeeper service failed, and Storm's connections to it
timed out for about 2 minutes.
As a result, all supervisors halted and the Storm service was down for a long
time.
The supervisor log is below.
How can I make Storm fault tolerant when a ZooKeeper timeout occurs?
My Storm configuration for the ZooKeeper connection is the default.
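
For reference, these are the ZooKeeper client settings that I believe are in
effect, since I have not overridden any of them in storm.yaml. The values are
the ones I understand to be the defaults for 1.1.x, so please treat them as my
assumption and not as something I verified on the cluster:

```yaml
# ZooKeeper client settings (assumed defaults, not overridden in my storm.yaml)
storm.zookeeper.session.timeout: 20000               # ms
storm.zookeeper.connection.timeout: 15000            # ms
storm.zookeeper.retry.times: 5                       # retries before giving up
storm.zookeeper.retry.interval: 1000                 # ms, base sleep between retries
storm.zookeeper.retry.intervalceiling.millis: 30000  # ms, upper bound on retry sleep
```

If these are right, I suspect the retries are exhausted before an outage of
about 2 minutes ends. Would raising storm.zookeeper.retry.times or the retry
intervals be the recommended approach, or is there a better way?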
============================================
2018-05-18 01:11:19.348 o.a.s.u.Utils [ERROR] Halting process: Error when 
processing an event
java.lang.RuntimeException: Halting process: Error when processing an event
       at org.apache.storm.utils.Utils.exitProcess(Utils.java:1773) 
~[storm-core-1.1.1.jar:1.1.1]
       at 
org.apache.storm.daemon.supervisor.DefaultUncaughtExceptionHandler.uncaughtException(DefaultUncaughtExceptionHandler.java:29)
 ~[storm-core-1.1.1.jar:1.1.1]
       at org.apache.storm.StormTimer$StormTimerTask.run(StormTimer.java:104) 
~[storm-core-1.1.1.jar:1.1.1]
2018-05-18 01:11:19.348 o.a.s.e.EventManagerImp [ERROR] {} Error when 
processing event
java.lang.RuntimeException: java.lang.RuntimeException: 
org.apache.storm.shade.org.apache.zookeeper.KeeperException$ConnectionLossException:
 KeeperErrorCode = ConnectionLoss for /assignments
       at 
org.apache.storm.daemon.supervisor.ReadClusterState.run(ReadClusterState.java:182)
 ~[storm-core-1.1.1.jar:1.1.1]
       at org.apache.storm.event.EventManagerImp$1.run(EventManagerImp.java:54) 
~[storm-core-1.1.1.jar:1.1.1]
Caused by: java.lang.RuntimeException: 
org.apache.storm.shade.org.apache.zookeeper.KeeperException$ConnectionLossException:
 KeeperErrorCode = ConnectionLoss for /assignments
       at org.apache.storm.utils.Utils.wrapInRuntime(Utils.java:1531) 
~[storm-core-1.1.1.jar:1.1.1]
       at org.apache.storm.zookeeper.Zookeeper.getChildren(Zookeeper.java:265) 
~[storm-core-1.1.1.jar:1.1.1]
       at 
org.apache.storm.cluster.ZKStateStorage.get_children(ZKStateStorage.java:174) 
~[storm-core-1.1.1.jar:1.1.1]
       at 
org.apache.storm.cluster.StormClusterStateImpl.assignments(StormClusterStateImpl.java:153)
 ~[storm-core-1.1.1.jar:1.1.1]
       at 
org.apache.storm.daemon.supervisor.ReadClusterState.run(ReadClusterState.java:126)
 ~[storm-core-1.1.1.jar:1.1.1]
       ... 1 more
Caused by: 
org.apache.storm.shade.org.apache.zookeeper.KeeperException$ConnectionLossException:
 KeeperErrorCode = ConnectionLoss for /assignments
       at 
org.apache.storm.shade.org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
 ~[storm-core-1.1.1.jar:1.1.1]
       at 
org.apache.storm.shade.org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
 ~[storm-core-1.1.1.jar:1.1.1]
       at 
org.apache.storm.shade.org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1590)
 ~[storm-core-1.1.1.jar:1.1.1]
       at 
org.apache.storm.shade.org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1625)
 ~[storm-core-1.1.1.jar:1.1.1]
       at 
org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:226)
 ~[storm-core-1.1.1.jar:1.1.1]
       at 
org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:219)
 ~[storm-core-1.1.1.jar:1.1.1]
       at 
org.apache.storm.shade.org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:109)
 ~[storm-core-1.1.1.jar:1.1.1]
       at 
org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl.pathInForeground(GetChildrenBuilderImpl.java:216)
 ~[storm-core-1.1.1.jar:1.1.1]
       at 
org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:207)
 ~[storm-core-1.1.1.jar:1.1.1]
       at 
org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:40)
 ~[storm-core-1.1.1.jar:1.1.1]
       at org.apache.storm.zookeeper.Zookeeper.getChildren(Zookeeper.java:260) 
~[storm-core-1.1.1.jar:1.1.1]
       at 
org.apache.storm.cluster.ZKStateStorage.get_children(ZKStateStorage.java:174) 
~[storm-core-1.1.1.jar:1.1.1]
       at 
org.apache.storm.cluster.StormClusterStateImpl.assignments(StormClusterStateImpl.java:153)
 ~[storm-core-1.1.1.jar:1.1.1]
       at 
org.apache.storm.daemon.supervisor.ReadClusterState.run(ReadClusterState.java:126)
 ~[storm-core-1.1.1.jar:1.1.1]
       ... 1 more
2018-05-18 01:11:19.348 o.a.s.u.Utils [ERROR] Halting process: Error when 
processing an event
java.lang.RuntimeException: Halting process: Error when processing an event
       at org.apache.storm.utils.Utils.exitProcess(Utils.java:1773) 
~[storm-core-1.1.1.jar:1.1.1]
       at org.apache.storm.event.EventManagerImp$1.run(EventManagerImp.java:63) 
~[storm-core-1.1.1.jar:1.1.1]
2018-05-18 01:11:19.350 o.a.s.s.o.a.z.ClientCnxn [INFO] Opening socket 
connection to server 1.2.3.4/1.2.3.4:10013. Will not attempt to authenticate 
using SASL (unknown error)
2018-05-18 01:11:19.351 o.a.s.d.s.Supervisor [INFO] Shutting down supervisor 
43e735b5-f39d-493f-bd25-990e85812a8d
=======================================
