Hi, we are using Storm version 0.9.6 with Kafka and Esper
(5 nodes: Nimbus + 5 supervisors, running 3 topologies).
But the Storm workers keep going down and the topologies restart repeatedly.
The worker error looks like this:
2017-05-14 17:11:03,516 [myid:2] - WARN
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of
stream exception
EndOfStreamException: Unable to read additional data from client sessionid
0x25892cdbc1e00d2, likely client has closed socket
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
at java.lang.Thread.run(Thread.java:745)
2017-05-14 17:11:03,517 [myid:2] - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket
connection for client /xxx.xxx.xxx.142:54007 which had sessionid
0x25892cdbc1e00d2
The supervisor log shows this error:
2017-05-14T17:15:42.191+0900 b.s.d.supervisor [INFO] Shutting down and clearing
state for id fa8049b5-7c1d-410e-ae38-61fbc45f4d36. Current supervisor time:
1494749742. State: :timed-out, Heartbeat:
#backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1494749441, :storm-id
"mornitoring-topology-8-1491886218", :executors #{[34 34] [4 4] [39 39] [9 9]
[44 44] [14 14] [49 49] [19 19] [54 54] [24 24] [29 29] [-1 -1]}, :port 6704}
2017-05-14T17:15:42.213+0900 b.s.d.supervisor [INFO] Shutting down
2e27e4ad-5917-43b1-aa7a-4488805508d5:fa8049b5-7c1d-410e-ae38-61fbc45f4d36
2017-05-14T17:15:42.217+0900 b.s.util [INFO] Error when trying to kill 29467.
Process is probably already dead.
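If I am reading that correctly, the last worker heartbeat (:time-secs 1494749441) was 1494749742 - 1494749441 = 301 seconds old when the supervisor checked, which is just over my supervisor.worker.timeout.secs of 300. I assume that is why the state is :timed-out and the supervisor kills and restarts the worker.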
And the worker log shows this error:
2017-05-14T17:14:33.255+0900 b.s.m.n.Client [INFO] closing Netty Client
Netty-Client-storm02/xxx.xxx.xxx.142:6703
2017-05-14T17:14:33.256+0900 b.s.m.n.Client [INFO] waiting up to 600000 ms to
send 0 pending messages to Netty-Client-storm02/xxx.xxx.xxx:6703
2017-05-14T17:14:33.267+0900 STDIO [ERROR] May 14, 2017 5:14:33 PM
org.apache.storm.netty.util.HashedWheelTimer
WARNING: An exception was thrown by TimerTask.
java.lang.RuntimeException: Giving up to scheduleConnect to
Netty-Client-storm02/xxx.xxx.xxx.142:6703 after 302 failed attempts. 865
messages were lost
at backtype.storm.messaging.netty.Client$Connect.run(Client.java:506)
at org.apache.storm.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:546)
at org.apache.storm.netty.util.HashedWheelTimer$Worker.notifyExpiredTimeouts(HashedWheelTimer.java:446)
at org.apache.storm.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:395)
at org.apache.storm.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at java.lang.Thread.run(Thread.java:745)
.....
2017-05-14T17:14:33.655+0900 b.s.m.n.Client [ERROR] discarding 1 messages
because the Netty client to Netty-Client-storm02/xxx.xxx.xxx.142:6703 is being
closed
2017-05-14T17:14:33.755+0900 b.s.m.n.Client [ERROR] discarding 1 messages
because the Netty client to Netty-Client-storm02/xxx.xxx.xxx.142:6703 is being
closed
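For what it's worth, I assume the "302 failed attempts" comes from the Netty retry settings, which I have not overridden. If I remember the 0.9.x defaults correctly (this is an assumption, please correct me if wrong), they are:
storm.messaging.netty.max_retries: 300
storm.messaging.netty.min_wait_ms: 100
storm.messaging.netty.max_wait_ms: 1000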
I can't find any solution for this. How can I fix it?
I have also added my ZooKeeper and Storm configuration below.
ZooKeeper zoo.cfg:
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/jb_log/zkdata/1
clientPort=2181
server.1=storm01:2888:3888
server.2=storm02:2888:3888
server.3=storm03:2888:3888
server.4=storm04:2888:3888
server.5=storm05:2888:3888
autopurge.purgeInterval=1
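(If I understand the ZooKeeper side correctly, with tickTime=2000 the default negotiable session timeout range is 2*tickTime to 20*tickTime, i.e. 4000 ms to 40000 ms, so Storm's default storm.zookeeper.session.timeout of 20000 ms should fall inside that. I mention it because the EndOfStreamException above looks like the worker's session socket being closed rather than a ZooKeeper problem, but I may be wrong.)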
storm.yaml:
storm.zookeeper.servers:
- "storm01"
- "storm02"
- "storm03"
- "storm04"
- "storm05"
storm.zookeeper.port: 2181
zookeeper.multiple.setup:
    follower.port: 2888
    election.port: 3888
nimbus.host: "storm01"
nimbus.task.timeout.secs: 300
nimbus.task.launch.secs: 360
nimbus.supervisor.timeout.secs: 300
supervisor.worker.timeout.secs: 300
ui.port: 80
storm.supervisor.hosts:
- "storm01"
- "storm02"
- "storm03"
- "storm04"
- "storm05"
supervisor.slots.ports:
- 6700
- 6701
- 6702
- 6703
- 6704
storm.local.dir: /jb_log/storm-data
worker.childopts: "-Xms4096m -Xmx4096m -Djava.net.preferIPv4Stack=true"
topology.workers: 5
storm.log.dir: /jb_log/storm-log
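One thing I am considering (not tried yet, just a sketch) is turning on GC logging in worker.childopts to see whether long GC pauses on the 4 GB heap are what makes the workers miss their heartbeats, something like:
worker.childopts: "-Xms4096m -Xmx4096m -Djava.net.preferIPv4Stack=true -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/jb_log/storm-log/gc-worker-%ID%.log"
(I believe %ID% is replaced with the worker port, the GC flags are plain HotSpot options, and the path just reuses my storm.log.dir, but please correct me if that is not how it works.)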