Hi, we are using Storm version 0.9.6 with Kafka and Esper
(5 nodes: Nimbus + 5 supervisors, running 3 topologies).
But the Storm workers keep going down and the topologies restart repeatedly.
The worker error looks like this:
2017-05-14 17:11:03,516 [myid:2] - WARN
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of
stream exception
EndOfStreamException: Unable to read additional data from client sessionid
0x25892cdbc1e00d2, likely client has closed socket
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
at java.lang.Thread.run(Thread.java:745)
2017-05-14 17:11:03,517 [myid:2] - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket
connection for client /xxx.xxx.xxx.142:54007 which had sessionid
0x25892cdbc1e00d2
The supervisor log shows this error:
2017-05-14T17:15:42.191+0900 b.s.d.supervisor [INFO] Shutting down and clearing
state for id fa8049b5-7c1d-410e-ae38-61fbc45f4d36. Current supervisor time:
1494749742. State: :timed-out, Heartbeat:
#backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1494749441, :storm-id
"mornitoring-topology-8-1491886218", :executors #{[34 34] [4 4] [39 39] [9 9]
[44 44] [14 14] [49 49] [19 19] [54 54] [24 24] [29 29] [-1 -1]}, :port 6704}
2017-05-14T17:15:42.213+0900 b.s.d.supervisor [INFO] Shutting down
2e27e4ad-5917-43b1-aa7a-4488805508d5:fa8049b5-7c1d-410e-ae38-61fbc45f4d36
2017-05-14T17:15:42.217+0900 b.s.util [INFO] Error when trying to kill 29467.
Process is probably already dead.
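If I am reading that correctly, the last worker heartbeat (:time-secs 1494749441) was 1494749742 - 1494749441 = 301 seconds old when the supervisor checked, which is just over my supervisor.worker.timeout.secs of 300. I assume that is why the state is :timed-out and the supervisor kills and restarts the worker.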
And the worker log shows this error:
2017-05-14T17:14:33.255+0900 b.s.m.n.Client [INFO] closing Netty Client
Netty-Client-storm02/xxx.xxx.xxx.142:6703
2017-05-14T17:14:33.256+0900 b.s.m.n.Client [INFO] waiting up to 600000 ms to
send 0 pending messages to Netty-Client-storm02/xxx.xxx.xxx:6703
2017-05-14T17:14:33.267+0900 STDIO [ERROR] May 14, 2017 5:14:33 PM
org.apache.storm.netty.util.HashedWheelTimer
WARNING: An exception was thrown by TimerTask.
java.lang.RuntimeException: Giving up to scheduleConnect to
Netty-Client-storm02/xxx.xxx.xxx.142:6703 after 302 failed attempts. 865
messages were lost
at backtype.storm.messaging.netty.Client$Connect.run(Client.java:506)
at org.apache.storm.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:546)
at org.apache.storm.netty.util.HashedWheelTimer$Worker.notifyExpiredTimeouts(HashedWheelTimer.java:446)
at org.apache.storm.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:395)
at org.apache.storm.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at java.lang.Thread.run(Thread.java:745)
.....
2017-05-14T17:14:33.655+0900 b.s.m.n.Client [ERROR] discarding 1 messages
because the Netty client to Netty-Client-storm02/xxx.xxx.xxx.142:6703 is being
closed
2017-05-14T17:14:33.755+0900 b.s.m.n.Client [ERROR] discarding 1 messages
because the Netty client to Netty-Client-storm02/xxx.xxx.xxx.142:6703 is being
closed
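For what it's worth, I assume the "302 failed attempts" comes from the Netty retry settings, which I have not overridden. If I remember the 0.9.x defaults correctly (this is an assumption, please correct me if wrong), they are:
storm.messaging.netty.max_retries: 300
storm.messaging.netty.min_wait_ms: 100
storm.messaging.netty.max_wait_ms: 1000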
I can't find any solution for this. How can I fix it?
I have also added my ZooKeeper and Storm configuration below.
ZooKeeper zoo.cfg:
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/jb_log/zkdata/1
clientPort=2181
server.1=storm01:2888:3888
server.2=storm02:2888:3888
server.3=storm03:2888:3888
server.4=storm04:2888:3888
server.5=storm05:2888:3888
autopurge.purgeInterval=1
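(If I understand the ZooKeeper side correctly, with tickTime=2000 the default negotiable session timeout range is 2*tickTime to 20*tickTime, i.e. 4000 ms to 40000 ms, so Storm's default storm.zookeeper.session.timeout of 20000 ms should fall inside that. I mention it because the EndOfStreamException above looks like the worker's session socket being closed rather than a ZooKeeper problem, but I may be wrong.)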
storm.yaml:
storm.zookeeper.servers:
- "storm01"
- "storm02"
- "storm03"
- "storm04"
- "storm05"
storm.zookeeper.port: 2181
zookeeper.multiple.setup:
    follower.port: 2888
    election.port: 3888
nimbus.host: "storm01"
nimbus.task.timeout.secs: 300
nimbus.task.launch.secs: 360
nimbus.supervisor.timeout.secs: 300
supervisor.worker.timeout.secs: 300
ui.port: 80
storm.supervisor.hosts:
- "storm01"
- "storm02"
- "storm03"
- "storm04"
- "storm05"
supervisor.slots.ports:
- 6700
- 6701
- 6702
- 6703
- 6704
storm.local.dir: /jb_log/storm-data
worker.childopts: "-Xms4096m -Xmx4096m -Djava.net.preferIPv4Stack=true"
topology.workers: 5
storm.log.dir: /jb_log/storm-log
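One thing I am considering (not tried yet, just a sketch) is turning on GC logging in worker.childopts to see whether long GC pauses on the 4 GB heap are what makes the workers miss their heartbeats, something like:
worker.childopts: "-Xms4096m -Xmx4096m -Djava.net.preferIPv4Stack=true -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/jb_log/storm-log/gc-worker-%ID%.log"
(I believe %ID% is replaced with the worker port, the GC flags are plain HotSpot options, and the path just reuses my storm.log.dir, but please correct me if that is not how it works.)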