Hello, Storm users! I'm facing wired storm related problem with acking in topologies deployed in cluster of storm-0.9.0.1. After deploying the topology everything is fine: spout emits messages and gets acks. But if I kill the spout worker process it is restarted and don't receive acks any more. All messages are failed after 'topology.message.timeout.secs' time elapsed. This behaviour can be reproduced in storm cluster with example topology: *https://gist.github.com/isimonenko/9270833 <https://gist.github.com/isimonenko/9270833>* . Spout logs and emits anchored message. Bolt logs and acks it. Problem is not reproduced if topology is deployed in different slots on the same server (on localhost). But if all workers are on different servers (before and after restart) it is reproduced repeatedly.
Here are some properties from storm.yaml which i consider could be important for this issue: nimbus.childopts: "-Djava.net.preferIPv4Stack=true -Xmx1024m" supervisor.childopts: "-Djava.net.preferIPv4Stack=true -Xmx256m" ui.childopts: "-Djava.net.preferIPv4Stack=true -Xmx768m" topology.worker.childopts: "-Djava.net.preferIPv4Stack=true" worker.childopts: "-Djava.net.preferIPv4Stack=true -Xmx2048m -Dworker.id=%ID% -Dnode.home=/usr/local/storm" storm.messaging.transport: "backtype.storm.messaging.netty.Context" storm.messaging.netty.server_worker_threads: 1 storm.messaging.netty.client_worker_threads: 1 storm.messaging.netty.buffer_size: 5242880 storm.messaging.netty.max_retries: 100 storm.messaging.netty.max_wait_ms: 1000 storm.messaging.netty.min_wait_ms: 100 Topology is deployed in cluster of 24 storm nodes. Storm GUI screenshots with statistics and verbose logs before and after restart are given below. *Before restart:* Storm GUI screenshots: topology summary *http://cdn.joxi.ru/uploads/prod/2014/02/27/aa4/19f/7c69aee919dc105bfdfff72f8d95e87a72305a70.jpg <http://cdn.joxi.ru/uploads/prod/2014/02/27/aa4/19f/7c69aee919dc105bfdfff72f8d95e87a72305a70.jpg>* AckSpoutTest summary *http://cdn.joxi.ru/uploads/prod/2014/02/27/2f5/84a/0b42ae155df903852e1b42e787da404bd520df4c.jpg <http://cdn.joxi.ru/uploads/prod/2014/02/27/2f5/84a/0b42ae155df903852e1b42e787da404bd520df4c.jpg>* AckTestBolt summary *http://cdn.joxi.ru/uploads/prod/2014/02/27/76f/6cc/96e52480a7b843452094c76bef32699f4d676b10.jpg <http://cdn.joxi.ru/uploads/prod/2014/02/27/76f/6cc/96e52480a7b843452094c76bef32699f4d676b10.jpg>* __acker summary *http://cdn.joxi.ru/uploads/prod/2014/02/27/bd2/eaf/4a95b17de16afcb185968492b434dc954c50adec.jpg <http://cdn.joxi.ru/uploads/prod/2014/02/27/bd2/eaf/4a95b17de16afcb185968492b434dc954c50adec.jpg>* Logs: Spout: 2014-02-28 16:14:02 c.p.z.t.AckTestTopology [INFO] AckTestSpout emit: = 178-1393589642202 2014-02-28 16:14:02 b.s.d.task [INFO] Emitting: AckTestSpout default [178-1393589642202] 2014-02-28 16:14:02 b.s.d.task [INFO] Emitting: AckTestSpout __ack_init [4258648627535891027 -2569531891572915240 2] 2014-02-28 16:14:03 b.s.d.executor [INFO] Processing received message source: __acker:3, stream: __ack_ack, id: {}, [4258648627535891027] 2014-02-28 16:14:03 b.s.d.executor [INFO] Acking message 178-1393589642202 2014-02-28 16:14:03 c.p.z.t.AckTestTopology [INFO] AckTestSpout received ack for message with id = 178-1393589642202 2014-02-28 16:14:03 c.p.z.t.AckTestTopology [INFO] AckTestSpout emit: = 179-1393589643207 2014-02-28 16:14:03 b.s.d.task [INFO] Emitting: AckTestSpout default [179-1393589643207] 2014-02-28 16:14:03 b.s.d.task [INFO] Emitting: AckTestSpout __ack_init [6102388282704904874 6787936987021307408 2] 2014-02-28 16:14:04 b.s.d.executor [INFO] Processing received message source: __acker:3, stream: __ack_ack, id: {}, [6102388282704904874] 2014-02-28 16:14:04 b.s.d.executor [INFO] Acking message 179-1393589643207 2014-02-28 16:14:04 c.p.z.t.AckTestTopology [INFO] AckTestSpout received ack for message with id = 179-1393589643207 Bolt: 2014-02-28 16:13:52 c.p.z.t.AckTestTopology [INFO] AckTestBolt got msg: 178-1393589642202 2014-02-28 16:13:52 b.s.d.task [INFO] Emitting: AckTestBolt __ack_ack [4258648627535891027 -2569531891572915240] 2014-02-28 16:13:52 c.p.z.t.AckTestTopology [INFO] AckTestBolt acked msg: 178-1393589642202 2014-02-28 16:13:53 b.s.d.executor [INFO] Processing received message source: AckTestSpout:2, stream: default, id: {6102388282704904874=6787936987021307408}, [179-1393589643207] 2014-02-28 16:13:53 c.p.z.t.AckTestTopology [INFO] AckTestBolt got msg: 179-1393589643207 2014-02-28 16:13:53 b.s.d.task [INFO] Emitting: AckTestBolt __ack_ack [6102388282704904874 6787936987021307408] 2014-02-28 16:13:53 c.p.z.t.AckTestTopology [INFO] AckTestBolt acked msg: 179-1393589643207 2014-02-28 16:13:54 b.s.d.executor [INFO] Processing received message source: AckTestSpout:2, stream: default, id: {5171521871336409169=4992888649245426560}, [180-1393589644211] Acker: 2014-02-28 16:14:08 b.s.d.executor [INFO] Processing received message source: AckTestSpout:2, stream: __ack_init, id: {}, [4258648627535891027, -2569531891572915240, 2] 2014-02-28 16:14:08 b.s.d.executor [INFO] Processing received message source: AckTestBolt:1, stream: __ack_ack, id: {}, [4258648627535891027, -2569531891572915240] 2014-02-28 16:14:08 b.s.d.task [INFO] Emitting direct: 2; __acker __ack_ack [4258648627535891027] 2014-02-28 16:14:09 b.s.d.executor [INFO] Processing received message source: AckTestSpout:2, stream: __ack_init, id: {}, [6102388282704904874, 6787936987021307408, 2] 2014-02-28 16:14:09 b.s.d.executor [INFO] Processing received message source: AckTestBolt:1, stream: __ack_ack, id: {}, [6102388282704904874, 6787936987021307408] 2014-02-28 16:14:09 b.s.d.task [INFO] Emitting direct: 2; __acker __ack_ack [6102388282704904874] 2014-02-28 16:14:10 b.s.d.executor [INFO] Processing received message source: AckTestSpout:2, stream: __ack_init, id: {}, [5171521871336409169, 4992888649245426560, 2] 2014-02-28 16:14:10 b.s.d.executor [INFO] Processing received message source: AckTestSpout:2, stream: __ack_init, id: {}, [-8625148924215110757, 420872446365787180, 2] 2014-02-28 16:14:10 b.s.d.executor [INFO] Processing received message source: AckTestBolt:1, stream: __ack_ack, id: {}, [5171521871336409169, 4992888649245426560] 2014-02-28 16:14:10 b.s.d.task [INFO] Emitting direct: 2; __acker __ack_ack [5171521871336409169] Then I kill Spout worker process and wait while it is restarted by supervisor. *After restart:* topology summary *http://cdn.joxi.ru/uploads/prod/2014/02/27/1f6/116/4aa3b863b4aecd8559f43c42af389619ca617746.jpg <http://cdn.joxi.ru/uploads/prod/2014/02/27/1f6/116/4aa3b863b4aecd8559f43c42af389619ca617746.jpg>* Spout summary *http://cdn.joxi.ru/uploads/prod/2014/02/27/80f/343/120a9dab4b9bd4110236501e9ebac0aed379fa93.jpg <http://cdn.joxi.ru/uploads/prod/2014/02/27/80f/343/120a9dab4b9bd4110236501e9ebac0aed379fa93.jpg>* Bolt summary *http://cdn.joxi.ru/uploads/prod/2014/02/27/079/4ef/e7769b8d31331d26ca78fc7ceacddbc6ec4c0440.jpg <http://cdn.joxi.ru/uploads/prod/2014/02/27/079/4ef/e7769b8d31331d26ca78fc7ceacddbc6ec4c0440.jpg>* __acker summary *http://cdn.joxi.ru/uploads/prod/2014/02/27/e2d/a12/58e046f0786bc38cce6a75dc9c93ad669ee3325d.jpg <http://cdn.joxi.ru/uploads/prod/2014/02/27/e2d/a12/58e046f0786bc38cce6a75dc9c93ad669ee3325d.jpg>* Spout: 2014-02-28 16:23:26 c.p.z.t.AckTestTopology [INFO] AckTestSpout emit: = 21-1393590206068 2014-02-28 16:23:26 b.s.d.task [INFO] Emitting: AckTestSpout default [21-1393590206068] 2014-02-28 16:23:26 b.s.d.task [INFO] Emitting: AckTestSpout __ack_init [3347220725071045289 -3829793676544359544 2] 2014-02-28 16:23:27 c.p.z.t.AckTestTopology [INFO] AckTestSpout emit: = 22-1393590207071 2014-02-28 16:23:27 b.s.d.task [INFO] Emitting: AckTestSpout default [22-1393590207071] 2014-02-28 16:23:27 b.s.d.task [INFO] Emitting: AckTestSpout __ack_init [401953823061778910 6589282934733062550 2] 2014-02-28 16:23:28 c.p.z.t.AckTestTopology [INFO] AckTestSpout emit: = 23-1393590208075 2014-02-28 16:23:28 b.s.d.task [INFO] Emitting: AckTestSpout default [23-1393590208075] 2014-02-28 16:23:28 b.s.d.task [INFO] Emitting: AckTestSpout __ack_init [5841692507792529958 -7336350672377944188 2] ..... 2014-02-28 16:23:33 c.p.z.t.AckTestTopology [INFO] AckTestSpout emit: = 30-1393590213096 2014-02-28 16:23:33 b.s.d.task [INFO] Emitting: AckTestSpout default [30-1393590213096] 2014-02-28 16:23:33 b.s.d.task [INFO] Emitting: AckTestSpout __ack_init [-2972038151496619125 3635851016679519459 2] 2014-02-28 16:23:46 b.s.d.executor [INFO] Processing received message source: __system:-1, stream: __tick, id: {}, [20] 2014-02-28 16:24:06 b.s.d.executor [INFO] Processing received message source: __system:-1, stream: __tick, id: {}, [20] 2014-02-28 16:24:06 c.p.z.t.AckTestTopology [INFO] AckTestSpout received fail for message with id = 24-1393590209079 2014-02-28 16:24:06 c.p.z.t.AckTestTopology [INFO] AckTestSpout received fail for message with id = 23-1393590208075 2014-02-28 16:24:06 c.p.z.t.AckTestTopology [INFO] AckTestSpout received fail for message with id = 22-1393590207071 2014-02-28 16:24:06 c.p.z.t.AckTestTopology [INFO] AckTestSpout received fail for message with id = 26-1393590211086 2014-02-28 16:24:06 c.p.z.t.AckTestTopology [INFO] AckTestSpout received fail for message with id = 30-1393590213096 2014-02-28 16:24:06 c.p.z.t.AckTestTopology [INFO] AckTestSpout received fail for message with id = 29-1393590212092 2014-02-28 16:24:06 c.p.z.t.AckTestTopology [INFO] AckTestSpout received fail for message with id = 28-1393590212092 2014-02-28 16:24:06 c.p.z.t.AckTestTopology [INFO] AckTestSpout received fail for message with id = 25-1393590209080 2014-02-28 16:24:06 c.p.z.t.AckTestTopology [INFO] AckTestSpout received fail for message with id = 21-1393590206068 2014-02-28 16:24:06 c.p.z.t.AckTestTopology [INFO] AckTestSpout received fail for message with id = 27-1393590212090 Bolt: 2014-02-28 16:22:45 b.s.d.executor [INFO] Processing received message source: AckTestSpout:2, stream: default, id: {8230692570439655861=-4837331369108321771}, [20-1393590175101] 2014-02-28 16:22:45 c.p.z.t.AckTestTopology [INFO] AckTestBolt got msg: 20-1393590175101 2014-02-28 16:22:45 b.s.d.task [INFO] Emitting: AckTestBolt __ack_ack [8230692570439655861 -4837331369108321771] 2014-02-28 16:22:45 c.p.z.t.AckTestTopology [INFO] AckTestBolt acked msg: 20-1393590175101 2014-02-28 16:23:16 b.s.d.executor [INFO] Processing received message source: AckTestSpout:2, stream: default, id: {3347220725071045289=-3829793676544359544}, [21-1393590206068] 2014-02-28 16:23:16 c.p.z.t.AckTestTopology [INFO] AckTestBolt got msg: 21-1393590206068 2014-02-28 16:23:16 b.s.d.task [INFO] Emitting: AckTestBolt __ack_ack [3347220725071045289 -3829793676544359544] 2014-02-28 16:23:16 c.p.z.t.AckTestTopology [INFO] AckTestBolt acked msg: 21-1393590206068 2014-02-28 16:23:17 b.s.d.executor [INFO] Processing received message source: AckTestSpout:2, stream: default, id: {401953823061778910=6589282934733062550}, [22-1393590207071] 2014-02-28 16:23:17 c.p.z.t.AckTestTopology [INFO] AckTestBolt got msg: 22-1393590207071 2014-02-28 16:23:17 b.s.d.task [INFO] Emitting: AckTestBolt __ack_ack [401953823061778910 6589282934733062550] 2014-02-28 16:23:17 c.p.z.t.AckTestTopology [INFO] AckTestBolt acked msg: 22-1393590207071 2014-02-28 16:23:18 b.s.d.executor [INFO] Processing received message source: AckTestSpout:2, stream: default, id: {5841692507792529958=-7336350672377944188}, [23-1393590208075] 2014-02-28 16:23:18 c.p.z.t.AckTestTopology [INFO] AckTestBolt got msg: 23-1393590208075 2014-02-28 16:23:18 b.s.d.task [INFO] Emitting: AckTestBolt __ack_ack [5841692507792529958 -7336350672377944188] 2014-02-28 16:23:18 c.p.z.t.AckTestTopology [INFO] AckTestBolt acked msg: 23-1393590208075 ..... 2014-02-28 16:23:23 b.s.d.executor [INFO] Processing received message source: AckTestSpout:2, stream: default, id: {-2972038151496619125=3635851016679519459}, [30-1393590213096] 2014-02-28 16:23:23 c.p.z.t.AckTestTopology [INFO] AckTestBolt got msg: 30-1393590213096 2014-02-28 16:23:23 b.s.d.task [INFO] Emitting: AckTestBolt __ack_ack [-2972038151496619125 3635851016679519459] 2014-02-28 16:23:23 c.p.z.t.AckTestTopology [INFO] AckTestBolt acked msg: 30-1393590213096 Acker: 2014-02-28 16:23:01 b.s.d.executor [INFO] Processing received message source: AckTestSpout:2, stream: __ack_init, id: {}, [8230692570439655861, -4837331369108321771, 2] 2014-02-28 16:23:01 b.s.d.executor [INFO] Processing received message source: AckTestBolt:1, stream: __ack_ack, id: {}, [8230692570439655861, -4837331369108321771] 2014-02-28 16:23:01 b.s.d.task [INFO] Emitting direct: 2; __acker __ack_ack [8230692570439655861] 2014-02-28 16:23:32 b.s.d.executor [INFO] Processing received message source: AckTestSpout:2, stream: __ack_init, id: {}, [3347220725071045289, -3829793676544359544, 2] 2014-02-28 16:23:32 b.s.d.executor [INFO] Processing received message source: AckTestBolt:1, stream: __ack_ack, id: {}, [3347220725071045289, -3829793676544359544] 2014-02-28 16:23:32 b.s.d.task [INFO] Emitting direct: 2; __acker __ack_ack [3347220725071045289] 2014-02-28 16:23:33 b.s.d.executor [INFO] Processing received message source: AckTestSpout:2, stream: __ack_init, id: {}, [401953823061778910, 6589282934733062550, 2] 2014-02-28 16:23:33 b.s.d.executor [INFO] Processing received message source: AckTestBolt:1, stream: __ack_ack, id: {}, [401953823061778910, 6589282934733062550] 2014-02-28 16:23:33 b.s.d.task [INFO] Emitting direct: 2; __acker __ack_ack [401953823061778910] 2014-02-28 16:23:34 b.s.d.executor [INFO] Processing received message source: AckTestSpout:2, stream: __ack_init, id: {}, [5841692507792529958, -7336350672377944188, 2] 2014-02-28 16:23:34 b.s.d.executor [INFO] Processing received message source: AckTestBolt:1, stream: __ack_ack, id: {}, [5841692507792529958, -7336350672377944188] 2014-02-28 16:23:34 b.s.d.task [INFO] Emitting direct: 2; __acker __ack_ack [5841692507792529958] ..... 2014-02-28 16:23:39 b.s.d.executor [INFO] Processing received message source: AckTestSpout:2, stream: __ack_init, id: {}, [-2972038151496619125, 3635851016679519459, 2] 2014-02-28 16:23:39 b.s.d.executor [INFO] Processing received message source: AckTestBolt:1, stream: __ack_ack, id: {}, [-2972038151496619125, 3635851016679519459] 2014-02-28 16:23:39 b.s.d.task [INFO] Emitting direct: 2; __acker __ack_ack [-2972038151496619125] Any suggestions are highly appreciated. Ivan Simonenko
