Hello, Storm users!

I'm facing wired storm related problem with acking in topologies deployed
in cluster of storm-0.9.0.1. After deploying the topology everything is
fine: spout emits messages and gets acks. But if I kill the spout worker
process it is restarted and don't receive acks any more. All messages are
failed after 'topology.message.timeout.secs' time elapsed. This behaviour
can be reproduced in storm cluster with example topology:
*https://gist.github.com/isimonenko/9270833
<https://gist.github.com/isimonenko/9270833>* . Spout logs and emits
anchored message. Bolt logs and acks it.  Problem is not reproduced if
topology is deployed in different slots on the same server (on localhost).
But if all workers are on different servers (before and after restart) it
is reproduced repeatedly.

Here are some properties from storm.yaml which i consider could be
important for this issue:

nimbus.childopts: "-Djava.net.preferIPv4Stack=true -Xmx1024m"
supervisor.childopts: "-Djava.net.preferIPv4Stack=true -Xmx256m"
ui.childopts: "-Djava.net.preferIPv4Stack=true -Xmx768m"
topology.worker.childopts: "-Djava.net.preferIPv4Stack=true"
worker.childopts: "-Djava.net.preferIPv4Stack=true -Xmx2048m
-Dworker.id=%ID% -Dnode.home=/usr/local/storm"

storm.messaging.transport: "backtype.storm.messaging.netty.Context"
storm.messaging.netty.server_worker_threads: 1
storm.messaging.netty.client_worker_threads: 1
storm.messaging.netty.buffer_size: 5242880
storm.messaging.netty.max_retries: 100
storm.messaging.netty.max_wait_ms: 1000
storm.messaging.netty.min_wait_ms: 100


Topology is deployed in cluster of 24 storm nodes.

Storm GUI screenshots with statistics and verbose logs  before and after
restart are given below.

*Before restart:*

Storm GUI screenshots:
topology summary
*http://cdn.joxi.ru/uploads/prod/2014/02/27/aa4/19f/7c69aee919dc105bfdfff72f8d95e87a72305a70.jpg
<http://cdn.joxi.ru/uploads/prod/2014/02/27/aa4/19f/7c69aee919dc105bfdfff72f8d95e87a72305a70.jpg>*
AckSpoutTest summary
*http://cdn.joxi.ru/uploads/prod/2014/02/27/2f5/84a/0b42ae155df903852e1b42e787da404bd520df4c.jpg
<http://cdn.joxi.ru/uploads/prod/2014/02/27/2f5/84a/0b42ae155df903852e1b42e787da404bd520df4c.jpg>*
AckTestBolt summary
*http://cdn.joxi.ru/uploads/prod/2014/02/27/76f/6cc/96e52480a7b843452094c76bef32699f4d676b10.jpg
<http://cdn.joxi.ru/uploads/prod/2014/02/27/76f/6cc/96e52480a7b843452094c76bef32699f4d676b10.jpg>*
__acker summary
*http://cdn.joxi.ru/uploads/prod/2014/02/27/bd2/eaf/4a95b17de16afcb185968492b434dc954c50adec.jpg
<http://cdn.joxi.ru/uploads/prod/2014/02/27/bd2/eaf/4a95b17de16afcb185968492b434dc954c50adec.jpg>*


Logs:

Spout:
2014-02-28 16:14:02 c.p.z.t.AckTestTopology [INFO] AckTestSpout emit: =
178-1393589642202
2014-02-28 16:14:02 b.s.d.task [INFO] Emitting: AckTestSpout default
[178-1393589642202]
2014-02-28 16:14:02 b.s.d.task [INFO] Emitting: AckTestSpout __ack_init
[4258648627535891027 -2569531891572915240 2]
2014-02-28 16:14:03 b.s.d.executor [INFO] Processing received message
source: __acker:3, stream: __ack_ack, id: {}, [4258648627535891027]
2014-02-28 16:14:03 b.s.d.executor [INFO] Acking message 178-1393589642202
2014-02-28 16:14:03 c.p.z.t.AckTestTopology [INFO] AckTestSpout received
ack for message with id = 178-1393589642202
2014-02-28 16:14:03 c.p.z.t.AckTestTopology [INFO] AckTestSpout emit: =
179-1393589643207
2014-02-28 16:14:03 b.s.d.task [INFO] Emitting: AckTestSpout default
[179-1393589643207]
2014-02-28 16:14:03 b.s.d.task [INFO] Emitting: AckTestSpout __ack_init
[6102388282704904874 6787936987021307408 2]
2014-02-28 16:14:04 b.s.d.executor [INFO] Processing received message
source: __acker:3, stream: __ack_ack, id: {}, [6102388282704904874]
2014-02-28 16:14:04 b.s.d.executor [INFO] Acking message 179-1393589643207
2014-02-28 16:14:04 c.p.z.t.AckTestTopology [INFO] AckTestSpout received
ack for message with id = 179-1393589643207

Bolt:
2014-02-28 16:13:52 c.p.z.t.AckTestTopology [INFO] AckTestBolt got msg:
178-1393589642202
2014-02-28 16:13:52 b.s.d.task [INFO] Emitting: AckTestBolt __ack_ack
[4258648627535891027 -2569531891572915240]
2014-02-28 16:13:52 c.p.z.t.AckTestTopology [INFO] AckTestBolt acked msg:
178-1393589642202
2014-02-28 16:13:53 b.s.d.executor [INFO] Processing received message
source: AckTestSpout:2, stream: default, id:
{6102388282704904874=6787936987021307408}, [179-1393589643207]
2014-02-28 16:13:53 c.p.z.t.AckTestTopology [INFO] AckTestBolt got msg:
179-1393589643207
2014-02-28 16:13:53 b.s.d.task [INFO] Emitting: AckTestBolt __ack_ack
[6102388282704904874 6787936987021307408]
2014-02-28 16:13:53 c.p.z.t.AckTestTopology [INFO] AckTestBolt acked msg:
179-1393589643207
2014-02-28 16:13:54 b.s.d.executor [INFO] Processing received message
source: AckTestSpout:2, stream: default, id:
{5171521871336409169=4992888649245426560}, [180-1393589644211]

Acker:
2014-02-28 16:14:08 b.s.d.executor [INFO] Processing received message
source: AckTestSpout:2, stream: __ack_init, id: {}, [4258648627535891027,
-2569531891572915240, 2]
2014-02-28 16:14:08 b.s.d.executor [INFO] Processing received message
source: AckTestBolt:1, stream: __ack_ack, id: {}, [4258648627535891027,
-2569531891572915240]
2014-02-28 16:14:08 b.s.d.task [INFO] Emitting direct: 2; __acker __ack_ack
[4258648627535891027]
2014-02-28 16:14:09 b.s.d.executor [INFO] Processing received message
source: AckTestSpout:2, stream: __ack_init, id: {}, [6102388282704904874,
6787936987021307408, 2]
2014-02-28 16:14:09 b.s.d.executor [INFO] Processing received message
source: AckTestBolt:1, stream: __ack_ack, id: {}, [6102388282704904874,
6787936987021307408]
2014-02-28 16:14:09 b.s.d.task [INFO] Emitting direct: 2; __acker __ack_ack
[6102388282704904874]
2014-02-28 16:14:10 b.s.d.executor [INFO] Processing received message
source: AckTestSpout:2, stream: __ack_init, id: {}, [5171521871336409169,
4992888649245426560, 2]
2014-02-28 16:14:10 b.s.d.executor [INFO] Processing received message
source: AckTestSpout:2, stream: __ack_init, id: {}, [-8625148924215110757,
420872446365787180, 2]
2014-02-28 16:14:10 b.s.d.executor [INFO] Processing received message
source: AckTestBolt:1, stream: __ack_ack, id: {}, [5171521871336409169,
4992888649245426560]
2014-02-28 16:14:10 b.s.d.task [INFO] Emitting direct: 2; __acker __ack_ack
[5171521871336409169]



Then I kill Spout worker process and wait while it is restarted by
supervisor.

*After restart:*
topology summary
*http://cdn.joxi.ru/uploads/prod/2014/02/27/1f6/116/4aa3b863b4aecd8559f43c42af389619ca617746.jpg
<http://cdn.joxi.ru/uploads/prod/2014/02/27/1f6/116/4aa3b863b4aecd8559f43c42af389619ca617746.jpg>*
Spout summary 
*http://cdn.joxi.ru/uploads/prod/2014/02/27/80f/343/120a9dab4b9bd4110236501e9ebac0aed379fa93.jpg
<http://cdn.joxi.ru/uploads/prod/2014/02/27/80f/343/120a9dab4b9bd4110236501e9ebac0aed379fa93.jpg>*
Bolt summary 
*http://cdn.joxi.ru/uploads/prod/2014/02/27/079/4ef/e7769b8d31331d26ca78fc7ceacddbc6ec4c0440.jpg
<http://cdn.joxi.ru/uploads/prod/2014/02/27/079/4ef/e7769b8d31331d26ca78fc7ceacddbc6ec4c0440.jpg>*
__acker summary
*http://cdn.joxi.ru/uploads/prod/2014/02/27/e2d/a12/58e046f0786bc38cce6a75dc9c93ad669ee3325d.jpg
<http://cdn.joxi.ru/uploads/prod/2014/02/27/e2d/a12/58e046f0786bc38cce6a75dc9c93ad669ee3325d.jpg>*


Spout:
2014-02-28 16:23:26 c.p.z.t.AckTestTopology [INFO] AckTestSpout emit: =
21-1393590206068
2014-02-28 16:23:26 b.s.d.task [INFO] Emitting: AckTestSpout default
[21-1393590206068]
2014-02-28 16:23:26 b.s.d.task [INFO] Emitting: AckTestSpout __ack_init
[3347220725071045289 -3829793676544359544 2]
2014-02-28 16:23:27 c.p.z.t.AckTestTopology [INFO] AckTestSpout emit: =
22-1393590207071
2014-02-28 16:23:27 b.s.d.task [INFO] Emitting: AckTestSpout default
[22-1393590207071]
2014-02-28 16:23:27 b.s.d.task [INFO] Emitting: AckTestSpout __ack_init
[401953823061778910 6589282934733062550 2]
2014-02-28 16:23:28 c.p.z.t.AckTestTopology [INFO] AckTestSpout emit: =
23-1393590208075
2014-02-28 16:23:28 b.s.d.task [INFO] Emitting: AckTestSpout default
[23-1393590208075]
2014-02-28 16:23:28 b.s.d.task [INFO] Emitting: AckTestSpout __ack_init
[5841692507792529958 -7336350672377944188 2]
.....
2014-02-28 16:23:33 c.p.z.t.AckTestTopology [INFO] AckTestSpout emit: =
30-1393590213096
2014-02-28 16:23:33 b.s.d.task [INFO] Emitting: AckTestSpout default
[30-1393590213096]
2014-02-28 16:23:33 b.s.d.task [INFO] Emitting: AckTestSpout __ack_init
[-2972038151496619125 3635851016679519459 2]
2014-02-28 16:23:46 b.s.d.executor [INFO] Processing received message
source: __system:-1, stream: __tick, id: {}, [20]
2014-02-28 16:24:06 b.s.d.executor [INFO] Processing received message
source: __system:-1, stream: __tick, id: {}, [20]
2014-02-28 16:24:06 c.p.z.t.AckTestTopology [INFO] AckTestSpout received
fail for message with id = 24-1393590209079
2014-02-28 16:24:06 c.p.z.t.AckTestTopology [INFO] AckTestSpout received
fail for message with id = 23-1393590208075
2014-02-28 16:24:06 c.p.z.t.AckTestTopology [INFO] AckTestSpout received
fail for message with id = 22-1393590207071
2014-02-28 16:24:06 c.p.z.t.AckTestTopology [INFO] AckTestSpout received
fail for message with id = 26-1393590211086
2014-02-28 16:24:06 c.p.z.t.AckTestTopology [INFO] AckTestSpout received
fail for message with id = 30-1393590213096
2014-02-28 16:24:06 c.p.z.t.AckTestTopology [INFO] AckTestSpout received
fail for message with id = 29-1393590212092
2014-02-28 16:24:06 c.p.z.t.AckTestTopology [INFO] AckTestSpout received
fail for message with id = 28-1393590212092
2014-02-28 16:24:06 c.p.z.t.AckTestTopology [INFO] AckTestSpout received
fail for message with id = 25-1393590209080
2014-02-28 16:24:06 c.p.z.t.AckTestTopology [INFO] AckTestSpout received
fail for message with id = 21-1393590206068
2014-02-28 16:24:06 c.p.z.t.AckTestTopology [INFO] AckTestSpout received
fail for message with id = 27-1393590212090


Bolt:
2014-02-28 16:22:45 b.s.d.executor [INFO] Processing received message
source: AckTestSpout:2, stream: default, id:
{8230692570439655861=-4837331369108321771}, [20-1393590175101]
2014-02-28 16:22:45 c.p.z.t.AckTestTopology [INFO] AckTestBolt got msg:
20-1393590175101
2014-02-28 16:22:45 b.s.d.task [INFO] Emitting: AckTestBolt __ack_ack
[8230692570439655861 -4837331369108321771]
2014-02-28 16:22:45 c.p.z.t.AckTestTopology [INFO] AckTestBolt acked msg:
20-1393590175101
2014-02-28 16:23:16 b.s.d.executor [INFO] Processing received message
source: AckTestSpout:2, stream: default, id:
{3347220725071045289=-3829793676544359544}, [21-1393590206068]
2014-02-28 16:23:16 c.p.z.t.AckTestTopology [INFO] AckTestBolt got msg:
21-1393590206068
2014-02-28 16:23:16 b.s.d.task [INFO] Emitting: AckTestBolt __ack_ack
[3347220725071045289 -3829793676544359544]
2014-02-28 16:23:16 c.p.z.t.AckTestTopology [INFO] AckTestBolt acked msg:
21-1393590206068
2014-02-28 16:23:17 b.s.d.executor [INFO] Processing received message
source: AckTestSpout:2, stream: default, id:
{401953823061778910=6589282934733062550}, [22-1393590207071]
2014-02-28 16:23:17 c.p.z.t.AckTestTopology [INFO] AckTestBolt got msg:
22-1393590207071
2014-02-28 16:23:17 b.s.d.task [INFO] Emitting: AckTestBolt __ack_ack
[401953823061778910 6589282934733062550]
2014-02-28 16:23:17 c.p.z.t.AckTestTopology [INFO] AckTestBolt acked msg:
22-1393590207071
2014-02-28 16:23:18 b.s.d.executor [INFO] Processing received message
source: AckTestSpout:2, stream: default, id:
{5841692507792529958=-7336350672377944188}, [23-1393590208075]
2014-02-28 16:23:18 c.p.z.t.AckTestTopology [INFO] AckTestBolt got msg:
23-1393590208075
2014-02-28 16:23:18 b.s.d.task [INFO] Emitting: AckTestBolt __ack_ack
[5841692507792529958 -7336350672377944188]
2014-02-28 16:23:18 c.p.z.t.AckTestTopology [INFO] AckTestBolt acked msg:
23-1393590208075
.....
2014-02-28 16:23:23 b.s.d.executor [INFO] Processing received message
source: AckTestSpout:2, stream: default, id:
{-2972038151496619125=3635851016679519459}, [30-1393590213096]
2014-02-28 16:23:23 c.p.z.t.AckTestTopology [INFO] AckTestBolt got msg:
30-1393590213096
2014-02-28 16:23:23 b.s.d.task [INFO] Emitting: AckTestBolt __ack_ack
[-2972038151496619125 3635851016679519459]
2014-02-28 16:23:23 c.p.z.t.AckTestTopology [INFO] AckTestBolt acked msg:
30-1393590213096

Acker:
2014-02-28 16:23:01 b.s.d.executor [INFO] Processing received message
source: AckTestSpout:2, stream: __ack_init, id: {}, [8230692570439655861,
-4837331369108321771, 2]
2014-02-28 16:23:01 b.s.d.executor [INFO] Processing received message
source: AckTestBolt:1, stream: __ack_ack, id: {}, [8230692570439655861,
-4837331369108321771]
2014-02-28 16:23:01 b.s.d.task [INFO] Emitting direct: 2; __acker __ack_ack
[8230692570439655861]
2014-02-28 16:23:32 b.s.d.executor [INFO] Processing received message
source: AckTestSpout:2, stream: __ack_init, id: {}, [3347220725071045289,
-3829793676544359544, 2]
2014-02-28 16:23:32 b.s.d.executor [INFO] Processing received message
source: AckTestBolt:1, stream: __ack_ack, id: {}, [3347220725071045289,
-3829793676544359544]
2014-02-28 16:23:32 b.s.d.task [INFO] Emitting direct: 2; __acker __ack_ack
[3347220725071045289]
2014-02-28 16:23:33 b.s.d.executor [INFO] Processing received message
source: AckTestSpout:2, stream: __ack_init, id: {}, [401953823061778910,
6589282934733062550, 2]
2014-02-28 16:23:33 b.s.d.executor [INFO] Processing received message
source: AckTestBolt:1, stream: __ack_ack, id: {}, [401953823061778910,
6589282934733062550]
2014-02-28 16:23:33 b.s.d.task [INFO] Emitting direct: 2; __acker __ack_ack
[401953823061778910]
2014-02-28 16:23:34 b.s.d.executor [INFO] Processing received message
source: AckTestSpout:2, stream: __ack_init, id: {}, [5841692507792529958,
-7336350672377944188, 2]
2014-02-28 16:23:34 b.s.d.executor [INFO] Processing received message
source: AckTestBolt:1, stream: __ack_ack, id: {}, [5841692507792529958,
-7336350672377944188]
2014-02-28 16:23:34 b.s.d.task [INFO] Emitting direct: 2; __acker __ack_ack
[5841692507792529958]
.....
2014-02-28 16:23:39 b.s.d.executor [INFO] Processing received message
source: AckTestSpout:2, stream: __ack_init, id: {}, [-2972038151496619125,
3635851016679519459, 2]
2014-02-28 16:23:39 b.s.d.executor [INFO] Processing received message
source: AckTestBolt:1, stream: __ack_ack, id: {}, [-2972038151496619125,
3635851016679519459]
2014-02-28 16:23:39 b.s.d.task [INFO] Emitting direct: 2; __acker __ack_ack
[-2972038151496619125]


Any suggestions are highly appreciated.

Ivan Simonenko

Reply via email to