Is anybody able to reproduce this problem?


On Fri, Feb 28, 2014 at 6:47 PM, Ivan Simonenko <[email protected]>wrote:

> Hello, Storm users!
>
> I'm facing wired storm related problem with acking in topologies deployed
> in cluster of storm-0.9.0.1. After deploying the topology everything is
> fine: spout emits messages and gets acks. But if I kill the spout worker
> process it is restarted and don't receive acks any more. All messages are
> failed after 'topology.message.timeout.secs' time elapsed. This behaviour
> can be reproduced in storm cluster with example topology: 
> *https://gist.github.com/isimonenko/9270833
> <https://gist.github.com/isimonenko/9270833>* . Spout logs and emits
> anchored message. Bolt logs and acks it.  Problem is not reproduced if
> topology is deployed in different slots on the same server (on localhost).
> But if all workers are on different servers (before and after restart) it
> is reproduced repeatedly.
>
>
> Here are some properties from storm.yaml which i consider could be
> important for this issue:
>
> nimbus.childopts: "-Djava.net.preferIPv4Stack=true -Xmx1024m"
> supervisor.childopts: "-Djava.net.preferIPv4Stack=true -Xmx256m"
> ui.childopts: "-Djava.net.preferIPv4Stack=true -Xmx768m"
> topology.worker.childopts: "-Djava.net.preferIPv4Stack=true"
> worker.childopts: "-Djava.net.preferIPv4Stack=true -Xmx2048m
> -Dworker.id=%ID% -Dnode.home=/usr/local/storm"
>
> storm.messaging.transport: "backtype.storm.messaging.netty.Context"
> storm.messaging.netty.server_worker_threads: 1
> storm.messaging.netty.client_worker_threads: 1
> storm.messaging.netty.buffer_size: 5242880
> storm.messaging.netty.max_retries: 100
> storm.messaging.netty.max_wait_ms: 1000
> storm.messaging.netty.min_wait_ms: 100
>
>
> Topology is deployed in cluster of 24 storm nodes.
>
> Storm GUI screenshots with statistics and verbose logs  before and after
> restart are given below.
>
> *Before restart:*
>
> Storm GUI screenshots:
>
> topology summary 
> *http://cdn.joxi.ru/uploads/prod/2014/02/27/aa4/19f/7c69aee919dc105bfdfff72f8d95e87a72305a70.jpg
> <http://cdn.joxi.ru/uploads/prod/2014/02/27/aa4/19f/7c69aee919dc105bfdfff72f8d95e87a72305a70.jpg>*
> AckSpoutTest summary 
> *http://cdn.joxi.ru/uploads/prod/2014/02/27/2f5/84a/0b42ae155df903852e1b42e787da404bd520df4c.jpg
> <http://cdn.joxi.ru/uploads/prod/2014/02/27/2f5/84a/0b42ae155df903852e1b42e787da404bd520df4c.jpg>*
> AckTestBolt summary 
> *http://cdn.joxi.ru/uploads/prod/2014/02/27/76f/6cc/96e52480a7b843452094c76bef32699f4d676b10.jpg
> <http://cdn.joxi.ru/uploads/prod/2014/02/27/76f/6cc/96e52480a7b843452094c76bef32699f4d676b10.jpg>*
> __acker summary 
> *http://cdn.joxi.ru/uploads/prod/2014/02/27/bd2/eaf/4a95b17de16afcb185968492b434dc954c50adec.jpg
> <http://cdn.joxi.ru/uploads/prod/2014/02/27/bd2/eaf/4a95b17de16afcb185968492b434dc954c50adec.jpg>*
>
>
> Logs:
>
> Spout:
> 2014-02-28 16:14:02 c.p.z.t.AckTestTopology [INFO] AckTestSpout emit: =
> 178-1393589642202
> 2014-02-28 16:14:02 b.s.d.task [INFO] Emitting: AckTestSpout default
> [178-1393589642202]
> 2014-02-28 16:14:02 b.s.d.task [INFO] Emitting: AckTestSpout __ack_init
> [4258648627535891027 -2569531891572915240 2]
> 2014-02-28 16:14:03 b.s.d.executor [INFO] Processing received message
> source: __acker:3, stream: __ack_ack, id: {}, [4258648627535891027]
> 2014-02-28 16:14:03 b.s.d.executor [INFO] Acking message 178-1393589642202
> 2014-02-28 16:14:03 c.p.z.t.AckTestTopology [INFO] AckTestSpout received
> ack for message with id = 178-1393589642202
> 2014-02-28 16:14:03 c.p.z.t.AckTestTopology [INFO] AckTestSpout emit: =
> 179-1393589643207
> 2014-02-28 16:14:03 b.s.d.task [INFO] Emitting: AckTestSpout default
> [179-1393589643207]
> 2014-02-28 16:14:03 b.s.d.task [INFO] Emitting: AckTestSpout __ack_init
> [6102388282704904874 6787936987021307408 2]
> 2014-02-28 16:14:04 b.s.d.executor [INFO] Processing received message
> source: __acker:3, stream: __ack_ack, id: {}, [6102388282704904874]
> 2014-02-28 16:14:04 b.s.d.executor [INFO] Acking message 179-1393589643207
> 2014-02-28 16:14:04 c.p.z.t.AckTestTopology [INFO] AckTestSpout received
> ack for message with id = 179-1393589643207
>
> Bolt:
> 2014-02-28 16:13:52 c.p.z.t.AckTestTopology [INFO] AckTestBolt got msg:
> 178-1393589642202
> 2014-02-28 16:13:52 b.s.d.task [INFO] Emitting: AckTestBolt __ack_ack
> [4258648627535891027 -2569531891572915240]
> 2014-02-28 16:13:52 c.p.z.t.AckTestTopology [INFO] AckTestBolt acked msg:
> 178-1393589642202
> 2014-02-28 16:13:53 b.s.d.executor [INFO] Processing received message
> source: AckTestSpout:2, stream: default, id:
> {6102388282704904874=6787936987021307408}, [179-1393589643207]
> 2014-02-28 16:13:53 c.p.z.t.AckTestTopology [INFO] AckTestBolt got msg:
> 179-1393589643207
> 2014-02-28 16:13:53 b.s.d.task [INFO] Emitting: AckTestBolt __ack_ack
> [6102388282704904874 6787936987021307408]
> 2014-02-28 16:13:53 c.p.z.t.AckTestTopology [INFO] AckTestBolt acked msg:
> 179-1393589643207
> 2014-02-28 16:13:54 b.s.d.executor [INFO] Processing received message
> source: AckTestSpout:2, stream: default, id:
> {5171521871336409169=4992888649245426560}, [180-1393589644211]
>
> Acker:
> 2014-02-28 16:14:08 b.s.d.executor [INFO] Processing received message
> source: AckTestSpout:2, stream: __ack_init, id: {}, [4258648627535891027,
> -2569531891572915240, 2]
> 2014-02-28 16:14:08 b.s.d.executor [INFO] Processing received message
> source: AckTestBolt:1, stream: __ack_ack, id: {}, [4258648627535891027,
> -2569531891572915240]
> 2014-02-28 16:14:08 b.s.d.task [INFO] Emitting direct: 2; __acker
> __ack_ack [4258648627535891027]
> 2014-02-28 16:14:09 b.s.d.executor [INFO] Processing received message
> source: AckTestSpout:2, stream: __ack_init, id: {}, [6102388282704904874,
> 6787936987021307408, 2]
> 2014-02-28 16:14:09 b.s.d.executor [INFO] Processing received message
> source: AckTestBolt:1, stream: __ack_ack, id: {}, [6102388282704904874,
> 6787936987021307408]
> 2014-02-28 16:14:09 b.s.d.task [INFO] Emitting direct: 2; __acker
> __ack_ack [6102388282704904874]
> 2014-02-28 16:14:10 b.s.d.executor [INFO] Processing received message
> source: AckTestSpout:2, stream: __ack_init, id: {}, [5171521871336409169,
> 4992888649245426560, 2]
> 2014-02-28 16:14:10 b.s.d.executor [INFO] Processing received message
> source: AckTestSpout:2, stream: __ack_init, id: {}, [-8625148924215110757,
> 420872446365787180, 2]
> 2014-02-28 16:14:10 b.s.d.executor [INFO] Processing received message
> source: AckTestBolt:1, stream: __ack_ack, id: {}, [5171521871336409169,
> 4992888649245426560]
> 2014-02-28 16:14:10 b.s.d.task [INFO] Emitting direct: 2; __acker
> __ack_ack [5171521871336409169]
>
>
>
> Then I kill Spout worker process and wait while it is restarted by
> supervisor.
>
> *After restart:*
> topology summary 
> *http://cdn.joxi.ru/uploads/prod/2014/02/27/1f6/116/4aa3b863b4aecd8559f43c42af389619ca617746.jpg
> <http://cdn.joxi.ru/uploads/prod/2014/02/27/1f6/116/4aa3b863b4aecd8559f43c42af389619ca617746.jpg>*
> Spout summary 
> *http://cdn.joxi.ru/uploads/prod/2014/02/27/80f/343/120a9dab4b9bd4110236501e9ebac0aed379fa93.jpg
> <http://cdn.joxi.ru/uploads/prod/2014/02/27/80f/343/120a9dab4b9bd4110236501e9ebac0aed379fa93.jpg>*
> Bolt summary 
> *http://cdn.joxi.ru/uploads/prod/2014/02/27/079/4ef/e7769b8d31331d26ca78fc7ceacddbc6ec4c0440.jpg
> <http://cdn.joxi.ru/uploads/prod/2014/02/27/079/4ef/e7769b8d31331d26ca78fc7ceacddbc6ec4c0440.jpg>*
> __acker summary 
> *http://cdn.joxi.ru/uploads/prod/2014/02/27/e2d/a12/58e046f0786bc38cce6a75dc9c93ad669ee3325d.jpg
> <http://cdn.joxi.ru/uploads/prod/2014/02/27/e2d/a12/58e046f0786bc38cce6a75dc9c93ad669ee3325d.jpg>*
>
>
> Spout:
> 2014-02-28 16:23:26 c.p.z.t.AckTestTopology [INFO] AckTestSpout emit: =
> 21-1393590206068
> 2014-02-28 16:23:26 b.s.d.task [INFO] Emitting: AckTestSpout default
> [21-1393590206068]
> 2014-02-28 16:23:26 b.s.d.task [INFO] Emitting: AckTestSpout __ack_init
> [3347220725071045289 -3829793676544359544 2]
> 2014-02-28 16:23:27 c.p.z.t.AckTestTopology [INFO] AckTestSpout emit: =
> 22-1393590207071
> 2014-02-28 16:23:27 b.s.d.task [INFO] Emitting: AckTestSpout default
> [22-1393590207071]
> 2014-02-28 16:23:27 b.s.d.task [INFO] Emitting: AckTestSpout __ack_init
> [401953823061778910 6589282934733062550 2]
> 2014-02-28 16:23:28 c.p.z.t.AckTestTopology [INFO] AckTestSpout emit: =
> 23-1393590208075
> 2014-02-28 16:23:28 b.s.d.task [INFO] Emitting: AckTestSpout default
> [23-1393590208075]
> 2014-02-28 16:23:28 b.s.d.task [INFO] Emitting: AckTestSpout __ack_init
> [5841692507792529958 -7336350672377944188 2]
> .....
> 2014-02-28 16:23:33 c.p.z.t.AckTestTopology [INFO] AckTestSpout emit: =
> 30-1393590213096
> 2014-02-28 16:23:33 b.s.d.task [INFO] Emitting: AckTestSpout default
> [30-1393590213096]
> 2014-02-28 16:23:33 b.s.d.task [INFO] Emitting: AckTestSpout __ack_init
> [-2972038151496619125 3635851016679519459 2]
> 2014-02-28 16:23:46 b.s.d.executor [INFO] Processing received message
> source: __system:-1, stream: __tick, id: {}, [20]
> 2014-02-28 16:24:06 b.s.d.executor [INFO] Processing received message
> source: __system:-1, stream: __tick, id: {}, [20]
> 2014-02-28 16:24:06 c.p.z.t.AckTestTopology [INFO] AckTestSpout received
> fail for message with id = 24-1393590209079
> 2014-02-28 16:24:06 c.p.z.t.AckTestTopology [INFO] AckTestSpout received
> fail for message with id = 23-1393590208075
> 2014-02-28 16:24:06 c.p.z.t.AckTestTopology [INFO] AckTestSpout received
> fail for message with id = 22-1393590207071
> 2014-02-28 16:24:06 c.p.z.t.AckTestTopology [INFO] AckTestSpout received
> fail for message with id = 26-1393590211086
> 2014-02-28 16:24:06 c.p.z.t.AckTestTopology [INFO] AckTestSpout received
> fail for message with id = 30-1393590213096
> 2014-02-28 16:24:06 c.p.z.t.AckTestTopology [INFO] AckTestSpout received
> fail for message with id = 29-1393590212092
> 2014-02-28 16:24:06 c.p.z.t.AckTestTopology [INFO] AckTestSpout received
> fail for message with id = 28-1393590212092
> 2014-02-28 16:24:06 c.p.z.t.AckTestTopology [INFO] AckTestSpout received
> fail for message with id = 25-1393590209080
> 2014-02-28 16:24:06 c.p.z.t.AckTestTopology [INFO] AckTestSpout received
> fail for message with id = 21-1393590206068
> 2014-02-28 16:24:06 c.p.z.t.AckTestTopology [INFO] AckTestSpout received
> fail for message with id = 27-1393590212090
>
>
> Bolt:
> 2014-02-28 16:22:45 b.s.d.executor [INFO] Processing received message
> source: AckTestSpout:2, stream: default, id:
> {8230692570439655861=-4837331369108321771}, [20-1393590175101]
> 2014-02-28 16:22:45 c.p.z.t.AckTestTopology [INFO] AckTestBolt got msg:
> 20-1393590175101
> 2014-02-28 16:22:45 b.s.d.task [INFO] Emitting: AckTestBolt __ack_ack
> [8230692570439655861 -4837331369108321771]
> 2014-02-28 16:22:45 c.p.z.t.AckTestTopology [INFO] AckTestBolt acked msg:
> 20-1393590175101
> 2014-02-28 16:23:16 b.s.d.executor [INFO] Processing received message
> source: AckTestSpout:2, stream: default, id:
> {3347220725071045289=-3829793676544359544}, [21-1393590206068]
> 2014-02-28 16:23:16 c.p.z.t.AckTestTopology [INFO] AckTestBolt got msg:
> 21-1393590206068
> 2014-02-28 16:23:16 b.s.d.task [INFO] Emitting: AckTestBolt __ack_ack
> [3347220725071045289 -3829793676544359544]
> 2014-02-28 16:23:16 c.p.z.t.AckTestTopology [INFO] AckTestBolt acked msg:
> 21-1393590206068
> 2014-02-28 16:23:17 b.s.d.executor [INFO] Processing received message
> source: AckTestSpout:2, stream: default, id:
> {401953823061778910=6589282934733062550}, [22-1393590207071]
> 2014-02-28 16:23:17 c.p.z.t.AckTestTopology [INFO] AckTestBolt got msg:
> 22-1393590207071
> 2014-02-28 16:23:17 b.s.d.task [INFO] Emitting: AckTestBolt __ack_ack
> [401953823061778910 6589282934733062550]
> 2014-02-28 16:23:17 c.p.z.t.AckTestTopology [INFO] AckTestBolt acked msg:
> 22-1393590207071
> 2014-02-28 16:23:18 b.s.d.executor [INFO] Processing received message
> source: AckTestSpout:2, stream: default, id:
> {5841692507792529958=-7336350672377944188}, [23-1393590208075]
> 2014-02-28 16:23:18 c.p.z.t.AckTestTopology [INFO] AckTestBolt got msg:
> 23-1393590208075
> 2014-02-28 16:23:18 b.s.d.task [INFO] Emitting: AckTestBolt __ack_ack
> [5841692507792529958 -7336350672377944188]
> 2014-02-28 16:23:18 c.p.z.t.AckTestTopology [INFO] AckTestBolt acked msg:
> 23-1393590208075
> .....
> 2014-02-28 16:23:23 b.s.d.executor [INFO] Processing received message
> source: AckTestSpout:2, stream: default, id:
> {-2972038151496619125=3635851016679519459}, [30-1393590213096]
> 2014-02-28 16:23:23 c.p.z.t.AckTestTopology [INFO] AckTestBolt got msg:
> 30-1393590213096
> 2014-02-28 16:23:23 b.s.d.task [INFO] Emitting: AckTestBolt __ack_ack
> [-2972038151496619125 3635851016679519459]
> 2014-02-28 16:23:23 c.p.z.t.AckTestTopology [INFO] AckTestBolt acked msg:
> 30-1393590213096
>
> Acker:
> 2014-02-28 16:23:01 b.s.d.executor [INFO] Processing received message
> source: AckTestSpout:2, stream: __ack_init, id: {}, [8230692570439655861,
> -4837331369108321771, 2]
> 2014-02-28 16:23:01 b.s.d.executor [INFO] Processing received message
> source: AckTestBolt:1, stream: __ack_ack, id: {}, [8230692570439655861,
> -4837331369108321771]
> 2014-02-28 16:23:01 b.s.d.task [INFO] Emitting direct: 2; __acker
> __ack_ack [8230692570439655861]
> 2014-02-28 16:23:32 b.s.d.executor [INFO] Processing received message
> source: AckTestSpout:2, stream: __ack_init, id: {}, [3347220725071045289,
> -3829793676544359544, 2]
> 2014-02-28 16:23:32 b.s.d.executor [INFO] Processing received message
> source: AckTestBolt:1, stream: __ack_ack, id: {}, [3347220725071045289,
> -3829793676544359544]
> 2014-02-28 16:23:32 b.s.d.task [INFO] Emitting direct: 2; __acker
> __ack_ack [3347220725071045289]
> 2014-02-28 16:23:33 b.s.d.executor [INFO] Processing received message
> source: AckTestSpout:2, stream: __ack_init, id: {}, [401953823061778910,
> 6589282934733062550, 2]
> 2014-02-28 16:23:33 b.s.d.executor [INFO] Processing received message
> source: AckTestBolt:1, stream: __ack_ack, id: {}, [401953823061778910,
> 6589282934733062550]
> 2014-02-28 16:23:33 b.s.d.task [INFO] Emitting direct: 2; __acker
> __ack_ack [401953823061778910]
> 2014-02-28 16:23:34 b.s.d.executor [INFO] Processing received message
> source: AckTestSpout:2, stream: __ack_init, id: {}, [5841692507792529958,
> -7336350672377944188, 2]
> 2014-02-28 16:23:34 b.s.d.executor [INFO] Processing received message
> source: AckTestBolt:1, stream: __ack_ack, id: {}, [5841692507792529958,
> -7336350672377944188]
> 2014-02-28 16:23:34 b.s.d.task [INFO] Emitting direct: 2; __acker
> __ack_ack [5841692507792529958]
> .....
> 2014-02-28 16:23:39 b.s.d.executor [INFO] Processing received message
> source: AckTestSpout:2, stream: __ack_init, id: {}, [-2972038151496619125,
> 3635851016679519459, 2]
> 2014-02-28 16:23:39 b.s.d.executor [INFO] Processing received message
> source: AckTestBolt:1, stream: __ack_ack, id: {}, [-2972038151496619125,
> 3635851016679519459]
> 2014-02-28 16:23:39 b.s.d.task [INFO] Emitting direct: 2; __acker
> __ack_ack [-2972038151496619125]
>
>
> Any suggestions are highly appreciated.
>
> Ivan Simonenko
>

Reply via email to