Is anybody able to reproduce this problem?
On Fri, Feb 28, 2014 at 6:47 PM, Ivan Simonenko <[email protected]>wrote: > Hello, Storm users! > > I'm facing wired storm related problem with acking in topologies deployed > in cluster of storm-0.9.0.1. After deploying the topology everything is > fine: spout emits messages and gets acks. But if I kill the spout worker > process it is restarted and don't receive acks any more. All messages are > failed after 'topology.message.timeout.secs' time elapsed. This behaviour > can be reproduced in storm cluster with example topology: > *https://gist.github.com/isimonenko/9270833 > <https://gist.github.com/isimonenko/9270833>* . Spout logs and emits > anchored message. Bolt logs and acks it. Problem is not reproduced if > topology is deployed in different slots on the same server (on localhost). > But if all workers are on different servers (before and after restart) it > is reproduced repeatedly. > > > Here are some properties from storm.yaml which i consider could be > important for this issue: > > nimbus.childopts: "-Djava.net.preferIPv4Stack=true -Xmx1024m" > supervisor.childopts: "-Djava.net.preferIPv4Stack=true -Xmx256m" > ui.childopts: "-Djava.net.preferIPv4Stack=true -Xmx768m" > topology.worker.childopts: "-Djava.net.preferIPv4Stack=true" > worker.childopts: "-Djava.net.preferIPv4Stack=true -Xmx2048m > -Dworker.id=%ID% -Dnode.home=/usr/local/storm" > > storm.messaging.transport: "backtype.storm.messaging.netty.Context" > storm.messaging.netty.server_worker_threads: 1 > storm.messaging.netty.client_worker_threads: 1 > storm.messaging.netty.buffer_size: 5242880 > storm.messaging.netty.max_retries: 100 > storm.messaging.netty.max_wait_ms: 1000 > storm.messaging.netty.min_wait_ms: 100 > > > Topology is deployed in cluster of 24 storm nodes. > > Storm GUI screenshots with statistics and verbose logs before and after > restart are given below. > > *Before restart:* > > Storm GUI screenshots: > > topology summary > *http://cdn.joxi.ru/uploads/prod/2014/02/27/aa4/19f/7c69aee919dc105bfdfff72f8d95e87a72305a70.jpg > <http://cdn.joxi.ru/uploads/prod/2014/02/27/aa4/19f/7c69aee919dc105bfdfff72f8d95e87a72305a70.jpg>* > AckSpoutTest summary > *http://cdn.joxi.ru/uploads/prod/2014/02/27/2f5/84a/0b42ae155df903852e1b42e787da404bd520df4c.jpg > <http://cdn.joxi.ru/uploads/prod/2014/02/27/2f5/84a/0b42ae155df903852e1b42e787da404bd520df4c.jpg>* > AckTestBolt summary > *http://cdn.joxi.ru/uploads/prod/2014/02/27/76f/6cc/96e52480a7b843452094c76bef32699f4d676b10.jpg > <http://cdn.joxi.ru/uploads/prod/2014/02/27/76f/6cc/96e52480a7b843452094c76bef32699f4d676b10.jpg>* > __acker summary > *http://cdn.joxi.ru/uploads/prod/2014/02/27/bd2/eaf/4a95b17de16afcb185968492b434dc954c50adec.jpg > <http://cdn.joxi.ru/uploads/prod/2014/02/27/bd2/eaf/4a95b17de16afcb185968492b434dc954c50adec.jpg>* > > > Logs: > > Spout: > 2014-02-28 16:14:02 c.p.z.t.AckTestTopology [INFO] AckTestSpout emit: = > 178-1393589642202 > 2014-02-28 16:14:02 b.s.d.task [INFO] Emitting: AckTestSpout default > [178-1393589642202] > 2014-02-28 16:14:02 b.s.d.task [INFO] Emitting: AckTestSpout __ack_init > [4258648627535891027 -2569531891572915240 2] > 2014-02-28 16:14:03 b.s.d.executor [INFO] Processing received message > source: __acker:3, stream: __ack_ack, id: {}, [4258648627535891027] > 2014-02-28 16:14:03 b.s.d.executor [INFO] Acking message 178-1393589642202 > 2014-02-28 16:14:03 c.p.z.t.AckTestTopology [INFO] AckTestSpout received > ack for message with id = 178-1393589642202 > 2014-02-28 16:14:03 c.p.z.t.AckTestTopology [INFO] AckTestSpout emit: = > 179-1393589643207 > 2014-02-28 16:14:03 b.s.d.task [INFO] Emitting: AckTestSpout default > [179-1393589643207] > 2014-02-28 16:14:03 b.s.d.task [INFO] Emitting: AckTestSpout __ack_init > [6102388282704904874 6787936987021307408 2] > 2014-02-28 16:14:04 b.s.d.executor [INFO] Processing received message > source: __acker:3, stream: __ack_ack, id: {}, [6102388282704904874] > 2014-02-28 16:14:04 b.s.d.executor [INFO] Acking message 179-1393589643207 > 2014-02-28 16:14:04 c.p.z.t.AckTestTopology [INFO] AckTestSpout received > ack for message with id = 179-1393589643207 > > Bolt: > 2014-02-28 16:13:52 c.p.z.t.AckTestTopology [INFO] AckTestBolt got msg: > 178-1393589642202 > 2014-02-28 16:13:52 b.s.d.task [INFO] Emitting: AckTestBolt __ack_ack > [4258648627535891027 -2569531891572915240] > 2014-02-28 16:13:52 c.p.z.t.AckTestTopology [INFO] AckTestBolt acked msg: > 178-1393589642202 > 2014-02-28 16:13:53 b.s.d.executor [INFO] Processing received message > source: AckTestSpout:2, stream: default, id: > {6102388282704904874=6787936987021307408}, [179-1393589643207] > 2014-02-28 16:13:53 c.p.z.t.AckTestTopology [INFO] AckTestBolt got msg: > 179-1393589643207 > 2014-02-28 16:13:53 b.s.d.task [INFO] Emitting: AckTestBolt __ack_ack > [6102388282704904874 6787936987021307408] > 2014-02-28 16:13:53 c.p.z.t.AckTestTopology [INFO] AckTestBolt acked msg: > 179-1393589643207 > 2014-02-28 16:13:54 b.s.d.executor [INFO] Processing received message > source: AckTestSpout:2, stream: default, id: > {5171521871336409169=4992888649245426560}, [180-1393589644211] > > Acker: > 2014-02-28 16:14:08 b.s.d.executor [INFO] Processing received message > source: AckTestSpout:2, stream: __ack_init, id: {}, [4258648627535891027, > -2569531891572915240, 2] > 2014-02-28 16:14:08 b.s.d.executor [INFO] Processing received message > source: AckTestBolt:1, stream: __ack_ack, id: {}, [4258648627535891027, > -2569531891572915240] > 2014-02-28 16:14:08 b.s.d.task [INFO] Emitting direct: 2; __acker > __ack_ack [4258648627535891027] > 2014-02-28 16:14:09 b.s.d.executor [INFO] Processing received message > source: AckTestSpout:2, stream: __ack_init, id: {}, [6102388282704904874, > 6787936987021307408, 2] > 2014-02-28 16:14:09 b.s.d.executor [INFO] Processing received message > source: AckTestBolt:1, stream: __ack_ack, id: {}, [6102388282704904874, > 6787936987021307408] > 2014-02-28 16:14:09 b.s.d.task [INFO] Emitting direct: 2; __acker > __ack_ack [6102388282704904874] > 2014-02-28 16:14:10 b.s.d.executor [INFO] Processing received message > source: AckTestSpout:2, stream: __ack_init, id: {}, [5171521871336409169, > 4992888649245426560, 2] > 2014-02-28 16:14:10 b.s.d.executor [INFO] Processing received message > source: AckTestSpout:2, stream: __ack_init, id: {}, [-8625148924215110757, > 420872446365787180, 2] > 2014-02-28 16:14:10 b.s.d.executor [INFO] Processing received message > source: AckTestBolt:1, stream: __ack_ack, id: {}, [5171521871336409169, > 4992888649245426560] > 2014-02-28 16:14:10 b.s.d.task [INFO] Emitting direct: 2; __acker > __ack_ack [5171521871336409169] > > > > Then I kill Spout worker process and wait while it is restarted by > supervisor. > > *After restart:* > topology summary > *http://cdn.joxi.ru/uploads/prod/2014/02/27/1f6/116/4aa3b863b4aecd8559f43c42af389619ca617746.jpg > <http://cdn.joxi.ru/uploads/prod/2014/02/27/1f6/116/4aa3b863b4aecd8559f43c42af389619ca617746.jpg>* > Spout summary > *http://cdn.joxi.ru/uploads/prod/2014/02/27/80f/343/120a9dab4b9bd4110236501e9ebac0aed379fa93.jpg > <http://cdn.joxi.ru/uploads/prod/2014/02/27/80f/343/120a9dab4b9bd4110236501e9ebac0aed379fa93.jpg>* > Bolt summary > *http://cdn.joxi.ru/uploads/prod/2014/02/27/079/4ef/e7769b8d31331d26ca78fc7ceacddbc6ec4c0440.jpg > <http://cdn.joxi.ru/uploads/prod/2014/02/27/079/4ef/e7769b8d31331d26ca78fc7ceacddbc6ec4c0440.jpg>* > __acker summary > *http://cdn.joxi.ru/uploads/prod/2014/02/27/e2d/a12/58e046f0786bc38cce6a75dc9c93ad669ee3325d.jpg > <http://cdn.joxi.ru/uploads/prod/2014/02/27/e2d/a12/58e046f0786bc38cce6a75dc9c93ad669ee3325d.jpg>* > > > Spout: > 2014-02-28 16:23:26 c.p.z.t.AckTestTopology [INFO] AckTestSpout emit: = > 21-1393590206068 > 2014-02-28 16:23:26 b.s.d.task [INFO] Emitting: AckTestSpout default > [21-1393590206068] > 2014-02-28 16:23:26 b.s.d.task [INFO] Emitting: AckTestSpout __ack_init > [3347220725071045289 -3829793676544359544 2] > 2014-02-28 16:23:27 c.p.z.t.AckTestTopology [INFO] AckTestSpout emit: = > 22-1393590207071 > 2014-02-28 16:23:27 b.s.d.task [INFO] Emitting: AckTestSpout default > [22-1393590207071] > 2014-02-28 16:23:27 b.s.d.task [INFO] Emitting: AckTestSpout __ack_init > [401953823061778910 6589282934733062550 2] > 2014-02-28 16:23:28 c.p.z.t.AckTestTopology [INFO] AckTestSpout emit: = > 23-1393590208075 > 2014-02-28 16:23:28 b.s.d.task [INFO] Emitting: AckTestSpout default > [23-1393590208075] > 2014-02-28 16:23:28 b.s.d.task [INFO] Emitting: AckTestSpout __ack_init > [5841692507792529958 -7336350672377944188 2] > ..... > 2014-02-28 16:23:33 c.p.z.t.AckTestTopology [INFO] AckTestSpout emit: = > 30-1393590213096 > 2014-02-28 16:23:33 b.s.d.task [INFO] Emitting: AckTestSpout default > [30-1393590213096] > 2014-02-28 16:23:33 b.s.d.task [INFO] Emitting: AckTestSpout __ack_init > [-2972038151496619125 3635851016679519459 2] > 2014-02-28 16:23:46 b.s.d.executor [INFO] Processing received message > source: __system:-1, stream: __tick, id: {}, [20] > 2014-02-28 16:24:06 b.s.d.executor [INFO] Processing received message > source: __system:-1, stream: __tick, id: {}, [20] > 2014-02-28 16:24:06 c.p.z.t.AckTestTopology [INFO] AckTestSpout received > fail for message with id = 24-1393590209079 > 2014-02-28 16:24:06 c.p.z.t.AckTestTopology [INFO] AckTestSpout received > fail for message with id = 23-1393590208075 > 2014-02-28 16:24:06 c.p.z.t.AckTestTopology [INFO] AckTestSpout received > fail for message with id = 22-1393590207071 > 2014-02-28 16:24:06 c.p.z.t.AckTestTopology [INFO] AckTestSpout received > fail for message with id = 26-1393590211086 > 2014-02-28 16:24:06 c.p.z.t.AckTestTopology [INFO] AckTestSpout received > fail for message with id = 30-1393590213096 > 2014-02-28 16:24:06 c.p.z.t.AckTestTopology [INFO] AckTestSpout received > fail for message with id = 29-1393590212092 > 2014-02-28 16:24:06 c.p.z.t.AckTestTopology [INFO] AckTestSpout received > fail for message with id = 28-1393590212092 > 2014-02-28 16:24:06 c.p.z.t.AckTestTopology [INFO] AckTestSpout received > fail for message with id = 25-1393590209080 > 2014-02-28 16:24:06 c.p.z.t.AckTestTopology [INFO] AckTestSpout received > fail for message with id = 21-1393590206068 > 2014-02-28 16:24:06 c.p.z.t.AckTestTopology [INFO] AckTestSpout received > fail for message with id = 27-1393590212090 > > > Bolt: > 2014-02-28 16:22:45 b.s.d.executor [INFO] Processing received message > source: AckTestSpout:2, stream: default, id: > {8230692570439655861=-4837331369108321771}, [20-1393590175101] > 2014-02-28 16:22:45 c.p.z.t.AckTestTopology [INFO] AckTestBolt got msg: > 20-1393590175101 > 2014-02-28 16:22:45 b.s.d.task [INFO] Emitting: AckTestBolt __ack_ack > [8230692570439655861 -4837331369108321771] > 2014-02-28 16:22:45 c.p.z.t.AckTestTopology [INFO] AckTestBolt acked msg: > 20-1393590175101 > 2014-02-28 16:23:16 b.s.d.executor [INFO] Processing received message > source: AckTestSpout:2, stream: default, id: > {3347220725071045289=-3829793676544359544}, [21-1393590206068] > 2014-02-28 16:23:16 c.p.z.t.AckTestTopology [INFO] AckTestBolt got msg: > 21-1393590206068 > 2014-02-28 16:23:16 b.s.d.task [INFO] Emitting: AckTestBolt __ack_ack > [3347220725071045289 -3829793676544359544] > 2014-02-28 16:23:16 c.p.z.t.AckTestTopology [INFO] AckTestBolt acked msg: > 21-1393590206068 > 2014-02-28 16:23:17 b.s.d.executor [INFO] Processing received message > source: AckTestSpout:2, stream: default, id: > {401953823061778910=6589282934733062550}, [22-1393590207071] > 2014-02-28 16:23:17 c.p.z.t.AckTestTopology [INFO] AckTestBolt got msg: > 22-1393590207071 > 2014-02-28 16:23:17 b.s.d.task [INFO] Emitting: AckTestBolt __ack_ack > [401953823061778910 6589282934733062550] > 2014-02-28 16:23:17 c.p.z.t.AckTestTopology [INFO] AckTestBolt acked msg: > 22-1393590207071 > 2014-02-28 16:23:18 b.s.d.executor [INFO] Processing received message > source: AckTestSpout:2, stream: default, id: > {5841692507792529958=-7336350672377944188}, [23-1393590208075] > 2014-02-28 16:23:18 c.p.z.t.AckTestTopology [INFO] AckTestBolt got msg: > 23-1393590208075 > 2014-02-28 16:23:18 b.s.d.task [INFO] Emitting: AckTestBolt __ack_ack > [5841692507792529958 -7336350672377944188] > 2014-02-28 16:23:18 c.p.z.t.AckTestTopology [INFO] AckTestBolt acked msg: > 23-1393590208075 > ..... > 2014-02-28 16:23:23 b.s.d.executor [INFO] Processing received message > source: AckTestSpout:2, stream: default, id: > {-2972038151496619125=3635851016679519459}, [30-1393590213096] > 2014-02-28 16:23:23 c.p.z.t.AckTestTopology [INFO] AckTestBolt got msg: > 30-1393590213096 > 2014-02-28 16:23:23 b.s.d.task [INFO] Emitting: AckTestBolt __ack_ack > [-2972038151496619125 3635851016679519459] > 2014-02-28 16:23:23 c.p.z.t.AckTestTopology [INFO] AckTestBolt acked msg: > 30-1393590213096 > > Acker: > 2014-02-28 16:23:01 b.s.d.executor [INFO] Processing received message > source: AckTestSpout:2, stream: __ack_init, id: {}, [8230692570439655861, > -4837331369108321771, 2] > 2014-02-28 16:23:01 b.s.d.executor [INFO] Processing received message > source: AckTestBolt:1, stream: __ack_ack, id: {}, [8230692570439655861, > -4837331369108321771] > 2014-02-28 16:23:01 b.s.d.task [INFO] Emitting direct: 2; __acker > __ack_ack [8230692570439655861] > 2014-02-28 16:23:32 b.s.d.executor [INFO] Processing received message > source: AckTestSpout:2, stream: __ack_init, id: {}, [3347220725071045289, > -3829793676544359544, 2] > 2014-02-28 16:23:32 b.s.d.executor [INFO] Processing received message > source: AckTestBolt:1, stream: __ack_ack, id: {}, [3347220725071045289, > -3829793676544359544] > 2014-02-28 16:23:32 b.s.d.task [INFO] Emitting direct: 2; __acker > __ack_ack [3347220725071045289] > 2014-02-28 16:23:33 b.s.d.executor [INFO] Processing received message > source: AckTestSpout:2, stream: __ack_init, id: {}, [401953823061778910, > 6589282934733062550, 2] > 2014-02-28 16:23:33 b.s.d.executor [INFO] Processing received message > source: AckTestBolt:1, stream: __ack_ack, id: {}, [401953823061778910, > 6589282934733062550] > 2014-02-28 16:23:33 b.s.d.task [INFO] Emitting direct: 2; __acker > __ack_ack [401953823061778910] > 2014-02-28 16:23:34 b.s.d.executor [INFO] Processing received message > source: AckTestSpout:2, stream: __ack_init, id: {}, [5841692507792529958, > -7336350672377944188, 2] > 2014-02-28 16:23:34 b.s.d.executor [INFO] Processing received message > source: AckTestBolt:1, stream: __ack_ack, id: {}, [5841692507792529958, > -7336350672377944188] > 2014-02-28 16:23:34 b.s.d.task [INFO] Emitting direct: 2; __acker > __ack_ack [5841692507792529958] > ..... > 2014-02-28 16:23:39 b.s.d.executor [INFO] Processing received message > source: AckTestSpout:2, stream: __ack_init, id: {}, [-2972038151496619125, > 3635851016679519459, 2] > 2014-02-28 16:23:39 b.s.d.executor [INFO] Processing received message > source: AckTestBolt:1, stream: __ack_ack, id: {}, [-2972038151496619125, > 3635851016679519459] > 2014-02-28 16:23:39 b.s.d.task [INFO] Emitting direct: 2; __acker > __ack_ack [-2972038151496619125] > > > Any suggestions are highly appreciated. > > Ivan Simonenko >
