Hello,
We have a rather large topology with 1023 bolts, each with parallelism = 1. When I run the topology on a single worker, it starts up in 35 minutes (!!). If I split the topology across two workers, it's 3 minutes. And if it gets split between three workers, the start-up time drops to ~30 seconds. (A minimal sketch of the topology shape is in the P.P.S. below.)

Obviously, a worker that takes this long to start would normally be killed and restarted much sooner, so supervisor.worker.start.timeout.secs, nimbus.supervisor.timeout.secs and nimbus.task.launch.secs have all been raised to a few hours (see the P.S. below for the concrete entries).

The bulk of the time is spent like this (log snippet from the init sequence, spacing mine, bolt names censored):

    2018-07-17_18:18:46.430 o.a.s.d.executor [INFO] Loading executor bolt-123:[664 664]
    2018-07-17_18:18:46.436 o.a.s.d.executor [INFO] Loaded executor tasks bolt-123:[664 664]
    2018-07-17_18:18:46.443 o.a.s.d.executor [INFO] Finished loading executor bolt-123:[664 664]

    2018-07-17_18:18:47.471 o.a.s.d.executor [INFO] Loading executor bolt-abc:[582 582]
    2018-07-17_18:18:47.472 o.a.s.d.executor [INFO] Loaded executor tasks bolt-abc:[582 582]
    2018-07-17_18:18:47.476 o.a.s.d.executor [INFO] Finished loading executor bolt-abc:[582 582]

    2018-07-17_18:18:47.883 o.a.s.d.executor [INFO] Loading executor bolt-xyz:[220 220]
    2018-07-17_18:18:47.885 o.a.s.d.executor [INFO] Loaded executor tasks bolt-xyz:[220 220]
    2018-07-17_18:18:47.893 o.a.s.d.executor [INFO] Finished loading executor bolt-xyz:[220 220]

    2018-07-17_18:18:52.346 o.a.s.d.executor [INFO] Loading executor bolt-789:[783 783]
    2018-07-17_18:18:52.353 o.a.s.d.executor [INFO] Loaded executor tasks bolt-789:[783 783]
    2018-07-17_18:18:52.360 o.a.s.d.executor [INFO] Finished loading executor bolt-789:[783 783]

    2018-07-17_18:18:54.154 o.a.s.d.executor [INFO] Loading executor bolt-def:[898 898]
    2018-07-17_18:18:54.155 o.a.s.d.executor [INFO] Loaded executor tasks bolt-def:[898 898]
    2018-07-17_18:18:54.159 o.a.s.d.executor [INFO] Finished loading executor bolt-def:[898 898]

Please note the _insane_ delays between the individual bolt loads: loading an executor itself takes milliseconds, but the gaps between consecutive executors range from ~0.4 to ~4.5 seconds in this snippet. At roughly two seconds on average, 1023 executors would account for the ~35 minutes almost exactly.

This is reproducible on Storm 1.0.1, 1.0.6 and 1.2.2, with Java 8u141.

uname -a: 3.10.0-514.el7.x86_64 #1 SMP Tue Nov 22 16:42:41 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

worker.childopts: "-Xmx96G" (but during init the heap does not grow beyond 2 GB and there is basically no GC activity)

I understand that Storm is designed for horizontal scaling, but degrading this badly with the number of bolts per worker (quadratically? exponentially?) seems like an oversight. Is there any configuration we could use to improve the situation, e.g. by parallelizing the loading procedure? Should I file a Jira?

Thank you,
Petr Janeček
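P.S. For completeness, these are the storm.yaml overrides we use to keep the slow worker alive long enough to finish starting. The exact values here are illustrative, not our production ones; we simply set them to "a few hours":

    # storm.yaml (values illustrative, 14400 s = 4 h)
    supervisor.worker.start.timeout.secs: 14400   # how long the supervisor waits for a worker to start
    nimbus.supervisor.timeout.secs: 14400
    nimbus.task.launch.secs: 14400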
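P.P.S. In case someone wants to reproduce this: the shape of our topology is roughly equivalent to the minimal sketch below. The spout, the no-op bolt and the groupings are placeholders for illustration (our real bolts do actual work); it is written against the storm-core 1.x API:

    import org.apache.storm.Config;
    import org.apache.storm.StormSubmitter;
    import org.apache.storm.testing.TestWordSpout;
    import org.apache.storm.topology.BasicOutputCollector;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.TopologyBuilder;
    import org.apache.storm.topology.base.BaseBasicBolt;
    import org.apache.storm.tuple.Tuple;

    public class ManyBoltsTopology {

        // Trivial bolt; only the per-executor start-up cost matters here,
        // not the work the bolts do.
        public static class NoOpBolt extends BaseBasicBolt {
            @Override
            public void execute(Tuple input, BasicOutputCollector collector) {
                // intentionally empty
            }

            @Override
            public void declareOutputFields(OutputFieldsDeclarer declarer) {
                // no output streams
            }
        }

        public static void main(String[] args) throws Exception {
            TopologyBuilder builder = new TopologyBuilder();
            builder.setSpout("spout", new TestWordSpout(), 1);

            // 1023 bolts, each with parallelism = 1
            for (int i = 0; i < 1023; i++) {
                builder.setBolt("bolt-" + i, new NoOpBolt(), 1)
                       .shuffleGrouping("spout");
            }

            Config conf = new Config();
            conf.setNumWorkers(1); // 1 worker -> ~35 min start-up; 3 workers -> ~30 s

            StormSubmitter.submitTopology("many-bolts", conf, builder.createTopology());
        }
    }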
