You can rebalance your topology with a proper wait time without killing all the workers manually. When 'kill' or 'rebalance' is issued, the topology is immediately 'deactivated', so spouts stop fetching / emitting tuples. During the wait time, bolts process the tuples that were already emitted by the spouts. If the bolts can process all in-flight tuples, it's a graceful restart. The same applies to 'kill' — a 'graceful stop' in that case.
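For example, the wait time can be passed on the command line with `-w`; the topology name `my-topology` below is a placeholder for your own topology:

```shell
# Rebalance after a 60-second wait, letting bolts drain the
# tuples already emitted before workers are restarted.
storm rebalance my-topology -w 60

# A graceful stop works the same way: deactivate, wait, then kill.
storm kill my-topology -w 60
```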
- Jungtaek Lim (HeartSaVioR)

On Thu, May 26, 2016 at 11:06 PM, Julián Bermejo Ferreiro | BEEVA <[email protected]> wrote:

> Hi Jungtaek,
>
> We are running Storm 0.9.4, but we are planning to migrate to the 1.0.1 version.
>
> We deploy our topologies to move messages inside RabbitMQ brokers.
>
> Certainly, we have tested forcing a worker to die, and once the nimbus timeout elapsed, a new worker appeared on another node, but the system doesn't behave as well as it should. It was necessary to kill some other workers and rebalance a couple of times in order to get everything OK (a constant message flow inside our brokers).
>
> Is it possible to kill all the workers inside a topology and rebalance (a kind of graceful shutdown)? Or once you kill all of them, must you redeploy the whole topology?
>
> Is the 1.0.1 version a possible solution?
>
> Thanks again.
>
> *JULIÁN BERMEJO FERREIRO*
> *Departamento de Tecnología*
> *[email protected]*
> <http://www.beeva.com/>
>
> 2016-05-26 15:34 GMT+02:00 Jungtaek Lim <[email protected]>:
>
>> Hi Julián,
>>
>> Which version of Storm do you use?
>> I remember some Storm 0.9.x versions have issues when workers are failing, so I'd like to know about it.
>>
>> Thanks,
>> Jungtaek Lim (HeartSaVioR)
>>
>> On Thu, May 26, 2016 at 5:53 PM, Julián Bermejo Ferreiro | BEEVA <[email protected]> wrote:
>>
>>> Hello,
>>>
>>> We have a multiple-node Storm cluster running in a production environment. We have had some issues with a couple of machines, which were out of service for a few hours.
>>>
>>> Because some workers of the deployed topologies were running on the failed machines, the cluster's behaviour was unusual (it kept running, but not as it should).
>>>
>>> Once we recovered the failed nodes and rebalanced the topologies, the cluster returned to working properly.
>>> We would like to know if there is any way to alert nimbus when a node falls down, in order to rebalance the affected topologies and create new workers on the healthy nodes of the cluster to replace those that were running on the failed ones.
>>>
>>> This would have helped us so much, because we could have kept consistency in our service in spite of the failed nodes.
>>>
>>> Any advice?
>>>
>>> Thanks in advance!
>>>
>>> *JULIÁN BERMEJO FERREIRO*
>>> *Departamento de Tecnología*
>>> *[email protected]*
>>> <http://www.beeva.com/>
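On the question of alerting Nimbus when a node falls down: Nimbus already detects dead supervisors and workers through heartbeat timeouts and reassigns their tasks to healthy nodes; the timeouts are tunable in storm.yaml. The settings below are a sketch only — the values are illustrative, so verify the defaults for your Storm version:

```yaml
# storm.yaml (illustrative values -- check the defaults for your version)

# How long Nimbus waits without a supervisor heartbeat before
# declaring the node dead and reassigning its workers elsewhere.
nimbus.supervisor.timeout.secs: 60

# How long Nimbus waits without a task (executor) heartbeat before
# rescheduling that task on another node.
nimbus.task.timeout.secs: 30

# How often Nimbus checks heartbeats and reassigns work.
nimbus.monitor.freq.secs: 10
```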
