Hey everyone, we're currently using Storm 0.9.4, and I've seen situations where a small number of bolts or spouts fail (usually five or fewer in a topology of 20 KafkaSpout and 800 bolt executors) and the whole topology rebalances, which restarts all workers. I've seen other situations where the failure of a small number of spouts/bolts results in the same number of spouts/bolts being restarted in new worker instances, with no rebalancing of the entire topology.
Does anyone know what algorithm Storm uses to decide between rebalancing the whole topology and restarting only the workers corresponding to the failed executors? Any insights would be really appreciated. Thanks, John
