[Storm Version: 0.9.2-incubating] Hello,
I am trying to test failover scenarios with my Storm cluster. The details of the cluster are:

* 4 nodes
* Each node with 2 slots
* Topology with around 600 spouts and bolts
* Num. workers for the topology = 4

I am running a test that generates a constant load. The cluster handles this load fairly well: CPU utilization is below 50% on all the nodes, and 1 slot is occupied on each node.

I then bring down one of the nodes (kill the supervisor and the worker processes on that node). After this, another worker is created on one of the remaining nodes, but the CPU utilization jumps to 100%. At this point, nimbus cannot communicate with the supervisor on the node and keeps killing and restarting workers. The CPU utilization remains pegged at 100% for as long as the load is on.

If I stop the test and restart it after a while, the same setup with just 3 nodes works perfectly fine, with lower CPU utilization.

Any pointers on how to figure out the reason for the high CPU utilization during the failover?

Thanks,
Vinay
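
P.S. For reference, here is a rough sketch of how I think the load shifts during the failover. It assumes that "around 600 spouts and bolts" means roughly 600 executors spread evenly over the 4 workers, which I have not verified; the node names and the helper function are made up for illustration.

```python
def executors_per_node(total_executors, num_workers, workers_on_node):
    """Estimate executors hosted per node, assuming an even split across workers."""
    per_worker = total_executors / num_workers
    return {node: per_worker * count for node, count in workers_on_node.items()}


# Before the failover: one worker (one occupied slot) on each of the 4 nodes.
before = executors_per_node(600, 4, {"node1": 1, "node2": 1, "node3": 1, "node4": 1})

# After node4 is killed: its worker is recreated on node1 (hypothetical choice).
after = executors_per_node(600, 4, {"node1": 2, "node2": 1, "node3": 1})

print(before)
print(after)
```

If that assumption holds, the node that picks up the extra worker ends up hosting about twice as many executors as before, which would be consistent with it being the one that saturates.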
