[Storm Version: 0.9.2-incubating]

Hello,

I am trying to test failover scenarios with my Storm cluster. Here are the
details of the cluster:

* 4 nodes
* Each node with 2 slots
* Topology with around 600 spouts and bolts
* Number of workers for the topology: 4

I am running a test that generates a constant load. The cluster handles this
load fairly well, and CPU utilization stays below 50% on all the nodes. One
slot is occupied on each node.

I then bring down one of the nodes (I kill the supervisor and the worker
processes on that node). After this, a replacement worker is started on one of
the remaining nodes, but CPU utilization there jumps to 100%. At this point,
Nimbus cannot communicate with the supervisor on that node and keeps killing
and restarting workers.

CPU utilization stays pegged at 100% for as long as the load is applied. If I
stop the test and restart it after a while, the same setup with just 3 nodes
works perfectly fine, with lower CPU utilization.

Any pointers on how to find the cause of the high CPU utilization during
failover?

Thanks
Vinay
