Hi,
I have a Storm cluster in production.
Recently, CPU usage on the supervisor machines has been hitting 100% during
weekends. This is odd, since our website gets the least traffic on
weekends. The systems hang and the supervisord daemon keeps trying to
restart the Storm daemons. Since all the supervisors are affected, the
topology hangs as well.
Whenever this happens, we lose SSH access to the servers and have to
reboot them so that the memory gets cleaned up.
There are 4 supervisor machines (VMs), each with 8 GB RAM and 4 cores,
and a separate Nimbus machine (8 GB RAM, 4 cores).
There are 12 workers on each node, and we currently have around 15 unused
slots. Normally, CPU usage on these systems is around 50-60 percent, and
only 3-4 GB of the 8 GB of RAM is used.
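
To put the figures above in one place, here is a rough back-of-the-envelope
sketch of the slot arithmetic. It uses only the numbers quoted in this mail,
reads "12 workers in each node" as 12 worker slots per supervisor, and assumes
workers are spread evenly across the nodes (which may not be the case). The
variable names are just for illustration.

```python
# Figures quoted in the description above.
supervisors = 4
slots_per_supervisor = 12   # assumption: "12 workers" means 12 worker slots
unused_slots = 15           # "around 15", so this is approximate
cores_per_supervisor = 4
ram_gb_per_supervisor = 8

total_slots = supervisors * slots_per_supervisor
used_slots = total_slots - unused_slots

# Assumption: workers are spread evenly over the supervisor nodes.
workers_per_node = used_slots / supervisors
workers_per_core = workers_per_node / cores_per_supervisor

# RAM available per slot if every slot on a node were occupied.
ram_gb_per_slot_if_full = ram_gb_per_supervisor / slots_per_supervisor

print(f"total slots: {total_slots}, in use: {used_slots}")
print(f"workers per node: {workers_per_node:.2f}")
print(f"workers per core: {workers_per_core:.2f}")
print(f"RAM per slot if all slots are filled: {ram_gb_per_slot_if_full:.2f} GB")
```

Under those assumptions that is about 33 worker JVMs in use, roughly 8 per
node and about 2 per core.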
What could be happening?
--
Regards,
*Chitra Raveendran*
*Data Scientist*
Mobile: +91 819753660│*Email:* [email protected]
*Flutura Business Solutions Private Limited – “A Decision Sciences &
Analytics Company”*│ #693, 2nd Floor, Geetanjali, 15th Cross, J.P
Nagar 2nd Phase,
Bangalore – 560078│