On 2011-11-14 13:18, Dan Frincu wrote:
> Hi,
>
> On Mon, Nov 14, 2011 at 1:32 PM, ihjaz Mohamed <ihjazmoha...@yahoo.co.in>
> wrote:
>> Hi All,
>> As part of some robustness test for my cluster, I tried killing the corosync
>> process using kill -9 <pid>. After this I see that the pacemakerd service is
>> stopped but the processes crmd, stonithd, lrmd, cib and attrd are still
>> running and are hogging up the cpu.
>
> I have seen this kind of testing before and I have to say I don't
> consider it the recommended way of testing the cluster stack's
> "robustness". Pacemaker processes rely on corosync for proper
> functioning. You kill corosync and then want to "cleanup" the
> processes? You have to go through a lot more literature in order to
> understand how this cluster stack works.

Well I, for my part, don't consider this kind of testing unreasonable at
all. If Corosync dies, say due to a segfault, then the cluster had
better recover to a consistent state. Thus, this (very valid) testing
highlights that the cluster is evidently misconfigured; it's either not
using Pacemaker MCP at all, or doesn't have STONITH configured, or both.

Florian

-- 
Need help with High Availability?
http://www.hastexo.com/now