Hi,

On Mon, Nov 14, 2011 at 1:32 PM, ihjaz Mohamed <ihjazmoha...@yahoo.co.in> wrote:
> Hi All,
> As part of some robustness test for my cluster, I tried killing the corosync
> process using kill -9 <pid>. After this I see that the pacemakerd service is
> stopped but the processes crmd, stonithd, lrmd, cib and attrd are still
> running and are hogging up the cpu.

I have seen this kind of testing before and I have to say I don't consider it
the recommended way of testing the cluster stack's "robustness". Pacemaker
processes rely on corosync for proper functioning. You kill corosync and then
want to "cleanup" the processes? You have to go through a lot more literature
in order to understand how this cluster stack works.

For the Master Control Process, how it works and other related information
(which is related to what you are experiencing), see
http://theclusterguy.clusterlabs.org/post/907043024/introducing-the-pacemaker-master-control-process-for

The essential guide you need is
http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/

HTH,
Dan

>
> top - 06:26:51 up 2:01, 4 users, load average: 12.04, 12.01, 11.98
> Tasks: 330 total, 13 running, 317 sleeping, 0 stopped, 0 zombie
> Cpu(s): 7.1%us, 17.1%sy, 0.0%ni, 75.6%id, 0.1%wa, 0.0%hi, 0.0%si,
> 0.0%st
> Mem: 8015444k total, 4804412k used, 3211032k free, 54800k buffers
> Swap: 10256376k total, 0k used, 10256376k free, 1604464k cached
>
>   PID USER     PR NI  VIRT  RES  SHR S  %CPU %MEM     TIME+ COMMAND
>  2053 hacluste RT  0 90492 3324 2476 R 100.0  0.0 113:40.61 crmd
>  2047 root     RT  0 81480 2108 1712 R  99.8  0.0 113:40.43 stonithd
>  2048 hacluste RT  0 83404 5260 2992 R  99.8  0.1 113:40.90 cib
>  2050 hacluste RT  0 85896 2388 1952 R  99.8  0.0 113:40.43 attrd
>  5018 root     20  0 8787m 345m  56m S   2.0  4.4   0:56.95 java
> 19017 root     20  0 15068 1252  796 R   2.0  0.0   0:00.01 top
>     1 root     20  0 19232 1444 1156 S   0.0  0.0   0:01.71 init
>     2 root     20  0     0    0    0 S   0.0  0.0   0:00.00 kthreadd
>     3 root     RT  0     0    0    0 S   0.0  0.0   0:00.00 migration/0
>     4 root     20  0     0    0    0 S   0.0  0.0   0:00.00 ksoftirqd/0
>
>
> Is there a way to cleanup these processes ? OR Do I need to kill them one by
> one before respawning the corosync?
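
On the narrower question of locating the leftover daemons: a minimal sketch, not a recommended recovery procedure, that pulls the spinning Pacemaker processes out of a `top` snapshot like the one quoted above. The function name `leftover_daemons`, the 90% CPU threshold, and the assumption that `top` is using its default 12-column layout are all mine, not something Pacemaker provides. The daemon names are the ones listed in the report. The sketch only lists PIDs; it does not kill anything, and whether terminating them by hand is safe on a given node is a separate decision.

```python
# Daemon names as listed in the original report; other Pacemaker
# versions may use different names (assumption: adjust as needed).
PACEMAKER_DAEMONS = {"crmd", "stonithd", "lrmd", "cib", "attrd"}

# Process rows copied from the quoted top output.
SAMPLE = """\
 2053 hacluste RT 0 90492 3324 2476 R 100.0 0.0 113:40.61 crmd
 2047 root RT 0 81480 2108 1712 R 99.8 0.0 113:40.43 stonithd
 2048 hacluste RT 0 83404 5260 2992 R 99.8 0.1 113:40.90 cib
 2050 hacluste RT 0 85896 2388 1952 R 99.8 0.0 113:40.43 attrd
 5018 root 20 0 8787m 345m 56m S 2.0 4.4 0:56.95 java
"""


def leftover_daemons(top_output, cpu_threshold=90.0):
    """Return (pid, command, cpu) for Pacemaker daemons at or above cpu_threshold.

    Assumes top's default column order:
    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    """
    found = []
    for line in top_output.splitlines():
        # Drop mail quoting ("> ") and leading padding before splitting.
        fields = line.lstrip("> ").split()
        # Skip summary lines, the header row, and anything too short.
        if len(fields) < 12 or not fields[0].isdigit():
            continue
        try:
            cpu = float(fields[8])
        except ValueError:
            continue
        command = fields[11]
        if command in PACEMAKER_DAEMONS and cpu >= cpu_threshold:
            found.append((int(fields[0]), command, cpu))
    return found


if __name__ == "__main__":
    for pid, command, cpu in leftover_daemons(SAMPLE):
        print(pid, command, cpu)
```

Matching on the exact command name rather than a substring avoids catching unrelated processes whose names merely contain "cib" or "crmd".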
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs:
> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker

--
Dan Frincu
CCNA, RHCE