Hello. Regarding the original issue, the good news is that the resource-agents ocf-shellfuncs no longer causes fork bombs for the dummy OCF RA [0] after the fix [1]. The bad news is that the "self-forking" monitors issue seems to remain for the rabbitmq OCF RA [2], and I can reproduce it with another custom agent [3], so I'd guess it may affect other agents as well.

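To put a number on the "self-forking", here is a rough sketch that counts monitor processes whose parent is itself a monitor of the same agent. The helper names (parse_ps, nested_monitors, snapshot) are made up for illustration, and it assumes a procps-style `ps -eo pid,ppid,args`. Note that an ordinary subshell of the agent script shows up in ps with the same command line, so a hit here is only a hint, not proof of recursion.

```python
import subprocess


def parse_ps(text):
    """Parse `ps -eo pid,ppid,args` output into {pid: (ppid, args)}."""
    procs = {}
    for line in text.strip().splitlines():
        parts = line.split(None, 2)
        # Skip the header row and anything that is not "pid ppid args".
        if len(parts) < 3 or not (parts[0].isdigit() and parts[1].isdigit()):
            continue
        procs[int(parts[0])] = (int(parts[1]), parts[2])
    return procs


def nested_monitors(procs, agent_path):
    """Return sorted pids of monitors whose parent is a monitor of the same agent."""

    def is_monitor(args):
        return agent_path in args and args.rstrip().endswith(" monitor")

    nested = []
    for pid, (ppid, args) in procs.items():
        parent = procs.get(ppid)
        if is_monitor(args) and parent is not None and is_monitor(parent[1]):
            nested.append(pid)
    return sorted(nested)


def snapshot():
    """Take a live process snapshot (assumes procps `ps` is available)."""
    result = subprocess.run(
        ["ps", "-eo", "pid,ppid,args"],
        capture_output=True, text=True, check=True,
    )
    return parse_ps(result.stdout)
```

On a node this could be called as nested_monitors(snapshot(), "/usr/lib/ocf/resource.d/dummy/dummy"), repeated every few seconds to see whether the list keeps growing.
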
IIUC, the issue seems related to how lrmd forks monitor actions. I tried to debug both pacemaker 1.1.10 and 1.1.12 with gdb as follows:

# cat ./cmds
set follow-fork-mode child
set detach-on-fork off
set follow-exec-mode new
catch fork
catch vfork
cont
# gdb -x cmds /usr/lib/pacemaker/lrmd `pgrep lrmd`

I can confirm it catches forked monitors, and that those make nested forks as well. But I have *many* debug symbols missing, the bt is full of question marks and, honestly, I'm not a gdb guru and do not know what to check in the reproduced cases. So any help with how to troubleshoot things further is much appreciated!

[0] https://github.com/bogdando/dummy-ocf-ra
[1] https://github.com/ClusterLabs/resource-agents/issues/734
[2] https://github.com/rabbitmq/rabbitmq-server/blob/master/scripts/rabbitmq-server-ha.ocf
[3] https://git.openstack.org/cgit/openstack/fuel-library/tree/files/fuel-ha-utils/ocf/ns_vrouter

On 04.01.2016 17:33, Bogdan Dobrelya wrote:
> On 04.01.2016 17:14, Dejan Muhamedagic wrote:
>> Hi,
>>
>> On Mon, Jan 04, 2016 at 04:52:43PM +0100, Bogdan Dobrelya wrote:
>>> On 04.01.2016 16:36, Ken Gaillot wrote:
>>>> On 01/04/2016 09:25 AM, Bogdan Dobrelya wrote:
>>>>> On 04.01.2016 15:50, Bogdan Dobrelya wrote:
>> [...]
>>>>> Also note, that lrmd spawns *many* monitors like:
>>>>> root 6495 0.0 0.0 70268 1456 ? Ss 2015 4:56 \_
>>>>> /usr/lib/pacemaker/lrmd
>>>>> root 31815 0.0 0.0 4440 780 ? S 15:08 0:00 | \_
>>>>> /bin/sh /usr/lib/ocf/resource.d/dummy/dummy monitor
>>>>> root 31908 0.0 0.0 4440 388 ? S 15:08 0:00 |
>>>>> \_ /bin/sh /usr/lib/ocf/resource.d/dummy/dummy monitor
>>>>> root 31910 0.0 0.0 4440 384 ? S 15:08 0:00 |
>>>>> \_ /bin/sh /usr/lib/ocf/resource.d/dummy/dummy monitor
>>>>> root 31915 0.0 0.0 4440 392 ? S 15:08 0:00 |
>>>>> \_ /bin/sh /usr/lib/ocf/resource.d/dummy/dummy monitor
>>>>> ...
>>>>
>>>> At first glance, that looks like your monitor action is calling itself
>>>> recursively, but I don't see how in your code.
>>>
>>> Yes, it should be a bug in the ocf-shellfuncs's ocf_log().
>>
>> If you're sure about that, please open an issue at
>> https://github.com/ClusterLabs/resource-agents/issues
>
> Submitted [0]. Thank you!
> Note, that it seems the very import action causes the issue, not the
> ocf_run or ocf_log code itself.
>
> [0] https://github.com/ClusterLabs/resource-agents/issues/734
>
>>
>> Thanks,
>>
>> Dejan
>>
>> _______________________________________________
>> Users mailing list: Users@clusterlabs.org
>> http://clusterlabs.org/mailman/listinfo/users
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>>
>
>

--
Best regards,
Bogdan Dobrelya,
Irc #bogdando