On Tue, 2022-06-14 at 14:36 +0200, Ulrich Windl wrote:
> Hi!
>
> I had a case where a VirtualDomain monitor operation ended in a core
> dump (actually it was pacemaker-execd, but it counted as "monitor"
> operation), and the cluster decided to restart the VM. Wouldn't it be
> worth to retry the monitor operation first?

It counts like any other monitor failure.

> Chances are that a re-tried monitor operation returns a better status
> than segmentation fault.
> Or does the logic just ignore processes dying on signals?
>
> 20201202.ba59be712-150300.4.21.1.x86_64 (SLES15 SP3)
>
> Jun 14 14:09:16 h19 systemd-coredump[28788]: Process 28786
> (pacemaker-execd) of user 0 dumped core.
> Jun 14 14:09:16 h19 pacemaker-execd[7440]: warning:
> prm_xen_v04_monitor_600000[28786] terminated with signal:
> Segmentation fault

This means that the child process forked to execute the resource agent
segfaulted, which is odd. Is the agent a compiled program? If not, it's
possible the tiny amount of pacemaker code that executes the agent is
what segfaulted. Do you have the actual core, and can you do a
backtrace?

> Jun 14 14:09:16 h19 pacemaker-controld[7443]: error: Result of
> monitor operation for prm_xen_v04 on h19: Error
> Jun 14 14:09:16 h19 pacemaker-controld[7443]: notice: Transition 9
> action 107 (prm_xen_v04_monitor_600000 on h19): expected 'ok' but got
> 'error'
> ...
> Jun 14 14:09:16 h19 pacemaker-schedulerd[7442]: notice: *
> Recover prm_xen_v04 ( h19 )
>
> Regards,
> ulrich
>
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
-- 
Ken Gaillot <kgail...@redhat.com>
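
To illustrate the point about a forked child dying on a signal being reported as an ordinary error: the following is a minimal Python sketch, not Pacemaker's actual code. The function name `classify_child` and the crashing stand-in command are hypothetical, and the mapping to 'ok' / 'error' only mirrors the wording in the log excerpt. It assumes a POSIX system, where `subprocess` reports death by signal as a negative return code.

```python
import signal
import subprocess
import sys


def classify_child(argv):
    """Run argv as a child process and map its fate to (status, detail)."""
    proc = subprocess.run(argv)
    if proc.returncode < 0:
        # On POSIX, a negative returncode is the negated number of the
        # signal that terminated the child.
        sig = signal.Signals(-proc.returncode)
        return "error", f"terminated with signal: {sig.name}"
    if proc.returncode == 0:
        return "ok", "exited normally"
    return "error", f"exit status {proc.returncode}"


# Hypothetical stand-in for a crashing agent: the child sends itself SIGSEGV.
CRASHING_CHILD = [
    sys.executable,
    "-c",
    "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)",
]

if __name__ == "__main__":
    print(classify_child(CRASHING_CHILD))
```

In this sketch a signal death and a non-zero exit both end up as "error", so a caller that only looks at the status cannot tell a crash from a failed check. That is the same situation as in the quoted log, where the segfault surfaces as a plain monitor failure.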