Hi Ken, Hi All,

In the pgsql resource agent, crm_mon is executed during demote and stop, and its result is processed.
However, the Pacemaker included in RHEL8.4beta fails to execute this crm_mon.
 - The problem also occurs on github master(c40e18f085fad9ef1d9d79f671ed8a69eb3e753f).

The problem can be easily reproduced as follows.

Step1. Modify the Dummy resource agent so that its stop action executes crm_mon.

----
dummy_stop() {
    mon=$(crm_mon -1)
    ret=$?
    ocf_log info "### YAMAUCHI #### crm_mon[${ret}] : ${mon}"
    dummy_monitor
    if [ $? = $OCF_SUCCESS ]; then
        rm ${OCF_RESKEY_state}
    fi
    return $OCF_SUCCESS
}
----

Step2. Configure a cluster with two nodes.

----
[root@rh84-beta01 ~]# crm_mon -rfA1
Cluster Summary:
  * Stack: corosync
  * Current DC: rh84-beta01 (version 2.0.5-8.el8-ba59be7122) - partition with quorum
  * Last updated: Thu Apr 8 18:00:52 2021
  * Last change: Thu Apr 8 18:00:38 2021 by root via cibadmin on rh84-beta01
  * 2 nodes configured
  * 1 resource instance configured

Node List:
  * Online: [ rh84-beta01 rh84-beta02 ]

Full List of Resources:
  * dummy-1 (ocf::heartbeat:Dummy): Started rh84-beta01

Migration Summary:
----

Step3. Stop the node where the Dummy resource is running. The resource will fail over.

----
[root@rh84-beta02 ~]# crm_mon -rfA1
Cluster Summary:
  * Stack: corosync
  * Current DC: rh84-beta02 (version 2.0.5-8.el8-ba59be7122) - partition with quorum
  * Last updated: Thu Apr 8 18:08:56 2021
  * Last change: Thu Apr 8 18:05:08 2021 by root via cibadmin on rh84-beta01
  * 2 nodes configured
  * 1 resource instance configured

Node List:
  * Online: [ rh84-beta02 ]
  * OFFLINE: [ rh84-beta01 ]

Full List of Resources:
  * dummy-1 (ocf::heartbeat:Dummy): Started rh84-beta02
----

However, the log shows that the crm_mon call in the stop action of the Dummy resource failed.

----
Apr 08 18:05:17 Dummy(dummy-1)[2631]: INFO: ### YAMAUCHI #### crm_mon[102] : Pacemaker daemons shutting down ...
Apr 08 18:05:17 rh84-beta01 pacemaker-execd [2219] (log_op_output) notice: dummy-1_stop_0[2631] error output [ crm_mon: Error: cluster is not available on this node ]
----

Similarly, pgsql also executes crm_mon during demote and stop, so its control fails.

The problem seems to be related to the following fix.
 * Report pacemakerd in state waiting for sbd
  - https://github.com/ClusterLabs/pacemaker/pull/2278

The problem does not occur with the release version of Pacemaker 2.0.5 or with the Pacemaker included in RHEL8.3.

This issue has a huge impact on users. It may also affect the control of other resources that rely on crm_mon.

Please make sure that the release version of RHEL8.4 includes a Pacemaker that does not cause this problem.
 * Distributions other than RHEL may also be affected in future releases.

----
This content is the same as the following Bugzilla.
 - https://bugs.clusterlabs.org/show_bug.cgi?id=5471
----

Best Regards,
Hideo Yamauchi.
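
For anyone who wants to check whether another agent is affected, the logging done in Step1 can be generalized into a small helper. This is only a sketch under assumptions, not a fix and not code from the Dummy or pgsql agents: the function name crm_mon_snapshot and the CRM_MON_CMD override variable are hypothetical, and a real agent would probably log through ocf_log rather than echo. It only uses `crm_mon -1`, as in the reproduction above, and keeps the exit code separate from the output so that a failure such as the exit code 102 seen in the log is not silently treated as valid status text.

```shell
#!/bin/sh
# Hypothetical helper: run crm_mon once and keep its exit code and
# output in separate variables.
# CRM_MON_CMD is an assumed override hook so the helper can be tried
# without a running cluster; it defaults to the real crm_mon.
: "${CRM_MON_CMD:=crm_mon}"

crm_mon_snapshot() {
    # Capture stdout only; crm_mon's stderr is passed through unchanged.
    CRM_MON_OUT=$($CRM_MON_CMD -1)
    CRM_MON_RC=$?
    if [ "$CRM_MON_RC" -ne 0 ]; then
        # Report the failure instead of parsing unusable output.
        echo "crm_mon failed with exit code ${CRM_MON_RC}: ${CRM_MON_OUT}" >&2
        return 1
    fi
    return 0
}
```

A stop or demote action could call crm_mon_snapshot and inspect CRM_MON_RC before parsing CRM_MON_OUT. What the agent should do when the call fails is a design decision for that agent and is not covered here.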