Re: [ClusterLabs] [Problem] In RHEL8.4beta, pgsql resource control fails.

Klaus Wenninger Fri, 09 Apr 2021 07:04:51 -0700

On 4/9/21 3:45 PM, Klaus Wenninger wrote:

On 4/9/21 3:36 PM, Klaus Wenninger wrote:

On 4/9/21 2:37 PM, renayama19661...@ybb.ne.jp wrote:

Hi Klaus,
Thanks for your comment.
Hmm ... is that with selinux enabled?
Respectively do you see any related avc messages?
Selinux is not enabled.
Isn't the crm_mon failure caused by pacemakerd not returning a response while it prepares to stop?

yep ... that doesn't look good.
While in pcmk_shutdown_worker ipc isn't handled.

Stop ... that should actually work as pcmk_shutdown_worker
should exit quite quickly and proceed after mainloop
dispatching when called again.
Don't see anything atm that might be blocking for longer ...
but let me dig into it further ...

Question is why that didn't create issue earlier.
Probably I didn't test with resources that had crm_mon in
their stop/monitor-actions but sbd should have run into
issues.

Klaus

But when shutting down a node the resources should be
shutdown before pacemakerd goes down.
But let me have a look if it can happen that pacemakerd
doesn't react to the ipc-pings before. That btw. might be
lethal for sbd-scenarios (if the phase is too long and it
might actually not be defined).

My idea with selinux would have been that it might block
the ipc if crm_mon is issued by execd. But well forget
about it as it is not enabled ;-)


Klaus

pgsql needs the result of crm_mon in demote processing and stop processing.
crm_mon should return a response even after pacemakerd goes into a stop operation.


Best Regards,
Hideo Yamauchi.


----- Original Message -----

From: Klaus Wenninger <kwenn...@redhat.com>

To: renayama19661...@ybb.ne.jp; Cluster Labs - All topics related to open-source clustering welcomed <users@clusterlabs.org>

Cc:
Date: 2021/4/9, Fri 21:12

Subject: Re: [ClusterLabs] [Problem] In RHEL8.4beta, pgsql resource control fails.


On 4/8/21 11:21 PM, renayama19661...@ybb.ne.jp wrote:

  Hi Ken,
  Hi All,
  In the pgsql resource, crm_mon is executed during demote and stop, and
  its result is processed.

  However, the pacemaker included in RHEL8.4beta fails to execute this crm_mon.
    - The problem also occurs on github master (c40e18f085fad9ef1d9d79f671ed8a69eb3e753f).

  The problem can be easily reproduced in the following ways.

  Step1. Modify the Dummy resource to execute crm_mon in its stop process.

  ----

  dummy_stop() {
       mon=$(crm_mon -1)
       ret=$?
       ocf_log info "### YAMAUCHI #### crm_mon[${ret}] : ${mon}"
       dummy_monitor
       if [ $? =  $OCF_SUCCESS ]; then
           rm ${OCF_RESKEY_state}
       fi
       return $OCF_SUCCESS
  }
  ----
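
  A minimal sketch of how the stop action above could record the crm_mon
  result, assuming only that crm_mon exits non-zero when it cannot reach the
  cluster (the log further down in this report shows an exit status of 102).
  The helper name check_crm_mon and the variables CRM_MON_OUT and CRM_MON_RC
  are hypothetical and are not part of the Dummy or pgsql agents.

```shell
#!/bin/sh
# Hypothetical helper, not part of any shipped resource agent:
# run crm_mon once and keep its exit status and its output separately,
# so that a failure during node shutdown is visible to the caller.
check_crm_mon() {
    # Capture stdout and stderr together; $? right after the assignment
    # is the exit status of crm_mon itself.
    CRM_MON_OUT=$(crm_mon -1 2>&1)
    CRM_MON_RC=$?
    if [ "$CRM_MON_RC" -ne 0 ]; then
        # Report the failure on stderr; the caller decides how to react.
        echo "crm_mon failed (rc=${CRM_MON_RC}): ${CRM_MON_OUT}" >&2
        return 1
    fi
    return 0
}
```

  In an agent, a call such as "check_crm_mon || ocf_log warn ..." would make
  the failure explicit in the log. It would not work around the problem
  itself, since the agent would still lack the cluster status it needs.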

  Step2. Configure a cluster with two nodes.
  ----

  [root@rh84-beta01 ~]# crm_mon -rfA1
  Cluster Summary:
     * Stack: corosync

     * Current DC: rh84-beta01 (version 2.0.5-8.el8-ba59be7122) - partition with quorum
     * Last updated: Thu Apr  8 18:00:52 2021
     * Last change:  Thu Apr  8 18:00:38 2021 by root via cibadmin on rh84-beta01
     * 2 nodes configured
     * 1 resource instance configured

  Node List:
     * Online: [ rh84-beta01 rh84-beta02 ]

  Full List of Resources:
     * dummy-1     (ocf::heartbeat:Dummy):  Started rh84-beta01

  Migration Summary:
  ----

  Step3. Stop the node where the Dummy resource is running. The resource will fail over.

  ----
  [root@rh84-beta02 ~]# crm_mon -rfA1
  Cluster Summary:
     * Stack: corosync
     * Current DC: rh84-beta02 (version 2.0.5-8.el8-ba59be7122) - partition with quorum
     * Last updated: Thu Apr  8 18:08:56 2021
     * Last change:  Thu Apr  8 18:05:08 2021 by root via cibadmin on rh84-beta01
     * 2 nodes configured
     * 1 resource instance configured

  Node List:
     * Online: [ rh84-beta02 ]
     * OFFLINE: [ rh84-beta01 ]

  Full List of Resources:
     * dummy-1     (ocf::heartbeat:Dummy):  Started rh84-beta02
  ----

  However, if you look at the log, you can see that the execution of crm_mon
  in the stop processing of the Dummy resource has failed.

  ----
  Apr 08 18:05:17  Dummy(dummy-1)[2631]:    INFO: ### YAMAUCHI #### crm_mon[102] : Pacemaker daemons shutting down ...
  Apr 08 18:05:17 rh84-beta01 pacemaker-execd [2219](log_op_output) notice: dummy-1_stop_0[2631] error output [ crm_mon: Error: cluster is not available on this node ]
Hmm ... is that with selinux enabled?
Respectively do you see any related avc messages?

Klaus

  ----
  Similarly, pgsql also executes crm_mon during demote or stop, so control fails.

  The problem seems to be related to the following fix.
    * Report pacemakerd in state waiting for sbd
     - https://github.com/ClusterLabs/pacemaker/pull/2278

  The problem does not occur with the release version of Pacemaker 2.0.5 or
  the Pacemaker included with RHEL8.3.

  This issue has a huge impact on the user.

  Perhaps it also affects the control of other resources that utilize crm_mon.

  Please improve the release version of RHEL8.4 so that it includes a Pacemaker
  which does not cause this problem.

    * Distributions other than RHEL may also be affected in future releases.


  ----
  This content is the same as the following Bugzilla.
    - https://bugs.clusterlabs.org/show_bug.cgi?id=5471
  ----

  Best Regards,
  Hideo Yamauchi.

  _______________________________________________
  Manage your subscription:
  https://lists.clusterlabs.org/mailman/listinfo/users

  ClusterLabs home: https://www.clusterlabs.org/

