On 4/9/21 3:45 PM, Klaus Wenninger wrote:
On 4/9/21 3:36 PM, Klaus Wenninger wrote:
On 4/9/21 2:37 PM, renayama19661...@ybb.ne.jp wrote:
Hi Klaus,

Thanks for your comment.

Hmm ... is that with selinux enabled?
Respectively do you see any related avc messages?

Selinux is not enabled.
Isn't the crm_mon failure caused by pacemakerd not returning a response while it is preparing to stop?
yep ... that doesn't look good.
While in pcmk_shutdown_worker ipc isn't handled.
Stop ... that should actually work as pcmk_shutdown_worker
should exit quite quickly and proceed after mainloop
dispatching when called again.
Don't see anything atm that might be blocking for longer ...
but let me dig into it further ...
Question is why that didn't create an issue earlier.
Probably I didn't test with resources that had crm_mon in
their stop/monitor-actions but sbd should have run into
issues.

Klaus
But when shutting down a node the resources should be
shutdown before pacemakerd goes down.
But let me have a look if it can happen that pacemakerd
doesn't react to the ipc-pings before. That btw. might be
lethal for sbd-scenarios (if the phase is too long and it
might actually not be defined).

My idea with selinux would have been that it might block
the ipc if crm_mon is issued by execd. But well forget
about it as it is not enabled ;-)


Klaus

pgsql needs the result of crm_mon in demote processing and stop processing. crm_mon should return a response even after pacemakerd goes into a stop operation.
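
Until crm_mon answers again during shutdown, an agent that depends on it can at least avoid treating the failure text as cluster status. The following is only a minimal sketch of that idea, not the pgsql agent's code and not a fix for the underlying problem. The names read_cluster_status, STATUS_CMD, STATUS_OUT and STATUS_RC are hypothetical and introduced here for illustration.

```shell
#!/bin/sh
# Hypothetical helper for a resource agent; not part of pgsql or Dummy.
# STATUS_CMD is an assumed override so the sketch can be run without a cluster.
STATUS_CMD="${STATUS_CMD:-crm_mon -1}"

# Run the status command once.
# On success: store its output in STATUS_OUT and return 0.
# On failure: clear STATUS_OUT, keep the exit code in STATUS_RC, return 1.
read_cluster_status() {
    # STATUS_CMD is deliberately unquoted so "crm_mon -1" splits into words.
    STATUS_OUT=$($STATUS_CMD 2>/dev/null)
    STATUS_RC=$?
    if [ "$STATUS_RC" -ne 0 ]; then
        # Do not hand a message such as "Pacemaker daemons shutting down ..."
        # to code that expects real cluster status.
        STATUS_OUT=""
        return 1
    fi
    return 0
}
```

A demote or stop action could call read_cluster_status and, when it returns 1, log STATUS_RC and take whatever fallback path the agent considers safe instead of parsing empty or misleading output.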

Best Regards,
Hideo Yamauchi.


----- Original Message -----
From: Klaus Wenninger <kwenn...@redhat.com>
To: renayama19661...@ybb.ne.jp; Cluster Labs - All topics related to open-source clustering welcomed <users@clusterlabs.org>
Cc:
Date: 2021/4/9, Fri 21:12
Subject: Re: [ClusterLabs] [Problem] In RHEL8.4beta, pgsql resource control fails.

On 4/8/21 11:21 PM, renayama19661...@ybb.ne.jp wrote:
  Hi Ken,
  Hi All,

  The pgsql resource agent executes crm_mon during its demote and stop
actions and processes the result.
  However, with the Pacemaker included in RHEL8.4beta, this crm_mon call fails.
    - The problem also occurs on github master (c40e18f085fad9ef1d9d79f671ed8a69eb3e753f).
  The problem can be easily reproduced as follows.

  Step1. Modify the Dummy resource agent so that its stop action executes crm_mon.
  ----

  dummy_stop() {
       mon=$(crm_mon -1)
       ret=$?
       ocf_log info "### YAMAUCHI #### crm_mon[${ret}] : ${mon}"
       dummy_monitor
       if [ $? =  $OCF_SUCCESS ]; then
           rm ${OCF_RESKEY_state}
       fi
       return $OCF_SUCCESS
  }
  ----

  Step2. Configure a cluster with two nodes.
  ----

  [root@rh84-beta01 ~]# crm_mon -rfA1
  Cluster Summary:
     * Stack: corosync
     * Current DC: rh84-beta01 (version 2.0.5-8.el8-ba59be7122) - partition with quorum
     * Last updated: Thu Apr  8 18:00:52 2021
     * Last change:  Thu Apr  8 18:00:38 2021 by root via cibadmin on rh84-beta01
     * 2 nodes configured
     * 1 resource instance configured

  Node List:
     * Online: [ rh84-beta01 rh84-beta02 ]

  Full List of Resources:
     * dummy-1     (ocf::heartbeat:Dummy):  Started rh84-beta01

  Migration Summary:
  ----

  Step3. Stop the node where the Dummy resource is running. The resource will
fail over.
  ----
  [root@rh84-beta02 ~]# crm_mon -rfA1
  Cluster Summary:
     * Stack: corosync
     * Current DC: rh84-beta02 (version 2.0.5-8.el8-ba59be7122) - partition with quorum
     * Last updated: Thu Apr  8 18:08:56 2021
     * Last change:  Thu Apr  8 18:05:08 2021 by root via cibadmin on rh84-beta01
     * 2 nodes configured
     * 1 resource instance configured

  Node List:
     * Online: [ rh84-beta02 ]
     * OFFLINE: [ rh84-beta01 ]

  Full List of Resources:
     * dummy-1     (ocf::heartbeat:Dummy):  Started rh84-beta02
  ----

  However, if you look at the log, you can see that the execution of crm_mon
in the stop processing of the Dummy resource has failed.
  ----
  Apr 08 18:05:17  Dummy(dummy-1)[2631]:    INFO: ### YAMAUCHI #### crm_mon[102] : Pacemaker daemons shutting down ...
  Apr 08 18:05:17 rh84-beta01 pacemaker-execd     [2219] (log_op_output) notice: dummy-1_stop_0[2631] error output [ crm_mon: Error: cluster is not available on this node ]
Hmm ... is that with selinux enabled?
Respectively do you see any related avc messages?

Klaus
  ----
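
  To check several stop runs at once, the exit codes logged by the modified dummy_stop can be pulled out of the log. This is a small sketch that relies only on the "crm_mon[<rc>]" text that the ocf_log call in Step1 writes. The function name extract_crm_mon_rc is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read log text on stdin and print one crm_mon exit
# code per matching line, based on the "crm_mon[<rc>]" marker written by
# the modified dummy_stop in Step1.
extract_crm_mon_rc() {
    sed -n 's/.*crm_mon\[\([0-9][0-9]*\)].*/\1/p'
}
```

  Feed it whichever log file holds the ocf_log output. Any value other than 0 means that crm_mon failed inside the stop action.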

  Similarly, pgsql also executes crm_mon during demote and stop, so its
resource control fails.
  The problem seems to be related to the following change.
    * Report pacemakerd in state waiting for sbd
     - https://github.com/ClusterLabs/pacemaker/pull/2278

  The problem does not occur with the release version of Pacemaker 2.0.5 or
the Pacemaker included with RHEL8.3.
  This issue has a huge impact on users.

  Perhaps it also affects the control of other resources that utilize
crm_mon.
  Please make sure that the release version of RHEL8.4 includes a Pacemaker
that does not have this problem.
    * Distributions other than RHEL may also be affected in future releases.

  ----
  This report is the same as the following Bugzilla entry.
    - https://bugs.clusterlabs.org/show_bug.cgi?id=5471
  ----

  Best Regards,
  Hideo Yamauchi.

  _______________________________________________
  Manage your subscription:
  https://lists.clusterlabs.org/mailman/listinfo/users

  ClusterLabs home: https://www.clusterlabs.org/


