Hi Ken,
Hi Klaus,

Thanks for your comment.

>We did not have time to get it into the RHEL 8.4 GA (general
>availability) release, which means for example it will not be in 8.4
>install images, but we did get a 0-day fix, which means that it will be
>available via "yum update" the same day that 8.4 is released.
>
>Thanks for testing the 8.4 build and finding the issue!

Okay!

Best Regards,
Hideo Yamauchi.

----- Original Message -----
>From: Ken Gaillot <kgail...@redhat.com>
>To: renayama19661...@ybb.ne.jp
>Cc: kwenning <kwenn...@redhat.com>
>Date: 2021/4/24, Sat 01:25
>Subject: Re: [ClusterLabs] [Problem] In RHEL8.4beta, pgsql resource control
>fails.
>
>Hi Hideo,
>
>A private reply to follow up:
>
>The fix will be in the 2.1.0 upstream release.
>
>We did not have time to get it into the RHEL 8.4 GA (general
>availability) release, which means for example it will not be in 8.4
>install images, but we did get a 0-day fix, which means that it will be
>available via "yum update" the same day that 8.4 is released.
>
>Thanks for testing the 8.4 build and finding the issue!
>
>On Thu, 2021-04-15 at 11:45 +0900, renayama19661...@ybb.ne.jp wrote:
>> Hi Klaus,
>> Hi Ken,
>>
>> We have confirmed that the operation is improved by the test.
>> Thank you for your prompt response.
>>
>> We look forward to including this fix in the release version of RHEL
>> 8.4.
>>
>> Best Regards,
>> Hideo Yamauchi.
>>
>>
>>
>> ----- Original Message -----
>> > From: "renayama19661...@ybb.ne.jp" <renayama19661...@ybb.ne.jp>
>> > To: "kwenn...@redhat.com" <kwenn...@redhat.com>; Cluster Labs - All
>> > topics related to open-source clustering welcomed
>> > <users@clusterlabs.org>; Cluster Labs - All topics related to
>> > open-source clustering welcomed <users@clusterlabs.org>
>> > Cc:
>> > Date: 2021/4/13, Tue 07:08
>> > Subject: Re: [ClusterLabs] [Problem] In RHEL8.4beta, pgsql resource
>> > control fails.
>> >
>> > Hi Klaus,
>> > Hi Ken,
>> >
>> > > I've opened https://github.com/ClusterLabs/pacemaker/pull/2342 with
>> > > I guess the simplest possible solution to the immediate issue so
>> > > that we can discuss it.
>> >
>> >
>> > Thank you for the fix.
>> >
>> >
>> > I have confirmed that the fixes have been merged.
>> >
>> > I'll test this fix today just in case.
>> >
>> > Many thanks,
>> > Hideo Yamauchi.
>> >
>> >
>> > ----- Original Message -----
>> > > From: Klaus Wenninger <kwenn...@redhat.com>
>> > > To: renayama19661...@ybb.ne.jp; Cluster Labs - All topics related to
>> > > open-source clustering welcomed <users@clusterlabs.org>
>> > > Cc:
>> > > Date: 2021/4/12, Mon 22:22
>> > > Subject: Re: [ClusterLabs] [Problem] In RHEL8.4beta, pgsql resource
>> > > control fails.
>> > >
>> > > On 4/9/21 5:13 PM, Klaus Wenninger wrote:
>> > > > On 4/9/21 4:04 PM, Klaus Wenninger wrote:
>> > > > > On 4/9/21 3:45 PM, Klaus Wenninger wrote:
>> > > > > > On 4/9/21 3:36 PM, Klaus Wenninger wrote:
>> > > > > > > On 4/9/21 2:37 PM, renayama19661...@ybb.ne.jp wrote:
>> > > > > > > > Hi Klaus,
>> > > > > > > >
>> > > > > > > > Thanks for your comment.
>> > > > > > > >
>> > > > > > > > > Hmm ... is that with selinux enabled?
>> > > > > > > > > Respectively do you see any related avc messages?
>> > > > > > > >
>> > > > > > > > Selinux is not enabled.
>> > > > > > > > Isn't crm_mon caused by not returning a response when pacemakerd
>> > > > > > > > prepares to stop?
>> > > > > >
>> > > > > > yep ... that doesn't look good.
>> > > > > > While in pcmk_shutdown_worker ipc isn't handled.
>> > > > >
>> > > > > Stop ... that should actually work as pcmk_shutdown_worker
>> > > > > should exit quite quickly and proceed after mainloop
>> > > > > dispatching when called again.
>> > > > > Don't see anything atm that might be blocking for longer ...
>> > > > > but let me dig into it further ...
>> > > >
>> > > > What happens is clear (thanks Ken for the hint ;-) ).
>> > > > When pacemakerd is shutting down - already when it
>> > > > shuts down the resources and not just when it starts to
>> > > > reap the subdaemons - crm_mon reads that state and
>> > > > doesn't try to connect to the cib anymore.
>> > >
>> > > I've opened https://github.com/ClusterLabs/pacemaker/pull/2342 with
>> > > I guess the simplest possible solution to the immediate issue so
>> > > that we can discuss it.
>> > > > > > Question is why that didn't create issue earlier.
>> > > > > > Probably I didn't test with resources that had crm_mon in
>> > > > > > their stop/monitor-actions but sbd should have run into
>> > > > > > issues.
>> > > > > >
>> > > > > > Klaus
>> > > > > > > But when shutting down a node the resources should be
>> > > > > > > shutdown before pacemakerd goes down.
>> > > > > > > But let me have a look if it can happen that pacemakerd
>> > > > > > > doesn't react to the ipc-pings before. That btw. might be
>> > > > > > > lethal for sbd-scenarios (if the phase is too long and it
>> > > > > > > might actually not be defined).
>> > > > > > >
>> > > > > > > My idea with selinux would have been that it might block
>> > > > > > > the ipc if crm_mon is issued by execd. But well forget
>> > > > > > > about it as it is not enabled ;-)
>> > > > > > >
>> > > > > > >
>> > > > > > > Klaus
>> > > > > > > >
>> > > > > > > > pgsql needs the result of crm_mon in demote processing and stop
>> > > > > > > > processing.
>> > > > > > > > crm_mon should return a response even after pacemakerd goes into a
>> > > > > > > > stop operation.
>> > > > > > > >
>> > > > > > > > Best Regards,
>> > > > > > > > Hideo Yamauchi.
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > ----- Original Message -----
>> > > > > > > > > From: Klaus Wenninger <kwenn...@redhat.com>
>> > > > > > > > > To: renayama19661...@ybb.ne.jp; Cluster Labs - All topics related
>> > > > > > > > > to open-source clustering welcomed <users@clusterlabs.org>
>> > > > > > > > > Cc:
>> > > > > > > > > Date: 2021/4/9, Fri 21:12
>> > > > > > > > > Subject: Re: [ClusterLabs] [Problem] In RHEL8.4beta, pgsql
>> > > > > > > > > resource control fails.
>> > > > > > > > >
>> > > > > > > > > On 4/8/21 11:21 PM, renayama19661...@ybb.ne.jp wrote:
>> > > > > > > > > > Hi Ken,
>> > > > > > > > > > Hi All,
>> > > > > > > > > >
>> > > > > > > > > > In the pgsql resource, crm_mon is executed in the process of
>> > > > > > > > > > demote and stop, and the result is processed.
>> > > > > > > > > > However, pacemaker included in RHEL8.4beta fails to execute
>> > > > > > > > > > this crm_mon.
>> > > > > > > > > >  - The problem also occurs on github master(c40e18f085fad9ef1d9d79f671ed8a69eb3e753f).
>> > > > > > > > > > The problem can be easily reproduced in the following ways.
>> > > > > > > > > >
>> > > > > > > > > > Step1. Modify to execute crm_mon in the stop process of the
>> > > > > > > > > > Dummy resource.
>> > > > > > > > > > ----
>> > > > > > > > > >
>> > > > > > > > > > dummy_stop() {
>> > > > > > > > > >     mon=$(crm_mon -1)
>> > > > > > > > > >     ret=$?
>> > > > > > > > > >     ocf_log info "### YAMAUCHI #### crm_mon[${ret}] : ${mon}"
>> > > > > > > > > >     dummy_monitor
>> > > > > > > > > >     if [ $? = $OCF_SUCCESS ]; then
>> > > > > > > > > >         rm ${OCF_RESKEY_state}
>> > > > > > > > > >     fi
>> > > > > > > > > >     return $OCF_SUCCESS
>> > > > > > > > > > }
>> > > > > > > > > > ----
>> > > > > > > > > >
>> > > > > > > > > > Step2. Configure a cluster with two nodes.
>> > > > > > > > > > ----
>> > > > > > > > > >
>> > > > > > > > > > [root@rh84-beta01 ~]# crm_mon -rfA1
>> > > > > > > > > > Cluster Summary:
>> > > > > > > > > >   * Stack: corosync
>> > > > > > > > > >   * Current DC: rh84-beta01 (version 2.0.5-8.el8-ba59be7122) - partition with quorum
>> > > > > > > > > >   * Last updated: Thu Apr 8 18:00:52 2021
>> > > > > > > > > >   * Last change: Thu Apr 8 18:00:38 2021 by root via cibadmin on rh84-beta01
>> > > > > > > > > >   * 2 nodes configured
>> > > > > > > > > >   * 1 resource instance configured
>> > > > > > > > > >
>> > > > > > > > > > Node List:
>> > > > > > > > > >   * Online: [ rh84-beta01 rh84-beta02 ]
>> > > > > > > > > >
>> > > > > > > > > > Full List of Resources:
>> > > > > > > > > >   * dummy-1 (ocf::heartbeat:Dummy): Started rh84-beta01
>> > > > > > > > > >
>> > > > > > > > > > Migration Summary:
>> > > > > > > > > > ----
>> > > > > > > > > >
>> > > > > > > > > > Step3. Stop the node where the Dummy resource is running. The
>> > > > > > > > > > resource will fail over.
>> > > > > > > > > > ----
>> > > > > > > > > > [root@rh84-beta02 ~]# crm_mon -rfA1
>> > > > > > > > > > Cluster Summary:
>> > > > > > > > > >   * Stack: corosync
>> > > > > > > > > >   * Current DC: rh84-beta02 (version 2.0.5-8.el8-ba59be7122) - partition with quorum
>> > > > > > > > > >   * Last updated: Thu Apr 8 18:08:56 2021
>> > > > > > > > > >   * Last change: Thu Apr 8 18:05:08 2021 by root via cibadmin on rh84-beta01
>> > > > > > > > > >   * 2 nodes configured
>> > > > > > > > > >   * 1 resource instance configured
>> > > > > > > > > >
>> > > > > > > > > > Node List:
>> > > > > > > > > >   * Online: [ rh84-beta02 ]
>> > > > > > > > > >   * OFFLINE: [ rh84-beta01 ]
>> > > > > > > > > >
>> > > > > > > > > > Full List of Resources:
>> > > > > > > > > >   * dummy-1 (ocf::heartbeat:Dummy): Started rh84-beta02
>> > > > > > > > > > ----
>> > > > > > > > > >
>> > > > > > > > > > However, if you look at the log, you can see that the execution of
>> > > > > > > > > > crm_mon in the stop processing of the Dummy resource has failed.
>> > > > > > > > > > ----
>> > > > > > > > > > Apr 08 18:05:17 Dummy(dummy-1)[2631]: INFO: ### YAMAUCHI #### crm_mon[102] : Pacemaker daemons shutting down ...
>> > > > > > > > > > Apr 08 18:05:17 rh84-beta01 pacemaker-execd [2219] (log_op_output) notice: dummy-1_stop_0[2631] error output [ crm_mon: Error: cluster is not available on this node ]
>> > > > > > > > > Hmm ... is that with selinux enabled?
>> > > > > > > > > Respectively do you see any related avc messages?
>> > > > > > > > >
>> > > > > > > > > Klaus
>> > > > > > > > > > ----
>> > > > > > > > > >
>> > > > > > > > > > Similarly, pgsql also executes crm_mon with demote or stop, so
>> > > > > > > > > > control fails.
>> > > > > > > > > > The problem seems to be related to the next fix.
>> > > > > > > > > >  * Report pacemakerd in state waiting for sbd
>> > > > > > > > > >   - https://github.com/ClusterLabs/pacemaker/pull/2278
>> > > > > > > > > >
>> > > > > > > > > > The problem does not occur with the release version of
>> > > > > > > > > > Pacemaker 2.0.5 or the Pacemaker included with RHEL8.3.
>> > > > > > > > > > This issue has a huge impact on the user.
>> > > > > > > > > >
>> > > > > > > > > > Perhaps it also affects the control of other resources that
>> > > > > > > > > > utilize crm_mon.
>> > > > > > > > > > Please improve the release version of RHEL8.4 so that it
>> > > > > > > > > > includes Pacemaker which does not cause this problem.
>> > > > > > > > > >  * Distributions other than RHEL may also be affected in
>> > > > > > > > > >    future releases.
>> > > > > > > > > >
>> > > > > > > > > > ----
>> > > > > > > > > > This content is the same as the following Bugzilla.
>> > > > > > > > > >  - https://bugs.clusterlabs.org/show_bug.cgi?id=5471
>> > > > > > > > > > ----
>> > > > > > > > > >
>> > > > > > > > > > Best Regards,
>> > > > > > > > > > Hideo Yamauchi.
>--
>Ken Gaillot <kgail...@redhat.com>

_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/