Hi All,

Sorry, due to a mistake on my part, the same email was sent multiple times.

Best Regards,
Hideo Yamauchi.

----- Original Message -----
> From: "renayama19661...@ybb.ne.jp" <renayama19661...@ybb.ne.jp>
> To: Cluster Labs - All topics related to open-source clustering welcomed <users@clusterlabs.org>; Cluster Labs - All topics related to open-source clustering welcomed <users@clusterlabs.org>
> Cc:
> Date: 2021/4/15, Thu 11:45
> Subject: Re: [ClusterLabs] [Problem] In RHEL8.4beta, pgsql resource control fails.
>
> Hi Klaus,
> Hi Ken,
>
> We have confirmed that the operation is improved by the test.
> Thank you for your prompt response.
>
> We look forward to including this fix in the release version of RHEL 8.4.
>
> Best Regards,
> Hideo Yamauchi.
>
> ----- Original Message -----
>> From: "renayama19661...@ybb.ne.jp" <renayama19661...@ybb.ne.jp>
>> To: "kwenn...@redhat.com" <kwenn...@redhat.com>; Cluster Labs - All topics related to open-source clustering welcomed <users@clusterlabs.org>; Cluster Labs - All topics related to open-source clustering welcomed <users@clusterlabs.org>
>> Cc:
>> Date: 2021/4/13, Tue 07:08
>> Subject: Re: [ClusterLabs] [Problem] In RHEL8.4beta, pgsql resource control fails.
>>
>> Hi Klaus,
>> Hi Ken,
>>
>>> I've opened https://github.com/ClusterLabs/pacemaker/pull/2342 with
>>> I guess the simplest possible solution to the immediate issue so
>>> that we can discuss it.
>>
>> Thank you for the fix.
>>
>> I have confirmed that the fixes have been merged.
>>
>> I'll test this fix today just in case.
>>
>> Many thanks,
>> Hideo Yamauchi.
>>
>> ----- Original Message -----
>>> From: Klaus Wenninger <kwenn...@redhat.com>
>>> To: renayama19661...@ybb.ne.jp; Cluster Labs - All topics related to open-source clustering welcomed <users@clusterlabs.org>
>>> Cc:
>>> Date: 2021/4/12, Mon 22:22
>>> Subject: Re: [ClusterLabs] [Problem] In RHEL8.4beta, pgsql resource control fails.
>>>
>>> On 4/9/21 5:13 PM, Klaus Wenninger wrote:
>>>> On 4/9/21 4:04 PM, Klaus Wenninger wrote:
>>>>> On 4/9/21 3:45 PM, Klaus Wenninger wrote:
>>>>>> On 4/9/21 3:36 PM, Klaus Wenninger wrote:
>>>>>>> On 4/9/21 2:37 PM, renayama19661...@ybb.ne.jp wrote:
>>>>>>>> Hi Klaus,
>>>>>>>>
>>>>>>>> Thanks for your comment.
>>>>>>>>
>>>>>>>>> Hmm ... is that with selinux enabled?
>>>>>>>>> Respectively do you see any related avc messages?
>>>>>>>>
>>>>>>>> Selinux is not enabled.
>>>>>>>> Isn't crm_mon caused by not returning a response when pacemakerd
>>>>>>>> prepares to stop?
>>>>>> yep ... that doesn't look good.
>>>>>> While in pcmk_shutdown_worker ipc isn't handled.
>>>>> Stop ... that should actually work as pcmk_shutdown_worker
>>>>> should exit quite quickly and proceed after mainloop
>>>>> dispatching when called again.
>>>>> Don't see anything atm that might be blocking for longer ...
>>>>> but let me dig into it further ...
>>>> What happens is clear (thanks Ken for the hint ;-) ).
>>>> When pacemakerd is shutting down - already when it
>>>> shuts down the resources and not just when it starts to
>>>> reap the subdaemons - crm_mon reads that state and
>>>> doesn't try to connect to the cib anymore.
>>> I've opened https://github.com/ClusterLabs/pacemaker/pull/2342 with
>>> I guess the simplest possible solution to the immediate issue so
>>> that we can discuss it.
>>>>>> Question is why that didn't create issue earlier.
>>>>>> Probably I didn't test with resources that had crm_mon in
>>>>>> their stop/monitor-actions but sbd should have run into
>>>>>> issues.
>>>>>>
>>>>>> Klaus
>>>>>>> But when shutting down a node the resources should be
>>>>>>> shutdown before pacemakerd goes down.
>>>>>>> But let me have a look if it can happen that pacemakerd
>>>>>>> doesn't react to the ipc-pings before. That btw. might be
>>>>>>> lethal for sbd-scenarios (if the phase is too long and it
>>>>>>> might actually not be defined).
>>>>>>>
>>>>>>> My idea with selinux would have been that it might block
>>>>>>> the ipc if crm_mon is issued by execd. But well forget
>>>>>>> about it as it is not enabled ;-)
>>>>>>>
>>>>>>> Klaus
>>>>>>>>
>>>>>>>> pgsql needs the result of crm_mon in demote processing and stop
>>>>>>>> processing.
>>>>>>>> crm_mon should return a response even after pacemakerd goes into a
>>>>>>>> stop operation.
>>>>>>>>
>>>>>>>> Best Regards,
>>>>>>>> Hideo Yamauchi.
>>>>>>>>
>>>>>>>> ----- Original Message -----
>>>>>>>>> From: Klaus Wenninger <kwenn...@redhat.com>
>>>>>>>>> To: renayama19661...@ybb.ne.jp; Cluster Labs - All topics related to open-source clustering welcomed <users@clusterlabs.org>
>>>>>>>>> Cc:
>>>>>>>>> Date: 2021/4/9, Fri 21:12
>>>>>>>>> Subject: Re: [ClusterLabs] [Problem] In RHEL8.4beta, pgsql resource control fails.
>>>>>>>>>
>>>>>>>>> On 4/8/21 11:21 PM, renayama19661...@ybb.ne.jp wrote:
>>>>>>>>>> Hi Ken,
>>>>>>>>>> Hi All,
>>>>>>>>>>
>>>>>>>>>> In the pgsql resource, crm_mon is executed in the process of demote and stop, and the result is processed.
>>>>>>>>>> However, pacemaker included in RHEL8.4beta fails to execute this crm_mon.
>>>>>>>>>> - The problem also occurs on github master(c40e18f085fad9ef1d9d79f671ed8a69eb3e753f).
>>>>>>>>>> The problem can be easily reproduced in the following ways.
>>>>>>>>>>
>>>>>>>>>> Step1. Modify to execute crm_mon in the stop process of the Dummy resource.
>>>>>>>>>> ----
>>>>>>>>>>
>>>>>>>>>> dummy_stop() {
>>>>>>>>>>     mon=$(crm_mon -1)
>>>>>>>>>>     ret=$?
>>>>>>>>>>     ocf_log info "### YAMAUCHI #### crm_mon[${ret}] : ${mon}"
>>>>>>>>>>     dummy_monitor
>>>>>>>>>>     if [ $? = $OCF_SUCCESS ]; then
>>>>>>>>>>         rm ${OCF_RESKEY_state}
>>>>>>>>>>     fi
>>>>>>>>>>     return $OCF_SUCCESS
>>>>>>>>>> }
>>>>>>>>>> ----
>>>>>>>>>>
>>>>>>>>>> Step2. Configure a cluster with two nodes.
>>>>>>>>>> ----
>>>>>>>>>>
>>>>>>>>>> [root@rh84-beta01 ~]# crm_mon -rfA1
>>>>>>>>>> Cluster Summary:
>>>>>>>>>>   * Stack: corosync
>>>>>>>>>>   * Current DC: rh84-beta01 (version 2.0.5-8.el8-ba59be7122) - partition with quorum
>>>>>>>>>>   * Last updated: Thu Apr 8 18:00:52 2021
>>>>>>>>>>   * Last change: Thu Apr 8 18:00:38 2021 by root via cibadmin on rh84-beta01
>>>>>>>>>>   * 2 nodes configured
>>>>>>>>>>   * 1 resource instance configured
>>>>>>>>>>
>>>>>>>>>> Node List:
>>>>>>>>>>   * Online: [ rh84-beta01 rh84-beta02 ]
>>>>>>>>>>
>>>>>>>>>> Full List of Resources:
>>>>>>>>>>   * dummy-1 (ocf::heartbeat:Dummy): Started rh84-beta01
>>>>>>>>>>
>>>>>>>>>> Migration Summary:
>>>>>>>>>> ----
>>>>>>>>>>
>>>>>>>>>> Step3. Stop the node where the Dummy resource is running. The resource will fail over.
>>>>>>>>>> ----
>>>>>>>>>> [root@rh84-beta02 ~]# crm_mon -rfA1
>>>>>>>>>> Cluster Summary:
>>>>>>>>>>   * Stack: corosync
>>>>>>>>>>   * Current DC: rh84-beta02 (version 2.0.5-8.el8-ba59be7122) - partition with quorum
>>>>>>>>>>   * Last updated: Thu Apr 8 18:08:56 2021
>>>>>>>>>>   * Last change: Thu Apr 8 18:05:08 2021 by root via cibadmin on rh84-beta01
>>>>>>>>>>   * 2 nodes configured
>>>>>>>>>>   * 1 resource instance configured
>>>>>>>>>>
>>>>>>>>>> Node List:
>>>>>>>>>>   * Online: [ rh84-beta02 ]
>>>>>>>>>>   * OFFLINE: [ rh84-beta01 ]
>>>>>>>>>>
>>>>>>>>>> Full List of Resources:
>>>>>>>>>>   * dummy-1 (ocf::heartbeat:Dummy): Started rh84-beta02
>>>>>>>>>> ----
>>>>>>>>>>
>>>>>>>>>> However, if you look at the log, you can see that the execution of crm_mon in the stop processing of the Dummy resource has failed.
>>>>>>>>>> ----
>>>>>>>>>> Apr 08 18:05:17 Dummy(dummy-1)[2631]: INFO: ### YAMAUCHI #### crm_mon[102] : Pacemaker daemons shutting down ...
>>>>>>>>>> Apr 08 18:05:17 rh84-beta01 pacemaker-execd [2219] (log_op_output) notice: dummy-1_stop_0[2631] error output [ crm_mon: Error: cluster is not available on this node ]
>>>>>>>>> Hmm ... is that with selinux enabled?
>>>>>>>>> Respectively do you see any related avc messages?
>>>>>>>>>
>>>>>>>>> Klaus
>>>>>>>>>> ----
>>>>>>>>>>
>>>>>>>>>> Similarly, pgsql also executes crm_mon with demote or stop, so control fails.
>>>>>>>>>> The problem seems to be related to the next fix.
>>>>>>>>>> * Report pacemakerd in state waiting for sbd
>>>>>>>>>> - https://github.com/ClusterLabs/pacemaker/pull/2278
>>>>>>>>>>
>>>>>>>>>> The problem does not occur with the release version of Pacemaker 2.0.5 or the Pacemaker included with RHEL8.3.
>>>>>>>>>> This issue has a huge impact on the user.
>>>>>>>>>>
>>>>>>>>>> Perhaps it also affects the control of other resources that utilize crm_mon.
>>>>>>>>>> Please improve the release version of RHEL8.4 so that it includes Pacemaker which does not cause this problem.
>>>>>>>>>> * Distributions other than RHEL may also be affected in future releases.
>>>>>>>>>>
>>>>>>>>>> ----
>>>>>>>>>> This content is the same as the following Bugzilla.
>>>>>>>>>> - https://bugs.clusterlabs.org/show_bug.cgi?id=5471
>>>>>>>>>> ----
>>>>>>>>>>
>>>>>>>>>> Best Regards,
>>>>>>>>>> Hideo Yamauchi.
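
For anyone who wants to check whether a given Pacemaker build shows the behaviour reported above, the following is a minimal sketch of a helper that could be called from a test resource agent's stop action. It only records whether the status command answered; it is not part of the Dummy or pgsql agents. The names check_mon and MON_CMD are hypothetical and were introduced for this illustration, and the only crm_mon invocation assumed is `crm_mon -1`, as used in the reproduction steps.

```shell
#!/bin/sh
# Hypothetical helper (not part of any shipped resource agent).
# MON_CMD is an assumed override hook so the helper can be exercised
# without a running cluster; by default it runs "crm_mon -1".
: "${MON_CMD:=crm_mon -1}"

check_mon() {
    # Capture stdout and stderr together so the error text is kept.
    mon=$($MON_CMD 2>&1)
    ret=$?
    if [ "$ret" -eq 0 ]; then
        echo "crm_mon OK"
    else
        # Report the raw exit status and output without interpreting them.
        echo "crm_mon FAILED rc=${ret}: ${mon}"
    fi
    return "$ret"
}
```

Inside an agent, the echoed line could be passed to ocf_log in the same way as in the dummy_stop example quoted above. Whether a stop action should fail or continue when the status command does not answer is a design decision for each agent, and this sketch does not make it.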
_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/