Hi Steffen,

I've been experimenting with it since last weekend, but I haven't been able to reproduce the same situation. It seems I cannot narrow down the steps needed to reproduce it.
Could you attach a log of the problem?

Best Regards,
Hideo Yamauchi.

----- Original Message -----
> From: Klaus Wenninger <kwenn...@redhat.com>
> To: Steffen Vinther Sørensen <svint...@gmail.com>; Cluster Labs - All topics related to open-source clustering welcomed <users@clusterlabs.org>
> Cc:
> Date: 2021/1/7, Thu 21:42
> Subject: Re: [ClusterLabs] Pending Fencing Actions shown in pcs status
>
> On 1/7/21 1:13 PM, Steffen Vinther Sørensen wrote:
>> Hi Klaus,
>>
>> Yes, then the status does sync to the other nodes. It also looks like
>> there are some hostname-resolution problems in play here, maybe causing
>> trouble; here are my notes from restarting pacemaker etc.
> I don't think there are hostname-resolution problems.
> The messages you are seeing, which look as if there were, are caused
> by using -EHOSTUNREACH as the error code to fail a pending
> fence action when a node that is just coming up sees
> a pending action that is claimed to be handled by itself.
> Back then I chose that error code because none of the available
> ones really matched, and it was urgent for some reason, so
> introducing something new was too risky at that stage.
> It would probably make sense to introduce something more descriptive.
> Back then the issue was triggered by fenced crashing and
> being restarted - so not a node restart, just fenced restarting.
> And it looks as if building the failed-message failed somehow,
> which could be why the pending action persists.
> That would be something other than what we solved with Bug 5401,
> but what triggers the logs below might well just be a
> follow-up issue after the Bug 5401 fix.
> I will try to find time for a deeper look later today.
> Klaus
>>
>> pcs cluster standby kvm03-node02.avigol-gcs.dk
>> pcs cluster stop kvm03-node02.avigol-gcs.dk
>> pcs status
>>
>> Pending Fencing Actions:
>> * reboot of kvm03-node02.avigol-gcs.dk pending: client=crmd.37819,
>>   origin=kvm03-node03.avigol-gcs.dk
>>
>> # From logs on all 3 nodes:
>> Jan 07 12:48:18 kvm03-node03 stonith-ng[37815]: warning: received pending action we are supposed to be the owner but it's not in our records -> fail it
>> Jan 07 12:48:18 kvm03-node03 stonith-ng[37815]: error: Operation 'reboot' targeting kvm03-node02.avigol-gcs.dk on <no-one> for crmd.37...@kvm03-node03.avigol-gcs.dk.56a3018c: No route to host
>> Jan 07 12:48:18 kvm03-node03 stonith-ng[37815]: error: stonith_construct_reply: Triggered assert at commands.c:2406 : request != NULL
>> Jan 07 12:48:18 kvm03-node03 stonith-ng[37815]: warning: Can't create a sane reply
>> Jan 07 12:48:18 kvm03-node03 crmd[37819]: notice: Peer kvm03-node02.avigol-gcs.dk was not terminated (reboot) by <anyone> on behalf of crmd.37819: No route to host
>>
>> pcs cluster start kvm03-node02.avigol-gcs.dk
>> pcs status (now outputs the same on all 3 nodes)
>>
>> Failed Fencing Actions:
>> * reboot of kvm03-node02.avigol-gcs.dk failed: delegate=, client=crmd.37819,
>>   origin=kvm03-node03.avigol-gcs.dk, last-failed='Thu Jan 7 12:48:18 2021'
>>
>> pcs cluster unstandby kvm03-node02.avigol-gcs.dk
>>
>> # Now libvirtd refuses to start
>>
>> Jan 07 12:51:44 kvm03-node02 dnsmasq[20884]: read /etc/hosts - 8 addresses
>> Jan 07 12:51:44 kvm03-node02 dnsmasq[20884]: read /var/lib/libvirt/dnsmasq/default.addnhosts - 0 addresses
>> Jan 07 12:51:44 kvm03-node02 dnsmasq-dhcp[20884]: read /var/lib/libvirt/dnsmasq/default.hostsfile
>> Jan 07 12:51:44 kvm03-node02 libvirtd[24091]: 2021-01-07 11:51:44.729+0000: 24160: info : libvirt version: 4.5.0, package: 36.el7_9.3 (CentOS BuildSystem <http://bugs.centos.org>, 2020-11-16-16:25:20, x86-01.bsys.centos.org)
>> Jan 07 12:51:44 kvm03-node02 libvirtd[24091]: 2021-01-07 11:51:44.729+0000: 24160: info : hostname: kvm03-node02
>> Jan 07 12:51:44 kvm03-node02 libvirtd[24091]: 2021-01-07 11:51:44.729+0000: 24160: error : qemuMonitorOpenUnix:392 : failed to connect to monitor socket: Connection refused
>> Jan 07 12:51:44 kvm03-node02 libvirtd[24091]: 2021-01-07 11:51:44.729+0000: 24159: error : qemuMonitorOpenUnix:392 : failed to connect to monitor socket: Connection refused
>> Jan 07 12:51:44 kvm03-node02 libvirtd[24091]: 2021-01-07 11:51:44.730+0000: 24161: error : qemuMonitorOpenUnix:392 : failed to connect to monitor socket: Connection refused
>> Jan 07 12:51:44 kvm03-node02 libvirtd[24091]: 2021-01-07 11:51:44.730+0000: 24162: error : qemuMonitorOpenUnix:392 : failed to connect to monitor socket: Connection refused
>>
>> pcs status
>>
>> Failed Resource Actions:
>> * libvirtd_start_0 on kvm03-node02.avigol-gcs.dk 'unknown error' (1):
>>   call=142, status=complete, exitreason='',
>>   last-rc-change='Thu Jan 7 12:51:44 2021', queued=0ms, exec=2157ms
>>
>> Failed Fencing Actions:
>> * reboot of kvm03-node02.avigol-gcs.dk failed: delegate=, client=crmd.37819,
>>   origin=kvm03-node03.avigol-gcs.dk, last-failed='Thu Jan 7 12:48:18 2021'
>>
>> # from /etc/hosts on all 3 nodes:
>>
>> 172.31.0.31 kvm03-node01 kvm03-node01.avigol-gcs.dk
>> 172.31.0.32 kvm03-node02 kvm03-node02.avigol-gcs.dk
>> 172.31.0.33 kvm03-node03 kvm03-node03.avigol-gcs.dk
>>
>> On Thu, Jan 7, 2021 at 11:15 AM Klaus Wenninger <kwenn...@redhat.com> wrote:
>>> Hi Steffen,
>>>
>>> If you just see the leftover pending action on one node,
>>> it would be interesting to know whether restarting Pacemaker on
>>> one of the other nodes syncs it to all of the nodes.
>>>
>>> Regards,
>>> Klaus
>>>
>>> On 1/7/21 9:54 AM, renayama19661...@ybb.ne.jp wrote:
>>>> Hi Steffen,
>>>>
>>>>> Unfortunately, I'm not sure about the exact scenario.
>>>>> But I have been doing some recent experiments with node
>>>>> standby/unstandby and stop/start, to get the procedures right for
>>>>> updating node rpms etc.
>>>>>
>>>>> Later I noticed the disconcerting "pending fencing actions" status message.
>>>> Okay!
>>>>
>>>> Repeat the standby and unstandby steps in the same way to check.
>>>> We will start checking after tomorrow, so I think it will take some time, until next week.
>>>>
>>>> Many thanks,
>>>> Hideo Yamauchi.
>>>>
>>>> ----- Original Message -----
>>>>> From: "renayama19661...@ybb.ne.jp" <renayama19661...@ybb.ne.jp>
>>>>> To: Reid Wahl <nw...@redhat.com>; Cluster Labs - All topics related to open-source clustering welcomed <users@clusterlabs.org>
>>>>> Cc:
>>>>> Date: 2021/1/7, Thu 17:51
>>>>> Subject: Re: [ClusterLabs] Pending Fencing Actions shown in pcs status
>>>>>
>>>>> Hi Steffen,
>>>>> Hi Reid,
>>>>>
>>>>> The fencing history is kept inside stonith-ng and is not written to the CIB.
>>>>> However, getting the entire CIB and having it sent will help us reproduce
>>>>> the problem.
>>>>>
>>>>> Best Regards,
>>>>> Hideo Yamauchi.
>>>>>
>>>>> ----- Original Message -----
>>>>>> From: Reid Wahl <nw...@redhat.com>
>>>>>> To: renayama19661...@ybb.ne.jp; Cluster Labs - All topics related to open-source clustering welcomed <users@clusterlabs.org>
>>>>>> Date: 2021/1/7, Thu 17:39
>>>>>> Subject: Re: [ClusterLabs] Pending Fencing Actions shown in pcs status
>>>>>>
>>>>>> Hi, Steffen. Those attachments don't contain the CIB. They contain the
>>>>>> `pcs config` output. You can get the CIB with `pcs cluster cib > $(hostname).cib.xml`.
>>>>>> Granted, it's possible that this fence-action information wouldn't
>>>>>> be in the CIB at all. It might be stored in fencer memory.
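[Editor's note: since the fencing history lives in the fencer rather than the CIB, one quick way to compare what each node believes is to save `pcs status` output from every node and diff just the fencing section. A rough sketch; the helper below is hypothetical, not a pcs feature, and assumes the section layout quoted in this thread:]

```shell
# Hypothetical helper: extract the "Pending Fencing Actions" section
# from saved `pcs status` output so the three nodes' views can be
# diffed against each other.
pending_fence_actions() {
    awk '/^Pending Fencing Actions:/ { found = 1; next }
         found && NF == 0 { exit }   # section ends at the first blank line
         found { print }' "$1"
}

# Example input modelled on the output quoted in this thread:
cat > /tmp/status-node1.txt <<'EOF'
Pending Fencing Actions:
* reboot of kvm03-node02.avigol-gcs.dk pending: client=crmd.37819,
origin=kvm03-node03.avigol-gcs.dk

Daemon Status:
  corosync: active/enabled
EOF

pending_fence_actions /tmp/status-node1.txt > /tmp/pending-node1.txt
cat /tmp/pending-node1.txt
```

Running this over a status file captured on each node and diffing the results would show that only node 1 still carries the stale entry, matching what Steffen reports.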
>>>>>> On Thu, Jan 7, 2021 at 12:26 AM <renayama19661...@ybb.ne.jp> wrote:
>>>>>>
>>>>>>> Hi Steffen,
>>>>>>>
>>>>>>>> Here are the CIB settings attached (pcs config show) for all 3 of my nodes
>>>>>>>> (all 3 seem 100% identical); node03 is the DC.
>>>>>>> Thank you for the attachment.
>>>>>>>
>>>>>>> What is the scenario when this situation occurs?
>>>>>>> In what steps did the problem appear when fencing was performed (or failed)?
>>>>>>>
>>>>>>> Best Regards,
>>>>>>> Hideo Yamauchi.
>>>>>>>
>>>>>>> ----- Original Message -----
>>>>>>>> From: Steffen Vinther Sørensen <svint...@gmail.com>
>>>>>>>> To: renayama19661...@ybb.ne.jp; Cluster Labs - All topics related to open-source clustering welcomed <users@clusterlabs.org>
>>>>>>>> Cc:
>>>>>>>> Date: 2021/1/7, Thu 17:05
>>>>>>>> Subject: Re: [ClusterLabs] Pending Fencing Actions shown in pcs status
>>>>>>>>
>>>>>>>> Hi Hideo,
>>>>>>>>
>>>>>>>> Here are the CIB settings attached (pcs config show) for all 3 of my nodes
>>>>>>>> (all 3 seem 100% identical); node03 is the DC.
>>>>>>>>
>>>>>>>> Regards
>>>>>>>> Steffen
>>>>>>>>
>>>>>>>> On Thu, Jan 7, 2021 at 8:06 AM <renayama19661...@ybb.ne.jp> wrote:
>>>>>>>>> Hi Steffen,
>>>>>>>>> Hi Reid,
>>>>>>>>>
>>>>>>>>> I also checked the CentOS source rpm, and it seems to include a fix for the problem.
>>>>>>>>> As Steffen suggested, if you share your CIB settings, I might know something.
>>>>>>>>> If this issue is the same as the fix, the display will only appear on
>>>>>>>>> the DC node and will not affect operation.
>>>>>>>>> The pending actions shown will remain for a long time, but
>>>>>>>>> will not have a negative impact on the cluster.
>>>>>>>>>
>>>>>>>>> Best Regards,
>>>>>>>>> Hideo Yamauchi.
>>>>>>>>> ----- Original Message -----
>>>>>>>>> > From: Reid Wahl <nw...@redhat.com>
>>>>>>>>> > To: Cluster Labs - All topics related to open-source clustering welcomed <users@clusterlabs.org>
>>>>>>>>> > Cc:
>>>>>>>>> > Date: 2021/1/7, Thu 15:58
>>>>>>>>> > Subject: Re: [ClusterLabs] Pending Fencing Actions shown in pcs status
>>>>>>>>> >
>>>>>>>>> > It's supposedly fixed in that version.
>>>>>>>>> > - https://bugzilla.redhat.com/show_bug.cgi?id=1787749
>>>>>>>>> > - https://access.redhat.com/solutions/4713471
>>>>>>>>> >
>>>>>>>>> > So you may be hitting a different issue (unless there's a bug in
>>>>>>>>> > the pcmk 1.1 backport of the fix).
>>>>>>>>> >
>>>>>>>>> > I may be a little bit out of my area of knowledge here, but can you
>>>>>>>>> > share the CIBs from nodes 1 and 3? Maybe Hideo, Klaus, or Ken has some
>>>>>>>>> > insight.
>>>>>>>>> >
>>>>>>>>> > On Wed, Jan 6, 2021 at 10:53 PM Steffen Vinther Sørensen
>>>>>>>>> > <svint...@gmail.com> wrote:
>>>>>>>>> >>
>>>>>>>>> >> Hi Hideo,
>>>>>>>>> >>
>>>>>>>>> >> If the fix is not going to make it into the CentOS 7 pacemaker version,
>>>>>>>>> >> I guess the stable way to take advantage of it is to build the
>>>>>>>>> >> cluster on another OS than CentOS 7? A little late for that in this
>>>>>>>>> >> case though :)
>>>>>>>>> >>
>>>>>>>>> >> Regards
>>>>>>>>> >> Steffen
>>>>>>>>> >>
>>>>>>>>> >> On Thu, Jan 7, 2021 at 7:27 AM <renayama19661...@ybb.ne.jp> wrote:
>>>>>>>>> >> >
>>>>>>>>> >> > Hi Steffen,
>>>>>>>>> >> >
>>>>>>>>> >> > The fix pointed out by Reid is what is affecting this.
>>>>>>>>> >> >
>>>>>>>>> >> > Since the fencing action requested by the DC node exists only in the
>>>>>>>>> >> > DC node, such an event occurs.
>>>>>>>>> >> > You will need to use the fixed pacemaker to resolve the issue.
>>>>>>>>> >> >
>>>>>>>>> >> > Best Regards,
>>>>>>>>> >> > Hideo Yamauchi.
>>>>>>>>> >> >
>>>>>>>>> >> > ----- Original Message -----
>>>>>>>>> >> > > From: Reid Wahl <nw...@redhat.com>
>>>>>>>>> >> > > To: Cluster Labs - All topics related to open-source clustering welcomed <users@clusterlabs.org>
>>>>>>>>> >> > > Cc:
>>>>>>>>> >> > > Date: 2021/1/7, Thu 15:07
>>>>>>>>> >> > > Subject: Re: [ClusterLabs] Pending Fencing Actions shown in pcs status
>>>>>>>>> >> > >
>>>>>>>>> >> > > Hi, Steffen. Are your cluster nodes all running the same Pacemaker
>>>>>>>>> >> > > versions? This looks like Bug 5401 [1], which is fixed by upstream
>>>>>>>>> >> > > commit df71a07 [2]. I'm a little bit confused about why it only shows
>>>>>>>>> >> > > up on one out of three nodes, though.
>>>>>>>>> >> > >
>>>>>>>>> >> > > [1] https://bugs.clusterlabs.org/show_bug.cgi?id=5401
>>>>>>>>> >> > > [2] https://github.com/ClusterLabs/pacemaker/commit/df71a07
>>>>>>>>> >> > >
>>>>>>>>> >> > > On Tue, Jan 5, 2021 at 8:31 AM Steffen Vinther Sørensen
>>>>>>>>> >> > > <svint...@gmail.com> wrote:
>>>>>>>>> >> > >>
>>>>>>>>> >> > >> Hello
>>>>>>>>> >> > >>
>>>>>>>>> >> > >> node 1 is showing this in 'pcs status':
>>>>>>>>> >> > >>
>>>>>>>>> >> > >> Pending Fencing Actions:
>>>>>>>>> >> > >> * reboot of kvm03-node02.avigol-gcs.dk pending: client=crmd.37819,
>>>>>>>>> >> > >>   origin=kvm03-node03.avigol-gcs.dk
>>>>>>>>> >> > >>
>>>>>>>>> >> > >> node 2 and node 3 output no such thing (node 3 is the DC).
>>>>>>>>> >> > >>
>>>>>>>>> >> > >> Google is not much help. How can I investigate this further and get rid
>>>>>>>>> >> > >> of such a terrifying status message?
>>>>>>>>> >> > >>
>>>>>>>>> >> > >> Regards
>>>>>>>>> >> > >> Steffen
>>>>>>>>> >> > >
>>>>>>>>> >> > > --
>>>>>>>>> >> > > Regards,
>>>>>>>>> >> > >
>>>>>>>>> >> > > Reid Wahl, RHCA
>>>>>>>>> >> > > Senior Software Maintenance Engineer, Red Hat
>>>>>>>>> >> > > CEE - Platform Support Delivery - ClusterHA
>>>>>>>>> >
>>>>>>>>> > --
>>>>>>>>> > Regards,
>>>>>>>>> >
>>>>>>>>> > Reid Wahl, RHCA
>>>>>>>>> > Senior Software Maintenance Engineer, Red Hat
>>>>>>>>> > CEE - Platform Support Delivery - ClusterHA
>>>>>>
>>>>>> --
>>>>>> Regards,
>>>>>>
>>>>>> Reid Wahl, RHCA
>>>>>> Senior Software Maintenance Engineer, Red Hat
>>>>>> CEE - Platform Support Delivery - ClusterHA
_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/