Hello Hideo. I am impressed by how seriously this group takes care of issues.
For your information, the 'pending fencing action' status disappeared after taking the nodes offline. While the nodes were down I found some gfs2 errors, which were fixed by fsck.gfs2, and since then my cluster has been very stable. If I can provide more info, let me know.

/Steffen

On Tue, Jan 12, 2021 at 3:45 AM <renayama19661...@ybb.ne.jp> wrote:
> Hi Steffen,
>
> I've been experimenting with it since last weekend, but I haven't been
> able to reproduce the same situation.
> It seems the reproduction steps cannot be narrowed down yet.
>
> Could you attach a log of the problem?
>
> Best Regards,
> Hideo Yamauchi.
>
>
> ----- Original Message -----
> > From: Klaus Wenninger <kwenn...@redhat.com>
> > To: Steffen Vinther Sørensen <svint...@gmail.com>; Cluster Labs - All
> > topics related to open-source clustering welcomed <users@clusterlabs.org>
> > Cc:
> > Date: 2021/1/7, Thu 21:42
> > Subject: Re: [ClusterLabs] Pending Fencing Actions shown in pcs status
> >
> > On 1/7/21 1:13 PM, Steffen Vinther Sørensen wrote:
> >> Hi Klaus,
> >>
> >> Yes, then the status does sync to the other nodes. It also looks like
> >> there are some hostname-resolution problems in play here, maybe causing
> >> problems; here are my notes from restarting pacemaker etc.
> > I don't think there are hostname-resolution problems.
> > The messages you are seeing that look that way are caused
> > by using -EHOSTUNREACH as the error code to fail a pending
> > fence action, when a node that is just coming up sees
> > a pending action that is claimed to be handled by itself.
> > Back then I chose that error code because none that really
> > matched was available right away, and it was urgent for some
> > reason, so introducing something new was too risky at that point.
> > It would probably make sense to introduce something more descriptive.
> > Back then the issue was triggered by fenced crashing and
> > being restarted - so not a node restart, but just fenced
> > restarting.
> > And it looks as if building the failed-message failed somehow.
> > So that could be the reason why the pending action persists.
> > That would be something other than what we solved with Bug 5401.
> > But what triggers the logs below might as well just be a
> > follow-up issue after the Bug 5401 thing.
> > I will try to find time for a deeper look later today.
> >
> > Klaus
> >>
> >> pcs cluster standby kvm03-node02.avigol-gcs.dk
> >> pcs cluster stop kvm03-node02.avigol-gcs.dk
> >> pcs status
> >>
> >> Pending Fencing Actions:
> >> * reboot of kvm03-node02.avigol-gcs.dk pending: client=crmd.37819,
> >>   origin=kvm03-node03.avigol-gcs.dk
> >>
> >> # From logs on all 3 nodes:
> >> Jan 07 12:48:18 kvm03-node03 stonith-ng[37815]: warning: received pending action we are supposed to be the owner but it's not in our records -> fail it
> >> Jan 07 12:48:18 kvm03-node03 stonith-ng[37815]: error: Operation 'reboot' targeting kvm03-node02.avigol-gcs.dk on <no-one> for crmd.37...@kvm03-node03.avigol-gcs.dk.56a3018c: No route to host
> >> Jan 07 12:48:18 kvm03-node03 stonith-ng[37815]: error: stonith_construct_reply: Triggered assert at commands.c:2406 : request != NULL
> >> Jan 07 12:48:18 kvm03-node03 stonith-ng[37815]: warning: Can't create a sane reply
> >> Jan 07 12:48:18 kvm03-node03 crmd[37819]: notice: Peer kvm03-node02.avigol-gcs.dk was not terminated (reboot) by <anyone> on behalf of crmd.37819: No route to host
> >>
> >> pcs cluster start kvm03-node02.avigol-gcs.dk
> >> pcs status   (now outputs the same on all 3 nodes)
> >>
> >> Failed Fencing Actions:
> >> * reboot of kvm03-node02.avigol-gcs.dk failed: delegate=,
> >>   client=crmd.37819, origin=kvm03-node03.avigol-gcs.dk,
> >>   last-failed='Thu Jan  7 12:48:18 2021'
> >>
> >> pcs cluster unstandby kvm03-node02.avigol-gcs.dk
> >>
> >> # Now libvirtd refuses to start
> >>
> >> Jan 07 12:51:44 kvm03-node02 dnsmasq[20884]: read /etc/hosts - 8 addresses
> >> Jan 07 12:51:44 kvm03-node02 dnsmasq[20884]: read /var/lib/libvirt/dnsmasq/default.addnhosts - 0 addresses
> >> Jan 07 12:51:44 kvm03-node02 dnsmasq-dhcp[20884]: read /var/lib/libvirt/dnsmasq/default.hostsfile
> >> Jan 07 12:51:44 kvm03-node02 libvirtd[24091]: 2021-01-07 11:51:44.729+0000: 24160: info : libvirt version: 4.5.0, package: 36.el7_9.3 (CentOS BuildSystem <http://bugs.centos.org >, 2020-11-16-16:25:20, x86-01.bsys.centos.org)
> >> Jan 07 12:51:44 kvm03-node02 libvirtd[24091]: 2021-01-07 11:51:44.729+0000: 24160: info : hostname: kvm03-node02
> >> Jan 07 12:51:44 kvm03-node02 libvirtd[24091]: 2021-01-07 11:51:44.729+0000: 24160: error : qemuMonitorOpenUnix:392 : failed to connect to monitor socket: Connection refused
> >> Jan 07 12:51:44 kvm03-node02 libvirtd[24091]: 2021-01-07 11:51:44.729+0000: 24159: error : qemuMonitorOpenUnix:392 : failed to connect to monitor socket: Connection refused
> >> Jan 07 12:51:44 kvm03-node02 libvirtd[24091]: 2021-01-07 11:51:44.730+0000: 24161: error : qemuMonitorOpenUnix:392 : failed to connect to monitor socket: Connection refused
> >> Jan 07 12:51:44 kvm03-node02 libvirtd[24091]: 2021-01-07 11:51:44.730+0000: 24162: error : qemuMonitorOpenUnix:392 : failed to connect to monitor socket: Connection refused
> >>
> >> pcs status
> >>
> >> Failed Resource Actions:
> >> * libvirtd_start_0 on kvm03-node02.avigol-gcs.dk 'unknown error' (1):
> >>   call=142, status=complete, exitreason='',
> >>   last-rc-change='Thu Jan  7 12:51:44 2021', queued=0ms, exec=2157ms
> >>
> >> Failed Fencing Actions:
> >> * reboot of kvm03-node02.avigol-gcs.dk failed: delegate=,
> >>   client=crmd.37819, origin=kvm03-node03.avigol-gcs.dk,
> >>   last-failed='Thu Jan  7 12:48:18 2021'
> >>
> >>
> >> # from /etc/hosts on all 3 nodes:
> >>
> >> 172.31.0.31 kvm03-node01 kvm03-node01.avigol-gcs.dk
> >> 172.31.0.32 kvm03-node02 kvm03-node02.avigol-gcs.dk
> >> 172.31.0.33 kvm03-node03 kvm03-node03.avigol-gcs.dk
> >>
> >> On Thu, Jan 7, 2021 at 11:15 AM Klaus Wenninger <kwenn...@redhat.com> wrote:
> >>> Hi Steffen,
> >>>
> >>> If you just see the leftover pending-action on one node,
> >>> it would be interesting whether restarting pacemaker on
> >>> one of the other nodes syncs it to all of the nodes.
> >>>
> >>> Regards,
> >>> Klaus
> >>>
> >>> On 1/7/21 9:54 AM, renayama19661...@ybb.ne.jp wrote:
> >>>> Hi Steffen,
> >>>>
> >>>>> Unfortunately I am not sure about the exact scenario. But I have been doing
> >>>>> some recent experiments with node standby/unstandby and stop/start, to
> >>>>> get the procedures right for updating node rpms etc.
> >>>>>
> >>>>> Later I noticed the uncomforting "pending fencing actions" status msg.
> >>>> Okay!
> >>>>
> >>>> I will repeat the standby and unstandby steps in the same way to check.
> >>>> We will start checking after tomorrow, so I think it will take some time, until next week.
> >>>>
> >>>> Many thanks,
> >>>> Hideo Yamauchi.
> >>>>
> >>>>
> >>>> ----- Original Message -----
> >>>>> From: "renayama19661...@ybb.ne.jp" <renayama19661...@ybb.ne.jp>
> >>>>> To: Reid Wahl <nw...@redhat.com>; Cluster Labs - All topics related to open-source clustering welcomed <users@clusterlabs.org>
> >>>>> Cc:
> >>>>> Date: 2021/1/7, Thu 17:51
> >>>>> Subject: Re: [ClusterLabs] Pending Fencing Actions shown in pcs status
> >>>>>
> >>>>> Hi Steffen,
> >>>>> Hi Reid,
> >>>>>
> >>>>> The fencing history is kept inside stonith-ng and is not written to the cib.
> >>>>> However, getting the entire cib and having it sent to us will help
> >>>>> us reproduce the problem.
> >>>>>
> >>>>> Best Regards,
> >>>>> Hideo Yamauchi.
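Since the fencing history lives only in the fencer's memory, it can be inspected (and on newer pacemaker builds cleaned up) with stonith_admin. A sketch, assuming a pacemaker recent enough to support history cleanup (roughly 1.1.19 / 2.0.0 and later); the run wrapper only echoes each command, so the sketch is safe to execute anywhere — drop it to run for real on a cluster node:

```shell
# Sketch: inspect/clean the fencer's in-memory fencing history.
# 'run' only prints the command it is given; remove the wrapper to execute.
run() { echo "+ $*"; }

# Show the fencing history for all nodes ('*' = every target)
run stonith_admin --history '*' --verbose

# Clean up the recorded fencing history (requires fence-history
# cleanup support in pacemaker, ~1.1.19 / 2.0.0 and later)
run stonith_admin --history '*' --cleanup
```

On the CentOS 7 builds discussed in this thread the cleanup option may not be available, in which case the stale entry only disappears when the fencer's state is rebuilt, as observed later in the thread.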
> >>>>>
> >>>>> ----- Original Message -----
> >>>>>> From: Reid Wahl <nw...@redhat.com>
> >>>>>> To: renayama19661...@ybb.ne.jp; Cluster Labs - All topics related to open-source clustering welcomed <users@clusterlabs.org>
> >>>>>> Date: 2021/1/7, Thu 17:39
> >>>>>> Subject: Re: [ClusterLabs] Pending Fencing Actions shown in pcs status
> >>>>>>
> >>>>>> Hi, Steffen. Those attachments don't contain the CIB. They contain the
> >>>>>> `pcs config` output. You can get the CIB with
> >>>>>> `pcs cluster cib > $(hostname).cib.xml`.
> >>>>>> Granted, it's possible that this fence-action information wouldn't
> >>>>>> be in the CIB at all. It might be stored in fencer memory.
> >>>>>> On Thu, Jan 7, 2021 at 12:26 AM <renayama19661...@ybb.ne.jp> wrote:
> >>>>>>
> >>>>>> Hi Steffen,
> >>>>>>>> Here are the CIB settings attached (pcs config show) for all 3 of my nodes
> >>>>>>>> (all 3 seem 100% identical); node03 is the DC.
> >>>>>>> Thank you for the attachment.
> >>>>>>>
> >>>>>>> What is the scenario when this situation occurs?
> >>>>>>> In what steps did the problem appear when fencing was performed (or failed)?
> >>>>>>>
> >>>>>>> Best Regards,
> >>>>>>> Hideo Yamauchi.
> >>>>>>>
> >>>>>>>
> >>>>>>> ----- Original Message -----
> >>>>>>>> From: Steffen Vinther Sørensen <svint...@gmail.com>
> >>>>>>>> To: renayama19661...@ybb.ne.jp; Cluster Labs - All topics related to open-source clustering welcomed <users@clusterlabs.org>
> >>>>>>>> Cc:
> >>>>>>>> Date: 2021/1/7, Thu 17:05
> >>>>>>>> Subject: Re: [ClusterLabs] Pending Fencing Actions shown in pcs status
> >>>>>>>>
> >>>>>>>> Hi Hideo,
> >>>>>>>>
> >>>>>>>> Here are the CIB settings attached (pcs config show) for all 3 of my nodes
> >>>>>>>> (all 3 seem 100% identical); node03 is the DC.
> >>>>>>>>
> >>>>>>>> Regards
> >>>>>>>> Steffen
> >>>>>>>>
> >>>>>>>> On Thu, Jan 7, 2021 at 8:06 AM <renayama19661...@ybb.ne.jp> wrote:
> >>>>>>>>> Hi Steffen,
> >>>>>>>>> Hi Reid,
> >>>>>>>>>
> >>>>>>>>> I also checked the CentOS source rpm, and it seems to include a fix for the problem.
> >>>>>>>>> As Steffen suggested, if you share your CIB settings, I might know something.
> >>>>>>>>> If this issue is the same as that fix, the entry will only be displayed on
> >>>>>>>>> the DC node and will not affect operation.
> >>>>>>>>> The pending actions shown will remain for a long time, but will not have a
> >>>>>>>>> negative impact on the cluster.
> >>>>>>>>>
> >>>>>>>>> Best Regards,
> >>>>>>>>> Hideo Yamauchi.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> ----- Original Message -----
> >>>>>>>>> > From: Reid Wahl <nw...@redhat.com>
> >>>>>>>>> > To: Cluster Labs - All topics related to open-source clustering welcomed <users@clusterlabs.org>
> >>>>>>>>> > Cc:
> >>>>>>>>> > Date: 2021/1/7, Thu 15:58
> >>>>>>>>> > Subject: Re: [ClusterLabs] Pending Fencing Actions shown in pcs status
> >>>>>>>>> >
> >>>>>>>>> > It's supposedly fixed in that version.
> >>>>>>>>> > - https://bugzilla.redhat.com/show_bug.cgi?id=1787749
> >>>>>>>>> > - https://access.redhat.com/solutions/4713471
> >>>>>>>>> >
> >>>>>>>>> > So you may be hitting a different issue (unless there's a bug in the
> >>>>>>>>> > pcmk 1.1 backport of the fix).
> >>>>>>>>> >
> >>>>>>>>> > I may be a little bit out of my area of knowledge here, but can you
> >>>>>>>>> > share the CIBs from nodes 1 and 3? Maybe Hideo, Klaus, or Ken has some
> >>>>>>>>> > insight.
> >>>>>>>>> >
> >>>>>>>>> > On Wed, Jan 6, 2021 at 10:53 PM Steffen Vinther Sørensen
> >>>>>>>>> > <svint...@gmail.com> wrote:
> >>>>>>>>> >>
> >>>>>>>>> >> Hi Hideo,
> >>>>>>>>> >>
> >>>>>>>>> >> If the fix is not going to make it into the CentOS7 pacemaker version,
> >>>>>>>>> >> I guess the stable approach to take advantage of it is to build the
> >>>>>>>>> >> cluster on another OS than CentOS7? A little late for that in this
> >>>>>>>>> >> case though :)
> >>>>>>>>> >>
> >>>>>>>>> >> Regards
> >>>>>>>>> >> Steffen
> >>>>>>>>> >>
> >>>>>>>>> >> On Thu, Jan 7, 2021 at 7:27 AM <renayama19661...@ybb.ne.jp> wrote:
> >>>>>>>>> >> >
> >>>>>>>>> >> > Hi Steffen,
> >>>>>>>>> >> >
> >>>>>>>>> >> > The fix pointed out by Reid is affecting this.
> >>>>>>>>> >> >
> >>>>>>>>> >> > Since the fencing action requested by the DC node exists only in the
> >>>>>>>>> >> > DC node, such an event occurs.
> >>>>>>>>> >> > You will need to use the fixed pacemaker to resolve the issue.
> >>>>>>>>> >> >
> >>>>>>>>> >> > Best Regards,
> >>>>>>>>> >> > Hideo Yamauchi.
> >>>>>>>>> >> >
> >>>>>>>>> >> >
> >>>>>>>>> >> > ----- Original Message -----
> >>>>>>>>> >> > > From: Reid Wahl <nw...@redhat.com>
> >>>>>>>>> >> > > To: Cluster Labs - All topics related to open-source clustering welcomed <users@clusterlabs.org>
> >>>>>>>>> >> > > Cc:
> >>>>>>>>> >> > > Date: 2021/1/7, Thu 15:07
> >>>>>>>>> >> > > Subject: Re: [ClusterLabs] Pending Fencing Actions shown in pcs status
> >>>>>>>>> >> > >
> >>>>>>>>> >> > > Hi, Steffen. Are your cluster nodes all running the same Pacemaker
> >>>>>>>>> >> > > versions?
> >>>>>>>>> >> > > This looks like Bug 5401[1], which is fixed by upstream
> >>>>>>>>> >> > > commit df71a07[2]. I'm a little bit confused about why it only shows
> >>>>>>>>> >> > > up on one out of three nodes though.
> >>>>>>>>> >> > >
> >>>>>>>>> >> > > [1] https://bugs.clusterlabs.org/show_bug.cgi?id=5401
> >>>>>>>>> >> > > [2] https://github.com/ClusterLabs/pacemaker/commit/df71a07
> >>>>>>>>> >> > >
> >>>>>>>>> >> > > On Tue, Jan 5, 2021 at 8:31 AM Steffen Vinther Sørensen
> >>>>>>>>> >> > > <svint...@gmail.com> wrote:
> >>>>>>>>> >> > >>
> >>>>>>>>> >> > >> Hello
> >>>>>>>>> >> > >>
> >>>>>>>>> >> > >> node 1 is showing this in 'pcs status'
> >>>>>>>>> >> > >>
> >>>>>>>>> >> > >> Pending Fencing Actions:
> >>>>>>>>> >> > >> * reboot of kvm03-node02.avigol-gcs.dk pending:
> >>>>>>>>> >> > >>   client=crmd.37819, origin=kvm03-node03.avigol-gcs.dk
> >>>>>>>>> >> > >>
> >>>>>>>>> >> > >> node 2 and node 3 output no such thing (node 3 is the DC)
> >>>>>>>>> >> > >>
> >>>>>>>>> >> > >> Google is not much help; how can I investigate this further and get rid
> >>>>>>>>> >> > >> of such a terrifying status message?
> >>>>>>>>> >> > >>
> >>>>>>>>> >> > >> Regards
> >>>>>>>>> >> > >> Steffen
> >>>>>>>>> >> > >
> >>>>>>>>> >> > > --
> >>>>>>>>> >> > > Regards,
> >>>>>>>>> >> > >
> >>>>>>>>> >> > > Reid Wahl, RHCA
> >>>>>>>>> >> > > Senior Software Maintenance Engineer, Red Hat
> >>>>>>>>> >> > > CEE - Platform Support Delivery - ClusterHA
> >>>>>>>>> >> > >
> >>>>>>>>> >> >
> >>>>>> --
> >>>>>> Regards,
> >>>>>>
> >>>>>> Reid Wahl, RHCA
> >>>>>> Senior Software Maintenance Engineer, Red Hat
> >>>>>> CEE - Platform Support Delivery - ClusterHA
> >>>>>>
_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/