On Wed, Dec 21, 2022 at 11:26 AM Reid Wahl <nw...@redhat.com> wrote:

> On Wed, Dec 21, 2022 at 2:15 AM Ulrich Windl
> <ulrich.wi...@rz.uni-regensburg.de> wrote:
> >
> > Hi!
> >
> > I wonder: Could the error message be triggered by adding an exclusive
> > mandatory lock on the ip binary?
> > If that triggers the bug, I'm rather sure that the error message is bad.
> > Shouldn't that be EWOULDBLOCK then?
>
> I did some cursory reading earlier today, and it seems that ETXTBSY is
> becoming less common: https://lwn.net/Articles/866493/
>
> Either way, that would be a question for kernel maintainers.
>
Maybe the network-stack folks there, or somebody with deeper insight into how the ip tool currently interacts with the kernel. Without knowing any details: certain things might be handled by calling BPF binaries, and since ip is the userspace application, the error might still be reported against ip even if it was actually about a BPF binary to be executed. I'm thinking of race conditions on that front ...

> >
> > (I have no idea how Sophos AV works, though. If they open the files to check
> > in write-mode, it's really stupid then IMHO)
> >
> > Regards,
> > Ulrich
> >
> >
> > >>> Reid Wahl <nw...@redhat.com> wrote on 21.12.2022 at 10:19 in message
> > <capiuu9-fisqxapf123ersrwmwrake2nnk6pgwwfq4fmfsnx...@mail.gmail.com>:
> > > On Wed, Dec 21, 2022 at 12:24 AM Thomas CAS <t...@ikoula.com> wrote:
> > >>
> > >> Ken,
> > >>
> > >> Antivirus (sophos-av) is running but not in "real time access scanning";
> > >> the scheduled scan is however at 9pm every day.
> > >> 7 minutes later, we got these alerts.
> > >> The anti virus may indeed be the cause.
> > >
> > > I see. That does seem fairly likely. At least, there's no other
> > > obvious candidate for the cause.
> > >
> > > I used to work on a customer-facing support team for the ClusterLabs
> > > suite, and we received a fair number of cases where bizarre issues
> > > (such as hangs and access errors) were apparently caused by an
> > > antivirus. In those cases, all other usual lines of investigation were
> > > exhausted, and when we asked the customer to disable their AV, the
> > > issue disappeared. This happened with several different AV products.
> > >
> > > I can't say with any certainty that the AV is causing your issue, and
> > > I know it's frustrating that you won't know whether any given
> > > intervention worked, since this only happens once every few months.
> > > > > > You may want to either exclude certain files from the scan, or write a > > > short script to place the cluster in maintenance mode before the scan > > > and take it out of maintenance after the scan is complete. > > > > > >> > > >> I had the case on December 13 (with systemctl here): > > >> > > >> pacemaker.log-20221217.gz:Dec 13 21:07:53 wd-websqlng01 > pacemaker-controld > > > > > [5082] (process_lrm_event) notice: > wd-websqlng01-NGINX_monitor_15000:454 [ > > > > > /etc/init.d/nginx: 33: /lib/lsb/init-functions.d/40-systemd: > systemctl: Text > > > > > file busy\n/etc/init.d/nginx: 82: /lib/lsb/init-functions.d/40-systemd: > > > /bin/systemctl: Text file busy\n ] > > >> pacemaker.log-20221217.gz:Dec 13 21:07:53 wd-websqlng01 > pacemaker-controld > > > > > [5082] (process_lrm_event) notice: > wd-websqlng01-NGINX_monitor_15000:454 [ > > > > > /etc/init.d/nginx: 33: /lib/lsb/init-functions.d/40-systemd: > systemctl: Text > > > > > file busy\n/etc/init.d/nginx: 82: /lib/lsb/init-functions.d/40-systemd: > > > /bin/systemctl: Text file busy\n ] > > >> > > >> After, this happens rarely, we had the case in August: > > >> > > >> pacemaker.log-20220826.gz:Aug 25 21:06:31 wd-websqlng01 > pacemaker-controld > > > > > [3718] (process_lrm_event) notice: > > > wd-websqlng01-NGINX-VIP-232_monitor_10000:2877 [ > > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1: > > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: uname: Text file > > > busy\nocf-exit-reason:IPaddr2 only supported Linux.\n ] > > >> pacemaker.log-20220826.gz:Aug 25 21:06:31 wd-websqlng01 > pacemaker-controld > > > > > [3718] (process_lrm_event) notice: > > > wd-websqlng01-NGINX-VIP-231_monitor_10000:2880 [ > > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1: > > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: uname: Text file > > > busy\nocf-exit-reason:IPaddr2 only supported Linux.\n ] > > >> > > >> It's always around 9:00-9:07 pm, > > >> I'll move the virus scan to 10pm and see. 
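The timing pattern described above (errors clustered a few minutes after the 9 pm scan) could be checked across rotated logs with a small script. This is only a minimal Python sketch, assuming syslog-style timestamps like the ones in the excerpts; the function name `busy_minutes` is hypothetical and not part of Pacemaker.

```python
import re
from collections import Counter

# Matches a syslog-style timestamp such as "Dec 13 21:07:53 ".
# The format is an assumption based on the log excerpts in this thread.
TIMESTAMP_RE = re.compile(
    r"(?P<month>[A-Z][a-z]{2}) +(?P<day>\d{1,2}) (?P<time>\d{2}:\d{2}):\d{2} "
)


def busy_minutes(lines):
    """Count 'Text file busy' log lines per (month, day, 'HH:MM')."""
    counts = Counter()
    for line in lines:
        if "Text file busy" not in line:
            continue
        match = TIMESTAMP_RE.search(line)
        if match:
            key = (match["month"], int(match["day"]), match["time"])
            counts[key] += 1
    return counts
```

Feeding it the decompressed log lines and sorting the result by key would show whether every occurrence falls in the minutes after the scheduled scan, before and after the scan is moved.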
> > >
> > > That also sounds like a good plan to confirm the cause :) It might
> > > take a while to find out though.
> > >
> > >>
> > >> Thanks,
> > >> Best regards,
> > >>
> > >> Thomas Cas | Technicien du support infogérance
> > >> PHONE : +33 3 51 25 23 26 WEB : www.ikoula.com/en
> > >> IKOULA Data Center 34 rue Pont Assy - 51100 Reims - FRANCE
> > >> Before printing this letter, think about the impact on the environment!
> > >>
> > >> -----Original Message-----
> > >> From: Reid Wahl <nw...@redhat.com>
> > >> Sent: Tuesday, December 20, 2022 20:34
> > >> To: Cluster Labs - All topics related to open-source clustering welcomed <users@clusterlabs.org>
> > >> Cc: Ken Gaillot <kgail...@redhat.com>; Service Infogérance <infogera...@ikoula.com>
> > >> Subject: Re: [ClusterLabs] Bug pacemaker with multiple IP
> > >>
> > >> On Tue, Dec 20, 2022 at 6:25 AM Thomas CAS <t...@ikoula.com> wrote:
> > >> >
> > >> > Hello Ken,
> > >> >
> > >> > Thanks for your answer.
> > >> > There was no update running at the time of the bug, which is why I thought
> > >> > that having too many IPs caused this type of error.
> > >> > The /usr/sbin/ip executable was not being modified either.
> > >> >
> > >> > We have many clusters, and only this one has so many IPs and this problem.
> > >>
> > >> How often does this happen, and is it reliably reproducible under any
> > >> circumstances? Any antivirus software running? It'd be nice to check
> > >> something like lsof or strace while it's happening, but that may not be
> > >> feasible if it's sporadic; running those at every monitor would generate
> > >> lots of logs.
> > >>
> > >> AFAICT, having multiple processes execute (or read) the `ip` binary
> > >> simultaneously *shouldn't* cause problems, as long as nothing opens it
> > >> for write.
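The "nothing opens it for write" condition above is what a tool like lsof would reveal. As a lighter-weight illustration, here is a rough Python sketch that walks /proc to list processes holding a given file open for writing. It assumes Linux with procfs mounted and enough privileges to read other processes' fd directories (typically root); the function name `writers_of` is hypothetical, and processes that cannot be inspected are silently skipped.

```python
import os

# Low two bits of the open flags encode the access mode:
# 0 = read-only, 1 = write-only, 2 = read-write.
ACCESS_MODE_MASK = 0o3


def writers_of(path):
    """Return sorted PIDs that hold `path` open for writing (Linux /proc only)."""
    target = os.path.realpath(path)
    pids = set()
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # not a process directory
        fd_dir = f"/proc/{entry}/fd"
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission
        for fd in fds:
            try:
                if os.readlink(f"{fd_dir}/{fd}") != target:
                    continue
                # fdinfo contains a line like "flags:\t0100002" (octal).
                with open(f"/proc/{entry}/fdinfo/{fd}") as info:
                    flags = next(
                        int(line.split()[1], 8)
                        for line in info
                        if line.startswith("flags:")
                    )
            except (OSError, StopIteration, ValueError):
                continue
            if (flags & ACCESS_MODE_MASK) in (os.O_WRONLY, os.O_RDWR):
                pids.add(int(entry))
    return sorted(pids)
```

Calling something like `writers_of("/usr/sbin/ip")` from a cron job around the scan window would be far cheaper than running strace on every monitor, though it only samples a moment in time and could easily miss a short-lived writer.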
> > >>
> > >> >
> > >> > Best regards,
> > >> >
> > >> > Thomas Cas | Technicien du support infogérance
> > >> > PHONE : +33 3 51 25 23 26 WEB : http://www.ikoula.com/en
> > >> > IKOULA Data Center 34 rue Pont Assy - 51100 Reims - FRANCE
> > >> > Before printing this letter, think about the impact on the environment!
> > >> >
> > >> > -----Original Message-----
> > >> > From: Ken Gaillot <kgail...@redhat.com>
> > >> > Sent: Monday, December 19, 2022 22:08
> > >> > To: Cluster Labs - All topics related to open-source clustering welcomed <users@clusterlabs.org>
> > >> > Cc: Service Infogérance <infogera...@ikoula.com>
> > >> > Subject: Re: [ClusterLabs] Bug pacemaker with multiple IP
> > >> >
> > >> > On Mon, 2022-12-19 at 09:48 +0000, Thomas CAS wrote:
> > >> > > Hello Clusterlabs,
> > >> > >
> > >> > > I would like to report a bug on Pacemaker with the "IPaddr2" resource:
> > >> > >
> > >> > > OS: Debian 10
> > >> > > Kernel: Linux wd-websqlng01 4.19.0-18-amd64 #1 SMP Debian 4.19.208-1 (2021-09-29) x86_64 GNU/Linux
> > >> > > Pacemaker version: 2.0.1-5+deb10u2
> > >> > >
> > >> > > You will find the configuration of our cluster with 2 nodes attached.
> > >> > > > > >> > > Bug : > > >> > > > > >> > > We have several IP configured in the cluster configuration (12) > > >> > > Sometimes the cluster is unstable with the following errors in the > > >> > > pacemaker logs: > > >> > > > > >> > > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079] > > >> > > (operation_finished) notice: NGINX-VIP- > > >> > > 232_monitor_10000:28835:stderr [ > > >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1: > > >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ] > > >> > > > >> > This doesn't sound like a bug in the agent; "Text file busy" > suggests > > that > > > the system "ip" command is being modified while the command is > running. Is a > > > > > software update happening when the problem occurs? > > >> > > > >> > I'm not sure whether there's some other situation that could cause > that > > > error, but simply executing the command a bunch of times simultaneously > > > shouldn't cause it as far as I know. > > >> > > > >> > If simultaneous monitors is somehow causing the problem, you should > be > > able > > > to work around it by using different intervals for different monitors. 
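The workaround of giving each monitor a different interval can be planned with a few lines of code. The following Python sketch is only an illustration: the helper name `staggered_intervals`, the 10-second base (taken from the `_monitor_10000` operations in the logs) and the 1-second step are all assumptions, and the resource names are examples lifted from the log excerpts.

```python
def staggered_intervals(resources, base_s=10, step_s=1):
    """Map each resource name to a distinct monitor interval string.

    The first resource gets base_s seconds, the next base_s + step_s,
    and so on, so no two monitors share the same period.
    """
    return {
        name: f"{base_s + index * step_s}s"
        for index, name in enumerate(resources)
    }


if __name__ == "__main__":
    vips = ["NGINX-VIP", "NGINX-VIP-231", "NGINX-VIP-232"]
    for name, interval in staggered_intervals(vips).items():
        print(f"{name}: monitor interval={interval}")
```

Distinct periods make the monitors drift apart rather than fire in lockstep, but two of them can still coincide from time to time, so this would reduce simultaneous executions rather than eliminate them.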
> > >> > > > >> > > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079] > > >> > > (operation_finished) notice: NGINX-VIP- > > >> > > 239_monitor_10000:28877:stderr [ > > >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 709: > > >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ] > > >> > > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079] > > >> > > (operation_finished) notice: NGINX-VIP- > > >> > > 239_monitor_10000:28877:stderr [ > > >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1: > > >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ] > > >> > > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079] > > >> > > (operation_finished) notice: NGINX-VIP- > > >> > > 234_monitor_10000:28830:stderr [ > > >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1: > > >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ] > > >> > > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079] > > >> > > (operation_finished) notice: NGINX-VIP- > > >> > > 231_monitor_10000:28900:stderr [ > > >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 709: > > >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ] > > >> > > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079] > > >> > > (operation_finished) notice: NGINX-VIP- > > >> > > 231_monitor_10000:28900:stderr [ > > >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1: > > >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ] > > >> > > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079] > > >> > > (operation_finished) notice: NGINX-VIP- > > >> > > 235_monitor_10000:28905:stderr [ > > >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 709: > > >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ] > > >> > > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079] > > >> > > (operation_finished) notice: NGINX-VIP- > > >> > > 235_monitor_10000:28905:stderr [ > > >> > > 
/usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1: > > >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ] > > >> > > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079] > > >> > > (operation_finished) notice: NGINX-VIP- > > >> > > 237_monitor_10000:28890:stderr [ > > >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 709: > > >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ] > > >> > > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079] > > >> > > (operation_finished) notice: NGINX-VIP- > > >> > > 237_monitor_10000:28890:stderr [ > > >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1: > > >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ] > > >> > > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079] > > >> > > (operation_finished) notice: NGINX-VIP- > > >> > > 238_monitor_10000:28876:stderr [ > > >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 709: > > >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ] > > >> > > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079] > > >> > > (operation_finished) notice: NGINX-VIP- > > >> > > 238_monitor_10000:28876:stderr [ > > >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1: > > >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ] > > >> > > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079] > > >> > > (operation_finished) notice: > NGINX-VIP_monitor_10000:28880:stderr [ > > >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 709: > > >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ] > > >> > > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079] > > >> > > (operation_finished) notice: > NGINX-VIP_monitor_10000:28880:stderr [ > > >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1: > > >> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ] > > >> > > > > >> > > The reason is that there are a lot of IPs configured and if the > > >> > > monitors take place at the same 
time it causes this type of error.
> > >> > >
> > >> > > Best regards,
> > >> > >
> > >> > > Thomas Cas | Technicien du support infogérance
> > >> > > PHONE : +33 3 51 25 23 26 WEB : http://www.ikoula.com/en
> > >> > > IKOULA Data Center 34 rue Pont Assy - 51100 Reims - FRANCE
> > >> > > Before printing this letter, think about the impact on the environment!
> > >> > >
> > >> > > _______________________________________________
> > >> > > Manage your subscription:
> > >> > > https://lists.clusterlabs.org/mailman/listinfo/users
> > >> > >
> > >> > > ClusterLabs home: https://www.clusterlabs.org/
> > >> > --
> > >> > Ken Gaillot <kgail...@redhat.com>
> > >>
> > >> --
> > >> Regards,
> > >>
> > >> Reid Wahl (He/Him)
> > >> Senior Software Engineer, Red Hat
> > >> RHEL High Availability - Pacemaker
_______________________________________________ Manage your subscription: https://lists.clusterlabs.org/mailman/listinfo/users ClusterLabs home: https://www.clusterlabs.org/