On Wed, Dec 21, 2022 at 2:15 AM Ulrich Windl <ulrich.wi...@rz.uni-regensburg.de> wrote:
>
> Hi!
>
> I wonder: Could the error message be triggered by adding an exclusive mandatory
> lock on the ip binary?
> If that triggers the bug, I'm rather sure that the error message is bad.
> Shouldn't that be EWOULDBLOCK then?

I did some cursory reading earlier today, and it seems that ETXTBSY is becoming less common: https://lwn.net/Articles/866493/

Either way, that would be a question for kernel maintainers.

> (I have no idea how Sophos AV works, though. If they open the files to check
> in write-mode, it's really stupid then IMHO)
>
> Regards,
> Ulrich
>
> >>> Reid Wahl <nw...@redhat.com> wrote on 21.12.2022 at 10:19 in message
> <capiuu9-fisqxapf123ersrwmwrake2nnk6pgwwfq4fmfsnx...@mail.gmail.com>:
> > On Wed, Dec 21, 2022 at 12:24 AM Thomas CAS <t...@ikoula.com> wrote:
> >>
> >> Ken,
> >>
> >> Antivirus (sophos-av) is running but not in "real time access scanning"; the
> >> scheduled scan, however, is at 9pm every day.
> >> 7 minutes later, we got these alerts.
> >> The antivirus may indeed be the cause.
> >
> > I see. That does seem fairly likely. At least, there's no other
> > obvious candidate for the cause.
> >
> > I used to work on a customer-facing support team for the ClusterLabs
> > suite, and we received a fair number of cases where bizarre issues
> > (such as hangs and access errors) were apparently caused by an
> > antivirus. In those cases, all other usual lines of investigation were
> > exhausted, and when we asked the customer to disable their AV, the
> > issue disappeared. This happened with several different AV products.
> >
> > I can't say with any certainty that the AV is causing your issue, and
> > I know it's frustrating that you won't know whether any given
> > intervention worked, since this only happens once every few months.
> >
> > You may want to either exclude certain files from the scan, or write a
> > short script to place the cluster in maintenance mode before the scan
> > and take it out of maintenance after the scan is complete.
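
The maintenance-mode wrapper suggested in the quoted reply could look roughly like the sketch below. It assumes Pacemaker's `crm_attribute` tool is on the PATH and that the scheduled scan can be started as a blocking command. The names `maintenance_cmd`, `run_scan_in_maintenance`, and the `run` parameter are hypothetical, and the thread does not say how the Sophos scan is actually launched, so the scan command is left to the caller.

```python
import subprocess
from typing import Callable, List, Sequence


def maintenance_cmd(enabled: bool) -> List[str]:
    """Build the crm_attribute call that sets the cluster-wide
    maintenance-mode property (assumption: this is how the cluster
    is put into and taken out of maintenance)."""
    value = "true" if enabled else "false"
    return ["crm_attribute", "--type", "crm_config",
            "--name", "maintenance-mode", "--update", value]


def run_scan_in_maintenance(scan_cmd: Sequence[str],
                            run: Callable = subprocess.run) -> None:
    """Enable maintenance mode, run the scan, then always disable it again."""
    run(maintenance_cmd(True), check=True)
    try:
        # scan_cmd is supplied by the caller; no real scanner CLI is assumed.
        run(list(scan_cmd), check=True)
    finally:
        # Leave maintenance mode even if the scan command fails.
        run(maintenance_cmd(False), check=True)
```

A cron job at the scan time could call `run_scan_in_maintenance([...])` with whatever command starts the scan on that host. While maintenance mode is on, the cluster neither monitors nor recovers resources, so a real failure during the scan window would go unhandled until the scan ends.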
> >
> >>
> >> I had the case on December 13 (with systemctl here):
> >>
> >> pacemaker.log-20221217.gz:Dec 13 21:07:53 wd-websqlng01 pacemaker-controld [5082] (process_lrm_event) notice: wd-websqlng01-NGINX_monitor_15000:454 [ /etc/init.d/nginx: 33: /lib/lsb/init-functions.d/40-systemd: systemctl: Text file busy\n/etc/init.d/nginx: 82: /lib/lsb/init-functions.d/40-systemd: /bin/systemctl: Text file busy\n ]
> >> pacemaker.log-20221217.gz:Dec 13 21:07:53 wd-websqlng01 pacemaker-controld [5082] (process_lrm_event) notice: wd-websqlng01-NGINX_monitor_15000:454 [ /etc/init.d/nginx: 33: /lib/lsb/init-functions.d/40-systemd: systemctl: Text file busy\n/etc/init.d/nginx: 82: /lib/lsb/init-functions.d/40-systemd: /bin/systemctl: Text file busy\n ]
> >>
> >> That said, this happens rarely; we had the case in August:
> >>
> >> pacemaker.log-20220826.gz:Aug 25 21:06:31 wd-websqlng01 pacemaker-controld [3718] (process_lrm_event) notice: wd-websqlng01-NGINX-VIP-232_monitor_10000:2877 [ /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1: /usr/lib/ocf/resource.d/heartbeat/IPaddr2: uname: Text file busy\nocf-exit-reason:IPaddr2 only supported Linux.\n ]
> >> pacemaker.log-20220826.gz:Aug 25 21:06:31 wd-websqlng01 pacemaker-controld [3718] (process_lrm_event) notice: wd-websqlng01-NGINX-VIP-231_monitor_10000:2880 [ /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1: /usr/lib/ocf/resource.d/heartbeat/IPaddr2: uname: Text file busy\nocf-exit-reason:IPaddr2 only supported Linux.\n ]
> >>
> >> It's always around 9:00-9:07 pm,
> >> I'll move the virus scan to 10pm and see.
> >
> > That also sounds like a good plan to confirm the cause :) It might
> > take a while to find out though.
> >
> >>
> >> Thanks,
> >> Best regards,
> >>
> >> Thomas Cas | Managed services support technician
> >> PHONE : +33 3 51 25 23 26 WEB : www.ikoula.com/en
> >> IKOULA Data Center 34 rue Pont Assy - 51100 Reims - FRANCE
> >> Before printing this letter, think about the impact on the environment!
> >>
> >> -----Original Message-----
> >> From: Reid Wahl <nw...@redhat.com>
> >> Sent: Tuesday, December 20, 2022 20:34
> >> To: Cluster Labs - All topics related to open-source clustering welcomed <users@clusterlabs.org>
> >> Cc: Ken Gaillot <kgail...@redhat.com>; Service Infogérance <infogera...@ikoula.com>
> >> Subject: Re: [ClusterLabs] Bug pacemaker with multiple IP
> >>
> >> On Tue, Dec 20, 2022 at 6:25 AM Thomas CAS <t...@ikoula.com> wrote:
> >> >
> >> > Hello Ken,
> >> >
> >> > Thanks for your answer.
> >> > There was no update running at the time of the bug, which is why I thought
> >> > that having too many IPs caused this type of error.
> >> > The /usr/sbin/ip executable was not being modified either.
> >> >
> >> > We have many clusters, and only this one has so many IPs and this problem.
> >>
> >> How often does this happen, and is it reliably reproducible under any
> >> circumstances? Any antivirus software running? It'd be nice to check
> >> something like lsof or strace while it's happening, but that may not be
> >> feasible if it's sporadic; running those at every monitor would generate lots
> >> of logs.
> >>
> >> AFAICT, having multiple processes execute (or read) the `ip` binary
> >> simultaneously *shouldn't* cause problems, as long as nothing opens it for
> >> write.
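
As a lighter-weight alternative to running lsof at every monitor, the check for "something has the binary open for write" can be sketched directly against Linux `/proc`. This is only an illustration under assumptions: the function name `pids_with_file_open_for_write` is hypothetical, it is Linux-only, it needs root to inspect other users' processes, and it only sees descriptors that are open at the moment it runs, so a short-lived writer can still be missed.

```python
import os
from typing import List


def pids_with_file_open_for_write(path: str) -> List[int]:
    """Return PIDs that currently hold `path` open with write access.

    Linux-only sketch: walks /proc/<pid>/fd and reads the open flags
    from /proc/<pid>/fdinfo/<fd>.
    """
    target = os.stat(path)
    writers = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        fd_dir = f"/proc/{entry}/fd"
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            # Process exited in the meantime, or we lack permission.
            continue
        for fd in fds:
            try:
                st = os.stat(f"{fd_dir}/{fd}")  # follows the fd symlink
                if (st.st_dev, st.st_ino) != (target.st_dev, target.st_ino):
                    continue
                with open(f"/proc/{entry}/fdinfo/{fd}") as info:
                    # The "flags:" field is printed in octal.
                    flags = next(int(line.split()[1], 8)
                                 for line in info
                                 if line.startswith("flags:"))
            except (OSError, StopIteration, ValueError, IndexError):
                continue
            if (flags & os.O_ACCMODE) in (os.O_WRONLY, os.O_RDWR):
                writers.append(int(entry))
                break
    return sorted(writers)
```

Called with the path of the `ip` binary around the scan window, a non-empty result would point at the process holding it open for write.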
> >>
> >> >
> >> > Best regards,
> >> >
> >> > Thomas Cas
> >> >
> >> > -----Original Message-----
> >> > From: Ken Gaillot <kgail...@redhat.com>
> >> > Sent: Monday, December 19, 2022 22:08
> >> > To: Cluster Labs - All topics related to open-source clustering welcomed <users@clusterlabs.org>
> >> > Cc: Service Infogérance <infogera...@ikoula.com>
> >> > Subject: Re: [ClusterLabs] Bug pacemaker with multiple IP
> >> >
> >> > On Mon, 2022-12-19 at 09:48 +0000, Thomas CAS wrote:
> >> > > Hello Clusterlabs,
> >> > >
> >> > > I would like to report a bug on Pacemaker with the "IPaddr2"
> >> > > resource:
> >> > >
> >> > > OS: Debian 10
> >> > > Kernel: Linux wd-websqlng01 4.19.0-18-amd64 #1 SMP Debian 4.19.208-1
> >> > > (2021-09-29) x86_64 GNU/Linux
> >> > > Pacemaker version: 2.0.1-5+deb10u2
> >> > >
> >> > > You will find the configuration of our cluster with 2 nodes attached.
> >> > >
> >> > > Bug:
> >> > >
> >> > > We have several IPs configured in the cluster configuration (12).
> >> > > Sometimes the cluster is unstable with the following errors in the
> >> > > pacemaker logs:
> >> > >
> >> > > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079] (operation_finished) notice: NGINX-VIP-232_monitor_10000:28835:stderr [ /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1: /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
> >> >
> >> > This doesn't sound like a bug in the agent; "Text file busy" suggests that
> >> > the system "ip" command is being modified while the command is running. Is a
> >> > software update happening when the problem occurs?
> >> >
> >> > I'm not sure whether there's some other situation that could cause that
> >> > error, but simply executing the command a bunch of times simultaneously
> >> > shouldn't cause it as far as I know.
> >> >
> >> > If simultaneous monitors are somehow causing the problem, you should be able
> >> > to work around it by using different intervals for different monitors.
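
The "different intervals for different monitors" workaround can be sketched as a small generator of per-resource intervals. The helper names `staggered_intervals` and `pcs_commands` are hypothetical, and emitting `pcs resource update ... op monitor interval=...` lines assumes the cluster is managed with pcs; the thread does not say which management tool is in use.

```python
from typing import Dict, Iterable, List


def staggered_intervals(resources: Iterable[str],
                        base_s: int = 10,
                        step_s: int = 1) -> Dict[str, int]:
    """Give each resource its own monitor interval: base, base+step, ..."""
    return {name: base_s + i * step_s for i, name in enumerate(resources)}


def pcs_commands(intervals: Dict[str, int]) -> List[str]:
    """Render one pcs command per resource (assumes pcs is the admin tool)."""
    return [f"pcs resource update {name} op monitor interval={secs}s"
            for name, secs in intervals.items()]


if __name__ == "__main__":
    # Resource names taken from the log excerpts in this thread.
    names = ["NGINX-VIP", "NGINX-VIP-231", "NGINX-VIP-232", "NGINX-VIP-234"]
    for line in pcs_commands(staggered_intervals(names)):
        print(line)
```

Distinct intervals only make coinciding runs rarer; monitors with intervals of 10s and 11s can still line up every 110 seconds, so this reduces overlap rather than removing it.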
> >> >
> >> > > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079] (operation_finished) notice: NGINX-VIP-239_monitor_10000:28877:stderr [ /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 709: /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
> >> > > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079] (operation_finished) notice: NGINX-VIP-239_monitor_10000:28877:stderr [ /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1: /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
> >> > > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079] (operation_finished) notice: NGINX-VIP-234_monitor_10000:28830:stderr [ /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1: /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
> >> > > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079] (operation_finished) notice: NGINX-VIP-231_monitor_10000:28900:stderr [ /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 709: /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
> >> > > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079] (operation_finished) notice: NGINX-VIP-231_monitor_10000:28900:stderr [ /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1: /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
> >> > > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079] (operation_finished) notice: NGINX-VIP-235_monitor_10000:28905:stderr [ /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 709: /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
> >> > > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079] (operation_finished) notice: NGINX-VIP-235_monitor_10000:28905:stderr [ /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1: /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
> >> > > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079] (operation_finished) notice: NGINX-VIP-237_monitor_10000:28890:stderr [ /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 709: /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
> >> > > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079] (operation_finished) notice: NGINX-VIP-237_monitor_10000:28890:stderr [ /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1: /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
> >> > > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079] (operation_finished) notice: NGINX-VIP-238_monitor_10000:28876:stderr [ /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 709: /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
> >> > > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079] (operation_finished) notice: NGINX-VIP-238_monitor_10000:28876:stderr [ /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1: /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
> >> > > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079] (operation_finished) notice: NGINX-VIP_monitor_10000:28880:stderr [ /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 709: /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
> >> > > Dec 18 21:07:51 **SENSITIVEDATA** pacemaker-execd [5079] (operation_finished) notice: NGINX-VIP_monitor_10000:28880:stderr [ /usr/lib/ocf/resource.d/heartbeat/IPaddr2: 1: /usr/lib/ocf/resource.d/heartbeat/IPaddr2: ip: Text file busy ]
> >> > >
> >> > > The reason is that there are a lot of IPs configured and if the
> >> > > monitors take place at the same time it causes this type of error.
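
One way to test whether these errors track a fixed time of day rather than monitor overlap in general is to count "Text file busy" entries per minute across the pacemaker logs. The sketch below assumes log lines shaped like the excerpts above; the regular expression and the function name `busy_by_minute` are illustrative and not part of Pacemaker.

```python
import re
from collections import Counter
from typing import Iterable

# Matches e.g. "Dec 18 21:07:51 ... notice: NGINX-VIP-239_monitor_10000:28877:stderr [ ... Text file busy ]"
BUSY_RE = re.compile(
    r"(?P<minute>[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}):\d{2} "
    r".*notice:\s+(?P<op>[^\s:]+):\d+.*Text file busy"
)


def busy_by_minute(lines: Iterable[str]) -> Counter:
    """Count 'Text file busy' log entries, grouped by 'Mon DD HH:MM'."""
    counts: Counter = Counter()
    for line in lines:
        match = BUSY_RE.search(line)
        if match:
            counts[match.group("minute")] += 1
    return counts
```

If every non-zero bucket falls within a few minutes after the scheduled scan start, as in the December and August excerpts in this thread, that points at something running on a schedule rather than at the number of IPs alone.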
> >> > >
> >> > > Best regards,
> >> > >
> >> > > Thomas Cas
> >> > >
> >> > > _______________________________________________
> >> > > Manage your subscription:
> >> > > https://lists.clusterlabs.org/mailman/listinfo/users
> >> > >
> >> > > ClusterLabs home: https://www.clusterlabs.org/
> >> > --
> >> > Ken Gaillot <kgail...@redhat.com>
> >>
> >> --
> >> Regards,
> >>
> >> Reid Wahl (He/Him)
> >> Senior Software Engineer, Red Hat
> >> RHEL High Availability - Pacemaker

--
Regards,

Reid Wahl (He/Him)
Senior Software Engineer, Red Hat
RHEL High Availability - Pacemaker