On Tue, May 20, 2025 at 8:06 PM Tiago Pires via discuss <ovs-discuss@openvswitch.org> wrote:
> Hi All,

Hi Tiago,

> In a cluster with OVN 24.03.5 we are observing, on a few chassis that
> work as dedicated OVN Interconnection Gateways, the ovn-controller
> process running at almost 100% CPU usage:
>
> 2025-05-20T16:58:39.546Z|689641|poll_loop|INFO|wakeup due to [POLLIN] on fd 32 (FIFO pipe:[1813314324]) at controller/pinctrl.c:4173 (95% CPU usage)
> 2025-05-20T16:58:45.488Z|689642|poll_loop|INFO|Dropped 48 log messages in last 6 seconds (most recently, 1 seconds ago) due to excessive rate
> 2025-05-20T16:58:45.488Z|689643|poll_loop|INFO|wakeup due to [POLLIN] on fd 32 (FIFO pipe:[1813314324]) at controller/pinctrl.c:4173 (92% CPU usage)
> 2025-05-20T16:58:51.553Z|689644|poll_loop|INFO|Dropped 47 log messages in last 6 seconds (most recently, 0 seconds ago) due to excessive rate
> 2025-05-20T16:58:51.553Z|689645|poll_loop|INFO|wakeup due to [POLLIN] on fd 32 (FIFO pipe:[1813314324]) at controller/pinctrl.c:4173 (98% CPU usage)
> 2025-05-20T16:58:57.514Z|689646|poll_loop|INFO|Dropped 50 log messages in last 6 seconds (most recently, 1 seconds ago) due to excessive rate
> 2025-05-20T16:58:57.514Z|689647|poll_loop|INFO|wakeup due to [POLLIN] on fd 32 (FIFO pipe:[1813314324]) at controller/pinctrl.c:4173 (95% CPU usage)
> 2025-05-20T16:59:03.558Z|689648|poll_loop|INFO|Dropped 49 log messages in last 6 seconds (most recently, 0 seconds ago) due to excessive rate
>
> Checking what ovn-controller is doing in debug mode, we can see a lot
> of ARP packets like the ones below:
>
> 2025-05-20T17:10:21.149Z|00004|pinctrl(ovn_pinctrl0)|DBG|pinctrl received packet-in | opcode=ARP| OF_Table_ID=0| OF_Cookie_ID=0x1367fe68| in-port=48| src-mac=fa:16:3e:1b:2b:77, dst-mac=00:00:00:00:00:00| src-ip=10.XX.6.X31, dst-ip=172.XX.X.2XX
> 2025-05-20T17:10:21.149Z|00005|pinctrl(ovn_pinctrl0)|DBG|pinctrl received packet-in | opcode=ARP| OF_Table_ID=0| OF_Cookie_ID=0x1367fe68| in-port=48| src-mac=fa:16:3e:1b:2b:77, dst-mac=00:00:00:00:00:00| src-ip=10.XX1.6.XX1, dst-ip=172.XX.X.XX4
> 2025-05-20T17:10:21.271Z|00006|pinctrl(ovn_pinctrl0)|DBG|pinctrl received packet-in | opcode=ARP| OF_Table_ID=0| OF_Cookie_ID=0x1367fe68| in-port=13| src-mac=fa:16:3e:1b:2b:77, dst-mac=00:00:00:00:00:00| src-ip=10.XX1.X.X23, dst-ip=172.X6.X.2X3
> 2025-05-20T17:10:21.271Z|00007|pinctrl(ovn_pinctrl0)|DBG|pinctrl received packet-in | opcode=ARP| OF_Table_ID=0| OF_Cookie_ID=0x1367fe68| in-port=13| src-mac=fa:16:3e:1b:2b:77, dst-mac=00:00:00:00:00:00| src-ip=10.XX1.X.X23, dst-ip=172.X6.X.X41
> 2025-05-20T17:10:21.271Z|00008|pinctrl(ovn_pinctrl0)|DBG|pinctrl received packet-in | opcode=ARP| OF_Table_ID=0| OF_Cookie_ID=0x60199dbd| in-port=338| src-mac=fa:16:3e:a7:a2:37, dst-mac=00:00:00:00:00:00| src-ip=172.XX.X2.X30, dst-ip=172.XX.X.X09
> 2025-05-20T17:10:21.271Z|00009|pinctrl(ovn_pinctrl0)|DBG|pinctrl received packet-in | opcode=ARP| OF_Table_ID=0| OF_Cookie_ID=0x1367fe68| in-port=131| src-mac=fa:16:3e:1b:2b:77, dst-mac=00:00:00:00:00:00| src-ip=10.XXX.X.X4, dst-ip=172.XX.X.X19
> 2025-05-20T17:10:21.272Z|00010|pinctrl(ovn_pinctrl0)|DBG|pinctrl received packet-in | opcode=ARP| OF_Table_ID=0| OF_Cookie_ID=0x1367fe68| in-port=13| src-mac=fa:16:3e:1b:2b:77, dst-mac=00:00:00:00:00:00| src-ip=10.XX1.X.X23, dst-ip=172.XX.X.X98
> 2025-05-20T17:10:21.277Z|00011|pinctrl(ovn_pinctrl0)|DBG|pinctrl received packet-in | opcode=ARP| OF_Table_ID=0| OF_Cookie_ID=0x1367fe68| in-port=48| src-mac=fa:16:3e:1b:2b:77, dst-mac=00:00:00:00:00:00| src-ip=10.XX1.X.1X1, dst-ip=172.XX.X.X05
> 2025-05-20T17:10:21.388Z|00012|pinctrl(ovn_pinctrl0)|DBG|pinctrl received packet-in | opcode=ARP| OF_Table_ID=0| OF_Cookie_ID=0x1367fe68| in-port=13| src-mac=fa:16:3e:1b:2b:77, dst-mac=00:00:00:00:00:00| src-ip=10.XX.X.X23, dst-ip=172.XX.X.2X2

I can see that almost all of those packets have an identical src MAC,
and there are a lot of duplicate src IPs AFAICT. I have a suspicion
that this might be related to a problem that we saw with multicast
split flooding ovn-controller with GARPs [0]. Could you please help us
identify which flow OF_Cookie_ID=0x1367fe68 corresponds to?

> In my understanding, it seems there are a lot of ARPs from different
> OVN virtual networks, making the ovn-controller use more CPU time.
> Wouldn't the ovn-controller know how to handle these ARP packets
> without using a lot of CPU time?

I mean, ovn-controller knows what to do with them, but the snippet
above has 9 packets within 200 ms, so you can overload the pinctrl
thread by sheer volume.
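For reference, the OpenFlow cookie that ovn-controller installs is
taken from the first 32 bits of the Southbound Logical_Flow UUID, so
something along these lines should map it back (rough sketch, assuming
the default integration bridge name br-int; adjust to your setup):

  # OpenFlow flows in br-int carrying that cookie, with their match/actions
  ovs-ofctl dump-flows br-int cookie=0x1367fe68/-1

  # the SB logical flow whose UUID starts with the cookie value
  ovn-sbctl find logical_flow | grep -A 12 '_uuid.*1367fe68'

(ovn-sbctl --uuid lflow-list | grep 1367fe68 should work too, if your
ovn-sbctl supports --uuid.) The pipeline, table and match of that
logical flow should tell us which stage is punting these ARPs to
pinctrl.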
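To get a rough idea of the packet-in volume, you could also sample the
packet counters on the flows carrying that cookie twice, e.g.:

  ovs-ofctl dump-flows br-int cookie=0x1367fe68/-1 | grep -o 'n_packets=[0-9]*'
  sleep 10
  ovs-ofctl dump-flows br-int cookie=0x1367fe68/-1 | grep -o 'n_packets=[0-9]*'

and compare the n_packets deltas over those 10 seconds. IIRC
ovn-appctl -t ovn-controller coverage/show also exposes some
pinctrl-related counters, though the exact counter names vary between
versions.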
[0] https://mail.openvswitch.org/pipermail/ovs-discuss/2025-February/053455.html

Regards,
Ales