On Tue, May 20, 2025 at 8:06 PM Tiago Pires via discuss
<ovs-discuss@openvswitch.org> wrote:

> Hi All,
>

Hi Tiago,


> In a cluster running OVN 24.03.5, we are observing that on a few chassis
> that work as dedicated OVN Interconnection Gateways, the ovn-controller
> process is running at almost 100% CPU usage:
>
> 2025-05-20T16:58:39.546Z|689641|poll_loop|INFO|wakeup due to [POLLIN]
> on fd 32 (FIFO pipe:[1813314324]) at controller/pinctrl.c:4173 (95%
> CPU usage)
> 2025-05-20T16:58:45.488Z|689642|poll_loop|INFO|Dropped 48 log messages
> in last 6 seconds (most recently, 1 seconds ago) due to excessive rate
> 2025-05-20T16:58:45.488Z|689643|poll_loop|INFO|wakeup due to [POLLIN]
> on fd 32 (FIFO pipe:[1813314324]) at controller/pinctrl.c:4173 (92%
> CPU usage)
> 2025-05-20T16:58:51.553Z|689644|poll_loop|INFO|Dropped 47 log messages
> in last 6 seconds (most recently, 0 seconds ago) due to excessive rate
> 2025-05-20T16:58:51.553Z|689645|poll_loop|INFO|wakeup due to [POLLIN]
> on fd 32 (FIFO pipe:[1813314324]) at controller/pinctrl.c:4173 (98%
> CPU usage)
> 2025-05-20T16:58:57.514Z|689646|poll_loop|INFO|Dropped 50 log messages
> in last 6 seconds (most recently, 1 seconds ago) due to excessive rate
> 2025-05-20T16:58:57.514Z|689647|poll_loop|INFO|wakeup due to [POLLIN]
> on fd 32 (FIFO pipe:[1813314324]) at controller/pinctrl.c:4173 (95%
> CPU usage)
> 2025-05-20T16:59:03.558Z|689648|poll_loop|INFO|Dropped 49 log messages
> in last 6 seconds (most recently, 0 seconds ago) due to excessive rate
>
> Checking what ovn-controller is doing in debug mode, we can see a lot
> of the below ARP packets:
>
> 2025-05-20T17:10:21.149Z|00004|pinctrl(ovn_pinctrl0)|DBG|pinctrl
> received  packet-in | opcode=ARP| OF_Table_ID=0|
> OF_Cookie_ID=0x1367fe68| in-port=48| src-mac=fa:16:3e:1b:2b:77,
> dst-mac=00:00:00:00:00:00| src-ip=10.XX.6.X31, dst-ip=172.XX.X.2XX
> 2025-05-20T17:10:21.149Z|00005|pinctrl(ovn_pinctrl0)|DBG|pinctrl
> received  packet-in | opcode=ARP| OF_Table_ID=0|
> OF_Cookie_ID=0x1367fe68| in-port=48| src-mac=fa:16:3e:1b:2b:77,
> dst-mac=00:00:00:00:00:00| src-ip=10.XX1.6.XX1, dst-ip=172.XX.X.XX4
> 2025-05-20T17:10:21.271Z|00006|pinctrl(ovn_pinctrl0)|DBG|pinctrl
> received  packet-in | opcode=ARP| OF_Table_ID=0|
> OF_Cookie_ID=0x1367fe68| in-port=13| src-mac=fa:16:3e:1b:2b:77,
> dst-mac=00:00:00:00:00:00| src-ip=10.XX1.X.X23, dst-ip=172.X6.X.2X3
> 2025-05-20T17:10:21.271Z|00007|pinctrl(ovn_pinctrl0)|DBG|pinctrl
> received  packet-in | opcode=ARP| OF_Table_ID=0|
> OF_Cookie_ID=0x1367fe68| in-port=13| src-mac=fa:16:3e:1b:2b:77,
> dst-mac=00:00:00:00:00:00| src-ip=10.XX1.X.X23, dst-ip=172.X6.X.X41
> 2025-05-20T17:10:21.271Z|00008|pinctrl(ovn_pinctrl0)|DBG|pinctrl
> received  packet-in | opcode=ARP| OF_Table_ID=0|
> OF_Cookie_ID=0x60199dbd| in-port=338| src-mac=fa:16:3e:a7:a2:37,
> dst-mac=00:00:00:00:00:00| src-ip=172.XX.X2.X30, dst-ip=172.XX.X.X09
> 2025-05-20T17:10:21.271Z|00009|pinctrl(ovn_pinctrl0)|DBG|pinctrl
> received  packet-in | opcode=ARP| OF_Table_ID=0|
> OF_Cookie_ID=0x1367fe68| in-port=131| src-mac=fa:16:3e:1b:2b:77,
> dst-mac=00:00:00:00:00:00| src-ip=10.XXX.X.X4, dst-ip=172.XX.X.X19
> 2025-05-20T17:10:21.272Z|00010|pinctrl(ovn_pinctrl0)|DBG|pinctrl
> received  packet-in | opcode=ARP| OF_Table_ID=0|
> OF_Cookie_ID=0x1367fe68| in-port=13| src-mac=fa:16:3e:1b:2b:77,
> dst-mac=00:00:00:00:00:00| src-ip=10.XX1.X.X23, dst-ip=172.XX.X.X98
> 2025-05-20T17:10:21.277Z|00011|pinctrl(ovn_pinctrl0)|DBG|pinctrl
> received  packet-in | opcode=ARP| OF_Table_ID=0|
> OF_Cookie_ID=0x1367fe68| in-port=48| src-mac=fa:16:3e:1b:2b:77,
> dst-mac=00:00:00:00:00:00| src-ip=10.XX1.X.1X1, dst-ip=172.XX.X.X05
> 2025-05-20T17:10:21.388Z|00012|pinctrl(ovn_pinctrl0)|DBG|pinctrl
> received  packet-in | opcode=ARP| OF_Table_ID=0|
> OF_Cookie_ID=0x1367fe68| in-port=13| src-mac=fa:16:3e:1b:2b:77,
> dst-mac=00:00:00:00:00:00| src-ip=10.XX.X.X23, dst-ip=172.XX.X.2X2


I can see that almost all of those packets have an identical source MAC, and
there are a lot of duplicate source IPs, AFAICT. I suspect this might be
related to a problem that we saw with a multicast split flooding
ovn-controller with GARPs [0].

Could you please help us to identify which flow the OF_Cookie_ID=0x1367fe68
corresponds to?
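
To narrow down which cookie dominates, the packet-in debug lines can be
tallied per cookie and source MAC. The following is only a minimal sketch:
the regular expression is written against the log format quoted above, the
function names are hypothetical, and `uuid_prefix` assumes (as I believe is
the case in OVN, but please verify) that the OpenFlow cookie is the first 32
bits of the UUID of the Southbound record the flow was generated from.

```python
import re
from collections import Counter

# Fields of a pinctrl packet-in debug line, as they appear in the log above
# (one log record per line, i.e. before the mail client wrapped them).
PACKET_IN = re.compile(
    r"OF_Cookie_ID=(?P<cookie>0x[0-9a-fA-F]+)\|\s*in-port=(?P<port>\d+)\|\s*"
    r"src-mac=(?P<mac>[0-9a-fA-F:]+)"
)


def tally(log_text):
    """Count packet-ins per (cookie, src-mac) pair."""
    counts = Counter()
    for match in PACKET_IN.finditer(log_text):
        counts[(match["cookie"], match["mac"])] += 1
    return counts


def uuid_prefix(cookie):
    """Zero-padded 8-hex-digit form of a cookie (ASSUMED to be a UUID prefix)."""
    return f"{int(cookie, 16):08x}"


# Cookie, in-port and src-mac of the nine packets quoted above.
SAMPLE = "\n".join(
    "pinctrl received  packet-in | opcode=ARP| OF_Table_ID=0| "
    f"OF_Cookie_ID={cookie}| in-port={port}| src-mac={mac}, "
    "dst-mac=00:00:00:00:00:00"
    for cookie, port, mac in [
        ("0x1367fe68", 48, "fa:16:3e:1b:2b:77"),
        ("0x1367fe68", 48, "fa:16:3e:1b:2b:77"),
        ("0x1367fe68", 13, "fa:16:3e:1b:2b:77"),
        ("0x1367fe68", 13, "fa:16:3e:1b:2b:77"),
        ("0x60199dbd", 338, "fa:16:3e:a7:a2:37"),
        ("0x1367fe68", 131, "fa:16:3e:1b:2b:77"),
        ("0x1367fe68", 13, "fa:16:3e:1b:2b:77"),
        ("0x1367fe68", 48, "fa:16:3e:1b:2b:77"),
        ("0x1367fe68", 13, "fa:16:3e:1b:2b:77"),
    ]
)

if __name__ == "__main__":
    for (cookie, mac), n in tally(SAMPLE).most_common():
        print(f"{cookie} (uuid prefix {uuid_prefix(cookie)}) {mac}: {n}")
```

If the UUID-prefix assumption holds, the printed prefix could then be
searched for in the output of `ovn-sbctl --uuid lflow-list` to find a
matching logical flow.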


> In my understanding, it seems there are a lot of ARPs from different
> OVN virtual networks, which makes ovn-controller use more CPU time.
> Shouldn't ovn-controller know how to handle these ARP packets
> without using a lot of CPU time?
>

ovn-controller does know what to do with them, but the snippet shows 9
packets within roughly 240 ms, so the pinctrl thread can be overloaded by
sheer volume alone.
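
As a rough illustration of that volume, the arrival rate can be estimated
from the timestamps of the nine debug lines in the snippet. This is just a
sketch; `packet_rate` is a hypothetical helper, and nine samples are far too
few to characterise the real load on the gateway.

```python
from datetime import datetime

# Timestamps of the nine packet-in debug lines quoted above.
STAMPS = [
    "2025-05-20T17:10:21.149Z",
    "2025-05-20T17:10:21.149Z",
    "2025-05-20T17:10:21.271Z",
    "2025-05-20T17:10:21.271Z",
    "2025-05-20T17:10:21.271Z",
    "2025-05-20T17:10:21.271Z",
    "2025-05-20T17:10:21.272Z",
    "2025-05-20T17:10:21.277Z",
    "2025-05-20T17:10:21.388Z",
]


def packet_rate(stamps):
    """Return (packet count, time span in seconds, packets per second)."""
    times = sorted(
        datetime.strptime(s, "%Y-%m-%dT%H:%M:%S.%fZ") for s in stamps
    )
    span = (times[-1] - times[0]).total_seconds()
    # Guard against a zero-length window (all packets in the same millisecond).
    rate = len(times) / span if span > 0 else float("inf")
    return len(times), span, rate


if __name__ == "__main__":
    count, span, rate = packet_rate(STAMPS)
    print(f"{count} packets in {span:.3f} s, about {rate:.0f} packets/s")
```

Keep in mind that the poll_loop messages above also report dropped log
lines, so the logs alone may under-represent how often the thread wakes up.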


> Regards,
>
> Tiago Pires
>
> _______________________________________________
> discuss mailing list
> disc...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


[0]
https://mail.openvswitch.org/pipermail/ovs-discuss/2025-February/053455.html

Regards,
Ales
