Re: [ovs-dev] [PATCH v5] netdev-dpdk: add control plane protection support
Kevin Traynor, Dec 21, 2022 at 17:35:
> Hi Robin,
>
> I did a bit of testing and some comments. I tested out applying a
> config. I wasn't able to check for lacp traffic, but I did see the
> extra rxq being added and rss working as expected on the other rx
> queues.
>
> One issue I found is where the flow cannot be applied. It recovers,
> but when there is a reconfigure, every time it uses n_rxq+1, and then
> reconfigures again with n_rxq.
>
> For example (the flow is normally ok for mlx, I forced a failure):
>
> $ ovs-vsctl get Interface myport status
> {bus_info="bus_name=pci, vendor_id=15b3, device_id=1017",
> cp_protection=unsupported, driver_name=mlx5_pci, if_descr="DPDK 22.11.1
> mlx5_pci", if_type="6", link_speed="25Gbps", max_hash_mac_addrs="0",
> max_mac_addrs="128", max_rx_pktlen="1518", max_rx_queues="1024",
> max_tx_queues="1024", max_vfs="0", max_vmdq_pools="0",
> min_rx_bufsize="32", numa_id="0", port_no="2"}
>
> $ ovs-vsctl get Interface myport options:n_rxq
> "3"
>
> Change some config unrelated to rxq (snipped irrelevant parts):
>
> $ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=100
>
> INFO|Performing pmd to rx queue assignment using cycles algorithm.
> INFO|Core 8 on numa node 0 assigned port 'myport' rx queue 2
> INFO|Core 8 on numa node 0 assigned port 'myport' rx queue 1
> INFO|Core 8 on numa node 0 assigned port 'myport' rx queue 0
> INFO|Core 8 on numa node 0 assigned port 'myport' rx queue 3
> INFO|Port 2: 04:3f:72:c2:07:b8
> INFO|Performing pmd to rx queue assignment using cycles algorithm.
> INFO|Core 8 on numa node 0 assigned port 'myport' rx queue 0
> INFO|Core 8 on numa node 0 assigned port 'myport' rx queue 1
> INFO|Core 8 on numa node 0 assigned port 'myport' rx queue 2
>
> If I remove option:cp-protection, things return to normal.

Hmm, that is weird, I'll have a closer look.

> It would also be nice to document how to remove it from a port, i.e.:
>
> ovs-vsctl remove interface dpdk-p0 options cp-protection
>
> I was able to add/remove/add and it seemed fine. At present it only
> gives a dbg message:
>
> 2022-12-21T16:00:43Z|00168|netdev_dpdk|DBG|myport: cp-protection: reset flows
>
> and then it shows the reassignment of rxqs, so perhaps a message at
> info level indicating that cp-protection is being removed for that
> port would be useful.

Ack, I will add a hint in the docs and an explicit info message stating
that cp protection is now disabled.

> I noticed that if I set cp-prot, then enable hw-offload, I get:
>
> |WARN|myport: hw-offload is mutually exclusive with cp-protection
>
> It removes the cp-prot, but only mentions it in the debug log, so it's
> not clear. I didn't check the operation of hw-offload.
>
> I also saw that if I set hw-offload to false, cp-prot cannot operate,
> as it just checks for the presence of the hw-offload entry and rejects
> cp-prot even if hw-offload=false.
>
> The main thing is that cp-prot does not enable when hw-offload=true,
> and that is there at present. So probably just better logs will do for
> now. The other combinations I'm testing are more test cases than
> realistic cases, so as it's experimental perhaps it's fine to work on
> the different combinations and more fine-grained controls (e.g.
> hw-offload=false) later.

This is probably related to the fact that hw-offload cannot be disabled
entirely without restarting vswitchd (unrelated to my patch). This is
what I noticed, but I may have got it wrong. I'll have a look at whether
this can be improved.

Thanks a lot for more testing!

Merry Christmas & Happy New Year, all :)

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v5] netdev-dpdk: add control plane protection support
On 16/12/2022 16:33, Robin Jarry wrote:
> Some control protocols are used to maintain link status between
> forwarding engines (e.g. LACP). When the system is not sized properly,
> the PMD threads may not be able to process all incoming traffic from
> the configured Rx queues. When a signaling packet of such protocols is
> dropped, it can cause link flapping, worsening the situation.
>
> [full commit message snipped]

Hi Robin,

I did a bit of testing and some comments. I tested out applying a
config. I wasn't able to check for lacp traffic, but I did see the
extra rxq being added and rss working as expected on the other rx
queues.
[ovs-dev] [PATCH v5] netdev-dpdk: add control plane protection support
Some control protocols are used to maintain link status between
forwarding engines (e.g. LACP). When the system is not sized properly,
the PMD threads may not be able to process all incoming traffic from
the configured Rx queues. When a signaling packet of such protocols is
dropped, it can cause link flapping, worsening the situation.

Use the RTE flow API to redirect these protocols into a dedicated Rx
queue. The assumption is made that the ratio between control protocol
traffic and user data traffic is very low and thus this dedicated Rx
queue will never get full. The RSS redirection table is re-programmed
to only use the other Rx queues. The RSS table size is stored in the
netdev_dpdk structure at port initialization to avoid requesting the
information again when changing the port configuration.

The additional Rx queue will be assigned a PMD core like any other Rx
queue. Polling that extra queue may introduce increased latency and a
slight performance penalty, for the benefit of preventing link
flapping.

This feature must be enabled per port on specific protocols via the
cp-protection option. This option takes a comma-separated list of
protocol names. It is only supported on Ethernet ports.

If the user has already configured multiple Rx queues on the port, an
additional one will be allocated for control plane packets. If the
hardware cannot satisfy the requested number of Rx queues, the last Rx
queue will be assigned for control plane. If only one Rx queue is
available, the cp-protection feature will be disabled. If the hardware
does not support the RTE flow matchers/actions, the feature will be
disabled.

It cannot be enabled when other_config:hw-offload=true as it may
conflict with the offloaded RTE flows. Similarly, if hw-offload is
enabled while some ports already have cp-protection enabled, RTE flow
offloading will be disabled on these ports.
Example use:

  ovs-vsctl add-bond br-phy bond0 phy0 phy1 -- \
    set interface phy0 type=dpdk options:dpdk-devargs=:ca:00.0 -- \
    set interface phy0 options:cp-protection=lacp -- \
    set interface phy1 type=dpdk options:dpdk-devargs=:ca:00.1 -- \
    set interface phy1 options:cp-protection=lacp

As a starting point, only one protocol is supported: LACP. Other
protocols can be added in the future. NIC compatibility should be
checked.

To validate that this works as intended, I used a traffic generator to
generate random traffic slightly above the machine capacity at line
rate on a two-port bond interface. OVS is configured to receive traffic
on two VLANs and pop/push them in a br-int bridge based on tags set on
patch ports.

  +----------------------------------------+
  |                  DUT                   |
  | +------------------------------------+ |
  | |               br-int               | | default flow, action=NORMAL
  | |                                    | |
  | |    patch10              patch11    | |
  | +-------|--------------------|-------+ |
  |         |                    |         |
  | +-------|--------------------|-------+ |
  | |    patch00              patch01    | |
  | |    tag:10               tag:20     | |
  | |                                    | |
  | |               br-phy               | | default flow, action=NORMAL
  | |                                    | |
  | |               bond0                | | balance-slb, lacp=passive,
  | |           phy0      phy1           | | lacp-time=fast
  | +-------------|--------|-------------+ |
  +---------------|--------|---------------+
                  |        |
  +---------------|--------|---------------+
  |            port0      port1            | balance L3/L4, lacp=active,
  |                  lag                   | lacp-time=fast,
  |                                        | mode trunk, VLANs 10, 20
  |                switch                  |
  |                                        |
  |       vlan 10          vlan 20         | mode access
  |        port2            port3          |
  +----------|----------------|------------+
             |                |
  +----------|----------------|------------+
  |        port0            port1          | Random traffic that is
  |                                        | properly balanced across the
  |           traffic generator            | bond ports in both
  +----------------------------------------+ directions.

Without cp-protection, the bond0 links randomly switch to "defaulted"
when one of the LACP packets sent by the switch is dropped because the
Rx queues are full and the PMD threads did not process them fast
enough. When that happens, all traffic must go through a single link,
which causes above-line-rate traffic to be dropped.

When cp-protection is enabled, no LACP packet is dropped and the bond
links remain enabled at all times, maximizing the throughput.

This feature may be considered as "QoS". However, it does not work by
limiting the rate of traffic explicitly.
It only guarantees that some protocols have a lower chance of being
dropped because the PMD cores cannot keep up with regular traffic.

The choice of protocols is limited on purpose. This is not meant to be
configurable by users. Some limited configurability could be considered
in the future, but it would expose users to more potential issues if
they accidentally redirect all traffic to the control plane queue.

Cc: Christophe Fontaine
Cc: Kevin Traynor
Cc: David Marchand
Signed-off-by: Robin Jarry
---
v4 -> v5:
* Added NEWS entry
* Updated dpdk documentation link