Hey Robin,
On Mon, Dec 12, 2022 at 4:16 PM Robin Jarry <[email protected]> wrote:
>
> Some control protocols are used to maintain link status between
> forwarding engines (e.g. LACP). When the system is not sized properly,
> the PMD threads may not be able to process all incoming traffic from the
> configured Rx queues. When a signaling packet of such protocols is
> dropped, it can cause link flapping, worsening the situation.
>
> Use the RTE flow API to redirect these protocols into a dedicated Rx
> queue. The assumption is made that the ratio between control protocol
> traffic and user data traffic is very low and thus this dedicated Rx
> queue will never get full. The RSS redirection table is re-programmed to
> only use the other Rx queues. The RSS table size is stored in the
> netdev_dpdk structure at port initialization to avoid requesting the
> information again when changing the port configuration.
>
> The additional Rx queue will be assigned a PMD core like any other Rx
> queue. Polling that extra queue may introduce increased latency and
> a slight performance penalty at the benefit of preventing link flapping.
>
> This feature must be enabled per port on specific protocols via the
> cp-protection option. This option takes a comma-separated list of
> protocol names. It is only supported on ethernet ports.
>
> If the user has already configured multiple Rx queues on the port, an
> additional one will be allocated for control plane packets. If the
> hardware cannot satisfy the requested number of Rx queues, the
> last Rx queue will be assigned for control plane. If only one Rx queue
> is available, the cp-protection feature will be disabled. If the
> hardware does not support the RTE flow matchers/actions, the feature
> will be disabled.
>
> It cannot be enabled when other_config:hw-offload=true as it may
> conflict with the offloaded RTE flows. Similarly, if hw-offload is
> enabled while some ports already have cp-protection enabled, the RTE
> flow offloading will be disabled on these ports.
>
> Example use:
>
> ovs-vsctl add-bond br-phy bond0 phy0 phy1 -- \
> set interface phy0 type=dpdk options:dpdk-devargs=0000:ca:00.0 -- \
> set interface phy0 options:cp-protection=lacp -- \
> set interface phy1 type=dpdk options:dpdk-devargs=0000:ca:00.1 -- \
> set interface phy1 options:cp-protection=lacp
>
> As a starting point, only one protocol is supported: LACP. Other
> protocols can be added in the future. NIC compatibility should be
> checked.
>
> To validate that this works as intended, I used a traffic generator to
> generate random traffic slightly above the machine capacity at line rate
> on a two ports bond interface. OVS is configured to receive traffic on
> two VLANs and pop/push them in a br-int bridge based on tags set on
> patch ports.
>
> +----------------------+
> | DUT |
> |+--------------------+|
> || br-int || default flow, action=NORMAL
> || ||
> || patch10 patch11 ||
> |+---|-----------|----+|
> | | | |
> |+---|-----------|----+|
> || patch00 patch01 ||
> || tag:10 tag:20 ||
> || ||
> || br-phy || default flow, action=NORMAL
> || ||
> || bond0 || balance-slb, lacp=passive, lacp-time=fast
> || phy0 phy1 ||
> |+------|-----|-------+|
> +-------|-----|--------+
> | |
> +-------|-----|--------+
> | port0 port1 | balance L3/L4, lacp=active, lacp-time=fast
> | lag | mode trunk VLANs 10, 20
> | |
> | switch |
> | |
> | vlan 10 vlan 20 | mode access
> | port2 port3 |
> +-----|----------|-----+
> | |
> +-----|----------|-----+
> | port0 port1 | Random traffic that is properly balanced
> | | across the bond ports in both directions.
> | traffic generator |
> +----------------------+
>
> Without cp-protection, the bond0 links are randomly switching to
> "defaulted" when one of the LACP packets sent by the switch is dropped
> because the RX queues are full and the PMD threads did not process them
> fast enough. When that happens, all traffic must go through a single
> link which causes above line rate traffic to be dropped.
>
> When cp-protection is enabled, no LACP packet is dropped and the bond
> links remain enabled at all times, maximizing the throughput.
>
> This feature may be considered as "QoS". However, it does not work by
> limiting the rate of traffic explicitly. It only guarantees that some
> protocols have a lower chance of being dropped because the PMD cores
> cannot keep up with regular traffic.
>
> The choice of protocols is limited on purpose. This is not meant to be
> configurable by users. Some limited configurability could be considered
> in the future but it would expose users to more potential issues if they
> accidentally redirect all traffic to the control plane queue.
>
> Cc: Christophe Fontaine <[email protected]>
> Cc: Kevin Traynor <[email protected]>
> Signed-off-by: Robin Jarry <[email protected]>
This is not a full review (I did not look too much into the rte_flow /
RSS reta stuff), but I did catch some issues.
> ---
> v3 -> v4:
>
> * Rebased on master 739bcf2263b3 ("odp-execute: Fix ipv4 missing
> clearing of connection tracking fields.")
> * Removed the extra reta_size field from struct netdev_dpdk.
> rte_eth_dev_info() is now called every time in
> dpdk_cp_prot_rss_configure().
> * Fixed RSS reconfiguration for drivers that have redirection tables
> that are smaller than RTE_ETH_RETA_GROUP_SIZE (64).
> * Added workaround to avoid imbalanced queues when the reta size is too
> small and is not a multiple of the number of RSS queues.
> * Fixed panic when deleting ports with cp-protection enabled.
> * Using cp-protection with an invalid/unsupported protocol now only
> prints a warning in the logs and it is ignored. It no longer causes
> the port to be destroyed.
>
> Documentation/topics/dpdk/phy.rst | 55 +++++
> lib/netdev-dpdk.c | 323 +++++++++++++++++++++++++++++-
> vswitchd/vswitch.xml | 26 +++
> 3 files changed, 403 insertions(+), 1 deletion(-)
This is a new feature, probably worth adding a NEWS update.
Plus, should we declare it experimental until we get more feedback?
>
> diff --git a/Documentation/topics/dpdk/phy.rst b/Documentation/topics/dpdk/phy.rst
> index cb2d5bcb7b3f..e4ae69ec2854 100644
> --- a/Documentation/topics/dpdk/phy.rst
> +++ b/Documentation/topics/dpdk/phy.rst
> @@ -131,6 +131,61 @@ possible with DPDK acceleration. It is possible to configure multiple Rx queues
> for ``dpdk`` ports, thus ensuring this is not a bottleneck for performance. For
> information on configuring PMD threads, refer to :doc:`pmd`.
>
> +Control Plane Protection
> +------------------------
> +
> +Some control protocols are used to maintain link status between forwarding
> +engines. In SDN environments, these packets share the same physical network
> +as the user data traffic.
> +
> +When the system is not sized properly, the PMD threads may not be able to
> +process all incoming traffic from the configured Rx queues. When a signaling
> +packet of such protocols is dropped, it can cause link flapping, worsening the
> +situation.
> +
> +Some physical NICs can be programmed to put these protocols in a dedicated
> +hardware Rx queue using the rte_flow__ API.
> +
> +__ https://doc.dpdk.org/guides-21.11/prog_guide/rte_flow.html#device-compatibility
This should point to the 22.11 guides now.
> +
> +The currently supported control plane protocols are:
> +
> +``lacp``
> + `Link Aggregation Control Protocol`__. Ether type ``0x8809``.
> +
> + __ https://www.ieee802.org/3/ad/public/mar99/seaman_1_0399.pdf
> +
> +.. warning::
> +
> + This feature is not compatible with all NICs. Refer to vendor documentation
> + for more information.
> +
> +Control plane protection must be enabled on specific protocols per port. The
> +``cp-protection`` option requires a comma-separated list of protocol names::
> +
> + $ ovs-vsctl add-port br0 dpdk-p0 -- set Interface dpdk-p0 type=dpdk \
> + options:dpdk-devargs=0000:01:00.0 options:cp-protection=lacp
> +
> +.. note::
> +
> + If multiple Rx queues are already configured, regular RSS (Receive Side
> + Scaling) queue balancing is done on all but the extra control plane
> + protection queue.
> +
> +.. tip::
> +
> + You can check if control plane protection is supported on a port with the
> + following command::
> +
> + $ ovs-vsctl get interface dpdk-p0 status
> + {cp_protection_queue="2", driver_name=..., rss_queues="0-1"}
> +
> + If the hardware does not support redirecting control plane traffic to
> + a dedicated queue, it will be explicit::
> +
> + $ ovs-vsctl get interface dpdk-p0 status
> + {cp_protection_queue=unsupported, driver_name=..., rss_queues="0-1"}
> +
> .. _dpdk-phy-flow-control:
>
> Flow Control
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index fff57f78279a..7bbf629e75c7 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -417,6 +417,11 @@ enum dpdk_hw_ol_features {
> NETDEV_TX_SCTP_CHECKSUM_OFFLOAD = 1 << 4,
> };
>
> +enum dpdk_cp_prot_flags {
> + DPDK_CP_PROT_UNSUPPORTED = 1 << 0,
> + DPDK_CP_PROT_LACP = 1 << 1,
> +};
> +
> /*
> * In order to avoid confusion in variables names, following naming convention
> * should be used, if possible:
> @@ -536,6 +541,13 @@ struct netdev_dpdk {
>
> /* VF configuration. */
> struct eth_addr requested_hwaddr;
> +
> + /* Requested control plane protection flags,
> + * from the enum set 'dpdk_cp_prot_flags' */
> + uint64_t requested_cp_prot_flags;
> + uint64_t cp_prot_flags;
> + size_t cp_prot_flows_num;
> + struct rte_flow **cp_prot_flows;
> );
>
> PADDED_MEMBERS(CACHE_LINE_SIZE,
> @@ -1315,6 +1327,10 @@ common_construct(struct netdev *netdev, dpdk_port_t port_no,
> dev->requested_n_txq = NR_QUEUE;
> dev->requested_rxq_size = NIC_PORT_DEFAULT_RXQ_SIZE;
> dev->requested_txq_size = NIC_PORT_DEFAULT_TXQ_SIZE;
> + dev->requested_cp_prot_flags = 0;
> + dev->cp_prot_flags = 0;
> + dev->cp_prot_flows_num = 0;
> + dev->cp_prot_flows = NULL;
>
> /* Initialize the flow control to NULL */
> memset(&dev->fc_conf, 0, sizeof dev->fc_conf);
> @@ -1489,6 +1505,8 @@ common_destruct(struct netdev_dpdk *dev)
> ovs_mutex_destroy(&dev->mutex);
> }
>
> +static void dpdk_cp_prot_unconfigure(struct netdev_dpdk *dev);
> +
> static void
> netdev_dpdk_destruct(struct netdev *netdev)
> {
> @@ -1526,6 +1544,9 @@ netdev_dpdk_destruct(struct netdev *netdev)
> }
> }
>
> + /* Destroy any rte flows to allow RXQs to be removed. */
> + dpdk_cp_prot_unconfigure(dev);
> +
> /* Retrieve eth device data before closing it. */
> rte_eth_dev_info_get(dev->port_id, &dev_info);
>
> @@ -1910,6 +1931,9 @@ dpdk_set_rxq_config(struct netdev_dpdk *dev, const struct smap *args)
> int new_n_rxq;
>
> new_n_rxq = MAX(smap_get_int(args, "n_rxq", NR_QUEUE), 1);
> + if (dev->requested_cp_prot_flags) {
I'd rather check whether cp-protection has a chance to work, by checking both
that the flags are != 0 and that they do not contain DPDK_CP_PROT_UNSUPPORTED.
So:
if (dev->requested_cp_prot_flags & ~DPDK_CP_PROT_UNSUPPORTED)
This way, there should be no need for the "dev->requested_n_rxq -= 1"
later in netdev_dpdk_reconfigure.
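In dpdk_set_rxq_config, that would give something like this (untested sketch):

    new_n_rxq = MAX(smap_get_int(args, "n_rxq", NR_QUEUE), 1);
    if (dev->requested_cp_prot_flags & ~DPDK_CP_PROT_UNSUPPORTED) {
        new_n_rxq += 1;
    }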
> + new_n_rxq += 1;
> + }
> if (new_n_rxq != dev->requested_n_rxq) {
> dev->requested_n_rxq = new_n_rxq;
> netdev_request_reconfigure(&dev->up);
> @@ -1933,6 +1957,50 @@ dpdk_process_queue_size(struct netdev *netdev, const struct smap *args,
> }
> }
>
> +static void
> +dpdk_cp_prot_set_config(struct netdev *netdev, struct netdev_dpdk *dev,
> + const struct smap *args, char **errp)
> +{
> + const char *arg = smap_get_def(args, "cp-protection", "");
> + uint64_t flags = 0;
> + char buf[256];
> + char *token, *saveptr;
> +
> + ovs_strzcpy(buf, arg, sizeof(buf));
> + buf[sizeof(buf) - 1] = '\0';
This is control path stuff, so there is no need for a static buffer: please use xstrdup() + free() instead.
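Untested sketch of what I mean, keeping the same parsing loop:

    char *buf, *token, *saveptr;

    buf = xstrdup(arg);
    token = strtok_r(buf, ",", &saveptr);
    while (token) {
        if (strcmp(token, "lacp") == 0) {
            flags |= DPDK_CP_PROT_LACP;
        } else {
            VLOG_WARN_BUF(
                errp, "%s options:cp-protection unknown protocol '%s'",
                netdev_get_name(netdev), token);
        }
        token = strtok_r(NULL, ",", &saveptr);
    }
    free(buf);

This also avoids silently truncating the option value when it does not fit in
256 bytes.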
> +
> + token = strtok_r(buf, ",", &saveptr);
> + while (token) {
> + if (strcmp(token, "lacp") == 0) {
> + flags |= DPDK_CP_PROT_LACP;
> + } else {
> + VLOG_WARN_BUF(
> + errp, "%s options:cp-protection unknown protocol '%s'",
> + netdev_get_name(netdev), token);
> + }
> + token = strtok_r(NULL, ",", &saveptr);
> + }
> +
> + if (flags && dev->type != DPDK_DEV_ETH) {
> + VLOG_WARN_BUF( errp,
> + "%s options:cp-protection is only supported on ethernet ports",
> + netdev_get_name(netdev));
> + flags = 0;
> + }
> +
> + if (flags && netdev_is_flow_api_enabled()) {
> + VLOG_WARN_BUF(errp,
> + "%s options:cp-protection is incompatible with hw-offload",
> + netdev_get_name(netdev));
> + flags = 0;
> + }
> +
> + if (flags != dev->requested_cp_prot_flags) {
> + dev->requested_cp_prot_flags = flags;
> + netdev_request_reconfigure(netdev);
> + }
> +}
> +
> static int
> netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args,
> char **errp)
> @@ -1952,6 +2020,9 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args,
> ovs_mutex_lock(&dpdk_mutex);
> ovs_mutex_lock(&dev->mutex);
>
> + /* needs to be called before rxq_config */
> + dpdk_cp_prot_set_config(netdev, dev, args, errp);
> +
> dpdk_set_rxq_config(dev, args);
>
> dpdk_process_queue_size(netdev, args, "n_rxq_desc",
> @@ -3646,8 +3717,10 @@ netdev_dpdk_get_status(const struct netdev *netdev, struct smap *args)
> struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> struct rte_eth_dev_info dev_info;
> const char *bus_info;
> + uint64_t cp_prot_flags;
> uint32_t link_speed;
> uint32_t dev_flags;
> + int n_rxq;
>
> if (!rte_eth_dev_is_valid_port(dev->port_id)) {
> return ENODEV;
> @@ -3659,6 +3732,8 @@ netdev_dpdk_get_status(const struct netdev *netdev, struct smap *args)
> link_speed = dev->link.link_speed;
> dev_flags = *dev_info.dev_flags;
> bus_info = rte_dev_bus_info(dev_info.device);
> + cp_prot_flags = dev->cp_prot_flags;
> + n_rxq = netdev->n_rxq;
> ovs_mutex_unlock(&dev->mutex);
> ovs_mutex_unlock(&dpdk_mutex);
>
> @@ -3701,6 +3776,24 @@ netdev_dpdk_get_status(const struct netdev *netdev, struct smap *args)
> ETH_ADDR_ARGS(dev->hwaddr));
> }
>
> + if (cp_prot_flags) {
> + if (cp_prot_flags & DPDK_CP_PROT_UNSUPPORTED) {
> + smap_add(args, "cp_protection_queue", "unsupported");
> + if (n_rxq > 1) {
> + smap_add_format(args, "rss_queues", "0-%d", n_rxq - 1);
> + } else {
> + smap_add(args, "rss_queues", "0");
> + }
> + } else {
> + smap_add_format(args, "cp_protection_queue", "%d", n_rxq - 1);
> + if (n_rxq > 2) {
> + smap_add_format(args, "rss_queues", "0-%d", n_rxq - 2);
> + } else {
> + smap_add(args, "rss_queues", "0");
> + }
> + }
> + }
> +
There is something wrong in the status after disabling the feature, so
I suspect cp_prot_flags is not correctly cleared when unconfiguring.
Before disabling:
options : {cp-protection=lacp,
dpdk-devargs="0000:82:00.0", n_rxq="2"}
status : {bus_info="bus_name=pci, vendor_id=15b3,
device_id=1019", cp_protection_queue="2", driver_name=mlx5_pci,
if_descr="DPDK 22.11.1 mlx5_pci", if_type="6", link_speed="100Gbps",
max_hash_mac_addrs="0", max_mac_addrs="128", max_rx_pktlen="1518",
max_rx_queues="1024", max_tx_queues="1024", max_vfs="0",
max_vmdq_pools="0", min_rx_bufsize="32", numa_id="1", port_no="0",
rss_queues="0-1"}
After disabling:
options : {dpdk-devargs="0000:82:00.0", n_rxq="2"}
status : {bus_info="bus_name=pci, vendor_id=15b3,
device_id=1019", cp_protection_queue="1", driver_name=mlx5_pci,
if_descr="DPDK 22.11.1 mlx5_pci", if_type="6", link_speed="100Gbps",
max_hash_mac_addrs="0", max_mac_addrs="128", max_rx_pktlen="1518",
max_rx_queues="1024", max_tx_queues="1024", max_vfs="0",
max_vmdq_pools="0", min_rx_bufsize="32", numa_id="1", port_no="0",
rss_queues="0"}
I still see cp_protection_queue / rss_queues fields in the status column.
> return 0;
> }
>
> @@ -4931,6 +5024,211 @@ static const struct dpdk_qos_ops trtcm_policer_ops = {
> .qos_queue_dump_state_init = trtcm_policer_qos_queue_dump_state_init
> };
>
> +static int
> +dpdk_cp_prot_add_flow(struct netdev_dpdk *dev,
> + const struct rte_flow_attr *attr,
> + const struct rte_flow_item items[],
> + const struct rte_flow_action actions[],
> + const char *desc, bool dry_run)
> +{
> + struct rte_flow_error error;
> + struct rte_flow *flow;
> + size_t num;
> +
> + if (dry_run) {
> + int ret;
> + ret = rte_flow_validate(dev->port_id, attr, items, actions, &error);
> + if (rte_flow_validate(dev->port_id, attr, items, actions, &error)) {
rte_flow_validate() is called twice here. Hopefully the second call yields the
same result, but as an optimisation, I propose reusing the first return value
with 'if (ret) {' :-).
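I.e. something like:

    ret = rte_flow_validate(dev->port_id, attr, items, actions, &error);
    if (ret) {
        VLOG_WARN("%s: cp-protection: device does not support %s flow: %s",
                  netdev_get_name(&dev->up), desc, error.message);
    }
    return ret;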
> + VLOG_WARN("%s: cp-protection: device does not support %s flow: %s",
> + netdev_get_name(&dev->up), desc, error.message);
> + }
> + return ret;
> + }
> +
> + flow = rte_flow_create(dev->port_id, attr, items, actions, &error);
> + if (flow == NULL) {
> + VLOG_WARN("%s: cp-protection: failed to add %s flow: %s",
> + netdev_get_name(&dev->up), desc, error.message);
> + return rte_errno;
> + }
> +
> + num = dev->cp_prot_flows_num + 1;
> + dev->cp_prot_flows = xrealloc(dev->cp_prot_flows, sizeof(flow) * num);
> + dev->cp_prot_flows[dev->cp_prot_flows_num] = flow;
> + dev->cp_prot_flows_num = num;
> +
> + return 0;
> +}
> +
> +static int
> +dpdk_cp_prot_add_traffic_flow(struct netdev_dpdk *dev,
> + const struct rte_flow_item items[],
> + const char *desc, bool dry_run)
> +{
> + const struct rte_flow_attr attr = { .ingress = 1 };
> + const struct rte_flow_action actions[] = {
> + {
> + .type = RTE_FLOW_ACTION_TYPE_QUEUE,
> + .conf = &(const struct rte_flow_action_queue) {
> + .index = dev->up.n_rxq - 1,
> + },
> + },
> + { .type = RTE_FLOW_ACTION_TYPE_END },
> + };
> +
> + if (!dry_run) {
> + VLOG_INFO("%s: cp-protection: redirecting %s traffic to queue %d",
> + netdev_get_name(&dev->up), desc, dev->up.n_rxq - 1);
> + }
> + return dpdk_cp_prot_add_flow(dev, &attr, items, actions, desc, dry_run);
> +}
> +
> +#define RETA_CONF_SIZE (RTE_ETH_RSS_RETA_SIZE_512 / RTE_ETH_RETA_GROUP_SIZE)
> +
> +static int
> +dpdk_cp_prot_rss_configure(struct netdev_dpdk *dev, int rss_n_rxq, bool verbose)
Checkpatch complains about the 80-column limit here.
> +{
> + struct rte_eth_rss_reta_entry64 reta_conf[RETA_CONF_SIZE];
> + struct rte_eth_dev_info info;
> + int err;
> +
> + if (rss_n_rxq == 1 && verbose) {
> + VLOG_INFO("%s: cp-protection: redirecting other traffic to queue 0",
> + netdev_get_name(&dev->up));
> + } else if (verbose) {
> + VLOG_INFO("%s: cp-protection: applying rss on queues 0-%d",
> + netdev_get_name(&dev->up), rss_n_rxq - 1);
> + }
> +
> + memset(&info, 0, sizeof(info));
Unneeded: rte_eth_dev_info_get() clears this parameter itself.
> + err = rte_eth_dev_info_get(dev->port_id, &info);
> + if (err < 0) {
> + goto out;
> + }
> +
> + if (info.reta_size % rss_n_rxq != 0 &&
> + info.reta_size < RTE_ETH_RSS_RETA_SIZE_128) {
> + /*
> + * Some drivers set reta_size equal to the total number of rxqs that
> + * are configured when it is a power of two. Since we are actually
> + * reconfiguring the redirection table to exclude the last rxq, we may
> + * end up with an imbalanced redirection table. For example, such
> + * configuration:
> + *
> + * options:n_rxq=3 options:cp-protection=lacp
> + *
> + * Will actually configure 4 rxqs on the NIC, and the default reta to:
> + *
> + * [0, 1, 2, 3]
> + *
> + * And dpdk_cp_prot_rss_configure() will reconfigure reta to:
> + *
> + * [0, 1, 2, 0]
> + *
> + * Causing queue 0 to receive twice as much traffic as queues 1 and 2.
> + *
> + * Work around that corner case by forcing a bigger redirection table
> + * size to 128 entries when reta_size is not a multiple of rss_n_rxq
> + * and when reta_size is less than 128. This value seems to be
> + * supported by most of the drivers that also support rte flow.
> + */
> + info.reta_size = RTE_ETH_RSS_RETA_SIZE_128;
> + }
> +
> + memset(reta_conf, 0, sizeof(reta_conf));
> + for (uint16_t i = 0; i < info.reta_size; i++) {
> + uint16_t idx = i / RTE_ETH_RETA_GROUP_SIZE;
> + uint16_t shift = i % RTE_ETH_RETA_GROUP_SIZE;
> + reta_conf[idx].mask |= 1ULL << shift;
> + reta_conf[idx].reta[shift] = i % rss_n_rxq;
> + }
> + err = rte_eth_dev_rss_reta_update(dev->port_id, reta_conf, info.reta_size);
> +
> +out:
> + if (err < 0 && verbose) {
> + VLOG_DBG("%s: failed to configure RSS redirection table: err=%d",
> + netdev_get_name(&dev->up), err);
> + }
> + return err;
> +}
> +
> +static int
> +dpdk_cp_prot_configure(struct netdev_dpdk *dev, bool dry_run)
> +{
> + int err = 0;
> +
> + if (dev->requested_cp_prot_flags & DPDK_CP_PROT_UNSUPPORTED) {
> + goto out;
> + }
> + if (dev->up.n_rxq < 2) {
> + err = ENOTSUP;
> + VLOG_DBG("%s: cp-protection: not enough available rx queues",
> + netdev_get_name(&dev->up));
> + goto out;
> + }
> +
> + if (dev->requested_cp_prot_flags & DPDK_CP_PROT_LACP) {
> + err = dpdk_cp_prot_add_traffic_flow(
> + dev,
> + (const struct rte_flow_item []) {
> + {
> + .type = RTE_FLOW_ITEM_TYPE_ETH,
> + .spec = &(const struct rte_flow_item_eth){
> + .type = htons(ETH_TYPE_LACP),
> + },
> + .mask = &(const struct rte_flow_item_eth){
> + .type = htons(0xffff),
> + },
> + },
> + { .type = RTE_FLOW_ITEM_TYPE_END },
> + },
> + "lacp",
> + dry_run
> + );
> + if (err) {
> + goto out;
> + }
> + }
> +
> + if (!dry_run && dev->cp_prot_flows_num) {
> + /* reconfigure RSS reta in all but the cp protection queue */
> + err = dpdk_cp_prot_rss_configure(dev, dev->up.n_rxq - 1, true);
> + }
> +
> +out:
> + if (!dry_run) {
> + dev->cp_prot_flags = dev->requested_cp_prot_flags;
> + }
> + if (err) {
> + dev->requested_cp_prot_flags |= DPDK_CP_PROT_UNSUPPORTED;
> + }
> + return err;
> +}
> +
> +static void
> +dpdk_cp_prot_unconfigure(struct netdev_dpdk *dev)
> +{
> + struct rte_flow_error error;
> +
> + if (dev->cp_prot_flows_num == 0) {
> + return;
> + }
> +
> + VLOG_DBG("%s: cp-protection: reset flows", netdev_get_name(&dev->up));
> +
> + for (int i = 0; i < dev->cp_prot_flows_num; i++) {
> + if (rte_flow_destroy(dev->port_id, dev->cp_prot_flows[i], &error)) {
> + VLOG_DBG("%s: cp-protection: failed to destroy flow: %s",
> + netdev_get_name(&dev->up), error.message);
> + }
> + }
> + free(dev->cp_prot_flows);
> + dev->cp_prot_flows_num = 0;
> + dev->cp_prot_flows = NULL;
> +
> + (void) dpdk_cp_prot_rss_configure(dev, dev->up.n_rxq, false);
Please log a debug message if it fails.
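Something along these lines (untested, message wording is just a suggestion):

    if (dpdk_cp_prot_rss_configure(dev, dev->up.n_rxq, false) < 0) {
        VLOG_DBG("%s: cp-protection: failed to restore RSS configuration",
                 netdev_get_name(&dev->up));
    }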
> +}
> +
> static int
> netdev_dpdk_reconfigure(struct netdev *netdev)
> {
> @@ -4941,6 +5239,7 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
>
> if (netdev->n_txq == dev->requested_n_txq
> && netdev->n_rxq == dev->requested_n_rxq
> + && dev->cp_prot_flags == dev->requested_cp_prot_flags
> && dev->mtu == dev->requested_mtu
> && dev->lsc_interrupt_mode == dev->requested_lsc_interrupt_mode
> && dev->rxq_size == dev->requested_rxq_size
> @@ -4968,6 +5267,8 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
> goto out;
> }
>
> + dpdk_cp_prot_unconfigure(dev);
> +
> dev->lsc_interrupt_mode = dev->requested_lsc_interrupt_mode;
>
> netdev->n_txq = dev->requested_n_txq;
> @@ -5012,6 +5313,20 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
> if (!dev->tx_q) {
> err = ENOMEM;
> }
> + if (!err && dev->requested_cp_prot_flags) {
> + /* dry run first */
> + err = dpdk_cp_prot_configure(dev, true);
> + if (!err) {
> + /* if no error, apply configuration */
> + err = dpdk_cp_prot_configure(dev, false);
I don't like this dry_run stuff, it makes the code hard to read...
At least move the dev->cp_prot_flags update out of the dpdk_cp_prot_configure
helper, and instead store/clear dev->cp_prot_flags only in
netdev_dpdk_reconfigure (rough sketch further down).
> + }
> + if (err) {
> + /* no hw support, remove the extra queue & recover gracefully */
> + err = 0;
> + dev->requested_n_rxq -= 1;
It should be unneeded if dpdk_set_rxq_config checks for UNSUPPORTED as
I proposed earlier.
> + netdev_request_reconfigure(netdev);
> + }
> + }
I suspect a dev->cp_prot_flags = 0; is missing here and this is likely
the cause of the wrong status after disabling the feature.
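Tying this together with my dry_run comment above, the end of
netdev_dpdk_reconfigure() could look something like this rough, untested
sketch (assuming the flags bookkeeping is removed from
dpdk_cp_prot_configure()):

    if (!err && dev->requested_cp_prot_flags) {
        /* Dry run first to check hardware support, then apply for real. */
        err = dpdk_cp_prot_configure(dev, true);
        if (!err) {
            err = dpdk_cp_prot_configure(dev, false);
        }
        if (err) {
            /* No hw support, remember it and recover gracefully. */
            err = 0;
            dev->requested_cp_prot_flags |= DPDK_CP_PROT_UNSUPPORTED;
            netdev_request_reconfigure(netdev);
        }
    }
    /* Store (or clear) the applied flags here, not in the helper. */
    dev->cp_prot_flags = dev->requested_cp_prot_flags;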
>
> netdev_change_seq_changed(netdev);
>
> @@ -5213,7 +5528,13 @@ netdev_dpdk_flow_api_supported(struct netdev *netdev)
> ovs_mutex_lock(&dev->mutex);
> if (dev->type == DPDK_DEV_ETH) {
> /* TODO: Check if we able to offload some minimal flow. */
> - ret = true;
> + if (dev->requested_cp_prot_flags || dev->cp_prot_flags) {
I am unclear whether we need to check anything here, but checking
requested_cp_prot_flags alone should be enough.
> + VLOG_WARN(
> + "%s: hw-offload is mutually exclusive with cp-protection",
> + netdev_get_name(netdev));
> + } else {
> + ret = true;
> + }
> }
> ovs_mutex_unlock(&dev->mutex);
> out:
> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> index f9bdb2d92bec..e34a1728108f 100644
> --- a/vswitchd/vswitch.xml
> +++ b/vswitchd/vswitch.xml
> @@ -3430,6 +3430,32 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \
> <p>This option may only be used with dpdk VF representors.</p>
> </column>
>
> + <column name="options" key="cp-protection"
> + type='{"type": "string", "enum": ["set", ["lacp"]]}'>
> + <p>
> + Allocate an extra Rx queue for control plane packets of the specified
> + protocol(s).
> + </p>
> + <p>
> + If the user has already configured multiple
> + <code>options:n_rxq</code> on the port, an additional one will be
> + allocated for control plane packets. If the hardware cannot satisfy
> + the requested number of Rx queues, the last Rx queue will
> + be assigned for control plane. If only one Rx queue is available or
> + if the hardware does not support the RTE flow matchers/actions
> + required to redirect the selected protocols,
> + <code>cp-protection</code> will be disabled.
> + </p>
> + <p>
> + This feature is mutually exclusive with
> + <code>other_config:hw-offload</code> as it may conflict with the
> + offloaded RTE flows.
> + </p>
> + <p>
> + Disabled by default.
> + </p>
> + </column>
> +
> <column name="other_config" key="tx-steering"
> type='{"type": "string",
> "enum": ["set", ["thread", "hash"]]}'>
> --
> 2.38.1
>
--
David Marchand
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev