On 5/3/23 11:49, Kevin Traynor wrote:
> On 28/04/2023 19:16, Ilya Maximets wrote:
>> On 4/14/23 17:44, Kevin Traynor wrote:
>>> By default OVS configures 2048 descriptors for tx and rx queues
>>> on DPDK devices. It also allows the user to configure those values.
>>>
>>> If the values used are not acceptable to the device then queue setup
>>> would fail.
>>>
>>> The device exposes its max/min/alignment requirements and OVS
>>> applies some limits also. Use these to ensure an acceptable value
>>> is used for the number of descriptors on a device tx/rx.
>>>
>>> If the default or user value is not acceptable, adjust to a suitable
>>> value and log.
>>>
>>> Reported-at: https://bugzilla.redhat.com/2119876
>>> Signed-off-by: Kevin Traynor <[email protected]>
>>> ---
>>>  lib/netdev-dpdk.c | 76 +++++++++++++++++++++++++++++++++++++----------
>>>  1 file changed, 61 insertions(+), 15 deletions(-)
>>>
>>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>>> index fb0dd43f7..cf91990e5 100644
>>> --- a/lib/netdev-dpdk.c
>>> +++ b/lib/netdev-dpdk.c
>>> @@ -510,4 +510,9 @@ struct netdev_dpdk {
>>>      int requested_txq_size;
>>> +    /* Adjusted sizes are requested sizes after any adjustments
>>> +     * for OVS or device limits. */
>>> +    int adjusted_rxq_size;
>>> +    int adjusted_txq_size;
>>> +
>>>      /* Number of rx/tx descriptors for physical devices */
>>>      int rxq_size;
>>> @@ -1917,8 +1922,29 @@ dpdk_set_rxq_config(struct netdev_dpdk *dev, const
>>> struct smap *args)
>>>  static void
>>>  dpdk_process_queue_size(struct netdev *netdev, const struct smap *args,
>>> -                        const char *flag, int default_size, int *new_size)
>>> +                        struct rte_eth_dev_info *info, bool is_rx)
>>>  {
>>> -    int queue_size = smap_get_int(args, flag, default_size);
>>> +    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
>>> +    struct rte_eth_desc_lim *lim;
>>> +    int default_size, queue_size;
>>> +    int *requested_size, *adjusted_size;
>>> +    if (is_rx) {
>>> +        default_size = NIC_PORT_DEFAULT_RXQ_SIZE;
>>> +        queue_size = smap_get_int(args, "n_rxq_desc", default_size);
>>> +        requested_size = &dev->requested_rxq_size;
>>> +        adjusted_size = &dev->adjusted_rxq_size;
>>> +        lim = info ? &info->rx_desc_lim : NULL;
>>> +
>>> +    } else {
>>> +        default_size = NIC_PORT_DEFAULT_TXQ_SIZE;
>>> +        queue_size = smap_get_int(args, "n_txq_desc", default_size);
>>> +        requested_size = &dev->requested_txq_size;
>>> +        adjusted_size = &dev->adjusted_txq_size;
>>> +        lim = info ? &info->tx_desc_lim : NULL;
>>> +    }
>>> +
>>> +    *requested_size = queue_size;
>>> +
>>> +    /* Check for OVS limits. */
>>>      if (queue_size <= 0 || queue_size > NIC_PORT_MAX_Q_SIZE
>>>              || !is_pow2(queue_size)) {
>>> @@ -1926,6 +1952,15 @@ dpdk_process_queue_size(struct netdev *netdev, const
>>> struct smap *args,
>>>      }
>>> -    if (queue_size != *new_size) {
>>> -        *new_size = queue_size;
>>> +    if (lim) {
>>> +        /* Check for device limits. */
>>> +        if (lim->nb_align) {
>>> +            queue_size = ROUND_UP(queue_size, lim->nb_align);
>>> +        }
>>> +        queue_size = MIN(queue_size, lim->nb_max);
>>> +        queue_size = MAX(queue_size, lim->nb_min);
>>> +    }
>>> +
>>> +    if (queue_size != *adjusted_size) {
>>> +        *adjusted_size = queue_size;
>>>          netdev_request_reconfigure(netdev);
>>>      }
>>> @@ -1944,7 +1979,9 @@ netdev_dpdk_set_config(struct netdev *netdev, const
>>> struct smap *args,
>>>          {RTE_ETH_FC_RX_PAUSE, RTE_ETH_FC_FULL }
>>>      };
>>> +    struct rte_eth_dev_info info;
>>>      const char *new_devargs;
>>>      const char *vf_mac;
>>>      int err = 0;
>>> +    int ret;
>>>      ovs_mutex_lock(&dpdk_mutex);
>>> @@ -1953,11 +1990,4 @@ netdev_dpdk_set_config(struct netdev *netdev, const
>>> struct smap *args,
>>>      dpdk_set_rxq_config(dev, args);
>>> -    dpdk_process_queue_size(netdev, args, "n_rxq_desc",
>>> -                            NIC_PORT_DEFAULT_RXQ_SIZE,
>>> -                            &dev->requested_rxq_size);
>>> -    dpdk_process_queue_size(netdev, args, "n_txq_desc",
>>> -                            NIC_PORT_DEFAULT_TXQ_SIZE,
>>> -                            &dev->requested_txq_size);
>>> -
>>>      new_devargs = smap_get(args, "dpdk-devargs");
>>> @@ -2079,4 +2109,9 @@ netdev_dpdk_set_config(struct netdev *netdev, const
>>> struct smap *args,
>>>      }
>>> +    ret = rte_eth_dev_info_get(dev->port_id, &info);
>>> +
>>> +    dpdk_process_queue_size(netdev, args, !ret ? &info : NULL, true);
>>> +    dpdk_process_queue_size(netdev, args, !ret ? &info : NULL, false);
>>> +
>>>  out:
>>>      ovs_mutex_unlock(&dev->mutex);
>>> @@ -5191,6 +5226,6 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
>>>          && dev->mtu == dev->requested_mtu
>>>          && dev->lsc_interrupt_mode == dev->requested_lsc_interrupt_mode
>>> -        && dev->rxq_size == dev->requested_rxq_size
>>> -        && dev->txq_size == dev->requested_txq_size
>>> +        && dev->rxq_size == dev->adjusted_rxq_size
>>> +        && dev->txq_size == dev->adjusted_txq_size
>>>          && eth_addr_equals(dev->hwaddr, dev->requested_hwaddr)
>>>          && dev->socket_id == dev->requested_socket_id
>>> @@ -5221,6 +5256,17 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
>>>      netdev->n_rxq = dev->requested_n_rxq;
>>> -    dev->rxq_size = dev->requested_rxq_size;
>>> -    dev->txq_size = dev->requested_txq_size;
>>> +    dev->rxq_size = dev->adjusted_rxq_size;
>>> +    if (dev->rxq_size != dev->requested_rxq_size) {
>>> +        VLOG_INFO("Interface %s cannot set rx descriptor size to %d. "
>>> +                  "Adjusted to %d.", dev->up.name, dev->requested_rxq_size,
>>> +                  dev->rxq_size);
>>> +    }
>>> +
>>> +    dev->txq_size = dev->adjusted_txq_size;
>>> +    if (dev->txq_size != dev->requested_txq_size) {
>>> +        VLOG_INFO("Interface %s cannot set tx descriptor size to %d. "
>>> +                  "Adjusted to %d.", dev->up.name, dev->requested_txq_size,
>>> +                  dev->txq_size);
>>> +    }
>>
>> Hi, Kevin.
>>
>
> Hi Ilya, thanks for reviewing.
>
>> I'm not sure if there is a point in having 'requested' variables after this
>> change.
>>
>
> I kept requested for the log in get_config only, will explain more below.
>
>> Both 'adjusted' and 'requested' are determined after
>> dpdk_process_queue_size() and not going to change. So, the logs above can
>> likely be moved to that function under if (queue_size != *adjusted_size).
>>
>
> yes, that's true. I can move it there.
>
>> The only side effect is that we will log adjustment even if device
>> configuration ultimately fails, but it doesn't sound like an issue.
>>
>
> agreed
>
>> The other place where 'requested' value is used is mempool size calculation.
>> And we should use the adjusted value there instead, because that is the
>> number of buffers we will ultimately use.
>>
>
> ack
>
>> The last place where it is used is the get_config() callback. Information
>> there is not something we should report anyway, so I don't see the reason
>> why we can't change the 'requested' to 'adjusted' there. Maybe use a
>> different word though, so users can better understand what happened. In
>> general, get_config() should just return the same 'n_rxq_desc' that was
>> initially provided by the user, but returning 'n_rxq_desc' equal to the
>> 'adjusted' value is also acceptable, because by setting that value the user
>> should be able to reproduce the configuration, and that is the actual
>> meaning of the get_config() callback.
>>
>> All in all, the 'requested' value can probably be just a local variable
>> inside of the dpdk_process_queue_size().
>>
>> What do you think?
>>
>
> ok, I think I understand where the difference in approach is coming from. I
> wasn't sure of the reason that the previous 'requested' (post OVS
> adjustments) was reported, as it would be the same as the configured values.
>
> Seeing as it is called requested and the user param is called requested, I
> thought it should show what the user requested value was (or OVS default in
> absence).
>
> Now that the requested and adjusted will be logged if there are any changes
> needed to what the user has requested/default, it won't be silently adjusted
> anymore.
>
> So if 'user requested' doesn't fit the context of get_config, then I can
> remove it from there. Though it can sometimes be handier to pull the info
> with a command than scroll through logs.

You should still be able to pull the value. The value that the user has
requested can be obtained from the 'options' column in the database record.
And if we report the actual value in use as 'n_rxq_desc' in get_config(),
it can be seen with the dpif/show command.

>
> Not sure if there's much point to continue to have 'adjusted' in get_config,
> it will be the same as n_rx/tx_desc and it may be hard to find a good name.
> If it's kept I would definitely suggest we don't have 'requested' in the name
> anymore as that is easily misinterpreted as being the user requested value.

I'd say we should just have 'n_r/txq_desc' in the output that will represent
what is actually configured, without any extra prefixes.

Also, see the implementation of the "set-if" command in dpctl.c, which is
currently broken for netdev-dpdk due to the free-format result of
get_config(). TBH, I don't think anyone is using it, but it does exist and
it highlights the intended use of the netdev API.

>
> Hope that is readable :-)
>
> thanks,
> Kevin.
>
>> Best regards, Ilya Maximets.
>>
>
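
For reference, the adjustment order in the quoted patch (OVS limits first, then device alignment, then device max, then device min) can be sketched standalone as below. This is only an illustration, not the code from lib/netdev-dpdk.c: 'struct desc_lim' is a hypothetical stand-in for the three fields of DPDK's 'struct rte_eth_desc_lim' that the patch reads, 'adjust_queue_size()' is a made-up name, the 4096 upper bound is an assumed value for NIC_PORT_MAX_Q_SIZE, and the fallback to the default inside the OVS check is an assumption because that body is not visible in the quoted hunks.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the fields of DPDK's struct rte_eth_desc_lim
 * that the quoted patch reads. */
struct desc_lim {
    uint16_t nb_max;    /* Maximum number of descriptors. */
    uint16_t nb_min;    /* Minimum number of descriptors. */
    uint16_t nb_align;  /* Required alignment; 0 means no requirement. */
};

#define DEFAULT_Q_SIZE 2048  /* Default stated in the commit message. */
#define MAX_Q_SIZE     4096  /* Assumed OVS upper bound, for illustration. */

static bool
is_pow2(int x)
{
    return x > 0 && !(x & (x - 1));
}

/* Returns the descriptor count that would be used for 'requested'.
 * 'lim' may be NULL when the device limits could not be queried. */
static int
adjust_queue_size(int requested, const struct desc_lim *lim)
{
    int size = requested;

    /* OVS limits.  Falling back to the default is an assumption here. */
    if (size <= 0 || size > MAX_Q_SIZE || !is_pow2(size)) {
        size = DEFAULT_Q_SIZE;
    }

    if (lim) {
        /* Device limits, in the same order as the patch. */
        if (lim->nb_align) {
            /* Round up to the next multiple of the alignment. */
            size = (size + lim->nb_align - 1) / lim->nb_align
                   * lim->nb_align;
        }
        if (size > lim->nb_max) {
            size = lim->nb_max;
        }
        if (size < lim->nb_min) {
            size = lim->nb_min;
        }
    }
    return size;
}
```

With made-up limits of max 1024, min 64, align 32, a request for 2048 comes back as 1024, and feeding 1024 back in returns 1024 again, which is the round trip discussed above for get_config(). The sketch does not establish that this holds for every combination of device limits.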
