On 03/05/2023 12:37, Ilya Maximets wrote:
On 5/3/23 11:49, Kevin Traynor wrote:
On 28/04/2023 19:16, Ilya Maximets wrote:
On 4/14/23 17:44, Kevin Traynor wrote:
By default OVS configures 2048 descriptors for tx and rx queues
on DPDK devices. It also allows the user to configure those values.

If the values used are not acceptable to the device then queue setup
would fail.

The device exposes it's max/min/alignment requirements and OVS
applies some limits also. Use these to ensure an acceptable value
is used for the number of descriptors on a device tx/rx.

If the default or user value is not acceptable, adjust to a suitable
value and log.

Reported-at: https://bugzilla.redhat.com/2119876
Signed-off-by: Kevin Traynor <[email protected]>
---
   lib/netdev-dpdk.c | 76 +++++++++++++++++++++++++++++++++++++----------
   1 file changed, 61 insertions(+), 15 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index fb0dd43f7..cf91990e5 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -510,4 +510,9 @@ struct netdev_dpdk {
           int requested_txq_size;
   +        /* Adjusted sizes are requested sizes after any adjustments
+         * for OVS or device limits. */
+        int adjusted_rxq_size;
+        int adjusted_txq_size;
+
           /* Number of rx/tx descriptors for physical devices */
           int rxq_size;
@@ -1917,8 +1922,29 @@ dpdk_set_rxq_config(struct netdev_dpdk *dev, const 
struct smap *args)
   static void
   dpdk_process_queue_size(struct netdev *netdev, const struct smap *args,
-                        const char *flag, int default_size, int *new_size)
+                        struct rte_eth_dev_info *info, bool is_rx)
   {
-    int queue_size = smap_get_int(args, flag, default_size);
+    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
+    struct rte_eth_desc_lim *lim;
+    int default_size, queue_size;
+    int *requested_size, *adjusted_size;
   +    if (is_rx) {
+        default_size = NIC_PORT_DEFAULT_RXQ_SIZE;
+        queue_size = smap_get_int(args, "n_rxq_desc", default_size);
+        requested_size = &dev->requested_rxq_size;
+        adjusted_size = &dev->adjusted_rxq_size;
+        lim = info ? &info->rx_desc_lim : NULL;
+
+    } else {
+        default_size = NIC_PORT_DEFAULT_TXQ_SIZE;
+        queue_size = smap_get_int(args, "n_txq_desc", default_size);
+        requested_size = &dev->requested_txq_size;
+        adjusted_size = &dev->adjusted_txq_size;
+        lim = info ? &info->tx_desc_lim : NULL;
+    }
+
+    *requested_size = queue_size;
+
+    /* Check for OVS limits. */
       if (queue_size <= 0 || queue_size > NIC_PORT_MAX_Q_SIZE
               || !is_pow2(queue_size)) {
@@ -1926,6 +1952,15 @@ dpdk_process_queue_size(struct netdev *netdev, const 
struct smap *args,
       }
   -    if (queue_size != *new_size) {
-        *new_size = queue_size;
+    if (lim) {
+        /* Check for device limits. */
+        if (lim->nb_align) {
+            queue_size = ROUND_UP(queue_size, lim->nb_align);
+        }
+        queue_size = MIN(queue_size, lim->nb_max);
+        queue_size = MAX(queue_size, lim->nb_min);
+    }
+
+    if (queue_size != *adjusted_size) {
+        *adjusted_size = queue_size;
           netdev_request_reconfigure(netdev);
       }
@@ -1944,7 +1979,9 @@ netdev_dpdk_set_config(struct netdev *netdev, const 
struct smap *args,
           {RTE_ETH_FC_RX_PAUSE, RTE_ETH_FC_FULL    }
       };
+    struct rte_eth_dev_info info;
       const char *new_devargs;
       const char *vf_mac;
       int err = 0;
+    int ret;
         ovs_mutex_lock(&dpdk_mutex);
@@ -1953,11 +1990,4 @@ netdev_dpdk_set_config(struct netdev *netdev, const 
struct smap *args,
       dpdk_set_rxq_config(dev, args);
   -    dpdk_process_queue_size(netdev, args, "n_rxq_desc",
-                            NIC_PORT_DEFAULT_RXQ_SIZE,
-                            &dev->requested_rxq_size);
-    dpdk_process_queue_size(netdev, args, "n_txq_desc",
-                            NIC_PORT_DEFAULT_TXQ_SIZE,
-                            &dev->requested_txq_size);
-
       new_devargs = smap_get(args, "dpdk-devargs");
   @@ -2079,4 +2109,9 @@ netdev_dpdk_set_config(struct netdev *netdev, const 
struct smap *args,
       }
   +    ret = rte_eth_dev_info_get(dev->port_id, &info);
+
+    dpdk_process_queue_size(netdev, args, !ret ? &info : NULL, true);
+    dpdk_process_queue_size(netdev, args, !ret ? &info : NULL, false);
+
   out:
       ovs_mutex_unlock(&dev->mutex);
@@ -5191,6 +5226,6 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
           && dev->mtu == dev->requested_mtu
           && dev->lsc_interrupt_mode == dev->requested_lsc_interrupt_mode
-        && dev->rxq_size == dev->requested_rxq_size
-        && dev->txq_size == dev->requested_txq_size
+        && dev->rxq_size == dev->adjusted_rxq_size
+        && dev->txq_size == dev->adjusted_txq_size
           && eth_addr_equals(dev->hwaddr, dev->requested_hwaddr)
           && dev->socket_id == dev->requested_socket_id
@@ -5221,6 +5256,17 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
       netdev->n_rxq = dev->requested_n_rxq;
   -    dev->rxq_size = dev->requested_rxq_size;
-    dev->txq_size = dev->requested_txq_size;
+    dev->rxq_size = dev->adjusted_rxq_size;
+    if (dev->rxq_size != dev->requested_rxq_size) {
+        VLOG_INFO("Interface %s cannot set rx descriptor size to %d. "
+                  "Adjusted to %d.", dev->up.name, dev->requested_rxq_size,
+                  dev->rxq_size);
+    }
+
+    dev->txq_size = dev->adjusted_txq_size;
+    if (dev->txq_size != dev->requested_txq_size) {
+        VLOG_INFO("Interface %s cannot set tx descriptor size to %d. "
+                  "Adjusted to %d.", dev->up.name, dev->requested_txq_size,
+                  dev->txq_size);
+    }

Hi, Kevin.


Hi Ilya, thanks for reviewing.

I'm not sure if there is a point in having 'requested' variables after this 
change.


I kept requested for the log in get_config only, will explain more below.

Both 'adjusted' and 'requested' are determined after dpdk_process_queue_size() 
and
not going to change.  So, the logs above can likely be moved to that function
under if (queue_size != *adjusted_size).


yes, that's true. I can move it there.

The only side effect is that we will log adjustment even if device configuration
ultimately fails, but it doesn't sound like an issue.


agreed

The other place where 'requested' value is used is mempool size calculation.
And we should use the adjusted value there instead, because that is the number
of buffers we will ultimately use.


ack

The last place where it is used is the get_config() callback.  Information there
is not something we should report anyway, so I don't see the reason why we can't
change the 'requested' to 'adjusted' there.  Maybe use a different word though,
so users can better understand what happened.  In general, get_config() should
just return the same 'n_rxq_desc' that was initially provided by the user, but
returning 'n_rxq_desc' equal to the 'adjusted' value is also acceptable, because
by setting that value user should be able to reproduce the configuration, and
that is the actual meaning of the get_config() callback.

All in all, the 'requested' value can probably be just a local variable inside
of the dpdk_process_queue_size().

What do you think?


ok, I think I understand where the difference in approach is coming from. I 
wasn't sure the reason that the previous 'requested' (post OVS adjustements) 
was reported as it would be the same as the configured values.

Seen as it is called requested and the user param is called requested, I 
thought it should show what the user requested value was (or OVS default in 
absence).

Now that the requested and adjusted will be logged if there are any changes 
needed to what the user has requested/default, it won't be silently adjusted 
anymore.

So if 'user requested' doesn't fit the context of get_config, then I can remove 
it from there. Though it can sometimes be handier to pull the info with a 
command than scroll through logs.

You should still be able to pull the value.  The value that user have
requested can be obtained from the 'options' column in the database
record.  And if we report the actual value in use as 'n_rxq_desc' in
get_config(), it can be seen with dpif/show command.


Ah yes, you're right - i'd just been thinking of the get_config. Those two together with the new logs should be plenty of info for the user.


Not sure if there's much point to continue to have 'adjusted' in get_config, it 
will be the same as n_rx/tx_desc and it may be hard to find a good name. If 
it's kept I would definitely suggest we don't have 'requested' in the name 
anymore as that is easily misinterpreted as being the user requested value.

I'd say we should just have 'n_r/txq_desc' in the output that will
represent what is actually configured, without any extra prefixes.


Sounds good.

Also, see the implementation of the "set-if" command in dpctl.c,
which is currently broken for netdev-dpdk due to free format result
of the get_config().  TBH, I don't think anyone is using it, but
it does exist and it highlights the intended use of the netdev API.


Thanks.


Hope that is readable :-)

thanks,
Kevin.

Best regards, Ilya Maximets.




_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to