On 23.07.2017 20:20, Darrell Ball wrote:
> Comments inline but this is mostly repetitive.
>
> -----Original Message-----
> From: Ilya Maximets <[email protected]>
> Date: Monday, July 10, 2017 at 11:55 PM
> To: Darrell Ball <[email protected]>, "[email protected]"
> <[email protected]>
> Cc: Heetae Ahn <[email protected]>, Daniele Di Proietto
> <[email protected]>, Ben Pfaff <[email protected]>, Pravin Shelar
> <[email protected]>, Ciara Loftus <[email protected]>, Ian Stokes
> <[email protected]>, Kevin Traynor <[email protected]>
> Subject: Re: [PATCH v2 2/3] dpif-netdev: Avoid port's reconfiguration on
> pmd-cpu-mask changes.
>
> On 11.07.2017 05:10, Darrell Ball wrote:
> >
> >
> > On 7/10/17, 12:41 AM, "Ilya Maximets" <[email protected]> wrote:
> >
> > On 07.07.2017 21:09, Darrell Ball wrote:
> > >
> > >
> > > On 7/6/17, 11:11 PM, "Ilya Maximets" <[email protected]> wrote:
> > >
> > > On 07.07.2017 08:08, Darrell Ball wrote:
> > > >
> > > >
> > > > On 5/30/17, 7:12 AM, "Ilya Maximets" <[email protected]> wrote:
> > > >
> > > > Reconfiguration of HW NICs may lead to packet drops.
> > > > In the current model all physical ports will be reconfigured each
> > > > time the number of PMD threads changes. Since we do not stop
> > > > threads on pmd-cpu-mask changes, this patch will help to further
> > > > decrease the port's downtime by setting the maximum possible number
> > > > of wanted tx queues to avoid unnecessary reconfigurations.
> > > >
> > > > Signed-off-by: Ilya Maximets <[email protected]>
> > > > ---
> > > > lib/dpif-netdev.c | 26 +++++++++++++++++++++-----
> > > > 1 file changed, 21 insertions(+), 5 deletions(-)
> > > >
> > > > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> > > > index 596d133..79db770 100644
> > > > --- a/lib/dpif-netdev.c
> > > > +++ b/lib/dpif-netdev.c
> > > > @@ -3453,7 +3453,7 @@ reconfigure_datapath(struct dp_netdev *dp)
> > > >  {
> > > >      struct dp_netdev_pmd_thread *pmd;
> > > >      struct dp_netdev_port *port;
> > > > -    int wanted_txqs;
> > > > +    int needed_txqs, wanted_txqs;
> > > >
> > > >      dp->last_reconfigure_seq = seq_read(dp->reconfigure_seq);
> > > >
> > > > @@ -3461,7 +3461,15 @@ reconfigure_datapath(struct dp_netdev *dp)
> > > >       * on the system and the user configuration. */
> > > >      reconfigure_pmd_threads(dp);
> > > >
> > > > -    wanted_txqs = cmap_count(&dp->poll_threads);
> > > > +    /* We need 1 Tx queue for each thread to avoid locking, but we will try
> > > > +     * to allocate the maximum possible value to minimize the number of port
> > > > +     * reconfigurations. */
> > > > +    needed_txqs = cmap_count(&dp->poll_threads);
> > > > +    /* (n_cores + 1) is the maximum that we might need to have.
> > > > +     * Additional queue is for non-PMD threads. */
> > > > +    wanted_txqs = ovs_numa_get_n_cores();
> > > > +    ovs_assert(wanted_txqs != OVS_CORE_UNSPEC);
> > > > +    wanted_txqs++;
> > > >
> > > > I don’t think PMD mask changes are common, so this patch is trying to
> > > > optimize a rare disruptive event that can/will be scheduled by the
> > > > administrator.
> > > >
> > > > Based on the actual number of queues supported and the number of cores
> > > > present, this optimization may or may not work. It is unpredictable
> > > > whether there will be a benefit in a particular case from the user POV.
> > > > If I were the administrator, I would try to err on the conservative
> > > > side anyway if I could not predict the result.
> > > >
> > > > Did I miss something?
> > >
> > > In an NFV environment, if you want to add one more VM to your hosts you
> > > will have to choose between:
> > >
> > >   * not creating a new PMD thread
> > >       -> performance degradation of networking for other working VMs
> > >
> > >   * adding a new PMD thread
> > >       -> disruption of the networking for the whole host for the time
> > >          of the OVS reconfiguration.
> > >
> > > This patch removes the cost of the second option.
> > >
> > > In general, adding a PMD thread per VM may not always (or even usually)
> > > make sense.
> >
> > Not per VM. Let's assume that all existing threads are already overloaded.
> >
> > That would be less common; it is not the usual case.
>
> We have different points of view. We need more opinions on that.
>
> >
> > > There are use cases for sure, but using a PMD core per VM is often
> > > viewed as dpdk using too much cpu resources and limiting the adoption
> > > of dpdk.
> > >
> > > Furthermore, for dpdk gateways it is irrelevant.
> >
> > Disagree with that statement.
> >
> > dpdk gateways should not usually require changing pmd-cpu-mask when a VM
> > is added to a hypervisor somewhere. Sometimes you need to deploy more
> > resources for sure, but this should be much less frequent than adding or
> > removing VMs.
> > The point is the user should rarely be changing the pmd-cpu-mask, and this
> > can be done during a maintenance window when needed.
> >
> >
> > >
> > > I don't understand what you mean by 'unpredictable benefit'. The benefit
> > > is clear, and this optimization will work, except for HW NICs with a
> > > very low number of HW queues.
> > >
> > > HW nic interfaces carry aggregated traffic for multiple VMs etc., so
> > > these cases are the most important.
> >
> > And that is exactly why this patch was implemented. This patch tries to
> > minimize the packet drops on the HW NICs "shared" between VMs. It allows
> > adding new threads without breaking networking for others, by not
> > reconfiguring the device that handles traffic for other VMs at that
> > moment.
> >
> > I understand what this patch wants to do, and I understand what the last
> > designs wanted to do as well.
> >
> >
> > > It is the ratio of the number of cores to hw queues that matters. If
> > > there are fewer queues than cores, then the benefit may not occur.
> >
> > That's right. But are you saying that we should not apply an optimization
> > if it covers only 50% of cases? (The percentage is actually bigger,
> > because modern NICs support quite a lot of tx queues.)
> >
> > Nobody knows what the number is, but I know that the number of cpus is
> > increasing, and making assumptions about the relative numbers of cores vs
> > queues can only work by luck.
>
> As you mentioned previously, you think that creating a large number of PMD
> threads is not a common case (this follows from "adding a PMD thread per VM
> may not always (or even usually) make sense"). This means that the
> optimization from this patch will usually work, because the number of PMD
> threads is much less than the number of cores and will usually be comparable
> to or less than the number of queues.
>
> Again, what I previously said is that I don’t expect each VM to have a
> dedicated PMD thread in the general case. The reasoning comes from the cost
> based on utilization. In general, the high reservation of cores for PMD
> threads is persistently mentioned as a major impediment to DPDK adoption.
>
> For DPDK gateways, I would expect anything is possible, including most cores
> having PMD threads running.
>
> Again, in general, I am not making any assumptions about the ratio of cores
> to queues, because I think these assumptions will break. The two parameters
> should be treated as independent parameters, which they are. This patch
> makes assumptions about the ratio of these two independent parameters.
>
>
> And this will work now, not in some ephemeral future where DPDK will return
> real numbers for available queues. In the future this may well keep working,
> because it is not only the number of CPU cores that is increasing.
>
> Again, the issue here is not whether it will work in some cases. The issue
> here is to give feedback to the user about when it might work. Changing the
> pmd-cpu-mask is an infrequent administrative command used on a
> ‘non-experimental product’, so the user should have feedback on what the
> result of the command would be.
> I might be ok with this patch if OVS-DPDK were still labelled as
> experimental, but I would not want to go back to the experimental tag if
> possible.
>
> > It is inherently unpredictable, and unpredictability for a user
> > initiated/controlled change is ugly.
> > This is even worse when the user-initiated change is rare, because the
> > user does not remember the full context of the command.
> >
> >
> > > This patch makes assumptions about the relative number of cores vs the
> > > number of queues. These two parameters should be treated as independent
> > > – this part really bothers me.
> >
> > It doesn't make any assumptions, it just optimizes one of the two possible
> > situations. If the number of queues is lower, the behaviour will be
> > exactly the same as in the current master.
> >
> > We are repeating the same thing.
> >
> >
> > > I think we could solve this in a coordinated, generic fashion with a
> > > user management command that would also give feedback to the user as to
> > > the possible success, and maybe provide options.
> > > I wonder if we will eventually go in that direction, in the future,
> > > anyway?
> >
> > I don't think so. We don't need a user management command because the only
> > sane values for the number of queues right now are the number of threads
> > or the maximum number of available queues in hardware. We're using the
> > first one only because HW is not able to say how many queues it is able to
> > successfully configure.
> >
> > The fact that hardware cannot always report the correct number of queues
> > is an issue that will get solved over time and should be part of what a
> > user command reports (e.g. “unknown queue number, therefore it is unknown
> > whether a pmd-cpu-mask change will be hitless; issue the following command
> > ‘X’ when you would like to make the change”).
>
> Reconfiguration will never be hitless if we do not even try to make it more
> lightweight.
>
> >
> > A wrapper user command (or commands) would provide visibility and
> > predictability to the user; i.e., we can do better.
>
> *Why do we need a configuration knob which can only decrease performance?*
>
> This will end up in a situation where users will just be forced to configure
> the number of tx queues for each port equal to the number of cores in the
> cpu mask. Or they will just set it to 1000 for all ports to be sure that the
> maximum possible value was configured.
>
> The key aspect here is giving feedback to the user about the result of a
> potential pmd-cpu-mask change. I think it can be done with the result being
> one of three possibilities:
> 1/ The command will be hitless.
> 2/ The command will not be hitless.
> 3/ The command's hitless impact is unknown; assume traffic will be affected.
>
> This should be supported by documentation.
>
> With this patch as is, the result is unknown in all cases.
How about the following incremental:
--8<----------------------------------------------------------------------->8--
diff --git a/Documentation/intro/install/dpdk.rst b/Documentation/intro/install/dpdk.rst
index a05aa1a..0462784 100644
--- a/Documentation/intro/install/dpdk.rst
+++ b/Documentation/intro/install/dpdk.rst
@@ -458,8 +458,19 @@ affinitized accordingly.
        $ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x4
    .. note::
-    pmd thread on a NUMA node is only created if there is at least one DPDK
-    interface from that NUMA node added to OVS.
+    - pmd thread on a NUMA node is only created if there is at least one DPDK
+      interface from that NUMA node added to OVS.
+
+    - Addition/deletion of PMD threads may be hitless for Rx queues pinned via
+      ``pmd-rxq-affinity`` if the threads being deleted do not handle them
+      and the ports have only one or a big enough number of Tx queues. If a
+      device's performance could be affected during ``pmd-cpu-mask`` changes,
+      a warning appears in the log::
+
+        Port 'port' has a low number of Tx queues. Traffic through this port
+        may be affected during reconfiguration if the number of PMD threads
+        is changed to { less than or equal to | greater than } N.
+
 - QEMU vCPU thread Affinity
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 74d3535..2a92d86 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -3531,7 +3531,19 @@ reconfigure_datapath(struct dp_netdev *dp)
             seq_change(dp->port_seq);
             port_destroy(port);
         } else {
-            port->dynamic_txqs = netdev_n_txq(port->netdev) < needed_txqs;
+            int n_txq = netdev_n_txq(port->netdev);
+
+            port->dynamic_txqs = n_txq < needed_txqs;
+
+            if (n_txq > 1 && n_txq < wanted_txqs) {
+                VLOG_WARN("Port '%s' has a low number of Tx queues. "
+                          "Traffic through this port may be affected during "
+                          "reconfiguration if the number of PMD threads is "
+                          "changed to %s %d.", netdev_get_name(port->netdev),
+                          port->dynamic_txqs ? "less than or equal to"
+                                             : "greater than",
+                          n_txq - 1);
+            }
         }
     }
--8<----------------------------------------------------------------------->8--
This will give an administrator all the information required to predict the
performance impact of pmd-cpu-mask changes.
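
To make that concrete, here is a hypothetical walk-through (the port name and
the numbers below are made up for illustration, not taken from the patch):
suppose a physical port 'dpdk0' reports 4 Tx queues, the host has 8 cores (so
wanted_txqs = 9) and 2 PMD threads are currently configured (so needed_txqs =
3, counting the non-PMD thread). The port is not using dynamic Tx queues, and
the proposed warning would read:

    Port 'dpdk0' has a low number of Tx queues. Traffic through this port
    may be affected during reconfiguration if the number of PMD threads
    is changed to greater than 3.

Growing pmd-cpu-mask to 3 PMD threads keeps needed_txqs within the 4 available
queues, so this port is not reconfigured and the change is hitless for it;
growing to 4 or more PMD threads flips 'dynamic_txqs' and triggers a
reconfiguration that may drop packets on this port.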
>
> And this option will have to configure the maximum possible value less than
> or equal to the passed one, because we have no way to find the maximum value
> until dpdk is fixed (which is actually the hard problem).
> Otherwise, they will have to bisect it, and that is the ugly thing.
>
> ‘IF’ there are issues with rte_eth_dev_info_get(), ‘in some modes’, with
> returning the real number
> of hardware queues, then this can be characterized (meaning in which cases)
> and fixed.
> In such a case, the feedback to the user should be conservative and fall into
> ‘3’ above.
>
>
> > This patch allows us to use the second value in all cases where possible.
> >
> > >
> > > But the situation will never be worse than without this patch anyway.
> > >
> > > About PMD mask changes:
> > > Even one change a week can be disruptive in a highly loaded NFV
> > > environment.
> > > Also, these changes can be performed automatically by monitoring tools
> > > during VM migrations, or when rebalancing the load between PMD threads
> > > to free CPU cores for other applications/VMs.
> > >
> > > >
> > > >      /* The number of pmd threads might have changed, or a port can be new:
> > > >       * adjust the txqs. */
> > > > @@ -3474,9 +3482,17 @@ reconfigure_datapath(struct dp_netdev *dp)
> > > >
> > > >      /* Check for all the ports that need reconfiguration. We cache this in
> > > >       * 'port->reconfigure', because netdev_is_reconf_required() can change at
> > > > -     * any time. */
> > > > +     * any time.
> > > > +     * Also mark for reconfiguration all ports which will likely change their
> > > > +     * 'dynamic_txqs' parameter. It's required to stop using them before
> > > > +     * changing this setting and it's simpler to mark ports here and allow
> > > > +     * 'pmd_remove_stale_ports' to remove them from threads. There will be
> > > > +     * no actual reconfiguration in 'port_reconfigure' because it's
> > > > +     * unnecessary. */
> > > >      HMAP_FOR_EACH (port, node, &dp->ports) {
> > > > -        if (netdev_is_reconf_required(port->netdev)) {
> > > > +        if (netdev_is_reconf_required(port->netdev)
> > > > +            || (port->dynamic_txqs
> > > > +                != netdev_n_txq(port->netdev) < needed_txqs)) {
> > > >              port->need_reconfigure = true;
> > > >          }
> > > >      }
> > > > @@ -3510,7 +3526,7 @@ reconfigure_datapath(struct dp_netdev *dp)
> > > >              seq_change(dp->port_seq);
> > > >              port_destroy(port);
> > > >          } else {
> > > > -            port->dynamic_txqs = netdev_n_txq(port->netdev) < wanted_txqs;
> > > > +            port->dynamic_txqs = netdev_n_txq(port->netdev) < needed_txqs;
> > > >          }
> > > >      }
> > > >
> > > > --
> > > > 2.7.4
>
>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev