-----Original Message-----
From: Ilya Maximets <[email protected]>
Date: Monday, July 24, 2017 at 4:58 AM
To: Darrell Ball <[email protected]>, "[email protected]"
<[email protected]>
Cc: Heetae Ahn <[email protected]>, Daniele Di Proietto
<[email protected]>, Ben Pfaff <[email protected]>, Pravin Shelar
<[email protected]>, Ciara Loftus <[email protected]>, Ian Stokes
<[email protected]>, Kevin Traynor <[email protected]>
Subject: Re: [PATCH v2 2/3] dpif-netdev: Avoid port's reconfiguration on
pmd-cpu-mask changes.
On 23.07.2017 20:20, Darrell Ball wrote:
> Comments inline but this is mostly repetitive.
>
> -----Original Message-----
> From: Ilya Maximets <[email protected]>
> Date: Monday, July 10, 2017 at 11:55 PM
> To: Darrell Ball <[email protected]>, "[email protected]"
<[email protected]>
> Cc: Heetae Ahn <[email protected]>, Daniele Di Proietto
<[email protected]>, Ben Pfaff <[email protected]>, Pravin Shelar
<[email protected]>, Ciara Loftus <[email protected]>, Ian Stokes
<[email protected]>, Kevin Traynor <[email protected]>
> Subject: Re: [PATCH v2 2/3] dpif-netdev: Avoid port's reconfiguration on
pmd-cpu-mask changes.
>
> On 11.07.2017 05:10, Darrell Ball wrote:
> >
> >
> > On 7/10/17, 12:41 AM, "Ilya Maximets" <[email protected]>
wrote:
> >
> > On 07.07.2017 21:09, Darrell Ball wrote:
> > >
> > >
> > > On 7/6/17, 11:11 PM, "Ilya Maximets" <[email protected]>
wrote:
> > >
> > > On 07.07.2017 08:08, Darrell Ball wrote:
> > > >
> > > >
> > > > On 5/30/17, 7:12 AM, "Ilya Maximets"
<[email protected]> wrote:
> > > >
> > > > Reconfiguration of HW NICs may lead to packet drops.
> > > > In the current model, all physical ports are reconfigured each
> > > > time the number of PMD threads changes. Since we are not stopping
> > > > threads on pmd-cpu-mask changes, this patch helps to further
> > > > decrease port downtime by setting the maximum possible number
> > > > of wanted tx queues to avoid unnecessary reconfigurations.
> > > >
> > > > Signed-off-by: Ilya Maximets
<[email protected]>
> > > > ---
> > > > lib/dpif-netdev.c | 26 +++++++++++++++++++++-----
> > > > 1 file changed, 21 insertions(+), 5 deletions(-)
> > > >
> > > > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> > > > index 596d133..79db770 100644
> > > > --- a/lib/dpif-netdev.c
> > > > +++ b/lib/dpif-netdev.c
> > > > @@ -3453,7 +3453,7 @@ reconfigure_datapath(struct
dp_netdev *dp)
> > > > {
> > > > struct dp_netdev_pmd_thread *pmd;
> > > > struct dp_netdev_port *port;
> > > > - int wanted_txqs;
> > > > + int needed_txqs, wanted_txqs;
> > > >
> > > > dp->last_reconfigure_seq =
seq_read(dp->reconfigure_seq);
> > > >
> > > > @@ -3461,7 +3461,15 @@ reconfigure_datapath(struct
dp_netdev *dp)
> > > > * on the system and the user configuration. */
> > > > reconfigure_pmd_threads(dp);
> > > >
> > > > - wanted_txqs = cmap_count(&dp->poll_threads);
> > > > + /* We need 1 Tx queue for each thread to avoid
locking, but we will try
> > > > + * to allocate the maximum possible value to
minimize the number of port
> > > > + * reconfigurations. */
> > > > + needed_txqs = cmap_count(&dp->poll_threads);
> > > > + /* (n_cores + 1) is the maximum that we might
need to have.
> > > > + * Additional queue is for non-PMD threads. */
> > > > + wanted_txqs = ovs_numa_get_n_cores();
> > > > + ovs_assert(wanted_txqs != OVS_CORE_UNSPEC);
> > > > + wanted_txqs++;
> > > >
> > > > I don’t think PMD mask changes are common, so this
patch is trying to optimize a
> > > > rare disruptive event that can/will be scheduled by the
administrator.
> > > >
> > > > Based on the actual number of queues supported and the
number of cores present,
> > > > this optimization may or may not work. It is
unpredictable whether there will be benefit
> > > > in a particular case from the user POV.
> > > > If I were the administrator, I would try to err on
> > > > the conservative side anyway if I could
> > > > not predict the result.
> > > >
> > > > Did I miss something ?
> > >
> > > In an NFV environment, if you want to add one more VM to your
> > > host, you will have to
> > > choose between:
> > >
> > > * not creating a new PMD thread
> > > -> performance degradation of networking for
other working VMs
> > >
> > >
> > > * adding a new PMD thread
> > > -> disruption of networking for the whole
> > > host for the duration
> > > of OVS reconfiguration.
> > >
> > > This patch removes the cost of the second option.
> > >
> > > In general, adding a PMD thread per VM may not always (or
even usually) make sense.
> >
> > Not per VM. Let's assume that all existing threads are already
> > overloaded.
> >
> > That would be less common; it is not the usual case.
>
> We have different points of view. We need more opinions on that.
>
> >
> > > There are use cases for sure, but using a PMD core per VM is
> > > often viewed as DPDK using too many
> > > CPU resources and limiting the adoption of DPDK.
> > >
> > > Furthermore, for dpdk gateways it is irrelevant.
> >
> > Disagree with that statement.
> >
> > dpdk gateways should not usually require changing pmd-cpu-mask when
a VM is added to hypervisor
> > somewhere. Sometimes you need to deploy more resources for sure,
but this should be much less
> > frequent than adding or removing VMs.
> > The point is the user should be rarely changing the pmd-cpu-mask
and this can be done during
> > a maintenance window when needed.
> >
> >
> > >
> > > I don't understand what you mean by 'unpredictable
> > > benefit'. The benefit is clear,
> > > and this optimization will work, except for HW NICs with a
> > > very low number of HW queues.
> > >
> > > HW nic interfaces carry aggregated traffic for multiple VMs
etc, so these cases are most important.
> >
> > And that is why this patch was implemented. This patch tries to
> > minimize packet drops on the HW NICs "shared" between VMs. It
> > allows adding new threads without breaking networking for others,
> > by not reconfiguring a device that is handling traffic for other
> > VMs at that moment.
> >
> > I understand what this patch wants to do and I understand what the
last designs wanted to
> > do as well.
> >
> >
> > > It is the ratio of the number of cores to hw queues that
matter. If queues are less than
> > > cores, then the benefit may not occur.
> >
> > That's right. But are you saying that we should not apply an
> > optimization if it covers only 50% of cases? (The percentage is
> > bigger, actually, because modern NICs support quite a lot of tx
> > queues.)
> >
> > Nobody knows what the number is, but I know that the number of cpus
is increasing
> > and making assumptions about the relative numbers of cores vs
queues can only work by luck.
>
> As you mentioned previously, you think that creating a large number
> of PMD threads is not a common case (this follows from "adding a PMD
> thread per VM may not always (or even usually) make sense"). This
> means that the optimization from this patch will usually work,
> because the number of PMD threads is much less than the number of
> cores and will usually be comparable to or less than the number of
> queues.
>
> Again, what I previously said is that I don’t expect each VM to have
> a dedicated PMD thread in the general case.
> The reasoning comes from the cost, based on utilization. In general,
> the high reservation of cores for PMD threads
> is persistently mentioned as a major impediment to DPDK adoption.
>
> For DPDK gateways, I would expect anything is possible including most
cores having PMD threads running.
>
> Again, in general, I am not making any assumptions on the ratio of cores
to queues, because I think these assumptions
> will break. The two parameters should be treated as independent
parameters, which they are.
> This patch makes assumptions about the ratio of these two independent
parameters.
>
>
> And this will work now, not in some ephemeral future where DPDK will
> return real numbers for available queues. In the future this may
> still work, because it is not only the number of CPU cores that is
> increasing.
>
> Again, the issue here is not whether it will work in some cases. The
> issue here is to give feedback to the user about when it might work.
> Changing the pmd-cpu-mask is an infrequent administrative command
> used on a ‘non-experimental product’, so the user should have
> feedback on what the result of the command would be.
> I might be ok with this patch if OVS-DPDK were still labelled as
experimental, but I would not
> want to go back to the experimental tag if possible.
>
> > It is inherently unpredictable, and unpredictability for a
> > user-initiated/controlled change is ugly.
> > This is even worse when the user-initiated change is rare, because
> > the user does not remember the full context of the command.
> >
> >
> > > This patch makes assumptions about the relative number of
cores vs the number of queues.
> > > These two parameters should be treated as independent – this
part really bothers me.
> >
> > It doesn't make any assumptions; it just optimizes one of two
> > possible situations.
> > If the number of queues is lower, the behaviour will be exactly
> > the same as in current master.
> >
> > We are repeating the same thing.
> >
> >
> > > I think we could solve this in a coordinated generic fashion
with a user management command
> > > that would also give feedback to the user as to the possible
success and maybe provide options.
> > > I wonder if we will eventually go in that direction, in the
> > > future, anyway?
> >
> > I don't think so. We don't need a user management command because
> > the only sane values for the number of queues right now are the
> > number of threads or the maximum number of available queues in
> > hardware. We're using the first one only because HW is not able to
> > say how many queues it is able to successfully configure.
> >
> > The fact that hardware cannot always report the correct number of
queues is an issue
> > that will get solved over time and should be part of what a user
command reports
> > (e.g., “unknown queue number, therefore it is unknown whether a
> > pmd-cpu-mask change will be hitless; issue the following command
> > ‘X’ when you would like to make the change”).
>
> Reconfiguration will never be hitless if we don't even try to make
> it more lightweight.
>
> >
> > A wrapper user command(s) would provide visibility and
predictability to the user.
> > i.e. we can do better.
>
> *Why do we need a configuration knob which can only decrease
> performance?*
>
> This will end up in a situation where users will just be forced to
> configure the number of tx queues for each port equal to the number
> of cores in the cpu mask.
> Or they will just set it to 1000 for all the ports to be sure that
> the maximum possible value was configured.
>
> The key aspect here is giving feedback to the user about the result
> of a potential pmd-cpu-mask change. I think it can be done with the
> result being one of three possibilities:
> 1/ Command will be hitless.
> 2/ Command will not be hitless.
> 3/ Command hitless impact is unknown; assume traffic will be affected.
>
> This should be supported by documentation.
>
> With this patch as is, the result is unknown in all cases.
How about the following incremental:
--8<----------------------------------------------------------------------->8--
[Darrell]
Part of the idea that I was getting at is a show command that can be run
before a pmd-cpu-mask change. We have a few of these commands already, like
pmd-rxq-show. This would give additional information about TX queues across
ports, for example. The command can be used outside of the rare
pmd-cpu-mask changes that this patch is about.
Additionally, having a max PMD setting that wanted TX queues could inherit
from would allow PMDs to be increased less disruptively without opening and
configuring all the queues on all NIC ports at startup for PMD usage,
regardless of whether only a limited number of tx queues are ever used for
PMD threads. A log is not nearly as informative.
I simulated having a 72 ‘core equivalents’ configuration and asked for 73 tx
queues, as this patch and incremental propose doing.
2017-07-25T20:00:13.728Z|00069|netdev_dpdk|WARN|DARRELL:
dpdk_eth_dev_queue_setup port name 'dpdk0'
2017-07-25T20:00:13.887Z|00071|dpif_netdev|WARN|DARRELL1: Port 'dpdk0';
dynamic_txqs '0'; n_txq '64'; needed_txqs 3; wanted_txqs 73
2017-07-25T20:00:13.887Z|00072|dpif_netdev|WARN|Port 'dpdk0' has low number of
Tx queues. Traffic through this port may be affected during reconfiguration if
the number of PMD threads is changed to greater than 63.
Many NICs have a maximum of 32 or 64 tx queues; some more and some fewer.
Some minor points below:
diff --git a/Documentation/intro/install/dpdk.rst
b/Documentation/intro/install/dpdk.rst
index a05aa1a..0462784 100644
--- a/Documentation/intro/install/dpdk.rst
+++ b/Documentation/intro/install/dpdk.rst
@@ -458,8 +458,19 @@ affinitized accordingly.
$ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x4
.. note::
- pmd thread on a NUMA node is only created if there is at least one DPDK
- interface from that NUMA node added to OVS.
+ - pmd thread on a NUMA node is only created if there is at least one
DPDK
+ interface from that NUMA node added to OVS.
+
+ - Addition/deletion of PMD threads may be hitless for Rx queues pinned
via
+ ``pmd-rxq-affinity`` if the threads being deleted does not handle
them
+ and ports has only one or big enough number of Tx queues.
[Darrell]
“big number of Tx queues” is not very clear.
If device's
+ performance could be affected during ``pmd-cpu-mask`` changes, a
warning
+ appears in the log: ::
[Darrell]
I think “performance could be affected” should be something like “traffic
can be interrupted”, as is applicable to an administrative command.
+
+ Port 'port' has low number of Tx queues.
[Darrell]
“low number of queues” is hard for the user to understand
Traffic through this port
+ may be affected during reconfiguration if the number of PMD threads
+ is changed to { less than or equal to | greater than } N.
+
- QEMU vCPU thread Affinity
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 74d3535..2a92d86 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -3531,7 +3531,19 @@ reconfigure_datapath(struct dp_netdev *dp)
seq_change(dp->port_seq);
port_destroy(port);
} else {
- port->dynamic_txqs = netdev_n_txq(port->netdev) < needed_txqs;
+ int n_txq = netdev_n_txq(port->netdev);
+
+ port->dynamic_txqs = n_txq < needed_txqs;
+
+ if (n_txq > 1 && n_txq < wanted_txqs) {
+ VLOG_WARN("Port '%s' has low number of Tx queues. "
+ "Traffic through this port may be affected
during "
+ "reconfiguration if the number of PMD threads is
"
+ "changed to %s %d.",
netdev_get_name(port->netdev),
+ port->dynamic_txqs ? "less than or equal to"
+ : "greater than",
+ n_txq - 1);
+ }
}
}
--8<----------------------------------------------------------------------->8--
This will give an administrator all the required information to predict the
performance impact of pmd-cpu-mask changes.
>
> And this option will have to configure the maximum possible value
> less than or equal to the passed one, because we have no way to find
> the maximum value until DPDK is fixed (which is actually the hard
> problem).
> Otherwise, users will have to bisect it, and that is the ugly part.
>
> ‘IF’ there are issues with rte_eth_dev_info_get(), ‘in some modes’, with
returning the real number
> of hardware queues, then this can be characterized (meaning in which
cases) and fixed.
> In such a case, the feedback to the user should be conservative and fall
into ‘3’ above.
>
>
> > This patch allows us to use the second value in all cases where
possible.
> >
> > >
> > > But the situation will never be worse than without this patch
> > > anyway.
> > >
> > > About PMD mask changes:
> > > Even one change a week can be disruptive in a highly loaded
> > > NFV environment.
> > > Also, these changes can be performed automatically by
> > > monitoring tools during VM migrations or when rebalancing the
> > > load between PMD threads to free CPU cores for other
> > > applications/VMs.
> > >
> > > >
> > > > /* The number of pmd threads might have
changed, or a port can be new:
> > > > * adjust the txqs. */
> > > > @@ -3474,9 +3482,17 @@ reconfigure_datapath(struct
dp_netdev *dp)
> > > >
> > > > /* Check for all the ports that need
reconfiguration. We cache this in
> > > > * 'port->reconfigure', because
netdev_is_reconf_required() can change at
> > > > - * any time. */
> > > > + * any time.
> > > > + * Also mark for reconfiguration all ports
which will likely change their
> > > > + * 'dynamic_txqs' parameter. It's required to
stop using them before
> > > > + * changing this setting and it's simpler to
mark ports here and allow
> > > > + * 'pmd_remove_stale_ports' to remove them
from threads. There will be
> > > > + * no actual reconfiguration in
'port_reconfigure' because it's
> > > > + * unnecessary. */
> > > > HMAP_FOR_EACH (port, node, &dp->ports) {
> > > > - if
(netdev_is_reconf_required(port->netdev)) {
> > > > + if (netdev_is_reconf_required(port->netdev)
> > > > + || (port->dynamic_txqs
> > > > + != netdev_n_txq(port->netdev) <
needed_txqs)) {
> > > > port->need_reconfigure = true;
> > > > }
> > > > }
> > > > @@ -3510,7 +3526,7 @@ reconfigure_datapath(struct
dp_netdev *dp)
> > > > seq_change(dp->port_seq);
> > > > port_destroy(port);
> > > > } else {
> > > > - port->dynamic_txqs =
netdev_n_txq(port->netdev) < wanted_txqs;
> > > > + port->dynamic_txqs =
netdev_n_txq(port->netdev) < needed_txqs;
> > > > }
> > > > }
> > > >
> > > > --
> > > > 2.7.4
>
>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev