On 23.07.2017 20:20, Darrell Ball wrote:
> Comments inline but this is mostly repetitive.
>
> -----Original Message-----
> From: Ilya Maximets <[email protected]>
> Date: Monday, July 10, 2017 at 11:55 PM
> To: Darrell Ball <[email protected]>, "[email protected]"
> <[email protected]>
> Cc: Heetae Ahn <[email protected]>, Daniele Di Proietto
> <[email protected]>, Ben Pfaff <[email protected]>, Pravin Shelar
> <[email protected]>, Ciara Loftus <[email protected]>, Ian Stokes
> <[email protected]>, Kevin Traynor <[email protected]>
> Subject: Re: [PATCH v2 2/3] dpif-netdev: Avoid port's reconfiguration on
> pmd-cpu-mask changes.
>
> On 11.07.2017 05:10, Darrell Ball wrote:
> >
> >
> > On 7/10/17, 12:41 AM, "Ilya Maximets" <[email protected]> wrote:
> >
> > On 07.07.2017 21:09, Darrell Ball wrote:
> > >
> > >
> > > On 7/6/17, 11:11 PM, "Ilya Maximets" <[email protected]> wrote:
> > >
> > > On 07.07.2017 08:08, Darrell Ball wrote:
> > > >
> > > >
> > > > On 5/30/17, 7:12 AM, "Ilya Maximets" <[email protected]> wrote:
> > > >
> > > > Reconfiguration of HW NICs may lead to packet drops.
> > > > In the current model all physical ports will be reconfigured each
> > > > time the number of PMD threads changes. Since we do not stop
> > > > threads on pmd-cpu-mask changes, this patch will help to further
> > > > decrease the port's downtime by setting the maximum possible number
> > > > of wanted tx queues to avoid unnecessary reconfigurations.
> > > >
> > > > Signed-off-by: Ilya Maximets <[email protected]>
> > > > ---
> > > > lib/dpif-netdev.c | 26 +++++++++++++++++++++-----
> > > > 1 file changed, 21 insertions(+), 5 deletions(-)
> > > >
> > > > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> > > > index 596d133..79db770 100644
> > > > --- a/lib/dpif-netdev.c
> > > > +++ b/lib/dpif-netdev.c
> > > > @@ -3453,7 +3453,7 @@ reconfigure_datapath(struct dp_netdev *dp)
> > > >  {
> > > >      struct dp_netdev_pmd_thread *pmd;
> > > >      struct dp_netdev_port *port;
> > > > -    int wanted_txqs;
> > > > +    int needed_txqs, wanted_txqs;
> > > >
> > > >      dp->last_reconfigure_seq = seq_read(dp->reconfigure_seq);
> > > >
> > > > @@ -3461,7 +3461,15 @@ reconfigure_datapath(struct dp_netdev *dp)
> > > >       * on the system and the user configuration. */
> > > >      reconfigure_pmd_threads(dp);
> > > >
> > > > -    wanted_txqs = cmap_count(&dp->poll_threads);
> > > > +    /* We need 1 Tx queue for each thread to avoid locking, but we will try
> > > > +     * to allocate the maximum possible value to minimize the number of port
> > > > +     * reconfigurations. */
> > > > +    needed_txqs = cmap_count(&dp->poll_threads);
> > > > +    /* (n_cores + 1) is the maximum that we might need to have.
> > > > +     * Additional queue is for non-PMD threads. */
> > > > +    wanted_txqs = ovs_numa_get_n_cores();
> > > > +    ovs_assert(wanted_txqs != OVS_CORE_UNSPEC);
> > > > +    wanted_txqs++;
> > > >
> > > > I don’t think PMD mask changes are common, so this patch is trying to
> > > > optimize a rare disruptive event that can/will be scheduled by the
> > > > administrator.
> > > >
> > > > Based on the actual number of queues supported and the number of cores
> > > > present, this optimization may or may not work. It is unpredictable
> > > > whether there will be a benefit in a particular case from the user POV.
> > > > If I were the administrator, I would try to err on the conservative
> > > > side anyway if I could not predict the result.
> > > >
> > > > Did I miss something?
> > >
> > > In an NFV environment, if you want to add one more VM to your hosts you
> > > will have to choose between:
> > >
> > >   * not creating a new PMD thread
> > >       -> performance degradation of networking for other working VMs
> > >
> > >   * adding a new PMD thread
> > >       -> disruption of the networking for the whole host for the time
> > >          of the OVS reconfiguration.
> > >
> > > This patch removes the cost of the second option.
> > >
> > > In general, adding a PMD thread per VM may not always (or even usually)
> > > make sense.
> >
> > Not per VM. Let's assume that all existing threads are already overloaded.
> >
> > That would be less common; it is not the usual case.
>
> We have different points of view. We need more opinions on that.
>
> >
> > > There are use cases for sure, but using a PMD core per VM is often
> > > viewed as dpdk using too much cpu resources and limiting the adoption
> > > of dpdk.
> > >
> > > Furthermore, for dpdk gateways it is irrelevant.
> >
> > Disagree with that statement.
> >
> > dpdk gateways should not usually require changing pmd-cpu-mask when a VM
> > is added to a hypervisor somewhere. Sometimes you need to deploy more
> > resources for sure, but this should be much less frequent than adding or
> > removing VMs.
> > The point is the user should rarely be changing the pmd-cpu-mask, and this
> > can be done during a maintenance window when needed.
> >
> >
> > >
> > > I don't understand what you mean by 'unpredictable benefit'. The benefit
> > > is clear, and this optimization will work, except for HW NICs with a
> > > very low number of HW queues.
> > >
> > > HW nic interfaces carry aggregated traffic for multiple VMs etc., so
> > > these cases are the most important.
> >
> > And that is exactly why this patch was implemented. This patch tries to
> > minimize the packet drops on the HW NICs "shared" between VMs. It allows
> > adding new threads without breaking networking for others, by not
> > reconfiguring the device that handles traffic for other VMs at that
> > moment.
> >
> > I understand what this patch wants to do, and I understand what the last
> > designs wanted to do as well.
> >
> >
> > > It is the ratio of the number of cores to hw queues that matters. If
> > > there are fewer queues than cores, then the benefit may not occur.
> >
> > That's right. But are you saying that we should not apply an optimization
> > if it covers only 50% of cases? (The percentage is actually bigger,
> > because modern NICs support quite a lot of tx queues.)
> >
> > Nobody knows what the number is, but I know that the number of cpus is
> > increasing, and making assumptions about the relative numbers of cores vs
> > queues can only work by luck.
>
> As you mentioned previously, you think that creating a large number of PMD
> threads is not a common case (this follows from "adding a PMD thread per VM
> may not always (or even usually) make sense"). This means that the
> optimization from this patch will usually work, because the number of PMD
> threads is much less than the number of cores and will usually be comparable
> to or less than the number of queues.
>
> Again, what I previously said is that I don’t expect each VM to have a
> dedicated PMD thread in the general case. The reasoning comes from the cost
> based on utilization. In general, the high reservation of cores for PMD
> threads is persistently mentioned as a major impediment to DPDK adoption.
>
> For DPDK gateways, I would expect anything is possible, including most cores
> having PMD threads running.
>
> Again, in general, I am not making any assumptions about the ratio of cores
> to queues, because I think these assumptions will break. The two parameters
> should be treated as independent parameters, which they are. This patch
> makes assumptions about the ratio of these two independent parameters.
>
>
> And this will work now, not in some ephemeral future where DPDK will return
> real numbers for available queues. In the future this may well keep working,
> because it is not only the number of CPU cores that is increasing.
>
> Again, the issue here is not whether it will work in some cases. The issue
> here is to give feedback to the user about when it might work. Changing the
> pmd-cpu-mask is an infrequent administrative command used on a
> ‘non-experimental product’, so the user should have feedback on what the
> result of the command would be.
> I might be ok with this patch if OVS-DPDK were still labelled as
> experimental, but I would not want to go back to the experimental tag if
> possible.
>
> > It is inherently unpredictable, and unpredictability for a user
> > initiated/controlled change is ugly.
> > This is even worse when the user-initiated change is rare, because the
> > user does not remember the full context of the command.
> >
> >
> > > This patch makes assumptions about the relative number of cores vs the
> > > number of queues. These two parameters should be treated as independent
> > > – this part really bothers me.
> >
> > It doesn't make any assumptions, it just optimizes one of the two possible
> > situations. If the number of queues is lower, the behaviour will be
> > exactly the same as in the current master.
> >
> > We are repeating the same thing.
> >
> >
> > > I think we could solve this in a coordinated, generic fashion with a
> > > user management command that would also give feedback to the user as to
> > > the possible success, and maybe provide options.
> > > I wonder if we will eventually go in that direction, in the future,
> > > anyway?
> >
> > I don't think so. We don't need a user management command because the only
> > sane values for the number of queues right now are the number of threads
> > or the maximum number of available queues in hardware. We're using the
> > first one only because HW is not able to say how many queues it is able to
> > successfully configure.
> >
> > The fact that hardware cannot always report the correct number of queues
> > is an issue that will get solved over time and should be part of what a
> > user command reports (e.g. “unknown queue number, therefore it is unknown
> > whether a pmd-cpu-mask change will be hitless; issue the following command
> > ‘X’ when you would like to make the change”).
>
> Reconfiguration will never be hitless if we do not even try to make it more
> lightweight.
>
> >
> > A wrapper user command (or commands) would provide visibility and
> > predictability to the user; i.e., we can do better.
>
> *Why do we need a configuration knob which can only decrease performance?*
>
> This will end up in a situation where users will just be forced to configure
> the number of tx queues for each port equal to the number of cores in the
> cpu mask. Or they will just set it to 1000 for all ports to be sure that the
> maximum possible value was configured.
>
> The key aspect here is giving feedback to the user about the result of a
> potential pmd-cpu-mask change. I think it can be done with the result being
> one of three possibilities:
> 1/ The command will be hitless.
> 2/ The command will not be hitless.
> 3/ The command's hitless impact is unknown; assume traffic will be affected.
>
> This should be supported by documentation.
>
> With this patch as is, the result is unknown in all cases.
How about the following incremental:
--8<----------------------------------------------------------------------->8--
diff --git a/Documentation/intro/install/dpdk.rst b/Documentation/intro/install/dpdk.rst
index a05aa1a..0462784 100644
--- a/Documentation/intro/install/dpdk.rst
+++ b/Documentation/intro/install/dpdk.rst
@@ -458,8 +458,19 @@ affinitized accordingly.
        $ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x4
    .. note::
-    pmd thread on a NUMA node is only created if there is at least one DPDK
-    interface from that NUMA node added to OVS.
+    - pmd thread on a NUMA node is only created if there is at least one DPDK
+      interface from that NUMA node added to OVS.
+
+    - Addition/deletion of PMD threads may be hitless for Rx queues pinned via
+      ``pmd-rxq-affinity`` if the threads being deleted do not handle them
+      and the ports have only one or a big enough number of Tx queues. If a
+      device's performance could be affected during ``pmd-cpu-mask`` changes,
+      a warning appears in the log::
+
+        Port 'port' has a low number of Tx queues. Traffic through this port
+        may be affected during reconfiguration if the number of PMD threads
+        is changed to { less than or equal to | greater than } N.
+
 - QEMU vCPU thread Affinity
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 74d3535..2a92d86 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -3531,7 +3531,19 @@ reconfigure_datapath(struct dp_netdev *dp)
             seq_change(dp->port_seq);
             port_destroy(port);
         } else {
-            port->dynamic_txqs = netdev_n_txq(port->netdev) < needed_txqs;
+            int n_txq = netdev_n_txq(port->netdev);
+
+            port->dynamic_txqs = n_txq < needed_txqs;
+
+            if (n_txq > 1 && n_txq < wanted_txqs) {
+                VLOG_WARN("Port '%s' has a low number of Tx queues. "
+                          "Traffic through this port may be affected during "
+                          "reconfiguration if the number of PMD threads is "
+                          "changed to %s %d.", netdev_get_name(port->netdev),
+                          port->dynamic_txqs ? "less than or equal to"
+                                             : "greater than",
+                          n_txq - 1);
+            }
         }
     }
--8<----------------------------------------------------------------------->8--
This will give an administrator all the information required to predict the
performance impact of pmd-cpu-mask changes.
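
To make that concrete, here is a hypothetical walk-through (the port name and
the numbers below are made up for illustration, not taken from the patch):
suppose a physical port 'dpdk0' reports 4 Tx queues, the host has 8 cores (so
wanted_txqs = 9) and 2 PMD threads are currently configured (so needed_txqs =
3, counting the non-PMD thread). The port is not using dynamic Tx queues, and
the proposed warning would read:

    Port 'dpdk0' has a low number of Tx queues. Traffic through this port
    may be affected during reconfiguration if the number of PMD threads
    is changed to greater than 3.

Growing pmd-cpu-mask to 3 PMD threads keeps needed_txqs within the 4 available
queues, so this port is not reconfigured and the change is hitless for it;
growing to 4 or more PMD threads flips 'dynamic_txqs' and triggers a
reconfiguration that may drop packets on this port.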
>
> And this option will have to configure the maximum possible value less than
> or equal to the passed one, because we have no way to find the maximum value
> until dpdk is fixed (which is actually the hard problem).
> Otherwise, they will have to bisect it, and that is the ugly thing.
>
> ‘IF’ there are issues with rte_eth_dev_info_get(), ‘in some modes’, with
> returning the real number
> of hardware queues, then this can be characterized (meaning in which cases)
> and fixed.
> In such a case, the feedback to the user should be conservative and fall into
> ‘3’ above.
>
>
> > This patch allows us to use the second value in all cases where possible.
> >
> > >
> > > But the situation will never be worse than without this patch anyway.
> > >
> > > About PMD mask changes:
> > > Even one change a week can be disruptive in a highly loaded NFV
> > > environment.
> > > Also, these changes can be performed automatically by monitoring tools
> > > during VM migrations, or when rebalancing the load between PMD threads
> > > to free CPU cores for other applications/VMs.
> > >
> > > >
> > > >      /* The number of pmd threads might have changed, or a port can be new:
> > > >       * adjust the txqs. */
> > > > @@ -3474,9 +3482,17 @@ reconfigure_datapath(struct dp_netdev *dp)
> > > >
> > > >      /* Check for all the ports that need reconfiguration. We cache this in
> > > >       * 'port->reconfigure', because netdev_is_reconf_required() can change at
> > > > -     * any time. */
> > > > +     * any time.
> > > > +     * Also mark for reconfiguration all ports which will likely change their
> > > > +     * 'dynamic_txqs' parameter. It's required to stop using them before
> > > > +     * changing this setting and it's simpler to mark ports here and allow
> > > > +     * 'pmd_remove_stale_ports' to remove them from threads. There will be
> > > > +     * no actual reconfiguration in 'port_reconfigure' because it's
> > > > +     * unnecessary. */
> > > >      HMAP_FOR_EACH (port, node, &dp->ports) {
> > > > -        if (netdev_is_reconf_required(port->netdev)) {
> > > > +        if (netdev_is_reconf_required(port->netdev)
> > > > +            || (port->dynamic_txqs
> > > > +                != netdev_n_txq(port->netdev) < needed_txqs)) {
> > > >              port->need_reconfigure = true;
> > > >          }
> > > >      }
> > > > @@ -3510,7 +3526,7 @@ reconfigure_datapath(struct dp_netdev *dp)
> > > >              seq_change(dp->port_seq);
> > > >              port_destroy(port);
> > > >          } else {
> > > > -            port->dynamic_txqs = netdev_n_txq(port->netdev) < wanted_txqs;
> > > > +            port->dynamic_txqs = netdev_n_txq(port->netdev) < needed_txqs;
> > > >          }
> > > >      }
> > > >
> > > > --
> > > > 2.7.4
>
>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev