On Wed, Mar 25, 2020 at 06:17:14PM +0000, Kevin Traynor wrote:
> On 24/03/2020 23:50, Ilya Maximets wrote:
> > In case number of polling threads goes from exact number of Tx queues
> > in port to higher value while set_tx_multiq() not implemented or not
> > requesting reconfiguration, port will not be reconfigured and datapath
> > will continue using static Tx queue ids leading to crash.
> >
> > Ex.:
> > Assuming that port p0 supports up to 4 Tx queues and doesn't support
> > set_tx_multiq() method. For example, netdev-afxdp could be the case,
> > because it could have multiple Tx queues, but doesn't have
> > set_tx_multiq() implementation because number of Tx queues always
> > equals to number of Rx queues.
> >
> > 1. Configuring pmd-cpu-mask to have 3 pmd threads.
> >
> > 2. Adding port p0 to OVS.
> > At this point wanted_txqs = 4 (3 for pmd threads + 1 for non-pmd).
> > Port reconfigured to have 4 Tx queues successfully.
> > dynamic_txqs = (4 < 4) = false;
> >
> > 3. Configuring pmd-cpu-mask to have 10 pmd threads.
> > At this point wanted_txqs = 11 (10 for pmd threads + 1 for non-pmd).
> > Since set_tx_multiq() is not implemented, netdev doesn't request
> > reconfiguration and 'dynamic_txqs' remains in 'false' state.
> >
> > 4. Since 'dynamic_txqs == false', dpif-netdev uses static Tx queue
> > ids that are in range [0, 10] while device only supports 4 leading
> > to unwanted behavior and crashes.
> >
> > Fix that by marking for reconfiguration all the ports that will likely
> > change their 'dynamic_txqs' value.
> >
> > It looks like the issue could be reproduced only with afxdp ports,
> > because all other non-dpdk ports ignores Tx queue ids and dpdk ports
> > requests for reconfiguration on set_tx_multiq().
> >
> > Reported-by: William Tu <[email protected]>
> > Reported-at:
> > https://mail.openvswitch.org/pipermail/ovs-dev/2020-March/368364.html
> > Fixes: e32971b8ddb4 ("dpif-netdev: Centralized threads and queues handling
> > code.")
> > Signed-off-by: Ilya Maximets <[email protected]>
> > ---
> >
> > Since the issue could be reproduced only with afxdp ports, this should
> > be backported only down to 2.12.
> >
> > lib/dpif-netdev.c | 12 ++++++++++--
> > 1 file changed, 10 insertions(+), 2 deletions(-)
> >
> > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> > index a798db45d..e456cc9be 100644
> > --- a/lib/dpif-netdev.c
> > +++ b/lib/dpif-netdev.c
> > @@ -4941,9 +4941,17 @@ reconfigure_datapath(struct dp_netdev *dp)
> >
> > /* Check for all the ports that need reconfiguration. We cache this in
> > * 'port->need_reconfigure', because netdev_is_reconf_required() can
> > - * change at any time. */
> > + * change at any time.
> > + * Also mark for reconfiguration all ports which will likely change
> > their
> > + * 'dynamic_txqs' parameter. It's required to stop using them before
> > + * changing this setting and it's simpler to mark ports here and allow
> > + * 'pmd_remove_stale_ports' to remove them from threads. There will be
> > + * no actual reconfiguration in 'port_reconfigure' because it's
> > + * unnecessary. */
> > HMAP_FOR_EACH (port, node, &dp->ports) {
> > - if (netdev_is_reconf_required(port->netdev)) {
> > + if (netdev_is_reconf_required(port->netdev)
> > + || (port->dynamic_txqs
> > + != (netdev_n_txq(port->netdev) < wanted_txqs))) {
> > port->need_reconfigure = true;
> > }
> > }
> >
>
> LGTM
> Acked-by: Kevin Traynor <[email protected]>
>
Thanks!
Applied to master, 2.12, 2.13.
William
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev