Hi Ilya,
I applied this series on master ("dpdk: Update to use DPDK 19.11.") and pinged
between two "dpdk" ports (hw-offload=true).
It worked fine.
Then I watched flow statistics (dpif based) by executing the command in (1):
(1) watch ovs-appctl dpif/dump-flows -m <bridge name>
It worked fine.
Then I watched flow statistics (dpctl based) by executing the command in (2):
(2) watch ovs-appctl dpctl/dump-flows
In this case OVS crashed after a few seconds.

When inspecting the call trace I noticed that pmd->dp->dpif->dpif_class->type
is a corrupted memory address.

(gdb) bt
#0  0x0000000000c26fda in dpif_normalize_type (type=0x7372 <Address 0x7372 out of bounds>) at lib/dpif.c:517
#1  0x0000000000c19864 in dp_netdev_flow_offload_put (offload=offload@entry=0x7f11fc00b790) at lib/dpif-netdev.c:2378
#2  0x0000000000c19ca8 in dp_netdev_flow_offload_main (data=<optimized out>) at lib/dpif-netdev.c:2467
#3  0x0000000000ca3c9d in ovsthread_wrapper (aux_=<optimized out>) at lib/ovs-thread.c:383
#4  0x00007f12a2a08e25 in start_thread () from /lib64/libpthread.so.0
#5  0x00007f12a222c34d in clone () from /lib64/libc.so.6

This scenario repeats whenever I run the command in (2) and the code then
calls dpif_normalize_type().
It seems to me that the memory corruption has been there for a while and is
only now exposed by your RFC series.

In order to observe this memory corruption I added the following printouts.
I tested it with hw-offload=false: 
ovs-vsctl set Open_vSwitch . other_config:hw-offload=false
I tested it on latest master without your rfc series.

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 1e54936..18a804a 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -3625,6 +3625,10 @@ dpif_netdev_flow_dump_next(struct dpif_flow_dump_thread *thread_,
             }
         }
 
+        VLOG_ERR("pmd->dp=%p pmd->dp->dpif=%p pmd->dp->dpif->dpif_class=%p \
+                 pmd->dp->dpif->full_name=%s",
+                 pmd->dp, pmd->dp->dpif, pmd->dp->dpif->dpif_class,
+                 pmd->dp->dpif->full_name);
         do {
             for (n_flows = 0; n_flows < flow_limit; n_flows++) {
                 struct cmap_node *node;
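
A less noisy variant of this instrumentation would log only when the pointer
actually changes, instead of on every dump iteration. The following is a
minimal sketch of that idea, not code from OVS: ptr_watch, ptr_watch_changed()
and str_or_null() are hypothetical names introduced here for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of OVS: remembers the last pointer value
 * it was shown and reports when a different one appears.  The address is
 * kept as an integer so that a stale (possibly freed) pointer is never
 * dereferenced.  Not thread-safe: use one instance per thread. */
struct ptr_watch {
    uintptr_t last;
    bool seen;
};

static bool
ptr_watch_changed(struct ptr_watch *w, const void *cur)
{
    uintptr_t addr = (uintptr_t) cur;
    bool changed = w->seen && w->last != addr;

    w->last = addr;
    w->seen = true;
    return changed;
}

/* Passing NULL to "%s" is undefined in standard C, so substitute a
 * marker string explicitly. */
static const char *
str_or_null(const char *s)
{
    return s ? s : "(null)";
}
```

With something like this, the VLOG_ERR call above could be guarded by
ptr_watch_changed(&watch, pmd->dp->dpif), where watch is a per-thread
variable, and full_name could be passed through str_or_null(). Note that this
only makes the logging quieter; once the pointer is stale, anything read
through it is still unreliable.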


The printouts show the memory corruption.

Initially all printouts are identical and correct as expected.

dpif_netdev(revalidator82)|ERR|pmd->dp=0x7fdfea3ea010 pmd->dp->dpif=0x2525e70 pmd->dp->dpif->dpif_class=0xef3e80 pmd->dp->dpif->full_name=netdev@ovs-netdev
dpif_netdev(revalidator82)|ERR|pmd->dp=0x7fdfea3ea010 pmd->dp->dpif=0x2525e70 pmd->dp->dpif->dpif_class=0xef3e80 pmd->dp->dpif->full_name=netdev@ovs-netdev
dpif_netdev(revalidator82)|ERR|pmd->dp=0x7fdfea3ea010 pmd->dp->dpif=0x2525e70 pmd->dp->dpif->dpif_class=0xef3e80 pmd->dp->dpif->full_name=netdev@ovs-netdev

Around this point I executed the dpctl command in (2). Notice that
pmd->dp->dpif, and everything reached through it, changed.

dpif_netdev(revalidator82)|ERR|pmd->dp=0x7fdfea3ea010 pmd->dp->dpif=0x25c3280 pmd->dp->dpif->dpif_class=0x251c790 pmd->dp->dpif->full_name=(null)
dpif_netdev(revalidator82)|ERR|pmd->dp=0x7fdfea3ea010 pmd->dp->dpif=0x25c3280 pmd->dp->dpif->dpif_class=0x33c03608 pmd->dp->dpif->full_name=
dpif_netdev(revalidator82)|ERR|pmd->dp=0x7fdfea3ea010 pmd->dp->dpif=0x25c3280 pmd->dp->dpif->dpif_class=0x2605700 pmd->dp->dpif->full_name=
dpif_netdev(revalidator82)|ERR|pmd->dp=0x7fdfea3ea010 pmd->dp->dpif=0x25c3280 pmd->dp->dpif->dpif_class=0xe784250b pmd->dp->dpif->full_name=memseg-2048k-0-3
dpif_netdev|ERR|pmd->dp=0x7fdfea3ea010 pmd->dp->dpif=0x2602430 pmd->dp->dpif->dpif_class=0xef3e80 pmd->dp->dpif->full_name=netdev@ovs-netdev
dpif_netdev(revalidator82)|ERR|pmd->dp=0x7fdfea3ea010 pmd->dp->dpif=0x2602430 pmd->dp->dpif->dpif_class=0x2606200 pmd->dp->dpif->full_name=(null)

The same happens if I test it with hw-offload=true.
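
One pattern that would be consistent with these printouts is a long-lived
object keeping a back-pointer to whichever handle was opened most recently,
with that handle then being freed by a short-lived caller. This is only an
assumption on my side, not a confirmed diagnosis: in the logs pmd->dp stays
the same, pmd->dp->dpif switches to a new address when the dpctl command runs,
and the contents behind that address look valid once and garbage afterwards.
The sketch below illustrates the pattern in isolation. All names in it
(datapath, handle, handle_open(), handle_close()) are hypothetical and it does
not claim to reflect the actual OVS code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct handle;

/* Long-lived shared object; stands in for the datapath. */
struct datapath {
    struct handle *last_handle;   /* Back-pointer to the most recent handle. */
};

/* Short-lived per-caller object; stands in for an opened handle. */
struct handle {
    struct datapath *dp;
};

static struct handle *
handle_open(struct datapath *dp)
{
    struct handle *h = malloc(sizeof *h);

    if (!h) {
        abort();
    }
    h->dp = dp;
    dp->last_handle = h;          /* Every open overwrites the back-pointer. */
    return h;
}

/* Frees 'h' without touching dp->last_handle.  Returns true if that leaves
 * the back-pointer dangling (checked before the free). */
static bool
handle_close(struct handle *h)
{
    assert(h != NULL);
    bool dangling = h->dp->last_handle == h;

    free(h);
    return dangling;
}

/* Opens a long-lived handle, then a short-lived one, and closes only the
 * short-lived one.  Returns true if the datapath is left pointing at freed
 * memory. */
static bool
short_lived_open_leaves_dangling_pointer(void)
{
    struct datapath dp = { NULL };
    struct handle *long_lived = handle_open(&dp);
    struct handle *short_lived = handle_open(&dp);
    bool dangling = handle_close(short_lived);

    dp.last_handle = long_lived;  /* Repair before cleanup. */
    handle_close(long_lived);
    return dangling;
}
```

In this sketch the sequence "open, open again, close the second" leaves the
shared object pointing at freed memory, and any other thread that later reads
through the back-pointer would see whatever the allocator put there next.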

I hope this scenario can be recreated on other setups as well.
I will look more into it.
Any insights on this issue would be appreciated.

Regards,
Ophir

> -----Original Message-----
> From: Ilya Maximets <[email protected]>
> Sent: Tuesday, December 3, 2019 1:13 PM
> To: [email protected]
> Cc: Ophir Munk <[email protected]>; Roni Bar Yanai
> <[email protected]>; Simon Horman
> <[email protected]>; Ilya Maximets <[email protected]>
> Subject: [RFC v3 0/4] netdev-offload: Prerequisites of vport offloading via
> DPDK.
> 
> These patches are necessary to enable vxlan offloading for userspace
> datapath via DPDK flow offloading API (not tested).
> They allow properly distinguishing userspace tunnels from the system
> tunnels on netdev level in order to choose the appropriate flow offloading
> API provider.  Before these patches it was not possible because tunneling
> ports are implemented by the same netdev-vport classes and use the same dpif
> port names, so only ofproto really knows which netdevs are assigned to each
> dpif.
> 
> RFC v3:
>   * Rebase on current master.
> 
> RFC v2:
>   * Added 2 patches for using dpif type.
>   * Last patch updated to use netdev dpif_type instead of traversing
>     dpif ports.
> 
> 
> Ilya Maximets (4):
>   netdev: Allow storing dpif type into netdev structure.
>   netdev-offload: Use dpif type instead of class.
>   netdev-offload: Allow offloading to netdev without ifindex.
>   netdev-offload: Disallow offloading to unrelated tunneling vports.
> 
>  lib/dpif-netdev.c             | 13 +++----
>  lib/dpif-netlink.c            | 23 ++++++------
>  lib/dpif.c                    | 21 ++++++-----
>  lib/netdev-offload-dpdk.c     |  8 +++++
>  lib/netdev-offload-tc.c       | 12 +++++--
>  lib/netdev-offload.c          | 68 ++++++++++++++++++-----------------
>  lib/netdev-offload.h          | 16 ++++-----
>  lib/netdev-provider.h         |  3 +-
>  lib/netdev.c                  | 16 +++++++++
>  lib/netdev.h                  |  2 ++
>  ofproto/ofproto-dpif-upcall.c |  5 ++-
>  11 files changed, 114 insertions(+), 73 deletions(-)
> 
> --
> 2.17.1

