Hi Ilya, I think the root cause for the issues mentioned below is the usage of pmd->dp->dpif->dpif_class rather than pmd->dp->class. Please see a fixing patch https://patchwork.ozlabs.org/patch/1205684/ ("dpif-netdev: Retrieve dpif_class from struct dp_netdev ").
When running dpctl statistics there is a call to dpif_open(). As part of this call dp->dpif is assigned a new dpif pointer value. However pmd->dp->dpif should assume that dp->dpif is fixed throughout its life time. Hence it is better to access the dpif_class through fixed pointers that do not change (pmd->dp->class). With the fixing patch Eli issues are fixes. Also when I updated your RFCs series to use pmd->dp->class - there is no more crash and everything works fine. Ilya - can you please update the RFC series and send them as V1 patches? I will test them again and hopefully will be able to ACK them. Regards, Ophir > -----Original Message----- > From: Eli Britstein <[email protected]> > Sent: Sunday, December 8, 2019 2:29 PM > To: Ilya Maximets <[email protected]>; Ophir Munk > <[email protected]>; [email protected] > Cc: Roni Bar Yanai <[email protected]>; Simon Horman > <[email protected]>; Ameer Mahagneh > <[email protected]> > Subject: Re: [RFC v3 0/4] netdev-offload: Prerequisites of vport offloading > via > DPDK. > > > On 12/6/2019 4:34 PM, Ilya Maximets wrote: > > On 06.12.2019 14:09, Ophir Munk wrote: > >> Hi Ilya, > >> I applied this series on master ("dpdk: Update to use DPDK 19.11.") and > PINGed between two "dpdk" ports (hw-offload=true). > >> It worked fine. > >> Then I watched flow statistics (dpif based) by executing the command in > (1): > >> (1) watch ovs-appctl dpif/dump-flows -m <bridge name> It worked fine. > >> Then I watched flow statistics (dpctl based) by executing the command in > (2): > >> (2) watch ovs-appctl dpctl/dump-flows In this case OVS crashed after > >> a few seconds. > >> > >> When inspecting the calls trace I notice that pmd->dp->dpif->dpif_class- > >type is a corrupted memory address. > >> > >> (gdb) bt > >> #0 0x0000000000c26fda in dpif_normalize_type (type=0x7372 <Address > >> 0x7372 out of bounds>) at lib/dpif.c:517 > >> #1 0x0000000000c19864 in dp_netdev_flow_offload_put > >> (offload=offload@entry=0x7f11fc00b790) at lib/dpif-netdev.c:2378 > >> #2 0x0000000000c19ca8 in dp_netdev_flow_offload_main > >> (data=<optimized out>) at lib/dpif-netdev.c:2467 > >> #3 0x0000000000ca3c9d in ovsthread_wrapper (aux_=<optimized out>) > at > >> lib/ovs-thread.c:383 > >> #4 0x00007f12a2a08e25 in start_thread () from /lib64/libpthread.so.0 > >> #5 0x00007f12a222c34d in clone () from /lib64/libc.so.6 > >> > >> This scenario repeats whenever running the command in (2) and when > the code calls dpif_normalize_type(). > >> It seems to me that the memory corruption was there for a while but only > now it is exposed due to your rfc series. > >> > >> In order to observe this memory corruption I added the following > printouts. > >> I tested it with hw-offload=false: > >> ovs-vsctl set Open_vSwitch . other_config:hw-offload=false I tested > >> it on latest master without your rfc series. > >> > >> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index > >> 1e54936..18a804a 100644 > >> --- a/lib/dpif-netdev.c > >> +++ b/lib/dpif-netdev.c > >> @@ -3625,6 +3625,10 @@ dpif_netdev_flow_dump_next(struct > dpif_flow_dump_thread *thread_, > >> } > >> } > >> > >> + VLOG_ERR("pmd->dp=%p pmd->dp->dpif=%p pmd->dp->dpif- > >dpif_class=%p \ > >> + pmd->dp->dpif->full_name=%s", > >> + pmd->dp, pmd->dp->dpif, pmd->dp->dpif->dpif_class, > >> + pmd->dp->dpif->full_name); > >> do { > >> for (n_flows = 0; n_flows < flow_limit; n_flows++) { > >> struct cmap_node *node; > >> > >> > >> Looking at the printouts the memory corruption is observed. > >> > >> Initially all printouts are identical and correct as expected. > >> > >> dpif_netdev(revalidator82)|ERR|pmd->dp=0x7fdfea3ea010 > >> pmd->dp->dpif=0x2525e70 pmd->dp->dpif->dpif_class=0xef3e80 > >> pmd->dp->dpif->full_name=netdev@ovs-netdev > >> dpif_netdev(revalidator82)|ERR|pmd->dp=0x7fdfea3ea010 > >> pmd->dp->dpif=0x2525e70 pmd->dp->dpif->dpif_class=0xef3e80 > >> pmd->dp->dpif->full_name=netdev@ovs-netdev > >> dpif_netdev(revalidator82)|ERR|pmd->dp=0x7fdfea3ea010 > >> pmd->dp->dpif=0x2525e70 pmd->dp->dpif->dpif_class=0xef3e80 > >> pmd->dp->dpif->full_name=netdev@ovs-netdev > >> > >> Around this point I executed the dpctl command in (2) and you can notice > that the pointers pmd->dp->dp_dpif and beyond became modified. > >> > >> dpif_netdev(revalidator82)|ERR|pmd->dp=0x7fdfea3ea010 > >> pmd->dp->dpif=0x25c3280 pmd->dp->dpif->dpif_class=0x251c790 > >> pmd->dp->dpif->full_name=(null) > >> dpif_netdev(revalidator82)|ERR|pmd->dp=0x7fdfea3ea010 > >> pmd->dp->dpif=0x25c3280 pmd->dp->dpif->dpif_class=0x33c03608 > >> pmd->dp->dpif->full_name= > >> dpif_netdev(revalidator82)|ERR|pmd->dp=0x7fdfea3ea010 > >> pmd->dp->dpif=0x25c3280 pmd->dp->dpif->dpif_class=0x2605700 > >> pmd->dp->dpif->full_name= > >> dpif_netdev(revalidator82)|ERR|pmd->dp=0x7fdfea3ea010 > >> pmd->dp->dpif=0x25c3280 pmd->dp->dpif->dpif_class=0xe784250b > >> pmd->dp->dpif->full_name=memseg-2048k-0-3 > >> dpif_netdev|ERR|pmd->dp=0x7fdfea3ea010 pmd->dp->dpif=0x2602430 > >> pmd->dp->dpif->dpif_class=0xef3e80 > >> pmd->dp->dpif->full_name=netdev@ovs-netdev > >> dpif_netdev(revalidator82)|ERR|pmd->dp=0x7fdfea3ea010 > >> pmd->dp->dpif=0x2602430 pmd->dp->dpif->dpif_class=0x2606200 > >> pmd->dp->dpif->full_name=(null) > >> > >> The same happens if I test it with hw-offload=true. > >> > >> I hope this scenario can be recreated on other setups as well. > > This is interesting. I'll try to reproduce this issue on my setup. > > Hi Ilya, > > I see that this bug is not only with this new patch-set, but now also came > into > effect in master, with acceptance of > > 30115809da2e ("dpif-netdev: Use netdev-offload API for port lookup while > offloading.") > > - port = dp_netdev_lookup_port(pmd->dp, in_port); > + port = netdev_ports_get(in_port, pmd->dp->dpif->dpif_class); > As of the corruption, the "port" is not retrieved by netdev_ports_get, and > the offloading is not done. I have reverted this commits in my tests. > > Though this commit is correct (but should not have any effect before vport > offloading), maybe we should revert it until this corruption bug is fixed. > > Thanks, > > Eli > > > > >> I will look more into it. > > Thanks! > > > > Best regards, Ilya Maximets. _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
