Hi Ilya,
I think the root cause for the issues mentioned below is the usage of 
pmd->dp->dpif->dpif_class
rather than pmd->dp->class. Please see a fixing patch 
https://patchwork.ozlabs.org/patch/1205684/
("dpif-netdev: Retrieve dpif_class from struct dp_netdev ").

When running dpctl statistics there is a call to dpif_open(). As part of this 
call dp->dpif is assigned a new dpif pointer value.
However pmd->dp->dpif should assume that dp->dpif is fixed throughout its life 
time. 
Hence it is better to access the dpif_class through fixed pointers that do not 
change (pmd->dp->class).

With the fixing patch Eli issues are fixes.
Also when I updated your RFCs series to use pmd->dp->class - there is no more 
crash and everything works fine.
Ilya - can you please update the RFC series and send them as V1 patches? 
I will test them again and hopefully will be able to ACK them.

Regards,
Ophir

> -----Original Message-----
> From: Eli Britstein <[email protected]>
> Sent: Sunday, December 8, 2019 2:29 PM
> To: Ilya Maximets <[email protected]>; Ophir Munk
> <[email protected]>; [email protected]
> Cc: Roni Bar Yanai <[email protected]>; Simon Horman
> <[email protected]>; Ameer Mahagneh
> <[email protected]>
> Subject: Re: [RFC v3 0/4] netdev-offload: Prerequisites of vport offloading 
> via
> DPDK.
> 
> 
> On 12/6/2019 4:34 PM, Ilya Maximets wrote:
> > On 06.12.2019 14:09, Ophir Munk wrote:
> >> Hi Ilya,
> >> I applied this series on master ("dpdk: Update to use DPDK 19.11.") and
> PINGed between two "dpdk" ports (hw-offload=true).
> >> It worked fine.
> >> Then I watched flow statistics (dpif based) by executing the command in
> (1):
> >> (1) watch ovs-appctl dpif/dump-flows -m <bridge name> It worked fine.
> >> Then I watched flow statistics (dpctl based) by executing the command in
> (2):
> >> (2) watch ovs-appctl dpctl/dump-flows In this case OVS crashed after
> >> a few seconds.
> >>
> >> When inspecting the calls trace I notice that pmd->dp->dpif->dpif_class-
> >type is a corrupted memory address.
> >>
> >> (gdb) bt
> >> #0  0x0000000000c26fda in dpif_normalize_type (type=0x7372 <Address
> >> 0x7372 out of bounds>) at lib/dpif.c:517
> >> #1  0x0000000000c19864 in dp_netdev_flow_offload_put
> >> (offload=offload@entry=0x7f11fc00b790) at lib/dpif-netdev.c:2378
> >> #2  0x0000000000c19ca8 in dp_netdev_flow_offload_main
> >> (data=<optimized out>) at lib/dpif-netdev.c:2467
> >> #3  0x0000000000ca3c9d in ovsthread_wrapper (aux_=<optimized out>)
> at
> >> lib/ovs-thread.c:383
> >> #4  0x00007f12a2a08e25 in start_thread () from /lib64/libpthread.so.0
> >> #5  0x00007f12a222c34d in clone () from /lib64/libc.so.6
> >>
> >> This scenario repeats whenever running the command in (2) and when
> the code calls dpif_normalize_type().
> >> It seems to me that the memory corruption was there for a while but only
> now it is exposed due to your rfc series.
> >>
> >> In order to observe this memory corruption I added the following
> printouts.
> >> I tested it with hw-offload=false:
> >> ovs-vsctl set Open_vSwitch . other_config:hw-offload=false I tested
> >> it on latest master without your rfc series.
> >>
> >> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index
> >> 1e54936..18a804a 100644
> >> --- a/lib/dpif-netdev.c
> >> +++ b/lib/dpif-netdev.c
> >> @@ -3625,6 +3625,10 @@ dpif_netdev_flow_dump_next(struct
> dpif_flow_dump_thread *thread_,
> >>               }
> >>           }
> >>
> >> +        VLOG_ERR("pmd->dp=%p pmd->dp->dpif=%p pmd->dp->dpif-
> >dpif_class=%p \
> >> +                 pmd->dp->dpif->full_name=%s",
> >> +                 pmd->dp, pmd->dp->dpif, pmd->dp->dpif->dpif_class,
> >> +                 pmd->dp->dpif->full_name);
> >>           do {
> >>               for (n_flows = 0; n_flows < flow_limit; n_flows++) {
> >>                   struct cmap_node *node;
> >>
> >>
> >> Looking at the printouts the memory corruption is observed.
> >>
> >> Initially all printouts are identical and correct as expected.
> >>
> >> dpif_netdev(revalidator82)|ERR|pmd->dp=0x7fdfea3ea010
> >> pmd->dp->dpif=0x2525e70 pmd->dp->dpif->dpif_class=0xef3e80
> >> pmd->dp->dpif->full_name=netdev@ovs-netdev
> >> dpif_netdev(revalidator82)|ERR|pmd->dp=0x7fdfea3ea010
> >> pmd->dp->dpif=0x2525e70 pmd->dp->dpif->dpif_class=0xef3e80
> >> pmd->dp->dpif->full_name=netdev@ovs-netdev
> >> dpif_netdev(revalidator82)|ERR|pmd->dp=0x7fdfea3ea010
> >> pmd->dp->dpif=0x2525e70 pmd->dp->dpif->dpif_class=0xef3e80
> >> pmd->dp->dpif->full_name=netdev@ovs-netdev
> >>
> >> Around this point I executed the dpctl command in (2) and you can notice
> that the pointers pmd->dp->dp_dpif and beyond became modified.
> >>
> >> dpif_netdev(revalidator82)|ERR|pmd->dp=0x7fdfea3ea010
> >> pmd->dp->dpif=0x25c3280 pmd->dp->dpif->dpif_class=0x251c790
> >> pmd->dp->dpif->full_name=(null)
> >> dpif_netdev(revalidator82)|ERR|pmd->dp=0x7fdfea3ea010
> >> pmd->dp->dpif=0x25c3280 pmd->dp->dpif->dpif_class=0x33c03608
> >> pmd->dp->dpif->full_name=
> >> dpif_netdev(revalidator82)|ERR|pmd->dp=0x7fdfea3ea010
> >> pmd->dp->dpif=0x25c3280 pmd->dp->dpif->dpif_class=0x2605700
> >> pmd->dp->dpif->full_name=
> >> dpif_netdev(revalidator82)|ERR|pmd->dp=0x7fdfea3ea010
> >> pmd->dp->dpif=0x25c3280 pmd->dp->dpif->dpif_class=0xe784250b
> >> pmd->dp->dpif->full_name=memseg-2048k-0-3
> >> dpif_netdev|ERR|pmd->dp=0x7fdfea3ea010 pmd->dp->dpif=0x2602430
> >> pmd->dp->dpif->dpif_class=0xef3e80
> >> pmd->dp->dpif->full_name=netdev@ovs-netdev
> >> dpif_netdev(revalidator82)|ERR|pmd->dp=0x7fdfea3ea010
> >> pmd->dp->dpif=0x2602430 pmd->dp->dpif->dpif_class=0x2606200
> >> pmd->dp->dpif->full_name=(null)
> >>
> >> The same happens if I test it with hw-offload=true.
> >>
> >> I hope this scenario can be recreated on other setups as well.
> > This is interesting.  I'll try to reproduce this issue on my setup.
> 
> Hi Ilya,
> 
> I see that this bug is not only with this new patch-set, but now also came 
> into
> effect in master, with acceptance of
> 
> 30115809da2e ("dpif-netdev: Use netdev-offload API for port lookup while
> offloading.")
> 
> -        port = dp_netdev_lookup_port(pmd->dp, in_port);
> +        port = netdev_ports_get(in_port, pmd->dp->dpif->dpif_class);
> As of the corruption, the "port" is not retrieved by netdev_ports_get, and
> the offloading is not done. I have reverted this commits in my tests.
> 
> Though this commit is correct (but should not have any effect before vport
> offloading), maybe we should revert it until this corruption bug is fixed.
> 
> Thanks,
> 
> Eli
> 
> >
> >> I will look more into it.
> > Thanks!
> >
> > Best regards, Ilya Maximets.
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to