Hi Ben/Ilya
        Mainly patch do the following
1) Call the dpdk rx api(rte_eth_rx_burst) only when ovs netdev state is UP
2)  Now the rte_eth_dev_stop/ rte_eth_dev_start calls are removed in updating 
the flags   


This  fixes the core dump issue but I see below side effect with this patch:
With this change now only ports admin state(ovs netdev level) is changed 
(dpdk/nic level state is not changed) so NIC will continue to receive the 
packet until its Rxq ring is full
Later if somebody bring up the port then he will get the burst of old packets 
which are there in the Rxq (typically 4k packet (dpdk Rxq size)).


Thanks
Keshav
   
 

-----Original Message-----
From: Ben Pfaff [mailto:[email protected]] 
Sent: Monday, July 31, 2017 9:26 PM
To: Ilya Maximets
Cc: [email protected]; Keshav Gupta
Subject: Re: [ovs-dev] Openvswitch crash when bringing down the dpdk bond port 
using "ovs-ofctl mod-port br-prv dpdk1 down"

Thanks.

It doesn't cherry-pick cleanly to branch-2.6, so I think I'll leave it for now 
since Darrell made the reasonable point that DPDK was experimental in 2.6.

On Mon, Jul 31, 2017 at 06:20:28PM +0300, Ilya Maximets wrote:
> On 31.07.2017 16:05, Ben Pfaff wrote:
> > Ilya, should we apply this patch to branch-2.6?  Are there other 
> > patches that should be backported?
> 
> It's definitely a bug and should be backported if someone wants to use 
> barnch-2.6 with DPDK datapath. I traced only this exact case, so it's 
> hard to say if something else should be backported, but this patch 
> should fix described issue without  any additional changes.
>  
> > On Wed, Jul 26, 2017 at 03:28:12PM +0300, Ilya Maximets wrote:
> >> Hi.
> >>
> >> You need to backport at least following patch:
> >>
> >> commit 3b1fb0779b87788968c1a6a9ff295a9883547485
> >> Author: Daniele Di Proietto <[email protected]>
> >> Date:   Tue Nov 15 15:40:49 2016 -0800
> >>
> >>     netdev-dpdk: Don't call rte_dev_stop() in update_flags().
> >>     
> >>     Calling rte_eth_dev_stop() while the device is running causes a crash.
> >>     
> >>     We could use rte_eth_dev_set_link_down(), but not every PMD implements
> >>     that, and I found one NIC where that has no effect.
> >>     
> >>     Instead, this commit checks if the device has the NETDEV_UP flag when
> >>     transmitting or receiving (similarly to what we do for vhostuser). I
> >>     didn't notice any performance difference with this check in case the
> >>     device is up.
> >>     
> >>     An alternative would be to remove the device queues from the pmd 
> >> threads
> >>     tx and receive cache, but that requires reconfiguration and I'd prefer
> >>     to avoid it, because the change can come from OpenFlow.
> >>     
> >>     Signed-off-by: Daniele Di Proietto <[email protected]>
> >>     Acked-by: Ilya Maximets <[email protected]>
> >>
> >> This should fix your issue.
> >> In general, I'm suggesting to use stable 2.7 OVS, there was too 
> >> many DPDK related changes including stability fixes since 2.6.
> >>
> >> Best regards, Ilya Maximets.
> >>
> >>> Hi
> >>>   We are experiencing a openvswitch crash when bringing down the dpdk 
> >>> bond port using "ovs-ofctl mod-port br-prv dpdk1 down".
> >>>
> >>> backtrace of core is like below. Is there any issue reported earlier  for 
> >>> this type of crash in openvswitch community.
> >>>
> >>> (gdb) bt
> >>> #0  ixgbe_rxq_rearm (rxq=0x7fa45061f800) at 
> >>> /home/sdn/new_cloud_sdn_switch_2/cloud-sdn-switch/dpdk/drivers/net
> >>> /ixgbe/ixgbe_rxtx_vec_sse.c:98
> >>> #1  _recv_raw_pkts_vec (split_packet=0x0, nb_pkts=32, rx_pkts=<optimized 
> >>> out>, rxq=0x7fa45061f800)
> >>>     at 
> >>> /home/sdn/new_cloud_sdn_switch_2/cloud-sdn-switch/dpdk/drivers/net
> >>> /ixgbe/ixgbe_rxtx_vec_sse.c:290
> >>> #2  ixgbe_recv_pkts_vec (rx_queue=0x7fa45061f800, rx_pkts=<optimized 
> >>> out>, nb_pkts=<optimized out>)
> >>>     at 
> >>> /home/sdn/new_cloud_sdn_switch_2/cloud-sdn-switch/dpdk/drivers/net
> >>> /ixgbe/ixgbe_rxtx_vec_sse.c:474
> >>> #3  0x000000e5000000e4 in ?? ()
> >>> #4  0x00000046000000e6 in ?? ()
> >>> #5  0x0000006a00000069 in ?? ()
> >>> #6  0x0000006c0000006b in ?? ()
> >>> #7  0x000000ec0000006d in ?? ()
> >>> #8  0x000000ee000000ed in ?? ()
> >>> #9  0x00000001537f5780 in ?? ()
> >>> #10 0x0000000000000000 in ?? ()
> >>> (gdb)
> >>>
> >>>
> >>> I have analyzed the core and it seems it is a result of device 
> >>> stop and packet receive from the port happening at same time by 
> >>> two thread OVS main thread(device stop) and PMD thread(pkt receive). More 
> >>> precisely main thread cleaning the packet buffer from rxq sw_ring to 
> >>> avoid the packet buffer leak while in parallel PMD thread is filling the 
> >>> packet buffer in sw_ring/descriptor ring as part of ixgbe_recv_pkts_vec.
> >>>
> >>> version used is: openvswitch (2.6.1) with dpdk (16.11).
> >>>
> >>> This crash is not every time reproducible but frequency seems to be high.
> >>>
> >>> I am new to openvswitch community and this is first time I am posting a 
> >>> query. let me know if anything you require from my side.
> >>>
> >>> Thanks
> >>> Keshav
> >>
> >> _______________________________________________
> >> dev mailing list
> >> [email protected]
> >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> > 
> > 
> > 
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to