Thanks for your feedback. This means that recent kernel changes might have fixed the issue. I will look into whether there is a generic solution for the issue.
On Thu, Jun 14, 2018 at 8:09 AM, Pravin Shelar <[email protected]> wrote:
> On Wed, Jun 13, 2018 at 5:16 AM, Neelakantam Gaddam
> <[email protected]> wrote:
> > It should be the Qdisc's busylock in the dev_queue_xmit function. I have
> > seen the issue on a vanilla 3.10.87 kernel.
> >
>
> I see. This looks like a general problem that exists in the upstream kernel.
> You would be able to reproduce it even with a Linux bridge and a vxlan
> device.
> The proposed patch is a specific solution: you can add another layer of
> bridge, and the patch would not handle the same issue in that
> configuration. Therefore I am a bit hesitant to apply this patch.
>
> >
> > On Wed, Jun 13, 2018 at 12:21 PM, Pravin Shelar <[email protected]> wrote:
> >>
> >> On Tue, Jun 12, 2018 at 10:58 PM, Neelakantam Gaddam
> >> <[email protected]> wrote:
> >> > Hi Pravin,
> >> >
> >> > I have seen the crash below.
> >> >
> >> > [<ffffffff80864cd4>] show_stack+0x6c/0xf8
> >> > [<ffffffff80ad1628>] do_raw_spin_lock+0x168/0x170
> >> > [<ffffffff80bf7b1c>] dev_queue_xmit+0x43c/0x470
> >> > [<ffffffff80c32c08>] ip_finish_output+0x250/0x490
> >> > [<ffffffffc0115664>] rpl_iptunnel_xmit+0x134/0x218 [openvswitch]
> >> > [<ffffffffc0120f28>] rpl_vxlan_xmit+0x430/0x538 [openvswitch]
> >> > [<ffffffffc00f9de0>] do_execute_actions+0x18f8/0x19e8 [openvswitch]
> >> > [<ffffffffc00fa2b0>] ovs_execute_actions+0x90/0x208 [openvswitch]
> >> > [<ffffffffc0101860>] ovs_dp_process_packet+0xb0/0x1a8 [openvswitch]
> >> > [<ffffffffc010c5d8>] ovs_vport_receive+0x78/0x130 [openvswitch]
> >> > [<ffffffffc010ce6c>] internal_dev_xmit+0x34/0x98 [openvswitch]
> >> > [<ffffffff80bf74d0>] dev_hard_start_xmit+0x2e8/0x4f8
> >> > [<ffffffff80c10e48>] sch_direct_xmit+0xf0/0x238
> >> > [<ffffffff80bf78b8>] dev_queue_xmit+0x1d8/0x470
> >> > [<ffffffff80c5ffe4>] arp_process+0x614/0x628
> >> > [<ffffffff80bf0cb0>] __netif_receive_skb_core+0x2e8/0x5d8
> >> > [<ffffffff80bf4770>] process_backlog+0xc0/0x1b0
> >> > [<ffffffff80bf501c>] net_rx_action+0x154/0x240
> >> > [<ffffffff8088d130>] __do_softirq+0x1d0/0x218
> >> > [<ffffffff8088d240>] do_softirq+0x68/0x70
> >> > [<ffffffff8088d3a0>] local_bh_enable+0xa8/0xb0
> >> > [<ffffffff80bf5c88>] netif_rx_ni+0x20/0x30
> >> >
> >> > I have spent some time investigating and found that the crash is
> >> > caused by spinlock recursion in the dev_queue_xmit function.
> >> > The traced packet path is:
> >> > netif_rx->arp->dev_queue_xmit(internal port)->vxlan_xmit->dev_queue_xmit(internal port).
> >> >
> >>
> >> Which spinlock is it? I am surprised to see a lock taken in the fast
> >> path. Can you also share the kernel version?
> >>
> >> > The macro XMIT_RECURSION_LIMIT is defined as 10. This limit won't
> >> > prevent the crash, since the recursion depth is only 2 in my
> >> > configuration.
> >>
> >> Right, the recursion limit is there to avoid stack overflow.
> >>
> >> >
> >> > On Wed, Jun 13, 2018 at 4:11 AM, Pravin Shelar <[email protected]> wrote:
> >> >>
> >> >> On Tue, Jun 12, 2018 at 10:11 AM, Neelakantam Gaddam
> >> >> <[email protected]> wrote:
> >> >> >
> >> >> > Hi Pravin,
> >> >> >
> >> >> > The configuration below triggers the spinlock recursion issue.
> >> >> > I am able to see the issue with it.
> >> >> >
> >> >> > ovs-vsctl add-br br0
> >> >> > ovs-vsctl add-bond br0 bond0 p1p1 p1p2
> >> >> > ovs-vsctl set port bond0 lacp=active bond_mode=balance-tcp
> >> >> > ifconfig br0 100.0.0.1 up
> >> >> > ovs-vsctl add-port br0 veth0
> >> >> > ovs-vsctl add-port br0 vx0 -- set interface vx0 type=vxlan options:local_ip=100.0.0.1 options:remote_ip=100.0.0.2 option:key=flow
> >> >> >
> >> >> > ovs-ofctl add-flow br0 "table=0, priority=1, cookie=100, tun_id=100, in_port=4, action=output:3"
> >> >> > ovs-ofctl add-flow br0 "table=0, priority=1, cookie=100, in_port=3, actions=set_field:100->tun_id output:4"
> >> >> >
> >> >> > The same bridge, br0, is being used as the local interface for the
> >> >> > vxlan tunnel. Even though this configuration is invalid, we should
> >> >> > not see a kernel crash.
> >> >> >
> >> >>
> >> >> I agree this should not cause a crash.
> >> >> Can you post the crash, or investigate why it is crashing? I think such
> >> >> a configuration would hit the networking stack recursion limit
> >> >> (XMIT_RECURSION_LIMIT), and then the packet would be dropped. I am not
> >> >> sure which spinlock recursion issue you are referring to.
> >> >>
> >> >> >
> >> >> > On Tue, Jun 12, 2018 at 11:55 AM, Pravin Shelar <[email protected]> wrote:
> >> >> >>
> >> >> >> On Tue, May 22, 2018 at 10:16 PM, Neelakantam Gaddam
> >> >> >> <[email protected]> wrote:
> >> >> >>>
> >> >> >>> This patch fixes the kernel soft lockup issue with a vxlan
> >> >> >>> configuration where the tunneled packet is sent on the same bridge
> >> >> >>> that the vxlan port is attached to. It detects the loop in the vxlan
> >> >> >>> xmit function and drops the packet if a loop is detected.
> >> >> >>>
> >> >> >>> Signed-off-by: Neelakantam Gaddam <[email protected]>
> >> >> >>> ---
> >> >> >>>  datapath/linux/compat/vxlan.c | 6 ++++--
> >> >> >>>  1 file changed, 4 insertions(+), 2 deletions(-)
> >> >> >>>
> >> >> >>> diff --git a/datapath/linux/compat/vxlan.c b/datapath/linux/compat/vxlan.c
> >> >> >>> index 287dad2..00562fa 100644
> >> >> >>> --- a/datapath/linux/compat/vxlan.c
> >> >> >>> +++ b/datapath/linux/compat/vxlan.c
> >> >> >>> @@ -1115,7 +1115,8 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
> >> >> >>>                 goto tx_error;
> >> >> >>>         }
> >> >> >>>
> >> >> >>> -       if (rt->dst.dev == dev) {
> >> >> >>> +       if ((rt->dst.dev == dev) ||
> >> >> >>> +           (OVS_CB(skb)->input_vport->dev == rt->dst.dev)) {
> >> >> >>
> >> >> >> I am not sure in which case OVS_CB(skb)->input_vport->dev differs
> >> >> >> from dev when there is recursion. Can you explain how to reproduce
> >> >> >> it?
> >> >> >>
> >> >> >
> >> >> > --
> >> >> > Thanks & Regards
> >> >> > Neelakantam Gaddam
> >> >
> >> > --
> >> > Thanks & Regards
> >> > Neelakantam Gaddam
> >
> > --
> > Thanks & Regards
> > Neelakantam Gaddam
>
--
Thanks & Regards
Neelakantam Gaddam
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
