Thanks Ilya, looking forward to your update if you have any thoughts.

-----Original Message-----
From: Ilya Maximets [mailto:[email protected]]
Sent: June 18, 2020 20:34
To: Frank Wang(王培辉) <[email protected]>; [email protected]; [email protected]
Cc: [email protected]
Subject: Re: [ovs-dev] bug report//ovs-vswitchd got stuck when delete dpdk port with ovs hw-offload enabled
On 6/18/20 12:52 PM, Frank Wang(王培辉) wrote:
> Hello experts,
>
> I'm writing this email to draw some attention to this issue. After
> digging into it, I am one hundred percent certain that it is a deadlock.
>
> Here is how to reproduce it:
>
> a. Set hw-offload=true, then restart ovs-vswitchd.
> b. Create a netdev bridge, then add a NIC (already bound to the vfio
>    driver) to the bridge.
> c. Try to delete the dpdk port from the bridge.
> d. ovs-vswitchd intermittently gets stuck.
>
> What I believe happens when the deadlock occurs:
>
> 1. The ovs-vswitchd main thread takes the mutex (dp->port_mutex) in
>    dpif_netdev_port_del when deleting the port from the bridge.
> 2. Meanwhile, the revalidators try to acquire the same mutex
>    (dp->port_mutex) in dpif_netdev_get_flow_offload_status while
>    dumping flows. They block because they can't get the lock.
> 3. ovs-vswitchd then pauses the revalidators through a latch in order
>    to purge PMD flows in dp_purge_cb, but the revalidators are already
>    sleeping while waiting for the mutex, so they can't respond to the
>    pause request.

Hi. Thanks for the report.

This is definitely a deadlock and your analysis is correct.
The issue stems from the fact that netdev-offload-dpdk is not thread safe,
so we have to hold dp->port_mutex during all offloading-related operations.

> I have no idea how to fix it, please feel free to leave your comments.

I don't have a solution for this in mind right now. Will take a closer look.

> Thanks.
>
> From: Frank Wang(王培辉)
> Sent: June 17, 2020 19:08
> To: '[email protected]' <[email protected]>; [email protected]
> Subject: ovs-vswitchd got stuck when delete dpdk port with ovs hw-offload enabled
>
> Hello,
>
> I've encountered a problem: ovs-vswitchd intermittently gets stuck when
> I try to delete a dpdk port from the bridge. When I turn off
> hw-offload, it doesn't happen. I'm using the latest OVS release, 2.13.1,
> on CentOS 7.6. Please help me out here, thanks in advance.
>
> Here is the ovs-vswitchd stack; it looks like a deadlock:
>
> Thread 41 (Thread 0x7f2e2fdfe700 (LWP 156099)):
> #0 0x00007f2e975224ed in __lll_lock_wait () from /lib64/libpthread.so.0
> #1 0x00007f2e9751dde6 in _L_lock_941 () from /lib64/libpthread.so.0
> #2 0x00007f2e9751dcdf in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3 0x00007f2e98562318 in ovs_mutex_lock_at () from /lib64/libopenvswitch-2.13.so.0
> #4 0x00007f2e984c5469 in dpif_netdev_get_flow_offload_status () from /lib64/libopenvswitch-2.13.so.0
> #5 0x00007f2e984c5504 in get_dpif_flow_status () from /lib64/libopenvswitch-2.13.so.0
> #6 0x00007f2e984cd411 in dp_netdev_flow_to_dpif_flow () from /lib64/libopenvswitch-2.13.so.0
> #7 0x00007f2e984cd708 in dpif_netdev_flow_dump_next () from /lib64/libopenvswitch-2.13.so.0
> #8 0x00007f2e984d86f2 in dpif_flow_dump_next () from /lib64/libopenvswitch-2.13.so.0
> #9 0x00007f2e98b46547 in revalidate.isra.23 () from /lib64/libofproto-2.13.so.0
> #10 0x00007f2e98b46db3 in udpif_revalidator () from /lib64/libofproto-2.13.so.0
> #11 0x00007f2e9856307f in ovsthread_wrapper () from /lib64/libopenvswitch-2.13.so.0
> #12 0x00007f2e9751bdd5 in start_thread () from /lib64/libpthread.so.0
> #13 0x00007f2e96a39ead in clone () from /lib64/libc.so.6
>
> Thread 40 (Thread 0x7f2e2fbfd700 (LWP 156100)):
> #0 0x00007f2e975224ed in __lll_lock_wait () from /lib64/libpthread.so.0
> #1 0x00007f2e9751dde6 in _L_lock_941 () from /lib64/libpthread.so.0
> #2 0x00007f2e9751dcdf in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3 0x00007f2e98562318 in ovs_mutex_lock_at () from /lib64/libopenvswitch-2.13.so.0
> #4 0x00007f2e984c5469 in dpif_netdev_get_flow_offload_status () from /lib64/libopenvswitch-2.13.so.0
> #5 0x00007f2e984c5504 in get_dpif_flow_status () from /lib64/libopenvswitch-2.13.so.0
> #6 0x00007f2e984cd411 in dp_netdev_flow_to_dpif_flow () from /lib64/libopenvswitch-2.13.so.0
> #7 0x00007f2e984cd708 in dpif_netdev_flow_dump_next () from /lib64/libopenvswitch-2.13.so.0
> #8 0x00007f2e984d86f2 in dpif_flow_dump_next () from /lib64/libopenvswitch-2.13.so.0
> #9 0x00007f2e98b46547 in revalidate.isra.23 () from /lib64/libofproto-2.13.so.0
> #10 0x00007f2e98b46db3 in udpif_revalidator () from /lib64/libofproto-2.13.so.0
> #11 0x00007f2e9856307f in ovsthread_wrapper () from /lib64/libopenvswitch-2.13.so.0
> #12 0x00007f2e9751bdd5 in start_thread () from /lib64/libpthread.so.0
> #13 0x00007f2e96a39ead in clone () from /lib64/libc.so.6
>
> …
>
> Thread 1 (Thread 0x7f2eaa999000 (LWP 155698)):
> #0 0x00007f2e96a2f20d in poll () from /lib64/libc.so.6
> #1 0x00007f2e98593064 in time_poll () from /lib64/libopenvswitch-2.13.so.0
> #2 0x00007f2e9857a50c in poll_block () from /lib64/libopenvswitch-2.13.so.0
> #3 0x00007f2e98562f78 in ovs_barrier_block () from /lib64/libopenvswitch-2.13.so.0
> #4 0x00007f2e98b432be in dp_purge_cb () from /lib64/libofproto-2.13.so.0
> #5 0x00007f2e984c8391 in dp_netdev_del_pmd () from /lib64/libopenvswitch-2.13.so.0
> #6 0x00007f2e984c9a67 in reconfigure_datapath () from /lib64/libopenvswitch-2.13.so.0
> #7 0x00007f2e984cacbd in do_del_port () from /lib64/libopenvswitch-2.13.so.0
> #8 0x00007f2e984ccd5f in dpif_netdev_port_del () from /lib64/libopenvswitch-2.13.so.0
> #9 0x00007f2e984d74dc in dpif_port_del () from /lib64/libopenvswitch-2.13.so.0
> #10 0x00007f2e98b31595 in port_del () from /lib64/libofproto-2.13.so.0
> #11 0x00007f2e98b1fe90 in ofproto_port_del () from /lib64/libofproto-2.13.so.0
> #12 0x000056187a856aa6 in bridge_delete_or_reconfigure_ports ()
> #13 0x000056187a858490 in bridge_reconfigure ()
> #14 0x000056187a85be26 in bridge_run ()
> #15 0x000056187a85235d in main ()
>
> _______________________________________________
> dev mailing list
> [email protected]
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
