On 7/15/20 6:00 PM, Ilya Maximets wrote: > Main thread will try to pause/stop all revalidators during datapath > reconfiguration via datapath purge callback (dp_purge_cb) while > holding 'dp->port_mutex'. And deadlock happens in case any of > revalidator threads is already waiting on 'dp->port_mutex' while > dumping offloaded flows: > > main thread revalidator > --------------------------------- ---------------------------------- > > ovs_mutex_lock(&dp->port_mutex) > > dpif_netdev_flow_dump_next() > -> dp_netdev_flow_to_dpif_flow > -> get_dpif_flow_status > -> dpif_netdev_get_flow_offload_status() > -> ovs_mutex_lock(&dp->port_mutex) > <waiting for mutex here> > > reconfigure_datapath() > -> reconfigure_pmd_threads() > -> dp_netdev_del_pmd() > -> dp_purge_cb() > -> udpif_pause_revalidators() > -> ovs_barrier_block(&udpif->pause_barrier) > <waiting for revalidators to reach barrier> > > <DEADLOCK> > > We're not allowed to call offloading API without holding global > port mutex from the userspace datapath due to thread safety > restrictions on netdev-offload-dpdk module. And it's also not easy > to rework datapath reconfiguration process in order to move actual > PMD removal and datapath purge out of the port mutex. > > So, for now, not sleeping on a mutex if it's not immediately available > seem like an easiest workaround. This will have impact on flow > statistics update rate and on ability to get the latest statistics > before removing the flow (latest stats will be lost in case we were > not able to take the mutex). However, this will allow us to operate > normally avoiding the deadlock. > > The last bit is that to avoid flapping of flow attributes and > statistics we're not failing the operation, but returning last > statistics and attributes returned by offload provider. Since those > might be updated in different threads, stores and reads are atomic. > > Reported-by: Frank Wang (王培辉) <wangpei...@inspur.com> > Reported-at: > https://mail.openvswitch.org/pipermail/ovs-dev/2020-June/371753.html > Fixes: a309e4f52660 ("dpif-netdev: Update offloaded flows statistics.") > Signed-off-by: Ilya Maximets <i.maxim...@ovn.org> > --- > > Version 2: > - Storing error code from netdev_flow_get() to mimic the actual result > of the last call. >
Thanks, everyone! Applied and backported to branch-2.13. Best regards, Ilya Maximets. _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev