On 7/16/20 10:01 AM, Eli Britstein wrote:
>
> On 7/16/2020 1:37 AM, Ilya Maximets wrote:
>> On 7/15/20 8:30 PM, Stokes, Ian wrote:
>>>> On 15/07/2020 17:00, Ilya Maximets wrote:
>>>>> The main thread will try to pause/stop all revalidators during
>>>>> datapath reconfiguration via the datapath purge callback
>>>>> (dp_purge_cb) while holding 'dp->port_mutex'.  A deadlock happens
>>>>> if any of the revalidator threads is already waiting on
>>>>> 'dp->port_mutex' while dumping offloaded flows:
>>>>>
>>>>>           main thread                          revalidator
>>>>>  -------------------------------  ----------------------------------
>>>>>
>>>>>  ovs_mutex_lock(&dp->port_mutex)
>>>>>
>>>>>                                   dpif_netdev_flow_dump_next()
>>>>>                                   -> dp_netdev_flow_to_dpif_flow
>>>>>                                   -> get_dpif_flow_status
>>>>>                                   -> dpif_netdev_get_flow_offload_status()
>>>>>                                   -> ovs_mutex_lock(&dp->port_mutex)
>>>>>                                      <waiting for mutex here>
>>>>>
>>>>>  reconfigure_datapath()
>>>>>  -> reconfigure_pmd_threads()
>>>>>  -> dp_netdev_del_pmd()
>>>>>  -> dp_purge_cb()
>>>>>  -> udpif_pause_revalidators()
>>>>>  -> ovs_barrier_block(&udpif->pause_barrier)
>>>>>     <waiting for revalidators to reach barrier>
>>>>>
>>>>>                            <DEADLOCK>
>>>>>
>>>>> We're not allowed to call the offloading API from the userspace
>>>>> datapath without holding the global port mutex, due to thread safety
>>>>> restrictions in the netdev-offload-dpdk module.  And it's also not
>>>>> easy to rework the datapath reconfiguration process in order to move
>>>>> the actual PMD removal and datapath purge out of the port mutex.
>>>>>
>>>>> So, for now, not sleeping on the mutex if it's not immediately
>>>>> available seems like the easiest workaround.  This will have an
>>>>> impact on the flow statistics update rate and on the ability to get
>>>>> the latest statistics before removing a flow (the latest stats will
>>>>> be lost if we were not able to take the mutex).  However, it will
>>>>> allow us to operate normally, avoiding the deadlock.
>>>>>
>>>>> The last bit: to avoid flapping of flow attributes and statistics,
>>>>> we're not failing the operation but returning the last statistics
>>>>> and attributes returned by the offload provider.  Since those might
>>>>> be updated from different threads, stores and reads are atomic.
>>>>>
>>>>> Reported-by: Frank Wang (王培辉) <wangpei...@inspur.com>
>>>>> Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2020-June/371753.html
>>>>> Fixes: a309e4f52660 ("dpif-netdev: Update offloaded flows statistics.")
>>>>> Signed-off-by: Ilya Maximets <i.maxim...@ovn.org>
>>>> LGTM
>>>>
>>>> Acked-by: Kevin Traynor <ktray...@redhat.com>
>>>>
>>> Looks OK to me as well, Ilya.  I had some trouble getting a hardware
>>> setup to test in the lab, but the standard tests were fine, no ill
>>> effects there.  Not sure if you want to see a test from another
>>> vendor's device?
>>>
>>> Acked-by: Ian Stokes <ian.sto...@intel.com>
>> Thanks, Kevin and Ian.
>>
>> It'd be good to have tests from Mellanox or Broadcom, or from the
>> original reporter.  But if there are no more replies by the end of
>> July 16, I'll try to perform a couple of synthetic tests with my own
>> hands to see whether it actually avoids the deadlock, and apply the
>> patch.  In any case, a deadlock is worse than any stats or attributes
>> misreporting.
>
> Hi.  I tried but failed to reproduce it on master (without this patch).
>
> I ran traffic (ping) between 2 ports (offloaded) while removing/adding
> one of the ports in a loop in the background.
>
> Is there a better way to reproduce?
You need dp_netdev_del_pmd() to be called, so I'd suggest removing PMD
threads from the pmd-cpu-mask.  I guess there was such an event
preceding the port deletion in the original bug report, since port
deletion by itself should not trigger PMD thread deletion.

>
>>
>> Best regards, Ilya Maximets.
>>
>>>>> ---
>>>>>
>>>>> Version 2:
>>>>>   - Storing error code from netdev_flow_get() to mimic the actual
>>>>>     result of the last call.
>>>>>
>>>>>  AUTHORS.rst       |  1 +
>>>>>  lib/dpif-netdev.c | 93 ++++++++++++++++++++++++++++++++++++++++++++---
>>>>>  2 files changed, 88 insertions(+), 6 deletions(-)
>>>>>
>>>>> diff --git a/AUTHORS.rst b/AUTHORS.rst
>>>>> index 8e6a0769f..eb36a01d0 100644
>>>>> --- a/AUTHORS.rst
>>>>> +++ b/AUTHORS.rst
>>>>> @@ -504,6 +504,7 @@ Edwin Chiu                      ec...@vmware.com
>>>>>  Eivind Bulie Haanaes
>>>>>  Enas Ahmad                      enas.ah...@kaust.edu.sa
>>>>>  Eric Lopez
>>>>> +Frank Wang (王培辉)              wangpei...@inspur.com
>>>>>  Frido Roose                     fr.ro...@gmail.com
>>>>>  Gaetano Catalli                 gaetano.cata...@gmail.com
>>>>>  Gavin Remaley                   gavin_rema...@selinc.com
>>>>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>>>>> index 5bb392cba..2aad24511 100644
>>>>> --- a/lib/dpif-netdev.c
>>>>> +++ b/lib/dpif-netdev.c
>>>>> @@ -492,6 +492,12 @@ struct dp_netdev_flow_stats {
>>>>>      atomic_uint16_t tcp_flags; /* Bitwise-OR of seen tcp_flags values. */
>>>>>  };
>>>>>
>>>>> +/* Contained by struct dp_netdev_flow's 'last_attrs' member. */
>>>>> +struct dp_netdev_flow_attrs {
>>>>> +    atomic_bool offloaded;         /* True if flow is offloaded to HW. */
>>>>> +    ATOMIC(const char *) dp_layer; /* DP layer the flow is handled in. */
>>>>> +};
>>>>> +
>>>>>  /* A flow in 'dp_netdev_pmd_thread's 'flow_table'.
>>>>>   *
>>>>>   *
>>>>> @@ -552,6 +558,11 @@ struct dp_netdev_flow {
>>>>>      /* Statistics. */
>>>>>      struct dp_netdev_flow_stats stats;
>>>>>
>>>>> +    /* Statistics and attributes received from the netdev offload
>>>>> +     * provider. */
>>>>> +    atomic_int netdev_flow_get_result;
>>>>> +    struct dp_netdev_flow_stats last_stats;
>>>>> +    struct dp_netdev_flow_attrs last_attrs;
>>>>> +
>>>>>      /* Actions. */
>>>>>      OVSRCU_TYPE(struct dp_netdev_actions *) actions;
>>>>>
>>>>> @@ -3277,9 +3288,56 @@ dp_netdev_pmd_find_flow(const struct dp_netdev_pmd_thread *pmd,
>>>>>      return NULL;
>>>>>  }
>>>>>
>>>>> +static void
>>>>> +dp_netdev_flow_set_last_stats_attrs(struct dp_netdev_flow *netdev_flow,
>>>>> +                                    const struct dpif_flow_stats *stats,
>>>>> +                                    const struct dpif_flow_attrs *attrs,
>>>>> +                                    int result)
>>>>> +{
>>>>> +    struct dp_netdev_flow_stats *last_stats = &netdev_flow->last_stats;
>>>>> +    struct dp_netdev_flow_attrs *last_attrs = &netdev_flow->last_attrs;
>>>>> +
>>>>> +    atomic_store_relaxed(&netdev_flow->netdev_flow_get_result, result);
>>>>> +    if (result) {
>>>>> +        return;
>>>>> +    }
>>>>> +
>>>>> +    atomic_store_relaxed(&last_stats->used, stats->used);
>>>>> +    atomic_store_relaxed(&last_stats->packet_count, stats->n_packets);
>>>>> +    atomic_store_relaxed(&last_stats->byte_count, stats->n_bytes);
>>>>> +    atomic_store_relaxed(&last_stats->tcp_flags, stats->tcp_flags);
>>>>> +
>>>>> +    atomic_store_relaxed(&last_attrs->offloaded, attrs->offloaded);
>>>>> +    atomic_store_relaxed(&last_attrs->dp_layer, attrs->dp_layer);
>>>>> +}
>>>>> +
>>>>> +static void
>>>>> +dp_netdev_flow_get_last_stats_attrs(struct dp_netdev_flow *netdev_flow,
>>>>> +                                    struct dpif_flow_stats *stats,
>>>>> +                                    struct dpif_flow_attrs *attrs,
>>>>> +                                    int *result)
>>>>> +{
>>>>> +    struct dp_netdev_flow_stats *last_stats = &netdev_flow->last_stats;
>>>>> +    struct dp_netdev_flow_attrs *last_attrs = &netdev_flow->last_attrs;
>>>>> +
>>>>> +    atomic_read_relaxed(&netdev_flow->netdev_flow_get_result, result);
>>>>> +    if (*result) {
>>>>> +        return;
>>>>> +    }
>>>>> +
>>>>> +    atomic_read_relaxed(&last_stats->used, &stats->used);
>>>>> +    atomic_read_relaxed(&last_stats->packet_count, &stats->n_packets);
>>>>> +    atomic_read_relaxed(&last_stats->byte_count, &stats->n_bytes);
>>>>> +    atomic_read_relaxed(&last_stats->tcp_flags, &stats->tcp_flags);
>>>>> +
>>>>> +    atomic_read_relaxed(&last_attrs->offloaded, &attrs->offloaded);
>>>>> +    atomic_read_relaxed(&last_attrs->dp_layer, &attrs->dp_layer);
>>>>> +}
>>>>> +
>>>>>  static bool
>>>>>  dpif_netdev_get_flow_offload_status(const struct dp_netdev *dp,
>>>>> -                                    const struct dp_netdev_flow *netdev_flow,
>>>>> +                                    struct dp_netdev_flow *netdev_flow,
>>>>>                                      struct dpif_flow_stats *stats,
>>>>>                                      struct dpif_flow_attrs *attrs)
>>>>>  {
>>>>> @@ -3302,11 +3360,31 @@ dpif_netdev_get_flow_offload_status(const struct dp_netdev *dp,
>>>>>      }
>>>>>      ofpbuf_use_stack(&buf, &act_buf, sizeof act_buf);
>>>>>      /* Taking a global 'port_mutex' to fulfill thread safety
>>>>> -     * restrictions for the netdev-offload-dpdk module. */
>>>>> -    ovs_mutex_lock(&dp->port_mutex);
>>>>> -    ret = netdev_flow_get(netdev, &match, &actions, &netdev_flow->mega_ufid,
>>>>> -                          stats, attrs, &buf);
>>>>> -    ovs_mutex_unlock(&dp->port_mutex);
>>>>> +     * restrictions for the netdev-offload-dpdk module.
>>>>> +     *
>>>>> +     * XXX: Main thread will try to pause/stop all revalidators during
>>>>> +     *      datapath reconfiguration via datapath purge callback
>>>>> +     *      (dp_purge_cb) while holding 'dp->port_mutex'.  So we're not
>>>>> +     *      waiting for the mutex here.  Otherwise, deadlock is possible,
>>>>> +     *      because revalidators might sleep waiting for the main thread
>>>>> +     *      to release the lock, while the main thread waits for them to
>>>>> +     *      stop processing.
>>>>> +     *      This workaround might make statistics less accurate,
>>>>> +     *      especially for the flow deletion case, since there will be
>>>>> +     *      no other attempt. */
>>>>> +    if (!ovs_mutex_trylock(&dp->port_mutex)) {
>>>>> +        ret = netdev_flow_get(netdev, &match, &actions,
>>>>> +                              &netdev_flow->mega_ufid, stats, attrs, &buf);
>>>>> +        /* Storing statistics and attributes from the last request for
>>>>> +         * later use on mutex contention. */
>>>>> +        dp_netdev_flow_set_last_stats_attrs(netdev_flow, stats, attrs, ret);
>>>>> +        ovs_mutex_unlock(&dp->port_mutex);
>>>>> +    } else {
>>>>> +        dp_netdev_flow_get_last_stats_attrs(netdev_flow, stats, attrs, &ret);
>>>>> +        if (!ret && !attrs->dp_layer) {
>>>>> +            /* Flow was never reported as 'offloaded' so it's harmless
>>>>> +             * to continue to think so. */
>>>>> +            ret = EAGAIN;
>>>>> +        }
>>>>> +    }
>>>>>      netdev_close(netdev);
>>>>>      if (ret) {
>>>>>          return false;
>>>>> @@ -3575,6 +3653,9 @@ dp_netdev_flow_add(struct dp_netdev_pmd_thread *pmd,
>>>>>      /* Do not allocate extra space. */
>>>>>      flow = xmalloc(sizeof *flow - sizeof flow->cr.flow.mf + mask.len);
>>>>>      memset(&flow->stats, 0, sizeof flow->stats);
>>>>> +    atomic_init(&flow->netdev_flow_get_result, 0);
>>>>> +    memset(&flow->last_stats, 0, sizeof flow->last_stats);
>>>>> +    memset(&flow->last_attrs, 0, sizeof flow->last_attrs);
>>>>>      flow->dead = false;
>>>>>      flow->batch = NULL;
>>>>>      flow->mark = INVALID_FLOW_MARK;
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
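A reproduction attempt along the lines Ilya suggests might look like the loop below.  This is a sketch, not a verified reproducer: the mask values and timing are arbitrary, and it assumes a running OVS-DPDK setup with offloaded flows and ping traffic already going between the two ports.

```shell
# Toggle pmd-cpu-mask in a loop so that reconfigure_pmd_threads()
# repeatedly destroys PMD threads (dp_netdev_del_pmd -> dp_purge_cb)
# while revalidators are dumping offloaded flows in the background.
# Mask values 0x1/0x3 are illustrative; pick cores valid for your host.
for i in $(seq 1 100); do
    ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x3
    sleep 1
    ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x1
    sleep 1
done
```

If the deadlock triggers on an unpatched master, ovs-vswitchd should stop making progress, e.g. subsequent ovs-vsctl commands hang waiting for the daemon.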