On 7/16/20 12:25 PM, Sriharsha Basavapatna wrote:
> On Thu, Jul 16, 2020 at 2:52 PM Ilya Maximets <i.maxim...@ovn.org> wrote:
>>
>> On 7/16/20 10:01 AM, Eli Britstein wrote:
>>>
>>> On 7/16/2020 1:37 AM, Ilya Maximets wrote:
>>>> On 7/15/20 8:30 PM, Stokes, Ian wrote:
>>>>>> On 15/07/2020 17:00, Ilya Maximets wrote:
>>>>>>> The main thread will try to pause/stop all revalidators during
>>>>>>> datapath reconfiguration via the datapath purge callback
>>>>>>> (dp_purge_cb) while holding 'dp->port_mutex'.  A deadlock happens
>>>>>>> if any of the revalidator threads is already waiting on
>>>>>>> 'dp->port_mutex' while dumping offloaded flows:
>>>>>>>
>>>>>>>  main thread                        revalidator
>>>>>>> ---------------------------------  ----------------------------------
>>>>>>>
>>>>>>>  ovs_mutex_lock(&dp->port_mutex)
>>>>>>>
>>>>>>>                                     dpif_netdev_flow_dump_next()
>>>>>>>                                     -> dp_netdev_flow_to_dpif_flow
>>>>>>>                                     -> get_dpif_flow_status
>>>>>>>                                     -> dpif_netdev_get_flow_offload_status()
>>>>>>>                                     -> ovs_mutex_lock(&dp->port_mutex)
>>>>>>>                                        <waiting for mutex here>
>>>>>>>
>>>>>>>  reconfigure_datapath()
>>>>>>>  -> reconfigure_pmd_threads()
>>>>>>>  -> dp_netdev_del_pmd()
>>>>>>>  -> dp_purge_cb()
>>>>>>>  -> udpif_pause_revalidators()
>>>>>>>  -> ovs_barrier_block(&udpif->pause_barrier)
>>>>>>>     <waiting for revalidators to reach barrier>
>>>>>>>
>>>>>>>                      <DEADLOCK>
>>>>>>>
>>>>>>> We're not allowed to call the offloading API from the userspace
>>>>>>> datapath without holding the global port mutex, due to thread safety
>>>>>>> restrictions on the netdev-offload-dpdk module.  And it's also not
>>>>>>> easy to rework the datapath reconfiguration process in order to move
>>>>>>> the actual PMD removal and datapath purge out of the port mutex.
>>>>>>>
>>>>>>> So, for now, not sleeping on the mutex if it's not immediately
>>>>>>> available seems like the easiest workaround.  This will have an
>>>>>>> impact on the flow statistics update rate and on the ability to get
>>>>>>> the latest statistics before removing the flow (the latest stats
>>>>>>> will be lost if we were not able to take the mutex).  However, this
>>>>>>> will allow us to operate normally, avoiding the deadlock.
>>>>>>>
>>>>>>> The last bit is that, to avoid flapping of flow attributes and
>>>>>>> statistics, we're not failing the operation, but returning the last
>>>>>>> statistics and attributes returned by the offload provider.  Since
>>>>>>> those might be updated in different threads, stores and reads are
>>>>>>> atomic.
>>>>>>>
>>>>>>> Reported-by: Frank Wang (王培辉) <wangpei...@inspur.com>
>>>>>>> Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2020-June/371753.html
>>>>>>> Fixes: a309e4f52660 ("dpif-netdev: Update offloaded flows statistics.")
>>>>>>> Signed-off-by: Ilya Maximets <i.maxim...@ovn.org>
>>>>>>
>>>>>> LGTM
>>>>>>
>>>>>> Acked-by: Kevin Traynor <ktray...@redhat.com>
>>>>>>
>>>>> Looks OK to me as well, Ilya.  I had some trouble getting a hardware
>>>>> setup to test in the lab, but the standard tests were OK it seems,
>>>>> no ill effects there.  Not sure if you want to see a test from
>>>>> another vendor's device?
>>>>>
>>>>> Acked-by: Ian Stokes <ian.sto...@intel.com>
>>>> Thanks, Kevin and Ian.
>>>>
>>>> It'll be good to have tests from Mellanox or Broadcom, or from the
>>>> original reporter.
>>>> But if there are no more replies by the end of July 16, I'll try to
>>>> perform a couple of synthetic tests with my own hands to see if it
>>>> actually avoids the deadlock, and apply the patch.  Anyway, a deadlock
>>>> is worse than any stats or attributes misreporting.
>>>
>>> Hi.  I tried but failed to reproduce it on master (without this patch).
>>>
>>> I ran traffic (ping) between 2 ports (offloaded), and in the background
>>> removed/added one of the ports in a loop.
>>>
>>> Is there a better way to reproduce?
>>
>> You need dp_netdev_del_pmd() to be called, so I'd suggest removing
>> pmd threads from the pmd-cpu-mask.  I guess there was such an event
>> preceding the port deletion in the original bug report, since port
>> deletion itself should not trigger the pmd thread deletion.
>
> Ilya,
>
> If you can send me the steps, I'll try this with my partial offload
> patchset.  Let me know.
From my understanding, the following should eventually trigger a deadlock
(a rough loop automating steps 4-8 is sketched below):

1. Start OVS with HW offloading enabled.
2. Add a couple of ports.
3. Start traffic, so some flows will be offloaded.
4. ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0xffff
5. Wait a little bit to give OVS some time for reconfiguration.
   (If you're lucky, you may catch a deadlock right here while removing
   the threads that were created by default.)
6. ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x000f
7. Wait a little bit to give OVS some time for reconfiguration.
   Deadlock should happen here.
8. Go to step #4.
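Untested sketch of that loop (the mask values and sleep times are just
examples, adjust them to your setup):

    # Toggle the PMD mask back and forth to force repeated PMD thread
    # creation/deletion while offloaded traffic keeps running.
    while true; do
        ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0xffff
        sleep 5   # reconfiguration; a deadlock may already happen here
        ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x000f
        sleep 5   # reconfiguration; deadlock is expected around here
    done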
>
> Thanks,
> -Harsha
>>
>>>
>>>>
>>>> Best regards, Ilya Maximets.
>>>>
>>>>>>> ---
>>>>>>>
>>>>>>> Version 2:
>>>>>>>   - Storing error code from netdev_flow_get() to mimic the actual
>>>>>>>     result of the last call.
>>>>>>>
>>>>>>>  AUTHORS.rst       |  1 +
>>>>>>>  lib/dpif-netdev.c | 93 ++++++++++++++++++++++++++++++++++++++++++++---
>>>>>>>  2 files changed, 88 insertions(+), 6 deletions(-)
>>>>>>>
>>>>>>> diff --git a/AUTHORS.rst b/AUTHORS.rst
>>>>>>> index 8e6a0769f..eb36a01d0 100644
>>>>>>> --- a/AUTHORS.rst
>>>>>>> +++ b/AUTHORS.rst
>>>>>>> @@ -504,6 +504,7 @@ Edwin Chiu                      ec...@vmware.com
>>>>>>>  Eivind Bulie Haanaes
>>>>>>>  Enas Ahmad                      enas.ah...@kaust.edu.sa
>>>>>>>  Eric Lopez
>>>>>>> +Frank Wang (王培辉)              wangpei...@inspur.com
>>>>>>>  Frido Roose                     fr.ro...@gmail.com
>>>>>>>  Gaetano Catalli                 gaetano.cata...@gmail.com
>>>>>>>  Gavin Remaley                   gavin_rema...@selinc.com
>>>>>>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>>>>>>> index 5bb392cba..2aad24511 100644
>>>>>>> --- a/lib/dpif-netdev.c
>>>>>>> +++ b/lib/dpif-netdev.c
>>>>>>> @@ -492,6 +492,12 @@ struct dp_netdev_flow_stats {
>>>>>>>      atomic_uint16_t tcp_flags;  /* Bitwise-OR of seen tcp_flags values. */
>>>>>>>  };
>>>>>>>
>>>>>>> +/* Contained by struct dp_netdev_flow's 'last_attrs' member. */
>>>>>>> +struct dp_netdev_flow_attrs {
>>>>>>> +    atomic_bool offloaded;         /* True if flow is offloaded to HW. */
>>>>>>> +    ATOMIC(const char *) dp_layer; /* DP layer the flow is handled in. */
>>>>>>> +};
>>>>>>> +
>>>>>>>  /* A flow in 'dp_netdev_pmd_thread's 'flow_table'.
>>>>>>>   *
>>>>>>>   *
>>>>>>> @@ -552,6 +558,11 @@ struct dp_netdev_flow {
>>>>>>>      /* Statistics. */
>>>>>>>      struct dp_netdev_flow_stats stats;
>>>>>>>
>>>>>>> +    /* Statistics and attributes received from the netdev offload provider. */
>>>>>>> +    atomic_int netdev_flow_get_result;
>>>>>>> +    struct dp_netdev_flow_stats last_stats;
>>>>>>> +    struct dp_netdev_flow_attrs last_attrs;
>>>>>>> +
>>>>>>>      /* Actions. */
>>>>>>>      OVSRCU_TYPE(struct dp_netdev_actions *) actions;
>>>>>>>
>>>>>>> @@ -3277,9 +3288,56 @@ dp_netdev_pmd_find_flow(const struct dp_netdev_pmd_thread *pmd,
>>>>>>>      return NULL;
>>>>>>>  }
>>>>>>>
>>>>>>> +static void
>>>>>>> +dp_netdev_flow_set_last_stats_attrs(struct dp_netdev_flow *netdev_flow,
>>>>>>> +                                    const struct dpif_flow_stats *stats,
>>>>>>> +                                    const struct dpif_flow_attrs *attrs,
>>>>>>> +                                    int result)
>>>>>>> +{
>>>>>>> +    struct dp_netdev_flow_stats *last_stats = &netdev_flow->last_stats;
>>>>>>> +    struct dp_netdev_flow_attrs *last_attrs = &netdev_flow->last_attrs;
>>>>>>> +
>>>>>>> +    atomic_store_relaxed(&netdev_flow->netdev_flow_get_result, result);
>>>>>>> +    if (result) {
>>>>>>> +        return;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    atomic_store_relaxed(&last_stats->used, stats->used);
>>>>>>> +    atomic_store_relaxed(&last_stats->packet_count, stats->n_packets);
>>>>>>> +    atomic_store_relaxed(&last_stats->byte_count, stats->n_bytes);
>>>>>>> +    atomic_store_relaxed(&last_stats->tcp_flags, stats->tcp_flags);
>>>>>>> +
>>>>>>> +    atomic_store_relaxed(&last_attrs->offloaded, attrs->offloaded);
>>>>>>> +    atomic_store_relaxed(&last_attrs->dp_layer, attrs->dp_layer);
>>>>>>> +
>>>>>>> +}
>>>>>>> +
>>>>>>> +static void
>>>>>>> +dp_netdev_flow_get_last_stats_attrs(struct dp_netdev_flow *netdev_flow,
>>>>>>> +                                    struct dpif_flow_stats *stats,
>>>>>>> +                                    struct dpif_flow_attrs *attrs,
>>>>>>> +                                    int *result)
>>>>>>> +{
>>>>>>> +    struct dp_netdev_flow_stats *last_stats = &netdev_flow->last_stats;
>>>>>>> +    struct dp_netdev_flow_attrs *last_attrs = &netdev_flow->last_attrs;
>>>>>>> +
>>>>>>> +    atomic_read_relaxed(&netdev_flow->netdev_flow_get_result, result);
>>>>>>> +    if (*result) {
>>>>>>> +        return;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    atomic_read_relaxed(&last_stats->used, &stats->used);
>>>>>>> +    atomic_read_relaxed(&last_stats->packet_count, &stats->n_packets);
>>>>>>> +    atomic_read_relaxed(&last_stats->byte_count, &stats->n_bytes);
>>>>>>> +    atomic_read_relaxed(&last_stats->tcp_flags, &stats->tcp_flags);
>>>>>>> +
>>>>>>> +    atomic_read_relaxed(&last_attrs->offloaded, &attrs->offloaded);
>>>>>>> +    atomic_read_relaxed(&last_attrs->dp_layer, &attrs->dp_layer);
>>>>>>> +}
>>>>>>> +
>>>>>>>  static bool
>>>>>>>  dpif_netdev_get_flow_offload_status(const struct dp_netdev *dp,
>>>>>>> -                                    const struct dp_netdev_flow *netdev_flow,
>>>>>>> +                                    struct dp_netdev_flow *netdev_flow,
>>>>>>>                                      struct dpif_flow_stats *stats,
>>>>>>>                                      struct dpif_flow_attrs *attrs)
>>>>>>>  {
>>>>>>> @@ -3302,11 +3360,31 @@ dpif_netdev_get_flow_offload_status(const struct dp_netdev *dp,
>>>>>>>      }
>>>>>>>      ofpbuf_use_stack(&buf, &act_buf, sizeof act_buf);
>>>>>>>      /* Taking a global 'port_mutex' to fulfill thread safety
>>>>>>> -     * restrictions for the netdev-offload-dpdk module. */
>>>>>>> -    ovs_mutex_lock(&dp->port_mutex);
>>>>>>> -    ret = netdev_flow_get(netdev, &match, &actions, &netdev_flow->mega_ufid,
>>>>>>> -                          stats, attrs, &buf);
>>>>>>> -    ovs_mutex_unlock(&dp->port_mutex);
>>>>>>> +     * restrictions for the netdev-offload-dpdk module.
>>>>>>> +     *
>>>>>>> +     * XXX: Main thread will try to pause/stop all revalidators during
>>>>>>> +     * datapath reconfiguration via the datapath purge callback
>>>>>>> +     * (dp_purge_cb) while holding 'dp->port_mutex'.  So we're not
>>>>>>> +     * waiting for the mutex here.  Otherwise, deadlock is possible,
>>>>>>> +     * because revalidators might sleep waiting for the main thread to
>>>>>>> +     * release the lock and the main thread will wait for them to stop
>>>>>>> +     * processing.
>>>>>>> +     * This workaround might make statistics less accurate, especially
>>>>>>> +     * for the flow deletion case, since there will be no other attempt. */
>>>>>>> +    if (!ovs_mutex_trylock(&dp->port_mutex)) {
>>>>>>> +        ret = netdev_flow_get(netdev, &match, &actions,
>>>>>>> +                              &netdev_flow->mega_ufid, stats, attrs, &buf);
>>>>>>> +        /* Storing statistics and attributes from the last request for
>>>>>>> +         * later use on mutex contention. */
>>>>>>> +        dp_netdev_flow_set_last_stats_attrs(netdev_flow, stats, attrs, ret);
>>>>>>> +        ovs_mutex_unlock(&dp->port_mutex);
>>>>>>> +    } else {
>>>>>>> +        dp_netdev_flow_get_last_stats_attrs(netdev_flow, stats, attrs, &ret);
>>>>>>> +        if (!ret && !attrs->dp_layer) {
>>>>>>> +            /* Flow was never reported as 'offloaded' so it's harmless
>>>>>>> +             * to continue to think so. */
>>>>>>> +            ret = EAGAIN;
>>>>>>> +        }
>>>>>>> +    }
>>>>>>>      netdev_close(netdev);
>>>>>>>      if (ret) {
>>>>>>>          return false;
>>>>>>> @@ -3575,6 +3653,9 @@ dp_netdev_flow_add(struct dp_netdev_pmd_thread *pmd,
>>>>>>>      /* Do not allocate extra space. */
>>>>>>>      flow = xmalloc(sizeof *flow - sizeof flow->cr.flow.mf + mask.len);
>>>>>>>      memset(&flow->stats, 0, sizeof flow->stats);
>>>>>>> +    atomic_init(&flow->netdev_flow_get_result, 0);
>>>>>>> +    memset(&flow->last_stats, 0, sizeof flow->last_stats);
>>>>>>> +    memset(&flow->last_attrs, 0, sizeof flow->last_attrs);
>>>>>>>      flow->dead = false;
>>>>>>>      flow->batch = NULL;
>>>>>>>      flow->mark = INVALID_FLOW_MARK;
>>>>>>>
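For anyone skimming the patch: the core of the fix is a trylock-plus-cached-result
pattern.  Reduced to a standalone sketch with plain pthreads and made-up names
such as 'query_hw' (purely illustrative, not the actual OVS code), it looks
like this:

    #include <pthread.h>
    #include <stdatomic.h>

    static pthread_mutex_t stats_mutex = PTHREAD_MUTEX_INITIALIZER;
    static atomic_ullong cached_packets;  /* Last value read under the lock. */

    /* Returns a fresh packet count when the lock is free; otherwise returns
     * the cached value instead of sleeping on the mutex. */
    unsigned long long
    get_packets(unsigned long long (*query_hw)(void))
    {
        unsigned long long n;

        if (!pthread_mutex_trylock(&stats_mutex)) {
            n = query_hw();                    /* Only safe under the lock. */
            atomic_store(&cached_packets, n);  /* Refresh the cache. */
            pthread_mutex_unlock(&stats_mutex);
        } else {
            n = atomic_load(&cached_packets);  /* Possibly stale, no block. */
        }
        return n;
    }

A reader can never block here, at the cost of occasionally reporting a stale
counter, which is exactly the trade-off the patch makes.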
>> _______________________________________________
>> dev mailing list
>> d...@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev