On 7/16/20 10:01 AM, Eli Britstein wrote:
> 
> On 7/16/2020 1:37 AM, Ilya Maximets wrote:
>> On 7/15/20 8:30 PM, Stokes, Ian wrote:
>>>> On 15/07/2020 17:00, Ilya Maximets wrote:
>>>>> Main thread will try to pause/stop all revalidators during datapath
>>>>> reconfiguration via datapath purge callback (dp_purge_cb) while
>>>>> holding 'dp->port_mutex'.  And deadlock happens in case any of
>>>>> revalidator threads is already waiting on 'dp->port_mutex' while
>>>>> dumping offloaded flows:
>>>>>
>>>>>             main thread                           revalidator
>>>>>   ---------------------------------  ----------------------------------
>>>>>
>>>>>   ovs_mutex_lock(&dp->port_mutex)
>>>>>
>>>>>                                      dpif_netdev_flow_dump_next()
>>>>>                                      -> dp_netdev_flow_to_dpif_flow
>>>>>                                      -> get_dpif_flow_status
>>>>>                                      -> dpif_netdev_get_flow_offload_status()
>>>>>                                      -> ovs_mutex_lock(&dp->port_mutex)
>>>>>                                         <waiting for mutex here>
>>>>>
>>>>>   reconfigure_datapath()
>>>>>   -> reconfigure_pmd_threads()
>>>>>   -> dp_netdev_del_pmd()
>>>>>   -> dp_purge_cb()
>>>>>   -> udpif_pause_revalidators()
>>>>>   -> ovs_barrier_block(&udpif->pause_barrier)
>>>>>      <waiting for revalidators to reach barrier>
>>>>>
>>>>>                            <DEADLOCK>
>>>>>
>>>>> We're not allowed to call offloading API without holding global port
>>>>> mutex from the userspace datapath due to thread safety restrictions on
>>>>> netdev-offload-dpdk module.  And it's also not easy to rework datapath
>>>>> reconfiguration process in order to move actual PMD removal and
>>>>> datapath purge out of the port mutex.
>>>>>
>>>>> So, for now, not sleeping on a mutex if it's not immediately available
>>>>> seems like the easiest workaround.  This will have an impact on the flow
>>>>> statistics update rate and on the ability to get the latest statistics
>>>>> before removing the flow (the latest stats will be lost in case we were
>>>>> not able to take the mutex).  However, this will allow us to operate
>>>>> normally, avoiding the deadlock.
>>>>>
>>>>> The last bit is that, to avoid flapping of flow attributes and
>>>>> statistics, we're not failing the operation, but returning the last
>>>>> statistics and attributes reported by the offload provider.  Since those
>>>>> might be updated from different threads, stores and reads are atomic.
>>>>>
>>>>> Reported-by: Frank Wang (王培辉) <wangpei...@inspur.com>
>>>>> Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2020-June/371753.html
>>>>> Fixes: a309e4f52660 ("dpif-netdev: Update offloaded flows statistics.")
>>>>> Signed-off-by: Ilya Maximets <i.maxim...@ovn.org>
>>>> LGTM
>>>>
>>>> Acked-by: Kevin Traynor <ktray...@redhat.com>
>>>>
>>> Looks OK to me as well, Ilya.  I had some trouble getting a hardware
>>> setup to test in the lab, but the standard tests were OK; no ill effects
>>> there.  Not sure if you want to see a test from another vendor's device?
>>>
>>> Acked-by: Ian Stokes <ian.sto...@intel.com>
>> Thanks, Kevin and Ian.
>>
>> It'll be good to have tests from Mellanox or Broadcom, or from the
>> original reporter.  But if there are no more replies by the end of
>> July 16, I'll try to perform a couple of synthetic tests myself to see
>> if it actually avoids the deadlock and apply the patch.  In any case,
>> a deadlock is worse than any misreporting of stats or attributes.
> 
> Hi. I tried but failed to reproduce it on master (without this patch).
> 
> I ran traffic (ping) between two offloaded ports while, in the
> background, removing and re-adding one of the ports in a loop.
> 
> Is there a better way to reproduce it?

You need dp_netdev_del_pmd() to be called, so I'd suggest removing pmd
threads from the pmd-cpu-mask.  I guess there was such an event preceding
the port deletion in the original bug report, since port deletion itself
should not trigger pmd thread deletion.

> 
>>
>> Best regards, Ilya Maximets.
>>
>>>>> ---
>>>>>
>>>>> Version 2:
>>>>>    - Storing error code from netdev_flow_get() to mimic the actual result
>>>>>      of the last call.
>>>>>
>>>>>   AUTHORS.rst       |  1 +
>>>>>   lib/dpif-netdev.c | 93
>>>>> ++++++++++++++++++++++++++++++++++++++++++++---
>>>>>   2 files changed, 88 insertions(+), 6 deletions(-)
>>>>>
>>>>> diff --git a/AUTHORS.rst b/AUTHORS.rst
>>>>> index 8e6a0769f..eb36a01d0 100644
>>>>> --- a/AUTHORS.rst
>>>>> +++ b/AUTHORS.rst
>>>>> @@ -504,6 +504,7 @@ Edwin Chiu                      ec...@vmware.com
>>>>>   Eivind Bulie Haanaes
>>>>>   Enas Ahmad                      enas.ah...@kaust.edu.sa
>>>>>   Eric Lopez
>>>>> +Frank Wang (王培辉)             wangpei...@inspur.com
>>>>>   Frido Roose                     fr.ro...@gmail.com
>>>>>   Gaetano Catalli                 gaetano.cata...@gmail.com
>>>>>   Gavin Remaley                   gavin_rema...@selinc.com
>>>>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>>>>> index 5bb392cba..2aad24511 100644
>>>>> --- a/lib/dpif-netdev.c
>>>>> +++ b/lib/dpif-netdev.c
>>>>> @@ -492,6 +492,12 @@ struct dp_netdev_flow_stats {
>>>>>       atomic_uint16_t tcp_flags;     /* Bitwise-OR of seen tcp_flags values. */
>>>>>   };
>>>>>
>>>>> +/* Contained by struct dp_netdev_flow's 'last_attrs' member.  */
>>>>> +struct dp_netdev_flow_attrs {
>>>>> +    atomic_bool offloaded;         /* True if flow is offloaded to HW. */
>>>>> +    ATOMIC(const char *) dp_layer; /* DP layer the flow is handled in. */
>>>>> +};
>>>>> +
>>>>>   /* A flow in 'dp_netdev_pmd_thread's 'flow_table'.
>>>>>    *
>>>>>    *
>>>>> @@ -552,6 +558,11 @@ struct dp_netdev_flow {
>>>>>       /* Statistics. */
>>>>>       struct dp_netdev_flow_stats stats;
>>>>>
>>>>> +    /* Statistics and attributes received from the netdev offload provider. */
>>>>> +    atomic_int netdev_flow_get_result;
>>>>> +    struct dp_netdev_flow_stats last_stats;
>>>>> +    struct dp_netdev_flow_attrs last_attrs;
>>>>> +
>>>>>       /* Actions. */
>>>>>       OVSRCU_TYPE(struct dp_netdev_actions *) actions;
>>>>>
>>>>> @@ -3277,9 +3288,56 @@ dp_netdev_pmd_find_flow(const struct dp_netdev_pmd_thread *pmd,
>>>>>       return NULL;
>>>>>   }
>>>>>
>>>>> +static void
>>>>> +dp_netdev_flow_set_last_stats_attrs(struct dp_netdev_flow *netdev_flow,
>>>>> +                                    const struct dpif_flow_stats *stats,
>>>>> +                                    const struct dpif_flow_attrs *attrs,
>>>>> +                                    int result)
>>>>> +{
>>>>> +    struct dp_netdev_flow_stats *last_stats = &netdev_flow->last_stats;
>>>>> +    struct dp_netdev_flow_attrs *last_attrs = &netdev_flow->last_attrs;
>>>>> +
>>>>> +    atomic_store_relaxed(&netdev_flow->netdev_flow_get_result, result);
>>>>> +    if (result) {
>>>>> +        return;
>>>>> +    }
>>>>> +
>>>>> +    atomic_store_relaxed(&last_stats->used,         stats->used);
>>>>> +    atomic_store_relaxed(&last_stats->packet_count, stats->n_packets);
>>>>> +    atomic_store_relaxed(&last_stats->byte_count,   stats->n_bytes);
>>>>> +    atomic_store_relaxed(&last_stats->tcp_flags,    stats->tcp_flags);
>>>>> +
>>>>> +    atomic_store_relaxed(&last_attrs->offloaded,    attrs->offloaded);
>>>>> +    atomic_store_relaxed(&last_attrs->dp_layer,     attrs->dp_layer);
>>>>> +
>>>>> +}
>>>>> +
>>>>> +static void
>>>>> +dp_netdev_flow_get_last_stats_attrs(struct dp_netdev_flow *netdev_flow,
>>>>> +                                    struct dpif_flow_stats *stats,
>>>>> +                                    struct dpif_flow_attrs *attrs,
>>>>> +                                    int *result)
>>>>> +{
>>>>> +    struct dp_netdev_flow_stats *last_stats = &netdev_flow->last_stats;
>>>>> +    struct dp_netdev_flow_attrs *last_attrs = &netdev_flow->last_attrs;
>>>>> +
>>>>> +    atomic_read_relaxed(&netdev_flow->netdev_flow_get_result, result);
>>>>> +    if (*result) {
>>>>> +        return;
>>>>> +    }
>>>>> +
>>>>> +    atomic_read_relaxed(&last_stats->used,         &stats->used);
>>>>> +    atomic_read_relaxed(&last_stats->packet_count, &stats->n_packets);
>>>>> +    atomic_read_relaxed(&last_stats->byte_count,   &stats->n_bytes);
>>>>> +    atomic_read_relaxed(&last_stats->tcp_flags,    &stats->tcp_flags);
>>>>> +
>>>>> +    atomic_read_relaxed(&last_attrs->offloaded,    &attrs->offloaded);
>>>>> +    atomic_read_relaxed(&last_attrs->dp_layer,     &attrs->dp_layer);
>>>>> +}
>>>>> +
>>>>>   static bool
>>>>>   dpif_netdev_get_flow_offload_status(const struct dp_netdev *dp,
>>>>> -                                    const struct dp_netdev_flow *netdev_flow,
>>>>> +                                    struct dp_netdev_flow *netdev_flow,
>>>>>                                       struct dpif_flow_stats *stats,
>>>>>                                       struct dpif_flow_attrs *attrs)
>>>>>  {
>>>>> @@ -3302,11 +3360,31 @@ dpif_netdev_get_flow_offload_status(const struct dp_netdev *dp,
>>>>>       }
>>>>>       ofpbuf_use_stack(&buf, &act_buf, sizeof act_buf);
>>>>>       /* Taking a global 'port_mutex' to fulfill thread safety
>>>>> -     * restrictions for the netdev-offload-dpdk module. */
>>>>> -    ovs_mutex_lock(&dp->port_mutex);
>>>>> -    ret = netdev_flow_get(netdev, &match, &actions, &netdev_flow->mega_ufid,
>>>>> -                          stats, attrs, &buf);
>>>>> -    ovs_mutex_unlock(&dp->port_mutex);
>>>>> +     * restrictions for the netdev-offload-dpdk module.
>>>>> +     *
>>>>> +     * XXX: Main thread will try to pause/stop all revalidators during datapath
>>>>> +     *      reconfiguration via datapath purge callback (dp_purge_cb) while
>>>>> +     *      holding 'dp->port_mutex'.  So we're not waiting for mutex here.
>>>>> +     *      Otherwise, deadlock is possible, because revalidators might sleep
>>>>> +     *      waiting for the main thread to release the lock and main thread
>>>>> +     *      will wait for them to stop processing.
>>>>> +     *      This workaround might make statistics less accurate.  Especially
>>>>> +     *      for flow deletion case, since there will be no other attempt.  */
>>>>> +    if (!ovs_mutex_trylock(&dp->port_mutex)) {
>>>>> +        ret = netdev_flow_get(netdev, &match, &actions,
>>>>> +                              &netdev_flow->mega_ufid, stats, attrs, &buf);
>>>>> +        /* Storing statistics and attributes from the last request for
>>>>> +         * later use on mutex contention. */
>>>>> +        dp_netdev_flow_set_last_stats_attrs(netdev_flow, stats, attrs, ret);
>>>>> +        ovs_mutex_unlock(&dp->port_mutex);
>>>>> +    } else {
>>>>> +        dp_netdev_flow_get_last_stats_attrs(netdev_flow, stats, attrs, &ret);
>>>>> +        if (!ret && !attrs->dp_layer) {
>>>>> +            /* Flow was never reported as 'offloaded' so it's harmless
>>>>> +             * to continue to think so. */
>>>>> +            ret = EAGAIN;
>>>>> +        }
>>>>> +    }
>>>>>       netdev_close(netdev);
>>>>>       if (ret) {
>>>>>           return false;
>>>>> @@ -3575,6 +3653,9 @@ dp_netdev_flow_add(struct dp_netdev_pmd_thread *pmd,
>>>>>       /* Do not allocate extra space. */
>>>>>       flow = xmalloc(sizeof *flow - sizeof flow->cr.flow.mf + mask.len);
>>>>>       memset(&flow->stats, 0, sizeof flow->stats);
>>>>> +    atomic_init(&flow->netdev_flow_get_result, 0);
>>>>> +    memset(&flow->last_stats, 0, sizeof flow->last_stats);
>>>>> +    memset(&flow->last_attrs, 0, sizeof flow->last_attrs);
>>>>>       flow->dead = false;
>>>>>       flow->batch = NULL;
>>>>>       flow->mark = INVALID_FLOW_MARK;
>>>>>

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev