On 7/16/20 12:25 PM, Sriharsha Basavapatna wrote:
> On Thu, Jul 16, 2020 at 2:52 PM Ilya Maximets <i.maxim...@ovn.org> wrote:
>>
>> On 7/16/20 10:01 AM, Eli Britstein wrote:
>>>
>>> On 7/16/2020 1:37 AM, Ilya Maximets wrote:
>>>> On 7/15/20 8:30 PM, Stokes, Ian wrote:
>>>>>> On 15/07/2020 17:00, Ilya Maximets wrote:
>>>>>>> The main thread will try to pause/stop all revalidators during datapath
>>>>>>> reconfiguration via the datapath purge callback (dp_purge_cb) while
>>>>>>> holding 'dp->port_mutex'.  A deadlock happens if any of the revalidator
>>>>>>> threads is already waiting on 'dp->port_mutex' while dumping offloaded
>>>>>>> flows:
>>>>>>>
>>>>>>>             main thread                           revalidator
>>>>>>>   ---------------------------------  ----------------------------------
>>>>>>>
>>>>>>>   ovs_mutex_lock(&dp->port_mutex)
>>>>>>>
>>>>>>>                                      dpif_netdev_flow_dump_next()
>>>>>>>                                      -> dp_netdev_flow_to_dpif_flow
>>>>>>>                                      -> get_dpif_flow_status
>>>>>>>                                      -> dpif_netdev_get_flow_offload_status()
>>>>>>>                                      -> ovs_mutex_lock(&dp->port_mutex)
>>>>>>>                                         <waiting for mutex here>
>>>>>>>
>>>>>>>   reconfigure_datapath()
>>>>>>>   -> reconfigure_pmd_threads()
>>>>>>>   -> dp_netdev_del_pmd()
>>>>>>>   -> dp_purge_cb()
>>>>>>>   -> udpif_pause_revalidators()
>>>>>>>   -> ovs_barrier_block(&udpif->pause_barrier)
>>>>>>>      <waiting for revalidators to reach barrier>
>>>>>>>
>>>>>>>                            <DEADLOCK>
>>>>>>>
>>>>>>> We're not allowed to call the offloading API from the userspace
>>>>>>> datapath without holding the global port mutex, due to thread safety
>>>>>>> restrictions on the netdev-offload-dpdk module.  And it's also not easy
>>>>>>> to rework the datapath reconfiguration process in order to move the
>>>>>>> actual PMD removal and datapath purge out of the port mutex.
>>>>>>>
>>>>>>> So, for now, not sleeping on a mutex if it's not immediately available
>>>>>>> seems like the easiest workaround.  This will have an impact on the
>>>>>>> flow statistics update rate and on the ability to get the latest
>>>>>>> statistics before removing the flow (the latest stats will be lost if
>>>>>>> we were not able to take the mutex).  However, this will allow us to
>>>>>>> operate normally, avoiding the deadlock.
>>>>>>>
>>>>>>> The last bit: to avoid flapping of flow attributes and statistics,
>>>>>>> we're not failing the operation, but returning the last statistics and
>>>>>>> attributes returned by the offload provider.  Since those might be
>>>>>>> updated in different threads, stores and reads are atomic.
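>>>>>>>
>>>>>>> Schematically, the resulting pattern looks like this (a condensed
>>>>>>> sketch of the change below; ovs_mutex_trylock() returns zero when
>>>>>>> the lock is acquired):
>>>>>>>
>>>>>>>     if (!ovs_mutex_trylock(&dp->port_mutex)) {
>>>>>>>         /* Lock taken: query the offload provider and cache the
>>>>>>>          * result for later use under contention. */
>>>>>>>         ret = netdev_flow_get(netdev, &match, &actions,
>>>>>>>                               &netdev_flow->mega_ufid, stats, attrs, &buf);
>>>>>>>         dp_netdev_flow_set_last_stats_attrs(netdev_flow, stats, attrs, ret);
>>>>>>>         ovs_mutex_unlock(&dp->port_mutex);
>>>>>>>     } else {
>>>>>>>         /* Contended: report the last cached stats/attributes instead
>>>>>>>          * of sleeping and risking the deadlock. */
>>>>>>>         dp_netdev_flow_get_last_stats_attrs(netdev_flow, stats, attrs, &ret);
>>>>>>>     }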
>>>>>>>
>>>>>>> Reported-by: Frank Wang (王培辉) <wangpei...@inspur.com>
>>>>>>> Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2020-June/371753.html
>>>>>>> Fixes: a309e4f52660 ("dpif-netdev: Update offloaded flows statistics.")
>>>>>>> Signed-off-by: Ilya Maximets <i.maxim...@ovn.org>
>>>>>> LGTM
>>>>>>
>>>>>> Acked-by: Kevin Traynor <ktray...@redhat.com>
>>>>>>
>>>>> Looks ok to me as well, Ilya. I had some trouble getting a hardware setup
>>>>> to test in the lab, but standard tests were ok, no ill effects there.
>>>>> Not sure if you want to see a test from another vendor's device?
>>>>>
>>>>> Acked-by: Ian Stokes <ian.sto...@intel.com>
>>>> Thanks, Kevin and Ian.
>>>>
>>>> It'll be good to have tests from Mellanox or Broadcom, or from the
>>>> original reporter.  But if there are no more replies by the end of
>>>> July 16, I'll try to perform a couple of synthetic tests myself to
>>>> see whether the patch actually avoids the deadlock, and apply it.
>>>> In any case, a deadlock is worse than any stats or attributes
>>>> misreporting.
>>>
>>> Hi. I tried but failed to reproduce it on master (without this patch).
>>>
>>> I ran traffic (ping) between 2 offloaded ports while removing/adding one
>>> of the ports in a loop in the background.
>>>
>>> Is there a better way to reproduce?
>>
>> You need dp_netdev_del_pmd() to be called, so I'd suggest removing
>> pmd threads via the pmd-cpu-mask.  I guess that such an event preceded
>> the port deletion in the original bug report, since port deletion
>> itself should not trigger pmd thread deletion.
> 
> Ilya,
> 
> If you can send me the steps, I'll try this with my partial offload
> patchset. Let me know.

From my understanding, the following should eventually trigger a deadlock
(a shell sketch of the loop follows the steps):

1. Start OVS with HW offloading enabled.
2. Add a couple of ports.
3. Start traffic, so some flows will be offloaded.
4. ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0xffff
5. Wait a little bit to give OVS some time for reconfiguration.
   (If you're lucky, you may catch a deadlock right here while removing
    threads that were created by default.)
6. ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x000f
7. Wait a little bit to give OVS some time for reconfiguration.
   Deadlock should happen here.
8. goto step #4
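
A rough shell sketch of that loop (untested; the 5 second sleeps and the
masks are just examples, adjust them to your CPU layout):

  # Each shrink of pmd-cpu-mask deletes PMD threads, which triggers
  # dp_purge_cb() under 'dp->port_mutex' -- the window for the deadlock.
  while true; do
      ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0xffff
      sleep 5   # give OVS time to reconfigure
      ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x000f
      sleep 5   # the deadlock, if any, should show up here
  done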


> 
> Thanks,
> -Harsha
>>
>>>
>>>>
>>>> Best regards, Ilya Maximets.
>>>>
>>>>>>> ---
>>>>>>>
>>>>>>> Version 2:
>>>>>>>    - Storing error code from netdev_flow_get() to mimic the actual
>>>>>>>      result of the last call.
>>>>>>>
>>>>>>>   AUTHORS.rst       |  1 +
>>>>>>>   lib/dpif-netdev.c | 93 ++++++++++++++++++++++++++++++++++++++++++++---
>>>>>>>   2 files changed, 88 insertions(+), 6 deletions(-)
>>>>>>>
>>>>>>> diff --git a/AUTHORS.rst b/AUTHORS.rst
>>>>>>> index 8e6a0769f..eb36a01d0 100644
>>>>>>> --- a/AUTHORS.rst
>>>>>>> +++ b/AUTHORS.rst
>>>>>>> @@ -504,6 +504,7 @@ Edwin Chiu                      ec...@vmware.com
>>>>>>>   Eivind Bulie Haanaes
>>>>>>>   Enas Ahmad                      enas.ah...@kaust.edu.sa
>>>>>>>   Eric Lopez
>>>>>>> +Frank Wang (王培辉)             wangpei...@inspur.com
>>>>>>>   Frido Roose                     fr.ro...@gmail.com
>>>>>>>   Gaetano Catalli                 gaetano.cata...@gmail.com
>>>>>>>   Gavin Remaley                   gavin_rema...@selinc.com
>>>>>>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>>>>>>> index 5bb392cba..2aad24511 100644
>>>>>>> --- a/lib/dpif-netdev.c
>>>>>>> +++ b/lib/dpif-netdev.c
>>>>>>> @@ -492,6 +492,12 @@ struct dp_netdev_flow_stats {
>>>>>>>       atomic_uint16_t tcp_flags;     /* Bitwise-OR of seen tcp_flags values. */
>>>>>>>   };
>>>>>>>
>>>>>>> +/* Contained by struct dp_netdev_flow's 'last_attrs' member.  */
>>>>>>> +struct dp_netdev_flow_attrs {
>>>>>>> +    atomic_bool offloaded;         /* True if flow is offloaded to HW. */
>>>>>>> +    ATOMIC(const char *) dp_layer; /* DP layer the flow is handled in. */
>>>>>>> +};
>>>>>>> +
>>>>>>>   /* A flow in 'dp_netdev_pmd_thread's 'flow_table'.
>>>>>>>    *
>>>>>>>    *
>>>>>>> @@ -552,6 +558,11 @@ struct dp_netdev_flow {
>>>>>>>       /* Statistics. */
>>>>>>>       struct dp_netdev_flow_stats stats;
>>>>>>>
>>>>>>> +    /* Statistics and attributes received from the netdev offload provider. */
>>>>>>> +    atomic_int netdev_flow_get_result;
>>>>>>> +    struct dp_netdev_flow_stats last_stats;
>>>>>>> +    struct dp_netdev_flow_attrs last_attrs;
>>>>>>> +
>>>>>>>       /* Actions. */
>>>>>>>       OVSRCU_TYPE(struct dp_netdev_actions *) actions;
>>>>>>>
>>>>>>> @@ -3277,9 +3288,56 @@ dp_netdev_pmd_find_flow(const struct dp_netdev_pmd_thread *pmd,
>>>>>>>       return NULL;
>>>>>>>   }
>>>>>>>
>>>>>>> +static void
>>>>>>> +dp_netdev_flow_set_last_stats_attrs(struct dp_netdev_flow *netdev_flow,
>>>>>>> +                                    const struct dpif_flow_stats *stats,
>>>>>>> +                                    const struct dpif_flow_attrs *attrs,
>>>>>>> +                                    int result)
>>>>>>> +{
>>>>>>> +    struct dp_netdev_flow_stats *last_stats = &netdev_flow->last_stats;
>>>>>>> +    struct dp_netdev_flow_attrs *last_attrs = &netdev_flow->last_attrs;
>>>>>>> +
>>>>>>> +    atomic_store_relaxed(&netdev_flow->netdev_flow_get_result, result);
>>>>>>> +    if (result) {
>>>>>>> +        return;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    atomic_store_relaxed(&last_stats->used,         stats->used);
>>>>>>> +    atomic_store_relaxed(&last_stats->packet_count, stats->n_packets);
>>>>>>> +    atomic_store_relaxed(&last_stats->byte_count,   stats->n_bytes);
>>>>>>> +    atomic_store_relaxed(&last_stats->tcp_flags,    stats->tcp_flags);
>>>>>>> +
>>>>>>> +    atomic_store_relaxed(&last_attrs->offloaded,    attrs->offloaded);
>>>>>>> +    atomic_store_relaxed(&last_attrs->dp_layer,     attrs->dp_layer);
>>>>>>> +
>>>>>>> +}
>>>>>>> +
>>>>>>> +static void
>>>>>>> +dp_netdev_flow_get_last_stats_attrs(struct dp_netdev_flow *netdev_flow,
>>>>>>> +                                    struct dpif_flow_stats *stats,
>>>>>>> +                                    struct dpif_flow_attrs *attrs,
>>>>>>> +                                    int *result)
>>>>>>> +{
>>>>>>> +    struct dp_netdev_flow_stats *last_stats = &netdev_flow->last_stats;
>>>>>>> +    struct dp_netdev_flow_attrs *last_attrs = &netdev_flow->last_attrs;
>>>>>>> +
>>>>>>> +    atomic_read_relaxed(&netdev_flow->netdev_flow_get_result, result);
>>>>>>> +    if (*result) {
>>>>>>> +        return;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    atomic_read_relaxed(&last_stats->used,         &stats->used);
>>>>>>> +    atomic_read_relaxed(&last_stats->packet_count, &stats->n_packets);
>>>>>>> +    atomic_read_relaxed(&last_stats->byte_count,   &stats->n_bytes);
>>>>>>> +    atomic_read_relaxed(&last_stats->tcp_flags,    &stats->tcp_flags);
>>>>>>> +
>>>>>>> +    atomic_read_relaxed(&last_attrs->offloaded,    &attrs->offloaded);
>>>>>>> +    atomic_read_relaxed(&last_attrs->dp_layer,     &attrs->dp_layer);
>>>>>>> +}
>>>>>>> +
>>>>>>>   static bool
>>>>>>>   dpif_netdev_get_flow_offload_status(const struct dp_netdev *dp,
>>>>>>> -                                    const struct dp_netdev_flow *netdev_flow,
>>>>>>> +                                    struct dp_netdev_flow *netdev_flow,
>>>>>>>                                       struct dpif_flow_stats *stats,
>>>>>>>                                       struct dpif_flow_attrs *attrs)
>>>>>>>   {
>>>>>>> @@ -3302,11 +3360,31 @@ dpif_netdev_get_flow_offload_status(const struct dp_netdev *dp,
>>>>>>>       }
>>>>>>>       ofpbuf_use_stack(&buf, &act_buf, sizeof act_buf);
>>>>>>>       /* Taking a global 'port_mutex' to fulfill thread safety
>>>>>>> -     * restrictions for the netdev-offload-dpdk module. */
>>>>>>> -    ovs_mutex_lock(&dp->port_mutex);
>>>>>>> -    ret = netdev_flow_get(netdev, &match, &actions, &netdev_flow->mega_ufid,
>>>>>>> -                          stats, attrs, &buf);
>>>>>>> -    ovs_mutex_unlock(&dp->port_mutex);
>>>>>>> +     * restrictions for the netdev-offload-dpdk module.
>>>>>>> +     *
>>>>>>> +     * XXX: Main thread will try to pause/stop all revalidators during
>>>>>>> +     *      datapath reconfiguration via datapath purge callback
>>>>>>> +     *      (dp_purge_cb) while holding 'dp->port_mutex'.  So we're not
>>>>>>> +     *      waiting for mutex here.  Otherwise, deadlock is possible,
>>>>>>> +     *      because revalidators might sleep waiting for the main thread
>>>>>>> +     *      to release the lock and main thread will wait for them to
>>>>>>> +     *      stop processing.
>>>>>>> +     *      This workaround might make statistics less accurate.
>>>>>>> +     *      Especially for flow deletion case, since there will be no
>>>>>>> +     *      other attempt.  */
>>>>>>> +    if (!ovs_mutex_trylock(&dp->port_mutex)) {
>>>>>>> +        ret = netdev_flow_get(netdev, &match, &actions,
>>>>>>> +                              &netdev_flow->mega_ufid, stats, attrs, &buf);
>>>>>>> +        /* Storing statistics and attributes from the last request for
>>>>>>> +         * later use on mutex contention. */
>>>>>>> +        dp_netdev_flow_set_last_stats_attrs(netdev_flow, stats, attrs, ret);
>>>>>>> +        ovs_mutex_unlock(&dp->port_mutex);
>>>>>>> +    } else {
>>>>>>> +        dp_netdev_flow_get_last_stats_attrs(netdev_flow, stats, attrs, &ret);
>>>>>>> +        if (!ret && !attrs->dp_layer) {
>>>>>>> +            /* Flow was never reported as 'offloaded' so it's harmless
>>>>>>> +             * to continue to think so. */
>>>>>>> +            ret = EAGAIN;
>>>>>>> +        }
>>>>>>> +    }
>>>>>>>       netdev_close(netdev);
>>>>>>>       if (ret) {
>>>>>>>           return false;
>>>>>>> @@ -3575,6 +3653,9 @@ dp_netdev_flow_add(struct dp_netdev_pmd_thread *pmd,
>>>>>>>       /* Do not allocate extra space. */
>>>>>>>       flow = xmalloc(sizeof *flow - sizeof flow->cr.flow.mf + mask.len);
>>>>>>>       memset(&flow->stats, 0, sizeof flow->stats);
>>>>>>> +    atomic_init(&flow->netdev_flow_get_result, 0);
>>>>>>> +    memset(&flow->last_stats, 0, sizeof flow->last_stats);
>>>>>>> +    memset(&flow->last_attrs, 0, sizeof flow->last_attrs);
>>>>>>>       flow->dead = false;
>>>>>>>       flow->batch = NULL;
>>>>>>>       flow->mark = INVALID_FLOW_MARK;
>>>>>>>
>>
