On 04/09/2018 03:36 PM, Kevin Traynor wrote:
> On 04/06/2018 04:51 PM, Ilya Maximets wrote:
>>>> DPDK mempools are freed when they are no longer needed.
>>>> This can happen when a port is removed or a port's mtu is reconfigured so
>>>> that a new mempool is used.
>>>>
>>>> It is possible that an attempt is made to return an mbuf to a freed mempool
>>>> from the NIC Tx queues, and this can lead to a segfault.
>>>>
>>>> In order to prevent this, only free mempools when they are not needed and
>>>> have no in-use mbufs. As this might not be possible immediately, sweep the
>>>> mempools anytime a port tries to get a mempool.
>>>>
>>>
>>> Thanks for working on this, Kevin. Just a query below. From a testing POV I
>>> didn't come across any issues.
>>>
>>>> Fixes: 8d38823bdf8b ("netdev-dpdk: fix memory leak")
>>>> Cc: mark.b.kavanagh81@gmail.com
>>>> Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
>>>> ---
>>>>
>>>> Found on OVS 2.6, but the code is operationally similar on the recent
>>>> branch-2.*'s. If the patch is acceptable I can backport it to older branches.
>>
>> This issue was actually raised back in January while discussing the mempool issue.
>>
> 
> Hmmm, thanks for pointing it out. Seems the last mails in the thread
> coincided with when I was traveling and I didn't read them back then.
> 
>>>
>>> Sounds good to me.
>>>>
>>>>  lib/netdev-dpdk.c | 40 +++++++++++++++++++++++++++-------------
>>>>  1 file changed, 27 insertions(+), 13 deletions(-)
>>>>
>>>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>>>> index c19cedc..9b8e732 100644
>>>> --- a/lib/netdev-dpdk.c
>>>> +++ b/lib/netdev-dpdk.c
>>>> @@ -590,8 +590,24 @@ dpdk_mp_create(int socket_id, int mtu)
>>>>  }
>>>>
>>>> +/* Free unused mempools. */
>>>> +static void
>>>> +dpdk_mp_sweep(void) OVS_REQUIRES(dpdk_mp_mutex)
>>>> +{
>>>> +    struct dpdk_mp *dmp, *next;
>>>> +
>>>> +    LIST_FOR_EACH_SAFE (dmp, next, list_node, &dpdk_mp_list) {
>>>> +        if (!dmp->refcount && rte_mempool_full(dmp->mp)) {
>>>
>>> I hadn't looked at rte_mempool_full() before. I noticed the documentation 
>>> warns not to use it as part of the datapath but for debug purposes only.
>>
>> Yes, rte_mempool_full() is racy and could return false positives, but it's
>> the best and only useful function in the DPDK API. In practice we can never
>> be sure whether someone still uses the memory pool. I also used this function
>> in my solution for branch 2.8 posted here:
>>
>> https://mail.openvswitch.org/pipermail/ovs-discuss/2018-January/046100.html
>>
> 
> The similarities and differences are interesting!
> 
>> Memory handling in DPDK is awful. In fact, no one is able to use
>> rte_mempool_free(), because there is no correct way to check whether a
>> memory pool is still in use.
>>
>> There is one dirty hack to check if a mempool is really full. It is based on
>> the internal mempool implementation and the fact that we're not allocating
>> any mbufs from the pool:
>> 1.  Check rte_mempool_ops_get_count(). <-- this is exposed externally for
>>     some reason
>> 2.  Check rte_mempool_full().
>> 3.  Check rte_mempool_ops_get_count() again.
>> 4.  If rte_mempool_full() was 'true' and both calls to
>>     rte_mempool_ops_get_count() returned the same value ----> Mempool is
>>     really full.
>> 5.  Here we could safely call rte_mempool_free().
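
FWIW, steps 1-5 above would boil down to something like the untested sketch
below. The helper name is made up; rte_mempool_ops_get_count() and
rte_mempool_full() are the real DPDK calls, and rte_mempool_free() would only
be called when this returns true:

    /* Untested sketch of the check described above.  It relies on the fact
     * that we never allocate from the pool while checking, so objects can
     * only move from per-lcore caches into the common ring: a stable
     * ops_get_count() around a positive rte_mempool_full() should mean the
     * pool really is full. */
    static bool
    dpdk_mp_full__(const struct rte_mempool *mp)
    {
        unsigned int before = rte_mempool_ops_get_count(mp);
        bool full = rte_mempool_full(mp);
        unsigned int after = rte_mempool_ops_get_count(mp);

        return full && before == after;
    }
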
>>
> 
> Yes, that would work.
> 
> The ring count is taken before the cache count in rte_mempool_full(), and
> as we are not getting mbufs, I think the only possibility is mbufs moving
> from a cache to the ring. In that case we may get a false negative but
> wouldn't get a false positive from rte_mempool_full().
> Obviously the downside is that it is implementation specific - but maybe
> it could be argued the implementation will not change on already
> released DPDK versions. What do you think?
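
For reference, my reading of rte_mempool_avail_count() (which
rte_mempool_full() compares against mp->size) is roughly the sketch below.
This is paraphrased from memory rather than copied from the DPDK sources, so
double-check there; the function name is made up:

    /* Paraphrase of rte_mempool_avail_count(): the common ring/pool is
     * counted first and the per-lcore caches afterwards, so with no
     * allocations in flight an object can only be missed (moved cache->ring
     * between the two reads), never double counted. */
    static unsigned
    mp_avail_count_sketch(const struct rte_mempool *mp)
    {
        unsigned count = rte_mempool_ops_get_count(mp);
        unsigned lcore_id;

        if (mp->cache_size == 0) {
            return count;
        }

        for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
            count += mp->local_cache[lcore_id].len;
        }

        /* Racy cache reads can overshoot, so clamp to the pool size. */
        return count > mp->size ? mp->size : count;
    }
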
> 
>>>
>>> Does its use add to the downtime in a noticeable way while we reconfigure?
>>> I'm thinking of a deployment where the number of lcores is high and we
>>> reconfigure often. When the cache is enabled, it has to read the cache
>>> length for every lcore. There may not be an issue here but I was curious.
>>
> 
> I think it should be insignificant compared to actually
> allocating/freeing a mempool. I will try to check it.
> 

I measured 4K-5K cycles, so the extra time for rte_mempool_full() is not an
issue in the control path.
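
In case anyone wants to reproduce the number, a rough way to take that kind
of measurement is something like the snippet below (illustrative only, not
the exact instrumentation used; the helper name is made up):

    /* Time a single rte_mempool_full() call in TSC cycles and log it. */
    static void
    log_mp_full_cycles(struct rte_mempool *mp)
    {
        uint64_t start = rte_get_tsc_cycles();
        int full = rte_mempool_full(mp);

        VLOG_INFO("rte_mempool_full()=%d took %"PRIu64" cycles",
                  full, rte_get_tsc_cycles() - start);
    }
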

>> That is the interesting question. In my version (referenced above) I used a
>> watchdog thread to periodically free mempools with a zero refcount.
>> I'm not sure which solution is better.
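
For context, the watchdog variant would look roughly like the sketch below.
This is not the code from the linked patch; the function name and the 30
second interval are made up for illustration:

    /* Background thread that periodically frees mempools whose refcount has
     * dropped to zero, reusing dpdk_mp_sweep() from the patch above.  It
     * would be started once with ovs_thread_create() during DPDK init. */
    static void *
    dpdk_mp_watchdog_main(void *arg OVS_UNUSED)
    {
        for (;;) {
            ovs_mutex_lock(&dpdk_mp_mutex);
            dpdk_mp_sweep();
            ovs_mutex_unlock(&dpdk_mp_mutex);
            xsleep(30);
        }
        return NULL;
    }
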
>>
> 
> The advantage I see of being on the watchdog timer thread is that
> probably no one cares how long it takes :-) The advantage of being in
> the mempool config is that it runs (and is guaranteed to run) just before
> a new mempool is going to be requested, so there is a smaller risk of
> additional memory requirements.
> 
> In the short term I think we should use either a fix like the patches, which
> don't cover every possible corner case, or the get_count() approach, which
> isn't the most elegant. Longer term we can think about what we would want to
> add to DPDK to make this easier.
> 
> thanks,
> Kevin.
> 
>>>
>>> Did you do any testing around this?
>>>
>>> Thanks
>>> Ian
>>>
>>>> +            ovs_list_remove(&dmp->list_node);
>>>> +            rte_mempool_free(dmp->mp);
>>>> +            rte_free(dmp);
>>>> +        }
>>>> +    }
>>>> +}
>>>> +
>>>>  static struct dpdk_mp *
>>>>  dpdk_mp_get(int socket_id, int mtu)
>>>>  {
>>>>      struct dpdk_mp *dmp;
>>>> +    bool reuse = false;
>>>>
>>>>      ovs_mutex_lock(&dpdk_mp_mutex);
>>>> @@ -599,15 +615,18 @@ dpdk_mp_get(int socket_id, int mtu)
>>>>          if (dmp->socket_id == socket_id && dmp->mtu == mtu) {
>>>>              dmp->refcount++;
>>>> -            goto out;
>>>> +            reuse = true;
>>>> +            break;
>>>>          }
>>>>      }
>>>> +    /* Sweep mempools after reuse or before create. */
>>>> +    dpdk_mp_sweep();
>>>>
>>>> -    dmp = dpdk_mp_create(socket_id, mtu);
>>>> -    if (dmp) {
>>>> -        ovs_list_push_back(&dpdk_mp_list, &dmp->list_node);
>>>> +    if (!reuse) {
>>>> +        dmp = dpdk_mp_create(socket_id, mtu);
>>>> +        if (dmp) {
>>>> +            ovs_list_push_back(&dpdk_mp_list, &dmp->list_node);
>>>> +        }
>>>>      }
>>>>
>>>> -out:
>>>> -
>>>>      ovs_mutex_unlock(&dpdk_mp_mutex);
>>>>
>>>> @@ -615,5 +634,5 @@ out:
>>>>  }
>>>>
>>>> -/* Release an existing mempool. */
>>>> +/* Decrement reference to a mempool. */
>>>>  static void
>>>>  dpdk_mp_put(struct dpdk_mp *dmp)
>>>> @@ -625,10 +644,5 @@ dpdk_mp_put(struct dpdk_mp *dmp)
>>>>      ovs_mutex_lock(&dpdk_mp_mutex);
>>>>      ovs_assert(dmp->refcount);
>>>> -
>>>> -    if (!--dmp->refcount) {
>>>> -        ovs_list_remove(&dmp->list_node);
>>>> -        rte_mempool_free(dmp->mp);
>>>> -        rte_free(dmp);
>>>> -     }
>>>> +    dmp->refcount--;
>>>>      ovs_mutex_unlock(&dpdk_mp_mutex);
>>>>  }
>>>> --
>>>> 1.8.3.1
>>>>
> 

_______________________________________________
dev mailing list
dev@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
