>> DPDK mempools are freed when they are no longer needed.
>> This can happen when a port is removed or a port's mtu is reconfigured so
>> that a new mempool is used.
>> 
>> It is possible for an mbuf to be returned to a freed mempool from the
>> NIC Tx queues, and this can lead to a segfault.
>> 
>> In order to prevent this, only free mempools when they are not needed and
>> have no in-use mbufs. As this might not be possible immediately, sweep the
>> mempools anytime a port tries to get a mempool.
>> 
> 
> Thanks for working on this Kevin. Just a query below. From a testing POV I 
> didn't come across any issues.
> 
>> Fixes: 8d38823bdf8b ("netdev-dpdk: fix memory leak")
>> Cc: mark.b.kavanagh81 at gmail.com
>> Signed-off-by: Kevin Traynor <ktraynor at redhat.com>
>> ---
>> 
>> Found on OVS 2.6, but the code is operationally similar on the recent
>> branch-2.*'s. If the patch is acceptable I can backport it to older branches.

This issue was actually raised back in January while discussing the mempool issue.

> 
> Sounds good to me.
>> 
>>  lib/netdev-dpdk.c | 40 +++++++++++++++++++++++++++-------------
>>  1 file changed, 27 insertions(+), 13 deletions(-)
>> 
>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>> index c19cedc..9b8e732 100644
>> --- a/lib/netdev-dpdk.c
>> +++ b/lib/netdev-dpdk.c
>> @@ -590,8 +590,24 @@ dpdk_mp_create(int socket_id, int mtu)
>>  }
>> 
>> +/* Free unused mempools. */
>> +static void
>> +dpdk_mp_sweep(void) OVS_REQUIRES(dpdk_mp_mutex)
>> +{
>> +    struct dpdk_mp *dmp, *next;
>> +
>> +    LIST_FOR_EACH_SAFE (dmp, next, list_node, &dpdk_mp_list) {
>> +        if (!dmp->refcount && rte_mempool_full(dmp->mp)) {
> 
> I hadn't looked at rte_mempool_full() before. I noticed the documentation 
> warns not to use it as part of the datapath but for debug purposes only.

Yes, rte_mempool_full() is racy and could return false positives, but it's
the best and only useful function for this in the DPDK API. In practice we
can never be sure if someone still uses the memory pool. I also used this
function in my solution for branch 2.8, posted here:

https://mail.openvswitch.org/pipermail/ovs-discuss/2018-January/046100.html

Memory handling in DPDK is awful. In fact, no one is able to use
rte_mempool_free(), because there is no correct way to check whether a
memory pool is still in use.

There is one dirty hack to check if a mempool is really full. It is based on
the internal mempool implementation and the fact that we're not allocating any
mbufs from the pool (a rough sketch follows after the list):
1.  Check rte_mempool_ops_get_count(). <-- this is exposed externally for some
    reason
2.  Check rte_mempool_full().
3.  Check rte_mempool_ops_get_count() again.
4.  If rte_mempool_full() was 'true' and both calls to
    rte_mempool_ops_get_count() returned the same value ----> the mempool is
    really full.
5.  Only then could we safely call rte_mempool_free().
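To make that concrete, here is a rough, untested sketch of such a check (my
own illustration, not code from either patch; the helper name is made up, and
it assumes no other thread allocates mbufs from this pool while it runs):

#include <stdbool.h>
#include <rte_mempool.h>

/* Hypothetical helper: returns true only if the pool reports full and the
 * common-pool count did not change across the rte_mempool_full() call,
 * i.e. no mbufs moved between the per-lcore caches and the backing store
 * while we were counting. */
static bool
dpdk_mp_really_full(struct rte_mempool *mp)
{
    unsigned int count = rte_mempool_ops_get_count(mp);

    if (!rte_mempool_full(mp)) {
        return false;
    }

    return rte_mempool_ops_get_count(mp) == count;
}

Only if that returns true would it be reasonably safe to go ahead and call
rte_mempool_free().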

> 
> Does its use add to the downtime in a noticeable way while we reconfigure? 
> I'm thinking of a deployment where the number of lcores is high and we 
> reconfigure often. When caching is enabled, it has to walk the per-lcore 
> caches of all lcores. There may not be an issue here but I was curious.

That is the interesting question. In my version (referenced above) I used a
watchdog thread to periodically free mempools with zero refcount.
Not sure which solution is better.
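For illustration, that watchdog approach boils down to something like the
sketch below (hypothetical code, not the actual patch from that thread; it
assumes the dpdk_mp_sweep() added by this patch, the OVS xsleep() and
ovs_thread_create() helpers, and a made-up 10-second interval):

static void *
dpdk_mp_watchdog(void *arg OVS_UNUSED)
{
    for (;;) {
        ovs_mutex_lock(&dpdk_mp_mutex);
        /* Free any mempools whose refcount has dropped to zero. */
        dpdk_mp_sweep();
        ovs_mutex_unlock(&dpdk_mp_mutex);
        xsleep(10);
    }
    return NULL;
}

/* Started once during init, e.g.:
 *   ovs_thread_create("dpdk_mp_watchdog", dpdk_mp_watchdog, NULL);
 */

The trade-off is that freeing happens off the reconfiguration path, at the
cost of keeping unused pools around for up to one sweep interval.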

> 
> Did you do any testing around this?
> 
> Thanks
> Ian
> 
>> +            ovs_list_remove(&dmp->list_node);
>> +            rte_mempool_free(dmp->mp);
>> +            rte_free(dmp);
>> +        }
>> +    }
>> +}
>> +
>>  static struct dpdk_mp *
>>  dpdk_mp_get(int socket_id, int mtu)
>>  {
>>      struct dpdk_mp *dmp;
>> +    bool reuse = false;
>> 
>>      ovs_mutex_lock(&dpdk_mp_mutex);
>> @@ -599,15 +615,18 @@ dpdk_mp_get(int socket_id, int mtu)
>>          if (dmp->socket_id == socket_id && dmp->mtu == mtu) {
>>              dmp->refcount++;
>> -            goto out;
>> +            reuse = true;
>> +            break;
>>          }
>>      }
>> +    /* Sweep mempools after reuse or before create. */
>> +    dpdk_mp_sweep();
>> 
>> -    dmp = dpdk_mp_create(socket_id, mtu);
>> -    if (dmp) {
>> -        ovs_list_push_back(&dpdk_mp_list, &dmp->list_node);
>> +    if (!reuse) {
>> +        dmp = dpdk_mp_create(socket_id, mtu);
>> +        if (dmp) {
>> +            ovs_list_push_back(&dpdk_mp_list, &dmp->list_node);
>> +        }
>>      }
>> 
>> -out:
>> -
>>      ovs_mutex_unlock(&dpdk_mp_mutex);
>> 
>> @@ -615,5 +634,5 @@ out:
>>  }
>> 
>> -/* Release an existing mempool. */
>> +/* Decrement reference to a mempool. */
>>  static void
>>  dpdk_mp_put(struct dpdk_mp *dmp)
>> @@ -625,10 +644,5 @@ dpdk_mp_put(struct dpdk_mp *dmp)
>>      ovs_mutex_lock(&dpdk_mp_mutex);
>>      ovs_assert(dmp->refcount);
>> -
>> -    if (!--dmp->refcount) {
>> -        ovs_list_remove(&dmp->list_node);
>> -        rte_mempool_free(dmp->mp);
>> -        rte_free(dmp);
>> -     }
>> +    dmp->refcount--;
>>      ovs_mutex_unlock(&dpdk_mp_mutex);
>>  }
>> --
>> 1.8.3.1
>> 