On 04/12/2018 04:40 PM, Ilya Maximets wrote:
> On 10.04.2018 21:12, Kevin Traynor wrote:
>> DPDK mempools are freed when they are no longer needed.
>> This can happen when a port is removed or a port's mtu
>> is reconfigured so that a new mempool is used.
>>
>> It is possible that an mbuf is attempted to be returned
>> to a freed mempool from NIC Tx queues and this can lead
>> to a segfault.
>>
>> In order to prevent this, only free mempools when they
>> are not needed and have no in-use mbufs. As this might
>> not be possible immediately, sweep the mempools anytime
>> a port tries to get a mempool.
>>
>> Fixes: 8d38823bdf8b ("netdev-dpdk: fix memory leak")
>> Cc: [email protected]
>> Cc: Ilya Maximets <[email protected]>
>> Reported-by: Venkatesan Pradeep <[email protected]>
>> Signed-off-by: Kevin Traynor <[email protected]>
>> ---
>>
>> v2: Add second call to rte_mempool_full() as it is not atomic
>> so we can't trust one call to it.
>>
>> lib/netdev-dpdk.c | 48 +++++++++++++++++++++++++++++++++++-------------
>> 1 file changed, 35 insertions(+), 13 deletions(-)
>>
>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>> index c19cedc..00306c4 100644
>> --- a/lib/netdev-dpdk.c
>> +++ b/lib/netdev-dpdk.c
>> @@ -590,8 +590,32 @@ dpdk_mp_create(int socket_id, int mtu)
>> }
>>
>> +/* Free unused mempools. */
>> +static void
>> +dpdk_mp_sweep(void) OVS_REQUIRES(dpdk_mp_mutex)
>> +{
>> + struct dpdk_mp *dmp, *next;
>> +
>> + LIST_FOR_EACH_SAFE (dmp, next, list_node, &dpdk_mp_list) {
>> + if (!dmp->refcount && rte_mempool_full(dmp->mp)) {
>> + /*
>> + * rte_mempool_full() is not atomic, but at this point
>> + * we will not be getting any more mbufs from the mempool
>> + * so if it reports full again on a subsequent call, then
>> + * we can be sure that it really is full.
>> + */
>
>
> Checking rte_mempool_full() twice will not guarantee that the mempool is
> really full, because it sums the cache sizes with the ring size and
> truncates the result from above. For example:
>
> 0. Let the mempool size == 250. The mempool ring currently contains 150 packets.
>    The rest of the packets are in caches or in use.
> 1. Sum of the caches calculated. total_cache_len = 100;
> 2. In a separate thread some of the packets (let it be 50) flushed from the
> cache.
> 3. Getting the mempool ring count (200) and summing with previously calculated
> cache sizes. total_size = 100 + 200 == 300.
> --> Here we have twice accounted flushed packets. Truncated to 250 before
> return.
It does not happen in this order - maybe you are just trying to
illustrate what would happen if the order was reversed in the
implementation. Currently it would be like this:
1. The ring is counted (150)
2. Packets flushed from cache to ring (50)
3. Caches counted (50)
4. Total counted = 150 + 50 = 200
5. rte_mempool_full() returns false.
Which is a false negative and is handled correctly in OVS.
That is why, with knowledge of the count order, it is safe to call
rte_mempool_full() only once: at this point mbufs can only move from
cache-->ring in the mempool.
>
> 4. rte_mempool_full() returns true.
> 5. Second call:
> 6. Sum of the caches calculated. total_cache_len = 50;
> 7. In a separate thread some of the packets (let it be 30) flushed from the
> cache.
> 8. Getting the mempool ring count (230) and summing with previously calculated
> cache sizes. total_size = 50 + 230 == 280.
> --> We still have twice accounted mbufs. Truncated to 250 before return.
> 9. rte_mempool_full() returns true which could be false positive.
>
> So, it's possible to have 2 false positives in a row. The only way to be sure
> is to check rte_mempool_ops_get_count() twice, which is thread safe.
>
OK, let's do it. I'll send a v3. All of the solutions discussed here are
a big improvement on the current situation anyway. If we get a more
elegant solution from DPDK in the future, we can use it.
>
>> + if (OVS_LIKELY(rte_mempool_full(dmp->mp))) {
>> + ovs_list_remove(&dmp->list_node);
>> + rte_mempool_free(dmp->mp);
>> + rte_free(dmp);
>> + }
>> + }
>> + }
>> +}
>> +
>> static struct dpdk_mp *
>> dpdk_mp_get(int socket_id, int mtu)
>> {
>> struct dpdk_mp *dmp;
>> + bool reuse = false;
>>
>> ovs_mutex_lock(&dpdk_mp_mutex);
>> @@ -599,15 +623,18 @@ dpdk_mp_get(int socket_id, int mtu)
>> if (dmp->socket_id == socket_id && dmp->mtu == mtu) {
>> dmp->refcount++;
>> - goto out;
>> + reuse = true;
>> + break;
>> }
>> }
>> + /* Sweep mempools after reuse or before create. */
>> + dpdk_mp_sweep();
>>
>> - dmp = dpdk_mp_create(socket_id, mtu);
>> - if (dmp) {
>> - ovs_list_push_back(&dpdk_mp_list, &dmp->list_node);
>> + if (!reuse) {
>> + dmp = dpdk_mp_create(socket_id, mtu);
>> + if (dmp) {
>> + ovs_list_push_back(&dpdk_mp_list, &dmp->list_node);
>> + }
>> }
>>
>> -out:
>> -
>> ovs_mutex_unlock(&dpdk_mp_mutex);
>>
>> @@ -615,5 +642,5 @@ out:
>> }
>>
>> -/* Release an existing mempool. */
>> +/* Decrement reference to a mempool. */
>> static void
>> dpdk_mp_put(struct dpdk_mp *dmp)
>> @@ -625,10 +652,5 @@ dpdk_mp_put(struct dpdk_mp *dmp)
>> ovs_mutex_lock(&dpdk_mp_mutex);
>> ovs_assert(dmp->refcount);
>> -
>> - if (!--dmp->refcount) {
>> - ovs_list_remove(&dmp->list_node);
>> - rte_mempool_free(dmp->mp);
>> - rte_free(dmp);
>> - }
>> + dmp->refcount--;
>> ovs_mutex_unlock(&dpdk_mp_mutex);
>> }
>>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev