On 10.04.2018 21:12, Kevin Traynor wrote:
> DPDK mempools are freed when they are no longer needed.
> This can happen when a port is removed or a port's mtu
> is reconfigured so that a new mempool is used.
> 
> It is possible that an mbuf is attempted to be returned
> to a freed mempool from NIC Tx queues and this can lead
> to a segfault.
> 
> In order to prevent this, only free mempools when they
> are not needed and have no in-use mbufs. As this might
> not be possible immediately, sweep the mempools anytime
> a port tries to get a mempool.
> 
> Fixes: 8d38823bdf8b ("netdev-dpdk: fix memory leak")
> Cc: mark.b.kavanag...@gmail.com
> Cc: Ilya Maximets <i.maxim...@samsung.com>
> Reported-by: Venkatesan Pradeep <venkatesan.prad...@ericsson.com>
> Signed-off-by: Kevin Traynor <ktray...@redhat.com>
> ---
> 
> v2: Add second call to rte_mempool_full() as it is not atomic
>     so we can't trust one call to it.
> 
>  lib/netdev-dpdk.c | 48 +++++++++++++++++++++++++++++++++++-------------
>  1 file changed, 35 insertions(+), 13 deletions(-)
> 
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index c19cedc..00306c4 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -590,8 +590,32 @@ dpdk_mp_create(int socket_id, int mtu)
>  }
>  
> +/* Free unused mempools. */
> +static void
> +dpdk_mp_sweep(void) OVS_REQUIRES(dpdk_mp_mutex)
> +{
> +    struct dpdk_mp *dmp, *next;
> +
> +    LIST_FOR_EACH_SAFE (dmp, next, list_node, &dpdk_mp_list) {
> +        if (!dmp->refcount && rte_mempool_full(dmp->mp)) {
> +            /*
> +             * rte_mempool_full() is not atomic, but at this point
> +             * we will not be getting any more mbufs from the mempool
> +             * so if it reports full again on a subsequent call, then
> +             * we can be sure that it really is full.
> +             */


Checking rte_mempool_full() twice will not guarantee that the mempool is
really full, because it sums the per-lcore cache lengths with the ring count
and truncates the result at the mempool size. For example:

0. Let the mempool size be 250. The mempool ring currently contains 150 packets;
   the rest of the packets are in the per-lcore caches or in use.
1. The cache lengths are summed: total_cache_len = 100.
2. In a separate thread, some of the packets (say 50) are flushed from a cache
   to the ring.
3. The mempool ring count (now 200) is read and added to the previously
   calculated cache lengths: total_size = 100 + 200 == 300.
   --> The flushed packets have been counted twice. Truncated to 250 before
       return.
4. rte_mempool_full() returns true.

Second call:

5. The cache lengths are summed: total_cache_len = 50.
6. In a separate thread, some of the packets (say 30) are flushed from a cache
   to the ring.
7. The mempool ring count (now 230) is read and added to the previously
   calculated cache lengths: total_size = 50 + 230 == 280.
   --> The mbufs are still counted twice. Truncated to 250 before return.
8. rte_mempool_full() returns true, which could be a false positive.

So it is possible to get two false positives in a row. The only way to be sure
is to check rte_mempool_ops_get_count() twice; that call is thread safe.
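
Roughly what I have in mind, as a sketch on top of this patch (the helper name
dpdk_mp_full() is my own, and it assumes the helper is only called under
dpdk_mp_mutex while the mempool's refcount is zero, i.e. nothing is taking
mbufs from the pool anymore):

    static bool
    dpdk_mp_full(const struct rte_mempool *mp) OVS_REQUIRES(dpdk_mp_mutex)
    {
        unsigned int ring_count;

        /* rte_mempool_full() can double count mbufs that move from a cache
         * to the ring while it sums the counters.  Nothing takes mbufs from
         * the pool at this point, so the ring count can only grow.  If the
         * pool reports full and the ring count did not change across that
         * check, no cache flush happened during it and the result can be
         * trusted. */
        ring_count = rte_mempool_ops_get_count(mp);
        if (rte_mempool_full(mp)
            && ring_count == rte_mempool_ops_get_count(mp)) {
            return true;
        }

        return false;
    }

dpdk_mp_sweep() could then use such a helper in place of the two
rte_mempool_full() calls.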


> +            if (OVS_LIKELY(rte_mempool_full(dmp->mp))) {
> +                ovs_list_remove(&dmp->list_node);
> +                rte_mempool_free(dmp->mp);
> +                rte_free(dmp);
> +            }
> +        }
> +    }
> +}
> +
>  static struct dpdk_mp *
>  dpdk_mp_get(int socket_id, int mtu)
>  {
>      struct dpdk_mp *dmp;
> +    bool reuse = false;
>  
>      ovs_mutex_lock(&dpdk_mp_mutex);
> @@ -599,15 +623,18 @@ dpdk_mp_get(int socket_id, int mtu)
>          if (dmp->socket_id == socket_id && dmp->mtu == mtu) {
>              dmp->refcount++;
> -            goto out;
> +            reuse = true;
> +            break;
>          }
>      }
> +    /* Sweep mempools after reuse or before create. */
> +    dpdk_mp_sweep();
>  
> -    dmp = dpdk_mp_create(socket_id, mtu);
> -    if (dmp) {
> -        ovs_list_push_back(&dpdk_mp_list, &dmp->list_node);
> +    if (!reuse) {
> +        dmp = dpdk_mp_create(socket_id, mtu);
> +        if (dmp) {
> +            ovs_list_push_back(&dpdk_mp_list, &dmp->list_node);
> +        }
>      }
>  
> -out:
> -
>      ovs_mutex_unlock(&dpdk_mp_mutex);
>  
> @@ -615,5 +642,5 @@ out:
>  }
>  
> -/* Release an existing mempool. */
> +/* Decrement reference to a mempool. */
>  static void
>  dpdk_mp_put(struct dpdk_mp *dmp)
> @@ -625,10 +652,5 @@ dpdk_mp_put(struct dpdk_mp *dmp)
>      ovs_mutex_lock(&dpdk_mp_mutex);
>      ovs_assert(dmp->refcount);
> -
> -    if (!--dmp->refcount) {
> -        ovs_list_remove(&dmp->list_node);
> -        rte_mempool_free(dmp->mp);
> -        rte_free(dmp);
> -     }
> +    dmp->refcount--;
>      ovs_mutex_unlock(&dpdk_mp_mutex);
>  }
> 