On 10.04.2018 21:12, Kevin Traynor wrote:
> DPDK mempools are freed when they are no longer needed.
> This can happen when a port is removed or a port's mtu
> is reconfigured so that a new mempool is used.
>
> It is possible that an mbuf is attempted to be returned
> to a freed mempool from NIC Tx queues and this can lead
> to a segfault.
>
> In order to prevent this, only free mempools when they
> are not needed and have no in-use mbufs. As this might
> not be possible immediately, sweep the mempools anytime
> a port tries to get a mempool.
>
> Fixes: 8d38823bdf8b ("netdev-dpdk: fix memory leak")
> Cc: [email protected]
> Cc: Ilya Maximets <[email protected]>
> Reported-by: Venkatesan Pradeep <[email protected]>
> Signed-off-by: Kevin Traynor <[email protected]>
> ---
>
> v2: Add second call to rte_mempool_full() as it is not atomic
> so we can't trust one call to it.
>
> lib/netdev-dpdk.c | 48 +++++++++++++++++++++++++++++++++++-------------
> 1 file changed, 35 insertions(+), 13 deletions(-)
>
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index c19cedc..00306c4 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -590,8 +590,32 @@ dpdk_mp_create(int socket_id, int mtu)
> }
>
> +/* Free unused mempools. */
> +static void
> +dpdk_mp_sweep(void) OVS_REQUIRES(dpdk_mp_mutex)
> +{
> + struct dpdk_mp *dmp, *next;
> +
> + LIST_FOR_EACH_SAFE (dmp, next, list_node, &dpdk_mp_list) {
> + if (!dmp->refcount && rte_mempool_full(dmp->mp)) {
> + /*
> + * rte_mempool_full() is not atomic, but at this point
> + * we will not be getting any more mbufs from the mempool
> + * so if it reports full again on a subsequent call, then
> + * we can be sure that it really is full.
> + */
Checking of the rte_mempool_full() twice will not guarantee that it's
really full because it sums cache sizes with ring size and truncates the
value from the upper side. For example:
0. Let the mempool size == 250. Current mempool ring contains 150 packets.
Some of the packets in caches and in use.
1. Sum of the caches calculated. total_cache_len = 100;
2. In a separate thread some of the packets (let it be 50) flushed from the
cache.
3. Getting the mempool ring count (200) and summing with previously calculated
cache sizes. total_size = 100 + 200 == 300.
--> Here we have twice accounted flushed packets. Truncated to 250 before
return.
4. rte_mempool_full() returns true.
5. Second call:
6. Sum of the caches calculated. total_cache_len = 50;
7. In a separate thread some of the packets (let it be 30) flushed from the
cache.
8. Getting the mempool ring count (230) and summing with previously calculated
cache sizes. total_size = 50 + 230 == 280.
--> We still have twice accounted mbufs. Truncated to 250 before return.
9. rte_mempool_full() returns true which could be false positive.
So, it's possible to have 2 false positives in a row. The only way to be sure
is to check twice rte_mempool_ops_get_count(), which is thread safe.
> + if (OVS_LIKELY(rte_mempool_full(dmp->mp))) {
> + ovs_list_remove(&dmp->list_node);
> + rte_mempool_free(dmp->mp);
> + rte_free(dmp);
> + }
> + }
> + }
> +}
> +
> static struct dpdk_mp *
> dpdk_mp_get(int socket_id, int mtu)
> {
> struct dpdk_mp *dmp;
> + bool reuse = false;
>
> ovs_mutex_lock(&dpdk_mp_mutex);
> @@ -599,15 +623,18 @@ dpdk_mp_get(int socket_id, int mtu)
> if (dmp->socket_id == socket_id && dmp->mtu == mtu) {
> dmp->refcount++;
> - goto out;
> + reuse = true;
> + break;
> }
> }
> + /* Sweep mempools after reuse or before create. */
> + dpdk_mp_sweep();
>
> - dmp = dpdk_mp_create(socket_id, mtu);
> - if (dmp) {
> - ovs_list_push_back(&dpdk_mp_list, &dmp->list_node);
> + if (!reuse) {
> + dmp = dpdk_mp_create(socket_id, mtu);
> + if (dmp) {
> + ovs_list_push_back(&dpdk_mp_list, &dmp->list_node);
> + }
> }
>
> -out:
> -
> ovs_mutex_unlock(&dpdk_mp_mutex);
>
> @@ -615,5 +642,5 @@ out:
> }
>
> -/* Release an existing mempool. */
> +/* Decrement reference to a mempool. */
> static void
> dpdk_mp_put(struct dpdk_mp *dmp)
> @@ -625,10 +652,5 @@ dpdk_mp_put(struct dpdk_mp *dmp)
> ovs_mutex_lock(&dpdk_mp_mutex);
> ovs_assert(dmp->refcount);
> -
> - if (!--dmp->refcount) {
> - ovs_list_remove(&dmp->list_node);
> - rte_mempool_free(dmp->mp);
> - rte_free(dmp);
> - }
> + dmp->refcount--;
> ovs_mutex_unlock(&dpdk_mp_mutex);
> }
>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev