On 04/09/2018 03:36 PM, Kevin Traynor wrote:
> On 04/06/2018 04:51 PM, Ilya Maximets wrote:
>>>> DPDK mempools are freed when they are no longer needed.
>>>> This can happen when a port is removed or a port's mtu is reconfigured so
>>>> that a new mempool is used.
>>>>
>>>> It is possible that an mbuf is attempted to be returned to a freed mempool
>>>> from NIC Tx queues and this can lead to a segfault.
>>>>
>>>> In order to prevent this, only free mempools when they are not needed and
>>>> have no in-use mbufs. As this might not be possible immediately, sweep the
>>>> mempools anytime a port tries to get a mempool.
>>>>
>>>
>>> Thanks for working on this Kevin. Just a query below. From a testing POV I
>>> didn't come across any issues.
>>>
>>>> Fixes: 8d38823bdf8b ("netdev-dpdk: fix memory leak")
>>>> Cc: mark.b.kavanagh81 at gmail.com
>>>> Signed-off-by: Kevin Traynor <ktraynor at redhat.com>
>>>> ---
>>>>
>>>> Found on OVS 2.6, but the code is operationally similar on the recent
>>>> branch-2.* branches. If the patch is acceptable I can backport to older
>>>> branches.
>>
>> This issue was actually raised back in January while discussing the
>> mempool issue.
>>
>
> Hmmm, thanks for pointing it out. Seems the last mails in the thread
> coincided with when I was traveling and I didn't read them back then.
>
>>>
>>> Sounds good to me.
>>>>
>>>> lib/netdev-dpdk.c | 40 +++++++++++++++++++++++++++-------------
>>>> 1 file changed, 27 insertions(+), 13 deletions(-)
>>>>
>>>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>>>> index c19cedc..9b8e732 100644
>>>> --- a/lib/netdev-dpdk.c
>>>> +++ b/lib/netdev-dpdk.c
>>>> @@ -590,8 +590,24 @@ dpdk_mp_create(int socket_id, int mtu)
>>>> }
>>>>
>>>> +/* Free unused mempools. */
>>>> +static void
>>>> +dpdk_mp_sweep(void) OVS_REQUIRES(dpdk_mp_mutex)
>>>> +{
>>>> + struct dpdk_mp *dmp, *next;
>>>> +
>>>> + LIST_FOR_EACH_SAFE (dmp, next, list_node, &dpdk_mp_list) {
>>>> + if (!dmp->refcount && rte_mempool_full(dmp->mp)) {
>>>
>>> I hadn't looked at rte_mempool_full() before. I noticed the documentation
>>> warns not to use it as part of the datapath but for debug purposes only.
>>
>> Yes, rte_mempool_full() is racy and could return false positives, but it's
>> the best and only useful function in the DPDK API for this. In practice we
>> can never be sure whether someone still uses the memory pool. I also used
>> this function in my solution for branch 2.8 posted here:
>>
>> https://mail.openvswitch.org/pipermail/ovs-discuss/2018-January/046100.html
>>
>
> The similarities and differences are interesting!
>
>> Memory handling in DPDK is awful. In fact, no one is able to use
>> rte_mempool_free(), because there is no correct way to check whether a
>> memory pool is still in use.
>>
>> There is one dirty hack to check if a mempool is really full. It is based
>> on the internal mempool implementation and the fact that we're not
>> allocating any mbufs from the pool:
>> 1. Check rte_mempool_ops_get_count(). <-- this is exposed externally for
>>    some reason
>> 2. Check rte_mempool_full().
>> 3. Check rte_mempool_ops_get_count() again.
>> 4. If rte_mempool_full() was 'true' and both calls to
>>    rte_mempool_ops_get_count() returned the same value ----> the mempool
>>    is really full.
>> 5. Here we could safely call rte_mempool_free().
>>
>
> Yes, that would work.
>
> The ring count is taken before the cache count in rte_mempool_full(), and
> as we are not getting mbufs I think the only possibility is mbufs moving
> from a cache to the ring. In that case we may get a false negative, but we
> wouldn't get a false positive from rte_mempool_full(). Obviously the
> downside is that it is implementation specific - but maybe it could be
> argued that the implementation will not change on already released DPDK
> versions. What do you think?
>
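FWIW, as an untested sketch of that check (dpdk_mp_really_full() is just an
illustrative name, not something in the patch; includes shown for
completeness):

#include <stdbool.h>
#include <rte_mempool.h>

/* Returns true only if the pool reports full and the backing ring count did
 * not change around the rte_mempool_full() call, i.e. no mbufs moved from
 * lcore caches to the ring while it was counting. Relies on mempool
 * internals, as discussed above. */
static bool
dpdk_mp_really_full(struct rte_mempool *mp)
{
    unsigned int count_before = rte_mempool_ops_get_count(mp);

    if (!rte_mempool_full(mp)) {
        return false;
    }

    return rte_mempool_ops_get_count(mp) == count_before;
}

dpdk_mp_sweep() below could then call this instead of rte_mempool_full()
before rte_mempool_free().
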
>>>
>>> Does its use add to the downtime in a noticeable way while we reconfigure?
>>> I'm thinking of a deployment where the number of lcores is high and we
>>> reconfigure often. When the cache is enabled, it has to walk the caches of
>>> all lcores. There may not be an issue here but I was curious.
>>
>
> I think it should be insignificant compared to actually
> allocating/freeing a mempool. I will try to check it.
>
I measured 4K-5K cycles, so the extra time for rte_mempool_full() is not an
issue in the control path.
>> That is the interesting question. In my version (referenced above) I used
>> a watchdog thread to periodically free mempools with a zero refcount.
>> I'm not sure which solution is better.
>>
>
> The advantage I see of being on the watchdog timer thread is that
> probably no one cares how long it takes :-) The advantage of being in
> the mempool config is that it runs (and is guaranteed to run) just before
> a new mempool is going to be requested, so there is a smaller risk of
> additional memory requirements.
>
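For reference, the watchdog variant is essentially just a thread in
netdev-dpdk.c doing something like the below (rough sketch only; it reuses
dpdk_mp_sweep() and dpdk_mp_mutex from this patch, and the 10 second period
is arbitrary):

/* Periodically retire mempools that have a zero refcount and no
 * outstanding mbufs. */
static void *
dpdk_mp_watchdog(void *arg OVS_UNUSED)
{
    for (;;) {
        ovs_mutex_lock(&dpdk_mp_mutex);
        dpdk_mp_sweep();
        ovs_mutex_unlock(&dpdk_mp_mutex);
        xsleep(10);
    }
    return NULL;
}

started once at init with something like
ovs_thread_create("dpdk_mp_watchdog", dpdk_mp_watchdog, NULL).
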
> In the short term I think we should use a fix like the patches that
> don't cover every possible corner case, or the get_count() approach that
> isn't the most elegant. Longer term we can think about what we would want
> to add to DPDK to make this easier.
>
> thanks,
> Kevin.
>
>>>
>>> Did you do any testing around this?
>>>
>>> Thanks
>>> Ian
>>>
>>>> + ovs_list_remove(&dmp->list_node);
>>>> + rte_mempool_free(dmp->mp);
>>>> + rte_free(dmp);
>>>> + }
>>>> + }
>>>> +}
>>>> +
>>>> static struct dpdk_mp *
>>>> dpdk_mp_get(int socket_id, int mtu)
>>>> {
>>>> struct dpdk_mp *dmp;
>>>> + bool reuse = false;
>>>>
>>>> ovs_mutex_lock(&dpdk_mp_mutex);
>>>> @@ -599,15 +615,18 @@ dpdk_mp_get(int socket_id, int mtu)
>>>> if (dmp->socket_id == socket_id && dmp->mtu == mtu) {
>>>> dmp->refcount++;
>>>> - goto out;
>>>> + reuse = true;
>>>> + break;
>>>> }
>>>> }
>>>> + /* Sweep mempools after reuse or before create. */
>>>> + dpdk_mp_sweep();
>>>>
>>>> - dmp = dpdk_mp_create(socket_id, mtu);
>>>> - if (dmp) {
>>>> - ovs_list_push_back(&dpdk_mp_list, &dmp->list_node);
>>>> + if (!reuse) {
>>>> + dmp = dpdk_mp_create(socket_id, mtu);
>>>> + if (dmp) {
>>>> + ovs_list_push_back(&dpdk_mp_list, &dmp->list_node);
>>>> + }
>>>> }
>>>>
>>>> -out:
>>>> -
>>>> ovs_mutex_unlock(&dpdk_mp_mutex);
>>>>
>>>> @@ -615,5 +634,5 @@ out:
>>>> }
>>>>
>>>> -/* Release an existing mempool. */
>>>> +/* Decrement reference to a mempool. */
>>>> static void
>>>> dpdk_mp_put(struct dpdk_mp *dmp)
>>>> @@ -625,10 +644,5 @@ dpdk_mp_put(struct dpdk_mp *dmp)
>>>> ovs_mutex_lock(&dpdk_mp_mutex);
>>>> ovs_assert(dmp->refcount);
>>>> -
>>>> - if (!--dmp->refcount) {
>>>> - ovs_list_remove(&dmp->list_node);
>>>> - rte_mempool_free(dmp->mp);
>>>> - rte_free(dmp);
>>>> - }
>>>> + dmp->refcount--;
>>>> ovs_mutex_unlock(&dpdk_mp_mutex);
>>>> }
>>>> --
>>>> 1.8.3.1
>>>>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev