Hi Kevin,
On 25/05/2018 18:22, Kevin Traynor wrote:
> Hi Tiago,
>
[snip]
>>> +
>>> +static struct dpdk_mp *
>>> +dpdk_mp_get(struct netdev_dpdk *dev, int mtu, bool per_port_mp)
>>> +{
>>> + struct dpdk_mp *dmp;
>>> + bool reuse = false;
>>>
>>> ovs_mutex_lock(&dpdk_mp_mutex);
>>> - if (dpdk_mp_full(mp)) {
>>> - VLOG_DBG("Freeing mempool \"%s\"", mp->name);
>>> - rte_mempool_free(mp);
>>> - } else {
>>> - struct dpdk_mp *dmp;
>>> + /* Check if shared mempools are being used, if so check existing
>>> mempools
>>> + * to see if reuse is possible. */
>>> + if (!per_port_mp) {
>>> + LIST_FOR_EACH (dmp, list_node, &dpdk_mp_list) {
>>> + if (dmp->socket_id == dev->requested_socket_id
>>> + && dmp->mtu == mtu) {
>>> + VLOG_DBG("Reusing mempool \"%s\"", dmp->mp->name);
>>> + dmp->refcount++;
>>> + reuse = true;
>>> + break;
>>> + }
>>> + }
>>> + }
>>> + /* Sweep mempools after reuse or before create. */
>>> + dpdk_mp_sweep();
>>
>> *Should dpdk_mp_sweep() be called when destroying an interface as well,
>> in common_destruct()? While testing, I've noticed that if I add one port
>> and delete the same port the mempool will still be allocated until you
>> add another port, since dpdk_mp_sweep() will only be called on the next
>> call to dpdp_mp_get(). This could potentially create problems in the
>> per-port model where one deletes a certain number of ports and can't add
>> any other ports because there's (hanging) mempools holding memory.
>>
>
> With the shared model, it was deliberate to leave the dpdk_mp_sweep() as
> late as possible to give more time for buffers to be returned to it and
> to allow for the possibility that it might be reused. In order to not
> require any additional memory, it is done just before a new mempool is
> created.
>
Thanks for giving some context, I also found that to be the case while
testing with both models.
> It's not a problem to call dpdk_mp_sweep() anytime but not sure I follow
> the issue you've raised - sweep would still be called before mempools
> for any new port are created. Is there a case missed?
>
Maybe I haven't explained myself as clearly as I wanted to. I wouldn't
say it is a missed case as much as a consequence of what you just
mentioned - "leave the dpdk_mp_sweep() as late as possible", but I
thought it would be best to flag it just so we are aware.
Let's say you are using the per-port model and you add port X, which
gets mempool A allocated to it. And let's say you immediately delete
that port X. What I was seeing while testing is that mempool A is not
freed right away, only when a new port, let's say port Y, is added. This
also goes in-line with what you just mentioned.
Now, my point is that this could potentially become an issue in the
per-port model, since it doesn't reuse the mempools. If you add a bunch
of ports and then delete them in a batch (or maybe this is not a
use-case?), if you try to add more ports you may find yourself out of
memory, because the mempools haven't actually been freed when deleting
the ports.
>>> +
>>> +/* Depending on the memory model being used this function tries
>>> + * identify and reuse an existing mempool or tries to allocate new
>>> + * mempool on requested_socket_id with mbuf size corresponding to
>>> + * requested_mtu. On success new configuration will be applied.
>>> * On error, device will be left unchanged. */
>>> static int
>>> netdev_dpdk_mempool_configure(struct netdev_dpdk *dev)
>>> OVS_REQUIRES(dev->mutex)
>>> {
>>> uint32_t buf_size = dpdk_buf_size(dev->requested_mtu);
>>> - struct rte_mempool *mp;
>>> + struct dpdk_mp *mp;
>>> int ret = 0;
>>> + bool per_port_mp = dpdk_per_port_mempool_enabled();
>>>
>>> - dpdk_mp_sweep();
>>> + /* With shared mempools we do not need to configure a mempool if the
>>> MTU
>>> + * and socket ID have not changed, the previous configuration is still
>>> + * valid so return 0 */
>>> + if (dev->mtu == dev->requested_mtu
>>> + && dev->socket_id == dev->requested_socket_id
>>> + && (!per_port_mp)) {
>>
>> I also find that moving the `(!per_port_mp)` condition to the beginning
>> improves readability here. It even goes in hand with your comment just
>> above - "With shared mempools we do not ...".
>>
>>> + return ret;
>>> + }
>>>
>>> - mp = dpdk_mp_create(dev, FRAME_LEN_TO_MTU(buf_size));
>>> + mp = dpdk_mp_get(dev, FRAME_LEN_TO_MTU(buf_size), per_port_mp);
>>> if (!mp) {
>>> VLOG_ERR("Failed to create memory pool for netdev "
>>> "%s, with MTU %d on socket %d: %s\n",
>>> dev->up.name, dev->requested_mtu,
>>> dev->requested_socket_id,
>>> - rte_strerror(rte_errno));
>>> - ret = rte_errno;
>>> + rte_strerror(rte_errno));
>>
>> Missing indentation here.
>>
>>> + return rte_errno;
>>> } else {
>>> - /* If a new MTU was requested and its rounded value equals the one
>>> - * that is currently used, then the existing mempool is returned.
>>> */
>>> - if (dev->mp != mp) {
>>> - /* A new mempool was created, release the previous one. */
>>> - dpdk_mp_release(dev->mp);
>>> - } else {
>>> - ret = EEXIST;
>>> + /* Check for any pre-existing dpdk_mp for the device */
>>> + if (dev->dpdk_mp != NULL) {
>>> + /* If a new MTU was requested and its rounded value equals the
>>> + * one that is currently used, then the existing mempool is
>>> + * returned. */
>>> + if (dev->dpdk_mp->mp != mp->mp) {
>>> + /* A new mempool was created, release the previous one. */
>>> + dpdk_mp_put(dev->dpdk_mp);
>>> + } else {
>>> + /* If the mempool already exists in the current dpdk_mp
>>> then
>>> + * we need to ensure dpdk_mp that was just created is
>>> freed in
>>> + * the next sweep as it will not be used. */
>> The code path around the `else` block here will only be called when
>> `!per_port_mp`. This is because `dpdk_mp_get()` will only return an
>> already existing mempool when using the shared model. Otherwise a new
>> one is always returned, and thus the `if (dev->dpdk_mp->mp != mp->mp)`
>> will be true. Am I reading this right? If so then refactoring this a bit
>> to differentiate on `per_port_mp` might help on readability - this goes
>> in-line with Kevin's comment about making this piece a bit more readable.
>>
>> On the same note, the comment above mislead me to think that the
>> allocated `mp` is being freed, which would result in error since the
>> same `mp` is then assigned below. Instead, what it is doing is
>> decrementing the refcount in struct dpdk_mp, which might end up being
>> freed on the next dpdk_mp_sweep() if `refcount=0`. But that won't happen
>> on the shared model unless no port is using the mempool.
>>
>
> but this port is using it :-)
>
> + if (dev->dpdk_mp->mp != mp->mp) {
>
> is false, which means they both have the same mempool. The refcount in
> dpdk_mp only counts user of the dpdk_mp, not the actual mp, so it could
> then be freed while still needed.
>
I'm not sure I understand your point entirely. It is indeed true that
both `mp`s will be equal and thus `dpdk_mp_put()` will be called,
decrementing the `refcount`. But `refcount` shouldn't get to 0 (and thus
the mempool shouldn't be freed during sweep) as the previous call to
`dpdk_mp_get()` is also incrementing `refcount` by 1.
(In fact, the `else` block seems to be here only to decrement the
`refcount` when a port has been reconfigured and ends up reusing the
same mempool it was using before. In that case, the previous call to
`dpdk_mp_get()` is "wrongly" incrementing `refcount` by 1.)
I also tried to crash the patch by exercising this part of the code with
the following:
1. Add one port - `refcount` here was 1;
2. Reconfigure that port (but keep the MTU, so the same mempool gets
reused) - `refcount` here was 1;
3. Add a second port (this should trigger a call to `dpdk_mp_sweep()`) -
`refcount` here was 2;
4. Send some traffic.
And wasn't able to do so. I do agree, though, that this part can be
improved.
Regards,
Tiago.
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev