On Fri, Jul 20, 2018 at 10:10 AM, Lam, Tiago <[email protected]> wrote:
> On 19/07/2018 00:02, Darrell Ball wrote: > > > > > > On Tue, Jul 17, 2018 at 12:59 AM, Lam, Tiago <[email protected] > > <mailto:[email protected]>> wrote: > > > > On 16/07/2018 09:37, Lam, Tiago wrote: > > > On 13/07/2018 18:54, Darrell Ball wrote: > > >> Thanks for the patch. > > >> > > >> A few queries inline. > > >> > > > > > > Hi Darrell, > > > > > > Thanks for your inputs. I've replied in-line as well. > > > > > >> On Wed, Jul 11, 2018 at 11:23 AM, Tiago Lam <[email protected] > > <mailto:[email protected]> > > >> <mailto:[email protected] <mailto:[email protected]>>> wrote: > > >> > > >> When enabled with DPDK OvS relies on mbufs allocated by > > mempools to > > >> receive and output data on DPDK ports. Until now, each OvS > > dp_packet has > > >> had only one mbuf associated, which is allocated with the > maximum > > >> possible size, taking the MTU into account. This approach, > > however, > > >> doesn't allow us to increase the allocated size in an mbuf, > > if needed, > > >> since an mbuf is allocated and initialised upon mempool > > creation. Thus, > > >> in the current implementatin this is dealt with by calling > > >> OVS_NOT_REACHED() and terminating OvS. > > >> > > >> To avoid this, and allow the (already) allocated space to be > > better > > >> used, dp_packet_resize__() now tries to use the available > > room, both the > > >> tailroom and the headroom, to make enough space for the new > > data. Since > > >> this happens for packets of source DPBUF_DPDK, the > > single-segment mbuf > > >> case mentioned above is also covered by this new aproach in > > resize__(). > > >> > > >> Signed-off-by: Tiago Lam <[email protected] > > <mailto:[email protected]> > > >> <mailto:[email protected] <mailto:[email protected]>>> > > >> Acked-by: Eelco Chaudron <[email protected] > > <mailto:[email protected]> > > >> <mailto:[email protected] <mailto:[email protected]>>> > > >> --- > > >> lib/dp-packet.c | 48 > > ++++++++++++++++++++++++++++++++++++++++++++++-- > > >> 1 file changed, 46 insertions(+), 2 deletions(-) > > >> > > >> diff --git a/lib/dp-packet.c b/lib/dp-packet.c > > >> index d6e19eb..87af459 100644 > > >> --- a/lib/dp-packet.c > > >> +++ b/lib/dp-packet.c > > >> @@ -237,9 +237,51 @@ dp_packet_resize__(struct dp_packet *b, > > size_t > > >> new_headroom, size_t new_tailroom > > >> new_allocated = new_headroom + dp_packet_size(b) + > > new_tailroom; > > >> > > >> switch (b->source) { > > >> + /* When resizing mbufs, both a single mbuf and > multi-segment > > >> mbufs (where > > >> + * data is not contigously held in memory), both the > > headroom > > >> and the > > >> + * tailroom available will be used to make more space > > for where > > >> data needs > > >> + * to be inserted. I.e if there's not enough headroom, > > data may > > >> be shifted > > >> + * right if there's enough tailroom. > > >> + * However, this is not bulletproof and in some cases > > the space > > >> available > > >> + * won't be enough - in those cases, an error should be > > >> returned and the > > >> + * packet dropped. */ > > >> case DPBUF_DPDK: > > >> - OVS_NOT_REACHED(); > > >> > > >> > > >> Previously, it was a coding error to call this function for a > > DPDK mbuf > > >> case, which is pretty > > >> clear. But with this patch, presumably that is not longer the > > case and > > >> the calling the API is > > >> now ok for DPDK mbufs. > > >> > > > > > > As it stands, it will still be an error to call > > dp_packet_resize__() for > > > any DPDK packet, or by extension any of the other functions that > call > > > it, such as dp_packet_prealloc_tailroom() and > > > dp_packet_prealloc_headroom(). This patch only tries to alleviate > that > > > by accommodating space from the headroom or tailroom, if possible, > and > > > create just enough space for the new data. My preferred approach > would > > > be to return an error if not possible, but since the API doesn't > deal > > > with errors as is, the previous behavior of manually asserting was > > left > > > as is. As reported in [1] (I comment more on that below), the > behavior > > > of manually asserting can lead to undesired behavior in some use > > cases. > > > > > >> > > >> > > >> + { > > >> + size_t miss_len; > > >> + > > >> + if (new_headroom == dp_packet_headroom(b)) { > > >> + /* This is a tailroom adjustment. Since there's > no > > >> tailroom space > > >> + * left, try and shift data towards the head to > > free up > > >> tail space, > > >> + * if there's enough headroom */ > > >> + > > >> + miss_len = new_tailroom - dp_packet_tailroom(b); > > >> + > > >> + if (miss_len <= new_headroom) { > > >> + dp_packet_shift(b, -miss_len); > > >> + } else { > > >> + /* XXX: Handle error case and report error > > to caller */ > > >> + OVS_NOT_REACHED(); > > >> > > >> > > >> > > >> This will not just drop the packet, it will fail the daemon, > > because a > > >> packet cannot be resized. > > >> If the system is completely depleted of memory, that may be ok, > > but in > > >> the case, no such > > >> assumption is implied. > > >> > > >> Also, why is XXX still left in the patch? > > >> > > > > > > Because there's still work to do in that regard. The whole process > > > shouldn't be brought down if there's not enough space to put some > data > > > in one single packet. However, this was intentionally left out of > this > > > series or otherwise it would increase its complexity considerably. > > > > > > As others have pointed out in [1], this is not a simple change, > which > > > would have to be propagated to higher levels in other parts of the > > code > > > base. I've proposed an alternative (vs refactoring the whole > dp_packet > > > API to handle and return errors) in [2], but that seems to have > gone > > > stale. Going forward I see that approach merging with this new > > piece in > > > dp_packet_resize__(), where an error can be returned to the caller > if > > > there's not enough space. > > > > > >> Also, the preexisting API handles two cases: > > >> 1/ Tailroom only adjustment > > >> 2/ headroom and/or tailroom adjustment > > >> > > >> meaning it handles all cases. > > >> > > >> The new DPDK addition (part of the same API) defines 2 cases > > >> > > >> 1/ tailroom only adjustment > > >> 2/ headroom only adjustment > > >> > > >> So, it looks like a different API, that also does not handle all > > cases. > > >> > > >> > > > > > > You have a point there, support for point 2/ "headroom and tailroom > > > adjustment" is missed. It doesn't seem to be used anywhere at the > > > moment, the only callers being dp_packet_prealloc_tailroom() and > > > dp_packet_prealloc_headroom(), but I'll submit an incremental > patch to > > > deal with this. Thanks for pointing it out. > > > > > I've had a look into this and it doesn't seem that case number 2/ > > "headroom and tailroom adjustment" above would make sense for the > > DPBUF_DPDK case. The reason being that if both `new_tailroom` and > > `new_headroom` are being increased (in respect to > dp_packet_tailroom() > > and dp_packet_headroom()), there won't be enough space as the tail > and > > head would be competing for the same available space. So it makes > sense > > to make it an exclusive operation for the DPBUF_DPDK case. > > > > > > Maybe you can explain why that would be the case, in general? > > > > DPDK mbufs have a fixed layout. That means we can't really resize the > mbufs per say, as `buf_len` are fixed. For example, while in a normal > DPBUF_MALLOC packet we can reallocate data and copy everything to a > smaller buffer, decreasing the tailroom that way, in a DPDK mbuf we are > stuck with the same `buf_len` (meaning we can't change the tailroom > without affecting the `data_len`, thus modifying data). What we can do > is shift data to try and create more or less tailroom, taking or adding > space from the headroom. > The above is fine, albeit obvious; but my question was something different. I was asking why you thought a packet could not adjust both headroom and tailroom for dpdk mbuf packets. data_len may change will headroom and/or tailroom changes. > > I've gave it some thought and I'm now convinced that reusing resize__() > here is probably not the best approach. Given its implementation and > description ("Reallocates 'b' so that it has exactly 'new_headroom' and > 'new_tailroom'"), that doesn't sound like it would be a good fit for a > DPDK packet. Maybe a completely separate function would be the best > approach here, setting up different expectations. > That is part of the issue. > > However, as I said before, this was implemented to try to alleviate the > call to OVS_NOT_REACHED() in dp_packet_resize__(). But since it's adding > more complications than what it's actually solving (at least at this > point in time), I'm going to drop this patch (I'll send a revert patch > with the needed changes) in favor of dealing properly with > dp_packet_resize__() for DPDK packets, and stop calling it for DPDK > packets. > I think it would be better to modify the original patchset, rather than try to add code with the multisegment mbuf patchset and then immediately revert it. Also, I don't see the api, dp_packet_mbuf_shift() in patch 7 being used without the resize__() in this patch 8, I think it would be best to remove the 'shift' api from patch 7. > > > However, if that is really the case, then that is fine; an "else if" > > case would > > need to be added and a modified 'return error else case'. > > > > > > > > > > The work I mentioned before in [1] should help here as we would then > be > > able to return an error (suach as `EINVAL`) if both "new_tailroom" > and > > "new_headroom" are incremented for the DPBUF_DPDK case. > > > > What do you think? > > > > [1] > > https://mail.openvswitch.org/pipermail/ovs-dev/2018-July/348908.html > > <https://mail.openvswitch.org/pipermail/ovs-dev/2018-July/ > 348908.html> > > > > > _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
