On Tue, Jul 24, 2018 at 12:36 AM, Lam, Tiago <[email protected]> wrote:
> On 23/07/2018 23:48, Darrell Ball wrote: > > > > > > On Fri, Jul 20, 2018 at 10:10 AM, Lam, Tiago <[email protected] > > <mailto:[email protected]>> wrote: > > > > On 19/07/2018 00:02, Darrell Ball wrote: > > > > > > > > > On Tue, Jul 17, 2018 at 12:59 AM, Lam, Tiago <[email protected] > <mailto:[email protected]> > > > <mailto:[email protected] <mailto:[email protected]>>> wrote: > > > > > > On 16/07/2018 09:37, Lam, Tiago wrote: > > > > On 13/07/2018 18:54, Darrell Ball wrote: > > > >> Thanks for the patch. > > > >> > > > >> A few queries inline. > > > >> > > > > > > > > Hi Darrell, > > > > > > > > Thanks for your inputs. I've replied in-line as well. > > > > > > > >> On Wed, Jul 11, 2018 at 11:23 AM, Tiago Lam < > [email protected] <mailto:[email protected]> > > > <mailto:[email protected] <mailto:[email protected]>> > > > >> <mailto:[email protected] <mailto:[email protected]> > > <mailto:[email protected] <mailto:[email protected]>>>> wrote: > > > >> > > > >> When enabled with DPDK OvS relies on mbufs allocated by > > > mempools to > > > >> receive and output data on DPDK ports. Until now, each > OvS > > > dp_packet has > > > >> had only one mbuf associated, which is allocated with > > the maximum > > > >> possible size, taking the MTU into account. This > approach, > > > however, > > > >> doesn't allow us to increase the allocated size in an > mbuf, > > > if needed, > > > >> since an mbuf is allocated and initialised upon mempool > > > creation. Thus, > > > >> in the current implementatin this is dealt with by > calling > > > >> OVS_NOT_REACHED() and terminating OvS. > > > >> > > > >> To avoid this, and allow the (already) allocated space > > to be > > > better > > > >> used, dp_packet_resize__() now tries to use the > available > > > room, both the > > > >> tailroom and the headroom, to make enough space for the > new > > > data. Since > > > >> this happens for packets of source DPBUF_DPDK, the > > > single-segment mbuf > > > >> case mentioned above is also covered by this new > aproach in > > > resize__(). > > > >> > > > >> Signed-off-by: Tiago Lam <[email protected] > > <mailto:[email protected]> > > > <mailto:[email protected] <mailto:[email protected]>> > > > >> <mailto:[email protected] > > <mailto:[email protected]> <mailto:[email protected] > > <mailto:[email protected]>>>> > > > >> Acked-by: Eelco Chaudron <[email protected] <mailto: > [email protected]> > > > <mailto:[email protected] <mailto:[email protected]>> > > > >> <mailto:[email protected] > > <mailto:[email protected]> <mailto:[email protected] > > <mailto:[email protected]>>>> > > > >> --- > > > >> lib/dp-packet.c | 48 > > > ++++++++++++++++++++++++++++++++++++++++++++++-- > > > >> 1 file changed, 46 insertions(+), 2 deletions(-) > > > >> > > > >> diff --git a/lib/dp-packet.c b/lib/dp-packet.c > > > >> index d6e19eb..87af459 100644 > > > >> --- a/lib/dp-packet.c > > > >> +++ b/lib/dp-packet.c > > > >> @@ -237,9 +237,51 @@ dp_packet_resize__(struct > > dp_packet *b, > > > size_t > > > >> new_headroom, size_t new_tailroom > > > >> new_allocated = new_headroom + dp_packet_size(b) + > > > new_tailroom; > > > >> > > > >> switch (b->source) { > > > >> + /* When resizing mbufs, both a single mbuf and > > multi-segment > > > >> mbufs (where > > > >> + * data is not contigously held in memory), both > the > > > headroom > > > >> and the > > > >> + * tailroom available will be used to make more > space > > > for where > > > >> data needs > > > >> + * to be inserted. I.e if there's not enough > headroom, > > > data may > > > >> be shifted > > > >> + * right if there's enough tailroom. > > > >> + * However, this is not bulletproof and in some > cases > > > the space > > > >> available > > > >> + * won't be enough - in those cases, an error > > should be > > > >> returned and the > > > >> + * packet dropped. */ > > > >> case DPBUF_DPDK: > > > >> - OVS_NOT_REACHED(); > > > >> > > > >> > > > >> Previously, it was a coding error to call this function for > a > > > DPDK mbuf > > > >> case, which is pretty > > > >> clear. But with this patch, presumably that is not longer > the > > > case and > > > >> the calling the API is > > > >> now ok for DPDK mbufs. > > > >> > > > > > > > > As it stands, it will still be an error to call > > > dp_packet_resize__() for > > > > any DPDK packet, or by extension any of the other functions > > that call > > > > it, such as dp_packet_prealloc_tailroom() and > > > > dp_packet_prealloc_headroom(). This patch only tries to > > alleviate that > > > > by accommodating space from the headroom or tailroom, if > > possible, and > > > > create just enough space for the new data. My preferred > > approach would > > > > be to return an error if not possible, but since the API > > doesn't deal > > > > with errors as is, the previous behavior of manually > > asserting was > > > left > > > > as is. As reported in [1] (I comment more on that below), > > the behavior > > > > of manually asserting can lead to undesired behavior in some > use > > > cases. > > > > > > > >> > > > >> > > > >> + { > > > >> + size_t miss_len; > > > >> + > > > >> + if (new_headroom == dp_packet_headroom(b)) { > > > >> + /* This is a tailroom adjustment. Since > > there's no > > > >> tailroom space > > > >> + * left, try and shift data towards the > > head to > > > free up > > > >> tail space, > > > >> + * if there's enough headroom */ > > > >> + > > > >> + miss_len = new_tailroom - > > dp_packet_tailroom(b); > > > >> + > > > >> + if (miss_len <= new_headroom) { > > > >> + dp_packet_shift(b, -miss_len); > > > >> + } else { > > > >> + /* XXX: Handle error case and report > error > > > to caller */ > > > >> + OVS_NOT_REACHED(); > > > >> > > > >> > > > >> > > > >> This will not just drop the packet, it will fail the daemon, > > > because a > > > >> packet cannot be resized. > > > >> If the system is completely depleted of memory, that may be > ok, > > > but in > > > >> the case, no such > > > >> assumption is implied. > > > >> > > > >> Also, why is XXX still left in the patch? > > > >> > > > > > > > > Because there's still work to do in that regard. The whole > > process > > > > shouldn't be brought down if there's not enough space to put > > some data > > > > in one single packet. However, this was intentionally left > > out of this > > > > series or otherwise it would increase its complexity > > considerably. > > > > > > > > As others have pointed out in [1], this is not a simple > > change, which > > > > would have to be propagated to higher levels in other parts > > of the > > > code > > > > base. I've proposed an alternative (vs refactoring the whole > > dp_packet > > > > API to handle and return errors) in [2], but that seems to > > have gone > > > > stale. Going forward I see that approach merging with this > new > > > piece in > > > > dp_packet_resize__(), where an error can be returned to the > > caller if > > > > there's not enough space. > > > > > > > >> Also, the preexisting API handles two cases: > > > >> 1/ Tailroom only adjustment > > > >> 2/ headroom and/or tailroom adjustment > > > >> > > > >> meaning it handles all cases. > > > >> > > > >> The new DPDK addition (part of the same API) defines 2 cases > > > >> > > > >> 1/ tailroom only adjustment > > > >> 2/ headroom only adjustment > > > >> > > > >> So, it looks like a different API, that also does not > > handle all > > > cases. > > > >> > > > >> > > > > > > > > You have a point there, support for point 2/ "headroom and > > tailroom > > > > adjustment" is missed. It doesn't seem to be used anywhere > > at the > > > > moment, the only callers being dp_packet_prealloc_tailroom() > and > > > > dp_packet_prealloc_headroom(), but I'll submit an > > incremental patch to > > > > deal with this. Thanks for pointing it out. > > > > > > > I've had a look into this and it doesn't seem that case number > 2/ > > > "headroom and tailroom adjustment" above would make sense for > the > > > DPBUF_DPDK case. The reason being that if both `new_tailroom` > and > > > `new_headroom` are being increased (in respect to > > dp_packet_tailroom() > > > and dp_packet_headroom()), there won't be enough space as the > > tail and > > > head would be competing for the same available space. So it > > makes sense > > > to make it an exclusive operation for the DPBUF_DPDK case. > > > > > > > > > Maybe you can explain why that would be the case, in general? > > > > > > > DPDK mbufs have a fixed layout. That means we can't really resize the > > mbufs per say, as `buf_len` are fixed. For example, while in a normal > > DPBUF_MALLOC packet we can reallocate data and copy everything to a > > smaller buffer, decreasing the tailroom that way, in a DPDK mbuf we > are > > stuck with the same `buf_len` (meaning we can't change the tailroom > > without affecting the `data_len`, thus modifying data). What we can > do > > is shift data to try and create more or less tailroom, taking or > adding > > space from the headroom. > > > > > > The above is fine, albeit obvious; but my question was something > different. > > I was asking why you thought a packet could not adjust both headroom and > > tailroom > > for dpdk mbuf packets. data_len may change will headroom and/or tailroom > > changes. > > > > > > My understanding is that `data_len` needs to remain fixed (as `size_` in > non-DPDK packets remains the same in DPBUF_MALLOC). If you change > `data_len` then you're effectively affecting the data. Either adding > random data to the packet, by increasing `data_len`, or trunking the > packet by decreasing `data_len`. > > Given the above, if you want to increase the tailroom then you can only > shift the data left (if there's enough headroom to do so), and that will > affect the headroom available space. > I think it is possible to change both headroom and tailroom in a packet in general. Also from your own code /* Update mbufs' properties, and if using multi-segment mbufs, first and * last mbuf's data_len also needs to be adjusted */ mbuf->data_off = mbuf->data_off + dst_ofs; > > > > > > > I've gave it some thought and I'm now convinced that reusing > resize__() > > here is probably not the best approach. Given its implementation and > > description ("Reallocates 'b' so that it has exactly 'new_headroom' > and > > 'new_tailroom'"), that doesn't sound like it would be a good fit for > a > > DPDK packet. Maybe a completely separate function would be the best > > approach here, setting up different expectations. > > > > > > That is part of the issue. > > > > > > > > > > > > However, as I said before, this was implemented to try to alleviate > the > > call to OVS_NOT_REACHED() in dp_packet_resize__(). But since it's > adding > > more complications than what it's actually solving (at least at this > > point in time), I'm going to drop this patch (I'll send a revert > patch > > with the needed changes) in favor of dealing properly with > > dp_packet_resize__() for DPDK packets, and stop calling it for DPDK > > packets. > > > > > > > > I think it would be better to modify the original patchset, rather than > > try to add code with > > the multisegment mbuf patchset and then immediately revert it. > > Also, I don't see the api, dp_packet_mbuf_shift() in patch 7 being used > > without the resize__() in > > this patch 8, I think it would be best to remove the 'shift' api from > > patch 7. > > > > I would be fine with modifying the original patchset, except your > comments came after the merge to the `dpdk_merge` branch. And it becomes > somewhat even more delicate because it is very close to the deadline for > accepting new features. > > Now, we could modify the `dpdk_merge` branch, since it hasn't been > merged to master yet, but at expenses of Ian's time... I'm not sure what > the policy is here, so I'm CC'ing Ian to clarify. Should I submit v6 > here, or leave it as is (with a revert patch queued to be applied)? > _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
