sorry, several distractions delayed response. On Mon, Jul 16, 2018 at 1:37 AM, Lam, Tiago <[email protected]> wrote:
> On 13/07/2018 18:54, Darrell Ball wrote: > > Thanks for the patch. > > > > A few queries inline. > > > > Hi Darrell, > > Thanks for your inputs. I've replied in-line as well. > > > On Wed, Jul 11, 2018 at 11:23 AM, Tiago Lam <[email protected] > > <mailto:[email protected]>> wrote: > > > > When enabled with DPDK OvS relies on mbufs allocated by mempools to > > receive and output data on DPDK ports. Until now, each OvS dp_packet > has > > had only one mbuf associated, which is allocated with the maximum > > possible size, taking the MTU into account. This approach, however, > > doesn't allow us to increase the allocated size in an mbuf, if > needed, > > since an mbuf is allocated and initialised upon mempool creation. > Thus, > > in the current implementatin this is dealt with by calling > > OVS_NOT_REACHED() and terminating OvS. > > > > To avoid this, and allow the (already) allocated space to be better > > used, dp_packet_resize__() now tries to use the available room, both > the > > tailroom and the headroom, to make enough space for the new data. > Since > > this happens for packets of source DPBUF_DPDK, the single-segment > mbuf > > case mentioned above is also covered by this new aproach in > resize__(). > > > > Signed-off-by: Tiago Lam <[email protected] > > <mailto:[email protected]>> > > Acked-by: Eelco Chaudron <[email protected] > > <mailto:[email protected]>> > > --- > > lib/dp-packet.c | 48 ++++++++++++++++++++++++++++++ > ++++++++++++++++-- > > 1 file changed, 46 insertions(+), 2 deletions(-) > > > > diff --git a/lib/dp-packet.c b/lib/dp-packet.c > > index d6e19eb..87af459 100644 > > --- a/lib/dp-packet.c > > +++ b/lib/dp-packet.c > > @@ -237,9 +237,51 @@ dp_packet_resize__(struct dp_packet *b, size_t > > new_headroom, size_t new_tailroom > > new_allocated = new_headroom + dp_packet_size(b) + new_tailroom; > > > > switch (b->source) { > > + /* When resizing mbufs, both a single mbuf and multi-segment > > mbufs (where > > + * data is not contigously held in memory), both the headroom > > and the > > + * tailroom available will be used to make more space for where > > data needs > > + * to be inserted. I.e if there's not enough headroom, data may > > be shifted > > + * right if there's enough tailroom. > > + * However, this is not bulletproof and in some cases the space > > available > > + * won't be enough - in those cases, an error should be > > returned and the > > + * packet dropped. */ > > case DPBUF_DPDK: > > - OVS_NOT_REACHED(); > > > > > > Previously, it was a coding error to call this function for a DPDK mbuf > > case, which is pretty > > clear. But with this patch, presumably that is not longer the case and > > the calling the API is > > now ok for DPDK mbufs. > > > > As it stands, it will still be an error to call dp_packet_resize__() for > any DPDK packet, or by extension any of the other functions that call > it, such as dp_packet_prealloc_tailroom() and > dp_packet_prealloc_headroom(). yep, the existing code fails in a very clear way; i.e. whenever it is called for a dpdk packet. So the user would need to handle in some other way, which is not being done today, I know. > This patch only tries to alleviate that > by accommodating space from the headroom or tailroom, if possible, and > create just enough space for the new data. The new code will fail is some yet undefined way, occasionally working and failing in the "other" cases. > My preferred approach would > be to return an error if not possible, but since the API doesn't deal > with errors as is, the previous behavior of manually asserting was left > as is. As reported in [1] (I comment more on that below), the behavior of manually asserting can lead to undesired behavior in some use cases. > I am familiar with the issue. As part of the change, dp_packet_put_uninit() and dp_packet_push_uninit() could be modified to return NULL and that could be percolated and checked for. Those APIs could simply check (by calling a helper API) if they would fail a priori to trigger returning NULL for dpdk buf cases. > > > > > > > + { > > + size_t miss_len; > > + > > + if (new_headroom == dp_packet_headroom(b)) { > > + /* This is a tailroom adjustment. Since there's no > > tailroom space > > + * left, try and shift data towards the head to free up > > tail space, > > + * if there's enough headroom */ > > + > > + miss_len = new_tailroom - dp_packet_tailroom(b); > > + > > + if (miss_len <= new_headroom) { > > + dp_packet_shift(b, -miss_len); > > + } else { > > + /* XXX: Handle error case and report error to > caller */ > > + OVS_NOT_REACHED(); > > > > > > > > This will not just drop the packet, it will fail the daemon, because a > > packet cannot be resized. > > If the system is completely depleted of memory, that may be ok, but in > > the case, no such > > assumption is implied. > > > > Also, why is XXX still left in the patch? > > > > Because there's still work to do in that regard. The whole process > shouldn't be brought down if there's not enough space to put some data > in one single packet. However, this was intentionally left out of this > series or otherwise it would increase its complexity considerably. It seems unnecessary to add a bunch of code to a series that tries to handle 'resize', but handles it partially in practical cases. It also seems undefined when it works and when it does not from a API caller POV. I think patch 7 is also there to only support this patch 8. Ideally, It would seem like a modified patch 7 and patch 8 would belong with the rest of the fix for the dpdk packet memory preallocation constraint issue. Also, ideally, 'XXX' is removed from patches. > > As others have pointed out in [1], this is not a simple change, which > would have to be propagated to higher levels in other parts of the code > base. I've proposed an alternative (vs refactoring the whole dp_packet > API to handle and return errors) in [2], but that seems to have gone > stale. Going forward I see that approach merging with this new piece in > dp_packet_resize__(), where an error can be returned to the caller if > there's not enough space. > The full change is outside the scope of this series. > > > Also, the preexisting API handles two cases: > > 1/ Tailroom only adjustment > > 2/ headroom and/or tailroom adjustment > > > > meaning it handles all cases. > > > > The new DPDK addition (part of the same API) defines 2 cases > > > > 1/ tailroom only adjustment > > 2/ headroom only adjustment > > > > So, it looks like a different API, that also does not handle all cases. > > > > > > You have a point there, support for point 2/ "headroom and tailroom > adjustment" is missed. It doesn't seem to be used anywhere at the > moment, the only callers being dp_packet_prealloc_tailroom() and > dp_packet_prealloc_headroom(), but I'll submit an incremental patch to > deal with this. Thanks for pointing it out. > > Tiago. > > [1] https://mail.openvswitch.org/pipermail/ovs-dev/2018-May/346649.html > [2] https://mail.openvswitch.org/pipermail/ovs-dev/2018-July/348908.html > > > > > + } > > + } else { > > + /* Otherwise, this is a headroom adjustment. Try to > > shift data > > + * towards the tail to free up head space, if there's > > enough > > + * tailroom */ > > + > > + miss_len = new_headroom - dp_packet_headroom(b); > > > > + > > + if (miss_len <= new_tailroom) { > > + dp_packet_shift(b, miss_len); > > + } else { > > + /* XXX: Handle error case and report error to > caller */ > > + OVS_NOT_REACHED(); > > > > > > > > same comments as above. > > > > > > > > + } > > + } > > + > > + new_base = dp_packet_base(b); > > + > > + break; > > + } > > case DPBUF_MALLOC: > > if (new_headroom == dp_packet_headroom(b)) { > > new_base = xrealloc(dp_packet_base(b), new_allocated); > > @@ -263,7 +305,9 @@ dp_packet_resize__(struct dp_packet *b, size_t > > new_headroom, size_t new_tailroom > > OVS_NOT_REACHED(); > > } > > > > - dp_packet_set_allocated(b, new_allocated); > > + if (b->source != DPBUF_DPDK) { > > + dp_packet_set_allocated(b, new_allocated); > > + } > > dp_packet_set_base(b, new_base); > > > > new_data = (char *) new_base + new_headroom; > > -- > > 2.7.4 > > > > _______________________________________________ > > dev mailing list > > [email protected] <mailto:[email protected]> > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > > <https://mail.openvswitch.org/mailman/listinfo/ovs-dev> > > > > > _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
