On 23/07/2018 23:55, Darrell Ball wrote: > > > On Fri, Jul 20, 2018 at 10:09 AM, Lam, Tiago <tiago....@intel.com > <mailto:tiago....@intel.com>> wrote: > > On 18/07/2018 23:53, Darrell Ball wrote: > > sorry, several distractions delayed response. > > > > On Mon, Jul 16, 2018 at 1:37 AM, Lam, Tiago <tiago....@intel.com > <mailto:tiago....@intel.com> > > <mailto:tiago....@intel.com <mailto:tiago....@intel.com>>> wrote: > > > > On 13/07/2018 18:54, Darrell Ball wrote: > > > Thanks for the patch. > > > > > > A few queries inline. > > > > > > > Hi Darrell, > > > > Thanks for your inputs. I've replied in-line as well. > > > > > On Wed, Jul 11, 2018 at 11:23 AM, Tiago Lam <tiago....@intel.com > <mailto:tiago....@intel.com> > <mailto:tiago....@intel.com <mailto:tiago....@intel.com>> > > > <mailto:tiago....@intel.com <mailto:tiago....@intel.com> > <mailto:tiago....@intel.com <mailto:tiago....@intel.com>>>> wrote: > > > > > > When enabled with DPDK OvS relies on mbufs allocated by > mempools to > > > receive and output data on DPDK ports. Until now, each OvS > dp_packet has > > > had only one mbuf associated, which is allocated with the > maximum > > > possible size, taking the MTU into account. This approach, > however, > > > doesn't allow us to increase the allocated size in an mbuf, > if needed, > > > since an mbuf is allocated and initialised upon mempool > creation. Thus, > > > in the current implementatin this is dealt with by calling > > > OVS_NOT_REACHED() and terminating OvS. > > > > > > To avoid this, and allow the (already) allocated space to be > better > > > used, dp_packet_resize__() now tries to use the available > room, both the > > > tailroom and the headroom, to make enough space for the new > data. Since > > > this happens for packets of source DPBUF_DPDK, the > single-segment mbuf > > > case mentioned above is also covered by this new aproach in > resize__(). > > > > > > Signed-off-by: Tiago Lam <tiago....@intel.com > <mailto:tiago....@intel.com> > <mailto:tiago....@intel.com <mailto:tiago....@intel.com>> > > > <mailto:tiago....@intel.com <mailto:tiago....@intel.com> > <mailto:tiago....@intel.com <mailto:tiago....@intel.com>>>> > > > Acked-by: Eelco Chaudron <echau...@redhat.com > <mailto:echau...@redhat.com> > > <mailto:echau...@redhat.com <mailto:echau...@redhat.com>> > > > <mailto:echau...@redhat.com <mailto:echau...@redhat.com> > <mailto:echau...@redhat.com <mailto:echau...@redhat.com>>>> > > > --- > > > lib/dp-packet.c | 48 > > ++++++++++++++++++++++++++++++++++++++++++++++-- > > > 1 file changed, 46 insertions(+), 2 deletions(-) > > > > > > diff --git a/lib/dp-packet.c b/lib/dp-packet.c > > > index d6e19eb..87af459 100644 > > > --- a/lib/dp-packet.c > > > +++ b/lib/dp-packet.c > > > @@ -237,9 +237,51 @@ dp_packet_resize__(struct dp_packet *b, > > size_t > > > new_headroom, size_t new_tailroom > > > new_allocated = new_headroom + dp_packet_size(b) + > > new_tailroom; > > > > > > switch (b->source) { > > > + /* When resizing mbufs, both a single mbuf and > multi-segment > > > mbufs (where > > > + * data is not contigously held in memory), both > the headroom > > > and the > > > + * tailroom available will be used to make more > space for > > where > > > data needs > > > + * to be inserted. I.e if there's not enough headroom, > > data may > > > be shifted > > > + * right if there's enough tailroom. > > > + * However, this is not bulletproof and in some > cases the > > space > > > available > > > + * won't be enough - in those cases, an error should be > > > returned and the > > > + * packet dropped. */ > > > case DPBUF_DPDK: > > > - OVS_NOT_REACHED(); > > > > > > > > > Previously, it was a coding error to call this function for > a DPDK > > mbuf > > > case, which is pretty > > > clear. But with this patch, presumably that is not longer > the case and > > > the calling the API is > > > now ok for DPDK mbufs. > > > > > > > As it stands, it will still be an error to call > dp_packet_resize__() for > > any DPDK packet, or by extension any of the other functions > that call > > it, such as dp_packet_prealloc_tailroom() and > > dp_packet_prealloc_headroom(). > > > > > > > > yep, the existing code fails in a very clear way; i.e. whenever it is > > called for a dpdk packet. > > So the user would need to handle in some other way, which is not being > > done today, I know. > > > > > > > > This patch only tries to alleviate that > > by accommodating space from the headroom or tailroom, if > possible, and > > create just enough space for the new data. > > > > > > The new code will fail is some yet undefined way, occasionally working > > and failing > > in the "other" cases. > > > > > > > > My preferred approach would > > be to return an error if not possible, but since the API > doesn't deal > > with errors as is, the previous behavior of manually asserting > was left > > as is. As reported in [1] (I comment more on that below), the > behavior > > > > of manually asserting can lead to undesired behavior in some > use cases. > > > > > > > > I am familiar with the issue. > > As part of the change, dp_packet_put_uninit() > > and dp_packet_push_uninit() could be modified to return NULL > > and that could be percolated and checked for. > > > > Those APIs could simply check (by calling a helper API) if they would > > fail a priori to trigger returning > > NULL for dpdk buf cases. > > > > > > Ok, I'm glad we're on the same page here. That's what [2] is doing. I'm > planning to bring that forward. If you could have a look as well, that > would be great. > > > > > > > > > > > > > > + { > > > + size_t miss_len; > > > + > > > + if (new_headroom == dp_packet_headroom(b)) { > > > + /* This is a tailroom adjustment. Since > there's no > > > tailroom space > > > + * left, try and shift data towards the > head to free up > > > tail space, > > > + * if there's enough headroom */ > > > + > > > + miss_len = new_tailroom - > dp_packet_tailroom(b); > > > + > > > + if (miss_len <= new_headroom) { > > > + dp_packet_shift(b, -miss_len); > > > + } else { > > > + /* XXX: Handle error case and report > error to caller */ > > > + OVS_NOT_REACHED(); > > > > > > > > > > > > This will not just drop the packet, it will fail the daemon, > because a > > > packet cannot be resized. > > > If the system is completely depleted of memory, that may be > ok, but in > > > the case, no such > > > assumption is implied. > > > > > > Also, why is XXX still left in the patch? > > > > > > > Because there's still work to do in that regard. The whole process > > shouldn't be brought down if there's not enough space to put > some data > > in one single packet. However, this was intentionally left out > of this > > series or otherwise it would increase its complexity > considerably. > > > > > > It seems unnecessary to add a bunch of code to a series that tries > to handle > > 'resize', but handles it partially in practical cases. It also seems > > undefined when > > it works and when it does not from a API caller POV. > > Granted that the dp_packet_resize__() supports more operations than what > this implementation provides, but dp_packet API callers won't notice a > difference since the only callers are > dp_packet_prealloc_tailroom() and dp_packet_prealloc_headroom(). And the > implementation covers for those functions, which only either modify the > tailroom or the headroom (and not both). > > > I think patch 7 is also there to only support this patch 8. > > > > That's a wrong assumption. Patch 7/14 was added to support the > dp_packet_shift() operation. > > In this patch 8/14, because of the noncontiguous nature of chained > mbufs, there's no possibility of using general functions like `memmove`, > and re-using the shift operations was just easier. > > > I guess we may be talking about 2 different things. > My point is simply that greping for dp_packet_mbuf_shift() does not show > any users with resize__() > removed; I think it would be best to remove unused code from the patchset > dp_packet_mbuf_shift() is defined in patch 7. > >
Again, it is used in `dp_packet_shift()` for DPDK packets. > > > Ideally, It would seem like a modified patch 7 and patch 8 would > belong > > with the rest of the > > fix for the dpdk packet memory preallocation constraint issue. > > > > Also, ideally, 'XXX' is removed from patches. > > > > > > > > > > As others have pointed out in [1], this is not a simple > change, which > > would have to be propagated to higher levels in other parts of > the code > > base. I've proposed an alternative (vs refactoring the whole > dp_packet > > API to handle and return errors) in [2], but that seems to > have gone > > stale. Going forward I see that approach merging with this new > piece in > > dp_packet_resize__(), where an error can be returned to the > caller if > > there's not enough space. > > > > > > The full change is outside the scope of this series. > > > > > > > > > > > Also, the preexisting API handles two cases: > > > 1/ Tailroom only adjustment > > > 2/ headroom and/or tailroom adjustment > > > > > > meaning it handles all cases. > > > > > > The new DPDK addition (part of the same API) defines 2 cases > > > > > > 1/ tailroom only adjustment > > > 2/ headroom only adjustment > > > > > > So, it looks like a different API, that also does not handle > all cases. > > > > > > > > > > You have a point there, support for point 2/ "headroom and > tailroom > > adjustment" is missed. It doesn't seem to be used anywhere at the > > moment, the only callers being dp_packet_prealloc_tailroom() and > > dp_packet_prealloc_headroom(), but I'll submit an incremental > patch to > > deal with this. Thanks for pointing it out. > > > > Tiago. > > > > [1] > > > https://mail.openvswitch.org/pipermail/ovs-dev/2018-May/346649.html > <https://mail.openvswitch.org/pipermail/ovs-dev/2018-May/346649.html> > > > <https://mail.openvswitch.org/pipermail/ovs-dev/2018-May/346649.html > <https://mail.openvswitch.org/pipermail/ovs-dev/2018-May/346649.html>> > > [2] > > > https://mail.openvswitch.org/pipermail/ovs-dev/2018-July/348908.html > <https://mail.openvswitch.org/pipermail/ovs-dev/2018-July/348908.html> > > > <https://mail.openvswitch.org/pipermail/ovs-dev/2018-July/348908.html > <https://mail.openvswitch.org/pipermail/ovs-dev/2018-July/348908.html>> > > > > > > > > + } > > > + } else { > > > + /* Otherwise, this is a headroom > adjustment. Try to > > > shift data > > > + * towards the tail to free up head space, > if there's > > > enough > > > + * tailroom */ > > > + > > > + miss_len = new_headroom - > dp_packet_headroom(b); > > > > > > + > > > + if (miss_len <= new_tailroom) { > > > + dp_packet_shift(b, miss_len); > > > + } else { > > > + /* XXX: Handle error case and report > error to > > caller */ > > > + OVS_NOT_REACHED(); > > > > > > > > > > > > same comments as above. > > > > > > > > > > > > + } > > > + } > > > + > > > + new_base = dp_packet_base(b); > > > + > > > + break; > > > + } > > > case DPBUF_MALLOC: > > > if (new_headroom == dp_packet_headroom(b)) { > > > new_base = xrealloc(dp_packet_base(b), > > new_allocated); > > > @@ -263,7 +305,9 @@ dp_packet_resize__(struct dp_packet > *b, size_t > > > new_headroom, size_t new_tailroom > > > OVS_NOT_REACHED(); > > > } > > > > > > - dp_packet_set_allocated(b, new_allocated); > > > + if (b->source != DPBUF_DPDK) { > > > + dp_packet_set_allocated(b, new_allocated); > > > + } > > > dp_packet_set_base(b, new_base); > > > > > > new_data = (char *) new_base + new_headroom; > > > -- > > > 2.7.4 > > > > > > _______________________________________________ > > > dev mailing list > > > d...@openvswitch.org <mailto:d...@openvswitch.org> > <mailto:d...@openvswitch.org <mailto:d...@openvswitch.org>> > > <mailto:d...@openvswitch.org <mailto:d...@openvswitch.org> > <mailto:d...@openvswitch.org <mailto:d...@openvswitch.org>>> > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > <https://mail.openvswitch.org/mailman/listinfo/ovs-dev> > > <https://mail.openvswitch.org/mailman/listinfo/ovs-dev > <https://mail.openvswitch.org/mailman/listinfo/ovs-dev>> > > > <https://mail.openvswitch.org/mailman/listinfo/ovs-dev > <https://mail.openvswitch.org/mailman/listinfo/ovs-dev> > > <https://mail.openvswitch.org/mailman/listinfo/ovs-dev > <https://mail.openvswitch.org/mailman/listinfo/ovs-dev>>> > > > > > > > > > > > > _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev