On Tue, Jul 17, 2018 at 12:59 AM, Lam, Tiago <tiago....@intel.com> wrote:
> On 16/07/2018 09:37, Lam, Tiago wrote: > > On 13/07/2018 18:54, Darrell Ball wrote: > >> Thanks for the patch. > >> > >> A few queries inline. > >> > > > > Hi Darrell, > > > > Thanks for your inputs. I've replied in-line as well. > > > >> On Wed, Jul 11, 2018 at 11:23 AM, Tiago Lam <tiago....@intel.com > >> <mailto:tiago....@intel.com>> wrote: > >> > >> When enabled with DPDK OvS relies on mbufs allocated by mempools to > >> receive and output data on DPDK ports. Until now, each OvS > dp_packet has > >> had only one mbuf associated, which is allocated with the maximum > >> possible size, taking the MTU into account. This approach, however, > >> doesn't allow us to increase the allocated size in an mbuf, if > needed, > >> since an mbuf is allocated and initialised upon mempool creation. > Thus, > >> in the current implementatin this is dealt with by calling > >> OVS_NOT_REACHED() and terminating OvS. > >> > >> To avoid this, and allow the (already) allocated space to be better > >> used, dp_packet_resize__() now tries to use the available room, > both the > >> tailroom and the headroom, to make enough space for the new data. > Since > >> this happens for packets of source DPBUF_DPDK, the single-segment > mbuf > >> case mentioned above is also covered by this new aproach in > resize__(). > >> > >> Signed-off-by: Tiago Lam <tiago....@intel.com > >> <mailto:tiago....@intel.com>> > >> Acked-by: Eelco Chaudron <echau...@redhat.com > >> <mailto:echau...@redhat.com>> > >> --- > >> lib/dp-packet.c | 48 ++++++++++++++++++++++++++++++ > ++++++++++++++++-- > >> 1 file changed, 46 insertions(+), 2 deletions(-) > >> > >> diff --git a/lib/dp-packet.c b/lib/dp-packet.c > >> index d6e19eb..87af459 100644 > >> --- a/lib/dp-packet.c > >> +++ b/lib/dp-packet.c > >> @@ -237,9 +237,51 @@ dp_packet_resize__(struct dp_packet *b, size_t > >> new_headroom, size_t new_tailroom > >> new_allocated = new_headroom + dp_packet_size(b) + > new_tailroom; > >> > >> switch (b->source) { > >> + /* When resizing mbufs, both a single mbuf and multi-segment > >> mbufs (where > >> + * data is not contigously held in memory), both the headroom > >> and the > >> + * tailroom available will be used to make more space for where > >> data needs > >> + * to be inserted. I.e if there's not enough headroom, data may > >> be shifted > >> + * right if there's enough tailroom. > >> + * However, this is not bulletproof and in some cases the space > >> available > >> + * won't be enough - in those cases, an error should be > >> returned and the > >> + * packet dropped. */ > >> case DPBUF_DPDK: > >> - OVS_NOT_REACHED(); > >> > >> > >> Previously, it was a coding error to call this function for a DPDK mbuf > >> case, which is pretty > >> clear. But with this patch, presumably that is not longer the case and > >> the calling the API is > >> now ok for DPDK mbufs. > >> > > > > As it stands, it will still be an error to call dp_packet_resize__() for > > any DPDK packet, or by extension any of the other functions that call > > it, such as dp_packet_prealloc_tailroom() and > > dp_packet_prealloc_headroom(). This patch only tries to alleviate that > > by accommodating space from the headroom or tailroom, if possible, and > > create just enough space for the new data. My preferred approach would > > be to return an error if not possible, but since the API doesn't deal > > with errors as is, the previous behavior of manually asserting was left > > as is. As reported in [1] (I comment more on that below), the behavior > > of manually asserting can lead to undesired behavior in some use cases. > > > >> > >> > >> + { > >> + size_t miss_len; > >> + > >> + if (new_headroom == dp_packet_headroom(b)) { > >> + /* This is a tailroom adjustment. Since there's no > >> tailroom space > >> + * left, try and shift data towards the head to free up > >> tail space, > >> + * if there's enough headroom */ > >> + > >> + miss_len = new_tailroom - dp_packet_tailroom(b); > >> + > >> + if (miss_len <= new_headroom) { > >> + dp_packet_shift(b, -miss_len); > >> + } else { > >> + /* XXX: Handle error case and report error to > caller */ > >> + OVS_NOT_REACHED(); > >> > >> > >> > >> This will not just drop the packet, it will fail the daemon, because a > >> packet cannot be resized. > >> If the system is completely depleted of memory, that may be ok, but in > >> the case, no such > >> assumption is implied. > >> > >> Also, why is XXX still left in the patch? > >> > > > > Because there's still work to do in that regard. The whole process > > shouldn't be brought down if there's not enough space to put some data > > in one single packet. However, this was intentionally left out of this > > series or otherwise it would increase its complexity considerably. > > > > As others have pointed out in [1], this is not a simple change, which > > would have to be propagated to higher levels in other parts of the code > > base. I've proposed an alternative (vs refactoring the whole dp_packet > > API to handle and return errors) in [2], but that seems to have gone > > stale. Going forward I see that approach merging with this new piece in > > dp_packet_resize__(), where an error can be returned to the caller if > > there's not enough space. > > > >> Also, the preexisting API handles two cases: > >> 1/ Tailroom only adjustment > >> 2/ headroom and/or tailroom adjustment > >> > >> meaning it handles all cases. > >> > >> The new DPDK addition (part of the same API) defines 2 cases > >> > >> 1/ tailroom only adjustment > >> 2/ headroom only adjustment > >> > >> So, it looks like a different API, that also does not handle all cases. > >> > >> > > > > You have a point there, support for point 2/ "headroom and tailroom > > adjustment" is missed. It doesn't seem to be used anywhere at the > > moment, the only callers being dp_packet_prealloc_tailroom() and > > dp_packet_prealloc_headroom(), but I'll submit an incremental patch to > > deal with this. Thanks for pointing it out. > > > I've had a look into this and it doesn't seem that case number 2/ > "headroom and tailroom adjustment" above would make sense for the > DPBUF_DPDK case. The reason being that if both `new_tailroom` and > `new_headroom` are being increased (in respect to dp_packet_tailroom() > and dp_packet_headroom()), there won't be enough space as the tail and > head would be competing for the same available space. So it makes sense > to make it an exclusive operation for the DPBUF_DPDK case. > Maybe you can explain why that would be the case, in general? However, if that is really the case, then that is fine; an "else if" case would need to be added and a modified 'return error else case'. > > The work I mentioned before in [1] should help here as we would then be > able to return an error (suach as `EINVAL`) if both "new_tailroom" and > "new_headroom" are incremented for the DPBUF_DPDK case. > > What do you think? > > [1] https://mail.openvswitch.org/pipermail/ovs-dev/2018-July/348908.html > _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev