Re: [nvo3] Comments on Draft Geneve

Tom Herbert Mon, 03 Mar 2014 10:14:41 -0800

On Mon, Mar 3, 2014 at 12:05 AM, Anton Ivanov (antivano)
<[email protected]> wrote:
> Hi all,
>
> I would like to address one more issue which has been omitted so far from
> the background to the discussion.
>
> If we restrict the use cases to virtualization (which is the remit of NVO3),
> the assumption that variable length options are "easy" to implement in
> software is valid if and only if they are constant length for the duration
> of a session. Otherwise it is incorrect.
>


Pardon my ignorance, but what is a "network virtualization session"?
I perused several of the nvo3 architecture documents (frameworks,
dataplane, requirements, etc.) and couldn't find any references to
sessions.

Thanks,
Tom

> If you work purely in software with no VMs involved, e.g. a software switch
> that takes pseudowires from the network and writes to pseudowires, parsing
> Geneve with a variable-length header is trivial: you allocate big enough
> buffers and play with offsets. The code for that has been polished over the
> years (standard kernel buffer handling on all OSes, or its equivalents for
> switches); nothing new here.
>
> If you have to pass that data into a VM, the picture changes: you want
> that data to be page aligned so you can page it in without copying it. This
> is trivial if your header is constant for the duration of the session. You
> get the header and the data separately because you know the offsets. The
> APIs to do that are there; it does not matter whether you are doing it in
> userspace (POSIX vector I/O and its Microsoft equivalent) or with
> kernel-space scatter-gather I/O. It is easy.
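A minimal sketch of the constant-length case, using the POSIX scatter-gather receive that Python exposes as `socket.recvmsg_into` (Unix only). `HDR_LEN` and `recv_split` are hypothetical; a real datapath would point the second buffer at a page-aligned guest page rather than a `bytearray`.

```python
import socket

HDR_LEN = 16  # hypothetical header length, assumed fixed for the whole session


def recv_split(sock: socket.socket, payload_cap: int = 4096):
    """Receive one datagram directly into separate header and payload buffers."""
    hdr = bytearray(HDR_LEN)
    payload = bytearray(payload_cap)  # stands in for a page-aligned guest page
    # The kernel fills `hdr` first, then `payload`; no copy is needed afterwards
    # because the payload already starts at offset 0 of its own buffer.
    nbytes, _ancdata, _flags, _addr = sock.recvmsg_into([hdr, payload])
    if nbytes < HDR_LEN:
        raise ValueError("short datagram")
    return bytes(hdr), bytes(payload[: nbytes - HDR_LEN])
```

This only works because `HDR_LEN` is known before the packet arrives; if the header length varies per packet, the split point cannot be chosen in advance.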
>
> If your header is variable length during the session and you do not know
> the size for a particular packet, you have to page in the whole buffer and
> supply the driver with an offset at which to start. This means that you have
> to zero the bits of the header which would otherwise "leak into" the VM every
> time and/or do some copying. If you do not zero them, you have a security
> issue: the VM sees its overlay and/or metadata, which may have potential
> security use. The same applies if you can write directly to the VM address
> space instead of paging in buffers via the MMU. Zeroing 256+ bytes on every
> pass tends to add up to quite a few CPU cycles over time.
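A minimal sketch of the variable-length case described above, assuming the whole packet has already been received into one buffer that will be handed to the VM, with the Geneve header at offset 0 and Opt Len in the low 6 bits of byte 0 (draft-gross-geneve-00 layout). The function name `expose_to_guest` is hypothetical.

```python
GENEVE_BASE_LEN = 8


def expose_to_guest(buf: bytearray, nbytes: int) -> int:
    """Scrub the outer header in place; return the offset where the payload starts."""
    if nbytes < GENEVE_BASE_LEN:
        raise ValueError("truncated Geneve header")
    # The header length is only known after looking at this packet.
    hdr_len = GENEVE_BASE_LEN + (buf[0] & 0x3F) * 4
    if nbytes < hdr_len:
        raise ValueError("truncated Geneve options")
    # Zero the VNI and options so the VM cannot read them. This per-packet
    # write (up to 8 + 63 * 4 = 260 bytes) is the extra cost being discussed.
    buf[:hdr_len] = bytes(hdr_len)
    return hdr_len
```

The driver is then given `hdr_len` as the start offset; the scrub has to run on every packet because the header length may change from one packet to the next.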
>
> So from an implementation perspective, as far as variable-size headers are
> concerned, there is little difference between software in a virtualized
> environment and hardware. They have very similar restrictions (unless you
> want to sacrifice 40% of your performance to an interim copy). Provided that
> you want performance, of course.
>
> Going back to Geneve: if the header length is constant for the duration of
> the session, it is no different from what has been done in L2TP and what is
> being done in SFC. There is no technical merit in perpetrating it. If the
> header is variable, then we have one of two cases:
>
> 1. The draft may need an IPR statement already at this stage. I do not feel
> comfortable discussing a spec that looks as if it has been submarined, so
> that you need a specific piece of IPR to implement it with acceptable
> performance.
>
> 2. A spec that is specifically tailored to a single NPU/NIC to ship from a
> single (un)known vendor. This is similarly not something we should be
> discussing (once again, an IPR statement is needed there too).
>
> Brgds,
>
> A.
>
>
>
> On 02/03/14 23:30, Phil Bedard wrote:
>
> I've read most of the posts in this thread as an operator who may be looking
> at an overlay solution in the future.
>
> So the crux of the discussion is whether to extend the functionality of an
> existing protocol or introduce a brand new protocol.
>
> I would like to see the VNI space extended to 32 bits instead of 24 in
> whatever encapsulation method is being chosen. 24 bits seems like a holdover
> from the 802.1ah I-SID value and other adapted tunnel protocol limitations,
> and I'm not sure it's really necessary anymore.
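The size difference Phil is asking for can be checked with a few lines of arithmetic. A small sketch; the helper name `vni_space` is hypothetical.

```python
def vni_space(bits: int) -> int:
    """Number of distinct virtual network identifiers an n-bit field can hold."""
    return 1 << bits


for bits in (24, 32):
    print(f"{bits}-bit field: {vni_space(bits):,} identifiers")
# 24-bit field: 16,777,216 identifiers
# 32-bit field: 4,294,967,296 identifiers
```

In both VXLAN and Geneve the 24-bit VNI is followed by an 8-bit reserved field in the same 32-bit word, so a 32-bit VNI would presumably consume that reserved byte rather than grow the header.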
>
> I also believe there has to be a protocol identifier in the encapsulation
> header identifying what comes next. Static provisioning of this kind of
> information at the endpoints, or at midpoints in the case of monitoring
> gear, etc., is too cumbersome and not extensible. I think Tom said it
> initially, but I also don't believe inserting an Ethernet header just for
> the sake of it is efficient, and the overlay encapsulation protocol should
> be able to encapsulate IP directly.
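A minimal sketch of what a protocol identifier buys an endpoint or a monitoring box: the payload type is decided per packet from the header, not from static provisioning. The Ethertype values are the standard ones (0x6558 is Transparent Ethernet Bridging, used to mark an inner Ethernet frame); the function name `inner_kind` is hypothetical.

```python
ETH_P_TEB = 0x6558   # Transparent Ethernet Bridging: payload is an Ethernet frame
ETH_P_IP = 0x0800    # IPv4 carried directly, no inner Ethernet header
ETH_P_IPV6 = 0x86DD  # IPv6 carried directly

_KINDS = {ETH_P_TEB: "ethernet", ETH_P_IP: "ipv4", ETH_P_IPV6: "ipv6"}


def inner_kind(proto_type: int) -> str:
    """Classify the encapsulated payload from the header's protocol type field."""
    try:
        return _KINDS[proto_type]
    except KeyError:
        raise ValueError(f"unsupported protocol type 0x{proto_type:04x}") from None
```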
>
> I do not think the metadata should be a part of the encapsulation protocol;
> the encapsulation header should be a fixed length. I think the majority of
> simple overlay networks will not require additional metadata information and
> will likely be using the encapsulation with nothing following it but IP
> packets or Ethernet frames. Having a variable-length suffix is just going
> to add implementation headaches for hardware vendors and will be a quick way
> to see it not get adopted, IMHO. If someone needs additional hardware
> support for the next header, whether it be a security integrity header or
> some sort of additional metadata, let that be sorted out elsewhere.
>
> Just my 2c.
>
> -Phil
>
>
> From: Pankaj Garg <[email protected]>
> Date: Sunday, March 2, 2014 at 2:06 PM
> To: "Larry Kreeger (kreeger)" <[email protected]>, "[email protected]"
> <[email protected]>
> Subject: Re: [nvo3] Comments on Draft Geneve
>
> My responses are inline marked with PG.
>
>
>
> From: Larry Kreeger (kreeger) [mailto:[email protected]]
> Sent: Sunday, March 2, 2014 9:16 PM
> To: Pankaj Garg; [email protected]
> Subject: Re: Comments on Draft Geneve
>
>
>
> My responses are inline marked with LK>.  - Larry
>
>
>
> From: Pankaj Garg <[email protected]>
> Date: Saturday, March 1, 2014 4:22 AM
> To: Larry Kreeger <[email protected]>, "[email protected]" <[email protected]>
> Subject: RE: Comments on Draft Geneve
>
>
>
> My comments are inline marked with [PG].
>
>
>
> From: nvo3 [mailto:[email protected]] On Behalf Of Larry Kreeger
> (kreeger)
> Sent: Saturday, March 1, 2014 3:28 AM
> To: [email protected]
> Subject: [nvo3] Comments on Draft Geneve
>
>
>
> I see that a healthy discussion has broken out around draft-gross-geneve-00
> which I see has a slot in the agenda for the NVO3 WG meeting on Monday.
> Here are my thoughts.
>
>
>
> I will be comparing Geneve to an encapsulation that is near and dear to my
> heart, VXLAN.  When I do this, I see an encapsulation that is very similar
> to VXLAN (e.g. uses UDP, uses a 24-bit segment identifier at the same
> offset).  I see three things that Geneve adds beyond what is available in
> draft-mahalingam-dutt-dcops-vxlan:
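To make the "same offset" observation concrete, a sketch that packs both 8-byte headers. It assumes the VXLAN layout from draft-mahalingam-dutt-dcops-vxlan (8 flag bits with the I flag at 0x08, 24 reserved bits, 24-bit VNI, 8 reserved bits) and the Geneve base header from draft-gross-geneve-00; the helper names are hypothetical.

```python
import struct


def vxlan_header(vni: int) -> bytes:
    """8-byte VXLAN header: flags (I bit set), reserved, VNI, reserved."""
    if not 0 <= vni < 1 << 24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", 0x08 << 24, vni << 8)


def geneve_header(vni: int, proto: int = 0x6558, opt_words: int = 0) -> bytes:
    """8-byte Geneve base header; any options would follow it."""
    if not 0 <= vni < 1 << 24:
        raise ValueError("VNI must fit in 24 bits")
    if not 0 <= opt_words < 64:
        raise ValueError("Opt Len is a 6-bit field")
    # Ver = 0, so byte 0 is just Opt Len; byte 1 (O, C, reserved) left at 0.
    return struct.pack("!BBHI", opt_words, 0, proto, vni << 8)
```

Both headers carry the VNI in the top 24 bits of the second 32-bit word; they differ only in how the first word is used.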
>
>
>
> 1) The ability to encapsulate any protocol with an Ethertype (not just
> Ethernet frames), by adding a Protocol Type field.  This is certainly
> useful, and has already been covered in draft-quinn-vxlan-gpe as a backward
> compatible extension to VXLAN by using a P bit flag to signal its presence.
> The field is even at the same offset as draft-quinn-vxlan-gpe, but is
> missing the P bit for backwards compatibility.
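A sketch of the receive logic this flag-based extension implies. The I flag value (0x08) follows the VXLAN draft; the P flag position below is an assumption for illustration only, and the protocol type is assumed to sit in bytes 2-3, the offset Larry says the two drafts share. The function name is hypothetical.

```python
import struct

I_FLAG = 0x08      # VXLAN "VNI valid" flag
P_FLAG = 0x04      # assumed position of the "protocol type present" flag
ETH_P_TEB = 0x6558  # Ethernet payload


def vxlan_payload_type(hdr: bytes) -> int:
    """Return the payload Ethertype, treating P-clear packets as plain VXLAN."""
    if len(hdr) < 8:
        raise ValueError("truncated VXLAN header")
    flags = hdr[0]
    if not flags & I_FLAG:
        raise ValueError("VNI not valid")
    if flags & P_FLAG:
        # Extended sender: protocol type carried in bytes 2-3 (assumed offset).
        (proto,) = struct.unpack("!H", hdr[2:4])
        return proto
    # Original VXLAN sender: reserved bits are zero, payload is Ethernet.
    return ETH_P_TEB
```

A receiver written this way accepts packets from unmodified VXLAN senders, which is what Larry calls backward compatibility; an unmodified receiver that ignores the reserved bits would misread a P-set packet as Ethernet, which is the gap Pankaj describes.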
>
>
>
> [PG] The backward compatibility argument is invalid since a frame with P bit
> set (let me call it VXLAN V2) cannot be processed by the older endpoint,
> thus having no backward compatibility.
>
>
>
> LK> By backward compatibility, I mean that new implementations of VXLAN
> (VXLAN V2 as you call it) can understand packets sent by older
> implementations (VXLAN V1) as well as from new ones.  If older endpoints
> could understand the future bits, I would call that forward compatibility.
>
> [PG] My point was that the VXLAN V2 endpoint would have to support
> generating and understanding VXLAN V1 format packets. Is it much different
> from an endpoint supporting both Geneve and VXLAN V1?
>
>
>
> [PG] Essentially, what you are saying is that one can generate packets in
> VXLAN V1 for older endpoints and VXLAN V2 for newer endpoints. So the
> question is, why is VXLAN V2 better than Geneve? In fact, switching on a
> top-level UDP port provides a cleaner processing pipeline.
>
>
>
> LK> By enhancing VXLAN, there is no need to get a new UDP port assigned and
> all the current parsing logic for VXLAN V1 can be applied.
>
> [PG] I am not sure if allocating a new port is the meta issue here. The main
> issue here seems to be whether the new protocol should _require_ support for
> VXLAN V1 or not. Coming from the NVGRE side, the same argument would apply
> to Geneve, where one could say that Geneve should be backward compatible
> with NVGRE. I feel this might be a slippery slope where a new protocol
> cannot start with a clean slate.
>
>
>
> 2) The addition of an OAM bit to signal that the packet should be processed
> by the tunnel endpoint and not forwarded to a tenant.  This also seems
> useful, and seems identical in usage to the (IMO, poorly named) "Router
> Alert" bit extension to VXLAN covered in (the currently expired)
> draft-singh-nvo3-vxlan-router-alert.
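A minimal sketch of how an endpoint might act on the OAM bit, assuming the draft-gross-geneve-00 layout where O is the most significant bit of byte 1. The function name `classify` is hypothetical.

```python
OAM_BIT = 0x80  # O flag: most significant bit of byte 1 (assumed layout)


def classify(hdr: bytes) -> str:
    """Decide whether the tunnel endpoint consumes the packet or forwards it."""
    if len(hdr) < 8:
        raise ValueError("truncated Geneve header")
    if hdr[1] & OAM_BIT:
        return "endpoint"  # OAM/control: process locally, never deliver to a tenant
    return "tenant"
```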
>
>
>
> [PG] Yes, the OAM bit usage is similar. However, this is another extension
> which is incompatible with older implementations of VXLAN, thus breaking
> backward compatibility.
>
>
>
> LK> Again, I would call what you are referring to "forward compatibility".
>
>
>
> 3) Last, but not least is the addition of a variable length options field,
> which the draft suggests is used to carry metadata along with the payload.
> As mentioned by some others, IMO, the encapsulation transport header is not
> the right place to define and carry metadata.  Architecturally, metadata
> should be defined independent of transport so it can be carried inside of
> whatever transport is desired (e.g. VXLAN, NVGRE, MPLSoGRE, L2TPv3, etc.).
> One example of an effort to do this is in the Network Service Header draft
> (draft-quinn-sfc-nsh) being discussed in the SFC WG.  I am guessing that
> since the Geneve options field is optional, that the metadata it contains is
> not related to basic network connectivity, but more to providing higher
> level network services (aka Service Functions).  The Network Service Header
> contains two separate parts, the service path (used to guide the packets
> through the service chain) and context (metadata).  I can certainly see the
> context part of NSH being used to carry metadata even if the service chain
> is null (all services are fully distributed to the tunnel endpoints).
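For comparison with NSH, a sketch of how a receiver might walk the Geneve options field. It assumes the draft-gross-geneve-00 option header (16-bit Option Class, 8-bit Type, 3 reserved bits and a 5-bit Length counted in 4-byte words, excluding the option header); the function name `iter_options` is hypothetical.

```python
import struct
from typing import Iterator, Tuple


def iter_options(opts: bytes) -> Iterator[Tuple[int, int, bytes]]:
    """Yield (option_class, option_type, data) for each TLV in a Geneve options area."""
    off = 0
    while off < len(opts):
        if off + 4 > len(opts):
            raise ValueError("truncated option header")
        opt_class, opt_type, len_byte = struct.unpack("!HBB", opts[off:off + 4])
        data_len = (len_byte & 0x1F) * 4  # low 5 bits, in 4-byte words
        end = off + 4 + data_len
        if end > len(opts):
            raise ValueError("option overruns the options area")
        yield opt_class, opt_type, opts[off + 4:end]
        off = end
```

Because every option carries its own length, a receiver can skip options it does not understand; in the draft the high bit of Type marks an option as critical, meaning it must not be silently skipped.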
>
>
>
> [PG] The meta-data should be defined by their respective group. Different
> encapsulation protocols can carry those meta-data in their headers as
> needed. One clear example of how Geneve is better is that it can carry that
> meta-data without breaking hardware offloads, whereas VXLAN and NVGRE cannot
> do that. Btw I want to be clear, Geneve is not defining the meta-data, and
> it is not tying meta-data to Geneve, it is only defining a general purpose
> ability to carry meta-data, which is tremendously useful to have in the
> encapsulation header.
>
>
>
> On a side note, I don’t believe that the design of NSH is suitable for
> carrying general purpose meta-data. In fact in its current definition, it is
> not defining service chaining primitives clearly either, however we can
> discuss that in SFC forum, and focus the discussion on encapsulation header
> in this forum.
>
>
>
> In short, I don't see anything in Geneve that cannot be accomplished by
> using the backward compatible extensions to VXLAN proposed in
> draft-quinn-vxlan-gpe and draft-singh-nvo3-vxlan-router-alert, combined with
> the addition of NSH.
>
>
>
> [PG] Yes, one can put multiple (incompatible) extensions on top of VXLAN,
> and achieve many things that Geneve is supporting. But at that point, aren't
> we creating a new encapsulation format altogether? This new protocol with
> all such extensions would require new hardware, new software, break existing
> NIC offloads, etc., and still carry the legacy baggage with no clear
> advantage. At that point, I am not sure why it is better.
>
>
>
> LK> As I wrote above, extending VXLAN allows the same UDP port to be used
> and reuse of the existing VXLAN parsing logic.
>
> [PG] The crux of the discussion seems to be, whether Geneve should have a
> mode that is compatible with VXLAN V1 or not. Even though it might be a
> slippery slope, I think it is something to think about and debate further.
>
>
>
> When the current NVO3 WG charter was being written, there seemed to be
> consensus that we have no shortage of encapsulation options, but what was
> lacking was a standard control plane.  The Geneve draft seems to turn that
> on its head by saying "There is a clear advantage in settling on a data
> format: most of the protocols are only superficially different and there is
> little advantage in duplicating effort.  However, the same cannot be said of
> control planes, which are diverse in very fundamental ways.  The case for
> standardization is also less clear given the wide variety in requirements,
> goals, and deployment scenarios.".  I agree with the first part of this, so
> why define a completely new, non-backward compatible encapsulation?  I
> disagree with the second part, since this is clearly the goal of the NVO3
> WG.
>
>
>
> I see that there is an agenda slot to discuss the Geneve draft, but I'm not
> clear what the goals are of the authors within the IETF since the draft name
> does not target it to any particular WG, and it is currently marked as
> "Informational".  I would suggest that the authors consider extending
> currently implemented encapsulations rather than starting from scratch, e.g.
> by moving a few bits around in the first word of the Geneve header, it could
> be made backward compatible with VXLAN.
>
>
>
> Thanks, Larry
>
> _______________________________________________ nvo3 mailing list
> [email protected] https://www.ietf.org/mailman/listinfo/nvo3

