Re: [nvo3] Comments on Draft Geneve

Tom Herbert Mon, 03 Mar 2014 08:47:55 -0800

Hi Anton,

What you are describing is header-data split, where a device splits the
header and data portions of a packet into two buffers so that the data can
be page aligned (or at least land in a different cache line, as you
pointed out). Several NICs already implement this for TCP, splitting the
TCP data from the rest of the packet headers. They do this even though
nothing requires TCP options to stay fixed during the lifetime of a
connection; in fact, it is pretty essential to the protocol that they be
allowed to vary. Also, NICs are stateless for this: the split is done for
each packet independently, by parsing the packet and computing the offset
for the split. A device can perform header-data split in the same manner
for encapsulation protocols (I suspect some might already have done that
for GRE).
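
To make the stateless, per-packet computation concrete, here is a rough
sketch in Python. It is only an illustration under simplifying assumptions
(untagged Ethernet II, IPv4, TCP, no fragments), not a description of what
any particular NIC does. The names split_offset, header_data_split and
make_frame are hypothetical helpers made up for this example. Note that the
offset is derived from the IHL and TCP data offset fields of each packet,
so nothing has to be remembered between packets and the TCP options can
change length freely.

```python
import struct

ETH_HLEN = 14             # untagged Ethernet II header (assumption: no VLAN tag)
ETHERTYPE_IPV4 = 0x0800
IPPROTO_TCP = 6


def split_offset(frame: bytes) -> int:
    """Return the offset of the first TCP payload byte, or -1 if the frame
    is not plain Ethernet/IPv4/TCP. Uses only the bytes of this packet."""
    if len(frame) < ETH_HLEN + 20:
        return -1
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETHERTYPE_IPV4:
        return -1
    ver_ihl = frame[ETH_HLEN]
    if ver_ihl >> 4 != 4:
        return -1
    ip_hlen = (ver_ihl & 0x0F) * 4           # IHL is in 32-bit words
    if ip_hlen < 20 or frame[ETH_HLEN + 9] != IPPROTO_TCP:
        return -1
    tcp_start = ETH_HLEN + ip_hlen
    if len(frame) < tcp_start + 20:
        return -1
    tcp_hlen = (frame[tcp_start + 12] >> 4) * 4   # data offset, 32-bit words
    if tcp_hlen < 20 or len(frame) < tcp_start + tcp_hlen:
        return -1
    return tcp_start + tcp_hlen


def header_data_split(frame: bytes):
    """Split one frame into (headers, data); unparsed frames stay whole."""
    off = split_offset(frame)
    if off < 0:
        return frame, b""
    return frame[:off], frame[off:]


def make_frame(tcp_options: bytes, payload: bytes) -> bytes:
    """Build a minimal test frame (addresses, ports, checksums left zero)."""
    assert len(tcp_options) % 4 == 0
    eth = bytes(12) + struct.pack("!H", ETHERTYPE_IPV4)
    ip = bytes([0x45]) + bytes(8) + bytes([IPPROTO_TCP]) + bytes(10)
    data_off = (20 + len(tcp_options)) // 4
    tcp = bytes(12) + bytes([data_off << 4]) + bytes(7) + tcp_options
    return eth + ip + tcp + payload


if __name__ == "__main__":
    # Same "connection", different option lengths from packet to packet.
    for opts in (b"", b"\x01" * 12):
        hdr, data = header_data_split(make_frame(opts, b"hello"))
        print(len(hdr), data)
```

The same approach would extend to an encapsulation header that carries its
own length field: the parser reads the length from the packet and adds it
to the running offset, again without per-session state.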


Within a data center environment, it is probably true that pretty much
all packets we'd send will have the same format. Homogeneity makes it
much easier to do things like program a TCAM for headers. However,
this is an implementation and deployment choice, not necessarily a
fundamental property of the protocol. At the protocol design level,
robustness is the stronger guiding principle.

Tom



On Mon, Mar 3, 2014 at 12:05 AM, Anton Ivanov (antivano)
<[email protected]> wrote:
> Hi all,
>
> I would like to address one more issue which has been omitted so far from
> the background to the discussion.
>
> If we restrict the use cases to virtualization (which is the remit of NVO3),
> the assumption that variable length options are "easy" to implement in
> software is valid if and only if they are constant length for the duration
> of a session. Otherwise it is incorrect.
>
> If you work purely in software with no VMs involved, e.g. a software switch
> which takes pseudowires from the network and writes to pseudowires with a
> variable length header, parsing Geneve is trivial: you allocate big enough
> buffers and play with offsets. The code for that has been polished over the
> years; it is standard kernel buffer handling on all OSes (or its equivalents
> for switches), nothing new here.
>
> If you have to pass that data into a VM, this changes the picture: you want
> that data to be page aligned so you can page it in without copying it. This
> is trivial if your header is constant for the duration of the session. You
> get the header separately and the data separately by knowing the offsets.
> The APIs to do that are there; it does not matter whether you are doing it
> in userspace (POSIX vector I/O and its Microsoft equivalent) or in kernel
> space (scatter-gather I/O). It is easy.
>
> If your header is variable length during the session and you do not know the
> size for a particular packet, you have to page in the whole buffer and supply
> the driver with an offset on where to start. This means that you have to
> zero the bits of the header which would otherwise "leak into" the VM every
> time and/or do some copying. If you do not zero them, you have a security
> issue of the VM seeing its overlay and/or metadata which may have potential
> security use. The same applies if you can write directly to the VM address
> space instead of paging in buffers via the mmu. Zeroing 256+ bytes on every
> pass tends to add up to quite a few CPU cycles over time.
>
> So from an implementation perspective as far as variable size headers are
> concerned, there is little difference between software in a virtualized
> environment and hardware. They have very similar restrictions (unless you
> want to sacrifice 40% of your performance to an interim copy). Provided that
> you want performance of course.
>
> Going back to Geneve - if the header is constant for the duration of the
> session, it is no different from what has been done in L2TP and what is being
> done in SFC. There is no technical merit in perpetuating it. If the header is
> variable, then we either have a case of:
>
> 1. The draft may need an IPR statement already at this stage. I do not feel
> comfortable discussing a spec that looks like it has been submarined so you
> need a specific piece of IPR to implement it with an acceptable performance.
>
> 2. A spec that is specifically tailored to a single NPU/NIC to ship from a
> single (un)known vendor. This is similarly not something we should be
> discussing (once again - IPR statement there too).
>
> Brgds,
>
> A.
>
>
>
> On 02/03/14 23:30, Phil Bedard wrote:
>
> I've read most of the posts in this thread as an operator who may be looking
> at an overlay solution in the future.
>
> So the crux of the discussion is whether to extend the functionality of an
> existing protocol or introduce a brand new protocol.
>
> I would like to see the VNI space extended to 32 bits instead of 24 in
> whatever encapsulation method is being chosen.  24 seems like a holdover
> from the 802.1ah I-SID value and other adapted tunnel protocol limitations
> and I'm not sure it's really necessary anymore.
>
> I also believe there has to be a protocol identifier in the encapsulation
> header identifying what comes next.  Static provisioning of this kind of
> information at the endpoints or midpoints in the case of monitoring gear,
> etc. is too cumbersome and not extensible.   I think Tom said it initially,
> but I also don't believe inserting an Ethernet header just for the sake of
> it is efficient and the overlay encapsulation protocol should be able to
> encapsulate IP directly.
>
> I do not think the metadata should be a part of the encapsulation protocol;
> the encapsulation header should be a fixed length. I think the majority of
> simple overlay networks will not require additional metadata information and
> will likely be using the encapsulation with nothing following it but IP
> packets or Ethernet frames. Having a variable length suffix is just going
> to add implementation headaches for hardware vendors and will be a quick way
> to see it not get adopted, IMHO. If someone needs additional hardware
> support for the next header, whether it be a security integrity header, or
> some sort of additional metadata, let that be sorted out elsewhere.
>
> Just my 2c.
>
> -Phil
>
>
> From: Pankaj Garg <[email protected]>
> Date: Sunday, March 2, 2014 at 2:06 PM
> To: "Larry Kreeger (kreeger)" <[email protected]>, "[email protected]"
> <[email protected]>
> Subject: Re: [nvo3] Comments on Draft Geneve
>
> My responses are inline marked with PG.
>
>
>
> From: Larry Kreeger (kreeger) [mailto:[email protected]]
> Sent: Sunday, March 2, 2014 9:16 PM
> To: Pankaj Garg; [email protected]
> Subject: Re: Comments on Draft Geneve
>
>
>
> My responses are inline marked with LK>.  - Larry
>
>
>
> From: Pankaj Garg <[email protected]>
> Date: Saturday, March 1, 2014 4:22 AM
> To: Larry Kreeger <[email protected]>, "[email protected]" <[email protected]>
> Subject: RE: Comments on Draft Geneve
>
>
>
> My comments are inline marked with [PG].
>
>
>
> From: nvo3 [mailto:[email protected]] On Behalf Of Larry Kreeger
> (kreeger)
> Sent: Saturday, March 1, 2014 3:28 AM
> To: [email protected]
> Subject: [nvo3] Comments on Draft Geneve
>
>
>
> I see that a healthy discussion has broken out around draft-gross-geneve-00
> which I see has a slot in the agenda for the NVO3 WG meeting on Monday.
> Here are my thoughts.
>
>
>
> I will be comparing Geneve to an encapsulation that is near and dear to my
> heart, VXLAN.  When I do this, I see an encapsulation that is very similar
> to VXLAN (e.g. uses UDP, uses a 24-bit segment identifier at the same
> offset).  I see three things that Geneve adds beyond what is available in
> draft-mahalingam-dutt-dcops-vxlan:
>
>
>
> 1) The ability to encapsulate any protocol with an Ethertype (not just
> Ethernet frames), by adding a Protocol Type field.  This is certainly
> useful, and has already been covered in draft-quinn-vxlan-gpe as a backward
> compatible extension to VXLAN by using a P bit flag to signal its presence.
> The field is even at the same offset as draft-quinn-vxlan-gpe, but is
> missing the P bit for backwards compatibility.
>
>
>
> [PG] The backward compatibility argument is invalid since a frame with P bit
> set (let me call it VXLAN V2) cannot be processed by the older endpoint,
> thus having no backward compatibility.
>
>
>
> LK> By backward compatibility, I mean that new implementations of VXLAN
> (VXLAN V2 as you call it) can understand packets sent by older
> implementations (VXLAN V1) as well as from new ones.  If older endpoints
> could understand the future bits, I would call that forward compatibility.
>
> [PG] My point was that the VXLAN V2 endpoint would have to support
> generating and understanding VXLAN V1 format packets. Is it much different
> than an endpoint supporting both Geneve and VXLAN V1?
>
>
>
> [PG] Essentially, what you are saying is that one can generate packets in
> VXLAN V1 for older endpoint and VXLAN V2 for newer endpoints. So the
> question is, why is VXLAN V2 better than Geneve? In fact, switching on a top
> level UDP port provides a cleaner processing pipeline.
>
>
>
> LK> By enhancing VXLAN, there is no need to get a new UDP port assigned and
> all the current parsing logic for VXLAN V1 can be applied.
>
> [PG] I am not sure if allocating a new port is the meta issue here. The main
> issue here seems to be whether a new protocol should _require_ support for
> VXLAN V1 or not. Coming from NVGRE side, the same argument would apply to
> Geneve where one can say that Geneve should be backward compatible with
> NVGRE. I feel this might be a slippery slope where a new protocol cannot
> start with a clean slate.
>
>
>
> 2) The addition of an OAM bit to signal that the packet should be processed
> by the tunnel endpoint and not forwarded to a tenant.  This also seems
> useful, and seems identical in usage to the (IMO, poorly named) "Router
> Alert" bit extension to VXLAN covered in (the currently expired)
> draft-singh-nvo3-vxlan-router-alert.
>
>
>
> [PG] Yes, the OAM bit usage is similar. However, this is another extension
> which is incompatible with older implementations of VXLAN, thus breaking
> backward compatibility.
>
>
>
> LK> Again, I would call what you are referring to "forward compatibility".
>
>
>
> 3) Last, but not least is the addition of a variable length options field,
> which the draft suggests is used to carry metadata along with the payload.
> As mentioned by some others, IMO, the encapsulation transport header is not
> the right place to define and carry metadata.  Architecturally, metadata
> should be defined independent of transport so it can be carried inside of
> whatever transport is desired (e.g. VXLAN, NVGRE, MPLSoGRE, L2TPV3 etc).
> One example of an effort to do this is in the Network Service Header draft
> (draft-quinn-sfc-nsh) being discussed in the SFC WG.  I am guessing that
> since the Geneve options field is optional, that the metadata it contains is
> not related to basic network connectivity, but more to providing higher
> level network services (aka Service Functions).  The Network Service Header
> contains two separate parts, the service path (used to guide the packets
> through the service chain) and context (metadata).  I can certainly see the
> context part of NSH being used to carry metadata even if the service chain
> is null (all services are fully distributed to the tunnel endpoints).
>
>
>
> [PG] The meta-data should be defined by their respective group. Different
> encapsulation protocols can carry those meta-data in their headers as
> needed. One clear example of how Geneve is better is that it can carry that
> meta-data without breaking hardware offloads, whereas VXLAN and NVGRE cannot
> do that. Btw I want to be clear, Geneve is not defining the meta-data, and
> it is not tying meta-data to Geneve, it is only defining a general purpose
> ability to carry meta-data, which is tremendously useful to have in the
> encapsulation header.
>
>
>
> On a side note, I don’t believe that the design of NSH is suitable for
> carrying general purpose meta-data. In fact in its current definition, it is
> not defining service chaining primitives clearly either, however we can
> discuss that in SFC forum, and focus the discussion on encapsulation header
> in this forum.
>
>
>
> In short, I don't see anything in Geneve that cannot be accomplished by
> using the backward compatible extensions to VXLAN proposed in
> draft-quinn-vxlan-gpe and draft-singh-nvo3-vxlan-router-alert, combined with
> the addition of NSH.
>
>
>
> [PG] Yes, one can put multiple (incompatible) extensions on top of VXLAN,
> and achieve many things that Geneve is supporting. But at that point, aren’t
> we creating a new encapsulation format altogether? This new protocol with
> all such extensions would require new hardware, new software, break existing
> NIC offloads etc. and still carry the legacy baggage with no clear
> advantage. At that point, I am not sure why it is better.
>
>
>
> LK> As I wrote above, extending VXLAN allows the same UDP port to be used
> and reuse of the existing VXLAN parsing logic.
>
> [PG] The crux of the discussion seems to be, whether Geneve should have a
> mode that is compatible with VXLAN V1 or not. Even though it might be a
> slippery slope, I think it is something to think about and debate further.
>
>
>
> When the current NVO3 WG charter was being written, there seemed to be
> consensus that we have no shortage of encapsulation options, but what was
> lacking was a standard control plane.  The Geneve draft seems to turn that
> on its head by saying "There is a clear advantage in settling on a data
> format: most of the protocols are only superficially different and there is
> little advantage in duplicating effort.  However, the same cannot be said of
> control planes, which are diverse in very fundamental ways.  The case for
> standardization is also less clear given the wide variety in requirements,
> goals, and deployment scenarios.".  I agree with the first part of this, so
> why define a completely new, non-backward compatible encapsulation?  I
> disagree with the second part, since this is clearly the goal of the NVO3
> WG.
>
>
>
> I see that there is an agenda slot to discuss the Geneve draft, but I'm not
> clear what the goals are of the authors within the IETF since the draft name
> does not target it to any particular WG, and it is currently marked as
> "Informational".  I would suggest that the authors consider extending
> currently implemented encapsulations rather than starting from scratch, e.g.
> by moving a few bits around in the first word of the Geneve header, it could
> be made backward compatible with VXLAN.
>
>
>
> Thanks, Larry
>
> _______________________________________________ nvo3 mailing list
> [email protected] https://www.ietf.org/mailman/listinfo/nvo3
>

