Re: [nvo3] Comments on Draft Geneve

Anton Ivanov (antivano) Mon, 03 Mar 2014 12:03:08 -0800

On 03/03/14 18:10, Tom Herbert wrote:
> On Mon, Mar 3, 2014 at 12:05 AM, Anton Ivanov (antivano)
> <[email protected]> wrote:
>> Hi all,
>>
>> I would like to address one more issue which has been omitted so far from
>> the background to the discussion.
>>
>> If we restrict the use cases to virtualization (which is the remit of NVO3),
>> the assumption that variable length options are "easy" to implement in
>> software is valid if and only if they are constant length for the duration
>> of a session. Otherwise it is incorrect.
>>
> Pardon my ignorance, but what is a "network virtualization session"?


Apologies - L2TPv3 terminology :) As I have said quite a few times, it 
is being reinvented by geneve - both the standards and the most 
popular non-standard feature (the extra metadata blob between header and 
data).

So - on why you do not see it: you are not going to find it, as the 
current reqs have no metadata req.

Once you add metadata there is the following design question to answer:

1. Do you have truly variable metadata, where every packet gets a 
distinct metadata blob and every header has a different size?

2. Do you have a small set of metadata choices associated with a 
particular pseudowire at any given time (and a small choice of headers)?

In theory (and as proclaimed by people who have never written a line of 
code to do this), you can do anything in software. So in software we 
should be proclaiming 1 as the one and true way. Well, NO. With all due 
respect, designing for it, then implementing and testing a robust 
high-performance parser, is crazy. At that rate we might as well abandon 
packets altogether and have the VMs communicate via XML. Infinitely 
expandable too, you know.

I agree 100% with Dino - this level of flexibility is useless.

If we take 2 as the base use case, we can map the packets onto one or more 
"sessions" associated with a particular pseudowire (pretty much 
identical to the L2TPv3 concept of a session). You can think of it as a 
flow identified by NVID + metadata, not packet contents.

It is a logical concept for which the implementer can set up originating 
and terminating contexts in the software implementation - buffers, 
header parser, etc. Having such a concept explicitly or implicitly defined 
in the spec is key to having multiple efficient, standardizable and 
robust implementations which can interoperate.
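
To make the "contexts" idea a bit more concrete, here is a minimal sketch in Python of a terminating context, assuming the header plus metadata has a fixed length for the lifetime of the session. The names TerminatingContext, HDR_LEN and MAX_PAYLOAD are made up for illustration. Python gives no control over page alignment, so this only shows the header/payload split done by vector IO (socket.recvmsg_into), not zero-copy delivery into a VM.

```python
import socket

HDR_LEN = 16        # assumption: fixed header + metadata size for this session
MAX_PAYLOAD = 2048  # assumption: upper bound on the inner frame size


class TerminatingContext:
    """Per-session receive state; buffers are allocated once, not per packet."""

    def __init__(self, hdr_len=HDR_LEN, max_payload=MAX_PAYLOAD):
        self.hdr = bytearray(hdr_len)
        self.payload = bytearray(max_payload)

    def receive(self, sock):
        # Scatter read: the first len(self.hdr) bytes of the datagram land in
        # self.hdr and the remainder in self.payload, so the payload buffer
        # never contains overlay header or metadata bytes.
        nbytes, _ancdata, _flags, _addr = sock.recvmsg_into(
            [self.hdr, self.payload]
        )
        if nbytes < len(self.hdr):
            raise ValueError("packet shorter than the session header")
        # Return a view of just the bytes that were received as payload.
        return memoryview(self.payload)[: nbytes - len(self.hdr)]
```

Because the split point is known in advance, nothing in the payload buffer has to be zeroed or copied before it is handed on. With a header whose length changes per packet, the buffer list could not be prepared ahead of the read like this.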

So from an implementation perspective of how to do that in software, I 
can set up a high performance "peeler" which peels off all the headers 
and metadata for such an association (session), runs them past 
pre-loaded parsers/verifiers and handles "other" only as an _EXCEPTION_ 
(in both the English and the Software Engineering sense of the word). The 
exception usually results in setting up another association and its parser, 
fetching extra data from databases and caching it locally, etc., so that 
the next packet to hit that is peeled off effectively and does not hit 
the exception path. I can also build that as an open system where I can 
have the parsers loadable and have third party processing of metadata. 
Trivial actually, and quite useful too.
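
A rough sketch of such a peeler, in Python, under loudly stated assumptions: the wire format (4-byte NVID, 1-byte metadata length, metadata, then payload) is invented for this example and is not taken from any draft, and Session, Peeler and parse_unknown are hypothetical names. The fast path is a single prefix compare against the headers already known for that NVID. Only an unknown header takes the slow path, which parses it fully and caches a new association.

```python
import struct
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Session:
    header: bytes  # exact header + metadata bytes expected on this session


def parse_unknown(packet: bytes) -> Session:
    # Slow path for the made-up format: NVID (4 bytes), metadata length
    # (1 byte), metadata. A real one might also consult a database here.
    if len(packet) < 5:
        raise ValueError("truncated header")
    end = 5 + packet[4]
    if len(packet) < end:
        raise ValueError("truncated metadata")
    return Session(header=bytes(packet[:end]))


class Peeler:
    def __init__(self, slow_path: Callable[[bytes], Session]):
        self.sessions: Dict[int, List[Session]] = {}  # NVID -> known sessions
        self.slow_path = slow_path                    # loadable parser hook
        self.slow_hits = 0

    def peel(self, packet: bytes) -> bytes:
        (nvid,) = struct.unpack_from("!I", packet, 0)
        for session in self.sessions.get(nvid, ()):
            if packet.startswith(session.header):     # fast path
                return packet[len(session.header):]
        # Exception path: set up a new association so that the next packet
        # with the same header is handled on the fast path.
        self.slow_hits += 1
        session = self.slow_path(packet)
        self.sessions.setdefault(nvid, []).append(session)
        return packet[len(session.header):]
```

Passing the slow path in as a callable is one way to get the "loadable parsers" property: a third party can supply its own function for its own metadata without touching the fast path.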

Compared to that, an "infinitely variable" parser will need to be 
monolithic (at least most of it).
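
For contrast, this is roughly the work that case 1 puts on every single packet. It is a sketch only: the option layout (1-byte type, 1-byte length, value) is a simplified stand-in invented for illustration, not the option format of any draft, and walk_options is a hypothetical name.

```python
from typing import List, Tuple


def walk_options(
    packet: bytes, opts_start: int, opts_len: int
) -> Tuple[List[Tuple[int, bytes]], int]:
    """Walk made-up type/length/value options; return (options, payload offset)."""
    end = opts_start + opts_len
    if end > len(packet):
        raise ValueError("options area runs past the end of the packet")
    options = []
    pos = opts_start
    while pos < end:
        if pos + 2 > end:
            raise ValueError("truncated option header")
        opt_type = packet[pos]
        opt_len = packet[pos + 1]
        value_end = pos + 2 + opt_len
        if value_end > end:
            raise ValueError("option overruns the options area")
        options.append((opt_type, packet[pos + 2:value_end]))
        pos = value_end
    return options, end
```

Every bounds check in that loop runs per option, per packet, and each option type then needs its own handling, with nothing that can be set up once and reused.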

So going back to Geneve - from the draft (and the very vague and full of 
marketing presentation which we had today) it is unclear what they are 
trying to address - 1 or 2. Both the draft and the presentations have 
failed to address it and no questions have been answered on this so far.

If it is 1, this is unrealistic - there is no way to implement that 
efficiently and securely as an interoperating standard. Infinite 
flexibility a standard does not make. There is no point in the draft 
continuing to exist.

If it is 2 - if it can be represented as sessions - then I see no 
advantage of geneve over any other protocol + header that can be 
specified for the duration of the association (session). For example, 
L2TPv3, which may map 1:1 onto a session as we know it. For VXLAN and 
NVGRE we will probably need an extension header, but to the same effect. 
In any case - it has all been invented before and is well known. No need 
to reinvent the wheel. No need for another protocol and so, once again, 
no reason for geneve to exist.

A.

> I perused several of the nvo3 architecture documents (frameworks,
> dataplane, requirements, etc.) and couldn't find any references to
> sessions.
>
> Thanks,
> Tom
>
>> If you work purely in software with no VMs involved (e.g. a software switch
>> which takes pseudowires from the network and writes to pseudowires), then
>> parsing geneve with a variable length header is trivial - you allocate big enough
>> buffers and play with offsets. The code for that has been polished over the
>> years, standard kernel buffer handling on all OS-es (or its equivalents for
>> switches), nothing new here.
>>
>> If you have to pass that data into a VM this changes the picture - you want
>> that data to be page aligned so you can page it in without copying it. This
>> is trivial if your header is constant for the duration of the session. You
>> get the header separately, data separately by knowing the offsets. The APIs
>> to do that are there - it does not matter whether you are doing it in userspace
>> (POSIX vector IO and its Microsoft equivalent) or in kernel space (scatter
>> gather IO). It is easy.
>>
>> If your header is variable length during the session and you do not know the
>> size for a particular packet, you have to page in the whole buffer and supply
>> the driver with an offset on where to start. This means that you have to
>> zero the bits of the header which would otherwise "leak into" the VM every
>> time and/or do some copying. If you do not zero them, you have a security
>> issue of the VM seeing its overlay and/or metadata which may have potential
>> security use. The same applies if you can write directly to the VM address
>> space instead of paging in buffers via the mmu. Zeroing 256+ bytes on every
>> pass tends to add up to quite a few CPU cycles over time.
>>
>> So from an implementation perspective as far as variable size headers are
>> concerned, there is little difference between software in a virtualized
>> environment and hardware. They have very similar restrictions (unless you
>> want to sacrifice 40% of your performance to an interim copy). Provided that
>> you want performance of course.
>>
>> Going back to Geneve - if the header is constant for the duration of the session
>> it is not different from what has been done in l2tp and what is being done
>> in sfc. No technical merit to perpetrate it. If the header is variable, then
>> we either have a case of:
>>
>> 1. The draft may need an IPR statement already at this stage. I do not feel
>> comfortable discussing a spec that looks like it has been submarined so you
>> need a specific piece of IPR to implement it with an acceptable performance.
>>
>> 2. A spec that is specifically tailored to a single NPU/NIC to ship from a
>> single (un)known vendor. This is similarly not something we should be
>> discussing (once again - IPR statement there too).
>>
>> Brgds,
>>
>> A.
>>
>>
>>
>> On 02/03/14 23:30, Phil Bedard wrote:
>>
>> I've read most of the posts in this thread as an operator who may be looking
>> at an overlay solution in the future.
>>
>> So the crux of the discussion is whether to extend the functionality of an
>> existing protocol or introduce a brand new protocol.
>>
>> I would like to see the VNI space extended to 32 bits instead of 24 in
>> whatever encapsulation method is being chosen.  24 seems like a holdover
>> from the 802.1ah I-SID value and other adapted tunnel protocol limitations
>> and I'm not sure it's really necessary anymore.
>>
>> I also believe there has to be a protocol identifier in the encapsulation
>> header identifying what comes next.  Static provisioning of this kind of
>> information at the endpoints or midpoints in the case of monitoring gear,
>> etc. is too cumbersome and not extensible.   I think Tom said it initially,
>> but I also don't believe inserting an Ethernet header just for the sake of
>> it is efficient and the overlay encapsulation protocol should be able to
>> encapsulate IP directly.
>>
>> I do not think the metadata should be a part of the encapsulation protocol,
>> the encapsulation header should be a fixed length.   I think the majority of
>> simple overlay networks will not require additional metadata information and
>> will likely be using the encapsulation with nothing following it but IP
>> packets or Ethernet frames.    Having a variable length suffix is just going
>> to add implementation headaches for hardware vendors and will be a quick way
>> to see it not get adopted, IMHO.    If someone needs additional hardware
>> support for the next header, whether it be a security integrity header, or
>> some sort of additional metadata, let that be sorted out elsewhere.
>>
>> Just my 2c.
>>
>> -Phil
>>
>>
>> From: Pankaj Garg <[email protected]>
>> Date: Sunday, March 2, 2014 at 2:06 PM
>> To: "Larry Kreeger (kreeger)" <[email protected]>, "[email protected]"
>> <[email protected]>
>> Subject: Re: [nvo3] Comments on Draft Geneve
>>
>> My responses are inline marked with PG.
>>
>>
>>
>> From: Larry Kreeger (kreeger) [mailto:[email protected]]
>> Sent: Sunday, March 2, 2014 9:16 PM
>> To: Pankaj Garg; [email protected]
>> Subject: Re: Comments on Draft Geneve
>>
>>
>>
>> My responses are inline marked with LK>.  - Larry
>>
>>
>>
>> From: Pankaj Garg <[email protected]>
>> Date: Saturday, March 1, 2014 4:22 AM
>> To: Larry Kreeger <[email protected]>, "[email protected]" <[email protected]>
>> Subject: RE: Comments on Draft Geneve
>>
>>
>>
>> My comments are inline marked with [PG].
>>
>>
>>
>> From: nvo3 [mailto:[email protected]] On Behalf Of Larry Kreeger
>> (kreeger)
>> Sent: Saturday, March 1, 2014 3:28 AM
>> To: [email protected]
>> Subject: [nvo3] Comments on Draft Geneve
>>
>>
>>
>> I see that a healthy discussion has broken out around draft-gross-geneve-00
>> which I see has a slot in the agenda for the NVO3 WG meeting on Monday.
>> Here are my thoughts.
>>
>>
>>
>> I will be comparing Geneve to an encapsulation that is near and dear to my
>> heart, VXLAN.  When I do this, I see an encapsulation that is very similar
>> to VXLAN (e.g. uses UDP, uses a 24-bit segment identifier at the same
>> offset).  I see three things that Geneve adds beyond what is available in
>> draft-mahalingam-dutt-dcops-vxlan:
>>
>>
>>
>> 1) The ability to encapsulate any protocol with an Ethertype (not just
>> Ethernet frames), by adding a Protocol Type field.  This is certainly
>> useful, and has already been covered in draft-quinn-vxlan-gpe as a backward
>> compatible extension to VXLAN by using a P bit flag to signal its presence.
>> The field is even at the same offset as draft-quinn-vxlan-gpe, but is
>> missing the P bit for backwards compatibility.
>>
>>
>>
>> [PG] The backward compatibility argument is invalid since a frame with P bit
>> set (let me call it VXLAN V2) cannot be processed by the older endpoint,
>> thus having no backward compatibility.
>>
>>
>>
>> LK> By backward compatibility, I mean that new implementations of VXLAN
>> (VXLAN V2 as you call it) can understand packets sent by older
>> implementations (VXLAN V1) as well as from new ones.  If older endpoints
>> could understand the future bits, I would call that forward compatibility.
>>
>> [PG] My point was that the VXLAN V2 endpoint would have to support
>> generating and understanding VXLAN V1 format packets. Is it much different
>> than an endpoint supporting both Geneve and VXLAN V1?
>>
>>
>>
>> [PG] Essentially, what you are saying is that one can generate packets in
>> VXLAN V1 for older endpoint and VXLAN V2 for newer endpoints. So the
>> question is, why is VXLAN V2 better than Geneve? In fact, switching on a top
>> level UDP port, provides a cleaner processing pipeline.
>>
>>
>>
>> LK> By enhancing VXLAN, there is no need to get a new UDP port assigned and
>> all the current parsing logic for VXLAN V1 can be applied.
>>
>> [PG] I am not sure if allocating a new port is the meta issue here. The main
>> issue here seems to be whether new protocol should _require_ support for
>> VXLAN V1 or not. Coming from NVGRE side, the same argument would apply to
>> Geneve where one can say that Geneve should be backward compatible with
>> NVGRE. I feel this might be a slippery slope where a new protocol cannot
>> start with a clean slate.
>>
>>
>>
>> 2) The addition of an OAM bit to signal that the packet should be processed
>> by the tunnel endpoint and not forwarded to a tenant.  This also seems
>> useful, and seems identical in usage to the (IMO, poorly named) "Router
>> Alert" bit extension to VXLAN covered in (the currently expired)
>> draft-singh-nvo3-vxlan-router-alert.
>>
>>
>>
>> [PG] Yes, the OAM bit usage is similar. However, this is another extension
>> which is incompatible with older implementation of VXLAN thus breaking
>> backward compatibility.
>>
>>
>>
>> LK> Again, I would call what you are referring to "forward compatibility".
>>
>>
>>
>> 3) Last, but not least is the addition of a variable length options field,
>> which the draft suggests is used to carry metadata along with the payload.
>> As mentioned by some others, IMO, the encapsulation transport header is not
>> the right place to define and carry metadata.  Architecturally, metadata
>> should be defined independent of transport so it can be carried inside of
>> whatever transport is desired (e.g. VXLAN, NVGRE, MPLSoGRE, L2TPV3 etc).
>> One example of an effort to do this is in the Network Service Header draft
>> (draft-quinn-sfc-nsh) being discussed in the SFC WG.  I am guessing that
>> since the Geneve options field is optional, that the metadata it contains is
>> not related to basic network connectivity, but more to providing higher
>> level network services (aka Service Functions).  The Network Service Header
>> contains two separate parts, the service path (used to guide the packets
>> through the service chain) and context (metadata).  I can certainly see the
>> context part of NSH being used to carry metadata even if the service chain
>> is null (all services are fully distributed to the tunnel endpoints).
>>
>>
>>
>> [PG] The meta-data should be defined by their respective group. Different
>> encapsulation protocols can carry those meta-data in their headers as
>> needed. One clear example of how Geneve is better is that it can carry that
>> meta-data without breaking hardware offloads, whereas VXLAN and NVGRE cannot
>> do that. Btw I want to be clear, Geneve is not defining the meta-data, and
>> it is not tying meta-data to Geneve, it is only defining a general purpose
>> ability to carry meta-data, which is tremendously useful to have in the
>> encapsulation header.
>>
>>
>>
>> On a side note, I don’t believe that the design of NSH is suitable for
>> carrying general purpose meta-data. In fact in its current definition, it is
>> not defining service chaining primitives clearly either, however we can
>> discuss that in SFC forum, and focus the discussion on encapsulation header
>> in this forum.
>>
>>
>>
>> In short, I don't see anything in Geneve that cannot be accomplished by
>> using the backward compatible extensions to VXLAN proposed in
>> draft-quinn-vxlan-gpe and draft-singh-nvo3-vxlan-router-alert, combined with
>> the addition of NSH.
>>
>>
>>
>> [PG] Yes, one can put multiple (incompatible) extensions on top of VXLAN,
>> and achieve many things that Geneve is supporting. But at that point, aren’t
>> we creating a new encapsulation format altogether? This new protocol with
>> all such extensions would require new hardware, new software, break existing
>> NIC offloads etc. and still carry the legacy baggage with no clear
>> advantage. At that point, I am not sure, why it is better?
>>
>>
>>
>> LK> As I wrote above, extending VXLAN allows the same UDP port to be used
>> and reuse of the existing VXLAN parsing logic.
>>
>> [PG] The crux of the discussion seems to be, whether Geneve should have a
>> mode that is compatible with VXLAN V1 or not. Even though it might be a
>> slippery slope, I think it is something to think about and debate further.
>>
>>
>>
>> When the current NVO3 WG charter was being written, there seemed to be
>> consensus that we have no shortage of encapsulation options, but what was
>> lacking was a standard control plane.  The Geneve draft seems to turn that
>> on its head by saying "There is a clear advantage in settling on a data
>> format: most of the protocols are only superficially different and there is
>> little advantage in duplicating effort.  However, the same cannot be said of
>> control planes, which are diverse in very fundamental ways.  The case for
>> standardization is also less clear given the wide variety in requirements,
>> goals, and deployment scenarios.".  I agree with the first part of this, so
>> why define a completely new, non-backward compatible encapsulation?  I
>> disagree with the second part, since this is clearly the goal of the NVO3
>> WG.
>>
>>
>>
>> I see that there is an agenda slot to discuss the Geneve draft, but I'm not
>> clear what the goals are of the authors within the IETF since the draft name
>> does not target it to any particular WG, and it is currently marked as
>> "Informational".  I would suggest that the authors consider extending
>> currently implemented encapsulations rather than starting from scratch, e.g.
>> by moving a few bits around in the first word of the Geneve header, it could
>> be made backward compatible with VXLAN.
>>
>>
>>
>> Thanks, Larry
>>
>> _______________________________________________ nvo3 mailing list
>> [email protected] https://www.ietf.org/mailman/listinfo/nvo3
>>
