On Mon, Mar 3, 2014 at 12:05 AM, Anton Ivanov (antivano) <[email protected]> wrote:
> Hi all,
>
> I would like to address one more issue which has been omitted so far from
> the background to the discussion.
>
> If we restrict the use cases to virtualization (which is the remit of NVO3),
> the assumption that variable length options are "easy" to implement in
> software is valid if and only if they are constant length for the duration
> of a session. Otherwise it is incorrect.
>
Pardon my ignorance, but what is a "network virtualization session"? I
perused several of the nvo3 architecture documents (frameworks, dataplane,
requirements, etc.) and couldn't find any references to sessions.

Thanks,
Tom

> If you work purely in software with no VMs involved (e.g. a software switch
> which takes pseudowires from the network and writes to pseudowires), parsing
> Geneve with a variable length header is trivial - you allocate big enough
> buffers and play with offsets. The code for that has been polished over the
> years, standard kernel buffer handling on all OS-es (or its equivalents for
> switches), nothing new here.
>
> If you have to pass that data into a VM this changes the picture - you want
> that data to be page aligned so you can page it in without copying it. This
> is trivial if your header is constant for the duration of the session. You
> get the header separately, data separately by knowing the offsets. The APIs
> to do that are there - it does not matter whether you are doing it in
> userspace (POSIX vector IO and its Microsoft equivalent) or in kernel space
> (scatter-gather IO). It is easy.
>
> If your header is variable length during the session and you do not know the
> size for a particular packet you have to page in the whole buffer and supply
> the driver with an offset on where to start. This means that you have to
> zero the bits of the header which would otherwise "leak into" the VM every
> time and/or do some copying. If you do not zero them, you have a security
> issue of the VM seeing its overlay and/or metadata which may have potential
> security use. The same applies if you can write directly to the VM address
> space instead of paging in buffers via the MMU. Zeroing 256+ bytes on every
> pass tends to add up to quite a few CPU cycles over time.
>
> So from an implementation perspective as far as variable size headers are
> concerned, there is little difference between software in a virtualized
> environment and hardware.
> They have very similar restrictions (unless you
> want to sacrifice 40% of your performance to an interim copy). Provided that
> you want performance of course.
>
> Going back to Geneve - if the header is constant for the duration of the
> session it is not different from what has been done in L2TP and what is being
> done in SFC. No technical merit to perpetrate it. If the header is variable,
> then we either have a case of:
>
> 1. The draft may need an IPR statement already at this stage. I do not feel
> comfortable discussing a spec that looks like it has been submarined so you
> need a specific piece of IPR to implement it with an acceptable performance.
>
> 2. A spec that is specifically tailored to a single NPU/NIC to ship from a
> single (un)known vendor. This is similarly not something we should be
> discussing (once again - IPR statement there too).
>
> Brgds,
>
> A.
>
>
> On 02/03/14 23:30, Phil Bedard wrote:
>
> I've read most of the posts in this thread as an operator who may be looking
> at an overlay solution in the future.
>
> So the crux of the discussion is whether to extend the functionality of an
> existing protocol or introduce a brand new protocol.
>
> I would like to see the VNI space extended to 32 bits instead of 24 in
> whatever encapsulation method is being chosen. 24 seems like a holdover
> from the 802.1ah I-SID value and other adapted tunnel protocol limitations
> and I'm not sure it's really necessary anymore.
>
> I also believe there has to be a protocol identifier in the encapsulation
> header identifying what comes next. Static provisioning of this kind of
> information at the endpoints or midpoints in the case of monitoring gear,
> etc. is too cumbersome and not extensible. I think Tom said it initially,
> but I also don't believe inserting an Ethernet header just for the sake of
> it is efficient and the overlay encapsulation protocol should be able to
> encapsulate IP directly.
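Editorial sketch of the fixed-versus-variable header point made earlier in the thread. With a header whose length is known in advance, a vectored read can land the header and the tenant payload in separate buffers; with a per-packet variable length, the whole packet arrives in one buffer and the header has to be scrubbed before the buffer is exposed. This is only an illustration of the API shape, assuming a POSIX system: the 8-byte header size, the function names and the sample bytes are hypothetical, and Python buffers say nothing about page alignment or real zero-copy behaviour.

```python
import os

FIXED_HEADER_LEN = 8  # assumed fixed encapsulation header size (illustrative)


def read_split(fd, max_payload):
    """Fixed-length case: scatter header and payload into separate buffers."""
    header = bytearray(FIXED_HEADER_LEN)
    payload = bytearray(max_payload)
    # POSIX readv(2): buffers are filled in order, header first.
    nread = os.readv(fd, [header, payload])
    payload_len = max(0, nread - FIXED_HEADER_LEN)
    return header, payload[:payload_len]


def scrub_variable_header(packet, header_len):
    """Variable-length case: zero the header in place, return payload offset."""
    packet[:header_len] = bytes(header_len)
    return header_len


def demo():
    r, w = os.pipe()
    try:
        # One small gather-write stands in for a received packet.
        os.writev(w, [b"HDRHDR01", b"tenant-frame"])
        header, payload = read_split(r, 64)
    finally:
        os.close(r)
        os.close(w)
    # Same payload behind a 12-byte header whose length is only known
    # after the packet has been inspected.
    whole = bytearray(b"HDRHDR01OPTS" + b"tenant-frame")
    offset = scrub_variable_header(whole, 12)
    return bytes(header), bytes(payload), offset, bytes(whole)
```

In the first path the payload buffer never contains header bytes; in the second, the consumer receives the whole buffer plus an offset, which is why the header region has to be zeroed on every packet.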
>
> I do not think the metadata should be a part of the encapsulation protocol,
> the encapsulation header should be a fixed length. I think the majority of
> simple overlay networks will not require additional metadata information and
> will likely be using the encapsulation with nothing following it but IP
> packets or Ethernet frames. Having a variable length suffix is just going
> to add implementation headaches for hardware vendors and will be a quick way
> to see it not get adopted, IMHO. If someone needs additional hardware
> support for the next header, whether it be a security integrity header, or
> some sort of additional metadata, let that be sorted out elsewhere.
>
> Just my 2c.
>
> -Phil
>
>
> From: Pankaj Garg <[email protected]>
> Date: Sunday, March 2, 2014 at 2:06 PM
> To: "Larry Kreeger (kreeger)" <[email protected]>, "[email protected]"
> <[email protected]>
> Subject: Re: [nvo3] Comments on Draft Geneve
>
> My responses are inline marked with PG.
>
>
> From: Larry Kreeger (kreeger) [mailto:[email protected]]
> Sent: Sunday, March 2, 2014 9:16 PM
> To: Pankaj Garg; [email protected]
> Subject: Re: Comments on Draft Geneve
>
> My responses are inline marked with LK>. - Larry
>
>
> From: Pankaj Garg <[email protected]>
> Date: Saturday, March 1, 2014 4:22 AM
> To: Larry Kreeger <[email protected]>, "[email protected]" <[email protected]>
> Subject: RE: Comments on Draft Geneve
>
> My comments are inline marked with [PG].
>
>
> From: nvo3 [mailto:[email protected]] On Behalf Of Larry Kreeger
> (kreeger)
> Sent: Saturday, March 1, 2014 3:28 AM
> To: [email protected]
> Subject: [nvo3] Comments on Draft Geneve
>
> I see that a healthy discussion has broken out around draft-gross-geneve-00
> which I see has a slot in the agenda for the NVO3 WG meeting on Monday.
> Here are my thoughts.
>
> I will be comparing Geneve to an encapsulation that is near and dear to my
> heart, VXLAN.
> When I do this, I see an encapsulation that is very similar
> to VXLAN (e.g. uses UDP, uses a 24-bit segment identifier at the same
> offset). I see three things that Geneve adds beyond what is available in
> draft-mahalingam-dutt-dcops-vxlan:
>
> 1) The ability to encapsulate any protocol with an Ethertype (not just
> Ethernet frames), by adding a Protocol Type field. This is certainly
> useful, and has already been covered in draft-quinn-vxlan-gpe as a backward
> compatible extension to VXLAN by using a P bit flag to signal its presence.
> The field is even at the same offset as draft-quinn-vxlan-gpe, but is
> missing the P bit for backwards compatibility.
>
> [PG] The backward compatibility argument is invalid since a frame with P bit
> set (let me call it VXLAN V2) cannot be processed by the older endpoint,
> thus having no backward compatibility.
>
> LK> By backward compatibility, I mean that new implementations of VXLAN
> (VXLAN V2 as you call it) can understand packets sent by older
> implementations (VXLAN V1) as well as from new ones. If older endpoints
> could understand the future bits, I would call that forward compatibility.
>
> [PG] My point was that the VXLAN V2 endpoint would have to support
> generating and understanding VXLAN V1 format packets. Is it much different
> than an endpoint supporting both Geneve and VXLAN V1?
>
> [PG] Essentially, what you are saying is that one can generate packets in
> VXLAN V1 for older endpoint and VXLAN V2 for newer endpoints. So the
> question is, why is VXLAN V2 better than Geneve? In fact, switching on a top
> level UDP port, provides a cleaner processing pipeline.
>
> LK> By enhancing VXLAN, there is no need to get a new UDP port assigned and
> all the current parsing logic for VXLAN V1 can be applied.
>
> [PG] I am not sure if allocating a new port is the meta issue here. The main
> issue here seems to be whether new protocol should _require_ support for
> VXLAN V1 or not.
> Coming from NVGRE side, the same argument would apply to
> Geneve where one can say that Geneve should be backward compatible with
> NVGRE. I feel this might be a slippery slope where a new protocol cannot
> start with a clean slate.
>
> 2) The addition of an OAM bit to signal that the packet should be processed
> by the tunnel endpoint and not forwarded to a tenant. This also seems
> useful, and seems identical in usage to the (IMO, poorly named) "Router
> Alert" bit extension to VXLAN covered in (the currently expired)
> draft-singh-nvo3-vxlan-router-alert.
>
> [PG] Yes, the OAM bit usage is similar. However, this is another extension
> which is incompatible with older implementation of VXLAN thus breaking
> backward compatibility.
>
> LK> Again, I would call what you are referring to "forward compatibility".
>
> 3) Last, but not least is the addition of a variable length options field,
> which the draft suggests is used to carry metadata along with the payload.
> As mentioned by some others, IMO, the encapsulation transport header is not
> the right place to define and carry metadata. Architecturally, metadata
> should be defined independent of transport so it can be carried inside of
> whatever transport is desired (e.g. VXLAN, NVGRE, MPLSoGRE, L2TPV3 etc).
> One example of an effort to do this is in the Network Service Header draft
> (draft-quinn-sfc-nsh) being discussed in the SFC WG. I am guessing that
> since the Geneve options field is optional, that the metadata it contains is
> not related to basic network connectivity, but more to providing higher
> level network services (aka Service Functions). The Network Service Header
> contains two separate parts, the service path (used to guide the packets
> through the service chain) and context (metadata). I can certainly see the
> context part of NSH being used to carry metadata even if the service chain
> is null (all services are fully distributed to the tunnel endpoints).
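Editorial sketch of the three additions listed above (Protocol Type field, OAM bit, variable length options), written as a parser next to a plain VXLAN one. The Geneve bit layout used here is taken from the later published specification (RFC 8926); it is an assumption that draft-gross-geneve-00, the version discussed in this thread, used the same layout. The function and class names are hypothetical.

```python
import struct
from typing import NamedTuple

BASE_HEADER_LEN = 8  # both VXLAN and the Geneve base header are 8 bytes


class GeneveHeader(NamedTuple):
    version: int
    opt_len: int         # options length in bytes
    oam: bool            # O bit: handled by the tunnel endpoint
    critical: bool       # C bit: critical options present
    protocol: int        # Ethertype of the encapsulated payload
    vni: int             # 24-bit virtual network identifier
    payload_offset: int  # where the tenant payload starts


def parse_geneve(buf: bytes) -> GeneveHeader:
    """Parse a Geneve base header (layout assumed from RFC 8926)."""
    if len(buf) < BASE_HEADER_LEN:
        raise ValueError("buffer shorter than base header")
    b0, b1, protocol, word2 = struct.unpack_from("!BBHI", buf, 0)
    opt_len = (b0 & 0x3F) * 4  # Opt Len counts 4-byte words
    if len(buf) < BASE_HEADER_LEN + opt_len:
        raise ValueError("buffer shorter than declared options")
    return GeneveHeader(
        version=b0 >> 6,
        opt_len=opt_len,
        oam=bool(b1 & 0x80),
        critical=bool(b1 & 0x40),
        protocol=protocol,
        vni=word2 >> 8,  # low 8 bits of the second word are reserved
        payload_offset=BASE_HEADER_LEN + opt_len,
    )


def parse_vxlan_vni(buf: bytes) -> int:
    """Parse the VNI from a VXLAN header: flags, 24 reserved bits, VNI, reserved."""
    if len(buf) < BASE_HEADER_LEN:
        raise ValueError("buffer shorter than VXLAN header")
    flags, word2 = struct.unpack_from("!B3xI", buf, 0)
    if not flags & 0x08:
        raise ValueError("VNI-valid (I) flag not set")
    return word2 >> 8
```

Both parsers read the VNI from the same place in the second 32-bit word, which is the "same offset" observation made above. Only the Geneve payload offset depends on a per-packet field, which is the property the variable-length objections in this thread are about.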
>
> [PG] The meta-data should be defined by their respective group. Different
> encapsulation protocols can carry those meta-data in their headers as
> needed. One clear example of how Geneve is better is that it can carry that
> meta-data without breaking hardware offloads, whereas VXLAN and NVGRE cannot
> do that. Btw I want to be clear, Geneve is not defining the meta-data, and
> it is not tying meta-data to Geneve, it is only defining a general purpose
> ability to carry meta-data, which is tremendously useful to have in the
> encapsulation header.
>
> On a side note, I don’t believe that the design of NSH is suitable for
> carrying general purpose meta-data. In fact in its current definition, it is
> not defining service chaining primitives clearly either, however we can
> discuss that in SFC forum, and focus the discussion on encapsulation header
> in this forum.
>
> In short, I don't see anything in Geneve that cannot be accomplished by
> using the backward compatible extensions to VXLAN proposed in
> draft-quinn-vxlan-gpe and draft-singh-nvo3-vxlan-router-alert, combined with
> the addition of NSH.
>
> [PG] Yes, one can put multiple (incompatible) extensions on top of VXLAN,
> and achieve many things that Geneve is supporting. But at that point, aren’t
> we creating a new encapsulation format altogether? This new protocol with
> all such extensions would require new hardware, new software, break existing
> NIC offloads etc. and still carry the legacy baggage with no clear
> advantage. At that point, I am not sure, why it is better?
>
> LK> As I wrote above, extending VXLAN allows the same UDP port to be used
> and reuse of the existing VXLAN parsing logic.
>
> [PG] The crux of the discussion seems to be, whether Geneve should have a
> mode that is compatible with VXLAN V1 or not. Even though it might be a
> slippery slope, I think it is something to think about and debate further.
>
> When the current NVO3 WG charter was being written, there seemed to be
> consensus that we have no shortage of encapsulation options, but what was
> lacking was a standard control plane. The Geneve draft seems to turn that
> on its head by saying "There is a clear advantage in settling on a data
> format: most of the protocols are only superficially different and there is
> little advantage in duplicating effort. However, the same cannot be said of
> control planes, which are diverse in very fundamental ways. The case for
> standardization is also less clear given the wide variety in requirements,
> goals, and deployment scenarios.". I agree with the first part of this, so
> why define a completely new, non-backward compatible encapsulation? I
> disagree with the second part, since this is clearly the goal of the NVO3
> WG.
>
> I see that there is an agenda slot to discuss the Geneve draft, but I'm not
> clear what the goals are of the authors within the IETF since the draft name
> does not target it to any particular WG, and it is currently marked as
> "Informational". I would suggest that the authors consider extending
> currently implemented encapsulations rather than starting from scratch, e.g.
> by moving a few bits around in the first word of the Geneve header, it could
> be made backward compatible with VXLAN.
>
> Thanks, Larry
>
> _______________________________________________
> nvo3 mailing list
> [email protected]
> https://www.ietf.org/mailman/listinfo/nvo3
