On 3/25/14, 8:10 AM, Tom Herbert wrote:
Tom,
please note that the VXLAN-GPE draft says that a GPE device must not send
non-Ethernet frames to a VXLAN device (Section 4.2), exactly to avoid the
problem you describe.

Unfortunately, that requirement conflicts with the robustness
principle. In a full-scale deployment, it might be feasible to
enforce the rule between communicating end hosts with a whole bunch
of control plane logic, but that still wouldn't account for
middleboxes somewhere in the path that have implemented VXLAN
functionality.

Right, the hard part is the incremental deployment of a new solution on top of the existing VXLAN implementations. The draft is just suggesting a way to do that incremental deployment, with some known limitations. GPE tries to play with the current definition of 'reserved' bits in VXLAN, using that as a way to transition existing VXLAN fabrics to GPE fabrics that will support multiple protocol encapsulation (IPx, Ethernet) and metadata.


Note, this is not the only potential incompatibility issue with VXLAN.
Every new flag defining a new field would create another instance of
incompatibility. This is not just hypothetical; we have already
demonstrated that adding a new field to GRE breaks hashing in
switches. This problem also exists with NVGRE.

In fact, one could argue that every new flag added after the protocol definition is indeed a bit of a new distributed 'version' field. One could explicitly use a couple of bits as a version, or define the new protocol in a way that a new flag will raise an exception in older implementations.
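
A minimal sketch of the "new flag raises an exception in older implementations" idea, assuming a hypothetical strict receiver that treats every flag bit it does not know as a reason to drop the frame. Only the I flag value (0x08) comes from the VXLAN specification; the function name and the drop policy are illustrative assumptions, not the behaviour of deployed VXLAN receivers or of either draft.

```python
I_FLAG = 0x08         # VXLAN "valid VNI" flag bit (RFC 7348)
KNOWN_FLAGS = I_FLAG  # every other flag bit is unknown to this receiver


def check_flags(flags: int) -> str:
    """Classify the 8-bit flags field; return 'accept' or 'drop'."""
    unknown = flags & 0xFF & ~KNOWN_FLAGS
    if unknown:
        # A bit defined after this receiver was written acts like a
        # version bump: refuse the frame instead of guessing its layout.
        return "drop"
    if not flags & I_FLAG:
        # Without the I flag the VNI field is not valid.
        return "drop"
    return "accept"
```

With this policy, a frame whose flags are 0x0C (the I bit plus one bit this receiver does not know) is dropped rather than parsed as if it carried an Ethernet payload.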


Also note that the draft is focusing on deployments where VXLAN is already
in use, and GPE is introduced incrementally. Hence the use of the same UDP
port for VXLAN and GPE.

I suspect that in the deployment you describe, one could disambiguate GPE
from VXLAN by using two different destination UDP ports. If you still want
to have backward compatibility, the sending GPE device will have to know
the receiver's capability (VXLAN or GPE), and pick the appropriate
destination UDP port.
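
The port-selection idea above can be sketched as follows, assuming the sender keeps a per-peer capability table. The table, the function name, and the separate GPE port value are assumptions made for illustration and are not taken from the draft; 4789 is the IANA-assigned VXLAN destination port.

```python
from enum import Enum


class Capability(Enum):
    VXLAN = "vxlan"
    GPE = "gpe"


VXLAN_UDP_PORT = 4789  # IANA-assigned VXLAN destination port
GPE_UDP_PORT = 4790    # assumption for illustration: a distinct port for GPE


def pick_destination_port(peer_caps, peer, payload_is_ethernet):
    """Return the UDP destination port for `peer`, or None if the frame
    must not be sent to that peer at all."""
    # Peers with unknown capability are conservatively treated as VXLAN-only.
    cap = peer_caps.get(peer, Capability.VXLAN)
    if cap is Capability.GPE:
        return GPE_UDP_PORT
    if payload_is_ethernet:
        return VXLAN_UDP_PORT
    # Non-Ethernet payload toward a VXLAN-only peer: do not send.
    return None
```

The sketch only covers the two end points; it does nothing for a middlebox in the path that parses VXLAN, which is the concern raised earlier in the thread.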

At that point it becomes a different protocol.

Right. It is indeed a different protocol that is trying to coexist with the reality of networks where VXLAN (and LISP, by the way, as specified in the companion doc http://tools.ietf.org/html/draft-lewis-lisp-gpe) is already deployed.


I believe a generic and extensible encapsulation protocol needs three
fundamental elements:

1) Type-version-- so that new (incompatible) formats can be safely defined

I think Type-version doesn't buy you much in terms of backward compatibility (compatibility of a newer device with an older device): you just can't change the VXLAN specification.

It helps a bit with forward compatibility (compatibility with future versions of the protocol) to the limited extent that older implementations will have to take a specific action (drop most likely) for newer versions.

However, I think you can do 'versioning' in various ways: using a different UDP port, with an explicit version field, or using the combination of the reserved bits as a version field. Given that reserved bits have proven over the years to be hot real estate, GPE is not using an explicit version field, but does support versioning.

2) Protocol type-- type of encapsulated packets
3) Header length-- offset of next header can be determined
independently of any other elements in the encapsulation.

The semantics of these elements should be invariant (just like in IP).
Protocol type always defines the type of the encapsulated packet, and
header length is always the offset to it. If an alternate interpretation
of the payload is needed which does not correspond to a protocol type
(like an OAM message) this should be in a separate type.

Or you could use a flag for the last one (OAM).
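
To illustrate the three elements listed above (version, protocol type, header length), here is a minimal parsing sketch. The 4-byte layout is hypothetical and chosen only for readability: one byte of version, one byte of header length in 4-byte words, and a 16-bit protocol type. It is not the GUE, GPE, or VXLAN wire format.

```python
import struct

# Hypothetical base header: version, header length (4-byte words), protocol type.
BASE = struct.Struct("!BBH")
SUPPORTED_VERSION = 0


def parse(packet: bytes):
    """Return (protocol_type, payload) or raise ValueError."""
    if len(packet) < BASE.size:
        raise ValueError("truncated header")
    version, hlen_words, proto = BASE.unpack_from(packet)
    if version != SUPPORTED_VERSION:
        # An incompatible format is rejected up front, never misparsed.
        raise ValueError("unsupported version %d" % version)
    offset = hlen_words * 4
    if offset < BASE.size or offset > len(packet):
        raise ValueError("bad header length")
    # The payload offset depends only on the header length field, so
    # optional fields this receiver does not understand are skipped.
    return proto, packet[offset:]
```

For example, a packet built as `struct.pack("!BBH", 0, 2, 0x0800)` followed by four bytes of optional data and then the inner packet parses to protocol type 0x0800 and the inner packet, without the receiver needing to understand the optional data.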

All of those features come with a cost, and can be implemented in different ways. GPE is trying to use an approach that is close to VXLAN (and LISP) so that the incremental cost of implementing GPE+VXLAN+LISP on the same device is low enough to give it a chance of being deployed.

Please see http://tools.ietf.org/html/draft-herbert-gue-01 for reference.

That's a great write-up. Thanks especially for the appendix, which articulates very well the motivations driving the effort to use the optimizations provided by current NICs.

I think the design of the protocol would benefit from a better separation of the network virtualization layer and the metadata layer. It would allow each implementation (at the end host or in the network) to independently implement part of the specification, and would eventually help with adoption. I think with better layering you could also take advantage of other well-established features (such as security, for example) that you may want to reuse, rather than reinvent.


Fabio



Thanks,
Tom


Regards,
Fabio



A.
_______________________________________________
nvo3 mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/nvo3