On 3/25/14, 8:10 AM, Tom Herbert wrote:
Tom,
please note that the VXLAN-GPE draft says that a GPE device must not send
non-ethernet frames to a VXLAN device (Section 4.2), exactly to avoid the
problem you describe.
Unfortunately, that requirement conflicts with the robustness
principle. In a full-scale deployment, it might be feasible, with a
whole bunch of control plane logic, to enforce the rule
between communicating end hosts, but that still wouldn't account for
middleboxes somewhere in the path that have implemented VXLAN
functionality.
Right, the hard part is the incremental deployment of a new solution on
top of the existing VXLAN implementations. The draft is just suggesting
a way to do that incremental deployment, with some known limitations.
GPE tries to play with the current definition of 'reserved' bits in
VXLAN, using that as a way to transition existing VXLAN fabrics to GPE
fabrics that will support multiple protocol encapsulation (IPx,
ethernet) and metadata.
Note, this is not the only potential incompatibility issue with VXLAN.
Every new flag defining a new field would create another instance of
incompatibility. This is not just hypothetical: we have already
demonstrated that adding a new field to GRE breaks hashing in
switches. This problem also exists with NVGRE.
In fact, one could argue that every new flag added after the protocol
definition is indeed a bit of a new distributed 'version' field. One
could explicitly use a couple of bits as a version, or define the new
protocol in a way that a new flag will raise an exception in older
implementations.
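
To make the "new flag raises an exception" idea concrete, here is a minimal Python sketch of a receiver that rejects any flag bit it does not know. It assumes the 8-byte VXLAN header layout (one flags byte, 24 reserved bits, 24-bit VNI, 8 reserved bits). The strict check is a deliberate departure from VXLAN as specified, where receivers are told to ignore reserved bits. The names parse_vxlan_strict and UnknownFlagError are made up for illustration.

```python
import struct

VXLAN_FLAG_I = 0x08          # "VNI present" flag in the VXLAN flags byte
KNOWN_FLAGS = VXLAN_FLAG_I   # every flag this (older) implementation understands


class UnknownFlagError(ValueError):
    """Raised when the header carries flag bits this parser does not know."""


def parse_vxlan_strict(header: bytes) -> int:
    """Return the 24-bit VNI, or raise if any unknown flag bit is set."""
    if len(header) < 8:
        raise ValueError("a VXLAN header is 8 bytes long")
    # 1 flags byte, 3 reserved bytes (skipped), then VNI (24 bits) + reserved (8 bits)
    flags, vni_and_reserved = struct.unpack("!B3xI", header[:8])
    unknown = flags & ~KNOWN_FLAGS & 0xFF
    if unknown:
        # Strict behaviour (an assumption, not what VXLAN mandates):
        # any later-defined flag acts like a version bump and is rejected.
        raise UnknownFlagError(f"unknown flag bits 0x{unknown:02x}")
    if not flags & VXLAN_FLAG_I:
        raise ValueError("I flag not set, VNI is not valid")
    return vni_and_reserved >> 8
```

With a parser like this, a sender that sets a hypothetical new flag (say 0x04) gets its packet dropped by older receivers instead of having it misread as an Ethernet frame.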
Also note that the draft is focusing on deployments where VXLAN is already
in use, and GPE is introduced incrementally. Hence the use of the same UDP
port for VXLAN and GPE.
I suspect that in the deployment you describe, one could disambiguate GPE
from VXLAN by using two different destination UDP ports. If you still want
to have backward compatibility, the sending GPE device will have to know the
receiver's capability (VXLAN or GPE), and pick the appropriate destination
UDP port.
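
A rough Python sketch of that sender-side choice follows. It assumes a per-peer capability table (peer_caps, hypothetical, e.g. learned from the control plane) and a separate GPE port. 4789 is the IANA-assigned VXLAN port; 4790 is used for GPE here purely as an illustrative assumption. The check on non-Ethernet payloads mirrors the Section 4.2 rule mentioned above.

```python
from enum import Enum

VXLAN_PORT = 4789   # IANA-assigned VXLAN destination port
GPE_PORT = 4790     # assumption for this sketch: a distinct port for GPE


class Encap(Enum):
    VXLAN = "vxlan"
    GPE = "gpe"


def select_encap(peer_caps: dict, peer: str, payload_is_ethernet: bool):
    """Pick the encapsulation and destination UDP port for one receiver."""
    # Peers with unknown capability are treated as legacy VXLAN devices.
    encap = peer_caps.get(peer, Encap.VXLAN)
    if encap is Encap.VXLAN and not payload_is_ethernet:
        # A GPE sender must not send non-Ethernet frames to a VXLAN device.
        raise ValueError(f"{peer} only understands Ethernet-in-VXLAN")
    port = GPE_PORT if encap is Encap.GPE else VXLAN_PORT
    return encap, port
```

For example, select_encap({"10.0.0.2": Encap.GPE}, "10.0.0.2", False) picks GPE and the GPE port, while the same call for an unlisted peer raises instead of emitting a frame the receiver would misinterpret.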
At that point it becomes a different protocol.
Right. It is indeed a different protocol that is trying to coexist with
the reality of networks where VXLAN (and LISP, by the way, as specified
in the companion doc http://tools.ietf.org/html/draft-lewis-lisp-gpe) is
already deployed.
I believe a generic and extensible encapsulation protocol needs three
fundamental elements:
1) Type-version-- so that new (incompatible) formats can be safely defined
I think Type-version doesn't buy you much in terms of backward
compatibility (compatibility of a newer device with an older device):
you just can't change the VXLAN specification.
It helps a bit with forward compatibility (compatibility with future
versions of the protocol) to the limited extent that older
implementations will have to take a specific action (drop most likely)
for newer versions.
However, I think you can do 'versioning' in various ways: using a
different UDP port, with an explicit version field, or using the
combination of the reserved bits as a version field. Given that reserved
bits have proven over the years to be hot real estate, GPE is not using
an explicit version field, but does support versioning.
2) Protocol type-- type of encapsulated packets
3) Header length-- offset of next header can be determined
independently of any other elements in the encapsulation.
The semantics of these elements should be invariant (just like in IP).
Protocol type always defines the type of the encapsulated packet, and
header length is always its offset. If an alternate interpretation of the
payload is needed which does not correspond to a protocol type (like
an OAM message) this should be in a separate type.
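
As an illustration of how these three elements fit together, here is a minimal Python sketch of a parser for a made-up 4-byte base header. The layout is hypothetical and is not taken from GUE, VXLAN, or GPE: byte 0 holds a 4-bit version and a 4-bit header length in 32-bit words, byte 1 holds flags, and bytes 2-3 hold the protocol type as an EtherType.

```python
import struct
from typing import NamedTuple

SUPPORTED_VERSION = 0   # the only version this sketch understands


class BaseHeader(NamedTuple):
    version: int
    hlen_bytes: int   # total header length, optional fields included
    flags: int
    proto: int        # type of the encapsulated packet (EtherType here)


def parse_base_header(pkt: bytes) -> BaseHeader:
    """Parse the hypothetical base header; raise ValueError on anything unexpected."""
    if len(pkt) < 4:
        raise ValueError("packet shorter than the base header")
    ver_hlen, flags, proto = struct.unpack("!BBH", pkt[:4])
    version = ver_hlen >> 4
    hlen_bytes = (ver_hlen & 0x0F) * 4
    if version != SUPPORTED_VERSION:
        # Forward compatibility: a newer format is dropped, never misparsed.
        raise ValueError(f"unsupported version {version}")
    if hlen_bytes < 4 or hlen_bytes > len(pkt):
        raise ValueError(f"bad header length {hlen_bytes}")
    return BaseHeader(version, hlen_bytes, flags, proto)


def split_payload(pkt: bytes):
    """Return (protocol type, payload) without interpreting any optional fields."""
    hdr = parse_base_header(pkt)
    return hdr.proto, pkt[hdr.hlen_bytes:]
```

The point of the sketch is that split_payload finds the inner packet from the version, length, and protocol type alone, so a device that does not understand the optional fields can still skip over them.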
Or you could use a flag for the last one (OAM).
All of those features come with a cost, and can be implemented in
different ways. GPE is trying to use an approach that is close to VXLAN
(and LISP) so that the incremental cost of implementing GPE+VXLAN+LISP
on the same device is low enough to give it a chance to be deployed.
Please look at http://tools.ietf.org/html/draft-herbert-gue-01 for reference.
That's a great write up. Thanks especially for the appendix that
articulates very well the motivations that are driving the effort of
using the optimization provided by current NICs.
I think the design of the protocol would benefit from a better
separation of the network virtualization layer and the metadata layer.
It would allow each implementation (at the end host or in the network)
to implement part of the specification independently, and would
eventually help with adoption. I think with a better layering you could
also take advantage of other well established features (such as
security, for example) that you may want to reuse, rather than reinvent.
Fabio
Thanks,
Tom
Regards,
Fabio
_______________________________________________
nvo3 mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/nvo3