I have some comments/suggestions on draft-drake-nvo3-evpn-control-plane
(some of these suggestions have been made privately to the
draft-marques-l3vpn-end-system authors as well):
1) I would suggest *not* altering the semantics of the MPLS label in the
BGP route. Instead, use the route distinguisher to carry the 24-bit VNID
(this is arguably better since the semantics of the RD align better with
the semantics of the VNID). I would suggest encoding this as a type 0 RD,
with the VNID going into the Assigned number sub-field. In addition, call
out that an MPLS label value of 0 in the BGP route is a valid value, and
will be used by PEs which do not support MPLS encap.
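To make the encoding in (1) concrete, here is a minimal Python sketch of a type 0 RD (2-byte type, 2-byte administrator, 4-byte assigned number) carrying the VNID in the Assigned Number sub-field. The function names and the example AS number are illustrative assumptions, not anything defined in the draft.

```python
import struct

RD_TYPE_0 = 0  # type 0 RD: 2-byte administrator (AS number) + 4-byte assigned number


def encode_rd_type0(asn: int, vnid: int) -> bytes:
    """Pack a type 0 RD with the 24-bit VNID in the Assigned Number sub-field."""
    if not 0 <= asn <= 0xFFFF:
        raise ValueError("type 0 RD needs a 2-byte AS number")
    if not 0 <= vnid <= 0xFFFFFF:
        raise ValueError("VNID must fit in 24 bits")
    return struct.pack("!HHI", RD_TYPE_0, asn, vnid)


def decode_vnid(rd: bytes) -> int:
    """Recover the VNID from a type 0 RD built by encode_rd_type0()."""
    rd_type, _asn, assigned = struct.unpack("!HHI", rd)
    if rd_type != RD_TYPE_0:
        raise ValueError("not a type 0 RD")
    return assigned


# Example values are assumptions: a private-use AS number and an arbitrary VNID.
rd = encode_rd_type0(asn=64512, vnid=0x123456)
```

The 4-byte Assigned Number sub-field leaves the top 8 bits unused when it carries a 24-bit VNID.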
2) I've separately mentioned that the draft should call out that the PE
control plane need not be co-located with the PE forwarding plane, and
that XMPP could be used as the messaging format between the two. Each
endpoint update should look like this: {endpoint_mac, {endpoint_ips}, {NVE
IPs}, RD, label}. This allows for true decoupling of the control plane
from the frame format.
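As a rough sketch of the endpoint update tuple in (2), the record could be modelled as below. The field names follow the tuple above. The JSON serialization is only a stand-in I am assuming for illustration, since the actual XMPP payload format is not defined here.

```python
import json
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass(frozen=True)
class EndpointUpdate:
    """One endpoint update: {endpoint_mac, {endpoint_ips}, {NVE IPs}, RD, label}."""
    endpoint_mac: str
    endpoint_ips: Tuple[str, ...]
    nve_ips: Tuple[str, ...]
    rd: str          # textual RD, e.g. "<asn>:<vnid>" (representation is an assumption)
    label: int = 0   # 0 for PEs that do not support MPLS encap


def to_payload(update: EndpointUpdate) -> str:
    """Serialize to JSON as a placeholder for whatever XMPP stanza is chosen."""
    return json.dumps(asdict(update), sort_keys=True)


update = EndpointUpdate(
    endpoint_mac="00:00:5e:00:53:01",
    endpoint_ips=("203.0.113.10",),
    nve_ips=("192.0.2.1",),
    rd="64512:1193046",
)
payload = to_payload(update)
```

Nothing in the record names an encapsulation, which is the point of the decoupling.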
3) Even though it's not called out explicitly, the model used in both
draft-drake-nvo3-evpn-control-plane and draft-marques-l3vpn-end-system
assumes that the PE forwarding plane is interested in having all of the
endpoint routes in their participating VNIDs. In my view, this puts an
unnecessary load on NVEs. Instead, we can modify XMPP so that the NVE can
request endpoint resolution of specific addresses within a VNID, so it
only ever needs to cache information about flows that are transiting it.
The issue here is how to push endpoint updates to the NVE when endpoints
move (since having the PE keep track of which update to push to which NVE
is unreasonable). My proposal here is to place the onus of requesting
updates on the NVE, possibly triggered by the receipt of an ICMP error
message. In other words, mandate that when an NVE receives a packet that
after decapsulation is found to belong to an end-host that is no longer
present in the attached customer network, it generates an ICMP error (rate
controlled) back to source, taking care to include in the ICMP error
packet enough of the payload so that the remote PE can figure out the
customer endpoints that were attempting to communicate. On receipt of such
an ICMP error, the NVE can extract the endpoint information from the
payload and ask the control plane for an endpoint update.
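The pull model in (3) can be sketched as a small on-demand cache on the NVE. This is only an illustration under assumptions: the class and method names are hypothetical, the resolve callback stands in for the XMPP request to the control plane, and parsing of the ICMP error payload is assumed to have already produced the (VNID, MAC) pair.

```python
from typing import Callable, Dict, Optional, Tuple

Key = Tuple[int, str]  # (VNID, endpoint MAC)


class NveCache:
    """Caches only the endpoints this NVE has actually needed to reach."""

    def __init__(self, resolve: Callable[[int, str], Optional[str]]):
        self._resolve = resolve            # stand-in for the control-plane query
        self._cache: Dict[Key, str] = {}

    def lookup(self, vnid: int, mac: str) -> Optional[str]:
        """Return the remote NVE IP, querying the control plane on a miss."""
        key = (vnid, mac)
        if key not in self._cache:
            nve_ip = self._resolve(vnid, mac)
            if nve_ip is None:
                return None
            self._cache[key] = nve_ip
        return self._cache[key]

    def on_icmp_error(self, vnid: int, mac: str) -> Optional[str]:
        """The remote NVE says the endpoint moved: drop the entry and re-resolve."""
        self._cache.pop((vnid, mac), None)
        return self.lookup(vnid, mac)


# A dict plays the role of the control plane in this example.
control_plane = {(10, "00:00:5e:00:53:01"): "192.0.2.1"}
cache = NveCache(lambda vnid, mac: control_plane.get((vnid, mac)))
```

The control plane never has to remember which NVE cached what; a stale entry lives only until the first ICMP error comes back.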
4) I would also suggest not having the NVE keep track of the encapsulation
used by the remote endpoint (this means that the tunnel encapsulation
attribute in the draft would be unnecessary). Instead, the onus of
translating between encapsulation methods should be on gateways. If you
define the XMPP format well, you should be able to communicate endpoint
information in a way that is agnostic of the encap method used by the NVE,
allowing it to do the one encap it does best. A gateway can do this
translation without BGP control plane intervention, because it would be
configured to have interfaces that are (for example) NVGRE on one arm and
on the other, and it would be obvious as to what encap to put on a packet
going from one arm to the other. Applying an MPLS label would involve the
gateway participating in BGP.
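To illustrate why the gateway in (4) needs no control-plane help for this case, here is a sketch of rewriting a VXLAN header into an NVGRE header on the way from one arm to the other. The header layouts reflect my understanding of the VXLAN and NVGRE specifications (8-byte VXLAN header with a 24-bit VNI; GRE header with the Key Present bit, protocol type 0x6558, and a 24-bit VSID plus 8-bit FlowID). Mapping the VNI directly onto the VSID is an assumption, and the outer IP/UDP headers are left out.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08   # "I" flag in the first byte of the VXLAN header
GRE_KEY_PRESENT = 0x2000      # K bit in the GRE flags/version field
ETHERTYPE_TEB = 0x6558        # Transparent Ethernet Bridging


def vxlan_to_nvgre(vxlan_header: bytes, flow_id: int = 0) -> bytes:
    """Translate an 8-byte VXLAN header into an 8-byte NVGRE (GRE + key) header."""
    word0, word1 = struct.unpack("!II", vxlan_header)
    if not (word0 >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VXLAN header does not carry a valid VNI")
    vni = word1 >> 8                       # upper 24 bits of the second word
    key = (vni << 8) | (flow_id & 0xFF)    # VSID (assumed equal to VNI) + FlowID
    return struct.pack("!HHI", GRE_KEY_PRESENT, ETHERTYPE_TEB, key)


# Example VXLAN header carrying VNI 0x123456.
vxlan_header = struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, 0x123456 << 8)
nvgre_header = vxlan_to_nvgre(vxlan_header)
```

The choice of output encap follows purely from which arm the packet leaves on, so no per-endpoint encapsulation state is needed.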
5) The multi-homing discussed in the draft only covers the case where the
CE devices are separated from the PE devices by physical links.
In the case where the PE forwarding plane is implemented in the
hypervisor, the more important multi-homing question is what to do if the
NVE is connected to two or more upstream devices (basically, the NVE has
two or more IP addresses). What I would like is for a MAC address route
to be associated with multiple NVE addresses. I believe this is possible
using the framework established in draft-raggarwa-sajassi-l2vpn-evpn, but
it might be worthwhile to call out this case in
draft-drake-nvo3-evpn-control-plane. This is useful because
draft-raggarwa-sajassi-l2vpn-evpn treats multiple PE IP addresses
associated with the same MAC address as belonging to separate MESs, and
assumes that the CE-PE links will be labelled with an Ethernet segment,
which is not the case for the hypervisor NVE/PE.
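For (5), a sender-side view of one MAC route with several NVE addresses might look like the sketch below. The table layout, the function name, and the hash-based choice among reachable addresses are all assumptions for illustration. Neither draft specifies how a sender should pick among the addresses.

```python
import zlib
from typing import Dict, FrozenSet, List, Optional, Tuple

# Hypothetical table: one MAC route in a VNID maps to several NVE addresses.
MacRoutes = Dict[Tuple[int, str], List[str]]


def pick_nve(routes: MacRoutes, vnid: int, mac: str, flow_key: bytes,
             down: FrozenSet[str] = frozenset()) -> Optional[str]:
    """Choose one NVE address for a flow, skipping addresses known to be down."""
    candidates = [ip for ip in routes.get((vnid, mac), []) if ip not in down]
    if not candidates:
        return None
    # Hash the flow so packets of one flow keep using the same NVE address.
    return candidates[zlib.crc32(flow_key) % len(candidates)]


routes: MacRoutes = {(10, "00:00:5e:00:53:01"): ["192.0.2.1", "198.51.100.1"]}
```

If one upstream path fails, the same route still resolves to the surviving NVE address without any change to the MAC route itself.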
Long email, thanks for reading!
--
Sunny
p.s. John, your email address in the draft is incorrect.