I have some comments/suggestions on draft-drake-nvo3-evpn-control-plane 
(some of these suggestions have been made privately to the 
draft-marques-l3vpn-end-system authors as well):

1) I would suggest *not* altering the semantics of the MPLS label in the 
BGP route. Instead, use the route distinguisher to carry the 24-bit VNID 
(this is arguably better since the semantics of the RD align better with 
the semantics of the VNID). I would suggest encoding this as a type 0 RD, 
with the VNID going into the Assigned Number sub-field (4 octets in a 
type 0 RD, so a 24-bit VNID fits). In addition, call 
out that an MPLS label value of 0 in the BGP route is a valid value, and 
will be used by PEs which do not support MPLS encap. 
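
To make the encoding in (1) concrete, here is a minimal sketch in Python. It assumes the standard type 0 RD layout (2-octet type, 2-octet Administrator sub-field holding an AS number, 4-octet Assigned Number sub-field); the function names are mine and are not taken from either draft.

```python
import struct
from typing import Tuple

RD_TYPE_0 = 0  # 2-octet Administrator (AS number) + 4-octet Assigned Number


def encode_rd_type0(asn: int, vnid: int) -> bytes:
    """Pack an 8-octet type 0 RD with the VNID in the Assigned Number sub-field."""
    if not 0 <= asn <= 0xFFFF:
        raise ValueError("type 0 RD needs a 2-octet AS number")
    if not 0 <= vnid <= 0xFFFFFF:
        raise ValueError("VNID must fit in 24 bits")
    # Network byte order: type (2), administrator (2), assigned number (4).
    return struct.pack("!HHI", RD_TYPE_0, asn, vnid)


def decode_rd_type0(rd: bytes) -> Tuple[int, int]:
    """Return (asn, vnid) from an 8-octet type 0 RD."""
    rd_type, asn, assigned = struct.unpack("!HHI", rd)
    if rd_type != RD_TYPE_0:
        raise ValueError("not a type 0 RD")
    return asn, assigned
```

The upper octet of the Assigned Number sub-field is simply left as zero here; whether it should be reserved for something else is a question for the draft.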

2) I've separately mentioned that the draft should call out that the PE 
control plane need not be co-located with the PE forwarding plane, and 
that XMPP could be used as the messaging format between the two. Each 
endpoint update should look like this: {endpoint_mac, {endpoint_ips}, 
{NVE IPs}, RD, label}. This allows for true decoupling of the control plane 
from the frame format.
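
As a rough illustration of the update tuple in (2), the record could be modelled like this. This is only a sketch: the class and field names are hypothetical, and it says nothing about how the record would actually be serialized over XMPP.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class EndpointUpdate:
    """One endpoint update: {endpoint_mac, {endpoint_ips}, {NVE IPs}, RD, label}."""
    endpoint_mac: str
    endpoint_ips: FrozenSet[str]
    nve_ips: FrozenSet[str]   # more than one entry if the NVE has several addresses
    rd: bytes                 # 8-octet route distinguisher
    label: int = 0            # 0 is treated as valid (PE without MPLS encap)

    def __post_init__(self) -> None:
        if len(self.rd) != 8:
            raise ValueError("RD must be 8 octets")
        if not 0 <= self.label <= 0xFFFFF:
            raise ValueError("MPLS label must fit in 20 bits")


update = EndpointUpdate(
    endpoint_mac="00:11:22:33:44:55",
    endpoint_ips=frozenset({"10.0.0.5"}),
    nve_ips=frozenset({"192.0.2.1", "192.0.2.2"}),
    rd=bytes.fromhex("0000fc0000001388"),
)
```

Nothing in the record names an encapsulation, which is the point: the NVE can apply whatever encap it supports.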

3) Even though it's not called out explicitly, the model used in both 
draft-drake-nvo3-evpn-control-plane and draft-marques-l3vpn-end-system 
assumes that the PE forwarding plane is interested in having all of the 
endpoint routes in their participating VNIDs. In my view, this puts an 
unnecessary load on NVEs. Instead, we can modify XMPP so that the NVE can 
request endpoint resolution of specific addresses within a VNID, so it 
only ever needs to cache information about flows that are transiting it. 

The issue here is how to push endpoint updates to the NVE when endpoints 
move (since having the PE keep track of which update to push to which NVE 
is unreasonable). My proposal here is to place the onus of requesting 
updates on the NVE, possibly triggered by the receipt of an ICMP error 
message. In other words, mandate that when an NVE receives a packet that, 
after decapsulation, is found to be addressed to an end-host that is no 
longer present in the attached customer network, it generates a 
rate-limited ICMP error back to the source, taking care to include in the 
ICMP error packet enough of the payload so that the remote PE can figure 
out the customer endpoints that were attempting to communicate. On receipt 
of such an ICMP error, the NVE can extract the endpoint information from 
the payload and request an endpoint update from the control plane.
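
The NVE-side behaviour in (3) could look roughly like the sketch below. The `resolve` callback is a hypothetical stand-in for the XMPP resolution request, and the class and method names are mine; parsing the ICMP payload and rate-limiting the errors are left out.

```python
from typing import Callable, Dict, FrozenSet, Optional, Tuple

Key = Tuple[int, str]  # (VNID, endpoint MAC)
Resolver = Callable[[int, str], Optional[FrozenSet[str]]]


class NveEndpointCache:
    """Caches only the endpoints this NVE has actually needed to reach."""

    def __init__(self, resolve: Resolver) -> None:
        self._resolve = resolve  # asks the control plane for the NVE IPs
        self._cache: Dict[Key, FrozenSet[str]] = {}

    def lookup(self, vnid: int, mac: str) -> Optional[FrozenSet[str]]:
        """Return the NVE IPs for an endpoint, resolving on first use."""
        key = (vnid, mac)
        if key not in self._cache:
            nve_ips = self._resolve(vnid, mac)
            if nve_ips is None:
                return None  # unknown endpoint; nothing is cached
            self._cache[key] = nve_ips
        return self._cache[key]

    def on_icmp_error(self, vnid: int, mac: str) -> Optional[FrozenSet[str]]:
        """Drop the stale entry named in the ICMP payload and re-resolve it."""
        self._cache.pop((vnid, mac), None)
        return self.lookup(vnid, mac)
```

With this shape the control plane keeps no per-NVE state: a stale entry lives until the first ICMP error comes back, and only then does the NVE ask again.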

4) I would also suggest not having the NVE keep track of the encapsulation 
used by the remote endpoint (this means that the tunnel encapsulation 
attribute in the draft would be unnecessary). Instead, the onus of 
translating between encapsulation methods should be on gateways. If you 
define the XMPP format well, you should be able to communicate endpoint 
information in a way that is agnostic of the encap method used by the NVE, 
allowing it to do the one encap it does best. A gateway can do this 
translation without BGP control plane intervention, because it would be 
configured to have interfaces that are (for example) NVGRE on one arm and VXLAN 
on the other, and it would be obvious as to what encap to put on a packet 
going from one arm to the other. Applying an MPLS label would involve the 
gateway participating in BGP.

5) The multi-homing discussed in the draft only covers the case where the 
CE devices are physically separated from the PE devices by physical links. 
In the case where the PE forwarding plane is implemented in the 
hypervisor, the more important multi-homing question is what to do if the 
NVE is connected to two or more upstream devices (basically, the NVE has 
two or more IP addresses). What I would like to happen is to have a MAC 
address route be associated with multiple NVE addresses. I believe this is 
possible using the framework established in 
draft-raggarwa-sajassi-l2vpn-evpn, but it might be worthwhile to call out 
this case in draft-drake-nvo3-evpn-control-plane. This is useful 
because draft-raggarwa-sajassi-l2vpn-evpn treats multiple PE IP addresses 
associated with the same MAC address as belonging to separate MESs, and 
assumes that the CE-PE links will be labelled with an Ethernet segment, 
which is not the case for the hypervisor NVE/PE.

Long email, thanks for reading!
--
Sunny
p.s. John, your email address in the draft is incorrect.
