Dear Authors,

I posted these comments and suggested text changes during the WG adoption process. Since I haven't received any reply from the authors, I am re-posting my suggested additions and changes to draft-narten-nvo3-arch-01, plus a few more suggestions to clear up issues being discussed on the mailing list.
Issues with the current text on Distributed Gateway (Section 5.4):

As described in Section 5.3, a Gateway does many things. However, I don't think that an NVE taking on the responsibility of a distributed Gateway will do everything that a conventional Gateway does (or everything in the list of items mentioned in Section 5.3).

First, it might be too much to ask an NVE-based gateway (especially a hypervisor-based NVE) to
* relay traffic off the virtual networks, i.e. perform the gateway function to reach destinations outside the local VNs,
* serve as an IPsec gateway for external traffic (i.e. traffic leaving the virtual networks),
* perform NAT on the source virtual addresses, or
* relay traffic to a VN that doesn't have any hosts attached to the NVE (it is debatable whether an NVE-based distributed Gateway should take on this responsibility).

The meaningful functions performed by an NVE designated as a "distributed gateway" are closer to inter-VN relay than to a full-blown Gateway function.

Second, when host "a" in VN-1 sends traffic to "b" in VN-2, the data packet's Ethernet header has "MAC-DA = Gateway-MAC & MAC-SA = a-MAC & VLAN = VN-1". Most implementations (Microsoft Windows 8 and VMware NSX) allocate a "fake MAC" for all the NVE-based distributed gateways to share, so that host "a" can keep using the same Gateway-MAC after it moves to another NVE. This again differs from conventional gateways.

Third, the problem with a conventional gateway (i.e. a->b traffic being routed at the gateway even when "a" and "b" are attached to the same NVE) is traffic "hairpinning", not triangular routing.

Therefore, I suggest rewriting Section 5.4 as below:

5.4 Distributed Gateway (Re-write)

The relaying of traffic from one VN to another, especially when Source and Destination are attached to the same NVE, deserves special consideration.
With conventional Gateways, traffic between TSes on different VNs has to traverse the gateway even if the Source and Destination are attached to the same NVE, which causes traffic hairpinning and wasted bandwidth. As an optimization, it is desirable for individual NVEs to take over the inter-VN relay responsibilities traditionally handled by conventional gateways, in order to reduce or eliminate hairpinning.

In order for NVEs to perform inter-VN relay, the NVEs must have access to the policy information needed to determine whether inter-VN communication is allowed. Those inter-VN communication policies are most likely to come from the NVA.

However, it is not practical for NVEs to take over all functions of conventional gateways. In particular, it might be too much to ask an NVE-based gateway (especially a hypervisor-based NVE) to
* relay traffic off the virtual networks, i.e. perform the gateway function to reach destinations outside the local VNs,
* serve as an IPsec gateway for external traffic,
* perform NAT on the source virtual addresses, or
* relay traffic to a VN that doesn't have any hosts attached to the NVE.

The NVEs that are capable of performing inter-VN relaying are called "Distributed Gateways" in this document. (Note: "inter-VN-relaying-capable NVE" is a more accurate term.)

The NVO3 architecture should support distributed gateways, allowing at least some NVEs, if not all, to support the inter-VN relaying function, especially when both source and destination are attached to the same NVE. Such support requires that the NVO3 control protocols include mechanisms for the maintenance and distribution of inter-VN policy information to the NVEs that are capable of performing inter-VN communication.

There are many emails on the list regarding the necessity of an NVE<->NVE control plane protocol.
All of that email discussion warrants a subsection under Section 4 explaining why the consequences of not having an NVE<->NVE control plane protocol are minor (or not worth addressing) in the environments addressed by NVO3. Here is my suggested text:

4.4 Is it necessary to have Inter-NVE control plane protocol? (Suggested new sub-section)

Various events, such as link failures or node failures, can make an egress NVE unreachable. Without an NVE<->NVE control plane protocol, the ingress NVE is not aware of the reachability of the egress NVE, so encapsulated packets are dropped somewhere in the underlay network.

If most TSes are attached to only a single NVE and traffic to NVEs is not carried as aggregated flows, then an NVE-NVE control plane protocol doesn't provide any benefit. In this environment, the only difference an "NVE-NVE control protocol" makes is whether packets are dropped at the ingress NVE or somewhere in the underlay network when the egress NVE is not reachable. In a data center environment, where most communication is among applications, a source will most likely not send more packets without an acknowledgment from the destination, so the impact of where the packet is dropped is not that big.

However, in an environment where TSes are connected to multiple NVEs, it might be worthwhile to consider an Inter-NVE control plane protocol, so that the ingress NVE can choose a different egress NVE for a given target.

Thanks for considering my suggested text.

Linda Dunbar
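
To make the inter-VN relay decision in the suggested Section 5.4 text more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not text from the draft or any implementation: the names `Nve`, `Action`, `local_ts`, `allowed_pairs`, and `inter_vn_action` are hypothetical, the policy is assumed to be a set of allowed (source VN, destination VN) pairs pushed by the NVA, and dropping traffic that the policy does not allow is an assumption as well.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Action(Enum):
    RELAY_AT_NVE = "relay between VNs at this NVE"
    SEND_TO_GATEWAY = "forward to a conventional gateway"
    DROP = "drop: inter-VN policy does not allow it"


@dataclass
class Nve:
    # Hypothetical state: VN name -> set of TSes attached to this NVE.
    local_ts: dict = field(default_factory=dict)
    # Hypothetical policy format: allowed (source VN, destination VN) pairs,
    # assumed to be distributed to the NVE by the NVA.
    allowed_pairs: set = field(default_factory=set)

    def inter_vn_action(self, src_vn: str, dst_vn: Optional[str]) -> Action:
        """Decide how to handle traffic from src_vn to a different VN."""
        # dst_vn is None models a destination outside the virtual networks;
        # that stays with the conventional gateway (as do NAT and IPsec).
        if dst_vn is None:
            return Action.SEND_TO_GATEWAY
        # The NVE needs policy information to know whether the two VNs
        # may communicate at all (dropping on deny is an assumption).
        if (src_vn, dst_vn) not in self.allowed_pairs:
            return Action.DROP
        # No hosts of the destination VN attached here: leave it to the gateway.
        if not self.local_ts.get(dst_vn):
            return Action.SEND_TO_GATEWAY
        # Otherwise relay at the NVE, which avoids hairpinning via the gateway.
        return Action.RELAY_AT_NVE


if __name__ == "__main__":
    nve = Nve(
        local_ts={"VN-1": {"a"}, "VN-2": {"b"}},
        allowed_pairs={("VN-1", "VN-2")},
    )
    print(nve.inter_vn_action("VN-1", "VN-2").value)
    print(nve.inter_vn_action("VN-1", None).value)
```

The ordering of the checks is a design choice in this sketch: the policy lookup comes before the local-attachment test so that disallowed traffic is never handed to a gateway.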
_______________________________________________
nvo3 mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/nvo3
