Hi Guru,

Sure, providing more explanation below.
Q. What are we trying to solve?
A. Getting distributed routing to work for VLAN backed networks through OVN.

Q. What is the disconnect wrt OVN capabilities for the above task?
A. OVN lacks in certain areas wrt how to forward packets "correctly/efficiently" in the absence of encapsulation (VXLAN, STT or Geneve). The following are the known gaps:

L3 E-W
======
a. Since a router port is distributed, in the absence of encapsulation we should not use the router port MAC as the source MAC. Our proposal is to replace the router port MAC with a chassis-specific unique MAC when an unencapsulated packet originating from a router port goes on the wire. This was explained in the following email:
https://mail.openvswitch.org/pipermail/ovs-dev/2018-October/353179.html

b. Sending ARP replies on the wire. As of now, OVN consumes ARP replies from a VM that are destined to a router port (because the router port is present locally on the VM's chassis as well). Because the ARP reply is never seen on the wire, a physical switch will never learn the VM's MAC (unless the VM is involved in L2 communication as well). As a result, DVR-routed traffic will always be flooded by the TOR (top-of-rack switch), since the destination MAC is that of the VM, which the TOR never learnt.

L3 N-S
======
a. For VLAN backed networks, NATing is NOT a must to talk to the "outside" physical network (for overlay it is). Hence, OVN requires some changes in this area as well.
b. Do NOT respond to ARP requests for any router port from the uplink, unless on the gateway chassis.
c. When a gateway chassis failover happens, advertise the router port MAC as well.

L3 N-S NAT
=========
a. The current OVN implementation uses Geneve encapsulation (Geneve options) to provide metadata to the gateway chassis (where SNAT happens).
b. In the absence of encapsulation, OVN should be enhanced to still support NAT on the gateway chassis.
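The E-W gap in (a) boils down to a rewrite step at egress. A minimal sketch of the proposed behavior, with illustrative names and MAC values (this is not OVN code; in practice it would be an OpenFlow rewrite on the localnet path):

```python
# Sketch of the proposed E-W fix: when an unencapsulated packet leaves
# a hypervisor with a (distributed) router port MAC as its source,
# substitute a chassis-specific unique MAC. All values are made up.

ROUTER_PORT_MACS = {"00:00:00:00:10:01", "00:00:00:00:20:01"}

def rewrite_source_mac(eth_src, chassis_mac, encapsulated):
    """Return the source MAC that should appear on the wire."""
    if not encapsulated and eth_src in ROUTER_PORT_MACS:
        # A distributed router port MAC must not go on the wire from
        # many chassis at once; use this chassis' unique MAC instead.
        return chassis_mac
    return eth_src

# A routed packet leaving on a VLAN (no encapsulation) gets rewritten:
print(rewrite_source_mac("00:00:00:00:10:01", "aa:bb:cc:00:00:01", False))
# An encapsulated (Geneve) packet keeps the router port MAC as today:
print(rewrite_source_mac("00:00:00:00:10:01", "aa:bb:cc:00:00:01", True))
```

The VM never sees this rewrite: it only affects what the physical fabric observes as the source MAC.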
===========================================================
Our initial proposal has details as well:
https://mail.openvswitch.org/pipermail/ovs-dev/2018-October/353066.html

Like I mentioned, the problem statement we are trying to solve is "Distributed Virtual Routing for VLAN Backed Networks". As part of the above, we have identified some gaps, which we intend to fix. As we progress further, we will have to add some features as well. But as of now, we are focused on getting the basic functionality to work correctly first.

Please feel free to put forth any more queries/concerns you have; I will be happy to explain. Thanks again for the review.

Regards,
Ankur

________________________________
From: Guru Shetty <g...@ovn.org>
Sent: Monday, November 12, 2018 9:58:07 AM
To: Ankur Sharma
Cc: ovs dev; Numan Siddique; Ben Pfaff
Subject: Re: VLAN tenant network patches

On Sun, 11 Nov 2018 at 21:02, Ankur Sharma <ankur.sha...@nutanix.com> wrote:

Hi Guru,

Thanks for spending time in understanding the proposal and drafting your understanding as well. Thanks Numan for pitching in.

Some comments (trying to keep them as brief as possible):

a. On a high level, we are trying to do the following: "distributed router functionality for VLAN backed networks".

I guess there is a big disconnect then. OVN currently does "distributed router for VLAN backed networks". Do you disagree? If so, please explain.

b. This would include changes/analysis for E-W traffic and N-S traffic.
c. Some of the changes are specific to the characteristics of a distributed router, and some are specific to the OVN way of doing things.
d. The points we have discussed thus far are a subset of the changes, i.e. a VLAN backed DVR (or logical router) would be more than just replacing a router port MAC with a chassis MAC.
e. Numan's changes do NOT conflict/overlap with what we have proposed so far and hence should be discussed/reviewed independently. His changes solve a very specific problem.
His changes are to "mimic" a centralized router in a distributed router, i.e. to execute the router pipeline on a centralized chassis, while the router is still distributed. I have provided my feedback here:
https://mail.openvswitch.org/pipermail/ovs-dev/2018-November/353701.html

Providing some more comments inline. Thanks again Guru, Numan, Mark and Han for spending time on the proposal and providing feedback. I am preparing a v2, which will have changes up to E-W.

Regards,
Ankur

________________________________
From: Guru Shetty <g...@ovn.org>
Sent: Friday, November 9, 2018 11:45 AM
To: ovs dev
Cc: Ankur Sharma; Numan Siddique; Ben Pfaff
Subject: VLAN tenant network patches

I have tried to summarize the problem statement that Numan and Ankur are trying to solve here, based on my understanding so far. Please correct me and I will revise it along the way.

Current feature set in OVN
==========================
A logical switch should only have one localnet logical port. If a logical switch has a logical port of type "localnet", then all traffic for that logical switch avoids overlays. So in essence, this is only useful when all the hypervisors are in the same broadcast domain. Currently there are no known problems as long as logical switches are not connected to any logical routers.

When 2 logical switches (each with a localnet port) are connected to a logical router, we still push all east-west traffic to the underlay. The source hypervisor executes the pipeline of all 3 logical datapaths and then pushes the traffic to the underlay via the localnet port (with its corresponding VLAN tag) of the third logical datapath's switch.
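The E-W path just described can be sketched as a toy model — the source hypervisor runs all three logical datapath pipelines before the packet hits the underlay. Names, MACs and VLAN IDs below are illustrative, not OVN internals (real OVN compiles these pipelines to OpenFlow):

```python
# Toy model of the current OVN E-W path for two VLAN backed logical
# switches (each with a localnet port) joined by a distributed router.
# All three pipelines execute on the source hypervisor; the packet is
# then tagged with the egress switch's VLAN and sent to the underlay.

ROUTER_MAC = "00:00:00:00:ff:01"          # distributed router port MAC
LOCALNET_VLAN = {"switch-A": 100, "switch-B": 200}

def east_west(pkt):
    executed = []
    executed.append("switch-A")           # ingress switch pipeline
    executed.append("router")             # router pipeline rewrites the L2 header:
    pkt["eth_src"] = ROUTER_MAC           # routed packet now carries the router MAC
    executed.append("switch-B")           # egress switch pipeline
    pkt["vlan"] = LOCALNET_VLAN["switch-B"]  # tag for the egress localnet port
    pkt["pipelines"] = executed
    return pkt

pkt = east_west({"eth_src": "52:54:00:00:00:0a", "dst_ip": "10.0.2.5"})
# Note: every hypervisor doing this emits ROUTER_MAC as the source MAC
# on the shared VLAN, from its own physical port.
```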
The above topology creates a problem for the underlying hardware switch, because it can now see the same MAC address of the distributed router coming from 2 different hypervisors as the source MAC address of packets on the wire. According to Ankur, there are physical switches which can detect a source MAC address coming from different ports and limit it, but this looks like it is configurable in physical switches.

For N-S traffic, currently traffic is punted to the gateway chassis via an overlay tunnel. There is a use case where you want to avoid overlay tunnels. This is because for a "localnet" topology you can keep the MTU of the inner VM the same as the underlay MTU. But when you have overlays just for one class of traffic, this becomes a problem.

So both Ankur's and Numan's patches try to tackle the above 2 problems. To re-summarize:

Problem 1: The external switch getting confused about the machine on which the OVN router MAC address resides. But this is only the source MAC address coming from different hypervisors (not the destination MAC).

[ANKUR]: We are trying to do more than just replacing a router port MAC with a chassis MAC. We are trying to get distributed routing functionality working via OVN for VLAN backed networks. Not using the router port MAC is one of the first problems that has to be solved. For a production deployment, we might need some more changes/analysis.

Problem 2: When a packet has a destination IP address outside the OVN router's known subnets, it is currently sent via an overlay tunnel. This would need MTU configuration for the inner VMs.

Numan's patches
===============
Numan's patches try to solve the above 2 problems by doing the following:

1. When VM-A (on Hyp-A) in switch-A tries to talk to VM-B in switch-B (on Hyp-B) (switch-A and switch-B are connected with a router), Hyp-A will execute the switch-A pipeline and push the traffic out of the localnet port with the router's MAC address as destination.
2.
The router chassis will receive the packet, execute the switch-A pipeline again, the router pipeline, and then the switch-B pipeline, and push the packet out of switch-B's localnet port. Now Hyp-B receives the traffic, executes the switch-B pipeline again, and the packet gets delivered. The result is that all east-west traffic is centralized and has an extra hop.

[ANKUR]: Yes, Numan's approach is to mimic a centralized router, while the VLAN backed logical switch is still connected to a distributed logical router (i.e. the connecting ports are of type "patch").

Ankur's proposal
================
Though the complete patches do not exist, Ankur wants to solve problem 1 by having a chassis-specific MAC. So when a packet leaves a hypervisor for east-west routing, it uses a unique MAC. The disadvantage with this proposal is that the VM (i.e. logical port) will see the MAC of its first-hop router change continuously, which may have some yet to be clearly defined side effects (it leads to more ARP requests from the VM).

[ANKUR]: Just want to clarify that a TCP/IP stack would NEVER populate its ARP cache based on IP packets. It would rely on ARP (/GARP) to resolve the gateway MAC, and ARP queries for the router port (gateway) IP will always be responded to by OVN with the router port MAC only. I.e., using the chassis MAC as the source MAC would NOT impact any functionality of a VM's networking stack. However, it could still be desirable not to show the chassis MAC to a VM. We intend to solve that as well, but our first implementation does not look clean/scalable. We will submit it for review anyway, but not in the first series.

Problem 2 is solved similarly to what Numan has in his patches, although there are small changes in implementation. It is not clear whether one code is more complicated than the other. But it looks like Ankur's patches will avoid the extra hop for east-west traffic.

Numan is perfectly fine with Ankur's patches (after they are sent, reviewed and tested) if they satisfy his problem statements.
But he does prefer his patches reviewed and merged if there is a delay in Ankur's patches (and possibly reverted later, if there is an alternative).

[ANKUR]: Mine and Numan's patches are not related to each other and should not be seen as an "either or". Numan's patch is trying to solve a very specific case. It should be reviewed independently and should not be blocked because of my changes. The management plane / data center architecture would drive which approach to take. As a platform, OVN should support both.

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev