Thanks, Ben. Sorry for the delay. Please find attached a draft design proposal and let me know your comments etc. I did some quick prototyping to check for feasibility too; I can share that, if it helps. Note, the document is a draft and, I admit, there might be things that I haven't thought about/through, or missed. I am attaching a text doc, assuming it might be easier, but if you'd like it in a different format, please let me know.
thanks!
-venu

On Wednesday, October 31, 2018, 10:30:23 AM PDT, Ben Pfaff <b...@ovn.org> wrote:

Honestly the best thing to do is probably to propose a design or, if
it's simple enough, to send a patch. That will probably be more
effective at sparking a discussion.

On Wed, Oct 31, 2018 at 03:33:48PM +0000, venugopal iyer wrote:
> Hi:
> Just wanted to check if folks had any thoughts on the use case Girish
> outlined below. We do have a real use case for this and are interested in
> looking at options for supporting more than one VTEP IP. It is currently a
> limitation for us, wanted to know if there are similar use cases folks are
> looking at/interested in addressing.
>
> thanks,
> -venu
>
> On Thursday, September 6, 2018, 9:19:01 AM PDT, venugopal iyer via dev
> <ovs-...@openvswitch.org> wrote:
>
> Would it be possible for the association <logical port|dst MAC, VTEP> to be
> made when the logical port is instantiated on a node? and relayed on to the
> SB by the controller, e.g. assuming a mechanism to specify/determine a
> physical port mapping for a logical port for a VM. The
> <physical port,encap-ip> mappings can be specified as configuration on the
> chassis. In the absence of physical port information for a logical port/VM,
> I suppose we could default to an encap-ip.
>
> just a thought,
> -venu
>
> On Wednesday, September 5, 2018, 2:03:35 PM PDT, Ben Pfaff <b...@ovn.org>
> wrote:
>
> How would OVN know which IP to use for a given logical port on a
> chassis?
>
> I think that the "multiple tunnel encapsulations" is meant to cover,
> say, Geneve vs. STT vs. VXLAN, not the case you have in mind.
>
> On Wed, Sep 05, 2018 at 09:50:32AM -0700, Girish Moodalbail wrote:
> > Hello all,
> >
> > I would like to add more context here. In the diagram below
> >
> > +----------------------------------+
> > |ovn-host                          |
> > |                                  |
> > |                                  |
> > |       +-------------------------+|
> > |       |         br-int          ||
> > |       +----+-------------+------+|
> > |            |             |       |
> > |         +--v-----+   +---v----+  |
> > |         | geneve |   | geneve |  |
> > |         +--+-----+   +---+----+  |
> > |            |             |       |
> > |          +-v----+     +--v---+   |
> > |          | IP0  |     | IP1  |   |
> > |          +------+     +------+   |
> > +----------+ eth0 +-----+ eth1 +---+
> >            +------+     +------+
> >
> > eth0 and eth1 are, say, in its own physical segments. The VMs that are
> > instantiated in the above ovn-host will have multiple interfaces and each
> > of those interface need to be on a different Geneve VTEP.
> >
> > I think the following entry in OVN TODOs (
> > https://github.com/openvswitch/ovs/blob/master/ovn/TODO.rst)
> >
> > ---------------8<------------------8<---------------
> > Support multiple tunnel encapsulations in Chassis.
> >
> > So far, both ovn-controller and ovn-controller-vtep only allow chassis to
> > have one tunnel encapsulation entry. We should extend the implementation to
> > support multiple tunnel encapsulations
> > ---------------8<------------------8<---------------
> >
> > captures the above requirement. Is that the case?
> >
> > Thanks again.
> >
> > Regards,
> > ~Girish
> >
> > On Tue, Sep 4, 2018 at 3:00 PM Girish Moodalbail <gmoodalb...@gmail.com>
> > wrote:
> >
> > > Hello all,
> > >
> > > Is it possible to configure remote_ip as a 'flow' instead of an IP address
> > > (i.e., setting ovn-encap-ip to a single IP address)?
> > >
> > > Today, we have one VTEP endpoint per OVN host and all the VMs that
> > > connects to br-int on that OVN host are reachable behind this VTEP
> > > endpoint. Is it possible to have multiple VTEP endpoints for a br-int
> > > bridge and use Open Flow flows to select one of the VTEP endpoint?
> > >
> > > +----------------------------------+
> > > |ovn-host                          |
> > > |                                  |
> > > |                                  |
> > > |       +-------------------------+|
> > > |       |         br-int          ||
> > > |       +----+-------------+------+|
> > > |            |             |       |
> > > |         +--v-----+   +---v----+  |
> > > |         | geneve |   | geneve |  |
> > > |         +--+-----+   +---+----+  |
> > > |            |             |       |
> > > |          +-v----+     +--v---+   |
> > > |          | IP0  |     | IP1  |   |
> > > |          +------+     +------+   |
> > > +----------+ eth0 +-----+ eth1 +---+
> > >            +------+     +------+
> > >
> > > Also, we don't want to bond eth0 and eth1 into a bond interface and then
> > > use bond's IP as VTEP endpoint.
> > >
> > > Thanks in advance,
> > > ~Girish
> >
> > _______________________________________________
> > discuss mailing list
> > disc...@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>
> _______________________________________________
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

OVN Multi-VTEP - Draft Design Proposal
Version 1.0
11/29/2018

1. Context
===========

Currently, the OVN overlay is configured using the following OVN parameters,
set as external-ids in OVS on each chassis:

1. ovn-encap-ip:   "The" [outer] IP used to connect to this chassis
2. ovn-encap-type: List of supported overlays: Geneve, STT and VXLAN (though
                   VXLAN is only used to communicate with gateways given its
                   fixed/limited header size)

When there are multiple ovn-encap-types, each type will be paired with the
encap-ip (ovn-controller(8)[1]).

This works well in the generic case as OVN communication between remote
hypervisors takes place using their corresponding IPs, independent of whether
the IP is configured on an interface or a bond, as long as they are reachable
using L3.

2. Problem Statement
====================

The single ovn-encap-ip (henceforth referred to as single VTEP) becomes an
issue when we start working with SR-IOV and multiple NICs, all participating
in the OVN logical network. E.g.:

                Chassis X
+---------------------------------------+
|                                       |
|      +-------+        +-------+       |
|      |       |        |       |       |
|      |  vm1  |        |  vm2  |       |
|      |       |        |       |       |
|      +-------+        +-------+       |
|          |vf1             |vf2        |
|  +-------+                +-------+   |
|  |    +-----------------------+   |   |
|  |    |        br-int         |   |   |
|  |    +-----------------------+   |   |
|  |       |  |geneve          |    |   |
|  |       |  | (encap-ip=X)   |    |   |
|  |       |  |                |    |   |
|  |       |vf1_rep     vf2_rep|    |   |
|  |       |  |                |    |   |
|         +-----+         +-----+       |
|  +------|     |         |     |---+   |
+---------|nic1 |---------|nic2 |-------+
          +-----+         +-----+
             |               |
           IP X            IP Y

Note:
****
1. This assumes OVS support in the NIC, e.g. Mellanox ASAP2[2] - so we'll
   give the VFs to the guest and hook up the corresponding representors to
   br-int; geneve offload should be possible in the not-so-distant future,
   I believe.
2. There could be a case for multiple NICs without SR-IOV where we have each
   connected to a different network and expect the guest to use a specific
   physical interface to get out of the chassis.

In the picture above, the expectation is for VFs, given to the guest, to use
their underlying NIC to exit out, for the following reasons:

1. Use all the available physical NICs in the logical network.
2. The NICs may be connected to different networks and we want the traffic on
   the VFs to exit over that network. E.g. we have a scenario where each NIC
   is used for GPU direct communication and has a peer on the remote.
3. OVS offload: offload doesn't span physical NICs.

With the current OVN design, traffic from vm1 or vm2 to a remote chassis will
enter the logical network and be encapsulated with the remote's ovn-encap-ip.
This will mean that all the traffic between the two chassis will always be
received on the NIC that hosts that ovn-encap-ip on the remote (and likely
routed over the ovn-encap-ip on the source).

Bonding the physical NICs is not an option as, I don't believe, we can
guarantee that the VFs will use their underlying NICs when transmitting, and
also due to the lack of offload support across a bond (of separate NICs).

3. Design Proposal
==================

The proposal is to:

1. Extend the ovn-encap-ip to accept a list of IPs (i.e. multiple VTEPs) and
2. Bind the logical ports to the desired VTEP, if needed.

3.1 Tunnels between Chassis
---------------------------

Specifically, ovn-encap-ip will be a comma separated list (similar to
ovn-encap-type) - this might mean we need to rename it to ovn-encap-ips, but
given ovn-encap-type takes a list, we could leave it as ovn-encap-ip. Each IP
in ovn-encap-ip is paired with each ovn-encap-type for a chassis. Thus,

    # ovs-vsctl get open . external_ids:ovn-encap-ip
    "10.1.1.2,10.1.2.2"

When ovn-controller registers the chassis with the SB (chassis_run), we
register the list of encaps (as currently), but instead of pairing each
ovn-encap-type with a single ovn-encap-ip, we pair each ovn-encap-type with
each ovn-encap-ip, so, assuming:

    ovn-encap-ip:x,y
    ovn-encap-type:geneve,stt

we'll have:

    encaps: x-geneve, x-stt, y-geneve, y-stt

when, currently, we have:

    ovn-encap-ip:x
    ovn-encap-type:geneve,stt

And, hence:

    encaps: x-geneve, x-stt

The tunnels themselves can be created in 2 ways:

3.1.1 Option I
..............

When creating the tunnels to all the remote chassis (encaps_run), we could
extend the current logic, i.e. create multiple ovn tunnels, one for each
remote IP for the remote chassis. This means we'll have as many tunnels (of
the preferred type) to a remote chassis as there are VTEPs configured on the
remote.

So, with n nodes, each with m VTEPs of the preferred type, every chassis will
end up with (n - 1) * m tunnel ports on br-int. This will also likely
increase the number of openflow rules on a host.

The advantage of this approach is that most of the other processing remains
largely unchanged, except for getting a tunnel based on <chassis id, VTEP>
instead of <chassis id>. If the specific VTEP is not available (e.g. we are
sending a packet to a gateway_chassis), we could pick the first VTEP on that
chassis. Some of the data structures such as chassis_tunnel, including
chassis_tunnel_find(), will need to be updated to support this.

3.1.2 Option II
...............

Instead of creating a tunnel for every VTEP on the remote, we could create
only one tunnel port on each chassis with remote_ip as flow, e.g.:

    # ovs-vsctl get open . external_ids:system-id
    "9622c6a5-4660-424a-b049-81202e6b2785"

    # ovs-vsctl list port ovn-9622c6-0
    _uuid               : 9456553c-c78b-449b-bf2b-ef550ad6e79b
    ...
    external_ids        : {ovn-chassis-id="9622c6a5-4660-424a-b049-81202e6b2785"}

    # ovs-vsctl list interface ovn-9622c6-0
    _uuid               : d35ad77d-076c-456d-83a5-73e12b81fdf5
    ..
    options             : {csum="true", key=flow, remote_ip=flow}

When sending a packet to a remote, we will get the desired remote VTEP
(section 3.2) and include it as part of the encapsulation, e.g. if we are
sending a packet to a logical port that's bound to VTEP Y on the remote,
we'll include the following in put_encapsulation:

    put_load(<ip Y>, MFF_TUN_DST, 0, 32, ofpacts);

The advantage of this approach is that it is elegant and will significantly
reduce the number of tunnel ports on each chassis. However, it will need more
changes, i.e. instead of walking the chassis list in SB and creating a tunnel
for each remote, we will only create a tunnel port for "our" chassis and use
the associated ofport to communicate with logical ports on all remote
hypervisors.

When creating the openflow rules for remote ports, we will always select the
ofport of the tunnel and create an encapsulation with the destination IP of
the remote hosting the port.

As for BFD using the tunnel to detect reachability, we could use *a* VTEP on
the remote chassis, I think.

Note:
****
The open question about this approach is the support for output to tunnels
with active/backup. In the case of a distributed router port with multiple
gateway_chassis, currently, I believe, we create a bundle and add the ofport
of the tunnels to reach the nodes in the list of gateway_chassis. With this
option there is only one ofport on the host, so, unless I am missing it, I am
not sure if active/backup can be supported.

Preferred approach: Though Option II seems to make use of tunnel ports more
effectively, if active/backup can't be supported, I suppose Option I is
preferred.

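The encap pairing from 3.1 and the Option I lookup (a tunnel keyed by
<chassis id, VTEP>, falling back to the first VTEP of the chassis) can be
sketched roughly as below. This is only an illustration in Python, not
ovn-controller code: build_encaps, tunnel_ports_option1, find_tunnel and the
shape of the `tunnels` table are hypothetical names made up for this sketch.

```python
from itertools import product


def build_encaps(encap_ips, encap_types):
    """Pair every configured encap IP with every encap type (section 3.1).

    Both arguments are comma separated strings, as in the external-ids.
    """
    ips = [ip.strip() for ip in encap_ips.split(",") if ip.strip()]
    types = [t.strip() for t in encap_types.split(",") if t.strip()]
    return [(ip, t) for ip, t in product(ips, types)]


def tunnel_ports_option1(n_chassis, vteps_per_chassis):
    """Tunnel ports on br-int per chassis under Option I: (n - 1) * m."""
    return (n_chassis - 1) * vteps_per_chassis


def find_tunnel(tunnels, chassis_id, vtep_ip=None):
    """Look up an ofport by <chassis id, VTEP> (hypothetical helper).

    `tunnels` maps a chassis id to a list of (vtep_ip, ofport) pairs, in the
    order the encaps were registered. When the requested VTEP is unknown or
    not given (e.g. sending to a gateway chassis), fall back to the first
    VTEP of that chassis. Returns None for an unknown chassis.
    """
    ports = tunnels.get(chassis_id)
    if not ports:
        return None
    for ip, ofport in ports:
        if ip == vtep_ip:
            return ofport
    return ports[0][1]


if __name__ == "__main__":
    print(build_encaps("x,y", "geneve,stt"))
    print(tunnel_ports_option1(n_chassis=4, vteps_per_chassis=2))
    tunnels = {"C2": [("10.1.1.2", 5), ("10.1.2.2", 6)]}
    print(find_tunnel(tunnels, "C2", "10.1.2.2"))
    print(find_tunnel(tunnels, "C2"))
```

With ovn-encap-ip "x,y" and ovn-encap-type "geneve,stt" the first call yields
the four encaps listed in 3.1, in the same order (x-geneve, x-stt, y-geneve,
y-stt).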
3.2 Port Binding
----------------

Regardless of the alternative selected in creating the tunnels between
chassis, we want to select the VTEP when communicating with a logical port on
a remote chassis. In terms of overlay, we want to have a mapping between the
destination MAC and remote VTEP.

Currently, we use the Port Binding to get the Chassis which hosts the logical
port, and use the Chassis's VTEP to get to it. With this proposal, each
Chassis could have multiple VTEPs and we want to associate the destination
MAC (the logical port) to a specific VTEP on the chassis.

This can be achieved in 2 ways:

3.2.1 Option I
..............

Extend the SB port binding to include a VTEP column in addition to the
Chassis. This seems logical in the context of overlay, i.e. in terms of
mapping the logical port to a VTEP. Thus the SB record will have an
additional column, "encap", and in addition to adding the port binding record
(in consider_local_datapath via sbrec_port_binding_set_chassis()), we will
also add the VTEP binding record, e.g. via sbrec_port_binding_set_encap().

e.g.:

    # ovn-sbctl list port_binding ls1-vm1
    _uuid               : 22e82b30-2947-4667-9274-1ed1cbbd5452
    chassis             : de7ca9fd-c79a-42df-af38-c98c9b4dd9ab
    datapath            : 1356caaf-2170-4039-afc2-524d10b0d2aa
    encap               : 3b4f6cbc-b8c7-4e2b-9e45-ec723d372ce0
    external_ids        : {}
    gateway_chassis     : []
    logical_port        : "ls1-vm1"
    mac                 : ["02:ac:10:ff:00:11"]
    nat_addresses       : []
    options             : {}
    parent_port         : []
    tag                 : []
    tunnel_key          : 1
    type                : ""

The encap is one of the encaps that is associated with the chassis it is
bound to, i.e.:

    # ovn-sbctl --column encaps list chassis de7ca9fd-c79a-42df-af38-c98c9b4dd9ab
    encaps              : [3b4f6cbc-b8c7-4e2b-9e45-ec723d372ce0, ea7e8a51-7342-4911-afa7-56832f5c78dd]

The encap associated with "ls1-vm1" is

    # ovn-sbctl list encap 3b4f6cbc-b8c7-4e2b-9e45-ec723d372ce0
    _uuid               : 3b4f6cbc-b8c7-4e2b-9e45-ec723d372ce0
    chassis_name        : "225741f3-f92f-4074-b41d-e24d2ee4fb6e"
    ip                  : "10.1.1.1"
    options             : {csum="true"}
    type                : geneve

While the other one is:

    # ovn-sbctl list encap ea7e8a51-7342-4911-afa7-56832f5c78dd
    _uuid               : ea7e8a51-7342-4911-afa7-56832f5c78dd
    chassis_name        : "225741f3-f92f-4074-b41d-e24d2ee4fb6e"
    ip                  : "10.1.2.1"
    options             : {csum="true"}
    type                : geneve

When looking for the output OF port for a logical port in the remote
hypervisor (i.e. going out on a tunnel), in consider_port_binding(), we will
get the Port's encap binding and use the IP from that encap.

In order to make the association between a logical port and a VTEP, we
propose a new "encap-ip" external-ids key (similar to "iface-id"), set when a
logical port is instantiated on a node (it needs to be done on logical port
instantiation since a VTEP is bound to a node). So,

    # ovs-vsctl --column external_ids list Interface vm1
    external_ids        : {encap-ip="10.1.1.1", iface-id="ls1-vm1"}

If the encap-ip is not specified, we use a configured IP for the preferred
tunnel type for this Port.

3.2.2 Option II
...............

If adding a column to the SB's port binding record seems heavy handed, we
could add the encap binding in the external_ids of the port binding. We'll
still use the "encap-ip" external-ids on a logical port to make the
association, except that we will stash the information in the SB in the port
binding's external-ids as well.

Preferred approach: Option I is preferred since it treats the VTEP binding as
a first class citizen, but if changing the schema is a major change (in terms
of release too), Option II could be considered.

4. Summary
==========

In summary, the proposal is to change

From (current):
----------------

SouthBound Records
+---------------------------------------------------------------+
+---------------+                                               |
|               |encaps                                         |
|Chassis C1-----+------>[type1/IP1, type2/IP1...]               |
|               |         encap1     encap2                     |
|               |                                               |
|Chassis C2-----+------>[type1/IP2, type2/IP2...]               |
|               |         encap1     encap2                     |
|               |                                               |
+---------------+                                               |
|                                                               |
+---------------+                                               |
|               |Chassis                                        |
|Port LP1-------+------>[C1]                                    |
|               |                                               |
|Port LP2-------+------>[C2]                                    |
|               |                                               |
+---------------+                                               |
+---------------------------------------------------------------+

ovn-encap-ip=IP1                        ovn-encap-ip=IP2
ovn-encap-type=type1,type2              ovn-encap-type=type1,type2
system-id=C1                            system-id=C2
+-----------------------+               +-----------------------+
|                       |               |                       |
|      Chassis C1       |               |      Chassis C2       |
|                       |               |                       |
| +------+              |               |              +------+ |
| |br-int|              |               |              |br-int| |
| |      |-------       |               |       -------|      | |
| |      |remote_ip=IP2 |               | remote_ip=IP1|      | |
| +------+key=flow      |               |      key=flow+------+ |
|    |                  |               |                |      |
|    |                  |               |                |      |
|    |(external_ids)    |               |  (external_ids)|      |
|    |iface-id=LP1      |               |    iface-id=LP2|      |
|    |                  |               |                |      |
+----+------------------+               +----------------+------+
     |      |       |                       |       |    |
     |     IP3     IP1                     IP2     IP4   |
     |              :                       :            |
     |              :    geneve tunnel*     :            |
     |              .........................            |
  LP1|                                                   |LP2
+----+---------------------------------------------------+------+
|                                                               |
|                              LS1                              |
+---------------------------------------------------------------+

[*assuming IP1/IP3 and IP2/IP4 are not bonded]

To: Option I for tunnel & Option I for port binding (basic prototype done to check feasibility)
-----------------------------------------------------------------------------------------------

SouthBound Records
+---------------------------------------------------------------+
+---------------+                                               |
|               |encaps                                         |
|Chassis C1-----+------>[type1/IP1, type1/IP3, type2/IP1...]    |
|               |         encap1     encap2     encap3          |
|               |                                               |
|Chassis C2-----+------>[type1/IP2, type1/IP4, type2/IP2...]    |
|               |         encap1     encap2     encap3          |
|               |                                               |
+---------------+                                               |
|                                                               |
+---------------+                                               |
|               |Chassis                                        |
|Port LP1-------------->[C1]                                    |
|               |Encap                                          |
|               +--------------->[C1:encap1]                    |
|               |                                               |
|               |                                               |
|               |Chassis                                        |
|Port LP2-------------->[C2]                                    |
|               |Encap                                          |
|               +--------------->[C2:encap1]                    |
|               |                                               |
+---------------+                                               |
+---------------------------------------------------------------+

ovn-encap-ip=IP1,IP3                    ovn-encap-ip=IP2,IP4
ovn-encap-type=type1,type2              ovn-encap-type=type1,type2
system-id=C1                            system-id=C2
+-----------------------+               +-----------------------+
|                       |               |                       |
|      Chassis C1       |               |      Chassis C2       |
|                       |               |                       |
|         remote_ip=IP4 |               | remote_ip=IP3         |
| +------+key=flow      |               |      key=flow+------+ |
| |      |--------      |               |      --------|      | |
| |br-int|              |               |              |br-int| |
| |      |--------      |               |      --------|      | |
| |      |remote_ip=IP2 |               | remote_ip=IP1|      | |
| +------+key=flow      |               |      key=flow+------+ |
|    |                  |               |                |      |
|    |                  |               |                |      |
|    |(external_ids)    |               |  (external_ids)|      |
|    |iface-id=LP1      |               |    iface-id=LP2|      |
|    |encap-ip=IP1      |               |    encap-ip=IP2|      |
|    |                  |               |                |      |
+----+------------------+               +----------------+------+
     |      |       |                       |       |    |
     |     IP3     IP1                     IP2     IP4   |
     |      :       :                       :       :    |
     |      :       :     geneve tunnel     :       :    |
     |      .........................................    |
  LP1|                                                   |LP2
+----+---------------------------------------------------+------+
|                                                               |
|                              LS1                              |
+---------------------------------------------------------------+

Option II for tunnel & Option I for port binding:
-------------------------------------------------

ovn-encap-ip=IP1,IP3                    ovn-encap-ip=IP2,IP4
ovn-encap-type=type1,type2              ovn-encap-type=type1,type2
system-id=C1                            system-id=C2
+-----------------------+               +-----------------------+
|                       |               |                       |
|      Chassis C1       |               |      Chassis C2       |
|                       |               |                       |
| +------+              |               |              +------+ |
| |br-int|              |               |              |br-int| |
| |      |-------       |               |       -------|      | |
| |      |remote_ip=flow|               |remote_ip=flow|      | |
| +------+key=flow      |               |      key=flow+------+ |
|    |                  |               |                |      |
|    |                  |               |                |      |
|    |(external_ids)    |               |  (external_ids)|      |
|    |iface-id=LP1      |               |    iface-id=LP2|      |
|    |encap-ip=IP1      |               |    encap-ip=IP2|      |
|    |                  |               |                |      |
+----+------------------+               +----------------+------+
     |      |       |                       |       |    |
     |     IP3     IP1                     IP2     IP4   |
     |      :       :                       :       :    |
     |      :       :     geneve tunnel     :       :    |
     |      .........................................    |
  LP1|                                                   |LP2
+----+---------------------------------------------------+------+
|                                                               |
|                              LS1                              |
+---------------------------------------------------------------+

5. References
=============

[1]: http://www.openvswitch.org/support/dist-docs/ovn-controller.8.html
[2]: https://www.mellanox.com/related-docs/prod_software/ASAP2_Hardware_Offloading_for_vSwitches_User_Manual_v4.4.pdf

_______________________________________________ discuss mailing list disc...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-discuss