+1 on using VRFs for tenant isolation. We do not want to use separate BGP daemons for the purpose of isolation, but VRFs instead. There should be one BGP daemon at the system level that caters to multiple tenants.
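
As a rough illustration of the single-daemon model (a sketch only, not taken from the patches under discussion), the Python snippet below renders an FRR-style configuration with one BGP instance per tenant VRF, all served by the same daemon. The tenant VRF names, interface names and AS number are hypothetical, and it assumes that the routes leaked into each VRF show up to the routing suite as kernel routes.

```python
# Sketch only: tenant VRF names, uplink interfaces and the AS number are
# hypothetical. The stanzas follow FRR's "router bgp ASN vrf NAME" syntax.
from dataclasses import dataclass


@dataclass(frozen=True)
class Tenant:
    vrf: str     # Linux VRF device that the tenant's routes are leaked into
    uplink: str  # L3 interface in that VRF, used for unnumbered BGP peering


def render_frr_config(local_as: int, tenants: list[Tenant]) -> str:
    """Render one BGP instance per tenant VRF for a single bgpd."""
    lines: list[str] = []
    for t in tenants:
        lines += [
            f"router bgp {local_as} vrf {t.vrf}",
            # Interface-based neighbor, i.e. peering over IPv6 link-local.
            f" neighbor {t.uplink} interface remote-as external",
            " address-family ipv4 unicast",
            # Assumption: leaked OVN routes appear as kernel routes.
            "  redistribute kernel",
            " exit-address-family",
            " address-family ipv6 unicast",
            "  redistribute kernel",
            f"  neighbor {t.uplink} activate",
            " exit-address-family",
            "!",
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_frr_config(65000, [Tenant("vrf-tenant-a", "eth1"),
                                    Tenant("vrf-tenant-b", "eth2")]))
```

With this layout the isolation lives in the per-VRF stanzas rather than in separate processes or namespaces.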
Regards
Gurpreet

> On Sep 1, 2024, at 6:07 AM, Frode Nordahl <fnord...@ubuntu.com> wrote:
>
> On Fri, Aug 30, 2024 at 19:48, Roberto Bartzen Acosta
> <roberto.aco...@luizalabs.com> wrote:
>> Hello Frode,
>>
>> Thanks for working on this.
>
> Hello, Roberto,
>
> Thank you for your interest in the work.
>
>> On Fri, Aug 30, 2024 at 12:37, Frode Nordahl <fnord...@ubuntu.com>
>> wrote:
>>> Hello, Tiago,
>>>
>>> Please find my response in-line below.
>>>
>>> On Fri, Aug 30, 2024 at 17:09, Tiago Pires
>>> <tiago.pi...@luizalabs.com> wrote:
>>>
>>> > Hi all,
>>> >
>>> > I did test your Routing protocol port redirection patch and I was
>>> > wondering how you guys are planning to make the learning and
>>> > advertisement of the routes between the Logical router's routing
>>> > table and the bgp daemon.
>>> >
>>>
>>> Thank you for your interest in this work. The answer to your question
>>> is in the thread you are quoting, I'll highlight it with comments
>>> below.
>>>
>>> Thanks
>>> >
>>> > Regards,
>>> >
>>> > Tiago Pires
>>> >
>>> > On Tue, Aug 6, 2024 at 9:03 AM Frode Nordahl <fnord...@ubuntu.com>
>>> > wrote:
>>> >
>>> >> On Mon, Aug 5, 2024 at 12:02 PM Ales Musil <amu...@redhat.com>
>>> >> wrote:
>>> >> > On Thu, Aug 1, 2024 at 6:04 PM Frode Nordahl
>>> >> > <fnord...@ubuntu.com> wrote:
>>> >> >>
>>> >> >> Hello, Ales,
>>> >> >>
>>> >> >> This is a fork of the thread to go back to discuss some of the
>>> >> >> items raised in the most recent instance of the OVN A/V
>>> >> >> Community meeting [6].
>>> >> >
>>> >> > Hi Frode,
>>> >> >
>>> >> > thank you for the followup discussion.
>>> >> >
>>> >> >>
>>> >> >> On Fri, Jun 28, 2024 at 11:03 AM Ales Musil <amu...@redhat.com>
>>> >> >> wrote:
>>> >> >> > On Tue, Jun 25, 2024 at 6:52 PM Frode Nordahl
>>> >> >> > <fnord...@ubuntu.com> wrote:
>>> >> >> >>
>>> >> >> >> Hello,
>>> >> >> >>
>>> >> >> >> We are increasingly seeing requests for integration between
>>> >> >> >> OVN powered CMSs/workloads and the fabric.
>>> >> >> >>
>>> >> >> >> As a side note, this is a very interesting topic to me
>>> >> >> >> personally, and I think there are opportunities in the long
>>> >> >> >> term for this class of software to potentially fill a void
>>> >> >> >> for more automated and SDN-like ways of managing the physical
>>> >> >> >> network, as previously closed physical switch hardware is
>>> >> >> >> increasingly opening up to programmatic extension and
>>> >> >> >> control.
>>> >> >> >>
>>> >> >> >> While very exciting, it will take a while, both in terms of
>>> >> >> >> evolving how networking teams are organized, in terms of the
>>> >> >> >> longevity of networking gear making entity wide refresh
>>> >> >> >> cycles very long, not to mention gathering agreement and
>>> >> >> >> momentum to build such a thing from the pieces we have.
>>> >> >> >>
>>> >> >> >>
>>> >> >> >> So to be pragmatic, we need to integrate with something that
>>> >> >> >> fabric network engineers are comfortable with, and already
>>> >> >> >> available on most networking hardware, be it closed or open,
>>> >> >> >> today.
>>> >> >> >>
>>> >> >> >> The most ubiquitous routing protocol, which has prevailed in
>>> >> >> >> modern layer 3 only data center designs [0], is BGP.
>>> >> >> >>
>>> >> >> >> Use cases:
>>> >> >> >> * Allow fabric to locate and direct traffic to reroutable
>>> >> >> >>   resources such as IPv4/IPv6 prefixes, Floating IPs (FIPs)
>>> >> >> >>   and Load Balancer VIPs.
>>> >> >> >>
>>> >> >> >> * Use the fabric as a load balancer, announcing the same
>>> >> >> >>   service IP on multiple hosts (anycast).
>>> >> >> >>
>>> >> >> >> * Aggregate announcements from stacked CMSes (i.e. Kubernetes
>>> >> >> >>   running on top of OpenStack).
>>> >> >> >>
>>> >> >> >>
>>> >> >> >> Requirements:
>>> >> >> >> * Data path must be hardware offloaded, i.e. the next hop
>>> >> >> >>   address the peer resolves for announcements of OVN
>>> >> >> >>   resources needs to be an LRP IP.
>>> >> >> >>
>>> >> >> >> * Minimize configuration overhead through the use of IPv6
>>> >> >> >>   LLAs for peering routing both IPv4 and IPv6 prefixes over a
>>> >> >> >>   IPv6 BGP session [1] (aka. “BGP Unnumbered”).
>>> >> >> >>
>>> >> >> >> * Support ECMP out of the host, i.e. use L3 interfaces
>>> >> >> >>   potentially connecting to two different ToRs, instead of
>>> >> >> >>   bonds, avoiding the additional complexity of multi-chassis
>>> >> >> >>   bonds.
>>> >> >> >>
>>> >> >> >> * Support BGP authentication [2][3], i.e. the source,
>>> >> >> >>   destination address and ports in packet headers can not be
>>> >> >> >>   changed.
>>> >> >> >>
>>> >> >> >> * Compatibility
>>> >> >> >>   * Running a BGP protocol suite on the host is becoming a
>>> >> >> >>     thing in its own right, and our users may have
>>> >> >> >>     requirements of their own that influence their choice of
>>> >> >> >>     implementation. We need to take this into account and
>>> >> >> >>     choose integration methods that allow OVN to work with
>>> >> >> >>     multiple protocol suite implementations.
>>> >> >> >>
>>> >> >> >>   * While we have the power to change and fix issues in
>>> >> >> >>     popular routing protocol suites, such as FRR, we need to
>>> >> >> >>     be able to integrate with versions that exist on
>>> >> >> >>     networking hardware out there today.
>>> >> >> >>
>>> >> >> >> Limitations that influence/dictate implementation choices:
>>> >> >> >> * Peering with IPv6 LLAs to meet the configuration overhead
>>> >> >> >>   requirement makes the peering relationship point to point.
>>> >> >> >>
>>> >> >> >> * Popular BGP implementations, such as FRR which is used as
>>> >> >> >>   routing protocol suite by many ToR open source NOSes, does
>>> >> >> >>   not accept sending/receiving IPv6 LLA next hop with the
>>> >> >> >>   route, so the BGP peer address will be used as next hop.
>>> >> >> >>   (There are even mentions of 3rd party nexthop currently not
>>> >> >> >>   being supported, but not sure if that is accurate [4]).
>>> >> >> >>
>>> >> >> >> * As mentioned above, BGP authentication requires IP headers
>>> >> >> >>   to be unchanged for the BGP TCP packets going to/from the
>>> >> >> >>   BGP speaker.
>>> >> >> >>
>>> >> >> >>
>>> >> >> >> Proposed implementation:
>>> >> >> >>
>>> >> >> >> We are in the process of preparing some RFC/PoC patches that
>>> >> >> >> at a high level will:
>>> >> >> >> * Manage a VRF in the system serving two purposes:
>>> >> >> >>   * Leaking of route information from ovn-controller to the
>>> >> >> >>     VRF routing table, which a routing protocol suite can
>>> >> >> >>     redistribute subject to configuration.
>>> >> >> >>
>>> >> >> >>   * Provide an IP endpoint that a VRF aware application, such
>>> >> >> >>     as FRR, can bind to serving as a BGP speaker on behalf of
>>> >> >> >>     a OVN LRP IP.
>>> >> >> >>
>>> >> >> >> * We will attach a OVN VIF to this VRF that has data path
>>> >> >> >>   rules that:
>>> >> >> >>
>>> >> >> >>   * Forward required traffic destined to the OVN LRP IP to
>>> >> >> >>     the VRF.
>>> >> >> >>
>>> >> >> >>   * Forward required traffic from the application bound to
>>> >> >> >>     the VRF as if it originated from the OVN LRP IP.
>>> >>
>>> >
>>> The above bullets give an overview of the proposed implementation.
>>
>> Just to validate the understanding and align expectations, the main
>> purpose of this implementation/module developed for
>> route-exchange-netlink is to export addresses that are typically
>> "external" from the point of view of the OVN router / SDN. In this
>> case dnat_and_snat rules for FIPs and LB VIPs configured on the
>> Logical Router, right?
>
> Development happens in iterations, so this is indeed where we start.
> We also want to get to learning of routes, as that would simplify
> configuration, and hopefully we will.
>
>> So, it's out of the scope to advertise/learn routes from the router's
>> internal networks, as well as the router's static route table,
>> directly connected LS subnets, etc. Therefore, it's out of the scope
>> to make the service and integration of the BGP daemon multi-tenant,
>> since there is no plan for segmentation by namespaces to run different
>> BGP daemons, right? I imagine that in the multi-tenant use case we
>> would have a BGP daemon for each logical router, in addition to
>> playing with the router's complete route table.
>
> The BGP redirect option and route redistribute options are per LRP, and
> there are no restrictions on how the LSP redirected to is terminated in
> the system. Likewise, the use of VRFs to exchange route information
> gives you isolation. We view the BGP daemon as a system/admin level
> entity, and the isolation you seek can be configured in it?
>
> We have large users that do not use NAT and would be interested in
> redistributing LR networks attached to distributed gateways (the
> OpenStack use case).
>
> However, before diving into that we need to figure out how to connect a
> distributed topology with the per chassis gateway router, as the answer
> to that will impact what resources it makes sense to redistribute
> where.
>
> --
> Frode Nordahl
>
>> Best regards,
>> Roberto
>>
>>> >> >> >>
>>> >> >> >> Hopefully we'll have something up on the list before the end
>>> >> >> >> of this week, which makes it real and easier to reason about
>>> >> >> >> for further discussion.
>>> >> >> >>
>>> >> >> >>
>>> >> >> >> Prior art:
>>> >> >> >>
>>> >> >> >> We recognize that there already exists a third party approach
>>> >> >> >> to this in the ovn-bgp-agent [5] governed by OpenStack, and
>>> >> >> >> our goal with this work is to provide a tighter integration
>>> >> >> >> that might cater generically for other CMSes and use cases.
>>> >> >> >>
>>> >> >> >>
>>> >> >> >> 0: https://datatracker.ietf.org/doc/html/rfc7938
>>> >> >> >> 1: https://datatracker.ietf.org/doc/html/rfc5549
>>> >> >> >> 2: https://datatracker.ietf.org/doc/html/rfc2385
>>> >> >> >> 3: https://datatracker.ietf.org/doc/html/rfc5925
>>> >> >> >> 4:
>>> >> >> >> https://github.com/FRRouting/frr/blob/cc3519f3e6eaa06f762e0d447202df32df66e129/bgpd/bgp_route.c#L2719
>>> >> >> >> 5: https://docs.openstack.org/ovn-bgp-agent/latest/
>>> >> >>
>>> >> >> 6:
>>> >> >> https://mail.openvswitch.org/pipermail/ovs-dev/2024-August/416209.html
>>> >> >>
>>> >> >> >
>>> >> >> > Hi Frode,
>>> >> >> >
>>> >> >> > looking forward to the RFC.
>>> >> >>
>>> >> >> As we agreed, the current set of patches that we have
>>> >> >> [7][8][9][10][11][12][13] will not be considered for the 24.09
>>> >> >> release as we would like to make it more feature complete and
>>> >> >> target the 25.03 release instead. In that context I guess they
>>> >> >> serve as the RFC patches.
>>> >> >>
>>> >> >> 7: https://mail.openvswitch.org/pipermail/ovs-dev/2024-July/416038.html
>>> >> >> 8: https://mail.openvswitch.org/pipermail/ovs-dev/2024-July/416039.html
>>> >> >> 9: https://mail.openvswitch.org/pipermail/ovs-dev/2024-July/416040.html
>>> >> >> 10: https://mail.openvswitch.org/pipermail/ovs-dev/2024-July/416042.html
>>> >> >> 11: https://mail.openvswitch.org/pipermail/ovs-dev/2024-July/416041.html
>>> >> >> 12: https://mail.openvswitch.org/pipermail/ovs-dev/2024-July/416043.html
>>> >> >> 13: https://mail.openvswitch.org/pipermail/ovs-dev/2024-July/416044.html
>>> >> >>
>>>
>>> The above patches are the current state of the work and we will
>>> continue it the coming cycle with the intention to get it into the
>>> 25.03 release. If you have cycles to try it out and provide feedback,
>>> that would be most welcome.
>>>
>>> --
>>> Frode Nordahl
>>>
>>> >> >> In addition to the above there is the LRP BGP redirect patch
>>> >> >> from Martin [14], which could be useful independently.
>>> >> >>
>>> >> >> 14: https://mail.openvswitch.org/pipermail/ovs-dev/2024-July/416095.html
>>> >> >>
>>> >> >> Discussion points from meeting:
>>> >> >> 1) OVN and Netlink code
>>> >> >> In the meeting you raised some concerns about introducing
>>> >> >> Netlink code in the OVN repository. While I agree with you 100%
>>> >> >> that the part of [10] that vendors code from OVS (contents of
>>> >> >> route-exchange-netlink-private.h), should instead be a patch
>>> >> >> for OVS.
>>> >> >>
>>> >> >> The parts of [10] that provide higher layer helper functions,
>>> >> >> consuming OVS library code, do not naturally fit in OVS as OVS
>>> >> >> itself has no use for them.
>>> >> >
>>> >> > After the discussion that we had during the meeting I agree that
>>> >> > we should reuse as much OvS code as possible.
>>> >> >
>>> >> >>
>>> >> >> As a quick reminder, we are looking at Netlink because it
>>> >> >> provides a simple and established API for exchange of this type
>>> >> >> of information which is already supported by all routing
>>> >> >> protocol suites out there. It is not tied to any particular
>>> >> >> data path type, we could theoretically even use it as IPC
>>> >> >> between two userspace processes, removing the kernel from the
>>> >> >> picture with support on the routing protocol suite side. (There
>>> >> >> has been some discussion on this for BIRD:
>>> >> >> https://bird.network.cz/pipermail/bird-users/2021-September/015707.html).
>>> >> >>
>>> >> >> Would it be possible to reach some compromise to include only
>>> >> >> the parts that consume OVS library code
>>> >> >> (route-exchange-netlink.{c|h})?
>>> >> >>
>>> >> >> While a plugin based approach was also suggested, and we have
>>> >> >> prior examples of successfully using that, it does not come
>>> >> >> without substantial cost. So I want to explore what options
>>> >> >> there are to host this inside the main repository.
>>> >> >
>>> >> > We could maintain the netlink plugin in the OVN codebase as an
>>> >> > example and still have the plugin system in place. The plugin
>>> >> > system has potential benefit of not locking ourselves to just
>>> >> > netlink, but we could potentially use API regardless of the OVN
>>> >> > code. Would that be enough of justification and to add the
>>> >> > plugin system + netlink as default plugin housed in the OVN
>>> >> > codebase?
>>> >>
>>> >> This makes sense and provides a path forward, thanks!
>>> >>
>>> >> We are in a point of the development cycle where we need to tend
>>> >> to some downstream stuff, but we will continue this work. I'll try
>>> >> to get patches to make the OVS route-table module
>>> >> consumable/reusable for this work posted as soon as possible and
>>> >> start thinking about what's needed in the route-exchange provider
>>> >> interface.
>>> >>
>>> >> >> 2) OVS OpenFlow extensions
>>> >> >> One of the counter proposals you brought up was to add OVS
>>> >> >> OpenFlow extensions to allow OVN instruct OVS to insert routes
>>> >> >> into a system routing table.
>>> >> >>
>>> >> >> While I see this could be a clear separation of concerns
>>> >> >> between OVN and OVS, and OpenFlow being OVN's native
>>> >> >> integration language, I struggle a bit with the general
>>> >> >> usefulness of such an extension.
>>> >> >>
>>> >> >> Our use case for inserting routes into a system routing table
>>> >> >> is purely for exchange of control plane information with some
>>> >> >> external system, such as a routing protocol suite, and we have
>>> >> >> no interest in using it for actual data path control. This is
>>> >> >> in contrast to how OVN uses OpenFlow generally, which as far as
>>> >> >> I understand is to control the data path.
>>> >> >
>>> >> > In the light of the other discussion it doesn't really make
>>> >> > sense to have plugin when most of the netlink code wouldn't be
>>> >> > in the OvS codebase anyway.
>>> >>
>>> >> Assuming you are referring to the OpenFlow extension here, and
>>> >> yes, I agree.
>>> >>
>>> >> >>
>>> >> >> [ snip ]
>>> >> >>
>>> >> >> > Another important part that we should keep in mind if
>>> >> >> > possible, is the EVPN use case. To be able to configure VXLAN
>>> >> >> > tunnels based on the info that we will receive.
>>> >> >> >
>>> >> >> > I'm not sure how far/deep in the actual design you are but
>>> >> >> > maybe the following might be helpful in some way. What I had
>>> >> >> > in mind was sort of plugin that would expose the info of
>>> >> >> > bound entities that are interesting in terms of BGP (it could
>>> >> >> > be configurable), so mainly FIPS, LBs, GW router IPs.
>>> >> >> > For the import part (which would be applicable only for GW
>>> >> >> > LR) we would create entries in SB DB similarly as we do
>>> >> >> > currently with Multicast_Group (BGP_Routes? EVPN_Tunnels?).
>>> >> >> > Northd could consume those values and configure logical flows
>>> >> >> > and encaps as needed.
>>> >> >>
>>> >> >> While the EVPN part is not a priority for us at this point in
>>> >> >> time, we will of course be interested in making sure the work
>>> >> >> we put into stage 1 (ovn-controller redistributing FIPS, LBs
>>> >> >> and GW router IPs), stage 2 (ovn-controller learning routes)
>>> >> >> will be consumable for a stage 3.
>>> >> >
>>> >> > Great, yeah my point during the discussion wasn't about making
>>> >> > it available right away just to reiterate that it would be of
>>> >> > interest for us and if needed we can of course help with the
>>> >> > development process.
>>> >>
>>> >> Cool stuff, let's do it!
>>> >>
>>> >> --
>>> >> Frode Nordahl
>>> >>
>>> >> >>
>>> >> >> --
>>> >> >> Frode Nordahl
>>> >> >>
>>> >> >> > Let me know if that makes sense.
>>> >> >> >
>>> >> >> > Thanks,
>>> >> >> > Ales
>>> >> >> >
>>> >> >> > --
>>> >> >> >
>>> >> >> > Ales Musil
>>> >> >> >
>>> >> >> > Senior Software Engineer - OVN Core
>>> >> >> >
>>> >> >> > Red Hat EMEA
>>> >> >> >
>>> >> >> > amu...@redhat.com
>>> >> >>
>>> >> >
>>> >> > Thanks,
>>> >> > Ales
>>> >> _______________________________________________
>>> >> dev mailing list
>>> >> d...@openvswitch.org
>>> >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>> >>
>>> >
>>> > 'This message is intended only for the addresses listed in the
>>> > initial header. If you are not listed among the addresses in the
>>> > header, we ask that you completely disregard the content of this
>>> > message; any copying, forwarding and/or carrying out of the actions
>>> > mentioned in it is immediately void and prohibited.'
>>> >
>>> > 'Although Magazine Luiza takes all reasonable precautions to ensure
>>> > that no virus is present in this e-mail, the company cannot accept
>>> > responsibility for any loss or damage caused by this e-mail or its
>>> > attachments.'