On 03/08/2021 19:33, Han Zhou wrote:
> On Tue, Aug 3, 2021 at 11:09 AM Numan Siddique <[email protected]> wrote:
>>
>> On Fri, Jul 30, 2021 at 3:22 AM Han Zhou <[email protected]> wrote:
>>>
>>> Note: This patch series is on top of a pending patch that is still
>>> under review:
>>> http://patchwork.ozlabs.org/project/ovn/patch/[email protected]/
>>>
>>> It is RFC because: a) it is based on the unmerged patch, and b) the
>>> DDlog changes are not done yet.  Below is a copy of the commit message
>>> of the last patch in this series:
>>>
>>> For a fully distributed virtual network dataplane, ovn-controller
>>> flood-fills datapaths that are connected through patch ports.  This
>>> creates scale problems in ovn-controller when the connected datapaths
>>> are too many.
>>>
>>> In particular, when distributed gateway ports (DGPs) are used to
>>> connect logical routers to logical switches, and there is no need for
>>> distributed processing of those gateway ports (e.g. no dnat_and_snat
>>> configured), the datapaths on the other side of the gateway ports are
>>> not needed locally on the current chassis.  This patch avoids pulling
>>> those datapaths into the local set in those scenarios.
>>>
>>> There are two scenarios that can greatly benefit from this
>>> optimization.
>>>
>>> 1) When there are multiple tenants, each with its own logical
>>> topology, all sharing the same external/provider networks connected
>>> to their own logical routers with DGPs.  Without this optimization,
>>> each ovn-controller would process the logical topologies of all
>>> tenants and program flows for all of them, even if only a few tenants
>>> have workloads on the node where that ovn-controller is running,
>>> because the shared external network connects all tenants.  With this
>>> change, only the logical topologies relevant to the node are
>>> processed and programmed on the node.
>>>
>>> 2) In some deployments, such as ovn-kubernetes, logical switches are
>>> bound to chassis instead of being distributed, because each chassis
>>> is assigned dedicated subnets.  With the current implementation,
>>> ovn-controller on each node processes all logical switches and all
>>> ports on them, without knowing that they are not distributed at all.
>>> At large scale with N nodes (N = hundreds or more), roughly N times
>>> the processing power is wasted on the logical-connectivity flows.
>>> With this change, those deployments can use DGPs to connect the
>>> node-level logical switches to the distributed router(s), with the
>>> gateway chassis (or an HA chassis group without real HA) of each DGP
>>> set to the chassis where the logical switch is bound.  This
>>> inherently tells OVN the mapping between logical switch and chassis,
>>> so ovn-controller can avoid processing the topologies of the other
>>> node-level logical switches, hugely saving compute cost on each
>>> ovn-controller.
>>>
>>> For 2), test results for an ovn-kubernetes-like deployment show a
>>> significant improvement in ovn-controller, in both CPU (>90% reduced)
>>> and memory.
>>>
>>> Topology:
>>>
>>> - 1000 nodes, 1 LS with 10 LSPs per node, connected to a distributed
>>>   router.
>>>
>>> - 2 large port groups, PG1 and PG2, each with 2000 LSPs.
>>>
>>> - 10 stateful ACLs: 5 from PG1 to PG2, 5 from PG2 to PG1.
>>>
>>> - 1 GR per node, connected to the distributed router through a join
>>>   switch.  Each GR also connects to an external logical switch per
>>>   node.
>>>   (This part is to keep the test environment close to a real
>>>   ovn-kubernetes setup but shouldn't make much difference for the
>>>   comparison.)
>>>
>>> ==== Before the change ====
>>> OVS flows per node: 297408
>>> ovn-controller memory: 772696 KB
>>> ovn-controller recompute: 13s
>>> ovn-controller restart (recompute + reinstall OVS flows): 63s
>>>
>>> ==== After the change (also using DGPs to connect node-level LSes) ====
>>> OVS flows per node: 81139 (~70% reduced)
>>> ovn-controller memory: 163464 KB (~80% reduced)
>>> ovn-controller recompute: 0.86s (>90% reduced)
>>> ovn-controller restart (recompute + reinstall OVS flows): 5s (>90%
>>> reduced)
>>
>> Hi Han,
>>
>> Thanks for these RFC patches.  The improvements are significant.
>> That's awesome.
>>
>> If I understand this RFC correctly, ovn-k8s will set the
>> gateway_chassis for each logical router port of the cluster router
>> (ovn_cluster_router) connecting to a node logical switch, right?
>>
>> If so, instead of using the multiple gateway port feature, why can't
>> ovn-k8s just set chassis=<node_chassis_name> in the logical switch's
>> other_config?
>>
>> ovn-controllers could then exclude the logical switches from
>> local_datapaths if they don't belong to the local chassis.
>>
>> I'm not entirely sure if this would work.  Any thoughts?  If the same
>> can be achieved using the chassis option instead of multiple gateway
>> router ports, the former seems better to me, as it would be less work
>> for ovn-k8s and there would be fewer resources in the SB DB.  What do
>> you think?  Otherwise, +1 from me for this RFC series.
>>
>
> Thanks Numan for the feedback!
> The reasons for not introducing a new option on the LS are:
> 1) The multiple-DGP support is a valuable feature regardless of the use
> case of this RFC.
> 2) Not flood-filling beyond a DGP is also valuable regardless of the
> ovn-k8s use case.
> As mentioned, it would also help OpenStack scalability when multiple
> tenants share the same provider networks.
> 3) If 1) and 2) are both implemented, there is no need for an extra
> mechanism to "bind logical switches to chassis", because the outcome of
> 1) and 2) is sufficient.  The changes in ovn-k8s would be the same,
> i.e. set the chassis somewhere, either on a LRP or on a LS.  I have
> sent a WIP PR to the ovn-k8s repo and it appears to be a very small
> change:
> https://github.com/ovn-org/ovn-kubernetes/pull/2388
>
> In addition, a separate option on the LS seems unnatural to me, because
> the end user must understand what they are doing by setting that
> option.

This is a great series, Han, although I haven't looked into all the
details.  I think I disagree with this point.  For me, at least, setting
the chassis for a switch appears a more intuitive way of configuring
this, as it follows the established pattern that we use for routers.

> In contrast, the DGP more flexibly and accurately tells OVN what it
> should do.  Maybe the name "Distributed Gateway Port" is somehow
> confusing,

Yes, this could be the case.

> but the chassis-redirect port behind it tells OVN that the user wants
> the traffic for that LRP to be redirected to a chassis.  There can be
> different scenarios, such as a single LS connecting to multiple DGPs
> and vice versa, all of which are valid setups supported by this
> feature.  Setting a chassis option on a LS, on the other hand, is
> arbitrary, and it is easy to create conflicting setups, e.g. setting
> such an option on a join LS.  Of course we can say the user is
> responsible for what they set, but I just don't see it as necessary
> for now.
>
> Does this make sense?
>
>> Thanks
>> Numan
>>
>>>
>>> Han Zhou (4):
>>>   ovn-northd: Avoid ha_ref_chassis calculation when there is only one
>>>     chassis in ha_chassis_group.
>>>   binding.c: Refactor binding_handle_port_binding_changes.
>>>   binding.c: Create a new function
>>>     consider_patch_port_for_local_datapaths.
>>>   ovn-controller: Don't flood fill local datapaths beyond DGP boundary.
>>>
>>>  controller/binding.c   | 190 +++++++++++++++++++++++++++++------------
>>>  northd/ovn-northd.c    |  39 +++++++--
>>>  ovn-architecture.7.xml |  26 ++++++
>>>  ovn-nb.xml             |   6 ++
>>>  tests/ovn.at           |  67 +++++++++++++++
>>>  5 files changed, 268 insertions(+), 60 deletions(-)
>>>
>>> --
>>> 2.30.2
>>>
>>> _______________________________________________
>>> dev mailing list
>>> [email protected]
>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
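
[Editor's note: to make the flood-fill behavior discussed in this thread
concrete, here is a toy model in Python.  It is not the actual C
implementation in controller/binding.c; the function name, the link
representation, and all datapath/chassis names are illustrative.  It
sketches how ovn-controller grows its set of local datapaths by following
patch-port links, and how the proposed change stops the traversal at a DGP
whose gateway chassis is remote.]

```python
from collections import deque

def local_datapaths(bound_here, links, local_chassis):
    """Return the set of datapaths this chassis should process.

    bound_here: datapaths with a port bound on this chassis.
    links: dp -> list of (peer_dp, gw_chassis); gw_chassis is None for a
    plain patch port, or names the redirect chassis for a DGP link.
    """
    seen = set(bound_here)
    queue = deque(bound_here)
    while queue:
        dp = queue.popleft()
        for peer, gw_chassis in links.get(dp, []):
            # The proposed optimization: don't cross a DGP whose gateway
            # chassis is remote -- the peer datapath isn't needed here.
            if gw_chassis is not None and gw_chassis != local_chassis:
                continue
            if peer not in seen:
                seen.add(peer)
                queue.append(peer)
    return seen

# Tenant LS -- tenant LR -- (DGP pinned to hv2) -- shared provider LS.
links = {
    "ls-tenant": [("lr-tenant", None)],
    "lr-tenant": [("ls-tenant", None), ("ls-provider", "hv2")],
    "ls-provider": [("lr-tenant", "hv2")],
}
# On hv1 the shared provider LS is never pulled in; on hv2 it is.
```

With the DGP boundary check, a chassis hosting only this tenant's
workloads never processes the shared provider network, which is the
source of the CPU and memory savings reported above.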

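[Editor's note: for reference, the scenario-2 wiring described above (one
logical switch per node, pinned to that node's chassis through a DGP on
the distributed router) can be sketched with standard ovn-nbctl commands.
All router, switch, port, and chassis names below are made up for
illustration; only the commands themselves are real.]

```shell
# Distributed cluster router and one node-level logical switch.
ovn-nbctl lr-add lr-cluster
ovn-nbctl ls-add ls-node1

# Router port for node1's subnet, and the LS-side peer port.
ovn-nbctl lrp-add lr-cluster lrp-node1 02:00:00:00:01:01 10.0.1.1/24
ovn-nbctl lsp-add ls-node1 ls-node1-to-lr
ovn-nbctl lsp-set-type ls-node1-to-lr router
ovn-nbctl lsp-set-addresses ls-node1-to-lr router
ovn-nbctl lsp-set-options ls-node1-to-lr router-port=lrp-node1

# The key step: make lrp-node1 a gateway port pinned to node1's chassis,
# which tells OVN the LS-to-chassis mapping discussed in the thread.
ovn-nbctl lrp-set-gateway-chassis lrp-node1 node1-chassis 100
```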