On Fri, Jul 30, 2021 at 3:22 AM Han Zhou <[email protected]> wrote:
>
> Note: This patch series is on top of a pending patch that is still under
> review:
> http://patchwork.ozlabs.org/project/ovn/patch/[email protected]/
>
> It is RFC because: a) it is based on the unmerged patch. b) DDlog
> changes are not done yet. Below is a copy of the commit message of the last
> patch in this series:
>
> For a fully distributed virtual network dataplane, ovn-controller
> flood-fills datapaths that are connected through patch ports. This
> creates scale problems in ovn-controller when there are too many
> connected datapaths.
>
> In one particular situation, when distributed gateway ports (DGPs) are
> used to connect logical routers to logical switches and there is no need
> for distributed processing of those gateway ports (e.g. no dnat_and_snat
> configured), the datapaths on the other side of the gateway ports are
> not needed locally on the current chassis. This patch avoids pulling
> those datapaths into the local datapaths in those scenarios.
>
> There are two scenarios that can greatly benefit from this optimization.
>
> 1) When there are multiple tenants, each with its own logical topology,
>    but sharing the same external/provider networks, connected to their
>    own logical routers with DGPs. Without this optimization, each
>    ovn-controller would process the logical topologies of all tenants
>    and program flows for all of them, even if only very few tenants
>    have workloads on the node where the ovn-controller is running,
>    because the shared external network connects all tenants.
>    With this change, only the logical topologies relevant to the node
>    are processed and programmed on the node.
>
> 2) In some deployments, such as ovn-kubernetes, logical switches are
>    bound to chassis instead of being distributed, because each chassis
>    is assigned dedicated subnets.
>    With the current implementation, ovn-controller on each node
>    processes all logical switches and all ports on them, without
>    knowing that they are not distributed at all. At large scale with N
>    nodes (N = hundreds or even more), roughly N times the necessary
>    processing power is wasted on the flows related to logical
>    connectivity. With this change, those deployments can use DGPs
>    to connect the node level logical switches to distributed router(s),
>    with the gateway chassis (or an HA chassis group without real HA) of
>    the DGP set to the chassis where the logical switch is bound. This
>    inherently tells OVN the mapping between logical switch and chassis,
>    and ovn-controller can avoid processing the topologies of other node
>    level logical switches, which greatly reduces the compute cost of
>    each ovn-controller.
>
> For 2), test results for an ovn-kubernetes-like deployment show
> significant improvement of ovn-controller, both CPU (>90% reduced) and memory.
>
> Topology:
>
> - 1000 nodes, 1 LS with 10 LSPs per node, connected to a distributed
>   router.
>
> - 2 large port-groups PG1 and PG2, each with 2000 LSPs
>
> - 10 stateful ACLs: 5 from PG1 to PG2, 5 from PG2 to PG1
>
> - 1 GR per node, connected to the distributed router through a join
>   switch. Each GR also connects to an external logical switch per node.
>   (This part is to keep the test environment close to a real
>   ovn-kubernetes setup but shouldn't make much difference for the
>   comparison)
>
> ==== Before the change ====
> OVS flows per node: 297408
> ovn-controller memory: 772696 KB
> ovn-controller recompute: 13s
> ovn-controller restart (recompute + reinstall OVS flows): 63s
>
> ==== After the change (also use DGP to connect node level LSes) ====
> OVS flows per node: 81139 (~70% reduced)
> ovn-controller memory: 163464 KB (~80% reduced)
> ovn-controller recompute: 0.86s (>90% reduced)
> ovn-controller restart (recompute + reinstall OVS flows): 5s (>90% reduced)
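
To make the flood-fill behaviour described in the commit message concrete, here is a minimal sketch of the idea as a toy model. It is not the logic in controller/binding.c. The names `local_datapaths`, `build_topology`, `links` and `stop_at_dgp` are hypothetical, and the rule "do not cross a DGP whose gateway chassis is another chassis" is an assumed simplification of what the patch does.

```python
from collections import deque


def local_datapaths(seeds, links, local_chassis, stop_at_dgp=True):
    """Flood-fill datapaths reachable from `seeds` over patch-port links.

    `links` maps a datapath name to a list of (peer, dgp_chassis) tuples.
    `dgp_chassis` is None for an ordinary patch port, or the gateway
    chassis name when the link goes through a DGP (assumed model).
    """
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        dp = queue.popleft()
        for peer, dgp_chassis in links.get(dp, ()):
            # Assumed rule: a DGP hosted on another chassis is a boundary.
            remote_dgp = dgp_chassis is not None and dgp_chassis != local_chassis
            if stop_at_dgp and remote_dgp:
                continue
            if peer not in seen:
                seen.add(peer)
                queue.append(peer)
    return seen


def build_topology(n_nodes):
    """One node switch per chassis, each attached to router "lr" via a DGP."""
    links = {}
    for i in range(1, n_nodes + 1):
        ls, chassis = f"ls{i}", f"chassis{i}"
        links.setdefault(ls, []).append(("lr", chassis))
        links.setdefault("lr", []).append((ls, chassis))
    return links


links = build_topology(1000)
# chassis1 only has workloads on ls1, so ls1 is the only seed datapath.
before = local_datapaths(["ls1"], links, "chassis1", stop_at_dgp=False)
after = local_datapaths(["ls1"], links, "chassis1")
print(len(before), len(after))  # prints "1001 2"
```

In this toy model the unbounded fill pulls in the router and all 1000 node switches, while the bounded fill keeps only ls1 and the router. The flow and memory figures quoted above of course depend on much more than the datapath count.
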
Hi Han,

Thanks for these RFC patches. The improvements are significant.
That's awesome.

If I understand this RFC correctly, ovn-k8s will set the gateway_chassis
for each logical router port of the cluster router (ovn_cluster_router)
connecting to the node logical switch, right?

If so, instead of using the multiple gw port feature, why can't ovn-k8s
just set chassis=<node_chassis_name> in the logical switch
other_config option?

ovn-controllers can then exclude the logical switches from the
local_datapaths if they don't belong to the local chassis.

I'm not entirely sure if this would work. Any thoughts?

If the same can be achieved using the chassis option instead of multiple
gw router ports, the former seems better to me, as it would be less work
for ovn-k8s and there would be fewer resources in the SB DB. What do you
think? Otherwise +1 from me for this RFC series.

Thanks
Numan

>
> Han Zhou (4):
>   ovn-northd: Avoid ha_ref_chassis calculation when there is only one
>     chassis in ha_chassis_group.
>   binding.c: Refactor binding_handle_port_binding_changes.
>   binding.c: Create a new function
>     consider_patch_port_for_local_datapaths.
>   ovn-controller: Don't flood fill local datapaths beyond DGP boundary.
>
>  controller/binding.c   | 190 +++++++++++++++++++++++++++++------------
>  northd/ovn-northd.c    |  39 +++++++--
>  ovn-architecture.7.xml |  26 ++++++
>  ovn-nb.xml             |   6 ++
>  tests/ovn.at           |  67 +++++++++++++++
>  5 files changed, 268 insertions(+), 60 deletions(-)
>
> --
> 2.30.2
>
> _______________________________________________
> dev mailing list
> [email protected]
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
