On Tue, Aug 3, 2021 at 11:09 AM Numan Siddique <[email protected]> wrote:
>
> On Fri, Jul 30, 2021 at 3:22 AM Han Zhou <[email protected]> wrote:
> >
> > Note: This patch series is on top of a pending patch that is still
> > under review:
> > http://patchwork.ozlabs.org/project/ovn/patch/[email protected]/
> >
> > It is RFC because: a) it is based on the unmerged patch, and b) the
> > DDlog changes are not done yet. Below is a copy of the commit message
> > of the last patch in this series:
> >
> > For a fully distributed virtual network dataplane, ovn-controller
> > flood-fills datapaths that are connected through patch ports. This
> > creates scale problems in ovn-controller when the connected datapaths
> > are too many.
> >
> > In particular, when distributed gateway ports (DGPs) are used to
> > connect logical routers to logical switches, and there is no need for
> > distributed processing of those gateway ports (e.g. no dnat_and_snat
> > configured), the datapaths on the other side of the gateway ports are
> > not needed locally on the current chassis. This patch avoids pulling
> > those datapaths local in those scenarios.
> >
> > There are two scenarios that can greatly benefit from this
> > optimization.
> >
> > 1) When there are multiple tenants, each with its own logical
> >    topology, sharing the same external/provider networks that connect
> >    to their own logical routers through DGPs. Without this
> >    optimization, each ovn-controller would process the logical
> >    topology of all tenants and program flows for all of them, even if
> >    only a small number of tenants have workloads on the node where
> >    that ovn-controller is running, because the shared external network
> >    connects all tenants. With this change, only the logical topologies
> >    relevant to the node are processed and programmed on the node.
> >
> > 2) In some deployments, such as ovn-kubernetes, logical switches are
> >    bound to chassis instead of being distributed, because each chassis
> >    is assigned dedicated subnets. With the current implementation,
> >    ovn-controller on each node processes all logical switches and all
> >    ports on them, without knowing that they are not distributed at
> >    all. At large scale with N nodes (N = hundreds or even more),
> >    roughly N times the processing power is wasted on the logical
> >    connectivity related flows. With this change, those deployments can
> >    use DGPs to connect the node-level logical switches to the
> >    distributed router(s), with the gateway chassis (or HA chassis
> >    without real HA) of each DGP set to the chassis where the logical
> >    switch is bound. This inherently tells OVN the mapping between
> >    logical switch and chassis, and ovn-controller can then avoid
> >    processing the topologies of other nodes' logical switches, which
> >    greatly reduces the compute cost of each ovn-controller.
> >
> > For 2), the test result for an ovn-kubernetes-like deployment shows
> > significant improvement in ovn-controller, both CPU (>90% reduced)
> > and memory.
> >
> > Topology:
> >
> > - 1000 nodes, 1 LS with 10 LSPs per node, connected to a distributed
> >   router.
> >
> > - 2 large port groups, PG1 and PG2, each with 2000 LSPs.
> >
> > - 10 stateful ACLs: 5 from PG1 to PG2, 5 from PG2 to PG1.
> >
> > - 1 GR per node, connected to the distributed router through a join
> >   switch. Each GR also connects to an external logical switch per
> >   node.
> >   (This part is to keep the test environment close to a real
> >   ovn-kubernetes setup but shouldn't make much difference for the
> >   comparison.)
> >
> > ==== Before the change ====
> > OVS flows per node: 297408
> > ovn-controller memory: 772696 KB
> > ovn-controller recompute: 13s
> > ovn-controller restart (recompute + reinstall OVS flows): 63s
> >
> > ==== After the change (also using DGPs to connect node-level LSes) ====
> > OVS flows per node: 81139 (~70% reduced)
> > ovn-controller memory: 163464 KB (~80% reduced)
> > ovn-controller recompute: 0.86s (>90% reduced)
> > ovn-controller restart (recompute + reinstall OVS flows): 5s (>90% reduced)
>
> Hi Han,
>
> Thanks for these RFC patches. The improvements are significant.
> That's awesome.
>
> If I understand this RFC correctly, ovn-k8s will set the
> gateway_chassis for each logical router port of the cluster router
> (ovn_cluster_router) connecting to a node logical switch, right?
>
> If so, instead of using the multiple-gw-port feature, why can't
> ovn-k8s just set chassis=<node_chassis_name> in the logical switch
> other_config option?
>
> ovn-controllers could then exclude logical switches from the
> local_datapaths if they don't belong to the local chassis.
>
> I'm not entirely sure if this would work. Any thoughts? If the same
> can be achieved using the chassis option instead of multiple gw
> router ports, perhaps the former is better, as it would be less work
> for ovn-k8s and there would be fewer resources in the SB DB. What do
> you think? Otherwise, +1 from me for this RFC series.
>
Thanks Numan for the feedback! The reasons for not introducing a new
option in the LS are:

1) Multiple-DGP support is a valuable feature regardless of the use
   case of this RFC.

2) Not flood-filling beyond a DGP is also valuable regardless of the
   ovn-k8s use case. As mentioned, it would also help OpenStack
   scalability when multiple tenants share the same provider networks.

3) If 1) and 2) are both implemented, there is no need for an extra
   mechanism to "bind logical switches to chassis", because the outcome
   of 1) and 2) is sufficient. The changes in ovn-k8s would be the same
   either way, i.e. set the chassis somewhere, either on an LRP or on
   an LS. I have sent a WIP PR to the ovn-k8s repo and it appears to be
   a very small change:
   https://github.com/ovn-org/ovn-kubernetes/pull/2388

In addition, a separate option on the LS seems unnatural to me, because
the end user must understand what they are doing by setting that
option. In contrast, the DGP tells OVN what to do more flexibly and
accurately. Maybe the name "Distributed Gateway Port" is somewhat
confusing, but the chassis-redirect port behind it tells OVN that the
user wants the LRP's traffic to be redirected to a chassis. There can
be different scenarios, such as a single LS connecting to multiple DGPs
and vice versa, all of which are valid setups supported by this
feature. A chassis option on an LS, on the other hand, is arbitrary,
and it would be easy to create conflicting setups, e.g. setting such an
option on LS-join. Of course we can say the user is responsible for
what they set, but I just don't see it as necessary for now. Does this
make sense?

> Thanks
> Numan
>
> >
> > Han Zhou (4):
> >   ovn-northd: Avoid ha_ref_chassis calculation when there is only one
> >     chassis in ha_chassis_group.
> >   binding.c: Refactor binding_handle_port_binding_changes.
> >   binding.c: Create a new function
> >     consider_patch_port_for_local_datapaths.
> >   ovn-controller: Don't flood fill local datapaths beyond DGP boundary.
> >
> >  controller/binding.c   | 190 +++++++++++++++++++++++++++++------------
> >  northd/ovn-northd.c    |  39 +++++++--
> >  ovn-architecture.7.xml |  26 ++++++
> >  ovn-nb.xml             |   6 ++
> >  tests/ovn.at           |  67 +++++++++++++++
> >  5 files changed, 268 insertions(+), 60 deletions(-)
> >
> > --
> > 2.30.2
> >
> > _______________________________________________
> > dev mailing list
> > [email protected]
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
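P.S. For readers skimming the thread, the behavior the last patch in the
series changes can be sketched roughly as follows. This is a simplified
Python model, not OVN's actual C implementation in controller/binding.c;
the function and structures here are hypothetical. The idea: starting
from the datapaths that have ports bound on this chassis, flood-fill
through patch ports, but do not cross a DGP link unless its gateway
chassis is the local chassis.

```python
# Simplified model of ovn-controller's local-datapath flood fill with the
# "don't cross a DGP boundary" rule. Not OVN code; names are illustrative.
from collections import deque

def local_datapaths(bound, peers, dgp_chassis, local_chassis):
    """Return the set of datapaths this chassis needs locally.

    bound:        datapaths that have locally bound ports.
    peers:        dict mapping a datapath to datapaths reachable via
                  patch ports.
    dgp_chassis:  for patch-port links that are DGPs, maps the link
                  (as a frozenset of the two datapaths) to the gateway
                  chassis configured on the DGP; plain patch-port links
                  are absent.
    local_chassis: name of the chassis running this ovn-controller.
    """
    local = set(bound)
    queue = deque(bound)
    while queue:
        dp = queue.popleft()
        for peer in peers.get(dp, ()):
            if peer in local:
                continue
            gw = dgp_chassis.get(frozenset((dp, peer)))
            # Cross a DGP only when its gateway chassis is this chassis;
            # plain patch ports are always crossed (fully distributed).
            if gw is not None and gw != local_chassis:
                continue
            local.add(peer)
            queue.append(peer)
    return local
```

With two node-level switches ls1/ls2 hung off a distributed router lr
through DGPs pinned to hv1/hv2 respectively, hv1 pulls only {ls1, lr}
local instead of all three datapaths, which is the source of the flow
and memory reductions reported above.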
