On 03/08/2021 19:33, Han Zhou wrote:
> On Tue, Aug 3, 2021 at 11:09 AM Numan Siddique <[email protected]> wrote:
>>
>> On Fri, Jul 30, 2021 at 3:22 AM Han Zhou <[email protected]> wrote:
>>>
>>> Note: This patch series is on top of a pending patch that is still
>>> under review:
>>> http://patchwork.ozlabs.org/project/ovn/patch/[email protected]/
>>>
>>> It is RFC because: a) it is based on the unmerged patch, and b) the
>>> DDlog changes are not done yet.  Below is a copy of the commit message
>>> of the last patch in this series:
>>>
>>> For a fully distributed virtual network dataplane, ovn-controller
>>> flood-fills datapaths that are connected through patch ports.  This
>>> creates scale problems in ovn-controller when the connected datapaths
>>> are too many.
>>>
>>> In particular, when distributed gateway ports (DGPs) are used to
>>> connect logical routers to logical switches, and there is no need for
>>> distributed processing of those gateway ports (e.g. no dnat_and_snat
>>> configured), the datapaths on the other side of the gateway ports are
>>> not needed locally on the current chassis.  This patch avoids pulling
>>> those datapaths into the local set in those scenarios.
>>>
>>> There are two scenarios that can greatly benefit from this
>>> optimization.
>>>
>>> 1) When there are multiple tenants, each with its own logical
>>> topology, all sharing the same external/provider networks connected
>>> to their own logical routers with DGPs.  Without this optimization,
>>> each ovn-controller would process the logical topologies of all
>>> tenants and program flows for all of them, even if only a few tenants
>>> have workloads on the node where that ovn-controller is running,
>>> because the shared external network connects all tenants.  With this
>>> change, only the logical topologies relevant to the node are
>>> processed and programmed on the node.
>>>
>>> 2) In some deployments, such as ovn-kubernetes, logical switches are
>>> bound to chassis instead of being distributed, because each chassis
>>> is assigned dedicated subnets.  With the current implementation,
>>> ovn-controller on each node processes all logical switches and all
>>> ports on them, without knowing that they are not distributed at all.
>>> At large scale with N nodes (N = hundreds or more), roughly N times
>>> the processing power is wasted on the logical-connectivity flows.
>>> With this change, those deployments can use DGPs to connect the
>>> node-level logical switches to the distributed router(s), with the
>>> gateway chassis (or an HA chassis group without real HA) of each DGP
>>> set to the chassis where the logical switch is bound.  This
>>> inherently tells OVN the mapping between logical switch and chassis,
>>> so ovn-controller can avoid processing the topologies of the other
>>> node-level logical switches, hugely saving compute cost on each
>>> ovn-controller.
>>>
>>> For 2), test results for an ovn-kubernetes-like deployment show a
>>> significant improvement in ovn-controller, in both CPU (>90% reduced)
>>> and memory.
>>>
>>> Topology:
>>>
>>> - 1000 nodes, 1 LS with 10 LSPs per node, connected to a distributed
>>>   router.
>>>
>>> - 2 large port groups, PG1 and PG2, each with 2000 LSPs.
>>>
>>> - 10 stateful ACLs: 5 from PG1 to PG2, 5 from PG2 to PG1.
>>>
>>> - 1 GR per node, connected to the distributed router through a join
>>>   switch.  Each GR also connects to an external logical switch per
>>>   node.
>>>   (This part is to keep the test environment close to a real
>>>   ovn-kubernetes setup but shouldn't make much difference for the
>>>   comparison.)
>>>
>>> ==== Before the change ====
>>> OVS flows per node: 297408
>>> ovn-controller memory: 772696 KB
>>> ovn-controller recompute: 13s
>>> ovn-controller restart (recompute + reinstall OVS flows): 63s
>>>
>>> ==== After the change (also using DGPs to connect node-level LSes) ====
>>> OVS flows per node: 81139 (~70% reduced)
>>> ovn-controller memory: 163464 KB (~80% reduced)
>>> ovn-controller recompute: 0.86s (>90% reduced)
>>> ovn-controller restart (recompute + reinstall OVS flows): 5s (>90%
>>> reduced)
>>
>> Hi Han,
>>
>> Thanks for these RFC patches.  The improvements are significant.
>> That's awesome.
>>
>> If I understand this RFC correctly, ovn-k8s will set the
>> gateway_chassis for each logical router port of the cluster router
>> (ovn_cluster_router) connecting to a node logical switch, right?
>>
>> If so, instead of using the multiple gateway port feature, why can't
>> ovn-k8s just set chassis=<node_chassis_name> in the logical switch's
>> other_config?
>>
>> ovn-controllers could then exclude the logical switches from
>> local_datapaths if they don't belong to the local chassis.
>>
>> I'm not entirely sure if this would work.  Any thoughts?  If the same
>> can be achieved using the chassis option instead of multiple gateway
>> router ports, the former seems better to me, as it would be less work
>> for ovn-k8s and there would be fewer resources in the SB DB.  What do
>> you think?  Otherwise, +1 from me for this RFC series.
>>
>
> Thanks Numan for the feedback!
> The reasons for not introducing a new option on the LS are:
> 1) The multiple-DGP support is a valuable feature regardless of the use
> case of this RFC.
> 2) Not flood-filling beyond a DGP is also valuable regardless of the
> ovn-k8s use case.
> As mentioned, it would also help OpenStack scalability when multiple
> tenants share the same provider networks.
> 3) If 1) and 2) are both implemented, there is no need for an extra
> mechanism to "bind logical switches to chassis", because the outcome of
> 1) and 2) is sufficient.  The changes in ovn-k8s would be the same,
> i.e. set the chassis somewhere, either on a LRP or on a LS.  I have
> sent a WIP PR to the ovn-k8s repo and it appears to be a very small
> change:
> https://github.com/ovn-org/ovn-kubernetes/pull/2388
>
> In addition, a separate option on the LS seems unnatural to me, because
> the end user must understand what they are doing by setting that
> option.

This is a great series, Han, although I haven't looked into all the
details.  I think I disagree with this point.  For me, at least, setting
the chassis for a switch appears a more intuitive way of configuring
this, as it follows the established pattern that we use for routers.

> In contrast, the DGP more flexibly and accurately tells OVN what it
> should do.  Maybe the name "Distributed Gateway Port" is somehow
> confusing,

Yes, this could be the case.

> but the chassis-redirect port behind it tells OVN that the user wants
> the traffic for that LRP to be redirected to a chassis.  There can be
> different scenarios, such as a single LS connecting to multiple DGPs
> and vice versa, all of which are valid setups supported by this
> feature.  Setting a chassis option on a LS, on the other hand, is
> arbitrary, and it is easy to create conflicting setups, e.g. setting
> such an option on a join LS.  Of course we can say the user is
> responsible for what they set, but I just don't see it as necessary
> for now.
>
> Does this make sense?
>
>> Thanks
>> Numan
>>
>>>
>>> Han Zhou (4):
>>>   ovn-northd: Avoid ha_ref_chassis calculation when there is only one
>>>     chassis in ha_chassis_group.
>>>   binding.c: Refactor binding_handle_port_binding_changes.
>>>   binding.c: Create a new function
>>>     consider_patch_port_for_local_datapaths.
>>>   ovn-controller: Don't flood fill local datapaths beyond DGP boundary.
>>>
>>>  controller/binding.c   | 190 +++++++++++++++++++++++++++++------------
>>>  northd/ovn-northd.c    |  39 +++++++--
>>>  ovn-architecture.7.xml |  26 ++++++
>>>  ovn-nb.xml             |   6 ++
>>>  tests/ovn.at           |  67 +++++++++++++++
>>>  5 files changed, 268 insertions(+), 60 deletions(-)
>>>
>>> --
>>> 2.30.2
>>>
>>> _______________________________________________
>>> dev mailing list
>>> [email protected]
>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
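
[Editor's note: to make the flood-fill behavior discussed in this thread
concrete, here is a toy model in Python.  It is not the actual C
implementation in controller/binding.c; the function name, the link
representation, and all datapath/chassis names are illustrative.  It
sketches how ovn-controller grows its set of local datapaths by following
patch-port links, and how the proposed change stops the traversal at a DGP
whose gateway chassis is remote.]

```python
from collections import deque

def local_datapaths(bound_here, links, local_chassis):
    """Return the set of datapaths this chassis should process.

    bound_here: datapaths with a port bound on this chassis.
    links: dp -> list of (peer_dp, gw_chassis); gw_chassis is None for a
    plain patch port, or names the redirect chassis for a DGP link.
    """
    seen = set(bound_here)
    queue = deque(bound_here)
    while queue:
        dp = queue.popleft()
        for peer, gw_chassis in links.get(dp, []):
            # The proposed optimization: don't cross a DGP whose gateway
            # chassis is remote -- the peer datapath isn't needed here.
            if gw_chassis is not None and gw_chassis != local_chassis:
                continue
            if peer not in seen:
                seen.add(peer)
                queue.append(peer)
    return seen

# Tenant LS -- tenant LR -- (DGP pinned to hv2) -- shared provider LS.
links = {
    "ls-tenant": [("lr-tenant", None)],
    "lr-tenant": [("ls-tenant", None), ("ls-provider", "hv2")],
    "ls-provider": [("lr-tenant", "hv2")],
}
# On hv1 the shared provider LS is never pulled in; on hv2 it is.
```

With the DGP boundary check, a chassis hosting only this tenant's
workloads never processes the shared provider network, which is the
source of the CPU and memory savings reported above.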

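[Editor's note: for reference, the scenario-2 wiring described above (one
logical switch per node, pinned to that node's chassis through a DGP on
the distributed router) can be sketched with standard ovn-nbctl commands.
All router, switch, port, and chassis names below are made up for
illustration; only the commands themselves are real.]

```shell
# Distributed cluster router and one node-level logical switch.
ovn-nbctl lr-add lr-cluster
ovn-nbctl ls-add ls-node1

# Router port for node1's subnet, and the LS-side peer port.
ovn-nbctl lrp-add lr-cluster lrp-node1 02:00:00:00:01:01 10.0.1.1/24
ovn-nbctl lsp-add ls-node1 ls-node1-to-lr
ovn-nbctl lsp-set-type ls-node1-to-lr router
ovn-nbctl lsp-set-addresses ls-node1-to-lr router
ovn-nbctl lsp-set-options ls-node1-to-lr router-port=lrp-node1

# The key step: make lrp-node1 a gateway port pinned to node1's chassis,
# which tells OVN the LS-to-chassis mapping discussed in the thread.
ovn-nbctl lrp-set-gateway-chassis lrp-node1 node1-chassis 100
```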