CC: Numan, Han, Mark, and Leonid.
On 11/25/21 22:33, Dumitru Ceara wrote:
> This series started as an effort to port the support for
> Load Balancer Groups to the DDlog version of northd. The initial
> patch that did that turned out to be very simple and small but test
> results were still not great.
>
> This series documents the incremental effort that was done to
> determine what exactly is causing the performance bottleneck in
> ovn-northd-ddlog.
>
> The series consists of:
> - 1/7: a simple, hacky script to simulate part of an ovn-k8s
> deployment.
> - 2/7: the initial LB Groups DDlog implementation.
> - 3/7: port of a northd fix to avoid a LS_IN_LKUP flow explosion
> for load balancer VIPs.
> - 4/7: port of a northd fix to optimize LR ARP responder flows for
> load balancer VIPs (using address sets).
> - 5/7: a HACK to simulate the effect of generating ARP responder
> flows in the router pipeline only for VIPs that are reachable
> on at least one of the router's subnets. In the ovn-k8s
> scenario this means skipping all such flows. I couldn't
> figure out the proper DDlog implementation but the hack is
> good enough for proving the point.
> - 6/7: Split the generation of the sb::Out_Load_Balancer relation into
> two steps. This significantly improves performance.
> - 7/7: Remove the load balancer routable/unroutable logic. It's not
> relevant to ovn-k8s and it is very CPU intensive.
>
> I ran the benchmark that creates N ovn-kubernetes-like nodes (switches
> and routers) and M services (load balancers) applied to all of them
> using a load balancer group; after M load balancers are created and
> propagated to SB, the test adds one more load balancer:
>
> # For 20 nodes, 3K services:
> $ SANDBOXFLAGS="--no-ovn-rbac --ddlog --no-ddlog-record" make sandbox
> $ ./lb-group-stress.sh 20 3000
>
> # For 120 nodes, 3K services:
> $ SANDBOXFLAGS="--no-ovn-rbac --ddlog --no-ddlog-record" make sandbox
> $ ./lb-group-stress.sh 120 3000
>
> RUN                Patch #  RSS     Last northd loop duration  Comment
> -----------------  -------  ------  -------------------------  -------
> 20 nodes + 3K LB   2/7      31.4g   94028ms                    Initial LB Group implementation
> 20 nodes + 3K LB   3/7      25.6g   89220ms                    Skip unreachable VIPs in switch pipeline
> 20 nodes + 3K LB   4/7      26.1g   92615ms                    Use address sets for VIPs in router pipeline
> 20 nodes + 3K LB   5/7      30.0g   96535ms                    Skip all VIP ARP responder flows in router pipeline
> 20 nodes + 3K LB   6/7      0.5g    15783ms                    Split sb::Out_Load_Balancer relation
> 20 nodes + 3K LB   7/7      0.5g    1581ms                     Remove load balancer routable/unroutable logic
>
> 120 nodes + 3K LB  2/7      62.3g*  DNF (>= 346215ms)*         Initial LB Group implementation
> 120 nodes + 3K LB  3/7      71.0g*  DNF (>= 111938ms)*         Skip unreachable VIPs in switch pipeline
> 120 nodes + 3K LB  4/7      65.3g*  DNF (>= 121180ms)*         Use address sets for VIPs in router pipeline
> 120 nodes + 3K LB  5/7      57.2g*  DNF (>= 207258ms)*         Skip all VIP ARP responder flows in router pipeline
> 120 nodes + 3K LB  6/7      2.1g    96363ms                    Split sb::Out_Load_Balancer relation
> 120 nodes + 3K LB  7/7      2.1g    10899ms                    Remove load balancer routable/unroutable logic
>
> The last northd loop duration is almost equivalent to the time spent
> incrementally processing the last load balancer addition.
> * I stopped the test after a while.
>
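
To make the relative gains easier to read off, here is a minimal Python
sketch that computes speedup ratios from the loop durations quoted above.
The numbers are copied from the results table; the dictionary layout and
the speedup() helper are only illustrative and are not part of the series.
The 120-node runs for patches 2/7 through 5/7 did not finish, so they are
left out rather than guessed.

```python
# Last northd loop durations (ms), copied from the results table.
durations_ms = {
    "20 nodes": {"2/7": 94028, "6/7": 15783, "7/7": 1581},
    # Patches 2/7..5/7 did not finish at 120 nodes, so 6/7 is the baseline.
    "120 nodes": {"6/7": 96363, "7/7": 10899},
}


def speedup(before_ms, after_ms):
    """Return how many times shorter after_ms is compared to before_ms."""
    return before_ms / after_ms


for run, by_patch in durations_ms.items():
    patches = list(by_patch)
    baseline = patches[0]
    # Compare every later patch against the first measured patch of the run.
    for patch in patches[1:]:
        ratio = speedup(by_patch[baseline], by_patch[patch])
        print(f"{run}: {baseline} -> {patch}: {ratio:.1f}x shorter loop")
```

With these inputs the 20-node run goes from 2/7 to 7/7 at roughly 59x,
and the 120-node run gains roughly 9x between 6/7 and 7/7.
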
> While some of the patches in the series are just quick and dirty
> hacks used to prove a hypothesis, the results seem to indicate
> that some careful optimization of the ovn-northd-ddlog code
> (preferably by someone more knowledgeable wrt. DDlog internals)
> would generate a very efficient implementation.
>
> For reference, the last step of the test, adding a new load balancer to
> the network, generates just a handful of Southbound updates, e.g.:
>
> record 48: 2021-11-25 15:49:05.343 "ovn-northd-ddlog"
> table Load_Balancer insert row "lb4001" (51ca22c0):
> name=lb4001
> protocol=tcp
> external_ids={lb_id="51ca22c0-8647-4811-8856-84738988dc61"}
> options={hairpin_orig_tuple="true"}
> datapaths=[05fbfbed-a976-4ad7-a484-3a452d63d838,
> 0794e508-2908-45a5-905a-0dd84ace89e6, 08d5c20e-68cb-49e0-803a-03b9699186c6,
> 28f77742-9e3e-4788-b6f2-e485e0880013, 3f8ae79a-0b83-4f92-ae1a-3b996626cf5b,
> 55865879-e6cc-4f6c-a582-56f2368a1263, 601a8e53-ad8d-411c-bf52-1de8f84645fa,
> 60d914d0-0ee1-47a1-8219-aec4aaa793c8, 682bd907-0854-492f-9331-c3fe39f7d9a9,
> 6af9ac9c-40bb-463f-a9e2-5ad58eac98b1, 6eb3c1f7-1467-46b3-9ccb-5e29cf4cf517,
> 70bb0703-7edb-4caf-90ad-a72f9081ad91, 7ecabe7b-7cd2-48ff-86e8-26eee4e9018c,
> 8b19c2d2-8571-4410-a29a-3c9aac9bc4b2, af362ec0-c07b-48f6-92b8-4a0fb03b7f44,
> b3eec0bd-7c96-459a-bec4-597c8d2defe1, c708f850-9823-4225-a275-9efef1786efd,
> c8b3d672-1f7f-497a-98a3-e30543fb37a7, dc4d30e6-5a56-4ce0-82f5-bc2c922196fb,
> de5b5c9d-dcfe-42fe-97c9-3573a62d95ef]
> vips={"42.15.176.1:8080"="42.15.176.2:8081"}
> table Logical_Flow insert row f3163fb7:
> pipeline=ingress
> match="ip && ip4.dst == 42.15.176.1 && tcp"
> logical_dp_group=64b4348e-4978-7e7f-7c8a-d263bcea5480
> priority=110
> external_ids={stage-name=lr_in_defrag}
> table_id=5
> actions="reg0 = 42.15.176.1; reg9[16..31] = tcp.dst; ct_dnat;"
> table Logical_Flow insert row 1d9318a2:
> pipeline=ingress
> match="ct.new && ip4 && reg0 == 42.15.176.1 && tcp && reg9[16..31] == 8080"
> logical_dp_group=64b4348e-4978-7e7f-7c8a-d263bcea5480
> priority=120
> external_ids={stage-name=lr_in_dnat}
> table_id=6
> actions="ct_lb(backends=42.15.176.2:8081);"
> table Logical_Flow insert row ff0be33e:
> pipeline=ingress
> match="ct.est && ip4 && reg0 == 42.15.176.1 && tcp && reg9[16..31] == 8080 && ct_label.natted == 1"
> logical_dp_group=64b4348e-4978-7e7f-7c8a-d263bcea5480
> priority=120
> external_ids={stage-name=lr_in_dnat}
> table_id=6
> actions="next;"
> table Logical_Flow insert row 74843573:
> pipeline=ingress
> match="ct.new && ip4.dst == 42.15.176.1 && tcp.dst == 8080"
> logical_dp_group=4bbda436-d7b7-833a-3f1d-b44c77abc1d4
> priority=120
> external_ids={stage-name=ls_in_stateful}
> table_id=12
> actions="reg1 = 42.15.176.1; reg2[0..15] = 8080; ct_lb(backends=42.15.176.2:8081);"
>
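
To illustrate how small the expected output delta is, the sketch below
rebuilds the four Logical_Flow rows of the record above from the VIP and
backend alone. The match/action templates are reconstructed purely from
that record; they are not taken from the northd or northd-ddlog rules, and
lb_logical_flows() is a hypothetical helper name. It only covers the
single IPv4/TCP case shown here.

```python
def lb_logical_flows(vip, vip_port, backends, proto="tcp"):
    """Build the four flows seen in the record for one IPv4 VIP (as dicts)."""
    return [
        # Router pipeline: stash the VIP and destination port, then defrag.
        {"stage": "lr_in_defrag", "priority": 110,
         "match": f"ip && ip4.dst == {vip} && {proto}",
         "actions": f"reg0 = {vip}; reg9[16..31] = {proto}.dst; ct_dnat;"},
        # Router pipeline: new connections are load balanced to the backends.
        {"stage": "lr_in_dnat", "priority": 120,
         "match": (f"ct.new && ip4 && reg0 == {vip} && {proto} && "
                   f"reg9[16..31] == {vip_port}"),
         "actions": f"ct_lb(backends={backends});"},
        # Router pipeline: established, already NATted connections move on.
        {"stage": "lr_in_dnat", "priority": 120,
         "match": (f"ct.est && ip4 && reg0 == {vip} && {proto} && "
                   f"reg9[16..31] == {vip_port} && ct_label.natted == 1"),
         "actions": "next;"},
        # Switch pipeline: record the original tuple and load balance.
        {"stage": "ls_in_stateful", "priority": 120,
         "match": f"ct.new && ip4.dst == {vip} && {proto}.dst == {vip_port}",
         "actions": (f"reg1 = {vip}; reg2[0..15] = {vip_port}; "
                     f"ct_lb(backends={backends});")},
    ]


flows = lb_logical_flows("42.15.176.1", 8080, "42.15.176.2:8081")
for flow in flows:
    print(flow["stage"], flow["priority"], flow["match"], "=>", flow["actions"])
```

The flows reference logical datapath groups instead of individual
datapaths, which is why the record holds one Load_Balancer row and four
Logical_Flow rows for all 20 nodes.
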
> Dumitru Ceara (7):
> tutorial: Add hacky load balancer stress test.
> northd-ddlog: Add LB Group support.
> northd-ddlog: Don't add ARP responder flows for unreachable VIPs.
> northd-ddlog: Use address sets for ARP responder flows for VIPs.
> northd-ddlog: HACK: Generate ARP responder flows only for reachable
> VIPs.
> northd-ddlog: Split sb::Out_Load_Balancer relation.
> HACK: Remove load balancer routable/unroutable logic.
>
>
> northd/lrouter.dl | 61 ++++++++++++----------
> northd/lswitch.dl | 6 ++
> northd/ovn-nb.dlopts | 1
> northd/ovn_northd.dl | 122 ++++++++++++++++++++++---------------------
> tutorial/automake.mk | 3 +
> tutorial/lb-group-stress.sh | 60 +++++++++++++++++++++
> 6 files changed, 164 insertions(+), 89 deletions(-)
> create mode 100755 tutorial/lb-group-stress.sh
>
> _______________________________________________
> dev mailing list
> [email protected]
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>