This series started as an effort to port Load Balancer Group support
to the DDlog version of northd. The initial patch turned out to be
small and simple, but the test results were still poor.
The series documents the incremental effort to determine what
exactly causes the performance bottleneck in ovn-northd-ddlog.
The series consists of:
- 1/7: a simple, hacky script to simulate part of an ovn-k8s
deployment.
- 2/7: the initial LB Groups DDlog implementation.
- 3/7: port of a northd fix to avoid a LS_IN_LKUP flow explosion
for load balancer VIPs.
- 4/7: port of a northd fix to optimize LR ARP responder flows for
load balancer VIPs (using address sets).
- 5/7: a HACK to simulate the effect of generating ARP responder
flows in the router pipeline only for VIPs that are reachable
on at least one of the router's subnets. In the ovn-k8s
scenario this means skipping all such flows. I couldn't
figure out the proper DDlog implementation, but the hack is
good enough to prove the point.
- 6/7: split the generation of the sb::Out_Load_Balancer relation into
two steps. This significantly improves performance.
- 7/7: remove the load balancer routable/unroutable logic. It's not
relevant to ovn-k8s and it is very CPU intensive.
I ran the benchmark that creates N ovn-kubernetes-like nodes (switches
and routers) and M services (load balancers) applied to all of them
using a load balancer group; after M load balancers are created and
propagated to SB, the test adds one more load balancer:
# For 20 nodes, 3K services:
$ SANDBOXFLAGS="--no-ovn-rbac --ddlog --no-ddlog-record" make sandbox
$ ./lb-group-stress.sh 20 3000
# For 120 nodes, 3K services:
$ SANDBOXFLAGS="--no-ovn-rbac --ddlog --no-ddlog-record" make sandbox
$ ./lb-group-stress.sh 120 3000
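For readers without the series applied, the shape of the workload described above can be sketched as a dry-run planner. This is not the script from patch 1/7: the names (ls0, lr0, lb0, lbg0), the VIP/backend addressing and the omission of switch/router ports are assumptions made purely for illustration. It only prints ovn-nbctl command lines (ls-add, lr-add, lb-add and the generic create/add/find database commands) and does not execute them.

```python
from ipaddress import IPv4Address
from typing import List


def plan(nodes: int, services: int) -> List[str]:
    """Return ovn-nbctl command lines (not executed) for a hypothetical
    topology: one LB group shared by `nodes` switch/router pairs and
    holding `services` load balancers."""
    # 'create' prints the UUID of the new row; keep it in a shell variable.
    cmds = ["LBG=$(ovn-nbctl create Load_Balancer_Group name=lbg0)"]

    for i in range(nodes):
        cmds += [
            f"ovn-nbctl ls-add ls{i}",
            f"ovn-nbctl lr-add lr{i}",
            # Attach the shared LB group to both the switch and the router.
            f"ovn-nbctl add Logical_Switch ls{i} load_balancer_group $LBG",
            f"ovn-nbctl add Logical_Router lr{i} load_balancer_group $LBG",
        ]

    base = int(IPv4Address("42.0.0.0"))  # assumed addressing scheme
    for j in range(services):
        vip = IPv4Address(base + 4 * j + 1)
        backend = IPv4Address(base + 4 * j + 2)
        cmds.append(f"ovn-nbctl lb-add lb{j} {vip}:8080 {backend}:8081 tcp")
        # Look up the new LB by name and add it to the group.
        cmds.append(
            "ovn-nbctl add Load_Balancer_Group $LBG load_balancer "
            f"$(ovn-nbctl --bare --columns=_uuid find Load_Balancer name=lb{j})"
        )
    return cmds


if __name__ == "__main__":
    # Small instance so the output stays readable.
    print("\n".join(plan(nodes=2, services=3)))
```

The point of the sketch is that every load balancer is attached once to the group rather than once per switch and router, so the Northbound change for "one more load balancer" is tiny regardless of the number of nodes.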
"Last northd loop duration" is almost equivalent to the time spent
incrementally processing the last load balancer addition.
Run                Patch #  RSS     Last northd loop duration  Comment
-----------------  -------  ------  -------------------------  ----------------------------------
20 nodes + 3K LB   2/7      31.4g   94028ms                    Initial LB Group
                                                               implementation
20 nodes + 3K LB   3/7      25.6g   89220ms                    Skip unreachable
                                                               VIPs in switch pipeline
20 nodes + 3K LB   4/7      26.1g   92615ms                    Use address sets
                                                               for VIPs in router pipeline
20 nodes + 3K LB   5/7      30.0g   96535ms                    Skip all VIP ARP
                                                               responder flows in router pipeline
20 nodes + 3K LB   6/7      0.5g    15783ms                    Split
                                                               sb::Out_Load_Balancer relation
20 nodes + 3K LB   7/7      0.5g    1581ms                     Remove load
                                                               balancer routable/unroutable logic
120 nodes + 3K LB  2/7      62.3g*  DNF (>= 346215ms)*         Initial LB Group
                                                               implementation
120 nodes + 3K LB  3/7      71.0g*  DNF (>= 111938ms)*         Skip unreachable
                                                               VIPs in switch pipeline
120 nodes + 3K LB  4/7      65.3g*  DNF (>= 121180ms)*         Use address sets
                                                               for VIPs in router pipeline
120 nodes + 3K LB  5/7      57.2g*  DNF (>= 207258ms)*         Skip all VIP ARP
                                                               responder flows in router pipeline
120 nodes + 3K LB  6/7      2.1g    96363ms                    Split
                                                               sb::Out_Load_Balancer relation
120 nodes + 3K LB  7/7      2.1g    10899ms                    Remove load
                                                               balancer routable/unroutable logic
* I stopped the test after a while.
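To put the numbers in perspective, the ratios between patches can be computed directly from the reported loop durations. The sketch below uses only the values from completed runs in the table; the DNF rows are lower bounds and are left out, and the helper name `speedup` is just illustrative.

```python
# Last northd loop duration in ms, copied from the results table
# (completed runs only; DNF rows are lower bounds and are omitted).
LOOP_MS = {
    (20, "2/7"): 94028,
    (20, "3/7"): 89220,
    (20, "4/7"): 92615,
    (20, "5/7"): 96535,
    (20, "6/7"): 15783,
    (20, "7/7"): 1581,
    (120, "6/7"): 96363,
    (120, "7/7"): 10899,
}


def speedup(nodes, before, after):
    """Ratio of loop durations between two patches for the same node count."""
    return LOOP_MS[(nodes, before)] / LOOP_MS[(nodes, after)]


if __name__ == "__main__":
    comparisons = [
        (20, "2/7", "6/7"),
        (20, "6/7", "7/7"),
        (20, "2/7", "7/7"),
        (120, "6/7", "7/7"),
    ]
    for nodes, before, after in comparisons:
        ratio = speedup(nodes, before, after)
        print(f"{nodes} nodes, {before} -> {after}: {ratio:.1f}x faster")
```

For the 20-node case this works out to roughly 6x between 2/7 and 6/7 and roughly 59x between 2/7 and 7/7, while patches 3/7 through 5/7 stay within a few percent of the initial implementation.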
While some of the patches in the series are just quick and dirty
hacks used to prove a hypothesis, the results seem to indicate
that some careful optimization of the ovn-northd-ddlog code
(preferably by someone more knowledgeable about DDlog internals)
could yield a very efficient implementation.
For reference, the last step of the test, adding a new load balancer to
the network, generates just a handful of Southbound updates, e.g.:
record 48: 2021-11-25 15:49:05.343 "ovn-northd-ddlog"
table Load_Balancer insert row "lb4001" (51ca22c0):
name=lb4001
protocol=tcp
external_ids={lb_id="51ca22c0-8647-4811-8856-84738988dc61"}
options={hairpin_orig_tuple="true"}
datapaths=[05fbfbed-a976-4ad7-a484-3a452d63d838,
0794e508-2908-45a5-905a-0dd84ace89e6, 08d5c20e-68cb-49e0-803a-03b9699186c6,
28f77742-9e3e-4788-b6f2-e485e0880013, 3f8ae79a-0b83-4f92-ae1a-3b996626cf5b,
55865879-e6cc-4f6c-a582-56f2368a1263, 601a8e53-ad8d-411c-bf52-1de8f84645fa,
60d914d0-0ee1-47a1-8219-aec4aaa793c8, 682bd907-0854-492f-9331-c3fe39f7d9a9,
6af9ac9c-40bb-463f-a9e2-5ad58eac98b1, 6eb3c1f7-1467-46b3-9ccb-5e29cf4cf517,
70bb0703-7edb-4caf-90ad-a72f9081ad91, 7ecabe7b-7cd2-48ff-86e8-26eee4e9018c,
8b19c2d2-8571-4410-a29a-3c9aac9bc4b2, af362ec0-c07b-48f6-92b8-4a0fb03b7f44,
b3eec0bd-7c96-459a-bec4-597c8d2defe1, c708f850-9823-4225-a275-9efef1786efd,
c8b3d672-1f7f-497a-98a3-e30543fb37a7, dc4d30e6-5a56-4ce0-82f5-bc2c922196fb,
de5b5c9d-dcfe-42fe-97c9-3573a62d95ef]
vips={"42.15.176.1:8080"="42.15.176.2:8081"}
table Logical_Flow insert row f3163fb7:
pipeline=ingress
match="ip && ip4.dst == 42.15.176.1 && tcp"
logical_dp_group=64b4348e-4978-7e7f-7c8a-d263bcea5480
priority=110
external_ids={stage-name=lr_in_defrag}
table_id=5
actions="reg0 = 42.15.176.1; reg9[16..31] = tcp.dst; ct_dnat;"
table Logical_Flow insert row 1d9318a2:
pipeline=ingress
match="ct.new && ip4 && reg0 == 42.15.176.1 && tcp && reg9[16..31] == 8080"
logical_dp_group=64b4348e-4978-7e7f-7c8a-d263bcea5480
priority=120
external_ids={stage-name=lr_in_dnat}
table_id=6
actions="ct_lb(backends=42.15.176.2:8081);"
table Logical_Flow insert row ff0be33e:
pipeline=ingress
match="ct.est && ip4 && reg0 == 42.15.176.1 && tcp && reg9[16..31] == 8080
&& ct_label.natted == 1"
logical_dp_group=64b4348e-4978-7e7f-7c8a-d263bcea5480
priority=120
external_ids={stage-name=lr_in_dnat}
table_id=6
actions="next;"
table Logical_Flow insert row 74843573:
pipeline=ingress
match="ct.new && ip4.dst == 42.15.176.1 && tcp.dst == 8080"
logical_dp_group=4bbda436-d7b7-833a-3f1d-b44c77abc1d4
priority=120
external_ids={stage-name=ls_in_stateful}
table_id=12
actions="reg1 = 42.15.176.1; reg2[0..15] = 8080;
ct_lb(backends=42.15.176.2:8081);"
Dumitru Ceara (7):
tutorial: Add hacky load balancer stress test.
northd-ddlog: Add LB Group support.
northd-ddlog: Don't add ARP responder flows for unreachable VIPs.
northd-ddlog: Use address sets for ARP responder flows for VIPs.
northd-ddlog: HACK: Generate ARP responder flows only for reachable VIPs.
northd-ddlog: Split sb::Out_Load_Balancer relation.
HACK: Remove load balancer routable/unroutable logic.
northd/lrouter.dl | 61 ++++++++++++----------
northd/lswitch.dl | 6 ++
northd/ovn-nb.dlopts | 1
northd/ovn_northd.dl | 122 ++++++++++++++++++++++---------------------
tutorial/automake.mk | 3 +
tutorial/lb-group-stress.sh | 60 +++++++++++++++++++++
6 files changed, 164 insertions(+), 89 deletions(-)
create mode 100755 tutorial/lb-group-stress.sh