From: Numan Siddique <[email protected]>
This patch series fixes the memory and CPU usage increase seen
in ovn-northd after the lflow I-P patches were merged.
The first 2 patches in the series address duplicate flows
added by northd into the lflow table for the same datapath.
The 3rd patch fixes a bug in lflow-mgr and the 4th patch
actually addresses the northd memory and CPU increase issue.
We considered 2 approaches to solve this.
Approach 1 (which this series adopts) solves this by maintaining a dp
refcnt (datapath reference count) for an lflow only if required.
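
As a rough illustration of "only if required", here is a minimal,
self-contained sketch. It is not the lflow-mgr code from this series: all
names (sketch_lflow, dp_ref, MAX_DPS) are hypothetical, and a plain linked
list stands in for whatever map the real code uses. It assumes that a
per-datapath flag is enough to represent a single use, so an explicit
counter is allocated only when the same datapath is added a second time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAX_DPS 64   /* Hypothetical limit, for this sketch only. */

/* Hypothetical counter node; exists only for datapaths used 2+ times. */
struct dp_ref {
    size_t dp_index;
    size_t refcnt;
    struct dp_ref *next;
};

/* Hypothetical, simplified stand-in for a logical flow. */
struct sketch_lflow {
    bool dp_used[MAX_DPS];   /* One flag per datapath. */
    struct dp_ref *refs;     /* Sparse list of explicit counters. */
    size_t n_refs;           /* Number of counters currently allocated. */
};

static void
sketch_lflow_init(struct sketch_lflow *lflow)
{
    memset(lflow, 0, sizeof *lflow);
}

/* Records one more use of 'dp_index' by 'lflow'. */
static void
sketch_lflow_use_dp(struct sketch_lflow *lflow, size_t dp_index)
{
    assert(dp_index < MAX_DPS);
    if (!lflow->dp_used[dp_index]) {
        /* First use: the flag alone means "refcount == 1". */
        lflow->dp_used[dp_index] = true;
        return;
    }

    struct dp_ref *r = lflow->refs;
    while (r && r->dp_index != dp_index) {
        r = r->next;
    }
    if (!r) {
        /* Second use: allocate the counter now; 1 accounts for the
         * first use that was tracked by the flag only. */
        r = malloc(sizeof *r);
        assert(r);
        r->dp_index = dp_index;
        r->refcnt = 1;
        r->next = lflow->refs;
        lflow->refs = r;
        lflow->n_refs++;
    }
    r->refcnt++;
}

/* Drops one use; returns true if 'dp_index' is still referenced. */
static bool
sketch_lflow_release_dp(struct sketch_lflow *lflow, size_t dp_index)
{
    assert(dp_index < MAX_DPS && lflow->dp_used[dp_index]);

    struct dp_ref **pp = &lflow->refs;
    while (*pp && (*pp)->dp_index != dp_index) {
        pp = &(*pp)->next;
    }
    if (*pp) {
        struct dp_ref *r = *pp;
        if (--r->refcnt == 1) {
            /* Back to a single use: the flag suffices again. */
            *pp = r->next;
            free(r);
            lflow->n_refs--;
        }
        return true;
    }

    /* No counter: this was the only use. */
    lflow->dp_used[dp_index] = false;
    return false;
}
```

In this sketch, an lflow that is added exactly once per datapath never
allocates a counter, which is the case the approach tries to keep cheap.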
Approach 2 (which can be found at [1]) solves this by resorting to
a full recompute when an lflow is added more than once for a datapath.
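
The fallback in Approach 2 can be pictured with the following minimal
sketch. It is not the code from the branch at [1]: the names (sk_lflow,
sk_handle_additions, SK_NEED_RECOMPUTE) are hypothetical, and it assumes
the incremental handler simply reports that it cannot proceed as soon as
it sees a datapath that is already set for the lflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SK_MAX_DPS 64   /* Hypothetical limit, for this sketch only. */

/* Hypothetical lflow stand-in: one flag per datapath, no counters. */
struct sk_lflow {
    bool dp_used[SK_MAX_DPS];
};

enum sk_result {
    SK_HANDLED,          /* Incremental processing succeeded. */
    SK_NEED_RECOMPUTE    /* Caller must fall back to a full recompute. */
};

static void
sk_lflow_init(struct sk_lflow *lflow)
{
    memset(lflow, 0, sizeof *lflow);
}

/* Marks 'dp_index' as used.  A duplicate cannot be tracked without a
 * reference count, so it is reported instead of being recorded. */
static enum sk_result
sk_lflow_add_dp(struct sk_lflow *lflow, size_t dp_index)
{
    assert(dp_index < SK_MAX_DPS);
    if (lflow->dp_used[dp_index]) {
        return SK_NEED_RECOMPUTE;
    }
    lflow->dp_used[dp_index] = true;
    return SK_HANDLED;
}

/* Applies a batch of additions incrementally.  Stops at the first
 * duplicate; partial state is acceptable in this sketch because a full
 * recompute is assumed to rebuild everything from scratch. */
static enum sk_result
sk_handle_additions(struct sk_lflow *lflow, const size_t *dps, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (sk_lflow_add_dp(lflow, dps[i]) == SK_NEED_RECOMPUTE) {
            return SK_NEED_RECOMPUTE;
        }
    }
    return SK_HANDLED;
}
```

The sketch needs no per-datapath counters at all, at the price of a
recompute whenever a duplicate shows up.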
Below are the test results with ovn-heater for both the approaches.

Cluster density 500 node test
-----------------------------
                    | Avg. Poll Intervals | Total test time | northd RSS
--------------------+---------------------+-----------------+-----------
Before lflow I-P    | 1.5 seconds         | 1005 seconds    | 2.5 GB
lflow I-P merged    | 6 seconds           | 2246 seconds    | 8.5 GB
Approach 1          | 2.1 seconds         | 1142 seconds    | 2.67 GB
Approach 2          | 1.8 seconds         | 1046 seconds    | 2.41 GB
------------------------------------------------------------------------

Node density heavy 500 node test
--------------------------------
                    | Avg. Poll Intervals | Total test time | northd RSS
--------------------+---------------------+-----------------+-----------
Before lflow I-P    | 1.3 seconds         | 192 seconds     | 1.49 GB
lflow I-P merged    | 4.5 seconds         | 87 seconds      | 7.3 GB
Approach 1          | 2.4 seconds         | 83 seconds      | 2.2 GB
Approach 2          | 1.36 seconds        | 193 seconds     | 2.2 GB
------------------------------------------------------------------------

Both approaches have advantages and disadvantages
(pros and cons below, as outlined by Ilya).
Approach 1
----------
Pros:
  * Doesn't fall back to recompute more often than current main.
  * Fairly simple.
  * Can be optimized by getting rid of duplicated lflows - we'll allocate
    fewer refcounts.
Cons:
  * Higher CPU and memory usage in ovn-heater tests due to actual refcount
    and hash map allocations.
Approach 2
----------
Pros:
  * Lower memory usage due to no refcounting.
  * Lower CPU usage in cases where we do not fall into recompute.
  * End code is simpler.
  * Can be optimized by getting rid of duplicated lflows - we'll not fall
    back to recompute that often.
Cons:
  * Falling into recompute more frequently - higher CPU usage in some cases
    (whenever users create the same LBs for different protos).
  * Concerning log message in legitimate configurations.
We chose Approach 1 based on the above test results.
[1] - https://github.com/numansiddique/ovn/commits/dp_refcnt_fix_v1
Ilya Maximets (1):
  northd: lflow-mgr: Allocate DP reference counters on a second use.

Numan Siddique (3):
  northd: Don't add lr_out_delivery default drop flow for each lrp.
  northd: Don't add ARP request responder flows for NAT multiple times.
  northd: Fix lflow ref node's reference counting.
 northd/en-lr-nat.c |  6 ++++++
 northd/en-lr-nat.h |  2 ++
 northd/lflow-mgr.c | 52 ++++++++++++++++++++++++++--------------------
 northd/northd.c    | 43 +++++++++++++++++++++++++++++---------
 northd/northd.h    |  1 +
 5 files changed, 71 insertions(+), 33 deletions(-)
--
2.43.0