From: Numan Siddique <[email protected]>

This patch series fixes the memory and CPU usage increase seen
in ovn-northd after the lflow I-P patches were merged.

The first 2 patches in the series address duplicate flows
added by northd into the lflow table for the same datapath.

The 3rd patch fixes a bug in lflow-mgr, and the 4th patch
addresses the northd memory and CPU increase itself.

We considered 2 approaches to solve this.

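Approach 1 (which this series adopts) solves this by maintaining a
datapath reference count (dp refcnt) for an lflow only when it is
required, i.e. when the same lflow is added more than once for a datapath.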
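A minimal C sketch of the "allocate DP reference counters on a second use" idea behind Approach 1 follows. It is an illustration under stated assumptions, not the lflow-mgr code: `struct lflow_sketch` and the `lflow_sketch_*()` functions are hypothetical, datapaths are assumed to be small integer indexes below 64 so that one `uint64_t` can act as the datapath bitmap, and a lazily allocated counter array stands in for the hash map the real code allocates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified model (not the lflow-mgr API): at most 64
 * datapaths, so a single uint64_t serves as the datapath bitmap. */
#define SKETCH_MAX_DPS 64

struct lflow_sketch {
    uint64_t dp_bitmap;         /* Bit i set: datapath i uses this lflow. */
    unsigned int *extra_refs;   /* NULL until some datapath adds the lflow
                                 * a second time; then extra_refs[i] counts
                                 * references beyond the first one. */
};

/* Records one more use of the lflow by datapath 'dp'. */
static void
lflow_sketch_add_dp(struct lflow_sketch *lf, size_t dp)
{
    assert(dp < SKETCH_MAX_DPS);
    uint64_t bit = UINT64_C(1) << dp;

    if (!(lf->dp_bitmap & bit)) {
        /* First use: the bitmap bit is enough, nothing is allocated. */
        lf->dp_bitmap |= bit;
        return;
    }

    /* Second (or later) use by the same datapath: only now pay for the
     * counters. */
    if (!lf->extra_refs) {
        lf->extra_refs = calloc(SKETCH_MAX_DPS, sizeof *lf->extra_refs);
        if (!lf->extra_refs) {
            abort();
        }
    }
    lf->extra_refs[dp]++;
}

/* Drops one use of the lflow by datapath 'dp'.  Returns true if 'dp'
 * still references the lflow afterwards. */
static bool
lflow_sketch_remove_dp(struct lflow_sketch *lf, size_t dp)
{
    assert(dp < SKETCH_MAX_DPS);
    uint64_t bit = UINT64_C(1) << dp;

    if (lf->extra_refs && lf->extra_refs[dp] > 0) {
        lf->extra_refs[dp]--;
        return true;
    }
    lf->dp_bitmap &= ~bit;
    return false;
}

/* Frees the lazily allocated counters, if any. */
static void
lflow_sketch_destroy(struct lflow_sketch *lf)
{
    free(lf->extra_refs);
    lf->extra_refs = NULL;
    lf->dp_bitmap = 0;
}
```

In the common case an lflow costs one bitmap bit per datapath and nothing more; counters are allocated only once some datapath adds the same lflow twice. This is also why removing duplicate flows should mean fewer counter allocations.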

Approach 2 (which can be found at [1]) solves this by resorting to
a full recompute when an lflow is added more than once for a datapath.
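For comparison, Approach 2 can be sketched in the same style: keep only the datapath bitmap and report when incremental processing has to give up. As before, `struct lflow_nocnt` and its functions are hypothetical, with the same assumption of datapath indexes below 64; the actual patches are at [1].

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NOCNT_MAX_DPS 64        /* Assumption: datapath indexes < 64. */

/* Hypothetical lflow with a datapath bitmap and no reference counts. */
struct lflow_nocnt {
    uint64_t dp_bitmap;         /* Bit i set: datapath i uses the lflow. */
};

/* Tries to record a use of the lflow by datapath 'dp'.  Returns true if
 * the add was handled incrementally, false if the caller must fall back
 * to a full recompute. */
static bool
lflow_nocnt_add_dp(struct lflow_nocnt *lf, size_t dp)
{
    assert(dp < NOCNT_MAX_DPS);
    uint64_t bit = UINT64_C(1) << dp;

    if (lf->dp_bitmap & bit) {
        /* Same lflow added again for the same datapath.  With only one
         * bit per datapath, a later removal could not tell whether
         * another user remains, so give up on incremental processing. */
        return false;
    }
    lf->dp_bitmap |= bit;
    return true;
}

/* Removes datapath 'dp'.  Clearing the bit unconditionally is safe
 * because a duplicate add never gets here: it triggers a recompute. */
static void
lflow_nocnt_remove_dp(struct lflow_nocnt *lf, size_t dp)
{
    assert(dp < NOCNT_MAX_DPS);
    lf->dp_bitmap &= ~(UINT64_C(1) << dp);
}
```

A caller that gets `false` back would request a full recompute; that fallback is where Approach 2's extra CPU cost comes from in configurations that add the same lflow twice for a datapath.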

Below are the ovn-heater test results for both approaches.

Cluster density 500 node test
-----------------------------

                       | Avg. Poll Intervals | Total test time | northd RSS
-----------------------+---------------------+-----------------+------------
Before lflow I-P       |     1.5 seconds     |  1005 seconds   |  2.5 GB
lflow I-P merged       |     6 seconds       |  2246 seconds   |  8.5 GB
Approach 1             |     2.1 seconds     |  1142 seconds   |  2.67 GB
Approach 2             |     1.8 seconds     |  1046 seconds   |  2.41 GB
-----------------------+---------------------+-----------------+------------

Node density heavy 500 node test
--------------------------------

                       | Avg. Poll Intervals | Total test time | northd RSS
-----------------------+---------------------+-----------------+------------
Before lflow I-P       |     1.3 seconds     |  192 seconds    |  1.49 GB
lflow I-P merged       |     4.5 seconds     |  87 seconds     |  7.3 GB
Approach 1             |     2.4 seconds     |  83 seconds     |  2.2 GB
Approach 2             |     1.36 seconds    |  193 seconds    |  2.2 GB
-----------------------+---------------------+-----------------+------------
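To read the northd RSS column in relative terms, the helper below computes each row's change against the "Before lflow I-P" baseline, using the numbers from the two tables. `struct rss_row`, `pct_change()` and `rss_change_pct()` are names made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type for this illustration only. */
struct rss_row {
    const char *label;
    double rss_gb;              /* northd RSS in GB, from the tables. */
};

/* Cluster density 500 node test, northd RSS column. */
static const struct rss_row cluster_density[] = {
    { "Before lflow I-P", 2.5 },
    { "lflow I-P merged", 8.5 },
    { "Approach 1",       2.67 },
    { "Approach 2",       2.41 },
};

/* Node density heavy 500 node test, northd RSS column. */
static const struct rss_row node_density[] = {
    { "Before lflow I-P", 1.49 },
    { "lflow I-P merged", 7.3 },
    { "Approach 1",       2.2 },
    { "Approach 2",       2.2 },
};

/* Percentage change of 'value' relative to 'baseline'. */
static double
pct_change(double baseline, double value)
{
    assert(baseline != 0.0);
    return (value - baseline) / baseline * 100.0;
}

/* RSS change of rows[i] relative to rows[0], the pre-I-P baseline. */
static double
rss_change_pct(const struct rss_row *rows, size_t i)
{
    return pct_change(rows[0].rss_gb, rows[i].rss_gb);
}
```

With these inputs, Approach 1 comes out at about +6.8% over the baseline in the cluster density test (Approach 2 at about -3.6%, the unfixed lflow I-P code at +240%), while in the node density test both approaches land at roughly +48% (2.2 GB against 1.49 GB).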

Both approaches have advantages and disadvantages, as Ilya outlined:
Approach 1
----------
Pros:
  * Doesn't fall back to recompute more often than current main.
  * Fairly simple.
  * Can be optimized by getting rid of duplicated lflows - we'll allocate
    fewer refcounts.
Cons:
  * Higher CPU and memory usage in ovn-heater tests due to actual refcount and
    hash map allocations.

Approach 2
----------
Pros:
  * Lower memory usage due to no refcounting.
  * Lower CPU usage in cases where we do not fall into recompute.
  * End code is simpler.
  * Can be optimized by getting rid of duplicated lflows - we won't fall back
    to recompute as often.
Cons:
  * Falls back to recompute more frequently, which means higher CPU usage in
    some cases (whenever users create the same LBs for different protocols).
  * Emits a concerning log message even for legitimate configurations.

We chose Approach 1 based on the above test results.

[1] - https://github.com/numansiddique/ovn/commits/dp_refcnt_fix_v1


Ilya Maximets (1):
  northd: lflow-mgr: Allocate DP reference counters on a second use.

Numan Siddique (3):
  northd: Don't add lr_out_delivery default drop flow for each lrp.
  northd: Don't add ARP request responder flows for NAT multiple times.
  northd: Fix lflow ref node's reference counting.

 northd/en-lr-nat.c |  6 ++++++
 northd/en-lr-nat.h |  2 ++
 northd/lflow-mgr.c | 52 ++++++++++++++++++++++++++--------------------
 northd/northd.c    | 43 +++++++++++++++++++++++++++++---------
 northd/northd.h    |  1 +
 5 files changed, 71 insertions(+), 33 deletions(-)

-- 
2.43.0

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
