On Thu, Feb 8, 2024, 4:49 PM <[email protected]> wrote: > From: Numan Siddique <[email protected]> > > This patch series fixes the memory and CPU usage increase seen > in ovn-northd after the lflow I-P patches were merged. > > The first 2 patches in the series addresses duplicate flows > added by northd into the lflow table for the same datapath. > > The 3rd patch fixes a bug in lflow-mgr and the 4th patch > actually addresses the northd memory and CPU increase issue. > > We considered 2 approaches to solve this. > > Approach 1 (which this series adopts) solves by maintaining dp > refcnt for an lflow only if required. > > Approach 2 (which can be found [1]) solves this by resorting to > a full recompute when an lflow is added more than once for a datapath. > > Below are the test results with ovn-heater for both the approaches. > > Cluster density 500 node test > ----------------------------- > > | Avg. Poll Intervals | Total test time | northd > RSS > > --------------------------------+-----------------+------------------------- > Before lflow I-P | 1.5 seconds | 1005 seconds | 2.5 GB > lflow i-p merged | 6 seconds | 2246 seconds | 8.5 GB > Approach 1 | 2.1 seconds | 1142 seconds | 2.67 GB > Approach 2 | 1.8 seconds | 1046 seconds | 2.41 GB > > ----------------------------------------------------------------------------- > > Node density heavy 500 node test > -------------------------------- > > | Avg. Poll Intervals | Total test time | northd > RSS > --------------------------------+-----------------+----------------------- > Before lflow I-P | 1.3 seconds | 192 seconds | 1.49 GB > > lflow I-P merged | 4.5 seconds | 87 seconds | 7.3 GB > Approach 1 | 2.4 seconds | 83 seconds | 2.2 GB >
A clarification here : with Approach 1 even though the 2.4 seconds poll interval seems high, but it is only with 3 poll interval data. i.e. ovn-northd didn't recompute for the most part of test duration. Thanks Numan Approach 2 | 1.36 seconds | 193 seconds | 2.2 GB > ------------------------------------------------------------------------- > > Both has advantages and disadvantages > > (As outlined by Ilya below about pros and cons) > Approach 1 > --------- > Pros: > * Doesn't fall back to recompute more often than current main. > * Fairly simple. > * Can be optimized by getting rid of duplicated lflows - we'll allocate > less refcounts. > Cons: > * Higher CPU and memory usage in ovn-heater tests due to actual refcount > and > hash map allocations. > > Approach 2: > --------- > Pros: > * Lower memory usage due to no refcounting. > * Lower CPU usage in cases where we do not fall into recompute. > * End code is simpler. > * Can be optimized by getting rid of duplicated lflows - we'll not fall > back to recompute that often. > > Cons: > * Falling into recompute more frequently - Higher CPU usage in some > cases. > (whenever users create the same LBs for different protos) > * Concerning log message in legitimate configurations. > > > We chose Approach 1 based on the above test results. > > [1] - https://github.com/numansiddique/ovn/commits/dp_refcnt_fix_v1 > > > Ilya Maximets (1): > northd: lflow-mgr: Allocate DP reference counters on a second use. > > Numan Siddique (3): > northd: Don't add lr_out_delivery default drop flow for each lrp. > northd: Don't add ARP request responder flows for NAT multiple times. > northd: Fix lflow ref node's reference counting. > > northd/en-lr-nat.c | 6 ++++++ > northd/en-lr-nat.h | 2 ++ > northd/lflow-mgr.c | 52 ++++++++++++++++++++++++++-------------------- > northd/northd.c | 43 +++++++++++++++++++++++++++++--------- > northd/northd.h | 1 + > 5 files changed, 71 insertions(+), 33 deletions(-) > > -- > 2.43.0 > > _______________________________________________ > dev mailing list > [email protected] > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > > _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
