On Wed, Nov 15, 2023 at 2:59 AM Han Zhou <[email protected]> wrote:
>
> On Thu, Oct 26, 2023 at 11:12 AM <[email protected]> wrote:
> >
> > From: Numan Siddique <[email protected]>
> >
> > This patch series adds incremental processing in the lflow engine
> > node to handle changes to northd and other engine nodes.
> > Changed related to load balancers and NAT are mainly handled in
> > this patch series.
> >
> > This patch series can also be found here -
> https://github.com/numansiddique/ovn/tree/northd_lflow_ip/v1
>
> Thanks Numan for the great improvement!

Hi Han,

Thanks for the review comments.  I understand it is hard to review
so many patches, especially ones related to I-P.
I appreciate the time you spent on them and the feedback.

>
> I spent some time these days to review the series and now at patch 10. I
> need to take a break and so I just sent the comments I had so far.
>
> I also did scale testing for northd with
> https://github.com/hzhou8/ovn-test-script for a 500 chassis, 50 lsp /
> chassis setup, and noticed that the recompute latency has increased 20%
> after the series. I think in general it is expected to have some
> performance penalty for the recompute case because of the new indexes
> created for I-P. However, the patch 10 "northd: Refactor lflow management
> into a separate module." alone introduces a 10% latency increase, which
> necessitates more investigation.

Before sending out the series I did some testing on recomputes with
large OVN NB and SB DBs
(from a 500-node ovn-heater density-heavy run), so I'm aware of the
increase in recompute time.  I did some more testing today as well.

In my testing, the increase in latency comes from the new engine
nodes added (mainly lr_lbnat) and from the housekeeping of the lflow
references.  I do not see any increase due to the new lflow-mgr.c
added in patch 10: I compared patches 9 and 10 separately and there
is no difference in lflow engine node recompute time between them.

Below are the recompute times (in msec) for the relevant engine
nodes, for some of the individual patches in the series.


The sample OVN DBs have:

--------------------------------
Resource              Total
--------------------------------
Logical switches       1001
Logical routers         501
Router ports           1501
Switch ports          29501
Load balancers        35253
Load balancer groups      1
SB logical flows     268349
SB DB groups            509
--------------------------------

There is one load balancer group which has all the load balancers and
it is associated with all the logical switches and routers.

Below is the time taken for each engine node in msec

------------------------------------------------------------
Engine node      | lb_data | northd | lr_lbnat | lflow |
------------------------------------------------------------
ovn-northd-main  |     358 |   2455 |        x |  2082 |
ovn-northd-p1    |     373 |   2476 |        x |  2170 |
ovn-northd-p5    |     367 |   2413 |        x |  2195 |
ovn-northd-p6    |     354 |    688 |     1815 |  2442 |
ovn-northd-p7    |     357 |    683 |     1778 |  2806 |
ovn-northd-p9    |     352 |    682 |     1781 |  2781 |
ovn-northd-p10   |     365 |    838 |     1707 |  2812 |
ovn-northd-p13   |     362 |   1066 |     1882 |  2917 |
ovn-northd-p15   |     359 |   1046 |     1688 |  2907 |
ovn-northd-p18   |     379 |   1066 |     1729 |  2886 |
------------------------------------------------------------

ovn-northd-p1 means ovn-northd with patch 1 of this series applied,
ovn-northd-p2 with patches up to patch 2, and so on.

Patch 6 adds the new engine node lr_lbnat, which is why the northd
engine node time drops from ~2413 msec to ~688 msec (part of the work
moves to lr_lbnat).

With the first 10 patches, the overall northd engine time (northd +
lr_lbnat) increases by ~200 msec compared to main, and the lflow
engine node time increases by around 800 msec.
The point of this data is to show that the overall increase is mainly
due to the bookkeeping and the new engine nodes.  I tried to optimise
this further but couldn't.


IMO this 20% overall increase in recompute time should be fine for
the following reasons:
   -  The scale tests with ovn-heater show a significant decrease in
ovn-northd CPU usage and in the overall test duration, which means
very few recomputes were triggered.
   -  With the OCP/K8s kube-burner tests, there were 0 recomputes
in the northd and lflow engine nodes.



>
> >
> > Prior to this patch series, most of the changes to northd engine
> > resulted in full recomputation of logical flows.  This series
> > aims to improve the performance of ovn-northd by adding the I-P
> > support.
>
> I'd like to clarify "most of the changes" a little. I think we should focus
> on the most impactful changes that happen in a cloud environment. I don't
> think it is realistic to achieve "most of the changes" in I-P because it is
> too complex (even with this series we still handle a very small part of the
> DB schema incrementally), but it may be realistic to handle the most
> impactful changes, which are the most frequent changes in the cloud,
> incrementally. Before this series, we could handle regular VIF changes and
> related address-sets, port-groups (some of) updates incrementally. Those
> are the most common changes related to pod/VM changes in cloud. I believe
> the next goal is for LB changes related to pod/VMs (i.e. the LB backends),
> which I believe is the main goal of this series. Is my understanding
> correct?
>
That is correct.  In my opinion LB change handling is very important
when OVN is used with ovn-kubernetes, because all the k8s services
are mapped to OVN LBs.


> While reviewing the patches, I'd say sometimes I feel a little lost because
> it is hard to correlate some of the changes with the above goal.

I understand that.  Maybe I've done a bad job in conveying why the
changes are required for LBs.

Let me try to clarify a bit here; I'll also add more details to the
patch commit messages and the cover letter in the next series.

This series mainly targets incremental lflow generation for load
balancer changes.  As you know, we use conntrack in OVN for
   - load balancers (in both logical switches and routers)
   - ACLs in logical switches
   - NAT in logical routers.

This shared use of conntrack makes it complicated to decouple logical
flow generation for LBs from ACLs and NATs, and that is the reason I
had to split the northd node data related to load balancers, ACLs and
NATs into separate engine nodes.

To give an example, we generate certain logical flows for a logical
switch by checking "if (od->has_lb_vip && od->has_stateful_acl)" (see
build_acl_hints()).  For logical routers, a NAT address can be an LB
VIP too, which makes LB and NAT difficult to decouple.  We could fall
back to recompute for NAT changes, but we would still need to group
the logical flows related to LB and NAT together, and that's why I
added the lr_lbnat_data engine node in patch 6.

> I believe
> there is a reason for all the changes but I am not sure if it is directly
> related to the goal of LB backend related I-P. I understand that there are
> other aspects of LB that can be incrementally processed. But I'd recommend
> if changes necessary for this goal can be largely reduced it would help the
> review and we might be able to merge them sooner and add more but less
> impactful I-P later step by step. I guess I will have a clearer picture
> when I finish the rest of the patches, but it would be helpful if you could
> add more guidance in this cover letter.

I'll definitely add more details in the cover letter.

Apart from the LB-related I-P, this series adds only two additional
I-P patches: one for NB_Global changes and the other for SB mac
binding changes.  ovn-kubernetes periodically writes a timestamp into
the NB_Global.options column for its internal health checks, and this
results in unnecessary recomputes.  After running the kube-burner
density-cni tests, I identified the most common changes done in the
NB DB and added I-P for those.  With all these patches applied, there
are no recomputes in either the northd or the lflow engine node for
the duration of the test.  Not having the I-P for NB_Global, for
example, was resulting in significant ovn-northd CPU usage.

Thanks again for the reviews.

Numan

>
> Thanks,
> Han
>
> >  In order to add this support, some of the northd engine
> > node data (from struct ovn_datapath) is split and moved over to
> > new engine nodes - mainly related to load balancers, NAT and ACLs.
> >
> > Below are the scale testing results done with these patches applied
> > using ovn-heater.  The test ran the scenario  -
> > ocp-500-density-heavy.yml [1].
> >
> > With all the lflow I-P patches applied, the results are:
> >
> >
> -------------------------------------------------------------------------------------------------------------------------------------------------------
> >                         Min (s)         Median (s)      90%ile (s)
>  99%ile (s)      Max (s)         Mean (s)        Total (s)       Count
> Failed
> >
> -------------------------------------------------------------------------------------------------------------------------------------------------------
> > Iteration Total         0.136883        1.129016        1.192001
>  1.204167        1.212728        0.665017        83.127099       125     0
> > Namespace.add_ports     0.005216        0.005736        0.007034
>  0.015486        0.018978        0.006211        0.776373        125     0
> > WorkerNode.bind_port    0.035030        0.046082        0.052469
>  0.058293        0.060311        0.045973        11.493259       250     0
> > WorkerNode.ping_port    0.005057        0.006727        1.047692
>  1.069253        1.071336        0.266896        66.724094       250     0
> >
> -------------------------------------------------------------------------------------------------------------------------------------------------------
> >
> > The results with the present main are:
> >
> >
> -------------------------------------------------------------------------------------------------------------------------------------------------------
> >                         Min (s)         Median (s)      90%ile (s)
>  99%ile (s)      Max (s)         Mean (s)        Total (s)       Count
> Failed
> >
> -------------------------------------------------------------------------------------------------------------------------------------------------------
> > Iteration Total         0.135491        2.223805        3.311270
>  3.339078        3.345346        1.729172        216.146495      125     0
> > Namespace.add_ports     0.005380        0.005744        0.006819
>  0.018773        0.020800        0.006292        0.786532        125     0
> > WorkerNode.bind_port    0.034179        0.046055        0.053488
>  0.058801        0.071043        0.046117        11.529311       250     0
> > WorkerNode.ping_port    0.004956        0.006952        3.086952
>  3.191743        3.192807        0.791544        197.886026      250     0
> >
> -------------------------------------------------------------------------------------------------------------------------------------------------------
> >
> > [1] -
> https://github.com/ovn-org/ovn-heater/blob/main/test-scenarios/ocp-500-density-heavy.yml
> >
> > v1 -> v2
> > --------
> >    * Now also maintaining array indexes for ls_lbacls, lr_nat and
> >      lr_lb_nat_data tables (similar to ovn_datapaths->array) to
> >      make the lookup efficient.  The same ovn_datapath->index
> >      is reused.
> >
> >    * Made some significant changes to 'struct lflow_ref' in lflow-mgr.c.
> >      In v2 we don't use objdep_mgr to maintain the resource to lflow
> >      references.  Instead we maintain the 'struct lflow' pointer.
> >      With this we don't need to maintain additional hmap of lflows.
> >
> >
> > Numan Siddique (18):
> >   northd: Refactor the northd change tracking.
> >   northd: Track ovn_datapaths in northd engine track data.
> >   tests: Add a couple of tests in ovn-northd for I-P.
> >   northd: Move router ports SB PB options sync to sync_to_sb_pb node.
> >   northd: Add a new engine 'lr-nat' to manage lr NAT data.
> >   northd: Add a new engine 'lr-lb-nat-data' to manage routers' lb data.
> >   northd:  Generate logical router's LB and NAT flows using
> >     lr_lbnat_data.
> >   northd: Don't commit dhcp response flows in the conntrack.
> >   northd: Add a new node ls_lbacls.
> >   northd: Refactor lflow management into a separate module.
> >   northd: Use lflow_ref when adding all logical flows.
> >   northd:  Move ovn_lb_datapaths from lib to northd module.
> >   northd: Handle lb changes in lflow engine.
> >   northd: Add lr_lb_nat_data handler for lflow engine node.
> >   northd: Add ls_lbacls handler for lflow engine node.
> >   northd:  Add I-P for NB_Global and SB_Global.
> >   northd: Add a noop handler for northd SB mac binding.
> >   northd: Add northd change handler for sync_to_sb_lb node.
> >
> >  lib/lb.c                   |   96 -
> >  lib/lb.h                   |   57 -
> >  lib/ovn-util.c             |   17 +-
> >  lib/ovn-util.h             |    2 +-
> >  lib/stopwatch-names.h      |    3 +
> >  northd/aging.c             |   21 +-
> >  northd/automake.mk         |   12 +-
> >  northd/en-global-config.c  |  588 ++++
> >  northd/en-global-config.h  |   65 +
> >  northd/en-lflow.c          |  123 +-
> >  northd/en-lflow.h          |    8 +
> >  northd/en-lr-lb-nat-data.c |  685 ++++
> >  northd/en-lr-lb-nat-data.h |  125 +
> >  northd/en-lr-nat.c         |  498 +++
> >  northd/en-lr-nat.h         |  137 +
> >  northd/en-ls-lb-acls.c     |  530 +++
> >  northd/en-ls-lb-acls.h     |  111 +
> >  northd/en-northd.c         |   67 +-
> >  northd/en-northd.h         |    2 +-
> >  northd/en-port-group.h     |    3 +
> >  northd/en-sync-sb.c        |  513 ++-
> >  northd/inc-proc-northd.c   |   79 +-
> >  northd/lflow-mgr.c         | 1078 ++++++
> >  northd/lflow-mgr.h         |  192 ++
> >  northd/northd.c            | 6485 ++++++++++++++++--------------------
> >  northd/northd.h            |  551 ++-
> >  northd/ovn-northd.c        |    4 +
> >  tests/ovn-northd.at        |  858 ++++-
> >  28 files changed, 8827 insertions(+), 4083 deletions(-)
> >  create mode 100644 northd/en-global-config.c
> >  create mode 100644 northd/en-global-config.h
> >  create mode 100644 northd/en-lr-lb-nat-data.c
> >  create mode 100644 northd/en-lr-lb-nat-data.h
> >  create mode 100644 northd/en-lr-nat.c
> >  create mode 100644 northd/en-lr-nat.h
> >  create mode 100644 northd/en-ls-lb-acls.c
> >  create mode 100644 northd/en-ls-lb-acls.h
> >  create mode 100644 northd/lflow-mgr.c
> >  create mode 100644 northd/lflow-mgr.h
> >
> > --
> > 2.41.0
> >
> > _______________________________________________
> > dev mailing list
> > [email protected]
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev