On Fri, Nov 24, 2023 at 10:38 AM Dumitru Ceara <[email protected]> wrote:
>
> On 11/16/23 22:05, Numan Siddique wrote:
> > On Thu, Nov 16, 2023 at 2:54 PM Han Zhou <[email protected]> wrote:
> >>
> >> On Wed, Nov 15, 2023 at 7:32 PM Numan Siddique <[email protected]> wrote:
> >>>
> >>> On Wed, Nov 15, 2023 at 2:59 AM Han Zhou <[email protected]> wrote:
> >>>>
> >>>> On Thu, Oct 26, 2023 at 11:12 AM <[email protected]> wrote:
> >>>>>
> >>>>> From: Numan Siddique <[email protected]>
> >>>>>
> >>>>> This patch series adds incremental processing in the lflow engine
> >>>>> node to handle changes to northd and other engine nodes.
> >>>>> Changes related to load balancers and NAT are mainly handled in
> >>>>> this patch series.
> >>>>>
> >>>>> This patch series can also be found here -
> >>>> https://github.com/numansiddique/ovn/tree/northd_lflow_ip/v1
> >>>>
> >>>> Thanks Numan for the great improvement!
> >>>
> >>> Hi Han,
> >>>
> >>> Thanks for the review comments.  I understand it is hard to review
> >>> so many patches, especially those related to I-P.
> >>> I appreciate the time spent on it and the feedback.
> >>>
> >>>>
> >>>> I spent some time these days to review the series and now at patch 10. I
> >>>> need to take a break and so I just sent the comments I had so far.
> >>>>
> >>>> I also did scale testing for northd with
> >>>> https://github.com/hzhou8/ovn-test-script for a 500 chassis, 50 lsp /
> >>>> chassis setup, and noticed that the recompute latency has increased 20%
> >>>> after the series. I think in general it is expected to have some
> >>>> performance penalty for the recompute case because of the new indexes
> >>>> created for I-P. However, the patch 10 "northd: Refactor lflow
> >>>> management into a separate module." alone introduces a 10% latency
> >>>> increase, which necessitates more investigation.
> >>>
> >>> Before sending out the series I did some testing on recomputes with a
> >>> large OVN NB DB and SB DB (from a 500 node ovn-heater density-heavy
> >>> run).  I'm aware of the increase in recompute time, and I did some
> >>> more testing today as well.
> >>>
> >>> In my testing,  I can see that the increase in latency is due to the
> >>> new engine nodes added (lr_lbnat mainly)
> >>> and due to housekeeping of the lflow references.  I do not see any
> >>> increase due to the new lflow-mgr.c added in patch 10.
> >>>
> >>> I compared patch 9 and patch 10 separately and there is no difference
> >>> in lflow engine node recompute time between patch 9 and 10.
> >>>
> >>
> >> My results were different. My test profile simulates the ovn-k8s
> >> topology (central mode, not IC) with 500 nodes, 50 LSP/node, no LBs,
> >> and a small number of ACLs and PGs.
> >> (https://github.com/hzhou8/ovn-test-script/blob/main/args.500ch_50lsp_10000pg)
> >>
> >> The test results for ovn-northd recompute time are:
> >> main: 1118.3
> >> p9: 1129.5
> >> p10: 1243.4 ==> 10% more than p9
> >> p18: 1357.6
> >>
> >> I am not sure if the p10 increase is related to the hash change or
> >> something else.
> >>
>
> I didn't review p10 in detail yet but I did run some tests and (with gcc
> and CFLAGS="-O2 -fno-omit-frame-pointer -fno-common -g") I see no
> significant difference between p9 and p10 when running Han's scale test
> profile.
>
> Then I also tried the same thing using the 500 nodes + 50 LSP/node perf
> test we already have in the repo:
>
> https://github.com/ovn-org/ovn/blob/5ef2a08ade01d698f84e197987ea679d8978d2b9/tests/perf-northd.at#L185
>
> Again, I didn't see any significant change between p9 and p10 on my
> laptop.  I wonder what's different in our setups.
>
> Kind of related, I posted a series to improve the in-tree perf testing
> and allow us to also gather recompute stats (build the DB, reset
> stopwatches and trigger a recompute, then collect stopwatches):
>
> https://patchwork.ozlabs.org/project/ovn/list/?series=383727&state=*
>
> Would it make sense to merge both scenarios you used into a new
> in-tree test?  That way it's easier for anyone to run the same test.
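The measurement flow described above (build the DB, reset stopwatches, trigger a recompute, collect stopwatches) can be sketched as a command transcript. This assumes a running ovn-northd and uses its unixctl commands; the exact command names may vary between OVN versions.

```sh
# Reset northd's stopwatches so only the next run is measured.
ovn-appctl -t ovn-northd stopwatch/reset

# Force a full recompute of the incremental engine.
ovn-appctl -t ovn-northd inc-engine/recompute

# Collect per-engine-node timings (lflow, northd, ...).
ovn-appctl -t ovn-northd stopwatch/show
```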

That sounds good to me. I'll take a look at your patches soon.

Thanks for testing them out.  Can you also please compare OVN main vs
p10 of this series?

Thanks
Numan


>
> Thanks,
> Dumitru
>
> >>> Below are the results with the time taken for the mentioned engine
> >>> nodes in msecs for a recompute for some of the individual patches in
> >>> the series.
> >>>
> >>>
> >>> The sample OVN DBs have
> >>>
> >>> --------------------------------
> >>> Resource               Total
> >>> --------------------------------
> >>> Logical switches       1001
> >>> Logical routers        501
> >>> Router ports           1501
> >>> Switch ports           29501
> >>> Load balancers         35253
> >>> Load balancer groups   1
> >>> SB Logical flows       268349
> >>> SB DB groups           509
> >>> --------------------------------
> >>>
> >>> There is one load balancer group which has all the load balancers and
> >>> it is associated with all the logical switches and routers.
> >>>
> >>> Below is the time taken for each engine node in msec
> >>>
> >>> --------------------------------------------------------
> >>> Engine nodes     | lb_data | northd | lr_lbnat | lflow |
> >>> --------------------------------------------------------
> >>> ovn-northd-main  | 358     | 2455   | x        | 2082  |
> >>> ovn-northd-p1    | 373     | 2476   | x        | 2170  |
> >>> ovn-northd-p5    | 367     | 2413   | x        | 2195  |
> >>> ovn-northd-p6    | 354     | 688    | 1815     | 2442  |
> >>> ovn-northd-p7    | 357     | 683    | 1778     | 2806  |
> >>> ovn-northd-p9    | 352     | 682    | 1781     | 2781  |
> >>> ovn-northd-p10   | 365     | 838    | 1707     | 2812  |
> >>> ovn-northd-p13   | 362     | 1066   | 1882     | 2917  |
> >>> ovn-northd-p15   | 359     | 1046   | 1688     | 2907  |
> >>> ovn-northd-p18   | 379     | 1066   | 1729     | 2886  |
> >>> --------------------------------------------------------
> >>>
> >>> ovn-northd-p1 means ovn-northd with the patch 1 of this series,
> >>> ovn-northd-p2 - patch 2 of this series and so on.
> >>>
> >>> Patch 6 adds a new engine node, lr_lbnat, and that's why the northd
> >>> engine node time gets reduced from ~2413 msec to ~688 msec.
> >>>
> >>> With the first 10 patches, the northd engine node's overall time
> >>> increases by ~200 msec compared to main, and the lflow engine node
> >>> time increases by around 800 msec.
> >>> The point of this data is to show that the overall increase is mainly
> >>> due to bookkeeping and the new engine nodes.  I tried to optimise
> >>> further but couldn't reduce it.
> >>>
> >>>
> >>> IMO this 20% overall increase should be fine for the following reasons:
> >>>    -  The scale tests with ovn-heater show a significant decrease in
> >>> ovn-northd CPU usage and in the overall test duration.
> >>>    -  This means very few recomputes were triggered.
> >>>    -  With the OCP/K8s kube-burner tests, there were 0 recomputes
> >>> in the northd and lflow engine nodes.
> >>>
> >>>
> >>>
> >>>>
> >>>>>
> >>>>> Prior to this patch series, most of the changes to northd engine
> >>>>> resulted in full recomputation of logical flows.  This series
> >>>>> aims to improve the performance of ovn-northd by adding the I-P
> >>>>> support.
> >>>>
> >>>> I'd like to clarify "most of the changes" a little. I think we should
> >>>> focus on the most impactful changes that happen in a cloud environment.
> >>>> I don't think it is realistic to achieve "most of the changes" in I-P
> >>>> because it is too complex (even with this series we still handle a
> >>>> very small part of the DB schema incrementally), but it may be
> >>>> realistic to handle the most impactful changes, which are the most
> >>>> frequent changes in the cloud, incrementally. Before this series, we
> >>>> could handle regular VIF changes and related address-set and
> >>>> port-group updates (some of them) incrementally. Those are the most
> >>>> common changes related to pod/VM changes in the cloud. I believe the
> >>>> next goal is LB changes related to pods/VMs (i.e. the LB backends),
> >>>> which I believe is the main goal of this series. Is my understanding
> >>>> correct?
> >>>>
> >>> That is correct.  LB change handling is very important in my opinion
> >>> when used with ovn-kubernetes because all the k8s services
> >>> are mapped to OVN LBs.
> >>>
> >>>
> >>>> While reviewing the patches, I'd say sometimes I feel a little lost
> >>>> because it is hard to correlate some of the changes with the above
> >>>> goal.
> >>>
> >>> I understand that.  Maybe I've done a bad job in conveying why the
> >>> changes are required for LBs.
> >>>
> >>> Let me try to clarify a bit here.  I'll incorporate and add more
> >>> details in the patch commits and cover letter in the next series.
> >>>
> >>> This series mainly targets handling lflow generation for load
> >>> balancer changes.
> >>> As you know, we use conntrack in OVN for
> >>>    - Load balancers (both in logical switches and routers)
> >>>    - ACLs in logical switches
> >>>    - NAT in logical routers.
> >>>
> >>> And this makes it complicated to decouple logical flow generation for
> >>> LBs from that for ACLs and NATs.  That is the reason I had to split
> >>> the northd node data related to load balancers, ACLs and NATs into
> >>> separate engine nodes.
> >>>
> >>> To give an example, we generate certain logical flows for a logical
> >>> switch by checking "if (od->has_lb_vip && od->has_stateful_acl)" (see
> >>> build_acl_hints()).
> >>> For logical routers, a NAT can be an LB VIP too, which makes it
> >>> difficult to decouple LB and NAT.  We could fall back to recompute
> >>> for NAT changes, but we would still need to group logical flows
> >>> related to LB and NAT.  That's why I added the lr_lbnat_data engine
> >>> node in patch 6.
> >>>
> >>
> >> Thanks for explaining. However, I wonder if these are related to LB
> >> backend updates. I agree that service creation/deletion may also be
> >> important to be handled incrementally, but they are less critical
> >> (less frequent) than the backend updates, which are directly related
> >> to pod/VM creation/deletion.
> >
> > From what I understand, at least for OCP use cases they seem to be critical.
> > Also it depends on the use case I suppose.
> >
> >> The has_xxx and NAT related handling are more related to the service
> >> creation/deletion rather than LB backend updates, right?
> >
> > I think so.
> >
> > Numan
> >
> >>
> >> Thanks,
> >> Han
> >>
> >>>> I believe there is a reason for all the changes but I am not sure if
> >>>> it is directly related to the goal of LB backend related I-P. I
> >>>> understand that there are other aspects of LB that can be
> >>>> incrementally processed. But I'd recommend: if the changes necessary
> >>>> for this goal can be largely reduced, it would help the review and we
> >>>> might be able to merge them sooner and add more, but less impactful,
> >>>> I-P later step by step. I guess I will have a clearer picture when I
> >>>> finish the rest of the patches, but it would be helpful if you could
> >>>> add more guidance in this cover letter.
> >>>
> >>> I'll definitely add more details in the cover letter.
> >>>
> >>> Apart from the LB related I-P, this series only adds two additional
> >>> I-P patches: one for NB_Global changes and the other for SB mac
> >>> binding changes.
> >>> ovn-kubernetes periodically writes a timestamp into the
> >>> NB_Global.options column for its internal health checks, and this
> >>> results in unnecessary recomputes.  After running the kube-burner
> >>> density-cni tests, I figured out the most common changes made to the
> >>> NB DB and added the I-P for those.  With all these patches applied,
> >>> there are no recomputes in either the northd or lflow engine nodes
> >>> during the test duration.  Not having the I-P for NB_Global, for
> >>> example, was resulting in significant ovn-northd CPU usage.
> >>>
> >>> Thanks again for the reviews.
> >>>
> >>> Numan
> >>>
> >>>>
> >>>> Thanks,
> >>>> Han
> >>>>
> >>>>>  In order to add this support, some of the northd engine
> >>>>> node data (from struct ovn_datapath) is split and moved over to
> >>>>> new engine nodes - mainly related to load balancers, NAT and ACLs.
> >>>>>
> >>>>> Below are the scale testing results done with these patches applied
> >>>>> using ovn-heater.  The test ran the scenario  -
> >>>>> ocp-500-density-heavy.yml [1].
> >>>>>
> >>>>> With all the lflow I-P patches applied, the results are:
> >>>>>
> >>>>>
> >>>>
> >>>>> ----------------------------------------------------------------------------------------------------------------------
> >>>>>                       Min (s)    Median (s)  90%ile (s)  99%ile (s)  Max (s)    Mean (s)   Total (s)    Count  Failed
> >>>>> ----------------------------------------------------------------------------------------------------------------------
> >>>>> Iteration Total       0.136883   1.129016    1.192001    1.204167    1.212728   0.665017   83.127099    125    0
> >>>>> Namespace.add_ports   0.005216   0.005736    0.007034    0.015486    0.018978   0.006211   0.776373     125    0
> >>>>> WorkerNode.bind_port  0.035030   0.046082    0.052469    0.058293    0.060311   0.045973   11.493259    250    0
> >>>>> WorkerNode.ping_port  0.005057   0.006727    1.047692    1.069253    1.071336   0.266896   66.724094    250    0
> >>>>> ----------------------------------------------------------------------------------------------------------------------
> >>>>>
> >>>>> The results with the present main are:
> >>>>>
> >>>>>
> >>>>
> >>>>> ----------------------------------------------------------------------------------------------------------------------
> >>>>>                       Min (s)    Median (s)  90%ile (s)  99%ile (s)  Max (s)    Mean (s)   Total (s)    Count  Failed
> >>>>> ----------------------------------------------------------------------------------------------------------------------
> >>>>> Iteration Total       0.135491   2.223805    3.311270    3.339078    3.345346   1.729172   216.146495   125    0
> >>>>> Namespace.add_ports   0.005380   0.005744    0.006819    0.018773    0.020800   0.006292   0.786532     125    0
> >>>>> WorkerNode.bind_port  0.034179   0.046055    0.053488    0.058801    0.071043   0.046117   11.529311    250    0
> >>>>> WorkerNode.ping_port  0.004956   0.006952    3.086952    3.191743    3.192807   0.791544   197.886026   250    0
> >>>>> ----------------------------------------------------------------------------------------------------------------------
> >>>>>
> >>>>> [1] - https://github.com/ovn-org/ovn-heater/blob/main/test-scenarios/ocp-500-density-heavy.yml
> >>>>>
> >>>>> v1 -> v2
> >>>>> --------
> >>>>>    * Now also maintaining array indexes for the ls_lbacls, lr_nat and
> >>>>>      lr_lb_nat_data tables (similar to ovn_datapaths->array) to
> >>>>>      make lookups efficient.  The same ovn_datapath->index
> >>>>>      is reused.
> >>>>>
> >>>>>    * Made some significant changes to 'struct lflow_ref' in lflow-mgr.c.
> >>>>>      In v2 we don't use objdep_mgr to maintain the resource-to-lflow
> >>>>>      references.  Instead we maintain the 'struct lflow' pointer.
> >>>>>      With this we don't need to maintain an additional hmap of lflows.
> >>>>>
> >>>>>
> >>>>> Numan Siddique (18):
> >>>>>   northd: Refactor the northd change tracking.
> >>>>>   northd: Track ovn_datapaths in northd engine track data.
> >>>>>   tests: Add a couple of tests in ovn-northd for I-P.
> >>>>>   northd: Move router ports SB PB options sync to sync_to_sb_pb node.
> >>>>>   northd: Add a new engine 'lr-nat' to manage lr NAT data.
> >>>>>   northd: Add a new engine 'lr-lb-nat-data' to manage routers' lb data.
> >>>>>   northd:  Generate logical router's LB and NAT flows using
> >>>>>     lr_lbnat_data.
> >>>>>   northd: Don't commit dhcp response flows in the conntrack.
> >>>>>   northd: Add a new node ls_lbacls.
> >>>>>   northd: Refactor lflow management into a separate module.
> >>>>>   northd: Use lflow_ref when adding all logical flows.
> >>>>>   northd:  Move ovn_lb_datapaths from lib to northd module.
> >>>>>   northd: Handle lb changes in lflow engine.
> >>>>>   northd: Add lr_lb_nat_data handler for lflow engine node.
> >>>>>   northd: Add ls_lbacls handler for lflow engine node.
> >>>>>   northd:  Add I-P for NB_Global and SB_Global.
> >>>>>   northd: Add a noop handler for northd SB mac binding.
> >>>>>   northd: Add northd change handler for sync_to_sb_lb node.
> >>>>>
> >>>>>  lib/lb.c                   |   96 -
> >>>>>  lib/lb.h                   |   57 -
> >>>>>  lib/ovn-util.c             |   17 +-
> >>>>>  lib/ovn-util.h             |    2 +-
> >>>>>  lib/stopwatch-names.h      |    3 +
> >>>>>  northd/aging.c             |   21 +-
> >>>>>  northd/automake.mk         |   12 +-
> >>>>>  northd/en-global-config.c  |  588 ++++
> >>>>>  northd/en-global-config.h  |   65 +
> >>>>>  northd/en-lflow.c          |  123 +-
> >>>>>  northd/en-lflow.h          |    8 +
> >>>>>  northd/en-lr-lb-nat-data.c |  685 ++++
> >>>>>  northd/en-lr-lb-nat-data.h |  125 +
> >>>>>  northd/en-lr-nat.c         |  498 +++
> >>>>>  northd/en-lr-nat.h         |  137 +
> >>>>>  northd/en-ls-lb-acls.c     |  530 +++
> >>>>>  northd/en-ls-lb-acls.h     |  111 +
> >>>>>  northd/en-northd.c         |   67 +-
> >>>>>  northd/en-northd.h         |    2 +-
> >>>>>  northd/en-port-group.h     |    3 +
> >>>>>  northd/en-sync-sb.c        |  513 ++-
> >>>>>  northd/inc-proc-northd.c   |   79 +-
> >>>>>  northd/lflow-mgr.c         | 1078 ++++++
> >>>>>  northd/lflow-mgr.h         |  192 ++
> >>>>>  northd/northd.c            | 6485 ++++++++++++++++--------------------
> >>>>>  northd/northd.h            |  551 ++-
> >>>>>  northd/ovn-northd.c        |    4 +
> >>>>>  tests/ovn-northd.at        |  858 ++++-
> >>>>>  28 files changed, 8827 insertions(+), 4083 deletions(-)
> >>>>>  create mode 100644 northd/en-global-config.c
> >>>>>  create mode 100644 northd/en-global-config.h
> >>>>>  create mode 100644 northd/en-lr-lb-nat-data.c
> >>>>>  create mode 100644 northd/en-lr-lb-nat-data.h
> >>>>>  create mode 100644 northd/en-lr-nat.c
> >>>>>  create mode 100644 northd/en-lr-nat.h
> >>>>>  create mode 100644 northd/en-ls-lb-acls.c
> >>>>>  create mode 100644 northd/en-ls-lb-acls.h
> >>>>>  create mode 100644 northd/lflow-mgr.c
> >>>>>  create mode 100644 northd/lflow-mgr.h
> >>>>>
> >>>>> --
> >>>>> 2.41.0
> >>>>>
> >>>>> _______________________________________________
> >>>>> dev mailing list
> >>>>> [email protected]
> >>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
