On 11/24/23 16:41, Numan Siddique wrote:
> On Fri, Nov 24, 2023 at 10:38 AM Dumitru Ceara <[email protected]> wrote:
>>
>> On 11/16/23 22:05, Numan Siddique wrote:
>>> On Thu, Nov 16, 2023 at 2:54 PM Han Zhou <[email protected]> wrote:
>>>>
>>>> On Wed, Nov 15, 2023 at 7:32 PM Numan Siddique <[email protected]> wrote:
>>>>>
>>>>> On Wed, Nov 15, 2023 at 2:59 AM Han Zhou <[email protected]> wrote:
>>>>>>
>>>>>> On Thu, Oct 26, 2023 at 11:12 AM <[email protected]> wrote:
>>>>>>>
>>>>>>> From: Numan Siddique <[email protected]>
>>>>>>>
>>>>>>> This patch series adds incremental processing in the lflow engine
>>>>>>> node to handle changes to northd and other engine nodes.
>>>>>>> Changes related to load balancers and NAT are mainly handled in
>>>>>>> this patch series.
>>>>>>>
>>>>>>> This patch series can also be found here -
>>>>>>> https://github.com/numansiddique/ovn/tree/northd_lflow_ip/v1
>>>>>>
>>>>>> Thanks Numan for the great improvement!
>>>>>
>>>>> Hi Han,
>>>>>
>>>>> Thanks for the review comments.  I understand it is hard to review
>>>>> so many patches, especially related to I-P.
>>>>> I appreciate the time spent on it and the feedback.
>>>>>
>>>>>> I spent some time these days to review the series and am now at
>>>>>> patch 10.  I need to take a break, so I just sent the comments I
>>>>>> had so far.
>>>>>>
>>>>>> I also did scale testing for northd with
>>>>>> https://github.com/hzhou8/ovn-test-script for a 500 chassis, 50 lsp /
>>>>>> chassis setup, and noticed that the recompute latency has increased
>>>>>> 20% after the series.  I think in general it is expected to have some
>>>>>> performance penalty for the recompute case because of the new indexes
>>>>>> created for I-P.  However, patch 10 "northd: Refactor lflow
>>>>>> management into a separate module." alone introduces a 10% latency
>>>>>> increase, which necessitates more investigation.
>>>>>
>>>>> Before sending out the series I did some testing on recomputes with a
>>>>> large OVN NB DB and SB DB
>>>>> (from a 500 node ovn-heater density-heavy run).
>>>>> I'm aware of the increase in recomputes.  And I did some more testing
>>>>> today as well.
>>>>>
>>>>> In my testing, I can see that the increase in latency is due to the
>>>>> new engine nodes added (lr_lbnat mainly)
>>>>> and due to housekeeping of the lflow references.  I do not see any
>>>>> increase due to the new lflow-mgr.c added in patch 10.
>>>>>
>>>>> I compared patch 9 and patch 10 separately and there is no difference
>>>>> in lflow engine node recompute time between patch 9 and 10.
>>>>>
>>>>
>>>> My results were different.  My test profile simulates the ovn-k8s
>>>> topology (central mode, not IC) with 500 nodes, 50 LSP/node, with no
>>>> LB and a small amount of ACLs and PGs.
>>>> (https://github.com/hzhou8/ovn-test-script/blob/main/args.500ch_50lsp_10000pg)
>>>>
>>>> The test results for ovn-northd recompute time are:
>>>>   main: 1118.3
>>>>   p9:   1129.5
>>>>   p10:  1243.4  ==> 10% more than p9
>>>>   p18:  1357.6
>>>>
>>>> I am not sure if the p10 increase is related to the hash change or
>>>> something else.
>>>>
>>
>> I didn't review p10 in detail yet but I did run some tests and (with gcc
>> and CFLAGS="-O2 -fno-omit-frame-pointer -fno-common -g") I see no
>> significant difference between p9 and p10 when running Han's scale test
>> profile.
>>
>> Then I also tried the same thing using the 500 nodes + 50 LSP/node perf
>> test we already have in the repo:
>>
>> https://github.com/ovn-org/ovn/blob/5ef2a08ade01d698f84e197987ea679d8978d2b9/tests/perf-northd.at#L185
>>
>> Again, I didn't see any significant change between p9 and p10 on my
>> laptop.  I wonder what's different in our setups.
>>
>> Kind of related, I posted a series to improve the in-tree perf testing
>> and allow us to also gather recompute stats (build the DB, reset
>> stopwatches and trigger a recompute, then collect stopwatches):
>>
>> https://patchwork.ozlabs.org/project/ovn/list/?series=383727&state=*
>>
>> Would it be an idea to try to merge both scenarios you guys used into
>> something that's defined as a new test in-tree?  That way it's easier
>> for anyone to just run the same test.
>
> That sounds good to me.  I'll take a look at your patches soon.
>
> Thanks for testing them out.  Can you also please compare OVN main vs
> p10 of this series?
>
Sorry for the delay in replying; I ran some more tests, both with Han's
benchmark and with the in-tree check-perf 500 node + 50 LSP/node test.
I did this with debug logs disabled and ran them multiple times to
collect more relevant averages (the results did seem more stable).

This is what I measured (where "pX" == "patch X of this series" and
values are averages in milliseconds):

-------------------------------------------------------------------------------
| Benchmark | Recompute "main" | Recompute p9 | Recompute p10 | Recompute p16 |
-------------------------------------------------------------------------------
| Han's     | 1552             | 1560         | 1676          | 1704          |
-------------------------------------------------------------------------------
| in-tree   | 1146             | 1250         | 1281          | 1337          |
-------------------------------------------------------------------------------

So, in both cases, I see some increase in the time it takes to run full
recomputes.  However, IMO, a hit of 10-15% in the time it takes to
recompute might be acceptable as long as the most common cases that used
to trigger a recompute are now handled incrementally.

Regards,
Dumitru

> Thanks
> Numan
>
>>
>> Thanks,
>> Dumitru
>>
>>>>> Below are the results with the time taken for the mentioned engine
>>>>> nodes in msecs for a recompute for some of the individual patches in
>>>>> the series.
>>>>>
>>>>> The sample OVN DBs have:
>>>>>
>>>>> ----------------------------
>>>>> Resource             Total
>>>>> ----------------------------
>>>>> Logical switches     1001
>>>>> Logical routers      501
>>>>> Router ports         1501
>>>>> Switch ports         29501
>>>>> Load balancers       35253
>>>>> Load balancer group  1
>>>>> SB Logical flows     268349
>>>>> SB DB groups         509
>>>>> ----------------------------
>>>>>
>>>>> There is one load balancer group which has all the load balancers and
>>>>> it is associated with all the logical switches and routers.
>>>>>
>>>>> Below is the time taken for each engine node in msec:
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> Engine nodes    | lb_data | northd | lr_lbnat | lflow |
>>>>> ---------------------------------------------------------------------
>>>>> ovn-northd-main |   358   |  2455  |    x     | 2082  |
>>>>> ---------------------------------------------------------------------
>>>>> ovn-northd-p1   |   373   |  2476  |    x     | 2170  |
>>>>> ---------------------------------------------------------------------
>>>>> ovn-northd-p5   |   367   |  2413  |    x     | 2195  |
>>>>> ---------------------------------------------------------------------
>>>>> ovn-northd-p6   |   354   |   688  |   1815   | 2442  |
>>>>> ---------------------------------------------------------------------
>>>>> ovn-northd-p7   |   357   |   683  |   1778   | 2806  |
>>>>> ---------------------------------------------------------------------
>>>>> ovn-northd-p9   |   352   |   682  |   1781   | 2781  |
>>>>> ---------------------------------------------------------------------
>>>>> ovn-northd-p10  |   365   |   838  |   1707   | 2812  |
>>>>> ---------------------------------------------------------------------
>>>>> ovn-northd-p13  |   362   |  1066  |   1882   | 2917  |
>>>>> ---------------------------------------------------------------------
>>>>> ovn-northd-p15  |   359   |  1046  |   1688   | 2907  |
>>>>> ---------------------------------------------------------------------
>>>>> ovn-northd-p18  |   379   |  1066  |   1729   | 2886  |
>>>>> ---------------------------------------------------------------------
>>>>>
>>>>> ovn-northd-p1 means ovn-northd with patch 1 of this series,
>>>>> ovn-northd-p2 with patch 2 of this series, and so on.
>>>>>
>>>>> Patch 6 adds a new engine node lr_lbnat and that's why the northd
>>>>> engine node time gets reduced from ~2413 msec to 688.
>>>>>
>>>>> With the first 10 patches, the northd engine node's overall time
>>>>> increases by about 200 msec compared to "main" and the lflow engine
>>>>> node time increases by around 800 msec.
>>>>> The point of this data is to show that the overall increase is mainly
>>>>> due to bookkeeping and new engine nodes.  I tried to optimise further
>>>>> but I couldn't.
>>>>>
>>>>> IMO this 20% overall increase should be fine for the following
>>>>> reasons:
>>>>>   - The scale tests with ovn-heater show a significant decrease in
>>>>>     ovn-northd CPU usage and a decrease in the overall test duration.
>>>>>     This means there were very few recomputes triggered.
>>>>>   - With the OCP/K8s kube-burner tests, there were 0 recomputes
>>>>>     in the northd and lflow engine nodes.
>>>>>
>>>>>>>
>>>>>>> Prior to this patch series, most of the changes to the northd
>>>>>>> engine resulted in full recomputation of logical flows.  This
>>>>>>> series aims to improve the performance of ovn-northd by adding
>>>>>>> I-P support.
>>>>>>
>>>>>> I'd like to clarify "most of the changes" a little.  I think we
>>>>>> should focus on the most impactful changes that happen in a cloud
>>>>>> environment.
>>>>>> I don't think it is realistic to achieve "most of the changes" in
>>>>>> I-P because it is too complex (even with this series we still handle
>>>>>> a very small part of the DB schema incrementally), but it may be
>>>>>> realistic to handle the most impactful changes, which are the most
>>>>>> frequent changes in the cloud, incrementally.  Before this series,
>>>>>> we could handle regular VIF changes and related address-set and
>>>>>> (some) port-group updates incrementally.  Those are the most common
>>>>>> changes related to pod/VM changes in the cloud.  I believe the next
>>>>>> goal is for LB changes related to pod/VMs (i.e. the LB backends),
>>>>>> which I believe is the main goal of this series.  Is my
>>>>>> understanding correct?
>>>>>>
>>>>> That is correct.  LB change handling is very important in my opinion
>>>>> when used with ovn-kubernetes because all the k8s services
>>>>> are mapped to OVN LBs.
>>>>>
>>>>>> While reviewing the patches, I'd say sometimes I feel a little lost
>>>>>> because it is hard to correlate some of the changes with the above
>>>>>> goal.
>>>>>
>>>>> I understand that.  Maybe I've done a bad job of conveying why the
>>>>> changes are required for LBs.
>>>>>
>>>>> Let me try to clarify a bit here.  I'll incorporate and add more
>>>>> details in the patch commits and cover letter in the next series.
>>>>>
>>>>> This series mainly targets handling lflow generation for load
>>>>> balancer changes.  As you know, we use conntrack in OVN for:
>>>>>   - Load balancers (in both logical switches and routers)
>>>>>   - ACLs in logical switches
>>>>>   - NAT in logical routers
>>>>>
>>>>> This makes it complicated to decouple logical flow generation for
>>>>> LBs from ACLs and NATs.  That is the reason I had to split the northd
>>>>> node data related to load balancers, ACLs and NATs into separate
>>>>> nodes.
>>>>>
>>>>> To give an example, we generate certain logical flows for a logical
>>>>> switch by checking "if (od->has_lb_vip && od->has_stateful_acl)"
>>>>> (see build_acl_hints()).
>>>>> For logical routers, a NAT address can be an LB VIP too.  This makes
>>>>> it difficult to decouple LB and NAT.  We could fall back to recompute
>>>>> for NAT changes, but we still need to group logical flows related to
>>>>> LB and NAT.  And that's why I added the lr_lbnat_data engine node in
>>>>> patch 6.
>>>>>
>>>>
>>>> Thanks for explaining.  However, I wonder if these are related to LB
>>>> backend updates.  I agree that service creation/deletion may also be
>>>> important to handle incrementally, but they are less critical (less
>>>> frequent) than the backend updates, which are directly related to
>>>> pod/VM creation/deletion.
>>>
>>> From what I understand, at least for OCP use cases they seem to be
>>> critical.  Also it depends on the use case, I suppose.
>>>
>>>> The has_xxx and NAT related handling are more related to service
>>>> creation/deletion rather than LB backend updates, right?
>>>
>>> I think so.
>>>
>>> Numan
>>>
>>>> Thanks,
>>>> Han
>>>>
>>>>>> I believe there is a reason for all the changes but I am not sure if
>>>>>> it is directly related to the goal of LB backend related I-P.  I
>>>>>> understand that there are other aspects of LB that can be
>>>>>> incrementally processed.  But I'd recommend that if the changes
>>>>>> necessary for this goal can be largely reduced, it would help the
>>>>>> review and we might be able to merge them sooner and add more, but
>>>>>> less impactful, I-P later step by step.  I guess I will have a
>>>>>> clearer picture when I finish the rest of the patches, but it would
>>>>>> be helpful if you could add more guidance in this cover letter.
>>>>>
>>>>> I'll definitely add more details in the cover letter.
>>>>>
>>>>> Apart from the LB related I-P, this series only adds 2 more
>>>>> additional patches for I-P.
>>>>> One is for the NB_Global changes and the other for SB mac binding
>>>>> changes.
>>>>> ovn-kubernetes periodically writes to the NB_Global.options column
>>>>> with a timestamp for its internal health checks
>>>>> and this results in unnecessary recomputes.  After running the
>>>>> kube-burner density-cni tests, I figured out the most common
>>>>> changes done in the NB DB and added the I-P for those.  With all
>>>>> these patches applied, there are no recomputes in either the northd
>>>>> or lflow engine nodes during the test duration.  Not having the I-P
>>>>> for NB_Global, for example, was resulting in significant ovn-northd
>>>>> CPU usage.
>>>>>
>>>>> Thanks again for the reviews.
>>>>>
>>>>> Numan
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Han
>>>>>>
>>>>>>>
>>>>>>> In order to add this support, some of the northd engine
>>>>>>> node data (from struct ovn_datapath) is split and moved over to
>>>>>>> new engine nodes - mainly related to load balancers, NAT and ACLs.
>>>>>>>
>>>>>>> Below are the scale testing results done with these patches applied
>>>>>>> using ovn-heater.  The test ran the scenario
>>>>>>> ocp-500-density-heavy.yml [1].
>>>>>>>
>>>>>>> With all the lflow I-P patches applied, the results are:
>>>>>>>
>>>>>>> ----------------------------------------------------------------------------------------------------------------------
>>>>>>>                       Min (s)    Median (s)  90%ile (s)  99%ile (s)  Max (s)    Mean (s)   Total (s)    Count  Failed
>>>>>>> ----------------------------------------------------------------------------------------------------------------------
>>>>>>> Iteration Total       0.136883   1.129016    1.192001    1.204167    1.212728   0.665017   83.127099    125    0
>>>>>>> Namespace.add_ports   0.005216   0.005736    0.007034    0.015486    0.018978   0.006211   0.776373     125    0
>>>>>>> WorkerNode.bind_port  0.035030   0.046082    0.052469    0.058293    0.060311   0.045973   11.493259    250    0
>>>>>>> WorkerNode.ping_port  0.005057   0.006727    1.047692    1.069253    1.071336   0.266896   66.724094    250    0
>>>>>>> ----------------------------------------------------------------------------------------------------------------------
>>>>>>>
>>>>>>> The results with the present main are:
>>>>>>>
>>>>>>> ----------------------------------------------------------------------------------------------------------------------
>>>>>>>                       Min (s)    Median (s)  90%ile (s)  99%ile (s)  Max (s)    Mean (s)   Total (s)    Count  Failed
>>>>>>> ----------------------------------------------------------------------------------------------------------------------
>>>>>>> Iteration Total       0.135491   2.223805    3.311270    3.339078    3.345346   1.729172   216.146495   125    0
>>>>>>> Namespace.add_ports   0.005380   0.005744    0.006819    0.018773    0.020800   0.006292   0.786532     125    0
>>>>>>> WorkerNode.bind_port  0.034179   0.046055    0.053488    0.058801    0.071043   0.046117   11.529311    250    0
>>>>>>> WorkerNode.ping_port  0.004956   0.006952    3.086952    3.191743    3.192807   0.791544   197.886026   250    0
>>>>>>> ----------------------------------------------------------------------------------------------------------------------
>>>>>>>
>>>>>>> [1] - https://github.com/ovn-org/ovn-heater/blob/main/test-scenarios/ocp-500-density-heavy.yml
>>>>>>>
>>>>>>> v1 -> v2
>>>>>>> --------
>>>>>>>   * Now also maintaining array indexes for the ls_lbacls, lr_nat and
>>>>>>>     lr_lb_nat_data tables (similar to ovn_datapaths->array) to
>>>>>>>     make the lookup efficient.  The same ovn_datapath->index
>>>>>>>     is reused.
>>>>>>>
>>>>>>>   * Made some significant changes to 'struct lflow_ref' in
>>>>>>>     lflow-mgr.c.
>>>>>>>     In v2 we don't use objdep_mgr to maintain the resource-to-lflow
>>>>>>>     references.  Instead we maintain the 'struct lflow' pointer.
>>>>>>>     With this we don't need to maintain an additional hmap of
>>>>>>>     lflows.
>>>>>>>
>>>>>>> Numan Siddique (18):
>>>>>>>   northd: Refactor the northd change tracking.
>>>>>>>   northd: Track ovn_datapaths in northd engine track data.
>>>>>>>   tests: Add a couple of tests in ovn-northd for I-P.
>>>>>>>   northd: Move router ports SB PB options sync to sync_to_sb_pb node.
>>>>>>>   northd: Add a new engine 'lr-nat' to manage lr NAT data.
>>>>>>>   northd: Add a new engine 'lr-lb-nat-data' to manage routers' lb
>>>>>>>     data.
>>>>>>>   northd: Generate logical router's LB and NAT flows using
>>>>>>>     lr_lbnat_data.
>>>>>>>   northd: Don't commit dhcp response flows in the conntrack.
>>>>>>>   northd: Add a new node ls_lbacls.
>>>>>>>   northd: Refactor lflow management into a separate module.
>>>>>>>   northd: Use lflow_ref when adding all logical flows.
>>>>>>>   northd: Move ovn_lb_datapaths from lib to northd module.
>>>>>>>   northd: Handle lb changes in lflow engine.
>>>>>>>   northd: Add lr_lb_nat_data handler for lflow engine node.
>>>>>>>   northd: Add ls_lbacls handler for lflow engine node.
>>>>>>>   northd: Add I-P for NB_Global and SB_Global.
>>>>>>>   northd: Add a noop handler for northd SB mac binding.
>>>>>>>   northd: Add northd change handler for sync_to_sb_lb node.
>>>>>>>
>>>>>>>  lib/lb.c                   |   96 -
>>>>>>>  lib/lb.h                   |   57 -
>>>>>>>  lib/ovn-util.c             |   17 +-
>>>>>>>  lib/ovn-util.h             |    2 +-
>>>>>>>  lib/stopwatch-names.h      |    3 +
>>>>>>>  northd/aging.c             |   21 +-
>>>>>>>  northd/automake.mk         |   12 +-
>>>>>>>  northd/en-global-config.c  |  588 ++++
>>>>>>>  northd/en-global-config.h  |   65 +
>>>>>>>  northd/en-lflow.c          |  123 +-
>>>>>>>  northd/en-lflow.h          |    8 +
>>>>>>>  northd/en-lr-lb-nat-data.c |  685 ++++
>>>>>>>  northd/en-lr-lb-nat-data.h |  125 +
>>>>>>>  northd/en-lr-nat.c         |  498 +++
>>>>>>>  northd/en-lr-nat.h         |  137 +
>>>>>>>  northd/en-ls-lb-acls.c     |  530 +++
>>>>>>>  northd/en-ls-lb-acls.h     |  111 +
>>>>>>>  northd/en-northd.c         |   67 +-
>>>>>>>  northd/en-northd.h         |    2 +-
>>>>>>>  northd/en-port-group.h     |    3 +
>>>>>>>  northd/en-sync-sb.c        |  513 ++-
>>>>>>>  northd/inc-proc-northd.c   |   79 +-
>>>>>>>  northd/lflow-mgr.c         | 1078 ++++++
>>>>>>>  northd/lflow-mgr.h         |  192 ++
>>>>>>>  northd/northd.c            | 6485 ++++++++++++++++--------------------
>>>>>>>  northd/northd.h            |  551 ++-
>>>>>>>  northd/ovn-northd.c        |    4 +
>>>>>>>  tests/ovn-northd.at        |  858 ++++-
>>>>>>>  28 files changed, 8827 insertions(+), 4083 deletions(-)
>>>>>>>  create mode 100644 northd/en-global-config.c
>>>>>>>  create mode 100644 northd/en-global-config.h
>>>>>>>  create mode 100644 northd/en-lr-lb-nat-data.c
>>>>>>>  create mode 100644 northd/en-lr-lb-nat-data.h
>>>>>>>  create mode 100644 northd/en-lr-nat.c
>>>>>>>  create mode 100644 northd/en-lr-nat.h
>>>>>>>  create mode 100644 northd/en-ls-lb-acls.c
>>>>>>>  create mode 100644 northd/en-ls-lb-acls.h
>>>>>>>  create mode 100644 northd/lflow-mgr.c
>>>>>>>  create mode 100644 northd/lflow-mgr.h
>>>>>>>
>>>>>>> --
>>>>>>> 2.41.0
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> dev mailing list
>>>>>>> [email protected]
>>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
