On 11/16/23 22:05, Numan Siddique wrote:
> On Thu, Nov 16, 2023 at 2:54 PM Han Zhou <[email protected]> wrote:
>>
>> On Wed, Nov 15, 2023 at 7:32 PM Numan Siddique <[email protected]> wrote:
>>>
>>> On Wed, Nov 15, 2023 at 2:59 AM Han Zhou <[email protected]> wrote:
>>>>
>>>> On Thu, Oct 26, 2023 at 11:12 AM <[email protected]> wrote:
>>>>>
>>>>> From: Numan Siddique <[email protected]>
>>>>>
>>>>> This patch series adds incremental processing in the lflow engine
>>>>> node to handle changes to northd and other engine nodes.
>>>>> Changes related to load balancers and NAT are mainly handled in
>>>>> this patch series.
>>>>>
>>>>> This patch series can also be found here -
>>>>> https://github.com/numansiddique/ovn/tree/northd_lflow_ip/v1
>>>>
>>>> Thanks Numan for the great improvement!
>>>
>>> Hi Han,
>>>
>>> Thanks for the review comments.  I understand it is hard to review
>>> so many patches, especially ones related to I-P.
>>> I appreciate the time spent on it and the feedback.
>>>
>>>>
>>>> I spent some time these days reviewing the series and am now at patch
>>>> 10. I need to take a break, so I am sending the comments I have so far.
>>>>
>>>> I also did scale testing for northd with
>>>> https://github.com/hzhou8/ovn-test-script for a 500 chassis, 50 lsp /
>>>> chassis setup, and noticed that the recompute latency has increased 20%
>>>> after the series. I think in general it is expected to have some
>>>> performance penalty for the recompute case because of the new indexes
>>>> created for I-P. However, patch 10 "northd: Refactor lflow management
>>>> into a separate module." alone introduces a 10% latency increase, which
>>>> necessitates more investigation.
>>>
>>> Before sending out the series I did some testing on recomputes with a
>>> large OVN NB DB and SB DB
>>> (from a 500 node ovn-heater density-heavy run), so I'm aware of the
>>> increase in recompute latency.  I did some more testing today as well.
>>>
>>> In my testing, I can see that the increase in latency is due to the
>>> new engine nodes added (lr_lbnat mainly)
>>> and due to the housekeeping of lflow references.  I do not see any
>>> increase due to the new lflow-mgr.c added in patch 10.
>>>
>>> I compared patch 9 and patch 10 separately and there is no difference
>>> in lflow engine node recompute time between patch 9 and 10.
>>>
>>
>> My results were different. My test profile simulates the ovn-k8s topology
>> (central mode, not IC) with 500 nodes, 50 LSP/node, no LBs, and a small
>> number of ACLs and PGs:
>> https://github.com/hzhou8/ovn-test-script/blob/main/args.500ch_50lsp_10000pg
>>
>> The test results for ovn-northd recompute time are:
>> main: 1118.3
>> p9: 1129.5
>> p10: 1243.4 ==> 10% more than p9
>> p18: 1357.6
>>
>> I am not sure if the p10 increase is related to the hash change or
>> something else.
>>

I haven't reviewed p10 in detail yet, but I did run some tests (with gcc
and CFLAGS="-O2 -fno-omit-frame-pointer -fno-common -g") and I see no
significant difference between p9 and p10 when running Han's scale test
profile.

Then I also tried the same thing using the 500 nodes + 50 LSP/node perf
test we already have in the repo:

https://github.com/ovn-org/ovn/blob/5ef2a08ade01d698f84e197987ea679d8978d2b9/tests/perf-northd.at#L185

Again, I didn't see any significant change between p9 and p10 on my
laptop.  I wonder what's different in our setups.

Kind of related, I posted a series to improve the in-tree perf testing
and allow us to also gather recompute stats (build the DB, reset
stopwatches and trigger a recompute, then collect stopwatches):

https://patchwork.ozlabs.org/project/ovn/list/?series=383727&state=*

Would it be an idea to merge the two scenarios you used into something
that's defined as a new in-tree test?  That way it's easier for anyone
to just run the same test.

Thanks,
Dumitru

>>> Below are the results with the time taken for the mentioned engine
>>> nodes in msecs for a recompute for some of the individual patches in
>>> the series.
>>>
>>>
>>> The sample OVN DBs have
>>>
>>> ------------------------------------
>>> Resource                  Total
>>> ------------------------------------
>>> Logical switches          1001
>>> Logical routers           501
>>> Router ports              1501
>>> Switch ports              29501
>>> Load balancers            35253
>>> Load balancer groups      1
>>> SB Logical flows          268349
>>> SB DB groups              509
>>> ------------------------------------
>>>
>>> There is one load balancer group which has all the load balancers and
>>> it is associated with all the logical switches and routers.
>>>
>>> Below is the time taken by each engine node in msec:
>>>
>>> ----------------------------------------------------------
>>> Engine nodes     | lb_data | northd | lr_lbnat | lflow |
>>> ----------------------------------------------------------
>>> ovn-northd-main  | 358     | 2455   | x        | 2082  |
>>> ovn-northd-p1    | 373     | 2476   | x        | 2170  |
>>> ovn-northd-p5    | 367     | 2413   | x        | 2195  |
>>> ovn-northd-p6    | 354     | 688    | 1815     | 2442  |
>>> ovn-northd-p7    | 357     | 683    | 1778     | 2806  |
>>> ovn-northd-p9    | 352     | 682    | 1781     | 2781  |
>>> ovn-northd-p10   | 365     | 838    | 1707     | 2812  |
>>> ovn-northd-p13   | 362     | 1066   | 1882     | 2917  |
>>> ovn-northd-p15   | 359     | 1046   | 1688     | 2907  |
>>> ovn-northd-p18   | 379     | 1066   | 1729     | 2886  |
>>> ----------------------------------------------------------
>>>
>>> ovn-northd-p1 means ovn-northd with patch 1 of this series applied,
>>> ovn-northd-p2 with patch 2, and so on.
>>>
>>> Patch 6 adds a new engine node, lr_lbnat, and that's why the northd
>>> engine node time drops from ~2413 msec to ~688 msec.
>>>
>>> With the first 10 patches, the northd engine node's overall time
>>> increases by ~200 msec compared to "main", and the lflow engine node
>>> time increases by around 800 msec.
>>> The point of this data is to show that the overall increase is mainly
>>> due to bookkeeping and the new engine nodes.  I tried to optimise
>>> further but couldn't.
>>>
>>>
>>> IMO this 20% overall increase should be fine for the following reasons:
>>>    -  The scale tests with ovn-heater show a significant decrease in
>>> ovn-northd CPU usage and in the overall test duration.
>>>    -  This means there were very few recomputes triggered.
>>>    -  With the OCP/K8s kube-burner tests, there were 0 recomputes
>>> in the northd and lflow engine nodes.
>>>
>>>
>>>
>>>>
>>>>>
>>>>> Prior to this patch series, most of the changes to northd engine
>>>>> resulted in full recomputation of logical flows.  This series
>>>>> aims to improve the performance of ovn-northd by adding the I-P
>>>>> support.
>>>>
>>>> I'd like to clarify "most of the changes" a little. I think we should
>>>> focus on the most impactful changes that happen in a cloud environment.
>>>> I don't think it is realistic to achieve "most of the changes" in I-P
>>>> because it is too complex (even with this series we still handle a very
>>>> small part of the DB schema incrementally), but it may be realistic to
>>>> handle the most impactful changes, which are the most frequent changes
>>>> in the cloud, incrementally. Before this series, we could handle
>>>> regular VIF changes and related address-set and (some) port-group
>>>> updates incrementally. Those are the most common changes related to
>>>> pod/VM changes in a cloud. I believe the next goal is for LB changes
>>>> related to pods/VMs (i.e. the LB backends), which I believe is the
>>>> main goal of this series. Is my understanding correct?
>>>>
>>> That is correct.  LB change handling is very important in my opinion
>>> when used with ovn-kubernetes because all the k8s services
>>> are mapped to OVN LBs.
>>>
>>>
>>>> While reviewing the patches, I'd say sometimes I feel a little lost
>>>> because it is hard to correlate some of the changes with the above goal.
>>>
>>> I understand that.  Maybe I've done a bad job in conveying why the
>>> changes are required for LBs.
>>>
>>> Let me try to clarify a bit here.  I'll incorporate more details into
>>> the patch commit messages and cover letter in the next version.
>>>
>>> This series mainly targets handling lflow generation for load
>>> balancer changes.
>>> As you know, we use conntrack in OVN for
>>>    - Load balancers (both in logical switches and routers)
>>>    - ACLs in logical switches
>>>    - NAT in logical routers.
>>>
>>> This makes it complicated to decouple logical flow generation for
>>> LBs from that for ACLs and NATs.  That is why I had to split the
>>> northd node data related to load balancers, ACLs and NATs into
>>> separate engine nodes.
>>>
>>> To give an example, we generate certain logical flows for a logical
>>> switch by checking "if (od->has_lb_vip && od->has_stateful_acl)" (see
>>> build_acl_hints()).
>>> For logical routers, a NAT address can be an LB VIP too, which makes
>>> it difficult to decouple LB and NAT.  We could fall back to recompute
>>> for NAT changes, but we would still need to group the logical flows
>>> related to LB and NAT together.  That's why I added the lr_lbnat_data
>>> engine node in patch 6.
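>>> As a rough sketch of that coupling (not the actual OVN code; only the
>>> has_lb_vip / has_stateful_acl check is taken from above, everything
>>> else is invented for illustration):

```c
/* Sketch illustrating why LB and ACL state are hard to decouple: the
 * decision to emit certain ACL "hint" flows reads flags owned by both
 * the LB processing and the ACL processing, so a change to either
 * input invalidates the same set of flows.  Only the flag names come
 * from the mail above; the rest is invented. */
#include <stdbool.h>

struct ovn_datapath {
    bool has_lb_vip;       /* Set while processing load balancers. */
    bool has_stateful_acl; /* Set while processing ACLs. */
};

/* True when ACL hint flows must be generated for this datapath.  An
 * incremental handler for an LB change therefore also has to know the
 * ACL state, and vice versa. */
bool
needs_acl_hint_flows(const struct ovn_datapath *od)
{
    return od->has_lb_vip && od->has_stateful_acl;
}
```

>>> Splitting this state into separate engine nodes makes that cross-
>>> feature dependency explicit in the I-P engine.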
>>>
>>
>> Thanks for explaining. However, I wonder if these are related to LB backend
>> updates. I agree that service creation/deletion may also be important to
>> handle incrementally, but it is less critical (less frequent) than the
>> backend updates, which are directly related to pod/VM creation/deletion.
> 
> From what I understand, at least for OCP use cases they seem to be critical.
> Also it depends on the use case I suppose.
> 
> 
>> The has_xxx and NAT-related handling are more related to service
>> creation/deletion than to LB backend updates, right?
> 
> I think so.
> 
> Numan
> 
>>
>> Thanks,
>> Han
>>
>>>> I believe there is a reason for all the changes, but I am not sure if
>>>> it is directly related to the goal of LB backend related I-P. I
>>>> understand that there are other aspects of LB that can be processed
>>>> incrementally. But if the changes necessary for this goal can be
>>>> largely reduced, it would help the review, and we might be able to
>>>> merge them sooner and add more, less impactful I-P later, step by
>>>> step. I guess I will have a clearer picture when I finish the rest of
>>>> the patches, but it would be helpful if you could add more guidance
>>>> in this cover letter.
>>>
>>> I'll definitely add more details in the cover letter.
>>>
>>> Apart from the LB related I-P, this series adds only 2 additional
>>> I-P patches.
>>> One is for NB_Global changes and the other for SB mac binding changes.
>>> ovn-kubernetes periodically writes a timestamp into the
>>> NB_Global.options column for its internal health checks,
>>> and this results in unnecessary recomputes.  After running the
>>> kube-burner density-cni tests, I identified the most common
>>> changes done in the NB DB and added I-P for those.  With all these
>>> patches applied, there are no recomputes in either the northd or
>>> lflow engine nodes
>>> during the test run.  Not having the I-P for NB_Global, for
>>> example, was resulting in significant ovn-northd CPU usage.
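>>> The shape of such a change handler can be sketched roughly as below.
>>> All names, including the option key, are invented for illustration;
>>> the real handler lives in en-global-config.c and diffs the actual
>>> NB_Global columns:

```c
/* Sketch of an incremental change handler for NB_Global options: if
 * every changed option is a known-noisy key (e.g. a periodically
 * rewritten health-check timestamp), the engine node can report the
 * input as handled instead of falling back to a full recompute. */
#include <stdbool.h>
#include <string.h>

/* Simplified stand-in for OVS's smap: one option as a key/value pair. */
struct opt {
    const char *key;
    const char *value;
};

/* Keys whose updates do not affect generated flows (hypothetical key
 * name; the real one is not shown in this mail). */
bool
is_noisy_key(const char *key)
{
    return !strcmp(key, "ovn-k8s-health-check-ts");
}

/* Returns true when every changed option is noisy, i.e. no flow
 * recomputation is needed for this update. */
bool
nb_global_change_handled(const struct opt *old_opts,
                         const struct opt *new_opts, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(old_opts[i].value, new_opts[i].value)
            && !is_noisy_key(old_opts[i].key)) {
            return false;   /* Substantive change: recompute needed. */
        }
    }
    return true;
}
```

>>> With a handler like this in place, timestamp-only updates stop
>>> triggering recomputes, which matches the kube-burner observation
>>> above.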
>>>
>>> Thanks again for the reviews.
>>>
>>> Numan
>>>
>>>>
>>>> Thanks,
>>>> Han
>>>>
>>>>>  In order to add this support, some of the northd engine
>>>>> node data (from struct ovn_datapath) is split and moved over to
>>>>> new engine nodes - mainly related to load balancers, NAT and ACLs.
>>>>>
>>>>> Below are the scale testing results with these patches applied,
>>>>> using ovn-heater.  The test ran the
>>>>> ocp-500-density-heavy.yml scenario [1].
>>>>>
>>>>> With all the lflow I-P patches applied, the results are:
>>>>>
>>>>> -----------------------------------------------------------------------------------------------------------------
>>>>>                       Min (s)     Median (s)  90%ile (s)  99%ile (s)  Max (s)     Mean (s)    Total (s)    Count  Failed
>>>>> -----------------------------------------------------------------------------------------------------------------
>>>>> Iteration Total       0.136883    1.129016    1.192001    1.204167    1.212728    0.665017    83.127099    125    0
>>>>> Namespace.add_ports   0.005216    0.005736    0.007034    0.015486    0.018978    0.006211    0.776373     125    0
>>>>> WorkerNode.bind_port  0.035030    0.046082    0.052469    0.058293    0.060311    0.045973    11.493259    250    0
>>>>> WorkerNode.ping_port  0.005057    0.006727    1.047692    1.069253    1.071336    0.266896    66.724094    250    0
>>>>> -----------------------------------------------------------------------------------------------------------------
>>>>>
>>>>> The results with the current main branch are:
>>>>>
>>>>> -----------------------------------------------------------------------------------------------------------------
>>>>>                       Min (s)     Median (s)  90%ile (s)  99%ile (s)  Max (s)     Mean (s)    Total (s)    Count  Failed
>>>>> -----------------------------------------------------------------------------------------------------------------
>>>>> Iteration Total       0.135491    2.223805    3.311270    3.339078    3.345346    1.729172    216.146495   125    0
>>>>> Namespace.add_ports   0.005380    0.005744    0.006819    0.018773    0.020800    0.006292    0.786532     125    0
>>>>> WorkerNode.bind_port  0.034179    0.046055    0.053488    0.058801    0.071043    0.046117    11.529311    250    0
>>>>> WorkerNode.ping_port  0.004956    0.006952    3.086952    3.191743    3.192807    0.791544    197.886026   250    0
>>>>> -----------------------------------------------------------------------------------------------------------------
>>>>>
>>>>> [1] -
>>>>> https://github.com/ovn-org/ovn-heater/blob/main/test-scenarios/ocp-500-density-heavy.yml
>>>>>
>>>>> v1 -> v2
>>>>> --------
>>>>>    * Now also maintaining array indexes for the ls_lbacls, lr_nat and
>>>>>      lr_lb_nat_data tables (similar to ovn_datapaths->array) to
>>>>>      make lookups efficient.  The same ovn_datapath->index
>>>>>      is reused.
>>>>>
>>>>>    * Made some significant changes to 'struct lflow_ref' in
>>>>>      lflow-mgr.c.
>>>>>      In v2 we don't use objdep_mgr to maintain the resource-to-lflow
>>>>>      references.  Instead we maintain the 'struct lflow' pointer
>>>>>      directly.  With this we don't need to maintain an additional
>>>>>      hmap of lflows.
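>>>>> A minimal sketch of that design, with simplified, invented types
>>>>> (the real 'struct lflow_ref' in northd/lflow-mgr.c differs in
>>>>> detail):

```c
/* Sketch of the v2 'struct lflow_ref' idea: an object (e.g. a load
 * balancer) keeps direct pointers to the lflows it generated, so
 * invalidating them on a change is a plain list walk, with no
 * auxiliary hmap of lflows to consult.  All types here are invented
 * stand-ins. */
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct lflow {
    bool deleted;                   /* Placeholder for real contents. */
};

struct lflow_ref_node {
    struct lflow *lflow;            /* Direct pointer, not a lookup key. */
    struct lflow_ref_node *next;
};

/* Per-object list of the lflows generated on its behalf. */
struct lflow_ref {
    struct lflow_ref_node *head;
    size_t n;
};

/* Record that 'lflow' was generated for the object owning 'ref'.
 * (Error handling omitted in this sketch.) */
void
lflow_ref_add(struct lflow_ref *ref, struct lflow *lflow)
{
    struct lflow_ref_node *node = malloc(sizeof *node);
    node->lflow = lflow;
    node->next = ref->head;
    ref->head = node;
    ref->n++;
}

/* On an incremental update, mark every referenced lflow deleted and
 * empty the list; no hmap lookup is needed to find them. */
void
lflow_ref_clear(struct lflow_ref *ref)
{
    struct lflow_ref_node *node = ref->head;
    while (node) {
        struct lflow_ref_node *next = node->next;
        node->lflow->deleted = true;
        free(node);
        node = next;
    }
    ref->head = NULL;
    ref->n = 0;
}
```

>>>>> Holding the pointers directly trades a little per-object list
>>>>> bookkeeping for avoiding a whole extra hmap, which is the change
>>>>> the bullet above describes.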
>>>>>
>>>>>
>>>>> Numan Siddique (18):
>>>>>   northd: Refactor the northd change tracking.
>>>>>   northd: Track ovn_datapaths in northd engine track data.
>>>>>   tests: Add a couple of tests in ovn-northd for I-P.
>>>>>   northd: Move router ports SB PB options sync to sync_to_sb_pb node.
>>>>>   northd: Add a new engine 'lr-nat' to manage lr NAT data.
>>>>>   northd: Add a new engine 'lr-lb-nat-data' to manage routers' lb
>>>>>     data.
>>>>>   northd:  Generate logical router's LB and NAT flows using
>>>>>     lr_lbnat_data.
>>>>>   northd: Don't commit dhcp response flows in the conntrack.
>>>>>   northd: Add a new node ls_lbacls.
>>>>>   northd: Refactor lflow management into a separate module.
>>>>>   northd: Use lflow_ref when adding all logical flows.
>>>>>   northd:  Move ovn_lb_datapaths from lib to northd module.
>>>>>   northd: Handle lb changes in lflow engine.
>>>>>   northd: Add lr_lb_nat_data handler for lflow engine node.
>>>>>   northd: Add ls_lbacls handler for lflow engine node.
>>>>>   northd:  Add I-P for NB_Global and SB_Global.
>>>>>   northd: Add a noop handler for northd SB mac binding.
>>>>>   northd: Add northd change handler for sync_to_sb_lb node.
>>>>>
>>>>>  lib/lb.c                   |   96 -
>>>>>  lib/lb.h                   |   57 -
>>>>>  lib/ovn-util.c             |   17 +-
>>>>>  lib/ovn-util.h             |    2 +-
>>>>>  lib/stopwatch-names.h      |    3 +
>>>>>  northd/aging.c             |   21 +-
>>>>>  northd/automake.mk         |   12 +-
>>>>>  northd/en-global-config.c  |  588 ++++
>>>>>  northd/en-global-config.h  |   65 +
>>>>>  northd/en-lflow.c          |  123 +-
>>>>>  northd/en-lflow.h          |    8 +
>>>>>  northd/en-lr-lb-nat-data.c |  685 ++++
>>>>>  northd/en-lr-lb-nat-data.h |  125 +
>>>>>  northd/en-lr-nat.c         |  498 +++
>>>>>  northd/en-lr-nat.h         |  137 +
>>>>>  northd/en-ls-lb-acls.c     |  530 +++
>>>>>  northd/en-ls-lb-acls.h     |  111 +
>>>>>  northd/en-northd.c         |   67 +-
>>>>>  northd/en-northd.h         |    2 +-
>>>>>  northd/en-port-group.h     |    3 +
>>>>>  northd/en-sync-sb.c        |  513 ++-
>>>>>  northd/inc-proc-northd.c   |   79 +-
>>>>>  northd/lflow-mgr.c         | 1078 ++++++
>>>>>  northd/lflow-mgr.h         |  192 ++
>>>>>  northd/northd.c            | 6485 ++++++++++++++++--------------------
>>>>>  northd/northd.h            |  551 ++-
>>>>>  northd/ovn-northd.c        |    4 +
>>>>>  tests/ovn-northd.at        |  858 ++++-
>>>>>  28 files changed, 8827 insertions(+), 4083 deletions(-)
>>>>>  create mode 100644 northd/en-global-config.c
>>>>>  create mode 100644 northd/en-global-config.h
>>>>>  create mode 100644 northd/en-lr-lb-nat-data.c
>>>>>  create mode 100644 northd/en-lr-lb-nat-data.h
>>>>>  create mode 100644 northd/en-lr-nat.c
>>>>>  create mode 100644 northd/en-lr-nat.h
>>>>>  create mode 100644 northd/en-ls-lb-acls.c
>>>>>  create mode 100644 northd/en-ls-lb-acls.h
>>>>>  create mode 100644 northd/lflow-mgr.c
>>>>>  create mode 100644 northd/lflow-mgr.h
>>>>>
>>>>> --
>>>>> 2.41.0
>>>>>
>>>>> _______________________________________________
>>>>> dev mailing list
>>>>> [email protected]
>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
