On 11/16/23 22:05, Numan Siddique wrote:
> On Thu, Nov 16, 2023 at 2:54 PM Han Zhou <[email protected]> wrote:
>>
>> On Wed, Nov 15, 2023 at 7:32 PM Numan Siddique <[email protected]> wrote:
>>>
>>> On Wed, Nov 15, 2023 at 2:59 AM Han Zhou <[email protected]> wrote:
>>>>
>>>> On Thu, Oct 26, 2023 at 11:12 AM <[email protected]> wrote:
>>>>>
>>>>> From: Numan Siddique <[email protected]>
>>>>>
>>>>> This patch series adds incremental processing in the lflow engine
>>>>> node to handle changes to northd and other engine nodes.  Changes
>>>>> related to load balancers and NAT are mainly handled in this
>>>>> patch series.
>>>>>
>>>>> This patch series can also be found here -
>>>>> https://github.com/numansiddique/ovn/tree/northd_lflow_ip/v1
>>>>
>>>> Thanks Numan for the great improvement!
>>>
>>> Hi Han,
>>>
>>> Thanks for the review comments.  I understand it is hard to review
>>> so many patches, especially related to I-P.  I appreciate the time
>>> spent on it and the feedback.
>>>
>>>> I spent some time these days to review the series and am now at
>>>> patch 10.  I need to take a break, so I just sent the comments I
>>>> had so far.
>>>>
>>>> I also did scale testing for northd with
>>>> https://github.com/hzhou8/ovn-test-script for a 500 chassis,
>>>> 50 lsp / chassis setup, and noticed that the recompute latency has
>>>> increased 20% after the series.  I think in general it is expected
>>>> to have some performance penalty for the recompute case because of
>>>> the new indexes created for I-P.  However, the patch 10 "northd:
>>>> Refactor lflow management into a separate module." alone introduces
>>>> a 10% latency increase, which necessitates more investigation.
>>>
>>> Before sending out the series I did some testing on recomputes with
>>> a large OVN NB DB and SB DB (from a 500 node ovn-heater density
>>> heavy run).  I'm aware of the increase in recompute time, and I did
>>> some more testing today as well.
>>>
>>> In my testing, I can see that the increase in latency is due to the
>>> new engine nodes added (lr_lbnat mainly) and due to housekeeping of
>>> the lflow references.  I do not see any increase due to the new
>>> lflow-mgr.c added in patch 10.
>>>
>>> I compared patch 9 and patch 10 separately and there is no
>>> difference in lflow engine node recompute time between patch 9
>>> and 10.
>>>
>>
>> My results were different.  My test profile simulates the ovn-k8s
>> topology (central mode, not IC) with 500 nodes, 50 LSP/node, no LBs,
>> and a small number of ACLs and PGs.
>> (https://github.com/hzhou8/ovn-test-script/blob/main/args.500ch_50lsp_10000pg)
>>
>> The test results for ovn-northd recompute time are:
>> main: 1118.3
>> p9:   1129.5
>> p10:  1243.4  ==> 10% more than p9
>> p18:  1357.6
>>
>> I am not sure if the p10 increase is related to the hash change or
>> something else.
>>
I didn't review p10 in detail yet, but I did run some tests (with gcc
and CFLAGS="-O2 -fno-omit-frame-pointer -fno-common -g") and I see no
significant difference between p9 and p10 when running Han's scale test
profile.

Then I also tried the same thing using the 500 nodes + 50 LSP/node perf
test we already have in the repo:

https://github.com/ovn-org/ovn/blob/5ef2a08ade01d698f84e197987ea679d8978d2b9/tests/perf-northd.at#L185

Again, I didn't see any significant change between p9 and p10 on my
laptop.  I wonder what's different in our setups.

Kind of related, I posted a series to improve the in-tree perf testing
and allow us to also gather recompute stats (build the DB, reset
stopwatches and trigger a recompute, then collect stopwatches):

https://patchwork.ozlabs.org/project/ovn/list/?series=383727&state=*

Would it be an idea to try to merge both scenarios you guys used into
something that's defined as a new test in-tree?  That way it's easier
for anyone to run the same test.

Thanks,
Dumitru

>>> Below are the results with the time taken by the mentioned engine
>>> nodes, in msec, for a recompute for some of the individual patches
>>> in the series.
>>>
>>> The sample OVN DBs have:
>>>
>>> ------------------------------------
>>> Resource               Total
>>> ------------------------------------
>>> Logical switches       1001
>>> Logical routers        501
>>> Router ports           1501
>>> Switch ports           29501
>>> Load balancers         35253
>>> Load balancer groups   1
>>> SB Logical flows       268349
>>> SB DB groups           509
>>> ------------------------------------
>>>
>>> There is one load balancer group which has all the load balancers
>>> and it is associated with all the logical switches and routers.
>>>
>>> Below is the time taken by each engine node, in msec:
>>>
>>> ------------------------------------------------------------------
>>> Engine nodes     | lb_data | northd | lr_lbnat | lflow |
>>> ------------------------------------------------------------------
>>> ovn-northd-main  |   358   |  2455  |    x     |  2082 |
>>> ovn-northd-p1    |   373   |  2476  |    x     |  2170 |
>>> ovn-northd-p5    |   367   |  2413  |    x     |  2195 |
>>> ovn-northd-p6    |   354   |   688  |   1815   |  2442 |
>>> ovn-northd-p7    |   357   |   683  |   1778   |  2806 |
>>> ovn-northd-p9    |   352   |   682  |   1781   |  2781 |
>>> ovn-northd-p10   |   365   |   838  |   1707   |  2812 |
>>> ovn-northd-p13   |   362   |  1066  |   1882   |  2917 |
>>> ovn-northd-p15   |   359   |  1046  |   1688   |  2907 |
>>> ovn-northd-p18   |   379   |  1066  |   1729   |  2886 |
>>> ------------------------------------------------------------------
>>>
>>> ovn-northd-p1 means ovn-northd with patch 1 of this series applied,
>>> ovn-northd-p2 means patch 2 of this series, and so on.
>>>
>>> Patch 6 adds a new engine node lr_lbnat, and that's why the northd
>>> engine node time drops from ~2413 msec to 688.
>>>
>>> With the first 10 patches, the northd engine node's overall time
>>> increases by about 200 msec compared to "main", and the lflow
>>> engine node time increases by around 800 msec.
>>> The point of this data is to show that the overall increase is
>>> mainly due to bookkeeping and the new engine nodes.  I tried to
>>> optimise further but couldn't.
>>>
>>> IMO this 20% overall increase should be fine for the following
>>> reasons:
>>>   - The scale tests with ovn-heater show a significant decrease in
>>>     ovn-northd CPU usage and in the overall test duration.  This
>>>     means there were very few recomputes triggered.
>>>   - With the OCP/K8s kube-burner tests, there were 0 recomputes in
>>>     the northd and lflow engine nodes.
>>>
>>>>> Prior to this patch series, most of the changes to the northd
>>>>> engine resulted in full recomputation of logical flows.  This
>>>>> series aims to improve the performance of ovn-northd by adding
>>>>> I-P support.
>>>>
>>>> I'd like to clarify "most of the changes" a little.  I think we
>>>> should focus on the most impactful changes that happen in a cloud
>>>> environment.  I don't think it is realistic to achieve "most of
>>>> the changes" in I-P because it is too complex (even with this
>>>> series we still handle a very small part of the DB schema
>>>> incrementally), but it may be realistic to handle the most
>>>> impactful changes, which are the most frequent changes in the
>>>> cloud, incrementally.  Before this series, we could handle regular
>>>> VIF changes and related address-set and port-group updates (some
>>>> of them) incrementally.  Those are the most common changes related
>>>> to pod/VM changes in cloud.  I believe the next goal is LB changes
>>>> related to pods/VMs (i.e. the LB backends), which I believe is the
>>>> main goal of this series.  Is my understanding correct?
>>>
>>> That is correct.  LB change handling is very important in my
>>> opinion when used with ovn-kubernetes because all the k8s services
>>> are mapped to OVN LBs.
>>>
>>>> While reviewing the patches, I'd say sometimes I feel a little
>>>> lost because it is hard to correlate some of the changes with the
>>>> above goal.
>>>
>>> I understand that.  Maybe I've done a bad job of conveying why the
>>> changes are required for LBs.
>>>
>>> Let me try to clarify a bit here.  I'll incorporate and add more
>>> details in the patch commits and cover letter in the next series.
>>>
>>> This series mainly targets handling lflow generation for load
>>> balancer changes.  As you know, we use conntrack in OVN for:
>>>   - Load balancers (in both logical switches and routers)
>>>   - ACLs in logical switches
>>>   - NAT in logical routers
>>>
>>> And this makes it complicated to decouple logical flow generation
>>> for LBs from ACLs and NATs.  That is the reason I had to split the
>>> northd node data related to load balancers, ACLs and NATs into
>>> separate nodes.
>>>
>>> To give an example, we generate certain logical flows for a logical
>>> switch by checking "if (od->has_lb_vip && od->has_stateful_acl)"
>>> (see build_acl_hints()).  For logical routers, a NAT address can be
>>> an LB VIP too, which makes it difficult to decouple LB and NAT.  We
>>> could fall back to recompute for NAT changes, but we would still
>>> need to group logical flows related to LB and NAT.  And that's why
>>> I added the lr_lbnat_data engine node in patch 6.
>>>
>>
>> Thanks for explaining.  However, I wonder if these are related to LB
>> backend updates.  I agree that service creation/deletion may also be
>> important to handle incrementally, but it is less critical (less
>> frequent) than the backend updates, which are directly related to
>> pod/VM creation/deletion.
>
> From what I understand, at least for OCP use cases they seem to be
> critical.  Also, it depends on the use case, I suppose.
>
>> The has_xxx and NAT related handling are more related to service
>> creation/deletion rather than LB backend updates, right?
>
> I think so.
>
> Numan
>
>>
>> Thanks,
>> Han
>>
>>>> I believe there is a reason for all the changes, but I am not sure
>>>> if it is directly related to the goal of LB backend related I-P.
>>>> I understand that there are other aspects of LB that can be
>>>> incrementally processed.  But I'd recommend that, if the changes
>>>> necessary for this goal can be largely reduced, it would help the
>>>> review, and we might be able to merge them sooner and add more,
>>>> but less impactful, I-P later step by step.  I guess I will have a
>>>> clearer picture when I finish the rest of the patches, but it
>>>> would be helpful if you could add more guidance in this cover
>>>> letter.
>>>
>>> I'll definitely add more details in the cover letter.
>>>
>>> Apart from the LB related I-P, this series only adds 2 more
>>> additional I-P patches: one for NB_Global changes and the other for
>>> SB mac binding changes.  ovn-kubernetes periodically writes a
>>> timestamp into the NB_Global.options column for its internal health
>>> checks, and this results in unnecessary recomputes.  After running
>>> the kube-burner density-cni tests, I figured out the most common
>>> changes done in the NB DB and added I-P for those.  With all these
>>> patches applied, there are no recomputes in either the northd or
>>> the lflow engine nodes during the test duration.  Not having the
>>> I-P for NB_Global, for example, was resulting in significant
>>> ovn-northd CPU usage.
>>>
>>> Thanks again for the reviews.
>>>
>>> Numan
>>>
>>>> Thanks,
>>>> Han
>>>>
>>>>> In order to add this support, some of the northd engine node data
>>>>> (from struct ovn_datapath) is split and moved over to new engine
>>>>> nodes - mainly related to load balancers, NAT and ACLs.
>>>>>
>>>>> Below are the scale testing results done with these patches
>>>>> applied using ovn-heater.  The test ran the scenario -
>>>>> ocp-500-density-heavy.yml [1].
>>>>>
>>>>> With all the lflow I-P patches applied, the results are:
>>>>>
>>>>> ----------------------------------------------------------------------------------------------------------------
>>>>>                       Min (s)   Median (s)  90%ile (s)  99%ile (s)  Max (s)   Mean (s)  Total (s)   Count  Failed
>>>>> ----------------------------------------------------------------------------------------------------------------
>>>>> Iteration Total       0.136883  1.129016    1.192001    1.204167    1.212728  0.665017  83.127099   125    0
>>>>> Namespace.add_ports   0.005216  0.005736    0.007034    0.015486    0.018978  0.006211  0.776373    125    0
>>>>> WorkerNode.bind_port  0.035030  0.046082    0.052469    0.058293    0.060311  0.045973  11.493259   250    0
>>>>> WorkerNode.ping_port  0.005057  0.006727    1.047692    1.069253    1.071336  0.266896  66.724094   250    0
>>>>> ----------------------------------------------------------------------------------------------------------------
>>>>>
>>>>> The results with the present main are:
>>>>>
>>>>> ----------------------------------------------------------------------------------------------------------------
>>>>>                       Min (s)   Median (s)  90%ile (s)  99%ile (s)  Max (s)   Mean (s)  Total (s)   Count  Failed
>>>>> ----------------------------------------------------------------------------------------------------------------
>>>>> Iteration Total       0.135491  2.223805    3.311270    3.339078    3.345346  1.729172  216.146495  125    0
>>>>> Namespace.add_ports   0.005380  0.005744    0.006819    0.018773    0.020800  0.006292  0.786532    125    0
>>>>> WorkerNode.bind_port  0.034179  0.046055    0.053488    0.058801    0.071043  0.046117  11.529311   250    0
>>>>> WorkerNode.ping_port  0.004956  0.006952    3.086952    3.191743    3.192807  0.791544  197.886026  250    0
>>>>> ----------------------------------------------------------------------------------------------------------------
>>>>>
>>>>> [1] - https://github.com/ovn-org/ovn-heater/blob/main/test-scenarios/ocp-500-density-heavy.yml
>>>>>
>>>>> v1 -> v2
>>>>> --------
>>>>>   * Now also maintaining array indexes for the ls_lbacls, lr_nat
>>>>>     and lr_lb_nat_data tables (similar to ovn_datapaths->array)
>>>>>     to make the lookup efficient.  The same ovn_datapath->index
>>>>>     is reused.
>>>>>
>>>>>   * Made some significant changes to 'struct lflow_ref' in
>>>>>     lflow-mgr.c.  In v2 we don't use objdep_mgr to maintain the
>>>>>     resource to lflow references.  Instead we maintain the
>>>>>     'struct lflow' pointer.  With this we don't need to maintain
>>>>>     an additional hmap of lflows.
>>>>>
>>>>> Numan Siddique (18):
>>>>>   northd: Refactor the northd change tracking.
>>>>>   northd: Track ovn_datapaths in northd engine track data.
>>>>>   tests: Add a couple of tests in ovn-northd for I-P.
>>>>>   northd: Move router ports SB PB options sync to sync_to_sb_pb node.
>>>>>   northd: Add a new engine 'lr-nat' to manage lr NAT data.
>>>>>   northd: Add a new engine 'lr-lb-nat-data' to manage routers' lb data.
>>>>>   northd: Generate logical router's LB and NAT flows using lr_lbnat_data.
>>>>>   northd: Don't commit dhcp response flows in the conntrack.
>>>>>   northd: Add a new node ls_lbacls.
>>>>>   northd: Refactor lflow management into a separate module.
>>>>>   northd: Use lflow_ref when adding all logical flows.
>>>>>   northd: Move ovn_lb_datapaths from lib to northd module.
>>>>>   northd: Handle lb changes in lflow engine.
>>>>>   northd: Add lr_lb_nat_data handler for lflow engine node.
>>>>>   northd: Add ls_lbacls handler for lflow engine node.
>>>>>   northd: Add I-P for NB_Global and SB_Global.
>>>>>   northd: Add a noop handler for northd SB mac binding.
>>>>>   northd: Add northd change handler for sync_to_sb_lb node.
>>>>>
>>>>>  lib/lb.c                   |   96 -
>>>>>  lib/lb.h                   |   57 -
>>>>>  lib/ovn-util.c             |   17 +-
>>>>>  lib/ovn-util.h             |    2 +-
>>>>>  lib/stopwatch-names.h      |    3 +
>>>>>  northd/aging.c             |   21 +-
>>>>>  northd/automake.mk         |   12 +-
>>>>>  northd/en-global-config.c  |  588 ++++
>>>>>  northd/en-global-config.h  |   65 +
>>>>>  northd/en-lflow.c          |  123 +-
>>>>>  northd/en-lflow.h          |    8 +
>>>>>  northd/en-lr-lb-nat-data.c |  685 ++++
>>>>>  northd/en-lr-lb-nat-data.h |  125 +
>>>>>  northd/en-lr-nat.c         |  498 +++
>>>>>  northd/en-lr-nat.h         |  137 +
>>>>>  northd/en-ls-lb-acls.c     |  530 +++
>>>>>  northd/en-ls-lb-acls.h     |  111 +
>>>>>  northd/en-northd.c         |   67 +-
>>>>>  northd/en-northd.h         |    2 +-
>>>>>  northd/en-port-group.h     |    3 +
>>>>>  northd/en-sync-sb.c        |  513 ++-
>>>>>  northd/inc-proc-northd.c   |   79 +-
>>>>>  northd/lflow-mgr.c         | 1078 ++++++
>>>>>  northd/lflow-mgr.h         |  192 ++
>>>>>  northd/northd.c            | 6485 ++++++++++++++++--------------------
>>>>>  northd/northd.h            |  551 ++-
>>>>>  northd/ovn-northd.c        |    4 +
>>>>>  tests/ovn-northd.at        |  858 ++++-
>>>>>  28 files changed, 8827 insertions(+), 4083 deletions(-)
>>>>>  create mode 100644 northd/en-global-config.c
>>>>>  create mode 100644 northd/en-global-config.h
>>>>>  create mode 100644 northd/en-lr-lb-nat-data.c
>>>>>  create mode 100644 northd/en-lr-lb-nat-data.h
>>>>>  create mode 100644 northd/en-lr-nat.c
>>>>>  create mode 100644 northd/en-lr-nat.h
>>>>>  create mode 100644 northd/en-ls-lb-acls.c
>>>>>  create mode 100644 northd/en-ls-lb-acls.h
>>>>>  create mode 100644 northd/lflow-mgr.c
>>>>>  create mode 100644 northd/lflow-mgr.h
>>>>>
>>>>> --
>>>>> 2.41.0
>>>>>
>>>>> _______________________________________________
>>>>> dev mailing list
>>>>> [email protected]
>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
