On 11/24/23 16:41, Numan Siddique wrote:
> On Fri, Nov 24, 2023 at 10:38 AM Dumitru Ceara <[email protected]> wrote:
>>
>> On 11/16/23 22:05, Numan Siddique wrote:
>>> On Thu, Nov 16, 2023 at 2:54 PM Han Zhou <[email protected]> wrote:
>>>>
>>>> On Wed, Nov 15, 2023 at 7:32 PM Numan Siddique <[email protected]> wrote:
>>>>>
>>>>> On Wed, Nov 15, 2023 at 2:59 AM Han Zhou <[email protected]> wrote:
>>>>>>
>>>>>> On Thu, Oct 26, 2023 at 11:12 AM <[email protected]> wrote:
>>>>>>>
>>>>>>> From: Numan Siddique <[email protected]>
>>>>>>>
>>>>>>> This patch series adds incremental processing in the lflow engine
>>>>>>> node to handle changes to northd and other engine nodes.
>>>>>>> Changes related to load balancers and NAT are mainly handled in
>>>>>>> this patch series.
>>>>>>>
>>>>>>> This patch series can also be found here -
>>>>>>> https://github.com/numansiddique/ovn/tree/northd_lflow_ip/v1
>>>>>>
>>>>>> Thanks Numan for the great improvement!
>>>>>
>>>>> Hi Han,
>>>>>
>>>>> Thanks for the review comments.  I understand it is hard to review
>>>>> so many patches, especially related to I-P.
>>>>> I appreciate the time spent on it and the feedback.
>>>>>
>>>>>> I spent some time these days to review the series and am now at
>>>>>> patch 10.  I need to take a break, so I just sent the comments I
>>>>>> had so far.
>>>>>>
>>>>>> I also did scale testing for northd with
>>>>>> https://github.com/hzhou8/ovn-test-script for a 500 chassis, 50 lsp /
>>>>>> chassis setup, and noticed that the recompute latency has increased
>>>>>> 20% after the series.  I think in general it is expected to have some
>>>>>> performance penalty for the recompute case because of the new indexes
>>>>>> created for I-P.  However, patch 10 "northd: Refactor lflow
>>>>>> management into a separate module." alone introduces a 10% latency
>>>>>> increase, which necessitates more investigation.
>>>>>
>>>>> Before sending out the series I did some testing on recomputes with a
>>>>> large OVN NB DB and SB DB
>>>>> (from a 500 node ovn-heater density-heavy run).
>>>>> I'm aware of the increase in recomputes.  And I did some more testing
>>>>> today as well.
>>>>>
>>>>> In my testing, I can see that the increase in latency is due to the
>>>>> new engine nodes added (lr_lbnat mainly)
>>>>> and due to housekeeping of the lflow references.  I do not see any
>>>>> increase due to the new lflow-mgr.c added in patch 10.
>>>>>
>>>>> I compared patch 9 and patch 10 separately and there is no difference
>>>>> in lflow engine node recompute time between patch 9 and 10.
>>>>>
>>>>
>>>> My results were different.  My test profile simulates the ovn-k8s
>>>> topology (central mode, not IC) with 500 nodes, 50 LSP/node, with no
>>>> LB and a small amount of ACLs and PGs.
>>>> (https://github.com/hzhou8/ovn-test-script/blob/main/args.500ch_50lsp_10000pg)
>>>>
>>>> The test results for ovn-northd recompute time are:
>>>>   main: 1118.3
>>>>   p9:   1129.5
>>>>   p10:  1243.4  ==> 10% more than p9
>>>>   p18:  1357.6
>>>>
>>>> I am not sure if the p10 increase is related to the hash change or
>>>> something else.
>>>>
>>
>> I didn't review p10 in detail yet but I did run some tests and (with gcc
>> and CFLAGS="-O2 -fno-omit-frame-pointer -fno-common -g") I see no
>> significant difference between p9 and p10 when running Han's scale test
>> profile.
>>
>> Then I also tried the same thing using the 500 nodes + 50 LSP/node perf
>> test we already have in the repo:
>>
>> https://github.com/ovn-org/ovn/blob/5ef2a08ade01d698f84e197987ea679d8978d2b9/tests/perf-northd.at#L185
>>
>> Again, I didn't see any significant change between p9 and p10 on my
>> laptop.  I wonder what's different in our setups.
>>
>> Kind of related, I posted a series to improve the in-tree perf testing
>> and allow us to also gather recompute stats (build the DB, reset
>> stopwatches and trigger a recompute, then collect stopwatches):
>>
>> https://patchwork.ozlabs.org/project/ovn/list/?series=383727&state=*
>>
>> Would it be an idea to try to merge both scenarios you guys used into
>> something that's defined as a new test in-tree?  That way it's easier
>> for anyone to just run the same test.
>
> That sounds good to me.  I'll take a look at your patches soon.
>
> Thanks for testing them out.  Can you also please compare OVN main vs
> p10 of this series?
>
Sorry for the delay in replying; I ran some more tests, both with Han's
benchmark and with the in-tree check-perf 500 node + 50 LSP/node test.
I did this with debug logs disabled and ran them multiple times to
collect more relevant averages (the results did seem more stable).

This is what I measured (where "pX" == "patch X of this series" and
values are averages in milliseconds):

-------------------------------------------------------------------------------
| Benchmark | Recompute "main" | Recompute p9 | Recompute p10 | Recompute p16 |
-------------------------------------------------------------------------------
| Han's     | 1552             | 1560         | 1676          | 1704          |
-------------------------------------------------------------------------------
| in-tree   | 1146             | 1250         | 1281          | 1337          |
-------------------------------------------------------------------------------

So, in both cases, I see some increase in the time it takes to run full
recomputes.  However, IMO, a hit of 10-15% in the time it takes to
recompute might be acceptable as long as the most common cases that used
to trigger a recompute are now handled incrementally.

Regards,
Dumitru

> Thanks
> Numan
>
>>
>> Thanks,
>> Dumitru
>>
>>>>> Below are the results with the time taken for the mentioned engine
>>>>> nodes in msecs for a recompute for some of the individual patches in
>>>>> the series.
>>>>>
>>>>> The sample OVN DBs have:
>>>>>
>>>>> ----------------------------
>>>>> Resource             Total
>>>>> ----------------------------
>>>>> Logical switches     1001
>>>>> Logical routers      501
>>>>> Router ports         1501
>>>>> Switch ports         29501
>>>>> Load balancers       35253
>>>>> Load balancer group  1
>>>>> SB Logical flows     268349
>>>>> SB DB groups         509
>>>>> ----------------------------
>>>>>
>>>>> There is one load balancer group which has all the load balancers and
>>>>> it is associated with all the logical switches and routers.
>>>>>
>>>>> Below is the time taken for each engine node in msec:
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> Engine nodes    | lb_data | northd | lr_lbnat | lflow |
>>>>> ---------------------------------------------------------------------
>>>>> ovn-northd-main |   358   |  2455  |    x     | 2082  |
>>>>> ---------------------------------------------------------------------
>>>>> ovn-northd-p1   |   373   |  2476  |    x     | 2170  |
>>>>> ---------------------------------------------------------------------
>>>>> ovn-northd-p5   |   367   |  2413  |    x     | 2195  |
>>>>> ---------------------------------------------------------------------
>>>>> ovn-northd-p6   |   354   |   688  |   1815   | 2442  |
>>>>> ---------------------------------------------------------------------
>>>>> ovn-northd-p7   |   357   |   683  |   1778   | 2806  |
>>>>> ---------------------------------------------------------------------
>>>>> ovn-northd-p9   |   352   |   682  |   1781   | 2781  |
>>>>> ---------------------------------------------------------------------
>>>>> ovn-northd-p10  |   365   |   838  |   1707   | 2812  |
>>>>> ---------------------------------------------------------------------
>>>>> ovn-northd-p13  |   362   |  1066  |   1882   | 2917  |
>>>>> ---------------------------------------------------------------------
>>>>> ovn-northd-p15  |   359   |  1046  |   1688   | 2907  |
>>>>> ---------------------------------------------------------------------
>>>>> ovn-northd-p18  |   379   |  1066  |   1729   | 2886  |
>>>>> ---------------------------------------------------------------------
>>>>>
>>>>> ovn-northd-p1 means ovn-northd with patch 1 of this series,
>>>>> ovn-northd-p2 with patch 2 of this series, and so on.
>>>>>
>>>>> Patch 6 adds a new engine node lr_lbnat and that's why the northd
>>>>> engine node time gets reduced from ~2413 msec to 688.
>>>>>
>>>>> With the first 10 patches, the northd engine node's overall time
>>>>> increases by about 200 msec compared to "main" and the lflow engine
>>>>> node time increases by around 800 msec.
>>>>> The point of this data is to show that the overall increase is mainly
>>>>> due to bookkeeping and new engine nodes.  I tried to optimise further
>>>>> but I couldn't.
>>>>>
>>>>> IMO this 20% overall increase should be fine for the following
>>>>> reasons:
>>>>>   - The scale tests with ovn-heater show a significant decrease in
>>>>>     ovn-northd CPU usage and a decrease in the overall test duration.
>>>>>     This means there were very few recomputes triggered.
>>>>>   - With the OCP/K8s kube-burner tests, there were 0 recomputes
>>>>>     in the northd and lflow engine nodes.
>>>>>
>>>>>>>
>>>>>>> Prior to this patch series, most of the changes to the northd
>>>>>>> engine resulted in full recomputation of logical flows.  This
>>>>>>> series aims to improve the performance of ovn-northd by adding
>>>>>>> I-P support.
>>>>>>
>>>>>> I'd like to clarify "most of the changes" a little.  I think we
>>>>>> should focus on the most impactful changes that happen in a cloud
>>>>>> environment.
>>>>>> I don't think it is realistic to achieve "most of the changes" in
>>>>>> I-P because it is too complex (even with this series we still handle
>>>>>> a very small part of the DB schema incrementally), but it may be
>>>>>> realistic to handle the most impactful changes, which are the most
>>>>>> frequent changes in the cloud, incrementally.  Before this series,
>>>>>> we could handle regular VIF changes and related address-set and
>>>>>> (some) port-group updates incrementally.  Those are the most common
>>>>>> changes related to pod/VM changes in the cloud.  I believe the next
>>>>>> goal is for LB changes related to pod/VMs (i.e. the LB backends),
>>>>>> which I believe is the main goal of this series.  Is my
>>>>>> understanding correct?
>>>>>>
>>>>> That is correct.  LB change handling is very important in my opinion
>>>>> when used with ovn-kubernetes because all the k8s services
>>>>> are mapped to OVN LBs.
>>>>>
>>>>>> While reviewing the patches, I'd say sometimes I feel a little lost
>>>>>> because it is hard to correlate some of the changes with the above
>>>>>> goal.
>>>>>
>>>>> I understand that.  Maybe I've done a bad job of conveying why the
>>>>> changes are required for LBs.
>>>>>
>>>>> Let me try to clarify a bit here.  I'll incorporate and add more
>>>>> details in the patch commits and cover letter in the next series.
>>>>>
>>>>> This series mainly targets handling lflow generation for load
>>>>> balancer changes.  As you know, we use conntrack in OVN for:
>>>>>   - Load balancers (in both logical switches and routers)
>>>>>   - ACLs in logical switches
>>>>>   - NAT in logical routers
>>>>>
>>>>> This makes it complicated to decouple logical flow generation for
>>>>> LBs from ACLs and NATs.  That is the reason I had to split the northd
>>>>> node data related to load balancers, ACLs and NATs into separate
>>>>> nodes.
>>>>>
>>>>> To give an example, we generate certain logical flows for a logical
>>>>> switch by checking "if (od->has_lb_vip && od->has_stateful_acl)"
>>>>> (see build_acl_hints()).
>>>>> For logical routers, a NAT address can be an LB VIP too.  This makes
>>>>> it difficult to decouple LB and NAT.  We could fall back to recompute
>>>>> for NAT changes, but we still need to group logical flows related to
>>>>> LB and NAT.  And that's why I added the lr_lbnat_data engine node in
>>>>> patch 6.
>>>>>
>>>>
>>>> Thanks for explaining.  However, I wonder if these are related to LB
>>>> backend updates.  I agree that service creation/deletion may also be
>>>> important to handle incrementally, but they are less critical (less
>>>> frequent) than the backend updates, which are directly related to
>>>> pod/VM creation/deletion.
>>>
>>> From what I understand, at least for OCP use cases they seem to be
>>> critical.  Also it depends on the use case, I suppose.
>>>
>>>> The has_xxx and NAT related handling are more related to service
>>>> creation/deletion rather than LB backend updates, right?
>>>
>>> I think so.
>>>
>>> Numan
>>>
>>>> Thanks,
>>>> Han
>>>>
>>>>>> I believe there is a reason for all the changes but I am not sure if
>>>>>> it is directly related to the goal of LB backend related I-P.  I
>>>>>> understand that there are other aspects of LB that can be
>>>>>> incrementally processed.  But I'd recommend that if the changes
>>>>>> necessary for this goal can be largely reduced, it would help the
>>>>>> review and we might be able to merge them sooner and add more, but
>>>>>> less impactful, I-P later step by step.  I guess I will have a
>>>>>> clearer picture when I finish the rest of the patches, but it would
>>>>>> be helpful if you could add more guidance in this cover letter.
>>>>>
>>>>> I'll definitely add more details in the cover letter.
>>>>>
>>>>> Apart from the LB related I-P, this series only adds 2 more
>>>>> additional patches for I-P.
>>>>> One is for the NB_Global changes and the other for SB mac binding
>>>>> changes.
>>>>> ovn-kubernetes periodically writes to the NB_Global.options column
>>>>> with a timestamp for its internal health checks
>>>>> and this results in unnecessary recomputes.  After running the
>>>>> kube-burner density-cni tests, I figured out the most common
>>>>> changes done in the NB DB and added the I-P for those.  With all
>>>>> these patches applied, there are no recomputes in either the northd
>>>>> or lflow engine nodes during the test duration.  Not having the I-P
>>>>> for NB_Global, for example, was resulting in significant ovn-northd
>>>>> CPU usage.
>>>>>
>>>>> Thanks again for the reviews.
>>>>>
>>>>> Numan
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Han
>>>>>>
>>>>>>>
>>>>>>> In order to add this support, some of the northd engine
>>>>>>> node data (from struct ovn_datapath) is split and moved over to
>>>>>>> new engine nodes - mainly related to load balancers, NAT and ACLs.
>>>>>>>
>>>>>>> Below are the scale testing results done with these patches applied
>>>>>>> using ovn-heater.  The test ran the scenario
>>>>>>> ocp-500-density-heavy.yml [1].
>>>>>>>
>>>>>>> With all the lflow I-P patches applied, the results are:
>>>>>>>
>>>>>>> ----------------------------------------------------------------------------------------------------------------------
>>>>>>>                       Min (s)    Median (s)  90%ile (s)  99%ile (s)  Max (s)    Mean (s)   Total (s)    Count  Failed
>>>>>>> ----------------------------------------------------------------------------------------------------------------------
>>>>>>> Iteration Total       0.136883   1.129016    1.192001    1.204167    1.212728   0.665017   83.127099    125    0
>>>>>>> Namespace.add_ports   0.005216   0.005736    0.007034    0.015486    0.018978   0.006211   0.776373     125    0
>>>>>>> WorkerNode.bind_port  0.035030   0.046082    0.052469    0.058293    0.060311   0.045973   11.493259    250    0
>>>>>>> WorkerNode.ping_port  0.005057   0.006727    1.047692    1.069253    1.071336   0.266896   66.724094    250    0
>>>>>>> ----------------------------------------------------------------------------------------------------------------------
>>>>>>>
>>>>>>> The results with the present main are:
>>>>>>>
>>>>>>> ----------------------------------------------------------------------------------------------------------------------
>>>>>>>                       Min (s)    Median (s)  90%ile (s)  99%ile (s)  Max (s)    Mean (s)   Total (s)    Count  Failed
>>>>>>> ----------------------------------------------------------------------------------------------------------------------
>>>>>>> Iteration Total       0.135491   2.223805    3.311270    3.339078    3.345346   1.729172   216.146495   125    0
>>>>>>> Namespace.add_ports   0.005380   0.005744    0.006819    0.018773    0.020800   0.006292   0.786532     125    0
>>>>>>> WorkerNode.bind_port  0.034179   0.046055    0.053488    0.058801    0.071043   0.046117   11.529311    250    0
>>>>>>> WorkerNode.ping_port  0.004956   0.006952    3.086952    3.191743    3.192807   0.791544   197.886026   250    0
>>>>>>> ----------------------------------------------------------------------------------------------------------------------
>>>>>>>
>>>>>>> [1] - https://github.com/ovn-org/ovn-heater/blob/main/test-scenarios/ocp-500-density-heavy.yml
>>>>>>>
>>>>>>> v1 -> v2
>>>>>>> --------
>>>>>>>   * Now also maintaining array indexes for the ls_lbacls, lr_nat and
>>>>>>>     lr_lb_nat_data tables (similar to ovn_datapaths->array) to
>>>>>>>     make the lookup efficient.  The same ovn_datapath->index
>>>>>>>     is reused.
>>>>>>>
>>>>>>>   * Made some significant changes to 'struct lflow_ref' in
>>>>>>>     lflow-mgr.c.
>>>>>>>     In v2 we don't use objdep_mgr to maintain the resource-to-lflow
>>>>>>>     references.  Instead we maintain the 'struct lflow' pointer.
>>>>>>>     With this we don't need to maintain an additional hmap of
>>>>>>>     lflows.
>>>>>>>
>>>>>>> Numan Siddique (18):
>>>>>>>   northd: Refactor the northd change tracking.
>>>>>>>   northd: Track ovn_datapaths in northd engine track data.
>>>>>>>   tests: Add a couple of tests in ovn-northd for I-P.
>>>>>>>   northd: Move router ports SB PB options sync to sync_to_sb_pb node.
>>>>>>>   northd: Add a new engine 'lr-nat' to manage lr NAT data.
>>>>>>>   northd: Add a new engine 'lr-lb-nat-data' to manage routers' lb
>>>>>>>     data.
>>>>>>>   northd: Generate logical router's LB and NAT flows using
>>>>>>>     lr_lbnat_data.
>>>>>>>   northd: Don't commit dhcp response flows in the conntrack.
>>>>>>>   northd: Add a new node ls_lbacls.
>>>>>>>   northd: Refactor lflow management into a separate module.
>>>>>>>   northd: Use lflow_ref when adding all logical flows.
>>>>>>>   northd: Move ovn_lb_datapaths from lib to northd module.
>>>>>>>   northd: Handle lb changes in lflow engine.
>>>>>>>   northd: Add lr_lb_nat_data handler for lflow engine node.
>>>>>>>   northd: Add ls_lbacls handler for lflow engine node.
>>>>>>>   northd: Add I-P for NB_Global and SB_Global.
>>>>>>>   northd: Add a noop handler for northd SB mac binding.
>>>>>>>   northd: Add northd change handler for sync_to_sb_lb node.
>>>>>>>
>>>>>>>  lib/lb.c                   |   96 -
>>>>>>>  lib/lb.h                   |   57 -
>>>>>>>  lib/ovn-util.c             |   17 +-
>>>>>>>  lib/ovn-util.h             |    2 +-
>>>>>>>  lib/stopwatch-names.h      |    3 +
>>>>>>>  northd/aging.c             |   21 +-
>>>>>>>  northd/automake.mk         |   12 +-
>>>>>>>  northd/en-global-config.c  |  588 ++++
>>>>>>>  northd/en-global-config.h  |   65 +
>>>>>>>  northd/en-lflow.c          |  123 +-
>>>>>>>  northd/en-lflow.h          |    8 +
>>>>>>>  northd/en-lr-lb-nat-data.c |  685 ++++
>>>>>>>  northd/en-lr-lb-nat-data.h |  125 +
>>>>>>>  northd/en-lr-nat.c         |  498 +++
>>>>>>>  northd/en-lr-nat.h         |  137 +
>>>>>>>  northd/en-ls-lb-acls.c     |  530 +++
>>>>>>>  northd/en-ls-lb-acls.h     |  111 +
>>>>>>>  northd/en-northd.c         |   67 +-
>>>>>>>  northd/en-northd.h         |    2 +-
>>>>>>>  northd/en-port-group.h     |    3 +
>>>>>>>  northd/en-sync-sb.c        |  513 ++-
>>>>>>>  northd/inc-proc-northd.c   |   79 +-
>>>>>>>  northd/lflow-mgr.c         | 1078 ++++++
>>>>>>>  northd/lflow-mgr.h         |  192 ++
>>>>>>>  northd/northd.c            | 6485 ++++++++++++++++--------------------
>>>>>>>  northd/northd.h            |  551 ++-
>>>>>>>  northd/ovn-northd.c        |    4 +
>>>>>>>  tests/ovn-northd.at        |  858 ++++-
>>>>>>>  28 files changed, 8827 insertions(+), 4083 deletions(-)
>>>>>>>  create mode 100644 northd/en-global-config.c
>>>>>>>  create mode 100644 northd/en-global-config.h
>>>>>>>  create mode 100644 northd/en-lr-lb-nat-data.c
>>>>>>>  create mode 100644 northd/en-lr-lb-nat-data.h
>>>>>>>  create mode 100644 northd/en-lr-nat.c
>>>>>>>  create mode 100644 northd/en-lr-nat.h
>>>>>>>  create mode 100644 northd/en-ls-lb-acls.c
>>>>>>>  create mode 100644 northd/en-ls-lb-acls.h
>>>>>>>  create mode 100644 northd/lflow-mgr.c
>>>>>>>  create mode 100644 northd/lflow-mgr.h
>>>>>>>
>>>>>>> --
>>>>>>> 2.41.0
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> dev mailing list
>>>>>>> [email protected]
>>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
