On Fri, Nov 24, 2023 at 10:38 AM Dumitru Ceara <[email protected]> wrote: > > On 11/16/23 22:05, Numan Siddique wrote: > > On Thu, Nov 16, 2023 at 2:54 PM Han Zhou <[email protected]> wrote: > >> > >> On Wed, Nov 15, 2023 at 7:32 PM Numan Siddique <[email protected]> wrote: > >>> > >>> On Wed, Nov 15, 2023 at 2:59 AM Han Zhou <[email protected]> wrote: > >>>> > >>>> On Thu, Oct 26, 2023 at 11:12 AM <[email protected]> wrote: > >>>>> > >>>>> From: Numan Siddique <[email protected]> > >>>>> > >>>>> This patch series adds incremental processing in the lflow engine > >>>>> node to handle changes to northd and other engine nodes. > >>>>> Changes related to load balancers and NAT are mainly handled in > >>>>> this patch series. > >>>>> > >>>>> This patch series can also be found here - > >>>> https://github.com/numansiddique/ovn/tree/northd_lflow_ip/v1 > >>>> > >>>> Thanks Numan for the great improvement! > >>> > >>> Hi Han, > >>> > >>> Thanks for the review comments. I understand it is hard to review > >>> so many patches, especially related to I-P. > >>> I appreciate the time spent on it and the feedback. > >>> > >>>> > >>>> I spent some time these days reviewing the series and I am now at patch 10. I > >>>> need to take a break and so I just sent the comments I had so far. > >>>> > >>>> I also did scale testing for northd with > >>>> https://github.com/hzhou8/ovn-test-script for a 500 chassis, 50 lsp / > >>>> chassis setup, and noticed that the recompute latency has increased 20% > >>>> after the series. I think in general it is expected to have some > >>>> performance penalty for the recompute case because of the new indexes > >>>> created for I-P. However, the patch 10 "northd: Refactor lflow > >> management > >>>> into a separate module." alone introduces a 10% latency increase, which > >>>> necessitates more investigation. 
> >>> Before sending out the series I did some testing on recomputes with a large OVN NB DB and SB DB (from a 500 node ovn-heater density heavy run). I'm aware of the increase in recomputes. And I did some more testing today as well.
> >>>
> >>> In my testing, I can see that the increase in latency is due to the new engine nodes added (lr_lbnat mainly) and due to housekeeping of the lflow references. I do not see any increase due to the new lflow-mgr.c added in patch 10. I compared patch 9 and patch 10 separately and there is no difference in lflow engine node recompute time between patch 9 and 10.
> >>
> >> My results were different. My test profile simulates the ovn-k8s topology (central mode, not IC) with 500 nodes, 50 LSP/node, with no LB, a small amount of ACLs and PGs.
> >> (https://github.com/hzhou8/ovn-test-script/blob/main/args.500ch_50lsp_10000pg)
> >>
> >> The test results for ovn-northd recompute time are:
> >> main: 1118.3
> >> p9:   1129.5
> >> p10:  1243.4  ==> 10% more than p9
> >> p18:  1357.6
> >>
> >> I am not sure if the p10 increase is related to the hash change or something else.
>
> I didn't review p10 in detail yet but I did run some tests and (with gcc and CFLAGS="-O2 -fno-omit-frame-pointer -fno-common -g") I see no significant difference between p9 and p10 when running Han's scale test profile.
>
> Then I also tried the same thing using the 500 nodes + 50 LSP/node perf test we already have in the repo:
> https://github.com/ovn-org/ovn/blob/5ef2a08ade01d698f84e197987ea679d8978d2b9/tests/perf-northd.at#L185
>
> Again, I didn't see any significant change between p9 and p10 on my laptop. I wonder what's different in our setups. 
> Kind of related, I posted a series to improve the in-tree perf testing and allow us to also gather recompute stats (build the DB, reset stopwatches and trigger a recompute, then collect stopwatches):
> https://patchwork.ozlabs.org/project/ovn/list/?series=383727&state=*
>
> Would it be an idea to try to merge both scenarios you guys used into something that's defined as a new test in-tree? That way it's easier for anyone to just run the same test.
That sounds good to me. I'll take a look at your patches soon. Thanks for testing them out. Can you also please compare OVN main vs p10 of this series?

Thanks
Numan

> Thanks,
> Dumitru
>
> >>> Below are the results with the time taken for the mentioned engine nodes in msecs for a recompute for some of the individual patches in the series.
> >>>
> >>> The sample OVN DBs have:
> >>>
> >>> -------------------------------
> >>> Resource             Total
> >>> -------------------------------
> >>> Logical switches     1001
> >>> Logical routers      501
> >>> Router ports         1501
> >>> Switch ports         29501
> >>> Load balancers       35253
> >>> Load balancer group  1
> >>> SB Logical flows     268349
> >>> SB DB groups         509
> >>> -------------------------------
> >>>
> >>> There is one load balancer group which has all the load balancers and it is associated with all the logical switches and routers. 
> >>> Below is the time taken for each engine node in msec:
> >>>
> >>> ---------------------------------------------------------------
> >>> Engine nodes     | lb_data | northd | lr_lbnat | lflow |
> >>> ---------------------------------------------------------------
> >>> ovn-northd-main  |   358   |  2455  |    x     | 2082  |
> >>> ovn-northd-p1    |   373   |  2476  |    x     | 2170  |
> >>> ovn-northd-p5    |   367   |  2413  |    x     | 2195  |
> >>> ovn-northd-p6    |   354   |   688  |   1815   | 2442  |
> >>> ovn-northd-p7    |   357   |   683  |   1778   | 2806  |
> >>> ovn-northd-p9    |   352   |   682  |   1781   | 2781  |
> >>> ovn-northd-p10   |   365   |   838  |   1707   | 2812  |
> >>> ovn-northd-p13   |   362   |  1066  |   1882   | 2917  |
> >>> ovn-northd-p15   |   359   |  1046  |   1688   | 2907  |
> >>> ovn-northd-p18   |   379   |  1066  |   1729   | 2886  |
> >>> ---------------------------------------------------------------
> >>>
> >>> ovn-northd-p1 means ovn-northd with patch 1 of this series applied, ovn-northd-p2 with patch 2 of this series, and so on.
> >>>
> >>> Patch 6 adds a new engine node lr_lbnat and that's why the northd engine node time drops from ~2413 msec to 688 msec.
> >>>
> >>> With the first 10 patches, the northd engine node's overall time increases by ~200 msec compared to "main", and the lflow engine node time increases by around 800 msec. 
> >>> The point of this data is to show that the overall increase is mainly due to bookkeeping and the new engine nodes. I tried to optimise further but couldn't.
> >>>
> >>> IMO this 20% overall increase should be fine for the following reasons:
> >>> - The scale tests with ovn-heater show a significant decrease in ovn-northd CPU usage and in the overall test duration. This means there were very few recomputes triggered.
> >>> - With the OCP/K8s kube-burner tests, there were 0 recomputes in the northd and lflow engine nodes.
> >>>
> >>>>> Prior to this patch series, most of the changes to the northd engine resulted in full recomputation of logical flows. This series aims to improve the performance of ovn-northd by adding I-P support.
> >>>>
> >>>> I'd like to clarify "most of the changes" a little. I think we should focus on the most impactful changes that happen in a cloud environment. I don't think it is realistic to achieve "most of the changes" in I-P because it is too complex (even with this series we still handle a very small part of the DB schema incrementally), but it may be realistic to handle the most impactful changes, which are the most frequent changes in the cloud, incrementally. Before this series, we could handle regular VIF changes and related address-sets, port-groups (some of the) updates incrementally. Those are the most common changes related to pod/VM changes in the cloud. I believe the next goal is for LB changes related to pods/VMs (i.e. the LB backends), which I believe is the main goal of this series. Is my understanding correct?
> >>>
> >>> That is correct. LB change handling is very important in my opinion when used with ovn-kubernetes because all the k8s services are mapped to OVN LBs. 
> >>>> While reviewing the patches, I'd say sometimes I feel a little lost because it is hard to correlate some of the changes with the above goal.
> >>>
> >>> I understand that. Maybe I've done a bad job of conveying why the changes are required for LBs. Let me try to clarify a bit here. I'll incorporate and add more details in the patch commits and cover letter in the next series.
> >>>
> >>> This series mainly targets handling lflow generation for load balancer changes. As you know, we use conntrack in OVN for:
> >>> - Load balancers (both in logical switches and routers)
> >>> - ACLs in logical switches
> >>> - NAT in logical routers
> >>> And this makes it complicated to decouple logical flow generation for LBs from ACLs and NATs. That is the reason I had to split the northd node data related to load balancers, ACLs and NATs into separate nodes.
> >>>
> >>> To give an example, we generate certain logical flows for a logical switch by checking "if (od->has_lb_vip && od->has_stateful_acl)" (see build_acl_hints()). For logical routers, a NAT can be an LB VIP too, which makes it difficult to decouple LB and NAT. We could fall back to recompute for NAT changes, but we still need to group logical flows related to LB and NAT, and that's why I added the lr_lbnat_data engine node in patch 6.
> >>
> >> Thanks for explaining. However, I wonder if these are related to LB backend updates. I agree that service creation/deletion may also be important to handle incrementally, but they are less critical (less frequent) than the backend updates, which are directly related to pod/VM creation/deletion.
> >
> > From what I understand, at least for OCP use cases they seem to be critical. Also it depends on the use case, I suppose. 
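[Editor's note: to make the coupling described above concrete, here is a toy C sketch. The structs and function names are hypothetical, not OVN's actual code; they only illustrate why flows gated on both LB and ACL state, and router NATs that double as LB VIPs, are hard to handle in isolation.]

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the coupling described above.  None of these structs or
 * functions exist in OVN; they illustrate why LB, ACL and NAT state is
 * hard to decouple for incremental processing. */

struct toy_ls {
    bool has_lb_vip;        /* switch has at least one LB VIP */
    bool has_stateful_acl;  /* switch has at least one stateful ACL */
};

/* Mirrors the quoted "od->has_lb_vip && od->has_stateful_acl" check:
 * these hint flows exist only when both features are active, so a pure
 * LB change can add or remove flows that nominally belong to ACLs. */
bool
emits_acl_hint_flows(const struct toy_ls *od)
{
    return od->has_lb_vip && od->has_stateful_acl;
}

struct toy_lr {
    bool has_lb_vip;        /* router has LB VIPs */
    bool nat_overlaps_vip;  /* some NAT external IP is also an LB VIP */
};

/* On a router, a NAT change cannot be handled in isolation when a NAT
 * external IP doubles as an LB VIP: the LB flows must be regrouped as
 * well, which motivates grouping LB and NAT data into one engine node. */
bool
nat_change_touches_lb_flows(const struct toy_lr *lr)
{
    return lr->has_lb_vip && lr->nat_overlaps_vip;
}
```

In other words, an incremental handler for an LB change must know the ACL and NAT state of the same datapath, which is why the series splits that state into dedicated engine nodes rather than leaving it inside struct ovn_datapath.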
> >> The has_xxx and NAT related handling are more related to service creation/deletion rather than LB backend updates, right?
> >
> > I think so.
> >
> > Numan
> >
> >> Thanks,
> >> Han
> >
> >>>> I believe there is a reason for all the changes but I am not sure if it is directly related to the goal of LB backend related I-P. I understand that there are other aspects of LB that can be incrementally processed. But I'd recommend largely reducing the changes to those necessary for this goal; it would help the review, and we might be able to merge them sooner and add more, but less impactful, I-P later step by step. I guess I will have a clearer picture when I finish the rest of the patches, but it would be helpful if you could add more guidance in this cover letter.
> >>>
> >>> I'll definitely add more details in the cover letter.
> >>>
> >>> Apart from the LB related I-P, this series only adds 2 additional I-P patches: one for the NB_Global changes and the other for SB mac binding changes. ovn-kubernetes periodically writes a timestamp into the NB_Global.options column for its internal health checks, and this results in unnecessary recomputes. After running the kube-burner density-cni tests, I figured out the most common changes done in the NB DB and added the I-P for those. With all these patches applied, there are no recomputes in either the northd or lflow engine nodes during the test. Not having the I-P for NB_Global, for example, was resulting in significant ovn-northd CPU usage.
> >>>
> >>> Thanks again for the reviews. 
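[Editor's note: as an illustration of the NB_Global I-P mentioned above, here is a minimal C sketch. The handler signature and types are toy constructs, not OVN's real incremental-processing engine API, and the "e2e-timestamp" key name is assumed to be the health-check option ovn-kubernetes rewrites; treat both as illustrative.]

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy sketch of the I-P idea described above: a change handler inspects
 * which NB_Global option changed and consumes flow-irrelevant writes
 * (such as a periodic health-check timestamp) without touching any
 * logical flows, falling back to a full recompute for anything else. */

struct toy_change {
    const char *option_key;  /* NB_Global option that changed */
};

/* Returns true when the change is handled incrementally (no logical
 * flow can depend on it); false means the caller must fall back to a
 * full recompute to stay correct. */
bool
toy_nb_global_handler(const struct toy_change *c)
{
    static const char *const ignored_keys[] = { "e2e-timestamp" };
    for (size_t i = 0; i < sizeof ignored_keys / sizeof ignored_keys[0]; i++) {
        if (!strcmp(c->option_key, ignored_keys[i])) {
            return true;  /* handled: nothing to recompute */
        }
    }
    return false;  /* unknown key: recompute */
}
```

The safety property is that the handler may only claim a change as handled when it provably affects no output; every other change degrades to recompute, trading performance for correctness.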
> >>> Numan
> >>>
> >>>> Thanks,
> >>>> Han
> >>>>
> >>>>> In order to add this support, some of the northd engine node data (from struct ovn_datapath) is split and moved over to new engine nodes - mainly related to load balancers, NAT and ACLs.
> >>>>>
> >>>>> Below are the scale testing results done with these patches applied, using ovn-heater. The test ran the scenario ocp-500-density-heavy.yml [1].
> >>>>>
> >>>>> With all the lflow I-P patches applied, the results are:
> >>>>>
> >>>>>                       Min (s)   Median (s)  90%ile (s)  99%ile (s)  Max (s)   Mean (s)  Total (s)   Count  Failed
> >>>>> -----------------------------------------------------------------------------------------------------------------
> >>>>> Iteration Total       0.136883  1.129016    1.192001    1.204167    1.212728  0.665017  83.127099   125    0
> >>>>> Namespace.add_ports   0.005216  0.005736    0.007034    0.015486    0.018978  0.006211  0.776373    125    0
> >>>>> WorkerNode.bind_port  0.035030  0.046082    0.052469    0.058293    0.060311  0.045973  11.493259   250    0
> >>>>> WorkerNode.ping_port  0.005057  0.006727    1.047692    1.069253    1.071336  0.266896  66.724094   250    0
> >>>>>
> >>>>> The results with the present main are:
> >>>>>
> >>>>>                       Min (s)   Median (s)  90%ile (s)  99%ile (s)  Max (s)   Mean (s)  Total (s)   Count  Failed
> >>>>> -----------------------------------------------------------------------------------------------------------------
> >>>>> Iteration Total       0.135491  2.223805    3.311270    3.339078    3.345346  1.729172  216.146495  125    0
> >>>>> Namespace.add_ports   0.005380  0.005744    0.006819    0.018773    0.020800  0.006292  0.786532    125    0
> >>>>> WorkerNode.bind_port  0.034179  0.046055    0.053488    0.058801    0.071043  0.046117  11.529311   250    0
> >>>>> WorkerNode.ping_port  0.004956  0.006952    3.086952    3.191743    3.192807  0.791544  197.886026  250    0
> >>>>>
> >>>>> [1] - https://github.com/ovn-org/ovn-heater/blob/main/test-scenarios/ocp-500-density-heavy.yml
> >>>>>
> >>>>> v1 -> v2
> >>>>> --------
> >>>>> * Now also maintaining array indexes for ls_lbacls, lr_nat and lr_lb_nat_data tables (similar to ovn_datapaths->array) to make the lookup efficient. The same ovn_datapath->index is reused.
> >>>>>
> >>>>> * Made some significant changes to 'struct lflow_ref' in lflow-mgr.c. In v2 we don't use objdep_mgr to maintain the resource-to-lflow references. Instead we maintain the 'struct lflow' pointer. With this we don't need to maintain an additional hmap of lflows.
> >>>>>
> >>>>> Numan Siddique (18):
> >>>>>   northd: Refactor the northd change tracking.
> >>>>>   northd: Track ovn_datapaths in northd engine track data.
> >>>>>   tests: Add a couple of tests in ovn-northd for I-P.
> >>>>>   northd: Move router ports SB PB options sync to sync_to_sb_pb node.
> >>>>>   northd: Add a new engine 'lr-nat' to manage lr NAT data.
> >>>>>   northd: Add a new engine 'lr-lb-nat-data' to manage routers' lb data.
> >>>>>   northd: Generate logical router's LB and NAT flows using lr_lbnat_data.
> >>>>>   northd: Don't commit dhcp response flows in the conntrack.
> >>>>>   northd: Add a new node ls_lbacls.
> >>>>>   northd: Refactor lflow management into a separate module.
> >>>>>   northd: Use lflow_ref when adding all logical flows.
> >>>>>   northd: Move ovn_lb_datapaths from lib to northd module.
> >>>>>   northd: Handle lb changes in lflow engine.
> >>>>>   northd: Add lr_lb_nat_data handler for lflow engine node.
> >>>>>   northd: Add ls_lbacls handler for lflow engine node.
> >>>>>   northd: Add I-P for NB_Global and SB_Global.
> >>>>>   northd: Add a noop handler for northd SB mac binding.
> >>>>>   northd: Add northd change handler for sync_to_sb_lb node.
> >>>>>
> >>>>>  lib/lb.c                   |   96 -
> >>>>>  lib/lb.h                   |   57 -
> >>>>>  lib/ovn-util.c             |   17 +-
> >>>>>  lib/ovn-util.h             |    2 +-
> >>>>>  lib/stopwatch-names.h      |    3 +
> >>>>>  northd/aging.c             |   21 +-
> >>>>>  northd/automake.mk         |   12 +-
> >>>>>  northd/en-global-config.c  |  588 ++++
> >>>>>  northd/en-global-config.h  |   65 +
> >>>>>  northd/en-lflow.c          |  123 +-
> >>>>>  northd/en-lflow.h          |    8 +
> >>>>>  northd/en-lr-lb-nat-data.c |  685 ++++
> >>>>>  northd/en-lr-lb-nat-data.h |  125 +
> >>>>>  northd/en-lr-nat.c         |  498 +++
> >>>>>  northd/en-lr-nat.h         |  137 +
> >>>>>  northd/en-ls-lb-acls.c     |  530 +++
> >>>>>  northd/en-ls-lb-acls.h     |  111 +
> >>>>>  northd/en-northd.c         |   67 +-
> >>>>>  northd/en-northd.h         |    2 +-
> >>>>>  northd/en-port-group.h     |    3 +
> >>>>>  northd/en-sync-sb.c        |  513 ++-
> >>>>>  northd/inc-proc-northd.c   |   79 +-
> >>>>>  northd/lflow-mgr.c         | 1078 ++++++
> >>>>>  northd/lflow-mgr.h         |  192 ++
> >>>>>  northd/northd.c            | 6485 ++++++++++++++++--------------
> >>>>>  northd/northd.h            |  551 ++-
> >>>>>  northd/ovn-northd.c        |    4 +
> >>>>>  tests/ovn-northd.at        |  858 ++++-
> >>>>>  28 files changed, 8827 insertions(+), 4083 deletions(-)
> >>>>>  create mode 100644 northd/en-global-config.c
> >>>>>  create mode 100644 northd/en-global-config.h
> >>>>>  create mode 100644 northd/en-lr-lb-nat-data.c
> >>>>>  create mode 100644 northd/en-lr-lb-nat-data.h
> >>>>>  create mode 100644 northd/en-lr-nat.c
> >>>>>  create mode 100644 northd/en-lr-nat.h
> >>>>>  create mode 100644 northd/en-ls-lb-acls.c
> >>>>>  create mode 100644 northd/en-ls-lb-acls.h
> >>>>>  create mode 100644 northd/lflow-mgr.c
> >>>>>  create mode 100644 northd/lflow-mgr.h
> >>>>>
> >>>>> --
> >>>>> 2.41.0
> >>>>>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
