On Wed, Nov 15, 2023 at 7:32 PM Numan Siddique <[email protected]> wrote:
>
> On Wed, Nov 15, 2023 at 2:59 AM Han Zhou <[email protected]> wrote:
> >
> > On Thu, Oct 26, 2023 at 11:12 AM <[email protected]> wrote:
> > >
> > > From: Numan Siddique <[email protected]>
> > >
> > > This patch series adds incremental processing in the lflow engine
> > > node to handle changes to northd and other engine nodes.  Changes
> > > related to load balancers and NAT are mainly handled in this
> > > patch series.
> > >
> > > This patch series can also be found here -
> > > https://github.com/numansiddique/ovn/tree/northd_lflow_ip/v1
> >
> > Thanks Numan for the great improvement!
>
> Hi Han,
>
> Thanks for the review comments.  I understand it is hard to review
> so many patches, especially those related to I-P.  I appreciate the
> time spent on it and the feedback.
>
> > I spent some time these days reviewing the series and am now at patch
> > 10.  I need to take a break and so I just sent the comments I had so
> > far.
> >
> > I also did scale testing for northd with
> > https://github.com/hzhou8/ovn-test-script for a 500 chassis, 50 lsp /
> > chassis setup, and noticed that the recompute latency has increased 20%
> > after the series.  I think in general it is expected to have some
> > performance penalty for the recompute case because of the new indexes
> > created for I-P.  However, the patch 10 "northd: Refactor lflow
> > management into a separate module." alone introduces a 10% latency
> > increase, which necessitates more investigation.
>
> Before sending out the series I did some testing on recomputes with a
> large OVN NB DB and SB DB (from a 500 node ovn-heater density-heavy
> run).  I'm aware of the increase in recompute latency, and I did some
> more testing today as well.
>
> In my testing, I can see that the increase in latency is due to the
> new engine nodes added (lr_lbnat mainly) and due to the housekeeping
> of the lflow references.
> I do not see any increase due to the new lflow-mgr.c added in patch 10.
>
> I compared patch 9 and patch 10 separately and there is no difference
> in lflow engine node recompute time between patches 9 and 10.
>
My results were different.  My test profile simulates the ovn-k8s
topology (central mode, not IC) with 500 nodes, 50 LSPs/node, no LBs,
and a small number of ACLs and PGs.
(https://github.com/hzhou8/ovn-test-script/blob/main/args.500ch_50lsp_10000pg)
The test results for ovn-northd recompute time are:

main: 1118.3
p9:   1129.5
p10:  1243.4  ==> 10% more than p9
p18:  1357.6

I am not sure if the p10 increase is related to the hash change or
something else.

> Below are the results with the time taken for the mentioned engine
> nodes in msecs for a recompute for some of the individual patches in
> the series.
>
> The sample OVN DBs have
>
> -------------------------------------
> Resource                   Total
> -------------------------------------
> Logical switches            1001
> Logical routers              501
> Router ports                1501
> Switch ports               29501
> Load balancers             35253
> Load balancer group            1
> SB Logical flows          268349
> SB DB groups                 509
> -------------------------------------
>
> There is one load balancer group which has all the load balancers, and
> it is associated with all the logical switches and routers.
>
> Below is the time taken for each engine node in msec:
>
> ---------------------------------------------------------------------
> Engine nodes      | lb_data | northd | lr_lbnat | lflow |
> ---------------------------------------------------------------------
> ovn-northd-main   |   358   |  2455  |    x     |  2082 |
> ovn-northd-p1     |   373   |  2476  |    x     |  2170 |
> ovn-northd-p5     |   367   |  2413  |    x     |  2195 |
> ovn-northd-p6     |   354   |   688  |  1815    |  2442 |
> ovn-northd-p7     |   357   |   683  |  1778    |  2806 |
> ovn-northd-p9     |   352   |   682  |  1781    |  2781 |
> ovn-northd-p10    |   365   |   838  |  1707    |  2812 |
> ovn-northd-p13    |   362   |  1066  |  1882    |  2917 |
> ovn-northd-p15    |   359   |  1046  |  1688    |  2907 |
> ovn-northd-p18    |   379   |  1066  |  1729    |  2886 |
> ---------------------------------------------------------------------
>
> ovn-northd-p1 means ovn-northd with patch 1 of this series,
> ovn-northd-p2 with patch 2 of this series, and so on.
>
> Patch 6 adds a new engine node lr_lbnat, and that's why the northd
> engine node time gets reduced from ~2413 msec to 688.
>
> With the first 10 patches, the northd engine node's overall time
> increases by ~200 msec compared to "main", and the lflow engine node
> time increases by around 800 msec.  The point of this data is to show
> that the overall increase is mainly due to bookkeeping and the new
> engine nodes.  I tried to optimise but I couldn't.
>
> IMO this 20% overall increase should be fine for the following reasons:
> - The scale tests with ovn-heater show a significant decrease in
>   ovn-northd CPU usage and a decrease in the overall test duration.
>   - This means there were very few recomputes triggered.
> - With the OCP/K8s kube-burner tests, there were 0 recomputes
>   in the northd and lflow engine nodes.
>
> > >
> > > Prior to this patch series, most of the changes to the northd engine
> > > resulted in full recomputation of logical flows.  This series
> > > aims to improve the performance of ovn-northd by adding the I-P
> > > support.
> >
> > I'd like to clarify "most of the changes" a little.  I think we should
> > focus on the most impactful changes that happen in a cloud environment.
> > I don't think it is realistic to achieve "most of the changes" in I-P
> > because it is too complex (even with this series we still handle a very
> > small part of the DB schema incrementally), but it may be realistic to
> > handle the most impactful changes, which are the most frequent changes
> > in the cloud, incrementally.  Before this series, we could handle
> > regular VIF changes and related address-set and port-group updates
> > (some of them) incrementally.  Those are the most common changes
> > related to pod/VM changes in the cloud.  I believe the next goal is LB
> > changes related to pods/VMs (i.e. the LB backends), which I believe is
> > the main goal of this series.  Is my understanding correct?
>
> That is correct.  LB change handling is very important in my opinion
> when used with ovn-kubernetes, because all the k8s services are mapped
> to OVN LBs.
>
> > While reviewing the patches, I'd say sometimes I feel a little lost
> > because it is hard to correlate some of the changes with the above
> > goal.
>
> I understand that.  Maybe I've done a bad job of conveying why the
> changes are required for LBs.
>
> Let me try to clarify a bit here.
> I'll incorporate and add more details in the patch commits and cover
> letter in the next series.
>
> This series mainly targets handling lflow generation for load balancer
> changes.  As you know, we use conntrack in OVN for
>   - Load balancers (both in logical switches and routers)
>   - ACLs in logical switches
>   - NAT in logical routers.
>
> And this makes it complicated to decouple logical flow generation for
> LBs from ACLs and NATs.  That is the reason I had to split the northd
> node data related to load balancers, ACLs and NATs into separate nodes.
>
> To give an example, we generate certain logical flows for a logical
> switch by checking "if (od->has_lb_vip && od->has_stateful_acl)" (see
> build_acl_hints()).  For logical routers, a NAT can be an LB VIP too.
> This makes it difficult to decouple LB and NAT.  We could fall back to
> recompute for NAT changes, but we still need to group logical flows
> related to LB and NAT.  And that's why I added the lr_lbnat_data
> engine node in patch 6.

Thanks for explaining.  However, I wonder if these are related to LB
backend updates.  I agree that service creation/deletion may also be
important to handle incrementally, but it is less critical (less
frequent) than the backend updates, which are directly related to
pod/VM creation/deletion.  The has_xxx and NAT related handling are
more related to service creation/deletion than to LB backend updates,
right?

Thanks,
Han

> > I believe there is a reason for all the changes, but I am not sure if
> > it is directly related to the goal of LB backend related I-P.  I
> > understand that there are other aspects of LB that can be
> > incrementally processed.  But I'd recommend that if the changes
> > necessary for this goal can be largely reduced, it would help the
> > review, and we might be able to merge them sooner and add more, but
> > less impactful, I-P later step by step.
> > I guess I will have a clearer picture when I finish the rest of the
> > patches, but it would be helpful if you could add more guidance in
> > this cover letter.
>
> I'll definitely add more details in the cover letter.
>
> Apart from the LB related I-P, this series only adds 2 more additional
> patches for I-P.  One is for the NB_Global changes and the other for
> SB mac binding changes.  ovn-kubernetes periodically writes a
> timestamp into the NB_Global.options column for its internal health
> checks, and this results in unnecessary recomputes.  After running the
> kube-burner density-cni tests, I figured out the most common changes
> done in the NB DB and added the I-P for that.  With all these patches
> applied, there are no recomputes in either the northd or the lflow
> engine node during the test duration.  Not having the I-P for
> NB_Global, for example, was resulting in significant ovn-northd CPU
> usage.
>
> Thanks again for the reviews.
>
> Numan
>
> > Thanks,
> > Han
> >
> > > In order to add this support, some of the northd engine
> > > node data (from struct ovn_datapath) is split and moved over to
> > > new engine nodes - mainly related to load balancers, NAT and ACLs.
> > >
> > > Below are the scale testing results done with these patches applied
> > > using ovn-heater.  The test ran the scenario -
> > > ocp-500-density-heavy.yml [1].
> > >
> > > With all the lflow I-P patches applied, the results are:
> > >
> > > -----------------------------------------------------------------------------------------------------------------------
> > >                       Min (s)   Median (s)  90%ile (s)  99%ile (s)  Max (s)   Mean (s)  Total (s)   Count  Failed
> > > -----------------------------------------------------------------------------------------------------------------------
> > > Iteration Total       0.136883  1.129016    1.192001    1.204167    1.212728  0.665017  83.127099   125    0
> > > Namespace.add_ports   0.005216  0.005736    0.007034    0.015486    0.018978  0.006211  0.776373    125    0
> > > WorkerNode.bind_port  0.035030  0.046082    0.052469    0.058293    0.060311  0.045973  11.493259   250    0
> > > WorkerNode.ping_port  0.005057  0.006727    1.047692    1.069253    1.071336  0.266896  66.724094   250    0
> > > -----------------------------------------------------------------------------------------------------------------------
> > >
> > > The results with the present main are:
> > >
> > > -----------------------------------------------------------------------------------------------------------------------
> > >                       Min (s)   Median (s)  90%ile (s)  99%ile (s)  Max (s)   Mean (s)  Total (s)   Count  Failed
> > > -----------------------------------------------------------------------------------------------------------------------
> > > Iteration Total       0.135491  2.223805    3.311270    3.339078    3.345346  1.729172  216.146495  125    0
> > > Namespace.add_ports   0.005380  0.005744    0.006819    0.018773    0.020800  0.006292  0.786532    125    0
> > > WorkerNode.bind_port  0.034179  0.046055    0.053488    0.058801    0.071043  0.046117  11.529311   250    0
> > > WorkerNode.ping_port  0.004956  0.006952    3.086952    3.191743    3.192807  0.791544  197.886026  250    0
> > > -----------------------------------------------------------------------------------------------------------------------
> > >
> > > [1] -
> > > https://github.com/ovn-org/ovn-heater/blob/main/test-scenarios/ocp-500-density-heavy.yml
> > >
> > > v1 -> v2
> > > --------
> > >   * Now also maintaining array indexes for the ls_lbacls, lr_nat and
> > >     lr_lb_nat_data tables (similar to ovn_datapaths->array) to
> > >     make the lookup efficient.  The same ovn_datapath->index
> > >     is reused.
> > >
> > >   * Made some significant changes to 'struct lflow_ref' in lflow-mgr.c.
> > >     In v2 we don't use objdep_mgr to maintain the resource to lflow
> > >     references.  Instead we maintain the 'struct lflow' pointer.
> > >     With this we don't need to maintain an additional hmap of lflows.
> > >
> > > Numan Siddique (18):
> > >   northd: Refactor the northd change tracking.
> > >   northd: Track ovn_datapaths in northd engine track data.
> > >   tests: Add a couple of tests in ovn-northd for I-P.
> > >   northd: Move router ports SB PB options sync to sync_to_sb_pb node.
> > >   northd: Add a new engine 'lr-nat' to manage lr NAT data.
> > >   northd: Add a new engine 'lr-lb-nat-data' to manage routers' lb data.
> > >   northd: Generate logical router's LB and NAT flows using
> > >           lr_lbnat_data.
> > >   northd: Don't commit dhcp response flows in the conntrack.
> > >   northd: Add a new node ls_lbacls.
> > >   northd: Refactor lflow management into a separate module.
> > >   northd: Use lflow_ref when adding all logical flows.
> > >   northd: Move ovn_lb_datapaths from lib to northd module.
> > >   northd: Handle lb changes in lflow engine.
> > >   northd: Add lr_lb_nat_data handler for lflow engine node.
> > >   northd: Add ls_lbacls handler for lflow engine node.
> > >   northd: Add I-P for NB_Global and SB_Global.
> > >   northd: Add a noop handler for northd SB mac binding.
> > >   northd: Add northd change handler for sync_to_sb_lb node.
> > >
> > >  lib/lb.c                   |   96 -
> > >  lib/lb.h                   |   57 -
> > >  lib/ovn-util.c             |   17 +-
> > >  lib/ovn-util.h             |    2 +-
> > >  lib/stopwatch-names.h      |    3 +
> > >  northd/aging.c             |   21 +-
> > >  northd/automake.mk         |   12 +-
> > >  northd/en-global-config.c  |  588 ++++
> > >  northd/en-global-config.h  |   65 +
> > >  northd/en-lflow.c          |  123 +-
> > >  northd/en-lflow.h          |    8 +
> > >  northd/en-lr-lb-nat-data.c |  685 ++++
> > >  northd/en-lr-lb-nat-data.h |  125 +
> > >  northd/en-lr-nat.c         |  498 +++
> > >  northd/en-lr-nat.h         |  137 +
> > >  northd/en-ls-lb-acls.c     |  530 +++
> > >  northd/en-ls-lb-acls.h     |  111 +
> > >  northd/en-northd.c         |   67 +-
> > >  northd/en-northd.h         |    2 +-
> > >  northd/en-port-group.h     |    3 +
> > >  northd/en-sync-sb.c        |  513 ++-
> > >  northd/inc-proc-northd.c   |   79 +-
> > >  northd/lflow-mgr.c         | 1078 ++++++
> > >  northd/lflow-mgr.h         |  192 ++
> > >  northd/northd.c            | 6485 ++++++++++++++++--------------------
> > >  northd/northd.h            |  551 ++-
> > >  northd/ovn-northd.c        |    4 +
> > >  tests/ovn-northd.at        |  858 ++++-
> > >  28 files changed, 8827 insertions(+), 4083 deletions(-)
> > >  create mode 100644 northd/en-global-config.c
> > >  create mode 100644 northd/en-global-config.h
> > >  create mode 100644 northd/en-lr-lb-nat-data.c
> > >  create mode 100644 northd/en-lr-lb-nat-data.h
> > >  create mode 100644 northd/en-lr-nat.c
> > >  create mode 100644 northd/en-lr-nat.h
> > >  create mode 100644 northd/en-ls-lb-acls.c
> > >  create mode 100644 northd/en-ls-lb-acls.h
> > >  create mode 100644 northd/lflow-mgr.c
> > >  create mode 100644 northd/lflow-mgr.h
> > >
> > > --
> > > 2.41.0
> > >
> > > _______________________________________________
> > > dev mailing list
> > > [email protected]
> > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >
> > _______________________________________________
> > dev mailing list
> > [email protected]
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
