On 2/28/23 16:28, Dumitru Ceara wrote:
> On 2/28/23 16:22, Ilya Maximets wrote:
>> On 2/28/23 16:14, Dumitru Ceara wrote:
>>> On 2/28/23 15:29, Ilya Maximets wrote:
>>>> On 2/28/23 01:03, Dumitru Ceara wrote:
>>>>> Hi Ilya,
>>>>>
>>>>> Thanks for the patch, you're right that the hairpin SNAT IP processing
>>>>> in ovn-controller is extremely inefficient today.
>>>>>
>>>>> I didn't review the code closely but I do have some initial remarks below.
>>>>>
>>>>> On 2/20/23 12:42, Ilya Maximets wrote:
>>>>>> It's common to have 'hairpin_snat_ip' configured exactly the
>>>>>> same for each and every Load Balancer in a setup.  For example,
>>>>>> ovn-kubernetes does that, setting the '169.254.169.5 fd69::5' value
>>>>>> unconditionally for every LB:
>>>>>>
>>>>>> https://github.com/ovn-org/ovn-kubernetes/blob/cd78ae1af4657d38bdc41003a8737aa958d62b9d/go-controller/pkg/ovn/controller/services/loadbalancer.go#L314
>>>>>>
>>>>>> The current implementation of ovn-controller creates an OpenFlow rule
>>>>>> for every datapath for each VIP in the 'hairpin_snat_ip' case, because
>>>>>> we need to handle the case where LBs with the same VIP are applied to
>>>>>> different datapaths and have different SNAT IPs.
>>>>>
>>>>> IMO the real issue is that we do this blindly for every datapath.  It
>>>>> should actually be only for local datapaths.  In ovn-kubernetes (which
>>>>> AFAIK is the only user of hairpin_snat_ip) there will only be a single
>>>>> local logical switch with LBs applied on it.
>>>>
>>>> True, but we will still need to check each datapath this LB is applied
>>>> to for being local, or the other way around, check that each local
>>>> datapath is in the list of datapaths this LB is applied to.
>>>> It's definitely way less work than actually creating the flow, but might
>>>> still not be negligible.  I didn't test, though.
>>>>
>>>
>>> Ack, it would be great if we could get some data on this.
>>> Can we take a
>>> 500 node ovn-heater density-heavy NB database and just set
>>> hairpin_snat_ip on all load balancers to see what impact that has on
>>> ovn-controller?
>>>
>>>>>
>>>>>>
>>>>>> In large scale setups with tens of thousands of LBs and hundreds of
>>>>>> nodes, this generates millions of OpenFlow rules, making ovn-controller
>>>>>> choke and fall into an unrecoverable disconnection loop with poll
>>>>>> intervals up to 200 seconds in ovn-heater's density-heavy scenario with
>>>>>> just 120 nodes.  It also generates rules with thousands of conjunctions
>>>>>> that do not fit into a single FLOW_MOD, breaking the OpenFlow
>>>>>> management communication.
>>>>>
>>>>> These conjunctive flows were added with:
>>>>>
>>>>>   07467cfac499 ("lflow: Refactor OpenFlow snat hairpin flows")
>>>>>
>>>>> in order to reduce load in ovn-controller for the (now obsolete) use
>>>>> case ovn-kubernetes had.
>>>>>
>>>>> However, Han added:
>>>>>
>>>>>   22298fd37908 ("ovn-controller: Don't flood fill local datapaths
>>>>>   beyond DGP boundary.")
>>>>>
>>>>> which allowed ovn-kubernetes to essentially bind a logical switch to a
>>>>> chassis, and which means that we will have at most one local datapath
>>>>> with load balancers applied to it.  At this point, I suspect the
>>>>> original refactor patch could just be reverted.  We would have to check
>>>>> that snat-ip flows (non-conjunctive) are being added only for the local
>>>>> datapath.
>>>>
>>>> This might be OK for ovn-kubernetes.  But wouldn't this explode the
>>>> number of flows for the OpenStack case?  All datapaths can be local in
>>>> OpenStack, IIUC.  So, we would go back to N_LBs * N_DPs * N_VIPs
>>>> OpenFlow rules.  Or am I missing something?
>>>>
>>>
>>> OpenStack doesn't set hairpin_snat_ip today.
>>
>> But a revert will impact not only the hairpin_snat_ip case, it will
>> impact all the normal VIPs too.  Wouldn't it?
>>
>
> True, but OpenStack doesn't apply the same LB to multiple logical
> switches (*) so the actual number of OpenFlows is probably at most
> N_LBs * N_VIPs.  I suspect N_VIPs is 1 (or low) in most cases too.
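To put rough numbers on the scaling being discussed, here is a back-of-the-envelope sketch.  The figures are illustrative only (taken loosely from the "tens of thousands of LBs" and 120-node scenario mentioned above, not measured from any of the thread's databases):

```python
# Back-of-the-envelope hairpin SNAT flow counts.  Illustrative
# numbers only; real deployments vary.

n_lbs = 10_000   # load balancers ("tens of thousands")
n_dps = 120      # datapaths, one per node in the 120-node scenario
n_vips = 1       # VIPs per LB (often 1 or low, per the thread)

# Per-datapath rules, the behavior being criticized for
# hairpin_snat_ip: one rule per (LB, datapath, VIP).
per_datapath = n_lbs * n_dps * n_vips

# OpenStack-style bound, where an LB is applied to a single
# logical switch: at most one rule per (LB, VIP).
per_lb = n_lbs * n_vips

print(per_datapath)  # 1200000 -- "millions of OpenFlow rules"
print(per_lb)        # 10000
```

The three-orders-of-magnitude gap between the two products is the core of the argument: collapsing the per-datapath dimension is what makes either the revert-plus-local-datapath approach or the single-rule-per-VIP approach attractive.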
I posted a "v2" with an alternative solution here:

https://patchwork.ozlabs.org/project/ovn/patch/20230301223600.1607778-1-i.maxim...@ovn.org/

It seems that we can preserve current behavior for LBs without
hairpin_snat_ip and avoid affecting OpenStack.  I hope my assumption
is correct. :)

>
> (*) needs fact checking
>
>>> Even if it would change
>>> and start using this feature in the future, I don't think we'd have the
>>> same complexity.  If I understand correctly, OpenStack's OVN load
>>> balancers are not applied to all datapaths.
>>>
>>>>>
>>>>> Using the database you shared offline (120 node ovn-heater
>>>>> density-heavy) I tried this (didn't test too much though) and there's
>>>>> no visible impact on ovn-controller performance.
>>>>>
>>>>>>
>>>>>> Fix that by checking that all the 'hairpin_snat_ip' options on all the
>>>>>> Load Balancers are exactly the same and using just one OpenFlow rule
>>>>>> per VIP if that's the case.  This is possible because even if LBs with
>>>>>> the same VIP are applied to different datapaths, we're still going to
>>>>>> SNAT with exactly the same hairpin SNAT IP.
>>>>>>
>>>>>> Ideally, we would only check that 'hairpin_snat_ip' is the same for
>>>>>> all LBs with the same VIP, but that is not necessary for now, as the
>>>>>> only common use case for this option today is ovn-kubernetes.  It can
>>>>>> be added in the future by enhancing the logic in ovn-northd, if
>>>>>> necessary.
>>>>>>
>>>>>
>>>>> What worries me about this approach is that in the case when one
>>>>> hairpin_snat_ip changes value we suddenly disable the whole
>>>>> optimization.
>>>>
>>>> With the current patch, yes.  With the 'ideal' solution, one will
>>>> need to create 2 LBs with the same VIP and different SNAT IPs.
>>>> This only makes sense if these LBs are on different datapaths, and
>>>> that can't happen in ovn-kubernetes, IIUC.
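The global check the patch describes can be sketched roughly as follows.  This is not the actual OVN code (which lives in ovn-northd/ovn-controller and is written in C); the function name and the dict-based LB representation are hypothetical, for illustration only:

```python
# Sketch of the "all hairpin_snat_ip values identical" check described
# in the patch.  Names and data shapes are hypothetical.

def all_hairpin_snat_ips_equal(load_balancers):
    """Return the common hairpin_snat_ip if every LB that sets the
    option uses the same value; otherwise return None."""
    values = {lb.get("hairpin_snat_ip") for lb in load_balancers
              if lb.get("hairpin_snat_ip")}
    return values.pop() if len(values) == 1 else None

lbs = [
    {"name": "lb-a", "vip": "10.0.0.1:80",
     "hairpin_snat_ip": "169.254.169.5 fd69::5"},
    {"name": "lb-b", "vip": "10.0.0.2:80",
     "hairpin_snat_ip": "169.254.169.5 fd69::5"},
]

# Same value everywhere (the ovn-kubernetes configuration):
# one OpenFlow rule per VIP suffices.
print(all_hairpin_snat_ips_equal(lbs))

# One divergent value disables the optimization -- the concern raised
# in the review -- forcing a fallback to per-(VIP, datapath) rules.
lbs.append({"name": "lb-c", "vip": "10.0.0.3:80",
            "hairpin_snat_ip": "192.0.2.1"})
print(all_hairpin_snat_ips_equal(lbs))
```

The "ideal" refinement discussed in the thread would scope this check per VIP rather than globally, so that a single divergent LB only disables the optimization for its own VIP.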
>>>>
>>>
>>> Given that (very likely) only ovn-kubernetes uses hairpin_snat_ip, can't
>>> we just do the revert + local datapaths check as a short-term fix and
>>> work on the "ideal" solution in the long term?  It feels like that would
>>> avoid (slightly more) complex code in the interim.
>>
>> The "ideal" solution is an incremental change on top of the current
>> patch.  Just a slightly more sophisticated northd check.
>>
>
> Ok.  I'll have another look at the code as soon as I get the chance.
>
>>>
>>>>>
>>>>> I think I'd like, if possible, to get more performance data on what
>>>>> happens if we revert 07467cfac499 ("lflow: Refactor OpenFlow snat
>>>>> hairpin flows") and only add snat ip flows for local datapaths.
>>>>>
>>>>> What do you think?
>>>>
>>>> See my concern about reverting above.
>>>>
>>>
>>> Ack, replied there.
>>>
>>>>>
>>>>>> Note:
>>>>>>
>>>>>> There seems to be a separate issue in ovn-controller.  If only Load
>>>>>> Balancer data is updated, I-P processing goes through the 'lb_data'
>>>>>> node, and the handler for that node deletes old flows and adds new
>>>>>> ones for each updated LB.  This triggers OpenFlow flow DEL + ADD.  On
>>>>>> DEL of hairpin reply learning flows in table 68, OVS cleans up all the
>>>>>> learned flows in table 69 as well, because of the 'delete_learned'
>>>>>> flag.
>>>>>
>>>>> Won't this potentially affect dataplane performance for hairpin
>>>>> traffic?
>>>>
>>>> It will, but since the learning flow actually changed in this case,
>>>> dropping the flows learned from it is the correct thing to do.
>>>>
>>>
>>> I see.
>>>
>>>>>
>>>>>>
>>>>>> However, when changes happen in higher level nodes and
>>>>>> en_lflow_output_run() triggers lflow_run(), lflow_run() doesn't delete
>>>>>> the current hairpin flows before re-processing them.  That leads to a
>>>>>> flow MOD instead of DEL + ADD.  Since the learning flow is not
>>>>>> deleted, but modified, that doesn't trigger removal of learned rules
>>>>>> in OVS.
>>>>>> This is what happens if 'lb_same_hairpin_ips' flips its value.
>>>>>> The test is adjusted to change the 'hairpin_snat_ip' twice, to trigger
>>>>>> an 'lb_data' processing.  Otherwise, the old learned flow for the VIP
>>>>>> stays in table 69.
>>>>>>
>>>>>> This should not be a huge issue, as it's still a VIP address and it
>>>>>> should not do any harm to have that learned flow.  Also, the value
>>>>>> of 'lb_same_hairpin_ips' doesn't ever flip in current ovn-kubernetes.
>>>>>>
>>>>>> Reported-at: https://bugzilla.redhat.com/2171423
>>>>>> Signed-off-by: Ilya Maximets <i.maxim...@ovn.org>
>>>>>
>>>>> Regards,
>>>>> Dumitru
>>>>>
>>>>
>>>
>>
>
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev