On 3/1/23 23:36, Ilya Maximets wrote: > It's common to have 'hairpin_snat_ip' to be configured for each and > every Load Balancer in a setup. For example, ovn-kubernetes does that, > setting '169.254.169.5 fd69::5' value unconditionally for every LB: > > https://github.com/ovn-org/ovn-kubernetes/blob/cd78ae1af4657d38bdc41003a8737aa958d62b9d/go-controller/pkg/ovn/controller/services/loadbalancer.go#L314 > > Current implementation of ovn-controller will create an OpenFlow rule > for every datapath for each VIP in case of 'hairpin_snat_ip', because > we need to handle the case where LBs with the same VIP are applied to > different datapaths and have different SNAT IPs. > > In large scale setups with tens of thousands of LBs and hundreds of > nodes, this generates millions of OpenFlow rules, making ovn-controller > choke and fall into unrecoverable disconnection loop with poll > intervals up to 200 seconds in ovn-heater's density-heavy scenario with > just 120 nodes. It also generates rules with thousands of conjunctions > that do not fit into a single FLOW_MOD, breaking the OpenFlow > management communication. > > Fix that by partially reverting cited commit for Load Balancers with > 'hairpin_snat_ip' specified. This will allow us to avoid construction > of OpenFlow rules that do not fit into a single OpenFlow message. > In order to fight the number of flows we have to generate we will > generate flows for local datapaths only. In modern ovn-kubernetes > setups, in general, there should be only one local datapath. > > Optimization for Load Balancers without 'hairpin_snat_ip' specified > can still be preserved by keeping these wider flows at a lower priority. > The same way as we do now. That allows to limit the blast radius of > this change. E.g. OpenStack and older ovn-kubernetes deployments will > not be affected. > > Fixes: 07467cfac499 ("lflow: Refactor OpenFlow snat hairpin flows") > Reported-at: https://bugzilla.redhat.com/2171423 > Suggested-by: Dumitru Ceara <[email protected]> > Signed-off-by: Ilya Maximets <[email protected]> > --- > > Version 2: > > - This is less of a v2, but an alternative solution. A modified > version of what Dumitru suggested during review of 1. Difference > is that it's a partial revert of the cited commit, not a full one. > i.e. optimization preserved for a case without hairpin_snat_ip. > > - Posting now due to pressing time constraints for a release. > I didn't finish high-scale performance tests yet. I'll post > results for that tomorrow. Preliminary ovn-heater run with > 120 nodes density-heavy worked fine without issues. 500 node > tests are queued.
500-node tests finished. And I see almost no performance difference between cases with and without hairpin_snat_ip and the current main branch without hairpin_snat_ip. All 3 setups have the same amount of OF rules in ovn-k-ish setups with ovn-heater. CPU usage is approximately on the same level as well. Best regards, Ilya Maximets. > > - Unfortunately, ovn-kubernetes tests are fialing in CI on ovn-k > continer build stage, so I didn't run that check. > > controller/lflow.c | 304 +++++++++++------------------------- > controller/lflow.h | 2 - > controller/ovn-controller.c | 14 -- > 3 files changed, 90 insertions(+), 230 deletions(-) _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
