On 3/2/23 05:24, Dumitru Ceara wrote:
On 3/2/23 11:15, Ilya Maximets wrote:
On 3/1/23 23:36, Ilya Maximets wrote:
It's common to have 'hairpin_snat_ip' to be configured for each and
every Load Balancer in a setup.  For example, ovn-kubernetes does that,
setting '169.254.169.5 fd69::5' value unconditionally for every LB:
   
https://github.com/ovn-org/ovn-kubernetes/blob/cd78ae1af4657d38bdc41003a8737aa958d62b9d/go-controller/pkg/ovn/controller/services/loadbalancer.go#L314

Current implementation of ovn-controller will create an OpenFlow rule
for every datapath for each VIP in case of 'hairpin_snat_ip', because
we need to handle the case where LBs with the same VIP are applied to
different datapaths and have different SNAT IPs.

In large scale setups with tens of thousands of LBs and hundreds of
nodes, this generates millions of OpenFlow rules, making ovn-controller
choke and fall into unrecoverable disconnection loop with poll
intervals up to 200 seconds in ovn-heater's density-heavy scenario with
just 120 nodes.  It also generates rules with thousands of conjunctions
that do not fit into a single FLOW_MOD, breaking the OpenFlow
management communication.

Fix that by partially reverting cited commit for Load Balancers with
'hairpin_snat_ip' specified.  This will allow us to avoid construction
of OpenFlow rules that do not fit into a single OpenFlow message.
In order to fight the number of flows we have to generate we will
generate flows for local datapaths only.  In modern ovn-kubernetes
setups, in general, there should be only one local datapath.

Optimization for Load Balancers without 'hairpin_snat_ip' specified
can still be preserved by keeping these wider flows at a lower priority.
The same way as we do now.  That allows to limit the blast radius of
this change.  E.g. OpenStack and older ovn-kubernetes deployments will
not be affected.

Fixes: 07467cfac499 ("lflow: Refactor OpenFlow snat hairpin flows")
Reported-at: https://bugzilla.redhat.com/2171423
Suggested-by: Dumitru Ceara <[email protected]>
Signed-off-by: Ilya Maximets <[email protected]>
---

Version 2:

   - This is less of a v2, but an alternative solution.  A modified
     version of what Dumitru suggested during review of 1.  Difference
     is that it's a partial revert of the cited commit, not a full one.
     i.e. optimization preserved for a case without hairpin_snat_ip.

   - Posting now due to pressing time constraints for a release.
     I didn't finish high-scale performance tests yet.  I'll post
     results for that tomorrow.  Preliminary ovn-heater run with
     120 nodes density-heavy worked fine without issues.  500 node
     tests are queued.

500-node tests finished.  And I see almost no performance difference
between cases with and without hairpin_snat_ip and the current main
branch without hairpin_snat_ip.  All 3 setups have the same amount of
OF rules in ovn-k-ish setups with ovn-heater.  CPU usage is approximately
on the same level as well.


Nice!  I'm also running this through the ovn-kubernetes CI but I expect
it to work fine:

https://github.com/dceara/ovn/actions/runs/4312305009

Acked-by: Dumitru Ceara <[email protected]>

Thanks a lot for fixing this!

Regards,
Dumitru

I applied this to main and branch-23.03. I attempted to apply it to branch-22.12 but it did not apply cleanly. If this needs to be backported to other branches, then we will need patches for the affected branches.

Thanks Ilya and Dumitru!




_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to