On 3/1/23 23:36, Ilya Maximets wrote:
> It's common to have 'hairpin_snat_ip' to be configured for each and
> every Load Balancer in a setup.  For example, ovn-kubernetes does that,
> setting '169.254.169.5 fd69::5' value unconditionally for every LB:
>   
> https://github.com/ovn-org/ovn-kubernetes/blob/cd78ae1af4657d38bdc41003a8737aa958d62b9d/go-controller/pkg/ovn/controller/services/loadbalancer.go#L314
> 
> Current implementation of ovn-controller will create an OpenFlow rule
> for every datapath for each VIP in case of 'hairpin_snat_ip', because
> we need to handle the case where LBs with the same VIP are applied to
> different datapaths and have different SNAT IPs.
> 
> In large scale setups with tens of thousands of LBs and hundreds of
> nodes, this generates millions of OpenFlow rules, making ovn-controller
> choke and fall into unrecoverable disconnection loop with poll
> intervals up to 200 seconds in ovn-heater's density-heavy scenario with
> just 120 nodes.  It also generates rules with thousands of conjunctions
> that do not fit into a single FLOW_MOD, breaking the OpenFlow
> management communication.
> 
> Fix that by partially reverting cited commit for Load Balancers with
> 'hairpin_snat_ip' specified.  This will allow us to avoid construction
> of OpenFlow rules that do not fit into a single OpenFlow message.
> In order to fight the number of flows we have to generate we will
> generate flows for local datapaths only.  In modern ovn-kubernetes
> setups, in general, there should be only one local datapath.
> 
> Optimization for Load Balancers without 'hairpin_snat_ip' specified
> can still be preserved by keeping these wider flows at a lower priority.
> The same way as we do now.  That allows to limit the blast radius of
> this change.  E.g. OpenStack and older ovn-kubernetes deployments will
> not be affected.
> 
> Fixes: 07467cfac499 ("lflow: Refactor OpenFlow snat hairpin flows")
> Reported-at: https://bugzilla.redhat.com/2171423
> Suggested-by: Dumitru Ceara <[email protected]>
> Signed-off-by: Ilya Maximets <[email protected]>
> ---
> 
> Version 2:
> 
>   - This is less of a v2, but an alternative solution.  A modified
>     version of what Dumitru suggested during review of 1.  Difference
>     is that it's a partial revert of the cited commit, not a full one.
>     i.e. optimization preserved for a case without hairpin_snat_ip.
> 
>   - Posting now due to pressing time constraints for a release.
>     I didn't finish high-scale performance tests yet.  I'll post
>     results for that tomorrow.  Preliminary ovn-heater run with
>     120 nodes density-heavy worked fine without issues.  500 node
>     tests are queued.

500-node tests finished.  And I see almost no performance difference
between cases with and without hairpin_snat_ip and the current main
branch without hairpin_snat_ip.  All 3 setups have the same amount of
OF rules in ovn-k-ish setups with ovn-heater.  CPU usage is approximately
on the same level as well.

Best regards, Ilya Maximets.

> 
>   - Unfortunately, ovn-kubernetes tests are fialing in CI on ovn-k
>     continer build stage, so I didn't run that check.
> 
>  controller/lflow.c          | 304 +++++++++++-------------------------
>  controller/lflow.h          |   2 -
>  controller/ovn-controller.c |  14 --
>  3 files changed, 90 insertions(+), 230 deletions(-)

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to