On 7/29/21 11:59 AM, Dumitru Ceara wrote:
> Logical_Flows are immutable, that is, when the match or actions
> change, ovn-northd removes the old logical flow and adds a new one.
> 
> Commit ecc941c70f16 added flows to flood ARPs to routers for unreachable
> addresses.
> 
> In the following topology (2 switches connected to a router with a load
> balancer attached to it):
> 
> S1 --- (IP1/MASK1) R (IP2/MASK2) --- S2
> 
> LB1 (applied on R) with N VIPs.
> 
> The following logical flows were generated:
> 
> S1 (in_l2_lkup): if "arp.op == 1 && arp.tpa == { VIP1, .., VIPn, IP1 }" then ...
> S2 (in_l2_lkup): if "arp.op == 1 && arp.tpa == { VIP1, .., VIPn, IP2 }" then ...
> 
> While this seemed a good idea initially because it would reduce the
> total number of logical flows, this approach has a few major downsides
> that severely affect scalability:
> 
> 1. When logical datapath grouping is enabled (which is how high scale
>    deployments should use OVN) the flows for reachable/unreachable
>    router owned IPs can never be grouped together for all datapaths
>    that correspond to the logical switches they are generated for,
>    because each switch's match includes its own router port IP.
> 2. Whenever a VIP (VIPn+1) is added to LB1 all these flows will be
>    removed and new ones will be added with the only difference being the
>    VIPn+1 added to the flow match.  As this is a new logical flow record,
>    ovn-controller will have to remove all previously installed openflows
>    (for the deleted logical flow) and add new ones.  However, for most
>    of these openflows, the only change is the cookie value (the first 32
>    bits of the logical flow UUID).  At scale this unnecessary openflow
>    churn will induce significant latency.  Moreover, this also creates
>    unnecessary jsonrpc traffic from ovn-northd towards OVN_Southbound
>    because the old flow needs to be removed and a new one (with a large
>    match field) needs to be installed.
> 
> To avoid this, we now install individual flows, one per IP, for
> managing ARP/ND traffic from logical switches towards router owned
> IPs that are reachable/unreachable on the logical port connecting
> the switch.
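
The difference between the two approaches can be sketched in a few lines
(purely illustrative Python; the UUID derivation below is a stand-in for
the real lflow identity, and the "first 32 bits of the lflow UUID" cookie
detail comes from the commit message above):

```python
# Illustrative sketch: a logical flow's identity is derived from its
# contents, mimicking the fact that lflow records are immutable.
import uuid

def lflow_uuid(match):
    # Stand-in only: changing the match yields a brand-new record
    # (and therefore a brand-new UUID / openflow cookie).
    return uuid.uuid5(uuid.NAMESPACE_OID, match)

vips = ["10.0.0.%d" % i for i in range(1, 4)]

# Aggregated approach: one match listing every VIP.
agg_old = "arp.op == 1 && arp.tpa == { %s }" % ", ".join(vips)
agg_new = "arp.op == 1 && arp.tpa == { %s }" % ", ".join(vips + ["10.0.0.4"])
# Adding one VIP produces a new lflow record, so ovn-controller must
# delete and reinstall every corresponding openflow.
assert lflow_uuid(agg_old) != lflow_uuid(agg_new)

# Per-IP approach: adding a VIP only adds one flow; the existing flow
# records (and their openflows) are untouched.
per_ip_old = {"arp.op == 1 && arp.tpa == %s" % v for v in vips}
per_ip_new = {"arp.op == 1 && arp.tpa == %s" % v for v in vips + ["10.0.0.4"]}
assert per_ip_old < per_ip_new  # strict subset: no openflow churn
```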
> 
> Fixes: ecc941c70f16 ("northd: Flood ARPs to routers for "unreachable" addresses.")
> Signed-off-by: Dumitru Ceara <[email protected]>
> ---
> On a deployment based on an ovn-kubernetes cluster with 20 nodes, 2500
> load balancer VIPs, 5000 pods, we:
> - measure number of logical flows in the SB DB, size of the SB DB
> - then create an additional 250 pods (logical ports), bind them, and add
>   a load balancer VIP (with a single backend) for each of the additional
>   pods, and measure how long it takes for all openflows corresponding to
>   each of the pods to be installed in OVS.
> 
> Results:
> - without fix:
>   ------------
>   SB lflows:             67976
>   SB size (uncompacted): 46 MiB
>   SB size (compacted):   40 MiB
> 
>   after 250 ports + VIPs added:
>   SB lflows:             71479
>   SB size (uncompacted): 248 MiB
>   SB size (compacted):   42 MiB
>   Max time to install all relevant openflows for a single pod: ~6sec
> 
> - with fix:
>   ---------
>   SB lflows:             70399
>   SB size (uncompacted): 46 MiB
>   SB size (compacted):   40 MiB
> 
>   after 250 ports + VIPs added:
>   SB lflows:             74149
>   SB size (uncompacted): 165 MiB
>   SB size (compacted):   42 MiB
>   Max time to install all relevant openflows for a single pod: ~100msec


CC: Mark Michelson (original author of ecc941c70f16 ("northd: Flood ARPs
to routers for "unreachable" addresses."))

There's another note I should mention here:

Since we now call build_lswitch_rport_arp_req_flow_for_unreachable_ip()
for every load balancer VIP (instead of once for a list of VIPs), the
load on ovn-northd increases: it has to create more logical flows, which
are eventually grouped together in the same datapath group.

When testing at higher scales (120 node setups with topologies similar
to the one above but with 45K VIPs) I see ovn-northd poll loop intervals
taking ~3s longer on my machine (from ~10s to ~13s).  I do, however,
still think that not merging the flow matches is a good thing because it
has a very substantial impact on all ovn-controllers in the cluster,
already visible in our tests with smaller, 20 node clusters.

A point here is that I think it's fine if
build_lswitch_rport_arp_req_flow_for_unreachable_ip() installs flows at
a priority lower than the one used for reachable IPs, e.g., 79.  That
would allow us to avoid the !lrouter_port_ipv*_reachable() condition
when installing these flows.
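
To illustrate that priority point (a hypothetical sketch; all names and
numbers here are illustrative only, apart from the suggested lower
priority 79, and this is not the actual ovn-northd code):

```python
# With unreachable-IP flows installed at a lower priority than
# reachable-IP flows, the match no longer needs an explicit
# !reachable condition: priority ordering resolves it.
reachable_ips = {"10.0.0.1"}

flows = [
    # (priority, match predicate, action)
    (80, lambda ip: ip in reachable_ips, "forward to router port"),
    # Lower priority: no negated reachability check needed, because
    # the higher-priority reachable flow always wins first.
    (79, lambda ip: True, "flood to router ports"),
]

def lookup(ip):
    # Highest-priority matching flow wins, as in a logical flow table.
    for prio, match, action in sorted(flows, key=lambda f: f[0], reverse=True):
        if match(ip):
            return action
    return "drop"

assert lookup("10.0.0.1") == "forward to router port"   # reachable IP
assert lookup("192.0.2.10") == "flood to router ports"  # unreachable IP
```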

As a follow up optimization we can take the same approach as in
0fd39c6cf0ba ("northd: move build_lb_rules in build_lswitch_flows_for_lb")
and precompute the matches once for all load balancer VIPs, and just use
them when installing the flows on the connected logical switch datapaths.
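
A rough sketch of that follow-up (hypothetical; names and values are
illustrative, and this is not the actual ovn-northd code):

```python
# Build each VIP's ARP match string once, then reuse it for every
# connected logical switch datapath instead of re-formatting the match
# inside the per-switch loop.
vips = ["10.0.0.1", "10.0.0.2"]
switches = ["S1", "S2"]

# Precompute once, outside the per-datapath loop.
matches = {vip: "arp.op == 1 && arp.tpa == %s" % vip for vip in vips}

flows = []
for sw in switches:
    for vip in vips:
        # No per-switch string building here; just reuse the match.
        flows.append((sw, matches[vip]))

assert len(flows) == len(switches) * len(vips)
```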

Thoughts?

Thanks,
Dumitru

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
