On 29/07/2021 14:20, Dumitru Ceara wrote:
> On 7/29/21 11:59 AM, Dumitru Ceara wrote:
>> Logical_Flows are immutable, that is, when the match or action change
>> ovn-northd will actually remove the old logical flow and add a new one.
>>
>> Commit ecc941c70f16 added flows to flood ARPs to routers for unreachable
>> addresses.
>>
>> In the following topology (2 switches connected to a router with a load
>> balancer attached to it):
>>
>> S1 --- (IP1/MASK1) R (IP2/MASK2) --- S2
>>
>> LB1 (applied on R) with N VIPs.
>>
>> The following logical flows were generated:
>>
>> S1 (in_l2_lkup): if "arp.op == 1 && arp.tpa == { VIP1, .., VIPn, IP1 }" then 
>> ...
>> S2 (in_l2_lkup): if "arp.op == 1 && arp.tpa == { VIP1, .., VIPn, IP2 }" then 
>> ...
>>
>> While this seemed a good idea initially because it would reduce the
>> total number of logical flows, this approach has a few major downsides
>> that severely affect scalability:
>>
>> 1. When logical datapath grouping is enabled (which is how high scale
>>    deployments should use OVN) there is no chance that the flows for
>>    reachable/unreachable router owned IPs are grouped together for all
>>    datapaths that correspond to the logical switches they are generated
>>    for.
>> 2. Whenever a VIP (VIPn+1) is added to LB1 all these flows will be
>>    removed and new ones will be added with the only difference being the
>>    VIPn+1 aded to the flow match.  As this is a new logical flow record,
>>    ovn-controller will have to remove all previously installed openflows
>>    (for the deleted logical flow) and add new ones.  However, for most
>>    of these openflows, the only change is the cookie value (the first 32
>>    bits of the logical flow UUID).  At scale this unnecessary openflow
>>    churn will induce significant latency.  Moreover, this also creates
>>    unnecessary jsonrpc traffic from ovn-northd towards OVN_Southbound
>>    because the old flow needs to be removed and a new one (with a large
>>    match field) needs to be installed.
>>
>> To avoid this, we now install individual flows, one per IP, for
>> managing ARP/ND traffic from logical switches towards router owned
>> IPs that are reachable/unreachable on the logical port connecting
>> the switch.
>>
>> Fixes: ecc941c70f16 ("northd: Flood ARPs to routers for "unreachable" 
>> addresses.")
>> Signed-off-by: Dumitru Ceara <[email protected]>
>> ---
>> On a deployment based on an ovn-kubernetes cluster with 20 nodes, 2500
>> load balancer VIPs, 5000 pods, we:
>> - measure number of logical flows in the SB DB, size of the SB DB
>> - then create an additional 250 pods (logical ports), bind them, and add
>>   a load balancer VIP (with a single backend) for each of the additional
>>   pods, and measure how long it takes for all openflows corresponding to
>>   each of the pods to be installed in OVS.
>>
>> Results:
>> - without fix:
>>   ------------
>>   SB lflows:             67976
>>   SB size (uncompacted): 46 MiB
>>   SB size (compacted):   40 MiB
>>
>>   after 250 ports + VIPs added:
>>   SB lflows:             71479
>>   SB size (uncompacted): 248 MiB
>>   SB size (compacted):   42  MiB
>>   Max time to install all relevant openflows for a single pod: ~6sec
>>
>> - with fix:
>>   ---------
>>   SB lflows:             70399
>>   SB size (uncompacted): 46 MiB
>>   SB size (compacted):   40 MiB
>>
>>   after 250 ports + VIPs added:
>>   SB lflows:             74149
>>   SB size (uncompacted): 165 MiB
>>   SB size (compacted):   42 MiB
>>   Max time to install all relevant openflows for a single pod: ~100msec
> 
> 

I think you should put these^ (or a summary of these - including maybe
the northd degradation) into the commit message.

> CC: Mark Michelson (original author of ecc941c70f16 ("northd: Flood ARPs
> to routers for "unreachable" addresses."))
> 
> There's another note I should mention here:
> 
> Since we now call build_lswitch_rport_arp_req_flow_for_unreachable_ip()
> for every load balancer VIP (instead of once for a list of VIPs) load on
> ovn-northd will be increased as it has to create more logical flows that
> will eventually be grouped together on the same datapath group.
> 
> When testing at higher scales (120 nodes setups similar topologies to
> the one above but with 45K VIPs) I see ovn-northd poll loop intervals
> taking ~3s longer on my machine (from ~10s to ~13s).  I do, however,

I'm kind of surprised at that degradation because your change doesn't
introduce any additional looping. This is purely due to the cost of
ovn_lflow_add_with_hint()?

> still think that not merging the flow matches is a good thing because it
> has a very substantial impact on all ovn-controllers in the cluster,
> already visible in our tests with smaller, 20 node clusters.
> 
> A point here is that I think it's fine if
> build_lswitch_rport_arp_req_flow_for_unreachable_ip() installs flows at
> a priority lower than the one used for reachable IPs, e.g., 79.  That
> would allow us to avoid the !lrouter_port_ipv*_reachable() condition
> when installing these flows.

I don't get this point?
> 
> As a follow up optimization we can take the same approach as in
> 0fd39c6cf0ba "northd: move build_lb_rules in build_lswitch_flows_for_lb"
> and precompute the matches once for all load balancer VIPs, and just use
> them when installing the flows on the connected logical switch datapaths.
> 
> Thoughts?
> 
> Thanks,
> Dumitru
> 

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to