On 29/07/2021 17:16, Dumitru Ceara wrote:
> On 7/29/21 5:42 PM, Mark Gray wrote:
>> On 29/07/2021 14:20, Dumitru Ceara wrote:
>>> On 7/29/21 11:59 AM, Dumitru Ceara wrote:
>>>> Logical_Flows are immutable, that is, when the match or action change
>>>> ovn-northd will actually remove the old logical flow and add a new one.
>>>>
>>>> Commit ecc941c70f16 added flows to flood ARPs to routers for unreachable
>>>> addresses.
>>>>
>>>> In the following topology (2 switches connected to a router with a load
>>>> balancer attached to it):
>>>>
>>>> S1 --- (IP1/MASK1) R (IP2/MASK2) --- S2
>>>>
>>>> LB1 (applied on R) with N VIPs.
>>>>
>>>> The following logical flows were generated:
>>>>
>>>> S1 (in_l2_lkup): if "arp.op == 1 && arp.tpa == { VIP1, .., VIPn, IP1 }" 
>>>> then ...
>>>> S2 (in_l2_lkup): if "arp.op == 1 && arp.tpa == { VIP1, .., VIPn, IP2 }" 
>>>> then ...
>>>>
>>>> While this seemed a good idea initially because it would reduce the
>>>> total number of logical flows, this approach has a few major downsides
>>>> that severely affect scalability:
>>>>
>>>> 1. When logical datapath grouping is enabled (which is how high scale
>>>>    deployments should use OVN) there is no chance that the flows for
>>>>    reachable/unreachable router owned IPs are grouped together for all
>>>>    datapaths that correspond to the logical switches they are generated
>>>>    for.
>>>> 2. Whenever a VIP (VIPn+1) is added to LB1 all these flows will be
>>>>    removed and new ones will be added with the only difference being the
>>>>    VIPn+1 aded to the flow match.  As this is a new logical flow record,
>>>>    ovn-controller will have to remove all previously installed openflows
>>>>    (for the deleted logical flow) and add new ones.  However, for most
>>>>    of these openflows, the only change is the cookie value (the first 32
>>>>    bits of the logical flow UUID).  At scale this unnecessary openflow
>>>>    churn will induce significant latency.  Moreover, this also creates
>>>>    unnecessary jsonrpc traffic from ovn-northd towards OVN_Southbound
>>>>    because the old flow needs to be removed and a new one (with a large
>>>>    match field) needs to be installed.
>>>>
>>>> To avoid this, we now install individual flows, one per IP, for
>>>> managing ARP/ND traffic from logical switches towards router owned
>>>> IPs that are reachable/unreachable on the logical port connecting
>>>> the switch.
>>>>
>>>> Fixes: ecc941c70f16 ("northd: Flood ARPs to routers for "unreachable" 
>>>> addresses.")
>>>> Signed-off-by: Dumitru Ceara <[email protected]>
>>>> ---
>>>> On a deployment based on an ovn-kubernetes cluster with 20 nodes, 2500
>>>> load balancer VIPs, 5000 pods, we:
>>>> - measure number of logical flows in the SB DB, size of the SB DB
>>>> - then create an additional 250 pods (logical ports), bind them, and add
>>>>   a load balancer VIP (with a single backend) for each of the additional
>>>>   pods, and measure how long it takes for all openflows corresponding to
>>>>   each of the pods to be installed in OVS.
>>>>
>>>> Results:
>>>> - without fix:
>>>>   ------------
>>>>   SB lflows:             67976
>>>>   SB size (uncompacted): 46 MiB
>>>>   SB size (compacted):   40 MiB
>>>>
>>>>   after 250 ports + VIPs added:
>>>>   SB lflows:             71479
>>>>   SB size (uncompacted): 248 MiB
>>>>   SB size (compacted):   42  MiB
>>>>   Max time to install all relevant openflows for a single pod: ~6sec
>>>>
>>>> - with fix:
>>>>   ---------
>>>>   SB lflows:             70399
>>>>   SB size (uncompacted): 46 MiB
>>>>   SB size (compacted):   40 MiB
>>>>
>>>>   after 250 ports + VIPs added:
>>>>   SB lflows:             74149
>>>>   SB size (uncompacted): 165 MiB
>>>>   SB size (compacted):   42 MiB
>>>>   Max time to install all relevant openflows for a single pod: ~100msec
>>>
>>>
>>
>> I think you should put these^ (or a summary of these - including maybe
>> the northd degradation) into the commit message.
>>
> 
> Sure, I'll do that in the next revision.

If you are going to revise it, in the comments for a lot of the
functions in northd.c, it references "Ingress Table 19". Can you change
this? It is actually table 22 now.

> 
>>> CC: Mark Michelson (original author of ecc941c70f16 ("northd: Flood ARPs
>>> to routers for "unreachable" addresses."))
>>>
>>> There's another note I should mention here:
>>>
>>> Since we now call build_lswitch_rport_arp_req_flow_for_unreachable_ip()
>>> for every load balancer VIP (instead of once for a list of VIPs) load on
>>> ovn-northd will be increased as it has to create more logical flows that
>>> will eventually be grouped together on the same datapath group.
>>>
>>> When testing at higher scales (120 nodes setups similar topologies to
>>> the one above but with 45K VIPs) I see ovn-northd poll loop intervals
>>> taking ~3s longer on my machine (from ~10s to ~13s).  I do, however,
>>
>> I'm kind of surprised at that degradation because your change doesn't
>> introduce any additional looping. This is purely due to the cost of
>> ovn_lflow_add_with_hint()?
> 
> Well, yes, and building of the match.  I commented it out and the hit is
> gone.
> 
>>
>>> still think that not merging the flow matches is a good thing because it
>>> has a very substantial impact on all ovn-controllers in the cluster,
>>> already visible in our tests with smaller, 20 node clusters.
>>>
>>> A point here is that I think it's fine if
>>> build_lswitch_rport_arp_req_flow_for_unreachable_ip() installs flows at
>>> a priority lower than the one used for reachable IPs, e.g., 79.  That
>>> would allow us to avoid the !lrouter_port_ipv*_reachable() condition
>>> when installing these flows.
>>
>> I don't get this point?
> 
> In my opinion we don't need to do:
> 
> if (port_reachable(op)) {
>     add_reachable_flows(prio=80);
> } else {
>     add_unreachable_flows(prio=90);
> }
> 
> We can probably just do:
> 
> if (port_reachable(op)) {
>     add_reachable_flows(prio=80);
> }
> add_unreachable_flows(prio=79);
> 
> The idea being that a packet is either for a "reachable IP" in which
> case we hit the prio 80 flow, or it's not, in which case we hit the prio
> 79 flow.


Ok if add_unreachable_flows() adds flows for all the IPs then this
should work.

> 
> The advantage is we don't have the port dependency for
> add_unreachable_flows() anymore which would allow us to prebuild all
> possible the matches/actions once and just reuse them for all logical
> switches where they need to be added.
> 
> Thanks,
> Dumitru
> 
>>>
>>> As a follow up optimization we can take the same approach as in
>>> 0fd39c6cf0ba "northd: move build_lb_rules in build_lswitch_flows_for_lb"
>>> and precompute the matches once for all load balancer VIPs, and just use
>>> them when installing the flows on the connected logical switch datapaths.
>>>
>>> Thoughts?
>>>
>>> Thanks,
>>> Dumitru
>>>
>>
> 

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to