On 2/28/23 16:28, Dumitru Ceara wrote:
> On 2/28/23 16:22, Ilya Maximets wrote:
>> On 2/28/23 16:14, Dumitru Ceara wrote:
>>> On 2/28/23 15:29, Ilya Maximets wrote:
>>>> On 2/28/23 01:03, Dumitru Ceara wrote:
>>>>> Hi Ilya,
>>>>>
>>>>> Thanks for the patch, you're right the hairpin SNAT IP processing in
>>>>> ovn-controller is extremely inefficient today.
>>>>>
>>>>> I didn't review the code closely but I do have some initial remarks below.
>>>>>
>>>>> On 2/20/23 12:42, Ilya Maximets wrote:
>>>>>> It's common to have 'hairpin_snat_ip' to be configured exactly the
>>>>>> same for each and every Load Balancer in a setup.  For example,
>>>>>> ovn-kubernetes does that, setting '169.254.169.5 fd69::5' value
>>>>>> unconditionally for every LB:
>>>>>>   
>>>>>> https://github.com/ovn-org/ovn-kubernetes/blob/cd78ae1af4657d38bdc41003a8737aa958d62b9d/go-controller/pkg/ovn/controller/services/loadbalancer.go#L314
>>>>>>
>>>>>> The current implementation of ovn-controller creates an OpenFlow
>>>>>> rule per datapath for each VIP when 'hairpin_snat_ip' is set,
>>>>>> because we need to handle the case where LBs with the same VIP are
>>>>>> applied to different datapaths and have different SNAT IPs.
>>>>>
>>>>> IMO the real issue is that we do this blindly for every datapath.  It
>>>>> should actually be only for local datapaths.  In ovn-kubernetes (which
>>>>> AFAIK is the only user of hairpin_snat_ip) there will only be a single
>>>>> local logical switch with LBs applied on it.
>>>>
>>>> True, but we will still need to check each datapath this LB is applied
>>>> to for being local, or the other way around, check that each local datapath
>>>> is in the list of datapaths this LB is applied to.
>>>> It's definitely way less work than actually creating the flow, but it
>>>> might still not be negligible.  I didn't test it, though.
>>>>
>>>
>>> Ack, it would be great if we could get some data on this.  Can we take a
>>> 500 node ovn-heater density-heavy NB database and just set
>>> hairpin_snat_ip on all load balancers to see what impact that has on
>>> ovn-controller?
>>>
>>>>>
>>>>>>
>>>>>> In large scale setups with tens of thousands of LBs and hundreds of
>>>>>> nodes, this generates millions of OpenFlow rules, making ovn-controller
>>>>>> choke and fall into an unrecoverable disconnection loop with poll
>>>>>> intervals up to 200 seconds in ovn-heater's density-heavy scenario with
>>>>>> just 120 nodes.  It also generates rules with thousands of conjunctions
>>>>>> that do not fit into a single FLOW_MOD, breaking the OpenFlow
>>>>>> management communication.
>>>>>
>>>>> These conjunctive flows were added with:
>>>>>
>>>>> 07467cfac499 ("lflow: Refactor OpenFlow snat hairpin flows")
>>>>>
>>>>> in order to reduce load in ovn-controller for the (now obsolete) use
>>>>> case ovn-kubernetes had.
>>>>>
>>>>> However, Han added:
>>>>>
>>>>> 22298fd37908 ("ovn-controller: Don't flood fill local datapaths beyond
>>>>> DGP boundary.")
>>>>>
>>>>> This allowed ovn-kubernetes to essentially bind a logical switch to
>>>>> a chassis, which means that we will have at most one local datapath
>>>>> with load balancers applied to it.  At this point, I suspect the
>>>>> original refactor patch could just be reverted.  We would have to check
>>>>> that snat-ip flows (non-conjunctive) are being added only for the local
>>>>> datapath.
>>>>
>>>> This might be OK for ovn-kubernetes.  But wouldn't this explode the
>>>> number of flows for OpenStack case?  All datapaths can be local in
>>>> OpenStack, IIUC.  So, we will go back to N_LBs * N_DPs * N_VIPs
>>>> number of OpenFlow rules.  Or am I missing something?
>>>>
>>>
>>> OpenStack doesn't set hairpin_snat_ip today.
>>
>> But a revert would impact not only the hairpin_snat_ip case, it would
>> impact all the normal VIPs too.  Wouldn't it?
>>
> 
> True, but OpenStack doesn't apply the same LB to multiple logical
> switches (*) so the actual number of OpenFlows is probably at most N_LBs
> * N_VIPs.  I suspect N_VIPs is 1 (or low) in most cases too.

I posted a "v2" with an alternative solution here:
  
https://patchwork.ozlabs.org/project/ovn/patch/20230301223600.1607778-1-i.maxim...@ovn.org/

It seems that we can preserve the current behavior for LBs without
hairpin_snat_ip and avoid affecting OpenStack.  I hope my
assumption is correct. :)
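
To put some purely illustrative numbers on the scale problem, here is a
back-of-the-envelope sketch in plain Python.  The deployment sizes below
are assumptions roughly matching the "tens of thousands of LBs and
hundreds of nodes" ballpark from the commit message, not measurements:

```python
# Back-of-the-envelope worst-case hairpin SNAT OpenFlow rule counts.
# Deployment sizes are illustrative assumptions, not measurements.

def per_datapath_rules(n_lbs, n_vips_per_lb, n_datapaths):
    """One rule per (VIP, datapath) pair: needed when LBs with the same
    VIP may use different hairpin SNAT IPs on different datapaths."""
    return n_lbs * n_vips_per_lb * n_datapaths

def shared_snat_ip_rules(n_lbs, n_vips_per_lb):
    """One rule per VIP: enough when every LB carries the same
    'hairpin_snat_ip', so the datapath no longer matters."""
    return n_lbs * n_vips_per_lb

# Hypothetical large setup: 20k LBs with one VIP each, applied across
# 200 node-local logical switches (datapaths).
print(per_datapath_rules(20_000, 1, 200))  # -> 4000000
print(shared_snat_ip_rules(20_000, 1))     # -> 20000
```

Even with a single VIP per LB, the per-datapath scheme is two to three
orders of magnitude more rules than the shared-SNAT-IP scheme.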

> 
> (*) needs fact checking
> 
>>> Even if it would change
>>> and start using this feature in the future, I don't think we'd have the
>>> same complexity.  If I understand correctly OpenStack's OVN load
>>> balancers are not applied to all datapaths.
>>>
>>>>>
>>>>> Using the database you shared offline (120 node ovn-heater
>>>>> density-heavy) I tried this (didn't test too much though) and there's no
>>>>> visible impact on ovn-controller performance.
>>>>>
>>>>>>
>>>>>> Fix that by checking that all the 'hairpin_snat_ip' options on all the
>>>>>> Load Balancers are exactly the same and using just one OpenFlow rule
>>>>>> per VIP if that's the case.  This is possible because even if LBs with
>>>>>> the same VIP are applied to different datapaths, we're still going to
>>>>>> SNAT with exactly the same hairpin SNAT IP.
>>>>>>
>>>>>> Ideally, we would only check that 'hairpin_snat_ip' is the same for
>>>>>> LBs that share a VIP, but that is not necessary for now, as the only
>>>>>> common user of this option today is ovn-kubernetes.  This can be
>>>>>> added in the future by enhancing the logic in ovn-northd, if
>>>>>> necessary.
>>>>>>
>>>>>
>>>>> What worries me about this approach is that in the case when one
>>>>> hairpin_snat_ip changes value we suddenly disable the whole optimization.
>>>>
>>>> With the current patch, yes.  With the 'ideal' solution, one would
>>>> need to create two LBs with the same VIP and different SNAT IPs.
>>>> This only makes sense if these LBs are on different datapaths, and
>>>> that can't happen in ovn-kubernetes, IIUC.
>>>>
>>>
>>> Given that (very likely) only ovn-kubernetes uses hairpin_snat_ip, can't
>>> we just do the revert + local datapaths check as a short-term fix and
>>> work on the "ideal" solution on the long-term?  It feels like that would
>>> avoid (slightly more) complex code in the interim.
>>
>> The "ideal" solution is an incremental change on top of the current patch.
>> Just a slightly more sophisticated northd check.
>>
> 
> Ok.  I'll have another look at the code as soon as I get the chance.
> 
>>>
>>>>>
>>>>> I think I'd like, if possible, to get more performance data on what
>>>>> happens if we revert 07467cfac499 ("lflow: Refactor OpenFlow snat
>>>>> hairpin flows") and only add snat ip flows for local datapaths.
>>>>>
>>>>> What do you think?
>>>>
>>>> See my concern about reverting above.
>>>>
>>>
>>> Ack, replied there.
>>>
>>>>>
>>>>>> Note:
>>>>>>
>>>>>> There seems to be a separate issue in ovn-controller.  If only Load
>>>>>> Balancer data is updated, I-P processing is going through the 'lb_data'
>>>>>> node.  And handler for that node deletes old flows and adds new ones
>>>>>> for each updated LB.  This triggers OpenFlow flow DEL + ADD.  On DEL
>>>>>> of hairpin reply learning flows in table 68, OVS cleans up all the
>>>>>> learned flows in table 69 as well, because of 'delete_learned' flag.
>>>>>
>>>>> Won't this potentially affect dataplane performance for hairpin traffic?
>>>>
>>>> It will, but since the learning flow actually changed in this case,
>>>> dropping the flows learned from it is the correct thing to do.
>>>>
>>>
>>> I see.
>>>
>>>>>
>>>>>>
>>>>>> However, when changes are happening in higher level nodes and
>>>>>> en_lflow_output_run() triggers lflow_run(), lflow_run() doesn't delete
>>>>>> current hairpin flows before re-processing them.  That leads to the
>>>>>> flow MOD instead of DEL + ADD.  Since the learning flow is not deleted,
>>>>>> but modified, that doesn't trigger removal of learned rules in OVS.
>>>>>> This is what happens when 'lb_same_hairpin_ips' flips its value.
>>>>>> The test is adjusted to change 'hairpin_snat_ip' twice, to trigger
>>>>>> 'lb_data' processing.  Otherwise, the old learned flow for the VIP
>>>>>> stays in table 69.
>>>>>>
>>>>>> This should not be a huge issue, as it's still a VIP address and it
>>>>>> should do no harm to have that learned flow.  Also, the value
>>>>>> of 'lb_same_hairpin_ips' doesn't ever flip in current ovn-kubernetes.
>>>>>>
>>>>>> Reported-at: https://bugzilla.redhat.com/2171423
>>>>>> Signed-off-by: Ilya Maximets <i.maxim...@ovn.org>
>>>>> Regards,
>>>>> Dumitru
>>>>>
>>>>
>>>
>>
> 

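As an aside, for anyone not familiar with the 'delete_learned' behavior
discussed above, here is a heavily simplified sketch of what a hairpin
learning flow looks like.  The match fields, the VIP placeholder, and
the register bit are invented for illustration; the real flows built by
lflow.c are more involved:

```
 # Table 68 (simplified): the learning flow installed by ovn-controller.
 # 'delete_learned' ties the learned flows' lifetime to this flow: an
 # OpenFlow DEL of this flow removes everything it learned in table 69,
 # while a MOD keeps the learned flows in place.
 table=68, ip, ip_dst=<VIP>,
   actions=learn(table=69, delete_learned,
                 eth_type=0x800,
                 NXM_OF_IP_SRC[]=NXM_OF_IP_DST[],
                 NXM_OF_IP_DST[]=NXM_OF_IP_SRC[],
                 load:1->NXM_NX_REG10[7])
```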