Hi, Ales, Daniel,

On 6/15/22 14:36, Daniel Alvarez Sanchez wrote:
> On Wed, Jun 15, 2022 at 1:54 PM Ales Musil <[email protected]> wrote:
> 
>> Hi Daniel,
>>
>> thank you for the suggestions, please see my replies inline.
>>
>> On Wed, Jun 15, 2022 at 1:00 PM Daniel Alvarez Sanchez <
>> [email protected]> wrote:
>>
>>> Hey Ales, first of all thanks a lot for this series!! Great job and good
>>> approach.
>>>
>>> I do have some concerns, however, though maybe they're not a big deal;
>>> please let me know what you think:
>>>
>>> - Broadcast gARPs will cause all ovn-controller instances in the cluster
>>> to attempt to write into the SB database at the same time as they're racing
>>> for the ownership of the MAC_Binding entry. This can cause performance
>>> problems if they're frequent and/or the number of nodes is "large".
>>> What I'd suggest is probably adding a small random delay to this 'race'
>>> so that ovsdb-server won't need to process all the write txns at once.
>>> Also, with this delay there's a chance that some of the nodes will
>>> receive the update of the ownership and abort their write attempt.
>>>
>>
>> This is definitely a potential problem, thank you for bringing it up. It
>> seems like this race has always existed: even without a chassis that
>> "owns" the binding, controllers would still try to add the same MAC
>> binding when the gARP arrived at multiple of them. So it is actually a
>> broader issue, but still definitely something that deserves to be
>> addressed.
>>
> 
> Indeed, I suspected that this problem existed before. It can be addressed
> as a follow-up, perhaps, but thanks for the input.
> 
> 

If we don't address it in this series, could you please add a TODO item
for it?
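To make the suggestion above concrete, here is a minimal sketch of the jitter idea, outside of any real OVN code: each ovn-controller would wait a small random delay before racing to insert the MAC_Binding row, spreading the SB write transactions out and giving earlier writers' ownership updates a chance to arrive first. All names and the constant below are hypothetical, not part of the series.

```c
/* Hypothetical sketch: pick a small random delay before attempting the
 * MAC_Binding insert, so that all chassis don't hit ovsdb-server at the
 * same instant after a broadcast gARP. */
#include <stdlib.h>

#define MAC_BINDING_WRITE_JITTER_MS 100  /* hypothetical upper bound */

/* Returns a per-attempt delay in milliseconds, uniformly in
 * [0, MAC_BINDING_WRITE_JITTER_MS). */
static unsigned int
mac_binding_write_delay_ms(void)
{
    return (unsigned int) (rand() % MAC_BINDING_WRITE_JITTER_MS);
}
```

During that delay, a chassis that sees another chassis's row show up via its SB monitor could simply abort its own write attempt.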

>>
>>
>>>
>>> - A similar write 'storm' can happen (at a smaller scale) if many of
>>> those MAC_Binding entries are added at the same time (as a consequence
>>> of a node with a lot of workloads booting up, or a similar scenario).
>>> In this case, there's a chance that ovn-controller(s) will attempt to
>>> delete those entries after they age out. Perhaps adding a random number
>>> of seconds to the timeout would alleviate this. IMO this is less of a
>>> problem.
>>>
>>
>> This might indeed be an issue; however, in a real scenario it does not
>> seem likely that all bindings from this "storm" would be removed at
>> once. Also, the default 60-second timeout is probably not the best
>> choice; with something like 300 seconds there would still be bursts of
>> removals, but spaced far enough apart IMO.
>>
> 
> You're probably right. The scenario that I had in mind is a node hosting a
> lot of workloads with VIPs for example (could be an OpenShift/kubernetes
> deployed on top of OpenStack) and announcing their location with gARPs,
> upon reboot. All those entries may be added to the cluster right at the
> same time and, therefore, expire at the same time *if* they are not used.
> Again, more unlikely but experience tells me that these things happen :p
> In any case, the problem is much smaller than the first one I described,
> even in the worst case.
> 
> 

I think we need, at least, to limit the maximum number of MAC bindings we
remove in a single iteration/transaction.  This is just to avoid very
large batch removals (e.g., after a long recompute in the main thread).
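Both mitigations discussed above (per-row timeout jitter and capping removals per transaction) can be sketched together; the names and constants here are hypothetical and this is standalone illustration code, not the actual patch:

```c
/* Hypothetical sketch: a per-row random jitter keeps entries created in
 * the same burst from expiring together, and a cap on removals per
 * iteration keeps a single SB transaction from deleting a huge batch
 * (e.g. after a long recompute in the main thread). */
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define AGING_JITTER_MAX_S 30       /* hypothetical jitter upper bound */
#define MAX_REMOVALS_PER_TXN 64     /* hypothetical per-txn cap */

/* An entry is expired once its idle age exceeds the base threshold plus
 * its jitter. */
static bool
mac_binding_expired(unsigned int idle_age_s, unsigned int threshold_s,
                    unsigned int jitter_s)
{
    return idle_age_s > threshold_s + jitter_s;
}

/* Scans idle ages and "removes" (here: counts) at most
 * MAX_REMOVALS_PER_TXN expired entries.  Leftover expired entries would
 * simply be picked up on the next aging iteration. */
static size_t
remove_expired(const unsigned int *idle_ages, size_t n,
               unsigned int threshold_s)
{
    size_t removed = 0;
    for (size_t i = 0; i < n && removed < MAX_REMOVALS_PER_TXN; i++) {
        unsigned int jitter = (unsigned int) (rand() % AGING_JITTER_MAX_S);
        if (mac_binding_expired(idle_ages[i], threshold_s, jitter)) {
            removed++;
        }
    }
    return removed;
}
```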

> 
>>
>>
>>>
>>> - The 'racing' for taking ownership can lead to scenarios where the node
>>> that wins is not really the one communicating with that MAC anymore, so
>>> once it ages out, the other nodes will have to send ARP requests again.
>>> This is probably complicated but perhaps, before deleting the entry, we can
>>> mark it somehow as 'expired' for a while and other nodes can take ownership
>>> during this state if their idle_age < TIMEOUT.
>>>
>>
>> That's probably the biggest downside of this approach. I would like to
>> measure whether it causes any issues; if so, communicating to the other
>> ovn-controllers so that someone can take ownership might be the way to
>> go.
>>
> 
> I'd say it is hard to tell what issues this will cause aside from a few
> broadcast packets and some delays in the dataplane due to having to
> re-learn the MAC address. I believe that it is quite likely that the
> ovn-controller instance that owns the entry is not really the one
> communicating in unicast with that particular endpoint. The larger the
> cluster, the more likely this scenario is.
> 
> What I was suggesting is perhaps marking it as 'expired'
> (external_ids:expired=timestamp). If the external_id is set, then other
> ovn-controllers can take ownership if idle_age < some_threshold. Taking
> ownership would clear external_ids:expired and change the 'chassis'. If
> nobody takes ownership, then the previous owner (as the 'chassis' column
> is still populated) will delete the entry after a certain amount of
> time. This complicates things a bit but saves some broadcast traffic in
> the network and avoids dataplane delays.
> 

It would be great if this could be implemented indeed.
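The 'expired' marking protocol described above boils down to a small per-row decision. A sketch of that decision logic, with all names, thresholds, and the enum purely hypothetical (the real implementation would act on the SB row via the IDL rather than return an action):

```c
/* Hypothetical sketch of the 'expired' marking idea: the owner marks an
 * aged-out row (external_ids:expired=<timestamp>) instead of deleting
 * it; while marked, a non-owner chassis with recent traffic to that MAC
 * may take ownership; if nobody does within a grace period, the owner
 * deletes the row. */
#include <stdbool.h>

enum mb_action {
    MB_KEEP,        /* nothing to do */
    MB_MARK,        /* owner: set external_ids:expired=now */
    MB_TAKE_OVER,   /* non-owner: claim ownership, clear the mark */
    MB_DELETE,      /* owner: grace period elapsed, delete the row */
};

static enum mb_action
mac_binding_decide(bool is_owner, bool marked_expired,
                   long long now_s, long long expired_at_s,
                   unsigned int idle_age_s,
                   unsigned int age_threshold_s,
                   unsigned int takeover_threshold_s,
                   unsigned int grace_s)
{
    if (!marked_expired) {
        if (is_owner && idle_age_s > age_threshold_s) {
            return MB_MARK;
        }
        return MB_KEEP;
    }
    if (!is_owner && idle_age_s < takeover_threshold_s) {
        return MB_TAKE_OVER;
    }
    if (is_owner && now_s - expired_at_s > grace_s) {
        return MB_DELETE;
    }
    return MB_KEEP;
}
```

Concurrent takeover attempts by several non-owners would still race on the 'chassis' column, but only among chassis that actually have fresh traffic to the MAC, which should be a much smaller set.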

>>
>>
>>>
>>> Again, thanks a lot for this!
>>> daniel
>>>
>>
>> Thanks,
>> Ales
>>

Thanks,
Dumitru

>>
>>>
>>> On Tue, Jun 14, 2022 at 3:49 PM Ales Musil <[email protected]> wrote:
>>>
>>>> Add a MAC binding aging mechanism that
>>>> should take care of stale MAC bindings.
>>>>
>>>> The mechanism is based on "ownership" of
>>>> the MAC binding row. The chassis that
>>>> creates the row then checks whether the
>>>> "idle_age" of the flow is over the aging
>>>> threshold; in that case the MAC binding
>>>> is removed from the database. The "owner"
>>>> might change when another chassis sees
>>>> an update of the MAC address.
>>>>
>>>> This approach has a downside: the
>>>> chassis that "owns" the MAC binding
>>>> might not actually be the one that is
>>>> actively using it. This might lead to
>>>> some delays in the packet flow when the
>>>> row is removed.
>>>>
>>>> The threshold can be configured in the
>>>> NB_Global table with the key
>>>> "mac_binding_age_threshold", in seconds,
>>>> with the default value being 60.
>>>>
>>>> The test case is present in the last patch of the series.
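The threshold lookup described in the cover letter (a "mac_binding_age_threshold" key with a default of 60 seconds) can be sketched as follows; this uses a plain key/value array instead of OVN's real option-map type, and the struct and function names are hypothetical:

```c
/* Hypothetical sketch: resolve the aging threshold from a set of
 * NB_Global-style options, falling back to the 60-second default the
 * cover letter mentions when the key is absent. */
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct kv {
    const char *key;
    const char *value;
};

static unsigned int
mac_binding_age_threshold(const struct kv *options, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!strcmp(options[i].key, "mac_binding_age_threshold")) {
            return (unsigned int) atoi(options[i].value);
        }
    }
    return 60; /* default from the cover letter */
}
```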
>>>>
>>>> Ales Musil (5):
>>>>   Add chassis column to MAC_Binding table
>>>>   Add MAC binding aging mechanism
>>>>   Add stopwatch for MAC binding aging
>>>>   Allow the MAC binding age threshold to be configurable
>>>>   ovn.at: Add test case covering the MAC binding aging
>>>>
>>>>  controller/automake.mk         |   4 +-
>>>>  controller/mac-binding-aging.c | 230 +++++++++++++++++++++++++++++++++
>>>>  controller/mac-binding-aging.h |  32 +++++
>>>>  controller/ovn-controller.c    |  32 +++++
>>>>  controller/pinctrl.c           |  35 +++--
>>>>  northd/northd.c                |  12 ++
>>>>  northd/ovn-northd.c            |   2 +-
>>>>  ovn-nb.xml                     |   5 +
>>>>  ovn-sb.ovsschema               |   6 +-
>>>>  ovn-sb.xml                     |   5 +
>>>>  tests/ovn.at                   | 164 +++++++++++++++++++++--
>>>>  11 files changed, 499 insertions(+), 28 deletions(-)
>>>>  create mode 100644 controller/mac-binding-aging.c
>>>>  create mode 100644 controller/mac-binding-aging.h
>>>>
>>>> --
>>>> 2.35.3
>>>>
>>>> _______________________________________________
>>>> dev mailing list
>>>> [email protected]
>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>>
>>>>
>>
>> --
>>
>> Ales Musil
>>
>> Senior Software Engineer - OVN Core
>>
>> Red Hat EMEA <https://www.redhat.com>
>>
>> [email protected]    IM: amusil
>> <https://red.ht/sig>
>>
> 

