Hey Ales, first of all thanks a lot for this series!! Great job and good
approach.

I however have some concerns but maybe they're not much of a deal, please
let me know what you think:

- Broadcast gARPs will cause all ovn-controller instances in the cluster to
attempt to write into the SB database at the same time as they're racing
for the ownership of the MAC_Binding entry. This can cause performance
problems if they're frequent and/or the number of nodes is "large".
What I'd suggest is probably a small random delay to this 'race' so that
ovsdb-server won't need to process all the write txns at once. Also with
this delay there's a chance that some of the nodes will receive the update
of the ownership and abort their write attempt.

- Similar write 'storm' can happen (at smaller scale) if many of those
MAC_Binding entries are added at the same time (as a consequence of a node
with a lot of workloads booting up or similar scenario). In this case,
there's a chance that ovn-controller(s) will attempt to delete those
entries after they age out. Perhaps adding a random number of seconds to
the timeout would alleviate this. IMO this is less of a problem.

- The 'racing' for taking ownership can lead to scenarios where the node
that wins is not really the one communicating with that MAC anymore, so
once it ages out, the other nodes will have to send ARP requests again.
This is probably complicated but perhaps, before deleting the entry, we can
mark it somehow as 'expired' for a while and other nodes can take ownership
during this state if their idle_age < TIMEOUT.

Again, thanks a lot for this!
daniel

On Tue, Jun 14, 2022 at 3:49 PM Ales Musil <amu...@redhat.com> wrote:

> Add MAC binding aging mechanism, that
> should take care of stale MAC bindings.
>
> The mechanism works on "ownership" of the
> MAC binding row. The chassis that creates
> the row is then checking if the "idle_age"
> of the flow is over the aging threshold.
> In that case the MAC binding is removed
> from database. The "owner" might change
> when another chassis saw an update of the
> MAC address.
>
> This approach has downside, the chassis
> that "owns" the MAC binding might not actually be
> the one that is using it actively. This
> might lead some delays in packet flow when
> the row is removed.
>
> The threshold can be configured in
> NB_global table with key "mac_binding_age_threshold"
> in seconds with default value being 60.
>
> The test case is present in the last patch of the series.
>
> Ales Musil (5):
>   Add chassis column to MAC_Binding table
>   Add MAC binding aging mechanism
>   Add stopwatch for MAC binding aging
>   Allow the MAC binding age threshold to be configurable
>   ovn.at: Add test case covering the MAC binding aging
>
>  controller/automake.mk         |   4 +-
>  controller/mac-binding-aging.c | 230 +++++++++++++++++++++++++++++++++
>  controller/mac-binding-aging.h |  32 +++++
>  controller/ovn-controller.c    |  32 +++++
>  controller/pinctrl.c           |  35 +++--
>  northd/northd.c                |  12 ++
>  northd/ovn-northd.c            |   2 +-
>  ovn-nb.xml                     |   5 +
>  ovn-sb.ovsschema               |   6 +-
>  ovn-sb.xml                     |   5 +
>  tests/ovn.at                   | 164 +++++++++++++++++++++--
>  11 files changed, 499 insertions(+), 28 deletions(-)
>  create mode 100644 controller/mac-binding-aging.c
>  create mode 100644 controller/mac-binding-aging.h
>
> --
> 2.35.3
>
> _______________________________________________
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
>
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to