On Wed, Jun 15, 2022 at 1:54 PM Ales Musil <[email protected]> wrote:
> Hi Daniel, > > thank you for the suggestions, please see my replies inline. > > On Wed, Jun 15, 2022 at 1:00 PM Daniel Alvarez Sanchez < > [email protected]> wrote: > >> Hey Ales, first of all thanks a lot for this series!! Great job and good >> approach. >> >> I however have some concerns but maybe they're not much of a deal, please >> let me know what you think: >> >> - Broadcast gARPs will cause all ovn-controller instances in the cluster >> to attempt to write into the SB database at the same time as they're racing >> for the ownership of the MAC_Binding entry. This can cause performance >> problems if they're frequent and/or the number of nodes is "large". >> What I'd suggest is probably a small random delay to this 'race' so that >> ovsdb-server won't need to process all the write txns at once. Also with >> this delay there's a chance that some of the nodes will receive the update >> of the ownership and abort their write attempt. >> > > This is definitely a potential problem, thank you for bringing it up. It > seems like this race was there always, even without the chassis that "owns" > the binding controllers would still try to add the same MAC binding when > the gARP arrived at multiple of them. So it is actually a broader issue, > still definitely something that would deserve to be addressed. > Indeed, I suspected that this problem existed before. It can be addressed as a follow up perhaps but thanks for the inputs. > > >> >> - Similar write 'storm' can happen (at smaller scale) if many of those >> MAC_Binding entries are added at the same time (as a consequence of a node >> with a lot of workloads booting up or similar scenario). In this case, >> there's a chance that ovn-controller(s) will attempt to delete those >> entries after they age out. Perhaps adding a random number of seconds to >> the timeout would alleviate this. IMO this is less of a problem. >> > > This might indeed be an issue, however in a real scenario it does not seem > likely that all binding from this "storm" would be removed at once. Also > the 60 sec timeout by default is probably not the best choice > so considering something like 300 secs, it would be a burst of removals > but still spaced enough apart IMO. > You're probably right. The scenario that I had in mind is a node hosting a lot of workloads with VIPs for example (could be an OpenShift/kubernetes deployed on top of OpenStack) and announcing their location with gARPs, upon reboot. All those entries may be added to the cluster right at the same time and, therefore, expire at the same time *if* they are not used. Again, more unlikely but experience tells me that these things happen :p In any case, the problem is much smaller than the first one I described, even in the worst case. > > >> >> - The 'racing' for taking ownership can lead to scenarios where the node >> that wins is not really the one communicating with that MAC anymore, so >> once it ages out, the other nodes will have to send ARP requests again. >> This is probably complicated but perhaps, before deleting the entry, we can >> mark it somehow as 'expired' for a while and other nodes can take ownership >> during this state if their idle_age < TIMEOUT. >> > > That's probably the biggest downside of this approach. I would like to > measure if it is causing any issues, if so communicating to other > ovn-controllers if someone wants to take ownership might be the way to go. > I'd say it is hard to tell what issues will cause aside from a few broadcast packets and some delays in the dataplane due to having to re-learn the MAC address. I believe that it is quite likely that the ovn-controller instance that owns the entry is not really the one communicating in unicast with that particular endpoint. The larger the cluster, the more likely this scenario is. What I was suggesting is perhaps marking it as 'expired' (external_ids:expired=timestamp). If the external_id is set, then other ovn-controllers can take ownership if idle_age < some_threshold. Taking ownership would clear the external_id:expired and change the 'chassis'. If nobody takes ownership, then the previous owner (as the 'chassis' column is still populated), will delete the entry after a certain amount of time. This complicates things a bit but saves some broadcast traffic in the network and avoid dataplane delays. > > >> >> Again, thanks a lot for this! >> daniel >> > > Thanks, > Ales > > >> >> On Tue, Jun 14, 2022 at 3:49 PM Ales Musil <[email protected]> wrote: >> >>> Add MAC binding aging mechanism, that >>> should take care of stale MAC bindings. >>> >>> The mechanism works on "ownership" of the >>> MAC binding row. The chassis that creates >>> the row is then checking if the "idle_age" >>> of the flow is over the aging threshold. >>> In that case the MAC binding is removed >>> from database. The "owner" might change >>> when another chassis saw an update of the >>> MAC address. >>> >>> This approach has downside, the chassis >>> that "owns" the MAC binding might not actually be >>> the one that is using it actively. This >>> might lead some delays in packet flow when >>> the row is removed. >>> >>> The threshold can be configured in >>> NB_global table with key "mac_binding_age_threshold" >>> in seconds with default value being 60. >>> >>> The test case is present in the last patch of the series. >>> >>> Ales Musil (5): >>> Add chassis column to MAC_Binding table >>> Add MAC binding aging mechanism >>> Add stopwatch for MAC binding aging >>> Allow the MAC binding age threshold to be configurable >>> ovn.at: Add test case covering the MAC binding aging >>> >>> controller/automake.mk | 4 +- >>> controller/mac-binding-aging.c | 230 +++++++++++++++++++++++++++++++++ >>> controller/mac-binding-aging.h | 32 +++++ >>> controller/ovn-controller.c | 32 +++++ >>> controller/pinctrl.c | 35 +++-- >>> northd/northd.c | 12 ++ >>> northd/ovn-northd.c | 2 +- >>> ovn-nb.xml | 5 + >>> ovn-sb.ovsschema | 6 +- >>> ovn-sb.xml | 5 + >>> tests/ovn.at | 164 +++++++++++++++++++++-- >>> 11 files changed, 499 insertions(+), 28 deletions(-) >>> create mode 100644 controller/mac-binding-aging.c >>> create mode 100644 controller/mac-binding-aging.h >>> >>> -- >>> 2.35.3 >>> >>> _______________________________________________ >>> dev mailing list >>> [email protected] >>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >>> >>> > > -- > > Ales Musil > > Senior Software Engineer - OVN Core > > Red Hat EMEA <https://www.redhat.com> > > [email protected] IM: amusil > <https://red.ht/sig> > _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
