Hi Ales, Daniel,

On 6/15/22 14:36, Daniel Alvarez Sanchez wrote:
> On Wed, Jun 15, 2022 at 1:54 PM Ales Musil <[email protected]> wrote:
>
>> Hi Daniel,
>>
>> Thank you for the suggestions, please see my replies inline.
>>
>> On Wed, Jun 15, 2022 at 1:00 PM Daniel Alvarez Sanchez
>> <[email protected]> wrote:
>>
>>> Hey Ales, first of all thanks a lot for this series!! Great job and a
>>> good approach.
>>>
>>> However, I have some concerns, though maybe they're not much of a
>>> deal; please let me know what you think:
>>>
>>> - Broadcast gARPs will cause all ovn-controller instances in the
>>> cluster to attempt to write into the SB database at the same time, as
>>> they're racing for the ownership of the MAC_Binding entry. This can
>>> cause performance problems if they're frequent and/or the number of
>>> nodes is "large". What I'd suggest is probably adding a small random
>>> delay to this 'race' so that ovsdb-server won't need to process all
>>> the write txns at once. Also, with this delay there's a chance that
>>> some of the nodes will receive the ownership update and abort their
>>> write attempt.
>>>
>>
>> This is definitely a potential problem, thank you for bringing it up.
>> It seems like this race was always there: even without the chassis
>> that "owns" the binding, controllers would still try to add the same
>> MAC binding when the gARP arrived at multiple of them. So it is
>> actually a broader issue, but still definitely something that deserves
>> to be addressed.
>>
>
> Indeed, I suspected that this problem existed before. It can be
> addressed as a follow-up perhaps, but thanks for the inputs.
>

If we don't do it in this series, could you please add a TODO item for
this?
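
For illustration only, the kind of jittered write Daniel describes could
look roughly like the sketch below. Everything in it is invented for the
example (the pending_mac_binding struct, the sb_mac_binding_*() helpers,
the 500 ms cap); real ovn-controller code would go through the SB IDL and
use OVS's random helpers (lib/random.h) rather than rand().

    /*
     * Sketch only: one way the jitter could look.  The struct and the
     * sb_mac_binding_*() helpers are invented for this example.
     */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdlib.h>
    #include <time.h>

    #define MAC_BINDING_WRITE_JITTER_MS 500   /* Assumed upper bound. */

    struct pending_mac_binding {
        char ip[64];            /* IP learned from the (g)ARP. */
        char mac[32];           /* MAC learned from the (g)ARP. */
        int64_t commit_at_ms;   /* Earliest time we may write to the SB. */
    };

    static int64_t
    now_ms(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (int64_t) ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
    }

    /* Called when a (g)ARP is learned: pick a random deadline instead of
     * writing the MAC_Binding row immediately. */
    static void
    queue_mac_binding(struct pending_mac_binding *pmb)
    {
        pmb->commit_at_ms = now_ms() + rand() % MAC_BINDING_WRITE_JITTER_MS;
    }

    /* Hypothetical helpers standing in for SB IDL lookups/inserts. */
    extern bool sb_mac_binding_exists(const char *ip, const char *mac);
    extern void sb_mac_binding_insert(const char *ip, const char *mac);

    /* Called from the main loop: commit only once the deadline has passed
     * and nobody else won the race in the meantime. */
    static void
    maybe_commit_mac_binding(struct pending_mac_binding *pmb)
    {
        if (now_ms() < pmb->commit_at_ms) {
            return;     /* Still waiting out the jitter. */
        }
        if (!sb_mac_binding_exists(pmb->ip, pmb->mac)) {
            sb_mac_binding_insert(pmb->ip, pmb->mac);
        }
        /* The caller would drop the pending entry here either way. */
    }

The point is simply that each chassis picks its own deadline, so the SB
writes triggered by a single broadcast gARP get spread out, and a chassis
that loses the race can notice the existing row and skip its own write.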

>>
>>> - A similar write 'storm' can happen (at a smaller scale) if many of
>>> those MAC_Binding entries are added at the same time (as a
>>> consequence of a node with a lot of workloads booting up, or a
>>> similar scenario). In this case, there's a chance that
>>> ovn-controller(s) will attempt to delete those entries after they age
>>> out. Perhaps adding a random number of seconds to the timeout would
>>> alleviate this. IMO this is less of a problem.
>>>
>>
>> This might indeed be an issue; however, in a real scenario it does not
>> seem likely that all bindings from this "storm" would be removed at
>> once. Also, the default 60 sec timeout is probably not the best
>> choice; with something like 300 secs there would still be a burst of
>> removals, but they would be spaced far enough apart IMO.
>>
>
> You're probably right. The scenario that I had in mind is a node
> hosting a lot of workloads with VIPs, for example (could be an
> OpenShift/Kubernetes deployed on top of OpenStack), announcing their
> location with gARPs upon reboot. All those entries may be added to the
> cluster right at the same time and, therefore, expire at the same time
> *if* they are not used. Again, more unlikely, but experience tells me
> that these things happen :p
> In any case, the problem is much smaller than the first one I
> described, even in the worst case.
>

I think we need, at least, to limit the maximum number of MAC bindings we
remove in a single iteration/transaction. This is just to avoid very
large batch removals (e.g., after a long recompute in the main thread).

>>
>>> - The 'racing' for taking ownership can lead to scenarios where the
>>> node that wins is not really the one communicating with that MAC
>>> anymore, so once it ages out, the other nodes will have to send ARP
>>> requests again. This is probably complicated, but perhaps, before
>>> deleting the entry, we can mark it somehow as 'expired' for a while
>>> and other nodes can take ownership during this state if their
>>> idle_age < TIMEOUT.
>>>
>>
>> That's probably the biggest downside of this approach. I would like to
>> measure whether it is causing any issues; if so, asking the other
>> ovn-controllers whether someone wants to take ownership might be the
>> way to go.
>>
>
> I'd say it is hard to tell what issues it will cause aside from a few
> broadcast packets and some delays in the dataplane due to having to
> re-learn the MAC address. I believe that it is quite likely that the
> ovn-controller instance that owns the entry is not really the one
> communicating in unicast with that particular endpoint. The larger the
> cluster, the more likely this scenario is.
>
> What I was suggesting is perhaps marking it as 'expired'
> (external_ids:expired=timestamp). If the external_id is set, then other
> ovn-controllers can take ownership if idle_age < some_threshold. Taking
> ownership would clear external_ids:expired and change the 'chassis'. If
> nobody takes ownership, then the previous owner (as the 'chassis'
> column is still populated) will delete the entry after a certain amount
> of time. This complicates things a bit, but saves some broadcast
> traffic in the network and avoids dataplane delays.
>

It would be great if this could be implemented indeed.
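
To make the hand-over idea a bit more concrete, here is a very rough
sketch; the struct, the field names and the 300 s / 30 s values are
invented for this example and are not part of the series.

    /*
     * Sketch of the "expired" hand-over.  The struct, the field names and
     * the thresholds are invented for this example; a real implementation
     * would act on the SB MAC_Binding row via the IDL.
     */
    #include <stdint.h>
    #include <stdio.h>

    #define AGING_THRESHOLD_S 300   /* Assumed aging threshold. */
    #define TAKEOVER_GRACE_S  30    /* Window for others to claim the row. */

    struct mac_binding_row {
        char chassis[64];           /* Current "owner" (chassis column). */
        int64_t expired_at;         /* 0 = not marked, else timestamp (s),
                                     * i.e. external_ids:expired. */
    };

    /* Owner side: instead of deleting the aged-out row right away, mark it
     * and give the other chassis a chance to claim it. */
    static void
    owner_age_out(struct mac_binding_row *row, int64_t now)
    {
        if (!row->expired_at) {
            row->expired_at = now;
        } else if (now - row->expired_at > TAKEOVER_GRACE_S) {
            /* Nobody claimed it: this is where the previous owner would
             * delete the SB row. */
        }
    }

    /* Any other chassis: claim the row if it still sees recent traffic
     * for this MAC (its local idle_age is below the threshold). */
    static void
    maybe_take_over(struct mac_binding_row *row, const char *self,
                    int64_t local_idle_age_s)
    {
        if (row->expired_at && local_idle_age_s < AGING_THRESHOLD_S) {
            snprintf(row->chassis, sizeof row->chassis, "%s", self);
            row->expired_at = 0;    /* Clear external_ids:expired. */
        }
    }

The nice property is that the row only disappears if neither the owner
nor any other chassis has seen recent traffic for that MAC, at the cost
of keeping the row around for one extra grace period.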

>>
>>> Again, thanks a lot for this!
>>> daniel
>>>
>>
>> Thanks,
>> Ales
>>

Thanks,
Dumitru

>>
>>>
>>> On Tue, Jun 14, 2022 at 3:49 PM Ales Musil <[email protected]> wrote:
>>>
>>>> Add a MAC binding aging mechanism that
>>>> should take care of stale MAC bindings.
>>>>
>>>> The mechanism works on "ownership" of the
>>>> MAC binding row. The chassis that creates
>>>> the row then checks whether the "idle_age"
>>>> of the flow is over the aging threshold.
>>>> In that case the MAC binding is removed
>>>> from the database. The "owner" might change
>>>> when another chassis sees an update of the
>>>> MAC address.
>>>>
>>>> This approach has a downside: the chassis
>>>> that "owns" the MAC binding might not actually
>>>> be the one that is actively using it. This
>>>> might lead to some delays in the packet flow
>>>> when the row is removed.
>>>>
>>>> The threshold can be configured in the
>>>> NB_Global table with the key "mac_binding_age_threshold",
>>>> in seconds, with the default value being 60.
>>>>
>>>> The test case is present in the last patch of the series.
>>>>
>>>> Ales Musil (5):
>>>>   Add chassis column to MAC_Binding table
>>>>   Add MAC binding aging mechanism
>>>>   Add stopwatch for MAC binding aging
>>>>   Allow the MAC binding age threshold to be configurable
>>>>   ovn.at: Add test case covering the MAC binding aging
>>>>
>>>>  controller/automake.mk         |   4 +-
>>>>  controller/mac-binding-aging.c | 230 +++++++++++++++++++++++++++++++++
>>>>  controller/mac-binding-aging.h |  32 +++++
>>>>  controller/ovn-controller.c    |  32 +++++
>>>>  controller/pinctrl.c           |  35 +++--
>>>>  northd/northd.c                |  12 ++
>>>>  northd/ovn-northd.c            |   2 +-
>>>>  ovn-nb.xml                     |   5 +
>>>>  ovn-sb.ovsschema               |   6 +-
>>>>  ovn-sb.xml                     |   5 +
>>>>  tests/ovn.at                   | 164 +++++++++++++++++++++--
>>>>  11 files changed, 499 insertions(+), 28 deletions(-)
>>>>  create mode 100644 controller/mac-binding-aging.c
>>>>  create mode 100644 controller/mac-binding-aging.h
>>>>
>>>> --
>>>> 2.35.3
>>>>
>>
>>
>> --
>>
>> Ales Musil
>>
>> Senior Software Engineer - OVN Core
>>
>> Red Hat EMEA <https://www.redhat.com>
>>
>> [email protected] IM: amusil
>> <https://red.ht/sig>
>>
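
Finally, to tie the mechanism described in the cover letter together with
the per-iteration removal cap mentioned above, the aging sweep could be
sketched along these lines; the type, the helper and the cap value are
purely illustrative and are not taken from the patches (the threshold
would come from the NB_Global "mac_binding_age_threshold" key, falling
back to the 60 s default).

    /*
     * Sketch of the aging sweep with a cap on removals per iteration.
     * The struct, the helper and the cap are illustrative only.
     */
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    #define DEFAULT_THRESHOLD_S  60    /* Default from the cover letter. */
    #define MAX_REMOVALS_PER_RUN 100   /* Assumed cap, not in the series. */

    struct aged_mac_binding {
        bool owned_by_us;       /* chassis column == local chassis. */
        int64_t idle_age_s;     /* idle_age of the corresponding flow. */
    };

    /* Hypothetical helper standing in for the SB IDL row deletion. */
    extern void sb_mac_binding_delete(struct aged_mac_binding *);

    static void
    mac_binding_aging_run(struct aged_mac_binding **rows, size_t n_rows,
                          int64_t threshold_s)
    {
        size_t removed = 0;

        for (size_t i = 0; i < n_rows && removed < MAX_REMOVALS_PER_RUN; i++) {
            struct aged_mac_binding *row = rows[i];

            if (row->owned_by_us && row->idle_age_s > threshold_s) {
                sb_mac_binding_delete(row);
                removed++;
            }
        }
        /* Anything left over is picked up on the next run, so one long
         * recompute cannot turn into a single huge removal transaction. */
    }

A real version would of course iterate over the SB IDL contents rather
than an in-memory array, but the cap works the same way.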
