On Fri Oct 17, 2025 at 2:32 PM CEST, Fiona Ebner wrote:
> On 30.09.25 at 4:20 PM, Daniel Kral wrote:
>> The HA Manager builds $online_node_usage in every FSM iteration in
>> manage(...) and at every HA resource state change in
>> change_service_state(...). This becomes quite costly with a high HA
>> resource count and a lot of state changes happening at once, e.g.
>> starting up multiple nodes with rebalance_on_request_start set or a
>> failover of a node with many configured HA resources.
>>
>> To improve this situation, make the changes to the $online_node_usage
>> more granular by building $online_node_usage only once per call to
>> manage(...) and changing the nodes a HA resource uses individually on
>> every HA resource state transition.
>>
>> The change in service usage "freshness" should be negligible here as the
>> static service usage data is cached anyway (except if the cache fails
>> for some reason).
>
> But the cache is refreshed on every recompute_online_node_usage(), which
> happened much more frequently before, so the fact that it's cached
> doesn't seem like a strong argument here?
>
> I /do/ think there is a real tradeoff being made, namely "the ability to
> manage much larger fleets of guests" versus "immediately incorporating
> every guest config change in decisions". Config changes that would lead
> to wildly different decisions would need to be timed very badly to cause
> actual issues and should be rare to begin with. Also, with PSI-based
> information, things are also less "instant", I don't see an issue with
> moving in the same direction.

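For reference, the difference between a full rebuild and the granular
update can be sketched roughly as below. This is only an illustration in
Python, not the actual Perl code in the HA Manager: rebuild_usage(),
move_service(), the "nodes" field and the plain per-node service count
are all hypothetical stand-ins for the real usage accounting.

```python
from collections import Counter

# Hypothetical stand-ins: names and data layout are assumptions made for
# this sketch and do not mirror the real HA Manager code.

def rebuild_usage(online_nodes, services):
    """Full rebuild: walks every service, so cost grows with service count."""
    usage = Counter({node: 0 for node in online_nodes})
    for svc in services.values():
        for node in svc["nodes"]:
            if node in usage:  # ignore nodes that are not online
                usage[node] += 1
    return usage

def move_service(usage, services, sid, new_nodes):
    """Granular update: only touches the nodes used by one service."""
    for node in services[sid]["nodes"]:
        if node in usage:
            usage[node] -= 1
    services[sid]["nodes"] = list(new_nodes)
    for node in new_nodes:
        if node in usage:
            usage[node] += 1

online = ["node1", "node2", "node3"]
services = {
    "vm:100": {"nodes": ["node1"]},
    "vm:101": {"nodes": ["node1"]},
    "vm:102": {"nodes": ["node2"]},
}

# Built once, as if at the start of a manage(...) call.
usage = rebuild_usage(online, services)

# A single state transition then adjusts only the affected nodes.
move_service(usage, services, "vm:100", ["node3"])
```

The granular path stays consistent with a fresh rebuild as long as every
transition goes through it. What it gives up is picking up changes that
happen outside of those transitions (e.g. guest config changes) until
the next full rebuild.
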
Right, I'll change that to better reflect the tradeoff here!

_______________________________________________
pve-devel mailing list
[email protected]
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
