Gave this version another spin today, focusing on the migration from
groups to rules. I tested this on 3-node and 5-node clusters and went
through the following scenarios:

1) At least one of the nodes in the cluster not at the minimum version
   required for migration to rules
2) At least one node offline during the attempt to migrate to rules

In both of the above cases, only the in-memory mapping of groups to
rules happens. Groups continue to work on the PVE 8 nodes and rules
continue to work on the PVE 9 nodes. It should be noted that the
nofailback flag is not inverted for the resources while the rules are
still in-memory only. This "switch" from nofailback to failback only
occurs once the migration is persisted.

3) Updating the remaining PVE 8 nodes one after another

Persistent migration started soon after all nodes were upgraded to PVE 9
(there is a slight delay, since the check whether groups need to be
migrated does not happen every round). This worked smoothly and I did
not notice any discrepancies in the rules.cfg generated from the
groups.cfg.

4) Migration with non-existent groups in resources.cfg
5) Invalid properties in resources.cfg or groups.cfg
6) Partially upgrading the cluster, editing a rule on a PVE 9 node

The edit from 6) will not persist. This is not unexpected, since the
rules exist only in-memory at this point, but users should probably be
warned about making any changes to rules mid-upgrade.

Dano already incorporated feedback from Hannes' and my tests, and we
also tested updated versions that fix the problems we noticed; I am
just documenting it here for the sake of completeness.

Overall, the migration from groups to rules worked very well in the
cases where migration was already possible, and it did not proceed (and
provided informative errors or warnings) where it was not.

On 7/29/25 20:03, Daniel Kral wrote:
> Here's a quick update on the core HA rules series. This cleans up the
> series so that all tests are running again and includes the missing ui
> patch that I didn't see missing last time.
>
> The persistent migration path has been tested for at least four full
> upgrade runs now, always with one node being behind and checking that
> the group config is only removed as soon as all nodes are on the right
> version.
>
> I'll wait for tomorrow if something comes up and will do some testing
> myself, so I'm anticipating to follow up on this tomorrow. I'll also
> want to get a more mature version of the HA resource affinity series
> ready for tomorrow on the mailing list.
>
> For maintainers: ha-manager patch #19 should be updated to the correct
> pve-manager version that is dependent on the pve-ha-manager package
> which can interpret the HA rules config.
>
> Changelog since v3
> ------------------
>
> - rebased on newest available master
>
> - included missing ui patch for web interface
>
> - correction in failback property description (does not influence the ha
>   node affinity rules)
>
> - migrated the groups configs in the test cases to node affinity rules
>   in rules configs (except two test cases for the persistent migration)
>
> - improved persistent ha group migration process
>
> - try a persistent upgrade only every 10 HA manager rounds
>
> - various other minor touches
>
> TODO
> ----
>
> - More testing on edge cases for the HA Manager migration path
>
> - Some more testing of the ha-manager CLI and adding a deprecation
>   warning on the HA Groups API and disallowing requests as soon as the
>   groups config is fully migrated

_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel