Hi all,

following the discussion about our HA stack on the pve-user list in October, we made some changes which, hopefully, address some problems and avoid some common pitfalls.

* What has changed or is new:

pct shutdown / qm shutdown and the Shutdown button in the web interface now work as expected: if triggered, the HA service is shut down and not automatically started again. If that is needed, there is still the 'reset' functionality.

We now provide better feedback about the actual state of a HA service. E.g. 'started' is only shown once the local resource manager has confirmed that the service really started; until then we show 'starting', so that it's clearer what is currently happening.

We merged the GUI's 'Resource' tab into the 'HA' tab, so related information is now placed together. This should give a better overview of the current situation. Note that some fields in the resource grid are hidden by default; to show them, click on one of the tiny triangles in the column headers:
https://i.imgsafe.org/6a271a3cc4.png

Improved the built-in documentation.

We also reworked the request states for services. There are now:

* started (replaces 'enabled')
  The CRM tries to start the resource. The service state is set to started after a successful start. On node failures, or when the start fails, it tries to recover the resource. If everything fails, the service state is set to error.

* stopped (new)
  The CRM tries to keep the resource in the stopped state, but it still tries to relocate the resource on node failures.

* disabled
  The CRM tries to put the resource in the stopped state, but does not try to relocate the resource on node failures. The main purpose of this state is error recovery, because it is the only way to move a resource out of the error state.

So the generally used states should now be 'started' and 'stopped'; with these it's clear what the HA stack will do. 'disabled' should mainly be used to recover a service which is in the error state.

ha-manager enabled/disabled was removed. This was not in the API, so it should only affect users who called it directly. You can use `ha-manager set SID --state REQUEST_STATE` instead.
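
To illustrate the new `ha-manager set SID --state REQUEST_STATE` call with the three request states listed above, here is a minimal shell sketch. The function name ha_set_state, the HA_DRY_RUN variable and the service ID vm:100 are made up for this example and are not part of ha-manager. The last two calls assume that, after using 'disabled' to leave the error state, you would request 'started' again; that follow-up step is my reading of the description, not something the tooling enforces.

```shell
#!/bin/sh
# Sketch of a small wrapper around `ha-manager set SID --state REQUEST_STATE`.
# ha_set_state and HA_DRY_RUN are hypothetical names used only for illustration.

ha_set_state() {
    sid="$1"
    state="$2"

    # Only the three request states described above are accepted.
    case "$state" in
        started|stopped|disabled) ;;
        *)
            echo "unknown request state: $state" >&2
            return 1
            ;;
    esac

    if [ "${HA_DRY_RUN:-1}" = "1" ]; then
        # Dry run (default): only print the command, change nothing.
        echo "ha-manager set $sid --state $state"
    else
        ha-manager set "$sid" --state "$state"
    fi
}

# Instead of the removed 'enabled' command:
ha_set_state vm:100 started

# Keep the service stopped; it is still relocated on node failures:
ha_set_state vm:100 stopped

# Assumed recovery sequence for a service in the error state:
ha_set_state vm:100 disabled
ha_set_state vm:100 started
```

With the default HA_DRY_RUN=1 the script only prints the commands it would run; set HA_DRY_RUN=0 on a node with ha-manager installed to actually apply them.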

* What is still to come:

An 'ignore' request state, in which the service is not touched by HA but stays in the resource configuration. This was requested a few times. I have WIP patches ready, but nothing is merged yet.

A bit less confusion in the task execution logs.

Allowing hard stopping of a VM/CT under HA.

I hope this addresses some part of the feedback we got. Many thanks to the community for the feedback, to Dietmar, who did a lot of the work mentioned above, and to Dominik for his help with the UI.

Users who want to test these changes can use the new packages we pushed to pvetest yesterday evening CET. The changes are included in the packages:

pve-ha-manager >= 1.0-38
pve-manager >= 4.3-11

Happy testing, and feel free to provide feedback.

cheers,
Thomas
