Hello all,

I've installed a multi-master setup with "native HA" (i.e. HAProxy), just as described here: <https://docs.openshift.com/enterprise/latest/install_config/install/advanced_install.html>
My problem: After a reboot, on two of my three masters -- namely "master2" and "master3" -- the "atomic-openshift-master-controllers" service keeps respawning every 30 seconds.

Here are the systemd logs for the service (from master2):

Feb 20 21:13:13 vspose-master2 systemd[1]: Starting Atomic OpenShift Master Controllers...
Feb 20 21:13:14 vspose-master2 atomic-openshift-master-controllers[3145]: I0220 21:13:14.669893 3145 plugins.go:71] No cloud provider specified.
Feb 20 21:13:14 vspose-master2 atomic-openshift-master-controllers[3145]: I0220 21:13:14.818515 3145 start_master.go:410] Starting controllers on 0.0.0.0:8444 (v3.1.0.4-16-g112fcc4)
Feb 20 21:13:14 vspose-master2 atomic-openshift-master-controllers[3145]: I0220 21:13:14.818566 3145 start_master.go:414] Using images from "openshift3/ose-<component>:latest"
Feb 20 21:13:14 vspose-master2 atomic-openshift-master-controllers[3145]: I0220 21:13:14.846183 3145 master.go:232] Started health checks at 0.0.0.0:8444
Feb 20 21:13:14 vspose-master2 atomic-openshift-master-controllers[3145]: I0220 21:13:14.864747 3145 master_config.go:250] Attempting to acquire controller lease as master-xct012o4, renewing every 30 seconds
Feb 20 21:14:44 vspose-master2 systemd[1]: atomic-openshift-master-controllers.service start operation timed out. Terminating.
Feb 20 21:14:44 vspose-master2 systemd[1]: atomic-openshift-master-controllers.service: main process exited, code=exited, status=2/INVALIDARGUMENT
Feb 20 21:14:44 vspose-master2 systemd[1]: Failed to start Atomic OpenShift Master Controllers.
Feb 20 21:14:44 vspose-master2 systemd[1]: Unit atomic-openshift-master-controllers.service entered failed state.
Feb 20 21:14:44 vspose-master2 systemd[1]: atomic-openshift-master-controllers.service failed.
Feb 20 21:14:44 vspose-master2 systemd[1]: atomic-openshift-master-controllers.service holdoff time over, scheduling restart.

My questions:

- What has gone wrong here?
- How do I recover from this?
- What is the recommended procedure to shut down / restart the OpenShift master services in a multi-master setup? Normally, in a single-master environment, I run "systemctl stop/start/restart atomic-openshift-master", but it seems natural that the process in a multi-master environment would be more involved -- I just cannot find any guidance on this.

Kindest thanks for the help,
/Florian
