On Feb 20, 2016, at 6:59 PM, Jason DeTiberus <[email protected]> wrote:


On Feb 20, 2016 4:27 PM, "Florian Daniel Otel" <[email protected]>
wrote:
>
> Hello all,
>
> I've installed a multi-master setup using "native HA" (i.e. HAProxy) -- just as described here:
>
> My problem:
>
> After a reboot, on two of my three masters -- namely "master2" and "master3" -- the "atomic-openshift-master-controllers" service keeps respawning every 30 seconds.

This is expected. The controllers service can only be active on a single
host. The active service acquires a lock within etcd and the others will
continuously respawn and attempt to acquire the lock.


That is not expected - the controllers should start and block until they
are needed. They should never restart unless they lose their leader lock.
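
To illustrate the intended behaviour described above, here is a minimal Python sketch of a standby that blocks in-process and retries the lease instead of exiting. It is not the OpenShift implementation: `LeaseStore`, `wait_for_lease`, and the in-memory lease are hypothetical stand-ins for the etcd-backed lock, and the TTL and retry interval are illustrative values only.

```python
import threading
import time


class LeaseStore:
    """In-memory stand-in for the etcd-backed lease (hypothetical, not the etcd API)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._holder = None
        self._expires_at = 0.0

    def try_acquire(self, candidate, ttl, now):
        """Grant or renew the lease if it is free, expired, or already held by candidate."""
        with self._lock:
            if self._holder in (None, candidate) or now >= self._expires_at:
                self._holder = candidate
                self._expires_at = now + ttl
                return True
            return False


def wait_for_lease(store, candidate, ttl, retry_interval,
                   clock=time.monotonic, sleep=time.sleep):
    """Block inside the process until the lease is acquired.

    Returns the number of attempts made. The process never exits while
    waiting, so a supervisor has no reason to respawn it.
    """
    attempts = 0
    while True:
        attempts += 1
        if store.try_acquire(candidate, ttl, clock()):
            return attempts
        sleep(retry_interval)
```

In this sketch a standby such as "master2" would sit inside `wait_for_lease` for as long as the active holder keeps renewing, and would take over only once the lease expires.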


>
> The systemd logs for the service (here master2).
>
>
> Feb 20 21:13:13 vspose-master2 systemd[1]: Starting Atomic OpenShift Master Controllers...
> Feb 20 21:13:14 vspose-master2 atomic-openshift-master-controllers[3145]: I0220 21:13:14.669893    3145 plugins.go:71] No cloud provider specified.
> Feb 20 21:13:14 vspose-master2 atomic-openshift-master-controllers[3145]: I0220 21:13:14.818515    3145 start_master.go:410] Starting controllers on 0.0.0.0:8444 (v3.1.0.4-16-g112fcc4)
> Feb 20 21:13:14 vspose-master2 atomic-openshift-master-controllers[3145]: I0220 21:13:14.818566    3145 start_master.go:414] Using images from "openshift3/ose-<component>:latest"
> Feb 20 21:13:14 vspose-master2 atomic-openshift-master-controllers[3145]: I0220 21:13:14.846183    3145 master.go:232] Started health checks at 0.0.0.0:8444
> Feb 20 21:13:14 vspose-master2 atomic-openshift-master-controllers[3145]: I0220 21:13:14.864747    3145 master_config.go:250] Attempting to acquire controller lease as master-xct012o4, renewing every 30 seconds
> Feb 20 21:14:44 vspose-master2 systemd[1]: atomic-openshift-master-controllers.service start operation timed out. Terminating.
> Feb 20 21:14:44 vspose-master2 systemd[1]: atomic-openshift-master-controllers.service: main process exited, code=exited, status=2/INVALIDARGUMENT
> Feb 20 21:14:44 vspose-master2 systemd[1]: Failed to start Atomic OpenShift Master Controllers.
> Feb 20 21:14:44 vspose-master2 systemd[1]: Unit atomic-openshift-master-controllers.service entered failed state.
> Feb 20 21:14:44 vspose-master2 systemd[1]: atomic-openshift-master-controllers.service failed.
> Feb 20 21:14:44 vspose-master2 systemd[1]: atomic-openshift-master-controllers.service holdoff time over, scheduling restart.
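
A quick way to read the log above is to measure the gap between the "Starting" line and the "start operation timed out" line. The sketch below does only that arithmetic on the two timestamps from the log; the helper name `seconds_between` is made up for illustration, and the year 2016 is taken from the message date because the journal lines do not carry one.

```python
from datetime import datetime


def seconds_between(start_stamp, end_stamp, fmt="%Y %b %d %H:%M:%S"):
    """Return whole seconds elapsed between two journal-style timestamps."""
    start = datetime.strptime(start_stamp, fmt)
    end = datetime.strptime(end_stamp, fmt)
    return int((end - start).total_seconds())


# Timestamps copied from the master2 log; the year is assumed from the email date.
elapsed = seconds_between("2016 Feb 20 21:13:13", "2016 Feb 20 21:14:44")
print(elapsed)  # 91
```

Ninety-one seconds is close to systemd's default start timeout of 90 seconds, assuming the unit does not override TimeoutStartSec. If that reading is right, the restarts come from systemd terminating a process that is still waiting for the lease, not from the 30-second lease renewal interval.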
>
>
> My questions:
>
> - What has gone wrong here?
>
> - How do I recover from this?
>
> - What is the recommended procedure to shut down / restart the OpenShift master services in a multi-master setup?
>
> Normally in a (single) master environment I do "systemctl stop/start/restart atomic-openshift-master", but it seems natural that the process in a multi-master environment should be more involved -- I just cannot find any guidance on this.
>
>
> Kindest thanks for the help,
>
>
> /Florian
>
>
>
>
>
> _______________________________________________
> users mailing list
> [email protected]
> http://lists.openshift.redhat.com/openshiftmm/listinfo/users
>
