I believe we fixed the restarting-controller issue in 3.1.1; what you are seeing is what I would expect on 3.1.0.4. For now, the restart loop has minimal impact beyond looking ugly.
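
If you want to confirm that a master is only hitting this start-timeout loop and nothing worse, a rough sketch along these lines may help. It counts the "start operation timed out" messages that systemd logs each time it gives up on the unit (in the quoted log below that happens roughly 90 seconds after the start, 21:13:13 to 21:14:44). The function name count_start_timeouts is made up for illustration and is not part of OpenShift; the match string is taken from the log quoted below.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with OpenShift): reads journal text on
# stdin and prints how many times systemd reported that the controllers
# unit timed out while starting.
count_start_timeouts() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # the count is 0, so "|| true" keeps the function's status clean.
    grep -c 'atomic-openshift-master-controllers.service start operation timed out' || true
}
```

Usage on a master, assuming the journal is readable by your user:

journalctl -u atomic-openshift-master-controllers.service | count_start_timeouts

A steadily growing count on the standby masters, with the active master staying at zero, would be consistent with the 3.1.0.4 behaviour described above.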

On Sun, Feb 21, 2016 at 2:11 AM, Florian Daniel Otel <[email protected]> wrote:
> Kindest thanks Clayton, Jason for being willing to help yet again:
>
> The info Clayton requested:
>
> The service status on e.g. "master2"
>
> [root@vspose-master2 ~]# systemctl status atomic-openshift-master-controllers.service
> ● atomic-openshift-master-controllers.service - Atomic OpenShift Master Controllers
>    Loaded: loaded (/usr/lib/systemd/system/atomic-openshift-master-controllers.service; enabled; vendor preset: disabled)
>    Active: activating (start) since Sun 2016-02-21 06:55:25 UTC; 9s ago
>      Docs: https://github.com/openshift/origin
>  Main PID: 54642 (openshift)
>    CGroup: /system.slice/atomic-openshift-master-controllers.service
>            └─54642 /usr/bin/openshift start master controllers --config=/etc/origin/master/master-config.yaml --loglevel=2 --listen=https://0.0.0.0:8444
>
> ....
>
> The corresponding systemd unit file:
>
> [root@vspose-master2 systemd]# cat /usr/lib/systemd/system/atomic-openshift-master-controllers.service
> [Unit]
> Description=Atomic OpenShift Master Controllers
> Documentation=https://github.com/openshift/origin
> After=network.target
> After=atomic-openshift-master-api.service
> Before=atomic-openshift-node.service
> Requires=network.target
>
> [Service]
> Type=notify
> EnvironmentFile=/etc/sysconfig/atomic-openshift-master-controllers
> Environment=GOTRACEBACK=crash
> ExecStart=/usr/bin/openshift start master controllers --config=${CONFIG_FILE} $OPTIONS
> LimitNOFILE=131072
> LimitCORE=infinity
> WorkingDirectory=/var/lib/origin
> SyslogIdentifier=atomic-openshift-master-controllers
> Restart=on-failure
>
> [Install]
> WantedBy=multi-user.target
> WantedBy=atomic-openshift-node.service
>
> OSE version:
>
> [root@vspose-master2 systemd]# /usr/bin/openshift version
> openshift v3.1.0.4-16-g112fcc4
> kubernetes v1.1.0-origin-1107-g4c8e6f4
> etcd 2.1.2
>
> So far, the procedure I tried for stopping / starting the masters was:
>
> systemctl stop atomic-openshift-master-controllers.service
> systemctl stop atomic-openshift-master-api.service
>
> respectively:
>
> systemctl start atomic-openshift-master-api.service
> systemctl start atomic-openshift-master-controllers.service
>
> (stopping / starting "atomic-openshift-master-api" seems a bit redundant
> since it is a requirement for "atomic-openshift-master-controllers", but
> still... )
>
> Thanks,
>
> /Florian
>
> On Sun, Feb 21, 2016 at 1:07 AM, Clayton Coleman <[email protected]> wrote:
>>
>> On Feb 20, 2016, at 6:59 PM, Jason DeTiberus <[email protected]> wrote:
>>
>> On Feb 20, 2016 4:27 PM, "Florian Daniel Otel" <[email protected]> wrote:
>> >
>> > Hello all,
>> >
>> > I've installed a setup using multiple masters using "native HA" (i.e.
>> > HAproxy) -- just as described here:
>> >
>> > My problem:
>> >
>> > After a reboot, on two of my three masters -- namely "master2" and
>> > "master3" -- the "atomic-openshift-master-controllers" service keeps
>> > respawning every 30 seconds.
>>
>> This is expected. The controllers service can only be active on a single
>> host. The active service acquires a lock within etcd and the others will
>> continuously respawn and attempt to acquire the lock.
>>
>> That is not expected - the controllers should start and block until they
>> are needed. They should never restart unless they lose their leader lock.
>>
>> >
>> > The systemd logs for the service (here master2).
>> >
>> > Feb 20 21:13:13 vspose-master2 systemd[1]: Starting Atomic OpenShift Master Controllers...
>> > Feb 20 21:13:14 vspose-master2 atomic-openshift-master-controllers[3145]: I0220 21:13:14.669893 3145 plugins.go:71] No cloud provider specified.
>> > Feb 20 21:13:14 vspose-master2 atomic-openshift-master-controllers[3145]: I0220 21:13:14.818515 3145 start_master.go:410] Starting controllers on 0.0.0.0:8444 (v3.1.0.4-16-g112fcc4)
>> > Feb 20 21:13:14 vspose-master2 atomic-openshift-master-controllers[3145]: I0220 21:13:14.818566 3145 start_master.go:414] Using images from "openshift3/ose-<component>:latest"
>> > Feb 20 21:13:14 vspose-master2 atomic-openshift-master-controllers[3145]: I0220 21:13:14.846183 3145 master.go:232] Started health checks at 0.0.0.0:8444
>> > Feb 20 21:13:14 vspose-master2 atomic-openshift-master-controllers[3145]: I0220 21:13:14.864747 3145 master_config.go:250] Attempting to acquire controller lease as master-xct012o4, renewing every 30 seconds
>> > Feb 20 21:14:44 vspose-master2 systemd[1]: atomic-openshift-master-controllers.service start operation timed out. Terminating.
>> > Feb 20 21:14:44 vspose-master2 systemd[1]: atomic-openshift-master-controllers.service: main process exited, code=exited, status=2/INVALIDARGUMENT
>> > Feb 20 21:14:44 vspose-master2 systemd[1]: Failed to start Atomic OpenShift Master Controllers.
>> > Feb 20 21:14:44 vspose-master2 systemd[1]: Unit atomic-openshift-master-controllers.service entered failed state.
>> > Feb 20 21:14:44 vspose-master2 systemd[1]: atomic-openshift-master-controllers.service failed.
>> > Feb 20 21:14:44 vspose-master2 systemd[1]: atomic-openshift-master-controllers.service holdoff time over, scheduling restart.
>> >
>> > My questions:
>> >
>> > - What has gone wrong here?
>> >
>> > - How do I recover from this?
>> >
>> > - What is the recommended procedure to shut down / restart the OpenShift
>> > master services in a multi-master setup?
>> >
>> > Normally on a (single) master environment I do "systemctl
>> > stop/start/restart atomic-openshift-master" but it seems natural that the
>> > process on a multi-master environment should be more involved -- I just
>> > cannot find any guidance on this.
>> >
>> > Kindest thanks for the help,
>> >
>> > /Florian
>> >
>> > _______________________________________________
>> > users mailing list
>> > [email protected]
>> > http://lists.openshift.redhat.com/openshiftmm/listinfo/users
