Perfectly fine with me, as long as this is expected behavior and not some silly mistake on my side :)
Thanks again,

/Florian

On Sun, Feb 21, 2016 at 9:33 PM, Clayton Coleman <[email protected]> wrote:

> I believe we fixed the issue with the restarting controller in 3.1.1 -
> this looks like what I would expect in 3.1.0.4. For now, there's
> minimal impact to the looping other than it looks ugly.
>
> On Sun, Feb 21, 2016 at 2:11 AM, Florian Daniel Otel
> <[email protected]> wrote:
> > Kindest thanks Clayton, Jason for being willing to help yet again:
> >
> > The info Clayton requested:
> >
> > The service status on e.g. "master2"
> >
> > [root@vspose-master2 ~]# systemctl status atomic-openshift-master-controllers.service
> > ● atomic-openshift-master-controllers.service - Atomic OpenShift Master Controllers
> >    Loaded: loaded (/usr/lib/systemd/system/atomic-openshift-master-controllers.service; enabled; vendor preset: disabled)
> >    Active: activating (start) since Sun 2016-02-21 06:55:25 UTC; 9s ago
> >      Docs: https://github.com/openshift/origin
> >  Main PID: 54642 (openshift)
> >    CGroup: /system.slice/atomic-openshift-master-controllers.service
> >            └─54642 /usr/bin/openshift start master controllers --config=/etc/origin/master/master-config.yaml --loglevel=2 --listen=https://0.0.0.0:8444
> >
> > ....
> >
> > The corresponding systemd unit file:
> >
> > [root@vspose-master2 systemd]# cat /usr/lib/systemd/system/atomic-openshift-master-controllers.service
> > [Unit]
> > Description=Atomic OpenShift Master Controllers
> > Documentation=https://github.com/openshift/origin
> > After=network.target
> > After=atomic-openshift-master-api.service
> > Before=atomic-openshift-node.service
> > Requires=network.target
> >
> > [Service]
> > Type=notify
> > EnvironmentFile=/etc/sysconfig/atomic-openshift-master-controllers
> > Environment=GOTRACEBACK=crash
> > ExecStart=/usr/bin/openshift start master controllers --config=${CONFIG_FILE} $OPTIONS
> > LimitNOFILE=131072
> > LimitCORE=infinity
> > WorkingDirectory=/var/lib/origin
> > SyslogIdentifier=atomic-openshift-master-controllers
> > Restart=on-failure
> >
> > [Install]
> > WantedBy=multi-user.target
> > WantedBy=atomic-openshift-node.service
> >
> > OSE version:
> >
> > [root@vspose-master2 systemd]# /usr/bin/openshift version
> > openshift v3.1.0.4-16-g112fcc4
> > kubernetes v1.1.0-origin-1107-g4c8e6f4
> > etcd 2.1.2
> >
> > So far, the procedure I tried for stopping / starting the masters was:
> >
> > systemctl stop atomic-openshift-master-controllers.service
> > systemctl stop atomic-openshift-master-api.service
> >
> > respectively:
> >
> > systemctl start atomic-openshift-master-api.service
> > systemctl start atomic-openshift-master-controllers.service
> >
> > (stopping / starting "atomic-openshift-master-api" seems a bit redundant
> > since it is a requirement for "atomic-openshift-master-controllers", but
> > still...)
> >
> > Thanks,
> >
> > /Florian
> >
> > On Sun, Feb 21, 2016 at 1:07 AM, Clayton Coleman <[email protected]> wrote:
> >>
> >> On Feb 20, 2016, at 6:59 PM, Jason DeTiberus <[email protected]> wrote:
> >>
> >> On Feb 20, 2016 4:27 PM, "Florian Daniel Otel" <[email protected]> wrote:
> >> >
> >> > Hello all,
> >> >
> >> > I've installed a setup using multiple masters using "native HA" (i.e.
> >> > HAproxy) -- just as described here:
> >> >
> >> > My problem:
> >> >
> >> > After a reboot, on two of my three masters -- namely "master2" and
> >> > "master3" -- the "atomic-openshift-master-controllers" service keeps
> >> > respawning every 30 seconds.
> >>
> >> This is expected. The controllers service can only be active on a single
> >> host. The active service acquires a lock within etcd and the others will
> >> continuously respawn and attempt to acquire the lock.
> >>
> >> That is not expected - the controllers should start and block until they
> >> are needed. They should never restart unless they lose their leader lock.
> >>
> >> >
> >> > The systemd logs for the service (here master2).
> >> >
> >> > Feb 20 21:13:13 vspose-master2 systemd[1]: Starting Atomic OpenShift Master Controllers...
> >> > Feb 20 21:13:14 vspose-master2 atomic-openshift-master-controllers[3145]: I0220 21:13:14.669893 3145 plugins.go:71] No cloud provider specified.
> >> > Feb 20 21:13:14 vspose-master2 atomic-openshift-master-controllers[3145]: I0220 21:13:14.818515 3145 start_master.go:410] Starting controllers on 0.0.0.0:8444 (v3.1.0.4-16-g112fcc4)
> >> > Feb 20 21:13:14 vspose-master2 atomic-openshift-master-controllers[3145]: I0220 21:13:14.818566 3145 start_master.go:414] Using images from "openshift3/ose-<component>:latest"
> >> > Feb 20 21:13:14 vspose-master2 atomic-openshift-master-controllers[3145]: I0220 21:13:14.846183 3145 master.go:232] Started health checks at 0.0.0.0:8444
> >> > Feb 20 21:13:14 vspose-master2 atomic-openshift-master-controllers[3145]: I0220 21:13:14.864747 3145 master_config.go:250] Attempting to acquire controller lease as master-xct012o4, renewing every 30 seconds
> >> > Feb 20 21:14:44 vspose-master2 systemd[1]: atomic-openshift-master-controllers.service start operation timed out. Terminating.
> >> > Feb 20 21:14:44 vspose-master2 systemd[1]: atomic-openshift-master-controllers.service: main process exited, code=exited, status=2/INVALIDARGUMENT
> >> > Feb 20 21:14:44 vspose-master2 systemd[1]: Failed to start Atomic OpenShift Master Controllers.
> >> > Feb 20 21:14:44 vspose-master2 systemd[1]: Unit atomic-openshift-master-controllers.service entered failed state.
> >> > Feb 20 21:14:44 vspose-master2 systemd[1]: atomic-openshift-master-controllers.service failed.
> >> > Feb 20 21:14:44 vspose-master2 systemd[1]: atomic-openshift-master-controllers.service holdoff time over, scheduling restart.
> >> >
> >> > My questions:
> >> >
> >> > - What has gone wrong here?
> >> >
> >> > - How do I recover from this?
> >> >
> >> > - What is the recommended procedure to shut down / restart the OpenShift
> >> > master services in a multi-master setup?
> >> >
> >> > Normally on a (single) master environment I do "systemctl
> >> > stop/start/restart atomic-openshift-master", but it seems natural that the
> >> > process on a multi-master environment should be more involved -- I just
> >> > cannot find any guidance on this.
> >> >
> >> > Kindest thanks for the help,
> >> >
> >> > /Florian
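For reference, the stop / start order tried in this thread can be written down as a small dry-run helper. This is only a sketch of the sequence quoted above, not an officially recommended procedure: the function name `master_service_plan` is made up for illustration, the unit names are the ones from the thread, and the helper only prints the commands instead of running them. Note that the unit file quoted above orders the controllers after the API service (After=) and lists only network.target under Requires=.

```shell
#!/bin/sh
# Hypothetical helper (name is illustrative): prints, but does not run, the
# systemctl commands for stopping or starting the master services in the
# order tried in this thread. Unit names are taken from the thread.
master_service_plan() {
    api=atomic-openshift-master-api.service
    controllers=atomic-openshift-master-controllers.service
    case "$1" in
        stop)
            # Controllers first, then the API service they are ordered after.
            echo "systemctl stop $controllers"
            echo "systemctl stop $api"
            ;;
        start)
            # API first, then the controllers.
            echo "systemctl start $api"
            echo "systemctl start $controllers"
            ;;
        *)
            echo "usage: master_service_plan stop|start" >&2
            return 1
            ;;
    esac
}

# Print both plans for review.
master_service_plan stop
master_service_plan start
```

To actually run a plan on a master host (as root), the printed commands could be piped to a shell, e.g. `master_service_plan stop | sh`.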
_______________________________________________ users mailing list [email protected] http://lists.openshift.redhat.com/openshiftmm/listinfo/users
