Re: "atomic-openshift-controllers" service keeps respawning every 30 secs (multiple HA masters setup)

Florian Daniel Otel Sun, 21 Feb 2016 13:05:07 -0800

Perfectly fine with me, as long as this is expected behavior and not some
silly mistake on my side :)


Thanks again,

/Florian


On Sun, Feb 21, 2016 at 9:33 PM, Clayton Coleman <[email protected]>
wrote:

> I believe we fixed the issue with the restarting controller in 3.1.1;
> this looks like what I would expect in 3.1.0.4. For now, the looping has
> minimal impact other than looking ugly.
>
> On Sun, Feb 21, 2016 at 2:11 AM, Florian Daniel Otel
> <[email protected]> wrote:
> > Kindest thanks, Clayton and Jason, for being willing to help yet again.
> >
> > The info Clayton requested:
> >
> > The service status on, e.g., "master2":
> >
> >
> > [root@vspose-master2 ~]# systemctl status atomic-openshift-master-controllers.service
> > ● atomic-openshift-master-controllers.service - Atomic OpenShift Master Controllers
> >    Loaded: loaded (/usr/lib/systemd/system/atomic-openshift-master-controllers.service; enabled; vendor preset: disabled)
> >    Active: activating (start) since Sun 2016-02-21 06:55:25 UTC; 9s ago
> >      Docs: https://github.com/openshift/origin
> >  Main PID: 54642 (openshift)
> >    CGroup: /system.slice/atomic-openshift-master-controllers.service
> >            └─54642 /usr/bin/openshift start master controllers --config=/etc/origin/master/master-config.yaml --loglevel=2 --listen=https://0.0.0.0:8444
> >
> > ....
> >
> > The corresponding systemd unit file:
> >
> > [root@vspose-master2 systemd]# cat /usr/lib/systemd/system/atomic-openshift-master-controllers.service
> > [Unit]
> > Description=Atomic OpenShift Master Controllers
> > Documentation=https://github.com/openshift/origin
> > After=network.target
> > After=atomic-openshift-master-api.service
> > Before=atomic-openshift-node.service
> > Requires=network.target
> >
> > [Service]
> > Type=notify
> > EnvironmentFile=/etc/sysconfig/atomic-openshift-master-controllers
> > Environment=GOTRACEBACK=crash
> > ExecStart=/usr/bin/openshift start master controllers --config=${CONFIG_FILE} $OPTIONS
> > LimitNOFILE=131072
> > LimitCORE=infinity
> > WorkingDirectory=/var/lib/origin
> > SyslogIdentifier=atomic-openshift-master-controllers
> > Restart=on-failure
> >
> > [Install]
> > WantedBy=multi-user.target
> > WantedBy=atomic-openshift-node.service
> >
> >
> >
> > OSE version:
> >
> > [root@vspose-master2 systemd]# /usr/bin/openshift version
> > openshift v3.1.0.4-16-g112fcc4
> > kubernetes v1.1.0-origin-1107-g4c8e6f4
> > etcd 2.1.2
> >
> >
> > So far, the procedure I tried for stopping / starting the masters was:
> >
> >      systemctl stop atomic-openshift-master-controllers.service
> >      systemctl stop atomic-openshift-master-api.service
> >
> >
> > respectively:
> >
> >      systemctl start atomic-openshift-master-api.service
> >      systemctl start atomic-openshift-master-controllers.service
> >
> >
> > (Stopping / starting "atomic-openshift-master-api" seems a bit redundant
> > since it is a requirement for "atomic-openshift-master-controllers", but
> > still...)
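For reference, the stop/start order above can be wrapped in two small shell functions. This is only a sketch of the order used in this message, not an officially documented multi-master procedure; the function names `stop_master`/`start_master` and the `SYSTEMCTL` override (for a dry run with `SYSTEMCTL=echo`) are hypothetical conveniences, not part of OpenShift.

```shell
#!/usr/bin/env bash
# Hypothetical helper: stop/start the split master services on ONE host,
# in the order used above. SYSTEMCTL may be overridden (e.g. SYSTEMCTL=echo)
# to print the commands instead of running them.
API_UNIT=atomic-openshift-master-api.service
CTRL_UNIT=atomic-openshift-master-controllers.service

stop_master() {
  # The controllers unit is ordered After= the API unit, so stop it first.
  "${SYSTEMCTL:-systemctl}" stop "$CTRL_UNIT"
  "${SYSTEMCTL:-systemctl}" stop "$API_UNIT"
}

start_master() {
  # Start the API first; only start the controllers if that succeeded.
  "${SYSTEMCTL:-systemctl}" start "$API_UNIT" || return
  "${SYSTEMCTL:-systemctl}" start "$CTRL_UNIT"
}
```

Usage: run `stop_master; start_master` on the host being restarted. On a 3.1.0.x standby master affected by the issue in this thread, `systemctl start` on the controllers unit may wait until the start timeout and then report failure; `systemctl --no-block start` returns without waiting for the start job.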
> >
> >
> > Thanks,
> >
> > /Florian
> >
> >
> >
> > On Sun, Feb 21, 2016 at 1:07 AM, Clayton Coleman <[email protected]>
> > wrote:
> >>
> >>
> >>
> >> On Feb 20, 2016, at 6:59 PM, Jason DeTiberus <[email protected]> wrote:
> >>
> >>
> >> On Feb 20, 2016 4:27 PM, "Florian Daniel Otel" <[email protected]>
> >> wrote:
> >> >
> >> > Hello all,
> >> >
> >> > I've installed a multi-master setup using "native HA" (i.e. HAProxy),
> >> > just as described here:
> >> >
> >> > My problem:
> >> >
> >> > After a reboot, on two of my three masters -- namely "master2" and
> >> > "master3" -- the "atomic-openshift-master-controllers" service keeps
> >> > respawning every 30 seconds.
> >>
> >> This is expected. The controllers service can only be active on a single
> >> host. The active service acquires a lock within etcd and the others will
> >> continuously respawn and attempt to acquire the lock.
> >>
> >>
> >> That is not expected; the controllers should start and block until they
> >> are needed. They should never restart unless they lose their leader lock.
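To illustrate the lease behaviour Jason and Clayton describe (one holder at a time, the others waiting on standby and retrying), here is a toy sketch of TTL-based lease acquisition. It is a hypothetical illustration only: a local file stands in for the etcd key, the name `try_acquire_lease` is made up, and unlike etcd's atomic compare-and-swap the check-then-write below is racy. It is not how OpenShift implements its controller lease.

```shell
#!/usr/bin/env bash
# Toy TTL lease (hypothetical; a local file stands in for the etcd key).
# try_acquire_lease FILE HOLDER TTL NOW
#   Returns 0 and (re)writes "HOLDER EXPIRY" if the lease is free, expired,
#   or already held by HOLDER; returns 1 if another holder's lease is live.
#   NOW (epoch seconds) is passed in so the logic is deterministic to test.
# Caveat: check-then-write is not atomic; etcd would use compare-and-swap.
try_acquire_lease() {
  local file=$1 holder=$2 ttl=$3 now=$4
  local cur_holder="" expiry=0
  if [[ -s $file ]]; then
    read -r cur_holder expiry < "$file"
  fi
  if [[ -n $cur_holder && $cur_holder != "$holder" ]] && (( now < expiry )); then
    return 1   # another master holds an unexpired lease: stay on standby
  fi
  # Acquire or renew: this holder owns the lease until now + ttl.
  printf '%s %d\n' "$holder" $(( now + ttl )) > "$file"
}
```

In this model a standby calls the function once per renewal interval (the log below says "renewing every 30 seconds") and only starts doing controller work once it returns 0, which is the "start and block until needed" behaviour rather than a process restart.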
> >>
> >>
> >> >
> >> > The systemd logs for the service (here master2).
> >> >
> >> >
> >> > Feb 20 21:13:13 vspose-master2 systemd[1]: Starting Atomic OpenShift Master Controllers...
> >> > Feb 20 21:13:14 vspose-master2 atomic-openshift-master-controllers[3145]: I0220 21:13:14.669893 3145 plugins.go:71] No cloud provider specified.
> >> > Feb 20 21:13:14 vspose-master2 atomic-openshift-master-controllers[3145]: I0220 21:13:14.818515 3145 start_master.go:410] Starting controllers on 0.0.0.0:8444 (v3.1.0.4-16-g112fcc4)
> >> > Feb 20 21:13:14 vspose-master2 atomic-openshift-master-controllers[3145]: I0220 21:13:14.818566 3145 start_master.go:414] Using images from "openshift3/ose-<component>:latest"
> >> > Feb 20 21:13:14 vspose-master2 atomic-openshift-master-controllers[3145]: I0220 21:13:14.846183 3145 master.go:232] Started health checks at 0.0.0.0:8444
> >> > Feb 20 21:13:14 vspose-master2 atomic-openshift-master-controllers[3145]: I0220 21:13:14.864747 3145 master_config.go:250] Attempting to acquire controller lease as master-xct012o4, renewing every 30 seconds
> >> > Feb 20 21:14:44 vspose-master2 systemd[1]: atomic-openshift-master-controllers.service start operation timed out. Terminating.
> >> > Feb 20 21:14:44 vspose-master2 systemd[1]: atomic-openshift-master-controllers.service: main process exited, code=exited, status=2/INVALIDARGUMENT
> >> > Feb 20 21:14:44 vspose-master2 systemd[1]: Failed to start Atomic OpenShift Master Controllers.
> >> > Feb 20 21:14:44 vspose-master2 systemd[1]: Unit atomic-openshift-master-controllers.service entered failed state.
> >> > Feb 20 21:14:44 vspose-master2 systemd[1]: atomic-openshift-master-controllers.service failed.
> >> > Feb 20 21:14:44 vspose-master2 systemd[1]: atomic-openshift-master-controllers.service holdoff time over, scheduling restart.
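A quick check of the timestamps in this log: the start job begins at 21:13:13 and systemd gives up at 21:14:44, roughly 90 seconds later, while the "30 seconds" in the log line is the lease renewal interval. The ~90 seconds is consistent with systemd's default start timeout (DefaultTimeoutStartSec=90s), since the unit file sets no TimeoutStartSec and, being Type=notify, is only considered started once the process reports readiness. That reading is an inference from the log, not something confirmed in this thread. The hypothetical bash helper below just does the arithmetic for same-day HH:MM:SS timestamps.

```shell
#!/usr/bin/env bash
# Hypothetical helper: seconds between two same-day HH:MM:SS timestamps.
to_seconds() {
  local h m s
  IFS=: read -r h m s <<< "$1"
  # 10# forces base 10 so fields like "08" are not parsed as octal.
  echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

elapsed_seconds() {
  echo $(( $(to_seconds "$2") - $(to_seconds "$1") ))
}
```

`elapsed_seconds 21:13:13 21:14:44` prints 91, and measured from the first process log line (21:13:14) it prints 90.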
> >> >
> >> >
> >> > My questions:
> >> >
> >> > - What has gone wrong here?
> >> >
> >> > - How do I recover from this?
> >> >
> >> > - What is the recommended procedure to shut down / restart the OpenShift
> >> > master services in a multi-master setup?
> >> >
> >> > Normally, in a (single) master environment I do "systemctl
> >> > stop/start/restart atomic-openshift-master", but it seems natural that the
> >> > process in a multi-master environment would be more involved; I just cannot
> >> > find any guidance on this.
> >> >
> >> >
> >> > Kindest thanks for the help,
> >> >
> >> >
> >> > /Florian
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > _______________________________________________
> >> > users mailing list
> >> > [email protected]
> >> > http://lists.openshift.redhat.com/openshiftmm/listinfo/users
> >> >
> >>
> >
> >
>
