Re: "atomic-openshift-controllers" service keeps respawning every 30 secs (multiple HA masters setup)

Florian Daniel Otel Sat, 20 Feb 2016 23:14:03 -0800

Many thanks, Clayton and Jason, for being willing to help yet again.

The info Clayton requested:


The service status on, e.g., "master2":


[root@vspose-master2 ~]# systemctl status atomic-openshift-master-controllers.service
● atomic-openshift-master-controllers.service - Atomic OpenShift Master Controllers
   Loaded: loaded (/usr/lib/systemd/system/atomic-openshift-master-controllers.service; enabled; vendor preset: disabled)
   Active: activating (start) since Sun 2016-02-21 06:55:25 UTC; 9s ago
     Docs: https://github.com/openshift/origin
 Main PID: 54642 (openshift)
   CGroup: /system.slice/atomic-openshift-master-controllers.service
           └─54642 /usr/bin/openshift start master controllers --config=/etc/origin/master/master-config.yaml --loglevel=2 --listen=https://0.0.0.0:8444

....

The corresponding systemd unit file:

[root@vspose-master2 systemd]# cat /usr/lib/systemd/system/atomic-openshift-master-controllers.service
[Unit]
Description=Atomic OpenShift Master Controllers
Documentation=https://github.com/openshift/origin
After=network.target
After=atomic-openshift-master-api.service
Before=atomic-openshift-node.service
Requires=network.target

[Service]
Type=notify
EnvironmentFile=/etc/sysconfig/atomic-openshift-master-controllers
Environment=GOTRACEBACK=crash
ExecStart=/usr/bin/openshift start master controllers --config=${CONFIG_FILE} $OPTIONS
LimitNOFILE=131072
LimitCORE=infinity
WorkingDirectory=/var/lib/origin
SyslogIdentifier=atomic-openshift-master-controllers
Restart=on-failure

[Install]
WantedBy=multi-user.target
WantedBy=atomic-openshift-node.service
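
Since the unit is Type=notify, systemd treats the start as complete only once the process reports readiness; how long it waits before logging "start operation timed out", and whether it then restarts the service, are unit properties that can be read back with "systemctl show". A minimal sketch follows; the property names are standard systemd ones, while the helper name show_start_settings is purely illustrative. Unless overridden, systemd's default start timeout is 90 seconds, which lines up with the roughly 90-second gap between "Starting" and "start operation timed out" in the log quoted further down.

```shell
#!/bin/sh
# Print the systemd properties that govern how this unit starts and restarts:
#   Type             - "notify" means systemd waits for a readiness message
#   TimeoutStartUSec - how long systemd waits before timing out the start
#   Restart          - restart policy ("on-failure" in the unit file above)
#   RestartUSec      - delay before a restart is scheduled
# show_start_settings is a hypothetical helper name, not part of OpenShift.
show_start_settings() {
    unit="${1:-atomic-openshift-master-controllers.service}"
    systemctl show "$unit" \
        -p Type -p TimeoutStartUSec -p Restart -p RestartUSec
}
```

Running "show_start_settings" on a master prints one Key=Value line per property; the actual values depend on the host's systemd defaults and any local overrides.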



OSE version:

[root@vspose-master2 systemd]# /usr/bin/openshift version
openshift v3.1.0.4-16-g112fcc4
kubernetes v1.1.0-origin-1107-g4c8e6f4
etcd 2.1.2


So far, the procedure I tried for stopping / starting the masters was:

     systemctl stop atomic-openshift-master-controllers.service
     systemctl stop atomic-openshift-master-api.service


and, for starting them:

     systemctl start atomic-openshift-master-api.service
     systemctl start atomic-openshift-master-controllers.service


(Stopping/starting "atomic-openshift-master-api" seems a bit redundant,
since it is a requirement for "atomic-openshift-master-controllers", but
still...)
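
For reference, the same order expressed as two small shell functions; a minimal sketch assuming it is run as root on one master at a time. The function names and the DRY_RUN switch are illustrative only, not something shipped with OpenShift. Note that the unit file above declares only an After= ordering on atomic-openshift-master-api.service (there is no Requires= on it), so systemd does not start or stop the API service by itself when the controllers are started or stopped; that is why both services appear explicitly.

```shell
#!/bin/sh
# Stop/start order for the master services on ONE master host.
# stop_master_services, start_master_services and DRY_RUN are hypothetical
# names used for illustration. With DRY_RUN=1 the commands are only printed.

CONTROLLERS=atomic-openshift-master-controllers.service
API=atomic-openshift-master-api.service

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"          # print the command instead of executing it
    else
        "$@"               # execute the command as given
    fi
}

stop_master_services() {
    # Controllers first, then the API service; stop early if a step fails.
    run systemctl stop "$CONTROLLERS" || return
    run systemctl stop "$API"
}

start_master_services() {
    # API service first (the controllers unit is ordered After= it).
    run systemctl start "$API" || return
    run systemctl start "$CONTROLLERS"
}
```

For example, "DRY_RUN=1 stop_master_services" prints the two stop commands in order; leave DRY_RUN unset to actually run them.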


Thanks,

/Florian



On Sun, Feb 21, 2016 at 1:07 AM, Clayton Coleman <[email protected]> wrote:

>
>
> On Feb 20, 2016, at 6:59 PM, Jason DeTiberus <[email protected]> wrote:
>
>
> On Feb 20, 2016 4:27 PM, "Florian Daniel Otel" <[email protected]> wrote:
> >
> > Hello all,
> >
> > I've installed a multi-master setup using "native HA" (i.e. HAProxy), just as described here:
> >
> > My problem:
> >
> > After a reboot, on two of my three masters -- namely "master2" and "master3" -- the "atomic-openshift-master-controllers" service keeps respawning every 30 seconds.
>
> This is expected. The controllers service can only be active on a single
> host. The active service acquires a lock within etcd and the others will
> continuously respawn and attempt to acquire the lock.
>
>
> That is not expected - the controllers should start and block until they
> are needed. They should never restart unless they lose their leader lock.
>
>
> >
> > The systemd logs for the service (here master2).
> >
> >
> > Feb 20 21:13:13 vspose-master2 systemd[1]: Starting Atomic OpenShift Master Controllers...
> > Feb 20 21:13:14 vspose-master2 atomic-openshift-master-controllers[3145]: I0220 21:13:14.669893    3145 plugins.go:71] No cloud provider specified.
> > Feb 20 21:13:14 vspose-master2 atomic-openshift-master-controllers[3145]: I0220 21:13:14.818515    3145 start_master.go:410] Starting controllers on 0.0.0.0:8444 (v3.1.0.4-16-g112fcc4)
> > Feb 20 21:13:14 vspose-master2 atomic-openshift-master-controllers[3145]: I0220 21:13:14.818566    3145 start_master.go:414] Using images from "openshift3/ose-<component>:latest"
> > Feb 20 21:13:14 vspose-master2 atomic-openshift-master-controllers[3145]: I0220 21:13:14.846183    3145 master.go:232] Started health checks at 0.0.0.0:8444
> > Feb 20 21:13:14 vspose-master2 atomic-openshift-master-controllers[3145]: I0220 21:13:14.864747    3145 master_config.go:250] Attempting to acquire controller lease as master-xct012o4, renewing every 30 seconds
> > Feb 20 21:14:44 vspose-master2 systemd[1]: atomic-openshift-master-controllers.service start operation timed out. Terminating.
> > Feb 20 21:14:44 vspose-master2 systemd[1]: atomic-openshift-master-controllers.service: main process exited, code=exited, status=2/INVALIDARGUMENT
> > Feb 20 21:14:44 vspose-master2 systemd[1]: Failed to start Atomic OpenShift Master Controllers.
> > Feb 20 21:14:44 vspose-master2 systemd[1]: Unit atomic-openshift-master-controllers.service entered failed state.
> > Feb 20 21:14:44 vspose-master2 systemd[1]: atomic-openshift-master-controllers.service failed.
> > Feb 20 21:14:44 vspose-master2 systemd[1]: atomic-openshift-master-controllers.service holdoff time over, scheduling restart.
> >
> >
> > My questions:
> >
> > - What has gone wrong here?
> >
> > - How do I recover from this?
> >
> > - What is the recommended procedure to shut down / restart the OpenShift master services in a multi-master setup?
> >
> > Normally, in a single-master environment, I run "systemctl stop/start/restart atomic-openshift-master", but it seems natural that the process in a multi-master environment would be more involved; I just cannot find any guidance on this.
> >
> >
> > Kindest thanks for the help,
> >
> >
> > /Florian
> >
> >
> >
> >
> >
> > _______________________________________________
> > users mailing list
> > [email protected]
> > http://lists.openshift.redhat.com/openshiftmm/listinfo/users
> >
>