Re: [OpenStack-Infra] ze04 & #532575

2018-01-11 Thread Paul Belanger
On Thu, Jan 11, 2018 at 07:58:11AM -0500, David Shrewsbury wrote:
> This is probably mostly my fault since I did not WIP or -2 my change in
> 532575 to keep it from getting merged without some infra coordination.
> 
> Because of that change, it is also required that we change the user
> zuul-executor starts as from root to zuul [1], and that we also open up
> the new default finger port on the executors [2]. Once those are in
> place, we should be ok to restart the executors.
> 
> As for ze04, since that one restarted as the 'root' user, and never
> dropped privileges to the 'zuul' user due to 532575, I'm not sure what
> state it is going to be in after applying [1] and [2]. Would it create
> files/directories as root that would now be inaccessible if it were to
> restart with the zuul user? Think logs, work dirs, etc...
> 
For permissions, we should likely confirm that puppet-zuul will properly
set up zuul:zuul ownership on the required folders. Then, on the next
puppet run, we'd be protected.
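
As a rough way to confirm that by hand, here is a minimal sketch in
Python that walks a directory tree and lists anything not owned by the
expected user. The function names are hypothetical, and which
directories to scan (logs, work dirs, etc.) is left to the caller since
the exact paths are not listed in this thread. It only reports; it does
not change ownership.

```python
import os
import pwd


def uid_for(username):
    """Look up the numeric uid for a local account name (e.g. 'zuul')."""
    return pwd.getpwnam(username).pw_uid


def paths_not_owned_by(root_dir, expected_uid):
    """Return every path at or under root_dir whose owner uid differs.

    Uses lstat so symlinks are checked themselves rather than followed.
    """
    mismatched = []
    if os.lstat(root_dir).st_uid != expected_uid:
        mismatched.append(root_dir)
    for dirpath, dirnames, filenames in os.walk(root_dir):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.lstat(path).st_uid != expected_uid:
                mismatched.append(path)
    return mismatched
```

Something like paths_not_owned_by(some_dir, uid_for("zuul")) for each
directory the executor writes to would show what the root-run executor
left behind, if anything.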
> 
> -Dave
> 
> 
> [1] https://review.openstack.org/532594
> [2] https://review.openstack.org/532709
> 
> 
> On Wed, Jan 10, 2018 at 11:53 PM, Ian Wienand wrote:
> 
> > Hi,
> >
> > To avoid you having to pull apart the logs starting ~ [1], we
> > determined that ze04.o.o was externally rebooted at 01:00 UTC (there is
> > a rather weird support ticket which you can look at, which is assigned
> > to a Rackspace employee but in our queue, saying the host became
> > unresponsive).
> >
> > Unfortunately that left a bunch of jobs orphaned and necessitated a
> > restart of zuul.
> >
> > However, recent changes to not run the executor as root [2] were thus
> > partially rolled out on ze04 as it came up after reboot.  As a
> > consequence when the host came back up the executor was running as
> > root with an invalid finger server.
> >
> > The executor on ze04 has been stopped, and the host placed in the
> > emergency file to avoid it coming back.  There are now some in-flight
> > patches to complete this transition, which will need to be staged a
> > bit more manually.
> >
> > The other executors have been left as is, based on the KISS theory
> > they shouldn't restart and pick up the code until this has been dealt
> > with.
> >
> > Thanks,
> >
> > -i
> >
> >
> > [1] http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2018-01-11.log.html#t2018-01-11T01:09:20
> > [2] https://review.openstack.org/#/c/532575/
> >
> > ___
> > OpenStack-Infra mailing list
> > OpenStack-Infra@lists.openstack.org
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra
> 
> 
> 
> 
> -- 
> David Shrewsbury (Shrews)


Re: [OpenStack-Infra] ze04 & #532575

2018-01-11 Thread David Shrewsbury
This is probably mostly my fault since I did not WIP or -2 my change in
532575 to keep it from getting merged without some infra coordination.

Because of that change, it is also required that we change the user
zuul-executor starts as from root to zuul [1], and that we also open up
the new default finger port on the executors [2]. Once those are in
place, we should be ok to restart the executors.
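
To sanity-check [2] once it is in place, a minimal sketch in Python of a
TCP reachability test. The function name is hypothetical, and the host
and port are supplied by the caller because the port number is not
stated in this thread. It only confirms that a TCP connection can be
opened, not that the finger service behind it answers correctly.

```python
import socket


def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        # create_connection resolves the host and applies the timeout;
        # the with-block closes the socket again right away.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False
```

Run from a machine that is supposed to reach the executors, a False
result would suggest the firewall change has not been applied yet (or
that the executor is not listening).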

As for ze04, since that one restarted as the 'root' user, and never
dropped privileges to the 'zuul' user due to 532575, I'm not sure what
state it is going to be in after applying [1] and [2]. Would it create
files/directories as root that would now be inaccessible if it were to
restart with the zuul user? Think logs, work dirs, etc...


-Dave


[1] https://review.openstack.org/532594
[2] https://review.openstack.org/532709


On Wed, Jan 10, 2018 at 11:53 PM, Ian Wienand wrote:

> Hi,
>
> To avoid you having to pull apart the logs starting ~ [1], we
> determined that ze04.o.o was externally rebooted at 01:00 UTC (there is
> a rather weird support ticket which you can look at, which is assigned
> to a Rackspace employee but in our queue, saying the host became
> unresponsive).
>
> Unfortunately that left a bunch of jobs orphaned and necessitated a
> restart of zuul.
>
> However, recent changes to not run the executor as root [2] were thus
> partially rolled out on ze04 as it came up after reboot.  As a
> consequence when the host came back up the executor was running as
> root with an invalid finger server.
>
> The executor on ze04 has been stopped, and the host placed in the
> emergency file to avoid it coming back.  There are now some in-flight
> patches to complete this transition, which will need to be staged a
> bit more manually.
>
> The other executors have been left as is, based on the KISS theory
> they shouldn't restart and pick up the code until this has been dealt
> with.
>
> Thanks,
>
> -i
>
>
> [1] http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2018-01-11.log.html#t2018-01-11T01:09:20
> [2] https://review.openstack.org/#/c/532575/
>




-- 
David Shrewsbury (Shrews)

[OpenStack-Infra] ze04 & #532575

2018-01-10 Thread Ian Wienand
Hi,

To avoid you having to pull apart the logs starting ~ [1], we
determined that ze04.o.o was externally rebooted at 01:00 UTC (there is
a rather weird support ticket which you can look at, which is assigned
to a Rackspace employee but in our queue, saying the host became
unresponsive).

Unfortunately that left a bunch of jobs orphaned and necessitated a
restart of zuul.

However, recent changes to not run the executor as root [2] were thus
partially rolled out on ze04 as it came up after reboot.  As a
consequence when the host came back up the executor was running as
root with an invalid finger server.

The executor on ze04 has been stopped, and the host placed in the
emergency file to avoid it coming back.  There are now some in-flight
patches to complete this transition, which will need to be staged a
bit more manually.

The other executors have been left as is, based on the KISS theory
they shouldn't restart and pick up the code until this has been dealt
with.

Thanks,

-i


[1] http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2018-01-11.log.html#t2018-01-11T01:09:20
[2] https://review.openstack.org/#/c/532575/
