Re: [OpenStack-Infra] ze04 & #532575
On Thu, Jan 11, 2018 at 07:58:11AM -0500, David Shrewsbury wrote:
> This is probably mostly my fault since I did not WIP or -2 my change in
> 532575 to keep it from getting merged without some infra coordination.
>
> Because of that change, it is also required that we change the user
> zuul-executor starts as from root to zuul [1], and that we also open up
> the new default finger port on the executors [2]. Once those are in
> place, we should be ok to restart the executors.
>
> As for ze04, since that one restarted as the 'root' user, and never
> dropped privileges to the 'zuul' user due to 532575, I'm not sure what
> state it is going to be in after applying [1] and [2]. Would it create
> files/directories as root that would now be inaccessible if it were to
> restart with the zuul user? Think logs, work dirs, etc...

For permissions, we should likely confirm that puppet-zuul will properly
set up zuul:zuul on the required folders. Then on the next puppet run we'd
be protected.

> -Dave
>
> [1] https://review.openstack.org/532594
> [2] https://review.openstack.org/532709
>
> On Wed, Jan 10, 2018 at 11:53 PM, Ian Wienand wrote:
> > Hi,
> >
> > To avoid you having to pull apart the logs starting ~ [1], we
> > determined that ze04.o.o was externally rebooted at 01:00 UTC (there is
> > a rather weird support ticket which you can look at, which is assigned
> > to a rackspace employee but in our queue, saying the host became
> > unresponsive).
> >
> > Unfortunately that left a bunch of jobs orphaned and necessitated a
> > restart of zuul.
> >
> > However, recent changes to not run the executor as root [2] were thus
> > partially rolled out on ze04 as it came up after reboot. As a
> > consequence when the host came back up the executor was running as
> > root with an invalid finger server.
> >
> > The executor on ze04 has been stopped, and the host placed in the
> > emergency file to avoid it coming back. There are now some in-flight
> > patches to complete this transition, which will need to be staged a
> > bit more manually.
> >
> > The other executors have been left as is, based on the KISS theory
> > they shouldn't restart and pick up the code until this has been dealt
> > with.
> >
> > Thanks,
> >
> > -i
> >
> > [1] http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2018-01-11.log.html#t2018-01-11T01:09:20
> > [2] https://review.openstack.org/#/c/532575/
>
> --
> David Shrewsbury (Shrews)

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra
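The ownership question raised in this message (whether an executor that ran as root left behind files the zuul user can no longer access) can be checked before a restart. The following is only a minimal sketch, not something taken from puppet-zuul or Zuul itself: the function names are hypothetical, and which directories to scan (logs, work dirs, and so on) is an assumption left to the operator.

```python
import os
import pwd
from pathlib import Path
from typing import Iterator, List


def paths_not_owned_by(root: str, expected_uid: int) -> Iterator[Path]:
    """Yield every path under root (root included) whose owner is not expected_uid."""
    top = Path(root)
    if os.lstat(top).st_uid != expected_uid:
        yield top
    for dirpath, dirnames, filenames in os.walk(top):
        for name in dirnames + filenames:
            path = Path(dirpath) / name
            # lstat reports a symlink's own owner instead of following it.
            if os.lstat(path).st_uid != expected_uid:
                yield path


def report(user: str, directories: List[str]) -> List[Path]:
    """Collect the offending paths for a named user across several directories.

    Both the user name and the directory list are supplied by the caller;
    nothing here knows where a given deployment keeps its files.
    """
    uid = pwd.getpwnam(user).pw_uid
    found: List[Path] = []
    for directory in directories:
        found.extend(paths_not_owned_by(directory, uid))
    return found
```

Calling `report("zuul", [...])` with the executor's log and work directories would list anything still owned by another user, such as root. An empty result suggests the directories are safe for a restart under the zuul user.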
Re: [OpenStack-Infra] ze04 & #532575
This is probably mostly my fault since I did not WIP or -2 my change in
532575 to keep it from getting merged without some infra coordination.

Because of that change, it is also required that we change the user
zuul-executor starts as from root to zuul [1], and that we also open up
the new default finger port on the executors [2]. Once those are in
place, we should be ok to restart the executors.

As for ze04, since that one restarted as the 'root' user, and never
dropped privileges to the 'zuul' user due to 532575, I'm not sure what
state it is going to be in after applying [1] and [2]. Would it create
files/directories as root that would now be inaccessible if it were to
restart with the zuul user? Think logs, work dirs, etc...

-Dave

[1] https://review.openstack.org/532594
[2] https://review.openstack.org/532709

On Wed, Jan 10, 2018 at 11:53 PM, Ian Wienand wrote:
> Hi,
>
> To avoid you having to pull apart the logs starting ~ [1], we
> determined that ze04.o.o was externally rebooted at 01:00 UTC (there is
> a rather weird support ticket which you can look at, which is assigned
> to a rackspace employee but in our queue, saying the host became
> unresponsive).
>
> Unfortunately that left a bunch of jobs orphaned and necessitated a
> restart of zuul.
>
> However, recent changes to not run the executor as root [2] were thus
> partially rolled out on ze04 as it came up after reboot. As a
> consequence when the host came back up the executor was running as
> root with an invalid finger server.
>
> The executor on ze04 has been stopped, and the host placed in the
> emergency file to avoid it coming back. There are now some in-flight
> patches to complete this transition, which will need to be staged a
> bit more manually.
>
> The other executors have been left as is, based on the KISS theory
> they shouldn't restart and pick up the code until this has been dealt
> with.
>
> Thanks,
>
> -i
>
> [1] http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2018-01-11.log.html#t2018-01-11T01:09:20
> [2] https://review.openstack.org/#/c/532575/

--
David Shrewsbury (Shrews)
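The message above makes opening the new finger port on the executors a precondition for restarting them. One way to confirm that precondition from another host is a plain TCP connection attempt. This is a rough sketch under stated assumptions: `port_is_open` is a hypothetical helper, and the thread does not give the port number, so host and port must come from the actual executor configuration.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

A `True` result only shows that something accepts connections on that port through the firewall. It does not prove that the listener is the executor's finger server or that it is running as the intended user.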
[OpenStack-Infra] ze04 & #532575
Hi,

To avoid you having to pull apart the logs starting ~ [1], we
determined that ze04.o.o was externally rebooted at 01:00 UTC (there is
a rather weird support ticket which you can look at, which is assigned
to a rackspace employee but in our queue, saying the host became
unresponsive).

Unfortunately that left a bunch of jobs orphaned and necessitated a
restart of zuul.

However, recent changes to not run the executor as root [2] were thus
partially rolled out on ze04 as it came up after reboot. As a
consequence when the host came back up the executor was running as
root with an invalid finger server.

The executor on ze04 has been stopped, and the host placed in the
emergency file to avoid it coming back. There are now some in-flight
patches to complete this transition, which will need to be staged a
bit more manually.

The other executors have been left as is, based on the KISS theory
they shouldn't restart and pick up the code until this has been dealt
with.

Thanks,

-i

[1] http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2018-01-11.log.html#t2018-01-11T01:09:20
[2] https://review.openstack.org/#/c/532575/