[OpenStack-Infra] Zuul v3: some layout checks disabled in project-config
Hi,

With https://review.openstack.org/492697 we are moving gating of Zuul itself and some related job repos from Zuul v2 to Zuul v3. As part of this, we need to disable some of the checks that we perform on the layout file. That change disables the following checks for the openstack-infra/* repos only:

* usage of the merge-check template
* at least one check job
* at least one gate job
* every gerrit project appears in zuul

The first three should only be needed for a short time while we continue to construct the post and release pipelines in Zuul v3. After that is complete, we should be able to reinstate those checks, but we will need to keep the final check disabled (for openstack-infra repos at least) until Zuul v2 is retired.

-Jim

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra
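For illustration, the kinds of layout checks being disabled could be sketched as below. This is a hypothetical validation function, not the actual openstack-infra/project-config test code, and the data layout it assumes (dicts with `templates`, `check`, and `gate` keys) is an invention for the sketch:

```python
# Hypothetical sketch of the layout checks described above; this is NOT
# the actual openstack-infra/project-config test code, and the data
# layout (dicts with 'templates', 'check', 'gate' keys) is an assumption.

def layout_problems(project, exempt=False):
    """Return a list of problems found in one project's layout entry."""
    problems = []
    if exempt:
        # e.g. openstack-infra/* repos, temporarily excluded from checks
        return problems
    name = project['name']
    if 'merge-check' not in project.get('templates', []):
        problems.append('%s: does not use the merge-check template' % name)
    if not project.get('check'):
        problems.append('%s: has no check jobs' % name)
    if not project.get('gate'):
        problems.append('%s: has no gate jobs' % name)
    return problems

def missing_from_zuul(gerrit_projects, zuul_projects):
    """Gerrit projects that do not appear anywhere in the Zuul layout."""
    return sorted(set(gerrit_projects) - set(zuul_projects))
```

The exempt flag corresponds to the temporary carve-out for openstack-infra/* repos; once the post and release pipelines exist in Zuul v3, the first three checks can be re-enabled by dropping the exemption.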
Re: [OpenStack-Infra] citycloud lon1 mirror postmortem
On Thu, Aug 10, 2017 at 10:34:56PM +1000, Ian Wienand wrote:

> Hi,
>
> In response to sdague reporting that citycloud jobs were timing out, I
> investigated the mirror, suspecting it was not providing data fast enough.
>
> There were some 170 htcacheclean jobs running, and the host had a load
> over 100. I killed all these, but performance was still unacceptable.
>
> I suspected networking, but since the host was in such a bad state I
> decided to reboot it. Unfortunately it would get an address from DHCP
> but seemed to have DNS issues ... eventually it would ping but nothing
> else was working.
>
> nodepool.o.o was placed in the emergency file and I removed lon1 to
> avoid jobs going there.
>
> I used the citycloud live chat, and Kim helpfully investigated and
> ended up migrating mirror.lon1.citycloud.openstack.org to a new
> compute node. This appeared to fix things, for us at least.
>
> nodepool.o.o is removed from the emergency file and the original config
> restored.
>
> With hindsight, clearly the excessive htcacheclean processes were due
> to a feedback loop: processes slowed by the network/DNS issues bunching
> up over time. However, I still think we could minimise further issues
> by running it under a lock [1]. Other than that, I am not sure there is
> much else we can do; I think this was largely an upstream issue.
>
> Cheers,
>
> -i
>
> [1] https://review.openstack.org/#/c/492481/

Thanks. I also noticed a job failing to download a package from mirror.iad.rax.openstack.org. When I SSH'd to the server I too saw high load (6.0+) and multiple htcacheclean processes running. I audited the other mirrors and they had the same problem, so I killed those processes as well. I can confirm the lock patch [1] has merged, and I will keep an eye on it.

I did notice that mirror.lon1.citycloud.openstack.org was still slow to react to shell commands. I still think we have an I/O bottleneck somewhere; possibly the compute host is throttling something. We should keep an eye on it.

-PB
[OpenStack-Infra] citycloud lon1 mirror postmortem
Hi,

In response to sdague reporting that citycloud jobs were timing out, I investigated the mirror, suspecting it was not providing data fast enough.

There were some 170 htcacheclean jobs running, and the host had a load over 100. I killed all these, but performance was still unacceptable.

I suspected networking, but since the host was in such a bad state I decided to reboot it. Unfortunately it would get an address from DHCP but seemed to have DNS issues ... eventually it would ping but nothing else was working.

nodepool.o.o was placed in the emergency file and I removed lon1 to avoid jobs going there.

I used the citycloud live chat, and Kim helpfully investigated and ended up migrating mirror.lon1.citycloud.openstack.org to a new compute node. This appeared to fix things, for us at least.

nodepool.o.o is removed from the emergency file and the original config restored.

With hindsight, clearly the excessive htcacheclean processes were due to a feedback loop: processes slowed by the network/DNS issues bunching up over time. However, I still think we could minimise further issues by running it under a lock [1]. Other than that, I am not sure there is much else we can do; I think this was largely an upstream issue.

Cheers,

-i

[1] https://review.openstack.org/#/c/492481/
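The lock proposed in [1] can be sketched roughly as follows. This is a minimal illustration of the idea (a non-blocking exclusive file lock so overlapping cron runs exit instead of piling up), not the actual change; the lock path and any command passed in are hypothetical:

```python
# Minimal sketch of running a cleaner under a non-blocking exclusive
# lock, so overlapping invocations exit instead of piling up. The lock
# path below is illustrative, not the actual project-config value.
import fcntl
import subprocess

def run_exclusive(cmd, lock_path='/tmp/htcacheclean.lock'):
    """Run cmd only if no other invocation holds the lock; else skip."""
    lock_file = open(lock_path, 'w')
    try:
        # LOCK_NB makes this fail immediately instead of queueing up
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        lock_file.close()
        return None  # another run is in progress; do nothing
    try:
        return subprocess.call(cmd)
    finally:
        lock_file.close()  # closing the descriptor releases the lock
```

On the server itself the same effect is typically achieved by wrapping the cron entry with flock(1), e.g. `flock -n /path/to/lockfile htcacheclean ...`, so that a slow run simply causes subsequent scheduled runs to exit immediately.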