On Fri, Jan 19, 2018 at 12:23 PM, Ben Nemec <[email protected]> wrote:
>
> On 01/18/2018 09:45 AM, Emilien Macchi wrote:
>
>> On Thu, Jan 18, 2018 at 6:34 AM, Or Idgar <[email protected]> wrote:
>>
>>> Hi,
>>> we're encountering many timeouts for zuul gates in TripleO.
>>> For example, see
>>> http://logs.openstack.org/95/508195/28/check-tripleo/tripleo-ci-centos-7-ovb-ha-oooq/c85fcb7/.
>>>
>>> Rechecks don't help: sometimes a specific gate ends successfully and
>>> sometimes it does not.
>>> The problem is that after a recheck it's not always the same gate that
>>> fails.
>>>
>>> Is there someone who has access to the server load to see what causes
>>> this?
>>> Alternatively, is there something we can do to reduce the running
>>> time of each gate?
>>>
>>
>> We're migrating to RDO Cloud for OVB jobs:
>> https://review.openstack.org/#/c/526481/
>> It's a work in progress but will help a lot for OVB timeouts on RH1.
>>
>> I'll let the CI folks comment on that topic.
>>
>
> I noticed that the timeouts on rh1 have been especially bad as of late so
> I did a little testing and found that it did seem to be running more slowly
> than it should. After some investigation I found that 6 of our compute
> nodes have warning messages that the CPU was throttled due to high
> temperature. I've disabled 4 of them that had a lot of warnings. The other
> 2 only had a handful of warnings so I'm hopeful we can leave them active
> without affecting job performance too much. It won't accomplish much if we
> disable the overheating nodes only to overload the remaining ones.
>
> I'll follow up with our hardware people and see if we can determine why
> these specific nodes are overheating. They seem to be running 20 degrees C
> hotter than the rest of the nodes.
>
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: [email protected]?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>

For the latest discussion and to-dos before the rh1 OVB jobs are migrated to
RDO Cloud, look here [1]. TL;DR: we're looking for a run of seven days where
the jobs pass at around 80% or better in check. We've reported a number of
issues with the environment and, AFAIK, everything has just recently been
resolved.

[1] https://trello.com/c/wGUUEqty/384-steps-needed-to-migrate-ovb-to-rdo-cloud
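
To make the "seven days at around 80%" criterion concrete, here is a rough sketch of how it could be checked. It is only an illustration: the function name, the input format (a dict of per-day passed/total counts), and the choice to aggregate the pass rate over the whole window rather than require 80% on each day are all assumptions, not something taken from the Trello card or from any existing CI tooling.

```python
from datetime import date, timedelta


def first_passing_window(daily_results, window_days=7, threshold=0.80):
    """Return the first day that starts a qualifying window, or None.

    daily_results is a hypothetical mapping of date -> (passed, total)
    for check jobs. A window qualifies when every one of its
    consecutive days has data and the aggregate pass rate across the
    window is at least `threshold`.
    """
    for start in sorted(daily_results):
        window = [start + timedelta(days=i) for i in range(window_days)]
        # Skip windows that have gaps in the data.
        if not all(day in daily_results for day in window):
            continue
        passed = sum(daily_results[day][0] for day in window)
        total = sum(daily_results[day][1] for day in window)
        if total and passed / total >= threshold:
            return start
    return None
```

Called on a dict of daily counts, this returns the first date from which seven consecutive days together reach the threshold, or None if no such run exists yet. Requiring 80% on every single day would be a stricter reading of the same sentence.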
