On Mon, Aug 6, 2018 at 5:55 PM Wesley Hayutin <whayu...@redhat.com> wrote:
> On Mon, Aug 6, 2018 at 12:56 PM Wesley Hayutin <whayu...@redhat.com>
> wrote:
>
>> Greetings,
>>
>> There is currently an unplanned outage for the tripleo 3rd party OVB
>> based jobs.
>> We will contact the list when there are more details.
>>
>> Thank you!
>
> OK,
> I'm going to call an end to the current outage. We are closely monitoring
> the ovb 3rd party jobs.
> I called the outage when we hit [1]. Once I deleted the stack that moved
> the HA routers to back_up state, the networking came back online.
>
> Additionally, Kieran and I had to work through a number of instances that
> required admin access to remove.
> Once those resources were cleaned up, our CI tooling removed the rest of
> the stacks in delete_failed status. The stacks in delete_failed status
> were holding IP addresses, which caused new stacks to fail [2].
>
> There are still active issues that could cause OVB jobs to fail.
> The connection issue [3] was originally thought to be DNS; however, that
> turned out not to be the case.
> You may also see your job end with a "node_failure" status. Paul has sent
> updates about this issue and is working on a patch and integration into
> rdo software factory.
>
> The CI team is close to including all the console logs in the regular
> job logs; for now, if needed, they can be viewed at [5].
> We are also adding the bmc to the list of instances that we collect logs
> from.
>
> *To summarize*, the most recent outage was infra related, and the errors
> were swallowed up in the bmc console log, which at the time was not
> available to users.
>
> We continue to monitor the ovb jobs at http://cistatus.tripleo.org/
> The legacy-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-master job
> is at a 53% pass rate; it needs to move to a >85% pass rate to match other
> check jobs.
>
> Thanks all!

Following up, the
legacy-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-master job is at a
78.6% pass rate today. Certainly an improvement.
We had a quick sync meeting this morning w/ RDO-Cloud admins, tripleo and
infra folks. There are two remaining issues: an active issue w/ network
connections, and an issue w/ instances booting into node_failure status.
New issues creep up all the time and we're actively monitoring those as
well. Still shooting for an 85% pass rate.

Thanks all

> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1570136
> [2] http://paste.openstack.org/show/727444/
> [3] https://bugs.launchpad.net/tripleo/+bug/1785342
> [4] https://review.openstack.org/#/c/584488/
> [5] http://38.145.34.41/console-logs/?C=M;O=D

--
Wes Hayutin
Associate MANAGER
Red Hat
<https://www.redhat.com/>
whayu...@redhat.com    T: +1919 4232509    IRC: weshay
<https://red.ht/sig>
View my calendar and check my availability for meetings HERE
<https://calendar.google.com/calendar/b/1/embed?src=whayu...@redhat.com&ctz=America/New_York>
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev