Re: [openstack-dev] [tripleo] 3rd party ovb jobs are down

2018-08-07 Thread Wesley Hayutin
On Mon, Aug 6, 2018 at 5:55 PM Wesley Hayutin wrote:

> On Mon, Aug 6, 2018 at 12:56 PM Wesley Hayutin 
> wrote:
>
>> Greetings,
>>
>> There is currently an unplanned outage of the tripleo 3rd party OVB
>> based jobs.
>> We will contact the list when there are more details.
>>
>> Thank you!
>>
>
> OK,
> I'm going to call an end to the current outage. We are closely monitoring
> the ovb 3rd party jobs.
> I called the outage when we hit [1].  Once I deleted the stack
> that moved the HA routers to back_up state, the networking came back online.
>
> Additionally, Kieran and I had to work through a number of instances that
> required admin access to remove.
> Once those resources were cleaned up, our CI tooling removed the rest of
> the stacks in delete_failed status. The stacks in delete_failed status
> were holding IP addresses, which was causing new stacks to fail [2].
>
> There are still active issues that could cause OVB jobs to fail.
> The connection issue [3] was originally thought to be DNS; however, that
> turned out not to be the case.
> You may also see your job have a "node_failure" status. Paul has sent
> updates about this issue and is working on a patch and integration into rdo
> software factory.
>
> The CI team is close to including all the console logs in the regular
> job logs; in the meantime, if needed, they can be viewed at [5].
> We are also adding the bmc to the list of instances that we collect logs
> from.
>
> *To summarize*: the most recent outage was infra related, and the errors
> were swallowed up in the bmc console log, which at the time was not available
> to users.
>
> We continue to monitor the ovb jobs at http://cistatus.tripleo.org/
> The legacy-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-master job
> is at a 53% pass rate; it needs to move to a > 85% pass rate to match other
> check jobs.
>
> Thanks all!
>

Following up,
the legacy-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-master job is at
a 78.6% pass rate today. Certainly an improvement.

We had a quick sync meeting this morning w/ RDO-Cloud admins, tripleo and
infra folks. There are two remaining issues: an active issue w/ network
connections, and an issue w/ instances booting into node_failure status.
New issues crop up all the time and we're actively monitoring those as well.
We are still shooting for an 85% pass rate.

Thanks all



>
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1570136
> [2] http://paste.openstack.org/show/727444/
> [3] https://bugs.launchpad.net/tripleo/+bug/1785342
> [4] https://review.openstack.org/#/c/584488/
> [5] http://38.145.34.41/console-logs/?C=M;O=D
>
-- 
Wes Hayutin
Associate Manager, Red Hat
hayu...@redhat.com
IRC: weshay

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] 3rd party ovb jobs are down

2018-08-06 Thread Wesley Hayutin
On Mon, Aug 6, 2018 at 12:56 PM Wesley Hayutin wrote:

> Greetings,
>
> There is currently an unplanned outage of the tripleo 3rd party OVB
> based jobs.
> We will contact the list when there are more details.
>
> Thank you!
>

OK,
I'm going to call an end to the current outage. We are closely monitoring
the ovb 3rd party jobs.
I called the outage when we hit [1].  Once I deleted the stack that
moved the HA routers to back_up state, the networking came back online.

Additionally, Kieran and I had to work through a number of instances that
required admin access to remove.
Once those resources were cleaned up, our CI tooling removed the rest of
the stacks in delete_failed status. The stacks in delete_failed status
were holding IP addresses, which was causing new stacks to fail [2].

There are still active issues that could cause OVB jobs to fail.
The connection issue [3] was originally thought to be DNS; however, that
turned out not to be the case.
You may also see your job have a "node_failure" status. Paul has sent
updates about this issue and is working on a patch and integration into rdo
software factory.

The CI team is close to including all the console logs in the regular job
logs; in the meantime, if needed, they can be viewed at [5].
We are also adding the bmc to the list of instances that we collect logs
from.

*To summarize*: the most recent outage was infra related, and the errors
were swallowed up in the bmc console log, which at the time was not available
to users.

We continue to monitor the ovb jobs at http://cistatus.tripleo.org/
The legacy-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-master job is
at a 53% pass rate; it needs to move to a > 85% pass rate to match other
check jobs.

Thanks all!

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1570136
[2] http://paste.openstack.org/show/727444/
[3] https://bugs.launchpad.net/tripleo/+bug/1785342
[4] https://review.openstack.org/#/c/584488/
[5] http://38.145.34.41/console-logs/?C=M;O=D






-- 
Wes Hayutin
Associate Manager, Red Hat
hayu...@redhat.com
IRC: weshay
