Re: [openstack-dev] [tripleo] Blocking gate - do not recheck / rebase / approve any patch now (please)

2017-10-26 Thread Attila Darazs

On 10/26/2017 06:14 AM, Emilien Macchi wrote:

On Wed, Oct 25, 2017 at 1:59 PM, Emilien Macchi  wrote:

Quick update before being afk for some hours:

- Still trying to land https://review.openstack.org/#/c/513701 (thanks
Paul for promoting it in gate).


Landed.


- Disabling voting on scenario001 and scenario004 container jobs:
https://review.openstack.org/#/c/515188/


Done, please be very careful while these jobs are not voting.
If in any doubt, please ping me or fultonj or gfidente on #tripleo.


- overcloudrc/keystone v2 workaround:
https://review.openstack.org/#/c/515161/ (d0ugal will work on proper
fix for https://bugs.launchpad.net/tripleo/+bug/1727454)


Merged - Dougal will work on the real fix this week, but it is not urgent anymore.


- Fixing zaqar/notification issues on
https://review.openstack.org/#/c/515123 - we hope that helps to reduce
some failures in gate


In gate right now and hopefully merged in less than 2 hours.
Otherwise, please keep rechecking it.
According to Thomas Hervé, it will reduce the chance of timeouts.


- puppet-tripleo gate broken on stable branches (syntax jobs not
running properly) - jeblair is looking at it now


jeblair will provide a fix hopefully this week but this is not
critical at this time.
Thanks Jim for your help.


Once again, we'll need a retrospective to see why we reached that
terrible state, but for now let's focus on getting our CI back into
good shape.
Thanks a ton to everyone who is involved,


I'm now restoring all patches that I killed from the gate.
You can now recheck / rebase / approve what you want, but please save
our CI resources and do it in moderation. We are not done yet.

I won't declare victory yet, but we've merged almost all our blockers.
One is still missing but is currently in the gate:
https://review.openstack.org/515123 - it needs babysitting until merged.

Now let's see how RDO promotion works. We're close :-)


We also have to change the tenant rc file from overcloudrc to 
overcloudrc.v3 for the validate-simple role to unblock promotion on master.


I created a bug to track that problem and am going to post a fix soon:

https://bugs.launchpad.net/tripleo/+bug/1727698

Attila

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] Blocking gate - do not recheck / rebase / approve any patch now (please)

2017-10-26 Thread Bogdan Dobrelya

Thank you for working on this!
I know it is needed to unblock development of tripleo. I do have a few
comments inline, though.


On 10/26/17 6:14 AM, Emilien Macchi wrote:

On Wed, Oct 25, 2017 at 1:59 PM, Emilien Macchi  wrote:

Quick update before being afk for some hours:

- Still trying to land https://review.openstack.org/#/c/513701 (thanks
Paul for promoting it in gate).


Landed.


- Disabling voting on scenario001 and scenario004 container jobs:
https://review.openstack.org/#/c/515188/


Done, please be very careful while these jobs are not voting.
If in any doubt, please ping me or fultonj or gfidente on #tripleo.


- overcloudrc/keystone v2 workaround:
https://review.openstack.org/#/c/515161/ (d0ugal will work on proper
fix for https://bugs.launchpad.net/tripleo/+bug/1727454)


Merged - Dougal will work on the real fix this week, but it is not urgent anymore.


- Fixing zaqar/notification issues on
https://review.openstack.org/#/c/515123 - we hope that helps to reduce
some failures in gate


In gate right now and hopefully merged in less than 2 hours.
Otherwise, please keep rechecking it.
According to Thomas Hervé, it will reduce the chance of timeouts.


- puppet-tripleo gate broken on stable branches (syntax jobs not
running properly) - jeblair is looking at it now


jeblair will provide a fix hopefully this week but this is not
critical at this time.
Thanks Jim for your help.


Once again, we'll need a retrospective to see why we reached that
terrible state, but for now let's focus on getting our CI back into
good shape.
Thanks a ton to everyone who is involved,


I'm now restoring all patches that I killed from the gate.
You can now recheck / rebase / approve what you want, but please save
our CI resources and do it in moderation. We are not done yet.

I won't declare victory yet, but we've merged almost all our blockers.
One is still missing but is currently in the gate:
https://review.openstack.org/515123 - it needs babysitting until merged.


I have to warn tripleo folks about any instack-only changes these days.
Please make sure each instack-only change, like Hiera overrides, has
follow-up patches for the containerized cases as well, which do not use
instack. Otherwise, we put the whole containers effort at high risk of
keeping regressions that were already fixed for non-containers. That is
dangerous, given that we disable voting for it from time to time.


For this particular case, please add it in a separate review in 
puppet/services/zaqar*. Thanks @bandini for confirming that on IRC.




Now let's see how RDO promotion works. We're close :-)

Thanks everyone,


On Wed, Oct 25, 2017 at 7:25 AM, Emilien Macchi  wrote:

Status:

- The Heat Convergence switch *might* be a reason why the overcloud times
out so much. Thomas proposed disabling it:
https://review.openstack.org/515077
- Every time a patch fails in the tripleo gate queue, it resets the
gate. I proposed to remove this common queue:
https://review.openstack.org/515070
- I cleared the patches in check and queue to make sure the 2 blockers
are tested and can be merged in priority. I'll keep an eye on it
today.

Any help is very welcome.

On Wed, Oct 25, 2017 at 5:58 AM, Emilien Macchi  wrote:

We have been working very hard to get a package/container promotion
(for 44 days now), and our current blocker is
https://review.openstack.org/#/c/513701/.

Because the gate queue is huge, we decided to block the gate and kill
all the jobs running there until we can land
https://review.openstack.org/#/c/513701/ and its backport
https://review.openstack.org/#/c/514584 (both are blocking the whole
production chain).
We hope to promote after these 2 patches, unless there is something
else, in which case we would move on to the next problem.

We hope you understand and support us during this effort.
So please do not recheck, rebase or approve any patch until further notice.

Thank you,
--
Emilien Macchi











--
Best regards,
Bogdan Dobrelya,
Irc #bogdando



Re: [openstack-dev] [tripleo] Blocking gate - do not recheck / rebase / approve any patch now (please)

2017-10-25 Thread Emilien Macchi
On Wed, Oct 25, 2017 at 1:59 PM, Emilien Macchi  wrote:
> Quick update before being afk for some hours:
>
> - Still trying to land https://review.openstack.org/#/c/513701 (thanks
> Paul for promoting it in gate).

Landed.

> - Disabling voting on scenario001 and scenario004 container jobs:
> https://review.openstack.org/#/c/515188/

Done, please be very careful while these jobs are not voting.
If in any doubt, please ping me or fultonj or gfidente on #tripleo.

> - overcloudrc/keystone v2 workaround:
> https://review.openstack.org/#/c/515161/ (d0ugal will work on proper
> fix for https://bugs.launchpad.net/tripleo/+bug/1727454)

Merged - Dougal will work on the real fix this week, but it is not urgent anymore.

> - Fixing zaqar/notification issues on
> https://review.openstack.org/#/c/515123 - we hope that helps to reduce
> some failures in gate

In gate right now and hopefully merged in less than 2 hours.
Otherwise, please keep rechecking it.
According to Thomas Hervé, it will reduce the chance of timeouts.

> - puppet-tripleo gate broken on stable branches (syntax jobs not
> running properly) - jeblair is looking at it now

jeblair will provide a fix hopefully this week but this is not
critical at this time.
Thanks Jim for your help.

> Once again, we'll need a retrospective to see why we reached that
> terrible state, but for now let's focus on getting our CI back into
> good shape.
> Thanks a ton to everyone who is involved,

I'm now restoring all patches that I killed from the gate.
You can now recheck / rebase / approve what you want, but please save
our CI resources and do it in moderation. We are not done yet.

I won't declare victory yet, but we've merged almost all our blockers.
One is still missing but is currently in the gate:
https://review.openstack.org/515123 - it needs babysitting until merged.

Now let's see how RDO promotion works. We're close :-)

Thanks everyone,

> On Wed, Oct 25, 2017 at 7:25 AM, Emilien Macchi  wrote:
>> Status:
>>
>> - The Heat Convergence switch *might* be a reason why the overcloud times
>> out so much. Thomas proposed disabling it:
>> https://review.openstack.org/515077
>> - Every time a patch fails in the tripleo gate queue, it resets the
>> gate. I proposed to remove this common queue:
>> https://review.openstack.org/515070
>> - I cleared the patches in check and queue to make sure the 2 blockers
>> are tested and can be merged in priority. I'll keep an eye on it
>> today.
>>
>> Any help is very welcome.
>>
>> On Wed, Oct 25, 2017 at 5:58 AM, Emilien Macchi  wrote:
>>> We have been working very hard to get a package/container promotion
>>> (for 44 days now), and our current blocker is
>>> https://review.openstack.org/#/c/513701/.
>>>
>>> Because the gate queue is huge, we decided to block the gate and kill
>>> all the jobs running there until we can land
>>> https://review.openstack.org/#/c/513701/ and its backport
>>> https://review.openstack.org/#/c/514584 (both are blocking the whole
>>> production chain).
>>> We hope to promote after these 2 patches, unless there is something
>>> else, in which case we would move on to the next problem.
>>>
>>> We hope you understand and support us during this effort.
>>> So please do not recheck, rebase or approve any patch until further notice.
>>>
>>> Thank you,
>>> --
>>> Emilien Macchi



-- 
Emilien Macchi



Re: [openstack-dev] [tripleo] Blocking gate - do not recheck / rebase / approve any patch now (please)

2017-10-25 Thread Emilien Macchi
Quick update before being afk for some hours:

- Still trying to land https://review.openstack.org/#/c/513701 (thanks
Paul for promoting it in gate).
- Disabling voting on scenario001 and scenario004 container jobs:
https://review.openstack.org/#/c/515188/
- overcloudrc/keystone v2 workaround:
https://review.openstack.org/#/c/515161/ (d0ugal will work on proper
fix for https://bugs.launchpad.net/tripleo/+bug/1727454)
- Fixing zaqar/notification issues on
https://review.openstack.org/#/c/515123 - we hope that helps to reduce
some failures in gate
- puppet-tripleo gate broken on stable branches (syntax jobs not
running properly) - jeblair is looking at it now

Once again, we'll need a retrospective to see why we reached that
terrible state, but for now let's focus on getting our CI back into
good shape.
Thanks a ton to everyone who is involved,

On Wed, Oct 25, 2017 at 7:25 AM, Emilien Macchi  wrote:
> Status:
>
> - The Heat Convergence switch *might* be a reason why the overcloud times
> out so much. Thomas proposed disabling it:
> https://review.openstack.org/515077
> - Every time a patch fails in the tripleo gate queue, it resets the
> gate. I proposed to remove this common queue:
> https://review.openstack.org/515070
> - I cleared the patches in check and queue to make sure the 2 blockers
> are tested and can be merged in priority. I'll keep an eye on it
> today.
>
> Any help is very welcome.
>
> On Wed, Oct 25, 2017 at 5:58 AM, Emilien Macchi  wrote:
>> We have been working very hard to get a package/container promotion
>> (for 44 days now), and our current blocker is
>> https://review.openstack.org/#/c/513701/.
>>
>> Because the gate queue is huge, we decided to block the gate and kill
>> all the jobs running there until we can land
>> https://review.openstack.org/#/c/513701/ and its backport
>> https://review.openstack.org/#/c/514584 (both are blocking the whole
>> production chain).
>> We hope to promote after these 2 patches, unless there is something
>> else, in which case we would move on to the next problem.
>>
>> We hope you understand and support us during this effort.
>> So please do not recheck, rebase or approve any patch until further notice.
>>
>> Thank you,
>> --
>> Emilien Macchi



-- 
Emilien Macchi



Re: [openstack-dev] [tripleo] Blocking gate - do not recheck / rebase / approve any patch now (please)

2017-10-25 Thread Emilien Macchi
Status:

- The Heat Convergence switch *might* be a reason why the overcloud times
out so much. Thomas proposed disabling it:
https://review.openstack.org/515077
- Every time a patch fails in the tripleo gate queue, it resets the
gate. I proposed to remove this common queue:
https://review.openstack.org/515070
- I cleared the patches in check and queue to make sure the 2 blockers
are tested and can be merged in priority. I'll keep an eye on it
today.
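
A rough sketch of why a shared gate queue is costly, assuming the usual
dependent-pipeline behaviour where a failing change is dropped and every
change queued behind it restarts its jobs. The change ids, the project
names and both helper functions are hypothetical and only serve the
illustration; this is not how the gate itself is implemented:

```python
from collections import defaultdict


def count_restarts(queue, failing):
    """Count job restarts in one dependent gate queue.

    queue   -- change ids in gate order, head of the queue first
    failing -- set of change ids whose jobs fail
    Simplified model: a failing change is dropped from the queue and
    every change behind it has to restart its jobs.
    """
    restarts = 0
    for position, change in enumerate(queue):
        if change in failing:
            restarts += len(queue) - position - 1
    return restarts


def split_by_project(queue, project_of):
    """Split one shared queue into per-project queues, keeping order."""
    queues = defaultdict(list)
    for change in queue:
        queues[project_of[change]].append(change)
    return list(queues.values())


# Hypothetical changes and projects, for illustration only.
project_of = {"a1": "project-a", "b1": "project-b", "a2": "project-a",
              "c1": "project-c", "b2": "project-b"}
shared_queue = ["a1", "b1", "a2", "c1", "b2"]
failing = {"a1"}

shared_restarts = count_restarts(shared_queue, failing)
split_restarts = sum(count_restarts(q, failing)
                     for q in split_by_project(shared_queue, project_of))
print(shared_restarts, split_restarts)
```

In this toy case a single failure at the head of the shared queue restarts
four changes, while with per-project queues only the one change from the
same project restarts.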

Any help is very welcome.

On Wed, Oct 25, 2017 at 5:58 AM, Emilien Macchi  wrote:
> We have been working very hard to get a package/container promotion
> (for 44 days now), and our current blocker is
> https://review.openstack.org/#/c/513701/.
>
> Because the gate queue is huge, we decided to block the gate and kill
> all the jobs running there until we can land
> https://review.openstack.org/#/c/513701/ and its backport
> https://review.openstack.org/#/c/514584 (both are blocking the whole
> production chain).
> We hope to promote after these 2 patches, unless there is something
> else, in which case we would move on to the next problem.
>
> We hope you understand and support us during this effort.
> So please do not recheck, rebase or approve any patch until further notice.
>
> Thank you,
> --
> Emilien Macchi



-- 
Emilien Macchi
