Re: [openstack-dev] [tripleo] CI is currently down: 2 blockers

2016-09-16 Thread Dougal Matthews
For those interested we now have a minimal way to reproduce the
MessagingTimeout in Mistral.

https://bugs.launchpad.net/mistral/+bug/1624284

It seems to be related to this change in Mistral:


https://github.com/openstack/mistral/commit/1b0f0cddd620a3785017bb28d432cb0030b627d7

And even more specifically, this line:


https://github.com/openstack/mistral/commit/1b0f0cddd620a3785017bb28d432cb0030b627d7#diff-fa1c08d9053a1e6736fb8ac64e51d1ab

Thomas Herve managed to work around it by changing the executor.


On 16 September 2016 at 01:19, Emilien Macchi wrote:

> So here's an update on the current situation:
>
> Master / Newton
> gate-tripleo-ci-centos-7-ovb-nonha
> gate-tripleo-ci-centos-7-ovb-ha
> The 2 jobs are supposed to pass, but some jobs are timing out in the RH1 cloud.
> In order to reduce the timeouts, Ben ran:
> heat-manage purge_deleted 3
> nova-manage db archive_deleted_rows --verbose --max_rows 100
> sudo mysqlcheck -o -A
>
> gate-tripleo-ci-centos-7-nonha-multinode
> We merged the revert: https://review.openstack.org/#/c/370250/
> At the time I'm writing this email, the job is still non-voting:
> https://review.openstack.org/#/c/371133/
> But hopefully Infra will merge this patch soon to bring it back in the
> gate.
>
>
> stable/mitaka and stable/liberty
> gate-tripleo-ci-centos-7-ovb-nonha works fine.
> gate-tripleo-ci-centos-7-ovb-ha is broken because Galera was updated
> in EPEL (and TripleO Mitaka still deploys EPEL).
> I have 2 patches in order to fix the situation:
> 1) Fix Galera configuration to work with recent EPEL (kudos to Damien
> for his help): https://review.openstack.org/#/c/371029/
> 2) (not required but good to have) Disable EPEL in tripleoclient
> https://review.openstack.org/#/c/369559/ - I would understand if
> people -1 this patch and I have no strong opinion about it.
>
> I hope 1) will pass CI so we can just move forward.
>
> It's end of day for me but if someone can monitor
> http://tripleo.org/cistatus.html during Friday morning and make sure
> everything is still running fine, we would appreciate it. Also please
> report any bug related to CI and set the ci & alert tags.
>
> Thanks, and let's keep focusing on Newton release!
>
> On Thu, Sep 15, 2016 at 11:26 AM, Emilien Macchi wrote:
> > On Wed, Sep 14, 2016 at 10:13 PM, Emilien Macchi wrote:
> >> Hi,
> >>
> >> Just a heads-up before end of day:
> >>
> >> 1) The multinode job is failing 80% of the time. James and I made
> >> several attempts to revert or fix things, but we have been
> >> unsuccessful so far.
> >> Everything is documented here:
> >> https://bugs.launchpad.net/tripleo/+bug/1623606
> >
> > We found out that https://review.openstack.org/#/c/368760/ is breaking
> > us, so we will revert it and work on it again later.
> >
> >> 2) ovb jobs are timing out during NetworkDeployment because
> >> 99-refresh-completed is not signaling to Heat due to instance-id being
> >> detected as null by os-apply-config.
> >> James proposed a revert: https://review.openstack.org/#/c/370250/
> >> But the patch can't be merged because of 1).
> >
> > We are going to merge James's revert; we think it will bring back OVB
> > jobs.
> >
> > To merge the reverts, we need to disable voting on multinode jobs:
> > https://review.openstack.org/#/c/370922/
> >
> > Please do not merge anything today (except the 2 reverts) until our
> > situation becomes more stable. Probably tonight or tomorrow.
> > Once the situation is better, I or someone else in the team will give an
> > update here.
> >
> > Thanks for your understanding,
> >
> >> I'll continue to work on it tomorrow but if you're able to jump in and
> >> make progress on it, this downtime is very critical at this stage of
> >> the cycle.
> >>
> >> Any help is highly welcome.
> >>
> >> Thanks,
> >> --
> >> Emilien Macchi
> >
> >
> >
> > --
> > Emilien Macchi
>
>
>
> --
> Emilien Macchi
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>


Re: [openstack-dev] [tripleo] CI is currently down: 2 blockers

2016-09-15 Thread Emilien Macchi
So here's an update on the current situation:

Master / Newton
gate-tripleo-ci-centos-7-ovb-nonha
gate-tripleo-ci-centos-7-ovb-ha
The 2 jobs are supposed to pass, but some jobs are timing out in the RH1 cloud.
In order to reduce the timeouts, Ben ran:
heat-manage purge_deleted 3
nova-manage db archive_deleted_rows --verbose --max_rows 100
sudo mysqlcheck -o -A
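
For anyone who wants to repeat this cleanup elsewhere, here is a minimal sketch of running the three commands above in order and stopping at the first failure. The commands are copied verbatim from this message; the wrapper function, its name, and the dry-run default are illustrative assumptions, not what Ben actually used.

```python
import shlex
import subprocess

# Commands quoted verbatim from the message above.
MAINTENANCE_COMMANDS = [
    "heat-manage purge_deleted 3",
    "nova-manage db archive_deleted_rows --verbose --max_rows 100",
    "sudo mysqlcheck -o -A",
]


def run_maintenance(commands=MAINTENANCE_COMMANDS, dry_run=True):
    """Run each command in order and stop at the first failure.

    Hypothetical helper: with dry_run=True (the default) nothing is
    executed and the parsed argument lists are returned for review.
    """
    parsed = [shlex.split(cmd) for cmd in commands]
    if dry_run:
        return parsed
    for argv in parsed:
        # check=True raises CalledProcessError on a non-zero exit status,
        # so later commands are skipped if an earlier one fails.
        subprocess.run(argv, check=True)
    return parsed


if __name__ == "__main__":
    for argv in run_maintenance():
        print("would run:", " ".join(argv))
```

Calling run_maintenance(dry_run=False) would execute the commands for real, so it only makes sense on a host where these tools are installed and the database can tolerate the maintenance load.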

gate-tripleo-ci-centos-7-nonha-multinode
We merged the revert: https://review.openstack.org/#/c/370250/
At the time I'm writing this email, the job is still non-voting:
https://review.openstack.org/#/c/371133/
But hopefully Infra will merge this patch soon to bring it back in the gate.


stable/mitaka and stable/liberty
gate-tripleo-ci-centos-7-ovb-nonha works fine.
gate-tripleo-ci-centos-7-ovb-ha is broken because Galera was updated
in EPEL (and TripleO Mitaka still deploys EPEL).
I have 2 patches in order to fix the situation:
1) Fix Galera configuration to work with recent EPEL (kudos to Damien
for his help): https://review.openstack.org/#/c/371029/
2) (not required but good to have) Disable EPEL in tripleoclient
https://review.openstack.org/#/c/369559/ - I would understand if
people -1 this patch and I have no strong opinion about it.

I hope 1) will pass CI so we can just move forward.

It's end of day for me but if someone can monitor
http://tripleo.org/cistatus.html during Friday morning and make sure
everything is still running fine, we would appreciate it. Also please
report any bug related to CI and set the ci & alert tags.

Thanks, and let's keep focusing on Newton release!

On Thu, Sep 15, 2016 at 11:26 AM, Emilien Macchi wrote:
> On Wed, Sep 14, 2016 at 10:13 PM, Emilien Macchi wrote:
>> Hi,
>>
>> Just a heads-up before end of day:
>>
>> 1) The multinode job is failing 80% of the time. James and I made
>> several attempts to revert or fix things, but we have been
>> unsuccessful so far.
>> Everything is documented here: 
>> https://bugs.launchpad.net/tripleo/+bug/1623606
>
> We found out that https://review.openstack.org/#/c/368760/ is breaking
> us, so we will revert it and work on it again later.
>
>> 2) ovb jobs are timing out during NetworkDeployment because
>> 99-refresh-completed is not signaling to Heat due to instance-id being
>> detected as null by os-apply-config.
>> James proposed a revert: https://review.openstack.org/#/c/370250/
>> But the patch can't be merged because of 1).
>
> We are going to merge James's revert; we think it will bring back OVB jobs.
>
> To merge the reverts, we need to disable voting on multinode jobs:
> https://review.openstack.org/#/c/370922/
>
> Please do not merge anything today (except the 2 reverts) until our
> situation becomes more stable. Probably tonight or tomorrow.
> Once the situation is better, I or someone else in the team will give an
> update here.
>
> Thanks for your understanding,
>
>> I'll continue to work on it tomorrow but if you're able to jump in and
>> make progress on it, this downtime is very critical at this stage of
>> the cycle.
>>
>> Any help is highly welcome.
>>
>> Thanks,
>> --
>> Emilien Macchi
>
>
>
> --
> Emilien Macchi



-- 
Emilien Macchi



Re: [openstack-dev] [tripleo] CI is currently down: 2 blockers

2016-09-15 Thread Emilien Macchi
On Wed, Sep 14, 2016 at 10:13 PM, Emilien Macchi wrote:
> Hi,
>
> Just a heads-up before end of day:
>
> 1) The multinode job is failing 80% of the time. James and I made
> several attempts to revert or fix things, but we have been
> unsuccessful so far.
> Everything is documented here: https://bugs.launchpad.net/tripleo/+bug/1623606

We found out that https://review.openstack.org/#/c/368760/ is breaking
us, so we will revert it and work on it again later.

> 2) ovb jobs are timing out during NetworkDeployment because
> 99-refresh-completed is not signaling to Heat due to instance-id being
> detected as null by os-apply-config.
> James proposed a revert: https://review.openstack.org/#/c/370250/
> But the patch can't be merged because of 1).

We are going to merge James's revert; we think it will bring back OVB jobs.

To merge the reverts, we need to disable voting on multinode jobs:
https://review.openstack.org/#/c/370922/

Please do not merge anything today (except the 2 reverts) until our
situation becomes more stable. Probably tonight or tomorrow.
Once the situation is better, I or someone else in the team will give an
update here.

Thanks for your understanding,

> I'll continue to work on it tomorrow but if you're able to jump in and
> make progress on it, this downtime is very critical at this stage of
> the cycle.
>
> Any help is highly welcome.
>
> Thanks,
> --
> Emilien Macchi



-- 
Emilien Macchi
