For those interested, we now have a minimal way to reproduce the
MessagingTimeout in Mistral.
It seems to be related to this change in Mistral:
And even more specifically, this line:
Thomas Herve managed to work around it by changing the executor.
On 16 September 2016 at 01:19, Emilien Macchi <emil...@redhat.com> wrote:
> So here's an update on the current situation:
> Master / Newton
> Both jobs are supposed to pass, but some runs are timing out in the RH1 cloud.
> In order to reduce the timeouts, Ben ran:
> heat-manage purge_deleted 3
> nova-manage db archive_deleted_rows --verbose --max_rows 1000000
> sudo mysqlcheck -o -A
> We merged the revert: https://review.openstack.org/#/c/370250/
> At the time of writing, the job is still non-voting:
> Hopefully Infra will merge this patch soon to make it voting again.
> stable/mitaka and stable/liberty
> gate-tripleo-ci-centos-7-ovb-nonha works fine.
> gate-tripleo-ci-centos-7-ovb-ha is broken because Galera was updated
> in EPEL (and TripleO Mitaka still deploys EPEL).
> I have two patches to fix the situation:
> 1) Fix Galera configuration to work with recent EPEL (kudos to Damien
> for his help): https://review.openstack.org/#/c/371029/
> 2) (not required but good to have) Disable EPEL in tripleoclient
> https://review.openstack.org/#/c/369559/ - I would understand if
> people -1 this patch and I have no strong opinion about it.
> I hope 1) will pass CI so we can just move forward.
> It's the end of the day for me, but if someone could monitor
> http://tripleo.org/cistatus.html on Friday morning and make sure
> everything is still running fine, we would appreciate it. Also, please
> report any CI-related bugs and set the ci & alert tags.
> Thanks, and let's keep focusing on the Newton release!
> On Thu, Sep 15, 2016 at 11:26 AM, Emilien Macchi <emil...@redhat.com> wrote:
> > On Wed, Sep 14, 2016 at 10:13 PM, Emilien Macchi <emil...@redhat.com> wrote:
> >> Hi,
> >> Just a heads-up before end of day:
> >> 1) The multinode job is failing 80% of the time. James and I have made
> >> several attempts to revert or fix things, but without success so
> >> far.
> >> Everything is documented here: https://bugs.launchpad.net/
> > We found out that https://review.openstack.org/#/c/368760/ is breaking
> > us, so we will revert it and work on it again later.
> >> 2) OVB jobs are timing out during NetworkDeployment because
> >> 99-refresh-completed is not signaling Heat: os-apply-config detects
> >> the instance-id as null.
> >> James proposed a revert: https://review.openstack.org/#/c/370250/
> >> But the patch can't be merged because of 1).
> > We are going to merge James's revert; we think it will bring the OVB jobs back.
> > To merge the reverts, we need to disable voting on multinode jobs:
> > https://review.openstack.org/#/c/370922/
> > Please do not merge anything today (except the two reverts) until the
> > situation becomes more stable, probably tonight or tomorrow.
> > Once the situation improves, I or someone else on the team will post an
> > update here.
> > Thanks for your understanding,
> >> I'll continue working on it tomorrow, but if you're able to jump in and
> >> make progress on it, please do: this downtime is very critical at this
> >> stage of the cycle.
> >> Any help is highly welcome.
> >> Thanks,
> >> --
> >> Emilien Macchi
> > --
> > Emilien Macchi
> Emilien Macchi
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe