So far, we have 3 critical issues that we all need to address as
soon as we can.

Problem #1: Upgrade jobs from Newton to Ocata time out
https://bugs.launchpad.net/tripleo/+bug/1702955
Today I spent an hour looking at it and here's what I've found so far:
depending on which public cloud the TripleO CI jobs run on, they
time out or not.
Here's an example of Heat resources that run in our CI:
https://www.diffchecker.com/VTXkNFuk
On the left are the resources from a job that failed (running on
internap), and on the right those from a job that worked (running on
citycloud).
I've been through all the upgrade steps and I haven't seen specific
tasks that take much more time on one side or the other, only many
small differences that add up to a big difference at the end (so it's
hard to debug).
Note: both jobs use AFS mirrors.
Help on that front would be very welcome.
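
If someone wants to dig into this, here is a minimal sketch of how the
small per-resource differences could be ranked and summed. It assumes
the durations have already been extracted from each job into a plain
dict (resource name to seconds); the resource names and numbers below
are hypothetical, not taken from the real jobs.

```python
def duration_deltas(slow_run, fast_run):
    """Return (resource, delta_seconds) pairs, largest slowdown first.

    Only resources present in both runs are compared.
    """
    common = slow_run.keys() & fast_run.keys()
    deltas = [(name, slow_run[name] - fast_run[name]) for name in common]
    # Sort by delta (descending), then by name for a stable order.
    return sorted(deltas, key=lambda item: (-item[1], item[0]))


# Hypothetical durations in seconds, for illustration only.
internap = {"UpgradeStep1": 310, "UpgradeStep2": 295, "UpgradeStep3": 420}
citycloud = {"UpgradeStep1": 280, "UpgradeStep2": 270, "UpgradeStep3": 380}

deltas = duration_deltas(internap, citycloud)
total = sum(delta for _, delta in deltas)
for name, delta in deltas:
    print(f"{name}: +{delta}s")
print(f"total: +{total}s")
```

No single step stands out in this made-up data, but the total does,
which is the pattern described above.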


Problem #2: from Ocata to Pike (containerized): missing container upload step
https://bugs.launchpad.net/tripleo/+bug/1710938
Wes has a patch (thanks!) that is currently in the gate:
https://review.openstack.org/#/c/493972
Thanks to that work, we managed to find problem #3.


Problem #3: from Ocata to Pike: all container images are
uploaded/specified, even for services not deployed
https://bugs.launchpad.net/tripleo/+bug/1710992
The CI jobs are timing out during the upgrade process because
downloading and uploading _all_ container images into the local cache
takes more than 20 minutes.
So this is where we are now: upgrade jobs time out on that. Steve Baker
is currently looking at it, but we'll probably offer some help.
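
To illustrate the idea only (this is not how the TripleO tooling does
it, and not the fix under review), here is a minimal sketch of
restricting the image list to deployed services. The image names,
service names and data layout are all hypothetical.

```python
def images_for_deployed_services(all_images, deployed_services):
    """Keep only the images whose service is actually deployed."""
    return [
        image["name"]
        for image in all_images
        if image["service"] in deployed_services
    ]


# Hypothetical image list and service names, for illustration only.
all_images = [
    {"name": "example/keystone", "service": "keystone"},
    {"name": "example/nova-api", "service": "nova-api"},
    {"name": "example/ironic-api", "service": "ironic-api"},
]
deployed_services = {"keystone", "nova-api"}

needed = images_for_deployed_services(all_images, deployed_services)
print(needed)
```

Only the images in the filtered list would then need to be downloaded
and uploaded to the local cache.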


Solutions:
- for stable/ocata: make upgrade jobs non-voting
- for pike: keep upgrade jobs non-voting and release without upgrade testing

Risks:
- for stable/ocata: it's quite likely that we'll introduce regressions
if the jobs aren't voting anymore.
- for pike: the quality of the release won't be good enough in terms of
CI coverage compared to Ocata.

Mitigations:
- for stable/ocata: make jobs non-voting and ask our core reviewers
to pay extra attention to what lands. This should be temporary,
until we manage to fix the CI jobs.
- for master: release RC1 without upgrade jobs and make progress
- Run TripleO upgrade scenarios as third-party CI in RDO Cloud or
somewhere else with enough resources and no timeout constraints.

I would like some feedback on the proposal so we can move forward this week.
Thanks.
-- 
Emilien Macchi

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
