So far, we have three critical issues that we all need to address as soon as we can.

Problem #1: Upgrade jobs time out from Newton to Ocata
https://bugs.launchpad.net/tripleo/+bug/1702955

Today I spent an hour looking at it and here's what I've found so far: depending on which public cloud the TripleO CI jobs run on, they time out or not.
Here's an example of the Heat resources that run in our CI:
https://www.diffchecker.com/VTXkNFuk
On the left are the resources from a job that failed (running on internap); on the right, from a job that worked (running on citycloud).
I've been through all the upgrade steps and I haven't seen any specific task that takes noticeably more time on one side than on the other, only many small differences that add up to a big difference at the end (so it's hard to debug).
Note: both jobs use AFS mirrors.
Help on that front would be very welcome.

Problem #2: from Ocata to Pike (containerized): missing container upload step
https://bugs.launchpad.net/tripleo/+bug/1710938

Wes has a patch (thanks!) that is currently in the gate:
https://review.openstack.org/#/c/493972
Thanks to that work, we managed to find problem #3.

Problem #3: from Ocata to Pike: all container images are uploaded/specified, even for services not deployed
https://bugs.launchpad.net/tripleo/+bug/1710992

The CI jobs are timing out during the upgrade process because downloading + uploading _all_ containers into the local cache takes more than 20 minutes.
So this is where we are now: upgrade jobs time out on that step. Steve Baker is currently looking at it, but we'll probably offer some help.

Solutions:
- for stable/ocata: make upgrade jobs non-voting
- for pike: keep upgrade jobs non-voting and release without upgrade testing

Risks:
- for stable/ocata: it's highly possible that regressions get introduced if the jobs aren't voting anymore.
- for pike: the quality of the release won't be good enough in terms of CI coverage, compared to Ocata.

Mitigations:
- for stable/ocata: make jobs non-voting and ask our core reviewers to pay extra attention to what lands. This should be temporary, until we manage to fix the CI jobs.
- for master: release RC1 without upgrade jobs and keep making progress
- Run TripleO upgrade scenarios as third-party CI in RDO Cloud, or somewhere else with enough resources and no timeout constraints.

I would like some feedback on the proposal so we can move forward this week.

Thanks,
-- 
Emilien Macchi

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev