Overall sounds good. A couple of comments inline.
On 10/23/2017 05:46 AM, Sagi Shnaidman wrote:
Hi,
as you know, we are preparing the transition of all OVB jobs from the RH1
cloud to the RDO cloud, as well as a few of the long multinode upgrade jobs.
We have prepared the transition workflow below; please feel free to comment.
1) We run one job (ovb-ha-oooq) on every patch in the following repos: oooq,
oooq-extras, tripleo-ci. We run the rest of the OVB jobs (containers and
fs024) as experimental in the RDO cloud for the following repos: oooq,
oooq-extras, tripleo-ci, tht, tripleo-common. This should cover most of our
testing. This step is completed.
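For reference, a rough sketch of that intended layout as plain data, so the
coverage can be eyeballed; the job and repo names are placeholders expanded
from the abbreviations above (oooq = tripleo-quickstart, tht =
tripleo-heat-templates), not the actual Zuul job names:

    # Intended RDO cloud OVB layout from step 1, expressed as plain data.
    # All names below are placeholders, not real job definitions.
    INTENDED_LAYOUT = {
        "check": {  # runs on every patch
            "ovb-ha-oooq": ["tripleo-quickstart", "tripleo-quickstart-extras",
                            "tripleo-ci"],
        },
        "experimental": {  # runs only on request
            "ovb-containers": ["tripleo-quickstart", "tripleo-quickstart-extras",
                               "tripleo-ci", "tripleo-heat-templates",
                               "tripleo-common"],
            "ovb-fs024": ["tripleo-quickstart", "tripleo-quickstart-extras",
                          "tripleo-ci", "tripleo-heat-templates",
                          "tripleo-common"],
        },
    }

    def repos_covered(layout):
        """Repos that get at least one OVB job in the RDO cloud."""
        return {repo for jobs in layout.values()
                for repos in jobs.values() for repo in repos}

    if __name__ == "__main__":
        print(sorted(repos_covered(INTENDED_LAYOUT)))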
Currently, though, we are blocked by a Newton bug in the RDO cloud,
https://bugs.launchpad.net/heat/+bug/1626256 , whose fix
( https://review.openstack.org/#/c/501592/ ) is not included in the cloud's
release. On the other hand, the upgrade to the Ocata release (which would
also solve this issue) is blocked by
https://bugs.launchpad.net/tripleo/+bug/1724328 , so the move is stuck for now.
Next steps:
2) We solve all issues with the job that runs on every patch (ovb-ha-oooq) so
that it passes (or fails with exactly the same results as on RH1) for two
regular working days (not the weekend).
3) During this time we trigger the experimental jobs on various patches in
tht and tripleo-common and solve all issues for the experimental jobs, so
that all OVB jobs pass.
4) Throughout this time we need to monitor resources in the openstack-nodepool
tenant (possibly with help from RHOPS) and make sure it has the capacity to
run the configured jobs.
I assume we will have a max jobs limit in nodepool (or whatever we're
using for that purpose) that will ensure we stay within capacity
regardless of what jobs are configured. We probably want to keep that
limit low initially so we don't have to worry about throwing a huge
number of jobs at the cloud accidentally (say someone submits a large
patch series that triggers our subset of jobs).
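To keep that cap honest, here is a minimal sketch of a check one could run
against the config. It assumes PyYAML and an old-style nodepool.yaml with a
per-provider "max-servers" key; the path, provider name and cap are
placeholders, not the real values:

    # Check that the configured nodepool limit stays under an agreed cap.
    import yaml

    AGREED_CAP = 20  # keep the initial limit deliberately low

    def provider_max_servers(path="nodepool.yaml", provider="rdo-cloud"):
        with open(path) as f:
            config = yaml.safe_load(f)
        for p in config.get("providers", []):
            if p.get("name") == provider:
                return p.get("max-servers")
        return None

    if __name__ == "__main__":
        limit = provider_max_servers()
        if limit is None:
            print("provider not found or no max-servers set")
        elif limit > AGREED_CAP:
            print("max-servers=%s exceeds the agreed cap of %s" % (limit, AGREED_CAP))
        else:
            print("max-servers=%s is within the agreed cap" % limit)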
Obviously as we add jobs we'll need to bump the concurrent jobs limit, but I
think that should be the primary variable we change, and that we add more
jobs as necessary to fill the configured limit. Also, rather than set a time
period of two days or whatever, ensure we run at the configured limit for
some period of time before increasing it. There are slow days in CI where we
might not get much useful information, so we need to make sure we don't get a
false positive result from a step just because of the quirks of CI load.
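One way to make that concrete: sample how many of our jobs are running at
regular intervals and only raise the limit once a reasonable fraction of the
window was actually spent at it. A minimal sketch; the sample format and
thresholds are assumptions, not an existing tool:

    # Report what fraction of the sampling window was spent at (or near) the
    # configured job limit, so quiet CI days don't count as a passed step.
    def fraction_at_limit(samples, limit, slack=0):
        """samples: iterable of ints, running jobs per sampling interval."""
        samples = list(samples)
        if not samples:
            return 0.0
        at_limit = sum(1 for n in samples if n >= limit - slack)
        return at_limit / len(samples)

    if __name__ == "__main__":
        # e.g. one sample every 15 minutes over a couple of days
        samples = [3, 5, 5, 4, 5, 5, 5, 2, 0, 1, 5, 5]
        limit = 5
        frac = fraction_at_limit(samples, limit)
        print("%.0f%% of samples at the configured limit of %d" % (frac * 100, limit))
        if frac < 0.25:
            print("CI was quiet -- not enough signal yet to raise the limit")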
5) We set the ovb-ha-oooq job to run on every patch in all the places where
it runs on RH1 (in parallel with the existing RH1 job). We monitor the RDO
cloud to make sure the job doesn't fail and the cloud still has resources -
1.5 working days.
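For the resource part of that monitoring, a minimal sketch, assuming the
openstacksdk client and a clouds.yaml entry for the openstack-nodepool
tenant; the cloud name and the threshold are placeholders:

    # Count servers in the tenant and flag when we are at the assumed cap.
    import openstack

    ASSUMED_MAX_SERVERS = 30  # placeholder, not the real limit

    def tenant_usage(cloud_name="rdo-cloud-openstack-nodepool"):
        conn = openstack.connect(cloud=cloud_name)
        servers = list(conn.compute.servers())
        active = sum(1 for s in servers if s.status == "ACTIVE")
        errored = sum(1 for s in servers if s.status == "ERROR")
        return len(servers), active, errored

    if __name__ == "__main__":
        total, active, errored = tenant_usage()
        print("servers: %d total, %d active, %d in ERROR" % (total, active, errored))
        if total >= ASSUMED_MAX_SERVERS:
            print("at or over capacity -- new OVB jobs will queue or fail")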
6) We add the featureset024 OVB job to every patch where it runs on RH1. We
continue to monitor the RDO cloud - 1.5 working days.
7) We add the last OVB job, the containers one, to all patches where it runs
on RH1. We continue to monitor the RDO cloud - 2 days.
8) If everything is OK in all the previous points and the RDO cloud still
performs well, we remove the OVB jobs from the RH1 configuration and make
them experimental.
9) During the next few days we monitor the OVB jobs and run the RH1 OVB jobs
as experimental to check whether we get the same results (or better :) )
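A minimal sketch of that comparison; the input format (change-id to result)
is an assumption, and in practice the data would come from the job logs or
status pages:

    # Compare RH1 and RDO cloud results for the same job on the same patches.
    from collections import Counter

    def compare_results(rh1, rdo):
        """rh1, rdo: dicts mapping change-id to 'SUCCESS' or 'FAILURE'."""
        common = rh1.keys() & rdo.keys()
        diff = {c: (rh1[c], rdo[c]) for c in common if rh1[c] != rdo[c]}
        rdo_rate = Counter(rdo[c] for c in common)
        return diff, rdo_rate

    if __name__ == "__main__":
        rh1 = {"I111": "SUCCESS", "I222": "FAILURE", "I333": "SUCCESS"}
        rdo = {"I111": "SUCCESS", "I222": "SUCCESS", "I333": "FAILURE"}
        diff, rate = compare_results(rh1, rdo)
        print("changes with different outcomes:", diff)
        print("RDO results on common changes:", dict(rate))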
10) The OVB jobs on the RH1 cloud stay in the experimental pipeline in
tripleo for the next month or two.
--
Best regards
Sagi Shnaidman