Re: [openstack-dev] [TripleO] overcloud_containers.yaml: container versioning, tags and use cases ?

2017-05-29 Thread Steve Baker
On Sat, May 27, 2017 at 7:07 AM, David Moreau Simard wrote:

> Hi,
>
> Today we discussed a challenge around image tags that mostly boils
> down to limitations in how overcloud_containers.yaml is constructed
> and used.
>
> TL;DR, we need a smart and easy way to work with the
> overcloud_containers.yaml file (especially tags).
>
> Let's highlight a few use cases that we need to work through:
>
> #1. Building containers
>   For building containers, all we really care about is the name of the
> images we need to build.
>   Today, we install a trunk repository and then install
> tripleo-common-containers from that trunk repository.
>   We then mostly grep/sed/awk our way from overcloud_containers.yaml
> to a clean list of images to build and then build those.
>   Relatively okay with this but prone to things breaking -- a clean
> way to get just the list of images out of there would be nice.
>
>
The command "openstack overcloud container image build" also has some
string matching logic, but then invokes kolla-build directly.

Can I suggest that we add a --list-images option to this command so that it
just returns a list of images for other image building tools to consume?
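
A rough sketch of what such a listing step might do. It assumes the parsed
overcloud_containers.yaml is a mapping with a "container_images" list of
"imagename" entries; that layout and the helper name list_image_names are
assumptions for illustration, not the actual tripleo-common code:

```python
def list_image_names(data):
    """Return bare image names, without registry, namespace or tag."""
    names = []
    for entry in data.get("container_images", []):
        ref = entry["imagename"]
        # The image name is the last path component; everything before the
        # final "/" is the registry host (possibly with a port) and namespace.
        last = ref.rpartition("/")[2]
        # Strip the tag, if any, from the last component only.
        names.append(last.partition(":")[0])
    return names


# Hypothetical sample, shaped like the result of loading the YAML file.
sample = {
    "container_images": [
        {"imagename": "tripleoupstream/centos-binary-nova-api:latest"},
        {"imagename": "192.168.24.1:8787/tripleoupstream/centos-binary-heat-api:latest"},
    ]
}
print(list_image_names(sample))
```

Splitting on the last "/" before looking for ":" keeps a registry port from
being mistaken for a tag, which is the kind of thing that tends to break in
grep/sed/awk pipelines.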


> #2. Testing and promoting containers
>   This comes right after use case #1 where we build containers in the
> pipeline.
>   For those familiar with the CI pipeline to do promotions [1], this
> would look a bit like this [2].
>
>   In practice, this works basically the same way as we build, test and
> promote tripleo-quickstart images.
>   We pick a particular trunk repository hash and build containers for
> that hash. These are then pushed with both the tags ":latest" and
> ":<hash>".
>   We're then supposed to test those containers in the pipeline but to
> do that, we need to be pulling from :<hash>, not :latest...
> although they are in theory equivalent at that given time, this might
> not always be true.
>   So the testing job(s) need a way to easily customize/pull from that
> particular hash instead of the hardcoded latest we have right now.
>
>
I would like to see another "openstack overcloud container image ..."
command which is pointed at an image registry and a
canonical overcloud_containers.yaml file, then generates
another overcloud_containers.yaml (and heat environment file) which
contains the proper latest  tags. This tool could work too for
stable version-style tags.

How about "openstack overcloud container image discover"?

This would be easier to implement if the canonical
overcloud_containers.yaml file was a template rather than a file with
hard-coded namespace and tags.
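
To illustrate the template idea, here is a minimal sketch using only the
Python standard library. The placeholder names (namespace, tag), the entries
and the render_images helper are all hypothetical; this is not how the
canonical file is laid out today:

```python
from string import Template

# Hypothetical template entries: namespace and tag are left as placeholders
# instead of being hard-coded.
TEMPLATE_ENTRIES = [
    "${namespace}/centos-binary-nova-api:${tag}",
    "${namespace}/centos-binary-heat-api:${tag}",
]


def render_images(entries, namespace, tag):
    """Fill in namespace and tag, returning overcloud_containers-style data."""
    return {
        "container_images": [
            {"imagename": Template(e).substitute(namespace=namespace, tag=tag)}
            for e in entries
        ]
    }


rendered = render_images(TEMPLATE_ENTRIES, "tripleoupstream", "current-tripleo")
for image in rendered["container_images"]:
    print(image["imagename"])
```

The same entries could then be rendered once per use case (a trunk hash for
the promotion pipeline, a passed-CI tag for gate jobs, a stable tag for
releases) without editing the canonical file.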


> #3. Upstream gate jobs
>   Ideally, gate jobs should never use ":latest". This is in trunk/dlrn
> terms the equivalent of "/current/" or "/consistent/".
>   They'd use something like ":latest-passed-ci" which would be the
> proper equivalent of "/current-passed-ci/" or "/current-tripleo/".
>
>
There is nothing special about the word latest. Can we give these images
the same tag as the name of the package repo they came from, so
:current-passed-ci and :current-tripleo?


>   This brings an interesting challenge around how we currently add new
> images to overcloud_containers.yaml (example [3]).
>   It is expected that, when you add the image, the image is already
> present on the registry because otherwise the container jobs will fail
> since this new image cannot be pulled (example [4]).
>   My understanding currently is that humans may build and push images
> to the registry ahead of time so that this works.
>   We can keep a similar approach if that's what we really want with
> the new private registry; the job that builds containers is made
> generic exactly to be able to build just a specific set of image(s) if
> we want.
>   Here's the catch, though: this new container image will have the
> ":latest" tag, it will not have ":latest-passed-ci" because it hasn't
> passed CI yet, it's being added just now.
>   So how do we address this ?
>
>
Here is an idea, the "discover" command mentioned above could filter images
based on their presence in the registry with the required tags, so the
resulting generated overcloud_containers.yaml would have fewer entries if
there is no image with the requested tag.
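
The filtering step could look roughly like the sketch below. The registry
lookup is deliberately left out: registry_tags stands in for whatever a real
tool would learn by querying the registry, and both it and the
filter_available helper are hypothetical:

```python
def filter_available(image_names, tag, registry_tags):
    """Keep only images that the registry lists with the requested tag.

    registry_tags maps an image name to the set of tags present for it.
    A real tool would build this by querying the registry; here it is
    simply passed in.
    """
    return [
        name for name in image_names
        if tag in registry_tags.get(name, set())
    ]


# Hypothetical registry contents: the new image has not passed CI yet.
registry_tags = {
    "centos-binary-nova-api": {"latest", "latest-passed-ci"},
    "centos-binary-new-service": {"latest"},
}
wanted = ["centos-binary-nova-api", "centos-binary-new-service"]
print(filter_available(wanted, "latest-passed-ci", registry_tags))
```

With this, a newly added image that only carries ":latest" would simply be
left out of the generated file until it has been promoted.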


>   Note:
>   We've already discussed that some containers need to pick up the
> latest and the greatest from the "/current/" repository, either
> because they are "direct" tripleo packages or if "Depends-On" is used.
>   So far, the solution we seem to be going towards is to pick up the
> containers from ":latest-passed-ci" and then more or less add a 'yum
> update' layer to the images needing an update.
>   This is the option that is in the best interest of time, we'd
> otherwise be spending too much time building containers in jobs that
> are already taking way too long to run.
>

That is a shame, I have no suggestions to avoid this though.


> #4. Test days
>   When doing test days, we know to point testers to
> /current-passed-ci/ as well as tested 

[openstack-dev] [TripleO] overcloud_containers.yaml: container versioning, tags and use cases ?

2017-05-26 Thread David Moreau Simard
Hi,

Today we discussed a challenge around image tags that mostly boils
down to limitations in how overcloud_containers.yaml is constructed
and used.

TL;DR, we need a smart and easy way to work with the
overcloud_containers.yaml file (especially tags).

Let's highlight a few use cases that we need to work through:

#1. Building containers
  For building containers, all we really care about is the name of the
images we need to build.
  Today, we install a trunk repository and then install
tripleo-common-containers from that trunk repository.
  We then mostly grep/sed/awk our way from overcloud_containers.yaml
to a clean list of images to build and then build those.
  Relatively okay with this but prone to things breaking -- a clean
way to get just the list of images out of there would be nice.

#2. Testing and promoting containers
  This comes right after use case #1 where we build containers in the pipeline.
  For those familiar with the CI pipeline to do promotions [1], this
would look a bit like this [2].

  In practice, this works basically the same way as we build, test and
promote tripleo-quickstart images.
  We pick a particular trunk repository hash and build containers for
that hash. These are then pushed with both the tags ":latest" and
":<hash>".
  We're then supposed to test those containers in the pipeline but to
do that, we need to be pulling from :<hash>, not :latest...
although they are in theory equivalent at that given time, this might
not always be true.
  So the testing job(s) need a way to easily customize/pull from that
particular hash instead of the hardcoded latest we have right now.

#3. Upstream gate jobs
  Ideally, gate jobs should never use ":latest". This is in trunk/dlrn
terms the equivalent of "/current/" or "/consistent/".
  They'd use something like ":latest-passed-ci" which would be the
proper equivalent of "/current-passed-ci/" or "/current-tripleo/".

  This brings an interesting challenge around how we currently add new
images to overcloud_containers.yaml (example [3]).
  It is expected that, when you add the image, the image is already
present on the registry because otherwise the container jobs will fail
since this new image cannot be pulled (example [4]).
  My understanding currently is that humans may build and push images
to the registry ahead of time so that this works.
  We can keep a similar approach if that's what we really want with
the new private registry; the job that builds containers is made
generic exactly to be able to build just a specific set of image(s) if
we want.
  Here's the catch, though: this new container image will have the
":latest" tag, it will not have ":latest-passed-ci" because it hasn't
passed CI yet, it's being added just now.
  So how do we address this ?

  Note:
  We've already discussed that some containers need to pick up the
latest and the greatest from the "/current/" repository, either
because they are "direct" tripleo packages or if "Depends-On" is used.
  So far, the solution we seem to be going towards is to pick up the
containers from ":latest-passed-ci" and then more or less add a 'yum
update' layer to the images needing an update.
  This is the option that is in the best interest of time, we'd
otherwise be spending too much time building containers in jobs that
are already taking way too long to run.

#4. Test days
  When doing test days, we know to point testers to
/current-passed-ci/ as well as tested quickstart images.
  How can we make it easy for containers ? If the container list from
tripleo-common is hardcoded to latest, that won't work. If it's
hardcoded to :latest-passed-ci, it won't work for other use cases.
  Ideally this would be super easy for end users as well as developers
so that they can get started easily.

#5. Stable releases, users, operators & co
  The packaging workflow is not the same for trunk (dlrn out of git
source on trunk.rdoproject.org) and for stable releases (CentOS build
system out of released tarballs on mirror.centos.org).
  It's also going to be different for containers.
  For trunk, we'll be building containers with trunk repositories and
publishing them to a private registry analogous to
trunk.rdoproject.org repositories.
  For stable releases, while still hand-wavy and foggy, we seem to be
headed in the direction of the CentOS official registry which takes
Dockerfiles in pseudo dist-git repositories and builds/publishes the
containers through Jenkins jobs.
  This is sort of similar to how downstream would work if you replace
Jenkins by Brew/OSBS.

  So here, we want to use the overcloud_containers.yaml file to
"compile" Dockerfiles which will be shipped to git repositories and
then built by another process.
  These containers will be published somewhere that tripleo-common is
sort of expected to know ahead of time because users, customers and
developers need to be pulling from that "stable" source and from a
"stable" tag.

So... what do we do ?

[1]: