Re: [openstack-dev] [TripleO][CI] Tempest sources for testing tripleo in CI environment

2016-04-18 Thread Wesley Hayutin
On Mon, Apr 18, 2016 at 8:36 AM, Sagi Shnaidman  wrote:

> For making clear all advantages and disadvantages, I've created a doc:
>
> https://docs.google.com/document/d/1HmY-I8OzoJt0SzLzs79hCa1smKGltb-byrJOkKKGXII/edit?usp=sharing
>
> Please comment.
>
> On Sun, Apr 17, 2016 at 12:14 PM, Sagi Shnaidman 
> wrote:
>
>>
>> Hi,
>>
>> John raised the issue of where we should take tempest sources from.
>> I'm not sure where to take them from, so I'm bringing it to a wider
>> discussion.
>>
>> Right now I use tempest from delorean packages. In comparison with
>> original tempest I don't see any difference in tests, only additional
>> configuration scripts:
>> https://github.com/openstack/tempest/compare/master...redhat-openstack:master
>> It's worth mentioning that with the delorean tempest the configuration
>> scripts match the tempest test configuration, whereas with the original
>> tempest repo they would need to be changed and maintained against a very
>> dynamic configuration.
>>
>> So, do we need to use pure upstream tempest from the current source and
>> maintain the configuration scripts ourselves, or can we use the package
>> from delorean and not duplicate the effort of the test teams?
>>
>> Thanks
>> --
>> Best regards
>> Sagi Shnaidman
>>
>
>
>
> --
> Best regards
> Sagi Shnaidman
>


Hrm..  can't we use upstream tempest along with the midstream configure
script that will be checked into the upstream TripleO repos?
I don't think we need to make the choice you are proposing.


[openstack-dev] [TripleO] 3rd party CI

2016-07-14 Thread Wesley Hayutin
Greetings,

Just wanted to let folks know we're bringing up two 3rd party gates for
TripleO CI atm.

1. The first gate will be for upgrading from mitaka -> newton/master
2. The second gate will be for running TripleO CI on RHEL 7.2

We're working out the logging for #2.  ATM we're going to try and store the
logs on an openshift gear as we don't have a public file server available.
If anyone has a public server where we can store logs and would like to
help, please let us know.

If you have any questions please let us know.

Thanks


Re: [openstack-dev] [TripleO] 3rd party CI

2016-07-14 Thread Wesley Hayutin
On Thu, Jul 14, 2016 at 5:13 PM, Emilien Macchi <emil...@redhat.com> wrote:

> On Thu, Jul 14, 2016 at 4:45 PM, Wesley Hayutin <whayu...@redhat.com>
> wrote:
> > Greetings,
> >
> > Just wanted to let folks know we're bringing up two 3rd party gates for
> > TripleO CI atm.
> >
> > 1. The first gate will be for upgrading from mitaka -> newton/master
> > 2. The second gate will be for running TripleO CI on RHEL 7.2
> >
> > We're working out the logging for #2.  ATM we're going to try and store
> the
> > logs on an openshift gear as we don't have a public file server
> available.
> > If anyone has a public server where we can store logs and would like to
> > help, please let us know.
> >
> > If you have any questions please let us know.
>
> Great, this is cool news.
> Can you provide links to the git repos that execute the jobs?
> So anyone can contribute to them and also see what they're actually testing
> (what scenario, etc.).
>
> Thanks,
> --
> Emilien Macchi
>

TripleO on RHEL will run w/ the normal TripleO-Quickstart playbook, using
RHEL images [1].

The upgrades job will first run a Liberty deployment using [4], which is the
functional equivalent of [1] but on CentOS, and then execute the upgrade w/
[2] (the upgrade playbook from [3]).  Playbook [4] adds some additional
inventory steps required to reach all the nodes in the deployment and is an
example of composable roles.

Let us know if you need more information.
Thanks

[1] https://github.com/openstack/tripleo-quickstart/blob/master/playbooks/quickstart.yml
[2] https://github.com/redhat-openstack/ansible-role-tripleo-overcloud-upgrade/blob/master/playbooks/upgrade.yml
[3] https://github.com/redhat-openstack/ansible-role-tripleo-overcloud-upgrade
[4] https://github.com/openstack/tripleo-quickstart/blob/master/playbooks/tripleo-roles.yml
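
To make the above a bit more concrete, here is a rough sketch of how those
playbooks are typically driven.  This is illustrative only; the quickstart.sh
flags, the inventory path, and any extra variables the upgrade role needs are
assumptions rather than the exact commands the 3rd party jobs run:

    # Illustrative sketch only -- flags, paths, and variables are assumptions.
    # Deploy w/ the quickstart / composable-roles playbook ([1] or [4]):
    bash quickstart.sh --playbook tripleo-roles.yml --release mitaka $VIRTHOST

    # Then run the overcloud upgrade playbook from [2]/[3] against the
    # inventory quickstart generated (default working dir assumed):
    ansible-playbook -i ~/.quickstart/hosts \
        ansible-role-tripleo-overcloud-upgrade/playbooks/upgrade.yml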





Re: [openstack-dev] [TripleO] Making TripleO CI easier to consume outside of TripleO CI

2016-07-21 Thread Wesley Hayutin
On Thu, Jul 21, 2016 at 5:47 PM, James Slagle <james.sla...@gmail.com>
wrote:

> On Tue, Jul 19, 2016 at 5:15 PM, Wesley Hayutin <whayu...@redhat.com>
> wrote:
> >
> >
> > On Tue, Jul 19, 2016 at 2:44 PM, James Slagle <james.sla...@gmail.com>
> > wrote:
> >> Part of the goal of tripleo.sh was to mirror the commands in the
> >> documentation...that the same commands in the docs are in tripleo.sh.
> >> I know that has somewhat failed, but it was the goal anyway.
> >>
> >> Do you think integrating ansible into tripleo-ci changes that at all?
> >> tripleo-ci would be using ansible in some places, which in turn runs
> >> the commands (or their equivalent) that we actually document. Is the
> >> documentation still showing the same commands it does now, or is it
> >> showing running ansible as tripleo-ci would be doing?
> >
> >
> > Harry Rybacki and I are working on this now.  I think we have something
> that
> > is reasonable for when shell can be used and when ansible modules are
> > required.  I think he can make all this work public and everyone in
> TripleO
> > can keep tabs on the progress.
>
> Yes, I saw his email shortly after sending my reply. There is a lot to
> digest there, but it sounds promising. Perhaps we could start with
> something small and iteratively consume the generated docs as well. We
> could replace the manual docs over time by including sphinx docs at
> the right places that were automatically generated.
>

Agreed.
He has a small example project where I think you could track the work and
progress:
https://github.com/HarryRybacki/tripleo-documentor

We're hoping to have the undercloud and overcloud steps templated and built
in the very near future.


> >
> >>
> >>
> >> I think I'm mostly in agreement with your #2 proposal, perhaps with
> >> the exception of having to rely on external roles. I don't think I
> >> would want to see tripleo-ci rely on ansible roles from a
> >> redhat-openstack organization on github.
> >>
> >> I know that we already have a lot of external dependencies in TripleO,
> >> and that not everything has to come from git.openstack.org. However, I
> >> think that we'd want to make the "source" for tripleo-ci be owned by
> >> the TripleO project and hosted by OpenStack infrastructure as much as
> >> possible. tripleo-quickstart already is, so I think it would be fine
> >> to start proposing some changes to tripleo-ci that use
> >> tripleo-quickstart to eliminate some duplication if everyone agrees
> >> with that. Maybe the repo-setup would be a good first iterative step.
> >>
> >> As for the external roles, I'm less of a fan of relying on these
> >> directly if they're not part of TripleO. I think the project should
> >> have ownership over how it's defined in CI to deploy/update/upgrade
> >> overclouds, etc.
> >
> >
> > +1 I think this can be handled in a couple ways depending on how many
> > additional git repos are acceptable to TripleO upstream.
> >
> > So maybe if I provide an example this will make more sense.  I think bare
> > metal will work as an example.
> >
> > There is a need for the code that drives CI for virt to be able to drive
> CI
> > for bare metal.  Certainly the bare metal use case will not be used
> nearly
> > as much as the virt workflow and I suspect we don't want code conflicts,
> > merge issues coming from the bare metal use case that may interrupt or
> block
> > the mainline virt use case.  I think TripleO still cares what the bare
> metal
> > code looks like, how it's developed, and if we can use it w/ 3rd party CI
> > and extra checks.  It's important to maintain bare metal roles in TripleO
> > but it's easier if they are in another git repository.   It also
> > demonstrates the composability of the CI.
> >
> > Another use case would be anything that may be downstream specific.  I
> can't
> > think of a great example atm, but there are use cases that CI should be
> able
> > to drive that will probably never be part of the mainstream tripleo ci
> jobs.
> >
> > I believe we can solve this by having just two git repos in the long
> run.  I
> > think one git repo would be for any code path that is used directly in a
> job
> > in tripleo-ci itself.  The second repo would contain multiple ansible
> roles,
> > call it tripleo-ci-extras.  The second repo would contain any extra roles
> > that need to be plugged in for a use case that is not in a tripleo-ci job
> >

Re: [openstack-dev] [TripleO] Proposing Attila Darazs for tripleo-quickstart core​

2016-07-26 Thread Wesley Hayutin
On Tue, Jul 26, 2016 at 10:32 AM, John Trowbridge  wrote:

> I would like to add Attila to the tripleo-quickstart core reviewers
> group. Much of his work has been on some of the auxiliary roles that
> quickstart makes use of in RDO CI, however his numbers on quickstart
> itself[1] are in line with the other core reviewers.
>
> I will be out for paternity leave the next 4 weeks, so it will also be
> nice to have 3 core reviewers during that time in case I don't end up
> doing too many reviews.
>
> If there are no objections I will make the change at the end of the week.
>
> - trown
>
> [1] http://stackalytics.com/report/contribution/tripleo-quickstart/90


+1 very nice


>
>


Re: [openstack-dev] [TripleO] additional git repo(s) for tripleo-quickstart

2016-08-11 Thread Wesley Hayutin
On Wed, Aug 10, 2016 at 9:45 PM, Lars Kellogg-Stedman <l...@redhat.com>
wrote:

> On Wed, Aug 10, 2016 at 03:26:18PM -0400, Wesley Hayutin wrote:
> > I'm proposing the creation of a repo called tripleo-quickstart-extras
> that
> > would contain some or all of the current third party roles used with
> > TripleO-Quickstart.
>
> Which roles in particular would you place in this -extras repository?
> One of our goals in moving roles *out* of the quickstart was to move
> them into a one-repository-per-role model that makes things easily
> composable (install only those roles you need) and that
> compartmentalizes related sets of changes.
>

Lars, I'm thinking about this with the following priorities in mind:
1. TripleO-Quickstart code needs to be upstream and governed by the TripleO
project
2. TripleO-Quickstart itself is a replacement for instack-virt-setup
3. TripleO-Quickstart's roles need to be composable
4. TripleO-Quickstart needs to be composable for 3rd party git repositories

If we can get one additional git repo under the TripleO umbrella, I think
we've accomplished 1-3.  We can prove #4 with git repos outside of
OpenStack.

Compartmentalizing changes in their own git repositories is nice, but also
has disadvantages.  For instance, there is less governance across the roles
by oooq core members.  If I had to weigh compartmentalizing the roles vs. a
tripleo-quickstart-extras repo in TripleO, my vote would be for the
latter.  This is just my opinion though.

James made it clear that if TripleO-Quickstart is to provide automatically
generated documentation for the TripleO project, the source code has to be
under the TripleO project and the execution itself must run in the TripleO
CI environment.

It would be great if TripleO cores could weigh in and assist us in getting
one additional git repo so we can proceed with determining if automatically
generated documentation would be something TripleO would like.

Thanks



>
> Is this just a convenience for a bunch of roles that are typically
> installed together?
>
> --
> Lars Kellogg-Stedman <l...@redhat.com> | larsks @
> {freenode,twitter,github}
> Cloud Engineering / OpenStack  | http://blog.oddbit.com/
>
>


[openstack-dev] [TripleO] additional git repo(s) for tripleo-quickstart

2016-08-10 Thread Wesley Hayutin
Greetings,

In an effort to make TripleO CI composable, managed, and governed by the
TripleO project, we have found the need to create additional git repos in
OpenStack under the TripleO project.  This could also be done outside of the
TripleO project, but ideally it would live in TripleO.

I'm proposing the creation of a repo called tripleo-quickstart-extras that
would contain some or all of the current third party roles used with
TripleO-Quickstart.

The context behind this discussion is that we would like to use oooq to
document baremetal deployments, to supplement and/or replace the current
TripleO documentation.  It would be ideal if the code used to create this
documentation were part of the TripleO project.

We're looking for discussion and permission for a new TripleO git repo to
be created.

Thanks!


Re: [openstack-dev] [TripleO] Proposing Gabriele Cerami for tripleo-quickstart core

2016-07-18 Thread Wesley Hayutin
On Mon, Jul 18, 2016 at 11:06 AM, John Trowbridge  wrote:

> Howdy,
>
> I would like to propose Gabriele (panda on IRC), for tripleo-quickstart
> core. He has worked on some pretty major features for the project
> (explicit teardown, devmode), and has a good understanding of the code
> base.
>
> This will bring us to three dedicated core reviewers for
> tripleo-quickstart (myself and larsks being the other two), so I would
> also like to implement a 2x +2 policy at this time. Note that all cores
> of TripleO are also cores on tripleo-quickstart, and should feel free to
> +2 changes as they are comfortable.
>
> If there are no objections, I will put in a change at the end of the week.
>
> Thanks,
>

+1 from me :)


>
> - trown
>


Re: [openstack-dev] [TripleO] Making TripleO CI easier to consume outside of TripleO CI

2016-07-19 Thread Wesley Hayutin
On Tue, Jul 19, 2016 at 2:44 PM, James Slagle 
wrote:

> On Tue, Jul 12, 2016 at 3:39 PM, John Trowbridge  wrote:
> > Howdy folks,
> >
> > In the TripleO meeting two weeks ago, it came up that tripleo-quickstart
> > is being used as a CI tool in RDO. This came about organically, because
> > we needed to use RDO CI to self-gate quickstart (it relies on having a
> > baremetal virthost). It displaced another ansible based CI tool there
> > (khaleesi) and most(all?) of the extra functionality from that tool
> > (upgrades, scale, baremetal, etc.) has been moved into discrete ansible
> > roles that are able to plugin to quickstart.[1]
> >
> > We are still left with two different tool sets, where one should suffice
> > (and focus CI efforts in one place).
> >
> > I see two different ways to resolve this.
> >
> > 1. Actively work on making the tripleo-ci scripts consumable external to
> > tripleo-ci. We have a project in RDO (WeiRDO)[2] that is consuming
> > upstream CI for packstack and puppet, so it is not totally far-fetched
> > to add support for TripleO jobs.
> >
> > Pros:
> > - All CI development just happens directly in tripleo-ci and RDO just
> > inherits that work.
> >
> > Cons:
> > - This is totally untried, and therefore a totally unknown amount of
> work.
> > - It is all or nothing in that there is no incremental path to get the
> > CI scripts working outside of CI.
> > - We have to rewrite a bunch of working ansible code in bash which IMO
> > is the wrong direction for a modern CI system.
> >
> >
> > 2. Actively work on making tripleo-ci consume the ansible work in
> > tripleo-quickstart and the external role ecosystem around it.
> >
> > Pros:
> > - This could be done incrementally, replacing a single function from
> > tripleo.sh with an invocation of tripleo-quickstart that performs that
> > function instead.
> > - We would be able to pull in a lot of extra functionality via these
> > external roles for free(ish).
> >
> > Cons:
> > - Similarly unknown amount of work to completely switch.
> > - CI development would be done in multiple repos, though each would have
> > discrete and well defined functionality.
>
> I agree we could consolidate as well. Having tripleo-ci integrated
> with ansible would probably be helpful. Some of the work I've been
> doing to support multinode jobs would benefit from being able to use
> ansible to run some setup tasks on each of the nodes as part of the
> job as well.
>
> Part of the goal of tripleo.sh was to mirror the commands in the
> documentation...that the same commands in the docs are in tripleo.sh.
> I know that has somewhat failed, but it was the goal anyway.
>
> Do you think integrating ansible into tripleo-ci changes that at all?
> tripleo-ci would be using ansible in some places, which in turn runs
> the commands (or their equivalent) that we actually document. Is the
> documentation still showing the same commands it does now, or is it
> showing running ansible as tripleo-ci would be doing?
>

Harry Rybacki and I are working on this now.  I think we have something
reasonable for deciding when shell can be used and when ansible modules are
required.  I think he can make all this work public so everyone in TripleO
can keep tabs on the progress.


>
> I think I'm mostly in agreement with your #2 proposal, perhaps with
> the exception of having to rely on external roles. I don't think I
> would want to see tripleo-ci rely on ansible roles from a
> redhat-openstack organization on github.
>
> I know that we already have a lot of external dependencies in TripleO,
> and that not everything has to come from git.openstack.org. However, I
> think that we'd want to make the "source" for tripleo-ci be owned by
> the TripleO project and hosted by OpenStack infrastructure as much as
> possible. tripleo-quickstart already is, so I think it would be fine
> to start proposing some changes to tripleo-ci that use
> tripleo-quickstart to eliminate some duplication if everyone agrees
> with that. Maybe the repo-setup would be a good first iterative step.
>
> As for the external roles, I'm less of a fan of relying on these
> directly if they're not part of TripleO. I think the project should
> have ownership over how it's defined in CI to deploy/update/upgrade
> overclouds, etc.
>

+1 I think this can be handled in a couple of ways depending on how many
additional git repos are acceptable to TripleO upstream.

So maybe if I provide an example this will make more sense.  I think bare
metal will work as an example.

There is a need for the code that drives CI for virt to be able to drive CI
for bare metal.  Certainly the bare metal use case will not be used nearly
as much as the virt workflow, and I suspect we don't want code conflicts or
merge issues coming from the bare metal use case interrupting or blocking
the mainline virt use case.  I think TripleO still cares what the bare metal
code looks like, how it's developed, and whether we can use it w/ 3rd party
CI and extra checks.

Re: [openstack-dev] [TripleO] Making TripleO CI easier to consume outside of TripleO CI

2016-07-13 Thread Wesley Hayutin
On Tue, Jul 12, 2016 at 3:39 PM, John Trowbridge  wrote:

> Howdy folks,
>
> In the TripleO meeting two weeks ago, it came up that tripleo-quickstart
> is being used as a CI tool in RDO. This came about organically, because
> we needed to use RDO CI to self-gate quickstart (it relies on having a
> baremetal virthost). It displaced another ansible based CI tool there
> (khaleesi) and most(all?) of the extra functionality from that tool
> (upgrades, scale, baremetal, etc.) has been moved into discrete ansible
> roles that are able to plugin to quickstart.[1]
>
> We are still left with two different tool sets, where one should suffice
> (and focus CI efforts in one place).
>
> I see two different ways to resolve this.
>
> 1. Actively work on making the tripleo-ci scripts consumable external to
> tripleo-ci. We have a project in RDO (WeiRDO)[2] that is consuming
> upstream CI for packstack and puppet, so it is not totally far-fetched
> to add support for TripleO jobs.
>

I think we have to at least point out that RDO is not the only other target
for a CI tool.  There are a few more groups that would benefit from the
leadership of the upstream CI system.  Without calling out specific groups,
I'm thinking of the various OpenStack network teams, performance teams, test
teams, etc. that rely on setting up TripleO as the base for their work.  To
make things a bit more complicated, there is not a single source of
requirements for the various groups that would benefit from a robust,
flexible upstream TripleO CI tool set.  I'm not convinced that the current
bash scripts can be reworked or wrapped in a way that is flexible enough to
handle what is essentially an unknown set of requirements.

IMHO we require a tool set that is pluggable, composable, and flexible
enough that other development and CI teams that rely on TripleO as the base
for their work feel comfortable extending and replacing parts of the CI tool
set to fit their needs.



>
> Pros:
> - All CI development just happens directly in tripleo-ci and RDO just
> inherits that work.
>
> Cons:
> - This is totally untried, and therefore a totally unknown amount of work.
> - It is all or nothing in that there is no incremental path to get the
> CI scripts working outside of CI.
> - We have to rewrite a bunch of working ansible code in bash which IMO
> is the wrong direction for a modern CI system.
>
>
> 2. Actively work on making tripleo-ci consume the ansible work in
> tripleo-quickstart and the external role ecosystem around it.
>
> Pros:
> - This could be done incrementally, replacing a single function from
> tripleo.sh with an invocation of tripleo-quickstart that performs that
> function instead.
> - We would be able to pull in a lot of extra functionality via these
> external roles for free(ish).
>
> Cons:
> - Similarly unknown amount of work to completely switch.
> - CI development would be done in multiple repos, though each would have
> discrete and well defined functionality.
>
>
> Personally, I don't think we should do anything drastic with CI until
> after we release Newton, so we don't add any risk of impacting new
> features that haven't landed yet. I do think it would be a good goal for
> Ocata to have a CI system in TripleO that is consumable outside of
> TripleO. In any case, this email is simply to garner feedback if others
> think this is a worthy thing to pursue and opinions on how we can get
> there.
>

+1 here.  I agree there should be enough time for thoughtful conversation
without disrupting higher-priority work.

Thanks for sending this out, John!


>
>
> [1]
>
> https://github.com/redhat-openstack?utf8=%E2%9C%93=ansible-role-tripleo
> (note not all of these are actively used/developed)
> [2] https://github.com/rdo-infra/weirdo
>
>
>


Re: [openstack-dev] [TripleO] Making TripleO CI easier to consume outside of TripleO CI

2016-07-13 Thread Wesley Hayutin
On Wed, Jul 13, 2016 at 8:54 AM, Michal Pryc  wrote:

> John,
>
> On Tue, Jul 12, 2016 at 9:39 PM, John Trowbridge  wrote:
>
>> Howdy folks,
>>
>> In the TripleO meeting two weeks ago, it came up that tripleo-quickstart
>> is being used as a CI tool in RDO. This came about organically, because
>> we needed to use RDO CI to self-gate quickstart (it relies on having a
>> baremetal virthost). It displaced another ansible based CI tool there
>> (khaleesi) and most(all?) of the extra functionality from that tool
>> (upgrades, scale, baremetal, etc.) has been moved into discrete ansible
>> roles that are able to plugin to quickstart.[1]
>>
>>
> Here is a small summary of frameworks/tools to give you an idea of what
> we are currently using to test RHOS components.
>
> To create Triple-O undercloud/overcloud we are using the:
>
> https://github.com/redhat-openstack/ansible-ovb
>
> And then to install RHOSP on existing Triple-O:
>
> https://github.com/redhat-openstack/ansible-rhosp
>
> Then to run CI in such environment we use octario which is our main tool
> to run different flavors of tests and it's separate CI tool to be ready
> with different provisioning frameworks.
>
> https://github.com/redhat-openstack/octario/
>
>
> In the simplistic environment to run simple tests we are using InfraRed to
> provision simple instance (without Triple-O):
>
> https://github.com/rhosqeauto/InfraRed
>
> And then we run octario in such environment to run actual tests.
>
> Ideally if the provisioning parts were common and we could reuse them.
> Currently we need to use yet another set of tools to be able to patch rpm's
> prior to running tests.
>

+1, agree
I think this part has been addressed w/
https://blueprints.launchpad.net/tripleo/+spec/tripleo-quickstart


>
> There was an idea planted by dsariel to move some of the playbooks into
> ansible-galaxy roles (possibly other teams had such idea as well), which I
> can see it's another +1 for going with tools currently developed by Wes as
> they are pretty separate and could be converted into ansible-galaxy, to be
> available across different use-cases, but then it would need to be well
> defined so the roles are not multiplied and we won't end up having similar
> roles.
>
>
We also considered ansible-galaxy, and any of the ansible roles found in
github/redhat-openstack/ansible-role could be moved into Galaxy.  However,
Galaxy did not end up meeting our requirements for installing the roles, and
we ended up using Python setuptools.  Some roles have specific config and
playbooks that need to be copied to a standard location, and Galaxy just did
not do that very well.
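
To illustrate what "copied to a standard location" looks like with the
setuptools approach (the install prefix and path below are assumptions for
the sake of the example, not a statement of the exact packaging layout):

    # Rough illustration only -- the shared data path is an assumption.
    pip install git+https://github.com/openstack/tripleo-quickstart
    # The playbooks and role config land under a shared data directory,
    # e.g. something like:
    ls /usr/local/share/tripleo-quickstart/playbooks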


>
> best
> Michal Pryc
>
>
>
>> We are still left with two different tool sets, where one should suffice
>> (and focus CI efforts in one place).
>>
>> I see two different ways to resolve this.
>>
>> 1. Actively work on making the tripleo-ci scripts consumable external to
>> tripleo-ci. We have a project in RDO (WeiRDO)[2] that is consuming
>> upstream CI for packstack and puppet, so it is not totally far-fetched
>> to add support for TripleO jobs.
>>
>> Pros:
>> - All CI development just happens directly in tripleo-ci and RDO just
>> inherits that work.
>>
>> Cons:
>> - This is totally untried, and therefore a totally unknown amount of work.
>> - It is all or nothing in that there is no incremental path to get the
>> CI scripts working outside of CI.
>> - We have to rewrite a bunch of working ansible code in bash which IMO
>> is the wrong direction for a modern CI system.
>>
>>
>> 2. Actively work on making tripleo-ci consume the ansible work in
>> tripleo-quickstart and the external role ecosystem around it.
>>
>> Pros:
>> - This could be done incrementally, replacing a single function from
>> tripleo.sh with an invocation of tripleo-quickstart that performs that
>> function instead.
>> - We would be able to pull in a lot of extra functionality via these
>> external roles for free(ish).
>>
>> Cons:
>> - Similarly unknown amount of work to completely switch.
>> - CI development would be done in multiple repos, though each would have
>> discrete and well defined functionality.
>>
>>
>> Personally, I don't think we should do anything drastic with CI until
>> after we release Newton, so we don't add any risk of impacting new
>> features that haven't landed yet. I do think it would be a good goal for
>> Ocata to have a CI system in TripleO that is consumable outside of
>> TripleO. In any case, this email is simply to garner feedback if others
>> think this is a worthy thing to pursue and opinions on how we can get
>> there.
>>
>>
>> [1]
>>
>> https://github.com/redhat-openstack?utf8=%E2%9C%93=ansible-role-tripleo
>> (note not all of these are actively used/developed)
>> [2] https://github.com/rdo-infra/weirdo
>>
>>
>>

Re: [openstack-dev] [tripleo] TripleO CI mentoring

2016-07-05 Thread Wesley Hayutin
On Tue, Jul 5, 2016 at 1:06 PM, Steven Hardy  wrote:

> Hi all,
>
> At last weeks meeting, we discussed the idea of some sort of rotation where
> folks would volunteer their time to both help fix CI when it breaks, and
> also pass on some of the accrued knowledge within the team to newer folks
> wishing to learn.
>
> I'm hoping this will achieve a few things:
> - Reduce the load on the subset of folks constantly fixing CI by getting
>   more people involved and familiar
> - Identify areas where we need to document better so 1-1 mentoring isn't
>   needed in the future.
>
> Note that this is explicitly *not* about volunteering to be the one person
> that fixes all-the-things in CI, everyone is still encouraged to do that,
> it's more about finding folks willing to set aside some time to be
> responsive on IRC, act as a point of contact, and take some extra time to
> pass on knowledge around the series of steps we take when a trunk
> regression or other CI related issue occurs.
>
> I started this etherpad:
>
> https://etherpad.openstack.org/p/tripleo-ci-mentoring
>
> I'd suggest we start from the week after the n-2 milestone, and I've
> volunteered as the first mentor for that week.
>
> Feel free to update if you're willing in participating in the ongoing task
> of keeping TripleO CI running smoothly in any capacity, and hopefully we
> can get more folks involved and communicating.
>
> If anyone has any thoughts around this process feel free to reply here and
> we can hopefully refine things so they are helpful to folks.
>
> Thanks!
>
> Steve
>

Awesome, thanks Steve!


>


Re: [openstack-dev] [tripleo] Progress on overcloud upgrade / update jobs

2016-08-05 Thread Wesley Hayutin
On Fri, Aug 5, 2016 at 4:08 PM, Emilien Macchi  wrote:

> On Fri, Aug 5, 2016 at 1:58 PM, Steven Hardy  wrote:
> > On Thu, Aug 04, 2016 at 09:46:20PM -0400, Emilien Macchi wrote:
> >> Hi,
> >>
> >> I'm currently working by iteration to get a new upstream job that test
> >> upgrades and update.
> >> Until now, I'm doing baby steps. I bootstrapped the work to upgrade
> >> undercloud, see https://review.openstack.org/#/c/346995/ for details
> >> (it's almost working hitting a packaging issue now).
> >>
> >> Now I am interested by having 2 overcloud jobs:
> >>
> >> - update: Newton -> Newton: basically, we already have it with
> >> gate-tripleo-ci-centos-7-ovb-upgrades - but my proposal is to use
> >> multinode work that James started.
> >> I have a PoC (2 lines of code):
> >> https://review.openstack.org/#/c/351330/1 that works, it deploys an
> >> overcloud using packaging, applies the patch in THT and run overcloud
> >> update. I tested it and it works fine, (I tried to break Keystone).
> >> Right now the job name is
> >> gate-tripleo-ci-centos-7-nonha-multinode-upgrades-nv because I took
> >> example from the existing ovb job that does the exact same thing.
> >> I propose to rename it to
> >> gate-tripleo-ci-centos-7-nonha-multinode-updates-nv. What do you
> >> think?
> >
> > This sounds good, and it seems to be a valid replacement for the old
> > "upgrades" job - it won't catch all kinds of update bugs (in particular
> it
> > obviously won't run any packaged based updates at all), but it will catch
> > the most serious template regressions, which will be useful coverage to
> > maintain I think.
> >
> >> - upgrade: Mitaka -> Newton: I haven't started anything yet but the
> >> idea is to test the upgrade from stable to master, using multinode job
> >> now (not ovb).
> >> I can prototype something but I would like to hear from our community
> before.
> >
> > I think getting this coverage in place is very important, we're
> > experiencing a lot of post-release pain due to the lack of this coverage,
> > so +1 on any steps we can take to get some coverage here, I'd say go
> ahead
> > and do the prototype if you have time to do it.
>
> ok, /me working on it.
>
> > You may want to chat with weshay, as I know there are some RDO upgrade
> > tests which were planned to be run as third-party jobs to get some
> upgrade
> > coverage - I'm not sure if there is any scope for reuse here, or if it
> will
> > be easier to just wire in the upgrade via our current scripts (obviously
> > some form of reuse would be good if possible).
>
> ack
>
> >> Please give some feedback if you are interested by this work and I
> >> will spend some time during the next weeks on $topic.
> >>
> >> Note: please also look my thread about undercloud upgrade job, I need
> >> your feedback too.
> >
> > My only question about undercloud upgrades is whether we might combine
> the
> > overcloud upgrade job with this, e.g. upgrade undercloud, then upgrade
> > overcloud.  Probably the blocker here will be the gate timeout I guess,
> > even if we're using pre-cached images etc.
>
> Yes, my final goal was to have a job like:
> 1) deploy Mitaka undercloud
> 2) deploy Mitaka overcloud
> 3) run pingtest
> 4) upgrade undercloud to Newton
> 5) upgrade overcloud to newton
> 6) re-run pingtest
>

FYI, Mathieu wrote up https://review.openstack.org/#/c/323750/

Emilien, feel free to take it over; just sync up w/ Mathieu when he returns
from PTO on Monday.
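
For what it's worth, a very rough sketch of how steps 1-6 above could map
onto tripleo.sh style calls is below.  The flag names are assumptions (some
may not exist yet in tripleo.sh); this is only meant to show the shape of
the job, not a working script:

    # Rough outline only -- flag names are assumptions, not verified
    # tripleo.sh options; repo setup between steps 3 and 4 is omitted.
    ./tripleo.sh --undercloud            # 1) deploy Mitaka undercloud
    ./tripleo.sh --overcloud-deploy      # 2) deploy Mitaka overcloud
    ./tripleo.sh --overcloud-pingtest    # 3) run pingtest
    # (switch the repos from Mitaka to Newton here)
    ./tripleo.sh --undercloud-upgrade    # 4) upgrade undercloud to Newton
    ./tripleo.sh --overcloud-upgrade     # 5) upgrade overcloud to Newton
    ./tripleo.sh --overcloud-pingtest    # 6) re-run pingtest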
Thanks


>
>
>
> --
> Emilien Macchi
>


Re: [openstack-dev] [TripleO] Proposing Sergey (Sagi) Shnaidman for core on tripleo-ci

2017-02-02 Thread Wesley Hayutin
On Thu, Feb 2, 2017 at 9:35 AM, Attila Darazs  wrote:

> On 02/01/2017 08:37 PM, John Trowbridge wrote:
>
>>
>>
>> On 01/30/2017 10:56 AM, Emilien Macchi wrote:
>>
>>> Sagi, you're now core on TripleO CI repo. Thanks for your hard work on
>>> tripleo-quickstart transition, and also helping by keeping CI in good
>>> shape, your work is amazing!
>>>
>>> Congrats!
>>>
>>> Note: I couldn't add you to tripleo-ci group, but only to tripleo-core
>>> (Gerrit permissions), which mean you can +2 everything but we trust
>>> you to use it only on tripleo-ci. I'll figure out the Gerrit
>>> permissions later.
>>>
>>>
>> I also told Sagi that he should also feel free to +2 any
>> tripleo-quickstart/extras patches which are aimed at transitioning
>> tripleo-ci to use quickstart. I didn't really think about this as an
>> extra permission, as any tripleo core has +2 on
>> tripleo-quickstart/extras. However, I seem to have surprised the other
>> quickstart cores with this. None were opposed to the idea, but just
>> wanted to make sure that it was clearly communicated that this is allowed.
>>
>> If there is some objection to this, we can consider it further. FWIW,
>> Sagi has been consistently providing high quality critical reviews for
>> tripleo-quickstart/extras for some time now, and was pivotal in the
>> setup of the quickstart based OVB job.
>>
>
> Thanks for the clarification.
>
> And +1 on Sagi as a quickstart/extras core. I really appreciate his
> critical eyes on the changes.
>
> Attila


Thanks Emilien, John!
Congrats Sagi!


>
>


Re: [openstack-dev] [TripleO] Proposing Sergey (Sagi) Shnaidman for core on tripleo-ci

2017-01-24 Thread Wesley Hayutin
+1

On Tue, Jan 24, 2017 at 2:10 PM, Brent Eagles  wrote:

>
>
> On Tue, Jan 24, 2017 at 1:33 PM, Juan Antonio Osorio 
> wrote:
>
>> Sagi (sshnaidm on IRC) has done significant work in TripleO CI (both
>> on the current CI solution and in getting tripleo-quickstart jobs for
>> it); So I would like to propose him as part of the TripleO CI core team.
>>
>> I think he'll make a great addition to the team and will help move CI
>> issues forward quicker.
>>
>> Best Regards,
>>
>>
>>
>> --
>> Juan Antonio Osorio R.
>> jaosorior
>>
>>
> +1
>


[openstack-dev] [tripleo] Please review, tripleo ci transition to the quickstart tool set

2016-11-08 Thread Wesley Hayutin
Greetings,

As discussed at the Ocata summit, work is being done to update the set of
tools used in TripleO CI to tripleo-quickstart and
tripleo-quickstart-extras [1-2].  There is work, review, and validation
required prior to the transition.  A draft of the work items is available
[3]; if you have time, please review the etherpad and respond with any
questions, suggestions, or concerns.

Thank you very much!

[1]
https://blueprints.launchpad.net/tripleo/+spec/use-tripleo-quickstart-and-tripleo-quickstart-extras-for-the-tripleo-ci-toolset
[2] https://review.openstack.org/#/c/386250/
[3] https://etherpad.openstack.org/p/tripleo-ci-transition-to-quickstart


Re: [openstack-dev] [tripleo][ironic] introspection and CI

2016-10-18 Thread Wesley Hayutin
See my response inline.

On Tue, Oct 18, 2016 at 6:07 AM, Dmitry Tantsur <dtant...@redhat.com> wrote:

> On 10/17/2016 11:10 PM, Wesley Hayutin wrote:
>
>> Greetings,
>>
>> The RDO CI team is considering adding retries to our calls to
>> introspection
>> again [1].
>> This is very handy for bare metal environments where retries may be
>> needed due
>> to random chaos in the environment itself.
>>
>> We're trying to balance two things here..
>> 1. reduce the number of false negatives in CI
>> 2. try not to overstep what CI should vs. what the product should do.
>>
>> We would like to hear your comments if you think this is acceptable for
>> CI or if
>> this may be overstepping.
>>
>> Thank you
>>
>>
>> [1] http://paste.openstack.org/show/586035/
>>
>
> Hi!
>
> I probably lack some context of what exactly problems you face. I don't
> have any disagreement with retrying it, just want to make sure we're not
> missing actual bugs.
>

I agree, we have to be careful not to paper over bugs while we try to
overcome the typical environmental delays that come w/ booting and rebooting
$x number of random hardware nodes.
To make this a little more crystal clear, what I'm trying to determine is
where progressive delays and retries should be injected into the workflow of
deploying an overcloud.
Should we add options in the product itself that allow for $x number of
retries w/ a configurable set of delays for introspection? [2]  Is the
expectation that this works the first time, every time?
Are we overstepping what CI should do by implementing [1]?

Additionally, would it be appropriate to implement [1] while [2] is
developed for the next release, and is it OK to use [1] with older releases?
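
To be concrete about the kind of thing [1] does (without reproducing the
paste here), a CI-side retry with a progressive delay might look roughly
like the sketch below.  The introspection command assumes a Newton-era
undercloud, and the attempt count and delays are illustrative, not the
values in [1]:

    # Illustrative sketch only -- not the contents of [1].
    source ~/stackrc
    for attempt in 1 2 3; do
        if openstack overcloud node introspect --all-manageable --provide; then
            break
        fi
        echo "introspection attempt ${attempt} failed, retrying..."
        sleep $((attempt * 60))   # progressive delay between attempts
    done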

Thanks for your time and responses.


[1] http://paste.openstack.org/show/586035/
[2] https://github.com/openstack/tripleo-common/blob/master/workbooks/baremetal.yaml#L169


[openstack-dev] [tripleo] containerized compute CI status

2016-11-11 Thread Wesley Hayutin
Greetings,

I wanted to send a status update on the containerized compute CI; this is
still very much a work in progress and is not yet working end to end.  I'm
hoping that by sending this out early in the process we'll all benefit from
developing the CI as the feature is developed.

Everything you'll need to know to use this is documented here [1].  I have
two issues that I'm filling out details on [2].

Any feedback w/ regard to the role, the documentation, or anything else is
welcome.  I hope that if you try this, you find it easy to understand and
develop with.

Thank you!


[1]
https://github.com/redhat-openstack/ansible-role-tripleo-overcloud-prep-containers/
[2]
https://github.com/redhat-openstack/ansible-role-tripleo-overcloud-prep-containers/issues


[openstack-dev] [tripleo-ci] tripleo-quickstart-extras and tripleo third party ci

2016-10-14 Thread Wesley Hayutin
Greetings,

Hey everyone, I wanted to post a link to a blueprint I'm interested in
discussing at summit.  Please share your thoughts and comments in the spec /
gerrit review.

https://blueprints.launchpad.net/tripleo/+spec/tripleo-third-party-ci-quickstart

Thank you!


[openstack-dev] [tripleo][ironic] introspection and CI

2016-10-17 Thread Wesley Hayutin
Greetings,

The RDO CI team is considering adding retries to our calls to introspection
again [1].
This is very handy for bare metal environments where retries may be needed
due to random chaos in the environment itself.

We're trying to balance two things here..
1. reduce the number of false negatives in CI
2. try not to overstep what CI should do vs. what the product should do.

We would like to hear your comments on whether you think this is acceptable
for CI or whether this may be overstepping.

Thank you


[1] http://paste.openstack.org/show/586035/


Re: [openstack-dev] [tripleo] [ci]

2016-12-14 Thread Wesley Hayutin
On Fri, Dec 2, 2016 at 12:04 PM, Wesley Hayutin <whayu...@redhat.com> wrote:

> Greetings,
>
> I wanted to send a status update on the quickstart based containerized
> compute ci.
>
> The work is here:
> https://review.openstack.org/#/c/393348/
>
> I had two passes in a row on the morning of Nov 30, then later that day
> the deployment started to fail due to the compute node losing its networking
> and becoming unpingable.  After poking around and talking to a few folks
> it's likely that we're hitting at least one of two possible bugs [1-2].
>
> I am on pto next week but will periodically check in and can easily retest
> if these resolve.
>
> Thank you!
>
> [1] https://bugs.launchpad.net/ironic/+bug/1646477
> [2] https://bugs.launchpad.net/tripleo/+bug/1646897 just filed
>
>
>
Just a quick update:
The container CI is successfully deploying the overcloud with the
containerized compute node.
I need to update the instructions a bit, so you may want to hold off on
trying it out until you see an update.

The heat ping test is failing, but this is progress and we're back on
track :)
The environment is running and logs are available; ping me personally if
you need access.

Thanks..


Ping test failure:

| stack_name            | pingtest_stack
| stack_owner           | None
| stack_status          | CREATE_FAILED
| stack_status_reason   | Resource CREATE failed: ResourceInError:
|                       | resources.server1: Went to status ERROR due to
|                       | "Message: No valid host was found. There are not
|                       | enough hosts available., Code: 500"
| stack_user_project_id | e5fcd903a5004d59b8d3ad22aba0ae27


[openstack-dev] [tripleo] [ci]

2016-12-02 Thread Wesley Hayutin
Greetings,

I wanted to send a status update on the quickstart-based containerized
compute CI.

The work is here:
https://review.openstack.org/#/c/393348/

I had two passes in a row on the morning of Nov 30, then later that day the
deployment started to fail due to the compute node losing its networking and
becoming unpingable.  After poking around and talking to a few folks it's
likely that we're hitting at least one of two possible bugs [1-2].

I am on PTO next week but will periodically check in and can easily retest
if these resolve.

Thank you!

[1] https://bugs.launchpad.net/ironic/+bug/1646477
[2] https://bugs.launchpad.net/tripleo/+bug/1646897 just filed


[openstack-dev] [tripleo][ci] TripleO-Quickstart Transition to TripleO-CI Update and Invite:

2017-01-03 Thread Wesley Hayutin
adding [ci] to the subject.

On Tue, Jan 3, 2017 at 4:04 PM, Harry Rybacki  wrote:

> Greetings All,
>
> Folks have been diligently working on the blueprint[1] to prepare
> TripleO-Quickstart (OOOQ)[2] and TripleO-Quickstart-Extras[3] for
> their transition into TripleO-CI. Presently, our aim is to begin the
> actual transition to OOOQ on 4-Feb-2017. We are tracking our work on
> the RDO-Infra Trello board[4] and holding public discussion of key
> blockers on the team’s scrum etherpad[5].
>
> We are hosting weekly transition update meetings (1600-1700 UTC) and
> would like to invite folks to participate. Specifically, we are
> looking for at least one stakeholder in the existing TripleO-CI to
> join us as we prepare to migrate OOOQ. Attend and map out job/feature
> coverage to identify any holes so we can begin plugging them. Please
> reply off-list or reach out to me (hrybacki) on IRC to be added to the
> transition meeting calendar invite.
>
> [1] - https://blueprints.launchpad.net/tripleo/+spec/use-tripleo-
> quickstart-and-tripleo-quickstart-extras-for-the-tripleo-ci-toolset
> [2] - https://github.com/openstack/tripleo-quickstart/
> [3] - https://github.com/openstack/tripleo-quickstart-extras/
> [4] - https://trello.com/b/HhXlqdiu/rdo
> [5] - https://review.rdoproject.org/etherpad/p/rdo-infra-scrum
>
>
> /R
>
> Harry Rybacki
>


Re: [openstack-dev] [tripleo] [ci] TripleO-Quickstart Transition to TripleO-CI Update and Invite:

2017-01-04 Thread Wesley Hayutin
Greetings Steve

On Wed, Jan 4, 2017 at 4:34 AM, Steven Hardy  wrote:

> Hi Harry,
>
> On Tue, Jan 03, 2017 at 04:04:51PM -0500, Harry Rybacki wrote:
> > Greetings All,
> >
> > Folks have been diligently working on the blueprint[1] to prepare
> > TripleO-Quickstart (OOOQ)[2] and TripleO-Quickstart-Extras[3] for
> > their transition into TripleO-CI. Presently, our aim is to begin the
> > actual transition to OOOQ on 4-Feb-2017. We are tracking our work on
> > the RDO-Infra Trello board[4] and holding public discussion of key
> > blockers on the team’s scrum etherpad[5].
>
> Thanks for the update - can you please describe what "transition into
> TripleO-CI" means?
>

Hey Steve,
This includes items like the following:

* Move all oooq-extras roles upstream
* Ensure check gates are all working
* Ensure oooq works with multinode nodepool
* Ensure the logging of jobs upstream with oooq is equivalent to the
  current logs and familiar to devs
* Ensure the existing tripleo-ci can co-exist with oooq
* Ensure the documentation is clear for developers
* Ensure oooq-extras roles are composable and consistent across various
  infrastructures like upstream, RDO, internal, and local builds


> I'm happy to see this work proceeding, but we have to be mindful that the
> end of the development cycle (around the time you're proposing) is always
> a crazy-busy time where folks are trying to land features and fixes.
>
> So, we absolutely must avoid any CI outages around this time, thus I get
> nervous talking about major CI transitions around the Release-candate
> weeks ;)
>
> https://releases.openstack.org/ocata/schedule.html
>
> If we're talking about getting the jobs ready, then switching over to
> primarily oooq jobs in early pike, that's great, but please let's ensure we
> don't make any disruptive changes before the end of this (very short and
> really busy) cycle.
>

That sounds fair to me; would pushing out the transition a couple of weeks
be a reasonable change?
Looking at the calendar, Feb 27 is after the release of Ocata, and March 6th
could also be an option.


>
> > We are hosting weekly transition update meetings (1600-1700 UTC) and
> > would like to invite folks to participate. Specifically, we are
> > looking for at least one stakeholder in the existing TripleO-CI to
> > join us as we prepare to migrate OOOQ. Attend and map out job/feature
> > coverage to identify any holes so we can begin plugging them. Please
> > reply off-list or reach out to me (hrybacki) on IRC to be added to the
> > transition meeting calendar invite.
>
> Why can't we discuss this in the weekly TripleO IRC meeting?
>

I've added it to the agenda for next week.  Harry and I will fill in details.


>
> I think folks would be fine with having a standing item where we discuss
> this transition (there is already a CI item, but I've rarely seen this
> topic raised there).
>
> https://wiki.openstack.org/wiki/Meetings/TripleO
>
> Thanks!
>
> Steve
>

Thank you for the feedback!


>


Re: [openstack-dev] [tripleo] [ci]

2016-12-20 Thread Wesley Hayutin
On Fri, Dec 16, 2016 at 9:12 AM, Flavio Percoco <fla...@redhat.com> wrote:

> On 14/12/16 21:44 -0500, Emilien Macchi wrote:
>
>> On Wed, Dec 14, 2016 at 7:22 PM, Wesley Hayutin <whayu...@redhat.com>
>> wrote:
>>
>>>
>>>
>>> On Fri, Dec 2, 2016 at 12:04 PM, Wesley Hayutin <whayu...@redhat.com>
>>> wrote:
>>>
>>>>
>>>> Greetings,
>>>>
>>>> I wanted to send a status update on the quickstart based containerized
>>>> compute ci.
>>>>
>>>> The work is here:
>>>> https://review.openstack.org/#/c/393348/
>>>>
>>>> I had two passes on the morning of Nov 30 in a row, then later that day
>>>> the deployment started to fail due the compute node loosing it's
>>>> networking
>>>> and became unpingable.   After poking around and talking to a few folks
>>>> its
>>>> likely that we're hitting at least one of two possible bugs [1-2]
>>>>
>>>> I am on pto next week but will periodically check in and can easily
>>>> retest
>>>> if these resolve.
>>>>
>>>> Thank you!
>>>>
>>>> [1] https://bugs.launchpad.net/ironic/+bug/1646477
>>>> [2] https://bugs.launchpad.net/tripleo/+bug/1646897 just filed
>>>>
>>>>
>>>>
>>> Just a quick update,
>>> The container CI is successfully deploying the overcloud with the
>>> containerized compute node.
>>>
>>
>> Do we have the job in place in tripleo CI? I might have missed the
>> info, but I don't see it yet.
>> Thanks!
>>
>> Once we have it, we might want to run it for every patch in THT that
>> touch docker/* files.
>>
>
> +1 for this, although I'm starting to think that we should just run it.
> Last
> time it broke, it was caused by a patch that didn't touch files under
> `docker`
>
> Flavio
>
>
>
>> I need to update the instructions a bit so you may want to hold off on
>>> trying it out until you see an update.
>>>
>>> The heat ping test is failing, but this is progress and we're back on
>>> track
>>> :)
>>> The environment is running and logs are available, ping me personally if
>>> you
>>> need access.
>>>
>>> Thanks..
>>>
>>>
>>> Ping test failure:
>>>
>>> | stack_name            | pingtest_stack
>>> | stack_owner           | None
>>> | stack_status          | CREATE_FAILED
>>> | stack_status_reason   | Resource CREATE failed: ResourceInError:
>>> |                       | resources.server1: Went to status ERROR due to
>>> |                       | "Message: No valid host was found. There are not
>>> |                       | enough hosts available., Code: 500"
>>> | stack_user_project_id | e5fcd903a5004d59b8d3ad22aba0ae27
>>>
>>>
>>
>>
>> --
>> Emilien Macchi
>>
>> 
>>
>
> --
> @flaper87
> Flavio Percoco
>



Greetings,

After pulling in a few unmerged changes via tripleo-quickstart devmode, the
container CI is passing the deployment and ping test.

Updated docs at

https://review.openstack.org/#/c/400983/17/roles/overcloud-prep-containers/README.md

The two patch sets are brought into the build with

export OPT_ADDITIONAL_PARAMETERS=" \
  -e overcloud_templates_refspec=refs/changes/80/395880/12 \
  -e overcloud_tripleo_common_refspec=refs/changes/08/411908/2"

I'll be working w/ the tripleo-quickstart cores to try to get a merge on
the three patches required.

Thanks


Re: [openstack-dev] [tripleo] container jobs are unstable

2017-04-06 Thread Wesley Hayutin
On Thu, Mar 30, 2017 at 10:08 AM, Steven Hardy  wrote:

> On Wed, Mar 29, 2017 at 10:07:24PM -0400, Paul Belanger wrote:
> > On Thu, Mar 30, 2017 at 09:56:59AM +1300, Steve Baker wrote:
> > > On Thu, Mar 30, 2017 at 9:39 AM, Emilien Macchi 
> wrote:
> > >
> > > > On Mon, Mar 27, 2017 at 8:00 AM, Flavio Percoco 
> wrote:
> > > > > On 23/03/17 16:24 +0100, Martin André wrote:
> > > > >>
> > > > >> On Wed, Mar 22, 2017 at 2:20 PM, Dan Prince 
> wrote:
> > > > >>>
> > > > >>> On Wed, 2017-03-22 at 13:35 +0100, Flavio Percoco wrote:
> > > > 
> > > >  On 22/03/17 13:32 +0100, Flavio Percoco wrote:
> > > >  > On 21/03/17 23:15 -0400, Emilien Macchi wrote:
> > > >  > > Hey,
> > > >  > >
> > > >  > > I've noticed that container jobs look pretty unstable
> lately; to
> > > >  > > me,
> > > >  > > it sounds like a timeout:
> > > >  > > http://logs.openstack.org/19/447319/2/check-tripleo/gate-
> tripleo-
> > > >  > > ci-centos-7-ovb-containers-oooq-nv/bca496a/console.html#_
> 2017-03-
> > > >  > > 22_00_08_55_358973
> > > >  >
> > > >  > There are different hypothesis on what is going on here. Some
> > > >  > patches have
> > > >  > landed to improve the write performance on containers by using
> > > >  > hostpath mounts
> > > >  > but we think the real slowness is coming from the images
> download.
> > > >  >
> > > >  > This said, this is still under investigation and the
> containers
> > > >  > squad will
> > > >  > report back as soon as there are new findings.
> > > > 
> > > >  Also, to be more precise, Martin André is looking into this. He
> also
> > > >  fixed the
> > > >  gate in the last 2 weeks.
> > > > >>>
> > > > >>>
> > > > >>> I spoke w/ Martin on IRC. He seems to think this is the cause of
> some
> > > > >>> of the failures:
> > > > >>>
> > > > >>> http://logs.openstack.org/32/446432/1/check-tripleo/gate-
> > > > tripleo-ci-cen
> > > > >>> tos-7-ovb-containers-oooq-nv/543bc80/logs/oooq/overcloud-
> controller-
> > > > >>> 0/var/log/extra/docker/containers/heat_engine/log/heat/heat-
> > > > >>> engine.log.txt.gz#_2017-03-21_20_26_29_697
> > > > >>>
> > > > >>>
> > > > >>> Looks like Heat isn't able to create Nova instances in the
> overcloud
> > > > >>> due to "Host 'overcloud-novacompute-0' is not mapped to any
> cell'. This
> > > > >>> means our cells initialization code for containers may not be
> quite
> > > > >>> right... or there is a race somewhere.
> > > > >>
> > > > >>
> > > > >> Here are some findings. I've looked at time measures from CI for
> > > > >> https://review.openstack.org/#/c/448533/ which provided the most
> > > > >> recent results:
> > > > >>
> > > > >> * gate-tripleo-ci-centos-7-ovb-ha [1]
> > > > >>undercloud install: 23
> > > > >>overcloud deploy: 72
> > > > >>total time: 125
> > > > >> * gate-tripleo-ci-centos-7-ovb-nonha [2]
> > > > >>undercloud install: 25
> > > > >>overcloud deploy: 48
> > > > >>total time: 122
> > > > >> * gate-tripleo-ci-centos-7-ovb-updates [3]
> > > > >>undercloud install: 24
> > > > >>overcloud deploy: 57
> > > > >>total time: 152
> > > > >> * gate-tripleo-ci-centos-7-ovb-containers-oooq-nv [4]
> > > > >>undercloud install: 28
> > > > >>overcloud deploy: 48
> > > > >>total time: 165 (timeout)
> > > > >>
> > > > >> Looking at the undercloud & overcloud install times, the most task
> > > > >> consuming tasks, the containers job isn't doing that bad compared
> to
> > > > >> other OVB jobs. But looking closer I could see that:
> > > > >> - the containers job pulls docker images from dockerhub, this
> process
> > > > >> takes roughly 18 min.
> > > > >
> > > > >
> > > > > I think we can optimize this a bit by having the script that
> populates
> > > > the
> > > > > local
> > > > > registry in the overcloud job to run in parallel. The docker
> daemon can
> > > > do
> > > > > multiple pulls w/o problems.
> > > > >
> > > > >> - the overcloud validate task takes 10 min more than it should
> because
> > > > >> of the bug Dan mentioned (a fix is in the queue at
> > > > >> https://review.openstack.org/#/c/448575/)
> > > > >
> > > > >
> > > > > +A
> > > > >
> > > > >> - the postci takes a long time with quickstart, 13 min (4 min
> alone
> > > > >> spent on docker log collection) whereas it takes only 3 min when
> using
> > > > >> tripleo.sh
> > > > >
> > > > >
> > > > > mmh, does this have anything to do with ansible being in between?
> Or is
> > > > that
> > > > > time specifically for the part that gets the logs?
> > > > >
> > > > >>
> > > > >> Adding all these numbers, we're at about 40 min of additional
> time for
> > > > >> oooq containers job which is enough to cross the CI job limit.
> > > > >>
> > > > >> There is certainly a lot of room for optimization here and there
> and
> > > > >> I'll explore how we can speed up the containers CI job 

Re: [openstack-dev] [infra][tripleo] initial discussion for a new periodic pipeline

2017-03-09 Thread Wesley Hayutin
On Wed, Mar 8, 2017 at 1:29 PM, Jeremy Stanley <fu...@yuggoth.org> wrote:

> On 2017-03-07 10:12:58 -0500 (-0500), Wesley Hayutin wrote:
> > The TripleO team would like to initiate a conversation about the
> > possibility of creating a new pipeline in Openstack Infra to allow
> > a set of jobs to run periodically every four hours
> [...]
>
> The request doesn't strike me as contentious/controversial. Why not
> just propose your addition to the zuul/layout.yaml file in the
> openstack-infra/project-config repo and hash out any resulting
> concerns via code review?
> --
> Jeremy Stanley
>
>
Sounds good to me.
We thought it would be nice to walk through it in an email first :)

Thanks


> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [infra][tripleo] initial discussion for a new periodic pipeline

2017-03-07 Thread Wesley Hayutin
Greetings,

The TripleO team would like to initiate a conversation about the
possibility of creating a new pipeline in Openstack Infra to allow a set of
jobs to run periodically every four hours [2].
The background and context of why such a pipeline is required is as
follows.  TripleO CI installs OpenStack with rpm packages from a tool called
delorean [1].  Every upstream commit is built into an rpm, and new packages
are produced several times each hour of the day.  These freshly built packages
need to be validated before they are allowed to be used in the TripleO check
gates, to ensure we have stable and consistent CI results.  Currently the
validation of the new rpms is done every 24 hours via a periodic pipeline.
In practice this periodic validation finds a working set of rpms roughly once,
maybe twice, a week depending on the point in the release cycle; successful
validations happen much more often at the beginning and end of each cycle.

There is often only a window of time between the point where an upstream fix
has been merged (so the latest set of rpms would pass this validation) and the
point where a new issue/bug is introduced into the latest set of rpms.
Validating the set of rpms only every 24 hours does not intersect with that
"working window" often enough.
It's good practice to always be testing with the latest merged code in CI
and that is what we hope to improve on in TripleO by increasing the cadence
of this validation.   A suggestion was made at the last PTG in Atlanta to
create a new pipeline to accommodate an increased cadence hopefully to
every four hours.
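
To make this concrete, the kind of addition we would be proposing to
zuul/layout.yaml in openstack-infra/project-config is sketched below.  This is
only a rough illustration (the pipeline name, timer value, precedence and the
example project/job are placeholders, not a final proposal):

pipelines:
  # A timer-triggered pipeline, the same shape as the existing periodic
  # pipelines, but firing every four hours.
  - name: periodic-tripleo-4hr
    description: Periodic TripleO promotion jobs, triggered every four hours.
    manager: IndependentPipelineManager
    precedence: low
    trigger:
      timer:
        - time: '0 */4 * * *'

projects:
  - name: openstack-infra/tripleo-ci
    periodic-tripleo-4hr:
      - gate-tripleo-ci-centos-7-scenario001-multinode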

I suspect there will be roughly 4-6 jobs kicked off in this pipeline.  The
exact set of jobs is not final but will probably include [3]

I am hoping to kick off the conversation regarding how to properly proceed
with this task here, and to generally notify the community of our
intentions.

Thank you for reading through this and considering the requirement.
Wes


[1]
https://blogs.rdoproject.org/7834/delorean-openstack-packages-from-the-future
[2]
https://blueprints.launchpad.net/tripleo/+spec/increase-cadence-of-tripleo-promotions
[3]
gate-tripleo-ci-centos-7-scenario001-multinode
gate-tripleo-ci-centos-7-scenario002-multinode
gate-tripleo-ci-centos-7-scenario003-multinode
gate-tripleo-ci-centos-7-scenario004-multinode
gate-tripleo-ci-centos-7-multinode-upgrades-nv
gate-tripleo-ci-centos-7-ovb-ha-ipv6
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [infra][tripleo] initial discussion for a new periodic pipeline

2017-03-21 Thread Wesley Hayutin
 can start from 12 hours period to see how it goes,
> although
> >> > I
> >> > > > don't think that 4 only jobs will increase load on OVB cloud, it's
> >> > > > completely negligible comparing to current OVB capacity and load.
> >> > > > But making its precedence as "low" IMHO completely removes any
> sense
> >> > > > from this pipeline to be, because we already run
> experimental-tripleo
> >> > > > pipeline which this priority and it could reach timeouts like 7-14
> >> > > > hours. So let's assume we ran periodic job, it's queued to run
> now 12 +
> >> > > > "low queue length" - about 20 and more hours. It's even worse than
> >> > usual
> >> > > > periodic job and definitely makes this change useless.
> >> > > > I'd like to notice as well that those periodic jobs unlike "usual"
> >> > > > periodic are used for repository promotion and their value are
> equal or
> >> > > > higher than check jobs, so it needs to run with "normal" or even
> "high"
> >> > > > precedence.
> >> > >
> >> > > Yeah, it makes no sense from an OVB perspective to add these as low
> >> > priority
> >> > > jobs.  Once in a while we've managed to chew through the entire
> >> > experimental
> >> > > queue during the day, but with the containers job added it's very
> >> > unlikely
> >> > > that's going to happen anymore.  Right now we have a 4.5 hour wait
> time
> >> > just
> >> > > for the check queue, then there's two hours of experimental jobs
> queued
> >> > up
> >> > > behind that.  All of which means if we started a low priority
> periodic
> >> > job
> >> > > right now it probably wouldn't run until about midnight my time,
> which I
> >> > > think is when the regular periodic jobs run now.
> >> > >
> >> > Lets just give it a try? A 12 hour periodic job with low priority.
> There is
> >> > nothing saying we cannot iterate on this after a few days / weeks /
> months.
> >> >
> >> > > >
> >> > > > Thanks
> >> > > >
> >> > > >
> >> > > > On Thu, Mar 9, 2017 at 10:06 PM, Wesley Hayutin <
> whayu...@redhat.com
> >> > > > <mailto:whayu...@redhat.com>> wrote:
> >> > > >
> >> > > >
> >> > > >
> >> > > > On Wed, Mar 8, 2017 at 1:29 PM, Jeremy Stanley <
> fu...@yuggoth.org
> >> > > > <mailto:fu...@yuggoth.org>> wrote:
> >> > > >
> >> > > > On 2017-03-07 10:12:58 -0500 (-0500), Wesley Hayutin
> wrote:
> >> > > > > The TripleO team would like to initiate a conversation
> about
> >> > the
> >> > > > > possibility of creating a new pipeline in Openstack
> Infra to
> >> > allow
> >> > > > > a set of jobs to run periodically every four hours
> >> > > > [...]
> >> > > >
> >> > > > The request doesn't strike me as
> contentious/controversial.
> >> > Why not
> >> > > > just propose your addition to the zuul/layout.yaml file
> in the
> >> > > > openstack-infra/project-config repo and hash out any
> resulting
> >> > > > concerns via code review?
> >> > > > --
> >> > > > Jeremy Stanley
> >> > > >
> >> > > >
> >> > > > Sounds good to me.
> >> > > > We thought it would be nice to walk through it in an email
> first :)
> >> > > >
> >> > > > Thanks
> >> > > >
> >> > > >
> >> > > > __
> __
> >> > __
> >> > > > OpenStack Development Mailing List (not for usage
> questions)
> >> > > > Unsubscribe:
> >> > > > openstack-dev-requ...@lists.openstack.org?subject:
> unsubscribe
> >> > > > <http://openstack-dev-requ...@list

Re: [openstack-dev] [TripleO][CI] Bridging the production/CI workflow gap with large periodic CI jobs

2017-04-18 Thread Wesley Hayutin
On Tue, Apr 18, 2017 at 2:28 PM, Emilien Macchi  wrote:

> On Mon, Apr 17, 2017 at 3:52 PM, Justin Kilpatrick 
> wrote:
> > Because CI jobs tend to max out about 5 nodes there's a whole class of
> > minor bugs that make it into releases.
> >
> > What happens is that they never show up in small clouds, then when
> > they do show up in larger testing clouds the people deploying those
> > simply work around the issue and get onto what they where supposed to
> > be testing. These workarounds do get documented/BZ'd but since they
> > don't block anyone and only show up in large environments they become
> > hard for developers to fix.
> >
> > So the issue gets stuck in limbo, with nowhere to test a patchset and
> > no one owning the issue.
> >
> > These issues pile up and pretty soon there is a significant difference
> > between the default documented workflow and the 'scale' workflow which
> > is filled with workarounds which may or may not be documented
> > upstream.
> >
> > I'd like to propose getting these issues more visibility to having a
> > periodic upstream job that uses 20-30 ovb instances to do a larger
> > deployment. Maybe at 3am on a Sunday or some other time where there's
> > idle execution capability to exploit. The goal being to make these
> > sorts of issues more visible and hopefully get better at fixing them.
>
> Wait no, I know some folks at 3am on a Saturday night who use TripleO
> CI (ok that was a joke).
>
> > To be honest I'm not sure this is the best solution, but I'm seeing
> > this anti pattern across several issues and I think we should try and
> > come up with a solution.
> >
>
> Yes this proposal is really cool. There is an alternative to run this
> periodic scenario outside TripleO CI and send results via email maybe.
> But it is something we need to discuss with RDO Cloud people and see
> if we would have such resources to make it on a weekly frequency.
>

+1
I think with RDO Cloud it's possible to run a test of that scale either in
the tripleo system or just report results; either would be great.  Until RDO
Cloud is in full production we might as well begin by running a job internally
with the master-tripleo-ci release config file.  The browbeat jobs are already
logging here [1]; it will be a fairly simple step to run them w/ the upstream
content.

Adding Arx Cruz as he is the point of contact on a tool that distributes test
results from the tripleo periodic jobs, which may come in handy for this scale
test.  I'll
probably put you two in touch tomorrow.

I'm still looking for opportunities to run browbeat in upstream tripleo as
well.
Could be a productive sync up :)

[1] https://thirdparty-logs.rdoproject.org/

Thanks!



>
> Thanks for bringing this up, it's crucial for us to have this kind of
> feedback, now let's take actions.
> --
> Emilien Macchi
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] Proposing Bogdan Dobrelya core on TripleO / Containers

2017-07-28 Thread Wesley Hayutin
On Wed, Jul 26, 2017 at 7:39 AM, Jiří Stránský  wrote:

> On 21.7.2017 16:55, Emilien Macchi wrote:
>
>> Hi,
>>
>> Bogdan (bogdando on IRC) has been very active in Containerization of
>> TripleO and his quality of review has increased over time.
>> I would like to give him core permissions on container work in TripleO.
>> Any feedback is welcome as usual, we'll vote as a team.
>>
>> Thanks,
>>
>
+1


>
>>
> +1
>
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] critical situation with CI / upgrade jobs

2017-08-15 Thread Wesley Hayutin
On Tue, Aug 15, 2017 at 9:33 PM, Emilien Macchi  wrote:

> So far, we're having 3 critical issues, that we all need to address as
> soon as we can.
>
> Problem #1: Upgrade jobs timeout from Newton to Ocata
> https://bugs.launchpad.net/tripleo/+bug/1702955
> Today I spent an hour to look at it and here's what I've found so far:
> depending on which public cloud we're running the TripleO CI jobs, it
> timeouts or not.
> Here's an example of Heat resources that run in our CI:
> https://www.diffchecker.com/VTXkNFuk
> On the left, resources on a job that failed (running on internap) and
> on the right (running on citycloud) it worked.
> I've been through all upgrade steps and I haven't seen specific tasks
> that take more time here or here, but some little changes that make
> the big change at the end (so hard to debug).
> Note: both jobs use AFS mirrors.
> Help on that front would be very welcome.
>
>
> Problem #2: from Ocata to Pike (containerized) missing container upload
> step
> https://bugs.launchpad.net/tripleo/+bug/1710938
> Wes has a patch (thanks!) that is currently in the gate:
> https://review.openstack.org/#/c/493972
> Thanks to that work, we managed to find the problem #3.
>
>
> Problem #3: from Ocata to Pike: all container images are
> uploaded/specified, even for services not deployed
> https://bugs.launchpad.net/tripleo/+bug/1710992
> The CI jobs are timeouting during the upgrade process because
> downloading + uploading _all_ containers in local cache takes more
> than 20 minutes.
> So this is where we are now, upgrade jobs timeout on that. Steve Baker
> is currently looking at it but we'll probably offer some help.
>
>
> Solutions:
> - for stable/ocata: make upgrade jobs non-voting
> - for pike: keep upgrade jobs non-voting and release without upgrade
> testing
>
> Risks:
> - for stable/ocata: it's highly possible to inject regression if jobs
> aren't voting anymore.
> - for pike: the quality of the release won't be good enough in term of
> CI coverage comparing to Ocata.
>
> Mitigations:
> - for stable/ocata: make jobs non-voting and enforce our
> core-reviewers to pay double attention on what is landed. It should be
> temporary until we manage to fix the CI jobs.
> - for master: release RC1 without upgrade jobs and make progress
> - Run TripleO upgrade scenarios as third party CI in RDO Cloud or
> somewhere with resources and without timeout constraints.
>
> I would like some feedback on the proposal so we can move forward this
> week,
> Thanks.
> --
> Emilien Macchi
>

I think, due to some of the limitations with run times upstream, we may need
to rethink the workflow for upgrade tests. It's not very clear to me what can
be done with the multinode nodepool jobs outside of what is already being
done.  I think we do have some choices with ovb jobs.   I'm not going to try
to solve it in this email, but rethinking how we CI upgrades in the upstream
infrastructure should be a focus for the Queens PTG.  We
will need to focus on bringing run times significantly down as it's
incredibly difficult to run two installs in 175 minutes across all the
upstream cloud providers.

Thanks Emilien for all the work you have done around upgrades!



>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [tripleo][ci] decreased coverage for telemetry

2017-07-11 Thread Wesley Hayutin
Greetings,

I was looking through the mailing list and I did not see any emails
explicitly calling out the decreased coverage for telemetry in tripleo due
to [1].  A series of changes went into the CI system to disable telemetry
[2].

There is work being done to restore more coverage for telemetry by limiting
the resources it consumes [3].  We are also working on additional scenarios
in t-h-t/ci/environments/ to better cover ceilometer.

If the CI environment you are working in has the resources to cover
ceilometer, that is great; however, if you find issues like [1] we highly
suggest you follow the same pattern until coverage is restored upstream.
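
For reference, the pattern is simply mapping the telemetry services to
OS::Heat::None in an environment file, roughly like the snippet below (the
exact service names vary between releases, so double check them against your
tripleo-heat-templates):

resource_registry:
  # Disable the telemetry services that are too heavy for the CI nodes.
  OS::TripleO::Services::CeilometerApi: OS::Heat::None
  OS::TripleO::Services::CeilometerCollector: OS::Heat::None
  OS::TripleO::Services::CeilometerAgentCentral: OS::Heat::None
  OS::TripleO::Services::CeilometerAgentNotification: OS::Heat::None
  OS::TripleO::Services::CeilometerExpirer: OS::Heat::None
  OS::TripleO::Services::AodhApi: OS::Heat::None
  OS::TripleO::Services::AodhEvaluator: OS::Heat::None
  OS::TripleO::Services::AodhNotifier: OS::Heat::None
  OS::TripleO::Services::AodhListener: OS::Heat::None
  OS::TripleO::Services::GnocchiApi: OS::Heat::None
  OS::TripleO::Services::GnocchiMetricd: OS::Heat::None
  OS::TripleO::Services::GnocchiStatsd: OS::Heat::None
  OS::TripleO::Services::PankoApi: OS::Heat::None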

Thank you!

[1] https://bugs.launchpad.net/tripleo/+bug/1693174
[2] https://review.openstack.org/#/q/topic:bug/1680195
[3]
https://review.openstack.org/#/c/475838/
https://review.openstack.org/#/c/474969/
https://review.openstack.org/#/c/47/
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] index_footer in openstack logs

2017-07-09 Thread Wesley Hayutin
posting to openstack-dev

On Sun, Jul 9, 2017 at 10:08 AM, Wesley Hayutin <whayu...@redhat.com> wrote:

> Greetings Andreas, Paul
>
> I'm looking for some pointers on how to include some instructions in our
> openstack logs in the same way the devstack gate works [1] and I'm not able
> to piece things together atm.
>
> I see the support for adding an index_footer was removed in [2], but I
> don't see what it was replaced by.  I was hoping you guys could point us in
> the right direction to enable embedding instructions directly in our logs
> like [3]
>
> Thank you for the help!
>
> [1] https://github.com/openstack-infra/devstack-gate/tree/master/help
> [2] https://github.com/openstack-infra/project-config/commit/
> 183aabbeaf528f5ef637a7bb51245eea4fab94b8#diff-
> 03d414c17dcd54548b8810c4a442b655
> [3] http://logs.openstack.org/29/480429/1/check/gate-
> tempest-dsvm-neutron-full-ubuntu-xenial/76071e1/logs/
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] Forming our plans around Ansible

2017-07-09 Thread Wesley Hayutin
On Fri, Jul 7, 2017 at 6:20 PM, James Slagle  wrote:

> On Fri, Jul 7, 2017 at 5:31 PM, David Moreau Simard 
> wrote:
> > On Fri, Jul 7, 2017 at 1:50 PM, James Slagle 
> wrote:
> >> (0) tripleo-quickstart which follows the common and well accepted
> >> approach to bundling a set of Ansible playbooks/roles.
> >
> > I don't want to de-rail the thread but I really want to bring some
> > attention to a pattern that tripleo-quickstart has been using across
> > it's playbooks and roles.
> > I sincerely hope that we can find a better implementation should we
> > start developing new things from scratch.
>
> Yes, just to clarify...by "well accepted" I just meant how the git
> repo is organized and how you are expected to interface with those
> playbooks and roles as opposed to what those playbooks/roles actually
> do.
>
> > I'll sound like a broken record for those that have heard me mention
> > this before but for those that haven't, here's a concrete example of
> > how things are done today:
> > (Sorry for the link overload, making sure the relevant information is
> available)
> >
> > For an example tripleo-quickstart job, here's the console [1] and it's
> > corresponding ARA report [2]:
> > - A bash script is created [3][4][5] from a jinja template [6]
> > - A task executes the bash script [7][8][9]
>
> From my limited experience, I believe the intent was that the
> playbooks should do what a user is expected to do so that it's as
> close to reproducing the user interface of TripleO 1:1.
>
> For example, we document users running commands from a shell prompt.
> Therefore, oooq ought to do the same thing as close as possible.
> Obviously there will be gaps, just as there is with tripleo.sh, but I
> feel that both tools (tripleo.sh/oooq) were trying to be faithful to
> our published docs as mush as possible, and I think there's something
> to be commended there.
>

That is exactly right, James; CI should be as close to a user-driven install
as possible IMHO.

David, you are conflating two use cases as far as I can tell: (a) ansible
used in the project/product itself, launched by openstack/project commands,
and (b) ansible used as a wrapper around the commands that users are expected
to execute.

Using native ansible modules as part of the project/product (a), as James is
describing, is perfectly fine, and ansible, ARA and other tools work really
well here.

If the CI reinterprets user-level commands (b) directly into ansible module
calls, you basically lose the 1:1 mapping between CI, documentation and user
experience.
The *most* important function of CI is to guarantee that users can follow the
documentation and have a defect-free experience [docs].  Having to "look at
the logs" is a very small price to pay to preserve that experience.   I think
we'll be able to get the logs from the templated bash into ARA, we just need
a little time to get that done.
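
For anyone following along who has not looked at the quickstart roles, the
pattern being discussed is roughly the sketch below; the file and variable
names are illustrative, not the exact ones in the repo:

- name: Create the overcloud deploy script from a jinja template
  template:
    src: overcloud-deploy.sh.j2
    dest: "{{ working_dir }}/overcloud-deploy.sh"
    mode: '0755'

- name: Run the deploy script the same way a user would
  shell: >
    {{ working_dir }}/overcloud-deploy.sh
    2>&1 | tee {{ working_dir }}/overcloud_deploy.log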
IMHO CI is a very different topic than what James is talking about in this
thread and hopefully this won't interrupt the conversation further.

Thanks

[docs]
https://docs.openstack.org/tripleo-quickstart/latest/design.html#problem-help-make-the-deployment-steps-easier-to-understand



> Not saying it's right or wong, just that I believe that was the intent.
>
> An alternative would be custom ansible modules that exposed tasks for
> interfacing with our API directly. That would also be valuable, as
> that code path is mostly untested now outside of the UI and CLI.
>
> I think that tripleo-quickstart is a slightly different class of
> "thing" from the other current Ansible uses I mentioned, in that it
> sits at a layer above everything else. It's meant to automate TripleO
> itself vs TripleO automating things. Regardless, we should certainly
> consider how it fits into a larger plan.
>
> --
> -- James Slagle
> --
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] proposing Alex Schultz tripleo-core in all projects

2017-07-09 Thread Wesley Hayutin
+1

On Fri, Jul 7, 2017 at 5:25 PM, Pradeep Kilambi  wrote:

> +1 to Alex
>
> On Fri, Jul 7, 2017 at 4:20 PM, Juan Antonio Osorio 
> wrote:
> > +1
> >
> > He's a great reviewer
> >
> > On 7 Jul 2017 8:40 pm, "Emilien Macchi"  wrote:
> >>
> >> Alex has demonstrated high technical and community skills in TripleO -
> >> where he's already core on THT, instack-undercloud, and puppet-tripleo
> >> - but also very involved in other repos.
> >> I propose that we extend his core status to all TripleO projects and
> >> of course trust him (like we trust all core members) to review patches
> >> were we feel confortable with.
> >>
> >> He has shown an high interest in reviewed other TripleO projects and I
> >> think he would be ready for this change.
> >> As usual, this is an open proposal, any feedback is welcome.
> >>
> >> Thanks,
> >> --
> >> Emilien Macchi
> >>
> >> 
> __
> >> OpenStack Development Mailing List (not for usage questions)
> >> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:
> unsubscribe
> >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >
> >
> > 
> __
> > OpenStack Development Mailing List (not for usage questions)
> > Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:
> unsubscribe
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >
>
>
>
> --
> Cheers,
> ~ Prad
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo][ci] decreased coverage for telemetry

2017-07-12 Thread Wesley Hayutin
On Wed, Jul 12, 2017 at 10:33 AM, Pradeep Kilambi <p...@redhat.com> wrote:

> On Tue, Jul 11, 2017 at 10:06 PM, Wesley Hayutin <whayu...@redhat.com>
> wrote:
> >
> >
> > On Tue, Jul 11, 2017 at 9:04 PM, Emilien Macchi <emil...@redhat.com>
> wrote:
> >>
> >> On Tue, Jul 11, 2017 at 12:41 PM, Pradeep Kilambi <p...@redhat.com>
> wrote:
> >> > On Tue, Jul 11, 2017 at 3:17 PM, Wesley Hayutin <whayu...@redhat.com>
> >> > wrote:
> >> >> Greetings,
> >> >>
> >> >> I was looking through the mailing list and I did not see any emails
> >> >> explicitly calling out the decreased coverage for telemetry in
> tripleo
> >> >> due
> >> >> to [1].  A series of changes went into the CI system to disable
> >> >> telemetry
> >> >> [2].
> >> >>
> >> >> There is work being done to restore more coverage for telemetry by
> >> >> limiting
> >> >> the resources it consumes [3].  We are also working on additional
> >> >> scenarios
> >> >> in t-h-t/ci/environments/ to better cover ceilometer.
> >> >>
> >> >> If the CI environment you are working in has the resources to cover
> >> >> ceilometer that is great, however if you find issues like [1] we
> highly
> >> >> suggest you follow the same pattern until coverage is restored
> >> >> upstream.
> >> >>
> >> >> Thank you!
> >> >>
> >> >> [1] https://bugs.launchpad.net/tripleo/+bug/1693174
> >> >> [2] https://review.openstack.org/#/q/topic:bug/1680195
> >> >> [3]
> >> >> https://review.openstack.org/#/c/475838/
> >> >> https://review.openstack.org/#/c/474969/
> >> >> https://review.openstack.org/#/c/47/
> >> >>
> >> >>
> >> >
> >> > Thanks for starting this thread Wes. I concur with this. We got bitten
> >> > recently by many issues that we could have caught in ci had telemetry
> >> > been enabled. I spoke to trown and Emilien about this a few times
> >> > already. I do understand the resource footprint it causes.  But with
> >> > recent improvements and changes upstream, things should be back to
> >> > being more manageable. We do have telemetry tested in scenario001 job,
> >> > but that doesn't cover all scenarios. So there is a gap in coverage.
> >>
> >> What do you mean by gap in coverage?
> >> We have scenarios on purpose, so we can horizontally scale the
> >> coverage across multiple jobs and run the jobs only when we need (e.g.
> >> touching telemetry files for scenario001).
> >>
> >> Please elaborate on what isn't covered by scenario001, because we
> >> already cover Gnocchi, Panko, Aodh and Ceilometer (with RBD backend
> >> and soon with Swift backend in scenario002).
> >>
> >
> > Emilien,
> > Gap is the wrong word to use in the case.
> > Previously we had several jobs running with telemetry turned on including
> > ovb jobs in tripleo and other jobs outside of the upstream CI system.
> > The more jobs running, the more coverage.
> > I think that is what Pradeep was referring to, but maybe I am
> > misunderstanding this as well.
>
> Yea may be gap is not the right word. But mostly i meant what Wes
> said, but also I feel we are not testing Telemetry with full HA
> currently in CI. scenario jobs only test deploy with 1 controller not
> 3. We have seen some recent issues where things work on controller 0
> but controller 1 or 2 has statsd down for example. The ovb ha job
> would have shown us that, had the ovb ha job included telemetry
> enabled. Is it possible to run scenario001 job with full HA ?
>

Full HA is limited to ovb jobs atm and these jobs currently take longer to
run and are barely able to complete in the mandatory upstream timeout
period.
IMHO it's worth the time and effort to see if the performance improvements
currently being made to ceilometer will work properly with the OVB jobs, but
that is nothing I can guarantee atm.

Work is now starting on being able to deploy a full HA environment using
nodepool multinode jobs.  IMHO this is a better target.
I will keep you posted on the progress here.

Thank you Pradeep.


>
>
>
> >
> >
> >>
> >> >  I hope we can either re-enable these services by default in CI and
> >> > how things w

Re: [openstack-dev] [tripleo][ci] decreased coverage for telemetry

2017-07-11 Thread Wesley Hayutin
On Tue, Jul 11, 2017 at 9:04 PM, Emilien Macchi <emil...@redhat.com> wrote:

> On Tue, Jul 11, 2017 at 12:41 PM, Pradeep Kilambi <p...@redhat.com> wrote:
> > On Tue, Jul 11, 2017 at 3:17 PM, Wesley Hayutin <whayu...@redhat.com>
> wrote:
> >> Greetings,
> >>
> >> I was looking through the mailing list and I did not see any emails
> >> explicitly calling out the decreased coverage for telemetry in tripleo
> due
> >> to [1].  A series of changes went into the CI system to disable
> telemetry
> >> [2].
> >>
> >> There is work being done to restore more coverage for telemetry by
> limiting
> >> the resources it consumes [3].  We are also working on additional
> scenarios
> >> in t-h-t/ci/environments/ to better cover ceilometer.
> >>
> >> If the CI environment you are working in has the resources to cover
> >> ceilometer that is great, however if you find issues like [1] we highly
> >> suggest you follow the same pattern until coverage is restored upstream.
> >>
> >> Thank you!
> >>
> >> [1] https://bugs.launchpad.net/tripleo/+bug/1693174
> >> [2] https://review.openstack.org/#/q/topic:bug/1680195
> >> [3]
> >> https://review.openstack.org/#/c/475838/
> >> https://review.openstack.org/#/c/474969/
> >> https://review.openstack.org/#/c/47/
> >>
> >>
> >
> > Thanks for starting this thread Wes. I concur with this. We got bitten
> > recently by many issues that we could have caught in ci had telemetry
> > been enabled. I spoke to trown and Emilien about this a few times
> > already. I do understand the resource footprint it causes.  But with
> > recent improvements and changes upstream, things should be back to
> > being more manageable. We do have telemetry tested in scenario001 job,
> > but that doesn't cover all scenarios. So there is a gap in coverage.
>
> What do you mean by gap in coverage?
> We have scenarios on purpose, so we can horizontally scale the
> coverage across multiple jobs and run the jobs only when we need (e.g.
> touching telemetry files for scenario001).
>
> Please elaborate on what isn't covered by scenario001, because we
> already cover Gnocchi, Panko, Aodh and Ceilometer (with RBD backend
> and soon with Swift backend in scenario002).
>
>
Emilien,
Gap is the wrong word to use in this case.
Previously we had several jobs running with telemetry turned on including
ovb jobs in tripleo and other jobs outside of the upstream CI system.
The more jobs running, the more coverage.
I think that is what Pradeep was referring to, but maybe I am
misunderstanding this as well.



> >  I hope we can either re-enable these services by default in CI and
> > how things work or at least add a separate gate job to be able to test
> > HA scenario properly with telemetry enabled.
> >
> > --
> > Cheers,
> > ~ Prad
> >
> > 
> __
> > OpenStack Development Mailing List (not for usage questions)
> > Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:
> unsubscribe
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
>
> --
> Emilien Macchi
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [tripleo] CI Squad Meeting Summary (week 27)

2017-07-12 Thread Wesley Hayutin
Greetings,

Apologies for the delayed notes with regards to the last tripleo-ci-squad
meeting.

Highlights:

* Tempest
  * master/pike is down to 2 tempest failures [1] out of 1,337 tests
executed.
  * stable/ocata is a full pass on tempest
  * proposed tempest test to replace the tripleo ping test [3]
  * additional work is being closed out around notifications of
tempest failures and reporting.

* The periodic/promotion jobs for tripleo are migrating to rdo software
factory
  * multinode nodepool is nearly complete
  * ovb work is on going
  * containers promotion is moving to use multinode nodepool and is on-going
  * Jenkins is being removed from software factory
 * Options for reporting
* request for openstack-health to be deployed in software factory
* update [4] to report on rdo software factory jobs
  * Work has started in integrating the delorean-api for promotions

* Default network isolation is moving towards multi-nic across jobs [5]

That is about it, till next time.. be cool :)

[1]
http://logs.openstack.org/periodic/periodic-tripleo-ci-centos-7-ovb-nonha-tempest-oooq-master/85815d1/logs/oooq/stackviz/#/stdin
[2]
http://logs.openstack.org/periodic/periodic-tripleo-ci-centos-7-ovb-nonha-tempest-oooq-ocata/44dab99/logs/oooq/stackviz/#/stdin
[3] https://review.openstack.org/#/c/480429/
[4] http://cistatus.tripleo.org/#periodictab
[5]
https://review.openstack.org/#/q/status:open+branch:master+topic:libvirt-multi-nic


Notes:
https://etherpad.openstack.org/p/tripleo-ci-squad-meeting
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] index_footer in openstack logs

2017-07-10 Thread Wesley Hayutin
On Mon, Jul 10, 2017 at 2:24 AM, Andreas Jaeger <a...@suse.com> wrote:

> On 2017-07-10 05:03, Wesley Hayutin wrote:
> > posting to openstack-dev
> >
> > On Sun, Jul 9, 2017 at 10:08 AM, Wesley Hayutin <whayu...@redhat.com
> > <mailto:whayu...@redhat.com>> wrote:
> >
> > Greetings Andreas, Paul
> >
> > I'm looking for some pointers on how to include some instructions in
> > our openstack logs in the same way the devstack gate works [1] and
> > I'm not able to piece things together atm.
> >
> > I see the support for adding an index_footer was removed in [2], but
> > I don't see what it was replaced by.  I was hoping you guys could
> > point us in the right direction to enable embedding instructions
> > directly in our logs like [3]
> >
>
> Nope - we had *duplicate* macros, the support is still there. We removed
> one way of publishing that we never used.
>
> Do you know codesearch.openstack.org? Use it to find the code like
>
> http://codesearch.openstack.org/?q=Types%20of%20logs=nope==
>
>
> Andreas
>

Got it,
http://codesearch.openstack.org/?q=tempest-overview.html=nope==

I was not aware of codesearch.openstack

Thanks



>
> > Thank you for the help!
> >
> > [1] https://github.com/openstack-infra/devstack-gate/tree/
> master/help <https://github.com/openstack-infra/devstack-gate/tree/
> master/help>
> > [2] https://github.com/openstack-infra/project-config/commit/
> 183aabbeaf528f5ef637a7bb51245eea4fab94b8#diff-
> 03d414c17dcd54548b8810c4a442b655
> > <https://github.com/openstack-infra/project-config/commit/
> 183aabbeaf528f5ef637a7bb51245eea4fab94b8#diff-
> 03d414c17dcd54548b8810c4a442b655>
> > [3] http://logs.openstack.org/29/480429/1/check/gate-tempest-
> dsvm-neutron-full-ubuntu-xenial/76071e1/logs/
> > <http://logs.openstack.org/29/480429/1/check/gate-tempest-
> dsvm-neutron-full-ubuntu-xenial/76071e1/logs/>
> >
> >
>
>
> --
>  Andreas Jaeger aj@{suse.com,opensuse.org} Twitter: jaegerandi
>   SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
>GF: Felix Imendörffer, Jane Smithard, Graham Norton,
>HRB 21284 (AG Nürnberg)
> GPG fingerprint = 93A3 365E CE47 B889 DF7F  FED1 389A 563C C272 A126
>
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [tripleo] CI Squad Meeting Summary (week 29)

2017-07-25 Thread Wesley Hayutin
If the topics below interest you and you want to contribute to the
discussion, feel free to join the next meeting:

Time: Thursdays, 14:30-15:30 UTC
Place: https://bluejeans.com/4113567798/

Full minutes: https://etherpad.openstack.org/p/tripleo-ci-squad-meeting

= Demo =
* Tempest in gate jobs status
   *
https://trello.com/c/Z8jIillp/252-spec-out-the-work-required-to-run-tempest-on-the-tripleo-gating-jobs
   *
https://docs.google.com/presentation/d/1ZCWPV9RXHnW2z68CWOZB1jUluJ6GiKJU_IuPrF62x0E

= TripleO Periodic Jobs Migration to RDO software factory =

*
https://trello.com/c/FFaSvTrz/272-start-reporting-job-status-to-the-delorean-api
* https://review.rdoproject.org/etherpad/p/dlrn-api-pipelines
* success w/ ovb and multinode jobs
 * putting final pieces together

= multinode 3/4 nodes =
* https://review.openstack.org/#/q/topic:3nodes-oooq

Thanks all!




On Mon, Jul 17, 2017 at 7:37 AM, Attila Darazs  wrote:

> If the topics below interest you and you want to contribute to the
> discussion, feel free to join the next meeting:
>
> Time: Thursdays, 14:30-15:30 UTC
> Place: https://bluejeans.com/4113567798/
>
> Full minutes: https://etherpad.openstack.org/p/tripleo-ci-squad-meeting
>
> = Announcements =
>
> TripleO Cores who would like to +workflow changes on tripleo-quickstart,
> tripleo-quickstart-extras and tripleo-ci should attend the Squad meeting to
> gain the necessary overview for deciding when to submit changes to these
> repos. This was discussed by the repo specific cores over this meeting.
>
> In other news the https://thirdparty-logs.rdoproject.org/ logserver
> (hosted on OS1) migrated to https://thirdparty.logs.rdoproject.org/ (on
> RDO cloud).
>
> = Discussion topics =
>
> This week we had a more balanced agenda, with multiple small topics. Here
> they are:
>
> * John started working on the much requested 3 node multinode feature for
> Quickstart. Here's his WIP change[1]. This is necessary to test HA +
> containers on multinode jobs.
>
> * The OVB job transition is almost over complete. Sagi was cleaning up the
> last few tasks, replacing the gate-tripleo-ci-centos-7-ovb-nonha-puppet-*
> jobs of ceph and cinder to featureset024 which deploys ceph (former updates
> job) and gate-tripleo-ci-centos-7-ovb-nonha-convergence jobs which runs
> on experimental for Heat repo.
>
> * Gabriele made a nice solution to run periodic jobs on demand if
> necessary. The patch[2] is still not merged, but it looks promising.
>
> * Ronelle and Gabriele continues to work on the RDO cloud migration (both
> OVB and multinode). There are already some new and already exisitng jobs
> migrated there as a test.
>
> That's it for last week.
>
> Best regards,
> Attila
>
> [1] https://review.openstack.org/483078
> [2] https://review.openstack.org/478516
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [tripleo][ci] looking for doc reviews with regards to adding ooo check gates

2017-07-26 Thread Wesley Hayutin
Greetings,

I thought it may be helpful if we wrote some instructions for other
OpenStack projects on how to add a TripleO job to their project's check
gate in a non-voting or voting capacity.
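
The short version, for anyone who just wants the gist before reading the docs
patch: in openstack-infra/project-config's zuul/layout.yaml the project adds
the job to its check queue, along these lines (the project name below is only
an example; the -nv suffix is the convention that marks the job non-voting):

projects:
  - name: openstack/ceilometer
    check:
      - gate-tripleo-ci-centos-7-scenario001-multinode-oooq-puppet-nv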

I'm not 100% sure my steps are accurate, or whether I'm missing something
someone may want to add.
Asking for reviews on [1-2]

Thanks in advance!

[1]
http://docs-draft.openstack.org/98/487598/2/check/gate-tripleo-docs-docs-ubuntu-xenial/f6f66ba//doc/build/html/contributor/check_gates.html
[2] https://review.openstack.org/#/c/487598/
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo][ci] decreased coverage for telemetry

2017-07-11 Thread Wesley Hayutin
On Tue, Jul 11, 2017 at 3:41 PM, Pradeep Kilambi <p...@redhat.com> wrote:

> On Tue, Jul 11, 2017 at 3:17 PM, Wesley Hayutin <whayu...@redhat.com>
> wrote:
> > Greetings,
> >
> > I was looking through the mailing list and I did not see any emails
> > explicitly calling out the decreased coverage for telemetry in tripleo
> due
> > to [1].  A series of changes went into the CI system to disable telemetry
> > [2].
> >
> > There is work being done to restore more coverage for telemetry by
> limiting
> > the resources it consumes [3].  We are also working on additional
> scenarios
> > in t-h-t/ci/environments/ to better cover ceilometer.
> >
> > If the CI environment you are working in has the resources to cover
> > ceilometer that is great, however if you find issues like [1] we highly
> > suggest you follow the same pattern until coverage is restored upstream.
> >
> > Thank you!
> >
> > [1] https://bugs.launchpad.net/tripleo/+bug/1693174
> > [2] https://review.openstack.org/#/q/topic:bug/1680195
> > [3]
> > https://review.openstack.org/#/c/475838/
> > https://review.openstack.org/#/c/474969/
> > https://review.openstack.org/#/c/47/
> >
> >
>
> Thanks for starting this thread Wes. I concur with this. We got bitten
> recently by many issues that we could have caught in ci had telemetry
> been enabled. I spoke to trown and Emilien about this a few times
> already. I do understand the resource footprint it causes.  But with
> recent improvements and changes upstream, things should be back to
> being more manageable. We do have telemetry tested in scenario001 job,
> but that doesn't cover all scenarios. So there is a gap in coverage.
>
>  I hope we can either re-enable these services by default in CI and
> how things work or at least add a separate gate job to be able to test
> HA scenario properly with telemetry enabled.
>
> --
> Cheers,
> ~ Prad
>

While Prad and I were having the conversation, I raised the point that the
tripleo community may be more willing to turn on more coverage for ceilometer
if the gate-tripleo-ci-centos-7-scenario001-multinode-oooq-puppet-nv job that
runs on ceilometer changes was moved from non-voting to voting.

Note, we are trying to get more and more projects to run tripleo based jobs
in their check gates generally.

Thanks
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo][ci] Upgrade CI job for O->P (containerization)

2017-05-10 Thread Wesley Hayutin
On Wed, May 10, 2017 at 9:26 AM, Jiří Stránský  wrote:

> Hi all,
>
> the upgrade job which tests Ocata -> Pike/master upgrade (from bare-metal
> to containers) just got a green flag from the CI [1].
>
> I've listed the remaining patches we need to land at the very top of the
> container CI etherpad [2], please let's get them reviewed and landed as
> soon as we can. The sooner we get the job going, the fewer upgrade
> regressions will get merged in the meantime (e.g. we have one from last
> week).
>
> The CI job utilizes mixed release deployment (master undercloud, overcloud
> deployed as Ocata and upgraded to latest). It tests the main overcloud
> upgrade phase (no separate compute role upgrades, no converge phase). This
> means the testing isn't exhaustive to the full expected "production
> scenario", but it covers the most important part where we're likely to see
> the most churn and potential breakages. We'll see how much spare wall time
> we have to add more things once we get the job to run on patches regularly.
>
>
> Thanks and have a good day!
>
> Jirka
>
> [1] http://logs.openstack.org/61/460061/15/experimental/gate-tri
> pleo-ci-centos-7-containers-multinode-upgrades-nv/d7faa50/
> [2] https://etherpad.openstack.org/p/tripleo-containers-ci


Really nice work Jirka
Thank you

>
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [tripleo][ci] tripleo periodic jobs moving to RDO's software factory and RDO Cloud

2017-06-12 Thread Wesley Hayutin
Greetings,

I wanted to send out a summary email regarding some work that is still
developing and being planned, to give interested parties time to comment and
prepare for the change.

Project:
Move tripleo periodic promotion jobs

Goal:
Increase the cadence of tripleo-ci periodic promotion jobs in a way
that does not impact upstream OpenStack zuul queues and infrastructure.

Next Steps:
The dependencies in RDO's instance of software factory are now complete
and we should be able to create a net-new zuul queue in RDO infra for
tripleo-periodic jobs.  These jobs will have to run both multinode nodepool
and ovb style jobs and utilize RDO-Cloud as the host cloud provider.  The
TripleO CI team is looking into moving the TripleO periodic jobs running
upstream to run from RDO's software factory instance. This move will allow
the CI team more flexibility in managing the periodic jobs and resources to
run the jobs more frequently.

TLDR:
There is no set date as to when the periodic jobs will move. The move
will depend on tenant resource allocation and how easily the periodic jobs
can be modified.  This email is to inform the group that changes are being
planned to the tripleo periodic workflow and allow time for comment and
preparation.

Completed Background Work:
We had a long discussion with Paul Belanger about increasing the cadence
of the promotion jobs [1]. Paul explained infra's position: if he doesn't
-1/-2 a new pipeline that has the same priority as check jobs, someone else
will. To summarize the point, the new pipeline would compete with and slow
down non-tripleo projects in the gate even when the hardware resources are
our own.
To avoid slowing down non-tripleo projects, Paul has volunteered to help set
up the infrastructure in rdoproject to manage the queue (zuul, etc.). We
would still use rh-openstack-1 / rdocloud for ovb, and could also trigger
multinode nodepool jobs.
There is one hitch though: currently, rdo-project does not have all the
pieces of the puzzle in place to move off of openstack zuul and onto
rdoproject zuul. Paul mentioned that nodepool-builder [2] is a hard
requirement to be set up in rdoproject before we can proceed here. He
mentioned working with the software factory guys to get this set up and
running.
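
For context, nodepool-builder is the piece that builds the images nodepool
launches for jobs; the configuration it consumes is a diskimage definition
roughly like the sketch below (purely illustrative, the element names and
values here are assumptions, not what rdoproject will actually end up using):

diskimages:
  - name: centos-7
    elements:
      - centos-minimal
      - vm
      - nodepool-base
    release: 7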
At this time, I think this issue is blocked until further discussion.
[1] https://review.openstack.org/#/c/443964/
[2]
https://github.com/openstack-infra/nodepool/blob/master/nodepool/builder.py

Thanks
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo][ci] where to find the CI backlog and issues we're tracking

2017-06-21 Thread Wesley Hayutin
On Tue, Jun 20, 2017 at 2:51 PM, Emilien Macchi <emil...@redhat.com> wrote:

> On Tue, Jun 20, 2017 at 12:49 PM, Wesley Hayutin <whayu...@redhat.com>
> wrote:
> > Greetings,
> >
> > It's become apparent that everyone in the tripleo community may not be
> aware
> > of where CI specific work is tracked.
> >
> > To find out which CI related features or bug fixes are in progress or to
> see
> > the backlog please consult [1].
> >
> > To find out what issues have been found in OpenStack via CI please
> consult
> > [2].
> >
> > Thanks!
>
> Thanks Wes for these informations. I was about to start adding more
> links and informations when I realized monitoring TripleO CI might
> deserve a little bit of training and documentation.
> I'll take some time this week to create a new section in TripleO docs
> with useful informations that we can easily share with our community
> so everyone can learn how to be aware about CI status.
>
>
Emilien,
That's a really good point, we should have this information in the docs.
You are a busy guy; we'll take care of that.

Thanks for the input!


>
> >
> > [1] https://trello.com/b/U1ITy0cu/tripleo-ci-squad
> > [2] https://trello.com/b/WXJTwsuU/tripleo-and-rdo-ci-status
> >
> > 
> __
> > OpenStack Development Mailing List (not for usage questions)
> > Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:
> unsubscribe
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >
>
>
>
> --
> Emilien Macchi
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [tripleo][ci] where to find the CI backlog and issues we're tracking

2017-06-20 Thread Wesley Hayutin
Greetings,

It's become apparent that everyone in the tripleo community may not be
aware of where CI specific work is tracked.

To find out which CI related features or bug fixes are in progress or to
see the backlog please consult [1].

To find out what issues have been found in OpenStack via CI please consult
[2].

Thanks!


[1] https://trello.com/b/U1ITy0cu/tripleo-ci-squad
[2] https://trello.com/b/WXJTwsuU/tripleo-and-rdo-ci-status
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] rh1 issues post-mortem

2017-06-02 Thread Wesley Hayutin
On Fri, Jun 2, 2017 at 4:42 PM, Ben Nemec  wrote:

>
>
> On 03/28/2017 05:01 PM, Ben Nemec wrote:
>
>> Final (hopefully) update:
>>
>> All active compute nodes have been rebooted and things seem to be stable
>> again.  Jobs are even running a little faster, so I'm thinking this had
>> a detrimental effect on performance too.  I've set a reminder for about
>> two months from now to reboot again if we're still using this environment.
>>
>
> The reminder popped up this week, and I've rebooted all the compute nodes
> again.  It went pretty smoothly so I doubt anyone noticed that it happened
> (except that I forgot to restart the zuul-status webapp), but if you run
> across any problems let me know.


Thanks Ben! http://zuul-status.tripleo.org/ is awesome, I missed it.


>
>
>
>> On 03/24/2017 12:48 PM, Ben Nemec wrote:
>>
>>> To follow-up on this, we've continued to hit this issue on other compute
>>> nodes.  Not surprising, of course.  They've all been up for about the
>>> same period of time and have had largely even workloads.
>>>
>>> It has caused problems though because it is cropping up faster than I
>>> can respond (it takes a few hours to cycle all the instances off a
>>> compute node, and I need to sleep sometime :-), so I've started
>>> pre-emptively rebooting compute nodes to get ahead of it.  Hopefully
>>> I'll be able to get all of the potentially broken nodes at least
>>> disabled by the end of the day so we'll have another 3 months before we
>>> have to worry about this again.
>>>
>>> On 03/24/2017 11:47 AM, Derek Higgins wrote:
>>>
 On 22 March 2017 at 22:36, Ben Nemec  wrote:

> Hi all (owl?),
>
> You may have missed it in all the ci excitement the past couple of
> days, but
> we had a partial outage of rh1 last night.  It turns out the OVS port
> issue
> Derek discussed in
> http://lists.openstack.org/pipermail/openstack-dev/2016-Dece
> mber/109182.html
>
>
> reared its ugly head on a few of our compute nodes, which caused them
> to be
> unable to spawn new instances.  They kept getting scheduled since it
> looked
> like they were underutilized, which caused most of our testenvs to
> fail.
>
> I've rebooted the affected nodes, as well as a few more that looked
> like
> they might run into the same problem in the near future.  Everything
> looks
> to be working well again since sometime this morning (when I disabled
> the
> broken compute nodes), but there aren't many jobs passing due to the
> plethora of other issues we're hitting in ci.  There have been some
> stable
> job passes though so I believe things are working again.
>
> As far as preventing this in the future, the right thing to do would
> probably be to move to a later release of OpenStack (either point or
> major)
> where hopefully this problem would be fixed.  However, I'm hesitant
> to do
> that for a few reasons.  First is "the devil you know". Outside of this
> issue, we've gotten rh1 pretty rock solid lately.  It's been
> overworked, but
> has been cranking away for months with no major cloud-related outages.
> Second is that an upgrade would be a major process, probably
> involving some
> amount of downtime.  Since the long-term plan is to move everything
> to RDO
> cloud I'm not sure that's the best use of our time at this point.
>

 +1 on keeping the status quo until moving to rdo-cloud.


> Instead, my plan for the near term is to keep a closer eye on the error
> notifications from the services.  We previously haven't had anything
> consuming those, but I've dropped a little tool on the controller
> that will
> dump out error notifications so we can watch for signs of this
> happening
> again.  I suspect the signs were there long before the actual breakage
> happened, but nobody was looking for them.  Now I will be.
>
> So that's where things stand with rh1.  Any comments or concerns
> welcome.
>
> Thanks.
>
> -Ben
>
> 
> __
>
>
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe:
> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>

 
 __


 OpenStack Development Mailing List (not for usage questions)
 Unsubscribe:
 openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


>>> 
>>> __
>>>
>>> OpenStack Development Mailing List (not for usage questions)

Re: [openstack-dev] [tripleo] rdo and tripleo container builds and CI

2017-06-02 Thread Wesley Hayutin
On Fri, Jun 2, 2017 at 11:42 AM, Attila Darazs  wrote:

> If the topics below interest you and you want to contribute to the
> discussion, feel free to join the next meeting:
>
> Time: Thursdays, 14:30-15:30 UTC
> Place: https://bluejeans.com/4113567798/
>
> Full minutes: https://etherpad.openstack.org/p/tripleo-ci-squad-meeting
>
> = CI Promotion problems =
>
> The last promoted DLRN hash is from 21st of May, so now it's 12 day old.
> This is mostly due to not being able to thoroughly gate everything that
> consists of TripleO and we're right in the middle of the cycle where most
> work happens and a lot of code gets merged into every project.
>
> However we should still try our best to improve the situation. If you're
> in any position to help solve our blocker problems (the bugs are announced
> on #tripleo regularly), please lend a hand!
>
> = Smaller topics =
>
> * We also had a couple of issues due to trying to bump Ansible from 2.2 to
> version 2.3 in Quickstart. This uncovered a couple of gaps in our gating,
> and we decided to revert until we fix them.
>
> * We're on track with transitioning some OVB jobs to RDO Cloud, now we
> need to create our infrastructure there and add the cloud definition to
> openstack-infra/project-config.
>
> * We have RDO containers built on the CentOS CI system[1]. We should
> eventually integrate them into the promotion pipeline. Maybe use them as
> the basis for upstream CI runs eventually?
>

Thanks for sending this out, Attila.

So after some discussion with David and others I wanted to spell out a bit
of a nuance that may cause this to take a little bit more time and effort.

The original plan was to build and test containers as part of the rdo
master pipeline [1].  We were on track to complete this work in the next
couple days.
However, what we realized was that rdo has to feed the tripleo container
builds used for tripleo promotions, and tripleo promotions are always done
on a random new delorean hash.
There is no way to determine which hash tripleo will pick up, and therefore
no way to ensure the containers and rpms are at the exact same versions.
It's critical that rpms and containers are built using the exact same repos
afaik.

It is also good form and upstream policy for the tools, jobs and build
artifacts to be created upstream.

So the new plan is to build and test containers in the tripleo periodic
jobs that are used for the tripleo promotions.
When the containers pass a build they will be uploaded to the container
registry in rdo with a tag, e.g. current-tripleo.
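
Roughly, the knobs I have in mind for the periodic job look something like
the following (variable names and the registry are illustrative, not the
final implementation):

    # Illustrative sketch only: the containers are built against the same
    # delorean hash the periodic job is testing, and the floating tag is
    # only moved once that hash passes the promotion criteria.
    dlrn_hash: "{{ candidate_hash }}"                  # hash under test this run
    container_registry: trunk.registry.rdoproject.org  # assumed registry name
    container_push_tags:
      - "{{ dlrn_hash }}"     # immutable tag, traceable back to the exact rpms
      - current-tripleo       # floating tag, moved only after the jobs pass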

The main point of this email is to level-set expectations: it will take a
little more time to get this done upstream.
I am very open to hearing suggestions, comments and critques of the new
high level plan.

Thank you!

[1]
https://ci.centos.org/view/rdo/view/promotion-pipeline/job/rdo_trunk-promote-master-current-tripleo/



>
> * Our periodic tempest jobs are getting good results on both Ocata and
> Master, Arx keeps ironing out the remaining failures. See the current
> status here: [2].
>
> * The featureset discussion is coming to an end, we have a good idea how
> what should go in which config files, now the cores should document that to
> help contributors make the right calls when creating new config files or
> modifying existing ones.
>
> Thank you for reading the summary. Have a great weekend!
>
> Best regards,
> Attila
>
> [1] https://ci.centos.org/job/rdo-tripleo-containers-build/
> [2] http://status.openstack.org/openstack-health/#/g/project/ope
> nstack-infra~2Ftripleo-ci?searchJob=
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [tripleo] RFC, be more prescriptive in our log collection in CI

2017-10-12 Thread Wesley Hayutin
Greetings,

The upstream log server is a bit overloaded; I think we can help their
cause.
Please review and add comments to this bug [1]

Yesterday the following two related patches landed [2], [3]
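
For anyone not familiar with how collection is controlled, the collect-logs
role is driven by explicit include/exclude lists, so being prescriptive
mostly means trimming those.  A minimal sketch, assuming the usual artcl_*
variables (the values below are only an illustration, not the proposed
lists):

    artcl_collect_list:
      - /var/log/
      - /etc/
      - /home/*/*.log
    artcl_exclude_list:
      - /var/log/journal          # large binary journals, rarely needed
      - /etc/selinux/targeted     # huge and almost never consulted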

Thank you!


[1] https://bugs.launchpad.net/tripleo/+bug/1723182
[2] https://review.openstack.org/#/c/511349/
[3] https://review.openstack.org/#/c/511347/
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] plans on testing minor updates?

2017-09-28 Thread Wesley Hayutin
On Thu, Sep 28, 2017 at 3:23 AM, Steven Hardy  wrote:

> On Thu, Sep 28, 2017 at 8:04 AM, Marios Andreou 
> wrote:
> >
> >
> > On Thu, Sep 28, 2017 at 9:50 AM, mathieu bultel 
> wrote:
> >>
> >> Hi,
> >>
> >>
> >> On 09/28/2017 05:05 AM, Emilien Macchi wrote:
> >> > I was reviewing https://review.openstack.org/#/c/487496/ and
> >> > https://review.openstack.org/#/c/487488/ when I realized that we
> still
> >> > didn't have any test coverage for minor updates.
> >> > We never had this coverage AFICT but this is not a reason to not push
> >> > forward it.
> >> Thank you for the review and the -2! :)
> >> So I'm agree with you, we need CI coverage for that part, and I was
> >> wondering how I can put quickly a test in CI for the minor update.
> >> But before that, just few things to take in account regarding those
> >> reviews:
> >>
> >
> > agree on the need for the ci coverage, but disagree on blocking this. by
> the
> > same logic we should not have landed anything minor update related during
> > the previous cycle. This is the very last part for
> > https://bugs.launchpad.net/tripleo/+bug/1715557 - wiring up the
> mechanism
> > into client and what's more matbu has managed to do it 'properly' with a
> > tripleo-common mistral action wired up to the tripleoclient cli.
> >
> > I don't think its right we don't have coverage but I also don't think its
> > right to block these last patches,
>
> Yeah I agree - FWIW we have discussed this before, and AIUI the plan was:
>
> 1 - Get multinode coverage of an HA deployment with more than on
> controller (e.g the 3nodes job) but with containers enabled
> 2- Implement a rolling minor update test based on that
> multi-controller HA-with-containers test
>
> AFAIK we're only starting to get containers+pacemaker CI scenarios
> working with one controller, so it's not really reasonable to block
> this, since that is a prerequisite to the multi-controller test, which
> is a prerequisite to the rolling update test.
>
> Personally I think we'd be best to aim directly for the rolling update
> test in CI, as doing a single node minor update doesn't really test
> the most important aspect (e.g zero downtime).
>
> The other challenge here is the walltime relative to the CI timeout -
> we've been running into that for the containers upgrade job, and I
> think we need to figure out optimizations there which may also be
> required for minor update testing (maybe we can work around that by
> only updating a very small number of containers, but that will reduce
> the test coverage considerably?)
>

OK, I think the solution is to start migrating these jobs to RDO Software
Factory third-party testing.

Here is what I propose:
1. Start with an experiment check job
https://review.rdoproject.org/r/#/c/9823/
This will help us confirm that everything works or fails as we expect.  We
are
also afforded a configurable timeout \o/. It's currently set to 360 minutes
for the overcloud upgrade jobs.

2. Once this is proven out, we can run upgrade jobs as third party on any
review upstream

3. New coverage should be prototyped in RDO Software Factory

4. If jobs prove to be reliable and consistent and run under 170 minutes,
we move what we can back upstream.

WDYT?
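
For anyone curious what the timeout knob looks like, the rdo software
factory jobs are jenkins-job-builder templates, so it is just a wrapper on
the job.  A rough sketch (the template name and scenario are made up; see
jobs/tripleo-upstream.yml for the real definitions):

    - job-template:
        name: 'gate-tripleo-ci-centos-7-{scenario}-upgrades-rdo'
        wrappers:
          - timestamps
          - timeout:
              timeout: 360    # minutes - the headroom we do not have upstream
              type: absolute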


>
> I completely agree we need this coverage, and honestly we should have
> had it a long time ago, but we need to make progress on this last
> critical blocker for pike, while continuing to make progress on the CI
> coverage (which should certainly be a top priority for the Lifecycle
> squad, as soon as we have this completely new-for-pike minor updates
> workflow fully implemented and debugged).
>
> Thanks,
>
> Steve
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] plans on testing minor updates?

2017-09-28 Thread Wesley Hayutin
On Thu, Sep 28, 2017 at 12:32 PM, Emilien Macchi <emil...@redhat.com> wrote:

> On Thu, Sep 28, 2017 at 9:22 AM, Wesley Hayutin <whayu...@redhat.com>
> wrote:
> [...]
> > OK.. I think the solution is to start migrating these jobs to RDO
> Software
> > Factory third party testing.
> >
> > Here is what I propose:
> > 1. Start with an experiment check job
> > https://review.rdoproject.org/r/#/c/9823/
> > This will help us confirm that everything works or fails as we expect.
> We
> > are
> > also afforded a configurable timeout \0/. It's currently set to 360
> minutes
> > for the overcloud upgrade jobs.
> >
> > 2. Once this is proven out, we can run upgrade jobs as third party on any
> > review upstream
> >
> > 3. New coverage should be prototyped in RDO Software Factory
> >
> > 4. If jobs prove to be reliable and consistent and run under 170 minutes
> we
> > move what
> > we can back upstream.
> >
> > WDYT?
>
> I think this is mega cool, although your work is related to *Upgrades*
> and not minor updates but still super cool.
>
> Note: FTR we discussed on IRC that we would probably do the same kind
> of thing for minor updates testing.
>
> Thanks Wes,
> --
> Emilien Macchi
>

Right, I'm going to first attempt to get what we *have* running, and then
get the new jobs
we need in there as well. :))


>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [tripleo] add mistral to the auto-update package list for TripleO CI

2017-08-24 Thread Wesley Hayutin
Greetings,

I'd like to propose that the mistral project be added to the list of
projects where in CI the very latest built packages are added to each CI
run [1].
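
For context, [1] points at a delorean "current" repo definition that is
filtered by an includepkgs list; from memory it looks roughly like the block
below (the exact layout in master.yml may differ), and the proposal amounts
to extending that filter:

    repos:
      - type: generic
        reponame: delorean-current
        baseurl: https://trunk.rdoproject.org/centos7-master/current/
        includepkgs:
          - diskimage-builder
          - openstack-tripleo-*
          - python-tripleoclient
          - openstack-mistral*     # proposed addition
          - python*mistral*        # proposed addition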

This will help get patches that depend on mistral patches to more quickly
be tested and merged.  For example Honza's patch [2] depends on a merged
mistral change.  The mistral change has not yet landed in a tripleo build
and mistral is not on the auto-update list, so the patch fails.

Please respond if you would like to see mistral added or have any comments
or concerns.

Note that we are able to consider mistral for auto-updates because the
mistral project has a voting tripleo job [3] and the tripleo project can be
assured that the latest mistral patches will not break tripleo-ci.

I would encourage other projects to consider adding tripleo jobs to their
project to enable auto-updates as well [4].

[1]
https://github.com/openstack/tripleo-quickstart/blob/master/config/release/tripleo-ci/master.yml#L54-L70
[2] https://review.openstack.org/#/c/469608/
[3]
https://github.com/openstack-infra/project-config/blob/master/zuul/layout.yaml#L11665
[4]
https://docs.openstack.org/tripleo-docs/latest/contributor/check_gates.html
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] CI design session at the PTG

2017-08-28 Thread Wesley Hayutin
On Mon, Aug 28, 2017 at 10:19 AM, Paul Belanger 
wrote:

> On Mon, Aug 28, 2017 at 09:42:45AM -0400, David Moreau Simard wrote:
> > Hi,
> >
> > (cc whom I would at least like to attend)
> >
> > The PTG would be a great opportunity to talk about CI design/layout
> > and how we see things moving forward in TripleO with Zuul v3, upstream
> > and in review.rdoproject.org.
> >
> > Can we have a formal session on this scheduled somewhere ?
> >
> Wednesday onwards likely is best for me, otherwise, I can find time during
> Mon-Tues if that is better.
>
>
+1 from me, I'm sure John, Sagi, and Arx are also interested.

Thanks
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [tripleo] containerized undercloud update

2017-11-30 Thread Wesley Hayutin
Greetings,

Just wanted to share some progress with the containerized undercloud work.
Ian pushed some of the patches along and we now have a successful
undercloud install with containers.

The initial undercloud install works [1]
The idempotency check failed where we reinstall the undercloud [2]

Question: Do we expect the reinstallation to work at this point? Should the
check be turned off?

I will try it w/o the idempotency check; I suspect I will run into errors
in a full run with an overcloud deployment.  I ran into issues weeks ago.
I suspect if we do hit something it will be CI related, as Dan Prince has
been deploying the overcloud for a while now.  Dan, I may need to review
your latest doit.sh scripts to check for diffs in the CI.

Thanks


[1]
http://logs.openstack.org/18/518118/6/check/tripleo-ci-centos-7-undercloud-oooq/73115d6/logs/undercloud/home/zuul/undercloud_install.log.txt.gz
[2]
http://logs.openstack.org/18/518118/6/check/tripleo-ci-centos-7-undercloud-oooq/73115d6/logs/undercloud/home/zuul/undercloud_reinstall.log.txt.gz#_2017-11-30_19_51_26
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] Proposing Ronelle Landy for Tripleo-Quickstart/Extras/CI core

2017-11-29 Thread Wesley Hayutin
+1 woot

On Wed, Nov 29, 2017 at 2:44 PM, Alex Schultz  wrote:

> +1
>
> On Wed, Nov 29, 2017 at 12:34 PM, John Trowbridge 
> wrote:
> > I would like to propose Ronelle be given +2 for the above repos. She has
> > been a solid contributor to tripleo-quickstart and extras almost since
> the
> > beginning. She has solid review numbers, but more importantly has always
> > done quality reviews. She also has been working in the very intense rover
> > role on the CI squad in the past CI sprint, and has done very well in
> that
> > role.
> >
> > 
> __
> > OpenStack Development Mailing List (not for usage questions)
> > Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:
> unsubscribe
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] containerized undercloud in Queens

2017-11-10 Thread Wesley Hayutin
On Wed, Nov 8, 2017 at 5:10 PM, Wesley Hayutin <whayu...@redhat.com> wrote:

>
>
> On Wed, Nov 8, 2017 at 5:00 PM, Alex Schultz <aschu...@redhat.com> wrote:
>
>> On Tue, Nov 7, 2017 at 2:59 PM, Emilien Macchi <emil...@redhat.com>
>> wrote:
>> > On Wed, Nov 8, 2017 at 3:30 AM, James Slagle <james.sla...@gmail.com>
>> wrote:
>> >> On Sun, Nov 5, 2017 at 7:01 PM, Emilien Macchi <emil...@redhat.com>
>> wrote:
>> >>> On Mon, Oct 2, 2017 at 5:02 AM, Dan Prince <dpri...@redhat.com>
>> wrote:
>> >>> [...]
>> >>>
>> >>>>  -CI resources: better use of CI resources. At the PTG we received
>> >>>> feedback from the OpenStack infrastructure team that our upstream CI
>> >>>> resource usage is quite high at times (even as high as 50% of the
>> >>>> total). Because of the shared framework and single node capabilities
>> we
>> >>>> can re-architecture much of our upstream CI matrix around single
>> node.
>> >>>> We no longer require multinode jobs to be able to test many of the
>> >>>> services in tripleo-heat-templates... we can just use a single cloud
>> VM
>> >>>> instead. We'll still want multinode undercloud -> overcloud jobs for
>> >>>> testing things like HA and baremetal provisioning. But we can cover a
>> >>>> large set of the services (in particular many of the new scenario
>> jobs
>> >>>> we added in Pike) with single node CI test runs in much less time.
>> >>>
>> >>> After the last (terrible) weeks in CI, it's pretty clear we need to
>> >>> find a solution to reduce and optimize our testing.
>> >>> I'm now really convinced by switching our current scenarios jobs to
>> >>> NOT deploy the overcloud, and just an undercloud with composable
>> >>> services & run tempest.
>> >>
>> >> +1 if you mean just the scenarios.
>> >
>> > Yes, just scenarios.
>> >
>> >> I think we need to keep at least 1 multinode job voting that deploys
>> >> the overcloud, probably containers-multinode.
>> >
>> > Yes, exactly, and also work on optimizing OVB jobs (maybe just keep
>> > one or 2 jobs, instead 3).
>> >
>> >>> Benefits:
>> >>> - deploy 1 node instead of 2 nodes, so we save nodepool resources
>> >>> - faster (no overcloud)
>> >>> - reduce gate queue time, faster development process, faster CI
>> >>>
>> >>> Challenges:
>> >>> - keep overcloud testing, with OVB
>> >>
>> >> This is why I'm not sure what you're proposing. Do you mean switch all
>> >> multinode jobs to be just an undercloud, or just the scenarios?
>> >
>> > Keep 1 or 2 OVB jobs, to test ironic + mistral + HA (HA could be
>> > tested with multinode though but well).
>> >
>> >>> - reduce OVB to strict minimum: Ironic, Nova, Mistral and basic
>> >>> containerized services on overcloud.
>> >>>
>> >>> I really want to get consensus on these points, please raise your
>> >>> voice now before we engage some work on that front.
>> >>
>> >> I'm fine to optimize the scenarios to be undercloud driven, but feel
>> >> we still need a multinode job that deploys the overcloud in the gate.
>> >> Otherwise, we'll have nothing that deploys an overcloud in the gate,
>> >> which is a step in the wrong direction imo. Primarily, b/c of the loss
>> >> of coverage around mistral and all of our workflows. Perhaps down the
>> >> road we could find ways to optimize that by using an ephemeral Mistral
>> >> (similar to the ephemeral Heat container), and then use a single node,
>> >> but we're not there yet.
>> >>
>> >> On the other hand, if the goal is just to test less upstream so that
>> >> we can more quickly merge code, then *not* deploying an overcloud in
>> >> the gate at all seems to fit that goal. Is that what you're after?
>> >
>> > Yes. Thanks for reformulate with better words.
>> > Just to be clear, I want to transform the scenarios into single-node
>> > jobs that deploy the SAME services (using composable services) from
>> > the undercloud, using the new ansible installer. I also want to keep
>> > running Tempest.
>> > And of course, like we said, keep one mult

[openstack-dev] [tripleo][ci] log collection in upstream jobs

2017-11-10 Thread Wesley Hayutin
Greetings,

Infra asked the tripleo team to cut down on the logs we're producing
upstream.  We are using a lot of space on their servers and also it's
taking too long to collect the logs themselves.

We need to compromise and be flexible here, so I'd like the tripleo-ci and
tripleo core to take another pass at this review w/ fresh eyes.  I would
ask that anything that would justify a -2 be called out and worked on in
this thread.  Please be specific; I don't want to hear that we need all the
logs to do our job, as that is not possible.  Thanks all!

https://review.openstack.org/#/c/511526/
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] Please do not approve or recheck anything not related to CI alert bugs

2017-11-13 Thread Wesley Hayutin
On Sat, Nov 11, 2017 at 10:47 PM, Alex Schultz  wrote:

> Ok so here's the current status of things.  I've gone through some of
> the pending patches and sent them to the gate over the weekend since
> the gate was empty (yay!).  We've managed to land a bunch of patches.
> That being said for any patch for master with scenario jobs, please do
> not recheck/approve. Currently the non-containerized scenario001/004
> jobs are broken due to Bug 1731688[0] (these run on
> tripleo-quickstart-extras/tripleo-ci).  There is a patch[1] out for a
> revert of the breaking change. The scenario001-container job is super
> flaky due to Bug 1731063[2] and we could use some help figuring out
> what's going on.  We're also seeing some issues around heat
> interactions[3][4] but those seems to be less of a problem than the
> previously mentioned bugs.
>
> So at the moment any changes that don't have scenario jobs associated
> with them may be approved/rechecked freely.  We can discuss on Monday
> what to do about the scenario jobs if we still are running into issues
> without a solution in sight.  Also please keep an eye on the gate
> queue[5] and don't approve things if it starts getting excessively
> long.
>
> Thanks,
> -Alex
>
>
> [0] https://bugs.launchpad.net/tripleo/+bug/1731688
> [1] https://review.openstack.org/#/c/519041/
> [2] https://bugs.launchpad.net/tripleo/+bug/1731063
> [3] https://bugs.launchpad.net/tripleo/+bug/1731032
> [4] https://bugs.launchpad.net/tripleo/+bug/1731540
> [5] http://zuulv3.openstack.org/
>
> On Wed, Nov 8, 2017 at 3:39 PM, Alex Schultz  wrote:
> > So we have some good news and some bad news.  The good news is that
> > we've managed to get the gate queue[0] under control since we've held
> > off on pushing new things to the gate.  The bad news is that we've
> > still got some random failures occurring during the deployment of
> > master.  Since we're not seeing infra related issues, we should be OK
> > to merge things to stable/* branches.  Unfortunately until we resolve
> > the issues in master[1] we could potentially backup the queue.  Please
> > do not merge things that are not critical bugs.  I would ask that
> > folks please take a look at the open bugs and help figure out what is
> > going wrong. I've created two issues today that I've seen in the gate
> > that we don't appear to have open patches for. One appears to be an
> > issue in the heat deployment process[3] and the other is related to
> > the tempest verification of being able to launch a VM & ssh to it[4].
> >
> > Thanks,
> > -Alex
> >
> > [3] https://bugs.launchpad.net/tripleo/+bug/1731032
> > [4] https://bugs.launchpad.net/tripleo/+bug/1731063
> >
> > On Tue, Nov 7, 2017 at 8:33 AM, Alex Schultz 
> wrote:
> >> Hey Folks
> >>
> >> So we're at 24+ hours again in the gate[0] and the queue only
> >> continues to grow. We currently have 6 ci/alert bugs[1]. Please do not
> >> approve of recheck anything that isn't related to these bugs.  I will
> >> most likely need to go through the queue and abandon everything to
> >> clear it up as we are consistently hitting timeouts on various jobs
> >> which is preventing anything from merging.
> >>
> >> Thanks,
> >> -Alex
> >>
> > [0] http://zuulv3.openstack.org/
> > [1] https://bugs.launchpad.net/tripleo/+bugs?field.searchtext==-
> importance%3Alist=NEW%
> 3Alist=CONFIRMED%3Alist=TRIAGED%
> 3Alist=INPROGRESS%3Alist=CRITICAL&
> assignee_option=any=_reporter=&
> field.bug_commenter==_
> subscriber==ci+alert_combinator=
> ALL_cve.used=_dupes.used=_
> dupes=on_me.used=_patch.used=&
> field.has_branches.used=_branches=on
> has_no_branches.used=_no_branches=on_
> blueprints.used=_blueprints=on_no_
> blueprints.used=_no_blueprints=on=Search
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>


Thanks for continuing to push on this Alex!
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] New hash for master promoted

2017-11-03 Thread Wesley Hayutin
On Fri, Nov 3, 2017 at 5:52 AM, Gabriele Cerami  wrote:

> Hi,
>
> since the failures in Tuesday's tests were caused by infra problems, we
> believed a specific hash involved in the failures to remain the best
> candidate for a promotion.
> So we pinned it in our promotion pipeline and insisted on it for some
> hours. It has passed all the tests twice, so we decided to promote.
> You will see the promotion date set to today, but the hash
>
> 3b718f3fecc866332ec0663fa77e758f8346ab93_4204ba89
>
> is actually from Tuesday.
>
> Thanks.
>
>
First off, thanks to the TripleO team for helping to resolve so many issues [1].
I would especially like to thank Alex, Emilien and Gabriele for the VERY
LONG hours spent closing out issues.

Secondly, I'd like to warn folks that some jobs started prior to the
promotion and are still using rpms from September.
This is causing failures in tempest [2].  The jobs running with the
recently promoted yum repos [3] are passing.
We are monitoring the queue for additional errors.

Thanks


[1] https://etherpad.openstack.org/p/tripleo-promotion-blockers-october-2017
[2]
http://logs.openstack.org/21/517221/2/gate/legacy-tripleo-ci-centos-7-containers-multinode/a91ef0a/job-output.txt.gz
http://logs.openstack.org/21/517221/2/gate/legacy-tripleo-ci-centos-7-containers-multinode/a91ef0a/logs/undercloud/var/log/extra/yum-list-installed.txt.gz
[3] https://trunk.rdoproject.org/centos7-master/current-tripleo/




> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] New hash for master promoted

2017-11-05 Thread Wesley Hayutin
On Sun, Nov 5, 2017 at 7:55 AM, Adriano Petrich  wrote:

> Amazing! Thanks for the effort that took to get the promotion fixed!
>
> On Fri, Nov 3, 2017 at 8:17 PM, Jason E. Rist  wrote:
>
>> On 11/03/2017 03:52 AM, Gabriele Cerami wrote:
>> > Hi,
>> >
>> > since the failures in Tuesday's tests were caused by infra problems, we
>> > believed a specific hash involved in the failures to remain the best
>> > candidate for a promotion.
>> > So we pinned it in our promotion pipeline and insisted on it for some
>> > hours. It has passed all the tests twice, so we decided to promote.
>> > You will see the promotion date set to today, but the hash
>> >
>> > 3b718f3fecc866332ec0663fa77e758f8346ab93_4204ba89
>> >
>> > is actually from Tuesday.
>> >
>> > Thanks.
>> >
>> > 
>> __
>> > OpenStack Development Mailing List (not for usage questions)
>> > Unsubscribe: openstack-dev-requ...@lists.op
>> enstack.org?subject:unsubscribe
>> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>> >
>> Woot! Thanks!
>>
>> -J
>>
>> 
>> __
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscrib
>> e
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
Master has promoted once again on 11/04.
Hopefully we'll see regular promotions on master for a while now.

Thanks
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] containerized undercloud in Queens

2017-11-05 Thread Wesley Hayutin
On Sun, Nov 5, 2017 at 7:01 PM, Emilien Macchi  wrote:

> On Mon, Oct 2, 2017 at 5:02 AM, Dan Prince  wrote:
> [...]
>
> >  -CI resources: better use of CI resources. At the PTG we received
> > feedback from the OpenStack infrastructure team that our upstream CI
> > resource usage is quite high at times (even as high as 50% of the
> > total). Because of the shared framework and single node capabilities we
> > can re-architecture much of our upstream CI matrix around single node.
> > We no longer require multinode jobs to be able to test many of the
> > services in tripleo-heat-templates... we can just use a single cloud VM
> > instead. We'll still want multinode undercloud -> overcloud jobs for
> > testing things like HA and baremetal provisioning. But we can cover a
> > large set of the services (in particular many of the new scenario jobs
> > we added in Pike) with single node CI test runs in much less time.
>
> After the last (terrible) weeks in CI, it's pretty clear we need to
> find a solution to reduce and optimize our testing.
> I'm now really convinced by switching our current scenarios jobs to
> NOT deploy the overcloud, and just an undercloud with composable
> services & run tempest.
>

First off, I'm really pleased that the containerized undercloud effort has
been reinvigorated for queens.  The containerized undercloud work has been
awesome; really nice work to everyone involved!!

I totally agree we should be shifting to using the undercloud only and
deploying the services we need for the scenarios on the single node.

I think we should start putting plans in place to start shifting work to
the undercloud-only approach; however, I think it is way too early to talk
about not deploying the overcloud in CI.  I'd prefer to take a step-by-step,
phased approach to such a large change.

Really good point Emilien and thanks for raising it!!




>
> Benefits:
> - deploy 1 node instead of 2 nodes, so we save nodepool resources
> - faster (no overcloud)
> - reduce gate queue time, faster development process, faster CI
>
> Challenges:
> - keep overcloud testing, with OVB
> - reduce OVB to strict minimum: Ironic, Nova, Mistral and basic
> containerized services on overcloud.
>
> I really want to get consensus on these points, please raise your
> voice now before we engage some work on that front.
>
> [...]
> --
> Emilien Macchi
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] containerized undercloud in Queens

2017-11-06 Thread Wesley Hayutin
On Mon, Nov 6, 2017 at 7:35 AM, Bogdan Dobrelya  wrote:

> On 11/6/17 1:01 AM, Emilien Macchi wrote:
>
>> On Mon, Oct 2, 2017 at 5:02 AM, Dan Prince  wrote:
>> [...]
>>
>>   -CI resources: better use of CI resources. At the PTG we received
>>> feedback from the OpenStack infrastructure team that our upstream CI
>>> resource usage is quite high at times (even as high as 50% of the
>>> total). Because of the shared framework and single node capabilities we
>>> can re-architecture much of our upstream CI matrix around single node.
>>> We no longer require multinode jobs to be able to test many of the
>>> services in tripleo-heat-templates... we can just use a single cloud VM
>>> instead. We'll still want multinode undercloud -> overcloud jobs for
>>> testing things like HA and baremetal provisioning. But we can cover a
>>> large set of the services (in particular many of the new scenario jobs
>>> we added in Pike) with single node CI test runs in much less time.
>>>
>>
>> After the last (terrible) weeks in CI, it's pretty clear we need to
>> find a solution to reduce and optimize our testing.
>> I'm now really convinced by switching our current scenarios jobs to
>> NOT deploy the overcloud, and just an undercloud with composable
>> services & run tempest.
>>
>
> +1
> And we should start using the quickstart-extras undercloud-reploy role for
> that.
>
>
>> Benefits:
>> - deploy 1 node instead of 2 nodes, so we save nodepool resources
>> - faster (no overcloud)
>> - reduce gate queue time, faster development process, faster CI
>>
>> Challenges:
>> - keep overcloud testing, with OVB
>> - reduce OVB to strict minimum: Ironic, Nova, Mistral and basic
>> containerized services on overcloud.
>>
>> I really want to get consensus on these points, please raise your
>> voice now before we engage some work on that front.
>>
>> [...]
>>
>>
>
> --
> Best regards,
> Bogdan Dobrelya,
> Irc #bogdando
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>

OK,
Just got off the containers call.  We discussed the CI requirements for the
containerized undercloud.

In the upstream, launched via quickstart (not tripleo.sh), we want to see:

1) undercloud-containers - a containerized install, should be voting by m1
2) undercloud-containers-update - minor updates run on containerized
underclouds, should be voting by m2
3) undercloud-containers-upgrade - major upgrade from
non-containerized to containerized undercloud, should be voting by m2.

The above three items will enable us to test the quality of just the
undercloud install.

Ian and I are also working together on testing full deployments with the
containerized
undercloud to test how stable full runs are generally.  This will
help us assess the readiness of switching over in full in queens.

This will also then lead into discussions and planning around where we can
remove
multinode testing in upstream and start to fully utilize the benefits of
the containerized undercloud.

Please contact myself or Sagi regarding changes in the CI for the
undercloud.
Thanks
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [tripleo] legacy-tripleo-ci-centos-7-undercloud-containers job needs attention

2017-11-02 Thread Wesley Hayutin
Greetings,

We noticed the job  legacy-tripleo-ci-centos-7-undercloud-containers
failing in the gate in the following patches [1], [2].  The job was voting
in the review here [3].  ATM the job is non-voting in check and voting in
the gate.  This is a dangerous combination where the job can fail in check
unnoticed and also fail in the gate.  This can cause the gate to reset
often and delay other patches from merging.

We either need the job to become voting in check, or removed from the
gate.  Either action is fine, but needs to be taken immediately.
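
For reference, the change on the zuul side is small either way.  A sketch of
the shape of it (the actual stanza lives in the project-config / zuul layout,
so the file and location will differ):

    - project:
        check:
          jobs:
            - legacy-tripleo-ci-centos-7-undercloud-containers  # voting by default
        gate:
          jobs:
            - legacy-tripleo-ci-centos-7-undercloud-containers

Either drop the voting: false override so the job votes in check as above,
or remove it from the gate queue entirely until it is stable in check.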

Looking at some stats for the job itself comparing the containerized
undercloud vs. the old non-containerized job via [4].

legacy-tripleo-ci-centos-7-undercloud-containers
pass rate overall: 78%  as of 11/2/2017

legacy-tripleo-ci-centos-7-undercloud-oooq
pass rate overall: 92.6% as of 11/2/2017

Thanks for reading through this and for helping out in advance!

[1] https://review.openstack.org/#/c/514576/
[2] https://review.openstack.org/#/c/517023/
[3] https://review.openstack.org/#/c/513163/
[4] http://cistatus.tripleo.org/
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo]

2017-11-07 Thread Wesley Hayutin
On Tue, Nov 7, 2017 at 5:47 PM, Emilien Macchi <emil...@redhat.com> wrote:

> On Wed, Nov 8, 2017 at 9:29 AM, Wesley Hayutin <whayu...@redhat.com>
> wrote:
> > Greetings,
> >
> > I'd like to propose we remove the upgrade jobs that are consistently
> failing
> > from the upstream infrastructure and instead focus our efforts in RDO
> > Software Factory.
> >
> > The jobs listed in https://review.openstack.org/#/c/518405/ are
> consistently
> > failing after being reviewed by myself and Mathieu.  I am leaving
> > legacy-tripleo-ci-centos-7-multinode-upgrades in place as it's passing
> with
> > an overall rate of 87%.
> >
> > It doesn't make any sense to continue to tax upstream resources on
> failing
> > jobs, lets get the jobs running correctly and consistently in rdo
> software
> > factory before moving these back to our mainline CI.
> >
> > Please let me know what you think of the proposal.
>
> +1 to remove them if we have the scenario upgrades (with parity from
> what we had upstream) in RDO CI.
> We'll need to make the job experimental in RDO CI, so we can only run
> them at demand until they actually work. Can we do that as well?
>
> Thanks,
>

There are already several upgrade jobs defined in the experimental queue
that can be triggered with "check rdo experimental".
Let's iterate on those, and then make them full 3rd party check jobs.
We talked about allowing rdo sf to -2 reviews upstream as well, which would
be handy.
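
For anyone who has not used it, "check rdo experimental" is just a comment
trigger on the rdo software factory zuul.  A minimal sketch of how such a
pipeline is wired, assuming the pipeline and connection names below (they
are guesses; the real definition lives in review.rdoproject.org-config):

    - name: openstack-experimental
      manager: IndependentPipelineManager
      trigger:
        openstack-gerrit:
          - event: comment-added
            comment: (?i)^\s*check rdo experimental\s*$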

Thanks guys


> --
> Emilien Macchi
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [tripleo]

2017-11-07 Thread Wesley Hayutin
Greetings,

I'd like to propose we remove the upgrade jobs that are consistently
failing from the upstream infrastructure and instead focus our efforts in
RDO Software Factory.

The jobs listed in https://review.openstack.org/#/c/518405/ are
consistently failing after being reviewed by myself and Mathieu.  I am
leaving legacy-tripleo-ci-centos-7-multinode-upgrades in place as it's
passing with an overall rate of 87%.

It doesn't make any sense to continue to tax upstream resources on failing
jobs; let's get the jobs running correctly and consistently in rdo software
factory before moving them back to our mainline CI.

Please let me know what you think of the proposal.
Thanks

The upgrade job configuration can be found in [1]
[1]
https://github.com/rdo-infra/review.rdoproject.org-config/blob/master/jobs/tripleo-upstream.yml
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] containerized undercloud in Queens

2017-11-08 Thread Wesley Hayutin
On Tue, Nov 7, 2017 at 9:07 PM, James Slagle  wrote:

> On Tue, Nov 7, 2017 at 4:59 PM, Emilien Macchi  wrote:
> > Yes. Thanks for reformulate with better words.
> > Just to be clear, I want to transform the scenarios into single-node
> > jobs that deploy the SAME services (using composable services) from
> > the undercloud, using the new ansible installer. I also want to keep
> > running Tempest.
> > And of course, like we said, keep one multinode job to test overcloud
> > workflow, and OVB with some adjustments.
> >
> > Is it good?
>
> +1
>
>
> --
> -- James Slagle
> --
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>


FYI,
Slow and I have been pair programming to get the latest containerized
undercloud work into the mainline CI.
The old containerized undercloud work in featureset027 was working and is
not to be confused with this attempt.

Here's a link
http://logs.openstack.org/18/518118/2/check/legacy-tripleo-ci-centos-7-undercloud-containers/4eef40f/logs/undercloud/home/zuul/undercloud_install.log.txt.gz
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] containerized undercloud in Queens

2017-11-08 Thread Wesley Hayutin
On Wed, Nov 8, 2017 at 5:00 PM, Alex Schultz  wrote:

> On Tue, Nov 7, 2017 at 2:59 PM, Emilien Macchi  wrote:
> > On Wed, Nov 8, 2017 at 3:30 AM, James Slagle 
> wrote:
> >> On Sun, Nov 5, 2017 at 7:01 PM, Emilien Macchi 
> wrote:
> >>> On Mon, Oct 2, 2017 at 5:02 AM, Dan Prince  wrote:
> >>> [...]
> >>>
>   -CI resources: better use of CI resources. At the PTG we received
>  feedback from the OpenStack infrastructure team that our upstream CI
>  resource usage is quite high at times (even as high as 50% of the
>  total). Because of the shared framework and single node capabilities
> we
>  can re-architecture much of our upstream CI matrix around single node.
>  We no longer require multinode jobs to be able to test many of the
>  services in tripleo-heat-templates... we can just use a single cloud
> VM
>  instead. We'll still want multinode undercloud -> overcloud jobs for
>  testing things like HA and baremetal provisioning. But we can cover a
>  large set of the services (in particular many of the new scenario jobs
>  we added in Pike) with single node CI test runs in much less time.
> >>>
> >>> After the last (terrible) weeks in CI, it's pretty clear we need to
> >>> find a solution to reduce and optimize our testing.
> >>> I'm now really convinced by switching our current scenarios jobs to
> >>> NOT deploy the overcloud, and just an undercloud with composable
> >>> services & run tempest.
> >>
> >> +1 if you mean just the scenarios.
> >
> > Yes, just scenarios.
> >
> >> I think we need to keep at least 1 multinode job voting that deploys
> >> the overcloud, probably containers-multinode.
> >
> > Yes, exactly, and also work on optimizing OVB jobs (maybe just keep
> > one or 2 jobs, instead 3).
> >
> >>> Benefits:
> >>> - deploy 1 node instead of 2 nodes, so we save nodepool resources
> >>> - faster (no overcloud)
> >>> - reduce gate queue time, faster development process, faster CI
> >>>
> >>> Challenges:
> >>> - keep overcloud testing, with OVB
> >>
> >> This is why I'm not sure what you're proposing. Do you mean switch all
> >> multinode jobs to be just an undercloud, or just the scenarios?
> >
> > Keep 1 or 2 OVB jobs, to test ironic + mistral + HA (HA could be
> > tested with multinode though but well).
> >
> >>> - reduce OVB to strict minimum: Ironic, Nova, Mistral and basic
> >>> containerized services on overcloud.
> >>>
> >>> I really want to get consensus on these points, please raise your
> >>> voice now before we engage some work on that front.
> >>
> >> I'm fine to optimize the scenarios to be undercloud driven, but feel
> >> we still need a multinode job that deploys the overcloud in the gate.
> >> Otherwise, we'll have nothing that deploys an overcloud in the gate,
> >> which is a step in the wrong direction imo. Primarily, b/c of the loss
> >> of coverage around mistral and all of our workflows. Perhaps down the
> >> road we could find ways to optimize that by using an ephemeral Mistral
> >> (similar to the ephemeral Heat container), and then use a single node,
> >> but we're not there yet.
> >>
> >> On the other hand, if the goal is just to test less upstream so that
> >> we can more quickly merge code, then *not* deploying an overcloud in
> >> the gate at all seems to fit that goal. Is that what you're after?
> >
> > Yes. Thanks for reformulate with better words.
> > Just to be clear, I want to transform the scenarios into single-node
> > jobs that deploy the SAME services (using composable services) from
> > the undercloud, using the new ansible installer. I also want to keep
> > running Tempest.
> > And of course, like we said, keep one multinode job to test overcloud
> > workflow, and OVB with some adjustments.
> >
>
> So I'm ok with switching to use the containerized undercloud deploy to
> smoke test functionality of more complex openstack service
> deployments. What I would like to see prior to investing in this is
> that the plain containerized undercloud deploy job reliability is on
> par with the existing undercloud install.  We had to switch the
> undercloud-containers back to non-voting due to higher failure rates
> and it is still not voting.


Agreed, once we have a little success I'll update featureset027 (the
undercloud-containers job), which is still non-voting, to use this updated
containerized deployment.  Then we'll compare undercloud-oooq to
undercloud-containers (fs027) after a few weeks of running.


> With the current state of CI being
> questionable due to random failures which are not fully have resolved,
> I would prefer that we ensure existing CI is stable and that what we
> plan to move is as stable.
>

Agreed,
There are times, IMHO, when one must strike while the iron is hot on certain
parts of the work here.  I felt compelled to help bootstrap Ian with the
containerized undercloud work or see old habits remain and 

Re: [openstack-dev] [tripleo] [upgrade] CI status and Current effort.

2017-12-08 Thread Wesley Hayutin
On Fri, Dec 8, 2017 at 4:32 AM, Sofer Athlan-Guyot 
wrote:

> Hi,
>
> We (Upgrade Squad) have eventually save some time for fixing/creating
> the needed upgrade/update jobs.
>
> We have made an inventory of what currently exists there[1] and what is
> missing in the same spreadsheet.
>
> I recap the missing one here:
>
>  - minor updates (master and pike)
>  - UC non containerized to containerized
>  - ffu-undercould-upgrade
>  - ffu-overcloud-upgrade
>
> We are currently working on fixing the existing ocata->pike upgrade.
>
> You don’t see pike->master as we are reworking the workflow (as usual)
> and it’s not working.  But we have jobs to track the progress in
> experimental[2] with the right depends-on.
>
> The two newcomers are FFU uc upgrade and oc upgrade (Fast Forward
> Upgrade, or upgrade from Newton to Queens).  We have a work in progress
> there[3] and there[4] for oc upgrade.
>
> It’s not yet working but would be nice to have those jobs in
> experimental as soon as we can to track our progress there and detect
> code/repo deletion that prevents the whole FFU to function.
>
> We’re are working on minor upgrade testing there[5] and there[6]
>
> Oki, nearly done...
>
> We have the tripleo-upgrade role that is stuck in transition.  This repo
> is an effort to share downstream QE testing with upstream.  Due to
> various reasons (speed, mainly) we kept using github instead of
> switching to the openstack one.  Now we are stuck there[7].  We would
> like a new import but if it is not possible, we will merge all the
> patches manually.
>
> So currently we don’t use the role but when it will be there we will do
> the switch.
>
> Thanks,
> --
> [1] https://ethercalc.openstack.org/iv2jr35o98
> [2] https://review.openstack.org/#/c/526006/
> [3] https://review.openstack.org/#/q/topic:ffu/ci+(status:open+
> OR+status:merged)
> [4] https://review.rdoproject.org/r/10827
> [5] https://review.openstack.org/#/q/topic:upgrade/ci+(status:
> open+OR+status:merged)
> [6] https://review.rdoproject.org/r/#/c/10878/
> [7] https://review.openstack.org/#/c/524141/
> --
> Sofer
>

Sofer, can you estimate when the redhat-openstack gerrit repo will be
dropped and the upstream openstack/tripleo-upgrade role used in its place?
I need to coordinate the CI team to work with you and Jose Louis; however,
not using the upstream repo makes that difficult.

We have a meeting today, so we can also talk about it there.
Thanks



>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] [upgrade] CI status and Current effort.

2017-12-09 Thread Wesley Hayutin
On Fri, Dec 8, 2017 at 10:11 PM, Emilien Macchi <emil...@redhat.com> wrote:

> On Fri, Dec 8, 2017 at 11:42 AM, Emilien Macchi <emil...@redhat.com>
> wrote:
> > On Fri, Dec 8, 2017 at 5:39 AM, Wesley Hayutin <whayu...@redhat.com>
> wrote:
> > [...]
> >> Sofer, can you estimate when the redhat-openstack gerrit repo will be
> >> dropped and the upstream
> >> openstac/tripleo-upgrades role used in it's place.  I need to
> coordinate the
> >> CI team to work you and
> >> Jose Louis, however not using the upstream repo makes that difficult.
> >
> > When https://review.openstack.org/#/c/524141/ will be merged - we're
> > working on that. Alex cleanup the repo, today I rebased the patch and
> > added ansble-lint tests. When we have lint working, we'll probably
> > merge the patch.
> > From then, I don't see any blocker to use the upstream repo.
>
> OK, https://review.openstack.org/#/c/524141/ is now ready for review.
> I had to rebase
> https://review.openstack.org/#/c/524141/2..4/templates/
> create_registry_env.sh.j2
> - please carefully review this one.
>
> Otherwise, we fixed lint and update the CI job, all looks good.
> Once we land it, we need to make sure we have all commits from the old
> repo, and then we need to kill the repo (following this commit:
> https://github.com/openstack/puppet-tuskar/commit/
> d54150a1ad61034cb73c6160fb57956d30f2b2d9).
> Then, use the new repo, and enjoy.
>

We also need to be sure to set up CI properly on the
openstack/tripleo-upgrade repo :)


>
> Thanks,
> --
> Emilien Macchi
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [tripleo] latest centos iptables rpm broke tripleo ci

2017-10-21 Thread Wesley Hayutin
Greetings,

TripleO CI started to hit errors in all of the multinode nodepool jobs
last Friday afternoon.
Please review the bug and patch to bring CI back online [1-2]

More work needs to be done to discover exactly why the latest rpm breaks
the vxlan network link between nodes.  It's also a good time to investigate
options regarding additional gating for CentOS packages.  This patch
brings TripleO CI back online, but does not close the issue.
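
Conceptually, the short-term mitigation is just to hold the broken package
back in the jobs until the root cause is understood.  A sketch of what that
looks like as a CI task (the pinned version below is illustrative, not the
exact build we reverted to):

    - name: Hold iptables at the last known-good build
      shell: |
        yum -y install yum-plugin-versionlock
        yum versionlock add iptables-1.4.21-18.el7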

Thanks, have a good weekend.

[1] https://bugs.launchpad.net/tripleo/+bug/1725451
[2] https://review.openstack.org/#/c/513891/
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] tripleo upstream gate outtage, was: -> gate jobs impacted RAX yum mirror

2018-05-14 Thread Wesley Hayutin
On Sun, May 13, 2018 at 11:30 PM Jeremy Stanley <fu...@yuggoth.org> wrote:

> On 2018-05-13 20:44:25 -0600 (-0600), Wesley Hayutin wrote:
> [...]
> > I do think it would be helpful to, say, have a one week change
> > window where folks are given the opportunity to preflight check a
> > new image and the potential impact on the job workflow the updated
> > image may have. If I could update or create a non-voting job w/
> > the new image that would provide two things.
> >
> > 1. The first is the head's up, this new minor version of centos is
> > coming into the system and you have $x days to deal with it.
> >
> > 2. The ability to build a few non-voting jobs w/ the new image to
> > see what kind of impact it has on the workflow and deployments.
> [...]
>
> While I can see where you're coming from, right now even the Infra
> team doesn't know immediately when a new CentOS minor release starts
> to be used. The packages show up in the mirrors automatically and
> images begin to be built with them right away. There isn't a
> conscious "switch" which is thrown by anyone. This is essentially
> the same way we treat Ubuntu LTS point releases as well. If this is
> _not_ the way RHEL/CentOS are intended to be consumed (i.e. just
> upgrade to and run the latest packages available for a given major
> release series) then we should perhaps take a step back and
> reevaluate this model.


I think you may be conflating the notion that ubuntu or rhel/centos can be
updated w/o any issues for the applications that run atop the distributions
with what it means to introduce a minor update into the upstream openstack
ci workflow.

If jobs could execute w/o a timeout, the tripleo jobs would not have gone
red.  Since we do have constraints in the upstream, like timeouts and
others, we have to prepare containers, images, etc. to work efficiently in
the upstream.  For example, if our jobs had the time to yum update the
roughly 120 containers in play in each job, the tripleo jobs would have just
worked.  I am not advocating for not having timeouts or constraints on
jobs; however, I am saying this is an infra issue, not a distribution or
distribution support issue.
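
To make that concrete, the kind of step the jobs would need the extra time
for is roughly the following (an illustration only, not the actual tripleo
tooling, and the variable name is made up):

    # Layer a yum update on top of every container image used by the job so
    # the containers match the new CentOS minor release on the host.
    - name: Update each container image to the latest CentOS packages
      shell: |
        docker run --name update-tmp {{ item }} yum -y update
        docker commit update-tmp {{ item }}
        docker rm update-tmp
      loop: "{{ overcloud_container_images }}"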

I think this is an important point to consider and I view it as mostly
unrelated to the support claims by the distribution.  Does that make sense?
Thanks




> For now we have some fairly deep-driven
> assumptions in that regard which are reflected in the Linux
> distributions support policy of our project testing interface as
> documented in OpenStack governance.
> --
> Jeremy Stanley
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>


Re: [openstack-dev] [tripleo] tripleo upstream gate outage, was: -> gate jobs impacted RAX yum mirror

2018-05-14 Thread Wesley Hayutin
On Mon, May 14, 2018 at 10:36 AM Jeremy Stanley <fu...@yuggoth.org> wrote:

> On 2018-05-14 07:07:03 -0600 (-0600), Wesley Hayutin wrote:
> [...]
> > I think you may be conflating the notion that ubuntu or rhel/cent
> > can be updated w/o any issues to applications that run atop of the
> > distributions with what it means to introduce a minor update into
> > the upstream openstack ci workflow.
> >
> > If jobs could execute w/o a timeout the tripleo jobs would have
> > not gone red.  Since we do have constraints in the upstream like a
> > timeouts and others we have to prepare containers, images etc to
> > work efficiently in the upstream.  For example, if our jobs had
> > the time to yum update the roughly 120 containers in play in each
> > job the tripleo jobs would have just worked.  I am not advocating
> > for not having timeouts or constraints on jobs, however I am
> > saying this is an infra issue, not a distribution or distribution
> > support issue.
> >
> > I think this is an important point to consider and I view it as
> > mostly unrelated to the support claims by the distribution.  Does
> > that make sense?
> [...]
>
> Thanks, the thread jumped straight to suggesting costly fixes
> (separate images for each CentOS point release, adding an evaluation
> period or acceptance testing for new point releases, et cetera)
> without coming anywhere close to exploring the problem space. Is
> your only concern that when your jobs started using CentOS 7.5
> instead of 7.4 they took longer to run?


Yes, if they had unlimited time to run, our workflow would have everything
updated to CentOS 7.5 in the job itself and I would expect everything to
just work.


> What was the root cause? Are
> you saying your jobs consume externally-produced artifacts which lag
> behind CentOS package updates?


Yes, TripleO has externally produced overcloud images and containers, both
of which can be yum updated, but we try to ensure they are frequently
recreated so the yum transaction is small.


> Couldn't a significant burst of new
> packages cause the same symptoms even without it being tied to a
> minor version increase?
>

Yes, certainly this could happen outside of a minor update of the baseos.


>
> This _doesn't_ sound to me like a problem with how we've designed
> our infrastructure, unless there are additional details you're
> omitting.


So the only thing out of our control is the package set on the base
nodepool image.
If that suddenly gets updated with too many packages, then we have to
scramble to ensure the images and containers are also updated.
If there is a breaking change in the nodepool image, for example [a], we
have to react to and fix that as well.


> It sounds like a problem with how the jobs are designed
> and expectations around distros slowly trickling package updates
> into the series without occasional larger bursts of package deltas.
> I'd like to understand more about why you upgrade packages inside
> your externally-produced container images at job runtime at all,
> rather than relying on the package versions baked into them.


We do that to ensure the gerrit review itself and its dependencies are
built via rpm and injected into the build.
If we did not do this, the job would not be testing the change at all.
This is a result of being a package-based deployment, for better or worse.


> It
> seems like you're arguing that the existence of lots of new package
> versions which aren't already in your container images is the
> problem, in which case I have trouble with the rationalization of it
> being "an infra issue" insofar as it requires changes to the
> services as provided by the OpenStack Infra team.
>
> Just to be clear, we didn't "introduce a minor update into the
> upstream openstack ci workflow." We continuously pull CentOS 7
> packages into our package mirrors, and continuously rebuild our
> centos-7 images from whatever packages the distro says are current.
>

Understood, which I think is fine and probably works for most projects.
An enhancement could be to stage the new images for, say, one week or so.
Do we need the CentOS updates immediately? Is there a possible path that
does not create a lot of work for infra, but also provides some space for
projects to prep for the consumption of the updates?


> Our automation doesn't know that there's a difference between
> packages which were part of CentOS 7.4 and 7.5 any more than it
> knows that there's a difference between Ubuntu 16.04.2 and 16.04.3.
> Even if we somehow managed to pause our CentOS image updates
> immediately prior to 7.5, jobs would still try to upgrade those
> 7.4-based images to the 7.5 packages in our mirror, right?
>

Understood, I suspect this will become a more widespread issue as more
projects start to use containers ( not sure ).  It's my understanding that
there are some mechanisms in place to pin packages in the centos nodepool
image, so there have been some thoughts generally in the area of this issue.

Re: [openstack-dev] [tripleo] tripleo upstream gate outage, was: -> gate jobs impacted RAX yum mirror

2018-05-14 Thread Wesley Hayutin
On Mon, May 14, 2018 at 12:37 PM Jeremy Stanley <fu...@yuggoth.org> wrote:

> On 2018-05-14 09:57:17 -0600 (-0600), Wesley Hayutin wrote:
> > On Mon, May 14, 2018 at 10:36 AM Jeremy Stanley <fu...@yuggoth.org>
> wrote:
> [...]
> > > Couldn't a significant burst of new packages cause the same
> > > symptoms even without it being tied to a minor version increase?
> >
> > Yes, certainly this could happen outside of a minor update of the
> > baseos.
>
> Thanks for confirming. So this is not specifically a CentOS minor
> version increase issue, it's just more likely to occur at minor
> version boundaries.
>

Correct, you got it


>
> > So the only thing out of our control is the package set on the
> > base nodepool image. If that suddenly gets updated with too many
> > packages, then we have to scramble to ensure the images and
> > containers are also udpated.
>
> It's still unclear to me why the packages on the test instance image
> (i.e. the "container host") are related to the packages in the
> container guest images at all. That would seem to be the whole point
> of having containers?
>

You are right, just note some services are not 100% containerized yet.
This doesn't happen overnight; it's a process and we're getting there.


>
> > If there is a breaking change in the nodepool image for example
> > [a], we have to react to and fix that as well.
>
> I would argue that one is a terrible workaround which happened to
> show its warts. We should fix DIB's pip-and-virtualenv element
> rather than continue rely on side effects of pinning RPM versions.
> I've commented to that effect on https://launchpad.net/bugs/1770298
> just now.
>
>
k.. thanks


> > > It sounds like a problem with how the jobs are designed
> > > and expectations around distros slowly trickling package updates
> > > into the series without occasional larger bursts of package deltas.
> > > I'd like to understand more about why you upgrade packages inside
> > > your externally-produced container images at job runtime at all,
> > > rather than relying on the package versions baked into them.
> >
> > We do that to ensure the gerrit review itself and it's
> > dependencies are built via rpm and injected into the build. If we
> > did not do this the job would not be testing the change at all.
> > This is a result of being a package based deployment for better or
> > worse.
> [...]
>
> Now I'll risk jumping to proposing solutions, but have you
> considered building those particular packages in containers too?
> That way they're built against the same package versions as will be
> present in the other container images you're using rather than to
> the package versions on the host, right? Seems like it would
> completely sidestep the problem.
>

So a little background.  The containers and images used in TripleO are
rebuilt multiple times each day via periodic jobs; when they pass our
criteria they are pushed out and used upstream.
Each zuul change and its dependencies can potentially impact a few or all of
the containers in play.  We cannot rebuild all the containers due to time
constraints in each job.  We have been able to mount and yum update the
containers involved with the zuul change.

Latest patch to fine tune that process is here
https://review.openstack.org/#/c/567550/
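
To make the "mount and yum update" step a bit more concrete, the
per-container logic is roughly the sketch below (the image list, repo id and
file names are illustrative; the real implementation lives in the CI roles
and the review above):

    # gating.repo only carries the rpms built from the zuul change + deps
    GATING_REPO=/etc/yum.repos.d/gating.repo

    while read -r image; do
        # throwaway container: update only from the gating repo, then
        # commit the result back over the original tag
        docker run --name tmp-update \
            -v "${GATING_REPO}:/etc/yum.repos.d/gating.repo:ro" \
            "${image}" \
            bash -c 'yum -y update --disablerepo="*" --enablerepo="gating"'
        docker commit tmp-update "${image}"
        docker rm -f tmp-update
    done < /tmp/containers_touched_by_change.txt

The patch above is about shrinking what that yum transaction is allowed to
pull in, so a minor CentOS bump doesn't turn it into a full distro update
inside the job.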


>
> > An enhancement could be to stage the new images for say one week
> > or so. Do we need the CentOS updates immediately? Is there a
> > possible path that does not create a lot of work for infra, but
> > also provides some space for projects to prep for the consumption
> > of the updates?
> [...]
>
> Nodepool builds new images constantly, but at least daily. Part of
> this is to prevent the delta of available packages/indices and other
> files baked into those images from being more than a day or so stale
> at any given point in time. The older the image, the more packages
> (on average) jobs will need to download if they want to test with
> latest package versions and the more strain it will put on our
> mirrors and on our bandwidth quotas/donors' networks.
>

Sure that makes perfect sense.  We do the same with our containers and
images.


>
> There's also a question of retention, if we're building images at
> least daily but keeping them around for 7 days (storage on the
> builders, tenant quotas for Glance in our providers) as well as the
> explosion of additional nodes we'd need since we pre-boot nodes with
> each of our images (and the idea as I understand it is that you
> would want jobs to be able to select between any of them). One
> option, I suppose, would be to switch to building ima

Re: [openstack-dev] [tripleo] tripleo upstream gate outage, was: -> gate jobs impacted RAX yum mirror

2018-05-14 Thread Wesley Hayutin
On Mon, May 14, 2018 at 12:08 PM Clark Boylan <cboy...@sapwetik.org> wrote:

> On Mon, May 14, 2018, at 8:57 AM, Wesley Hayutin wrote:
> > On Mon, May 14, 2018 at 10:36 AM Jeremy Stanley <fu...@yuggoth.org>
> wrote:
> >
> > > On 2018-05-14 07:07:03 -0600 (-0600), Wesley Hayutin wrote:
> > > [...]
>
> snip
>
> > >
> > > This _doesn't_ sound to me like a problem with how we've designed
> > > our infrastructure, unless there are additional details you're
> > > omitting.
> >
> >
> > So the only thing out of our control is the package set on the base
> > nodepool image.
> > If that suddenly gets updated with too many packages, then we have to
> > scramble to ensure the images and containers are also udpated.
> > If there is a breaking change in the nodepool image for example [a], we
> > have to react to and fix that as well.
>
> Aren't the container images independent of the hosting platform (eg what
> infra hosts)? I'm not sure I understand why the host platform updating
> implies all the container images must also be updated.
>

You make a fine point here; I think, as with anything, there are some bits
that are still being worked on. At this moment it's my understanding that
pacemaker and possibly a few other components are not 100% containerized
atm.  I'm not an expert in the subject and my understanding may not be
correct.  Until you are 100% containerized there may still be some
dependencies on the base image and an impact from changes.


>
> >
> >
> > > It sounds like a problem with how the jobs are designed
> > > and expectations around distros slowly trickling package updates
> > > into the series without occasional larger bursts of package deltas.
> > > I'd like to understand more about why you upgrade packages inside
> > > your externally-produced container images at job runtime at all,
> > > rather than relying on the package versions baked into them.
> >
> >
> > We do that to ensure the gerrit review itself and it's dependencies are
> > built via rpm and injected into the build.
> > If we did not do this the job would not be testing the change at all.
> >  This is a result of being a package based deployment for better or
> worse.
>
> You'd only need to do that for the change in review, not the entire system
> right?
>

Correct, there is no intention of updating the entire distribution at run
time; the intent is to have as much as possible updated in our jobs that
build the containers and images.
Only the rpm built from the zuul change should be included in the update,
however some zuul changes require a CentOS base package that was not
previously installed in the container, e.g. a new python dependency
introduced in a zuul change.  Previously we had not enabled any CentOS repos
in the container update, but found that was not viable 100% of the time.

We have a change [1] to further limit the scope of the update which should
help, especially when facing a minor version update.

[1] https://review.openstack.org/#/c/567550/

>
> >
>
> snip
>
> > > Our automation doesn't know that there's a difference between
> > > packages which were part of CentOS 7.4 and 7.5 any more than it
> > > knows that there's a difference between Ubuntu 16.04.2 and 16.04.3.
> > > Even if we somehow managed to pause our CentOS image updates
> > > immediately prior to 7.5, jobs would still try to upgrade those
> > > 7.4-based images to the 7.5 packages in our mirror, right?
> > >
> >
> > Understood, I suspect this will become a more widespread issue as
> > more projects start to use containers ( not sure ).  It's my
> understanding
> > that
> > there are some mechanisms in place to pin packages in the centos nodepool
> > image so
> > there has been some thoughts generally in the area of this issue.
>
> Again, I think we need to understand why containers would make this worse
> not better. Seems like the big feature everyone talks about when it comes
> to containers is isolating packaging whether that be python packages so
> that nova and glance can use a different version of oslo or cohabitating
> software that would otherwise conflict. Why do the packages on the host
> platform so strongly impact your container package lists?
>

I'll let others comment on that, however my thought is you don't move from
A -> Z in one step and containers do not make everything easier
immediately.  Like most things, it takes a little time.

>
> >
> > TripleO may be the exception to the rule here and that is fine, I'm more
> > interested in exploring
> > the possibilities of delivering updates in a staged fashion than

Re: [openstack-dev] [tripleo] tripleo upstream gate outage, was: -> gate jobs impacted RAX yum mirror

2018-05-14 Thread Wesley Hayutin
On Sun, May 13, 2018 at 11:50 PM Tristan Cacqueray <tdeca...@redhat.com>
wrote:

> On May 14, 2018 2:44 am, Wesley Hayutin wrote:
> [snip]
> > I do think it would be helpful to say have a one week change window where
> > folks are given the opportunity to preflight check a new image and the
> > potential impact on the job workflow the updated image may have.
> [snip]
>
> How about adding a periodic job that setup centos-release-cr in a pre
> task? This should highlight issues with up-coming updates:
> https://wiki.centos.org/AdditionalResources/Repositories/CR
>
> -Tristan
>

Thanks for the suggestion Tristan, going to propose using this repo at the
next TripleO mtg.
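
For reference, the guts of such a pre task would be tiny; something along
these lines on each node before the job payload runs (a sketch only, the
actual wiring would live in a zuul pre-run playbook):

    # opt the node into the CentOS Continuous Release repo so it sees the
    # packages queued for the next point release ahead of GA
    sudo yum -y install centos-release-cr
    sudo yum -y update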

Thanks


> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>


Re: [openstack-dev] [tripleo] Zuul repo insertion in update/upgrade CI

2018-05-14 Thread Wesley Hayutin
On Mon, May 14, 2018 at 11:36 AM Jiří Stránský  wrote:

> Hi,
>
> this is mainly for CI folks and whom-it-may-concern.
>
> Recently we came across the topic of how to enable/disable zuul repos at
> various places in the CI jobs. For normal deploy jobs there's no need to
> customize, but for update/upgrade jobs there is. It's not entirely
> straightforward and there's quite a variety of enable/disable spots and
> combinations which can be useful.
>
> Even though improvements in this area are not very likely to get
> implemented right away, i had some thoughts on the topic so i wanted to
> capture them. I put the ideas into an etherpad:
>
> https://etherpad.openstack.org/p/tripleo-ci-zuul-repo-insertion
>
> Feel free to put some more thoughts there or ping me on IRC with
> anything related.
>
>
> Thanks
>
> Jirka
>
>
Thanks Jirka!!


> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>


Re: [openstack-dev] [tripleo] Migration to Storyboard

2018-05-09 Thread Wesley Hayutin
On Wed, May 9, 2018 at 3:25 PM Alex Schultz  wrote:

> Hello tripleo folks,
>
> So we've been experimenting with migrating some squads over to
> storyboard[0] but this seems to be causing more issues than perhaps
> it's worth.  Since the upstream community would like to standardize on
> Storyboard at some point, I would propose that we do a cut over of all
> the tripleo bugs/blueprints from Launchpad to Storyboard.
>
> In the irc meeting this week[1], I asked that the tripleo-ci team make
> sure the existing scripts that we use to monitor bugs for CI support
> Storyboard.  I would consider this a prerequisite for the migration.
> I am thinking it would be beneficial to get this done before or as
> close to M2.
>
> Thoughts, concerns, etc?
>

Just clarifying: you would like to have the tooling updated by M2, which
is fine I think.  However, squads are not expected to change all their
existing procedures by M2, correct?  I'm concerned about migrating our
current kanban boards to storyboard by M2.

Thanks


>
> Thanks,
> -Alex
>
> [0] https://storyboard.openstack.org/#!/project_group/76
> [1]
> http://eavesdrop.openstack.org/meetings/tripleo/2018/tripleo.2018-05-08-14.00.log.html#l-42
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>


[openstack-dev] [tripleo] gate jobs impacted RAX yum mirror

2018-05-09 Thread Wesley Hayutin
FYI.. https://bugs.launchpad.net/tripleo/+bug/1770298

I'm on #openstack-infra chatting w/ Ian atm.
Thanks


[openstack-dev] [tripleo] tripleo upstream gate outage, was: -> gate jobs impacted RAX yum mirror

2018-05-12 Thread Wesley Hayutin
On Wed, May 9, 2018 at 10:43 PM Wesley Hayutin <whayu...@redhat.com> wrote:

> FYI.. https://bugs.launchpad.net/tripleo/+bug/1770298
>
> I'm on #openstack-infra chatting w/ Ian atm.
> Thanks
>
>
Greetings,

I wanted to update everyone on the status of the upstream tripleo check
and gate jobs.
There have been a series of infra related issues that caused the upstream
tripleo gates to go red.

1. The first issue hit was
https://bugs.launchpad.net/tripleo/+bug/1770298 which
caused package install errors
2. Shortly after #1 was resolved CentOS released 7.5 which comes directly
into the upstream repos untested and ungated.  Additionally the associated
qcow2 image and container-base images were not updated at the same time as
the yum repos.  https://bugs.launchpad.net/tripleo/+bug/1770355
3.  Related to #2 the container and bm image rpms were not in sync causing
https://bugs.launchpad.net/tripleo/+bug/1770692
4. Building the bm images was failing due to an open issue with the centos
kernel, thanks to Yatin and Alfredo for
https://review.rdoproject.org/r/#/c/13737/
5. To ensure the containers are updated to the latest rpms at build time,
we have the following patch from Alex
https://review.openstack.org/#/c/567636/.
6.  I also noticed that we are building the centos-base container in our
container build jobs, however it is not pushed out to the container
registries because it is not included in the tripleo-common repo
<https://github.com/openstack/tripleo-common/blob/master/container-images/overcloud_containers.yaml.j2>
I would like to discuss this with some of the folks working on containers.
If we had an updated centos-base container I think some of these issues
would have been prevented.

The above issues were resolved, and the master promotion jobs all had
passed.  Thanks to all who were involved!

Once the promotion jobs passed and reported status to the dlrn_api, a
promotion was triggered automatically to upload the promoted images,
containers, and updated dlrn hash.  This failed due to network latency in
the tenant where the tripleo-ci infra is hosted.  The issue is tracked here:
https://bugs.launchpad.net/tripleo/+bug/1770860

Matt Young and I worked well into the evening on Friday to diagnose the
issue and ended up having to execute the image, container and dlrn_hash
promotion outside of our tripleo-infra tenant.  Thanks to Matt for his
effort.

At the moment I have updated the ci status in #tripleo, the master check
and gate jobs are green in the upstream which should unblock merging most
patches.  The status of stable branches and third party ci is still being
investigated.

Automatic promotions are blocked until the network issues in the
tripleo-infra tenant are resolved.  The bug is marked with alert in
#tripleo.  Please see #tripleo for future status updates.

Thanks all


Re: [openstack-dev] [tripleo] tripleo upstream gate outage, was: -> gate jobs impacted RAX yum mirror

2018-05-13 Thread Wesley Hayutin
On Sat, May 12, 2018 at 11:45 PM Emilien Macchi <emil...@redhat.com> wrote:

> On Sat, May 12, 2018 at 9:10 AM, Wesley Hayutin <whayu...@redhat.com>
> wrote:
>>
>> 2. Shortly after #1 was resolved CentOS released 7.5 which comes directly
>> into the upstream repos untested and ungated.  Additionally the associated
>> qcow2 image and container-base images were not updated at the same time as
>> the yum repos.  https://bugs.launchpad.net/tripleo/+bug/1770355
>>
>
> Why do we have this situation everytime the OS is upgraded to a major
> version? Can't we test the image before actually using it? We could have
> experimental jobs testing latest image and pin gate images to a specific
> one?
> Like we could configure infra to deploy centos 7.4 in our gate and 7.5 in
> experimental, so we can take our time to fix eventual problems and make the
> switch when we're ready, instead of dealing with fires (that usually come
> all together).
>
> It would be great to make a retrospective on this thing between tripleo ci
> & infra folks, and see how we can improve things.
>

I agree.
We need, in coordination with the infra team, to be able to pin / lock
content for production check and gate jobs while also having the ability to
stage new content, e.g. centos 7.5, with experimental or periodic jobs.
In this particular case the ci team did check the tripleo deployment w/
centos 7.5 updates, however we did not stage or test what impact the centos
minor update would have on the upstream job workflow.
The key issue is that the base centos image used upstream cannot be pinned
by the ci team; if we could pin that image, the ci team could pin the
centos repos used in ci and run staging jobs on the latest centos content.
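
To be explicit about what "pin" would mean on the job nodes, it is
essentially the sketch below, assuming the frozen point-release trees on
vault.centos.org remain available (the URL and release string are
illustrative):

    # freeze the node on the 7.4 package set instead of the rolling 7/ tree
    sudo sed -i \
        -e 's|^mirrorlist=|#mirrorlist=|' \
        -e 's|^#baseurl=http://mirror.centos.org/centos/$releasever|baseurl=http://vault.centos.org/7.4.1708|' \
        /etc/yum.repos.d/CentOS-Base.repo
    sudo yum clean all && sudo yum repolist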

I'm glad that you also see the need for some amount of coordination here,
I've been in contact with a few folks to initiate the conversation.

On an unrelated note, Sagi and I just fixed the network latency issue on
our promotion server; it was related to DNS.  Automatic promotions should
be back online.
Thanks all.


> --
> Emilien Macchi
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>


Re: [openstack-dev] [tripleo] tripleo upstream gate outage, was: -> gate jobs impacted RAX yum mirror

2018-05-13 Thread Wesley Hayutin
On Sun, May 13, 2018 at 11:25 AM Jeremy Stanley <fu...@yuggoth.org> wrote:

> On 2018-05-13 08:25:25 -0600 (-0600), Wesley Hayutin wrote:
> [...]
> > We need to in coordination with the infra team be able to pin / lock
> > content for production check and gate jobs while also have the ability to
> > stage new content e.g. centos 7.5 with experimental or periodic jobs.
> [...]
>
> It looks like adjustments would be needed to DIB's centos-minimal
> element if we want to be able to pin it to specific minor releases.
> However, having to rotate out images in the fashion described would
> be a fair amount of manual effort and seems like it would violate
> our support expectations in governance if we end up pinning to older
> minor versions (for major LTS versions on the other hand, we expect
> to undergo this level of coordination but they come at a much slower
> pace with a lot more advance warning). If we need to add controlled
> roll-out of CentOS minor version updates, this is really no better
> than Fedora from the Infra team's perspective and we've already said
> we can't make stable branch testing guarantees for Fedora due to the
> complexity involved in using different releases for each branch and
> the need to support our stable branches longer than the distros are
> supporting the releases on which we're testing.
>

This is good insight Jeremy, thanks for replying.



>
> For example, how long would the distro maintainers have committed to
> supporting RHEL 7.4 after 7.5 was released? Longer than we're
> committing to extended maintenance on our stable/queens branches? Or
> would you expect projects to still continue to backport support for
> these minor platform bumps to all their stable branches too? And
> what sort of grace period should we give them before we take away
> the old versions? Also, how many minor versions of CentOS should we
> expect to end up maintaining in parallel? (Remember, every
> additional image means that much extra time to build and upload to
> all our providers, as well as that much more storage on our builders
> and in our Glance quotas.)
> --
> Jeremy Stanley
>

I think you may be describing a level of support that is far greater than
what I was thinking. I also don't want to tax the infra team w/ n+ versions
of the baseos to support.
I do think it would be helpful to, say, have a one-week change window where
folks are given the opportunity to preflight check a new image and the
potential impact the updated image may have on the job workflow.  If I
could update or create a non-voting job w/ the new image, that would provide
two things.

1. The first is the head's up, this new minor version of centos is coming
into the system and you have $x days to deal with it.
2. The ability to build a few non-voting jobs w/ the new image to see what
kind of impact it has on the workflow and deployments.

In this case the updated 7.5 CentOS image worked fine w/ TripleO, however
it did cause our gates to go red because:
a. when we update containers w/ zuul dependencies, all the base-os
updates were pulled in and jobs timed out.
b. a kernel bug workaround with virt-customize failed to work because the
kernel packages changed ( 3rd party job ).
c. the containers we use were not yet at CentOS 7.5 but the bm image was,
causing issues w/ pacemaker.
d. there may be a few more that I am forgetting, but hopefully the point is
made.

We can fix a lot of the issues and I'm not blaming anyone, because if we
(tripleo) had thought of all the corner cases with our workflow we would
have been able to avoid some of these issues.  However it does seem like we
get hit by $something every time we update a minor version of the baseos.
My preference would be to have a heads up and work through the issues rather
than to go immediately red and be unable to merge patches.  I don't know if
other teams get impacted in similar ways, and I understand this is a big
ship and updating CentOS may work just fine for everyone else.

Thanks all for your time and effort!




> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>


Re: [openstack-dev] [docs] Automating documentation the tripleo way?

2018-05-17 Thread Wesley Hayutin
On Thu, May 17, 2018 at 10:22 AM Petr Kovar <pko...@redhat.com> wrote:

> On Wed, 16 May 2018 13:26:46 -0600
> Wesley Hayutin <whayu...@redhat.com> wrote:
>
> > On Wed, May 16, 2018 at 3:05 PM Doug Hellmann <d...@doughellmann.com>
> wrote:
> >
> > > Excerpts from Wesley Hayutin's message of 2018-05-16 12:51:25 -0600:
> > > > On Wed, May 16, 2018 at 2:41 PM Doug Hellmann <d...@doughellmann.com
> >
> > > wrote:
> > > >
> > > > > Excerpts from Petr Kovar's message of 2018-05-16 17:39:14 +0200:
> > > > > > Hi all,
> > > > > >
> > > > > > In the past few years, we've seen several efforts aimed at
> automating
> > > > > > procedural documentation, mostly centered around the OpenStack
> > > > > > installation guide. This idea to automatically produce and verify
> > > > > > installation steps or similar procedures was mentioned again at
> the
> > > last
> > > > > > Summit (
> https://etherpad.openstack.org/p/SYD-install-guide-testing).
> > > > > >
> > > > > > It was brought to my attention that the tripleo team has been
> > > working on
> > > > > > automating some of the tripleo deployment procedures, using a
> Bash
> > > script
> > > > > > with included comment lines to supply some RST-formatted
> narrative,
> > > for
> > > > > > example:
> > > > > >
> > > > > >
> > > > >
> > >
> https://github.com/openstack/tripleo-quickstart-extras/blob/master/roles/overcloud-prep-images/templates/overcloud-prep-images.sh.j2
> > > > > >
> > > > > > The Bash script can then be converted to RST, e.g.:
> > > > > >
> > > > > >
> > > > >
> > >
> https://thirdparty.logs.rdoproject.org/jenkins-tripleo-quickstart-queens-rdo_trunk-baremetal-dell_fc430_envB-single_nic_vlans-27/docs/build/
> > > > > >
> > > > > > Source Code:
> > > > > >
> > > > > >
> > > > >
> > >
> https://github.com/openstack/tripleo-quickstart-extras/tree/master/roles/collect-logs
> > > > > >
> > > > > > I really liked this approach and while I don't want to sound like
> > > selling
> > > > > > other people's work, I'm wondering if there is still an interest
> > > among
> > > > > the
> > > > > > broader OpenStack community in automating documentation like
> this?
> > > > > >
> > > > > > Thanks,
> > > > > > pk
> > > > > >
> > > > >
> > > > > Weren't the folks doing the training-labs or training-guides
> taking a
> > > > > similar approach? IIRC, they ended up implementing what amounted to
> > > > > their own installer for OpenStack, and then ended up with all of
> the
> > > > > associated upgrade and testing burden.
> > > > >
> > > > > I like the idea of trying to use some automation from this, but I
> > > wonder
> > > > > if we'd be better off extracting data from other tools, rather than
> > > > > building a new one.
> > > > >
> > > > > Doug
> > > > >
> > > >
> > > > So there really isn't anything new to create, the work is done and
> > > executed
> > > > on every tripleo change that runs in rdo-cloud.
> > >
> > > It wasn't clear what Petr was hoping to get. Deploying with TripleO is
> > > only one way to deploy, so we wouldn't be able to replace the current
> > > installation guides with the results of this work. It sounds like
> that's
> > > not the goal, though.
>
>
> Yes, I wasn't very clear on the goals as I didn't want to make too many
> assumptions before learning about technical details from other people.
> Ben's comments made me realize this approach would probably be best suited
> for generating documents such as quick start guides or tutorials that are
> procedural, yet they don't aim at describing multiple use cases.
>
>
> > > >
> > > > Instead of dismissing the idea upfront I'm more inclined to set an
> > > > achievable small step to see how well it works.  My thought would be
> to
> > > > focus on the upcoming all-in-one installer and the automated doc
> > > generated
> > >

Re: [openstack-dev] [docs] Automating documentation the tripleo way?

2018-05-16 Thread Wesley Hayutin
On Wed, May 16, 2018 at 2:41 PM Doug Hellmann  wrote:

> Excerpts from Petr Kovar's message of 2018-05-16 17:39:14 +0200:
> > Hi all,
> >
> > In the past few years, we've seen several efforts aimed at automating
> > procedural documentation, mostly centered around the OpenStack
> > installation guide. This idea to automatically produce and verify
> > installation steps or similar procedures was mentioned again at the last
> > Summit (https://etherpad.openstack.org/p/SYD-install-guide-testing).
> >
> > It was brought to my attention that the tripleo team has been working on
> > automating some of the tripleo deployment procedures, using a Bash script
> > with included comment lines to supply some RST-formatted narrative, for
> > example:
> >
> >
> https://github.com/openstack/tripleo-quickstart-extras/blob/master/roles/overcloud-prep-images/templates/overcloud-prep-images.sh.j2
> >
> > The Bash script can then be converted to RST, e.g.:
> >
> >
> https://thirdparty.logs.rdoproject.org/jenkins-tripleo-quickstart-queens-rdo_trunk-baremetal-dell_fc430_envB-single_nic_vlans-27/docs/build/
> >
> > Source Code:
> >
> >
> https://github.com/openstack/tripleo-quickstart-extras/tree/master/roles/collect-logs
> >
> > I really liked this approach and while I don't want to sound like selling
> > other people's work, I'm wondering if there is still an interest among
> the
> > broader OpenStack community in automating documentation like this?
> >
> > Thanks,
> > pk
> >
>
> Weren't the folks doing the training-labs or training-guides taking a
> similar approach? IIRC, they ended up implementing what amounted to
> their own installer for OpenStack, and then ended up with all of the
> associated upgrade and testing burden.
>
> I like the idea of trying to use some automation from this, but I wonder
> if we'd be better off extracting data from other tools, rather than
> building a new one.
>
> Doug
>

So there really isn't anything new to create; the work is done and executed
on every tripleo change that runs in rdo-cloud.

Instead of dismissing the idea upfront, I'm more inclined to set an
achievable small step to see how well it works.  My thought would be to
focus on the upcoming all-in-one installer and the automated doc generated
with that workflow.  I'd like to target publishing the all-in-one tripleo
installer doc to [1] for Stein, and of course a section of tripleo.org.
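
For anyone who has not looked at how the generated doc is produced, the
input is just a shell script with the RST narrative carried in comments,
roughly like the sketch below (the start/stop markers here are illustrative;
the exact ones are whatever the collect-logs role keys on):

    ### ---start_docs
    ## Upload the overcloud images
    ## ===========================
    ## * Load the images into the undercloud's glance
    ## ::

    openstack overcloud image upload

    ### ---stop_docs

The tooling then strips the comment markers, keeps the narrative plus the
commands that actually ran, and renders the result as the RST you see on the
log server.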

What do you think?

[1] https://docs.openstack.org/queens/deploy/



>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>


Re: [openstack-dev] [ci][infra][tripleo] Multi-staged check pipelines for Zuul v3 proposal

2018-05-15 Thread Wesley Hayutin
On Tue, May 15, 2018 at 1:29 PM James E. Blair  wrote:

> Jeremy Stanley  writes:
>
> > On 2018-05-15 09:40:28 -0700 (-0700), James E. Blair wrote:
> > [...]
> >> We're also talking about making a new kind of job which can continue to
> >> run after it's "finished" so that you could use it to do something like
> >> host a container registry that's used by other jobs running on the
> >> change.  We don't have that feature yet, but if we did, would you prefer
> >> to use that instead of the intermediate swift storage?
> >
> > If the subsequent jobs depending on that one get nodes allocated
> > from the same provider, that could solve a lot of the potential
> > network performance risks as well.
>
> That's... tricky.  We're *also* looking at affinity for buildsets, and
> I'm optimistic we'll end up with something there eventually, but that's
> likely to be a more substantive change and probably won't happen as
> soon.  I do agree it will be nice, especially for use cases like this.
>
> -Jim
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


There is a lot here to unpack and discuss, but I really like the ideas I'm
seeing.
Nice work Bogdan!  I've added it to the tripleo meeting agenda for next week
so we can continue socializing the idea and get feedback.

Thanks!

https://etherpad.openstack.org/p/tripleo-meeting-items


Re: [openstack-dev] [ci][infra][tripleo] Multi-staged check pipelines for Zuul v3 proposal

2018-05-15 Thread Wesley Hayutin
On Tue, May 15, 2018 at 11:42 AM Jeremy Stanley  wrote:

> On 2018-05-15 17:31:07 +0200 (+0200), Bogdan Dobrelya wrote:
> [...]
> > * upload into a swift container, with an automatic expiration set, the
> > de-duplicated and compressed tarball created with something like:
> >   # docker save $(docker images -q) | gzip -1 > all.tar.xz
> > (I expect it will be something like a 2G file)
> > * something similar for DLRN repos prolly, I'm not an expert for this
> part.
> >
> > Then those stored artifacts to be picked up by the next step in the
> graph,
> > deploying undercloud and overcloud in the single step, like:
> > * fetch the swift containers with repos and container images
> [...]
>
> I do worry a little about network fragility here, as well as
> extremely variable performance. Randomly-selected job nodes could be
> shuffling those files halfway across the globe so either upload or
> download (or both) will experience high round-trip latency as well
> as potentially constrained throughput, packet loss,
> disconnects/interruptions and so on... all the things we deal with
> when trying to rely on the Internet, except magnified by the
> quantity of data being transferred about.
>
> Ultimately still worth trying, I think, but just keep in mind it may
> introduce more issues than it solves.
> --
> Jeremy Stanley
>

Question...  If we were to build or update the containers that need an
update (and I'm assuming the overcloud images here as well) in a parent job,
would the content then sync to a swift file server at a central point for
ALL the openstack providers, or would it be sync'd to each cloud?

Not to throw too much cold water on the idea, but...
I wonder if the time to upload and download the containers and images would
significantly reduce any advantage this process has.
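
Before we commit to it I'd want rough numbers for the artifact round trip,
something like the sketch below run from a job node (the container set and
swift target are illustrative):

    # how expensive is it just to package and move the container set around?
    cd /tmp
    time sh -c 'docker save $(docker images -q) | gzip -1 > all.tar.gz'
    time swift upload ci-artifacts all.tar.gz
    time swift download ci-artifacts all.tar.gz
    time sh -c 'gunzip -c all.tar.gz | docker load'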

Although centralizing the container updates and images on a per check job
basis sounds attractive, I get the sense we need to be very careful and
fully vet the idea.  At the moment it's also an optimization ( maybe ) so
I don't see this as a very high priority atm.

Let's bring the discussion to the tripleo meeting next week.  Thanks all!



> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [ci][infra][tripleo] Multi-staged check pipelines for Zuul v3 proposal

2018-05-15 Thread Wesley Hayutin
On Mon, May 14, 2018 at 3:16 PM Sagi Shnaidman  wrote:

> Hi, Bogdan
>
> I like the idea with undercloud job. Actually if undercloud fails, I'd
> stop all other jobs, because it doens't make sense to run them. Seeing the
> same failure in 10 jobs doesn't add too much. So maybe adding undercloud
> job as dependency for all multinode jobs would be great idea. I think it's
> worth to check also how long it will delay jobs. Will all jobs wait until
> undercloud job is running? Or they will be aborted when undercloud job is
> failing?
>
> However I'm very sceptical about multinode containers and scenarios jobs,
> they could fail because of very different reasons, like race conditions in
> product or infra issues. Having skipping some of them will lead to more
> rechecks from devs trying to discover all problems in a row, which will
> delay the development process significantly.
>
> Thanks
>

I agree on both counts w/ Sagi here.
Thanks Sagi

>
>
> On Mon, May 14, 2018 at 7:15 PM, Bogdan Dobrelya 
> wrote:
>
>> An update for your review please folks
>>
>> Bogdan Dobrelya  writes:
>>>
>>> Hello.
 As Zuul documentation [0] explains, the names "check", "gate", and
 "post"  may be altered for more advanced pipelines. Is it doable to
 introduce, for particular openstack projects, multiple check
 stages/steps as check-1, check-2 and so on? And is it possible to make
 the consequent steps reusing environments from the previous steps
 finished with?

 Narrowing down to tripleo CI scope, the problem I'd want we to solve
 with this "virtual RFE", and using such multi-staged check pipelines,
 is reducing (ideally, de-duplicating) some of the common steps for
 existing CI jobs.

>>>
>>> What you're describing sounds more like a job graph within a pipeline.
>>> See:
>>> https://docs.openstack.org/infra/zuul/user/config.html#attr-job.dependencies
>>> for how to configure a job to run only after another job has completed.
>>> There is also a facility to pass data between such jobs.
>>>
>>> ... (skipped) ...
>>>
>>> Creating a job graph to have one job use the results of the previous job
>>> can make sense in a lot of cases.  It doesn't always save *time*
>>> however.
>>>
>>> It's worth noting that in OpenStack's Zuul, we have made an explicit
>>> choice not to have long-running integration jobs depend on shorter pep8
>>> or tox jobs, and that's because we value developer time more than CPU
>>> time.  We would rather run all of the tests and return all of the
>>> results so a developer can fix all of the errors as quickly as possible,
>>> rather than forcing an iterative workflow where they have to fix all the
>>> whitespace issues before the CI system will tell them which actual tests
>>> broke.
>>>
>>> -Jim
>>>
>>
>> I proposed a few zuul dependencies [0], [1] to tripleo CI pipelines for
>> undercloud deployments vs upgrades testing (and some more). Given that
>> those undercloud jobs have not so high fail rates though, I think Emilien
>> is right in his comments and those would buy us nothing.
>>
>> From the other side, what do you think folks of making the
>> tripleo-ci-centos-7-3nodes-multinode depend on
>> tripleo-ci-centos-7-containers-multinode [2]? The former seems quite faily
>> and long running, and is non-voting. It deploys (see featuresets configs
>> [3]*) a 3 nodes in HA fashion. And it seems almost never passing, when the
>> containers-multinode fails - see the CI stats page [4]. I've found only a 2
>> cases there for the otherwise situation, when containers-multinode fails,
>> but 3nodes-multinode passes. So cutting off those future failures via the
>> dependency added, *would* buy us something and allow other jobs to wait
>> less to commence, by a reasonable price of somewhat extended time of the
>> main zuul pipeline. I think it makes sense and that extended CI time will
>> not overhead the RDO CI execution times so much to become a problem. WDYT?
>>
>> [0] https://review.openstack.org/#/c/568275/
>> [1] https://review.openstack.org/#/c/568278/
>> [2] https://review.openstack.org/#/c/568326/
>> [3]
>> https://docs.openstack.org/tripleo-quickstart/latest/feature-configuration.html
>> [4] http://tripleo.org/cistatus.html
>>
>> * ignore the column 1, it's obsolete, all CI jobs now using configs
>> download AFAICT...
>>
>> --
>> Best regards,
>> Bogdan Dobrelya,
>> Irc #bogdando
>>
>> __
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe:
>> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>
>
>
> --
> Best regards
> Sagi Shnaidman
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe

Re: [openstack-dev] [docs] Automating documentation the tripleo way?

2018-05-16 Thread Wesley Hayutin
On Wed, May 16, 2018 at 3:05 PM Doug Hellmann  wrote:

> Excerpts from Wesley Hayutin's message of 2018-05-16 12:51:25 -0600:
> > On Wed, May 16, 2018 at 2:41 PM Doug Hellmann 
> wrote:
> >
> > > Excerpts from Petr Kovar's message of 2018-05-16 17:39:14 +0200:
> > > > Hi all,
> > > >
> > > > In the past few years, we've seen several efforts aimed at automating
> > > > procedural documentation, mostly centered around the OpenStack
> > > > installation guide. This idea to automatically produce and verify
> > > > installation steps or similar procedures was mentioned again at the
> last
> > > > Summit (https://etherpad.openstack.org/p/SYD-install-guide-testing).
> > > >
> > > > It was brought to my attention that the tripleo team has been
> working on
> > > > automating some of the tripleo deployment procedures, using a Bash
> script
> > > > with included comment lines to supply some RST-formatted narrative,
> for
> > > > example:
> > > >
> > > >
> > >
> https://github.com/openstack/tripleo-quickstart-extras/blob/master/roles/overcloud-prep-images/templates/overcloud-prep-images.sh.j2
> > > >
> > > > The Bash script can then be converted to RST, e.g.:
> > > >
> > > >
> > >
> https://thirdparty.logs.rdoproject.org/jenkins-tripleo-quickstart-queens-rdo_trunk-baremetal-dell_fc430_envB-single_nic_vlans-27/docs/build/
> > > >
> > > > Source Code:
> > > >
> > > >
> > >
> https://github.com/openstack/tripleo-quickstart-extras/tree/master/roles/collect-logs
> > > >
> > > > I really liked this approach and while I don't want to sound like
> selling
> > > > other people's work, I'm wondering if there is still an interest
> among
> > > the
> > > > broader OpenStack community in automating documentation like this?
> > > >
> > > > Thanks,
> > > > pk
> > > >
> > >
> > > Weren't the folks doing the training-labs or training-guides taking a
> > > similar approach? IIRC, they ended up implementing what amounted to
> > > their own installer for OpenStack, and then ended up with all of the
> > > associated upgrade and testing burden.
> > >
> > > I like the idea of trying to use some automation from this, but I
> wonder
> > > if we'd be better off extracting data from other tools, rather than
> > > building a new one.
> > >
> > > Doug
> > >
> >
> > So there really isn't anything new to create, the work is done and
> executed
> > on every tripleo change that runs in rdo-cloud.
>
> It wasn't clear what Petr was hoping to get. Deploying with TripleO is
> only one way to deploy, so we wouldn't be able to replace the current
> installation guides with the results of this work. It sounds like that's
> not the goal, though.
>
> >
> > Instead of dismissing the idea upfront I'm more inclined to set an
> > achievable small step to see how well it works.  My thought would be to
> > focus on the upcoming all-in-one installer and the automated doc
> generated
> > with that workflow.  I'd like to target publishing the all-in-one tripleo
> > installer doc to [1] for Stein and of course a section of tripleo.org.
>
> As an official project, why is TripleO still publishing docs to its own
> site? That's not something we generally encourage.
>
> That said, publishing a new deployment guide based on this technique
> makes sense in general. What about Ben's comments elsewhere in the
> thread?
>

I think Ben is referring to an older implementation and a slightly
different design, but it still has some points that we would want to be
mindful of.  I think this is a worthy effort to take another pass at
regardless, to be honest, as we've found a good combination of interested
folks and sometimes the right people make all the difference.

My personal opinion is that I'm not expecting the automated doc generation
to be upload-ready for a doc server after each run.  I do expect it to do
95% of the work, and to help keep the doc up to date with what is executed
in the latest releases of TripleO.  Also note the doc used is a mixture
of static and generated documentation, which I think worked out quite well
in order to not solely rely on what is executed in ci.

So again, my thought is to create a small achievable goal and see where the
collaboration takes us.

Thanks


>
> Doug
>
> >
> > What do you think?
> >
> > [1] https://docs.openstack.org/queens/deploy/
> >
> > >
> > >
> __
> > > OpenStack Development Mailing List (not for usage questions)
> > > Unsubscribe:
> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> > > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> > >
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>

Re: [openstack-dev] [tripleo] tripleo gate is blocked - please read

2018-06-16 Thread Wesley Hayutin
On Sat, Jun 16, 2018 at 10:21 AM Paul Belanger 
wrote:

> On Sat, Jun 16, 2018 at 12:47:10PM +, Jeremy Stanley wrote:
> > On 2018-06-15 23:15:01 -0700 (-0700), Emilien Macchi wrote:
> > [...]
> > > ## Dockerhub proxy issue
> > > Infra using wrong image layer object storage proxy for Dockerhub:
> > > https://review.openstack.org/#/c/575787/
> > > Huge thanks to infra team, specially Clark for fixing this super
> quickly,
> > > it clearly helped to stabilize our container jobs, I actually haven't
> seen
> > > timeouts since we merged your patch. Thanks a ton!
> > [...]
> >
> > As best we can tell from logs, the way Dockerhub served these images
> > changed a few weeks ago (at the end of May) leading to this problem.
> > --
> > Jeremy Stanley
>
> Should also note what we are doing here is a terrible hack, we've only
> been able
> to learn the information by sniffing the traffic to hub.docker.io for our
> reverse
> proxy cache configuration. It is also possible this can break in the
> future too,
> so something to always keep in the back of your mind.
>

Thanks Paul, Jeremy and the other infra folks involved.  The TripleO CI
team is working towards tracking the time spent on some of these container
tasks atm.  Thanks for doing what you could given the circumstances.


>
> It would be great if docker tools just worked with HTTP proxies.
>
> -Paul
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>


[openstack-dev] [tripleo] scenario000-multinode-oooq-container-upgrades

2018-06-11 Thread Wesley Hayutin
Greetings,

I wanted to let everyone know that we have a keystone-only deployment and
upgrade job in check, non-voting.  I'm asking everyone in TripleO to be
mindful of this job and to help make sure it continues to pass as we move
it from non-voting check to voting check and eventually gating.

Upgrade jobs are particularly difficult to keep running successfully
because of the complex workflow itself, job run times and other factors.
Your help to ensure we don't merge w/o a pass on this job will go a long
way in helping the tripleo upgrades team.

There is still work to be done here, however it's much easier to do it with
the check non-voting job in place.

Thanks all


Re: [openstack-dev] [tripleo] CI is down stop workflowing

2018-06-19 Thread Wesley Hayutin
Check and gate jobs look clear.
More details in a bit.

Thanks

Sent from my mobile

On Tue, Jun 19, 2018, 07:33 Felix Enrique Llorente Pastora <
ellor...@redhat.com> wrote:

> Hi,
>
>We have the following bugs with fixes that need to land to unblock
> check/gate jobs:
>
>https://bugs.launchpad.net/tripleo/+bug/1777451
>https://bugs.launchpad.net/tripleo/+bug/1777616
>
>You can check them out at #tripleo ooolpbot.
>
>Please stop workflowing temporally until they get merged.
>
> BR.
>
> --
> Quique Llorente
>
> Openstack TripleO CI
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>


[openstack-dev] [tripleo] zuul change gating repo name change

2018-06-13 Thread Wesley Hayutin
Greetings,

Please be aware the yum repo created in tripleo ci jobs is going to change
its name to include the release [1].  This is done to ensure that only the
appropriate patches are installed when patches from multiple branches are
in play.  This is especially important for upgrade jobs.
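
In practice that means anything that looks up the repo file or repo id by
name needs to tolerate the release suffix; the sketch below shows the shape
of the change (the exact file and repo names come from the patch itself, the
ones here are illustrative):

    # before: a single name regardless of branch
    ls /etc/yum.repos.d/gating.repo

    # after: the release is embedded in the name, e.g. on a queens change
    ls /etc/yum.repos.d/gating-queens.repo
    yum --disablerepo='*' --enablerepo='gating-queens' repolist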

If you are working on a patch that uses the gating.repo, this patch [1]
will impact your work.

Thank you!!

[1] https://review.openstack.org/#/c/572736/


Re: [openstack-dev] [tripleo] scenario000-multinode-oooq-container-upgrades

2018-06-12 Thread Wesley Hayutin
On Tue, Jun 12, 2018 at 11:21 AM James Slagle 
wrote:

> On Tue, Jun 12, 2018 at 11:03 AM, Jiří Stránský  wrote:
> > On 12.6.2018 15:06, James Slagle wrote:
> >>
> >> On Mon, Jun 11, 2018 at 3:34 PM, Wesley Hayutin 
> >> wrote:
> >>>
> >>> Greetings,
> >>>
> >>> I wanted to let everyone know that we have a keystone only deployment
> and
> >>> upgrade job in check non-voting.  I'm asking everyone in TripleO to be
> >>> mindful of this job and to help make sure it continues to pass as we
> move
> >>> it
> >>> from non-voting check to check and eventually gating.
> >>
> >>
> >> +1, nice work!
> >>
> >>> Upgrade jobs are particularly difficult to keep running successfully
> >>> because
> >>> of the complex workflow itself, job run times and other factors.  Your
> >>> help
> >>> to ensure we don't merge w/o a pass on this job will go a long way in
> >>> helping the tripleo upgrades team.
> >>>
> >>> There is still work to be done here, however it's much easier to do it
> >>> with
> >>> the check non-voting job in place.
> >>
> >>
> >> The job doesn't appear to be passing at all on stable/queens. I see
> >> this same failure on several patches:
> >>
> >>
> http://logs.openstack.org/59/571459/1/check/tripleo-ci-centos-7-scenario000-multinode-oooq-container-upgrades/8bbd827/logs/undercloud/home/zuul/overcloud_upgrade_run_Controller.log.txt.gz
> >>
> >> Is this a known issue?
> >
> >
> > I think so, or to put it precisely, i only ever looked into making the
> job
> > work for master (and beyond).
> >
> > We could look into making it work on Queens too, but personally i think
> > effort would be better spent elsewhere at this point. E.g. upd+upg jobs
> > with a more complete set of services utilizing containerized undercloud
> > (those would
> > not validate OC workflow at all, but would give coverage for
> > update_tasks/upgrade_tasks), user and dev docs around all lifecycle ops
> > (upd, upg, ffwd), upgrade work in the area of TLS by default, upgrade
> > handling for external_deploy_tasks (= "how do we upgrade Ceph in Rocky"),
> > also perhaps trying to DRY repeated parts of upgrade templates, etc.
> >
> > If someone wants to step up to iron out Queens issues with that job then
> we
> > can do it, but my 2 cents would be just to disable the job on Queens and
> > focus on the future.
>
> Sure, I'm just trying to figure out what can safely be ignored. The
> tone of the original email was encouraging reviewers not to ignore the
> job. Let's remove it from queens then, as right now it's just noise.
>

I think we missed a patch [1] to correctly set the release for the job.
I'll take a look at the results.

I may have jumped the gun w/ the tone of the email w/ regards to keeping it
running.  I'll make the adjustment on queens for now [2].

Thanks for catching that James, Jirka!

[1] https://review.openstack.org/#/c/574417/
[2] https://review.openstack.org/574794


>
>
>
> --
> -- James Slagle
> --
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] Status of Standalone installer (aka All-In-One)

2018-06-05 Thread Wesley Hayutin
On Tue, Jun 5, 2018 at 3:31 AM Raoul Scarazzini  wrote:

> On 05/06/2018 02:26, Emilien Macchi wrote:
> [...]
> > I hope this update was useful, feel free to give feedback or ask any
> > questions,
> [...]
>
> I'm no prophet here, but I see a bright future for this approach. I can
> imagine how useful this can be on the testing and much more the learning
> side. Thanks for sharing!
>
> --
> Raoul Scarazzini
> ra...@redhat.com


Real big +1 to everyone who has contributed to the standalone
installer.
From an end user's experience, this is simple and fast! This is going to be
the base for some really cool work.

Emilien, the CI is working, enjoy your PTO :)
http://logs.openstack.org/17/572217/6/check/tripleo-ci-centos-7-standalone/b2eb1b7/logs/ara_oooq/result/bb49965e-4fb7-43ea-a9e3-c227702c17de/

Thanks!



>
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Overriding project-templates in Zuul

2018-05-01 Thread Wesley Hayutin
On Tue, May 1, 2018 at 1:23 PM Emilien Macchi  wrote:

> On Tue, May 1, 2018 at 10:02 AM, James E. Blair 
> wrote:
> [...]
>
> Okay, let's summarize:
>>
>> Proposal 1: All project-template and project-local job variants matching
>> the item's branch must also match the item.
>>
>> * Files and irrelevant-files on project-template and project stanzas are
>>   essentially combined in a set intersection.
>> * It's possible to further reduce the scope of jobs, but not expand.
>> * Files and irrelevant-files are still independent matchers, and if both
>>   are present, both must match.
>> * It's not possible to alter a job attribute by adding a project-local
>>   variant with only a files matcher (it would cause the whole job to run
>>   or not run).  But it's still possible to do that in the main job
>>   definition itself.
>>
>> Proposal 2: Files and irrelevant-files are treated as overwriteable
>> attributes and evaluated after branch-matching variants are combined.
>>
>> * Files and irrelevant-files are overwritten, so the last value
>>   encountered when combining all the matching variants (looking only at
>>   branches) wins.
>> * Files and irrelevant-files will be treated as a pair, so that if
>>   "irrelevant-files" appears, it will erase a previous "files"
>>   attribute.
>> * It's possible to both reduce and expand the scope of jobs, but the
>>   user may need to manually copy values from a parent or other variant
>>   in order to do so.
>> * It will no longer be possible to alter a job attribute by adding a
>>   variant with only a files matcher -- in all cases files and
>>   irrelevant-files are used solely to determine whether the job is run,
>>   not to determine whether to apply a variant.
>>
>> I think both would be good solutions to the problem.  The key points for
>> me are whether we want to keep the "alter a job attribute with variant
>> with a files matcher" functionality (the "rebuild_index" example from
>> above), and whether the additional control of overwriting the matchers
>> (at the cost of redundancy in configuration) is preferable to combining
>> the matchers.
>>
>
> In the case of TripleO, I think proposal 2 is what we want.
> We have stanzas defined in the templates definitions in
> openstack-infra/tripleo-ci repo, but really want to override the file rules
> per repo (openstack/tripleo-quickstart for example) and I don't think we
> want to have them both matching but so the last value encountered would win.
> I'll let TripleO CI squad to give more thoughts though.
>
> Thanks,
> --
> Emilien Macchi
>

I agree,
Proposal #2 makes the most sense to me and seems more straightforward: you
have the ability to override, and the project doing the overriding would
need to handle both files and irrelevant-files from scratch.

Nice write up
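
To make that concrete, here is a rough sketch of what proposal 2 would mean
for us.  The template, job name, and file patterns below are invented for
illustration; they are not our real definitions:

  # Hypothetical template defined centrally, e.g. in openstack-infra/tripleo-ci.
  - project-template:
      name: tripleo-multinode-container-minimal
      check:
        jobs:
          - tripleo-ci-centos-7-containers-multinode:
              irrelevant-files:
                - ^.*\.md$
                - ^doc/.*$

  # A consuming repo such as openstack/tripleo-quickstart overwrites the
  # matcher pair entirely; the last value wins, so it has to restate any
  # patterns it still wants excluded.
  - project:
      check:
        jobs:
          - tripleo-ci-centos-7-containers-multinode:
              irrelevant-files:
                - ^.*\.md$
                - ^doc/.*$
                - ^releasenotes/.*$

The redundancy of re-listing the template's patterns is the cost Jim called
out, but it keeps each repo's effective file rules explicit.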



> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [tripleo] TripleO Tempest squad status 07/03/2018

2018-07-03 Thread Wesley Hayutin
Greetings,

The TripleO Tempest squad has just completed Sprint 15 (6/15 - 7/03).
The following is a summary of activities during this sprint.

Epic:
# Sprint 15 Epic (Tempest Squad): Finish the refactoring of the core
OpenStack services in python-tempestconf, refstack certification tests, and
other miscellaneous work.  Chandan was on PTO most of the sprint, Arx was
active helping to resolve tempest issues, and Martin was focused on
refstack certifications, so progress was a little lower than normal this
sprint.

For a list of the completed and remaining items for the sprint, please refer
to the following Epic card and its sub-cards.
https://trello.com/c/6QKG0HkU/801-sprint-15-python-tempestconf

Items to Note:
* Full runs of tempest are again fully passing in upstream master, queens.
Pike will be unblocked when https://review.openstack.org/#/c/579937/ merges.
* Chandan has volunteered to ruck / rove this sprint, so the team will
again be operating with only two active team members.
* New documentation was created for containerized tempest (a rough sketch of
a manual run follows this list):
   * https://docs.openstack.org/tripleo-docs/latest/install/basic_deployment/tempest.html#running-containerized-tempest-manually
   * Look for an upstream discussion around moving as much tempest
documentation to the tempest project as possible.
* Sprint 16 is the final sprint that will focus on refactoring
python-tempestconf.
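
As a teaser for the containerized tempest docs linked above, a manual run
boils down to launching the tempest image against the deployed cloud --
roughly the sketch below (the image tag, volume path, and tempest command
are assumptions on my part; the documentation is the authoritative source):

  # Rough ansible sketch, not the documented procedure.
  - name: Run tempest from the tripleomaster tempest container
    docker_container:
      name: tempest
      image: docker.io/tripleomaster/centos-binary-tempest:current-tripleo
      detach: false
      volumes:
        - /home/stack/tempest:/home/stack/tempest:z
      working_dir: /home/stack/tempest
      command: tempest run --smoke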
-- 

Wes Hayutin

Associate Manager

Red Hat

w hayu...@redhat.com    T: +1919 <+19197544114> 4232509    IRC: weshay

View my calendar and check my availability for meetings HERE

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] TripleO CI squad status 7/03/2018

2018-07-03 Thread Wesley Hayutin
Apologies,
The dark theme in my browser changed the font color of the text.

The email is available at
http://lists.openstack.org/pipermail/openstack-dev/2018-July/131984.html
Thank you!


On Tue, Jul 3, 2018 at 7:13 PM Wesley Hayutin  wrote:

> Greetings
>
> The TripleO CI squad has just completed Sprint 15 (6/15 - 7/03).
> The following is a summary of activities during this sprint.
>
> Epic:
> # Sprint 15 Epic (CI Squad): Begin migration of upstream jobs to native
> zuulv3.
> For a list of the completed and remaining items for the sprint please
> refer to the following Epic card and the sub cards.
> https://trello.com/c/bQuQ9aWF/802-sprint-15-ci-goals
>
> Items to Note:
> * Timeouts in jobs are a recurring issue upstream.  How to handle and fix
> the timeouts is under discussion.  Note, containers may be contributing to
> the timeouts.
>
> Ruck / Rover:
>
> TripleO Master, 0 days since last promotion
> TripleO Queens, 2 days since last promotion
> TripleO Pike, 20 days since last promotion
>* This is failing in tempest and should be resolved with
> https://review.openstack.org/#/c/579937/
>
> https://review.rdoproject.org/etherpad/p/ruckrover-sprint15
>
>
> CRITICAL IN PROGRESS
> #1779561 No realm key for 'realm1'
> tripleo  Assignee: None  Reporter: wes hayutin  2 days old  Tags:
> promotion-blocker   6
> CRITICAL IN PROGRESS
> #1779271 periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp-featureset020-queens
> Details: volume c414293d-eb0f-4d74-8b4d-f9a15e23d399 failed to reach in-use
> status (current available) within the required time (500 s).
> tripleo  Assignee: yatin  Reporter: Quique Llorente  4 days old  Tags:
> promotion-blocker   14
> CRITICAL FIX RELEASED
> #1779263 "AnsibleUndefinedVariable: 'dict object' has no attribute
> 'overcloud'"} at
> periodic-tripleo-ci-centos-7-multinode-1ctlr-featureset010-master
> tripleo  Assignee: Quique Llorente  Reporter: Quique Llorente  4 days old
> Tags: promotion-blocker   6
> CRITICAL FIX RELEASED
> #1778847 fs027 __init__() got an unexpected keyword argument 'cafile'
> tripleo  Assignee: wes hayutin  Reporter: Quique Llorente  6 days old
> Tags: promotion-blocker  quickstart  6
> CRITICAL FIX RELEASED
> #1778472 docker pull failed: Get
> https://registry-1.docker.io/v2/tripleomaster/centos-binary-rsyslog-base/manifests/current-tripleo:
> received unexpected HTTP status: 503 Service Unavailable
> tripleo  Assignee: Quique Llorente  Reporter: Quique Llorente  8 days old
> Tags: alert  ci  promotion-blocker   6
> CRITICAL FIX RELEASED
> #1778201 os-refresh-config undercloud install Error: Evaluation Error:
> Error while evaluating a Function Call, pick(): must receive at least one
> non empty
> tripleo  Assignee: Quique Llorente  Reporter: Quique Llorente  11 days
> old  Tags: ci  promotion-blocker   6
> CRITICAL FIX RELEASED
> #1778040 Error at overcloud_prep_containers Package:
> qpid-dispatch-router-0.8.0-1.el7.x86_64 (@delorean-master-testing)", "
> Requires: libqpid-proton.so.10()(64bit)
> tripleo  Assignee: Quique Llorente  Reporter: Quique Llorente  12 days old
> Tags: alert  ci  promotion-blocker  quickstart   10
> CRITICAL FIX RELEASED
> #159 pike, volume failed to build in error status. list index out of
> range in cinder
> tripleo  Assignee: wes hayutin  Reporter: wes hayutin  13 days old  Tags:
> alert  promotion-blocker   12
> CRITICAL FIX RELEASED
> #1777616 Undercloud installation is failing: Class[Neutron]: has no
> parameter named 'rabbit_hosts'
> tripleo  Assignee: yatin  Reporter: yatin  14 days old  Tags: alert
> promotion-blocker   6
> CRITICAL FIX RELEASED
> #1777541 undercloud install error, mistra 503 unavailable
> tripleo  Assignee: Alex Schultz  Reporter: wes hayutin  14 days old  Tags:
> alert  promotion-blocker   10
> CRITICAL FIX RELEASED
> #1777451 Error: /Stage[main]/Ceph::Rgw::Keystone::Auth/Keystone_role
> Duplicate entry found with name Member
> tripleo  Assignee: Quique Llorente  Reporter: wes hayutin  15 days old
> Tags: promotion-blocker   18
> CRITICAL FIX RELEASED
> #1777261 convert-overcloud-undercloud.yml fails on missing
> update_containers variable
> tripleo  Assignee: Sagi (Sergey) Shnaidman  Reporter: wes hayutin  17 days
> old  Tags: promotion-blocker   6
> CRITICAL FIX RELEASED
> #1777168 Failures to build python-networking-ovn
> tripleo  Assignee: Emilien Macchi  Reporter: Emilien Macchi  18 days old
> Tags: alert  ci  promotion-blocker   6
> CRITICAL FIX RELEASED
> #1777130 RDO cloud is down
> tripleo  Assignee: Quique Llorente  Reporter: Quique Llorente  18 days
> old  Tags: alert  promotion-blocker
> --
>
> Wes Hayutin
>
> Associate Manager
>
> Red Hat
>
>

[openstack-dev] [tripleo] TripleO CI squad status 7/03/2018

2018-07-03 Thread Wesley Hayutin
Greetings

The TripleO CI squad has just completed Sprint 15 (6/15 - 7/03).
The following is a summary of activities during this sprint.

Epic:
# Sprint 15 Epic (CI Squad): Begin migration of upstream jobs to native
zuulv3.
For a list of the completed and remaining items for the sprint, please refer
to the following Epic card and its sub-cards.
https://trello.com/c/bQuQ9aWF/802-sprint-15-ci-goals

Items to Note:
* Timeouts in jobs are a recurring issue upstream.  How to handle and fix
the timeouts is under discussion.  Note, containers may be contributing to
the timeouts.

Ruck / Rover:

TripleO Master, 0 days since last promotion
TripleO Queens, 2 days since last promotion
TripleO Pike, 20 days since last promotion
   * This is failing in tempest and should be resolved with
https://review.openstack.org/#/c/579937/

https://review.rdoproject.org/etherpad/p/ruckrover-sprint15


CRITICAL IN PROGRESS
#1779561 No realm key for 'realm1'
tripleo  Assignee: None  Reporter: wes hayutin  2 days old  Tags:
promotion-blocker   6
CRITICAL IN PROGRESS
#1779271 periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp-featureset020-queens
Details: volume c414293d-eb0f-4d74-8b4d-f9a15e23d399 failed to reach in-use
status (current available) within the required time (500 s).
tripleo  Assignee: yatin  Reporter: Quique Llorente  4 days old  Tags:
promotion-blocker   14
CRITICAL FIX RELEASED
#1779263 "AnsibleUndefinedVariable: 'dict object' has no attribute
'overcloud'"} at
periodic-tripleo-ci-centos-7-multinode-1ctlr-featureset010-master
tripleo  Assignee: Quique Llorente  Reporter: Quique Llorente  4 days old
Tags: promotion-blocker   6
CRITICAL FIX RELEASED
#1778847 fs027 __init__() got an unexpected keyword argument 'cafile'
tripleo  Assignee: wes hayutin  Reporter: Quique Llorente  6 days old
Tags: promotion-blocker  quickstart  6
CRITICAL FIX RELEASED
#1778472 docker pull failed: Get
https://registry-1.docker.io/v2/tripleomaster/centos-binary-rsyslog-base/manifests/current-tripleo:
received unexpected HTTP status: 503 Service Unavailable
tripleo  Assignee: Quique Llorente  Reporter: Quique Llorente  8 days old
Tags: alert  ci  promotion-blocker   6
CRITICAL FIX RELEASED
#1778201 os-refresh-config undercloud install Error: Evaluation Error:
Error while evaluating a Function Call, pick(): must receive at least one
non empty
tripleo  Assignee: Quique Llorente  Reporter: Quique Llorente  11 days old
Tags: ci  promotion-blocker   6
CRITICAL FIX RELEASED
#1778040 Error at overcloud_prep_containers Package:
qpid-dispatch-router-0.8.0-1.el7.x86_64 (@delorean-master-testing)", "
Requires: libqpid-proton.so.10()(64bit)
tripleo  Assignee: Quique Llorente  Reporter: Quique Llorente  12 days old
Tags: alert  ci  promotion-blocker  quickstart   10
CRITICAL FIX RELEASED
#159 pike, volume failed to build in error status. list index out of
range in cinder
tripleo  Assignee: wes hayutin  Reporter: wes hayutin  13 days old  Tags:
alert  promotion-blocker   12
CRITICAL FIX RELEASED
#1777616 Undercloud installation is failing: Class[Neutron]: has no
parameter named 'rabbit_hosts'
tripleo  Assignee: yatin  Reporter: yatin  14 days old  Tags: alert
promotion-blocker   6
CRITICAL FIX RELEASED
#1777541 undercloud install error, mistra 503 unavailable
tripleo  Assignee: Alex Schultz  Reporter: wes hayutin  14 days old  Tags:
alert  promotion-blocker   10
CRITICAL FIX RELEASED
#1777451 Error: /Stage[main]/Ceph::Rgw::Keystone::Auth/Keystone_role
Duplicate entry found with name Member
tripleo  Assignee: Quique Llorente  Reporter: wes hayutin  15 days old
Tags: promotion-blocker   18
CRITICAL FIX RELEASED
#1777261 convert-overcloud-undercloud.yml fails on missing
update_containers variable
tripleo  Assignee: Sagi (Sergey) Shnaidman  Reporter: wes hayutin  17 days
old  Tags: promotion-blocker   6
CRITICAL FIX RELEASED
#1777168 Failures to build python-networking-ovn
tripleo  Assignee: Emilien Macchi  Reporter: Emilien Macchi  18 days old
Tags: alert  ci  promotion-blocker   6
CRITICAL FIX RELEASED
#1777130 RDO cloud is down
tripleo  Assignee: Quique Llorente  Reporter: Quique Llorente  18 days old
Tags: alert  promotion-blocker
-- 

Wes Hayutin

Associate Manager

Red Hat

w hayu...@redhat.com    T: +1919 <+19197544114> 4232509    IRC: weshay

View my calendar and check my availability for meetings HERE

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] tripleo-upgrade pike branch

2018-01-19 Thread Wesley Hayutin
Thanks Marius for sending this out and kicking off a conversation.

On Tue, Jan 2, 2018 at 12:56 PM, Marius Cornea  wrote:

> Hi everyone and Happy New Year!
>
> As the migration of tripleo-upgrade repo to the openstack namespace is
> now complete I think it's the time to create a Pike branch to capture
> the current state so we can use it for Pike testing and keep the
> master branch for Queens changes. The update/upgrade steps are
> changing between versions and the aim of branching the repo is to keep
> the update/upgrade steps clean per branch to avoid using conditionals
> based on release. Also tripleo-upgrade should be compatible with
> different tools used for deployment(tripleo-quickstart, infrared,
> manual deployments) which use different vars for the version release
> so in case of using conditionals we would need extra steps to
> normalize these variables.
>

I understand the desire to create a branch to protect the work that has
been done previously.
The interesting thing is that you guys are proposing to use a branched
ansible role with a branchless upstream project.  I want to make sure we
have enough review so that we don't hit issues in the future.  Maybe that
is OK, but I have at least one concern.

My concern is about gating the tripleo-upgrade role and its branches.  When
tripleo-quickstart, which is branchless, is changed, will we have to kick
off a job for each tripleo-upgrade branch?  That immediately doubles the
load on the gates.
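
To make the concern concrete, here is a rough sketch (an invented stanza,
not an existing job definition -- names are loosely modeled on the job in
[1] below) of what cross-gating a branchless repo against every
tripleo-upgrade branch could look like, one job per branch of the role:

  # Illustration only: the job names, and the use of override-checkout,
  # are assumptions about how the cross-branch wiring might be expressed.
  - project:
      name: openstack/tripleo-quickstart
      check:
        jobs:
          # exercises tripleo-upgrade master
          - tripleo-ci-centos-7-containers-multinode-upgrades
          # exercises tripleo-upgrade stable/pike
          - tripleo-ci-centos-7-containers-multinode-upgrades-pike:
              required-projects:
                - name: openstack/tripleo-upgrade
                  override-checkout: stable/pike

Every additional branch of the role adds another entry like this, which is
where the doubled gate load comes from.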

It's extremely important to properly gate this role against the versions of
TripleO and OSP.  I see very limited check jobs and gate jobs on
tripleo-upgrade atm; I have only found [1].  I think we need to see some
external and internal jobs checking and gating this role, with comments
posted to changes.

[1]
https://review.rdoproject.org/jenkins/job/gate-tripleo-ci-centos-7-containers-multinode-upgrades-pike/



>
> I wanted to bring this topic up for discussion to see if branching is
> the proper thing to do here.
>
> Thanks,
> Marius
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Many timeouts in zuul gates for TripleO

2018-01-19 Thread Wesley Hayutin
On Fri, Jan 19, 2018 at 12:23 PM, Ben Nemec  wrote:

>
>
> On 01/18/2018 09:45 AM, Emilien Macchi wrote:
>
>> On Thu, Jan 18, 2018 at 6:34 AM, Or Idgar  wrote:
>>
>>> Hi,
>>> we're encountering many timeouts for zuul gates in TripleO.
>>> For example, see
>>> http://logs.openstack.org/95/508195/28/check-tripleo/tripleo
>>> -ci-centos-7-ovb-ha-oooq/c85fcb7/.
>>>
>>> rechecks won't help and sometimes specific gate is end successfully and
>>> sometimes not.
>>> The problem is that after recheck it's not always the same gate which is
>>> failed.
>>>
>>> Is there someone who have access to the servers load to see what cause
>>> this?
>>> alternatively, is there something we can do in order to reduce the
>>> running
>>> time for each gate?
>>>
>>
>> We're migrating to RDO Cloud for OVB jobs:
>> https://review.openstack.org/#/c/526481/
>> It's a work in progress but will help a lot for OVB timeouts on RH1.
>>
>> I'll let the CI folks comment on that topic.
>>
>>
> I noticed that the timeouts on rh1 have been especially bad as of late so
> I did a little testing and found that it did seem to be running more slowly
> than it should.  After some investigation I found that 6 of our compute
> nodes have warning messages that the cpu was throttled due to high
> temperature.  I've disabled 4 of them that had a lot of warnings. The other
> 2 only had a handful of warnings so I'm hopeful we can leave them active
> without affecting job performance too much.  It won't accomplish much if we
> disable the overheating nodes only to overload the remaining ones.
>
> I'll follow up with our hardware people and see if we can determine why
> these specific nodes are overheating.  They seem to be running 20 degrees C
> hotter than the rest of the nodes.
>
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>


For the latest discussion and to-do's before the rh1 ovb jobs are migrated
to rdo-cloud, look here [1].
TL;DR is that we're looking for a run of seven days where the jobs are
passing at around 80% or better in check.
We've reported a number of issues w/ the environment, and AFAIK everything
has now been resolved.

[1]
https://trello.com/c/wGUUEqty/384-steps-needed-to-migrate-ovb-to-rdo-cloud
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

