Re: [openstack-dev] [tripleo][ci][upgrade] New jobs for tripleo Upgrade in the CI.

2018-10-14 Thread Arie Bregman
On Fri, Oct 12, 2018 at 2:10 PM Sofer Athlan-Guyot 
wrote:

> Hi,
>
> Testing and maintaining a green status for upgrade jobs within the 3h
> time limit has proven to be a very difficult job to say the least.
>
> The net result has been: we don't have anything even touching the
> upgrade code in the CI.
>
> So during the Denver PTG it was decided to give up on running a full
> upgrade job within the 3h time limit and instead to focus on two
> complementary approaches that at least touch the upgrade code:
>  1. run a standalone upgrade: this tests the ansible upgrade playbook;
>  2. run an N->N upgrade: this tests the upgrade python code.
>
> And here they are, still not merged but seen working:
>  - tripleo-ci-centos-7-standalone-upgrade:
>    https://review.openstack.org/#/c/604706/
>  - tripleo-ci-centos-7-scenario000-multinode-oooq-container-upgrades:
>    https://review.openstack.org/#/c/607848/9
>
> The first is good to merge (but others could disagree), the second could
> be as well (but I tend to disagree :))
>
> The first leverages the standalone deployment and executes a standalone
> upgrade just after it.
>
> The limitation is that it only tests non-HA services (sorry pidone,
> we cannot test HA in standalone) and only the upgrade_tasks (i.e. not any
> workflow related to the upgrade CLI).
>
> The main benefits here are:
>  - ~2h to run the upgrade, still a bit long but well under the 3h
>    time limit;
>  - we trigger a yum upgrade so that we can catch problems there as well;
>  - we test the standalone upgrade, which is good in itself;
>  - composable roles are available (as in the standalone/all-in-one
>    deployment) so you can make a specific upgrade test for your project
>    if it fits into the standalone constraint.
>
> For this last point, if a standalone-specific role eventually goes into
> project testing (nova, neutron ...), those projects would also have a way
> to test their upgrade tasks.  This would be the best-case scenario.
>
> Now, for the second point, the N->N upgrade.  Its "limitation" is that
> ... well it doesn't run a yum upgrade at all.  We start from master and
> run the upgrade to master.
>
> Its main benefits are:
>  - it takes ~2h20 to run, so well under the 3h limit;
>  - the tripleoclient upgrade code is run, which is one thing that the
>    standalone upgrade cannot do;
>  - it also tends to exercise the idempotency of all the tasks, as it runs
>    them on an already "upgraded" node;
>  - as an added bonus, it could gate the tripleo-upgrade role as well, as it
>    definitely loads all of the role's tasks[1].
>
> For those who stayed with me to this point, I'm throwing in another CI
> test that has already proved useful (it caught errors): the
> ansible-lint test.  After a standalone deployment we just run
> ansible-lint on all the generated playbooks[2].
>
> It produces standalone_ansible_lint.log[3] in the working directory. It
> only takes a couple of minutes to install ansible-lint and run it. It
> definitely gates against typos and the like. It touches hard-to-reach
> code as well; for instance, the fast_forward tasks are linted.
> There are still no pidone tasks in there, but it could easily be added
> to a job that has HA tasks generated.
>
> Note that by default ansible-lint barks, as the generated playbooks hit
> several lint problems, so only the checks for syntax errors and misnamed
> tasks or parameters are currently activated.  But all the lint problems
> are logged in the above file and can be fixed later on, at which point we
> could activate full lint gating.
>
> Thanks for reading this long message; any comments, shouts of victory,
> cries of despair and reviews are welcome.
>

That's awesome. It's perfect for a project we are working on (Tobiko) where
we want to run tests before upgrade (setting up resources) and after
(verifying those resources are still available).

I want to add such a job (standalone upgrade) and I need help:

https://review.openstack.org/#/c/610397/

How do I set one tempest regex for pre-upgrade and another one for
post-upgrade?


> [1] but this still has to be investigated.
> [2] testing review https://review.openstack.org/#/c/604756/ and main code
> https://review.openstack.org/#/c/604757/
> [3] sample output http://paste.openstack.org/show/731960/
> --
> Sofer Athlan-Guyot
> chem on #freenode
> Upgrade DFG.
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo][ci][upgrade] New jobs for tripleo Upgrade in the CI.

2018-10-12 Thread Wesley Hayutin
On Fri, Oct 12, 2018 at 5:10 AM Sofer Athlan-Guyot 
wrote:

> Hi,
>
> Testing and maintaining a green status for upgrade jobs within the 3h
> time limit has proven to be a very difficult job to say the least.
>

Indeed

>
> The net result has been: we don't have anything even touching the
> upgrade code in the CI.
>
> So during the Denver PTG it was decided to give up on running a full
> upgrade job within the 3h time limit and instead to focus on two
> complementary approaches that at least touch the upgrade code:
>  1. run a standalone upgrade: this tests the ansible upgrade playbook;
>  2. run an N->N upgrade: this tests the upgrade python code.


> And here they are, still not merged but seen working:
>  - tripleo-ci-centos-7-standalone-upgrade:
>    https://review.openstack.org/#/c/604706/
>  - tripleo-ci-centos-7-scenario000-multinode-oooq-container-upgrades:
>    https://review.openstack.org/#/c/607848/9
>
> The first is good to merge (but others could disagree), the second could
> be as well (but I tend to disagree :))
>
> The first leverages the standalone deployment and executes a standalone
> upgrade just after it.
>
> The limitation is that it only tests non-HA services (sorry pidone,
> we cannot test HA in standalone) and only the upgrade_tasks (i.e. not any
> workflow related to the upgrade CLI).
>

This can be augmented with 3rd party.  The pidone team and the ci team are
putting the final touches on a 3rd party job for HA services.  Looking
forward, I could see a 3rd party upgrade job that runs the pidone
verification tests.


>
> The main benefits here are:
>  - ~2h to run the upgrade, still a bit long but well under the 3h
>    time limit;
>  - we trigger a yum upgrade so that we can catch problems there as well;
>  - we test the standalone upgrade, which is good in itself;
>  - composable roles are available (as in the standalone/all-in-one
>    deployment) so you can make a specific upgrade test for your project
>    if it fits into the standalone constraint.
>

These are all huge benefits over the previous implementation that have been
made available to us via the standalone deployment

>
> For this last point, if a standalone-specific role eventually goes into
> project testing (nova, neutron ...), those projects would also have a way
> to test their upgrade tasks.  This would be the best-case scenario.
>

!   woot !!!
This is a huge point that TripleO folks need to absorb!!
!   woot !!!

In the next several sprints the TripleO CI team will do our best to focus
on the standalone deployments, to convert TripleO's upstream jobs over and
pave the way for other projects to start consuming it.  IMHO other
projects would be *very* interested in testing an upgrade of their
individual component w/o all the noise of unrelated
services/components.


>
> Now, for the second point, the N->N upgrade.  Its "limitation" is that
> ... well it doesn't run a yum upgrade at all.  We start from master and
> run the upgrade to master.
>
> Its main benefits are:
>  - it takes ~2h20 to run, so well under the 3h limit;
>  - the tripleoclient upgrade code is run, which is one thing that the
>    standalone upgrade cannot do;
>  - it also tends to exercise the idempotency of all the tasks, as it runs
>    them on an already "upgraded" node;
>  - as an added bonus, it could gate the tripleo-upgrade role as well, as it
>    definitely loads all of the role's tasks[1].
>
> For those who stayed with me to this point, I'm throwing in another CI
> test that has already proved useful (it caught errors): the
> ansible-lint test.  After a standalone deployment we just run
> ansible-lint on all the generated playbooks[2].
>

This is nice, thanks chem!


>
> It produces standalone_ansible_lint.log[3] in the working directory. It
> only takes a couple of minutes to install ansible-lint and run it. It
> definitely gates against typos and the like. It touches hard-to-reach
> code as well; for instance, the fast_forward tasks are linted.
> There are still no pidone tasks in there, but it could easily be added
> to a job that has HA tasks generated.
>
> Note that by default ansible-lint barks, as the generated playbooks hit
> several lint problems, so only the checks for syntax errors and misnamed
> tasks or parameters are currently activated.  But all the lint problems
> are logged in the above file and can be fixed later on, at which point we
> could activate full lint gating.
>
> Thanks for reading this long message; any comments, shouts of victory,
> cries of despair and reviews are welcome.
>
> [1] but this still has to be investigated.
> [2] testing review https://review.openstack.org/#/c/604756/ and main code
> https://review.openstack.org/#/c/604757/
> [3] sample output http://paste.openstack.org/show/731960/
> --
> Sofer Athlan-Guyot
> chem on #freenode
> Upgrade DFG.

[openstack-dev] [tripleo][ci][upgrade] New jobs for tripleo Upgrade in the CI.

2018-10-12 Thread Sofer Athlan-Guyot
Hi,

Testing and maintaining a green status for upgrade jobs within the 3h
time limit has proven to be a very difficult job to say the least.

The net result has been: we don't have anything even touching the
upgrade code in the CI.

So during the Denver PTG it was decided to give up on running a full
upgrade job within the 3h time limit and instead to focus on two
complementary approaches that at least touch the upgrade code:
 1. run a standalone upgrade: this tests the ansible upgrade playbook;
 2. run an N->N upgrade: this tests the upgrade python code.

And here they are, still not merged but seen working:
 - tripleo-ci-centos-7-standalone-upgrade:
   https://review.openstack.org/#/c/604706/
 - tripleo-ci-centos-7-scenario000-multinode-oooq-container-upgrades:
   https://review.openstack.org/#/c/607848/9

The first is good to merge (but others could disagree), the second could
be as well (but I tend to disagree :))

The first leverages the standalone deployment and executes a standalone
upgrade just after it.

The limitation is that it only tests non-HA services (sorry pidone,
we cannot test HA in standalone) and only the upgrade_tasks (i.e. not any
workflow related to the upgrade CLI).

The main benefits here are:
 - ~2h to run the upgrade, still a bit long but well under the 3h
   time limit;
 - we trigger a yum upgrade so that we can catch problems there as well;
 - we test the standalone upgrade, which is good in itself;
 - composable roles are available (as in the standalone/all-in-one
   deployment) so you can make a specific upgrade test for your project
   if it fits into the standalone constraint.

For this last point, if a standalone-specific role eventually goes into
project testing (nova, neutron ...), those projects would also have a way
to test their upgrade tasks.  This would be the best-case scenario.

Now, for the second point, the N->N upgrade.  Its "limitation" is that
... well it doesn't run a yum upgrade at all.  We start from master and
run the upgrade to master.

Its main benefits are:
 - it takes ~2h20 to run, so well under the 3h limit;
 - the tripleoclient upgrade code is run, which is one thing that the
   standalone upgrade cannot do;
 - it also tends to exercise the idempotency of all the tasks, as it runs
   them on an already "upgraded" node;
 - as an added bonus, it could gate the tripleo-upgrade role as well, as it
   definitely loads all of the role's tasks[1].
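
To illustrate the idempotency point, here is a minimal Python sketch. It
is not code from the job: it assumes the usual Ansible "PLAY RECAP" line
format, and the helper names parse_recap and rerun_is_clean are made up
for the example. The idea is only that a second run over an already
"upgraded" node should finish with no failed or unreachable hosts.

```python
import re

# Matches the start of a PLAY RECAP host line, e.g.
#   "undercloud : ok=12 changed=0 unreachable=0 failed=0"
# Any extra counters after "failed=" are simply ignored.
RECAP_LINE = re.compile(
    r"^(?P<host>\S+)\s*:\s*ok=(?P<ok>\d+)\s+changed=(?P<changed>\d+)"
    r"\s+unreachable=(?P<unreachable>\d+)\s+failed=(?P<failed>\d+)"
)

COUNTERS = ("ok", "changed", "unreachable", "failed")


def parse_recap(output):
    """Return {host: {counter: int}} for every recap line found in output."""
    stats = {}
    for line in output.splitlines():
        match = RECAP_LINE.match(line.strip())
        if match:
            stats[match.group("host")] = {
                name: int(match.group(name)) for name in COUNTERS
            }
    return stats


def rerun_is_clean(output):
    """True if at least one host was reported and none failed or was unreachable."""
    stats = parse_recap(output)
    return bool(stats) and all(
        host["failed"] == 0 and host["unreachable"] == 0
        for host in stats.values()
    )
```

Feeding the captured output of the second run to rerun_is_clean() gives
a simple pass/fail signal; the "changed" counter is parsed too, in case
a stricter "nothing changed" check is wanted.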

For those who stayed with me to this point, I'm throwing in another CI
test that has already proved useful (it caught errors): the
ansible-lint test.  After a standalone deployment we just run
ansible-lint on all the generated playbooks[2].

It produces standalone_ansible_lint.log[3] in the working directory. It
only takes a couple of minutes to install ansible-lint and run it. It
definitely gates against typos and the like. It touches hard-to-reach
code as well; for instance, the fast_forward tasks are linted.
There are still no pidone tasks in there, but it could easily be added
to a job that has HA tasks generated.

Note that by default ansible-lint barks, as the generated playbooks hit
several lint problems, so only the checks for syntax errors and misnamed
tasks or parameters are currently activated.  But all the lint problems
are logged in the above file and can be fixed later on, at which point we
could activate full lint gating.

Thanks for reading this long message; any comments, shouts of victory,
cries of despair and reviews are welcome.
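
As a rough illustration of the lint step (a sketch under assumptions,
not the code under review in [2]), the Python below walks a directory of
generated playbooks, runs ansible-lint on each one and writes everything
to a single log file. The function names are hypothetical, the
directory layout is assumed, and it relies on the ansible-lint "-p"
(parseable output) and "-t" (restrict to given rule tags) options being
available in the installed version. Which rules the real job activates
is not spelled out here, so the tags are left as a parameter.

```python
import subprocess
from pathlib import Path


def find_playbooks(root):
    """Return every *.yaml / *.yml file below root, sorted for stable output."""
    return sorted(
        path for path in Path(root).rglob("*")
        if path.suffix in (".yaml", ".yml")
    )


def build_lint_command(playbook, tags=None):
    """Build the ansible-lint command line for one playbook."""
    cmd = ["ansible-lint", "-p"]
    if tags:
        # Restrict the run to the selected rule tags (assumed "-t" option).
        cmd += ["-t", ",".join(tags)]
    cmd.append(str(playbook))
    return cmd


def lint_all(root, log_path, tags=None):
    """Lint each playbook under root, write all output to log_path
    (overwriting it) and return how many playbooks had a non-zero exit."""
    failures = 0
    with open(log_path, "w") as log:
        for playbook in find_playbooks(root):
            result = subprocess.run(
                build_lint_command(playbook, tags),
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
                universal_newlines=True,
            )
            log.write(result.stdout)
            if result.returncode != 0:
                failures += 1
    return failures
```

A job could then call lint_all() on the generated playbook directory,
point log_path at standalone_ansible_lint.log, and fail when the
returned count is non-zero.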

[1] but this still has to be investigated.
[2] testing review https://review.openstack.org/#/c/604756/ and main code 
https://review.openstack.org/#/c/604757/
[3] sample output http://paste.openstack.org/show/731960/
--
Sofer Athlan-Guyot
chem on #freenode
Upgrade DFG.