subject:"Re\: \[openstack\-dev\] \[tripleo\] \[ci\]"

Re: [openstack-dev] [tripleo] CI is broken

2018-11-07 Thread Emilien Macchi

No alert anymore, gate is green.
recheck if needed.

On Wed, Nov 7, 2018 at 2:22 PM Emilien Macchi  wrote:

> I updated the bugs, and so far we have one alert left:
> https://bugs.launchpad.net/tripleo/+bug/1801969
>
> The patch is in gate, be patient and then we'll be able to +A/recheck
> stuff again.
>
> On Wed, Nov 7, 2018 at 7:30 AM Juan Antonio Osorio Robles <
> jaosor...@redhat.com> wrote:
>
>> Hello folks,
>>
>>
>> Please do not attempt to merge or recheck patches until we get this
>> sorted out.
>>
>> We are dealing with several issues that have broken all jobs.
>>
>> https://bugs.launchpad.net/tripleo/+bug/1801769
>> https://bugs.launchpad.net/tripleo/+bug/1801969
>> https://bugs.launchpad.net/tripleo/+bug/1802083
>> https://bugs.launchpad.net/tripleo/+bug/1802085
>>
>> Best Regards!
>>
>>
>> __
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe:
>> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>
>
> --
> Emilien Macchi
>


-- 
Emilien Macchi
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [tripleo] CI is broken

2018-11-07 Thread Emilien Macchi

I updated the bugs, and so far we have one alert left:
https://bugs.launchpad.net/tripleo/+bug/1801969

The patch is in gate, be patient and then we'll be able to +A/recheck stuff
again.

On Wed, Nov 7, 2018 at 7:30 AM Juan Antonio Osorio Robles <
jaosor...@redhat.com> wrote:

> Hello folks,
>
>
> Please do not attempt to merge or recheck patches until we get this
> sorted out.
>
> We are dealing with several issues that have broken all jobs.
>
> https://bugs.launchpad.net/tripleo/+bug/1801769
> https://bugs.launchpad.net/tripleo/+bug/1801969
> https://bugs.launchpad.net/tripleo/+bug/1802083
> https://bugs.launchpad.net/tripleo/+bug/1802085
>
> Best Regards!
>
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>


-- 
Emilien Macchi
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [tripleo][ci][upgrade] New jobs for tripleo Upgrade in the CI.

2018-10-14 Thread Arie Bregman

On Fri, Oct 12, 2018 at 2:10 PM Sofer Athlan-Guyot 
wrote:

> Hi,
>
> Testing and maintaining a green status for upgrade jobs within the 3h
> time limit has proven to be a very difficult job to say the least.
>
> The net result has been: we don't have anything even touching the
> upgrade code in the CI.
>
> So during Denver PTG it has been decided to give up on running a full
> upgrade job during the 3h time limit and instead to focus on two
> complementary approach to at least touch the upgrade code:
>  1. run a standalone upgrade: this test the ansible upgrade playbook;
>  2. run a N->N upgrade; this test the upgrade python code;
>
> And here there are, still not merged but seen working:
>  - tripleo-ci-centos-7-standalone-upgrade:
>https://review.openstack.org/#/c/604706/
>  - tripleo-ci-centos-7-scenario000-multinode-oooq-container-upgrades:
>https://review.openstack.org/#/c/607848/9
>
> The first is good to merge (but other could disagree), the second could
> be as well (but I tend to disagree :))
>
> The first leverage the standalone deployment and execute an standalone
> upgrade just after it.
>
> The limitation is that it only tests non-HA services (sorry pidone,
> cannot test ha in standalone) and only the upgrade_tasks (ie not any
> workflow related to the upgrade cli)
>
> The main benefits here are:
>  - ~2h to run the upgrade, still a bit long but far away from the 3h
>time limit;
>  - we trigger a yum upgrade so that we can catch problems there as well;
>  - we test the standalone upgrade which is good in itself;
>  - composable role available (as in standalone/all-in-all deployment) so
>you can make a specific upgrade test for your project if it fits into
>the standalone constraint;
>
> For this last point, if standalone specific role eventually goes into
> project testing (nova, neutron ...), they could have as well a way to
> test upgrade tasks.  This would be a best case scenario.
>
> Now, for the second point, the N->N upgrade.  Its "limitation" is that
> ... well it doesn't run a yum upgrade at all.  We start from master and
> run the upgrade to master.
>
> It's main benefit are:
>  - it takes ~2h20 to run, so well under the 3h time;
>  - tripleoclient upgrade code is run, which is one thing that the
>standalone ugprade cannot do.
>  - It also tend to exercise idempotency of all the tasks as it runs them
>on an already "upgraded" node;
>  - As added bonus, it could gate the tripleo-upgrade role as well as it
>definitively loads all of the role's tasks[1]
>
> For those that stayed with me to this point, I'm throwing another CI
> test that already proved useful already (caught errors), it's the
> ansible-lint test.  After a standalone deployment we just run
> ansible-lint on all playbook generated[2].
>
> It produces standalone_ansible_lint.log[3] in the working directory. It
> only takes a couple of minute to install ansible-lint and run it. It
> definitively gate against typos and the like. It touches hard to
> reach code as well, for instance the fast_forward tasks are linted.
> Still no pidone tasks in there but it could easily be added to a job
> that has HA tasks generated.
>
> Note that by default ansible-lint barks, as the generated playbooks hit
> several lintage problems, so only syntax errors and misnamed tasks or
> parameters are currently activated.  But all the lint problems are
> logged in the above file and can be fixed later on.  At which point we
> could activate full lint gating.
>
> Thanks for this long reading, any comments, shout of victory, cry of
> despair and reviews are welcomed.
>

That's awesome. It's perfect for a project we are working on (Tobiko) where
we want to run tests before upgrade (setting up resources) and after
(verifying those resources are still available).

I want to add such job (upgrade standalone) and I need help:

https://review.openstack.org/#/c/610397/

How do I set  tempest regex for pre-upgrade and another one for post
upgrade?


> [1] but this has still to be investigated.
> [2] testing review https://review.openstack.org/#/c/604756/ and main code
> https://review.openstack.org/#/c/604757/
> [3] sample output http://paste.openstack.org/show/731960/
> --
> Sofer Athlan-Guyot
> chem on #freenode
> Upgrade DFG.
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [tripleo][ci][upgrade] New jobs for tripleo Upgrade in the CI.

2018-10-12 Thread Wesley Hayutin

On Fri, Oct 12, 2018 at 5:10 AM Sofer Athlan-Guyot 
wrote:

> Hi,
>
> Testing and maintaining a green status for upgrade jobs within the 3h
> time limit has proven to be a very difficult job to say the least.
>

Indeed

>
> The net result has been: we don't have anything even touching the
> upgrade code in the CI.
>
> So during Denver PTG it has been decided to give up on running a full
> upgrade job during the 3h time limit and instead to focus on two
> complementary approach to at least touch the upgrade code:
>  1. run a standalone upgrade: this test the ansible upgrade playbook;
>  2. run a N->N upgrade; this test the upgrade python code;


> And here there are, still not merged but seen working:
>  - tripleo-ci-centos-7-standalone-upgrade:
>https://review.openstack.org/#/c/604706/
>  - tripleo-ci-centos-7-scenario000-multinode-oooq-container-upgrades:
>https://review.openstack.org/#/c/607848/9
>
> The first is good to merge (but other could disagree), the second could
> be as well (but I tend to disagree :))
>
> The first leverage the standalone deployment and execute an standalone
> upgrade just after it.
>
> The limitation is that it only tests non-HA services (sorry pidone,
> cannot test ha in standalone) and only the upgrade_tasks (ie not any
> workflow related to the upgrade cli)
>

This can be augmented with 3rd party.  The pidone team and the ci team are
putting the final touches on a 3rd party job for HA services.  Looking
forward, I could see a 3rd party upgrade job that runs the pidone
verification tests.


>
> The main benefits here are:
>  - ~2h to run the upgrade, still a bit long but far away from the 3h
>time limit;
>  - we trigger a yum upgrade so that we can catch problems there as well;
>  - we test the standalone upgrade which is good in itself;
>  - composable role available (as in standalone/all-in-all deployment) so
>you can make a specific upgrade test for your project if it fits into
>the standalone constraint;
>

These are all huge benefits over the previous implementation that have been
made available to us via the standalone deployment

>
> For this last point, if standalone specific role eventually goes into
> project testing (nova, neutron ...), they could have as well a way to
> test upgrade tasks.  This would be a best case scenario.
>

!   woot !!!
This is a huge point that TripleO folks need to absorb!!
!   woot !!!

In the next several sprints the TripleO CI team will do our best to focus
on the standalone deployments to convert TripleO's upstream jobs over and
paving the way for other projects to start consuming it.  IMHO I would
think other projects would be *very* interested in testing an upgrade of
their individual component w/o all the noise of unrelated
services/components.


>
> Now, for the second point, the N->N upgrade.  Its "limitation" is that
> ... well it doesn't run a yum upgrade at all.  We start from master and
> run the upgrade to master.
>
> It's main benefit are:
>  - it takes ~2h20 to run, so well under the 3h time;
>  - tripleoclient upgrade code is run, which is one thing that the
>standalone ugprade cannot do.
>  - It also tend to exercise idempotency of all the tasks as it runs them
>on an already "upgraded" node;
>  - As added bonus, it could gate the tripleo-upgrade role as well as it
>definitively loads all of the role's tasks[1]
>
> For those that stayed with me to this point, I'm throwing another CI
> test that already proved useful already (caught errors), it's the
> ansible-lint test.  After a standalone deployment we just run
> ansible-lint on all playbook generated[2].
>

This is nice, thanks chem!


>
> It produces standalone_ansible_lint.log[3] in the working directory. It
> only takes a couple of minute to install ansible-lint and run it. It
> definitively gate against typos and the like. It touches hard to
> reach code as well, for instance the fast_forward tasks are linted.
> Still no pidone tasks in there but it could easily be added to a job
> that has HA tasks generated.
>
> Note that by default ansible-lint barks, as the generated playbooks hit
> several lintage problems, so only syntax errors and misnamed tasks or
> parameters are currently activated.  But all the lint problems are
> logged in the above file and can be fixed later on.  At which point we
> could activate full lint gating.
>
> Thanks for this long reading, any comments, shout of victory, cry of
> despair and reviews are welcomed.
>
> [1] but this has still to be investigated.
> [2] testing review https://review.openstack.org/#/c/604756/ and main code
> https://review.openstack.org/#/c/604757/
> [3] sample output http://paste.openstack.org/show/731960/
> --
> Sofer Athlan-Guyot
> chem on #freenode
> Upgrade DFG.
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe:

Re: [openstack-dev] [tripleo][ci] Having more that one queue for gate pipeline at tripleo

2018-10-11 Thread Clark Boylan

On Thu, Oct 11, 2018, at 7:17 AM, Ben Nemec wrote:
> 
> 
> On 10/11/18 8:53 AM, Felix Enrique Llorente Pastora wrote:
> > So for example, I don't see why changes at tripleo-quickstart can be 
> > reset if tripleo-ui fails, this is the kind of thing that maybe can be 
> > optimize.
> 
> Because if two incompatible changes are proposed to tripleo-quickstart 
> and tripleo-ui and both end up in parallel gate queues at the same time, 
> it's possible both queues could get wedged. Quickstart and the UI are 
> not completely independent projects. Quickstart has roles for deploying 
> the UI, which means there is a connection there.
> 
> I think the only way you could have independent gate queues is if you 
> had two disjoint sets of projects that could be gated without any use of 
> projects from the other set. I don't think it's possible to divide 
> TripleO in that way, but if I'm wrong then maybe you could do multiple 
> queues.

To follow up on this the Gate pipeline queue that your projects belong to are 
how you indicate to Zuul that there is coupling between these projects. Having 
things set up in this way allows you to ensure (through the Gate and Zuul's 
speculative future states) that a change to one project in the queue can't 
break another because they are tested together.

If your concern is "time to merge" splitting queues won't help all that much 
unless you put all of the unreliable broken code with broken tests in one queue 
and have the reliable code in another queue. Zuul tests everything in parallel 
within a queue. This means that if your code base and its tests are reliable 
you can merge 20 changes all at once and the time to merge for all 20 changes 
is the same as a single change. Problems arise when tests fail and these future 
states have to be updated and retested. This will affect one or many queues.

The fix here is to work on making reliable test jobs so that you can merge all 
20 changes in the span of time it takes to merge a single change.  This isn't 
necessarily easy, but helps you merge more code and be confident it works too.

Clark

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [tripleo][ci] Having more that one queue for gate pipeline at tripleo

2018-10-11 Thread Ben Nemec




On 10/11/18 8:53 AM, Felix Enrique Llorente Pastora wrote:
So for example, I don't see why changes at tripleo-quickstart can be 
reset if tripleo-ui fails, this is the kind of thing that maybe can be 
optimize.


Because if two incompatible changes are proposed to tripleo-quickstart 
and tripleo-ui and both end up in parallel gate queues at the same time, 
it's possible both queues could get wedged. Quickstart and the UI are 
not completely independent projects. Quickstart has roles for deploying 
the UI, which means there is a connection there.


I think the only way you could have independent gate queues is if you 
had two disjoint sets of projects that could be gated without any use of 
projects from the other set. I don't think it's possible to divide 
TripleO in that way, but if I'm wrong then maybe you could do multiple 
queues.




On Thu, Oct 11, 2018 at 1:17 PM Emilien Macchi > wrote:




On Thu, Oct 11, 2018 at 10:01 AM Felix Enrique Llorente Pastora
mailto:ellor...@redhat.com>> wrote:

Hello there,

    After suffering a lot from zuul's tripleo gate piepeline
queue reseting after failures on patches I have ask myself what
would happend if we have more than one queue for gating tripleo.

    After a quick read here
https://zuul-ci.org/docs/zuul/user/gating.html, I have found the
following:

"If changes with cross-project dependencies do not share a
change queue then Zuul is unable to enqueue them together, and
the first will be required to merge before the second is enqueued."

    So it make sense to share zuul queue, but maybe only one
queue for all tripleo projects is too  much, for example sharing
queue between tripleo-ui and tripleo-quickstart, maybe we need
for example to queues for product stuff and one for CI, so
product does not get resetted if CI fails in a patch.

    What do you think ?

Probably a wrong example, as TripleO UI gate is using CI jobs
running tripleo-quickstart scenarios.
We could create more queues for projects which are really
independent from each other but we need to be very careful about it.
-- 
Emilien Macchi

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe:
openstack-dev-requ...@lists.openstack.org?subject:unsubscribe

http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



--
Quique Llorente

Openstack TripleO CI

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [tripleo][ci] Having more that one queue for gate pipeline at tripleo

2018-10-11 Thread Felix Enrique Llorente Pastora

So for example, I don't see why changes at tripleo-quickstart can be reset
if tripleo-ui fails, this is the kind of thing that maybe can be optimize.

On Thu, Oct 11, 2018 at 1:17 PM Emilien Macchi  wrote:

>
>
> On Thu, Oct 11, 2018 at 10:01 AM Felix Enrique Llorente Pastora <
> ellor...@redhat.com> wrote:
>
>> Hello there,
>>
>>After suffering a lot from zuul's tripleo gate piepeline queue
>> reseting after failures on patches I have ask myself what would happend if
>> we have more than one queue for gating tripleo.
>>
>>After a quick read here https://zuul-ci.org/docs/zuul/user/gating.html,
>> I have found the following:
>>
>> "If changes with cross-project dependencies do not share a change queue
>> then Zuul is unable to enqueue them together, and the first will be
>> required to merge before the second is enqueued."
>>
>>So it make sense to share zuul queue, but maybe only one queue for all
>> tripleo projects is too  much, for example sharing queue between tripleo-ui
>> and tripleo-quickstart, maybe we need for example to queues for product
>> stuff and one for CI, so product does not get resetted if CI fails in a
>> patch.
>>
>>What do you think ?
>>
>
> Probably a wrong example, as TripleO UI gate is using CI jobs running
> tripleo-quickstart scenarios.
> We could create more queues for projects which are really independent from
> each other but we need to be very careful about it.
> --
> Emilien Macchi
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>


-- 
Quique Llorente

Openstack TripleO CI
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [tripleo][ci] Having more that one queue for gate pipeline at tripleo

2018-10-11 Thread Emilien Macchi

On Thu, Oct 11, 2018 at 10:01 AM Felix Enrique Llorente Pastora <
ellor...@redhat.com> wrote:

> Hello there,
>
>After suffering a lot from zuul's tripleo gate piepeline queue reseting
> after failures on patches I have ask myself what would happend if we have
> more than one queue for gating tripleo.
>
>After a quick read here https://zuul-ci.org/docs/zuul/user/gating.html,
> I have found the following:
>
> "If changes with cross-project dependencies do not share a change queue
> then Zuul is unable to enqueue them together, and the first will be
> required to merge before the second is enqueued."
>
>So it make sense to share zuul queue, but maybe only one queue for all
> tripleo projects is too  much, for example sharing queue between tripleo-ui
> and tripleo-quickstart, maybe we need for example to queues for product
> stuff and one for CI, so product does not get resetted if CI fails in a
> patch.
>
>What do you think ?
>

Probably a wrong example, as TripleO UI gate is using CI jobs running
tripleo-quickstart scenarios.
We could create more queues for projects which are really independent from
each other but we need to be very careful about it.
-- 
Emilien Macchi
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [tripleo] CI is blocked

2018-08-16 Thread Wesley Hayutin

On Wed, Aug 15, 2018 at 10:13 PM Wesley Hayutin  wrote:

> On Wed, Aug 15, 2018 at 7:13 PM Alex Schultz  wrote:
>
>> Please do not approve or recheck anything until further notice. We've
>> got a few issues that have basically broken all the jobs.
>>
>> https://bugs.launchpad.net/tripleo/+bug/1786764
>
>
fix posted: https://review.openstack.org/#/c/592577/


>
>> https://bugs.launchpad.net/tripleo/+bug/1787226
>
>
Dupe of 1786764 


>
>> https://bugs.launchpad.net/tripleo/+bug/1787244
>
>
Fixed Released: https://review.openstack.org/592146


>
>> https://bugs.launchpad.net/tripleo/+bug/1787268
>
>
Proposed:
https://review.openstack.org/#/c/592233/
https://review.openstack.org/#/c/592275/



> https://bugs.launchpad.net/tripleo/+bug/1736950
>
> w
>

Will post a patch to skip the above tempest test.

Also the patch to re-enable build-test-packages, the code that injects your
change into a rpm is about to merge.
https://review.openstack.org/#/c/592218/

Thanks Steve, Alex, Jistr and others :)


>
>>
>> Thanks,
>> -Alex
>>
>> __
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe:
>> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
> --
>
> Wes Hayutin
>
> Associate MANAGER
>
> Red Hat
>
> 
>
> whayu...@redhat.comT: +1919 <+19197544114>4232509 IRC:  weshay
> 
>
> View my calendar and check my availability for meetings HERE
> 
>
-- 

Wes Hayutin

Associate MANAGER

Red Hat



whayu...@redhat.comT: +1919 <+19197544114>4232509 IRC:  weshay


View my calendar and check my availability for meetings HERE

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [tripleo] CI is blocked

2018-08-15 Thread Wesley Hayutin

On Wed, Aug 15, 2018 at 7:13 PM Alex Schultz  wrote:

> Please do not approve or recheck anything until further notice. We've
> got a few issues that have basically broken all the jobs.
>
> https://bugs.launchpad.net/tripleo/+bug/1786764
> https://bugs.launchpad.net/tripleo/+bug/1787226
> https://bugs.launchpad.net/tripleo/+bug/1787244
> https://bugs.launchpad.net/tripleo/+bug/1787268


https://bugs.launchpad.net/tripleo/+bug/1736950

w

>
>
> Thanks,
> -Alex
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
-- 

Wes Hayutin

Associate MANAGER

Red Hat



whayu...@redhat.comT: +1919 <+19197544114>4232509 IRC:  weshay


View my calendar and check my availability for meetings HERE

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [tripleo][ci][metrics] Stucked in the middle of work because of RDO CI

2018-08-01 Thread Ben Nemec




On 07/31/2018 04:51 PM, Wesley Hayutin wrote:



On Tue, Jul 31, 2018 at 7:41 AM Sagi Shnaidman > wrote:


Hi, Martin

I see master OVB jobs are passing now [1], please recheck.

[1] http://cistatus.tripleo.org/


Things have improved and I see a lot of jobs passing however at the same 
time I see too many jobs failing due to node_failures.  We are tracking 
the data from [1].  Certainly the issue is NOT ideal for development and 
we need to remain focused on improving the situation.


I assume you're aware, but just to update the thread it looks like the 
OVB jobs are failing at a 50%+ rate again today (mostly unknown failures 
according to the tracking app).  Even with only two jobs that means your 
odds of getting them both to pass are pretty bad.




Thanks

[1] https://softwarefactory-project.io/zuul/api/tenant/rdoproject.org/builds



On Tue, Jul 31, 2018 at 12:24 PM, Martin Magr mailto:mm...@redhat.com>> wrote:

Greetings guys,

   it is pretty obvious that RDO CI jobs in TripleO projects are
broken [0]. Once Zuul CI jobs will pass would it be possible to
have AMQP/collectd patches ([1],[2],[3]) merged please even
though the negative result of RDO CI jobs? Half of the patches
for this feature is merged and the other half is stucked in this
situation, were nobody reviews these patches, because there is
red -1. Those patches passed Zuul jobs several times already and
were manually tested too.

Thanks in advance for consideration of this situation,
Martin

[0]

https://trello.com/c/hkvfxAdX/667-cixtripleoci-rdo-software-factory-3rd-party-jobs-failing-due-to-instance-nodefailure
[1] https://review.openstack.org/#/c/578749
[2] https://review.openstack.org/#/c/576057/
[3] https://review.openstack.org/#/c/572312/

-- 
Martin Mágr

Senior Software Engineer
Red Hat Czech


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe:
openstack-dev-requ...@lists.openstack.org?subject:unsubscribe

http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev




-- 
Best regards

Sagi Shnaidman
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe:
openstack-dev-requ...@lists.openstack.org?subject:unsubscribe

http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

--

Wes Hayutin

Associate MANAGER

Red Hat



w hayu...@redhat.com 
  T: +1919 4232509     IRC: 
weshay





Viewmycalendar and check my availability for meetings HERE 




__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [tripleo][ci][metrics] FFE request for QDR integration in TripleO (Was: Stucked in the middle of work because of RDO CI)

2018-07-31 Thread Alex Schultz

On Tue, Jul 31, 2018 at 11:31 AM, Pradeep Kilambi  wrote:
> Hi Alex:
>
> Can you consider this our FFE for the QDR patches. Its mainly blocked on CI
> issues. Half the patches for QDR integration are already merged. The other 3
> referenced need to get merged once CI passes. Please consider this out
> formal request for FFE for QDR integration in tripleo.
>

Ok if it's just these patches and there is no further work it should
be OK. I did point out (prior to CI issues) that the patch[0] actually
broke the ovb jobs back in June. It seemed to be related to missing
containers or something to that effect.  So we'll need to be extra
care when merging this to ensure it does not break anything.  If we
get clean jobs prior to the rc1, we can merge it. If not I'd say we
need to hold off.  I don't consider this is a blocking feature.

Thanks,
-Alex

[0] https://review.openstack.org/#/c/578749/

> Cheers,
> ~ Prad
>
> On Tue, Jul 31, 2018 at 7:40 AM Sagi Shnaidman  wrote:
>>
>> Hi, Martin
>>
>> I see master OVB jobs are passing now [1], please recheck.
>>
>> [1] http://cistatus.tripleo.org/
>>
>> On Tue, Jul 31, 2018 at 12:24 PM, Martin Magr  wrote:
>>>
>>> Greetings guys,
>>>
>>>   it is pretty obvious that RDO CI jobs in TripleO projects are broken
>>> [0]. Once Zuul CI jobs will pass would it be possible to have AMQP/collectd
>>> patches ([1],[2],[3]) merged please even though the negative result of RDO
>>> CI jobs? Half of the patches for this feature is merged and the other half
>>> is stucked in this situation, were nobody reviews these patches, because
>>> there is red -1. Those patches passed Zuul jobs several times already and
>>> were manually tested too.
>>>
>>> Thanks in advance for consideration of this situation,
>>> Martin
>>>
>>> [0]
>>> https://trello.com/c/hkvfxAdX/667-cixtripleoci-rdo-software-factory-3rd-party-jobs-failing-due-to-instance-nodefailure
>>> [1] https://review.openstack.org/#/c/578749
>>> [2] https://review.openstack.org/#/c/576057/
>>> [3] https://review.openstack.org/#/c/572312/
>>>
>>> --
>>> Martin Mágr
>>> Senior Software Engineer
>>> Red Hat Czech
>>>
>>>
>>> __
>>> OpenStack Development Mailing List (not for usage questions)
>>> Unsubscribe:
>>> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>>
>>
>>
>>
>> --
>> Best regards
>> Sagi Shnaidman
>
>
>
> --
> Cheers,
> ~ Prad

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [tripleo][ci][metrics] Stucked in the middle of work because of RDO CI

2018-07-31 Thread Wesley Hayutin

On Tue, Jul 31, 2018 at 7:41 AM Sagi Shnaidman  wrote:

> Hi, Martin
>
> I see master OVB jobs are passing now [1], please recheck.
>
> [1] http://cistatus.tripleo.org/
>

Things have improved and I see a lot of jobs passing however at the same
time I see too many jobs failing due to node_failures.  We are tracking the
data from [1].  Certainly the issue is NOT ideal for development and we
need to remain focused on improving the situation.

Thanks

[1] https://softwarefactory-project.io/zuul/api/tenant/rdoproject.org/builds



>
>
> On Tue, Jul 31, 2018 at 12:24 PM, Martin Magr  wrote:
>
>> Greetings guys,
>>
>>   it is pretty obvious that RDO CI jobs in TripleO projects are broken
>> [0]. Once Zuul CI jobs will pass would it be possible to have AMQP/collectd
>> patches ([1],[2],[3]) merged please even though the negative result of RDO
>> CI jobs? Half of the patches for this feature is merged and the other half
>> is stucked in this situation, were nobody reviews these patches, because
>> there is red -1. Those patches passed Zuul jobs several times already and
>> were manually tested too.
>>
>> Thanks in advance for consideration of this situation,
>> Martin
>>
>> [0]
>> https://trello.com/c/hkvfxAdX/667-cixtripleoci-rdo-software-factory-3rd-party-jobs-failing-due-to-instance-nodefailure
>> [1] https://review.openstack.org/#/c/578749
>> [2] https://review.openstack.org/#/c/576057/
>> [3] https://review.openstack.org/#/c/572312/
>>
>> --
>> Martin Mágr
>> Senior Software Engineer
>> Red Hat Czech
>>
>> __
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe:
>> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>>
>
>
> --
> Best regards
> Sagi Shnaidman
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
-- 

Wes Hayutin

Associate MANAGER

Red Hat



w hayu...@redhat.comT: +1919 <+19197544114>4232509
   IRC:  weshay


View my calendar and check my availability for meetings HERE

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [tripleo][ci][metrics] FFE request for QDR integration in TripleO (Was: Stucked in the middle of work because of RDO CI)

2018-07-31 Thread Pradeep Kilambi

Hi Alex:

Can you consider this our FFE for the QDR patches. Its mainly blocked on CI
issues. Half the patches for QDR integration are already merged. The other
3 referenced need to get merged once CI passes. Please consider this out
formal request for FFE for QDR integration in tripleo.

Cheers,
~ Prad

On Tue, Jul 31, 2018 at 7:40 AM Sagi Shnaidman  wrote:

> Hi, Martin
>
> I see master OVB jobs are passing now [1], please recheck.
>
> [1] http://cistatus.tripleo.org/
>
> On Tue, Jul 31, 2018 at 12:24 PM, Martin Magr  wrote:
>
>> Greetings guys,
>>
>>   it is pretty obvious that RDO CI jobs in TripleO projects are broken
>> [0]. Once Zuul CI jobs will pass would it be possible to have AMQP/collectd
>> patches ([1],[2],[3]) merged please even though the negative result of RDO
>> CI jobs? Half of the patches for this feature is merged and the other half
>> is stucked in this situation, were nobody reviews these patches, because
>> there is red -1. Those patches passed Zuul jobs several times already and
>> were manually tested too.
>>
>> Thanks in advance for consideration of this situation,
>> Martin
>>
>> [0]
>> https://trello.com/c/hkvfxAdX/667-cixtripleoci-rdo-software-factory-3rd-party-jobs-failing-due-to-instance-nodefailure
>> [1] https://review.openstack.org/#/c/578749
>> [2] https://review.openstack.org/#/c/576057/
>> [3] https://review.openstack.org/#/c/572312/
>>
>> --
>> Martin Mágr
>> Senior Software Engineer
>> Red Hat Czech
>>
>> __
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe:
>> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>>
>
>
> --
> Best regards
> Sagi Shnaidman
>


-- 
Cheers,
~ Prad
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [tripleo][ci][metrics] Stucked in the middle of work because of RDO CI

2018-07-31 Thread Sagi Shnaidman

Hi, Martin

I see master OVB jobs are passing now [1], please recheck.

[1] http://cistatus.tripleo.org/

On Tue, Jul 31, 2018 at 12:24 PM, Martin Magr  wrote:

> Greetings guys,
>
>   it is pretty obvious that RDO CI jobs in TripleO projects are broken
> [0]. Once Zuul CI jobs will pass would it be possible to have AMQP/collectd
> patches ([1],[2],[3]) merged please even though the negative result of RDO
> CI jobs? Half of the patches for this feature is merged and the other half
> is stucked in this situation, were nobody reviews these patches, because
> there is red -1. Those patches passed Zuul jobs several times already and
> were manually tested too.
>
> Thanks in advance for consideration of this situation,
> Martin
>
> [0] https://trello.com/c/hkvfxAdX/667-cixtripleoci-rdo-software-
> factory-3rd-party-jobs-failing-due-to-instance-nodefailure
> [1] https://review.openstack.org/#/c/578749
> [2] https://review.openstack.org/#/c/576057/
> [3] https://review.openstack.org/#/c/572312/
>
> --
> Martin Mágr
> Senior Software Engineer
> Red Hat Czech
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>


-- 
Best regards
Sagi Shnaidman
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [tripleo] CI is down stop workflowing

2018-06-19 Thread Alex Schultz

On Tue, Jun 19, 2018 at 1:45 PM, Wesley Hayutin  wrote:
> Check and gate jobs look clear.
> More details on a bit.
>


So for a recap of the last 24 hours or so...

Mistral auth problems - https://bugs.launchpad.net/tripleo/+bug/1777541
 - caused by https://review.openstack.org/#/c/574878/
 - fixed by https://review.openstack.org/#/c/576336/

Undercloud install failure - https://bugs.launchpad.net/tripleo/+bug/1777616
- caused by https://review.openstack.org/#/c/570307/
- fixed by https://review.openstack.org/#/c/576428/

Keystone duplicate role - https://bugs.launchpad.net/tripleo/+bug/1777451
- caused by https://review.openstack.org/#/c/572243/
- fixed by https://review.openstack.org/#/c/576356 and
https://review.openstack.org/#/c/576393/

The puppet issues should be prevented in the future by adding tripleo
undercloud jobs back in to the appropriate modules, see
https://review.openstack.org/#/q/topic:tripleo-ci+(status:open)
I recommended the undercloud jobs because that gives us some basic
coverage and the instack-undercloud job still uses puppet without
containers.  We'll likely want to replace these jobs with standalone
versions at somepoint as that configuration gets more mature.

We've restored any patches that were abandoned in the gate and it
should be ok to recheck.

Thanks,
-Alex

> Thanks
>
> Sent from my mobile
>
> On Tue, Jun 19, 2018, 07:33 Felix Enrique Llorente Pastora
>  wrote:
>>
>> Hi,
>>
>>We have the following bugs with fixes that need to land to unblock
>> check/gate jobs:
>>
>>https://bugs.launchpad.net/tripleo/+bug/1777451
>>https://bugs.launchpad.net/tripleo/+bug/1777616
>>
>>You can check them out at #tripleo ooolpbot.
>>
>>Please stop workflowing temporally until they get merged.
>>
>> BR.
>>
>> --
>> Quique Llorente
>>
>> Openstack TripleO CI
>> __
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [tripleo] CI is down stop workflowing

2018-06-19 Thread Wesley Hayutin

Check and gate jobs look clear.
More details on a bit.

Thanks

Sent from my mobile

On Tue, Jun 19, 2018, 07:33 Felix Enrique Llorente Pastora <
ellor...@redhat.com> wrote:

> Hi,
>
>We have the following bugs with fixes that need to land to unblock
> check/gate jobs:
>
>https://bugs.launchpad.net/tripleo/+bug/1777451
>https://bugs.launchpad.net/tripleo/+bug/1777616
>
>You can check them out at #tripleo ooolpbot.
>
>Please stop workflowing temporally until they get merged.
>
> BR.
>
> --
> Quique Llorente
>
> Openstack TripleO CI
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [tripleo][ci][infra] Quickstart Branching

2018-05-24 Thread Bogdan Dobrelya

On 5/23/18 6:49 PM, Sagi Shnaidman wrote:

Alex,

the problem is that you're working and focusing mostly on release 
specific code like featuresets and some scripts. But 
tripleo-quickstart(-extras) and tripleo-ci is much *much* more than set 
of featuresets. Only 10% of the code may be related to releases and 
branches, while other 90% is completely independent and not related to 
releases.

So in 90% code we DO need to backport every change, take for example the 
latest patch to extras: https://review.openstack.org/#/c/570167/, it's 
fixing reproducer. If oooq-extra was branched, we would need to backport 
this fix to every and every branch. And the same for all other 90% of 
code, which is complete nonsense.
Just because not using "{% if release %}" construct - to block the whole 
work of CI team and make the CI code is absolutely unmaintainable?

Some of release related templates we moved recently from tripleo-ci to 
THT repo like scenarios, OC templates, etc. If we discover another 
things in oooq that could be moved to branched THT I'd be only happy for 
that.

Sometimes it could be hard to maintain one file in extras templates with 
different logic for releases, like we have in tempest configuration for 
example. The solution is to create a few release-related templates and 
use one that match the current branch. It doesn't affect 90% of code and 
still "branch-like" approach. But I didn't see other scripts that are so 
release dependent. If we'll have ones, we could do the same. For now I 
see "{% if release %}" construct working very well.

I didn't see still any advantage of branching CI code, except of a 
little bit nicer jinja templates without "{% if release ", but amount of 
disadvantages is so huge, that it'll literally block all current work in CI.

[tl;dr] branching allows to not run cloned branched jobs against master 
patches. Or patches will wait longer in queues, and fail more often cuz 
of intermittent infra issues. See explanation and some calculations below.

So my main concern against additional stable release cloned jobs 
executed for master branches is that there is an "infra failure fee", 
which is a failure unrelated to the patch under check or gate, like an 
intermittent connectivity/timeout inducted failure. This normally is 
followed by a 'recheck' comment posted by an engineer, and sometimes is 
noticed by the elastic recheck bot as well. Say, that sort of a failure 
has a probability of N. And the real "product failure", which is related 
to the subject patch and not infra, takes P. So chances to fail for a job is

F = (1 - ((1 - N)*(1 - P)).

Now that we have added a two more "branched clones" for RDO CI OVB jobs 
and a two more zuul jobs, we have this equation as

F = (1 - ((1 - N)^4*(1 - P)).

(I assumed the chances to face a product defect for the cloned branched 
jobs remain unchanged).

This might bring significantly increased chances to fail (see some 
examples [0] for the N/P distribution cases). So folks will start 
posting 'recheck' comments now even more often, like x2 times more 
often. Which would make zuul and RDO CI queues larger, and patches 
sitting there longer - ending up with more time to wait for jobs to 
start its check/gate pipelines. That's what I call 'recheck storms'. And 
w/o branched quickstart/extras, we might have those storms amplified, 
tho that fully depends on real N/P distributions.

[0] https://pastebin.com/ckG5G7NG

Thanks

On Wed, May 23, 2018 at 7:04 PM, Alex Schultz > wrote:

On Wed, May 23, 2018 at 8:30 AM, Sagi Shnaidman > wrote:
> Hi, Sergii
>
> thanks for the question. It's not first time that this topic is raised and
> from first view it could seem that branching would help to that sort of
> issues.
>
> Although it's not the case. Tripleo-quickstart(-extras) is part of CI 
code,
> as well as tripleo-ci repo which have never been branched. The reason for
> that is relative small impact on CI code from product branching. Think 
about
> backport almost *every* patch to oooq and extras to all supported 
branches,
> down to newton at least. This will be a really *huge* price and non
> reasonable work. Just think about active maintenance of 3-4 versions of CI
> code in each of 3 repositories. It will take all time of CI team with 
almost
> zero value of this work.
>

So I'm not sure I completely agree with this assessment as there is a
price paid for every {%if release in [...]%} that we have to carry in
oooq{,-extras}.  These go away if we branch because we don't have to
worry about breaking previous releases or current release (which may
or may not actually have CI results).

> What regards patch you listed, we would have backport this change to 
*every*
> branch, and it wouldn't really help to avoid the issue. The source of

Re: [openstack-dev] [tripleo][ci][infra] Quickstart Branching

2018-05-23 Thread Sergii Golovatiuk

Hi,

On Wed, May 23, 2018 at 8:20 PM, Sagi Shnaidman  wrote:
>
>>
>> to reduce the impact of a change. From my original reply:
>>
>> > If there's a high maintenance cost, we haven't properly identified the
>> > optimal way to separate functionality between tripleo/quickstart.
>>
>> IMHO this is a side effect of having a whole bunch of roles in a
>> single repo.  oooq-extras has a mix of tripleo and non-tripleo related
>> content. The reproducer IMHO is related to provisioning and could fall
>> in the oooq repo and not oooq-extras.  This is a structure problem
>> with quickstart.  If it's not version specific, then don't put it in a
>> version specific repo. But that doesn't mean don't use version
>> specific repos at all.
>>
>> This is one of the reasons why we're opting not to use this pattern of
>> a bunch of roles in a single repo for tripleo itself[0][1][2].  We
>> learned with the puppet modules that carrying all this stuff in a
>> single repo has a huge maintenance cost and if you split them out you
>> can identify re-usability and establish proper patterns for moving
>> functionality into a shared place[3].  Yes there is a maintenance cost
>> of maintaining independent repos, but at the same time there's a
>> benefit of re-usability by other projects/groups when you expose
>> important pieces of functionality as a standalone. You can establish
>> clear ways to interact with each piece, test items, and release
>> independently.  For example the ansible-role-container-registry is not
>> tripleo specific and anyone looking to manage a standalone docker
>> registry can use it & contribute.
>>
>
> We were moving between having all roles in one repo and having a separate
> repo for each role a few times. Each case has it's advantages and
> disadvantages. Last time we moved to have roles in 2 repos - quickstart and
> extras, it was a year ago I think. So far IMHO it's the best approach. There
> will be a mechanism to install additional roles, like we have for
> tirpleo-upgrade, ops-tools, etc etc.

But at the moment we don't have that mechanism so we should live
somehow until it's implemented.

> It may be a much broader topic to discuss, although I think having part of
> roles branched and part of not branched is much more headache.
> Tripleo-upgrade is a good example of it.
>
>>
>> > So in 90% code we DO need to backport every change, take for example the
>> > latest patch to extras: https://review.openstack.org/#/c/570167/, it's
>> > fixing reproducer. If oooq-extra was branched, we would need to backport
>> > this fix to every and every branch. And the same for all other 90% of
>> > code,
>> > which is complete nonsense.
>> > Just because not using "{% if release %}" construct - to block the whole
>> > work of CI team and make the CI code is absolutely unmaintainable?
>> >
>>
>> And you're saying what we currently have is maintainable?  We keep
>> breaking ourselves, there's big gaps in coverage and it takes
>> time[4][5] to identify breakages. I don't consider that maintainable
>> because this is a recurring topic because we clearly haven't fixed it
>> with the current setup.  It's time to re-evaluate what we have an see
>> if there's room for improvement.  I know I wasn't proposing to branch
>> all the repositories, but it might make sense to figure out if there's
>> a way to reduce our recurring issues with stable branches or
>> independent modules for some of the functions in CI.
>
>
>> Considering this is how we broke Queens, I'm not sure I agree.

We broke Queens, Pike, Newton by merging [1] without testing against
these releases.

>>
>
> First of all I don't see any connection between maintenance and CI
> breakages, it's different topics. And yes, it IS maintainable CI that we
> have now, and I have what to compare it with. I remember very well
> tripleo.sh based approach, also you can see almost green dashboards last
> time which proves my statement. CI is not ideal now, but it's definitely
> much better than 1-2 years ago.
>
>
> Of course we have breakages, the CI is actually history of breakages and
> fixes, as any other product. Wrt queens issue, it took about a week to solve
> it not because it was so hard, but because we had a very difficult weeks
> when trying to fix all Centos 7.5 issues and queens branch was in second
> priority. And by the way, we fixed everything much faster then it was with
> CentOS 7.4.  Having the negative attitude that every CI breakage is proof of
> wrong CI structure is not correct and doesn't help. If branching helped in
> this case, it would create much bigger problems in all other cases.

I would like to forget about feeling and discuss the technical side of
2 solutions, cost for every team and product in general to find the
solution that fits all.

>
> Anyway, we saw that having branch jobs in OVB only didn't catch queens issue
> (why - you know better) so we added multinode branch specific ones, which
> will catch such issues in the

Re: [openstack-dev] [tripleo][ci][infra] Quickstart Branching

2018-05-23 Thread Sagi Shnaidman

> to reduce the impact of a change. From my original reply:
>
> > If there's a high maintenance cost, we haven't properly identified the
> optimal way to separate functionality between tripleo/quickstart.
>
> IMHO this is a side effect of having a whole bunch of roles in a
> single repo.  oooq-extras has a mix of tripleo and non-tripleo related
> content. The reproducer IMHO is related to provisioning and could fall
> in the oooq repo and not oooq-extras.  This is a structure problem
> with quickstart.  If it's not version specific, then don't put it in a
> version specific repo. But that doesn't mean don't use version
> specific repos at all.
>
> This is one of the reasons why we're opting not to use this pattern of
> a bunch of roles in a single repo for tripleo itself[0][1][2].  We
> learned with the puppet modules that carrying all this stuff in a
> single repo has a huge maintenance cost and if you split them out you
> can identify re-usability and establish proper patterns for moving
> functionality into a shared place[3].  Yes there is a maintenance cost
> of maintaining independent repos, but at the same time there's a
> benefit of re-usability by other projects/groups when you expose
> important pieces of functionality as a standalone. You can establish
> clear ways to interact with each piece, test items, and release
> independently.  For example the ansible-role-container-registry is not
> tripleo specific and anyone looking to manage a standalone docker
> registry can use it & contribute.
>
>
We were moving between having all roles in one repo and having a separate
repo for each role a few times. Each case has it's advantages and
disadvantages. Last time we moved to have roles in 2 repos - quickstart and
extras, it was a year ago I think. So far IMHO it's the best approach.
There will be a mechanism to install additional roles, like we have for
tirpleo-upgrade, ops-tools, etc etc.
It may be a much broader topic to discuss, although I think having part of
roles branched and part of not branched is much more headache.
Tripleo-upgrade is a good example of it.


> > So in 90% code we DO need to backport every change, take for example the
> > latest patch to extras: https://review.openstack.org/#/c/570167/, it's
> > fixing reproducer. If oooq-extra was branched, we would need to backport
> > this fix to every and every branch. And the same for all other 90% of
> code,
> > which is complete nonsense.
> > Just because not using "{% if release %}" construct - to block the whole
> > work of CI team and make the CI code is absolutely unmaintainable?
> >
>
> And you're saying what we currently have is maintainable?  We keep
> breaking ourselves, there's big gaps in coverage and it takes
> time[4][5] to identify breakages. I don't consider that maintainable
> because this is a recurring topic because we clearly haven't fixed it
> with the current setup.  It's time to re-evaluate what we have an see
> if there's room for improvement.  I know I wasn't proposing to branch
> all the repositories, but it might make sense to figure out if there's
> a way to reduce our recurring issues with stable branches or
> independent modules for some of the functions in CI.
>

Considering this is how we broke Queens, I'm not sure I agree.
>
>
First of all I don't see any connection between maintenance and CI
breakages, it's different topics. And yes, it IS maintainable CI that we
have now, and I have what to compare it with. I remember very well
tripleo.sh based approach, also you can see almost green dashboards last
time which proves my statement. CI is not ideal now, but it's definitely
much better than 1-2 years ago.

Of course we have breakages, the CI is actually history of breakages and
fixes, as any other product. Wrt queens issue, it took about a week to
solve it not because it was so hard, but because we had a very difficult
weeks when trying to fix all Centos 7.5 issues and queens branch was in
second priority. And by the way, we fixed everything much faster then it
was with CentOS 7.4.  Having the negative attitude that every CI breakage
is proof of wrong CI structure is not correct and doesn't help. If
branching helped in this case, it would create much bigger problems in all
other cases.

Anyway, we saw that having branch jobs in OVB only didn't catch queens
issue (why - you know better) so we added multinode branch specific ones,
which will catch such issues in the future. We hit the problem, solved it,
set preventive actions and are ready to catch it next time. This is a
normal CI workflow and I don't see any problem with it. Having multinode
branch jobs is actually pretty similar to "branching" repos but without
maintenance nightmare.

Thanks

Thanks,
> -Alex
>
> [0] http://git.openstack.org/cgit/openstack/ansible-role-
> container-registry/
> [1] http://git.openstack.org/cgit/openstack/ansible-role-redhat-
> subscription/
> [2] http://git.openstack.org/cgit/openstack/ansible-role-tripleo-keystone/
> [3]

Re: [openstack-dev] [tripleo][ci][infra] Quickstart Branching

2018-05-23 Thread Alex Schultz

On Wed, May 23, 2018 at 10:49 AM, Sagi Shnaidman  wrote:
> Alex,
>
> the problem is that you're working and focusing mostly on release specific
> code like featuresets and some scripts. But tripleo-quickstart(-extras) and
> tripleo-ci is much *much* more than set of featuresets. Only 10% of the code
> may be related to releases and branches, while other 90% is completely
> independent and not related to releases.
>

It is not necessarily about release specific code, it's about being
able to reduce the impact of a change. From my original reply:

> If there's a high maintenance cost, we haven't properly identified the 
> optimal way to separate functionality between tripleo/quickstart.

IMHO this is a side effect of having a whole bunch of roles in a
single repo.  oooq-extras has a mix of tripleo and non-tripleo related
content. The reproducer IMHO is related to provisioning and could fall
in the oooq repo and not oooq-extras.  This is a structure problem
with quickstart.  If it's not version specific, then don't put it in a
version specific repo. But that doesn't mean don't use version
specific repos at all.

This is one of the reasons why we're opting not to use this pattern of
a bunch of roles in a single repo for tripleo itself[0][1][2].  We
learned with the puppet modules that carrying all this stuff in a
single repo has a huge maintenance cost and if you split them out you
can identify re-usability and establish proper patterns for moving
functionality into a shared place[3].  Yes there is a maintenance cost
of maintaining independent repos, but at the same time there's a
benefit of re-usability by other projects/groups when you expose
important pieces of functionality as a standalone. You can establish
clear ways to interact with each piece, test items, and release
independently.  For example the ansible-role-container-registry is not
tripleo specific and anyone looking to manage a standalone docker
registry can use it & contribute.

> So in 90% code we DO need to backport every change, take for example the
> latest patch to extras: https://review.openstack.org/#/c/570167/, it's
> fixing reproducer. If oooq-extra was branched, we would need to backport
> this fix to every and every branch. And the same for all other 90% of code,
> which is complete nonsense.
> Just because not using "{% if release %}" construct - to block the whole
> work of CI team and make the CI code is absolutely unmaintainable?
>

And you're saying what we currently have is maintainable?  We keep
breaking ourselves, there's big gaps in coverage and it takes
time[4][5] to identify breakages. I don't consider that maintainable
because this is a recurring topic because we clearly haven't fixed it
with the current setup.  It's time to re-evaluate what we have an see
if there's room for improvement.  I know I wasn't proposing to branch
all the repositories, but it might make sense to figure out if there's
a way to reduce our recurring issues with stable branches or
independent modules for some of the functions in CI.

> Some of release related templates we moved recently from tripleo-ci to THT
> repo like scenarios, OC templates, etc. If we discover another things in
> oooq that could be moved to branched THT I'd be only happy for that.
>
> Sometimes it could be hard to maintain one file in extras templates with
> different logic for releases, like we have in tempest configuration for
> example. The solution is to create a few release-related templates and use
> one that match the current branch. It doesn't affect 90% of code and still
> "branch-like" approach. But I didn't see other scripts that are so release
> dependent. If we'll have ones, we could do the same. For now I see "{% if
> release %}" construct working very well.

Considering this is how we broke Queens, I'm not sure I agree.

>
> I didn't see still any advantage of branching CI code, except of a little
> bit nicer jinja templates without "{% if release ", but amount of
> disadvantages is so huge, that it'll literally block all current work in CI.
>

It's about reducing our risk with test coverage. We do not properly
test all jobs and all configurations when we make these changes. This
is a repeated problem and when we have to add version specific logic,
unless we're able to identify what this is actually impacting and
verify with jobs we have a risk of breaking ourselves.  We've seen
that code review is not sufficient for these changes as we merge
things and only find out after they've been merged that we broke
stable branches. Then it takes folks tracking down changes to decipher
what we broke. For example the original patch[4] broke Queens for
about a week.  That's 7 days of nothing being able to be merged,
that's not OK.

Thanks,
-Alex

[0] http://git.openstack.org/cgit/openstack/ansible-role-container-registry/
[1] http://git.openstack.org/cgit/openstack/ansible-role-redhat-subscription/
[2]

Re: [openstack-dev] [tripleo][ci][infra] Quickstart Branching

2018-05-23 Thread Sagi Shnaidman

Alex,

the problem is that you're working and focusing mostly on release specific
code like featuresets and some scripts. But tripleo-quickstart(-extras) and
tripleo-ci is much *much* more than set of featuresets. Only 10% of the
code may be related to releases and branches, while other 90% is completely
independent and not related to releases.

So in 90% code we DO need to backport every change, take for example the
latest patch to extras: https://review.openstack.org/#/c/570167/, it's
fixing reproducer. If oooq-extra was branched, we would need to backport
this fix to every and every branch. And the same for all other 90% of code,
which is complete nonsense.
Just because not using "{% if release %}" construct - to block the whole
work of CI team and make the CI code is absolutely unmaintainable?

Some of release related templates we moved recently from tripleo-ci to THT
repo like scenarios, OC templates, etc. If we discover another things in
oooq that could be moved to branched THT I'd be only happy for that.

Sometimes it could be hard to maintain one file in extras templates with
different logic for releases, like we have in tempest configuration for
example. The solution is to create a few release-related templates and use
one that match the current branch. It doesn't affect 90% of code and still
"branch-like" approach. But I didn't see other scripts that are so release
dependent. If we'll have ones, we could do the same. For now I see "{% if
release %}" construct working very well.

I didn't see still any advantage of branching CI code, except of a little
bit nicer jinja templates without "{% if release ", but amount of
disadvantages is so huge, that it'll literally block all current work in CI.

Thanks

On Wed, May 23, 2018 at 7:04 PM, Alex Schultz  wrote:

> On Wed, May 23, 2018 at 8:30 AM, Sagi Shnaidman 
> wrote:
> > Hi, Sergii
> >
> > thanks for the question. It's not first time that this topic is raised
> and
> > from first view it could seem that branching would help to that sort of
> > issues.
> >
> > Although it's not the case. Tripleo-quickstart(-extras) is part of CI
> code,
> > as well as tripleo-ci repo which have never been branched. The reason for
> > that is relative small impact on CI code from product branching. Think
> about
> > backport almost *every* patch to oooq and extras to all supported
> branches,
> > down to newton at least. This will be a really *huge* price and non
> > reasonable work. Just think about active maintenance of 3-4 versions of
> CI
> > code in each of 3 repositories. It will take all time of CI team with
> almost
> > zero value of this work.
> >
>
> So I'm not sure I completely agree with this assessment as there is a
> price paid for every {%if release in [...]%} that we have to carry in
> oooq{,-extras}.  These go away if we branch because we don't have to
> worry about breaking previous releases or current release (which may
> or may not actually have CI results).
>
> > What regards patch you listed, we would have backport this change to
> *every*
> > branch, and it wouldn't really help to avoid the issue. The source of
> > problem is not branchless repo here.
> >
>
> No we shouldn't be backporting every change.  The logic in oooq-extras
> should be version specific and if we're changing an interface in
> tripleo in a breaking fashion we're doing it wrong in tripleo. If
> we're backporting things to work around tripleo issues, we're doing it
> wrong in quickstart.
>
> > Regarding catching such issues and Bogdans point, that's right we added a
> > few jobs to catch such issues in the future and prevent breakages, and a
> few
> > running jobs is reasonable price to keep configuration working in all
> > branches. Comparing to maintenance nightmare with branches of CI code,
> it's
> > really a *zero* price.
> >
>
> Nothing is free. If there's a high maintenance cost, we haven't
> properly identified the optimal way to separate functionality between
> tripleo/quickstart.  I have repeatedly said that the provisioning
> parts of quickstart should be separate because those aren't tied to a
> tripleo version and this along with the scenario configs should be the
> only unbranched repo we have. Any roles related to how to
> configure/work with tripleo should be branched and tied to a stable
> branch of tripleo. This would actually be beneficial for tripleo as
> well because then we can see when we are introducing backwards
> incompatible changes.
>
> Thanks,
> -Alex
>
> > Thanks
> >
> >
> > On Wed, May 23, 2018 at 3:43 PM, Sergii Golovatiuk 
> > wrote:
> >>
> >> Hi,
> >>
> >> Looking at [1], I am thinking about the price we paid for not
> >> branching tripleo-quickstart. Can we discuss the options to prevent
> >> the issues such as [1]? Thank you in advance.
> >>
> >> [1] https://review.openstack.org/#/c/569830/4
> >>
> >> --
> >> Best Regards,
> >> Sergii Golovatiuk
> >>
> >>

Re: [openstack-dev] [tripleo][ci][infra] Quickstart Branching

2018-05-23 Thread Alex Schultz

On Wed, May 23, 2018 at 8:30 AM, Sagi Shnaidman  wrote:
> Hi, Sergii
>
> thanks for the question. It's not first time that this topic is raised and
> from first view it could seem that branching would help to that sort of
> issues.
>
> Although it's not the case. Tripleo-quickstart(-extras) is part of CI code,
> as well as tripleo-ci repo which have never been branched. The reason for
> that is relative small impact on CI code from product branching. Think about
> backport almost *every* patch to oooq and extras to all supported branches,
> down to newton at least. This will be a really *huge* price and non
> reasonable work. Just think about active maintenance of 3-4 versions of CI
> code in each of 3 repositories. It will take all time of CI team with almost
> zero value of this work.
>

So I'm not sure I completely agree with this assessment as there is a
price paid for every {%if release in [...]%} that we have to carry in
oooq{,-extras}.  These go away if we branch because we don't have to
worry about breaking previous releases or current release (which may
or may not actually have CI results).

> What regards patch you listed, we would have backport this change to *every*
> branch, and it wouldn't really help to avoid the issue. The source of
> problem is not branchless repo here.
>

No we shouldn't be backporting every change.  The logic in oooq-extras
should be version specific and if we're changing an interface in
tripleo in a breaking fashion we're doing it wrong in tripleo. If
we're backporting things to work around tripleo issues, we're doing it
wrong in quickstart.

> Regarding catching such issues and Bogdans point, that's right we added a
> few jobs to catch such issues in the future and prevent breakages, and a few
> running jobs is reasonable price to keep configuration working in all
> branches. Comparing to maintenance nightmare with branches of CI code, it's
> really a *zero* price.
>

Nothing is free. If there's a high maintenance cost, we haven't
properly identified the optimal way to separate functionality between
tripleo/quickstart.  I have repeatedly said that the provisioning
parts of quickstart should be separate because those aren't tied to a
tripleo version and this along with the scenario configs should be the
only unbranched repo we have. Any roles related to how to
configure/work with tripleo should be branched and tied to a stable
branch of tripleo. This would actually be beneficial for tripleo as
well because then we can see when we are introducing backwards
incompatible changes.

Thanks,
-Alex

> Thanks
>
>
> On Wed, May 23, 2018 at 3:43 PM, Sergii Golovatiuk 
> wrote:
>>
>> Hi,
>>
>> Looking at [1], I am thinking about the price we paid for not
>> branching tripleo-quickstart. Can we discuss the options to prevent
>> the issues such as [1]? Thank you in advance.
>>
>> [1] https://review.openstack.org/#/c/569830/4
>>
>> --
>> Best Regards,
>> Sergii Golovatiuk
>>
>> __
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
>
>
> --
> Best regards
> Sagi Shnaidman
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [tripleo][ci][infra] Quickstart Branching

2018-05-23 Thread Sagi Shnaidman

Hi, Sergii

thanks for the question. It's not first time that this topic is raised and
from first view it could seem that branching would help to that sort of
issues.

Although it's not the case. Tripleo-quickstart(-extras) is part of CI code,
as well as tripleo-ci repo which have never been branched. The reason for
that is relative small impact on CI code from product branching. Think
about backport almost *every* patch to oooq and extras to all supported
branches, down to newton at least. This will be a really *huge* price and
non reasonable work. Just think about active maintenance of 3-4 versions of
CI code in each of 3 repositories. It will take all time of CI team with
almost zero value of this work.

What regards patch you listed, we would have backport this change to
*every* branch, and it wouldn't really help to avoid the issue. The source
of problem is not branchless repo here.

Regarding catching such issues and Bogdans point, that's right we added a
few jobs to catch such issues in the future and prevent breakages, and a
few running jobs is reasonable price to keep configuration working in all
branches. Comparing to maintenance nightmare with branches of CI code, it's
really a *zero* price.

Thanks

On Wed, May 23, 2018 at 3:43 PM, Sergii Golovatiuk 
wrote:

> Hi,
>
> Looking at [1], I am thinking about the price we paid for not
> branching tripleo-quickstart. Can we discuss the options to prevent
> the issues such as [1]? Thank you in advance.
>
> [1] https://review.openstack.org/#/c/569830/4
>
> --
> Best Regards,
> Sergii Golovatiuk
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>

-- 
Best regards
Sagi Shnaidman
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [tripleo][ci][infra] Quickstart Branching

2018-05-23 Thread Bogdan Dobrelya


On 5/23/18 2:43 PM, Sergii Golovatiuk wrote:

Hi,

Looking at [1], I am thinking about the price we paid for not
branching tripleo-quickstart. Can we discuss the options to prevent
the issues such as [1]? Thank you in advance.

[1] https://review.openstack.org/#/c/569830/4



That was only a half of the full price, actually, see also additional 
multinode containers check/gate jobs  [0],[1] from now on executed 
against the master branches of all tripleo repos (IIUC), for release -2 
and -1

from master.

[0] https://review.openstack.org/#/c/569932/
[1] https://review.openstack.org/#/c/569854/


--
Best regards,
Bogdan Dobrelya,
Irc #bogdando

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [tripleo] CI Squads’ Sprint 12 Summary: libvirt-reproducer, python-tempestconf

2018-05-09 Thread Bogdan Dobrelya


On 5/9/18 4:24 AM, Matt Young wrote:

Greetings,

The TripleO squads for CI and Tempest have just completed Sprint 12.  
The following is a summary of activities during this sprint.   Details 
on our team structure can be found in the spec [1].


---

# Sprint 12 Epic (CI): Libvirt Reproducer

* Epic Card: https://trello.com/c/JEGLSVh6/51-reproduce-ci-jobs-with-libvirt
* Tasks: http://ow.ly/O1vZ30jTSc3

"Allow developers to reproduce a multinode CI job on a bare metal host 
using libvirt"
"Enable the same workflows used in upstream CI / reproducer using 
libvirt instead of OVB as the provisioning mechanism"


The CI Squad prototyped, designed, and implemented new functionality for 
our CI reproducer.   “Reproducers” are scripts generated by each CI job 
that allow the job/test to be recreated.  These are useful to both CI 
team members when investigating failures, as well as developers creating 
failures with the intent to iteratively debug and/or fix issues.  Prior 
to this sprint, the reproducer scripts supported reproduction of 
upstream CI jobs using OVB, typically on RDO Cloud.  This sprint we 
extended this capability to support reproduction of jobs in libvirt.


This work was done for a few reasons:

* (short term) enable the team to work on upgrades and other CI team 
tasks more efficiently by mitigating recurring RDO Cloud infrastructure 
issues.  This was the primary motivator for doing this work at this time.
* (mid-longer term) enhance / enable iterative workflows such as THT 
development, debugging deployment scenarios, etc.  Snapshots in 
particular have proven quite useful.  As we look towards a future with a 
viable single-node deployment capability, libvirt has clear benefits for 
common developer scenarios.


Thank you for that, a really cool feature for tripleo development!



It is expected that further iteration and refinement of this initial 
implementation will be required before the tripleo-ci team is able to 
support this broadly.  What we’ve done works as designed.  While we 
welcome folks to explore, please note that we are not announcing a 
supported libvirt reproducer meant for use outside the tripleo-ci team 
at this time.  We expect some degree of change, and have a number of 
RFE’s resulting from our testing as well as documentation patches that 
we’re iterating on.


That said, we think it’s really cool, works well in its current form, 
and are optimistic about its future.


## We did the following (CI):

* Add support to the reproducer script [2,3] generated by CI to enable 
libvirt.

* Basic snapshot create/restore [4] capability.
* Tested Scenarios: featureset 3 (UC idem), 10 (multinode containers), 
37 (min OC + minor update).  See sprint cards for details.
* 14-18 RFE’s identified as part of testing for future work 
http://ow.ly/J2u830jTSLG


---

# Sprint 12 Epic (Tempest):

* Epic Card: https://trello.com/c/ifIYQsxs/75-sprint-12-undercloud-tempest
* Tasks: http://ow.ly/GGvc30jTSfV

“Run tempest on undercloud by using containerized and packaged tempest”
“Complete work items carried from sprint 11 or another side work going on.”

## We did the following (Tempest):

* Create tripleo-ci jobs that run containerized tempest on all stable 
branches.
* Create documentation for configuring and running tempest using 
containerized tempest on UC @tripleo.org , and blog 
posts. [5,6,7]

* Run certification tests via new Jenkins job using ansible role [8]
* Refactor validate-tempest CI role for UC and containers

---

# Ruck and Rover

Each sprint two of the team members assume the roles of Ruck and Rover 
(each for half of the sprint).


* Ruck is responsible to monitoring the CI, checking for failures, 
opening bugs, participate on meetings, and this is your focal point to 
any CI issues.
* Rover is responsible to work on these bugs, fix problems and the rest 
of the team are focused on the sprint. For more information about our 
structure, check [1]


## Ruck & Rover (Sprint 12), Etherpad [9,10]:

* Quique Llorente(quiquell)
* Gabriele Cerami (panda)

A few notable issues where substantial time was spent were:

1767099 
periodic-tripleo-ci-centos-7-multinode-1ctlr-featureset030-master vxlan 
tunnel fails randomly

1758899 reproducer-quickstart.sh building wrong gating package.
1767343 gate tripleo-ci-centos-7-containers-multinode fails to update 
packages in cron container
1762351 
periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp-featureset002-queens-upload 
is timeout Depends on https://bugzilla.redhat.com/show_bug.cgi?id=1565179

1766873 quickstart on ovb doesn't yield a deployment
1767049 Error during test discovery : 'must specify exactly one of host 
or intercept' Depends on https://bugzilla.redhat.com/show_bug.cgi?id=1434385
1767076 Creating pingtest_sack fails: Failed to schedule instances: 
NoValidHost_Remote: No valid host was found

1763634 devmode.sh --ovb fails to deploy overcloud
1765680 Incorrect branch used for not gated tripleo-upgrade repo

Re: [openstack-dev] [TripleO][ci][ceph] switching to config-download by default

2018-04-20 Thread James Slagle

On Thu, Apr 5, 2018 at 10:38 AM, James Slagle  wrote:
> I've pushed up for review a set of patches to switch us over to using
> config-download by default:
>
> https://review.openstack.org/#/q/topic:bp/config-download-default
>
> I believe I've come up with the proper series of steps to switch
> things over. Let me know if you have any feedback or foresee any
> issues:
>
> FIrst, we update remaining multinode jobs
> (https://review.openstack.org/558965) and ovb jobs
> (https://review.openstack.org/559067) that run against master to
> opt-in to config-download. This will expose any issues with these jobs
> and config-download and let us fix those issues.
>
> We can then switch tripleoclient (https://review.openstack.org/558925)
> over to use config-download by default. Since this also requires a
> Heat environment, we must forcibly inject that environment via
> tripleoclient.

FYI, the above work is completed and config-download is now the
default with tripleoclient.

>
> Once the tripleoclient patch lands, we can update
> tripleo-heat-templates to use the mappings from config-download in the
> default resource registry (https://review.openstack.org/558927).
>
> We can then remove the forcibly injected environment from
> tripleoclient (https://review.openstack.org/558931)

We're now moving forward with the above 2 patches. jtomasek is making
good progress with the UI and support for config-download should be
landing there soon.

>
> Finally, we can go back and update the multinode/ovb jobs on master to
> not be opt-in for config-download since it would now be the default
> (no patch yet).
>
> Now...for Ceph it will be slightly different:

It took some CI wrangling, but Ceph is now switched over to use
external_deploy_tasks. There are patches in progress to clean up the
old workflow_tasks:

https://review.openstack.org/563040
https://review.openstack.org/563113

There will be some further patches for CI to remove other explicit
opt-in's for config-download since it's now the default.

Feel free to ping me directly if you think you've found any issues
related to any of the config-download work, or file bugs in launchpad
using the official "config-download" tag.

-- 
-- James Slagle
--

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [tripleo][ci] use of tags in launchpad bugs

2018-04-06 Thread Rafael Folco

Thanks for the clarifications about official tags. I was the one creating
random/non-official tags for tripleo bugs.
Although this may be annoying for some people, it helped me while
ruckering/rovering CI to open unique bugs and avoid dups for the first
time(s).
There isn't a standard way of filing a bug. People open bugs using
different/non-standard wording in summary and description.
I just thought it was a good idea to tag featuresetXXX, ovb, branch, etc.,
so when somebody asks me if there is a bug for the job XYZ, the bug could
be found more easily.

Since sprint 10 ruck/rover started recording notes [1] and this helps to
keep track of the issues.
Perhaps the CI team could implement something on CI monitoring that links a
bug to the failing job(s), e.g:  [LP XX].

I'm doing a cleanup for the open bugs removing the non-official tags.

Thanks,

--Folco

[1] https://review.rdoproject.org/etherpad/p/ruckrover-sprint11


On Fri, Apr 6, 2018 at 6:09 AM, Jiří Stránský  wrote:

> On 5.4.2018 21:04, Alex Schultz wrote:
>
>> On Thu, Apr 5, 2018 at 12:55 PM, Wesley Hayutin 
>> wrote:
>>
>>> FYI...
>>>
>>> This is news to me so thanks to Emilien for pointing it out [1].
>>> There are official tags for tripleo launchpad bugs.  Personally, I like
>>> what
>>> I've seen recently with some extra tags as they could be helpful in
>>> finding
>>> the history of particular issues.
>>> So hypothetically would it be "wrong" to create an official tag for each
>>> featureset config number upstream.  I ask because that is adding a lot of
>>> tags but also serves as a good test case for what is good/bad use of
>>> tags.
>>>
>>>
>> We list official tags over in the specs repo[0].   That being said as
>> we investigate switching over to storyboard, we'll probably want to
>> revisit tags as they will have to be used more to replace some of the
>> functionality we had with launchpad (e.g. milestones).  You could
>> always add the tags without being an official tag. I'm not sure I
>> would really want all the featuresets as tags.  I'd rather see us
>> actually figure out what component is actually failing than relying on
>> a featureset (and the Rosetta stone for decoding featuresets to
>> functionality[1]).
>>
>
> We could also use both alongside. Component-based tags better relate to
> the actual root cause of the bug, while featureset-based tags are useful in
> relation to CI.
>
> E.g. "I see fs037 failing, i wonder if anyone already reported a bug for
> it" -- if the reporter tagged the bug, it would be really easy to figure
> out the answer.
>
> This might also again bring up the question of better job names to allow
> easier mapping to featuresets. IMO:
>
> tripleo-ci-centos-7-containers-multinode  -- not great
> tripleo-ci-centos-7-featureset010  -- not great
> tripleo-ci-centos-7-containers-mn-fs010  -- *happy face*
>
> Jirka
>
>
>
>>
>> Thanks,
>> -Alex
>>
>>
>> [0] http://git.openstack.org/cgit/openstack/tripleo-specs/tree/s
>> pecs/policy/bug-tagging.rst#n30
>> [1] https://git.openstack.org/cgit/openstack/tripleo-quickstart/
>> tree/doc/source/feature-configuration.rst#n21
>>
>>> Thanks
>>>
>>> [1] https://bugs.launchpad.net/tripleo/+manage-official-tags
>>>
>>> 
>>> __
>>> OpenStack Development Mailing List (not for usage questions)
>>> Unsubscribe: openstack-dev-requ...@lists.op
>>> enstack.org?subject:unsubscribe
>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>>
>>>
>> 
>> __
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscrib
>> e
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>>
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>



-- 
Rafael Folco
Senior Software Engineer
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [tripleo][ci] use of tags in launchpad bugs

2018-04-06 Thread Jiří Stránský

On 5.4.2018 21:04, Alex Schultz wrote:

On Thu, Apr 5, 2018 at 12:55 PM, Wesley Hayutin wrote:

FYI...

This is news to me so thanks to Emilien for pointing it out [1].
There are official tags for tripleo launchpad bugs. Personally, I like what
I've seen recently with some extra tags as they could be helpful in finding
the history of particular issues.
So hypothetically would it be "wrong" to create an official tag for each
featureset config number upstream. I ask because that is adding a lot of
tags but also serves as a good test case for what is good/bad use of tags.

We list official tags over in the specs repo[0]. That being said as
we investigate switching over to storyboard, we'll probably want to
revisit tags as they will have to be used more to replace some of the
functionality we had with launchpad (e.g. milestones). You could
always add the tags without being an official tag. I'm not sure I
would really want all the featuresets as tags. I'd rather see us
actually figure out what component is actually failing than relying on
a featureset (and the Rosetta stone for decoding featuresets to
functionality[1]).

We could also use both alongside. Component-based tags better relate to
the actual root cause of the bug, while featureset-based tags are useful
in relation to CI.

E.g. "I see fs037 failing, i wonder if anyone already reported a bug for
it" -- if the reporter tagged the bug, it would be really easy to figure
out the answer.

This might also again bring up the question of better job names to allow
easier mapping to featuresets. IMO:

tripleo-ci-centos-7-containers-multinode -- not great
tripleo-ci-centos-7-featureset010 -- not great
tripleo-ci-centos-7-containers-mn-fs010 -- *happy face*

Jirka

Thanks,
-Alex

[0]
http://git.openstack.org/cgit/openstack/tripleo-specs/tree/specs/policy/bug-tagging.rst#n30
[1]
https://git.openstack.org/cgit/openstack/tripleo-quickstart/tree/doc/source/feature-configuration.rst#n21

Thanks

[1] https://bugs.launchpad.net/tripleo/+manage-official-tags

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [tripleo][ci] use of tags in launchpad bugs

2018-04-05 Thread Alex Schultz

On Thu, Apr 5, 2018 at 12:55 PM, Wesley Hayutin  wrote:
> FYI...
>
> This is news to me so thanks to Emilien for pointing it out [1].
> There are official tags for tripleo launchpad bugs.  Personally, I like what
> I've seen recently with some extra tags as they could be helpful in finding
> the history of particular issues.
> So hypothetically would it be "wrong" to create an official tag for each
> featureset config number upstream.  I ask because that is adding a lot of
> tags but also serves as a good test case for what is good/bad use of tags.
>

We list official tags over in the specs repo[0].   That being said as
we investigate switching over to storyboard, we'll probably want to
revisit tags as they will have to be used more to replace some of the
functionality we had with launchpad (e.g. milestones).  You could
always add the tags without being an official tag. I'm not sure I
would really want all the featuresets as tags.  I'd rather see us
actually figure out what component is actually failing than relying on
a featureset (and the Rosetta stone for decoding featuresets to
functionality[1]).

Thanks,
-Alex

[0] 
http://git.openstack.org/cgit/openstack/tripleo-specs/tree/specs/policy/bug-tagging.rst#n30
[1] 
https://git.openstack.org/cgit/openstack/tripleo-quickstart/tree/doc/source/feature-configuration.rst#n21
> Thanks
>
> [1] https://bugs.launchpad.net/tripleo/+manage-official-tags
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [TripleO][ci][ceph] switching to config-download by default

2018-04-05 Thread James Slagle

On Thu, Apr 5, 2018 at 10:38 AM, James Slagle  wrote:
> I've pushed up for review a set of patches to switch us over to using
> config-download by default:
>
> https://review.openstack.org/#/q/topic:bp/config-download-default
>
> I believe I've come up with the proper series of steps to switch
> things over. Let me know if you have any feedback or foresee any
> issues:
>
> FIrst, we update remaining multinode jobs
> (https://review.openstack.org/558965) and ovb jobs
> (https://review.openstack.org/559067) that run against master to
> opt-in to config-download. This will expose any issues with these jobs
> and config-download and let us fix those issues.
>
> We can then switch tripleoclient (https://review.openstack.org/558925)
> over to use config-download by default. Since this also requires a
> Heat environment, we must forcibly inject that environment via
> tripleoclient.
>
> Once the tripleoclient patch lands, we can update
> tripleo-heat-templates to use the mappings from config-download in the
> default resource registry (https://review.openstack.org/558927).

I forgot to mention that at this point the UI would have to be working
with config-download before we land that tripleo-heat-templates patch.
Or, the UI could opt-in to the
disable-config-download-environment.yaml that I'm providing with that
patch.


-- 
-- James Slagle
--

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [TripleO][CI][QA][HA][Eris][LCOO] Validating HA on upstream

2018-03-15 Thread Sam P

Hi All,
 Sorry, Late to the party...
 I have added myself.

--- Regards,
Sampath


On Fri, Mar 16, 2018 at 9:31 AM, Ghanshyam Mann 
wrote:

> On Thu, Mar 15, 2018 at 9:45 PM, Adam Spiers  wrote:
> > Raoul Scarazzini  wrote:
> >>
> >> On 15/03/2018 01:57, Ghanshyam Mann wrote:
> >>>
> >>> Thanks all for starting the collaboration on this which is long pending
> >>> things and we all want to have some start on this.
> >>> Myself and SamP talked about it during OPS meetup in Tokyo and we
> talked
> >>> about below draft plan-
> >>> - Update the Spec - https://review.openstack.org/#/c/443504/. which is
> >>> almost ready as per SamP and his team is working on that.
> >>> - Start the technical debate on tooling we can use/reuse like Yardstick
> >>> etc, which is more this mailing thread.
> >>> - Accept the new repo for Eris under QA and start at least something in
> >>> Rocky cycle.
> >>> I am in for having meeting on this which is really good idea. non-IRC
> >>> meeting is totally fine here. Do we have meeting place and time setup ?
> >>> -gmann
> >>
> >>
> >> Hi Ghanshyam,
> >> as I wrote earlier in the thread it's no problem for me to offer my
> >> bluejeans channel, let's sort out which timeslice can be good. I've
> >> added to the main etherpad [1] my timezone (line 53), let's do all that
> >> so that we can create the meeting invite.
> >>
> >> [1] https://etherpad.openstack.org/p/extreme-testing-contacts
> >
> >
> > Good idea!  I've added mine.  We're still missing replies from several
> > key stakeholders though (lines 62++) - probably worth getting buy-in
> > from a few more people before we organise anything.  I'm pinging a few
> > on IRC with reminders about this.
> >
>
> Thanks rasca, aspiers. I have added myself there and yea good ides to
> ping remaining on IRC.
>
> -gmann
>
> > 
> __
> > OpenStack Development Mailing List (not for usage questions)
> > Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:
> unsubscribe
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [TripleO][CI][QA][HA][Eris][LCOO] Validating HA on upstream

2018-03-15 Thread Ghanshyam Mann

On Thu, Mar 15, 2018 at 9:45 PM, Adam Spiers  wrote:
> Raoul Scarazzini  wrote:
>>
>> On 15/03/2018 01:57, Ghanshyam Mann wrote:
>>>
>>> Thanks all for starting the collaboration on this which is long pending
>>> things and we all want to have some start on this.
>>> Myself and SamP talked about it during OPS meetup in Tokyo and we talked
>>> about below draft plan-
>>> - Update the Spec - https://review.openstack.org/#/c/443504/. which is
>>> almost ready as per SamP and his team is working on that.
>>> - Start the technical debate on tooling we can use/reuse like Yardstick
>>> etc, which is more this mailing thread.
>>> - Accept the new repo for Eris under QA and start at least something in
>>> Rocky cycle.
>>> I am in for having meeting on this which is really good idea. non-IRC
>>> meeting is totally fine here. Do we have meeting place and time setup ?
>>> -gmann
>>
>>
>> Hi Ghanshyam,
>> as I wrote earlier in the thread it's no problem for me to offer my
>> bluejeans channel, let's sort out which timeslice can be good. I've
>> added to the main etherpad [1] my timezone (line 53), let's do all that
>> so that we can create the meeting invite.
>>
>> [1] https://etherpad.openstack.org/p/extreme-testing-contacts
>
>
> Good idea!  I've added mine.  We're still missing replies from several
> key stakeholders though (lines 62++) - probably worth getting buy-in
> from a few more people before we organise anything.  I'm pinging a few
> on IRC with reminders about this.
>

Thanks rasca, aspiers. I have added myself there and yea good ides to
ping remaining on IRC.

-gmann

> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [TripleO][CI][QA][HA][Eris][LCOO] Validating HA on upstream

2018-03-15 Thread Adam Spiers


Raoul Scarazzini  wrote:

On 15/03/2018 01:57, Ghanshyam Mann wrote:

Thanks all for starting the collaboration on this which is long pending
things and we all want to have some start on this.
Myself and SamP talked about it during OPS meetup in Tokyo and we talked
about below draft plan-
- Update the Spec - https://review.openstack.org/#/c/443504/. which is
almost ready as per SamP and his team is working on that.
- Start the technical debate on tooling we can use/reuse like Yardstick
etc, which is more this mailing thread. 
- Accept the new repo for Eris under QA and start at least something in
Rocky cycle.
I am in for having meeting on this which is really good idea. non-IRC
meeting is totally fine here. Do we have meeting place and time setup ?
-gmann


Hi Ghanshyam,
as I wrote earlier in the thread it's no problem for me to offer my
bluejeans channel, let's sort out which timeslice can be good. I've
added to the main etherpad [1] my timezone (line 53), let's do all that
so that we can create the meeting invite.

[1] https://etherpad.openstack.org/p/extreme-testing-contacts


Good idea!  I've added mine.  We're still missing replies from several
key stakeholders though (lines 62++) - probably worth getting buy-in
from a few more people before we organise anything.  I'm pinging a few
on IRC with reminders about this.

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [TripleO][CI][QA][HA][Eris][LCOO] Validating HA on upstream

2018-03-15 Thread Raoul Scarazzini

On 15/03/2018 01:57, Ghanshyam Mann wrote:
> Thanks all for starting the collaboration on this which is long pending
> things and we all want to have some start on this.
> Myself and SamP talked about it during OPS meetup in Tokyo and we talked
> about below draft plan-
> - Update the Spec - https://review.openstack.org/#/c/443504/. which is
> almost ready as per SamP and his team is working on that.
> - Start the technical debate on tooling we can use/reuse like Yardstick
> etc, which is more this mailing thread. 
> - Accept the new repo for Eris under QA and start at least something in
> Rocky cycle.
> I am in for having meeting on this which is really good idea. non-IRC
> meeting is totally fine here. Do we have meeting place and time setup ?
> -gmann

Hi Ghanshyam,
as I wrote earlier in the thread it's no problem for me to offer my
bluejeans channel, let's sort out which timeslice can be good. I've
added to the main etherpad [1] my timezone (line 53), let's do all that
so that we can create the meeting invite.

[1] https://etherpad.openstack.org/p/extreme-testing-contacts

-- 
Raoul Scarazzini
ra...@redhat.com

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [TripleO][CI][QA][HA][Eris][LCOO] Validating HA on upstream

2018-03-14 Thread Ghanshyam Mann

Thanks all for starting the collaboration on this which is long pending
things and we all want to have some start on this.

Myself and SamP talked about it during OPS meetup in Tokyo and we talked
about below draft plan-

- Update the Spec - https://review.openstack.org/#/c/443504/. which is
almost ready as per SamP and his team is working on that.
- Start the technical debate on tooling we can use/reuse like Yardstick
etc, which is more this mailing thread.
- Accept the new repo for Eris under QA and start at least something in
Rocky cycle.

I am in for having meeting on this which is really good idea. non-IRC
meeting is totally fine here. Do we have meeting place and time setup ?

-gmann

On Fri, Mar 9, 2018 at 8:16 PM, Bogdan Dobrelya  wrote:

> On 3/8/18 6:44 PM, Raoul Scarazzini wrote:
>
>> On 08/03/2018 17:03, Adam Spiers wrote:
>> [...]
>>
>>> Yes agreed again, this is a strong case for collaboration between the
>>> self-healing and QA SIGs.  In Dublin we also discussed the idea of the
>>> self-healing and API SIGs collaborating on the related topic of health
>>> check APIs.
>>>
>>
>> Guys, thanks a ton for your involvement in the topic, I am +1 to any
>> kind of meeting we can have to discuss this (like it was proposed by
>>
>
> Please count me in as well. I can't stop dreaming of Jepsen's Nemesis [0]
> hammering openstack to make it stronger :D
> Jokes off, let's do the best to consolidate on frameworks and tools and
> ditching NIH syndrome!
>
> [0] https://github.com/jepsen-io/jepsen/blob/master/jepsen/src/j
> epsen/nemesis.clj
>
> Adam) so I'll offer my bluejeans channel for whatever kind of meeting we
>> want to organize.
>> About the best practices part Georg was mentioning I'm 100% in
>> agreement, the testing methodologies are the first thing we need to care
>> about, starting from what we want to achieve.
>> That said, I'll keep studying Yardstick.
>>
>> Hope to hear from you soon, and thanks again!
>>
>>
>
> --
> Best regards,
> Bogdan Dobrelya,
> Irc #bogdando
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [TripleO][CI][QA][HA][Eris][LCOO] Validating HA on upstream

2018-03-09 Thread Bogdan Dobrelya


On 3/8/18 6:44 PM, Raoul Scarazzini wrote:

On 08/03/2018 17:03, Adam Spiers wrote:
[...]

Yes agreed again, this is a strong case for collaboration between the
self-healing and QA SIGs.  In Dublin we also discussed the idea of the
self-healing and API SIGs collaborating on the related topic of health
check APIs.


Guys, thanks a ton for your involvement in the topic, I am +1 to any
kind of meeting we can have to discuss this (like it was proposed by


Please count me in as well. I can't stop dreaming of Jepsen's Nemesis 
[0] hammering openstack to make it stronger :D
Jokes off, let's do the best to consolidate on frameworks and tools and 
ditching NIH syndrome!


[0] 
https://github.com/jepsen-io/jepsen/blob/master/jepsen/src/jepsen/nemesis.clj



Adam) so I'll offer my bluejeans channel for whatever kind of meeting we
want to organize.
About the best practices part Georg was mentioning I'm 100% in
agreement, the testing methodologies are the first thing we need to care
about, starting from what we want to achieve.
That said, I'll keep studying Yardstick.

Hope to hear from you soon, and thanks again!




--
Best regards,
Bogdan Dobrelya,
Irc #bogdando

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [TripleO][CI][QA][HA][Eris][LCOO] Validating HA on upstream

2018-03-08 Thread Adam Spiers


Raoul Scarazzini  wrote:

On 08/03/2018 17:03, Adam Spiers wrote:
[...]

Yes agreed again, this is a strong case for collaboration between the
self-healing and QA SIGs.  In Dublin we also discussed the idea of the
self-healing and API SIGs collaborating on the related topic of health
check APIs.


Guys, thanks a ton for your involvement in the topic, I am +1 to any
kind of meeting we can have to discuss this (like it was proposed by
Adam) so I'll offer my bluejeans channel for whatever kind of meeting we
want to organize.


Awesome, thanks - bluejeans would be great.


About the best practices part Georg was mentioning I'm 100% in
agreement, the testing methodologies are the first thing we need to care
about, starting from what we want to achieve.
That said, I'll keep studying Yardstick.

Hope to hear from you soon, and thanks again!


Yep - let's wait for people to catch up with the thread and hopefully
we'll get enough volunteers on

 https://etherpad.openstack.org/p/extreme-testing-contacts

for critical mass and then we can start discussing!  I think it's
especially important that we have the Eris folks on board since they
have already been working on this for a while.

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [TripleO][CI][QA][HA][Eris][LCOO] Validating HA on upstream

2018-03-08 Thread Raoul Scarazzini

On 08/03/2018 17:03, Adam Spiers wrote:
[...]
> Yes agreed again, this is a strong case for collaboration between the
> self-healing and QA SIGs.  In Dublin we also discussed the idea of the
> self-healing and API SIGs collaborating on the related topic of health
> check APIs.

Guys, thanks a ton for your involvement in the topic, I am +1 to any
kind of meeting we can have to discuss this (like it was proposed by
Adam) so I'll offer my bluejeans channel for whatever kind of meeting we
want to organize.
About the best practices part Georg was mentioning I'm 100% in
agreement, the testing methodologies are the first thing we need to care
about, starting from what we want to achieve.
That said, I'll keep studying Yardstick.

Hope to hear from you soon, and thanks again!

-- 
Raoul Scarazzini
ra...@redhat.com

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [TripleO][CI][QA][HA][Eris][LCOO] Validating HA on upstream

2018-03-08 Thread Adam Spiers

Georg Kunz wrote:

Hi Adam,

Raoul Scarazzini wrote:
In the meantime, I'll check yardstick to see which kind of bridge we
can build to avoid reinventing the wheel.

Great, thanks! I wish I could immediately help with this, but I haven't had the
chance to learn yardstick myself yet. We should probably try to recruit
someone from OPNFV to provide advice. I've cc'd Georg who IIRC was the
person who originally told me about yardstick :-) He is an NFV expert and is
also very interested in automated testing efforts:

http://lists.openstack.org/pipermail/openstack-dev/2017-November/124942.html

so he may be able to help with this architectural challenge.

Thank you for bringing this up here. Better collaboration and sharing of knowledge, methodologies and tools across the communities is really what I'd like to see and facilitate. Hence, I am happy to help.

I have already started to advertise the newly proposed QA SIG in the OPNFV test WG and I'll happily do the same for the self-healing SIG and any HA testing efforts in general. There is certainly some overlapping interest in these testing aspects between the QA SIG and the self-healing SIG and hence collaboration between both SIGs is crucial.

That's fantastic - thank you so much!

One remark regarding tools and frameworks: I consider the true value of a SIG to be a place for talking about methodologies and best practices: What do we need to test? What are the challenges? How can we approach this across communities? The tools and frameworks are important and we should investigate which tools are available, how good they are, how much they fit a given purpose, but at the end of the day they are tools meant to enable well designed testing methodologies.

Agreed 100%.

[snipped]

I'm beginning to think that maybe we should organise a video conference call
to coordinate efforts between the various interested parties. If there is
appetite for that, the first question is: who wants to be involved? To answer
that, I have created an etherpad where interested people can sign up:

https://etherpad.openstack.org/p/extreme-testing-contacts

and I've cc'd people who I think would probably be interested. Does this
sound like a good approach?

We discussed a very similar idea in Dublin in the context of the QA SIG. I very much like the idea of a cross-community, cross-team, and apparently even cross-SIG approach.

Yes agreed again, this is a strong case for collaboration between the
self-healing and QA SIGs. In Dublin we also discussed the idea of the
self-healing and API SIGs collaborating on the related topic of health
check APIs.

Re: [openstack-dev] [TripleO][CI][QA][HA][Eris][LCOO] Validating HA on upstream

2018-03-07 Thread Georg Kunz

Hi Adam,

> Raoul Scarazzini  wrote:
> >On 06/03/2018 13:27, Adam Spiers wrote:
> >> Hi Raoul and all,
> >> Sorry for joining this discussion late!
> >[...]
> >> I do not work on TripleO, but I'm part of the wider OpenStack
> >> sub-communities which focus on HA[0] and more recently,
> >> self-healing[1].  With that hat on, I'd like to suggest that maybe
> >> it's possible to collaborate on this in a manner which is agnostic to
> >> the deployment mechanism.  There is an open spec on this>
> >> https://review.openstack.org/#/c/443504/
> >> which was mentioned in the Denver PTG session on destructive testing
> >> which you referenced[2].
> >[...]
> >>    https://www.opnfv.org/community/projects/yardstick
> >[...]
> >> Currently each sub-community and vendor seems to be reinventing HA
> >> testing by itself to some extent, which is easier to accomplish in
> >> the short-term, but obviously less efficient in the long-term.  It
> >> would be awesome if we could break these silos down and join efforts!
> >> :-)
> >
> >Hi Adam,
> >First of all thanks for your detailed answer. Then let me be honest
> >while saying that I didn't know yardstick.
> 
> Neither did I until Sydney, despite being involved with OpenStack HA for
> many years ;-)  I think this shows that either a) there is room for improved
> communication between the OpenStack and OPNFV communities, or b) I
> need to take my head out of the sand more often ;-)
> 
> >I need to start from scratch
> >here to understand what this project is. In any case, the exact meaning
> >of this thread is to involve people and have a more comprehensive look
> >at what's around.
> >The point here is that, as you can see from the tripleo-ha-utils spec
> >[1] I've created, the project is meant for TripleO specifically. On one
> >side this is a significant limitation, but on the other one, due to the
> >pluggable nature of the project, I think that integrations with other
> >software like you are proposing is not impossible.
> 
> Yep.  I totally sympathise with the tension between the need to get
> something working quickly, vs. the need to collaborate with the community
> in the most efficient way.
> 
> >Feel free to add your comments to the review.
> 
> The spec looks great to me; I don't really have anything to add, and I don't
> feel comfortable voting in a project which I know very little about.
> 
> >In the meantime, I'll check yardstick to see which kind of bridge we
> >can build to avoid reinventing the wheel.
> 
> Great, thanks!  I wish I could immediately help with this, but I haven't had 
> the
> chance to learn yardstick myself yet.  We should probably try to recruit
> someone from OPNFV to provide advice.  I've cc'd Georg who IIRC was the
> person who originally told me about yardstick :-)  He is an NFV expert and is
> also very interested in automated testing efforts:
> 
> http://lists.openstack.org/pipermail/openstack-dev/2017-
> November/124942.html
> 
> so he may be able to help with this architectural challenge.

Thank you for bringing this up here. Better collaboration and sharing of 
knowledge, methodologies and tools across the communities is really what I'd 
like to see and facilitate. Hence, I am happy to help.

I have already started to advertise the newly proposed QA SIG in the OPNFV test 
WG and I'll happily do the same for the self-healing SIG and any HA testing 
efforts in general. There is certainly some overlapping interest in these 
testing aspects between the QA SIG and the self-healing SIG and hence 
collaboration between both SIGs is crucial.

One remark regarding tools and frameworks: I consider the true value of a SIG 
to be a place for talking about methodologies and best practices: What do we 
need to test? What are the challenges? How can we approach this across 
communities? The tools and frameworks are important and we should investigate 
which tools are available, how good they are, how much they fit a given 
purpose, but at the end of the day they are tools meant to enable well designed 
testing methodologies.

> Also you should be aware that work has already started on Eris, the extreme
> testing framework proposed in this user story:
> 
> http://specs.openstack.org/openstack/openstack-user-stories/user-
> stories/proposed/openstack_extreme_testing.html
> 
> and in the spec you already saw:
> 
> https://review.openstack.org/#/c/443504/
> 
> You can see ongoing work here:
> 
> https://github.com/LCOO/eris
> https://openstack-
> lcoo.atlassian.net/wiki/spaces/LCOO/pages/13393034/Eris+-
> +Extreme+Testing+Framework+for+OpenStack
> 
> It looks like there is a plan to propose a new SIG for this, although 
> personally I
> would be very happy to see it adopted by the self-healing SIG, since this
> framework is exactly what is needed for testing any self-healing mechanism.
> 
> I'm hoping that Sampath and/or Gautum will chip in here, since I think they're
> currently the main drivers for Eris.
> 
>

Re: [openstack-dev] [TripleO][CI][QA][HA][Eris][LCOO] Validating HA on upstream

2018-03-07 Thread Adam Spiers

Raoul Scarazzini wrote:

On 06/03/2018 13:27, Adam Spiers wrote:

Hi Raoul and all,
Sorry for joining this discussion late!

[...]

I do not work on TripleO, but I'm part of the wider OpenStack
sub-communities which focus on HA[0] and more recently,
self-healing[1]. With that hat on, I'd like to suggest that maybe
it's possible to collaborate on this in a manner which is agnostic to
the deployment mechanism. There is an open spec on this>
https://review.openstack.org/#/c/443504/
which was mentioned in the Denver PTG session on destructive testing
which you referenced[2].

[...]

https://www.opnfv.org/community/projects/yardstick

[...]

Currently each sub-community and vendor seems to be reinventing HA
testing by itself to some extent, which is easier to accomplish in the
short-term, but obviously less efficient in the long-term. It would
be awesome if we could break these silos down and join efforts! :-)

Hi Adam,
First of all thanks for your detailed answer. Then let me be honest
while saying that I didn't know yardstick.

Neither did I until Sydney, despite being involved with OpenStack HA
for many years ;-) I think this shows that either a) there is room
for improved communication between the OpenStack and OPNFV
communities, or b) I need to take my head out of the sand more often ;-)

I need to start from scratch
here to understand what this project is. In any case, the exact meaning
of this thread is to involve people and have a more comprehensive look
at what's around.
The point here is that, as you can see from the tripleo-ha-utils spec
[1] I've created, the project is meant for TripleO specifically. On one
side this is a significant limitation, but on the other one, due to the
pluggable nature of the project, I think that integrations with other
software like you are proposing is not impossible.

Yep. I totally sympathise with the tension between the need to get
something working quickly, vs. the need to collaborate with the
community in the most efficient way.

Feel free to add your comments to the review.

The spec looks great to me; I don't really have anything to add, and I
don't feel comfortable voting in a project which I know very little
about.

In the meantime, I'll check yardstick to see which kind of bridge we
can build to avoid reinventing the wheel.

Great, thanks! I wish I could immediately help with this, but I
haven't had the chance to learn yardstick myself yet. We should
probably try to recruit someone from OPNFV to provide advice. I've
cc'd Georg who IIRC was the person who originally told me about
yardstick :-) He is an NFV expert and is also very interested in
automated testing efforts:

http://lists.openstack.org/pipermail/openstack-dev/2017-November/124942.html

so he may be able to help with this architectural challenge.

Also you should be aware that work has already started on Eris, the
extreme testing framework proposed in this user story:

http://specs.openstack.org/openstack/openstack-user-stories/user-stories/proposed/openstack_extreme_testing.html

and in the spec you already saw:

https://review.openstack.org/#/c/443504/

You can see ongoing work here:

https://github.com/LCOO/eris

https://openstack-lcoo.atlassian.net/wiki/spaces/LCOO/pages/13393034/Eris+-+Extreme+Testing+Framework+for+OpenStack

It looks like there is a plan to propose a new SIG for this, although
personally I would be very happy to see it adopted by the self-healing
SIG, since this framework is exactly what is needed for testing any
self-healing mechanism.

I'm hoping that Sampath and/or Gautum will chip in here, since I think
they're currently the main drivers for Eris.

I'm beginning to think that maybe we should organise a video
conference call to coordinate efforts between the various interested
parties. If there is appetite for that, the first question is: who
wants to be involved? To answer that, I have created an etherpad
where interested people can sign up:

https://etherpad.openstack.org/p/extreme-testing-contacts

and I've cc'd people who I think would probably be interested. Does
this sound like a good approach?

Cheers,
Adam

Re: [openstack-dev] [TripleO][CI][QA][HA] Validating HA on upstream

2018-03-06 Thread Raoul Scarazzini

On 06/03/2018 13:27, Adam Spiers wrote:
> Hi Raoul and all,
> Sorry for joining this discussion late!
[...]
> I do not work on TripleO, but I'm part of the wider OpenStack
> sub-communities which focus on HA[0] and more recently,
> self-healing[1].  With that hat on, I'd like to suggest that maybe
> it's possible to collaborate on this in a manner which is agnostic to
> the deployment mechanism.  There is an open spec on this>    
> https://review.openstack.org/#/c/443504/
> which was mentioned in the Denver PTG session on destructive testing
> which you referenced[2].
[...]
>    https://www.opnfv.org/community/projects/yardstick
[...]
> Currently each sub-community and vendor seems to be reinventing HA
> testing by itself to some extent, which is easier to accomplish in the
> short-term, but obviously less efficient in the long-term.  It would
> be awesome if we could break these silos down and join efforts! :-)

Hi Adam,
First of all thanks for your detailed answer. Then let me be honest
while saying that I didn't know yardstick. I need to start from scratch
here to understand what this project is. In any case, the exact meaning
of this thread is to involve people and have a more comprehensive look
at what's around.
The point here is that, as you can see from the tripleo-ha-utils spec
[1] I've created, the project is meant for TripleO specifically. On one
side this is a significant limitation, but on the other one, due to the
pluggable nature of the project, I think that integrations with other
software like you are proposing is not impossible.
Feel free to add your comments to the review. In the meantime, I'll
check yardstick to see which kind of bridge we can build to avoid
reinventing the wheel.

Thanks a lot again for your involvement,

[1] https://review.openstack.org/#/c/548874/

-- 
Raoul Scarazzini
ra...@redhat.com

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [TripleO][CI][QA][HA] Validating HA on upstream

2018-03-06 Thread Adam Spiers


Hi Raoul and all,

Sorry for joining this discussion late!

Raoul Scarazzini  wrote:

TL;DR: we would like to change the way HA is tested upstream to avoid
being hitten by evitable bugs that the CI process should discover.

Long version:

Today HA testing in upstream consist only in verifying that a three
controllers setup comes up correctly and can spawn an instance. That's
something, but it’s far from being enough since we continuously see "day
two" bugs.
We started covering this more than a year ago in internal CI and today
also on rdocloud using a project named tripleo-quickstart-utils [1].
Apart from his name, the project is not limited to tripleo-quickstart,
it covers three principal roles:

1 - stonith-config: a playbook that can be used to automate the creation
of fencing devices in the overcloud;
2 - instance-ha: a playbook that automates the seventeen manual steps
needed to configure instance HA in the overcloud, test them via rally
and verify that instance HA works;
3 - validate-ha: a playbook that runs a series of disruptive actions in
the overcloud and verifies it always behaves correctly by deploying a
heat-template that involves all the overcloud components;


Yes, a more rigorous approach to HA testing obviously has huge value,
not just for TripleO deployments, but also for any type of OpenStack
deployment.


To make this usable upstream, we need to understand where to put this
code. Here some choices:


[snipped]

I do not work on TripleO, but I'm part of the wider OpenStack
sub-communities which focus on HA[0] and more recently,
self-healing[1].  With that hat on, I'd like to suggest that maybe
it's possible to collaborate on this in a manner which is agnostic to
the deployment mechanism.  There is an open spec on this:

   https://review.openstack.org/#/c/443504/

which was mentioned in the Denver PTG session on destructive testing
which you referenced[2].

As mentioned in the self-healing SIG's session in Dublin[3], the OPNFV
community has already put a lot of effort into testing HA scenarios,
and it would be great if this work was shared across the whole
OpenStack community.  In particular they have a project called
Yardstick:

   https://www.opnfv.org/community/projects/yardstick

which contains a bunch of HA test cases:

   
http://docs.opnfv.org/en/latest/submodules/yardstick/docs/testing/user/userguide/15-list-of-tcs.html#h-a

Currently each sub-community and vendor seems to be reinventing HA
testing by itself to some extent, which is easier to accomplish in the
short-term, but obviously less efficient in the long-term.  It would
be awesome if we could break these silos down and join efforts! :-)

Cheers,
Adam

[0] #openstack-ha on Freenode IRC
[1] https://wiki.openstack.org/wiki/Self-healing_SIG
[2] https://etherpad.openstack.org/p/qa-queens-ptg-destructive-testing
[3] https://etherpad.openstack.org/p/self-healing-ptg-rocky

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [TripleO][CI][QA] Validating HA on upstream

2018-03-02 Thread Raoul Scarazzini

On 02/03/2018 15:19, Emilien Macchi wrote:
> Talking with clarkb during PTG, we'll need to transform
> tripleo-quickstart-utils into a non-forked repo - or move the roles to
> an existing repo. But we can't continue to maintain this fork.
> Raoul, let us know what you think is best (move repo to OpenStack or
> move modules to an existing upstream repo).
> Thanks,

Hey Emilien,
I prepared this [1] in which some folks started to have a look at, maybe
it's what we need to move on on this.
If you think something else needs to be done, let me know, I'll work on it.

Thanks,

[1] https://review.openstack.org/#/c/548874

-- 
Raoul Scarazzini
ra...@redhat.com

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [TripleO][CI][QA] Validating HA on upstream

2018-03-02 Thread Emilien Macchi

Talking with clarkb during PTG, we'll need to transform
tripleo-quickstart-utils into a non-forked repo - or move the roles to an
existing repo. But we can't continue to maintain this fork.

Raoul, let us know what you think is best (move repo to OpenStack or move
modules to an existing upstream repo).


Thanks,

On Fri, Feb 16, 2018 at 3:12 PM, Raoul Scarazzini  wrote:

> On 16/02/2018 15:41, Wesley Hayutin wrote:
> [...]
> > Using galaxy is an option however we would need to make sure that galaxy
> > is proxied across the upstream clouds.
> > Another option would be to follow the current established pattern of
> > adding it to the requirements file [1]
> > Thanks Bogdan, Raoul!
> > [1] https://github.com/openstack/tripleo-quickstart/
> blob/master/quickstart-extras-requirements.txt
>
> This is how we're using it today into the internal pipelines, so once we
> will have the tripleo-ha-utils (or whatever it will be called) it will
> be just a matter of adding it into the file. In the end I think that
> once the project will be created either way of using it will be fine.
>
> Thanks for your involvement on this folks!
>
> --
> Raoul Scarazzini
> ra...@redhat.com
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>



-- 
Emilien Macchi
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [TripleO][CI][QA] Validating HA on upstream

2018-02-16 Thread Raoul Scarazzini

On 16/02/2018 15:41, Wesley Hayutin wrote:
[...]
> Using galaxy is an option however we would need to make sure that galaxy
> is proxied across the upstream clouds.
> Another option would be to follow the current established pattern of
> adding it to the requirements file [1]
> Thanks Bogdan, Raoul!
> [1] 
> https://github.com/openstack/tripleo-quickstart/blob/master/quickstart-extras-requirements.txt

This is how we're using it today into the internal pipelines, so once we
will have the tripleo-ha-utils (or whatever it will be called) it will
be just a matter of adding it into the file. In the end I think that
once the project will be created either way of using it will be fine.

Thanks for your involvement on this folks!

-- 
Raoul Scarazzini
ra...@redhat.com

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [TripleO][CI][QA] Validating HA on upstream

2018-02-16 Thread Wesley Hayutin

On Fri, Feb 16, 2018 at 9:16 AM, Bogdan Dobrelya 
wrote:

> On 2/16/18 2:59 PM, Raoul Scarazzini wrote:
>
>> On 16/02/2018 10:24, Bogdan Dobrelya wrote:
>> [...]
>>
>>> +1 this looks like a perfect fit. Would it be possible to install that
>>> tripleo-ha-utils/tripleo-quickstart-utils with ansible-galaxy, alongside
>>> the quickstart, then apply destructive-testing playbooks with either the
>>> quickstart's static inventory [0] (from your admin/control node) or
>>> maybe via dynamic inventory [1] (from undercloud managing the overcloud
>>> under test via config-download and/or external ansible deployment
>>> mechanisms)?
>>> [0]
>>> https://git.openstack.org/cgit/openstack/tripleo-quickstart/
>>> tree/roles/tripleo-inventory
>>> [1]
>>> https://git.openstack.org/cgit/openstack/tripleo-validations
>>> /tree/scripts/tripleo-ansible-inventory
>>>
>>
>> Hi Bogdan,
>> thanks for your answer. On the inventory side of things these playbooks
>> work on any kind of inventory, we're using it at the moment with both
>> manual and quickstart generated environments, or even infrared ones.
>> We're able to do it at the same time the environment gets deployed or in
>> a second time like a day two action.
>> What is not clear to me is the ansible-galaxy part you're mentioning,
>> today we rely on the github.com/redhat-openstack git repo, so we clone
>> it and then launch the playbooks via ansible-playbook command, how do
>> you see ansible-galaxy into the picture?
>>
>
> Git clone just works as well... Though, I was thinking of some minimal
> integration via *playbooks* (not roles) in quickstart/tripleo-validations
> and *external* roles. So the in-repo playbooks will be referencing those
> external destructive testing roles. While the roles are installed with
> galaxy, like:
>
> $ ansible-galaxy install git+https://$repo_name,master -p
> $external_roles_path
>
> or prolly adding the $repo_name and $release (master or a tag) into some
> galaxy-requirements.yaml file and install from it:
>
> $ ansible-galaxy install --force -r quickstart-extras/playbooks/ex
> ternal/galaxy-requirements.yaml -p $external_roles_path
>
> Then invoked for quickstart-extras/tripleo-validations like:
>
> $ ansible-playbook -i inventory quickstart-extras/playbooks/ex
> ternal/destructive-tests.yaml
>
>
>> Thanks!
>>
>>
Using galaxy is an option however we would need to make sure that galaxy is
proxied across the upstream clouds.
Another option would be to follow the current established pattern of adding
it to the requirements file [1]

Thanks Bogdan, Raoul!

[1]
https://github.com/openstack/tripleo-quickstart/blob/master/quickstart-extras-requirements.txt


>
> --
> Best regards,
> Bogdan Dobrelya,
> Irc #bogdando
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [TripleO][CI][QA] Validating HA on upstream

2018-02-16 Thread Bogdan Dobrelya


On 2/16/18 2:59 PM, Raoul Scarazzini wrote:

On 16/02/2018 10:24, Bogdan Dobrelya wrote:
[...]

+1 this looks like a perfect fit. Would it be possible to install that
tripleo-ha-utils/tripleo-quickstart-utils with ansible-galaxy, alongside
the quickstart, then apply destructive-testing playbooks with either the
quickstart's static inventory [0] (from your admin/control node) or
maybe via dynamic inventory [1] (from undercloud managing the overcloud
under test via config-download and/or external ansible deployment
mechanisms)?
[0]
https://git.openstack.org/cgit/openstack/tripleo-quickstart/tree/roles/tripleo-inventory
[1]
https://git.openstack.org/cgit/openstack/tripleo-validations/tree/scripts/tripleo-ansible-inventory


Hi Bogdan,
thanks for your answer. On the inventory side of things these playbooks
work on any kind of inventory, we're using it at the moment with both
manual and quickstart generated environments, or even infrared ones.
We're able to do it at the same time the environment gets deployed or in
a second time like a day two action.
What is not clear to me is the ansible-galaxy part you're mentioning,
today we rely on the github.com/redhat-openstack git repo, so we clone
it and then launch the playbooks via ansible-playbook command, how do
you see ansible-galaxy into the picture?


Git clone just works as well... Though, I was thinking of some minimal 
integration via *playbooks* (not roles) in 
quickstart/tripleo-validations and *external* roles. So the in-repo 
playbooks will be referencing those external destructive testing roles. 
While the roles are installed with galaxy, like:


$ ansible-galaxy install git+https://$repo_name,master -p 
$external_roles_path


or prolly adding the $repo_name and $release (master or a tag) into some 
galaxy-requirements.yaml file and install from it:


$ ansible-galaxy install --force -r 
quickstart-extras/playbooks/external/galaxy-requirements.yaml -p 
$external_roles_path


Then invoked for quickstart-extras/tripleo-validations like:

$ ansible-playbook -i inventory 
quickstart-extras/playbooks/external/destructive-tests.yaml




Thanks!




--
Best regards,
Bogdan Dobrelya,
Irc #bogdando

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [TripleO][CI][QA] Validating HA on upstream

2018-02-16 Thread Raoul Scarazzini

On 16/02/2018 10:24, Bogdan Dobrelya wrote:
[...]
> +1 this looks like a perfect fit. Would it be possible to install that
> tripleo-ha-utils/tripleo-quickstart-utils with ansible-galaxy, alongside
> the quickstart, then apply destructive-testing playbooks with either the
> quickstart's static inventory [0] (from your admin/control node) or
> maybe via dynamic inventory [1] (from undercloud managing the overcloud
> under test via config-download and/or external ansible deployment
> mechanisms)?
> [0]
> https://git.openstack.org/cgit/openstack/tripleo-quickstart/tree/roles/tripleo-inventory
> [1]
> https://git.openstack.org/cgit/openstack/tripleo-validations/tree/scripts/tripleo-ansible-inventory

Hi Bogdan,
thanks for your answer. On the inventory side of things these playbooks
work on any kind of inventory, we're using it at the moment with both
manual and quickstart generated environments, or even infrared ones.
We're able to do it at the same time the environment gets deployed or in
a second time like a day two action.
What is not clear to me is the ansible-galaxy part you're mentioning,
today we rely on the github.com/redhat-openstack git repo, so we clone
it and then launch the playbooks via ansible-playbook command, how do
you see ansible-galaxy into the picture?

Thanks!

-- 
Raoul Scarazzini
ra...@redhat.com

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [TripleO][CI] Validating HA on upstream

2018-02-16 Thread Bogdan Dobrelya


On 2/15/18 8:22 PM, Raoul Scarazzini wrote:

TL;DR: we would like to change the way HA is tested upstream to avoid
being hitten by evitable bugs that the CI process should discover.

Long version:

Today HA testing in upstream consist only in verifying that a three
controllers setup comes up correctly and can spawn an instance. That's
something, but it’s far from being enough since we continuously see "day
two" bugs.
We started covering this more than a year ago in internal CI and today
also on rdocloud using a project named tripleo-quickstart-utils [1].
Apart from his name, the project is not limited to tripleo-quickstart,
it covers three principal roles:

1 - stonith-config: a playbook that can be used to automate the creation
of fencing devices in the overcloud;
2 - instance-ha: a playbook that automates the seventeen manual steps
needed to configure instance HA in the overcloud, test them via rally
and verify that instance HA works;
3 - validate-ha: a playbook that runs a series of disruptive actions in
the overcloud and verifies it always behaves correctly by deploying a
heat-template that involves all the overcloud components;

To make this usable upstream, we need to understand where to put this
code. Here some choices:

1 - tripleo-validations: the most logical place to put this, at least
looking at the name, would be tripleo-validations. I've talked with some
of the folks working on it, and it came out that the meaning of
tripleo-validations project is not doing disruptive tests. Integrating
this stuff would be out of scope.

2 - tripleo-quickstart-extras: apart from the fact that this is not
something meant just for quickstart (the project supports infrared and
"plain" environments as well) even if we initially started there, in the
end, it came out that nobody was looking at the patches since nobody was
able to verify them. The result was a series of reviews stuck forever.
So moving back to extras would be a step backward.

3 - Dedicated project (tripleo-ha-utils or just tripleo-utils): like for
tripleo-upgrades or tripleo-validations it would be perfect having all
this grouped and usable as a standalone thing. Any integration is
possible inside the playbook for whatever kind of test. Today we're


+1 this looks like a perfect fit. Would it be possible to install that 
tripleo-ha-utils/tripleo-quickstart-utils with ansible-galaxy, alongside 
the quickstart, then apply destructive-testing playbooks with either the 
quickstart's static inventory [0] (from your admin/control node) or 
maybe via dynamic inventory [1] (from undercloud managing the overcloud 
under test via config-download and/or external ansible deployment 
mechanisms)?


[0] 
https://git.openstack.org/cgit/openstack/tripleo-quickstart/tree/roles/tripleo-inventory
[1] 
https://git.openstack.org/cgit/openstack/tripleo-validations/tree/scripts/tripleo-ansible-inventory



using the bash framework to interact with the cluster, rally to test
instance-ha and Ansible itself to simulate full power outage scenarios.

There's been a lot of talk about this during the last PTG [2], and
unfortunately, I'll not be part of the next one, but I would like to see
things moving on this side.
Everything I wrote is of course up to discussion, that's precisely the
meaning of this mail.

Thanks to all who'll give advice, suggestions, and thoughts about all
this stuff.

[1] https://github.com/redhat-openstack/tripleo-quickstart-utils
[2] https://etherpad.openstack.org/p/qa-queens-ptg-destructive-testing




--
Best regards,
Bogdan Dobrelya,
Irc #bogdando

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [TripleO][CI] Which network templates to use in CI (with and without net isolation)?

2018-01-04 Thread James Slagle

On Thu, Jan 4, 2018 at 5:26 PM, Sagi Shnaidman  wrote:
> Hi, all
>
> we have now network templates in tripleo-ci repo[1] and we'd like to move
> them to tht repo[2] and to use them from there.

They've already been moved from tripleo-ci to tripleo-heat-templates:
https://review.openstack.org/#/c/476708/

> We have also default
> templates defined in overcloud-deploy role[3].
> So the question is - which templates should we use and how to configure
> them?

We should use the ones for ci, not the examples under
tripleo-heat-templates/network/config. Those examples (especially for
multiple-nics) are meant to be clear and orderly so that users can
easily understand how to adapt them to their own environments.
Especially for multiple-nics, there isn't really a sane default, and I
don't think we should make our examples match what we use in ci.

It may be possible to update ovb so that it deploys virt environments
such that the examples work. That feels like a lot of unecessary churn
though. But even then ci is using mtu:1350, which we don't want in the
examples.

> One option for configuration is set network args (incl. isolation) in
> overcloud-deploy role[3] depending on other features (like docker, ipv6,
> etc).
> The other is to set them in featureset[4] files for each job.
> The question is also which network templates we want to gate in CI and
> should it be the same we have by default in tripleo-quickstart-extras?
>
> We have a few patches from James (@slagle) to address this topic[5]

What I'm trying to do in these patches is just use the templates and
environments from tripleo-heat-templates that were copied from
tripleo-ci in 476708. I gathered that was the intent since they were
copied into tripleo-heat-templates. Otherwise, why do we need them
there are at all?

-- 
-- James Slagle
--

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [tripleo] CI promotion blockers

2018-01-02 Thread Julie Pichon

On 2 January 2018 at 16:30, Alex Schultz  wrote:
> On Tue, Jan 2, 2018 at 9:08 AM, Julie Pichon  wrote:
>> Hi!
>>
>> On 27 December 2017 at 16:48, Emilien Macchi  wrote:
>>> - Keystone removed _member_ role management, so we stopped using it
>>> (only Member is enough): https://review.openstack.org/#/c/529849/
>>
>> There's been so many issues with the default member role and Horizon
>> over the years, that one got my attention. I can see that
>> puppet-horizon still expects '_member_' for role management [1].
>> However trying to understand the Keystone patch linked to in the
>> commit, it looks like there's total freedom in which role name to use
>> so we can't just change the default in puppet-horizon to use 'Member'
>> as other consumers may expect and settle on '_member_' in their
>> environment. (Right?)
>>
>> In this case, the proper way to fix this for TripleO deployments may
>> be to make the change in instack-undercloud (I presume in [2]) so that
>> the default role is explicitly set to 'Member' for us? Does that sound
>> like the correct approach to get to a working Horizon?
>>
>
> We probably should at least change _member_ to Member in
> puppet-horizon. That fixes both projects for the default case.

Oh, I thought there was no longer a default and that TripleO was
creating the 'Member' role by itself? Fixing it directly in
puppet-horizon sounds ideal in general, if changing the default value
isn't expected to cause other issues.

Thanks,

Julie

>
> Thanks,
> -Alex
>
>> Julie
>>
>> [1] 
>> https://github.com/openstack/puppet-horizon/blob/master/manifests/init.pp#L458
>> [2] 
>> https://github.com/openstack/instack-undercloud/blob/master/elements/puppet-stack-config/puppet-stack-config.yaml.template#L622

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [tripleo] CI promotion blockers

2018-01-02 Thread Alex Schultz

On Tue, Jan 2, 2018 at 9:08 AM, Julie Pichon  wrote:
> Hi!
>
> On 27 December 2017 at 16:48, Emilien Macchi  wrote:
>> - Keystone removed _member_ role management, so we stopped using it
>> (only Member is enough): https://review.openstack.org/#/c/529849/
>
> There's been so many issues with the default member role and Horizon
> over the years, that one got my attention. I can see that
> puppet-horizon still expects '_member_' for role management [1].
> However trying to understand the Keystone patch linked to in the
> commit, it looks like there's total freedom in which role name to use
> so we can't just change the default in puppet-horizon to use 'Member'
> as other consumers may expect and settle on '_member_' in their
> environment. (Right?)
>
> In this case, the proper way to fix this for TripleO deployments may
> be to make the change in instack-undercloud (I presume in [2]) so that
> the default role is explicitly set to 'Member' for us? Does that sound
> like the correct approach to get to a working Horizon?
>

We probably should at least change _member_ to Member in
puppet-horizon. That fixes both projects for the default case.

Thanks,
-Alex

> Julie
>
> [1] 
> https://github.com/openstack/puppet-horizon/blob/master/manifests/init.pp#L458
> [2] 
> https://github.com/openstack/instack-undercloud/blob/master/elements/puppet-stack-config/puppet-stack-config.yaml.template#L622
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [tripleo] CI promotion blockers

2018-01-02 Thread Julie Pichon

Hi!

On 27 December 2017 at 16:48, Emilien Macchi  wrote:
> - Keystone removed _member_ role management, so we stopped using it
> (only Member is enough): https://review.openstack.org/#/c/529849/

There's been so many issues with the default member role and Horizon
over the years, that one got my attention. I can see that
puppet-horizon still expects '_member_' for role management [1].
However trying to understand the Keystone patch linked to in the
commit, it looks like there's total freedom in which role name to use
so we can't just change the default in puppet-horizon to use 'Member'
as other consumers may expect and settle on '_member_' in their
environment. (Right?)

In this case, the proper way to fix this for TripleO deployments may
be to make the change in instack-undercloud (I presume in [2]) so that
the default role is explicitly set to 'Member' for us? Does that sound
like the correct approach to get to a working Horizon?

Julie

[1] 
https://github.com/openstack/puppet-horizon/blob/master/manifests/init.pp#L458
[2] 
https://github.com/openstack/instack-undercloud/blob/master/elements/puppet-stack-config/puppet-stack-config.yaml.template#L622

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [tripleo] CI promotion blockers

2018-01-01 Thread Emilien Macchi

We've got promotion today, thanks Wes & Sagi for your help.

On Sun, Dec 31, 2017 at 9:06 AM, Emilien Macchi  wrote:
> Here's an update on what we did the last days after merging these
> blockers mentioned in the previous email:
>
> - Ignore a failing test in tempest (workaround)
> https://review.rdoproject.org/r/#/c/8/ until
> https://review.openstack.org/#/c/526647/ is merged. It allowed RDO
> repos to be consistent again, so we could have the latest patches in
> TripleO, tested by Promotion jobs.
> - scenario001 is timeouting a lot, we moved tacker/congress to
> scenario007, and also removed MongoDB that was running for nothing.
> - tripleo-ci-centos-7-containers-multinode was timeouting a lot, we
> removed cinder and some other services already covered by scenarios,
> so tripleo-ci-centos-7-containers-multinode is like ovb, testing the
> minimum set of services (which is why we created this job).
> - fixing an ipv6 issue in puppet-tripleo:
> https://review.openstack.org/#/c/530219/
>
> All of the above is merged.
> Now the remaining blocker is to update the RDO CI layout for promotion jobs:
> See https://review.rdoproject.org/r/#/c/9/ and
> https://review.rdoproject.org/r/#/c/11120/
> Once it merges and job runs, we should get a promotion.
>
> Let me know any question,
>
> On Wed, Dec 27, 2017 at 8:48 AM, Emilien Macchi  wrote:
>> Just a heads-up about what we've done the last days to make progress
>> and hopefully get a promotion this week:
>>
>> - Disabling voting on scenario001, 002 and 003. They timeout too much,
>> we haven't figured out why yet but we'll look at it this week and next
>> week. Hopefully we can re-enable voting today or so.
>> - Kolla added Sensu support and it broke our container builds. It
>> should be fixed by https://review.openstack.org/#/c/529890/ and
>> https://review.openstack.org/530232
>> - Keystone removed _member_ role management, so we stopped using it
>> (only Member is enough): https://review.openstack.org/#/c/529849/
>> - Fixup MTU configuration for CI envs: 
>> https://review.openstack.org/#/c/527249
>> - Reduce memory for undercloud image convert:
>> https://review.openstack.org/#/c/530137/
>> - Remove policy.json default rules from Heat in THT:
>> https://review.openstack.org/#/c/530225
>>
>> That's pretty all. Due to the lack of reviewers during the Christmas
>> time, we had to land some patches ourselves. If there is any problem
>> with one of them, please let us know. We're trying to maintain CI is
>> good shape this week and it's a bit of a challenge ;-)
>> --
>> Emilien Macchi
>
>
>
> --
> Emilien Macchi



-- 
Emilien Macchi

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [tripleo] CI promotion blockers

2017-12-31 Thread Emilien Macchi

Here's an update on what we did the last days after merging these
blockers mentioned in the previous email:

- Ignore a failing test in tempest (workaround)
https://review.rdoproject.org/r/#/c/8/ until
https://review.openstack.org/#/c/526647/ is merged. It allowed RDO
repos to be consistent again, so we could have the latest patches in
TripleO, tested by Promotion jobs.
- scenario001 is timeouting a lot, we moved tacker/congress to
scenario007, and also removed MongoDB that was running for nothing.
- tripleo-ci-centos-7-containers-multinode was timeouting a lot, we
removed cinder and some other services already covered by scenarios,
so tripleo-ci-centos-7-containers-multinode is like ovb, testing the
minimum set of services (which is why we created this job).
- fixing an ipv6 issue in puppet-tripleo:
https://review.openstack.org/#/c/530219/

All of the above is merged.
Now the remaining blocker is to update the RDO CI layout for promotion jobs:
See https://review.rdoproject.org/r/#/c/9/ and
https://review.rdoproject.org/r/#/c/11120/
Once it merges and job runs, we should get a promotion.

Let me know any question,

On Wed, Dec 27, 2017 at 8:48 AM, Emilien Macchi  wrote:
> Just a heads-up about what we've done the last days to make progress
> and hopefully get a promotion this week:
>
> - Disabling voting on scenario001, 002 and 003. They timeout too much,
> we haven't figured out why yet but we'll look at it this week and next
> week. Hopefully we can re-enable voting today or so.
> - Kolla added Sensu support and it broke our container builds. It
> should be fixed by https://review.openstack.org/#/c/529890/ and
> https://review.openstack.org/530232
> - Keystone removed _member_ role management, so we stopped using it
> (only Member is enough): https://review.openstack.org/#/c/529849/
> - Fixup MTU configuration for CI envs: https://review.openstack.org/#/c/527249
> - Reduce memory for undercloud image convert:
> https://review.openstack.org/#/c/530137/
> - Remove policy.json default rules from Heat in THT:
> https://review.openstack.org/#/c/530225
>
> That's pretty all. Due to the lack of reviewers during the Christmas
> time, we had to land some patches ourselves. If there is any problem
> with one of them, please let us know. We're trying to maintain CI is
> good shape this week and it's a bit of a challenge ;-)
> --
> Emilien Macchi



-- 
Emilien Macchi

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [TripleO][CI] FreeIPA Deployment

2017-09-07 Thread Harry Rybacki

On Thu, Aug 31, 2017 at 12:52 PM, Juan Antonio Osorio
 wrote:
> Something that just came to my mind: Another option would be to allocate an
> extra IP Address for the undercloud, that would be dedicated to FreeIPA, and
> that way we MAY be able to deploy the FreeIPA server in the undercloud. If
> folks are OK with this I could experiment on this front. Maybe I could try
> to run FreeIPA on a container [1] (which wasn't available when I started
> working on this).
>
Interesting idea, Ozz! I'm not sure what/if the security implications
of running them
on the same host would be.

I'm cc'ing Toure to discuss possible workflow approach to this as well.

/R

> [1] https://hub.docker.com/r/freeipa/freeipa-server/
>
> On Sat, Aug 26, 2017 at 2:52 AM, Emilien Macchi  wrote:
>>
>> On Sun, Aug 20, 2017 at 11:45 PM, Juan Antonio Osorio
>>  wrote:
>> > The second option seems like the most viable. Not sure how the TripleO
>> > integration would go though. Care to elaborate on what you had in mind?
>>
>> Trying to reproduce what we did with ceph-ansible and use Mistral to
>> deploy FreeIPA with an external deployment tool.
>> Though I find the solution quite complex, maybe we can come-up with an
>> easier approach this time?
>> --
>> Emilien Macchi
>>
>> __
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
>
>
> --
> Juan Antonio Osorio R.
> e-mail: jaosor...@gmail.com
>
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [TripleO][CI] FreeIPA Deployment

2017-08-31 Thread Juan Antonio Osorio

Something that just came to my mind: Another option would be to allocate an
extra IP Address for the undercloud, that would be dedicated to FreeIPA,
and that way we MAY be able to deploy the FreeIPA server in the undercloud.
If folks are OK with this I could experiment on this front. Maybe I could
try to run FreeIPA on a container [1] (which wasn't available when I
started working on this).

[1] https://hub.docker.com/r/freeipa/freeipa-server/

On Sat, Aug 26, 2017 at 2:52 AM, Emilien Macchi  wrote:

> On Sun, Aug 20, 2017 at 11:45 PM, Juan Antonio Osorio
>  wrote:
> > The second option seems like the most viable. Not sure how the TripleO
> > integration would go though. Care to elaborate on what you had in mind?
>
> Trying to reproduce what we did with ceph-ansible and use Mistral to
> deploy FreeIPA with an external deployment tool.
> Though I find the solution quite complex, maybe we can come-up with an
> easier approach this time?
> --
> Emilien Macchi
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>

-- 
Juan Antonio Osorio R.
e-mail: jaosor...@gmail.com
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [TripleO] CI design session at the PTG

2017-08-28 Thread Clark Boylan



On Mon, Aug 28, 2017, at 07:19 AM, Paul Belanger wrote:
> On Mon, Aug 28, 2017 at 09:42:45AM -0400, David Moreau Simard wrote:
> > Hi,
> > 
> > (cc whom I would at least like to attend)
> > 
> > The PTG would be a great opportunity to talk about CI design/layout
> > and how we see things moving forward in TripleO with Zuul v3, upstream
> > and in review.rdoproject.org.
> > 
> > Can we have a formal session on this scheduled somewhere ?
> > 
> Wednesday onwards likely is best for me, otherwise, I can find time
> during
> Mon-Tues if that is better.

The Zuulv3 stuff may be appropriate during the Infra team helproom on
Monday and Tuesday. There will be an afternoon Zuulv3 for OpenStack devs
session in Vail at 2pm Monday, but I think we generally plan on helping
with Zuulv3 during the entire helproom time.

Clark

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [TripleO] CI design session at the PTG

2017-08-28 Thread David Moreau Simard

On Mon, Aug 28, 2017 at 10:25 AM, Wesley Hayutin  wrote:
> +1 from me, I'm sure John, Sagi, and Arx are also interested.

Yes, of course, I just went with whom I knew were going to the PTG.
Anyone else is welcome to join as well !


David Moreau Simard
Senior Software Engineer | OpenStack RDO

dmsimard = [irc, github, twitter]


On Mon, Aug 28, 2017 at 10:25 AM, Wesley Hayutin  wrote:
>
>
> On Mon, Aug 28, 2017 at 10:19 AM, Paul Belanger 
> wrote:
>>
>> On Mon, Aug 28, 2017 at 09:42:45AM -0400, David Moreau Simard wrote:
>> > Hi,
>> >
>> > (cc whom I would at least like to attend)
>> >
>> > The PTG would be a great opportunity to talk about CI design/layout
>> > and how we see things moving forward in TripleO with Zuul v3, upstream
>> > and in review.rdoproject.org.
>> >
>> > Can we have a formal session on this scheduled somewhere ?
>> >
>> Wednesday onwards likely is best for me, otherwise, I can find time during
>> Mon-Tues if that is better.
>>
>
> +1 from me, I'm sure John, Sagi, and Arx are also interested.
>
> Thanks
>

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [TripleO] CI design session at the PTG

2017-08-28 Thread David Moreau Simard

On Mon, Aug 28, 2017 at 10:58 AM, Emilien Macchi  wrote:
> Yeah, this session would be interesting to do.
> Feel free to add it on https://etherpad.openstack.org/p/tripleo-ptg-queens
> We need to work on scheduling before the PTG but it would likely
> happen between Wednesday and Friday morning.

Good idea, I've added it to the etherpad [1] and I've created a pad
for the session as well [2].

[1]: https://etherpad.openstack.org/p/tripleo-ptg-queens
[2]: https://etherpad.openstack.org/p/tripleo-ptg-queens-ci

David Moreau Simard
Senior Software Engineer | OpenStack RDO

dmsimard = [irc, github, twitter]

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [TripleO] CI design session at the PTG

2017-08-28 Thread Emilien Macchi

On Mon, Aug 28, 2017 at 6:42 AM, David Moreau Simard  wrote:
[...]
> Can we have a formal session on this scheduled somewhere ?

Yeah, this session would be interesting to do.
Feel free to add it on https://etherpad.openstack.org/p/tripleo-ptg-queens
We need to work on scheduling before the PTG but it would likely
happen between Wednesday and Friday morning.

Thanks,
-- 
Emilien Macchi

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [TripleO] CI design session at the PTG

2017-08-28 Thread Wesley Hayutin

On Mon, Aug 28, 2017 at 10:19 AM, Paul Belanger 
wrote:

> On Mon, Aug 28, 2017 at 09:42:45AM -0400, David Moreau Simard wrote:
> > Hi,
> >
> > (cc whom I would at least like to attend)
> >
> > The PTG would be a great opportunity to talk about CI design/layout
> > and how we see things moving forward in TripleO with Zuul v3, upstream
> > and in review.rdoproject.org.
> >
> > Can we have a formal session on this scheduled somewhere ?
> >
> Wednesday onwards likely is best for me, otherwise, I can find time during
> Mon-Tues if that is better.
>
>
+1 from me, I'm sure John, Sagi, and Arx are also interested.

Thanks
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [TripleO] CI design session at the PTG

2017-08-28 Thread Paul Belanger

On Mon, Aug 28, 2017 at 09:42:45AM -0400, David Moreau Simard wrote:
> Hi,
> 
> (cc whom I would at least like to attend)
> 
> The PTG would be a great opportunity to talk about CI design/layout
> and how we see things moving forward in TripleO with Zuul v3, upstream
> and in review.rdoproject.org.
> 
> Can we have a formal session on this scheduled somewhere ?
> 
Wednesday onwards likely is best for me, otherwise, I can find time during
Mon-Tues if that is better.

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [TripleO] CI design session at the PTG

2017-08-28 Thread Harry Rybacki

On Mon, Aug 28, 2017 at 9:42 AM, David Moreau Simard  wrote:
> Hi,
>
> (cc whom I would at least like to attend)
>
> The PTG would be a great opportunity to talk about CI design/layout
> and how we see things moving forward in TripleO with Zuul v3, upstream
> and in review.rdoproject.org.
>
> Can we have a formal session on this scheduled somewhere ?
>
+1

> David Moreau Simard
> Senior Software Engineer | OpenStack RDO
>
> dmsimard = [irc, github, twitter]
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [TripleO][CI] FreeIPA Deployment

2017-08-25 Thread Emilien Macchi

On Sun, Aug 20, 2017 at 11:45 PM, Juan Antonio Osorio
 wrote:
> The second option seems like the most viable. Not sure how the TripleO
> integration would go though. Care to elaborate on what you had in mind?

Trying to reproduce what we did with ceph-ansible and use Mistral to
deploy FreeIPA with an external deployment tool.
Though I find the solution quite complex, maybe we can come-up with an
easier approach this time?
-- 
Emilien Macchi

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [TripleO][CI] FreeIPA Deployment

2017-08-21 Thread Juan Antonio Osorio

On Mon, Aug 21, 2017 at 5:48 PM, Ben Nemec  wrote:

>
>
> On 08/21/2017 01:45 AM, Juan Antonio Osorio wrote:
>
>> The second option seems like the most viable. Not sure how the TripleO
>> integration would go though. Care to elaborate on what you had in mind?
>>
>
> I can't remember if we discussed this when we were first implementing the
> ci job, but could FreeIPA run on the undercloud itself?  We could have the
> undercloud install process install FreeIPA before it does the rest of the
> undercloud install, and then the undercloud by default would talk to that
> local instance of FreeIPA.  We'd provide configuration options to allow use
> of a standalone server too, of course.
>

Right, this would have been the preferred option, and we did try to do
this. However, FreeIPA not very flexible (it isn't at all) on its port
configuration. And unfortunately there are port conflicts. Hence why we
decided to use a separate node.


> I feel like there was probably a reason we didn't do that in the first
> place (port conflicts?), but it would be the easiest option for deployers
> if we could make it work.
>
>
>> On Fri, Aug 18, 2017 at 9:11 PM, Emilien Macchi > > wrote:
>>
>> On Fri, Aug 18, 2017 at 8:34 AM, Harry Rybacki > > wrote:
>>  > Greetings Stackers,
>>  >
>>  > Recently, I brought up a discussion around deploying FreeIPA via
>>  > TripleO-Quickstart vs TripleO. This is part of a larger discussion
>>  > around expanding security related CI coverage for OpenStack.
>>  >
>>  > A few months back, I added the ability to deploy FreeIPA via
>>  > TripleO-Quickstart through three reviews:
>>  >
>>  > 1) Adding a role to deploy FreeIPA via OOOQ_E[1]
>>  > 2) Providing OOOQ with the ability to deploy a supplemental node
>>  > (alongside the undercloud)[2]
>>  > 3) Update the quickstart-extras playbook to deploy FreeIPA[3]
>>  >
>>  >
>>  > The reasoning behind this is as follows (copied from a conversation
>>  > with jaosorior):
>>  >
>>  >> So the deal is that both the undercloud and the overcloud need
>> to be registered as a FreeIPA client.
>>  >> This is because they need to authenticate to it in order to
>> execute actions.
>>  >>
>>  >> * The undercloud needs to have FreeIPA credentials because it's
>> running novajoin, which in turn
>>  >> executes requests to FreeIPA in order to create service principals
>>  >>  - The service principals are ultimately the service name and
>> the node name entries for which we'll
>>  >> requests the certificates.
>>  >> * The overcloud nodes need to be registered and authenticated to
>> FreeIPA (which right now happens > through a cloud-init script
>> provisioned by nova/nova-metadata) because that's how it requests
>>  >> certificates.
>>  >>
>>  >> So the flow is as follows:
>>  >>
>>  >> * FreeIPA node is provisioned.
>>  >>  - We'll appropriate credentials at this point.
>>  >>  - We register the undercloud as a FreeIPA client and get an OTP
>> (one time password) for it
>>  >> - We add the OTP to the undercloud.conf and enable novajoin.
>>  >> * We trigger the undercloud install.
>>  >>  - after the install, we have novajoin running, which is the
>> service that registers automatically the
>>  >> overcloud nodes to FreeIPA.
>>  >> * We trigger the overcloud deploy
>>  >>  - We need to set up a flag that tells the deploy to pass
>> appropriate nova metadata (which tells
>>  >> novajoin that the nodes should be registered).
>>  >>  - profit!! we can now get certificates from the CA (and do
>> other stuff that FreeIPA allows you to do,
>>  >> such as use kerberos auth, control sudo rights of the nodes'
>> users, etc.)
>>  >>
>>  >> Since the nodes need to be registered to FreeIPA, we can't rely
>> on FreeIPA being installed by
>>  >> TripleO, even if that's possible by doing it through a
>> composable service.
>>  >> If we would use a composable service to install FreeIPA, the
>> flow would be like this:
>>  >>
>>  >> * Install undercloud
>>  >> * Install overcloud with one node (running FreeIPA)
>>  >> * register undercloud node to FreeIPA and modify undercloud.conf
>>  >> * Update undercloud
>>  >> * scale overcloud and register the rest of the nodes to FreeIPA
>> through novajoin.
>>  >>
>>  >> So, while we could install FreeIPA with TripleO. This really
>> complicates the deployment to an
>>  >> unnecessary point.
>>  >>
>>  >> So I suggest keeping the current behavior, which treats FreeIPA
>> as a separate node to be
>>  >> provisioned before the undercloud). And if folks would like to
>> have a separate FreeIPA node for their > overcloud

Re: [openstack-dev] [TripleO][CI] FreeIPA Deployment

2017-08-21 Thread Ben Nemec




On 08/21/2017 01:45 AM, Juan Antonio Osorio wrote:
The second option seems like the most viable. Not sure how the TripleO 
integration would go though. Care to elaborate on what you had in mind?


I can't remember if we discussed this when we were first implementing 
the ci job, but could FreeIPA run on the undercloud itself?  We could 
have the undercloud install process install FreeIPA before it does the 
rest of the undercloud install, and then the undercloud by default would 
talk to that local instance of FreeIPA.  We'd provide configuration 
options to allow use of a standalone server too, of course.


I feel like there was probably a reason we didn't do that in the first 
place (port conflicts?), but it would be the easiest option for 
deployers if we could make it work.




On Fri, Aug 18, 2017 at 9:11 PM, Emilien Macchi > wrote:


On Fri, Aug 18, 2017 at 8:34 AM, Harry Rybacki > wrote:
 > Greetings Stackers,
 >
 > Recently, I brought up a discussion around deploying FreeIPA via
 > TripleO-Quickstart vs TripleO. This is part of a larger discussion
 > around expanding security related CI coverage for OpenStack.
 >
 > A few months back, I added the ability to deploy FreeIPA via
 > TripleO-Quickstart through three reviews:
 >
 > 1) Adding a role to deploy FreeIPA via OOOQ_E[1]
 > 2) Providing OOOQ with the ability to deploy a supplemental node
 > (alongside the undercloud)[2]
 > 3) Update the quickstart-extras playbook to deploy FreeIPA[3]
 >
 >
 > The reasoning behind this is as follows (copied from a conversation
 > with jaosorior):
 >
 >> So the deal is that both the undercloud and the overcloud need
to be registered as a FreeIPA client.
 >> This is because they need to authenticate to it in order to
execute actions.
 >>
 >> * The undercloud needs to have FreeIPA credentials because it's
running novajoin, which in turn
 >> executes requests to FreeIPA in order to create service principals
 >>  - The service principals are ultimately the service name and
the node name entries for which we'll
 >> requests the certificates.
 >> * The overcloud nodes need to be registered and authenticated to
FreeIPA (which right now happens > through a cloud-init script
provisioned by nova/nova-metadata) because that's how it requests
 >> certificates.
 >>
 >> So the flow is as follows:
 >>
 >> * FreeIPA node is provisioned.
 >>  - We'll appropriate credentials at this point.
 >>  - We register the undercloud as a FreeIPA client and get an OTP
(one time password) for it
 >> - We add the OTP to the undercloud.conf and enable novajoin.
 >> * We trigger the undercloud install.
 >>  - after the install, we have novajoin running, which is the
service that registers automatically the
 >> overcloud nodes to FreeIPA.
 >> * We trigger the overcloud deploy
 >>  - We need to set up a flag that tells the deploy to pass
appropriate nova metadata (which tells
 >> novajoin that the nodes should be registered).
 >>  - profit!! we can now get certificates from the CA (and do
other stuff that FreeIPA allows you to do,
 >> such as use kerberos auth, control sudo rights of the nodes'
users, etc.)
 >>
 >> Since the nodes need to be registered to FreeIPA, we can't rely
on FreeIPA being installed by
 >> TripleO, even if that's possible by doing it through a
composable service.
 >> If we would use a composable service to install FreeIPA, the
flow would be like this:
 >>
 >> * Install undercloud
 >> * Install overcloud with one node (running FreeIPA)
 >> * register undercloud node to FreeIPA and modify undercloud.conf
 >> * Update undercloud
 >> * scale overcloud and register the rest of the nodes to FreeIPA
through novajoin.
 >>
 >> So, while we could install FreeIPA with TripleO. This really
complicates the deployment to an
 >> unnecessary point.
 >>
 >> So I suggest keeping the current behavior, which treats FreeIPA
as a separate node to be
 >> provisioned before the undercloud). And if folks would like to
have a separate FreeIPA node for their > overcloud deployment (which
could provision certs for the tenants) then we could do that as a
 >> composable service, if people request it.
 >
 > I am now re-raising this to the group at large for discussion about
 > the merits of this approach vs deploying via TripleO itself.

There are 3 approaches here:

- Keep using Quickstart which is of course not the viable option since
TripleO Quickstart is only used by CI and developers right now. Not by
customers neither in production.
- Deploy your own Ansible playbooks or automation tool to deploy

Re: [openstack-dev] [TripleO][CI] FreeIPA Deployment

2017-08-21 Thread Juan Antonio Osorio

The second option seems like the most viable. Not sure how the TripleO
integration would go though. Care to elaborate on what you had in mind?

On Fri, Aug 18, 2017 at 9:11 PM, Emilien Macchi  wrote:

> On Fri, Aug 18, 2017 at 8:34 AM, Harry Rybacki 
> wrote:
> > Greetings Stackers,
> >
> > Recently, I brought up a discussion around deploying FreeIPA via
> > TripleO-Quickstart vs TripleO. This is part of a larger discussion
> > around expanding security related CI coverage for OpenStack.
> >
> > A few months back, I added the ability to deploy FreeIPA via
> > TripleO-Quickstart through three reviews:
> >
> > 1) Adding a role to deploy FreeIPA via OOOQ_E[1]
> > 2) Providing OOOQ with the ability to deploy a supplemental node
> > (alongside the undercloud)[2]
> > 3) Update the quickstart-extras playbook to deploy FreeIPA[3]
> >
> >
> > The reasoning behind this is as follows (copied from a conversation
> > with jaosorior):
> >
> >> So the deal is that both the undercloud and the overcloud need to be
> registered as a FreeIPA client.
> >> This is because they need to authenticate to it in order to execute
> actions.
> >>
> >> * The undercloud needs to have FreeIPA credentials because it's running
> novajoin, which in turn
> >> executes requests to FreeIPA in order to create service principals
> >>  - The service principals are ultimately the service name and the node
> name entries for which we'll
> >> requests the certificates.
> >> * The overcloud nodes need to be registered and authenticated to
> FreeIPA (which right now happens > through a cloud-init script provisioned
> by nova/nova-metadata) because that's how it requests
> >> certificates.
> >>
> >> So the flow is as follows:
> >>
> >> * FreeIPA node is provisioned.
> >>  - We'll appropriate credentials at this point.
> >>  - We register the undercloud as a FreeIPA client and get an OTP (one
> time password) for it
> >> - We add the OTP to the undercloud.conf and enable novajoin.
> >> * We trigger the undercloud install.
> >>  - after the install, we have novajoin running, which is the service
> that registers automatically the
> >> overcloud nodes to FreeIPA.
> >> * We trigger the overcloud deploy
> >>  - We need to set up a flag that tells the deploy to pass appropriate
> nova metadata (which tells
> >> novajoin that the nodes should be registered).
> >>  - profit!! we can now get certificates from the CA (and do other stuff
> that FreeIPA allows you to do,
> >> such as use kerberos auth, control sudo rights of the nodes' users,
> etc.)
> >>
> >> Since the nodes need to be registered to FreeIPA, we can't rely on
> FreeIPA being installed by
> >> TripleO, even if that's possible by doing it through a composable
> service.
> >> If we would use a composable service to install FreeIPA, the flow would
> be like this:
> >>
> >> * Install undercloud
> >> * Install overcloud with one node (running FreeIPA)
> >> * register undercloud node to FreeIPA and modify undercloud.conf
> >> * Update undercloud
> >> * scale overcloud and register the rest of the nodes to FreeIPA through
> novajoin.
> >>
> >> So, while we could install FreeIPA with TripleO. This really
> complicates the deployment to an
> >> unnecessary point.
> >>
> >> So I suggest keeping the current behavior, which treats FreeIPA as a
> separate node to be
> >> provisioned before the undercloud). And if folks would like to have a
> separate FreeIPA node for their > overcloud deployment (which could
> provision certs for the tenants) then we could do that as a
> >> composable service, if people request it.
> >
> > I am now re-raising this to the group at large for discussion about
> > the merits of this approach vs deploying via TripleO itself.
>
> There are 3 approaches here:
>
> - Keep using Quickstart which is of course not the viable option since
> TripleO Quickstart is only used by CI and developers right now. Not by
> customers neither in production.
> - Deploy your own Ansible playbooks or automation tool to deploy
> FreeIPA and host it wherever you like. Integrate the playbooks in
> TripleO, as an external component (can be deployed manually between
> some steps but will be to be documented).
> - Create a composable service that will deploy FreeIPA service(s),
> part of TripleO Heat Templates. The way it works *now* will require
> you to have a puppet-freeipa module to deploy the bits but we're
> working toward migrating to Ansible at some point.
>

This approach is not ideal and will be quite a burdain as I described
above. I wouldn't consider this an option.


> I hope it helps, let me know if you need further details on a specific
> approach.
> --
> Emilien Macchi
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>

Re: [openstack-dev] [TripleO][CI] FreeIPA Deployment

2017-08-18 Thread Emilien Macchi

On Fri, Aug 18, 2017 at 8:34 AM, Harry Rybacki  wrote:
> Greetings Stackers,
>
> Recently, I brought up a discussion around deploying FreeIPA via
> TripleO-Quickstart vs TripleO. This is part of a larger discussion
> around expanding security related CI coverage for OpenStack.
>
> A few months back, I added the ability to deploy FreeIPA via
> TripleO-Quickstart through three reviews:
>
> 1) Adding a role to deploy FreeIPA via OOOQ_E[1]
> 2) Providing OOOQ with the ability to deploy a supplemental node
> (alongside the undercloud)[2]
> 3) Update the quickstart-extras playbook to deploy FreeIPA[3]
>
>
> The reasoning behind this is as follows (copied from a conversation
> with jaosorior):
>
>> So the deal is that both the undercloud and the overcloud need to be 
>> registered as a FreeIPA client.
>> This is because they need to authenticate to it in order to execute actions.
>>
>> * The undercloud needs to have FreeIPA credentials because it's running 
>> novajoin, which in turn
>> executes requests to FreeIPA in order to create service principals
>>  - The service principals are ultimately the service name and the node name 
>> entries for which we'll
>> requests the certificates.
>> * The overcloud nodes need to be registered and authenticated to FreeIPA 
>> (which right now happens > through a cloud-init script provisioned by 
>> nova/nova-metadata) because that's how it requests
>> certificates.
>>
>> So the flow is as follows:
>>
>> * FreeIPA node is provisioned.
>>  - We'll appropriate credentials at this point.
>>  - We register the undercloud as a FreeIPA client and get an OTP (one time 
>> password) for it
>> - We add the OTP to the undercloud.conf and enable novajoin.
>> * We trigger the undercloud install.
>>  - after the install, we have novajoin running, which is the service that 
>> registers automatically the
>> overcloud nodes to FreeIPA.
>> * We trigger the overcloud deploy
>>  - We need to set up a flag that tells the deploy to pass appropriate nova 
>> metadata (which tells
>> novajoin that the nodes should be registered).
>>  - profit!! we can now get certificates from the CA (and do other stuff that 
>> FreeIPA allows you to do,
>> such as use kerberos auth, control sudo rights of the nodes' users, etc.)
>>
>> Since the nodes need to be registered to FreeIPA, we can't rely on FreeIPA 
>> being installed by
>> TripleO, even if that's possible by doing it through a composable service.
>> If we would use a composable service to install FreeIPA, the flow would be 
>> like this:
>>
>> * Install undercloud
>> * Install overcloud with one node (running FreeIPA)
>> * register undercloud node to FreeIPA and modify undercloud.conf
>> * Update undercloud
>> * scale overcloud and register the rest of the nodes to FreeIPA through 
>> novajoin.
>>
>> So, while we could install FreeIPA with TripleO. This really complicates the 
>> deployment to an
>> unnecessary point.
>>
>> So I suggest keeping the current behavior, which treats FreeIPA as a 
>> separate node to be
>> provisioned before the undercloud). And if folks would like to have a 
>> separate FreeIPA node for their > overcloud deployment (which could 
>> provision certs for the tenants) then we could do that as a
>> composable service, if people request it.
>
> I am now re-raising this to the group at large for discussion about
> the merits of this approach vs deploying via TripleO itself.

There are 3 approaches here:

- Keep using Quickstart which is of course not the viable option since
TripleO Quickstart is only used by CI and developers right now. Not by
customers neither in production.
- Deploy your own Ansible playbooks or automation tool to deploy
FreeIPA and host it wherever you like. Integrate the playbooks in
TripleO, as an external component (can be deployed manually between
some steps but will be to be documented).
- Create a composable service that will deploy FreeIPA service(s),
part of TripleO Heat Templates. The way it works *now* will require
you to have a puppet-freeipa module to deploy the bits but we're
working toward migrating to Ansible at some point.

I hope it helps, let me know if you need further details on a specific approach.
-- 
Emilien Macchi

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [tripleo][ci] decreased coverage for telemetry

2017-07-12 Thread Wesley Hayutin

On Wed, Jul 12, 2017 at 10:33 AM, Pradeep Kilambi  wrote:

> On Tue, Jul 11, 2017 at 10:06 PM, Wesley Hayutin 
> wrote:
> >
> >
> > On Tue, Jul 11, 2017 at 9:04 PM, Emilien Macchi 
> wrote:
> >>
> >> On Tue, Jul 11, 2017 at 12:41 PM, Pradeep Kilambi 
> wrote:
> >> > On Tue, Jul 11, 2017 at 3:17 PM, Wesley Hayutin 
> >> > wrote:
> >> >> Greetings,
> >> >>
> >> >> I was looking through the mailing list and I did not see any emails
> >> >> explicitly calling out the decreased coverage for telemetry in
> tripleo
> >> >> due
> >> >> to [1].  A series of changes went into the CI system to disable
> >> >> telemetry
> >> >> [2].
> >> >>
> >> >> There is work being done to restore more coverage for telemetry by
> >> >> limiting
> >> >> the resources it consumes [3].  We are also working on additional
> >> >> scenarios
> >> >> in t-h-t/ci/environments/ to better cover ceilometer.
> >> >>
> >> >> If the CI environment you are working in has the resources to cover
> >> >> ceilometer that is great, however if you find issues like [1] we
> highly
> >> >> suggest you follow the same pattern until coverage is restored
> >> >> upstream.
> >> >>
> >> >> Thank you!
> >> >>
> >> >> [1] https://bugs.launchpad.net/tripleo/+bug/1693174
> >> >> [2] https://review.openstack.org/#/q/topic:bug/1680195
> >> >> [3]
> >> >> https://review.openstack.org/#/c/475838/
> >> >> https://review.openstack.org/#/c/474969/
> >> >> https://review.openstack.org/#/c/47/
> >> >>
> >> >>
> >> >
> >> > Thanks for starting this thread Wes. I concur with this. We got bitten
> >> > recently by many issues that we could have caught in ci had telemetry
> >> > been enabled. I spoke to trown and Emilien about this a few times
> >> > already. I do understand the resource footprint it causes.  But with
> >> > recent improvements and changes upstream, things should be back to
> >> > being more manageable. We do have telemetry tested in scenario001 job,
> >> > but that doesn't cover all scenarios. So there is a gap in coverage.
> >>
> >> What do you mean by gap in coverage?
> >> We have scenarios on purpose, so we can horizontally scale the
> >> coverage across multiple jobs and run the jobs only when we need (e.g.
> >> touching telemetry files for scenario001).
> >>
> >> Please elaborate on what isn't covered by scenario001, because we
> >> already cover Gnocchi, Panko, Aodh and Ceilometer (with RBD backend
> >> and soon with Swift backend in scenario002).
> >>
> >
> > Emilien,
> > Gap is the wrong word to use in the case.
> > Previously we had several jobs running with telemetry turned on including
> > ovb jobs in tripleo and other jobs outside of the upstream CI system.
> > The more jobs running, the more coverage.
> > I think that is what Pradeep was referring to, but maybe I am
> > misunderstanding this as well.
>
> Yea may be gap is not the right word. But mostly i meant what Wes
> said, but also I feel we are not testing Telemetry with full HA
> currently in CI. scenario jobs only test deploy with 1 controller not
> 3. We have seen some recent issues where things work on controller 0
> but controller 1 or 2 has statsd down for example. The ovb ha job
> would have shown us that, had the ovb ha job included telemetry
> enabled. Is it possible to run scenario001 job with full HA ?
>

Full HA is limited to ovb jobs atm and these jobs currently take longer to
run and are barely able to complete in the mandatory upstream timeout
period.
IMHO it's worth the time and effort to see if the performance improvements
currently being made to ceilometer will work properly with the OVB jobs,
but nothing I can guarantee atm.

Work is now starting on being able to deploy a full HA envrionment using
nodepool multinode jobs.  IMHO this is a better target.
I will keep you posted on the progress here.

Thank you Pradeep.


>
>
>
> >
> >
> >>
> >> >  I hope we can either re-enable these services by default in CI and
> >> > how things work or at least add a separate gate job to be able to test
> >> > HA scenario properly with telemetry enabled.
> >> >
> >> > --
> >> > Cheers,
> >> > ~ Prad
> >> >
> >> >
> >> > 
> __
> >> > OpenStack Development Mailing List (not for usage questions)
> >> > Unsubscribe:
> >> > openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> >> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >>
> >>
> >>
> >> --
> >> Emilien Macchi
> >
> >
> >
> > 
> __
> > OpenStack Development Mailing List (not for usage questions)
> > Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:
> unsubscribe
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >
>
>
>
> --
> Cheers,
> ~ Prad
>
> __
>

Re: [openstack-dev] [tripleo][ci] decreased coverage for telemetry

2017-07-12 Thread Pradeep Kilambi

On Tue, Jul 11, 2017 at 10:06 PM, Wesley Hayutin  wrote:
>
>
> On Tue, Jul 11, 2017 at 9:04 PM, Emilien Macchi  wrote:
>>
>> On Tue, Jul 11, 2017 at 12:41 PM, Pradeep Kilambi  wrote:
>> > On Tue, Jul 11, 2017 at 3:17 PM, Wesley Hayutin 
>> > wrote:
>> >> Greetings,
>> >>
>> >> I was looking through the mailing list and I did not see any emails
>> >> explicitly calling out the decreased coverage for telemetry in tripleo
>> >> due
>> >> to [1].  A series of changes went into the CI system to disable
>> >> telemetry
>> >> [2].
>> >>
>> >> There is work being done to restore more coverage for telemetry by
>> >> limiting
>> >> the resources it consumes [3].  We are also working on additional
>> >> scenarios
>> >> in t-h-t/ci/environments/ to better cover ceilometer.
>> >>
>> >> If the CI environment you are working in has the resources to cover
>> >> ceilometer that is great, however if you find issues like [1] we highly
>> >> suggest you follow the same pattern until coverage is restored
>> >> upstream.
>> >>
>> >> Thank you!
>> >>
>> >> [1] https://bugs.launchpad.net/tripleo/+bug/1693174
>> >> [2] https://review.openstack.org/#/q/topic:bug/1680195
>> >> [3]
>> >> https://review.openstack.org/#/c/475838/
>> >> https://review.openstack.org/#/c/474969/
>> >> https://review.openstack.org/#/c/47/
>> >>
>> >>
>> >
>> > Thanks for starting this thread Wes. I concur with this. We got bitten
>> > recently by many issues that we could have caught in ci had telemetry
>> > been enabled. I spoke to trown and Emilien about this a few times
>> > already. I do understand the resource footprint it causes.  But with
>> > recent improvements and changes upstream, things should be back to
>> > being more manageable. We do have telemetry tested in scenario001 job,
>> > but that doesn't cover all scenarios. So there is a gap in coverage.
>>
>> What do you mean by gap in coverage?
>> We have scenarios on purpose, so we can horizontally scale the
>> coverage across multiple jobs and run the jobs only when we need (e.g.
>> touching telemetry files for scenario001).
>>
>> Please elaborate on what isn't covered by scenario001, because we
>> already cover Gnocchi, Panko, Aodh and Ceilometer (with RBD backend
>> and soon with Swift backend in scenario002).
>>
>
> Emilien,
> Gap is the wrong word to use in the case.
> Previously we had several jobs running with telemetry turned on including
> ovb jobs in tripleo and other jobs outside of the upstream CI system.
> The more jobs running, the more coverage.
> I think that is what Pradeep was referring to, but maybe I am
> misunderstanding this as well.

Yea may be gap is not the right word. But mostly i meant what Wes
said, but also I feel we are not testing Telemetry with full HA
currently in CI. scenario jobs only test deploy with 1 controller not
3. We have seen some recent issues where things work on controller 0
but controller 1 or 2 has statsd down for example. The ovb ha job
would have shown us that, had the ovb ha job included telemetry
enabled. Is it possible to run scenario001 job with full HA ?



>
>
>>
>> >  I hope we can either re-enable these services by default in CI and
>> > how things work or at least add a separate gate job to be able to test
>> > HA scenario properly with telemetry enabled.
>> >
>> > --
>> > Cheers,
>> > ~ Prad
>> >
>> >
>> > __
>> > OpenStack Development Mailing List (not for usage questions)
>> > Unsubscribe:
>> > openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>>
>>
>> --
>> Emilien Macchi
>
>
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>



-- 
Cheers,
~ Prad

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [tripleo][ci] decreased coverage for telemetry

2017-07-11 Thread Wesley Hayutin

On Tue, Jul 11, 2017 at 9:04 PM, Emilien Macchi  wrote:

> On Tue, Jul 11, 2017 at 12:41 PM, Pradeep Kilambi  wrote:
> > On Tue, Jul 11, 2017 at 3:17 PM, Wesley Hayutin 
> wrote:
> >> Greetings,
> >>
> >> I was looking through the mailing list and I did not see any emails
> >> explicitly calling out the decreased coverage for telemetry in tripleo
> due
> >> to [1].  A series of changes went into the CI system to disable
> telemetry
> >> [2].
> >>
> >> There is work being done to restore more coverage for telemetry by
> limiting
> >> the resources it consumes [3].  We are also working on additional
> scenarios
> >> in t-h-t/ci/environments/ to better cover ceilometer.
> >>
> >> If the CI environment you are working in has the resources to cover
> >> ceilometer that is great, however if you find issues like [1] we highly
> >> suggest you follow the same pattern until coverage is restored upstream.
> >>
> >> Thank you!
> >>
> >> [1] https://bugs.launchpad.net/tripleo/+bug/1693174
> >> [2] https://review.openstack.org/#/q/topic:bug/1680195
> >> [3]
> >> https://review.openstack.org/#/c/475838/
> >> https://review.openstack.org/#/c/474969/
> >> https://review.openstack.org/#/c/47/
> >>
> >>
> >
> > Thanks for starting this thread Wes. I concur with this. We got bitten
> > recently by many issues that we could have caught in ci had telemetry
> > been enabled. I spoke to trown and Emilien about this a few times
> > already. I do understand the resource footprint it causes.  But with
> > recent improvements and changes upstream, things should be back to
> > being more manageable. We do have telemetry tested in scenario001 job,
> > but that doesn't cover all scenarios. So there is a gap in coverage.
>
> What do you mean by gap in coverage?
> We have scenarios on purpose, so we can horizontally scale the
> coverage across multiple jobs and run the jobs only when we need (e.g.
> touching telemetry files for scenario001).
>
> Please elaborate on what isn't covered by scenario001, because we
> already cover Gnocchi, Panko, Aodh and Ceilometer (with RBD backend
> and soon with Swift backend in scenario002).
>
>
Emilien,
Gap is the wrong word to use in the case.
Previously we had several jobs running with telemetry turned on including
ovb jobs in tripleo and other jobs outside of the upstream CI system.
The more jobs running, the more coverage.
I think that is what Pradeep was referring to, but maybe I am
misunderstanding this as well.



> >  I hope we can either re-enable these services by default in CI and
> > how things work or at least add a separate gate job to be able to test
> > HA scenario properly with telemetry enabled.
> >
> > --
> > Cheers,
> > ~ Prad
> >
> > 
> __
> > OpenStack Development Mailing List (not for usage questions)
> > Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:
> unsubscribe
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
>
> --
> Emilien Macchi
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [tripleo][ci] decreased coverage for telemetry

2017-07-11 Thread Emilien Macchi

On Tue, Jul 11, 2017 at 12:41 PM, Pradeep Kilambi  wrote:
> On Tue, Jul 11, 2017 at 3:17 PM, Wesley Hayutin  wrote:
>> Greetings,
>>
>> I was looking through the mailing list and I did not see any emails
>> explicitly calling out the decreased coverage for telemetry in tripleo due
>> to [1].  A series of changes went into the CI system to disable telemetry
>> [2].
>>
>> There is work being done to restore more coverage for telemetry by limiting
>> the resources it consumes [3].  We are also working on additional scenarios
>> in t-h-t/ci/environments/ to better cover ceilometer.
>>
>> If the CI environment you are working in has the resources to cover
>> ceilometer that is great, however if you find issues like [1] we highly
>> suggest you follow the same pattern until coverage is restored upstream.
>>
>> Thank you!
>>
>> [1] https://bugs.launchpad.net/tripleo/+bug/1693174
>> [2] https://review.openstack.org/#/q/topic:bug/1680195
>> [3]
>> https://review.openstack.org/#/c/475838/
>> https://review.openstack.org/#/c/474969/
>> https://review.openstack.org/#/c/47/
>>
>>
>
> Thanks for starting this thread Wes. I concur with this. We got bitten
> recently by many issues that we could have caught in ci had telemetry
> been enabled. I spoke to trown and Emilien about this a few times
> already. I do understand the resource footprint it causes.  But with
> recent improvements and changes upstream, things should be back to
> being more manageable. We do have telemetry tested in scenario001 job,
> but that doesn't cover all scenarios. So there is a gap in coverage.

What do you mean by gap in coverage?
We have scenarios on purpose, so we can horizontally scale the
coverage across multiple jobs and run the jobs only when we need (e.g.
touching telemetry files for scenario001).

Please elaborate on what isn't covered by scenario001, because we
already cover Gnocchi, Panko, Aodh and Ceilometer (with RBD backend
and soon with Swift backend in scenario002).

>  I hope we can either re-enable these services by default in CI and
> how things work or at least add a separate gate job to be able to test
> HA scenario properly with telemetry enabled.
>
> --
> Cheers,
> ~ Prad
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



-- 
Emilien Macchi

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [tripleo][ci] decreased coverage for telemetry

2017-07-11 Thread Wesley Hayutin

On Tue, Jul 11, 2017 at 3:41 PM, Pradeep Kilambi  wrote:

> On Tue, Jul 11, 2017 at 3:17 PM, Wesley Hayutin 
> wrote:
> > Greetings,
> >
> > I was looking through the mailing list and I did not see any emails
> > explicitly calling out the decreased coverage for telemetry in tripleo
> due
> > to [1].  A series of changes went into the CI system to disable telemetry
> > [2].
> >
> > There is work being done to restore more coverage for telemetry by
> limiting
> > the resources it consumes [3].  We are also working on additional
> scenarios
> > in t-h-t/ci/environments/ to better cover ceilometer.
> >
> > If the CI environment you are working in has the resources to cover
> > ceilometer that is great, however if you find issues like [1] we highly
> > suggest you follow the same pattern until coverage is restored upstream.
> >
> > Thank you!
> >
> > [1] https://bugs.launchpad.net/tripleo/+bug/1693174
> > [2] https://review.openstack.org/#/q/topic:bug/1680195
> > [3]
> > https://review.openstack.org/#/c/475838/
> > https://review.openstack.org/#/c/474969/
> > https://review.openstack.org/#/c/47/
> >
> >
>
> Thanks for starting this thread Wes. I concur with this. We got bitten
> recently by many issues that we could have caught in ci had telemetry
> been enabled. I spoke to trown and Emilien about this a few times
> already. I do understand the resource footprint it causes.  But with
> recent improvements and changes upstream, things should be back to
> being more manageable. We do have telemetry tested in scenario001 job,
> but that doesn't cover all scenarios. So there is a gap in coverage.
>
>  I hope we can either re-enable these services by default in CI and
> how things work or at least add a separate gate job to be able to test
> HA scenario properly with telemetry enabled.
>
> --
> Cheers,
> ~ Prad
>

While Prad and were having the conversation, I raised the point that the
tripleo
community may be more willing to turn on more coverage for ceilometer if
the
gate-tripleo-ci-centos-7-scenario001-multinode-oooq-puppet-nv job that runs
on ceilometer changes
was moved from non-voting to a voting job.

Note, we are trying to get more and more projects to run tripleo based jobs
in their check gates generally.

Thanks
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [tripleo][ci] decreased coverage for telemetry

2017-07-11 Thread Pradeep Kilambi

On Tue, Jul 11, 2017 at 3:17 PM, Wesley Hayutin  wrote:
> Greetings,
>
> I was looking through the mailing list and I did not see any emails
> explicitly calling out the decreased coverage for telemetry in tripleo due
> to [1].  A series of changes went into the CI system to disable telemetry
> [2].
>
> There is work being done to restore more coverage for telemetry by limiting
> the resources it consumes [3].  We are also working on additional scenarios
> in t-h-t/ci/environments/ to better cover ceilometer.
>
> If the CI environment you are working in has the resources to cover
> ceilometer that is great, however if you find issues like [1] we highly
> suggest you follow the same pattern until coverage is restored upstream.
>
> Thank you!
>
> [1] https://bugs.launchpad.net/tripleo/+bug/1693174
> [2] https://review.openstack.org/#/q/topic:bug/1680195
> [3]
> https://review.openstack.org/#/c/475838/
> https://review.openstack.org/#/c/474969/
> https://review.openstack.org/#/c/47/
>
>

Thanks for starting this thread Wes. I concur with this. We got bitten
recently by many issues that we could have caught in ci had telemetry
been enabled. I spoke to trown and Emilien about this a few times
already. I do understand the resource footprint it causes.  But with
recent improvements and changes upstream, things should be back to
being more manageable. We do have telemetry tested in scenario001 job,
but that doesn't cover all scenarios. So there is a gap in coverage.

 I hope we can either re-enable these services by default in CI and
how things work or at least add a separate gate job to be able to test
HA scenario properly with telemetry enabled.

-- 
Cheers,
~ Prad

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [tripleo] CI Squad Meeting Summary (week 26) - job renaming discussion

2017-07-04 Thread Emilien Macchi

On Tue, Jul 4, 2017 at 3:49 PM, Sagi Shnaidman  wrote:
> Every job contains topology file too, like "1cont_1comp" for example. And
> generally could be different jobs that run the same featureset024 but with
> different topologies. So I think the topology part is necessary too.

In upstream CI we don't have complex topologies, our major ones are
ovb and multinodes. I don't have strong opinions and would be ok to
add topology in jobname, as long as we try to keep it stupid and
simple to keep debugging friendly.

>
>
> On Tue, Jul 4, 2017 at 8:45 PM, Emilien Macchi  wrote:
>>
>> On Fri, Jun 30, 2017 at 11:06 AM, Jiří Stránský  wrote:
>> > On 30.6.2017 15:04, Attila Darazs wrote:
>> >>
>> >> = Renaming the CI jobs =
>> >>
>> >> When we started the job transition to Quickstart, we introduced the
>> >> concept of featuresets[1] that define a certain combination of features
>> >> for each job.
>> >>
>> >> This seemed to be a sensible solution, as it's not practical to mention
>> >> all the individual features in the job name, and short names can be
>> >> misleading (for example ovb-ha job does so much more than tests HA).
>> >>
>> >> We decided to keep the original names for these jobs to simplify the
>> >> transition, but the plan is to rename them to something that will help
>> >> to reproduce the jobs locally with Quickstart.
>> >>
>> >> The proposed naming scheme will be the same as the one we're now using
>> >> for job type in project-config:
>> >>
>> >> gate-tripleo-ci-centos-7-{node-config}-{featureset-config}
>> >>
>> >> So for example the current "gate-tripleo-ci-centos-7-ovb-ha-oooq" job
>> >> would look like
>> >> "gate-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001"
>> >
>> >
>> > I'd prefer to keep the job names somewhat descriptive... If i had to
>> > pick
>> > one or the other, i'd rather stick with the current way, as at least for
>> > me
>> > it's higher priority to see descriptive names in CI results than saving
>> > time
>> > on finding featureset file mapping when needing to reproduce a job
>> > result.
>> > My eyes scan probably more than a hundred of individual CI job results
>> > daily, but i only need to reproduce 0 or 1 job failures locally usually.
>> >
>> > Alternatively, could we rename "featureset001.yaml" into
>> > "featureset-ovb-ha.yaml" and then have i guess something like
>> > "gate-tripleo-ci-centos-7-ovb-3ctlr_1comp-ovb-ha" for the job name?
>> > Maybe
>> > "ovb" would be there twice, in case it's needed both in node config and
>> > featureset parts of the job name...
>>
>> I'm in favor of keeping jobnames as simple as possible.
>> To me, we should use something like
>> gate-tripleo-ci-centos-7-ovb-featureset001
>>
>> So we know:
>>
>> - it's a tripleo gate job running on centos7
>> - it's OVB and not multinode
>> - it's deploying featureset001
>>
>> Please don't mention HA or ceph or other features in the name because
>> it would be too rigid in case of featureset would change the coverage.
>>
>> Note: if we go that way, we also might want to rename scenario jobs
>> and use featureset in the job name.
>> Note2: if we rename jobs, we need to keep doing good work on
>> documenting what featureset deploy and make
>>
>> https://github.com/openstack/tripleo-quickstart/blob/master/doc/source/feature-configuration.rst
>> more visible probably.
>>
>> My 2 cents.
>>
>> > Or we could pull the mapping between job name and job type in an
>> > automated
>> > way from project-config.
>> >
>> > (Will be on PTO for a week from now, apologies if i don't respond timely
>> > here.)
>> >
>> >
>> > Have a good day,
>> >
>> > Jirka
>> >
>> >>
>> >> The advantage of this will be that it will be easy to reproduce a gate
>> >> job on a local virthost by typing something like:
>> >>
>> >> ./quickstart.sh --release tripleo-ci/master \
>> >>   --nodes config/nodes/3ctlr_1comp.yml \
>> >>   --config config/general_config/featureset001.yml \
>> >>   
>> >>
>> >> Please let us know if this method sounds like a step forward.
>> >
>> >
>> >
>> > __
>> > OpenStack Development Mailing List (not for usage questions)
>> > Unsubscribe:
>> > openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>>
>>
>> --
>> Emilien Macchi
>>
>> __
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
>
>
> --
> Best regards
> Sagi Shnaidman
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>

Re: [openstack-dev] [tripleo] CI Squad Meeting Summary (week 26) - job renaming discussion

2017-07-04 Thread Sagi Shnaidman

Every job contains topology file too, like "1cont_1comp" for example. And
generally could be different jobs that run the same featureset024 but with
different topologies. So I think the topology part is necessary too.



On Tue, Jul 4, 2017 at 8:45 PM, Emilien Macchi  wrote:

> On Fri, Jun 30, 2017 at 11:06 AM, Jiří Stránský  wrote:
> > On 30.6.2017 15:04, Attila Darazs wrote:
> >>
> >> = Renaming the CI jobs =
> >>
> >> When we started the job transition to Quickstart, we introduced the
> >> concept of featuresets[1] that define a certain combination of features
> >> for each job.
> >>
> >> This seemed to be a sensible solution, as it's not practical to mention
> >> all the individual features in the job name, and short names can be
> >> misleading (for example ovb-ha job does so much more than tests HA).
> >>
> >> We decided to keep the original names for these jobs to simplify the
> >> transition, but the plan is to rename them to something that will help
> >> to reproduce the jobs locally with Quickstart.
> >>
> >> The proposed naming scheme will be the same as the one we're now using
> >> for job type in project-config:
> >>
> >> gate-tripleo-ci-centos-7-{node-config}-{featureset-config}
> >>
> >> So for example the current "gate-tripleo-ci-centos-7-ovb-ha-oooq" job
> >> would look like "gate-tripleo-ci-centos-7-ovb-
> 3ctlr_1comp-featureset001"
> >
> >
> > I'd prefer to keep the job names somewhat descriptive... If i had to pick
> > one or the other, i'd rather stick with the current way, as at least for
> me
> > it's higher priority to see descriptive names in CI results than saving
> time
> > on finding featureset file mapping when needing to reproduce a job
> result.
> > My eyes scan probably more than a hundred of individual CI job results
> > daily, but i only need to reproduce 0 or 1 job failures locally usually.
> >
> > Alternatively, could we rename "featureset001.yaml" into
> > "featureset-ovb-ha.yaml" and then have i guess something like
> > "gate-tripleo-ci-centos-7-ovb-3ctlr_1comp-ovb-ha" for the job name?
> Maybe
> > "ovb" would be there twice, in case it's needed both in node config and
> > featureset parts of the job name...
>
> I'm in favor of keeping jobnames as simple as possible.
> To me, we should use something like gate-tripleo-ci-centos-7-ovb-
> featureset001
>
> So we know:
>
> - it's a tripleo gate job running on centos7
> - it's OVB and not multinode
> - it's deploying featureset001
>
> Please don't mention HA or ceph or other features in the name because
> it would be too rigid in case of featureset would change the coverage.
>
> Note: if we go that way, we also might want to rename scenario jobs
> and use featureset in the job name.
> Note2: if we rename jobs, we need to keep doing good work on
> documenting what featureset deploy and make
> https://github.com/openstack/tripleo-quickstart/blob/
> master/doc/source/feature-configuration.rst
> more visible probably.
>
> My 2 cents.
>
> > Or we could pull the mapping between job name and job type in an
> automated
> > way from project-config.
> >
> > (Will be on PTO for a week from now, apologies if i don't respond timely
> > here.)
> >
> >
> > Have a good day,
> >
> > Jirka
> >
> >>
> >> The advantage of this will be that it will be easy to reproduce a gate
> >> job on a local virthost by typing something like:
> >>
> >> ./quickstart.sh --release tripleo-ci/master \
> >>   --nodes config/nodes/3ctlr_1comp.yml \
> >>   --config config/general_config/featureset001.yml \
> >>   
> >>
> >> Please let us know if this method sounds like a step forward.
> >
> >
> > 
> __
> > OpenStack Development Mailing List (not for usage questions)
> > Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:
> unsubscribe
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
>
> --
> Emilien Macchi
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>



-- 
Best regards
Sagi Shnaidman
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [tripleo] CI Squad Meeting Summary (week 26) - job renaming discussion

2017-07-04 Thread Emilien Macchi

On Fri, Jun 30, 2017 at 11:06 AM, Jiří Stránský  wrote:
> On 30.6.2017 15:04, Attila Darazs wrote:
>>
>> = Renaming the CI jobs =
>>
>> When we started the job transition to Quickstart, we introduced the
>> concept of featuresets[1] that define a certain combination of features
>> for each job.
>>
>> This seemed to be a sensible solution, as it's not practical to mention
>> all the individual features in the job name, and short names can be
>> misleading (for example ovb-ha job does so much more than tests HA).
>>
>> We decided to keep the original names for these jobs to simplify the
>> transition, but the plan is to rename them to something that will help
>> to reproduce the jobs locally with Quickstart.
>>
>> The proposed naming scheme will be the same as the one we're now using
>> for job type in project-config:
>>
>> gate-tripleo-ci-centos-7-{node-config}-{featureset-config}
>>
>> So for example the current "gate-tripleo-ci-centos-7-ovb-ha-oooq" job
>> would look like "gate-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001"
>
>
> I'd prefer to keep the job names somewhat descriptive... If i had to pick
> one or the other, i'd rather stick with the current way, as at least for me
> it's higher priority to see descriptive names in CI results than saving time
> on finding featureset file mapping when needing to reproduce a job result.
> My eyes scan probably more than a hundred of individual CI job results
> daily, but i only need to reproduce 0 or 1 job failures locally usually.
>
> Alternatively, could we rename "featureset001.yaml" into
> "featureset-ovb-ha.yaml" and then have i guess something like
> "gate-tripleo-ci-centos-7-ovb-3ctlr_1comp-ovb-ha" for the job name? Maybe
> "ovb" would be there twice, in case it's needed both in node config and
> featureset parts of the job name...

I'm in favor of keeping jobnames as simple as possible.
To me, we should use something like gate-tripleo-ci-centos-7-ovb-featureset001

So we know:

- it's a tripleo gate job running on centos7
- it's OVB and not multinode
- it's deploying featureset001

Please don't mention HA or ceph or other features in the name because
it would be too rigid in case of featureset would change the coverage.

Note: if we go that way, we also might want to rename scenario jobs
and use featureset in the job name.
Note2: if we rename jobs, we need to keep doing good work on
documenting what featureset deploy and make
https://github.com/openstack/tripleo-quickstart/blob/master/doc/source/feature-configuration.rst
more visible probably.

My 2 cents.

> Or we could pull the mapping between job name and job type in an automated
> way from project-config.
>
> (Will be on PTO for a week from now, apologies if i don't respond timely
> here.)
>
>
> Have a good day,
>
> Jirka
>
>>
>> The advantage of this will be that it will be easy to reproduce a gate
>> job on a local virthost by typing something like:
>>
>> ./quickstart.sh --release tripleo-ci/master \
>>   --nodes config/nodes/3ctlr_1comp.yml \
>>   --config config/general_config/featureset001.yml \
>>   
>>
>> Please let us know if this method sounds like a step forward.
>
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



-- 
Emilien Macchi

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [tripleo] CI Squad Meeting Summary (week 26) - job renaming discussion

2017-07-03 Thread Sagi Shnaidman

HI
I think job names should be meaningful too. We can include like
"featureset024" or even "-f024"
 in job name to make reproducing easily, or just to make another table of
featuresets and job names, like we have for file names and features.
gate-tripleo-ci-centos-7-ovb-f024-ha-cont-iso-bonds-ipv6-1ctrl_1comp_1ceph
seems not too long and gives a clue what runs in this job without looking
for job configuration also for people outside tripleo. Our jobs run not
only on TripleO CI, but on neutron, nova, etc

Thanks



On Fri, Jun 30, 2017 at 6:06 PM, Jiří Stránský  wrote:

> On 30.6.2017 15:04, Attila Darazs wrote:
>
>> = Renaming the CI jobs =
>>
>> When we started the job transition to Quickstart, we introduced the
>> concept of featuresets[1] that define a certain combination of features
>> for each job.
>>
>> This seemed to be a sensible solution, as it's not practical to mention
>> all the individual features in the job name, and short names can be
>> misleading (for example ovb-ha job does so much more than tests HA).
>>
>> We decided to keep the original names for these jobs to simplify the
>> transition, but the plan is to rename them to something that will help
>> to reproduce the jobs locally with Quickstart.
>>
>> The proposed naming scheme will be the same as the one we're now using
>> for job type in project-config:
>>
>> gate-tripleo-ci-centos-7-{node-config}-{featureset-config}
>>
>> So for example the current "gate-tripleo-ci-centos-7-ovb-ha-oooq" job
>> would look like "gate-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001"
>>
>
> I'd prefer to keep the job names somewhat descriptive... If i had to pick
> one or the other, i'd rather stick with the current way, as at least for me
> it's higher priority to see descriptive names in CI results than saving
> time on finding featureset file mapping when needing to reproduce a job
> result. My eyes scan probably more than a hundred of individual CI job
> results daily, but i only need to reproduce 0 or 1 job failures locally
> usually.
>
> Alternatively, could we rename "featureset001.yaml" into
> "featureset-ovb-ha.yaml" and then have i guess something like
> "gate-tripleo-ci-centos-7-ovb-3ctlr_1comp-ovb-ha" for the job name? Maybe
> "ovb" would be there twice, in case it's needed both in node config and
> featureset parts of the job name...
>
> Or we could pull the mapping between job name and job type in an automated
> way from project-config.
>
> (Will be on PTO for a week from now, apologies if i don't respond timely
> here.)
>
>
> Have a good day,
>
> Jirka
>
>
>> The advantage of this will be that it will be easy to reproduce a gate
>> job on a local virthost by typing something like:
>>
>> ./quickstart.sh --release tripleo-ci/master \
>>   --nodes config/nodes/3ctlr_1comp.yml \
>>   --config config/general_config/featureset001.yml \
>>   
>>
>> Please let us know if this method sounds like a step forward.
>>
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>



-- 
Best regards
Sagi Shnaidman
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [tripleo] CI Squad Meeting Summary (week 26) - job renaming discussion

2017-06-30 Thread Jiří Stránský


On 30.6.2017 17:06, Jiří Stránský wrote:

On 30.6.2017 15:04, Attila Darazs wrote:

= Renaming the CI jobs =

When we started the job transition to Quickstart, we introduced the
concept of featuresets[1] that define a certain combination of features
for each job.

This seemed to be a sensible solution, as it's not practical to mention
all the individual features in the job name, and short names can be
misleading (for example ovb-ha job does so much more than tests HA).

We decided to keep the original names for these jobs to simplify the
transition, but the plan is to rename them to something that will help
to reproduce the jobs locally with Quickstart.

The proposed naming scheme will be the same as the one we're now using
for job type in project-config:

gate-tripleo-ci-centos-7-{node-config}-{featureset-config}

So for example the current "gate-tripleo-ci-centos-7-ovb-ha-oooq" job
would look like "gate-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001"


I'd prefer to keep the job names somewhat descriptive... If i had to
pick one or the other, i'd rather stick with the current way, as at
least for me it's higher priority to see descriptive names in CI results
than saving time on finding featureset file mapping when needing to
reproduce a job result. My eyes scan probably more than a hundred of
individual CI job results daily, but i only need to reproduce 0 or 1 job
failures locally usually.

Alternatively, could we rename "featureset001.yaml" into
"featureset-ovb-ha.yaml" and then have i guess something like
"gate-tripleo-ci-centos-7-ovb-3ctlr_1comp-ovb-ha" for the job name?
Maybe "ovb" would be there twice, in case it's needed both in node
config and featureset parts of the job name...

Or we could pull the mapping between job name and job type in an
automated way from project-config.


^ I mean for the purposes of reproducing a CI job, in a similar way we 
do it for running the CI job in the first place.




(Will be on PTO for a week from now, apologies if i don't respond timely
here.)


Have a good day,

Jirka



The advantage of this will be that it will be easy to reproduce a gate
job on a local virthost by typing something like:

./quickstart.sh --release tripleo-ci/master \
   --nodes config/nodes/3ctlr_1comp.yml \
   --config config/general_config/featureset001.yml \
   

Please let us know if this method sounds like a step forward.



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [tripleo] CI Squad Meeting Summary (week 26) - job renaming discussion

2017-06-30 Thread Jiří Stránský


On 30.6.2017 15:04, Attila Darazs wrote:

= Renaming the CI jobs =

When we started the job transition to Quickstart, we introduced the
concept of featuresets[1] that define a certain combination of features
for each job.

This seemed to be a sensible solution, as it's not practical to mention
all the individual features in the job name, and short names can be
misleading (for example ovb-ha job does so much more than tests HA).

We decided to keep the original names for these jobs to simplify the
transition, but the plan is to rename them to something that will help
to reproduce the jobs locally with Quickstart.

The proposed naming scheme will be the same as the one we're now using
for job type in project-config:

gate-tripleo-ci-centos-7-{node-config}-{featureset-config}

So for example the current "gate-tripleo-ci-centos-7-ovb-ha-oooq" job
would look like "gate-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001"


I'd prefer to keep the job names somewhat descriptive... If i had to 
pick one or the other, i'd rather stick with the current way, as at 
least for me it's higher priority to see descriptive names in CI results 
than saving time on finding featureset file mapping when needing to 
reproduce a job result. My eyes scan probably more than a hundred of 
individual CI job results daily, but i only need to reproduce 0 or 1 job 
failures locally usually.


Alternatively, could we rename "featureset001.yaml" into 
"featureset-ovb-ha.yaml" and then have i guess something like 
"gate-tripleo-ci-centos-7-ovb-3ctlr_1comp-ovb-ha" for the job name? 
Maybe "ovb" would be there twice, in case it's needed both in node 
config and featureset parts of the job name...


Or we could pull the mapping between job name and job type in an 
automated way from project-config.


(Will be on PTO for a week from now, apologies if i don't respond timely 
here.)



Have a good day,

Jirka



The advantage of this will be that it will be easy to reproduce a gate
job on a local virthost by typing something like:

./quickstart.sh --release tripleo-ci/master \
  --nodes config/nodes/3ctlr_1comp.yml \
  --config config/general_config/featureset001.yml \
  

Please let us know if this method sounds like a step forward.


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [tripleo][ci] where to find the CI backlog and issues we're tracking

2017-06-21 Thread Wesley Hayutin

On Tue, Jun 20, 2017 at 2:51 PM, Emilien Macchi  wrote:

> On Tue, Jun 20, 2017 at 12:49 PM, Wesley Hayutin 
> wrote:
> > Greetings,
> >
> > It's become apparent that everyone in the tripleo community may not be
> aware
> > of where CI specific work is tracked.
> >
> > To find out which CI related features or bug fixes are in progress or to
> see
> > the backlog please consult [1].
> >
> > To find out what issues have been found in OpenStack via CI please
> consult
> > [2].
> >
> > Thanks!
>
> Thanks Wes for these informations. I was about to start adding more
> links and informations when I realized monitoring TripleO CI might
> deserve a little bit of training and documentation.
> I'll take some time this week to create a new section in TripleO docs
> with useful informations that we can easily share with our community
> so everyone can learn how to be aware about CI status.
>
>
Emilien,
That's a really good point, we should have this information in doc.
You are a busy guy, we'll take care of that.

Thanks for the input!


>
> >
> > [1] https://trello.com/b/U1ITy0cu/tripleo-ci-squad
> > [2] https://trello.com/b/WXJTwsuU/tripleo-and-rdo-ci-status
> >
> > 
> __
> > OpenStack Development Mailing List (not for usage questions)
> > Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:
> unsubscribe
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >
>
>
>
> --
> Emilien Macchi
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [tripleo][ci] where to find the CI backlog and issues we're tracking

2017-06-20 Thread Emilien Macchi

On Tue, Jun 20, 2017 at 12:49 PM, Wesley Hayutin  wrote:
> Greetings,
>
> It's become apparent that everyone in the tripleo community may not be aware
> of where CI specific work is tracked.
>
> To find out which CI related features or bug fixes are in progress or to see
> the backlog please consult [1].
>
> To find out what issues have been found in OpenStack via CI please consult
> [2].
>
> Thanks!

Thanks Wes for these informations. I was about to start adding more
links and informations when I realized monitoring TripleO CI might
deserve a little bit of training and documentation.
I'll take some time this week to create a new section in TripleO docs
with useful informations that we can easily share with our community
so everyone can learn how to be aware about CI status.


>
> [1] https://trello.com/b/U1ITy0cu/tripleo-ci-squad
> [2] https://trello.com/b/WXJTwsuU/tripleo-and-rdo-ci-status
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>



-- 
Emilien Macchi

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [tripleo][ci] TripleO OVB check gates to move to third party

2017-06-13 Thread Emilien Macchi

On Tue, Jun 13, 2017 at 3:11 PM, Ben Nemec  wrote:
>
>
> On 06/13/2017 12:28 PM, Paul Belanger wrote:
>>
>> On Tue, Jun 13, 2017 at 11:12:08AM -0500, Ben Nemec wrote:
>>>
>>>
>>>
>>> On 06/12/2017 06:19 PM, Ronelle Landy wrote:

 Greetings,

 TripleO OVB check gates are managed by upstream Zuul and executed on
 nodes provided by test cloud RH1. RDO Cloud is now available as a test
 cloud to be used when running CI jobs. To utilize to RDO Cloud, we could
 either:

 - continue to run from upstream Zuul (and spin up nodes to deploy
 the overcloud from RDO Cloud)
 - switch the TripleO OVB check gates to run as third party and
 manage these jobs from the Zuul instance used by Software Factory

 The openstack infra team advocates moving to third party.
 The CI team is meeting with Frederic Lepied, Alan Pevec, and other
 members of the Software Factory/RDO project infra tream to discuss how
 this move could be managed.

 Note: multinode jobs are not impacted - and will continue to run from
 upstream Zuul on nodes provided by nodepool.

 Since a move to third party could have significant impact, we are
 posting this out to gather feedback and/or concerns that TripleO
 developers may have.
>>>
>>>
>>> I'm +1 on moving to third-party...eventually.  I don't think it should be
>>> done at the same time as we move to a new cloud, which is a major change
>>> in
>>> and of itself.  I suppose we could do the third-party transition in
>>> parallel
>>> with the existing rh1 jobs, but as one of the people who will probably
>>> have
>>> to debug problems in RDO cloud I'd rather keep the number of variables to
>>> a
>>> minimum.  Once we're reasonably confident that RDO cloud is stable and
>>> handling our workload well we can transition to third-party and deal with
>>> the problems that will no doubt cause on their own.
>>>
>> This was a goal for tripleo-test-cloud-rh2, to move that to thirdparty CI,
>> ensure jobs work, then migrated. As you can see, we never actually did
>> that.
>>
>> My preference would be to make the move the thirdparty now, with
>> tripleo-test-cloud-rh1.  We now have all the pieces in place for RDO
>> project to
>> support this and in parallel set up RDO cloud to run jobs from RDO.
>>
>> If RDO stablility is a concern, the move to thirdparty first seems to make
>> the
>> most sense. This avoid the need to bring RDO cloud online, ensure it
>> works, then
>> move it again, and re-insure it works.
>>
>> Again, the move can be made seemless by turning down some of the capacity
>> in
>> nodepool.o.o and increase capacity in nodepool.rdoproject.org. And I am
>> happy to
>> help work with RDO on making this happen.
>
>
> I'm good with doing the third-party migration first too.  I'm only looking
> to avoid two concurrent major changes.

+1, I do agree with Ben here.

Go for it!

>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



-- 
Emilien Macchi

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [tripleo][ci] TripleO OVB check gates to move to third party

2017-06-13 Thread Paul Belanger

On Tue, Jun 13, 2017 at 02:11:53PM -0500, Ben Nemec wrote:
> 
> 
> On 06/13/2017 12:28 PM, Paul Belanger wrote:
> > On Tue, Jun 13, 2017 at 11:12:08AM -0500, Ben Nemec wrote:
> > > 
> > > 
> > > On 06/12/2017 06:19 PM, Ronelle Landy wrote:
> > > > Greetings,
> > > > 
> > > > TripleO OVB check gates are managed by upstream Zuul and executed on
> > > > nodes provided by test cloud RH1. RDO Cloud is now available as a test
> > > > cloud to be used when running CI jobs. To utilize to RDO Cloud, we could
> > > > either:
> > > > 
> > > > - continue to run from upstream Zuul (and spin up nodes to deploy
> > > > the overcloud from RDO Cloud)
> > > > - switch the TripleO OVB check gates to run as third party and
> > > > manage these jobs from the Zuul instance used by Software Factory
> > > > 
> > > > The openstack infra team advocates moving to third party.
> > > > The CI team is meeting with Frederic Lepied, Alan Pevec, and other
> > > > members of the Software Factory/RDO project infra tream to discuss how
> > > > this move could be managed.
> > > > 
> > > > Note: multinode jobs are not impacted - and will continue to run from
> > > > upstream Zuul on nodes provided by nodepool.
> > > > 
> > > > Since a move to third party could have significant impact, we are
> > > > posting this out to gather feedback and/or concerns that TripleO
> > > > developers may have.
> > > 
> > > I'm +1 on moving to third-party...eventually.  I don't think it should be
> > > done at the same time as we move to a new cloud, which is a major change 
> > > in
> > > and of itself.  I suppose we could do the third-party transition in 
> > > parallel
> > > with the existing rh1 jobs, but as one of the people who will probably 
> > > have
> > > to debug problems in RDO cloud I'd rather keep the number of variables to 
> > > a
> > > minimum.  Once we're reasonably confident that RDO cloud is stable and
> > > handling our workload well we can transition to third-party and deal with
> > > the problems that will no doubt cause on their own.
> > > 
> > This was a goal for tripleo-test-cloud-rh2, to move that to thirdparty CI,
> > ensure jobs work, then migrated. As you can see, we never actually did that.
> > 
> > My preference would be to make the move the thirdparty now, with
> > tripleo-test-cloud-rh1.  We now have all the pieces in place for RDO 
> > project to
> > support this and in parallel set up RDO cloud to run jobs from RDO.
> > 
> > If RDO stablility is a concern, the move to thirdparty first seems to make 
> > the
> > most sense. This avoid the need to bring RDO cloud online, ensure it works, 
> > then
> > move it again, and re-insure it works.
> > 
> > Again, the move can be made seemless by turning down some of the capacity in
> > nodepool.o.o and increase capacity in nodepool.rdoproject.org. And I am 
> > happy to
> > help work with RDO on making this happen.
> 
> I'm good with doing the third-party migration first too.  I'm only looking
> to avoid two concurrent major changes.
> 
Great, I am happy to hear that :D

> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [tripleo] CI Squad Meeting Summary (week 23) - images, devmode and the RDO Cloud

2017-06-13 Thread Emilien Macchi

On Fri, Jun 9, 2017 at 10:12 AM, Attila Darazs  wrote:
> If the topics below interest you and you want to contribute to the
> discussion, feel free to join the next meeting:
>
> Time: Thursdays, 14:30-15:30 UTC
> Place: https://bluejeans.com/4113567798/
>
> Full minutes: https://etherpad.openstack.org/p/tripleo-ci-squad-meeting
>
> We had a packed agenda and intense discussion as always! Let's start with an
> announcement:
>
> The smoothly named "TripleO deploy time optimization hackathlon" will be
> held on 21st and 22nd of June. It would be great to have the cooperation of
> multiple teams here. See the etherpad[1] for details.
>
> = Extending our image building =
>
> It seems that multiple teams would like to utilize the upstream/RDO image
> building process and produce images just like we do upstream. Unfortunately
> our current image storage systems are not having enough bandwidth (either
> upstream or on the RDO level) to increase the amount of images served.
>
> Paul Belanger joined us and explained the longer term plans of OpenStack
> infra, which would provide a proper image/binary blob hosting solution in
> the 6 months time frame.
>
> In the short term, we will recreate both the upstream and RDO image hosting
> instances on the new RDO Cloud and will test the throughput.

Also, you might want to read the future OpenStack guidelines for
managing releases of binary artifacts:
https://review.openstack.org/#/c/469265/

> = Transitioning the promotion jobs =
>
> This task still needs some further work. We're missing feature parity on the
> ovb-updates job. As the CI Squad is not able to take responsibility for the
> update functionality, we will probably migrate the job with everything else
> but the update part and make that the new promotion job.

I don't think we need to wait on the conversion to switch.
We could just configure the promotion pipeline to run ovb-oooq-ha and
ovb-updates; and put the conversion in a parallel effort. Isn't?

> We will also extend the amount of jobs voting on a promotion, probably will
> the scenario jobs.

+1000 for having scenarios. Let's start by classic deployment, and
then later we'll probably add upgrades.

> = Devmode =
>
> Quickstart's devmode.sh seems to be picking up popularity among the TripleO
> developers. Meanwhile we're starting to realize the limitations of the
> interface it provides for Quickstart. We're going to have a design session
> next week on Tuesday (13th) at 1pm UTC where we will try to come up with
> some ideas to improve this.
>
> Ian Main suggested to default devmode.sh to deploy a containerized system so
> that developers get more familiar with that. We agreed on this being a good
> idea and will follow it up with some changes.
>
> = RDO Cloud =
>
> The RDO cloud transition is continuing, however Paul requested that we don't
> add the new cloud to the tripleo queue upstream but rather use the
> rdoproject's own zuul and nodepool to be a bit more independent and run it
> like a third party CI system. This will require further cooperation with RDO
> Infra folks.
>
> Meanwhile Sagi is setting up the infrastructure needed on the RDO Cloud
> instance to run CI jobs.
>
> Thank you for reading the summary. Have a great weekend!

Thanks for the report, very useful as usual.

> Best regards,
> Attila
>
> [1] https://etherpad.openstack.org/p/tripleo-deploy-time-hack
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



-- 
Emilien Macchi

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [tripleo][ci] TripleO OVB check gates to move to third party

2017-06-13 Thread Ben Nemec




On 06/13/2017 12:28 PM, Paul Belanger wrote:

On Tue, Jun 13, 2017 at 11:12:08AM -0500, Ben Nemec wrote:



On 06/12/2017 06:19 PM, Ronelle Landy wrote:

Greetings,

TripleO OVB check gates are managed by upstream Zuul and executed on
nodes provided by test cloud RH1. RDO Cloud is now available as a test
cloud to be used when running CI jobs. To utilize to RDO Cloud, we could
either:

- continue to run from upstream Zuul (and spin up nodes to deploy
the overcloud from RDO Cloud)
- switch the TripleO OVB check gates to run as third party and
manage these jobs from the Zuul instance used by Software Factory

The openstack infra team advocates moving to third party.
The CI team is meeting with Frederic Lepied, Alan Pevec, and other
members of the Software Factory/RDO project infra tream to discuss how
this move could be managed.

Note: multinode jobs are not impacted - and will continue to run from
upstream Zuul on nodes provided by nodepool.

Since a move to third party could have significant impact, we are
posting this out to gather feedback and/or concerns that TripleO
developers may have.


I'm +1 on moving to third-party...eventually.  I don't think it should be
done at the same time as we move to a new cloud, which is a major change in
and of itself.  I suppose we could do the third-party transition in parallel
with the existing rh1 jobs, but as one of the people who will probably have
to debug problems in RDO cloud I'd rather keep the number of variables to a
minimum.  Once we're reasonably confident that RDO cloud is stable and
handling our workload well we can transition to third-party and deal with
the problems that will no doubt cause on their own.


This was a goal for tripleo-test-cloud-rh2, to move that to thirdparty CI,
ensure jobs work, then migrated. As you can see, we never actually did that.

My preference would be to make the move the thirdparty now, with
tripleo-test-cloud-rh1.  We now have all the pieces in place for RDO project to
support this and in parallel set up RDO cloud to run jobs from RDO.

If RDO stablility is a concern, the move to thirdparty first seems to make the
most sense. This avoid the need to bring RDO cloud online, ensure it works, then
move it again, and re-insure it works.

Again, the move can be made seemless by turning down some of the capacity in
nodepool.o.o and increase capacity in nodepool.rdoproject.org. And I am happy to
help work with RDO on making this happen.


I'm good with doing the third-party migration first too.  I'm only 
looking to avoid two concurrent major changes.


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [tripleo][ci] TripleO OVB check gates to move to third party

2017-06-13 Thread Paul Belanger

On Tue, Jun 13, 2017 at 11:12:08AM -0500, Ben Nemec wrote:
> 
> 
> On 06/12/2017 06:19 PM, Ronelle Landy wrote:
> > Greetings,
> > 
> > TripleO OVB check gates are managed by upstream Zuul and executed on
> > nodes provided by test cloud RH1. RDO Cloud is now available as a test
> > cloud to be used when running CI jobs. To utilize to RDO Cloud, we could
> > either:
> > 
> > - continue to run from upstream Zuul (and spin up nodes to deploy
> > the overcloud from RDO Cloud)
> > - switch the TripleO OVB check gates to run as third party and
> > manage these jobs from the Zuul instance used by Software Factory
> > 
> > The openstack infra team advocates moving to third party.
> > The CI team is meeting with Frederic Lepied, Alan Pevec, and other
> > members of the Software Factory/RDO project infra tream to discuss how
> > this move could be managed.
> > 
> > Note: multinode jobs are not impacted - and will continue to run from
> > upstream Zuul on nodes provided by nodepool.
> > 
> > Since a move to third party could have significant impact, we are
> > posting this out to gather feedback and/or concerns that TripleO
> > developers may have.
> 
> I'm +1 on moving to third-party...eventually.  I don't think it should be
> done at the same time as we move to a new cloud, which is a major change in
> and of itself.  I suppose we could do the third-party transition in parallel
> with the existing rh1 jobs, but as one of the people who will probably have
> to debug problems in RDO cloud I'd rather keep the number of variables to a
> minimum.  Once we're reasonably confident that RDO cloud is stable and
> handling our workload well we can transition to third-party and deal with
> the problems that will no doubt cause on their own.
> 
This was a goal for tripleo-test-cloud-rh2, to move that to thirdparty CI,
ensure jobs work, then migrated. As you can see, we never actually did that.

My preference would be to make the move the thirdparty now, with
tripleo-test-cloud-rh1.  We now have all the pieces in place for RDO project to
support this and in parallel set up RDO cloud to run jobs from RDO.

If RDO stablility is a concern, the move to thirdparty first seems to make the
most sense. This avoid the need to bring RDO cloud online, ensure it works, then
move it again, and re-insure it works.

Again, the move can be made seemless by turning down some of the capacity in
nodepool.o.o and increase capacity in nodepool.rdoproject.org. And I am happy to
help work with RDO on making this happen.

PB

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [tripleo][ci] TripleO OVB check gates to move to third party

2017-06-13 Thread Ben Nemec




On 06/12/2017 06:19 PM, Ronelle Landy wrote:

Greetings,

TripleO OVB check gates are managed by upstream Zuul and executed on
nodes provided by test cloud RH1. RDO Cloud is now available as a test
cloud to be used when running CI jobs. To utilize to RDO Cloud, we could
either:

- continue to run from upstream Zuul (and spin up nodes to deploy
the overcloud from RDO Cloud)
- switch the TripleO OVB check gates to run as third party and
manage these jobs from the Zuul instance used by Software Factory

The openstack infra team advocates moving to third party.
The CI team is meeting with Frederic Lepied, Alan Pevec, and other
members of the Software Factory/RDO project infra tream to discuss how
this move could be managed.

Note: multinode jobs are not impacted - and will continue to run from
upstream Zuul on nodes provided by nodepool.

Since a move to third party could have significant impact, we are
posting this out to gather feedback and/or concerns that TripleO
developers may have.


I'm +1 on moving to third-party...eventually.  I don't think it should 
be done at the same time as we move to a new cloud, which is a major 
change in and of itself.  I suppose we could do the third-party 
transition in parallel with the existing rh1 jobs, but as one of the 
people who will probably have to debug problems in RDO cloud I'd rather 
keep the number of variables to a minimum.  Once we're reasonably 
confident that RDO cloud is stable and handling our workload well we can 
transition to third-party and deal with the problems that will no doubt 
cause on their own.


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [tripleo][ci] tripleo periodic jobs moving to RDO's software factory and RDO Cloud

2017-06-13 Thread Javier Pena



- Original Message -
> On Mon, Jun 12, 2017 at 05:01:26PM -0400, Wesley Hayutin wrote:
> > Greetings,
> > 
> > I wanted to send out a summary email regarding some work that is still
> > developing and being planned to give interested parties time to comment and
> > prepare for change.
> > 
> > Project:
> > Move tripleo periodic promotion jobs
> > 
> > Goal:
> > Increase the cadence of tripleo-ci periodic promotion jobs in a way
> > that does not impact upstream OpenStack zuul queues and infrastructure.
> > 
> > Next Steps:
> > The dependencies in RDO's instance of software factory are now complete
> > and we should be able to create a new a net new zuul queue in RDO infra for
> > tripleo-periodic jobs.  These jobs will have to run both multinode nodepool
> > and ovb style jobs and utilize RDO-Cloud as the host cloud provider.  The
> > TripleO CI team is looking into moving the TripleO periodic jobs running
> > upstream to run from RDO's software factory instance. This move will allow
> > the CI team more flexibility in managing the periodic jobs and resources to
> > run the jobs more frequently.
> > 
> > TLDR:
> > There is no set date as to when the periodic jobs will move. The move
> > will depend on tenant resource allocation and how easily the periodic jobs
> > can be modified.  This email is to inform the group that changes are being
> > planned to the tripleo periodic workflow and allow time for comment and
> > preparation.
> > 
> > Completed Background Work:
> > After long discussion with Paul Belanger about increasing the cadence
> > of the promotion jobs [1]. Paul explained infa's position and if he doesn't
> > -1/-2 a new pipeline that has the same priority as check jobs someone else
> > will. To summarize the point, the new pipeline would compete and slow down
> > non-tripleo projects in the gate even when the hardware resources are our
> > own.
> > To avoid slowing down non-tripleo projects Paul has volunteered to help
> > setup the infrastructure in rdoproject to manage the queue ( zuul etc). We
> > would still use rh-openstack-1 / rdocloud for ovb, and could also trigger
> > multinode nodepool jobs.
> > There is one hitch though, currently, rdo-project does not have all the
> > pieces of the puzzle in place to move off of openstack zuul and onto
> > rdoproject zuul. Paul mentioned that nodepool-builder [2] is a hard
> > requirement to be setup in rdoproject before we can proceed here. He
> > mentioned working with the software factory guys to get this setup and
> > running.
> > At this time, I think this issue is blocked until further discussion.
> > [1] https://review.openstack.org/#/c/443964/
> > [2]
> > https://github.com/openstack-infra/nodepool/blob/master/nodepool/builder.py
> > 
> > Thanks
> 
> The first step is landing the nodepool elements in nodepool.rdoproject.org,
> and
> building a centos-7 DIB.  I believe number80 is currently working on this and
> hopefully that could be landed in the next day or so.  Once images have been
> built, it won't be much work to then run a job. RDO already has 3rdparty jobs
> running, we'd to the same with tripleo-ci.
> 

I'm familiar with the 3rd party CI setup in review.rdoproject.org, since I 
maintain it for the rpm-packaging project. Please feel free to ping me if you 
need any help with the setup.

Javier

> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [tripleo][ci] TripleO OVB check gates to move to third party

2017-06-13 Thread Juan Antonio Osorio

I would really appreciate that this is done once we finish moving the
TLS-everywhere job to run over oooq. This is on the works currently.

On Tue, Jun 13, 2017 at 2:19 AM, Ronelle Landy  wrote:

> Greetings,
>
> TripleO OVB check gates are managed by upstream Zuul and executed on nodes
> provided by test cloud RH1. RDO Cloud is now available as a test cloud to
> be used when running CI jobs. To utilize to RDO Cloud, we could either:
>
> - continue to run from upstream Zuul (and spin up nodes to deploy the
> overcloud from RDO Cloud)
> - switch the TripleO OVB check gates to run as third party and manage
> these jobs from the Zuul instance used by Software Factory
>
> The openstack infra team advocates moving to third party.
> The CI team is meeting with Frederic Lepied, Alan Pevec, and other members
> of the Software Factory/RDO project infra tream to discuss how this move
> could be managed.
>
> Note: multinode jobs are not impacted - and will continue to run from
> upstream Zuul on nodes provided by nodepool.
>
> Since a move to third party could have significant impact, we are posting
> this out to gather feedback and/or concerns that TripleO developers may
> have.
>
>
> Thanks!
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>


-- 
Juan Antonio Osorio R.
e-mail: jaosor...@gmail.com
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [tripleo][ci] tripleo periodic jobs moving to RDO's software factory and RDO Cloud

2017-06-13 Thread Juan Antonio Osorio

Currently the TLS-everywhere job (fakeha-caserver) runs as a periodic job.
If there's gonna be a move. I would really appreciate that this is done
once we move that job to run over oooq. So don't loose that job.

On Tue, Jun 13, 2017 at 12:01 AM, Wesley Hayutin 
wrote:

> Greetings,
>
> I wanted to send out a summary email regarding some work that is still
> developing and being planned to give interested parties time to comment and
> prepare for change.
>
> Project:
> Move tripleo periodic promotion jobs
>
> Goal:
> Increase the cadence of tripleo-ci periodic promotion jobs in a way
> that does not impact upstream OpenStack zuul queues and infrastructure.
>
> Next Steps:
> The dependencies in RDO's instance of software factory are now
> complete and we should be able to create a new a net new zuul queue in RDO
> infra for tripleo-periodic jobs.  These jobs will have to run both
> multinode nodepool and ovb style jobs and utilize RDO-Cloud as the host
> cloud provider.  The TripleO CI team is looking into moving the TripleO
> periodic jobs running upstream to run from RDO's software factory instance.
> This move will allow the CI team more flexibility in managing the periodic
> jobs and resources to run the jobs more frequently.
>
> TLDR:
> There is no set date as to when the periodic jobs will move. The move
> will depend on tenant resource allocation and how easily the periodic jobs
> can be modified.  This email is to inform the group that changes are being
> planned to the tripleo periodic workflow and allow time for comment and
> preparation.
>
> Completed Background Work:
> After long discussion with Paul Belanger about increasing the cadence
> of the promotion jobs [1]. Paul explained infa's position and if he doesn't
> -1/-2 a new pipeline that has the same priority as check jobs someone else
> will. To summarize the point, the new pipeline would compete and slow down
> non-tripleo projects in the gate even when the hardware resources are our
> own.
> To avoid slowing down non-tripleo projects Paul has volunteered to help
> setup the infrastructure in rdoproject to manage the queue ( zuul etc). We
> would still use rh-openstack-1 / rdocloud for ovb, and could also trigger
> multinode nodepool jobs.
> There is one hitch though, currently, rdo-project does not have all the
> pieces of the puzzle in place to move off of openstack zuul and onto
> rdoproject zuul. Paul mentioned that nodepool-builder [2] is a hard
> requirement to be setup in rdoproject before we can proceed here. He
> mentioned working with the software factory guys to get this setup and
> running.
> At this time, I think this issue is blocked until further discussion.
> [1] https://review.openstack.org/#/c/443964/
> [2] https://github.com/openstack-infra/nodepool/blob/master/
> nodepool/builder.py
>
> Thanks
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>


-- 
Juan Antonio Osorio R.
e-mail: jaosor...@gmail.com
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [tripleo][ci] tripleo periodic jobs moving to RDO's software factory and RDO Cloud

2017-06-12 Thread Paul Belanger

On Mon, Jun 12, 2017 at 05:01:26PM -0400, Wesley Hayutin wrote:
> Greetings,
> 
> I wanted to send out a summary email regarding some work that is still
> developing and being planned to give interested parties time to comment and
> prepare for change.
> 
> Project:
> Move tripleo periodic promotion jobs
> 
> Goal:
> Increase the cadence of tripleo-ci periodic promotion jobs in a way
> that does not impact upstream OpenStack zuul queues and infrastructure.
> 
> Next Steps:
> The dependencies in RDO's instance of software factory are now complete
> and we should be able to create a new a net new zuul queue in RDO infra for
> tripleo-periodic jobs.  These jobs will have to run both multinode nodepool
> and ovb style jobs and utilize RDO-Cloud as the host cloud provider.  The
> TripleO CI team is looking into moving the TripleO periodic jobs running
> upstream to run from RDO's software factory instance. This move will allow
> the CI team more flexibility in managing the periodic jobs and resources to
> run the jobs more frequently.
> 
> TLDR:
> There is no set date as to when the periodic jobs will move. The move
> will depend on tenant resource allocation and how easily the periodic jobs
> can be modified.  This email is to inform the group that changes are being
> planned to the tripleo periodic workflow and allow time for comment and
> preparation.
> 
> Completed Background Work:
> After long discussion with Paul Belanger about increasing the cadence
> of the promotion jobs [1]. Paul explained infa's position and if he doesn't
> -1/-2 a new pipeline that has the same priority as check jobs someone else
> will. To summarize the point, the new pipeline would compete and slow down
> non-tripleo projects in the gate even when the hardware resources are our
> own.
> To avoid slowing down non-tripleo projects Paul has volunteered to help
> setup the infrastructure in rdoproject to manage the queue ( zuul etc). We
> would still use rh-openstack-1 / rdocloud for ovb, and could also trigger
> multinode nodepool jobs.
> There is one hitch though, currently, rdo-project does not have all the
> pieces of the puzzle in place to move off of openstack zuul and onto
> rdoproject zuul. Paul mentioned that nodepool-builder [2] is a hard
> requirement to be setup in rdoproject before we can proceed here. He
> mentioned working with the software factory guys to get this setup and
> running.
> At this time, I think this issue is blocked until further discussion.
> [1] https://review.openstack.org/#/c/443964/
> [2]
> https://github.com/openstack-infra/nodepool/blob/master/nodepool/builder.py
> 
> Thanks

The first step is landing the nodepool elements in nodepool.rdoproject.org, and
building a centos-7 DIB.  I believe number80 is currently working on this and
hopefully that could be landed in the next day or so.  Once images have been
built, it won't be much work to then run a job. RDO already has 3rdparty jobs
running, we'd to the same with tripleo-ci.


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [tripleo] [ci] Adding idempotency job on overcloud deployment.

2017-06-08 Thread Ben Nemec




On 06/08/2017 10:16 AM, Emilien Macchi wrote:

On Thu, Jun 8, 2017 at 1:47 PM, Sofer Athlan-Guyot  wrote:

Hi,

Alex Schultz  writes:


On Wed, Jun 7, 2017 at 5:20 AM, Sofer Athlan-Guyot  wrote:

Hi,

Emilien Macchi  writes:


On Wed, Jun 7, 2017 at 12:45 PM, Sofer Athlan-Guyot  wrote:

Hi,

I don't think we have such a job in place.  Basically that would check
that re-running the "openstack deploy ..." command won't do anything.


I've had a look at openstack-infra/tripleo-ci.  Should I test it in with
ovb/quickstart or tripleo.sh.  Both way are fine by me, but I may be
lacking context about which one is more relevant.


We had such an error by the past[1], but I'm not sure this has been
captured by an associated job.

WDYT ?


It would be interesting to measure how much time does it take to run
it again.


Could you point out how such an experiment could be done ?


If it's short, we could add it to all our scenarios + ovb
jobs.  If it's long, maybe we need an additional job, but it would
take more resources, so maybe we could run it in periodic pipeline
(note that periodic jobs are not optimal since we could break
something quite easily).


Just adding as context that the issue was already raised[1].  Beside
time constraint, it was pointed out that we would also need to parse the
log to find out if anything was restarted.  But it could be a second
step.  For parsing, this code was pointed out[2].



There's a few things that would need to be enabled in order to reuse
some of this work.  We'll need to add the ability to generate a report
on the puppet run[0]. And then we'll need to be able to capture it[1]
somewhere that we could then use that parsing code on.  From there,
just rerunning the installation would be a simple start to the
idempotency check.  In fuel, we had hacked in a special flag[2] that
we used in testing to actually rerun the task immediately to find when
a specific task was not idempotent in addition to also rerunning the
entire deployment. For tripleo a similar concept would be to rerun the
steps twice but that's usually not where the issues crop us for us. So
rerunning the entire installation deployment would be better as we
tend to have issues with configuration items between steps
conflicting.


Maybe we could go with something equivalent to:

  ts="$(date '+%F %T')"
  ... re-run deploy command ...

  sudo journalctl --since="${ts}" | egrep 'Stopping|Starting' | grep -v 
'user.*slice' > restarted.log
  wc -l restarted.log

This should be 0 on every overcloud nodes.

This is simpler to implement and should catch any unwanted service
restart.

WDYT ?


It's smart, for services. It doesn't cover configuration files changes
and other resources managed by Puppet, like Keystone resources, etc.
But it's an excellent start to me.


I just want to point out that the updates job is already doing this when 
it runs in every repo except tripleo-heat-templates (that's the only 
package we actually update in the updates job, every other project is a 
noop).  I can also tell you how long it takes to redo a deployment with 
no changes: just under 2000 seconds, or around 33 minutes.  At least 
that's the current average in tripleo-ci right now (although I see we 
just added around 100 seconds to the update time in the last day or two. 
*sigh*).






Thanks,
-Alex

[0] https://review.openstack.org/#/c/273740/4/mcagents/puppetd.rb@204
[1] https://review.openstack.org/#/c/273740/4/mcagents/puppetd.rb@102
[2] https://review.openstack.org/#/c/273737/


[1] http://lists.openstack.org/pipermail/openstack-dev/2017-March/114836.html
[2] 
https://review.openstack.org/#/c/279271/9/fuelweb_test/helpers/astute_log_parser.py@212




[1] https://bugs.launchpad.net/tripleo/+bug/1664650


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [tripleo] [ci] Adding idempotency job on overcloud deployment.

2017-06-08 Thread Emilien Macchi

On Thu, Jun 8, 2017 at 1:47 PM, Sofer Athlan-Guyot  wrote:
> Hi,
>
> Alex Schultz  writes:
>
>> On Wed, Jun 7, 2017 at 5:20 AM, Sofer Athlan-Guyot  
>> wrote:
>>> Hi,
>>>
>>> Emilien Macchi  writes:
>>>
 On Wed, Jun 7, 2017 at 12:45 PM, Sofer Athlan-Guyot  
 wrote:
> Hi,
>
> I don't think we have such a job in place.  Basically that would check
> that re-running the "openstack deploy ..." command won't do anything.
>
> I've had a look at openstack-infra/tripleo-ci.  Should I test it in with
> ovb/quickstart or tripleo.sh.  Both way are fine by me, but I may be
> lacking context about which one is more relevant.
>
> We had such an error by the past[1], but I'm not sure this has been
> captured by an associated job.
>
> WDYT ?

 It would be interesting to measure how much time does it take to run
 it again.
>>>
>>> Could you point out how such an experiment could be done ?
>>>
 If it's short, we could add it to all our scenarios + ovb
 jobs.  If it's long, maybe we need an additional job, but it would
 take more resources, so maybe we could run it in periodic pipeline
 (note that periodic jobs are not optimal since we could break
 something quite easily).
>>>
>>> Just adding as context that the issue was already raised[1].  Beside
>>> time constraint, it was pointed out that we would also need to parse the
>>> log to find out if anything was restarted.  But it could be a second
>>> step.  For parsing, this code was pointed out[2].
>>>
>>
>> There's a few things that would need to be enabled in order to reuse
>> some of this work.  We'll need to add the ability to generate a report
>> on the puppet run[0]. And then we'll need to be able to capture it[1]
>> somewhere that we could then use that parsing code on.  From there,
>> just rerunning the installation would be a simple start to the
>> idempotency check.  In fuel, we had hacked in a special flag[2] that
>> we used in testing to actually rerun the task immediately to find when
>> a specific task was not idempotent in addition to also rerunning the
>> entire deployment. For tripleo a similar concept would be to rerun the
>> steps twice but that's usually not where the issues crop us for us. So
>> rerunning the entire installation deployment would be better as we
>> tend to have issues with configuration items between steps
>> conflicting.
>
> Maybe we could go with something equivalent to:
>
>   ts="$(date '+%F %T')"
>   ... re-run deploy command ...
>
>   sudo journalctl --since="${ts}" | egrep 'Stopping|Starting' | grep -v 
> 'user.*slice' > restarted.log
>   wc -l restarted.log
>
> This should be 0 on every overcloud nodes.
>
> This is simpler to implement and should catch any unwanted service
> restart.
>
> WDYT ?

It's smart, for services. It doesn't cover configuration files changes
and other resources managed by Puppet, like Keystone resources, etc.
But it's an excellent start to me.

>>
>> Thanks,
>> -Alex
>>
>> [0] https://review.openstack.org/#/c/273740/4/mcagents/puppetd.rb@204
>> [1] https://review.openstack.org/#/c/273740/4/mcagents/puppetd.rb@102
>> [2] https://review.openstack.org/#/c/273737/
>>
>>> [1] 
>>> http://lists.openstack.org/pipermail/openstack-dev/2017-March/114836.html
>>> [2] 
>>> https://review.openstack.org/#/c/279271/9/fuelweb_test/helpers/astute_log_parser.py@212
>>>

> [1] https://bugs.launchpad.net/tripleo/+bug/1664650
> --
> Sofer Athlan-Guyot
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



 --
 Emilien Macchi

 __
 OpenStack Development Mailing List (not for usage questions)
 Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>> --
>>> Sofer Athlan-Guyot
>>>
>>> __
>>> OpenStack Development Mailing List (not for usage questions)
>>> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>> __
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
> Thanks,
> --
> Sofer Athlan-Guyot
>
> __
> OpenStack Development

Re: [openstack-dev] [tripleo] [ci] Adding idempotency job on overcloud deployment.

2017-06-08 Thread Sofer Athlan-Guyot

Hi,

Alex Schultz  writes:

> On Wed, Jun 7, 2017 at 5:20 AM, Sofer Athlan-Guyot  
> wrote:
>> Hi,
>>
>> Emilien Macchi  writes:
>>
>>> On Wed, Jun 7, 2017 at 12:45 PM, Sofer Athlan-Guyot  
>>> wrote:
 Hi,

 I don't think we have such a job in place.  Basically that would check
 that re-running the "openstack deploy ..." command won't do anything.

I've had a look at openstack-infra/tripleo-ci.  Should I test it in with
ovb/quickstart or tripleo.sh.  Both way are fine by me, but I may be
lacking context about which one is more relevant.

 We had such an error by the past[1], but I'm not sure this has been
 captured by an associated job.

 WDYT ?
>>>
>>> It would be interesting to measure how much time does it take to run
>>> it again.
>>
>> Could you point out how such an experiment could be done ?
>>
>>> If it's short, we could add it to all our scenarios + ovb
>>> jobs.  If it's long, maybe we need an additional job, but it would
>>> take more resources, so maybe we could run it in periodic pipeline
>>> (note that periodic jobs are not optimal since we could break
>>> something quite easily).
>>
>> Just adding as context that the issue was already raised[1].  Beside
>> time constraint, it was pointed out that we would also need to parse the
>> log to find out if anything was restarted.  But it could be a second
>> step.  For parsing, this code was pointed out[2].
>>
>
> There's a few things that would need to be enabled in order to reuse
> some of this work.  We'll need to add the ability to generate a report
> on the puppet run[0]. And then we'll need to be able to capture it[1]
> somewhere that we could then use that parsing code on.  From there,
> just rerunning the installation would be a simple start to the
> idempotency check.  In fuel, we had hacked in a special flag[2] that
> we used in testing to actually rerun the task immediately to find when
> a specific task was not idempotent in addition to also rerunning the
> entire deployment. For tripleo a similar concept would be to rerun the
> steps twice but that's usually not where the issues crop us for us. So
> rerunning the entire installation deployment would be better as we
> tend to have issues with configuration items between steps
> conflicting.

Maybe we could go with something equivalent to:

  ts="$(date '+%F %T')"
  ... re-run deploy command ...
  
  sudo journalctl --since="${ts}" | egrep 'Stopping|Starting' | grep -v 
'user.*slice' > restarted.log
  wc -l restarted.log

This should be 0 on every overcloud nodes.

This is simpler to implement and should catch any unwanted service
restart.

WDYT ?

>
> Thanks,
> -Alex
>
> [0] https://review.openstack.org/#/c/273740/4/mcagents/puppetd.rb@204
> [1] https://review.openstack.org/#/c/273740/4/mcagents/puppetd.rb@102
> [2] https://review.openstack.org/#/c/273737/
>
>> [1] http://lists.openstack.org/pipermail/openstack-dev/2017-March/114836.html
>> [2] 
>> https://review.openstack.org/#/c/279271/9/fuelweb_test/helpers/astute_log_parser.py@212
>>
>>>
 [1] https://bugs.launchpad.net/tripleo/+bug/1664650
 --
 Sofer Athlan-Guyot

 __
 OpenStack Development Mailing List (not for usage questions)
 Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>>
>>>
>>>
>>> --
>>> Emilien Macchi
>>>
>>> __
>>> OpenStack Development Mailing List (not for usage questions)
>>> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>> --
>> Sofer Athlan-Guyot
>>
>> __
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Thanks,
-- 
Sofer Athlan-Guyot

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [tripleo] [ci] Adding idempotency job on overcloud deployment.

2017-06-07 Thread Alex Schultz

On Wed, Jun 7, 2017 at 5:20 AM, Sofer Athlan-Guyot  wrote:
> Hi,
>
> Emilien Macchi  writes:
>
>> On Wed, Jun 7, 2017 at 12:45 PM, Sofer Athlan-Guyot  
>> wrote:
>>> Hi,
>>>
>>> I don't think we have such a job in place.  Basically that would check
>>> that re-running the "openstack deploy ..." command won't do anything.
>>>
>>> We had such an error by the past[1], but I'm not sure this has been
>>> captured by an associated job.
>>>
>>> WDYT ?
>>
>> It would be interesting to measure how much time does it take to run
>> it again.
>
> Could you point out how such an experiment could be done ?
>
>> If it's short, we could add it to all our scenarios + ovb
>> jobs.  If it's long, maybe we need an additional job, but it would
>> take more resources, so maybe we could run it in periodic pipeline
>> (note that periodic jobs are not optimal since we could break
>> something quite easily).
>
> Just adding as context that the issue was already raised[1].  Beside
> time constraint, it was pointed out that we would also need to parse the
> log to find out if anything was restarted.  But it could be a second
> step.  For parsing, this code was pointed out[2].
>

There's a few things that would need to be enabled in order to reuse
some of this work.  We'll need to add the ability to generate a report
on the puppet run[0]. And then we'll need to be able to capture it[1]
somewhere that we could then use that parsing code on.  From there,
just rerunning the installation would be a simple start to the
idempotency check.  In fuel, we had hacked in a special flag[2] that
we used in testing to actually rerun the task immediately to find when
a specific task was not idempotent in addition to also rerunning the
entire deployment. For tripleo a similar concept would be to rerun the
steps twice but that's usually not where the issues crop us for us. So
rerunning the entire installation deployment would be better as we
tend to have issues with configuration items between steps
conflicting.

Thanks,
-Alex

[0] https://review.openstack.org/#/c/273740/4/mcagents/puppetd.rb@204
[1] https://review.openstack.org/#/c/273740/4/mcagents/puppetd.rb@102
[2] https://review.openstack.org/#/c/273737/

> [1] http://lists.openstack.org/pipermail/openstack-dev/2017-March/114836.html
> [2] 
> https://review.openstack.org/#/c/279271/9/fuelweb_test/helpers/astute_log_parser.py@212
>
>>
>>> [1] https://bugs.launchpad.net/tripleo/+bug/1664650
>>> --
>>> Sofer Athlan-Guyot
>>>
>>> __
>>> OpenStack Development Mailing List (not for usage questions)
>>> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>>
>>
>> --
>> Emilien Macchi
>>
>> __
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> --
> Sofer Athlan-Guyot
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [tripleo] [ci] Adding idempotency job on overcloud deployment.

2017-06-07 Thread Emilien Macchi

On Wed, Jun 7, 2017 at 1:20 PM, Sofer Athlan-Guyot  wrote:
> Hi,
>
> Emilien Macchi  writes:
>
>> On Wed, Jun 7, 2017 at 12:45 PM, Sofer Athlan-Guyot  
>> wrote:
>>> Hi,
>>>
>>> I don't think we have such a job in place.  Basically that would check
>>> that re-running the "openstack deploy ..." command won't do anything.
>>>
>>> We had such an error by the past[1], but I'm not sure this has been
>>> captured by an associated job.
>>>
>>> WDYT ?
>>
>> It would be interesting to measure how much time does it take to run
>> it again.
>
> Could you point out how such an experiment could be done ?

in openstack-infra/tripleo-ci, you would run another time the
overcloud deploy command. That's it.

>> If it's short, we could add it to all our scenarios + ovb
>> jobs.  If it's long, maybe we need an additional job, but it would
>> take more resources, so maybe we could run it in periodic pipeline
>> (note that periodic jobs are not optimal since we could break
>> something quite easily).
>
> Just adding as context that the issue was already raised[1].  Beside
> time constraint, it was pointed out that we would also need to parse the
> log to find out if anything was restarted.  But it could be a second
> step.  For parsing, this code was pointed out[2].
>
> [1] http://lists.openstack.org/pipermail/openstack-dev/2017-March/114836.html
> [2] 
> https://review.openstack.org/#/c/279271/9/fuelweb_test/helpers/astute_log_parser.py@212
>
>>
>>> [1] https://bugs.launchpad.net/tripleo/+bug/1664650
>>> --
>>> Sofer Athlan-Guyot
>>>
>>> __
>>> OpenStack Development Mailing List (not for usage questions)
>>> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>>
>>
>> --
>> Emilien Macchi
>>
>> __
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> --
> Sofer Athlan-Guyot
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



-- 
Emilien Macchi

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

1 2 3 >

1 - 100 of 253 matches

Mail list logo