Re: [openstack-dev] [tripleo] Contributing to TripleO is challenging

2016-03-21 Thread Steven Hardy
On Mon, Mar 21, 2016 at 10:19:47AM -0400, Emilien Macchi wrote:
> On Mon, Mar 21, 2016 at 9:59 AM, Steven Hardy  wrote:
> > On Mon, Mar 21, 2016 at 09:41:42AM -0400, Emilien Macchi wrote:
> >> On Mon, Mar 21, 2016 at 6:57 AM, Steven Hardy  wrote:
> >> > On Fri, Mar 18, 2016 at 01:27:33PM +, arkady_kanev...@dell.com wrote:
> >> >>Emilien,
> >> >>
> >> >>Agree on the rant. But not clear on concrete proposal to fix it.
> >> >>
> >> >>Spend more time “fixing” CI and use Tempest as a gate is a bit vague.
> >> >>
> >> >>Unless we test known working version of each project in TripleO CI you are
> >> >>dependent on health of other components.
> >> >
> >> > I've so far resisted replying to this thread, because while valid, many of
> >> > the concerns expressed by Emilien are quite general complaints, and it's
> >> > hard to reply with specific solutions.
> >> >
> >> > However work *is* going on to improve many of these problems, let's see if
> >> > I can provide a summary, to clarify the various "concrete proposals" which
> >> > do exist.
> >> >
> >> > 1. Core team & review velocity
> >> >
> >> > We've had a small and very overloaded core team for a while now, and this
> >> > will be helped by expanding our community to include those who've been
> >> > regularly contributing excellent work and reviews as core reviewers:
> >> >
> >> > http://lists.openstack.org/pipermail/openstack-dev/2016-February/087774.html
> >> > http://lists.openstack.org/pipermail/openstack-dev/2016-March/089235.html
> >> > http://lists.openstack.org/pipermail/openstack-dev/2016-March/089912.html
> >> > http://lists.openstack.org/pipermail/openstack-dev/2016-March/089913.html
> >> >
> >> > Note that I personally think it's absolutely fine for folks to be more
> >> > expert in some subsystem and to focus extra review attention on e.g. API,
> >> > UI, Puppet or whatever.  This subsystem-core model has been well proven in
> >> > other projects, and folks will naturally broaden their areas of deeper
> >> > knowledge over time.
> >> >
> >> > Related to this is movement of code, such as the puppet-tripleo refactoring
> >> > mentioned by Michael - this has already started, and will help with
> >> > providing a cleaner interface between the puppet and heat pieces (which
> >> > will also help focus reviewer attention appropriately).
> >>
> >> Indeed, Michael, Dan & I are working on moving out the Puppet code
> >> from THT to puppet-tripleo.
> >> That's a nice move, and I appreciate TripleO team support on it.
> >>
> >> > 2. Day 1 developer experience
> >> >
> >> > This is closely related to the CI failure rate - there are efforts to
> >> > integrate with the RDO tripleo-quickstart tooling, which simplifies the
> >> > initial undercloud setup, and potentially makes consuming pre-built,
> >> > validated undercloud images (probably output artefacts from our new
> >> > periodic CI job) much easier.
> >> >
> >> > So, this will mean that both developers and CI can potentially be less
> >> > regularly impacted by trunk regressions which often cause CI to fail, and
> >> > break developer environments.
> >> >
> >> > https://review.openstack.org/#/c/276810/5
> >> >
> >> > 3. CI coverage and trunk failure rate
> >> >
> >> > We've been working really hard to improve things here, which are really
> >> > several inter-related issues:
> >> >
> >> > - Lack of Hardware capacity in the tripleo CI cloud
> >> > - Frequent trunk regressions breaking our CI
> >> > - Lack of coverage of some key features (network isolation, SSL, IPv6, upgrades)
> >> > - Lack of coverage for vendor plugin templates/puppet code
> >> >
> >> > There's work ongoing to improve this from multiple perspectives:
> >> >
> >> > New periodic CI job (to be used for automated promotion of the
> >> > current-tripleo repo, and for pre-built undercloud images):
> >> > https://review.openstack.org/#/c/271370/
> >> >
> >> > Add network isolation support to CI:
> >> > https://review.openstack.org/#/c/288163/
> >> >
> >> > Test SSL enabled in overcloud:
> >> > https://review.openstack.org/#/c/281988/
> >> >
> >> > CI coverage of IPv6:
> >> > https://review.openstack.org/#/c/289445/
> >> >
> >> > Discussion around better documented integration for third-party CI:
> >> > http://lists.openstack.org/pipermail/openstack-dev/2016-March/088972.html
> >>
> >> Do we have plans to execute Tempest?
> >
> > This is something which has been discussed several times, but right now we
> > don't have the time available to run it per-commit because we'll hit the
> > job timeout.
> >
> > This situation will improve as we gain time e.g through use of cached
> > pre-built images, but right now I think we could look at enabling it only
> > on the periodic job when that is fully proven.
> >
> > Having said that, I should point out that tempest doesn't get us great
> > coverage of some newer projects - e.g all Heat 

Re: [openstack-dev] [tripleo] Contributing to TripleO is challenging

2016-03-21 Thread Emilien Macchi
On Mon, Mar 21, 2016 at 9:59 AM, Steven Hardy  wrote:
> On Mon, Mar 21, 2016 at 09:41:42AM -0400, Emilien Macchi wrote:
>> On Mon, Mar 21, 2016 at 6:57 AM, Steven Hardy  wrote:
>> > On Fri, Mar 18, 2016 at 01:27:33PM +, arkady_kanev...@dell.com wrote:
>> >>Emilien,
>> >>
>> >>Agree on the rant. But not clear on concrete proposal to fix it.
>> >>
>> >>Spend more time “fixing” CI and use Tempest as a gate is a bit vague.
>> >>
>> >>Unless we test known working version of each project in TripleO CI you are
>> >>dependent on health of other components.
>> >
>> > I've so far resisted replying to this thread, because while valid, many of
>> > the concerns expressed by Emilien are quite general complaints, and it's
>> > hard to reply with specific solutions.
>> >
>> > However work *is* going on to improve many of these problems, let's see if
>> > I can provide a summary, to clarify the various "concrete proposals" which
>> > do exist.
>> >
>> > 1. Core team & review velocity
>> >
>> > We've had a small and very overloaded core team for a while now, and this
>> > will be helped by expanding our community to include those who've been
>> > regularly contributing excellent work and reviews as core reviewers:
>> >
>> > http://lists.openstack.org/pipermail/openstack-dev/2016-February/087774.html
>> > http://lists.openstack.org/pipermail/openstack-dev/2016-March/089235.html
>> > http://lists.openstack.org/pipermail/openstack-dev/2016-March/089912.html
>> > http://lists.openstack.org/pipermail/openstack-dev/2016-March/089913.html
>> >
>> > Note that I personally think it's absolutely fine for folks to be more
>> > expert in some subsystem and to focus extra review attention on e.g. API,
>> > UI, Puppet or whatever.  This subsystem-core model has been well proven in
>> > other projects, and folks will naturally broaden their areas of deeper
>> > knowledge over time.
>> >
>> > Related to this is movement of code, such as the puppet-tripleo refactoring
>> > mentioned by Michael - this has already started, and will help with
>> > providing a cleaner interface between the puppet and heat pieces (which
>> > will also help focus reviewer attention appropriately).
>>
>> Indeed, Michael, Dan & I are working on moving out the Puppet code
>> from THT to puppet-tripleo.
>> That's a nice move, and I appreciate TripleO team support on it.
>>
>> > 2. Day 1 developer experience
>> >
>> > This is closely related to the CI failure rate - there are efforts to
>> > integrate with the RDO tripleo-quickstart tooling, which simplifies the
>> > initial undercloud setup, and potentially makes consuming pre-built,
>> > validated undercloud images (probably output artefacts from our new
>> > periodic CI job) much easier.
>> >
>> > So, this will mean that both developers and CI can potentially be less
>> > regularly impacted by trunk regressions which often cause CI to fail, and
>> > break developer environments.
>> >
>> > https://review.openstack.org/#/c/276810/5
>> >
>> > 3. CI coverage and trunk failure rate
>> >
>> > We've been working really hard to improve things here, which are really
>> > several inter-related issues:
>> >
>> > - Lack of Hardware capacity in the tripleo CI cloud
>> > - Frequent trunk regressions breaking our CI
>> > - Lack of coverage of some key features (network isolation, SSL, IPv6, upgrades)
>> > - Lack of coverage for vendor plugin templates/puppet code
>> >
>> > There's work ongoing to improve this from multiple perspectives:
>> >
>> > New periodic CI job (to be used for automated promotion of the
>> > current-tripleo repo, and for pre-built undercloud images):
>> > https://review.openstack.org/#/c/271370/
>> >
>> > Add network isolation support to CI:
>> > https://review.openstack.org/#/c/288163/
>> >
>> > Test SSL enabled in overcloud:
>> > https://review.openstack.org/#/c/281988/
>> >
>> > CI coverage of IPv6:
>> > https://review.openstack.org/#/c/289445/
>> >
>> > Discussion around better documented integration for third-party CI:
>> > http://lists.openstack.org/pipermail/openstack-dev/2016-March/088972.html
>>
>> Do we have plans to execute Tempest?
>
> This is something which has been discussed several times, but right now we
> don't have the time available to run it per-commit because we'll hit the
> job timeout.
>
> This situation will improve as we gain time e.g through use of cached
> pre-built images, but right now I think we could look at enabling it only
> on the periodic job when that is fully proven.
>
> Having said that, I should point out that tempest doesn't get us great
> coverage of some newer projects - e.g all Heat scenario coverage was moved
> out of the tempest tree, and other projects have done similar AFAIK, so we
> may end up with very sparse API surface tests (or nothing at all) in these cases.

In Puppet OpenStack CI, we execute smoke tests (a few tests of each
service, and some scenarios), and some tests not in 

Re: [openstack-dev] [tripleo] Contributing to TripleO is challenging

2016-03-21 Thread Steven Hardy
On Mon, Mar 21, 2016 at 09:41:42AM -0400, Emilien Macchi wrote:
> On Mon, Mar 21, 2016 at 6:57 AM, Steven Hardy  wrote:
> > On Fri, Mar 18, 2016 at 01:27:33PM +, arkady_kanev...@dell.com wrote:
> >>Emilien,
> >>
> >>Agree on the rant. But not clear on concrete proposal to fix it.
> >>
> >>Spend more time “fixing” CI and use Tempest as a gate is a bit vague.
> >>
> >>Unless we test known working version of each project in TripleO CI you are
> >>dependent on health of other components.
> >
> > I've so far resisted replying to this thread, because while valid, many of
> > the concerns expressed by Emilien are quite general complaints, and it's
> > hard to reply with specific solutions.
> >
> > However work *is* going on to improve many of these problems, let's see if
> > I can provide a summary, to clarify the various "concrete proposals" which
> > do exist.
> >
> > 1. Core team & review velocity
> >
> > We've had a small and very overloaded core team for a while now, and this
> > will be helped by expanding our community to include those who've been
> > regularly contributing excellent work and reviews as core reviewers:
> >
> > http://lists.openstack.org/pipermail/openstack-dev/2016-February/087774.html
> > http://lists.openstack.org/pipermail/openstack-dev/2016-March/089235.html
> > http://lists.openstack.org/pipermail/openstack-dev/2016-March/089912.html
> > http://lists.openstack.org/pipermail/openstack-dev/2016-March/089913.html
> >
> > Note that I personally think it's absolutely fine for folks to be more
> > expert in some subsystem and to focus extra review attention on e.g. API,
> > UI, Puppet or whatever.  This subsystem-core model has been well proven in
> > other projects, and folks will naturally broaden their areas of deeper
> > knowledge over time.
> >
> > Related to this is movement of code, such as the puppet-tripleo refactoring
> > mentioned by Michael - this has already started, and will help with
> > providing a cleaner interface between the puppet and heat pieces (which
> > will also help focus reviewer attention appropriately).
> 
> Indeed, Michael, Dan & I are working on moving out the Puppet code
> from THT to puppet-tripleo.
> That's a nice move, and I appreciate TripleO team support on it.
> 
> > 2. Day 1 developer experience
> >
> > This is closely related to the CI failure rate - there are efforts to
> > integrate with the RDO tripleo-quickstart tooling, which simplifies the
> > initial undercloud setup, and potentially makes consuming pre-built,
> > validated undercloud images (probably output artefacts from our new
> > periodic CI job) much easier.
> >
> > So, this will mean that both developers and CI can potentially be less
> > regularly impacted by trunk regressions which often cause CI to fail, and
> > break developer environments.
> >
> > https://review.openstack.org/#/c/276810/5
> >
> > 3. CI coverage and trunk failure rate
> >
> > We've been working really hard to improve things here, which are really
> > several inter-related issues:
> >
> > - Lack of Hardware capacity in the tripleo CI cloud
> > - Frequent trunk regressions breaking our CI
> > - Lack of coverage of some key features (network isolation, SSL, IPv6, upgrades)
> > - Lack of coverage for vendor plugin templates/puppet code
> >
> > There's work ongoing to improve this from multiple perspectives:
> >
> > New periodic CI job (to be used for automated promotion of the
> > current-tripleo repo, and for pre-built undercloud images):
> > https://review.openstack.org/#/c/271370/
> >
> > Add network isolation support to CI:
> > https://review.openstack.org/#/c/288163/
> >
> > Test SSL enabled in overcloud:
> > https://review.openstack.org/#/c/281988/
> >
> > CI coverage of IPv6:
> > https://review.openstack.org/#/c/289445/
> >
> > Discussion around better documented integration for third-party CI:
> > http://lists.openstack.org/pipermail/openstack-dev/2016-March/088972.html
> 
> Do we have plans to execute Tempest?

This is something which has been discussed several times, but right now we
don't have the time available to run it per-commit because we'll hit the
job timeout.

This situation will improve as we gain time e.g through use of cached
pre-built images, but right now I think we could look at enabling it only
on the periodic job when that is fully proven.

Having said that, I should point out that tempest doesn't get us great
coverage of some newer projects - e.g all Heat scenario coverage was moved
out of the tempest tree, and other projects have done similar AFAIK, so we
may end up with very sparse API surface tests (or nothing at all) in these cases.

There is probably more we can do within the existing pingtest though: it
creates the instance inside a heat stack, so we could just add a bunch more
resources for all the overcloud services, and pretty quickly prove that the
deployed services are at least running (without extending the runtime much).

Steve

Re: [openstack-dev] [tripleo] Contributing to TripleO is challenging

2016-03-21 Thread Emilien Macchi
On Mon, Mar 21, 2016 at 6:57 AM, Steven Hardy  wrote:
> On Fri, Mar 18, 2016 at 01:27:33PM +, arkady_kanev...@dell.com wrote:
>>Emilien,
>>
>>Agree on the rant. But not clear on concrete proposal to fix it.
>>
>>Spend more time “fixing” CI and use Tempest as a gate is a bit vague.
>>
>>Unless we test known working version of each project in TripleO CI you are
>>dependent on health of other components.
>
> I've so far resisted replying to this thread, because while valid, many of
> the concerns expressed by Emilien are quite general complaints, and it's
> hard to reply with specific solutions.
>
> However work *is* going on to improve many of these problems, let's see if
> I can provide a summary, to clarify the various "concrete proposals" which
> do exist.
>
> 1. Core team & review velocity
>
> We've had a small and very overloaded core team for a while now, and this
> will be helped by expanding our community to include those who've been
> regularly contributing excellent work and reviews as core reviewers:
>
> http://lists.openstack.org/pipermail/openstack-dev/2016-February/087774.html
> http://lists.openstack.org/pipermail/openstack-dev/2016-March/089235.html
> http://lists.openstack.org/pipermail/openstack-dev/2016-March/089912.html
> http://lists.openstack.org/pipermail/openstack-dev/2016-March/089913.html
>
> Note that I personally think it's absolutely fine for folks to be more
> expert in some subsystem and to focus extra review attention on e.g. API,
> UI, Puppet or whatever.  This subsystem-core model has been well proven in
> other projects, and folks will naturally broaden their areas of deeper
> knowledge over time.
>
> Related to this is movement of code, such as the puppet-tripleo refactoring
> mentioned by Michael - this has already started, and will help with
> providing a cleaner interface between the puppet and heat pieces (which
> will also help focus reviewer attention appropriately).

Indeed, Michael, Dan & I are working on moving out the Puppet code
from THT to puppet-tripleo.
That's a nice move, and I appreciate TripleO team support on it.

> 2. Day 1 developer experience
>
> This is closely related to the CI failure rate - there are efforts to
> integrate with the RDO tripleo-quickstart tooling, which simplifies the
> initial undercloud setup, and potentially makes consuming pre-built,
> validated undercloud images (probably output artefacts from our new
> periodic CI job) much easier.
>
> So, this will mean that both developers and CI can potentially be less
> regularly impacted by trunk regressions which often cause CI to fail, and
> break developer environments.
>
> https://review.openstack.org/#/c/276810/5
>
> 3. CI coverage and trunk failure rate
>
> We've been working really hard to improve things here, which are really
> several inter-related issues:
>
> - Lack of Hardware capacity in the tripleo CI cloud
> - Frequent trunk regressions breaking our CI
> - Lack of coverage of some key features (network isolation, SSL, IPv6, upgrades)
> - Lack of coverage for vendor plugin templates/puppet code
>
> There's work ongoing to improve this from multiple perspectives:
>
> New periodic CI job (to be used for automated promotion of the
> current-tripleo repo, and for pre-built undercloud images):
> https://review.openstack.org/#/c/271370/
>
> Add network isolation support to CI:
> https://review.openstack.org/#/c/288163/
>
> Test SSL enabled in overcloud:
> https://review.openstack.org/#/c/281988/
>
> CI coverage of IPv6:
> https://review.openstack.org/#/c/289445/
>
> Discussion around better documented integration for third-party CI:
> http://lists.openstack.org/pipermail/openstack-dev/2016-March/088972.html

Do we have plans to execute Tempest?

> In summary, we're doing a ton of work as a community to address the
> concerns raised by Emilien, and we've still got a lot more to do, but there
> *is* clear agreement on many of the problems, and a concrete plan in most
> cases to resolve them.

The recent weeks showed real improvements (as you mentioned with
examples) and that's a good sign for the TripleO project.
Thanks,
-- 
Emilien Macchi

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] Contributing to TripleO is challenging

2016-03-21 Thread Steven Hardy
On Fri, Mar 18, 2016 at 01:27:33PM +, arkady_kanev...@dell.com wrote:
>Emilien,
> 
>Agree on the rant. But not clear on concrete proposal to fix it.
> 
>Spend more time “fixing” CI and use Tempest as a gate is a bit vague.
> 
>Unless we test known working version of each project in TripleO CI you are
>dependent on health of other components.

I've so far resisted replying to this thread, because while valid, many of
the concerns expressed by Emilien are quite general complaints, and it's
hard to reply with specific solutions.

However work *is* going on to improve many of these problems, let's see if
I can provide a summary, to clarify the various "concrete proposals" which
do exist.

1. Core team & review velocity

We've had a small and very overloaded core team for a while now, and this
will be helped by expanding our community to include those who've been
regularly contributing excellent work and reviews as core reviewers:

http://lists.openstack.org/pipermail/openstack-dev/2016-February/087774.html
http://lists.openstack.org/pipermail/openstack-dev/2016-March/089235.html
http://lists.openstack.org/pipermail/openstack-dev/2016-March/089912.html
http://lists.openstack.org/pipermail/openstack-dev/2016-March/089913.html

Note that I personally think it's absolutely fine for folks to be more
expert in some subsystem and to focus extra review attention on e.g. API,
UI, Puppet or whatever.  This subsystem-core model has been well proven in
other projects, and folks will naturally broaden their areas of deeper
knowledge over time.

Related to this is movement of code, such as the puppet-tripleo refactoring
mentioned by Michael - this has already started, and will help with
providing a cleaner interface between the puppet and heat pieces (which
will also help focus reviewer attention appropriately).

2. Day 1 developer experience

This is closely related to the CI failure rate - there are efforts to
integrate with the RDO tripleo-quickstart tooling, which simplifies the
initial undercloud setup, and potentially makes consuming pre-built,
validated undercloud images (probably output artefacts from our new
periodic CI job) much easier.

So, this will mean that both developers and CI can potentially be less
regularly impacted by trunk regressions which often cause CI to fail, and
break developer environments.

https://review.openstack.org/#/c/276810/5

3. CI coverage and trunk failure rate

We've been working really hard to improve things here, which are really
several inter-related issues:

- Lack of Hardware capacity in the tripleo CI cloud
- Frequent trunk regressions breaking our CI
- Lack of coverage of some key features (network isolation, SSL, IPv6, upgrades)
- Lack of coverage for vendor plugin templates/puppet code

There's work ongoing to improve this from multiple perspectives:

New periodic CI job (to be used for automated promotion of the
current-tripleo repo, and for pre-built undercloud images):
https://review.openstack.org/#/c/271370/

Add network isolation support to CI:
https://review.openstack.org/#/c/288163/

Test SSL enabled in overcloud:
https://review.openstack.org/#/c/281988/

CI coverage of IPv6:
https://review.openstack.org/#/c/289445/

Discussion around better documented integration for third-party CI:
http://lists.openstack.org/pipermail/openstack-dev/2016-March/088972.html

In summary, we're doing a ton of work as a community to address the
concerns raised by Emilien, and we've still got a lot more to do, but there
*is* clear agreement on many of the problems, and a concrete plan in most
cases to resolve them.

Thanks,

Steve




Re: [openstack-dev] [tripleo] Contributing to TripleO is challenging

2016-03-19 Thread Arkady_Kanevsky
Emilien,
Agree on the rant. But not clear on concrete proposal to fix it.
Spend more time "fixing" CI and use Tempest as a gate is a bit vague.
Unless we test known working version of each project in TripleO CI you are 
dependent on health of other components.
Thanks,
Arkady

-Original Message-
From: Emilien Macchi [mailto:emil...@redhat.com]
Sent: Friday, March 04, 2016 8:23 AM
To: OpenStack Development Mailing List
Subject: [openstack-dev] [tripleo] Contributing to TripleO is challenging

That's not the name of any Summit's talk, it's just an e-mail I wanted to write 
for a long time.

It is an attempt to expose facts or things I've heard a lot; and bring 
constructive thoughts about why it's challenging to contribute in TripleO 
project.


1/ "I don't review this patch, we don't have CI coverage."

One thing I've noticed in TripleO is that very few people are involved in CI
work.
In my opinion, the CI system is more critical than any feature in a product.
Developing software without tests is a bit like http://goo.gl/OlgFRc
All people - especially core - in the project should be involved in CI work. If
you are TripleO core and you don't contribute to CI, you might ask yourself why.


2/ "I don't review this patch, CI is broken."

Another thing I've noticed in TripleO is that when CI is broken, again, very
few people are actually working on fixing failures.
My experience over the last years taught me to stop my daily work when CI is
broken and fix it asap.


3/ "I don't review it, because this feature / code is not my area".

My first thought is "Aren't we supposed to be engineers and learn new areas?"
My second thought is that I think we have a problem with TripleO Heat Templates.
THT, or TripleO Heat Templates, is 80% Puppet / Hiera code. If TripleO cores
say "I'm not familiar with Puppet", we have a problem here, don't we?
Maybe we should split this repository? Or revisit the list of people who can +2
patches on THT.


4/ Patches are stalled. Most of the time.

Over the last 12 months, I've pushed a lot of patches in TripleO and one thing
I've noticed is that if I don't ping people, my patch gets no review. And I have
to rebase it, every week, because the interface changed. I got +2, cool! Oh,
merge conflict. Rebasing. Waiting for +2 again... and so on.

I personally spend 20% of my time reviewing code, every day.
I wrote a blog post about how I do reviews with Gertty:
http://my1.fr/blog/reviewing-puppet-openstack-patches/
I suggest TripleO folks spend more time on reviews, for several reasons:

* decrease frustration among contributors
* accelerate the development process
* teach new contributors to work on TripleO, and eventually scale up the core
team. It's a time investment, but worth it.

In Puppet team, we have weekly triage sessions and it's pretty helpful.


5/ Most of the tests are run... manually.

How many times have I heard "I've tested this patch locally, and it does not work
so -1"?

The only test we do in current CI is a ping to an instance. Seriously?
Most OpenStack CIs (Fuel included) run Tempest, for testing APIs and real
scenarios. And we run a ping.
That's similar to 1/ but I wanted to raise it too.



If we don't change the way we work on TripleO, people will become more frustrated
and reduce their contributions at some point.
I hope from here we can have an open and constructive discussion to try to
improve the TripleO project.

Thank you for reading so far.
--
Emilien Macchi


Re: [openstack-dev] [tripleo] Contributing to TripleO is challenging

2016-03-14 Thread Emilien Macchi
On Fri, Mar 11, 2016 at 3:46 AM, Michael Chapman  wrote:
>
>
> On Sat, Mar 5, 2016 at 10:31 AM, Giulio Fidente  wrote:
>>
>> On 03/04/2016 03:23 PM, Emilien Macchi wrote:
>>>
>>> That's not the name of any Summit's talk, it's just an e-mail I wanted
>>> to write for a long time.
>>>
>>> It is an attempt to expose facts or things I've heard a lot; and bring
>>> constructive thoughts about why it's challenging to contribute in
>>> TripleO project.
>>
>>
>> hi Emilien,
>>
>> thanks for bringing this up, it's not an easy topic and yet a most
>> crucial one. As a core contributor I feel, to some extent, responsible for the
>> current status of things and I think it's time for us to reflect more about
>> what we can, individually, do.
>>
>> I have some ideas but I want to start by commenting to your points.
>>
>>> 1/ "I don't review this patch, we don't have CI coverage."
>>>
>>> One thing I've noticed in TripleO is that very few people are involved
>>> in CI work.
>>> In my opinion, the CI system is more critical than any feature in a product.
>>> Developing software without tests is a bit like http://goo.gl/OlgFRc
>>> All people - especially core - in the project should be involved in CI
>>> work. If you are TripleO core and you don't contribute to CI, you might
>>> ask yourself why.
>>
>>
>> Agreed, we need more 'eyes' on our CI to cope with both the infra and the
>> unavoidable failures due to changes/bugs in the puppet modules or openstack
>> itself.
>>
>> But there is more hiding behind this problem ... we already have quite a
>> number of optional and even pluggable features in TripleO and we're even
>> designing an interface to make this easier; testing them all isn't going to
>> happen. So we'll always hit something we don't have coverage for.
>>
>> Let's have a conversation on how we can improve coverage at the summit!
>> Maybe we can simply make our CI scenarios more variegated/complex in
>> the attempt to touch more features?
>>
>>> 2/ "I don't review this patch, CI is broken."
>>>
>>> Another thing I've noticed in TripleO is that when CI is broken, again,
>>> very few people are actually working on fixing failures.
>>> My experience over the last years taught me to stop my daily work when
>>> CI is broken and fix it asap.
>>
>>
>> Agreed. More eyes and more coverage to increase its dependability.
>>
>>> 3/ "I don't review it, because this feature / code is not my area".
>>>
>>> My first thought is "Aren't we supposed to be engineers and learn new
>>> areas?"
>>> My second thought is that I think we have a problem with TripleO Heat
>>> Templates.
>>> THT, or TripleO Heat Templates, is 80% Puppet / Hiera code. If
>>> TripleO cores say "I'm not familiar with Puppet", we have a problem here,
>>> don't we?
>>> Maybe we should split this repository? Or revisit the list of people who
>>> can +2 patches on THT.
>>
>>
>> Not sure here, I find that manifests and templates are pretty much "meant
>> to go together" so I am worried that a split could solve some problems but
>> also cause others.
>
>
> This is pretty much what I proposed last week
> (https://blueprints.launchpad.net/tripleo/+spec/refactor-puppet-manifests)
> and I noticed Dan approved the blueprint yesterday (cheers). It's definitely
> going to cause problems in that THT defines the data interface and
> puppet-tripleo is going to have to keep up with that interface in lock-step
> in some cases so be prepared to deal with that as a patch author. This isn't
> really any different to non-tripleo puppet module situations where a change
> to the repo holding hiera data will be tied to changes in modules.
>
> Ideally I'd like to incrementally decouple the puppet-tripleo profiles from
> the data heat provides but for the first cut they'll be joined at the hip.

Michael, I've also been thinking about decoupling THT into puppet-tripleo
manifests, please review:

puppet-tripleo: glance api/registry: https://review.openstack.org/289459
THT: use puppet-tripleo to deploy Glance: https://review.openstack.org/289466

Any feedback is welcome,

> So given a new home (puppet-tripleo) for a large portion of the code
> (starting with overcloud controller and controller_pacemaker), hopefully
> this paves the way for giving those who know puppet well the opportunity to
> take on responsibility for the manifests without necessarily being
> intimately familiar with the rest of the system, which I guess helps with
> Emilien's original concern that there's a skill split across the tooling
> lines.
>
>>
>>
>> This said, let's be honest, an effective patch for THT requires a good
>> understanding of many different problems which can be TripleO specific (eg.
>> implications on upgrades), tooling specific (eg. Heat/Puppet), OpenStack
>> specific (eg. cooperation with other, optional, features) so I have myself
>> skipped changes when I didn't feel comfortable with it.
>>
>> But one problem which I think is more recently slowing reviews and 

Re: [openstack-dev] [tripleo] Contributing to TripleO is challenging

2016-03-11 Thread Michael Chapman
On Sat, Mar 5, 2016 at 10:31 AM, Giulio Fidente  wrote:

> On 03/04/2016 03:23 PM, Emilien Macchi wrote:
>
>> That's not the name of any Summit's talk, it's just an e-mail I wanted
>> to write for a long time.
>>
>> It is an attempt to expose facts or things I've heard a lot; and bring
>> constructive thoughts about why it's challenging to contribute in
>> TripleO project.
>>
>
> hi Emilien,
>
> thanks for bringing this up, it's not an easy topic and yet a most
> crucial one. As a core contributor I feel, to some extent, responsible for the
> current status of things and I think it's time for us to reflect more about
> what we can, individually, do.
>
> I have some ideas but I want to start by commenting to your points.
>
> 1/ "I don't review this patch, we don't have CI coverage."
>>
>> One thing I've noticed in TripleO is that a very few people are involved
>> in CI work.
>> In my opinion, CI system is more critical than any feature in a product.
>> Developing Software without tests is a bit like http://goo.gl/OlgFRc
>> All people - specially core - in the project should be involved in CI
>> work. If you are TripleO core and you don't contribute on CI, you might
>> ask yourself why.
>>
>
> Agreed, we need more 'eyes' on our CI to cope with both the infra and the
> unavoidable failures due to changes/bugs in the puppet modules or openstack
> itself.
>
> But there is more hiding behind this problem ... we already have quite a
> number of optional and even pluggable features in TripleO and we're even
> designing an interface to make this easier; testing them all isn't going to
> happen. So we'll always hit something we don't have coverage for.
>
> Let's have a conversation on how we can improve coverage at the summit!
> Maybe we can simply make our CI scenarios more variegated/complex in
> the attempt to touch more features?
>
> 2/ "I don't review this patch, CI is broken."
>>
>> Another thing I've noticed in TripleO is that when CI is broken, again,
>> a very few people are actually working on fixing failures.
>> My experience over the last years taught me to stop my daily work when
>> CI is broken and fix it asap.
>>
>
> Agreed. More eyes and more coverage to increase its dependability.
>
> 3/ "I don't review it, because this feature / code is not my area".
>>
>> My first thought is "Aren't we supposed to be engineers and learn new
>> areas?"
>> My second thought is that I think we have a problem with TripleO Heat
>> Templates.
>> THT (TripleO Heat Templates) code is 80% Puppet / Hiera. If a
>> TripleO core says "I'm not familiar with Puppet", we have a problem here,
>> don't we?
>> Maybe we should split this repository? Or revisit the list of people who
>> can +2 patches on THT.
>>
>
> Not sure here, I find that manifests and templates are pretty much "meant
> to go together" so I am worried that a split could solve some problems but
> also cause others.
>

This is pretty much what I proposed last week (
https://blueprints.launchpad.net/tripleo/+spec/refactor-puppet-manifests)
and I noticed Dan approved the blueprint yesterday (cheers). It's
definitely going to cause problems in that THT defines the data interface
and puppet-tripleo is going to have to keep up with that interface in
lock-step in some cases so be prepared to deal with that as a patch author.
This isn't really any different to non-tripleo puppet module situations
where a change to the repo holding hiera data will be tied to changes in
modules.

Ideally I'd like to incrementally decouple the puppet-tripleo profiles from
the data heat provides but for the first cut they'll be joined at the hip.

So given a new home (puppet-tripleo) for a large portion of the code
(starting with overcloud controller and controller_pacemaker), hopefully
this paves the way for giving those who know puppet well the opportunity to
take on responsibility for the manifests without necessarily being
intimately familiar with the rest of the system, which I guess helps with
Emilien's original concern that there's a skill split across the tooling
lines.


>
> This said, let's be honest, an effective patch for THT requires a good
> understanding of many different problems which can be TripleO specific (eg.
> implications on upgrades), tooling specific (eg. Heat/Puppet), OpenStack
> specific (eg. cooperation with other, optional, features) so I have myself
> skipped changes when I didn't feel comfortable with it.
>
> But one problem which I think is more recently slowing reviews and which
> is partly a cause of 3) is that we're not dealing too well with code
> duplication in the yamls and with conditional logic in the manifests.
>
> Maybe we could stop and think together about new HOT functionalities
> which could help us? Interesting for the summit as well?
>
> 4/ Patches are stalled. Most of the time.
>>
>> Over the last 12 months, I've pushed a lot of patches in TripleO and one
>> thing I've noticed is that if I don't ping people, my patch 

Re: [openstack-dev] [tripleo] Contributing to TripleO is challenging

2016-03-10 Thread James Slagle
On Fri, Mar 4, 2016 at 9:23 AM, Emilien Macchi  wrote:
> That's not the name of any Summit's talk, it's just an e-mail I wanted
> to write for a long time.
>
> It is an attempt to expose facts or things I've heard a lot; and bring
> constructive thoughts about why it's challenging to contribute in
> TripleO project.

Thanks for sharing your thoughts. I struggled a bit with responding to
be honest, as I think the points you call attention to often lead to
frustration on the side of both patch authors and reviewers.

I think these things are more anecdotal than fact based to be honest.
But, that doesn't mean that it's not a constructive conversation, so
thanks for calling out the issues.

At the core, we need a lot more investment in CI. We could use more
physical resources and more contributors. Or, we could direct some of
the capacity away from new development and put it towards CI
improvements instead. That might allow us to cover more features in
CI, which I think would have a direct impact on review velocity. There
are also some architectural changes proposed (split-stack) that will
allow us to scale CI more effectively than we have in the past.

>
>
> 1/ "I don't review this patch, we don't have CI coverage."

If a patch is obviously not covered by CI, it would go a long way if
the author indicated if and how they manually tested a patch.

Often, I see a non-trivial patch that our CI does not cover. When I
try to manually test the patch, it doesn't work. Sometimes in very
obvious ways. Over time, this sort of pattern lowers confidence of
core reviewers to +2 and approve non-trivial patches that they
themselves haven't manually tested.

I know that manual testing doesn't scale.

However, right now, our CI doesn't scale to cover every possible
combination of features either.

So, if we want to keep moving forward at all, some things will have to
be manually tested. If patch authors added a comment such as "i tested
this with network isolation and it worked as expected, and it doesn't
break existing CI", that would go a long ways towards giving people
the confidence to approve it.

>
> One thing I've noticed in TripleO is that a very few people are involved
> in CI work.
> In my opinion, CI system is more critical than any feature in a product.
> Developing Software without tests is a bit like http://goo.gl/OlgFRc
> All people - specially core - in the project should be involved in CI
> work. If you are TripleO core and you don't contribute on CI, you might
> ask yourself why.

That's fair. Although it's also fair to ask the same of non-cores.

It's not just the job of core reviewers to get patches to pass CI. A
lot of times I get pinged to review patches with a sense of urgency
about them, yet they are sitting with failed CI. And it's not like
people are pinging me to help them understand and fix why the patch
has failed CI. They just want it reviewed/approved.

>
>
> 2/ "I don't review this patch, CI is broken."

Do you mean when CI is generally failing across the board?

I can honestly say that when CI is generally failing for whatever
reason (infrastructure issue, OpenStack regression, TripleO
regression), there are almost always 1 or 2 TripleO cores all over that
issue. Not everyone needs to be working on the issue at once. But, I
can honestly say I've never seen TripleO just completely red where at
least one person wasn't working on it almost exclusively.

However, if you're saying that people don't review a patch if that
specific patch has failed CI on it, then I think there is a lot of
shared responsibility there. It's not just on reviewers to see why
something has failed CI, or to try to get it to pass.

I'm less likely to review a patch if it has been sitting for several
days with a failed CI job on it. The author probably doesn't need it
landed that bad if that's the case. Often, there's not even a comment
if they looked at the failed job to see why. So, yea, I'm less likely
to review those patches honestly, and maybe that's not fair.

Just so I'm clear, I'm not saying that I ignore patches with failed
CI. I try and help new contributors or people I might not recognize
get their patches to pass CI. I recognize there is a steep learning
curve there.

But when I see patches out there from folks who are capable of at
least triaging the failure, and that hasn't been done, I'm certainly
guilty of de-prioritizing reviewing that patch. Maybe that's a bad
thing.

>
> Another thing I've noticed in TripleO is that when CI is broken, again,
> a very few people are actually working on fixing failures.

This sounds like you're talking about scenarios when all of TripleO CI
is failing. In which case, I really disagree with your assertion that
people aren't rapidly fixing those failures.

> My experience over the last years taught me to stop my daily work when
> CI is broken and fix it asap.
>
> 3/ "I don't review it, because this feature / code is not my area".
>
> My first thought is 

Re: [openstack-dev] [tripleo] Contributing to TripleO is challenging

2016-03-07 Thread Dan Prince
While I agree with some (not all) of the sentiment below I'm not sure I
want to spend the time debating this rather broad set of topics in this
email thread. I'm not sure we'd actually ever see the end of it if we
did. Nor can the upstream TripleO team control all the forces at play
here.

So rather than this... Would it be reasonable to ask that we take this
a step further and split things out into concrete ideas to improve
these areas? Perhaps each in its own spec or email thread so that we
can reach clear conclusions to each problem... a step at a time.

A couple of things to set the record straight:

On the CI issues: we actually have some really good ideas on the
table to solve some of these CI problems including architectural
changes like "split stack" ideas which could allow parts of our
overcloud CI to run on normal cloud instances, auto-promoting package
repositories based on nightly periodic jobs, caching our image builds,
etc. Some of these things will open the door to new features like the
ability to run more test suites (which we haven't done yet due to the
long wall time associated with our CI at this point).

There are reasons for TripleO CI, why it exists, why we have put so
much effort into keeping it running over the years. Yes our tests take
a long time to run, and yes we have some things we still do manually,
but we do catch a lot of issues and breakages in both our own and other
OpenStack projects. And while our core team often disagrees on things I
think we do agree that continuing to expand upstream CI coverage on
major features is key to digging out of the hole we are in. 

As for the rest of it I think a lot of it has to do with doing the best
we can with limited upstream resources. To me the real problem driving
a majority of the issues you describe below is simply trying to land X
number of features upstream by a given date with little to no CI
coverage. The sooner we take the time and discipline to stop this the
better.

Dan


On Fri, 2016-03-04 at 09:23 -0500, Emilien Macchi wrote:
> That's not the name of any Summit's talk, it's just an e-mail I
> wanted
> to write for a long time.
> 
> It is an attempt to expose facts or things I've heard a lot; and
> bring
> constructive thoughts about why it's challenging to contribute in
> TripleO project.
> 
> 
> 1/ "I don't review this patch, we don't have CI coverage."
> 
> One thing I've noticed in TripleO is that a very few people are
> involved
> in CI work.
> In my opinion, CI system is more critical than any feature in a
> product.
> Developing Software without tests is a bit like http://goo.gl/OlgFRc
> All people - specially core - in the project should be involved in CI
> work. If you are TripleO core and you don't contribute on CI, you
> might
> ask yourself why.
> 
> 
> 2/ "I don't review this patch, CI is broken."
> 
> Another thing I've noticed in TripleO is that when CI is broken,
> again,
> a very few people are actually working on fixing failures.
> My experience over the last years taught me to stop my daily work
> when
> CI is broken and fix it asap.
> 
> 
> 3/ "I don't review it, because this feature / code is not my area".
> 
> My first thought is "Aren't we supposed to be engineers and learn new
> areas?"
> My second thought is that I think we have a problem with TripleO Heat
> Templates.
> THT (TripleO Heat Templates) code is 80% Puppet / Hiera. If a
> TripleO core says "I'm not familiar with Puppet", we have a problem
> here, don't we?
> Maybe we should split this repository? Or revisit the list of people
> who can +2 patches on THT.
> 
> 
> 4/ Patches are stalled. Most of the time.
> 
> Over the last 12 months, I've pushed a lot of patches in TripleO and
> one
> thing I've noticed is that if I don't ping people, my patch got no
> review. And I have to rebase it, every week, because the interface
> changed. I got +2, cool ! Oh, merge conflict. Rebasing. Waiting for
> +2
> again... and so on..
> 
> I personally spent 20% of my time to review code, every day.
> I wrote a blog post about how I'm doing review, with Gertty:
> http://my1.fr/blog/reviewing-puppet-openstack-patches/
> I suggest TripleO folks to spend more time on reviews, for some
> reasons:
> 
> * decreasing frustration from contributors
> * accelerate development process
> * teach new contributors to work on TripleO, and eventually scale-up
> the
> core team. It's a time investment, but worth it.
> 
> In Puppet team, we have weekly triage sessions and it's pretty
> helpful.
> 
> 
> 5/ Most of the tests are run... manually.
> 
> How many times I've heard "I've tested this patch locally, and it
> does
> not work so -1".
> 
> The only test we do in current CI is a ping to an instance.
> Seriously?
> Most of OpenStack CIs (Fuel included), run Tempest, for testing APIs
> and
> real scenarios. And we run a ping.
> That's similar to 1/ but I wanted to raise it too.
> 
> 
> 
> If we don't change our way to work on TripleO, people will be more
> frustrated 

Re: [openstack-dev] [tripleo] Contributing to TripleO is challenging

2016-03-06 Thread Shinobu Kinjo
Any comment?

Cheers,
S

On Sat, Mar 5, 2016 at 11:03 AM, Adam Young  wrote:
> On 03/04/2016 09:23 AM, Emilien Macchi wrote:
>
> That's not the name of any Summit's talk, it's just an e-mail I wanted
> to write for a long time.
>
> It is an attempt to expose facts or things I've heard a lot; and bring
> constructive thoughts about why it's challenging to contribute in
> TripleO project.
>
>
> 1/ "I don't review this patch, we don't have CI coverage."
>
> One thing I've noticed in TripleO is that a very few people are involved
> in CI work.
> In my opinion, CI system is more critical than any feature in a product.
> Developing Software without tests is a bit like http://goo.gl/OlgFRc
> All people - specially core - in the project should be involved in CI
> work. If you are TripleO core and you don't contribute on CI, you might
> ask yourself why.
>
>
> OK...so what is the state of Tripleo CI?  My experience with Tripleo has
> shown that it is quite resource intensive, far more so than, say, Keystone,
> and so I could see that being the gating factor.
>
>
> In order for me to be able to get into Tripleo coding, I needed a new
> machine, with 32 GB of RAM, separate from my everyday work machine.  Not a
> killer outlay, but enough to hold me up until I got the HW allocated.
>
> If we could split up the testing undercloud vs. overcloud, it might be more
> feasible.  I see no fundamental reason that the majority of the Overcloud
> development and testing could not be done on top of a non-ironic based
> OpenStack deployment.
>
> That leaves just the undercloud, which could, possibly, also run on top of
> an existing OpenStack deployment for much of the development.
>
> A true end to end run of Tripleo with HA requires a lot:  3 Physical
> machines plus a little overhead for the Overcloud.  But this is what is
> really needed.  Ideally, on multiple vendors' systems, so that we identify
> some aspect of the Hardware variation.
>
>
>
>
> 2/ "I don't review this patch, CI is broken."
>
> Another thing I've noticed in TripleO is that when CI is broken, again,
> a very few people are actually working on fixing failures.
> My experience over the last years taught me to stop my daily work when
> CI is broken and fix it asap.
>
>
> Puppet and Heat are black boxes to me still.  I don't clearly understand how
> they fit together.
>
> I think we need to start depuppetifying Tripleo. I know we have a lot of
> sunk costs in it, but we went with Puppet because it was all we had, not
> because it matched the problem set well.
>
> I'd recommend a freeze on all new Puppet development, and start doing all
> new features in Ansible. Fully acknowledging the havoc this will wreak,  I
> think it is important strategically.   It is really hard to swap between two
> languages, and the rest of OpenStack is in Python.  Switching to Ruby is hard.
>
> All of our Client support is in Python.
>
> The number of people that know Puppet that actively contribute to OpenStack
> is small. The number of real Ruby experts is smaller.
>
>
>
> 3/ "I don't review it, because this feature / code is not my area".
>
> My first thought is "Aren't we supposed to be engineers and learn new areas?"
> My second thought is that I think we have a problem with TripleO Heat
> Templates.
> THT (TripleO Heat Templates) code is 80% Puppet / Hiera. If a
> TripleO core says "I'm not familiar with Puppet", we have a problem here,
> don't we?
> Maybe we should split this repository? Or revisit the list of people who
> can +2 patches on THT.
>
> I am more than happy to review anything Keystone related, but again, I
> struggle with Puppet.
>
> Not really knowing Heat as well makes it even tougher. We need a better
> overall orientation guide if people are going to come up to speed quicker.
>
>
>
>
> 4/ Patches are stalled. Most of the time.
>
> Over the last 12 months, I've pushed a lot of patches in TripleO and one
> thing I've noticed is that if I don't ping people, my patch got no
> review. And I have to rebase it, every week, because the interface
> changed. I got +2, cool ! Oh, merge conflict. Rebasing. Waiting for +2
> again... and so on..
>
> Same is true on Keystone.  There is just a lot to get done on this project.
> All these projects.
>
>
> I personally spent 20% of my time to review code, every day.
> I wrote a blog post about how I'm doing review, with Gertty:
> http://my1.fr/blog/reviewing-puppet-openstack-patches/
> I suggest TripleO folks to spend more time on reviews, for some reasons:
>
>
> Nice of you to write that up.
>
> * decreasing frustration from contributors
> * accelerate development process
> * teach new contributors to work on TripleO, and eventually scale-up the
> core team. It's a time investment, but worth it.
>
> In Puppet team, we have weekly triage sessions and it's pretty helpful.
>
>
> 5/ Most of the tests are run... manually.
>
> How many times I've heard "I've tested this patch locally, and it does
> not work so -1".
>
> 

Re: [openstack-dev] [tripleo] Contributing to TripleO is challenging

2016-03-04 Thread Adam Young

On 03/04/2016 09:23 AM, Emilien Macchi wrote:

That's not the name of any Summit's talk, it's just an e-mail I wanted
to write for a long time.

It is an attempt to expose facts or things I've heard a lot; and bring
constructive thoughts about why it's challenging to contribute in
TripleO project.


1/ "I don't review this patch, we don't have CI coverage."

One thing I've noticed in TripleO is that a very few people are involved
in CI work.
In my opinion, CI system is more critical than any feature in a product.
Developing Software without tests is a bit like http://goo.gl/OlgFRc
All people - specially core - in the project should be involved in CI
work. If you are TripleO core and you don't contribute on CI, you might
ask yourself why.


OK...so what is the state of Tripleo CI?  My experience with Tripleo has 
shown that it is quite resource intensive, far more so than, say, 
Keystone, and so I could see that being the gating factor.



In order for me to be able to get into Tripleo coding, I needed a new 
machine, with 32 GB of RAM, separate from my everyday work machine.  Not 
a killer outlay, but enough to hold me up until I got the HW allocated.


If we could split up the testing undercloud vs. overcloud, it might be 
more feasible.  I see no fundamental reason that the majority of the 
Overcloud development and testing could not be done on top of a 
non-ironic based OpenStack deployment.


That leaves just the undercloud, which could, possibly, also run on 
top of an existing OpenStack deployment for much of the development.


A true end to end run of Tripleo with HA requires a lot:  3 Physical 
machines plus a little overhead for the Overcloud.  But this is what is 
really needed.  Ideally, on multiple vendors' systems, so that we 
identify some aspect of the Hardware variation.






2/ "I don't review this patch, CI is broken."

Another thing I've noticed in TripleO is that when CI is broken, again,
a very few people are actually working on fixing failures.
My experience over the last years taught me to stop my daily work when
CI is broken and fix it asap.


Puppet and Heat are black boxes to me still.  I don't clearly understand 
how they fit together.


I think we need to start depuppetifying Tripleo. I know we have a lot of 
sunk costs in it, but we went with Puppet because it was all we had, 
not because it matched the problem set well.


I'd recommend a freeze on all new Puppet development, and start doing 
all new features in Ansible. Fully acknowledging the havoc this will 
wreak,  I think it is important strategically.   It is really hard to 
swap between two languages, and the rest of OpenStack is in Python.  
Switching to Ruby is hard.


All of our Client support is in Python.

The number of people that know Puppet that actively contribute to 
OpenStack is small. The number of real Ruby experts is smaller.





3/ "I don't review it, because this feature / code is not my area".

My first thought is "Aren't we supposed to be engineers and learn new areas?"
My second thought is that I think we have a problem with TripleO Heat
Templates.
THT (TripleO Heat Templates) code is 80% Puppet / Hiera. If a
TripleO core says "I'm not familiar with Puppet", we have a problem here,
don't we?
Maybe we should split this repository? Or revisit the list of people who
can +2 patches on THT.
I am more than happy to review anything Keystone related, but again, I 
struggle with Puppet.


Not really knowing Heat as well makes it even tougher. We need a better 
overall orientation guide if people are going to come up to speed quicker.






4/ Patches are stalled. Most of the time.

Over the last 12 months, I've pushed a lot of patches in TripleO and one
thing I've noticed is that if I don't ping people, my patch got no
review. And I have to rebase it, every week, because the interface
changed. I got +2, cool ! Oh, merge conflict. Rebasing. Waiting for +2
again... and so on..
Same is true on Keystone.  There is just a lot to get done on this 
project.  All these projects.




I personally spent 20% of my time to review code, every day.
I wrote a blog post about how I'm doing review, with Gertty:
http://my1.fr/blog/reviewing-puppet-openstack-patches/
I suggest TripleO folks to spend more time on reviews, for some reasons:


Nice of you to write that up.


* decreasing frustration from contributors
* accelerate development process
* teach new contributors to work on TripleO, and eventually scale-up the
core team. It's a time investment, but worth it.

In Puppet team, we have weekly triage sessions and it's pretty helpful.


5/ Most of the tests are run... manually.

How many times I've heard "I've tested this patch locally, and it does
not work so -1".

The only test we do in current CI is a ping to an instance. Seriously?
Most of OpenStack CIs (Fuel included), run Tempest, for testing APIs and
real scenarios. And we run a ping.
That's similar to 1/ but I wanted to raise it too.
Again, testing is expensive; if I 

Re: [openstack-dev] [tripleo] Contributing to TripleO is challenging

2016-03-04 Thread Giulio Fidente

On 03/04/2016 03:23 PM, Emilien Macchi wrote:

That's not the name of any Summit's talk, it's just an e-mail I wanted
to write for a long time.

It is an attempt to expose facts or things I've heard a lot; and bring
constructive thoughts about why it's challenging to contribute in
TripleO project.


hi Emilien,

thanks for bringing this up, it's not an easy topic and yet a most 
crucial one. As a core contributor I feel, to some extent, responsible for 
the current status of things and I think it's time for us to reflect 
more about what we can, individually, do.


I have some ideas but I want to start by commenting to your points.


1/ "I don't review this patch, we don't have CI coverage."

One thing I've noticed in TripleO is that a very few people are involved
in CI work.
In my opinion, CI system is more critical than any feature in a product.
Developing Software without tests is a bit like http://goo.gl/OlgFRc
All people - specially core - in the project should be involved in CI
work. If you are TripleO core and you don't contribute on CI, you might
ask yourself why.


Agreed, we need more 'eyes' on our CI to cope with both the infra and 
the unavoidable failures due to changes/bugs in the puppet modules or 
openstack itself.


But there is more hiding behind this problem ... we already have quite a 
number of optional and even pluggable features in TripleO and we're even 
designing an interface to make this easier; testing them all isn't going 
to happen. So we'll always hit something we don't have coverage for.


Let's have a conversation on how we can improve coverage at the summit! 
Maybe we can simply make our CI scenarios more variegated/complex 
in the attempt to touch more features?



2/ "I don't review this patch, CI is broken."

Another thing I've noticed in TripleO is that when CI is broken, again,
a very few people are actually working on fixing failures.
My experience over the last years taught me to stop my daily work when
CI is broken and fix it asap.


Agreed. More eyes and more coverage to increase its dependability.


3/ "I don't review it, because this feature / code is not my area".

My first thought is "Aren't we supposed to be engineers and learn new areas?"
My second thought is that I think we have a problem with TripleO Heat
Templates.
THT (TripleO Heat Templates) code is 80% Puppet / Hiera. If a
TripleO core says "I'm not familiar with Puppet", we have a problem here,
don't we?
Maybe we should split this repository? Or revisit the list of people who
can +2 patches on THT.


Not sure here, I find that manifests and templates are pretty much 
"meant to go together" so I am worried that a split could solve some 
problems but also cause others.


This said, let's be honest, an effective patch for THT requires a good 
understanding of many different problems which can be TripleO specific 
(eg. implications on upgrades), tooling specific (eg. Heat/Puppet), 
OpenStack specific (eg. cooperation with other, optional, features) so I 
have myself skipped changes when I didn't feel comfortable with it.


But one problem which I think is more recently slowing reviews and which 
is partly a cause of 3) is that we're not dealing too well with code 
duplication in the yamls and with conditional logic in the manifests.


Maybe we could stop and think together about new HOT functionalities 
which could help us? Interesting for the summit as well?



4/ Patches are stalled. Most of the time.

Over the last 12 months, I've pushed a lot of patches in TripleO and one
thing I've noticed is that if I don't ping people, my patch got no
review. And I have to rebase it, every week, because the interface
changed. I got +2, cool ! Oh, merge conflict. Rebasing. Waiting for +2
again... and so on..

I personally spent 20% of my time to review code, every day.
I wrote a blog post about how I'm doing review, with Gertty:
http://my1.fr/blog/reviewing-puppet-openstack-patches/
I suggest TripleO folks to spend more time on reviews, for some reasons:

* decreasing frustration from contributors
* accelerate development process
* teach new contributors to work on TripleO, and eventually scale-up the
core team. It's a time investment, but worth it.


I'm inclined to think that this is a bit of a consequence of 1), 2) and 
3) together.



In Puppet team, we have weekly triage sessions and it's pretty helpful.


Right. I think we experimented with something like this before but it 
was probably perceived as an emergency measure so we put it aside 
after a while.


I remember we had a list of 'hot reviews' which we would review during 
the weekly meetings. But it isn't trivial to understand which type of 
review is considered hot. What is the purpose of the puppet team 
triaging? To find old reviews? Mergeable reviews? To drop stale 
reviews? To speed up bug fixes? To get attention on features?



5/ Most of the tests are run... manually.

How many times I've heard "I've tested this patch locally, and 

Re: [openstack-dev] [tripleo] Contributing to TripleO is challenging

2016-03-04 Thread Paul Belanger
On Fri, Mar 04, 2016 at 09:23:19AM -0500, Emilien Macchi wrote:
> That's not the name of any Summit's talk, it's just an e-mail I wanted
> to write for a long time.
> 
> It is an attempt to expose facts or things I've heard a lot; and bring
> constructive thoughts about why it's challenging to contribute in
> TripleO project.
> 
> 
> 1/ "I don't review this patch, we don't have CI coverage."
> 
> One thing I've noticed in TripleO is that a very few people are involved
> in CI work.
> In my opinion, CI system is more critical than any feature in a product.
> Developing Software without tests is a bit like http://goo.gl/OlgFRc
> All people - specially core - in the project should be involved in CI
> work. If you are TripleO core and you don't contribute on CI, you might
> ask yourself why.
> 
As somebody who contributes to openstack-infra and knows most of the ins and outs
of OpenStack CI, I often wish the TripleO CI would be more in line with
openstack-infra.  Right now, TripleO CI is a black hole to me.  I understand
there are some reasons to have separate CI (eg: baremetal provisioning) but it
would be nice to revisit the current setup and see if we can move more in line
with openstack-infra.

The reason is simple: having common tooling means I can contribute to TripleO
CI if needed.

> 
> 2/ "I don't review this patch, CI is broken."
> 
> Another thing I've noticed in TripleO is that when CI is broken, again,
> very few people are actually working on fixing failures.
> My experience over the last few years taught me to stop my daily work when
> CI is broken and fix it ASAP.
> 
See my comment above. I think this would go a long way toward helping the team.
> 
> 3/ "I don't review it, because this feature / code is not my area".
> 
> My first thought is "Aren't we supposed to be engineers and learn new areas?"
> My second thought is that I think we have a problem with TripleO Heat
> Templates.
> The code in THT (TripleO Heat Templates) is 80% Puppet / Hiera. If
> TripleO cores say "I'm not familiar with Puppet", we have a problem here,
> don't we?
> Maybe we should split this repository? Or revisit the list of people who
> can +2 patches on THT?
> 
> 
> 4/ Patches are stalled. Most of the time.
> 
> Over the last 12 months, I've pushed a lot of patches in TripleO, and one
> thing I've noticed is that if I don't ping people, my patches get no
> review. And I have to rebase them every week, because the interface
> changed. I got a +2, cool! Oh, merge conflict. Rebasing. Waiting for +2
> again... and so on.
> 
> I personally spend 20% of my time reviewing code, every day.
> I wrote a blog post about how I do reviews with Gertty:
> http://my1.fr/blog/reviewing-puppet-openstack-patches/
> I suggest that TripleO folks spend more time on reviews, for a few reasons:
> 
> * decrease frustration among contributors
> * accelerate the development process
> * teach new contributors to work on TripleO, and eventually scale up the
> core team. It's a time investment, but worth it.
> 
> In the Puppet team, we have weekly triage sessions, and they're pretty helpful.
> 
> 
> 5/ Most of the tests are run... manually.
> 
> How many times have I heard "I've tested this patch locally, and it does
> not work so -1"?
> 
> The only test we do in the current CI is a ping to an instance. Seriously?
> Most OpenStack CIs (Fuel included) run Tempest to test APIs and
> real scenarios. And we run a ping.
> That's similar to 1/, but I wanted to raise it too.
> 
> 
> 
> If we don't change the way we work on TripleO, people will become more
> frustrated and contribute less at some point.
> I hope that from here we can have an open and constructive discussion to try
> to improve the TripleO project.
> 
> Thank you for reading this far.
> -- 
> Emilien Macchi
> 
So for me, I'd love to help more, but having to context-switch into TripleO CI is
a deal breaker for me (and for more of -infra, if I were a betting man).  So if
there is anything I can do to help move things like base images or AFS mirrors
into TripleO, I am happy to help.  However, having the TripleO team maintain CI
themselves doesn't seem to be the best-case scenario.





[openstack-dev] [tripleo] Contributing to TripleO is challenging

2016-03-04 Thread Emilien Macchi
That's not the title of a Summit talk; it's just an e-mail I've wanted
to write for a long time.

It is an attempt to lay out facts and things I've heard a lot, and to offer
constructive thoughts about why it's challenging to contribute to the
TripleO project.


1/ "I don't review this patch, we don't have CI coverage."

One thing I've noticed in TripleO is that very few people are involved
in CI work.
In my opinion, the CI system is more critical than any feature in a product.
Developing software without tests is a bit like http://goo.gl/OlgFRc
Everyone in the project, especially cores, should be involved in CI
work. If you are TripleO core and you don't contribute to CI, you might
ask yourself why.


2/ "I don't review this patch, CI is broken."

Another thing I've noticed in TripleO is that when CI is broken, again,
very few people are actually working on fixing failures.
My experience over the last few years taught me to stop my daily work when
CI is broken and fix it ASAP.


3/ "I don't review it, because this feature / code is not my area".

My first thought is "Aren't we supposed to be engineers and learn new areas?"
My second thought is that I think we have a problem with TripleO Heat
Templates.
The code in THT (TripleO Heat Templates) is 80% Puppet / Hiera. If
TripleO cores say "I'm not familiar with Puppet", we have a problem here,
don't we?
Maybe we should split this repository? Or revisit the list of people who
can +2 patches on THT?


4/ Patches are stalled. Most of the time.

Over the last 12 months, I've pushed a lot of patches in TripleO, and one
thing I've noticed is that if I don't ping people, my patches get no
review. And I have to rebase them every week, because the interface
changed. I got a +2, cool! Oh, merge conflict. Rebasing. Waiting for +2
again... and so on.

I personally spend 20% of my time reviewing code, every day.
I wrote a blog post about how I do reviews with Gertty:
http://my1.fr/blog/reviewing-puppet-openstack-patches/
I suggest that TripleO folks spend more time on reviews, for a few reasons:

* decrease frustration among contributors
* accelerate the development process
* teach new contributors to work on TripleO, and eventually scale up the
core team. It's a time investment, but worth it.

In the Puppet team, we have weekly triage sessions, and they're pretty helpful.


5/ Most of the tests are run... manually.

How many times have I heard "I've tested this patch locally, and it does
not work so -1"?

The only test we do in the current CI is a ping to an instance. Seriously?
Most OpenStack CIs (Fuel included) run Tempest to test APIs and
real scenarios. And we run a ping.
That's similar to 1/, but I wanted to raise it too.



If we don't change the way we work on TripleO, people will become more
frustrated and contribute less at some point.
I hope that from here we can have an open and constructive discussion to try
to improve the TripleO project.

Thank you for reading this far.
-- 
Emilien Macchi


