Re: [openstack-dev] [all] periodic jobs for master

2014-10-29 Thread Ihar Hrachyshka

On 25/10/14 00:16, James E. Blair wrote:
 Andrea Frittoli andrea.fritt...@gmail.com writes:
 
 I also believe we can find ways to make post-merge / periodic
 checks useful. We need to do that to keep the gate to a sane
 scale.
 
 Yes, we have a plan for that, which we outlined at the infra/QA
 meetup this summer and described to this list in this email:
 
 http://lists.openstack.org/pipermail/openstack-dev/2014-July/041057.html

 Particularly this part, but please read the whole message if you
 have not already, or have forgotten it:
 
 * For all non-gold-standard configurations, we'll dedicate a part
 of our infrastructure to running them in a continuous background
 loop, as well as making these configs available as experimental
 jobs. The idea here is that we'll actually be able to provide more
 configurations that operate in a more traditional CI (post-merge)
 context. People who are interested in keeping these bits
 functional can monitor those jobs and help with fixes when needed.
 The experimental jobs mean that if developers are concerned about
 the effect of a particular change on one of these configs, it's
 easy to request a pre-merge test run. In the near term we might
 imagine this would allow for things like ceph, mongodb, docker, and
 possibly very new libvirt to be validated in some way upstream.
 
 * Provide some kind of easy-to-view dashboards of these jobs, as
 well as a policy that if some job is failing for some period of
 time, it's removed from the system. We want to provide whatever
 feedback we can to engaged parties, but people do need to realize
 that engagement is key. The biggest part of putting tests into
 OpenStack isn't landing the tests, but dealing with their
 failures.
 
 I'm glad to see people interested in this.  If you're ready to 
 contribute to it, please stop by #openstack-infra or join our next
 team meeting[1] to discuss how you can help.

I'm sorry, I missed the email you referred to earlier. Indeed, it
looks like I'm not the first one to have thought about the matter.
Summit-wise, will there be any sessions where the subject will be
discussed?

 
 -Jim
 
 [1] https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting
 

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all] periodic jobs for master

2014-10-29 Thread Andrea Frittoli

 I'm sorry, I missed the email you referred to earlier. Indeed, it
 looks like I'm not the first one to have thought about the matter.
 Summit-wise, will there be any sessions where the subject will be
 discussed?


Yes.
About post-merge CI:
http://kilodesignsummit.sched.org/event/1e33d1f4896a52e2c02b062cfc18ba39#.VFEZqvmsV8E
About moving functional tests to projects:
http://kilodesignsummit.sched.org/event/575938e4837e8293615845582d7e3e7f#.VFEaM_msV8E

Andrea Frittoli (andreaf)



Re: [openstack-dev] [all] periodic jobs for master

2014-10-24 Thread Ihar Hrachyshka

On 22/10/14 12:07, Thierry Carrez wrote:
 Ihar Hrachyshka wrote:
 [...] For stable branches, we have so-called periodic jobs that
 are triggered once in a while against the current code in a
 stable branch, and report to the openstack-stable-maint@ mailing
 list. An example of a failing periodic job report can be found at
 [2]. I envision that a similar approach can be applied to test
 auxiliary features in gate. So once something is broken in
 master, the interested parties behind the auxiliary feature will
 be informed in due time. [...]
 
 The main issue with periodic jobs is that since they are
 non-blocking, they can get ignored really easily. It takes a bit of
 organization and process to get those failures addressed.
 
 It's only recently (and in large part thanks to you) that failures
 in the periodic jobs for stable branches are being taken into
 account quickly and seriously. For years the failures just lingered
 until they blocked someone's work enough for that person to go and
 fix them.
 
 So while I think periodic jobs are a good way to increase corner
 case testing coverage, I am skeptical of our collective ability to
 have the discipline necessary for them not to become a pain. We'll
 need a strict process around them: identified groups of people
 signed up to act on failure, and failure stats so that we can
 remove jobs that don't get enough attention.
 

There should be an interest group behind each periodic job (sometimes
consisting of a single person). Yes, jobs should be tracked, though I
assume that if a group is really interested in a job, it will track it
on a daily basis. Otherwise, we'll see the job rot and eventually be
removed. Let's say anyone can propose removing a job on the mailing
list, and we'll assess case by case whether it's OK to remove it
instead of, e.g., fixing it (because there are no interested parties to
track it).

Another question to solve is how we disseminate the state of those
jobs. Do we create a separate mailing list for that? Obviously we
should not reuse the -dev one, and it's overkill to create one mailing
list per interest group.

/Ihar


Re: [openstack-dev] [all] periodic jobs for master

2014-10-24 Thread Thierry Carrez
Ihar Hrachyshka wrote:
 Another question to solve is how we disseminate the state of those
 jobs. Do we create a separate mailing list for that? Obviously we
 should not reuse the -dev one, and it's overkill to create one mailing
 list per interest group.

Should we explore avenues other than email for this? If we plan to do
opt-in anyway, wouldn't a status website or RSS feed work better?

The ideal system imho would be a status website where we could see
failures and close them as handled so that everyone knows that a past
FAIL result has already been fixed. That could help avoid duplication of
painful debugging work.
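A minimal sketch of the data model such a status site might need, assuming each FAIL result can be identified by a run ID. The `Failure` and `FailureBoard` names and fields are hypothetical and not part of any existing infra tool; the point is only that "closing" a failure is a small piece of state (who handled it, plus a note) that lets everyone else skip it.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class Failure:
    """One FAIL result from a periodic job run (hypothetical record)."""
    job: str
    run_id: str
    seen_at: datetime
    handled_by: Optional[str] = None  # who marked it as handled, if anyone
    note: str = ""                    # e.g. a link to the fix or bug report


class FailureBoard:
    """In-memory model of a status page where FAIL results can be closed."""

    def __init__(self) -> None:
        self._failures: Dict[str, Failure] = {}

    def record(self, job: str, run_id: str, seen_at: datetime) -> None:
        # Recording the same run twice is a no-op, so duplicates don't pile up.
        self._failures.setdefault(run_id, Failure(job, run_id, seen_at))

    def mark_handled(self, run_id: str, who: str, note: str = "") -> None:
        failure = self._failures[run_id]
        failure.handled_by = who
        failure.note = note

    def open_failures(self) -> List[Failure]:
        # Only unhandled failures still need someone's debugging time.
        pending = [f for f in self._failures.values() if f.handled_by is None]
        return sorted(pending, key=lambda f: f.seen_at)
```

A status page (or RSS feed) would then only list `open_failures()`, so a FAIL that someone already fixed stops attracting duplicate debugging.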

-- 
Thierry Carrez (ttx)



Re: [openstack-dev] [all] periodic jobs for master

2014-10-24 Thread Andrea Frittoli
I also believe we can find ways to make post-merge / periodic checks useful.
We need to do that to keep the gate to a sane scale.

On 24 October 2014 17:33, Thierry Carrez thie...@openstack.org wrote:
 Ihar Hrachyshka wrote:
 Another question to solve is how we disseminate the state of those
 jobs. Do we create a separate mailing list for that? Obviously we
 should not reuse the -dev one, and it's overkill to create one mailing
 list per interest group.

 Should we explore avenues other than email for this? If we plan to do
 opt-in anyway, wouldn't a status website or RSS feed work better?

+1


 The ideal system imho would be a status website where we could see
 failures and close them as handled so that everyone knows that a past
 FAIL result has already been fixed. That could help avoid duplication of
 painful debugging work.

+1

Publicizing the test results better, and to the interested audience,
will help a lot, as will keeping a track record of fixed issues and
their solutions.

Tracking result history at the test level (using subunit2sql), and
building and analyzing trends from it, would be a great tool for
identifying and troubleshooting failures.

It would also be beneficial, IMO, to extract whatever information can
be gathered automatically from the test results. Rather than saying
"job X failed", we could have tools that let us say "test X started
failing in a specific time range, and this is the list of sha1s that
were merged around that time".
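As a rough illustration of "test X started failing in a specific time range", here is a sketch that finds the window between the last passing run and the first run of the current failing streak, then lists the sha1s merged in that window. It assumes per-test results and merge events have already been exported into plain `(timestamp, value)` tuples (for example from a subunit2sql database); it does not use any subunit2sql API, and the function names are hypothetical. Flaky tests that alternate pass/fail would need a smarter heuristic than this.

```python
from datetime import datetime
from typing import List, Optional, Tuple

# Hypothetical inputs: per-run results for one test, and merge events.
Result = Tuple[datetime, bool]  # (run time, passed?)
Merge = Tuple[datetime, str]    # (merge time, commit sha1)


def failure_window(results: List[Result]) -> Optional[Tuple[Optional[datetime], datetime]]:
    """Return (last pass, first failure) for the current failing streak.

    Returns None if the most recent run passed. The last-pass bound is
    None when the test never passed in the given history.
    """
    ordered = sorted(results)
    if not ordered or ordered[-1][1]:
        return None
    first_fail = ordered[-1][0]
    last_pass = None
    # Walk backwards to the start of the trailing run of failures.
    for when, passed in reversed(ordered):
        if passed:
            last_pass = when
            break
        first_fail = when
    return last_pass, first_fail


def suspect_commits(results: List[Result], merges: List[Merge]) -> List[str]:
    """List sha1s merged after the last pass and up to the first failure."""
    window = failure_window(results)
    if window is None:
        return []
    last_pass, first_fail = window
    return [sha for when, sha in sorted(merges)
            if (last_pass is None or when > last_pass) and when <= first_fail]
```

With runs passing on Oct 20 and 21 and failing from Oct 23, only the commits merged after Oct 21 and no later than Oct 23 are reported as suspects.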

We will also discuss this topic in Paris, in the QA track.

Andrea Frittoli (andreaf)



Re: [openstack-dev] [all] periodic jobs for master

2014-10-24 Thread James E. Blair
Andrea Frittoli andrea.fritt...@gmail.com writes:

 I also believe we can find ways to make post-merge / periodic checks useful.
 We need to do that to keep the gate to a sane scale.

Yes, we have a plan for that, which we outlined at the infra/QA meetup
this summer and described to this list in this email:

http://lists.openstack.org/pipermail/openstack-dev/2014-July/041057.html

Particularly this part, but please read the whole message if you have
not already, or have forgotten it:

  * For all non-gold-standard configurations, we'll dedicate a part of
    our infrastructure to running them in a continuous background loop,
    as well as making these configs available as experimental jobs. The
    idea here is that we'll actually be able to provide more
    configurations that operate in a more traditional CI (post-merge)
    context. People who are interested in keeping these bits
    functional can monitor those jobs and help with fixes when needed.
    The experimental jobs mean that if developers are concerned about
    the effect of a particular change on one of these configs, it's easy
    to request a pre-merge test run. In the near term we might imagine
    this would allow for things like ceph, mongodb, docker, and possibly
    very new libvirt to be validated in some way upstream.

  * Provide some kind of easy-to-view dashboards of these jobs, as well
    as a policy that if some job is failing for some period of time,
    it's removed from the system. We want to provide whatever feedback
    we can to engaged parties, but people do need to realize that
    engagement is key. The biggest part of putting tests into OpenStack
    isn't landing the tests, but dealing with their failures.
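The removal policy above can be expressed as a simple rule over job history. A minimal sketch, assuming each job's history is available as `(run time, passed?)` tuples; the job names and function names are hypothetical, and since the message leaves "some period of time" open, the grace period is a parameter rather than a fixed value.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Tuple

# Hypothetical history: job name -> list of (run time, passed?) tuples.
History = Dict[str, List[Tuple[datetime, bool]]]


def failing_since(runs: List[Tuple[datetime, bool]]) -> Optional[datetime]:
    """Start of the trailing streak of failures, or None if the last run passed."""
    since = None
    for when, passed in sorted(runs):
        if passed:
            since = None      # any pass resets the streak
        elif since is None:
            since = when      # first failure after a pass (or at the start)
    return since


def removal_candidates(history: History, now: datetime, grace: timedelta) -> List[str]:
    """Jobs that have failed continuously for longer than the grace period."""
    candidates = []
    for job, runs in sorted(history.items()):
        since = failing_since(runs)
        if since is not None and now - since > grace:
            candidates.append(job)
    return candidates
```

Feeding this list into a dashboard, rather than removing jobs automatically, would leave room for the case-by-case review discussed elsewhere in this thread.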

I'm glad to see people interested in this.  If you're ready to
contribute to it, please stop by #openstack-infra or join our next team
meeting[1] to discuss how you can help.

-Jim

[1] https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting



Re: [openstack-dev] [all] periodic jobs for master

2014-10-22 Thread Thierry Carrez
Ihar Hrachyshka wrote:
 [...]
 For stable branches, we have so-called periodic jobs that are
 triggered once in a while against the current code in a stable branch,
 and report to the openstack-stable-maint@ mailing list. An example of
 a failing periodic job report can be found at [2]. I envision that a
 similar approach can be applied to test auxiliary features in gate. So
 once something is broken in master, the interested parties behind the
 auxiliary feature will be informed in due time.
 [...]

The main issue with periodic jobs is that since they are non-blocking,
they can get ignored really easily. It takes a bit of organization and
process to get those failures addressed.

It's only recently (and in large part thanks to you) that failures in
the periodic jobs for stable branches are being taken into account
quickly and seriously. For years the failures just lingered until they
blocked someone's work enough for that person to go and fix them.

So while I think periodic jobs are a good way to increase corner case
testing coverage, I am skeptical of our collective ability to have the
discipline necessary for them not to become a pain. We'll need a strict
process around them: identified groups of people signed up to act on
failure, and failure stats so that we can remove jobs that don't get
enough attention.

-- 
Thierry Carrez (ttx)



Re: [openstack-dev] [all] periodic jobs for master

2014-10-22 Thread Chris Dent

On Wed, 22 Oct 2014, Thierry Carrez wrote:


 So while I think periodic jobs are a good way to increase corner case
 testing coverage, I am skeptical of our collective ability to have the
 discipline necessary for them not to become a pain. We'll need a strict
 process around them: identified groups of people signed up to act on
 failure, and failure stats so that we can remove jobs that don't get
 enough attention.


It's a bummer that we often find ourselves turning to processes to
make up for a lack of discipline. If that's how it has to be, how
about we make sure the pain is easy to feel? So, for example, if there
are periodic jobs on master and they've just failed for a project, how
about we just close the gate for that project until the failure
identified by the periodic job is fixed?
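To make the proposed rule concrete: a project's gate would be closed whenever the most recent run of any of its periodic jobs failed. A minimal sketch under that reading, assuming results arrive as hypothetical `(project, job, passed?)` tuples in chronological order; this is not how any existing gate tooling works, only an illustration of the rule itself.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical input: (project, periodic job, passed?) in chronological order.
PeriodicResult = Tuple[str, str, bool]


def closed_gates(results: List[PeriodicResult]) -> Set[str]:
    """Projects whose gate would be closed under the proposed rule."""
    latest: Dict[Tuple[str, str], bool] = {}
    for project, job, passed in results:
        # Later entries overwrite earlier ones, so only the newest run counts.
        latest[(project, job)] = passed
    return {project for (project, _job), passed in latest.items() if not passed}


def may_merge(project: str, results: List[PeriodicResult]) -> bool:
    """True if no periodic job of this project is currently failing."""
    return project not in closed_gates(results)
```

A fix that makes the periodic job pass again reopens the gate automatically, which is exactly the "pain is easy to feel" incentive described above.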

--
Chris Dent tw:@anticdent freenode:cdent
https://tank.peermore.com/tanks/cdent



Re: [openstack-dev] [all] periodic jobs for master

2014-10-22 Thread David Kranz

On 10/22/2014 06:07 AM, Thierry Carrez wrote:

 Ihar Hrachyshka wrote:

 [...]
 For stable branches, we have so-called periodic jobs that are
 triggered once in a while against the current code in a stable branch,
 and report to the openstack-stable-maint@ mailing list. An example of
 a failing periodic job report can be found at [2]. I envision that a
 similar approach can be applied to test auxiliary features in gate. So
 once something is broken in master, the interested parties behind the
 auxiliary feature will be informed in due time.
 [...]

 The main issue with periodic jobs is that since they are non-blocking,
 they can get ignored really easily. It takes a bit of organization and
 process to get those failures addressed.

 It's only recently (and in large part thanks to you) that failures in
 the periodic jobs for stable branches are being taken into account
 quickly and seriously. For years the failures just lingered until they
 blocked someone's work enough for that person to go and fix them.

 So while I think periodic jobs are a good way to increase corner case
 testing coverage, I am skeptical of our collective ability to have the
 discipline necessary for them not to become a pain. We'll need a strict
 process around them: identified groups of people signed up to act on
 failure, and failure stats so that we can remove jobs that don't get
 enough attention.

While I share some of your skepticism, we have to find a way to make
this work. Saying we are doing our best to ensure the quality of
upstream OpenStack based on a single tier of testing (the gate) that is
limited to 40-minute runs is not plausible. Of course, a lot more
testing happens downstream, but we can do better as a community. I
think we should rephrase this subject as "non-gating jobs". We could
have various kinds of stress and longevity jobs running to good effect
if we can solve this process problem.


Following on your process suggestion, in practice the most likely way
this could actually work is to have a rotation of build guardians who
agree to keep an eye on jobs for a short period of time. There would
need to be a separate rotation list for each project that has
non-gating, project-specific jobs. This will likely happen as we move
towards deeper functional testing in projects. The QA team would be the
logical pool for a rotation of the more global jobs of the kind I think
Ihar was referring to.
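A per-project rotation like this is essentially round-robin scheduling. A minimal sketch, assuming each project (or a hypothetical "qa" pool for the global jobs) keeps a list of volunteers and shifts have a fixed length; the names, start date, and the 7-day default are illustrative assumptions, since the message only says "a short period of time".

```python
from datetime import date
from typing import Dict, List


def guardian_on_duty(rotations: Dict[str, List[str]], project: str,
                     today: date, start: date, days_per_shift: int = 7) -> str:
    """Round-robin pick of the build guardian for a project on a given day.

    `rotations` maps each project (or "qa" for global jobs) to the people
    who signed up; shifts are `days_per_shift` days long, counted from
    `start`.
    """
    people = rotations[project]
    if not people:
        raise ValueError(f"nobody signed up for {project}")
    shift = (today - start).days // days_per_shift
    return people[shift % len(people)]
```

Keeping the rotation as plain data per project means a project with a single volunteer still works, and adding people only changes whose turn comes next.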


As for failure status, each of these non-gating jobs would have its
own name, so logstash could be used to debug failures. Do we already
have anything that tracks failure rates of jobs?
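Independently of whatever tooling may already exist, the statistic itself is simple once run results are available. A minimal sketch, assuming results have been exported as hypothetical `(job name, passed?)` pairs; how they are pulled from logstash or elsewhere is out of scope here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def failure_rates(runs: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Fraction of failed runs per job, from (job name, passed?) pairs."""
    totals: Dict[str, int] = defaultdict(int)
    failures: Dict[str, int] = defaultdict(int)
    for job, passed in runs:
        totals[job] += 1
        if not passed:
            failures[job] += 1
    # Jobs that never failed still appear, with a rate of 0.0.
    return {job: failures[job] / totals[job] for job in totals}
```

Computing this over a sliding window (say, the last N runs per job) would make the numbers reflect current health rather than all-time history.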


 -David



