Re: [openstack-dev] [ironic] 3rdparty CI status and how we can help to make it green

2017-04-13 Thread Rajini.Ram
Thanks for starting the discussion. Will attend

-Original Message-
From: Dmitry Tantsur [mailto:dtant...@redhat.com]
Sent: Thursday, April 13, 2017 10:33 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: [openstack-dev] [ironic] 3rdparty CI status and how we can help to 
make it green

Hi all, especially maintainers of 3rdparty CI for Ironic :)

I've been watching our 3rdparty CI results recently. While things have improved 
compared to, e.g., a month ago, most jobs still finish with failures. I've 
written a simple script [1] to fetch CI run information from my local Gertty 
database; the results [2] show that some jobs still fail surprisingly often 
(in > 50% of cases):

- job: tempest-dsvm-ironic-agent-irmc
  rate: 0.9857142857142858
- job: tempest-dsvm-ironic-iscsi-irmc
  rate: 0.9771428571428571
- job: dell-hw-tempest-dsvm-ironic-pxe_drac
  rate: 0.9682539682539683
- job: gate-tempest-ironic-ilo-driver-iscsi_ilo
  rate: 0.9582463465553236
- job: dell-hw-tempest-dsvm-ironic-pxe_ipmitool
  rate: 0.9111
- job: tempest-dsvm-ironic-pxe-irmc
  rate: 0.8171428571428572
- job: gate-tempest-ironic-ilo-driver-pxe_ilo
  rate: 0.791231732776618
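
For context, here is a minimal sketch of how such per-job failure rates can be 
tallied. It is not the actual ci-report script from [1], and the 
"- <job> <log-url> : STATUS" result-line format it parses is only an assumption 
about how CI comments in Gerrit look; adjust the regex for other reporters.

  import re
  from collections import defaultdict

  # Matches the common "- <job-name> <log-url> : SUCCESS/FAILURE ..." result
  # lines that Jenkins/Zuul-based CIs leave in Gerrit comments (an assumed
  # format, see above).
  RESULT_LINE = re.compile(
      r'^[-*]\s+(?P<job>\S+)\s+\S+\s*:\s*(?P<status>SUCCESS|FAILURE)')

  def failure_rates(comment_bodies):
      """Return {job_name: failure_rate} for an iterable of comment strings."""
      counts = defaultdict(lambda: [0, 0])  # job -> [failures, total runs]
      for body in comment_bodies:
          for line in body.splitlines():
              match = RESULT_LINE.match(line.strip())
              if not match:
                  continue
              counts[match.group('job')][1] += 1
              if match.group('status') == 'FAILURE':
                  counts[match.group('job')][0] += 1
      return {job: fails / float(total)
              for job, (fails, total) in counts.items() if total}

  if __name__ == '__main__':
      sample = ("Build failed.\n\n"
                "- tempest-dsvm-ironic-pxe-irmc http://logs.example.com/1/ : "
                "FAILURE in 1h 02m\n"
                "- tempest-dsvm-ironic-pxe-irmc http://logs.example.com/2/ : "
                "SUCCESS in 58m\n")
      for job, rate in sorted(failure_rates([sample]).items()):
          print("- job: %s\n  rate: %s" % (job, rate))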

I would like to start a discussion on how we (as a team) can help the people 
maintaining these CIs bring their failure rates closer to that of our virtual 
CI (< 30% of cases, judging by [2]).

I'm thinking of the following potential problems:

1. Our devstack plugin changes too often.

I've heard this complaint at least once. Should we maybe freeze our devstack 
plugin at some point to allow the vendor folks to catch up? We should then also 
look at the CI results more carefully when modifying it.

2. Our devstack plugin is inconvenient for hardware, and requires hacks.

This is something Miles (?) told me when trying to set up an environment for 
his hardware lab. If so, can we get a list of the pain points, preferably in 
the form of reported bugs? I, and hopefully other folks, can certainly dedicate 
some time to making your life easier.
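
To make the "hacks" concrete, this is roughly the kind of local.conf a hardware 
CI needs today. It is only a sketch: the IRONIC_IS_HARDWARE and 
IRONIC_HWINFO_FILE variable names are assumptions from memory (please check the 
plugin for the exact knobs), and the driver names and file path are 
placeholders.

  [[local|localrc]]
  enable_plugin ironic https://git.openstack.org/openstack/ironic
  enable_service ironic ir-api ir-cond

  # Deploy to real hardware instead of the default libvirt VMs
  # (assumed variable names, see above).
  IRONIC_IS_HARDWARE=True
  IRONIC_HWINFO_FILE=/opt/stack/hardware_info
  IRONIC_ENABLED_DRIVERS=fake,pxe_ipmitool
  IRONIC_DEPLOY_DRIVER=pxe_ipmitool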

3. The number of jobs, and of patches they run on, is too high.

I've noticed that 3rdparty CI runs even on patches that clearly don't require 
it, e.g. docs-only changes. I suggest that maintainers adopt exclude rules 
similar to [3].
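
For reference, the rules in [3] are Zuul skip-if entries; a 3rdparty layout 
could carry something along these lines (the job-name pattern and file list 
below are illustrative only):

  jobs:
    - name: ^dell-hw-tempest-dsvm-ironic-.*$
      skip-if:
        - project: ^openstack/ironic$
          all-files-match-any:
            - ^doc/.*$
            - ^releasenotes/.*$
            - ^.*\.rst$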

Also, most of the vendors run 3-4 jobs for different flavors of their drivers 
(and that number is going to increase with the driver composition work). I 
wonder if we should recommend switching from the ironic baremetal_basic_ops 
test to what we call the "standalone" tests [4]. That would allow a single job 
to test several drivers/combinations of interfaces within the same time frame.
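
As a rough illustration, a single such job could then limit itself to the 
standalone scenarios with something like the following (the exact regex may 
need adjusting):

  tempest run --regex 'ironic_tempest_plugin\.tests\.scenario\.ironic_standalone'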

Finally, I've proposed this topic for the virtual meetup [5] planned for the 
end of April. Please feel free to stop by and let us know how we can help.

Thanks,
Dmitry.

P.S.
I've seen expired or self-signed HTTPS certificates on the log sites of some 
3rdparty CIs. Please try to fix such issues as soon as possible so that the 
community can understand the failures.
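
For what it's worth, checking a log server's certificate takes one command 
(replace logs.example.com with your CI's log host):

  echo | openssl s_client -connect logs.example.com:443 \
      -servername logs.example.com 2>/dev/null \
      | openssl x509 -noout -issuer -subject -dates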

[1] https://github.com/dtantsur/ci-report
[2] http://paste.openstack.org/show/606467/
[3]
https://github.com/openstack-infra/project-config/blob/master/zuul/layout.yaml#L1375-L1385
[4]
https://github.com/openstack/ironic/blob/master/ironic_tempest_plugin/tests/scenario/ironic_standalone/test_basic_ops.py
[5] https://etherpad.openstack.org/p/ironic-virtual-meetup

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

