Re: [openstack-dev] [all] periodic jobs for master
On 25/10/14 00:16, James E. Blair wrote: Andrea Frittoli andrea.fritt...@gmail.com writes: I also believe we can find ways to make post-merge / periodic checks useful. We need to do that to keep the gate to a sane scale. Yes, we have a plan to do that that we outlined at the infra/QA meetup this summer and described to this list in this email: http://lists.openstack.org/pipermail/openstack-dev/2014-July/041057.html Particularly this part, but please read the whole message if you have not already, or have forgotten it: * For all non gold standard configurations, we'll dedicate a part of our infrastructure to running them in a continuous background loop, as well as making these configs available as experimental jobs. The idea here is that we'll actually be able to provide more configurations that are operating in a more traditional CI (post merge) context. People that are interested in keeping these bits functional can monitor those jobs and help with fixes when needed. The experimental jobs mean that if developers are concerned about the effect of a particular change on one of these configs, it's easy to request a pre-merge test run. In the near term we might imagine this would allow for things like ceph, mongodb, docker, and possibly very new libvirt to be validated in some way upstream. * Provide some kind of easy to view dashboards of these jobs, as well as a policy that if some job is failing for some period of time, it's removed from the system. We want to provide whatever feedback we can to engaged parties, but people do need to realize that engagement is key. The biggest part of putting tests into OpenStack isn't landing the tests, but dealing with their failures. I'm glad to see people interested in this. If you're ready to contribute to it, please stop by #openstack-infra or join our next team meeting[1] to discuss how you can help. I'm sorry I've missed the email that you referred to before.
Indeed, it looks like I'm not the first one who started to think about the matter. Summit wise, will there be any sessions where the subject will be discussed? -Jim [1] https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [all] periodic jobs for master
I'm sorry I've missed the email that you referred to before. Indeed, it looks like I'm not the first one who started to think about the matter. Summit wise, will there be any sessions where the subject will be discussed? Yes. About post merge CI: http://kilodesignsummit.sched.org/event/1e33d1f4896a52e2c02b062cfc18ba39#.VFEZqvmsV8E About moving functional test to projects: http://kilodesignsummit.sched.org/event/575938e4837e8293615845582d7e3e7f#.VFEaM_msV8E Andrea Frittoli (andreaf)
Re: [openstack-dev] [all] periodic jobs for master
On 22/10/14 12:07, Thierry Carrez wrote: Ihar Hrachyshka wrote: [...] For stable branches, we have so called periodic jobs that are triggered once in a while against the current code in a stable branch, and report to openstack-stable-maint@ mailing list. An example of failing periodic job report can be found at [2]. I envision that similar approach can be applied to test auxiliary features in gate. So once something is broken in master, the interested parties behind the auxiliary feature will be informed in due time. [...] The main issue with periodic jobs is that since they are non-blocking, they can get ignored really easily. It takes a bit of organization and process to get those failures addressed. It's only recently (and a lot thanks to you) that failures in the periodic jobs for stable branches are being taken into account quickly and seriously. For years the failures just lingered until they blocked someone's work enough for that person to go and fix them. So while I think periodic jobs are a good way to increase corner case testing coverage, I am skeptical of our collective ability to have the discipline necessary for them not to become a pain. We'll need a strict process around them: identified groups of people signed up to act on failure, and failure stats so that we can remove jobs that don't get enough attention. There should be an interest group behind each of the periodic jobs (maybe sometimes consisting of one person). Yes, jobs should be tracked, though I assume that if a group is really interested in a job, it will track it on a daily basis. Otherwise, we'll see the job rot and eventually be removed. Let's say anyone can propose a job for removal on the mailing list, and we'll assess case by case whether it's ok to remove it instead of e.g. fixing it (because we have no interested parties to track it). Another question to solve is how we disseminate state of those jobs. Do we create a separate mailing list for that?
Obviously we should not reuse -dev one, and it's overkill to create one mailing list per interest group. /Ihar
Re: [openstack-dev] [all] periodic jobs for master
Ihar Hrachyshka wrote: Another question to solve is how we disseminate state of those jobs. Do we create a separate mailing list for that? Obviously we should not reuse -dev one, and it's overkill to create one mailing list per interest group. Should we explore other avenues than email for this? If we plan to do opt-in anyway, would some status website/RSS not work better? The ideal system imho would be a status website where we could see failures and close them as handled so that everyone knows that a past FAIL result has already been fixed. That could help avoid duplication of painful debugging work. -- Thierry Carrez (ttx)
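The "close them as handled" idea can be pictured as a small data structure. The following is only a sketch under assumptions: `Failure`, `StatusBoard`, and their method names are hypothetical and do not correspond to any existing OpenStack infra tool; the point is just that each failed run carries a "handled by" marker that everyone can see.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Failure:
    """One failed periodic run (hypothetical record layout)."""
    job: str
    build_id: str
    handled_by: Optional[str] = None  # None means nobody has picked it up yet
    note: str = ""


class StatusBoard:
    """In-memory stand-in for the status website described above."""

    def __init__(self) -> None:
        self._failures: Dict[Tuple[str, str], Failure] = {}

    def record_failure(self, job: str, build_id: str) -> Failure:
        # Recording the same failed build twice keeps the first entry.
        return self._failures.setdefault((job, build_id), Failure(job, build_id))

    def mark_handled(self, job: str, build_id: str, who: str, note: str = "") -> None:
        # Closing a failure tells everyone else not to debug it again.
        failure = self._failures[(job, build_id)]
        failure.handled_by = who
        failure.note = note

    def open_failures(self) -> List[Failure]:
        # Failures still waiting for someone, in the order they were recorded.
        return [f for f in self._failures.values() if f.handled_by is None]
```

A real service would need persistence and an RSS or web front end; the sketch only shows the state that has to be shared to avoid duplicated debugging.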
Re: [openstack-dev] [all] periodic jobs for master
I also believe we can find ways to make post-merge / periodic checks useful. We need to do that to keep the gate to a sane scale. On 24 October 2014 17:33, Thierry Carrez thie...@openstack.org wrote: Ihar Hrachyshka wrote: Another question to solve is how we disseminate state of those jobs. Do we create a separate mailing list for that? Obviously we should not reuse -dev one, and it's overkill to create one mailing list per interest group. Should we explore other avenues than email for this? If we plan to do opt-in anyway, would some status website/RSS not work better? +1 The ideal system imho would be a status website where we could see failures and close them as handled so that everyone knows that a past FAIL result has already been fixed. That could help avoid duplication of painful debugging work. +1 Publicizing the test results better, and to the interested audience, will help a lot. The same goes for keeping a track record of fixed issues and solutions. Tracking result history at test level (using subunit2sql), and building and analyzing trends, would be a great tool to identify and troubleshoot failures. Also beneficial IMO would be extracting whatever information can be gathered automatically from the test results. Rather than saying job X failed, we could have tools that allow us to tell that test X started failing in a specific time range, and this is the list of sha1s that have been merged around that time. We will also discuss this topic in Paris in the QA track. Andrea Frittoli (andreaf)
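The "test X started failing in a specific time range, and this is the list of sha1s merged around that time" idea could look roughly like the sketch below. It is an illustration under assumptions, not the subunit2sql schema or API: `TestRun`, `Merge`, `failure_window`, and `suspect_sha1s` are hypothetical names, and the records are assumed to have already been pulled out of whatever store holds per-test results and merge times.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class TestRun:
    """One result of one test in one periodic run (hypothetical layout)."""
    test_id: str
    finished_at: datetime
    passed: bool


@dataclass(frozen=True)
class Merge:
    """One change merged to master (hypothetical layout)."""
    sha1: str
    merged_at: datetime


def failure_window(runs: Iterable[TestRun],
                   test_id: str) -> Optional[Tuple[datetime, datetime]]:
    """Return (last pass, first fail) for the most recent pass-to-fail flip.

    Returns None if the test never went from passing to failing.
    """
    history = sorted((r for r in runs if r.test_id == test_id),
                     key=lambda r: r.finished_at)
    window = None
    for prev, cur in zip(history, history[1:]):
        if prev.passed and not cur.passed:
            window = (prev.finished_at, cur.finished_at)
    return window


def suspect_sha1s(merges: Iterable[Merge],
                  window: Tuple[datetime, datetime],
                  slack: timedelta = timedelta(0)) -> List[str]:
    """List sha1s merged inside the window, widened by `slack` on both sides."""
    start, end = window
    return [m.sha1
            for m in sorted(merges, key=lambda m: m.merged_at)
            if start - slack <= m.merged_at <= end + slack]
```

Because periodic jobs run only once in a while, the window can contain many merges; the `slack` parameter is there because run timestamps and merge timestamps rarely line up exactly.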
Re: [openstack-dev] [all] periodic jobs for master
Andrea Frittoli andrea.fritt...@gmail.com writes: I also believe we can find ways to make post-merge / periodic checks useful. We need to do that to keep the gate to a sane scale. Yes, we have a plan to do that that we outlined at the infra/QA meetup this summer and described to this list in this email: http://lists.openstack.org/pipermail/openstack-dev/2014-July/041057.html Particularly this part, but please read the whole message if you have not already, or have forgotten it: * For all non gold standard configurations, we'll dedicate a part of our infrastructure to running them in a continuous background loop, as well as making these configs available as experimental jobs. The idea here is that we'll actually be able to provide more configurations that are operating in a more traditional CI (post merge) context. People that are interested in keeping these bits functional can monitor those jobs and help with fixes when needed. The experimental jobs mean that if developers are concerned about the effect of a particular change on one of these configs, it's easy to request a pre-merge test run. In the near term we might imagine this would allow for things like ceph, mongodb, docker, and possibly very new libvirt to be validated in some way upstream. * Provide some kind of easy to view dashboards of these jobs, as well as a policy that if some job is failing for some period of time, it's removed from the system. We want to provide whatever feedback we can to engaged parties, but people do need to realize that engagement is key. The biggest part of putting tests into OpenStack isn't landing the tests, but dealing with their failures. I'm glad to see people interested in this. If you're ready to contribute to it, please stop by #openstack-infra or join our next team meeting[1] to discuss how you can help. 
-Jim [1] https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting
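The removal policy quoted above ("if some job is failing for some period of time, it's removed from the system") can be sketched as a check over each job's run history. This is a minimal sketch, assuming results arrive as (finish time, passed) pairs per job; the 14-day grace period and the function names are placeholders chosen for illustration, since the message does not say how long "some period" is.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Optional, Tuple

Result = Tuple[datetime, bool]  # (finished_at, passed)


def failing_since(results: Iterable[Result]) -> Optional[datetime]:
    """Return when the current unbroken failing streak began.

    Returns None if there are no runs or the latest run passed.
    """
    streak_start = None
    for finished_at, passed in sorted(results):
        if passed:
            streak_start = None          # any success resets the streak
        elif streak_start is None:
            streak_start = finished_at   # first failure after a success
    return streak_start


def jobs_to_retire(history: Dict[str, List[Result]],
                   now: datetime,
                   grace: timedelta = timedelta(days=14)) -> List[str]:
    """List jobs that have failed continuously for at least `grace`."""
    flagged = []
    for job, results in history.items():
        start = failing_since(results)
        if start is not None and now - start >= grace:
            flagged.append(job)
    return sorted(flagged)
```

The output is only a list of candidates; whether a flagged job is actually removed or fixed remains a decision for the people who signed up to watch it.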
Re: [openstack-dev] [all] periodic jobs for master
Ihar Hrachyshka wrote: [...] For stable branches, we have so called periodic jobs that are triggered once in a while against the current code in a stable branch, and report to openstack-stable-maint@ mailing list. An example of failing periodic job report can be found at [2]. I envision that similar approach can be applied to test auxiliary features in gate. So once something is broken in master, the interested parties behind the auxiliary feature will be informed in due time. [...] The main issue with periodic jobs is that since they are non-blocking, they can get ignored really easily. It takes a bit of organization and process to get those failures addressed. It's only recently (and a lot thanks to you) that failures in the periodic jobs for stable branches are being taken into account quickly and seriously. For years the failures just lingered until they blocked someone's work enough for that person to go and fix them. So while I think periodic jobs are a good way to increase corner case testing coverage, I am skeptical of our collective ability to have the discipline necessary for them not to become a pain. We'll need a strict process around them: identified groups of people signed up to act on failure, and failure stats so that we can remove jobs that don't get enough attention. -- Thierry Carrez (ttx)
Re: [openstack-dev] [all] periodic jobs for master
On Wed, 22 Oct 2014, Thierry Carrez wrote: So while I think periodic jobs are a good way to increase corner case testing coverage, I am skeptical of our collective ability to have the discipline necessary for them not to become a pain. We'll need a strict process around them: identified groups of people signed up to act on failure, and failure stats so that we can remove jobs that don't get enough attention. It's a bummer that we often find ourselves turning to processes to make up for a lack of discipline. If that's how it has to be, how about we make sure the pain is easy to feel? So, for example, if there are periodic jobs on master and they've just failed for a project, how about we just close the gate for that project until the failure identified by the periodic job is fixed? -- Chris Dent tw:@anticdent freenode:cdent https://tank.peermore.com/tanks/cdent
Re: [openstack-dev] [all] periodic jobs for master
On 10/22/2014 06:07 AM, Thierry Carrez wrote: Ihar Hrachyshka wrote: [...] For stable branches, we have so called periodic jobs that are triggered once in a while against the current code in a stable branch, and report to openstack-stable-maint@ mailing list. An example of failing periodic job report can be found at [2]. I envision that similar approach can be applied to test auxiliary features in gate. So once something is broken in master, the interested parties behind the auxiliary feature will be informed in due time. [...] The main issue with periodic jobs is that since they are non-blocking, they can get ignored really easily. It takes a bit of organization and process to get those failures addressed. It's only recently (and a lot thanks to you) that failures in the periodic jobs for stable branches are being taken into account quickly and seriously. For years the failures just lingered until they blocked someone's work enough for that person to go and fix them. So while I think periodic jobs are a good way to increase corner case testing coverage, I am skeptical of our collective ability to have the discipline necessary for them not to become a pain. We'll need a strict process around them: identified groups of people signed up to act on failure, and failure stats so that we can remove jobs that don't get enough attention. While I share some of your skepticism, we have to find a way to make this work. Saying we are doing our best to ensure the quality of upstream OpenStack based on a single-tier of testing (the gate) that is limited to 40min runs is not plausible. Of course a lot more testing happens downstream but we can do better as a community. I think we should rephrase this subject as non-gating jobs. We could have various kinds of stress and longevity jobs running to good effect if we can solve this process problem. 
Following on your process suggestion, in practice the most likely way this could actually work is to have a rotation of build guardians that agree to keep an eye on jobs for a short period of time. There would need to be a separate rotation list for each project that has non-gating, project-specific jobs. This will likely happen as we move towards deeper functional testing in projects. The QA team would be the logical pool for a rotation of more global jobs of the kind I think Ihar was referring to. As for failure status, each of these non-gating jobs would have its own name so logstash could be used to debug failures. Do we already have anything that tracks failure rates of jobs? -David
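Both halves of this suggestion, a guardian rotation and per-job failure rates, are simple to compute once the inputs exist. The sketch below is illustrative only: the function names, the one-week shift length, and the (job name, passed) input format are assumptions, not an existing infra or logstash interface.

```python
from collections import Counter
from datetime import date
from typing import Dict, Iterable, Sequence, Tuple


def guardian_on(day: date,
                guardians: Sequence[str],
                rotation_start: date,
                shift_days: int = 7) -> str:
    """Return who watches the non-gating jobs on `day` (round-robin shifts)."""
    if not guardians:
        raise ValueError("rotation list is empty")
    if day < rotation_start:
        raise ValueError("day is before the rotation started")
    shift = (day - rotation_start).days // shift_days
    return guardians[shift % len(guardians)]


def failure_rates(results: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Map each job name to its fraction of failed runs.

    `results` is an iterable of (job_name, passed) pairs.
    """
    runs: Counter = Counter()
    failures: Counter = Counter()
    for job, passed in results:
        runs[job] += 1
        if not passed:
            failures[job] += 1
    return {job: failures[job] / runs[job] for job in runs}
```

One rotation list per project, as suggested above, would simply mean one `guardians` sequence per project, with the QA team's list covering the more global jobs.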
[openstack-dev] [all] periodic jobs for master
Hi all, introducing a new auxiliary feature (e.g. a new messaging backend; some specific configuration of common services, like multiple workers in neutron; a new db driver supported by oslo.db; a plugin that lacks its own third-party CI like linuxbridge in neutron...) in infra usually means creating a separate job that is gating all the patches (sometimes non-voting). It requires a lot of resources on infra side, and for voting jobs, it increases chance of the whole run to fail due to intermittent problems in the gate. So there is a push to avoid adding more gating jobs to projects. I fully support that approach, though I doubt that we should leave the code without any kind of integration testing against master. Lack of such testing means it's hard to propose a change in default components used in gate (like a switch to an eventlet aware db driver that I try to pursue [1]). For stable branches, we have so called periodic jobs that are triggered once in a while against the current code in a stable branch, and report to openstack-stable-maint@ mailing list. An example of failing periodic job report can be found at [2]. I envision that similar approach can be applied to test auxiliary features in gate. So once something is broken in master, the interested parties behind the auxiliary feature will be informed in due time. Now, we could say that functional testing for a component that includes the feature should be enough. But it doesn't seem like the approach is applicable either for system wide changes like switching to Qpid, or running all services against another db driver, or for cases when the service to be tested with a new feature is tightly coupled with core (another neutron plugin). Note that I may miss something infra side, e.g. the approach may actually already be applied in some cases unknown to me, or there are some concerns with the approach. Tell me.
[1]: https://review.openstack.org/#/c/125044/ [2]: http://lists.openstack.org/pipermail/openstack-stable-maint/2014-October/002794.html Cheers, /Ihar