Hi,

Sorry for the slow reply, I'm currently on vacation.

I think we should include the infra mailing list on this discussion so I've cc'd them here. If it's off topic we can take this off list again, however I feel like we may be duplicating efforts at the moment.

Re people not using zuul, the brainstormed idea from the infra team during the summit was to have a generic rest endpoint that can take results (and then do stats/graphs etc). Zuul would post to this endpoint as a reporter, but there would be nothing stopping others from implementing their own report posts.

Anyway there looks like there is good discussion on the etherpad.

Cheers,
Josh

________________________________
From: Steve Weston [[email protected]]
Sent: Sunday, November 09, 2014 7:00 AM
To: Duncan Thomas; Joshua Hesketh
Cc: [email protected]; [email protected]; Kurt Taylor; Anita Kuno
Subject: Re: [third-party] CI Monitoring Tool

The etherpad has been created

https://etherpad.openstack.org/p/Third-Party-CI-Dashboard-InitialPlanning

I have included my input on introducing a calibration service which the CI systems would use before running a patchset. The idea is this: each project would define one or more jobs which the CI system would run to make sure it is working correctly, and in synchronization with Jenkins, before reporting an errant result. I believe that this would greatly improve the stability of CI and allow problems to be fixed before the CI system runs the patch.

Thoughts, comments, and input are welcome!

Thanks,
Steve

On 11/7/14 7:58 PM, Steve Weston wrote:

I have already begun work on the code for this project, and yesterday I did write a small bit of code which implements a REST API in the Django REST framework. Although my plan was to expose the data collected by the dashboard to other services, this framework can be modified to additionally be used to act as sort of a check-in service as Josh wrote about below.

Tomorrow I will create an etherpad so that folks may start listing out their ideas for how this dashboard will work.
I will send out a link once I have it.

Thanks,
Steve

On 11/7/14 7:53 PM, Steve Weston wrote:

+ Anita

On 11/7/14 5:34 PM, Duncan Thomas wrote:

So it is worth noting that not every third party ci is using Zuul. I think scraping gerrit (even into a db to run queries about) is a better way forward than adding something else to the ci requirements

Duncan Thomas

On Nov 7, 2014 4:41 PM, "Joshua Hesketh" <[email protected]> wrote:

Hi Kurt,

Thanks for kicking this conversation off. I wonder if the -infra list would be a good place to include more.

So I believe, although we're still brainstorming etc, the vague infra plan is to have a dashboard service with API endpoints that a zuul reporter can talk to. Then all 1st + 3rd parties would report to that and therefore have a dashboard populated and statistics generated etc. So that's kind of the long term plan that will give us some more useful data we can dive into.

However, for the moment I think having a simple gerrit-bot-status dashboard (as you have described) will at least help in terms of assessing the health of the systems.

I don't think anybody in particular is working on radar so we could probably consume that repository. We should get Michael Still's okay first though (since he's the original author).

Cheers,
Josh

________________________________________
From: Kurt Taylor [[email protected]]
Sent: Saturday, November 08, 2014 1:06 AM
To: [email protected]; Joshua Hesketh; [email protected]; [email protected]; [email protected]
Subject: [third-party] CI Monitoring Tool

In the third-party summit session, we discussed the need for CI systems to have a status dashboard [1].
However, it seems that there are multiple people writing a CI monitoring tool; let's level set:

- Josh has written a gerrit event gatherer [2]
- Duncan has too
- Steve has too (I have not yet talked to Steve)
- Radar has a command line scraper; we can remove it and just use radar gauges with one of the API backends above, fairly simple [3]
- Nova also discussed CI monitoring and status reporting [4]. Matt owns? a requirement for Nova to implement CI monitoring (I have not yet talked to Matt)

[1] https://etherpad.openstack.org/p/kilo-third-party-items
[2] https://github.com/stackforge/turbo-hipster/blob/master/tools/zuul_enqueue.py
[3] https://github.com/rcbau/radar/blob/master/report.py
[4] https://etherpad.openstack.org/p/nova-ci-status-checkpoint-kilo

From conversations with Josh and Duncan, we believe that a good initial plan is to diff a CI system's result for a patch against what Jenkins reported. If it failed and the results differ, collect 5? (or 3?) such failures, then re-queue a last known successful patch run. If that fails, the CI system is not working properly. I believe that covers 95%, maybe higher, of scenarios.

I like Josh's idea to just have a browser page refresh kick off a sample collection and report via radar gauges. Start simple, then we could ask infra to have cron fire off gathering once every 20 minutes or so, then maybe push this data to a database, and so on.

So, the question is, do we create a new GitHub repo for a new tool? Reuse the Radar repo? Let's get skeleton code somewhere (no preference) and then we can get more involvement and figure out where this should live.

We should create a spec in openstack-infra. If we agree, I'll be happy to shepherd that.

Comments?

Kurt Taylor (krtaylor)
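
For illustration only, the initial plan above (compare against Jenkins, collect a few disagreeing failures, then re-run a known-good patch) could be sketched roughly as below. This is a minimal sketch, not code from any of the tools mentioned: the `Result` type, the function names, and the `rerun_known_good` callback are all hypothetical, the threshold of 5 is just one of the values floated above, and counting failures across the whole sample (rather than only consecutive ones) is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Result:
    """One CI verdict on a change, paired with Jenkins' verdict (hypothetical type)."""
    change: str           # change/patchset identifier
    ci_passed: bool       # what the third-party CI reported
    jenkins_passed: bool  # what Jenkins reported for the same change


def count_disagreeing_failures(results: Iterable[Result]) -> int:
    """Count results where the CI failed but Jenkins passed ("failed and different")."""
    return sum(1 for r in results if not r.ci_passed and r.jenkins_passed)


def ci_looks_broken(
    results: Iterable[Result],
    rerun_known_good: Callable[[], bool],
    threshold: int = 5,  # assumption: the thread suggests 5 or possibly 3
) -> bool:
    """Return True if the CI system appears not to be working properly.

    Below the threshold nothing is concluded. At or above it, the last known
    successful patch is re-run via the callback; only if that run also fails
    is the CI system flagged.
    """
    if count_disagreeing_failures(results) < threshold:
        return False
    return not rerun_known_good()


if __name__ == "__main__":
    # Five changes where the CI failed and Jenkins passed.
    sample = [Result(f"change-{i}", ci_passed=False, jenkins_passed=True) for i in range(5)]
    # The lambda stands in for actually re-queueing a known-good patch.
    print(ci_looks_broken(sample, rerun_known_good=lambda: False))
```

Keeping the re-run behind a callback means the same check could sit behind a page refresh, a cron job, or a reporter posting to a REST endpoint, whichever the group settles on.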
_______________________________________________
OpenStack-Infra mailing list
[email protected]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra
