Re: [openstack-dev] The recent gate performance and how it affects you

Matt Riedemann Wed, 20 Nov 2013 13:51:11 -0800


On Wednesday, November 20, 2013 2:44:52 PM, Clark Boylan wrote:

Joe Gordon has been doing great work tracking test failures and how
often they affect us. Post Havana release the failure rate has
increased dramatically, negatively affecting the gate and forcing it to
run in a near worst case scenario. That is, changes are being tested in
parallel, but the head of the queue is more often than not running into a
failed job, forcing all changes behind it to be retested, and so on.
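
To make the cost of that scenario concrete, here is a toy model of a
gate queue. It is only a sketch under simplifying assumptions, not Zuul's
actual implementation: each change either passes or fails once and for
all (real gate failures are often intermittent), every change in the
queue is counted as one test run per round, and a failing change is
simply dropped while everything behind it is retested. The name
simulate_gate is made up for illustration.

```python
def simulate_gate(outcomes):
    """Toy model of a gate queue (not Zuul's real algorithm).

    outcomes: list of bools, one per queued change, True if the change
    passes its jobs. Returns (merged, test_runs).
    """
    queue = list(outcomes)
    merged = 0
    test_runs = 0
    while queue:
        # Every queued change is tested in parallel, each one on top of
        # the changes ahead of it.
        test_runs += len(queue)
        if False not in queue:
            # Nothing failed, so the whole queue merges.
            merged += len(queue)
            break
        first_fail = queue.index(False)
        # Changes ahead of the failure merge; the failing change is
        # dropped and everything behind it has to be retested.
        merged += first_fail
        queue = queue[first_fail + 1:]
    return merged, test_runs


if __name__ == "__main__":
    print(simulate_gate([True] * 5))                       # (5, 5)
    print(simulate_gate([False, True, True, True, True]))  # (4, 9)
    print(simulate_gate([False] * 5))                      # (0, 15)
```

In this model a clean queue of n changes costs n test runs, while a
queue where the head fails every round costs n(n+1)/2 runs and merges
nothing, which is the "near worst case" shape described above.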

This led to a gate queue 130 deep with the head of the queue 18 hours
behind its approval. We have identified fixes for some of the worst
current bugs and, in order to get them in, have restarted Zuul, effectively
cancelling the gate queue, and have queued these changes up at the front
of the queue. Once these changes are in and we are happy with the bug
fixing results, we will requeue the changes that were in the queue when it
got cancelled.

How do we avoid this in the future? Step one is that reviewers who are
approving changes (or reverifying them) should keep an eye on the gate
queue. If it is struggling, adding more changes to that queue probably
won't help. Instead we should focus on identifying the bugs, submitting
changes to elastic-recheck to track these bugs, and working towards fixing
the bugs. Everyone is affected by persistent gate failures; we need to
work together to fix them.

Thank you for your patience,

Clark

_______________________________________________
OpenStack-dev mailing list
[email protected]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Let me also say that I think it's really helpful that Joe has been sending out recaps to the mailing list about the top offenders so people can help pitch in on investigating and fixing those (like we saw with the Neutron team's response to Joe's recent post about the top gate failures).

People get heads-down in their own projects and what they are working on, and it's hard to keep up with what's going on in the infra channel (or nova channel for that matter), so sending out a recap that everyone can see on the mailing list is helpful to reset where things are at and to focus what may be various isolated investigations (as we saw happen this week).


--

Thanks,

Matt Riedemann


