Hello,

We have a few particularly annoying bugs that have been impacting the reliability of gate testing recently. It would be great if we could get volunteers to look at these bugs to improve the reliability of our testing as we start working on Pike.

These two issues have been identified by elastic-recheck as being our biggest problems:

1. SSH Banner bug
http://status.openstack.org/elastic-recheck/#1349617

This bug is a longstanding issue that comes and goes and also has lots of very similar (but subtly different) failure modes. Tempest attempts to ssh into the cirros guest, is unable to after 18 attempts over the 300 second timeout window, and fails to log in. Paramiko reports that there was an issue reading the banner returned on port 22 from the guest. This indicates that something is likely responding on port 22. We're working on trying to get more details on what the cause is here with: https://review.openstack.org/437128

2. Libvirt crashes
http://status.openstack.org/elastic-recheck/#1643911 and
http://status.openstack.org/elastic-recheck/#1646779

Libvirt is randomly crashing during the job, which causes things to fail (for obvious reasons). Addressing this will likely require someone with experience debugging libvirt, since it's most likely a bug isolated to libvirt. Tonyb has offered to start working on this, so talk to him to coordinate efforts around fixing it.

The other thing to note is the oom-killer bug:
http://status.openstack.org/elastic-recheck/gate.html#1656386

While there aren't a lot of hits in logstash for this particular bug, it does raise an important issue about the increased memory pressure on the test nodes. It's likely that a lot of the instability is related to the increased load on the nodes. As a starting point, all projects should look at their memory footprint and see where they can trim things to try and make the situation better.

As a friendly reminder, we do track how often each bug occurs in our testing using the elastic-recheck tool. You can find that data at http://status.openstack.org/elastic-recheck. It can be quite useful to start there when determining which bugs to fix based on impact.
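
For anyone picking up the SSH banner bug described above, one way to narrow down the failure mode is to check what actually answers on port 22 of the guest. The following is only a rough sketch using the Python standard library. It is not what Tempest or paramiko do internally, and the function name probe_ssh_banner, the status strings, and the default timeout are all made up for illustration. It tries to separate "nothing accepts the connection" from "something accepts but never sends a banner" from "a banner arrives".

```python
import socket


def probe_ssh_banner(host, port=22, timeout=5.0):
    """Classify how a TCP port answers an SSH-style banner read.

    Hypothetical helper, not part of Tempest or paramiko.
    Returns (status, banner) where status is one of:
      "unreachable" - the TCP connection could not be established
      "no_banner"   - connected, but no data arrived before the timeout
      "not_ssh"     - data arrived, but the first line lacks "SSH-"
      "ok"          - the first line starts with "SSH-"
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Covers connection refused, no route, and connect timeouts.
        return ("unreachable", None)

    data = b""
    with sock:
        sock.settimeout(timeout)
        try:
            # Read until the first newline; an SSH identification
            # string is limited to 255 bytes.
            while b"\n" not in data and len(data) < 255:
                chunk = sock.recv(255 - len(data))
                if not chunk:
                    break  # peer closed the connection
                data += chunk
        except OSError:
            # Read timeout or reset; keep whatever was received.
            pass

    if not data:
        return ("no_banner", None)

    # Only the first line is inspected; a server is allowed to send
    # other lines before its identification string, which this
    # sketch would report as "not_ssh".
    line = data.split(b"\n", 1)[0].decode("ascii", errors="replace").strip()
    if line.startswith("SSH-"):
        return ("ok", line)
    return ("not_ssh", line)
```

Running something like probe_ssh_banner(guest_ip) from the test node while a failure is in progress (guest_ip being whatever address Tempest was trying to reach) would at least show whether the guest's sshd is silent or whether something else is answering on the port.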

Elastic-recheck also maintains a list of failures that occurred without a known signature:
http://status.openstack.org/elastic-recheck/data/integrated_gate.html

We also need some people to help maintain the list of existing queries. We have a lot of queries for closed bugs that have no hits, and others that are overly broad and match failures unrelated to the bug. This would also be a good task for a new person to start getting involved with. Feel free to submit patches to https://git.openstack.org/cgit/openstack-infra/elastic-recheck/tree/queries to track new issues.

Thank you,

mtreinish and clarkb