On 6/10/2014 5:36 AM, Michael Still wrote:
https://review.openstack.org/99002 adds more logging to nova/network/manager.py, but I think you're not going to love the debug log level. Was this the sort of thing you were looking for though?

Michael

On Mon, Jun 9, 2014 at 11:45 PM, Sean Dague <[email protected]> wrote:

Based on some back of envelope math the gate is basically processing 2 changes an hour, failing one of them. So if you want to know how long the gate is, take the length / 2 in hours.

Right now we're doing a lot of revert roulette, trying to revert things that we think landed about the time things went bad. I call this roulette because in many cases the actual issue isn't well understood. A key reason for this is:

*nova network is a blackhole*

There is no work unit logging in nova-network, and no attempted verification that the commands it ran did a thing. Most of these failures that we don't have good understanding of are the network not working under nova-network.

So we could *really* use a volunteer or two to prioritize getting that into nova-network. Without it we might manage to turn down the failure rate by reverting things (or we might not) but we won't really know why, and we'll likely be here again soon.

-Sean

--
Sean Dague
http://dague.net

_______________________________________________
OpenStack-dev mailing list
[email protected]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
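To make the "work unit logging" and "verification" request concrete, here is a minimal sketch of what that could look like. It is not nova's actual command helper: run_logged and the verify hook are hypothetical names made up for illustration, and it assumes Python 3 (subprocess.run), which is not what nova ran on at the time.

```python
import logging
import subprocess
import time

LOG = logging.getLogger(__name__)


def run_logged(cmd, verify=None):
    """Run cmd as one logged work unit and optionally verify its effect.

    cmd    -- argument list, e.g. ["ip", "addr", "show"]
    verify -- optional zero-argument callable (hypothetical hook) that
              returns True if the command had the intended effect
    """
    printable = " ".join(cmd)
    start = time.monotonic()
    LOG.debug("work unit start: %s", printable)

    proc = subprocess.run(cmd, capture_output=True, text=True)

    elapsed = time.monotonic() - start
    LOG.debug("work unit end: rc=%d elapsed=%.3fs stderr=%r",
              proc.returncode, elapsed, proc.stderr.strip())

    # A non-zero exit code is a failure of the command itself.
    if proc.returncode != 0:
        raise RuntimeError("command failed: %s" % printable)

    # A zero exit code does not prove the change took effect, so check.
    if verify is not None and not verify():
        LOG.error("command ran but verification failed: %s", printable)
        raise RuntimeError("verification failed: %s" % printable)

    return proc.stdout
```

The point of the separate verify step is that a command can exit 0 and still leave the network in the wrong state; logging start, end, and the verification result is what would let us tell those cases apart in the n-net logs.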
I also mentioned this in the nova meeting today: the associated bug for the nova-network ssh timeout issue is bug 1298472 [1].
My latest theory on that one is that there could be a race or network leak in the ec2 third party tests in Tempest, or something in the ec2 API in nova, because I saw this [2] showing up in the n-net logs. My thinking is that the tests or the API are not tearing down cleanly, so network resources eventually leak and we start hitting those timeouts. It's just a theory at this point, but the ec2 3rd party tests do run concurrently with the scenario tests, so things could be colliding there. I haven't had time to dig into it, and I have very little experience with those tests or the ec2 API in nova.
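One way to check the leak theory would be to snapshot the network resources before a test and again after its teardown, then compare. A rough sketch of just the comparison step follows; leaked_resources is a hypothetical helper, the IDs are made up, and how the snapshots would actually be collected from nova-network is left open.

```python
def leaked_resources(before, after):
    """Return resource IDs present after teardown that were absent before."""
    return sorted(set(after) - set(before))


# Made-up IDs for illustration only.
before = ["net-1", "fixed-ip-3"]
after = ["net-1", "fixed-ip-3", "fixed-ip-7"]
print(leaked_resources(before, after))  # ['fixed-ip-7']
```

Because the ec2 and scenario tests run concurrently, a non-empty result could also come from a resource another test still legitimately holds, so this would only be a starting point.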
[1] https://bugs.launchpad.net/tempest/+bug/1298472
[2] http://goo.gl/6f1dfw

--

Thanks,

Matt Riedemann
