On 9/7/2014 8:39 AM, John Schwarz wrote:
Hi,

Long story short: for future reference, if you initialize an eventlet Timeout, make sure you cancel it (either with a context manager or simply with timeout.cancel()), and be extra careful when writing tests that use eventlet Timeouts, because these timeouts don't implicitly expire and will cause unexpected behaviour (see [1]), such as gate failures. In our case this caused non-deterministic failures in the dsvm-functional test suite.

Late last week, a bug was found ([2]) in which an eventlet Timeout object was initialized but never cancelled. The instance was left inside eventlet's inner workings and triggered non-deterministic "Timeout: 10 seconds" errors and failures in the dsvm-functional tests.

As mentioned earlier, initializing a new eventlet.timeout.Timeout instance also registers it with mechanisms internal to the library, and the reference remains there until it is explicitly removed (and not merely until execution leaves the function's scope, as some might have thought). Thus the old code, which simply created an instance without assigning it to a variable, left no way to cancel the timeout. The reference persists for the lifetime of a worker, so it can (and did) affect other tests and procedures using eventlet in the same process. Obviously this could just as easily affect production-grade systems under very high load.

For future reference:

1) If you run into a "Timeout: %d seconds" exception whose traceback includes "hub.switch()" and "self.greenlet.switch()" calls, there might be a latent Timeout somewhere in the code, and a search for all eventlet.timeout.Timeout instances will probably turn up the culprit.

2) The setup used to reproduce this error for debugging purposes is a bare-metal machine running a VM with devstack. On the bare-metal machine I ran six "dd if=/dev/zero of=/dev/null" processes to simulate high CPU load (the full command can be found at [3]), and in the VM I ran the dsvm-functional suite.
Using only a VM with a similar high-CPU simulation fails to reproduce the failure.

[1] http://eventlet.net/doc/modules/timeout.html#eventlet.timeout.eventlet.timeout.Timeout.Timeout.cancel
[2] https://review.openstack.org/#/c/119001/
[3] http://stackoverflow.com/questions/2925606/how-to-create-a-cpu-spike-with-a-bash-command

--
John Schwarz,
Software Engineer, Red Hat.

_______________________________________________
OpenStack-dev mailing list
[email protected]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Thanks, that might be what's causing this timeout/gate failure in the nova unit tests. [1]
[1] https://bugs.launchpad.net/nova/+bug/1357578

--
Thanks,
Matt Riedemann
