Re: [openstack-dev] [neutron] [nova] non-deterministic gate failures due to unclosed eventlet Timeouts

2014-09-10 Thread Angus Lees
On Mon, 8 Sep 2014 05:25:22 PM Jay Pipes wrote:
 On 09/07/2014 10:43 AM, Matt Riedemann wrote:
  On 9/7/2014 8:39 AM, John Schwarz wrote:
  Hi,
  
  Long story short: for future reference, if you initialize an eventlet
  Timeout, make sure you cancel it (either with a context manager or
  simply timeout.cancel()), and be extra careful when writing tests that
  use eventlet Timeouts, because these timeouts are not cancelled
  implicitly and will cause unexpected behaviours (see [1]) like gate
  failures. In our case this caused non-deterministic failures in the
  dsvm-functional test suite.
  
  
  Late last week, a bug was found ([2]) in which an eventlet Timeout
  object was initialized but never cancelled. This instance was left
  inside eventlet's internals and triggered non-deterministic
  "Timeout: 10 seconds" errors and failures in dsvm-functional tests.
  
  As mentioned earlier, initializing a new eventlet.timeout.Timeout
  instance also registers it with eventlet's internal machinery, and the
  reference remains there until it is explicitly removed (not merely
  until execution leaves the enclosing function, as some might expect).
  Thus, the old code (simply creating an instance without assigning it
  to a variable) left no way to cancel the timeout object. The reference
  remains for the life of the worker, so it can (and did) affect other
  tests and procedures using eventlet in the same process. This could
  just as easily affect production-grade systems under very high load.
  
  For future reference:
  
  1) If you run into a "Timeout: %d seconds" exception whose traceback
     includes hub.switch() and self.greenlet.switch() calls, there might
     be a latent Timeout somewhere in the code, and a search for all
     eventlet.timeout.Timeout instances will probably produce the culprit.
  
  2) The setup used to reproduce this error for debugging purposes is a
     bare-metal machine running a VM with devstack. On the bare-metal
     machine I ran about six "dd if=/dev/zero of=/dev/null" processes to
     simulate high CPU load (full command can be found at [3]), and in
     the VM I ran the dsvm-functional suite. Using only a VM with a
     similar high-CPU simulation fails to reproduce the failure.
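
For anyone wanting to script the CPU-load part of that setup, here is a
rough sketch. It is not the full command referenced at [3]; it only assumes
that each dd runs as its own process and that dd is on the PATH. The
function name burn_cpu and its parameters are hypothetical.

```python
import subprocess
import time


def burn_cpu(workers=6, seconds=60.0):
    """Run `workers` copies of dd for `seconds`, then stop them.

    Returns the exit status of every dd process.
    """
    cmd = ["dd", "if=/dev/zero", "of=/dev/null"]
    procs = [
        subprocess.Popen(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        for _ in range(workers)
    ]
    try:
        # Let the dd processes spin while the test suite runs elsewhere.
        time.sleep(seconds)
    finally:
        # Always clean up, even if the sleep is interrupted.
        for proc in procs:
            proc.terminate()
        codes = [proc.wait() for proc in procs]
    return codes
```

Calling burn_cpu(workers=6, seconds=600) on the host while the functional
suite runs in the VM would approximate the setup described above; the
duration is a guess and should be sized to the test run.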
  
  [1] http://eventlet.net/doc/modules/timeout.html#eventlet.timeout.eventlet.timeout.Timeout.Timeout.cancel
  
  [2] https://review.openstack.org/#/c/119001/
  
  [3] http://stackoverflow.com/questions/2925606/how-to-create-a-cpu-spike-with-a-bash-command
  
  
  
  --
  John Schwarz,
  Software Engineer, Red Hat.
  
  
  ___
  OpenStack-dev mailing list
  OpenStack-dev@lists.openstack.org
  http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
  
  Thanks, that might be what's causing this timeout/gate failure in the
  nova unit tests. [1]
  
  [1] https://bugs.launchpad.net/nova/+bug/1357578
 
 Indeed, there are a couple of places where eventlet.timeout.Timeout()
 seems to be used in the test suite without a context manager or an
 explicit cancel() call:
 
 tests/virt/libvirt/test_driver.py
 8925:raise eventlet.timeout.Timeout()
 
 tests/virt/hyperv/test_vmops.py
 196:mock_with_timeout.side_effect = etimeout.Timeout()

If it's useful for anyone, I wrote a quick pylint check that will catch all
the above cases of misused context managers.

(Indeed, it will currently trigger on the raise Timeout() case, which is
probably too eager but can be disabled in the usual #pylint meta-comment way.)

Here: https://review.openstack.org/#/c/120320/ 
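
To give a feel for what such a check does, here is a standalone sketch
using only the standard-library ast module. It is not the checker from the
review above and does not use pylint's plugin API. The function names and
the rule it applies (any call to something named Timeout that is not the
context expression of a with statement) are assumptions for illustration.

```python
import ast


def _is_timeout_call(node):
    """True for calls such as Timeout(...) or eventlet.timeout.Timeout(...)."""
    if not isinstance(node, ast.Call):
        return False
    func = node.func
    if isinstance(func, ast.Attribute):
        return func.attr == "Timeout"
    return getattr(func, "id", None) == "Timeout"


def find_unmanaged_timeouts(source):
    """Return line numbers of Timeout(...) calls not used as `with` items."""
    tree = ast.parse(source)
    managed = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.With, ast.AsyncWith)):
            for item in node.items:
                if _is_timeout_call(item.context_expr):
                    managed.add(id(item.context_expr))
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        if _is_timeout_call(node) and id(node) not in managed
    )


SAMPLE = '''\
import eventlet

def leaky():
    eventlet.timeout.Timeout(10)

def safe():
    with eventlet.timeout.Timeout(10):
        pass

def test_case():
    raise eventlet.timeout.Timeout()
'''

print(find_unmanaged_timeouts(SAMPLE))  # -> [4, 11]
```

Like the check described above, this sketch also flags the raise Timeout()
line, so a real checker would want a way to whitelist that pattern.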

-- 
 - Gus



Re: [openstack-dev] [neutron] [nova] non-deterministic gate failures due to unclosed eventlet Timeouts

2014-09-10 Thread Miguel Angel Ajo Pelayo
Good catch John, and good work Angus! ;)

This will save a lot of headaches.




Re: [openstack-dev] [neutron] [nova] non-deterministic gate failures due to unclosed eventlet Timeouts

2014-09-09 Thread Kevin L. Mitchell
On Mon, 2014-09-08 at 17:25 -0400, Jay Pipes wrote:
  Thanks, that might be what's causing this timeout/gate failure in the
  nova unit tests. [1]
 
  [1] https://bugs.launchpad.net/nova/+bug/1357578
 
 Indeed, there are a couple of places where eventlet.timeout.Timeout()
 seems to be used in the test suite without a context manager or an
 explicit cancel() call:
 
 tests/virt/libvirt/test_driver.py
 8925:raise eventlet.timeout.Timeout()
 
 tests/virt/hyperv/test_vmops.py
 196:mock_with_timeout.side_effect = etimeout.Timeout()

I looked into that too, but the docs for Timeout indicate that it's an
Exception subclass, and passing it no args doesn't seem to start the
timer running.  I think you have to explicitly pass a duration value for
Timeout to enable its timeout behavior, but that's just a guess on my
part at this point…
-- 
Kevin L. Mitchell kevin.mitch...@rackspace.com
Rackspace




Re: [openstack-dev] [neutron] [nova] non-deterministic gate failures due to unclosed eventlet Timeouts

2014-09-08 Thread Jay Pipes

On 09/07/2014 10:43 AM, Matt Riedemann wrote:

Thanks, that might be what's causing this timeout/gate failure in the
nova unit tests. [1]

[1] https://bugs.launchpad.net/nova/+bug/1357578


Indeed, there are a couple of places where eventlet.timeout.Timeout()
seems to be used in the test suite without a context manager or an
explicit cancel() call:


tests/virt/libvirt/test_driver.py
8925:raise eventlet.timeout.Timeout()

tests/virt/hyperv/test_vmops.py
196:mock_with_timeout.side_effect = etimeout.Timeout()

Best,
-jay



Re: [openstack-dev] [neutron] [nova] non-deterministic gate failures due to unclosed eventlet Timeouts

2014-09-07 Thread Matt Riedemann



Thanks, that might be what's causing this timeout/gate failure in the 
nova unit tests. [1]


[1] https://bugs.launchpad.net/nova/+bug/1357578

--

Thanks,

Matt Riedemann

