Copy/paste from https://bugzilla.redhat.com/show_bug.cgi?id=1384183#c10:
In the oslo.messaging rabbit driver, there is a thread for heartbeats: https://git.openstack.org/cgit/openstack/oslo.messaging/tree/oslo_messaging/_drivers/impl_rabbit.py?id=293084f9b5ef59c4b2dcd8c7180f7053b47337a9#n968 specifically, the three methods, _heartbeat_start, _heartbeat_stop, and _heartbeat_thread_job. In _heartbeat_start, it creates a threading.Event (this event ends up being the root of the problem), a threading.Thread for the heartbeat thread, and then starts the heartbeat thread. The heartbeat thread does work, and then at the end executes: self._heartbeat_exit_event.wait( timeout=self._heartbeat_wait_timeout) The _heartbeat_stop method sets the exit event when the connection is stopped, this is where it receives that and shuts down. With the default heartbeat configuration, _heartbeat_wait_timeout ends up being 15.0 seconds. This sounds reasonable, but explodes under the hood. Event.wait() ultimately calls Condition.wait(), which runs this gem[1] of code: # Balancing act: We can't afford a pure busy loop, so we # have to sleep; but if we sleep the whole timeout time, # we'll be unresponsive. The scheme here sleeps very # little at first, longer as time goes on, but never longer # than 20 times per second (or the timeout time remaining). endtime = _time() + timeout delay = 0.0005 # 500 us -> initial delay of 1 ms while True: gotit = waiter.acquire(0) if gotit: break remaining = endtime - _time() if remaining <= 0: break delay = min(delay * 2, remaining, .05) _sleep(delay) Now, the problem is that _sleep here is time.sleep, which has been previously monkey-patched by eventlet to be eventlet.greenthread.sleep instead. When you call sleep() in eventlet, it adds a timer onto an internal heap that it maintains[2]. When the main event loop runs, it checks whatever the shortest timer is, and uses that delta as the argument for waiting (this ultimately ends up to epoll_wait). Here's the crucial part of the code[3]: wakeup_when = self.sleep_until() if wakeup_when is None: sleep_time = self.default_sleep() else: sleep_time = wakeup_when - self.clock() if sleep_time > 0: self.wait(sleep_time) else: self.wait(0) Because Condition.wait() uses such small sleep times between 0.0005..0.05, these tiny timers are almost always the next timer to fire, and sleep_time above ends up being negative (or small enough to practically be zero anyways). The self.wait() call propagates down to epoll_wait() and you get the behavior we're seeing. So, the root of the problem is that using threading.Event inside of eventlet is bad. [1] https://hg.python.org/cpython/file/v2.7.12/Lib/threading.py#l344 [2] https://github.com/eventlet/eventlet/blob/7343f2e817fbc09d07c36727dfb2e4655a37901f/eventlet/hubs/hub.py#L389-L392 [3] https://github.com/eventlet/eventlet/blob/7343f2e817fbc09d07c36727dfb2e4655a37901f/eventlet/hubs/hub.py#L340-L348 -- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu. https://bugs.launchpad.net/bugs/1518430 Title: liberty: ~busy loop on epoll_wait being called with zero timeout To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1518430/+subscriptions -- ubuntu-bugs mailing list [email protected] https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
