Yet another discovery in our 10G production deployment.
Under heavy event load, a Ryu application's event queue can fill up.

We used eventlet.backdoor to inspect the state of the greenlets, and found
many of them stuck like this:

(3311, <eventlet.greenthread.GreenThread object at 0x64eb230>)
 File "/opt/plexus/lib/python2.6/site-packages/eventlet/greenthread.py", line 214, in main
   result = function(*args, **kwargs)
 File "/opt/plexus/lib/python2.6/site-packages/ryu/lib/hub.py", line 52, in _launch
   func(*args, **kwargs)
 File "/opt/plexus/lib/python2.6/site-packages/ryu/controller/controller.py", line 351, in datapath_connection_factory
   if datapath.id is None:
 File "/opt/plexus/lib/python2.6/site-packages/ryu/controller/controller.py", line 271, in serve
   # Utility methods for convenience
 File "/opt/plexus/lib/python2.6/site-packages/ryu/controller/controller.py", line 104, in deactivate
   method(self)
 File "/opt/plexus/lib/python2.6/site-packages/ryu/controller/controller.py", line 201, in _recv_loop
   self.ofp_brick.get_handlers(ev) if
 File "/opt/plexus/lib/python2.6/site-packages/ryu/base/app_manager.py", line 302, in send_event_to_observers
   self.send_event(observer, ev, state)
 File "/opt/plexus/lib/python2.6/site-packages/ryu/base/app_manager.py", line 291, in send_event
   SERVICE_BRICKS[name]._send_event(ev, state)
 File "/opt/plexus/lib/python2.6/site-packages/ryu/base/app_manager.py", line 279, in _send_event
   self.events.put((ev, state))
 File "/opt/plexus/lib/python2.6/site-packages/eventlet/queue.py", line 262, in put
   result = waiter.wait()
 File "/opt/plexus/lib/python2.6/site-packages/eventlet/queue.py", line 140, in wait
   return get_hub().switch()
 File "/opt/plexus/lib/python2.6/site-packages/eventlet/hubs/hub.py", line 294, in switch
   return self.greenlet.switch()

Since the put() to the event queue blocks indefinitely inside the receive
loop, no further OpenFlow events are processed once this occurs.
The greenthread blocked in the put() is parked in the hub until some other
greenlet performs a get() on the queue; since none ever does in this state,
the receive loop never runs again.
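
For illustration, here is a minimal standalone sketch of the failure mode
(it is not Ryu code; the names are hypothetical and it assumes only that
eventlet is installed) showing how a put() on a full eventlet queue parks
its greenthread in the hub when nothing ever calls get():

import eventlet
from eventlet.queue import Queue

q = Queue(maxsize=1)
q.put('first')  # the queue is now full

def producer():
    # With no consumer calling get(), this put() parks the greenthread
    # in hub.switch() indefinitely - the same state as in the traceback.
    q.put('second')
    print('never reached')

gt = eventlet.spawn(producer)
eventlet.sleep(1)
print('producer still blocked: %s' % (not gt.dead))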

With our hardware OpenFlow switches, there is a keepalive timer that sends
regular echo requests.
If an echo request is not answered within a given interval, the switch
disconnects and re-connects.
With the event queue full, however, the re-connection is never properly
processed either, which leads to a downward spiral of switch
disconnection/reconnection and unclosed sockets on the controller.

The following patch sets a timeout on the event queue put(), and logs the
lost event if the put() times out.
This way the receive loop at least remains free to close the socket, and
other greenlets may get a chance to drain the event queue.
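
In isolation, the mechanism looks like the following sketch (hypothetical
names; it assumes only eventlet, while the patch itself uses the aliases
in ryu.lib.hub):

from eventlet.queue import Queue, Full

q = Queue(maxsize=1)
q.put('first')  # the queue is now full
try:
    # Block for at most 5 seconds instead of forever.
    q.put('second', timeout=5)
except Full:
    # Drop the event and keep the caller running, as the patched
    # _send_event() does.
    print('queue full; event dropped')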

This patch also includes a minor typo fix.

Signed-off-by: Victor J. Orlikowski <[email protected]>

diff --git a/ryu/base/app_manager.py b/ryu/base/app_manager.py
index 3d5d895..5e4b8f0 100644
--- a/ryu/base/app_manager.py
+++ b/ryu/base/app_manager.py
@@ -287,7 +287,11 @@ class RyuApp(object):
                 handler(ev)
 
     def _send_event(self, ev, state):
-        self.events.put((ev, state))
+        try:
+            self.events.put((ev, state), timeout=5)
+        except hub.Full:
+            LOG.debug("EVENT LOST FOR %s %s",
+                      self.name, ev.__class__.__name__)
 
     def send_event(self, name, ev, state=None):
         """
@@ -520,7 +524,7 @@ class AppManager(object):
         self._close(app)
         events = app.events
         if not events.empty():
-            app.logger.debug('%s events remians %d', app.name, events.qsize())
+            app.logger.debug('%s events remains %d', app.name, events.qsize())
 
     def close(self):
         def close_all(close_dict):

Best,
Victor
--
Victor J. Orlikowski <> vjo@[cs.]duke.edu

