Excerpts from Maru Newby's message of 2013-12-03 08:08:09 -0800:
> I've been investigating a bug that is preventing VM's from receiving IP
> addresses when a Neutron service is under high load:
>
> https://bugs.launchpad.net/neutron/+bug/1192381
>
> High load causes the DHCP agent's status updates to be delayed, causing the
> Neutron service to assume that the agent is down. This results in the
> Neutron service not sending notifications of port addition to the DHCP agent.
> At present, the notifications are simply dropped. A simple fix is to send
> notifications regardless of agent status. Does anybody have any objections
> to this stop-gap approach? I'm not clear on the implications of sending
> notifications to agents that are down, but I'm hoping for a simple fix that
> can be backported to both havana and grizzly (yes, this bug has been with us
> that long).
>
> Fixing this problem for real, though, will likely be more involved. The
> proposal to replace the current wsgi framework with Pecan may increase the
> Neutron service's scalability, but should we continue to use a 'fire and
> forget' approach to notification? Being able to track the success or failure
> of a given action outside of the logs would seem pretty important, and allow
> for more effective coordination with Nova than is currently possible.
>
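To check that I'm reading the failure mode correctly, here is a rough sketch of the behaviour as described above. This is not Neutron code: the names (Agent, is_agent_down, notify_port_created) and the 75-second threshold are all made up for illustration. The only point is that a late status update makes a live agent look dead, and the notification is then discarded with nothing reported to the caller.

```python
from dataclasses import dataclass

# Hypothetical threshold in seconds; the real value is a deployment setting.
AGENT_DOWN_TIME = 75


@dataclass
class Agent:
    host: str
    last_heartbeat: float  # timestamp of the last status update the server saw


def is_agent_down(agent, now):
    # The server only knows when it last *processed* a status update.
    # Under load that can lag far behind when the agent actually sent it.
    return now - agent.last_heartbeat > AGENT_DOWN_TIME


def notify_port_created(agent, port_id, now, send):
    """Send a port-created notification; return False if it was dropped."""
    if is_agent_down(agent, now):
        # Dropped silently: no error, no retry, nothing visible to the user.
        return False
    send(agent.host, port_id)
    return True


sent = []
agent = Agent(host="dhcp-1", last_heartbeat=0.0)
record = lambda host, port_id: sent.append((host, port_id))

delivered_early = notify_port_created(agent, "port-a", now=10.0, send=record)
# The agent is still running, but its status update is stuck behind other work.
delivered_late = notify_port_created(agent, "port-b", now=100.0, send=record)
```

In this sketch port-b never reaches the agent even though the agent itself is fine, which matches the symptom in the bug report.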

Dropping requests without triggering a user-visible error is a pretty
serious problem. You didn't mention whether you have filed a bug about
that. If not, please do, or let us know here so we can investigate and
file one.

It seems to me that the notifications should be put into a queue to be
retried. Sending them blindly is almost as bad as dropping them, as you
have no idea whether the agent is alive or not.

_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev