Excerpts from Maru Newby's message of 2013-12-03 08:08:09 -0800:
> I've been investigating a bug that is preventing VMs from receiving IP
> addresses when a Neutron service is under high load:
> 
> https://bugs.launchpad.net/neutron/+bug/1192381
> 
> High load causes the DHCP agent's status updates to be delayed, causing the 
> Neutron service to assume that the agent is down.  This results in the 
> Neutron service not sending notifications of port addition to the DHCP agent. 
> At present, the notifications are simply dropped.  A simple fix is to send 
> notifications regardless of agent status.  Does anybody have any objections 
> to this stop-gap approach?  I'm not clear on the implications of sending 
> notifications to agents that are down, but I'm hoping for a simple fix that 
> can be backported to both havana and grizzly (yes, this bug has been with us 
> that long).
> 
> Fixing this problem for real, though, will likely be more involved.  The 
> proposal to replace the current wsgi framework with Pecan may increase the 
> Neutron service's scalability, but should we continue to use a 'fire and 
> forget' approach to notification?  Being able to track the success or failure 
> of a given action outside of the logs would seem pretty important, and allow 
> for more effective coordination with Nova than is currently possible.
> 
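A minimal, hypothetical sketch of the failure mode described above. It assumes the server treats an agent as down when its last status report is older than a fixed timeout. `Agent`, `AGENT_DOWN_TIME`, `is_alive` and `notify_port_added` are invented names, not Neutron's actual code, and the 9-second threshold is an arbitrary illustrative value. With the default `send_when_down=False`, a heartbeat delayed by load causes the port notification to be silently dropped. Passing `send_when_down=True` models the proposed stop-gap of sending regardless of agent status.

```python
# Hypothetical model of the bug, not Neutron code.
from dataclasses import dataclass, field
from typing import List

# Illustrative threshold: an agent whose last status report is older than
# this many seconds is treated as down.
AGENT_DOWN_TIME = 9.0


@dataclass
class Agent:
    host: str
    last_heartbeat: float  # time of the last status report, in seconds
    inbox: List[str] = field(default_factory=list)  # notifications received


def is_alive(agent: Agent, now: float) -> bool:
    # Under high load reports arrive late, so a healthy agent can fail this check.
    return now - agent.last_heartbeat <= AGENT_DOWN_TIME


def notify_port_added(agent: Agent, port_id: str, now: float,
                      send_when_down: bool = False) -> bool:
    """Return True if the notification was sent, False if it was dropped."""
    if not is_alive(agent, now) and not send_when_down:
        # Current behaviour: the notification is discarded with no error.
        return False
    agent.inbox.append(port_id)
    return True


# The agent is running, but its last report is 15 s old because of load.
agent = Agent(host="net-node-1", last_heartbeat=100.0)
notify_port_added(agent, "port-a", now=115.0)                       # dropped
notify_port_added(agent, "port-b", now=115.0, send_when_down=True)  # sent
```

Only `port-b` reaches the agent's inbox, even though the agent itself never stopped working.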

Dropping requests without triggering a user-visible error is a pretty
serious problem. You didn't mention whether you have filed a bug about that.
If not, please do, or let us know here so we can investigate and file
one.

It seems to me that they should be put into a queue to be retried.
Sending the notifications blindly is almost as bad as dropping them,
since you have no idea whether the agent is alive.
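The retry queue suggested above might look roughly like this. It is a hypothetical sketch, not an existing Neutron or oslo.messaging interface: `Notification`, `RetryingNotifier`, `max_attempts` and the `send` callable are invented, and it assumes the transport signals a failed delivery by raising an exception. Failed notifications are requeued and retried on the next `flush()`. After `max_attempts` failures they move to a `failed` list instead of vanishing, which also gives a place to track the success or failure of each action, as the quoted message asks for.

```python
# Hypothetical retry queue for agent notifications, not Neutron code.
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List


@dataclass
class Notification:
    agent_host: str
    payload: str
    attempts: int = 0


class RetryingNotifier:
    """Queue notifications and retry failed deliveries instead of dropping them.

    `send` is an assumed transport callable that raises on delivery failure.
    """

    def __init__(self, send: Callable[[Notification], None], max_attempts: int = 3):
        self._send = send
        self._max_attempts = max_attempts
        self._pending: Deque[Notification] = deque()
        self.delivered: List[Notification] = []
        self.failed: List[Notification] = []  # kept so the error can be reported

    def enqueue(self, note: Notification) -> None:
        self._pending.append(note)

    def flush(self) -> None:
        """Try each pending notification once; requeue those that fail."""
        for _ in range(len(self._pending)):
            note = self._pending.popleft()
            note.attempts += 1
            try:
                self._send(note)
            except Exception:
                if note.attempts >= self._max_attempts:
                    self.failed.append(note)  # give up, but visibly
                else:
                    self._pending.append(note)  # retry on the next flush
            else:
                self.delivered.append(note)

    @property
    def pending(self) -> int:
        return len(self._pending)


# A fake transport whose agent is unreachable for the first two attempts.
calls = {"n": 0}


def flaky_send(note: Notification) -> None:
    calls["n"] += 1
    if calls["n"] <= 2:
        raise ConnectionError("agent unreachable")


notifier = RetryingNotifier(flaky_send, max_attempts=3)
notifier.enqueue(Notification("net-node-1", "port-a"))
for _ in range(3):
    notifier.flush()  # port-a is delivered on its third attempt
```

Bounding the number of attempts means an agent that is merely slow still gets its port notifications, while an agent that is really gone eventually produces a visible failure rather than silence.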

_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
