On Fri, Dec 06, 2013 at 04:30:17PM +0900, Maru Newby <ma...@redhat.com> wrote:
>
> On Dec 5, 2013, at 5:21 PM, Isaku Yamahata <isaku.yamah...@gmail.com> wrote:
>
> > On Wed, Dec 04, 2013 at 12:37:19PM +0900,
> > Maru Newby <ma...@redhat.com> wrote:
> >
> >> In the current architecture, the Neutron service handles RPC and WSGI with
> >> a single process and is prone to being overloaded such that agent
> >> heartbeats can be delayed beyond the limit for the agent being declared
> >> 'down'. Even if we increased the agent timeout as Yongsheg suggests,
> >> there is no guarantee that we can accurately detect whether an agent is
> >> 'live' with the current architecture. Given that amqp can ensure eventual
> >> delivery - it is a queue - is sending a notification blind such a bad
> >> idea? In the best case the agent isn't really down and can process the
> >> notification. In the worst case, the agent really is down but will be
> >> brought up eventually by a deployment's monitoring solution and process
> >> the notification when it returns. What am I missing?
> >>
> >
> > Do you mean overload of the neutron server, not the neutron agent?
> > So even if the agent sends periodic 'live' reports, the reports pile up
> > unprocessed by the server.
> > When the server sends a notification, it wrongly considers the agent dead,
> > not because the agent failed to send live reports due to its own overload.
> > Is this understanding correct?
>
> Your interpretation is likely correct. The demands on the service are going
> to be much higher by virtue of having to field RPC requests from all the
> agents to interact with the database on their behalf.

Does this strongly indicate thread starvation, i.e. overly unfair thread
scheduling?
Given that eventlet uses cooperative threading, should the hogging thread
call sleep(0) to yield?
-- 
Isaku Yamahata <isaku.yamah...@gmail.com>

_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
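
To make the sleep(0) question concrete, here is a minimal sketch of how a
cooperative scheduler can starve a heartbeat. It is not Neutron code. It uses
the standard library's asyncio as a stand-in for eventlet (an assumption, so
that the example has no third-party dependency), with `asyncio.sleep(0)`
playing the role that `eventlet.sleep(0)` would play. The names `hog`,
`heartbeat` and `run_demo` are hypothetical.

```python
import asyncio


def run_demo(yield_in_loop: bool) -> list:
    """Run a busy task next to a heartbeat task and return the event order."""
    events = []

    async def hog():
        # Stand-in for a handler that does a lot of work in one stretch.
        for i in range(3):
            events.append(f"work {i}")
            if yield_in_loop:
                # Zero-length sleep: hand control back to the scheduler so
                # other ready tasks get a turn (the asyncio analogue of
                # eventlet.sleep(0)).
                await asyncio.sleep(0)

    async def heartbeat():
        # Stand-in for processing periodic agent 'live' reports.
        for i in range(3):
            events.append(f"heartbeat {i}")
            await asyncio.sleep(0)

    async def main():
        await asyncio.gather(hog(), heartbeat())

    asyncio.run(main())
    return events


if __name__ == "__main__":
    print("no yield:  ", run_demo(False))
    print("with yield:", run_demo(True))
```

Without the yield, every heartbeat is handled only after the busy task has
finished. With it, the two interleave. Whether this is what actually happens
in the Neutron server would still need to be confirmed by profiling.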