Salvatore,

By 'fairness' I meant the chances for the state report greenthread to get
control. In the DHCP case, each network is processed by a separate
greenthread, so the more greenthreads the agent has, the lower the chances
that the report state greenthread will be able to report in time.
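The fairness point can be illustrated with a toy simulation. This is not the agent's actual code; asyncio stands in for eventlet here, since both schedule cooperatively: a task that does blocking work without yielding delays a periodic heartbeat task, just as many busy per-network greenthreads can delay the agent's report_state greenthread.

```python
# Toy simulation of greenthread starvation (asyncio standing in for
# eventlet; intervals are illustrative, not the agent's real settings).
import asyncio
import time

async def heartbeat(timestamps, interval=0.05, beats=3):
    # Periodic "state report": records when each beat actually ran.
    for _ in range(beats):
        timestamps.append(time.monotonic())
        await asyncio.sleep(interval)

async def main():
    timestamps = []
    hb = asyncio.ensure_future(heartbeat(timestamps))
    await asyncio.sleep(0)   # yield once so the heartbeat records its first beat
    time.sleep(0.3)          # blocking "work": nothing else can run meanwhile
    await hb
    return timestamps

timestamps = asyncio.run(main())
# The gap between the first two beats far exceeds the 0.05 s interval,
# because the blocking work held the loop and starved the heartbeat.
gap = timestamps[1] - timestamps[0]
```

The same effect occurs in eventlet whenever busy greenthreads go a long time between yields.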
Thanks,
Eugene.

On Sun, Jun 7, 2015 at 4:15 AM, Salvatore Orlando <sorla...@nicira.com> wrote:
> On 5 June 2015 at 01:29, Itsuro ODA <o...@valinux.co.jp> wrote:
>
>> Hi,
>>
>> > After trying to reproduce this, I'm suspecting that the issue is
>> > actually on the server side from failing to drain the agent report
>> > state queue in time.
>>
>> I have seen this before.
>> I thought the scenario at that time was as follows:
>> * a lot of create/update resource API calls are issued
>> * the "rpc_conn_pool_size" pool is exhausted by sending notifications,
>>   which blocks further sending on the RPC side
>> * the "rpc_thread_pool_size" pool is exhausted by threads waiting on
>>   the "rpc_conn_pool_size" pool to reply to RPCs
>> * receiving state_report is blocked because the "rpc_thread_pool_size"
>>   pool is exhausted
>>
> I think this could be a good explanation, couldn't it?
> Kevin proved that the periodic tasks are not mutually exclusive and that
> long process times for sync_routers are not an issue.
> However, he correctly suspected a server-side involvement, which could
> actually be a lot of requests saturating the RPC pool.
>
> On the other hand, how could we use this theory to explain why this
> issue tends to occur when the agent is restarted?
> Also, Eugene, what do you mean by stating that the issue could be in the
> agent's "fairness"?
>
> Salvatore
>
>> Thanks
>> Itsuro Oda
>>
>> On Thu, 4 Jun 2015 14:20:33 -0700
>> Kevin Benton <blak...@gmail.com> wrote:
>>
>> > After trying to reproduce this, I'm suspecting that the issue is
>> > actually on the server side from failing to drain the agent report
>> > state queue in time.
>> >
>> > I set the report_interval to 1 second on the agent and added a
>> > logging statement, and I see a report every 1 second even when
>> > sync_routers is taking a really long time.
>> >
>> > On Thu, Jun 4, 2015 at 11:52 AM, Carl Baldwin <c...@ecbaldwin.net> wrote:
>> >
>> > > Ann,
>> > >
>> > > Thanks for bringing this up. It has been on the shelf for a while
>> > > now.
>> > >
>> > > Carl
>> > >
>> > > On Thu, Jun 4, 2015 at 8:54 AM, Salvatore Orlando
>> > > <sorla...@nicira.com> wrote:
>> > > > One reason for not sending the heartbeat from a separate
>> > > > greenthread could be that the agent is already doing it [1].
>> > > > The current proposed patch addresses the issue blindly - that is
>> > > > to say, before declaring an agent dead let's wait for some more
>> > > > time because it could be stuck doing stuff. In that case I would
>> > > > probably make the multiplier (currently 2x) configurable.
>> > > >
>> > > > The reason the state report does not occur is probably that both
>> > > > it and the resync procedure are periodic tasks. If I got it
>> > > > right, they're both executed as eventlet greenthreads, but one at
>> > > > a time. Perhaps then adding an initial delay to the full sync
>> > > > task might ensure the first thing an agent does when it comes up
>> > > > is sending a heartbeat to the server?
>> > > >
>> > > > On the other hand, while doing the initial full resync, is the
>> > > > agent able to process updates? If not, perhaps it makes sense to
>> > > > have it down until it finishes synchronisation.
>> > >
>> > > Yes, it can! The agent prioritizes updates from RPC over full
>> > > resync activities.
>> > >
>> > > I wonder if the agent should check how long it has been since its
>> > > last state report each time it finishes processing an update for a
>> > > router. It normally doesn't take very long (relatively) to process
>> > > an update to a single router.
>> > >
>> > > I still would like to know why the thread to report state is being
>> > > starved. Anyone have any insight on this? I thought that with all
>> > > the system calls, the greenthreads would yield often. There must be
>> > > something I don't understand about it.
>> > >
>> > > Carl
>> >
>> > --
>> > Kevin Benton
>>
>> --
>> Itsuro ODA <o...@valinux.co.jp>
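Itsuro's pool-exhaustion scenario can be modeled with a toy simulation. Threading semaphores stand in here for oslo.messaging's eventlet pools, and the pool sizes and timings are illustrative only: handlers that consume a thread slot and then block waiting for a connection slot can exhaust the thread pool, so a newly arrived state_report has no free handler until the pools drain.

```python
# Toy model of the rpc_thread_pool_size / rpc_conn_pool_size interaction
# (semaphores stand in for the real pools; sizes are illustrative).
import threading
import time

RPC_THREAD_POOL_SIZE = 2
RPC_CONN_POOL_SIZE = 1

conn_pool = threading.Semaphore(RPC_CONN_POOL_SIZE)
thread_pool = threading.Semaphore(RPC_THREAD_POOL_SIZE)

def handle_request(hold_conn_for):
    # Each handler consumes a thread slot, then waits for a connection slot.
    with thread_pool:
        with conn_pool:
            time.sleep(hold_conn_for)

def try_handle_state_report():
    # A state_report can only be processed if a thread slot is free.
    acquired = thread_pool.acquire(blocking=False)
    if acquired:
        thread_pool.release()
    return acquired

# Saturate the thread pool with handlers queued on the single connection.
workers = [threading.Thread(target=handle_request, args=(0.3,))
           for _ in range(RPC_THREAD_POOL_SIZE)]
for w in workers:
    w.start()
time.sleep(0.05)                          # let workers occupy the pools

blocked = not try_handle_state_report()   # True: no handler free for the report
for w in workers:
    w.join()
free_again = try_handle_state_report()    # True once the pools drain
```

This matches the chain Itsuro describes: the thread pool is not exhausted by real work, but by handlers blocked on the smaller connection pool.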
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
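Carl's idea of checking the elapsed time since the last state report after each router update could look roughly like the sketch below. The class and method names are hypothetical, not the actual neutron agent API: the router-update loop would call `maybe_report()` after finishing each update, so a report goes out even when the periodic report_state task is starved.

```python
# Sketch of an elapsed-time check (hypothetical names, not neutron's API):
# send a state report opportunistically if report_interval has passed.
import time

class ReportClock:
    def __init__(self, report_interval=30.0):
        self.report_interval = report_interval
        self.last_report = time.monotonic()

    def maybe_report(self, send_report):
        """Call send_report() if report_interval has elapsed; return True if sent."""
        now = time.monotonic()
        if now - self.last_report >= self.report_interval:
            send_report()
            self.last_report = now
            return True
        return False

# Usage sketch: the router-update loop would call
# clock.maybe_report(rpc_send_state_report) after each processed update.
sent = []
clock = ReportClock(report_interval=0.0)   # 0.0 forces an immediate report
clock.maybe_report(lambda: sent.append("state_report"))
```

Since processing a single router update is normally quick (as Carl notes), this check would bound the reporting delay by roughly one update's duration plus the interval.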