Yes, 50-100 networks received by the DHCP agent on startup can delay the
second state report well past its scheduled time. In my tests, if I recall
correctly, it was ~70 networks, and the delay between the first and second
state reports was around 25 seconds (while a 5-second interval was
configured).
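
For illustration, here is a tiny self-contained eventlet sketch (toy code,
nothing from the actual agent) of how a periodic report greenthread drifts
when it competes with many greenthreads that rarely yield. With 70 simulated
networks at ~0.3s of non-yielding work each, the report scheduled for t=5s
does not fire until roughly t=21s:

    import time
    import eventlet
    eventlet.monkey_patch()

    REPORT_INTERVAL = 5
    START = time.time()

    def report_state():
        while True:
            eventlet.sleep(REPORT_INTERVAL)
            print("state report at t=%.1fs" % (time.time() - START))

    def sync_network(net_id):
        # stand-in for per-network processing that hits no yield point
        deadline = time.time() + 0.3
        while time.time() < deadline:
            pass

    eventlet.spawn(report_state)
    pool = eventlet.GreenPool()
    for net in range(70):            # ~70 networks, as in the test above
        pool.spawn_n(sync_network, net)
    pool.waitall()
    eventlet.sleep(0.1)              # let the overdue report finally fire

All 70 zero-deadline worker timers are ahead of the reporter's t=5s timer in
the hub, so the report is only sent once the queued work drains, which is
essentially what I observed.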
Eugene.

On Sun, Jun 7, 2015 at 11:11 PM, Kevin Benton <blak...@gmail.com> wrote:

> Well, a greenthread will only yield when it makes a blocking call, like
> writing to a network socket or file. So once the report_state greenthread
> starts executing, it won't yield until it makes a call like that.
>
> I looked through the report_state code for the DHCP agent, and the only
> blocking call it seems to make is the AMQP report_state call/cast itself.
> So even with a bunch of other workers, the report_state thread should get
> execution fairly quickly, since most of our workers should yield very
> frequently when they make process calls, etc. That's why I assumed that
> there must be something actually stopping it from sending the message.
>
> Do you have a way to reproduce the issue with the DHCP agent?
>
> On Sun, Jun 7, 2015 at 9:21 PM, Eugene Nikanorov <enikano...@mirantis.com>
> wrote:
>
>> No, I think the greenthread itself doesn't do anything special; it's
>> just that when there are too many threads, the state_report thread can't
>> get control for too long, since there is no prioritization of
>> greenthreads.
>>
>> Eugene.
>>
>> On Sun, Jun 7, 2015 at 8:24 PM, Kevin Benton <blak...@gmail.com> wrote:
>>
>>> I understand now. So the issue is that the report_state greenthread is
>>> just blocking and yielding whenever it tries to actually send a message?
>>>
>>> On Sun, Jun 7, 2015 at 8:10 PM, Eugene Nikanorov <
>>> enikano...@mirantis.com> wrote:
>>>
>>>> Salvatore,
>>>>
>>>> By 'fairness' I meant the chances for the state report greenthread to
>>>> get control. In the DHCP case, each network is processed by a separate
>>>> greenthread, so the more greenthreads the agent has, the lower the
>>>> chance that the report state greenthread will be able to report in
>>>> time.
>>>>
>>>> Thanks,
>>>> Eugene.
>>>>
>>>> On Sun, Jun 7, 2015 at 4:15 AM, Salvatore Orlando <sorla...@nicira.com>
>>>> wrote:
>>>>
>>>>> On 5 June 2015 at 01:29, Itsuro ODA <o...@valinux.co.jp> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> > After trying to reproduce this, I'm suspecting that the issue is
>>>>>> > actually on the server side from failing to drain the agent
>>>>>> > report state queue in time.
>>>>>>
>>>>>> I have seen this before.
>>>>>> My thinking on the scenario at that time was as follows:
>>>>>> * a lot of create/update resource API calls are issued
>>>>>> * the "rpc_conn_pool_size" pool is exhausted by notifications being
>>>>>>   sent, which blocks everything else on the sending side of RPC
>>>>>> * the "rpc_thread_pool_size" pool is exhausted by handlers waiting
>>>>>>   on the "rpc_conn_pool_size" pool to send their RPC replies
>>>>>> * receiving state_report is blocked because the
>>>>>>   "rpc_thread_pool_size" pool is exhausted
>>>>>>
>>>>> I think this could be a good explanation, couldn't it?
>>>>> Kevin proved that the periodic tasks are not mutually exclusive and
>>>>> that long process times for sync_routers are not an issue.
>>>>> However, he correctly suspected server-side involvement, which could
>>>>> actually be a lot of requests saturating the RPC pool.
>>>>>
>>>>> On the other hand, how could we use this theory to explain why this
>>>>> issue tends to occur when the agent is restarted?
>>>>> Also, Eugene, what do you mean by stating that the issue could be in
>>>>> the agent's "fairness"?
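>>>>>
>>>>> (To make the saturation scenario above concrete, here is a toy
>>>>> eventlet sketch, not the real oslo RPC code: once every slot of a
>>>>> fixed-size pool is held by a blocked worker, even a trivial new task,
>>>>> such as handling a state_report, has to wait for a slot.)
>>>>>
>>>>>     import time
>>>>>     import eventlet
>>>>>     eventlet.monkey_patch()
>>>>>
>>>>>     pool = eventlet.GreenPool(size=4)  # stand-in for rpc_thread_pool_size
>>>>>
>>>>>     def stuck_worker(i):
>>>>>         # stand-in for a handler waiting on an exhausted conn pool
>>>>>         eventlet.sleep(10)
>>>>>
>>>>>     for i in range(4):
>>>>>         pool.spawn_n(stuck_worker, i)
>>>>>
>>>>>     start = time.time()
>>>>>     pool.spawn_n(lambda: None)   # stand-in for a state_report handler
>>>>>     print("state_report admitted after %.1fs" % (time.time() - start))
>>>>>
>>>>> (The last spawn_n only returns after ~10s, when the first worker frees
>>>>> a slot.)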
>>>>>
>>>>> Salvatore
>>>>>
>>>>>> Thanks
>>>>>> Itsuro Oda
>>>>>>
>>>>>> On Thu, 4 Jun 2015 14:20:33 -0700
>>>>>> Kevin Benton <blak...@gmail.com> wrote:
>>>>>>
>>>>>> > After trying to reproduce this, I'm suspecting that the issue is
>>>>>> > actually on the server side from failing to drain the agent report
>>>>>> > state queue in time.
>>>>>> >
>>>>>> > I set the report_interval to 1 second on the agent and added a
>>>>>> > logging statement, and I see a report every 1 second even when
>>>>>> > sync_routers is taking a really long time.
>>>>>> >
>>>>>> > On Thu, Jun 4, 2015 at 11:52 AM, Carl Baldwin <c...@ecbaldwin.net>
>>>>>> > wrote:
>>>>>> >
>>>>>> > > Ann,
>>>>>> > >
>>>>>> > > Thanks for bringing this up. It has been on the shelf for a
>>>>>> > > while now.
>>>>>> > >
>>>>>> > > Carl
>>>>>> > >
>>>>>> > > On Thu, Jun 4, 2015 at 8:54 AM, Salvatore Orlando <
>>>>>> > > sorla...@nicira.com> wrote:
>>>>>> > > > One reason for not sending the heartbeat from a separate
>>>>>> > > > greenthread could be that the agent is already doing it [1].
>>>>>> > > > The current proposed patch addresses the issue blindly - that
>>>>>> > > > is to say, before declaring an agent dead, let's wait some
>>>>>> > > > more time because it could be stuck doing stuff. In that case
>>>>>> > > > I would probably make the multiplier (currently 2x)
>>>>>> > > > configurable.
>>>>>> > > >
>>>>>> > > > The reason the state report does not occur is probably that
>>>>>> > > > both it and the resync procedure are periodic tasks. If I got
>>>>>> > > > it right, they're both executed as eventlet greenthreads, but
>>>>>> > > > one at a time. Perhaps adding an initial delay to the full
>>>>>> > > > sync task would ensure that the first thing an agent does
>>>>>> > > > when it comes up is send a heartbeat to the server?
>>>>>> > > >
>>>>>> > > > On the other hand, while doing the initial full resync, is
>>>>>> > > > the agent able to process updates? If not, perhaps it makes
>>>>>> > > > sense to consider it down until it finishes synchronisation.
>>>>>> > >
>>>>>> > > Yes, it can! The agent prioritizes updates from RPC over full
>>>>>> > > resync activities.
>>>>>> > >
>>>>>> > > I wonder if the agent should check how long it has been since
>>>>>> > > its last state report each time it finishes processing an update
>>>>>> > > for a router, as in the sketch below. It normally doesn't take
>>>>>> > > very long (relatively) to process an update to a single router.
>>>>>> > >
>>>>>> > > I still would like to know why the thread to report state is
>>>>>> > > being starved. Anyone have any insight on this? I thought that
>>>>>> > > with all the system calls, the greenthreads would yield often.
>>>>>> > > There must be something I don't understand about it.
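>>>>>> > >
>>>>>> > > A rough sketch of that check (names are hypothetical, this is
>>>>>> > > not the actual agent code):
>>>>>> > >
>>>>>> > >     import time
>>>>>> > >
>>>>>> > >     class RouterUpdateHandler(object):
>>>>>> > >         # hypothetical helper, for illustration only
>>>>>> > >         def _process_router_update(self, update):
>>>>>> > >             self._apply_update(update)  # fast for one router
>>>>>> > >             # piggyback a state report if the periodic one is
>>>>>> > >             # overdue, instead of waiting for its greenthread
>>>>>> > >             overdue = (time.time() - self._last_report
>>>>>> > >                        >= self.conf.report_interval)
>>>>>> > >             if overdue:
>>>>>> > >                 self._report_state()
>>>>>> > >                 self._last_report = time.time()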
>>>>>> > >
>>>>>> > > Carl
>>>>>> >
>>>>>> > --
>>>>>> > Kevin Benton
>>>>>>
>>>>>> --
>>>>>> Itsuro ODA <o...@valinux.co.jp>
>>>
>>> --
>>> Kevin Benton
>
> --
> Kevin Benton