Hi Phil,

My understanding is that (forget Nova for a second) in a perfect eventlet world, a green thread is either doing CPU-intensive computing or waiting in IO-related system calls. In the latter case, the eventlet scheduler will suspend the green thread and switch to another green thread that is ready to run.
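A minimal sketch of the cooperative model described above. It uses the standard library's asyncio rather than eventlet (an assumption, made so the example runs without extra packages); `await asyncio.sleep(0)` plays the role of eventlet's `sleep(0)`, and the task names are hypothetical. The point it illustrates: a task gives up control only at an explicit switch point, so a long task with no switch points runs to completion before anything else gets a turn.

```python
import asyncio


async def long_task(log, steps, cooperative):
    """Hypothetical long-running job; yields only when `cooperative` is set."""
    for i in range(steps):
        log.append(f"long step {i}")
        if cooperative:
            # Explicit switch point, analogous to eventlet's sleep(0).
            await asyncio.sleep(0)
    log.append("long done")


async def short_task(log):
    """Hypothetical short job that just wants a turn."""
    log.append("short ran")


async def run_both(cooperative):
    log = []
    # Both tasks share one OS thread; the long task is scheduled first.
    await asyncio.gather(long_task(log, 3, cooperative), short_task(log))
    return log


if __name__ == "__main__":
    print(asyncio.run(run_both(cooperative=False)))
    print(asyncio.run(run_both(cooperative=True)))
```

Without the switch point, "short ran" is the last entry in the log; with it, "short ran" appears right after the first long step. The total amount of work is the same in both runs, only the ordering changes.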
Back to reality: as you mentioned, this is broken, and some IO-bound activity won't cause an eventlet switch. To me the only way that can happen is for the same reason those MySQL calls are blocking: we are using C-based modules that don't respect monkey patching and never yield. I suspect that all libvirt-based calls also belong to this category.

Now if those blocking calls can finish in a very short time (as we assume for DB calls), then I think inserting a sleep(0) after every blocking call should be a quick fix to the problem. But if it's a long blocking call like the snapshot case, we are probably screwed anyway and need OS-thread-level parallelism or multiprocessing to make it truly non-blocking.

Thanks,
Yun

On Mon, Mar 5, 2012 at 10:43 AM, Day, Phil <philip....@hp.com> wrote:
> Hi Yun,
>
> The point of the sleep(0) is to explicitly yield from a long running eventlet
> so that other eventlets aren't blocked for a long period. Depending on
> how you look at that either means we're making an explicit judgement on
> priority, or trying to provide a more equal sharing of run-time across
> eventlets.
>
> It's not that things are CPU bound as such - more just that eventlets have
> very few pre-emption points. Even an IO bound activity like creating a
> snapshot won't cause an eventlet switch.
>
> So in terms of priority we're trying to get to the state where:
> - Important periodic events (such as service status) run when expected (if
>   these take a long time we're stuffed anyway)
> - User initiated actions don't get blocked by background system eventlets
>   (such as refreshing power-state)
> - Slow actions from one user don't block actions from other users (the first
>   user will expect their snapshot to take X seconds, the second one won't
>   expect their VM creation to take X + Y seconds).
>
> It almost feels like the right level of concurrency would be to have a
> task/process running for each VM, so that there is concurrency across
> un-related VMs, but serialisation for each VM.
>
> Phil
>
> -----Original Message-----
> From: Yun Mao [mailto:yun...@gmail.com]
> Sent: 02 March 2012 20:32
> To: Day, Phil
> Cc: Chris Behrens; Joshua Harlow; openstack
> Subject: Re: [Openstack] eventlet weirdness
>
> Hi Phil, I'm a little confused. To what extent does sleep(0) help?
>
> It only gives the greenlet scheduler a chance to switch to another green
> thread. If we are having a CPU bound issue, sleep(0) won't give us access to
> any more CPU cores. So the total time to finish should be the same no matter
> what. It may improve the fairness among different green threads but shouldn't
> help the throughput. I think the only apparent gain to me is a situation in
> which there is 1 green thread with long CPU time and many other green threads
> with small CPU time.
> The total finish time will be the same with or without sleep(0), but with
> sleep in the first thread, the others should be much more responsive.
>
> However, it's unclear to me which part of Nova is very CPU intensive.
> It seems that most work here is IO bound, including the snapshot. Do we have
> other blocking calls besides mysql access? I feel like I'm missing something
> but couldn't figure out what.
>
> Thanks,
>
> Yun
>
>
> On Fri, Mar 2, 2012 at 2:08 PM, Day, Phil <philip....@hp.com> wrote:
>> I didn't say it was pretty - Given the choice I'd much rather have a
>> threading model that really did concurrency and pre-emption in all the right
>> places, and it would be really cool if something managed the threads that
>> were started so that if a second conflicting request was received it did
>> some proper tidy up or blocking rather than just leaving the race condition
>> to work itself out (then we wouldn't have to try and control it by checking
>> vm_state).
>>
>> However ...
>> In the current code base where we only have user space based
>> eventlets, with no pre-emption, and some activities that need to be
>> prioritised then forcing pre-emption with a sleep(0) seems a pretty small
>> bit of untidy. And it works now without a major code refactor.
>>
>> Always open to other approaches ...
>>
>> Phil
>>
>>
>> -----Original Message-----
>> From: openstack-bounces+philip.day=hp....@lists.launchpad.net
>> [mailto:openstack-bounces+philip.day=hp....@lists.launchpad.net] On
>> Behalf Of Chris Behrens
>> Sent: 02 March 2012 19:00
>> To: Joshua Harlow
>> Cc: openstack; Chris Behrens
>> Subject: Re: [Openstack] eventlet weirdness
>>
>> It's not just you
>>
>>
>> On Mar 2, 2012, at 10:35 AM, Joshua Harlow wrote:
>>
>>> Does anyone else feel that the following seems really "dirty", or is it
>>> just me.
>>>
>>> "adding a few sleep(0) calls in various places in the Nova codebase
>>> (as was recently added in the _sync_power_states() periodic task) is
>>> an easy and simple win with pretty much no ill side-effects. :)"
>>>
>>> Dirty in that it feels like there is something wrong from a design point of
>>> view.
>>> Sprinkling "sleep(0)" seems like it's a band-aid on a larger problem imho.
>>> But that's just my gut feeling.
>>>
>>> :-(
>>>
>>> On 3/2/12 8:26 AM, "Armando Migliaccio" <armando.migliac...@eu.citrix.com>
>>> wrote:
>>>
>>> I knew you'd say that :P
>>>
>>> There you go: https://bugs.launchpad.net/nova/+bug/944145
>>>
>>> Cheers,
>>> Armando
>>>
>>> > -----Original Message-----
>>> > From: Jay Pipes [mailto:jaypi...@gmail.com]
>>> > Sent: 02 March 2012 16:22
>>> > To: Armando Migliaccio
>>> > Cc: firstname.lastname@example.org
>>> > Subject: Re: [Openstack] eventlet weirdness
>>> >
>>> > On 03/02/2012 10:52 AM, Armando Migliaccio wrote:
>>> > > I'd be cautious to say that no ill side-effects were introduced.
>>> > > I found a race condition right in the middle of sync_power_states,
>>> > > which I assume was exposed by "breaking" the task deliberately.
>>> >
>>> > Such a party-pooper! ;)
>>> >
>>> > Got a link to the bug report for me?
>>> >
>>> > Thanks!
>>> > -jay
>>>
>>> _______________________________________________
>>> Mailing list: https://launchpad.net/~openstack
>>> Post to     : email@example.com
>>> Unsubscribe : https://launchpad.net/~openstack
>>> More help   : https://help.launchpad.net/ListHelp
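Yun's point at the top of the thread, that a long blocking call needs OS-thread-level parallelism, can be illustrated with a small sketch. It uses the standard library's asyncio as a stand-in for eventlet (an assumption; eventlet's own facility for this kind of offloading is `eventlet.tpool.execute`), and `time.sleep` stands in for a C-level call that never yields. The names `blocking_call`, `heartbeat`, and `measure` are hypothetical.

```python
import asyncio
import time


def blocking_call(seconds):
    """Stand-in for a C-level call that blocks without yielding."""
    time.sleep(seconds)


async def heartbeat(state):
    """Background task that should keep ticking (e.g. a periodic status report)."""
    while True:
        state["ticks"] += 1
        await asyncio.sleep(0.01)


async def measure(offload):
    """Return how many heartbeat ticks happen while the blocking call runs."""
    state = {"ticks": 0}
    beat = asyncio.create_task(heartbeat(state))
    await asyncio.sleep(0)  # give the heartbeat its first turn
    before = state["ticks"]
    if offload:
        # Run the blocking call on an OS thread; the event loop stays free.
        loop = asyncio.get_running_loop()
        await loop.run_in_executor(None, blocking_call, 0.2)
    else:
        # Run it inline; nothing else can be scheduled until it returns.
        blocking_call(0.2)
    ticks_during = state["ticks"] - before
    beat.cancel()
    return ticks_during


if __name__ == "__main__":
    print("inline:   ", asyncio.run(measure(offload=False)))
    print("offloaded:", asyncio.run(measure(offload=True)))
```

Run inline, the heartbeat does not tick at all during the blocking call. Offloaded to a worker thread, the heartbeat keeps running, which is the behaviour a sleep(0) placed after the call cannot provide when the call itself is long.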