Re: [Openstack] eventlet weirdness
On 03/05/2012 08:30 PM, Adam Young wrote: The only time sleep() as called from Python code is going to help you is if you have a long running stretch of Python code, and you sleep() in the middle of it. That's exactly where the greenthread.sleep(0) call in question was used: inside a (potentially) long-running loop in _sync_power_states()... Best, -jay ___ Mailing list: https://launchpad.net/~openstack Post to : openstack@lists.launchpad.net Unsubscribe : https://launchpad.net/~openstack More help : https://help.launchpad.net/ListHelp
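The yield-inside-a-long-loop pattern Jay describes can be sketched with the standard library's asyncio, where `asyncio.sleep(0)` plays the role of eventlet's `greenthread.sleep(0)`. This is a rough analogy, not Nova's actual code: `sync_power_states` here just mirrors the name of the periodic task mentioned above.

```python
import asyncio

async def sync_power_states(instances, events):
    # Long-running loop: yield to the scheduler on each iteration so
    # other coroutines (greenthreads, in eventlet) are not starved.
    for inst in instances:
        events.append(f"sync:{inst}")
        await asyncio.sleep(0)   # analogous to greenthread.sleep(0)

async def periodic_task(events):
    # A short task that should stay responsive while the loop above runs.
    for _ in range(3):
        events.append("tick")
        await asyncio.sleep(0)

async def main():
    events = []
    await asyncio.gather(sync_power_states(range(3), events),
                         periodic_task(events))
    return events

events = asyncio.run(main())
```

Without the `sleep(0)` the first coroutine would run to completion before the periodic task got a single turn; with it, the two interleave.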
Re: [Openstack] eventlet weirdness
> If the libvirt API (or other Native API) has an async mode, what you > can do is provide a synchronous, python based wrapper that does the > following. > > register_request_callback() > async_call() > sleep() > > This can be set up like a more traditional multi-threaded model as well. You can eventlet.sleep while waiting for the callback handler to notify the greenthread. This of course assumes your i/o and callback are running in a different pthread (eventlet.tpool is fine). So it looks more like: condition = threading.Condition() # or something like it register_request_callback(condition) async_call() condition.wait() I found this post to be enormously helpful in understanding some of the nuances of dealing with green thread and process thread synchronization and communication: http://blog.devork.be/2011/03/synchronising-eventlets-and-threads.html Devin
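Devin's snippet compresses a real pattern. A runnable sketch might look like the following, with a simulated native async API (`async_call` and its result value are stand-ins, not a real libvirt binding) and with the lock handling that `threading.Condition.wait()` requires:

```python
import threading

def async_call(on_done):
    """Stand-in for a native API with an async mode: it invokes the
    registered callback from a different OS thread when work completes."""
    threading.Thread(target=lambda: on_done("snapshot-complete")).start()

def synchronous_wrapper():
    """Block the caller until the native callback fires."""
    condition = threading.Condition()
    result = []

    def on_done(value):              # runs in the native/IO pthread
        with condition:
            result.append(value)
            condition.notify()

    with condition:
        async_call(on_done)          # register_request_callback + async_call
        while not result:            # guard against spurious wakeups
            condition.wait()         # the "sleep()" step in the pseudocode
    return result[0]
```

Note that under eventlet a plain `condition.wait()` would block the whole hub — that is exactly the subtlety the linked blog post discusses; there you would wait on a green-thread-aware primitive instead.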
Re: [Openstack] eventlet weirdness
On 03/05/2012 05:08 PM, Yun Mao wrote: Hi Phil, My understanding is that, (forget Nova for a second) in a perfect eventlet world, a green thread is either doing CPU intensive computing, or waiting in system calls that are IO related. In the latter case, the eventlet scheduler will suspend the green thread and switch to another green thread that is ready to run. Back to reality, as you mentioned this is broken - some IO bound activity won't cause an eventlet switch. To me the only possibility that happens is the same reason those MySQL calls are blocking - we are using C-based modules that don't respect monkey patching and never yield. I'm suspecting that all libvirt based calls also belong to this category. Agree. I expect that to be the case for any native library. Monkey patching only changes the Python side of the call; anything in native code is too far along for it to be redirected. Now if those blocking calls can finish in a very short amount of time (as we assume for DB calls), then I think inserting a sleep(0) after every blocking call should be a quick fix to the problem. Nope. The blocking call still blocks, then it returns, hits the sleep, and is scheduled. The only option is to wrap it with a thread pool. From an OS perspective, there are no such things as greenthreads. The same task_struct in the Linux kernel (representing a Posix thread) that manages the body of the web application is used to process the IO. The Linux thread goes into a sleep state until the IO comes back, and the kernel scheduler will schedule another OS process or task. In order to get both the IO to complete and the greenthread scheduler to process another greenthread, you need to have two Posix threads. If the libvirt API (or other Native API) has an async mode, what you can do is provide a synchronous, python based wrapper that does the following.
register_request_callback() async_call() sleep() The only time sleep() as called from Python code is going to help you is if you have a long running stretch of Python code, and you sleep() in the middle of it. But if it's a long blocking call like the snapshot case, we are probably screwed anyway and need OS thread level parallelism or multiprocessing to make it truly non-blocking. Thanks, Yep. Yun On Mon, Mar 5, 2012 at 10:43 AM, Day, Phil wrote: Hi Yun, The point of the sleep(0) is to explicitly yield from a long running eventlet so that other eventlets aren't blocked for a long period. Depending on how you look at that either means we're making an explicit judgement on priority, or trying to provide a more equal sharing of run-time across eventlets. It's not that things are CPU bound as such - more just that eventlets have very few pre-emption points. Even an IO bound activity like creating a snapshot won't cause an eventlet switch. So in terms of priority we're trying to get to the state where: - Important periodic events (such as service status) run when expected (if these take a long time we're stuffed anyway) - User initiated actions don't get blocked by background system eventlets (such as refreshing power-state) - Slow actions from one user don't block actions from other users (the first user will expect their snapshot to take X seconds, the second one won't expect their VM creation to take X + Y seconds). It almost feels like the right level of concurrency would be to have a task/process running for each VM, so that there is concurrency across un-related VMs, but serialisation for each VM. Phil -Original Message- From: Yun Mao [mailto:yun...@gmail.com] Sent: 02 March 2012 20:32 To: Day, Phil Cc: Chris Behrens; Joshua Harlow; openstack Subject: Re: [Openstack] eventlet weirdness Hi Phil, I'm a little confused. To what extent does sleep(0) help? It only gives the greenlet scheduler a chance to switch to another green thread.
If we are having a CPU bound issue, sleep(0) won't give us access to any more CPU cores. So the total time to finish should be the same no matter what. It may improve the fairness among different green threads but shouldn't help the throughput. I think the only apparent gain to me is a situation such that there is 1 green thread with long CPU time and many other green threads with small CPU time. The total finish time will be the same with or without sleep(0), but with sleep in the first thread, the others should be much more responsive. However, it's unclear to me which part of Nova is very CPU intensive. It seems that most work here is IO bound, including the snapshot. Do we have other blocking calls besides mysql access? I feel like I'm missing something but couldn't figure out what. Thanks, Yun On Fri, Mar 2, 2012 at 2:08 PM, Day, Phil wrote: I didn't say it was pretty - Given the choice I'd much rather have a threading model that really did concurrency and pre-emption in all the right places ...
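Jay's "wrap it with a thread pool" point can be sketched outside eventlet with asyncio plus a real OS thread (eventlet's equivalent is offloading to its tpool). `blocking_native_call` is a stand-in for C code — say, a libvirt snapshot — that never yields to the green scheduler:

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_native_call():
    # Stand-in for native code that blocks the calling OS thread.
    time.sleep(0.3)
    return "snapshot-done"

async def main():
    events = []
    loop = asyncio.get_running_loop()

    async def heartbeat():           # a periodic task that must keep running
        for _ in range(5):
            events.append("tick")
            await asyncio.sleep(0.02)

    hb = asyncio.create_task(heartbeat())
    # Offload the blocking call to a second OS thread: the event loop
    # (the greenthread scheduler, in eventlet terms) stays live meanwhile.
    result = await loop.run_in_executor(ThreadPoolExecutor(max_workers=1),
                                        blocking_native_call)
    events.append(result)
    await hb
    return events

events = asyncio.run(main())
```

All five heartbeat ticks land before the blocking call returns, which is the "two Posix threads" requirement in action: one thread blocks in the native call while the other keeps scheduling.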
Re: [Openstack] eventlet weirdness
Hi Phil, My understanding is that, (forget Nova for a second) in a perfect eventlet world, a green thread is either doing CPU intensive computing, or waiting in system calls that are IO related. In the latter case, the eventlet scheduler will suspend the green thread and switch to another green thread that is ready to run. Back to reality, as you mentioned this is broken - some IO bound activity won't cause an eventlet switch. To me the only possibility that happens is the same reason those MySQL calls are blocking - we are using C-based modules that don't respect monkey patching and never yield. I'm suspecting that all libvirt based calls also belong to this category. Now if those blocking calls can finish in a very short amount of time (as we assume for DB calls), then I think inserting a sleep(0) after every blocking call should be a quick fix to the problem. But if it's a long blocking call like the snapshot case, we are probably screwed anyway and need OS thread level parallelism or multiprocessing to make it truly non-blocking. Thanks, Yun On Mon, Mar 5, 2012 at 10:43 AM, Day, Phil wrote: > Hi Yun, > > The point of the sleep(0) is to explicitly yield from a long running eventlet > so that other eventlets aren't blocked for a long period. Depending on > how you look at that either means we're making an explicit judgement on > priority, or trying to provide a more equal sharing of run-time across > eventlets. > > It's not that things are CPU bound as such - more just that eventlets have > very few pre-emption points. Even an IO bound activity like creating a > snapshot won't cause an eventlet switch.
> > So in terms of priority we're trying to get to the state where: > - Important periodic events (such as service status) run when expected (if > these take a long time we're stuffed anyway) > - User initiated actions don't get blocked by background system eventlets > (such as refreshing power-state) > - Slow actions from one user don't block actions from other users (the first > user will expect their snapshot to take X seconds, the second one won't > expect their VM creation to take X + Y seconds). > > It almost feels like the right level of concurrency would be to have a > task/process running for each VM, so that there is concurrency across > un-related VMs, but serialisation for each VM. > > Phil > > -Original Message- > From: Yun Mao [mailto:yun...@gmail.com] > Sent: 02 March 2012 20:32 > To: Day, Phil > Cc: Chris Behrens; Joshua Harlow; openstack > Subject: Re: [Openstack] eventlet weirdness > > Hi Phil, I'm a little confused. To what extent does sleep(0) help? > > It only gives the greenlet scheduler a chance to switch to another green > thread. If we are having a CPU bound issue, sleep(0) won't give us access to > any more CPU cores. So the total time to finish should be the same no matter > what. It may improve the fairness among different green threads but shouldn't > help the throughput. I think the only apparent gain to me is a situation such > that there is 1 green thread with long CPU time and many other green threads > with small CPU time. > The total finish time will be the same with or without sleep(0), but with > sleep in the first thread, the others should be much more responsive. > > However, it's unclear to me which part of Nova is very CPU intensive. > It seems that most work here is IO bound, including the snapshot. Do we have > other blocking calls besides mysql access? I feel like I'm missing something > but couldn't figure out what.
> > Thanks, > > Yun > > > On Fri, Mar 2, 2012 at 2:08 PM, Day, Phil wrote: >> I didn't say it was pretty - Given the choice I'd much rather have a >> threading model that really did concurrency and pre-emption in all the right >> places, and it would be really cool if something managed the threads that >> were started so that if a second conflicting request was received it did >> some proper tidy up or blocking rather than just leaving the race condition >> to work itself out (then we wouldn't have to try and control it by checking >> vm_state). >> >> However ... In the current code base where we only have user space based >> eventlets, with no pre-emption, and some activities that need to be >> prioritised then forcing pre-emption with a sleep(0) seems a pretty small >> bit of untidiness. And it works now without a major code refactor. >> >> Always open to other approaches ... >> >> Phil >> >> >> -Original Message- >> From: openstack-bounces+philip.day=hp@lists.launchpad.net >> [mailto:openstack-bo
Re: [Openstack] eventlet weirdness
"Eric Windisch" said: >> an rpc implementation that writes to disk and returns, > > A what? I'm not sure what problem you're looking to solve here or what you think > the RPC mechanism should do. Perhaps you're speaking of a Kombu or AMQP specific > improvement? > > There is no absolute need for persistence or durability in RPC. I've done quite a > bit of analysis of this requirement and it simply isn't necessary. There is some > need in AMQP for this due to implementation-specific issues, but not necessarily > unsolvable. However, these problems simply do not exist for all RPC > implementations... This was a side issue and I probably should have left it out of my email. I wasn't angling for persistence at all here. Rather I was thinking that I sometimes see rpc casts taking 10-20 ms in nova-api, and I wonder if we could pare that down without harming reliability by writing casts to a local resource and streaming them over the network in the background. I'm guessing if that local resource is disk with fsyncs between each write, there would likely be a performance degradation, so I'm not advocating that. But without fsyncs seemed like it might be okay. Maybe this is just silly and you're about to tell me how it's all a bad idea anyway :-)
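The idea floated here — a cast that lands in a local buffer so the caller returns immediately, with a background worker streaming messages to the network — can be sketched with a queue and a drain thread. `BufferedCaster` and the list-as-transport are hypothetical, and as the email notes, without fsyncs there is no durability if the process dies before the drain:

```python
import queue
import threading

class BufferedCaster:
    """Hypothetical rpc shim: cast() returns as soon as the message is
    buffered locally; a background thread streams it to the transport."""

    def __init__(self, transport):
        self._queue = queue.Queue()
        self._transport = transport            # stands in for the AMQP channel
        threading.Thread(target=self._drain, daemon=True).start()

    def cast(self, msg):
        self._queue.put(msg)                   # microseconds; no network wait

    def flush(self):
        self._queue.join()                     # wait for the drain (tests only)

    def _drain(self):
        while True:
            msg = self._queue.get()
            self._transport.append(msg)        # the slow network publish
            self._queue.task_done()

sent = []
caster = BufferedCaster(sent)
for i in range(3):
    caster.cast({"method": "run_instance", "seq": i})
caster.flush()
```

The caller's latency is now a local enqueue rather than the 10-20 ms network round trip, at the cost of losing buffered casts on a crash.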
Re: [Openstack] eventlet weirdness
> an rpc implementation that writes to disk and returns, A what? I'm not sure what problem you're looking to solve here or what you think the RPC mechanism should do. Perhaps you're speaking of a Kombu or AMQP specific improvement? There is no absolute need for persistence or durability in RPC. I've done quite a bit of analysis of this requirement and it simply isn't necessary. There is some need in AMQP for this due to implementation-specific issues, but not necessarily unsolvable. However, these problems simply do not exist for all RPC implementations... -- Eric Windisch
Re: [Openstack] eventlet weirdness
> However I'd like to point out that the math below is misleading (the average > time for the non-blocking case is also miscalculated but > it's not my point). The number that matters more in real life is throughput. > For the blocking case it's 3/30 = 0.1 request per second. I think it depends on whether you are trying to characterise system performance (processing time) or perceived user experience (queuing time + processing time). My users are kind of selfish in that they don't care how many transactions per second I can get through, just how long it takes for them to get a response from when they submit the request. Making the DB calls non-blocking does help a very small bit in driving up API server utilisation - but my point was that time spent in the DB is such a small part of the total time in the API server that it's not the thing that needs to be optimised first. Any queuing system will explode when its utilisation approaches 100%, blocking or not. Moving to non-blocking just means that you can hit 100% utilisation in the API server with 2 concurrent requests instead of *only* being able to hit 90+% with one transaction. That's not a great leap forward in my perception. Phil -Original Message- From: Yun Mao [mailto:yun...@gmail.com] Sent: 03 March 2012 01:11 To: Day, Phil Cc: openstack@lists.launchpad.net Subject: Re: [Openstack] eventlet weirdness First I agree that having blocking DB calls is no big deal given the way Nova uses mysql and reasonably powerful db server hardware. However I'd like to point out that the math below is misleading (the average time for the non-blocking case is also miscalculated but it's not my point). The number that matters more in real life is throughput. For the blocking case it's 3/30 = 0.1 request per second. For the non-blocking case it's 3/27 = 0.11 requests per second.
That means if there is a request coming in every 9 seconds constantly, the blocking system will eventually explode but the non-blocking system can still handle it. Therefore, the non-blocking one should be preferred. Thanks, Yun > > For example in the API server (before we made it properly multi-threaded) > with blocking db calls the server was essentially a serial processing queue - > each request was fully processed before the next. With non-blocking db calls > we got a lot more apparent concurrency but only at the expense of making all > of the requests equally bad. > > Consider a request takes 10 seconds, where after 5 seconds there is a call to > the DB which takes 1 second, and three are started at the same time: > > Blocking: > 0 - Request 1 starts > 10 - Request 1 completes, request 2 starts > 20 - Request 2 completes, request 3 starts > 30 - Request 3 competes > Request 1 completes in 10 seconds > Request 2 completes in 20 seconds > Request 3 completes in 30 seconds > Ave time: 20 sec > > > Non-blocking > 0 - Request 1 Starts > 5 - Request 1 gets to db call, request 2 starts > 10 - Request 2 gets to db call, request 3 starts > 15 - Request 3 gets to db call, request 1 resumes > 19 - Request 1 completes, request 2 resumes > 23 - Request 2 completes, request 3 resumes > 27 - Request 3 completes > > Request 1 completes in 19 seconds (+ 9 seconds) Request 2 completes > in 24 seconds (+ 4 seconds) Request 3 completes in 27 seconds (- 3 > seconds) Ave time: 20 sec > > So instead of worrying about making db calls non-blocking we've been working > to make certain eventlets non-blocking - i.e. add sleep(0) calls to long > running iteration loops - which IMO has a much bigger impact on the > apparent latency of the system.
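Taking Phil's timelines at face value, the figures are easy to check; as Yun notes, the non-blocking average does not actually come out to 20 seconds:

```python
# Completion times (seconds) read off the two timelines quoted above.
blocking = [10, 20, 30]
nonblocking = [19, 23, 27]   # the email's "24" for request 2 looks like a slip

average = lambda times: sum(times) / len(times)

assert average(blocking) == 20.0
assert average(nonblocking) == 23.0            # not 20, as Yun points out
assert 3 / blocking[-1] == 0.1                 # 0.1 requests/second
assert 3 / nonblocking[-1] > 3 / blocking[-1]  # ~0.111 req/s: higher throughput
```

So the non-blocking schedule trades slightly worse average latency for a higher sustainable arrival rate, which is exactly the disagreement in this subthread.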
Re: [Openstack] eventlet weirdness
Hi Yun, The point of the sleep(0) is to explicitly yield from a long running eventlet so that other eventlets aren't blocked for a long period. Depending on how you look at that either means we're making an explicit judgement on priority, or trying to provide a more equal sharing of run-time across eventlets. It's not that things are CPU bound as such - more just that eventlets have very few pre-emption points. Even an IO bound activity like creating a snapshot won't cause an eventlet switch. So in terms of priority we're trying to get to the state where: - Important periodic events (such as service status) run when expected (if these take a long time we're stuffed anyway) - User initiated actions don't get blocked by background system eventlets (such as refreshing power-state) - Slow actions from one user don't block actions from other users (the first user will expect their snapshot to take X seconds, the second one won't expect their VM creation to take X + Y seconds). It almost feels like the right level of concurrency would be to have a task/process running for each VM, so that there is concurrency across un-related VMs, but serialisation for each VM. Phil -Original Message- From: Yun Mao [mailto:yun...@gmail.com] Sent: 02 March 2012 20:32 To: Day, Phil Cc: Chris Behrens; Joshua Harlow; openstack Subject: Re: [Openstack] eventlet weirdness Hi Phil, I'm a little confused. To what extent does sleep(0) help? It only gives the greenlet scheduler a chance to switch to another green thread. If we are having a CPU bound issue, sleep(0) won't give us access to any more CPU cores. So the total time to finish should be the same no matter what. It may improve the fairness among different green threads but shouldn't help the throughput. I think the only apparent gain to me is a situation such that there is 1 green thread with long CPU time and many other green threads with small CPU time.
The total finish time will be the same with or without sleep(0), but with sleep in the first thread, the others should be much more responsive. However, it's unclear to me which part of Nova is very CPU intensive. It seems that most work here is IO bound, including the snapshot. Do we have other blocking calls besides mysql access? I feel like I'm missing something but couldn't figure out what. Thanks, Yun On Fri, Mar 2, 2012 at 2:08 PM, Day, Phil wrote: > I didn't say it was pretty - Given the choice I'd much rather have a > threading model that really did concurrency and pre-emption in all the right > places, and it would be really cool if something managed the threads that > were started so that if a second conflicting request was received it did some > proper tidy up or blocking rather than just leaving the race condition to > work itself out (then we wouldn't have to try and control it by checking > vm_state). > > However ... In the current code base where we only have user space based > eventlets, with no pre-emption, and some activities that need to be > prioritised then forcing pre-emption with a sleep(0) seems a pretty small bit > of untidiness. And it works now without a major code refactor. > > Always open to other approaches ... > > Phil > > > -Original Message- > From: openstack-bounces+philip.day=hp@lists.launchpad.net > [mailto:openstack-bounces+philip.day=hp@lists.launchpad.net] On > Behalf Of Chris Behrens > Sent: 02 March 2012 19:00 > To: Joshua Harlow > Cc: openstack; Chris Behrens > Subject: Re: [Openstack] eventlet weirdness > > It's not just you > > > On Mar 2, 2012, at 10:35 AM, Joshua Harlow wrote: > >> Does anyone else feel that the following seems really "dirty", or is it just >> me. >> >> "adding a few sleep(0) calls in various places in the Nova codebase >> (as was recently added in the _sync_power_states() periodic task) is >> an easy and simple win with pretty much no ill side-effects.
:)" >> >> Dirty in that it feels like there is something wrong from a design point of >> view. >> Sprinkling "sleep(0)" seems like its a band-aid on a larger problem imho. >> But that's just my gut feeling. >> >> :-( >> >> On 3/2/12 8:26 AM, "Armando Migliaccio" >> wrote: >> >> I knew you'd say that :P >> >> There you go: https://bugs.launchpad.net/nova/+bug/944145 >> >> Cheers, >> Armando >> >> > -Original Message- >> > From: Jay Pipes [mailto:jaypi...@gmail.com] >> > Sent: 02 March 2012 16:22 >> > To: Armando Migliaccio >> > Cc: openstack@lists.launchpad.net >>
Re: [Openstack] eventlet weirdness
Excerpts from Mark Washenberger's message of 2012-03-04 23:34:03 -0500: > While we are on the topic of api performance and the database, I have a > few thoughts I'd like to share. > > TL;DR: > - we should consider refactoring our wsgi server to leverage multiple > processors > - we could leverage compute-cell database responsibility separation > to speed up our api database performance by several orders of magnitude > > I think the main way eventlet holds us back right now is that we have > such low utilization. The big jump with multiprocessing or threading > would be the potential to leverage more powerful hardware. Currently > nova-api probably wouldn't run any faster on bare metal than it would > run on an m1.tiny. Of course, this isn't an eventlet limitation per se > but rather we are limiting ourselves to eventlet single-processing > performance with our wsgi server implementation. This seems fairly easily remedied without code changes via usage of something like gunicorn (in multi-process single socket mode as wsgi frontend), or any generic load balancer against multiple processes. But it's of limited utility unless the individual processes can handle concurrency scenarios greater than 1. I'm a bit skeptical about the use of multiprocessing, it imposes its own set of constraints and problems. Interestingly, using something like zmq (again with its own issues, but more robust imo than multiprocessing) allows for transparency from single process ipc to network ipc without the file handle and event loop inheritance concerns of something like multiprocessing. > > However, the greatest performance improvement I see would come from > streamlining the database interactions incurred on each nova-api > request. We have been pretty fast-and-loose with adding database > and glance calls to the openstack api controllers and compute api.
> I am especially thinking of the extension mechanism, which tends > to require another database call for each /servers extension a > deployer chooses to enable. > > But, if we think in ideal terms, each api request should perform > no more than 1 database call for queries, and no more than 2 db calls > for commands (validation + initial creation). In addition, I can > imagine an implementation where these database calls don't have any > joins, and involve no more than one network roundtrip. > Is there any debug tooling around api endpoints that can identify these calls, à la some of the wsgi middleware targeted towards web apps (i.e. debug toolbars)? > Beyond refactoring the way we add in data for response extensions, > I think the right way to get this database performance is to make the > compute-cells approach the "normal". In this approach, there are > at least two nova databases, one which lives along with the nova-api > nodes, and one that lives in a compute cell. The api database is kept > up to date through asynchronous updates that bubble up from the > compute cells. With this separation, we are free to tailor the schema > of the api database to match api performance needs, while we tailor > the schema of the compute cell database to the operational requirements > of compute workers. In particular, we can completely denormalize the > tables in the api database without creating unpleasant side effects > in the compute manager code. This denormalization both means fewer > database interactions and fewer joins (which likely matters for larger > deployments). > > If we partner this streamlining and denormalization approach with > similar attentions to glance performance and an rpc implementation > that writes to disk and returns, processing network activities in > the background, I think we could get most api actions to < 10 ms on > reasonable hardware.
> > As much as the initial push on compute-cells is about scale, I think > it could enable major performance improvements directly on its heels > during the Folsom cycle. This is something I'd love to talk about more > at the conference if anyone has any interest. > sounds interesting, but potentially complex, with schema and data drift possibilities. cheers, Kapil
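One lightweight answer to Kapil's question is a WSGI middleware that counts db calls per request. A sketch, with hypothetical names throughout (`DBCallCounter`, and the `record()` hook you would have to wire into the db api layer):

```python
import threading

class DBCallCounter:
    """Hypothetical WSGI middleware: tally db calls made while serving
    a request, and stash the count on the environ for logging."""

    def __init__(self, app):
        self.app = app
        self._local = threading.local()    # per-thread counter

    def record(self):                      # call this from the db layer
        self._local.calls = getattr(self._local, "calls", 0) + 1

    def __call__(self, environ, start_response):
        self._local.calls = 0
        body = self.app(environ, start_response)
        environ["db.call_count"] = self._local.calls
        return body

# Toy app that "hits the database" twice per request.
def app(environ, start_response):
    counter.record()
    counter.record()
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]

counter = DBCallCounter(app)
environ = {}
body = counter(environ, lambda status, headers: None)
```

After a request, `environ["db.call_count"]` holds the tally; a real deployment would log it or expose it as a response header, and under eventlet greenthread-local storage would replace `threading.local`.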
Re: [Openstack] eventlet weirdness
On Mar 4, 2012, at 8:56 PM, Gabe Westmaas wrote: > I agree with this paragraph wholeheartedly! I would definitely like to see > this separation not only for the reasons you list above (performance, all > installations behaving the same way) but also because I think it gives us a > lot more power to help handle seamless upgrades - another topic I'm sure we > will be discussing at the conference. > And it makes the compute cells stuff plug in a LOT more cleanly.
Re: [Openstack] eventlet weirdness
Pretty much +1 to all of that. The other problem I see that a separate 'view' for the API solves...is state tracking. I feel API should be keeping its own state on things. What the API allows per the spec should be completely separated from the services state tracking. As you mention, compute cells somewhat achieves this also as a side effect of its implementation. On Mar 4, 2012, at 8:34 PM, "Mark Washenberger" wrote: > While we are on the topic of api performance and the database, I have a > few thoughts I'd like to share. > > TL;DR: > - we should consider refactoring our wsgi server to leverage multiple > processors > - we could leverage compute-cell database responsibility separation > to speed up our api database performance by several orders of magnitude > > I think the main way eventlet holds us back right now is that we have > such low utilization. The big jump with multiprocessing or threading > would be the potential to leverage more powerful hardware. Currently > nova-api probably wouldn't run any faster on bare metal than it would > run on an m1.tiny. Of course, this isn't an eventlet limitation per se > but rather we are limiting ourselves to eventlet single-processing > performance with our wsgi server implementation. > > However, the greatest performance improvement I see would come from > streamlining the database interactions incurred on each nova-api > request. We have been pretty fast-and-loose with adding database > and glance calls to the openstack api controllers and compute api. > I am especially thinking of the extension mechanism, which tends > to require another database call for each /servers extension a > deployer chooses to enable. > > But, if we think in ideal terms, each api request should perform > no more than 1 database call for queries, and no more than 2 db calls > for commands (validation + initial creation).
In addition, I can > imagine an implementation where these database calls don't have any > joins, and involve no more than one network roundtrip. > > Beyond refactoring the way we add in data for response extensions, > I think the right way to get this database performance is to make the > compute-cells approach the "normal". In this approach, there are > at least two nova databases, one which lives along with the nova-api > nodes, and one that lives in a compute cell. The api database is kept > up to date through asynchronous updates that bubble up from the > compute cells. With this separation, we are free to tailor the schema > of the api database to match api performance needs, while we tailor > the schema of the compute cell database to the operational requirements > of compute workers. In particular, we can completely denormalize the > tables in the api database without creating unpleasant side effects > in the compute manager code. This denormalization both means fewer > database interactions and fewer joins (which likely matters for larger > deployments). > > If we partner this streamlining and denormalization approach with > similar attentions to glance performance and an rpc implementation > that writes to disk and returns, processing network activities in > the background, I think we could get most api actions to < 10 ms on > reasonable hardware. > > As much as the initial push on compute-cells is about scale, I think > it could enable major performance improvements directly on its heels > during the Folsom cycle. This is something I'd love to talk about more > at the conference if anyone has any interest.
Re: [Openstack] eventlet weirdness
> Beyond refactoring the way we add in data for response extensions, I think > the right way to get this database performance is to make the compute-cells > approach the "normal". In this approach, there are at least two nova > databases, one which lives along with the nova-api nodes, and one that lives > in a compute cell. The api database is kept up to date through asynchronous > updates that bubble up from the compute cells. With this separation, we are > free to tailor the schema of the api database to match api performance > needs, while we tailor the schema of the compute cell database to the > operational requirements of compute workers. In particular, we can > completely denormalize the tables in the api database without creating > unpleasant side effects in the compute manager code. This denormalization > both means fewer database interactions and fewer joins (which likely > matters for larger deployments). I agree with this paragraph wholeheartedly! I would definitely like to see this separation not only for the reasons you list above (performance, all installations behaving the same way) but also because I think it gives us a lot more power to help handle seamless upgrades - another topic I'm sure we will be discussing at the conference.
Re: [Openstack] eventlet weirdness
While we are on the topic of api performance and the database, I have a few thoughts I'd like to share. TL;DR: - we should consider refactoring our wsgi server to leverage multiple processors - we could leverage compute-cell database responsibility separation to speed up our api database performance by several orders of magnitude I think the main way eventlet holds us back right now is that we have such low utilization. The big jump with multiprocessing or threading would be the potential to leverage more powerful hardware. Currently nova-api probably wouldn't run any faster on bare metal than it would run on an m1.tiny. Of course, this isn't an eventlet limitation per se but rather we are limiting ourselves to eventlet single-processing performance with our wsgi server implementation. However, the greatest performance improvement I see would come from streamlining the database interactions incurred on each nova-api request. We have been pretty fast-and-loose with adding database and glance calls to the openstack api controllers and compute api. I am especially thinking of the extension mechanism, which tends to require another database call for each /servers extension a deployer chooses to enable. But, if we think in ideal terms, each api request should perform no more than 1 database call for queries, and no more than 2 db calls for commands (validation + initial creation). In addition, I can imagine an implementation where these database calls don't have any joins, and involve no more than one network roundtrip. Beyond refactoring the way we add in data for response extensions, I think the right way to get this database performance is make the compute-cells approach the "normal". In this approach, there are at least two nova databases, one which lives along with the nova-api nodes, and one that lives in a compute cell. The api database is kept up to date through asynchronous updates that bubble up from the compute cells. 
With this separation, we are free to tailor the schema of the api database to match api performance needs, while we tailor the schema of the compute cell database to the operational requirements of compute workers. In particular, we can completely denormalize the tables in the api database without creating unpleasant side effects in the compute manager code. This denormalization both means fewer database interactions and fewer joins (which likely matters for larger deployments). If we partner this streamlining and denormalization approach with similar attentions to glance performance and an rpc implementation that writes to disk and returns, processing network activities in the background, I think we could get most api actions to < 10 ms on reasonable hardware. As much as the initial push on compute-cells is about scale, I think it could enable major performance improvements directly on its heels during the Folsom cycle. This is something I'd love to talk about more at the conference if anyone has any interest.
Re: [Openstack] eventlet weirdness
Excerpts from Monsyne Dragon's message of 2012-03-02 16:10:01 -0500: > > On Mar 2, 2012, at 9:17 AM, Jay Pipes wrote: > > > On 03/02/2012 05:34 AM, Day, Phil wrote: > >> In our experience (running clusters of several hundred nodes) the DB > >> performance is not generally the significant factor, so making its calls > >> non-blocking gives only a very small increase in processing capacity and > >> creates other side effects in terms of slowing all eventlets down as they > >> wait for their turn to run. > > > > Yes, I believe I said that this was the case at the last design summit -- > > or rather, I believe I said "is there any evidence that the database is a > > performance or scalability problem at all"? > > > >> That shouldn't really be surprising given that the Nova DB is pretty small > >> and MySQL is a pretty good DB - throw reasonable hardware at the DB server > >> and give it a bit of TLC from a DBA (remove deleted entries from the DB, > >> add indexes where the slow query log tells you to, etc) and it shouldn't > >> be the bottleneck in the system for performance or scalability. > > > > ++ > > > >> We use the python driver and have experimented with allowing the eventlet > >> code to make the db calls non-blocking (its not the default setting), and > >> it works, but didn't give us any significant advantage. > > > > Yep, identical results to the work that Mark Washenberger did on the same > > subject. > > > > Has anyone thought about switching to gevent? It's similar enough to > eventlet that the port shouldn't be too bad, and because its event loop is > in C (libevent), there are C mysql drivers (ultramysql) that will work with > it without blocking. Switching to gevent won't fix the structural problems with the codebase that necessitated sleeps for greenlet switching. A refactoring to an architecture more amenable to decomposing api requests into discrete, yieldable tasks would help. 
Incidentally, ultramysql is not dbapi compliant and won't work with SQLAlchemy. -kapil
Re: [Openstack] eventlet weirdness
First I agree that having blocking DB calls is no big deal given the way Nova uses mysql and reasonably powerful db server hardware. However I'd like to point out that the math below is misleading (the average time for the nonblocking case is also miscalculated but it's not my point). The number that matters more in real life is throughput. For the blocking case it's 3/30 = 0.1 request per second. For the non-blocking case it's 3/27 = 0.11 requests per second. That means if there is a request coming in every 9 seconds constantly, the blocking system will eventually explode but the nonblocking system can still handle it. Therefore, the non-blocking one should be preferred. Thanks, Yun > > For example in the API server (before we made it properly multi-threaded) > with blocking db calls the server was essentially a serial processing queue - > each request was fully processed before the next. With non-blocking db calls > we got a lot more apparent concurrency but only at the expense of making all > of the requests equally bad. 
> > Consider a request takes 10 seconds, where after 5 seconds there is a call to the DB which takes 1 second, and three are started at the same time:
>
> Blocking:
> 0 - Request 1 starts
> 10 - Request 1 completes, request 2 starts
> 20 - Request 2 completes, request 3 starts
> 30 - Request 3 completes
> Request 1 completes in 10 seconds
> Request 2 completes in 20 seconds
> Request 3 completes in 30 seconds
> Ave time: 20 sec
>
> Non-blocking
> 0 - Request 1 Starts
> 5 - Request 1 gets to db call, request 2 starts
> 10 - Request 2 gets to db call, request 3 starts
> 15 - Request 3 gets to db call, request 1 resumes
> 19 - Request 1 completes, request 2 resumes
> 23 - Request 2 completes, request 3 resumes
> 27 - Request 3 completes
>
> Request 1 completes in 19 seconds (+ 9 seconds)
> Request 2 completes in 24 seconds (+ 4 seconds)
> Request 3 completes in 27 seconds (- 3 seconds)
> Ave time: 20 sec
>
> So instead of worrying about making db calls non-blocking we've been working to make certain eventlets non-blocking - i.e. add sleep(0) calls to long running iteration loops - which IMO has a much bigger impact on the performance of the apparent latency of the system.

Thanks for the explanation. Let me see if I understand this.
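Phil's timelines are easy to check with a toy model. The sketch below (plain Python, using his 5 s CPU + 1 s DB + 4 s CPU request shape) reproduces both schedules; note the non-blocking average works out to 23 s rather than 20 s, which is the miscalculation Yun mentions:

```python
def completion_times(n, cpu1=5, db=1, cpu2=4):
    """Completion times of n simultaneous requests under both models.

    Blocking: each request runs start-to-finish before the next begins.
    Non-blocking: the DB wait overlaps with other requests' CPU work,
    but CPU segments still serialize on the single green thread.
    """
    blocking = [(cpu1 + db + cpu2) * i for i in range(1, n + 1)]
    # All first CPU segments run back to back, then all second segments
    # (valid while each DB wait ends before that request's next turn,
    # i.e. db <= (n - 1) * cpu1).
    non_blocking = [n * cpu1 + i * cpu2 for i in range(1, n + 1)]
    return blocking, non_blocking

b, nb = completion_times(3)
print(b)   # [10, 20, 30] -- matches the blocking timeline
print(nb)  # [19, 23, 27] -- request 2 actually finishes at 23 s
print(sum(b) / 3, sum(nb) / 3)  # averages: 20.0 vs 23.0
```

So Yun's throughput point holds (the last request finishes at 27 s instead of 30 s), while the average latency actually gets slightly worse rather than staying equal.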
Re: [Openstack] eventlet weirdness
On Mar 2, 2012, at 2:11 PM, Duncan McGreggor wrote: > On Fri, Mar 2, 2012 at 4:10 PM, Monsyne Dragon wrote: >> >> >> Has anyone thought about switching to gevent? It's similar enough to >> eventlet that the port shouldn't be too bad, and because its event loop is >> in C (libevent), there are C mysql drivers (ultramysql) that will work with >> it without blocking. > > We've been exploring this possibility at DreamHost, and chatted with > some other stackers about it at various meat-space venues. Fwiw, it's > something we'd be very interested in supporting (starting with as much > test coverage as possible of eventlet's current use in OpenStack, to > ensure as pain-free a transition as possible). > > d I would be for an experimental try at this. Based on the experience of starting with twisted and moving to eventlet, I can almost guarantee that we will run into a new set of issues. Concurrency is difficult no matter which method/library you use and each change brings a new set of challenges. That said, gevent is similar enough to eventlet that I think we will at least be dealing with the same class of problems, so it might be less painful than moving to something totally different like threads, multiprocessing, or (back to) twisted. If there were significant performance benefits to switching, it would be worth exploring. I wouldn't want to devote a huge amount of time to this unless we see a significant reason to switch, so hopefully Jay gets around to testing it out. Vish
Re: [Openstack] eventlet weirdness
On Mar 2, 2012, at 12:50 PM, Jay Pipes wrote: > > We are not using multiprocessing, no. > > We simply start multiple worker processes listening on the same socket, with > each worker process having an eventlet greenthread pool. > > You can see the code (taken from Swift and adapted by Chris Behrens and Brian > Waldon to use the object-oriented Server approach that Glance/Keystone/Nova > uses) here: > > https://github.com/openstack/glance/blob/master/glance/common/wsgi.py > > There is a worker = XXX configuration option that controls the number of > worker processes created on server startup. A worker value of 0 indicates to > run identically to the way Nova currently runs (one process with an eventlet > pool of greenthreads) This would be excellent to add to nova as an option for performance reasons. Especially since you can fallback to the 0 version. I'm always concerned with mixing threading and eventlet as it leads to really odd bugs, but it sounds like HP has vetted it. If we keep 0 as the default I don't see any reason why it couldn't be added. Vish
Re: [Openstack] eventlet weirdness
On Fri, Mar 2, 2012 at 4:10 PM, Monsyne Dragon wrote: > > On Mar 2, 2012, at 9:17 AM, Jay Pipes wrote: > >> On 03/02/2012 05:34 AM, Day, Phil wrote: >>> In our experience (running clusters of several hundred nodes) the DB >>> performance is not generally the significant factor, so making its calls >>> non-blocking gives only a very small increase in processing capacity and >>> creates other side effects in terms of slowing all eventlets down as they >>> wait for their turn to run. >> >> Yes, I believe I said that this was the case at the last design summit -- or >> rather, I believe I said "is there any evidence that the database is a >> performance or scalability problem at all"? >> >>> That shouldn't really be surprising given that the Nova DB is pretty small >>> and MySQL is a pretty good DB - throw reasonable hardware at the DB server >>> and give it a bit of TLC from a DBA (remove deleted entries from the DB, >>> add indexes where the slow query log tells you to, etc) and it shouldn't be >>> the bottleneck in the system for performance or scalability. >> >> ++ >> >>> We use the python driver and have experimented with allowing the eventlet >>> code to make the db calls non-blocking (its not the default setting), and >>> it works, but didn't give us any significant advantage. >> >> Yep, identical results to the work that Mark Washenberger did on the same >> subject. >> > > Has anyone thought about switching to gevent? It's similar enough to > eventlet that the port shouldn't be too bad, and because its event loop is > in C (libevent), there are C mysql drivers (ultramysql) that will work with > it without blocking. We've been exploring this possibility at DreamHost, and chatted with some other stackers about it at various meat-space venues. Fwiw, it's something we'd be very interested in supporting (starting with as much test coverage as possible of eventlet's current use in OpenStack, to ensure as pain-free a transition as possible). 
d
Re: [Openstack] eventlet weirdness
Why has the ship sailed? This is software we are talking about right, there is always a v2 (X-1) ;) On 3/2/12 12:38 PM, "Caitlin Bestler" wrote: Duncan McGregor wrote: >Like so many things that are aesthetic in nature, the statement above is >misleading. Using a callback, event-based, deferred/promise oriented system is >hard for *some*. It is far, far easier for >others (myself included). >It's a matter of perception and personal preference. I would also agree that coding your application as a series of responses to events can produce code that is easier to understand and debug. And that would be a wonderful discussion if we were starting a new project. But I hope that nobody is suggesting that we rewrite all of OpenStack code away from eventlet pseudo-threading after the fact. Personally I think it was the wrong decision, but that ship has already sailed. With event-response coding it is obvious that you have to partition any one response into segments that do not take so long to Execute that they are blocking other events. That remains true when you hide your event-driven model with eventlet pseudo-threading. Inserting sleep(0) calls is the most obvious way to break up an overly event handler, given that you've already decided to obfuscate the Code to pretend that it is a thread.
Re: [Openstack] eventlet weirdness
It could be over-complicated (ie an example), but it's a design that lets the program think in terms of what tasks need to be accomplished and how to order those tasks, and not have to think about how those tasks are actually run (or hopefully even what concurrency occurs). Ideally there should be no concurrency in each step, that's the whole point of having individual steps :-) A step itself shouldn't be concurrent, but the overall "action" should/could be, and you leave it up to the "engine" to decide how to run that set of steps. *Just my thought*... On 3/2/12 11:38 AM, "Day, Phil" wrote: That sounds a bit over complicated to me - Having a string of tasks sounds like you still have to think about what the concurrency is within each step. There is already a good abstraction around the context of each operation - they just (I know - big just) need to be running in something that maps to kernel threads rather than user space ones. All I really want is to allow more than one action to run at the same time. So if I have two requests to create a snapshot, why can't they both run at the same time and still allow other things to happen ? I have all these cores sitting in my compute node that could be used, but I'm still having to think like a punch-card programmer submitting batch jobs to the mainframe ;-) Right now creating snapshots is pretty close to a DoS attack on a compute node. From: Joshua Harlow [mailto:harlo...@yahoo-inc.com] Sent: 02 March 2012 19:23 To: Day, Phil; Chris Behrens Cc: openstack Subject: Re: [Openstack] eventlet weirdness So a thought I had was that say if the design of a component forces as part of its design the ability to be run with threads or with eventlet or with processes. Say if u break everything up into tasks (where a task would produce some output/result/side-effect). A set of tasks could complete some action (ie, create a vm). Subtasks could be the following:
0. Validate credentials
1. Get the image
2. Call into libvirt
3. ...
These "tasks", if constructed in a way that makes them stateless, could then be chained together to form an action; that action could be given, say, to a threaded "engine" that would know how to execute those tasks with threads, or to an eventlet "engine" that would do the same with an eventlet pool/greenthreads/coroutines, or with processes (and so on). This could be one way the design of your code abstracts that kind of execution (where eventlet is abstracted away from the actual work being done, instead of popping up in calls to sleep(0), ie the leaky abstraction). On 3/2/12 11:08 AM, "Day, Phil" wrote: I didn't say it was pretty - Given the choice I'd much rather have a threading model that really did concurrency and pre-emption in all the right places, and it would be really cool if something managed the threads that were started so that if a second conflicting request was received it did some proper tidy up or blocking rather than just leaving the race condition to work itself out (then we wouldn't have to try and control it by checking vm_state). However ... In the current code base where we only have user space based eventlets, with no pre-emption, and some activities that need to be prioritised then forcing pre-emption with a sleep(0) seems a pretty small bit of untidy. And it works now without a major code refactor. Always open to other approaches ... Phil -Original Message- From: openstack-bounces+philip.day=hp@lists.launchpad.net [mailto:openstack-bounces+philip.day=hp@lists.launchpad.net] On Behalf Of Chris Behrens Sent: 02 March 2012 19:00 To: Joshua Harlow Cc: openstack; Chris Behrens Subject: Re: [Openstack] eventlet weirdness It's not just you On Mar 2, 2012, at 10:35 AM, Joshua Harlow wrote: > Does anyone else feel that the following seems really "dirty", or is it just > me. 
> > "adding a few sleep(0) calls in various places in the Nova codebase > (as was recently added in the _sync_power_states() periodic task) is > an easy and simple win with pretty much no ill side-effects. :)" > > Dirty in that it feels like there is something wrong from a design point of > view. > Sprinkling "sleep(0)" seems like it's a band-aid on a larger problem imho. > But that's just my gut feeling. > > :-( > > On 3/2/12 8:26 AM, "Armando Migliaccio" > wrote: > > I knew you'd say that :P > > There you go: https://bugs.launchpad.net/nova/+bug/944145 > > Cheers, > Armando > > > -Original Message- > > From: Jay Pipes [mailto:jaypi...@gmail.com] > > Sent: 02 March 2012 16:22 > > To: Armando Migliaccio > > Cc: openstack@lists.launchpad.net > > Subject: Re: [Openstack]
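For what it's worth, Joshua's tasks-plus-engine idea can be sketched in a few lines (all names here are hypothetical, not Nova code): stateless steps are chained into an action, and the choice of serial vs. threaded vs. eventlet execution lives entirely in the engine rather than in the tasks:

```python
from concurrent.futures import ThreadPoolExecutor

class Action:
    """An ordered chain of stateless tasks; each task receives the
    previous task's result."""
    def __init__(self, *tasks):
        self.tasks = tasks

class SerialEngine:
    """Runs every task in order in the calling thread."""
    def run(self, action, initial=None):
        result = initial
        for task in action.tasks:
            result = task(result)
        return result

class ThreadedEngine:
    """Same contract, but each step executes on a worker thread.
    An eventlet engine could spawn greenthreads behind the same API."""
    def run(self, action, initial=None):
        result = initial
        with ThreadPoolExecutor(max_workers=1) as pool:
            for task in action.tasks:
                result = pool.submit(task, result).result()
        return result

# Hypothetical "create VM" pipeline from the post, with stubbed steps:
action = Action(
    lambda _: "credentials-ok",     # 0. validate credentials
    lambda prev: prev + ";image",   # 1. get the image
    lambda prev: prev + ";domain",  # 2. call into libvirt (stubbed)
)
print(SerialEngine().run(action))    # credentials-ok;image;domain
print(ThreadedEngine().run(action))  # same result, different execution
```

The point is that the tasks never call sleep(0) or know about eventlet at all; only the engine does.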
Re: [Openstack] eventlet weirdness
On 03/02/2012 04:10 PM, Monsyne Dragon wrote: Has anyone thought about switching to gevent? It's similar enough to eventlet that the port shouldn't be too bad, and because its event loop is in C (libevent), there are C mysql drivers (ultramysql) that will work with it without blocking. Yep, I've thought about doing an experimental branch in Glance to see if there's a decent performance benefit. Just got stymied by that damn 24 hour limit in a day :( Damn ratelimiting. -jay
Re: [Openstack] eventlet weirdness
On Mar 2, 2012, at 9:17 AM, Jay Pipes wrote: > On 03/02/2012 05:34 AM, Day, Phil wrote: >> In our experience (running clusters of several hundred nodes) the DB >> performance is not generally the significant factor, so making its calls >> non-blocking gives only a very small increase in processing capacity and >> creates other side effects in terms of slowing all eventlets down as they >> wait for their turn to run. > > Yes, I believe I said that this was the case at the last design summit -- or > rather, I believe I said "is there any evidence that the database is a > performance or scalability problem at all"? > >> That shouldn't really be surprising given that the Nova DB is pretty small >> and MySQL is a pretty good DB - throw reasonable hardware at the DB server >> and give it a bit of TLC from a DBA (remove deleted entries from the DB, add >> indexes where the slow query log tells you to, etc) and it shouldn't be the >> bottleneck in the system for performance or scalability. > > ++ > >> We use the python driver and have experimented with allowing the eventlet >> code to make the db calls non-blocking (its not the default setting), and it >> works, but didn't give us any significant advantage. > > Yep, identical results to the work that Mark Washenberger did on the same > subject. > Has anyone thought about switching to gevent? It's similar enough to eventlet that the port shouldn't be too bad, and because its event loop is in C (libevent), there are C mysql drivers (ultramysql) that will work with it without blocking. >> For example in the API server (before we made it properly multi-threaded) > > By "properly multi-threaded" are you instead referring to making the nova-api > server multi-*processed* with eventlet greenthread pools in each process? > i.e. The way Swift (and now Glance) works? Or are you referring to a > different approach entirely? 
> > > with blocking db calls the server was essentially a serial processing queue - each request was fully processed before the next. With non-blocking db calls we got a lot more apparent concurrency but only at the expense of making all of the requests equally bad. > > Yep, not surprising. >
>> Consider a request takes 10 seconds, where after 5 seconds there is a call to the DB which takes 1 second, and three are started at the same time:
>>
>> Blocking:
>> 0 - Request 1 starts
>> 10 - Request 1 completes, request 2 starts
>> 20 - Request 2 completes, request 3 starts
>> 30 - Request 3 completes
>> Request 1 completes in 10 seconds
>> Request 2 completes in 20 seconds
>> Request 3 completes in 30 seconds
>> Ave time: 20 sec
>>
>> Non-blocking
>> 0 - Request 1 Starts
>> 5 - Request 1 gets to db call, request 2 starts
>> 10 - Request 2 gets to db call, request 3 starts
>> 15 - Request 3 gets to db call, request 1 resumes
>> 19 - Request 1 completes, request 2 resumes
>> 23 - Request 2 completes, request 3 resumes
>> 27 - Request 3 completes
>>
>> Request 1 completes in 19 seconds (+ 9 seconds)
>> Request 2 completes in 24 seconds (+ 4 seconds)
>> Request 3 completes in 27 seconds (- 3 seconds)
>> Ave time: 20 sec
>>
>> So instead of worrying about making db calls non-blocking we've been working to make certain eventlets non-blocking - i.e. add sleep(0) calls to long running iteration loops - which IMO has a much bigger impact on the performance of the apparent latency of the system.
> Yep, and I think adding a few sleep(0) calls in various places in the Nova codebase (as was recently added in the _sync_power_states() periodic task) is an easy and simple win with pretty much no ill side-effects. :)
> Curious... do you have a list of all the places where sleep(0) calls were inserted in the HP Nova code? I can turn that into a bug report and get to work on adding them... 
> > All the best, > -jay > >> Phil >> >> >> >> -Original Message- >> From: openstack-bounces+philip.day=hp@lists.launchpad.net >> [mailto:openstack-bounces+philip.day=hp@lists.launchpad.net] On Behalf >> Of Brian Lamar >> Sent: 01 March 2012 21:31 >> To: openstack@lists.launchpad.net >> Subject: Re: [Openstack] eventlet weirdness >> >>>> How is MySQL access handled in eventlet? Presumably it's external C >>>> library so it's not going to be monkey patched. Does that make every >>>> db access call a blocking call?
Re: [Openstack] eventlet weirdness
On 03/02/2012 03:38 PM, Caitlin Bestler wrote: Duncan McGregor wrote: Like so many things that are aesthetic in nature, the statement above is misleading. Using a callback, event-based, deferred/promise oriented system is hard for *some*. It is far, far easier for others (myself included). It's a matter of perception and personal preference. I would also agree that coding your application as a series of responses to events can produce code that is easier to understand and debug. And that would be a wonderful discussion if we were starting a new project. But I hope that nobody is suggesting that we rewrite all of OpenStack code away from eventlet pseudo-threading after the fact. Personally I think it was the wrong decision, but that ship has already sailed. Yep, that ship sailed more than 12 months ago. With event-response coding it is obvious that you have to partition any one response into segments that do not take so long to Execute that they are blocking other events. That remains true when you hide your event-driven model with eventlet pseudo-threading. Inserting sleep(0) calls is the most obvious way to break up an overly event handler, given that you've already decided to obfuscate the Code to pretend that it is a thread. I assume you meant "an overly greedy event handler" above? -jay
Re: [Openstack] eventlet weirdness
On 03/02/2012 01:35 PM, Joshua Harlow wrote: Does anyone else feel that the following seems really “dirty”, or is it just me. “adding a few sleep(0) calls in various places in the Nova codebase (as was recently added in the _sync_power_states() periodic task) is an easy and simple win with pretty much no ill side-effects. :)” Dirty in that it feels like there is something wrong from a design point of view. Sprinkling “sleep(0)” seems like it's a band-aid on a larger problem imho. But that’s just my gut feeling. It's not really all that dirty, IMHO. You just have to think of greenlet.sleep(0) as manually yielding control back to eventlet... Like Phil said, in the absence of a non-userspace threading model and thread scheduler, there's not a whole lot else one can do other than be mindful of what functions/methods may run for long periods of time and/or block I/O and call sleep(0) in those scenarios where it makes sense to yield a timeslice back to other processes. While it's true that eventlet (and to an extent Twisted) mask some of the complexities involved in non-blocking I/O in a threaded(-like) application programming model, I don't think there will be an eventlet-that-knows-what-methods-should-yield-and-which-should-be-prioritized library any time soon. -jay
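The manual-yield model Jay describes can be illustrated without eventlet at all. In this dependency-free sketch, generators stand in for greenthreads and `yield` plays the role of greenthread.sleep(0); the "greedy" worker shows what happens when a long loop never yields:

```python
def round_robin(threads):
    """Minimal cooperative scheduler: run each 'greenthread' until it
    yields, round-robin -- roughly what eventlet's hub does."""
    order = []
    while threads:
        t = threads.pop(0)
        try:
            order.append(next(t))
            threads.append(t)
        except StopIteration:
            pass  # this greenthread is finished
    return order

def polite(name, units):
    for _ in range(units):
        # ... one unit of work ...
        yield name          # the sleep(0): give other greenthreads a turn

def greedy(name, units):
    # Does all its work before yielding once -- a long loop with no
    # sleep(0) inside it.
    done = [n * n for n in range(units)]
    yield name

print(round_robin([polite("a", 3), polite("b", 3)]))
# ['a', 'b', 'a', 'b', 'a', 'b'] -- fair interleaving
print(round_robin([greedy("a", 3), polite("b", 3)]))
# ['a', 'b', 'b', 'b'] -- "a" monopolized its entire first timeslice
```

This is exactly why the sleep(0) in _sync_power_states() helps: the loop body runs between yields, so without the yield the whole loop is one timeslice.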
Re: [Openstack] eventlet weirdness
On 03/02/2012 02:27 PM, Vishvananda Ishaya wrote: On Mar 2, 2012, at 7:54 AM, Day, Phil wrote: By "properly multi-threaded" are you instead referring to making the nova-api server multi-*processed* with eventlet greenthread pools in each process? i.e. The way Swift (and now Glance) works? Or are you referring to a different approach entirely? Yep - following your posting in here pointing to the glance changes we back-ported that into the Diablo API server. We're now running each API server with 20 OS processes and 20 EC2 processes, and the world looks a lot happier. The same changes were being done in parallel into Essex by someone in the community I thought ? Can you or jay write up what this would entail in nova? (or even ship a diff) Are you using multiprocessing? In general we have had issues combining multiprocessing and eventlet, so in our deploys we run multiple api servers on different ports and load balance with ha proxy. It sounds like what you have is working though, so it would be nice to put it in (perhaps with a flag gate) if possible. We are not using multiprocessing, no. We simply start multiple worker processes listening on the same socket, with each worker process having an eventlet greenthread pool. You can see the code (taken from Swift and adapted by Chris Behrens and Brian Waldon to use the object-oriented Server approach that Glance/Keystone/Nova uses) here: https://github.com/openstack/glance/blob/master/glance/common/wsgi.py There is a worker = XXX configuration option that controls the number of worker processes created on server startup. A worker value of 0 indicates to run identically to the way Nova currently runs (one process with an eventlet pool of greenthreads) Best, -jay
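The worker model Jay describes (bind the socket once in the parent, fork N children, each child accepts on the inherited fd) can be sketched with plain sockets. This is a minimal Unix-only illustration, not the Glance code; in Glance each child would run an eventlet WSGI server with its own greenthread pool instead of this one-shot handler:

```python
import os
import socket

def serve(listener):
    """Worker body: accept one connection on the shared listening
    socket and answer. A real worker would loop and serve WSGI here."""
    conn, _ = listener.accept()
    conn.sendall(b"hello from worker %d" % os.getpid())
    conn.close()

# Bind ONCE in the parent, then fork: children inherit the socket fd
# and the kernel hands each incoming connection to exactly one of them.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(16)
port = listener.getsockname()[1]

workers = 2  # cf. the "worker" option in glance/common/wsgi.py
pids = []
for _ in range(workers):
    pid = os.fork()
    if pid == 0:          # child: handle one request, then exit
        serve(listener)
        os._exit(0)
    pids.append(pid)

replies = []
for _ in range(workers):
    with socket.create_connection(("127.0.0.1", port)) as c:
        replies.append(c.recv(100))
for pid in pids:
    os.waitpid(pid, 0)
print(sorted(replies))    # one distinct reply per worker process
```

Because the kernel arbitrates the accept(), no multiprocessing module or cross-process coordination is needed, which is why this coexists with eventlet so cleanly.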
Re: [Openstack] eventlet weirdness
On Fri, Mar 02, 2012, Duncan McGreggor wrote: > On Fri, Mar 2, 2012 at 2:40 PM, Johannes Erdfelt wrote: > > Twisted has a much harder programming model with the same blocking > > problem that eventlet has. > > Like so many things that are aesthetic in nature, the statement above > is misleading. Using a callback, event-based, deferred/promise > oriented system is hard for *some*. It is far, far easier for others > (myself included). > > It's a matter of perception and personal preference. > > It may be apropos to mention that Guido van Rossum himself has stated > that he shares the same view of concurrent programming in Python as > Glyph (the founder of Twisted): > https://plus.google.com/115212051037621986145/posts/a9SqS7faVWC > > Glyph's post, if you can't see that G+ link: > > http://glyph.twistedmatrix.com/2012/01/concurrency-spectrum-from-callbacks-to.html > > One thing to keep in mind is that with Twisted, you always have the > option of deferring to a thread for operations are not async-friendly. It's a shame that post chooses to ignore eventlet-style concurrency. It has all of the benefits of being almost as clear where concurrency can occur without needing a macro key to constantly output 'yield'. It also integrates with other python libraries better (but obviously not perfectly). Using coroutines for concurrency is anti-social programming. It excludes a whole suite of libraries merely because they didn't conform to your programming model. However, this is the wrong discussion to be having. Concurrency isn't the problem we should be worried about, it's isolation. If we can sufficiently isolate the work that each daemon needs to do, then concurrency is trivial. In the best case, they can be separate processes and we don't need to worry about a programming model. If we're not being too optimistic then threads with minimal locking is most likely. 
JE
Re: [Openstack] eventlet weirdness
On Fri, Mar 2, 2012 at 3:38 PM, Caitlin Bestler wrote: > Duncan McGregor wrote: > >>Like so many things that are aesthetic in nature, the statement above is >>misleading. Using a callback, event-based, deferred/promise oriented system >>is hard for *some*. It is far, far easier for >others (myself included). > >>It's a matter of perception and personal preference. > > I would also agree that coding your application as a series of responses to > events can produce code that is easier to understand and debug. > And that would be a wonderful discussion if we were starting a new project. > > But I hope that nobody is suggesting that we rewrite all of OpenStack code > away from eventlet pseudo-threading after the fact. > Personally I think it was the wrong decision, but that ship has already > sailed. Agreed. d
Re: [Openstack] eventlet weirdness
Duncan McGreggor wrote:
> Like so many things that are aesthetic in nature, the statement above is
> misleading. Using a callback, event-based, deferred/promise oriented system is
> hard for *some*. It is far, far easier for others (myself included).
>
> It's a matter of perception and personal preference.

I would also agree that coding your application as a series of responses to events can produce code that is easier to understand and debug. And that would be a wonderful discussion if we were starting a new project.

But I hope that nobody is suggesting that we rewrite all of OpenStack code away from eventlet pseudo-threading after the fact. Personally I think it was the wrong decision, but that ship has already sailed.

With event-response coding it is obvious that you have to partition any one response into segments that do not take so long to execute that they are blocking other events. That remains true when you hide your event-driven model with eventlet pseudo-threading.

Inserting sleep(0) calls is the most obvious way to break up an overly long event handler, given that you've already decided to obfuscate the code to pretend that it is a thread.
Re: [Openstack] eventlet weirdness
Hi Phil,

I'm a little confused. To what extent does sleep(0) help? It only gives the greenlet scheduler a chance to switch to another green thread. If we are having a CPU-bound issue, sleep(0) won't give us access to any more CPU cores, so the total time to finish should be the same no matter what. It may improve the fairness among different green threads but shouldn't help the throughput.

I think the only apparent gain is a situation in which there is 1 green thread with long CPU time and many other green threads with small CPU time. The total finish time will be the same with or without sleep(0), but with sleep in the first thread, the others should be much more responsive.

However, it's unclear to me which part of Nova is very CPU intensive. It seems that most work here is IO bound, including the snapshot. Do we have other blocking calls besides mysql access? I feel like I'm missing something but couldn't figure out what.

Thanks,

Yun

On Fri, Mar 2, 2012 at 2:08 PM, Day, Phil wrote:
> I didn't say it was pretty - Given the choice I'd much rather have a
> threading model that really did concurrency and pre-emption in all the right
> places, and it would be really cool if something managed the threads that
> were started so that if a second conflicting request was received it did some
> proper tidy-up or blocking rather than just leaving the race condition to
> work itself out (then we wouldn't have to try and control it by checking
> vm_state).
>
> However ... In the current code base where we only have user space based
> eventlets, with no pre-emption, and some activities that need to be
> prioritised then forcing pre-emption with a sleep(0) seems a pretty small bit
> of untidiness. And it works now without a major code refactor.
>
> Always open to other approaches ...
> > Phil > > > -Original Message- > From: openstack-bounces+philip.day=hp@lists.launchpad.net > [mailto:openstack-bounces+philip.day=hp@lists.launchpad.net] On Behalf Of > Chris Behrens > Sent: 02 March 2012 19:00 > To: Joshua Harlow > Cc: openstack; Chris Behrens > Subject: Re: [Openstack] eventlet weirdness > > It's not just you > > > On Mar 2, 2012, at 10:35 AM, Joshua Harlow wrote: > >> Does anyone else feel that the following seems really "dirty", or is it just >> me. >> >> "adding a few sleep(0) calls in various places in the Nova codebase >> (as was recently added in the _sync_power_states() periodic task) is >> an easy and simple win with pretty much no ill side-effects. :)" >> >> Dirty in that it feels like there is something wrong from a design point of >> view. >> Sprinkling "sleep(0)" seems like its a band-aid on a larger problem imho. >> But that's just my gut feeling. >> >> :-( >> >> On 3/2/12 8:26 AM, "Armando Migliaccio" >> wrote: >> >> I knew you'd say that :P >> >> There you go: https://bugs.launchpad.net/nova/+bug/944145 >> >> Cheers, >> Armando >> >> > -Original Message- >> > From: Jay Pipes [mailto:jaypi...@gmail.com] >> > Sent: 02 March 2012 16:22 >> > To: Armando Migliaccio >> > Cc: openstack@lists.launchpad.net >> > Subject: Re: [Openstack] eventlet weirdness >> > >> > On 03/02/2012 10:52 AM, Armando Migliaccio wrote: >> > > I'd be cautious to say that no ill side-effects were introduced. I >> > > found a >> > race condition right in the middle of sync_power_states, which I >> > assume was exposed by "breaking" the task deliberately. >> > >> > Such a party-pooper! ;) >> > >> > Got a link to the bug report for me? >> > >> > Thanks! 
>> > -jay
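Yun Mao's fairness point above can be made concrete with a toy round-robin scheduler. This is a pure-Python illustration, not eventlet's actual machinery: `yield` stands in for greenthread.sleep(0), and "time" is counted in work slices rather than wall-clock seconds.

```python
# Toy cooperative scheduler: yielding does not change total work, but
# it lets the short task finish much earlier. Purely illustrative.
from collections import deque

def run(tasks):
    """Round-robin over (name, generator) pairs; record finish 'times'."""
    queue = deque(tasks)
    finish, step = {}, 0
    while queue:
        name, gen = queue.popleft()
        try:
            step += next(gen)          # one slice of work
            queue.append((name, gen))  # requeue after its voluntary yield
        except StopIteration:
            finish[name] = step
    return finish

def long_task(cooperative):
    if cooperative:
        for _ in range(10):
            yield 1                    # the sleep(0) analogue
    else:
        yield 10                       # hog: all the work in one slice

def short_task():
    yield 1

fair = run([("long", long_task(True)), ("short", short_task())])
greedy = run([("long", long_task(False)), ("short", short_task())])
print(fair)    # short finishes early
print(greedy)  # short waits behind the hog; totals are identical
```

Total finish time is the same either way (11 slices); only the responsiveness of the short task changes, which is exactly the fairness-not-throughput trade-off described above.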
Re: [Openstack] eventlet weirdness
On Fri, Mar 2, 2012 at 2:40 PM, Johannes Erdfelt wrote:
> On Fri, Mar 02, 2012, Armando Migliaccio wrote:
>> I agree, but then the whole assumption of adopting eventlet to simplify
>> the programming model is hindered by the fact that one has to think
>> harder about what it is doing...Nova could've kept Twisted for that matter.
>> The programming model would have been harder, but at least it would
>> have been cleaner and free from icky patching (that's my own opinion
>> anyway).
>
> Twisted has a much harder programming model with the same blocking
> problem that eventlet has.

Like so many things that are aesthetic in nature, the statement above is misleading. Using a callback, event-based, deferred/promise oriented system is hard for *some*. It is far, far easier for others (myself included).

It's a matter of perception and personal preference.

It may be apropos to mention that Guido van Rossum himself has stated that he shares the same view of concurrent programming in Python as Glyph (the founder of Twisted): https://plus.google.com/115212051037621986145/posts/a9SqS7faVWC

Glyph's post, if you can't see that G+ link: http://glyph.twistedmatrix.com/2012/01/concurrency-spectrum-from-callbacks-to.html

One thing to keep in mind is that with Twisted, you always have the option of deferring to a thread for operations that are not async-friendly.

d
Re: [Openstack] eventlet weirdness
> I agree, but then the whole assumption of adopting eventlet to simplify the
> programming model is hindered by the fact that one has to think harder about
> what it is doing...Nova could've kept Twisted for that matter. The programming
> model would have been harder, but at least it would have been cleaner and
> free from icky patching (that's my own opinion anyway).

Then the assumption is wrong. You need to write with the premise of working with eventlet.

For me, eventlet has complicated the programming model by forcing me to a specific pattern, although I must admit this has largely been due to my use of a C library (libzmq).

--
Eric Windisch
Re: [Openstack] eventlet weirdness
Ok - I'll work with Jay on that.

-Original Message-
From: Vishvananda Ishaya [mailto:vishvana...@gmail.com]
Sent: 02 March 2012 19:27
To: Day, Phil
Cc: Jay Pipes; openstack@lists.launchpad.net
Subject: Re: [Openstack] eventlet weirdness

On Mar 2, 2012, at 7:54 AM, Day, Phil wrote:

>> By "properly multi-threaded" are you instead referring to making the
>> nova-api server multi-*processed* with eventlet greenthread pools in each
>> process? i.e. The way Swift (and now Glance) works? Or are you referring to
>> a different approach entirely?
>
> Yep - following your posting in here pointing to the glance changes we
> back-ported that into the Diablo API server. We're now running each API
> server with 20 OS processes and 20 EC2 processes, and the world looks a lot
> happier. The same changes were being done in parallel into Essex by someone
> in the community I thought ?

Can you or Jay write up what this would entail in nova? (or even ship a diff) Are you using multiprocessing? In general we have had issues combining multiprocessing and eventlet, so in our deploys we run multiple API servers on different ports and load balance with haproxy. It sounds like what you have is working, though, so it would be nice to put it in (perhaps with a flag gate) if possible.

>> Curious... do you have a list of all the places where sleep(0) calls were
>> inserted in the HP Nova code? I can turn that into a bug report and get to
>> work on adding them...
>
> So far the only two cases we've done this are in the _sync_power_state and
> in the security group refresh handling
> (libvirt/firewall/do_refresh_security_group_rules) - which we modified to
> only refresh for instances in the group and added a sleep in the loop (I need
> to finish writing the bug report for this one).

Please do this ASAP, I would like to get that fix in.
Vish
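The multi-process pattern discussed above (several API workers, each with its own eventlet hub, rather than multiprocessing mixed into one eventlet program) can be sketched with the stdlib. The worker here just reports its process name; a real API worker would open its listening socket and run the WSGI loop. Names and counts are illustrative, not Nova's.

```python
# Sketch of the Glance/Swift-style multi-process server: fork N
# workers, each a full OS process with its own interpreter state.
import multiprocessing as mp

# "fork" keeps the sketch guard-free on Unix; "spawn" (Windows, macOS
# default) would require an `if __name__ == "__main__"` guard.
_ctx = mp.get_context("fork")

def _worker(q):
    # A real worker would run its own eventlet hub and WSGI loop here;
    # this one just reports which process handled the "request".
    q.put(mp.current_process().name)

def run_workers(n=4):
    q = _ctx.Queue()
    procs = [_ctx.Process(target=_worker, args=(q,), name="api-%d" % i)
             for i in range(n)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return sorted(q.get() for _ in range(n))

print(run_workers())  # -> ['api-0', 'api-1', 'api-2', 'api-3']
```

Because each worker is an OS process, blocking native calls in one worker cannot stall the others, which is the property the thread is after.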
Re: [Openstack] eventlet weirdness
On Fri, Mar 02, 2012, Armando Migliaccio wrote:
> I agree, but then the whole assumption of adopting eventlet to simplify
> the programming model is hindered by the fact that one has to think
> harder about what it is doing...Nova could've kept Twisted for that matter.
> The programming model would have been harder, but at least it would
> have been cleaner and free from icky patching (that's my own opinion
> anyway).

Twisted has a much harder programming model with the same blocking problem that eventlet has.

> Yes. There is a fine balance to be struck here: do you let potential
> races appear in your system and deal with them on a case-by-case basis,
> or do you introduce mutexes and deal with potential inefficiency
> and/or deadlocks? I'd rather go with the former here.

Neither of these options is acceptable IMO. If we want to minimize the number of bugs, we should make the task as easy as possible on the programmer. Constantly trying to track multiple threads of execution, what races are possible, and what locking is required will end up producing more bugs in the long run.

I'd prioritize correct over performant. It's easier to optimize when you're sure the code is correct than the other way around.

I'd like to see a move towards more serialization of actions. For instance, if all operations on an instance are serialized, then there are no opportunities to race against other operations on the same instance. We can loosen the restrictions when we've identified bottlenecks and we're sure it's safe to do so. I'm sure we'll find out that performance is still very good.

JE
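The per-instance serialization idea above can be sketched with a lock keyed by instance id. threading.Lock stands in for eventlet's semaphore, and every name here is illustrative rather than Nova's actual code.

```python
# Serialize all operations on one instance behind a per-instance lock;
# operations on different instances remain free to run concurrently.
import threading

_registry_lock = threading.Lock()
_instance_locks = {}

def _lock_for(instance_id):
    # One lock per instance, created safely on first use.
    with _registry_lock:
        return _instance_locks.setdefault(instance_id, threading.Lock())

def run_serialized(instance_id, op):
    with _lock_for(instance_id):
        return op()

log = []

def make_op(name):
    def op():
        log.append((name, "start"))
        log.append((name, "end"))  # never interleaves with another op
    return op

threads = [threading.Thread(target=run_serialized, args=("vm-1", make_op(n)))
           for n in ("snapshot", "reboot")]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(log)  # each op's start/end pair is contiguous
```

Whichever operation wins the lock runs to completion before the other starts, so there is no window for the vm_state races described earlier in the thread.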
Re: [Openstack] eventlet weirdness
That sounds a bit over-complicated to me - having a string of tasks sounds like you still have to think about what the concurrency is within each step. There is already a good abstraction around the context of each operation - they just (I know - big "just") need to be running in something that maps to kernel threads rather than user space ones.

All I really want is to allow more than one action to run at the same time. So if I have two requests to create a snapshot, why can't they both run at the same time and still allow other things to happen? I have all these cores sitting in my compute node that could be used, but I'm still having to think like a punch-card programmer submitting batch jobs to the mainframe ;-) Right now creating snapshots is pretty close to a DoS attack on a compute node.

From: Joshua Harlow [mailto:harlo...@yahoo-inc.com]
Sent: 02 March 2012 19:23
To: Day, Phil; Chris Behrens
Cc: openstack
Subject: Re: [Openstack] eventlet weirdness

So a thought I had was that say if the design of a component forces as part of its design the ability to be run with threads or with eventlet or with processes. Say if u break everything up into tasks (where a task would produce some output/result/side-effect). A set of tasks could complete some action (ie, create a vm).

Subtasks could be the following:
0. Validate credentials
1. Get the image
2. Call into libvirt
3. ...

These "tasks", if constructed in a way that makes them stateless, could then be chained together to form an action, and that action could be given say to a threaded "engine" that would know how to execute those tasks with threads, or it could be given to an eventlet "engine" that would do the same with eventlet pools/greenthreads/coroutines, or with processes (and so on).
This could be one way the design of your code abstracts that kind of execution (where eventlet is abstracted away from the actual work being done, instead of popping up in calls to sleep(0), ie the leaky abstraction).

On 3/2/12 11:08 AM, "Day, Phil" wrote:

I didn't say it was pretty - Given the choice I'd much rather have a threading model that really did concurrency and pre-emption in all the right places, and it would be really cool if something managed the threads that were started so that if a second conflicting request was received it did some proper tidy-up or blocking rather than just leaving the race condition to work itself out (then we wouldn't have to try and control it by checking vm_state).

However ... In the current code base where we only have user space based eventlets, with no pre-emption, and some activities that need to be prioritised then forcing pre-emption with a sleep(0) seems a pretty small bit of untidiness. And it works now without a major code refactor.

Always open to other approaches ...

Phil

-Original Message-
From: openstack-bounces+philip.day=hp@lists.launchpad.net
[mailto:openstack-bounces+philip.day=hp@lists.launchpad.net] On Behalf Of Chris Behrens
Sent: 02 March 2012 19:00
To: Joshua Harlow
Cc: openstack; Chris Behrens
Subject: Re: [Openstack] eventlet weirdness

It's not just you

On Mar 2, 2012, at 10:35 AM, Joshua Harlow wrote:

> Does anyone else feel that the following seems really "dirty", or is it just
> me.
>
> "adding a few sleep(0) calls in various places in the Nova codebase
> (as was recently added in the _sync_power_states() periodic task) is
> an easy and simple win with pretty much no ill side-effects. :)"
>
> Dirty in that it feels like there is something wrong from a design point of
> view.
> Sprinkling "sleep(0)" seems like its a band-aid on a larger problem imho.
> But that's just my gut feeling.
> :-(
>
> On 3/2/12 8:26 AM, "Armando Migliaccio" wrote:
>
> I knew you'd say that :P
>
> There you go: https://bugs.launchpad.net/nova/+bug/944145
>
> Cheers,
> Armando
>
> > -Original Message-
> > From: Jay Pipes [mailto:jaypi...@gmail.com]
> > Sent: 02 March 2012 16:22
> > To: Armando Migliaccio
> > Cc: openstack@lists.launchpad.net
> > Subject: Re: [Openstack] eventlet weirdness
> >
> > On 03/02/2012 10:52 AM, Armando Migliaccio wrote:
> > > I'd be cautious to say that no ill side-effects were introduced. I
> > > found a race condition right in the middle of sync_power_states, which I
> > > assume was exposed by "breaking" the task deliberately.
> >
> > Such a party-pooper! ;)
> >
> > Got a link to the bug report for me?
> >
> > Thanks!
> > -jay
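The tasks-plus-pluggable-engine idea sketched above can be made concrete in a few lines. This is a hedged illustration of the shape of the design, not Nova code: the task names and the context dict are invented for the example, and the "eventlet engine" and "process engine" variants would follow the same interface.

```python
# Stateless tasks chained into an action; the execution strategy (the
# "engine") is chosen at run time, outside the tasks themselves.
from concurrent.futures import ThreadPoolExecutor

def validate_credentials(ctx):
    ctx["validated"] = True
    return ctx

def get_image(ctx):
    ctx["image"] = "image-123"
    return ctx

def call_into_libvirt(ctx):
    ctx["vm_state"] = "running"
    return ctx

create_vm = [validate_credentials, get_image, call_into_libvirt]

def serial_engine(action, ctx):
    for task in action:
        ctx = task(ctx)
    return ctx

def threaded_engine(action, ctx, workers=4):
    # Tasks within one action stay ordered; the pool pays off when many
    # independent actions are submitted concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for task in action:
            ctx = pool.submit(task, ctx).result()
    return ctx

print(serial_engine(create_vm, {}))
print(threaded_engine(create_vm, {}))
```

Both engines produce the same result for one action; the difference only shows up when many actions compete, which is where the choice of threads, greenthreads, or processes stops being a leaky detail inside the tasks.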
Re: [Openstack] eventlet weirdness
On Mar 2, 2012, at 7:54 AM, Day, Phil wrote:

>> By "properly multi-threaded" are you instead referring to making the
>> nova-api server multi-*processed* with eventlet greenthread pools in each
>> process? i.e. The way Swift (and now Glance) works? Or are you referring to
>> a different approach entirely?
>
> Yep - following your posting in here pointing to the glance changes we
> back-ported that into the Diablo API server. We're now running each API
> server with 20 OS processes and 20 EC2 processes, and the world looks a lot
> happier. The same changes were being done in parallel into Essex by someone
> in the community I thought ?

Can you or Jay write up what this would entail in nova? (or even ship a diff) Are you using multiprocessing? In general we have had issues combining multiprocessing and eventlet, so in our deploys we run multiple API servers on different ports and load balance with haproxy. It sounds like what you have is working, though, so it would be nice to put it in (perhaps with a flag gate) if possible.

>> Curious... do you have a list of all the places where sleep(0) calls were
>> inserted in the HP Nova code? I can turn that into a bug report and get to
>> work on adding them...
>
> So far the only two cases we've done this are in the _sync_power_state and
> in the security group refresh handling
> (libvirt/firewall/do_refresh_security_group_rules) - which we modified to
> only refresh for instances in the group and added a sleep in the loop (I need
> to finish writing the bug report for this one).

Please do this ASAP, I would like to get that fix in.

Vish
Re: [Openstack] eventlet weirdness
On Fri, Mar 2, 2012 at 10:35 AM, Joshua Harlow wrote: > Does anyone else feel that the following seems really “dirty”, or is it > just me. > Any feeling of dirtiness is just due to it being called "sleep," all you are doing is yielding control to allow another co-routine to schedule itself. Blocking code is still blocking code, you have to give it some break points if you are going to run a loop that waits on something else. > > “adding a few sleep(0) calls in various places in the > > Nova codebase (as was recently added in the _sync_power_states() > periodic task) is an easy and simple win with pretty much no ill > side-effects. :)” > > Dirty in that it feels like there is something wrong from a design point > of view. > Sprinkling “sleep(0)” seems like its a band-aid on a larger problem imho. > But that’s just my gut feeling. > > *:-( > * > > On 3/2/12 8:26 AM, "Armando Migliaccio" > wrote: > > I knew you'd say that :P > > There you go: https://bugs.launchpad.net/nova/+bug/944145 > > Cheers, > Armando > > > -Original Message- > > From: Jay Pipes [mailto:jaypi...@gmail.com ] > > Sent: 02 March 2012 16:22 > > To: Armando Migliaccio > > Cc: openstack@lists.launchpad.net > > Subject: Re: [Openstack] eventlet weirdness > > > > On 03/02/2012 10:52 AM, Armando Migliaccio wrote: > > > I'd be cautious to say that no ill side-effects were introduced. I > found a > > race condition right in the middle of sync_power_states, which I assume > was > > exposed by "breaking" the task deliberately. > > > > Such a party-pooper! ;) > > > > Got a link to the bug report for me? > > > > Thanks! 
> > -jay
Re: [Openstack] eventlet weirdness
So a thought I had was that say if the design of a component forces as part of its design the ability to be run with threads or with eventlet or with processes. Say if u break everything up into tasks (where a task would produce some output/result/side-effect). A set of tasks could complete some action (ie, create a vm).

Subtasks could be the following:
0. Validate credentials
1. Get the image
2. Call into libvirt
3. ...

These "tasks", if constructed in a way that makes them stateless, could then be chained together to form an action, and that action could be given say to a threaded "engine" that would know how to execute those tasks with threads, or it could be given to an eventlet "engine" that would do the same with eventlet pools/greenthreads/coroutines, or with processes (and so on).

This could be one way the design of your code abstracts that kind of execution (where eventlet is abstracted away from the actual work being done, instead of popping up in calls to sleep(0), ie the leaky abstraction).

On 3/2/12 11:08 AM, "Day, Phil" wrote:

I didn't say it was pretty - Given the choice I'd much rather have a threading model that really did concurrency and pre-emption in all the right places, and it would be really cool if something managed the threads that were started so that if a second conflicting request was received it did some proper tidy-up or blocking rather than just leaving the race condition to work itself out (then we wouldn't have to try and control it by checking vm_state).

However ... In the current code base where we only have user space based eventlets, with no pre-emption, and some activities that need to be prioritised then forcing pre-emption with a sleep(0) seems a pretty small bit of untidiness. And it works now without a major code refactor.

Always open to other approaches ...
Phil -Original Message- From: openstack-bounces+philip.day=hp@lists.launchpad.net [mailto:openstack-bounces+philip.day=hp@lists.launchpad.net] On Behalf Of Chris Behrens Sent: 02 March 2012 19:00 To: Joshua Harlow Cc: openstack; Chris Behrens Subject: Re: [Openstack] eventlet weirdness It's not just you On Mar 2, 2012, at 10:35 AM, Joshua Harlow wrote: > Does anyone else feel that the following seems really "dirty", or is it just > me. > > "adding a few sleep(0) calls in various places in the Nova codebase > (as was recently added in the _sync_power_states() periodic task) is > an easy and simple win with pretty much no ill side-effects. :)" > > Dirty in that it feels like there is something wrong from a design point of > view. > Sprinkling "sleep(0)" seems like its a band-aid on a larger problem imho. > But that's just my gut feeling. > > :-( > > On 3/2/12 8:26 AM, "Armando Migliaccio" > wrote: > > I knew you'd say that :P > > There you go: https://bugs.launchpad.net/nova/+bug/944145 > > Cheers, > Armando > > > -Original Message- > > From: Jay Pipes [mailto:jaypi...@gmail.com] > > Sent: 02 March 2012 16:22 > > To: Armando Migliaccio > > Cc: openstack@lists.launchpad.net > > Subject: Re: [Openstack] eventlet weirdness > > > > On 03/02/2012 10:52 AM, Armando Migliaccio wrote: > > > I'd be cautious to say that no ill side-effects were introduced. I > > > found a > > race condition right in the middle of sync_power_states, which I > > assume was exposed by "breaking" the task deliberately. > > > > Such a party-pooper! ;) > > > > Got a link to the bug report for me? > > > > Thanks! 
> > -jay
Re: [Openstack] eventlet weirdness
> -Original Message-
> From: Eric Windisch [mailto:e...@cloudscaling.com]
> Sent: 02 March 2012 19:04
> To: Joshua Harlow
> Cc: Armando Migliaccio; Jay Pipes; openstack
> Subject: Re: [Openstack] eventlet weirdness
>
> The problem is that unless you sleep(0), eventlet only switches context when
> you hit a file descriptor.
>
> As long as python coroutines are used, we should put sleep(0) wherever it is
> expected that there will be a long-running loop where file descriptors are not
> touched. As noted elsewhere in this thread, MySQL file descriptors don't
> count, they're not coroutine friendly.
>
> The premise is that cpus are pretty fast and get quickly from one call of a
> file descriptor to another, that the blocking of these descriptors is what a
> CPU most waits on, and this is an easy and obvious place to switch coroutines
> via monkey-patching.
>
> That said, it shouldn't be necessary to "sprinkle" sleep(0) calls. They should
> be strategically placed, as necessary.

I agree, but then the whole assumption of adopting eventlet to simplify the programming model is hindered by the fact that one has to think harder about what it is doing...Nova could've kept Twisted for that matter. The programming model would have been harder, but at least it would have been cleaner and free from icky patching (that's my own opinion anyway).

> "race-conditions" around coroutine switching sounds more like thread-safety
> issues...

Yes. There is a fine balance to be struck here: do you let potential races appear in your system and deal with them on a case-by-case basis, or do you introduce mutexes and deal with potential inefficiency and/or deadlocks? I'd rather go with the former here.

> --
> Eric Windisch
>
> On Friday, March 2, 2012 at 1:35 PM, Joshua Harlow wrote:
>
> > Does anyone else feel that the following seems really "dirty", or is it just me.
> > “adding a few sleep(0) calls in various places in the Nova codebase
> > (as was recently added in the _sync_power_states() periodic task) is
> > an easy and simple win with pretty much no ill side-effects. :)”
> >
> > Dirty in that it feels like there is something wrong from a design point of view.
> > Sprinkling “sleep(0)” seems like its a band-aid on a larger problem imho.
> > But that’s just my gut feeling.
> >
> > :-(
> >
> > On 3/2/12 8:26 AM, "Armando Migliaccio" wrote:
> >
> > > I knew you'd say that :P
> > >
> > > There you go: https://bugs.launchpad.net/nova/+bug/944145
> > >
> > > Cheers,
> > > Armando
> > >
> > > > -Original Message-
> > > > From: Jay Pipes [mailto:jaypi...@gmail.com]
> > > > Sent: 02 March 2012 16:22
> > > > To: Armando Migliaccio
> > > > Cc: openstack@lists.launchpad.net
> > > > Subject: Re: [Openstack] eventlet weirdness
> > > >
> > > > On 03/02/2012 10:52 AM, Armando Migliaccio wrote:
> > > > > I'd be cautious to say that no ill side-effects were introduced.
> > > > > I found a race condition right in the middle of sync_power_states,
> > > > > which I assume was exposed by "breaking" the task deliberately.
> > > >
> > > > Such a party-pooper! ;)
> > > >
> > > > Got a link to the bug report for me?
> > > >
> > > > Thanks!
> > > > -jay
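The alternative to sprinkling sleep(0) around a blocking native call, as noted at the top of the thread, is offloading the call to a real OS thread (eventlet ships this as eventlet.tpool.execute). A stdlib analogue of the pattern, with a `time.sleep` standing in for the blocking C-library call:

```python
# Offload blocking calls to a thread pool so the dispatching thread
# stays responsive. concurrent.futures stands in for eventlet.tpool;
# in eventlet, only the calling greenthread would be suspended while
# waiting on the result.
import time
from concurrent.futures import ThreadPoolExecutor

_pool = ThreadPoolExecutor(max_workers=4)

def blocking_call():
    # Stand-in for a C-library call (MySQL, libvirt) that ignores
    # monkey patching and blocks its whole OS thread.
    time.sleep(0.01)
    return "result"

def dispatch(n):
    # Submit all calls first; they block pool threads, not this one.
    futures = [_pool.submit(blocking_call) for _ in range(n)]
    return [f.result() for f in futures]

print(dispatch(3))  # -> ['result', 'result', 'result']
```

The point of the pattern is the one made in the head of the thread: a blocking native call cannot be fixed by a sleep(0) placed after it, because the block has already happened; only a second OS thread lets other work proceed in the meantime.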
Re: [Openstack] eventlet weirdness
I didn't say it was pretty - Given the choice I'd much rather have a threading model that really did concurrency and pre-emption in all the right places, and it would be really cool if something managed the threads that were started so that if a second conflicting request was received it did some proper tidy-up or blocking rather than just leaving the race condition to work itself out (then we wouldn't have to try and control it by checking vm_state).

However ... In the current code base where we only have user space based eventlets, with no pre-emption, and some activities that need to be prioritised then forcing pre-emption with a sleep(0) seems a pretty small bit of untidiness. And it works now without a major code refactor.

Always open to other approaches ...

Phil

-Original Message-
From: openstack-bounces+philip.day=hp@lists.launchpad.net
[mailto:openstack-bounces+philip.day=hp@lists.launchpad.net] On Behalf Of Chris Behrens
Sent: 02 March 2012 19:00
To: Joshua Harlow
Cc: openstack; Chris Behrens
Subject: Re: [Openstack] eventlet weirdness

It's not just you

On Mar 2, 2012, at 10:35 AM, Joshua Harlow wrote:

> Does anyone else feel that the following seems really "dirty", or is it just
> me.
>
> "adding a few sleep(0) calls in various places in the Nova codebase
> (as was recently added in the _sync_power_states() periodic task) is
> an easy and simple win with pretty much no ill side-effects. :)"
>
> Dirty in that it feels like there is something wrong from a design point of
> view.
> Sprinkling "sleep(0)" seems like its a band-aid on a larger problem imho.
> But that's just my gut feeling.
> :-(
>
> On 3/2/12 8:26 AM, "Armando Migliaccio" wrote:
>
> I knew you'd say that :P
>
> There you go: https://bugs.launchpad.net/nova/+bug/944145
>
> Cheers,
> Armando
>
> > -Original Message-
> > From: Jay Pipes [mailto:jaypi...@gmail.com]
> > Sent: 02 March 2012 16:22
> > To: Armando Migliaccio
> > Cc: openstack@lists.launchpad.net
> > Subject: Re: [Openstack] eventlet weirdness
> >
> > On 03/02/2012 10:52 AM, Armando Migliaccio wrote:
> > > I'd be cautious to say that no ill side-effects were introduced. I
> > > found a race condition right in the middle of sync_power_states, which I
> > > assume was exposed by "breaking" the task deliberately.
> >
> > Such a party-pooper! ;)
> >
> > Got a link to the bug report for me?
> >
> > Thanks!
> > -jay
Re: [Openstack] eventlet weirdness
The problem is that unless you sleep(0), eventlet only switches context when you hit a file descriptor. As long as Python coroutines are used, we should put sleep(0) wherever it is expected that there will be a long-running loop in which file descriptors are not touched. As noted elsewhere in this thread, MySQL file descriptors don't count; they're not coroutine-friendly. The premise is that CPUs are fast and move quickly from one file-descriptor call to the next, that blocking on these descriptors is where a process spends most of its waiting time, and that this makes them an easy and obvious place to switch coroutines via monkey-patching. That said, it shouldn't be necessary to "sprinkle" sleep(0) calls. They should be strategically placed, as necessary. "race-conditions" around coroutine switching sounds more like thread-safety issues... -- Eric Windisch On Friday, March 2, 2012 at 1:35 PM, Joshua Harlow wrote: > Re: [Openstack] eventlet weirdness Does anyone else feel that the following > seems really “dirty”, or is it just me. > > “adding a few sleep(0) calls in various places in the > Nova codebase (as was recently added in the _sync_power_states() > periodic task) is an easy and simple win with pretty much no ill > side-effects. :)” > > Dirty in that it feels like there is something wrong from a design point of > view. > Sprinkling “sleep(0)” seems like its a band-aid on a larger problem imho. > But that’s just my gut feeling. > > :-( > > On 3/2/12 8:26 AM, "Armando Migliaccio" > wrote: > > > I knew you'd say that :P > > > > There you go: https://bugs.launchpad.net/nova/+bug/944145 > > > > Cheers, > > Armando > > > > > -Original Message- > > > From: Jay Pipes [mailto:jaypi...@gmail.com] > > > Sent: 02 March 2012 16:22 > > > To: Armando Migliaccio > > > Cc: openstack@lists.launchpad.net > > > Subject: Re: [Openstack] eventlet weirdness > > > > > > On 03/02/2012 10:52 AM, Armando Migliaccio wrote: > > > > I'd be cautious to say that no ill side-effects were introduced. 
I > > > > found a > > > > > > race condition right in the middle of sync_power_states, which I assume > > > was > > > exposed by "breaking" the task deliberately. > > > > > > Such a party-pooper! ;) > > > > > > Got a link to the bug report for me? > > > > > > Thanks! > > > -jay > > > > > > ___ > > Mailing list: https://launchpad.net/~openstack > > Post to : openstack@lists.launchpad.net > > Unsubscribe : https://launchpad.net/~openstack > > More help : https://help.launchpad.net/ListHelp > > ___ > Mailing list: https://launchpad.net/~openstack > Post to : openstack@lists.launchpad.net (mailto:openstack@lists.launchpad.net) > Unsubscribe : https://launchpad.net/~openstack > More help : https://help.launchpad.net/ListHelp ___ Mailing list: https://launchpad.net/~openstack Post to : openstack@lists.launchpad.net Unsubscribe : https://launchpad.net/~openstack More help : https://help.launchpad.net/ListHelp
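Eric's yield-point rule can be modeled without eventlet at all. In the stdlib-only sketch below, generators stand in for greenthreads and a bare `yield` plays the role of `greenthread.sleep(0)`; the round-robin `run_hub` loop and the task names are illustrative stand-ins for eventlet's hub, not its real API:

```python
def long_task(name, log, iterations=3):
    # A (potentially) long-running loop, like _sync_power_states(),
    # that never touches a file descriptor.
    for i in range(iterations):
        log.append((name, i))  # a chunk of CPU-bound work
        yield                  # the yield point, playing the role of sleep(0)

def run_hub(tasks):
    # Round-robin scheduler standing in for eventlet's hub: run each
    # task until its next yield point, then move on to the next task.
    tasks = list(tasks)
    while tasks:
        task = tasks.pop(0)
        try:
            next(task)
            tasks.append(task)
        except StopIteration:
            pass

log = []
run_hub([long_task("sync_power_states", log), long_task("api_request", log)])
print(log)
```

With the yield point inside the loop the two tasks interleave fairly; without one, a cooperative scheduler runs the whole loop to completion before any other task gets a turn, which is exactly the starvation the sleep(0) calls address.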
Re: [Openstack] eventlet weirdness
It's not just you On Mar 2, 2012, at 10:35 AM, Joshua Harlow wrote: > Does anyone else feel that the following seems really “dirty”, or is it just > me. > > “adding a few sleep(0) calls in various places in the > Nova codebase (as was recently added in the _sync_power_states() > periodic task) is an easy and simple win with pretty much no ill > side-effects. :)” > > Dirty in that it feels like there is something wrong from a design point of > view. > Sprinkling “sleep(0)” seems like its a band-aid on a larger problem imho. > But that’s just my gut feeling. > > :-( > > On 3/2/12 8:26 AM, "Armando Migliaccio" > wrote: > > I knew you'd say that :P > > There you go: https://bugs.launchpad.net/nova/+bug/944145 > > Cheers, > Armando > > > -Original Message- > > From: Jay Pipes [mailto:jaypi...@gmail.com] > > Sent: 02 March 2012 16:22 > > To: Armando Migliaccio > > Cc: openstack@lists.launchpad.net > > Subject: Re: [Openstack] eventlet weirdness > > > > On 03/02/2012 10:52 AM, Armando Migliaccio wrote: > > > I'd be cautious to say that no ill side-effects were introduced. I found a > > race condition right in the middle of sync_power_states, which I assume was > > exposed by "breaking" the task deliberately. > > > > Such a party-pooper! ;) > > > > Got a link to the bug report for me? > > > > Thanks! > > -jay > > ___ > Mailing list: https://launchpad.net/~openstack > Post to : openstack@lists.launchpad.net > Unsubscribe : https://launchpad.net/~openstack > More help : https://help.launchpad.net/ListHelp > > ___ > Mailing list: https://launchpad.net/~openstack > Post to : openstack@lists.launchpad.net > Unsubscribe : https://launchpad.net/~openstack > More help : https://help.launchpad.net/ListHelp ___ Mailing list: https://launchpad.net/~openstack Post to : openstack@lists.launchpad.net Unsubscribe : https://launchpad.net/~openstack More help : https://help.launchpad.net/ListHelp
Re: [Openstack] eventlet weirdness
Looks like a textbook example of a "leaky abstraction" <http://www.joelonsoftware.com/articles/LeakyAbstractions.html> to me. Take care, Lorin -- Lorin Hochstein Lead Architect - Cloud Services Nimbis Services, Inc. www.nimbisservices.com On Mar 2, 2012, at 1:35 PM, Joshua Harlow wrote: > Does anyone else feel that the following seems really “dirty”, or is it just > me. > > “adding a few sleep(0) calls in various places in the > Nova codebase (as was recently added in the _sync_power_states() > periodic task) is an easy and simple win with pretty much no ill > side-effects. :)” > > Dirty in that it feels like there is something wrong from a design point of > view. > Sprinkling “sleep(0)” seems like its a band-aid on a larger problem imho. > But that’s just my gut feeling. > > :-( > > On 3/2/12 8:26 AM, "Armando Migliaccio" > wrote: > > I knew you'd say that :P > > There you go: https://bugs.launchpad.net/nova/+bug/944145 > > Cheers, > Armando > > > -Original Message- > > From: Jay Pipes [mailto:jaypi...@gmail.com] > > Sent: 02 March 2012 16:22 > > To: Armando Migliaccio > > Cc: openstack@lists.launchpad.net > > Subject: Re: [Openstack] eventlet weirdness > > > > On 03/02/2012 10:52 AM, Armando Migliaccio wrote: > > > I'd be cautious to say that no ill side-effects were introduced. I found a > > race condition right in the middle of sync_power_states, which I assume was > > exposed by "breaking" the task deliberately. > > > > Such a party-pooper! ;) > > > > Got a link to the bug report for me? > > > > Thanks! 
> > -jay > > ___ > Mailing list: https://launchpad.net/~openstack > Post to : openstack@lists.launchpad.net > Unsubscribe : https://launchpad.net/~openstack > More help : https://help.launchpad.net/ListHelp > > ___ > Mailing list: https://launchpad.net/~openstack > Post to : openstack@lists.launchpad.net > Unsubscribe : https://launchpad.net/~openstack > More help : https://help.launchpad.net/ListHelp ___ Mailing list: https://launchpad.net/~openstack Post to : openstack@lists.launchpad.net Unsubscribe : https://launchpad.net/~openstack More help : https://help.launchpad.net/ListHelp
Re: [Openstack] eventlet weirdness
Does anyone else feel that the following seems really "dirty", or is it just me. "adding a few sleep(0) calls in various places in the Nova codebase (as was recently added in the _sync_power_states() periodic task) is an easy and simple win with pretty much no ill side-effects. :)" Dirty in that it feels like there is something wrong from a design point of view. Sprinkling "sleep(0)" seems like its a band-aid on a larger problem imho. But that's just my gut feeling. :-( On 3/2/12 8:26 AM, "Armando Migliaccio" wrote: I knew you'd say that :P There you go: https://bugs.launchpad.net/nova/+bug/944145 Cheers, Armando > -Original Message- > From: Jay Pipes [mailto:jaypi...@gmail.com] > Sent: 02 March 2012 16:22 > To: Armando Migliaccio > Cc: openstack@lists.launchpad.net > Subject: Re: [Openstack] eventlet weirdness > > On 03/02/2012 10:52 AM, Armando Migliaccio wrote: > > I'd be cautious to say that no ill side-effects were introduced. I found a > race condition right in the middle of sync_power_states, which I assume was > exposed by "breaking" the task deliberately. > > Such a party-pooper! ;) > > Got a link to the bug report for me? > > Thanks! > -jay ___ Mailing list: https://launchpad.net/~openstack Post to : openstack@lists.launchpad.net Unsubscribe : https://launchpad.net/~openstack More help : https://help.launchpad.net/ListHelp ___ Mailing list: https://launchpad.net/~openstack Post to : openstack@lists.launchpad.net Unsubscribe : https://launchpad.net/~openstack More help : https://help.launchpad.net/ListHelp
Re: [Openstack] eventlet weirdness
I knew you'd say that :P There you go: https://bugs.launchpad.net/nova/+bug/944145 Cheers, Armando > -Original Message- > From: Jay Pipes [mailto:jaypi...@gmail.com] > Sent: 02 March 2012 16:22 > To: Armando Migliaccio > Cc: openstack@lists.launchpad.net > Subject: Re: [Openstack] eventlet weirdness > > On 03/02/2012 10:52 AM, Armando Migliaccio wrote: > > I'd be cautious to say that no ill side-effects were introduced. I found a > race condition right in the middle of sync_power_states, which I assume was > exposed by "breaking" the task deliberately. > > Such a party-pooper! ;) > > Got a link to the bug report for me? > > Thanks! > -jay ___ Mailing list: https://launchpad.net/~openstack Post to : openstack@lists.launchpad.net Unsubscribe : https://launchpad.net/~openstack More help : https://help.launchpad.net/ListHelp
Re: [Openstack] eventlet weirdness
On 03/02/2012 10:54 AM, Day, Phil wrote: By "properly multi-threaded" are you instead referring to making the nova-api server multi-*processed* with eventlet greenthread pools in each process? i.e. The way Swift (and now Glance) works? Or are you referring to a different approach entirely? Yep - following your posting in here pointing to the glance changes we back-ported that into the Diablo API server. We're now running each API server with 20 OS processes and 20 EC2 processes, and the world looks a lot happier. Gotcha, OK, that makes a lot of sense. > The same changes were being done in parallel into Essex by someone in the community I thought ? Hmmm, for Nova? I'm not aware of that effort, but I would certainly support it. It's a very big impact performance issue... Curious... do you have a list of all the places where sleep(0) calls were inserted in the HP Nova code? I can turn that into a bug report and get to work on adding them... So far the only two cases we've done this are in the _sync_power_state and in the security group refresh handling (libvirt/firewall/do_refresh_security_group_rules) - which we modified to only refresh for instances in the group and added a sleep in the loop (I need to finish writing the bug report for this one). OK, sounds good. I have contemplated doing something similar in the image code when reading chunks from glance - but am slightly worried that in this case the only thing that currently stops two creates for the same image from making separate requests to glance might be that one gets queued behind the other. It would be nice to do the same thing on snapshot (as this can also be a real hog), but there the transfer is handled completely within the glance client. A more radical approach would be to split out the image handling code from compute manager into a separate (co-hosted) image_manager so at least only commands which need interaction with glance will block each other. 
We should definitely discuss this further (separate ML thread or etherpad maybe). If not before the design summit, then definitely at it. Cheers! -jay Phil -Original Message- From: openstack-bounces+philip.day=hp@lists.launchpad.net [mailto:openstack-bounces+philip.day=hp@lists.launchpad.net] On Behalf Of Jay Pipes Sent: 02 March 2012 15:17 To: openstack@lists.launchpad.net Subject: Re: [Openstack] eventlet weirdness On 03/02/2012 05:34 AM, Day, Phil wrote: In our experience (running clusters of several hundred nodes) the DB performance is not generally the significant factor, so making its calls non-blocking gives only a very small increase in processing capacity and creates other side effects in terms of slowing all eventlets down as they wait for their turn to run. Yes, I believe I said that this was the case at the last design summit -- or rather, I believe I said "is there any evidence that the database is a performance or scalability problem at all"? That shouldn't really be surprising given that the Nova DB is pretty small and MySQL is a pretty good DB - throw reasonable hardware at the DB server and give it a bit of TLC from a DBA (remove deleted entries from the DB, add indexes where the slow query log tells you to, etc) and it shouldn't be the bottleneck in the system for performance or scalability. ++ We use the python driver and have experimented with allowing the eventlet code to make the db calls non-blocking (its not the default setting), and it works, but didn't give us any significant advantage. Yep, identical results to the work that Mark Washenberger did on the same subject. For example in the API server (before we made it properly multi-threaded) By "properly multi-threaded" are you instead referring to making the nova-api server multi-*processed* with eventlet greenthread pools in each process? i.e. The way Swift (and now Glance) works? Or are you referring to a different approach entirely? 
> with blocking db calls the server was essentially a serial processing queue - each request was fully processed before the next. With non-blocking db calls we got a lot more apparent concurrencybut only at the expense of making all of the requests equally bad. Yep, not surprising. Consider a request takes 10 seconds, where after 5 seconds there is a call to the DB which takes 1 second, and three are started at the same time: Blocking: 0 - Request 1 starts 10 - Request 1 completes, request 2 starts 20 - Request 2 completes, request 3 starts 30 - Request 3 competes Request 1 completes in 10 seconds Request 2 completes in 20 seconds Request 3 completes in 30 seconds Ave time: 20 sec Non-blocking 0 - Request 1 Starts 5 - Request 1 gets to db call, request 2 starts 10 - Request 2 gets to db call, request 3 starts 15 - Request 3 gets to db call, request 1 resumes 19 - Request 1 completes, request 2 resume
Re: [Openstack] eventlet weirdness
On 03/02/2012 10:52 AM, Armando Migliaccio wrote: I'd be cautious to say that no ill side-effects were introduced. I found a race condition right in the middle of sync_power_states, which I assume was exposed by "breaking" the task deliberately. Such a party-pooper! ;) Got a link to the bug report for me? Thanks! -jay ___ Mailing list: https://launchpad.net/~openstack Post to : openstack@lists.launchpad.net Unsubscribe : https://launchpad.net/~openstack More help : https://help.launchpad.net/ListHelp
Re: [Openstack] eventlet weirdness
> By "properly multi-threaded" are you instead referring to making the nova-api > server multi-*processed* with eventlet greenthread pools in each process? > i.e. The way Swift (and now Glance) works? Or are you referring to a > different approach entirely? Yep - following your posting in here pointing to the glance changes we back-ported that into the Diablo API server. We're now running each API server with 20 OS processes and 20 EC2 processes, and the world looks a lot happier. The same changes were being done in parallel into Essex by someone in the community I thought ? > Curious... do you have a list of all the places where sleep(0) calls were > inserted in the HP Nova code? I can turn that into a bug report and get to > work on adding them... So far the only two cases we've done this are in the _sync_power_state and in the security group refresh handling (libvirt/firewall/do_refresh_security_group_rules) - which we modified to only refresh for instances in the group and added a sleep in the loop (I need to finish writing the bug report for this one). I have contemplated doing something similar in the image code when reading chunks from glance - but am slightly worried that in this case the only thing that currently stops two creates for the same image from making separate requests to glance might be that one gets queued behind the other. It would be nice to do the same thing on snapshot (as this can also be a real hog), but there the transfer is handled completely within the glance client. A more radical approach would be to split out the image handling code from compute manager into a separate (co-hosted) image_manager so at least only commands which need interaction with glance will block each other. 
Phil -Original Message- From: openstack-bounces+philip.day=hp@lists.launchpad.net [mailto:openstack-bounces+philip.day=hp@lists.launchpad.net] On Behalf Of Jay Pipes Sent: 02 March 2012 15:17 To: openstack@lists.launchpad.net Subject: Re: [Openstack] eventlet weirdness On 03/02/2012 05:34 AM, Day, Phil wrote: > In our experience (running clusters of several hundred nodes) the DB > performance is not generally the significant factor, so making its calls > non-blocking gives only a very small increase in processing capacity and > creates other side effects in terms of slowing all eventlets down as they > wait for their turn to run. Yes, I believe I said that this was the case at the last design summit -- or rather, I believe I said "is there any evidence that the database is a performance or scalability problem at all"? > That shouldn't really be surprising given that the Nova DB is pretty small > and MySQL is a pretty good DB - throw reasonable hardware at the DB server > and give it a bit of TLC from a DBA (remove deleted entries from the DB, add > indexes where the slow query log tells you to, etc) and it shouldn't be the > bottleneck in the system for performance or scalability. ++ > We use the python driver and have experimented with allowing the eventlet > code to make the db calls non-blocking (its not the default setting), and it > works, but didn't give us any significant advantage. Yep, identical results to the work that Mark Washenberger did on the same subject. > For example in the API server (before we made it properly > multi-threaded) By "properly multi-threaded" are you instead referring to making the nova-api server multi-*processed* with eventlet greenthread pools in each process? i.e. The way Swift (and now Glance) works? Or are you referring to a different approach entirely? > with blocking db calls the server was essentially a serial processing queue > - each request was fully processed before the next. 
With non-blocking db > calls we got a lot more apparent concurrencybut only at the expense of > making all of the requests equally bad. Yep, not surprising. > Consider a request takes 10 seconds, where after 5 seconds there is a call to > the DB which takes 1 second, and three are started at the same time: > > Blocking: > 0 - Request 1 starts > 10 - Request 1 completes, request 2 starts > 20 - Request 2 completes, request 3 starts > 30 - Request 3 competes > Request 1 completes in 10 seconds > Request 2 completes in 20 seconds > Request 3 completes in 30 seconds > Ave time: 20 sec > > Non-blocking > 0 - Request 1 Starts > 5 - Request 1 gets to db call, request 2 starts > 10 - Request 2 gets to db call, request 3 starts > 15 - Request 3 gets to db call, request 1 resumes > 19 - Request 1 completes, request 2 resumes > 23 - Request 2 completes, request 3 resumes > 27 - Request 3 completes > > Request 1 completes in 19 seconds (+ 9 seconds) Request 2 completes > in 24 seconds (+ 4 secon
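The multi-process fix Phil describes above (N OS processes, each running its own eventlet-based API server) sidesteps the single-pthread limitation entirely. Here is a stdlib-only sketch of the worker-process pattern; a placeholder function stands in for the eventlet WSGI server each worker would actually run, and `api_worker`/`spawn_workers` are illustrative names, not Nova code:

```python
import multiprocessing

def api_worker(worker_id, results):
    # A real worker would serve requests forever from a shared listening
    # socket via its own eventlet greenthread pool; this stand-in just
    # reports that it ran in its own OS process.
    results.put(worker_id)

def spawn_workers(count):
    results = multiprocessing.Queue()
    procs = [multiprocessing.Process(target=api_worker, args=(i, results))
             for i in range(count)]
    for p in procs:
        p.start()
    # Drain the queue before joining so no worker blocks on a full pipe.
    ids = sorted(results.get() for _ in procs)
    for p in procs:
        p.join()
    return ids

if __name__ == "__main__":
    print(spawn_workers(4))  # → [0, 1, 2, 3]
```

Because each worker is a separate process with its own Python interpreter, a blocking C-library call in one worker stalls only that worker's greenthreads, not the whole API server.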
Re: [Openstack] eventlet weirdness
> -Original Message- > From: openstack-bounces+armando.migliaccio=eu.citrix@lists.launchpad.net > [mailto:openstack- > bounces+armando.migliaccio=eu.citrix@lists.launchpad.net] On Behalf Of Jay > Pipes > Sent: 02 March 2012 15:17 > To: openstack@lists.launchpad.net > Subject: Re: [Openstack] eventlet weirdness > > On 03/02/2012 05:34 AM, Day, Phil wrote: > > In our experience (running clusters of several hundred nodes) the DB > performance is not generally the significant factor, so making its calls non- > blocking gives only a very small increase in processing capacity and creates > other side effects in terms of slowing all eventlets down as they wait for > their turn to run. > > Yes, I believe I said that this was the case at the last design summit > -- or rather, I believe I said "is there any evidence that the database is a > performance or scalability problem at all"? > > > That shouldn't really be surprising given that the Nova DB is pretty small > and MySQL is a pretty good DB - throw reasonable hardware at the DB server and > give it a bit of TLC from a DBA (remove deleted entries from the DB, add > indexes where the slow query log tells you to, etc) and it shouldn't be the > bottleneck in the system for performance or scalability. > > ++ > > > We use the python driver and have experimented with allowing the eventlet > code to make the db calls non-blocking (its not the default setting), and it > works, but didn't give us any significant advantage. > > Yep, identical results to the work that Mark Washenberger did on the same > subject. > > > For example in the API server (before we made it properly > > multi-threaded) > > By "properly multi-threaded" are you instead referring to making the nova-api > server multi-*processed* with eventlet greenthread pools in each process? i.e. > The way Swift (and now Glance) works? Or are you referring to a different > approach entirely? 
> > > with blocking db calls the server was essentially a serial processing queue > - each request was fully processed before the next. With non-blocking db > calls we got a lot more apparent concurrencybut only at the expense of making > all of the requests equally bad. > > Yep, not surprising. > > > Consider a request takes 10 seconds, where after 5 seconds there is a call > to the DB which takes 1 second, and three are started at the same time: > > > > Blocking: > > 0 - Request 1 starts > > 10 - Request 1 completes, request 2 starts > > 20 - Request 2 completes, request 3 starts > > 30 - Request 3 competes > > Request 1 completes in 10 seconds > > Request 2 completes in 20 seconds > > Request 3 completes in 30 seconds > > Ave time: 20 sec > > > > Non-blocking > > 0 - Request 1 Starts > > 5 - Request 1 gets to db call, request 2 starts > > 10 - Request 2 gets to db call, request 3 starts > > 15 - Request 3 gets to db call, request 1 resumes > > 19 - Request 1 completes, request 2 resumes > > 23 - Request 2 completes, request 3 resumes > > 27 - Request 3 completes > > > > Request 1 completes in 19 seconds (+ 9 seconds) Request 2 completes > > in 24 seconds (+ 4 seconds) Request 3 completes in 27 seconds (- 3 > > seconds) Ave time: 20 sec > > > > So instead of worrying about making db calls non-blocking we've been working > to make certain eventlets non-blocking - i.e. add sleep(0) calls to long > running iteration loops - which IMO has a much bigger impact on the > performance of the apparent latency of the system. > > Yep, and I think adding a few sleep(0) calls in various places in the Nova > codebase (as was recently added in the _sync_power_states() periodic task) is > an easy and simple win with pretty much no ill side-effects. :) I'd be cautious to say that no ill side-effects were introduced. I found a race condition right in the middle of sync_power_states, which I assume was exposed by "breaking" the task deliberately. > > Curious... 
do you have a list of all the places where sleep(0) calls were > inserted in the HP Nova code? I can turn that into a bug report and get to > work on adding them... > > All the best, > -jay > > > Phil > > > > > > > > -Original Message- > > From: openstack-bounces+philip.day=hp@lists.launchpad.net > > [mailto:openstack-bounces+philip.day=hp@lists.launchpad.net] On > > Behalf Of Brian Lamar > > Sent: 01 March 2012 21:31 > > To: openstack@lists.launchpad.net > > Subject: Re: [Openstack] eventlet weirdness > > > >>> How is MySQ
Re: [Openstack] eventlet weirdness
On 03/02/2012 05:34 AM, Day, Phil wrote: In our experience (running clusters of several hundred nodes) the DB performance is not generally the significant factor, so making its calls non-blocking gives only a very small increase in processing capacity and creates other side effects in terms of slowing all eventlets down as they wait for their turn to run. Yes, I believe I said that this was the case at the last design summit -- or rather, I believe I said "is there any evidence that the database is a performance or scalability problem at all"? That shouldn't really be surprising given that the Nova DB is pretty small and MySQL is a pretty good DB - throw reasonable hardware at the DB server and give it a bit of TLC from a DBA (remove deleted entries from the DB, add indexes where the slow query log tells you to, etc) and it shouldn't be the bottleneck in the system for performance or scalability. ++ We use the python driver and have experimented with allowing the eventlet code to make the db calls non-blocking (its not the default setting), and it works, but didn't give us any significant advantage. Yep, identical results to the work that Mark Washenberger did on the same subject. For example in the API server (before we made it properly multi-threaded) By "properly multi-threaded" are you instead referring to making the nova-api server multi-*processed* with eventlet greenthread pools in each process? i.e. The way Swift (and now Glance) works? Or are you referring to a different approach entirely? > with blocking db calls the server was essentially a serial processing queue - each request was fully processed before the next. With non-blocking db calls we got a lot more apparent concurrencybut only at the expense of making all of the requests equally bad. Yep, not surprising. 
Consider a request takes 10 seconds, where after 5 seconds there is a call to the DB which takes 1 second, and three are started at the same time: Blocking: 0 - Request 1 starts 10 - Request 1 completes, request 2 starts 20 - Request 2 completes, request 3 starts 30 - Request 3 competes Request 1 completes in 10 seconds Request 2 completes in 20 seconds Request 3 completes in 30 seconds Ave time: 20 sec Non-blocking 0 - Request 1 Starts 5 - Request 1 gets to db call, request 2 starts 10 - Request 2 gets to db call, request 3 starts 15 - Request 3 gets to db call, request 1 resumes 19 - Request 1 completes, request 2 resumes 23 - Request 2 completes, request 3 resumes 27 - Request 3 completes Request 1 completes in 19 seconds (+ 9 seconds) Request 2 completes in 24 seconds (+ 4 seconds) Request 3 completes in 27 seconds (- 3 seconds) Ave time: 20 sec So instead of worrying about making db calls non-blocking we've been working to make certain eventlets non-blocking - i.e. add sleep(0) calls to long running iteration loops - which IMO has a much bigger impact on the performance of the apparent latency of the system. Yep, and I think adding a few sleep(0) calls in various places in the Nova codebase (as was recently added in the _sync_power_states() periodic task) is an easy and simple win with pretty much no ill side-effects. :) Curious... do you have a list of all the places where sleep(0) calls were inserted in the HP Nova code? I can turn that into a bug report and get to work on adding them... All the best, -jay Phil -Original Message- From: openstack-bounces+philip.day=hp@lists.launchpad.net [mailto:openstack-bounces+philip.day=hp@lists.launchpad.net] On Behalf Of Brian Lamar Sent: 01 March 2012 21:31 To: openstack@lists.launchpad.net Subject: Re: [Openstack] eventlet weirdness How is MySQL access handled in eventlet? Presumably it's external C library so it's not going to be monkey patched. Does that make every db access call a blocking call? 
Thanks, Nope, it goes through a thread pool. I feel like this might be an over-simplification. If the question is: "How is MySQL access handled in nova?" The answer would be that we use SQLAlchemy which can load any number of SQL-drivers. These drivers can be either pure Python or C-based drivers. In the case of pure Python drivers, monkey patching can occur and db calls are non-blocking. In the case of drivers which contain C code (or perhaps other blocking calls), db calls will most likely be blocking. If the question is "How is MySQL access handled in eventlet?" the answer would be to use the eventlet.db_pool module to allow db access using thread pools. B -Original Message- From: "Adam Young" Sent: Thursday, March 1, 2012 3:27pm To: openstack@lists.launchpad.net Subject: Re: [Openstack] eventlet weirdness On 03/01/2012 02:45 PM, Yun Mao wrote: There are plenty eventlet discussion recently but I'll stick my question to this thread, although it's pretty much a separate question. :)
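Brian's thread-pool answer is the pattern eventlet's db_pool/tpool modules implement: hand the blocking C-level call to a native thread so the calling thread stays responsive. A stdlib sketch of the same idea using concurrent.futures; the `blocking_query` function and its return values are invented for illustration:

```python
from concurrent.futures import ThreadPoolExecutor
import time

def blocking_query(sql):
    # Stands in for a blocking call into a C extension (e.g. a MySQLdb
    # query) that monkey-patching cannot make cooperative.
    time.sleep(0.01)
    return "rows for: " + sql

# A small pool of native threads, analogous to what eventlet's
# db_pool/tpool maintain behind the scenes.
pool = ThreadPoolExecutor(max_workers=4)

def dispatch(queries):
    # Submit every blocking call to the pool, then gather the results;
    # the submitting thread is free to do other work in between.
    futures = [pool.submit(blocking_query, q) for q in queries]
    return [f.result() for f in futures]

print(dispatch(["SELECT 1", "SELECT 2"]))
# → ['rows for: SELECT 1', 'rows for: SELECT 2']
```

This matches the point made earlier in the thread: you need two POSIX threads, one to block in the native call and one to keep scheduling greenthreads.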
Re: [Openstack] eventlet weirdness
In our experience (running clusters of several hundred nodes) the DB performance is not generally the significant factor, so making its calls non-blocking gives only a very small increase in processing capacity and creates other side effects in terms of slowing all eventlets down as they wait for their turn to run. That shouldn't really be surprising given that the Nova DB is pretty small and MySQL is a pretty good DB - throw reasonable hardware at the DB server and give it a bit of TLC from a DBA (remove deleted entries from the DB, add indexes where the slow query log tells you to, etc) and it shouldn't be the bottleneck in the system for performance or scalability. We use the Python driver and have experimented with allowing the eventlet code to make the db calls non-blocking (it's not the default setting), and it works, but didn't give us any significant advantage. For example in the API server (before we made it properly multi-threaded) with blocking db calls the server was essentially a serial processing queue - each request was fully processed before the next. With non-blocking db calls we got a lot more apparent concurrency, but only at the expense of making all of the requests equally bad. 
Consider a request takes 10 seconds, where after 5 seconds there is a call to the DB which takes 1 second, and three are started at the same time: Blocking: 0 - Request 1 starts 10 - Request 1 completes, request 2 starts 20 - Request 2 completes, request 3 starts 30 - Request 3 completes Request 1 completes in 10 seconds Request 2 completes in 20 seconds Request 3 completes in 30 seconds Ave time: 20 sec Non-blocking 0 - Request 1 starts 5 - Request 1 gets to db call, request 2 starts 10 - Request 2 gets to db call, request 3 starts 15 - Request 3 gets to db call, request 1 resumes 19 - Request 1 completes, request 2 resumes 23 - Request 2 completes, request 3 resumes 27 - Request 3 completes Request 1 completes in 19 seconds (+ 9 seconds) Request 2 completes in 23 seconds (+ 3 seconds) Request 3 completes in 27 seconds (- 3 seconds) Ave time: 23 sec So instead of worrying about making db calls non-blocking we've been working to make certain eventlets non-blocking - i.e. add sleep(0) calls to long running iteration loops - which IMO has a much bigger impact on the apparent latency of the system. Phil -Original Message- From: openstack-bounces+philip.day=hp@lists.launchpad.net [mailto:openstack-bounces+philip.day=hp@lists.launchpad.net] On Behalf Of Brian Lamar Sent: 01 March 2012 21:31 To: openstack@lists.launchpad.net Subject: Re: [Openstack] eventlet weirdness >> How is MySQL access handled in eventlet? Presumably it's external C >> library so it's not going to be monkey patched. Does that make every >> db access call a blocking call? Thanks, > Nope, it goes through a thread pool. I feel like this might be an over-simplification. If the question is: "How is MySQL access handled in nova?" The answer would be that we use SQLAlchemy which can load any number of SQL-drivers. These drivers can be either pure Python or C-based drivers. In the case of pure Python drivers, monkey patching can occur and db calls are non-blocking. 
In the case of drivers which contain C code (or perhaps other blocking calls), db calls will most likely be blocking. If the question is "How is MySQL access handled in eventlet?" the answer would be to use the eventlet.db_pool module to allow db access using thread pools. B -Original Message- From: "Adam Young" Sent: Thursday, March 1, 2012 3:27pm To: openstack@lists.launchpad.net Subject: Re: [Openstack] eventlet weirdness On 03/01/2012 02:45 PM, Yun Mao wrote: > There are plenty eventlet discussion recently but I'll stick my > question to this thread, although it's pretty much a separate > question. :) > > How is MySQL access handled in eventlet? Presumably it's external C > library so it's not going to be monkey patched. Does that make every > db access call a blocking call? Thanks, Nope, it goes through a thread pool. > > Yun > > On Wed, Feb 29, 2012 at 9:18 PM, Johannes Erdfelt > wrote: >> On Wed, Feb 29, 2012, Yun Mao wrote: >>> Thanks for the explanation. Let me see if I understand this. >>> >>> 1. Eventlet will never have this problem if there is only 1 OS >>> thread >>> -- let's call it main thread. >> In fact, that's exactly what Python calls it :) >> >>> 2. In Nova, there is only 1 OS thread unless you use xenapi and/or >>> the virt/firewall driver. >>> 3. The python logging module uses locks. Because of the monkey >>> patch, those locks are actually eventlet or "green" locks and may >>> trigger a green thread conte
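Phil's blocking vs. non-blocking timeline can be reproduced with a small single-CPU scheduling model. This is a sketch, not anything from Nova: the parameters mirror his example (5 s of work, a 1 s DB call, then the remaining work of the 10 s total, three requests arriving together).

```python
# Toy single-CPU model of the example above: each request needs 5 s of
# work, then a 1 s DB call, then 4 s more work (10 s total), and all
# three requests arrive at t=0.

def blocking(n=3, pre=5, db=1, post=4):
    """Each request holds the CPU for its whole lifetime, DB wait included."""
    t, done = 0, []
    for _ in range(n):
        t += pre + db + post
        done.append(t)
    return done

def non_blocking(n=3, pre=5, db=1, post=4):
    """Green-thread style: the DB wait yields the CPU to the next request."""
    t, db_done = 0, []
    for _ in range(n):
        t += pre                 # first CPU burst
        db_done.append(t + db)   # DB call proceeds while others run
    done = []
    for i in range(n):
        t = max(t, db_done[i]) + post  # resume when both CPU and DB are free
        done.append(t)
    return done

print(blocking())      # [10, 20, 30] -> average 20 s
print(non_blocking())  # [19, 23, 27] -> average 23 s
```

The model shows the trade Phil describes: the last request finishes a little sooner (27 s vs. 30 s), but every request's individual latency is smeared toward the average.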
Re: [Openstack] eventlet weirdness
We were using the eventlet db pool class. You can see the code here:

https://github.com/openstack/nova/blob/2011.3/nova/db/sqlalchemy/session.py

There were a few bugs about it, but here is one: https://bugs.launchpad.net/nova/+bug/838581

Vish

On Mar 1, 2012, at 6:11 PM, Kapil Thangavelu wrote:

> Since mysqldb is an extension (no Python code), wrapping it in an eventlet
> thread pool should work in theory. What were the problems with it last time
> it was attempted?

___ Mailing list: https://launchpad.net/~openstack Post to : openstack@lists.launchpad.net Unsubscribe : https://launchpad.net/~openstack More help : https://help.launchpad.net/ListHelp
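Kapil's "wrap the extension in a thread pool" idea can be sketched with the stdlib alone; eventlet's tpool applies the same pattern with green-thread integration. The blocking_query function and its timing below are made up for illustration - any call that blocks its OS thread behaves the same way.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_query(sql):
    """Hypothetical stand-in for a C-driver call that blocks its OS thread."""
    time.sleep(0.1)                # simulates waiting on the DB server
    return "rows for %s" % sql

pool = ThreadPoolExecutor(max_workers=4)

start = time.time()
# Dispatch the blocking calls to worker threads; the dispatching thread
# stays free (in eventlet, free to keep running other green threads).
futures = [pool.submit(blocking_query, "SELECT %d" % i) for i in range(4)]
results = [f.result() for f in futures]
elapsed = time.time() - start

print(results[0])     # rows for SELECT 0
print(elapsed < 0.4)  # True: the four 0.1 s waits overlapped
```

With eventlet the equivalent dispatch is `eventlet.tpool.execute(blocking_query, sql)`, which parks the calling green thread until a real worker thread finishes the call.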
Re: [Openstack] eventlet weirdness
I actually didn't mean to send the parent... but since we're here.

Additional threads/pools make the problem potentially worse; they are the origin of the greenlet switching issue that was the start of this thread. I.e. with the monkey patching, most stdlib socket/thread module usage by Python code in the non-main threads will at worst attempt a greenlet trampoline across threads (i.e. error), or at best create unintentional per-thread hubs.

Since mysqldb is an extension (no Python code), wrapping it in an eventlet thread pool should work in theory. What were the problems with it last time it was attempted?

cheers,

-kapil

Excerpts from Devin Carlen's message of 2012-03-01 20:38:20 -0500:
> As long as we allocate a thread in the eventlet thread pool for the number
> of mysql connections we want to actually maintain in our connection pool,
> we shouldn't have problems getting the results we want even with the
> blocking mysql c drivers.
Re: [Openstack] eventlet weirdness
As long as we allocate a thread in the eventlet thread pool for each of the mysql connections we want to maintain in our connection pool, we shouldn't have problems getting the results we want even with the blocking mysql C drivers.

Devin

On Thursday, March 1, 2012 at 5:23 PM, Kapil Thangavelu wrote:

> The standard python postgresql driver (psycopg2) does have an async mode.
> There are non db-api compliant async mysql drivers for gevent.
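Devin's sizing rule - one worker thread per pooled connection, so no worker ever waits for a connection - can be sketched in stdlib terms. The "connections" here are placeholder strings, not real DB handles:

```python
import queue
import time
from concurrent.futures import ThreadPoolExecutor

POOL_SIZE = 3  # illustrative number; the point is threads == connections

# A trivial connection pool: a queue of placeholder "connections".
connections = queue.Queue()
for i in range(POOL_SIZE):
    connections.put("conn-%d" % i)

def query(sql):
    conn = connections.get()   # never waits: one thread per connection
    try:
        time.sleep(0.05)       # stand-in for the blocking C driver
        return "%s via %s" % (sql, conn)
    finally:
        connections.put(conn)

# One worker thread per pooled connection, per the sizing rule above.
pool = ThreadPoolExecutor(max_workers=POOL_SIZE)
results = list(pool.map(query, ["q1", "q2", "q3"]))
print(len(results))  # 3
```

If the worker count exceeded the connection count, the extra workers would block in `connections.get()`, wasting OS threads; matching them 1:1 keeps every blocked thread doing useful DB work.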
Re: [Openstack] eventlet weirdness
The standard python postgresql driver (psycopg2) does have an async mode. There are non db api compliant async mysql drivers for gevent.

Excerpts from Vishvananda Ishaya's message of 2012-03-01 15:36:43 -0500:
> Yes it does. We actually tried to use a pool at diablo release and it was
> very broken. There was discussion about moving over to a pure-python mysql
> library, but it hasn't been tried yet.
>
> Vish
Re: [Openstack] eventlet weirdness
Someone might have already said this (sure wish the listserv sent me mail faster), but we tried out PyMySQL and it was exceptionally slow, even under almost no load.

I have a branch in my github that I was using to test out unblocking the database access. For my cases I found that it was unblocked but that it didn't really help performance as much as I had hoped.

branch: https://github.com/markwash/nova/tree/optional-db-api-thread-no-pool
just the relevant commit: https://github.com/markwash/nova/commit/99e38d3df579670808711eb8acd1f96806d8b6f0

"Vishvananda Ishaya" said:
> Yes it does. We actually tried to use a pool at diablo release and it was
> very broken. There was discussion about moving over to a pure-python mysql
> library, but it hasn't been tried yet.
>
> Vish
Re: [Openstack] eventlet weirdness
Just because MySQL is a C library doesn't necessarily mean it can't be made to work with coroutines. ZeroMQ is supported through eventlet.green.zmq, and there exists geventmysql (although it appears to me to be more of a proof-of-concept). Moving to a pure-python mysql library might be the path of least resistance as long as we're committed to eventlet.

--
Eric Windisch

On Thursday, March 1, 2012 at 3:36 PM, Vishvananda Ishaya wrote:

> Yes it does. We actually tried to use a pool at diablo release and it was
> very broken. There was discussion about moving over to a pure-python mysql
> library, but it hasn't been tried yet.
>
> Vish
Re: [Openstack] eventlet weirdness
>> How is MySQL access handled in eventlet? Presumably it's an external C
>> library so it's not going to be monkey patched. Does that make every
>> db access call a blocking call? Thanks,

> Nope, it goes through a thread pool.

I feel like this might be an over-simplification. If the question is "How is MySQL access handled in nova?", the answer is that we use SQLAlchemy, which can load any number of SQL drivers. These drivers can be either pure Python or C-based. In the case of pure Python drivers, monkey patching can occur and db calls are non-blocking. In the case of drivers which contain C code (or make other blocking calls), db calls will most likely be blocking.

If the question is "How is MySQL access handled in eventlet?", the answer is to use the eventlet.db_pool module to allow db access using thread pools.

B
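The pure-Python vs. C-driver distinction above comes down to how monkey patching works: eventlet.monkey_patch() rebinds names in Python module namespaces, which only affects lookups made from Python code. A minimal stdlib-only sketch of the mechanism (green_sleep is a hypothetical stand-in; eventlet's real replacement yields to its hub instead of recording the call):

```python
import time

# What eventlet.monkey_patch() effectively does: rebind blocking primitives
# in Python module namespaces to cooperative replacements.
green_calls = []

def green_sleep(seconds):
    """Hypothetical stand-in: eventlet's version would yield to the hub."""
    green_calls.append(seconds)

orig_sleep = time.sleep
time.sleep = green_sleep   # Python-level callers now get the green version

time.sleep(1)              # intercepted: no real sleep happens
print(green_calls)         # [1]

# A C extension that blocks at the C level (e.g. inside a MySQL driver's
# query call) never looks names up in the patched module namespace, so the
# whole OS thread still blocks -- the patch can't reach it.
time.sleep = orig_sleep    # restore the original binding
```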
Re: [Openstack] eventlet weirdness
On Mar 1, 2012, at 12:36 PM, Vishvananda Ishaya wrote:

> Yes it does. We actually tried to use a pool at diablo release and it was
> very broken. There was discussion about moving over to a pure-python mysql
> library, but it hasn't been tried yet.

I know some people have tried this... and the performance is... not great.

- Chris
Re: [Openstack] eventlet weirdness
On Mar 1, 2012, at 12:27 PM, Adam Young wrote:

> On 03/01/2012 02:45 PM, Yun Mao wrote:
>> How is MySQL access handled in eventlet? Presumably it's an external C
>> library so it's not going to be monkey patched. Does that make every
>> db access call a blocking call? Thanks,
>
> Nope, it goes through a thread pool.

Actually, it doesn't use a thread pool right now... so it does block, unless something has changed recently that I'm not aware of. We were using the eventlet db_pool code, but we had to remove it at diablo release time due to issues. Correct me if this is wrong.

I'm not sure it's ever been completely revisited, but this is definitely a huge issue for scaling. It's been on my list for a while to take a look at.

- Chris
Re: [Openstack] eventlet weirdness
Yes it does. We actually tried to use a pool at diablo release and it was very broken. There was discussion about moving over to a pure-python mysql library, but it hasn't been tried yet.

Vish

On Mar 1, 2012, at 11:45 AM, Yun Mao wrote:

> How is MySQL access handled in eventlet? Presumably it's an external C
> library so it's not going to be monkey patched. Does that make every
> db access call a blocking call? Thanks,
>
> Yun
Re: [Openstack] eventlet weirdness
On 03/01/2012 02:45 PM, Yun Mao wrote:

> How is MySQL access handled in eventlet? Presumably it's an external C
> library so it's not going to be monkey patched. Does that make every
> db access call a blocking call? Thanks,

Nope, it goes through a thread pool.
Re: [Openstack] eventlet weirdness
It seems that there used to be a db_pool in session.py, but it got removed by this commit:

https://github.com/openstack/nova/commit/f3dd56e916232e38e74d9e2f24ce9a738cac63cf

due to this bug: https://bugs.launchpad.net/nova/+bug/838581

Still, I'm confused by the discussion. Are we saying that eventlet + sqlalchemy + mysql pool is buggy, so instead we make every DB call a blocking call?

Thanks,

Yun

On Thu, Mar 1, 2012 at 2:45 PM, Yun Mao wrote:
> There have been plenty of eventlet discussions recently, but I'll stick my
> question to this thread, although it's pretty much a separate
> question. :)
>
> How is MySQL access handled in eventlet? Presumably it's an external C
> library, so it's not going to be monkey patched. Does that make every
> db access call a blocking call? Thanks,
Re: [Openstack] eventlet weirdness
There have been plenty of eventlet discussions recently, but I'll stick my question to this thread, although it's pretty much a separate question. :)

How is MySQL access handled in eventlet? Presumably it's an external C library, so it's not going to be monkey patched. Does that make every db access call a blocking call? Thanks,

Yun

On Wed, Feb 29, 2012 at 9:18 PM, Johannes Erdfelt wrote:
> On Wed, Feb 29, 2012, Yun Mao wrote:
>> Thanks for the explanation. Let me see if I understand this.
>>
>> 1. Eventlet will never have this problem if there is only 1 OS thread
>> -- let's call it the main thread.
>
> In fact, that's exactly what Python calls it :)
>
>> 2. In Nova, there is only 1 OS thread unless you use xenapi and/or the
>> virt/firewall driver.
>> 3. The python logging module uses locks. Because of the monkey patch,
>> those locks are actually eventlet or "green" locks and may trigger a
>> green thread context switch.
>>
>> Based on 1-3, does it make sense to say that in the other OS threads
>> (i.e. not the main thread), if logging (plus other pure python library
>> code involving locking) is never used, and we do not run an eventlet
>> hub at all, we should never see this problem?
>
> That should be correct. I'd have to double check all of the monkey
> patching that eventlet does to make sure there aren't other cases where
> you may inadvertently use eventlet primitives across real threads.
>
> JE
Re: [Openstack] eventlet weirdness
On Wed, Feb 29, 2012, Yun Mao wrote:
> Thanks for the explanation. Let me see if I understand this.
>
> 1. Eventlet will never have this problem if there is only 1 OS thread
> -- let's call it the main thread.

In fact, that's exactly what Python calls it :)

> 2. In Nova, there is only 1 OS thread unless you use xenapi and/or the
> virt/firewall driver.
> 3. The python logging module uses locks. Because of the monkey patch,
> those locks are actually eventlet or "green" locks and may trigger a
> green thread context switch.
>
> Based on 1-3, does it make sense to say that in the other OS threads
> (i.e. not the main thread), if logging (plus other pure python library
> code involving locking) is never used, and we do not run an eventlet
> hub at all, we should never see this problem?

That should be correct. I'd have to double check all of the monkey patching that eventlet does to make sure there aren't other cases where you may inadvertently use eventlet primitives across real threads.

JE
Re: [Openstack] eventlet weirdness
Thanks for the explanation. Let me see if I understand this.

1. Eventlet will never have this problem if there is only 1 OS thread -- let's call it the main thread.
2. In Nova, there is only 1 OS thread unless you use xenapi and/or the virt/firewall driver.
3. The python logging module uses locks. Because of the monkey patch, those locks are actually eventlet or "green" locks and may trigger a green thread context switch.

Based on 1-3, does it make sense to say that in the other OS threads (i.e. not the main thread), if logging (plus other pure python library code involving locking) is never used, and we do not run an eventlet hub at all, we should never see this problem?

Thanks,

Yun

On Wed, Feb 29, 2012 at 5:24 PM, Johannes Erdfelt wrote:
> On Wed, Feb 29, 2012, Yun Mao wrote:
>> We sometimes notice this error message, which occasionally prevents us
>> from starting nova services. We are using a somewhat modified diablo
>> stable release on Ubuntu 11.10. It may very well be a problem from our
>> patches, but I'm wondering if you guys have any insight. In what
>> condition does this error occur? There is a similar bug here:
>> https://bugs.launchpad.net/nova/+bug/831599
>>
>> but that doesn't offer much insight to me. Help is very much
>> appreciated. Thanks,
>
> greenlet threads (used by eventlet) can't be scheduled across real
> threads. This usually isn't done explicitly, but can happen as a side
> effect if code uses locks. logging is one instance that I've run into.
>
> This generally hasn't been a problem with nova since it uses the
> eventlet monkey patching that makes it hard to generate real threads.
>
> There are two places (at least in trunk) where you need to be careful:
> both nova/virt/xenapi_conn.py and libvirt/firewall.py use tpool, which
> does create a real thread in the background.
>
> If you use logging (and it's not the only source of this problem) then
> you can run into this eventlet message.
>
> JE
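[Editor's note: the "green" locks in point 3 are concrete. Every logging handler allocates a reentrant lock at creation (via Handler.createLock), and eventlet's monkey patching of the thread primitives swaps in green equivalents, so a plain log call can become a green-thread switch point. A small sketch under an unpatched interpreter:]

```python
# Show the lock that logging takes around every emit. Under eventlet
# monkey patching, threading's lock primitives are replaced with green
# ones, so acquiring this lock can yield to the hub -- which is why a
# log call from a real OS thread can trip
# "error: cannot switch to a different thread".
import logging
import threading

handler = logging.StreamHandler()
# Handler.createLock() gave this handler a reentrant lock at creation.
print(type(handler.lock))
# In an unpatched interpreter it is the ordinary threading.RLock type.
print(isinstance(handler.lock, type(threading.RLock())))  # True
```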
Re: [Openstack] eventlet weirdness
On Wed, Feb 29, 2012 at 1:02 PM, Yun Mao wrote:
> Hi,
>
> We sometimes notice this error message, which occasionally prevents us
> from starting nova services. We are using a somewhat modified diablo
> stable release on Ubuntu 11.10. It may very well be a problem from our
> patches, but I'm wondering if you guys have any insight. In what
> condition does this error occur? There is a similar bug here:
> https://bugs.launchpad.net/nova/+bug/831599
>
> but that doesn't offer much insight to me. Help is very much
> appreciated. Thanks,
>
> Yun

One tip: make sure you capture stdout/stderr as well as the logs. Although I haven't seen this particular error, I have seen at least one case where libvirt-related errors weren't in the log but made it to the console.

mike
Re: [Openstack] eventlet weirdness
On Wed, Feb 29, 2012, Yun Mao wrote:
> We sometimes notice this error message, which occasionally prevents us
> from starting nova services. We are using a somewhat modified diablo
> stable release on Ubuntu 11.10. It may very well be a problem from our
> patches, but I'm wondering if you guys have any insight. In what
> condition does this error occur? There is a similar bug here:
> https://bugs.launchpad.net/nova/+bug/831599
>
> but that doesn't offer much insight to me. Help is very much
> appreciated. Thanks,

greenlet threads (used by eventlet) can't be scheduled across real threads. This usually isn't done explicitly, but can happen as a side effect if code uses locks. logging is one instance that I've run into.

This generally hasn't been a problem with nova since it uses the eventlet monkey patching that makes it hard to generate real threads.

There are two places (at least in trunk) where you need to be careful: both nova/virt/xenapi_conn.py and libvirt/firewall.py use tpool, which does create a real thread in the background.

If you use logging (and it's not the only source of this problem) then you can run into this eventlet message.

JE
Re: [Openstack] eventlet weirdness
I have been encountering these quite a bit myself recently in another project. For me, the errors were a result of tpool.execute() in a non-cooperative thread context. My guess as to the root cause is that some of eventlet's cooperative waiting code is not safe to use when not running in an eventlet coroutine context.

My solution (which may not work for you) involved switching based on whether I'm in a greenthread or not, and either calling tpool.execute() or the underlying function directly. Fortunately for me, I can know at compile time what context I will be in. I think there is a way to query eventlet to see if you are currently in a greenthread or not, but I haven't finished diving into that documentation yet.

Good luck,

Mark

On Wed, Feb 29, 2012 at 1:02 PM, Yun Mao wrote:
> Hi,
>
> We sometimes notice this error message, which occasionally prevents us
> from starting nova services. We are using a somewhat modified diablo
> stable release on Ubuntu 11.10. It may very well be a problem from our
> patches, but I'm wondering if you guys have any insight. In what
> condition does this error occur? There is a similar bug here:
> https://bugs.launchpad.net/nova/+bug/831599
>
> but that doesn't offer much insight to me. Help is very much
> appreciated. Thanks,
>
> Yun
>
> 2012-02-23 16:54:52,788 DEBUG nova.utils
> [43f98259-6ba8-4e5d-bc0e-9eab978194e5 None None] backend <module
> 'nova.db.sqlalchemy.api' from
> '/opt/stack/nova/nova/db/sqlalchemy/api.pyc'> from (pid=6385)
> __get_backend /opt/stack/nova/nova/utils.py:449
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/dist-packages/eventlet/hubs/hub.py", line
> 336, in fire_timers
>     timer()
>   File "/usr/lib/python2.7/dist-packages/eventlet/hubs/timer.py", line
> 56, in __call__
>     cb(*args, **kw)
>   File "/usr/lib/python2.7/dist-packages/eventlet/semaphore.py", line
> 95, in _do_acquire
>     waiter.switch()
> error: cannot switch to a different thread
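[Editor's note: Mark's "dispatch on context" workaround can be sketched with the stdlib. Here the context test is main-thread vs worker-thread, standing in for eventlet's greenthread check; `in_event_loop` and `guarded_execute` are illustrative names, not eventlet API.]

```python
# Sketch of dispatching on execution context: use the pool only when the
# caller can safely wait on it, otherwise call the function directly.
# In Mark's case the real test would be "am I in a greenthread?"; the
# stand-in test here is "am I the main thread?".
import threading
from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=2)

def in_event_loop():
    # Stand-in for a greenthread check (e.g. inspecting the current
    # greenlet under eventlet).
    return threading.current_thread() is threading.main_thread()

def guarded_execute(func, *args):
    if in_event_loop():
        # The event-loop side may hand off to the pool and wait.
        return pool.submit(func, *args).result()
    # Already in a worker / non-cooperative context: waiting on
    # eventlet's tpool here is what produced Mark's errors, so call
    # the function directly instead.
    return func(*args)

print(guarded_execute(len, "abcd"))  # 4
```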
[Openstack] eventlet weirdness
Hi,

We sometimes notice this error message, which occasionally prevents us from starting nova services. We are using a somewhat modified diablo stable release on Ubuntu 11.10. It may very well be a problem from our patches, but I'm wondering if you guys have any insight. In what condition does this error occur? There is a similar bug here: https://bugs.launchpad.net/nova/+bug/831599

but that doesn't offer much insight to me. Help is very much appreciated. Thanks,

Yun

2012-02-23 16:54:52,788 DEBUG nova.utils [43f98259-6ba8-4e5d-bc0e-9eab978194e5 None None] backend <module 'nova.db.sqlalchemy.api' from '/opt/stack/nova/nova/db/sqlalchemy/api.pyc'> from (pid=6385) __get_backend /opt/stack/nova/nova/utils.py:449
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/eventlet/hubs/hub.py", line 336, in fire_timers
    timer()
  File "/usr/lib/python2.7/dist-packages/eventlet/hubs/timer.py", line 56, in __call__
    cb(*args, **kw)
  File "/usr/lib/python2.7/dist-packages/eventlet/semaphore.py", line 95, in _do_acquire
    waiter.switch()
error: cannot switch to a different thread