Re: [Openstack] eventlet weirdness

2012-03-06 Thread Jay Pipes

On 03/05/2012 08:30 PM, Adam Young wrote:

The only time sleep(), as called from Python code, is going to help you is
if you have a long-running stretch of Python code and you sleep() in
the middle of it.


That's exactly where the greenthread.sleep(0) call in question was used: 
inside a (potentially) long-running loop in _sync_power_states()...
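A simplified sketch of that pattern (not the actual Nova code; the loop body
and helper below are made up for illustration):

from eventlet import greenthread

def sync_power_states(instances):
    # 'instances' could be thousands of rows; without the sleep(0) this
    # loop would hog the hub for its entire duration.
    for instance in instances:
        check_power_state(instance)   # hypothetical per-instance work
        greenthread.sleep(0)          # explicitly yield to other greenthreads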


Best,
-jay



Re: [Openstack] eventlet weirdness

2012-03-05 Thread Devin Carlen
> If the libvirt API (or other native API) has an async mode, what you
> can do is provide a synchronous, Python-based wrapper that does the 
> following.
> 
> register_request_callback()
> async_call()
> sleep()
> 
> 

This can be set up like a more traditional multi-threaded model as well.  You 
can eventlet.sleep while waiting for the callback handler to notify the 
greenthread.  This of course assumes your i/o and callback are running in a 
different pthread (eventlet.tpool is fine). So it looks more like:

condition = threading.Condition() # or something like it
register_request_callback(condition)
async_call()
condition.wait()
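A slightly more concrete sketch of that pattern (the native async call is
hypothetical; only threading and eventlet are real here):

import threading
import eventlet

result = {}
done = threading.Event()   # signalled from the native/callback thread

def callback(value):
    # runs in a real OS thread, e.g. the native library's I/O thread
    result['value'] = value
    done.set()

def wait_greenly(timeout=10.0, poll=0.05):
    # the greenthread keeps yielding to the hub while the native side works
    waited = 0.0
    while not done.is_set() and waited < timeout:
        eventlet.sleep(poll)
        waited += poll
    return result.get('value')

# native_lib.async_call(on_complete=callback)   # hypothetical async native call
# value = wait_greenly()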


I found this post to be enormously helpful in understanding some of the nuances 
of dealing with green thread and process thread synchronization and 
communication:
 
http://blog.devork.be/2011/03/synchronising-eventlets-and-threads.html 

Devin



Re: [Openstack] eventlet weirdness

2012-03-05 Thread Adam Young

On 03/05/2012 05:08 PM, Yun Mao wrote:

Hi Phil,

My understanding is that, (forget Nova for a second) in a perfect
eventlet world, a green thread is either doing CPU-intensive
computing or waiting in system calls that are IO related. In the latter
case, the eventlet scheduler will suspend the green thread and switch
to another green thread that is ready to run.

Back to reality, as you mentioned this is broken - some IO-bound
activity won't cause an eventlet switch. To me, the only way that happens
is for the same reason those MySQL calls are blocking - we are using
C-based modules that don't respect the monkey patching and never yield.
I suspect that all libvirt-based calls also belong to this category.


Agree.  I expect that to be the case for any native library.  Monkey 
patching only changes the Python side of the call; anything in native 
code is too far along for it to be redirected.
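For what it's worth, this is roughly what the monkey patching does and doesn't
cover (a sketch, not Nova code):

import eventlet
eventlet.monkey_patch()   # swaps in green versions of socket, time, select, etc.

# Pure-Python code that goes through the patched stdlib (e.g. a pure-Python
# MySQL driver built on socket) now yields to other greenthreads while it waits.
#
# A C extension (libvirt bindings, a C MySQL driver, ...) calls into libc
# directly, never sees the patched modules, and blocks the whole process
# until the call returns.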


Now if those blocking calls can finish in a very short amount of time (as we
assume for DB calls), then I think inserting a sleep(0) after every
blocking call should be a quick fix to the problem.
Nope.  The blocking call still blocks, then it returns, hits the 
sleep, and yields to the scheduler.  The only option is to wrap it with a thread pool.
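For example, with eventlet's built-in native-thread pool (tpool is a real
eventlet module; the snapshot call below is a hypothetical blocking native call):

from eventlet import tpool

# Calling the native function directly would block every greenthread:
#     result = conn.snapshot(instance_ref, image_id)
# Running it via tpool moves it to a real OS thread; only this greenthread
# waits, and the hub keeps scheduling the others:
result = tpool.execute(conn.snapshot, instance_ref, image_id)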


From an OS perspective,  there are no such things as greenthreads.  The 
same task_struct in the Linux Kernel (representing a Posix thread) that 
manages the body of the web application is used to process the IO.  The 
Linux thread  goes into a sleep state  until the IO comes back,  and the 
Kernel scheduler will schedule another OS process or task.  In order to 
get both the IO to complete and the greenthread scheduler to process 
another greenthread,  you need to have two Posix threads.


If the libvirt API (or other native API) has an async mode, what you 
can do is provide a synchronous, Python-based wrapper that does the 
following.


register_request_callback()
async_call()
sleep()

The only time sleep(), as called from Python code, is going to help you is 
if you have a long-running stretch of Python code and you sleep() in 
the middle of it.






But if it's a long
blocking call like the snapshot case, we are probably screwed anyway
and need OS thread level parallelism or multiprocessing to make it
truly non-blocking.. Thanks,


Yep.


Yun

On Mon, Mar 5, 2012 at 10:43 AM, Day, Phil  wrote:

Hi Yun,

The point of the sleep(0) is to explicitly yield from a long-running eventlet 
so that other eventlets aren't blocked for a long period.   Depending on how 
you look at it, that either means we're making an explicit judgement on priority 
or trying to provide a more equal sharing of run-time across eventlets.

It's not that things are CPU bound as such - more just that eventlets have 
very few pre-emption points.   Even an IO bound activity like creating a 
snapshot won't cause an eventlet switch.

So in terms of priority we're trying to get to the state where:
  - Important periodic events (such as service status) run when expected  (if 
these take a long time we're stuffed anyway)
  - User initiated actions don't get blocked by background system eventlets 
(such as refreshing power-state)
- Slow actions from one user don't block actions from other users (the first 
user will expect their snapshot to take X seconds, the second one won't expect 
their VM creation to take X + Y seconds).

It almost feels like the right level of concurrency would be to have a 
task/process running for each VM, so that there is concurrency across 
un-related VMs, but serialisation for each VM.
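One way to sketch that shape with what eventlet already provides might be a
per-VM semaphore (illustrative only, not existing Nova code):

import collections
import eventlet
from eventlet.semaphore import Semaphore

# one lock per VM: actions on different VMs run concurrently,
# actions on the same VM are serialised
vm_locks = collections.defaultdict(Semaphore)

def run_vm_action(vm_uuid, action, *args, **kwargs):
    with vm_locks[vm_uuid]:
        return action(*args, **kwargs)

# eventlet.spawn(run_vm_action, uuid_a, create_snapshot, ...)
# eventlet.spawn(run_vm_action, uuid_b, create_vm, ...)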

Phil

-Original Message-
From: Yun Mao [mailto:yun...@gmail.com]
Sent: 02 March 2012 20:32
To: Day, Phil
Cc: Chris Behrens; Joshua Harlow; openstack
Subject: Re: [Openstack] eventlet weirdness

Hi Phil, I'm a little confused. To what extent does sleep(0) help?

It only gives the greenlet scheduler a chance to switch to another green 
thread. If we are having a CPU bound issue, sleep(0) won't give us access to 
any more CPU cores. So the total time to finish should be the same no matter 
what. It may improve the fairness among different green threads but shouldn't 
help the throughput. The only apparent gain I see is a situation where there is 
one green thread with a long CPU time and many other green threads with small 
CPU times.
The total finish time will be the same with or without sleep(0), but with sleep 
in the first thread, the others should be much more responsive.

However, it's unclear to me which part of Nova is very CPU intensive.
It seems that most work here is IO bound, including the snapshot. Do we have 
other blocking calls besides mysql access? I feel like I'm missing something 
but couldn't figure out what.

Thanks,

Yun


On Fri, Mar 2, 2012 at 2:08 PM, Day, Phil  wrote:

I didn't say it was pretty - Given the choice I'd much rather have a threading 
model that really di

Re: [Openstack] eventlet weirdness

2012-03-05 Thread Yun Mao
Hi Phil,

My understanding is that, (forget Nova for a second) in a perfect
eventlet world, a green thread is either doing CPU-intensive
computing or waiting in system calls that are IO related. In the latter
case, the eventlet scheduler will suspend the green thread and switch
to another green thread that is ready to run.

Back to reality, as you mentioned this is broken - some IO-bound
activity won't cause an eventlet switch. To me, the only way that happens
is for the same reason those MySQL calls are blocking - we are using
C-based modules that don't respect the monkey patching and never yield.
I suspect that all libvirt-based calls also belong to this category.

Now if those blocking calls can finish in a very short amount of time (as we
assume for DB calls), then I think inserting a sleep(0) after every
blocking call should be a quick fix to the problem. But if it's a long
blocking call like the snapshot case, we are probably screwed anyway
and need OS thread level parallelism or multiprocessing to make it
truly non-blocking.. Thanks,

Yun

On Mon, Mar 5, 2012 at 10:43 AM, Day, Phil  wrote:
> Hi Yun,
>
> The point of the sleep(0) is to explicitly yield from a long-running eventlet 
> so that other eventlets aren't blocked for a long period.   Depending on 
> how you look at it, that either means we're making an explicit judgement on 
> priority or trying to provide a more equal sharing of run-time across 
> eventlets.
>
> It's not that things are CPU bound as such - more just that eventlets have 
> very few pre-emption points.    Even an IO bound activity like creating a 
> snapshot won't cause an eventlet switch.
>
> So in terms of priority we're trying to get to the state where:
>  - Important periodic events (such as service status) run when expected  (if 
> these take a long time we're stuffed anyway)
>  - User initiated actions don't get blocked by background system eventlets 
> (such as refreshing power-state)
> - Slow actions from one user don't block actions from other users (the first 
> user will expect their snapshot to take X seconds, the second one won't 
> expect their VM creation to take X + Y seconds).
>
> It almost feels like the right level of concurrency would be to have a 
> task/process running for each VM, so that there is concurrency across 
> un-related VMs, but serialisation for each VM.
>
> Phil
>
> -Original Message-
> From: Yun Mao [mailto:yun...@gmail.com]
> Sent: 02 March 2012 20:32
> To: Day, Phil
> Cc: Chris Behrens; Joshua Harlow; openstack
> Subject: Re: [Openstack] eventlet weirdness
>
> Hi Phil, I'm a little confused. To what extent does sleep(0) help?
>
> It only gives the greenlet scheduler a chance to switch to another green 
> thread. If we are having a CPU bound issue, sleep(0) won't give us access to 
> any more CPU cores. So the total time to finish should be the same no matter 
> what. It may improve the fairness among different green threads but shouldn't 
> help the throughput. The only apparent gain I see is a situation where there 
> is one green thread with a long CPU time and many other green threads with 
> small CPU times.
> The total finish time will be the same with or without sleep(0), but with 
> sleep in the first thread, the others should be much more responsive.
>
> However, it's unclear to me which part of Nova is very CPU intensive.
> It seems that most work here is IO bound, including the snapshot. Do we have 
> other blocking calls besides mysql access? I feel like I'm missing something 
> but couldn't figure out what.
>
> Thanks,
>
> Yun
>
>
> On Fri, Mar 2, 2012 at 2:08 PM, Day, Phil  wrote:
>> I didn't say it was pretty - Given the choice I'd much rather have a 
>> threading model that really did concurrency and pre-emption in all the right 
>> places, and it would be really cool if something managed the threads that 
>> were started so that if a second conflicting request was received it did 
>> some proper tidy-up or blocking rather than just leaving the race condition 
>> to work itself out (then we wouldn't have to try and control it by checking 
>> vm_state).
>>
>> However ...   In the current code base where we only have user space based 
>> eventlets, with no pre-emption, and some activities that need to be 
>> prioritised, then forcing pre-emption with a sleep(0) seems a pretty small 
>> bit of untidiness.   And it works now without a major code refactor.
>>
>> Always open to other approaches ...
>>
>> Phil
>>
>>
>> -Original Message-
>> From: openstack-bounces+philip.day=hp@lists.launchpad.net
>> [mailto:openstack-bo

Re: [Openstack] eventlet weirdness

2012-03-05 Thread Mark Washenberger


"Eric Windisch"  said:

>> an rpc implementation that writes to disk and returns,
> 
> A what? I'm not sure what problem you're looking to solve here or what you 
> think
> the RPC mechanism should do. Perhaps you're speaking of a Kombu or AMQP 
> specific
> improvement?
> 
> There is no absolute need for persistence or durability in RPC. I've done 
> quite a
> bit of analysis of this requirement and it simply isn't necessary. There is 
> some
> need in AMQP for this due to implementation-specific issues, but not 
> necessarily
> unsolvable. However, these problems simply do not exist for all RPC
> implementations...

This was a side issue and I probably should have left it out of my email. I
wasn't angling for persistence at all here.

Rather I was thinking that I sometimes see rpc casts taking 10-20 ms in
nova-api, and I wonder if we could pare that down without harming
reliability by writing casts to a local resource and streaming them over
the network in the background. I'm guessing if that local resource is disk
with fsyncs between each write, there would likely be a performance
degradation, so I'm not advocating that. But without fsyncs seemed like it
might be okay. Maybe this is just silly and you're about to tell me how it's
all a bad idea anyway :-)





Re: [Openstack] eventlet weirdness

2012-03-05 Thread Eric Windisch
> an rpc implementation that writes to disk and returns,

A what? I'm not sure what problem you're looking to solve here or what you 
think the RPC mechanism should do. Perhaps you're speaking of a Kombu or AMQP 
specific improvement?

There is no absolute need for persistence or durability in RPC. I've done quite 
a bit of analysis of this requirement and it simply isn't necessary. There is 
some need in AMQP for this due to implementation-specific issues, but not 
necessarily unsolvable. However, these problems simply do not exist for all RPC 
implementations...

-- 
Eric Windisch






Re: [Openstack] eventlet weirdness

2012-03-05 Thread Day, Phil
> However I'd like to point out that the math below is misleading (the average 
> time for the non-blocking case is also miscalculated but 
> it's not my point). The number that matters more in real life is throughput. 
> For the blocking case it's 3/30 = 0.1 request per second.

I think it depends on whether you are trying to characterise system performance 
(processing time) or perceived user experience (queuing time + processing 
time).   My users are kind of selfish in that they don't care how many 
transactions per second I can get through,  just how long it takes for them to 
get a response from when they submit the request.

Making the DB calls non-blocking does help a very small bit in driving up API 
server utilisation  - but my point was that time spent in the DB is such a 
small part of the total time in the API server that it's not the thing that 
needs to be optimised first. 

Any queuing system will explode when its utilisation approaches 100%, blocking 
or not.   Moving to non-blocking just means that you can hit 100% utilisation 
in the API server with 2 concurrent requests instead of *only* being able to 
hit 90+% with one transaction.   That's not a great leap forward in my 
perception.

Phil

-Original Message-
From: Yun Mao [mailto:yun...@gmail.com] 
Sent: 03 March 2012 01:11
To: Day, Phil
Cc: openstack@lists.launchpad.net
Subject: Re: [Openstack] eventlet weirdness

First I agree that having blocking DB calls is no big deal given the way Nova 
uses mysql and reasonably powerful db server hardware.

However I'd like to point out that the math below is misleading (the average 
time for the nonblocking case is also miscalculated but it's not my point). The 
number that matters more in real life is throughput. For the blocking case it's 
3/30 = 0.1 request per second.
For the non-blocking case it's 3/27=0.11 requests per second. That means if 
there is a request coming in every 9 seconds constantly, the blocking system 
will eventually explode but the nonblocking system can still handle it. 
Therefore, the non-blocking one should be preferred.
Thanks,

Yun

>
> For example in the API server (before we made it properly multi-threaded) 
> with blocking db calls the server was essentially a serial processing queue - 
> each request was fully processed before the next.  With non-blocking db calls 
> we got a lot more apparent concurrency, but only at the expense of making all 
> of the requests equally bad.
>
> Consider a request takes 10 seconds, where after 5 seconds there is a call to 
> the DB which takes 1 second, and three are started at the same time:
>
> Blocking:
> 0 - Request 1 starts
> 10 - Request 1 completes, request 2 starts
> 20 - Request 2 completes, request 3 starts
> 30 - Request 3 completes
> Request 1 completes in 10 seconds
> Request 2 completes in 20 seconds
> Request 3 completes in 30 seconds
> Ave time: 20 sec
>
>
> Non-blocking
> 0 - Request 1 Starts
> 5 - Request 1 gets to db call, request 2 starts
> 10 - Request 2 gets to db call, request 3 starts
> 15 - Request 3 gets to db call, request 1 resumes
> 19 - Request 1 completes, request 2 resumes
> 23 - Request 2 completes,  request 3 resumes
> 27 - Request 3 completes
>
> Request 1 completes in 19 seconds (+ 9 seconds)
> Request 2 completes in 24 seconds (+ 4 seconds)
> Request 3 completes in 27 seconds (- 3 seconds)
> Ave time: 20 sec
>
> So instead of worrying about making db calls non-blocking we've been working 
> to make certain eventlets non-blocking - i.e. add sleep(0) calls to long 
> running iteration loops - which IMO has a much bigger impact on the apparent 
> latency of the system.
>
> Thanks for the explanation. Let me see if I understand this.



Re: [Openstack] eventlet weirdness

2012-03-05 Thread Day, Phil
Hi Yun,

The point of the sleep(0) is to explicitly yield from a long-running eventlet 
so that other eventlets aren't blocked for a long period.   Depending on how 
you look at it, that either means we're making an explicit judgement on priority 
or trying to provide a more equal sharing of run-time across eventlets.

It's not that things are CPU bound as such - more just that eventlets have 
very few pre-emption points.   Even an IO bound activity like creating a 
snapshot won't cause an eventlet switch.

So in terms of priority we're trying to get to the state where:
 - Important periodic events (such as service status) run when expected  (if 
these take a long time we're stuffed anyway)
 - User initiated actions don't get blocked by background system eventlets 
(such as refreshing power-state)
- Slow actions from one user don't block actions from other users (the first 
user will expect their snapshot to take X seconds, the second one won't expect 
their VM creation to take X + Y seconds).

It almost feels like the right level of concurrency would be to have a 
task/process running for each VM, so that there is concurrency across 
un-related VMs, but serialisation for each VM.

Phil 

-Original Message-
From: Yun Mao [mailto:yun...@gmail.com] 
Sent: 02 March 2012 20:32
To: Day, Phil
Cc: Chris Behrens; Joshua Harlow; openstack
Subject: Re: [Openstack] eventlet weirdness

Hi Phil, I'm a little confused. To what extent does sleep(0) help?

It only gives the greenlet scheduler a chance to switch to another green 
thread. If we are having a CPU bound issue, sleep(0) won't give us access to 
any more CPU cores. So the total time to finish should be the same no matter 
what. It may improve the fairness among different green threads but shouldn't 
help the throughput. The only apparent gain I see is a situation where there is 
one green thread with a long CPU time and many other green threads with small 
CPU times.
The total finish time will be the same with or without sleep(0), but with sleep 
in the first thread, the others should be much more responsive.

However, it's unclear to me which part of Nova is very CPU intensive.
It seems that most work here is IO bound, including the snapshot. Do we have 
other blocking calls besides mysql access? I feel like I'm missing something 
but couldn't figure out what.

Thanks,

Yun


On Fri, Mar 2, 2012 at 2:08 PM, Day, Phil  wrote:
> I didn't say it was pretty - Given the choice I'd much rather have a 
> threading model that really did concurrency and pre-emption in all the right 
> places, and it would be really cool if something managed the threads that 
> were started so that if a second conflicting request was received it did some 
> proper tidy-up or blocking rather than just leaving the race condition to 
> work itself out (then we wouldn't have to try and control it by checking 
> vm_state).
>
> However ...   In the current code base where we only have user space based 
> eventlets, with no pre-emption, and some activities that need to be 
> prioritised, then forcing pre-emption with a sleep(0) seems a pretty small bit 
> of untidiness.   And it works now without a major code refactor.
>
> Always open to other approaches ...
>
> Phil
>
>
> -Original Message-
> From: openstack-bounces+philip.day=hp@lists.launchpad.net 
> [mailto:openstack-bounces+philip.day=hp@lists.launchpad.net] On 
> Behalf Of Chris Behrens
> Sent: 02 March 2012 19:00
> To: Joshua Harlow
> Cc: openstack; Chris Behrens
> Subject: Re: [Openstack] eventlet weirdness
>
> It's not just you
>
>
> On Mar 2, 2012, at 10:35 AM, Joshua Harlow wrote:
>
>> Does anyone else feel that the following seems really "dirty", or is it just 
>> me.
>>
>> "adding a few sleep(0) calls in various places in the Nova codebase 
>> (as was recently added in the _sync_power_states() periodic task) is 
>> an easy and simple win with pretty much no ill side-effects. :)"
>>
>> Dirty in that it feels like there is something wrong from a design point of 
>> view.
>> Sprinkling "sleep(0)" seems like it's a band-aid on a larger problem imho.
>> But that's just my gut feeling.
>>
>> :-(
>>
>> On 3/2/12 8:26 AM, "Armando Migliaccio"  
>> wrote:
>>
>> I knew you'd say that :P
>>
>> There you go: https://bugs.launchpad.net/nova/+bug/944145
>>
>> Cheers,
>> Armando
>>
>> > -Original Message-
>> > From: Jay Pipes [mailto:jaypi...@gmail.com]
>> > Sent: 02 March 2012 16:22
>> > To: Armando Migliaccio
>> > Cc: openstack@lists.launchpad.net
>> 

Re: [Openstack] eventlet weirdness

2012-03-05 Thread Kapil Thangavelu
Excerpts from Mark Washenberger's message of 2012-03-04 23:34:03 -0500:
> While we are on the topic of api performance and the database, I have a
> few thoughts I'd like to share.
> 
> TL;DR:
> - we should consider refactoring our wsgi server to leverage multiple
>   processors
> - we could leverage compute-cell database responsibility separation
>   to speed up our api database performance by several orders of magnitude
> 
> I think the main way eventlet holds us back right now is that we have
> such low utilization. The big jump with multiprocessing or threading
> would be the potential to leverage more powerful hardware. Currently
> nova-api probably wouldn't run any faster on bare metal than it would
> run on an m1.tiny. Of course, this isn't an eventlet limitation per se
> but rather we are limiting ourselves to eventlet single-processing
> performance with our wsgi server implementation.


This seems fairly easily remedied without code changes via usage of something 
like gunicorn (in multi-process single socket mode as wsgi frontend), or any 
generic 
load balancer against multiple processes. But it's of limited utility unless the 
individual processes can handle concurrency scenarios greater than 1.

I'm a bit skeptical about the use of multiprocessing; it imposes its own set of 
constraints and problems. Interestingly, using something like zmq (again with its 
own issues, but more robust imo than multiprocessing) allows for transparency 
from single-process ipc to network ipc without the file handle and event loop 
inheritance concerns of something like multiprocessing.


> 
> However, the greatest performance improvement I see would come from
> streamlining the database interactions incurred on each nova-api
> request. We have been pretty fast-and-loose with adding database
> and glance calls to the openstack api controllers and compute api.
> I am especially thinking of the extension mechanism, which tends
> to require another database call for each /servers extension a
> deployer chooses to enable.
> 
> But, if we think in ideal terms, each api request should perform
> no more than 1 database call for queries, and no more than 2 db calls
> for commands (validation + initial creation). In addition, I can
> imagine an implementation where these database calls don't have any
> joins, and involve no more than one network roundtrip.
>

Is there any debug tooling around api endpoints that can identify these calls, 
a la some of the wsgi middleware targeted towards web apps (i.e. debug toolbars)?

 
> Beyond refactoring the way we add in data for response extensions,
> I think the right way to get this database performance is to make the
> compute-cells approach the "normal". In this approach, there are
> at least two nova databases, one which lives along with the nova-api
> nodes, and one that lives in a compute cell. The api database is kept
> up to date through asynchronous updates that bubble up from the
> compute cells. With this separation, we are free to tailor the schema
> of the api database to match api performance needs, while we tailor
> the schema of the compute cell database to the operational requirements
> of compute workers. In particular, we can completely denormalize the
> tables in the api database without creating unpleasant side effects
> in the compute manager code. This denormalization both means fewer
> database interactions and fewer joins (which likely matters for larger
> deployments).
> 
> If we partner this streamlining and denormalization approach with
> similar attentions to glance performance and an rpc implementation
> that writes to disk and returns, processing network activities in
> the background, I think we could get most api actions to < 10 ms on
> reasonable hardware. 
> 
> As much as the initial push on compute-cells is about scale, I think
> it could enable major performance improvements directly on its heels
> during the Folsom cycle. This is something I'd love to talk about more
> at the conference if anyone has any interest.
> 

sounds interesting, but potentially complex, with schema and data drift 
possibilities.

cheers,

Kapil



Re: [Openstack] eventlet weirdness

2012-03-04 Thread Chris Behrens


On Mar 4, 2012, at 8:56 PM, Gabe Westmaas  
> I agree with this paragraph wholeheartedly!  I would definitely like to see 
> this separation not only for the reasons you list above (performance, all 
> installations behaving the same way) but also because I think it gives us a 
> lot more power to help handle seamless upgrades - another topic I'm sure we 
> will be discussing at the conference.
> 

And it makes the compute cells stuff plug in a LOT more cleanly.


Re: [Openstack] eventlet weirdness

2012-03-04 Thread Chris Behrens
Pretty much +1 to all of that.  The other problem I see that a separate 'view' 
for the API solves...is state tracking.  I feel API should be keeping its own 
state on things.  What the API allows per the spec should be completely 
separated from the service's state tracking.  As you mention, compute cells 
somewhat achieves this also as a side effect of its implementation.

On Mar 4, 2012, at 8:34 PM, "Mark Washenberger" 
 wrote:

> While we are on the topic of api performance and the database, I have a
> few thoughts I'd like to share.
> 
> TL;DR:
> - we should consider refactoring our wsgi server to leverage multiple
>  processors
> - we could leverage compute-cell database responsibility separation
>  to speed up our api database performance by several orders of magnitude
> 
> I think the main way eventlet holds us back right now is that we have
> such low utilization. The big jump with multiprocessing or threading
> would be the potential to leverage more powerful hardware. Currently
> nova-api probably wouldn't run any faster on bare metal than it would
> run on an m1.tiny. Of course, this isn't an eventlet limitation per se
> but rather we are limiting ourselves to eventlet single-processing
> performance with our wsgi server implementation.
> 
> However, the greatest performance improvement I see would come from
> streamlining the database interactions incurred on each nova-api
> request. We have been pretty fast-and-loose with adding database
> and glance calls to the openstack api controllers and compute api.
> I am especially thinking of the extension mechanism, which tends
> to require another database call for each /servers extension a
> deployer chooses to enable.
> 
> But, if we think in ideal terms, each api request should perform
> no more than 1 database call for queries, and no more than 2 db calls
> for commands (validation + initial creation). In addition, I can
> imagine an implementation where these database calls don't have any
> joins, and involve no more than one network roundtrip.
> 
> Beyond refactoring the way we add in data for response extensions,
> I think the right way to get this database performance is to make the
> compute-cells approach the "normal". In this approach, there are
> at least two nova databases, one which lives along with the nova-api
> nodes, and one that lives in a compute cell. The api database is kept
> up to date through asynchronous updates that bubble up from the
> compute cells. With this separation, we are free to tailor the schema
> of the api database to match api performance needs, while we tailor
> the schema of the compute cell database to the operational requirements
> of compute workers. In particular, we can completely denormalize the
> tables in the api database without creating unpleasant side effects
> in the compute manager code. This denormalization both means fewer
> database interactions and fewer joins (which likely matters for larger
> deployments).
> 
> If we partner this streamlining and denormalization approach with
> similar attentions to glance performance and an rpc implementation
> that writes to disk and returns, processing network activities in
> the background, I think we could get most api actions to < 10 ms on
> reasonable hardware. 
> 
> As much as the initial push on compute-cells is about scale, I think
> it could enable major performance improvements directly on its heels
> during the Folsom cycle. This is something I'd love to talk about more
> at the conference if anyone has any interest.
> 
> 



Re: [Openstack] eventlet weirdness

2012-03-04 Thread Gabe Westmaas
> Beyond refactoring the way we add in data for response extensions, I think
> the right way to get this database performance is to make the compute-cells
> approach the "normal". In this approach, there are at least two nova
> databases, one which lives along with the nova-api nodes, and one that lives
> in a compute cell. The api database is kept up to date through asynchronous
> updates that bubble up from the compute cells. With this separation, we are
> free to tailor the schema of the api database to match api performance
> needs, while we tailor the schema of the compute cell database to the
> operational requirements of compute workers. In particular, we can
> completely denormalize the tables in the api database without creating
> unpleasant side effects in the compute manager code. This denormalization
> both means fewer database interactions and fewer joins (which likely
> matters for larger deployments).

I agree with this paragraph wholeheartedly!  I would definitely like to see 
this separation not only for the reasons you list above (performance, all 
installations behaving the same way) but also because I think it gives us a lot 
more power to help handle seamless upgrades - another topic I'm sure we will be 
discussing at the conference.



Re: [Openstack] eventlet weirdness

2012-03-04 Thread Mark Washenberger
While we are on the topic of api performance and the database, I have a
few thoughts I'd like to share.

TL;DR:
- we should consider refactoring our wsgi server to leverage multiple
  processors
- we could leverage compute-cell database responsibility separation
  to speed up our api database performance by several orders of magnitude

I think the main way eventlet holds us back right now is that we have
such low utilization. The big jump with multiprocessing or threading
would be the potential to leverage more powerful hardware. Currently
nova-api probably wouldn't run any faster on bare metal than it would
run on an m1.tiny. Of course, this isn't an eventlet limitation per se
but rather we are limiting ourselves to eventlet single-processing
performance with our wsgi server implementation.

However, the greatest performance improvement I see would come from
streamlining the database interactions incurred on each nova-api
request. We have been pretty fast-and-loose with adding database
and glance calls to the openstack api controllers and compute api.
I am especially thinking of the extension mechanism, which tends
to require another database call for each /servers extension a
deployer chooses to enable.

But, if we think in ideal terms, each api request should perform
no more than 1 database call for queries, and no more than 2 db calls
for commands (validation + initial creation). In addition, I can
imagine an implementation where these database calls don't have any
joins, and involve no more than one network roundtrip.

Beyond refactoring the way we add in data for response extensions,
I think the right way to get this database performance is to make the
compute-cells approach the "normal". In this approach, there are
at least two nova databases, one which lives along with the nova-api
nodes, and one that lives in a compute cell. The api database is kept
up to date through asynchronous updates that bubble up from the
compute cells. With this separation, we are free to tailor the schema
of the api database to match api performance needs, while we tailor
the schema of the compute cell database to the operational requirements
of compute workers. In particular, we can completely denormalize the
tables in the api database without creating unpleasant side effects
in the compute manager code. This denormalization both means fewer
database interactions and fewer joins (which likely matters for larger
deployments).

If we partner this streamlining and denormalization approach with
similar attentions to glance performance and an rpc implementation
that writes to disk and returns, processing network activities in
the background, I think we could get most api actions to < 10 ms on
reasonable hardware. 

As much as the initial push on compute-cells is about scale, I think
it could enable major performance improvements directly on its heels
during the Folsom cycle. This is something I'd love to talk about more
at the conference if anyone has any interest.




Re: [Openstack] eventlet weirdness

2012-03-02 Thread Kapil Thangavelu
Excerpts from Monsyne Dragon's message of 2012-03-02 16:10:01 -0500:
> 
> On Mar 2, 2012, at 9:17 AM, Jay Pipes wrote:
> 
> > On 03/02/2012 05:34 AM, Day, Phil wrote:
> >> In our experience (running clusters of several hundred nodes) the DB 
> >> performance is not generally the significant factor, so making its calls 
> >> non-blocking  gives only a very small increase in processing capacity and 
> >> creates other side effects in terms of slowing all eventlets down as they 
> >> wait for their turn to run.
> > 
> > Yes, I believe I said that this was the case at the last design summit -- 
> > or rather, I believe I said "is there any evidence that the database is a 
> > performance or scalability problem at all"?
> > 
> >> That shouldn't really be surprising given that the Nova DB is pretty small 
> >> and MySQL is a pretty good DB - throw reasonable hardware at the DB server 
> >> and give it a bit of TLC from a DBA (remove deleted entries from the DB, 
> >> add indexes where the slow query log tells you to, etc) and it shouldn't 
> >> be the bottleneck in the system for performance or scalability.
> > 
> > ++
> > 
> >> We use the python driver and have experimented with allowing the eventlet 
> >> code to make the db calls non-blocking (its not the default setting), and 
> >> it works, but didn't give us any significant advantage.
> > 
> > Yep, identical results to the work that Mark Washenberger did on the same 
> > subject.
> > 
> 
> Has anyone thought about switching to gevent?   It's similar enough to 
> eventlet that the port shouldn't be too bad, and because its event loop is 
> in C (libevent), there are C mysql drivers (ultramysql) that will work with 
> it without blocking.   


Switching to gevent won't fix the structural problems with the codebase that 
necessitated sleeps for greenlet switching. A refactoring to an architecture more 
amenable to decomposing api requests into discrete tasks that are yieldable 
would help. Incidentally, ultramysql is not dbapi compliant and won't 
work with sqlalchemy.

-kapil




Re: [Openstack] eventlet weirdness

2012-03-02 Thread Yun Mao
First I agree that having blocking DB calls is no big deal given the
way Nova uses mysql and reasonably powerful db server hardware.

However I'd like to point out that the math below is misleading (the
average time for the nonblocking case is also miscalculated but it's
not my point). The number that matters more in real life is
throughput. For the blocking case it's 3/30 = 0.1 request per second.
For the non-blocking case it's 3/27=0.11 requests per second. That
means if there is a request coming in every 9 seconds constantly, the
blocking system will eventually explode but the nonblocking system can
still handle it. Therefore, the non-blocking one should be preferred.
Thanks,

Yun

>
> For example in the API server (before we made it properly multi-threaded) 
> with blocking db calls the server was essentially a serial processing queue - 
> each request was fully processed before the next.  With non-blocking db calls 
> we got a lot more apparent concurrency, but only at the expense of making all 
> of the requests equally bad.
>
> Consider a request takes 10 seconds, where after 5 seconds there is a call to 
> the DB which takes 1 second, and three are started at the same time:
>
> Blocking:
> 0 - Request 1 starts
> 10 - Request 1 completes, request 2 starts
> 20 - Request 2 completes, request 3 starts
> 30 - Request 3 completes
> Request 1 completes in 10 seconds
> Request 2 completes in 20 seconds
> Request 3 completes in 30 seconds
> Ave time: 20 sec
>
>
> Non-blocking
> 0 - Request 1 Starts
> 5 - Request 1 gets to db call, request 2 starts
> 10 - Request 2 gets to db call, request 3 starts
> 15 - Request 3 gets to db call, request 1 resumes
> 19 - Request 1 completes, request 2 resumes
> 23 - Request 2 completes,  request 3 resumes
> 27 - Request 3 completes
>
> Request 1 completes in 19 seconds  (+ 9 seconds)
> Request 2 completes in 24 seconds (+ 4 seconds)
> Request 3 completes in 27 seconds (- 3 seconds)
> Ave time: 20 sec
>
> So instead of worrying about making db calls non-blocking we've been working 
> to make certain eventlets non-blocking - i.e. add sleep(0) calls to long 
> running iteration loops - which IMO has a much bigger impact on the apparent 
> latency of the system.
>
> Thanks for the explanation. Let me see if I understand this.



Re: [Openstack] eventlet weirdness

2012-03-02 Thread Vishvananda Ishaya

On Mar 2, 2012, at 2:11 PM, Duncan McGreggor wrote:

> On Fri, Mar 2, 2012 at 4:10 PM, Monsyne Dragon  wrote:
>> 
>> 
>> Has anyone thought about switching to gevent?   It's similar enough to 
>> eventlet that the port shouldn't be too bad, and because its event loop is 
>> in C (libevent), there are C mysql drivers (ultramysql) that will work with 
>> it without blocking.
> 
> We've been exploring this possibility at DreamHost, and chatted with
> some other stackers about it at various meat-space venues. Fwiw, it's
> something we'd be very interested in supporting (starting with as much
> test coverage as possible of eventlet's current use in OpenStack, to
> ensure as pain-free a transition as possible).
> 
> d

I would be for an experimental try at this.  Based on the experience of 
starting with twisted and moving to eventlet, I can almost guarantee that we 
will run into a new set of issues.  Concurrency is difficult no matter which 
method/library you use and each change brings a new set of challenges.

That said, gevent is similar enough to eventlet that I think we will at least 
be dealing with the same class of problems, so it might be less painful than 
moving to something totally different like threads, multiprocessing, or (back 
to) twisted. If there were significant performance benefits to switching, it 
would be worth exploring.

I wouldn't want to devote a huge amount of time to this unless we see a 
significant reason to switch, so hopefully Jay gets around to testing it out.

Vish


Re: [Openstack] eventlet weirdness

2012-03-02 Thread Vishvananda Ishaya

On Mar 2, 2012, at 12:50 PM, Jay Pipes wrote:
> 
> We are not using multiprocessing, no.
> 
> We simply start multiple worker processes listening on the same socket, with 
> each worker process having an eventlet greenthread pool.
> 
> You can see the code (taken from Swift and adapted by Chris Behrens and Brian 
> Waldon to use the object-oriented Server approach that Glance/Keystone/Nova 
> uses) here:
> 
> https://github.com/openstack/glance/blob/master/glance/common/wsgi.py
> 
> There is a worker = XXX configuration option that controls the number of 
> worker processes created on server startup. A worker value of 0 indicates to 
> run identically to the way Nova currently runs (one process with an eventlet 
> pool of greenthreads)

This would be excellent to add to nova as an option for performance reasons.  
Especially since you can fall back to the 0 version. I'm always concerned with 
mixing threading and eventlet as it leads to really odd bugs, but it sounds 
like HP has vetted it.  If we keep 0 as the default I don't see any reason why 
it couldn't be added.
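A minimal sketch of that multi-process, shared-socket pattern (simplified; not
the actual Glance/Swift code):

import os
import eventlet
from eventlet import wsgi

def app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'ok\n']

sock = eventlet.listen(('0.0.0.0', 8080))   # one listening socket, shared by all workers

workers = 4
for _ in range(workers):
    if os.fork() == 0:                      # child inherits the listening socket
        wsgi.server(sock, app)              # each worker runs its own greenthread pool
        os._exit(0)

for _ in range(workers):                    # parent just reaps the children
    os.wait()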

Vish


Re: [Openstack] eventlet weirdness

2012-03-02 Thread Duncan McGreggor
On Fri, Mar 2, 2012 at 4:10 PM, Monsyne Dragon  wrote:
>
> On Mar 2, 2012, at 9:17 AM, Jay Pipes wrote:
>
>> On 03/02/2012 05:34 AM, Day, Phil wrote:
>>> In our experience (running clusters of several hundred nodes) the DB 
>>> performance is not generally the significant factor, so making its calls 
>>> non-blocking  gives only a very small increase in processing capacity and 
>>> creates other side effects in terms of slowing all eventlets down as they 
>>> wait for their turn to run.
>>
>> Yes, I believe I said that this was the case at the last design summit -- or 
>> rather, I believe I said "is there any evidence that the database is a 
>> performance or scalability problem at all"?
>>
>>> That shouldn't really be surprising given that the Nova DB is pretty small 
>>> and MySQL is a pretty good DB - throw reasonable hardware at the DB server 
>>> and give it a bit of TLC from a DBA (remove deleted entries from the DB, 
>>> add indexes where the slow query log tells you to, etc) and it shouldn't be 
>>> the bottleneck in the system for performance or scalability.
>>
>> ++
>>
>>> We use the python driver and have experimented with allowing the eventlet 
>>> code to make the db calls non-blocking (its not the default setting), and 
>>> it works, but didn't give us any significant advantage.
>>
>> Yep, identical results to the work that Mark Washenberger did on the same 
>> subject.
>>
>
> Has anyone thought about switching to gevent?   It's similar enough to 
> eventlet that the port shouldn't be too bad, and because its event loop is 
> in C (libevent), there are C mysql drivers (ultramysql) that will work with 
> it without blocking.

We've been exploring this possibility at DreamHost, and chatted with
some other stackers about it at various meat-space venues. Fwiw, it's
something we'd be very interested in supporting (starting with as much
test coverage as possible of eventlet's current use in OpenStack, to
ensure as pain-free a transition as possible).

d



Re: [Openstack] eventlet weirdness

2012-03-02 Thread Joshua Harlow
Why has the ship sailed?
This is software we are talking about right, there is always a v2 (X-1)
;)

On 3/2/12 12:38 PM, "Caitlin Bestler"  wrote:

Duncan McGreggor wrote:


>Like so many things that are aesthetic in nature, the statement above is 
>misleading. Using a callback, event-based, deferred/promise oriented system is 
>hard for *some*. It is far, far easier for others (myself included).

>It's a matter of perception and personal preference.

I would also agree that coding your application as a series of responses to 
events can produce code that is easier to understand and debug.
And that would be a wonderful discussion if we were starting a new project.

But I hope that nobody is suggesting that we rewrite all of OpenStack code away 
from eventlet pseudo-threading after the fact.
Personally I think it was the wrong decision, but that ship has already sailed.

With event-response coding it is obvious that you have to partition any one 
response into segments that do not take so long to execute that they are 
blocking other events. That remains true when you hide your event-driven model 
with eventlet pseudo-threading.
Inserting sleep(0) calls is the most obvious way to break up an overly event 
handler, given that you've already decided to obfuscate the code to pretend 
that it is a thread.





Re: [Openstack] eventlet weirdness

2012-03-02 Thread Joshua Harlow
It could be over-complicated (i.e. it's just an example), but it's a design that 
lets the programmer think in terms of what tasks need to be accomplished and how 
to order those tasks, and not have to think about how those tasks are actually 
run (or hopefully even what concurrency occurs). Ideally there should be no 
concurrency in each step, that's the whole point of having individual steps :-) 
A step itself shouldn't be concurrent, but the overall "action" should/could be, 
and you leave it up to the "engine" to decide how to run that set of steps. *Just 
my thought*...

On 3/2/12 11:38 AM, "Day, Phil"  wrote:

That sounds a bit over complicated to me - Having a string of tasks sounds like 
you still have to think about what the concurrency is within each step.

There is already a good abstraction around the context of each operation - they 
just (I know - big just) need to be running in something that maps to kernel 
threads rather than user space ones.

All I really want is to allow more than one action to run at the same time.  
So if I have two requests to create a snapshot, why can't they both run at the 
same time and still allow other things to happen? I have all these cores 
sitting in my compute node that could be used, but I'm still having 
to think like a punch-card programmer submitting batch jobs to the mainframe ;-)

Right now creating snapshots is pretty close to a DoS attack on a compute node.



From: Joshua Harlow [mailto:harlo...@yahoo-inc.com]
Sent: 02 March 2012 19:23
To: Day, Phil; Chris Behrens
Cc: openstack
Subject: Re: [Openstack] eventlet weirdness

So a thought I had was: say the design of a component forces, as part of 
its design, the ability to be run with threads or with eventlet or with 
processes.

Say you break everything up into tasks (where a task would produce some 
output/result/side-effect).
A set of tasks could complete some action (e.g., create a vm).
Subtasks could be the following:
0. Validate credentials
1. Get the image
2. Call into libvirt
3. ...

These "tasks", if constructed in a way that makes them stateless, could then 
be chained together to form an action, and that action could be given, say, 
to a threaded "engine" that would know how to execute those tasks with 
threads, or it could be given to an eventlet "engine" that would do the same 
with an eventlet pool/greenthreads/coroutines, or with processes (and so on). This 
could be one way the design of your code abstracts that kind of execution 
(where eventlet is abstracted away from the actual work being done, instead of 
popping up in calls to sleep(0), i.e. the leaky abstraction).
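A toy sketch of what that task/engine split might look like (all names are made
up, and Python 3's concurrent.futures and eventlet are used only as examples of
two interchangeable "engines"):

class Task(object):
    def run(self, context):
        raise NotImplementedError

class ValidateCredentials(Task):
    def run(self, context):
        context['validated'] = True
        return context

class FetchImage(Task):
    def run(self, context):
        context['image'] = 'image-ref'
        return context

def run_with_threads(tasks, context):
    # one possible engine: each step runs in an OS thread, so a blocking
    # step doesn't stall other actions in the process
    from concurrent.futures import ThreadPoolExecutor
    with ThreadPoolExecutor(max_workers=1) as pool:
        for task in tasks:
            context = pool.submit(task.run, context).result()
    return context

def run_with_eventlet(tasks, context):
    # another engine: run the whole chain in a greenthread
    import eventlet

    def _chain():
        ctx = context
        for task in tasks:
            ctx = task.run(ctx)
        return ctx

    return eventlet.spawn(_chain).wait()

action = [ValidateCredentials(), FetchImage()]
run_with_threads(action, {})      # or: run_with_eventlet(action, {})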

On 3/2/12 11:08 AM, "Day, Phil"  wrote:
I didn't say it was pretty - Given the choice I'd much rather have a threading 
model that really did concurrency and pre-emption in all the right places, and it 
would be really cool if something managed the threads that were started so that 
if a second conflicting request was received it did some proper tidy-up or 
blocking rather than just leaving the race condition to work itself out (then 
we wouldn't have to try and control it by checking vm_state).

However ...   In the current code base where we only have user space based 
eventlets, with no pre-emption, and some activities that need to be prioritised 
then forcing pre-emption with a sleep(0) seems a pretty small bit of untidiness.   
And it works now without a major code refactor.

Always open to other approaches ...

Phil


-Original Message-
From: openstack-bounces+philip.day=hp@lists.launchpad.net 
[mailto:openstack-bounces+philip.day=hp@lists.launchpad.net] On Behalf Of 
Chris Behrens
Sent: 02 March 2012 19:00
To: Joshua Harlow
Cc: openstack; Chris Behrens
Subject: Re: [Openstack] eventlet weirdness

It's not just you


On Mar 2, 2012, at 10:35 AM, Joshua Harlow wrote:

> Does anyone else feel that the following seems really "dirty", or is it just 
> me.
>
> "adding a few sleep(0) calls in various places in the Nova codebase
> (as was recently added in the _sync_power_states() periodic task) is
> an easy and simple win with pretty much no ill side-effects. :)"
>
> Dirty in that it feels like there is something wrong from a design point of 
> view.
> Sprinkling "sleep(0)" seems like it's a band-aid on a larger problem imho.
> But that's just my gut feeling.
>
> :-(
>
> On 3/2/12 8:26 AM, "Armando Migliaccio"  
> wrote:
>
> I knew you'd say that :P
>
> There you go: https://bugs.launchpad.net/nova/+bug/944145
>
> Cheers,
> Armando
>
> > -Original Message-
> > From: Jay Pipes [mailto:jaypi...@gmail.com]
> > Sent: 02 March 2012 16:22
> > To: Armando Migliaccio
> > Cc: openstack@lists.launchpad.net
> > Subject: Re: [Openstack] 

Re: [Openstack] eventlet weirdness

2012-03-02 Thread Jay Pipes

On 03/02/2012 04:10 PM, Monsyne Dragon wrote:

Has anyone thought about switching to gevent?   It's similar enough to eventlet 
that the port shouldn't be too bad, and because its event loop is in C 
(libevent), there are C mysql drivers (ultramysql) that will work with it 
without blocking.


Yep, I've thought about doing an experimental branch in Glance to see if 
there's a decent performance benefit. Just got stymied by that damn 24 
hour limit in a day :(


Damn ratelimiting.

-jay



Re: [Openstack] eventlet weirdness

2012-03-02 Thread Monsyne Dragon

On Mar 2, 2012, at 9:17 AM, Jay Pipes wrote:

> On 03/02/2012 05:34 AM, Day, Phil wrote:
>> In our experience (running clusters of several hundred nodes) the DB 
>> performance is not generally the significant factor, so making its calls 
>> non-blocking  gives only a very small increase in processing capacity and 
>> creates other side effects in terms of slowing all eventlets down as they 
>> wait for their turn to run.
> 
> Yes, I believe I said that this was the case at the last design summit -- or 
> rather, I believe I said "is there any evidence that the database is a 
> performance or scalability problem at all"?
> 
>> That shouldn't really be surprising given that the Nova DB is pretty small 
>> and MySQL is a pretty good DB - throw reasonable hardware at the DB server 
>> and give it a bit of TLC from a DBA (remove deleted entries from the DB, add 
>> indexes where the slow query log tells you to, etc) and it shouldn't be the 
>> bottleneck in the system for performance or scalability.
> 
> ++
> 
>> We use the python driver and have experimented with allowing the eventlet 
>> code to make the db calls non-blocking (its not the default setting), and it 
>> works, but didn't give us any significant advantage.
> 
> Yep, identical results to the work that Mark Washenberger did on the same 
> subject.
> 

Has anyone thought about switching to gevent?   It's similar enough to eventlet 
that the port shouldn't be too bad, and because its event loop is in C 
(libevent), there are C mysql drivers (ultramysql) that will work with it 
without blocking.   



>> For example in the API server (before we made it properly multi-threaded)
> 
> By "properly multi-threaded" are you instead referring to making the nova-api 
> server multi-*processed* with eventlet greenthread pools in each process? 
> i.e. The way Swift (and now Glance) works? Or are you referring to a 
> different approach entirely?
> 
> > with blocking db calls the server was essentially a serial processing queue 
> > - each request was fully processed before the next.  With non-blocking db 
> > calls we got a lot more apparent concurrency, but only at the expense of 
> > making all of the requests equally bad.
> 
> Yep, not surprising.
> 
>> Consider a request takes 10 seconds, where after 5 seconds there is a call 
>> to the DB which takes 1 second, and three are started at the same time:
>> 
>> Blocking:
>> 0 - Request 1 starts
>> 10 - Request 1 completes, request 2 starts
>> 20 - Request 2 completes, request 3 starts
>> 30 - Request 3 completes
>> Request 1 completes in 10 seconds
>> Request 2 completes in 20 seconds
>> Request 3 completes in 30 seconds
>> Ave time: 20 sec
>> 
>> Non-blocking
>> 0 - Request 1 Starts
>> 5 - Request 1 gets to db call, request 2 starts
>> 10 - Request 2 gets to db call, request 3 starts
>> 15 - Request 3 gets to db call, request 1 resumes
>> 19 - Request 1 completes, request 2 resumes
>> 23 - Request 2 completes,  request 3 resumes
>> 27 - Request 3 completes
>> 
>> Request 1 completes in 19 seconds  (+ 9 seconds)
>> Request 2 completes in 23 seconds (+ 3 seconds)
>> Request 3 completes in 27 seconds (- 3 seconds)
>> Ave time: 23 sec
>> 
>> So instead of worrying about making db calls non-blocking we've been working 
>> to make certain eventlets non-blocking - i.e. add sleep(0) calls to 
>> long-running iteration loops - which IMO has a much bigger impact on the 
>> apparent latency of the system.
> 
> Yep, and I think adding a few sleep(0) calls in various places in the Nova 
> codebase (as was recently added in the _sync_power_states() periodic task) is 
> an easy and simple win with pretty much no ill side-effects. :)
> 
> Curious... do you have a list of all the places where sleep(0) calls were 
> inserted in the HP Nova code? I can turn that into a bug report and get to 
> work on adding them...
> 
> All the best,
> -jay
> 
>> Phil
>> 
>> 
>> 
>> -Original Message-
>> From: openstack-bounces+philip.day=hp@lists.launchpad.net 
>> [mailto:openstack-bounces+philip.day=hp@lists.launchpad.net] On Behalf 
>> Of Brian Lamar
>> Sent: 01 March 2012 21:31
>> To: openstack@lists.launchpad.net
>> Subject: Re: [Openstack] eventlet weirdness
>> 
>>>> How is MySQL access handled in eventlet? Presumably it's external C
>>>> library so it's not going to be monkey patched. Does that make every
>>>> db access call a blocking call?

Re: [Openstack] eventlet weirdness

2012-03-02 Thread Jay Pipes

On 03/02/2012 03:38 PM, Caitlin Bestler wrote:

Duncan McGreggor wrote:

Like so many things that are aesthetic in nature, the statement above is 
misleading. Using a callback, event-based, deferred/promise oriented system is 
hard for *some*. It is far, far easier for others (myself included).



It's a matter of perception and personal preference.


I would also agree that coding your application as a series of responses to 
events can produce code that is easier to understand and debug.
And that would be a wonderful discussion if we were starting a new project.

But I hope that nobody is suggesting that we rewrite all of OpenStack code away 
from eventlet pseudo-threading after the fact.
Personally I think it was the wrong decision, but that ship has already sailed.


Yep, that ship has sailed more than 12 months ago.


With event-response coding it is obvious that you have to partition any one 
response into segments that do not take so long to execute that they are 
blocking other events. That remains true when you hide your event-driven model 
with eventlet pseudo-threading.
Inserting sleep(0) calls is the most obvious way to break up an overly event 
handler, given that you've already decided to obfuscate the code to pretend 
that it is a thread.


I assume you meant "an overly greedy event handler" above?

-jay

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] eventlet weirdness

2012-03-02 Thread Jay Pipes

On 03/02/2012 01:35 PM, Joshua Harlow wrote:

Does anyone else feel that the following seems really “dirty”, or is it
just me.

“adding a few sleep(0) calls in various places in the
Nova codebase (as was recently added in the _sync_power_states()
periodic task) is an easy and simple win with pretty much no ill
side-effects. :)”

Dirty in that it feels like there is something wrong from a design point
of view.
Sprinkling “sleep(0)” seems like its a band-aid on a larger problem imho.
But that’s just my gut feeling.


It's not really all that dirty, IMHO. You just have to think of 
greenthread.sleep(0) as manually yielding control back to eventlet...


Like Phil said, in the absence of a non-userspace threading model and 
thread scheduler, there's not a whole lot else one can do other than be 
mindful of what functions/methods may run for long periods of time 
and/or block I/O and call sleep(0) in those scenarios where it makes 
sense to yield a timeslice back to other greenthreads.
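
Concretely, the pattern is just the following (the helper names are made up; the point is the explicit yield inside a loop that otherwise never touches a file descriptor):

from eventlet import greenthread

def sync_power_states(instances):
    for instance in instances:
        check_power_state(instance)   # potentially slow work that never yields on its own
        greenthread.sleep(0)          # give other greenthreads a turn every iteration

def check_power_state(instance):
    # stand-in for the real per-instance check
    return instance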


While it's true that eventlet (and to an extent Twisted) mask some of 
the complexities involved in non-blocking I/O in a threaded(-like) 
application programming model, I don't think there will be an 
eventlet-that-knows-what-methods-should-yield-and-which-should-be-prioritized 
library any time soon.


-jay

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] eventlet weirdness

2012-03-02 Thread Jay Pipes

On 03/02/2012 02:27 PM, Vishvananda Ishaya wrote:


On Mar 2, 2012, at 7:54 AM, Day, Phil wrote:


By "properly multi-threaded" are you instead referring to making the nova-api 
server multi-*processed* with eventlet greenthread pools in each process? i.e. The way 
Swift (and now Glance) works? Or are you referring to a different approach entirely?


Yep - following your posting in here pointing to the glance changes we 
back-ported that into the Diablo API server.   We're now running each API 
server with 20 OS processes and 20 EC2 processes, and the world looks a lot 
happier.  The same changes were being done in parallel into Essex by someone in 
the community I thought ?


Can you or jay write up what this would entail in nova?  (or even ship a diff) 
Are you using multiprocessing? In general we have had issues combining 
multiprocessing and eventlet, so in our deploys we run multiple api servers on 
different ports and load balance with ha proxy. It sounds like what you have is 
working though, so it would be nice to put it in (perhaps with a flag gate) if 
possible.


We are not using multiprocessing, no.

We simply start multiple worker processes listening on the same socket, 
with each worker process having an eventlet greenthread pool.


You can see the code (taken from Swift and adapted by Chris Behrens and 
Brian Waldon to use the object-oriented Server approach that 
Glance/Keystone/Nova uses) here:


https://github.com/openstack/glance/blob/master/glance/common/wsgi.py

There is a worker = XXX configuration option that controls the number of 
worker processes created on server startup. A worker value of 0 means it runs 
identically to the way Nova currently runs (one process with an eventlet pool 
of greenthreads).
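
In very rough terms the pattern is the one below (signal handling, the 
workers = 0 path and all error handling are omitted, and the port and app are 
placeholders):

import os
import eventlet
import eventlet.wsgi

def app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return ['ok\n']

sock = eventlet.listen(('0.0.0.0', 9292))   # parent binds the socket once

for _ in range(4):                          # think "workers = 4"
    if os.fork() == 0:                      # each child inherits the listening socket
        eventlet.wsgi.server(sock, app)     # and runs its own greenthread pool
        os._exit(0)

for _ in range(4):
    os.wait()                               # parent just sits on the children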


Best,
-jay

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] eventlet weirdness

2012-03-02 Thread Johannes Erdfelt
On Fri, Mar 02, 2012, Duncan McGreggor  wrote:
> On Fri, Mar 2, 2012 at 2:40 PM, Johannes Erdfelt  wrote:
> > Twisted has a much harder programming model with the same blocking
> > problem that eventlet has.
> 
> Like so many things that are aesthetic in nature, the statement above
> is misleading. Using a callback, event-based, deferred/promise
> oriented system is hard for *some*. It is far, far easier for others
> (myself included).
> 
> It's a matter of perception and personal preference.
>
> It may be apropos to mention that Guido van Rossum himself has stated
> that he shares the same view of concurrent programming in Python as
> Glyph (the founder of Twisted):
>   https://plus.google.com/115212051037621986145/posts/a9SqS7faVWC
> 
> Glyph's post, if you can't see that G+ link:
>   
> http://glyph.twistedmatrix.com/2012/01/concurrency-spectrum-from-callbacks-to.html
> 
> One thing to keep in mind is that with Twisted, you always have the
> option of deferring to a thread for operations that are not async-friendly.

It's a shame that post chooses to ignore eventlet-style concurrency. It
has all of the benefits of being almost as clear where concurrency can
occur without needing a macro key to constantly output 'yield'.

It also integrates with other python libraries better (but obviously not
perfectly).

Using coroutines for concurrency is anti-social programming. It excludes
a whole suite of libraries merely because they didn't conform to your
programming model.

However, this is the wrong discussion to be having. Concurrency isn't
the problem we should be worried about, it's isolation. If we can
sufficiently isolate the work that each daemon needs to do, then
concurrency is trivial. In the best case, they can be separate processes
and we don't need to worry about a programming model. If we're not being
too optimistic then threads with minimal locking is most likely.

JE


___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] eventlet weirdness

2012-03-02 Thread Duncan McGreggor
On Fri, Mar 2, 2012 at 3:38 PM, Caitlin Bestler
 wrote:
> Duncan McGreggor wrote:
>
>>Like so many things that are aesthetic in nature, the statement above is 
>>misleading. Using a callback, event-based, deferred/promise oriented system 
>>is hard for *some*. It is far, far easier for others (myself included).
>
>>It's a matter of perception and personal preference.
>
> I would also agree that coding your application as a series of responses to 
> events can produce code that is easier to understand and debug.
> And that would be a wonderful discussion if we were starting a new project.
>
> But I hope that nobody is suggesting that we rewrite all of OpenStack code 
> away from eventlet pseudo-threading after the fact.
> Personally I think it was the wrong decision, but that ship has already 
> sailed.

Agreed.

d

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] eventlet weirdness

2012-03-02 Thread Caitlin Bestler
Duncan McGreggor wrote:


>Like so many things that are aesthetic in nature, the statement above is 
>misleading. Using a callback, event-based, deferred/promise oriented system is 
>hard for *some*. It is far, far easier for others (myself included).

>It's a matter of perception and personal preference.

I would also agree that coding your application as a series of responses to 
events can produce code that is easier to understand and debug.
And that would be a wonderful discussion if we were starting a new project.

But I hope that nobody is suggesting that we rewrite all of OpenStack code away 
from eventlet pseudo-threading after the fact.
Personally I think it was the wrong decision, but that ship has already sailed.

With event-response coding it is obvious that you have to partition any one 
response into segments that do not take so long to execute that they are 
blocking other events. That remains true when you hide your event-driven model 
with eventlet pseudo-threading.
Inserting sleep(0) calls is the most obvious way to break up an overly event 
handler, given that you've already decided to obfuscate the code to pretend 
that it is a thread.



___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] eventlet weirdness

2012-03-02 Thread Yun Mao
Hi Phil, I'm a little confused. To what extent does sleep(0) help?

It only gives the greenlet scheduler a chance to switch to another
green thread. If we are having a CPU bound issue, sleep(0) won't give
us access to any more CPU cores. So the total time to finish should be
the same no matter what. It may improve the fairness among different
green threads but shouldn't help the throughput. I think the only
apparent gain is the situation where there is 1 green thread
with long CPU time and many other green threads with small CPU time.
The total finish time will be the same with or without sleep(0), but
with sleep in the first thread, the others should be much more
responsive.
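
A toy example of that fairness effect (everything here is invented, but it shows the idea):

import time
import eventlet

def crunch(i):
    return i * i                      # stand-in for CPU-bound work

def long_task():
    for i in xrange(200000):
        crunch(i)
        # without this explicit yield, the short tasks below would all have
        # to wait for the entire loop to finish before they run at all
        if i % 1000 == 0:
            eventlet.sleep(0)

def short_task(n):
    print "short task %d done after %.2fs" % (n, time.time() - start)

start = time.time()
pool = eventlet.GreenPool()
pool.spawn(long_task)
for n in range(5):
    pool.spawn(short_task, n)
pool.waitall()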

However, it's unclear to me which part of Nova is very CPU intensive.
It seems that most work here is IO bound, including the snapshot. Do
we have other blocking calls besides mysql access? I feel like I'm
missing something but couldn't figure out what.

Thanks,

Yun


On Fri, Mar 2, 2012 at 2:08 PM, Day, Phil  wrote:
> I didn't say it was pretty - Given the choice I'd much rather have a 
> threading model that really did concurrency and pre-emption in all the right 
> places, and it would be really cool if something managed the threads that 
> were started so that if a second conflicting request was received it did some 
> proper tidy-up or blocking rather than just leaving the race condition to 
> work itself out (then we wouldn't have to try and control it by checking 
> vm_state).
>
> However ...   In the current code base where we only have user-space-based 
> eventlets, with no pre-emption, and some activities that need to be 
> prioritised, then forcing pre-emption with a sleep(0) seems a pretty small 
> bit of untidiness.   And it works now without a major code refactor.
>
> Always open to other approaches ...
>
> Phil
>
>
> -Original Message-
> From: openstack-bounces+philip.day=hp@lists.launchpad.net 
> [mailto:openstack-bounces+philip.day=hp@lists.launchpad.net] On Behalf Of 
> Chris Behrens
> Sent: 02 March 2012 19:00
> To: Joshua Harlow
> Cc: openstack; Chris Behrens
> Subject: Re: [Openstack] eventlet weirdness
>
> It's not just you
>
>
> On Mar 2, 2012, at 10:35 AM, Joshua Harlow wrote:
>
>> Does anyone else feel that the following seems really "dirty", or is it just 
>> me.
>>
>> "adding a few sleep(0) calls in various places in the Nova codebase
>> (as was recently added in the _sync_power_states() periodic task) is
>> an easy and simple win with pretty much no ill side-effects. :)"
>>
>> Dirty in that it feels like there is something wrong from a design point of 
>> view.
>> Sprinkling "sleep(0)" seems like its a band-aid on a larger problem imho.
>> But that's just my gut feeling.
>>
>> :-(
>>
>> On 3/2/12 8:26 AM, "Armando Migliaccio"  
>> wrote:
>>
>> I knew you'd say that :P
>>
>> There you go: https://bugs.launchpad.net/nova/+bug/944145
>>
>> Cheers,
>> Armando
>>
>> > -Original Message-
>> > From: Jay Pipes [mailto:jaypi...@gmail.com]
>> > Sent: 02 March 2012 16:22
>> > To: Armando Migliaccio
>> > Cc: openstack@lists.launchpad.net
>> > Subject: Re: [Openstack] eventlet weirdness
>> >
>> > On 03/02/2012 10:52 AM, Armando Migliaccio wrote:
>> > > I'd be cautious to say that no ill side-effects were introduced. I
>> > > found a
>> > race condition right in the middle of sync_power_states, which I
>> > assume was exposed by "breaking" the task deliberately.
>> >
>> > Such a party-pooper! ;)
>> >
>> > Got a link to the bug report for me?
>> >
>> > Thanks!
>> > -jay
>>
>> ___
>> Mailing list: https://launchpad.net/~openstack
>> Post to     : openstack@lists.launchpad.net
>> Unsubscribe : https://launchpad.net/~openstack
>> More help   : https://help.launchpad.net/ListHelp
>>
>> ___
>> Mailing list: https://launchpad.net/~openstack
>> Post to     : openstack@lists.launchpad.net
>> Unsubscribe : https://launchpad.net/~openstack
>> More help   : https://help.launchpad.net/ListHelp
>
>
> ___
> Mailing list: https://launchpad.net/~openstack
> Post to     : openstack@lists.launchpad.net
> Unsubscribe : https://launchpad.net/~openstack
> More help   : https://help.launchpad.net/ListHelp
>
> ___
> Mailing list: https://launchpad.net/~openstack
> Post to     : openstack@lists.launchpad.net
> Unsubscribe : https://launchpad.net/~openstack
> More help   : https://help.launchpad.net/ListHelp

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] eventlet weirdness

2012-03-02 Thread Duncan McGreggor
On Fri, Mar 2, 2012 at 2:40 PM, Johannes Erdfelt  wrote:
> On Fri, Mar 02, 2012, Armando Migliaccio  
> wrote:
>> I agree, but then the whole assumption of adopting eventlet to simplify
>> the programming model is hindered by the fact that one has to think
>> harder to what is doing...Nova could've kept Twisted for that matter.
>> The programming model would have been harder, but at least it would
>> have been cleaner and free from icky patching (that's my own opinion
>> anyway).
>
> Twisted has a much harder programming model with the same blocking
> problem that eventlet has.

Like so many things that are aesthetic in nature, the statement above
is misleading. Using a callback, event-based, deferred/promise
oriented system is hard for *some*. It is far, far easier for others
(myself included).

It's a matter of perception and personal preference.

It may be apropos to mention that Guido van Rossum himself has stated
that he shares the same view of concurrent programming in Python as
Glyph (the founder of Twisted):
  https://plus.google.com/115212051037621986145/posts/a9SqS7faVWC

Glyph's post, if you can't see that G+ link:
  
http://glyph.twistedmatrix.com/2012/01/concurrency-spectrum-from-callbacks-to.html

One thing to keep in mind is that with Twisted, you always have the
option of deferring to a thread for operations that are not async-friendly.
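
e.g. something along these lines (the snapshot function and its argument are invented for the example):

from twisted.internet import reactor
from twisted.internet.threads import deferToThread

def blocking_snapshot(disk_path):
    # imagine a long, blocking qemu-img / libvirt style call in here
    return disk_path

def done(result):
    reactor.stop()

d = deferToThread(blocking_snapshot, '/var/lib/nova/instances/disk')
d.addCallback(done)       # the reactor keeps servicing other events meanwhile
reactor.run()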

d

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] eventlet weirdness

2012-03-02 Thread Eric Windisch
> 
> I agree, but then the whole assumption of adopting eventlet to simplify the 
> programming model is hindered by the fact that one has to think harder to 
> what is doing...Nova could've kept Twisted for that matter. The programming 
> model would have been harder, but at least it would have been cleaner and 
> free from icky patching (that's my own opinion anyway).

Then the assumption is wrong. You need to write with the premise of working 
with Eventlet. For me, eventlet has complicated the programming model by 
forcing me to a specific pattern, although I must admit this has largely been 
due to my use of a C library (libzmq).

-- 
Eric Windisch 


___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] eventlet weirdness

2012-03-02 Thread Day, Phil
Ok - I'll work with Jay on that.



-Original Message-
From: Vishvananda Ishaya [mailto:vishvana...@gmail.com] 
Sent: 02 March 2012 19:27
To: Day, Phil
Cc: Jay Pipes; openstack@lists.launchpad.net
Subject: Re: [Openstack] eventlet weirdness


On Mar 2, 2012, at 7:54 AM, Day, Phil wrote:

>> By "properly multi-threaded" are you instead referring to making the 
>> nova-api server multi-*processed* with eventlet greenthread pools in each 
>> process? i.e. The way Swift (and now Glance) works? Or are you referring to 
>> a different approach entirely?
> 
> Yep - following your posting in here pointing to the glance changes we 
> back-ported that into the Diablo API server.   We're now running each API 
> server with 20 OS processes and 20 EC2 processes, and the world looks a lot 
> happier.  The same changes were being done in parallel into Essex by someone 
> in the community I thought ?

Can you or jay write up what this would entail in nova?  (or even ship a diff) 
Are you using multiprocessing? In general we have had issues combining 
multiprocessing and eventlet, so in our deploys we run multiple api servers on 
different ports and load balance with ha proxy. It sounds like what you have is 
working though, so it would be nice to put it in (perhaps with a flag gate) if 
possible.
> 
>> Curious... do you have a list of all the places where sleep(0) calls were 
>> inserted in the HP Nova code? I can turn that into a bug report and get to 
>> work on adding them... 
> 
> So far the only two cases we've done this are in the _sync_power_state and  
> in the security group refresh handling 
> (libvirt/firewall/do_refresh_security_group_rules) - which we modified to 
> only refresh for instances in the group and added a sleep in the loop (I need 
> to finish writing the bug report for this one).

Please do this ASAP, I would like to get that fix in.

Vish


___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] eventlet weirdness

2012-03-02 Thread Johannes Erdfelt
On Fri, Mar 02, 2012, Armando Migliaccio  
wrote:
> I agree, but then the whole assumption of adopting eventlet to simplify
> the programming model is hindered by the fact that one has to think
> harder to what is doing...Nova could've kept Twisted for that matter.
> The programming model would have been harder, but at least it would
> have been cleaner and free from icky patching (that's my own opinion
> anyway).

Twisted has a much harder programming model with the same blocking
problem that eventlet has.

> Yes. There is a fine balance to be struck here: do you let potential
> races appear in your system and deal with them on a case-by-case basis,
> or do you introduce mutexes and deal with potential inefficiency
> and/or deadlocks? I'd rather go with the former here.

Neither of these options are acceptable IMO.

If we want to minimize the number of bugs, we should make the task as
easy as possible on the programmer. Constantly trying to track
multiple threads of execution and what possible races that can happen
and what locking is required will end up with more bugs in the long run.

I'd prioritize correctness over performance. It's easier to optimize when
you're sure the code is correct than the other way around.

I'd like to see a move towards more serialization of actions. For
instance, if all operations on an instance are serialized, then there
are no opportunities to race against other operations on the same
instance.
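
As a very small sketch of what "all operations on an instance are serialized"
could mean in eventlet terms (this is not how Nova does it today, just an
illustration):

import collections
import eventlet.semaphore

# one semaphore per instance uuid; greenthreads queue up behind it
_instance_locks = collections.defaultdict(eventlet.semaphore.Semaphore)

def run_serialized(instance_uuid, action, *args, **kwargs):
    with _instance_locks[instance_uuid]:
        return action(*args, **kwargs)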

We can loosen the restrictions when we've identified bottlenecks and
we're sure it's safe to do so.

I'm sure we'll find out that performance is still very good.

JE


___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] eventlet weirdness

2012-03-02 Thread Day, Phil
That sounds a bit over-complicated to me - Having a string of tasks sounds like 
you still have to think about what the concurrency is within each step.

There is already a good abstraction around the context of each operation - they 
just (I know - big just) need to be running in something that maps to kernel 
threads rather than user space ones.

All I really want is to allow more than one action to run at the same time.  
So if I have two requests to create a snapshot, why can't they both run at the 
same time and still allow other things to happen? I have all these cores 
sitting in my compute node that could be used, but I'm still having to think 
like a punch-card programmer submitting batch jobs to the mainframe ;-)

Right now creating snapshots is pretty close to a DoS attack on a compute node.


From: Joshua Harlow [mailto:harlo...@yahoo-inc.com]
Sent: 02 March 2012 19:23
To: Day, Phil; Chris Behrens
Cc: openstack
Subject: Re: [Openstack] eventlet weirdness

So a thought I had: say the design of a component forces, as part of its 
design, the ability to be run with threads or with eventlet or with processes.

Say you break everything up into tasks (where a task would produce some 
output/result/side-effect).
A set of tasks could complete some action (i.e., create a VM).
Subtasks could be the following:
0. Validate credentials
1. Get the image
2. Call into libvirt
3. ...

These "tasks", if constructed in a way that makes them stateless, could then be 
chained together to form an action; that action could be given to a threaded 
"engine" that would know how to execute those tasks with threads, or it could 
be given to an eventlet "engine" that would do the same with an eventlet 
pool/greenthreads/coroutines, or with processes (and so on). This could be one 
way the design of your code abstracts that kind of execution (where eventlet is 
abstracted away from the actual work being done, instead of popping up in calls 
to sleep(0), i.e. the leaky abstraction).

On 3/2/12 11:08 AM, "Day, Phil"  wrote:
I didn't say it was pretty - Given the choice I'd much rather have a threading 
model that really did concurrency and pre-emption in all the right places, and 
it would be really cool if something managed the threads that were started so 
that if a second conflicting request was received it did some proper tidy-up or 
blocking rather than just leaving the race condition to work itself out (then 
we wouldn't have to try and control it by checking vm_state).

However ...   In the current code base where we only have user-space-based 
eventlets, with no pre-emption, and some activities that need to be prioritised, 
then forcing pre-emption with a sleep(0) seems a pretty small bit of untidiness.   
And it works now without a major code refactor.

Always open to other approaches ...

Phil


-Original Message-
From: openstack-bounces+philip.day=hp@lists.launchpad.net 
[mailto:openstack-bounces+philip.day=hp@lists.launchpad.net] On Behalf Of 
Chris Behrens
Sent: 02 March 2012 19:00
To: Joshua Harlow
Cc: openstack; Chris Behrens
Subject: Re: [Openstack] eventlet weirdness

It's not just you


On Mar 2, 2012, at 10:35 AM, Joshua Harlow wrote:

> Does anyone else feel that the following seems really "dirty", or is it just 
> me.
>
> "adding a few sleep(0) calls in various places in the Nova codebase
> (as was recently added in the _sync_power_states() periodic task) is
> an easy and simple win with pretty much no ill side-effects. :)"
>
> Dirty in that it feels like there is something wrong from a design point of 
> view.
> Sprinkling "sleep(0)" seems like its a band-aid on a larger problem imho.
> But that's just my gut feeling.
>
> :-(
>
> On 3/2/12 8:26 AM, "Armando Migliaccio"  
> wrote:
>
> I knew you'd say that :P
>
> There you go: https://bugs.launchpad.net/nova/+bug/944145
>
> Cheers,
> Armando
>
> > -Original Message-
> > From: Jay Pipes [mailto:jaypi...@gmail.com]
> > Sent: 02 March 2012 16:22
> > To: Armando Migliaccio
> > Cc: openstack@lists.launchpad.net
> > Subject: Re: [Openstack] eventlet weirdness
> >
> > On 03/02/2012 10:52 AM, Armando Migliaccio wrote:
> > > I'd be cautious to say that no ill side-effects were introduced. I
> > > found a
> > race condition right in the middle of sync_power_states, which I
> > assume was exposed by "breaking" the task deliberately.
> >
> > Such a party-pooper! ;)
> >
> > Got a link to the bug report for me?
> >
> > Thanks!
> > -jay
>
> ___
> Mailing list: https://launchpad.net/~openstack
> Post to : openstack

Re: [Openstack] eventlet weirdness

2012-03-02 Thread Vishvananda Ishaya

On Mar 2, 2012, at 7:54 AM, Day, Phil wrote:

>> By "properly multi-threaded" are you instead referring to making the 
>> nova-api server multi-*processed* with eventlet greenthread pools in each 
>> process? i.e. The way Swift (and now Glance) works? Or are you referring to 
>> a different approach entirely?
> 
> Yep - following your posting in here pointing to the glance changes we 
> back-ported that into the Diablo API server.   We're now running each API 
> server with 20 OS processes and 20 EC2 processes, and the world looks a lot 
> happier.  The same changes were being done in parallel into Essex by someone 
> in the community I thought ?

Can you or jay write up what this would entail in nova?  (or even ship a diff) 
Are you using multiprocessing? In general we have had issues combining 
multiprocessing and eventlet, so in our deploys we run multiple api servers on 
different ports and load balance with ha proxy. It sounds like what you have is 
working though, so it would be nice to put it in (perhaps with a flag gate) if 
possible.
> 
>> Curious... do you have a list of all the places where sleep(0) calls were 
>> inserted in the HP Nova code? I can turn that into a bug report and get to 
>> work on adding them... 
> 
> So far the only two cases we've done this are in the _sync_power_state and  
> in the security group refresh handling 
> (libvirt/firewall/do_refresh_security_group_rules) - which we modified to 
> only refresh for instances in the group and added a sleep in the loop (I need 
> to finish writing the bug report for this one).

Please do this ASAP, I would like to get that fix in.

Vish


___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] eventlet weirdness

2012-03-02 Thread Andy Smith
On Fri, Mar 2, 2012 at 10:35 AM, Joshua Harlow wrote:

>  Does anyone else feel that the following seems really “dirty”, or is it
> just me.
>

Any feeling of dirtiness is just due to it being called "sleep"; all you
are doing is yielding control to allow another co-routine to schedule
itself. Blocking code is still blocking code: you have to give it some
break points if you are going to run a loop that waits on something else.



>
> “adding a few sleep(0) calls in various places in the
>
> Nova codebase (as was recently added in the _sync_power_states()
> periodic task) is an easy and simple win with pretty much no ill
> side-effects. :)”
>
> Dirty in that it feels like there is something wrong from a design point
> of view.
> Sprinkling “sleep(0)” seems like its a band-aid on a larger problem imho.
> But that’s just my gut feeling.
>
> *:-(
> *
>
> On 3/2/12 8:26 AM, "Armando Migliaccio" 
> wrote:
>
> I knew you'd say that :P
>
> There you go: https://bugs.launchpad.net/nova/+bug/944145
>
> Cheers,
> Armando
>
> > -Original Message-
> > From: Jay Pipes [mailto:jaypi...@gmail.com ]
> > Sent: 02 March 2012 16:22
> > To: Armando Migliaccio
> > Cc: openstack@lists.launchpad.net
> > Subject: Re: [Openstack] eventlet weirdness
> >
> > On 03/02/2012 10:52 AM, Armando Migliaccio wrote:
> > > I'd be cautious to say that no ill side-effects were introduced. I
> found a
> > race condition right in the middle of sync_power_states, which I assume
> was
> > exposed by "breaking" the task deliberately.
> >
> > Such a party-pooper! ;)
> >
> > Got a link to the bug report for me?
> >
> > Thanks!
> > -jay
>
> ___
> Mailing list: https://launchpad.net/~openstack
> Post to : openstack@lists.launchpad.net
> Unsubscribe : https://launchpad.net/~openstack
> More help   : https://help.launchpad.net/ListHelp
>
>
> ___
> Mailing list: https://launchpad.net/~openstack
> Post to : openstack@lists.launchpad.net
> Unsubscribe : https://launchpad.net/~openstack
> More help   : https://help.launchpad.net/ListHelp
>
>
___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] eventlet weirdness

2012-03-02 Thread Joshua Harlow
So a thought I had: say the design of a component forces, as part of its 
design, the ability to be run with threads or with eventlet or with processes.

Say you break everything up into tasks (where a task would produce some 
output/result/side-effect).
A set of tasks could complete some action (i.e., create a VM).
Subtasks could be the following:
0. Validate credentials
1. Get the image
2. Call into libvirt
3. ...

These "tasks", if constructed in a way that makes them stateless, could then be 
chained together to form an action; that action could be given to a threaded 
"engine" that would know how to execute those tasks with threads, or it could 
be given to an eventlet "engine" that would do the same with an eventlet 
pool/greenthreads/coroutines, or with processes (and so on). This could be one 
way the design of your code abstracts that kind of execution (where eventlet is 
abstracted away from the actual work being done, instead of popping up in calls 
to sleep(0), i.e. the leaky abstraction).
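
A bare-bones sketch of that shape (every name here is invented; a real version would need error handling, results, rollback and so on):

import eventlet

class Task(object):
    """One stateless step: takes a context dict, returns it (possibly updated)."""
    def __call__(self, context):
        raise NotImplementedError

class ValidateCredentials(Task):
    def __call__(self, context):
        context['validated'] = True          # stand-in for a real auth check
        return context

class GetImage(Task):
    def __call__(self, context):
        context['image'] = 'image-bits'      # stand-in for a glance download
        return context

class EventletEngine(object):
    """Runs a chain of tasks, yielding to other greenthreads between steps."""
    def run(self, tasks, context):
        for task in tasks:
            context = task(context)
            eventlet.sleep(0)
        return context

create_vm = [ValidateCredentials(), GetImage()]
EventletEngine().run(create_vm, {'instance_id': 'abc'})

A ThreadEngine or ProcessEngine with the same run() signature could then be 
swapped in without touching the tasks themselves.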

On 3/2/12 11:08 AM, "Day, Phil"  wrote:

I didn't say it was pretty - Given the choice I'd much rather have a threading 
model that really did concurrency and pre-emption in all the right places, and 
it would be really cool if something managed the threads that were started so 
that if a second conflicting request was received it did some proper tidy-up or 
blocking rather than just leaving the race condition to work itself out (then 
we wouldn't have to try and control it by checking vm_state).

However ...   In the current code base where we only have user-space-based 
eventlets, with no pre-emption, and some activities that need to be prioritised, 
then forcing pre-emption with a sleep(0) seems a pretty small bit of untidiness.   
And it works now without a major code refactor.

Always open to other approaches ...

Phil


-Original Message-
From: openstack-bounces+philip.day=hp@lists.launchpad.net 
[mailto:openstack-bounces+philip.day=hp@lists.launchpad.net] On Behalf Of 
Chris Behrens
Sent: 02 March 2012 19:00
To: Joshua Harlow
Cc: openstack; Chris Behrens
Subject: Re: [Openstack] eventlet weirdness

It's not just you


On Mar 2, 2012, at 10:35 AM, Joshua Harlow wrote:

> Does anyone else feel that the following seems really "dirty", or is it just 
> me.
>
> "adding a few sleep(0) calls in various places in the Nova codebase
> (as was recently added in the _sync_power_states() periodic task) is
> an easy and simple win with pretty much no ill side-effects. :)"
>
> Dirty in that it feels like there is something wrong from a design point of 
> view.
> Sprinkling "sleep(0)" seems like its a band-aid on a larger problem imho.
> But that's just my gut feeling.
>
> :-(
>
> On 3/2/12 8:26 AM, "Armando Migliaccio"  
> wrote:
>
> I knew you'd say that :P
>
> There you go: https://bugs.launchpad.net/nova/+bug/944145
>
> Cheers,
> Armando
>
> > -Original Message-
> > From: Jay Pipes [mailto:jaypi...@gmail.com]
> > Sent: 02 March 2012 16:22
> > To: Armando Migliaccio
> > Cc: openstack@lists.launchpad.net
> > Subject: Re: [Openstack] eventlet weirdness
> >
> > On 03/02/2012 10:52 AM, Armando Migliaccio wrote:
> > > I'd be cautious to say that no ill side-effects were introduced. I
> > > found a
> > race condition right in the middle of sync_power_states, which I
> > assume was exposed by "breaking" the task deliberately.
> >
> > Such a party-pooper! ;)
> >
> > Got a link to the bug report for me?
> >
> > Thanks!
> > -jay
>
> ___
> Mailing list: https://launchpad.net/~openstack
> Post to : openstack@lists.launchpad.net
> Unsubscribe : https://launchpad.net/~openstack
> More help   : https://help.launchpad.net/ListHelp
>
> ___
> Mailing list: https://launchpad.net/~openstack
> Post to : openstack@lists.launchpad.net
> Unsubscribe : https://launchpad.net/~openstack
> More help   : https://help.launchpad.net/ListHelp


___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] eventlet weirdness

2012-03-02 Thread Armando Migliaccio


> -Original Message-
> From: Eric Windisch [mailto:e...@cloudscaling.com]
> Sent: 02 March 2012 19:04
> To: Joshua Harlow
> Cc: Armando Migliaccio; Jay Pipes; openstack
> Subject: Re: [Openstack] eventlet weirdness
> 
> The problem is that unless you sleep(0), eventlet only switches context when
> you hit a file descriptor.
> 
> As long as python coroutines are used, we should put sleep(0) where-ever it is
> expected that there will be a long-running loop where file descriptors are not
> touched. As noted elsewhere in this thread, MySQL file descriptors don't
> count, they're not coroutine friendly.
> 
> The premise is that cpus are pretty fast and get quickly from one call of a
> file descriptor to another, that the blocking of these descriptors is what a
> CPU most waits on, and this is an easy and obvious place to switch coroutines
> via monkey-patching.
> 
> That said, it shouldn't be necessary to "sprinkle" sleep(0) calls. They should
> be strategically placed, as necessary.

I agree, but then the whole assumption of adopting eventlet to simplify the 
programming model is hindered by the fact that one has to think harder about 
what one is doing... Nova could've kept Twisted for that matter. The programming 
model would have been harder, but at least it would have been cleaner and free 
from icky patching (that's my own opinion anyway).

> 
> "race-conditions" around coroutine switching sounds more like thread-safety
> issues...
> 

Yes. There is a fine balance to be struck here: do you let potential races 
appear in your system and deal with them on a case-by-case basis, or do you 
introduce mutexes and deal with potential inefficiency and/or deadlocks? I'd 
rather go with the former here.

> --
> Eric Windisch
> 
> 
> On Friday, March 2, 2012 at 1:35 PM, Joshua Harlow wrote:
> 
> > Re: [Openstack] eventlet weirdness Does anyone else feel that the following
> seems really “dirty”, or is it just me.
> >
> > “adding a few sleep(0) calls in various places in the Nova codebase
> > (as was recently added in the _sync_power_states() periodic task) is
> > an easy and simple win with pretty much no ill side-effects. :)”
> >
> > Dirty in that it feels like there is something wrong from a design point of
> view.
> > Sprinkling “sleep(0)” seems like its a band-aid on a larger problem imho.
> > But that’s just my gut feeling.
> >
> > :-(
> >
> > On 3/2/12 8:26 AM, "Armando Migliaccio" 
> wrote:
> >
> > > I knew you'd say that :P
> > >
> > > There you go: https://bugs.launchpad.net/nova/+bug/944145
> > >
> > > Cheers,
> > > Armando
> > >
> > > > -Original Message-
> > > > From: Jay Pipes [mailto:jaypi...@gmail.com]
> > > > Sent: 02 March 2012 16:22
> > > > To: Armando Migliaccio
> > > > Cc: openstack@lists.launchpad.net
> > > > Subject: Re: [Openstack] eventlet weirdness
> > > >
> > > > On 03/02/2012 10:52 AM, Armando Migliaccio wrote:
> > > > > I'd be cautious to say that no ill side-effects were introduced.
> > > > > I found a
> > > >
> > > > race condition right in the middle of sync_power_states, which I
> > > > assume was exposed by "breaking" the task deliberately.
> > > >
> > > > Such a party-pooper! ;)
> > > >
> > > > Got a link to the bug report for me?
> > > >
> > > > Thanks!
> > > > -jay
> > >
> > >
> > > ___
> > > Mailing list: https://launchpad.net/~openstack Post to :
> > > openstack@lists.launchpad.net Unsubscribe :
> > > https://launchpad.net/~openstack More help :
> > > https://help.launchpad.net/ListHelp
> >
> > ___
> > Mailing list: https://launchpad.net/~openstack Post to :
> > openstack@lists.launchpad.net (mailto:openstack@lists.launchpad.net)
> > Unsubscribe : https://launchpad.net/~openstack More help :
> > https://help.launchpad.net/ListHelp
> 
> 

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] eventlet weirdness

2012-03-02 Thread Day, Phil
I didn't say it was pretty - Given the choice I'd much rather have a threading 
model that really did concurrency and pre-emption in all the right places, and 
it would be really cool if something managed the threads that were started so 
that if a second conflicting request was received it did some proper tidy-up or 
blocking rather than just leaving the race condition to work itself out (then 
we wouldn't have to try and control it by checking vm_state).

However ...   In the current code base where we only have user-space-based 
eventlets, with no pre-emption, and some activities that need to be prioritised, 
then forcing pre-emption with a sleep(0) seems a pretty small bit of untidiness.   
And it works now without a major code refactor.

Always open to other approaches ...

Phil
 

-Original Message-
From: openstack-bounces+philip.day=hp@lists.launchpad.net 
[mailto:openstack-bounces+philip.day=hp@lists.launchpad.net] On Behalf Of 
Chris Behrens
Sent: 02 March 2012 19:00
To: Joshua Harlow
Cc: openstack; Chris Behrens
Subject: Re: [Openstack] eventlet weirdness

It's not just you


On Mar 2, 2012, at 10:35 AM, Joshua Harlow wrote:

> Does anyone else feel that the following seems really "dirty", or is it just 
> me.
> 
> "adding a few sleep(0) calls in various places in the Nova codebase 
> (as was recently added in the _sync_power_states() periodic task) is 
> an easy and simple win with pretty much no ill side-effects. :)"
> 
> Dirty in that it feels like there is something wrong from a design point of 
> view.
> Sprinkling "sleep(0)" seems like its a band-aid on a larger problem imho. 
> But that's just my gut feeling.
> 
> :-(
> 
> On 3/2/12 8:26 AM, "Armando Migliaccio"  
> wrote:
> 
> I knew you'd say that :P
> 
> There you go: https://bugs.launchpad.net/nova/+bug/944145
> 
> Cheers,
> Armando
> 
> > -Original Message-
> > From: Jay Pipes [mailto:jaypi...@gmail.com]
> > Sent: 02 March 2012 16:22
> > To: Armando Migliaccio
> > Cc: openstack@lists.launchpad.net
> > Subject: Re: [Openstack] eventlet weirdness
> >
> > On 03/02/2012 10:52 AM, Armando Migliaccio wrote:
> > > I'd be cautious to say that no ill side-effects were introduced. I 
> > > found a
> > race condition right in the middle of sync_power_states, which I 
> > assume was exposed by "breaking" the task deliberately.
> >
> > Such a party-pooper! ;)
> >
> > Got a link to the bug report for me?
> >
> > Thanks!
> > -jay
> 
> ___
> Mailing list: https://launchpad.net/~openstack
> Post to : openstack@lists.launchpad.net
> Unsubscribe : https://launchpad.net/~openstack
> More help   : https://help.launchpad.net/ListHelp
> 
> ___
> Mailing list: https://launchpad.net/~openstack
> Post to : openstack@lists.launchpad.net
> Unsubscribe : https://launchpad.net/~openstack
> More help   : https://help.launchpad.net/ListHelp


___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] eventlet weirdness

2012-03-02 Thread Eric Windisch
The problem is that unless you sleep(0), eventlet only switches context when 
you hit a file descriptor.  

As long as Python coroutines are used, we should put sleep(0) wherever it is 
expected that there will be a long-running loop where file descriptors are not 
touched. As noted elsewhere in this thread, MySQL file descriptors don't count; 
they're not coroutine friendly.
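
The escape hatch for a call like that is to push it into a real OS thread, e.g. 
with eventlet's tpool (a sketch only - the connection details and query are 
placeholders):

from eventlet import tpool
import MySQLdb                      # blocking C driver; eventlet can't see its socket

def fetch_vm_state(conn, uuid):
    cur = conn.cursor()
    cur.execute("SELECT vm_state FROM instances WHERE uuid = %s", (uuid,))
    return cur.fetchone()

conn = MySQLdb.connect(host='localhost', user='nova', passwd='secret', db='nova')
# tpool.execute() runs the blocking call in a worker thread, so the eventlet hub
# keeps scheduling other greenthreads while the query is in flight
row = tpool.execute(fetch_vm_state, conn, 'some-uuid')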

The premise is that CPUs are pretty fast and get quickly from one file 
descriptor call to another, that the blocking on these descriptors is what a 
CPU most waits on, and that this is an easy and obvious place to switch 
coroutines via monkey-patching.

That said, it shouldn't be necessary to "sprinkle" sleep(0) calls. They should 
be strategically placed, as necessary.

"race-conditions" around coroutine switching sounds more like thread-safety 
issues...  

--  
Eric Windisch


On Friday, March 2, 2012 at 1:35 PM, Joshua Harlow wrote:

> Re: [Openstack] eventlet weirdness Does anyone else feel that the following 
> seems really “dirty”, or is it just me.
>  
> “adding a few sleep(0) calls in various places in the
> Nova codebase (as was recently added in the _sync_power_states()
> periodic task) is an easy and simple win with pretty much no ill
> side-effects. :)”
>  
> Dirty in that it feels like there is something wrong from a design point of 
> view.
> Sprinkling “sleep(0)” seems like its a band-aid on a larger problem imho.  
> But that’s just my gut feeling.
>  
> :-(
>  
> On 3/2/12 8:26 AM, "Armando Migliaccio"  
> wrote:
>  
> > I knew you'd say that :P
> >  
> > There you go: https://bugs.launchpad.net/nova/+bug/944145
> >  
> > Cheers,
> > Armando
> >  
> > > -Original Message-
> > > From: Jay Pipes [mailto:jaypi...@gmail.com]
> > > Sent: 02 March 2012 16:22
> > > To: Armando Migliaccio
> > > Cc: openstack@lists.launchpad.net
> > > Subject: Re: [Openstack] eventlet weirdness
> > >  
> > > On 03/02/2012 10:52 AM, Armando Migliaccio wrote:
> > > > I'd be cautious to say that no ill side-effects were introduced. I 
> > > > found a
> > >  
> > > race condition right in the middle of sync_power_states, which I assume 
> > > was
> > > exposed by "breaking" the task deliberately.
> > >  
> > > Such a party-pooper! ;)
> > >  
> > > Got a link to the bug report for me?
> > >  
> > > Thanks!
> > > -jay
> >  
> >  
> > ___
> > Mailing list: https://launchpad.net/~openstack
> > Post to : openstack@lists.launchpad.net
> > Unsubscribe : https://launchpad.net/~openstack
> > More help : https://help.launchpad.net/ListHelp
>  
> ___
> Mailing list: https://launchpad.net/~openstack
> Post to : openstack@lists.launchpad.net (mailto:openstack@lists.launchpad.net)
> Unsubscribe : https://launchpad.net/~openstack
> More help : https://help.launchpad.net/ListHelp




___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] eventlet weirdness

2012-03-02 Thread Chris Behrens
It's not just you


On Mar 2, 2012, at 10:35 AM, Joshua Harlow wrote:

> Does anyone else feel that the following seems really “dirty”, or is it just 
> me.
> 
> “adding a few sleep(0) calls in various places in the
> Nova codebase (as was recently added in the _sync_power_states()
> periodic task) is an easy and simple win with pretty much no ill
> side-effects. :)”
> 
> Dirty in that it feels like there is something wrong from a design point of 
> view.
> Sprinkling “sleep(0)” seems like its a band-aid on a larger problem imho. 
> But that’s just my gut feeling.
> 
> :-(
> 
> On 3/2/12 8:26 AM, "Armando Migliaccio"  
> wrote:
> 
> I knew you'd say that :P
> 
> There you go: https://bugs.launchpad.net/nova/+bug/944145
> 
> Cheers,
> Armando
> 
> > -Original Message-
> > From: Jay Pipes [mailto:jaypi...@gmail.com]
> > Sent: 02 March 2012 16:22
> > To: Armando Migliaccio
> > Cc: openstack@lists.launchpad.net
> > Subject: Re: [Openstack] eventlet weirdness
> >
> > On 03/02/2012 10:52 AM, Armando Migliaccio wrote:
> > > I'd be cautious to say that no ill side-effects were introduced. I found a
> > race condition right in the middle of sync_power_states, which I assume was
> > exposed by "breaking" the task deliberately.
> >
> > Such a party-pooper! ;)
> >
> > Got a link to the bug report for me?
> >
> > Thanks!
> > -jay
> 
> ___
> Mailing list: https://launchpad.net/~openstack
> Post to : openstack@lists.launchpad.net
> Unsubscribe : https://launchpad.net/~openstack
> More help   : https://help.launchpad.net/ListHelp
> 
> ___
> Mailing list: https://launchpad.net/~openstack
> Post to : openstack@lists.launchpad.net
> Unsubscribe : https://launchpad.net/~openstack
> More help   : https://help.launchpad.net/ListHelp


___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] eventlet weirdness

2012-03-02 Thread Lorin Hochstein
Looks like a textbook example of a "leaky abstraction" 
<http://www.joelonsoftware.com/articles/LeakyAbstractions.html> to me.

Take care,

Lorin
--
Lorin Hochstein
Lead Architect - Cloud Services
Nimbis Services, Inc.
www.nimbisservices.com


On Mar 2, 2012, at 1:35 PM, Joshua Harlow wrote:

> Does anyone else feel that the following seems really “dirty”, or is it just 
> me.
> 
> “adding a few sleep(0) calls in various places in the
> Nova codebase (as was recently added in the _sync_power_states()
> periodic task) is an easy and simple win with pretty much no ill
> side-effects. :)”
> 
> Dirty in that it feels like there is something wrong from a design point of 
> view.
> Sprinkling “sleep(0)” seems like its a band-aid on a larger problem imho. 
> But that’s just my gut feeling.
> 
> :-(
> 
> On 3/2/12 8:26 AM, "Armando Migliaccio"  
> wrote:
> 
> I knew you'd say that :P
> 
> There you go: https://bugs.launchpad.net/nova/+bug/944145
> 
> Cheers,
> Armando
> 
> > -Original Message-
> > From: Jay Pipes [mailto:jaypi...@gmail.com]
> > Sent: 02 March 2012 16:22
> > To: Armando Migliaccio
> > Cc: openstack@lists.launchpad.net
> > Subject: Re: [Openstack] eventlet weirdness
> >
> > On 03/02/2012 10:52 AM, Armando Migliaccio wrote:
> > > I'd be cautious to say that no ill side-effects were introduced. I found a
> > race condition right in the middle of sync_power_states, which I assume was
> > exposed by "breaking" the task deliberately.
> >
> > Such a party-pooper! ;)
> >
> > Got a link to the bug report for me?
> >
> > Thanks!
> > -jay
> 
> ___
> Mailing list: https://launchpad.net/~openstack
> Post to : openstack@lists.launchpad.net
> Unsubscribe : https://launchpad.net/~openstack
> More help   : https://help.launchpad.net/ListHelp
> 
> ___
> Mailing list: https://launchpad.net/~openstack
> Post to : openstack@lists.launchpad.net
> Unsubscribe : https://launchpad.net/~openstack
> More help   : https://help.launchpad.net/ListHelp



___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] eventlet weirdness

2012-03-02 Thread Joshua Harlow
Does anyone else feel that the following seems really "dirty", or is it just me.

"adding a few sleep(0) calls in various places in the
Nova codebase (as was recently added in the _sync_power_states()
periodic task) is an easy and simple win with pretty much no ill
side-effects. :)"

Dirty in that it feels like there is something wrong from a design point of 
view.
Sprinkling "sleep(0)" seems like its a band-aid on a larger problem imho.
But that's just my gut feeling.

:-(

On 3/2/12 8:26 AM, "Armando Migliaccio"  
wrote:

I knew you'd say that :P

There you go: https://bugs.launchpad.net/nova/+bug/944145

Cheers,
Armando

> -Original Message-
> From: Jay Pipes [mailto:jaypi...@gmail.com]
> Sent: 02 March 2012 16:22
> To: Armando Migliaccio
> Cc: openstack@lists.launchpad.net
> Subject: Re: [Openstack] eventlet weirdness
>
> On 03/02/2012 10:52 AM, Armando Migliaccio wrote:
> > I'd be cautious to say that no ill side-effects were introduced. I found a
> race condition right in the middle of sync_power_states, which I assume was
> exposed by "breaking" the task deliberately.
>
> Such a party-pooper! ;)
>
> Got a link to the bug report for me?
>
> Thanks!
> -jay

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] eventlet weirdness

2012-03-02 Thread Armando Migliaccio
I knew you'd say that :P

There you go: https://bugs.launchpad.net/nova/+bug/944145

Cheers,
Armando

> -Original Message-
> From: Jay Pipes [mailto:jaypi...@gmail.com]
> Sent: 02 March 2012 16:22
> To: Armando Migliaccio
> Cc: openstack@lists.launchpad.net
> Subject: Re: [Openstack] eventlet weirdness
> 
> On 03/02/2012 10:52 AM, Armando Migliaccio wrote:
> > I'd be cautious to say that no ill side-effects were introduced. I found a
> race condition right in the middle of sync_power_states, which I assume was
> exposed by "breaking" the task deliberately.
> 
> Such a party-pooper! ;)
> 
> Got a link to the bug report for me?
> 
> Thanks!
> -jay

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] eventlet weirdness

2012-03-02 Thread Jay Pipes

On 03/02/2012 10:54 AM, Day, Phil wrote:

By "properly multi-threaded" are you instead referring to making the nova-api 
server multi-*processed* with eventlet greenthread pools in each process? i.e. The way 
Swift (and now Glance) works? Or are you referring to a different approach entirely?


Yep - following your posting in here pointing to the glance changes we 
back-ported that into the Diablo API server.   We're now running each API 
server with 20 OS processes and 20 EC2 processes, and the world looks a lot 
happier.


Gotcha, OK, that makes a lot of sense.

> The same changes were being done in parallel into Essex by someone in 
the community I thought ?


Hmmm, for Nova? I'm not aware of that effort, but I would certainly 
support it. It's a very big impact performance issue...



Curious... do you have a list of all the places where sleep(0) calls were 
inserted in the HP Nova code? I can turn that into a bug report and get to work 
on adding them...


So far the only two cases we've done this are in the _sync_power_state and  in 
the security group refresh handling 
(libvirt/firewall/do_refresh_security_group_rules) - which we modified to only 
refresh for instances in the group and added a sleep in the loop (I need to 
finish writing the bug report for this one).


OK, sounds good.


I have contemplated doing something similar in the image code when reading 
chunks from glance - but am slightly worried that in this case the only thing 
that currently stops two creates for the same image from making separate 
requests to glance might be that one gets queued behind the other.  It would be 
nice to do the same thing on snapshot (as this can also be a real hog), but 
there the transfer is handled completely within the glance client.   A more 
radical approach would be to split out the image handling code from compute 
manager into a separate (co-hosted) image_manager so at least only commands 
which need interaction with glance will block each other.


We should definitely discuss this further (separate ML thread or 
etherpad maybe). If not before the design summit, then definitely at it.


Cheers!
-jay


Phil




-Original Message-
From: openstack-bounces+philip.day=hp@lists.launchpad.net 
[mailto:openstack-bounces+philip.day=hp@lists.launchpad.net] On Behalf Of 
Jay Pipes
Sent: 02 March 2012 15:17
To: openstack@lists.launchpad.net
Subject: Re: [Openstack] eventlet weirdness

On 03/02/2012 05:34 AM, Day, Phil wrote:

In our experience (running clusters of several hundred nodes) the DB 
performance is not generally the significant factor, so making its calls 
non-blocking  gives only a very small increase in processing capacity and 
creates other side effects in terms of slowing all eventlets down as they wait 
for their turn to run.


Yes, I believe I said that this was the case at the last design summit
-- or rather, I believe I said "is there any evidence that the database is a 
performance or scalability problem at all"?


That shouldn't really be surprising given that the Nova DB is pretty small and 
MySQL is a pretty good DB - throw reasonable hardware at the DB server and give 
it a bit of TLC from a DBA (remove deleted entries from the DB, add indexes 
where the slow query log tells you to, etc) and it shouldn't be the bottleneck 
in the system for performance or scalability.


++


We use the python driver and have experimented with allowing the eventlet code 
to make the db calls non-blocking (its not the default setting), and it works, 
but didn't give us any significant advantage.


Yep, identical results to the work that Mark Washenberger did on the same 
subject.


For example in the API server (before we made it properly
multi-threaded)


By "properly multi-threaded" are you instead referring to making the nova-api 
server multi-*processed* with eventlet greenthread pools in each process? i.e. The way 
Swift (and now Glance) works? Or are you referring to a different approach entirely?

  >  with blocking db calls the server was essentially a serial processing 
queue - each request was fully processed before the next.  With non-blocking db 
calls we got a lot more apparent concurrency, but only at the expense of making all 
of the requests equally bad.

Yep, not surprising.


Consider a request takes 10 seconds, where after 5 seconds there is a call to 
the DB which takes 1 second, and three are started at the same time:

Blocking:
0 - Request 1 starts
10 - Request 1 completes, request 2 starts
20 - Request 2 completes, request 3 starts
30 - Request 3 completes
Request 1 completes in 10 seconds
Request 2 completes in 20 seconds
Request 3 completes in 30 seconds
Ave time: 20 sec

Non-blocking
0 - Request 1 Starts
5 - Request 1 gets to db call, request 2 starts
10 - Request 2 gets to db call, request 3 starts
15 - Request 3 gets to db call, request 1 resumes
19 - Request 1 completes, request 2 resumes

Re: [Openstack] eventlet weirdness

2012-03-02 Thread Jay Pipes

On 03/02/2012 10:52 AM, Armando Migliaccio wrote:

I'd be cautious to say that no ill side-effects were introduced. I found a race condition 
right in the middle of sync_power_states, which I assume was exposed by 
"breaking" the task deliberately.


Such a party-pooper! ;)

Got a link to the bug report for me?

Thanks!
-jay

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] eventlet weirdness

2012-03-02 Thread Day, Phil
> By "properly multi-threaded" are you instead referring to making the nova-api 
> server multi-*processed* with eventlet greenthread pools in each process? 
> i.e. The way Swift (and now Glance) works? Or are you referring to a 
> different approach entirely?

Yep - following your posting in here pointing to the glance changes we 
back-ported that into the Diablo API server.   We're now running each API 
server with 20 OS processes and 20 EC2 processes, and the world looks a lot 
happier.  The same changes were being done in parallel into Essex by someone in 
the community, I thought?

> Curious... do you have a list of all the places where sleep(0) calls were 
> inserted in the HP Nova code? I can turn that into a bug report and get to 
> work on adding them... 

So far the only two cases we've done this are in the _sync_power_state and  in 
the security group refresh handling 
(libvirt/firewall/do_refresh_security_group_rules) - which we modified to only 
refresh for instances in the group and added a sleep in the loop (I need to 
finish writing the bug report for this one).

I have contemplated doing something similar in the image code when reading 
chunks from glance - but am slightly worried that in this case the only thing 
that currently stops two creates for the same image from making separate 
requests to glance might be that one gets queued behind the other.  It would be 
nice to do the same thing on snapshot (as this can also be a real hog), but 
there the transfer is handled completely within the glance client.   A more 
radical approach would be to split out the image handling code from compute 
manager into a separate (co-hosted) image_manager so at least only commands 
which need interaction with glance will block each other.

Phil




-Original Message-
From: openstack-bounces+philip.day=hp@lists.launchpad.net 
[mailto:openstack-bounces+philip.day=hp@lists.launchpad.net] On Behalf Of 
Jay Pipes
Sent: 02 March 2012 15:17
To: openstack@lists.launchpad.net
Subject: Re: [Openstack] eventlet weirdness

On 03/02/2012 05:34 AM, Day, Phil wrote:
> In our experience (running clusters of several hundred nodes) the DB 
> performance is not generally the significant factor, so making its calls 
> non-blocking  gives only a very small increase in processing capacity and 
> creates other side effects in terms of slowing all eventlets down as they 
> wait for their turn to run.

Yes, I believe I said that this was the case at the last design summit
-- or rather, I believe I said "is there any evidence that the database is a 
performance or scalability problem at all"?

> That shouldn't really be surprising given that the Nova DB is pretty small 
> and MySQL is a pretty good DB - throw reasonable hardware at the DB server 
> and give it a bit of TLC from a DBA (remove deleted entries from the DB, add 
> indexes where the slow query log tells you to, etc) and it shouldn't be the 
> bottleneck in the system for performance or scalability.

++

> We use the python driver and have experimented with allowing the eventlet 
> code to make the db calls non-blocking (it's not the default setting), and it 
> works, but didn't give us any significant advantage.

Yep, identical results to the work that Mark Washenberger did on the same 
subject.

> For example in the API server (before we made it properly 
> multi-threaded)

By "properly multi-threaded" are you instead referring to making the nova-api 
server multi-*processed* with eventlet greenthread pools in each process? i.e. 
The way Swift (and now Glance) works? Or are you referring to a different 
approach entirely?

 > with blocking db calls the server was essentially a serial processing queue 
 > - each request was fully processed before the next.  With non-blocking db 
 > calls we got a lot more apparent concurrency but only at the expense of 
 > making all of the requests equally bad.

Yep, not surprising.

> Consider a request takes 10 seconds, where after 5 seconds there is a call to 
> the DB which takes 1 second, and three are started at the same time:
>
> Blocking:
> 0 - Request 1 starts
> 10 - Request 1 completes, request 2 starts
> 20 - Request 2 completes, request 3 starts
> 30 - Request 3 completes
> Request 1 completes in 10 seconds
> Request 2 completes in 20 seconds
> Request 3 completes in 30 seconds
> Ave time: 20 sec
>
> Non-blocking
> 0 - Request 1 Starts
> 5 - Request 1 gets to db call, request 2 starts
> 10 - Request 2 gets to db call, request 3 starts
> 15 - Request 3 gets to db call, request 1 resumes
> 19 - Request 1 completes, request 2 resumes
> 23 - Request 2 completes,  request 3 resumes
> 27 - Request 3 completes
>
> Request 1 completes in 19 seconds  (+ 9 seconds) Request 2 completes 
> in 24 seconds (+ 4 secon

Re: [Openstack] eventlet weirdness

2012-03-02 Thread Armando Migliaccio


> -Original Message-
> From: openstack-bounces+armando.migliaccio=eu.citrix@lists.launchpad.net
> [mailto:openstack-
> bounces+armando.migliaccio=eu.citrix@lists.launchpad.net] On Behalf Of Jay
> Pipes
> Sent: 02 March 2012 15:17
> To: openstack@lists.launchpad.net
> Subject: Re: [Openstack] eventlet weirdness
> 
> On 03/02/2012 05:34 AM, Day, Phil wrote:
> > In our experience (running clusters of several hundred nodes) the DB
> performance is not generally the significant factor, so making its calls non-
> blocking  gives only a very small increase in processing capacity and creates
> other side effects in terms of slowing all eventlets down as they wait for
> their turn to run.
> 
> Yes, I believe I said that this was the case at the last design summit
> -- or rather, I believe I said "is there any evidence that the database is a
> performance or scalability problem at all"?
> 
> > That shouldn't really be surprising given that the Nova DB is pretty small
> and MySQL is a pretty good DB - throw reasonable hardware at the DB server and
> give it a bit of TLC from a DBA (remove deleted entries from the DB, add
> indexes where the slow query log tells you to, etc) and it shouldn't be the
> bottleneck in the system for performance or scalability.
> 
> ++
> 
> > We use the python driver and have experimented with allowing the eventlet
> code to make the db calls non-blocking (it's not the default setting), and it
> works, but didn't give us any significant advantage.
> 
> Yep, identical results to the work that Mark Washenberger did on the same
> subject.
> 
> > For example in the API server (before we made it properly
> > multi-threaded)
> 
> By "properly multi-threaded" are you instead referring to making the nova-api
> server multi-*processed* with eventlet greenthread pools in each process? i.e.
> The way Swift (and now Glance) works? Or are you referring to a different
> approach entirely?
> 
>  > with blocking db calls the server was essentially a serial processing queue
> - each request was fully processed before the next.  With non-blocking db
> calls we got a lot more apparent concurrency but only at the expense of making
> all of the requests equally bad.
> 
> Yep, not surprising.
> 
> > Consider a request takes 10 seconds, where after 5 seconds there is a call
> to the DB which takes 1 second, and three are started at the same time:
> >
> > Blocking:
> > 0 - Request 1 starts
> > 10 - Request 1 completes, request 2 starts
> > 20 - Request 2 completes, request 3 starts
> > 30 - Request 3 completes
> > Request 1 completes in 10 seconds
> > Request 2 completes in 20 seconds
> > Request 3 completes in 30 seconds
> > Ave time: 20 sec
> >
> > Non-blocking
> > 0 - Request 1 Starts
> > 5 - Request 1 gets to db call, request 2 starts
> > 10 - Request 2 gets to db call, request 3 starts
> > 15 - Request 3 gets to db call, request 1 resumes
> > 19 - Request 1 completes, request 2 resumes
> > 23 - Request 2 completes,  request 3 resumes
> > 27 - Request 3 completes
> >
> > Request 1 completes in 19 seconds  (+ 9 seconds) Request 2 completes
> > in 24 seconds (+ 4 seconds) Request 3 completes in 27 seconds (- 3
> > seconds) Ave time: 20 sec
> >
> > So instead of worrying about making db calls non-blocking we've been working
> to make certain eventlets non-blocking - i.e. add sleep(0) calls to long
> running iteration loops - which IMO has a much bigger impact on the
> performance of the apparent latency of the system.
> 
> Yep, and I think adding a few sleep(0) calls in various places in the Nova
> codebase (as was recently added in the _sync_power_states() periodic task) is
> an easy and simple win with pretty much no ill side-effects. :)

I'd be cautious to say that no ill side-effects were introduced. I found a race 
condition right in the middle of sync_power_states, which I assume was exposed 
by "breaking" the task deliberately. 

> 
> Curious... do you have a list of all the places where sleep(0) calls were
> inserted in the HP Nova code? I can turn that into a bug report and get to
> work on adding them...
> 
> All the best,
> -jay
> 
> > Phil
> >
> >
> >
> > -Original Message-
> > From: openstack-bounces+philip.day=hp@lists.launchpad.net
> > [mailto:openstack-bounces+philip.day=hp@lists.launchpad.net] On
> > Behalf Of Brian Lamar
> > Sent: 01 March 2012 21:31
> > To: openstack@lists.launchpad.net
> > Subject: Re: [Openstack] eventlet weirdness
> >
> >>> How is MySQ

Re: [Openstack] eventlet weirdness

2012-03-02 Thread Jay Pipes

On 03/02/2012 05:34 AM, Day, Phil wrote:

In our experience (running clusters of several hundred nodes) the DB 
performance is not generally the significant factor, so making its calls 
non-blocking  gives only a very small increase in processing capacity and 
creates other side effects in terms of slowing all eventlets down as they wait 
for their turn to run.


Yes, I believe I said that this was the case at the last design summit 
-- or rather, I believe I said "is there any evidence that the database 
is a performance or scalability problem at all"?



That shouldn't really be surprising given that the Nova DB is pretty small and 
MySQL is a pretty good DB - throw reasonable hardware at the DB server and give 
it a bit of TLC from a DBA (remove deleted entries from the DB, add indexes 
where the slow query log tells you to, etc) and it shouldn't be the bottleneck 
in the system for performance or scalability.


++


We use the python driver and have experimented with allowing the eventlet code 
to make the db calls non-blocking (it's not the default setting), and it works, 
but didn't give us any significant advantage.


Yep, identical results to the work that Mark Washenberger did on the 
same subject.



For example in the API server (before we made it properly multi-threaded)


By "properly multi-threaded" are you instead referring to making the 
nova-api server multi-*processed* with eventlet greenthread pools in 
each process? i.e. The way Swift (and now Glance) works? Or are you 
referring to a different approach entirely?


> with blocking db calls the server was essentially a serial processing 
queue - each request was fully processed before the next.  With 
non-blocking db calls we got a lot more apparent concurrency but only at 
the expense of making all of the requests equally bad.


Yep, not surprising.


Consider a request takes 10 seconds, where after 5 seconds there is a call to 
the DB which takes 1 second, and three are started at the same time:

Blocking:
0 - Request 1 starts
10 - Request 1 completes, request 2 starts
20 - Request 2 completes, request 3 starts
30 - Request 3 completes
Request 1 completes in 10 seconds
Request 2 completes in 20 seconds
Request 3 completes in 30 seconds
Ave time: 20 sec

Non-blocking
0 - Request 1 Starts
5 - Request 1 gets to db call, request 2 starts
10 - Request 2 gets to db call, request 3 starts
15 - Request 3 gets to db call, request 1 resumes
19 - Request 1 completes, request 2 resumes
23 - Request 2 completes,  request 3 resumes
27 - Request 3 completes

Request 1 completes in 19 seconds  (+ 9 seconds)
Request 2 completes in 24 seconds (+ 4 seconds)
Request 3 completes in 27 seconds (- 3 seconds)
Ave time: 20 sec

So instead of worrying about making db calls non-blocking we've been working to 
make certain eventlets non-blocking - i.e. add sleep(0) calls to long running 
iteration loops - which IMO has a much bigger impact on the performance of the 
apparent latency of the system.


Yep, and I think adding a few sleep(0) calls in various places in the 
Nova codebase (as was recently added in the _sync_power_states() 
periodic task) is an easy and simple win with pretty much no ill 
side-effects. :)


Curious... do you have a list of all the places where sleep(0) calls 
were inserted in the HP Nova code? I can turn that into a bug report and 
get to work on adding them...


All the best,
-jay


Phil



-Original Message-
From: openstack-bounces+philip.day=hp@lists.launchpad.net 
[mailto:openstack-bounces+philip.day=hp@lists.launchpad.net] On Behalf Of 
Brian Lamar
Sent: 01 March 2012 21:31
To: openstack@lists.launchpad.net
Subject: Re: [Openstack] eventlet weirdness


How is MySQL access handled in eventlet? Presumably it's external C
library so it's not going to be monkey patched. Does that make every
db access call a blocking call? Thanks,



Nope, it goes through a thread pool.


I feel like this might be an over-simplification. If the question is:

"How is MySQL access handled in nova?"

The answer would be that we use SQLAlchemy which can load any number of 
SQL-drivers. These drivers can be either pure Python or C-based drivers. In the 
case of pure Python drivers, monkey patching can occur and db calls are 
non-blocking. In the case of drivers which contain C code (or perhaps other 
blocking calls), db calls will most likely be blocking.

If the question is "How is MySQL access handled in eventlet?" the answer would 
be to use the eventlet.db_pool module to allow db access using thread pools.

B

-Original Message-
From: "Adam Young"
Sent: Thursday, March 1, 2012 3:27pm
To: openstack@lists.launchpad.net
Subject: Re: [Openstack] eventlet weirdness

On 03/01/2012 02:45 PM, Yun Mao wrote:

There are plenty eventlet discussion recently but I'll stick my
question to this thread, although it's pretty much a separate
question. :)


Re: [Openstack] eventlet weirdness

2012-03-02 Thread Day, Phil
In our experience (running clusters of several hundred nodes) the DB 
performance is not generally the significant factor, so making its calls 
non-blocking  gives only a very small increase in processing capacity and 
creates other side effects in terms of slowing all eventlets down as they wait 
for their turn to run.

That shouldn't really be surprising given that the Nova DB is pretty small and 
MySQL is a pretty good DB - throw reasonable hardware at the DB server and give 
it a bit of TLC from a DBA (remove deleted entries from the DB, add indexes 
where the slow query log tells you to, etc) and it shouldn't be the bottleneck 
in the system for performance or scalability.

We use the python driver and have experimented with allowing the eventlet code 
to make the db calls non-blocking (it's not the default setting), and it works, 
but didn't give us any significant advantage.

For example in the API server (before we made it properly multi-threaded) with 
blocking db calls the server was essentially a serial processing queue - each 
request was fully processed before the next.  With non-blocking db calls we got 
a lot more apparent concurrency but only at the expense of making all of the 
requests equally bad.

Consider a request takes 10 seconds, where after 5 seconds there is a call to 
the DB which takes 1 second, and three are started at the same time:

Blocking:
0 - Request 1 starts
10 - Request 1 completes, request 2 starts
20 - Request 2 completes, request 3 starts
30 - Request 3 completes
Request 1 completes in 10 seconds
Request 2 completes in 20 seconds
Request 3 completes in 30 seconds
Ave time: 20 sec


Non-blocking
0 - Request 1 Starts
5 - Request 1 gets to db call, request 2 starts
10 - Request 2 gets to db call, request 3 starts
15 - Request 3 gets to db call, request 1 resumes
19 - Request 1 completes, request 2 resumes
23 - Request 2 completes,  request 3 resumes
27 - Request 3 completes

Request 1 completes in 19 seconds  (+ 9 seconds)
Request 2 completes in 24 seconds (+ 4 seconds)
Request 3 completes in 27 seconds (- 3 seconds)
Ave time: 20 sec
 
So instead of worrying about making db calls non-blocking we've been working to 
make certain eventlets non-blocking - i.e. add sleep(0) calls to long running 
iteration loops - which IMO has a much bigger impact on the performance of the 
apparent latency of the system.
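
(To make that concrete, a tiny standalone sketch -- nothing to do with the Nova
code -- of why a call that blocks the OS thread serialises every greenthread,
while a cooperative sleep lets them interleave:)

    import time
    import eventlet

    def worker(name, cooperative):
        for i in range(3):
            if cooperative:
                eventlet.sleep(0.1)    # yields to the hub; others get to run
            else:
                time.sleep(0.1)        # blocks the only OS thread
            print("%s %d" % (name, i))

    pool = eventlet.GreenPool()
    for name in ("a", "b", "c"):
        pool.spawn(worker, name, False)  # False: output is a a a b b b c c c
    pool.waitall()                       # True: the three workers interleave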

Phil



-Original Message-
From: openstack-bounces+philip.day=hp@lists.launchpad.net 
[mailto:openstack-bounces+philip.day=hp@lists.launchpad.net] On Behalf Of 
Brian Lamar
Sent: 01 March 2012 21:31
To: openstack@lists.launchpad.net
Subject: Re: [Openstack] eventlet weirdness

>> How is MySQL access handled in eventlet? Presumably it's external C 
>> library so it's not going to be monkey patched. Does that make every 
>> db access call a blocking call? Thanks,

> Nope, it goes through a thread pool.

I feel like this might be an over-simplification. If the question is:

"How is MySQL access handled in nova?"

The answer would be that we use SQLAlchemy which can load any number of 
SQL-drivers. These drivers can be either pure Python or C-based drivers. In the 
case of pure Python drivers, monkey patching can occur and db calls are 
non-blocking. In the case of drivers which contain C code (or perhaps other 
blocking calls), db calls will most likely be blocking.

If the question is "How is MySQL access handled in eventlet?" the answer would 
be to use the eventlet.db_pool module to allow db access using thread pools.

B

-Original Message-
From: "Adam Young" 
Sent: Thursday, March 1, 2012 3:27pm
To: openstack@lists.launchpad.net
Subject: Re: [Openstack] eventlet weirdness

On 03/01/2012 02:45 PM, Yun Mao wrote:
> There are plenty eventlet discussion recently but I'll stick my 
> question to this thread, although it's pretty much a separate 
> question. :)
>
> How is MySQL access handled in eventlet? Presumably it's external C 
> library so it's not going to be monkey patched. Does that make every 
> db access call a blocking call? Thanks,

Nope, it goes through a thread pool.
>
> Yun
>
> On Wed, Feb 29, 2012 at 9:18 PM, Johannes Erdfelt  
> wrote:
>> On Wed, Feb 29, 2012, Yun Mao  wrote:
>>> Thanks for the explanation. Let me see if I understand this.
>>>
>>> 1. Eventlet will never have this problem if there is only 1 OS 
>>> thread
>>> -- let's call it main thread.
>> In fact, that's exactly what Python calls it :)
>>
>>> 2. In Nova, there is only 1 OS thread unless you use xenapi and/or 
>>> the virt/firewall driver.
>>> 3. The python logging module uses locks. Because of the monkey 
>>> patch, those locks are actually eventlet or "green" locks and may 
>>> trigger a green thread conte

Re: [Openstack] eventlet weirdness

2012-03-01 Thread Vishvananda Ishaya
We were using the eventlet db pool class.

You can see the code here:

https://github.com/openstack/nova/blob/2011.3/nova/db/sqlalchemy/session.py

there were a few bugs about it, but here is one:

https://bugs.launchpad.net/nova/+bug/838581

Vish

On Mar 1, 2012, at 6:11 PM, Kapil Thangavelu wrote:

> 
> I actually didn't mean to send the parent.. but since we're here. 
> 
> Additional threads/pool make the problem potentially worse, they are the 
> origin 
> of the greenlet switching issue that was the start of this thread. ie. with 
> the 
> monkey patching, most stdlib socket, thread module usage by python code in 
> the 
> non-main threads will potentially attempt a greenlet trampoline across 
> threads 
> at worst (ie. error), or unintentional per thread hubs at best. 
> 
> Since the mysqldb is an extension (no python code) wrapping it in eventlet 
> thread pool should work in theory. What where the problems with it last time 
> it 
> was attempted?
> 
> cheers,
> 
> -kapil
> 
> 
> Excerpts from Devin Carlen's message of 2012-03-01 20:38:20 -0500:
>> As long as we allocate a thread in the eventlet thread pool for the number 
>> of mysql connections we want to actually maintain in our connection pool, we 
>> shouldn't have problems getting the results we want even with the blocking 
>> mysql c drivers. 
>> 
>> Devin
>> 
>> On Thursday, March 1, 2012 at 5:23 PM, Kapil Thangavelu wrote:
>> 
>>> The standard python postgresql driver (psycopg2) does have an async mode. 
>>> There 
>>> are non db api compliant async mysql drivers for gevent.
>>> 
>>> 
>>> Excerpts from Vishvananda Ishaya's message of 2012-03-01 15:36:43 -0500:
 Yes it does. We actually tried to use a pool at diablo release and it was 
 very broken. There was discussion about moving over to a pure-python mysql 
 library, but it hasn't been tried yet.
 
 Vish
 
 On Mar 1, 2012, at 11:45 AM, Yun Mao wrote:
 
> There are plenty eventlet discussion recently but I'll stick my
> question to this thread, although it's pretty much a separate
> question. :)
> 
> How is MySQL access handled in eventlet? Presumably it's external C
> library so it's not going to be monkey patched. Does that make every
> db access call a blocking call? Thanks,
> 
> Yun
> 
> On Wed, Feb 29, 2012 at 9:18 PM, Johannes Erdfelt  (mailto:johan...@erdfelt.com)> wrote:
>> On Wed, Feb 29, 2012, Yun Mao > (mailto:yun...@gmail.com)> wrote:
>>> Thanks for the explanation. Let me see if I understand this.
>>> 
>>> 1. Eventlet will never have this problem if there is only 1 OS thread
>>> -- let's call it main thread.
>>> 
>> 
>> 
>> In fact, that's exactly what Python calls it :)
>> 
>>> 2. In Nova, there is only 1 OS thread unless you use xenapi and/or the
>>> virt/firewall driver.
>>> 3. The python logging module uses locks. Because of the monkey patch,
>>> those locks are actually eventlet or "green" locks and may trigger a
>>> green thread context switch.
>>> 
>>> Based on 1-3, does it make sense to say that in the other OS threads
>>> (i.e. not main thread), if logging (plus other pure python library
>>> code involving locking) is never used, and we do not run a eventlet
>>> hub at all, we should never see this problem?
>>> 
>> 
>> 
>> That should be correct. I'd have to double check all of the monkey
>> patching that eventlet does to make sure there aren't other cases where
>> you may inadvertently use eventlet primitives across real threads.
>> 
>> JE
>> 
>> 
>> ___
>> Mailing list: https://launchpad.net/~openstack
>> Post to : openstack@lists.launchpad.net 
>> (mailto:openstack@lists.launchpad.net)
>> Unsubscribe : https://launchpad.net/~openstack
>> More help : https://help.launchpad.net/ListHelp
>> 
> 
> 
> ___
> Mailing list: https://launchpad.net/~openstack
> Post to : openstack@lists.launchpad.net 
> (mailto:openstack@lists.launchpad.net)
> Unsubscribe : https://launchpad.net/~openstack
> More help : https://help.launchpad.net/ListHelp
> 
 
 
>>> 
>>> 
>>> ___
>>> Mailing list: https://launchpad.net/~openstack
>>> Post to : openstack@lists.launchpad.net 
>>> (mailto:openstack@lists.launchpad.net)
>>> Unsubscribe : https://launchpad.net/~openstack
>>> More help : https://help.launchpad.net/ListHelp
>>> 
>>> 

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] eventlet weirdness

2012-03-01 Thread Kapil Thangavelu

I actually didn't mean to send the parent.. but since we're here. 

Additional threads/pools make the problem potentially worse; they are the origin 
of the greenlet switching issue that started this thread. I.e. with the monkey 
patching, most stdlib socket and thread module usage by Python code in non-main 
threads will at worst attempt a greenlet trampoline across threads (i.e. an 
error), or at best create unintentional per-thread hubs. 

Since mysqldb is an extension (no Python code), wrapping it in the eventlet 
thread pool should work in theory. What were the problems with it the last time 
it was attempted?
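
(For reference, a minimal sketch of what "wrapping it in the eventlet thread
pool" can look like -- the connection parameters are placeholders:)

    import MySQLdb
    from eventlet import tpool

    def blocking_query(sql):
        conn = MySQLdb.connect(host='127.0.0.1', user='nova',
                               passwd='secret', db='nova')
        try:
            cur = conn.cursor()
            cur.execute(sql)
            return cur.fetchall()
        finally:
            conn.close()

    # tpool.execute() runs the callable in a real OS thread; only the calling
    # greenthread waits, the hub keeps scheduling everything else.
    rows = tpool.execute(blocking_query, "SELECT 1")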

cheers,

-kapil


Excerpts from Devin Carlen's message of 2012-03-01 20:38:20 -0500:
> As long as we allocate a thread in the eventlet thread pool for the number of 
> mysql connections we want to actually maintain in our connection pool, we 
> shouldn't have problems getting the results we want even with the blocking 
> mysql c drivers. 
> 
> Devin
> 
> On Thursday, March 1, 2012 at 5:23 PM, Kapil Thangavelu wrote:
> 
> > The standard python postgresql driver (psycopg2) does have an async mode. 
> > There 
> > are non db api compliant async mysql drivers for gevent.
> > 
> > 
> > Excerpts from Vishvananda Ishaya's message of 2012-03-01 15:36:43 -0500:
> > > Yes it does. We actually tried to use a pool at diablo release and it was 
> > > very broken. There was discussion about moving over to a pure-python 
> > > mysql library, but it hasn't been tried yet.
> > > 
> > > Vish
> > > 
> > > On Mar 1, 2012, at 11:45 AM, Yun Mao wrote:
> > > 
> > > > There are plenty eventlet discussion recently but I'll stick my
> > > > question to this thread, although it's pretty much a separate
> > > > question. :)
> > > > 
> > > > How is MySQL access handled in eventlet? Presumably it's external C
> > > > library so it's not going to be monkey patched. Does that make every
> > > > db access call a blocking call? Thanks,
> > > > 
> > > > Yun
> > > > 
> > > > On Wed, Feb 29, 2012 at 9:18 PM, Johannes Erdfelt  > > > (mailto:johan...@erdfelt.com)> wrote:
> > > > > On Wed, Feb 29, 2012, Yun Mao  > > > > (mailto:yun...@gmail.com)> wrote:
> > > > > > Thanks for the explanation. Let me see if I understand this.
> > > > > > 
> > > > > > 1. Eventlet will never have this problem if there is only 1 OS 
> > > > > > thread
> > > > > > -- let's call it main thread.
> > > > > > 
> > > > > 
> > > > > 
> > > > > In fact, that's exactly what Python calls it :)
> > > > > 
> > > > > > 2. In Nova, there is only 1 OS thread unless you use xenapi and/or 
> > > > > > the
> > > > > > virt/firewall driver.
> > > > > > 3. The python logging module uses locks. Because of the monkey 
> > > > > > patch,
> > > > > > those locks are actually eventlet or "green" locks and may trigger a
> > > > > > green thread context switch.
> > > > > > 
> > > > > > Based on 1-3, does it make sense to say that in the other OS threads
> > > > > > (i.e. not main thread), if logging (plus other pure python library
> > > > > > code involving locking) is never used, and we do not run a eventlet
> > > > > > hub at all, we should never see this problem?
> > > > > > 
> > > > > 
> > > > > 
> > > > > That should be correct. I'd have to double check all of the monkey
> > > > > patching that eventlet does to make sure there aren't other cases 
> > > > > where
> > > > > you may inadvertently use eventlet primitives across real threads.
> > > > > 
> > > > > JE
> > > > > 
> > > > > 
> > > > > ___
> > > > > Mailing list: https://launchpad.net/~openstack
> > > > > Post to : openstack@lists.launchpad.net 
> > > > > (mailto:openstack@lists.launchpad.net)
> > > > > Unsubscribe : https://launchpad.net/~openstack
> > > > > More help : https://help.launchpad.net/ListHelp
> > > > > 
> > > > 
> > > > 
> > > > ___
> > > > Mailing list: https://launchpad.net/~openstack
> > > > Post to : openstack@lists.launchpad.net 
> > > > (mailto:openstack@lists.launchpad.net)
> > > > Unsubscribe : https://launchpad.net/~openstack
> > > > More help : https://help.launchpad.net/ListHelp
> > > > 
> > > 
> > > 
> > 
> > 
> > ___
> > Mailing list: https://launchpad.net/~openstack
> > Post to : openstack@lists.launchpad.net 
> > (mailto:openstack@lists.launchpad.net)
> > Unsubscribe : https://launchpad.net/~openstack
> > More help : https://help.launchpad.net/ListHelp
> > 
> > 

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] eventlet weirdness

2012-03-01 Thread Devin Carlen
As long as we allocate a thread in the eventlet thread pool for the number of 
mysql connections we want to actually maintain in our connection pool, we 
shouldn't have problems getting the results we want even with the blocking 
mysql c drivers. 
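
(A sketch of one way to line those sizes up; as far as I know eventlet's tpool
reads this environment variable when it first spins up its workers, so treat
the exact knob as an assumption:)

    import os
    # Assumed knob: size the native thread pool to match the DB pool.
    os.environ['EVENTLET_THREADPOOL_SIZE'] = '20'

    from eventlet import tpool   # workers are created lazily, on first use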

Devin


On Thursday, March 1, 2012 at 5:23 PM, Kapil Thangavelu wrote:

> The standard python postgresql driver (psycopg2) does have an async mode. 
> There 
> are non db api compliant async mysql drivers for gevent.
> 
> 
> Excerpts from Vishvananda Ishaya's message of 2012-03-01 15:36:43 -0500:
> > Yes it does. We actually tried to use a pool at diablo release and it was 
> > very broken. There was discussion about moving over to a pure-python mysql 
> > library, but it hasn't been tried yet.
> > 
> > Vish
> > 
> > On Mar 1, 2012, at 11:45 AM, Yun Mao wrote:
> > 
> > > There are plenty eventlet discussion recently but I'll stick my
> > > question to this thread, although it's pretty much a separate
> > > question. :)
> > > 
> > > How is MySQL access handled in eventlet? Presumably it's external C
> > > library so it's not going to be monkey patched. Does that make every
> > > db access call a blocking call? Thanks,
> > > 
> > > Yun
> > > 
> > > On Wed, Feb 29, 2012 at 9:18 PM, Johannes Erdfelt  > > (mailto:johan...@erdfelt.com)> wrote:
> > > > On Wed, Feb 29, 2012, Yun Mao  > > > (mailto:yun...@gmail.com)> wrote:
> > > > > Thanks for the explanation. Let me see if I understand this.
> > > > > 
> > > > > 1. Eventlet will never have this problem if there is only 1 OS thread
> > > > > -- let's call it main thread.
> > > > > 
> > > > 
> > > > 
> > > > In fact, that's exactly what Python calls it :)
> > > > 
> > > > > 2. In Nova, there is only 1 OS thread unless you use xenapi and/or the
> > > > > virt/firewall driver.
> > > > > 3. The python logging module uses locks. Because of the monkey patch,
> > > > > those locks are actually eventlet or "green" locks and may trigger a
> > > > > green thread context switch.
> > > > > 
> > > > > Based on 1-3, does it make sense to say that in the other OS threads
> > > > > (i.e. not main thread), if logging (plus other pure python library
> > > > > code involving locking) is never used, and we do not run a eventlet
> > > > > hub at all, we should never see this problem?
> > > > > 
> > > > 
> > > > 
> > > > That should be correct. I'd have to double check all of the monkey
> > > > patching that eventlet does to make sure there aren't other cases where
> > > > you may inadvertently use eventlet primitives across real threads.
> > > > 
> > > > JE
> > > > 
> > > > 
> > > > ___
> > > > Mailing list: https://launchpad.net/~openstack
> > > > Post to : openstack@lists.launchpad.net 
> > > > (mailto:openstack@lists.launchpad.net)
> > > > Unsubscribe : https://launchpad.net/~openstack
> > > > More help : https://help.launchpad.net/ListHelp
> > > > 
> > > 
> > > 
> > > ___
> > > Mailing list: https://launchpad.net/~openstack
> > > Post to : openstack@lists.launchpad.net 
> > > (mailto:openstack@lists.launchpad.net)
> > > Unsubscribe : https://launchpad.net/~openstack
> > > More help : https://help.launchpad.net/ListHelp
> > > 
> > 
> > 
> 
> 
> ___
> Mailing list: https://launchpad.net/~openstack
> Post to : openstack@lists.launchpad.net (mailto:openstack@lists.launchpad.net)
> Unsubscribe : https://launchpad.net/~openstack
> More help : https://help.launchpad.net/ListHelp
> 
> 


___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] eventlet weirdness

2012-03-01 Thread Kapil Thangavelu
The standard python postgresql driver (psycopg2) does have an async mode. There 
are non db api compliant async mysql drivers for gevent.
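
(A sketch of how psycopg2's async mode is usually hooked up to eventlet -- this
is the generic wait-callback pattern, not anything Nova-specific:)

    import psycopg2
    import psycopg2.extensions
    from eventlet.hubs import trampoline

    def eventlet_wait_callback(conn, timeout=None):
        # Trampoline on the connection's fd so waits yield to the hub
        # instead of blocking the OS thread.
        while True:
            state = conn.poll()
            if state == psycopg2.extensions.POLL_OK:
                break
            elif state == psycopg2.extensions.POLL_READ:
                trampoline(conn.fileno(), read=True)
            elif state == psycopg2.extensions.POLL_WRITE:
                trampoline(conn.fileno(), write=True)
            else:
                raise psycopg2.OperationalError("bad poll state: %r" % state)

    psycopg2.extensions.set_wait_callback(eventlet_wait_callback)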


Excerpts from Vishvananda Ishaya's message of 2012-03-01 15:36:43 -0500:
> Yes it does.  We actually tried to use a pool at diablo release and it was 
> very broken. There was discussion about moving over to a pure-python mysql 
> library, but it hasn't been tried yet.
> 
> Vish
> 
> On Mar 1, 2012, at 11:45 AM, Yun Mao wrote:
> 
> > There are plenty eventlet discussion recently but I'll stick my
> > question to this thread, although it's pretty much a separate
> > question. :)
> > 
> > How is MySQL access handled in eventlet? Presumably it's external C
> > library so it's not going to be monkey patched. Does that make every
> > db access call a blocking call? Thanks,
> > 
> > Yun
> > 
> > On Wed, Feb 29, 2012 at 9:18 PM, Johannes Erdfelt  
> > wrote:
> >> On Wed, Feb 29, 2012, Yun Mao  wrote:
> >>> Thanks for the explanation. Let me see if I understand this.
> >>> 
> >>> 1. Eventlet will never have this problem if there is only 1 OS thread
> >>> -- let's call it main thread.
> >> 
> >> In fact, that's exactly what Python calls it :)
> >> 
> >>> 2. In Nova, there is only 1 OS thread unless you use xenapi and/or the
> >>> virt/firewall driver.
> >>> 3. The python logging module uses locks. Because of the monkey patch,
> >>> those locks are actually eventlet or "green" locks and may trigger a
> >>> green thread context switch.
> >>> 
> >>> Based on 1-3, does it make sense to say that in the other OS threads
> >>> (i.e. not main thread), if logging (plus other pure python library
> >>> code involving locking) is never used, and we do not run a eventlet
> >>> hub at all, we should never see this problem?
> >> 
> >> That should be correct. I'd have to double check all of the monkey
> >> patching that eventlet does to make sure there aren't other cases where
> >> you may inadvertently use eventlet primitives across real threads.
> >> 
> >> JE
> >> 
> >> 
> >> ___
> >> Mailing list: https://launchpad.net/~openstack
> >> Post to : openstack@lists.launchpad.net
> >> Unsubscribe : https://launchpad.net/~openstack
> >> More help   : https://help.launchpad.net/ListHelp
> > 
> > ___
> > Mailing list: https://launchpad.net/~openstack
> > Post to : openstack@lists.launchpad.net
> > Unsubscribe : https://launchpad.net/~openstack
> > More help   : https://help.launchpad.net/ListHelp
> 

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] eventlet weirdness

2012-03-01 Thread Mark Washenberger
Someone might have already said this (sure wish the listserv sent me mail 
faster), but we tried out PyMysql and it was exceptionally slow, even under 
almost no load.

I have a branch in my github that I was using to test out unblocking the 
database access. For my cases I found that it was unblocked but that it didn't 
really help performance as much as I had hoped.

branch: https://github.com/markwash/nova/tree/optional-db-api-thread-no-pool

just the relevant commit: 
https://github.com/markwash/nova/commit/99e38d3df579670808711eb8acd1f96806d8b6f0

"Vishvananda Ishaya"  said:

> Yes it does.  We actually tried to use a pool at diablo release and it was 
> very
> broken. There was discussion about moving over to a pure-python mysql 
> library, but
> it hasn't been tried yet.
> 
> Vish
> 
> On Mar 1, 2012, at 11:45 AM, Yun Mao wrote:
> 
>> There are plenty eventlet discussion recently but I'll stick my
>> question to this thread, although it's pretty much a separate
>> question. :)
>>
>> How is MySQL access handled in eventlet? Presumably it's external C
>> library so it's not going to be monkey patched. Does that make every
>> db access call a blocking call? Thanks,
>>
>> Yun
>>
>> On Wed, Feb 29, 2012 at 9:18 PM, Johannes Erdfelt  
>> wrote:
>>> On Wed, Feb 29, 2012, Yun Mao  wrote:
 Thanks for the explanation. Let me see if I understand this.

 1. Eventlet will never have this problem if there is only 1 OS thread
 -- let's call it main thread.
>>>
>>> In fact, that's exactly what Python calls it :)
>>>
 2. In Nova, there is only 1 OS thread unless you use xenapi and/or the
 virt/firewall driver.
 3. The python logging module uses locks. Because of the monkey patch,
 those locks are actually eventlet or "green" locks and may trigger a
 green thread context switch.

 Based on 1-3, does it make sense to say that in the other OS threads
 (i.e. not main thread), if logging (plus other pure python library
 code involving locking) is never used, and we do not run a eventlet
 hub at all, we should never see this problem?
>>>
>>> That should be correct. I'd have to double check all of the monkey
>>> patching that eventlet does to make sure there aren't other cases where
>>> you may inadvertently use eventlet primitives across real threads.
>>>
>>> JE
>>>
>>>
>>> ___
>>> Mailing list: https://launchpad.net/~openstack
>>> Post to : openstack@lists.launchpad.net
>>> Unsubscribe : https://launchpad.net/~openstack
>>> More help   : https://help.launchpad.net/ListHelp
>>
>> ___
>> Mailing list: https://launchpad.net/~openstack
>> Post to : openstack@lists.launchpad.net
>> Unsubscribe : https://launchpad.net/~openstack
>> More help   : https://help.launchpad.net/ListHelp
> 
> 
> ___
> Mailing list: https://launchpad.net/~openstack
> Post to : openstack@lists.launchpad.net
> Unsubscribe : https://launchpad.net/~openstack
> More help   : https://help.launchpad.net/ListHelp
> 



___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] eventlet weirdness

2012-03-01 Thread Eric Windisch
Just because MySQL is a C library doesn't necessarily mean it can't be made to 
work with coroutines. ZeroMQ is supported through eventlet.green.zmq and there 
exists geventmysql (although it appears to me as more a proof-of-concept).

Moving to a pure-python mysql library might be the path of least resistance as 
long as we're committed to eventlet. 
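
(As an aside, a sketch of what the green zmq bindings look like in use -- the
endpoint and socket pattern here are arbitrary:)

    import eventlet
    from eventlet.green import zmq   # cooperative pyzmq bindings

    ctx = zmq.Context()

    def echo_server():
        sock = ctx.socket(zmq.REP)
        sock.bind("tcp://127.0.0.1:5555")
        while True:
            msg = sock.recv()        # waits cooperatively, hub keeps running
            sock.send(msg)

    eventlet.spawn(echo_server)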

-- 
Eric Windisch


On Thursday, March 1, 2012 at 3:36 PM, Vishvananda Ishaya wrote:

> Yes it does. We actually tried to use a pool at diablo release and it was 
> very broken. There was discussion about moving over to a pure-python mysql 
> library, but it hasn't been tried yet.
> 
> Vish
> 
> On Mar 1, 2012, at 11:45 AM, Yun Mao wrote:
> 
> > There are plenty eventlet discussion recently but I'll stick my
> > question to this thread, although it's pretty much a separate
> > question. :)
> > 
> > How is MySQL access handled in eventlet? Presumably it's external C
> > library so it's not going to be monkey patched. Does that make every
> > db access call a blocking call? Thanks,
> > 
> > Yun
> > 
> > On Wed, Feb 29, 2012 at 9:18 PM, Johannes Erdfelt  > (mailto:johan...@erdfelt.com)> wrote:
> > > On Wed, Feb 29, 2012, Yun Mao  > > (mailto:yun...@gmail.com)> wrote:
> > > > Thanks for the explanation. Let me see if I understand this.
> > > > 
> > > > 1. Eventlet will never have this problem if there is only 1 OS thread
> > > > -- let's call it main thread.
> > > 
> > > 
> > > 
> > > In fact, that's exactly what Python calls it :)
> > > 
> > > > 2. In Nova, there is only 1 OS thread unless you use xenapi and/or the
> > > > virt/firewall driver.
> > > > 3. The python logging module uses locks. Because of the monkey patch,
> > > > those locks are actually eventlet or "green" locks and may trigger a
> > > > green thread context switch.
> > > > 
> > > > Based on 1-3, does it make sense to say that in the other OS threads
> > > > (i.e. not main thread), if logging (plus other pure python library
> > > > code involving locking) is never used, and we do not run a eventlet
> > > > hub at all, we should never see this problem?
> > > 
> > > 
> > > 
> > > That should be correct. I'd have to double check all of the monkey
> > > patching that eventlet does to make sure there aren't other cases where
> > > you may inadvertently use eventlet primitives across real threads.
> > > 
> > > JE
> > > 
> > > 
> > > ___
> > > Mailing list: https://launchpad.net/~openstack
> > > Post to : openstack@lists.launchpad.net 
> > > (mailto:openstack@lists.launchpad.net)
> > > Unsubscribe : https://launchpad.net/~openstack
> > > More help : https://help.launchpad.net/ListHelp
> > 
> > 
> > 
> > ___
> > Mailing list: https://launchpad.net/~openstack
> > Post to : openstack@lists.launchpad.net 
> > (mailto:openstack@lists.launchpad.net)
> > Unsubscribe : https://launchpad.net/~openstack
> > More help : https://help.launchpad.net/ListHelp
> 
> 
> 
> 
> ___
> Mailing list: https://launchpad.net/~openstack
> Post to : openstack@lists.launchpad.net (mailto:openstack@lists.launchpad.net)
> Unsubscribe : https://launchpad.net/~openstack
> More help : https://help.launchpad.net/ListHelp




___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] eventlet weirdness

2012-03-01 Thread Brian Lamar
>> How is MySQL access handled in eventlet? Presumably it's external C
>> library so it's not going to be monkey patched. Does that make every
>> db access call a blocking call? Thanks,

> Nope, it goes through a thread pool.

I feel like this might be an over-simplification. If the question is:

"How is MySQL access handled in nova?"

The answer would be that we use SQLAlchemy which can load any number of 
SQL-drivers. These drivers can be either pure Python or C-based drivers. In the 
case of pure Python drivers, monkey patching can occur and db calls are 
non-blocking. In the case of drivers which contain C code (or perhaps other 
blocking calls), db calls will most likely be blocking.

If the question is "How is MySQL access handled in eventlet?" the answer would 
be to use the eventlet.db_pool module to allow db access using thread pools.
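
(A minimal sketch of that module in use -- connection parameters are
placeholders; TpooledConnectionPool proxies each connection through eventlet's
native thread pool so the blocking C calls run in real threads:)

    import MySQLdb
    from eventlet import db_pool

    pool = db_pool.TpooledConnectionPool(MySQLdb,
                                         host='127.0.0.1', user='nova',
                                         passwd='secret', db='nova',
                                         min_size=0, max_size=10)

    conn = pool.get()
    try:
        cur = conn.cursor()
        cur.execute("SELECT 1")
        print(cur.fetchall())
    finally:
        pool.put(conn)       # always hand the connection back to the pool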

B

-Original Message-
From: "Adam Young" 
Sent: Thursday, March 1, 2012 3:27pm
To: openstack@lists.launchpad.net
Subject: Re: [Openstack] eventlet weirdness

On 03/01/2012 02:45 PM, Yun Mao wrote:
> There are plenty eventlet discussion recently but I'll stick my
> question to this thread, although it's pretty much a separate
> question. :)
>
> How is MySQL access handled in eventlet? Presumably it's external C
> library so it's not going to be monkey patched. Does that make every
> db access call a blocking call? Thanks,

Nope, it goes through a thread pool.
>
> Yun
>
> On Wed, Feb 29, 2012 at 9:18 PM, Johannes Erdfelt  
> wrote:
>> On Wed, Feb 29, 2012, Yun Mao  wrote:
>>> Thanks for the explanation. Let me see if I understand this.
>>>
>>> 1. Eventlet will never have this problem if there is only 1 OS thread
>>> -- let's call it main thread.
>> In fact, that's exactly what Python calls it :)
>>
>>> 2. In Nova, there is only 1 OS thread unless you use xenapi and/or the
>>> virt/firewall driver.
>>> 3. The python logging module uses locks. Because of the monkey patch,
>>> those locks are actually eventlet or "green" locks and may trigger a
>>> green thread context switch.
>>>
>>> Based on 1-3, does it make sense to say that in the other OS threads
>>> (i.e. not main thread), if logging (plus other pure python library
>>> code involving locking) is never used, and we do not run a eventlet
>>> hub at all, we should never see this problem?
>> That should be correct. I'd have to double check all of the monkey
>> patching that eventlet does to make sure there aren't other cases where
>> you may inadvertently use eventlet primitives across real threads.
>>
>> JE
>>
>>
>> ___
>> Mailing list: https://launchpad.net/~openstack
>> Post to : openstack@lists.launchpad.net
>> Unsubscribe : https://launchpad.net/~openstack
>> More help   : https://help.launchpad.net/ListHelp
> ___
> Mailing list: https://launchpad.net/~openstack
> Post to : openstack@lists.launchpad.net
> Unsubscribe : https://launchpad.net/~openstack
> More help   : https://help.launchpad.net/ListHelp


___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp



___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] eventlet weirdness

2012-03-01 Thread Chris Behrens

On Mar 1, 2012, at 12:36 PM, Vishvananda Ishaya wrote:

> Yes it does.  We actually tried to use a pool at diablo release and it was 
> very broken. There was discussion about moving over to a pure-python mysql 
> library, but it hasn't been tried yet.
> 

I know some people have tried this... and the performance is...  not great.

- Chris


___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] eventlet weirdness

2012-03-01 Thread Chris Behrens

On Mar 1, 2012, at 12:27 PM, Adam Young wrote:

> On 03/01/2012 02:45 PM, Yun Mao wrote:
>> There are plenty eventlet discussion recently but I'll stick my
>> question to this thread, although it's pretty much a separate
>> question. :)
>> 
>> How is MySQL access handled in eventlet? Presumably it's external C
>> library so it's not going to be monkey patched. Does that make every
>> db access call a blocking call? Thanks,
> 
> Nope, it goes through a thread pool.

Actually, it doesn't use a thread pool right now...  so it does block, unless 
something has changed recently that I'm not aware of.  We were using the 
eventlet db_pool code, but we had to remove it at diablo release time due to 
issues.  Correct me if this is wrong.  I'm not sure it's ever been completely 
revisited, but this is definitely a huge issue for scaling.  It's been on my 
list for a while to take a look at.

- Chris




___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] eventlet weirdness

2012-03-01 Thread Vishvananda Ishaya
Yes it does.  We actually tried to use a pool at diablo release and it was very 
broken. There was discussion about moving over to a pure-python mysql library, 
but it hasn't been tried yet.

Vish

On Mar 1, 2012, at 11:45 AM, Yun Mao wrote:

> There are plenty eventlet discussion recently but I'll stick my
> question to this thread, although it's pretty much a separate
> question. :)
> 
> How is MySQL access handled in eventlet? Presumably it's external C
> library so it's not going to be monkey patched. Does that make every
> db access call a blocking call? Thanks,
> 
> Yun
> 
> On Wed, Feb 29, 2012 at 9:18 PM, Johannes Erdfelt  
> wrote:
>> On Wed, Feb 29, 2012, Yun Mao  wrote:
>>> Thanks for the explanation. Let me see if I understand this.
>>> 
>>> 1. Eventlet will never have this problem if there is only 1 OS thread
>>> -- let's call it main thread.
>> 
>> In fact, that's exactly what Python calls it :)
>> 
>>> 2. In Nova, there is only 1 OS thread unless you use xenapi and/or the
>>> virt/firewall driver.
>>> 3. The python logging module uses locks. Because of the monkey patch,
>>> those locks are actually eventlet or "green" locks and may trigger a
>>> green thread context switch.
>>> 
>>> Based on 1-3, does it make sense to say that in the other OS threads
>>> (i.e. not main thread), if logging (plus other pure python library
>>> code involving locking) is never used, and we do not run a eventlet
>>> hub at all, we should never see this problem?
>> 
>> That should be correct. I'd have to double check all of the monkey
>> patching that eventlet does to make sure there aren't other cases where
>> you may inadvertently use eventlet primitives across real threads.
>> 
>> JE
>> 
>> 
>> ___
>> Mailing list: https://launchpad.net/~openstack
>> Post to : openstack@lists.launchpad.net
>> Unsubscribe : https://launchpad.net/~openstack
>> More help   : https://help.launchpad.net/ListHelp
> 
> ___
> Mailing list: https://launchpad.net/~openstack
> Post to : openstack@lists.launchpad.net
> Unsubscribe : https://launchpad.net/~openstack
> More help   : https://help.launchpad.net/ListHelp


___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] eventlet weirdness

2012-03-01 Thread Adam Young

On 03/01/2012 02:45 PM, Yun Mao wrote:

There are plenty eventlet discussion recently but I'll stick my
question to this thread, although it's pretty much a separate
question. :)

How is MySQL access handled in eventlet? Presumably it's external C
library so it's not going to be monkey patched. Does that make every
db access call a blocking call? Thanks,


Nope, it goes through a thread pool.


Yun

On Wed, Feb 29, 2012 at 9:18 PM, Johannes Erdfelt  wrote:

On Wed, Feb 29, 2012, Yun Mao  wrote:

Thanks for the explanation. Let me see if I understand this.

1. Eventlet will never have this problem if there is only 1 OS thread
-- let's call it main thread.

In fact, that's exactly what Python calls it :)


2. In Nova, there is only 1 OS thread unless you use xenapi and/or the
virt/firewall driver.
3. The python logging module uses locks. Because of the monkey patch,
those locks are actually eventlet or "green" locks and may trigger a
green thread context switch.

Based on 1-3, does it make sense to say that in the other OS threads
(i.e. not main thread), if logging (plus other pure python library
code involving locking) is never used, and we do not run a eventlet
hub at all, we should never see this problem?

That should be correct. I'd have to double check all of the monkey
patching that eventlet does to make sure there aren't other cases where
you may inadvertently use eventlet primitives across real threads.

JE


___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp



___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] eventlet weirdness

2012-03-01 Thread Yun Mao
It seems that there used to be a db_pool in session.py but got removed
by this commit.

https://github.com/openstack/nova/commit/f3dd56e916232e38e74d9e2f24ce9a738cac63cf

due to this bug: https://bugs.launchpad.net/nova/+bug/838581

But still I'm confused by the discussion. Are we saying eventlet +
sqlalchemy + mysql pool is buggy so instead we make every DB call a
blocking call? Thanks,

Yun

On Thu, Mar 1, 2012 at 2:45 PM, Yun Mao  wrote:
> There are plenty eventlet discussion recently but I'll stick my
> question to this thread, although it's pretty much a separate
> question. :)
>
> How is MySQL access handled in eventlet? Presumably it's external C
> library so it's not going to be monkey patched. Does that make every
> db access call a blocking call? Thanks,

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] eventlet weirdness

2012-03-01 Thread Yun Mao
There are plenty eventlet discussion recently but I'll stick my
question to this thread, although it's pretty much a separate
question. :)

How is MySQL access handled in eventlet? Presumably it's external C
library so it's not going to be monkey patched. Does that make every
db access call a blocking call? Thanks,

Yun

On Wed, Feb 29, 2012 at 9:18 PM, Johannes Erdfelt  wrote:
> On Wed, Feb 29, 2012, Yun Mao  wrote:
>> Thanks for the explanation. Let me see if I understand this.
>>
>> 1. Eventlet will never have this problem if there is only 1 OS thread
>> -- let's call it main thread.
>
> In fact, that's exactly what Python calls it :)
>
>> 2. In Nova, there is only 1 OS thread unless you use xenapi and/or the
>> virt/firewall driver.
>> 3. The python logging module uses locks. Because of the monkey patch,
>> those locks are actually eventlet or "green" locks and may trigger a
>> green thread context switch.
>>
>> Based on 1-3, does it make sense to say that in the other OS threads
>> (i.e. not main thread), if logging (plus other pure python library
>> code involving locking) is never used, and we do not run a eventlet
>> hub at all, we should never see this problem?
>
> That should be correct. I'd have to double check all of the monkey
> patching that eventlet does to make sure there aren't other cases where
> you may inadvertently use eventlet primitives across real threads.
>
> JE
>
>
> ___
> Mailing list: https://launchpad.net/~openstack
> Post to     : openstack@lists.launchpad.net
> Unsubscribe : https://launchpad.net/~openstack
> More help   : https://help.launchpad.net/ListHelp

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] eventlet weirdness

2012-02-29 Thread Johannes Erdfelt
On Wed, Feb 29, 2012, Yun Mao  wrote:
> Thanks for the explanation. Let me see if I understand this.
> 
> 1. Eventlet will never have this problem if there is only 1 OS thread
> -- let's call it main thread.

In fact, that's exactly what Python calls it :)

> 2. In Nova, there is only 1 OS thread unless you use xenapi and/or the
> virt/firewall driver.
> 3. The python logging module uses locks. Because of the monkey patch,
> those locks are actually eventlet or "green" locks and may trigger a
> green thread context switch.
> 
> Based on 1-3, does it make sense to say that in the other OS threads
> (i.e. not main thread), if logging (plus other pure python library
> code involving locking) is never used, and we do not run a eventlet
> hub at all, we should never see this problem?

That should be correct. I'd have to double check all of the monkey
patching that eventlet does to make sure there aren't other cases where
you may inadvertently use eventlet primitives across real threads.

JE


___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] eventlet weirdness

2012-02-29 Thread Yun Mao
Thanks for the explanation. Let me see if I understand this.

1. Eventlet will never have this problem if there is only 1 OS thread
-- let's call it main thread.
2. In Nova, there is only 1 OS thread unless you use xenapi and/or the
virt/firewall driver.
3. The python logging module uses locks. Because of the monkey patch,
those locks are actually eventlet or "green" locks and may trigger a
green thread context switch.

Based on 1-3, does it make sense to say that in the other OS threads
(i.e. not main thread), if logging (plus other pure python library
code involving locking) is never used, and we do not run an eventlet
hub at all, we should never see this problem?
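
A minimal sketch of what point 3 means in practice, assuming the usual
eventlet.monkey_patch() call at service startup (illustrative only, not
Nova code):

import eventlet
eventlet.monkey_patch()

import threading

# After monkey patching, stdlib locks (including the ones the logging
# module takes internally) are eventlet green locks; acquiring a contended
# one yields to the hub instead of blocking the OS thread.
lock = threading.Lock()
print(type(lock))  # expected to be an eventlet green lock type, not thread.lock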

Thanks,

Yun

On Wed, Feb 29, 2012 at 5:24 PM, Johannes Erdfelt  wrote:
> On Wed, Feb 29, 2012, Yun Mao  wrote:
>> we sometimes notice this error message which prevent us from starting
>> nova services occasionally. We are using a somewhat modified diablo
>> stable release on Ubuntu 11.10. It may very well be the problem from
>> our patches but I'm wondering if you guys have any insight. In what
>> condition does this error occur? There is a similar bug in here:
>> https://bugs.launchpad.net/nova/+bug/831599
>>
>> but that doesn't offer much insight to me. Helps are very appreciated. 
>> Thanks,
>
> greenlet threads (used by eventlet) can't be scheduled across real
> threads. This usually isn't done explicitly, but can happen as a side
> effect if code uses locks. logging is one instance that I've run into.
>
> This generally hasn't been a problem with nova since it uses the
> eventlet monkey patching that makes it hard to generate real threads.
>
> There are two places (at least in trunk) where you need to be careful,
> both nova/virt/xenapi_conn.py and libvirt/firewall.py use tpool which
> does create a real thread in the background.
>
> If you use logging (and it's not the only source of this problem) then
> you can run into this eventlet message.
>
> JE
>
>
> ___
> Mailing list: https://launchpad.net/~openstack
> Post to     : openstack@lists.launchpad.net
> Unsubscribe : https://launchpad.net/~openstack
> More help   : https://help.launchpad.net/ListHelp

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] eventlet weirdness

2012-02-29 Thread Michael Pittaro
On Wed, Feb 29, 2012 at 1:02 PM, Yun Mao  wrote:
> Hi,
>
> we sometimes notice this error message which prevent us from starting
> nova services occasionally. We are using a somewhat modified diablo
> stable release on Ubuntu 11.10. It may very well be the problem from
> our patches but I'm wondering if you guys have any insight. In what
> condition does this error occur? There is a similar bug in here:
> https://bugs.launchpad.net/nova/+bug/831599
>
> but that doesn't offer much insight to me. Helps are very appreciated. Thanks,
>
> Yun

One tip - make sure you capture stdout/stderr, as well as logs.

Although I haven't seen this particular error, I have seen at least
one case where libvirt-related errors weren't in the log, but made
it to the console.

mike

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] eventlet weirdness

2012-02-29 Thread Johannes Erdfelt
On Wed, Feb 29, 2012, Yun Mao  wrote:
> we sometimes notice this error message which prevent us from starting
> nova services occasionally. We are using a somewhat modified diablo
> stable release on Ubuntu 11.10. It may very well be the problem from
> our patches but I'm wondering if you guys have any insight. In what
> condition does this error occur? There is a similar bug in here:
> https://bugs.launchpad.net/nova/+bug/831599
> 
> but that doesn't offer much insight to me. Helps are very appreciated. Thanks,

greenlet threads (used by eventlet) can't be scheduled across real
threads. This usually isn't done explicitly, but can happen as a side
effect if code uses locks. logging is one instance that I've run into.

This generally hasn't been a problem with nova since it uses the
eventlet monkey patching that makes it hard to generate real threads.

There are two places (at least in trunk) where you need to be careful,
both nova/virt/xenapi_conn.py and libvirt/firewall.py use tpool which
does create a real thread in the background.

If you use logging (and it's not the only source of this problem) then
you can run into this eventlet message.
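
A stripped-down sketch of that failure mode (hypothetical code, not taken
from nova; whether it actually trips depends on lock contention, but this
is the shape of the problem):

import eventlet
eventlet.monkey_patch()

import logging
from eventlet import tpool

logging.basicConfig(level=logging.DEBUG)
LOG = logging.getLogger(__name__)

def worker():
    # This runs in a real OS thread owned by tpool. logging's internal lock
    # was monkey patched into a green lock, so contending on it here asks
    # the greenlet machinery to switch from the wrong OS thread, which is
    # what produces "error: cannot switch to a different thread".
    LOG.debug("inside tpool worker")

tpool.execute(worker)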

JE


___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] eventlet weirdness

2012-02-29 Thread Mark Gius
I have been encountering these quite a bit myself recently in another
project.

For me the errors were a result of tpool.execute() in a non-cooperative
thread context.  My guess as to the root cause is that some of eventlet's
cooperative waiting code is not safe to use when not running in an eventlet
coroutine context.  My solution (which may not work for you) involved
switching based on whether I'm in a greenthread or not, and either calling
tpool.execute() or calling the underlying function directly.  Fortunately for me I
can know at compile time what context I will be in. I think there is a way
to query eventlet to see if you are currently in a greenthread or not, but
I haven't finished diving into that documentation yet.
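
One sketch of that kind of switch (an assumption on my part, not something
I've verified against your code): testing whether the current greenlet is an
eventlet.greenthread.GreenThread. The main greenlet is not a GreenThread
instance, so treat this as a heuristic rather than a complete answer.

import eventlet
from eventlet import greenthread, tpool

def run_blocking(func, *args, **kwargs):
    # Hypothetical helper: only detour through tpool when we are running
    # inside a spawned greenthread; otherwise call the function directly.
    if isinstance(eventlet.getcurrent(), greenthread.GreenThread):
        return tpool.execute(func, *args, **kwargs)
    return func(*args, **kwargs)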

Good luck,
Mark

On Wed, Feb 29, 2012 at 1:02 PM, Yun Mao  wrote:

> Hi,
>
> we sometimes notice this error message which prevent us from starting
> nova services occasionally. We are using a somewhat modified diablo
> stable release on Ubuntu 11.10. It may very well be the problem from
> our patches but I'm wondering if you guys have any insight. In what
> condition does this error occur? There is a similar bug in here:
> https://bugs.launchpad.net/nova/+bug/831599
>
> but that doesn't offer much insight to me. Helps are very appreciated.
> Thanks,
>
> Yun
>
> 2012-02-23 16:54:52,788 DEBUG nova.utils
> [43f98259-6ba8-4e5d-bc0e-9eab978194e5 None None] backend <module 'nova.db.sqlalchemy.api' from
> '/opt/stack/nova/nova/db/sqlalchemy/api.pyc'> from (pid=6385)
> __get_backend /opt/stack/nova/nova/utils.py:449
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/dist-packages/eventlet/hubs/hub.py", line 336, in fire_timers
>     timer()
>   File "/usr/lib/python2.7/dist-packages/eventlet/hubs/timer.py", line 56, in __call__
>     cb(*args, **kw)
>   File "/usr/lib/python2.7/dist-packages/eventlet/semaphore.py", line 95, in _do_acquire
>     waiter.switch()
> error: cannot switch to a different thread
>
> ___
> Mailing list: https://launchpad.net/~openstack
> Post to : openstack@lists.launchpad.net
> Unsubscribe : https://launchpad.net/~openstack
> More help   : https://help.launchpad.net/ListHelp
>
___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


[Openstack] eventlet weirdness

2012-02-29 Thread Yun Mao
Hi,

we sometimes notice this error message, which occasionally prevents us from
starting nova services. We are using a somewhat modified diablo stable
release on Ubuntu 11.10. It may very well be a problem from our own patches,
but I'm wondering if you have any insight. Under what conditions does this
error occur? There is a similar bug here:
https://bugs.launchpad.net/nova/+bug/831599

but that doesn't offer much insight to me. Help is very much appreciated. Thanks,

Yun

2012-02-23 16:54:52,788 DEBUG nova.utils
[43f98259-6ba8-4e5d-bc0e-9eab978194e5 None None] backend <module 'nova.db.sqlalchemy.api' from
'/opt/stack/nova/nova/db/sqlalchemy/api.pyc'> from (pid=6385)
__get_backend /opt/stack/nova/nova/utils.py:449
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/eventlet/hubs/hub.py", line 336, in fire_timers
    timer()
  File "/usr/lib/python2.7/dist-packages/eventlet/hubs/timer.py", line 56, in __call__
    cb(*args, **kw)
  File "/usr/lib/python2.7/dist-packages/eventlet/semaphore.py", line 95, in _do_acquire
    waiter.switch()
error: cannot switch to a different thread

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp