Re: [openstack-dev] [nova] [all] Excessively high greenlet default + excessively low connection pool defaults leads to connection pool latency, timeout errors, idle database connections / workers

2016-02-23 Thread Roman Podoliaka
That's what I tried first :)

For some reason load distribution was still uneven. I'll check this
again, maybe I missed something.

On Tue, Feb 23, 2016 at 5:37 PM, Chris Friesen wrote:
> On 02/23/2016 05:25 AM, Roman Podoliaka wrote:
>
>> So it looks like there are two related problems here:
>>
>> 1) The distribution of load between workers is uneven. One way to fix
>> this is to decrease the default number of greenlets in the pool [2],
>> which will effectively cause a particular worker to give up new
>> connections to other forks as soon as there are no more greenlets
>> available in its pool to process incoming requests. But this alone will
>> *only* be effective when the concurrency level is greater than the
>> number of greenlets in the pool. Another way would be to add a context
>> switch to the eventlet accept() loop [8] right after spawn_n() - this is
>> what I've got with greenthread.sleep(0.05) [9][10] (the trade-off is
>> that we can now only accept() 1 / 0.05 = 20 new connections per second
>> per worker - I'll try to experiment with the numbers here).
>
>
> Would greenthread.sleep(0) be enough to trigger a context switch?
>
> Chris
>
>



Re: [openstack-dev] [nova] [all] Excessively high greenlet default + excessively low connection pool defaults leads to connection pool latency, timeout errors, idle database connections / workers

2016-02-23 Thread Chris Friesen

On 02/23/2016 05:25 AM, Roman Podoliaka wrote:


So it looks like there are two related problems here:

1) The distribution of load between workers is uneven. One way to fix
this is to decrease the default number of greenlets in the pool [2],
which will effectively cause a particular worker to give up new
connections to other forks as soon as there are no more greenlets
available in its pool to process incoming requests. But this alone will
*only* be effective when the concurrency level is greater than the
number of greenlets in the pool. Another way would be to add a context
switch to the eventlet accept() loop [8] right after spawn_n() - this is
what I've got with greenthread.sleep(0.05) [9][10] (the trade-off is
that we can now only accept() 1 / 0.05 = 20 new connections per second
per worker - I'll try to experiment with the numbers here).


Would greenthread.sleep(0) be enough to trigger a context switch?

Chris



Re: [openstack-dev] [nova] [all] Excessively high greenlet default + excessively low connection pool defaults leads to connection pool latency, timeout errors, idle database connections / workers

2016-02-23 Thread Roman Podoliaka
Hi all,

I've taken another look at this in order to propose patches to
oslo.service/oslo.db, so that we have better defaults for the number of
WSGI greenlets / max DB connection overflow [1] [2] that are more
suitable for DB-oriented services like our APIs.

I used Mike's snippet [3] for testing: 10 workers (i.e. forks) served
the WSGI app, the ab concurrency level was set to 100, and 3000 requests
were sent.

With our default settings (1000 greenlets per worker, 5 connections in
the DB pool, 10 connections max overflow, 30-second timeout waiting
for a connection to become available), ~10-15 requests out of 3000
fail with a 500 due to pool timeout errors on every run [4].

As expected, load is distributed unevenly between workers: htop shows
that one worker is busy while the others are not [5]. Tracing
accept() calls with perf-events (sudo perf trace -e accept --pid=$PIDS
-S) lets us see the exact number of requests served by each worker
[6] - the "busy" worker served almost twice as many WSGI requests as
any other worker did. The perf output [7] shows an interesting pattern:
each eventlet WSGI worker sleeps in accept(), waiting for new
connections to become available in the queue handled by the kernel;
when a new connection becomes available, a random worker wakes up and
tries to accept() as many connections as possible.

Reading the source code of the eventlet WSGI server [8] suggests that it
will accept() new connections as long as they are available (and as
long as there are greenthreads available in the pool) before starting
to process the already accept()'ed ones (spawn_n() only creates a new
greenthread and schedules it to be executed "later"). Given that we
have 1000 greenlets in the pool, there is a high probability we'll
end up with an overloaded worker. If handling these requests involves
DB queries, we have only 5 (pool) + 10 (max overflow) DB connections
available; the rest will have to wait (and may eventually time out
after 30 seconds).
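
To illustrate the pattern (a simplified sketch, not eventlet's actual
code; the names are illustrative):

import eventlet

pool = eventlet.GreenPool(1000)   # our default: 1000 greenlets per worker

def serve_forever(listen_socket, handle_request):
    while True:
        # accept() only yields to the hub when no connection is pending, so
        # as long as connections are queued by the kernel and the pool has
        # free capacity, this loop keeps accept()ing without running handlers.
        client, addr = listen_socket.accept()
        # spawn_n() just creates a greenthread and schedules it; the handler
        # does not actually run until this loop blocks or yields.
        pool.spawn_n(handle_request, client, addr)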

So it looks like there are two related problems here:

1) The distribution of load between workers is uneven. One way to fix
this is to decrease the default number of greenlets in the pool [2],
which will effectively cause a particular worker to give up new
connections to other forks as soon as there are no more greenlets
available in its pool to process incoming requests. But this alone will
*only* be effective when the concurrency level is greater than the
number of greenlets in the pool. Another way would be to add a context
switch to the eventlet accept() loop [8] right after spawn_n() - this is
what I've got with greenthread.sleep(0.05) [9][10] (the trade-off is
that we can now only accept() 1 / 0.05 = 20 new connections per second
per worker - I'll try to experiment with the numbers here). A sketch of
the modified loop follows after point 2 below.

2) Even if the load is distributed evenly, we still have to be able
to process requests at the maximum level of concurrency, which is
effectively set by the number of greenlets in the pool. For DB-oriented
services that means we need to have DB connections available. [1]
increases the default max_overflow value to allow SQLAlchemy to open
additional connections to the DB and handle spikes of concurrent
requests. Increasing the max_overflow value much further will probably
lead to "max connections" errors on the RDBMS server.
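
For reference, here is a sketch of the context-switch change from point
1, continuing the sketch above (the exact placement inside eventlet's
wsgi.py may differ; 0.05 is just the value I experimented with):

    while True:
        client, addr = listen_socket.accept()
        pool.spawn_n(handle_request, client, addr)
        # Yield to the hub after every accept() so the handler greenthreads
        # run and other forks get a chance to win the next accept(). Sleeping
        # 0.05 s caps this worker at roughly 1 / 0.05 = 20 accept()s/second.
        eventlet.greenthread.sleep(0.05)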

As was already mentioned in this thread, the rule of thumb is that
for DB-oriented WSGI services the max_overflow value should be at
least close to the number of greenlets. Running tests on my machine
shows that having 100 greenlets in the pool / 5 DB connections in the
pool / 50 max_overflow / a 30-second pool timeout allows handling up to
500 concurrent requests without seeing pool timeout errors.
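
Expressed as explicit settings, that tested combination looks roughly
like this (a sketch only - the connection string is hypothetical, and
the real defaults are what [1] and [2] propose to change):

import eventlet
from sqlalchemy import create_engine

wsgi_pool = eventlet.GreenPool(100)                # greenlets per worker

engine = create_engine(
    "mysql+pymysql://nova:secret@127.0.0.1/nova",  # hypothetical DSN
    pool_size=5,      # persistent connections kept in the pool
    max_overflow=50,  # extra connections allowed during spikes
    pool_timeout=30,  # seconds to wait for a free connection before erroring
)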

Thanks,
Roman

[1] https://review.openstack.org/#/c/269186/
[2] https://review.openstack.org/#/c/269188/
[3] https://gist.github.com/zzzeek/c69138fd0d0b3e553a1f
[4] http://paste.openstack.org/show/487867/
[5] http://imgur.com/vEWJmrd
[6] http://imgur.com/FOZ2htf
[7] http://paste.openstack.org/show/487871/
[8] https://github.com/eventlet/eventlet/blob/master/eventlet/wsgi.py#L862-L869
[9] http://paste.openstack.org/show/487874/
[10] http://imgur.com/IuukDiD

On Mon, Jan 11, 2016 at 4:05 PM, Mike Bayer  wrote:
>
>
> On 01/11/2016 05:39 AM, Radomir Dopieralski wrote:
>> On 01/08/2016 09:51 PM, Mike Bayer wrote:
>>>
>>>
>>> On 01/08/2016 04:44 AM, Radomir Dopieralski wrote:
 On 01/07/2016 05:55 PM, Mike Bayer wrote:

> but also even if you're under something like
> mod_wsgi, you can spawn a child process or worker thread regardless.
> You always have a Python interpreter running and all the things it can
> do.

 Actually you can't, reliably. Or, more precisely, you really shouldn't.
 Most web servers out there expect to do their own process/thread
 management and get really embarrassed if you do something like this,
 resulting in weird stuff happening.
>>>
>>> I have to disagree with this as an 

Re: [openstack-dev] [nova] [all] Excessively high greenlet default + excessively low connection pool defaults leads to connection pool latency, timeout errors, idle database connections / workers

2016-01-11 Thread Radomir Dopieralski

On 01/08/2016 09:51 PM, Mike Bayer wrote:



On 01/08/2016 04:44 AM, Radomir Dopieralski wrote:

On 01/07/2016 05:55 PM, Mike Bayer wrote:


but also even if you're under something like
mod_wsgi, you can spawn a child process or worker thread regardless.
You always have a Python interpreter running and all the things it can
do.


Actually you can't, reliably. Or, more precisely, you really shouldn't.
Most web servers out there expect to do their own process/thread
management and get really embarrassed if you do something like this,
resulting in weird stuff happening.


I have to disagree with this as an across-the-board rule, partially
because my own work in building an enhanced database connection
management system is probably going to require that a background thread
be running in order to reap stale database connections.   Web servers
certainly do their own process/thread management, but a thoughtfully
organized background thread in conjunction with a supporting HTTP
service allows this to be feasible.   In the case of mod_wsgi,
particularly when using mod_wsgi in daemon mode, spawning of threads,
processes and in some scenarios even wholly separate applications are
supported use cases.


[...]


It is certainly reasonable that not all web application containers would
be effective with apps that include custom background threads or
processes (even though IMO any system that's running a Python
interpreter shouldn't have any issues with a limited number of
well-behaved daemon-mode threads), but at least in the case of mod_wsgi,
this is supported; that gives Openstack's HTTP-related applications with
carefully/thoughtfully organized background threads at least one
industry-standard alternative besides being forever welded to its
current homegrown WSGI server implementation.


This is still writing your application for a specific configuration of a 
specific version of a specific implementation of the protocol on a 
specific web server. While this may work as a stopgap solution, I think
it's a really bad long-term strategy. We should be programming for a 
protocol specification (WSGI in this case), not for a particular 
implementation (unless we need to throw in workarounds for 
implementation bugs). This way of thinking led to the trouble we have 
right now, and the fix is not to change the code to exploit another 
specific implementation, but to rewrite it so that it works on any 
compatible web server. If possible.


At least it seems so to my naive programmer mind. Sorry for ranting,
I'm sure that you are aware of the trade-off here.

--
Radomir Dopieralski



Re: [openstack-dev] [nova] [all] Excessively high greenlet default + excessively low connection pool defaults leads to connection pool latency, timeout errors, idle database connections / workers

2016-01-11 Thread Mike Bayer


On 01/11/2016 05:39 AM, Radomir Dopieralski wrote:
> On 01/08/2016 09:51 PM, Mike Bayer wrote:
>>
>>
>> On 01/08/2016 04:44 AM, Radomir Dopieralski wrote:
>>> On 01/07/2016 05:55 PM, Mike Bayer wrote:
>>>
 but also even if you're under something like
 mod_wsgi, you can spawn a child process or worker thread regardless.
 You always have a Python interpreter running and all the things it can
 do.
>>>
>>> Actually you can't, reliably. Or, more precisely, you really shouldn't.
>>> Most web servers out there expect to do their own process/thread
>>> management and get really embarrassed if you do something like this,
>>> resulting in weird stuff happening.
>>
>> I have to disagree with this as an across-the-board rule, partially
>> because my own work in building an enhanced database connection
>> management system is probably going to require that a background thread
>> be running in order to reap stale database connections.   Web servers
>> certainly do their own process/thread management, but a thoughtfully
>> organized background thread in conjunction with a supporting HTTP
>> service allows this to be feasible.   In the case of mod_wsgi,
>> particularly when using mod_wsgi in daemon mode, spawning of threads,
>> processes and in some scenarios even wholly separate applications are
>> supported use cases.
> 
> [...]
> 
>> It is certainly reasonable that not all web application containers would
>> be effective with apps that include custom background threads or
>> processes (even though IMO any system that's running a Python
>> interpreter shouldn't have any issues with a limited number of
>> well-behaved daemon-mode threads), but at least in the case of mod_wsgi,
>> this is supported; that gives Openstack's HTTP-related applications with
>> carefully/thoughtfully organized background threads at least one
>> industry-standard alternative besides being forever welded to its
>> current homegrown WSGI server implementation.
> 
> This is still writing your application for a specific configuration of a
> specific version of a specific implementation of the protocol on a
> specific web server. While this may work as a stopgap solution, I think
> it's a really bad long-term strategy. We should be programming for a
> protocol specification (WSGI in this case), not for a particular
> implementation (unless we need to throw in workarounds for
> implementation bugs). 

That is fine, but then you are saying that all of those aforementioned
Nova services which do in fact use WSGI with its own homegrown eventlet
server should nevertheless be rewritten to not use any background
threads, which I also presented as the ideal choice.   Right now, the
fact that these Nova services use background threads is being used as a
justification for why these services can never move to use a proper web
server, even though they are still WSGI apps running inside of a WSGI
container, so they are already doing the thing that claims to prevent
this move from being possible.

Also, mod_wsgi's compatibility with background threads is not linked to
a "specific version", it's intrinsic in the organization of the product.
  I would wager that most other WSGI containers can probably handle this
use case as well but this would need to be confirmed.





> 
> At least it seems so to my naive programmer mind. Sorry for ranting,
> I'm sure that you are aware of the trade-off here.
> 



Re: [openstack-dev] [nova] [all] Excessively high greenlet default + excessively low connection pool defaults leads to connection pool latency, timeout errors, idle database connections / workers

2016-01-08 Thread Mike Bayer


On 01/08/2016 04:44 AM, Radomir Dopieralski wrote:
> On 01/07/2016 05:55 PM, Mike Bayer wrote:
> 
>> but also even if you're under something like
>> mod_wsgi, you can spawn a child process or worker thread regardless.
>> You always have a Python interpreter running and all the things it can
>> do.
> 
> Actually you can't, reliably. Or, more precisely, you really shouldn't.
> Most web servers out there expect to do their own process/thread
> management and get really embarrassed if you do something like this,
> resulting in weird stuff happening.

I have to disagree with this as an across-the-board rule, partially
because my own work in building an enhanced database connection
management system is probably going to require that a background thread
be running in order to reap stale database connections.   Web servers
certainly do their own process/thread management, but a thoughtfully
organized background thread in conjunction with a supporting HTTP
service allows this to be feasible.   In the case of mod_wsgi,
particularly when using mod_wsgi in daemon mode, spawning of threads,
processes and in some scenarios even wholly separate applications are
supported use cases.

In mod_wsgi daemon mode (which is its recommended mode of use [1]), the
Python interpreter is not in-process with Apache in any case, and if you
set your own thread to be a "daemon", it won't block the process from
exiting.   I have successfully used this technique (again, carefully and
thoughtfully) to achieve asynchronous workers within Apache mod_wsgi
daemon-mode processes, without negative consequences.
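
A minimal sketch of that technique (the reaper itself is hypothetical
here; the point is just the daemon thread living alongside an ordinary
WSGI callable):

import threading
import time

def _reap_stale_connections():
    # Hypothetical periodic maintenance work (e.g. closing idle DB connections).
    while True:
        time.sleep(60)
        # connection_manager.reap()   # whatever the pool implementation exposes

_reaper = threading.Thread(target=_reap_stale_connections)
_reaper.daemon = True   # a daemon thread never blocks interpreter/process shutdown,
_reaper.start()         # which is what keeps it well-behaved under a managing web server

def application(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello\n"]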

Graham Dumpleton's own mod_wsgi documentation illustrates how to run a
background thread on development servers in mod_wsgi daemon mode in
order to achieve code reloading [2], and he also has produced a tool [3]
that uses a background thread in order to provide a debugging shell to a
running WSGI application which can work in either embedded or daemon
mode.

In [4], he illustrates using the WSGIImportScript mod_wsgi directive
under mod_wsgi daemon mode to actually spawn a whole docker container
when an Apache mod_wsgi process starts up; this isn't something I'd want
to do myself, but this is the author of mod_wsgi illustrating even
something as heavy as spinning up a whole docker instance under mod_wsgi
which then runs its own WSGI process, as a supported technique.

It is certainly reasonable that not all web application containers would
be effective with apps that include custom background threads or
processes (even though IMO any system that's running a Python
interpreter shouldn't have any issues with a limited number of
well-behaved daemon-mode threads), but at least in the case of mod_wsgi,
this is supported; that gives Openstack's HTTP-related applications with
carefully/thoughtfully organized background threads at least one
industry-standard alternative besides being forever welded to its
current homegrown WSGI server implementation.

[1] http://lanyrd.com/2013/pycon/scdyzk/

[2] https://code.google.com/p/modwsgi/wiki/ReloadingSourceCode

[3] https://github.com/GrahamDumpleton/ispyd

[4]
http://blog.dscpl.com.au/2015/07/using-apache-to-start-and-manage-docker.html




Re: [openstack-dev] [nova] [all] Excessively high greenlet default + excessively low connection pool defaults leads to connection pool latency, timeout errors, idle database connections / workers

2016-01-08 Thread Chris Friesen

On 01/07/2016 06:55 PM, Mike Bayer wrote:



On 01/07/2016 11:02 AM, Sean Dague wrote:

On 01/07/2016 09:56 AM, Brant Knudson wrote:



On Thu, Jan 7, 2016 at 6:39 AM, Clayton O'Neill wrote:

 On Thu, Jan 7, 2016 at 2:49 AM, Roman Podoliaka wrote:
 >
 > Linux gurus please correct me here, but my understanding is that Linux
 > kernel queues up to $backlog number of connections *per socket*. In
 > our case child processes inherited the FD of the socket, so they will
 > accept() connections from the same queue in the kernel, i.e. the
 > backlog value is for *all* child processes, not *per* process.


 Yes, it will be shared across all children.

 >
 > In each child process eventlet WSGI server calls accept() in a loop to
 > get a client socket from the kernel and then puts into a greenlet from
 > a pool for processing:

 It’s worse than that.  What I’ve seen (via strace) is that eventlet
 actually converts the socket into a non-blocking socket, then converts
 that accept() into an epoll()/accept() pair in every child.  Then when
 a connection comes in, every child process wakes up out of poll and
 races to try to accept on the non-blocking socket, and all but one of
 them fails.

 This means that every time there is a request, every child process is
 woken up, scheduled on the CPU and then put back to sleep.  This is one
 of the reasons we’re (slowly) moving to uWSGI.


I just want to note that I've got a change proposed to devstack that
adds a config option to run keystone in uwsgi (rather than under
eventlet or in apache httpd mod_wsgi), see
https://review.openstack.org/#/c/257571/ . It's specific to keystone
since I didn't think other projects were moving away from eventlet, too.


I feel like this is a confused point that keeps being brought up.

The preferred long term direction of all API services is to be deployed
on a real web server platform. It's a natural fit for those services as
they are accepting HTTP requests and doing things with them.

Most OpenStack projects have worker services beyond just an HTTP server.
(Keystone is one of the very few exceptions here). Nova has nearly a
dozen of these worker services. These don't naturally fit as wsgi apps,
they are more traditional daemons, which accept requests over the
network, but also have periodic jobs internally and self initiate
actions. They are not just call / response. There is no long term
direction for these to move off of eventlet.


This is totally speaking as an outsider without taking into account all
the history of these decisions, but the notion of "Python + we're a
daemon" == "we must use eventlet" seems a little bit rigid.  Also, the
notion of "we have background tasks" == "we can't run in a web server",
also not clear.  If a service intends to serve HTTP requests, that
portion of that service should be deployed in a web server; if the
system has other "background tasks", ideally those are in a separate
daemon altogether, but also even if you're under something like
mod_wsgi, you can spawn a child process or worker thread regardless.
You always have a Python interpreter running and all the things it can do.


In the case of nova at least most (all?) of these separate worker services do 
not process HTTP requests, but rather RPC requests.


It might make sense for nova-api to run under a web server even if 
nova-compute/nova-conductor/nova-scheduler/etc don't.


Chris




Re: [openstack-dev] [nova] [all] Excessively high greenlet default + excessively low connection pool defaults leads to connection pool latency, timeout errors, idle database connections / workers

2016-01-08 Thread Chris Friesen

On 01/07/2016 05:44 PM, Mike Bayer wrote:



On 01/07/2016 07:39 AM, Clayton O'Neill wrote:

On Thu, Jan 7, 2016 at 2:49 AM, Roman Podoliaka  wrote:


Linux gurus please correct me here, but my understanding is that Linux
kernel queues up to $backlog number of connections *per socket*. In
our case child processes inherited the FD of the socket, so they will
accept() connections from the same queue in the kernel, i.e. the
backlog value is for *all* child processes, not *per* process.



Yes, it will be shared across all children.



In each child process eventlet WSGI server calls accept() in a loop to
get a client socket from the kernel and then puts into a greenlet from
a pool for processing:


It’s worse than that.  What I’ve seen (via strace) is that eventlet actually
converts the socket into a non-blocking socket, then converts that accept()
into an epoll()/accept() pair in every child.  Then when a connection comes
in, every child process wakes up out of poll and races to try to accept on
the non-blocking socket, and all but one of them fails.


is that eventlet-specific or would we see the same thing in gevent?


If you've got multiple processes all doing select()/poll()/epoll()/etc on a 
single socket that has become readable, you're going to run into this sort of 
thundering herd problem unless you have a separate mechanism to serialize things.
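
A rough sketch of the race being described (each forked child runs this
independently against the same inherited listening socket; only one
accept() wins per connection and the rest see EAGAIN):

import errno
import select
import socket

def child_accept_loop(listen_sock):
    listen_sock.setblocking(False)          # eventlet makes the socket non-blocking
    ep = select.epoll()
    ep.register(listen_sock.fileno(), select.EPOLLIN)
    while True:
        ep.poll()                           # every child wakes up for every new connection
        try:
            client, _addr = listen_sock.accept()   # only one child wins the race...
        except socket.error as exc:
            if exc.errno in (errno.EAGAIN, errno.EWOULDBLOCK):
                continue                    # ...the others get EAGAIN and go back to waiting
            raise
        client.close()                      # (handle the request here instead)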


Chris



Re: [openstack-dev] [nova] [all] Excessively high greenlet default + excessively low connection pool defaults leads to connection pool latency, timeout errors, idle database connections / workers

2016-01-08 Thread Radomir Dopieralski

On 01/07/2016 05:55 PM, Mike Bayer wrote:


but also even if you're under something like
mod_wsgi, you can spawn a child process or worker thread regardless.
You always have a Python interpreter running and all the things it can do.


Actually you can't, reliably. Or, more precisely, you really shouldn't.
Most web servers out there expect to do their own process/thread
management and get really embarrassed if you do something like this,
resulting in weird stuff happening.

--
Radomir Dopieralski



Re: [openstack-dev] [nova] [all] Excessively high greenlet default + excessively low connection pool defaults leads to connection pool latency, timeout errors, idle database connections / workers

2016-01-08 Thread Chris Friesen

On 01/07/2016 09:49 AM, Roman Podoliaka wrote:

Actually we already do that in the parent process. The parent process:

1) starts and creates a socket

2) binds the socket and calls listen() on it passing the backlog value
(http://linux.die.net/man/2/listen)

3) passes the socket to the eventlet WSGI server
(https://github.com/openstack/oslo.service/blob/master/oslo_service/wsgi.py#L177-L192)

4) forks $*_workers times (child processes inherit all open file
descriptors including the socket one)

5) child processes call accept() in a loop

Linux gurus please correct me here, but my understanding is that Linux
kernel queues up to $backlog number of connections *per socket*. In
our case child processes inherited the FD of the socket, so they will
accept() connections from the same queue in the kernel, i.e. the
backlog value is for *all* child processes, not *per* process.


I believe this is correct; the limit is on the (shared) socket, not on
the individual processes.


Also, an interesting point from the listen man page above:

"If the backlog argument is greater than the value in 
/proc/sys/net/core/somaxconn, then it is silently truncated to that value; the 
default value in this file is 128."
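
A minimal sketch of the shared pre-fork socket described in steps 1-5
above; the backlog passed to listen() is the per-socket queue that all
the forked children accept() from, and it's silently capped at somaxconn:

import os
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(("0.0.0.0", 8774))   # port is just an example
sock.listen(128)               # one backlog queue per socket, shared by all children below

children = []
for _ in range(4):             # e.g. 4 worker forks
    pid = os.fork()
    if pid == 0:               # child: inherits the listening FD
        while True:            # every child accept()s from the same kernel queue
            client, _addr = sock.accept()
            client.sendall(b"HTTP/1.0 200 OK\r\nContent-Length: 0\r\n\r\n")
            client.close()
    children.append(pid)

for pid in children:           # parent just waits on its children
    os.waitpid(pid, 0)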


Chris



Re: [openstack-dev] [nova] [all] Excessively high greenlet default + excessively low connection pool defaults leads to connection pool latency, timeout errors, idle database connections / workers

2016-01-07 Thread Mike Bayer


On 01/07/2016 11:02 AM, Sean Dague wrote:
> On 01/07/2016 09:56 AM, Brant Knudson wrote:
>>
>>
>> On Thu, Jan 7, 2016 at 6:39 AM, Clayton O'Neill wrote:
>>
>> On Thu, Jan 7, 2016 at 2:49 AM, Roman Podoliaka wrote:
>> >
>> > Linux gurus please correct me here, but my understanding is that Linux
>> > kernel queues up to $backlog number of connections *per socket*. In
>> > our case child processes inherited the FD of the socket, so they will
>> > accept() connections from the same queue in the kernel, i.e. the
>> > backlog value is for *all* child processes, not *per* process.
>>
>>
>> Yes, it will be shared across all children.
>>
>> >
>> > In each child process eventlet WSGI server calls accept() in a loop to
>> > get a client socket from the kernel and then puts into a greenlet from
>> > a pool for processing:
>>
>> It’s worse than that.  What I’ve seen (via strace) is that eventlet
>> actually converts the socket into a non-blocking socket, then converts
>> that accept() into an epoll()/accept() pair in every child.  Then when
>> a connection comes in, every child process wakes up out of poll and
>> races to try to accept on the non-blocking socket, and all but one of
>> them fails.
>>
>> This means that every time there is a request, every child process is
>> woken up, scheduled on the CPU and then put back to sleep.  This is one
>> of the reasons we’re (slowly) moving to uWSGI.
>>
>>
>> I just want to note that I've got a change proposed to devstack that
>> adds a config option to run keystone in uwsgi (rather than under
>> eventlet or in apache httpd mod_wsgi), see
>> https://review.openstack.org/#/c/257571/ . It's specific to keystone
>> since I didn't think other projects were moving away from eventlet, too.
> 
> I feel like this is a confused point that keeps being brought up.
> 
> The preferred long term direction of all API services is to be deployed
> on a real web server platform. It's a natural fit for those services as
> they are accepting HTTP requests and doing things with them.
> 
> Most OpenStack projects have worker services beyond just an HTTP server.
> (Keystone is one of the very few exceptions here). Nova has nearly a
> dozen of these worker services. These don't naturally fit as wsgi apps,
> they are more traditional daemons, which accept requests over the
> network, but also have periodic jobs internally and self initiate
> actions. They are not just call / response. There is no long term
> direction for these to move off of eventlet.

This is totally speaking as an outsider without taking into account all
the history of these decisions, but the notion of "Python + we're a
daemon" == "we must use eventlet" seems a little bit rigid.  Also, the
notion of "we have background tasks" == "we can't run in a web server",
also not clear.  If a service intends to serve HTTP requests, that
portion of that service should be deployed in a web server; if the
system has other "background tasks", ideally those are in a separate
daemon altogether, but also even if you're under something like
mod_wsgi, you can spawn a child process or worker thread regardless.
You always have a Python interpreter running and all the things it can do.

> 
>   -Sean
> 



Re: [openstack-dev] [nova] [all] Excessively high greenlet default + excessively low connection pool defaults leads to connection pool latency, timeout errors, idle database connections / workers

2016-01-07 Thread Clayton O'Neill
On Thu, Jan 7, 2016 at 10:44 AM, Mike Bayer  wrote:
> On 01/07/2016 07:39 AM, Clayton O'Neill wrote:
>> On Thu, Jan 7, 2016 at 2:49 AM, Roman Podoliaka  
>> wrote:
>>> In each child process eventlet WSGI server calls accept() in a loop to
>>> get a client socket from the kernel and then puts into a greenlet from
>>> a pool for processing:
>>
>> It’s worse than that.  What I’ve seen (via strace) is that eventlet actually
>> converts the socket into a non-blocking socket, then converts that accept()
>> into an epoll()/accept() pair in every child.  Then when a connection comes
>> in, every child process wakes up out of poll and races to try to accept on
>> the non-blocking socket, and all but one of them fails.
>
> is that eventlet-specific or would we see the same thing in gevent ?

I’m not sure.  For eventlet it’s a natural consequence of most of this being
implemented in Python.  It looks like some of this is implemented in C in
gevent, so they may handle the situation differently.



Re: [openstack-dev] [nova] [all] Excessively high greenlet default + excessively low connection pool defaults leads to connection pool latency, timeout errors, idle database connections / workers

2016-01-06 Thread Roman Podoliaka
Hi Mike,

Thank you for this brilliant analysis! We've been seeing such timeout
errors in downstream periodically and this is the first time someone
has analysed the root cause thoroughly.

On Fri, Dec 18, 2015 at 10:33 PM, Mike Bayer  wrote:
> Hi all -
>
> Let me start out with the assumptions I'm going from for what I want to
> talk about.
>
> 1. I'm looking at Nova right now, but I think similar things are going
> on in other Openstack apps.
>
> 2. Settings that we see in nova.conf, including:
>
> #wsgi_default_pool_size = 1000
> #max_pool_size = 
> #max_overflow = 
> #osapi_compute_workers = 
> #metadata_workers = 
>
>
> are often not understood by deployers, and/or are left unchanged in a
> wide variety of scenarios.If you are in fact working for deployers
> that *do* change these values to something totally different, then you
> might not be impacted here, and if it turns out that everyone changes
> all these settings in real-world scenarios and zzzeek you are just being
> silly thinking nobody sets these appropriately, then fooey for me, I guess.

My understanding is that the DB connection pool / worker count options
are usually changed, while the number of eventlet greenlets is not:

http://codesearch.openstack.org/?q=wsgi_default_pool_size=nope==
http://codesearch.openstack.org/?q=max_pool_size=nope==

I think that's for "historical" reasons, from when MySQL-Python was
considered to be the default DB API driver and we had to work around its
concurrency issues with eventlet by using multiple forks of services.

But as you point out, even with a non-blocking DB API driver like
pymysql we still have problems with timeouts due to the pool vs.
greenlet number settings.

> 3. There's talk about more Openstack services, at least Nova from what I
> heard the other day, moving to be based on a real webserver deployment
> in any case, the same way Keystone is.   To the degree this is true, it
> would also mitigate what I'm seeing, but still, there are good changes
> that can be made here.

I think ideally we'd like to have "wsgi container agnostic" apps not
coupled to eventlet or anything else, so that it's up to the deployer
to choose the application server.

> But if we only have a super low number of greenlets and only a few dozen
> workers, what happens if we have more than 240 requests come in at once,
> aren't those connections going to get rejected?  No way!  eventlet's
> networking system is better than that, those connection requests just
> get queued up in any case, waiting for a greenlet to be available.  Play
> with the script and its settings to see.

Right, it must be controlled by the backlog argument value here:

https://github.com/openstack/oslo.service/blob/master/oslo_service/wsgi.py#L80

> But if we're blocking any connection attempts based on what's available
> at the database level, aren't we under-utilizing for API calls that need
> to do a lot of other things besides DB access?  The answer is that may
> very well be true!   Which makes the guidance more complicated based on
> what service we are talking about.   So here, my guidance is oriented
> towards those Openstack services that are primarily doing database
> access as their primary work.

I believe all our APIs are pretty much DB oriented.

> Given the above caveat, I'm hoping people can look at this and verify my
> assumptions and the results.Assuming I am not just drunk on eggnog,
> what would my recommendations be?  Basically:
>
> 1. at least for DB-oriented services, the number of 1000 greenlets
> should be *way* *way* lower, and we most likely should allow for a lot
> more connections to be used temporarily within a particular worker,
> which means I'd take the max_overflow setting and default it to like 50,
> or 100.   The Greenlet number should then be very similar to the
> max_overflow number, and maybe even a little less, as Nova API calls
> right now often will use more than one connection concurrently.

I suggest we tweak the config option values in both oslo.service and
oslo.db to provide reasonable production defaults and document the
"correlation" between the DB connection pool / greenlet pool / worker
count settings.

> 2. longer term, let's please drop the eventlet pool thing and just use a
> real web server!  (but still tune the connection pool appropriately).  A
> real web server will at least know how to efficiently direct requests to
> worker processes.   If all Openstack workers were configurable under a
> single web server config, that would also be a nice way to centralize
> tuning and profiling overall.

I'd rather we simply not couple to eventlet unconditionally and allow
deployers to choose the WSGI container they want to use.

Thanks,
Roman


Re: [openstack-dev] [nova] [all] Excessively high greenlet default + excessively low connection pool defaults leads to connection pool latency, timeout errors, idle database connections / workers

2016-01-06 Thread Mike Bayer


On 01/06/2016 09:11 AM, Roman Podoliaka wrote:
> Hi Mike,
> 
> Thank you for this brilliant analysis! We've been seeing such timeout
> errors in downstream periodically and this is the first time someone
> has analysed the root cause thoroughly.
> 
> On Fri, Dec 18, 2015 at 10:33 PM, Mike Bayer  wrote:
> 
>> But if we only have a super low number of greenlets and only a few dozen
>> workers, what happens if we have more than 240 requests come in at once,
>> aren't those connections going to get rejected?  No way!  eventlet's
>> networking system is better than that, those connection requests just
>> get queued up in any case, waiting for a greenlet to be available.  Play
>> with the script and its settings to see.
> 
> Right, it must be controlled by the backlog argument value here:
> 
> https://github.com/openstack/oslo.service/blob/master/oslo_service/wsgi.py#L80

oh wow, totally missed that!  But, how does backlog here interact with
multiple processes?   E.g. if all workers are saturated, it will place a
waiting connection onto a random greenlet which then has to wait?  It
would be better if the "backlog" were pushed up to the parent process,
not sure if that's possible?





[openstack-dev] [nova] [all] Excessively high greenlet default + excessively low connection pool defaults leads to connection pool latency, timeout errors, idle database connections / workers

2015-12-18 Thread Mike Bayer
Hi all -

Let me start out with the assumptions I'm going from for what I want to
talk about.

1. I'm looking at Nova right now, but I think similar things are going
on in other Openstack apps.

2. Settings that we see in nova.conf, including:

#wsgi_default_pool_size = 1000
#max_pool_size = 
#max_overflow = 
#osapi_compute_workers = 
#metadata_workers = 


are often not understood by deployers, and/or are left unchanged in a
wide variety of scenarios.If you are in fact working for deployers
that *do* change these values to something totally different, then you
might not be impacted here, and if it turns out that everyone changes
all these settings in real-world scenarios and zzzeek you are just being
silly thinking nobody sets these appropriately, then fooey for me, I guess.


3. There's talk about more Openstack services, at least Nova from what I
heard the other day, moving to be based on a real webserver deployment
in any case, the same way Keystone is.   To the degree this is true, it
would also mitigate what I'm seeing, but still, there are good changes
that can be made here.


Basically, the syndrome I want to talk about can be mostly mitigated
just by changing the numbers around in #2, but I don't really know that
people know any of this, and also I think some of the defaults here
should just be changed completely as their current values are useless in
pretty much all cases.

Suppose we run on a 24-core machine, and therefore have 24 API worker
processes.  Each worker represents a WSGI server, which will use an
eventlet greenlet pool with 1000 greenlets.

Then, assuming neither max_pool_size nor max_overflow is changed, this
means that for a single SQLAlchemy Engine, the most database
connections allowed by that Engine at one time is *15*:
pool_size defaults to 5 and max_overflow defaults to 10.  We get our
engine from oslo.db; however, oslo.db does not change these defaults,
which ultimately come from SQLAlchemy itself.

The problem is then basically that 1000 greenlets is way, way more than
15, meaning hundreds of requests can all pile up on a process and all be
blocked, waiting for a connection pool that's been configured to only
allow 15 database connections at most.
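
To make the arithmetic concrete (a sketch; the connection string is
hypothetical, and the defaults shown are SQLAlchemy's, passed through
unchanged by oslo.db):

from sqlalchemy import create_engine

# QueuePool defaults: pool_size=5, max_overflow=10, pool_timeout=30
engine = create_engine("mysql+pymysql://nova:secret@127.0.0.1/nova")

max_db_connections = 5 + 10      # pool_size + max_overflow = 15 per Engine
greenlets_per_worker = 1000      # wsgi_default_pool_size

# Up to 1000 concurrent greenlets in one worker can contend for those 15
# connections; checkout number 16 waits, and after pool_timeout seconds
# SQLAlchemy raises a TimeoutError ("QueuePool limit ... reached").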

But wait!  You say.   We have twenty-four worker processes.  So if we
had 100 concurrent requests, these requests would not all pile up on
just one process, they'd be distributed among the workers.  Any
additional requests beyond the 15 * 24 == 360 that we can handle
(assuming a simplistic relationship between API requests and database
connections, which it is not) would just queue up as they do anyway, so
it makes no difference.  Right?   *Right???*

It does make a difference!  Because show me in the Nova source code where
exactly this algorithm is that knows how to distribute requests evenly
among the workers... There is no such logic!   Some months ago, I began
thinking and fretting, how the heck does this work?   There's 24
workers, one socket.accept(), requests come in and sockets are somehow
divvyed up to child forks, but *how*?  I asked some of the deep unix
gurus locally here, and the best answer we could come up with is:  it's
random!

Cue the mythbusters music.   "Nova receives WSGI requests and sends them
to workers with a random distribution, meaning that under load, some
workers will have too many requests and be waiting on DB access which
can in fact cause pool timeout issues under very latent circumstances,
others will be more idle than they should be".

As we begin the show, we cut into a background segment where we show
that in fact, Mike and some other folks doing load testing actually
*see* connection pool timeout errors in the logs already, on a 24 core
machine, even though we see hundreds of idle connections at the same
time (just to note, the error we are talking about is "QueuePool limit
of size 5 overflow 5 reached, connection timed out, timeout 5").   So
that we actually see this happening in an actual test situation is what
led me to finally just write a test suite for the whole thing.

Here's the test suite!
https://gist.github.com/zzzeek/c69138fd0d0b3e553a1f  I've tried to make
this as super-simple as possible to read, use, and understand.  It uses
Nova's nova.wsgi.Server directly with a simple "hello-world" style app,
as well as oslo_service.service.Service and service.launch() the same
way I see in nova/service.py (please confirm I'm using all the right
code and things here just like in Nova, thanks!).   The "hello world"
app gets a connection from the pool, does nothing with it, waits a few
seconds then returns it.   All the while counting everything going on
and reporting on its metrics every 10 requests.

The "hello world" app uses a SQLAlchemy connection pool with a little
bit lower number of connections, and a timeout of only ten seconds
instead of thirty by default (but feel free to change it on the command
line), and a "work" operation that takes a random amount of time between
zero and five