Re: Multiple Pylons instances, processor affinity, and threads

2008-04-26 Thread Graham Dumpleton

On Apr 25, 4:59 am, Devin Torres [EMAIL PROTECTED] wrote:
   Use Apache and mod_wsgi and you have all that you want except playing
   with 'processor affinity'. This is because Apache is multi process by
   design and thus can properly make use of multiple CPUs. A lot of what
   goes on in Apache is also not implemented in Python and thus not
   subject to GIL issues.

   You might also have a read of the following:

   http://blog.dscpl.com.au/2007/09/parallel-python-discussion-and-modwsgi.html
   http://blog.dscpl.com.au/2007/07/web-hosting-landscape-and-modwsgi.html

 Read.

 Apache is starting to look attractive now. So I assume I'm not looking
 for embedded mode, right? You say it's more performance, but at the
 cost of what? Using the worker MPM and, say, daemon-mode, using, say,
 4 processes and 16 threads each, would my processes be dying as soon
 as they're not needed? My application takes awhile to load because I
 autoload my database using SQLAlchemy. Is it that easy to configure
 apache to start 4 by default and load balance between all of them?

If you are running a web site that requires the absolute best
performance possible, you would dedicate the Apache instance to
running just the one Python web application. That Apache instance
would be setup to use prefork MPM and you would use mod_wsgi embedded
mode. You would turn off keep alive for the Apache instance. You would
throw as much memory as possible into the system and you would use a
dedicated machine and not a VPS.
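As a sketch, the embedded-mode setup described above might look like the
following (the paths and WSGI script name are invented for illustration;
embedded mode is the default when no daemon process group is defined):

```apache
# Embedded mode: no WSGIDaemonProcess directive; the application runs
# inside the Apache (prefork) child processes themselves.
KeepAlive Off
WSGIScriptAlias / /srv/myapp/app.wsgi
```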

At the same time, all static media would be served from a distinct
nginx or lighttpd instance or via a content delivery provider. The
static media server would still use keep alive.

A typical default Apache prefork configuration is:

<IfModule mpm_prefork_module>
    StartServers          5
    MinSpareServers       5
    MaxSpareServers      10
    MaxClients          150
    MaxRequestsPerChild   0
</IfModule>

That is, initially create 5 child processes for serving requests. To
support a maximum of 150 clients at a time, because each child process
is single threaded, it can theoretically create up to 150 child
processes to handle requests, if demand so requires. As demand drops
off and processes become unused, it will start to kill off the
additional child processes; when this occurs it will keep between 5
and 10 of these additional servers around as spares for future bursts
in traffic.

Apart from where it creates additional processes to meet demand and
then kills them off when no longer required, the child processes will
be kept around forever. This is because max requests per child is set
to 0. If you had a problem with memory creep in an application, you
could set max requests per child to some non-zero number and child
processes would be recycled after that number of requests.
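For example (1000 is purely an illustrative figure; pick a value based
on the memory growth you actually observe):

```apache
# Recycle each child process after 1000 requests to cap memory creep.
MaxRequestsPerChild 1000
```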

Now, depending on how expensive loading the application is initially
and what your expected traffic volume is, you would customise these
values to keep as many persistent child processes around as possible
to meet average demand, plus some measure of bursts in traffic. What
the values should be, you would have to experiment with.

Anyway, that is the extreme end where performance is the most
important thing. In this case you would use prefork MPM and mod_wsgi
embedded mode.

The other extreme end is a memory constrained system, in which case
you would use worker MPM, with small number of initial Apache child
processes, plus use mod_wsgi daemon mode with single daemon process
with limited number of threads. Static media would be served on same
Apache instance.

The limited number of threads would be to minimise the possibility of
memory blowing out due to multiple concurrent requests allocating a
lot of transient memory at the same time. To temper this, one would
set a maximum number of requests for a process and set inactivity
timeouts so that daemon processes are recycled when not doing
anything, thus bringing memory usage back to minimal levels.
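A daemon-mode configuration along those lines might look like the
following sketch (the process group name and the specific values are
invented for illustration; see the mod_wsgi documentation for the full
option list):

```apache
# Single daemon process, few threads, recycled on request count
# and on inactivity to keep memory usage minimal.
WSGIDaemonProcess myapp processes=1 threads=5 \
    maximum-requests=1000 inactivity-timeout=60
WSGIProcessGroup myapp
WSGIScriptAlias / /srv/myapp/app.wsgi
```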

Apache, through which MPM you use and how you configure it, plus
mod_wsgi and whether you use embedded mode or daemon mode, plus how
you configure daemon mode, provides a great deal of flexibility in
creating a setup anywhere between these extremes. What configuration
is going to be best really depends on a lot of different issues, many
of which you don't expand on, such as how important performance is,
how much memory is available, how much memory your application
requires, etc.

Even when you think you have a good idea of what sort of configuration
will work, you need to then properly test it, as well as compare that
performance to alternate configurations.

Personally I'd probably just suggest you start out with mod_wsgi
embedded mode with either the prefork or worker MPM, just see how it
goes, and get a feel for how Apache works, especially with respect to
its use of multiple processes to handle requests.

For most people's web site applications, the configuration doesn't
generally matter that much, as their application never has high
enough traffic for the differences to be noticeable.

Re: Multiple Pylons instances, processor affinity, and threads

2008-04-24 Thread Marcin Kasperski

 Given this situation, I believe that despite paste making an effort to
 be multithreaded, it would still be advantageous to run a cluster of
 four Pylons instances and proxy to these using nginx.

You may consider Apache with mod_wsgi; it can be simpler to manage in
such a context. In particular, WSGIDaemonProcess lets you set the
number of dedicated processes...
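For instance, something like this (names illustrative) would give you
four dedicated processes of sixteen threads each, matching the setup
discussed earlier in the thread:

```apache
WSGIDaemonProcess pylonsapp processes=4 threads=16
WSGIProcessGroup pylonsapp
```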

-- 
--
| Marcin Kasperski   | A process that is too complex will fail.
| http://mekk.waw.pl |  (Booch)
||
--


--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
pylons-discuss group.
To post to this group, send email to pylons-discuss@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/pylons-discuss?hl=en
-~--~~~~--~~--~--~---



Re: Multiple Pylons instances, processor affinity, and threads

2008-04-24 Thread Devin Torres

  Use Apache and mod_wsgi and you have all that you want except playing
  with 'processor affinity'. This is because Apache is multi process by
  design and thus can properly make use of multiple CPUs. A lot of what
  goes on in Apache is also not implemented in Python and thus not
  subject to GIL issues.

  You might also have a read of the following:

   http://blog.dscpl.com.au/2007/09/parallel-python-discussion-and-modwsgi.html
   http://blog.dscpl.com.au/2007/07/web-hosting-landscape-and-modwsgi.html

Read.

Apache is starting to look attractive now. So I assume I'm not looking
for embedded mode, right? You say it's more performant, but at what
cost? Using the worker MPM and, say, daemon mode with, say,
4 processes and 16 threads each, would my processes be dying as soon
as they're not needed? My application takes a while to load because I
autoload my database using SQLAlchemy. Is it that easy to configure
Apache to start 4 by default and load balance between all of them?




Multiple Pylons instances, processor affinity, and threads

2008-04-23 Thread Devin Torres

So we're using Pylons and Python in general for our new company
platform. We just bought a server with 4 cores to help us reach our
scalability goals, but there are a few questions I'm interested in
asking the Pylons community.

I (mostly) understand the nature of threads in Python. From my
understanding, the GIL locks the interpreter to executing only one
Python thread at a time, but C modules can take advantage of a Python
application being multithreaded, because they can operate independently
of the GIL. Presumably, this would mean that there is, in fact, a
benefit to using threads in Paste, because most network I/O-bound
stuff happens within a C module.

Given this situation, I believe that despite paste making an effort to
be multithreaded, it would still be advantageous to run a cluster of
four Pylons instances and proxy to these using nginx.

Using our setup we'd have four pylons instances being proxied to by
four nginx worker threads.

In nginx you can set the processor affinity for each worker thread,
thus placing each worker on a different core 0..3.
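In nginx.conf that looks roughly like this (worker_cpu_affinity is
Linux/FreeBSD only; each bitmask pins one worker to one core, so the
four masks below pin workers 1-4 to cores 0-3):

```nginx
worker_processes    4;
worker_cpu_affinity 0001 0010 0100 1000;
```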

Here's where things get tricky:
I've found a Python package that apparently allows Python applications
to set their processor affinity (I'm afraid it doesn't work on OS X):
http://pypi.python.org/pypi/affinity/0.1.0
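For what it's worth, later Python versions grew a stdlib equivalent on
Linux, os.sched_setaffinity. A minimal sketch (the helper name
pin_to_cpu is my own, and it deliberately falls back to a no-op on
platforms without affinity support, such as OS X):

```python
import os

def pin_to_cpu(cpu):
    """Pin the current process to a single CPU core.

    Uses os.sched_setaffinity (stdlib, Linux-only) where available; the
    third-party 'affinity' package linked above fills the same role
    elsewhere. Returns False on platforms with no affinity support.
    """
    if hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, {cpu})  # 0 means "this process"
        return True
    return False
```

A cluster controller could call pin_to_cpu(n) in each child right after
fork()'ing, one core per child.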

Using this, what do you guys think of my idea to write a custom
cluster controller, perhaps using supervisord, that will start nginx
and the four worker processes, and then fork() my Pylons app into
a cluster of four?

Is this overkill? Is Paste more multithreaded than I'm giving it credit
for? Is there a better way to go about this? Does an alternative to
the 'affinity' package exist?

-Devin Torres




Re: Multiple Pylons instances, processor affinity, and threads

2008-04-23 Thread Ian Bicking

Devin Torres wrote:
 So we're using Pylons and Python in general for our new company
 platform. We just bought a server with 4 cores to help us reach our
 scalability goals, but there are a few questions I'm interested in
 asking the Pylons community.
 
 I (mostly) understand the nature of threads in Python. From my
 understanding, the GIL locks the interpreter to executing only one
 Python thread at a time, but C modules can take advantage of a Python
 application being multithreaded, because they can operate independently
 of the GIL. Presumably, this would mean that there is, in fact, a
 benefit to using threads in Paste, because most network I/O bound
 stuff happens within a C module.
 
 Given this situation, I believe that despite paste making an effort to
 be multithreaded, it would still be advantageous to run a cluster of
 four Pylons instances and proxy to these using nginx.

Separate processes are likely to work better.  You might find one of the 
flup forking servers to be better (using fastcgi), though I don't know 
for sure.  That will run each request in its own process, so you'll get 
multiple processes without the same infrastructure complications of a 
cluster of servers.

I don't think affinity should be that important.  Doesn't the OS handle 
that itself?

-- 
Ian Bicking : [EMAIL PROTECTED] : http://blog.ianbicking.org




Re: Multiple Pylons instances, processor affinity, and threads

2008-04-23 Thread Devin Torres

If I understand you correctly, there's a flup entry point that forks
the process instead of flup_fcgi_thread? I'm not sure that would have
good performance, but maybe you think forking is capable of good
performance in this case. After forking, would SQLAlchemy connections
stay persistent? Is that safe?

Also, is it only fastcgi, not scgi as well?

-Devin

On Wed, Apr 23, 2008 at 12:56 PM, Ian Bicking [EMAIL PROTECTED] wrote:

  Devin Torres wrote:
   So we're using Pylons and Python in general for our new company
   platform. We just bought a server with 4 cores to help us reach our
   scalability goals, but there are a few questions I'm interested in
   asking the Pylons community.
  
   I (mostly) understand the nature of threads in Python. From my
   understanding, the GIL locks the interpreter to executing only one
   Python thread at a time, but C modules can take advantage of a Python
   application being multithreaded, because they can operate independently
   of the GIL. Presumably, this would mean that there is, in fact, a
   benefit to using threads in Paste, because most network I/O bound
   stuff happens within a C module.
  
   Given this situation, I believe that despite paste making an effort to
   be multithreaded, it would still be advantageous to run a cluster of
   four Pylons instances and proxy to these using nginx.

  Separate processes is likely to work better.  You might find one of the
  flup forking servers to be better (using fastcgi), though I don't know
  for sure.  That will run each request in its own process, so you'll get
  multiple processes without the same infrastructure complications of a
  cluster of servers.

  I don't think affinity should be that important.  Doesn't the OS handle
  that itself?

  --
  Ian Bicking : [EMAIL PROTECTED] : http://blog.ianbicking.org

  





Re: Multiple Pylons instances, processor affinity, and threads

2008-04-23 Thread Ian Bicking

Devin Torres wrote:
 If I understand you correctly, there's a flup entry point that forks
 the process instead of flup_fcgi_thread? I'm not sure that would have
 good performance, but maybe you think forking is capable of good
 performance in this case. After forking, would SQLAlchemy connections
 stay persistent? Is that safe?

Hmm... well, yeah, that probably wouldn't work well -- I think each 
request being a new fork won't get any shared connections.  So perhaps a 
cluster of servers would work better for you.

-- 
Ian Bicking : [EMAIL PROTECTED] : http://blog.ianbicking.org




Re: Multiple Pylons instances, processor affinity, and threads

2008-04-23 Thread climbus

Devin Torres wrote:

 Given this situation, I believe that despite paste making an effort to
 be multithreaded, it would still be advantageous to run a cluster of
 four Pylons instances and proxy to these using nginx.

We're using 2 instances of paster with a few threads each. They work
better than one instance. We have an Apache load balancer in front.

Configuration:

[server:main]
use  = egg:paste#http
host = 0.0.0.0
port = 5000
use_threadpool = True
threadpool_workers = 10

[server:main2]
use  = egg:paste#http
host = 0.0.0.0
port = 5001
use_threadpool = True
threadpool_workers = 10

Start commands:

paster serve production.ini --server-name=main --pid-file=main.pid --
log-file=main.log --daemon start
paster serve production.ini --server-name=main2 --pid-file=main2.pid --
log-file=main2.log --daemon start

Apache conf:

RewriteRule   ^(.*)$ balancer://somename$1 [P,L]

<Proxy balancer://somename>
    BalancerMember http://127.0.0.1:5000 retry=3
    BalancerMember http://127.0.0.1:5001 retry=3
</Proxy>

You can use nginx too.
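The nginx equivalent would be roughly as follows (the upstream name is
invented, and nginx's balancer has no direct counterpart to
mod_proxy_balancer's retry=3, so this is only an approximation):

```nginx
upstream pylons_cluster {
    server 127.0.0.1:5000;
    server 127.0.0.1:5001;
}

server {
    listen 80;
    location / {
        proxy_pass http://pylons_cluster;
    }
}
```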

Climbus




Re: Multiple Pylons instances, processor affinity, and threads

2008-04-23 Thread Christopher Weimann

Devin Torres wrote:
 If I understand you correctly, there's a flup entry point that forks
 the process instead of flup_fcgi_thread? I'm not sure that would have
 good performance, but maybe you think forking is capable of good
 performance in this case. After forking, would SQLAlchemy connections
 stay persistent? Is that safe?
 

Flup has both fcgi_fork and scgi_fork flavors.  They are pre-fork, so
each creates a pool of long-running processes and passes connections
to them.  This is the same model that Apache uses and is in theory
quite efficient.  You do NOT have to wait for a fork on every
connection, because the pool of processes has been forked in advance
and is ready and waiting.




Re: Multiple Pylons instances, processor affinity, and threads

2008-04-23 Thread Devin Torres

On Wed, Apr 23, 2008 at 3:20 PM, Christopher Weimann
[EMAIL PROTECTED] wrote:
  Flup has both fcgi_fork and scgi_fork flavors.  They are pre-fork so it
  creates a pool of long running processes and it passes connections to
  them.  This is the same model that Apache uses and is in theory quite
  efficient.  You do NOT have to wait for a fork on every connection
  because the pool of processes has been forked in advance and is ready
  and waiting.

Do you happen to know the applicable setting to use when specifying
the size of that pool?

-Devin Torres




Re: Multiple Pylons instances, processor affinity, and threads

2008-04-23 Thread Graham Dumpleton

On Apr 24, 3:51 am, Devin Torres [EMAIL PROTECTED] wrote:
 So we're using Pylons and Python in general for our new company
 platform. We just bought a server with 4 cores to help us reach our
 scalability goals, but there are a few questions I'm interested in
 asking the Pylons community.

 I (mostly) understand the nature of threads in Python. From my
 understanding, the GIL locks the interpreter to executing only one
 Python thread at a time, but C modules can take advantage of a Python
  application being multithreaded, because they can operate independently
 of the GIL. Presumably, this would mean that there is, in fact, a
 benefit to using threads in Paste, because most network I/O bound
 stuff happens within a C module.

 Given this situation, I believe that despite paste making an effort to
 be multithreaded, it would still be advantageous to run a cluster of
 four Pylons instances and proxy to these using nginx.

 Using our setup we'd have four pylons instances being proxied to by
 four nginx worker threads.

 In nginx you can set the processor affinity for each worker thread,
 thus placing each worker on a different core 0..3.

 Here's where things get tricky:
 I've found a Python package that apparently allows Python applications
 to set their processor affinity (I'm afraid it doesn't work on OS X):
 http://pypi.python.org/pypi/affinity/0.1.0

 Using this, what do you guys think of my idea to write a custom
 cluster controller, perhaps using supervisord, that will start nginx
 and the four worker processes, and then fork() my Pylons app into
 a cluster of four?

 Is this overkill? Is Paste more multithreaded than I'm giving it credit
 for? Is there a better way to go about this? Does an alternative to
 the 'affinity' package exist?

Use Apache and mod_wsgi and you have all that you want except playing
with 'processor affinity'. This is because Apache is multi process by
design and thus can properly make use of multiple CPUs. A lot of what
goes on in Apache is also not implemented in Python and thus not
subject to GIL issues.

You might also have a read of the following:

  http://blog.dscpl.com.au/2007/09/parallel-python-discussion-and-modwsgi.html
  http://blog.dscpl.com.au/2007/07/web-hosting-landscape-and-modwsgi.html

These explain some of these issues about multiprocess web servers and
the GIL.

Not sure why you just wouldn't let the operating system handle
allocation of processes/threads across CPUs, as it is likely in
general to do a better job. Are you sure you aren't trying to solve a
problem that doesn't really exist?

Graham



Re: Multiple Pylons instances, processor affinity, and threads

2008-04-23 Thread Christopher Weimann

Devin Torres wrote:
 On Wed, Apr 23, 2008 at 3:20 PM, Christopher Weimann
 
 Do you happen to know the applicable setting to use when specifying
 the size of that pool?
 

To use fcgi_fork, just do this:

[server:main]
use = egg:PasteScript#flup_fcgi_fork
host = 0.0.0.0
port = 5000

I've never changed the defaults for the pool but I think this is 
supposed to be the right way to do it.

[server:main]
paste.server_factory = flup.server.fcgi_fork:factory
host = 0.0.0.0
port = 5000
maxChildren=50
maxSpare=5
minSpare=1

Those are the default pool settings, so that SHOULD be equivalent to 
the first config section.  The problem is it doesn't seem to work that 
way.  If I use the server_factory and start things up with 'paster 
serve development.ini', it seems fine until I hit ctrl-c to stop it.
Then all hell breaks loose and it starts forking off children like mad, 
bringing the machine to its knees.

I was planning on moving an app from Quixote using its preforked 
scgi_server.py to Pylons with flup_scgi_fork, but apparently that's a bad 
idea.  Either I'm using the factory wrong or I need to figure out what's 
up with flup.

I suppose another option is using a Paste#http instance for each 
processor and nginx as a reverse proxy spreading the load over them.





Re: Multiple Pylons instances, processor affinity, and threads

2008-04-23 Thread Cliff Wells


On Wed, 2008-04-23 at 21:04 -0400, Christopher Weimann wrote:

 
 I suppose another option is using a Paste#http instance for each 
 processor and nginx as a reverse proxy spreading the load over them.

That's what I do.

Cliff

