Re: Multiple Pylons instances, processor affinity, and threads
On Apr 25, 4:59 am, Devin Torres [EMAIL PROTECTED] wrote:

Use Apache and mod_wsgi and you have all that you want except playing with 'processor affinity'. This is because Apache is multi-process by design and thus can properly make use of multiple CPUs. A lot of what goes on in Apache is also not implemented in Python and thus not subject to GIL issues. You might also have a read of the following:

http://blog.dscpl.com.au/2007/09/parallel-python-discussion-and-modws...
http://blog.dscpl.com.au/2007/07/web-hosting-landscape-and-modwsgi.html

Read. Apache is starting to look attractive now. So I assume I'm not looking for embedded mode, right? You say it gives more performance, but at the cost of what? Using the worker MPM and daemon mode with, say, 4 processes and 16 threads each, would my processes be killed as soon as they're not needed? My application takes a while to load because I autoload my database using SQLAlchemy. Is it that easy to configure Apache to start 4 by default and load-balance between all of them?

If you are running a web site that requires the absolute best performance possible, you would dedicate the Apache instance to running just the one Python web application. That Apache instance would be set up to use the prefork MPM and you would use mod_wsgi embedded mode. You would turn off keep-alive for the Apache instance. You would throw as much memory as possible into the system, and you would use a dedicated machine and not a VPS. At the same time, all static media would be served from a distinct nginx or lighttpd instance, or via a content delivery provider. The static media server would still use keep-alive.

A typical default Apache prefork configuration is:

<IfModule mpm_prefork_module>
    StartServers          5
    MinSpareServers       5
    MaxSpareServers      10
    MaxClients          150
    MaxRequestsPerChild   0
</IfModule>

That is, initially create 5 child processes for serving requests.
Because each child process is single-threaded, Apache can theoretically create up to 150 child processes, supporting a maximum of 150 clients at a time, if demand so requires. As demand drops off and processes become unused, it will start to kill off the additional child processes; when this occurs it will keep between 5 and 10 of these additional servers around as spares for future bursts in traffic. Apart from where it creates additional processes to meet demand and then kills them off when no longer required, the child processes will be kept around forever. This is because MaxRequestsPerChild is set to 0. If you had a problem with memory creep in an application, you could set MaxRequestsPerChild to some non-zero number and child processes would be recycled after that number of requests.

Now, depending on how expensive loading the application is initially and what your expected traffic volume is, you would customise these values to keep as many persistent child processes around as possible to meet average demand, plus some measure of bursts in traffic. What the values should be you would have to experiment with.

Anyway, that is the extreme end where performance is the most important thing. In this case you would use the prefork MPM and mod_wsgi embedded mode. The other extreme is a memory-constrained system, in which case you would use the worker MPM with a small number of initial Apache child processes, plus mod_wsgi daemon mode with a single daemon process with a limited number of threads. Static media would be served on the same Apache instance. The limited number of threads would be to minimise the possibility of memory blowing out due to multiple concurrent requests allocating a lot of transient memory at the same time. To temper this, one would set a maximum number of requests for a process and set inactivity timeouts, so that daemon processes are recycled if not doing anything, thus bringing memory back to minimal levels.
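For the memory-constrained end of the spectrum just described, a daemon-mode setup might look like the following sketch, using mod_wsgi's directives (the application name and paths here are hypothetical, and the specific limits are illustrative):

```apache
# One daemon process, few threads, recycled after 1000 requests or
# 300 seconds of inactivity, so idle memory use stays minimal.
WSGIDaemonProcess myapp processes=1 threads=4 \
    maximum-requests=1000 inactivity-timeout=300
WSGIProcessGroup myapp
WSGIScriptAlias / /srv/myapp/deploy/myapp.wsgi
```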
Apache, through which MPM you use and how you configure it, plus mod_wsgi and whether you use embedded mode or daemon mode, plus how you configure daemon mode, provides a great deal of flexibility in creating a setup anywhere between these extremes. What configuration is going to be best really depends on a lot of different issues, many of which you don't expand on, such as how important performance is, how much memory is available, how much memory your applications require, etc. Even when you think you have a good idea of what sort of configuration will work, you need to then properly test it, as well as compare that performance to alternate configurations.

Personally, I'd probably just suggest you start out with mod_wsgi embedded mode with either the prefork or worker MPM, just see how it goes, and get a feel for how Apache works, especially with respect to its use of multiple processes to handle requests. For most people's web site applications, the configuration doesn't generally matter that much, as their application never has high enough traffic for the differences to matter.
Re: Multiple Pylons instances, processor affinity, and threads
Given this situation, I believe that despite paste making an effort to be multithreaded, it would still be advantageous to run a cluster of four Pylons instances and proxy to these using nginx.

You may consider Apache with mod_wsgi; it can be simpler to manage in such a context. In particular, WSGIDaemonProcess lets you set the number of dedicated processes...

--
| Marcin Kasperski | A process that is too complex will fail.
| http://mekk.waw.pl | (Booch)

You received this message because you are subscribed to the Google Groups pylons-discuss group. To post to this group, send email to pylons-discuss@googlegroups.com. To unsubscribe from this group, send email to [EMAIL PROTECTED]. For more options, visit this group at http://groups.google.com/group/pylons-discuss?hl=en
Re: Multiple Pylons instances, processor affinity, and threads
Use Apache and mod_wsgi and you have all that you want except playing with 'processor affinity'. This is because Apache is multi-process by design and thus can properly make use of multiple CPUs. A lot of what goes on in Apache is also not implemented in Python and thus not subject to GIL issues. You might also have a read of the following:

http://blog.dscpl.com.au/2007/09/parallel-python-discussion-and-modwsgi.html
http://blog.dscpl.com.au/2007/07/web-hosting-landscape-and-modwsgi.html

Read. Apache is starting to look attractive now. So I assume I'm not looking for embedded mode, right? You say it gives more performance, but at the cost of what? Using the worker MPM and daemon mode with, say, 4 processes and 16 threads each, would my processes be killed as soon as they're not needed? My application takes a while to load because I autoload my database using SQLAlchemy. Is it that easy to configure Apache to start 4 by default and load-balance between all of them?
Multiple Pylons instances, processor affinity, and threads
So we're using Pylons and Python in general for our new company platform. We just bought a server with 4 cores to help us reach our scalability goals, but there are a few questions I'm interested in asking the Pylons community.

I (mostly) understand the nature of threads in Python. From my understanding, the GIL locks the interpreter to executing only one Python thread at a time, but C modules can take advantage of a Python application being multithreaded, because they can operate independent of the GIL. Presumably, this would mean that there is, in fact, a benefit to using threads in Paste, because most network I/O-bound stuff happens within a C module.

Given this situation, I believe that despite Paste making an effort to be multithreaded, it would still be advantageous to run a cluster of four Pylons instances and proxy to these using nginx. Using our setup we'd have four Pylons instances being proxied to by four nginx worker processes. In nginx you can set the processor affinity for each worker, thus placing each worker on a different core 0..3.

Here's where things get tricky: I've found a Python package that apparently allows Python applications to set their processor affinity (I'm afraid it doesn't work on OS X): http://pypi.python.org/pypi/affinity/0.1.0

Using this, what do you guys think of my idea to write a custom cluster controller, perhaps using supervisord, that will start nginx and the four worker processes, and then fork()s my Pylons app into a cluster of four? Is this overkill? Is Paste more multithreaded than I'm giving it credit for? Is there a better way to go about this? Does an alternative to the 'affinity' package exist?

-Devin Torres
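A minimal sketch of the fork step such a cluster controller might use. The affinity call is injected as a callable so the example stays runnable without the package; the `set_process_affinity_mask(pid, mask)` signature is assumed from the package's description, and `NUM_WORKERS` and the function names are illustrative:

```python
import os

NUM_WORKERS = 4

def pin_mask(core):
    # Affinity masks are bitmasks: bit i set means the process may run
    # on CPU i, so pinning to a single core is just 1 << core.
    return 1 << core

def fork_cluster(serve_app, set_affinity=None):
    """Fork NUM_WORKERS copies of the app, optionally pinning each
    child to its own core via a set_affinity(pid, mask) callable
    (e.g. the 'affinity' package's assumed set_process_affinity_mask)."""
    pids = []
    for core in range(NUM_WORKERS):
        pid = os.fork()
        if pid == 0:  # child: pin (if possible), serve, then exit
            if set_affinity is not None:
                set_affinity(os.getpid(), pin_mask(core))
            serve_app(core)
            os._exit(0)
        pids.append(pid)
    return pids  # parent keeps pids so it can monitor/restart workers
```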
Re: Multiple Pylons instances, processor affinity, and threads
Devin Torres wrote:

So we're using Pylons and Python in general for our new company platform. We just bought a server with 4 cores to help us reach our scalability goals, but there are a few questions I'm interested in asking the Pylons community. I (mostly) understand the nature of threads in Python. From my understanding, the GIL locks the interpreter to executing only one Python thread at a time, but C modules can take advantage of a Python application being multithreaded, because they can operate independent of the GIL. Presumably, this would mean that there is, in fact, a benefit to using threads in Paste, because most network I/O-bound stuff happens within a C module. Given this situation, I believe that despite Paste making an effort to be multithreaded, it would still be advantageous to run a cluster of four Pylons instances and proxy to these using nginx.

Separate processes are likely to work better. You might find one of the flup forking servers to be better (using fastcgi), though I don't know for sure. That will run each request in its own process, so you'll get multiple processes without the same infrastructure complications of a cluster of servers. I don't think affinity should be that important. Doesn't the OS handle that itself?

--
Ian Bicking : [EMAIL PROTECTED] : http://blog.ianbicking.org
Re: Multiple Pylons instances, processor affinity, and threads
If I understand you correctly, there's a flup entry point that forks the process instead of flup_fcgi_thread? I'm not sure that would have good performance, but maybe you think forking is capable of good performance in this case. After forking, would SQLAlchemy connections stay persistent? Is that safe? Also, is it only fastcgi, not scgi as well?

-Devin

On Wed, Apr 23, 2008 at 12:56 PM, Ian Bicking [EMAIL PROTECTED] wrote:

Separate processes are likely to work better. You might find one of the flup forking servers to be better (using fastcgi), though I don't know for sure. That will run each request in its own process, so you'll get multiple processes without the same infrastructure complications of a cluster of servers. I don't think affinity should be that important. Doesn't the OS handle that itself?

--
Ian Bicking : [EMAIL PROTECTED] : http://blog.ianbicking.org
Re: Multiple Pylons instances, processor affinity, and threads
Devin Torres wrote:

If I understand you correctly, there's a flup entry point that forks the process instead of flup_fcgi_thread? I'm not sure that would have good performance, but maybe you think forking is capable of good performance in this case. After forking, would SQLAlchemy connections stay persistent? Is that safe?

Hmm... well, yeah, that probably wouldn't work well -- I think each request being a new fork won't get any shared connections. So perhaps a cluster of servers would work better for you.

--
Ian Bicking : [EMAIL PROTECTED] : http://blog.ianbicking.org
Re: Multiple Pylons instances, processor affinity, and threads
Devin Torres wrote:

Given this situation, I believe that despite paste making an effort to be multithreaded, it would still be advantageous to run a cluster of four Pylons instances and proxy to these using nginx.

We're using 2 instances of paster with a few threads each. It's working better than one instance. We have an Apache load balancer in front.

Configuration:

[server:main]
use = egg:paste#http
host = 0.0.0.0
port = 5000
use_threadpool = True
threadpool_workers = 10

[server:main2]
use = egg:paste#http
host = 0.0.0.0
port = 5001
use_threadpool = True
threadpool_workers = 10

Start commands:

paster serve production.ini --server-name=main --pid-file=main.pid --log-file=main.log --daemon start
paster serve production.ini --server-name=main2 --pid-file=main2.pid --log-file=main2.log --daemon start

Apache conf:

RewriteRule ^(.*)$ balancer://somename$1 [P,L]

<Proxy balancer://somename>
    BalancerMember http://127.0.0.1:5000 retry=3
    BalancerMember http://127.0.0.1:5001 retry=3
</Proxy>

You can use nginx too.

Climbus
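An nginx front end for the same two paster backends would be an upstream block plus proxy_pass; a minimal sketch (the upstream name and listen port are illustrative):

```nginx
# Round-robin over the two paster instances from the config above
upstream pylons_cluster {
    server 127.0.0.1:5000;
    server 127.0.0.1:5001;
}

server {
    listen 80;
    location / {
        proxy_pass http://pylons_cluster;
    }
}
```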
Re: Multiple Pylons instances, processor affinity, and threads
Devin Torres wrote:

If I understand you correctly, there's a flup entry point that forks the process instead of flup_fcgi_thread? I'm not sure that would have good performance, but maybe you think forking is capable of good performance in this case. After forking, would SQLAlchemy connections stay persistent? Is that safe?

Flup has both fcgi_fork and scgi_fork flavors. They are pre-fork, so it creates a pool of long-running processes and passes connections to them. This is the same model that Apache uses and is in theory quite efficient. You do NOT have to wait for a fork on every connection, because the pool of processes has been forked in advance and is ready and waiting.
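The pre-fork model described here can be sketched in a few lines of Python: fork the pool up front, have every child block in accept() on the shared listening socket, and let the kernel hand each connection to an idle child. This is a simplified illustration of the general technique, not flup's actual implementation; requests_per_child mirrors Apache's MaxRequestsPerChild:

```python
import os
import socket

def prefork_pool(sock, num_workers, handle, requests_per_child=None):
    """Fork num_workers children up front; each blocks in accept() on the
    shared listening socket, so the kernel hands every new connection to
    an idle child -- no fork per request. requests_per_child mirrors
    Apache's MaxRequestsPerChild (None means serve forever, like 0)."""
    pids = []
    for _ in range(num_workers):
        pid = os.fork()
        if pid == 0:  # child: serve connections, then exit when recycled
            served = 0
            while requests_per_child is None or served < requests_per_child:
                conn, _addr = sock.accept()
                handle(conn)
                conn.close()
                served += 1
            os._exit(0)
        pids.append(pid)
    return pids  # parent keeps the pids to wait on / monitor
```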
Re: Multiple Pylons instances, processor affinity, and threads
On Wed, Apr 23, 2008 at 3:20 PM, Christopher Weimann [EMAIL PROTECTED] wrote:

Flup has both fcgi_fork and scgi_fork flavors. They are pre-fork, so it creates a pool of long-running processes and passes connections to them. This is the same model that Apache uses and is in theory quite efficient. You do NOT have to wait for a fork on every connection, because the pool of processes has been forked in advance and is ready and waiting.

Do you happen to know the applicable setting to use when specifying the size of that pool?

-Devin Torres
Re: Multiple Pylons instances, processor affinity, and threads
On Apr 24, 3:51 am, Devin Torres [EMAIL PROTECTED] wrote:

So we're using Pylons and Python in general for our new company platform. We just bought a server with 4 cores to help us reach our scalability goals, but there are a few questions I'm interested in asking the Pylons community. I (mostly) understand the nature of threads in Python. From my understanding, the GIL locks the interpreter to executing only one Python thread at a time, but C modules can take advantage of a Python application being multithreaded, because they can operate independent of the GIL. Presumably, this would mean that there is, in fact, a benefit to using threads in Paste, because most network I/O-bound stuff happens within a C module. Given this situation, I believe that despite Paste making an effort to be multithreaded, it would still be advantageous to run a cluster of four Pylons instances and proxy to these using nginx. Using our setup we'd have four Pylons instances being proxied to by four nginx worker processes. In nginx you can set the processor affinity for each worker, thus placing each worker on a different core 0..3. Here's where things get tricky: I've found a Python package that apparently allows Python applications to set their processor affinity (I'm afraid it doesn't work on OS X): http://pypi.python.org/pypi/affinity/0.1.0 Using this, what do you guys think of my idea to write a custom cluster controller, perhaps using supervisord, that will start nginx and the four worker processes, and then fork()s my Pylons app into a cluster of four? Is this overkill? Is Paste more multithreaded than I'm giving it credit for? Is there a better way to go about this? Does an alternative to the 'affinity' package exist?

Use Apache and mod_wsgi and you have all that you want except playing with 'processor affinity'. This is because Apache is multi-process by design and thus can properly make use of multiple CPUs. A lot of what goes on in Apache is also not implemented in Python and thus not subject to GIL issues. You might also have a read of the following:

http://blog.dscpl.com.au/2007/09/parallel-python-discussion-and-modwsgi.html
http://blog.dscpl.com.au/2007/07/web-hosting-landscape-and-modwsgi.html

These explain some of the issues around multiprocess web servers and the GIL. I'm not sure why you wouldn't just let the operating system handle allocation of processes/threads across CPUs, as it is likely in general to do a better job. Are you sure you aren't trying to solve a problem that doesn't really exist?

Graham
Re: Multiple Pylons instances, processor affinity, and threads
Devin Torres wrote:

Do you happen to know the applicable setting to use when specifying the size of that pool?

Just to use fcgi_fork, do this:

[server:main]
use = egg:PasteScript#flup_fcgi_fork
host = 0.0.0.0
port = 5000

I've never changed the defaults for the pool, but I think this is supposed to be the right way to do it:

[server:main]
paste.server_factory = flup.server.fcgi_fork:factory
host = 0.0.0.0
port = 5000
maxChildren = 50
maxSpare = 5
minSpare = 1

Those are the default pool settings, so that SHOULD be the equivalent of the first config section. The problem is it doesn't seem to work that way. If I use the server_factory and start things up with 'paster serve development.ini' it seems fine until I hit ctrl-c to stop it. Then all hell breaks loose and it starts forking off children like mad, bringing the machine to its knees.

I was planning on moving an app from Quixote using its preforked scgi_server.py to Pylons with flup_scgi_fork, but apparently that's a bad idea. Either I'm using the factory wrong or I need to figure out what's up with flup. I suppose another option is using a Paste#http instance for each processor and nginx as a reverse proxy spreading the load over them.
Re: Multiple Pylons instances, processor affinity, and threads
On Wed, 2008-04-23 at 21:04 -0400, Christopher Weimann wrote:

I suppose another option is using a Paste#http instance for each processor and nginx as a reverse proxy spreading the load over them.

That's what I do.

Cliff