On 26/11/2007, Chris Withers <[EMAIL PROTECTED]> wrote:
> Hey All,
>
> I hope I have the right list; if not, please point me in the right
> direction...
>
> Likewise, if there are good docs that cover all of this, please send me
> their way ;-)
>
> Right, I'm curious as to how WSGI applications end up being
> multi-threaded or multi-process and, if they are, how they share
> resources such as databases and configuration.
>
> There are a couple of reasons I'm asking...
>
> The first was something Chris McDonough said about one of the issues
> they're having with the repoze project: when using something like
> mod_wsgi, it's the first person to hit each thread that takes the hit of
> loading the configuration and opening up the ZODB. Opening the ZODB, in
> particular, can take a lot of time. How should repoze be structured such
> that all the threads load their config and open their databases when
> Apache is restarted, rather than when each thread is first hit?
>
> The second is a problem I see an app I'm working on heading towards. The
> app has web-alterable configuration, so in a multi-threaded and
> particularly multi-process environment, I need some way to get the other
> threads or processes to re-read their configuration when it has changed.
>
> Hope you guys can help!
For those who haven't previously read it, some background reading on issues of data visibility when using Apache, and specifically mod_wsgi (although it also applies to mod_python to a degree), can be found at:

  http://code.google.com/p/modwsgi/wiki/ProcessesAndThreading

The problem with moving initialisation from the first request to server process initialisation when using Apache/mod_wsgi is that often Apache isn't hosting just the one Python web application. Because of this, applications are usually separated to run in different Python sub interpreters, or in separate processes by using mod_wsgi daemon mode. The first issue therefore, albeit not a major one, is being able to indicate within which daemon process or Python sub interpreter the server process initialisation should be done.

The second issue is what to do if the server process initialisation fails. In the worst case scenario, if it causes the server process to crash, then Apache will start up a new process straight away, and if that keeps crashing you will get into a loop of continual process restarts, possibly affecting machine performance. Even if the failure doesn't crash the process but still leaves the software in an unusable state, what do you do then? In mod_python, where the PythonImport directive is available and can be used to do server process initialisation, most people don't even consider that startup could fail. Thus, what happens is that the code works okay for some time, then they have a problem and the whole server grinds to a halt until someone notices and restarts it.

A more reliable way is therefore to arrange things so that an individual request is able to trigger the server process initialisation if it hasn't previously succeeded. Thus, if a failure has previously occurred, a new request can retrigger the initialisation, and if it works everything can then keep going.
The issue then is that delaying server process initialisation until the first request produces a lag noticeable by the user. To ensure initialisation can be retriggered, but also avoid the delay, one could perform initialisation at process startup as well as have it triggered by the first request if it previously failed. The question, though, is whether this will make a difference to what the user sees. If the server is lightly loaded then it probably would, as the infrequent nature of requests means that in all likelihood the server process initialisation would have completed before a request arrives. If however the machine is under load with a high hit rate, the user may still see a lag anyway. With mod_wsgi, whether this is true will depend on whether embedded or daemon mode is being used, and if using daemon mode, on how many processes are in the daemon process group.

The worst case scenario here is using mod_wsgi daemon mode with a single process for the application. If maximum requests is reached and the process restarted, irrespective of whether you do initialisation when the process starts, you get hit with a few sources of delay. The first, and possibly overlooked as a source of problems, is how long the existing process takes to shut down. In mod_wsgi daemon mode a new process will not be started until the old one has shut down. Rather than just kill the process immediately, mod_wsgi gives existing requests which are running a chance to complete. The default length of time it will wait is 5 seconds. If the requests haven't completed in that time, it will kill off the process and a new one will be started. Even if the requests complete promptly, mod_wsgi will trigger proper shutdown of the Python interpreters, including stopping of non-daemon threads (not that there should be any) and running of atexit registered functions. If this for some reason also takes a long time, it can trigger the same default 5 second timeout and the process will be killed off.
Once the old process has shut down, you still need to start up a new one. This is a fork from the Apache child process, so creating the process is quick, but you still need to load and initialise the application. As the new process isn't started until the old one has shut down, if you are only running one daemon process for the application, then any new incoming requests will queue up within the listener socket queue. These requests will not start to be processed until the new process is ready. If application initialisation is done at process start, it will still delay any pending requests, just as it would if the first request triggered the initialisation instead.

These delays in shutdown and startup aren't as big an issue if running multiple mod_wsgi daemon processes, or if using embedded mode, as the other processes can take over servicing requests while one process is being recycled, provided of course that all daemon processes aren't being recycled at the same time. Because of how process recycling works for mod_wsgi daemon mode, if using it and your processes are slow to start up or shut down, or you have long running requests, then it is recommended that you run multiple daemon processes. Obviously, if your application isn't multiprocess safe, that could be an issue. Next, then, is possibly to look at what may be stopping an application from shutting down promptly.

Anyway, hope this explains a few issues and gives you some things to look at. I have cc'd this over to the mod_wsgi list, as discussion of how mod_wsgi does things is more appropriate over there. Maybe go to the mod_wsgi list if you want to discuss further any new feature for allowing server process startup. I haven't ruled it out completely, but I also don't want to provide a mechanism which people will just end up using in the wrong way, and so not consider steps that may still be required to trigger initialisation when requests arrive.
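For reference, the multiple daemon process recommendation above would look something like this in the Apache configuration. This is a hypothetical fragment: the group name `myapp`, the script path, and the process/thread counts are placeholders to be tuned for the actual application:

```apache
# Run the application in its own daemon process group with more than
# one process, so a process being recycled doesn't stall all requests.
WSGIDaemonProcess myapp processes=3 threads=15 maximum-requests=10000
WSGIProcessGroup myapp
WSGIScriptAlias / /srv/myapp/app.wsgi
```

With three processes, recycling one when `maximum-requests` is reached leaves the other two servicing requests, which is the behaviour described above; whether that is safe depends on the application being multiprocess safe.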
As for general issues around the best way to perform application initialisation, the problem is that the most appropriate way may depend on the specific hosting mechanism. There isn't necessarily going to be one way that will suit all the ways WSGI can be hosted, which is partly why there isn't a standard for how to do it.

Graham

_______________________________________________
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig
Unsubscribe: http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com