Re: [Python-Dev] Improved thread switching
FYI: I shot an email to stdlib-sig about the fact that I am proposing the inclusion of the pyProcessing module into the stdlib. Comments and thoughts regarding that would be welcome. I've got a rough outline of the PEP, but I need to spend more time with the code examples.

-jesse

On Wed, Mar 19, 2008 at 9:52 PM, Alex Martelli [EMAIL PROTECTED] wrote:
> Hmmm, sorry if I'm missing something obvious, but if the occasional
> background computations are sufficiently heavy -- why not fork, do said
> computations in the child process, and return the results via any of the
> various available IPC approaches?
>
> I've recently (at PyCon, mostly) been playing devil's advocate (i.e.,
> being PRO-threads, for once) on the subject of utilizing multiple cores
> effectively -- but the classic approach (using multiple _processes_
> instead) actually works quite well in many cases, and this application
> server would appear to be one. (The pyProcessing package appears to offer
> an easy way to migrate threaded code to multiple-process approaches,
> although I've only played around with it, not [yet] used it for
> production code.)
>
> Alex
>
> [earlier exchange between Adam Olsen and Stefan Ring snipped; it appears
> in full further down in this thread]

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/aleaxit%40gmail.com
Re: [Python-Dev] Improved thread switching
On Thu, Mar 20, 2008 at 09:58:46AM -0400, Jesse Noller wrote:
> FYI: I shot an email to stdlib-sig about the fact that I am proposing
> the inclusion of the pyProcessing module into the stdlib. Comments and
> thoughts regarding that would be welcome. I've got a rough outline of
> the PEP, but I need to spend more time with the code examples.

Since we officially encourage people to spawn processes instead of threads, I think that this would be a great idea. The processing module has an API similar to threading's. It's easy to use, works well, and most importantly, gives us some place to point people to when they complain about the GIL.

--
Andrew McNabb
http://www.mcnabbs.org/andrew/
PGP Fingerprint: 8A17 B57C 6879 1863 DE55 8012 AB4D 6098 8826 6868
Re: [Python-Dev] Improved thread switching
Even I, as a strong advocate for its inclusion, think I should finish the PEP and outline all of the questions/issues that may come out of it.

On Thu, Mar 20, 2008 at 1:37 PM, Facundo Batista [EMAIL PROTECTED] wrote:
> 2008/3/20, Andrew McNabb [EMAIL PROTECTED]:
>> Since we officially encourage people to spawn processes instead of
>> threads, I think that this would be a great idea. The processing module
>> has an API similar to threading's. It's easy to use, works well, and
>> most importantly, gives us some place to point people to when they
>> complain about the GIL.
>
> I'm +1 on including the processing module in the stdlib. Just to avoid
> confusion among these libraries with similar names: I mean this [1]
> module, the one that emulates the semantics of the threading module.
>
> Does anybody have strong reasons for this module not to get included?
>
> Regards,
>
> [1] http://pypi.python.org/pypi/processing
>
> --
> .Facundo
> Blog: http://www.taniquetil.com.ar/plog/
> PyAr: http://www.python.org/ar/
Re: [Python-Dev] Improved thread switching
Facundo Batista wrote:
> I'm +1 on including the processing module in the stdlib. Just to avoid
> confusion among these libraries with similar names: I mean this [1]
> module, the one that emulates the semantics of the threading module.
>
> Does anybody have strong reasons for this module not to get included?
>
> [1] http://pypi.python.org/pypi/processing

Other than the pre-release version number, and the fact that doing such a thing would require R. Oudkerk to actually make the offer rather than anyone else? There would also need to be the usual thing of at least a couple of people stepping up and being willing to maintain it.

I also wouldn't mind seeing some performance figures for an application that was limited to making good use of only one CPU when run with the threading module, but was able to exploit multiple processors to obtain a speed improvement when run with the processing module.

That said, I'm actually +1 on the general idea, since I always write my threaded Python code using worker threads that I communicate with via Queue objects. pyProcessing would be a great way for me to scale to multiple processors if I were running CPU-intensive tasks rather than potentially long-running hardware I/O operations (I've been meaning to check it out for a long time, but have never actually needed to for either work or any home projects).

Cheers,
Nick.

--
Nick Coghlan | [EMAIL PROTECTED] | Brisbane, Australia
http://www.boredomandlaziness.org
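[The worker-threads-plus-Queue pattern Nick describes looks like this minimal sketch. The module is spelled Queue in the Python 2.x of this thread, queue today; the squaring "work" is a stand-in for a real task.]

```python
import queue      # spelled "Queue" in the Python 2.x of this thread
import threading

def worker(tasks, results):
    """Pull items off the tasks queue until the None sentinel arrives."""
    while True:
        item = tasks.get()
        if item is None:              # sentinel: shut the worker down
            break
        results.put(item * item)      # stand-in for the real work
        tasks.task_done()

tasks, results = queue.Queue(), queue.Queue()
t = threading.Thread(target=worker, args=(tasks, results))
t.start()
for n in range(5):
    tasks.put(n)
tasks.put(None)
t.join()
squares = sorted(results.get() for _ in range(5))
```

Scaling this to multiple processors is then mostly a matter of swapping in the process-based equivalents of Thread and Queue.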
[Python-Dev] Improved thread switching
The company I work for has over the last couple of years created an application server for use in most of our customer projects. It embeds Python, and most project code is written in Python by now. It is quite resource-hungry (several GB of RAM, MySQL databases of 50-100 GB). And of course it is multi-threaded and, at least originally, we hoped to make it utilize multiple processor cores. Which, as we all know, doesn't sit very well with Python.

Our application runs heavy background calculations most of the time (in Python) and has to service multiple (few) GUI clients at the same time, also using Python. The problem was that a single background thread would increase the response time of the client threads by a factor of 10 or (usually) more. This led me to add a dirty hack to the Python core to make it switch threads more frequently. While this hack greatly improved response time for the GUI clients, it also slowed down the background threads quite a bit. top would often show significantly less CPU usage -- 80% instead of the more usual 100%.

The problem with thread switching in Python is that the global semaphore used for the GIL is regularly released and immediately reacquired. Unfortunately, most of the time this leads to the very same thread winning the race on the semaphore again, and thus more wait time for the other threads. This is where my dirty patch intervened and just did a nanosleep() for a short amount of time (I used 1000 nsecs).

I have since created a better scheduling scheme and written a small test program that nicely mimics what Python does, for some statistics. I call the scheduling algorithm the round-robin semaphore, because threads can now run in a more or less round-robin fashion. Actually, it's just a semaphore with FIFO semantics. The implementation problem with the round-robin semaphore is the __thread variable I had to use, because I did not want to change the signature of the Enter() and Leave() methods.
For CPython, I have replaced this thread-local allocation with an additional field in the PyThreadState. Because of that, the patch for CPython I have already created is a bit more involved than the simple nanosleep() hack. Consequently, it's not very polished yet, and not at all as portable as the rest of the Python core.

Below are the results from the test program, which compares all three scheduling mechanisms -- standard Python, my dirty hack, and the new round-robin semaphore -- followed by the test program itself, with the three implementations nicely encapsulated. The program was run on a quad-core Xeon 1.86 GHz on Fedora 5 x86_64.

The first three lines of each block of output (including the name of the algorithm) should be self-explanatory. The fourth and fifth show a distribution of wait times for the individual threads. The ideal distribution would be everything on the number of threads (2 in this case) and zero everywhere else. As you can see, the round-robin semaphore is pretty close to that. Also, because of the high thread switching frequency, we could lower Python's checkinterval -- the jury is still out on the actual value, likely something between 1000 and 1.

I can post my Python patch if there is enough interest. Thanks for your attention.
Synch: Python lock
iteration count: 24443   thread switches: 10
     1     2     3     4     5     6     7     8     9    10   -10   -50  -100   -1k  more
 24433     0     0     0     0     0     0     0     0     0     0     1     1     6     0

Synch: Dirty lock
iteration count: 25390   thread switches: 991
     1     2     3     4     5     6     7     8     9    10   -10   -50  -100   -1k  more
 24399    10     0     0     0     0     1     0     1     0   975     1     1     0     0

Synch: round-robin semaphore
iteration count: 23023   thread switches: 22987
     1     2     3     4     5     6     7     8     9    10   -10   -50  -100   -1k  more
    36 22984     0     0     0     0     0     0     0     0     1     0     0     0     0

// compile with: g++ -g -O0 -pthread -Wall p.cpp
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <assert.h>

//
// posix stuff

class TMutex {
    pthread_mutex_t mutex;
    static pthread_mutex_t initializer_normal;
    static pthread_mutex_t initializer_recursive;
    TMutex(const TMutex &);
    TMutex &operator=(const TMutex &);
public:
    TMutex(bool recursive = true);
    ~TMutex() { pthread_mutex_destroy(&mutex); }
    void Lock() { pthread_mutex_lock(&mutex); }
    bool TryLock() { return pthread_mutex_trylock(&mutex) == 0; }
    void Unlock() { pthread_mutex_unlock(&mutex); }
    friend class TCondVar;
};

class TCondVar {
    pthread_cond_t cond;
    static pthread_cond_t initializer;
    TCondVar(const TCondVar &);
    TCondVar &operator=(const TCondVar &);
public:
    TCondVar();
    ~TCondVar() { pthread_cond_destroy(&cond); }
    void Wait(TMutex *mutex) { pthread_cond_wait(&cond,
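[The essential idea of Stefan's round-robin semaphore -- FIFO hand-off, so the releasing thread cannot immediately re-win the race -- can also be sketched in pure Python. This is an illustration of the semantics only, not the author's C++ implementation or CPython patch; FIFOLock and its internals are hypothetical names.]

```python
import threading
import time
from collections import deque

class FIFOLock:
    """A lock that grants ownership in strict arrival (FIFO) order."""

    def __init__(self):
        self._mutex = threading.Lock()   # protects the internal state
        self._waiters = deque()          # one Event per queued thread
        self._locked = False

    def acquire(self):
        with self._mutex:
            if not self._locked and not self._waiters:
                self._locked = True      # uncontended fast path
                return
            ev = threading.Event()
            self._waiters.append(ev)
        ev.wait()                        # woken only once we own the lock

    def release(self):
        with self._mutex:
            if self._waiters:
                # Hand ownership directly to the oldest waiter, so the
                # releasing thread cannot immediately reacquire.
                self._waiters.popleft().set()
            else:
                self._locked = False

# Demo: three threads queue up while the main thread holds the lock;
# they then run in exactly the order in which they arrived.
lock = FIFOLock()
lock.acquire()
order = []

def worker(i):
    lock.acquire()
    order.append(i)
    lock.release()

threads = []
for i in range(3):
    t = threading.Thread(target=worker, args=(i,))
    t.start()
    while len(lock._waiters) <= i:       # wait until thread i is queued
        time.sleep(0.001)
    threads.append(t)

lock.release()                           # hand off to the first waiter
for t in threads:
    t.join()
```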
Re: [Python-Dev] Improved thread switching
On Tue, Mar 18, 2008 at 1:29 AM, Stefan Ring [EMAIL PROTECTED] wrote:
> [description of the application server and its responsiveness problem
> snipped; see the original message above]
>
> The problem with thread switching in Python is that the global semaphore
> used for the GIL is regularly released and immediately reacquired.
> Unfortunately, most of the time this leads to the very same thread
> winning the race on the semaphore again, and thus more wait time for the
> other threads. This is where my dirty patch intervened and just did a
> nanosleep() for a short amount of time (I used 1000 nsecs).

Can you try with a call to sched_yield(), rather than nanosleep()? It should have the same benefit but without as much performance hit. If it works, but is still too much of a hit, try tuning the checkinterval to see if you can find an acceptable throughput/responsiveness balance.
--
Adam Olsen, aka Rhamphoryncus
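[For the checkinterval tuning Adam suggests, a hedged sketch: in the CPython 2.x of this thread the knob is sys.setcheckinterval(), counted in bytecode instructions between GIL release points (default 100), while CPython 3.2+ replaced it with the time-based sys.setswitchinterval(). The values below are illustrative, not recommendations.]

```python
import sys

# Tune how often the interpreter offers to switch threads. Lower values
# improve responsiveness at the cost of more switching overhead.
if hasattr(sys, "setswitchinterval"):
    # Modern CPython: a time-based interval in seconds (default 0.005).
    sys.setswitchinterval(0.0005)
    interval = sys.getswitchinterval()
else:
    # CPython 2.x: bytecode instructions between switch checks (default 100).
    sys.setcheckinterval(10)
    interval = sys.getcheckinterval()
```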
Re: [Python-Dev] Improved thread switching
Adam Olsen <rhamph at gmail.com> writes:
> Can you try with a call to sched_yield(), rather than nanosleep()? It
> should have the same benefit but without as much performance hit. If it
> works, but is still too much of a hit, try tuning the checkinterval to
> see if you can find an acceptable throughput/responsiveness balance.

I tried that, and it had no effect whatsoever. I suppose it would have an effect on a single CPU or an otherwise heavily loaded SMP system, but that's not the scenario we care about.
Re: [Python-Dev] Improved thread switching
On Wed, Mar 19, 2008 at 10:09 AM, Stefan Ring [EMAIL PROTECTED] wrote:
> I tried that, and it had no effect whatsoever. I suppose it would have
> an effect on a single CPU or an otherwise heavily loaded SMP system, but
> that's not the scenario we care about.

So you've got a lightly loaded SMP system? Multiple threads all blocked on the GIL, multiple CPUs to run them, but only one CPU is active? In that case I can imagine how sched_yield() might finish before the other CPUs wake up a thread.

A FIFO scheduler would be the right thing here, but it's only a short term solution. Care for a long term solution? ;)

http://code.google.com/p/python-safethread/

--
Adam Olsen, aka Rhamphoryncus
Re: [Python-Dev] Improved thread switching
Adam Olsen <rhamph at gmail.com> writes:
> A FIFO scheduler would be the right thing here, but it's only a short
> term solution. Care for a long term solution? ;)
>
> http://code.google.com/p/python-safethread/

I've already seen that, but it would not help us in our current situation. The performance penalty really is too heavy. Our system is slow enough already ;), and it would be very difficult, bordering on impossible, to parallelize. Plus, I can imagine that all extension modules (and our own code) would have to be adapted.

The FIFO scheduler is perfect for us because the load is typically quite low. It's mostly at those times when someone runs a lengthy calculation that all other users suffer greatly increased response times.
Re: [Python-Dev] Improved thread switching
On Wed, Mar 19, 2008 at 10:42 AM, Stefan Ring [EMAIL PROTECTED] wrote:
> I've already seen that, but it would not help us in our current
> situation. The performance penalty really is too heavy. [...]
>
> The FIFO scheduler is perfect for us because the load is typically quite
> low. It's mostly at those times when someone runs a lengthy calculation
> that all other users suffer greatly increased response times.

So you want responsiveness when idle but throughput when busy?

Are those calculations primarily Python code, or does a C library do the grunt work? If it's a C library you shouldn't be affected by safethread's increased overhead.
--
Adam Olsen, aka Rhamphoryncus
Re: [Python-Dev] Improved thread switching
Adam Olsen <rhamph at gmail.com> writes:
> So you want responsiveness when idle but throughput when busy?

Exactly ;)

> Are those calculations primarily Python code, or does a C library do the
> grunt work? If it's a C library you shouldn't be affected by
> safethread's increased overhead.

It's Python code all the way. Frankly, it's a huge mess, but it would be very, very hard to come up with a scalable solution that would allow us to optimize certain hotspots and redo them in C or C++. There isn't even anything left to optimize in particular, because all the low-hanging fruit have already been taken care of. So it's just ~30 kloc of Python code over which the total time spent is quite uniformly distributed :(.
Re: [Python-Dev] Improved thread switching
On Wed, Mar 19, 2008 at 11:25 AM, Stefan Ring [EMAIL PROTECTED] wrote:
> It's Python code all the way. [...] So it's just ~30 kloc of Python code
> over which the total time spent is quite uniformly distributed :(.

I see. Well, at this point I think the most you can do is file a bug so the problem doesn't get forgotten. If nothing else, if my safethread stuff goes in it'll very likely include a --with-gil option, so I may put together a FIFO scheduler.

--
Adam Olsen, aka Rhamphoryncus
Re: [Python-Dev] Improved thread switching
Hmmm, sorry if I'm missing something obvious, but if the occasional background computations are sufficiently heavy -- why not fork, do said computations in the child process, and return the results via any of the various available IPC approaches?

I've recently (at PyCon, mostly) been playing devil's advocate (i.e., being PRO-threads, for once) on the subject of utilizing multiple cores effectively -- but the classic approach (using multiple _processes_ instead) actually works quite well in many cases, and this application server would appear to be one. (The pyProcessing package appears to offer an easy way to migrate threaded code to multiple-process approaches, although I've only played around with it, not [yet] used it for production code.)

Alex

On Wed, Mar 19, 2008 at 10:49 AM, Adam Olsen [EMAIL PROTECTED] wrote:
> [earlier exchange with Stefan Ring snipped; it appears in full above]
>
> Well, at this point I think the most you can do is file a bug so the
> problem doesn't get forgotten. If nothing else, if my safethread stuff
> goes in it'll very likely include a --with-gil option, so I may put
> together a FIFO scheduler.
--
Adam Olsen, aka Rhamphoryncus
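[Alex's fork-and-IPC suggestion can be sketched in a few lines of POSIX-only Python: run the computation in a forked child and ship the pickled result back through a pipe. run_in_child is a hypothetical helper name, not an API from this thread, and error handling in the child is omitted for brevity.]

```python
import os
import pickle
import struct

def run_in_child(func, *args):
    """Run func(*args) in a forked child process and return its result.

    The result is pickled and sent back over a pipe with a 4-byte
    length prefix. POSIX only (requires os.fork)."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: compute, serialize, write length-prefixed payload, exit.
        os.close(r)
        payload = pickle.dumps(func(*args))
        os.write(w, struct.pack("!I", len(payload)) + payload)
        os._exit(0)
    # Parent: read the length prefix, then the payload, then reap the child.
    os.close(w)
    size = struct.unpack("!I", os.read(r, 4))[0]
    data = b""
    while len(data) < size:
        data += os.read(r, size - len(data))
    os.close(r)
    os.waitpid(pid, 0)
    return pickle.loads(data)

result = run_in_child(sum, range(10))
```

Because the heavy work happens in a separate process, it runs on another core without ever contending for the parent's GIL.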