Re: [Python-Dev] Improved thread switching

2008-03-20 Thread Jesse Noller
FYI: I shot an email to stdlib-sig about the fact I am proposing the
inclusion of the pyProcessing module into the stdlib. Comments and
thoughts regarding that would be welcome. I've got a rough outline of
the PEP, but I need to spend more time with the code examples.

-jesse

On Wed, Mar 19, 2008 at 9:52 PM, Alex Martelli [EMAIL PROTECTED] wrote:
 Hmmm, sorry if I'm missing something obvious, but, if the occasional
  background computations are sufficiently heavy -- why not fork, do
  said computations in the child process, and return the results via any
  of the various available IPC approaches?  I've recently (at Pycon,
  mostly) been playing devil's advocate (i.e., being PRO-threads, for
  once) on the subject of utilizing multiple cores effectively -- but
  the classic approach (using multiple _processes_ instead) actually
  works quite well in many cases, and this application server would
  appear to be one.  (the pyProcessing package appears to offer an easy
  way to migrate threaded code to multiple-processes approaches,
  although I've only played around with it, not [yet] used it for
  production code).


  Alex



  On Wed, Mar 19, 2008 at 10:49 AM, Adam Olsen [EMAIL PROTECTED] wrote:
   On Wed, Mar 19, 2008 at 11:25 AM, Stefan Ring [EMAIL PROTECTED] wrote:
 Adam Olsen rhamph at gmail.com writes:


  So you want responsiveness when idle but throughput when busy?

  Exactly ;)


   Are those calculations primarily python code, or does a C library do
   the grunt work?  If it's a C library you shouldn't be affected by
   safethread's increased overhead.
  

  It's Python code all the way. Frankly, it's a huge mess, but it would be
  very, very hard to come up with a scalable solution that would allow us
  to optimize certain hotspots and redo them in C or C++. There isn't even
  anything left to optimize in particular, because all the low-hanging
  fruit has already been taken care of. So it's just ~30kloc of Python
  code over which the total time spent is quite uniformly distributed :(.
  
I see.  Well, at this point I think the most you can do is file a bug
so the problem doesn't get forgotten.  If nothing else, if my
safethread stuff goes in it'll very likely include a --with-gil
option, so I may put together a FIFO scheduler.
  
  
--
Adam Olsen, aka Rhamphoryncus
  
  

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Improved thread switching

2008-03-20 Thread Andrew McNabb
On Thu, Mar 20, 2008 at 09:58:46AM -0400, Jesse Noller wrote:
 FYI: I shot an email to stdlib-sig about the fact I am proposing the
 inclusion of the pyProcessing module into the stdlib. Comments and
 thoughts regarding that would be welcome. I've got a rough outline of
 the PEP, but I need to spend more time with the code examples.

Since we officially encourage people to spawn processes instead of
threads, I think that this would be a great idea.  The processing module
has a similar API to threading.  It's easy to use, works well, and most
importantly, gives us some place to point people to when they complain
about the GIL.
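The near-drop-in relationship between the two APIs can be sketched as follows. (The processing package was later merged into the stdlib as multiprocessing, so this sketch uses the modern name; with the 2008-era package you would import from processing instead. Names like work and run_with are illustrative, not from any message in this thread.)

```python
import multiprocessing
import queue
import threading

def work(q):
    # CPU-bound toy job: with threads it serializes on the GIL,
    # with processes it can run on a separate core.
    q.put(sum(i * i for i in range(100000)))

def run_with(worker_cls, q):
    # Identical driver code for both worker types -- only the worker
    # class and the queue implementation are swapped.
    workers = [worker_cls(target=work, args=(q,)) for _ in range(2)]
    for w in workers:
        w.start()
    results = [q.get() for _ in workers]
    for w in workers:
        w.join()
    return results

# On Windows (and other spawn-based platforms) this would need an
# ``if __name__ == "__main__":`` guard around the process-based call.
threaded = run_with(threading.Thread, queue.Queue())
multiproc = run_with(multiprocessing.Process, multiprocessing.Queue())
```

Both calls produce the same results; only the second can actually use more than one CPU.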


-- 
Andrew McNabb
http://www.mcnabbs.org/andrew/
PGP Fingerprint: 8A17 B57C 6879 1863 DE55  8012 AB4D 6098 8826 6868




Re: [Python-Dev] Improved thread switching

2008-03-20 Thread Jesse Noller
Even I, as a strong advocate for its inclusion, think I should finish
the PEP and outline all of the questions/issues that may come out of
it.

On Thu, Mar 20, 2008 at 1:37 PM, Facundo Batista
[EMAIL PROTECTED] wrote:
 2008/3/20, Andrew McNabb [EMAIL PROTECTED]:


   Since we officially encourage people to spawn processes instead of
threads, I think that this would be a great idea.  The processing module
has a similar API to threading.  It's easy to use, works well, and most
importantly, gives us some place to point people to when they complain
about the GIL.

  I'm +1 to include the processing module in the stdlib.

  Just to avoid confusion among these similarly named libraries: I am
  referring to this [1] module, the one that emulates the semantics of
  the threading module.

  Does anybody have strong reasons for this module not to get included?

  Regards,

  [1] http://pypi.python.org/pypi/processing

  --
  .Facundo

  Blog: http://www.taniquetil.com.ar/plog/
  PyAr: http://www.python.org/ar/




Re: [Python-Dev] Improved thread switching

2008-03-20 Thread Nick Coghlan
Facundo Batista wrote:
 2008/3/20, Andrew McNabb [EMAIL PROTECTED]:
 
 Since we officially encourage people to spawn processes instead of
  threads, I think that this would be a great idea.  The processing module
  has a similar API to threading.  It's easy to use, works well, and most
  importantly, gives us some place to point people to when they complain
  about the GIL.
 
 I'm +1 to include the processing module in the stdlib.
 
 Just to avoid confusion among these similarly named libraries: I am
 referring to this [1] module, the one that emulates the semantics of
 the threading module.
 
 Does anybody have strong reasons for this module not to get included?

Other than the pre-release version number and the fact that doing such a 
thing would require R. Oudkerk to actually make the offer rather than 
anyone else? There would also need to be the usual thing of at least a 
couple of people stepping up and being willing to maintain it.

I also wouldn't mind seeing some performance figures for an application 
that was limited to making good use of only one CPU when run with the 
threading module, but was able to exploit multiple processors to obtain 
a speed improvement when run with the processing module.

That said, I'm actually +1 on the general idea, since I always write my 
threaded Python code using worker threads that I communicate with via 
Queue objects. Pyprocessing would be a great way for me to scale to 
multiple processors if I was running CPU intensive tasks rather than 
potentially long-running hardware IO operations (I've been meaning to 
check it out for a long time, but have never actually needed to for 
either work or any home projects).
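The worker-threads-plus-Queue pattern Nick describes is what makes the migration cheap: the worker loop below runs unchanged whether the queues and workers come from threading or from pyProcessing. (A minimal sketch; the worker function and the sentinel convention are illustrative.)

```python
import queue
import threading

def worker(tasks, results):
    # Pull work items until the None sentinel arrives. This exact loop
    # also works with process-backed Queue objects and Process workers.
    while True:
        item = tasks.get()
        if item is None:
            break
        results.put(item * item)

tasks, results = queue.Queue(), queue.Queue()
pool = [threading.Thread(target=worker, args=(tasks, results)) for _ in range(2)]
for t in pool:
    t.start()
for n in range(10):
    tasks.put(n)
for _ in pool:
    tasks.put(None)          # one sentinel per worker
for t in pool:
    t.join()
squares = sorted(results.get() for _ in range(10))
```

Because the workers only ever touch the two queues, swapping the worker class is the whole migration.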

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://www.boredomandlaziness.org


[Python-Dev] Improved thread switching

2008-03-19 Thread Stefan Ring

The company I work for has over the last couple of years created an
application server for use in most of our customer projects. It embeds Python
and most project code is written in Python by now. It is quite resource-hungry
(several GB of RAM, MySQL databases of 50-100GB). And of course it is
multi-threaded and, at least originally, we hoped to make it utilize multiple
processor cores. Which, as we all know, doesn't sit very well with Python. Our
application runs heavy background calculations most of the time (in Python)
and has to service multiple (few) GUI clients at the same time, also using
Python. The problem was that a single background thread would increase the
response time of the client threads by a factor of 10 or (usually) more.

This led me to add a dirty hack to the Python core to make it switch threads
more frequently. While this hack greatly improved response time for the GUI
clients, it also slowed down the background threads quite a bit. top would
often show significantly less CPU usage -- 80% instead of the more usual 100%.

The problem with thread switching in Python is that the global semaphore used
for the GIL is regularly released and immediately reacquired. Unfortunately,
most of the time this leads to the very same thread winning the race on the
semaphore again and thus more wait time for the other threads. This is where
my dirty patch intervened and just did a nanosleep() for a short amount of
time (I used 1000 nsecs).

I have then created a better scheduling scheme and written a small test
program that nicely mimics what Python does for some statistics. I call the
scheduling algorithm the round-robin semaphore because threads can now run in
a more or less round-robin fashion. Actually, it's just a semaphore with FIFO
semantics.
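The FIFO property can be illustrated at the Python level with a "ticket lock" built on a condition variable (this is only an illustrative sketch of the semantics, not Stefan's C implementation; the FifoLock class and its attribute names are made up here):

```python
import threading

class FifoLock:
    """A lock with FIFO ("ticket") semantics: waiters are served strictly
    in arrival order, so a thread that releases the lock cannot immediately
    snatch it back ahead of threads already queued -- the property the
    round-robin semaphore gives the GIL."""

    def __init__(self):
        self._cond = threading.Condition()
        self._next_ticket = 0    # next ticket number to hand out
        self._now_serving = 0    # ticket currently allowed to hold the lock

    def acquire(self):
        with self._cond:
            me = self._next_ticket
            self._next_ticket += 1
            while me != self._now_serving:
                self._cond.wait()

    def release(self):
        with self._cond:
            self._now_serving += 1
            self._cond.notify_all()

# Demo: three threads queue up behind the held lock, then run in
# strict arrival order once it is released.
import time

order = []
lock = FifoLock()
lock.acquire()               # hold the lock so the workers queue up

def grab(i):
    lock.acquire()
    order.append(i)
    lock.release()

workers = []
for i in range(3):
    t = threading.Thread(target=grab, args=(i,))
    t.start()
    workers.append(t)
    while lock._next_ticket < i + 2:   # wait until worker i holds its ticket
        time.sleep(0.001)

lock.release()               # workers now proceed in ticket order
for t in workers:
    t.join()
```

With an ordinary lock the append order would be up to the OS scheduler; with the ticket lock it matches arrival order.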

The implementation problem with the round-robin semaphore is the __thread
variable I had to use because I did not want to change the signature of the
Enter() and Leave() methods. For CPython, I have replaced this thread-local
allocation with an additional field in the PyThreadState. Because of that, the
patch for CPython I have already created is a bit more involved than the
simple nanosleep() hack. Consequently, it's not very polished yet and not at
all as portable as the rest of the Python core.

I now show you the results from the test program which compares all three
scheduling mechanisms -- standard python, my dirty hack and the new
round-robin semaphore. I also show you the test program containing the three
implementations nicely encapsulated.

The program was run on a quad-core Xeon 1.86 GHz on Fedora 5 x86_64. The first
three lines from the output (including the name of the algorithm) should be
self-explanatory. The fourth and the fifth show a distribution of wait times
for the individual threads. The ideal distribution would be everything on the
number of threads (2 in this case) and zero everywhere else. As you can see,
the round-robin semaphore is pretty close to that. Also, because of the high
thread switching frequency, we could lower Python's checkinterval -- the jury
is still out on the actual value, likely something between 1000 and 1.

I can post my Python patch if there is enough interest.

Thanks for your attention.


Synch: Python lock
iteration count: 24443
thread switches: 10
     1     2     3     4     5     6     7     8     9    10   -10   -50  -100   -1k  more
 24433     0     0     0     0     0     0     0     0     0     0     1     1     6     0

Synch: Dirty lock
iteration count: 25390
thread switches: 991
     1     2     3     4     5     6     7     8     9    10   -10   -50  -100   -1k  more
 24399    10     0     0     0     0     1     0     1     0   975     1     1     0     0

Synch: round-robin semaphore
iteration count: 23023
thread switches: 22987
     1     2     3     4     5     6     7     8     9    10   -10   -50  -100   -1k  more
    36 22984     0     0     0     0     0     0     0     0     1     0     0     0     0
// compile with g++ -g -O0 -pthread -Wall p.cpp

#include <pthread.h>
#include <semaphore.h>

#include <stdio.h>
#include <stdlib.h>

#include <string.h>
#include <errno.h>
#include <assert.h>

//
// posix stuff

class TMutex {
    pthread_mutex_t mutex;

    static pthread_mutex_t initializer_normal;
    static pthread_mutex_t initializer_recursive;
    TMutex(const TMutex &);
    TMutex &operator=(const TMutex &);
public:
    TMutex(bool recursive = true);
    ~TMutex() { pthread_mutex_destroy(&mutex); }
    void Lock() { pthread_mutex_lock(&mutex); }
    bool TryLock() { return pthread_mutex_trylock(&mutex) == 0; }
    void Unlock() { pthread_mutex_unlock(&mutex); }

    friend class TCondVar;
};

class TCondVar {
    pthread_cond_t cond;

    static pthread_cond_t initializer;
    TCondVar(const TCondVar &);
    TCondVar &operator=(const TCondVar &);
public:
    TCondVar();
    ~TCondVar() { pthread_cond_destroy(&cond); }
    void Wait(TMutex *mutex) { pthread_cond_wait(&cond, &mutex->mutex); }
};

Re: [Python-Dev] Improved thread switching

2008-03-19 Thread Adam Olsen
On Tue, Mar 18, 2008 at 1:29 AM, Stefan Ring [EMAIL PROTECTED] wrote:
 The company I work for has over the last couple of years created an
  application server for use in most of our customer projects. It embeds Python
  and most project code is written in Python by now. It is quite 
 resource-hungry
  (several GB of RAM, MySQL databases of 50-100GB). And of course it is
  multi-threaded and, at least originally, we hoped to make it utilize multiple
  processor cores. Which, as we all know, doesn't sit very well with Python. 
 Our
  application runs heavy background calculations most of the time (in Python)
  and has to service multiple (few) GUI clients at the same time, also using
  Python. The problem was that a single background thread would increase the
  response time of the client threads by a factor of 10 or (usually) more.

  This led me to add a dirty hack to the Python core to make it switch threads
  more frequently. While this hack greatly improved response time for the GUI
  clients, it also slowed down the background threads quite a bit. top would
  often show significantly less CPU usage -- 80% instead of the more usual 
 100%.

  The problem with thread switching in Python is that the global semaphore used
  for the GIL is regularly released and immediately reacquired. Unfortunately,
  most of the time this leads to the very same thread winning the race on the
  semaphore again and thus more wait time for the other threads. This is where
  my dirty patch intervened and just did a nanosleep() for a short amount of
  time (I used 1000 nsecs).

Can you try with a call to sched_yield(), rather than nanosleep()?  It
should have the same benefit but without as much performance hit.

If it works, but is still too much hit, try tuning the checkinterval
to see if you can find an acceptable throughput/responsiveness
balance.
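The tuning Adam suggests looks like this. (In the 2008-era interpreter the knob was the bytecode-count-based sys.setcheckinterval(); CPython 3.2+ replaced it with a time-based switch interval, which is what the runnable sketch below uses.)

```python
import sys

# 2008-era API (Python 2.x):
#     sys.setcheckinterval(100)   # consider a thread switch every 100 bytecodes
# Modern equivalent (CPython 3.2+): a time-based knob. Lowering it trades
# throughput for responsiveness, exactly the balance discussed here.
old = sys.getswitchinterval()
sys.setswitchinterval(0.001)      # offer the GIL every 1 ms (default is 5 ms)
```

After experimenting, the old value can be restored with sys.setswitchinterval(old).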

-- 
Adam Olsen, aka Rhamphoryncus


Re: [Python-Dev] Improved thread switching

2008-03-19 Thread Stefan Ring
Adam Olsen rhamph at gmail.com writes:

 Can you try with a call to sched_yield(), rather than nanosleep()?  It
 should have the same benefit but without as much performance hit.
 
 If it works, but is still too much hit, try tuning the checkinterval
 to see if you can find an acceptable throughput/responsiveness
 balance.
 

I tried that, and it had no effect whatsoever. I suppose it would have an
effect on a single CPU or an otherwise heavily loaded SMP system, but
that's not the scenario we care about.



Re: [Python-Dev] Improved thread switching

2008-03-19 Thread Adam Olsen
On Wed, Mar 19, 2008 at 10:09 AM, Stefan Ring [EMAIL PROTECTED] wrote:
 Adam Olsen rhamph at gmail.com writes:

   Can you try with a call to sched_yield(), rather than nanosleep()?  It
   should have the same benefit but without as much performance hit.
  
   If it works, but is still too much hit, try tuning the checkinterval
   to see if you can find an acceptable throughput/responsiveness
   balance.
  

  I tried that, and it had no effect whatsoever. I suppose it would have
  an effect on a single CPU or an otherwise heavily loaded SMP system,
  but that's not the scenario we care about.

So you've got a lightly loaded SMP system?  Multiple threads all
blocked on the GIL, multiple CPUs to run them, but only one CPU is
active?  In that case I can imagine how sched_yield() might finish
before the other CPUs wake up a thread.

A FIFO scheduler would be the right thing here, but it's only a short
term solution.  Care for a long term solution? ;)

http://code.google.com/p/python-safethread/

-- 
Adam Olsen, aka Rhamphoryncus


Re: [Python-Dev] Improved thread switching

2008-03-19 Thread Stefan Ring
Adam Olsen rhamph at gmail.com writes:

 
 On Wed, Mar 19, 2008 at 10:09 AM, Stefan Ring s.r at visotech.at wrote:
  Adam Olsen rhamph at gmail.com writes:
 
Can you try with a call to sched_yield(), rather than nanosleep()?  It
should have the same benefit but without as much performance hit.
   
If it works, but is still too much hit, try tuning the checkinterval
to see if you can find an acceptable throughput/responsiveness
balance.
   
 
   I tried that, and it had no effect whatsoever. I suppose it would have
   an effect on a single CPU or an otherwise heavily loaded SMP system,
   but that's not the scenario we care about.
 
 So you've got a lightly loaded SMP system?  Multiple threads all
 blocked on the GIL, multiple CPUs to run them, but only one CPU is
 active?  In that case I can imagine how sched_yield() might finish
 before the other CPUs wake up a thread.
 
 A FIFO scheduler would be the right thing here, but it's only a short
 term solution.  Care for a long term solution? ;)
 
 http://code.google.com/p/python-safethread/
 


I've already seen that, but it would not help us in our current
situation. The performance penalty really is too heavy. Our system is
slow enough already ;). And it would be very difficult, bordering on
impossible, to parallelize. Plus, I can imagine that all extension
modules (and our own code) would have to be adapted.

The FIFO scheduler is perfect for us because the load is typically quite
low. It's mostly at those times when someone runs a lengthy calculation
that all other users suffer greatly increased response times.




Re: [Python-Dev] Improved thread switching

2008-03-19 Thread Adam Olsen
On Wed, Mar 19, 2008 at 10:42 AM, Stefan Ring [EMAIL PROTECTED] wrote:

 On Mar 19, 2008 05:24 PM, Adam Olsen [EMAIL PROTECTED] wrote:

   On Wed, Mar 19, 2008 at 10:09 AM, Stefan Ring [EMAIL PROTECTED] wrote:
Adam Olsen rhamph at gmail.com writes:
   
 Can you try with a call to sched_yield(), rather than nanosleep()?
  It
  should have the same benefit but without as much performance hit.
 
 If it works, but is still too much hit, try tuning the
  checkinterval
  to see if you can find an acceptable throughput/responsiveness
  balance.
 
   
    I tried that, and it had no effect whatsoever. I suppose it would
    have an effect on a single CPU or an otherwise heavily loaded SMP
    system, but that's not the scenario we care about.
  
   So you've got a lightly loaded SMP system?  Multiple threads all
   blocked on the GIL, multiple CPUs to run them, but only one CPU is
   active?  In that case I can imagine how sched_yield() might finish
   before the other CPUs wake up a thread.
  
   A FIFO scheduler would be the right thing here, but it's only a short
   term solution.  Care for a long term solution? ;)
  
   http://code.google.com/p/python-safethread/

  I've already seen that, but it would not help us in our current
  situation. The performance penalty really is too heavy. Our system is
  slow enough already ;). And it would be very difficult, bordering on
  impossible, to parallelize. Plus, I can imagine that all extension
  modules (and our own code) would have to be adapted.

  The FIFO scheduler is perfect for us because the load is typically quite
  low. It's mostly at those times when someone runs a lengthy calculation
  that all other users suffer greatly increased response times.

So you want responsiveness when idle but throughput when busy?

Are those calculations primarily python code, or does a C library do
the grunt work?  If it's a C library you shouldn't be affected by
safethread's increased overhead.


-- 
Adam Olsen, aka Rhamphoryncus


Re: [Python-Dev] Improved thread switching

2008-03-19 Thread Stefan Ring
Adam Olsen rhamph at gmail.com writes:

 So you want responsiveness when idle but throughput when busy?

Exactly ;)

 Are those calculations primarily python code, or does a C library do
 the grunt work?  If it's a C library you shouldn't be affected by
 safethread's increased overhead.
 

It's Python code all the way. Frankly, it's a huge mess, but it would be
very, very hard to come up with a scalable solution that would allow us to
optimize certain hotspots and redo them in C or C++. There isn't even
anything left to optimize in particular, because all the low-hanging fruit
has already been taken care of. So it's just ~30kloc of Python code over
which the total time spent is quite uniformly distributed :(.



Re: [Python-Dev] Improved thread switching

2008-03-19 Thread Adam Olsen
On Wed, Mar 19, 2008 at 11:25 AM, Stefan Ring [EMAIL PROTECTED] wrote:
 Adam Olsen rhamph at gmail.com writes:


  So you want responsiveness when idle but throughput when busy?

  Exactly ;)


   Are those calculations primarily python code, or does a C library do
   the grunt work?  If it's a C library you shouldn't be affected by
   safethread's increased overhead.
  

  It's Python code all the way. Frankly, it's a huge mess, but it would be
  very, very hard to come up with a scalable solution that would allow us
  to optimize certain hotspots and redo them in C or C++. There isn't even
  anything left to optimize in particular, because all the low-hanging
  fruit has already been taken care of. So it's just ~30kloc of Python
  code over which the total time spent is quite uniformly distributed :(.

I see.  Well, at this point I think the most you can do is file a bug
so the problem doesn't get forgotten.  If nothing else, if my
safethread stuff goes in it'll very likely include a --with-gil
option, so I may put together a FIFO scheduler.


-- 
Adam Olsen, aka Rhamphoryncus


Re: [Python-Dev] Improved thread switching

2008-03-19 Thread Alex Martelli
Hmmm, sorry if I'm missing something obvious, but, if the occasional
background computations are sufficiently heavy -- why not fork, do
said computations in the child process, and return the results via any
of the various available IPC approaches?  I've recently (at Pycon,
mostly) been playing devil's advocate (i.e., being PRO-threads, for
once) on the subject of utilizing multiple cores effectively -- but
the classic approach (using multiple _processes_ instead) actually
works quite well in many cases, and this application server would
appear to be one.  (the pyProcessing package appears to offer an easy
way to migrate threaded code to multiple-processes approaches,
although I've only played around with it, not [yet] used it for
production code).
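The fork-plus-IPC approach Alex describes can be sketched in a few lines of plain POSIX Python (this is an illustrative sketch only, POSIX-only, and the run_in_child helper is hypothetical; pyProcessing wraps the same idea in a threading-like API):

```python
import os
import pickle
import struct

def run_in_child(func, *args):
    """Fork, run func(*args) in the child process, and return its result
    to the parent over a pipe."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:                          # child: compute, send, exit
        os.close(r)
        payload = pickle.dumps(func(*args))
        os.write(w, struct.pack("!I", len(payload)))
        os.write(w, payload)
        os._exit(0)
    os.close(w)                           # parent: read length-prefixed reply
    (size,) = struct.unpack("!I", os.read(r, 4))
    data = b""
    while len(data) < size:
        data += os.read(r, size - len(data))
    os.close(r)
    os.waitpid(pid, 0)
    return pickle.loads(data)
```

The heavy computation runs in the child with no GIL contention at all, at the cost of one fork and a pickle round-trip.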


Alex

On Wed, Mar 19, 2008 at 10:49 AM, Adam Olsen [EMAIL PROTECTED] wrote:
 On Wed, Mar 19, 2008 at 11:25 AM, Stefan Ring [EMAIL PROTECTED] wrote:
   Adam Olsen rhamph at gmail.com writes:
  
  
So you want responsiveness when idle but throughput when busy?
  
Exactly ;)
  
  
 Are those calculations primarily python code, or does a C library do
 the grunt work?  If it's a C library you shouldn't be affected by
 safethread's increased overhead.

  
It's Python code all the way. Frankly, it's a huge mess, but it would be 
 very
very hard to come up with a scalable solution that would allow to optimize
certain hotspots and redo them in C or C++. There isn't even anything 
 left to
optimize in particular because all those low hanging fruit have already 
 been
taken care of. So it's just ~30kloc Python code over which the total time 
 spent
is quite uniformly distributed :(.

  I see.  Well, at this point I think the most you can do is file a bug
  so the problem doesn't get forgotten.  If nothing else, if my
  safethread stuff goes in it'll very likely include a --with-gil
  option, so I may put together a FIFO scheduler.


  --
  Adam Olsen, aka Rhamphoryncus

