Re: Recommended number of threads? (in CPython)

2009-11-02 Thread Aahz
In article <mailman.2256.1256831821.2807.python-l...@python.org>,
mk  <mrk...@gmail.com> wrote:

> I wrote run-of-the-mill program for concurrent execution of ssh command
> over a large number of hosts. (someone may ask why reinvent the wheel
> when there's pssh and shmux around -- I'm not happy with working details
> and lack of some options in either program)
>
> The program has a working queue of threads so that no more than
> maxthreads number are created and working at particular time.
>
> But this begs the question: what is the recommended number of threads
> working concurrently? If it's dependent on task, the task is: open ssh
> connection, execute command (then the main thread loops over the queue
> and if the thread is finished, it closes ssh connection and does .join()
> on the thread)

Given that your code is not just I/O-bound but wait-bound, I second
the suggestion to use asynchronous code -- then you could open a
connection to every single machine simultaneously.  Assuming your system
setup can handle the load, that is.
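
A minimal sketch of the asynchronous approach, assuming a current
Python 3 with the asyncio module (which postdates this thread). The
names run_command, run_everywhere and demo_argv are made up for
illustration, and demo_argv is a local stand-in for a real ssh command
line so that the example runs without any network access.

```python
import asyncio
import sys


async def run_command(argv):
    """Start one subprocess and wait for it without blocking the event loop."""
    proc = await asyncio.create_subprocess_exec(
        *argv,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    out, _err = await proc.communicate()
    return proc.returncode, out.decode(errors="replace")


async def run_everywhere(hosts, make_argv):
    """Launch one command per host at once and collect results by host."""
    results = await asyncio.gather(*(run_command(make_argv(h)) for h in hosts))
    return dict(zip(hosts, results))


def demo_argv(host):
    # Hypothetical stand-in for something like ["ssh", host, "uptime"];
    # it only echoes the host name back.
    return [sys.executable, "-c", "import sys; print(sys.argv[1])", host]


if __name__ == "__main__":
    print(asyncio.run(run_everywhere(["alpha", "beta"], demo_argv)))
```

Every command is started before any of them is awaited to completion,
so the number in flight is limited only by what the system allows
(file descriptors, processes), not by a thread count.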
-- 
Aahz (a...@pythoncraft.com)   * http://www.pythoncraft.com/

[on old computer technologies and programmers]  Fancy tail fins on a
brand new '59 Cadillac didn't mean throwing out a whole generation of
mechanics who started with model As.  --Andrew Dalke
-- 
http://mail.python.org/mailman/listinfo/python-list


Recommended number of threads? (in CPython)

2009-10-29 Thread mk

Hello everyone,

I wrote a run-of-the-mill program for concurrent execution of an ssh
command over a large number of hosts. (Someone may ask why reinvent the
wheel when there's pssh and shmux around -- I'm not happy with the
working details and lack of some options in either program.)


The program has a working queue of threads so that no more than
maxthreads threads are created and working at any particular time.


But this raises the question: what is the recommended number of threads
working concurrently? If it depends on the task, the task is: open an
ssh connection and execute a command (then the main thread loops over
the queue and, if a thread is finished, closes its ssh connection and
calls .join() on the thread).
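
For illustration, the bounded pool described above could be sketched
with concurrent.futures from a current Python 3 standard library. This
is not the program from this post: run_on_hosts and fake_task are
hypothetical names, the default of 32 workers is an arbitrary
placeholder, and fake_task stands in for the real ssh work.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_on_hosts(hosts, task, maxthreads=32):
    """Run task(host) for every host using at most maxthreads worker threads."""
    results = {}
    with ThreadPoolExecutor(max_workers=maxthreads) as pool:
        futures = {pool.submit(task, host): host for host in hosts}
        for future in as_completed(futures):
            host = futures[future]
            try:
                results[host] = future.result()
            except Exception as exc:  # record the failure and keep going
                results[host] = exc
    return results


def fake_task(host):
    # Hypothetical stand-in for "open ssh connection, execute command".
    return "ok from " + host


if __name__ == "__main__":
    print(run_on_hosts(["alpha", "beta", "gamma"], fake_task, maxthreads=2))
```

The executor joins its worker threads when the with block exits, so the
main thread does not need its own loop to check for finished threads.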


I found that using more than several hundred threads causes weird
exceptions to be thrown *sometimes* (rarely, actually, but it happens
from time to time), although that might depend on the modules used in
the threads (I'm using paramiko, which is claimed to be thread-safe).





Re: Recommended number of threads? (in CPython)

2009-10-29 Thread Falcolas
On Oct 29, 9:56 am, mk mrk...@gmail.com wrote:
> Hello everyone,
>
> I wrote run-of-the-mill program for concurrent execution of ssh command
> over a large number of hosts. (someone may ask why reinvent the wheel
> when there's pssh and shmux around -- I'm not happy with working details
> and lack of some options in either program)
>
> The program has a working queue of threads so that no more than
> maxthreads number are created and working at particular time.
>
> But this begs the question: what is the recommended number of threads
> working concurrently? If it's dependent on task, the task is: open ssh
> connection, execute command (then the main thread loops over the queue
> and if the thread is finished, it closes ssh connection and does .join()
> on the thread)
>
> I found that when using more than several hundred threads causes weird
> exceptions to be thrown *sometimes* (rarely actually, but it happens
> from time to time). Although that might be dependent on modules used in
> threads (I'm using paramiko, which is claimed to be thread safe).

Since you're creating OS threads when doing this, your issue is
probably more related to your OS' implementation of threads than
Python. That said, several hundred threads, regardless of them being
blocked by the GIL, sounds like a recipe for trouble on most machines,
but as usual YMMV.

If you're running into problems with a large number of connections
(not related to a socket limit), you might look into doing it
asynchronously - loop over a list of connections and do non-blocking
reads to see if your command has completed. I've done this
successfully with pexpect, and didn't run into any issues with the
underlying OS.
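
A rough sketch of such a polling loop, under some loud assumptions: it
uses subprocess from the standard library rather than pexpect, and it
checks each process's exit status with Popen.poll() instead of doing
non-blocking reads on its output. poll_until_done and the interval
value are made up for illustration.

```python
import subprocess
import sys
import time


def poll_until_done(commands, interval=0.05):
    """Start every command, then poll them in a loop until all have exited."""
    running = {
        # Output is discarded so that no pipe can fill up while we poll.
        name: subprocess.Popen(argv, stdout=subprocess.DEVNULL)
        for name, argv in commands.items()
    }
    exit_codes = {}
    while running:
        for name, proc in list(running.items()):
            code = proc.poll()  # None while the process is still running
            if code is not None:
                exit_codes[name] = code
                del running[name]
        if running:
            time.sleep(interval)  # avoid spinning the CPU between passes
    return exit_codes


if __name__ == "__main__":
    demo = {
        "ok": [sys.executable, "-c", "pass"],
        "fails": [sys.executable, "-c", "import sys; sys.exit(3)"],
    }
    print(poll_until_done(demo))
```

A single thread drives every command here, so the per-thread stack and
scheduling costs do not grow with the number of hosts.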

Garrick


Re: Recommended number of threads? (in CPython)

2009-10-29 Thread Neil Hodgson
mk:

> I found that when using more than several hundred threads causes weird
> exceptions to be thrown *sometimes* (rarely actually, but it happens
> from time to time).

   If you are running in a 32-bit environment, it is common to run out
of address space with many threads. Each thread allocates a stack and
this allocation may be as large as 10 Megabytes on Linux. With a 4
Gigabyte 32-bit address space this means that the maximum number of
threads will be about 400. In practice, the operating system will
further subdivide the address space so only 200 to 300 threads will be
possible. On Windows, I think the normal stack allocation is 1 Megabyte.

   The allocation is only of address space, not memory since memory can
be mapped into this space when it is needed and many threads do not need
very much stack.
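
   The arithmetic above, plus one way to ask CPython for smaller thread
stacks, might look like the sketch below. max_threads is a made-up
helper name and 256 KiB is an arbitrary example value, not a
recommendation; threading.stack_size() is a real standard library call,
but whether a given size is accepted depends on the platform.

```python
import threading

MIB = 1024 * 1024


def max_threads(address_space_bytes, stack_bytes):
    """Upper bound on threads if address space held nothing but thread stacks."""
    return address_space_bytes // stack_bytes


# 4 GiB of address space divided into 10 MiB stacks.
print(max_threads(4096 * MIB, 10 * MIB))  # 409

# Ask for 256 KiB stacks for threads created after this call.
threading.stack_size(256 * 1024)
worker = threading.Thread(target=lambda: None)
worker.start()
worker.join()
```

   A smaller reservation per thread raises the ceiling, at the risk of
overflowing the stack in deeply recursive code.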

   Neil


Re: Recommended number of threads? (in CPython)

2009-10-29 Thread Paul Rubin
Neil Hodgson nyamatongwe+thun...@gmail.com writes:
> If you are running on a 32-bit environment, it is common to run out
> of address space with many threads. Each thread allocates a stack and
> this allocation may be as large as 10 Megabytes on Linux.

I'm sure it's smaller than that under most circumstances.  I run
python programs with hundreds of threads all the time, and they don't
use gigabytes of memory.


Re: Recommended number of threads? (in CPython)

2009-10-29 Thread Dave Angel

Paul Rubin wrote:
> Neil Hodgson nyamatongwe+thun...@gmail.com writes:
>> If you are running on a 32-bit environment, it is common to run out
>> of address space with many threads. Each thread allocates a stack and
>> this allocation may be as large as 10 Megabytes on Linux.
>
> I'm sure it's smaller than that under most circumstances.  I run
> python programs with hundreds of threads all the time, and they don't
> use gigabytes of memory.
  
As Neil pointed out further on, in the same message you quoted, address 
space is not the same as allocated memory.  It's easy to run out of 
allocatable address space long before you run out of virtual memory, or 
swap space.


Any time a buffer is needed that will need to be contiguous (such as a 
return stack), the address space for the max possible size must be 
reserved, but the actual virtual memory allocations (which is what you 
see when you're using the system utilities to display memory usage) are 
done incrementally, as needed.


It's been several years, but I believe the two terms on Windows are
"reserve" and "commit".  Reserve is done in multiples of 64k, and commit
in multiples of 4k.


DaveA
