Re: Recommended number of threads? (in CPython)
In article mailman.2256.1256831821.2807.python-l...@python.org, mk mrk...@gmail.com wrote:

> I wrote a run-of-the-mill program for concurrent execution of an ssh command over a large number of hosts. (Someone may ask why reinvent the wheel when there's pssh and shmux around -- I'm not happy with the working details and the lack of some options in either program.)
>
> The program has a working queue of threads so that no more than maxthreads threads are created and working at any particular time.
>
> But this begs the question: what is the recommended number of threads working concurrently? If it depends on the task, the task is: open an ssh connection, execute a command (then the main thread loops over the queue and, if a thread is finished, closes its ssh connection and calls .join() on the thread).

Given that your code is not just I/O-bound but wait-bound, I second the suggestion to use asynchronous code -- then you could open a connection to every single machine simultaneously. Assuming your system setup can handle the load, that is.

--
Aahz (a...@pythoncraft.com) * http://www.pythoncraft.com/

[on old computer technologies and programmers] "Fancy tail fins on a brand new '59 Cadillac didn't mean throwing out a whole generation of mechanics who started with model As." --Andrew Dalke

-- http://mail.python.org/mailman/listinfo/python-list
Recommended number of threads? (in CPython)
Hello everyone,

I wrote a run-of-the-mill program for concurrent execution of an ssh command over a large number of hosts. (Someone may ask why reinvent the wheel when there's pssh and shmux around -- I'm not happy with the working details and the lack of some options in either program.)

The program has a working queue of threads so that no more than maxthreads threads are created and working at any particular time.

But this begs the question: what is the recommended number of threads working concurrently? If it depends on the task, the task is: open an ssh connection, execute a command (then the main thread loops over the queue and, if a thread is finished, closes its ssh connection and calls .join() on the thread).

I found that using more than several hundred threads *sometimes* causes weird exceptions to be thrown (rarely, actually, but it happens from time to time). That might depend on the modules used in the threads, though (I'm using paramiko, which is claimed to be thread safe).
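For readers following along, here is a minimal sketch of the "no more than maxthreads at a time" pattern described above. It is not the poster's code. It uses `concurrent.futures`, which joined the standard library in Python 3.2, after this thread was written. The names `run_on_hosts` and `task` are hypothetical: `task` stands in for whatever opens the ssh connection and runs the command, and the default of 50 workers is an arbitrary placeholder, not a recommendation.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_on_hosts(hosts, task, max_threads=50):
    """Run task(host) for every host, using at most max_threads worker threads.

    `task` is a hypothetical callable that does the per-host work.
    Returns a dict mapping host -> ("ok", result) or ("error", exception).
    """
    results = {}
    # The executor owns the worker threads and joins them when the block exits.
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        futures = {pool.submit(task, host): host for host in hosts}
        for future in as_completed(futures):
            host = futures[future]
            try:
                results[host] = ("ok", future.result())
            except Exception as exc:
                # One failing host should not abort the whole run.
                results[host] = ("error", exc)
    return results
```

Capping the pool size this way keeps the thread count fixed however long the host list is, so the cap becomes a single tunable number instead of a property of the input.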
Re: Recommended number of threads? (in CPython)
On Oct 29, 9:56 am, mk mrk...@gmail.com wrote:

> Hello everyone,
>
> I wrote a run-of-the-mill program for concurrent execution of an ssh command over a large number of hosts. (Someone may ask why reinvent the wheel when there's pssh and shmux around -- I'm not happy with the working details and the lack of some options in either program.)
>
> The program has a working queue of threads so that no more than maxthreads threads are created and working at any particular time.
>
> But this begs the question: what is the recommended number of threads working concurrently? If it depends on the task, the task is: open an ssh connection, execute a command (then the main thread loops over the queue and, if a thread is finished, closes its ssh connection and calls .join() on the thread).
>
> I found that using more than several hundred threads *sometimes* causes weird exceptions to be thrown (rarely, actually, but it happens from time to time). That might depend on the modules used in the threads, though (I'm using paramiko, which is claimed to be thread safe).

Since you're creating OS threads when doing this, your issue is probably more related to your OS's implementation of threads than to Python. That said, several hundred threads, regardless of whether they are blocked by the GIL, sounds like a recipe for trouble on most machines, but as usual YMMV.

If you're running into problems with a large number of connections (not related to a socket limit), you might look into doing it asynchronously: loop over a list of connections and do non-blocking reads to see whether your command has completed. I've done this successfully with pexpect, and didn't run into any issues with the underlying OS.

Garrick
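The asynchronous approach suggested here can be sketched with the standard-library `asyncio` module, which postdates this thread. This is not the pexpect loop Garrick describes: the event loop does the non-blocking waiting. `run_all`, `ssh_command` and the default limit of 200 are hypothetical names and values chosen for illustration, and `ssh_command` assumes an OpenSSH-style `ssh` client on the PATH with non-interactive (key-based) authentication.

```python
import asyncio


async def ssh_command(host, command):
    # Assumption: an `ssh` executable is on PATH and needs no password prompt.
    proc = await asyncio.create_subprocess_exec(
        "ssh", host, command,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    out, err = await proc.communicate()
    return proc.returncode, out, err


async def run_all(hosts, run_one, max_concurrent=200):
    """Await run_one(host) for every host, at most max_concurrent at a time.

    Returns a dict mapping host -> result, or host -> the exception raised.
    """
    sem = asyncio.Semaphore(max_concurrent)

    async def guarded(host):
        async with sem:  # caps the number of connections open at once
            try:
                return host, await run_one(host)
            except Exception as exc:
                return host, exc

    pairs = await asyncio.gather(*(guarded(h) for h in hosts))
    return dict(pairs)
```

A call might look like `asyncio.run(run_all(hosts, lambda h: ssh_command(h, "uptime")))`. Everything runs in one thread, so the limit that matters is open connections and child processes, not thread stacks.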
Re: Recommended number of threads? (in CPython)
mk:

> I found that using more than several hundred threads *sometimes* causes weird exceptions to be thrown (rarely, actually, but it happens from time to time).

If you are running in a 32-bit environment, it is common to run out of address space with many threads. Each thread allocates a stack, and this allocation may be as large as 10 Megabytes on Linux. With a 4 Gigabyte 32-bit address space, this means that the maximum number of threads will be about 400. In practice, the operating system will further subdivide the address space, so only 200 to 300 threads will be possible. On Windows, I think the normal stack allocation is 1 Megabyte.

The allocation is only of address space, not memory: memory can be mapped into this space when it is needed, and many threads do not need very much stack.

Neil
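The arithmetic behind these figures can be checked with a few lines of Python. The 4 GiB address space and 10 MiB stack come from the message above. The 3 GiB and 2 GiB rows are assumed user-space portions, added only to show how a subdivided address space lands in the 200 to 300 range. The function name is hypothetical.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def max_threads_by_address_space(address_space, stack_reserve):
    """Upper bound on threads if stacks were the only use of address space."""
    return address_space // stack_reserve


print(max_threads_by_address_space(4 * GIB, 10 * MIB))  # 409
print(max_threads_by_address_space(3 * GIB, 10 * MIB))  # 307 (assumed 3 GiB for user space)
print(max_threads_by_address_space(2 * GIB, 10 * MIB))  # 204 (assumed 2 GiB for user space)
```

These are ceilings, not predictions: the interpreter, loaded libraries and the heap also consume address space, so the practical limit is lower.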
Re: Recommended number of threads? (in CPython)
Neil Hodgson nyamatongwe+thun...@gmail.com writes:

> If you are running on a 32-bit environment, it is common to run out of address space with many threads. Each thread allocates a stack and this allocation may be as large as 10 Megabytes on Linux.

I'm sure it's smaller than that under most circumstances. I run python programs with hundreds of threads all the time, and they don't use gigabytes of memory.
Re: Recommended number of threads? (in CPython)
Paul Rubin wrote:

> Neil Hodgson nyamatongwe+thun...@gmail.com writes:
>
>> If you are running on a 32-bit environment, it is common to run out of address space with many threads. Each thread allocates a stack and this allocation may be as large as 10 Megabytes on Linux.
>
> I'm sure it's smaller than that under most circumstances. I run python programs with hundreds of threads all the time, and they don't use gigabytes of memory.

As Neil pointed out further on, in the same message you quoted, address space is not the same as allocated memory. It's easy to run out of allocatable address space long before you run out of virtual memory or swap space.

Any time a buffer is needed that must be contiguous (such as a return stack), the address space for its maximum possible size must be reserved up front. The actual virtual memory allocations (which are what you see when you use the system utilities to display memory usage) are done incrementally, as needed.

It's been several years, but I believe the two terms on Windows are reserve and commit. Reserve is done in multiples of 64k, and commit in multiples of 4k.

DaveA
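If reserved stack address space is the limiting factor, one possible mitigation (not proposed in the thread itself) is to ask for smaller stacks before starting the threads. The sketch below uses `threading.stack_size()` from the standard library. The helper name `run_with_small_stacks` and the 256 KiB figure are assumptions for illustration. The call can raise `ValueError` for sizes the platform rejects, or `RuntimeError` where changing the stack size is unsupported, and a stack that is too small can crash deeply recursive code.

```python
import threading


def run_with_small_stacks(worker, args_list, stack_bytes=256 * 1024):
    """Run worker(arg) in one thread per arg, each with a reduced stack size."""
    previous = threading.stack_size()  # 0 means "platform default"
    threading.stack_size(stack_bytes)  # affects threads started from now on
    try:
        threads = [threading.Thread(target=worker, args=(arg,))
                   for arg in args_list]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    finally:
        # Restore the earlier setting so unrelated threads are unaffected.
        threading.stack_size(previous)
```

With 256 KiB per thread, a thousand threads reserve roughly 250 MiB of address space for stacks, against roughly 10 GiB at 10 MiB each.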