Re: [paramiko] Multithreading

2010-07-01 Thread Marcin Krol

james bardin wrote:

> The main loop is a busy loop! It's hogging the GIL itself, and pegging
> a CPU core at 100%.
> Because the threads only do work while holding a lock, there is no time
> for the GIL to switch threads. The sleep() simply allows a few cycles for
> the GIL to be released.
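
The difference between a busy loop and a loop that sleeps can be sketched
as follows. This is a minimal illustration, not the code from the article
under discussion; the function names are made up, and it assumes Python 3.

```python
import threading
import time


def wait_polling(done, interval=0.01):
    """Check a flag repeatedly, sleeping between checks.

    Without the sleep() this would be a busy loop that keeps one core
    at 100%. The sleep() releases the GIL so other threads can run.
    """
    checks = 0
    while not done.is_set():
        checks += 1
        time.sleep(interval)
    return checks


def wait_blocking(done):
    """Block until the flag is set; no polling and no CPU use while waiting."""
    done.wait()
```

Where the waiting thread has nothing else to do, blocking on the event (or
on Thread.join()) avoids having to pick a sleep interval at all.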


Oops! I didn't analyze the code in question (I just skimmed the article)
and accepted the conclusions in good faith. So much for the quality of
internet texts. I assumed, apparently incorrectly, that there was some
critical peer review for stuff published in Linux Gazette. :-(


Anyway, how would you design the following: a multithreaded network
server for copying files onto remote machines, controlled by a web
application? (Because of potentially high system load, and for security
reasons, the web application server cannot do the copying itself.)


I used SocketServer.TCPServer as the basis, with 2 global queues and 1
global lock. On an incoming request, the handler class for TCPServer
acquires the lock, adds an item to a queue and releases the lock.


The main thread, which handles the items in the queue, periodically
acquires the lock, processes the queued items (spawning SSH threads that
do the sending), updates item statuses (e.g. when an SSH thread has
finished), then releases the lock and sleeps for a relatively long time
(about 0.5 seconds).
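
A minimal sketch of this kind of design, assuming Python 3 (where the
module is called socketserver). It is not the implementation described
above: it uses a single queue, leaves out status tracking, and all names
(pending, enqueue, EnqueueHandler, drain_once, main_loop) are made up for
illustration. As a variation, the lock is held only while the queue is
copied, not while the jobs are started.

```python
import socketserver
import threading
from collections import deque

pending = deque()              # jobs waiting to be started
lock = threading.Lock()        # the one global lock from the description


def enqueue(job):
    """Add a job under the lock; called from handler threads."""
    with lock:
        pending.append(job)


class EnqueueHandler(socketserver.StreamRequestHandler):
    """Reads one line per connection and queues it as a job."""

    def handle(self):
        job = self.rfile.readline().decode("utf-8").strip()
        enqueue(job)
        self.wfile.write(b"queued\n")


def drain_once(start_job):
    """One pass of the main loop: take every queued job under the lock,
    then start the jobs after the lock has been released."""
    with lock:
        jobs = list(pending)
        pending.clear()
    for job in jobs:
        start_job(job)         # e.g. spawn a sending thread here
    return len(jobs)


def main_loop(start_job, stop, interval=0.5):
    """Drain the queue, then sleep, until the stop event is set."""
    while not stop.is_set():
        drain_once(start_job)
        stop.wait(interval)    # sleeps, but wakes early on shutdown
```

The handler would be served with something like
socketserver.ThreadingTCPServer((host, port), EnqueueHandler), with host
and port chosen by the deployment. The standard library's queue.Queue does
its own locking and offers a blocking get(), which would remove both the
explicit lock and the fixed sleep interval.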


My implementation of this design works really well so far (no contention 
issues, capable of handling many simultaneous requests and transfers), 
but if this is a bad design, I would like to know.


--

Regards,
mk

--
Premature optimization is the root of all fun.

___
paramiko mailing list
paramiko@lag.net
http://www.lag.net/cgi-bin/mailman/listinfo/paramiko


Re: [paramiko] Multithreading

2010-07-01 Thread james bardin
On Thu, Jul 1, 2010 at 12:06 PM, Nikolaus Rath nikol...@rath.org wrote:
>> You can share it, like you can share anything you want between
>> threads, you need proper locking, as the client only has one channel
>> for communication.
>
> Since I can share anything I want if I synchronize access to it myself,
> my question was meant as: can I share it without explicit locking?


Generally in python, the only objects you can share without explicit
locking are single instances of core data types - basically lists and
dicts.
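
As a rough illustration of where the line falls, assuming CPython and
Python 3: a single built-in operation such as list.append() can be called
from several threads without a lock, while a read-modify-write sequence
such as a counter increment still needs one. The names in this sketch are
made up.

```python
import threading


def run(threads=4, count=1000):
    shared = []                    # appended to by all threads, no lock
    counter = {"n": 0}
    counter_lock = threading.Lock()

    def worker():
        for i in range(count):
            shared.append(i)       # one built-in call: safe on its own
            with counter_lock:     # read, add, write back: needs a lock
                counter["n"] += 1

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return len(shared), counter["n"]
```

Anything that combines several operations, for example "check whether the
key exists, then insert it", is a compound step and is not covered by this
guarantee.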



Re: [paramiko] Multithreading

2010-07-01 Thread Nikolaus Rath
On 07/01/2010 12:43 PM, james bardin wrote:
> On Thu, Jul 1, 2010 at 12:06 PM, Nikolaus Rath nikol...@rath.org wrote:
>>> You can share it, like you can share anything you want between
>>> threads, you need proper locking, as the client only has one channel
>>> for communication.
>>
>> Since I can share anything I want if I synchronize access to it myself,
>> my question was meant as: can I share it without explicit locking?
>
> Generally in python, the only objects you can share without explicit
> locking are single instances of core data types - basically lists and
> dicts.

As well as third-party modules that have been designed to be threadsafe.


Best,

   -Niko

-- 
 »Time flies like an arrow, fruit flies like a Banana.«

  PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6  02CF A9AD B7F8 AE4E 425C



Re: [paramiko] Multithreading

2010-07-01 Thread Nikolaus Rath

On 07/01/2010 01:28 PM, james bardin wrote:
> On Thu, Jul 1, 2010 at 1:00 PM, Nikolaus Rath nikol...@rath.org wrote:
>>> Generally in python, the only objects you can share without explicit
>>> locking are single instances of core data types - basically lists and
>>> dicts.
>>
>> As well as third-party modules that have been designed to be threadsafe.
>
> *Modules* can be thread-safe, but what's an example of a module that
> advertises its classes as thread-safe? Any class that does that would
> need to self-lock on all non-atomic operations. It's normally just
> easier to expect locking to be handled outside of the class.


In most cases you are probably right. But I think there are also good
cases where locking is better done in the class itself. The class does
not need to self-lock all non-atomic operations, only those that
actually operate on shared state (instance or global variables). And
even when locking is required, a method can hold the lock just while it
touches the one variable it is working with, rather than for the entire
method call.


Example: I am working with a class that uploads data to Amazon S3 
(basically an online storage service with a simple HTTP API). The class 
provides methods like put_from_fh(key, fh) and get_to_fh(key, fh) which 
are designed to be threadsafe. When called, they first compress and
encrypt the data, then briefly obtain a lock to get an HTTP connection
from a pool, release the lock and upload the data.
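
The locking pattern described here might look roughly like the sketch
below, assuming Python 3. It is not the actual S3 class: FakeConnection
stands in for an HTTP connection, the methods take bytes instead of file
handles, encryption is left out, and every name is hypothetical.

```python
import threading
import zlib


class FakeConnection:
    """Stand-in for an HTTP connection; stores uploads in a shared dict."""

    def __init__(self, backend):
        self.backend = backend

    def upload(self, key, payload):
        self.backend[key] = payload

    def download(self, key):
        return self.backend[key]


class PooledStore:
    def __init__(self, pool_size=2):
        self._backend = {}
        self._pool = [FakeConnection(self._backend) for _ in range(pool_size)]
        self._lock = threading.Lock()
        self._available = threading.Semaphore(pool_size)

    def _acquire(self):
        self._available.acquire()      # wait until a connection is free
        with self._lock:               # lock held only for the pop
            return self._pool.pop()

    def _release(self, conn):
        with self._lock:
            self._pool.append(conn)
        self._available.release()

    def put(self, key, data):
        payload = zlib.compress(data)  # CPU work done without any lock
        conn = self._acquire()
        try:
            conn.upload(key, payload)  # transfer runs outside the pool lock
        finally:
            self._release(conn)

    def get(self, key):
        conn = self._acquire()
        try:
            payload = conn.download(key)
        finally:
            self._release(conn)
        return zlib.decompress(payload)
```

Callers simply use store.put(key, data) and store.get(key) from any thread;
the only state shared between threads is the pool, and it is locked just
long enough to take a connection out or put it back.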


The methods have to be usable from several threads at once because they
provide a file system backend (and it would be rather annoying if you
had to wait for your 100 MB write into file1 to complete before you
could read 10 bytes from file2).


There are of course other possible implementations, like giving every
thread its own S3 storage instance or managing the storage classes in a
pool (instead of the HTTP connections), but I consider those to be less
elegant. The S3 storage class has semantics like a dict (except that the
amount of stored data is larger and the data is stored elsewhere), so I
would consider it quite awkward if I had to bother with locking or
pooling when using it.
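
For comparison, the first alternative mentioned above (one storage
instance per thread) can be sketched with threading.local, assuming
Python 3. FakeClient and get_client are hypothetical names standing in for
a storage class that is not threadsafe.

```python
import threading


class FakeClient:
    """Stand-in for a storage client that must not be shared."""

    def __init__(self):
        self.owner = threading.current_thread().name


_local = threading.local()     # attributes set here are per-thread


def get_client():
    """Return this thread's own client, creating it on first use."""
    client = getattr(_local, "client", None)
    if client is None:
        client = _local.client = FakeClient()
    return client
```

This needs no lock at all, but every thread pays for its own instance (and
its own connection), which is the cost that pooling the connections inside
a single shared object avoids.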



Best,

   -Nikolaus


Btw, I am about to write a class that provides the same functionality 
over SFTP, thus my initial question.




--
 »Time flies like an arrow, fruit flies like a Banana.«

  PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6  02CF A9AD B7F8 AE4E 425C
