Re: [paramiko] Multithreading

2010-07-02 Thread Marcin Krol

Andrew Bennetts wrote:


If you meant “can handle many concurrent connections” instead, I'd
suggest Twisted; it tends to excel at that sort of task (and without
threads, usually).  Personally, even if threads were required I'd probably
lean towards using it anyway :)


Threads are not a hard requirement, and Twisted would certainly be 
interesting to learn, but there are a few arguments against it:


- first of all, event-driven programming is still a bit exotic, and many 
more people are familiar with threads. The code I'm writing will 
probably be in use for a long time, and I won't be the only one working 
on it. I can't realistically expect others to learn Twisted just to deal 
with my stuff (even being allowed to use Python was a bit of a challenge; 
my environment is almost all Java).


- in the long run, thread support in core Python is much more certain 
to still be there in a few years' time, whereas the future of Twisted, 
however good it is, is less certain.


Joys of corporate pressures and conformity for you.


--

Regards,
mk

--
Premature optimization is the root of all fun.

___
paramiko mailing list
paramiko@lag.net
http://www.lag.net/cgi-bin/mailman/listinfo/paramiko

Re: [paramiko] Multithreading

2010-07-01 Thread Marcin Krol

james bardin wrote:

The main loop is a busy loop! It's hogging the GIL itself and pegging
a CPU core at 100%.
Because the threads only do work while holding a lock, there is no time for the
GIL to switch threads. The sleep() simply allows a few cycles for the
GIL to be released.


Oops! I didn't analyze the code in question (just skimmed the article) 
and accepted the conclusions in good faith. Quality of internet texts 
for you. I assumed, apparently incorrectly, that there was some critical 
peer review for stuff published in Linux Gazette. :-(


Anyway, how would you design the following: a multithreaded network 
server for copying files onto remote machines, controlled by a web 
application? (Because of potentially high system load and for security 
reasons, the web application server cannot do it itself.)


I used SocketServer.TCPServer as a basis, with 2 global queues and 1 
global lock. On an incoming request, the TCPServer handler class 
acquires the lock, adds an item to a queue and releases the lock.


The main thread that handles items in the queue periodically acquires 
the lock, processes the queued items (spawning SSH sending threads), 
updates item statuses (e.g. when an SSH sending thread has finished) and 
so on, then releases the lock and sleeps for a relatively long time 
(around 0.5 seconds).


My implementation of this design works really well so far (no contention 
issues, capable of handling many simultaneous requests and transfers), 
but if this is a bad design, I would like to know.
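For comparison, a minimal Python 3 sketch of the same shape built on the standard library's queue.Queue (the Queue module in Python 2). Queue is already thread-safe, so the separate global lock and the 0.5 s polling sleep go away: the dispatcher simply blocks in get() until a request arrives. transfer() is a hypothetical stand-in for the SSH sending thread, and the None sentinel is an assumed shutdown convention; none of these names come from mk's code.

```python
import queue
import threading

def transfer(item):
    """Hypothetical stand-in for the work of an SSH sending thread."""
    return "copied %s" % item

def worker(item, result_q):
    result_q.put(transfer(item))           # report status back via a queue

def dispatcher(request_q, result_q):
    """Block on the request queue instead of polling it under a global lock."""
    workers = []
    while True:
        item = request_q.get()             # sleeps until work arrives
        if item is None:                   # assumed shutdown sentinel
            break
        t = threading.Thread(target=worker, args=(item, result_q))
        t.start()
        workers.append(t)
    for t in workers:
        t.join()

request_q = queue.Queue()   # request handlers would call request_q.put(...)
result_q = queue.Queue()
d = threading.Thread(target=dispatcher, args=(request_q, result_q))
d.start()
for name in ["a.txt", "b.txt", "c.txt"]:
    request_q.put(name)
request_q.put(None)
d.join()
done = sorted(result_q.get() for _ in range(result_q.qsize()))
```

A SocketServer request handler would then just call request_q.put(...), and per-item status travels back on the result queue instead of being updated in place under a lock.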


--

Regards,
mk



Re: [paramiko] Multithreading

2010-07-01 Thread james bardin
On Thu, Jul 1, 2010 at 12:06 PM, Nikolaus Rath nikol...@rath.org wrote:
 You can share it, like you can share anything you want between
 threads, but you need proper locking, as the client only has one channel
 for communication.

 Since I can share anything I want if I synchronize access to it myself,
 my question was meant as: can I share it without explicit locking?


Generally in Python, the only objects you can share without explicit
locking are single instances of core data types - basically lists and
dicts.
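A minimal sketch of what explicit locking looks like for an object that makes no thread-safety promises: a wrapper that holds one lock for the duration of each method call, so threads can share a single client at the cost of serializing its operations. LockedProxy and FakeClient are hypothetical; FakeClient stands in for something like an SFTPClient, and only method calls are guarded, not plain attribute reads.

```python
import threading

class LockedProxy:
    """Serialize every method call on a wrapped object with a single lock."""

    def __init__(self, target):
        self._target = target
        self._lock = threading.Lock()

    def __getattr__(self, name):
        # Only called for names not found on the proxy itself.
        attr = getattr(self._target, name)
        if not callable(attr):
            return attr                    # plain attributes are not guarded
        def locked(*args, **kwargs):
            with self._lock:               # one thread inside the target at a time
                return attr(*args, **kwargs)
        return locked

class FakeClient:
    """Hypothetical non-thread-safe client with a read-modify-write method."""
    def __init__(self):
        self.count = 0
    def put(self, name):
        current = self.count
        self.count = current + 1
        return name

shared = LockedProxy(FakeClient())

def hammer():
    for _ in range(1000):
        shared.put("f")

threads = [threading.Thread(target=hammer) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(shared.count)  # 8000
```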



Re: [paramiko] Multithreading

2010-07-01 Thread Nikolaus Rath
On 07/01/2010 12:43 PM, james bardin wrote:
 On Thu, Jul 1, 2010 at 12:06 PM, Nikolaus Rath nikol...@rath.org wrote:
 You can share it, like you can share anything you want between
 threads, but you need proper locking, as the client only has one channel
 for communication.

 Since I can share anything I want if I synchronize access to it myself,
 my question was meant as: can I share it without explicit locking?
 
 Generally in Python, the only objects you can share without explicit
 locking are single instances of core data types - basically lists and
 dicts.

As well as third-party modules that have been designed to be threadsafe.


Best,

   -Niko

-- 
 »Time flies like an arrow, fruit flies like a Banana.«

  PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6  02CF A9AD B7F8 AE4E 425C



Re: [paramiko] Multithreading

2010-07-01 Thread Nikolaus Rath

On 07/01/2010 01:28 PM, james bardin wrote:

On Thu, Jul 1, 2010 at 1:00 PM, Nikolaus Rath nikol...@rath.org wrote:


Generally in Python, the only objects you can share without explicit
locking are single instances of core data types - basically lists and
dicts.


As well as third-party modules that have been designed to be threadsafe.



*Modules* can be thread-safe, but what's an example of a module that
advertises its classes as thread-safe? Any class that does that would
need to self-lock on all non-atomic operations. It's normally just
easier to expect locking to be handled outside of the class.


In most cases you are probably right. But I think there are also good 
cases where locking is better done in the class itself. The class does 
not need to self-lock all non-atomic operations, only those that 
actually operate on instance (or global) variables. And even when 
locking is required, the method is able to lock just the one variable it 
is working with rather than the entire method.


Example: I am working with a class that uploads data to Amazon S3 
(basically an online storage service with a simple HTTP API). The class 
provides methods like put_from_fh(key, fh) and get_to_fh(key, fh) which 
are designed to be threadsafe. When called, they first compress and 
encrypt the data, then they briefly obtain a lock to get an HTTP 
connection from a pool, release the lock and upload the data.
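A minimal sketch of that pattern, assuming a generic connection factory. PooledStore and FakeConnection are hypothetical names, not Nikolaus's S3 code; put_from_fh(key, fh) only mirrors the method named above. The lock guards the pool list and is held only while taking or returning a connection, so slow uploads from several threads overlap.

```python
import io
import threading
import time

class FakeConnection:
    """Hypothetical stand-in for an HTTP connection to the storage service."""
    def __init__(self, backend):
        self.backend = backend
    def upload(self, key, data):
        time.sleep(0.01)                   # simulated network time
        self.backend[key] = data

class PooledStore:
    """Thread-safe by design: only the shared pool is locked, and only briefly."""

    def __init__(self, connect):
        self._connect = connect            # callable returning a new connection
        self._pool = []
        self._lock = threading.Lock()
        self.created = 0

    def _get_conn(self):
        with self._lock:
            if self._pool:
                return self._pool.pop()
            self.created += 1
        return self._connect()             # opening a connection happens unlocked

    def _put_conn(self, conn):
        with self._lock:
            self._pool.append(conn)

    def put_from_fh(self, key, fh):
        data = fh.read()                   # compress/encrypt would go here, unlocked
        conn = self._get_conn()
        try:
            conn.upload(key, data)         # the slow part runs without the lock
        finally:
            self._put_conn(conn)

backend = {}
store = PooledStore(lambda: FakeConnection(backend))
threads = [threading.Thread(target=store.put_from_fh,
                            args=("k%d" % i, io.BytesIO(b"x" * i)))
           for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```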


The methods have to be multithreaded because they provide a file system 
backend (and it would be rather annoying if you had to wait for 
your 100 MB write into file1 to complete before you can read 10 bytes 
from file2).


There are of course other possible implementations, like giving every 
thread its own S3 storage instance or managing the storage classes in a 
pool (instead of the HTTP connections), but I consider those to be less 
elegant. The S3 storage class has semantics like a dict (except that it 
holds more data and stores it elsewhere), so I would consider it quite 
awkward if I had to bother with locking or pooling when using it.
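The per-thread alternative mentioned above can be sketched with threading.local, which gives each thread its own lazily created instance. With paramiko, the factory might plausibly be lambda: paramiko.SFTPClient.from_transport(transport), i.e. one SFTP channel per thread over a shared Transport; that pairing is an assumption not exercised here, so the example uses a plain stand-in factory. PerThread is a hypothetical helper.

```python
import threading

class PerThread:
    """Give each thread its own lazily created instance from `factory`."""

    def __init__(self, factory):
        self._factory = factory
        self._local = threading.local()    # attributes are private per thread

    def get(self):
        inst = getattr(self._local, "inst", None)
        if inst is None:                   # first call in this thread
            inst = self._local.inst = self._factory()
        return inst

made = []

def factory():
    obj = object()                         # stand-in for e.g. an SFTP client
    made.append(obj)
    return obj

per_thread = PerThread(factory)
seen = {}

def work(n):
    seen[n] = (per_thread.get(), per_thread.get())

threads = [threading.Thread(target=work, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Each thread calls the factory once and then keeps reusing its own object, so no locking is needed around the instance itself.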



Best,

   -Nikolaus


Btw, I am about to write a class that provides the same functionality 
over SFTP, thus my initial question.






Re: [paramiko] Multithreading

2010-06-30 Thread Nikolaus Rath
On 06/30/2010 09:03 AM, Marcin Krol wrote:
 Nikolaus Rath wrote:
 Hello,

 I would like to use an SFTPClient instance concurrently with several
 threads, but I couldn't find any information about thread safety in the
 API documentation.

  - Can I just share the SFTPClient instance between several threads?
 
 Why use SFTPClient? Its performance might not be very good, and
 compatibility issues with some SSH servers might crop up. I know I had
 compatibility issues even with some pretty standard SSH servers on some
 platforms (esp. Solaris).
 
 I would avoid SFTPClient if I were you.
 
 I wrote my own multithreaded SCP class (handling both upload and
 download) which I can post if you're interested. It's been in use for a
 while by several users and I think it's pretty well debugged by now.


All I need is a Python API for uploading, downloading and renaming files
over SSH. I chose SFTPClient since it seemed to be the simplest
solution, and I don't remember seeing any warnings about performance or
compatibility. Can you tell me what exactly the problem with SFTPClient
is? Are there any better options within paramiko? In any case, I am
certainly interested in taking a look at your solution.


 There are quite a number of rather nasty problems you need to deal with:
 for instance, what if you need to close down the thread in the middle of
 an operation but SFTPClient doesn't allow that? 

When I'm in the middle of an operation, then I am in the middle of an
SFTPClient method. Obviously I can't shut down the thread while the
interpreter is not executing my code. This doesn't seem to be an
SFTPClient or even multithreading specific problem to me.

 What if you're shutting
 down an interpreter and SFTPClient throws an exception which is visible
 for end user?

The interpreter should keep running while at least one non-daemon thread
is alive. It seems to me that if SFTPClient throws an exception, obviously
something went wrong and it is a good thing to know about it.


 What if there's no SCP/SFTP on the other end (and this
 does happen from time to time)?

If the user tries to establish an SFTP connection to a server that does
not support SFTP, then things will obviously break. But that's not a bug
in the program.


 I have done quite a lot of work on
 getting my class to work reasonably under such circumstances: for
 instance, the thread's file sending/downloading methods watch the value
 of the thread.abort flag, and if it's set to True by an external class,
 they shut down gracefully.
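A flag like thread.abort is often written with threading.Event, which a controlling thread sets and a worker checks between chunks. A minimal sketch with a hypothetical copy_chunks() over in-memory files: it can stop only between chunks, never inside a blocking library call, which is the limitation Nikolaus raises above.

```python
import io
import threading

def copy_chunks(src, dst, abort, chunk_size=4):
    """Copy src to dst in chunks; stop early once `abort` (an Event) is set."""
    copied = 0
    while not abort.is_set():              # checked only between chunks
        chunk = src.read(chunk_size)
        if not chunk:
            return copied, True            # finished normally
        dst.write(chunk)
        copied += len(chunk)
    return copied, False                   # stopped gracefully on request

# Normal run: nobody sets the event.
abort = threading.Event()
out = io.BytesIO()
result = copy_chunks(io.BytesIO(b"0123456789"), out, abort)

# A controlling thread would call stop.set(); here it is set up front.
stop = threading.Event()
stop.set()
aborted = copy_chunks(io.BytesIO(b"0123456789"), io.BytesIO(), stop)
```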


I think you are using multithreading in a different way. Let me guess:
you have a main thread that has to stay responsive and therefore
delegates time-consuming operations to individual worker threads. These
worker threads must shut down when the main thread asks them to do so.
In this situation you have to deal with the problems you describe, but
they are not specific to SFTPClient and they do not always arise when
using multiple threads.

For example, my application is much simpler. I have several threads
which work independently of each other. There is no main controlling
thread. The application terminates when all the threads have finished
their work. (I am essentially programming a server with individual
threads handling client requests).
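For a server where each client request gets its own thread, the standard library already provides socketserver.ThreadingTCPServer (SocketServer in Python 2), the threading variant of the TCPServer mk mentions elsewhere in the thread. A minimal Python 3 sketch with a hypothetical line-uppercasing handler; server.shutdown() is called from a thread other than the one running serve_forever(), as it must be.

```python
import socket
import socketserver
import threading

class UpperHandler(socketserver.StreamRequestHandler):
    """Hypothetical handler; each connection is handled in its own thread."""
    def handle(self):
        line = self.rfile.readline()
        self.wfile.write(line.upper())

server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), UpperHandler)
host, port = server.server_address         # port 0 -> the OS picks a free port
serving = threading.Thread(target=server.serve_forever)
serving.start()

with socket.create_connection((host, port)) as sock, sock.makefile("rb") as f:
    sock.sendall(b"hello\n")
    reply = f.readline()

server.shutdown()                          # must not run in serve_forever's thread
server.server_close()
serving.join()
```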



 Obviously, all the normal caveats about multithreading apply: 
 remembering to sleep just in case after releasing locks to prevent
 starvation

I have never heard of that. Could you explain in more detail what you mean?



Best,

   -Nikolaus


Re: [paramiko] Multithreading

2010-06-30 Thread Marcin Krol

Nikolaus Rath wrote:

All I need is a Python API for uploading, downloading and renaming files
over SSH. I chose SFTPClient since it seemed to be the simplest
solution, and I don't remember seeing any warnings about performance or
compatibility. 


I don't know about paramiko's implementation of SFTPClient, but I tried 
using SFTP in my environment (lots of diverse operating systems) and it 
didn't really work well: sometimes it worked, sometimes it didn't, so I 
moved to SCP and haven't looked back. I didn't really investigate those 
problems with SFTP; I just got discouraged by them.



Can you tell me what exactly the problem with SFTPClient
is? Are there any better options within paramiko? 


Simple SCP.



In any case, I am
certainly interested in taking a look at your solution.


Here:

import socket
import threading

import paramiko


class SSHThread(threading.Thread):
    def __init__(self, hostip, user, passwd, sshprivkey, localpath,
                 remotepath, action):
        threading.Thread.__init__(self)

        self.ip = hostip
        self.username = user
        self.passw = passwd
        self.sshprivkey = sshprivkey
        self.localpath = localpath
        self.remotepath = remotepath
        self.action = action          # 'download', 'upload' or 'listdir'
        self.port = 22
        self.finished = False
        self.conerror = ''
        self.confailed = False
        self.abort = False            # set to True externally to stop a transfer
        self.trans = None
        self.sock = None
        self.sentbytes = 0
        self.socket_timeout = 30
        self.result = ''

    def run(self):
        self.ssh_connect_for_scp()
        if self.action == 'download' and not self.confailed:
            self.download()
        elif self.action == 'upload' and not self.confailed:
            self.sendfilesrecursive()
        elif self.action == 'listdir' and not self.confailed:
            self.listdir()
        try:
            self.trans.close()
        except AttributeError:        # no transport was ever created
            pass
        self.finished = True

    def _get_transport(self):
        # Close a transport left over from a failed login attempt, if any.
        if self.trans is not None:
            self.trans.close()
        sock = socket.create_connection((self.ip, self.port),
                                        self.socket_timeout)
        trans = paramiko.Transport(sock)
        # Move fast ciphers to the front when available, without duplicates;
        # arcfour128 ends up first, then blowfish-cbc.
        secopts = trans.get_security_options()
        ciphers = tuple(secopts.ciphers)
        for preferred in ('blowfish-cbc', 'arcfour128'):
            if preferred in ciphers:
                ciphers = (preferred,) + tuple(c for c in ciphers
                                               if c != preferred)
        secopts.ciphers = ciphers
        self.sock = sock
        self.trans = trans
        self.trans.set_log_channel('adserver')

    def printandclose(self):
        print("\ntrying to close transport.sock")
        #self.trans.close()
        self.trans.sock.close()
        self.flag = True

    def ssh_connect_for_scp(self):
        loginsuccess = False
        try:
            self._get_transport()
        except Exception as e:
            self.conerror = str(e)
            self.confailed = True
            self.finished = True
            return
        # password
        if self.passw:
            try:
                #timer = threading.Timer(4, self.printandclose)
                #timer.start()
                self.trans.connect(hostkey=None, username=self.username,
                                   password=self.passw, pkey=None)
                #self.trans.start_client()
                #timer.cancel()
                loginsuccess = True
            except Exception as e:
                self.conerror = str(e)
        # key file: try RSA first, then DSS
        pkey = None
        if not loginsuccess and self.sshprivkey:
            try:
                pkey = paramiko.RSAKey.from_private_key_file(self.sshprivkey)
            except paramiko.SSHException:
                try:
                    pkey = paramiko.DSSKey.from_private_key_file(
                        self.sshprivkey)
                except paramiko.SSHException:
                    pass
            except IOError as e:
                self.conerror = str(e)
                self.finished = True
                self.confailed = True
                return
        if pkey:
            try:
                #self.conobj.connect(self.ip, username=self.username,
                #    key_filename=self.sshprivkey, port=self.port,
                #    timeout=opts.timeout)
                self._get_transport()
                self.trans.connect(hostkey=None, username=self.username,
                                   pkey=pkey)
                loginsuccess = True
                self.conerror = ''
            except Exception as e:
                self.conerror = str(e)
        # agent
        #if not loginsuccess:
        #    agent = paramiko.Agent()
        #    ak = agent.get_keys()
        #    for key in ak:
        #        try:
        #            self._get_transport()
        #            self.trans.connect(hostkey=None,
        #                               username=self.username, pkey=key)
        #            loginsuccess = True
        #            self.conerror = ''
        #            break
        #        except Exception as e:
        #            self.conerror = str(e)

        if not loginsuccess:
            self.confailed = True
            self.conobj = None
            self.finished = True
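The transfer methods called from run() (download(), sendfilesrecursive(), listdir()) are not shown. For orientation only, a sketch of the per-file record an SCP upload writes to the remote "scp -t <path>" process (which could be started with paramiko's Channel.exec_command()): a "C<mode> <size> <name>" line, the file bytes, then a single NUL byte, with the remote answering a zero byte for OK. This follows the commonly documented OpenSSH scp protocol and is not mk's implementation; both helper names are hypothetical.

```python
def scp_file_record(name, data, mode=0o644):
    """Bytes an SCP source sends for one regular file: header, data, NUL."""
    header = "C%04o %d %s\n" % (mode & 0o7777, len(data), name)
    return header.encode("utf-8") + data + b"\x00"

def check_ack(reply):
    """The remote sends a zero byte for OK; 1 or 2 introduce a warning/error."""
    if reply != b"\x00":
        raise IOError("scp reported a problem: %r" % reply)

record = scp_file_record("a.txt", b"hi")
print(record)   # b'C0644 2 a.txt\nhi\x00'
```

A sender would write the header line, wait for an acknowledgement, write the data plus the trailing NUL, and wait for a second acknowledgement before the next file.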

Re: [paramiko] Multithreading

2010-06-30 Thread james bardin
On Wed, Jun 30, 2010 at 11:13 AM, Marcin Krol mrk...@gmail.com wrote:

 Obviously, all the normal caveats about multithreading apply: remembering
 to sleep just in case after releasing locks to prevent
 starvation

 I never heard of that. Could you explain in more detail what you mean?

 http://linuxgazette.net/107/pai.html

 Caveat: I don't know if this has been improved in Python thread handling
 since that article has been written, but I add a bit of sleeping after lock
 release anyway just to be safe.


That's a poor example of Python coding. It's highlighting a GIL issue,
but if you have that sort of contention, you shouldn't be using
multiple threads. The author may have encountered a problem, but he
didn't fully understand it or his own solution, which is why adding a
magic sleep seems to work. Note that you will get slightly different
results on a multicore system.

The main loop is a busy loop! It's hogging the GIL itself and pegging
a CPU core at 100%.
Because the threads only do work while holding a lock, there is no time for the
GIL to switch threads. The sleep() simply allows a few cycles for the
GIL to be released.
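To make the distinction concrete, a minimal Python 3 sketch (not the article's code) of a consumer that blocks on threading.Condition instead of spinning: wait() releases the lock and sleeps until notify(), so the waiting thread neither holds the GIL nor pegs a core, and no magic sleep is needed. Mailbox is a hypothetical name; queue.Queue packages the same idea.

```python
import threading

class Mailbox:
    """Tiny producer/consumer hand-off that blocks instead of busy-looping."""

    def __init__(self):
        self._cond = threading.Condition()
        self._items = []

    def put(self, item):
        with self._cond:
            self._items.append(item)
            self._cond.notify()            # wake one waiting consumer

    def get(self):
        with self._cond:
            while not self._items:         # loop guards against spurious wakeups
                self._cond.wait()          # releases the lock while blocked
            return self._items.pop(0)

box = Mailbox()
received = []

def consume():
    for _ in range(3):
        received.append(box.get())

consumer = threading.Thread(target=consume)
consumer.start()
for n in range(3):
    box.put(n)
consumer.join()
print(received)  # [0, 1, 2]
```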


-jim



Re: [paramiko] Multithreading

2010-06-30 Thread Nikolaus Rath
james bardin jbar...@bu.edu writes:
 On Wed, Jun 30, 2010 at 9:33 AM, Nikolaus Rath nikol...@rath.org wrote:
 Nikolaus Rath wrote:
 Hello,

 I would like to use an SFTPClient instance concurrently with several
 threads, but I couldn't find any information about thread safety in the
 API documentation.

  - Can I just share the SFTPClient instance between several threads?


 Yes. Though you may have a use case, there's no performance benefit,
 as the client can only handle one operation at a time.

I am mostly concerned about reducing the network latency. Suppose I want
to create 100 1-byte files: I hope that it is going to be faster to
send 100 requests at once from 100 threads rather than having one thread
that works through the files sequentially. That should still work even
with a single threaded client, right?
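A sketch of the latency argument with a simulated round trip: a thread pool overlaps the waits of 100 small requests, whereas a sequential loop pays each one in turn. fake_create() is a hypothetical stand-in that just sleeps for an assumed LATENCY; whether one shared paramiko SFTPClient overlaps real requests this way depends on its internals (James's answer quoted above says the client handles one operation at a time), so this only shows the general effect.

```python
import time
from concurrent.futures import ThreadPoolExecutor

LATENCY = 0.02                             # assumed round-trip time per request

def fake_create(name):
    """Hypothetical stand-in for creating one tiny remote file."""
    time.sleep(LATENCY)
    return name

names = ["file%03d" % i for i in range(100)]

# Sequential: each request waits for the previous one (10 names keep it quick).
start = time.perf_counter()
for n in names[:10]:
    fake_create(n)
sequential = time.perf_counter() - start   # about 10 * LATENCY

# Concurrent: the waits overlap.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=100) as pool:
    created = list(pool.map(fake_create, names))
parallel = time.perf_counter() - start     # far less than 100 * LATENCY
```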

 Why use SFTPClient? Its performance might not be very good, and
 compatibility issues with some SSH servers might crop up. I know I had
 compatibility issues even with some pretty standard SSH servers on some
 platforms (esp. Solaris).


 There are throughput performance issues with older ssh servers that
 don't support prefetch and pipelining.

Ah, so the problem is with the server and not with the SFTPClient class?
My servers are given and only speak SFTP, so I have to live with them in
any case.


Best,

   -Nikolaus


Re: [paramiko] Multithreading

2010-06-27 Thread Nikolaus Rath
Hi,

Really no one around who knows anything about this?

-Nikolaus


Nikolaus Rath nikol...@rath.org writes:
 Hello,

 I would like to use an SFTPClient instance concurrently with several
 threads, but I couldn't find any information about thread safety in the
 API documentation.

  - Can I just share the SFTPClient instance between several threads?
  - Or can I share the SSHClient object, but each thread needs its own SFTPClient?
  - Or do I need a separate SSHClient for each thread?
  - Or is there another layer in between that I can share between threads?
  - Or do I have to avoid multithreading at all when using paramiko?


 Thanks!

-Nikolaus


