Re: [paramiko] SCPClient slower than scp

2010-02-01 Thread james bardin
On Mon, Feb 1, 2010 at 2:52 AM, Roman Yakovenko
roman.yakove...@gmail.com wrote:
 Hello.

 I am using the SCPClient class from this branch (
 http://bazaar.launchpad.net/~jbardin/paramiko/paramiko_scp/annotate/500?file_id=scp.py-20081117202350-5q0ozjv6zz9ww66y-1
 ) with paramiko 1.7.6 and Python 2.6 on Ubuntu Karmic Koala.

 I am testing my code with a 1 GB file.

 The SCPClient upload rate starts at 10 MB/s and then drops to 5.2
 MB/s. The average is 5.2 MB/s. I tried changing the buffer size, but
 this didn't help.
 The scp command upload rate starts at 20 MB/s and then drops to 10
 MB/s. The average is 10 MB/s.
 To complete the statistics, the average rate of paramiko's built-in
 SFTPClient is 2.2 MB/s. I use the put method as is, with the default
 configuration.

 I am not sure where to start to solve the problem. Initially, I
 suspected that local file reading is a problem, but that
 functionality works pretty well.

You would normally start by using a profiler to see where the
performance bottleneck is, before you start speculating. You would
have seen that most of the time is spent in paramiko.Transport
manipulating data, and waiting for pyCrypto. SCPClient adds almost
nothing to the overall time.
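
A minimal sketch of that profiling step, using only the standard library's cProfile and pstats modules. The profile_call helper and the busy_work stand-in are hypothetical names for illustration, and the sketch assumes a current Python 3 interpreter rather than the 2.x versions discussed in this thread.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile; return (result, text report of the top entries).

    Hypothetical helper for illustration: pass it whatever callable performs
    the transfer you want to measure.
    """
    profiler = cProfile.Profile()
    # runcall() profiles exactly one call and hands back its return value.
    result = profiler.runcall(func, *args, **kwargs)

    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    # Sort by cumulative time so the expensive call chains come first.
    stats.sort_stats("cumulative").print_stats(15)
    return result, buf.getvalue()


if __name__ == "__main__":
    def busy_work(n):
        # Stand-in for the real transfer: burns CPU in pure Python.
        return sum(i * i for i in range(n))

    value, report = profile_call(busy_work, 200000)
    print(report)
```

For a real transfer, the call would look something like profile_call(upload, path), where upload is your own function wrapping the SCPClient call.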


 Right now, I am using a workaround (executing scp with subprocess),
 but it is a less than optimal solution.

 Any help is appreciated.
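
For reference, the subprocess workaround described above might look roughly like the sketch below. The function names, host and paths are hypothetical, and it assumes key-based authentication so that scp does not prompt for a password.

```python
import subprocess


def build_scp_command(local_path, host, remote_path, user=None):
    """Return the argv list for `scp local_path [user@]host:remote_path`."""
    target = "%s:%s" % (host, remote_path)
    if user:
        target = "%s@%s" % (user, target)
    return ["scp", local_path, target]


def scp_upload(local_path, host, remote_path, user=None):
    """Run scp and raise subprocess.CalledProcessError on a non-zero exit."""
    # Passing a list (no shell=True) avoids local shell quoting problems.
    subprocess.check_call(build_scp_command(local_path, host, remote_path, user))
```

Keeping the command construction separate from the call makes it easy to check what would be executed without touching the network.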


Yes, the solution written entirely in C will be significantly faster.
Since this is mostly Python, CPU is the limiting factor. There may be
some places where optimizations could be made in paramiko and
pyCrypto, but I haven't looked into it myself.

-jim

___
paramiko mailing list
paramiko@lag.net
http://www.lag.net/cgi-bin/mailman/listinfo/paramiko


Re: [paramiko] SCPClient slower than scp

2010-02-01 Thread Roman Yakovenko
On Mon, Feb 1, 2010 at 5:36 PM, james bardin jbar...@bu.edu wrote:
 On Mon, Feb 1, 2010 at 2:52 AM, Roman Yakovenko
 I am testing my code with a 1 GB file.

 The SCPClient upload rate starts at 10 MB/s and then drops to 5.2
 MB/s. The average is 5.2 MB/s. I tried changing the buffer size, but
 this didn't help.
 The scp command upload rate starts at 20 MB/s and then drops to 10
 MB/s. The average is 10 MB/s.
 To complete the statistics, the average rate of paramiko's built-in
 SFTPClient is 2.2 MB/s. I use the put method as is, with the default
 configuration.

 I am not sure where to start to solve the problem. Initially, I
 suspected that local file reading is a problem, but that
 functionality works pretty well.

 You would normally start by using a profiler to see where the
 performance bottleneck is, before you start speculating. You would
 have seen that most of the time is spent in paramiko.Transport
 manipulating data, and waiting for pyCrypto. SCPClient adds almost
 nothing to the overall time.

Thanks for the advice; I'll follow it. It was not complete speculation,
though: the CPU usage was pretty much the same for all solutions.

 Right now, I am using a workaround (executing scp with subprocess),
 but it is a less than optimal solution.

 Any help is appreciated.


 Yes, the solution written entirely in C will be significantly faster.
 Since this is mostly Python, CPU is the limiting factor.

I have zero experience with ssh and encryption, but my expectation was
that, at least when transferring 10+ GB files, the process
would be bound by the network and not the CPU.

 There may be
 some places where optimizations could be made in paramiko and
 pyCrypto, but I haven't looked into it myself.

Thank you.


-- 
Roman Yakovenko
C++ Python language binding
http://www.language-binding.net/



Re: [paramiko] SCPClient slower than scp

2010-02-01 Thread james bardin
On Mon, Feb 1, 2010 at 11:49 AM, Roman Yakovenko
roman.yakove...@gmail.com wrote:

 You would normally start by using a profiler to see where the
 performance bottleneck is, before you start speculating. You would
 have seen that most of the time is spent in paramiko.Transport
 manipulating data, and waiting for pyCrypto. SCPClient adds almost
 nothing to the overall time.

 Thanks for the advice; I'll follow it. It was not complete speculation,
 though: the CPU usage was pretty much the same for all solutions.


There are some other limiting factors in both paramiko and
openssh (http://www.psc.edu/networking/projects/hpn-ssh/), but I have
always hit the CPU wall with paramiko long before anything else is
relevant. If you're maxing out one processor core in each case, you're
just seeing the difference in efficiency between the C code and the
Python+C code (pyCrypto does the heavy lifting in C).



 Yes, the solution written entirely in C will be significantly faster.
 Since this is mostly Python, CPU is the limiting factor.

 I have zero experience with ssh and encryption, but my expectation was
 that, at least when transferring 10+ GB files, the process
 would be bound by the network and not the CPU.


The size of the file has nothing to do with it once the connection and
negotiation time become irrelevant. It's an encrypted stream of data,
so you're limited by how fast you can process it, not by how long it
is.
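
A back-of-the-envelope model of this point, as a sketch only: the sustained rate is the slower of the network and the CPU-bound encryption step, and file size only matters relative to the fixed setup time. The function names and all figures below are hypothetical, not measurements from this thread.

```python
def transfer_seconds(size_mb, network_mb_s, cpu_mb_s, setup_s=1.0):
    """Estimate transfer time for an encrypted stream.

    The sustained rate is capped by the slower of the two stages: the
    network link and the CPU-bound encrypt/MAC step.
    """
    rate = min(network_mb_s, cpu_mb_s)
    return setup_s + size_mb / rate


def effective_rate(size_mb, network_mb_s, cpu_mb_s, setup_s=1.0):
    """Average MB/s over the whole transfer, setup time included."""
    return size_mb / transfer_seconds(size_mb, network_mb_s, cpu_mb_s, setup_s)


if __name__ == "__main__":
    # Hypothetical figures: a 100 MB/s link and a CPU that can process 5 MB/s.
    for size_mb in (10, 1000, 10000):
        print(size_mb, round(effective_rate(size_mb, 100.0, 5.0), 2))
```

In this model, a larger file only amortizes the setup cost: the average rate creeps toward the slower stage's rate and never exceeds it.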



Re: [paramiko] SCPClient slower than scp

2010-02-01 Thread james bardin
On Mon, Feb 1, 2010 at 12:16 PM, james bardin jbar...@bu.edu wrote:


 Yes, the solution written entirely in C will be significantly faster.
 Since this is mostly Python, CPU is the limiting factor.

 I have zero experience with ssh and encryption, but my expectation was
 that, at least when transferring 10+ GB files, the process
 would be bound by the network and not the CPU.


Your email got me thinking, so I did a few tests:

The biggest boost in performance came from using the latest
pycrypto (2.1.0). You'll get a deprecation warning from paramiko that
you can ignore for now (a bug has already been submitted on github).
There was a change to the HMAC code that made a huge difference in
paramiko's performance.
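
If the deprecation warning gets noisy, one way to silence it locally is the standard library's warnings module. This is only a sketch: call_quietly is a hypothetical helper, and exactly which import or call triggers the warning is not stated here, so wrap whichever one does in your setup.

```python
import warnings


def call_quietly(func, *args, **kwargs):
    """Call func with DeprecationWarning suppressed, then restore the filters."""
    with warnings.catch_warnings():
        # The filter change is undone automatically when the block exits.
        warnings.simplefilter("ignore", DeprecationWarning)
        return func(*args, **kwargs)
```

Scoping the filter to a single call keeps other deprecation warnings in the program visible.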

I tried using a limited-bandwidth connection, and paramiko was on par
with openssh when CPU wasn't a concern.
When bandwidth wasn't an issue (using loopback), paramiko was about
85% of the speed of openssh on my machine.

Each newer version of Python 2.x was slightly faster as well.



Re: [paramiko] SCPClient slower than scp

2010-02-01 Thread Roman Yakovenko
On Tue, Feb 2, 2010 at 12:26 AM, james bardin jbar...@bu.edu wrote:
 On Mon, Feb 1, 2010 at 12:16 PM, james bardin jbar...@bu.edu wrote:


 Yes, the solution written entirely in C will be significantly faster.
 Since this is mostly Python, CPU is the limiting factor.

 I have zero experience with ssh and encryption, but my expectation was
 that, at least when transferring 10+ GB files, the process
 would be bound by the network and not the CPU.


 Your email got me thinking, so I did a few tests:

 The biggest boost in performance came from using the latest
 pycrypto (2.1.0). You'll get a deprecation warning from paramiko that
 you can ignore for now (a bug has already been submitted on github).
 There was a change to the HMAC code that made a huge difference in
 paramiko's performance.

 I tried using a limited-bandwidth connection, and paramiko was on par
 with openssh when CPU wasn't a concern.

As expected, since sending the data takes much more time than encrypting it.

 When bandwidth wasn't an issue (using loopback), paramiko was about
 85% of the speed of openssh on my machine.

That is really good news. I will try to upgrade the code.

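I am using the real IP to test my code.

(I found the following code on the internet.)

    import socket
    import struct
    import fcntl

    def get_ip_address(fname='eth0'):
        # Linux-only: ask the kernel for the IPv4 address of interface fname.
        SIOCGIFADDR = 0x8915
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        # The request buffer is padded to 256 bytes; the name is cut to 15 chars.
        # encode() keeps this working where struct.pack requires bytes.
        io_result = fcntl.ioctl(s.fileno(), SIOCGIFADDR,
                                struct.pack('256s', fname[:15].encode('ascii')))
        # Bytes 20..23 of the reply hold the packed IPv4 address.
        return socket.inet_ntoa(io_result[20:24])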

On the one hand, all requests go via the router; on the other, I have
local access to both ends. After each file transfer, md5sum
is run on both files and the results are compared.
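
When both files are reachable from one machine, the same check can be sketched in Python with hashlib instead of shelling out to md5sum. The function names are hypothetical, and the sketch assumes a current Python 3 interpreter; the files are read in chunks so a multi-gigabyte file never has to fit in memory.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops at the empty bytes object read() returns at EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """True when both files hash to the same MD5 digest."""
    return md5_of_file(path_a) == md5_of_file(path_b)
```

For a file that exists only on the remote host, the digest would still have to be computed there and compared with the local value.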

 Each newer version of Python 2.x was slightly faster as well.

I am using Python 2.4 (production sysadmins are so conservative :-) )
and 2.6 in development, but as you noted, there is not a big
difference between them.

Thank you for help.

-- 
Roman Yakovenko
C++ Python language binding
http://www.language-binding.net/
