Re: Hardware accelerators and RAM/CPU guidance

Darren Tucker Fri, 02 Jun 2006 15:25:11 -0700

mantis wrote:

I am seeking some advice on hardware for a dedicated SSH box.  I can't
seem to find any performance statistics for anything but SCP/SFTP
transfers and the HPN patches for fat pipes.


The goal is to support 500-1000 simultaneous SSH sessions, which are in
TCP port forwarding mode only.  The data travelling through the tunnel
will be small (about 1KB over the course of one minute).

The first question is whether crypto accelerator cards are actually
useful with this type of scenario (on the server)?  I looked at the
OpenSSH code and it seems that it only uses crypto accelerator cards for
specific tasks (keys on a smartcard).  The only calls that I see to the
OpenSSL functions that provide hardware acceleration (ENGINE_set_*) are
in "scard.c" and "scard-opensc.c" files.  So OpenSSH effectively doesn't
use any hardware accelerators supported by OpenSSL?  Is this correct?

Depends on the platform. On OpenBSD if you enable the kern.usercrypto
sysctl then everything in userspace that uses OpenSSL, including
OpenSSH, will use the hardware acceleration.

On other platforms the current release versions of OpenSSH will not use
crypto hardware (unless you're using a vendor-supplied binary and they
added it). In the current development versions of OpenSSH (and thus the
next major release) it is a configure-time option
("./configure --with-ssl-engine"). The diff to add this is quite simple,
and if you would like I can send you one against 4.3p2.

That said, the actual throughput you're talking about is relatively
small, and it is rare that symmetric crypto throughput is the bottleneck
on modern processors anyway.

The second question is how much RAM/CPU would be necessary to support
500-1000 simultaneous connections (assuming no hardware accelerator
support)?  This will probably be AMD64 CPUs.  Anybody have some
statistics or guidance they can share?


There are 3 things to consider:

1) SSH connection establishment is relatively expensive because of the
DH handshake (and to a lesser extent, the public key operations). It
sounds like you intend to have relatively long-lived connections, so
this probably won't represent a large overhead.

2) Each connection is going to have one (if you're not using privsep) or
two (if you are using privsep) processes associated with it. This is
easy to estimate: bring up a couple of connections on your target
platform, look at the RSS with ps or similar, multiply by your expected
number of connections and add a safety factor.

3) Connection throughput requires symmetric crypto and a hmac (and
compression, if you use it). You can measure the approximate throughput
on your target hardware with "openssl speed xxx". The worst case would
probably be 3des and hmac-sha1 (OpenSSL doesn't seem to have a speed
test for the latter, though).
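
The memory estimate from point 2 can be sketched roughly as below. This is
only an illustration: the function name, the sample RSS readings and the
1.25 safety factor are made up for the example, so substitute the values
you actually measure with ps on your target platform.

```python
def estimate_sshd_memory_mb(rss_samples_kb, connections,
                            procs_per_connection=2, safety_factor=1.25):
    """Estimate total sshd memory (MB) from a few measured RSS values (KB).

    procs_per_connection is 2 with privsep and 1 without it.
    safety_factor is an arbitrary margin chosen for this example.
    """
    if not rss_samples_kb:
        raise ValueError("need at least one RSS sample")
    # Average the per-process RSS seen on the test connections.
    avg_rss_kb = sum(rss_samples_kb) / len(rss_samples_kb)
    total_kb = avg_rss_kb * procs_per_connection * connections * safety_factor
    return total_kb / 1024


# Hypothetical RSS readings in KB, not real measurements.
samples_kb = [1980, 2050, 2110]
print(f"{estimate_sshd_memory_mb(samples_kb, 1000):.0f} MB")
```

Summing RSS tends to overestimate, since pages shared between sshd
processes are counted once per process, so treat the result as an upper
bound rather than an exact figure.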

For your application, the approximate throughput would be 1 kbyte per
min * 8 bits / 60 sec * 1000 users = ~136 Kbit/s. This kind of
throughput is not likely to challenge a modern processor much :-) You
are more likely to be limited by memory: on an i386, each sshd uses
~2MB of RAM, so 1000 users at 2 processes per user is ~4GB (although you
might be able to tweak down the usage depending on your requirements).
If it was me, I'd save the money on crypto hardware and spend it on
DIMMs...
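
For anyone wanting to redo the arithmetic with their own numbers, a small
sketch follows. It assumes 1 kbyte = 1024 bytes (which is what gives
~136 rather than ~133 Kbit/s) and takes the ~2MB per sshd figure at face
value; the function name is just illustrative.

```python
def aggregate_kbit_per_sec(bytes_per_min, users):
    """Aggregate tunnel throughput in Kbit/s (1 Kbit = 1000 bits)."""
    return bytes_per_min * 8 / 60 * users / 1000


# 1 kbyte (assumed to be 1024 bytes) per minute per user, 1000 users.
throughput = aggregate_kbit_per_sec(1024, 1000)

# ~2 MB per sshd * 2 processes per user (privsep) * 1000 users, in GB.
memory_gb = 2 * 2 * 1000 / 1024

print(f"{throughput:.1f} Kbit/s, {memory_gb:.1f} GB")
# prints "136.5 Kbit/s, 3.9 GB"
```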


--
Darren Tucker (dtucker at zip.com.au)
GPG key 8FF4FA69 / D9A3 86E9 7EEE AF4B B2D4  37C9 C982 80C7 8FF4 FA69
    Good judgement comes with experience. Unfortunately, the experience
usually comes from bad judgement.
