mantis wrote:
I am seeking some advice on hardware for a dedicated SSH box.  I can't
seem to find any performance statistics for anything but SCP/SFTP
transfers and the HPN patches for fat pipes.

The goal is to support 500-1000 simultaneous SSH sessions, which are in
TCP port forwarding mode only.  The data travelling through the tunnel
will be small (about 1KB over the course of one minute).

The first question is whether crypto accelerator cards are actually
useful with this type of scenario (on the server)?  I looked at the
OpenSSH code and it seems that it only uses crypto accelerator cards for
specific tasks (keys on a smartcard).  The only calls that I see to the
OpenSSL functions that provide hardware acceleration (ENGINE_set_*) are
in "scard.c" and "scard-opensc.c" files.  So OpenSSH effectively doesn't
use any hardware accelerators supported by OpenSSL?  Is this correct?

Depends on the platform. On OpenBSD if you enable the kern.usercrypto sysctl then everything in userspace that uses OpenSSL, including OpenSSH, will use the hardware acceleration.

On other platforms the current release versions of OpenSSH will not use crypto hardware (unless you're using a vendor-supplied binary and they added it). In the current development versions of OpenSSH (and thus the next major release) it is a configure-time option ("./configure --with-ssl-engine"). The diff to add this is quite simple, and if you would like I can send you one against 4.3p2.

That said, the actual throughput you're talking about is relatively small, and it is rare that symmetric crypto throughput is the bottleneck on modern processors anyway.

The second question is how much RAM/CPU would be necessary to support
500-1000 simultaneous connections (assuming no hardware accelerator
support)?  This will probably be AMD64 CPUs.  Anybody have some
statistics or guidance they can share?

There are three things to consider:
1) SSH connection establishment is relatively expensive because of the DH handshake (and to a lesser extent, the public key operations). It sounds like you intend to have relatively long-lived connections, so this probably won't represent a large overhead.

2) Each connection is going to have one (if you're not using privsep) or two (if you are using privsep) processes associated with it. This is easy to estimate: bring up a couple of connections on your target platform, look at the RSS with ps or similar, multiply by your expected number of connections and add a safety factor.

3) Connection throughput requires symmetric crypto and an HMAC (and compression, if you use it). You can measure the approximate throughput on your target hardware with "openssl speed xxx", where xxx is the name of the algorithm. The worst case would probably be 3des and hmac-sha1 (OpenSSL doesn't seem to have a speed test for the latter, though).
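The memory estimate in point 2 can be sketched roughly as follows. This is only an illustration: the function name, the sample RSS values and the default safety factor of 1.5 are my assumptions, not measurements. Take the real RSS figures from ps on your target platform.

```python
def estimate_sshd_memory_kb(rss_samples_kb, connections,
                            procs_per_conn=2, safety_factor=1.5):
    """Rough total memory estimate for sshd, in KB.

    rss_samples_kb: RSS values (KB) read from ps for a few test connections.
    procs_per_conn: 1 without privsep, 2 with privsep.
    safety_factor:  hypothetical headroom multiplier; choose your own.
    """
    avg_rss_kb = sum(rss_samples_kb) / len(rss_samples_kb)
    return avg_rss_kb * procs_per_conn * connections * safety_factor


# Hypothetical samples of about 2MB each, 1000 connections, privsep on.
total_kb = estimate_sshd_memory_kb([2048, 2100, 1990], 1000)
print(f"estimated memory: {total_kb / 1024 / 1024:.1f} GB")
```

Summing RSS counts pages shared between processes once per process, so the result is likely to be on the high side.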

For your application, the approximate throughput would be 1 kbyte per minute * 8 bits / 60 seconds * 1000 users = ~136Kbit/s. This kind of throughput is not likely to challenge a modern processor much :-) You are more likely to be limited by memory: on an i386, each sshd uses ~2MB of RAM, so 1000 users at 2 processes per user is ~4GB (although you might be able to tweak down the usage depending on your requirements). If it were me, I'd save the money on crypto hardware and spend it on DIMMs...
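For anyone who wants to plug in their own numbers, here is the same back-of-the-envelope arithmetic as a small script. The variable names are made up for illustration, and 1 kbyte is taken as 1024 bytes, which is what gives the ~136Kbit/s figure.

```python
# Inputs taken from the figures discussed above; adjust to your own case.
bytes_per_session_per_min = 1024   # about 1 kbyte per minute per tunnel
sessions = 1000                    # upper end of the 500-1000 range
mb_per_sshd = 2                    # approximate per-process usage on i386
procs_per_session = 2              # two processes per connection with privsep

# Aggregate payload throughput in bits per second.
bits_per_sec = bytes_per_session_per_min * 8 / 60 * sessions

# Aggregate memory in MB.
mem_mb = sessions * procs_per_session * mb_per_sshd

print(f"throughput: {bits_per_sec / 1000:.1f} Kbit/s")
print(f"memory:     {mem_mb} MB")
```

This only counts the tunnelled payload and leaves out SSH packet and TCP/IP overhead, so treat it as a lower bound on the wire traffic.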

--
Darren Tucker (dtucker at zip.com.au)
GPG key 8FF4FA69 / D9A3 86E9 7EEE AF4B B2D4  37C9 C982 80C7 8FF4 FA69
    Good judgement comes with experience. Unfortunately, the experience
usually comes from bad judgement.
