mantis wrote:
I am seeking some advice on hardware for a dedicated SSH box. I can't
seem to find any performance statistics for anything but SCP/SFTP
transfers and the HPN patches for fat pipes.
The goal is to support 500-1000 simultaneous SSH sessions, which are in
TCP port forwarding mode only. The data travelling through the tunnel
will be small (about 1KB over the course of one minute).
The first question is whether crypto accelerator cards are actually
useful with this type of scenario (on the server)? I looked at the
OpenSSH code and it seems that it only uses crypto accelerator cards for
specific tasks (keys on a smartcard). The only calls that I see to the
OpenSSL functions that provide hardware acceleration (ENGINE_set_*) are
in "scard.c" and "scard-opensc.c" files. So OpenSSH effectively doesn't
use any hardware accelerators supported by OpenSSL? Is this correct?
Depends on the platform. On OpenBSD, if you enable the kern.usercrypto
sysctl, then everything in userspace that uses OpenSSL, including
OpenSSH, will use the hardware acceleration.
On other platforms the current release versions of OpenSSH will not use
crypto hardware (unless you're using a vendor-supplied binary and they
added it). In the current development versions of OpenSSH (and thus the
next major release) it is a configure-time option ("./configure
--with-ssl-engine"). The diff to add this is quite simple, and if you
would like I can send you one against 4.3p2.
That said, the actual throughput you're talking about is relatively
small, and it is rare that symmetric crypto throughput is the bottleneck
on modern processors anyway.
The second question is how much RAM/CPU would be necessary to support
500-1000 simultaneous connections (assuming no hardware accelerator
support)? This will probably be AMD64 CPUs. Anybody have some
statistics or guidance they can share?
There are three things to consider:
1) SSH connection establishment is relatively expensive because of the
DH handshake (and to a lesser extent, the public key operations). It
sounds like you intend to have relatively long-lived connections, so
this probably won't represent a large overhead.
2) Each connection is going to have one (if you're not using privsep) or
two (if you are using privsep) processes associated with it. This is
easy to estimate: bring up a couple of connections on your target
platform, look at the RSS with ps or similar, multiply by your expected
number of connections and add a safety factor.
3) Connection throughput requires symmetric crypto and an HMAC (and
compression, if you use it). You can measure the approximate throughput
on your target hardware with "openssl speed xxx", where xxx is the
algorithm name. The worst case would probably be 3des and hmac-sha1
(OpenSSL doesn't seem to have a speed test for the latter, though).
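The memory estimate in point 2 can be sketched roughly as follows. This
is only an illustration: the function names, the "ps -A -o rss= -o comm="
invocation and the 1.25 safety factor are my assumptions, not anything
OpenSSH provides. Note that the listening sshd is counted along with the
per-connection processes, and that RSS counts pages shared between sshd
processes once per process, so the result tends to overestimate.

```python
import subprocess


def sshd_rss_kib(ps_output, name="sshd"):
    """Collect RSS values (KiB) from lines of the form '<rss> <command>'."""
    sizes = []
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skip blank or malformed lines
        comm = parts[1].strip()
        # Some platforms print the full path, e.g. /usr/sbin/sshd.
        if comm == name or comm.endswith("/" + name):
            sizes.append(int(parts[0]))
    return sizes


def estimate_total_mib(sizes_kib, sessions, procs_per_session=2, safety=1.25):
    """Average RSS * processes per session * sessions * safety factor, in MiB."""
    if not sizes_kib:
        raise ValueError("no sshd processes found; open a few connections first")
    average_kib = sum(sizes_kib) / len(sizes_kib)
    return average_kib * procs_per_session * sessions * safety / 1024


def measure(sessions=1000):
    """Run ps on this host and print an estimate (assumes ps accepts -o rss=)."""
    out = subprocess.run(
        ["ps", "-A", "-o", "rss=", "-o", "comm="],
        capture_output=True, text=True, check=True,
    ).stdout
    print("estimated MiB:", round(estimate_total_mib(sshd_rss_kib(out), sessions)))
```

Bring up a couple of test connections, then call measure() on the target
box. Use procs_per_session=1 if you run without privsep.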
For your application, the approximate throughput would be 1 Kbyte/min
* 8 bits/byte / 60 sec/min * 1000 users = ~136 Kbit/s. This kind of
throughput is not likely to challenge a modern processor much :-) You
are more likely to be limited by memory: on an i386, each sshd uses ~2MB
of RAM, so 1000 users at 2 processes per user is ~4GB (although you
might be able to tweak down the usage depending on your requirements).
If it were me, I'd save the money on crypto hardware and spend it on
DIMMs...
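For anyone who wants to redo the arithmetic with their own numbers, here
is a small sketch. The function names are made up for illustration, and
1 Kbyte is taken as 1024 bytes, which is what gives the ~136 figure.

```python
def aggregate_kbit_per_s(bytes_per_minute, users):
    """Total tunnel payload rate across all users, in Kbit/s (1 Kbit = 1000 bits)."""
    bits_per_second_per_user = bytes_per_minute * 8 / 60
    return bits_per_second_per_user * users / 1000


def total_ram_mb(users, procs_per_user, mb_per_proc):
    """Rough memory footprint: every process is assumed to use the same amount."""
    return users * procs_per_user * mb_per_proc


if __name__ == "__main__":
    print(round(aggregate_kbit_per_s(1024, 1000), 1), "Kbit/s")
    print(total_ram_mb(1000, 2, 2), "MB")
```

Compare the first number against what "openssl speed" reports for your
cipher to see how much headroom the CPU has.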
--
Darren Tucker (dtucker at zip.com.au)
GPG key 8FF4FA69 / D9A3 86E9 7EEE AF4B B2D4 37C9 C982 80C7 8FF4 FA69
Good judgement comes with experience. Unfortunately, the experience
usually comes from bad judgement.