Hi there,
On Mon, 23 Oct 2000, David Schwartz wrote:
> > This is not true. Session caching is independent of the IO mechanism you
> > choose to use.
>
> Then how does the client code know which session to reuse? It doesn't know
> what server it's talking to.
In the case of an SSL server, it is irrelevant. In the case of an SSL
client, the way you identify the peer is up to you - if that's by matching
up IP addresses then you will at least need the network abstraction to
give you that information, if it's implicit in the application logic then
there may be other ways. The point is that you don't necessarily lose any
of this because you're not operating the sockets yourself. And of course
SSL doesn't have to be a socket thing - you can run SSL inside any "stream",
which means the association between session resumption and some concept of a
"peer context" may take a different form altogether.
> Exactly. I've done this myself in the SSL version of ConferenceRoom and its
> web server. Really the only special case is during connection setup. In this
> case, the SSL code may want to send data out the dirty side even though you
> didn't hand it anything to send.
Well, yes ... but as stated - you proactively create this event yourself,
namely starting the SSL handshake - if at that point (as with all
"events") you run around the SSL popping out any data it has to give, it
won't then give any more until the next "event" (which will be, in all
probability, data arriving back from the peer). Any "event", if it's
followed by an attempt to suck data out of the SSL object where possible,
can then be processed in one run - and then you can wait for the next
event (which may be one you create, such as renegotiating or closing, or
one the peer makes - i.e. there's new data to deliver to the SSL).
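As a rough sketch (assuming the SSL sits on a non-blocking socket BIO, and
handle_clean_data() is a hypothetical callback of yours), that "service one
event, then drain" pattern looks something like:

    static void service_ssl_event(SSL *ssl)
    {
        char buf[4096];
        int n;

        for (;;) {
            n = SSL_read(ssl, buf, sizeof(buf));
            if (n > 0) {
                handle_clean_data(buf, n);   /* deliver decrypted data upward */
                continue;
            }
            switch (SSL_get_error(ssl, n)) {
            case SSL_ERROR_WANT_READ:        /* the SSL is "idle" again - nothing */
            case SSL_ERROR_WANT_WRITE:       /* more until the next event arrives  */
                return;
            default:                         /* clean shutdown or a real error:   */
                return;                      /* tear the stream down here         */
            }
        }
    }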
> Right, the SSL will never block on I/O. The only harmful effect it may have
> on your system is that it will spin the CPU when it has expensive
> operations. Some servers have response time requirements that this can
> adversely affect.
Absolutely - as I said, this becomes a trade-off with latencies. But
generally speaking (SMP issues aside) you can either do a lot of
operations simultaneously (slowly) or run them quickly one after the
other. If a public key operation is not to have any noticeable
impact on the responsiveness of other streams, you either need to ensure
the server (relative to your expected load) can do those public key
ops fast enough, or use more concurrency (threads, processes, etc).
> > Not true. The only need for multithreading/multiprocessing is to get a
> > tighter limit on latency; it won't gain you anything in terms of
> > throughput (with the obvious exception that a single thread or process can
> > only utilise 1 CPU, even in an SMP machine).
>
> Nonsense. A singly-threaded program can't do any other work while it's
> waiting for a disk read or while the operating system is servicing a page
> fault. A multi-threaded process can. Singly-threaded servers are notoriously
> bursty because of this.
Well, if disk reads are required at all, then that is a form of inline
blocking behaviour you need to consider - exactly as you would if network
operations were blocking. SSL itself (at least the way it's done in OpenSSL)
shouldn't require disk reads by default - but if it does, and disk I/O
factors at all into your performance, then interpret that as appropriate.
My point had been that multitasking does not (necessarily) give better
"performance" (depending on your metric). If your particular scenario is a
VPN, then you may be talking about a few high-bandwidth connections - in
which case async may very well outperform multitasking from a throughput
point of view, with only a slight increase in latency (more "bursty", as
you put it). This is because "chunking" data into larger blocks before
they are applied to the SSL state machine may give you a lower expansion
in the volume of traffic - and if the network is your bottleneck, that
may beat pumping little blocks of traffic into the
SSL the moment each of them arrives (and then having the pure data logic
process them slowly and still give you latencies anyway).
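To put some rough (and purely illustrative) numbers on it: each SSL/TLS
record carries a few bytes of header plus a MAC (and padding for block
ciphers), call it 25-odd bytes of fixed overhead, so feeding the state
machine one application byte per record can easily inflate the wire traffic
twenty-fold, while the same overhead amortised over a few kilobytes per
record disappears into the noise.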
> > In fact, non-blocking (async)
> > multiplexing of many SSL streams in the same process/thread can have some
> > distinct performance advantages; most notably; (1) you don't have loads of
> > context switches going on,
>
> Why would you have context switches in a multi-threaded approach? I wasn't
> suggesting one-thread-per-connection. I'm not that stupid.
Well, many people are - and many SSL tunneling applications/utils do
precisely that. In fact this is very commonly done from inetd, where it is
one-process-per-connection. If what you are suggesting is a hybrid where
each thread can operate some configurable limit of SSL streams, but you
use some number of threads or processes to scale this up - then I
couldn't agree with you more. This is the ideal model, because you can
use enough multitasking that the server is not *too* bursty (and
makes use of multiple CPUs if appropriate), but still gains some of the
async benefits. Let's not forget that neither threads nor processes come
for "free", so you can converge on a mix that suits.
> > and (2) the longer the "loop" across all the
> > distinct SSL streams takes, the more chance that an SSL stream will
> > accumulate larger "packets" for when its turn next comes round again. If
> > your responsiveness is too quick, you may end up encapsulating more (and
> > smaller) blocks, each one having its own SSL overhead (in terms of
> > processing and data size). If your peer is trickling through bytes quickly
> > but only one at a time, the SSL traffic bloat will be huge if you pump
> > these in and out of the SSL machine a byte at a time - so having a slight
> > builtin latency to let those bytes accumulate before they're pumped into
> > the SSL en masse means you'll be making better use of your "packaging" :-)
>
> You get that automatically with any setup. If load is high, it will take
> you longer to get around to doing anything -- with any architecture.
Yes ... and servers in particular that try to throw lots of multitasking
at this often end up with egg on their face. The misconception is that by
multitasking, *every* stream is being processed immediately rather than
having to wait. The problem of course is that *every* stream is being
processed *slowly* :-) In a purely non-blocking async model, every
operation the server performs is at full speed, and if this means that
some streams have network data accumulating until it is their turn then so
be it - the server doesn't degrade in performance, but the streaming gets
more bursty. Depends how you define "performance" I guess :-)
Our good friend Apache, an otherwise damned fine application, currently
suffers from this problem and it's a biggie - namely, even with the
upcoming version 2 it appears it will at best be able to multithread
rather than multiprocess, but there still doesn't appear to be much hope
for async. I think this is the same for IIS and others anyway, so it's
hardly a slight on the Apache team - but it does mean that these
applications are well suited to stability and to processing "fat" operations,
but a bit too bulky and lethargic when processing "streamy", "thin"
operations (these similes are getting a bit much ...).
> Blocking is, in the general case, unavoidable. There is always, for
> example, the case where a client causes a seldom-used code path to be used
> which requires the servicing of a page fault. Why stall your entire server
> for as long as it takes the page fault to get serviced?
Right, you're talking here about system and application stability in the
face of the general class of "errors". We're not too proud to admit that
some of the OpenSSL code is quite "opaque" - so there's every possibility
that there are areas there which may pose these sorts of "infrequent but
painful when they happen" errors. Having just pointed out a limitation in
Apache, this is an example of one of its main strengths. Not only Apache
itself, but the potentially 3rd party garbage it is trying to invoke
(users' cgis, php, etc) can go wrong - but the process model generally
means that the consequences are only felt locally. I guess you just have
to question the "fatness" of the job you're doing - e.g. a heavily
multithreaded or multiprocessed transparent proxy would be completely
stupid - its job is trivial and async is generally a much better idea. As an
opposite example, you probably wouldn't want to write a single system
"shell" that services multiple clients ("invocations") asynchronously -
the clients could be doing just about *anything* inline, which can/would
block everything else (and if one does damage, it damages everything
else too).
> You seem to be equating multithreaded with a thread-per-connection
> approach, as opposed to a 'thread per CPU, plus thread per I/O I wish to
> pend'.
OK - in that case we probably agree then.
> > Again, this is not true - you get an "event" to turn the SSL state machine
> > - namely notification from your outside network code that data has
> > arrived. When you run around the state machine trying to push that data
> > in, you should at the same time pop out any generated outgoing data the
> > SSL wants to send. Once that is done - the SSL will not spontaneously
> > create traffic out of thin air ... the SSL is "idle" until some other
> > network activity takes place, or you proactively decide that you *want*
> > something to happen.
>
> Nevertheless, the SSL code will take the CPU for arbitrarily long amounts
> of time, depending upon how much work it has to do. This may or may not be a
> problem, depending upon parameters of his situation that we don't know.
> Threads may or may not be a good solution to that, again depending upon
> factors we don't know.
Totally agreed. But "arbitrarily long" is not necessarily the case -
obviously the big thing in SSL is the private key operations during
handshakes and renegotiations - and if your server employs a model where
there's a fixed number of threads/processes, you can deduce some upper
bound on how much "work" the system can be doing at any one time, and how
long certain operations will take before they return. If I weren't so
against fashionable jargon, I'd say this was classic "real-time"
programming. In a tunneling scenario it may be that the desire is to
guarantee some minimum level of system performance/load, even if that
means the tunnelled traffic will get increasingly "chunky" under high
usage.
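(Purely illustrative numbers: if each worker multiplexes, say, 64 streams
and a private key operation costs on the order of 10ms on your hardware,
then the worst-case stall any one stream sees while its neighbours all
handshake is bounded by roughly 64 x 10ms - a figure you simply cannot
write down for an unbounded thread-per-connection design.)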
> My own code uses bio pairs. We special case the connection setup phase.
> Otherwise, we basically just manage the four I/O streams and the SSL code
> does its part without any special effort. Of course, it's multithreaded, but
> then it has to run on high-end SMP machines.
The connection setup phase shouldn't need to be "special" - but of course
I don't know what interesting things you may be doing :-) If your model
requires that it be unique, and otherwise you've got I/O logic built
around the smooth functioning of the dirty-to-clean and clean-to-dirty
traffic, then it would fail (just as the connection setup would) in the
event that the peer asked for a renegotiation of the session. The run-time
logic needs to assume that data could start to bounce between peers
without the continuing arrival of new data on the clean side of either
peer. If your logic achieves that, then the "connection setup
phase" doesn't really need anything special (except, of course,
a call to start the SSL handshake in the first place).
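For what it's worth, here's roughly the shape of the pumping I mean for a
BIO pair setup - net_send() and app_deliver() are hypothetical stand-ins
for your own network and application layers, and encrypted bytes arriving
from the peer are assumed to have been BIO_write()n into network_bio before
this runs:

    void pump_once(SSL *ssl, BIO *network_bio)
    {
        char buf[4096];
        int n;

        /* dirty side out: encrypted data the SSL wants sent to the peer */
        while ((n = BIO_read(network_bio, buf, sizeof(buf))) > 0)
            net_send(buf, n);

        /* clean side out: decrypted application data, if there is any;
         * SSL_ERROR_WANT_READ here just means "wait for the next event" */
        while ((n = SSL_read(ssl, buf, sizeof(buf))) > 0)
            app_deliver(buf, n);

        /* dirty side again: the SSL_read above may have generated handshake
         * or renegotiation traffic without any new clean-side data at all */
        while ((n = BIO_read(network_bio, buf, sizeof(buf))) > 0)
            net_send(buf, n);
    }

Run that after every event (network data in, clean data in, or a handshake
or renegotiation you initiated yourself) and the setup phase stops being
special.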
Cheers,
Geoff