Hi,

yesterday I spent a few hours looking at the peruser source code, trying
to get a basic understanding of it. The idea behind peruser is great
and I'm still wondering why it (or something similar) is not the default
MPM for Apache. Obviously the task isn't easy, otherwise perchild
wouldn't have died and peruser would be backed by many more
developers. Given that, many thanks to Sean Gabriel for taking the module
as far as it is. Now that I've read the source code I have at least a
vague idea of how much work this must have been. Good work.

I also looked at the SSL problem which seems to be the most pressing
problem after a workaround for the IP logging is in place (I cannot
reproduce this BTW). With the Apache log level set to debug and full
peruser debugging turned on the problem is easy to spot: The request is
processed twice by mod_ssl. The first time it is decoded in the
multiplexer. Then the request is passed on to a worker/processor. As
soon as the worker accepts the request mod_ssl jumps in again and tries
to decode the already decoded request. Obviously this doesn't work.
mod_ssl outputs an error message saying that plain HTTP was used on an
SSL-enabled port and exits. This error message is never displayed by the
web browser because it is still in SSL mode as set up by the multiplexer.

I recognized three approaches to fix the problem:

1) We could do all SSL decoding in the multiplexer and then pass on the
decoded request to the worker. This obviously is a potential performance
bottleneck but that could be solved later on using multiple
multiplexers. SSL processing has to be disabled in the processors for
this approach to work. Luckily mod_ssl offers an optional function
(ssl_engine_disable) which can be acquired and used at runtime by
peruser. I tried this approach but was not able to get it working.
Although the processing of the document seems to work, it fails to encode
the data which is returned to the client. I had to learn that the
multiplexer only accepts the data but is not involved in actually
returning the data to the client. Still, this approach looks somewhat
promising, and if we were able to enable only the SSL output filter on
the processor we might be able to get this working. But yes, this would
be a bit messy and require changes to mod_ssl.

2) We could disable SSL processing on the multiplexer.  I didn't expect
this approach to work at all but with the code from the first approach
in place this basically was a one-character change. My expectation was
that the request wouldn't be passed on at all since pass_request() works
on an already parsed request, which obviously isn't possible when the data
is still encoded. To my big surprise the data was passed to a processor
and mod_ssl started to process it. The problem now is that the SSL/TLS
protocol is an interactive one. Unlike HTTP, where a single request
is sent and then a single answer is sent back, the TLS handshake involves
several round trips. But the multiplexer simply reads the data from the
client, passes it to the processor and closes the connection. So there
is no way for the SSL module in the processor to start the two-way
protocol with the client. Again, pretty close but still not there. We
could obviously try to hack the multiplexer to relay all communication
between the client and the processor, but this again would be a bottleneck
and might open the way for a simple DoS attack.
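
To make the relay idea from approach 2 concrete, here is a minimal
sketch in plain POSIX C. It is not peruser code: relay_once() and its
parameters are made-up names, and it assumes the multiplexer holds one
descriptor for the client and one for the processor. Every byte of the
TLS exchange, in both directions, has to pass through this loop, which
is why the multiplexer would become the bottleneck.

```c
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: forward whatever is currently readable on
 * either descriptor to the other one.
 * Returns the number of bytes forwarded, 0 on timeout, and -1 on
 * error or when one side has closed the connection. */
static long relay_once(int client, int processor, int timeout_ms)
{
    struct pollfd p[2] = {
        { .fd = client,    .events = POLLIN },
        { .fd = processor, .events = POLLIN },
    };
    char buf[4096];
    long total = 0;
    int ready, i;

    assert(client >= 0 && processor >= 0);

    ready = poll(p, 2, timeout_ms);
    if (ready <= 0)
        return ready;               /* 0 = timeout, -1 = poll error */

    for (i = 0; i < 2; i++) {
        ssize_t n, off = 0;
        int dst = p[1 - i].fd;      /* the opposite side */

        if (!(p[i].revents & (POLLIN | POLLHUP)))
            continue;

        n = recv(p[i].fd, buf, sizeof(buf), 0);
        if (n <= 0)
            return -1;              /* peer closed or read failed */

        /* send() may write less than requested, so loop. */
        while (off < n) {
            ssize_t w = send(dst, buf + off, (size_t)(n - off), 0);
            if (w < 0)
                return -1;
            off += w;
        }
        total += n;
    }
    return total;
}
```

A real relay would call this in a loop for every open SSL connection
until one side closes, so a client that opens many connections and
keeps them idle ties up the multiplexer. That is the DoS concern
mentioned above.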

3) The most promising approach I see would be to extend peruser to pass
the input socket to the processor without touching it. Since SSL doesn't
allow name-based virtual hosting, the decision which vhost should handle
the request is easy to make: the IP address and the port number are the
only values involved and both can be taken from the socket itself. There
is no requirement to read any data. We only need to read the headers in
the non-SSL case, and then only if name-based virtual hosting is
activated for the given IP/port. Besides allowing us to support SSL, this
will also reduce the workload of the multiplexer for normal http servers
and thus make peruser more scalable. I did a very crude hack and indeed
it works ;) Currently it's a hack, nothing more. There is a hard-coded
check for port 443 and it always selects the first available processor,
regardless of which server environment has been selected for a given
vhost. It might also have some memory leaks; I don't know the APR
framework in detail, so this might need some work, too.
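
For illustration, the two building blocks of approach 3 can be
sketched in plain POSIX C. This is not the hack itself and not peruser
code: local_port_of(), send_fd() and recv_fd() are made-up names, and
the sketch assumes the multiplexer and the processor are connected by
a Unix domain socket. The port is taken from the socket with
getsockname(), so nothing is read from the connection, and the
untouched descriptor is handed over with an SCM_RIGHTS control message.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Hypothetical helper: local port of a TCP socket, taken from the
 * socket itself. No data is read. Returns -1 on error. */
static int local_port_of(int fd)
{
    struct sockaddr_storage ss;
    socklen_t len = sizeof(ss);

    if (getsockname(fd, (struct sockaddr *)&ss, &len) < 0)
        return -1;
    if (ss.ss_family == AF_INET)
        return ntohs(((struct sockaddr_in *)&ss)->sin_port);
    if (ss.ss_family == AF_INET6)
        return ntohs(((struct sockaddr_in6 *)&ss)->sin6_port);
    return -1;
}

/* Control buffer large enough for exactly one descriptor. */
union fd_ctl {
    struct cmsghdr hdr;             /* forces correct alignment */
    char buf[CMSG_SPACE(sizeof(int))];
};

/* Hypothetical helper: pass an open descriptor over the Unix domain
 * socket 'chan'. Returns 0 on success, -1 on error. */
static int send_fd(int chan, int fd)
{
    char dummy = 'F';               /* at least one data byte is needed */
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union fd_ctl ctl;
    struct msghdr msg;
    struct cmsghdr *cm;

    assert(fd >= 0);
    memset(&msg, 0, sizeof(msg));
    memset(&ctl, 0, sizeof(ctl));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    cm = CMSG_FIRSTHDR(&msg);
    cm->cmsg_level = SOL_SOCKET;
    cm->cmsg_type = SCM_RIGHTS;
    cm->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cm), &fd, sizeof(int));

    return sendmsg(chan, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive a descriptor sent by send_fd().
 * Returns the new descriptor, or -1 on error. */
static int recv_fd(int chan)
{
    char dummy;
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union fd_ctl ctl;
    struct msghdr msg;
    struct cmsghdr *cm;
    int fd;

    memset(&msg, 0, sizeof(msg));
    memset(&ctl, 0, sizeof(ctl));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    if (recvmsg(chan, &msg, 0) != 1)
        return -1;
    cm = CMSG_FIRSTHDR(&msg);
    if (cm == NULL || cm->cmsg_level != SOL_SOCKET ||
        cm->cmsg_type != SCM_RIGHTS)
        return -1;
    memcpy(&fd, CMSG_DATA(cm), sizeof(int));
    return fd;
}
```

The receiving side can call local_port_of() on the descriptor it gets
and will see the same port, since both descriptors refer to the same
socket. A proper implementation would look up the processor for the
IP/port pair in the vhost configuration instead of checking for 443.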

I'll prepare a clean patch and send it around later on.

Stefan


_______________________________________________
Peruser mailing list
[email protected]
http://www.telana.com/mailman/listinfo/peruser
