Hi,
I agree with Ian's comments. In all the experimenting I've done
with optimizing the core socket dispatching, the performance
gains are almost always negligible once you get into most
real-world scenarios. As Chuck has pointed out to me
several times, Webware is already fast enough for its
intended use: dynamic content. Because of servlet
persistence, it's an order of magnitude faster than PHP for
complex applications.
However, as the Tux docs point out, a lot of web content is
static, such as images. Even on dynamic sites most of the
output is static. Consider all the code you write to manage
the look of your sites. Once the AppServer is started, the
output from this code rarely needs to change per request,
unlike truly dynamic content such as discussion forums,
shopping carts, etc.
There's not much more optimizing left to be done in Webware
for dynamic pages, but there's plenty we can still do to
improve the serving of static files (images, etc.) and
soft-coded-yet-static servlet pages.
Here are my thoughts on these issues, in the context of the
Webware redesign I've been working on. I hope this will
help to explain some of the ideas behind its implementation.
I'd appreciate any comments.
------
Note:
The code in the redesign has been fleshed out a fair bit
since Monday. The HTTPServer class is now stable and
comparable with AsynchThreadedHTTPServer; the launcher
script now has the functionality of Monitor.py and oneshot
rolled into it and will launch the HTTPServer in tandem
with the AppServer if you want; lots of little bug fixes;
etc.
http://calrudd.com/Webware-experimental_redesign.tar.gz
------
Serving static files
=============
On high-load sites it makes no sense to serve static files
from the AppServer. I feel we need to make it easier to
serve only dynamic pages from the AppServer and leave
static files to the web server (Apache, IIS, Tux,
whatever).
With the current implementation of WebKit it's not
straightforward to do this if you want to serve static
files from a relative dir, as it uses the PATH_INFO var to
store the servlet's path. You have to use various tricks:
<base>, mod_rewrite, servlet methods, etc.
The need for these tricks can easily be removed by allowing
the AppServer to use SCRIPT_NAME for the servlet's path, in
addition to the current approach. That's the idea behind
the reserved Application ID 'ROOT_APP' in the redesign.
See the file '.webware_config_annotated' in the tarball for
more information.
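To make the PATH_INFO problem concrete, here is a small Python sketch. The servlet_path() and request_url() helpers and the use_script_name flag are hypothetical stand-ins for the ROOT_APP behaviour, not the redesign's actual code, and the URLs (/WK, /Examples/Welcome) are made-up examples. It shows where a relative image link in a servlet's page ends up under each scheme.

```python
from urllib.parse import urljoin


def servlet_path(environ, use_script_name=False):
    """Return the URL path that identifies the servlet for a request.

    Hypothetical helper: 'use_script_name' stands in for a ROOT_APP-style
    setting and is not the redesign's actual API.
    """
    if use_script_name:
        # The web server mapped the servlet URL directly, so SCRIPT_NAME
        # already names the servlet.
        return environ.get("SCRIPT_NAME", "")
    # Current WebKit approach: SCRIPT_NAME names the adaptor and
    # PATH_INFO carries the servlet's path.
    return environ.get("PATH_INFO", "")


def request_url(environ):
    """The URL path the browser actually requested."""
    return environ.get("SCRIPT_NAME", "") + environ.get("PATH_INFO", "")


# Current approach: adaptor mounted at /WK, servlet path in PATH_INFO.
via_adaptor = {"SCRIPT_NAME": "/WK", "PATH_INFO": "/Examples/Welcome"}
# SCRIPT_NAME approach: the servlet URL is mapped without an adaptor prefix.
direct = {"SCRIPT_NAME": "/Examples/Welcome", "PATH_INFO": ""}

# Where does a relative <img src="images/logo.gif"> resolve in each case?
via_adaptor_img = urljoin(request_url(via_adaptor), "images/logo.gif")
direct_img = urljoin(request_url(direct), "images/logo.gif")
```

Both environments yield the same servlet path, but under the adaptor prefix the relative link resolves under /WK/ and goes back through the AppServer, while with SCRIPT_NAME naming the servlet it resolves to a plain path the web server can serve on its own.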
Output caching
===========
I feel that Webware needs an output caching framework to
deal with soft-coded-yet-static servlet pages and pages
that only need updating periodically.
MultiPortServer is the core of the Webware redesign. It
listens to a set of network ports and dispatches incoming
requests to a 'service' that has been bound to the port the
request is received on. The AppServer, HTTPServer, and
AdminServer services are the core services in the redesign.
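The port-to-service dispatch can be sketched with Python's standard selectors module. This is only an illustration of the idea, assuming a "service" is any callable that takes a connected socket; the class and method names are hypothetical, and unlike a real server it handles one connection at a time.

```python
import selectors
import socket


class MultiPortServer:
    """Sketch: one listening socket per port, each bound to a service.

    The service interface (a callable taking a connected socket) is an
    assumption, not the redesign's actual API.
    """

    def __init__(self):
        self._selector = selectors.DefaultSelector()

    def bind(self, port, service, host="127.0.0.1"):
        listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        listener.bind((host, port))
        listener.listen(50)
        listener.setblocking(False)
        # Store the service as the selector key's data for this listener.
        self._selector.register(listener, selectors.EVENT_READ, service)
        return listener.getsockname()[1]  # actual port (useful with port=0)

    def poll_once(self, timeout=None):
        """Accept ready connections and hand each to its port's service."""
        for key, _events in self._selector.select(timeout):
            conn, _addr = key.fileobj.accept()
            conn.setblocking(True)
            try:
                key.data(conn)  # dispatch to the service bound to this port
            finally:
                conn.close()

    def close(self):
        for key in list(self._selector.get_map().values()):
            self._selector.unregister(key.fileobj)
            key.fileobj.close()
        self._selector.close()


def hello_service(conn):
    # Hypothetical trivial service: read the request, send a fixed reply.
    conn.recv(1024)
    conn.sendall(b"HTTP/1.0 200 OK\r\nContent-Type: text/plain\r\n\r\nhello world")
```

Binding, say, an HTTP-style service and an admin service to two different ports means the port a connection arrives on is all the server needs to know to pick the handler.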
I've just reimplemented the MultiPortServer class and it's
now roughly as fast as Medusa (asyncore.dispatcher, etc.)
for simple I/O-bound requests. A simple hello_world
service responds at 2200 req/sec. A service that sends a
13K memory-mapped file to the browser responds at 1358
req/sec. A service that sends an 8K file from disk responds
at 1094 req/sec. For all examples the concurrency is 50 and
n (the number of requests) is 1000, on an AMD 650, 256MB,
Linux 2.2.18, running X.
For comparison's sake, Apache serves hello_world.html at 1300
req/sec on this box.
These figures are irrelevant in the context of dynamic
content: they're fast because they don't do any of the
complex processing that Apache, the AppServer and
mod_webkit must do for every request. However, in the context
of output caching, they're highly relevant.
They show that we can implement a very efficient output
caching framework, without much effort. I did a trial run
of in-memory output caching by modifying the HTTPServer
class in the redesign. The same 13K file was served up at
943 req/sec.
When using the built-in HTTPServer all we need to do is tell
the AppServer which servlets to cache and then store the
output in a tempfile or in memory, using the HTTP request
line (e.g. "GET /myServlet.py") as the key. For each
incoming socket connection we simply read the request line
and return a cache item if one exists for that key. If no
cache item exists, we process the request as usual. A
simple "_cacheOutput" attribute in the Servlet class could
be used to tell the AppServer which servlets to cache.
"_cacheOptions" could be used to specify mem/file,
duration, refresh interval, etc.
The same caching method can be used to cache dynamic
content as well! If a servlet uses GET vars in the query
string, but the output is constant for each GET var
configuration, you can treat each configuration as a
separate cache item, as the HTTP request line includes the
query string. "GET /myServlet?id=1234&color=black" would
be one item and "GET /myServlet?id=4321&color=white" would
be another.
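Reading the key off an incoming request could be as simple as the sketch below; read_request_line() is a hypothetical helper, and a real server would read from the socket incrementally and guard against oversized or malformed input. Because the query string is part of the request line, the two example URLs above naturally become two distinct keys.

```python
def read_request_line(raw_request):
    """Return the first line of a raw HTTP request, for use as a cache key."""
    line, _sep, _rest = raw_request.partition(b"\r\n")
    return line.decode("latin-1")


black = read_request_line(
    b"GET /myServlet?id=1234&color=black HTTP/1.0\r\nHost: example\r\n\r\n")
white = read_request_line(
    b"GET /myServlet?id=4321&color=white HTTP/1.0\r\nHost: example\r\n\r\n")

cache = {black: b"<html>...black page...</html>"}
```

The key is compared literally, so "?color=black&id=1234" would be a separate entry from "?id=1234&color=black" even though the servlet would likely render the same page for both.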
This would also work when using another web server and an
adaptor like mod_webkit, although with the extra request
processing that goes on, the speed increases wouldn't be as
dramatic. We could build a similar output caching
framework into mod_webkit or a Tux adaptor. The potential
time and CPU savings might be well worth the effort.
What do you think?
Cheers,
Tavis
_______________________________________________
Webware-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/webware-devel