Hi,
I agree with Ian's comments. In all the playing around I've
done with optimizing the core socket dispatching, the
performance gains are almost always negligible once you get
into most real-world scenarios. As Chuck has pointed out to me
several times, Webware is already fast enough for its 
intended use: dynamic content. Because of servlet 
persistence, it's an order of magnitude faster than PHP for 
complex applications.

However, as the Tux docs point out, a lot of web content is
static, such as images. Even on dynamic sites most of the
output is static. Consider all the code you write to manage
the look of your sites. Once the AppServer is started, the
output from this code rarely needs to change per request,
unlike truly dynamic content such as discussion forums,
shopping carts, etc.

There's not much more optimizing left to be done in Webware
for dynamic pages, but there's plenty we can still do to
improve the serving of static files (images, etc.) and
soft-coded-yet-static servlet pages.

Here are my thoughts on these issues, in the context of the
Webware redesign I've been working on. I hope this will
help to explain some of the ideas behind its implementation.
I'd appreciate any comments.

------
Note: 
The code in the redesign has been fleshed out a fair bit 
since Monday.  The HTTPServer class is now stable and 
comparable with AsynchThreadedHTTPServer; the launcher 
script now has the functionality of Monitor.py and oneshot 
rolled into it and will launch the HTTPServer in tandem 
with the AppServer if you want; lots of little bug fixes; 
etc.
http://calrudd.com/Webware-experimental_redesign.tar.gz
------

Serving static files
=============
On high load sites it makes no sense to serve static files 
from the AppServer.  I feel we need to make it easier to 
serve only dynamic pages from the AppServer and leave
static files to the web server (Apache, IIS, Tux,
whatever).

With the current implementation of WebKit it's not 
straightforward to do this if you want to serve static 
files from a relative dir, as it uses the PATH_INFO var to 
store the servlet's path. You have to use various tricks: 
<base>, mod_rewrite, servlet methods, etc.  
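
To see why relative paths are awkward, here is a quick check
with Python's standard urllib.parse.urljoin, which applies the
same relative-reference rules a browser does. The URLs are
hypothetical: they assume WebKit is reached through a CGI
adapter at /cgi-bin/WebKit.cgi, with the servlet path carried
in PATH_INFO.

```python
from urllib.parse import urljoin

# Hypothetical page URL: the adapter lives at SCRIPT_NAME=/cgi-bin/WebKit.cgi
# and the servlet path (/Examples/Welcome) is carried in PATH_INFO.
page_url = "http://localhost/cgi-bin/WebKit.cgi/Examples/Welcome"

# A relative <img src="images/logo.gif"> on that page resolves *inside*
# the adapter's URL space, so the browser sends the image request back
# through the adapter to the AppServer, not to a static directory.
image_url = urljoin(page_url, "images/logo.gif")
print(image_url)
# http://localhost/cgi-bin/WebKit.cgi/Examples/images/logo.gif
```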

The need for these tricks can easily be removed by allowing
the AppServer to use SCRIPT_NAME for the servlet's path, in
addition to the current approach. That's the idea behind
the reserved Application ID 'ROOT_APP' in the redesign.
See the file '.webware_config_annotated' in the tarball for 
more information.
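
A rough sketch of the two ways of finding the servlet's path
follows. The helper servlet_path() and its use_script_name
flag are hypothetical, not the redesign's ROOT_APP code, and
the SCRIPT_NAME branch assumes the web server is configured
so that SCRIPT_NAME holds the full servlet URL.

```python
from urllib.parse import urljoin


def servlet_path(environ, use_script_name=False):
    """Pick the servlet's path out of a CGI-style environment.

    Hypothetical helper, loosely modelled on the ROOT_APP idea:
    - use_script_name=False: the adapter's own URL is in SCRIPT_NAME and
      the servlet path rides along in PATH_INFO (current WebKit style).
    - use_script_name=True: assumes the web server maps servlet URLs
      directly, so SCRIPT_NAME already is the servlet's path.
    """
    if use_script_name:
        return environ.get("SCRIPT_NAME", "") or "/"
    return environ.get("PATH_INFO", "") or "/"


# Current style: adapter prefix in front of every servlet URL.
adapter_env = {"SCRIPT_NAME": "/cgi-bin/WebKit.cgi",
               "PATH_INFO": "/Examples/Welcome"}
# ROOT_APP style (assumed): the servlet URL has no adapter prefix.
root_env = {"SCRIPT_NAME": "/Examples/Welcome", "PATH_INFO": ""}

print(servlet_path(adapter_env))                     # /Examples/Welcome
print(servlet_path(root_env, use_script_name=True))  # /Examples/Welcome

# With no adapter prefix in the page URL, a relative image link resolves
# to a plain path that the web server can serve straight from disk.
print(urljoin("http://localhost/Examples/Welcome", "images/logo.gif"))
# http://localhost/Examples/images/logo.gif
```

Either way the servlet sees the same path; what changes is the
URL the browser uses as its base for relative links.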

Output caching
===========
I feel that Webware needs an output caching framework to 
deal with soft-coded-yet-static servlet pages and pages 
that only need updating periodically.  

MultiPortServer is the core of the Webware redesign. It 
listens to a set of network ports and dispatches incoming 
requests to a 'service' that has been bound to the port the
request is received on. The AppServer, HTTPServer, and
AdminServer services are the core services in the redesign. 
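
A toy version of that port-to-service dispatch, written with
Python's standard selectors module rather than asyncore or the
redesign's own code, might look like this. The class name
MultiPortDispatcher, its bind()/serve_once() methods, and the
convention that a service is a callable taking a connected
socket are all assumptions for illustration.

```python
import selectors
import socket


class MultiPortDispatcher:
    """Toy dispatcher: one listening socket per port, one service per port.

    A sketch of the idea only, not the redesign's MultiPortServer.
    Services run synchronously here; a real server would multiplex them.
    """

    def __init__(self):
        self._selector = selectors.DefaultSelector()

    def bind(self, port, service, host="127.0.0.1"):
        listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        listener.bind((host, port))
        listener.listen(50)
        listener.setblocking(False)
        # Remember which service belongs to this listening socket.
        self._selector.register(listener, selectors.EVENT_READ, service)
        return listener.getsockname()[1]  # actual port (useful if port=0)

    def serve_once(self, timeout=1.0):
        """Accept and dispatch every connection that is ready right now."""
        for key, _events in self._selector.select(timeout):
            conn, _addr = key.fileobj.accept()
            conn.setblocking(True)  # the services below use blocking I/O
            try:
                key.data(conn)  # dispatch to the service bound to this port
            finally:
                conn.close()

    def close(self):
        for key in list(self._selector.get_map().values()):
            self._selector.unregister(key.fileobj)
            key.fileobj.close()
        self._selector.close()


def hello_service(conn):
    conn.recv(1024)  # read (and ignore) the request
    body = b"Hello, world!"
    conn.sendall(b"HTTP/1.0 200 OK\r\nContent-Type: text/plain\r\n"
                 b"Content-Length: %d\r\n\r\n" % len(body) + body)
```

Binding a second service to another port only needs another
bind() call; the selector's per-socket data field is what routes
each accepted connection to the right service.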

I've just reimplemented the MultiPortServer class and it's 
now roughly as fast as Medusa (asyncore.dispatcher, etc.)
for simple I/O-bound requests.  A simple hello_world
service responds at 2200 req/sec.  A service that sends a 
13K memory-mapped file to the browser responds at 1358 
req/sec. A service that sends an 8K file from disk responds 
at 1094 req/sec.  For all examples concurrency is 50 and n 
is 1000, on an AMD 650, 256MB, Linux 2.2.18, running X.  
For comparison's sake, Apache serves hello_world.html at 1300
req/sec on this box.
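
The memory-mapped file case can be sketched as a service that
maps the file once at startup and hands the mapping straight to
socket.sendall(). The make_mmap_service() name and the service
signature (a callable taking a connected socket) are
hypothetical; this is not the code behind the numbers above.

```python
import mmap


def make_mmap_service(path, content_type="application/octet-stream"):
    """Return a service that sends one file, memory-mapped once up front.

    Hypothetical sketch: the file is mapped when the service is created and
    every request reuses that mapping, so there is no per-request
    open()/read(). Note that mmap cannot map an empty file (ValueError).
    """
    with open(path, "rb") as f:
        # The mapping stays valid after the file object is closed.
        mapped = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    header = (b"HTTP/1.0 200 OK\r\nContent-Type: %s\r\n"
              b"Content-Length: %d\r\n\r\n"
              % (content_type.encode("ascii"), len(mapped)))

    def service(conn):
        conn.recv(1024)       # read (and ignore) the request
        conn.sendall(header)
        conn.sendall(mapped)  # mmap objects support the buffer protocol

    return service
```

Mapping once trades a little memory for skipping the open/read
on every request, which is the difference the 13K mmap and 8K
disk figures hint at.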

These figures are irrelevant in the context of dynamic 
content: they're fast because they don't do any of the 
complex processing that Apache, the AppServer, and
mod_webkit must do for every request. However, in the
context of output caching, they're highly relevant.

They show that we can implement a very efficient output 
caching framework, without much effort.  I did a trial run 
of in-memory output caching by modifying the HTTPServer 
class in the redesign.  The same 13K file was served up at 
943 req/sec.  

When using the built-in HTTPServer, all we need to do is tell
the AppServer which servlets to cache and then store the 
output in a tempfile or in memory, using the HTTP request 
line (e.g. "GET /myServlet.py") as the key.  For each 
incoming socket connection we simply read the request line 
and return a cache item if one exists for that key.  If no 
cache item exists, we process the request as usual.  A 
simple "_cacheOutput" attribute in the Servlet class could 
be used to tell the AppServer which servlets to cache.  
"_cacheOptions" could be used to specify mem/file, 
duration, refresh interval, etc.  
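
Here is a minimal sketch of that flow, assuming an in-memory
store only, a "duration" option in seconds, and a respond()
method on the servlet. OutputCache, serve() and respond() are
hypothetical names; _cacheOutput and _cacheOptions follow the
proposal above. File storage and refresh intervals are left
out.

```python
import time


class OutputCache:
    """In-memory output cache keyed on the HTTP request line (a sketch)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._items = {}  # request line -> (expiry time, response bytes)

    def get(self, key):
        item = self._items.get(key)
        if item is None:
            return None
        expires, output = item
        if self._clock() >= expires:
            del self._items[key]  # stale: drop it and report a miss
            return None
        return output

    def put(self, key, output, duration):
        self._items[key] = (self._clock() + duration, output)


class Servlet:
    # Hypothetical attributes, as proposed in the text; caching is off
    # unless a servlet subclass turns it on.
    _cacheOutput = False
    _cacheOptions = {"store": "mem", "duration": 60}

    def respond(self, request_line):
        raise NotImplementedError


def serve(request_line, servlet, cache):
    """Return cached output if present, else render (and maybe cache) it."""
    output = cache.get(request_line)
    if output is not None:
        return output  # cache hit: the servlet is never called
    output = servlet.respond(request_line)
    if servlet._cacheOutput:
        cache.put(request_line, output, servlet._cacheOptions["duration"])
    return output
```

The clock is injectable so expiry can be exercised without
waiting; in a server the default monotonic clock would be used.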

The same caching method can be used to cache dynamic
content as well! If a servlet uses GET vars in the query
string but the output is constant for each GET var
configuration, you can treat each configuration as a
separate cache item, as the HTTP request line includes the
query string.  "GET /myServlet?id=1234&color=black" would 
be one item and "GET /myServlet?id=4321&color=white" would 
be another.
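
Deriving the key from the raw request might look like the
helper below. It is a sketch: request_key() is a hypothetical
name, and it assumes that only GET requests are cacheable and
that the protocol version should not be part of the key.

```python
def request_key(raw_request):
    """Build a cache key from the first line of a raw HTTP request.

    Hypothetical helper: keeps the method and request target (which
    includes any query string) and drops the protocol version.
    Returns None for anything other than GET, i.e. "don't cache".
    """
    first_line = raw_request.split(b"\r\n", 1)[0].decode("latin-1")
    parts = first_line.split()
    if len(parts) < 2:
        raise ValueError("malformed request line: %r" % first_line)
    method, target = parts[0], parts[1]
    if method != "GET":
        return None
    return method + " " + target
```

One consequence of keying on the literal request line is that
"?color=black&id=1234" and "?id=1234&color=black" become
separate cache items even though they describe the same
configuration. Normalizing the query string (for example,
sorting its parameters) before building the key would avoid the
duplicate, at a small per-request cost.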

This would also work when using another web server and an
adaptor like mod_webkit, although, with the extra request
processing that goes on, the speed increases wouldn't be as
dramatic. We could build a similar output caching
framework into mod_webkit or a Tux adaptor. The potential
time and CPU savings might be well worth the effort.

What do you think?

Cheers,
Tavis

_______________________________________________
Webware-devel mailing list
https://lists.sourceforge.net/lists/listinfo/webware-devel