> > this post has gotten long enough that i won't go into the gory
> > details (those of you who bet that i *don't* know how to shut up
> > can pay up, and face the terrifying concept that this is
> > *voluntary* behavior.. ;-), but i'll be happy to provide details
> > if you want them.
>
> Hey, if you don't have any nagging tendonitis, feel free....
well, i'm thoroughly into rant mode at this point, so what the
heck.
the first piece of the puzzle is the Apache SSI system,
specifically the directive:
<!--#include virtual="filepath" -->
this is the preferred way to include another file, or the output
of an executable, in the current page. the filepath is
restricted to the local machine, but you can work around that
with the proxying system, which is the second piece.
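as a concrete (made-up) example, a page that pulls in the output
of a cgi script might look like so -- 'counter.cgi' is just a
placeholder name, not anything you'll find in a stock install:

```html
<!-- index.shtml -- must be served with SSI parsing turned on -->
<html>
<body>
<p>you are visitor number:
<!--#include virtual="/cgi-bin/counter.cgi" -->
</p>
</body>
</html>
```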
the Apache proxy system does all sorts of stuff, but one of its
directives is called ProxyPass, which takes requests for files
in a given directory, and passes them to another server for
handling. the upshot is that you can proxy your entire cgi-bin
directory off to another machine, but as far as the SSI system
is concerned, the scripts are still local.
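in httpd.conf terms, that's basically one directive -- the
back-end hostname here is invented, and ProxyPassReverse is the
usual companion line so redirects coming back from the back end
get rewritten to point at the front end:

```apache
# hand every /cgi-bin/ request off to the back-end machine.
# 'remote.foo.com' is a placeholder -- use your own back end.
ProxyPass        /cgi-bin/ http://remote.foo.com/cgi-bin/
ProxyPassReverse /cgi-bin/ http://remote.foo.com/cgi-bin/
```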
when a user requests a page, the SSI system will see the virtual
include directive, and do an internal request for another server
thread to run the script and return output. that thread will
see that the cgi-bin directory has been proxied. instead of
trying to acquire and run the script itself, it merely opens a
connection to the remote server and lets that do the processing.
output from the remote server comes back through the proxy
thread, then gets passed back to the original server thread, and
sent to the user:
               front-end server
    ----------------------------------
    [ main thread ]   [ proxy thread ]   [ remote server ]
           |                 |                  |
    page request             |                  |
    ------>|                 |                  |
           |   SSI request   |                  |
           |---------------->|  proxy request   |
           |                 |----------------->|
           |                 |                  |--- processing
           |                 |      output      |<--
           |                 |<-----------------|
           |     output      |                  |
           |<----------------|                  |
     output|                 |                  |
    <------|                 |                  |
           |                 |                  |
granted, the main thread does have to wait while the other
threads are doing their business, but most of that is idle time
the server can use to handle other requests. most of the lag
time in the transaction will belong to the remote server which
does the actual processing, which brings us to the third piece
of the puzzle.
the way to eliminate bottlenecks at the remote server is to
increase the effective processing speed of that server. the
nice thing about web requests is that they tend to be more or
less independent of each other. that makes them good
candidates for parallel processing, which means you can get
better performance by throwing hardware at the problem.
the biggest trick is finding a way to have the front-end server
query a whole group of remote servers to find the one that has
the most free time at the moment. if you really want to, you
can buy a $12K load-balancing server from Cisco, which will make
sure every back-end server sees almost exactly as much traffic
as any other. OTOH, you can just toss a couple extra lines in
your DNS files, and the network will take care of things on its
own.
what most people don't know is that it's perfectly legal to give
more than one machine the same name in DNS.  if my zone file
looks like so:
remote.foo.com. IN A 10.0.0.100
remote.foo.com. IN A 10.0.0.102
remote.foo.com. IN A 10.0.0.103
any request for 'remote.foo.com' will be sent equally to all
three machines. the client making the request will set up a
connection with the one that answers first, and ignore the
others. just by the simple voodoo of process scheduling and
network topology, the load will more or less balance out across
all three machines.
the upside is that the busier a machine happens to be, the less
likely it is to be the first one that answers. therefore, the
network tends to balance out its processing load across the
machines in good Marxist fashion.. from each according to its
abilities, to each according to its needs.
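if you want to see that effect in action, here's a toy simulation
(python) -- the load numbers and the uniform jitter are pure
invention, just enough to show the 'first to answer' mechanic,
and have nothing to do with how real dns resolvers work:

```python
import random

def first_to_answer(loads, trials=10000):
    """toy model: each back-end's response delay is its current
    load plus random network jitter; the client takes whichever
    server answers first.  returns how often each server 'won'."""
    wins = [0] * len(loads)
    for _ in range(trials):
        delays = [load + random.random() for load in loads]
        wins[delays.index(min(delays))] += 1
    return wins

# three servers: lightly, moderately, and heavily loaded --
# the lightly loaded one wins the large majority of the time
print(first_to_answer([0.1, 0.5, 0.9]))
```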
the same duplication trick also works for the front-end
servers.  webpage requests are by definition
independent of each other, so you can have multiple front-end
machines proxying requests off to the same back-end server.
the minimum configuration for a fairly robust system is to have
two identical machines at the front passing requests to two
identical machines at the back. you can put a .45 slug through
the CPU of any single machine, and the system as a whole will
continue to operate. by isolating processing to a specific
group of machines, you can build parallelized subnodes of a
larger cluster. once you have those, you can tune the
performance of the subnodes to meet the demands of the cluster
as a whole.
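to put the whole thing together, the two-by-two setup above needs
nothing but a few zone records and one proxy line on each front
end -- all the names and addresses here are invented:

```
; zone file: two front ends answer for 'www',
; two back ends answer for 'remote'
www.foo.com.     IN A  10.0.0.10
www.foo.com.     IN A  10.0.0.11
remote.foo.com.  IN A  10.0.0.100
remote.foo.com.  IN A  10.0.0.102

# httpd.conf on each front-end machine:
# ProxyPass /cgi-bin/ http://remote.foo.com/cgi-bin/
```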
as you get into other protocols or specialized daemons, you
introduce more ways for machines to share information, and
increase the performance or feature set of the cluster. you
can do quite a lot in a webserver farm with just the three
pieces i've already mentioned, though. and the good news is
that you can do it with a stock Apache installation and ordinary
DNS.
mike stone <[EMAIL PROTECTED]> 'net geek..
been there, done that, have network, will travel.