On Mon, 2003-06-02 at 09:16, Jon Biddell wrote:
> Hi all,
> 
> Our "marketing types" want 24/7 availability of our corporate web 
> site - a fair enough request, I guess...
> 
> However we have a number of restrictions on what we can do;
> 
> 1. Must (presently) remain with IIS - moving to a Linux/Apache 
> solution may become possible later, but it's "political"

 :}. I suppose Windows/Apache is also political?

> 3. There must be NO DISCERNABLE INTERRUPTION TO SERVICE when one 
> fails. Doing a "shift-reload" in the browser is NOT an option. It 
> must be TOTALLY TRANSPARENT.

This is (as has already been mentioned) tricky. See below for a
discussion....

> Keeping the boxes in sync is no problem.
> 
> I was thinking of a Linux box with 3 NICs - one to each server and 
> one to the 'net, but this will only work if the servers are 
> physically located on the same network.

That box becomes a single point of failure.

> The only other solution I can come up with, given the above anal 
> restrictions, is to use a "round robin" DNS setup, but this will 
> involve doing a reload if the primary server fails to pick up the 
> secondary DNS entry.

Much more than a reload: if you encounter a flapping situation with both
servers, you may actually increase the perceived downtime (in the worst
case...).

> I'm open to suggestions if anyone knows of a more elegant way of 
> doing it - hell, if anyone knows how to make it work, I'll listen 
> !!

Firstly, you haven't clearly identified to us, your free consultants,
the current greatest failure risks. That is, suppose the mean time
between failures for the various components is (using arbitrary
figures):
Firewalls: 60,000 hrs.
LAN switches: 100,000 hrs.
IIS: 48 hrs.
Windows: 200 hrs.
Linux front-end server: 30,000 hrs.

And for simplicity we'll assume that failure here is catastrophic: you
put in a cold spare in the event of a failure. It's easy to see in the
above scenario that anything that encapsulates IIS will give you huge
uptime relative to exposing the naked beast directly. With that in mind,
you can start to plan how to make it all hang together.
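To make the arithmetic concrete, here's a rough sketch using the MTBF
figures above, with a 2-hour repair time as a pure assumption (the
standard steady-state formula: availability = MTBF / (MTBF + MTTR)):

```python
# Rough availability arithmetic for the MTBF figures above.
# The 2-hour MTTR (time to swap in a cold spare) is an assumption.
def availability(mtbf, mttr):
    """Steady-state availability = MTBF / (MTBF + MTTR)."""
    return mtbf / (mtbf + mttr)

mttr = 2.0
components = {"firewall": 60000, "switch": 100000, "iis": 48, "windows": 200}
for name, mtbf in components.items():
    print(f"{name}: {availability(mtbf, mttr):.5f}")

# Two independent IIS boxes behind a front end: both must be down at once.
a = availability(48, mttr)
print(f"one IIS box:     {a:.4f}")
print(f"two in parallel: {1 - (1 - a) ** 2:.4f}")
```

Even with those grim IIS numbers, two boxes in parallel behind a
front end beat a single box by a couple of orders of magnitude - which
is why encapsulating IIS pays off so heavily.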

ASSUMING that you are only concerned about IIS failures, not about NIC,
switch, firewall or router failures, it's really quite trivial: front
IIS with squid, with a couple of hacks. The hacks are to buffer entire
objects before sending any headers to the client; that way a crashed
server results in squid retrying from the other server, not in the
client receiving an error.
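The buffering idea can be sketched roughly like this (this is NOT squid
code - `fetch_from`, the backend names, and the error handling are all
illustrative stand-ins for what the patched front end would do):

```python
# Sketch of 'buffer the whole object, then send' with failover.
# fetch_from(backend, path) stands in for a real backend HTTP fetch.
# No headers reach the client until a complete body is in hand, so a
# backend crash mid-transfer triggers a retry, not a truncated response.
def fetch_with_failover(path, backends, fetch_from):
    last_error = None
    for backend in backends:
        try:
            body = fetch_from(backend, path)   # must return the COMPLETE object
            headers = {"Content-Length": str(len(body))}
            return headers, body               # only now do headers go out
        except IOError as err:
            last_error = err                   # backend died: try the next one
    raise last_error

# Hypothetical usage: first backend crashes, second serves the object.
def demo_fetch(backend, path):
    if backend == "iis-a":
        raise IOError("connection reset mid-transfer")
    return b"hello"

headers, body = fetch_with_failover("/", ["iis-a", "iis-b"], demo_fetch)
```

The price of this approach is latency and memory on the front end
(whole objects are held before the first byte goes out), which is why
it's a hack rather than default proxy behaviour.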

If you want to protect against network failures, multihomed connectivity
*at each site* is the way to go. Unless you have a large network, many
core routers won't propagate dual-homed routes (because of the filtering
of long prefixes) - so get your ISP to dual-home you to their network at
each site.

That protects you against transient link failures at each site, and the
multiple sites allow you to fail over. You'll need a hacked DNS setup to
dynamically add and remove virtual servers as each site comes online or
suffers a failure, and that means you'll want your TTL way down. Be sure
to have the DNS servers located far away from your hosting site.
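A zone run that way might look something like the following fragment
(the names, the RFC 5737 example addresses, and the 60-second TTL are
purely illustrative - tune the TTL to taste):

```
; illustrative zone fragment for DNS-driven failover
$TTL 60                         ; low TTL so failover propagates quickly
www   60  IN  A  192.0.2.10     ; site A - pulled when site A's link drops
www   60  IN  A  198.51.100.10  ; site B - pulled when site B's link drops
```

Whatever triggers on link-down/heartbeat failure rewrites the zone to
drop the dead site's A record and bumps the serial, so new lookups only
see the surviving site within roughly one TTL.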

The above still will not get you your 'no reload' requirement. To do
that you need another hack to the front end we've introduced - you need
to convert all dynamic content to fixed-length content...

Here's why:
1) You cannot realistically force everyone to use HTTP/1.1.
2) HTTP/1.0 treats a TCP connection close as 'EOF' on dynamic content -
unless you have -only- static content, browsers WILL end up with corrupt
files from time to time.
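A tiny sketch of why fixed-length responses fix this (the function and
sample data are illustrative, but the logic mirrors what an HTTP/1.0
client can and cannot detect):

```python
# With a Content-Length header, a client can tell a complete body from a
# truncated one. With HTTP/1.0 close-delimited dynamic content, an early
# connection close is indistinguishable from a legitimate end-of-body.
def is_complete(headers, body):
    """Return True only when the body matches the declared length."""
    declared = headers.get("Content-Length")
    if declared is None:
        return True  # close-delimited: truncation is silently accepted
    return len(body) == int(declared)

full = b"<html>...</html>"                      # 16 bytes
truncated = full[:7]                            # connection died early
# With a length header the truncation is detectable; without it, the
# client happily saves the corrupt file.
```

So by forcing every response through the front end with a known length,
a mid-transfer failure becomes detectable (and retryable) instead of a
silently corrupt page.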

So, the above covers:
- unrecoverable front-end server failure mid-transmission (convert all
  responses to static length);
- back-end server failure (the front end reattempts from the failover
  server);
- simple router failures (dual-homed network links);
- site failures (multiple sites, with DNS updates triggered on link down
  or heartbeat failure - link down is better, as it gives faster
  updates);
- round-robin cache time issues (low DNS TTL).
There's more that can be done, but the above should keep you nice and
busy.

Lastly, let me add that in all the large-scale sites I've been involved
with (usually web application hosting of some sort), the business folk
do not ACTUALLY want 100% 24/7 availability - which is what all your
requirements add up to - once the cost is detailed (with reasons).
Usually, 4 nines (99.99% uptime - roughly an hour of unscheduled
downtime per year) is more than enough to keep clients paying large $$$
happy. IIRC the rule of thumb is: for each 9 you add, multiply the total
project cost by 10. And 4 nines is 'trivially' achievable from a single
site with the appropriate resources.
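The nines arithmetic is worth doing explicitly (the x10-per-nine cost
multiplier below is just the rule of thumb above, not a quote):

```python
# Allowed downtime per year for n nines of uptime, plus the
# rule-of-thumb cost multiplier (x10 for each nine added).
MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960

def downtime_minutes(nines):
    """Unscheduled downtime budget per year at n nines of uptime."""
    return MINUTES_PER_YEAR * 10.0 ** -nines

for n in (2, 3, 4, 5):
    print(f"{n} nines: {downtime_minutes(n):8.1f} min/yr, "
          f"~{10 ** (n - 2)}x baseline cost")
```

Four nines works out to about 53 minutes a year - and by the rule of
thumb, the fifth nine costs ten times as much to buy you 48 of those
minutes back.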

My suggestion for you:
A good ISP with an end-to-end redundant network (including standby
routers within each LAN and redundant switches).
A dual-homed connection to them, using two separate exchanges and/or
connection technologies - on two power grids... you may need to rent
facilities to get these two things.
Solid UPSes and diesel backup power.
Redundant internal network infrastructure (as per the ISP).
Two IIS servers, with a single squid 2.5.STABLE3 front end using the
'echo' network service to check for machine availability and load (HTTP
accept failures will be detected as well, separately). Optionally, use
3 Linux machines and Linux Virtual Server to get layer 7 resilience -
but I think you'll find the failure rate on squid is so low that it's
not an issue.
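The availability probe boils down to 'can I complete a TCP handshake on
the service port?'. A minimal sketch of that check (the hosts, ports,
and timeout are illustrative; a real monitor would run this on a timer
and feed the result to the failover logic):

```python
import socket

# Probe a backend by attempting a TCP connect to its service port.
# A refused or timed-out connection marks the backend as down.
def backend_alive(host, port, timeout=2.0):
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Hypothetical usage against the two IIS boxes:
# for host in ("iis-a.example.com", "iis-b.example.com"):
#     print(host, backend_alive(host, 80))
```

Probing the 'echo' service (port 7) rather than port 80 itself gives a
machine-level liveness signal that's independent of whether IIS is
accepting, which is why the two failure modes can be detected
separately.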

Rob
-- 
GPG key available at: <http://members.aardvark.net.au/lifeless/keys.txt>.

-- 
SLUG - Sydney Linux User's Group - http://slug.org.au/
More Info: http://lists.slug.org.au/listinfo/slug
