Re: Fault Tolerance

Lucas Albers 1 Jul 2004 22:20:22 -0000

[snip]
> The trick is understanding how to model the risk to your systems so you
> can make intelligent decisions about where to spend your availability
> dollar. Understanding and controlling your system architecture and
> operation are key, as is having good historical data. A little fault
> tree analysis (FTA) and/or event tree analysis (ETA) can point out
> subtle interactions between systems, often problems that can be resolved
> with simple changes in configuration or operation. Obviously you can
> spend as much or as little effort on this as you want; at some point you
> need to decide what's good enough and what risks are simply not worth
> defending against.
>
>> There are plenty of setups with only *one* spamd server that fail
>> infrequently enough to make it impractical to deploy multiple machines.
>
> Failure probability is only half the story[1]; the other half is failure
> consequences. Balancing cost, risk, and benefit is a black art.
>
> -- Bob
>
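The quoted point that failure probability is only half the story can be sketched with a little arithmetic. The following is a minimal illustration, not Bob's method: it treats risk as probability times consequence and models a redundant pair of servers as a fault-tree AND gate with independent failures. All function names and numbers are hypothetical, and independence is an assumption that real systems often violate (shared power, shared network, shared configuration).

```python
from math import prod

def and_gate(probs):
    """Fault-tree AND gate: every input must fail (assumes independence)."""
    return prod(probs)

def or_gate(probs):
    """Fault-tree OR gate: any single input failing is enough (assumes independence)."""
    return 1 - prod(1 - p for p in probs)

def expected_loss(p_outage, cost_of_outage):
    """Risk expressed as probability times consequence."""
    return p_outage * cost_of_outage

def spare_pays_off(p_server, cost_of_outage, cost_of_spare):
    """Compare the expected loss avoided by a second server with its cost."""
    single = expected_loss(p_server, cost_of_outage)
    pair = expected_loss(and_gate([p_server, p_server]), cost_of_outage)
    return (single - pair) > cost_of_spare

# Hypothetical figures: 5% chance of an outage per year, spare costs 1500 per year.
cheap_outage = spare_pays_off(0.05, 2_000, 1_500)      # low consequence
costly_outage = spare_pays_off(0.05, 100_000, 1_500)   # high consequence
print(cheap_outage, costly_outage)
```

With the same failure probability, the spare is hard to justify when an outage is cheap and easy to justify when an outage is expensive, which is the balance between cost, risk, and benefit described in the quote.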



Bob,
This was one of the most fascinating and relevant simplifications for
managing reliability on a system.
It gave me a lot to think about.
Do you mind if I reuse the information in the "Maximum Uptime" document I
am editing?

-- 
Luke Computer Science System Administrator
Security Administrator, College of Engineering
Montana State University-Bozeman, Montana
