[snip]
> The trick is understanding how to model the risk to your systems so you
> can make intelligent decisions about where to spend your availability
> dollar. Understanding and controlling your system architecture and
> operation are key, as is having good historical data. A little fault
> tree analysis (FTA) and/or event tree analysis (ETA) can point out
> subtle interactions between systems, often problems that can be resolved
> with simple changes in configuration or operation. Obviously you can
> spend as much or as little effort on this as you want; at some point you
> need to decide what's good enough and what risks are simply not worth
> defending against.
>
>> There are plenty of setups with only *one* spamd server that fail
>> infrequently enough to make it impractical to deploy multiple machines.
>
> Failure probability is only half the story[1]; the other half is failure
> consequences. Balancing cost, risk, and benefit is a black art.
>
> -- Bob

Bob,

This was one of the most fascinating and relevant simplifications of
managing reliability on a system that I have read. It gave me a lot to
think about. Do you mind if I reuse the information in the "Maximum
Uptime" document I am editing?

-- 
Luke
Computer Science System Administrator
Security Administrator, College of Engineering
Montana State University-Bozeman, Montana
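
As a rough illustration of the quoted points about fault tree analysis and about weighing failure probability against failure consequences, here is a minimal Python sketch. It is not taken from the thread: the gate functions, the probabilities, and the cost figure are all hypothetical, and it assumes component failures are independent, which real systems often violate (shared power, shared network, shared configuration).

```python
# Minimal fault-tree sketch. All names and numbers are hypothetical
# and chosen only for illustration; failures are assumed independent.

def or_gate(*probs):
    """Probability that at least one independent input event occurs."""
    none_fail = 1.0
    for p in probs:
        none_fail *= (1.0 - p)
    return 1.0 - none_fail


def and_gate(*probs):
    """Probability that all independent input events occur together."""
    all_fail = 1.0
    for p in probs:
        all_fail *= p
    return all_fail


def expected_loss(p_failure, cost_of_failure):
    """Risk as probability times consequence, in the units of the cost."""
    return p_failure * cost_of_failure


# Hypothetical yearly failure probabilities for one spamd host.
P_HARDWARE = 0.02
P_DAEMON = 0.03

# One server is down if its hardware OR its daemon fails.
p_single = or_gate(P_HARDWARE, P_DAEMON)

# A redundant pair is down only if server A AND server B are both down.
p_pair = and_gate(p_single, p_single)

# Hypothetical cost of an outage; compare the risk of each design.
OUTAGE_COST = 10_000.0
print("single server risk:", expected_loss(p_single, OUTAGE_COST))
print("redundant pair risk:", expected_loss(p_pair, OUTAGE_COST))
```

The difference between the two printed figures is what a second machine buys under these assumptions; whether it is worth the hardware and administration cost is the judgment call the quoted message describes. A common-cause event that takes out both servers at once would enter the tree as an extra input to an OR gate above the pair, and it can easily dominate the result.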
