On Thu, 30 Sep 2010, Giovanni Tirloni wrote:
Hello,

Recently, during an electrical maintenance, we faced a problem with some servers that had redundant PSUs. After the power was shut down on the circuit that serves the first PSU, the second PSU failed to keep some servers up and they rebooted (they came back normal and stayed stable after that). Tonight the same procedure was done on the other power circuit, and the second PSU failed too (on a smaller number of machines). These are all enterprise-level servers for which the vendors will promptly replace failed PSUs, but these PSUs were working fine as far as we can tell. Has anyone had this problem too? I'm looking for some advice regarding proactive PSU replacements. Is it a common practice? We do replace disks as proactively as we can by monitoring several performance metrics, but for PSUs I'm at a loss here.
PSUs can die over time, and there can be (and have been) flaws in design or components that cause similar devices to start failing at around the same time.
If I start having them fail on several servers, I accelerate efforts to replace that generation of servers.
The most common thing to fail in a PSU is the capacitors. If the vendor had a bad batch of caps that made it into the power supplies, I would be reluctant to just replace the PSUs and put the systems back into mission-critical use: if those caps are bad, how can I really trust the other caps in the system?
I am in the process of doing this with a batch of systems purchased 5-6 years ago.
David Lang
_______________________________________________ Tech mailing list Tech@lopsa.org http://lopsa.org/cgi-bin/mailman/listinfo/tech This list provided by the League of Professional System Administrators http://lopsa.org/