On Thu, 30 Sep 2010, Giovanni Tirloni wrote:

> Hello,
>
> Recently, during electrical maintenance, we ran into a problem with some
> servers that had redundant PSUs. After power was shut down on the
> circuit that feeds the first PSU, the second PSU failed to keep some
> servers up and they rebooted (they came back normally and stayed stable
> after that). Tonight the same procedure was done on the other power circuit,
> and the remaining PSU failed too (on a smaller number of machines). These are
> all enterprise-level servers for which vendors will promptly replace failed
> PSUs, but these PSUs were working fine as far as we can tell. Has anyone else
> had this problem?
>
> I'm looking for some advice regarding proactive PSU replacement. Is it
> common practice? We replace disks as proactively as we can by monitoring
> several performance metrics, but for PSUs I'm at a loss.

PSUs can die over time, and there can be (and have been) flaws in design or components that cause similar devices to start failing at around the same time.

If I start having them fail on several servers, I accelerate efforts to replace that generation of servers.

The components most likely to fail in a PSU are the capacitors. If the vendor had a bad batch of caps that made it into the power supplies, I would be reluctant to just replace the PSUs and put the systems back into mission-critical use: if those caps are bad, how can I really trust the other caps in the system?

I am in the process of doing this with a batch of systems purchased 5-6 years ago.

David Lang
_______________________________________________
Tech mailing list
Tech@lopsa.org
http://lopsa.org/cgi-bin/mailman/listinfo/tech
This list provided by the League of Professional System Administrators
 http://lopsa.org/