On 9/30/2010 10:04 AM, Giovanni Tirloni wrote:
Hello,

Recently, during an electrical maintenance, we faced a problem with some servers that had redundant PSUs. After the power was shut down on the circuit that serves the first PSU, the second PSU failed to keep some servers up and they rebooted (they came back up normally and stayed stable after that). Tonight the same procedure was done on the other power circuit and the remaining PSU failed too (on a smaller number of machines). These are all enterprise-level servers whose vendors will promptly replace failed PSUs, but these PSUs were working fine as far as we can tell. Has anyone else had this problem?

I'm looking for some advice regarding proactive PSU replacements. Is it a common practice? We replace disks as proactively as we can by monitoring several performance metrics, but for PSUs I'm at a loss.

Thank you,

--
Giovanni Tirloni
gtirl...@sysdroid.com <mailto:gtirl...@sysdroid.com>

Unfortunately, a PSU is most likely to fail under the stress of a sudden change in state. The usual case is a machine that has been running for a few years: you lose power, it comes back on, the supply has 50A of inrush current for 0.1 seconds, then *pow*. But as you found, it can also happen during failover, when the surviving PSU suddenly has to carry the full load.

Since it hit you on so many machines, you could ask the vendor for failure statistics and whether any particular lot or revision numbers of these PSUs are known to have this issue. You can do some of this yourself by looking at the label on the replaceable unit: alongside the model number there is often a revision such as A, B, or C. Check whether the units that failed have anything in common, then take that to the vendor. If you don't get anywhere, you could try demanding replacement of every unit with the same model number and revision.
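
If you have an inventory export handy, a quick tally is one way to look for that commonality. What follows is only a rough sketch: the serial numbers, model names, and revisions are made up for illustration, and `failure_rate_by_rev` is a hypothetical helper, not anything a vendor provides. It assumes you can list every PSU as (serial, model, rev) and that you know which serials were in the machines that rebooted.

```python
from collections import Counter

def failure_rate_by_rev(inventory, failed_serials):
    """Return {(model, rev): (failed, total, rate)} for every model/rev seen.

    inventory: iterable of (serial, model, rev) tuples, one per PSU.
    failed_serials: serials of the PSUs that failed to hold the load.
    """
    failed = set(failed_serials)
    totals = Counter()
    failures = Counter()
    for serial, model, rev in inventory:
        key = (model, rev)
        totals[key] += 1
        if serial in failed:
            failures[key] += 1
    return {
        key: (failures[key], totals[key], failures[key] / totals[key])
        for key in totals
    }

# Made-up example data, not real part numbers.
inventory = [
    ("SN001", "PSU-750", "A"),
    ("SN002", "PSU-750", "A"),
    ("SN003", "PSU-750", "B"),
    ("SN004", "PSU-750", "B"),
    ("SN005", "PSU-1100", "A"),
]
failed_serials = ["SN003", "SN004"]

stats = failure_rate_by_rev(inventory, failed_serials)
# List the worst model/rev combinations first.
for (model, rev), (bad, total, rate) in sorted(
    stats.items(), key=lambda item: item[1][2], reverse=True
):
    print(f"{model} rev {rev}: {bad}/{total} failed ({rate:.0%})")
```

If one model/rev stands out well above the rest, that is the number to put in front of the vendor. With only a handful of units per revision the rates will be noisy, so treat the output as a starting point for the conversation rather than proof.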

Are you still under service coverage?
_______________________________________________
Tech mailing list
Tech@lopsa.org
http://lopsa.org/cgi-bin/mailman/listinfo/tech
This list provided by the League of Professional System Administrators
 http://lopsa.org/