On Thu, Sep 30, 2010 at 11:25 AM, Doug Hughes <d...@will.to> wrote:

> Unfortunately, the most likely time for a PSU to fail is the stress caused
> during a sudden change in state. This usually happens when the machine has
> been running for a couple/few years, you lose power, it comes back on, has
> 50A of inrush current for .1 seconds, then *pow*. But it can happen during
> failover, as you found too.
>
> You could try asking the vendor for failure stats and if there are any
> particular lot/revision numbers of PSUs that have this issue since it hit
> you a lot. You could do some of this yourself by looking at the model number
> on the replaceable unit. Many times they'll have a rev like A, B, C, etc.
> You can check to see if there's any commonality and then ask the vendor. If
> you don't get anywhere, you could try demanding replacement for all of the
> same model number/rev.
>
That's a good idea. Only 2-3% of the servers failed, so I think we'll be
able to narrow it down.

> Are you still under service coverage?
>

Yes. For the machine that failed, we're going to replace the PSUs; that
should not be a problem.

-- 
Giovanni Tirloni
gtirl...@sysdroid.com
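
A rough sketch of the commonality check discussed above: tally failures per PSU model/revision and see whether one combination stands out. The inventory rows, host names, and model/rev labels below are all invented for illustration; real data would come from whatever asset records or labels on the replaceable units are available.

```python
from collections import Counter

# Hypothetical inventory: (hostname, psu_model, psu_rev, failed).
# Every value here is made-up sample data, not from the thread.
inventory = [
    ("web01", "PSU-X", "A", True),
    ("web02", "PSU-X", "A", False),
    ("web03", "PSU-X", "B", False),
    ("db01", "PSU-Y", "A", False),
    ("db02", "PSU-X", "A", True),
]


def failure_rates(rows):
    """Return {(model, rev): (failed, total, rate)} for each model/rev pair."""
    total = Counter()
    failed = Counter()
    for _host, model, rev, has_failed in rows:
        total[(model, rev)] += 1
        if has_failed:
            failed[(model, rev)] += 1
    return {key: (failed[key], n, failed[key] / n) for key, n in total.items()}


rates = failure_rates(inventory)

# List the model/rev pairs with the highest failure rate first.
for (model, rev), (bad, n, rate) in sorted(rates.items(), key=lambda kv: -kv[1][2]):
    print(f"{model} rev {rev}: {bad}/{n} failed ({rate:.0%})")
```

With only 2-3% of servers affected, the per-group counts will be small, so a high rate in one model/rev is a reason to ask the vendor about that lot rather than proof of a bad batch.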
_______________________________________________
Tech mailing list
Tech@lopsa.org
http://lopsa.org/cgi-bin/mailman/listinfo/tech
This list provided by the League of Professional System Administrators
 http://lopsa.org/