On 2011 Jan 24, at 17:01, Derek J. Balling wrote:
[snip]
> When they ask for diag output from some random software they want me to
> install and run, I tell them I can't install other applications on these
> machines for security reasons, and when they ask me to boot into a
> diagnostics mode, I tell them I can't schedule the downtime other than for
> the swap of what their onboard software already in place claims is
> failed/degraded. (Obviously, if I *don't* know what the problem is, the
> service call takes a completely different tack).
>
> All that said, though, I *do* make sure in "degraded" conditions, that I'm
> running the latest firmware, and ensure that the error crops back up in the
> current firmware revs, because sometimes the diagnostic code has bugs and
> false-positives claiming a non-existent DIMM condition and such (we've seen
> it happen).

We refuse to run the latest and greatest firmware, and push back mightily
when told to upgrade. We go so far as to escalate when they refuse to swap
a completely failed hard drive, or a DIMM that has been caught causing
kernel panics left and right, because the firmware isn't up to date.

If (and this is a huge if) the vendor can show that the firmware version in
question has fixes directly related to our issue, then we'll consider the
upgrade, scheduled for a time when we have lots of on-call staff and
hardware support on standby to handle a failed firmware install or firmware
that has to be backed out. We've been burned too many times by lousy
firmware.

I'm mainly willing to run DSET and similar tools when I don't see the
problem. If I can see what the problem is the old-fashioned way, we push
hard. This is because a certain vendor was caught hiding pending hardware
failures with firmware updates. "Oh, this failure isn't critical, so just
disable this warning that says it is likely to die in three months."

----
"The speed of communications is wondrous to behold.
It is also true that speed can multiply the distribution of information
that we know to be untrue."
    Edward R Murrow (1964)

Mark McCullough
[email protected]
_______________________________________________
Tech mailing list
[email protected]
https://lists.lopsa.org/cgi-bin/mailman/listinfo/tech
This list provided by the League of Professional System Administrators
http://lopsa.org/
