On Fri, Mar 11, 2011 at 11:21 PM, Jerry Sutton <jer...@airmail.net> wrote:
> I believe you will find that fmd has disabled this DIMM or DIMMs.
> If you know how much memory is supposed to be there you can check the
> output of 'prtconf", or, preferably, that of "prtdiag -v" and compare that
> to what you believe is physically installed.
> The last time I read the fmadm manpage, as I recall, end users are
> expected to NOT run fmadm repair unless specifically instructed to do so
> by Sun XXX Oracle support staff. It *may* *not* need to be run at all when
> this DIMM is replaced (various Sun Sparc hardware seems to behave each a bit
> differently with respect to fault management in my experience)
>
> Certainly, you do NOT run 'fmadm repair' before actually replacing the DIMMs
> identified as failing.
>
> Unless you suffer additional error rate problems with other DIMMs I would
> not expect another panic induced reboot.
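
For the prtconf comparison suggested above, something like the following
might do as a rough sketch. Note that check_mem is a made-up helper name,
32768 is only a placeholder for whatever you believe is installed, and I'm
assuming prtconf prints its usual "Memory size: N Megabytes" line.

```shell
# Hypothetical helper (not a Solaris tool): reads prtconf output on stdin
# and compares the reported memory size with the expected amount in MB.
check_mem() {
    expected_mb=$1
    # Take the third field of the first "Memory size: N Megabytes" line.
    reported_mb=$(awk '/^Memory size:/ { print $3; exit }')
    if [ -z "$reported_mb" ]; then
        echo "no 'Memory size:' line found"
        return 2
    fi
    if [ "$reported_mb" -lt "$expected_mb" ]; then
        echo "reported ${reported_mb} MB, expected ${expected_mb} MB: memory is missing"
        return 1
    fi
    echo "reported ${reported_mb} MB, expected ${expected_mb} MB: nothing missing"
}

# Usage on the affected host (32768 is a placeholder value):
#   prtconf | check_mem 32768
```

This only tells you whether the OS sees less memory than you expect; the
per-DIMM detail would still have to come from "prtdiag -v".
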
If you're running a reasonably recent version of Solaris 10 (or later) on
hardware whose DIMMs can be queried for serial numbers (I think anything
using US-III or later CPUs will have this), it should detect that the DIMM
has been replaced and automatically repair the fault. If you're using older
hardware where it can't obtain that info, you may have to run fmadm repair
after the DIMM(s) have been replaced.

>
> On 03/11/11 14:17, Paul Robertson wrote:
>>
>> Our V890 server reported a memory fault, rebooted, and now shows the
>> following:
>>
>> csgams08:~>sudo fmadm faulty
>> Password:
>> --------------- ------------------------------------ -------------- ---------
>> TIME            EVENT-ID                             MSG-ID         SEVERITY
>> --------------- ------------------------------------ -------------- ---------
>> Mar 11 00:21:46 7acef7a1-6c9e-49db-9b03-ff6f0d5f911d SUN4U-8000-35  Critical
>>
>> Fault class : fault.memory.bank 95%
>> Affects     : mem:///unum=Slot,B:J8100,J8101,J8201,J8200
>>               degraded but still in service
>> FRU         : mem:///unum=Slot,B:J8100,J8101,J8201,J8200 95%
>> Serial ID.  :
>>
>> Description : The number of errors associated with this memory module
>>               has exceeded acceptable levels. Refer to
>>               http://sun.com/msg/SUN4U-8000-35 for more information.
>>
>> Response    : Pages of memory associated with this memory module are
>>               being removed from service as errors are reported.
>>
>> Impact      : Total system memory capacity will be reduced as pages
>>               are retired.
>>
>> Action      : Schedule a repair procedure to replace the affected
>>               memory module. Use fmdump -v -u<EVENT_ID> to identify the module.
>>
>> We've scheduled the replacement already, but I want to understand
>> whether fmd has effectively disabled these dimms until such time as
>> we run "fmadm repair". In other words, is it likely that we'll get
>> another failure/reboot before we can schedule the maintenance? If so,
>> I guess we'll try and asr-disable these dimms to minimize the risk.
>>
>> Please advise.
>>
>> Paul
>
> --
> Jerry Sutton jer...@airmail.net
> _______________________________________________
> opensolaris-discuss mailing list
> opensolaris-discuss@opensolaris.org