On Fri, Mar 11, 2011 at 11:21 PM, Jerry Sutton <jer...@airmail.net> wrote:
> I believe you will find that fmd has disabled this DIMM or DIMMs.
> If you know how much memory is supposed to be there you can check the
> output of "prtconf", or, preferably, that of "prtdiag -v" and compare that
> to what you believe is physically installed.
> The last time I read the fmadm manpage, as I recall, end users are
> expected to NOT run fmadm repair unless specifically instructed to do so
> by Sun XXX Oracle support staff.  It *may* *not* need to be run at all when
> this DIMM is replaced (different Sun SPARC hardware seems to behave a bit
> differently with respect to fault management, in my experience).
>
> Certainly, you do NOT run 'fmadm repair' before actually replacing the DIMMs
> identified as failing.
>
> Unless you suffer additional error rate problems with other DIMMs I would
> not expect another panic induced reboot.

If you're running a reasonably recent version of Solaris 10 (or later)
on hardware that lets you query serial numbers from the DIMMs (I think
anything using US-III or later CPUs will have this), the system should
detect that the DIMM has been replaced and automatically repair the fault.

If you're using older hardware where that info can't be obtained, you may
have to run fmadm repair after the DIMM(s) have been replaced.
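
As a rough sketch of the checks discussed in this thread: the two helper
names below (mem_mb, fault_listed) are made up for illustration, and the
sketch assumes prtconf prints its usual "Memory size: N Megabytes" line.
The helpers only filter text on stdin; the real commands are shown in the
trailing comments and need to be run on the Solaris host itself.

```shell
# Hypothetical helpers, not part of Solaris.

# Print the memory size in MB from prtconf output read on stdin.
# Assumes a line of the form "Memory size: 32768 Megabytes".
mem_mb() {
    awk '/^Memory size:/ { print $3 }'
}

# Exit 0 if the given event UUID still appears in the
# 'fmadm faulty' output read on stdin, non-zero otherwise.
fault_listed() {
    grep "$1" >/dev/null
}

# On the server (fmadm needs root):
#   prtconf | mem_mb        # compare with what is physically installed
#   fmadm faulty | fault_listed 7acef7a1-6c9e-49db-9b03-ff6f0d5f911d \
#       && echo "fault still listed after the DIMM swap"
```

If the fault is still listed once the DIMMs have actually been replaced,
that is the case where fmadm repair with the event UUID may be needed.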

>
> On 03/11/11 14:17, Paul Robertson wrote:
>>
>> Our V890 server reported a memory fault, rebooted, and now shows the
>> following:
>>
>> csgams08:~>sudo fmadm faulty
>> Password:
>> --------------- ------------------------------------  -------------- ---------
>> TIME            EVENT-ID                              MSG-ID         SEVERITY
>> --------------- ------------------------------------  -------------- ---------
>> Mar 11 00:21:46 7acef7a1-6c9e-49db-9b03-ff6f0d5f911d  SUN4U-8000-35  Critical
>>
>> Fault class : fault.memory.bank 95%
>> Affects     : mem:///unum=Slot,B:J8100,J8101,J8201,J8200
>>                   degraded but still in service
>> FRU         : mem:///unum=Slot,B:J8100,J8101,J8201,J8200 95%
>> Serial ID.  :
>>
>> Description : The number of errors associated with this memory module
>> has exceeded acceptable levels.  Refer to
>> http://sun.com/msg/SUN4U-8000-35 for more information.
>>
>> Response    : Pages of memory associated with this memory module are
>> being removed from service as errors are reported.
>>
>> Impact      : Total system memory capacity will be reduced as pages
>> are retired.
>>
>> Action      : Schedule a repair procedure to replace the affected
>> memory module. Use fmdump -v -u <EVENT_ID> to identify the module.
>>
>> We've scheduled the replacement already, but I want to understand
>> whether fmd has effectively disabled these dimms until such time as
>> we run "fmadm repair". In other words, is it likely that we'll get
>> another failure/reboot before we can schedule the maintenance? If so,
>> I guess we'll try and asr-disable these dimms to minimize the risk.
>>
>> Please advise.
>>
>> Paul
>
> --
> Jerry Sutton    jer...@airmail.net
> _______________________________________________
> opensolaris-discuss mailing list
> opensolaris-discuss@opensolaris.org
>