I am sponsoring this case on behalf of Yanmin Sun.  This case seeks
micro/patch binding.  Timeout is 10/15/09.

The case introduces new FMA events and a new fmd plug-in module
supporting the Nehalem_EX architecture.  From the ARC perspective,
this case is mostly about carving out a piece of the FMA namespace.  

Template Version: @(#)sac_nextcase 1.68 02/23/09 SMI
This information is Copyright 2009 Sun Microsystems
1. Introduction
    1.1. Project/Component Working Name:
         FMA for Nehalem_EX
    1.2. Name of Document Author/Supplier:
         Author:  Yanmin Sun
    1.3. Date of This Document:
         08 October, 2009

4. Technical Description

Exported Interface (Sun Private)
----------------------------------
/usr/lib/fm/fmd/plugins/fdd-msg.so

This project introduces a new fmd module, fdd-msg.  fdd is a
fault-diagnosis daemon running on the Service Processor (SP).
Because both Solaris FMA and fdd can diagnose the same set of errors
(memory and QuickPath interconnect errors, for example), the two need
to coordinate.  Without coordination, both sides may handle the same
errors but produce different faults or take different actions, which
is confusing.

fdd-msg is a one-way (Solaris -> SP) communication module based on
IPMI that tells fdd on the SP what Solaris is doing.  When fmd starts,
it loads the fdd-msg plugin, which in turn sends a message to fdd.
The message tells fdd what Solaris FMA is doing at a high level, i.e.,
whether or not Solaris FMA is diagnosing memory correctable errors.
On receiving the message, fdd turns off its own memory correctable
error diagnosis and lets Solaris FMA handle those errors.
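
The startup handshake described above can be modelled roughly as
follows.  This is only an illustrative sketch, not the fdd-msg
implementation: the one-byte payload, the CAP_MEM_CE_DIAGNOSIS bit, and
the names build_startup_message and SpDiagnosisState are all
hypothetical, and the IPMI transport is left out entirely.

```python
from dataclasses import dataclass

# Hypothetical capability bit; the real message format is not given in this case.
CAP_MEM_CE_DIAGNOSIS = 0x01


def build_startup_message(host_diagnoses_mem_ce: bool) -> bytes:
    """Encode a one-byte capability mask the host would send when fmd starts."""
    mask = CAP_MEM_CE_DIAGNOSIS if host_diagnoses_mem_ce else 0
    return bytes([mask])


@dataclass
class SpDiagnosisState:
    """Stand-in for the SP side's record of which diagnoses it performs itself."""
    mem_ce_enabled: bool = True

    def on_host_message(self, payload: bytes) -> None:
        # If the host claims memory correctable error diagnosis, stand down.
        if payload and payload[0] & CAP_MEM_CE_DIAGNOSIS:
            self.mem_ce_enabled = False


sp = SpDiagnosisState()
sp.on_host_message(build_startup_message(True))
print(sp.mem_ce_enabled)  # prints False
```

Because the channel is one-way, the sketch has no acknowledgement path:
the host sends its message and the SP side adjusts its own state.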

Exported Events (All Events have stability Sun Private) 
--------------------------------------------------------
ereport.cpu.intel.quickpath.bus_bad_hdr_double_ecc_external
ereport.cpu.intel.quickpath.bus_bad_msg
ereport.cpu.intel.quickpath.bus_bad_msg_external
ereport.cpu.intel.quickpath.bus_bad_sbu_route
ereport.cpu.intel.quickpath.bus_bad_sbu_route_external
ereport.cpu.intel.quickpath.bus_bad_vn_credit
ereport.cpu.intel.quickpath.bus_bad_vn_credit_external
ereport.cpu.intel.quickpath.bus_crc_flit
ereport.cpu.intel.quickpath.bus_crc_flit_external
ereport.cpu.intel.quickpath.bus_eot_parity
ereport.cpu.intel.quickpath.bus_eot_parity_external
ereport.cpu.intel.quickpath.bus_link_init_ce
ereport.cpu.intel.quickpath.bus_link_retry_err
ereport.cpu.intel.quickpath.bus_link_retry_err_external
ereport.cpu.intel.quickpath.bus_opr_poison_err
ereport.cpu.intel.quickpath.bus_opr_poison_err_external
ereport.cpu.intel.quickpath.bus_retry_abort
ereport.cpu.intel.quickpath.bus_rta_parity
ereport.cpu.intel.quickpath.bus_rta_parity_external
ereport.cpu.intel.quickpath.bus_single_ecc
ereport.cpu.intel.quickpath.bus_unknown
ereport.cpu.intel.quickpath.bus_unknown_external
ereport.cpu.intel.quickpath.bus_unknown_uc
ereport.cpu.intel.quickpath.bus_unknown_uc_external
ereport.cpu.intel.quickpath.home_agent
ereport.cpu.intel.quickpath.llc_ewb_uc
ereport.cpu.intel.quickpath.mem_ecc
ereport.cpu.intel.quickpath.mem_ecc_uc
ereport.cpu.intel.quickpath.mem_errflw_fsm_fail
ereport.cpu.intel.quickpath.mem_errflw_fsm_fail_uc
ereport.cpu.intel.quickpath.mem_even_parity
ereport.cpu.intel.quickpath.mem_even_parity_uc
ereport.cpu.intel.quickpath.mem_failover_mir
ereport.cpu.intel.quickpath.mem_fberr_uc
ereport.cpu.intel.quickpath.mem_lnkcrcvld
ereport.cpu.intel.quickpath.mem_lnkcrcvld_uc
ereport.cpu.intel.quickpath.mem_lnkpers
ereport.cpu.intel.quickpath.mem_lnkpers_uc
ereport.cpu.intel.quickpath.mem_lnktrns
ereport.cpu.intel.quickpath.mem_lnkuncorr_uc
ereport.cpu.intel.quickpath.mem_mcpar_fsmerr_uc
ereport.cpu.intel.quickpath.mem_nbfbdlnkerr
ereport.cpu.intel.quickpath.mem_nbfbdlnkerr_err
ereport.cpu.intel.quickpath.mem_ptrl_fsm_err
ereport.cpu.intel.quickpath.mem_ptrl_fsm_err_uc
ereport.cpu.intel.quickpath.mem_sbfbdlinkerr
ereport.cpu.intel.quickpath.mem_sbfbdlinkerr_uc
ereport.cpu.intel.quickpath.mem_scrubbing_uc
ereport.cpu.intel.quickpath.mem_vberr
ereport.cpu.intel.quickpath.mem_vberr_uc
ereport.cpu.intel.quickpath.sys_cfg_cfa_ecc
ereport.cpu.intel.quickpath.sys_cfg_uc
ereport.cpu.intel.quickpath.system_cache
fault.cpu.intel.discard_fault
fault.cpu.intel.has_poison
fault.cpu.intel.quickpath.bus_interconnect
fault.cpu.intel.quickpath.home_agent
fault.cpu.intel.quickpath.llc_ewb
fault.cpu.intel.quickpath.mem_controller_ce
fault.cpu.intel.quickpath.mem_controller_ue
fault.cpu.intel.quickpath.mem_failover_mir
fault.cpu.intel.quickpath.mem_link_ce
fault.cpu.intel.quickpath.mem_link_ue
fault.cpu.intel.quickpath.mem_scrubbing
fault.cpu.intel.quickpath.sys_cfg
fault.cpu.intel.quickpath.system_cache
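
The class names listed above are dot-separated, with the leading
component (ereport or fault) giving the event category.  As a rough
illustration of the namespace this case carves out, the sketch below
splits a class name into its category, namespace, and leaf name.  The
helper split_event_class is hypothetical and is not part of fmd or the
Event Registry tooling.

```python
from typing import Tuple


def split_event_class(name: str) -> Tuple[str, str, str]:
    """Split a dotted class name into (category, namespace, leaf)."""
    parts = name.split(".")
    if len(parts) < 3:
        raise ValueError("expected at least three dot-separated components")
    # First component is the category, last is the leaf, the rest is the namespace.
    return parts[0], ".".join(parts[1:-1]), parts[-1]


print(split_event_class("ereport.cpu.intel.quickpath.mem_ecc_uc"))
print(split_event_class("fault.cpu.intel.has_poison"))
```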

The full event definitions will be archived persistently in the FMA
Event Registry and are detailed in ercheck.html in the case
directory.  The Event Registry is archived regularly as a tarball;
instructions for accessing it are here:
http://opensolaris.org/os/project/events-registry/

The approved FMA portfolio information can be found here:

http://wikihome.sfbay.sun.com/fma-portfolio/Wiki.jsp?page=2009.024.nehalem_ex

6. Resources and Schedule
    6.4. Steering Committee requested information
        6.4.1. Consolidation C-team Name:
                ON
    6.5. ARC review type: FastTrack
    6.6. ARC Exposure: open
