I am sponsoring this case on behalf of Yanmin Sun. This case seeks micro/patch binding. Timeout is 10/15/09.
The case introduces new FMA events and a new fmd plug-in module supporting
the Nehalem_EX architecture. From the ARC perspective, this case is mostly
about carving out a piece of the FMA namespace.

Template Version: @(#)sac_nextcase 1.68 02/23/09 SMI
This information is Copyright 2009 Sun Microsystems

1. Introduction
    1.1. Project/Component Working Name:
         FMA for Nehalem_EX
    1.2. Name of Document Author/Supplier:
         Author:  Yanmin Sun
    1.3  Date of This Document:
         08 October, 2009

4. Technical Description

Exported Interface (Sun Private)
----------------------------------
/usr/lib/fm/fmd/plugins/fdd-msg.so

This project introduces a new fmd module, fdd-msg. fdd is a daemon running
on the Service Processor (SP) that performs fault diagnosis. Because both
Solaris FMA and fdd can diagnose the same set of errors (memory and
QuickPath interconnect errors, for example), the two need to coordinate.
Without coordination, both sides could handle the same errors yet produce
different faults or take different actions, which is confusing.

fdd-msg is a one-way (Solaris -> SP) communication module based on IPMI
that tells fdd on the SP what Solaris is doing. When fmd starts, it loads
the fdd-msg plugin, which in turn sends a message to fdd. The message tells
fdd what Solaris FMA is doing at a high level, i.e., whether or not Solaris
FMA is diagnosing memory correctable errors. Upon receiving the message,
fdd turns off its own memory correctable error diagnosis and lets Solaris
FMA handle those errors.

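The start-up handshake described above can be illustrated with a small
model. This is only a sketch under stated assumptions: the case does not
specify the message layout, so the one-byte payload, the MEM_CE bit, and
the names SolarisDiag, FddState, build_startup_message and fmd_module_init
are all hypothetical, and the IPMI transport is replaced by a plain
function call.

```python
from dataclasses import dataclass
from enum import IntFlag


class SolarisDiag(IntFlag):
    """Hypothetical capability bits; the real message layout is not given."""
    NONE = 0
    MEM_CE = 1  # Solaris FMA diagnoses memory correctable errors


def build_startup_message(caps: SolarisDiag) -> bytes:
    """Encode the capabilities as a one-byte payload (assumed format)."""
    return bytes([int(caps)])


@dataclass
class FddState:
    """Hypothetical model of the diagnosis switch inside fdd on the SP."""
    mem_ce_diagnosis: bool = True  # assumed default: fdd diagnoses memory CEs

    def on_message(self, payload: bytes) -> None:
        # One-way protocol: fdd only reacts, it never replies to Solaris.
        caps = SolarisDiag(payload[0])
        if caps & SolarisDiag.MEM_CE:
            self.mem_ce_diagnosis = False


def fmd_module_init(send) -> None:
    """Stand-in for what fdd-msg does when fmd loads it: send one message.

    `send` stands in for the IPMI channel to the SP.
    """
    send(build_startup_message(SolarisDiag.MEM_CE))


if __name__ == "__main__":
    fdd = FddState()
    fmd_module_init(fdd.on_message)
    print("fdd diagnoses memory CEs:", fdd.mem_ce_diagnosis)
```

In this model the message flows in one direction only, so Solaris never
learns whether fdd acted on it; that matches the one-way design described
above.
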
Exported Events (All Events have stability Sun Private)
--------------------------------------------------------
ereport.cpu.intel.quickpath.bus_bad_hdr_double_ecc_external
ereport.cpu.intel.quickpath.bus_bad_msg
ereport.cpu.intel.quickpath.bus_bad_msg_external
ereport.cpu.intel.quickpath.bus_bad_sbu_route
ereport.cpu.intel.quickpath.bus_bad_sbu_route_external
ereport.cpu.intel.quickpath.bus_bad_vn_credit
ereport.cpu.intel.quickpath.bus_bad_vn_credit_external
ereport.cpu.intel.quickpath.bus_crc_flit
ereport.cpu.intel.quickpath.bus_crc_flit_external
ereport.cpu.intel.quickpath.bus_eot_parity
ereport.cpu.intel.quickpath.bus_eot_parity_external
ereport.cpu.intel.quickpath.bus_link_init_ce
ereport.cpu.intel.quickpath.bus_link_retry_err
ereport.cpu.intel.quickpath.bus_link_retry_err_external
ereport.cpu.intel.quickpath.bus_opr_poison_err
ereport.cpu.intel.quickpath.bus_opr_poison_err_external
ereport.cpu.intel.quickpath.bus_retry_abort
ereport.cpu.intel.quickpath.bus_rta_parity
ereport.cpu.intel.quickpath.bus_rta_parity_external
ereport.cpu.intel.quickpath.bus_single_ecc
ereport.cpu.intel.quickpath.bus_unknown
ereport.cpu.intel.quickpath.bus_unknown_external
ereport.cpu.intel.quickpath.bus_unknown_uc
ereport.cpu.intel.quickpath.bus_unknown_uc_external
ereport.cpu.intel.quickpath.home_agent
ereport.cpu.intel.quickpath.llc_ewb_uc
ereport.cpu.intel.quickpath.mem_ecc
ereport.cpu.intel.quickpath.mem_ecc_uc
ereport.cpu.intel.quickpath.mem_errflw_fsm_fail
ereport.cpu.intel.quickpath.mem_errflw_fsm_fail_uc
ereport.cpu.intel.quickpath.mem_even_parity
ereport.cpu.intel.quickpath.mem_even_parity_uc
ereport.cpu.intel.quickpath.mem_failover_mir
ereport.cpu.intel.quickpath.mem_fberr_uc
ereport.cpu.intel.quickpath.mem_lnkcrcvld
ereport.cpu.intel.quickpath.mem_lnkcrcvld_uc
ereport.cpu.intel.quickpath.mem_lnkpers
ereport.cpu.intel.quickpath.mem_lnkpers_uc
ereport.cpu.intel.quickpath.mem_lnktrns
ereport.cpu.intel.quickpath.mem_lnkuncorr_uc
ereport.cpu.intel.quickpath.mem_mcpar_fsmerr_uc
ereport.cpu.intel.quickpath.mem_nbfbdlnkerr
ereport.cpu.intel.quickpath.mem_nbfbdlnkerr_err
ereport.cpu.intel.quickpath.mem_ptrl_fsm_err
ereport.cpu.intel.quickpath.mem_ptrl_fsm_err_uc
ereport.cpu.intel.quickpath.mem_sbfbdlinkerr
ereport.cpu.intel.quickpath.mem_sbfbdlinkerr_uc
ereport.cpu.intel.quickpath.mem_scrubbing_uc
ereport.cpu.intel.quickpath.mem_vberr
ereport.cpu.intel.quickpath.mem_vberr_uc
ereport.cpu.intel.quickpath.sys_cfg_cfa_ecc
ereport.cpu.intel.quickpath.sys_cfg_uc
ereport.cpu.intel.quickpath.system_cache

fault.cpu.intel.discard_fault
fault.cpu.intel.has_poison
fault.cpu.intel.quickpath.bus_interconnect
fault.cpu.intel.quickpath.home_agent
fault.cpu.intel.quickpath.llc_ewb
fault.cpu.intel.quickpath.mem_controller_ce
fault.cpu.intel.quickpath.mem_controller_ue
fault.cpu.intel.quickpath.mem_failover_mir
fault.cpu.intel.quickpath.mem_link_ce
fault.cpu.intel.quickpath.mem_link_ue
fault.cpu.intel.quickpath.mem_scrubbing
fault.cpu.intel.quickpath.sys_cfg
fault.cpu.intel.quickpath.system_cache

The full Event definitions themselves will be archived persistently in the
FMA Event Registry and are detailed in the ercheck.html in the case
directory. The events registry is archived regularly as a tarball and
instructions for accessing it are here:

    http://opensolaris.org/os/project/events-registry/

The approved FMA portfolio information can be found here:

    http://wikihome.sfbay.sun.com/fma-portfolio/Wiki.jsp?page=2009.024.nehalem_ex

6. Resources and Schedule
    6.4. Steering Committee requested information
        6.4.1. Consolidation C-team Name:
               ON
    6.5. ARC review type: FastTrack
    6.6. ARC Exposure: open