On Tue, 30 Nov 2010 22:22:26 -0800
Suresh Siddha <[email protected]> wrote:

> On platforms with Intel 7500 chipset, there were some reports of system
> hang/NMI's during kexec/kdump in the presence of interrupt-remapping enabled.
> 
> During kdump, there is a window where the devices might be still using old
> kernel's interrupt information, while the kdump kernel is coming up. This can
> cause vt-d faults as the interrupt configuration from the old kernel map to
> null IRTE entries in the new kernel etc. (without interrupt-remapping enabled,
> we still have the same issue but in this case we will see benign spurious
> interrupt hit the new kernel).
> 
> Based on platform config settings, these platforms seem to generate NMI/SMI
> when a vt-d fault happens and there were reports that the resulting SMI causes
> the system to hang.
> 
> Fix it by masking vt-d spec defined errors to platform error reporting logic.
> VT-d spec related errors are already handled by the VT-d OS code, so no need to
> report the same erorr through other channels.
> 
> Signed-off-by: Suresh Siddha <[email protected]>
> Cc: [email protected] [v2.6.32+]
> ---
>  drivers/pci/quirks.c |   20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)
> 
> Index: tip/drivers/pci/quirks.c
> ===================================================================
> --- tip.orig/drivers/pci/quirks.c
> +++ tip/drivers/pci/quirks.c
> @@ -2764,6 +2764,26 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_RI
>  DECLARE_PCI_FIXUP_RESUME_EARLY(PCI_VENDOR_ID_RICOH, PCI_DEVICE_ID_RICOH_R5C832, ricoh_mmc_fixup_r5c832);
>  #endif /*CONFIG_MMC_RICOH_MMC*/
>  
> +#if defined(CONFIG_DMAR) || defined(CONFIG_INTR_REMAP)
> +/*
> + * This is a quirk for masking vt-d spec defined errors to platform error
> + * handling logic. With out this, platforms seem to generate NMI/SMI (based
> + * on the RAS config settings of the platform) when a vt-d fault happens and
> + * there were reports that the resulting SMI causes system to hang.
> + *
> + * VT-d spec related errors are already handled by the VT-d OS code, so no
> + * need to report the same erorr through other channels.
> + */
> +static void vtd_mask_spec_errors(struct pci_dev *dev)
> +{
> +     u32 word;
> +
> +     pci_read_config_dword(dev, 0x1AC, &word);
> +     pci_write_config_dword(dev, 0x1AC, word | (1 << 31));
> +}
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x342e, vtd_mask_spec_errors);
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x3c28, vtd_mask_spec_errors);
> +#endif
>  
>  static void pci_do_fixups(struct pci_dev *dev, struct pci_fixup *f,
>                         struct pci_fixup *end)

Can we make these registers and bits a bit more self-documenting (i.e.
#defines for both, maybe along with other useful bit definitions for
this reg)? Also, "error" is misspelled as "erorr" above. :)
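
Something along these lines, purely as a rough sketch. The macro and
helper names below are made up for illustration and are not taken from
the chipset datasheet; only the 0x1AC offset and bit 31 come from the
patch itself. The helper is split out as a pure function just so the bit
manipulation can be checked outside the kernel.

```c
#include <stdint.h>

/*
 * Hypothetical names (assumptions, not from the datasheet).
 * The offset and the bit position are the ones used in the patch.
 */
#define VTD_ERR_MASK_REG	0x1AC		/* config-space offset the quirk touches */
#define VTD_MSK_SPEC_ERRORS	(1u << 31)	/* bit the quirk sets to mask VT-d spec errors */

/* Return the register value with the spec-error mask bit set. */
static inline uint32_t vtd_set_spec_error_mask(uint32_t reg)
{
	return reg | VTD_MSK_SPEC_ERRORS;
}
```

With definitions like these, the quirk body would read
pci_read_config_dword(dev, VTD_ERR_MASK_REG, &word) followed by
pci_write_config_dword(dev, VTD_ERR_MASK_REG, word | VTD_MSK_SPEC_ERRORS).
Using an unsigned constant for the bit also avoids shifting a signed 1
into the sign bit.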

-- 
Jesse Barnes, Intel Open Source Technology Center

_______________________________________________
stable mailing list
[email protected]
http://linux.kernel.org/mailman/listinfo/stable
