On Tue, 30 Nov 2010 22:22:26 -0800 Suresh Siddha <[email protected]> wrote:
> On platforms with Intel 7500 chipset, there were some reports of system
> hang/NMI's during kexec/kdump in the presence of interrupt-remapping enabled.
>
> During kdump, there is a window where the devices might be still using old
> kernel's interrupt information, while the kdump kernel is coming up. This can
> cause vt-d faults as the interrupt configuration from the old kernel map to
> null IRTE entries in the new kernel etc. (with out interrupt-remapping
> enabled, we still have the same issue but in this case we will see benign
> spurious interrupt hit the new kernel).
>
> Based on platform config settings, these platforms seem to generate NMI/SMI
> when a vt-d fault happens and there were reports that the resulting SMI causes
> the system to hang.
>
> Fix it by masking vt-d spec defined errors to platform error reporting logic.
> VT-d spec related errors are already handled by the VT-d OS code, so need to
> report the same erorr through other channels.
>
> Signed-off-by: Suresh Siddha <[email protected]>
> Cc: [email protected] [v2.6.32+]
> ---
>  drivers/pci/quirks.c |   20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)
>
> Index: tip/drivers/pci/quirks.c
> ===================================================================
> --- tip.orig/drivers/pci/quirks.c
> +++ tip/drivers/pci/quirks.c
> @@ -2764,6 +2764,26 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_RI
>  DECLARE_PCI_FIXUP_RESUME_EARLY(PCI_VENDOR_ID_RICOH,
>  			PCI_DEVICE_ID_RICOH_R5C832, ricoh_mmc_fixup_r5c832);
>  #endif /*CONFIG_MMC_RICOH_MMC*/
>
> +#if defined(CONFIG_DMAR) || defined(CONFIG_INTR_REMAP)
> +/*
> + * This is a quirk for masking vt-d spec defined errors to platform error
> + * handling logic. With out this, platforms seem to generate NMI/SMI (based
> + * on the RAS config settings of the platform) when a vt-d fault happens and
> + * there were reports that the resulting SMI causes system to hang.
> + *
> + * VT-d spec related errors are already handled by the VT-d OS code, so no
> + * need to report the same erorr through other channels.
> + */
> +static void vtd_mask_spec_errors(struct pci_dev *dev)
> +{
> +	u32 word;
> +
> +	pci_read_config_dword(dev, 0x1AC, &word);
> +	pci_write_config_dword(dev, 0x1AC, word | (1 << 31));
> +}
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x342e, vtd_mask_spec_errors);
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x3c28, vtd_mask_spec_errors);
> +#endif
>
>  static void pci_do_fixups(struct pci_dev *dev, struct pci_fixup *f,
>  			  struct pci_fixup *end)

Can we make these registers and bits a bit more self-documenting (i.e.
#defines for both, maybe along with other useful bit definitions for
this reg)?

Also, "error" is misspelled as "erorr" above. :)

-- 
Jesse Barnes, Intel Open Source Technology Center
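
As a rough illustration of the request above, the offset and bit from the quoted patch could be given names along these lines. This is only a sketch: the macro names VTD_ERR_MASK_REG and VTD_MASK_SPEC_ERRORS are made up here and are not taken from the thread or from any datasheet, and the fake_* helpers are user-space stand-ins for pci_read_config_dword()/pci_write_config_dword() so that the fragment compiles outside the kernel.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical names for the magic numbers in the quoted patch.
 * Only the values (offset 0x1AC, bit 31) come from the patch itself.
 */
#define VTD_ERR_MASK_REG	0x1AC		/* config-space offset the quirk touches */
#define VTD_MASK_SPEC_ERRORS	(1u << 31)	/* bit the quirk sets */

/* Stand-in for struct pci_dev: nothing but a fake 4 KiB config space. */
struct fake_pci_dev {
	uint8_t cfg[4096];
};

/* User-space substitute for pci_read_config_dword(). */
static int fake_read_config_dword(struct fake_pci_dev *dev, int where,
				  uint32_t *val)
{
	memcpy(val, &dev->cfg[where], sizeof(*val));
	return 0;
}

/* User-space substitute for pci_write_config_dword(). */
static int fake_write_config_dword(struct fake_pci_dev *dev, int where,
				   uint32_t val)
{
	memcpy(&dev->cfg[where], &val, sizeof(val));
	return 0;
}

/* Same read-modify-write as the quirk, with the constants named. */
static void vtd_mask_spec_errors(struct fake_pci_dev *dev)
{
	uint32_t word;

	fake_read_config_dword(dev, VTD_ERR_MASK_REG, &word);
	fake_write_config_dword(dev, VTD_ERR_MASK_REG,
				word | VTD_MASK_SPEC_ERRORS);
}
```

Using 1u rather than 1 for the shift keeps bit 31 out of signed-integer territory; other bits of the register could get their own defines next to VTD_MASK_SPEC_ERRORS once their meaning is confirmed against the chipset documentation.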
