Re: [PATCH 3/3] x86/mce: Check for alternate indication of machine check recovery on Skylake
On Fri, Jun 15, 2018 at 09:34:06AM -0700, Luck, Tony wrote:
> I was just worried that Thomas is holding off asking Linus to pull
> because he's waiting for part 2/3, while you might be planning to
> hold onto part 2/3 for the next merge (and add the other cleanups
> you RFC'd).
>
> I'm OK either way on part 2/3 (it just makes for better error messages,
> it doesn't fix any bugs).

Yes, and this is my understanding too - 2/3 is not urgent material so no
need to hurry that particular one.

--
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--
Re: [PATCH 3/3] x86/mce: Check for alternate indication of machine check recovery on Skylake
On Fri, Jun 15, 2018 at 01:45:18PM +0200, Borislav Petkov wrote:
> On Thu, Jun 14, 2018 at 02:57:54PM -0700, Luck, Tony wrote:
> > On Thu, Jun 07, 2018 at 10:24:46PM +0200, Borislav Petkov wrote:
> > > tglx just took 1 and 3, 2/3 had a minor issue but the merge window
> > > happened so I'll send it later. It is nice to have anyway.
> >
> > Did you fix up part 2/3?
>
> You said "Parts 1 & 2 are nice-to-have" and we have merge window now. So
> what exactly do you mean?
>
> > I see 1 & 3 were staged by Thomas in TIP ras/urgent and
> > ras-urgent-for-linus but haven't gone into Linus' tree yet.
>
> Merge window ain't over yet :)

"send it later" could be read a couple of ways.

I was just worried that Thomas is holding off asking Linus to pull
because he's waiting for part 2/3, while you might be planning to
hold onto part 2/3 for the next merge (and add the other cleanups
you RFC'd).

I'm OK either way on part 2/3 (it just makes for better error messages,
it doesn't fix any bugs).

-Tony
Re: [PATCH 3/3] x86/mce: Check for alternate indication of machine check recovery on Skylake
On Thu, Jun 14, 2018 at 02:57:54PM -0700, Luck, Tony wrote:
> On Thu, Jun 07, 2018 at 10:24:46PM +0200, Borislav Petkov wrote:
> > tglx just took 1 and 3, 2/3 had a minor issue but the merge window
> > happened so I'll send it later. It is nice to have anyway.
>
> Did you fix up part 2/3?

You said "Parts 1 & 2 are nice-to-have" and we have merge window now. So
what exactly do you mean?

> I see 1 & 3 were staged by Thomas in TIP ras/urgent and
> ras-urgent-for-linus but haven't gone into Linus' tree yet.

Merge window ain't over yet :)

--
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--
Re: [PATCH 3/3] x86/mce: Check for alternate indication of machine check recovery on Skylake
On Thu, Jun 07, 2018 at 10:24:46PM +0200, Borislav Petkov wrote:
> tglx just took 1 and 3, 2/3 had a minor issue but the merge window
> happened so I'll send it later. It is nice to have anyway.

Did you fix up part 2/3?

I see 1 & 3 were staged by Thomas in TIP ras/urgent and
ras-urgent-for-linus but haven't gone into Linus' tree yet.

-Tony
Re: [PATCH 3/3] x86/mce: Check for alternate indication of machine check recovery on Skylake
On Thu, Jun 07, 2018 at 10:24:46PM +0200, Borislav Petkov wrote:
> On Thu, Jun 07, 2018 at 01:18:31PM -0700, Dan Williams wrote:
> > I'm making an effort to get all persistent memory error handling holes
> > covered this cycle, so I think it makes sense for this to go through
> > the nvdimm tree. This looks sufficiently non-controversial that I
> > could justify sending it to Linus along with the other pmem updates.
>
> tglx just took 1 and 3, 2/3 had a minor issue but the merge window
> happened so I'll send it later. It is nice to have anyway.

Thanks Boris.

-Tony
Re: [PATCH 3/3] x86/mce: Check for alternate indication of machine check recovery on Skylake
On Thu, 7 Jun 2018, Dan Williams wrote:
> On Thu, Jun 7, 2018 at 10:43 AM, Luck, Tony wrote:
> > On Fri, May 25, 2018 at 02:42:09PM -0700, Tony Luck wrote:
> >> Currently we just check the "CAPID0" register to see whether the CPU
> >> can recover from machine checks.
> >>
> >> But there are also some special SKUs which do not have all advanced
> >> RAS features, but do enable machine check recovery for use with NVDIMMs.
> >>
> >> Add a check for any of bits {8:5} in the "CAPID5" register (each
> >> reports some NVDIMM mode available, if any of them are set, then
> >> the system supports memory machine check recovery).
> >>
> >> Cc: sta...@vger.kernel.org # 4.9
> >> Signed-off-by: Tony Luck
> >> ---
> >
> > Has this stalled somewhere? I'd like to see this one go into the
> > 4.18 merge because it unbreaks some real hardware.
> >
> > Parts 1 & 2 are nice-to-have, but they just make for better error
> > messages so aren't as critical.
>
> I'm making an effort to get all persistent memory error handling holes
> covered this cycle, so I think it makes sense for this to go through
> the nvdimm tree. This looks sufficiently non-controversial that I
> could justify sending it to Linus along with the other pmem updates.

I've picked it up already and please can we let stuff go through the right
trees? The world does not stop turning if a fix goes in 2 days later.

Thanks,

	tglx
Re: [PATCH 3/3] x86/mce: Check for alternate indication of machine check recovery on Skylake
On Thu, Jun 07, 2018 at 01:18:31PM -0700, Dan Williams wrote:
> I'm making an effort to get all persistent memory error handling holes
> covered this cycle, so I think it makes sense for this to go through
> the nvdimm tree. This looks sufficiently non-controversial that I
> could justify sending it to Linus along with the other pmem updates.

tglx just took 1 and 3, 2/3 had a minor issue but the merge window
happened so I'll send it later. It is nice to have anyway.

--
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--
Re: [PATCH 3/3] x86/mce: Check for alternate indication of machine check recovery on Skylake
On Thu, Jun 7, 2018 at 10:43 AM, Luck, Tony wrote:
> On Fri, May 25, 2018 at 02:42:09PM -0700, Tony Luck wrote:
>> Currently we just check the "CAPID0" register to see whether the CPU
>> can recover from machine checks.
>>
>> But there are also some special SKUs which do not have all advanced
>> RAS features, but do enable machine check recovery for use with NVDIMMs.
>>
>> Add a check for any of bits {8:5} in the "CAPID5" register (each
>> reports some NVDIMM mode available, if any of them are set, then
>> the system supports memory machine check recovery).
>>
>> Cc: sta...@vger.kernel.org # 4.9
>> Signed-off-by: Tony Luck
>> ---
>
> Has this stalled somewhere? I'd like to see this one go into the
> 4.18 merge because it unbreaks some real hardware.
>
> Parts 1 & 2 are nice-to-have, but they just make for better error
> messages so aren't as critical.

I'm making an effort to get all persistent memory error handling holes
covered this cycle, so I think it makes sense for this to go through
the nvdimm tree. This looks sufficiently non-controversial that I
could justify sending it to Linus along with the other pmem updates.
Re: [PATCH 3/3] x86/mce: Check for alternate indication of machine check recovery on Skylake
On Fri, May 25, 2018 at 02:42:09PM -0700, Tony Luck wrote:
> Currently we just check the "CAPID0" register to see whether the CPU
> can recover from machine checks.
>
> But there are also some special SKUs which do not have all advanced
> RAS features, but do enable machine check recovery for use with NVDIMMs.
>
> Add a check for any of bits {8:5} in the "CAPID5" register (each
> reports some NVDIMM mode available, if any of them are set, then
> the system supports memory machine check recovery).
>
> Cc: sta...@vger.kernel.org # 4.9
> Signed-off-by: Tony Luck
> ---

Has this stalled somewhere? I'd like to see this one go into the
4.18 merge because it unbreaks some real hardware.

Parts 1 & 2 are nice-to-have, but they just make for better error
messages so aren't as critical.

>  arch/x86/kernel/quirks.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kernel/quirks.c b/arch/x86/kernel/quirks.c
> index 697a4ce04308..736348ead421 100644
> --- a/arch/x86/kernel/quirks.c
> +++ b/arch/x86/kernel/quirks.c
> @@ -645,12 +645,19 @@ static void quirk_intel_brickland_xeon_ras_cap(struct pci_dev *pdev)
>  /* Skylake */
>  static void quirk_intel_purley_xeon_ras_cap(struct pci_dev *pdev)
>  {
> -	u32 capid0;
> +	u32 capid0, capid5;
>
>  	pci_read_config_dword(pdev, 0x84, &capid0);
> +	pci_read_config_dword(pdev, 0x98, &capid5);
>
> -	if ((capid0 & 0xc0) == 0xc0)
> +	/*
> +	 * CAPID0{7:6} indicate whether this is an advanced RAS SKU
> +	 * CAPID5{8:5} indicate that various NVDIMM usage modes are
> +	 * enabled, so memory machine check recovery is also enabled.
> +	 */
> +	if ((capid0 & 0xc0) == 0xc0 || (capid5 & 0x1e0))
>  		static_branch_inc(&mcsafe_key);
> +
>  }
>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x0ec3, quirk_intel_brickland_xeon_ras_cap);
>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2fc0, quirk_intel_brickland_xeon_ras_cap);
> --
> 2.17.0
>
[PATCH 3/3] x86/mce: Check for alternate indication of machine check recovery on Skylake
Currently we just check the "CAPID0" register to see whether the CPU
can recover from machine checks.

But there are also some special SKUs which do not have all advanced
RAS features, but do enable machine check recovery for use with NVDIMMs.

Add a check for any of bits {8:5} in the "CAPID5" register (each
reports some NVDIMM mode available, if any of them are set, then
the system supports memory machine check recovery).

Cc: sta...@vger.kernel.org # 4.9
Signed-off-by: Tony Luck
---
 arch/x86/kernel/quirks.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/quirks.c b/arch/x86/kernel/quirks.c
index 697a4ce04308..736348ead421 100644
--- a/arch/x86/kernel/quirks.c
+++ b/arch/x86/kernel/quirks.c
@@ -645,12 +645,19 @@ static void quirk_intel_brickland_xeon_ras_cap(struct pci_dev *pdev)
 /* Skylake */
 static void quirk_intel_purley_xeon_ras_cap(struct pci_dev *pdev)
 {
-	u32 capid0;
+	u32 capid0, capid5;

 	pci_read_config_dword(pdev, 0x84, &capid0);
+	pci_read_config_dword(pdev, 0x98, &capid5);

-	if ((capid0 & 0xc0) == 0xc0)
+	/*
+	 * CAPID0{7:6} indicate whether this is an advanced RAS SKU
+	 * CAPID5{8:5} indicate that various NVDIMM usage modes are
+	 * enabled, so memory machine check recovery is also enabled.
+	 */
+	if ((capid0 & 0xc0) == 0xc0 || (capid5 & 0x1e0))
 		static_branch_inc(&mcsafe_key);
+
 }
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x0ec3, quirk_intel_brickland_xeon_ras_cap);
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2fc0, quirk_intel_brickland_xeon_ras_cap);
--
2.17.0
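
For readers who want to check the two masks used in the patch, here is a small userspace sketch of the same test. It is not kernel code and not part of the patch: the names CAPID0_ADV_RAS_MASK, CAPID5_NVDIMM_MASK, bit_range() and mc_recovery_supported() are made up for illustration, and the register values are passed in as plain integers instead of being read from PCI config space. It only shows that bits {7:6} correspond to 0xc0, that bits {8:5} correspond to 0x1e0, and that the CAPID0 test needs both bits set while the CAPID5 test needs any one bit set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; the patch itself uses the literals 0xc0 and 0x1e0. */
#define CAPID0_ADV_RAS_MASK 0xc0u  /* CAPID0 bits 7:6 */
#define CAPID5_NVDIMM_MASK  0x1e0u /* CAPID5 bits 8:5 */

/* Build a mask covering bits hi..lo inclusive (0 <= lo <= hi <= 31). */
static inline uint32_t bit_range(unsigned int hi, unsigned int lo)
{
	/* Use a 64-bit intermediate so that a full 32-bit range does not overflow. */
	return (uint32_t)(((UINT64_C(1) << (hi - lo + 1)) - 1) << lo);
}

/*
 * Mirror of the condition in the patch:
 *   - CAPID0: both bits 7 and 6 must be set (advanced RAS SKU), or
 *   - CAPID5: at least one of bits 8:5 must be set (some NVDIMM mode).
 */
static inline bool mc_recovery_supported(uint32_t capid0, uint32_t capid5)
{
	return (capid0 & CAPID0_ADV_RAS_MASK) == CAPID0_ADV_RAS_MASK ||
	       (capid5 & CAPID5_NVDIMM_MASK) != 0;
}
```

With these helpers, mc_recovery_supported(0x80, 0) is false because only one of the two CAPID0 bits is set, while mc_recovery_supported(0, 0x20) is true because a single CAPID5 bit in the {8:5} range is enough.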