RE: UART/TTY console deadlock
This one: https://lkml.org/lkml/2020/6/30/394 I did reply to all, not sure what I am missing while replying. Regards, Shirish S -Original Message- From: Greg Kroah-Hartman Sent: Thursday, July 2, 2020 11:42 AM To: S, Shirish Cc: Tony Lindgren ; Sergey Senozhatsky ; Petr Mladek ; Andy Shevchenko ; Raul Rangel ; Sergey Senozhatsky ; linux-kernel ; Andy Shevchenko ; k...@linutronix.de; Peter Zijlstra ; John Ogness ; Steven Rostedt Subject: Re: UART/TTY console deadlock On Thu, Jul 02, 2020 at 03:48:43AM +, S, Shirish wrote: > Hi All, > > Can we land this patch upstream? What patch? > Feel free to add my tested-by. Please send it in a form that I can add it, to the patch in question. thanks, greg k-h
RE: UART/TTY console deadlock
Hi All, Can we land this patch upstream? Feel free to add my tested-by. Thanks. Regards, Shirish S -Original Message- From: S, Shirish Sent: Wednesday, July 1, 2020 12:15 PM To: Tony Lindgren ; Sergey Senozhatsky Cc: Petr Mladek ; Andy Shevchenko ; Raul Rangel ; Sergey Senozhatsky ; linux-kernel ; Greg Kroah-Hartman ; Andy Shevchenko ; k...@linutronix.de; S, Shirish ; Peter Zijlstra ; John Ogness ; Steven Rostedt Subject: Re: UART/TTY console deadlock On 6/30/2020 11:32 PM, Tony Lindgren wrote: > * Sergey Senozhatsky [200630 13:06]: >> On (20/06/30 14:22), Petr Mladek wrote: > ... > >>>>>> @@ -2284,8 +2289,6 @@ int serial8250_do_startup(struct uart_port *port) >>>>>> * allow register changes to become visible. >>>>>> */ >>>>>> spin_lock_irqsave(>lock, flags); >>>>>> -if (up->port.irqflags & IRQF_SHARED) >>>>>> -disable_irq_nosync(port->irq); >>>>>> >>>>>> wait_for_xmitr(up, UART_LSR_THRE); >>>>>> serial_port_out_sync(port, UART_IER, UART_IER_THRI); @@ >>>>>> -2297,9 +2300,9 @@ int serial8250_do_startup(struct uart_port *port) >>>>>> iir = serial_port_in(port, UART_IIR); >>>>>> serial_port_out(port, UART_IER, 0); >>>>>> >>>>>> -if (port->irqflags & IRQF_SHARED) >>>>>> -enable_irq(port->irq); >>>>>> spin_unlock_irqrestore(>lock, flags); >>>>>> +if (irq_shared) >>>>>> +enable_irq(port->irq); >>>>>> >>>>>> /* >>>>>> * If the interrupt is not reasserted, or we otherwise >>>>> I think that it might be safe but I am not 100% sure, sigh. >>>> Yeah, I'm not 100%, but I'd give it a try. >>> I do not feel brave enough to ack it today. But I am all for trying >>> it if anyone more familiar with the code is fine with it. >> I see. Well, I suppose we need Ack-s from tty/serial/8250 maintainers. >> I would not be very happy if _only_ printk people Ack the patch. FWIW, the lockdep trace is not seen anymore with the patch applied. Regards, Shirish S > This conditional disable for irq_shared does not look nice to me from > the other device point of view :) > > Would it be possible to just set up te dummy interrupt handler for the > startup, then change it back afterwards? See for example > omap8250_no_handle_irq(). > > The other device for irq_shared should be capable of dealing with > spurious interrupts if it's shared. > > Regards, > > Tony -- Regards, Shirish S
Re: UART/TTY console deadlock
On 6/30/2020 11:32 PM, Tony Lindgren wrote: * Sergey Senozhatsky [200630 13:06]: On (20/06/30 14:22), Petr Mladek wrote: ... @@ -2284,8 +2289,6 @@ int serial8250_do_startup(struct uart_port *port) * allow register changes to become visible. */ spin_lock_irqsave(>lock, flags); - if (up->port.irqflags & IRQF_SHARED) - disable_irq_nosync(port->irq); wait_for_xmitr(up, UART_LSR_THRE); serial_port_out_sync(port, UART_IER, UART_IER_THRI); @@ -2297,9 +2300,9 @@ int serial8250_do_startup(struct uart_port *port) iir = serial_port_in(port, UART_IIR); serial_port_out(port, UART_IER, 0); - if (port->irqflags & IRQF_SHARED) - enable_irq(port->irq); spin_unlock_irqrestore(>lock, flags); + if (irq_shared) + enable_irq(port->irq); /* * If the interrupt is not reasserted, or we otherwise I think that it might be safe but I am not 100% sure, sigh. Yeah, I'm not 100%, but I'd give it a try. I do not feel brave enough to ack it today. But I am all for trying it if anyone more familiar with the code is fine with it. I see. Well, I suppose we need Ack-s from tty/serial/8250 maintainers. I would not be very happy if _only_ printk people Ack the patch. FWIW, the lockdep trace is not seen anymore with the patch applied. Regards, Shirish S This conditional disable for irq_shared does not look nice to me from the other device point of view :) Would it be possible to just set up te dummy interrupt handler for the startup, then change it back afterwards? See for example omap8250_no_handle_irq(). The other device for irq_shared should be capable of dealing with spurious interrupts if it's shared. Regards, Tony -- Regards, Shirish S
Re: linux-next: Fixes tag needs some work in the amdgpu tree
Hi Stephen, On 10/3/2019 3:41 AM, Stephen Rothwell wrote: > Hi all, > > In commit > >8c9f69bc5cc4 ("drm/amdgpu: fix build error without CONFIG_HSA_AMD") > > Fixes tag > >Fixes: 1abb680ad371 ("drm/amdgpu: disable gfxoff while use no H/W > scheduling policy") > > has these problem(s): > >- Target SHA1 does not exist > > Did you mean > > Fixes: aa978594cf7f ("drm/amdgpu: disable gfxoff while use no H/W scheduling > policy") Yes, I meant "aa978594cf7f". Regards, Shirish S -- Regards, Shirish S
RE: [PATCH 2/2] x86/mce/amd: Ensure quirks are applied in resume path as well
I believe its fixed now in : https://lkml.org/lkml/2019/1/16/507 https://lkml.org/lkml/2019/1/16/508 I don’t see S@vger ... in the From field in the above links instead I see "S, Shirish" <> Regards, Shirish S -Original Message- From: S, Shirish Sent: Wednesday, January 16, 2019 9:01 PM To: Borislav Petkov Cc: Thomas Gleixner ; Ingo Molnar ; H . Peter Anvin ; maintainer : X86 ARCHITECTURE ; Tony Luck ; Vishal Verma ; open list : X86 ARCHITECTURE ; Lendacky, Thomas ; Singh, Brijesh Subject: RE: [PATCH 2/2] x86/mce/amd: Ensure quirks are applied in resume path as well Nope thats not my email id, am not sure how(s...@vger.kernel.org) its getting added. Do you find the same for the new patchset I have sent? Regards, Shirish S -Original Message- From: Borislav Petkov Sent: Wednesday, January 16, 2019 8:57 PM To: S, Shirish Cc: Thomas Gleixner ; Ingo Molnar ; H . Peter Anvin ; maintainer : X86 ARCHITECTURE ; Tony Luck ; Vishal Verma ; open list : X86 ARCHITECTURE ; Lendacky, Thomas ; Singh, Brijesh Subject: Re: [PATCH 2/2] x86/mce/amd: Ensure quirks are applied in resume path as well On Wed, Jan 16, 2019 at 03:14:29PM +, S wrote: > Done. Hope the mailer is fine now, i have added --from while sending > the new patchset and also got it reviewed from folks here. No, it isn't. And the problem is not git. The problem is your normal emails you're sending. Look at what you just sent: https://lore.kernel.org/lkml/8bed39c8-df48-dbb9-3674-dfbc4aea2...@amd.com/raw and especially your From: field: From: s...@vger.kernel.org Is that your email address? -- Regards/Gruss, Boris. Good mailing practices for 400: avoid top-posting and trim the reply.
RE: [PATCH 2/2] x86/mce/amd: Ensure quirks are applied in resume path as well
Nope thats not my email id, am not sure how(s...@vger.kernel.org) its getting added. Do you find the same for the new patchset I have sent? Regards, Shirish S -Original Message- From: Borislav Petkov Sent: Wednesday, January 16, 2019 8:57 PM To: S, Shirish Cc: Thomas Gleixner ; Ingo Molnar ; H . Peter Anvin ; maintainer : X86 ARCHITECTURE ; Tony Luck ; Vishal Verma ; open list : X86 ARCHITECTURE ; Lendacky, Thomas ; Singh, Brijesh Subject: Re: [PATCH 2/2] x86/mce/amd: Ensure quirks are applied in resume path as well On Wed, Jan 16, 2019 at 03:14:29PM +, S wrote: > Done. Hope the mailer is fine now, i have added --from while sending > the new patchset and also got it reviewed from folks here. No, it isn't. And the problem is not git. The problem is your normal emails you're sending. Look at what you just sent: https://lore.kernel.org/lkml/8bed39c8-df48-dbb9-3674-dfbc4aea2...@amd.com/raw and especially your From: field: From: s...@vger.kernel.org Is that your email address? -- Regards/Gruss, Boris. Good mailing practices for 400: avoid top-posting and trim the reply.
[PATCH 2/2] x86/mce/amd: carve out MC4_MISC thresholding quirk
MC4_MISC thresholding quirk needs to be applied during S5 -> S0 and S3 -> S0 state transitions, which follow different code paths, hence carve it out and move it mce_amd_feature_init(), which is the converging point of both code paths. Changelog[v2]: - move the quirk to mce/amd.c Signed-off-by: Shirish S --- arch/x86/kernel/cpu/mce/amd.c | 34 ++ arch/x86/kernel/cpu/mce/core.c | 29 - 2 files changed, 34 insertions(+), 29 deletions(-) diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c index 89298c8..f6a5c96 100644 --- a/arch/x86/kernel/cpu/mce/amd.c +++ b/arch/x86/kernel/cpu/mce/amd.c @@ -545,6 +545,33 @@ prepare_threshold_block(unsigned int bank, unsigned int block, u32 addr, return offset; } +void mc4_misc_thresholding_quirk(void) +{ + int i; + u64 hwcr; + bool need_toggle; + u32 msrs[] = { + 0x0413, /* MC4_MISC0 */ + 0xc408, /* MC4_MISC1 */ + }; + + rdmsrl(MSR_K7_HWCR, hwcr); + + /* McStatusWrEn has to be set */ + need_toggle = !(hwcr & BIT(18)); + + if (need_toggle) + wrmsrl(MSR_K7_HWCR, hwcr | BIT(18)); + + /* Clear CntP bit safely */ + for (i = 0; i < ARRAY_SIZE(msrs); i++) + msr_clear_bit(msrs[i], 62); + + /* restore old settings */ + if (need_toggle) + wrmsrl(MSR_K7_HWCR, hwcr); +} + /* cpu init entry point, called from mce.c with preempt off */ void mce_amd_feature_init(struct cpuinfo_x86 *c) { @@ -552,6 +579,13 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c) unsigned int bank, block, cpu = smp_processor_id(); int offset = -1; + /* +* Turn off MC4_MISC thresholding banks on all family 15 models since +* they're not supported there. +*/ + if (c->x86 == 0x15) + mc4_misc_thresholding_quirk(); + for (bank = 0; bank < mca_cfg.banks; ++bank) { if (mce_flags.smca) smca_configure(bank, cpu); diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c index d0c5416..6063ae2 100644 --- a/arch/x86/kernel/cpu/mce/core.c +++ b/arch/x86/kernel/cpu/mce/core.c @@ -1611,35 +1611,6 @@ static int __mcheck_cpu_apply_quirks(struct cpuinfo_x86 *c) if (c->x86 == 0x15 && c->x86_model <= 0xf) mce_flags.overflow_recov = 1; - /* -* Turn off MC4_MISC thresholding banks on all models since -* they're not supported there. -*/ - if (c->x86 == 0x15) { - int i; - u64 hwcr; - bool need_toggle; - u32 msrs[] = { - 0x0413, /* MC4_MISC0 */ - 0xc408, /* MC4_MISC1 */ - }; - - rdmsrl(MSR_K7_HWCR, hwcr); - - /* McStatusWrEn has to be set */ - need_toggle = !(hwcr & BIT(18)); - - if (need_toggle) - wrmsrl(MSR_K7_HWCR, hwcr | BIT(18)); - - /* Clear CntP bit safely */ - for (i = 0; i < ARRAY_SIZE(msrs); i++) - msr_clear_bit(msrs[i], 62); - - /* restore old settings */ - if (need_toggle) - wrmsrl(MSR_K7_HWCR, hwcr); - } } if (c->x86_vendor == X86_VENDOR_INTEL) { -- 2.7.4
[PATCH 1/2] x86/mce/amd: apply MC4_MISC thresholding to all models of family 15
Its evident from various forums and logs that MC4_MISC thresholding is not supported for the family 15 processors, hence skip the x86_model check while applying quirk. Changelog[v2]: - reword commit message to adhere to coding standards - remove check of model range Signed-off-by: Shirish S --- arch/x86/kernel/cpu/mce/core.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c index 672c722..d0c5416 100644 --- a/arch/x86/kernel/cpu/mce/core.c +++ b/arch/x86/kernel/cpu/mce/core.c @@ -1612,11 +1612,10 @@ static int __mcheck_cpu_apply_quirks(struct cpuinfo_x86 *c) mce_flags.overflow_recov = 1; /* -* Turn off MC4_MISC thresholding banks on those models since +* Turn off MC4_MISC thresholding banks on all models since * they're not supported there. */ - if (c->x86 == 0x15 && - (c->x86_model >= 0x10 && c->x86_model <= 0x1f)) { + if (c->x86 == 0x15) { int i; u64 hwcr; bool need_toggle; -- 2.7.4
[PATCH 0/2] x86/mce/amd: apply missing quirks to family 15 models (v2)
Below patch series applies to family 15 CPU's of AMD platform, to address a consistent warning of: "[Firmware Bug]: cpu 0, invalid threshold interrupt offset ..." at every boot and every resume, which is misguiding as the reason is not a Firmware Bug but "MC4_MISC thresholding quirk" not being apporpriately applied. Shirish S (2): x86/mce/amd: apply MC4_MISC thresholding to all models of family 15 x86/mce/amd: carve out MC4_MISC thresholding quirk arch/x86/kernel/cpu/mce/amd.c | 34 ++ arch/x86/kernel/cpu/mce/core.c | 30 -- 2 files changed, 34 insertions(+), 30 deletions(-) -- 2.7.4
[PATCH 3/3] x86/mce/amd: apply MC4_MISC thresholding quirk in resume path
There are 2 code paths leading to mce_amd_feature_init() as below. 1) S5 -> S0: (boot) secondary_startup_64 -> start_kernel -> identify_boot_cpu -> identify_cpu -> mcheck_cpu_init (calls __mcheck_cpu_apply_quirks before) -> mce_amd_feature_init 2) S3 -> S0: (resume) syscore_resume -> mce_syscore_resume -> mce_amd_feature_init Its clear that the quirks are not applied in S3 -> S0 path, hence apply the same. Signed-off-by: Shirish S --- arch/x86/kernel/cpu/mce/amd.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c index 89298c8..ab1b12a 100644 --- a/arch/x86/kernel/cpu/mce/amd.c +++ b/arch/x86/kernel/cpu/mce/amd.c @@ -552,6 +552,12 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c) unsigned int bank, block, cpu = smp_processor_id(); int offset = -1; + /* +* mcheck_cpu_init() is not called when called by mce_syscore_resume(), +* hence re-apply quirks, to be on safer side. +*/ + quirk_fam15_mc4_misc_thresholding(); + for (bank = 0; bank < mca_cfg.banks; ++bank) { if (mce_flags.smca) smca_configure(bank, cpu); -- 2.7.4
[PATCH 0/3] x86/mce/amd: apply missing quirks to family 15 models (v2)
Below patch series applies to family 15 CPU's of AMD platform, to address a consistent warning of: "[Firmware Bug]: cpu 0, invalid threshold interrupt offset ..." at every boot and every resume, which is misguiding as the reason is not a Firmware Bug but "MC4_MISC thresholding quirk" not being apporpriately applied. Shirish S (3): x86/mce/amd: apply MC4_MISC thresholding to all models of family 15 x86/mce/amd: carve out MC4_MISC thresholding quirk x86/mce/amd: apply MC4_MISC thresholding quirk in resume path arch/x86/include/asm/mce.h | 1 + arch/x86/kernel/cpu/mce/amd.c | 6 arch/x86/kernel/cpu/mce/core.c | 65 +++--- 3 files changed, 42 insertions(+), 30 deletions(-) -- 2.7.4
[PATCH 1/3] x86/mce/amd: apply MC4_MISC thresholding to all models of family 15
Its evident from various forums and logs that MC4_MISC thresholding is not supported for the family 15 processors, hence skip the x86_model check while applying quirk. Changelog[v2]: - reword commit message to adhere to coding standards - remove check of model range Signed-off-by: Shirish S --- arch/x86/kernel/cpu/mce/core.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c index 672c722..d0c5416 100644 --- a/arch/x86/kernel/cpu/mce/core.c +++ b/arch/x86/kernel/cpu/mce/core.c @@ -1612,11 +1612,10 @@ static int __mcheck_cpu_apply_quirks(struct cpuinfo_x86 *c) mce_flags.overflow_recov = 1; /* -* Turn off MC4_MISC thresholding banks on those models since +* Turn off MC4_MISC thresholding banks on all models since * they're not supported there. */ - if (c->x86 == 0x15 && - (c->x86_model >= 0x10 && c->x86_model <= 0x1f)) { + if (c->x86 == 0x15) { int i; u64 hwcr; bool need_toggle; -- 2.7.4
[PATCH 2/3] x86/mce/amd: carve out MC4_MISC thresholding quirk
MC4_MISC thresholding quirk needs to be applied during S5 -> S0 and S3 -> S0 state transitions, which follow different code paths, hence carve it out so as to facilitate its application in both scenarios. Signed-off-by: Shirish S --- arch/x86/include/asm/mce.h | 1 + arch/x86/kernel/cpu/mce/core.c | 64 +++--- 2 files changed, 36 insertions(+), 29 deletions(-) diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h index c1a812b..328b65c 100644 --- a/arch/x86/include/asm/mce.h +++ b/arch/x86/include/asm/mce.h @@ -216,6 +216,7 @@ int umc_normaddr_to_sysaddr(u64 norm_addr, u16 nid, u8 umc, u64 *sys_addr); static inline void mce_amd_feature_init(struct cpuinfo_x86 *c) { } static inline int umc_normaddr_to_sysaddr(u64 norm_addr, u16 nid, u8 umc, u64 *sys_addr) { return -EINVAL; }; #endif +void quirk_fam15_mc4_misc_thresholding(void); static inline void mce_hygon_feature_init(struct cpuinfo_x86 *c) { return mce_amd_feature_init(c); } diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c index d0c5416..51f61cf 100644 --- a/arch/x86/kernel/cpu/mce/core.c +++ b/arch/x86/kernel/cpu/mce/core.c @@ -1570,6 +1570,39 @@ static void quirk_sandybridge_ifu(int bank, struct mce *m, struct pt_regs *regs) m->cs = regs->cs; } +/* + * Turn off MC4_MISC thresholding banks on all family 15 models since + * they're not supported there. + */ +void quirk_fam15_mc4_misc_thresholding(void) +{ + if (boot_cpu_data.x86 == 0x15) { + int i; + u64 hwcr; + bool need_toggle; + u32 msrs[] = { + 0x0413, /* MC4_MISC0 */ + 0xc408, /* MC4_MISC1 */ + }; + + rdmsrl(MSR_K7_HWCR, hwcr); + + /* McStatusWrEn has to be set */ + need_toggle = !(hwcr & BIT(18)); + + if (need_toggle) + wrmsrl(MSR_K7_HWCR, hwcr | BIT(18)); + + /* Clear CntP bit safely */ + for (i = 0; i < ARRAY_SIZE(msrs); i++) + msr_clear_bit(msrs[i], 62); + + /* restore old settings */ + if (need_toggle) + wrmsrl(MSR_K7_HWCR, hwcr); + } +} + /* Add per CPU specific workarounds here */ static int __mcheck_cpu_apply_quirks(struct cpuinfo_x86 *c) { @@ -1611,35 +1644,8 @@ static int __mcheck_cpu_apply_quirks(struct cpuinfo_x86 *c) if (c->x86 == 0x15 && c->x86_model <= 0xf) mce_flags.overflow_recov = 1; - /* -* Turn off MC4_MISC thresholding banks on all models since -* they're not supported there. -*/ - if (c->x86 == 0x15) { - int i; - u64 hwcr; - bool need_toggle; - u32 msrs[] = { - 0x0413, /* MC4_MISC0 */ - 0xc408, /* MC4_MISC1 */ - }; - - rdmsrl(MSR_K7_HWCR, hwcr); - - /* McStatusWrEn has to be set */ - need_toggle = !(hwcr & BIT(18)); - - if (need_toggle) - wrmsrl(MSR_K7_HWCR, hwcr | BIT(18)); - - /* Clear CntP bit safely */ - for (i = 0; i < ARRAY_SIZE(msrs); i++) - msr_clear_bit(msrs[i], 62); - - /* restore old settings */ - if (need_toggle) - wrmsrl(MSR_K7_HWCR, hwcr); - } + quirk_fam15_mc4_misc_thresholding(); + } if (c->x86_vendor == X86_VENDOR_INTEL) { -- 2.7.4
[PATCH 2/2] x86/mce/amd: Ensure quirks are applied in resume path as well
This patch adds threshold quirk applicable for family 15 in resume path as well, since mce_amd_feature_init() does not have quirks applied when originating from mce_syscore_resume(), resulting in the below message at every successful resume: "[Firmware Bug]: cpu 0, invalid threshold interrupt offset ..." Signed-off-by: Shirish S --- arch/x86/kernel/cpu/mce/amd.c | 34 ++ 1 file changed, 34 insertions(+) diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c index 89298c8..27cbf66 100644 --- a/arch/x86/kernel/cpu/mce/amd.c +++ b/arch/x86/kernel/cpu/mce/amd.c @@ -545,6 +545,34 @@ prepare_threshold_block(unsigned int bank, unsigned int block, u32 addr, return offset; } +void disable_err_thresholding(struct cpuinfo_x86 *c) +{ + int i; + u64 hwcr; + bool need_toggle; + u32 msrs[] = { + 0x0413, /* MC4_MISC0 */ + 0xc408, /* MC4_MISC1 */ + }; + + if (c->x86_model >= 0x10 && c->x86_model <= 0x7f) { + rdmsrl(MSR_K7_HWCR, hwcr); + + /* McStatusWrEn has to be set */ + need_toggle = !(hwcr & BIT(18)); + + if (need_toggle) + wrmsrl(MSR_K7_HWCR, hwcr | BIT(18)); + + /* Clear CntP bit safely */ + for (i = 0; i < ARRAY_SIZE(msrs); i++) + msr_clear_bit(msrs[i], 62); + + /* restore old settings */ + if (need_toggle) + wrmsrl(MSR_K7_HWCR, hwcr); + } +} /* cpu init entry point, called from mce.c with preempt off */ void mce_amd_feature_init(struct cpuinfo_x86 *c) { @@ -552,6 +580,12 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c) unsigned int bank, block, cpu = smp_processor_id(); int offset = -1; + /* Disable error thresholding bank in S3 resume path as well, +* for 15h family +*/ + if (c->x86 == 0x15) + disable_err_thresholding(c); + for (bank = 0; bank < mca_cfg.banks; ++bank) { if (mce_flags.smca) smca_configure(bank, cpu); -- 2.7.4
[PATCH 1/2] x86/mce/amd: Extend "Disable error thresholding bank 4" to more models
The below patch added this quirk only for the first generation of family 15 processors, over time its noticed that its required for later generations too. "575203b4747c x86, MCE, AMD: Disable error thresholding bank 4 on some models" This patch extends the quirk to make it applicable till 7th Generation, so as to address the below warning at boot: "[Firmware Bug]: cpu 0, invalid threshold interrupt offset ..." Signed-off-by: Shirish S --- arch/x86/kernel/cpu/mce/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c index 672c722..051b536 100644 --- a/arch/x86/kernel/cpu/mce/core.c +++ b/arch/x86/kernel/cpu/mce/core.c @@ -1616,7 +1616,7 @@ static int __mcheck_cpu_apply_quirks(struct cpuinfo_x86 *c) * they're not supported there. */ if (c->x86 == 0x15 && - (c->x86_model >= 0x10 && c->x86_model <= 0x1f)) { + (c->x86_model >= 0x10 && c->x86_model <= 0x7f)) { int i; u64 hwcr; bool need_toggle; -- 2.7.4
[PATCH 0/2] x86/mce/amd: apply missing quirks for family 15 models
This patch series applies to family 15 CPU's of AMD platforms, so as to address a consistent warning of "[Firmware Bug]: cpu 0, invalid threshold interrupt offset" at every boot and upon completiong of successful S3 cycle, due to a missing quirk, which was not extended to newer models and also not applied in resume path. Shirish S (2): x86/mce/amd: Extend "Disable error thresholding bank 4" to more models x86/mce/amd: Ensure quirks are applied in resume path as well arch/x86/kernel/cpu/mce/amd.c | 34 ++ arch/x86/kernel/cpu/mce/core.c | 2 +- 2 files changed, 35 insertions(+), 1 deletion(-) -- 2.7.4
[PATCH 2/2] x86/mce/amd: Ensure quirks are applied in resume path as well
This patch adds threshold quirk applicable for family 15 in resume path as well, since mce_amd_feature_init() does not have quirks applied when originating from mce_syscore_resume(), resulting in the below message at every successful resume: "[Firmware Bug]: cpu 0, invalid threshold interrupt offset ..." Signed-off-by: Shirish S --- arch/x86/kernel/cpu/mce/amd.c | 34 ++ 1 file changed, 34 insertions(+) diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c index 89298c8..27cbf66 100644 --- a/arch/x86/kernel/cpu/mce/amd.c +++ b/arch/x86/kernel/cpu/mce/amd.c @@ -545,6 +545,34 @@ prepare_threshold_block(unsigned int bank, unsigned int block, u32 addr, return offset; } +void disable_err_thresholding(struct cpuinfo_x86 *c) +{ + int i; + u64 hwcr; + bool need_toggle; + u32 msrs[] = { + 0x0413, /* MC4_MISC0 */ + 0xc408, /* MC4_MISC1 */ + }; + + if (c->x86_model >= 0x10 && c->x86_model <= 0x7f) { + rdmsrl(MSR_K7_HWCR, hwcr); + + /* McStatusWrEn has to be set */ + need_toggle = !(hwcr & BIT(18)); + + if (need_toggle) + wrmsrl(MSR_K7_HWCR, hwcr | BIT(18)); + + /* Clear CntP bit safely */ + for (i = 0; i < ARRAY_SIZE(msrs); i++) + msr_clear_bit(msrs[i], 62); + + /* restore old settings */ + if (need_toggle) + wrmsrl(MSR_K7_HWCR, hwcr); + } +} /* cpu init entry point, called from mce.c with preempt off */ void mce_amd_feature_init(struct cpuinfo_x86 *c) { @@ -552,6 +580,12 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c) unsigned int bank, block, cpu = smp_processor_id(); int offset = -1; + /* Disable error thresholding bank in S3 resume path as well, +* for 15h family +*/ + if (c->x86 == 0x15) + disable_err_thresholding(c); + for (bank = 0; bank < mca_cfg.banks; ++bank) { if (mce_flags.smca) smca_configure(bank, cpu); -- 2.7.4
[PATCH 1/2] x86/mce/amd: Extend "Disable error thresholding bank 4" to more models
The below patch added this quirk only for the first generation of family 15 processors, over time its noticed that its required for later generations too. "575203b4747c x86, MCE, AMD: Disable error thresholding bank 4 on some models" This patch extends the quirk to make it applicable till 7th Generation, so as to address the below warning at boot: "[Firmware Bug]: cpu 0, invalid threshold interrupt offset ..." Signed-off-by: Shirish S --- arch/x86/kernel/cpu/mce/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c index 672c722..051b536 100644 --- a/arch/x86/kernel/cpu/mce/core.c +++ b/arch/x86/kernel/cpu/mce/core.c @@ -1616,7 +1616,7 @@ static int __mcheck_cpu_apply_quirks(struct cpuinfo_x86 *c) * they're not supported there. */ if (c->x86 == 0x15 && - (c->x86_model >= 0x10 && c->x86_model <= 0x1f)) { + (c->x86_model >= 0x10 && c->x86_model <= 0x7f)) { int i; u64 hwcr; bool need_toggle; -- 2.7.4
Re: [PATCH] tpm: Fix NULL pointer dereference in tpm_transmit()
On 7/4/2018 10:43 PM, Jarkko Sakkinen wrote: On Wed, Jul 04, 2018 at 02:33:40PM +0530, Shirish S wrote: During system shutdown, tpm_class_shutdown() when called with TPM_CHIP_FLAG_TPM2 flag set, makes chip->ops NULL. However tpm_chip_unregister() called later in shutdown sequence tries to access chip->ops in tpm_try_transmit() leading the NULL pointer dereference. This patch fixes this issue. Below is the trace for reference: BUG: unable to handle kernel NULL pointer dereference at 0048 IP: tpm_transmit+0x267/0x565 PGD 0 P4D 0 Oops: [#1] PREEMPT SMP NOPTI ... task: 937c847fe580 task.stack: a79f80b04000 RIP: 0010:tpm_transmit+0x267/0x565 RSP: 0018:a79f80b07c08 EFLAGS: 00010286 RAX: RBX: 937ca9bc8000 RCX: 937c847fe580 RDX: RSI: 0002 RDI: 98e3cd40 RBP: a79f80b07c88 R08: 0001fff4 R09: R10: R11: R12: a79f80b07cd4 R13: 008c R14: ffc3 R15: FS: 7ef31f747740() GS:937caed0() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 0048 CR3: 0001243d2000 CR4: 001406e0 Call Trace: tpm_transmit_cmd+0x25/0x70 tpm2_shutdown+0x69/0xa3 ? __radix_tree_replace+0xd9/0x120 ? idr_replace_ext+0x92/0xb6 tpm_chip_unregister+0xaa/0xdb cr50_i2c_shutdown+0x1e/0x41 device_shutdown+0x157/0x193 kernel_power_off+0x35/0x6e SYSC_reboot+0x120/0x1a3 ? do_writepages+0x36/0x6e ? do_writepages+0x36/0x6e ? sync_inodes_one_sb+0x17/0x17 ? _raw_spin_unlock+0xe/0x20 ? iput+0x87/0x1bd do_syscall_64+0x64/0x72 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 Signed-off-by: Shirish S --- drivers/char/tpm/tpm-interface.c | 8 1 file changed, 8 insertions(+) diff --git a/drivers/char/tpm/tpm-interface.c b/drivers/char/tpm/tpm-interface.c index e32f6e8..b14d196 100644 --- a/drivers/char/tpm/tpm-interface.c +++ b/drivers/char/tpm/tpm-interface.c @@ -411,6 +411,14 @@ static ssize_t tpm_try_transmit(struct tpm_chip *chip, unsigned long stop; bool need_locality; + /* chip->ops is made NULL in tpm_class_shutdown() +* This case is hit when tpm_chip_unregister() is called post +* tpm_class_shutdown(), hence exit early and return +* transmit operation not permitted +*/ + if (!chip->ops) + return -EPERM; + rc = tpm_validate_command(chip, space, buf, bufsiz); if (rc == -EINVAL) return rc; -- 2.7.4 What is cr50 anyway? Please ignore this patch, i realised the cr50 is not yet upstreamed. Hence the issue lies in in cr50 and not tpm driver. Thanks. Regards, Shirish S /Jarkko -- Regards, Shirish S
Re: [PATCH] tpm: Fix NULL pointer dereference in tpm_transmit()
On 7/4/2018 10:43 PM, Jarkko Sakkinen wrote: On Wed, Jul 04, 2018 at 02:33:40PM +0530, Shirish S wrote: During system shutdown, tpm_class_shutdown() when called with TPM_CHIP_FLAG_TPM2 flag set, makes chip->ops NULL. However tpm_chip_unregister() called later in shutdown sequence tries to access chip->ops in tpm_try_transmit() leading the NULL pointer dereference. This patch fixes this issue. Below is the trace for reference: BUG: unable to handle kernel NULL pointer dereference at 0048 IP: tpm_transmit+0x267/0x565 PGD 0 P4D 0 Oops: [#1] PREEMPT SMP NOPTI ... task: 937c847fe580 task.stack: a79f80b04000 RIP: 0010:tpm_transmit+0x267/0x565 RSP: 0018:a79f80b07c08 EFLAGS: 00010286 RAX: RBX: 937ca9bc8000 RCX: 937c847fe580 RDX: RSI: 0002 RDI: 98e3cd40 RBP: a79f80b07c88 R08: 0001fff4 R09: R10: R11: R12: a79f80b07cd4 R13: 008c R14: ffc3 R15: FS: 7ef31f747740() GS:937caed0() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 0048 CR3: 0001243d2000 CR4: 001406e0 Call Trace: tpm_transmit_cmd+0x25/0x70 tpm2_shutdown+0x69/0xa3 ? __radix_tree_replace+0xd9/0x120 ? idr_replace_ext+0x92/0xb6 tpm_chip_unregister+0xaa/0xdb cr50_i2c_shutdown+0x1e/0x41 device_shutdown+0x157/0x193 kernel_power_off+0x35/0x6e SYSC_reboot+0x120/0x1a3 ? do_writepages+0x36/0x6e ? do_writepages+0x36/0x6e ? sync_inodes_one_sb+0x17/0x17 ? _raw_spin_unlock+0xe/0x20 ? iput+0x87/0x1bd do_syscall_64+0x64/0x72 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 Signed-off-by: Shirish S --- drivers/char/tpm/tpm-interface.c | 8 1 file changed, 8 insertions(+) diff --git a/drivers/char/tpm/tpm-interface.c b/drivers/char/tpm/tpm-interface.c index e32f6e8..b14d196 100644 --- a/drivers/char/tpm/tpm-interface.c +++ b/drivers/char/tpm/tpm-interface.c @@ -411,6 +411,14 @@ static ssize_t tpm_try_transmit(struct tpm_chip *chip, unsigned long stop; bool need_locality; + /* chip->ops is made NULL in tpm_class_shutdown() +* This case is hit when tpm_chip_unregister() is called post +* tpm_class_shutdown(), hence exit early and return +* transmit operation not permitted +*/ + if (!chip->ops) + return -EPERM; + rc = tpm_validate_command(chip, space, buf, bufsiz); if (rc == -EINVAL) return rc; -- 2.7.4 What is cr50 anyway? Please ignore this patch, i realised the cr50 is not yet upstreamed. Hence the issue lies in in cr50 and not tpm driver. Thanks. Regards, Shirish S /Jarkko -- Regards, Shirish S