Re: [PATCH] powerpc/mce: Fix access error in mce handler

2021-09-07 Thread Ganesh

On 9/8/21 11:10 AM, Michael Ellerman wrote:


Ganesh  writes:

On 9/6/21 6:03 PM, Michael Ellerman wrote:


Ganesh Goudar  writes

Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
CPU: 5 PID: 1883 Comm: insmod Tainted: GOE 5.14.0-mce+ #137
NIP:  c0735d60 LR: c0318640 CTR: 
REGS: c0001ebff9a0 TRAP: 0300   Tainted: G   OE  (5.14.0-mce+)
MSR:  80001003   CR: 28008228  XER: 0001
CFAR: c031863c DAR: c0027fa8fe08 DSISR: 4000 IRQMASK: 0
GPR00: c03186d0 c0001ebffc40 c1b0df00 c16337e8
GPR04: c16337e8 c0027fa8fe08 0023 c16337f0
GPR08: 0023 c12ffe08  c00801460240
GPR12:  c0001ec9a900 c0002ac4bd00 
GPR16: 05a0 c008006b c008006b05a0 c0ff3068
GPR20: c0002ac4bbc0 0001 c0002ac4bbc0 c00801490298
GPR24: c00801490108 c1636198 c00801470090 c00801470058
GPR28: 0510 c0080100 c0080819 0019
NIP [c0735d60] llist_add_batch+0x0/0x40
LR [c0318640] __irq_work_queue_local+0x70/0xc0
Call Trace:
[c0001ebffc40] [c0001ebffc0c] 0xc0001ebffc0c (unreliable)
[c0001ebffc60] [c03186d0] irq_work_queue+0x40/0x70
[c0001ebffc80] [c004425c] machine_check_queue_event+0xbc/0xd0
[c0001ebffcf0] [c000838c] machine_check_early_common+0x16c/0x1f4

Fixes: 74c3354bc1d89 ("powerpc/pseries/mce: restore msr before returning from 
handler")

Please explain in more detail why that commit caused this breakage.

After enabling translation in mce_handle_error() we used to leave it enabled to 
avoid
crashing here, but now with this commit we are restoring the MSR to disable 
translation.

Are you sure we left the MMU enabled to avoid crashing there, or we just
left it enabled by accident?


No, I think we left it enabled intentionally, I mentioned about leaving it 
enabled
in my comment and commit message of a95a0a1654 "powerpc/pseries: Fix MCE handling on 
pseries".



But yeah, previously the MMU was enabled when we got here whereas now
it's not, because of that change.


Missed to mention it in commit log, I will add it.

Thanks.


diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
index 47a683cd00d2..9d1e39d42e3e 100644
--- a/arch/powerpc/kernel/mce.c
+++ b/arch/powerpc/kernel/mce.c
@@ -249,6 +249,7 @@ void machine_check_queue_event(void)
   {
int index;
struct machine_check_event evt;
+   unsigned long msr;
   
   	if (!get_mce_event(&evt, MCE_EVENT_RELEASE))

return;
@@ -262,8 +263,19 @@ void machine_check_queue_event(void)
memcpy(&local_paca->mce_info->mce_event_queue[index],
   &evt, sizeof(evt));
   
-	/* Queue irq work to process this event later. */

-   irq_work_queue(&mce_event_process_work);
+   /* Queue irq work to process this event later. Before
+* queuing the work enable translation for non radix LPAR,
+* as irq_work_queue may try to access memory outside RMO
+* region.
+*/
+   if (!radix_enabled() && firmware_has_feature(FW_FEATURE_LPAR)) {
+   msr = mfmsr();
+   mtmsr(msr | MSR_IR | MSR_DR);
+   irq_work_queue(&mce_event_process_work);
+   mtmsr(msr);
+   } else {
+   irq_work_queue(&mce_event_process_work);
+   }
   }

We already went to virtual mode and queued (different) irq work in
arch/powerpc/platforms/pseries/ras.c:mce_handle_error()

We also called save_mce_event() which also might have queued irq work,
via machine_check_ue_event().

So it really feels like something about the design is wrong if we have
to go to virtual mode again and queue more irq work here.

I guess we can probably merge this as a backportable fix, doing anything
else would be a bigger change.

I agree.


Looking at ras.c there's the comment:

 * Enable translation as we will be accessing per-cpu variables
 * in save_mce_event() which may fall outside RMO region, also

But AFAICS it's only irq_work_queue() that touches anything percpu?

Yeah, we left the comment unchanged after doing some modifications around it,
It needs to be updated, ill send a separate patch for it.

Thanks.

I see some other comments that look out of date, ie. the one above
machine_check_process_queued_event() mentions syscall exit, which is no
longer true.


ill take care of it.



There's also comments in pseries/ras.c about fwnmi_release_errinfo()
crashing in real mode, but we call it in real mode now so that must be
fixed?


Yes, it is fixed now.




So maybe we should just not be using irq_work_queue(). It's a pretty
thin wrapper around set_dec(1), perhaps we just need to hand-roll some
real-mode friendly way of doing that.

You mean, have separate 

Re: [PATCH] powerpc/mce: Fix access error in mce handler

2021-09-07 Thread Michael Ellerman
Ganesh  writes:
> On 9/6/21 6:03 PM, Michael Ellerman wrote:
>
>> Ganesh Goudar  writes:
>>> We queue an irq work for deferred processing of mce event
>>> in realmode mce handler, where translation is disabled.
>>> Queuing of the work may result in accessing memory outside
>>> RMO region, such access needs the translation to be enabled
>>> for an LPAR running with hash mmu else the kernel crashes.
>>>
>>> So enable the translation before queuing the work.
>>>
>>> Without this change following trace is seen on injecting machine
>>> check error in an LPAR running with hash mmu.
>> What type of error are you injecting?
>
> SLB multihit in kernel mode.
>
>>
>>> Oops: Kernel access of bad area, sig: 11 [#1]
>>> LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
>>> CPU: 5 PID: 1883 Comm: insmod Tainted: GOE 5.14.0-mce+ #137
>>> NIP:  c0735d60 LR: c0318640 CTR: 
>>> REGS: c0001ebff9a0 TRAP: 0300   Tainted: G   OE  (5.14.0-mce+)
>>> MSR:  80001003   CR: 28008228  XER: 0001
>>> CFAR: c031863c DAR: c0027fa8fe08 DSISR: 4000 IRQMASK: 0
>>> GPR00: c03186d0 c0001ebffc40 c1b0df00 c16337e8
>>> GPR04: c16337e8 c0027fa8fe08 0023 c16337f0
>>> GPR08: 0023 c12ffe08  c00801460240
>>> GPR12:  c0001ec9a900 c0002ac4bd00 
>>> GPR16: 05a0 c008006b c008006b05a0 c0ff3068
>>> GPR20: c0002ac4bbc0 0001 c0002ac4bbc0 c00801490298
>>> GPR24: c00801490108 c1636198 c00801470090 c00801470058
>>> GPR28: 0510 c0080100 c0080819 0019
>>> NIP [c0735d60] llist_add_batch+0x0/0x40
>>> LR [c0318640] __irq_work_queue_local+0x70/0xc0
>>> Call Trace:
>>> [c0001ebffc40] [c0001ebffc0c] 0xc0001ebffc0c (unreliable)
>>> [c0001ebffc60] [c03186d0] irq_work_queue+0x40/0x70
>>> [c0001ebffc80] [c004425c] machine_check_queue_event+0xbc/0xd0
>>> [c0001ebffcf0] [c000838c] machine_check_early_common+0x16c/0x1f4
>>>
>>> Fixes: 74c3354bc1d89 ("powerpc/pseries/mce: restore msr before returning 
>>> from handler")
>> Please explain in more detail why that commit caused this breakage.
>
> After enabling translation in mce_handle_error() we used to leave it enabled 
> to avoid
> crashing here, but now with this commit we are restoring the MSR to disable 
> translation.

Are you sure we left the MMU enabled to avoid crashing there, or we just
left it enabled by accident?

But yeah, previously the MMU was enabled when we got here whereas now
it's not, because of that change.

> Missed to mention it in commit log, I will add it.

Thanks.

>>> diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
>>> index 47a683cd00d2..9d1e39d42e3e 100644
>>> --- a/arch/powerpc/kernel/mce.c
>>> +++ b/arch/powerpc/kernel/mce.c
>>> @@ -249,6 +249,7 @@ void machine_check_queue_event(void)
>>>   {
>>> int index;
>>> struct machine_check_event evt;
>>> +   unsigned long msr;
>>>   
>>> if (!get_mce_event(&evt, MCE_EVENT_RELEASE))
>>> return;
>>> @@ -262,8 +263,19 @@ void machine_check_queue_event(void)
>>> memcpy(&local_paca->mce_info->mce_event_queue[index],
>>>&evt, sizeof(evt));
>>>   
>>> -   /* Queue irq work to process this event later. */
>>> -   irq_work_queue(&mce_event_process_work);
>>> +   /* Queue irq work to process this event later. Before
>>> +* queuing the work enable translation for non radix LPAR,
>>> +* as irq_work_queue may try to access memory outside RMO
>>> +* region.
>>> +*/
>>> +   if (!radix_enabled() && firmware_has_feature(FW_FEATURE_LPAR)) {
>>> +   msr = mfmsr();
>>> +   mtmsr(msr | MSR_IR | MSR_DR);
>>> +   irq_work_queue(&mce_event_process_work);
>>> +   mtmsr(msr);
>>> +   } else {
>>> +   irq_work_queue(&mce_event_process_work);
>>> +   }
>>>   }
>> We already went to virtual mode and queued (different) irq work in
>> arch/powerpc/platforms/pseries/ras.c:mce_handle_error()
>>
>> We also called save_mce_event() which also might have queued irq work,
>> via machine_check_ue_event().
>>
>> So it really feels like something about the design is wrong if we have
>> to go to virtual mode again and queue more irq work here.
>>
>> I guess we can probably merge this as a backportable fix, doing anything
>> else would be a bigger change.
>
> I agree.
>
>>
>> Looking at ras.c there's the comment:
>>
>>   * Enable translation as we will be accessing per-cpu variables
>>   * in save_mce_event() which may fall outside RMO region, also
>>
>> But AFAICS it's only irq_work_queue() that touches anything percpu?
>
> Yeah, we left the comment unchanged after doing some modifications around it,
> It needs to be updated, ill send a separate patch for it.

Thanks.

I see 

Re: [PATCH] powerpc/mce: Fix access error in mce handler

2021-09-07 Thread Ganesh


On 9/6/21 6:03 PM, Michael Ellerman wrote:


Ganesh Goudar  writes:

We queue an irq work for deferred processing of mce event
in realmode mce handler, where translation is disabled.
Queuing of the work may result in accessing memory outside
RMO region, such access needs the translation to be enabled
for an LPAR running with hash mmu else the kernel crashes.

So enable the translation before queuing the work.

Without this change following trace is seen on injecting machine
check error in an LPAR running with hash mmu.

What type of error are you injecting?


SLB multihit in kernel mode.




Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
CPU: 5 PID: 1883 Comm: insmod Tainted: GOE 5.14.0-mce+ #137
NIP:  c0735d60 LR: c0318640 CTR: 
REGS: c0001ebff9a0 TRAP: 0300   Tainted: G   OE  (5.14.0-mce+)
MSR:  80001003   CR: 28008228  XER: 0001
CFAR: c031863c DAR: c0027fa8fe08 DSISR: 4000 IRQMASK: 0
GPR00: c03186d0 c0001ebffc40 c1b0df00 c16337e8
GPR04: c16337e8 c0027fa8fe08 0023 c16337f0
GPR08: 0023 c12ffe08  c00801460240
GPR12:  c0001ec9a900 c0002ac4bd00 
GPR16: 05a0 c008006b c008006b05a0 c0ff3068
GPR20: c0002ac4bbc0 0001 c0002ac4bbc0 c00801490298
GPR24: c00801490108 c1636198 c00801470090 c00801470058
GPR28: 0510 c0080100 c0080819 0019
NIP [c0735d60] llist_add_batch+0x0/0x40
LR [c0318640] __irq_work_queue_local+0x70/0xc0
Call Trace:
[c0001ebffc40] [c0001ebffc0c] 0xc0001ebffc0c (unreliable)
[c0001ebffc60] [c03186d0] irq_work_queue+0x40/0x70
[c0001ebffc80] [c004425c] machine_check_queue_event+0xbc/0xd0
[c0001ebffcf0] [c000838c] machine_check_early_common+0x16c/0x1f4

Fixes: 74c3354bc1d89 ("powerpc/pseries/mce: restore msr before returning from 
handler")

Please explain in more detail why that commit caused this breakage.


After enabling translation in mce_handle_error() we used to leave it enabled to 
avoid
crashing here, but now with this commit we are restoring the MSR to disable 
translation.
Missed to mention it in commit log, I will add it.




diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
index 47a683cd00d2..9d1e39d42e3e 100644
--- a/arch/powerpc/kernel/mce.c
+++ b/arch/powerpc/kernel/mce.c
@@ -249,6 +249,7 @@ void machine_check_queue_event(void)
  {
int index;
struct machine_check_event evt;
+   unsigned long msr;
  
  	if (!get_mce_event(&evt, MCE_EVENT_RELEASE))

return;
@@ -262,8 +263,19 @@ void machine_check_queue_event(void)
memcpy(&local_paca->mce_info->mce_event_queue[index],
   &evt, sizeof(evt));
  
-	/* Queue irq work to process this event later. */

-   irq_work_queue(&mce_event_process_work);
+   /* Queue irq work to process this event later. Before
+* queuing the work enable translation for non radix LPAR,
+* as irq_work_queue may try to access memory outside RMO
+* region.
+*/
+   if (!radix_enabled() && firmware_has_feature(FW_FEATURE_LPAR)) {
+   msr = mfmsr();
+   mtmsr(msr | MSR_IR | MSR_DR);
+   irq_work_queue(&mce_event_process_work);
+   mtmsr(msr);
+   } else {
+   irq_work_queue(&mce_event_process_work);
+   }
  }

We already went to virtual mode and queued (different) irq work in
arch/powerpc/platforms/pseries/ras.c:mce_handle_error()

We also called save_mce_event() which also might have queued irq work,
via machine_check_ue_event().

So it really feels like something about the design is wrong if we have
to go to virtual mode again and queue more irq work here.

I guess we can probably merge this as a backportable fix, doing anything
else would be a bigger change.


I agree.



Looking at ras.c there's the comment:

 * Enable translation as we will be accessing per-cpu variables
 * in save_mce_event() which may fall outside RMO region, also

But AFAICS it's only irq_work_queue() that touches anything percpu?


Yeah, we left the comment unchanged after doing some modifications around it,
It needs to be updated, ill send a separate patch for it.



So maybe we should just not be using irq_work_queue(). It's a pretty
thin wrapper around set_dec(1), perhaps we just need to hand-roll some
real-mode friendly way of doing that.


You mean, have separate queue and run the work from timer handler?



cheers


Re: [PATCH] powerpc/mce: Fix access error in mce handler

2021-09-06 Thread Michael Ellerman
Ganesh Goudar  writes:
> We queue an irq work for deferred processing of mce event
> in realmode mce handler, where translation is disabled.
> Queuing of the work may result in accessing memory outside
> RMO region, such access needs the translation to be enabled
> for an LPAR running with hash mmu else the kernel crashes.
>
> So enable the translation before queuing the work.
>
> Without this change following trace is seen on injecting machine
> check error in an LPAR running with hash mmu.

What type of error are you injecting?

> Oops: Kernel access of bad area, sig: 11 [#1]
> LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
> CPU: 5 PID: 1883 Comm: insmod Tainted: GOE 5.14.0-mce+ #137
> NIP:  c0735d60 LR: c0318640 CTR: 
> REGS: c0001ebff9a0 TRAP: 0300   Tainted: G   OE  (5.14.0-mce+)
> MSR:  80001003   CR: 28008228  XER: 0001
> CFAR: c031863c DAR: c0027fa8fe08 DSISR: 4000 IRQMASK: 0
> GPR00: c03186d0 c0001ebffc40 c1b0df00 c16337e8
> GPR04: c16337e8 c0027fa8fe08 0023 c16337f0
> GPR08: 0023 c12ffe08  c00801460240
> GPR12:  c0001ec9a900 c0002ac4bd00 
> GPR16: 05a0 c008006b c008006b05a0 c0ff3068
> GPR20: c0002ac4bbc0 0001 c0002ac4bbc0 c00801490298
> GPR24: c00801490108 c1636198 c00801470090 c00801470058
> GPR28: 0510 c0080100 c0080819 0019
> NIP [c0735d60] llist_add_batch+0x0/0x40
> LR [c0318640] __irq_work_queue_local+0x70/0xc0
> Call Trace:
> [c0001ebffc40] [c0001ebffc0c] 0xc0001ebffc0c (unreliable)
> [c0001ebffc60] [c03186d0] irq_work_queue+0x40/0x70
> [c0001ebffc80] [c004425c] machine_check_queue_event+0xbc/0xd0
> [c0001ebffcf0] [c000838c] machine_check_early_common+0x16c/0x1f4
>
> Fixes: 74c3354bc1d89 ("powerpc/pseries/mce: restore msr before returning from 
> handler")

Please explain in more detail why that commit caused this breakage.

> diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
> index 47a683cd00d2..9d1e39d42e3e 100644
> --- a/arch/powerpc/kernel/mce.c
> +++ b/arch/powerpc/kernel/mce.c
> @@ -249,6 +249,7 @@ void machine_check_queue_event(void)
>  {
>   int index;
>   struct machine_check_event evt;
> + unsigned long msr;
>  
>   if (!get_mce_event(&evt, MCE_EVENT_RELEASE))
>   return;
> @@ -262,8 +263,19 @@ void machine_check_queue_event(void)
>   memcpy(&local_paca->mce_info->mce_event_queue[index],
>  &evt, sizeof(evt));
>  
> - /* Queue irq work to process this event later. */
> - irq_work_queue(&mce_event_process_work);
> + /* Queue irq work to process this event later. Before
> +  * queuing the work enable translation for non radix LPAR,
> +  * as irq_work_queue may try to access memory outside RMO
> +  * region.
> +  */
> + if (!radix_enabled() && firmware_has_feature(FW_FEATURE_LPAR)) {
> + msr = mfmsr();
> + mtmsr(msr | MSR_IR | MSR_DR);
> + irq_work_queue(&mce_event_process_work);
> + mtmsr(msr);
> + } else {
> + irq_work_queue(&mce_event_process_work);
> + }
>  }

We already went to virtual mode and queued (different) irq work in
arch/powerpc/platforms/pseries/ras.c:mce_handle_error()

We also called save_mce_event() which also might have queued irq work,
via machine_check_ue_event().

So it really feels like something about the design is wrong if we have
to go to virtual mode again and queue more irq work here.

I guess we can probably merge this as a backportable fix, doing anything
else would be a bigger change.

Looking at ras.c there's the comment:

 * Enable translation as we will be accessing per-cpu variables
 * in save_mce_event() which may fall outside RMO region, also

But AFAICS it's only irq_work_queue() that touches anything percpu?

So maybe we should just not be using irq_work_queue(). It's a pretty
thin wrapper around set_dec(1), perhaps we just need to hand-roll some
real-mode friendly way of doing that.

cheers


[PATCH] powerpc/mce: Fix access error in mce handler

2021-09-06 Thread Ganesh Goudar
We queue an irq work for deferred processing of mce event
in realmode mce handler, where translation is disabled.
Queuing of the work may result in accessing memory outside
RMO region, such access needs the translation to be enabled
for an LPAR running with hash mmu else the kernel crashes.

So enable the translation before queuing the work.

Without this change following trace is seen on injecting machine
check error in an LPAR running with hash mmu.

Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
CPU: 5 PID: 1883 Comm: insmod Tainted: GOE 5.14.0-mce+ #137
NIP:  c0735d60 LR: c0318640 CTR: 
REGS: c0001ebff9a0 TRAP: 0300   Tainted: G   OE  (5.14.0-mce+)
MSR:  80001003   CR: 28008228  XER: 0001
CFAR: c031863c DAR: c0027fa8fe08 DSISR: 4000 IRQMASK: 0
GPR00: c03186d0 c0001ebffc40 c1b0df00 c16337e8
GPR04: c16337e8 c0027fa8fe08 0023 c16337f0
GPR08: 0023 c12ffe08  c00801460240
GPR12:  c0001ec9a900 c0002ac4bd00 
GPR16: 05a0 c008006b c008006b05a0 c0ff3068
GPR20: c0002ac4bbc0 0001 c0002ac4bbc0 c00801490298
GPR24: c00801490108 c1636198 c00801470090 c00801470058
GPR28: 0510 c0080100 c0080819 0019
NIP [c0735d60] llist_add_batch+0x0/0x40
LR [c0318640] __irq_work_queue_local+0x70/0xc0
Call Trace:
[c0001ebffc40] [c0001ebffc0c] 0xc0001ebffc0c (unreliable)
[c0001ebffc60] [c03186d0] irq_work_queue+0x40/0x70
[c0001ebffc80] [c004425c] machine_check_queue_event+0xbc/0xd0
[c0001ebffcf0] [c000838c] machine_check_early_common+0x16c/0x1f4

Fixes: 74c3354bc1d89 ("powerpc/pseries/mce: restore msr before returning from 
handler")
Signed-off-by: Ganesh Goudar 
---
 arch/powerpc/kernel/mce.c | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
index 47a683cd00d2..9d1e39d42e3e 100644
--- a/arch/powerpc/kernel/mce.c
+++ b/arch/powerpc/kernel/mce.c
@@ -249,6 +249,7 @@ void machine_check_queue_event(void)
 {
int index;
struct machine_check_event evt;
+   unsigned long msr;
 
if (!get_mce_event(&evt, MCE_EVENT_RELEASE))
return;
@@ -262,8 +263,19 @@ void machine_check_queue_event(void)
memcpy(&local_paca->mce_info->mce_event_queue[index],
   &evt, sizeof(evt));
 
-   /* Queue irq work to process this event later. */
-   irq_work_queue(&mce_event_process_work);
+   /* Queue irq work to process this event later. Before
+* queuing the work enable translation for non radix LPAR,
+* as irq_work_queue may try to access memory outside RMO
+* region.
+*/
+   if (!radix_enabled() && firmware_has_feature(FW_FEATURE_LPAR)) {
+   msr = mfmsr();
+   mtmsr(msr | MSR_IR | MSR_DR);
+   irq_work_queue(&mce_event_process_work);
+   mtmsr(msr);
+   } else {
+   irq_work_queue(&mce_event_process_work);
+   }
 }
 
 void mce_common_process_ue(struct pt_regs *regs,
-- 
2.31.1