[PATCH v2] powerpc/eeh: Permanently disable the removed device

2024-04-22 Thread Ganesh Goudar
if the state is not moved to permanent failure state. Signed-off-by: Ganesh Goudar --- V2: * Elaborate the commit message. * Fix formatting issues in commit message and comments. --- arch/powerpc/kernel/eeh.c| 11 ++- arch/powerpc/kernel/eeh_driver.c | 13 +++-- 2 files changed

Re: [PATCH] powerpc/eeh: Permanently disable the removed device

2024-04-15 Thread Ganesh G R
On 4/9/24 14:37, Michael Ellerman wrote: Hi Ganesh, Ganesh Goudar writes: When a device is hot removed on powernv, the hotplug driver clears the device's state. However, on pseries, if a device is removed by phyp after reaching the error threshold, the kernel remains unaware, leading

[PATCH] powerpc/eeh: Permanently disable the removed device

2024-04-05 Thread Ganesh Goudar
failover. Permanently disable the device if the presence check fails. Signed-off-by: Ganesh Goudar --- arch/powerpc/kernel/eeh.c| 4 +++- arch/powerpc/kernel/eeh_driver.c | 8 +++- 2 files changed, 10 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc

[PATCH 1/1] powerpc/eeh: Enable PHBs to recovery in parallel

2024-02-25 Thread Ganesh Goudar
. Signed-off-by: Ganesh Goudar --- arch/powerpc/include/asm/eeh_event.h | 7 + arch/powerpc/include/asm/pci-bridge.h | 4 +++ arch/powerpc/kernel/eeh_driver.c | 27 +-- arch/powerpc/kernel/eeh_event.c | 38 ++- arch/powerpc/kernel/eeh_pe.c

[PATCH 0/1] Parallel EEH recovery between PHBs

2024-02-25 Thread Ganesh Goudar
. On powernv the improvement is not so significant. Ganesh Goudar (1): powerpc/eeh: Enable PHBs to recovery in parallel arch/powerpc/include/asm/eeh_event.h | 7 + arch/powerpc/include/asm/pci-bridge.h | 4 +++ arch/powerpc/kernel/eeh_driver.c | 27 +-- arch/powerpc/kernel

[RFC PATCH v2 3/3] powerpc/eeh: Asynchronous recovery

2023-07-24 Thread Ganesh Goudar
the constraint, above, the driver handlers are called by traversing the tree of affected PEs from the top, stopping to call handlers (in parallel) when a PE with devices is discovered. When the calls for that PE are complete, traversal continues at each child PE. Signed-off-by: Ganesh Goudar --- arch

[RFC PATCH v2 2/3] powerpc/eeh: Provide a unique ID for each EEH recovery

2023-07-24 Thread Ganesh Goudar
Based on the original work from Sam Bobroff. Give a unique ID to each recovery event, to ease log parsing and prepare for parallel recovery. Also add some new messages with a very simple format that may be useful to log-parsers. Signed-off-by: Ganesh Goudar --- arch/powerpc/include/asm

[RFC PATCH v2 1/3] powerpc/eeh: Synchronization for safety

2023-07-24 Thread Ganesh Goudar
. Care must be taken when ordering these locks against the PCI rescan/remove lock and the device locks to avoid deadlocking. Signed-off-by: Ganesh Goudar --- arch/powerpc/include/asm/eeh.h | 12 +- arch/powerpc/kernel/eeh.c| 112 ++-- arch/powerpc/kernel

[RFC PATCH v2 0/3] Asynchronous EEH recovery

2023-07-24 Thread Ganesh Goudar
, Please comment. Thanks. V2: * Since we now have an event list per PHB, have a per-PHB event list lock. * Appropriate names given to the locks. * Remove stale comments (few more to be removed). * Initialize event_id to 0 instead of 1. * And some cosmetic changes. Ganesh Goudar (3): powerpc/eeh

Re: [RFC 0/3] Asynchronous EEH recovery

2023-07-17 Thread Ganesh G R
On 6/13/23 8:06 AM, Oliver O'Halloran wrote: On Tue, Jun 13, 2023 at 11:44 AM Ganesh Goudar wrote: Hi, EEH recovery is currently serialized and these patches shorten the time taken for EEH recovery by making the recovery to run in parallel. The original author of these patches is Sam

[RFC 2/3] powerpc/eeh: Provide a unique ID for each EEH recovery

2023-06-12 Thread Ganesh Goudar
Based on the original work from Sam Bobroff. Give a unique ID to each recovery event, to ease log parsing and prepare for parallel recovery. Also add some new messages with a very simple format that may be useful to log-parsers. Signed-off-by: Ganesh Goudar --- arch/powerpc/include/asm

[RFC 3/3] powerpc/eeh: Asynchronous recovery

2023-06-12 Thread Ganesh Goudar
the constraint, above, the driver handlers are called by traversing the tree of affected PEs from the top, stopping to call handlers (in parallel) when a PE with devices is discovered. When the calls for that PE are complete, traversal continues at each child PE. Signed-off-by: Ganesh Goudar --- arch

[RFC 1/3] powerpc/eeh: Synchronization for safety

2023-06-12 Thread Ganesh Goudar
. Care must be taken when ordering these locks against the PCI rescan/remove lock and the device locks to avoid deadlocking. Signed-off-by: Ganesh Goudar --- arch/powerpc/include/asm/eeh.h | 6 +- arch/powerpc/kernel/eeh.c| 112 ++-- arch/powerpc/kernel

[RFC 0/3] Asynchronous EEH recovery

2023-06-12 Thread Ganesh Goudar
, Please comment. Thanks. Ganesh Goudar (3): powerpc/eeh: Synchronization for safety powerpc/eeh: Provide a unique ID for each EEH recovery powerpc/eeh: Asynchronous recovery arch/powerpc/include/asm/eeh.h | 7 +- arch/powerpc/include/asm/eeh_event.h | 10

[PATCH] powerpc/eeh: Set channel state after notifying the drivers

2023-02-09 Thread Ganesh Goudar
ermanent failure after notifying the drivers. Fixes: 38ddc011478e ("powerpc/eeh: Make permanently failed devices non-actionable") Suggested-by: Mahesh Salgaonkar Signed-off-by: Ganesh Goudar --- arch/powerpc/kernel/eeh_driver.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)

Re: [PATCH v2] powerpc/mce: log the error for all unrecoverable errors

2023-02-01 Thread Ganesh G R
On 1/31/23 4:59 PM, Michael Ellerman wrote: Ganesh Goudar writes: For all unrecoverable errors we are missing to log the error, Since machine_check_log_err() is not getting called for unrecoverable errors. Raise irq work in save_mce_event() for unrecoverable errors, So that we log the error

[PATCH v3] powerpc/mce: log the error for all unrecoverable errors

2023-02-01 Thread Ganesh Goudar
NIP: [1e48] MCE: CPU24: Initiator CPU MCE: CPU24: Unknown RTAS: event: 5, Type: Platform Error (224), Severity: 3 Signed-off-by: Ganesh Goudar Reviewed-by: Mahesh Salgaonkar --- V3: Rephrasing the commit message. --- arch/powerpc/kernel/mce.c | 10 +++--- 1 file changed, 7

[PATCH v2] powerpc/mce: log the error for all unrecoverable errors

2023-01-27 Thread Ganesh Goudar
/Store (foreign/control memory) [Not recovered] MCE: CPU24: PID: 1589811 Comm: inject-ra-err NIP: [1e48] MCE: CPU24: Initiator CPU MCE: CPU24: Unknown RTAS: event: 5, Type: Platform Error (224), Severity: 3 Signed-off-by: Ganesh Goudar Reviewed-by: Mahesh Salgaonkar --- V2

[PATCH] powerpc/mce: log the error for all unrecoverable errors

2022-11-13 Thread Ganesh Goudar
machine_check_log_err() is not getting called for all unrecoverable errors, And we are missing to log the error. Raise irq work in save_mce_event() for unrecoverable errors, So that we log the error from MCE event handling block in timer handler. Signed-off-by: Ganesh Goudar --- arch/powerpc

[PATCH v3] powerpc/pseries/mce: Avoid instrumentation in realmode

2022-09-26 Thread Ganesh Goudar
KASAN instrumentation. Signed-off-by: Ganesh Goudar --- v2: Force inline few more functions. v3: Adding noinstr to few functions instead of __always_inline. --- arch/powerpc/include/asm/hw_irq.h| 8 arch/powerpc/include/asm/interrupt.h | 2 +- arch/powerpc/include/asm/rtas.h | 4

Re: [PACTH v2] powerpc/pseries/mce: Avoid instrumentation in realmode

2022-09-19 Thread Ganesh
On 9/7/22 09:49, Nicholas Piggin wrote: On Mon Sep 5, 2022 at 4:38 PM AEST, Ganesh Goudar wrote: Part of machine check error handling is done in realmode, As of now instrumentation is not possible for any code that runs in realmode. When MCE is injected on KASAN enabled kernel, crash

Re: [RFC 0/3] Asynchronous EEH recovery

2022-09-15 Thread Ganesh
On 9/2/22 05:49, Jason Gunthorpe wrote: On Tue, Aug 16, 2022 at 08:57:13AM +0530, Ganesh Goudar wrote: Hi, EEH recovery is currently serialized and these patches shorten the time taken for EEH recovery by making the recovery to run in parallel. The original author of these patches is Sam

[PACTH v2] powerpc/pseries/mce: Avoid instrumentation in realmode

2022-09-05 Thread Ganesh Goudar
KASAN instrumentation. Signed-off-by: Ganesh Goudar --- v2: Force inline few more functions. --- arch/powerpc/include/asm/hw_irq.h| 8 arch/powerpc/include/asm/interrupt.h | 2 +- arch/powerpc/include/asm/rtas.h | 4 ++-- arch/powerpc/kernel/rtas.c | 4 ++-- 4 files

[PATCH] powerpc/pseries/mce: Avoid instrumentation in realmode

2022-08-29 Thread Ganesh Goudar
KASAN instrumentation. Signed-off-by: Ganesh Goudar --- arch/powerpc/include/asm/interrupt.h | 2 +- arch/powerpc/include/asm/rtas.h | 4 ++-- arch/powerpc/kernel/rtas.c | 4 ++-- 3 files changed, 5 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/include/asm/interrupt.h b

Re: [6.0-rc1] Kernel crash while running MCE tests

2022-08-22 Thread Ganesh
On 8/22/22 11:01, Sachin Sant wrote: On 19-Aug-2022, at 10:12 AM, Ganesh wrote We'll have to make sure everything get_pseries_errorlog() is either forced inline, or marked noinstr. Making the following functions always_inline and noinstr is fixing the issue. __always_inline

Re: [6.0-rc1] Kernel crash while running MCE tests

2022-08-22 Thread Ganesh
On 8/22/22 11:19, Michael Ellerman wrote: So I guess the compiler has decided not to inline it (why?!), and it is not marked noinstr, so it gets KASAN instrumentation which crashes in real mode. We'll have to make sure everything get_pseries_errorlog() is either forced inline, or marked

Re: [6.0-rc1] Kernel crash while running MCE tests

2022-08-18 Thread Ganesh
On 8/17/22 11:28, Michael Ellerman wrote: Sachin Sant writes: Following crash is seen while running powerpc/mce subtest on a Power10 LPAR. 1..1 # selftests: powerpc/mce: inject-ra-err [ 155.240591] BUG: Unable to handle kernel data access on read at 0xc00e00022d55b503 [ 155.240618]

[RFC 3/3] powerpc/eeh: Asynchronous recovery

2022-08-15 Thread Ganesh Goudar
the constraint, above, the driver handlers are called by traversing the tree of affected PEs from the top, stopping to call handlers (in parallel) when a PE with devices is discovered. When the calls for that PE are complete, traversal continues at each child PE. Signed-off-by: Ganesh Goudar --- arch

[RFC 1/3] powerpc/eeh: Synchronization for safety

2022-08-15 Thread Ganesh Goudar
. Care must be taken when ordering these locks against the PCI rescan/remove lock and the device locks to avoid deadlocking. Signed-off-by: Ganesh Goudar --- arch/powerpc/include/asm/eeh.h | 6 +- arch/powerpc/kernel/eeh.c| 112 ++-- arch/powerpc/kernel

[RFC 0/3] Asynchronous EEH recovery

2022-08-15 Thread Ganesh Goudar
in time taken in EEH recovery, Yet to be tested on powernv. These patches were originally posted as separate RFCs, I think posting them as single series would be more helpful, I know the patches are too big, I will try to logically divide in next iterations. Thanks Ganesh Goudar (3): powerpc

[RFC 2/3] powerpc/eeh: Provide a unique ID for each EEH recovery

2022-08-15 Thread Ganesh Goudar
Based on the original work from Sam Bobroff. Give a unique ID to each recovery event, to ease log parsing and prepare for parallel recovery. Also add some new messages with a very simple format that may be useful to log-parsers. Signed-off-by: Ganesh Goudar --- arch/powerpc/include/asm

Re: [PATCH v3 RESEND 1/3] powerpc/pseries: Parse control memory access error

2022-02-21 Thread Ganesh
On 1/7/22 19:44, Ganesh Goudar wrote: Add support to parse and log control memory access error for pseries. These changes are made according to PAPR v2.11 10.3.2.2.12. Signed-off-by: Ganesh Goudar --- arch/powerpc/platforms/pseries/ras.c | 36 1 file changed

[PATCH v5] powerpc/mce: Avoid using irq_work_queue() in realmode

2022-01-20 Thread Ganesh Goudar
in realmode. To avoid this, program the decrementer and call the event processing functions from timer handler. Signed-off-by: Ganesh Goudar --- V2: * Use arch_irq_work_raise to raise decrementer interrupt. * Avoid having atomic variable. V3: * Fix build error. Reported by kernel test bot

Re: [PATCH v3 2/2] pseries/mce: Refactor the pseries mce handling code

2022-01-17 Thread Ganesh
On 11/24/21 18:40, Nicholas Piggin wrote: Excerpts from Ganesh Goudar's message of November 24, 2021 7:55 pm: Now that we are no longer switching on the mmu in realmode mce handler, Revert the commit 4ff753feab02("powerpc/pseries: Avoid using addr_to_pfn in real mode") partia

Re: [PATCH v3 1/2] powerpc/mce: Avoid using irq_work_queue() in realmode

2022-01-17 Thread Ganesh
On 11/24/21 18:33, Nicholas Piggin wrote: Excerpts from Ganesh Goudar's message of November 24, 2021 7:54 pm: In realmode mce handler we use irq_work_queue() to defer the processing of mce events, irq_work_queue() can only be called when translation is enabled because it touches memory outside

[PATCH v4] powerpc/mce: Avoid using irq_work_queue() in realmode

2022-01-17 Thread Ganesh Goudar
in realmode. To avoid this, program the decrementer and call the event processing functions from timer handler. Signed-off-by: Ganesh Goudar --- V2: * Use arch_irq_work_raise to raise decrementer interrupt. * Avoid having atomic variable. V3: * Fix build error. Reported by kernel test bot

[PATCH v3 RESEND 3/3] powerpc/mce: Modify the real address error logging messages

2022-01-07 Thread Ganesh Goudar
s space. Signed-off-by: Ganesh Goudar --- arch/powerpc/kernel/mce.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c index fd829f7f25a4..55ccc651d1b0 100644 --- a/arch/powerpc/kernel/mce.c +++ b/arch/powerpc/kernel/mce.

[PATCH v3 RESEND 1/3] powerpc/pseries: Parse control memory access error

2022-01-07 Thread Ganesh Goudar
Add support to parse and log control memory access error for pseries. These changes are made according to PAPR v2.11 10.3.2.2.12. Signed-off-by: Ganesh Goudar --- arch/powerpc/platforms/pseries/ras.c | 36 1 file changed, 32 insertions(+), 4 deletions(-) diff --git

[PATCH v3 RESEND 2/3] selftests/powerpc: Add test for real address error handling

2022-01-07 Thread Ganesh Goudar
receives SIGBUS. Signed-off-by: Ganesh Goudar --- tools/testing/selftests/powerpc/Makefile | 3 +- tools/testing/selftests/powerpc/mce/Makefile | 7 ++ .../selftests/powerpc/mce/inject-ra-err.c | 65 +++ tools/testing/selftests/powerpc/mce/vas-api.h | 1 + 4 files changed

[PATCH v3 1/2] powerpc/mce: Avoid using irq_work_queue() in realmode

2021-11-24 Thread Ganesh Goudar
in realmode. To avoid this, program the decrementer and call the event processing functions from timer handler. Signed-off-by: Ganesh Goudar --- V2: * Use arch_irq_work_raise to raise decrementer interrupt. * Avoid having atomic variable. V3: * Fix build error. Reported by kernel test bot

[PATCH v3 2/2] pseries/mce: Refactor the pseries mce handling code

2021-11-24 Thread Ganesh Goudar
to enabled. Signed-off-by: Ganesh Goudar --- arch/powerpc/platforms/pseries/ras.c | 122 +++ 1 file changed, 49 insertions(+), 73 deletions(-) diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c index 8613f9cc5798..62e1519b8355 100644

[PATCH v2 2/2] pseries/mce: Refactor the pseries mce handling code

2021-11-23 Thread Ganesh Goudar
to enabled. Signed-off-by: Ganesh Goudar --- arch/powerpc/platforms/pseries/ras.c | 122 +++ 1 file changed, 49 insertions(+), 73 deletions(-) diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c index 8613f9cc5798..62e1519b8355 100644

[PATCH v2 1/2] powerpc/mce: Avoid using irq_work_queue() in realmode

2021-11-23 Thread Ganesh Goudar
in realmode. To avoid this, program the decrementer and call the event processing functions from timer handler. Signed-off-by: Ganesh Goudar --- V2: * Use arch_irq_work_raise to raise decrementer interrupt. * Avoid having atomic variable. --- arch/powerpc/include/asm/machdep.h | 2

Re: [PATCH 1/2] powerpc/mce: Avoid using irq_work_queue() in realmode

2021-11-18 Thread Ganesh
On 11/8/21 19:49, Nicholas Piggin wrote: Excerpts from Ganesh Goudar's message of November 8, 2021 6:38 pm: In realmode mce handler we use irq_work_queue() to defer the processing of mce events, irq_work_queue() can only be called when translation is enabled because it touches memory outside

Re: [PATCH 1/2] powerpc/mce: Avoid using irq_work_queue() in realmode

2021-11-18 Thread Ganesh
ch 2/2, refactors this. - - /* -* Queue irq work to log this rtas event later. -* irq_work_queue uses per-cpu variables, so do this in virt -* mode as well. -*/ - irq_work_queue(_errlog_process_work); - - mtmsr(msr); - return disposition; } Thanks for the review :) . Ganesh

Re: [PATCH v3 1/3] powerpc/pseries: Parse control memory access error

2021-11-08 Thread Ganesh
On 9/6/21 14:13, Ganesh Goudar wrote: Add support to parse and log control memory access error for pseries. These changes are made according to PAPR v2.11 10.3.2.2.12. Signed-off-by: Ganesh Goudar --- v3: Modify the commit log to mention the document according to which changes are made

[PATCH 2/2] pseries/mce: Refactor the pseries mce handling code

2021-11-08 Thread Ganesh Goudar
to enabled. Signed-off-by: Ganesh Goudar --- arch/powerpc/platforms/pseries/ras.c | 122 +++ 1 file changed, 49 insertions(+), 73 deletions(-) diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c index 8613f9cc5798..62e1519b8355 100644

[PATCH 1/2] powerpc/mce: Avoid using irq_work_queue() in realmode

2021-11-08 Thread Ganesh Goudar
in realmode. To avoid this, program the decrementer and call the event processing functions from timer handler. Signed-off-by: Ganesh Goudar --- arch/powerpc/include/asm/machdep.h | 2 + arch/powerpc/include/asm/mce.h | 2 + arch/powerpc/include/asm/paca.h | 1

Re: [PATCH v1] powerpc/64s: Fix unrecoverable MCE crash

2021-09-23 Thread Ganesh
On 9/22/21 7:32 AM, Nicholas Piggin wrote: The machine check handler is not considered NMI on 64s. The early handler is the true NMI handler, and then it schedules the machine_check_exception handler to run when interrupts are enabled. This works fine except the case of an unrecoverable MCE,

Re: [PATCH] powerpc/mce: check if event info is valid

2021-09-17 Thread Ganesh
On 8/6/21 6:53 PM, Ganesh Goudar wrote: Check if the event info is valid before printing the event information. When a fwnmi enabled nested kvm guest hits a machine check exception L0 and L2 would generate machine check event info, But L1 would not generate any machine check event info

Re: [PATCH v2] powerpc/mce: Fix access error in mce handler

2021-09-17 Thread Ganesh
On 9/17/21 12:09 PM, Daniel Axtens wrote: Hi Ganesh, We queue an irq work for deferred processing of mce event in realmode mce handler, where translation is disabled. Queuing of the work may result in accessing memory outside RMO region, such access needs the translation to be enabled

[PATCH v2] powerpc/mce: Fix access error in mce handler

2021-09-09 Thread Ganesh Goudar
+0xbc/0xd0 [c0001ebffcf0] [c000838c] machine_check_early_common+0x16c/0x1f4 Fixes: 74c3354bc1d89 ("powerpc/pseries/mce: restore msr before returning from handler") Signed-off-by: Ganesh Goudar --- v2: Change in commit message. --- arch/powerpc/kernel/mce.c | 16 ++-- 1 file ch

Re: [PATCH] powerpc/mce: Fix access error in mce handler

2021-09-08 Thread Ganesh
On 9/8/21 11:10 AM, Michael Ellerman wrote: Ganesh writes: On 9/6/21 6:03 PM, Michael Ellerman wrote: Ganesh Goudar writes Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries CPU: 5 PID: 1883 Comm: insmod Tainted: GOE 5.14.0

Re: [PATCH] powerpc/mce: Fix access error in mce handler

2021-09-07 Thread Ganesh
On 9/6/21 6:03 PM, Michael Ellerman wrote: Ganesh Goudar writes: We queue an irq work for deferred processing of mce event in realmode mce handler, where translation is disabled. Queuing of the work may result in accessing memory outside RMO region, such access needs the translation

[PATCH v3 3/3] powerpc/mce: Modify the real address error logging messages

2021-09-06 Thread Ganesh Goudar
s space. Signed-off-by: Ganesh Goudar --- v3: No changes. v2: No changes. --- arch/powerpc/kernel/mce.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c index 9d1e39d42e3e..5baf69503349 100644 --- a/arch/powerpc/ker

[PATCH v3 2/3] selftests/powerpc: Add test for real address error handling

2021-09-06 Thread Ganesh Goudar
receives SIGBUS. Signed-off-by: Ganesh Goudar --- v3: Avoid using shell script to inject error. v2: Fix build error. --- tools/testing/selftests/powerpc/Makefile | 3 +- tools/testing/selftests/powerpc/mce/Makefile | 7 ++ .../selftests/powerpc/mce/inject-ra-err.c | 65

[PATCH v3 1/3] powerpc/pseries: Parse control memory access error

2021-09-06 Thread Ganesh Goudar
Add support to parse and log control memory access error for pseries. These changes are made according to PAPR v2.11 10.3.2.2.12. Signed-off-by: Ganesh Goudar --- v3: Modify the commit log to mention the document according to which changes are made. Define and use a macro to check

[PATCH] powerpc/mce: Fix access error in mce handler

2021-09-06 Thread Ganesh Goudar
] machine_check_queue_event+0xbc/0xd0 [c0001ebffcf0] [c000838c] machine_check_early_common+0x16c/0x1f4 Fixes: 74c3354bc1d89 ("powerpc/pseries/mce: restore msr before returning from handler") Signed-off-by: Ganesh Goudar --- arch/powerpc/kernel/mce.c | 16 ++-- 1 file c

Re: [PATCH v2 2/3] selftests/powerpc: Add test for real address error handling

2021-08-26 Thread Ganesh
On 8/26/21 8:57 AM, Michael Ellerman wrote: Ganesh writes: On 8/24/21 6:18 PM, Michael Ellerman wrote: Ganesh Goudar writes: Add test for real address or control memory address access error handling, using NX-GZIP engine. The error is injected by accessing the control memory address

Re: [PATCH v2 1/3] powerpc/pseries: Parse control memory access error

2021-08-25 Thread Ganesh
On 8/25/21 2:54 AM, Segher Boessenkool wrote: On Tue, Aug 24, 2021 at 04:39:57PM +1000, Michael Ellerman wrote: + case MC_ERROR_CTRL_MEM_ACCESS_PTABLE_WALK: + mce_err.u.ra_error_type = +

Re: [PATCH v2 2/3] selftests/powerpc: Add test for real address error handling

2021-08-25 Thread Ganesh
On 8/24/21 6:18 PM, Michael Ellerman wrote: Ganesh Goudar writes: Add test for real address or control memory address access error handling, using NX-GZIP engine. The error is injected by accessing the control memory address using illegal instruction, on successful handling the process

Re: [PATCH v2 1/3] powerpc/pseries: Parse control memory access error

2021-08-25 Thread Ganesh
On 8/24/21 12:09 PM, Michael Ellerman wrote: Hi Ganesh, Some comments below ... Ganesh Goudar writes: Add support to parse and log control memory access error for pseries. Signed-off-by: Ganesh Goudar --- v2: No changes in this patch. --- arch/powerpc/platforms/pseries/ras.c | 21

Re: [PATCH v2 1/3] powerpc/pseries: Parse control memory access error

2021-08-23 Thread Ganesh
Hi mpe, Any comments on this patchset? On 8/5/21 2:50 PM, Ganesh Goudar wrote: Add support to parse and log control memory access error for pseries. Signed-off-by: Ganesh Goudar --- v2: No changes in this patch. --- arch/powerpc/platforms/pseries/ras.c | 21 + 1 file

[PATCH] powerpc/mce: check if event info is valid

2021-08-06 Thread Ganesh Goudar
structure will be empty in L1. "Machine Check Exception, Unknown event version 0". Signed-off-by: Ganesh Goudar --- arch/powerpc/include/asm/mce.h | 2 +- arch/powerpc/kernel/mce.c | 7 +-- 2 files changed, 6 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/include/asm/mc

[PATCH v2 3/3] powerpc/mce: Modify the real address error logging messages

2021-08-05 Thread Ganesh Goudar
s space. Signed-off-by: Ganesh Goudar --- v2: No changes in this patch. --- arch/powerpc/kernel/mce.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c index 47a683cd00d2..f3ef480bb739 100644 --- a/arch/powerpc/ker

[PATCH v2 2/3] selftests/powerpc: Add test for real address error handling

2021-08-05 Thread Ganesh Goudar
receives SIGBUS. Signed-off-by: Ganesh Goudar --- v2: Fix build error. --- tools/testing/selftests/powerpc/Makefile | 3 +- tools/testing/selftests/powerpc/mce/Makefile | 6 +++ .../selftests/powerpc/mce/inject-ra-err.c | 42 +++ .../selftests/powerpc/mce/inject-ra-err.sh

[PATCH v2 1/3] powerpc/pseries: Parse control memory access error

2021-08-05 Thread Ganesh Goudar
Add support to parse and log control memory access error for pseries. Signed-off-by: Ganesh Goudar --- v2: No changes in this patch. --- arch/powerpc/platforms/pseries/ras.c | 21 + 1 file changed, 21 insertions(+) diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch

[PATCH 1/3] powerpc/pseries: Parse control memory access error

2021-07-30 Thread Ganesh Goudar
Add support to parse and log control memory access error for pseries. Signed-off-by: Ganesh Goudar --- arch/powerpc/platforms/pseries/ras.c | 21 + 1 file changed, 21 insertions(+) diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c

[PATCH 2/3] selftests/powerpc: Add test for real address error handling

2021-07-30 Thread Ganesh Goudar
receives SIGBUS. Signed-off-by: Ganesh Goudar --- tools/testing/selftests/powerpc/Makefile | 3 +- tools/testing/selftests/powerpc/mce/Makefile | 6 +++ .../selftests/powerpc/mce/inject-ra-err.c | 42 +++ .../selftests/powerpc/mce/inject-ra-err.sh| 19 + 4 files

[PATCH 3/3] powerpc/mce: Modify the real address error logging messages

2021-07-30 Thread Ganesh Goudar
s space. Signed-off-by: Ganesh Goudar --- arch/powerpc/kernel/mce.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c index 47a683cd00d2..f3ef480bb739 100644 --- a/arch/powerpc/kernel/mce.c +++ b/arch/powerpc/kernel/mce.

Re: [PATCH] powerpc/mce: save ignore_event flag unconditionally for UE

2021-04-22 Thread Ganesh
On 4/22/21 11:31 AM, Ganesh wrote: On 4/7/21 10:28 AM, Ganesh Goudar wrote: When we hit an UE while using machine check safe copy routines, ignore_event flag is set and the event is ignored by mce handler, And the flag is also saved for deferred handling and printing of mce event information

Re: [PATCH] powerpc/mce: save ignore_event flag unconditionally for UE

2021-04-22 Thread Ganesh
On 4/7/21 10:28 AM, Ganesh Goudar wrote: When we hit an UE while using machine check safe copy routines, ignore_event flag is set and the event is ignored by mce handler, And the flag is also saved for deferred handling and printing of mce event information, But as of now saving of this flag

Re: [PATCH] powerpc/mce: save ignore_event flag unconditionally for UE

2021-04-20 Thread Ganesh
On 4/20/21 12:54 PM, Santosh Sivaraj wrote: Hi Ganesh, Ganesh Goudar writes: When we hit an UE while using machine check safe copy routines, ignore_event flag is set and the event is ignored by mce handler, And the flag is also saved for deferred handling and printing of mce event

Re: [PATCH] powerpc/pseries/mce: Fix a typo in error type assignment

2021-04-19 Thread Ganesh
On 4/17/21 6:06 PM, Michael Ellerman wrote: Ganesh Goudar writes: The error type is ICACHE and DCACHE, for case MCE_ERROR_TYPE_ICACHE. Do you mean "is ICACHE not DCACHE" ? Right :), Should I send v2 ? cheers Signed-off-by: Ganesh Goudar --- arch/powerpc/platforms/pseries

[PATCH] powerpc/pseries/mce: Fix a typo in error type assignment

2021-04-16 Thread Ganesh Goudar
The error type is ICACHE and DCACHE, for case MCE_ERROR_TYPE_ICACHE. Signed-off-by: Ganesh Goudar --- arch/powerpc/platforms/pseries/ras.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c index

[PATCH] powerpc/mce: save ignore_event flag unconditionally for UE

2021-04-06 Thread Ganesh Goudar
] memcpy+0x88/0x90 [ 512.972456] MCE: CPU1: Initiator CPU [ 512.972534] MCE: CPU1: Unknown Signed-off-by: Ganesh Goudar --- arch/powerpc/kernel/mce.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c index 11f0cae086ed

[PATCH v5 2/2] powerpc/mce: Remove per cpu variables from MCE handlers

2021-01-28 Thread Ganesh Goudar
on different architectures, So have these variables in paca instead of having them as per-cpu variables to avoid complications. Signed-off-by: Ganesh Goudar --- v2: Dynamically allocate memory for machine check event info. v3: Remove check for hash mmu lpar, use memblock_alloc_try_nid

[PATCH v5 1/2] powerpc/mce: Reduce the size of event arrays

2021-01-28 Thread Ganesh Goudar
Maximum recursive depth of MCE is 4, Considering the maximum depth allowed reduce the size of event to 10 from 100. This saves us ~19kB of memory and has no fatal consequences. Signed-off-by: Ganesh Goudar --- v4: This patch is a fragment of the original patch which is split into two. v5

Re: [PATCH v4 2/2] powerpc/mce: Remove per cpu variables from MCE handlers

2021-01-28 Thread Ganesh
On 1/25/21 2:54 PM, Christophe Leroy wrote: Le 22/01/2021 à 13:32, Ganesh Goudar a écrit : Access to per-cpu variables requires translation to be enabled on pseries machine running in hash mmu mode, Since part of MCE handler runs in realmode and part of MCE handling code is shared between

[PATCH v4 2/2] powerpc/mce: Remove per cpu variables from MCE handlers

2021-01-22 Thread Ganesh Goudar
on different architectures, So have these variables in paca instead of having them as per-cpu variables to avoid complications. Signed-off-by: Ganesh Goudar --- v2: Dynamically allocate memory for machine check event info v3: Remove check for hash mmu lpar, use memblock_alloc_try_nid

[PATCH v4 1/2] powerpc/mce: Reduce the size of event arrays

2021-01-22 Thread Ganesh Goudar
Maximum recursive depth of MCE is 4, Considering the maximum depth allowed reduce the size of event to 10 from 100. This saves us ~19kB of memory and has no fatal consequences. Signed-off-by: Ganesh Goudar --- v4: This patch is a fragment of the original patch which is split into two

Re: [PATCH v3] powerpc/mce: Remove per cpu variables from MCE handlers

2021-01-21 Thread Ganesh
On 1/19/21 9:28 AM, Nicholas Piggin wrote: Excerpts from Ganesh Goudar's message of January 15, 2021 10:58 pm: Access to per-cpu variables requires translation to be enabled on pseries machine running in hash mmu mode, Since part of MCE handler runs in realmode and part of MCE handling code

[PATCH v3] powerpc/mce: Remove per cpu variables from MCE handlers

2021-01-15 Thread Ganesh Goudar
on different architectures, So have these variables in paca instead of having them as per-cpu variables to avoid complications. Maximum recursive depth of MCE is 4, Considering the maximum depth allowed reduce the size of event to 10 from 100. Signed-off-by: Ganesh Goudar --- v2: Dynamically

[PATCH v2] powerpc/mce: Remove per cpu variables from MCE handlers

2021-01-07 Thread Ganesh Goudar
on different architectures, So have these variables in paca instead of having them as per-cpu variables to avoid complications. Maximum recursive depth of MCE is 4, Considering the maximum depth allowed reduce the size of event to 10 from 100. Signed-off-by: Ganesh Goudar --- v2: Dynamically

Re: [PATCH] powerpc/mce: Remove per cpu variables from MCE handlers

2020-12-08 Thread Ganesh
On 12/8/20 4:01 PM, Michael Ellerman wrote: Ganesh Goudar writes: diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h index 9454d29ff4b4..4769954efa7d 100644 --- a/arch/powerpc/include/asm/paca.h +++ b/arch/powerpc/include/asm/paca.h @@ -273,6 +274,17 @@ struct

[PATCH] powerpc/mce: Remove per cpu variables from MCE handlers

2020-12-04 Thread Ganesh Goudar
on different architectures, So have these variables in paca instead of having them as per-cpu variables to avoid complications. Maximum recursive depth of MCE is 4, Considering the maximum depth allowed reduce the size of event to 10 from 100. Signed-off-by: Ganesh Goudar --- arch/powerpc/include

[PATCH v5] lkdtm/powerpc: Add SLB multihit test

2020-11-30 Thread Ganesh Goudar
To check machine check handling, add support to inject slb multihit errors. Cc: Kees Cook Cc: Michal Suchánek Co-developed-by: Mahesh Salgaonkar Signed-off-by: Mahesh Salgaonkar Signed-off-by: Ganesh Goudar --- v5: - Insert entries at SLB_NUM_BOLTED and SLB_NUM_BOLTED +1, remove index

Re: [PATCH v4 2/2] lkdtm/powerpc: Add SLB multihit test

2020-11-26 Thread Ganesh
On 10/19/20 6:45 PM, Michal Suchánek wrote: On Mon, Oct 19, 2020 at 09:59:57PM +1100, Michael Ellerman wrote: Hi Ganesh, Some comments below ... Ganesh Goudar writes: To check machine check handling, add support to inject slb multihit errors. Cc: Kees Cook Reviewed-by: Michal Suchánek

Re: [PATCH v4] powerpc/pseries: Avoid using addr_to_pfn in real mode

2020-10-20 Thread Ganesh
On 7/24/20 12:09 PM, Ganesh Goudar wrote: When a UE or memory error exception is encountered, the MCE handler tries to find the pfn using addr_to_pfn(), which takes an effective address as an argument; the pfn is later used to poison the page where the memory error occurred. Recent rework in this area made

Re: [PATCH v4 0/2] powerpc/mce: Fix mce handler and add selftest

2020-10-18 Thread Ganesh
On 10/16/20 5:02 PM, Michael Ellerman wrote: On Fri, 9 Oct 2020 12:10:03 +0530, Ganesh Goudar wrote: This patch series fixes MCE handling for pseries, adds an LKDTM test for SLB multihit recovery, and enables a selftest for the same, basically to test MCE handling on pseries/powernv machines running

[PATCH v4 2/2] lkdtm/powerpc: Add SLB multihit test

2020-10-09 Thread Ganesh Goudar
To check machine check handling, add support to inject slb multihit errors. Cc: Kees Cook Reviewed-by: Michal Suchánek Co-developed-by: Mahesh Salgaonkar Signed-off-by: Mahesh Salgaonkar Signed-off-by: Ganesh Goudar --- drivers/misc/lkdtm/Makefile | 1 + drivers/misc/lkdtm

[PATCH v4 1/2] powerpc/mce: remove nmi_enter/exit from real mode handler

2020-10-09 Thread Ganesh Goudar
on pseries machine running in hash mmu mode. Fixes: 116ac378bb3f ("powerpc/64s: machine check interrupt update NMI accounting") Signed-off-by: Ganesh Goudar --- arch/powerpc/kernel/mce.c | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/kernel/mc

[PATCH v4 0/2] powerpc/mce: Fix mce handler and add selftest

2020-10-09 Thread Ganesh Goudar
, as nesting is supported. * Fix build errors and remove unused variables. * Integrate error injection code into LKDTM. * Add support to inject multihit in paca. Ganesh Goudar (2): powerpc/mce: remove nmi_enter/exit from real mode handler lkdtm/powerpc: Add SLB multihit test arch/powerpc/kernel

Re: [PATCH v3 1/2] powerpc/mce: remove nmi_enter/exit from real mode handler

2020-10-09 Thread Ganesh
On 10/1/20 11:21 PM, Ganesh Goudar wrote: Use of nmi_enter/exit in the real mode handler causes the kernel to panic and reboot on injecting SLB multihit on a pseries machine running in hash MMU mode, as these calls try to access memory outside the RMO region in the real mode handler where translation

[PATCH v3 2/2] lkdtm/powerpc: Add SLB multihit test

2020-10-01 Thread Ganesh Goudar
To check machine check handling, add support to inject slb multihit errors. Reviewed-by: Michal Suchánek Co-developed-by: Mahesh Salgaonkar Signed-off-by: Mahesh Salgaonkar Signed-off-by: Ganesh Goudar --- drivers/misc/lkdtm/Makefile | 1 + drivers/misc/lkdtm/core.c

[PATCH v3 1/2] powerpc/mce: remove nmi_enter/exit from real mode handler

2020-10-01 Thread Ganesh Goudar
on pseries machine running in hash mmu mode. Fixes: 116ac378bb3f ("powerpc/64s: machine check interrupt update NMI accounting") Signed-off-by: Ganesh Goudar --- arch/powerpc/kernel/mce.c | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/kern

[PATCH v3 0/2] powerpc/mce: Fix mce handler and add selftest

2020-10-01 Thread Ganesh Goudar
support to inject multihit in paca. Ganesh Goudar (2): powerpc/mce: remove nmi_enter/exit from real mode handler lkdtm/powerpc: Add SLB multihit test arch/powerpc/kernel/mce.c | 10 +- drivers/misc/lkdtm/Makefile | 1 + drivers/misc/lkdtm/core.c | 3

Re: [PATCH v2 3/3] selftests/lkdtm: Enable selftest for SLB multihit

2020-09-28 Thread Ganesh
On 9/26/20 1:29 AM, Kees Cook wrote: On Fri, Sep 25, 2020 at 04:01:23PM +0530, Ganesh Goudar wrote: Add PPC_SLB_MULTIHIT to lkdtm selftest framework. Signed-off-by: Ganesh Goudar --- tools/testing/selftests/lkdtm/tests.txt | 1 + 1 file changed, 1 insertion(+) diff --git a/tools/testing

Re: [PATCH v2 2/3] lkdtm/powerpc: Add SLB multihit test

2020-09-28 Thread Ganesh
On 9/26/20 1:27 AM, Kees Cook wrote: On Fri, Sep 25, 2020 at 04:01:22PM +0530, Ganesh Goudar wrote: Add support to inject slb multihit errors, to test machine check handling. Thank you for more tests in here! Based on work by Mahesh Salgaonkar and Michal Suchánek. Cc: Mahesh Salgaonkar

[PATCH v2 0/3] powerpc/mce: Fix mce handler and add selftest

2020-09-25 Thread Ganesh Goudar
. * Fix build errors and remove unused variables. * Integrate error injection code into LKDTM. * Add support to inject multihit in paca. Ganesh Goudar (3): powerpc/mce: remove nmi_enter/exit from real mode handler lkdtm/powerpc: Add SLB multihit test selftests/lkdtm: Enable selftest for SLB
