if the state is not moved to permanent failure state.
Signed-off-by: Ganesh Goudar
---
V2:
* Elaborate the commit message.
* Fix formatting issues in commit message and comments.
---
arch/powerpc/kernel/eeh.c| 11 ++-
arch/powerpc/kernel/eeh_driver.c | 13 +++--
2 files changed
On 4/9/24 14:37, Michael Ellerman wrote:
Hi Ganesh,
Ganesh Goudar writes:
When a device is hot removed on powernv, the hotplug
driver clears the device's state. However, on pseries,
if a device is removed by phyp after reaching the error
threshold, the kernel remains unaware, leading
failover.
Permanently disable the device if the presence check
fails.
Signed-off-by: Ganesh Goudar
---
arch/powerpc/kernel/eeh.c| 4 +++-
arch/powerpc/kernel/eeh_driver.c | 8 +++-
2 files changed, 10 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc
.
Signed-off-by: Ganesh Goudar
---
arch/powerpc/include/asm/eeh_event.h | 7 +
arch/powerpc/include/asm/pci-bridge.h | 4 +++
arch/powerpc/kernel/eeh_driver.c | 27 +--
arch/powerpc/kernel/eeh_event.c | 38 ++-
arch/powerpc/kernel/eeh_pe.c
. On powernv the improvement
is not so significant.
Ganesh Goudar (1):
powerpc/eeh: Enable PHBs to recovery in parallel
arch/powerpc/include/asm/eeh_event.h | 7 +
arch/powerpc/include/asm/pci-bridge.h | 4 +++
arch/powerpc/kernel/eeh_driver.c | 27 +--
arch/powerpc/kernel
the constraint, above, the driver handlers are called by
traversing the tree of affected PEs from the top, stopping to call
handlers (in parallel) when a PE with devices is discovered. When the
calls for that PE are complete, traversal continues at each child PE.
Signed-off-by: Ganesh Goudar
---
arch
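As a rough illustration of the traversal order described in the snippet above, the following self-contained C sketch walks a tree of PEs from the top, calls a handler once for every PE that has devices, and only then descends into that PE's children. The `struct pe` layout and the names `walk_pes()`, `record_visit()` and `struct visit_log` are hypothetical, invented for this sketch; they are not the kernel's `struct eeh_pe` API. The handlers here also run sequentially, whereas the patch describes calling them in parallel.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a partitionable endpoint (PE). */
struct pe {
	int id;
	int ndevices;        /* devices that belong directly to this PE */
	struct pe *child;    /* first child PE, or NULL */
	struct pe *sibling;  /* next PE at the same level, or NULL */
};

/* Called once for each PE that has at least one device. */
typedef void (*pe_handler_t)(struct pe *pe, void *data);

/*
 * Top-down walk: handle this PE first (if it has devices), and only
 * after its handler has returned continue with each child PE.
 */
static void walk_pes(struct pe *pe, pe_handler_t handler, void *data)
{
	struct pe *c;

	if (!pe)
		return;
	if (pe->ndevices > 0)
		handler(pe, data);
	for (c = pe->child; c; c = c->sibling)
		walk_pes(c, handler, data);
}

/* Example handler state: remembers the order in which PEs were handled. */
struct visit_log {
	int ids[16];
	int count;
};

static void record_visit(struct pe *pe, void *data)
{
	struct visit_log *vl = data;

	if (vl->count < 16)
		vl->ids[vl->count++] = pe->id;
}
```

A PE without devices is skipped by the handler but its children are still visited, which matches the "stopping to call handlers when a PE with devices is discovered" wording.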
Based on the original work from Sam Bobroff.
Give a unique ID to each recovery event, to ease log parsing
and prepare for parallel recovery.
Also add some new messages with a very simple format that may
be useful to log-parsers.
Signed-off-by: Ganesh Goudar
---
arch/powerpc/include/asm
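One way to picture "a unique ID for each recovery event" is a shared atomic counter whose value is stamped into every log line for that event. The sketch below is a userspace C11 approximation under that assumption; `new_event_id()`, `format_event_msg()` and the `EEH(<id>): ` prefix are hypothetical and are not taken from the patch itself.

```c
#include <stdatomic.h>
#include <stdio.h>

/* Shared counter; static storage means it starts at 0. */
static atomic_uint next_event_id;

/* Hand out the next ID. The fetch-and-add is atomic, so two
 * recoveries starting at the same time never get the same value. */
static unsigned int new_event_id(void)
{
	return atomic_fetch_add(&next_event_id, 1);
}

/* Prefix a message with the event ID in a fixed, easy-to-parse form
 * (hypothetical format). Returns what snprintf() returns. */
static int format_event_msg(char *buf, size_t len, unsigned int id,
			    const char *msg)
{
	return snprintf(buf, len, "EEH(%u): %s", id, msg);
}
```

With a fixed prefix like this, a log parser can group all messages that belong to one recovery by matching on the ID alone.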
. Care must be taken when ordering these locks
against the PCI rescan/remove lock and the device locks to avoid
deadlocking.
Signed-off-by: Ganesh Goudar
---
arch/powerpc/include/asm/eeh.h | 12 +-
arch/powerpc/kernel/eeh.c| 112 ++--
arch/powerpc/kernel
, Please comment.
Thanks.
V2:
* Since we now have an event list per PHB, have a per-PHB event list lock.
* Appropriate names given to the locks.
* Remove stale comments (few more to be removed).
* Initialize event_id to 0 instead of 1.
* And some cosmetic changes.
Ganesh Goudar (3):
powerpc/eeh
On 6/13/23 8:06 AM, Oliver O'Halloran wrote:
On Tue, Jun 13, 2023 at 11:44 AM Ganesh Goudar wrote:
Hi,
EEH recovery is currently serialized and these patches shorten
the time taken for EEH recovery by running the recovery
in parallel. The original author of these patches is Sam
Based on the original work from Sam Bobroff.
Give a unique ID to each recovery event, to ease log parsing
and prepare for parallel recovery.
Also add some new messages with a very simple format that may
be useful to log-parsers.
Signed-off-by: Ganesh Goudar
---
arch/powerpc/include/asm
the constraint, above, the driver handlers are called by
traversing the tree of affected PEs from the top, stopping to call
handlers (in parallel) when a PE with devices is discovered. When the
calls for that PE are complete, traversal continues at each child PE.
Signed-off-by: Ganesh Goudar
---
arch
. Care must be taken when ordering these locks
against the PCI rescan/remove lock and the device locks to avoid
deadlocking.
Signed-off-by: Ganesh Goudar
---
arch/powerpc/include/asm/eeh.h | 6 +-
arch/powerpc/kernel/eeh.c| 112 ++--
arch/powerpc/kernel
, Please comment.
Thanks.
Ganesh Goudar (3):
powerpc/eeh: Synchronization for safety
powerpc/eeh: Provide a unique ID for each EEH recovery
powerpc/eeh: Asynchronous recovery
arch/powerpc/include/asm/eeh.h | 7 +-
arch/powerpc/include/asm/eeh_event.h | 10
ermanent failure after
notifying the drivers.
Fixes: 38ddc011478e ("powerpc/eeh: Make permanently failed devices
non-actionable")
Suggested-by: Mahesh Salgaonkar
Signed-off-by: Ganesh Goudar
---
arch/powerpc/kernel/eeh_driver.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
On 1/31/23 4:59 PM, Michael Ellerman wrote:
Ganesh Goudar writes:
We currently fail to log unrecoverable errors, since
machine_check_log_err() is not called for them.
Raise irq work in save_mce_event() for unrecoverable errors,
so that we log the error
NIP: [1e48]
MCE: CPU24: Initiator CPU
MCE: CPU24: Unknown
RTAS: event: 5, Type: Platform Error (224), Severity: 3
Signed-off-by: Ganesh Goudar
Reviewed-by: Mahesh Salgaonkar
---
V3: Rephrasing the commit message.
---
arch/powerpc/kernel/mce.c | 10 +++---
1 file changed, 7
/Store (foreign/control
memory) [Not recovered]
MCE: CPU24: PID: 1589811 Comm: inject-ra-err NIP: [1e48]
MCE: CPU24: Initiator CPU
MCE: CPU24: Unknown
RTAS: event: 5, Type: Platform Error (224), Severity: 3
Signed-off-by: Ganesh Goudar
Reviewed-by: Mahesh Salgaonkar
---
V2
machine_check_log_err() is not called for all
unrecoverable errors, so we fail to log the error.
Raise irq work in save_mce_event() for unrecoverable errors,
so that we log the error from the MCE event handling block in
the timer handler.
Signed-off-by: Ganesh Goudar
---
arch/powerpc
KASAN
instrumentation.
Signed-off-by: Ganesh Goudar
---
v2: Force inline a few more functions.
v3: Add noinstr to a few functions instead of __always_inline.
---
arch/powerpc/include/asm/hw_irq.h| 8
arch/powerpc/include/asm/interrupt.h | 2 +-
arch/powerpc/include/asm/rtas.h | 4
On 9/7/22 09:49, Nicholas Piggin wrote:
On Mon Sep 5, 2022 at 4:38 PM AEST, Ganesh Goudar wrote:
Part of machine check error handling is done in realmode.
As of now, instrumentation is not possible for any code that
runs in realmode.
When MCE is injected on KASAN enabled kernel, crash
On 9/2/22 05:49, Jason Gunthorpe wrote:
On Tue, Aug 16, 2022 at 08:57:13AM +0530, Ganesh Goudar wrote:
Hi,
EEH recovery is currently serialized and these patches shorten
the time taken for EEH recovery by running the recovery
in parallel. The original author of these patches is Sam
KASAN
instrumentation.
Signed-off-by: Ganesh Goudar
---
v2: Force inline a few more functions.
---
arch/powerpc/include/asm/hw_irq.h| 8
arch/powerpc/include/asm/interrupt.h | 2 +-
arch/powerpc/include/asm/rtas.h | 4 ++--
arch/powerpc/kernel/rtas.c | 4 ++--
4 files
KASAN
instrumentation.
Signed-off-by: Ganesh Goudar
---
arch/powerpc/include/asm/interrupt.h | 2 +-
arch/powerpc/include/asm/rtas.h | 4 ++--
arch/powerpc/kernel/rtas.c | 4 ++--
3 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/arch/powerpc/include/asm/interrupt.h
b
On 8/22/22 11:01, Sachin Sant wrote:
On 19-Aug-2022, at 10:12 AM, Ganesh wrote
We'll have to make sure everything get_pseries_errorlog() is either
forced inline, or marked noinstr.
Making the following functions always_inline and noinstr fixes the issue.
__always_inline
On 8/22/22 11:19, Michael Ellerman wrote:
So I guess the compiler has decided not to inline it (why?!), and it is
not marked noinstr, so it gets KASAN instrumentation which crashes in
real mode.
We'll have to make sure everything get_pseries_errorlog() is either
forced inline, or marked
On 8/17/22 11:28, Michael Ellerman wrote:
Sachin Sant writes:
Following crash is seen while running powerpc/mce subtest on
a Power10 LPAR.
1..1
# selftests: powerpc/mce: inject-ra-err
[ 155.240591] BUG: Unable to handle kernel data access on read at
0xc00e00022d55b503
[ 155.240618]
the constraint, above, the driver handlers are called by
traversing the tree of affected PEs from the top, stopping to call
handlers (in parallel) when a PE with devices is discovered. When the
calls for that PE are complete, traversal continues at each child PE.
Signed-off-by: Ganesh Goudar
---
arch
. Care must be taken when ordering these locks
against the PCI rescan/remove lock and the device locks to avoid
deadlocking.
Signed-off-by: Ganesh Goudar
---
arch/powerpc/include/asm/eeh.h | 6 +-
arch/powerpc/kernel/eeh.c| 112 ++--
arch/powerpc/kernel
in time taken in EEH recovery. Yet to be tested on powernv.
These patches were originally posted as separate RFCs. I think
posting them as a single series would be more helpful. I know the
patches are too big; I will try to divide them logically in the next
iterations.
Thanks
Ganesh Goudar (3):
powerpc
Based on the original work from Sam Bobroff.
Give a unique ID to each recovery event, to ease log parsing
and prepare for parallel recovery.
Also add some new messages with a very simple format that may
be useful to log-parsers.
Signed-off-by: Ganesh Goudar
---
arch/powerpc/include/asm
On 1/7/22 19:44, Ganesh Goudar wrote:
Add support to parse and log control memory access
error for pseries. These changes are made according to
PAPR v2.11 10.3.2.2.12.
Signed-off-by: Ganesh Goudar
---
arch/powerpc/platforms/pseries/ras.c | 36
1 file changed
in realmode.
To avoid this, program the decrementer and call the event
processing functions from timer handler.
Signed-off-by: Ganesh Goudar
---
V2:
* Use arch_irq_work_raise to raise decrementer interrupt.
* Avoid having atomic variable.
V3:
* Fix build error.
Reported by kernel test bot
On 11/24/21 18:40, Nicholas Piggin wrote:
Excerpts from Ganesh Goudar's message of November 24, 2021 7:55 pm:
Now that we are no longer switching on the mmu in realmode
mce handler, revert the commit 4ff753feab02 ("powerpc/pseries:
Avoid using addr_to_pfn in real mode") partia
On 11/24/21 18:33, Nicholas Piggin wrote:
Excerpts from Ganesh Goudar's message of November 24, 2021 7:54 pm:
In the realmode mce handler we use irq_work_queue() to defer
the processing of mce events. irq_work_queue() can only
be called when translation is enabled because it touches
memory outside
in realmode.
To avoid this, program the decrementer and call the event
processing functions from timer handler.
Signed-off-by: Ganesh Goudar
---
V2:
* Use arch_irq_work_raise to raise decrementer interrupt.
* Avoid having atomic variable.
V3:
* Fix build error.
Reported by kernel test bot
s space.
Signed-off-by: Ganesh Goudar
---
arch/powerpc/kernel/mce.c | 8
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
index fd829f7f25a4..55ccc651d1b0 100644
--- a/arch/powerpc/kernel/mce.c
+++ b/arch/powerpc/kernel/mce.
Add support to parse and log control memory access
error for pseries. These changes are made according to
PAPR v2.11 10.3.2.2.12.
Signed-off-by: Ganesh Goudar
---
arch/powerpc/platforms/pseries/ras.c | 36
1 file changed, 32 insertions(+), 4 deletions(-)
diff --git
receives SIGBUS.
Signed-off-by: Ganesh Goudar
---
tools/testing/selftests/powerpc/Makefile | 3 +-
tools/testing/selftests/powerpc/mce/Makefile | 7 ++
.../selftests/powerpc/mce/inject-ra-err.c | 65 +++
tools/testing/selftests/powerpc/mce/vas-api.h | 1 +
4 files changed
in realmode.
To avoid this, program the decrementer and call the event
processing functions from timer handler.
Signed-off-by: Ganesh Goudar
---
V2:
* Use arch_irq_work_raise to raise decrementer interrupt.
* Avoid having atomic variable.
V3:
* Fix build error.
Reported by kernel test bot
to enabled.
Signed-off-by: Ganesh Goudar
---
arch/powerpc/platforms/pseries/ras.c | 122 +++
1 file changed, 49 insertions(+), 73 deletions(-)
diff --git a/arch/powerpc/platforms/pseries/ras.c
b/arch/powerpc/platforms/pseries/ras.c
index 8613f9cc5798..62e1519b8355 100644
to enabled.
Signed-off-by: Ganesh Goudar
---
arch/powerpc/platforms/pseries/ras.c | 122 +++
1 file changed, 49 insertions(+), 73 deletions(-)
diff --git a/arch/powerpc/platforms/pseries/ras.c
b/arch/powerpc/platforms/pseries/ras.c
index 8613f9cc5798..62e1519b8355 100644
in realmode.
To avoid this, program the decrementer and call the event
processing functions from timer handler.
Signed-off-by: Ganesh Goudar
---
V2:
* Use arch_irq_work_raise to raise decrementer interrupt.
* Avoid having atomic variable.
---
arch/powerpc/include/asm/machdep.h | 2
On 11/8/21 19:49, Nicholas Piggin wrote:
Excerpts from Ganesh Goudar's message of November 8, 2021 6:38 pm:
In the realmode mce handler we use irq_work_queue() to defer
the processing of mce events. irq_work_queue() can only
be called when translation is enabled because it touches
memory outside
ch 2/2, refactors this.
-
- /*
-* Queue irq work to log this rtas event later.
-* irq_work_queue uses per-cpu variables, so do this in virt
-* mode as well.
-*/
- irq_work_queue(&mce_errlog_process_work);
-
- mtmsr(msr);
-
return disposition;
}
Thanks for the review :) .
Ganesh
On 9/6/21 14:13, Ganesh Goudar wrote:
Add support to parse and log control memory access
error for pseries. These changes are made according to
PAPR v2.11 10.3.2.2.12.
Signed-off-by: Ganesh Goudar
---
v3: Modify the commit log to mention the document according
to which changes are made
to enabled.
Signed-off-by: Ganesh Goudar
---
arch/powerpc/platforms/pseries/ras.c | 122 +++
1 file changed, 49 insertions(+), 73 deletions(-)
diff --git a/arch/powerpc/platforms/pseries/ras.c
b/arch/powerpc/platforms/pseries/ras.c
index 8613f9cc5798..62e1519b8355 100644
in realmode.
To avoid this, program the decrementer and call the event
processing functions from timer handler.
Signed-off-by: Ganesh Goudar
---
arch/powerpc/include/asm/machdep.h | 2 +
arch/powerpc/include/asm/mce.h | 2 +
arch/powerpc/include/asm/paca.h | 1
On 9/22/21 7:32 AM, Nicholas Piggin wrote:
The machine check handler is not considered NMI on 64s. The early
handler is the true NMI handler, and then it schedules the
machine_check_exception handler to run when interrupts are enabled.
This works fine except the case of an unrecoverable MCE,
On 8/6/21 6:53 PM, Ganesh Goudar wrote:
Check if the event info is valid before printing the
event information. When a fwnmi enabled nested kvm guest
hits a machine check exception, L0 and L2 would generate
machine check event info, but L1 would not generate any
machine check event info
On 9/17/21 12:09 PM, Daniel Axtens wrote:
Hi Ganesh,
We queue an irq work for deferred processing of mce event
in realmode mce handler, where translation is disabled.
Queuing of the work may result in accessing memory outside
RMO region; such access needs the translation to be enabled
+0xbc/0xd0
[c0001ebffcf0] [c000838c] machine_check_early_common+0x16c/0x1f4
Fixes: 74c3354bc1d89 ("powerpc/pseries/mce: restore msr before returning from
handler")
Signed-off-by: Ganesh Goudar
---
v2: Change in commit message.
---
arch/powerpc/kernel/mce.c | 16 ++--
1 file ch
On 9/8/21 11:10 AM, Michael Ellerman wrote:
Ganesh writes:
On 9/6/21 6:03 PM, Michael Ellerman wrote:
Ganesh Goudar writes
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
CPU: 5 PID: 1883 Comm: insmod Tainted: GOE 5.14.0
On 9/6/21 6:03 PM, Michael Ellerman wrote:
Ganesh Goudar writes:
We queue an irq work for deferred processing of mce event
in realmode mce handler, where translation is disabled.
Queuing of the work may result in accessing memory outside
RMO region; such access needs the translation
s space.
Signed-off-by: Ganesh Goudar
---
v3: No changes.
v2: No changes.
---
arch/powerpc/kernel/mce.c | 8
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
index 9d1e39d42e3e..5baf69503349 100644
--- a/arch/powerpc/ker
receives SIGBUS.
Signed-off-by: Ganesh Goudar
---
v3: Avoid using shell script to inject error.
v2: Fix build error.
---
tools/testing/selftests/powerpc/Makefile | 3 +-
tools/testing/selftests/powerpc/mce/Makefile | 7 ++
.../selftests/powerpc/mce/inject-ra-err.c | 65
Add support to parse and log control memory access
error for pseries. These changes are made according to
PAPR v2.11 10.3.2.2.12.
Signed-off-by: Ganesh Goudar
---
v3: Modify the commit log to mention the document according
to which changes are made.
Define and use a macro to check
] machine_check_queue_event+0xbc/0xd0
[c0001ebffcf0] [c000838c] machine_check_early_common+0x16c/0x1f4
Fixes: 74c3354bc1d89 ("powerpc/pseries/mce: restore msr before returning from
handler")
Signed-off-by: Ganesh Goudar
---
arch/powerpc/kernel/mce.c | 16 ++--
1 file c
On 8/26/21 8:57 AM, Michael Ellerman wrote:
Ganesh writes:
On 8/24/21 6:18 PM, Michael Ellerman wrote:
Ganesh Goudar writes:
Add test for real address or control memory address access
error handling, using NX-GZIP engine.
The error is injected by accessing the control memory address
On 8/25/21 2:54 AM, Segher Boessenkool wrote:
On Tue, Aug 24, 2021 at 04:39:57PM +1000, Michael Ellerman wrote:
+ case MC_ERROR_CTRL_MEM_ACCESS_PTABLE_WALK:
+ mce_err.u.ra_error_type =
+
On 8/24/21 6:18 PM, Michael Ellerman wrote:
Ganesh Goudar writes:
Add test for real address or control memory address access
error handling, using NX-GZIP engine.
The error is injected by accessing the control memory address
using illegal instruction, on successful handling the process
On 8/24/21 12:09 PM, Michael Ellerman wrote:
Hi Ganesh,
Some comments below ...
Ganesh Goudar writes:
Add support to parse and log control memory access
error for pseries.
Signed-off-by: Ganesh Goudar
---
v2: No changes in this patch.
---
arch/powerpc/platforms/pseries/ras.c | 21
Hi mpe, Any comments on this patchset?
On 8/5/21 2:50 PM, Ganesh Goudar wrote:
Add support to parse and log control memory access
error for pseries.
Signed-off-by: Ganesh Goudar
---
v2: No changes in this patch.
---
arch/powerpc/platforms/pseries/ras.c | 21 +
1 file
structure will be
empty in L1.
"Machine Check Exception, Unknown event version 0".
Signed-off-by: Ganesh Goudar
---
arch/powerpc/include/asm/mce.h | 2 +-
arch/powerpc/kernel/mce.c | 7 +--
2 files changed, 6 insertions(+), 3 deletions(-)
diff --git a/arch/powerpc/include/asm/mc
s space.
Signed-off-by: Ganesh Goudar
---
v2: No changes in this patch.
---
arch/powerpc/kernel/mce.c | 8
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
index 47a683cd00d2..f3ef480bb739 100644
--- a/arch/powerpc/ker
receives SIGBUS.
Signed-off-by: Ganesh Goudar
---
v2: Fix build error.
---
tools/testing/selftests/powerpc/Makefile | 3 +-
tools/testing/selftests/powerpc/mce/Makefile | 6 +++
.../selftests/powerpc/mce/inject-ra-err.c | 42 +++
.../selftests/powerpc/mce/inject-ra-err.sh
Add support to parse and log control memory access
error for pseries.
Signed-off-by: Ganesh Goudar
---
v2: No changes in this patch.
---
arch/powerpc/platforms/pseries/ras.c | 21 +
1 file changed, 21 insertions(+)
diff --git a/arch/powerpc/platforms/pseries/ras.c
b/arch
Add support to parse and log control memory access
error for pseries.
Signed-off-by: Ganesh Goudar
---
arch/powerpc/platforms/pseries/ras.c | 21 +
1 file changed, 21 insertions(+)
diff --git a/arch/powerpc/platforms/pseries/ras.c
b/arch/powerpc/platforms/pseries/ras.c
receives SIGBUS.
Signed-off-by: Ganesh Goudar
---
tools/testing/selftests/powerpc/Makefile | 3 +-
tools/testing/selftests/powerpc/mce/Makefile | 6 +++
.../selftests/powerpc/mce/inject-ra-err.c | 42 +++
.../selftests/powerpc/mce/inject-ra-err.sh| 19 +
4 files
s space.
Signed-off-by: Ganesh Goudar
---
arch/powerpc/kernel/mce.c | 8
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
index 47a683cd00d2..f3ef480bb739 100644
--- a/arch/powerpc/kernel/mce.c
+++ b/arch/powerpc/kernel/mce.
On 4/22/21 11:31 AM, Ganesh wrote:
On 4/7/21 10:28 AM, Ganesh Goudar wrote:
When we hit a UE while using machine check safe copy routines,
ignore_event flag is set and the event is ignored by mce handler,
and the flag is also saved for deferred handling and printing of
mce event information
On 4/7/21 10:28 AM, Ganesh Goudar wrote:
When we hit a UE while using machine check safe copy routines,
ignore_event flag is set and the event is ignored by mce handler,
and the flag is also saved for deferred handling and printing of
mce event information, but as of now saving of this flag
On 4/20/21 12:54 PM, Santosh Sivaraj wrote:
Hi Ganesh,
Ganesh Goudar writes:
When we hit a UE while using machine check safe copy routines,
ignore_event flag is set and the event is ignored by mce handler,
and the flag is also saved for deferred handling and printing of
mce event
On 4/17/21 6:06 PM, Michael Ellerman wrote:
Ganesh Goudar writes:
The error type is ICACHE and DCACHE, for case MCE_ERROR_TYPE_ICACHE.
Do you mean "is ICACHE not DCACHE" ?
Right :), Should I send v2 ?
cheers
Signed-off-by: Ganesh Goudar
---
arch/powerpc/platforms/pseries
The error type is ICACHE and DCACHE, for case MCE_ERROR_TYPE_ICACHE.
Signed-off-by: Ganesh Goudar
---
arch/powerpc/platforms/pseries/ras.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/powerpc/platforms/pseries/ras.c
b/arch/powerpc/platforms/pseries/ras.c
index
] memcpy+0x88/0x90
[ 512.972456] MCE: CPU1: Initiator CPU
[ 512.972534] MCE: CPU1: Unknown
Signed-off-by: Ganesh Goudar
---
arch/powerpc/kernel/mce.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
index 11f0cae086ed
on different architectures, so have
these variables in paca instead of having them as per-cpu variables
to avoid complications.
Signed-off-by: Ganesh Goudar
---
v2: Dynamically allocate memory for machine check event info.
v3: Remove check for hash mmu lpar, use memblock_alloc_try_nid
Maximum recursive depth of MCE is 4. Considering the maximum depth
allowed, reduce the size of event to 10 from 100. This saves us ~19kB
of memory and has no fatal consequences.
Signed-off-by: Ganesh Goudar
---
v4: This patch is a fragment of the original patch which is
split into two.
v5
On 1/25/21 2:54 PM, Christophe Leroy wrote:
Le 22/01/2021 à 13:32, Ganesh Goudar a écrit :
Access to per-cpu variables requires translation to be enabled on
pseries machine running in hash mmu mode, since part of MCE handler
runs in realmode and part of MCE handling code is shared between
on different architectures, so have
these variables in paca instead of having them as per-cpu variables
to avoid complications.
Signed-off-by: Ganesh Goudar
---
v2: Dynamically allocate memory for machine check event info
v3: Remove check for hash mmu lpar, use memblock_alloc_try_nid
Maximum recursive depth of MCE is 4. Considering the maximum depth
allowed, reduce the size of event to 10 from 100. This saves us ~19kB
of memory and has no fatal consequences.
Signed-off-by: Ganesh Goudar
---
v4: This patch is a fragment of the original patch which is
split into two
On 1/19/21 9:28 AM, Nicholas Piggin wrote:
Excerpts from Ganesh Goudar's message of January 15, 2021 10:58 pm:
Access to per-cpu variables requires translation to be enabled on
pseries machine running in hash mmu mode, since part of MCE handler
runs in realmode and part of MCE handling code
on different architectures, so have
these variables in paca instead of having them as per-cpu variables
to avoid complications.
Maximum recursive depth of MCE is 4. Considering the maximum depth
allowed, reduce the size of event to 10 from 100.
Signed-off-by: Ganesh Goudar
---
v2: Dynamically
on different architectures, so have
these variables in paca instead of having them as per-cpu variables
to avoid complications.
Maximum recursive depth of MCE is 4. Considering the maximum depth
allowed, reduce the size of event to 10 from 100.
Signed-off-by: Ganesh Goudar
---
v2: Dynamically
On 12/8/20 4:01 PM, Michael Ellerman wrote:
Ganesh Goudar writes:
diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index 9454d29ff4b4..4769954efa7d 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -273,6 +274,17 @@ struct
on different architectures, so have
these variables in paca instead of having them as per-cpu variables
to avoid complications.
Maximum recursive depth of MCE is 4. Considering the maximum depth
allowed, reduce the size of event to 10 from 100.
Signed-off-by: Ganesh Goudar
---
arch/powerpc/include
To check machine check handling, add support to inject slb
multihit errors.
Cc: Kees Cook
Cc: Michal Suchánek
Co-developed-by: Mahesh Salgaonkar
Signed-off-by: Mahesh Salgaonkar
Signed-off-by: Ganesh Goudar
---
v5:
- Insert entries at SLB_NUM_BOLTED and SLB_NUM_BOLTED +1, remove index
On 10/19/20 6:45 PM, Michal Suchánek wrote:
On Mon, Oct 19, 2020 at 09:59:57PM +1100, Michael Ellerman wrote:
Hi Ganesh,
Some comments below ...
Ganesh Goudar writes:
To check machine check handling, add support to inject slb
multihit errors.
Cc: Kees Cook
Reviewed-by: Michal Suchánek
On 7/24/20 12:09 PM, Ganesh Goudar wrote:
When a UE or memory error exception is encountered, the MCE handler
tries to find the pfn using addr_to_pfn(), which takes an effective
address as an argument. The pfn is later used to poison the page where
the memory error occurred. Recent rework in this area made
On 10/16/20 5:02 PM, Michael Ellerman wrote:
On Fri, 9 Oct 2020 12:10:03 +0530, Ganesh Goudar wrote:
This patch series fixes mce handling for pseries, adds LKDTM test
for SLB multihit recovery and enables selftest for the same,
basically to test MCE handling on pseries/powernv machines running
To check machine check handling, add support to inject slb
multihit errors.
Cc: Kees Cook
Reviewed-by: Michal Suchánek
Co-developed-by: Mahesh Salgaonkar
Signed-off-by: Mahesh Salgaonkar
Signed-off-by: Ganesh Goudar
---
drivers/misc/lkdtm/Makefile | 1 +
drivers/misc/lkdtm
on pseries machine running in hash
mmu mode.
Fixes: 116ac378bb3f ("powerpc/64s: machine check interrupt update NMI
accounting")
Signed-off-by: Ganesh Goudar
---
arch/powerpc/kernel/mce.c | 7 +++
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/arch/powerpc/kernel/mc
,
as nesting is supported.
* Fix build errors and remove unused variables.
* Integrate error injection code into LKDTM.
* Add support to inject multihit in paca.
Ganesh Goudar (2):
powerpc/mce: remove nmi_enter/exit from real mode handler
lkdtm/powerpc: Add SLB multihit test
arch/powerpc/kernel
On 10/1/20 11:21 PM, Ganesh Goudar wrote:
Use of nmi_enter/exit in real mode handler causes the kernel to panic
and reboot on injecting slb multihit on pseries machine running in hash
mmu mode, as these calls try to access memory outside RMO region in
real mode handler where translation
To check machine check handling, add support to inject slb
multihit errors.
Reviewed-by: Michal Suchánek
Co-developed-by: Mahesh Salgaonkar
Signed-off-by: Mahesh Salgaonkar
Signed-off-by: Ganesh Goudar
---
drivers/misc/lkdtm/Makefile | 1 +
drivers/misc/lkdtm/core.c
on pseries machine running in hash
mmu mode.
Fixes: 116ac378bb3f ("powerpc/64s: machine check interrupt update NMI
accounting")
Signed-off-by: Ganesh Goudar
---
arch/powerpc/kernel/mce.c | 10 ++
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/arch/powerpc/kern
support to inject multihit in paca.
Ganesh Goudar (2):
powerpc/mce: remove nmi_enter/exit from real mode handler
lkdtm/powerpc: Add SLB multihit test
arch/powerpc/kernel/mce.c | 10 +-
drivers/misc/lkdtm/Makefile | 1 +
drivers/misc/lkdtm/core.c | 3
On 9/26/20 1:29 AM, Kees Cook wrote:
On Fri, Sep 25, 2020 at 04:01:23PM +0530, Ganesh Goudar wrote:
Add PPC_SLB_MULTIHIT to lkdtm selftest framework.
Signed-off-by: Ganesh Goudar
---
tools/testing/selftests/lkdtm/tests.txt | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing
On 9/26/20 1:27 AM, Kees Cook wrote:
On Fri, Sep 25, 2020 at 04:01:22PM +0530, Ganesh Goudar wrote:
Add support to inject slb multihit errors, to test machine
check handling.
Thank you for more tests in here!
Based on work by Mahesh Salgaonkar and Michal Suchánek.
Cc: Mahesh Salgaonkar
.
* Fix build errors and remove unused variables.
* Integrate error injection code into LKDTM.
* Add support to inject multihit in paca.
Ganesh Goudar (3):
powerpc/mce: remove nmi_enter/exit from real mode handler
lkdtm/powerpc: Add SLB multihit test
selftests/lkdtm: Enable selftest for SLB