Let's use alloc_contig_pages() for allocating memory and remove the
linear mapping manually via arch_remove_linear_mapping(). Mark all pages
PG_offline, so that they will definitely not get touched, e.g., when
hibernating. When freeing memory, try to revert these steps.

The original idea was discussed in:
https://lkml.kernel.org/r/48340e96-7e6b-736f-9e23-d3111b915...@redhat.com

This is similar to the CONFIG_DEBUG_PAGEALLOC handling on other
architectures, where only single pages are unmapped from the linear
mapping. Let's mimic what memory hot(un)plug would do with the linear
mapping.
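
A minimal user-space model of the flow described above, assuming a toy
array of page descriptors. struct model_page, model_alloc_trace() and
model_free_trace() are hypothetical stand-ins, not kernel APIs. The model
only illustrates the ordering: take over a contiguous range, mark its
pages offline, drop them from the linear mapping. Freeing undoes each
step in reverse.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical user-space model: none of these names are kernel APIs.
 * Each flag stands in for one step of the flow described above.
 */
#define MODEL_NR_PAGES 16

struct model_page {
	bool allocated; /* owned by us, like a range from alloc_contig_pages() */
	bool offline;   /* like PG_offline: nobody else may touch the page */
	bool mapped;    /* present in the (modeled) linear mapping */
};

static struct model_page pages[MODEL_NR_PAGES];

/* Start state: every page is free and mapped. */
static void model_init(void)
{
	for (size_t i = 0; i < MODEL_NR_PAGES; i++)
		pages[i] = (struct model_page){ .mapped = true };
}

/*
 * Find a free contiguous run of nr pages, then take it over: allocate,
 * mark offline, unmap. Returns the first index, or -1 on failure.
 */
static long model_alloc_trace(size_t nr)
{
	for (size_t start = 0; start + nr <= MODEL_NR_PAGES; start++) {
		size_t i = 0;

		while (i < nr && !pages[start + i].allocated)
			i++;
		if (i < nr)
			continue; /* run interrupted by an allocated page */

		for (i = 0; i < nr; i++) {
			pages[start + i].allocated = true;
			pages[start + i].offline = true;
			pages[start + i].mapped = false;
		}
		return (long)start;
	}
	return -1;
}

/* Undo the allocation step by step, in reverse order. */
static void model_free_trace(size_t start, size_t nr)
{
	for (size_t i = 0; i < nr; i++) {
		struct model_page *p = &pages[start + i];

		assert(p->allocated && p->offline && !p->mapped);
		p->mapped = true;
		p->offline = false;
		p->allocated = false;
	}
}
```

Keeping allocation and freeing as mirror images makes "try to revert"
easy to reason about: every state change on the way in has exactly one
counterpart on the way out.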

We now need MEMORY_HOTPLUG and CONTIG_ALLOC as dependencies. Add a TODO
to use __GFP_ZERO for clearing once alloc_contig_pages() supports it.
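
Until the allocator can zero the range itself, the memory has to be
cleared by hand after allocation. A minimal user-space sketch of
page-by-page clearing, assuming a fixed 4 KiB model page size;
MODEL_PAGE_SIZE, model_clear_range() and model_range_is_zero() are
hypothetical and are not the kernel's clearing code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical page size for this model; real page sizes depend on arch/config. */
#define MODEL_PAGE_SIZE 4096u

/*
 * Clear nr_pages pages starting at base, one page at a time. In kernel
 * code, a per-page loop like this could also be a natural point to
 * yield the CPU when clearing large ranges.
 */
static void model_clear_range(unsigned char *base, size_t nr_pages)
{
	assert(base != NULL);
	for (size_t i = 0; i < nr_pages; i++)
		memset(base + i * MODEL_PAGE_SIZE, 0, MODEL_PAGE_SIZE);
}

/* Returns 1 if every byte in the range is zero, 0 otherwise. */
static int model_range_is_zero(const unsigned char *base, size_t nr_pages)
{
	for (size_t i = 0; i < nr_pages * MODEL_PAGE_SIZE; i++)
		if (base[i])
			return 0;
	return 1;
}
```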

Tested in QEMU/TCG with 10 GiB of main memory:
[root@localhost ~]# echo 0x4000 > /sys/kernel/debug/powerpc/memtrace/enable
[ 105.903043][ T1080] memtrace: Allocated trace memory on node 0 at 0x8000
[root@localhost ~]# echo 0x4000 > /sys/kernel/debug/powerpc/memtrace/enable
[ 145.042493][ T1080] radix-mmu: Mapped 0x8000-0xc000 with 64.0 KiB pages
[ 145.049019][ T1080] memtrace: Freed trace memory back on node 0
[ 145.333960][ T1080] memtrace: Allocated trace memory on node 0 at 0x8000
[root@localhost ~]# echo 0x8000 > /sys/kernel/debug/powerpc/memtrace/enable
[ 213.606916][ T1080] radix-mmu: Mapped 0x8000-0xc000 with 64.0 KiB pages
[ 213.613855][ T1080] memtrace: Freed trace memory back on node 0
[ 214.185094][ T1080] memtrace: Allocated trace memory on node 0 at 0x8000
[root@localhost ~]# echo 0x1 > /sys/kernel/debug/powerpc/memtrace/enable
[ 234.874872][ T1080] radix-mmu: Mapped 0x8000-0x0001 with 64.0 KiB pages
[ 234.886974][ T1080] memtrace: Freed trace memory back on node 0
[ 234.890153][ T1080] memtrace: Failed to allocate trace memory on node 0
[root@localhost ~]# echo 0x4000 > /sys/kernel/debug/powerpc/memtrace/enable
[ 259.490196][ T1080] memtrace: Allocated trace memory on node 0 at 0x8000

I also made sure allocated memory is properly zeroed.

Note 1: We currently won't be allocating from ZONE_MOVABLE, because our
pages are not movable. However, as we don't run with any memory
hot(un)plug mechanism around, we could make an exception to
increase the chance of allocations succeeding.

Note 2: PG_reserved isn't sufficient. E.g., kernel_page_present(), used
along with PG_reserved in the hibernation code, will always return
"true" on powerpc, resulting in the pages getting touched. PG_reserved
is also too generic; e.g., it indicates boot allocations as well.
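
To illustrate Note 2, a simplified user-space model, assuming a
hibernation check that skips PG_offline pages outright but skips
PG_reserved pages only when they are not present in the kernel mapping.
struct model_page_flags, model_kernel_page_present() and
model_page_is_saveable() are hypothetical; this is not the kernel's
snapshot code. With presence always reported as "true", a PG_reserved
page would still be saved (and therefore read), while a PG_offline page
is skipped.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two page flags discussed in Note 2. */
struct model_page_flags {
	bool reserved; /* like PG_reserved: also set for e.g. boot allocations */
	bool offline;  /* like PG_offline: "do not touch this page" */
};

/*
 * Stand-in for kernel_page_present(). Per Note 2 it always returns
 * "true" on powerpc, so it is modeled as a constant here.
 */
static bool model_kernel_page_present(void)
{
	return true;
}

/* Hypothetical "would hibernation save (read) this page?" decision. */
static bool model_page_is_saveable(const struct model_page_flags *f)
{
	if (f->offline)
		return false; /* skipped regardless of the mapping state */
	if (f->reserved && !model_kernel_page_present())
		return false; /* never taken while presence is always true */
	return true;
}
```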

Note 3: For now, we keep using memory_block_size_bytes() as minimum
granularity.

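
Note 3 implies that requested trace sizes are handled in multiples of
memory_block_size_bytes(). A minimal sketch of such a granularity check,
assuming a power-of-two block size; model_trace_size_aligned() is a
hypothetical helper whose block_bytes parameter stands in for
memory_block_size_bytes(). How a size of zero is treated is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: a requested size is acceptable only if it is a
 * multiple of the memory block size. With a power-of-two block size,
 * "multiple of" reduces to "no low bits set below the block size".
 */
static bool model_trace_size_aligned(uint64_t val, uint64_t block_bytes)
{
	/* The mask trick below is only valid for power-of-two sizes. */
	assert(block_bytes != 0 && (block_bytes & (block_bytes - 1)) == 0);
	return (val & (block_bytes - 1)) == 0;
}
```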
Suggested-by: Michal Hocko
Cc: Michael Ellerman
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Rashmica Gupta
Cc: Andrew Morton
Cc: Mike Rapoport
Cc: Michal Hocko
Cc: Oscar Salvador
Cc: Wei Yang
Signed-off-by: David Hildenbrand
---
arch/powerpc/platforms/powernv/Kconfig    |   8 +-
arch/powerpc/platforms/powernv/memtrace.c | 163 --
2 files changed, 62 insertions(+), 109 deletions(-)
diff --git a/arch/powerpc/platforms/powernv/Kconfig b/arch/powerpc/platforms/powernv/Kconfig
index 938803eab0ad..619b093a0657 100644
--- a/arch/powerpc/platforms/powernv/Kconfig
+++ b/arch/powerpc/platforms/powernv/Kconfig
@@ -27,11 +27,11 @@ config OPAL_PRD
 	  recovery diagnostics on OpenPower machines
 
 config PPC_MEMTRACE
-	bool "Enable removal of RAM from kernel mappings for tracing"
-	depends on PPC_POWERNV && MEMORY_HOTREMOVE
+	bool "Enable runtime allocation of RAM for tracing"
+	depends on PPC_POWERNV && MEMORY_HOTPLUG && CONTIG_ALLOC
 	help
-	  Enabling this option allows for the removal of memory (RAM)
-	  from the kernel mappings to be used for hardware tracing.
+	  Enabling this option allows for runtime allocation of memory (RAM)
+	  for hardware tracing.
 
 config PPC_VAS
 	bool "IBM Virtual Accelerator Switchboard (VAS)"
diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c
index 0e42fe2d7b6a..5fc9408bb0b3 100644
--- a/arch/powerpc/platforms/powernv/memtrace.c
+++ b/arch/powerpc/platforms/powernv/memtrace.c
@@ -51,33 +51,12 @@ static const struct file_operations memtrace_fops = {
 	.open = simple_open,
 };
 
-static int check_memblock_online(struct memory_block *mem, void *arg)
-{
-	if (mem->state != MEM_ONLINE)
-		return -1;
-
-	return 0;
-}
-
-static int change_memblock_state(struct memory_block *mem, void *arg)
-{
-	unsigned long state = (unsigned long)arg;
-
-	mem->state = state;
-
-	return 0;
-}
-
 static void memtrace_clear_range(unsigned long start_pfn,
 				 unsigned long nr