Re: [PATCH v6 05/10] mm/memory_hotplug: Shrink zones when offlining memory
On Wed, 18 Dec 2019 18:08:04 +0100 David Hildenbrand wrote:

> On 01.12.19 00:21, Andrew Morton wrote:
> > On Sun, 27 Oct 2019 23:45:52 +0100 David Hildenbrand wrote:
> >
> >> I think I just found an issue with try_offline_node().
> >> try_offline_node() is pretty much broken already (touches garbage
> >> memmaps and will not consider mixed NIDs within sections), however,
> >> relies on the node span to look for memory sections to probe. So it
> >> seems to rely on the nodes getting shrunk when removing memory, not when
> >> offlining.
> >>
> >> As we shrink the node span when offlining now and not when removing,
> >> this can go wrong once we offline the last memory block of the node and
> >> offline the last CPU. We could still have memory around that we could
> >> re-online, however, the node would already be offline. Unlikely, but
> >> possible.
> >>
> >> Note that the same is also broken without this patch in case memory is
> >> never onlined. The "pfn_to_nid(pfn) != nid" can easily succeed on the
> >> garbage memmap, resulting in no memory being detected as belonging to
> >> the node. Also, resize_pgdat_range() is called when onlining memory, not
> >> when adding it. :/ Oh this is so broken :)
> >>
> >> The right fix is probably to walk over all memory blocks that could
> >> exist and test if they belong to the nid (if offline, check the
> >> block->nid, if online check all pageblocks). A fix we can then move in
> >> front of this patch.
> >>
> >> Will look into this this week.
> >
> > And this series shows almost no sign of having been reviewed. I'll hold
> > it over for 5.6.
> >
>
> Hi Andrew, any chance we can get the (now at least reviewed - thx Oscar)
> fix in patch #5 into 5.5? (I want to do the final stable backports for
> the uninitialized memmap stuff)

Sure, I queued it for the next batch of 5.5 fixes.
Re: [PATCH v6 05/10] mm/memory_hotplug: Shrink zones when offlining memory
On 01.12.19 00:21, Andrew Morton wrote:
> On Sun, 27 Oct 2019 23:45:52 +0100 David Hildenbrand wrote:
>
>> I think I just found an issue with try_offline_node().
>> try_offline_node() is pretty much broken already (touches garbage
>> memmaps and will not consider mixed NIDs within sections), however,
>> relies on the node span to look for memory sections to probe. So it
>> seems to rely on the nodes getting shrunk when removing memory, not when
>> offlining.
>>
>> As we shrink the node span when offlining now and not when removing,
>> this can go wrong once we offline the last memory block of the node and
>> offline the last CPU. We could still have memory around that we could
>> re-online, however, the node would already be offline. Unlikely, but
>> possible.
>>
>> Note that the same is also broken without this patch in case memory is
>> never onlined. The "pfn_to_nid(pfn) != nid" can easily succeed on the
>> garbage memmap, resulting in no memory being detected as belonging to
>> the node. Also, resize_pgdat_range() is called when onlining memory, not
>> when adding it. :/ Oh this is so broken :)
>>
>> The right fix is probably to walk over all memory blocks that could
>> exist and test if they belong to the nid (if offline, check the
>> block->nid, if online check all pageblocks). A fix we can then move in
>> front of this patch.
>>
>> Will look into this this week.
>
> And this series shows almost no sign of having been reviewed. I'll hold
> it over for 5.6.
>

Hi Andrew, any chance we can get the (now at least reviewed - thx Oscar)
fix in patch #5 into 5.5? (I want to do the final stable backports for
the uninitialized memmap stuff)

--
Thanks,

David / dhildenb
Re: [PATCH v6 05/10] mm/memory_hotplug: Shrink zones when offlining memory
On 03.12.19 16:10, Oscar Salvador wrote:
> On Sun, Oct 06, 2019 at 10:56:41AM +0200, David Hildenbrand wrote:
>> Fixes: d0dc12e86b31 ("mm/memory_hotplug: optimize memory hotplug")
>> Signed-off-by: David Hildenbrand
>
> I did not see anything wrong with the taken approach, and makes sense to me.
> The only thing that puzzles me is we seem to not balance spanned_pages
> for ZONE_DEVICE anymore.
> memremap_pages() increments them via move_pfn_range_to_zone, but we skip
> ZONE_DEVICE in remove_pfn_range_from_zone.

Yes, documented e.g., in

commit 7ce700bf11b5e2cb84e4352bbdf2123a7a239c84
Author: David Hildenbrand
Date:   Thu Nov 21 17:53:56 2019 -0800

    mm/memory_hotplug: don't access uninitialized memmaps in shrink_zone_span()

Needs some more thought - but is definitely not urgent (well, now it's
at least no longer completely broken).

>
> That is not really related to this patch, so I might be missing something,
> but it caught my eye while reviewing this.
>
> Anyway, for this one:
>
> Reviewed-by: Oscar Salvador
>

Thanks!

>
> off-topic: I __think__ we really need to trim the CC list.

Yes we should :) - done.

--
Thanks,

David / dhildenb
Re: [PATCH v6 05/10] mm/memory_hotplug: Shrink zones when offlining memory
On Sun, Oct 06, 2019 at 10:56:41AM +0200, David Hildenbrand wrote:
> Fixes: d0dc12e86b31 ("mm/memory_hotplug: optimize memory hotplug")
> Signed-off-by: David Hildenbrand

I did not see anything wrong with the taken approach, and it makes sense to me.
The only thing that puzzles me is that we no longer seem to balance
spanned_pages for ZONE_DEVICE.
memremap_pages() increments them via move_pfn_range_to_zone(), but we skip
ZONE_DEVICE in remove_pfn_range_from_zone().

That is not really related to this patch, so I might be missing something,
but it caught my eye while reviewing this.

Anyway, for this one:

Reviewed-by: Oscar Salvador

off-topic: I __think__ we really need to trim the CC list.

> ---
>  arch/arm64/mm/mmu.c            |  4 +---
>  arch/ia64/mm/init.c            |  4 +---
>  arch/powerpc/mm/mem.c          |  3 +--
>  arch/s390/mm/init.c            |  4 +---
>  arch/sh/mm/init.c              |  4 +---
>  arch/x86/mm/init_32.c          |  4 +---
>  arch/x86/mm/init_64.c          |  4 +---
>  include/linux/memory_hotplug.h |  7 +--
>  mm/memory_hotplug.c            | 31 ---
>  mm/memremap.c                  |  2 +-
>  10 files changed, 29 insertions(+), 38 deletions(-)
>
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 60c929f3683b..d10247fab0fd 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -1069,7 +1069,6 @@ void arch_remove_memory(int nid, u64 start, u64 size,
>  {
>  	unsigned long start_pfn = start >> PAGE_SHIFT;
>  	unsigned long nr_pages = size >> PAGE_SHIFT;
> -	struct zone *zone;
>
>  	/*
>  	 * FIXME: Cleanup page tables (also in arch_add_memory() in case
> @@ -1078,7 +1077,6 @@ void arch_remove_memory(int nid, u64 start, u64 size,
>  	 * unplug. ARCH_ENABLE_MEMORY_HOTREMOVE must not be
>  	 * unlocked yet.
>  	 */
> -	zone = page_zone(pfn_to_page(start_pfn));
> -	__remove_pages(zone, start_pfn, nr_pages, altmap);
> +	__remove_pages(start_pfn, nr_pages, altmap);
>  }
>  #endif
> diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
> index bf9df2625bc8..a6dd80a2c939 100644
> --- a/arch/ia64/mm/init.c
> +++ b/arch/ia64/mm/init.c
> @@ -689,9 +689,7 @@ void arch_remove_memory(int nid, u64 start, u64 size,
>  {
>  	unsigned long start_pfn = start >> PAGE_SHIFT;
>  	unsigned long nr_pages = size >> PAGE_SHIFT;
> -	struct zone *zone;
>
> -	zone = page_zone(pfn_to_page(start_pfn));
> -	__remove_pages(zone, start_pfn, nr_pages, altmap);
> +	__remove_pages(start_pfn, nr_pages, altmap);
>  }
>  #endif
> diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
> index be941d382c8d..97e5922cb52e 100644
> --- a/arch/powerpc/mm/mem.c
> +++ b/arch/powerpc/mm/mem.c
> @@ -130,10 +130,9 @@ void __ref arch_remove_memory(int nid, u64 start, u64 size,
>  {
>  	unsigned long start_pfn = start >> PAGE_SHIFT;
>  	unsigned long nr_pages = size >> PAGE_SHIFT;
> -	struct page *page = pfn_to_page(start_pfn) + vmem_altmap_offset(altmap);
>  	int ret;
>
> -	__remove_pages(page_zone(page), start_pfn, nr_pages, altmap);
> +	__remove_pages(start_pfn, nr_pages, altmap);
>
>  	/* Remove htab bolted mappings for this section of memory */
>  	start = (unsigned long)__va(start);
> diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
> index a124f19f7b3c..c1d96e588152 100644
> --- a/arch/s390/mm/init.c
> +++ b/arch/s390/mm/init.c
> @@ -291,10 +291,8 @@ void arch_remove_memory(int nid, u64 start, u64 size,
>  {
>  	unsigned long start_pfn = start >> PAGE_SHIFT;
>  	unsigned long nr_pages = size >> PAGE_SHIFT;
> -	struct zone *zone;
>
> -	zone = page_zone(pfn_to_page(start_pfn));
> -	__remove_pages(zone, start_pfn, nr_pages, altmap);
> +	__remove_pages(start_pfn, nr_pages, altmap);
>  	vmem_remove_mapping(start, size);
>  }
>  #endif /* CONFIG_MEMORY_HOTPLUG */
> diff --git a/arch/sh/mm/init.c b/arch/sh/mm/init.c
> index dfdbaa50946e..d1b1ff2be17a 100644
> --- a/arch/sh/mm/init.c
> +++ b/arch/sh/mm/init.c
> @@ -434,9 +434,7 @@ void arch_remove_memory(int nid, u64 start, u64 size,
>  {
>  	unsigned long start_pfn = PFN_DOWN(start);
>  	unsigned long nr_pages = size >> PAGE_SHIFT;
> -	struct zone *zone;
>
> -	zone = page_zone(pfn_to_page(start_pfn));
> -	__remove_pages(zone, start_pfn, nr_pages, altmap);
> +	__remove_pages(start_pfn, nr_pages, altmap);
>  }
>  #endif /* CONFIG_MEMORY_HOTPLUG */
> diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
> index 930edeb41ec3..0a74407ef92e 100644
> --- a/arch/x86/mm/init_32.c
> +++ b/arch/x86/mm/init_32.c
> @@ -865,10 +865,8 @@ void arch_remove_memory(int nid, u64 start, u64 size,
>  {
>  	unsigned long start_pfn = start >> PAGE_SHIFT;
>  	unsigned long nr_pages = size >> PAGE_SHIFT;
> -	struct zone *zone;
>
> -	zone = page_zone(pfn_to_page(start_pfn));
> -	__remove_pages(zone, start_pfn, nr_pages, altmap);
> +
Re: [PATCH v6 05/10] mm/memory_hotplug: Shrink zones when offlining memory
> On 01.12.2019 at 00:22, Andrew Morton wrote:
>
> On Sun, 27 Oct 2019 23:45:52 +0100 David Hildenbrand wrote:
>
>> I think I just found an issue with try_offline_node().
>> try_offline_node() is pretty much broken already (touches garbage
>> memmaps and will not consider mixed NIDs within sections), however,
>> relies on the node span to look for memory sections to probe. So it
>> seems to rely on the nodes getting shrunk when removing memory, not when
>> offlining.
>>
>> As we shrink the node span when offlining now and not when removing,
>> this can go wrong once we offline the last memory block of the node and
>> offline the last CPU. We could still have memory around that we could
>> re-online, however, the node would already be offline. Unlikely, but
>> possible.
>>
>> Note that the same is also broken without this patch in case memory is
>> never onlined. The "pfn_to_nid(pfn) != nid" can easily succeed on the
>> garbage memmap, resulting in no memory being detected as belonging to
>> the node. Also, resize_pgdat_range() is called when onlining memory, not
>> when adding it. :/ Oh this is so broken :)
>>
>> The right fix is probably to walk over all memory blocks that could
>> exist and test if they belong to the nid (if offline, check the
>> block->nid, if online check all pageblocks). A fix we can then move in
>> front of this patch.
>>
>> Will look into this this week.
>
> And this series shows almost no sign of having been reviewed. I'll hold
> it over for 5.6.
>

Makes sense, can't do anything about it. Btw, this one is the last stable
patch to fix access of uninitialized memmaps that is not upstream yet...
so it has to remain broken for a while longer.
Re: [PATCH v6 05/10] mm/memory_hotplug: Shrink zones when offlining memory
On Sun, 27 Oct 2019 23:45:52 +0100 David Hildenbrand wrote:

> I think I just found an issue with try_offline_node().
> try_offline_node() is pretty much broken already (touches garbage
> memmaps and will not consider mixed NIDs within sections), however,
> relies on the node span to look for memory sections to probe. So it
> seems to rely on the nodes getting shrunk when removing memory, not when
> offlining.
>
> As we shrink the node span when offlining now and not when removing,
> this can go wrong once we offline the last memory block of the node and
> offline the last CPU. We could still have memory around that we could
> re-online, however, the node would already be offline. Unlikely, but
> possible.
>
> Note that the same is also broken without this patch in case memory is
> never onlined. The "pfn_to_nid(pfn) != nid" can easily succeed on the
> garbage memmap, resulting in no memory being detected as belonging to
> the node. Also, resize_pgdat_range() is called when onlining memory, not
> when adding it. :/ Oh this is so broken :)
>
> The right fix is probably to walk over all memory blocks that could
> exist and test if they belong to the nid (if offline, check the
> block->nid, if online check all pageblocks). A fix we can then move in
> front of this patch.
>
> Will look into this this week.

And this series shows almost no sign of having been reviewed. I'll hold
it over for 5.6.
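The fix proposed in the quoted text (walk all memory blocks and test whether
they belong to the nid) can be illustrated roughly as follows. This is only a
userspace sketch under stated assumptions, not the kernel implementation:
struct mem_block, PAGEBLOCKS_PER_BLOCK, block_belongs_to_nid() and
node_has_memory() are hypothetical names made up for illustration. The point
it tries to show is that an offline block is judged only by the nid recorded
for the block, because its memmap may be uninitialized, while an online block
can be checked per pageblock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGEBLOCKS_PER_BLOCK 4 /* hypothetical value, for illustration only */

/* Hypothetical stand-in for a memory block; NOT the kernel's struct memory_block. */
struct mem_block {
	bool present;                            /* block exists (memory was added) */
	bool online;                             /* block is currently online */
	int nid;                                 /* node recorded when the block was added */
	int pageblock_nid[PAGEBLOCKS_PER_BLOCK]; /* per-pageblock node, valid only when online */
};

/* Return true if the block holds any memory that belongs to nid. */
bool block_belongs_to_nid(const struct mem_block *blk, int nid)
{
	if (!blk->present)
		return false;
	/* Offline: the memmap may be garbage, so trust only blk->nid. */
	if (!blk->online)
		return blk->nid == nid;
	/* Online: the memmap is initialized, so mixed nodes can be detected. */
	for (size_t i = 0; i < PAGEBLOCKS_PER_BLOCK; i++)
		if (blk->pageblock_nid[i] == nid)
			return true;
	return false;
}

/* Walk every block that could exist instead of relying on the node span. */
bool node_has_memory(const struct mem_block *blocks, size_t n, int nid)
{
	for (size_t i = 0; i < n; i++)
		if (block_belongs_to_nid(&blocks[i], nid))
			return true;
	return false;
}
```

In this model a node may only be taken offline when node_has_memory()
returns false for it, regardless of what the node span currently says.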
Re: [PATCH v6 05/10] mm/memory_hotplug: Shrink zones when offlining memory
On Mon, 14 Oct 2019 11:39:13 +0200 David Hildenbrand wrote:

> > Fixes: d0dc12e86b31 ("mm/memory_hotplug: optimize memory hotplug")
>
> @Andrew, can you convert that to
>
> Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") # visible after d0dc12e86b319

Done.

> While adding cc'ing sta...@vger.kernel.org # v4.13+ would be nice,
> I doubt it will be easily possible to backport, as we are missing
> some prereq patches (e.g., from Oscar like 2c2a5af6fed2 ("mm,
> memory_hotplug: add nid parameter to arch_remove_memory")). But, it could
> be done with some work.
>
> I think "Cc: sta...@vger.kernel.org # v5.0+" could be done more
> easily. Maybe it's okay to not cc:stable this one. We usually
> online all memory (except s390x), however, s390x does not remove that
> memory ever. Devmem with driver reserved memory would be, however,
> worth backporting this.

I added

Cc: [5.0+]
Re: [PATCH v6 05/10] mm/memory_hotplug: Shrink zones when offlining memory
On 06.10.19 10:56, David Hildenbrand wrote:
> We currently try to shrink a single zone when removing memory. We use the
> zone of the first page of the memory we are removing. If that memmap was
> never initialized (e.g., memory was never onlined), we will read garbage
> and can trigger kernel BUGs (due to a stale pointer):
>
> :/# [   23.912993] BUG: unable to handle page fault for address: 353d
> [   23.914219] #PF: supervisor write access in kernel mode
> [   23.915199] #PF: error_code(0x0002) - not-present page
> [   23.916160] PGD 0 P4D 0
> [   23.916627] Oops: 0002 [#1] SMP PTI
> [   23.917256] CPU: 1 PID: 7 Comm: kworker/u8:0 Not tainted 5.3.0-rc5-next-20190820+ #317
> [   23.918900] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.4
> [   23.921194] Workqueue: kacpi_hotplug acpi_hotplug_work_fn
> [   23.922249] RIP: 0010:clear_zone_contiguous+0x5/0x10
> [   23.923173] Code: 48 89 c6 48 89 c3 e8 2a fe ff ff 48 85 c0 75 cf 5b 5d c3 c6 85 fd 05 00 00 01 5b 5d c3 0f 1f 840
> [   23.926876] RSP: 0018:ad2400043c98 EFLAGS: 00010246
> [   23.927928] RAX: RBX: 0002 RCX:
> [   23.929458] RDX: 0020 RSI: 0014 RDI: 2f40
> [   23.930899] RBP: 00014000 R08: R09: 0001
> [   23.932362] R10: R11: R12: 0014
> [   23.933603] R13: 0014 R14: 2f40 R15: 9e3e7aff3680
> [   23.934913] FS: () GS:9e3e7bb0() knlGS:
> [   23.936294] CS: 0010 DS: ES: CR0: 80050033
> [   23.937481] CR2: 353d CR3: 5861 CR4: 06e0
> [   23.938687] DR0: DR1: DR2:
> [   23.939889] DR3: DR6: fffe0ff0 DR7: 0400
> [   23.941168] Call Trace:
> [   23.941580]  __remove_pages+0x4b/0x640
> [   23.942303]  ? mark_held_locks+0x49/0x70
> [   23.943149]  arch_remove_memory+0x63/0x8d
> [   23.943921]  try_remove_memory+0xdb/0x130
> [   23.944766]  ? walk_memory_blocks+0x7f/0x9e
> [   23.945616]  __remove_memory+0xa/0x11
> [   23.946274]  acpi_memory_device_remove+0x70/0x100
> [   23.947308]  acpi_bus_trim+0x55/0x90
> [   23.947914]  acpi_device_hotplug+0x227/0x3a0
> [   23.948714]  acpi_hotplug_work_fn+0x1a/0x30
> [   23.949433]  process_one_work+0x221/0x550
> [   23.950190]  worker_thread+0x50/0x3b0
> [   23.950993]  kthread+0x105/0x140
> [   23.951644]  ? process_one_work+0x550/0x550
> [   23.952508]  ? kthread_park+0x80/0x80
> [   23.953367]  ret_from_fork+0x3a/0x50
> [   23.954025] Modules linked in:
> [   23.954613] CR2: 353d
> [   23.955248] ---[ end trace 93d982b1fb3e1a69 ]---
>
> Instead, shrink the zones when offlining memory or when onlining failed.
> Introduce and use remove_pfn_range_from_zone() for that. We now properly
> shrink the zones, even if we have DIMMs whereby
> - Some memory blocks fall into no zone (never onlined)
> - Some memory blocks fall into multiple zones (offlined+re-onlined)
> - Multiple memory blocks that fall into different zones
>
> Drop the zone parameter (with a potential dubious value) from
> __remove_pages() and __remove_section().
>
> Cc: Catalin Marinas
> Cc: Will Deacon
> Cc: Tony Luck
> Cc: Fenghua Yu
> Cc: Benjamin Herrenschmidt
> Cc: Paul Mackerras
> Cc: Michael Ellerman
> Cc: Heiko Carstens
> Cc: Vasily Gorbik
> Cc: Christian Borntraeger
> Cc: Yoshinori Sato
> Cc: Rich Felker
> Cc: Dave Hansen
> Cc: Andy Lutomirski
> Cc: Peter Zijlstra
> Cc: Thomas Gleixner
> Cc: Ingo Molnar
> Cc: Borislav Petkov
> Cc: "H. Peter Anvin"
> Cc: x...@kernel.org
> Cc: Andrew Morton
> Cc: Mark Rutland
> Cc: Steve Capper
> Cc: Mike Rapoport
> Cc: Anshuman Khandual
> Cc: Yu Zhao
> Cc: Jun Yao
> Cc: Robin Murphy
> Cc: Michal Hocko
> Cc: Oscar Salvador
> Cc: "Matthew Wilcox (Oracle)"
> Cc: Christophe Leroy
> Cc: "Aneesh Kumar K.V"
> Cc: Pavel Tatashin
> Cc: Gerald Schaefer
> Cc: Halil Pasic
> Cc: Tom Lendacky
> Cc: Greg Kroah-Hartman
> Cc: Masahiro Yamada
> Cc: Dan Williams
> Cc: Wei Yang
> Cc: Qian Cai
> Cc: Jason Gunthorpe
> Cc: Logan Gunthorpe
> Cc: Ira Weiny
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: linux-i...@vger.kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux-s...@vger.kernel.org
> Cc: linux...@vger.kernel.org
> Fixes: d0dc12e86b31 ("mm/memory_hotplug: optimize memory hotplug")

@Andrew, can you convert that to

Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") # visible after d0dc12e86b319

While adding cc'ing sta...@vger.kernel.org # v4.13+ would be nice,
I doubt it will be easily possible to backport, as we are missing
some prereq patches (e.g., from Oscar like 2c2a5af6fed2 ("mm,
memory_hotplug: add nid parameter to
[PATCH v6 05/10] mm/memory_hotplug: Shrink zones when offlining memory
We currently try to shrink a single zone when removing memory. We use the
zone of the first page of the memory we are removing. If that memmap was
never initialized (e.g., memory was never onlined), we will read garbage
and can trigger kernel BUGs (due to a stale pointer):

:/# [   23.912993] BUG: unable to handle page fault for address: 353d
[   23.914219] #PF: supervisor write access in kernel mode
[   23.915199] #PF: error_code(0x0002) - not-present page
[   23.916160] PGD 0 P4D 0
[   23.916627] Oops: 0002 [#1] SMP PTI
[   23.917256] CPU: 1 PID: 7 Comm: kworker/u8:0 Not tainted 5.3.0-rc5-next-20190820+ #317
[   23.918900] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.4
[   23.921194] Workqueue: kacpi_hotplug acpi_hotplug_work_fn
[   23.922249] RIP: 0010:clear_zone_contiguous+0x5/0x10
[   23.923173] Code: 48 89 c6 48 89 c3 e8 2a fe ff ff 48 85 c0 75 cf 5b 5d c3 c6 85 fd 05 00 00 01 5b 5d c3 0f 1f 840
[   23.926876] RSP: 0018:ad2400043c98 EFLAGS: 00010246
[   23.927928] RAX: RBX: 0002 RCX:
[   23.929458] RDX: 0020 RSI: 0014 RDI: 2f40
[   23.930899] RBP: 00014000 R08: R09: 0001
[   23.932362] R10: R11: R12: 0014
[   23.933603] R13: 0014 R14: 2f40 R15: 9e3e7aff3680
[   23.934913] FS: () GS:9e3e7bb0() knlGS:
[   23.936294] CS: 0010 DS: ES: CR0: 80050033
[   23.937481] CR2: 353d CR3: 5861 CR4: 06e0
[   23.938687] DR0: DR1: DR2:
[   23.939889] DR3: DR6: fffe0ff0 DR7: 0400
[   23.941168] Call Trace:
[   23.941580]  __remove_pages+0x4b/0x640
[   23.942303]  ? mark_held_locks+0x49/0x70
[   23.943149]  arch_remove_memory+0x63/0x8d
[   23.943921]  try_remove_memory+0xdb/0x130
[   23.944766]  ? walk_memory_blocks+0x7f/0x9e
[   23.945616]  __remove_memory+0xa/0x11
[   23.946274]  acpi_memory_device_remove+0x70/0x100
[   23.947308]  acpi_bus_trim+0x55/0x90
[   23.947914]  acpi_device_hotplug+0x227/0x3a0
[   23.948714]  acpi_hotplug_work_fn+0x1a/0x30
[   23.949433]  process_one_work+0x221/0x550
[   23.950190]  worker_thread+0x50/0x3b0
[   23.950993]  kthread+0x105/0x140
[   23.951644]  ? process_one_work+0x550/0x550
[   23.952508]  ? kthread_park+0x80/0x80
[   23.953367]  ret_from_fork+0x3a/0x50
[   23.954025] Modules linked in:
[   23.954613] CR2: 353d
[   23.955248] ---[ end trace 93d982b1fb3e1a69 ]---

Instead, shrink the zones when offlining memory or when onlining failed.
Introduce and use remove_pfn_range_from_zone() for that. We now properly
shrink the zones, even if we have DIMMs whereby
- Some memory blocks fall into no zone (never onlined)
- Some memory blocks fall into multiple zones (offlined+re-onlined)
- Multiple memory blocks that fall into different zones

Drop the zone parameter (with a potential dubious value) from
__remove_pages() and __remove_section().

Cc: Catalin Marinas
Cc: Will Deacon
Cc: Tony Luck
Cc: Fenghua Yu
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
Cc: Heiko Carstens
Cc: Vasily Gorbik
Cc: Christian Borntraeger
Cc: Yoshinori Sato
Cc: Rich Felker
Cc: Dave Hansen
Cc: Andy Lutomirski
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Borislav Petkov
Cc: "H. Peter Anvin"
Cc: x...@kernel.org
Cc: Andrew Morton
Cc: Mark Rutland
Cc: Steve Capper
Cc: Mike Rapoport
Cc: Anshuman Khandual
Cc: Yu Zhao
Cc: Jun Yao
Cc: Robin Murphy
Cc: Michal Hocko
Cc: Oscar Salvador
Cc: "Matthew Wilcox (Oracle)"
Cc: Christophe Leroy
Cc: "Aneesh Kumar K.V"
Cc: Pavel Tatashin
Cc: Gerald Schaefer
Cc: Halil Pasic
Cc: Tom Lendacky
Cc: Greg Kroah-Hartman
Cc: Masahiro Yamada
Cc: Dan Williams
Cc: Wei Yang
Cc: Qian Cai
Cc: Jason Gunthorpe
Cc: Logan Gunthorpe
Cc: Ira Weiny
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-i...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s...@vger.kernel.org
Cc: linux...@vger.kernel.org
Fixes: d0dc12e86b31 ("mm/memory_hotplug: optimize memory hotplug")
Signed-off-by: David Hildenbrand
---
 arch/arm64/mm/mmu.c            |  4 +---
 arch/ia64/mm/init.c            |  4 +---
 arch/powerpc/mm/mem.c          |  3 +--
 arch/s390/mm/init.c            |  4 +---
 arch/sh/mm/init.c              |  4 +---
 arch/x86/mm/init_32.c          |  4 +---
 arch/x86/mm/init_64.c          |  4 +---
 include/linux/memory_hotplug.h |  7 +--
 mm/memory_hotplug.c            | 31 ---
 mm/memremap.c                  |  2 +-
 10 files changed, 29 insertions(+), 38 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 60c929f3683b..d10247fab0fd 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1069,7
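The idea in the commit message (shrink the zone at offline time, so that
removal no longer has to look up a zone through a possibly uninitialized
memmap) can be modelled in userspace roughly as below. This is a deliberately
simplified sketch, not the kernel code: struct toy_zone, NR_PFNS and the
toy_*() helpers are hypothetical names, and the full rescan used here is an
assumption chosen for brevity rather than the way the kernel recomputes the
span.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_PFNS 16 /* hypothetical toy address space, for illustration only */

/* Hypothetical stand-in for a zone; NOT the kernel's struct zone. */
struct toy_zone {
	unsigned long start_pfn;
	unsigned long spanned_pages;
	bool in_zone[NR_PFNS]; /* true while the pfn is online in this zone */
};

/* Recompute the span from the pfns that are still online in the zone. */
void toy_shrink_zone_span(struct toy_zone *z)
{
	unsigned long first = NR_PFNS, last = 0;

	for (unsigned long pfn = 0; pfn < NR_PFNS; pfn++) {
		if (!z->in_zone[pfn])
			continue;
		if (first == NR_PFNS)
			first = pfn;
		last = pfn;
	}
	if (first == NR_PFNS) {
		/* Nothing left online: the zone becomes empty. */
		z->start_pfn = 0;
		z->spanned_pages = 0;
		return;
	}
	z->start_pfn = first;
	z->spanned_pages = last - first + 1;
}

/* Offline path: drop the range from the zone, then shrink the span. */
void toy_remove_pfn_range_from_zone(struct toy_zone *z,
				    unsigned long start_pfn,
				    unsigned long nr_pages)
{
	for (unsigned long pfn = start_pfn;
	     pfn < start_pfn + nr_pages && pfn < NR_PFNS; pfn++)
		z->in_zone[pfn] = false;
	toy_shrink_zone_span(z);
}
```

Because the zone is known at offline time (the memory was online in it), the
later removal step in this model needs only a pfn range and never has to
derive a zone from a page that may never have been initialized.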