Re: [RESEND PATCH v5 00/11] ppc64: enable kdump support for kexec_file_load syscall
On 07/27/2020 03:36 AM, Hari Bathini wrote: > Sorry! There was a gateway issue on my system while posting v5, due to > which some patches did not make it through. Resending... > > This patch series enables kdump support for kexec_file_load system > call (kexec -s -p) on PPC64. The changes are inspired from kexec-tools > code but heavily modified for kernel consumption. > > The first patch adds a weak arch_kexec_locate_mem_hole() function to > override locate memory hole logic suiting arch needs. There are some > special regions in ppc64 which should be avoided while loading buffer > & there are multiple callers to kexec_add_buffer making it complicated > to maintain range sanity and using generic lookup at the same time. > > The second patch marks ppc64 specific code within arch/powerpc/kexec > and arch/powerpc/purgatory to make the subsequent code changes easy > to understand. > > The next patch adds helper function to setup different memory ranges > needed for loading kdump kernel, booting into it and exporting the > crashing kernel's elfcore. > > The fourth patch overrides arch_kexec_locate_mem_hole() function to > locate memory hole for kdump segments by accounting for the special > memory regions, referred to as excluded memory ranges, and sets > kbuf->mem when a suitable memory region is found. > > The fifth patch moves walk_drmem_lmbs() out of .init section with > a few changes to reuse it for setting up kdump kernel's usable memory > ranges. The next patch uses walk_drmem_lmbs() to look up the LMBs > and set linux,drconf-usable-memory & linux,usable-memory properties > in order to restrict kdump kernel's memory usage. > > The seventh patch updates purgatory to setup r8 & r9 with opal base > and opal entry addresses respectively to aid kernels built with > CONFIG_PPC_EARLY_DEBUG_OPAL enabled. The next patch setups up backup > region as a kexec segment while loading kdump kernel and teaches > purgatory to copy data from source to destination. > > Patch 09 builds the elfcore header for the running kernel & passes > the info to kdump kernel via "elfcorehdr=" parameter to export as > /proc/vmcore file. The next patch sets up the memory reserve map > for the kexec kernel and also claims kdump support for kdump as > all the necessary changes are added. > > The last patch fixes a lookup issue for `kexec -l -s` case when > memory is reserved for crashkernel. > > Tested the changes successfully on P8, P9 lpars, couple of OpenPOWER > boxes, one with secureboot enabled, KVM guest and a simulator. > > v4 -> v5: > * Dropped patches 07/12 & 08/12 and updated purgatory to do everything > in assembly. I guess you achieve this by carefully selecting instruction to avoid relocation issue, right? Thanks, Pingfan
Re: [PATCH v2 00/12] ppc64: enable kdump support for kexec_file_load syscall
On 07/03/2020 03:53 AM, Hari Bathini wrote: > This patch series enables kdump support for kexec_file_load system > call (kexec -s -p) on PPC64. The changes are inspired from kexec-tools > code but heavily modified for kernel consumption. There is scope to > expand purgatory to verify sha256 digest along with other improvements > in purgatory code. Will deal with those changes in a separate patch > series later. > > The first patch adds a weak arch_kexec_locate_mem_hole() function to > override locate memory hole logic suiting arch needs. There are some > special regions in ppc64 which should be avoided while loading buffer > & there are multiple callers to kexec_add_buffer making it complicated > to maintain range sanity and using generic lookup at the same time. > > The second patch marks ppc64 specific code within arch/powerpc/kexec > and arch/powerpc/purgatory to make the subsequent code changes easy > to understand. > > The next patch adds helper function to setup different memory ranges > needed for loading kdump kernel, booting into it and exporting the > crashing kernel's elfcore. > > The fourth patch overrides arch_kexec_locate_mem_hole() function to > locate memory hole for kdump segments by accounting for the special > memory regions, referred to as excluded memory ranges, and sets > kbuf->mem when a suitable memory region is found. > > The fifth patch moves walk_drmem_lmbs() out of .init section with > a few changes to reuse it for setting up kdump kernel's usable memory > ranges. The next patch uses walk_drmem_lmbs() to look up the LMBs > and set linux,drconf-usable-memory & linux,usable-memory properties > in order to restrict kdump kernel's memory usage. > > The seventh patch adds relocation support for the purgatory. Patch 8 > helps setup the stack for the purgatory. The next patch setups up > backup region as a segment while loading kdump kernel and teaches > purgatory to copy it from source to destination. > > Patch 10 builds the elfcore header for the running kernel & passes > the info to kdump kernel via "elfcorehdr=" parameter to export as > /proc/vmcore file. The next patch sets up the memory reserve map > for the kexec kernel and also claims kdump support for kdump as > all the necessary changes are added. > > The last patch fixes a lookup issue for `kexec -l -s` case when > memory is reserved for crashkernel. > > Tested the changes successfully on P8, P9 lpars, couple of OpenPOWER > boxes and a simulator. > > Changes in v2: > * Introduced arch_kexec_locate_mem_hole() for override and dropped > weak arch_kexec_add_buffer(). > * Addressed warnings reported by lkp. > * Added patch to address kexec load issue when memory is reserved > for crashkernel. > * Used the appropriate license header for the new files added. > * Added an option to merge ranges to minimize reallocations while > adding memory ranges. > * Dropped within_crashkernel parameter for add_opal_mem_range() & > add_rtas_mem_range() functions as it is not really needed. > > --- > > Hari Bathini (12): > kexec_file: allow archs to handle special regions while locating memory > hole > powerpc/kexec_file: mark PPC64 specific code > powerpc/kexec_file: add helper functions for getting memory ranges > ppc64/kexec_file: avoid stomping memory used by special regions > powerpc/drmem: make lmb walk a bit more flexible > ppc64/kexec_file: restrict memory usage of kdump kernel > ppc64/kexec_file: add support to relocate purgatory > ppc64/kexec_file: setup the stack for purgatory > ppc64/kexec_file: setup backup region for kdump kernel > ppc64/kexec_file: prepare elfcore header for crashing kernel > ppc64/kexec_file: add appropriate regions for memory reserve map > ppc64/kexec_file: fix kexec load failure with lack of memory hole > > > arch/powerpc/include/asm/crashdump-ppc64.h | 15 > arch/powerpc/include/asm/drmem.h |9 > arch/powerpc/include/asm/kexec.h | 35 + > arch/powerpc/include/asm/kexec_ranges.h| 18 > arch/powerpc/include/asm/purgatory.h | 11 > arch/powerpc/kernel/prom.c | 13 > arch/powerpc/kexec/Makefile|2 > arch/powerpc/kexec/elf_64.c| 35 + > arch/powerpc/kexec/file_load.c | 78 + > arch/powerpc/kexec/file_load_64.c | 1509 > > arch/powerpc/kexec/ranges.c| 397 +++ > arch/powerpc/mm/drmem.c| 87 +- > arch/powerpc/mm/numa.c | 13 > arch/powerpc/purgatory/Makefile| 28 - > arch/powerpc/purgatory/purgatory_64.c | 36 + > arch/powerpc/purgatory/trampoline.S| 117 -- > arch/powerpc/purgatory/trampoline_64.S | 175 +++ > include/linux/kexec.h | 29 - > kernel/kexec_file.c| 16 > 19 files changed, 2413 insertions(+), 210
Re: [PATCH 04/11] ppc64/kexec_file: avoid stomping memory used by special regions
On 07/01/2020 03:40 PM, Dave Young wrote: > Hi Hari, > On 06/27/20 at 12:35am, Hari Bathini wrote: >> crashkernel region could have an overlap with special memory regions >> like opal, rtas, tce-table & such. These regions are referred to as >> exclude memory ranges. Setup this ranges during image probe in order >> to avoid them while finding the buffer for different kdump segments. >> Implement kexec_locate_mem_hole_ppc64() that locates a memory hole >> accounting for these ranges. Also, override arch_kexec_add_buffer() >> to locate a memory hole & later call __kexec_add_buffer() function >> with kbuf->mem set to skip the generic locate memory hole lookup. >> >> Signed-off-by: Hari Bathini >> --- >> arch/powerpc/include/asm/crashdump-ppc64.h | 10 + >> arch/powerpc/include/asm/kexec.h |7 - >> arch/powerpc/kexec/elf_64.c|7 + >> arch/powerpc/kexec/file_load_64.c | 292 >> >> 4 files changed, 312 insertions(+), 4 deletions(-) >> create mode 100644 arch/powerpc/include/asm/crashdump-ppc64.h >> > [snip] >> /** >> + * get_exclude_memory_ranges - Get exclude memory ranges. This list includes >> + * regions like opal/rtas, tce-table, initrd, >> + * kernel, htab which should be avoided while >> + * setting up kexec load segments. >> + * @mem_ranges:Range list to add the memory ranges to. >> + * >> + * Returns 0 on success, negative errno on error. >> + */ >> +static int get_exclude_memory_ranges(struct crash_mem **mem_ranges) >> +{ >> +int ret; >> + >> +ret = add_tce_mem_ranges(mem_ranges); >> +if (ret) >> +goto out; >> + >> +ret = add_initrd_mem_range(mem_ranges); >> +if (ret) >> +goto out; >> + >> +ret = add_htab_mem_range(mem_ranges); >> +if (ret) >> +goto out; >> + >> +ret = add_kernel_mem_range(mem_ranges); >> +if (ret) >> +goto out; >> + >> +ret = add_rtas_mem_range(mem_ranges, false); >> +if (ret) >> +goto out; >> + >> +ret = add_opal_mem_range(mem_ranges, false); >> +if (ret) >> +goto out; >> + >> +ret = add_reserved_ranges(mem_ranges); >> +if (ret) >> +goto out; >> + >> +/* exclude memory ranges should be sorted for easy lookup */ >> +sort_memory_ranges(*mem_ranges); >> +out: >> +if (ret) >> +pr_err("Failed to setup exclude memory ranges\n"); >> +return ret; >> +} > > I'm confused about the "overlap with crashkernel memory", does that mean > those normal kernel used memory could be put in crashkernel reserved > memory range? If so why can't just skip those areas while crashkernel > doing the reservation? I raised the same question in another mail. As Hari's answer, "kexec -p" skips these ranges in user space. And the same logic should be done in "kexec -s -p" Regards, Pingfan
Re: [PATCH 04/11] ppc64/kexec_file: avoid stomping memory used by special regions
On 06/30/2020 02:10 PM, Hari Bathini wrote: > > > On 30/06/20 9:00 am, piliu wrote: >> >> >> On 06/29/2020 01:55 PM, Hari Bathini wrote: >>> >>> >>> On 28/06/20 7:44 am, piliu wrote: >>>> Hi Hari, >>> >>> Hi Pingfan, >>> >>>> >>>> After a quick through for this series, I have a few question/comment on >>>> this patch for the time being. Pls see comment inline. >>>> >>>> On 06/27/2020 03:05 AM, Hari Bathini wrote: >>>>> crashkernel region could have an overlap with special memory regions >>>>> like opal, rtas, tce-table & such. These regions are referred to as >>>>> exclude memory ranges. Setup this ranges during image probe in order >>>>> to avoid them while finding the buffer for different kdump segments. >>> >>> [...] >>> >>>>> + /* >>>>> + * Use the locate_mem_hole logic in kexec_add_buffer() for regular >>>>> + * kexec_file_load syscall >>>>> + */ >>>>> + if (kbuf->image->type != KEXEC_TYPE_CRASH) >>>>> + return 0; >>>> Can the ranges overlap [crashk_res.start, crashk_res.end]? Otherwise >>>> there is no requirement for @exclude_ranges. >>> >>> The ranges like rtas, opal are loaded by f/w. They almost always overlap >>> with >>> crashkernel region. So, @exclude_ranges is required to support kdump. >> f/w passes rtas/opal as service, then must f/w mark these ranges as >> fdt_reserved_mem in order to make kernel aware not to use these ranges? > > It does. Actually, reserve_map + reserved-ranges are reserved as soon as > memblock allocator is ready but not before crashkernel reservation. > Check early_reserve_mem() call in kernel/prom.c > >> Otherwise kernel memory allocation besides kdump can also overwrite >> these ranges.> >> Hmm, revisiting reserve_crashkernel(). It seems not to take any reserved >> memory into consider except kernel text. Could it work based on memblock >> allocator? > > So, kdump could possibly overwrite these regions which is why an exclude > range list is needed. Same thing was done in kexec-tools as well. OK, got it. Thanks, Pingfan > > Thanks > Hari > > ___ > kexec mailing list > ke...@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/kexec >
Re: [PATCH 04/11] ppc64/kexec_file: avoid stomping memory used by special regions
On 06/29/2020 01:55 PM, Hari Bathini wrote: > > > On 28/06/20 7:44 am, piliu wrote: >> Hi Hari, > > Hi Pingfan, > >> >> After a quick through for this series, I have a few question/comment on >> this patch for the time being. Pls see comment inline. >> >> On 06/27/2020 03:05 AM, Hari Bathini wrote: >>> crashkernel region could have an overlap with special memory regions >>> like opal, rtas, tce-table & such. These regions are referred to as >>> exclude memory ranges. Setup this ranges during image probe in order >>> to avoid them while finding the buffer for different kdump segments. > > [...] > >>> + /* >>> +* Use the locate_mem_hole logic in kexec_add_buffer() for regular >>> +* kexec_file_load syscall >>> +*/ >>> + if (kbuf->image->type != KEXEC_TYPE_CRASH) >>> + return 0; >> Can the ranges overlap [crashk_res.start, crashk_res.end]? Otherwise >> there is no requirement for @exclude_ranges. > > The ranges like rtas, opal are loaded by f/w. They almost always overlap with > crashkernel region. So, @exclude_ranges is required to support kdump. f/w passes rtas/opal as service, then must f/w mark these ranges as fdt_reserved_mem in order to make kernel aware not to use these ranges? Otherwise kernel memory allocation besides kdump can also overwrite these ranges. Hmm, revisiting reserve_crashkernel(). It seems not to take any reserved memory into consider except kernel text. Could it work based on memblock allocator? Thanks, Pingfan > >> I guess you have a design for future. If not true, then it is better to >> fold the condition "if (kbuf->image->type != KEXEC_TYPE_CRASH)" into the >> caller and rename this function to better distinguish use cases between >> kexec and kdump > > Yeah, this condition will be folded. I have a follow-up patch for that > explaining > why kexec case should also be folded. Will try to add that to this series for > v2. > > Thanks > Hari >
Re: [PATCH 01/11] kexec_file: allow archs to handle special regions while locating memory hole
Hi Hari, If in [4/11], get_exclude_memory_ranges() turns out to be unnecessary ,then this patch is abundant either. As my understanding, memblock has already helped to achieved the purpose that get_exclude_memory_ranges() wants. Thanks, Pingfan On 06/27/2020 03:04 AM, Hari Bathini wrote: > Some archs can have special memory regions, within the given memory > range, which can't be used for the buffer in a kexec segment. As > kexec_add_buffer() function is being called from generic code as well, > add weak arch_kexec_add_buffer definition for archs to override & take > care of special regions before trying to locate a memory hole. > > Signed-off-by: Hari Bathini > --- > include/linux/kexec.h |5 + > kernel/kexec_file.c | 37 + > 2 files changed, 38 insertions(+), 4 deletions(-) > > diff --git a/include/linux/kexec.h b/include/linux/kexec.h > index 1776eb2..1237682 100644 > --- a/include/linux/kexec.h > +++ b/include/linux/kexec.h > @@ -195,6 +195,11 @@ int __weak arch_kexec_apply_relocations(struct > purgatory_info *pi, > const Elf_Shdr *relsec, > const Elf_Shdr *symtab); > > +extern int arch_kexec_add_buffer(struct kexec_buf *kbuf); > + > +/* arch_kexec_add_buffer calls this when it is ready */ > +extern int __kexec_add_buffer(struct kexec_buf *kbuf); > + > extern int kexec_add_buffer(struct kexec_buf *kbuf); > int kexec_locate_mem_hole(struct kexec_buf *kbuf); > > diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c > index bb05fd5..a0b4f7f 100644 > --- a/kernel/kexec_file.c > +++ b/kernel/kexec_file.c > @@ -669,10 +669,6 @@ int kexec_locate_mem_hole(struct kexec_buf *kbuf) > */ > int kexec_add_buffer(struct kexec_buf *kbuf) > { > - > - struct kexec_segment *ksegment; > - int ret; > - > /* Currently adding segment this way is allowed only in file mode */ > if (!kbuf->image->file_mode) > return -EINVAL; > @@ -696,6 +692,25 @@ int kexec_add_buffer(struct kexec_buf *kbuf) > kbuf->memsz = ALIGN(kbuf->memsz, PAGE_SIZE); > kbuf->buf_align = max(kbuf->buf_align, PAGE_SIZE); > > + return arch_kexec_add_buffer(kbuf); > +} > + > +/** > + * __kexec_add_buffer - arch_kexec_add_buffer would call this function after > + * updating kbuf, to place a buffer in a kexec segment. > + * @kbuf: Buffer contents and memory parameters. > + * > + * This function assumes that kexec_mutex is held. > + * On successful return, @kbuf->mem will have the physical address of > + * the buffer in memory. > + * > + * Return: 0 on success, negative errno on error. > + */ > +int __kexec_add_buffer(struct kexec_buf *kbuf) > +{ > + struct kexec_segment *ksegment; > + int ret; > + > /* Walk the RAM ranges and allocate a suitable range for the buffer */ > ret = kexec_locate_mem_hole(kbuf); > if (ret) > @@ -711,6 +726,20 @@ int kexec_add_buffer(struct kexec_buf *kbuf) > return 0; > } > > +/** > + * arch_kexec_add_buffer - Some archs have memory regions within the given > + * range that can't be used to place a kexec segment. > + * Such archs can override this function to take care > + * of them before trying to locate the memory hole. > + * @kbuf: Buffer contents and memory parameters. > + * > + * Return: 0 on success, negative errno on error. > + */ > +int __weak arch_kexec_add_buffer(struct kexec_buf *kbuf) > +{ > + return __kexec_add_buffer(kbuf); > +} > + > /* Calculate and store the digest of segments */ > static int kexec_calculate_store_digests(struct kimage *image) > { > > > ___ > kexec mailing list > ke...@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/kexec >
Re: [PATCH 04/11] ppc64/kexec_file: avoid stomping memory used by special regions
Hi Hari, After a quick through for this series, I have a few question/comment on this patch for the time being. Pls see comment inline. On 06/27/2020 03:05 AM, Hari Bathini wrote: > crashkernel region could have an overlap with special memory regions > like opal, rtas, tce-table & such. These regions are referred to as > exclude memory ranges. Setup this ranges during image probe in order > to avoid them while finding the buffer for different kdump segments. > Implement kexec_locate_mem_hole_ppc64() that locates a memory hole > accounting for these ranges. Also, override arch_kexec_add_buffer() > to locate a memory hole & later call __kexec_add_buffer() function > with kbuf->mem set to skip the generic locate memory hole lookup. > > Signed-off-by: Hari Bathini > --- > arch/powerpc/include/asm/crashdump-ppc64.h | 10 + > arch/powerpc/include/asm/kexec.h |7 - > arch/powerpc/kexec/elf_64.c|7 + > arch/powerpc/kexec/file_load_64.c | 292 > > 4 files changed, 312 insertions(+), 4 deletions(-) > create mode 100644 arch/powerpc/include/asm/crashdump-ppc64.h > > diff --git a/arch/powerpc/include/asm/crashdump-ppc64.h > b/arch/powerpc/include/asm/crashdump-ppc64.h > new file mode 100644 > index 000..3596c25 > --- /dev/null > +++ b/arch/powerpc/include/asm/crashdump-ppc64.h > @@ -0,0 +1,10 @@ > +/* SPDX-License-Identifier: GPL-2.0-or-later */ > +#ifndef _ARCH_POWERPC_KEXEC_CRASHDUMP_PPC64_H > +#define _ARCH_POWERPC_KEXEC_CRASHDUMP_PPC64_H > + > +/* min & max addresses for kdump load segments */ > +#define KDUMP_BUF_MIN(crashk_res.start) > +#define KDUMP_BUF_MAX((crashk_res.end < ppc64_rma_size) ? \ > + crashk_res.end : (ppc64_rma_size - 1)) > + > +#endif /* __ARCH_POWERPC_KEXEC_CRASHDUMP_PPC64_H */ > diff --git a/arch/powerpc/include/asm/kexec.h > b/arch/powerpc/include/asm/kexec.h > index 7008ea1..bf47a01 100644 > --- a/arch/powerpc/include/asm/kexec.h > +++ b/arch/powerpc/include/asm/kexec.h > @@ -100,14 +100,16 @@ void relocate_new_kernel(unsigned long > indirection_page, unsigned long reboot_co > #ifdef CONFIG_KEXEC_FILE > extern const struct kexec_file_ops kexec_elf64_ops; > > -#ifdef CONFIG_IMA_KEXEC > #define ARCH_HAS_KIMAGE_ARCH > > struct kimage_arch { > + struct crash_mem *exclude_ranges; > + > +#ifdef CONFIG_IMA_KEXEC > phys_addr_t ima_buffer_addr; > size_t ima_buffer_size; > -}; > #endif > +}; > > int setup_purgatory(struct kimage *image, const void *slave_code, > const void *fdt, unsigned long kernel_load_addr, > @@ -125,6 +127,7 @@ int setup_new_fdt_ppc64(const struct kimage *image, void > *fdt, > unsigned long initrd_load_addr, > unsigned long initrd_len, const char *cmdline); > #endif /* CONFIG_PPC64 */ > + > #endif /* CONFIG_KEXEC_FILE */ > > #else /* !CONFIG_KEXEC_CORE */ > diff --git a/arch/powerpc/kexec/elf_64.c b/arch/powerpc/kexec/elf_64.c > index 23ad04c..c695f94 100644 > --- a/arch/powerpc/kexec/elf_64.c > +++ b/arch/powerpc/kexec/elf_64.c > @@ -22,6 +22,7 @@ > #include > #include > #include > +#include > > static void *elf64_load(struct kimage *image, char *kernel_buf, > unsigned long kernel_len, char *initrd, > @@ -46,6 +47,12 @@ static void *elf64_load(struct kimage *image, char > *kernel_buf, > if (ret) > goto out; > > + if (image->type == KEXEC_TYPE_CRASH) { > + /* min & max buffer values for kdump case */ > + kbuf.buf_min = pbuf.buf_min = KDUMP_BUF_MIN; > + kbuf.buf_max = pbuf.buf_max = KDUMP_BUF_MAX; > + } > + > ret = kexec_elf_load(image, , _info, , _load_addr); > if (ret) > goto out; > diff --git a/arch/powerpc/kexec/file_load_64.c > b/arch/powerpc/kexec/file_load_64.c > index e6bff960..f1d7160 100644 > --- a/arch/powerpc/kexec/file_load_64.c > +++ b/arch/powerpc/kexec/file_load_64.c > @@ -17,6 +17,8 @@ > #include > #include > #include > +#include > +#include > > const struct kexec_file_ops * const kexec_file_loaders[] = { > _elf64_ops, > @@ -24,6 +26,247 @@ const struct kexec_file_ops * const kexec_file_loaders[] > = { > }; > > /** > + * get_exclude_memory_ranges - Get exclude memory ranges. This list includes > + * regions like opal/rtas, tce-table, initrd, > + * kernel, htab which should be avoided while > + * setting up kexec load segments. > + * @mem_ranges:Range list to add the memory ranges to. > + * > + * Returns 0 on success, negative errno on error. > + */ > +static int get_exclude_memory_ranges(struct crash_mem **mem_ranges) Is it needed? See the comment below. > +{ > + int ret; > + > + ret = add_tce_mem_ranges(mem_ranges); > + if (ret) > + goto out; > + > +
Re: [PATCH v1 1/2] powerpc/pseries/hotplug-memory: stop checking is_mem_section_removable()
On 04/09/2020 03:26 PM, David Hildenbrand wrote: > On 09.04.20 04:59, piliu wrote: >> >> >> On 04/08/2020 10:46 AM, Baoquan He wrote: >>> Add Pingfan to CC since he usually handles ppc related bugs for RHEL. >>> >>> On 04/07/20 at 03:54pm, David Hildenbrand wrote: >>>> In commit 53cdc1cb29e8 ("drivers/base/memory.c: indicate all memory >>>> blocks as removable"), the user space interface to compute whether a memory >>>> block can be offlined (exposed via >>>> /sys/devices/system/memory/memoryX/removable) has effectively been >>>> deprecated. We want to remove the leftovers of the kernel implementation. >>> >>> Pingfan, can you have a look at this change on PPC? Please feel free to >>> give comments if any concern, or offer ack if it's OK to you. >>> >>>> >>>> When offlining a memory block (mm/memory_hotplug.c:__offline_pages()), >>>> we'll start by: >>>> 1. Testing if it contains any holes, and reject if so >>>> 2. Testing if pages belong to different zones, and reject if so >>>> 3. Isolating the page range, checking if it contains any unmovable pages >>>> >>>> Using is_mem_section_removable() before trying to offline is not only racy, >>>> it can easily result in false positives/negatives. Let's stop manually >>>> checking is_mem_section_removable(), and let device_offline() handle it >>>> completely instead. We can remove the racy is_mem_section_removable() >>>> implementation next. >>>> >>>> We now take more locks (e.g., memory hotplug lock when offlining and the >>>> zone lock when isolating), but maybe we should optimize that >>>> implementation instead if this ever becomes a real problem (after all, >>>> memory unplug is already an expensive operation). We started using >>>> is_mem_section_removable() in commit 51925fb3c5c9 ("powerpc/pseries: >>>> Implement memory hotplug remove in the kernel"), with the initial >>>> hotremove support of lmbs. >>>> >>>> Cc: Nathan Fontenot >>>> Cc: Michael Ellerman >>>> Cc: Benjamin Herrenschmidt >>>> Cc: Paul Mackerras >>>> Cc: Michal Hocko >>>> Cc: Andrew Morton >>>> Cc: Oscar Salvador >>>> Cc: Baoquan He >>>> Cc: Wei Yang >>>> Signed-off-by: David Hildenbrand >>>> --- >>>> .../platforms/pseries/hotplug-memory.c| 26 +++ >>>> 1 file changed, 3 insertions(+), 23 deletions(-) >>>> >>>> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c >>>> b/arch/powerpc/platforms/pseries/hotplug-memory.c >>>> index b2cde1732301..5ace2f9a277e 100644 >>>> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c >>>> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c >>>> @@ -337,39 +337,19 @@ static int pseries_remove_mem_node(struct >>>> device_node *np) >>>> >>>> static bool lmb_is_removable(struct drmem_lmb *lmb) >>>> { >>>> - int i, scns_per_block; >>>> - bool rc = true; >>>> - unsigned long pfn, block_sz; >>>> - u64 phys_addr; >>>> - >>>>if (!(lmb->flags & DRCONF_MEM_ASSIGNED)) >>>>return false; >>>> >>>> - block_sz = memory_block_size_bytes(); >>>> - scns_per_block = block_sz / MIN_MEMORY_BLOCK_SIZE; >>>> - phys_addr = lmb->base_addr; >>>> - >>>> #ifdef CONFIG_FA_DUMP >>>>/* >>>> * Don't hot-remove memory that falls in fadump boot memory area >>>> * and memory that is reserved for capturing old kernel memory. >>>> */ >>>> - if (is_fadump_memory_area(phys_addr, block_sz)) >>>> + if (is_fadump_memory_area(lmb->base_addr, memory_block_size_bytes())) >>>>return false; >>>> #endif >>>> - >>>> - for (i = 0; i < scns_per_block; i++) { >>>> - pfn = PFN_DOWN(phys_addr); >>>> - if (!pfn_in_present_section(pfn)) { >>>> - phys_addr += MIN_MEMORY_BLOCK_SIZE; >>>> - continue; >>>> - } >>>> - >>>> - rc = rc && is_mem_section_removable(pfn, PAGES_PER_SECTION); >>>> - phys_addr += MIN_MEMORY_BLOCK_SIZE; >>>> -
Re: [PATCH v1 1/2] powerpc/pseries/hotplug-memory: stop checking is_mem_section_removable()
On 04/08/2020 10:46 AM, Baoquan He wrote: > Add Pingfan to CC since he usually handles ppc related bugs for RHEL. > > On 04/07/20 at 03:54pm, David Hildenbrand wrote: >> In commit 53cdc1cb29e8 ("drivers/base/memory.c: indicate all memory >> blocks as removable"), the user space interface to compute whether a memory >> block can be offlined (exposed via >> /sys/devices/system/memory/memoryX/removable) has effectively been >> deprecated. We want to remove the leftovers of the kernel implementation. > > Pingfan, can you have a look at this change on PPC? Please feel free to > give comments if any concern, or offer ack if it's OK to you. > >> >> When offlining a memory block (mm/memory_hotplug.c:__offline_pages()), >> we'll start by: >> 1. Testing if it contains any holes, and reject if so >> 2. Testing if pages belong to different zones, and reject if so >> 3. Isolating the page range, checking if it contains any unmovable pages >> >> Using is_mem_section_removable() before trying to offline is not only racy, >> it can easily result in false positives/negatives. Let's stop manually >> checking is_mem_section_removable(), and let device_offline() handle it >> completely instead. We can remove the racy is_mem_section_removable() >> implementation next. >> >> We now take more locks (e.g., memory hotplug lock when offlining and the >> zone lock when isolating), but maybe we should optimize that >> implementation instead if this ever becomes a real problem (after all, >> memory unplug is already an expensive operation). We started using >> is_mem_section_removable() in commit 51925fb3c5c9 ("powerpc/pseries: >> Implement memory hotplug remove in the kernel"), with the initial >> hotremove support of lmbs. >> >> Cc: Nathan Fontenot >> Cc: Michael Ellerman >> Cc: Benjamin Herrenschmidt >> Cc: Paul Mackerras >> Cc: Michal Hocko >> Cc: Andrew Morton >> Cc: Oscar Salvador >> Cc: Baoquan He >> Cc: Wei Yang >> Signed-off-by: David Hildenbrand >> --- >> .../platforms/pseries/hotplug-memory.c| 26 +++ >> 1 file changed, 3 insertions(+), 23 deletions(-) >> >> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c >> b/arch/powerpc/platforms/pseries/hotplug-memory.c >> index b2cde1732301..5ace2f9a277e 100644 >> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c >> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c >> @@ -337,39 +337,19 @@ static int pseries_remove_mem_node(struct device_node >> *np) >> >> static bool lmb_is_removable(struct drmem_lmb *lmb) >> { >> -int i, scns_per_block; >> -bool rc = true; >> -unsigned long pfn, block_sz; >> -u64 phys_addr; >> - >> if (!(lmb->flags & DRCONF_MEM_ASSIGNED)) >> return false; >> >> -block_sz = memory_block_size_bytes(); >> -scns_per_block = block_sz / MIN_MEMORY_BLOCK_SIZE; >> -phys_addr = lmb->base_addr; >> - >> #ifdef CONFIG_FA_DUMP >> /* >> * Don't hot-remove memory that falls in fadump boot memory area >> * and memory that is reserved for capturing old kernel memory. >> */ >> -if (is_fadump_memory_area(phys_addr, block_sz)) >> +if (is_fadump_memory_area(lmb->base_addr, memory_block_size_bytes())) >> return false; >> #endif >> - >> -for (i = 0; i < scns_per_block; i++) { >> -pfn = PFN_DOWN(phys_addr); >> -if (!pfn_in_present_section(pfn)) { >> -phys_addr += MIN_MEMORY_BLOCK_SIZE; >> -continue; >> -} >> - >> -rc = rc && is_mem_section_removable(pfn, PAGES_PER_SECTION); >> -phys_addr += MIN_MEMORY_BLOCK_SIZE; >> -} >> - >> -return rc; >> +/* device_offline() will determine if we can actually remove this lmb */ >> +return true; So I think here swaps the check and do sequence. At least it breaks dlpar_memory_remove_by_count(). It is doable to remove is_mem_section_removable(), but here should be more effort to re-arrange the code. Thanks, Pingfan >> } >> >> static int dlpar_add_lmb(struct drmem_lmb *); >> -- >> 2.25.1 >>