Re: [RESEND PATCH v5 00/11] ppc64: enable kdump support for kexec_file_load syscall

2020-07-27 Thread piliu



On 07/27/2020 03:36 AM, Hari Bathini wrote:
> Sorry! There was a gateway issue on my system while posting v5, due to
> which some patches did not make it through. Resending...
> 
> This patch series enables kdump support for kexec_file_load system
> call (kexec -s -p) on PPC64. The changes are inspired from kexec-tools
> code but heavily modified for kernel consumption.
> 
> The first patch adds a weak arch_kexec_locate_mem_hole() function to
> override locate memory hole logic suiting arch needs. There are some
> special regions in ppc64 which should be avoided while loading buffer
> & there are multiple callers to kexec_add_buffer making it complicated
> to maintain range sanity and using generic lookup at the same time.
> 
> The second patch marks ppc64 specific code within arch/powerpc/kexec
> and arch/powerpc/purgatory to make the subsequent code changes easy
> to understand.
> 
> The next patch adds helper function to setup different memory ranges
> needed for loading kdump kernel, booting into it and exporting the
> crashing kernel's elfcore.
> 
> The fourth patch overrides arch_kexec_locate_mem_hole() function to
> locate memory hole for kdump segments by accounting for the special
> memory regions, referred to as excluded memory ranges, and sets
> kbuf->mem when a suitable memory region is found.
> 
> The fifth patch moves walk_drmem_lmbs() out of .init section with
> a few changes to reuse it for setting up kdump kernel's usable memory
> ranges. The next patch uses walk_drmem_lmbs() to look up the LMBs
> and set linux,drconf-usable-memory & linux,usable-memory properties
> in order to restrict kdump kernel's memory usage.
> 
> The seventh patch updates purgatory to setup r8 & r9 with opal base
> and opal entry addresses respectively to aid kernels built with
> CONFIG_PPC_EARLY_DEBUG_OPAL enabled. The next patch setups up backup
> region as a kexec segment while loading kdump kernel and teaches
> purgatory to copy data from source to destination.
> 
> Patch 09 builds the elfcore header for the running kernel & passes
> the info to kdump kernel via "elfcorehdr=" parameter to export as
> /proc/vmcore file. The next patch sets up the memory reserve map
> for the kexec kernel and also claims kdump support for kdump as
> all the necessary changes are added.
> 
> The last patch fixes a lookup issue for `kexec -l -s` case when
> memory is reserved for crashkernel.
> 
> Tested the changes successfully on P8, P9 lpars, couple of OpenPOWER
> boxes, one with secureboot enabled, KVM guest and a simulator.
> 
> v4 -> v5:
> * Dropped patches 07/12 & 08/12 and updated purgatory to do everything
>   in assembly.
I guess you achieve this by carefully selecting instruction to avoid
relocation issue, right?

Thanks,
Pingfan



Re: [PATCH v2 00/12] ppc64: enable kdump support for kexec_file_load syscall

2020-07-06 Thread piliu



On 07/03/2020 03:53 AM, Hari Bathini wrote:
> This patch series enables kdump support for kexec_file_load system
> call (kexec -s -p) on PPC64. The changes are inspired from kexec-tools
> code but heavily modified for kernel consumption. There is scope to
> expand purgatory to verify sha256 digest along with other improvements
> in purgatory code. Will deal with those changes in a separate patch
> series later.
> 
> The first patch adds a weak arch_kexec_locate_mem_hole() function to
> override locate memory hole logic suiting arch needs. There are some
> special regions in ppc64 which should be avoided while loading buffer
> & there are multiple callers to kexec_add_buffer making it complicated
> to maintain range sanity and using generic lookup at the same time.
> 
> The second patch marks ppc64 specific code within arch/powerpc/kexec
> and arch/powerpc/purgatory to make the subsequent code changes easy
> to understand.
> 
> The next patch adds helper function to setup different memory ranges
> needed for loading kdump kernel, booting into it and exporting the
> crashing kernel's elfcore.
> 
> The fourth patch overrides arch_kexec_locate_mem_hole() function to
> locate memory hole for kdump segments by accounting for the special
> memory regions, referred to as excluded memory ranges, and sets
> kbuf->mem when a suitable memory region is found.
> 
> The fifth patch moves walk_drmem_lmbs() out of .init section with
> a few changes to reuse it for setting up kdump kernel's usable memory
> ranges. The next patch uses walk_drmem_lmbs() to look up the LMBs
> and set linux,drconf-usable-memory & linux,usable-memory properties
> in order to restrict kdump kernel's memory usage.
> 
> The seventh patch adds relocation support for the purgatory. Patch 8
> helps setup the stack for the purgatory. The next patch setups up
> backup region as a segment while loading kdump kernel and teaches
> purgatory to copy it from source to destination.
> 
> Patch 10 builds the elfcore header for the running kernel & passes
> the info to kdump kernel via "elfcorehdr=" parameter to export as
> /proc/vmcore file. The next patch sets up the memory reserve map
> for the kexec kernel and also claims kdump support for kdump as
> all the necessary changes are added.
> 
> The last patch fixes a lookup issue for `kexec -l -s` case when
> memory is reserved for crashkernel.
> 
> Tested the changes successfully on P8, P9 lpars, couple of OpenPOWER
> boxes and a simulator.
> 
> Changes in v2:
> * Introduced arch_kexec_locate_mem_hole() for override and dropped
>   weak arch_kexec_add_buffer().
> * Addressed warnings reported by lkp.
> * Added patch to address kexec load issue when memory is reserved
>   for crashkernel.
> * Used the appropriate license header for the new files added.
> * Added an option to merge ranges to minimize reallocations while
>   adding memory ranges.
> * Dropped within_crashkernel parameter for add_opal_mem_range() &
>   add_rtas_mem_range() functions as it is not really needed.
> 
> ---
> 
> Hari Bathini (12):
>   kexec_file: allow archs to handle special regions while locating memory 
> hole
>   powerpc/kexec_file: mark PPC64 specific code
>   powerpc/kexec_file: add helper functions for getting memory ranges
>   ppc64/kexec_file: avoid stomping memory used by special regions
>   powerpc/drmem: make lmb walk a bit more flexible
>   ppc64/kexec_file: restrict memory usage of kdump kernel
>   ppc64/kexec_file: add support to relocate purgatory
>   ppc64/kexec_file: setup the stack for purgatory
>   ppc64/kexec_file: setup backup region for kdump kernel
>   ppc64/kexec_file: prepare elfcore header for crashing kernel
>   ppc64/kexec_file: add appropriate regions for memory reserve map
>   ppc64/kexec_file: fix kexec load failure with lack of memory hole
> 
> 
>  arch/powerpc/include/asm/crashdump-ppc64.h |   15 
>  arch/powerpc/include/asm/drmem.h   |9 
>  arch/powerpc/include/asm/kexec.h   |   35 +
>  arch/powerpc/include/asm/kexec_ranges.h|   18 
>  arch/powerpc/include/asm/purgatory.h   |   11 
>  arch/powerpc/kernel/prom.c |   13 
>  arch/powerpc/kexec/Makefile|2 
>  arch/powerpc/kexec/elf_64.c|   35 +
>  arch/powerpc/kexec/file_load.c |   78 +
>  arch/powerpc/kexec/file_load_64.c  | 1509 
> 
>  arch/powerpc/kexec/ranges.c|  397 +++
>  arch/powerpc/mm/drmem.c|   87 +-
>  arch/powerpc/mm/numa.c |   13 
>  arch/powerpc/purgatory/Makefile|   28 -
>  arch/powerpc/purgatory/purgatory_64.c  |   36 +
>  arch/powerpc/purgatory/trampoline.S|  117 --
>  arch/powerpc/purgatory/trampoline_64.S |  175 +++
>  include/linux/kexec.h  |   29 -
>  kernel/kexec_file.c|   16 
>  19 files changed, 2413 insertions(+), 210 

Re: [PATCH 04/11] ppc64/kexec_file: avoid stomping memory used by special regions

2020-07-01 Thread piliu



On 07/01/2020 03:40 PM, Dave Young wrote:
> Hi Hari,
> On 06/27/20 at 12:35am, Hari Bathini wrote:
>> crashkernel region could have an overlap with special memory regions
>> like  opal, rtas, tce-table & such. These regions are referred to as
>> exclude memory ranges. Setup this ranges during image probe in order
>> to avoid them while finding the buffer for different kdump segments.
>> Implement kexec_locate_mem_hole_ppc64() that locates a memory hole
>> accounting for these ranges. Also, override arch_kexec_add_buffer()
>> to locate a memory hole & later call __kexec_add_buffer() function
>> with kbuf->mem set to skip the generic locate memory hole lookup.
>>
>> Signed-off-by: Hari Bathini 
>> ---
>>  arch/powerpc/include/asm/crashdump-ppc64.h |   10 +
>>  arch/powerpc/include/asm/kexec.h   |7 -
>>  arch/powerpc/kexec/elf_64.c|7 +
>>  arch/powerpc/kexec/file_load_64.c  |  292 
>> 
>>  4 files changed, 312 insertions(+), 4 deletions(-)
>>  create mode 100644 arch/powerpc/include/asm/crashdump-ppc64.h
>>
> [snip]
>>  /**
>> + * get_exclude_memory_ranges - Get exclude memory ranges. This list includes
>> + * regions like opal/rtas, tce-table, initrd,
>> + * kernel, htab which should be avoided while
>> + * setting up kexec load segments.
>> + * @mem_ranges:Range list to add the memory ranges to.
>> + *
>> + * Returns 0 on success, negative errno on error.
>> + */
>> +static int get_exclude_memory_ranges(struct crash_mem **mem_ranges)
>> +{
>> +int ret;
>> +
>> +ret = add_tce_mem_ranges(mem_ranges);
>> +if (ret)
>> +goto out;
>> +
>> +ret = add_initrd_mem_range(mem_ranges);
>> +if (ret)
>> +goto out;
>> +
>> +ret = add_htab_mem_range(mem_ranges);
>> +if (ret)
>> +goto out;
>> +
>> +ret = add_kernel_mem_range(mem_ranges);
>> +if (ret)
>> +goto out;
>> +
>> +ret = add_rtas_mem_range(mem_ranges, false);
>> +if (ret)
>> +goto out;
>> +
>> +ret = add_opal_mem_range(mem_ranges, false);
>> +if (ret)
>> +goto out;
>> +
>> +ret = add_reserved_ranges(mem_ranges);
>> +if (ret)
>> +goto out;
>> +
>> +/* exclude memory ranges should be sorted for easy lookup */
>> +sort_memory_ranges(*mem_ranges);
>> +out:
>> +if (ret)
>> +pr_err("Failed to setup exclude memory ranges\n");
>> +return ret;
>> +}
> 
> I'm confused about the "overlap with crashkernel memory", does that mean
> those normal kernel used memory could be put in crashkernel reserved
> memory range?  If so why can't just skip those areas while crashkernel
> doing the reservation?
I raised the same question in another mail. As Hari's answer, "kexec -p"
skips these ranges in user space. And the same logic should be done in
"kexec -s -p"

Regards,
Pingfan



Re: [PATCH 04/11] ppc64/kexec_file: avoid stomping memory used by special regions

2020-06-30 Thread piliu



On 06/30/2020 02:10 PM, Hari Bathini wrote:
> 
> 
> On 30/06/20 9:00 am, piliu wrote:
>>
>>
>> On 06/29/2020 01:55 PM, Hari Bathini wrote:
>>>
>>>
>>> On 28/06/20 7:44 am, piliu wrote:
>>>> Hi Hari,
>>>
>>> Hi Pingfan,
>>>
>>>>
>>>> After a quick through for this series, I have a few question/comment on
>>>> this patch for the time being. Pls see comment inline.
>>>>
>>>> On 06/27/2020 03:05 AM, Hari Bathini wrote:
>>>>> crashkernel region could have an overlap with special memory regions
>>>>> like  opal, rtas, tce-table & such. These regions are referred to as
>>>>> exclude memory ranges. Setup this ranges during image probe in order
>>>>> to avoid them while finding the buffer for different kdump segments.
>>>
>>> [...]
>>>
>>>>> + /*
>>>>> +  * Use the locate_mem_hole logic in kexec_add_buffer() for regular
>>>>> +  * kexec_file_load syscall
>>>>> +  */
>>>>> + if (kbuf->image->type != KEXEC_TYPE_CRASH)
>>>>> + return 0;
>>>> Can the ranges overlap [crashk_res.start, crashk_res.end]?  Otherwise
>>>> there is no requirement for @exclude_ranges.
>>>
>>> The ranges like rtas, opal are loaded by f/w. They almost always overlap 
>>> with
>>> crashkernel region. So, @exclude_ranges is required to support kdump.
>> f/w passes rtas/opal as service, then must f/w mark these ranges as
>> fdt_reserved_mem in order to make kernel aware not to use these ranges?
> 
> It does. Actually, reserve_map + reserved-ranges are reserved as soon as
> memblock allocator is ready but not before crashkernel reservation.
> Check early_reserve_mem() call in kernel/prom.c
> 
>> Otherwise kernel memory allocation besides kdump can also overwrite
>> these ranges.> 
>> Hmm, revisiting reserve_crashkernel(). It seems not to take any reserved
>> memory into consider except kernel text. Could it work based on memblock
>> allocator?
> 
> So, kdump could possibly overwrite these regions which is why an exclude
> range list is needed. Same thing was done in kexec-tools as well.
OK, got it.

Thanks,
Pingfan
> 
> Thanks
> Hari
> 
> ___
> kexec mailing list
> ke...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec
> 



Re: [PATCH 04/11] ppc64/kexec_file: avoid stomping memory used by special regions

2020-06-29 Thread piliu



On 06/29/2020 01:55 PM, Hari Bathini wrote:
> 
> 
> On 28/06/20 7:44 am, piliu wrote:
>> Hi Hari,
> 
> Hi Pingfan,
> 
>>
>> After a quick through for this series, I have a few question/comment on
>> this patch for the time being. Pls see comment inline.
>>
>> On 06/27/2020 03:05 AM, Hari Bathini wrote:
>>> crashkernel region could have an overlap with special memory regions
>>> like  opal, rtas, tce-table & such. These regions are referred to as
>>> exclude memory ranges. Setup this ranges during image probe in order
>>> to avoid them while finding the buffer for different kdump segments.
> 
> [...]
> 
>>> +   /*
>>> +* Use the locate_mem_hole logic in kexec_add_buffer() for regular
>>> +* kexec_file_load syscall
>>> +*/
>>> +   if (kbuf->image->type != KEXEC_TYPE_CRASH)
>>> +   return 0;
>> Can the ranges overlap [crashk_res.start, crashk_res.end]?  Otherwise
>> there is no requirement for @exclude_ranges.
> 
> The ranges like rtas, opal are loaded by f/w. They almost always overlap with
> crashkernel region. So, @exclude_ranges is required to support kdump.
f/w passes rtas/opal as service, then must f/w mark these ranges as
fdt_reserved_mem in order to make kernel aware not to use these ranges?
Otherwise kernel memory allocation besides kdump can also overwrite
these ranges.

Hmm, revisiting reserve_crashkernel(). It seems not to take any reserved
memory into consider except kernel text. Could it work based on memblock
allocator?

Thanks,
Pingfan
> 
>> I guess you have a design for future. If not true, then it is better to
>> fold the condition "if (kbuf->image->type != KEXEC_TYPE_CRASH)" into the
>> caller and rename this function to better distinguish use cases between
>> kexec and kdump
> 
> Yeah, this condition will be folded. I have a follow-up patch for that 
> explaining
> why kexec case should also be folded. Will try to add that to this series for 
> v2.
> 
> Thanks
> Hari
> 



Re: [PATCH 01/11] kexec_file: allow archs to handle special regions while locating memory hole

2020-06-27 Thread piliu
Hi Hari,

If in [4/11],  get_exclude_memory_ranges() turns out to be unnecessary
,then this patch is abundant either. As my understanding, memblock has
already helped to achieved the purpose that get_exclude_memory_ranges()
wants.

Thanks,
Pingfan

On 06/27/2020 03:04 AM, Hari Bathini wrote:
> Some archs can have special memory regions, within the given memory
> range, which can't be used for the buffer in a kexec segment. As
> kexec_add_buffer() function is being called from generic code as well,
> add weak arch_kexec_add_buffer definition for archs to override & take
> care of special regions before trying to locate a memory hole.
> 
> Signed-off-by: Hari Bathini 
> ---
>  include/linux/kexec.h |5 +
>  kernel/kexec_file.c   |   37 +
>  2 files changed, 38 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
> index 1776eb2..1237682 100644
> --- a/include/linux/kexec.h
> +++ b/include/linux/kexec.h
> @@ -195,6 +195,11 @@ int __weak arch_kexec_apply_relocations(struct 
> purgatory_info *pi,
>   const Elf_Shdr *relsec,
>   const Elf_Shdr *symtab);
>  
> +extern int arch_kexec_add_buffer(struct kexec_buf *kbuf);
> +
> +/* arch_kexec_add_buffer calls this when it is ready */
> +extern int __kexec_add_buffer(struct kexec_buf *kbuf);
> +
>  extern int kexec_add_buffer(struct kexec_buf *kbuf);
>  int kexec_locate_mem_hole(struct kexec_buf *kbuf);
>  
> diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
> index bb05fd5..a0b4f7f 100644
> --- a/kernel/kexec_file.c
> +++ b/kernel/kexec_file.c
> @@ -669,10 +669,6 @@ int kexec_locate_mem_hole(struct kexec_buf *kbuf)
>   */
>  int kexec_add_buffer(struct kexec_buf *kbuf)
>  {
> -
> - struct kexec_segment *ksegment;
> - int ret;
> -
>   /* Currently adding segment this way is allowed only in file mode */
>   if (!kbuf->image->file_mode)
>   return -EINVAL;
> @@ -696,6 +692,25 @@ int kexec_add_buffer(struct kexec_buf *kbuf)
>   kbuf->memsz = ALIGN(kbuf->memsz, PAGE_SIZE);
>   kbuf->buf_align = max(kbuf->buf_align, PAGE_SIZE);
>  
> + return arch_kexec_add_buffer(kbuf);
> +}
> +
> +/**
> + * __kexec_add_buffer - arch_kexec_add_buffer would call this function after
> + *  updating kbuf, to place a buffer in a kexec segment.
> + * @kbuf:   Buffer contents and memory parameters.
> + *
> + * This function assumes that kexec_mutex is held.
> + * On successful return, @kbuf->mem will have the physical address of
> + * the buffer in memory.
> + *
> + * Return: 0 on success, negative errno on error.
> + */
> +int __kexec_add_buffer(struct kexec_buf *kbuf)
> +{
> + struct kexec_segment *ksegment;
> + int ret;
> +
>   /* Walk the RAM ranges and allocate a suitable range for the buffer */
>   ret = kexec_locate_mem_hole(kbuf);
>   if (ret)
> @@ -711,6 +726,20 @@ int kexec_add_buffer(struct kexec_buf *kbuf)
>   return 0;
>  }
>  
> +/**
> + * arch_kexec_add_buffer - Some archs have memory regions within the given
> + * range that can't be used to place a kexec segment.
> + * Such archs can override this function to take care
> + * of them before trying to locate the memory hole.
> + * @kbuf:  Buffer contents and memory parameters.
> + *
> + * Return: 0 on success, negative errno on error.
> + */
> +int __weak arch_kexec_add_buffer(struct kexec_buf *kbuf)
> +{
> + return __kexec_add_buffer(kbuf);
> +}
> +
>  /* Calculate and store the digest of segments */
>  static int kexec_calculate_store_digests(struct kimage *image)
>  {
> 
> 
> ___
> kexec mailing list
> ke...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec
> 



Re: [PATCH 04/11] ppc64/kexec_file: avoid stomping memory used by special regions

2020-06-27 Thread piliu
Hi Hari,

After a quick through for this series, I have a few question/comment on
this patch for the time being. Pls see comment inline.

On 06/27/2020 03:05 AM, Hari Bathini wrote:
> crashkernel region could have an overlap with special memory regions
> like  opal, rtas, tce-table & such. These regions are referred to as
> exclude memory ranges. Setup this ranges during image probe in order
> to avoid them while finding the buffer for different kdump segments.
> Implement kexec_locate_mem_hole_ppc64() that locates a memory hole
> accounting for these ranges. Also, override arch_kexec_add_buffer()
> to locate a memory hole & later call __kexec_add_buffer() function
> with kbuf->mem set to skip the generic locate memory hole lookup.
> 
> Signed-off-by: Hari Bathini 
> ---
>  arch/powerpc/include/asm/crashdump-ppc64.h |   10 +
>  arch/powerpc/include/asm/kexec.h   |7 -
>  arch/powerpc/kexec/elf_64.c|7 +
>  arch/powerpc/kexec/file_load_64.c  |  292 
> 
>  4 files changed, 312 insertions(+), 4 deletions(-)
>  create mode 100644 arch/powerpc/include/asm/crashdump-ppc64.h
> 
> diff --git a/arch/powerpc/include/asm/crashdump-ppc64.h 
> b/arch/powerpc/include/asm/crashdump-ppc64.h
> new file mode 100644
> index 000..3596c25
> --- /dev/null
> +++ b/arch/powerpc/include/asm/crashdump-ppc64.h
> @@ -0,0 +1,10 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +#ifndef _ARCH_POWERPC_KEXEC_CRASHDUMP_PPC64_H
> +#define _ARCH_POWERPC_KEXEC_CRASHDUMP_PPC64_H
> +
> +/* min & max addresses for kdump load segments */
> +#define KDUMP_BUF_MIN(crashk_res.start)
> +#define KDUMP_BUF_MAX((crashk_res.end < ppc64_rma_size) ? \
> +  crashk_res.end : (ppc64_rma_size - 1))
> +
> +#endif /* __ARCH_POWERPC_KEXEC_CRASHDUMP_PPC64_H */
> diff --git a/arch/powerpc/include/asm/kexec.h 
> b/arch/powerpc/include/asm/kexec.h
> index 7008ea1..bf47a01 100644
> --- a/arch/powerpc/include/asm/kexec.h
> +++ b/arch/powerpc/include/asm/kexec.h
> @@ -100,14 +100,16 @@ void relocate_new_kernel(unsigned long 
> indirection_page, unsigned long reboot_co
>  #ifdef CONFIG_KEXEC_FILE
>  extern const struct kexec_file_ops kexec_elf64_ops;
>  
> -#ifdef CONFIG_IMA_KEXEC
>  #define ARCH_HAS_KIMAGE_ARCH
>  
>  struct kimage_arch {
> + struct crash_mem *exclude_ranges;
> +
> +#ifdef CONFIG_IMA_KEXEC
>   phys_addr_t ima_buffer_addr;
>   size_t ima_buffer_size;
> -};
>  #endif
> +};
>  
>  int setup_purgatory(struct kimage *image, const void *slave_code,
>   const void *fdt, unsigned long kernel_load_addr,
> @@ -125,6 +127,7 @@ int setup_new_fdt_ppc64(const struct kimage *image, void 
> *fdt,
>   unsigned long initrd_load_addr,
>   unsigned long initrd_len, const char *cmdline);
>  #endif /* CONFIG_PPC64 */
> +
>  #endif /* CONFIG_KEXEC_FILE */
>  
>  #else /* !CONFIG_KEXEC_CORE */
> diff --git a/arch/powerpc/kexec/elf_64.c b/arch/powerpc/kexec/elf_64.c
> index 23ad04c..c695f94 100644
> --- a/arch/powerpc/kexec/elf_64.c
> +++ b/arch/powerpc/kexec/elf_64.c
> @@ -22,6 +22,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  static void *elf64_load(struct kimage *image, char *kernel_buf,
>   unsigned long kernel_len, char *initrd,
> @@ -46,6 +47,12 @@ static void *elf64_load(struct kimage *image, char 
> *kernel_buf,
>   if (ret)
>   goto out;
>  
> + if (image->type == KEXEC_TYPE_CRASH) {
> + /* min & max buffer values for kdump case */
> + kbuf.buf_min = pbuf.buf_min = KDUMP_BUF_MIN;
> + kbuf.buf_max = pbuf.buf_max = KDUMP_BUF_MAX;
> + }
> +
>   ret = kexec_elf_load(image, , _info, , _load_addr);
>   if (ret)
>   goto out;
> diff --git a/arch/powerpc/kexec/file_load_64.c 
> b/arch/powerpc/kexec/file_load_64.c
> index e6bff960..f1d7160 100644
> --- a/arch/powerpc/kexec/file_load_64.c
> +++ b/arch/powerpc/kexec/file_load_64.c
> @@ -17,6 +17,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>  
>  const struct kexec_file_ops * const kexec_file_loaders[] = {
>   _elf64_ops,
> @@ -24,6 +26,247 @@ const struct kexec_file_ops * const kexec_file_loaders[] 
> = {
>  };
>  
>  /**
> + * get_exclude_memory_ranges - Get exclude memory ranges. This list includes
> + * regions like opal/rtas, tce-table, initrd,
> + * kernel, htab which should be avoided while
> + * setting up kexec load segments.
> + * @mem_ranges:Range list to add the memory ranges to.
> + *
> + * Returns 0 on success, negative errno on error.
> + */
> +static int get_exclude_memory_ranges(struct crash_mem **mem_ranges)
Is it needed? See the comment below.
> +{
> + int ret;
> +
> + ret = add_tce_mem_ranges(mem_ranges);
> + if (ret)
> + goto out;
> +
> +

Re: [PATCH v1 1/2] powerpc/pseries/hotplug-memory: stop checking is_mem_section_removable()

2020-04-09 Thread piliu



On 04/09/2020 03:26 PM, David Hildenbrand wrote:
> On 09.04.20 04:59, piliu wrote:
>>
>>
>> On 04/08/2020 10:46 AM, Baoquan He wrote:
>>> Add Pingfan to CC since he usually handles ppc related bugs for RHEL.
>>>
>>> On 04/07/20 at 03:54pm, David Hildenbrand wrote:
>>>> In commit 53cdc1cb29e8 ("drivers/base/memory.c: indicate all memory
>>>> blocks as removable"), the user space interface to compute whether a memory
>>>> block can be offlined (exposed via
>>>> /sys/devices/system/memory/memoryX/removable) has effectively been
>>>> deprecated. We want to remove the leftovers of the kernel implementation.
>>>
>>> Pingfan, can you have a look at this change on PPC?  Please feel free to
>>> give comments if any concern, or offer ack if it's OK to you.
>>>
>>>>
>>>> When offlining a memory block (mm/memory_hotplug.c:__offline_pages()),
>>>> we'll start by:
>>>> 1. Testing if it contains any holes, and reject if so
>>>> 2. Testing if pages belong to different zones, and reject if so
>>>> 3. Isolating the page range, checking if it contains any unmovable pages
>>>>
>>>> Using is_mem_section_removable() before trying to offline is not only racy,
>>>> it can easily result in false positives/negatives. Let's stop manually
>>>> checking is_mem_section_removable(), and let device_offline() handle it
>>>> completely instead. We can remove the racy is_mem_section_removable()
>>>> implementation next.
>>>>
>>>> We now take more locks (e.g., memory hotplug lock when offlining and the
>>>> zone lock when isolating), but maybe we should optimize that
>>>> implementation instead if this ever becomes a real problem (after all,
>>>> memory unplug is already an expensive operation). We started using
>>>> is_mem_section_removable() in commit 51925fb3c5c9 ("powerpc/pseries:
>>>> Implement memory hotplug remove in the kernel"), with the initial
>>>> hotremove support of lmbs.
>>>>
>>>> Cc: Nathan Fontenot 
>>>> Cc: Michael Ellerman 
>>>> Cc: Benjamin Herrenschmidt 
>>>> Cc: Paul Mackerras 
>>>> Cc: Michal Hocko 
>>>> Cc: Andrew Morton 
>>>> Cc: Oscar Salvador 
>>>> Cc: Baoquan He 
>>>> Cc: Wei Yang 
>>>> Signed-off-by: David Hildenbrand 
>>>> ---
>>>>  .../platforms/pseries/hotplug-memory.c| 26 +++
>>>>  1 file changed, 3 insertions(+), 23 deletions(-)
>>>>
>>>> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c 
>>>> b/arch/powerpc/platforms/pseries/hotplug-memory.c
>>>> index b2cde1732301..5ace2f9a277e 100644
>>>> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
>>>> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
>>>> @@ -337,39 +337,19 @@ static int pseries_remove_mem_node(struct 
>>>> device_node *np)
>>>>  
>>>>  static bool lmb_is_removable(struct drmem_lmb *lmb)
>>>>  {
>>>> -  int i, scns_per_block;
>>>> -  bool rc = true;
>>>> -  unsigned long pfn, block_sz;
>>>> -  u64 phys_addr;
>>>> -
>>>>if (!(lmb->flags & DRCONF_MEM_ASSIGNED))
>>>>return false;
>>>>  
>>>> -  block_sz = memory_block_size_bytes();
>>>> -  scns_per_block = block_sz / MIN_MEMORY_BLOCK_SIZE;
>>>> -  phys_addr = lmb->base_addr;
>>>> -
>>>>  #ifdef CONFIG_FA_DUMP
>>>>/*
>>>> * Don't hot-remove memory that falls in fadump boot memory area
>>>> * and memory that is reserved for capturing old kernel memory.
>>>> */
>>>> -  if (is_fadump_memory_area(phys_addr, block_sz))
>>>> +  if (is_fadump_memory_area(lmb->base_addr, memory_block_size_bytes()))
>>>>return false;
>>>>  #endif
>>>> -
>>>> -  for (i = 0; i < scns_per_block; i++) {
>>>> -  pfn = PFN_DOWN(phys_addr);
>>>> -  if (!pfn_in_present_section(pfn)) {
>>>> -  phys_addr += MIN_MEMORY_BLOCK_SIZE;
>>>> -  continue;
>>>> -  }
>>>> -
>>>> -  rc = rc && is_mem_section_removable(pfn, PAGES_PER_SECTION);
>>>> -  phys_addr += MIN_MEMORY_BLOCK_SIZE;
>>>> -  

Re: [PATCH v1 1/2] powerpc/pseries/hotplug-memory: stop checking is_mem_section_removable()

2020-04-08 Thread piliu



On 04/08/2020 10:46 AM, Baoquan He wrote:
> Add Pingfan to CC since he usually handles ppc related bugs for RHEL.
> 
> On 04/07/20 at 03:54pm, David Hildenbrand wrote:
>> In commit 53cdc1cb29e8 ("drivers/base/memory.c: indicate all memory
>> blocks as removable"), the user space interface to compute whether a memory
>> block can be offlined (exposed via
>> /sys/devices/system/memory/memoryX/removable) has effectively been
>> deprecated. We want to remove the leftovers of the kernel implementation.
> 
> Pingfan, can you have a look at this change on PPC?  Please feel free to
> give comments if any concern, or offer ack if it's OK to you.
> 
>>
>> When offlining a memory block (mm/memory_hotplug.c:__offline_pages()),
>> we'll start by:
>> 1. Testing if it contains any holes, and reject if so
>> 2. Testing if pages belong to different zones, and reject if so
>> 3. Isolating the page range, checking if it contains any unmovable pages
>>
>> Using is_mem_section_removable() before trying to offline is not only racy,
>> it can easily result in false positives/negatives. Let's stop manually
>> checking is_mem_section_removable(), and let device_offline() handle it
>> completely instead. We can remove the racy is_mem_section_removable()
>> implementation next.
>>
>> We now take more locks (e.g., memory hotplug lock when offlining and the
>> zone lock when isolating), but maybe we should optimize that
>> implementation instead if this ever becomes a real problem (after all,
>> memory unplug is already an expensive operation). We started using
>> is_mem_section_removable() in commit 51925fb3c5c9 ("powerpc/pseries:
>> Implement memory hotplug remove in the kernel"), with the initial
>> hotremove support of lmbs.
>>
>> Cc: Nathan Fontenot 
>> Cc: Michael Ellerman 
>> Cc: Benjamin Herrenschmidt 
>> Cc: Paul Mackerras 
>> Cc: Michal Hocko 
>> Cc: Andrew Morton 
>> Cc: Oscar Salvador 
>> Cc: Baoquan He 
>> Cc: Wei Yang 
>> Signed-off-by: David Hildenbrand 
>> ---
>>  .../platforms/pseries/hotplug-memory.c| 26 +++
>>  1 file changed, 3 insertions(+), 23 deletions(-)
>>
>> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c 
>> b/arch/powerpc/platforms/pseries/hotplug-memory.c
>> index b2cde1732301..5ace2f9a277e 100644
>> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
>> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
>> @@ -337,39 +337,19 @@ static int pseries_remove_mem_node(struct device_node 
>> *np)
>>  
>>  static bool lmb_is_removable(struct drmem_lmb *lmb)
>>  {
>> -int i, scns_per_block;
>> -bool rc = true;
>> -unsigned long pfn, block_sz;
>> -u64 phys_addr;
>> -
>>  if (!(lmb->flags & DRCONF_MEM_ASSIGNED))
>>  return false;
>>  
>> -block_sz = memory_block_size_bytes();
>> -scns_per_block = block_sz / MIN_MEMORY_BLOCK_SIZE;
>> -phys_addr = lmb->base_addr;
>> -
>>  #ifdef CONFIG_FA_DUMP
>>  /*
>>   * Don't hot-remove memory that falls in fadump boot memory area
>>   * and memory that is reserved for capturing old kernel memory.
>>   */
>> -if (is_fadump_memory_area(phys_addr, block_sz))
>> +if (is_fadump_memory_area(lmb->base_addr, memory_block_size_bytes()))
>>  return false;
>>  #endif
>> -
>> -for (i = 0; i < scns_per_block; i++) {
>> -pfn = PFN_DOWN(phys_addr);
>> -if (!pfn_in_present_section(pfn)) {
>> -phys_addr += MIN_MEMORY_BLOCK_SIZE;
>> -continue;
>> -}
>> -
>> -rc = rc && is_mem_section_removable(pfn, PAGES_PER_SECTION);
>> -phys_addr += MIN_MEMORY_BLOCK_SIZE;
>> -}
>> -
>> -return rc;
>> +/* device_offline() will determine if we can actually remove this lmb */
>> +return true;
So I think here swaps the check and do sequence. At least it breaks
dlpar_memory_remove_by_count(). It is doable to remove
is_mem_section_removable(), but here should be more effort to re-arrange
the code.

Thanks,
Pingfan
>>  }
>>  
>>  static int dlpar_add_lmb(struct drmem_lmb *);
>> -- 
>> 2.25.1
>>