Re: [PATCH v3 6/6] crash hp: Add x86 crash hotplug support

2022-01-26 Thread Baoquan He
On 01/26/22 at 11:32am, Eric DeVolder wrote:
..snip 
> > > > > +void arch_crash_hotplug_handler(struct kimage *image,
> > > > > + unsigned int hp_action, unsigned long a, unsigned long b)
> > > > > +{
> > > > > + /*
> > > > > +  * To accurately reflect hot un/plug changes, the elfcorehdr 
> > > > > (which
> > > > > +  * is passed to the crash kernel via the elfcorehdr= parameter)
> > > > > +  * must be updated with the new list of CPUs and memories. The 
> > > > > new
> > > > > +  * elfcorehdr is prepared in a kernel buffer, and if no errors,
> > > > > +  * then it is written on top of the existing/old elfcorehdr.
> > > > > +  *
> > > > > +  * Due to the change to the elfcorehdr, purgatory must 
> > > > > explicitly
> > > > > +  * exclude the elfcorehdr from the list of segments it checks.
> > > > > +  */
> > > > 
> > > > Please move this code comment to above function as kernel-doc if you
> > > > this it benefits the entire function. Otherwise should move them above
> > > > the code block they are explaining. For this place, I think moving them
> > > > to above arch_crash_hotplug_handler() is better.
> > > 
> > > ok, I will do that!
> > > 
> > > > 
> > > > > + struct kexec_segment *ksegment;
> > > > > + unsigned char *ptr = NULL;
> > > > > + unsigned long elfsz = 0;
> > > > > + void *elfbuf = NULL;
> > > > > + unsigned long mem, memsz;
> > > > > + unsigned int n;
> > > > > +
> > > > > + /*
> > > > > +  * When the struct kimage is alloced, it is wiped to zero, so
> > > > > +  * the elf_index_valid defaults to false. It is set on the
> > > > > +  * kexec_file_load path, or here for kexec_load.
> > > > > +  */
> > > > 
> > > > I think this kexec loading part should be taken out and post after this
> > > > whole patchset being accepted. At least, it's worth to put them in a
> > > > separate patch.
> > > 
> > > This little bit of code that identifies the incoming elfcorehdr is all 
> > > that
> > > is needed to support kexec_load (and the userspace changes of course). I'm
> > > happy to split as a separate patch, but I would think that be maintaining 
> > > it
> > > with this series, then when it is accepted, both the kexec_load and
> > > kexec_file_load paths would be supported? Your call.
> > 
> > Hmm, at first, let's split it out from this patch since it's an
> > independent action to kdump. I would suggest we don't carry it in this
> > series. After this series is done, you can post another patchset
> > including this part as kernel patch, and also the code change in
> > kexec_tools as user space patch.
> > 
> > ..
> > 
> 
> OK, I'll remove the bit of code that supports kexec_load, so it can be 
> introduced
> later coincident with the changes to kexec-tools.
> 
> In a previous message you mentioned making changes to the order of the 
> patches,
> was this it, or is there more to come?

Yeah, replied to cover letter, please check it there.


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v3 0/6] crash: Kernel handling of CPU and memory hot un/plug

2022-01-26 Thread Baoquan He
Hi Eric,

On 01/10/22 at 02:57pm, Eric DeVolder wrote:
> When the kdump service is loaded, if a CPU or memory is hot
> un/plugged, the crash elfcorehdr (for x86), which describes the CPUs
> and memory in the system, must also be updated, else the resulting
> vmcore is inaccurate (eg. missing either CPU context or memory
> regions).
> 
> The current solution utilizes udev to initiate an unload-then-reload
> of the kdump image (e. kernel, initrd, boot_params, puratory and
> elfcorehdr) by the userspace kexec utility. In previous posts I have
> outlined the significant performance problems related to offloading
> this activity to userspace.
> 
> This patchset introduces a generic crash hot un/plug handler that
> registers with the CPU and memory notifiers. Upon CPU or memory
> changes, this generic handler is invoked and performs important
> housekeeping, for example obtaining the appropriate lock, and then
> invokes an architecture specific handler to do the appropriate
> updates.
> 
> In the case of x86_64, the arch specific handler generates a new
> elfcorehdr, and overwrites the old one in memory. No involvement
> with userspace needed.

About patch splitting, it is always a headache to me when I post
patches. Both too coarse or too small granularity are not good. But we
should obey two rules:
1) commits had better be bisectable, so that later we find out which
   commit cause issue or regression easier. Here, your patch 5/6 and 6/6 are
   obviously breaking the rule. Because crash_prepare_elf64_headers()'s
   protype has been changed, its invokation is not modified accodingly.

2) Each patch only cover one action, bad better not stack too much in
one patch. Surely, we should avoid to scatter one thing into several
patch, it's not easy to reivew. E.g in this patchset, kexec_load can be
taken out, for this we have discussed. Skipping the elfcorehdr digest
can be taken out, its code change is simple, while it's worth to tell
the reason in a patch.

So my suggestion to rearrange the patchset as below, please take it as a
reference.
1)crash: fix minor typo/bug in debug message
2)crash hp: Introduce CRASH_HOTPLUG configuration options
3)crash hp: definitions and prototype changes
  Note: this patch need cover those invokation of crash_prepare_elf64_headers.
  Otherwise it's not bisectable.
4)crash hp: introduce helper function map_crash_pages/unmap_crash_pages
5)crash hp: generic crash hotplug support infrastructure
6)crash hp: exclude elfcorehdr from the segment checksum calculation
7)crash hp: exclude hot removed cpu from cpu notes of elfcorehdr
  this includes the assigment part in crash_hotplug_handler() and 
  usage part in crash_prepare_elf64_headers().
8)crash hp/x86: add the crash hp support for x86

Above patch splitting and patch subject may not be appropriate, please
adust or change with your understanding.

> 
> To realize the benefits/test this patchset, one must make a couple
> of minor changes to userspace:
> 
>  - Disable the udev rule for updating kdump on hot un/plug changes
>Eg. on RHEL: rm -f /usr/lib/udev/rules.d/98-kexec.rules
>or other technique to neuter the rule.
> 
>  - Change to the kexec_file_load for loading the kdump kernel:
>Eg. on RHEL: in /usr/bin/kdumpctl, change to:
> standard_kexec_args="-p -d -s"
>which adds the -s to select kexec_file_load syscall.
> 
> This patchset supports kexec_load with a modified kexec userspace
> utility, and a working changeset to the kexec userspace utility
> is provided here (and to use, the above change to standard_kexec_args
> would be, for example, to append --hotplug-size=262144 instead of -s).
> 
>  diff --git a/kexec/arch/i386/crashdump-x86.c 
> b/kexec/arch/i386/crashdump-x86.c
>  index 9826f6d..06adb7e 100644
>  --- a/kexec/arch/i386/crashdump-x86.c
>  +++ b/kexec/arch/i386/crashdump-x86.c
>  @@ -48,6 +48,7 @@
>   #include 
>   
>   extern struct arch_options_t arch_options;
>  +extern unsigned long long hotplug_size;
>   
>   static int get_kernel_page_offset(struct kexec_info *UNUSED(info),
> struct crash_elf_info *elf_info)
>  @@ -975,6 +976,13 @@ int load_crashdump_segments(struct kexec_info *info, 
> char* mod_cmdline,
>   } else {
>   memsz = bufsz;
>   }
>  +
>  +/* If hotplug support enabled, use that size */
>  +if (hotplug_size) {
>  +memsz = hotplug_size;
>  +}
>  +
>  +info->elfcorehdr =
>   elfcorehdr = add_buffer(info, tmp, bufsz, memsz, align, min_base,
>   max_addr, -1);
>   dbgprintf("Created elf header segment at 0x%lx\n", elfcorehdr);
>  diff --git a/kexec/kexec.c b/kexec/kexec.c
>  index f63b36b..9569d9a 100644
>  --- a/kexec/kexec.c
>  +++ b/kexec/kexec.c
>  @@ -58,6 +58,7 @@
>   
>   unsigned long long mem_min = 0;
>   unsigned long long mem_max = ULONG_MAX;
>  +unsigned long long hotplug_size = 0;
>   static unsigned long kexec_flags = 0;
>   /* Flags for kexec 

Re: [PATCH v3 6/6] crash hp: Add x86 crash hotplug support

2022-01-26 Thread Eric DeVolder

Baoquan,
Thanks for looking at this! See inline response below.
eric

On 1/26/22 03:12, Baoquan He wrote:

On 01/21/22 at 08:06am, Eric DeVolder wrote:
..

   arch/x86/kernel/crash.c | 138 +++-
   1 file changed, 137 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 9730c88530fc..d185137b33d4 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -25,6 +25,7 @@
   #include 
   #include 
   #include 
+#include 
   #include 
   #include 
@@ -265,7 +266,8 @@ static int prepare_elf_headers(struct kimage *image, void 
**addr,
goto out;
/* By default prepare 64bit headers */
-   ret =  crash_prepare_elf64_headers(cmem, IS_ENABLED(CONFIG_X86_64), 
addr, sz);
+   ret =  crash_prepare_elf64_headers(image, cmem,
+   IS_ENABLED(CONFIG_X86_64), addr, sz);
   out:
vfree(cmem);
@@ -397,7 +399,17 @@ int crash_load_segments(struct kimage *image)
image->elf_headers = kbuf.buffer;
image->elf_headers_sz = kbuf.bufsz;
+#ifdef CONFIG_CRASH_HOTPLUG
+   /* Ensure elfcorehdr segment large enough for hotplug changes */
+   kbuf.memsz = CONFIG_CRASH_HOTPLUG_ELFCOREHDR_SZ;


I would define a default value for the size, meantime provide a Kconfig
option to allow user to customize.


In patch 2/6 of this series, "crash hp: Introduce CRASH_HOTPLUG
configuration options", I provide the following:

+config CRASH_HOTPLUG_ELFCOREHDR_SZ
+   depends on CRASH_HOTPLUG
+   int
+   default 131072
+   help
+ Specify the maximum size of the elfcorehdr buffer/segment.

which defines a default value of 128KiB, and can be overriden at configure time.

Are you asking for a different technique?


I thought to define a global variable, like

/* Defaults to ahve 128K elfcorehdr buffer which contains 2048 entries.*/
unsigned long crash_hotplug_elfcorehdr_size = 0x2;

Then initialize it in crash_hotplug_init() if CONFIG_CRASH_HOTPLUG_ELFCOREHDR_SZ
is enabled.

Seems using the config directly is also OK. Let's keep it and see if
other people have comment.


OK, I will leave alone for the time being.








+   /* For marking as usable to crash kernel */
+   image->elf_headers_sz = kbuf.memsz;
+   /* Record the index of the elfcorehdr segment */
+   image->elf_index = image->nr_segments;
+   image->elf_index_valid = true;
+#else
kbuf.memsz = kbuf.bufsz;
+#endif
kbuf.buf_align = ELF_CORE_HEADER_ALIGN;
kbuf.mem = KEXEC_BUF_MEM_UNKNOWN;
ret = kexec_add_buffer();
@@ -412,3 +424,127 @@ int crash_load_segments(struct kimage *image)
return ret;
   }
   #endif /* CONFIG_KEXEC_FILE */
+
+#ifdef CONFIG_CRASH_HOTPLUG


These two helper function should be carved out into a separate patch as
a preparatory one. I am considering how to rearrange and split the
patches, will reply to cover letter.


OK, I look forward to that insight!




+void *map_crash_pages(unsigned long paddr, unsigned long size)
+{
+   /*
+* NOTE: The addresses and sizes passed to this routine have
+* already been fully aligned on page boundaries. There is no
+* need for massaging the address or size.
+*/
+   void *ptr = NULL;
+
+   /* NOTE: requires arch_kexec_[un]protect_crashkres() for write access */
+   if (size > 0) {
+   struct page *page = pfn_to_page(paddr >> PAGE_SHIFT);
+
+   ptr = kmap(page);
+   }
+
+   return ptr;
+}
+
+void unmap_crash_pages(void **ptr)
+{
+   if (ptr) {
+   if (*ptr)
+   kunmap(*ptr);
+   *ptr = NULL;
+   }
+}
+
+void arch_crash_hotplug_handler(struct kimage *image,
+   unsigned int hp_action, unsigned long a, unsigned long b)
+{
+   /*
+* To accurately reflect hot un/plug changes, the elfcorehdr (which
+* is passed to the crash kernel via the elfcorehdr= parameter)
+* must be updated with the new list of CPUs and memories. The new
+* elfcorehdr is prepared in a kernel buffer, and if no errors,
+* then it is written on top of the existing/old elfcorehdr.
+*
+* Due to the change to the elfcorehdr, purgatory must explicitly
+* exclude the elfcorehdr from the list of segments it checks.
+*/


Please move this code comment to above function as kernel-doc if you
this it benefits the entire function. Otherwise should move them above
the code block they are explaining. For this place, I think moving them
to above arch_crash_hotplug_handler() is better.


ok, I will do that!




+   struct kexec_segment *ksegment;
+   unsigned char *ptr = NULL;
+   unsigned long elfsz = 0;
+   void *elfbuf = NULL;
+   unsigned long mem, memsz;
+   unsigned int n;
+
+   /*
+* When the struct kimage is alloced, it is wiped to zero, so
+* the elf_index_valid defaults to false. It is 

Re: [PATCH v20 5/5] kdump: update Documentation about crashkernel

2022-01-26 Thread john . p . donnelly

On 1/24/22 2:47 AM, Zhen Lei wrote:

From: Chen Zhou 

For arm64, the behavior of crashkernel=X has been changed, which
tries low allocation in DMA zone and fall back to high allocation
if it fails.

We can also use "crashkernel=X,high" to select a high region above
DMA zone, which also tries to allocate at least 256M low memory in
DMA zone automatically and "crashkernel=Y,low" can be used to allocate
specified size low memory.

So update the Documentation.

Signed-off-by: Chen Zhou 
Signed-off-by: Zhen Lei 


Acked-by: John Donnelly  


---
  Documentation/admin-guide/kdump/kdump.rst   | 11 +--
  Documentation/admin-guide/kernel-parameters.txt | 11 +--
  2 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/kdump/kdump.rst 
b/Documentation/admin-guide/kdump/kdump.rst
index cb30ca3df27c9b2..d4c287044be0c70 100644
--- a/Documentation/admin-guide/kdump/kdump.rst
+++ b/Documentation/admin-guide/kdump/kdump.rst
@@ -361,8 +361,15 @@ Boot into System Kernel
 kernel will automatically locate the crash kernel image within the
 first 512MB of RAM if X is not given.
  
-   On arm64, use "crashkernel=Y[@X]".  Note that the start address of

-   the kernel, X if explicitly specified, must be aligned to 2MiB (0x20).
+   On arm64, use "crashkernel=X" to try low allocation in DMA zone and
+   fall back to high allocation if it fails.
+   We can also use "crashkernel=X,high" to select a high region above
+   DMA zone, which also tries to allocate at least 256M low memory in
+   DMA zone automatically.
+   "crashkernel=Y,low" can be used to allocate specified size low memory.
+   Use "crashkernel=Y@X" if you really have to reserve memory from
+   specified start address X. Note that the start address of the kernel,
+   X if explicitly specified, must be aligned to 2MiB (0x20).
  
  Load the Dump-capture Kernel

  
diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index f5a27f067db9ed9..65780c2ca830be0 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -792,6 +792,9 @@
[KNL, X86-64] Select a region under 4G first, and
fall back to reserve region above 4G when '@offset'
hasn't been specified.
+   [KNL, ARM64] Try low allocation in DMA zone and fall 
back
+   to high allocation if it fails when '@offset' hasn't 
been
+   specified.
See Documentation/admin-guide/kdump/kdump.rst for 
further details.
  
  	crashkernel=range1:size1[,range2:size2,...][@offset]

@@ -808,6 +811,8 @@
Otherwise memory region will be allocated below 4G, if
available.
It will be ignored if crashkernel=X is specified.
+   [KNL, ARM64] range in high memory.
+   Allow kernel to allocate physical memory region from 
top.
crashkernel=size[KMG],low
[KNL, X86-64] range under 4G. When crashkernel=X,high
is passed, kernel could allocate physical memory region
@@ -816,13 +821,15 @@
requires at least 64M+32K low memory, also enough extra
low memory is needed to make sure DMA buffers for 32-bit
devices won't run out. Kernel would try to allocate at
-   at least 256M below 4G automatically.
+   least 256M below 4G automatically.
This one let user to specify own low range under 4G
for second kernel instead.
0: to disable low allocation.
It will be ignored when crashkernel=X,high is not used
or memory reserved is below 4G.
-
+   [KNL, ARM64] range in low memory.
+   This one let user to specify a low range in DMA zone for
+   crash dump kernel.
cryptomgr.notests
[KNL] Disable crypto self-tests
  



___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v20 4/5] of: fdt: Add memory for devices by DT property "linux,usable-memory-range"

2022-01-26 Thread john . p . donnelly

On 1/24/22 2:47 AM, Zhen Lei wrote:

From: Chen Zhou 

When reserving crashkernel in high memory, some low memory is reserved
for crash dump kernel devices and never mapped by the first kernel.
This memory range is advertised to crash dump kernel via DT property
under /chosen,
 linux,usable-memory-range = 

We reused the DT property linux,usable-memory-range and made the low
memory region as the second range "BASE2 SIZE2", which keeps compatibility
with existing user-space and older kdump kernels.

Crash dump kernel reads this property at boot time and call memblock_add()
to add the low memory region after memblock_cap_memory_range() has been
called.

Signed-off-by: Chen Zhou 
Co-developed-by: Zhen Lei 
Signed-off-by: Zhen Lei 
Reviewed-by: Rob Herring 
Tested-by: Dave Kleikamp 


Acked-by: John Donnelly  


---
  drivers/of/fdt.c | 33 +++--
  1 file changed, 23 insertions(+), 10 deletions(-)

diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index ad85ff6474ff139..df4b9d2418a13d4 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -973,16 +973,24 @@ static void __init 
early_init_dt_check_for_elfcorehdr(unsigned long node)
  
  static unsigned long chosen_node_offset = -FDT_ERR_NOTFOUND;
  
+/*

+ * The main usage of linux,usable-memory-range is for crash dump kernel.
+ * Originally, the number of usable-memory regions is one. Now there may
+ * be two regions, low region and high region.
+ * To make compatibility with existing user-space and older kdump, the low
+ * region is always the last range of linux,usable-memory-range if exist.
+ */
+#define MAX_USABLE_RANGES  2
+
  /**
   * early_init_dt_check_for_usable_mem_range - Decode usable memory range
   * location from flat tree
   */
  void __init early_init_dt_check_for_usable_mem_range(void)
  {
-   const __be32 *prop;
-   int len;
-   phys_addr_t cap_mem_addr;
-   phys_addr_t cap_mem_size;
+   struct memblock_region rgn[MAX_USABLE_RANGES] = {0};
+   const __be32 *prop, *endp;
+   int len, i;
unsigned long node = chosen_node_offset;
  
  	if ((long)node < 0)

@@ -991,16 +999,21 @@ void __init early_init_dt_check_for_usable_mem_range(void)
pr_debug("Looking for usable-memory-range property... ");
  
  	prop = of_get_flat_dt_prop(node, "linux,usable-memory-range", );

-   if (!prop || (len < (dt_root_addr_cells + dt_root_size_cells)))
+   if (!prop || (len % (dt_root_addr_cells + dt_root_size_cells)))
return;
  
-	cap_mem_addr = dt_mem_next_cell(dt_root_addr_cells, );

-   cap_mem_size = dt_mem_next_cell(dt_root_size_cells, );
+   endp = prop + (len / sizeof(__be32));
+   for (i = 0; i < MAX_USABLE_RANGES && prop < endp; i++) {
+   rgn[i].base = dt_mem_next_cell(dt_root_addr_cells, );
+   rgn[i].size = dt_mem_next_cell(dt_root_size_cells, );
  
-	pr_debug("cap_mem_start=%pa cap_mem_size=%pa\n", _mem_addr,

-_mem_size);
+   pr_debug("cap_mem_regions[%d]: base=%pa, size=%pa\n",
+i, [i].base, [i].size);
+   }
  
-	memblock_cap_memory_range(cap_mem_addr, cap_mem_size);

+   memblock_cap_memory_range(rgn[0].base, rgn[0].size);
+   for (i = 1; i < MAX_USABLE_RANGES && rgn[i].size; i++)
+   memblock_add(rgn[i].base, rgn[i].size);
  }
  
  #ifdef CONFIG_SERIAL_EARLYCON



___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v20 3/5] arm64: kdump: reimplement crashkernel=X

2022-01-26 Thread john . p . donnelly

On 1/24/22 2:47 AM, Zhen Lei wrote:

From: Chen Zhou 

There are following issues in arm64 kdump:
1. We use crashkernel=X to reserve crashkernel below 4G, which
will fail when there is no enough low memory.
2. If reserving crashkernel above 4G, in this case, crash dump
kernel will boot failure because there is no low memory available
for allocation.

To solve these issues, change the behavior of crashkernel=X and
introduce crashkernel=X,[high,low]. crashkernel=X tries low allocation
in DMA zone, and fall back to high allocation if it fails.
We can also use "crashkernel=X,high" to select a region above DMA zone,
which also tries to allocate at least 256M in DMA zone automatically.
"crashkernel=Y,low" can be used to allocate specified size low memory.

Signed-off-by: Chen Zhou 
Co-developed-by: Zhen Lei 
Signed-off-by: Zhen Lei 



Acked-by: John Donnelly  


---
  arch/arm64/kernel/machine_kexec.c  |  9 +++-
  arch/arm64/kernel/machine_kexec_file.c | 12 -
  arch/arm64/mm/init.c   | 68 --
  3 files changed, 81 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/kernel/machine_kexec.c 
b/arch/arm64/kernel/machine_kexec.c
index e16b248699d5c3c..19c2d487cb08feb 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -329,8 +329,13 @@ bool crash_is_nosave(unsigned long pfn)
  
  	/* in reserved memory? */

addr = __pfn_to_phys(pfn);
-   if ((addr < crashk_res.start) || (crashk_res.end < addr))
-   return false;
+   if ((addr < crashk_res.start) || (crashk_res.end < addr)) {
+   if (!crashk_low_res.end)
+   return false;
+
+   if ((addr < crashk_low_res.start) || (crashk_low_res.end < 
addr))
+   return false;
+   }
  
  	if (!kexec_crash_image)

return true;
diff --git a/arch/arm64/kernel/machine_kexec_file.c 
b/arch/arm64/kernel/machine_kexec_file.c
index 59c648d51848886..889951291cc0f9c 100644
--- a/arch/arm64/kernel/machine_kexec_file.c
+++ b/arch/arm64/kernel/machine_kexec_file.c
@@ -65,10 +65,18 @@ static int prepare_elf_headers(void **addr, unsigned long 
*sz)
  
  	/* Exclude crashkernel region */

ret = crash_exclude_mem_range(cmem, crashk_res.start, crashk_res.end);
+   if (ret)
+   goto out;
+
+   if (crashk_low_res.end) {
+   ret = crash_exclude_mem_range(cmem, crashk_low_res.start, 
crashk_low_res.end);
+   if (ret)
+   goto out;
+   }
  
-	if (!ret)

-   ret =  crash_prepare_elf64_headers(cmem, true, addr, sz);
+   ret = crash_prepare_elf64_headers(cmem, true, addr, sz);
  
+out:

kfree(cmem);
return ret;
  }
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 6c653a2c7cff052..a5d43feac0d7d96 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -71,6 +71,30 @@ phys_addr_t arm64_dma_phys_limit __ro_after_init;
  #define CRASH_ADDR_LOW_MAXarm64_dma_phys_limit
  #define CRASH_ADDR_HIGH_MAX   MEMBLOCK_ALLOC_ACCESSIBLE
  
+static int __init reserve_crashkernel_low(unsigned long long low_size)

+{
+   unsigned long long low_base;
+
+   /* passed with crashkernel=0,low ? */
+   if (!low_size)
+   return 0;
+
+   low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, 0, 
CRASH_ADDR_LOW_MAX);
+   if (!low_base) {
+   pr_err("cannot allocate crashkernel low memory 
(size:0x%llx).\n", low_size);
+   return -ENOMEM;
+   }
+
+   pr_info("crashkernel low memory reserved: 0x%llx - 0x%llx (%lld MB)\n",
+   low_base, low_base + low_size, low_size >> 20);
+
+   crashk_low_res.start = low_base;
+   crashk_low_res.end   = low_base + low_size - 1;
+   insert_resource(_resource, _low_res);
+
+   return 0;
+}
+
  /*
   * reserve_crashkernel() - reserves memory for crash kernel
   *
@@ -81,29 +105,62 @@ phys_addr_t arm64_dma_phys_limit __ro_after_init;
  static void __init reserve_crashkernel(void)
  {
unsigned long long crash_base, crash_size;
+   unsigned long long crash_low_size = SZ_256M;
unsigned long long crash_max = CRASH_ADDR_LOW_MAX;
int ret;
+   bool fixed_base;
+   char *cmdline = boot_command_line;
  
-	ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),

+   /* crashkernel=X[@offset] */
+   ret = parse_crashkernel(cmdline, memblock_phys_mem_size(),
_size, _base);
-   /* no crashkernel= or invalid value specified */
-   if (ret || !crash_size)
-   return;
+   if (ret || !crash_size) {
+   unsigned long long low_size;
  
+		/* crashkernel=X,high */

+   ret = parse_crashkernel_high(cmdline, 0, _size, 
_base);
+   if (ret || !crash_size)
+   return;
+
+   /* crashkernel=X,low */
+   ret = 

Re: [PATCH v20 2/5] arm64: kdump: introduce some macros for crash kernel reservation

2022-01-26 Thread john . p . donnelly

On 1/24/22 2:47 AM, Zhen Lei wrote:

From: Chen Zhou 

Introduce macro CRASH_ALIGN for alignment, macro CRASH_ADDR_LOW_MAX
for upper bound of low crash memory, macro CRASH_ADDR_HIGH_MAX for
upper bound of high crash memory, use macros instead.

Signed-off-by: Chen Zhou 
Signed-off-by: Zhen Lei 
Tested-by: John Donnelly 
Tested-by: Dave Kleikamp 



Acked-by: John Donnelly  


---
  arch/arm64/mm/init.c | 11 ---
  1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 90f276d46b93bc6..6c653a2c7cff052 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -65,6 +65,12 @@ EXPORT_SYMBOL(memstart_addr);
  phys_addr_t arm64_dma_phys_limit __ro_after_init;
  
  #ifdef CONFIG_KEXEC_CORE

+/* Current arm64 boot protocol requires 2MB alignment */
+#define CRASH_ALIGNSZ_2M
+
+#define CRASH_ADDR_LOW_MAX arm64_dma_phys_limit
+#define CRASH_ADDR_HIGH_MAXMEMBLOCK_ALLOC_ACCESSIBLE
+
  /*
   * reserve_crashkernel() - reserves memory for crash kernel
   *
@@ -75,7 +81,7 @@ phys_addr_t arm64_dma_phys_limit __ro_after_init;
  static void __init reserve_crashkernel(void)
  {
unsigned long long crash_base, crash_size;
-   unsigned long long crash_max = arm64_dma_phys_limit;
+   unsigned long long crash_max = CRASH_ADDR_LOW_MAX;
int ret;
  
  	ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),

@@ -90,8 +96,7 @@ static void __init reserve_crashkernel(void)
if (crash_base)
crash_max = crash_base + crash_size;
  
-	/* Current arm64 boot protocol requires 2MB alignment */

-   crash_base = memblock_phys_alloc_range(crash_size, SZ_2M,
+   crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
   crash_base, crash_max);
if (!crash_base) {
pr_warn("cannot allocate crashkernel (size:0x%llx)\n",



___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v20 1/5] arm64: Use insert_resource() to simplify code

2022-01-26 Thread john . p . donnelly

On 1/24/22 2:47 AM, Zhen Lei wrote:

insert_resource() traverses the subtree layer by layer from the root node
until a proper location is found. Compared with request_resource(), the
parent node does not need to be determined in advance.

In addition, move the insertion of node 'crashk_res' into function
reserve_crashkernel() to make the associated code close together.

Signed-off-by: Zhen Lei 



Acked-by: John Donnelly  


---
  arch/arm64/kernel/setup.c | 17 +++--
  arch/arm64/mm/init.c  |  1 +
  2 files changed, 4 insertions(+), 14 deletions(-)

diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index f70573928f1bff0..a81efcc359e4e78 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -225,6 +225,8 @@ static void __init request_standard_resources(void)
kernel_code.end = __pa_symbol(__init_begin - 1);
kernel_data.start   = __pa_symbol(_sdata);
kernel_data.end = __pa_symbol(_end - 1);
+   insert_resource(_resource, _code);
+   insert_resource(_resource, _data);
  
  	num_standard_resources = memblock.memory.cnt;

res_size = num_standard_resources * sizeof(*standard_resources);
@@ -246,20 +248,7 @@ static void __init request_standard_resources(void)
res->end = 
__pfn_to_phys(memblock_region_memory_end_pfn(region)) - 1;
}
  
-		request_resource(_resource, res);

-
-   if (kernel_code.start >= res->start &&
-   kernel_code.end <= res->end)
-   request_resource(res, _code);
-   if (kernel_data.start >= res->start &&
-   kernel_data.end <= res->end)
-   request_resource(res, _data);
-#ifdef CONFIG_KEXEC_CORE
-   /* Userspace will find "Crash kernel" region in /proc/iomem. */
-   if (crashk_res.end && crashk_res.start >= res->start &&
-   crashk_res.end <= res->end)
-   request_resource(res, _res);
-#endif
+   insert_resource(_resource, res);
}
  }
  
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c

index db63cc885771a52..90f276d46b93bc6 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -109,6 +109,7 @@ static void __init reserve_crashkernel(void)
kmemleak_ignore_phys(crash_base);
crashk_res.start = crash_base;
crashk_res.end = crash_base + crash_size - 1;
+   insert_resource(_resource, _res);
  }
  #else
  static void __init reserve_crashkernel(void)



___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH] kexec: disable cpu hotplug until the rebooting cpu is stable

2022-01-26 Thread Valentin Schneider
On 26/01/22 10:45, Pingfan Liu wrote:
> On Wed, Jan 26, 2022 at 12:29 AM Valentin Schneider
>  wrote:
>>
>> On 25/01/22 11:39, Pingfan Liu wrote:
>> > The following identical code piece appears in both
>> > migrate_to_reboot_cpu() and smp_shutdown_nonboot_cpus():
>> >
>> >   if (!cpu_online(primary_cpu))
>> >   primary_cpu = cpumask_first(cpu_online_mask);
>> >
>> > Although the kexec-reboot task can get through a cpu_down() on its cpu,
>> > this code looks a little confusing.
>> >
>> > Make things straight forward by keep cpu hotplug disabled until
>> > smp_shutdown_nonboot_cpus() holds cpu_add_remove_lock. By this way, the
>> > rebooting cpu can keep unchanged.
>> >
>>
>> So is this supposed to be a refactor with no change in behaviour? AFAICT it
>> actually does change things (and isn't necessarily clearer).
>>
> Yes, as you have seen, it does change behavior. Before this patch,
> there is a breakage:
>   migrate_to_reboot_cpu();
>   cpu_hotplug_enable();
>  --> technical, here can
> comes a cpu_down(this_cpu)
>   machine_shutdown();
>
> And this patch squeezes out this breakage.
>

Ok, that's worth pointing out in the changelog.

>> > diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
>> > index 68480f731192..db4fa6b174e3 100644
>> > --- a/kernel/kexec_core.c
>> > +++ b/kernel/kexec_core.c
>> > @@ -1168,14 +1168,12 @@ int kernel_kexec(void)
>> >   kexec_in_progress = true;
>> >   kernel_restart_prepare("kexec reboot");
>> >   migrate_to_reboot_cpu();
>> > -
>> >   /*
>> > -  * migrate_to_reboot_cpu() disables CPU hotplug assuming that
>> > -  * no further code needs to use CPU hotplug (which is true in
>> > -  * the reboot case). However, the kexec path depends on using
>> > -  * CPU hotplug again; so re-enable it here.
>> > +  * migrate_to_reboot_cpu() disables CPU hotplug. If an arch
>> > +  * relies on the cpu teardown to achieve reboot, it needs to
>> > +  * re-enable CPU hotplug there.
>> >*/
>> > - cpu_hotplug_enable();
>> > +
>>
>> Not all archs map machine_shutdown() to smp_shutdown_nonboot_cpus(), other
>> archs will now be missing a cpu_hotplug_enable() prior to a kexec
>> machine_shutdown(). That said, AFAICT none of those archs rely on the
>> hotplug machinery in machine_shutdown(), so it might be OK, but that's not
>> obvious at all.
>>
> At the first glance, it may be not obvious, but tracing down
> cpu_hotplug_enable() to the variable cpu_hotplug_disabled, you can
> find out the limited involved functions are all related to
> cpu_up/cpu_down.
>
> IOW, if no code path connects with the interface of cpu_up/cpu_down,
> then kexec-reboot will not be affected.
>

That's my point, this only works if the other archs truly do not rely on
hotplug for machine_shutdown(), which seems to be the case but it wouldn't
hurt for you to double-check that and explicitely call it out in the
changelog.

> And after this patch, it is more clear how to handle the following cases:
> arch/arm/kernel/reboot.c:94:smp_shutdown_nonboot_cpus(reboot_cpu);
> arch/arm64/kernel/process.c:88: smp_shutdown_nonboot_cpus(reboot_cpu);
> arch/ia64/kernel/process.c:578: smp_shutdown_nonboot_cpus(reboot_cpu);
>

FWIW riscv is also concerned.

> Thanks,
> Pingfan

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v2 07/12] x86/sev: Setup code to park APs in the AP Jump Table

2022-01-26 Thread Joerg Roedel
On Wed, Nov 10, 2021 at 05:37:32PM +0100, Borislav Petkov wrote:
> On Mon, Sep 13, 2021 at 05:55:58PM +0200, Joerg Roedel wrote:
> >  extern unsigned char real_mode_blob[];
> > diff --git a/arch/x86/include/asm/sev-ap-jumptable.h 
> > b/arch/x86/include/asm/sev-ap-jumptable.h
> > new file mode 100644
> > index ..1c8b2ce779e2
> > --- /dev/null
> > +++ b/arch/x86/include/asm/sev-ap-jumptable.h
> 
> Why a separate header? arch/x86/include/asm/sev.h looks small enough.

The header is included in assembly, so I made a separate one.

> > +void __init sev_es_setup_ap_jump_table_data(void *base, u32 pa)
> 
> Why is this a separate function?
> 
> It is all part of the jump table setup.

Right, but the sev_es_setup_ap_jump_table_blob() function is already
pretty big and I wanted to keep things readable.

> 
> > +   return 0;
> 
> Why are you returning 0 here and below?

This is in an initcall and it returns just 0 when the environment is not
ready to setup the ap jump table. Returning non-zero would create a
warning message in the caller for something that is not a bug in the
kernel.

> > + * This file contains the source code for the binary blob which gets 
> > copied to
> > + * the SEV-ES AP Jumptable to park APs while offlining CPUs or booting a 
> > new
> 
> I've seen "Jumptable", "Jump Table" and "jump table" at least. I'd say, do
> the last one everywhere pls.

Fair, sorry for my english being too german :) I changed everything to
'jump table'.

> > +   /* Reset remaining registers */
> > +   movl$0, %esp
> > +   movl$0, %eax
> > +   movl$0, %ebx
> > +   movl$0, %edx
> 
> All 4: use xor

XOR changes EFLAGS, can not use them here.

> > +
> > +   /* Reset CR0 to get out of protected mode */
> > +   movl$0x6010, %ecx
> 
> Another magic naked number.

This is the CR0 reset value. I have updated the comment to make this
more clear.

Thanks,

-- 
Jörg Rödel
jroe...@suse.de

SUSE Software Solutions Germany GmbH
Maxfeldstr. 5
90409 Nürnberg
Germany
 
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH V4] notifier/panic: Introduce panic_notifier_filter

2022-01-26 Thread Petr Mladek
On Wed 2022-01-26 11:10:39, Baoquan He wrote:
> On 01/24/22 at 11:48am, Guilherme G. Piccoli wrote:
> > On 24/01/2022 10:59, Baoquan He wrote:
> > > [...]
> > > About pre_dump, if the dump is crash dump, hope those pre_dump notifiers
> > > will be executed under conditional check, e.g only if 
> > > 'crash_kexec_post_notifiers'
> > > is specified in kernel cmdline. 
> > 
> > Hi Baoquan, based on Petr's suggestion, I think pre_dump would be
> > responsible for really *non-intrusive/non-risky* tasks and should be
> > always executed in the panic path (before kdump), regardless of
> > "crash_kexec_post_notifiers".
> > 
> > The idea is that the majority of the notifiers would be executed in the
> > post_dump portion, and for that, we have the
> > "crash_kexec_post_notifiers" conditional. I also suggest we have
> > blacklist options (based on function names) for both notifiers, in order
> > to make kdump issues debug easier.
> > 
> > Do you agree with that? Feel free to comment with suggestions!
> > Cheers,
> 
> I would say "please NO" cautiously.
> 
> As Petr said, kdump mostly works only if people configure it correctly.
> That's because we try best to switch to kdump kernel from the fragile
> panicked kernel immediately. When we try to add anthing before the switching,
> please consider carefully and ask if that adding is mandatory, otherwise
> switching into kdump kernel may fail. If the answer is yes, the adding
> is needed and welcomed. Othewise, any unnecessary action, including any
> "non-intrusive/non-risky" tasks, would be unwelcomed.

I still do not have the complete picture. But it seems that some
actions make always sense even for kdump:

+ Super safe operations that might disable churn from broken
  system. For examle, disabling watchdogs by setting a single
  variable, see rcu_panic() notifier

+ Actions needed that allow to kexec the crash kernel a safe
  way under some hypervisor, see
  
https://lore.kernel.org/r/mwhpr21mb15933573f5c81c5250bf6a1cd7...@mwhpr21mb1593.namprd21.prod.outlook.com


> Surely, we don't oppose the "non-intrusive/non-risky" or completely
> "intrusive/risky" action adding before kdump kernel switching, with a
> conditional limitation. When we handle customers' kdump support, we
> explicitly declare we only support normal and default kdump operation.
> If any action which is done before switching into kdump kernel is specified,
> e.g panic_notifier, panic_print, they need take care of their own kdump
> failure.

All this actually started because of kmsg_dump. It might make sense to
allow both kmsg_dump and kdump together. The messages stored by
kmsg_dump might be better than nothing when kdump fails.

It actually seems to be the main motivation to introduce
"crash_kexec_post_notifier" parameter, see the commit
f06e5153f4ae2e2f3b03 ("kernel/panic.c: add "crash_kexec_post_notifiers"
option for kdump after panic_notifers").

And this patch introduces panic_notifier_filter that tries to select
notifiers that are helpful and harmful. IMHO, it is almost unusable.
It seems that even kernel developers do not understand what exactly
some notifiers do and why they are needed. Usually only the author
and people familiar with the subsystem have some idea. It makes
it pretty hard for anyone to create a reasonable filter.

I am pretty sure that we could do better. I propose to add more
notifier lists that will be called at various places with reasonable
rules and restrictions. Then the susbsystem maintainers could decide
where exactly a given action must be done.

The result might be that we will need only few options that will
enable/disable some well defined optional features.

Best Regards,
Petr

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH V4] notifier/panic: Introduce panic_notifier_filter

2022-01-26 Thread d.hatay...@fujitsu.com
> On 01/24/22 at 11:48am, Guilherme G. Piccoli wrote:
> > On 24/01/2022 10:59, Baoquan He wrote:
> > > [...]
> > > About pre_dump, if the dump is crash dump, hope those pre_dump notifiers
> > > will be executed under conditional check, e.g only if 
> > > 'crash_kexec_post_notifiers'
> > > is specified in kernel cmdline.
> >
> > Hi Baoquan, based on Petr's suggestion, I think pre_dump would be
> > responsible for really *non-intrusive/non-risky* tasks and should be
> > always executed in the panic path (before kdump), regardless of
> > "crash_kexec_post_notifiers".
> >
> > The idea is that the majority of the notifiers would be executed in the
> > post_dump portion, and for that, we have the
> > "crash_kexec_post_notifiers" conditional. I also suggest we have
> > blacklist options (based on function names) for both notifiers, in order
> > to make kdump issues debug easier.
> >
> > Do you agree with that? Feel free to comment with suggestions!
> > Cheers,
>
> I would say "please NO" cautiously.
>
> As Petr said, kdump mostly works only if people configure it correctly.
> That's because we try best to switch to kdump kernel from the fragile
> panicked kernel immediately. When we try to add anthing before the switching,
> please consider carefully and ask if that adding is mandatory, otherwise
> switching into kdump kernel may fail. If the answer is yes, the adding
> is needed and welcomed. Othewise, any unnecessary action, including any
> "non-intrusive/non-risky" tasks, would be unwelcomed.
>
> Surely, we don't oppose the "non-intrusive/non-risky" or completely
> "intrusive/risky" action adding before kdump kernel switching, with a
> conditional limitation. When we handle customers' kdump support, we
> explicitly declare we only support normal and default kdump operation.
> If any action which is done before switching into kdump kernel is specified,
> e.g panic_notifier, panic_print, they need take care of their own kdump
> failure.

Sorry for bringing back the past discussion, but how about
reconsidering the following design?

- kdump-specific notifier list within crash_kexec()

I don't think that the current implementation of
crash_kexec_post_notifiers is ideal, where panic() and panic notifier
are used, which contains the code that is unnecessary for kdump, and
it unnecessarily decreases kdump's reliability. The presence of kdump
code in panic() also conversely makes panic()'s code unnecessarily
complicated.

I use crash_kexec_post_notifiers with the understanding that it
affects kdump's reliability to some degree, but at the same time I
want it to be as reliable as possible.

I think introducing kdump-specific notifier list will resolve the
issues caused by dependencies to panic()'s code.

In addition, as I already said in another mail, I want to avoid
invoking any other handlers passed by users other than me, in order to
keep kdump's reliability. For such purpose, I think some feature like
Guilherme's panic notifier filter is needed.

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH V3] panic: Move panic_print before kmsg dumpers

2022-01-26 Thread Petr Mladek
On Mon 2022-01-24 16:57:17, Michael Kelley (LINUX) wrote:
> From: Baoquan He  Sent: Friday, January 21, 2022 8:34 PM
> > 
> > On 01/21/22 at 03:00pm, Michael Kelley (LINUX) wrote:
> > > From: Baoquan He  Sent: Thursday, January 20, 2022 6:31 
> > > PM
> > > >
> > > > On 01/20/22 at 06:36pm, Guilherme G. Piccoli wrote:
> > > > > Hi Baoquan, some comments inline below:
> > > > >
> > > > > On 20/01/2022 05:51, Baoquan He wrote:
> 
> [snip]
> 
> > > > > Do you think it should be necessary?
> > > > > How about if we allow users to just "panic_print" with or without the
> > > > > "crash_kexec_post_notifiers", then we pursue Petr suggestion of
> > > > > refactoring the panic notifiers? So, after this future refactor, we
> > > > > might have a much clear code.
> > > >
> > > > I haven't read Petr's reply in another panic notifier filter thread. For
> > > > panic notifier, it's only enforced to use on HyperV platform, excepto of
> > > > that, users need to explicitly add "crash_kexec_post_notifiers=1" to 
> > > > enable
> > > > it. And we got bug report on the HyperV issue. In our internal 
> > > > discussion,
> > > > we strongly suggest HyperV dev to change the default enablement, instead
> > > > leave it to user to decide.
> > > >
> > >
> > > Regarding Hyper-V:   Invoking the Hyper-V notifier prior to running the
> > > kdump kernel is necessary for correctness.  During initial boot of the
> > > main kernel, the Hyper-V and VMbus code in Linux sets up several guest
> > > physical memory pages that are shared with Hyper-V, and that Hyper-V
> > > may write to.   A VMbus connection is also established. Before kexec'ing
> > > into the kdump kernel, the sharing of these pages must be rescinded
> > > and the VMbus connection must be terminated.   If this isn't done, the
> > > kdump kernel will see strange memory overwrites if these shared guest
> > > physical memory pages get used for something else.
> 
> In the Azure cloud, collecting data before crash dumps is a motivation
> as well for setting crash_kexec_post_notifiers to true.   That way as
> cloud operator we can see broad failure trends, and in specific cases
> customers often expect the cloud operator to be able to provide info
> about a problem even if they have taken a kdump.  Where did you
> envision adding a comment in the code to help clarify these intentions?
> 
> I looked at the code again, and should revise my previous comments
> somewhat.   The Hyper-V resets that I described indeed must be done
> prior to kexec'ing the kdump kernel.   Most such resets are actually
> done via __crash_kexec() -> machine_crash_shutdown(), not via the
> panic notifier. However, the Hyper-V panic notifier must terminate the
> VMbus connection, because that must be done even if kdump is not
> being invoked.  See commit 74347a99e73.
>
> Most of the hangs seen in getting into the kdump kernel on Hyper-V/Azure 
> were probably due to the machine_crash_shutdown() path, and not due
> to running the panic notifiers prior to kexec'ing the kdump kernel.  The
> exception is terminating the VMbus connection, which had problems that
> are hopefully now fixed because of adding a timeout.

My undestanding is that we could split the actions into three groups:

  1. Actions that has to be before kexec'ing kdump kernel, like
 resetting physicall memory shared with Hyper-V.

 These operation(s) are needed only for kexec and can be done
 in kexec.


   2. Notify Hyper-V so that, for example, Azure cloud, could collect
  data before crash dump.

  It is nice to have.

  It should be configurable if it is not completely safe. I mean
  that there should be a way to disable it when it might increase
  the risk that kexec'ing kdump kernel might fail.


   3. Some actions are needed only when panic() ends up in the
  infinite loop.

  For example, unloading vmbus channel. At least the commit
  74347a99e73ae00b8385f ("x86/Hyper-V: Unload vmbus channel in
  hv panic callback") says that it is done in kdump path
  out of box.

All these operations are needed and used only when the kernel is
running under Hyper-V.

My mine intention is to understand if we need 2 or 3 notifier lists
or the current one is enough.

The 3 notifier lists would be:

   + always do (even before kdump)
   + optionally do before or after kdump
   + needed only when kdump is not called

Thanks a lot for the very valuable input.

Best Regards,
Petr

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH 0/2] kexec-tools: mips: Modify some code

2022-01-26 Thread Tiezhu Yang
Tiezhu Yang (2):
  kexec-tools: mips: Add some debug info
  kexec-tools: mips: Concatenate --reuse-cmdline and --append

 kexec/arch/mips/kexec-mips.c | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

-- 
2.1.0


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH 2/2] kexec-tools: mips: Concatenate --reuse-cmdline and --append

2022-01-26 Thread Tiezhu Yang
Use concat_cmdline() to concatenate the --append string and
the --reuse-cmdline string, otherwise only one of the two
options is valid.

This is similar with commit 8b42c99aa3bc ("Fix --reuse-cmdline
so it is usable.").

Signed-off-by: Tiezhu Yang 
---
 kexec/arch/mips/kexec-mips.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/kexec/arch/mips/kexec-mips.c b/kexec/arch/mips/kexec-mips.c
index 10ae45b..d8cbea8 100644
--- a/kexec/arch/mips/kexec-mips.c
+++ b/kexec/arch/mips/kexec-mips.c
@@ -109,15 +109,17 @@ int arch_process_options(int argc, char **argv)
};
static const char short_options[] = KEXEC_ARCH_OPT_STR;
int opt;
+   char *cmdline = NULL;
+   const char *append = NULL;
 
while ((opt = getopt_long(argc, argv, short_options,
  options, 0)) != -1) {
switch (opt) {
case OPT_APPEND:
-   arch_options.command_line = optarg;
+   append = optarg;
break;
case OPT_REUSE_CMDLINE:
-   arch_options.command_line = get_command_line();
+   cmdline = get_command_line();
break;
case OPT_DTB:
arch_options.dtb_file = optarg;
@@ -130,6 +132,8 @@ int arch_process_options(int argc, char **argv)
}
}
 
+   arch_options.command_line = concat_cmdline(cmdline, append);
+
dbgprintf("%s:%d: command_line: %s\n", __func__, __LINE__,
arch_options.command_line);
dbgprintf("%s:%d: initrd: %s\n", __func__, __LINE__,
-- 
2.1.0


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH 1/2] kexec-tools: mips: Add some debug info

2022-01-26 Thread Tiezhu Yang
Use dbgprintf() to print command_line, initrd and dtb
in arch_process_options() for debugging.

Signed-off-by: Tiezhu Yang 
---
 kexec/arch/mips/kexec-mips.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/kexec/arch/mips/kexec-mips.c b/kexec/arch/mips/kexec-mips.c
index 5866e24..10ae45b 100644
--- a/kexec/arch/mips/kexec-mips.c
+++ b/kexec/arch/mips/kexec-mips.c
@@ -130,6 +130,13 @@ int arch_process_options(int argc, char **argv)
}
}
 
+   dbgprintf("%s:%d: command_line: %s\n", __func__, __LINE__,
+   arch_options.command_line);
+   dbgprintf("%s:%d: initrd: %s\n", __func__, __LINE__,
+   arch_options.initrd_file);
+   dbgprintf("%s:%d: dtb: %s\n", __func__, __LINE__,
+   arch_options.dtb_file);
+
return 0;
 }
 
-- 
2.1.0


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v2 03/12] x86/sev: Save and print negotiated GHCB protocol version

2022-01-26 Thread Joerg Roedel
On Wed, Nov 03, 2021 at 03:27:23PM +0100, Borislav Petkov wrote:
> On Mon, Sep 13, 2021 at 05:55:54PM +0200, Joerg Roedel wrote:
> > From: Joerg Roedel 
> > 
> > Save the results of the GHCB protocol negotiation into a data structure
> > and print information about versions supported and used to the kernel
> > log.
> 
> Which is useful for?

For easier debugging, I added a sentence about that to the changelog.

> > +struct sev_ghcb_protocol_info {
> 
> Too long a name - ghcb_info is perfectly fine.

Changed, thanks.

-- 
Jörg Rödel
jroe...@suse.de

SUSE Software Solutions Germany GmbH
Maxfeldstr. 5
90409 Nürnberg
Germany
 
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v3 6/6] crash hp: Add x86 crash hotplug support

2022-01-26 Thread Baoquan He
On 01/21/22 at 08:06am, Eric DeVolder wrote:
..
> > >   arch/x86/kernel/crash.c | 138 +++-
> > >   1 file changed, 137 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
> > > index 9730c88530fc..d185137b33d4 100644
> > > --- a/arch/x86/kernel/crash.c
> > > +++ b/arch/x86/kernel/crash.c
> > > @@ -25,6 +25,7 @@
> > >   #include 
> > >   #include 
> > >   #include 
> > > +#include 
> > >   #include 
> > >   #include 
> > > @@ -265,7 +266,8 @@ static int prepare_elf_headers(struct kimage *image, 
> > > void **addr,
> > >   goto out;
> > >   /* By default prepare 64bit headers */
> > > - ret =  crash_prepare_elf64_headers(cmem, IS_ENABLED(CONFIG_X86_64), 
> > > addr, sz);
> > > + ret =  crash_prepare_elf64_headers(image, cmem,
> > > + IS_ENABLED(CONFIG_X86_64), addr, sz);
> > >   out:
> > >   vfree(cmem);
> > > @@ -397,7 +399,17 @@ int crash_load_segments(struct kimage *image)
> > >   image->elf_headers = kbuf.buffer;
> > >   image->elf_headers_sz = kbuf.bufsz;
> > > +#ifdef CONFIG_CRASH_HOTPLUG
> > > + /* Ensure elfcorehdr segment large enough for hotplug changes */
> > > + kbuf.memsz = CONFIG_CRASH_HOTPLUG_ELFCOREHDR_SZ;
> > 
> > I would define a default value for the size, meantime provide a Kconfig
> > option to allow user to customize.
> 
> In patch 2/6 of this series, "crash hp: Introduce CRASH_HOTPLUG
> configuration options", I provide the following:
> 
> +config CRASH_HOTPLUG_ELFCOREHDR_SZ
> +   depends on CRASH_HOTPLUG
> +   int
> +   default 131072
> +   help
> + Specify the maximum size of the elfcorehdr buffer/segment.
> 
> which defines a default value of 128KiB, and can be overriden at configure 
> time.
> 
> Are you asking for a different technique?

I thought to define a global variable, like

/* Defaults to ahve 128K elfcorehdr buffer which contains 2048 entries.*/
unsigned long crash_hotplug_elfcorehdr_size = 0x2;

Then initialize it in crash_hotplug_init() if CONFIG_CRASH_HOTPLUG_ELFCOREHDR_SZ
is enabled.

Seems using the config directly is also OK. Let's keep it and see if
other people have comment.

> 
> > 
> > > + /* For marking as usable to crash kernel */
> > > + image->elf_headers_sz = kbuf.memsz;
> > > + /* Record the index of the elfcorehdr segment */
> > > + image->elf_index = image->nr_segments;
> > > + image->elf_index_valid = true;
> > > +#else
> > >   kbuf.memsz = kbuf.bufsz;
> > > +#endif
> > >   kbuf.buf_align = ELF_CORE_HEADER_ALIGN;
> > >   kbuf.mem = KEXEC_BUF_MEM_UNKNOWN;
> > >   ret = kexec_add_buffer();
> > > @@ -412,3 +424,127 @@ int crash_load_segments(struct kimage *image)
> > >   return ret;
> > >   }
> > >   #endif /* CONFIG_KEXEC_FILE */
> > > +
> > > +#ifdef CONFIG_CRASH_HOTPLUG
> > 
> > These two helper function should be carved out into a separate patch as
> > a preparatory one. I am considering how to rearrange and split the
> > patches, will reply to cover letter.
> 
> OK, I look forward to that insight!
> 
> > 
> > > +void *map_crash_pages(unsigned long paddr, unsigned long size)
> > > +{
> > > + /*
> > > +  * NOTE: The addresses and sizes passed to this routine have
> > > +  * already been fully aligned on page boundaries. There is no
> > > +  * need for massaging the address or size.
> > > +  */
> > > + void *ptr = NULL;
> > > +
> > > + /* NOTE: requires arch_kexec_[un]protect_crashkres() for write access */
> > > + if (size > 0) {
> > > + struct page *page = pfn_to_page(paddr >> PAGE_SHIFT);
> > > +
> > > + ptr = kmap(page);
> > > + }
> > > +
> > > + return ptr;
> > > +}
> > > +
> > > +void unmap_crash_pages(void **ptr)
> > > +{
> > > + if (ptr) {
> > > + if (*ptr)
> > > + kunmap(*ptr);
> > > + *ptr = NULL;
> > > + }
> > > +}
> > > +
> > > +void arch_crash_hotplug_handler(struct kimage *image,
> > > + unsigned int hp_action, unsigned long a, unsigned long b)
> > > +{
> > > + /*
> > > +  * To accurately reflect hot un/plug changes, the elfcorehdr (which
> > > +  * is passed to the crash kernel via the elfcorehdr= parameter)
> > > +  * must be updated with the new list of CPUs and memories. The new
> > > +  * elfcorehdr is prepared in a kernel buffer, and if no errors,
> > > +  * then it is written on top of the existing/old elfcorehdr.
> > > +  *
> > > +  * Due to the change to the elfcorehdr, purgatory must explicitly
> > > +  * exclude the elfcorehdr from the list of segments it checks.
> > > +  */
> > 
> > Please move this code comment to above function as kernel-doc if you
> > this it benefits the entire function. Otherwise should move them above
> > the code block they are explaining. For this place, I think moving them
> > to above arch_crash_hotplug_handler() is better.
> 
> ok, I will do that!
> 
> > 
> > > + struct kexec_segment *ksegment;
> > > + unsigned char *ptr = NULL;
> > > + 

Re: [PATCH v3 3/6] crash hp: definitions and prototype changes

2022-01-26 Thread Baoquan He
On 01/21/22 at 07:48am, Eric DeVolder wrote:
.. 
> > > diff --git a/include/linux/kexec.h b/include/linux/kexec.h
> > > index 0c994ae37729..068f853f1c65 100644
> > > --- a/include/linux/kexec.h
> > > +++ b/include/linux/kexec.h
> > > @@ -221,8 +221,9 @@ struct crash_mem {
> > >   extern int crash_exclude_mem_range(struct crash_mem *mem,
> > >  unsigned long long mstart,
> > >  unsigned long long mend);
> > > -extern int crash_prepare_elf64_headers(struct crash_mem *mem, int 
> > > kernel_map,
> > > -void **addr, unsigned long *sz);
> > > +extern int crash_prepare_elf64_headers(struct kimage *image,
> > > + struct crash_mem *mem, int kernel_map,
> > > + void **addr, unsigned long *sz);
> > >   #endif /* CONFIG_KEXEC_FILE */
> > >   #ifdef CONFIG_KEXEC_ELF
> > > @@ -299,6 +300,13 @@ struct kimage {
> > >   /* Information for loading purgatory */
> > >   struct purgatory_info purgatory_info;
> > > +
> > > +#ifdef CONFIG_CRASH_HOTPLUG
> > > + bool hotplug_event;
> > > + int offlinecpu;
> > > + bool elf_index_valid;
> > > + int elf_index;
> > 
> > Do we really need elf_index_valid? Can we initialize elf_index to , e.g 
> > '-1',
> > then check if the value is valid?
> 
> These members become part of struct kimage, and when the kimage is
> allocated, it is automatically zero'd. Wrt/ elf_index, 0 is a valid index,
> and so it needs to be qualified. I initially had used -1, but that required
> code and was fragile as I had to find the right place to do that. Using the
> boolean elf_index_valid, the problems with -1 vanish, and for free! I also
> found when examining the code that reading 'elf_index_valid' was better than
> 'elf_index != -1', more clear.
> 
> Let me know what you think.

OK, I am fine with it. Will see if other people have comment.


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec