Re: [PATCH 1/7] kexec_file: add kexec_file flag to control debug printing

2023-11-14 Thread Baoquan He
On 11/14/23 at 08:03am, Joe Perches wrote:
> On Tue, 2023-11-14 at 23:32 +0800, Baoquan He wrote:
> > When specifying 'kexec -c -d', kexec_load interface will print loading
> > information, e.g the regions where kernel/initrd/purgatory/cmdline
> > are put, the memmap passed to 2nd kernel taken as system RAM ranges,
> > and printing all contents of struct kexec_segment, etc. These are
> > very helpful for analyzing or positioning what's happening when
> > kexec/kdump itself failed. The debugging printing for kexec_load
> > interface is made in user space utility kexec-tools.
> > 
> > Whereas, with kexec_file_load interface, 'kexec -s -d' print nothing.
> > Because kexec_file code is mostly implemented in kernel space, and the
> > debugging printing functionality is missed. It's not convenient when
> > debugging kexec/kdump loading and jumping with kexec_file_load
> > interface.
> > 
> > Now add KEXEC_FILE_DEBUG to kexec_file flag to control the debugging
> > message printing. And add global variable kexec_file_dbg_print and
> > macro kexec_dprintk() to facilitate the printing.
> > 
> > This is a preparation, later kexec_dprintk() will be used to replace the
> > existing pr_debug(). Once 'kexec -s -d' is specified, it will print out
> > kexec/kdump loading information. If '-d' is not specified, it regresses
> > to pr_debug().
> 
> Not quite as pr_debug is completely eliminated with
> zero object size when DEBUG is not #defined.
> 
> Now the object size will be larger and contain the
> formats in .text.

Ah, I didn't realize that. Thanks for telling. I didn't take pr_info()
and pr_debug because I want to avoid printing the pr_fmt() string in
each file.

> 
> []
> > diff --git a/include/linux/kexec.h b/include/linux/kexec.h
> []
> > @@ -264,6 +264,18 @@ arch_kexec_apply_relocations(struct purgatory_info 
> > *pi, Elf_Shdr *section,
> > return -ENOEXEC;
> >  }
> >  #endif
> > +
> > +extern bool kexec_file_dbg_print;
> > +
> > +#define kexec_dprintk(fmt, args...)\
> > +   do {\
> > +   if (kexec_file_dbg_print)   \
> > +   printk(KERN_INFO fmt, ##args);  \
> > +   else\
> > +   printk(KERN_DEBUG fmt, ##args); \
> > +   } while (0)
> > +
> > +
> 
> I don't know how many of these printks exist and if
> overall object size matters but using

Not too much because they are spread in different arch.
> 
> #define kexec_dprintkfmt, ...)\
>   printk("%s" fmt,\
>  kexec_file_dbg_print ? KERN_INFO : KERN_DEBUG,   \
>  ##__VA_ARGS__)
> 
> should reduce overall object size by eliminating the
> mostly duplicated format in .text which differs only
> by the KERN_

Sure, the new one looks great to me, I will update code to take it.
Thanks a lot for your great suggestion.

Thanks
Baoquan


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH 1/2] resource: add walk_system_ram_res_rev()

2023-11-14 Thread Baoquan He
On 11/14/23 at 03:17pm, Andrew Morton wrote:
> On Tue, 14 Nov 2023 17:16:57 +0800 Baoquan He  wrote:
> 
> > This function, being a variant of walk_system_ram_res() introduced in
> > commit 8c86e70acead ("resource: provide new functions to walk through
> > resources"), walks through a list of all the resources of System RAM
> > in reversed order, i.e., from higher to lower.
> > 
> > It will be used in kexec_file code to load kernel, initrd etc when
> > preparing kexec reboot.
> >
> > ...
> >
> > +/*
> > + * This function, being a variant of walk_system_ram_res(), calls the @func
> > + * callback against all memory ranges of type System RAM which are marked 
> > as
> > + * IORESOURCE_SYSTEM_RAM and IORESOUCE_BUSY in reversed order, i.e., from
> > + * higher to lower.
> > + */
> > +int walk_system_ram_res_rev(u64 start, u64 end, void *arg,
> > +   int (*func)(struct resource *, void *))
> > +{
> > +   struct resource res, *rams;
> > +   int rams_size = 16, i;
> > +   unsigned long flags;
> > +   int ret = -1;
> > +
> > +   /* create a list */
> > +   rams = kvcalloc(rams_size, sizeof(struct resource), GFP_KERNEL);
> > +   if (!rams)
> > +   return ret;
> > +
> > +   flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
> > +   i = 0;
> > +   while ((start < end) &&
> > +   (!find_next_iomem_res(start, end, flags, IORES_DESC_NONE, 
> > ))) {
> > +   if (i >= rams_size) {
> > +   /* re-alloc */
> > +   struct resource *rams_new;
> > +   int rams_new_size;
> > +
> > +   rams_new_size = rams_size + 16;
> > +   rams_new = kvcalloc(rams_new_size, sizeof(struct 
> > resource),
> > +   GFP_KERNEL);
> 
> kvrealloc()?

Exactly. Will udpate. Thanks for the great suggestion.

> 
> > +   if (!rams_new)
> > +   goto out;
> > +
> > +   memcpy(rams_new, rams,
> > +   sizeof(struct resource) * rams_size);
> > +   kvfree(rams);
> > +   rams = rams_new;
> > +   rams_size = rams_new_size;
> > +   }
> > +
> > +   rams[i].start = res.start;
> > +   rams[i++].end = res.end;
> > +
> > +   start = res.end + 1;
> > +   }
> > +
> > +   /* go reverse */
> > +   for (i--; i >= 0; i--) {
> > +   ret = (*func)([i], arg);
> > +   if (ret)
> > +   break;
> > +   }
> > +
> > +out:
> > +   kvfree(rams);
> > +   return ret;
> > +}
> > +
> >  /*
> >   * This function calls the @func callback against all memory ranges, which
> >   * are ranges marked as IORESOURCE_MEM and IORESOUCE_BUSY.
> > -- 
> > 2.41.0
> 


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v2 0/7] ima: kexec: measure events between kexec load and execute

2023-11-14 Thread Tushar Sugandhi




On 10/27/23 08:18, Mimi Zohar wrote:

On Thu, 2023-10-05 at 11:25 -0700, Tushar Sugandhi wrote:

The current Kernel behavior is IMA measurements snapshot is taken at
kexec 'load' and not at kexec 'execute'.  IMA log is then carried
over to the new Kernel after kexec 'execute'.

Some systems can be configured to call kexec 'load' first, and followed
by kexec 'execute' after some time.  (as opposed to calling 'load' and
'execute' in one single kexec command).


Additional measurements may be introduced by the kexec load itself.
Saving the measurement list as close as possible to the reboot is
beneficial, whether or not the kexec load and kexec execute are
executed separately.


True. What I am trying to say here is the longer the window between
'load' and 'execute', greater are the chances of measurements getting
added.
But as long as a single measurement gets added between 'load' and
'execute', it will break the attestation after kexec soft-reboot.

So maybe the above line in the patch description is not necessary.
I will remove.


In such scenario, if new IMA
measurements are added between kexec 'load' and kexec 'execute', the
TPM PCRs are extended with the IMA events between 'load' and 'execute'.
But those IMA events are not carried over to the new Kernel after kexec
soft reboot.  This results in mismatch between TPM PCR quotes, and the
actual IMA measurements list, after the system boots into the new kexec
image.  This mismatch results in the remote attestation failing for that
system.

This patch series proposes a solution to solve this problem by allocating
the necessary buffer at kexec 'load' time, and populating the buffer
with the IMA measurements at kexec 'execute' time.


How about beginning the paragraph with "To solve this problem allocate
... and populate ..."


Sure. Will do.

~Tushar



Re: [PATCH 1/2] resource: add walk_system_ram_res_rev()

2023-11-14 Thread Andrew Morton
On Tue, 14 Nov 2023 17:16:57 +0800 Baoquan He  wrote:

> This function, being a variant of walk_system_ram_res() introduced in
> commit 8c86e70acead ("resource: provide new functions to walk through
> resources"), walks through a list of all the resources of System RAM
> in reversed order, i.e., from higher to lower.
> 
> It will be used in kexec_file code to load kernel, initrd etc when
> preparing kexec reboot.
>
> ...
>
> +/*
> + * This function, being a variant of walk_system_ram_res(), calls the @func
> + * callback against all memory ranges of type System RAM which are marked as
> + * IORESOURCE_SYSTEM_RAM and IORESOUCE_BUSY in reversed order, i.e., from
> + * higher to lower.
> + */
> +int walk_system_ram_res_rev(u64 start, u64 end, void *arg,
> + int (*func)(struct resource *, void *))
> +{
> + struct resource res, *rams;
> + int rams_size = 16, i;
> + unsigned long flags;
> + int ret = -1;
> +
> + /* create a list */
> + rams = kvcalloc(rams_size, sizeof(struct resource), GFP_KERNEL);
> + if (!rams)
> + return ret;
> +
> + flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
> + i = 0;
> + while ((start < end) &&
> + (!find_next_iomem_res(start, end, flags, IORES_DESC_NONE, 
> ))) {
> + if (i >= rams_size) {
> + /* re-alloc */
> + struct resource *rams_new;
> + int rams_new_size;
> +
> + rams_new_size = rams_size + 16;
> + rams_new = kvcalloc(rams_new_size, sizeof(struct 
> resource),
> + GFP_KERNEL);

kvrealloc()?

> + if (!rams_new)
> + goto out;
> +
> + memcpy(rams_new, rams,
> + sizeof(struct resource) * rams_size);
> + kvfree(rams);
> + rams = rams_new;
> + rams_size = rams_new_size;
> + }
> +
> + rams[i].start = res.start;
> + rams[i++].end = res.end;
> +
> + start = res.end + 1;
> + }
> +
> + /* go reverse */
> + for (i--; i >= 0; i--) {
> + ret = (*func)([i], arg);
> + if (ret)
> + break;
> + }
> +
> +out:
> + kvfree(rams);
> + return ret;
> +}
> +
>  /*
>   * This function calls the @func callback against all memory ranges, which
>   * are ranges marked as IORESOURCE_MEM and IORESOUCE_BUSY.
> -- 
> 2.41.0

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v2 7/7] ima: record log size at kexec load and execute

2023-11-14 Thread Tushar Sugandhi




On 10/27/23 07:56, Mimi Zohar wrote:

Hi Tushar,

On Thu, 2023-10-05 at 11:26 -0700, Tushar Sugandhi wrote:

The window between kexec 'load' and 'execute' could be arbitrarily long.
Even with the large chunk of memory allocated at kexec 'load', it may
run out which would result in missing events in IMA log after the system
soft reboots to the new Kernel.  This would result in IMA measurements
getting out of sync with the TPM PCR quotes which would result in remote
attestation failing for that system.  There is currently no way for the
new Kernel to know if the IMA log TPM PCR quote out of sync problem is
because of the missing measurements during kexec.

Define two new IMA events, 'kexec_load' and 'kexec_execute', to be
measured at kexec 'load' and 'execute' respectively.

IMA measures 'boot_aggregate' as the first event when the system boots -
either cold boot or kexec soft boot.  In case when the system goes
through multiple soft reboots, the number of 'boot_aggregate' events in
IMA log corresponds to total number of boots (cold boot plus multiple
kexec soft reboots).  With this change, there would be additional
'kexec_load' and 'kexec_execute' events in between the two
'boot_aggregate' events.  In rare cases, when the system runs out of
memory during kexec soft reboot, 'kexec_execute' won't be copied since
its one of the very last event measured just before kexec soft reboot.
The absence of the event 'kexec_execute' in between the two
boot_aggregate' events would signal the attestation service that the IMA
log on the system is out of sync with TPM PCR quotes and the system needs
to be cold booted for the remote attestation to succeed again.

Signed-off-by: Tushar Sugandhi 


Adding a new type of critical data is fine.  The patch description
should describe the reason for it.  Please update the Subject line and
the patch description accordingly.

Thanks.  I described the reasons in the code comment below, because I 
thought it is important enough to stay with the code to give context. 
But I can move it from the code comment to a patch description.

---
  security/integrity/ima/ima_kexec.c | 35 +-
  1 file changed, 34 insertions(+), 1 deletion(-)

diff --git a/security/integrity/ima/ima_kexec.c 
b/security/integrity/ima/ima_kexec.c
index 6cd5f46a7208..0f9c424fe808 100644
--- a/security/integrity/ima/ima_kexec.c
+++ b/security/integrity/ima/ima_kexec.c
@@ -17,6 +17,8 @@
  #include "ima.h"
  
  #ifdef CONFIG_IMA_KEXEC

+#define IMA_KEXEC_EVENT_LEN 128
+
  struct seq_file ima_kexec_file;
  struct ima_kexec_hdr ima_khdr;
  static void *ima_kexec_buffer;
@@ -34,6 +36,8 @@ void ima_clear_kexec_file(void)
  
  static int ima_alloc_kexec_buf(size_t kexec_segment_size)

  {
+   char ima_kexec_event[IMA_KEXEC_EVENT_LEN];
+
if ((kexec_segment_size == 0) ||
(kexec_segment_size == ULONG_MAX) ||
((kexec_segment_size >> PAGE_SHIFT) > totalram_pages() / 2)) {
@@ -64,6 +68,12 @@ static int ima_alloc_kexec_buf(size_t kexec_segment_size)
memset(_khdr, 0, sizeof(ima_khdr));
ima_khdr.version = 1;
  
+	scnprintf(ima_kexec_event, IMA_KEXEC_EVENT_LEN,

+ "kexec_segment_size=%lu;", kexec_segment_size);
+
+   ima_measure_critical_data("ima_kexec", "kexec_load", ima_kexec_event,
+ strlen(ima_kexec_event), false, NULL, 0);
+
return 0;
  }
  
@@ -198,6 +208,7 @@ void ima_add_kexec_buffer(struct kimage *image)

  static int ima_update_kexec_buffer(struct notifier_block *self,
   unsigned long action, void *data)
  {
+   char ima_kexec_event[IMA_KEXEC_EVENT_LEN];
void *buf = NULL;
size_t buf_size;
bool resume = false;
@@ -213,9 +224,31 @@ static int ima_update_kexec_buffer(struct notifier_block 
*self,
return NOTIFY_OK;
}
  
+	buf_size = ima_get_binary_runtime_size();

+   scnprintf(ima_kexec_event, IMA_KEXEC_EVENT_LEN,
+ "kexec_segment_size=%lu;ima_binary_runtime_size=%lu;",
+ kexec_segment_size, buf_size);
+
+   /*
+* This is one of the very last events measured by IMA before kexec
+* soft rebooting into the new Kernal.
+* This event can be used as a marker after the system soft reboots
+* to the new Kernel to check if there was sufficient memory allocated
+* at kexec 'load' to capture the events measured between the window
+* of kexec 'load' and 'execute'.
+* This event needs to be present in the IMA log, in between the two
+* 'boot_aggregate' events that are logged for the previous boot and
+* the current soft reboot. If it is not present after the system soft
+* reboots into the new Kernel, it would mean the IMA log is not
+* consistent with the TPM PCR quotes, and the system needs to be
+* cold-booted for the attestation to succeed again.
+*/


Comments like this 

Re: [PATCH v2 2/7] ima: move ima_dump_measurement_list call from kexec load to execute

2023-11-14 Thread Tushar Sugandhi




On 10/27/23 06:08, Mimi Zohar wrote:

Hi Tushar,

On Thu, 2023-10-05 at 11:25 -0700, Tushar Sugandhi wrote:

In the current IMA implementation, ima_dump_measurement_list() is called
during the kexec 'load' operation.  This can result in loss of IMA
measurements taken between the 'load' and 'execute' phases when the
system goes through Kexec soft reboot to a new Kernel.  The call to the
function ima_dump_measurement_list() needs to be moved out of the
function ima_add_kexec_buffer() and needs to be called during the kexec
'execute' operation.

Implement a function ima_update_kexec_buffer() that is called during
kexec 'execute', allowing IMA to update the measurement list with the
events between kexec 'load' and 'execute'.  Move the
ima_dump_measurement_list() call from ima_add_kexec_buffer() to
ima_update_kexec_buffer().  Make ima_kexec_buffer and kexec_segment_size
variables global, so that they can be accessed during both kexec 'load'
and 'execute'.  Add functions ima_measurements_suspend() and
ima_measurements_resume() to set and reset the 'suspend_ima_measurements'
variable respectively, to suspend/resume IMA measurements.  Use
the existing 'ima_extend_list_mutex' to ensure that the operations are
thread-safe.  These function calls will help maintaining the integrity
of the IMA log while it is being copied to the new Kernel's buffer.
Add a reboot notifier_block 'update_buffer_nb' to ensure
the function ima_update_kexec_buffer() gets called during kexec
soft-reboot.

Signed-off-by: Tushar Sugandhi 


The lengthiness and complexity of the patch description is an
indication that the patch  needs to be broken up.  Please refer to
Documentation/process/submitting-patches.rst  for further info.

In addition, this patch moves the function ima_dump_measurement_list()
to a new function named ima_update_kexec_buffer(), which is never
called.   The patch set is thus not bisect safe.
 > [...]

+void ima_measurements_suspend(void)
+{
+   mutex_lock(_extend_list_mutex);
+   atomic_set(_ima_measurements, 1);
+   mutex_unlock(_extend_list_mutex);
+}
+
+void ima_measurements_resume(void)
+{
+   mutex_lock(_extend_list_mutex);
+   atomic_set(_ima_measurements, 0);
+   mutex_unlock(_extend_list_mutex);
+}


These function are being defined and called here, but are not enforced
until a later patch.   It would make more sense to introduce and
enforce these functions in a single patch with an explanation as to why
suspend/resume is needed.

The cover letter describes the problem as "... new IMA measurements are
added between kexec 'load' and kexec 'execute'".Please include in
the patch description the reason for needing suspend/resume, since
saving the measurement records is done during kexec execute.


Since I am introducing a few new functions in this patch set,
I am having hard time keeping the patches bisect safe and at the same 
time managing the length/complexity of the individual patches in the series.


I will go back and revisit the bisect-safeness of the patches in this 
series again.


Thanks for the feedback, appreciate it.



Re: [PATCH v2 1/7] ima: refactor ima_dump_measurement_list to move memory allocation to a separate function

2023-11-14 Thread Tushar Sugandhi




On 10/26/23 20:25, Mimi Zohar wrote:

On Thu, 2023-10-26 at 16:16 -0400, Mimi Zohar wrote:

Hi Tushar,

According to Documentation/process/submitting-patches.rst, the subject
line should be between 70-75 characters.

Perhaps something like "ima: define and call ima_alloc_kexec_buffer()".

On Thu, 2023-10-05 at 11:25 -0700, Tushar Sugandhi wrote:

IMA allocates memory and dumps the measurement during kexec soft reboot
as a single function call ima_dump_measurement_list().  It gets called
during kexec 'load' operation.  It results in the IMA measurements
between the window of kexec 'load' and 'execute' getting dropped when the
system boots into the new Kernel.  One of the kexec requirements is the
segment size cannot change between the 'load' and the 'execute'.
Therefore, to address this problem, ima_dump_measurement_list() needs
to be refactored to allocate the memory at kexec 'load', and dump the
measurements at kexec 'execute'.  The function that allocates the memory
should handle the scenario where the kexec load is called multiple times


The above pragraph is unnecessary.


Refactor ima_dump_measurement_list() to move the memory allocation part
to a separate function ima_alloc_kexec_buf() to allocate buffer of size
'kexec_segment_size' at kexec 'load'.  Make the local variables in
function ima_dump_measurement_list() global, so that they can be accessed
from ima_alloc_kexec_buf().  Make necessary changes to the function
ima_add_kexec_buffer() to call the above two functions.


Fix the wording based on the suggested changes below.


Signed-off-by: Tushar Sugandhi 


- Before re-posting this patch set, verify there aren't any
"checkpatch.pl --strict" issues.
- After applying each patch, compile the kernel and verify it still
works.


Doing this will detect whether or not the patch set is bisect safe.


I usually just do checkpatch.pl <.patch file>.
I didn't know about --strict and it's benefits.
Will do it going forward.



---
  security/integrity/ima/ima_kexec.c | 126 +
  1 file changed, 93 insertions(+), 33 deletions(-)

diff --git a/security/integrity/ima/ima_kexec.c 
b/security/integrity/ima/ima_kexec.c
index 419dc405c831..307e07991865 100644
--- a/security/integrity/ima/ima_kexec.c
+++ b/security/integrity/ima/ima_kexec.c
@@ -15,61 +15,114 @@
  #include "ima.h"
  
  #ifdef CONFIG_IMA_KEXEC

+struct seq_file ima_kexec_file;


Define "ima_kexec_file" as static since it only used in this file.
Since the variable does not need to be global, is there still a reason
for changing its name?   Minimize code change.


Adding "static" would make ima_kexec_file a global static variable.
Please ignore my comment about reverting the variable name change.

Mimi


Sure :)

~Tushar
...



Re: [PATCH v2 1/7] ima: refactor ima_dump_measurement_list to move memory allocation to a separate function

2023-11-14 Thread Tushar Sugandhi

Thanks a lot for reviewing this patch set Mimi.

On 10/26/23 13:16, Mimi Zohar wrote:

Hi Tushar,

According to Documentation/process/submitting-patches.rst, the subject
line should be between 70-75 characters.

Perhaps something like "ima: define and call ima_alloc_kexec_buffer()".


Sure thing. I will shorten the subject line. Here and elsewhere.

On Thu, 2023-10-05 at 11:25 -0700, Tushar Sugandhi wrote:

IMA allocates memory and dumps the measurement during kexec soft reboot
as a single function call ima_dump_measurement_list().  It gets called
during kexec 'load' operation.  It results in the IMA measurements
between the window of kexec 'load' and 'execute' getting dropped when the
system boots into the new Kernel.  One of the kexec requirements is the
segment size cannot change between the 'load' and the 'execute'.
Therefore, to address this problem, ima_dump_measurement_list() needs
to be refactored to allocate the memory at kexec 'load', and dump the
measurements at kexec 'execute'.  The function that allocates the memory
should handle the scenario where the kexec load is called multiple times


The above pragraph is unnecessary.


Sounds good. I will remove the above paragraph.


Refactor ima_dump_measurement_list() to move the memory allocation part
to a separate function ima_alloc_kexec_buf() to allocate buffer of size
'kexec_segment_size' at kexec 'load'.  Make the local variables in
function ima_dump_measurement_list() global, so that they can be accessed
from ima_alloc_kexec_buf().  Make necessary changes to the function
ima_add_kexec_buffer() to call the above two functions.


Fix the wording based on the suggested changes below.


Will do.

Signed-off-by: Tushar Sugandhi 


Before re-posting this patch set, verify there aren't any
"checkpatch.pl --strict" issues.
After applying each patch, compile the kernel and verify it still
works.
>> ---

  security/integrity/ima/ima_kexec.c | 126 +
  1 file changed, 93 insertions(+), 33 deletions(-)

diff --git a/security/integrity/ima/ima_kexec.c 
b/security/integrity/ima/ima_kexec.c
index 419dc405c831..307e07991865 100644
--- a/security/integrity/ima/ima_kexec.c
+++ b/security/integrity/ima/ima_kexec.c
@@ -15,61 +15,114 @@
  #include "ima.h"
  
  #ifdef CONFIG_IMA_KEXEC

+struct seq_file ima_kexec_file;


Define "ima_kexec_file" as static since it only used in this file.
Since the variable does not need to be global, is there still a reason
for changing its name?   Minimize code change.


+struct ima_kexec_hdr ima_khdr;
+
+void ima_clear_kexec_file(void)


The opposite of "set" is "clear" (e.g. set_git, clear_bit).  The
opposite of "alloc" would be "free" (e.g. ima_alloc_kexec_buf,
ima_free_kexec_buf)


Agreed. Will make it ima_free_kexec_buf.

+{
+   vfree(ima_kexec_file.buf);
+   ima_kexec_file.buf = NULL;
+   ima_kexec_file.size = 0;
+   ima_kexec_file.read_pos = 0;
+   ima_kexec_file.count = 0;
+}
+
+static int ima_alloc_kexec_buf(size_t kexec_segment_size)
+{
+   if ((kexec_segment_size == 0) ||
+   (kexec_segment_size == ULONG_MAX) ||
+   ((kexec_segment_size >> PAGE_SHIFT) > totalram_pages() / 2)) {
+   pr_err("%s: Invalid segment size for kexec: %zu\n",
+   __func__, kexec_segment_size);
+   return -EINVAL;
+   }


Please avoid code duplication.  If moving the test here, then remove it
from its original place.


Sure. Will do.

+
+   /*
+* If kexec load was called before, clear the existing buffer
+* before allocating a new one
+*/



+   if (ima_kexec_file.buf)
+   ima_clear_kexec_file();


Perhaps instead of always freeing the buffer, check and see whether the
size has changed.  If it hasn't changed, then simply return.  If it has
changed, perhaps use realloc().  Possible wording:

"Kexec may be called multiple times.  Free and re-alloc the buffer if
the size changed."


That's a good optimization. Thanks for the suggestion. Will do.

+
+   /* segment size can't change between kexec load and execute */
+   ima_kexec_file.buf = vmalloc(kexec_segment_size);
+   if (!ima_kexec_file.buf) {
+   pr_err("%s: No memory for ima kexec measurement buffer\n",
+   __func__);
+   return -ENOMEM;
+   }
+
+   ima_kexec_file.size = kexec_segment_size;
+   ima_kexec_file.read_pos = 0;
+   ima_kexec_file.count = sizeof(ima_khdr);/* reserved space */
+
+   memset(_khdr, 0, sizeof(ima_khdr));
+   ima_khdr.version = 1;
+
+   return 0;
+}
+
  static int ima_dump_measurement_list(unsigned long *buffer_size, void 
**buffer,
 unsigned long segment_size)
  {
struct ima_queue_entry *qe;
-   struct seq_file file;
-   struct ima_kexec_hdr khdr;
int ret = 0;
  
-	/* segment size can't change between kexec load and execute */

-   file.buf = 

Re: [RFC V2] IMA Log Snapshotting Design Proposal

2023-11-14 Thread Stefan Berger




On 11/14/23 13:36, Sush Shringarputale wrote:



On 11/13/2023 10:59 AM, Stefan Berger wrote:



On 10/19/23 14:49, Tushar Sugandhi wrote:

===
| Introduction |
===
This document provides a detailed overview of the proposed Kernel
feature IMA log snapshotting.  It describes the motivation behind the
proposal, the problem to be solved, a detailed solution design with
examples, and describes the changes to be made in the clients/services
which are part of remote-attestation system.  This is the 2nd version
of the proposal.  The first version is present here[1].

Table of Contents:
--
A. Motivation and Background
B. Goals and Non-Goals
 B.1 Goals
 B.2 Non-Goals
C. Proposed Solution
 C.1 Solution Summary
 C.2 High-level Work-flow
D. Detailed Design
 D.1 Snapshot Aggregate Event
 D.2 Snapshot Triggering Mechanism
 D.3 Choosing A Persistent Storage Location For Snapshots
 D.4 Remote-Attestation Client/Service-side Changes
 D.4.a Client-side Changes
 D.4.b Service-side Changes
E. Example Walk-through
F. Other Design Considerations
G. References



Userspace applications will have to know
a) where are the shard files?

We describe the file storage location choices in section D.3, but user
applications will have to query the well-known location described there.
b) how do I read the shard files while locking out the producer of the 
shard files?


IMO, this will require a well known config file and a locking method 
(flock) so that user space applications can work together in this new 
environment. The lock could be defined in the config file or just be 
the config file itself.

The flock is a good idea for co-ordination between UM clients. While
the Kernel cannot enforce any access in this way, any UM process that
is planning on triggering the snapshot mechanism should follow that
protocol.  We will ensure we document that as the best-practices in
the patch series.


It's more than 'best practices'. You need a well-known config file with 
well-known config options in it.


All clients that were previously just trying to read new bytes from the 
IMA log cannot do this anymore in the presence of a log shard producer 
but have to also learn that a new log shard has been produced so they 
need to figure out the new position in the log where to read from. So 
maybe a counter in a config file should indicate to the log readers that 
a new log has been produced -- otherwise they would have to monitor all 
the log shard files or the log shard file's size.


Iff the log-shard producer were configured to discard leading parts of 
the log then that should also be noted in a config file so clients, that 
need to see the beginning of the log, can refuse early on to work on a 
machine that either is configured this way or where the discarding has 
already happened.


  Stefan


- Sush




Re: [RFC V2] IMA Log Snapshotting Design Proposal

2023-11-14 Thread Sush Shringarputale




On 11/13/2023 10:59 AM, Stefan Berger wrote:



On 10/19/23 14:49, Tushar Sugandhi wrote:

===
| Introduction |
===
This document provides a detailed overview of the proposed Kernel
feature IMA log snapshotting.  It describes the motivation behind the
proposal, the problem to be solved, a detailed solution design with
examples, and describes the changes to be made in the clients/services
which are part of remote-attestation system.  This is the 2nd version
of the proposal.  The first version is present here[1].

Table of Contents:
--
A. Motivation and Background
B. Goals and Non-Goals
 B.1 Goals
 B.2 Non-Goals
C. Proposed Solution
 C.1 Solution Summary
 C.2 High-level Work-flow
D. Detailed Design
 D.1 Snapshot Aggregate Event
 D.2 Snapshot Triggering Mechanism
 D.3 Choosing A Persistent Storage Location For Snapshots
 D.4 Remote-Attestation Client/Service-side Changes
 D.4.a Client-side Changes
 D.4.b Service-side Changes
E. Example Walk-through
F. Other Design Considerations
G. References



Userspace applications will have to know
a) where are the shard files?

We describe the file storage location choices in section D.3, but user
applications will have to query the well-known location described there.
b) how do I read the shard files while locking out the producer of the 
shard files?


IMO, this will require a well known config file and a locking method 
(flock) so that user space applications can work together in this new 
environment. The lock could be defined in the config file or just be 
the config file itself.

The flock is a good idea for co-ordination between UM clients. While
the Kernel cannot enforce any access in this way, any UM process that
is planning on triggering the snapshot mechanism should follow that
protocol.  We will ensure we document that as the best-practices in
the patch series.
- Sush



[PATCH] kexec: Use atomic_try_cmpxchg in crash_kexec

2023-11-14 Thread Uros Bizjak
Use atomic_try_cmpxchg instead of cmpxchg (*ptr, old, new) == old in
crash_kexec().  x86 CMPXCHG instruction returns success in ZF flag,
so this change saves a compare after cmpxchg.

No functional change intended.

Cc: Eric Biederman 
Signed-off-by: Uros Bizjak 
---
 kernel/kexec_core.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index be5642a4ec49..bc4c096ab1f3 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -1063,9 +1063,10 @@ __bpf_kfunc void crash_kexec(struct pt_regs *regs)
 * panic().  Otherwise parallel calls of panic() and crash_kexec()
 * may stop each other.  To exclude them, we use panic_cpu here too.
 */
+   old_cpu = PANIC_CPU_INVALID;
this_cpu = raw_smp_processor_id();
-   old_cpu = atomic_cmpxchg(_cpu, PANIC_CPU_INVALID, this_cpu);
-   if (old_cpu == PANIC_CPU_INVALID) {
+
+   if (atomic_try_cmpxchg(_cpu, _cpu, this_cpu)) {
/* This is the 1st CPU which comes here, so go ahead. */
__crash_kexec(regs);
 
-- 
2.41.0


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH 1/7] kexec_file: add kexec_file flag to control debug printing

2023-11-14 Thread Joe Perches
On Tue, 2023-11-14 at 23:32 +0800, Baoquan He wrote:
> When specifying 'kexec -c -d', kexec_load interface will print loading
> information, e.g the regions where kernel/initrd/purgatory/cmdline
> are put, the memmap passed to 2nd kernel taken as system RAM ranges,
> and printing all contents of struct kexec_segment, etc. These are
> very helpful for analyzing or positioning what's happening when
> kexec/kdump itself failed. The debugging printing for kexec_load
> interface is made in user space utility kexec-tools.
> 
> Whereas, with kexec_file_load interface, 'kexec -s -d' print nothing.
> Because kexec_file code is mostly implemented in kernel space, and the
> debugging printing functionality is missed. It's not convenient when
> debugging kexec/kdump loading and jumping with kexec_file_load
> interface.
> 
> Now add KEXEC_FILE_DEBUG to kexec_file flag to control the debugging
> message printing. And add global variable kexec_file_dbg_print and
> macro kexec_dprintk() to facilitate the printing.
> 
> This is a preparation, later kexec_dprintk() will be used to replace the
> existing pr_debug(). Once 'kexec -s -d' is specified, it will print out
> kexec/kdump loading information. If '-d' is not specified, it regresses
> to pr_debug().

Not quite as pr_debug is completely eliminated with
zero object size when DEBUG is not #defined.

Now the object size will be larger and contain the
formats in .text.

[]
> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
[]
> @@ -264,6 +264,18 @@ arch_kexec_apply_relocations(struct purgatory_info *pi, 
> Elf_Shdr *section,
>   return -ENOEXEC;
>  }
>  #endif
> +
> +extern bool kexec_file_dbg_print;
> +
> +#define kexec_dprintk(fmt, args...)  \
> + do {\
> + if (kexec_file_dbg_print)   \
> + printk(KERN_INFO fmt, ##args);  \
> + else\
> + printk(KERN_DEBUG fmt, ##args); \
> + } while (0)
> +
> +

I don't know how many of these printks exist and if
overall object size matters but using

#define kexec_dprintkfmt, ...)  \
printk("%s" fmt,\
   kexec_file_dbg_print ? KERN_INFO : KERN_DEBUG,   \
   ##__VA_ARGS__)

should reduce overall object size by eliminating the
mostly duplicated format in .text which differs only
by the KERN_



___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH 7/7] kexec_file, parisc: print out debugging message if required

2023-11-14 Thread Baoquan He
Replace pr_debug() with the newly added kexec_dprintk() in kexec_file
loading related codes.

Signed-off-by: Baoquan He 
---
 arch/parisc/kernel/kexec_file.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/parisc/kernel/kexec_file.c b/arch/parisc/kernel/kexec_file.c
index 8c534204f0fd..011545898da7 100644
--- a/arch/parisc/kernel/kexec_file.c
+++ b/arch/parisc/kernel/kexec_file.c
@@ -38,7 +38,7 @@ static void *elf_load(struct kimage *image, char *kernel_buf,
for (i = 0; i < image->nr_segments; i++)
image->segment[i].mem = __pa(image->segment[i].mem);
 
-   pr_debug("Loaded the kernel at 0x%lx, entry at 0x%lx\n",
+   kexec_dprintk("Loaded the kernel at 0x%lx, entry at 0x%lx\n",
 kernel_load_addr, image->start);
 
if (initrd != NULL) {
@@ -51,7 +51,7 @@ static void *elf_load(struct kimage *image, char *kernel_buf,
if (ret)
goto out;
 
-   pr_debug("Loaded initrd at 0x%lx\n", kbuf.mem);
+   kexec_dprintk("Loaded initrd at 0x%lx\n", kbuf.mem);
image->arch.initrd_start = kbuf.mem;
image->arch.initrd_end = kbuf.mem + initrd_len;
}
@@ -68,7 +68,7 @@ static void *elf_load(struct kimage *image, char *kernel_buf,
if (ret)
goto out;
 
-   pr_debug("Loaded cmdline at 0x%lx\n", kbuf.mem);
+   kexec_dprintk("Loaded cmdline at 0x%lx\n", kbuf.mem);
image->arch.cmdline = kbuf.mem;
}
 out:
-- 
2.41.0


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH 4/7] kexec_file, arm64: print out debugging message if required

2023-11-14 Thread Baoquan He
Replace pr_debug() with the newly added kexec_dprintk() in kexec_file
loading related codes.

And also remove the kimage->segment[] printing because the generic code
has done the printing.

Signed-off-by: Baoquan He 
---
 arch/arm64/kernel/kexec_image.c|  2 +-
 arch/arm64/kernel/machine_kexec.c  | 24 ++--
 arch/arm64/kernel/machine_kexec_file.c |  6 +++---
 3 files changed, 10 insertions(+), 22 deletions(-)

diff --git a/arch/arm64/kernel/kexec_image.c b/arch/arm64/kernel/kexec_image.c
index 636be6715155..df71965178f5 100644
--- a/arch/arm64/kernel/kexec_image.c
+++ b/arch/arm64/kernel/kexec_image.c
@@ -122,7 +122,7 @@ static void *image_load(struct kimage *image,
kernel_segment->memsz -= text_offset;
image->start = kernel_segment->mem;
 
-   pr_debug("Loaded kernel at 0x%lx bufsz=0x%lx memsz=0x%lx\n",
+   kexec_dprintk("Loaded kernel at 0x%lx bufsz=0x%lx memsz=0x%lx\n",
kernel_segment->mem, kbuf.bufsz,
kernel_segment->memsz);
 
diff --git a/arch/arm64/kernel/machine_kexec.c 
b/arch/arm64/kernel/machine_kexec.c
index 078910db77a4..efd4e03b95d3 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -34,24 +34,12 @@ static void _kexec_image_info(const char *func, int line,
 {
unsigned long i;
 
-   pr_debug("%s:%d:\n", func, line);
-   pr_debug("  kexec kimage info:\n");
-   pr_debug("type:%d\n", kimage->type);
-   pr_debug("start:   %lx\n", kimage->start);
-   pr_debug("head:%lx\n", kimage->head);
-   pr_debug("nr_segments: %lu\n", kimage->nr_segments);
-   pr_debug("dtb_mem: %pa\n", >arch.dtb_mem);
-   pr_debug("kern_reloc: %pa\n", >arch.kern_reloc);
-   pr_debug("el2_vectors: %pa\n", >arch.el2_vectors);
-
-   for (i = 0; i < kimage->nr_segments; i++) {
-   pr_debug("  segment[%lu]: %016lx - %016lx, 0x%lx bytes, %lu 
pages\n",
-   i,
-   kimage->segment[i].mem,
-   kimage->segment[i].mem + kimage->segment[i].memsz,
-   kimage->segment[i].memsz,
-   kimage->segment[i].memsz /  PAGE_SIZE);
-   }
+   kexec_dprintk("%s:%d:\n", func, line);
+   kexec_dprintk("  kexec kimage info:\n");
+   kexec_dprintk("type:%d\n", kimage->type);
+   kexec_dprintk("head:%lx\n", kimage->head);
+   kexec_dprintk("kern_reloc: %pa\n", >arch.kern_reloc);
+   kexec_dprintk("el2_vectors: %pa\n", >arch.el2_vectors);
 }
 
 void machine_kexec_cleanup(struct kimage *kimage)
diff --git a/arch/arm64/kernel/machine_kexec_file.c 
b/arch/arm64/kernel/machine_kexec_file.c
index a11a6e14ba89..9f82401d99f4 100644
--- a/arch/arm64/kernel/machine_kexec_file.c
+++ b/arch/arm64/kernel/machine_kexec_file.c
@@ -127,7 +127,7 @@ int load_other_segments(struct kimage *image,
image->elf_load_addr = kbuf.mem;
image->elf_headers_sz = headers_sz;
 
-   pr_debug("Loaded elf core header at 0x%lx bufsz=0x%lx 
memsz=0x%lx\n",
+   kexec_dprintk("Loaded elf core header at 0x%lx bufsz=0x%lx 
memsz=0x%lx\n",
 image->elf_load_addr, kbuf.bufsz, kbuf.memsz);
}
 
@@ -148,7 +148,7 @@ int load_other_segments(struct kimage *image,
goto out_err;
initrd_load_addr = kbuf.mem;
 
-   pr_debug("Loaded initrd at 0x%lx bufsz=0x%lx memsz=0x%lx\n",
+   kexec_dprintk("Loaded initrd at 0x%lx bufsz=0x%lx 
memsz=0x%lx\n",
initrd_load_addr, kbuf.bufsz, kbuf.memsz);
}
 
@@ -179,7 +179,7 @@ int load_other_segments(struct kimage *image,
image->arch.dtb = dtb;
image->arch.dtb_mem = kbuf.mem;
 
-   pr_debug("Loaded dtb at 0x%lx bufsz=0x%lx memsz=0x%lx\n",
+   kexec_dprintk("Loaded dtb at 0x%lx bufsz=0x%lx memsz=0x%lx\n",
kbuf.mem, kbuf.bufsz, kbuf.memsz);
 
return 0;
-- 
2.41.0


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH 6/7] kexec_file, power: print out debugging message if required

2023-11-14 Thread Baoquan He
Replace pr_debug() with the newly added kexec_dprintk() in kexec_file
loading related codes.

Signed-off-by: Baoquan He 
---
 arch/powerpc/kexec/elf_64.c   |  8 
 arch/powerpc/kexec/file_load_64.c | 14 +++---
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/kexec/elf_64.c b/arch/powerpc/kexec/elf_64.c
index eeb258002d1e..904016cf89ea 100644
--- a/arch/powerpc/kexec/elf_64.c
+++ b/arch/powerpc/kexec/elf_64.c
@@ -59,7 +59,7 @@ static void *elf64_load(struct kimage *image, char 
*kernel_buf,
if (ret)
goto out;
 
-   pr_debug("Loaded the kernel at 0x%lx\n", kernel_load_addr);
+   kexec_dprintk("Loaded the kernel at 0x%lx\n", kernel_load_addr);
 
ret = kexec_load_purgatory(image, );
if (ret) {
@@ -67,7 +67,7 @@ static void *elf64_load(struct kimage *image, char 
*kernel_buf,
goto out;
}
 
-   pr_debug("Loaded purgatory at 0x%lx\n", pbuf.mem);
+   kexec_dprintk("Loaded purgatory at 0x%lx\n", pbuf.mem);
 
/* Load additional segments needed for panic kernel */
if (image->type == KEXEC_TYPE_CRASH) {
@@ -99,7 +99,7 @@ static void *elf64_load(struct kimage *image, char 
*kernel_buf,
goto out;
initrd_load_addr = kbuf.mem;
 
-   pr_debug("Loaded initrd at 0x%lx\n", initrd_load_addr);
+   kexec_dprintk("Loaded initrd at 0x%lx\n", initrd_load_addr);
}
 
fdt = of_kexec_alloc_and_setup_fdt(image, initrd_load_addr,
@@ -132,7 +132,7 @@ static void *elf64_load(struct kimage *image, char 
*kernel_buf,
 
fdt_load_addr = kbuf.mem;
 
-   pr_debug("Loaded device tree at 0x%lx\n", fdt_load_addr);
+   kexec_dprintk("Loaded device tree at 0x%lx\n", fdt_load_addr);
 
slave_code = elf_info.buffer + elf_info.proghdrs[0].p_offset;
ret = setup_purgatory_ppc64(image, slave_code, fdt, kernel_load_addr,
diff --git a/arch/powerpc/kexec/file_load_64.c 
b/arch/powerpc/kexec/file_load_64.c
index 961a6dd67365..45089f53f875 100644
--- a/arch/powerpc/kexec/file_load_64.c
+++ b/arch/powerpc/kexec/file_load_64.c
@@ -577,7 +577,7 @@ static int add_usable_mem_property(void *fdt, struct 
device_node *dn,
   NODE_PATH_LEN, dn);
return -EOVERFLOW;
}
-   pr_debug("Memory node path: %s\n", path);
+   kexec_dprintk("Memory node path: %s\n", path);
 
/* Now that we know the path, find its offset in kdump kernel's fdt */
node = fdt_path_offset(fdt, path);
@@ -590,8 +590,8 @@ static int add_usable_mem_property(void *fdt, struct 
device_node *dn,
/* Get the address & size cells */
n_mem_addr_cells = of_n_addr_cells(dn);
n_mem_size_cells = of_n_size_cells(dn);
-   pr_debug("address cells: %d, size cells: %d\n", n_mem_addr_cells,
-n_mem_size_cells);
+   kexec_dprintk("address cells: %d, size cells: %d\n", n_mem_addr_cells,
+ n_mem_size_cells);
 
um_info->idx  = 0;
if (!check_realloc_usable_mem(um_info, 2)) {
@@ -664,7 +664,7 @@ static int update_usable_mem_fdt(void *fdt, struct 
crash_mem *usable_mem)
 
node = fdt_path_offset(fdt, "/ibm,dynamic-reconfiguration-memory");
if (node == -FDT_ERR_NOTFOUND)
-   pr_debug("No dynamic reconfiguration memory found\n");
+   kexec_dprintk("No dynamic reconfiguration memory found\n");
else if (node < 0) {
pr_err("Malformed device tree: error reading 
/ibm,dynamic-reconfiguration-memory.\n");
return -EINVAL;
@@ -776,7 +776,7 @@ static void update_backup_region_phdr(struct kimage *image, 
Elf64_Ehdr *ehdr)
for (i = 0; i < ehdr->e_phnum; i++) {
if (phdr->p_paddr == BACKUP_SRC_START) {
phdr->p_offset = image->arch.backup_start;
-   pr_debug("Backup region offset updated to 0x%lx\n",
+   kexec_dprintk("Backup region offset updated to 0x%lx\n",
 image->arch.backup_start);
return;
}
@@ -850,7 +850,7 @@ int load_crashdump_segments_ppc64(struct kimage *image,
pr_err("Failed to load backup segment\n");
return ret;
}
-   pr_debug("Loaded the backup region at 0x%lx\n", kbuf->mem);
+   kexec_dprintk("Loaded the backup region at 0x%lx\n", kbuf->mem);
 
/* Load elfcorehdr segment - to export crashing kernel's vmcore */
ret = load_elfcorehdr_segment(image, kbuf);
@@ -858,7 +858,7 @@ int load_crashdump_segments_ppc64(struct kimage *image,
pr_err("Failed to load elfcorehdr segment\n");
return ret;
}
-   pr_debug("Loaded elf core header at 0x%lx, bufsz=0x%lx memsz=0x%lx\n",
+   kexec_dprintk("Loaded elf core header at 0x%lx, bufsz=0x%lx 
memsz=0x%lx\n",
 image->elf_load_addr, 

[PATCH 3/7] kexec_file, x86: print out debugging message if required

2023-11-14 Thread Baoquan He
Replace pr_debug() with the newly added kexec_dprintk() in kexec_file
loading related codes.

And also print out e820 memmap passed to 2nd kernel just as kexec_load
interface has been doing.

Signed-off-by: Baoquan He 
---
 arch/x86/kernel/crash.c   |  2 +-
 arch/x86/kernel/kexec-bzimage64.c | 23 ++-
 2 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index c92d88680dbf..df4dbd3aa08c 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -386,7 +386,7 @@ int crash_load_segments(struct kimage *image)
if (ret)
return ret;
image->elf_load_addr = kbuf.mem;
-   pr_debug("Loaded ELF headers at 0x%lx bufsz=0x%lx memsz=0x%lx\n",
+   kexec_dprintk("Loaded ELF headers at 0x%lx bufsz=0x%lx memsz=0x%lx\n",
 image->elf_load_addr, kbuf.bufsz, kbuf.memsz);
 
return ret;
diff --git a/arch/x86/kernel/kexec-bzimage64.c 
b/arch/x86/kernel/kexec-bzimage64.c
index a61c12c01270..e9ae0eac6bf9 100644
--- a/arch/x86/kernel/kexec-bzimage64.c
+++ b/arch/x86/kernel/kexec-bzimage64.c
@@ -82,7 +82,7 @@ static int setup_cmdline(struct kimage *image, struct 
boot_params *params,
 
cmdline_ptr[cmdline_len - 1] = '\0';
 
-   pr_debug("Final command line is: %s\n", cmdline_ptr);
+   kexec_dprintk("Final command line is: %s\n", cmdline_ptr);
cmdline_ptr_phys = bootparams_load_addr + cmdline_offset;
cmdline_low_32 = cmdline_ptr_phys & 0xUL;
cmdline_ext_32 = cmdline_ptr_phys >> 32;
@@ -272,7 +272,12 @@ setup_boot_parameters(struct kimage *image, struct 
boot_params *params,
 
nr_e820_entries = params->e820_entries;
 
+   kexec_dprintk("E820 memmap:\n");
for (i = 0; i < nr_e820_entries; i++) {
+   kexec_dprintk("%016llx-%016llx (%d)\n",
+ params->e820_table[i].addr,
+ params->e820_table[i].addr + 
params->e820_table[i].size - 1,
+ params->e820_table[i].type);
if (params->e820_table[i].type != E820_TYPE_RAM)
continue;
start = params->e820_table[i].addr;
@@ -424,7 +429,7 @@ static void *bzImage64_load(struct kimage *image, char 
*kernel,
 * command line. Make sure it does not overflow
 */
if (cmdline_len + MAX_ELFCOREHDR_STR_LEN > header->cmdline_size) {
-   pr_debug("Appending elfcorehdr= to command line exceeds 
maximum allowed length\n");
+   kexec_dprintk("Appending elfcorehdr= to command line 
exceeds maximum allowed length\n");
return ERR_PTR(-EINVAL);
}
 
@@ -445,7 +450,7 @@ static void *bzImage64_load(struct kimage *image, char 
*kernel,
return ERR_PTR(ret);
}
 
-   pr_debug("Loaded purgatory at 0x%lx\n", pbuf.mem);
+   kexec_dprintk("Loaded purgatory at 0x%lx\n", pbuf.mem);
 
 
/*
@@ -490,8 +495,8 @@ static void *bzImage64_load(struct kimage *image, char 
*kernel,
if (ret)
goto out_free_params;
bootparam_load_addr = kbuf.mem;
-   pr_debug("Loaded boot_param, command line and misc at 0x%lx bufsz=0x%lx 
memsz=0x%lx\n",
-bootparam_load_addr, kbuf.bufsz, kbuf.bufsz);
+   kexec_dprintk("Loaded boot_param, command line and misc at 0x%lx 
bufsz=0x%lx memsz=0x%lx\n",
+ bootparam_load_addr, kbuf.bufsz, kbuf.bufsz);
 
/* Load kernel */
kbuf.buffer = kernel + kern16_size;
@@ -505,8 +510,8 @@ static void *bzImage64_load(struct kimage *image, char 
*kernel,
goto out_free_params;
kernel_load_addr = kbuf.mem;
 
-   pr_debug("Loaded 64bit kernel at 0x%lx bufsz=0x%lx memsz=0x%lx\n",
-kernel_load_addr, kbuf.bufsz, kbuf.memsz);
+   kexec_dprintk("Loaded 64bit kernel at 0x%lx bufsz=0x%lx memsz=0x%lx\n",
+ kernel_load_addr, kbuf.bufsz, kbuf.memsz);
 
/* Load initrd high */
if (initrd) {
@@ -520,8 +525,8 @@ static void *bzImage64_load(struct kimage *image, char 
*kernel,
goto out_free_params;
initrd_load_addr = kbuf.mem;
 
-   pr_debug("Loaded initrd at 0x%lx bufsz=0x%lx memsz=0x%lx\n",
-   initrd_load_addr, initrd_len, initrd_len);
+   kexec_dprintk("Loaded initrd at 0x%lx bufsz=0x%lx 
memsz=0x%lx\n",
+ initrd_load_addr, initrd_len, initrd_len);
 
setup_initrd(params, initrd_load_addr, initrd_len);
}
-- 
2.41.0


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH 0/7] kexec_file: print out debugging message if required

2023-11-14 Thread Baoquan He
Currently, specifying '-d' will print a lot of debugging information
about kexec/kdump loading with kexec_load interface.

However, kexec_file_load prints nothing even though '-d' is specified.
It's very inconvenient to debug or analyze the kexec/kdump loading when
something wrong happened with kexec/kdump itself or develper want to
check the kexec/kdump loading.

In this patchset, a kexec_file flag is KEXEC_FILE_DEBUG added and checked
in code. If it's passed in, debugging message of kexec_file code will be
printed out and can be seen from console and dmesg. Otherwise, the
debugging message is printed via pr_debug().

Note:
=
1) The code in kexec-tools utility also need be changed to support
passing KEXEC_FILE_DEBUG to kernel when 'kexec -s -d' is specified.
The patch link is here:
=
[PATCH] kexec_file: add kexec_file flag to support debug printing
http://lists.infradead.org/pipermail/kexec/2023-November/028505.html

2) s390 also has kexec_file code, while I am not sure what debugging
information is necessary. So leave it to s390 dev to add if they think
it's needed.

Test:
==
I did testing on x86_64 and arm64. On x86_64, the printed messages look
like below:
--
kexec measurement buffer for the loaded kernel at 0x207fffe000.
Loaded purgatory at 0x207fff9000
Loaded boot_param, command line and misc at 0x207fff3000 bufsz=0x1180 
memsz=0x1180
Loaded 64bit kernel at 0x207c00 bufsz=0xc88200 memsz=0x3c4a000
Loaded initrd at 0x2079e79000 bufsz=0x2186280 memsz=0x2186280
Final command line is: 
root=/dev/mapper/fedora_intel--knightslanding--lb--02-root ro 
rd.lvm.lv=fedora_intel-knightslanding-lb-02/root console=ttyS0,115200N81 
crashkernel=256M
E820 memmap:
-0009a3ff (1)
0009a400-0009 (2)
000e-000f (2)
0010-6ff83fff (1)
6ff84000-7ac50fff (2)
..
00207fff6150-00207fff615f (128)
00207fff6160-00207fff714f (1)
00207fff7150-00207fff715f (128)
00207fff7160-00207fff814f (1)
00207fff8150-00207fff815f (128)
00207fff8160-00207fff (1)
nr_segments = 5
segment[0]: buf=0x4e5ece74 bufsz=0x211 mem=0x207fffe000 memsz=0x1000
segment[1]: buf=0x9e871498 bufsz=0x4000 mem=0x207fff9000 memsz=0x5000
segment[2]: buf=0xd879f1fe bufsz=0x1180 mem=0x207fff3000 memsz=0x2000
segment[3]: buf=0x1101cd86 bufsz=0xc88200 mem=0x207c00 
memsz=0x3c4a000
segment[4]: buf=0xc6e38ac7 bufsz=0x2186280 mem=0x2079e79000 
memsz=0x2187000
kexec_file_load: type:0, start:0x207fff91a0 head:0x109e004002 flags:0x8
---

Baoquan He (7):
  kexec_file: add kexec_file flag to control debug printing
  kexec_file: print out debugging message if required
  kexec_file, x86: print out debugging message if required
  kexec_file, arm64: print out debugging message if required
  kexec_file, ricv: print out debugging message if required
  kexec_file, power: print out debugging message if required
  kexec_file, parisc: print out debugging message if required

 arch/arm64/kernel/kexec_image.c|  2 +-
 arch/arm64/kernel/machine_kexec.c  | 24 ++--
 arch/arm64/kernel/machine_kexec_file.c |  6 +++---
 arch/parisc/kernel/kexec_file.c|  6 +++---
 arch/powerpc/kexec/elf_64.c|  8 
 arch/powerpc/kexec/file_load_64.c  | 14 +++---
 arch/riscv/kernel/elf_kexec.c  |  9 +
 arch/riscv/kernel/machine_kexec.c  | 26 --
 arch/x86/kernel/crash.c|  2 +-
 arch/x86/kernel/kexec-bzimage64.c  | 23 ++-
 include/linux/kexec.h  | 14 +-
 include/uapi/linux/kexec.h |  1 +
 kernel/crash_core.c|  3 ++-
 kernel/kexec_file.c| 12 +++-
 security/integrity/ima/ima_kexec.c |  2 +-
 15 files changed, 72 insertions(+), 80 deletions(-)

-- 
2.41.0


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH 1/7] kexec_file: add kexec_file flag to control debug printing

2023-11-14 Thread Baoquan He
When specifying 'kexec -c -d', kexec_load interface will print loading
information, e.g the regions where kernel/initrd/purgatory/cmdline
are put, the memmap passed to 2nd kernel taken as system RAM ranges,
and printing all contents of struct kexec_segment, etc. These are
very helpful for analyzing or positioning what's happening when
kexec/kdump itself failed. The debugging printing for kexec_load
interface is made in user space utility kexec-tools.

Whereas, with kexec_file_load interface, 'kexec -s -d' print nothing.
Because kexec_file code is mostly implemented in kernel space, and the
debugging printing functionality is missed. It's not convenient when
debugging kexec/kdump loading and jumping with kexec_file_load
interface.

Now add KEXEC_FILE_DEBUG to kexec_file flag to control the debugging
message printing. And add global variable kexec_file_dbg_print and
macro kexec_dprintk() to facilitate the printing.

This is a preparation, later kexec_dprintk() will be used to replace the
existing pr_debug(). Once 'kexec -s -d' is specified, it will print out
kexec/kdump loading information. If '-d' is not specified, it regresses
to pr_debug().

Signed-off-by: Baoquan He 
---
 include/linux/kexec.h  | 14 +-
 include/uapi/linux/kexec.h |  1 +
 kernel/kexec_file.c|  5 +
 3 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 8227455192b7..189a6c5bec84 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -264,6 +264,18 @@ arch_kexec_apply_relocations(struct purgatory_info *pi, 
Elf_Shdr *section,
return -ENOEXEC;
 }
 #endif
+
+extern bool kexec_file_dbg_print;
+
+#define kexec_dprintk(fmt, args...)\
+   do {\
+   if (kexec_file_dbg_print)   \
+   printk(KERN_INFO fmt, ##args);  \
+   else\
+   printk(KERN_DEBUG fmt, ##args); \
+   } while (0)
+
+
 #endif /* CONFIG_KEXEC_FILE */
 
 #ifdef CONFIG_KEXEC_ELF
@@ -403,7 +415,7 @@ bool kexec_load_permitted(int kexec_image_type);
 
 /* List of defined/legal kexec file flags */
 #define KEXEC_FILE_FLAGS   (KEXEC_FILE_UNLOAD | KEXEC_FILE_ON_CRASH | \
-KEXEC_FILE_NO_INITRAMFS)
+KEXEC_FILE_NO_INITRAMFS | KEXEC_FILE_DEBUG)
 
 /* flag to track if kexec reboot is in progress */
 extern bool kexec_in_progress;
diff --git a/include/uapi/linux/kexec.h b/include/uapi/linux/kexec.h
index 01766dd839b0..c17bb096ea68 100644
--- a/include/uapi/linux/kexec.h
+++ b/include/uapi/linux/kexec.h
@@ -25,6 +25,7 @@
 #define KEXEC_FILE_UNLOAD  0x0001
 #define KEXEC_FILE_ON_CRASH0x0002
 #define KEXEC_FILE_NO_INITRAMFS0x0004
+#define KEXEC_FILE_DEBUG   0x0008
 
 /* These values match the ELF architecture values.
  * Unless there is a good reason that should continue to be the case.
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index f9a419cd22d4..4c35500ae40a 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -38,6 +38,8 @@ void set_kexec_sig_enforced(void)
 }
 #endif
 
+bool kexec_file_dbg_print;
+
 static int kexec_calculate_store_digests(struct kimage *image);
 
 /* Maximum size in bytes for kernel/initrd files. */
@@ -123,6 +125,8 @@ void kimage_file_post_load_cleanup(struct kimage *image)
 */
kfree(image->image_loader_data);
image->image_loader_data = NULL;
+
+   kexec_file_dbg_print = false;
 }
 
 #ifdef CONFIG_KEXEC_SIG
@@ -278,6 +282,7 @@ kimage_file_alloc_init(struct kimage **rimage, int 
kernel_fd,
if (!image)
return -ENOMEM;
 
+   kexec_file_dbg_print = !!(flags & KEXEC_FILE_DEBUG);
image->file_mode = 1;
 
if (kexec_on_panic) {
-- 
2.41.0


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH 2/7] kexec_file: print out debugging message if required

2023-11-14 Thread Baoquan He
Replace pr_debug() with the newly added kexec_dprintk() in kexec_file
loading related codes.

And also print out type/start/head of kimage and flags to help debug.

Signed-off-by: Baoquan He 
---
 kernel/crash_core.c| 3 ++-
 kernel/kexec_file.c| 7 ++-
 security/integrity/ima/ima_kexec.c | 2 +-
 3 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index efe87d501c8c..41001ffbaa99 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -551,7 +551,8 @@ int crash_prepare_elf64_headers(struct crash_mem *mem, int 
need_kernel_map,
phdr->p_filesz = phdr->p_memsz = mend - mstart + 1;
phdr->p_align = 0;
ehdr->e_phnum++;
-   pr_debug("Crash PT_LOAD ELF header. phdr=%p vaddr=0x%llx, 
paddr=0x%llx, sz=0x%llx e_phnum=%d p_offset=0x%llx\n",
+   kexec_dprintk("Crash PT_LOAD ELF header. phdr=%p vaddr=0x%llx, 
paddr=0x%llx, "
+   "sz=0x%llx e_phnum=%d p_offset=0x%llx\n",
phdr, phdr->p_vaddr, phdr->p_paddr, phdr->p_filesz,
ehdr->e_phnum, phdr->p_offset);
phdr++;
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index 4c35500ae40a..7ae1b0901aa4 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -206,6 +206,8 @@ kimage_file_prepare_segments(struct kimage *image, int 
kernel_fd, int initrd_fd,
if (ret < 0)
return ret;
image->kernel_buf_len = ret;
+   kexec_dprintk("kernel: %p kernel_size: %#lx\n",
+ image->kernel_buf, image->kernel_buf_len);
 
/* Call arch image probe handlers */
ret = arch_kexec_kernel_image_probe(image, image->kernel_buf,
@@ -389,11 +391,12 @@ SYSCALL_DEFINE5(kexec_file_load, int, kernel_fd, int, 
initrd_fd,
if (ret)
goto out;
 
+   kexec_dprintk("nr_segments = %lu\n", image->nr_segments);
for (i = 0; i < image->nr_segments; i++) {
struct kexec_segment *ksegment;
 
ksegment = >segment[i];
-   pr_debug("Loading segment %d: buf=0x%p bufsz=0x%zx mem=0x%lx 
memsz=0x%zx\n",
+   kexec_dprintk("segment[%d]: buf=0x%p bufsz=0x%zx mem=0x%lx 
memsz=0x%zx\n",
 i, ksegment->buf, ksegment->bufsz, ksegment->mem,
 ksegment->memsz);
 
@@ -408,6 +411,8 @@ SYSCALL_DEFINE5(kexec_file_load, int, kernel_fd, int, 
initrd_fd,
if (ret)
goto out;
 
+   kexec_dprintk("kexec_file_load: type:%u, start:0x%lx head:0x%lx 
flags:0x%lx\n",
+ image->type, image->start, image->head, flags);
/*
 * Free up any temporary buffers allocated which are not needed
 * after image has been loaded
diff --git a/security/integrity/ima/ima_kexec.c 
b/security/integrity/ima/ima_kexec.c
index ad133fe120db..e692624bcab3 100644
--- a/security/integrity/ima/ima_kexec.c
+++ b/security/integrity/ima/ima_kexec.c
@@ -129,7 +129,7 @@ void ima_add_kexec_buffer(struct kimage *image)
image->ima_buffer_size = kexec_segment_size;
image->ima_buffer = kexec_buffer;
 
-   pr_debug("kexec measurement buffer for the loaded kernel at 0x%lx.\n",
+   kexec_dprintk("kexec measurement buffer for the loaded kernel at 
0x%lx.\n",
 kbuf.mem);
 }
 #endif /* IMA_KEXEC */
-- 
2.41.0


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH 5/7] kexec_file, ricv: print out debugging message if required

2023-11-14 Thread Baoquan He
Replace pr_debug() with the newly added kexec_dprintk() in kexec_file
loading related codes.

And also remove kexec_image_info() because the content has been printed
out in generic code.

Signed-off-by: Baoquan He 
---
 arch/riscv/kernel/elf_kexec.c |  9 +
 arch/riscv/kernel/machine_kexec.c | 26 --
 2 files changed, 5 insertions(+), 30 deletions(-)

diff --git a/arch/riscv/kernel/elf_kexec.c b/arch/riscv/kernel/elf_kexec.c
index e60fbd8660c4..20d941e91b5e 100644
--- a/arch/riscv/kernel/elf_kexec.c
+++ b/arch/riscv/kernel/elf_kexec.c
@@ -216,7 +216,6 @@ static void *elf_kexec_load(struct kimage *image, char 
*kernel_buf,
if (ret)
goto out;
kernel_start = image->start;
-   pr_notice("The entry point of kernel at 0x%lx\n", image->start);
 
/* Add the kernel binary to the image */
ret = riscv_kexec_elf_load(image, , _info,
@@ -252,7 +251,7 @@ static void *elf_kexec_load(struct kimage *image, char 
*kernel_buf,
image->elf_load_addr = kbuf.mem;
image->elf_headers_sz = headers_sz;
 
-   pr_debug("Loaded elf core header at 0x%lx bufsz=0x%lx 
memsz=0x%lx\n",
+   kexec_dprintk("Loaded elf core header at 0x%lx bufsz=0x%lx 
memsz=0x%lx\n",
 image->elf_load_addr, kbuf.bufsz, kbuf.memsz);
 
/* Setup cmdline for kdump kernel case */
@@ -275,6 +274,8 @@ static void *elf_kexec_load(struct kimage *image, char 
*kernel_buf,
pr_err("Error loading purgatory ret=%d\n", ret);
goto out;
}
+   kexec_dprintk("Loaded purgatory at 0x%lx\n", kbuf.mem);
+
ret = kexec_purgatory_get_set_symbol(image, "riscv_kernel_entry",
 _start,
 sizeof(kernel_start), 0);
@@ -293,7 +294,7 @@ static void *elf_kexec_load(struct kimage *image, char 
*kernel_buf,
if (ret)
goto out;
initrd_pbase = kbuf.mem;
-   pr_notice("Loaded initrd at 0x%lx\n", initrd_pbase);
+   kexec_dprintk("Loaded initrd at 0x%lx\n", initrd_pbase);
}
 
/* Add the DTB to the image */
@@ -318,7 +319,7 @@ static void *elf_kexec_load(struct kimage *image, char 
*kernel_buf,
}
/* Cache the fdt buffer address for memory cleanup */
image->arch.fdt = fdt;
-   pr_notice("Loaded device tree at 0x%lx\n", kbuf.mem);
+   kexec_dprintk("Loaded device tree at 0x%lx\n", kbuf.mem);
goto out;
 
 out_free_fdt:
diff --git a/arch/riscv/kernel/machine_kexec.c 
b/arch/riscv/kernel/machine_kexec.c
index 2d139b724bc8..ed9cad20c039 100644
--- a/arch/riscv/kernel/machine_kexec.c
+++ b/arch/riscv/kernel/machine_kexec.c
@@ -18,30 +18,6 @@
 #include 
 #include 
 
-/*
- * kexec_image_info - Print received image details
- */
-static void
-kexec_image_info(const struct kimage *image)
-{
-   unsigned long i;
-
-   pr_debug("Kexec image info:\n");
-   pr_debug("\ttype:%d\n", image->type);
-   pr_debug("\tstart:   %lx\n", image->start);
-   pr_debug("\thead:%lx\n", image->head);
-   pr_debug("\tnr_segments: %lu\n", image->nr_segments);
-
-   for (i = 0; i < image->nr_segments; i++) {
-   pr_debug("\tsegment[%lu]: %016lx - %016lx", i,
-   image->segment[i].mem,
-   image->segment[i].mem + image->segment[i].memsz);
-   pr_debug("\t\t0x%lx bytes, %lu pages\n",
-   (unsigned long) image->segment[i].memsz,
-   (unsigned long) image->segment[i].memsz /  PAGE_SIZE);
-   }
-}
-
 /*
  * machine_kexec_prepare - Initialize kexec
  *
@@ -60,8 +36,6 @@ machine_kexec_prepare(struct kimage *image)
unsigned int control_code_buffer_sz = 0;
int i = 0;
 
-   kexec_image_info(image);
-
/* Find the Flattened Device Tree and save its physical address */
for (i = 0; i < image->nr_segments; i++) {
if (image->segment[i].memsz <= sizeof(fdt))
-- 
2.41.0


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH] kexec_file: add kexec_file flag to support debug printing

2023-11-14 Thread Baoquan He
This add KEXEC_FILE_DEBUG to kexec_file_flags so that it can be passed
to kernel when '-d' is added with kexec_file_load interface. With that
flag enabled, kernel can enable the debugging message printing.

Signed-off-by: Baoquan He 
---
 kexec/kexec-syscall.h | 1 +
 kexec/kexec.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/kexec/kexec-syscall.h b/kexec/kexec-syscall.h
index 2559bffb93da..73e52543e1b0 100644
--- a/kexec/kexec-syscall.h
+++ b/kexec/kexec-syscall.h
@@ -119,6 +119,7 @@ static inline long kexec_file_load(int kernel_fd, int 
initrd_fd,
 #define KEXEC_FILE_UNLOAD  0x0001
 #define KEXEC_FILE_ON_CRASH0x0002
 #define KEXEC_FILE_NO_INITRAMFS0x0004
+#define KEXEC_FILE_DEBUG   0x0008
 
 /* These values match the ELF architecture values. 
  * Unless there is a good reason that should continue to be the case.
diff --git a/kexec/kexec.c b/kexec/kexec.c
index 08edfca2349a..b5393e3b20aa 100644
--- a/kexec/kexec.c
+++ b/kexec/kexec.c
@@ -1477,6 +1477,7 @@ int main(int argc, char *argv[])
return 0;
case OPT_DEBUG:
kexec_debug = 1;
+   kexec_file_flags |= KEXEC_FILE_DEBUG;
break;
case OPT_NOIFDOWN:
skip_ifdown = 1;
-- 
2.41.0


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH] x86/kexec: set MIN_KERNEL_LOAD_ADDR to 0x01000000

2023-11-14 Thread Baoquan He
Hi John,

On 10/23/23 at 02:54pm, John Sperbeck wrote:
> On Sun, Oct 22, 2023 at 7:42 PM H. Peter Anvin  wrote:
..
> > >---
> > > arch/x86/kernel/kexec-bzimage64.c | 2 +-
> > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > >diff --git a/arch/x86/kernel/kexec-bzimage64.c 
> > >b/arch/x86/kernel/kexec-bzimage64.c
> > >index a61c12c01270..d6bf6c13dab1 100644
> > >--- a/arch/x86/kernel/kexec-bzimage64.c
> > >+++ b/arch/x86/kernel/kexec-bzimage64.c
> > >@@ -36,7 +36,7 @@
> > >  */
> > > #define MIN_PURGATORY_ADDR0x3000
> > > #define MIN_BOOTPARAM_ADDR0x3000
> > >-#define MIN_KERNEL_LOAD_ADDR  0x10
> > >+#define MIN_KERNEL_LOAD_ADDR  0x100
> > > #define MIN_INITRD_LOAD_ADDR  0x100
> > >
> > > /*
> >
> > This doesn't make any sense to me. There is already a high water mark for 
> > his much memory the kernel needs until an initrd or setup_data item can 
> > appear. This is just a hack, please fix it properly.
> 
> The startup_64() code in head_64.S changes behavior based on whether
> it's running below or above LOAD_PHYSICAL_ADDR:
> 
> #ifdef CONFIG_RELOCATABLE
> leaqstartup_32(%rip) /* - $startup_32 */, %rbp
> movlBP_kernel_alignment(%rsi), %eax
> decl%eax
> addq%rax, %rbp
> notq%rax
> andq%rax, %rbp
> cmpq$LOAD_PHYSICAL_ADDR, %rbp
> jae 1f
> #endif
> movq$LOAD_PHYSICAL_ADDR, %rbp
> 1:
> 
> In my example, we were running from address 0x0040.  The %rbp
> register will start with 0x0040, but will be changed to 0x0100
> after the check against LOAD_PHYSICAL_ADDR fails.
> 
> The 0x0100 value in %rbp is passed to extract_kernel as the
> 'output' argument.  Unless choose_random_location() decides
> differently, this will be where the kernel is decompressed to.  The
> size of the kernel is large enough in my example that the
> decompression overruns the initrd.
> 
> If the startup_64() code didn't have the LOAD_PHYSICAL_ADDR check and
> used %rpb as is, then there would be no issue.  The decompression
> would have been to 0x0040 and would have completed before reaching
> the initrd memory.
> 
> That is, the kexec code is being careful to ensure that the kernel and
> initrd memory doesn't overlap, but isn't paying attention to what
> happens if the kernel memory is below LOAD_PHYSICAL_ADDR (the kernel
> address is effectively changed to a different location).  My proposed
> change makes it aware, and avoids such addresses.

Wondering why kexec-ed kernel is located under 0x100. The loading
code will search physical memory regions bottom up for an available one.
Usually, kexec kernel will be loaded above 16M.

I have posted a patchset to load kernel at top of system RAM for kexec_file
load just as kexec_load has been doing. Do you think it's helpful?

[PATCH 0/2] kexec_file: Load kernel at top of system RAM if required
https://lore.kernel.org/all/20231114091658.228030-1-...@redhat.com/T/#u

Thanks
Baoquan


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH 2/2] kexec_file: Load kernel at top of system RAM if required

2023-11-14 Thread Baoquan He
Kexec_load interface has been doing top down searching and loading
kernel/initrd/purgtory etc to prepare for kexec reboot. In that way,
the benefits are that it avoids to consume and fragment limited low
memory which satisfy DMA buffer allocation and big chunk of continuous
memory during system init; and avoids to stir with BIOS/FW reserved
or occupied areas, or corner case handling/work around/quirk occupied
areas when doing system init. By the way, the top-down searching and
loading of kexec-ed kernel is done in user space utility code.

For kexec_file loading, even if kexec_buf.top_down is 'true', it's simply
ignored. It calls walk_system_ram_res() directly to go through all
resources of System RAM bottom up, to find an available memory region,
then call locate_mem_hole_callback() to allocate memory in that found
memory region from top to down. This is not expected and inconsistent
with kexec_load.

Here check if kexec_buf.top_down is 'true' in kexec_walk_resources(),
if yes, call the newly added walk_system_ram_res_rev() to find memory
region of system RAM from top to down to load kernel/initrd etc.

Signed-off-by: Baoquan He 
---
 kernel/kexec_file.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index f9a419cd22d4..ba3ef30921b8 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -592,6 +592,8 @@ static int kexec_walk_resources(struct kexec_buf *kbuf,
   IORESOURCE_SYSTEM_RAM | 
IORESOURCE_BUSY,
   crashk_res.start, crashk_res.end,
   kbuf, func);
+   else if (kbuf->top_down)
+   return walk_system_ram_res_rev(0, ULONG_MAX, kbuf, func);
else
return walk_system_ram_res(0, ULONG_MAX, kbuf, func);
 }
-- 
2.41.0


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH 0/2] kexec_file: Load kernel at top of system RAM if required

2023-11-14 Thread Baoquan He
Justification:
==
Kexec_load interface has been doing top down searching and loading
kernel/initrd/purgtory etc to prepare for kexec reboot. In that way,
the benefits are that it avoids to consume and fragment limited low
memory which satisfy DMA buffer allocation and big chunk of continuous
memory during system init; and avoids to stir with BIOS/FW reserved
or occupied areas, or corner case handling/work around/quirk occupied
areas when doing system init. By the way, the top-down searching and
loading of kexec-ed kernel is done in user space utility code.

For kexec_file loading, even if kexec_buf.top_down is 'true', it's simply
ignored. It calls walk_system_ram_res() directly to go through all
resources of System RAM bottom up, to find an available memory region,
then call locate_mem_hole_callback() to allocate memory in that found
memory region from top to down. This is not expected and inconsistent
with kexec_load.

Implementation
===
In patch 1, introduce a new function walk_system_ram_res_rev() which is
a variant of walk_system_ram_res(), it walks through a list of all the
resources of System RAM in reversed order, i.e., from higher to lower.

In patch 2, check if kexec_buf.top_down is 'true' in kexec_walk_resources(),
if yes, call walk_system_ram_res_rev() to find memory region of system RAM
from top to down to load kernel/initrd etc.

Background information:
===
And I ever tried this in the past in a different way, please see below
link. In the post, I tried to adjust struct sibling linking code,
replace the the singly linked list with list_head so that
walk_system_ram_res_rev() can be implemented in a much easier way.
Finally I failed.
https://lore.kernel.org/all/20180718024944.577-4-...@redhat.com/

This time, I picked up the patch from AKASHI Takahiro's old post and
made some change to take as the current patch 1:
https://lists.infradead.org/pipermail/linux-arm-kernel/2017-September/531456.html

Testing:

Only tried on x86_64

Baoquan He (2):
  resource: add walk_system_ram_res_rev()
  kexec_file: Load kernel at top of system RAM if required

 include/linux/ioport.h |  3 +++
 kernel/kexec_file.c|  2 ++
 kernel/resource.c  | 61 ++
 3 files changed, 66 insertions(+)

-- 
2.41.0


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH 1/2] resource: add walk_system_ram_res_rev()

2023-11-14 Thread Baoquan He
This function, being a variant of walk_system_ram_res() introduced in
commit 8c86e70acead ("resource: provide new functions to walk through
resources"), walks through a list of all the resources of System RAM
in reversed order, i.e., from higher to lower.

It will be used in kexec_file code to load kernel, initrd etc when
preparing kexec reboot.

Signed-off-by: AKASHI Takahiro 
Signed-off-by: Baoquan He 
---
 include/linux/ioport.h |  3 +++
 kernel/resource.c  | 61 ++
 2 files changed, 64 insertions(+)

diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index 14f5cfabbbc8..db7fe25f3370 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -331,6 +331,9 @@ extern int
 walk_system_ram_res(u64 start, u64 end, void *arg,
int (*func)(struct resource *, void *));
 extern int
+walk_system_ram_res_rev(u64 start, u64 end, void *arg,
+   int (*func)(struct resource *, void *));
+extern int
 walk_iomem_res_desc(unsigned long desc, unsigned long flags, u64 start, u64 
end,
void *arg, int (*func)(struct resource *, void *));
 
diff --git a/kernel/resource.c b/kernel/resource.c
index 866ef3663a0b..12bce44a2c08 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -27,6 +27,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 
 
@@ -429,6 +431,65 @@ int walk_system_ram_res(u64 start, u64 end, void *arg,
 func);
 }
 
+/*
+ * This function, being a variant of walk_system_ram_res(), calls the @func
+ * callback against all memory ranges of type System RAM which are marked as
+ * IORESOURCE_SYSTEM_RAM and IORESOUCE_BUSY in reversed order, i.e., from
+ * higher to lower.
+ */
+int walk_system_ram_res_rev(u64 start, u64 end, void *arg,
+   int (*func)(struct resource *, void *))
+{
+   struct resource res, *rams;
+   int rams_size = 16, i;
+   unsigned long flags;
+   int ret = -1;
+
+   /* create a list */
+   rams = kvcalloc(rams_size, sizeof(struct resource), GFP_KERNEL);
+   if (!rams)
+   return ret;
+
+   flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
+   i = 0;
+   while ((start < end) &&
+   (!find_next_iomem_res(start, end, flags, IORES_DESC_NONE, 
))) {
+   if (i >= rams_size) {
+   /* re-alloc */
+   struct resource *rams_new;
+   int rams_new_size;
+
+   rams_new_size = rams_size + 16;
+   rams_new = kvcalloc(rams_new_size, sizeof(struct 
resource),
+   GFP_KERNEL);
+   if (!rams_new)
+   goto out;
+
+   memcpy(rams_new, rams,
+   sizeof(struct resource) * rams_size);
+   kvfree(rams);
+   rams = rams_new;
+   rams_size = rams_new_size;
+   }
+
+   rams[i].start = res.start;
+   rams[i++].end = res.end;
+
+   start = res.end + 1;
+   }
+
+   /* go reverse */
+   for (i--; i >= 0; i--) {
+   ret = (*func)([i], arg);
+   if (ret)
+   break;
+   }
+
+out:
+   kvfree(rams);
+   return ret;
+}
+
 /*
  * This function calls the @func callback against all memory ranges, which
  * are ranges marked as IORESOURCE_MEM and IORESOUCE_BUSY.
-- 
2.41.0


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec