Re: [PATCH 1/7] kexec_file: add kexec_file flag to control debug printing
On 11/14/23 at 08:03am, Joe Perches wrote: > On Tue, 2023-11-14 at 23:32 +0800, Baoquan He wrote: > > When specifying 'kexec -c -d', kexec_load interface will print loading > > information, e.g the regions where kernel/initrd/purgatory/cmdline > > are put, the memmap passed to 2nd kernel taken as system RAM ranges, > > and printing all contents of struct kexec_segment, etc. These are > > very helpful for analyzing or positioning what's happening when > > kexec/kdump itself failed. The debugging printing for kexec_load > > interface is made in user space utility kexec-tools. > > > > Whereas, with kexec_file_load interface, 'kexec -s -d' print nothing. > > Because kexec_file code is mostly implemented in kernel space, and the > > debugging printing functionality is missed. It's not convenient when > > debugging kexec/kdump loading and jumping with kexec_file_load > > interface. > > > > Now add KEXEC_FILE_DEBUG to kexec_file flag to control the debugging > > message printing. And add global variable kexec_file_dbg_print and > > macro kexec_dprintk() to facilitate the printing. > > > > This is a preparation, later kexec_dprintk() will be used to replace the > > existing pr_debug(). Once 'kexec -s -d' is specified, it will print out > > kexec/kdump loading information. If '-d' is not specified, it regresses > > to pr_debug(). > > Not quite as pr_debug is completely eliminated with > zero object size when DEBUG is not #defined. > > Now the object size will be larger and contain the > formats in .text. Ah, I didn't realize that. Thanks for telling. I didn't take pr_info() and pr_debug because I want to avoid printing the pr_fmt() string in each file. > > [] > > diff --git a/include/linux/kexec.h b/include/linux/kexec.h > [] > > @@ -264,6 +264,18 @@ arch_kexec_apply_relocations(struct purgatory_info > > *pi, Elf_Shdr *section, > > return -ENOEXEC; > > } > > #endif > > + > > +extern bool kexec_file_dbg_print; > > + > > +#define kexec_dprintk(fmt, args...)\ > > + do {\ > > + if (kexec_file_dbg_print) \ > > + printk(KERN_INFO fmt, ##args); \ > > + else\ > > + printk(KERN_DEBUG fmt, ##args); \ > > + } while (0) > > + > > + > > I don't know how many of these printks exist and if > overall object size matters but using Not too much because they are spread in different arch. > > #define kexec_dprintkfmt, ...)\ > printk("%s" fmt,\ > kexec_file_dbg_print ? KERN_INFO : KERN_DEBUG, \ > ##__VA_ARGS__) > > should reduce overall object size by eliminating the > mostly duplicated format in .text which differs only > by the KERN_ Sure, the new one looks great to me, I will update code to take it. Thanks a lot for your great suggestion. Thanks Baoquan ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH 1/2] resource: add walk_system_ram_res_rev()
On 11/14/23 at 03:17pm, Andrew Morton wrote: > On Tue, 14 Nov 2023 17:16:57 +0800 Baoquan He wrote: > > > This function, being a variant of walk_system_ram_res() introduced in > > commit 8c86e70acead ("resource: provide new functions to walk through > > resources"), walks through a list of all the resources of System RAM > > in reversed order, i.e., from higher to lower. > > > > It will be used in kexec_file code to load kernel, initrd etc when > > preparing kexec reboot. > > > > ... > > > > +/* > > + * This function, being a variant of walk_system_ram_res(), calls the @func > > + * callback against all memory ranges of type System RAM which are marked > > as > > + * IORESOURCE_SYSTEM_RAM and IORESOUCE_BUSY in reversed order, i.e., from > > + * higher to lower. > > + */ > > +int walk_system_ram_res_rev(u64 start, u64 end, void *arg, > > + int (*func)(struct resource *, void *)) > > +{ > > + struct resource res, *rams; > > + int rams_size = 16, i; > > + unsigned long flags; > > + int ret = -1; > > + > > + /* create a list */ > > + rams = kvcalloc(rams_size, sizeof(struct resource), GFP_KERNEL); > > + if (!rams) > > + return ret; > > + > > + flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY; > > + i = 0; > > + while ((start < end) && > > + (!find_next_iomem_res(start, end, flags, IORES_DESC_NONE, > > ))) { > > + if (i >= rams_size) { > > + /* re-alloc */ > > + struct resource *rams_new; > > + int rams_new_size; > > + > > + rams_new_size = rams_size + 16; > > + rams_new = kvcalloc(rams_new_size, sizeof(struct > > resource), > > + GFP_KERNEL); > > kvrealloc()? Exactly. Will udpate. Thanks for the great suggestion. > > > + if (!rams_new) > > + goto out; > > + > > + memcpy(rams_new, rams, > > + sizeof(struct resource) * rams_size); > > + kvfree(rams); > > + rams = rams_new; > > + rams_size = rams_new_size; > > + } > > + > > + rams[i].start = res.start; > > + rams[i++].end = res.end; > > + > > + start = res.end + 1; > > + } > > + > > + /* go reverse */ > > + for (i--; i >= 0; i--) { > > + ret = (*func)([i], arg); > > + if (ret) > > + break; > > + } > > + > > +out: > > + kvfree(rams); > > + return ret; > > +} > > + > > /* > > * This function calls the @func callback against all memory ranges, which > > * are ranges marked as IORESOURCE_MEM and IORESOUCE_BUSY. > > -- > > 2.41.0 > ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v2 0/7] ima: kexec: measure events between kexec load and execute
On 10/27/23 08:18, Mimi Zohar wrote: On Thu, 2023-10-05 at 11:25 -0700, Tushar Sugandhi wrote: The current Kernel behavior is IMA measurements snapshot is taken at kexec 'load' and not at kexec 'execute'. IMA log is then carried over to the new Kernel after kexec 'execute'. Some systems can be configured to call kexec 'load' first, and followed by kexec 'execute' after some time. (as opposed to calling 'load' and 'execute' in one single kexec command). Additional measurements may be introduced by the kexec load itself. Saving the measurement list as close as possible to the reboot is beneficial, whether or not the kexec load and kexec execute are executed separately. True. What I am trying to say here is the longer the window between 'load' and 'execute', greater are the chances of measurements getting added. But as long as a single measurement gets added between 'load' and 'execute', it will break the attestation after kexec soft-reboot. So maybe the above line in the patch description is not necessary. I will remove. In such scenario, if new IMA measurements are added between kexec 'load' and kexec 'execute', the TPM PCRs are extended with the IMA events between 'load' and 'execute'. But those IMA events are not carried over to the new Kernel after kexec soft reboot. This results in mismatch between TPM PCR quotes, and the actual IMA measurements list, after the system boots into the new kexec image. This mismatch results in the remote attestation failing for that system. This patch series proposes a solution to solve this problem by allocating the necessary buffer at kexec 'load' time, and populating the buffer with the IMA measurements at kexec 'execute' time. How about beginning the paragraph with "To solve this problem allocate ... and populate ..." Sure. Will do. ~Tushar
Re: [PATCH 1/2] resource: add walk_system_ram_res_rev()
On Tue, 14 Nov 2023 17:16:57 +0800 Baoquan He wrote: > This function, being a variant of walk_system_ram_res() introduced in > commit 8c86e70acead ("resource: provide new functions to walk through > resources"), walks through a list of all the resources of System RAM > in reversed order, i.e., from higher to lower. > > It will be used in kexec_file code to load kernel, initrd etc when > preparing kexec reboot. > > ... > > +/* > + * This function, being a variant of walk_system_ram_res(), calls the @func > + * callback against all memory ranges of type System RAM which are marked as > + * IORESOURCE_SYSTEM_RAM and IORESOUCE_BUSY in reversed order, i.e., from > + * higher to lower. > + */ > +int walk_system_ram_res_rev(u64 start, u64 end, void *arg, > + int (*func)(struct resource *, void *)) > +{ > + struct resource res, *rams; > + int rams_size = 16, i; > + unsigned long flags; > + int ret = -1; > + > + /* create a list */ > + rams = kvcalloc(rams_size, sizeof(struct resource), GFP_KERNEL); > + if (!rams) > + return ret; > + > + flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY; > + i = 0; > + while ((start < end) && > + (!find_next_iomem_res(start, end, flags, IORES_DESC_NONE, > ))) { > + if (i >= rams_size) { > + /* re-alloc */ > + struct resource *rams_new; > + int rams_new_size; > + > + rams_new_size = rams_size + 16; > + rams_new = kvcalloc(rams_new_size, sizeof(struct > resource), > + GFP_KERNEL); kvrealloc()? > + if (!rams_new) > + goto out; > + > + memcpy(rams_new, rams, > + sizeof(struct resource) * rams_size); > + kvfree(rams); > + rams = rams_new; > + rams_size = rams_new_size; > + } > + > + rams[i].start = res.start; > + rams[i++].end = res.end; > + > + start = res.end + 1; > + } > + > + /* go reverse */ > + for (i--; i >= 0; i--) { > + ret = (*func)([i], arg); > + if (ret) > + break; > + } > + > +out: > + kvfree(rams); > + return ret; > +} > + > /* > * This function calls the @func callback against all memory ranges, which > * are ranges marked as IORESOURCE_MEM and IORESOUCE_BUSY. > -- > 2.41.0 ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v2 7/7] ima: record log size at kexec load and execute
On 10/27/23 07:56, Mimi Zohar wrote: Hi Tushar, On Thu, 2023-10-05 at 11:26 -0700, Tushar Sugandhi wrote: The window between kexec 'load' and 'execute' could be arbitrarily long. Even with the large chunk of memory allocated at kexec 'load', it may run out which would result in missing events in IMA log after the system soft reboots to the new Kernel. This would result in IMA measurements getting out of sync with the TPM PCR quotes which would result in remote attestation failing for that system. There is currently no way for the new Kernel to know if the IMA log TPM PCR quote out of sync problem is because of the missing measurements during kexec. Define two new IMA events, 'kexec_load' and 'kexec_execute', to be measured at kexec 'load' and 'execute' respectively. IMA measures 'boot_aggregate' as the first event when the system boots - either cold boot or kexec soft boot. In case when the system goes through multiple soft reboots, the number of 'boot_aggregate' events in IMA log corresponds to total number of boots (cold boot plus multiple kexec soft reboots). With this change, there would be additional 'kexec_load' and 'kexec_execute' events in between the two 'boot_aggregate' events. In rare cases, when the system runs out of memory during kexec soft reboot, 'kexec_execute' won't be copied since its one of the very last event measured just before kexec soft reboot. The absence of the event 'kexec_execute' in between the two boot_aggregate' events would signal the attestation service that the IMA log on the system is out of sync with TPM PCR quotes and the system needs to be cold booted for the remote attestation to succeed again. Signed-off-by: Tushar Sugandhi Adding a new type of critical data is fine. The patch description should describe the reason for it. Please update the Subject line and the patch description accordingly. Thanks. I described the reasons in the code comment below, because I thought it is important enough to stay with the code to give context. But I can move it from the code comment to a patch description. --- security/integrity/ima/ima_kexec.c | 35 +- 1 file changed, 34 insertions(+), 1 deletion(-) diff --git a/security/integrity/ima/ima_kexec.c b/security/integrity/ima/ima_kexec.c index 6cd5f46a7208..0f9c424fe808 100644 --- a/security/integrity/ima/ima_kexec.c +++ b/security/integrity/ima/ima_kexec.c @@ -17,6 +17,8 @@ #include "ima.h" #ifdef CONFIG_IMA_KEXEC +#define IMA_KEXEC_EVENT_LEN 128 + struct seq_file ima_kexec_file; struct ima_kexec_hdr ima_khdr; static void *ima_kexec_buffer; @@ -34,6 +36,8 @@ void ima_clear_kexec_file(void) static int ima_alloc_kexec_buf(size_t kexec_segment_size) { + char ima_kexec_event[IMA_KEXEC_EVENT_LEN]; + if ((kexec_segment_size == 0) || (kexec_segment_size == ULONG_MAX) || ((kexec_segment_size >> PAGE_SHIFT) > totalram_pages() / 2)) { @@ -64,6 +68,12 @@ static int ima_alloc_kexec_buf(size_t kexec_segment_size) memset(_khdr, 0, sizeof(ima_khdr)); ima_khdr.version = 1; + scnprintf(ima_kexec_event, IMA_KEXEC_EVENT_LEN, + "kexec_segment_size=%lu;", kexec_segment_size); + + ima_measure_critical_data("ima_kexec", "kexec_load", ima_kexec_event, + strlen(ima_kexec_event), false, NULL, 0); + return 0; } @@ -198,6 +208,7 @@ void ima_add_kexec_buffer(struct kimage *image) static int ima_update_kexec_buffer(struct notifier_block *self, unsigned long action, void *data) { + char ima_kexec_event[IMA_KEXEC_EVENT_LEN]; void *buf = NULL; size_t buf_size; bool resume = false; @@ -213,9 +224,31 @@ static int ima_update_kexec_buffer(struct notifier_block *self, return NOTIFY_OK; } + buf_size = ima_get_binary_runtime_size(); + scnprintf(ima_kexec_event, IMA_KEXEC_EVENT_LEN, + "kexec_segment_size=%lu;ima_binary_runtime_size=%lu;", + kexec_segment_size, buf_size); + + /* +* This is one of the very last events measured by IMA before kexec +* soft rebooting into the new Kernal. +* This event can be used as a marker after the system soft reboots +* to the new Kernel to check if there was sufficient memory allocated +* at kexec 'load' to capture the events measured between the window +* of kexec 'load' and 'execute'. +* This event needs to be present in the IMA log, in between the two +* 'boot_aggregate' events that are logged for the previous boot and +* the current soft reboot. If it is not present after the system soft +* reboots into the new Kernel, it would mean the IMA log is not +* consistent with the TPM PCR quotes, and the system needs to be +* cold-booted for the attestation to succeed again. +*/ Comments like this
Re: [PATCH v2 2/7] ima: move ima_dump_measurement_list call from kexec load to execute
On 10/27/23 06:08, Mimi Zohar wrote: Hi Tushar, On Thu, 2023-10-05 at 11:25 -0700, Tushar Sugandhi wrote: In the current IMA implementation, ima_dump_measurement_list() is called during the kexec 'load' operation. This can result in loss of IMA measurements taken between the 'load' and 'execute' phases when the system goes through Kexec soft reboot to a new Kernel. The call to the function ima_dump_measurement_list() needs to be moved out of the function ima_add_kexec_buffer() and needs to be called during the kexec 'execute' operation. Implement a function ima_update_kexec_buffer() that is called during kexec 'execute', allowing IMA to update the measurement list with the events between kexec 'load' and 'execute'. Move the ima_dump_measurement_list() call from ima_add_kexec_buffer() to ima_update_kexec_buffer(). Make ima_kexec_buffer and kexec_segment_size variables global, so that they can be accessed during both kexec 'load' and 'execute'. Add functions ima_measurements_suspend() and ima_measurements_resume() to set and reset the 'suspend_ima_measurements' variable respectively, to suspend/resume IMA measurements. Use the existing 'ima_extend_list_mutex' to ensure that the operations are thread-safe. These function calls will help maintaining the integrity of the IMA log while it is being copied to the new Kernel's buffer. Add a reboot notifier_block 'update_buffer_nb' to ensure the function ima_update_kexec_buffer() gets called during kexec soft-reboot. Signed-off-by: Tushar Sugandhi The lengthiness and complexity of the patch description is an indication that the patch needs to be broken up. Please refer to Documentation/process/submitting-patches.rst for further info. In addition, this patch moves the function ima_dump_measurement_list() to a new function named ima_update_kexec_buffer(), which is never called. The patch set is thus not bisect safe. > [...] +void ima_measurements_suspend(void) +{ + mutex_lock(_extend_list_mutex); + atomic_set(_ima_measurements, 1); + mutex_unlock(_extend_list_mutex); +} + +void ima_measurements_resume(void) +{ + mutex_lock(_extend_list_mutex); + atomic_set(_ima_measurements, 0); + mutex_unlock(_extend_list_mutex); +} These function are being defined and called here, but are not enforced until a later patch. It would make more sense to introduce and enforce these functions in a single patch with an explanation as to why suspend/resume is needed. The cover letter describes the problem as "... new IMA measurements are added between kexec 'load' and kexec 'execute'".Please include in the patch description the reason for needing suspend/resume, since saving the measurement records is done during kexec execute. Since I am introducing a few new functions in this patch set, I am having hard time keeping the patches bisect safe and at the same time managing the length/complexity of the individual patches in the series. I will go back and revisit the bisect-safeness of the patches in this series again. Thanks for the feedback, appreciate it.
Re: [PATCH v2 1/7] ima: refactor ima_dump_measurement_list to move memory allocation to a separate function
On 10/26/23 20:25, Mimi Zohar wrote: On Thu, 2023-10-26 at 16:16 -0400, Mimi Zohar wrote: Hi Tushar, According to Documentation/process/submitting-patches.rst, the subject line should be between 70-75 characters. Perhaps something like "ima: define and call ima_alloc_kexec_buffer()". On Thu, 2023-10-05 at 11:25 -0700, Tushar Sugandhi wrote: IMA allocates memory and dumps the measurement during kexec soft reboot as a single function call ima_dump_measurement_list(). It gets called during kexec 'load' operation. It results in the IMA measurements between the window of kexec 'load' and 'execute' getting dropped when the system boots into the new Kernel. One of the kexec requirements is the segment size cannot change between the 'load' and the 'execute'. Therefore, to address this problem, ima_dump_measurement_list() needs to be refactored to allocate the memory at kexec 'load', and dump the measurements at kexec 'execute'. The function that allocates the memory should handle the scenario where the kexec load is called multiple times The above pragraph is unnecessary. Refactor ima_dump_measurement_list() to move the memory allocation part to a separate function ima_alloc_kexec_buf() to allocate buffer of size 'kexec_segment_size' at kexec 'load'. Make the local variables in function ima_dump_measurement_list() global, so that they can be accessed from ima_alloc_kexec_buf(). Make necessary changes to the function ima_add_kexec_buffer() to call the above two functions. Fix the wording based on the suggested changes below. Signed-off-by: Tushar Sugandhi - Before re-posting this patch set, verify there aren't any "checkpatch.pl --strict" issues. - After applying each patch, compile the kernel and verify it still works. Doing this will detect whether or not the patch set is bisect safe. I usually just do checkpatch.pl <.patch file>. I didn't know about --strict and it's benefits. Will do it going forward. --- security/integrity/ima/ima_kexec.c | 126 + 1 file changed, 93 insertions(+), 33 deletions(-) diff --git a/security/integrity/ima/ima_kexec.c b/security/integrity/ima/ima_kexec.c index 419dc405c831..307e07991865 100644 --- a/security/integrity/ima/ima_kexec.c +++ b/security/integrity/ima/ima_kexec.c @@ -15,61 +15,114 @@ #include "ima.h" #ifdef CONFIG_IMA_KEXEC +struct seq_file ima_kexec_file; Define "ima_kexec_file" as static since it only used in this file. Since the variable does not need to be global, is there still a reason for changing its name? Minimize code change. Adding "static" would make ima_kexec_file a global static variable. Please ignore my comment about reverting the variable name change. Mimi Sure :) ~Tushar ...
Re: [PATCH v2 1/7] ima: refactor ima_dump_measurement_list to move memory allocation to a separate function
Thanks a lot for reviewing this patch set Mimi. On 10/26/23 13:16, Mimi Zohar wrote: Hi Tushar, According to Documentation/process/submitting-patches.rst, the subject line should be between 70-75 characters. Perhaps something like "ima: define and call ima_alloc_kexec_buffer()". Sure thing. I will shorten the subject line. Here and elsewhere. On Thu, 2023-10-05 at 11:25 -0700, Tushar Sugandhi wrote: IMA allocates memory and dumps the measurement during kexec soft reboot as a single function call ima_dump_measurement_list(). It gets called during kexec 'load' operation. It results in the IMA measurements between the window of kexec 'load' and 'execute' getting dropped when the system boots into the new Kernel. One of the kexec requirements is the segment size cannot change between the 'load' and the 'execute'. Therefore, to address this problem, ima_dump_measurement_list() needs to be refactored to allocate the memory at kexec 'load', and dump the measurements at kexec 'execute'. The function that allocates the memory should handle the scenario where the kexec load is called multiple times The above pragraph is unnecessary. Sounds good. I will remove the above paragraph. Refactor ima_dump_measurement_list() to move the memory allocation part to a separate function ima_alloc_kexec_buf() to allocate buffer of size 'kexec_segment_size' at kexec 'load'. Make the local variables in function ima_dump_measurement_list() global, so that they can be accessed from ima_alloc_kexec_buf(). Make necessary changes to the function ima_add_kexec_buffer() to call the above two functions. Fix the wording based on the suggested changes below. Will do. Signed-off-by: Tushar Sugandhi Before re-posting this patch set, verify there aren't any "checkpatch.pl --strict" issues. After applying each patch, compile the kernel and verify it still works. >> --- security/integrity/ima/ima_kexec.c | 126 + 1 file changed, 93 insertions(+), 33 deletions(-) diff --git a/security/integrity/ima/ima_kexec.c b/security/integrity/ima/ima_kexec.c index 419dc405c831..307e07991865 100644 --- a/security/integrity/ima/ima_kexec.c +++ b/security/integrity/ima/ima_kexec.c @@ -15,61 +15,114 @@ #include "ima.h" #ifdef CONFIG_IMA_KEXEC +struct seq_file ima_kexec_file; Define "ima_kexec_file" as static since it only used in this file. Since the variable does not need to be global, is there still a reason for changing its name? Minimize code change. +struct ima_kexec_hdr ima_khdr; + +void ima_clear_kexec_file(void) The opposite of "set" is "clear" (e.g. set_git, clear_bit). The opposite of "alloc" would be "free" (e.g. ima_alloc_kexec_buf, ima_free_kexec_buf) Agreed. Will make it ima_free_kexec_buf. +{ + vfree(ima_kexec_file.buf); + ima_kexec_file.buf = NULL; + ima_kexec_file.size = 0; + ima_kexec_file.read_pos = 0; + ima_kexec_file.count = 0; +} + +static int ima_alloc_kexec_buf(size_t kexec_segment_size) +{ + if ((kexec_segment_size == 0) || + (kexec_segment_size == ULONG_MAX) || + ((kexec_segment_size >> PAGE_SHIFT) > totalram_pages() / 2)) { + pr_err("%s: Invalid segment size for kexec: %zu\n", + __func__, kexec_segment_size); + return -EINVAL; + } Please avoid code duplication. If moving the test here, then remove it from its original place. Sure. Will do. + + /* +* If kexec load was called before, clear the existing buffer +* before allocating a new one +*/ + if (ima_kexec_file.buf) + ima_clear_kexec_file(); Perhaps instead of always freeing the buffer, check and see whether the size has changed. If it hasn't changed, then simply return. If it has changed, perhaps use realloc(). Possible wording: "Kexec may be called multiple times. Free and re-alloc the buffer if the size changed." That's a good optimization. Thanks for the suggestion. Will do. + + /* segment size can't change between kexec load and execute */ + ima_kexec_file.buf = vmalloc(kexec_segment_size); + if (!ima_kexec_file.buf) { + pr_err("%s: No memory for ima kexec measurement buffer\n", + __func__); + return -ENOMEM; + } + + ima_kexec_file.size = kexec_segment_size; + ima_kexec_file.read_pos = 0; + ima_kexec_file.count = sizeof(ima_khdr);/* reserved space */ + + memset(_khdr, 0, sizeof(ima_khdr)); + ima_khdr.version = 1; + + return 0; +} + static int ima_dump_measurement_list(unsigned long *buffer_size, void **buffer, unsigned long segment_size) { struct ima_queue_entry *qe; - struct seq_file file; - struct ima_kexec_hdr khdr; int ret = 0; - /* segment size can't change between kexec load and execute */ - file.buf =
Re: [RFC V2] IMA Log Snapshotting Design Proposal
On 11/14/23 13:36, Sush Shringarputale wrote: On 11/13/2023 10:59 AM, Stefan Berger wrote: On 10/19/23 14:49, Tushar Sugandhi wrote: === | Introduction | === This document provides a detailed overview of the proposed Kernel feature IMA log snapshotting. It describes the motivation behind the proposal, the problem to be solved, a detailed solution design with examples, and describes the changes to be made in the clients/services which are part of remote-attestation system. This is the 2nd version of the proposal. The first version is present here[1]. Table of Contents: -- A. Motivation and Background B. Goals and Non-Goals B.1 Goals B.2 Non-Goals C. Proposed Solution C.1 Solution Summary C.2 High-level Work-flow D. Detailed Design D.1 Snapshot Aggregate Event D.2 Snapshot Triggering Mechanism D.3 Choosing A Persistent Storage Location For Snapshots D.4 Remote-Attestation Client/Service-side Changes D.4.a Client-side Changes D.4.b Service-side Changes E. Example Walk-through F. Other Design Considerations G. References Userspace applications will have to know a) where are the shard files? We describe the file storage location choices in section D.3, but user applications will have to query the well-known location described there. b) how do I read the shard files while locking out the producer of the shard files? IMO, this will require a well known config file and a locking method (flock) so that user space applications can work together in this new environment. The lock could be defined in the config file or just be the config file itself. The flock is a good idea for co-ordination between UM clients. While the Kernel cannot enforce any access in this way, any UM process that is planning on triggering the snapshot mechanism should follow that protocol. We will ensure we document that as the best-practices in the patch series. It's more than 'best practices'. You need a well-known config file with well-known config options in it. All clients that were previously just trying to read new bytes from the IMA log cannot do this anymore in the presence of a log shard producer but have to also learn that a new log shard has been produced so they need to figure out the new position in the log where to read from. So maybe a counter in a config file should indicate to the log readers that a new log has been produced -- otherwise they would have to monitor all the log shard files or the log shard file's size. Iff the log-shard producer were configured to discard leading parts of the log then that should also be noted in a config file so clients, that need to see the beginning of the log, can refuse early on to work on a machine that either is configured this way or where the discarding has already happened. Stefan - Sush
Re: [RFC V2] IMA Log Snapshotting Design Proposal
On 11/13/2023 10:59 AM, Stefan Berger wrote: On 10/19/23 14:49, Tushar Sugandhi wrote: === | Introduction | === This document provides a detailed overview of the proposed Kernel feature IMA log snapshotting. It describes the motivation behind the proposal, the problem to be solved, a detailed solution design with examples, and describes the changes to be made in the clients/services which are part of remote-attestation system. This is the 2nd version of the proposal. The first version is present here[1]. Table of Contents: -- A. Motivation and Background B. Goals and Non-Goals B.1 Goals B.2 Non-Goals C. Proposed Solution C.1 Solution Summary C.2 High-level Work-flow D. Detailed Design D.1 Snapshot Aggregate Event D.2 Snapshot Triggering Mechanism D.3 Choosing A Persistent Storage Location For Snapshots D.4 Remote-Attestation Client/Service-side Changes D.4.a Client-side Changes D.4.b Service-side Changes E. Example Walk-through F. Other Design Considerations G. References Userspace applications will have to know a) where are the shard files? We describe the file storage location choices in section D.3, but user applications will have to query the well-known location described there. b) how do I read the shard files while locking out the producer of the shard files? IMO, this will require a well known config file and a locking method (flock) so that user space applications can work together in this new environment. The lock could be defined in the config file or just be the config file itself. The flock is a good idea for co-ordination between UM clients. While the Kernel cannot enforce any access in this way, any UM process that is planning on triggering the snapshot mechanism should follow that protocol. We will ensure we document that as the best-practices in the patch series. - Sush
[PATCH] kexec: Use atomic_try_cmpxchg in crash_kexec
Use atomic_try_cmpxchg instead of cmpxchg (*ptr, old, new) == old in crash_kexec(). x86 CMPXCHG instruction returns success in ZF flag, so this change saves a compare after cmpxchg. No functional change intended. Cc: Eric Biederman Signed-off-by: Uros Bizjak --- kernel/kexec_core.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c index be5642a4ec49..bc4c096ab1f3 100644 --- a/kernel/kexec_core.c +++ b/kernel/kexec_core.c @@ -1063,9 +1063,10 @@ __bpf_kfunc void crash_kexec(struct pt_regs *regs) * panic(). Otherwise parallel calls of panic() and crash_kexec() * may stop each other. To exclude them, we use panic_cpu here too. */ + old_cpu = PANIC_CPU_INVALID; this_cpu = raw_smp_processor_id(); - old_cpu = atomic_cmpxchg(_cpu, PANIC_CPU_INVALID, this_cpu); - if (old_cpu == PANIC_CPU_INVALID) { + + if (atomic_try_cmpxchg(_cpu, _cpu, this_cpu)) { /* This is the 1st CPU which comes here, so go ahead. */ __crash_kexec(regs); -- 2.41.0 ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH 1/7] kexec_file: add kexec_file flag to control debug printing
On Tue, 2023-11-14 at 23:32 +0800, Baoquan He wrote: > When specifying 'kexec -c -d', kexec_load interface will print loading > information, e.g the regions where kernel/initrd/purgatory/cmdline > are put, the memmap passed to 2nd kernel taken as system RAM ranges, > and printing all contents of struct kexec_segment, etc. These are > very helpful for analyzing or positioning what's happening when > kexec/kdump itself failed. The debugging printing for kexec_load > interface is made in user space utility kexec-tools. > > Whereas, with kexec_file_load interface, 'kexec -s -d' print nothing. > Because kexec_file code is mostly implemented in kernel space, and the > debugging printing functionality is missed. It's not convenient when > debugging kexec/kdump loading and jumping with kexec_file_load > interface. > > Now add KEXEC_FILE_DEBUG to kexec_file flag to control the debugging > message printing. And add global variable kexec_file_dbg_print and > macro kexec_dprintk() to facilitate the printing. > > This is a preparation, later kexec_dprintk() will be used to replace the > existing pr_debug(). Once 'kexec -s -d' is specified, it will print out > kexec/kdump loading information. If '-d' is not specified, it regresses > to pr_debug(). Not quite as pr_debug is completely eliminated with zero object size when DEBUG is not #defined. Now the object size will be larger and contain the formats in .text. [] > diff --git a/include/linux/kexec.h b/include/linux/kexec.h [] > @@ -264,6 +264,18 @@ arch_kexec_apply_relocations(struct purgatory_info *pi, > Elf_Shdr *section, > return -ENOEXEC; > } > #endif > + > +extern bool kexec_file_dbg_print; > + > +#define kexec_dprintk(fmt, args...) \ > + do {\ > + if (kexec_file_dbg_print) \ > + printk(KERN_INFO fmt, ##args); \ > + else\ > + printk(KERN_DEBUG fmt, ##args); \ > + } while (0) > + > + I don't know how many of these printks exist and if overall object size matters but using #define kexec_dprintkfmt, ...) \ printk("%s" fmt,\ kexec_file_dbg_print ? KERN_INFO : KERN_DEBUG, \ ##__VA_ARGS__) should reduce overall object size by eliminating the mostly duplicated format in .text which differs only by the KERN_ ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
[PATCH 7/7] kexec_file, parisc: print out debugging message if required
Replace pr_debug() with the newly added kexec_dprintk() in kexec_file loading related codes. Signed-off-by: Baoquan He --- arch/parisc/kernel/kexec_file.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/parisc/kernel/kexec_file.c b/arch/parisc/kernel/kexec_file.c index 8c534204f0fd..011545898da7 100644 --- a/arch/parisc/kernel/kexec_file.c +++ b/arch/parisc/kernel/kexec_file.c @@ -38,7 +38,7 @@ static void *elf_load(struct kimage *image, char *kernel_buf, for (i = 0; i < image->nr_segments; i++) image->segment[i].mem = __pa(image->segment[i].mem); - pr_debug("Loaded the kernel at 0x%lx, entry at 0x%lx\n", + kexec_dprintk("Loaded the kernel at 0x%lx, entry at 0x%lx\n", kernel_load_addr, image->start); if (initrd != NULL) { @@ -51,7 +51,7 @@ static void *elf_load(struct kimage *image, char *kernel_buf, if (ret) goto out; - pr_debug("Loaded initrd at 0x%lx\n", kbuf.mem); + kexec_dprintk("Loaded initrd at 0x%lx\n", kbuf.mem); image->arch.initrd_start = kbuf.mem; image->arch.initrd_end = kbuf.mem + initrd_len; } @@ -68,7 +68,7 @@ static void *elf_load(struct kimage *image, char *kernel_buf, if (ret) goto out; - pr_debug("Loaded cmdline at 0x%lx\n", kbuf.mem); + kexec_dprintk("Loaded cmdline at 0x%lx\n", kbuf.mem); image->arch.cmdline = kbuf.mem; } out: -- 2.41.0 ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
[PATCH 4/7] kexec_file, arm64: print out debugging message if required
Replace pr_debug() with the newly added kexec_dprintk() in kexec_file loading related codes. And also remove the kimage->segment[] printing because the generic code has done the printing. Signed-off-by: Baoquan He --- arch/arm64/kernel/kexec_image.c| 2 +- arch/arm64/kernel/machine_kexec.c | 24 ++-- arch/arm64/kernel/machine_kexec_file.c | 6 +++--- 3 files changed, 10 insertions(+), 22 deletions(-) diff --git a/arch/arm64/kernel/kexec_image.c b/arch/arm64/kernel/kexec_image.c index 636be6715155..df71965178f5 100644 --- a/arch/arm64/kernel/kexec_image.c +++ b/arch/arm64/kernel/kexec_image.c @@ -122,7 +122,7 @@ static void *image_load(struct kimage *image, kernel_segment->memsz -= text_offset; image->start = kernel_segment->mem; - pr_debug("Loaded kernel at 0x%lx bufsz=0x%lx memsz=0x%lx\n", + kexec_dprintk("Loaded kernel at 0x%lx bufsz=0x%lx memsz=0x%lx\n", kernel_segment->mem, kbuf.bufsz, kernel_segment->memsz); diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c index 078910db77a4..efd4e03b95d3 100644 --- a/arch/arm64/kernel/machine_kexec.c +++ b/arch/arm64/kernel/machine_kexec.c @@ -34,24 +34,12 @@ static void _kexec_image_info(const char *func, int line, { unsigned long i; - pr_debug("%s:%d:\n", func, line); - pr_debug(" kexec kimage info:\n"); - pr_debug("type:%d\n", kimage->type); - pr_debug("start: %lx\n", kimage->start); - pr_debug("head:%lx\n", kimage->head); - pr_debug("nr_segments: %lu\n", kimage->nr_segments); - pr_debug("dtb_mem: %pa\n", >arch.dtb_mem); - pr_debug("kern_reloc: %pa\n", >arch.kern_reloc); - pr_debug("el2_vectors: %pa\n", >arch.el2_vectors); - - for (i = 0; i < kimage->nr_segments; i++) { - pr_debug(" segment[%lu]: %016lx - %016lx, 0x%lx bytes, %lu pages\n", - i, - kimage->segment[i].mem, - kimage->segment[i].mem + kimage->segment[i].memsz, - kimage->segment[i].memsz, - kimage->segment[i].memsz / PAGE_SIZE); - } + kexec_dprintk("%s:%d:\n", func, line); + kexec_dprintk(" kexec kimage info:\n"); + kexec_dprintk("type:%d\n", kimage->type); + kexec_dprintk("head:%lx\n", kimage->head); + kexec_dprintk("kern_reloc: %pa\n", >arch.kern_reloc); + kexec_dprintk("el2_vectors: %pa\n", >arch.el2_vectors); } void machine_kexec_cleanup(struct kimage *kimage) diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c index a11a6e14ba89..9f82401d99f4 100644 --- a/arch/arm64/kernel/machine_kexec_file.c +++ b/arch/arm64/kernel/machine_kexec_file.c @@ -127,7 +127,7 @@ int load_other_segments(struct kimage *image, image->elf_load_addr = kbuf.mem; image->elf_headers_sz = headers_sz; - pr_debug("Loaded elf core header at 0x%lx bufsz=0x%lx memsz=0x%lx\n", + kexec_dprintk("Loaded elf core header at 0x%lx bufsz=0x%lx memsz=0x%lx\n", image->elf_load_addr, kbuf.bufsz, kbuf.memsz); } @@ -148,7 +148,7 @@ int load_other_segments(struct kimage *image, goto out_err; initrd_load_addr = kbuf.mem; - pr_debug("Loaded initrd at 0x%lx bufsz=0x%lx memsz=0x%lx\n", + kexec_dprintk("Loaded initrd at 0x%lx bufsz=0x%lx memsz=0x%lx\n", initrd_load_addr, kbuf.bufsz, kbuf.memsz); } @@ -179,7 +179,7 @@ int load_other_segments(struct kimage *image, image->arch.dtb = dtb; image->arch.dtb_mem = kbuf.mem; - pr_debug("Loaded dtb at 0x%lx bufsz=0x%lx memsz=0x%lx\n", + kexec_dprintk("Loaded dtb at 0x%lx bufsz=0x%lx memsz=0x%lx\n", kbuf.mem, kbuf.bufsz, kbuf.memsz); return 0; -- 2.41.0 ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
[PATCH 6/7] kexec_file, power: print out debugging message if required
Replace pr_debug() with the newly added kexec_dprintk() in kexec_file loading related codes. Signed-off-by: Baoquan He --- arch/powerpc/kexec/elf_64.c | 8 arch/powerpc/kexec/file_load_64.c | 14 +++--- 2 files changed, 11 insertions(+), 11 deletions(-) diff --git a/arch/powerpc/kexec/elf_64.c b/arch/powerpc/kexec/elf_64.c index eeb258002d1e..904016cf89ea 100644 --- a/arch/powerpc/kexec/elf_64.c +++ b/arch/powerpc/kexec/elf_64.c @@ -59,7 +59,7 @@ static void *elf64_load(struct kimage *image, char *kernel_buf, if (ret) goto out; - pr_debug("Loaded the kernel at 0x%lx\n", kernel_load_addr); + kexec_dprintk("Loaded the kernel at 0x%lx\n", kernel_load_addr); ret = kexec_load_purgatory(image, ); if (ret) { @@ -67,7 +67,7 @@ static void *elf64_load(struct kimage *image, char *kernel_buf, goto out; } - pr_debug("Loaded purgatory at 0x%lx\n", pbuf.mem); + kexec_dprintk("Loaded purgatory at 0x%lx\n", pbuf.mem); /* Load additional segments needed for panic kernel */ if (image->type == KEXEC_TYPE_CRASH) { @@ -99,7 +99,7 @@ static void *elf64_load(struct kimage *image, char *kernel_buf, goto out; initrd_load_addr = kbuf.mem; - pr_debug("Loaded initrd at 0x%lx\n", initrd_load_addr); + kexec_dprintk("Loaded initrd at 0x%lx\n", initrd_load_addr); } fdt = of_kexec_alloc_and_setup_fdt(image, initrd_load_addr, @@ -132,7 +132,7 @@ static void *elf64_load(struct kimage *image, char *kernel_buf, fdt_load_addr = kbuf.mem; - pr_debug("Loaded device tree at 0x%lx\n", fdt_load_addr); + kexec_dprintk("Loaded device tree at 0x%lx\n", fdt_load_addr); slave_code = elf_info.buffer + elf_info.proghdrs[0].p_offset; ret = setup_purgatory_ppc64(image, slave_code, fdt, kernel_load_addr, diff --git a/arch/powerpc/kexec/file_load_64.c b/arch/powerpc/kexec/file_load_64.c index 961a6dd67365..45089f53f875 100644 --- a/arch/powerpc/kexec/file_load_64.c +++ b/arch/powerpc/kexec/file_load_64.c @@ -577,7 +577,7 @@ static int add_usable_mem_property(void *fdt, struct device_node *dn, NODE_PATH_LEN, dn); return -EOVERFLOW; } - pr_debug("Memory node path: %s\n", path); + kexec_dprintk("Memory node path: %s\n", path); /* Now that we know the path, find its offset in kdump kernel's fdt */ node = fdt_path_offset(fdt, path); @@ -590,8 +590,8 @@ static int add_usable_mem_property(void *fdt, struct device_node *dn, /* Get the address & size cells */ n_mem_addr_cells = of_n_addr_cells(dn); n_mem_size_cells = of_n_size_cells(dn); - pr_debug("address cells: %d, size cells: %d\n", n_mem_addr_cells, -n_mem_size_cells); + kexec_dprintk("address cells: %d, size cells: %d\n", n_mem_addr_cells, + n_mem_size_cells); um_info->idx = 0; if (!check_realloc_usable_mem(um_info, 2)) { @@ -664,7 +664,7 @@ static int update_usable_mem_fdt(void *fdt, struct crash_mem *usable_mem) node = fdt_path_offset(fdt, "/ibm,dynamic-reconfiguration-memory"); if (node == -FDT_ERR_NOTFOUND) - pr_debug("No dynamic reconfiguration memory found\n"); + kexec_dprintk("No dynamic reconfiguration memory found\n"); else if (node < 0) { pr_err("Malformed device tree: error reading /ibm,dynamic-reconfiguration-memory.\n"); return -EINVAL; @@ -776,7 +776,7 @@ static void update_backup_region_phdr(struct kimage *image, Elf64_Ehdr *ehdr) for (i = 0; i < ehdr->e_phnum; i++) { if (phdr->p_paddr == BACKUP_SRC_START) { phdr->p_offset = image->arch.backup_start; - pr_debug("Backup region offset updated to 0x%lx\n", + kexec_dprintk("Backup region offset updated to 0x%lx\n", image->arch.backup_start); return; } @@ -850,7 +850,7 @@ int load_crashdump_segments_ppc64(struct kimage *image, pr_err("Failed to load backup segment\n"); return ret; } - pr_debug("Loaded the backup region at 0x%lx\n", kbuf->mem); + kexec_dprintk("Loaded the backup region at 0x%lx\n", kbuf->mem); /* Load elfcorehdr segment - to export crashing kernel's vmcore */ ret = load_elfcorehdr_segment(image, kbuf); @@ -858,7 +858,7 @@ int load_crashdump_segments_ppc64(struct kimage *image, pr_err("Failed to load elfcorehdr segment\n"); return ret; } - pr_debug("Loaded elf core header at 0x%lx, bufsz=0x%lx memsz=0x%lx\n", + kexec_dprintk("Loaded elf core header at 0x%lx, bufsz=0x%lx memsz=0x%lx\n", image->elf_load_addr,
[PATCH 3/7] kexec_file, x86: print out debugging message if required
Replace pr_debug() with the newly added kexec_dprintk() in kexec_file loading related codes. And also print out e820 memmap passed to 2nd kernel just as kexec_load interface has been doing. Signed-off-by: Baoquan He --- arch/x86/kernel/crash.c | 2 +- arch/x86/kernel/kexec-bzimage64.c | 23 ++- 2 files changed, 15 insertions(+), 10 deletions(-) diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c index c92d88680dbf..df4dbd3aa08c 100644 --- a/arch/x86/kernel/crash.c +++ b/arch/x86/kernel/crash.c @@ -386,7 +386,7 @@ int crash_load_segments(struct kimage *image) if (ret) return ret; image->elf_load_addr = kbuf.mem; - pr_debug("Loaded ELF headers at 0x%lx bufsz=0x%lx memsz=0x%lx\n", + kexec_dprintk("Loaded ELF headers at 0x%lx bufsz=0x%lx memsz=0x%lx\n", image->elf_load_addr, kbuf.bufsz, kbuf.memsz); return ret; diff --git a/arch/x86/kernel/kexec-bzimage64.c b/arch/x86/kernel/kexec-bzimage64.c index a61c12c01270..e9ae0eac6bf9 100644 --- a/arch/x86/kernel/kexec-bzimage64.c +++ b/arch/x86/kernel/kexec-bzimage64.c @@ -82,7 +82,7 @@ static int setup_cmdline(struct kimage *image, struct boot_params *params, cmdline_ptr[cmdline_len - 1] = '\0'; - pr_debug("Final command line is: %s\n", cmdline_ptr); + kexec_dprintk("Final command line is: %s\n", cmdline_ptr); cmdline_ptr_phys = bootparams_load_addr + cmdline_offset; cmdline_low_32 = cmdline_ptr_phys & 0xUL; cmdline_ext_32 = cmdline_ptr_phys >> 32; @@ -272,7 +272,12 @@ setup_boot_parameters(struct kimage *image, struct boot_params *params, nr_e820_entries = params->e820_entries; + kexec_dprintk("E820 memmap:\n"); for (i = 0; i < nr_e820_entries; i++) { + kexec_dprintk("%016llx-%016llx (%d)\n", + params->e820_table[i].addr, + params->e820_table[i].addr + params->e820_table[i].size - 1, + params->e820_table[i].type); if (params->e820_table[i].type != E820_TYPE_RAM) continue; start = params->e820_table[i].addr; @@ -424,7 +429,7 @@ static void *bzImage64_load(struct kimage *image, char *kernel, * command line. Make sure it does not overflow */ if (cmdline_len + MAX_ELFCOREHDR_STR_LEN > header->cmdline_size) { - pr_debug("Appending elfcorehdr= to command line exceeds maximum allowed length\n"); + kexec_dprintk("Appending elfcorehdr= to command line exceeds maximum allowed length\n"); return ERR_PTR(-EINVAL); } @@ -445,7 +450,7 @@ static void *bzImage64_load(struct kimage *image, char *kernel, return ERR_PTR(ret); } - pr_debug("Loaded purgatory at 0x%lx\n", pbuf.mem); + kexec_dprintk("Loaded purgatory at 0x%lx\n", pbuf.mem); /* @@ -490,8 +495,8 @@ static void *bzImage64_load(struct kimage *image, char *kernel, if (ret) goto out_free_params; bootparam_load_addr = kbuf.mem; - pr_debug("Loaded boot_param, command line and misc at 0x%lx bufsz=0x%lx memsz=0x%lx\n", -bootparam_load_addr, kbuf.bufsz, kbuf.bufsz); + kexec_dprintk("Loaded boot_param, command line and misc at 0x%lx bufsz=0x%lx memsz=0x%lx\n", + bootparam_load_addr, kbuf.bufsz, kbuf.bufsz); /* Load kernel */ kbuf.buffer = kernel + kern16_size; @@ -505,8 +510,8 @@ static void *bzImage64_load(struct kimage *image, char *kernel, goto out_free_params; kernel_load_addr = kbuf.mem; - pr_debug("Loaded 64bit kernel at 0x%lx bufsz=0x%lx memsz=0x%lx\n", -kernel_load_addr, kbuf.bufsz, kbuf.memsz); + kexec_dprintk("Loaded 64bit kernel at 0x%lx bufsz=0x%lx memsz=0x%lx\n", + kernel_load_addr, kbuf.bufsz, kbuf.memsz); /* Load initrd high */ if (initrd) { @@ -520,8 +525,8 @@ static void *bzImage64_load(struct kimage *image, char *kernel, goto out_free_params; initrd_load_addr = kbuf.mem; - pr_debug("Loaded initrd at 0x%lx bufsz=0x%lx memsz=0x%lx\n", - initrd_load_addr, initrd_len, initrd_len); + kexec_dprintk("Loaded initrd at 0x%lx bufsz=0x%lx memsz=0x%lx\n", + initrd_load_addr, initrd_len, initrd_len); setup_initrd(params, initrd_load_addr, initrd_len); } -- 2.41.0 ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
[PATCH 0/7] kexec_file: print out debugging message if required
Currently, specifying '-d' will print a lot of debugging information about kexec/kdump loading with kexec_load interface. However, kexec_file_load prints nothing even though '-d' is specified. It's very inconvenient to debug or analyze the kexec/kdump loading when something wrong happened with kexec/kdump itself or develper want to check the kexec/kdump loading. In this patchset, a kexec_file flag is KEXEC_FILE_DEBUG added and checked in code. If it's passed in, debugging message of kexec_file code will be printed out and can be seen from console and dmesg. Otherwise, the debugging message is printed via pr_debug(). Note: = 1) The code in kexec-tools utility also need be changed to support passing KEXEC_FILE_DEBUG to kernel when 'kexec -s -d' is specified. The patch link is here: = [PATCH] kexec_file: add kexec_file flag to support debug printing http://lists.infradead.org/pipermail/kexec/2023-November/028505.html 2) s390 also has kexec_file code, while I am not sure what debugging information is necessary. So leave it to s390 dev to add if they think it's needed. Test: == I did testing on x86_64 and arm64. On x86_64, the printed messages look like below: -- kexec measurement buffer for the loaded kernel at 0x207fffe000. Loaded purgatory at 0x207fff9000 Loaded boot_param, command line and misc at 0x207fff3000 bufsz=0x1180 memsz=0x1180 Loaded 64bit kernel at 0x207c00 bufsz=0xc88200 memsz=0x3c4a000 Loaded initrd at 0x2079e79000 bufsz=0x2186280 memsz=0x2186280 Final command line is: root=/dev/mapper/fedora_intel--knightslanding--lb--02-root ro rd.lvm.lv=fedora_intel-knightslanding-lb-02/root console=ttyS0,115200N81 crashkernel=256M E820 memmap: -0009a3ff (1) 0009a400-0009 (2) 000e-000f (2) 0010-6ff83fff (1) 6ff84000-7ac50fff (2) .. 00207fff6150-00207fff615f (128) 00207fff6160-00207fff714f (1) 00207fff7150-00207fff715f (128) 00207fff7160-00207fff814f (1) 00207fff8150-00207fff815f (128) 00207fff8160-00207fff (1) nr_segments = 5 segment[0]: buf=0x4e5ece74 bufsz=0x211 mem=0x207fffe000 memsz=0x1000 segment[1]: buf=0x9e871498 bufsz=0x4000 mem=0x207fff9000 memsz=0x5000 segment[2]: buf=0xd879f1fe bufsz=0x1180 mem=0x207fff3000 memsz=0x2000 segment[3]: buf=0x1101cd86 bufsz=0xc88200 mem=0x207c00 memsz=0x3c4a000 segment[4]: buf=0xc6e38ac7 bufsz=0x2186280 mem=0x2079e79000 memsz=0x2187000 kexec_file_load: type:0, start:0x207fff91a0 head:0x109e004002 flags:0x8 --- Baoquan He (7): kexec_file: add kexec_file flag to control debug printing kexec_file: print out debugging message if required kexec_file, x86: print out debugging message if required kexec_file, arm64: print out debugging message if required kexec_file, ricv: print out debugging message if required kexec_file, power: print out debugging message if required kexec_file, parisc: print out debugging message if required arch/arm64/kernel/kexec_image.c| 2 +- arch/arm64/kernel/machine_kexec.c | 24 ++-- arch/arm64/kernel/machine_kexec_file.c | 6 +++--- arch/parisc/kernel/kexec_file.c| 6 +++--- arch/powerpc/kexec/elf_64.c| 8 arch/powerpc/kexec/file_load_64.c | 14 +++--- arch/riscv/kernel/elf_kexec.c | 9 + arch/riscv/kernel/machine_kexec.c | 26 -- arch/x86/kernel/crash.c| 2 +- arch/x86/kernel/kexec-bzimage64.c | 23 ++- include/linux/kexec.h | 14 +- include/uapi/linux/kexec.h | 1 + kernel/crash_core.c| 3 ++- kernel/kexec_file.c| 12 +++- security/integrity/ima/ima_kexec.c | 2 +- 15 files changed, 72 insertions(+), 80 deletions(-) -- 2.41.0 ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
[PATCH 1/7] kexec_file: add kexec_file flag to control debug printing
When specifying 'kexec -c -d', kexec_load interface will print loading information, e.g the regions where kernel/initrd/purgatory/cmdline are put, the memmap passed to 2nd kernel taken as system RAM ranges, and printing all contents of struct kexec_segment, etc. These are very helpful for analyzing or positioning what's happening when kexec/kdump itself failed. The debugging printing for kexec_load interface is made in user space utility kexec-tools. Whereas, with kexec_file_load interface, 'kexec -s -d' print nothing. Because kexec_file code is mostly implemented in kernel space, and the debugging printing functionality is missed. It's not convenient when debugging kexec/kdump loading and jumping with kexec_file_load interface. Now add KEXEC_FILE_DEBUG to kexec_file flag to control the debugging message printing. And add global variable kexec_file_dbg_print and macro kexec_dprintk() to facilitate the printing. This is a preparation, later kexec_dprintk() will be used to replace the existing pr_debug(). Once 'kexec -s -d' is specified, it will print out kexec/kdump loading information. If '-d' is not specified, it regresses to pr_debug(). Signed-off-by: Baoquan He --- include/linux/kexec.h | 14 +- include/uapi/linux/kexec.h | 1 + kernel/kexec_file.c| 5 + 3 files changed, 19 insertions(+), 1 deletion(-) diff --git a/include/linux/kexec.h b/include/linux/kexec.h index 8227455192b7..189a6c5bec84 100644 --- a/include/linux/kexec.h +++ b/include/linux/kexec.h @@ -264,6 +264,18 @@ arch_kexec_apply_relocations(struct purgatory_info *pi, Elf_Shdr *section, return -ENOEXEC; } #endif + +extern bool kexec_file_dbg_print; + +#define kexec_dprintk(fmt, args...)\ + do {\ + if (kexec_file_dbg_print) \ + printk(KERN_INFO fmt, ##args); \ + else\ + printk(KERN_DEBUG fmt, ##args); \ + } while (0) + + #endif /* CONFIG_KEXEC_FILE */ #ifdef CONFIG_KEXEC_ELF @@ -403,7 +415,7 @@ bool kexec_load_permitted(int kexec_image_type); /* List of defined/legal kexec file flags */ #define KEXEC_FILE_FLAGS (KEXEC_FILE_UNLOAD | KEXEC_FILE_ON_CRASH | \ -KEXEC_FILE_NO_INITRAMFS) +KEXEC_FILE_NO_INITRAMFS | KEXEC_FILE_DEBUG) /* flag to track if kexec reboot is in progress */ extern bool kexec_in_progress; diff --git a/include/uapi/linux/kexec.h b/include/uapi/linux/kexec.h index 01766dd839b0..c17bb096ea68 100644 --- a/include/uapi/linux/kexec.h +++ b/include/uapi/linux/kexec.h @@ -25,6 +25,7 @@ #define KEXEC_FILE_UNLOAD 0x0001 #define KEXEC_FILE_ON_CRASH0x0002 #define KEXEC_FILE_NO_INITRAMFS0x0004 +#define KEXEC_FILE_DEBUG 0x0008 /* These values match the ELF architecture values. * Unless there is a good reason that should continue to be the case. diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c index f9a419cd22d4..4c35500ae40a 100644 --- a/kernel/kexec_file.c +++ b/kernel/kexec_file.c @@ -38,6 +38,8 @@ void set_kexec_sig_enforced(void) } #endif +bool kexec_file_dbg_print; + static int kexec_calculate_store_digests(struct kimage *image); /* Maximum size in bytes for kernel/initrd files. */ @@ -123,6 +125,8 @@ void kimage_file_post_load_cleanup(struct kimage *image) */ kfree(image->image_loader_data); image->image_loader_data = NULL; + + kexec_file_dbg_print = false; } #ifdef CONFIG_KEXEC_SIG @@ -278,6 +282,7 @@ kimage_file_alloc_init(struct kimage **rimage, int kernel_fd, if (!image) return -ENOMEM; + kexec_file_dbg_print = !!(flags & KEXEC_FILE_DEBUG); image->file_mode = 1; if (kexec_on_panic) { -- 2.41.0 ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
[PATCH 2/7] kexec_file: print out debugging message if required
Replace pr_debug() with the newly added kexec_dprintk() in kexec_file loading related codes. And also print out type/start/head of kimage and flags to help debug. Signed-off-by: Baoquan He --- kernel/crash_core.c| 3 ++- kernel/kexec_file.c| 7 ++- security/integrity/ima/ima_kexec.c | 2 +- 3 files changed, 9 insertions(+), 3 deletions(-) diff --git a/kernel/crash_core.c b/kernel/crash_core.c index efe87d501c8c..41001ffbaa99 100644 --- a/kernel/crash_core.c +++ b/kernel/crash_core.c @@ -551,7 +551,8 @@ int crash_prepare_elf64_headers(struct crash_mem *mem, int need_kernel_map, phdr->p_filesz = phdr->p_memsz = mend - mstart + 1; phdr->p_align = 0; ehdr->e_phnum++; - pr_debug("Crash PT_LOAD ELF header. phdr=%p vaddr=0x%llx, paddr=0x%llx, sz=0x%llx e_phnum=%d p_offset=0x%llx\n", + kexec_dprintk("Crash PT_LOAD ELF header. phdr=%p vaddr=0x%llx, paddr=0x%llx, " + "sz=0x%llx e_phnum=%d p_offset=0x%llx\n", phdr, phdr->p_vaddr, phdr->p_paddr, phdr->p_filesz, ehdr->e_phnum, phdr->p_offset); phdr++; diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c index 4c35500ae40a..7ae1b0901aa4 100644 --- a/kernel/kexec_file.c +++ b/kernel/kexec_file.c @@ -206,6 +206,8 @@ kimage_file_prepare_segments(struct kimage *image, int kernel_fd, int initrd_fd, if (ret < 0) return ret; image->kernel_buf_len = ret; + kexec_dprintk("kernel: %p kernel_size: %#lx\n", + image->kernel_buf, image->kernel_buf_len); /* Call arch image probe handlers */ ret = arch_kexec_kernel_image_probe(image, image->kernel_buf, @@ -389,11 +391,12 @@ SYSCALL_DEFINE5(kexec_file_load, int, kernel_fd, int, initrd_fd, if (ret) goto out; + kexec_dprintk("nr_segments = %lu\n", image->nr_segments); for (i = 0; i < image->nr_segments; i++) { struct kexec_segment *ksegment; ksegment = >segment[i]; - pr_debug("Loading segment %d: buf=0x%p bufsz=0x%zx mem=0x%lx memsz=0x%zx\n", + kexec_dprintk("segment[%d]: buf=0x%p bufsz=0x%zx mem=0x%lx memsz=0x%zx\n", i, ksegment->buf, ksegment->bufsz, ksegment->mem, ksegment->memsz); @@ -408,6 +411,8 @@ SYSCALL_DEFINE5(kexec_file_load, int, kernel_fd, int, initrd_fd, if (ret) goto out; + kexec_dprintk("kexec_file_load: type:%u, start:0x%lx head:0x%lx flags:0x%lx\n", + image->type, image->start, image->head, flags); /* * Free up any temporary buffers allocated which are not needed * after image has been loaded diff --git a/security/integrity/ima/ima_kexec.c b/security/integrity/ima/ima_kexec.c index ad133fe120db..e692624bcab3 100644 --- a/security/integrity/ima/ima_kexec.c +++ b/security/integrity/ima/ima_kexec.c @@ -129,7 +129,7 @@ void ima_add_kexec_buffer(struct kimage *image) image->ima_buffer_size = kexec_segment_size; image->ima_buffer = kexec_buffer; - pr_debug("kexec measurement buffer for the loaded kernel at 0x%lx.\n", + kexec_dprintk("kexec measurement buffer for the loaded kernel at 0x%lx.\n", kbuf.mem); } #endif /* IMA_KEXEC */ -- 2.41.0 ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
[PATCH 5/7] kexec_file, ricv: print out debugging message if required
Replace pr_debug() with the newly added kexec_dprintk() in kexec_file loading related codes. And also remove kexec_image_info() because the content has been printed out in generic code. Signed-off-by: Baoquan He --- arch/riscv/kernel/elf_kexec.c | 9 + arch/riscv/kernel/machine_kexec.c | 26 -- 2 files changed, 5 insertions(+), 30 deletions(-) diff --git a/arch/riscv/kernel/elf_kexec.c b/arch/riscv/kernel/elf_kexec.c index e60fbd8660c4..20d941e91b5e 100644 --- a/arch/riscv/kernel/elf_kexec.c +++ b/arch/riscv/kernel/elf_kexec.c @@ -216,7 +216,6 @@ static void *elf_kexec_load(struct kimage *image, char *kernel_buf, if (ret) goto out; kernel_start = image->start; - pr_notice("The entry point of kernel at 0x%lx\n", image->start); /* Add the kernel binary to the image */ ret = riscv_kexec_elf_load(image, , _info, @@ -252,7 +251,7 @@ static void *elf_kexec_load(struct kimage *image, char *kernel_buf, image->elf_load_addr = kbuf.mem; image->elf_headers_sz = headers_sz; - pr_debug("Loaded elf core header at 0x%lx bufsz=0x%lx memsz=0x%lx\n", + kexec_dprintk("Loaded elf core header at 0x%lx bufsz=0x%lx memsz=0x%lx\n", image->elf_load_addr, kbuf.bufsz, kbuf.memsz); /* Setup cmdline for kdump kernel case */ @@ -275,6 +274,8 @@ static void *elf_kexec_load(struct kimage *image, char *kernel_buf, pr_err("Error loading purgatory ret=%d\n", ret); goto out; } + kexec_dprintk("Loaded purgatory at 0x%lx\n", kbuf.mem); + ret = kexec_purgatory_get_set_symbol(image, "riscv_kernel_entry", _start, sizeof(kernel_start), 0); @@ -293,7 +294,7 @@ static void *elf_kexec_load(struct kimage *image, char *kernel_buf, if (ret) goto out; initrd_pbase = kbuf.mem; - pr_notice("Loaded initrd at 0x%lx\n", initrd_pbase); + kexec_dprintk("Loaded initrd at 0x%lx\n", initrd_pbase); } /* Add the DTB to the image */ @@ -318,7 +319,7 @@ static void *elf_kexec_load(struct kimage *image, char *kernel_buf, } /* Cache the fdt buffer address for memory cleanup */ image->arch.fdt = fdt; - pr_notice("Loaded device tree at 0x%lx\n", kbuf.mem); + kexec_dprintk("Loaded device tree at 0x%lx\n", kbuf.mem); goto out; out_free_fdt: diff --git a/arch/riscv/kernel/machine_kexec.c b/arch/riscv/kernel/machine_kexec.c index 2d139b724bc8..ed9cad20c039 100644 --- a/arch/riscv/kernel/machine_kexec.c +++ b/arch/riscv/kernel/machine_kexec.c @@ -18,30 +18,6 @@ #include #include -/* - * kexec_image_info - Print received image details - */ -static void -kexec_image_info(const struct kimage *image) -{ - unsigned long i; - - pr_debug("Kexec image info:\n"); - pr_debug("\ttype:%d\n", image->type); - pr_debug("\tstart: %lx\n", image->start); - pr_debug("\thead:%lx\n", image->head); - pr_debug("\tnr_segments: %lu\n", image->nr_segments); - - for (i = 0; i < image->nr_segments; i++) { - pr_debug("\tsegment[%lu]: %016lx - %016lx", i, - image->segment[i].mem, - image->segment[i].mem + image->segment[i].memsz); - pr_debug("\t\t0x%lx bytes, %lu pages\n", - (unsigned long) image->segment[i].memsz, - (unsigned long) image->segment[i].memsz / PAGE_SIZE); - } -} - /* * machine_kexec_prepare - Initialize kexec * @@ -60,8 +36,6 @@ machine_kexec_prepare(struct kimage *image) unsigned int control_code_buffer_sz = 0; int i = 0; - kexec_image_info(image); - /* Find the Flattened Device Tree and save its physical address */ for (i = 0; i < image->nr_segments; i++) { if (image->segment[i].memsz <= sizeof(fdt)) -- 2.41.0 ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
[PATCH] kexec_file: add kexec_file flag to support debug printing
This add KEXEC_FILE_DEBUG to kexec_file_flags so that it can be passed to kernel when '-d' is added with kexec_file_load interface. With that flag enabled, kernel can enable the debugging message printing. Signed-off-by: Baoquan He --- kexec/kexec-syscall.h | 1 + kexec/kexec.c | 1 + 2 files changed, 2 insertions(+) diff --git a/kexec/kexec-syscall.h b/kexec/kexec-syscall.h index 2559bffb93da..73e52543e1b0 100644 --- a/kexec/kexec-syscall.h +++ b/kexec/kexec-syscall.h @@ -119,6 +119,7 @@ static inline long kexec_file_load(int kernel_fd, int initrd_fd, #define KEXEC_FILE_UNLOAD 0x0001 #define KEXEC_FILE_ON_CRASH0x0002 #define KEXEC_FILE_NO_INITRAMFS0x0004 +#define KEXEC_FILE_DEBUG 0x0008 /* These values match the ELF architecture values. * Unless there is a good reason that should continue to be the case. diff --git a/kexec/kexec.c b/kexec/kexec.c index 08edfca2349a..b5393e3b20aa 100644 --- a/kexec/kexec.c +++ b/kexec/kexec.c @@ -1477,6 +1477,7 @@ int main(int argc, char *argv[]) return 0; case OPT_DEBUG: kexec_debug = 1; + kexec_file_flags |= KEXEC_FILE_DEBUG; break; case OPT_NOIFDOWN: skip_ifdown = 1; -- 2.41.0 ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH] x86/kexec: set MIN_KERNEL_LOAD_ADDR to 0x01000000
Hi John, On 10/23/23 at 02:54pm, John Sperbeck wrote: > On Sun, Oct 22, 2023 at 7:42 PM H. Peter Anvin wrote: .. > > >--- > > > arch/x86/kernel/kexec-bzimage64.c | 2 +- > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > >diff --git a/arch/x86/kernel/kexec-bzimage64.c > > >b/arch/x86/kernel/kexec-bzimage64.c > > >index a61c12c01270..d6bf6c13dab1 100644 > > >--- a/arch/x86/kernel/kexec-bzimage64.c > > >+++ b/arch/x86/kernel/kexec-bzimage64.c > > >@@ -36,7 +36,7 @@ > > > */ > > > #define MIN_PURGATORY_ADDR0x3000 > > > #define MIN_BOOTPARAM_ADDR0x3000 > > >-#define MIN_KERNEL_LOAD_ADDR 0x10 > > >+#define MIN_KERNEL_LOAD_ADDR 0x100 > > > #define MIN_INITRD_LOAD_ADDR 0x100 > > > > > > /* > > > > This doesn't make any sense to me. There is already a high water mark for > > his much memory the kernel needs until an initrd or setup_data item can > > appear. This is just a hack, please fix it properly. > > The startup_64() code in head_64.S changes behavior based on whether > it's running below or above LOAD_PHYSICAL_ADDR: > > #ifdef CONFIG_RELOCATABLE > leaqstartup_32(%rip) /* - $startup_32 */, %rbp > movlBP_kernel_alignment(%rsi), %eax > decl%eax > addq%rax, %rbp > notq%rax > andq%rax, %rbp > cmpq$LOAD_PHYSICAL_ADDR, %rbp > jae 1f > #endif > movq$LOAD_PHYSICAL_ADDR, %rbp > 1: > > In my example, we were running from address 0x0040. The %rbp > register will start with 0x0040, but will be changed to 0x0100 > after the check against LOAD_PHYSICAL_ADDR fails. > > The 0x0100 value in %rbp is passed to extract_kernel as the > 'output' argument. Unless choose_random_location() decides > differently, this will be where the kernel is decompressed to. The > size of the kernel is large enough in my example that the > decompression overruns the initrd. > > If the startup_64() code didn't have the LOAD_PHYSICAL_ADDR check and > used %rpb as is, then there would be no issue. The decompression > would have been to 0x0040 and would have completed before reaching > the initrd memory. > > That is, the kexec code is being careful to ensure that the kernel and > initrd memory doesn't overlap, but isn't paying attention to what > happens if the kernel memory is below LOAD_PHYSICAL_ADDR (the kernel > address is effectively changed to a different location). My proposed > change makes it aware, and avoids such addresses. Wondering why kexec-ed kernel is located under 0x100. The loading code will search physical memory regions bottom up for an available one. Usually, kexec kernel will be loaded above 16M. I have posted a patchset to load kernel at top of system RAM for kexec_file load just as kexec_load has been doing. Do you think it's helpful? [PATCH 0/2] kexec_file: Load kernel at top of system RAM if required https://lore.kernel.org/all/20231114091658.228030-1-...@redhat.com/T/#u Thanks Baoquan ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
[PATCH 2/2] kexec_file: Load kernel at top of system RAM if required
Kexec_load interface has been doing top down searching and loading kernel/initrd/purgtory etc to prepare for kexec reboot. In that way, the benefits are that it avoids to consume and fragment limited low memory which satisfy DMA buffer allocation and big chunk of continuous memory during system init; and avoids to stir with BIOS/FW reserved or occupied areas, or corner case handling/work around/quirk occupied areas when doing system init. By the way, the top-down searching and loading of kexec-ed kernel is done in user space utility code. For kexec_file loading, even if kexec_buf.top_down is 'true', it's simply ignored. It calls walk_system_ram_res() directly to go through all resources of System RAM bottom up, to find an available memory region, then call locate_mem_hole_callback() to allocate memory in that found memory region from top to down. This is not expected and inconsistent with kexec_load. Here check if kexec_buf.top_down is 'true' in kexec_walk_resources(), if yes, call the newly added walk_system_ram_res_rev() to find memory region of system RAM from top to down to load kernel/initrd etc. Signed-off-by: Baoquan He --- kernel/kexec_file.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c index f9a419cd22d4..ba3ef30921b8 100644 --- a/kernel/kexec_file.c +++ b/kernel/kexec_file.c @@ -592,6 +592,8 @@ static int kexec_walk_resources(struct kexec_buf *kbuf, IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY, crashk_res.start, crashk_res.end, kbuf, func); + else if (kbuf->top_down) + return walk_system_ram_res_rev(0, ULONG_MAX, kbuf, func); else return walk_system_ram_res(0, ULONG_MAX, kbuf, func); } -- 2.41.0 ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
[PATCH 0/2] kexec_file: Load kernel at top of system RAM if required
Justification: == Kexec_load interface has been doing top down searching and loading kernel/initrd/purgtory etc to prepare for kexec reboot. In that way, the benefits are that it avoids to consume and fragment limited low memory which satisfy DMA buffer allocation and big chunk of continuous memory during system init; and avoids to stir with BIOS/FW reserved or occupied areas, or corner case handling/work around/quirk occupied areas when doing system init. By the way, the top-down searching and loading of kexec-ed kernel is done in user space utility code. For kexec_file loading, even if kexec_buf.top_down is 'true', it's simply ignored. It calls walk_system_ram_res() directly to go through all resources of System RAM bottom up, to find an available memory region, then call locate_mem_hole_callback() to allocate memory in that found memory region from top to down. This is not expected and inconsistent with kexec_load. Implementation === In patch 1, introduce a new function walk_system_ram_res_rev() which is a variant of walk_system_ram_res(), it walks through a list of all the resources of System RAM in reversed order, i.e., from higher to lower. In patch 2, check if kexec_buf.top_down is 'true' in kexec_walk_resources(), if yes, call walk_system_ram_res_rev() to find memory region of system RAM from top to down to load kernel/initrd etc. Background information: === And I ever tried this in the past in a different way, please see below link. In the post, I tried to adjust struct sibling linking code, replace the the singly linked list with list_head so that walk_system_ram_res_rev() can be implemented in a much easier way. Finally I failed. https://lore.kernel.org/all/20180718024944.577-4-...@redhat.com/ This time, I picked up the patch from AKASHI Takahiro's old post and made some change to take as the current patch 1: https://lists.infradead.org/pipermail/linux-arm-kernel/2017-September/531456.html Testing: Only tried on x86_64 Baoquan He (2): resource: add walk_system_ram_res_rev() kexec_file: Load kernel at top of system RAM if required include/linux/ioport.h | 3 +++ kernel/kexec_file.c| 2 ++ kernel/resource.c | 61 ++ 3 files changed, 66 insertions(+) -- 2.41.0 ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
[PATCH 1/2] resource: add walk_system_ram_res_rev()
This function, being a variant of walk_system_ram_res() introduced in commit 8c86e70acead ("resource: provide new functions to walk through resources"), walks through a list of all the resources of System RAM in reversed order, i.e., from higher to lower. It will be used in kexec_file code to load kernel, initrd etc when preparing kexec reboot. Signed-off-by: AKASHI Takahiro Signed-off-by: Baoquan He --- include/linux/ioport.h | 3 +++ kernel/resource.c | 61 ++ 2 files changed, 64 insertions(+) diff --git a/include/linux/ioport.h b/include/linux/ioport.h index 14f5cfabbbc8..db7fe25f3370 100644 --- a/include/linux/ioport.h +++ b/include/linux/ioport.h @@ -331,6 +331,9 @@ extern int walk_system_ram_res(u64 start, u64 end, void *arg, int (*func)(struct resource *, void *)); extern int +walk_system_ram_res_rev(u64 start, u64 end, void *arg, + int (*func)(struct resource *, void *)); +extern int walk_iomem_res_desc(unsigned long desc, unsigned long flags, u64 start, u64 end, void *arg, int (*func)(struct resource *, void *)); diff --git a/kernel/resource.c b/kernel/resource.c index 866ef3663a0b..12bce44a2c08 100644 --- a/kernel/resource.c +++ b/kernel/resource.c @@ -27,6 +27,8 @@ #include #include #include +#include +#include #include @@ -429,6 +431,65 @@ int walk_system_ram_res(u64 start, u64 end, void *arg, func); } +/* + * This function, being a variant of walk_system_ram_res(), calls the @func + * callback against all memory ranges of type System RAM which are marked as + * IORESOURCE_SYSTEM_RAM and IORESOUCE_BUSY in reversed order, i.e., from + * higher to lower. + */ +int walk_system_ram_res_rev(u64 start, u64 end, void *arg, + int (*func)(struct resource *, void *)) +{ + struct resource res, *rams; + int rams_size = 16, i; + unsigned long flags; + int ret = -1; + + /* create a list */ + rams = kvcalloc(rams_size, sizeof(struct resource), GFP_KERNEL); + if (!rams) + return ret; + + flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY; + i = 0; + while ((start < end) && + (!find_next_iomem_res(start, end, flags, IORES_DESC_NONE, ))) { + if (i >= rams_size) { + /* re-alloc */ + struct resource *rams_new; + int rams_new_size; + + rams_new_size = rams_size + 16; + rams_new = kvcalloc(rams_new_size, sizeof(struct resource), + GFP_KERNEL); + if (!rams_new) + goto out; + + memcpy(rams_new, rams, + sizeof(struct resource) * rams_size); + kvfree(rams); + rams = rams_new; + rams_size = rams_new_size; + } + + rams[i].start = res.start; + rams[i++].end = res.end; + + start = res.end + 1; + } + + /* go reverse */ + for (i--; i >= 0; i--) { + ret = (*func)([i], arg); + if (ret) + break; + } + +out: + kvfree(rams); + return ret; +} + /* * This function calls the @func callback against all memory ranges, which * are ranges marked as IORESOURCE_MEM and IORESOUCE_BUSY. -- 2.41.0 ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec