Re: [PATCH v3 00/11] sysctl: treewide: constify ctl_table argument of sysctl handlers
On 2024-04-24 20:12:34+, Jakub Kicinski wrote: > On Tue, 23 Apr 2024 09:54:35 +0200 Thomas Weißschuh wrote: > > The series was split from my larger series sysctl-const series [0]. > > It only focusses on the proc_handlers but is an important step to be > > able to move all static definitions of ctl_table into .rodata. > > Split this per subsystem, please. Unfortunately this would introduce an enormous amount of code churn. The function prototypes for each callback have to stay consistent. So a another callback member ("proc_handler_new") is needed and users would be migrated to it gradually. But then *all* definitions of "struct ctl_table" throughout the tree need to be touched. In contrast, the proposed series only needs to change the handler implementations, not their usage sites. There are many, many more usage sites than handler implementations. Especially, as the majority of sysctl tables use the standard handlers (proc_dostring, proc_dobool, ...) and are not affected by the proposed aproach at all. And then we would have introduced a new handler name "proc_handler_new" and maybe have to do the whole thing again to rename it back to the original and well-known "proc_handler". Of course if somebody has a better aproach, I'm all ears. Thomas
Re: [PATCH v8 2/3] PCI/AER: Disable AER service on suspend
On Fri, Apr 19, 2024 at 4:35 AM Bjorn Helgaas wrote: > > On Tue, Apr 16, 2024 at 12:32:24PM +0800, Kai-Heng Feng wrote: > > When the power rail gets cut off, the hardware can create some electric > > noise on the link that triggers AER. If IRQ is shared between AER with > > PME, such AER noise will cause a spurious wakeup on system suspend. > > > > When the power rail gets back, the firmware of the device resets itself > > and can create unexpected behavior like sending PTM messages. For this > > case, the driver will always be too late to toggle off features should > > be disabled. > > > > As Per PCIe Base Spec 5.0, section 5.2, titled "Link State Power > > Management", TLP and DLLP transmission are disabled for a Link in L2/L3 > > Ready (D3hot), L2 (D3cold with aux power) and L3 (D3cold) states. So if > > the power will be turned off during suspend process, disable AER service > > and re-enable it during the resume process. This should not affect the > > basic functionality. > > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=209149 > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=216295 > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=218090 > > Signed-off-by: Kai-Heng Feng > > Thanks for reviving this series. I tried follow the history about > this, but there are at least two series that were very similar and I > can't put it all together. > > > --- > > v8: > > - Add more bug reports. > > > > v7: > > - Wording > > - Disable AER completely (again) if power will be turned off > > > > v6: > > v5: > > - Wording. > > > > v4: > > v3: > > - No change. > > > > v2: > > - Only disable AER IRQ. > > - No more check on PME IRQ#. > > - Use helper. > > > > drivers/pci/pcie/aer.c | 25 + > > 1 file changed, 25 insertions(+) > > > > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c > > index ac6293c24976..bea7818c2d1b 100644 > > --- a/drivers/pci/pcie/aer.c > > +++ b/drivers/pci/pcie/aer.c > > @@ -28,6 +28,7 @@ > > #include > > #include > > #include > > +#include > > #include > > #include > > #include > > @@ -1497,6 +1498,28 @@ static int aer_probe(struct pcie_device *dev) > > return 0; > > } > > > > +static int aer_suspend(struct pcie_device *dev) > > +{ > > + struct aer_rpc *rpc = get_service_data(dev); > > + struct pci_dev *pdev = rpc->rpd; > > + > > + if (pci_ancestor_pr3_present(pdev) || pm_suspend_via_firmware()) > > + aer_disable_rootport(rpc); > > Why do we check pci_ancestor_pr3_present(pdev) and > pm_suspend_via_firmware()? I'm getting pretty convinced that we need > to disable AER interrupts on suspend in general. I think it will be > better if we do that consistently on all platforms, not special cases > based on details of how we suspend. Sure. Will change in next revision. > > Also, why do we use aer_disable_rootport() instead of just > aer_disable_irq()? I think it's the interrupt that causes issues on > suspend. I see that there *were* some versions that used > aer_disable_irq(), but I can't find the reason it changed. Interrupt can cause system wakeup, if it's shared with PME. The reason why aer_disable_rootport() is used over aer_disable_irq() is that when the latter is used the error still gets logged during sleep cycle. Once the pcieport driver resumes, it invokes aer_root_reset() to reset the hierarchy, while the hierarchy hasn't resumed yet. So use aer_disable_rootport() to prevent such issue from happening. Kai-Heng > > > + > > + return 0; > > +} > > + > > +static int aer_resume(struct pcie_device *dev) > > +{ > > + struct aer_rpc *rpc = get_service_data(dev); > > + struct pci_dev *pdev = rpc->rpd; > > + > > + if (pci_ancestor_pr3_present(pdev) || pm_resume_via_firmware()) > > + aer_enable_rootport(rpc); > > + > > + return 0; > > +} > > + > > /** > > * aer_root_reset - reset Root Port hierarchy, RCEC, or RCiEP > > * @dev: pointer to Root Port, RCEC, or RCiEP > > @@ -1561,6 +1584,8 @@ static struct pcie_port_service_driver aerdriver = { > > .service= PCIE_PORT_SERVICE_AER, > > > > .probe = aer_probe, > > + .suspend= aer_suspend, > > + .resume = aer_resume, > > .remove = aer_remove, > > }; > > > > -- > > 2.34.1 > >
Re: [PATCH v3 00/11] sysctl: treewide: constify ctl_table argument of sysctl handlers
On Wed, Apr 24, 2024 at 08:12:34PM -0700, Jakub Kicinski wrote: > On Tue, 23 Apr 2024 09:54:35 +0200 Thomas Weißschuh wrote: > > The series was split from my larger series sysctl-const series [0]. > > It only focusses on the proc_handlers but is an important step to be > > able to move all static definitions of ctl_table into .rodata. > > Split this per subsystem, please. It is tricky to do that because it changes the first argument (ctl*) to const in the proc_handler function type defined in sysclt.h: " -typedef int proc_handler(struct ctl_table *ctl, int write, void *buffer, +typedef int proc_handler(const struct ctl_table *ctl, int write, void *buffer, size_t *lenp, loff_t *ppos); " This means that all the proc_handlers need to change at the same time. However, there is an alternative way to do this that allows chunking. We first define the proc_handler as a void pointer (casting it where it is being used) [1]. Then we could do the constification by subsystem (like Jakub proposes). Finally we can "revert the void pointer change so we don't have one size fit all pointer as our proc_handler [2]. Here are some comments about the alternative: 1. We would need to make the first argument const in all the derived proc_handlers [3] 2. There would be no undefined behavior for two reasons: 2.1. There is no case where we change the first argument. We know this because there are no compile errors after we make it const. 2.2. We would always go from non-const to const. This is the case because all the stuff that is unchanged in non-const. 3. If the idea sticks, it should go into mainline as one patchset. I would not like to have a void* proc_handler in a kernel release. 4. I think this is a "win/win" solution were the constification goes through and it is divided in such a way that it is reviewable. I would really like to hear what ppl think about this "heretic" alternative. @Thomas, @Luis, @Kees @Jakub? Best [1] https://git.kernel.org/pub/scm/linux/kernel/git/joel.granados/linux.git/commit/?h=jag/constfy_treewide_alternative&id=4a383503b1ea650d4e12c1f5838974e879f5aa6f [2] https://git.kernel.org/pub/scm/linux/kernel/git/joel.granados/linux.git/commit/?h=jag/constfy_treewide_alternative&id=a3be65973d27ec2933b9e81e1bec60be3a9b460d [3] proc_dostring, proc_dobool, proc_dointvec -- Joel Granados signature.asc Description: PGP signature
Re: [PATCH v5 RESEND] arch/powerpc/kvm: Add support for reading VPA counters for pseries guests
On Wed, Apr 24, 2024 at 11:08:38AM +0530, Gautam Menghani wrote: > On Mon, Apr 22, 2024 at 09:15:02PM +0530, Naveen N Rao wrote: > > On Tue, Apr 02, 2024 at 12:36:54PM +0530, Gautam Menghani wrote: > > > static int kvmhv_vcpu_entry_nestedv2(struct kvm_vcpu *vcpu, u64 > > > time_limit, > > >unsigned long lpcr, u64 *tb) > > > { > > > @@ -4130,6 +4161,11 @@ static int kvmhv_vcpu_entry_nestedv2(struct > > > kvm_vcpu *vcpu, u64 time_limit, > > > kvmppc_gse_put_u64(io->vcpu_run_input, KVMPPC_GSID_LPCR, lpcr); > > > > > > accumulate_time(vcpu, &vcpu->arch.in_guest); > > > + > > > + /* Enable the guest host context switch time tracking */ > > > + if (unlikely(trace_kvmppc_vcpu_exit_cs_time_enabled())) > > > + kvmhv_set_l2_accumul(1); > > > + > > > rc = plpar_guest_run_vcpu(0, vcpu->kvm->arch.lpid, vcpu->vcpu_id, > > > &trap, &i); > > > > > > @@ -4156,6 +4192,10 @@ static int kvmhv_vcpu_entry_nestedv2(struct > > > kvm_vcpu *vcpu, u64 time_limit, > > > > > > timer_rearm_host_dec(*tb); > > > > > > + /* Record context switch and guest_run_time data */ > > > + if (kvmhv_get_l2_accumul()) > > > + do_trace_nested_cs_time(vcpu); > > > + > > > return trap; > > > } > > > > I'm assuming the counters in VPA are cumulative, since you are zero'ing > > them out on exit. If so, I think a better way to implement this is to > > use TRACE_EVENT_FN() and provide tracepoint registration and > > unregistration functions. You can then enable the counters once during > > registration and avoid repeated writes to the VPA area. With that, you > > also won't need to do anything before vcpu entry. If you maintain > > previous values, you can calculate the delta and emit the trace on vcpu > > exit. The values in VPA area can then serve as the cumulative values. > > > > This approach will have a problem. The context switch times are reported > in the L1 LPAR's CPU's VPA area. Consider the following scenario: > > 1. L1 has 2 cpus, and L2 has 1 cpu > 2. L2 runs on L1's cpu0 for a few seconds, and the counter values go to > 1 million > 3. We are maintaining a copy of values of VPA in separate variables, so > those variables also have 1 million. > 4. Now if L2's vcpu is migrated to another L1 cpu, that L1 cpu's VPA > counters will start from 0, so if we try to get delta value, we will end > up doing 0 - 1 million, which would be wrong. I'm assuming you mean migrating the task. If we maintain the previous readings in paca, it should work I think. > > The aggregation logic in this patch works as we zero out the VPA after > every switch, and maintain aggregation in a vcpu->arch Are the cumulative values of the VPA counters of no significance? We lose those with this approach. Not sure if we care. - Naveen
Re: [PATCH v13 25/35] KVM: selftests: Convert lib's mem regions to KVM_SET_USER_MEMORY_REGION2
On Fri, Oct 27, 2023 at 11:22:07AM -0700, Sean Christopherson wrote: > Use KVM_SET_USER_MEMORY_REGION2 throughout KVM's selftests library so that > support for guest private memory can be added without needing an entirely > separate set of helpers. > > Note, this obviously makes selftests backwards-incompatible with older KVM ^^ > versions from this point forward. ^ Is there a way we could disable the tests on older kernels instead of making them fail? Check uname or something? There is probably a standard way to do this... It's these tests which fail. kvm_aarch32_id_regs kvm_access_tracking_perf_test kvm_arch_timer kvm_debug-exceptions kvm_demand_paging_test kvm_dirty_log_perf_test kvm_dirty_log_test kvm_guest_print_test kvm_hypercalls kvm_kvm_page_table_test kvm_memslot_modification_stress_test kvm_memslot_perf_test kvm_page_fault_test kvm_psci_test kvm_rseq_test kvm_smccc_filter kvm_steal_time kvm_vgic_init kvm_vgic_irq kvm_vpmu_counter_access regards, dan carpenter
Re: [PATCH v13 25/35] KVM: selftests: Convert lib's mem regions to KVM_SET_USER_MEMORY_REGION2
On 4/25/24 08:12, Dan Carpenter wrote: On Fri, Oct 27, 2023 at 11:22:07AM -0700, Sean Christopherson wrote: Use KVM_SET_USER_MEMORY_REGION2 throughout KVM's selftests library so that support for guest private memory can be added without needing an entirely separate set of helpers. Note, this obviously makes selftests backwards-incompatible with older KVM ^^ versions from this point forward. ^ Is there a way we could disable the tests on older kernels instead of making them fail? Check uname or something? There is probably a standard way to do this... It's these tests which fail. They shouldn't fail - the tests should be skipped on older kernels. If it is absolutely necessary to dd uname to check kernel version, refer to zram/zram_lib.sh for an example. thanks, -- Shuah
Re: [PATCH v13 25/35] KVM: selftests: Convert lib's mem regions to KVM_SET_USER_MEMORY_REGION2
On Thu, Apr 25, 2024, Shuah Khan wrote: > On 4/25/24 08:12, Dan Carpenter wrote: > > On Fri, Oct 27, 2023 at 11:22:07AM -0700, Sean Christopherson wrote: > > > Use KVM_SET_USER_MEMORY_REGION2 throughout KVM's selftests library so that > > > support for guest private memory can be added without needing an entirely > > > separate set of helpers. > > > > > > Note, this obviously makes selftests backwards-incompatible with older KVM > > > > ^^ > > > versions from this point forward. > >^ > > > > Is there a way we could disable the tests on older kernels instead of > > making them fail? Check uname or something? There is probably a > > standard way to do this... It's these tests which fail. > > They shouldn't fail - the tests should be skipped on older kernels. Ah, that makes sense. Except for a few outliers that aren't all that interesting, all KVM selftests create memslots, so I'm tempted to just make it a hard requirement to spare us headache, e.g. diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c index b2262b5fad9e..4b2038b1f11f 100644 --- a/tools/testing/selftests/kvm/lib/kvm_util.c +++ b/tools/testing/selftests/kvm/lib/kvm_util.c @@ -2306,6 +2306,9 @@ void __attribute((constructor)) kvm_selftest_init(void) /* Tell stdout not to buffer its content. */ setbuf(stdout, NULL); + __TEST_REQUIRE(kvm_has_cap(KVM_CAP_USER_MEMORY2), + "KVM selftests from v6.8+ require KVM_SET_USER_MEMORY_REGION2"); + kvm_selftest_arch_init(); } -- but it's also easy enough to be more precise and skip only those that actually create memslots. diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c index b2262b5fad9e..b21152adf448 100644 --- a/tools/testing/selftests/kvm/lib/kvm_util.c +++ b/tools/testing/selftests/kvm/lib/kvm_util.c @@ -944,6 +944,9 @@ int __vm_set_user_memory_region2(struct kvm_vm *vm, uint32_t slot, uint32_t flag .guest_memfd_offset = guest_memfd_offset, }; + __TEST_REQUIRE(kvm_has_cap(KVM_CAP_USER_MEMORY2), + "KVM selftests from v6.8+ require KVM_SET_USER_MEMORY_REGION2"); + return ioctl(vm->fd, KVM_SET_USER_MEMORY_REGION2, ®ion); } @@ -970,6 +973,9 @@ void vm_mem_add(struct kvm_vm *vm, enum vm_mem_backing_src_type src_type, size_t mem_size = npages * vm->page_size; size_t alignment; + __TEST_REQUIRE(kvm_has_cap(KVM_CAP_USER_MEMORY2), + "KVM selftests from v6.8+ require KVM_SET_USER_MEMORY_REGION2"); + TEST_ASSERT(vm_adjust_num_guest_pages(vm->mode, npages) == npages, "Number of guest pages is not compatible with the host. " "Try npages=%d", vm_adjust_num_guest_pages(vm->mode, npages)); --
Re: [PATCH v13 25/35] KVM: selftests: Convert lib's mem regions to KVM_SET_USER_MEMORY_REGION2
On 4/25/24 09:09, Sean Christopherson wrote: On Thu, Apr 25, 2024, Shuah Khan wrote: On 4/25/24 08:12, Dan Carpenter wrote: On Fri, Oct 27, 2023 at 11:22:07AM -0700, Sean Christopherson wrote: Use KVM_SET_USER_MEMORY_REGION2 throughout KVM's selftests library so that support for guest private memory can be added without needing an entirely separate set of helpers. Note, this obviously makes selftests backwards-incompatible with older KVM ^^ versions from this point forward. ^ Is there a way we could disable the tests on older kernels instead of making them fail? Check uname or something? There is probably a standard way to do this... It's these tests which fail. They shouldn't fail - the tests should be skipped on older kernels. Ah, that makes sense. Except for a few outliers that aren't all that interesting, all KVM selftests create memslots, so I'm tempted to just make it a hard requirement to spare us headache, e.g. diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c index b2262b5fad9e..4b2038b1f11f 100644 --- a/tools/testing/selftests/kvm/lib/kvm_util.c +++ b/tools/testing/selftests/kvm/lib/kvm_util.c @@ -2306,6 +2306,9 @@ void __attribute((constructor)) kvm_selftest_init(void) /* Tell stdout not to buffer its content. */ setbuf(stdout, NULL); + __TEST_REQUIRE(kvm_has_cap(KVM_CAP_USER_MEMORY2), + "KVM selftests from v6.8+ require KVM_SET_USER_MEMORY_REGION2"); + kvm_selftest_arch_init(); } -- but it's also easy enough to be more precise and skip only those that actually create memslots. This is approach is what is recommended in kselfest document. Rubn as many tests as possible and skip the ones that can't be run due to unmet dependencies. diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c index b2262b5fad9e..b21152adf448 100644 --- a/tools/testing/selftests/kvm/lib/kvm_util.c +++ b/tools/testing/selftests/kvm/lib/kvm_util.c @@ -944,6 +944,9 @@ int __vm_set_user_memory_region2(struct kvm_vm *vm, uint32_t slot, uint32_t flag .guest_memfd_offset = guest_memfd_offset, }; + __TEST_REQUIRE(kvm_has_cap(KVM_CAP_USER_MEMORY2), + "KVM selftests from v6.8+ require KVM_SET_USER_MEMORY_REGION2"); + return ioctl(vm->fd, KVM_SET_USER_MEMORY_REGION2, ®ion); } @@ -970,6 +973,9 @@ void vm_mem_add(struct kvm_vm *vm, enum vm_mem_backing_src_type src_type, size_t mem_size = npages * vm->page_size; size_t alignment; + __TEST_REQUIRE(kvm_has_cap(KVM_CAP_USER_MEMORY2), + "KVM selftests from v6.8+ require KVM_SET_USER_MEMORY_REGION2"); + TEST_ASSERT(vm_adjust_num_guest_pages(vm->mode, npages) == npages, "Number of guest pages is not compatible with the host. " "Try npages=%d", vm_adjust_num_guest_pages(vm->mode, npages)); -- thanks, -- Shuah
Re: [PATCH v3 00/11] sysctl: treewide: constify ctl_table argument of sysctl handlers
Hi Joel, On 2024-04-25 13:04:12+, Joel Granados wrote: > On Wed, Apr 24, 2024 at 08:12:34PM -0700, Jakub Kicinski wrote: > > On Tue, 23 Apr 2024 09:54:35 +0200 Thomas Weißschuh wrote: > > > The series was split from my larger series sysctl-const series [0]. > > > It only focusses on the proc_handlers but is an important step to be > > > able to move all static definitions of ctl_table into .rodata. > > > > Split this per subsystem, please. > It is tricky to do that because it changes the first argument (ctl*) to > const in the proc_handler function type defined in sysclt.h: > " > -typedef int proc_handler(struct ctl_table *ctl, int write, void *buffer, > +typedef int proc_handler(const struct ctl_table *ctl, int write, void > *buffer, > size_t *lenp, loff_t *ppos); > " > This means that all the proc_handlers need to change at the same time. > > However, there is an alternative way to do this that allows chunking. We > first define the proc_handler as a void pointer (casting it where it is > being used) [1]. Then we could do the constification by subsystem (like > Jakub proposes). Finally we can "revert the void pointer change so we > don't have one size fit all pointer as our proc_handler [2]. > > Here are some comments about the alternative: > 1. We would need to make the first argument const in all the derived >proc_handlers [3] > 2. There would be no undefined behavior for two reasons: >2.1. There is no case where we change the first argument. We know > this because there are no compile errors after we make it const. >2.2. We would always go from non-const to const. This is the case > because all the stuff that is unchanged in non-const. > 3. If the idea sticks, it should go into mainline as one patchset. I >would not like to have a void* proc_handler in a kernel release. > 4. I think this is a "win/win" solution were the constification goes >through and it is divided in such a way that it is reviewable. > > I would really like to hear what ppl think about this "heretic" > alternative. @Thomas, @Luis, @Kees @Jakub? Thanks for that alternative, I'm not a big fan though. Besides the wonky syntax, Control Flow Integrity should trap on this construct. Functions are called through different pointers than their actual types which is exactly what CFI is meant to prevent. Maybe people find it easier to review when using "--word-diff" and/or "-U0" with git diff/show. There is really nothing going an besides adding a few "const"s. But if the consensus prefers this solution, I'll be happy to adopt it. > [1] > https://git.kernel.org/pub/scm/linux/kernel/git/joel.granados/linux.git/commit/?h=jag/constfy_treewide_alternative&id=4a383503b1ea650d4e12c1f5838974e879f5aa6f > [2] > https://git.kernel.org/pub/scm/linux/kernel/git/joel.granados/linux.git/commit/?h=jag/constfy_treewide_alternative&id=a3be65973d27ec2933b9e81e1bec60be3a9b460d > [3] proc_dostring, proc_dobool, proc_dointvec Thomas
[PATCH v19 1/6] crash: forward memory_notify arg to arch crash hotplug handler
In the event of memory hotplug or online/offline events, the crash memory hotplug notifier `crash_memhp_notifier()` receives a `memory_notify` object but doesn't forward that object to the generic and architecture-specific crash hotplug handler. The `memory_notify` object contains the starting PFN (Page Frame Number) and the number of pages in the hot-removed memory. This information is necessary for architectures like PowerPC to update/recreate the kdump image, specifically `elfcorehdr`. So update the function signature of `crash_handle_hotplug_event()` and `arch_crash_handle_hotplug_event()` to accept the `memory_notify` object as an argument from crash memory hotplug notifier. Since no such object is available in the case of CPU hotplug event, the crash CPU hotplug notifier `crash_cpuhp_online()` passes NULL to the crash hotplug handler. Signed-off-by: Sourabh Jain Acked-by: Baoquan He Acked-by: Hari Bathini Cc: Akhil Raj Cc: Andrew Morton Cc: Aneesh Kumar K.V Cc: Borislav Petkov (AMD) Cc: Boris Ostrovsky Cc: Christophe Leroy Cc: Dave Hansen Cc: Dave Young Cc: David Hildenbrand Cc: Greg Kroah-Hartman Cc: Laurent Dufour Cc: Mahesh Salgaonkar Cc: Michael Ellerman Cc: Mimi Zohar Cc: Naveen N Rao Cc: Oscar Salvador Cc: Stephen Rothwell Cc: Thomas Gleixner Cc: Valentin Schneider Cc: Vivek Goyal Cc: ke...@lists.infradead.org Cc: x...@kernel.org --- * No changes in v19. arch/x86/include/asm/kexec.h | 2 +- arch/x86/kernel/crash.c | 4 +++- include/linux/crash_core.h | 2 +- kernel/crash_core.c | 14 +++--- 4 files changed, 12 insertions(+), 10 deletions(-) diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h index 91ca9a9ee3a2..cb1320ebbc23 100644 --- a/arch/x86/include/asm/kexec.h +++ b/arch/x86/include/asm/kexec.h @@ -207,7 +207,7 @@ int arch_kimage_file_post_load_cleanup(struct kimage *image); extern void kdump_nmi_shootdown_cpus(void); #ifdef CONFIG_CRASH_HOTPLUG -void arch_crash_handle_hotplug_event(struct kimage *image); +void arch_crash_handle_hotplug_event(struct kimage *image, void *arg); #define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event #ifdef CONFIG_HOTPLUG_CPU diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c index e74d0c4286c1..2a682fe86352 100644 --- a/arch/x86/kernel/crash.c +++ b/arch/x86/kernel/crash.c @@ -432,10 +432,12 @@ unsigned int arch_crash_get_elfcorehdr_size(void) /** * arch_crash_handle_hotplug_event() - Handle hotplug elfcorehdr changes * @image: a pointer to kexec_crash_image + * @arg: struct memory_notify handler for memory hotplug case and + * NULL for CPU hotplug case. * * Prepare the new elfcorehdr and replace the existing elfcorehdr. */ -void arch_crash_handle_hotplug_event(struct kimage *image) +void arch_crash_handle_hotplug_event(struct kimage *image, void *arg) { void *elfbuf = NULL, *old_elfcorehdr; unsigned long nr_mem_ranges; diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h index d33352c2e386..647e928efee8 100644 --- a/include/linux/crash_core.h +++ b/include/linux/crash_core.h @@ -37,7 +37,7 @@ static inline void arch_kexec_unprotect_crashkres(void) { } #ifndef arch_crash_handle_hotplug_event -static inline void arch_crash_handle_hotplug_event(struct kimage *image) { } +static inline void arch_crash_handle_hotplug_event(struct kimage *image, void *arg) { } #endif int crash_check_update_elfcorehdr(void); diff --git a/kernel/crash_core.c b/kernel/crash_core.c index 78b5dc7cee3a..70fa8111a9d6 100644 --- a/kernel/crash_core.c +++ b/kernel/crash_core.c @@ -534,7 +534,7 @@ int crash_check_update_elfcorehdr(void) * list of segments it checks (since the elfcorehdr changes and thus * would require an update to purgatory itself to update the digest). */ -static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu) +static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu, void *arg) { struct kimage *image; @@ -596,7 +596,7 @@ static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu) image->hp_action = hp_action; /* Now invoke arch-specific update handler */ - arch_crash_handle_hotplug_event(image); + arch_crash_handle_hotplug_event(image, arg); /* No longer handling a hotplug event */ image->hp_action = KEXEC_CRASH_HP_NONE; @@ -612,17 +612,17 @@ static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu) crash_hotplug_unlock(); } -static int crash_memhp_notifier(struct notifier_block *nb, unsigned long val, void *v) +static int crash_memhp_notifier(struct notifier_block *nb, unsigned long val, void *arg) { switch (val) { case MEM_ONLINE: crash_handle_hotplug_event(KEXEC_CRASH_HP_ADD_MEMORY, - KEXEC_CRASH_HP_INVALID_CPU); + KEXEC_CRASH_HP_INVALID_CPU, arg)
[PATCH v19 0/6]powerpc/crash: Kernel handling of CPU and memory hotplug
Commit 247262756121 ("crash: add generic infrastructure for crash hotplug support") added a generic infrastructure that allows architectures to selectively update the kdump image component during CPU or memory add/remove events within the kernel itself. This patch series adds crash hotplug handler for PowerPC and enable support to update the kdump image on CPU/Memory add/remove events. Among the 6 patches in this series, the first two patches make changes to the generic crash hotplug handler to assist PowerPC in adding support for this feature. The last four patches add support for this feature. The following section outlines the problem addressed by this patch series, along with the current solution, its shortcomings, and the proposed resolution. Problem: Due to CPU/Memory hotplug or online/offline events the elfcorehdr (which describes the CPUs and memory of the crashed kernel) and FDT (Flattened Device Tree) of kdump image becomes outdated. Consequently, attempting dump collection with an outdated elfcorehdr or FDT can lead to failed or inaccurate dump collection. Going forward CPU hotplug or online/offline events are referred as CPU/Memory add/remove events. Existing solution and its shortcoming: == The current solution to address the above issue involves monitoring the CPU/memory add/remove events in userspace using udev rules and whenever there are changes in CPU and memory resources, the entire kdump image is loaded again. The kdump image includes kernel, initrd, elfcorehdr, FDT, purgatory. Given that only elfcorehdr and FDT get outdated due to CPU/Memory add/remove events, reloading the entire kdump image is inefficient. More importantly, kdump remains inactive for a substantial amount of time until the kdump reload completes. Proposed solution: == Instead of initiating a full kdump image reload from userspace on CPU/Memory hotplug and online/offline events, the proposed solution aims to update only the necessary kdump image component within the kernel itself. Git tree for testing: = https://github.com/sourabhjains/linux/tree/kdump-in-kernel-crash-update-v19 Above tree is rebased on top of v6.9-rc5 branch. To realize this feature, the kdump udev rule must be updated. On RHEL, add the following two lines at the top of the "/usr/lib/udev/rules.d/98-kexec.rules" file. SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end" SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end" With the above change to the kdump udev rule, kdump reload is avoided during CPU/Memory add/remove events if this feature is enabled in the kernel. Note: only kexec_file_load syscall will work. For kexec_load minor changes are required in kexec tool. Changelog: -- v19: - Fix a build warning, remove NULL check before freeing memory. 6/6 Reported by kernel test robot - Rebase it to 6.9-rc5 v18: [No functional changes] - https://lore.kernel.org/all/20240326055413.186534-1-sourabhj...@linux.ibm.com/ - Update a comment in 2/6. - Describe the clean-up done on x86 in patch description 2/6. - Fix a minor typo in the patch description of 3/6. v17: [https://lore.kernel.org/all/20240226084118.16310-1-sourabhj...@linux.ibm.com/] - Rebase the patch series on top linux-next tree and below patch series https://lore.kernel.org/all/20240213113150.1148276-1-hbath...@linux.ibm.com/ - Split 0003 patch from v16 into two patches 1. Move get_crash_memory_ranges() along with other *_memory_ranges() functions to ranges.c and make them public. 2. Make update_cpus_node function public and take this function out of file_load_64.c - Keep arch_crash_hotplug_support in crash.c instead of core_64.c [05/06] - Use CONFIG_CRASH_MAX_MEMORY_RANGES to find extra elfcorehdr size [06/06] v16: [https://lore.kernel.org/all/20240217081452.164571-1-sourabhj...@linux.ibm.com/] - Remove the unused #define `crash_hotplug_cpu_support` and `crash_hotplug_memory_support` in `arch/x86/include/asm/kexec.h`. - Document why two kexec flag bits are used in `arch_crash_hotplug_memory_support` (x86). - Use a switch case to handle different hotplug operations in `arch_crash_handle_hotplug_event` for PowerPC. - Fix a typo in 4/5. v15: - Remove the patch that adds a new kexec flag for FDT update. - Introduce a generic kexec flag bit to share hotplug support intent between the kexec tool and the kernel for the kexec_load syscall. (2/5) - Introduce an architecture-specific handler to process the kexec flag for crash hotplug support. (2/5) - Rename the @update_elfcorehdr member of the struct kimage to @hotplug_support. (2/5) - Use a common function to advertise hotplug support for both CPU and Memory. (2/5) v14: - Fix build warnings by including necessary header files - Rebase to v6.7-rc5 v13: - Fix a build warning, take ranges.c out
[PATCH v19 2/6] crash: add a new kexec flag for hotplug support
Commit a72bbec70da2 ("crash: hotplug support for kexec_load()") introduced a new kexec flag, `KEXEC_UPDATE_ELFCOREHDR`. Kexec tool uses this flag to indicate to the kernel that it is safe to modify the elfcorehdr of the kdump image loaded using the kexec_load system call. However, it is possible that architectures may need to update kexec segments other then elfcorehdr. For example, FDT (Flatten Device Tree) on PowerPC. Introducing a new kexec flag for every new kexec segment may not be a good solution. Hence, a generic kexec flag bit, `KEXEC_CRASH_HOTPLUG_SUPPORT`, is introduced to share the CPU/Memory hotplug support intent between the kexec tool and the kernel for the kexec_load system call. Now we have two kexec flags that enables crash hotplug support for kexec_load system call. First is KEXEC_UPDATE_ELFCOREHDR (only used in x86), and second is KEXEC_CRASH_HOTPLUG_SUPPORT (for all architectures). To simplify the process of finding and reporting the crash hotplug support the following changes are introduced. 1. Define arch specific function to process the kexec flags and determine crash hotplug support 2. Rename the @update_elfcorehdr member of struct kimage to @hotplug_support and populate it for both kexec_load and kexec_file_load syscalls, because architecture can update more than one kexec segment 3. Let generic function crash_check_hotplug_support report hotplug support for loaded kdump image based on value of @hotplug_support To bring the x86 crash hotplug support in line with the above points, the following changes have been made: - Introduce the arch_crash_hotplug_support function to process kexec flags and determine crash hotplug support - Remove the arch_crash_hotplug_[cpu|memory]_support functions Signed-off-by: Sourabh Jain Acked-by: Baoquan He Acked-by: Hari Bathini Cc: Akhil Raj Cc: Andrew Morton Cc: Aneesh Kumar K.V Cc: Borislav Petkov (AMD) Cc: Boris Ostrovsky Cc: Christophe Leroy Cc: Dave Hansen Cc: Dave Young Cc: David Hildenbrand Cc: Eric DeVolder Cc: Greg Kroah-Hartman Cc: Laurent Dufour Cc: Mahesh Salgaonkar Cc: Michael Ellerman Cc: Mimi Zohar Cc: Naveen N Rao Cc: Oscar Salvador Cc: Stephen Rothwell Cc: Thomas Gleixner Cc: Valentin Schneider Cc: Vivek Goyal Cc: ke...@lists.infradead.org Cc: x...@kernel.org --- * No changes in v19. arch/x86/include/asm/kexec.h | 11 ++- arch/x86/kernel/crash.c | 28 +--- drivers/base/cpu.c | 2 +- drivers/base/memory.c| 2 +- include/linux/crash_core.h | 13 ++--- include/linux/kexec.h| 11 +++ include/uapi/linux/kexec.h | 1 + kernel/crash_core.c | 15 ++- kernel/kexec.c | 4 ++-- kernel/kexec_file.c | 5 + 10 files changed, 48 insertions(+), 44 deletions(-) diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h index cb1320ebbc23..ae5482a2f0ca 100644 --- a/arch/x86/include/asm/kexec.h +++ b/arch/x86/include/asm/kexec.h @@ -210,15 +210,8 @@ extern void kdump_nmi_shootdown_cpus(void); void arch_crash_handle_hotplug_event(struct kimage *image, void *arg); #define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event -#ifdef CONFIG_HOTPLUG_CPU -int arch_crash_hotplug_cpu_support(void); -#define crash_hotplug_cpu_support arch_crash_hotplug_cpu_support -#endif - -#ifdef CONFIG_MEMORY_HOTPLUG -int arch_crash_hotplug_memory_support(void); -#define crash_hotplug_memory_support arch_crash_hotplug_memory_support -#endif +int arch_crash_hotplug_support(struct kimage *image, unsigned long kexec_flags); +#define arch_crash_hotplug_support arch_crash_hotplug_support unsigned int arch_crash_get_elfcorehdr_size(void); #define crash_get_elfcorehdr_size arch_crash_get_elfcorehdr_size diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c index 2a682fe86352..f06501445cd9 100644 --- a/arch/x86/kernel/crash.c +++ b/arch/x86/kernel/crash.c @@ -402,20 +402,26 @@ int crash_load_segments(struct kimage *image) #undef pr_fmt #define pr_fmt(fmt) "crash hp: " fmt -/* These functions provide the value for the sysfs crash_hotplug nodes */ -#ifdef CONFIG_HOTPLUG_CPU -int arch_crash_hotplug_cpu_support(void) +int arch_crash_hotplug_support(struct kimage *image, unsigned long kexec_flags) { - return crash_check_update_elfcorehdr(); -} -#endif -#ifdef CONFIG_MEMORY_HOTPLUG -int arch_crash_hotplug_memory_support(void) -{ - return crash_check_update_elfcorehdr(); -} +#ifdef CONFIG_KEXEC_FILE + if (image->file_mode) + return 1; #endif + /* +* Initially, crash hotplug support for kexec_load was added +* with the KEXEC_UPDATE_ELFCOREHDR flag. Later, this +* functionality was expanded to accommodate multiple kexec +* segment updates, leading to the introduction of the +* KEXEC_CRASH_HOTPLUG_SUPPORT kexec flag bit. Consequently, +* when the kexec tool sends either o
[PATCH v19 4/6] PowerPC/kexec: make the update_cpus_node() function public
Move the update_cpus_node() from kexec/{file_load_64.c => core_64.c} to allow other kexec components to use it. Later in the series, this function is used for in-kernel updates to the kdump image during CPU/memory hotplug or online/offline events for both kexec_load and kexec_file_load syscalls. No functional changes are intended. Signed-off-by: Sourabh Jain Acked-by: Hari Bathini Cc: Akhil Raj Cc: Andrew Morton Cc: Aneesh Kumar K.V Cc: Baoquan He Cc: Borislav Petkov (AMD) Cc: Boris Ostrovsky Cc: Christophe Leroy Cc: Dave Hansen Cc: Dave Young Cc: David Hildenbrand Cc: Greg Kroah-Hartman Cc: Laurent Dufour Cc: Mahesh Salgaonkar Cc: Michael Ellerman Cc: Mimi Zohar Cc: Naveen N Rao Cc: Oscar Salvador Cc: Stephen Rothwell Cc: Thomas Gleixner Cc: Valentin Schneider Cc: Vivek Goyal Cc: ke...@lists.infradead.org Cc: x...@kernel.org --- * No changes in v19. arch/powerpc/include/asm/kexec.h | 4 ++ arch/powerpc/kexec/core_64.c | 91 +++ arch/powerpc/kexec/file_load_64.c | 87 - 3 files changed, 95 insertions(+), 87 deletions(-) diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h index fdb90e24dc74..d9ff4d0e392d 100644 --- a/arch/powerpc/include/asm/kexec.h +++ b/arch/powerpc/include/asm/kexec.h @@ -185,6 +185,10 @@ static inline void crash_send_ipi(void (*crash_ipi_callback)(struct pt_regs *)) #endif /* CONFIG_CRASH_DUMP */ +#if defined(CONFIG_KEXEC_FILE) || defined(CONFIG_CRASH_DUMP) +int update_cpus_node(void *fdt); +#endif + #ifdef CONFIG_PPC_BOOK3S_64 #include #endif diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c index 762e4d09aacf..85050be08a23 100644 --- a/arch/powerpc/kexec/core_64.c +++ b/arch/powerpc/kexec/core_64.c @@ -17,6 +17,7 @@ #include #include #include +#include #include #include @@ -30,6 +31,7 @@ #include #include #include +#include int machine_kexec_prepare(struct kimage *image) { @@ -419,3 +421,92 @@ static int __init export_htab_values(void) } late_initcall(export_htab_values); #endif /* CONFIG_PPC_64S_HASH_MMU */ + +#if defined(CONFIG_KEXEC_FILE) || defined(CONFIG_CRASH_DUMP) +/** + * add_node_props - Reads node properties from device node structure and add + * them to fdt. + * @fdt:Flattened device tree of the kernel + * @node_offset:offset of the node to add a property at + * @dn: device node pointer + * + * Returns 0 on success, negative errno on error. + */ +static int add_node_props(void *fdt, int node_offset, const struct device_node *dn) +{ + int ret = 0; + struct property *pp; + + if (!dn) + return -EINVAL; + + for_each_property_of_node(dn, pp) { + ret = fdt_setprop(fdt, node_offset, pp->name, pp->value, pp->length); + if (ret < 0) { + pr_err("Unable to add %s property: %s\n", pp->name, fdt_strerror(ret)); + return ret; + } + } + return ret; +} + +/** + * update_cpus_node - Update cpus node of flattened device tree using of_root + *device node. + * @fdt: Flattened device tree of the kernel. + * + * Returns 0 on success, negative errno on error. + */ +int update_cpus_node(void *fdt) +{ + struct device_node *cpus_node, *dn; + int cpus_offset, cpus_subnode_offset, ret = 0; + + cpus_offset = fdt_path_offset(fdt, "/cpus"); + if (cpus_offset < 0 && cpus_offset != -FDT_ERR_NOTFOUND) { + pr_err("Malformed device tree: error reading /cpus node: %s\n", + fdt_strerror(cpus_offset)); + return cpus_offset; + } + + if (cpus_offset > 0) { + ret = fdt_del_node(fdt, cpus_offset); + if (ret < 0) { + pr_err("Error deleting /cpus node: %s\n", fdt_strerror(ret)); + return -EINVAL; + } + } + + /* Add cpus node to fdt */ + cpus_offset = fdt_add_subnode(fdt, fdt_path_offset(fdt, "/"), "cpus"); + if (cpus_offset < 0) { + pr_err("Error creating /cpus node: %s\n", fdt_strerror(cpus_offset)); + return -EINVAL; + } + + /* Add cpus node properties */ + cpus_node = of_find_node_by_path("/cpus"); + ret = add_node_props(fdt, cpus_offset, cpus_node); + of_node_put(cpus_node); + if (ret < 0) + return ret; + + /* Loop through all subnodes of cpus and add them to fdt */ + for_each_node_by_type(dn, "cpu") { + cpus_subnode_offset = fdt_add_subnode(fdt, cpus_offset, dn->full_name); + if (cpus_subnode_offset < 0) { + pr_err("Unable to add %s subnode: %s\n", dn->full_name, + fdt_strerror(cpus_subnode_offset)); + ret = cpus_subnode_offset; +
[PATCH v19 3/6] powerpc/kexec: move *_memory_ranges functions to ranges.c
Move the following functions form kexec/{file_load_64.c => ranges.c} and make them public so that components other than KEXEC_FILE can also use these functions. 1. get_exclude_memory_ranges 2. get_reserved_memory_ranges 3. get_crash_memory_ranges 4. get_usable_memory_ranges Later in the series get_crash_memory_ranges function is utilized for in-kernel updates to kdump image during CPU/Memory hotplug or online/offline events for both kexec_load and kexec_file_load syscalls. Since the above functions are moved to ranges.c, some of the helper functions in ranges.c are no longer required to be public. Mark them as static and removed them from kexec_ranges.h header file. Finally, remove the CONFIG_KEXEC_FILE build dependency for range.c because it is required for other config, such as CONFIG_CRASH_DUMP. No functional changes are intended. Signed-off-by: Sourabh Jain Acked-by: Hari Bathini Cc: Akhil Raj Cc: Andrew Morton Cc: Aneesh Kumar K.V Cc: Baoquan He Cc: Borislav Petkov (AMD) Cc: Boris Ostrovsky Cc: Christophe Leroy Cc: Dave Hansen Cc: Dave Young Cc: David Hildenbrand Cc: Greg Kroah-Hartman Cc: Laurent Dufour Cc: Mahesh Salgaonkar Cc: Michael Ellerman Cc: Mimi Zohar Cc: Naveen N Rao Cc: Oscar Salvador Cc: Stephen Rothwell Cc: Thomas Gleixner Cc: Valentin Schneider Cc: Vivek Goyal Cc: ke...@lists.infradead.org Cc: x...@kernel.org --- * No changes in v19. arch/powerpc/include/asm/kexec_ranges.h | 19 +- arch/powerpc/kexec/Makefile | 4 +- arch/powerpc/kexec/file_load_64.c | 190 arch/powerpc/kexec/ranges.c | 227 +++- 4 files changed, 224 insertions(+), 216 deletions(-) diff --git a/arch/powerpc/include/asm/kexec_ranges.h b/arch/powerpc/include/asm/kexec_ranges.h index f83866a19e87..8489e844b447 100644 --- a/arch/powerpc/include/asm/kexec_ranges.h +++ b/arch/powerpc/include/asm/kexec_ranges.h @@ -7,19 +7,8 @@ void sort_memory_ranges(struct crash_mem *mrngs, bool merge); struct crash_mem *realloc_mem_ranges(struct crash_mem **mem_ranges); int add_mem_range(struct crash_mem **mem_ranges, u64 base, u64 size); -int add_tce_mem_ranges(struct crash_mem **mem_ranges); -int add_initrd_mem_range(struct crash_mem **mem_ranges); -#ifdef CONFIG_PPC_64S_HASH_MMU -int add_htab_mem_range(struct crash_mem **mem_ranges); -#else -static inline int add_htab_mem_range(struct crash_mem **mem_ranges) -{ - return 0; -} -#endif -int add_kernel_mem_range(struct crash_mem **mem_ranges); -int add_rtas_mem_range(struct crash_mem **mem_ranges); -int add_opal_mem_range(struct crash_mem **mem_ranges); -int add_reserved_mem_ranges(struct crash_mem **mem_ranges); - +int get_exclude_memory_ranges(struct crash_mem **mem_ranges); +int get_reserved_memory_ranges(struct crash_mem **mem_ranges); +int get_crash_memory_ranges(struct crash_mem **mem_ranges); +int get_usable_memory_ranges(struct crash_mem **mem_ranges); #endif /* _ASM_POWERPC_KEXEC_RANGES_H */ diff --git a/arch/powerpc/kexec/Makefile b/arch/powerpc/kexec/Makefile index 8e469c4da3f8..470eb0453e17 100644 --- a/arch/powerpc/kexec/Makefile +++ b/arch/powerpc/kexec/Makefile @@ -3,11 +3,11 @@ # Makefile for the linux kernel. # -obj-y += core.o core_$(BITS).o +obj-y += core.o core_$(BITS).o ranges.o obj-$(CONFIG_PPC32)+= relocate_32.o -obj-$(CONFIG_KEXEC_FILE) += file_load.o ranges.o file_load_$(BITS).o elf_$(BITS).o +obj-$(CONFIG_KEXEC_FILE) += file_load.o file_load_$(BITS).o elf_$(BITS).o obj-$(CONFIG_VMCORE_INFO) += vmcore_info.o obj-$(CONFIG_CRASH_DUMP) += crash.o diff --git a/arch/powerpc/kexec/file_load_64.c b/arch/powerpc/kexec/file_load_64.c index 1bc65de6174f..6a01f62b8fcf 100644 --- a/arch/powerpc/kexec/file_load_64.c +++ b/arch/powerpc/kexec/file_load_64.c @@ -47,83 +47,6 @@ const struct kexec_file_ops * const kexec_file_loaders[] = { NULL }; -/** - * get_exclude_memory_ranges - Get exclude memory ranges. This list includes - * regions like opal/rtas, tce-table, initrd, - * kernel, htab which should be avoided while - * setting up kexec load segments. - * @mem_ranges:Range list to add the memory ranges to. - * - * Returns 0 on success, negative errno on error. - */ -static int get_exclude_memory_ranges(struct crash_mem **mem_ranges) -{ - int ret; - - ret = add_tce_mem_ranges(mem_ranges); - if (ret) - goto out; - - ret = add_initrd_mem_range(mem_ranges); - if (ret) - goto out; - - ret = add_htab_mem_range(mem_ranges); - if (ret) - goto out; - - ret = add_kernel_mem_range(mem_ranges); - if (ret) - goto out; - - ret = add_rtas_mem_range(mem_ranges); - if (ret) - goto out; - - ret = add_opal_mem_range(mem_ranges); -
[PATCH v19 5/6] powerpc/crash: add crash CPU hotplug support
Due to CPU/Memory hotplug or online/offline events, the elfcorehdr (which describes the CPUs and memory of the crashed kernel) and FDT (Flattened Device Tree) of kdump image becomes outdated. Consequently, attempting dump collection with an outdated elfcorehdr or FDT can lead to failed or inaccurate dump collection. Going forward, CPU hotplug or online/offline events are referred as CPU/Memory add/remove events. The current solution to address the above issue involves monitoring the CPU/Memory add/remove events in userspace using udev rules and whenever there are changes in CPU and memory resources, the entire kdump image is loaded again. The kdump image includes kernel, initrd, elfcorehdr, FDT, purgatory. Given that only elfcorehdr and FDT get outdated due to CPU/Memory add/remove events, reloading the entire kdump image is inefficient. More importantly, kdump remains inactive for a substantial amount of time until the kdump reload completes. To address the aforementioned issue, commit 247262756121 ("crash: add generic infrastructure for crash hotplug support") added a generic infrastructure that allows architectures to selectively update the kdump image component during CPU or memory add/remove events within the kernel itself. In the event of a CPU or memory add/remove events, the generic crash hotplug event handler, `crash_handle_hotplug_event()`, is triggered. It then acquires the necessary locks to update the kdump image and invokes the architecture-specific crash hotplug handler, `arch_crash_handle_hotplug_event()`, to update the required kdump image components. This patch adds crash hotplug handler for PowerPC and enable support to update the kdump image on CPU add/remove events. Support for memory add/remove events is added in a subsequent patch with the title "powerpc: add crash memory hotplug support" As mentioned earlier, only the elfcorehdr and FDT kdump image components need to be updated in the event of CPU or memory add/remove events. However, on PowerPC architecture crash hotplug handler only updates the FDT to enable crash hotplug support for CPU add/remove events. Here's why. The elfcorehdr on PowerPC is built with possible CPUs, and thus, it does not need an update on CPU add/remove events. On the other hand, the FDT needs to be updated on CPU add events to include the newly added CPU. If the FDT is not updated and the kernel crashes on a newly added CPU, the kdump kernel will fail to boot due to the unavailability of the crashing CPU in the FDT. During the early boot, it is expected that the boot CPU must be a part of the FDT; otherwise, the kernel will raise a BUG and fail to boot. For more information, refer to commit 36ae37e3436b0 ("powerpc: Make boot_cpuid common between 32 and 64-bit"). Since it is okay to have an offline CPU in the kdump FDT, no action is taken in case of CPU removal. There are two system calls, `kexec_file_load` and `kexec_load`, used to load the kdump image. Few changes have been made to ensure kernel can safely update the FDT of kdump image loaded using both system calls. For kexec_file_load syscall the kdump image is prepared in kernel. So to support an increasing number of CPUs, the FDT is constructed with extra buffer space to ensure it can accommodate a possible number of CPU nodes. Additionally, a call to fdt_pack (which trims the unused space once the FDT is prepared) is avoided if this feature is enabled. For the kexec_load syscall, the FDT is updated only if the KEXEC_CRASH_HOTPLUG_SUPPORT kexec flag is passed to the kernel by userspace (kexec tools). When userspace passes this flag to the kernel, it indicates that the FDT is built to accommodate possible CPUs, and the FDT segment is excluded from SHA calculation, making it safe to update. The changes related to this feature are kept under the CRASH_HOTPLUG config, and it is enabled by default. Signed-off-by: Sourabh Jain Acked-by: Hari Bathini Cc: Akhil Raj Cc: Andrew Morton Cc: Aneesh Kumar K.V Cc: Baoquan He Cc: Borislav Petkov (AMD) Cc: Boris Ostrovsky Cc: Christophe Leroy Cc: Dave Hansen Cc: Dave Young Cc: David Hildenbrand Cc: Greg Kroah-Hartman Cc: Laurent Dufour Cc: Mahesh Salgaonkar Cc: Michael Ellerman Cc: Mimi Zohar Cc: Naveen N Rao Cc: Oscar Salvador Cc: Stephen Rothwell Cc: Thomas Gleixner Cc: Valentin Schneider Cc: Vivek Goyal Cc: ke...@lists.infradead.org Cc: x...@kernel.org --- * No changes in v19. arch/powerpc/Kconfig | 4 ++ arch/powerpc/include/asm/kexec.h | 8 +++ arch/powerpc/kexec/crash.c| 103 ++ arch/powerpc/kexec/elf_64.c | 3 +- arch/powerpc/kexec/file_load_64.c | 17 + 5 files changed, 134 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 1c4be3373686..a1a3b3363008 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -686,6 +686,10 @@ config ARCH_SELECTS_CRASH_DUMP depends on CRASH_DUMP select RELOCATABLE
[PATCH v19 6/6] powerpc/crash: add crash memory hotplug support
Extend the arch crash hotplug handler, as introduced by the patch title ("powerpc: add crash CPU hotplug support"), to also support memory add/remove events. Elfcorehdr describes the memory of the crash kernel to capture the kernel; hence, it needs to be updated if memory resources change due to memory add/remove events. Therefore, arch_crash_handle_hotplug_event() is updated to recreate the elfcorehdr and replace it with the previous one on memory add/remove events. The memblock list is used to prepare the elfcorehdr. In the case of memory hot remove, the memblock list is updated after the arch crash hotplug handler is triggered, as depicted in Figure 1. Thus, the hot-removed memory is explicitly removed from the crash memory ranges to ensure that the memory ranges added to elfcorehdr do not include the hot-removed memory. Memory remove | v Offline pages | v Initiate memory notify call <> crash hotplug handler chain for MEM_OFFLINE event | v Update memblock list Figure 1 There are two system calls, `kexec_file_load` and `kexec_load`, used to load the kdump image. A few changes have been made to ensure that the kernel can safely update the elfcorehdr component of the kdump image for both system calls. For the kexec_file_load syscall, kdump image is prepared in the kernel. To support an increasing number of memory regions, the elfcorehdr is built with extra buffer space to ensure that it can accommodate additional memory ranges in future. For the kexec_load syscall, the elfcorehdr is updated only if the KEXEC_CRASH_HOTPLUG_SUPPORT kexec flag is passed to the kernel by the kexec tool. Passing this flag to the kernel indicates that the elfcorehdr is built to accommodate additional memory ranges and the elfcorehdr segment is not considered for SHA calculation, making it safe to update. The changes related to this feature are kept under the CRASH_HOTPLUG config, and it is enabled by default. Signed-off-by: Sourabh Jain Acked-by: Hari Bathini Cc: Akhil Raj Cc: Andrew Morton Cc: Aneesh Kumar K.V Cc: Baoquan He Cc: Borislav Petkov (AMD) Cc: Boris Ostrovsky Cc: Christophe Leroy Cc: Dave Hansen Cc: Dave Young Cc: David Hildenbrand Cc: Greg Kroah-Hartman Cc: Laurent Dufour Cc: Mahesh Salgaonkar Cc: Michael Ellerman Cc: Mimi Zohar Cc: Naveen N Rao Cc: Oscar Salvador Cc: Stephen Rothwell Cc: Thomas Gleixner Cc: Valentin Schneider Cc: Vivek Goyal Cc: ke...@lists.infradead.org Cc: x...@kernel.org --- Changes in v19: * Fix a build warning: remove NULL check before freeing memory for elfbuf in update_crash_elfcorehdr function. arch/powerpc/include/asm/kexec.h| 3 + arch/powerpc/include/asm/kexec_ranges.h | 1 + arch/powerpc/kexec/crash.c | 94 - arch/powerpc/kexec/file_load_64.c | 20 +- arch/powerpc/kexec/ranges.c | 85 ++ 5 files changed, 201 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h index e75970351bcd..95a98b390d62 100644 --- a/arch/powerpc/include/asm/kexec.h +++ b/arch/powerpc/include/asm/kexec.h @@ -141,6 +141,9 @@ void arch_crash_handle_hotplug_event(struct kimage *image, void *arg); int arch_crash_hotplug_support(struct kimage *image, unsigned long kexec_flags); #define arch_crash_hotplug_support arch_crash_hotplug_support + +unsigned int arch_crash_get_elfcorehdr_size(void); +#define crash_get_elfcorehdr_size arch_crash_get_elfcorehdr_size #endif /* CONFIG_CRASH_HOTPLUG */ extern int crashing_cpu; diff --git a/arch/powerpc/include/asm/kexec_ranges.h b/arch/powerpc/include/asm/kexec_ranges.h index 8489e844b447..14055896cbcb 100644 --- a/arch/powerpc/include/asm/kexec_ranges.h +++ b/arch/powerpc/include/asm/kexec_ranges.h @@ -7,6 +7,7 @@ void sort_memory_ranges(struct crash_mem *mrngs, bool merge); struct crash_mem *realloc_mem_ranges(struct crash_mem **mem_ranges); int add_mem_range(struct crash_mem **mem_ranges, u64 base, u64 size); +int remove_mem_range(struct crash_mem **mem_ranges, u64 base, u64 size); int get_exclude_memory_ranges(struct crash_mem **mem_ranges); int get_reserved_memory_ranges(struct crash_mem **mem_ranges); int get_crash_memory_ranges(struct crash_mem **mem_ranges); diff --git a/arch/powerpc/kexec/crash.c b/arch/powerpc/kexec/crash.c index 8938a19af12f..9ac3266e4965 100644 --- a/arch/powerpc/kexec/crash.c +++ b/arch/powerpc/kexec/crash.c @@ -17,6 +17,7 @@ #include #include #include +#include #include #include @@ -25,6 +26,7 @@ #include #include #include +#include /* * The primary CPU waits a while for all secondary CPUs to enter. This is to @@ -398,6 +400,93 @@ void default_machine_crash_shutdown(struct pt_regs *regs) #undef pr_fmt #define pr_fmt(fmt) "crash hp: " fmt +/* + * Advertise preferred elfcorehdr size to userspace via + * /sys/kernel/crash_elfcorehdr_size sysfs int
Re: linux-next: boot failure after merge of the modules tree
Hi Mike, On Wed, 24 Apr 2024 12:14:49 +0300 Mike Rapoport wrote: > > This should fix it for now, I'll rework initialization a bit in v6 > > diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig > index 1c4be3373686..bea33bf538e9 100644 > --- a/arch/powerpc/Kconfig > +++ b/arch/powerpc/Kconfig > @@ -176,6 +176,7 @@ config PPC > select ARCH_WANT_IRQS_OFF_ACTIVATE_MM > select ARCH_WANT_LD_ORPHAN_WARN > select ARCH_WANT_OPTIMIZE_DAX_VMEMMAP if PPC_RADIX_MMU > + select ARCH_WANTS_EXECMEM_EARLY if EXECMEM > select ARCH_WANTS_MODULES_DATA_IN_VMALLOC if PPC_BOOK3S_32 || > PPC_8xx > select ARCH_WEAK_RELEASE_ACQUIRE > select BINFMT_ELF I added the above to today's merge of the modules tree and it made the boot failure go away. -- Cheers, Stephen Rothwell pgp23z3ezFCNO.pgp Description: OpenPGP digital signature