Re: [PATCH 09/15] tracing: Introduce names for events
On Wed, 13 Dec 2023 00:04:46 + Alexander Graf wrote: > With KHO (Kexec HandOver), we want to preserve trace buffers. To parse > them, we need to ensure that all trace events that exist in the logs are > identical to the ones we parse as. That means we need to match the > events before and after kexec. > > As a first step towards that, let's give every event a unique name. That > way we can clearly identify the event before and after kexec and restore > its ID post-kexec. > > Signed-off-by: Alexander Graf > --- > include/linux/trace_events.h | 1 + > include/trace/trace_events.h | 2 ++ > kernel/trace/blktrace.c | 1 + > kernel/trace/trace_branch.c | 1 + > kernel/trace/trace_events.c | 3 +++ > kernel/trace/trace_functions_graph.c | 4 +++- > kernel/trace/trace_output.c | 13 + > kernel/trace/trace_probe.c | 3 +++ > kernel/trace/trace_syscalls.c| 29 > 9 files changed, 56 insertions(+), 1 deletion(-) > > diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h > index d68ff9b1247f..7670224aa92d 100644 > --- a/include/linux/trace_events.h > +++ b/include/linux/trace_events.h > @@ -149,6 +149,7 @@ struct trace_event { > struct hlist_node node; > int type; > struct trace_event_functions*funcs; > + const char *name; > }; OK, this is a hard no. We definitely need to find a different way to do this. I'm trying hard to lower the footprint of tracing, and this just added 8 bytes to every event on a 64 bit machine. On my box I have 1953 events, and they are constantly growing. This just added 15,624 bytes of tracing overhead to that machine. That may not sound like much, but as this is only for this feature, it just added 15K to the overhead for the majority of users. I'm not sure how easy it is to make this a config option that takes away that field when not set. But I would need that at a minimum. -- Steve
Re: [PATCH 08/15] tracing: Introduce names for ring buffers
On Wed, 13 Dec 2023 01:35:16 +0100 Alexander Graf wrote: > > The trace_array is the structure that represents each tracing instance. And > > it already has a name field. And if you can get the associated ring buffer > > from that too. > > > > struct trace_array *tr; > > > > tr->array_buffer.buffer > > > > tr->name > > > > When you do: mkdir /sys/kernel/tracing/instance/foo > > > > You create a new trace_array instance where tr->name = "foo" and allocates > > the buffer for it as well. > > The name in the ring buffer is pretty much just a copy of the trace > array name. I use it to reconstruct which buffer we're actually > referring to inside __ring_buffer_alloc(). No, I rather not tie the ring buffer to the trace_array. > > I'm all ears for alternative suggestions. I suppose we could pass tr as > argument to ring_buffer_alloc() instead of the name? I'll have to spend some time (that I don't currently have :-( ) on looking at this more. I really don't like the copying of the name into the ring buffer allocation, as it may be an unneeded burden to maintain, not to mention the duplicate field. -- Steve
Re: [PATCH 08/15] tracing: Introduce names for ring buffers
On Wed, 13 Dec 2023 00:04:45 + Alexander Graf wrote: > With KHO (Kexec HandOver), we want to preserve trace buffers across > kexec. To carry over their state between kernels, the kernel needs a > common handle for them that exists on both sides. As handle we introduce > names for ring buffers. In a follow-up patch, the kernel can then use > these names to recover buffer contents for specific ring buffers. > Is there a way to use the trace_array name instead? The trace_array is the structure that represents each tracing instance. And it already has a name field. And if you can get the associated ring buffer from that too. struct trace_array *tr; tr->array_buffer.buffer tr->name When you do: mkdir /sys/kernel/tracing/instance/foo You create a new trace_array instance where tr->name = "foo" and allocates the buffer for it as well. -- Steve
Re: [PATCH 03/10] ftrace: Remove the now superfluous sentinel elements from ctl_table array
On Wed, 8 Nov 2023 19:29:49 +0900 Masami Hiramatsu (Google) wrote: > On Tue, 07 Nov 2023 14:45:03 +0100 > Joel Granados via B4 Relay wrote: > > > From: Joel Granados > > > > This commit comes at the tail end of a greater effort to remove the > > empty elements at the end of the ctl_table arrays (sentinels) which > > will reduce the overall build time size of the kernel and run time > > memory bloat by ~64 bytes per sentinel (further information Link : > > https://lore.kernel.org/all/zo5yx5jfoggi%2f...@bombadil.infradead.org/) > > > > Remove sentinel elements from ftrace_sysctls and user_event_sysctls > > > > Both looks good to me. (since register_sysctl_init() uses ARRAY_SIZE() > macro to get the array size.) > > Acked-by: Masami Hiramatsu (Google) Agreed. Acked-by: Steven Rostedt (Google) -- Steve
Re: [PATCH] kexec: Fix reboot race during device_shutdown()
On Sat, 7 Oct 2023 21:30:42 -0400 Joel Fernandes wrote: > Just checking how we want to proceed, is the consensus that we should > prevent kernel crashes without relying on userspace stopping all > processes? Should we fix regular reboot syscall as well and not just > kexec reboot? If you can show that we can trigger the crash on normal reboot, then I don't see why not. That is, if you have a program that does the reboot (without the SIGSTOP/SIGKILL calls) and triggers this crash, I think that's a legitimate reason to fix it on normal reboot too. -- Steve ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v5 2/2] x86/purgatory: Add linker script
On Thu, 30 Mar 2023 17:18:26 +0200 Borislav Petkov wrote: > On Thu, Mar 30, 2023 at 11:15:23AM -0400, Steven Rostedt wrote: > > > Make sure that the .text section is not divided in multiple overlapping > > > sections. This is not supported by kexec_file. > > And? > > What is the failure scenario? Why are you fixing it? Why do we care? > > This is way too laconic. > Yeah, I think the change log in patch 1 needs to be in this patch too, which gives better context. -- Steve ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v5 2/2] x86/purgatory: Add linker script
Hmm, this patch may need some more eyes. At least from the x86 maintainers. -- Steve On Thu, 30 Mar 2023 11:44:48 +0200 Ricardo Ribalda wrote: > Make sure that the .text section is not divided in multiple overlapping > sections. This is not supported by kexec_file. > > Signed-off-by: Ricardo Ribalda > --- > arch/x86/purgatory/.gitignore| 2 ++ > arch/x86/purgatory/Makefile | 20 + > arch/x86/purgatory/kexec-purgatory.S | 2 +- > arch/x86/purgatory/purgatory.lds.S | 57 > > 4 files changed, 74 insertions(+), 7 deletions(-) > > diff --git a/arch/x86/purgatory/.gitignore b/arch/x86/purgatory/.gitignore > index d2be1500671d..1fe71fe5945d 100644 > --- a/arch/x86/purgatory/.gitignore > +++ b/arch/x86/purgatory/.gitignore > @@ -1 +1,3 @@ > purgatory.chk > +purgatory.lds > +purgatory > diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile > index 17f09dc26381..4dc96d409bec 100644 > --- a/arch/x86/purgatory/Makefile > +++ b/arch/x86/purgatory/Makefile > @@ -16,10 +16,11 @@ CFLAGS_sha256.o := -D__DISABLE_EXPORTS > > # When linking purgatory.ro with -r unresolved symbols are not checked, > # also link a purgatory.chk binary without -r to check for unresolved > symbols. > -PURGATORY_LDFLAGS := -e purgatory_start -z nodefaultlib > -LDFLAGS_purgatory.ro := -r $(PURGATORY_LDFLAGS) > -LDFLAGS_purgatory.chk := $(PURGATORY_LDFLAGS) > -targets += purgatory.ro purgatory.chk > +PURGATORY_LDFLAGS := -nostdlib -z nodefaultlib > +LDFLAGS_purgatory := -r $(PURGATORY_LDFLAGS) -T > +LDFLAGS_purgatory.chk := -e purgatory_start $(PURGATORY_LDFLAGS) > + > +targets += purgatory.lds purgatory.ro purgatory.chk > > # Sanitizer, etc. runtimes are unavailable and cannot be linked here. > GCOV_PROFILE := n > @@ -72,10 +73,17 @@ CFLAGS_string.o += $(PURGATORY_CFLAGS) > AFLAGS_REMOVE_setup-x86_$(BITS).o+= -Wa,-gdwarf-2 > AFLAGS_REMOVE_entry64.o += -Wa,-gdwarf-2 > > -$(obj)/purgatory.ro: $(PURGATORY_OBJS) FORCE > +OBJCOPYFLAGS_purgatory.ro := -O elf64-x86-64 > +OBJCOPYFLAGS_purgatory.ro += --remove-section='*debug*' > +OBJCOPYFLAGS_purgatory.ro += --remove-section='.comment' > +OBJCOPYFLAGS_purgatory.ro += --remove-section='.note.*' > +$(obj)/purgatory.ro: $(obj)/purgatory FORCE > + $(call if_changed,objcopy) > + > +$(obj)/purgatory.chk: $(obj)/purgatory FORCE > $(call if_changed,ld) > > -$(obj)/purgatory.chk: $(obj)/purgatory.ro FORCE > +$(obj)/purgatory: $(obj)/purgatory.lds $(PURGATORY_OBJS) FORCE > $(call if_changed,ld) > > $(obj)/kexec-purgatory.o: $(obj)/purgatory.ro $(obj)/purgatory.chk > diff --git a/arch/x86/purgatory/kexec-purgatory.S > b/arch/x86/purgatory/kexec-purgatory.S > index 8530fe93b718..54b0d0b4dc42 100644 > --- a/arch/x86/purgatory/kexec-purgatory.S > +++ b/arch/x86/purgatory/kexec-purgatory.S > @@ -5,7 +5,7 @@ > .align 8 > kexec_purgatory: > .globl kexec_purgatory > - .incbin "arch/x86/purgatory/purgatory.ro" > + .incbin "arch/x86/purgatory/purgatory" > .Lkexec_purgatory_end: > > .align 8 > diff --git a/arch/x86/purgatory/purgatory.lds.S > b/arch/x86/purgatory/purgatory.lds.S > new file mode 100644 > index ..610da88aafa0 > --- /dev/null > +++ b/arch/x86/purgatory/purgatory.lds.S > @@ -0,0 +1,57 @@ > +/* SPDX-License-Identifier: GPL-2.0 */ > +#include > + > +OUTPUT_FORMAT(CONFIG_OUTPUT_FORMAT) > + > +#undef i386 > + > +#include > +#include > + > +ENTRY(purgatory_start) > + > +SECTIONS > +{ > + . = 0; > + .head.text : { > + _head = . ; > + HEAD_TEXT > + _ehead = . ; > + } > + .rodata : { > + _rodata = . ; > + *(.rodata) /* read-only data */ > + *(.rodata.*) > + _erodata = . ; > + } > + .text : { > + _text = .; /* Text */ > + *(.text) > + *(.text.*) > + *(.noinstr.text) > + _etext = . ; > + } > + .data : { > + _data = . ; > + *(.data) > + *(.data.*) > + *(.bss.efistub) > + _edata = . ; > + } > + . = ALIGN(L1_CACHE_BYTES); > + .bss : { > + _bss = . ; > + *(.bss) > + *(.bss.*) > + *(COMMON) > + . = ALIGN(8); /* For convenience during zeroing */ > + _ebss = .; > + } > + > + /* Sections to be discarded */ > + /DISCARD/ : { > + *(.eh_frame) > + *(*__ksymtab*) > + *(___kcrctab*) > + } > +} > ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v5 1/2] kexec: Support purgatories with .text.hot sections
On Thu, 30 Mar 2023 11:44:47 +0200 Ricardo Ribalda wrote: > Clang16 links the purgatory text in two sections: > > [ 1] .text PROGBITS 0040 >11a1 AX 0 0 16 > [ 2] .rela.textRELA 3498 >0648 0018 I 24 1 8 > ... > [17] .text.hot.PROGBITS 3220 >020b AX 0 0 1 > [18] .rela.text.hot. RELA 4428 >0078 0018 I 2417 8 > > And both of them have their range [sh_addr ... sh_addr+sh_size] on the > area pointed by `e_entry`. > > This causes that image->start is calculated twice, once for .text and > another time for .text.hot. The second calculation leaves image->start > in a random location. > > Because of this, the system crashes immediately after: > > kexec_core: Starting new kernel > > Cc: sta...@vger.kernel.org > Fixes: 930457057abe ("kernel/kexec_file.c: split up __kexec_load_puragory") > Reviewed-by: Ross Zwisler > Signed-off-by: Ricardo Ribalda > --- > kernel/kexec_file.c | 14 +- > 1 file changed, 13 insertions(+), 1 deletion(-) > > diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c > index f1a0e4e3fb5c..c7a0e51a6d87 100644 > --- a/kernel/kexec_file.c > +++ b/kernel/kexec_file.c > @@ -901,10 +901,22 @@ static int kexec_purgatory_setup_sechdrs(struct > purgatory_info *pi, > } > > offset = ALIGN(offset, align); > + > + /* > + * Check if the segment contains the entry point, if so, > + * calculate the value of image->start based on it. > + * If the compiler has produced more than one .text section > + * (Eg: .text.hot), they are generally after the main .text > + * section, and they shall not be used to calculate > + * image->start. So do not re-calculate image->start if it > + * is not set to the initial value, and warn the user so they > + * have a chance to fix their purgatory's linker script. > + */ > if (sechdrs[i].sh_flags & SHF_EXECINSTR && > pi->ehdr->e_entry >= sechdrs[i].sh_addr && > pi->ehdr->e_entry < (sechdrs[i].sh_addr > - + sechdrs[i].sh_size)) { > + + sechdrs[i].sh_size) && > + !WARN_ON(kbuf->image->start != pi->ehdr->e_entry)) { > kbuf->image->start -= sechdrs[i].sh_addr; > kbuf->image->start += kbuf->mem + offset; > } > Reviewed-by: Steven Rostedt (Google) Thanks Ricardo! -- Steve ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH] kexec: Support purgatories with .text.hot sections
On Wed, 22 Mar 2023 22:52:04 +0800 Baoquan He wrote: > When you resne patch, please fix Philipp's mail adress as > 'Philipp Rudo ' if he should know this. He has joined > Redhat. But I thought redhat *was* IBM? ;-) -- Steve ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH] kexec: Support purgatories with .text.hot sections
On Tue, Mar 21, 2023 at 12:49:08PM +0100, Ricardo Ribalda wrote: > Clang16 links the purgatory text in two sections: > > [ 1] .text PROGBITS 0040 >11a1 AX 0 0 16 > [ 2] .rela.textRELA 3498 >0648 0018 I 24 1 8 > ... > [17] .text.hot.PROGBITS 3220 >020b AX 0 0 1 > [18] .rela.text.hot. RELA 4428 >0078 0018 I 2417 8 > > And both of them have their range [sh_addr ... sh_addr+sh_size] on the > area pointed by `e_entry`. > > This causes that image->start is calculated twice, once for .text and > another time for .text.hot. The second calculation leaves image->start > in a random location. > > Because of this, the system crashes inmediatly after: > > kexec_core: Starting new kernel > > Signed-off-by: Ricardo Ribalda > To: Eric Biederman > Cc: Philipp Rudo > Cc: kexec@lists.infradead.org > Cc: linux-ker...@vger.kernel.org > --- > kernel/kexec_file.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c > index f1a0e4e3fb5c..b1a25d97d5e2 100644 > --- a/kernel/kexec_file.c > +++ b/kernel/kexec_file.c > @@ -904,7 +904,8 @@ static int kexec_purgatory_setup_sechdrs(struct > purgatory_info *pi, > if (sechdrs[i].sh_flags & SHF_EXECINSTR && > pi->ehdr->e_entry >= sechdrs[i].sh_addr && > pi->ehdr->e_entry < (sechdrs[i].sh_addr > - + sechdrs[i].sh_size)) { > + + sechdrs[i].sh_size) && > + kbuf->image->start != pi->ehdr->e_shnum) { Shouldn't this be: kbuf->image->start == pi->ehdr->e_shnum) { ? As you want to only do this update when it's not equal to the initial value. If this did work, then you may want to make sure that was the initial value. Also, please add a comment about why you are doing this check. Thanks! -- Steve > kbuf->image->start -= sechdrs[i].sh_addr; > kbuf->image->start += kbuf->mem + offset; > } > > --- > base-commit: 17214b70a159c6547df9ae204a6275d983146f6b > change-id: 20230321-kexec_clang16-4510c23d129c > > Best regards, > -- > Ricardo Ribalda ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v5 0/3] kexec: Add new parameter to limit the access to kexec
On Wed, 21 Dec 2022 20:45:56 +0100 Ricardo Ribalda wrote: > Add two parameter to specify how many times a kexec kernel can be loaded. > > These parameter allow hardening the system. > > While we are at it, fix a documentation issue and refactor some code. > > To: Jonathan Corbet > To: Eric Biederman > Cc: linux-...@vger.kernel.org > Cc: linux-ker...@vger.kernel.org > Cc: kexec@lists.infradead.org > Cc: Joel Fernandes (Google) > Cc: Sergey Senozhatsky > Cc: Steven Rostedt > Cc: Ross Zwisler > To: Philipp Rudo > To: Guilherme G. Piccoli > Signed-off-by: Ricardo Ribalda Reviewed-by: Steven Rostedt (Google) -- Steve ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v4 3/3] kexec: Introduce sysctl parameters kexec_load_limit_*
On Wed, 21 Dec 2022 13:50:03 +0100 Ricardo Ribalda wrote: > @@ -941,6 +995,20 @@ static struct ctl_table kexec_core_sysctls[] = { > .extra1 = SYSCTL_ONE, > .extra2 = SYSCTL_ONE, > }, > + { > + .procname = "kexec_load_limit_panic", > + .data = _limit_panic, > + .maxlen = sizeof(load_limit_panic), If I understand the sysctl logic correctly, the .maxlen is the maxlen of the input to the sysctl, and not the data. Usually set to sizeof(data) because most proc_handlers write to data directly. In this case, I believe it's not even used (you override it with the struct ctl_table tmp). I guess it doesn't really matter what it's set to. Perhaps just set it to zero and leave it out? > + .mode = 0644, > + .proc_handler = kexec_limit_handler, > + }, > + { > + .procname = "kexec_load_limit_reboot", > + .data = _limit_reboot, > + .maxlen = sizeof(load_limit_reboot), Same here. -- Steve > + .mode = 0644, > + .proc_handler = kexec_limit_handler, > + }, > { } > }; > ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v3 3/3] kexec: Introduce parameters load_limit_reboot and load_limit_panic
On Wed, 21 Dec 2022 02:22:57 +0100 Ricardo Ribalda wrote: > > > + kexec_core.load_limit_reboot= > > > + kexec_core.load_limit_panic= > > > + [KNL] > > > + This parameter specifies a limit to the number of > > > times > > > + a kexec kernel can be loaded. > > > + Format: > > > + -1 = Unlimited. > > > + int = Number of times kexec can be called. > > > + > > > + During runtime, this parameter can be modified with > > > a > > > > > + value smaller than the current one (but not -1). > > > > Perhaps state: > > smaller positive value than the current one or if > > current is currently -1. > > I find it a bit complicated.. > What about: > > During runtime this parameter can be modified with a more restrictive value Sure. That's better than the original. > > > +module_param_cb(load_limit_reboot, _limit_ops, _limit_reboot, > > > 0644); > > > +MODULE_PARM_DESC(load_limit_reboot, "Maximum attempts to load a kexec > > > reboot kernel"); > > > + > > > +module_param_cb(load_limit_panic, _limit_ops, _limit_panic, > > > 0644); > > > +MODULE_PARM_DESC(load_limit_reboot, "Maximum attempts to load a kexec > > > panic kernel"); > > > > Wait, why the module params if this can not be a module? > > > > The kernel/kexec.c is decided via CONFIG_KEXEC_CORE which is bool. Either > > builtin or not at all. No module selection possible. > > > > For kernel parameters, we should just use __setup(), right? > > Isn't __setup() only kernel parameter and then it cannot be updated on > runtime? Yes, but then we use sysctl. > > What about using late_param_cb? and remove MODULE_PARAM_DESC ? > > I think this is how these parameters work > > $ ls /sys/module/kernel/parameters/ > consoleblank crash_kexec_post_notifiers ignore_rlimit_data > initcall_debug module_blacklist panic panic_on_warn panic_print > pause_on_oops Well, I think those are more leftovers and not something we want to add to. I believe sysctl is the better option, and a more common one: $ ls /proc/sys/kernel/ acct perf_event_max_contexts_per_stack acpi_video_flags perf_event_max_sample_rate auto_msgmniperf_event_max_stack bootloader_typeperf_event_mlock_kb bootloader_version perf_event_paranoid bpf_stats_enabled pid_max cad_pidpoweroff_cmd cap_last_cap print-fatal-signals core_pattern printk core_pipe_limitprintk_delay core_uses_pid printk_devkmsg ctrl-alt-del printk_ratelimit dmesg_restrict printk_ratelimit_burst domainname pty firmware_configrandom ftrace_dump_on_oopsrandomize_va_space ftrace_enabled real-root-dev hardlockup_all_cpu_backtrace sched_autogroup_enabled hardlockup_panic sched_child_runs_first hostname sched_deadline_period_max_us hotplugsched_deadline_period_min_us hung_task_all_cpu_backtracesched_energy_aware hung_task_check_count sched_rr_timeslice_ms hung_task_check_interval_secs sched_rt_period_us hung_task_panicsched_rt_runtime_us hung_task_timeout_secs sched_util_clamp_max hung_task_warnings sched_util_clamp_min io_delay_type sched_util_clamp_min_rt_default kexec_load_disabledseccomp keys sem kptr_restrict sem_next_id max_lock_depth shmall max_rcu_stall_to_panic shmmax modprobe shmmni modules_disabled shm_next_id msgmax shm_rmid_forced msgmnb softlockup_all_cpu_backtrace msgmni softlockup_panic msg_next_idsoft_watchdog ngroups_maxstack_tracer_enabled nmi_watchdog sysctl_writes_strict ns_last_pidsysrq numa_balancing tainted oops_all_cpu_backtrace task_delayacct osrelease threads-max ostype timer_migration overflowgidtraceoff_on_warning overflowuidtracepoint_printk panic unknown_nmi_panic panic_on_io_nmiunprivileged_bpf_disabled panic_on_oops usermodehelper panic_on_rcu_stall version panic_on_unrecovered_nmi watchdog panic_on_warn watchdog_cpumask panic_printwatchdog_thresh perf_cpu_time_max_percent yama > I could pass the flags and then check for flags & (KEXEC_ON_CRASH | > KEXEC_FILE_ON_CRASH)... but not
Re: [PATCH v3 3/3] kexec: Introduce parameters load_limit_reboot and load_limit_panic
On Tue, 20 Dec 2022 23:05:45 +0100 Ricardo Ribalda wrote: I hate to be the grammar police, but.. > Add two parameter to specify how many times a kexec kernel can be loaded. "parameters" > > The sysadmin can set different limits for kexec panic and kexec reboot > kernels. > > The value can be modified at runtime via sysfs, but only with a value > smaller than the current one (except -1). > > Signed-off-by: Ricardo Ribalda > --- > Documentation/admin-guide/kernel-parameters.txt | 14 > include/linux/kexec.h | 2 +- > kernel/kexec.c | 2 +- > kernel/kexec_core.c | 91 > - > kernel/kexec_file.c | 2 +- > 5 files changed, 106 insertions(+), 5 deletions(-) > > diff --git a/Documentation/admin-guide/kernel-parameters.txt > b/Documentation/admin-guide/kernel-parameters.txt > index 42af9ca0127e..2b37d6a20747 100644 > --- a/Documentation/admin-guide/kernel-parameters.txt > +++ b/Documentation/admin-guide/kernel-parameters.txt > @@ -2374,6 +2374,20 @@ > for Movable pages. "nn[KMGTPE]", "nn%", and "mirror" > are exclusive, so you cannot specify multiple forms. > > + kexec_core.load_limit_reboot= > + kexec_core.load_limit_panic= > + [KNL] > + This parameter specifies a limit to the number of times > + a kexec kernel can be loaded. > + Format: > + -1 = Unlimited. > + int = Number of times kexec can be called. > + > + During runtime, this parameter can be modified with a > + value smaller than the current one (but not -1). Perhaps state: smaller positive value than the current one or if current is currently -1. > + > + Default: -1 > + > kgdbdbgp= [KGDB,HW] kgdb over EHCI usb debug port. > Format: [,poll interval] > The controller # is the number of the ehci usb debug > diff --git a/include/linux/kexec.h b/include/linux/kexec.h > index 182e0c11b87b..5daf9990d5b8 100644 > --- a/include/linux/kexec.h > +++ b/include/linux/kexec.h > @@ -407,7 +407,7 @@ extern int kimage_crash_copy_vmcoreinfo(struct kimage > *image); > extern struct kimage *kexec_image; > extern struct kimage *kexec_crash_image; > > -bool kexec_load_permitted(void); > +bool kexec_load_permitted(bool crash_image); > > #ifndef kexec_flush_icache_page > #define kexec_flush_icache_page(page) > diff --git a/kernel/kexec.c b/kernel/kexec.c > index ce1bca874a8d..7aefd134e319 100644 > --- a/kernel/kexec.c > +++ b/kernel/kexec.c > @@ -193,7 +193,7 @@ static inline int kexec_load_check(unsigned long > nr_segments, > int result; > > /* We only trust the superuser with rebooting the system. */ > - if (!kexec_load_permitted()) > + if (!kexec_load_permitted(flags & KEXEC_ON_CRASH)) Note, here we have KEXEC_ON_CRASH (see bottom). > return -EPERM; > > /* Permit LSMs and IMA to fail the kexec */ > diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c > index a1efc70f4158..adf71f2be3ff 100644 > --- a/kernel/kexec_core.c > +++ b/kernel/kexec_core.c > @@ -952,13 +952,100 @@ static int __init kexec_core_sysctl_init(void) > late_initcall(kexec_core_sysctl_init); > #endif > > -bool kexec_load_permitted(void) > +struct kexec_load_limit { > + /* Mutex protects the limit count. */ > + struct mutex mutex; > + int limit; > +}; > + > +struct kexec_load_limit load_limit_reboot = { Perhaps make the above static? > + .mutex = __MUTEX_INITIALIZER(load_limit_reboot.mutex), > + .limit = -1, > +}; > + > +struct kexec_load_limit load_limit_panic = { static? > + .mutex = __MUTEX_INITIALIZER(load_limit_panic.mutex), > + .limit = -1, > +}; > + > +static int param_get_limit(char *buffer, const struct kernel_param *kp) > { > + int ret; > + struct kexec_load_limit *limit = kp->arg; Looks better if "int ret;" is after the "limit". > + > + mutex_lock(>mutex); > + ret = scnprintf(buffer, PAGE_SIZE, "%i\n", limit->limit); The above string can be at most "-2147483648\n\0" Which is 13 characters. Why use PAGE_SIZE. Or scnprintf(), and not just state: /* buffer is PAGE_SIZE, much larger than what %i can be */ ret = sprintf(buffer, "%i\n", limit->limit); > + mutex_unlock(>mutex); > + > + return ret; > +} > + > +static int param_set_limit(const char *buffer, const struct kernel_param *kp) > +{ > + int ret; > + struct kexec_load_limit *limit = kp->arg; > + int new_val; > + > + ret = kstrtoint(buffer, 0, _val); > + if (ret) > + return ret; > + > + new_val = max(-1, new_val); I wonder if anything less than -1 should be invalid. > + > +
Re: [PATCH V3 06/11] tracing: Improve panic/die notifiers
On Tue, 13 Dec 2022 20:51:07 -0300 "Guilherme G. Piccoli" wrote: > > Acked-by: Steven Rostedt (Google) > > > > -- Steve > > Hi Steve, since you were so kind in the other patch...decided to ping in > this one as well heheh Heh, yeah, I forgot about this one too. > > So, do you think it's possible to pick it for 6.2? No dependency here, > it's a standalone patch. I can pull it in. > > Thanks again and sorry for the annoyance! No, sorry for missing these. -- Steve ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v1 2/2] kexec: Introduce kexec_reboot_disabled
On Tue, 29 Nov 2022 14:44:50 +0100 Philipp Rudo wrote: > An alternative approach and sort of compromise I see is to convert > kexec_load_disabled from a simple on/off switch to a counter on how > often a kexec load can be made (in practice a tristate on/off/one-shot > should be sufficient). Ideally the reboot and panic path will > have separate counters. With that you could for example use > kexec_load_limit.reboot=0 and kexec_load_limit.panic=1 to disable the > load of images for reboot while still allow to load a crash kernel > once. With this you have the flexibility you need while also preventing > a race where an attacker overwrites your crash kernel before you can > toggle the switch. What do you think? I actually like this idea :-) -- Steve ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v1 2/2] kexec: Introduce kexec_reboot_disabled
On Mon, 28 Nov 2022 17:28:55 +0100 Philipp Rudo wrote: > To be honest I don't think we make a progress here at the moment. I > would like to hear from others what they think about this. Not sure if you missed my reply. https://lore.kernel.org/all/20221128114200.72b3e...@gandalf.local.home/ -- Steve ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v1 2/2] kexec: Introduce kexec_reboot_disabled
On Thu, 24 Nov 2022 16:01:15 +0100 Philipp Rudo wrote: > No, I think the implementation is fine. I'm currently only struggling > to understand what problem kexec_reboot_disabled solves that cannot be > solved by kexec_load_disabled. Hi Philipp, Thanks for working with us on this. Let me try to explain our use case. We want kexec/kdump enabled, but we really do not want kexec used for any other purpose. We must have the kexec kernel loaded at boot up and not afterward. Your recommendation of: kexec -p dump_kernel echo 1 > /proc/sys/kernel/kexec_load_disabled can work, and we will probably add it. But we are taking the paranoid approach, and what I learned in security 101 ;-) and that is, only open up the minimal attack surface as possible. Yes, it's highly unlikely that the above would crash. But as with most security vulnerabilities, it's not going to be an attacker that creates a new gadget here, but probably another script in the future that causes this to be delayed or something, and a new window of opportunity will arise for an attacker. Maybe, that new window only works for non panic kernels. Yes, this is a contrived scenario, but the work vs risk is very low in adding this feature. Perhaps the attack surface that a reboot kexec could be, is that the attacker gets the ability at boot up to load the kexec for reboot and not panic. Then the attack must wait for the victim to reboot their machine before they have access to the new kernel. Again, I admit this is contrived, but just because I can't think of a real situation that this could be a problem doesn't mean that one doesn't exist. In other words, if we never want to allow a kexec reboot, why allow it at all from the beginning? The above allows it, until we don't. That alone makes us nervous. Whereas this patch is rather trivial and doesn't add complexity. Thanks for your time, we appreciate it. -- Steve ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH RFC] kexec: Freeze processes before kexec
On Wed, 16 Nov 2022 19:56:24 + "Joel Fernandes (Google)" wrote: > --- a/kernel/kexec_core.c > +++ b/kernel/kexec_core.c > @@ -1175,6 +1175,12 @@ int kernel_kexec(void) > } else > #endif > { > + error = freeze_processes(); > + if (error) { > + error = -EBUSY; > + goto Unlock; > + } If this is the path of a kernel panic, do we really want to check the return error of freeze_processes()? We are panicing, there's not much more we can do. -- Steve > + > kexec_in_progress = true; > kernel_restart_prepare("kexec reboot"); > migrate_to_reboot_cpu(); ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH V3 06/11] tracing: Improve panic/die notifiers
On Thu, 20 Oct 2022 18:53:43 -0300 "Guilherme G. Piccoli" wrote: > Could you pick it in your tree? Or do you prefer that I re-send as a > solo patch, with your ACK? I wasn't sure there were any dependencies on this. If not, I can take it. -- Steve ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH V3 06/11] tracing: Improve panic/die notifiers
On Fri, 19 Aug 2022 19:17:26 -0300 "Guilherme G. Piccoli" wrote: > Currently the tracing dump_on_oops feature is implemented through > separate notifiers, one for die/oops and the other for panic; > given they have the same functionality, let's unify them. > > Also improve the function comment and change the priority of the > notifier to make it execute earlier, avoiding showing useless trace > data (like the callback names for the other notifiers); finally, > we also removed an unnecessary header inclusion. > > Cc: Petr Mladek > Cc: Sergei Shtylyov > Cc: Steven Rostedt > Signed-off-by: Guilherme G. Piccoli Sorry for the late reply. Acked-by: Steven Rostedt (Google) -- Steve ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v2 08/13] tracing: Improve panic/die notifiers
On Tue, 16 Aug 2022 10:57:20 -0400 Alan Stern wrote: > > static int trace_die_panic_handler(struct notifier_block *self, > > unsigned long ev, void *unused) > > { > > if (!ftrace_dump_on_oops) > > return NOTIFY_DONE; > > > > /* The die notifier requires DIE_OOPS to trigger */ > > if (self == _die_notifier && ev != DIE_OOPS) > > return NOTIFY_DONE; > > > > ftrace_dump(ftrace_dump_on_oops); > > > > return NOTIFY_DONE; > > } > > Or better yet: > > if (ftrace_dump_on_oops) { > > /* The die notifier requires DIE_OOPS to trigger */ > if (self != _die_notifier || ev == DIE_OOPS) > ftrace_dump(ftrace_dump_on_oops); > } > return NOTIFY_DONE; > That may be more consolidated but less easy to read and follow. This is far from a fast path. As I maintain this bike-shed, I prefer the one I suggested ;-) -- Steve ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v2 08/13] tracing: Improve panic/die notifiers
On Tue, 19 Jul 2022 16:53:21 -0300 "Guilherme G. Piccoli" wrote: Sorry for the late review, but this fell to the bottom of my queue :-/ > +/* > + * The idea is to execute the following die/panic callback early, in order > + * to avoid showing irrelevant information in the trace (like other panic > + * notifier functions); we are the 2nd to run, after hung_task/rcu_stall > + * warnings get disabled (to prevent potential log flooding). > + */ > +static int trace_die_panic_handler(struct notifier_block *self, > + unsigned long ev, void *unused) > +{ > + if (!ftrace_dump_on_oops) > + goto out; > + > + if (self == _die_notifier && ev != DIE_OOPS) > + goto out; I really hate gotos that are not for clean ups. > + > + ftrace_dump(ftrace_dump_on_oops); > + > +out: > + return NOTIFY_DONE; > +} > + Just do: static int trace_die_panic_handler(struct notifier_block *self, unsigned long ev, void *unused) { if (!ftrace_dump_on_oops) return NOTIFY_DONE; /* The die notifier requires DIE_OOPS to trigger */ if (self == _die_notifier && ev != DIE_OOPS) return NOTIFY_DONE; ftrace_dump(ftrace_dump_on_oops); return NOTIFY_DONE; } Thanks, Other than that, Acked-by: Steven Rostedt (Google) -- Steve ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH 18/30] notifier: Show function names on notifier routines if DEBUG_NOTIFIERS is set
On Thu, 28 Apr 2022 09:01:13 +0800 Xiaoming Ni wrote: > > +#ifdef CONFIG_DEBUG_NOTIFIERS > > + { > > + char sym_name[KSYM_NAME_LEN]; > > + > > + pr_info("notifiers: registered %s()\n", > > + notifier_name(n, sym_name)); > > + } > > Duplicate Code. > > Is it better to use __func__ and %pS? > > pr_info("%s: %pS\n", __func__, n->notifier_call); > > > > +#endif Also, don't sprinkle #ifdef in C code. Instead: if (IS_ENABLED(CONFIG_DEBUG_NOTIFIERS)) pr_info("notifers: regsiter %ps()\n", n->notifer_call); Or define a print macro at the start of the C file that is a nop if it is not defined, and use the macro. -- Steve ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH 23/30] printk: kmsg_dump: Introduce helper to inform number of dumpers
On Wed, 27 Apr 2022 19:49:17 -0300 "Guilherme G. Piccoli" wrote: > Currently we don't have a way to check if there are dumpers set, > except counting the list members maybe. This patch introduces a very > simple helper to provide this information, by just keeping track of > registered/unregistered kmsg dumpers. It's going to be used on the > panic path in the subsequent patch. FYI, it is considered "bad form" to reference in the change log "this patch". We know this is a patch. The change log should just talk about what is being done. So can you reword your change logs (you do this is almost every patch). Here's what I would reword the above to be: Currently we don't have a way to check if there are dumpers set, except perhaps by counting the list members. Introduce a very simple helper to provide this information, by just keeping track of registered/unregistered kmsg dumpers. This will simplify the refactoring of the panic path. -- Steve ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH 04/30] firmware: google: Convert regular spinlock into trylock on panic path
On Tue, 10 May 2022 13:38:39 +0200 Petr Mladek wrote: > As already mentioned in the other reply, panic() sometimes stops > the other CPUs using NMI, for example, see kdump_nmi_shootdown_cpus(). > > Another situation is when the CPU using the lock ends in some > infinite loop because something went wrong. The system is in > an unpredictable state during panic(). > > I am not sure if this is possible with the code under gsmi_dev.lock > but such things really happen during panic() in other subsystems. > Using trylock in the panic() code path is a good practice. I believe that Peter Zijlstra had a special spin lock for NMIs or early printk, where it would not block if the lock was held on the same CPU. That is, if an NMI happened and paniced while this lock was held on the same CPU, it would not deadlock. But it would block if the lock was held on another CPU. -- Steve ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH 17/30] tracing: Improve panic/die notifiers
On Fri, 29 Apr 2022 10:46:35 -0300 "Guilherme G. Piccoli" wrote: > Thanks Sergei and Steven, good idea! I thought about the switch change > you propose, but I confess I got a bit confused by the "fallthrough" > keyword - do I need to use it? No. The fallthrough keyword is only needed when there's code between case labels. As it is very common to list multiple cases for the same code path. That is: case DIE_OOPS: case PANIC_NOTIFIER: do_dump = 1; break; Does not need a fall through label, as there's no code between the DIE_OOPS and the PANIC_NOTIFIER. But if you had: case DIE_OOPS: x = true; case PANIC_NOTIFIER: do_dump = 1; break; Then you do. -- Steve ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH 17/30] tracing: Improve panic/die notifiers
On Fri, 29 Apr 2022 12:22:44 +0300 Sergei Shtylyov wrote: > > + switch (ev) { > > + case DIE_OOPS: > > + do_dump = 1; > > + break; > > + case PANIC_NOTIFIER: > > + do_dump = 1; > > + break; > >Why not: > > case DIE_OOPS: > case PANIC_NOTIFIER: > do_dump = 1; > break; Agreed. Other than that. Acked-by: Steven Rostedt (Google) -- Steve ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v6 00/13] Add build ID to stacktraces
On Tue, 11 May 2021 12:36:06 + David Laight wrote: > > x1 : ff93fef15788 x0 : ffe3622352e0 > > Call trace: > > lkdtm_WARNING+0x28/0x30 [lkdtm ed5019fdf5e53be37cb1ba7899292d7e143b259e] > > direct_entry+0x16c/0x1b4 [lkdtm ed5019fdf5e53be37cb1ba7899292d7e143b259e] > > full_proxy_write+0x74/0xa4 > > Is there any way to get it to print each module ID only once? If there's a trivial way to do that, then perhaps it should be done, but for now, this patch series isn't as obnoxious as the previous versions. It only affects stack traces, and I'm fine with that. -- Steve ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v3 1/1] kernel/crash_core: Add crashkernel=auto for vmcore creation
On Wed, 17 Feb 2021 12:40:43 -0600 john.p.donne...@oracle.com wrote: > Hello. > > Ping. > > Can we get this reviewed and staged ? > > Thank you. Andrew, Seems you are the only one pushing patches in for kexec/crash. Is this maintained by anyone? -- Steve ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [RFC PATCH] printk: _printk_rb_static_dict can be static
On Fri, 19 Jun 2020 08:49:21 +0200 John Ogness wrote: > > +++ b/kernel/printk/printk.c > > @@ -434,7 +434,7 @@ static u32 log_buf_len = __LOG_BUF_LEN; > > */ > > #define PRB_AVGBITS 5 /* 32 character average length */ > > > > -_DECLARE_PRINTKRB(printk_rb_static, CONFIG_LOG_BUF_SHIFT - PRB_AVGBITS, > > +static _DECLARE_PRINTKRB(printk_rb_static, CONFIG_LOG_BUF_SHIFT - > > PRB_AVGBITS, > > PRB_AVGBITS, PRB_AVGBITS, &__log_buf[0]); > > _DECLARE_PRINTKRB declares multiple variables, so this patch will not > work as intended. I would like to declare the variables static but am > not sure how best to go about it. > > In the Linux source I see examples of macros just desclaring the > variables static. And I see examples of the macros providing a parameter > where the "static" keyword can be specified. > > Since the ringbuffer was created exclusively to serve printk, I would > prefer to just have _DECLARE_PRINTKRB (and DECLARE_PRINTKRB) declare all > the variables as static. Haven written macros that do such things, I agree with your last statement. Just have the macro declare all the variables static. -- Steve ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH 0/2] printk: replace ringbuffer
On Wed, 05 Feb 2020 16:48:32 +0100 John Ogness wrote: > The quick fixup: > > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c > index d0d24ee1d1f4..5ad67ff60cd9 100644 > --- a/kernel/printk/printk.c > +++ b/kernel/printk/printk.c > @@ -883,6 +883,7 @@ static int devkmsg_open(struct inode *inode, struct file > *file) > user->record.text_buf_size = sizeof(user->text_buf); > user->record.dict_buf = >dict_buf[0]; > user->record.dict_buf_size = sizeof(user->dict_buf); > + user->record.text_line_count = NULL; > > logbuf_lock_irq(); > user->seq = prb_first_seq(prb); FYI, I used your patch set to test out Konstantin's new get-lore-mbox script, and then applied them. It locked up on boot up as well, and applying this appears to fix it. -- Steve ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH 1/2] printk: add lockless buffer
On Tue, 28 Jan 2020 17:25:47 +0106 John Ogness wrote: > diff --git a/kernel/printk/printk_ringbuffer.c > b/kernel/printk/printk_ringbuffer.c > new file mode 100644 > index ..796257f226ee > --- /dev/null > +++ b/kernel/printk/printk_ringbuffer.c > @@ -0,0 +1,1370 @@ > +// SPDX-License-Identifier: GPL-2.0 > + > +#include > +#include > +#include > +#include > +#include > +#include "printk_ringbuffer.h" > + > +/** > + * DOC: printk_ringbuffer overview > + * > + * Data Structure > + * -- > + * The printk_ringbuffer is made up of 3 internal ringbuffers:: > + * > + * * desc_ring: A ring of descriptors. A descriptor contains all > record > + * meta data (sequence number, timestamp, loglevel, etc.) > + * as well as internal state information about the record > + * and logical positions specifying where in the other > + * ringbuffers the text and dictionary strings are > + * located. > + * > + * * text_data_ring: A ring of data blocks. A data block consists of an > + * unsigned long integer (ID) that maps to a desc_ring > + * index followed by the text string of the record. > + * > + * * dict_data_ring: A ring of data blocks. A data block consists of an > + * unsigned long integer (ID) that maps to a desc_ring > + * index followed by the dictionary string of the record. > + * > + * Implementation > + * -- > + * > + * ABA Issues > + * ~~ > + * To help avoid ABA issues, descriptors are referenced by IDs (index values > + * with tagged states) and data blocks are referenced by logical positions > + * (index values with tagged states). However, on 32-bit systems the number > + * of tagged states is relatively small such that an ABA incident is (at > + * least theoretically) possible. For example, if 4 million maximally sized 4 million? I'm guessing that maximally sized printk messages are 1k? Perhaps say that, otherwise one might think this is a mistake. "4 million maximally sized (1k) printk messages" > + * printk messages were to occur in NMI context on a 32-bit system, the > + * interrupted task would not be able to recognize that the 32-bit integer > + * wrapped and thus represents a different data block than the one the > + * interrupted task expects. > + * > + * To help combat this possibility, additional state checking is performed > + * (such as using cmpxchg() even though set() would suffice). These extra > + * checks will hopefully catch any ABA issue that a 32-bit system might > + * experience. > + * [..] > + * Usage > + * - > + * Here are some simple examples demonstrating writers and readers. For the > + * examples a global ringbuffer (test_rb) is available (which is not the > + * actual ringbuffer used by printk):: > + * > + * DECLARE_PRINTKRB(test_rb, 15, 5, 3); > + * > + * This ringbuffer allows up to 32768 records (2 ^ 15) and has a size of > + * 1 MiB (2 ^ 20) for text data and 256 KiB (2 ^ 18) for dictionary data. (2 ^ (15 + 5)) ... (2 ^ (15 + 3)) ? I'll play around more with this this week. But so far it looks good. -- Steve > + * > + * Sample writer code:: > + * > + * struct prb_reserved_entry e; > + * struct printk_record r; > + * > + * // specify how much to allocate > + * r.text_buf_size = strlen(textstr) + 1; > + * r.dict_buf_size = strlen(dictstr) + 1; > + * > + * if (prb_reserve(, _rb, )) { > + * snprintf(r.text_buf, r.text_buf_size, "%s", textstr); > + * > + * // dictionary allocation may have failed > + * if (r.dict_buf) > + * snprintf(r.dict_buf, r.dict_buf_size, "%s", dictstr); > + * > + * r.info->ts_nsec = local_clock(); > + * > + * prb_commit(); > + * } > + * ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [RFC PATCH v5 1/3] printk-rb: new printk ringbuffer implementation (writer)
On Tue, 3 Dec 2019 10:17:21 +0900 Sergey Senozhatsky wrote: > > > BTW: If I am counting correctly. The ABA problem would happen when > > > exactly 2^30 (1G) messages is written in the mean time. > > > > All the ringbuffer code assumes that the use of index numbers handles > > the ABA problem (i.e. there must not be 1 billion printk's within an > > NMI). If we want to support 1 billion+ printk's within an NMI, then > > perhaps the index number should be increased. For 64-bit systems it > > would be no problem to go to 62 bits. For 32-bit systems, I don't know > > how well the 64-bit atomic operations are supported. > > ftrace dumps from NMI (DUMP_ALL type ftrace_dump_on_oops on a $BIG > machine)? 1G seems large enough, but who knows. ftrace dump from NMI is the most likely case to hit this, but when that happens, you are in debugging mode, and the system usually becomes unreliable at this moment. I agree with Petr, that we should not complicate the code more to handle this theoretical condition. -- Steve ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH] print kdump kernel loaded status in stack dump
On Thu, 18 Jan 2018 10:02:17 -0800 Andi Kleenwrote: > Dave Young writes: > > printk("%sHardware name: %s\n", > >log_lvl, dump_stack_arch_desc_str); > > + if (kexec_crash_loaded()) > > + printk("%skdump kernel loaded\n", log_lvl); > > Oops/warnings are getting longer and longer, often scrolling away > from the screen, and if the kernel crashes backscroll does not work > anymore, so precious information is lost. > > Can you merge it with some other line? > > Just a [KDUMP] or so somewhere should be good enough. Or perhaps we should add it as a TAINT. Not all taints are bad. -- Steve ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [V5 PATCH 3/4] kexec: Fix race between panic() and crash_kexec() called directly
On Fri, Nov 20, 2015 at 06:36:48PM +0900, Hidehiro Kawai wrote: > Currently, panic() and crash_kexec() can be called at the same time. > For example (x86 case): > > CPU 0: > oops_end() > crash_kexec() > mutex_trylock() // acquired > nmi_shootdown_cpus() // stop other cpus > > CPU 1: > panic() > crash_kexec() > mutex_trylock() // failed to acquire > smp_send_stop() // stop other cpus > infinite loop > > If CPU 1 calls smp_send_stop() before nmi_shootdown_cpus(), kdump > fails. So the smp_send_stop() stops CPU 0 from calling nmi_shootdown_cpus(), right? > > In another case: > > CPU 0: > oops_end() > crash_kexec() > mutex_trylock() // acquired > > io_check_error() > panic() > crash_kexec() > mutex_trylock() // failed to acquire > infinite loop > > Clearly, this is an undesirable result. I'm trying to see how this patch fixes this case. > > To fix this problem, this patch changes crash_kexec() to exclude > others by using atomic_t panic_cpu. > > V5: > - Add missing dummy __crash_kexec() for !CONFIG_KEXEC_CORE case > - Replace atomic_xchg() with atomic_set() in crash_kexec() because > it is used as a release operation and there is no need of memory > barrier effect. This change also removes an unused value warning > > V4: > - Use new __crash_kexec(), no exclusion check version of crash_kexec(), > instead of checking if panic_cpu is the current cpu or not > > V2: > - Use atomic_cmpxchg() instead of spin_trylock() on panic_lock > to exclude concurrent accesses > - Don't introduce no-lock version of crash_kexec() > > Signed-off-by: Hidehiro Kawai> Cc: Eric Biederman > Cc: Vivek Goyal > Cc: Andrew Morton > Cc: Michal Hocko > --- > include/linux/kexec.h |2 ++ > kernel/kexec_core.c | 26 +- > kernel/panic.c|4 ++-- > 3 files changed, 29 insertions(+), 3 deletions(-) > > diff --git a/include/linux/kexec.h b/include/linux/kexec.h > index d140b1e..7b68d27 100644 > --- a/include/linux/kexec.h > +++ b/include/linux/kexec.h > @@ -237,6 +237,7 @@ extern int kexec_purgatory_get_set_symbol(struct kimage > *image, > unsigned int size, bool get_value); > extern void *kexec_purgatory_get_symbol_addr(struct kimage *image, >const char *name); > +extern void __crash_kexec(struct pt_regs *); > extern void crash_kexec(struct pt_regs *); > int kexec_should_crash(struct task_struct *); > void crash_save_cpu(struct pt_regs *regs, int cpu); > @@ -332,6 +333,7 @@ int __weak arch_kexec_apply_relocations(const Elf_Ehdr > *ehdr, Elf_Shdr *sechdrs, > #else /* !CONFIG_KEXEC_CORE */ > struct pt_regs; > struct task_struct; > +static inline void __crash_kexec(struct pt_regs *regs) { } > static inline void crash_kexec(struct pt_regs *regs) { } > static inline int kexec_should_crash(struct task_struct *p) { return 0; } > #define kexec_in_progress false > diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c > index 11b64a6..9d097f5 100644 > --- a/kernel/kexec_core.c > +++ b/kernel/kexec_core.c > @@ -853,7 +853,8 @@ struct kimage *kexec_image; > struct kimage *kexec_crash_image; > int kexec_load_disabled; > > -void crash_kexec(struct pt_regs *regs) > +/* No panic_cpu check version of crash_kexec */ > +void __crash_kexec(struct pt_regs *regs) > { > /* Take the kexec_mutex here to prevent sys_kexec_load >* running on one cpu from replacing the crash kernel > @@ -876,6 +877,29 @@ void crash_kexec(struct pt_regs *regs) > } > } > > +void crash_kexec(struct pt_regs *regs) > +{ > + int old_cpu, this_cpu; > + > + /* > + * Only one CPU is allowed to execute the crash_kexec() code as with > + * panic(). Otherwise parallel calls of panic() and crash_kexec() > + * may stop each other. To exclude them, we use panic_cpu here too. > + */ > + this_cpu = raw_smp_processor_id(); > + old_cpu = atomic_cmpxchg(_cpu, -1, this_cpu); > + if (old_cpu == -1) { > + /* This is the 1st CPU which comes here, so go ahead. */ > + __crash_kexec(regs); > + > + /* > + * Reset panic_cpu to allow another panic()/crash_kexec() > + * call. > + */ > + atomic_set(_cpu, -1); > + } > +} > + > size_t crash_get_memory_size(void) > { > size_t size = 0; > diff --git a/kernel/panic.c b/kernel/panic.c > index 4fce2be..5d0b807 100644 > --- a/kernel/panic.c > +++ b/kernel/panic.c > @@ -138,7 +138,7 @@ void panic(const char *fmt, ...) >* the "crash_kexec_post_notifiers" option to the kernel. >*/ > if (!crash_kexec_post_notifiers) > - crash_kexec(NULL); > + __crash_kexec(NULL); Why call
Re: [V5 PATCH 2/4] panic/x86: Allow cpus to save registers even if they are looping in NMI context
On Tue, Nov 24, 2015 at 11:48:53AM +0100, Borislav Petkov wrote: > > > +*/ > > + while (!raw_spin_trylock(_reason_lock)) > > + poll_crash_ipi_and_callback(regs); > > Waaait a minute: so if we're getting NMIs broadcasted on every core but > we're *not* crash dumping, we will run into here too. This can't be > right. :-\ This only does something if crash_ipi_done is set, which means you are killing the box. But perhaps a comment that states that here would be useful, or maybe just put in the check here. There's no need to make it depend on SMP, as raw_spin_trylock() will turn to just ({1}) for UP, and that code wont even be hit. -- Steve ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [V5 PATCH 1/4] panic/x86: Fix re-entrance problem due to panic on NMI
On Tue, 24 Nov 2015 10:05:10 -0500 Steven Rostedt <rost...@goodmis.org> wrote: > cmpxchg(_cpu, -1, 0) != 0 > > returns -1 for cpu 0, thus 0 != 0, and sets panic_cpu to 0 That was suppose to be "thus -1 != 0". -- Steve ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [V5 PATCH 1/4] panic/x86: Fix re-entrance problem due to panic on NMI
On Fri, Nov 20, 2015 at 06:36:44PM +0900, Hidehiro Kawai wrote: > diff --git a/include/linux/kernel.h b/include/linux/kernel.h > index 350dfb0..480a4fd 100644 > --- a/include/linux/kernel.h > +++ b/include/linux/kernel.h > @@ -445,6 +445,19 @@ extern int sysctl_panic_on_stackoverflow; > > extern bool crash_kexec_post_notifiers; > > +extern atomic_t panic_cpu; > + > +/* > + * A variant of panic() called from NMI context. > + * If we've already panicked on this cpu, return from here. > + */ > +#define nmi_panic(fmt, ...) \ > + do {\ > + int this_cpu = raw_smp_processor_id(); \ > + if (atomic_cmpxchg(_cpu, -1, this_cpu) != this_cpu) \ > + panic(fmt, ##__VA_ARGS__); \ Hmm, What happens if: CPU 0: CPU 1: -- -- nmi_panic(); nmi_panic(); nmi_panic(); ? cmpxchg(_cpu, -1, 0) != 0 returns -1 for cpu 0, thus 0 != 0, and sets panic_cpu to 0 cmpxchg(_cpu, -1, 1) != 1 returns 0, and then it too panics, but does not set panic_cpu to 1 Now you have your external NMI triggering on CPU 1 cmpxchg(_cpu, -1, 1) != 1 returns 0 again, and you call panic again within the panic of CPU 1. Is this OK? Perhaps you want a per cpu bitmask, and do a test_and_set() on the CPU. That would prevent any CPU from rerunning a panic() twice on any CPU. -- Steve > + } while (0) > + > /* > * Only to be used by arch init code. If the user over-wrote the default > * CONFIG_PANIC_TIMEOUT, honor it. > diff --git a/kernel/panic.c b/kernel/panic.c > index 4579dbb..24ee2ea 100644 ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH -v2 7/8] kexec jump: ftrace_enabled_save/restore
On Mon, 11 Aug 2008, Huang Ying wrote: Hi, Steven, On Fri, 2008-08-08 at 10:30 -0400, Steven Rostedt wrote: [...] The only problem with this approach is what happens if the user changes the enabled in between these two calls. This would make ftrace inconsistent. I have a patch from the -rt tree that handles what you want. It is attached below. Not sure how well it will apply to mainline. I really need to go through the rt patch set and start submitting a bunch of clean-up/fixes to mainline. We've been meaning to do it, just have been distracted :-( Your version is better in general sense. Thank you very much! But in this specific situation of kexec/kjump. The execution environment is that other CPUs are disabled, local irq is disabled, and it is not permitted to switch to other process. But it is safe and sufficient to use non-locked version here. So to satisfy both demands, I think it is better to provide both version, locked and non-locked. What do you think about that? Sounds good, I'm looking forward to the patch ;-) -- Steve ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH -v2 7/8] kexec jump: ftrace_enabled_save/restore
On Fri, 8 Aug 2008, Vivek Goyal wrote: On Fri, Aug 08, 2008 at 02:52:48PM +0800, Huang Ying wrote: Add ftrace_enabled_save/restore, used to disable ftrace for a while. This is used by kexec jump. Signed-off-by: Huang Ying [EMAIL PROTECTED] CCing Steven Rostedt for ftrace related changes. Thanks, --- include/linux/ftrace.h | 18 ++ 1 file changed, 18 insertions(+) --- a/include/linux/ftrace.h +++ b/include/linux/ftrace.h @@ -98,6 +98,24 @@ static inline void tracer_disable(void) #endif } +static inline int ftrace_enabled_save(void) +{ +#ifdef CONFIG_FTRACE + int saved_ftrace_enabled = ftrace_enabled; + ftrace_enabled = 0; + return saved_ftrace_enabled; +#else + return 0; +#endif +} + +static inline void ftrace_enabled_restore(int enabled) +{ +#ifdef CONFIG_FTRACE + ftrace_enabled = enabled; +#endif +} + #ifdef CONFIG_FRAME_POINTER /* TODO: need to fix this for ARM */ # define CALLER_ADDR0 ((unsigned long)__builtin_return_address(0)) The only problem with this approach is what happens if the user changes the enabled in between these two calls. This would make ftrace inconsistent. I have a patch from the -rt tree that handles what you want. It is attached below. Not sure how well it will apply to mainline. I really need to go through the rt patch set and start submitting a bunch of clean-up/fixes to mainline. We've been meaning to do it, just have been distracted :-( -- Steve From: Steven Rostedt [EMAIL PROTECTED] Subject: ftrace: cpu hotplug fix Peter Zijlstra found that taking down and bringing up a new CPU caused ftrace to crash the kernel. This was due to some arch calls that were being traced by the function tracer before the smp_processor_id was set up. Since the function tracer uses smp_processor_id it caused a triple fault. Instead of adding notrace all over the architecture code to prevent this problem, it is easier to simply disable the function tracer when bringing up a new CPU. Signed-off-by: Steven Rostedt [EMAIL PROTECTED] --- include/linux/ftrace.h| 11 --- kernel/cpu.c |9 + kernel/trace/ftrace.c | 23 +++ kernel/trace/trace_irqsoff.c |3 +++ kernel/trace/trace_sched_wakeup.c |2 +- 5 files changed, 44 insertions(+), 4 deletions(-) Index: linux-2.6.26/include/linux/ftrace.h === --- linux-2.6.26.orig/include/linux/ftrace.h +++ linux-2.6.26/include/linux/ftrace.h @@ -33,10 +33,15 @@ void clear_ftrace_function(void); extern void ftrace_stub(unsigned long a0, unsigned long a1); +void ftrace_enable(void); +void ftrace_disable(void); + #else /* !CONFIG_FTRACE */ -# define register_ftrace_function(ops) do { } while (0) -# define unregister_ftrace_function(ops) do { } while (0) -# define clear_ftrace_function(ops) do { } while (0) +# define register_ftrace_function(ops) do { } while (0) +# define unregister_ftrace_function(ops) do { } while (0) +# define clear_ftrace_function(ops)do { } while (0) +# define ftrace_enable() do { } while (0) +# define ftrace_disable() do { } while (0) #endif /* CONFIG_FTRACE */ #ifdef CONFIG_DYNAMIC_FTRACE Index: linux-2.6.26/kernel/cpu.c === --- linux-2.6.26.orig/kernel/cpu.c +++ linux-2.6.26/kernel/cpu.c @@ -14,6 +14,7 @@ #include linux/kthread.h #include linux/stop_machine.h #include linux/mutex.h +#include linux/ftrace.h /* Serializes the updates to cpu_online_map, cpu_present_map */ static DEFINE_MUTEX(cpu_add_remove_lock); @@ -300,8 +301,16 @@ static int __cpuinit _cpu_up(unsigned in goto out_notify; } + /* +* Disable function tracing while bringing up a new CPU. +* We don't want to trace functions that can not handle a +* smp_processor_id() call. +*/ + ftrace_disable(); + /* Arch-specific enabling code. */ ret = __cpu_up(cpu); + ftrace_enable(); if (ret != 0) goto out_notify; BUG_ON(!cpu_online(cpu)); Index: linux-2.6.26/kernel/trace/ftrace.c === --- linux-2.6.26.orig/kernel/trace/ftrace.c +++ linux-2.6.26/kernel/trace/ftrace.c @@ -151,6 +151,29 @@ static int __unregister_ftrace_function( return ret; } +static int save_ftrace_enabled; + +void ftrace_disable(void) +{ + mutex_lock(ftrace_sysctl_lock); + + save_ftrace_enabled = ftrace_enabled; + ftrace_enabled = 0; +} + +void ftrace_enable(void) +{ + /* ftrace_enable must be paired with ftrace_disable */ + if (!mutex_is_locked(ftrace_sysctl_lock)) { + WARN_ON(1); + return; + } + + ftrace_enabled