Re: [PATCH 09/15] tracing: Introduce names for events

2023-12-12 Thread Steven Rostedt
On Wed, 13 Dec 2023 00:04:46 +
Alexander Graf  wrote:

> With KHO (Kexec HandOver), we want to preserve trace buffers. To parse
> them, we need to ensure that all trace events that exist in the logs are
> identical to the ones we parse as. That means we need to match the
> events before and after kexec.
> 
> As a first step towards that, let's give every event a unique name. That
> way we can clearly identify the event before and after kexec and restore
> its ID post-kexec.
> 
> Signed-off-by: Alexander Graf 
> ---
>  include/linux/trace_events.h |  1 +
>  include/trace/trace_events.h |  2 ++
>  kernel/trace/blktrace.c  |  1 +
>  kernel/trace/trace_branch.c  |  1 +
>  kernel/trace/trace_events.c  |  3 +++
>  kernel/trace/trace_functions_graph.c |  4 +++-
>  kernel/trace/trace_output.c  | 13 +
>  kernel/trace/trace_probe.c   |  3 +++
>  kernel/trace/trace_syscalls.c| 29 
>  9 files changed, 56 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
> index d68ff9b1247f..7670224aa92d 100644
> --- a/include/linux/trace_events.h
> +++ b/include/linux/trace_events.h
> @@ -149,6 +149,7 @@ struct trace_event {
>   struct hlist_node   node;
>   int type;
>   struct trace_event_functions*funcs;
> + const char  *name;
>  };

OK, this is a hard no. We definitely need to find a different way to do
this. I'm trying hard to lower the footprint of tracing, and this just
added 8 bytes to every event on a 64 bit machine.

On my box I have 1953 events, and they are constantly growing. This just
added 15,624 bytes of tracing overhead to that machine.

That may not sound like much, but as this is only for this feature, it just
added 15K to the overhead for the majority of users.

I'm not sure how easy it is to make this a config option that takes away
that field when not set. But I would need that at a minimum.

-- Steve




Re: [PATCH 08/15] tracing: Introduce names for ring buffers

2023-12-12 Thread Steven Rostedt
On Wed, 13 Dec 2023 01:35:16 +0100
Alexander Graf  wrote:

> > The trace_array is the structure that represents each tracing instance. And
> > it already has a name field. And if you can get the associated ring buffer
> > from that too.
> >
> > struct trace_array *tr;
> >
> >  tr->array_buffer.buffer
> >
> >  tr->name
> >
> > When you do: mkdir /sys/kernel/tracing/instance/foo
> >
> > You create a new trace_array instance where tr->name = "foo" and allocates
> > the buffer for it as well.  
> 
> The name in the ring buffer is pretty much just a copy of the trace 
> array name. I use it to reconstruct which buffer we're actually 
> referring to inside __ring_buffer_alloc().

No, I rather not tie the ring buffer to the trace_array.

> 
> I'm all ears for alternative suggestions. I suppose we could pass tr as 
> argument to ring_buffer_alloc() instead of the name?

I'll have to spend some time (that I don't currently have :-( ) on looking
at this more. I really don't like the copying of the name into the ring
buffer allocation, as it may be an unneeded burden to maintain, not to
mention the duplicate field.

-- Steve



Re: [PATCH 08/15] tracing: Introduce names for ring buffers

2023-12-12 Thread Steven Rostedt
On Wed, 13 Dec 2023 00:04:45 +
Alexander Graf  wrote:

> With KHO (Kexec HandOver), we want to preserve trace buffers across
> kexec. To carry over their state between kernels, the kernel needs a
> common handle for them that exists on both sides. As handle we introduce
> names for ring buffers. In a follow-up patch, the kernel can then use
> these names to recover buffer contents for specific ring buffers.
> 

Is there a way to use the trace_array name instead?

The trace_array is the structure that represents each tracing instance. And
it already has a name field. And if you can get the associated ring buffer
from that too.

struct trace_array *tr;

tr->array_buffer.buffer

tr->name

When you do: mkdir /sys/kernel/tracing/instance/foo

You create a new trace_array instance where tr->name = "foo" and allocates
the buffer for it as well.

-- Steve



Re: [PATCH 03/10] ftrace: Remove the now superfluous sentinel elements from ctl_table array

2023-11-08 Thread Steven Rostedt
On Wed, 8 Nov 2023 19:29:49 +0900
Masami Hiramatsu (Google)  wrote:

> On Tue, 07 Nov 2023 14:45:03 +0100
> Joel Granados via B4 Relay  wrote:
> 
> > From: Joel Granados 
> > 
> > This commit comes at the tail end of a greater effort to remove the
> > empty elements at the end of the ctl_table arrays (sentinels) which
> > will reduce the overall build time size of the kernel and run time
> > memory bloat by ~64 bytes per sentinel (further information Link :
> > https://lore.kernel.org/all/zo5yx5jfoggi%2f...@bombadil.infradead.org/)
> > 
> > Remove sentinel elements from ftrace_sysctls and user_event_sysctls
> >   
> 
> Both looks good to me. (since register_sysctl_init() uses ARRAY_SIZE()
> macro to get the array size.)
> 
> Acked-by: Masami Hiramatsu (Google) 

Agreed.

Acked-by: Steven Rostedt (Google) 

-- Steve



Re: [PATCH] kexec: Fix reboot race during device_shutdown()

2023-10-09 Thread Steven Rostedt
On Sat, 7 Oct 2023 21:30:42 -0400
Joel Fernandes  wrote:

> Just checking how we want to proceed, is the consensus that we should
> prevent kernel crashes without relying on userspace stopping all
> processes? Should we fix regular reboot syscall as well and not just
> kexec reboot?

If you can show that we can trigger the crash on normal reboot, then I
don't see why not. That is, if you have a program that does the reboot
(without the SIGSTOP/SIGKILL calls) and triggers this crash, I think that's
a legitimate reason to fix it on normal reboot too.

-- Steve

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v5 2/2] x86/purgatory: Add linker script

2023-03-30 Thread Steven Rostedt
On Thu, 30 Mar 2023 17:18:26 +0200
Borislav Petkov  wrote:

> On Thu, Mar 30, 2023 at 11:15:23AM -0400, Steven Rostedt wrote:
> > > Make sure that the .text section is not divided in multiple overlapping
> > > sections. This is not supported by kexec_file.  
> 
> And?
> 
> What is the failure scenario? Why are you fixing it? Why do we care?
> 
> This is way too laconic.
> 

Yeah, I think the change log in patch 1 needs to be in this patch too,
which gives better context.

-- Steve

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v5 2/2] x86/purgatory: Add linker script

2023-03-30 Thread Steven Rostedt


Hmm, this patch may need some more eyes. At least from the x86 maintainers.

-- Steve


On Thu, 30 Mar 2023 11:44:48 +0200
Ricardo Ribalda  wrote:

> Make sure that the .text section is not divided in multiple overlapping
> sections. This is not supported by kexec_file.
> 
> Signed-off-by: Ricardo Ribalda 
> ---
>  arch/x86/purgatory/.gitignore|  2 ++
>  arch/x86/purgatory/Makefile  | 20 +
>  arch/x86/purgatory/kexec-purgatory.S |  2 +-
>  arch/x86/purgatory/purgatory.lds.S   | 57 
> 
>  4 files changed, 74 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/purgatory/.gitignore b/arch/x86/purgatory/.gitignore
> index d2be1500671d..1fe71fe5945d 100644
> --- a/arch/x86/purgatory/.gitignore
> +++ b/arch/x86/purgatory/.gitignore
> @@ -1 +1,3 @@
>  purgatory.chk
> +purgatory.lds
> +purgatory
> diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
> index 17f09dc26381..4dc96d409bec 100644
> --- a/arch/x86/purgatory/Makefile
> +++ b/arch/x86/purgatory/Makefile
> @@ -16,10 +16,11 @@ CFLAGS_sha256.o := -D__DISABLE_EXPORTS
>  
>  # When linking purgatory.ro with -r unresolved symbols are not checked,
>  # also link a purgatory.chk binary without -r to check for unresolved 
> symbols.
> -PURGATORY_LDFLAGS := -e purgatory_start -z nodefaultlib
> -LDFLAGS_purgatory.ro := -r $(PURGATORY_LDFLAGS)
> -LDFLAGS_purgatory.chk := $(PURGATORY_LDFLAGS)
> -targets += purgatory.ro purgatory.chk
> +PURGATORY_LDFLAGS := -nostdlib -z nodefaultlib
> +LDFLAGS_purgatory := -r $(PURGATORY_LDFLAGS) -T
> +LDFLAGS_purgatory.chk := -e purgatory_start $(PURGATORY_LDFLAGS)
> +
> +targets += purgatory.lds purgatory.ro purgatory.chk
>  
>  # Sanitizer, etc. runtimes are unavailable and cannot be linked here.
>  GCOV_PROFILE := n
> @@ -72,10 +73,17 @@ CFLAGS_string.o   += $(PURGATORY_CFLAGS)
>  AFLAGS_REMOVE_setup-x86_$(BITS).o+= -Wa,-gdwarf-2
>  AFLAGS_REMOVE_entry64.o  += -Wa,-gdwarf-2
>  
> -$(obj)/purgatory.ro: $(PURGATORY_OBJS) FORCE
> +OBJCOPYFLAGS_purgatory.ro := -O elf64-x86-64
> +OBJCOPYFLAGS_purgatory.ro += --remove-section='*debug*'
> +OBJCOPYFLAGS_purgatory.ro += --remove-section='.comment'
> +OBJCOPYFLAGS_purgatory.ro += --remove-section='.note.*'
> +$(obj)/purgatory.ro: $(obj)/purgatory FORCE
> + $(call if_changed,objcopy)
> +
> +$(obj)/purgatory.chk: $(obj)/purgatory FORCE
>   $(call if_changed,ld)
>  
> -$(obj)/purgatory.chk: $(obj)/purgatory.ro FORCE
> +$(obj)/purgatory: $(obj)/purgatory.lds $(PURGATORY_OBJS) FORCE
>   $(call if_changed,ld)
>  
>  $(obj)/kexec-purgatory.o: $(obj)/purgatory.ro $(obj)/purgatory.chk
> diff --git a/arch/x86/purgatory/kexec-purgatory.S 
> b/arch/x86/purgatory/kexec-purgatory.S
> index 8530fe93b718..54b0d0b4dc42 100644
> --- a/arch/x86/purgatory/kexec-purgatory.S
> +++ b/arch/x86/purgatory/kexec-purgatory.S
> @@ -5,7 +5,7 @@
>   .align  8
>  kexec_purgatory:
>   .globl  kexec_purgatory
> - .incbin "arch/x86/purgatory/purgatory.ro"
> + .incbin "arch/x86/purgatory/purgatory"
>  .Lkexec_purgatory_end:
>  
>   .align  8
> diff --git a/arch/x86/purgatory/purgatory.lds.S 
> b/arch/x86/purgatory/purgatory.lds.S
> new file mode 100644
> index ..610da88aafa0
> --- /dev/null
> +++ b/arch/x86/purgatory/purgatory.lds.S
> @@ -0,0 +1,57 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#include 
> +
> +OUTPUT_FORMAT(CONFIG_OUTPUT_FORMAT)
> +
> +#undef i386
> +
> +#include 
> +#include 
> +
> +ENTRY(purgatory_start)
> +
> +SECTIONS
> +{
> + . = 0;
> + .head.text : {
> + _head = . ;
> + HEAD_TEXT
> + _ehead = . ;
> + }
> + .rodata : {
> + _rodata = . ;
> + *(.rodata)   /* read-only data */
> + *(.rodata.*)
> + _erodata = . ;
> + }
> + .text : {
> + _text = .;  /* Text */
> + *(.text)
> + *(.text.*)
> + *(.noinstr.text)
> + _etext = . ;
> + }
> + .data : {
> + _data = . ;
> + *(.data)
> + *(.data.*)
> + *(.bss.efistub)
> + _edata = . ;
> + }
> + . = ALIGN(L1_CACHE_BYTES);
> + .bss : {
> + _bss = . ;
> + *(.bss)
> + *(.bss.*)
> + *(COMMON)
> + . = ALIGN(8);   /* For convenience during zeroing */
> + _ebss = .;
> + }
> +
> + /* Sections to be discarded */
> + /DISCARD/ : {
> + *(.eh_frame)
> + *(*__ksymtab*)
> + *(___kcrctab*)
> + }
> +}
> 


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v5 1/2] kexec: Support purgatories with .text.hot sections

2023-03-30 Thread Steven Rostedt
On Thu, 30 Mar 2023 11:44:47 +0200
Ricardo Ribalda  wrote:

> Clang16 links the purgatory text in two sections:
> 
>   [ 1] .text PROGBITS   0040
>11a1    AX   0 0 16
>   [ 2] .rela.textRELA   3498
>0648  0018   I  24 1 8
>   ...
>   [17] .text.hot.PROGBITS   3220
>020b    AX   0 0 1
>   [18] .rela.text.hot.   RELA   4428
>0078  0018   I  2417 8
> 
> And both of them have their range [sh_addr ... sh_addr+sh_size] on the
> area pointed by `e_entry`.
> 
> This causes that image->start is calculated twice, once for .text and
> another time for .text.hot. The second calculation leaves image->start
> in a random location.
> 
> Because of this, the system crashes immediately after:
> 
> kexec_core: Starting new kernel
> 
> Cc: sta...@vger.kernel.org
> Fixes: 930457057abe ("kernel/kexec_file.c: split up __kexec_load_puragory")
> Reviewed-by: Ross Zwisler 
> Signed-off-by: Ricardo Ribalda 
> ---
>  kernel/kexec_file.c | 14 +-
>  1 file changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
> index f1a0e4e3fb5c..c7a0e51a6d87 100644
> --- a/kernel/kexec_file.c
> +++ b/kernel/kexec_file.c
> @@ -901,10 +901,22 @@ static int kexec_purgatory_setup_sechdrs(struct 
> purgatory_info *pi,
>   }
>  
>   offset = ALIGN(offset, align);
> +
> + /*
> +  * Check if the segment contains the entry point, if so,
> +  * calculate the value of image->start based on it.
> +  * If the compiler has produced more than one .text section
> +  * (Eg: .text.hot), they are generally after the main .text
> +  * section, and they shall not be used to calculate
> +  * image->start. So do not re-calculate image->start if it
> +  * is not set to the initial value, and warn the user so they
> +  * have a chance to fix their purgatory's linker script.
> +  */
>   if (sechdrs[i].sh_flags & SHF_EXECINSTR &&
>   pi->ehdr->e_entry >= sechdrs[i].sh_addr &&
>   pi->ehdr->e_entry < (sechdrs[i].sh_addr
> -  + sechdrs[i].sh_size)) {
> +  + sechdrs[i].sh_size) &&
> +     !WARN_ON(kbuf->image->start != pi->ehdr->e_entry)) {
>   kbuf->image->start -= sechdrs[i].sh_addr;
>   kbuf->image->start += kbuf->mem + offset;
>   }
> 

Reviewed-by: Steven Rostedt (Google) 

Thanks Ricardo!

-- Steve

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH] kexec: Support purgatories with .text.hot sections

2023-03-22 Thread Steven Rostedt
On Wed, 22 Mar 2023 22:52:04 +0800
Baoquan He  wrote:

> When you resne patch, please fix Philipp's mail adress as
> 'Philipp Rudo ' if he should know this. He has joined
> Redhat.

But I thought redhat *was* IBM? ;-)

-- Steve

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH] kexec: Support purgatories with .text.hot sections

2023-03-22 Thread Steven Rostedt
On Tue, Mar 21, 2023 at 12:49:08PM +0100, Ricardo Ribalda wrote:
> Clang16 links the purgatory text in two sections:
> 
>   [ 1] .text PROGBITS   0040
>11a1    AX   0 0 16
>   [ 2] .rela.textRELA   3498
>0648  0018   I  24 1 8
>   ...
>   [17] .text.hot.PROGBITS   3220
>020b    AX   0 0 1
>   [18] .rela.text.hot.   RELA   4428
>0078  0018   I  2417 8
> 
> And both of them have their range [sh_addr ... sh_addr+sh_size] on the
> area pointed by `e_entry`.
> 
> This causes that image->start is calculated twice, once for .text and
> another time for .text.hot. The second calculation leaves image->start
> in a random location.
> 
> Because of this, the system crashes inmediatly after:
> 
> kexec_core: Starting new kernel
> 
> Signed-off-by: Ricardo Ribalda 
> To: Eric Biederman 
> Cc: Philipp Rudo 
> Cc: kexec@lists.infradead.org
> Cc: linux-ker...@vger.kernel.org
> ---
>  kernel/kexec_file.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
> index f1a0e4e3fb5c..b1a25d97d5e2 100644
> --- a/kernel/kexec_file.c
> +++ b/kernel/kexec_file.c
> @@ -904,7 +904,8 @@ static int kexec_purgatory_setup_sechdrs(struct 
> purgatory_info *pi,
>   if (sechdrs[i].sh_flags & SHF_EXECINSTR &&
>   pi->ehdr->e_entry >= sechdrs[i].sh_addr &&
>   pi->ehdr->e_entry < (sechdrs[i].sh_addr
> -  + sechdrs[i].sh_size)) {
> +  + sechdrs[i].sh_size) &&
> + kbuf->image->start != pi->ehdr->e_shnum) {

Shouldn't this be: kbuf->image->start == pi->ehdr->e_shnum) {

?

As you want to only do this update when it's not equal to the initial value.
If this did work, then you may want to make sure that was the initial value.

Also, please add a comment about why you are doing this check.

Thanks!

-- Steve

>   kbuf->image->start -= sechdrs[i].sh_addr;
>   kbuf->image->start += kbuf->mem + offset;
>   }
> 
> ---
> base-commit: 17214b70a159c6547df9ae204a6275d983146f6b
> change-id: 20230321-kexec_clang16-4510c23d129c
> 
> Best regards,
> -- 
> Ricardo Ribalda 

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v5 0/3] kexec: Add new parameter to limit the access to kexec

2022-12-21 Thread Steven Rostedt
On Wed, 21 Dec 2022 20:45:56 +0100
Ricardo Ribalda  wrote:

> Add two parameter to specify how many times a kexec kernel can be loaded.
> 
> These parameter allow hardening the system.
> 
> While we are at it, fix a documentation issue and refactor some code.
> 
> To: Jonathan Corbet 
> To: Eric Biederman 
> Cc: linux-...@vger.kernel.org
> Cc: linux-ker...@vger.kernel.org
> Cc: kexec@lists.infradead.org
> Cc: Joel Fernandes (Google) 
> Cc: Sergey Senozhatsky 
> Cc: Steven Rostedt 
> Cc: Ross Zwisler 
> To: Philipp Rudo 
> To: Guilherme G. Piccoli 
> Signed-off-by: Ricardo Ribalda 

Reviewed-by: Steven Rostedt (Google) 

-- Steve

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v4 3/3] kexec: Introduce sysctl parameters kexec_load_limit_*

2022-12-21 Thread Steven Rostedt
On Wed, 21 Dec 2022 13:50:03 +0100
Ricardo Ribalda  wrote:

> @@ -941,6 +995,20 @@ static struct ctl_table kexec_core_sysctls[] = {
>   .extra1 = SYSCTL_ONE,
>   .extra2 = SYSCTL_ONE,
>   },
> + {
> + .procname   = "kexec_load_limit_panic",
> + .data   = _limit_panic,
> + .maxlen = sizeof(load_limit_panic),

If I understand the sysctl logic correctly, the .maxlen is the maxlen of
the input to the sysctl, and not the data. Usually set to sizeof(data)
because most proc_handlers write to data directly.

In this case, I believe it's not even used (you override it with the
struct ctl_table tmp). I guess it doesn't really matter what it's set to.
Perhaps just set it to zero and leave it out?

> + .mode   = 0644,
> + .proc_handler   = kexec_limit_handler,
> + },
> + {
> + .procname   = "kexec_load_limit_reboot",
> + .data   = _limit_reboot,
> + .maxlen = sizeof(load_limit_reboot),

Same here.

-- Steve

> + .mode   = 0644,
> + .proc_handler   = kexec_limit_handler,
> + },
>   { }
>  };
>  

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v3 3/3] kexec: Introduce parameters load_limit_reboot and load_limit_panic

2022-12-20 Thread Steven Rostedt
On Wed, 21 Dec 2022 02:22:57 +0100
Ricardo Ribalda  wrote:

> > > + kexec_core.load_limit_reboot=
> > > + kexec_core.load_limit_panic=
> > > + [KNL]
> > > + This parameter specifies a limit to the number of 
> > > times
> > > + a kexec kernel can be loaded.
> > > + Format: 
> > > + -1  = Unlimited.
> > > + int = Number of times kexec can be called.
> > > +
> > > + During runtime, this parameter can be modified with 
> > > a  
> >  
> > > + value smaller than the current one (but not -1).  
> >
> > Perhaps state:
> > smaller positive value than the current one or if
> > current is currently -1.  
> 
> I find it a bit complicated..
> What about:
> 
>  During runtime this parameter can be modified with a more restrictive value

Sure. That's better than the original.


> > > +module_param_cb(load_limit_reboot, _limit_ops, _limit_reboot, 
> > > 0644);
> > > +MODULE_PARM_DESC(load_limit_reboot, "Maximum attempts to load a kexec 
> > > reboot kernel");
> > > +
> > > +module_param_cb(load_limit_panic, _limit_ops, _limit_panic, 
> > > 0644);
> > > +MODULE_PARM_DESC(load_limit_reboot, "Maximum attempts to load a kexec 
> > > panic kernel");  
> >
> > Wait, why the module params if this can not be a module?
> >
> > The kernel/kexec.c is decided via CONFIG_KEXEC_CORE which is bool. Either
> > builtin or not at all. No module selection possible.
> >
> > For kernel parameters, we should just use __setup(), right?  
> 
> Isn't __setup() only kernel parameter and then it cannot be updated on 
> runtime?

Yes, but then we use sysctl.

> 
> What about using late_param_cb? and remove MODULE_PARAM_DESC ?
> 
> I think this is how these parameters work
> 
> $ ls /sys/module/kernel/parameters/
> consoleblank  crash_kexec_post_notifiers  ignore_rlimit_data
> initcall_debug  module_blacklist  panic  panic_on_warn  panic_print
> pause_on_oops

Well, I think those are more leftovers and not something we want to add to.

I believe sysctl is the better option, and a more common one:

$ ls /proc/sys/kernel/
acct   perf_event_max_contexts_per_stack
acpi_video_flags   perf_event_max_sample_rate
auto_msgmniperf_event_max_stack
bootloader_typeperf_event_mlock_kb
bootloader_version perf_event_paranoid
bpf_stats_enabled  pid_max
cad_pidpoweroff_cmd
cap_last_cap   print-fatal-signals
core_pattern   printk
core_pipe_limitprintk_delay
core_uses_pid  printk_devkmsg
ctrl-alt-del   printk_ratelimit
dmesg_restrict printk_ratelimit_burst
domainname pty
firmware_configrandom
ftrace_dump_on_oopsrandomize_va_space
ftrace_enabled real-root-dev
hardlockup_all_cpu_backtrace   sched_autogroup_enabled
hardlockup_panic   sched_child_runs_first
hostname   sched_deadline_period_max_us
hotplugsched_deadline_period_min_us
hung_task_all_cpu_backtracesched_energy_aware
hung_task_check_count  sched_rr_timeslice_ms
hung_task_check_interval_secs  sched_rt_period_us
hung_task_panicsched_rt_runtime_us
hung_task_timeout_secs sched_util_clamp_max
hung_task_warnings sched_util_clamp_min
io_delay_type  sched_util_clamp_min_rt_default
kexec_load_disabledseccomp
keys   sem
kptr_restrict  sem_next_id
max_lock_depth shmall
max_rcu_stall_to_panic shmmax
modprobe   shmmni
modules_disabled   shm_next_id
msgmax shm_rmid_forced
msgmnb softlockup_all_cpu_backtrace
msgmni softlockup_panic
msg_next_idsoft_watchdog
ngroups_maxstack_tracer_enabled
nmi_watchdog   sysctl_writes_strict
ns_last_pidsysrq
numa_balancing tainted
oops_all_cpu_backtrace task_delayacct
osrelease  threads-max
ostype timer_migration
overflowgidtraceoff_on_warning
overflowuidtracepoint_printk
panic  unknown_nmi_panic
panic_on_io_nmiunprivileged_bpf_disabled
panic_on_oops  usermodehelper
panic_on_rcu_stall version
panic_on_unrecovered_nmi   watchdog
panic_on_warn  watchdog_cpumask
panic_printwatchdog_thresh
perf_cpu_time_max_percent  yama



> I could pass the flags and then check for flags & (KEXEC_ON_CRASH |
> KEXEC_FILE_ON_CRASH)... but not 

Re: [PATCH v3 3/3] kexec: Introduce parameters load_limit_reboot and load_limit_panic

2022-12-20 Thread Steven Rostedt
On Tue, 20 Dec 2022 23:05:45 +0100
Ricardo Ribalda  wrote:

I hate to be the grammar police, but..

> Add two parameter to specify how many times a kexec kernel can be loaded.

   "parameters"

> 
> The sysadmin can set different limits for kexec panic and kexec reboot
> kernels.
> 
> The value can be modified at runtime via sysfs, but only with a value
> smaller than the current one (except -1).
> 
> Signed-off-by: Ricardo Ribalda 
> ---
>  Documentation/admin-guide/kernel-parameters.txt | 14 
>  include/linux/kexec.h   |  2 +-
>  kernel/kexec.c  |  2 +-
>  kernel/kexec_core.c | 91 
> -
>  kernel/kexec_file.c |  2 +-
>  5 files changed, 106 insertions(+), 5 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt 
> b/Documentation/admin-guide/kernel-parameters.txt
> index 42af9ca0127e..2b37d6a20747 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -2374,6 +2374,20 @@
>   for Movable pages.  "nn[KMGTPE]", "nn%", and "mirror"
>   are exclusive, so you cannot specify multiple forms.
>  
> + kexec_core.load_limit_reboot=
> + kexec_core.load_limit_panic=
> + [KNL]
> + This parameter specifies a limit to the number of times
> + a kexec kernel can be loaded.
> + Format: 
> + -1  = Unlimited.
> + int = Number of times kexec can be called.
> +
> + During runtime, this parameter can be modified with a

> + value smaller than the current one (but not -1).

Perhaps state:
smaller positive value than the current one or if
current is currently -1.

> +
> + Default: -1
> +
>   kgdbdbgp=   [KGDB,HW] kgdb over EHCI usb debug port.
>   Format: [,poll interval]
>   The controller # is the number of the ehci usb debug
> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
> index 182e0c11b87b..5daf9990d5b8 100644
> --- a/include/linux/kexec.h
> +++ b/include/linux/kexec.h
> @@ -407,7 +407,7 @@ extern int kimage_crash_copy_vmcoreinfo(struct kimage 
> *image);
>  extern struct kimage *kexec_image;
>  extern struct kimage *kexec_crash_image;
>  
> -bool kexec_load_permitted(void);
> +bool kexec_load_permitted(bool crash_image);
>  
>  #ifndef kexec_flush_icache_page
>  #define kexec_flush_icache_page(page)
> diff --git a/kernel/kexec.c b/kernel/kexec.c
> index ce1bca874a8d..7aefd134e319 100644
> --- a/kernel/kexec.c
> +++ b/kernel/kexec.c
> @@ -193,7 +193,7 @@ static inline int kexec_load_check(unsigned long 
> nr_segments,
>   int result;
>  
>   /* We only trust the superuser with rebooting the system. */
> - if (!kexec_load_permitted())
> + if (!kexec_load_permitted(flags & KEXEC_ON_CRASH))

Note, here we have KEXEC_ON_CRASH (see bottom).

>   return -EPERM;
>  
>   /* Permit LSMs and IMA to fail the kexec */
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index a1efc70f4158..adf71f2be3ff 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -952,13 +952,100 @@ static int __init kexec_core_sysctl_init(void)
>  late_initcall(kexec_core_sysctl_init);
>  #endif
>  
> -bool kexec_load_permitted(void)
> +struct kexec_load_limit {
> + /* Mutex protects the limit count. */
> + struct mutex mutex;
> + int limit;
> +};
> +
> +struct kexec_load_limit load_limit_reboot = {

Perhaps make the above static?

> + .mutex = __MUTEX_INITIALIZER(load_limit_reboot.mutex),
> + .limit = -1,
> +};
> +
> +struct kexec_load_limit load_limit_panic = {

static?

> + .mutex = __MUTEX_INITIALIZER(load_limit_panic.mutex),
> + .limit = -1,
> +};
> +
> +static int param_get_limit(char *buffer, const struct kernel_param *kp)
>  {
> + int ret;
> + struct kexec_load_limit *limit = kp->arg;

Looks better if "int ret;" is after the "limit".

> +
> + mutex_lock(>mutex);
> + ret = scnprintf(buffer, PAGE_SIZE, "%i\n", limit->limit);

The above string can be at most "-2147483648\n\0"

Which is 13 characters. Why use PAGE_SIZE. Or scnprintf(), and not just
state:

/* buffer is PAGE_SIZE, much larger than what %i can be */
ret = sprintf(buffer, "%i\n", limit->limit);

> + mutex_unlock(>mutex);
> +
> + return ret;
> +}
> +
> +static int param_set_limit(const char *buffer, const struct kernel_param *kp)
> +{
> + int ret;
> + struct kexec_load_limit *limit = kp->arg;
> + int new_val;
> +
> + ret = kstrtoint(buffer, 0, _val);
> + if (ret)
> + return ret;
> +
> + new_val = max(-1, new_val);

I wonder if anything less than -1 should be invalid.

> +
> + 

Re: [PATCH V3 06/11] tracing: Improve panic/die notifiers

2022-12-13 Thread Steven Rostedt
On Tue, 13 Dec 2022 20:51:07 -0300
"Guilherme G. Piccoli"  wrote:

> > Acked-by: Steven Rostedt (Google) 
> > 
> > -- Steve  
> 
> Hi Steve, since you were so kind in the other patch...decided to ping in
> this one as well heheh

Heh, yeah, I forgot about this one too.

> 
> So, do you think it's possible to pick it for 6.2? No dependency here,
> it's a standalone patch.

I can pull it in.

> 
> Thanks again and sorry for the annoyance!

No, sorry for missing these.

-- Steve

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v1 2/2] kexec: Introduce kexec_reboot_disabled

2022-11-29 Thread Steven Rostedt
On Tue, 29 Nov 2022 14:44:50 +0100
Philipp Rudo  wrote:

> An alternative approach and sort of compromise I see is to convert
> kexec_load_disabled from a simple on/off switch to a counter on how
> often a kexec load can be made (in practice a tristate on/off/one-shot
> should be sufficient). Ideally the reboot and panic path will
> have separate counters. With that you could for example use
> kexec_load_limit.reboot=0 and kexec_load_limit.panic=1 to disable the
> load of images for reboot while still allow to load a crash kernel
> once. With this you have the flexibility you need while also preventing
> a race where an attacker overwrites your crash kernel before you can
> toggle the switch. What do you think?

I actually like this idea :-)

-- Steve

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v1 2/2] kexec: Introduce kexec_reboot_disabled

2022-11-28 Thread Steven Rostedt
On Mon, 28 Nov 2022 17:28:55 +0100
Philipp Rudo  wrote:

> To be honest I don't think we make a progress here at the moment. I
> would like to hear from others what they think about this.

Not sure if you missed my reply.

  https://lore.kernel.org/all/20221128114200.72b3e...@gandalf.local.home/

-- Steve

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v1 2/2] kexec: Introduce kexec_reboot_disabled

2022-11-28 Thread Steven Rostedt
On Thu, 24 Nov 2022 16:01:15 +0100
Philipp Rudo  wrote:

> No, I think the implementation is fine. I'm currently only struggling
> to understand what problem kexec_reboot_disabled solves that cannot be
> solved by kexec_load_disabled.

Hi Philipp,

Thanks for working with us on this.

Let me try to explain our use case. We want kexec/kdump enabled, but we
really do not want kexec used for any other purpose. We must have the kexec
kernel loaded at boot up and not afterward.

Your recommendation of:

  kexec -p dump_kernel
  echo 1 > /proc/sys/kernel/kexec_load_disabled

can work, and we will probably add it. But we are taking the paranoid
approach, and what I learned in security 101 ;-) and that is, only open up
the minimal attack surface as possible.

Yes, it's highly unlikely that the above would crash. But as with most
security vulnerabilities, it's not going to be an attacker that creates a
new gadget here, but probably another script in the future that causes this
to be delayed or something, and a new window of opportunity will arise for
an attacker. Maybe, that new window only works for non panic kernels. Yes,
this is a contrived scenario, but the work vs risk is very low in adding
this feature.

Perhaps the attack surface that a reboot kexec could be, is that the
attacker gets the ability at boot up to load the kexec for reboot and not panic.
Then the attack must wait for the victim to reboot their machine before
they have access to the new kernel. Again, I admit this is contrived, but
just because I can't think of a real situation that this could be a problem
doesn't mean that one doesn't exist.

In other words, if we never want to allow a kexec reboot, why allow it at
all from the beginning? The above allows it, until we don't. That alone
makes us nervous. Whereas this patch is rather trivial and doesn't add
complexity.

Thanks for your time, we appreciate it.

-- Steve

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH RFC] kexec: Freeze processes before kexec

2022-11-16 Thread Steven Rostedt
On Wed, 16 Nov 2022 19:56:24 +
"Joel Fernandes (Google)"  wrote:

> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -1175,6 +1175,12 @@ int kernel_kexec(void)
>   } else
>  #endif
>   {
> + error = freeze_processes();
> + if (error) {
> + error = -EBUSY;
> + goto Unlock;
> + }

If this is the path of a kernel panic, do we really want to check the
return error of freeze_processes()? We are panicing, there's not much more
we can do.

-- Steve


> +
>   kexec_in_progress = true;
>   kernel_restart_prepare("kexec reboot");
>   migrate_to_reboot_cpu();

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH V3 06/11] tracing: Improve panic/die notifiers

2022-10-21 Thread Steven Rostedt
On Thu, 20 Oct 2022 18:53:43 -0300
"Guilherme G. Piccoli"  wrote:

> Could you pick it in your tree? Or do you prefer that I re-send as a
> solo patch, with your ACK?

I wasn't sure there were any dependencies on this. If not, I can take it.

-- Steve

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH V3 06/11] tracing: Improve panic/die notifiers

2022-10-21 Thread Steven Rostedt
On Fri, 19 Aug 2022 19:17:26 -0300
"Guilherme G. Piccoli"  wrote:

> Currently the tracing dump_on_oops feature is implemented through
> separate notifiers, one for die/oops and the other for panic;
> given they have the same functionality, let's unify them.
> 
> Also improve the function comment and change the priority of the
> notifier to make it execute earlier, avoiding showing useless trace
> data (like the callback names for the other notifiers); finally,
> we also removed an unnecessary header inclusion.
> 
> Cc: Petr Mladek 
> Cc: Sergei Shtylyov 
> Cc: Steven Rostedt 
> Signed-off-by: Guilherme G. Piccoli 

Sorry for the late reply.

Acked-by: Steven Rostedt (Google) 

-- Steve

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v2 08/13] tracing: Improve panic/die notifiers

2022-08-16 Thread Steven Rostedt
On Tue, 16 Aug 2022 10:57:20 -0400
Alan Stern  wrote:

> > static int trace_die_panic_handler(struct notifier_block *self,
> > unsigned long ev, void *unused)
> > {
> > if (!ftrace_dump_on_oops)
> > return NOTIFY_DONE;
> > 
> > /* The die notifier requires DIE_OOPS to trigger */
> > if (self == _die_notifier && ev != DIE_OOPS)
> > return NOTIFY_DONE;
> > 
> > ftrace_dump(ftrace_dump_on_oops);
> > 
> > return NOTIFY_DONE;
> > }  
> 
> Or better yet:
> 
>   if (ftrace_dump_on_oops) {
> 
>   /* The die notifier requires DIE_OOPS to trigger */
>   if (self != _die_notifier || ev == DIE_OOPS)
>   ftrace_dump(ftrace_dump_on_oops);
>   }
>   return NOTIFY_DONE;
> 

That may be more consolidated but less easy to read and follow. This is far
from a fast path.

As I maintain this bike-shed, I prefer the one I suggested ;-)

-- Steve

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v2 08/13] tracing: Improve panic/die notifiers

2022-08-16 Thread Steven Rostedt
On Tue, 19 Jul 2022 16:53:21 -0300
"Guilherme G. Piccoli"  wrote:


Sorry for the late review, but this fell to the bottom of my queue :-/

> +/*
> + * The idea is to execute the following die/panic callback early, in order
> + * to avoid showing irrelevant information in the trace (like other panic
> + * notifier functions); we are the 2nd to run, after hung_task/rcu_stall
> + * warnings get disabled (to prevent potential log flooding).
> + */
> +static int trace_die_panic_handler(struct notifier_block *self,
> + unsigned long ev, void *unused)
> +{
> + if (!ftrace_dump_on_oops)
> + goto out;
> +
> + if (self == _die_notifier && ev != DIE_OOPS)
> + goto out;

I really hate gotos that are not for clean ups.

> +
> + ftrace_dump(ftrace_dump_on_oops);
> +
> +out:
> + return NOTIFY_DONE;
> +}
> +

Just do:

static int trace_die_panic_handler(struct notifier_block *self,
unsigned long ev, void *unused)
{
if (!ftrace_dump_on_oops)
return NOTIFY_DONE;

/* The die notifier requires DIE_OOPS to trigger */
if (self == _die_notifier && ev != DIE_OOPS)
return NOTIFY_DONE;

ftrace_dump(ftrace_dump_on_oops);

return NOTIFY_DONE;
}


Thanks,

Other than that, Acked-by: Steven Rostedt (Google) 

-- Steve

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH 18/30] notifier: Show function names on notifier routines if DEBUG_NOTIFIERS is set

2022-05-11 Thread Steven Rostedt
On Thu, 28 Apr 2022 09:01:13 +0800
Xiaoming Ni  wrote:

> > +#ifdef CONFIG_DEBUG_NOTIFIERS
> > +   {
> > +   char sym_name[KSYM_NAME_LEN];
> > +
> > +   pr_info("notifiers: registered %s()\n",
> > +   notifier_name(n, sym_name));
> > +   }  
> 
> Duplicate Code.
> 
> Is it better to use __func__ and %pS?
> 
> pr_info("%s: %pS\n", __func__, n->notifier_call);
> 
> 
> > +#endif

Also, don't sprinkle #ifdef in C code. Instead:

if (IS_ENABLED(CONFIG_DEBUG_NOTIFIERS))
pr_info("notifers: regsiter %ps()\n",
n->notifer_call);


Or define a print macro at the start of the C file that is a nop if it is
not defined, and use the macro.

-- Steve

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH 23/30] printk: kmsg_dump: Introduce helper to inform number of dumpers

2022-05-11 Thread Steven Rostedt
On Wed, 27 Apr 2022 19:49:17 -0300
"Guilherme G. Piccoli"  wrote:

> Currently we don't have a way to check if there are dumpers set,
> except counting the list members maybe. This patch introduces a very
> simple helper to provide this information, by just keeping track of
> registered/unregistered kmsg dumpers. It's going to be used on the
> panic path in the subsequent patch.

FYI, it is considered "bad form" to reference in the change log "this
patch". We know this is a patch. The change log should just talk about what
is being done. So can you reword your change logs (you do this is almost
every patch). Here's what I would reword the above to be:

 Currently we don't have a way to check if there are dumpers set, except
 perhaps by counting the list members. Introduce a very simple helper to
 provide this information, by just keeping track of registered/unregistered
 kmsg dumpers. This will simplify the refactoring of the panic path.


-- Steve

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH 04/30] firmware: google: Convert regular spinlock into trylock on panic path

2022-05-11 Thread Steven Rostedt
On Tue, 10 May 2022 13:38:39 +0200
Petr Mladek  wrote:

> As already mentioned in the other reply, panic() sometimes stops
> the other CPUs using NMI, for example, see kdump_nmi_shootdown_cpus().
> 
> Another situation is when the CPU using the lock ends in some
> infinite loop because something went wrong. The system is in
> an unpredictable state during panic().
> 
> I am not sure if this is possible with the code under gsmi_dev.lock
> but such things really happen during panic() in other subsystems.
> Using trylock in the panic() code path is a good practice.

I believe that Peter Zijlstra had a special spin lock for NMIs or early
printk, where it would not block if the lock was held on the same CPU. That
is, if an NMI happened and paniced while this lock was held on the same
CPU, it would not deadlock. But it would block if the lock was held on
another CPU.

-- Steve

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH 17/30] tracing: Improve panic/die notifiers

2022-04-30 Thread Steven Rostedt
On Fri, 29 Apr 2022 10:46:35 -0300
"Guilherme G. Piccoli"  wrote:

> Thanks Sergei and Steven, good idea! I thought about the switch change
> you propose, but I confess I got a bit confused by the "fallthrough"
> keyword - do I need to use it?

No. The fallthrough keyword is only needed when there's code between case
labels. As it is very common to list multiple cases for the same code path.
That is:

case DIE_OOPS:
case PANIC_NOTIFIER:
do_dump = 1;
break;

Does not need a fall through label, as there's no code between the DIE_OOPS
and the PANIC_NOTIFIER. But if you had:

case DIE_OOPS:
x = true;
case PANIC_NOTIFIER:
do_dump = 1;
break;

Then you do.

-- Steve

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH 17/30] tracing: Improve panic/die notifiers

2022-04-30 Thread Steven Rostedt
On Fri, 29 Apr 2022 12:22:44 +0300
Sergei Shtylyov  wrote:

> > +   switch (ev) {
> > +   case DIE_OOPS:
> > +   do_dump = 1;
> > +   break;
> > +   case PANIC_NOTIFIER:
> > +   do_dump = 1;
> > +   break;  
> 
>Why not:
> 
>   case DIE_OOPS:
>   case PANIC_NOTIFIER:
>   do_dump = 1;
>       break;

Agreed.

Other than that.

Acked-by: Steven Rostedt (Google) 

-- Steve

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v6 00/13] Add build ID to stacktraces

2021-05-11 Thread Steven Rostedt
On Tue, 11 May 2021 12:36:06 +
David Laight  wrote:

> >  x1 : ff93fef15788 x0 : ffe3622352e0
> >  Call trace:
> >   lkdtm_WARNING+0x28/0x30 [lkdtm ed5019fdf5e53be37cb1ba7899292d7e143b259e]
> >   direct_entry+0x16c/0x1b4 [lkdtm ed5019fdf5e53be37cb1ba7899292d7e143b259e]
> >   full_proxy_write+0x74/0xa4  
> 
> Is there any way to get it to print each module ID only once?

If there's a trivial way to do that, then perhaps it should be done, but for
now, this patch series isn't as obnoxious as the previous versions. It only
affects stack traces, and I'm fine with that.

-- Steve

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v3 1/1] kernel/crash_core: Add crashkernel=auto for vmcore creation

2021-02-17 Thread Steven Rostedt
On Wed, 17 Feb 2021 12:40:43 -0600
john.p.donne...@oracle.com wrote:

> Hello.
> 
> Ping.
> 
> Can we get this reviewed and staged ?
> 
> Thank you.

Andrew,

Seems you are the only one pushing patches in for kexec/crash. Is this
maintained by anyone?

-- Steve

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [RFC PATCH] printk: _printk_rb_static_dict can be static

2020-06-19 Thread Steven Rostedt
On Fri, 19 Jun 2020 08:49:21 +0200
John Ogness  wrote:

> > +++ b/kernel/printk/printk.c
> > @@ -434,7 +434,7 @@ static u32 log_buf_len = __LOG_BUF_LEN;
> >   */
> >  #define PRB_AVGBITS 5  /* 32 character average length */
> >  
> > -_DECLARE_PRINTKRB(printk_rb_static, CONFIG_LOG_BUF_SHIFT - PRB_AVGBITS,
> > +static _DECLARE_PRINTKRB(printk_rb_static, CONFIG_LOG_BUF_SHIFT - 
> > PRB_AVGBITS,
> >   PRB_AVGBITS, PRB_AVGBITS, &__log_buf[0]);  
> 
> _DECLARE_PRINTKRB declares multiple variables, so this patch will not
> work as intended. I would like to declare the variables static but am
> not sure how best to go about it.
> 
> In the Linux source I see examples of macros just desclaring the
> variables static. And I see examples of the macros providing a parameter
> where the "static" keyword can be specified.
> 
> Since the ringbuffer was created exclusively to serve printk, I would
> prefer to just have _DECLARE_PRINTKRB (and DECLARE_PRINTKRB) declare all
> the variables as static.

Haven written macros that do such things, I agree with your last
statement. Just have the macro declare all the variables static.

-- Steve

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH 0/2] printk: replace ringbuffer

2020-02-06 Thread Steven Rostedt
On Wed, 05 Feb 2020 16:48:32 +0100
John Ogness  wrote:

> The quick fixup:
> 
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index d0d24ee1d1f4..5ad67ff60cd9 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -883,6 +883,7 @@ static int devkmsg_open(struct inode *inode, struct file 
> *file)
>   user->record.text_buf_size = sizeof(user->text_buf);
>   user->record.dict_buf = >dict_buf[0];
>   user->record.dict_buf_size = sizeof(user->dict_buf);
> + user->record.text_line_count = NULL;
>  
>   logbuf_lock_irq();
>   user->seq = prb_first_seq(prb);

FYI, I used your patch set to test out Konstantin's new get-lore-mbox
script, and then applied them. It locked up on boot up as well, and
applying this appears to fix it.

-- Steve

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH 1/2] printk: add lockless buffer

2020-01-28 Thread Steven Rostedt
On Tue, 28 Jan 2020 17:25:47 +0106
John Ogness  wrote:

> diff --git a/kernel/printk/printk_ringbuffer.c 
> b/kernel/printk/printk_ringbuffer.c
> new file mode 100644
> index ..796257f226ee
> --- /dev/null
> +++ b/kernel/printk/printk_ringbuffer.c
> @@ -0,0 +1,1370 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include "printk_ringbuffer.h"
> +
> +/**
> + * DOC: printk_ringbuffer overview
> + *
> + * Data Structure
> + * --
> + * The printk_ringbuffer is made up of 3 internal ringbuffers::
> + *
> + *   * desc_ring:  A ring of descriptors. A descriptor contains all 
> record
> + * meta data (sequence number, timestamp, loglevel, etc.)
> + * as well as internal state information about the record
> + * and logical positions specifying where in the other
> + * ringbuffers the text and dictionary strings are
> + * located.
> + *
> + *   * text_data_ring: A ring of data blocks. A data block consists of an
> + * unsigned long integer (ID) that maps to a desc_ring
> + * index followed by the text string of the record.
> + *
> + *   * dict_data_ring: A ring of data blocks. A data block consists of an
> + * unsigned long integer (ID) that maps to a desc_ring
> + * index followed by the dictionary string of the record.
> + *
> + * Implementation
> + * --
> + *
> + * ABA Issues
> + * ~~
> + * To help avoid ABA issues, descriptors are referenced by IDs (index values
> + * with tagged states) and data blocks are referenced by logical positions
> + * (index values with tagged states). However, on 32-bit systems the number
> + * of tagged states is relatively small such that an ABA incident is (at
> + * least theoretically) possible. For example, if 4 million maximally sized

4 million? I'm guessing that maximally sized printk messages are 1k?

Perhaps say that, otherwise one might think this is a mistake. "4
million maximally sized (1k) printk messages"

> + * printk messages were to occur in NMI context on a 32-bit system, the
> + * interrupted task would not be able to recognize that the 32-bit integer
> + * wrapped and thus represents a different data block than the one the
> + * interrupted task expects.
> + *
> + * To help combat this possibility, additional state checking is performed
> + * (such as using cmpxchg() even though set() would suffice). These extra
> + * checks will hopefully catch any ABA issue that a 32-bit system might
> + * experience.
> + *
[..]

> + * Usage
> + * -
> + * Here are some simple examples demonstrating writers and readers. For the
> + * examples a global ringbuffer (test_rb) is available (which is not the
> + * actual ringbuffer used by printk)::
> + *
> + *   DECLARE_PRINTKRB(test_rb, 15, 5, 3);
> + *
> + * This ringbuffer allows up to 32768 records (2 ^ 15) and has a size of
> + * 1 MiB (2 ^ 20) for text data and 256 KiB (2 ^ 18) for dictionary data.

 (2 ^ (15 + 5)) ... (2 ^ (15 + 3)) ?

I'll play around more with this this week. But so far it looks good.

-- Steve

> + *
> + * Sample writer code::
> + *
> + *   struct prb_reserved_entry e;
> + *   struct printk_record r;
> + *
> + *   // specify how much to allocate
> + *   r.text_buf_size = strlen(textstr) + 1;
> + *   r.dict_buf_size = strlen(dictstr) + 1;
> + *
> + *   if (prb_reserve(, _rb, )) {
> + *   snprintf(r.text_buf, r.text_buf_size, "%s", textstr);
> + *
> + *   // dictionary allocation may have failed
> + *   if (r.dict_buf)
> + *   snprintf(r.dict_buf, r.dict_buf_size, "%s", dictstr);
> + *
> + *   r.info->ts_nsec = local_clock();
> + *
> + *   prb_commit();
> + *   }
> + *

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [RFC PATCH v5 1/3] printk-rb: new printk ringbuffer implementation (writer)

2019-12-03 Thread Steven Rostedt
On Tue, 3 Dec 2019 10:17:21 +0900
Sergey Senozhatsky  wrote:

> > > BTW: If I am counting correctly. The ABA problem would happen when
> > > exactly 2^30 (1G) messages is written in the mean time.  
> > 
> > All the ringbuffer code assumes that the use of index numbers handles
> > the ABA problem (i.e. there must not be 1 billion printk's within an
> > NMI). If we want to support 1 billion+ printk's within an NMI, then
> > perhaps the index number should be increased. For 64-bit systems it
> > would be no problem to go to 62 bits. For 32-bit systems, I don't know
> > how well the 64-bit atomic operations are supported.  
> 
> ftrace dumps from NMI (DUMP_ALL type ftrace_dump_on_oops on a $BIG
> machine)? 1G seems large enough, but who knows.

ftrace dump from NMI is the most likely case to hit this, but when that
happens, you are in debugging mode, and the system usually becomes
unreliable at this moment. I agree with Petr, that we should not
complicate the code more to handle this theoretical condition.

-- Steve

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH] print kdump kernel loaded status in stack dump

2018-01-18 Thread Steven Rostedt
On Thu, 18 Jan 2018 10:02:17 -0800
Andi Kleen  wrote:

> Dave Young  writes:
> > printk("%sHardware name: %s\n",
> >log_lvl, dump_stack_arch_desc_str);
> > +   if (kexec_crash_loaded())
> > +   printk("%skdump kernel loaded\n", log_lvl);  
> 
> Oops/warnings are getting longer and longer, often scrolling away
> from the screen, and if the kernel crashes backscroll does not work
> anymore, so precious information is lost.
> 
> Can you merge it with some other line?
> 
> Just a [KDUMP] or so somewhere should be good enough.

Or perhaps we should add it as a TAINT. Not all taints are bad.

-- Steve

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [V5 PATCH 3/4] kexec: Fix race between panic() and crash_kexec() called directly

2015-11-24 Thread Steven Rostedt
On Fri, Nov 20, 2015 at 06:36:48PM +0900, Hidehiro Kawai wrote:
> Currently, panic() and crash_kexec() can be called at the same time.
> For example (x86 case):
> 
> CPU 0:
>   oops_end()
> crash_kexec()
>   mutex_trylock() // acquired
> nmi_shootdown_cpus() // stop other cpus
> 
> CPU 1:
>   panic()
> crash_kexec()
>   mutex_trylock() // failed to acquire
> smp_send_stop() // stop other cpus
> infinite loop
> 
> If CPU 1 calls smp_send_stop() before nmi_shootdown_cpus(), kdump
> fails.

So the smp_send_stop() stops CPU 0 from calling nmi_shootdown_cpus(), right?

> 
> In another case:
> 
> CPU 0:
>   oops_end()
> crash_kexec()
>   mutex_trylock() // acquired
> 
> io_check_error()
>   panic()
> crash_kexec()
>   mutex_trylock() // failed to acquire
> infinite loop
> 
> Clearly, this is an undesirable result.

I'm trying to see how this patch fixes this case.

> 
> To fix this problem, this patch changes crash_kexec() to exclude
> others by using atomic_t panic_cpu.
> 
> V5:
> - Add missing dummy __crash_kexec() for !CONFIG_KEXEC_CORE case
> - Replace atomic_xchg() with atomic_set() in crash_kexec() because
>   it is used as a release operation and there is no need of memory
>   barrier effect.  This change also removes an unused value warning
> 
> V4:
> - Use new __crash_kexec(), no exclusion check version of crash_kexec(),
>   instead of checking if panic_cpu is the current cpu or not
> 
> V2:
> - Use atomic_cmpxchg() instead of spin_trylock() on panic_lock
>   to exclude concurrent accesses
> - Don't introduce no-lock version of crash_kexec()
> 
> Signed-off-by: Hidehiro Kawai 
> Cc: Eric Biederman 
> Cc: Vivek Goyal 
> Cc: Andrew Morton 
> Cc: Michal Hocko 
> ---
>  include/linux/kexec.h |2 ++
>  kernel/kexec_core.c   |   26 +-
>  kernel/panic.c|4 ++--
>  3 files changed, 29 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
> index d140b1e..7b68d27 100644
> --- a/include/linux/kexec.h
> +++ b/include/linux/kexec.h
> @@ -237,6 +237,7 @@ extern int kexec_purgatory_get_set_symbol(struct kimage 
> *image,
> unsigned int size, bool get_value);
>  extern void *kexec_purgatory_get_symbol_addr(struct kimage *image,
>const char *name);
> +extern void __crash_kexec(struct pt_regs *);
>  extern void crash_kexec(struct pt_regs *);
>  int kexec_should_crash(struct task_struct *);
>  void crash_save_cpu(struct pt_regs *regs, int cpu);
> @@ -332,6 +333,7 @@ int __weak arch_kexec_apply_relocations(const Elf_Ehdr 
> *ehdr, Elf_Shdr *sechdrs,
>  #else /* !CONFIG_KEXEC_CORE */
>  struct pt_regs;
>  struct task_struct;
> +static inline void __crash_kexec(struct pt_regs *regs) { }
>  static inline void crash_kexec(struct pt_regs *regs) { }
>  static inline int kexec_should_crash(struct task_struct *p) { return 0; }
>  #define kexec_in_progress false
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index 11b64a6..9d097f5 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -853,7 +853,8 @@ struct kimage *kexec_image;
>  struct kimage *kexec_crash_image;
>  int kexec_load_disabled;
>  
> -void crash_kexec(struct pt_regs *regs)
> +/* No panic_cpu check version of crash_kexec */
> +void __crash_kexec(struct pt_regs *regs)
>  {
>   /* Take the kexec_mutex here to prevent sys_kexec_load
>* running on one cpu from replacing the crash kernel
> @@ -876,6 +877,29 @@ void crash_kexec(struct pt_regs *regs)
>   }
>  }
>  
> +void crash_kexec(struct pt_regs *regs)
> +{
> + int old_cpu, this_cpu;
> +
> + /*
> +  * Only one CPU is allowed to execute the crash_kexec() code as with
> +  * panic().  Otherwise parallel calls of panic() and crash_kexec()
> +  * may stop each other.  To exclude them, we use panic_cpu here too.
> +  */
> + this_cpu = raw_smp_processor_id();
> + old_cpu = atomic_cmpxchg(_cpu, -1, this_cpu);
> + if (old_cpu == -1) {
> + /* This is the 1st CPU which comes here, so go ahead. */
> + __crash_kexec(regs);
> +
> + /*
> +  * Reset panic_cpu to allow another panic()/crash_kexec()
> +  * call.
> +  */
> + atomic_set(_cpu, -1);
> + }
> +}
> +
>  size_t crash_get_memory_size(void)
>  {
>   size_t size = 0;
> diff --git a/kernel/panic.c b/kernel/panic.c
> index 4fce2be..5d0b807 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -138,7 +138,7 @@ void panic(const char *fmt, ...)
>* the "crash_kexec_post_notifiers" option to the kernel.
>*/
>   if (!crash_kexec_post_notifiers)
> - crash_kexec(NULL);
> + __crash_kexec(NULL);

Why call 

Re: [V5 PATCH 2/4] panic/x86: Allow cpus to save registers even if they are looping in NMI context

2015-11-24 Thread Steven Rostedt
On Tue, Nov 24, 2015 at 11:48:53AM +0100, Borislav Petkov wrote:
> 
> > +*/
> > +   while (!raw_spin_trylock(_reason_lock))
> > +   poll_crash_ipi_and_callback(regs);
> 
> Waaait a minute: so if we're getting NMIs broadcasted on every core but
> we're *not* crash dumping, we will run into here too. This can't be
> right. :-\

This only does something if crash_ipi_done is set, which means you are killing
the box. But perhaps a comment that states that here would be useful, or maybe
just put in the check here. There's no need to make it depend on SMP, as
raw_spin_trylock() will turn to just ({1}) for UP, and that code wont even be
hit.

-- Steve


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [V5 PATCH 1/4] panic/x86: Fix re-entrance problem due to panic on NMI

2015-11-24 Thread Steven Rostedt
On Tue, 24 Nov 2015 10:05:10 -0500
Steven Rostedt <rost...@goodmis.org> wrote:

> cmpxchg(_cpu, -1, 0) != 0
> 
> returns -1 for cpu 0, thus 0 != 0, and sets panic_cpu to 0

That was suppose to be "thus -1 != 0".

-- Steve

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [V5 PATCH 1/4] panic/x86: Fix re-entrance problem due to panic on NMI

2015-11-24 Thread Steven Rostedt
On Fri, Nov 20, 2015 at 06:36:44PM +0900, Hidehiro Kawai wrote:
> diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> index 350dfb0..480a4fd 100644
> --- a/include/linux/kernel.h
> +++ b/include/linux/kernel.h
> @@ -445,6 +445,19 @@ extern int sysctl_panic_on_stackoverflow;
>  
>  extern bool crash_kexec_post_notifiers;
>  
> +extern atomic_t panic_cpu;
> +
> +/*
> + * A variant of panic() called from NMI context.
> + * If we've already panicked on this cpu, return from here.
> + */
> +#define nmi_panic(fmt, ...)  \
> + do {\
> + int this_cpu = raw_smp_processor_id();  \
> + if (atomic_cmpxchg(_cpu, -1, this_cpu) != this_cpu) \
> + panic(fmt, ##__VA_ARGS__);  \

Hmm,

What happens if:

CPU 0:  CPU 1:
--  --
nmi_panic();

nmi_panic();

nmi_panic();

?

cmpxchg(_cpu, -1, 0) != 0

returns -1 for cpu 0, thus 0 != 0, and sets panic_cpu to 0


cmpxchg(_cpu, -1, 1) != 1

returns 0, and then it too panics, but does not set panic_cpu to 1

Now you have your external NMI triggering on CPU 1

cmpxchg(_cpu, -1, 1) != 1

returns 0 again, and you call panic again within the panic of CPU 1.

Is this OK?

Perhaps you want a per cpu bitmask, and do a test_and_set() on the CPU. That
would prevent any CPU from rerunning a panic() twice on any CPU.

-- Steve

> + } while (0)
> +
>  /*
>   * Only to be used by arch init code. If the user over-wrote the default
>   * CONFIG_PANIC_TIMEOUT, honor it.
> diff --git a/kernel/panic.c b/kernel/panic.c
> index 4579dbb..24ee2ea 100644

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH -v2 7/8] kexec jump: ftrace_enabled_save/restore

2008-08-11 Thread Steven Rostedt

On Mon, 11 Aug 2008, Huang Ying wrote:

 Hi, Steven,
 
 On Fri, 2008-08-08 at 10:30 -0400, Steven Rostedt wrote:
 [...]
  The only problem with this approach is what happens if the user changes 
  the enabled in between these two calls. This would make ftrace 
  inconsistent.
  
  I have a patch from the -rt tree that handles what you want. It is 
  attached below. Not sure how well it will apply to mainline.
  
  I really need to go through the rt patch set and start submitting a bunch 
  of clean-up/fixes to mainline. We've been meaning to do it, just have been 
  distracted :-(
 
 Your version is better in general sense. Thank you very much!
 
 But in this specific situation of kexec/kjump. The execution environment
 is that other CPUs are disabled, local irq is disabled, and it is not
 permitted to switch to other process. But it is safe and sufficient to
 use non-locked version here.
 
 So to satisfy both demands, I think it is better to provide both
 version, locked and non-locked. What do you think about that?

Sounds good,

I'm looking forward to the patch ;-)

-- Steve


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH -v2 7/8] kexec jump: ftrace_enabled_save/restore

2008-08-08 Thread Steven Rostedt


On Fri, 8 Aug 2008, Vivek Goyal wrote:

 On Fri, Aug 08, 2008 at 02:52:48PM +0800, Huang Ying wrote:
  Add ftrace_enabled_save/restore, used to disable ftrace for a
  while. This is used by kexec jump.
  
  Signed-off-by: Huang Ying [EMAIL PROTECTED]
  
 
 
 CCing Steven Rostedt for ftrace related changes.

Thanks,


 
  ---
   include/linux/ftrace.h |   18 ++
   1 file changed, 18 insertions(+)
  
  --- a/include/linux/ftrace.h
  +++ b/include/linux/ftrace.h
  @@ -98,6 +98,24 @@ static inline void tracer_disable(void)
   #endif
   }
   
  +static inline int ftrace_enabled_save(void)
  +{
  +#ifdef CONFIG_FTRACE
  +   int saved_ftrace_enabled = ftrace_enabled;
  +   ftrace_enabled = 0;
  +   return saved_ftrace_enabled;
  +#else
  +   return 0;
  +#endif
  +}
  +
  +static inline void ftrace_enabled_restore(int enabled)
  +{
  +#ifdef CONFIG_FTRACE
  +   ftrace_enabled = enabled;
  +#endif
  +}
  +
   #ifdef CONFIG_FRAME_POINTER
   /* TODO: need to fix this for ARM */
   # define CALLER_ADDR0 ((unsigned long)__builtin_return_address(0))

The only problem with this approach is what happens if the user changes 
the enabled in between these two calls. This would make ftrace 
inconsistent.

I have a patch from the -rt tree that handles what you want. It is 
attached below. Not sure how well it will apply to mainline.

I really need to go through the rt patch set and start submitting a bunch 
of clean-up/fixes to mainline. We've been meaning to do it, just have been 
distracted :-(

-- Steve

From: Steven Rostedt [EMAIL PROTECTED]
Subject: ftrace: cpu hotplug fix

Peter Zijlstra found that taking down and bringing up a new CPU caused
ftrace to crash the kernel. This was due to some arch calls that were
being traced by the function tracer before the smp_processor_id was set
up. Since the function tracer uses smp_processor_id it caused a triple
fault.

Instead of adding notrace all over the architecture code to prevent
this problem, it is easier to simply disable the function tracer
when bringing up a new CPU.

Signed-off-by: Steven Rostedt [EMAIL PROTECTED]
---
 include/linux/ftrace.h|   11 ---
 kernel/cpu.c  |9 +
 kernel/trace/ftrace.c |   23 +++
 kernel/trace/trace_irqsoff.c  |3 +++
 kernel/trace/trace_sched_wakeup.c |2 +-
 5 files changed, 44 insertions(+), 4 deletions(-)

Index: linux-2.6.26/include/linux/ftrace.h
===
--- linux-2.6.26.orig/include/linux/ftrace.h
+++ linux-2.6.26/include/linux/ftrace.h
@@ -33,10 +33,15 @@ void clear_ftrace_function(void);
 
 extern void ftrace_stub(unsigned long a0, unsigned long a1);
 
+void ftrace_enable(void);
+void ftrace_disable(void);
+
 #else /* !CONFIG_FTRACE */
-# define register_ftrace_function(ops) do { } while (0)
-# define unregister_ftrace_function(ops) do { } while (0)
-# define clear_ftrace_function(ops) do { } while (0)
+# define register_ftrace_function(ops) do { } while (0)
+# define unregister_ftrace_function(ops)   do { } while (0)
+# define clear_ftrace_function(ops)do { } while (0)
+# define ftrace_enable()   do { } while (0)
+# define ftrace_disable()  do { } while (0)
 #endif /* CONFIG_FTRACE */
 
 #ifdef CONFIG_DYNAMIC_FTRACE
Index: linux-2.6.26/kernel/cpu.c
===
--- linux-2.6.26.orig/kernel/cpu.c
+++ linux-2.6.26/kernel/cpu.c
@@ -14,6 +14,7 @@
 #include linux/kthread.h
 #include linux/stop_machine.h
 #include linux/mutex.h
+#include linux/ftrace.h
 
 /* Serializes the updates to cpu_online_map, cpu_present_map */
 static DEFINE_MUTEX(cpu_add_remove_lock);
@@ -300,8 +301,16 @@ static int __cpuinit _cpu_up(unsigned in
goto out_notify;
}
 
+   /*
+* Disable function tracing while bringing up a new CPU.
+* We don't want to trace functions that can not handle a
+* smp_processor_id() call.
+*/
+   ftrace_disable();
+
/* Arch-specific enabling code. */
ret = __cpu_up(cpu);
+   ftrace_enable();
if (ret != 0)
goto out_notify;
BUG_ON(!cpu_online(cpu));
Index: linux-2.6.26/kernel/trace/ftrace.c
===
--- linux-2.6.26.orig/kernel/trace/ftrace.c
+++ linux-2.6.26/kernel/trace/ftrace.c
@@ -151,6 +151,29 @@ static int __unregister_ftrace_function(
return ret;
 }
 
+static int save_ftrace_enabled;
+
+void ftrace_disable(void)
+{
+   mutex_lock(ftrace_sysctl_lock);
+
+   save_ftrace_enabled = ftrace_enabled;
+   ftrace_enabled = 0;
+}
+
+void ftrace_enable(void)
+{
+   /* ftrace_enable must be paired with ftrace_disable */
+   if (!mutex_is_locked(ftrace_sysctl_lock)) {
+   WARN_ON(1);
+   return;
+   }
+
+   ftrace_enabled