from:"Sergey Senozhatsky"

Re: [PATCH V6] panic: Move panic_print before kmsg dumpers

2022-02-25 Thread Sergey Senozhatsky

On (22/02/25 13:16), Petr Mladek wrote:
> 
> The result looks good to me. We could do it as a followup patch.
> 

Yup, sounds good to me. Thanks.

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

Re: [PATCH V6] panic: Move panic_print before kmsg dumpers

2022-02-24 Thread Sergey Senozhatsky

Correction:

On (22/02/25 14:18), Sergey Senozhatsky wrote:
> > 
> > Or just split panic_printk_sys_info() into these two functions.
> 
> Agreed. I also tend to think that panic_printk_sys_info() is needed anyway,

 split

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

Re: [PATCH V6] panic: Move panic_print before kmsg dumpers

2022-02-24 Thread Sergey Senozhatsky

On (22/02/24 15:33), Petr Mladek wrote:
> > My bad! I did not spot the `return` at the end of the new branch.
> > 
> > +   if (console_flush) {
> > +   if (panic_print & PANIC_PRINT_ALL_PRINTK_MSG)
> > +   console_flush_on_panic(CONSOLE_REPLAY_ALL);
> > +   return;
> > +   }
> > 
> > Hmm. Yeah, well, that's a bit of a tricky interface now
> > 
> > panic()
> > // everything (if corresponding bits set), no console flush
> > panic_print_sys_info(false)
> > ...
> > // console flush only if corresponding bit set
> > panic_print_sys_info(true)
> 
> I agree that self-explaining names are always better than true/false.
> It is pity that replay the log is handled in panic_print at all.
> 
> I sometimes hide these tricks into wrappers. We could rename:
> 
> panic_printk_sys_info() -> panic_print_handler()
> 
> and add wrappers:
> 
> void panic_print_sys_info()
> {
>   panic_printk_handler(false);
> }
> 
> void panic_print_log_replay()
> {
>   panic_printk_handler(true);
> }
> 
> Or just split panic_printk_sys_info() into these two functions.

Agreed. I also tend to think that panic_printk_sys_info() is needed anyway,
just because now we do

debug_locks_off();
console_flush_on_panic(CONSOLE_FLUSH_PENDING);
if (panic_print & PANIC_PRINT_ALL_PRINTK_MSG)
console_flush_on_panic(CONSOLE_REPLAY_ALL);

It probably would be better if we do

debug_locks_off();
if (panic_print & PANIC_PRINT_ALL_PRINTK_MSG)
console_flush_on_panic(CONSOLE_REPLAY_ALL);
else
console_flush_on_panic(CONSOLE_FLUSH_PENDING);

instead.

IOW move console_flush_on_panic() handling out of panic_print_sys_info().
console_flush_on_panic() isn't really related to "print sys info" stuff
that panic_print_sys_info() does.

Something like this may be:

---
 static void panic_print_sys_info(void)
 {
-   if (panic_print & PANIC_PRINT_ALL_PRINTK_MSG)
-   console_flush_on_panic(CONSOLE_REPLAY_ALL);
-
if (panic_print & PANIC_PRINT_ALL_CPU_BT)
trigger_all_cpu_backtrace();
 
@@ -196,6 +193,23 @@ static void panic_print_sys_info(void)
ftrace_dump(DUMP_ALL);
 }
 
+static void panic_console_flush(void)
+{
+   /*
+* We may have ended up stopping the CPU holding the lock (in
+* smp_send_stop()) while still having some valuable data in the console
+* buffer.  Try to acquire the lock then release it regardless of the
+* result.  The release will also print the buffers out.  Locks debug
+* should be disabled to avoid reporting bad unlock balance when
+* panic() is not being callled from OOPS.
+*/
+   debug_locks_off();
+   if (panic_print & PANIC_PRINT_ALL_PRINTK_MSG)
+   console_flush_on_panic(CONSOLE_REPLAY_ALL);
+   else
+   console_flush_on_panic(CONSOLE_FLUSH_PENDING);
+}
+
 /**
  * panic - halt the system
  * @fmt: The text string to print
@@ -329,17 +343,7 @@ void panic(const char *fmt, ...)
 #endif
console_unblank();
 
-   /*
-* We may have ended up stopping the CPU holding the lock (in
-* smp_send_stop()) while still having some valuable data in the console
-* buffer.  Try to acquire the lock then release it regardless of the
-* result.  The release will also print the buffers out.  Locks debug
-* should be disabled to avoid reporting bad unlock balance when
-* panic() is not being callled from OOPS.
-*/
-   debug_locks_off();
-   console_flush_on_panic(CONSOLE_FLUSH_PENDING);
-
+   panic_console_flush();
panic_print_sys_info();
 
if (!panic_blink)
---

> > If everyone is fine then OK.
> > 
> > But I _personally_ would look into changing this to something like this:
> > 
> > #define EARLY_PANIC_MASK (PANIC_PRINT_FOO | PANIC_PRINT_BAR | ...)
> > #define LATE_PANIC_MASK (PANIC_PRINT_ALL_PRINTK_MSG)
> 
> These lists cause merge and backporting conflicts. I vote to avoid
> this approach ;-)

OK :)

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

Re: [PATCH V6] panic: Move panic_print before kmsg dumpers

2022-02-23 Thread Sergey Senozhatsky

Hi,

On (22/02/23 10:15), Guilherme G. Piccoli wrote:
> On 22/02/2022 22:27, Sergey Senozhatsky wrote:
> > [...]
> > Hmm. Yeah, well, that's a bit of a tricky interface now
> > 
> > panic()
> > // everything (if corresponding bits set), no console flush
> > panic_print_sys_info(false)
> > ...
> > // console flush only if corresponding bit set
> > panic_print_sys_info(true)
> > 
> > 
> > 
> > If everyone is fine then OK.
> > 
> > But I _personally_ would look into changing this to something like this:
> > 
> > #define EARLY_PANIC_MASK (PANIC_PRINT_FOO | PANIC_PRINT_BAR | ...)
> > #define LATE_PANIC_MASK (PANIC_PRINT_ALL_PRINTK_MSG)
> > panic()
> > panic_print_sys_info(panic_print & EARLY_PANIC_MASK)
> > ...
> > panic_print_sys_info(panic_print & LATE_PANIC_MASK)
> 
> Hi Sergey, notice that panic_print_sys_info() currently doesn't have a
> parameter! The prototype (without this patch) is:

Correct.

> static void panic_print_sys_info(void);
>
> So, it consumes the "panic_print" global variable (which matches the
> command-line parameter / sysctl), hence to implement your suggestion
> either we need a refactor in panic_print_sys_info(), adding a parameter

Correct. That's the idea. Since you are already adding a parameter,
what I'm talking is turning that parameter from true/false to something
more verbose.

> (more or less what the patch is already doing, but with a bit more
> changes) or we override the global variable twice in panic(), before the
> function calls.

We don't need to overwrite the global var. We pass "permitted bits at
this stage of panic" mask to panic_print_sys_info(). The global var
stays intact.

> As you said, it's possible and a matter of personal coding style. I'd be
> fine if more people ask that, but if everyone is fine with the current
> implementation, I'd rather get this patch merged as is, since we need it
> and couldn't even make it for 5.17 heh

Sure, works for me.

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

Re: [PATCH V6] panic: Move panic_print before kmsg dumpers

2022-02-23 Thread Sergey Senozhatsky

On (22/02/15 17:14), Petr Mladek wrote:
> Makes sense and looks good to me.
> 
> Reviewed-by: Petr Mladek 

FWIW

Reviewed-by: Sergey Senozhatsky 

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

Re: [PATCH V6] panic: Move panic_print before kmsg dumpers

2022-02-22 Thread Sergey Senozhatsky

On (22/02/22 11:08), Guilherme G. Piccoli wrote:
> Hi Sergey, thanks for your feedback. So, personally I prefer having the
> flag - for me it's clear, it's just a matter of reading the prototype -
> either we print the info _or_ we console_flush.
> 
> But let's see if others have a preference - if the preference is to use
> the bitmask as you suggest, we can do it in a next version.

Sounds good to me, thanks.

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

Re: [PATCH V6] panic: Move panic_print before kmsg dumpers

2022-02-22 Thread Sergey Senozhatsky

On (22/02/22 11:10), Guilherme G. Piccoli wrote:
> On 21/02/2022 23:06, Sergey Senozhatsky wrote:
> > On (22/02/14 11:13), Guilherme G. Piccoli wrote:
> > [...]
> > By additional panic_print messages you mean that panic_print_sys_info()
> > will print everything (except PANIC_PRINT_ALL_PRINTK_MSG) twice?
> > 
> > Do we really need to dump everything twice? show_mem(), show_state(),
> > ftrace_dump(DUMP_ALL). That's quite a bit of extra data.
> > 
> 
> Oh no, we don't print everything twice, that'd be insane heh

My bad! I did not spot the `return` at the end of the new branch.

+   if (console_flush) {
+   if (panic_print & PANIC_PRINT_ALL_PRINTK_MSG)
+   console_flush_on_panic(CONSOLE_REPLAY_ALL);
+   return;
+   }

Hmm. Yeah, well, that's a bit of a tricky interface now

panic()
// everything (if corresponding bits set), no console flush
panic_print_sys_info(false)
...
// console flush only if corresponding bit set
panic_print_sys_info(true)



If everyone is fine then OK.

But I _personally_ would look into changing this to something like this:

#define EARLY_PANIC_MASK (PANIC_PRINT_FOO | PANIC_PRINT_BAR | ...)
#define LATE_PANIC_MASK (PANIC_PRINT_ALL_PRINTK_MSG)
panic()
panic_print_sys_info(panic_print & EARLY_PANIC_MASK)
...
panic_print_sys_info(panic_print & LATE_PANIC_MASK)

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

Re: [PATCH V6] panic: Move panic_print before kmsg dumpers

2022-02-21 Thread Sergey Senozhatsky

A trivial typo:

On (22/02/22 11:06), Sergey Senozhatsky wrote:
> @@ -286,6 +289,8 @@ void panic(const char *fmt, ...)
>*/
>   atomic_notifier_call_chain(_notifier_list, 0, buf);
> 
> 
> + panic_print_sys_info(panic_print & ~PANIC_PRINT_ALL_PRINTK_MSG);
> 
> 
>   kmsg_dump(KMSG_DUMP_PANIC);
>   
>   /*
> @@ -316,7 +321,7 @@ void panic(const char *fmt, ...)
>   debug_locks_off();
>   console_flush_on_panic(CONSOLE_FLUSH_PENDING);
> 

+   panic_print_sys_info(panic_print & PANIC_PRINT_ALL_PRINTK_MSG);



I also wonder if we want to CONSOLE_FLUSH_PENDING when
PANIC_PRINT_ALL_PRINTK_MSG set.

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

Re: [PATCH V6] panic: Move panic_print before kmsg dumpers

2022-02-21 Thread Sergey Senozhatsky

On (22/02/14 11:13), Guilherme G. Piccoli wrote:
> 
> The additional messages from panic_print could overwrite the oldest
> messages when the buffer is full. The only reasonable solution is to
> use a large enough log buffer, hence we added an advice into the kernel
> parameters documentation about that.

By additional panic_print messages you mean that panic_print_sys_info()
will print everything (except PANIC_PRINT_ALL_PRINTK_MSG) twice?

Do we really need to dump everything twice? show_mem(), show_state(),
ftrace_dump(DUMP_ALL). That's quite a bit of extra data.

Can instead this be something like (?):

@@ -286,6 +289,8 @@ void panic(const char *fmt, ...)
 */
atomic_notifier_call_chain(_notifier_list, 0, buf);


+   panic_print_sys_info(panic_print & ~PANIC_PRINT_ALL_PRINTK_MSG);


kmsg_dump(KMSG_DUMP_PANIC);
  
/*
@@ -316,7 +321,7 @@ void panic(const char *fmt, ...)
debug_locks_off();
console_flush_on_panic(CONSOLE_FLUSH_PENDING);


+   panic_print_sys_info(PANIC_PRINT_ALL_PRINTK_MSG);


if (!panic_blink)
panic_blink = no_blink;

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

Re: [PATCH V6] panic: Move panic_print before kmsg dumpers

2022-02-21 Thread Sergey Senozhatsky

On (22/02/14 11:13), Guilherme G. Piccoli wrote:
> -static void panic_print_sys_info(void)
> +static void panic_print_sys_info(bool console_flush)
>  {
> - if (panic_print & PANIC_PRINT_ALL_PRINTK_MSG)
> - console_flush_on_panic(CONSOLE_REPLAY_ALL);
> + if (console_flush) {
> + if (panic_print & PANIC_PRINT_ALL_PRINTK_MSG)
> + console_flush_on_panic(CONSOLE_REPLAY_ALL);
> + return;
> + }

Yeah, if Petr is fine with that then I'm OK. But at the same time,
we have `panic_print' which is a bit mask of what panic_print_sys_info()
should do. And now we also have a boolean `console_flush` flag that tells
panic_print_sys_info() to ignore some (one as of now) bits of `panic_print'.

So _maybe_ panic_print_sys_info() can just accept panic_print as
its parameter and then we can do something like this (as an example)

panic_print_sys_info(panic_print & ~PANIC_PRINT_ALL_PRINTK_MSG);

>   if (panic_print & PANIC_PRINT_ALL_CPU_BT)
>   trigger_all_cpu_backtrace();
> @@ -286,6 +289,8 @@ void panic(const char *fmt, ...)
>*/
>   atomic_notifier_call_chain(_notifier_list, 0, buf);

[..]

> + panic_print_sys_info(false);

Merely because `panic_print_sys_info(false);` doesn't tell much to a reader.
Like what is print sys info false?

Or did you already discuss this?

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

Re: [PATCH printk v2 2/5] printk: remove safe buffers

2021-04-01 Thread Sergey Senozhatsky

On (21/04/01 16:17), Petr Mladek wrote:
> > For the long term, we should introduce a printk-context API that allows
> > callers to perfectly pack their multi-line output into a single
> > entry. We discussed [0][1] this back in August 2020.
> 
> We need a "short" term solution. There are currently 3 solutions:
> 
> 1. Keep nmi_safe() and all the hacks around.
> 
> 2. Serialize nmi_cpu_backtrace() by a spin lock and later by
>the special lock used also by atomic consoles.
> 
> 3. Tell complaining people how to sort the messed logs.

Are we talking about nmi_cpu_backtrace()->dump_stack() or some
other path?

dump_stack() seems to be already serialized by `dump_lock`. Hmm,
show_regs() is not serialized, seems like it should be under the
same `dump_lock` as dump_stack().

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

Re: [PATCH next v1 2/3] printk: remove safe buffers

2021-03-20 Thread Sergey Senozhatsky

On (21/03/17 00:33), John Ogness wrote:
[..]
>  void printk_nmi_direct_enter(void)
>  {
> @@ -324,27 +44,8 @@ void printk_nmi_direct_exit(void)
>   this_cpu_and(printk_context, ~PRINTK_NMI_DIRECT_CONTEXT_MASK);
>  }
>  
> -#else
> -
> -static __printf(1, 0) int vprintk_nmi(const char *fmt, va_list args)
> -{
> - return 0;
> -}
> -
>  #endif /* CONFIG_PRINTK_NMI */
>  
> -/*
> - * Lock-less printk(), to avoid deadlocks should the printk() recurse
> - * into itself. It uses a per-CPU buffer to store the message, just like
> - * NMI.
> - */
> -static __printf(1, 0) int vprintk_safe(const char *fmt, va_list args)
> -{
> - struct printk_safe_seq_buf *s = this_cpu_ptr(_print_seq);
> -
> - return printk_safe_log_store(s, fmt, args);
> -}
> -
>  /* Can be preempted by NMI. */
>  void __printk_safe_enter(void)
>  {
> @@ -369,7 +70,10 @@ __printf(1, 0) int vprintk_func(const char *fmt, va_list 
> args)
>* Use the main logbuf even in NMI. But avoid calling console
>* drivers that might have their own locks.
>*/
> - if ((this_cpu_read(printk_context) & PRINTK_NMI_DIRECT_CONTEXT_MASK)) {
> + if (this_cpu_read(printk_context) &
> + (PRINTK_NMI_DIRECT_CONTEXT_MASK |
> +  PRINTK_NMI_CONTEXT_MASK |
> +  PRINTK_SAFE_CONTEXT_MASK)) {

Do we need printk_nmi_direct_enter/exit() and PRINTK_NMI_DIRECT_CONTEXT_MASK?
Seems like all printk_safe() paths are now DIRECT - we store messages to the
prb, but don't call console drivers.

-ss

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

Re: [PATCH v2 5/7][next] printk: ringbuffer: add finalization/extension support

2020-08-26 Thread Sergey Senozhatsky

On (20/08/26 10:45), John Ogness wrote:
> On 2020-08-24, John Ogness  wrote:
> > @@ -1157,6 +1431,14 @@ bool prb_reserve(struct prb_reserved_entry *e, 
> > struct printk_ringbuffer *rb,
> > goto fail;
> > }
> >  
> > +   /*
> > +* New data is about to be reserved. Once that happens, previous
> > +* descriptors are no longer able to be extended. Finalize the
> > +* previous descriptor now so that it can be made available to
> > +* readers (when committed).
> > +*/
> > +   desc_finalize(desc_ring, DESC_ID(id - 1));
> > +
> > d = to_desc(desc_ring, id);
> >  
> > /*
> 
> Apparently this is not enough to guarantee that past descriptors are
> finalized. I am able to reproduce a scenario where the finalization of a
> certain descriptor never happens. That leaves the descriptor permanently
> in the reserved queried state, which prevents any new records from being
> created. I am investigating.

Good to know. I also run into problems:
- broken dmesg (and broken journalctl -f /dev/kmsg poll) and broken
  syslog read

$ strace dmesg

...
openat(AT_FDCWD, "/dev/kmsg", O_RDONLY|O_NONBLOCK) = 3
lseek(3, 0, SEEK_DATA)  = 0
read(3, 0x55dda8c240a8, 8191)   = -1 EAGAIN (Resource temporarily 
unavailable)
close(3)= 0
syslog(10 /* SYSLOG_ACTION_SIZE_BUFFER */) = 524288
mmap(NULL, 528384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 
0x7f43ea847000
syslog(3 /* SYSLOG_ACTION_READ_ALL */, "", 524296) = 0
munmap(0x7f43ea847000, 528384)  = 0
...

-ss

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

Re: POC: Alternative solution: Re: [PATCH 0/4] printk: reimplement LOG_CONT handling

2020-08-14 Thread Sergey Senozhatsky

On (20/08/14 15:46), Linus Torvalds wrote:
> On Thu, Aug 13, 2020 at 4:54 AM Sergey Senozhatsky
>  wrote:
> >
> > I think what Linus said a long time ago was that the initial purpose of
> > pr_cont was
> >
> > pr_info("Initialize feature foo...");
> > if (init_feature_foo() == 0)
> > pr_cont("ok\n");
> > else
> > pr_cont("not ok\n");
> >
> > And if init_feature_foo() crashes the kernel then the first printk()
> > form panic() will flush the cont buffer.
> 
> Right.
> 
> This is why I think any discussion that says "people should buffer
> their lines themselves and we should get rid if pr_cont()" is
> fundamentally broken.

I think what we've been talking about so far was "how do we buffer
cont lines now", what are the problems of current buffering and what
can we do to make that buffering SMP safe (preserving the context of
broken cont buffer, etc. etc.). I don't think that anyone concluded
to just "s/pr_cont/printk/g", I don't see this.

-ss

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

Re: POC: Alternative solution: Re: [PATCH 0/4] printk: reimplement LOG_CONT handling

2020-08-13 Thread Sergey Senozhatsky

On (20/08/13 12:35), John Ogness wrote:
> I believe I failed to recognize the fundamental problem. The fundamental
> problem is that the pr_cont() semantics are very poor.

The semantics is pretty clear - use it only in UP early bootup,
anything else is broken :)

  /*
   * Annotation for a "continued" line of log printout (only done after a
   * line that had no enclosing \n). Only to be used by core/arch code
   * during early bootup (a continued line is not SMP-safe otherwise).
   */
  #define KERN_CONT KERN_SOH "c"

> I now strongly believe that we need to fix those semantics by having the
> pr_cont() user take responsibility for buffering the message. Patching the
> ~2000 pr_cont() users will be far easier than continuing to twist ourselves
> around this madness.

I welcome this effort. We've been talking about the fact that pr_cont() is
not something we can ignore anymore (we have more and more SMP users of
it) since the Kernel Summit in Santa Fe, NM, but the general response back
then was "oh my god, who cares" (pretty sure this is very close to what Ted
Ts'o said during the printk session).

> Here is an example for a new pr_cont() API:
> 
> struct pr_cont c;
> 
> pr_cont_alloc_info();
>(or alternatively)
> dev_cont_alloc_info(dev, );
> 
> pr_cont(, "1");
> pr_cont(, "2");
> 
> pr_cont_flush();

This might be a bit more complex.

One thing that we need to handle here, I believe, is that the context
which crashes the kernel should flush its cont buffer, because the
information there is relevant to the crash:

pr_cont_alloc_info();
pr_cont(, "1");
pr_cont(, "2");
>>
   oops
  panic()
<<
pr_cont_flush();

We better flush that context's pr_cont buffer during panic().

Another example:

pr_cont_alloc_info();

for (i = 0; i < p->sz; i++)
pr_cont(, p->buf[i]);
>>
   page fault
exit
<<
pr_cont_flush();

I believe we need to preliminary flush pr_cont() in this case as well,
because the information there might be very helpful.

-ss

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

Re: POC: Alternative solution: Re: [PATCH 0/4] printk: reimplement LOG_CONT handling

2020-08-13 Thread Sergey Senozhatsky

On (20/08/13 10:41), Petr Mladek wrote:
> > My concerns about this idea:
> > 
> > - What if the printk user does not correctly terminate the cont message?
> >   There is no mechanism to allow that open record to be force-finalized
> >   so that readers can read newer records.
> 
> This is a real problem. And it is the reason why the cont buffer is
> currently flushed (finalized) by the next message from another context.

I understand that you think that this should be discussed and addressed
later in a separate patch, but, since we are on pr_cont topic right now,
can we slow down and maybe re-think what is actually expected from
pr_cont()? IOW, have the "what is expect from this feature" thread?

For instance, is missing \n the one and only reason why printk-s from
another context flush cont buffer now? Because I can see some more reasons
for current behaviour and I'd like to question those reasons.

I think what Linus said a long time ago was that the initial purpose of
pr_cont was

pr_info("Initialize feature foo...");
if (init_feature_foo() == 0)
pr_cont("ok\n");
else
pr_cont("not ok\n");

And if init_feature_foo() crashes the kernel then the first printk()
form panic() will flush the cont buffer.

We can handle this by realizing that new printk() message has LOG_NEWLINE
and has different log_level (not pr_cont), maybe.

Let's look at the more general case:

CPU0.. CPU255
pr_info("text");
pr_alert("boom\n");
pr_cont("1);
pr_cont("2\n");

Do we really need to preliminary flush CPU0 pr_cont buffer in this
case? There is no connection between messages from CPU0 and CPU255.
Maybe (maybe!) what matters here is keeping the order of messages
per-context rather than globally system-wide?

-ss

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

Re: [RFC PATCH] printk: Change timestamp to triplet as mono, boot and real

2020-08-13 Thread Sergey Senozhatsky

On (20/08/13 12:22), Petr Mladek wrote:
>
>   + would take more space (prefix + text vs. binary representation)

Dict buffer is allocated regardless of how we use it, and only printks
from drivers/* (dev_printk*) add dict payload. It might be the case
that on some (if not most) systems dict pages are not used 90% of times
(if not 99%).

>   + not reliable because dict is currently dropped when no space

Well, in the perfect world this is a problem, but "maybe not having
alternative timestamps sometimes" can be OK approach for people who
wants to use those triplet timestamps.

But, in general, how real this problem is? What I sae so far (on my boxes)
was that printk messages are longer than dict payload.

Example:

6,223,93298,-,caller=T1;acpi PNP0A08:00: _OSC: OS supports [Segments MSI 
HPX-Type3]
 SUBSYSTEM=acpi
 DEVICE=+acpi:PNP0A08:00
6,224,93413,-,caller=T1;acpi PNP0A08:00: _OSC: not requesting OS control; OS 
requires [ExtendedConfig ASPM ClockPM MSI]
 SUBSYSTEM=acpi
 DEVICE=+acpi:PNP0A08:00
6,225,93540,-,caller=T1;PCI host bridge to bus :00
6,226,93541,-,caller=T1;pci_bus :00: root bus resource [io  0x-0x0cf7 
window]
 SUBSYSTEM=pci_bus
 DEVICE=+pci_bus::00

It might be the case, that overflowing dict buffer, without first overflowing
printk buffer, can be rather challenging.

>   + it would make the controversial dictionary feature more important

Yes. But, for the time being, this controversial dict feature allocates
pages for dict buffer anyway.

-ss

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

Re: POC: Alternative solution: Re: [PATCH 0/4] printk: reimplement LOG_CONT handling

2020-08-12 Thread Sergey Senozhatsky

On (20/08/13 02:30), John Ogness wrote:
> 2. I haven't yet figured out how to preserve calling context when a
> newline appears. For example:
> 
> pr_info("text");
> pr_cont(" 1");
> pr_cont(" 2\n");
> pr_cont("3");
> pr_cont(" 4\n");
>
> For "3" the calling context (info, timestamp) is lost because with "2"
> the record is finalized. Perhaps the above is invalid usage of LOG_CONT?

This is not an unseen pattern, I'm afraid. And the problem here can
be more general:

pr_info("text");
pr_cont("1");
exception/IRQ/NMI
pr_alert("text\n");
pr_cont("2");
pr_cont("\n");

I guess the solution would be to store "last log_level" in task_struct
and get current (new) timestamp for broken cont line?

We have this problem now. E.g. early boot messages from one of my boxes:

6,173,41837,-;x86: Booting SMP configuration:
6,174,41838,-; node  #0, CPUs:  #1 #2 #3 #4
4,175,44993,-;MDS CPU bug present and SMT on, data leak possible. See 
https://www.kernel.org/doc/...
4,176,44993,c; #5 #6 #7

"CPUs: #1 #2 #3 #4 #5 #6 #7" is supposed to be one cont line with
loglevel 6. But it gets interrupted and flushed so that "#5 #6 #7"
appears with the different loglevel.

-ss

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

Re: [RFC PATCH] printk: Change timestamp to triplet as mono, boot and real

2020-08-12 Thread Sergey Senozhatsky

On (20/08/11 15:02), Petr Mladek wrote:
> On Tue 2020-08-11 14:05:12, Thomas Gleixner wrote:
> > Petr Mladek  writes:
> > > At least "crash" tool would need an update anyway. AFAIK, it checks
> > > the size of struct printk_log and refuses to read it when it changes.
> > >
> > > It means that the hack with VMCOREINFO_FIELD_OFFSET probably is not
> > > needed because we would need to update the crashdump-related tools anyway.
> > >
> > > Well, the timing is good. We are about to switch the printk ring
> > > buffer into a lockless one. It requires updating the crashdump tools
> > > as well. We could do this at the same time. The lockless ring buffer
> > > already is in linux-next. It is aimed for 5.10 or 5.11.
> > ...
> > > It would be great to synchronize all these changes changes of the
> > > printk log buffer structures.
> > 
> > I agree that having one update is a good thing, but pretty please can we
> > finally make progress with this and not create yet another dependency?
> 
> To make it clear. I definitely do not want to block lockless printk by
> this.
> 
> BTW: I am not 100% convinced that storing all three timestamps is
> worth it. It increases the code complexity, metadata size. It needs
> an interface with the userspace that has to stay backward compatible.

Can we, perhaps, store those various "alternative" timestamps in dict so
then whoever wants to read them can just parse the dict key:value pairs
attach to each printk message?

> Also it still will be racy because the timestamp is taken when the message
> is printed. It might be "long" before or after the event that
> it talks about.
>
> There is still the alternative to print all three timestamps regularly
> for those interested. It is less user convenient but much easier
> to maintain.

Yes, that's a nice alternative.

-ss

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

Re: [PATCH 2/4] printk: store instead of processing cont parts

2020-07-28 Thread Sergey Senozhatsky

On (20/07/21 17:03), John Ogness wrote:
> On 2020-07-21, Sergey Senozhatsky  wrote:
> >> That said, we have traditionally used not just "current process", but
> >> also "last irq-level" as the context information, so I do think it
> >> would be good to continue to do that.
> >
> > OK, so basically, extending printk_caller_id() so that for IRQ/NMI
> > we will have more info than just "0x8000 + raw_smp_processor_id()".
> 
> If bit31 is set, the upper 8 bits could specify what the lower 24 bits
> represent. That would give some freedom for the future.
> 
> For example:
> 
> 0x80 = cpu id (generic context)
> 0x81 = interrupt number
> 0x82 = cpu id (nmi context)

Looks good to me. I'd probably add in_softirq state. So the contexts
are: in_task, in_softirq, in_irq, in_nmi

For the purpose of future extensions, I would probably consider
u64 for callerid. So that we can store more "state bits" in the future,
like disabled/enabled preemption; disabled/enabled local irqs; etc.

-ss

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

Re: [PATCH 2/4] printk: store instead of processing cont parts

2020-07-27 Thread Sergey Senozhatsky

On (20/07/21 08:40), Linus Torvalds wrote:
> On Tue, Jul 21, 2020 at 7:42 AM Sergey Senozhatsky
>  wrote:
> >
> > OK, so basically, extending printk_caller_id() so that for IRQ/NMI
> > we will have more info than just "0x8000 + raw_smp_processor_id()".
> 
> I think it's really preempt_count() that we want to have there.
> 
> That has the softirq/hardirq/NMI information in it.
> 
> So the "context" identifier of an incomplete write would be { task,
> cpu, preempt_count() } of the writer. If a new KERN_CONT writer comes
> in, it would have to match in that context to actually combine.
> 
> You can probably play "hide the bits" tricks to not *ac tually* use
> three words for it. The task structure in particular tends to be very
> aligned, you could hide bits in the low bits there. The CPU number
> would fit in there, for example.

The purpose of callerid has changed. We used to keep it _only_ in
struct printk_log and never used it for anything but

if (cont.owner == current && (lflags & LOG_CONT))
...

so it was easy to use task_struct pointers - we only cared if cont.owner
matches current and never dereferenced the cont.owner pointer.

But we do show caller id to users now (CONFIG_PRINTK_CALLER), so it
makes sense to keep there something more or less human readable, it
helps syzkaller/fuzzer people. That's why we keep PID in [30,0] bits
of callerid if the printk was called in_task(), and CPU_ID otherwise.

Replacing easy to understand PID/CPU_ID with some magic hex, e.g.
hash(current) >> shift or hash(smp_processor_id()) >> shift, will
make things less convenient, so I think it is reasonable to preserve
the existing scheme.

I like what John suggests:

- if printk was called from in_task(), then we keep PID in [30,0] bits
- otherwise we keep smp_processor_id() in [27,0] bits and in the upper
  bits we keep the IRQ/NMI/etc flags.

It may make sense to "promote" u32 callerid to u64, just in case if
someone sometime in the future will have boxes with more than several
billion PIDs.

-ss

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

Re: [PATCH v2][next] printk: ringbuffer: support dataless records

2020-07-27 Thread Sergey Senozhatsky

On (20/07/21 15:31), John Ogness wrote:
> With commit ("printk: use the lockless ringbuffer"), printk()
> started silently dropping messages without text because such
> records are not supported by the new printk ringbuffer.
> 
> Add support for such records.
> 
> Currently dataless records are denoted by INVALID_LPOS in order
> to recognize failed prb_reserve() calls. Change the ringbuffer
> to instead use two different identifiers (FAILED_LPOS and
> NO_LPOS) to distinguish between failed prb_reserve() records and
> successful dataless records, respectively.
> 
> Fixes: ("printk: use the lockless ringbuffer")
> Fixes: https://lkml.kernel.org/r/20200718121053.ga691...@elver.google.com
> Reported-by: Marco Elver 
> Signed-off-by: John Ogness 

Applied to printk-rework, thanks.

-ss

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

Re: [PATCH 2/4] printk: store instead of processing cont parts

2020-07-21 Thread Sergey Senozhatsky

On (20/07/20 11:30), Linus Torvalds wrote:
> > Do I get it right, what you are saying is - when we process a PR_CONT
> > message the cont buffer should already contain previous non-LOG_NEWLINE
> > and non-PR_CONT message, otherwise it's a bug?
> 
> No.
> 
> I'm saying that the code that does PR_CONT should have done *some*
> printing before, otherwise it's at the very least questionable.
> 
> IOW, you can't just randomly start printing with PR_CONT, without
> having established _some_ context for it.

OK, I see. I sort of suspect that we may actually have code that does
just pr_cont() (e.g. what Joe pointed out). It doesn't seem like that
"establish a context" was ever enforced, doing a bunch of pr_cont()
simply works.

[..]
> That said, we have traditionally used not just "current process", but
> also "last irq-level" as the context information, so I do think it
> would be good to continue to do that.

OK, so basically, extending printk_caller_id() so that for IRQ/NMI
we will have more info than just "0x8000 + raw_smp_processor_id()".

-ss

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

Re: [PATCH][next] printk: ringbuffer: support dataless records

2020-07-20 Thread Sergey Senozhatsky

On (20/07/20 16:07), John Ogness wrote:
>  
> +/* Determine if a logical position refers to a data-less block. */
> +#define LPOS_DATALESS(lpos)  ((lpos) & 1UL)
> +

[..]

> @@ -1402,7 +1396,9 @@ static int prb_read(struct printk_ringbuffer *rb, u64 
> seq,
>   /* Copy text data. If it fails, this is a data-less record. */
>   if (!copy_data(>text_data_ring, _blk_lpos, 
> desc.info.text_len,
>  r->text_buf, r->text_buf_size, line_count)) {
> - return -ENOENT;
> + /* Report an error if there should have been data. */
> + if (desc.info.text_len != 0)
> + return -ENOENT;
>   }

If this is a dataless record then should copy_data() return error?

Otherwise, looks good to me
Acked-by: Sergey Senozhatsky 

-ss

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

Re: [PATCH v5 4/4] printk: use the lockless ringbuffer

2020-07-20 Thread Sergey Senozhatsky

On (20/07/20 08:43), Marco Elver wrote:
> On Sun, Jul 19, 2020 at 12:43PM +0900, Sergey Senozhatsky wrote:
> 
> As I said, a number of debugging tools use them to format reports to be
> more readable (visually separate title and report body, and separate
> parts of the report). Also, such reports are often parsed by CI systems,
> and by changing the reports, these CI systems may break. But those are
> just the usecases I'm acutely aware of

Can you give example of such CI systems? // that's a real question //

-ss

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

Re: [PATCH v5 4/4] printk: use the lockless ringbuffer

2020-07-20 Thread Sergey Senozhatsky

On (20/07/20 08:43), Marco Elver wrote:
[..]
> please see a full list of newline-print users below.
[..]
> $> git grep -En '\<(printk|pr_err|pr_warn|pr_info)\>\("\\n"\)'
> arch/alpha/kernel/core_wildfire.c:650:printk("\n");
> arch/alpha/kernel/core_wildfire.c:658:printk("\n");
> arch/alpha/kernel/traps.c:120:printk("\n");
[..]

In many cases printk("\n") is not for "print a blank line", but
rather for "flush pr_cont buffer".

-ss

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

Re: [PATCH 2/4] printk: store instead of processing cont parts

2020-07-19 Thread Sergey Senozhatsky

On (20/07/19 11:27), Linus Torvalds wrote:
> On Sun, Jul 19, 2020 at 7:35 AM Sergey Senozhatsky
>  wrote:
> >
> > Can we merge lines that we don't want to merge?
> >
> >pr_cont()  ->  IRQ -> pr_cont() -> NMI -> pr_cont()
> 
> That pr_cont in either IRQ or NMI context would be a bug.
> 
> You can't validly have a PR_CONT without the non-cont that precedes it.

Do I get it right, what you are saying is - when we process a PR_CONT
message the cont buffer should already contain previous non-LOG_NEWLINE
and non-PR_CONT message, otherwise it's a bug?

lockdep (I'll trim the code)

static void __print_lock_name(struct lock_class *class)
{
 ..
name = class->name;
if (!name) {
name = __get_key_name(class->key, str);
printk(KERN_CONT "%s", name);
} else {
printk(KERN_CONT "%s", name);
if (class->name_version > 1)
printk(KERN_CONT "#%d", class->name_version);
if (class->subclass)
printk(KERN_CONT "/%d", class->subclass);
}
}

static void print_lock_name(struct lock_class *class)
{
printk(KERN_CONT " (");
__print_lock_name(class);
printk(KERN_CONT "){%s}-{%hd:%hd}", usage, ...
}

static void
print_bad_irq_dependency(struct task_struct *curr,
{
 ..
pr_warn("which would create a new lock dependency:\n");
print_lock_name(hlock_class(prev));
pr_cont(" ->");
print_lock_name(hlock_class(next));
pr_cont("\n");
 ..
}

pr_warn() is LOG_NEWLINE, so cont buffer is empty by the time
we call print_lock_name()->__print_lock_name(), which do several
pr_cont() print outs.

I'm quite sure there is more code that does similar things.

But, overall, isn't it by design that we can process pr_cont()
message with no preceding non-cont message? Because of preliminary
flushes. Example:

CPU0

 pr_info("foo");// !LOG_NEWLINE goes into the cont buffer
 pr_cont("1");  // OK
 -> IRQ / NMI / exception / etc
pr_alert("error\n"); // flush cont buffer, log_store error message 
(it's LOG_NEWLINE)
 <- iret
 pr_cont("2");  // cont buffer was flushed. There is no preceding non-cont 
message
 pr_cont("3");
 pr_cont("\n");

Or am I misunderstanding what you saying?

-ss

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

Re: [PATCH 2/4] printk: store instead of processing cont parts

2020-07-19 Thread Sergey Senozhatsky

On (20/07/18 01:54), John Ogness wrote:
> ---
>  kernel/printk/printk.c | 114 -
>  1 file changed, 11 insertions(+), 103 deletions(-)
> 
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index fec71229169e..c4274c867771 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -495,9 +495,14 @@ static void truncate_msg(u16 *text_len, u16 
> *trunc_msg_len)
>   *trunc_msg_len = 0;
>  }
>  
> +static inline u32 printk_caller_id(void)
> +{
> + return in_task() ? task_pid_nr(current) :
> + 0x8000 + raw_smp_processor_id();
> +}

Can we merge lines that we don't want to merge?

   pr_cont()  ->  IRQ -> pr_cont() -> NMI -> pr_cont()

  |__|
   different   same 0x8000 + cpu_id caller id
   caller id

-ss

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

Re: [PATCH v5 4/4] printk: use the lockless ringbuffer

2020-07-18 Thread Sergey Senozhatsky

Hi Marco,

On (20/07/18 14:10), Marco Elver wrote:
> 
> It seems this causes a regression observed at least with newline-only
> printks. I noticed this during -next testing because various debugging
> tools (K*SAN, lockdep, etc.) use e.g. pr_{err,warn,info}("\n") to format
> reports.
> 
> Without wanting to wait for a report from one of these debugging tools,
> a simple reproducer is below. Without this patch, the expected newline
> is printed.

Empty/blank lines carry no valuable payload, could you please explain
why do you consider this to be a regression?

-ss

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

Re: [PATCH v4 0/4] printk: replace ringbuffer

2020-07-13 Thread Sergey Senozhatsky

On (20/07/10 11:58), John Ogness wrote:
> 
> IMHO users of these sequence number interfaces need to see all the
> records individually and reassemble the LOG_CONT messages themselves if
> they want to. I believe that is the only sane path forward. To do this,
> the caller id will no longer be optional to the sequence number output
> since that is vital information to re-assemble the LOG_CONT messages.

Agreed. The caller id thus probably can be extended. Right now it only
tells us either we are in a process context (PID) or in something else.
This works sort of fine (I guess) because deeply nested printouts (printks
from NMI atop of process or irq printks) land in the deferred printk-safe
buffers, but things will change with the new logbuf (and after removal of
the logbuf_lock and per-CPU printk-safe buffers).

-ss

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

Re: [PATCH v5 0/4] printk: replace ringbuffer

2020-07-12 Thread Sergey Senozhatsky

On (20/07/10 10:28), Petr Mladek wrote:
> On Thu 2020-07-09 15:29:40, John Ogness wrote:
> > Hello,
> > 
> > Here is a v5 for the first series to rework the printk
> > subsystem. The v4 is here [0]. This first series
> > only replaces the existing ringbuffer implementation. No locking
> > is removed. The semantics/behavior of printk are kept the same
> > except for a minor optimization that is reverted (patch 3).
> > 
> > John Ogness (4):
> >   crash: add VMCOREINFO macro to define offset in a struct declared by
> > typedef
> >   printk: add lockless ringbuffer
> >   Revert "printk: lock/unlock console only for new logbuf entries"
> >   printk: use the lockless ringbuffer
> 
> The patchset is committed in printk/linux.git, branch printk-rework.

FWIW,
Acked-by: Sergey Senozhatsky 

Thank you.

-ss

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

Re: [PATCH v5 2/4] printk: add lockless ringbuffer

2020-07-10 Thread Sergey Senozhatsky

On (20/07/10 17:43), Sergey Senozhatsky wrote:
> [..]
> 
> > +void prb_init(struct printk_ringbuffer *rb,
> > + char *text_buf, unsigned int textbits,
> > + char *dict_buf, unsigned int dictbits,
> > + struct prb_desc *descs, unsigned int descbits)
> > +{
> > +   memset(descs, 0, _DESCS_COUNT(descbits) * sizeof(descs[0]));
> > +
> > +   rb->desc_ring.count_bits = descbits;
> > +   rb->desc_ring.descs = descs;
> > +   atomic_long_set(>desc_ring.head_id, DESC0_ID(descbits));
> > +   atomic_long_set(>desc_ring.tail_id, DESC0_ID(descbits));
> > +
> > +   rb->text_data_ring.size_bits = textbits;
> > +   rb->text_data_ring.data = text_buf;
> > +   atomic_long_set(>text_data_ring.head_lpos, BLK0_LPOS(textbits));
> > +   atomic_long_set(>text_data_ring.tail_lpos, BLK0_LPOS(textbits));
> > +
> > +   rb->dict_data_ring.size_bits = dictbits;
> > +   rb->dict_data_ring.data = dict_buf;
> > +   atomic_long_set(>dict_data_ring.head_lpos, BLK0_LPOS(dictbits));
> > +   atomic_long_set(>dict_data_ring.tail_lpos, BLK0_LPOS(dictbits));
> > +
> 
> Just a side note: some people want !CONFIG_PRINTK builds. I wonder
> how many people will want !CONFIG_PRINTK_DICT. The core logbuf/dict
> logbuf split is really cool.

We also may be can add a dedicated logbuf for the user-space logs,
as opposed to current "user space can write to the kernel logbuf".
Some folks really dislike that systemd, for instance, can trash
the core kernel logbuf doing a lot of /dev/kmsg writes.

-ss

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

Re: [PATCH v5 2/4] printk: add lockless ringbuffer

2020-07-10 Thread Sergey Senozhatsky

On (20/07/09 15:29), John Ogness wrote:
[..]
> +/*
> + * A data block: mapped directly to the beginning of the data block area
> + * specified as a logical position within the data ring.
> + *
> + * @id:   the ID of the associated descriptor
> + * @data: the writer data
> + *
> + * Note that the size of a data block is only known by its associated
> + * descriptor.
> + */
> +struct prb_data_block {
> + unsigned long   id;
> + chardata[0];
> +};

A nit: I think someone will send "Replace zero-length arrays with
flexible array member" soon enough:

-   chardata[0];
+   chardata[];

[..]
> +/*
> + * Sanity checker for reserve size. The ringbuffer code assumes that a data
> + * block does not exceed the maximum possible size that could fit within the
> + * ringbuffer. This function provides that basic size check so that the
> + * assumption is safe.
> + *
> + * Writers are also not allowed to write 0-sized (data-less) records. Such
> + * records are used only internally by the ringbuffer.
> + */
> +static bool data_check_size(struct prb_data_ring *data_ring, unsigned int 
> size)
> +{
> + struct prb_data_block *db = NULL;
> +
> + /*
> +  * Writers are not allowed to write data-less records. Such records
> +  * are used only internally by the ringbuffer to denote records where
> +  * their data failed to allocate or have been lost.
> +  */

A nit: The same data-less records comment is some 5 lines earlier. But OK.

> + if (size == 0)
> + return false;

[..]

> +void prb_init(struct printk_ringbuffer *rb,
> +   char *text_buf, unsigned int textbits,
> +   char *dict_buf, unsigned int dictbits,
> +   struct prb_desc *descs, unsigned int descbits)
> +{
> + memset(descs, 0, _DESCS_COUNT(descbits) * sizeof(descs[0]));
> +
> + rb->desc_ring.count_bits = descbits;
> + rb->desc_ring.descs = descs;
> + atomic_long_set(>desc_ring.head_id, DESC0_ID(descbits));
> + atomic_long_set(>desc_ring.tail_id, DESC0_ID(descbits));
> +
> + rb->text_data_ring.size_bits = textbits;
> + rb->text_data_ring.data = text_buf;
> + atomic_long_set(>text_data_ring.head_lpos, BLK0_LPOS(textbits));
> + atomic_long_set(>text_data_ring.tail_lpos, BLK0_LPOS(textbits));
> +
> + rb->dict_data_ring.size_bits = dictbits;
> + rb->dict_data_ring.data = dict_buf;
> + atomic_long_set(>dict_data_ring.head_lpos, BLK0_LPOS(dictbits));
> + atomic_long_set(>dict_data_ring.tail_lpos, BLK0_LPOS(dictbits));
> +

Just a side note: some people want !CONFIG_PRINTK builds. I wonder
how many people will want !CONFIG_PRINTK_DICT. The core logbuf/dict
logbuf split is really cool.

-ss

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

Re: [printk] 18a2dc6982: ltp.kmsg01.fail

2020-07-09 Thread Sergey Senozhatsky

On (20/07/09 14:25), Petr Mladek wrote:
> On Thu 2020-07-09 13:23:07, John Ogness wrote:
> > On 2020-07-09, Petr Mladek  wrote:
> > > I though more about it. IMHO, it will be better to modify
> > > prb_first_seq() to do the same cycle as prb_next_seq()
> > > and return seq number of the first valid entry.
> > 
> > Exactly!
> > 
> > Here is a patch that does just that. I added a prb_first_valid_seq()
> > function and made prb_first_seq() static. (The ringbuffer still needs
> > prb_first_seq() for itself.)
> 
> The fix looks fine to me:

Yeah, looks right to me as well.

> It means that we have two fixes on top of the original patchset. Could
> you please send v5 with the two fixes integrated? I would just squash
> them into the 4th patch.

I'd prefer v5, if possible.

-ss

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

Re: [printk] 18a2dc6982: ltp.kmsg01.fail

2020-07-09 Thread Sergey Senozhatsky

On (20/07/09 15:14), kernel test robot wrote:
[..]

Took me a while to find the FAIL-ed test:

> kmsg01.c:393: INFO: TEST: read returns EPIPE when messages get overwritten
> kmsg01.c:398: INFO: first seqno: 0
> kmsg01.c:411: INFO: first seqno now: 881
> kmsg01.c:425: FAIL: read returned: 77: SUCCESS (0)

So this is seq number related
https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/logging/kmsg/kmsg01.c#L383

LTP writes a gazillion (roughly, maybe a bit less than that)
filler messages to /dev/kmsg in order to cause logbuf overlap:
...
19490 [  172.301584] LTP kmsg01 FILLER MESSAGE TO OVERWRITE OTHERS
19491 [  172.308730] LTP kmsg01 FILLER MESSAGE TO OVERWRITE OTHERS
19492 [  172.313852] LTP kmsg01 FILLER MESSAGE TO OVERWRITE OTHERS
19493 [  172.320988] LTP kmsg01 FILLER MESSAGE TO OVERWRITE OTHERS
19494 [  172.325618] LTP kmsg01 FILLER MESSAGE TO OVERWRITE OTHERS
...

-ss

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

Re: [PATCH v4 3/4] Revert "printk: lock/unlock console only for new logbuf entries"

2020-07-08 Thread Sergey Senozhatsky

On (20/07/07 17:05), John Ogness wrote:
> This reverts commit 3ac37a93fa9217e576bebfd4ba3e80edaaeb2289.
> 
> This optimization will not apply once the transition to a lockless
> printk is complete. Rather than porting this optimization through
> the transition only to remove it anyway, just revert it now to
> simplify the transition.
> 
> Signed-off-by: John Ogness 

Acked-by: Sergey Senozhatsky 

-ss

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

Re: [PATCH v2 0/3] printk: replace ringbuffer

2020-05-15 Thread Sergey Senozhatsky

On (20/05/01 11:46), John Ogness wrote:
> Hello,
> 
> Here is a v2 for the first series to rework the printk subsystem. The
> v1 and history are here [0]. This first series only replaces the
> existing ringbuffer implementation. No locking is removed. No
> semantics/behavior of printk are changed.
> 
> The VMCOREINFO is updated. RFC patches for the external tools
> crash(8) [1] and makedumpfile(8) [2] have been submitted that allow
> the new ringbuffer to be correctly read.
> 
> This series is in line with the agreements [3] made at the meeting
> during LPC2019 in Lisbon, with 1 exception: support for dictionaries
> will not be discontinued [4]. Dictionaries are stored in a separate
> buffer so that they cannot interfere with the human-readable buffer.

I'm willing to bless this. The code looks good to me, nice job guys.

Acked-by: Sergey Senozhatsky 

-ss

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

Re: [PATCH 2/2] printk: use the lockless ringbuffer

2020-02-13 Thread Sergey Senozhatsky

On (20/02/14 10:41), Sergey Senozhatsky wrote:
> On (20/02/13 23:36), John Ogness wrote:
[..]
> > We could implement it such that devkmsg_read() will skip over data-less
> > records instead of issuing an EPIPE. (That is what dmesg does.) But then
> > do we need EPIPE at all? The reader can see that is has missed records
> > by tracking the sequence number, so could we just get rid of EPIPE? Then
> > cat(1) would be a great tool to view the raw ringbuffer. Please share
> > your thoughts on this.
> 
> Looking at systemd/src/journal/journald-kmsg.c : server_read_dev_kmsg()
> -EPIPE is just one of the erronos they handle, nothing special. Could it
> be the case that some other loggers would have special handling for EPIPE?
> I'm not sure, let's look around.

rsyslog

static void
readkmsg(void)
{
int i;
uchar pRcv[8192+1];
char errmsg[2048];

for (;;) {
dbgprintf("imkmsg waiting for kernel log line\n");

/* every read() from the opened device node receives one record 
of the printk buffer */
i = read(fklog, pRcv, 8192);

if (i > 0) {
/* successful read of message of nonzero length */
pRcv[i] = '\0';
} else if (i == -EPIPE) {
imkmsgLogIntMsg(LOG_WARNING,
"imkmsg: some messages in circular 
buffer got overwritten");
continue;
} else {
/* something went wrong - error or zero length message 
*/
if (i < 0 && errno != EINTR && errno != EAGAIN) {
/* error occured */
imkmsgLogIntMsg(LOG_ERR,
   "imkmsg: error reading kernel log - 
shutting down: %s",
rs_strerror_r(errno, errmsg, 
sizeof(errmsg)));
fklog = -1;
}
break;
}

submitSyslog(pRcv);
}
}


So EPIPE errno better stay around.

-ss

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

Re: [PATCH 2/2] printk: use the lockless ringbuffer

2020-02-13 Thread Sergey Senozhatsky

On (20/02/13 23:36), John Ogness wrote:
> >> Here prb_read_valid() was successful, so a record _was_ read. The
> >> kerneldoc for the prb_read_valid() says:
> >
> > Hmm, yeah. That's true.
> >
> > OK, something weird...
> >
> > I ran some random printk-pressure test (mostly printks from IRQs;
> > + some NMI printk-s, but they are routed through nmi printk-safe
> > buffers; + some limited number of printk-safe printk-s, routed
> > via printk-safe buffer (so, once again, IRQ); + user-space
> > journalctl -f syslog reader), and after the test 'cat /dev/kmsg'
> > is terminally broken
> >
> > [..]
> > cat /dev/kmsg
> > cat: /dev/kmsg: Broken pipe
>
> In mainline you can have this "problem" as well. Once the ringbuffer has
> wrapped, any read to a newly opened /dev/kmsg when a new message arrived
> will result in an EPIPE. This happens quite easily once the ringbuffer
> has wrapped because each new message is overwriting the oldest message.

Hmm. Something doesn't add up.

Looking at the numbers, both r->info->seq and prb_first_seq(prb)
do increase, so there are new messages in the ring buffer

   u->seqr->seqprb_first_seq
[..]
cat: devkmsg_read() error 1981080   1982633   1981080
cat: devkmsg_read() error 1981080   1982633   1981080
cat: devkmsg_read() error 1981095   1982652   1981095
cat: devkmsg_read() error 1981095   1982652   1981095
cat: devkmsg_read() error 1981095   1982652   1981095
[..]

but 'cat' still wouldn't read anything from the logbuf - EPIPE.

NOTE: I don't run 'cat /dev/kmsg' during the test. I run the test first,
then I run 'cat /dev/kmsg', after the test, when printk-pressure is gone.

I can't reproduce it with current logbuf. 'cat' reads from /dev/kmsg after
heavy printk-pressure test. So chances are some loggers can also experience
problems. This might be a regression.

> > ...
> > systemd-journal: devkmsg_read() error 1979281 1982465 1980933
> > systemd-journal: corrected seq 1982465 1982465
> > cat: devkmsg_read() error 1980987 1982531 1980987
> > cat: corrected seq 1982531 1982531
> > cat: devkmsg_read() error 1981015 1982563 1981015
> > cat: corrected seq 1982563 1982563
>
> The situation with a data-less record is the same as when the ringbuffer
> wraps: cat is hitting that EPIPE. But re-opening the file descriptor is
> not going to help because it will not be able to get past that data-less
> record.

So maybe this is the case with broken 'cat' on my system?

> We could implement it such that devkmsg_read() will skip over data-less
> records instead of issuing an EPIPE. (That is what dmesg does.) But then
> do we need EPIPE at all? The reader can see that is has missed records
> by tracking the sequence number, so could we just get rid of EPIPE? Then
> cat(1) would be a great tool to view the raw ringbuffer. Please share
> your thoughts on this.

Looking at systemd/src/journal/journald-kmsg.c : server_read_dev_kmsg()
-EPIPE is just one of the erronos they handle, nothing special. Could it
be the case that some other loggers would have special handling for EPIPE?
I'm not sure, let's look around.

I'd say that EPIPE removal looks OK to me. But before we do that, I'm
not sure that we have clear understanding of 'cat /dev/kmsg' behaviour
change.

> On a side note (but related to data-less records): I hacked the
> ringbuffer code to inject data-less records at various times in order to
> verify your report. And I stumbled upon a bug in the ringbuffer, which
> can lead to an infinite loop in console_unlock(). The problem occurs at:
> 
> retry = prb_read_valid(prb, console_seq, NULL);
> 
> which will erroneously return true if console_seq is pointing to a
> data-less record but there are no valid records after it. The following
> patch fixes the bug. And yes, for v2 I have added comments to the
> desc_read_committed() code.

That's great to know!

-ss

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

Re: [PATCH 0/2] printk: replace ringbuffer

2020-02-13 Thread Sergey Senozhatsky

On (20/02/13 14:07), Petr Mladek wrote:
> On Wed 2020-02-05 17:12:12, John Ogness wrote:
> > On 2020-02-05, lijiang  wrote:
> > > Do you have any suggestions about the size of CONFIG_LOG_* and
> > > CONFIG_PRINTK_* options by default?
> > 
> > The new printk implementation consumes more than double the memory that
> > the current printk implementation requires. This is because dictionaries
> > and meta-data are now stored separately.
> > 
> > If the old defaults (LOG_BUF_SHIFT=17 LOG_CPU_MAX_BUF_SHIFT=12) were
> > chosen because they are maximally acceptable defaults, then the defaults
> > should be reduced by 1 so that the final size is "similar" to the
> > current implementation.
> >
> > If instead the defaults are left as-is, a machine with less than 64 CPUs
> > will reserve 336KiB for printk information (128KiB text, 128KiB
> > dictionary, 80KiB meta-data).
> > 
> > It might also be desirable to reduce the dictionary size (maybe 1/4 the
> > size of text?).
> 
> Good questions. It would be great to check the usage on some real
> systems.

[..]

> I wish the dictionaries were never added ;-) They complicate the code
> and nobody knows how many people actually use the information.

Maybe we can have CONFIG_PRINTK_EXTRA_PAYLOAD [for dicts] so people can
compile it out if it's not needed. This can save several bytes here and
there.

-ss

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

Re: [PATCH 2/2] printk: use the lockless ringbuffer

2020-02-13 Thread Sergey Senozhatsky

On (20/02/13 10:42), John Ogness wrote:
> On 2020-02-13, Sergey Senozhatsky  wrote:
> >> -  while (user->seq == log_next_seq) {
> >> +  if (!prb_read_valid(prb, user->seq, r)) {
> >>if (file->f_flags & O_NONBLOCK) {
> >>ret = -EAGAIN;
> >>logbuf_unlock_irq();
> >> @@ -890,30 +758,26 @@ static ssize_t devkmsg_read(struct file *file, char 
> >> __user *buf,
> >>  
> >>logbuf_unlock_irq();
> >>ret = wait_event_interruptible(log_wait,
> >> - user->seq != log_next_seq);
> >> +  prb_read_valid(prb, user->seq, r));
> >>if (ret)
> >>goto out;
> >>logbuf_lock_irq();
> >>}
> >>  
> >> -  if (user->seq < log_first_seq) {
> >> -  /* our last seen message is gone, return error and reset */
> >> -  user->idx = log_first_idx;
> >> -  user->seq = log_first_seq;
> >> +  if (user->seq < r->info->seq) {
> >> +  /* the expected message is gone, return error and reset */
> >> +  user->seq = r->info->seq;
> >>ret = -EPIPE;
> >>logbuf_unlock_irq();
> >>goto out;
> >>}
> >
> > Sorry, why doesn't this do something like
> >
> > if (user->seq < prb_first_seq(prb)) {
> > /* the expected message is gone, return error and reset */
> > user->seq = prb_first_seq(prb);
> > ret = -EPIPE;
> > ...
> > }
> 
> Here prb_read_valid() was successful, so a record _was_ read. The
> kerneldoc for the prb_read_valid() says:

Hmm, yeah. That's true.

OK, something weird...

I ran some random printk-pressure test (mostly printks from IRQs;
+ some NMI printk-s, but they are routed through nmi printk-safe
buffers; + some limited number of printk-safe printk-s, routed
via printk-safe buffer (so, once again, IRQ); + user-space
journalctl -f syslog reader), and after the test 'cat /dev/kmsg'
is terminally broken

[..]
cat /dev/kmsg
cat: /dev/kmsg: Broken pipe
cat /dev/kmsg
cat: /dev/kmsg: Broken pipe
cat /dev/kmsg
cat: /dev/kmsg: Broken pipe
cat /dev/kmsg
cat: /dev/kmsg: Broken pipe
cat /dev/kmsg
cat: /dev/kmsg: Broken pipe
cat /dev/kmsg
cat: /dev/kmsg: Broken pipe
cat /dev/kmsg
cat: /dev/kmsg: Broken pipe
cat /dev/kmsg
cat: /dev/kmsg: Broken pipe
cat /dev/kmsg
cat: /dev/kmsg: Broken pipe
cat /dev/kmsg
cat: /dev/kmsg: Broken pipe
cat /dev/kmsg
cat: /dev/kmsg: Broken pipe
cat /dev/kmsg
cat: /dev/kmsg: Broken pipe
cat /dev/kmsg
cat: /dev/kmsg: Broken pipe
cat /dev/kmsg
cat: /dev/kmsg: Broken pipe
cat /dev/kmsg
cat: /dev/kmsg: Broken pipe
cat /dev/kmsg
cat: /dev/kmsg: Broken pipe
cat /dev/kmsg
cat: /dev/kmsg: Broken pipe
cat /dev/kmsg
cat: /dev/kmsg: Broken pipe
cat /dev/kmsg
cat: /dev/kmsg: Broken pipe
cat /dev/kmsg
cat: /dev/kmsg: Broken pipe
cat /dev/kmsg
cat: /dev/kmsg: Broken pipe
cat /dev/kmsg
cat: /dev/kmsg: Broken pipe
cat /dev/kmsg
cat: /dev/kmsg: Broken pipe
cat /dev/kmsg
cat: /dev/kmsg: Broken pipe
cat /dev/kmsg
cat: /dev/kmsg: Broken pipe
cat /dev/kmsg
cat: /dev/kmsg: Broken pipe
cat /dev/kmsg
cat: /dev/kmsg: Broken pipe
cat /dev/kmsg
cat: /dev/kmsg: Broken pipe
cat /dev/kmsg
cat: /dev/kmsg: Broken pipe
cat /dev/kmsg
cat: /dev/kmsg: Broken pipe
cat /dev/kmsg
cat: /dev/kmsg: Broken pipe
cat /dev/kmsg
cat: /dev/kmsg: Broken pipe
cat /dev/kmsg
cat: /dev/kmsg: Broken pipe
cat /dev/kmsg
cat: /dev/kmsg: Broken pipe
cat /dev/kmsg
cat: /dev/kmsg: Broken pipe
cat /dev/kmsg
cat: /dev/kmsg: Broken pipe
[..]

dmesg works. Reading from /dev/kmsg - doesn't; it did work, however,
before the test.

So I printed seq numbers from devksmg read to a seq buffer and dumped
it via procfs, just seq numbers before we adjust user->seq (set to
r->seq) and after

+   offt += snprintf(BUF + offt,
+   sizeof(BUF) - offt,
+   "%s: devkmsg_read() error %llu %llu 
%llu\n",
+   current->comm,
+   user->seq,
+   r->info->seq,
+   prb_first_seq(prb));


...
systemd-journal: devkmsg_read() error 1979235 1979236 1979236
systemd-journal: corrected seq 1979236 1979236
systemd-journal: devkmsg_read() error 1979237 1979243 1979243
systemd-journal: corrected seq 1979243 1979243
systemd-journal: devkmsg_read() error 1979244 1979250 1979250
systemd-journal: corrected seq 1979250 1979250
systemd-journal: devkmsg_read() error 197925

Re: [PATCH 2/2] printk: use the lockless ringbuffer

2020-02-13 Thread Sergey Senozhatsky

On (20/01/28 17:25), John Ogness wrote:
[..]
> - while (user->seq == log_next_seq) {
> + if (!prb_read_valid(prb, user->seq, r)) {
>   if (file->f_flags & O_NONBLOCK) {
>   ret = -EAGAIN;
>   logbuf_unlock_irq();
> @@ -890,30 +758,26 @@ static ssize_t devkmsg_read(struct file *file, char 
> __user *buf,
>  
>   logbuf_unlock_irq();
>   ret = wait_event_interruptible(log_wait,
> -user->seq != log_next_seq);
> + prb_read_valid(prb, user->seq, r));
>   if (ret)
>   goto out;
>   logbuf_lock_irq();
>   }
>  
> - if (user->seq < log_first_seq) {
> - /* our last seen message is gone, return error and reset */
> - user->idx = log_first_idx;
> - user->seq = log_first_seq;
> + if (user->seq < r->info->seq) {
> + /* the expected message is gone, return error and reset */
> + user->seq = r->info->seq;
>   ret = -EPIPE;
>   logbuf_unlock_irq();
>   goto out;
>   }

Sorry, why doesn't this do something like

if (user->seq < prb_first_seq(prb)) {
/* the expected message is gone, return error and reset */
user->seq = prb_first_seq(prb);
ret = -EPIPE;
...
}

?

-ss

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

Re: [PATCH 0/2] printk: replace ringbuffer

2020-02-05 Thread Sergey Senozhatsky

On (20/02/05 16:48), John Ogness wrote:
> On 2020-02-05, Sergey Senozhatsky  wrote:
> > 3BUG: KASAN: wild-memory-access in copy_data+0x129/0x220>
> > 3Write of size 4 at addr 5a5a5a5a5a5a5a5a by task cat/474>
> 
> The problem was due to an uninitialized pointer.
> 
> Very recently the ringbuffer API was expanded so that it could
> optionally count lines in a record. This made it possible for me to
> implement record_print_text_inline(), which can do all the kmsg_dump
> multi-line madness without requiring a temporary buffer. Rather than
> passing an extra argument around for the optional line count, I added
> the text_line_count pointer to the printk_record struct. And since line
> counting is rarely needed, it is only performed if text_line_count is
> non-NULL.
> 
> I oversaw that devkmsg_open() setup a printk_record and so I did not see
> to add the extra NULL initialization of text_line_count. There should be
> be an initializer function/macro to avoid this danger.
> 
> John Ogness
> 
> The quick fixup:
> 
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index d0d24ee1d1f4..5ad67ff60cd9 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -883,6 +883,7 @@ static int devkmsg_open(struct inode *inode, struct file 
> *file)
>   user->record.text_buf_size = sizeof(user->text_buf);
>   user->record.dict_buf = >dict_buf[0];
>   user->record.dict_buf_size = sizeof(user->dict_buf);
> + user->record.text_line_count = NULL;
>  
>   logbuf_lock_irq();
>   user->seq = prb_first_seq(prb);

Yes. That should do. It seems that /dev/kmsg reads/writes happen very early in
my system and all the backtraces I saw were from completely unrelated paths -
either a NULL deref at sys_clone()->do_fork()->copy_creds()->prepare_creads(),
or general protection fault in 
sys_keyctl()->join_session_keyring()->prepare_creds(),
or some weird crashes in ext4. And so on.

I see some more unexplainable lockups on one on my test boards, but I
can't provide more details at this time. Might not be related to the
patch set. Need to investigate further.

-ss

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

Re: [PATCH 0/2] printk: replace ringbuffer

2020-02-05 Thread Sergey Senozhatsky

On (20/02/05 10:00), John Ogness wrote:
> On 2020-02-05, Sergey Senozhatsky  wrote:
> >>>> So there is a General protection fault. That's the type of a
> >>>> problem that kills the boot for me as well (different backtrace,
> >>>> tho).
> >>> 
> >>> Do you have CONFIG_RELOCATABLE and CONFIG_RANDOMIZE_BASE (KASLR)
> >>> enabled?
> >> 
> >> Yes. These two options are enabled.
> >> 
> >> CONFIG_RELOCATABLE=y
> >> CONFIG_RANDOMIZE_BASE=y
> >
> > So KASLR kills the boot for me. So does KASAN.
> 
> Sergey, thanks for looking into this already!
> 

So I hacked the system a bit.

3BUG: KASAN: wild-memory-access in copy_data+0x129/0x220>
3Write of size 4 at addr 5a5a5a5a5a5a5a5a by task cat/474>
Call Trace:>
 dump_stack+0x76/0xa0>
 ? copy_data+0x129/0x220>
 __kasan_report.cold+0x5/0x3b>
 ? get_page_from_freelist+0x1224/0x1490>
 ? copy_data+0x129/0x220>
 copy_data+0x129/0x220>
 _prb_read_valid+0x1a0/0x330>
 ? prb_first_seq+0xe0/0xe0>
 ? __might_sleep+0x2f/0xd0>
 ? __zone_watermark_ok+0x180/0x180>
 ? ___might_sleep+0xbe/0xe0>
 prb_read_valid+0x4f/0x60>
 ? _prb_read_valid+0x330/0x330>
 devkmsg_read+0x12e/0x3d0>
 ? __mod_node_page_state+0x1a/0xa0>
 ? info_print_ext_header.constprop.0+0x120/0x120>
 ? __lru_cache_add+0x16c/0x190>
 ? __handle_mm_fault+0x1097/0x1f60>
 vfs_read+0xdc/0x200>
 ksys_read+0xa0/0x130>
 ? kernel_write+0xb0/0xb0>
 ? up_read+0x56/0x130>
 do_syscall_64+0xa0/0x520>
 ? syscall_return_slowpath+0x210/0x210>
 ? do_page_fault+0x399/0x4fa>
 entry_SYSCALL_64_after_hwframe+0x44/0xa9>
RIP: 0033:0x7ff5f39813f2>
Code: c0 e9 c2 fe ff ff 50 48 8d 3d 9a 0d 0a 00 e8 95 ed 01 00 0f 1f 44 00 00 
f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 
c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24>
RSP: 002b:7ffc47b3ee58 EFLAGS: 024>
c ORIG_RAX: >
RAX: ffda RBX: 0002 RCX: 7ff5f39813f2>
RDX: 0002 RSI: 7ff5f3588000 RDI: 0003>
RBP: 7ff5f3588000 R08: 7ff5f3587010 R09: >
R10: 0022 R11: 0246 R12: 55f9c8a81c00>
R13: 0003 R14: 0002 R15: 0002>

-ss

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

Re: [PATCH 0/2] printk: replace ringbuffer

2020-02-05 Thread Sergey Senozhatsky

On (20/02/05 10:00), John Ogness wrote:
> On 2020-02-05, Sergey Senozhatsky  wrote:
> >>>> So there is a General protection fault. That's the type of a
> >>>> problem that kills the boot for me as well (different backtrace,
> >>>> tho).
> >>> 
> >>> Do you have CONFIG_RELOCATABLE and CONFIG_RANDOMIZE_BASE (KASLR)
> >>> enabled?
> >> 
> >> Yes. These two options are enabled.
> >> 
> >> CONFIG_RELOCATABLE=y
> >> CONFIG_RANDOMIZE_BASE=y
> >
> > So KASLR kills the boot for me. So does KASAN.
> 
> Sergey, thanks for looking into this already!

Hey, no prob! I can't see how and why that would be KASLR related,
and most likely it's not. Probably we just hit some fault sooner
with it enabled.

So far it seems that reads from /dev/kmsg are causing problems
on my laptop, but it's a bit hard to debug.

Nothing printk-related in my boot params.

-ss

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

Re: [PATCH 0/2] printk: replace ringbuffer

2020-02-04 Thread Sergey Senozhatsky

On (20/02/05 13:38), lijiang wrote:
> > On (20/02/05 13:48), Sergey Senozhatsky wrote:
> >> On (20/02/05 12:25), lijiang wrote:

[..]

> >>
> >> So there is a General protection fault. That's the type of a problem that
> >> kills the boot for me as well (different backtrace, tho).
> > 
> > Do you have CONFIG_RELOCATABLE and CONFIG_RANDOMIZE_BASE (KASLR) enabled?
> > 
> 
> Yes. These two options are enabled.
> 
> CONFIG_RELOCATABLE=y
> CONFIG_RANDOMIZE_BASE=y

So KASLR kills the boot for me. So does KASAN.

John, do you see any of these problems on your test machine?

-ss

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

Re: [PATCH 0/2] printk: replace ringbuffer

2020-02-04 Thread Sergey Senozhatsky

On (20/02/05 13:48), Sergey Senozhatsky wrote:
> On (20/02/05 12:25), lijiang wrote:
> [..]
> > [   42.111004] Kernel Offset: 0x1f00 from 0x8100 
> > (relocation range: 0x8000-0xbfff)
> > [   42.111005] general protection fault:  [#1] SMP PTI
> > [   42.111005] CPU: 15 PID: 1395 Comm: systemd-journal Not tainted 
> > 5.5.0-rc7+ #4
> > [   42.111005] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS 
> > SE5C610.86B.01.01.6024.071720181717 07/17/2018
> > [   42.111006] RIP: 0010:copy_data+0xf2/0x1e0
> > [   42.111006] Code: eb 08 49 83 c4 08 0f 84 8e 00 00 00 4c 89 74 24 08 4c 
> > 89 cd 41 89 d6 44 89 44 24 04 49 39 db 0f 87 c6 00 00 00 4d 85 c9 74 43 
> > <41> c7 01 00 00 00 00 48 85 db 74 37 4c 89 e7 48 89 da 41 bf 01 00
> > [   42.111007] RSP: 0018:bbe207a7bd80 EFLAGS: 00010002
> > [   42.111007] RAX: a075d44ca000 RBX: 00a8 RCX: 
> > fff000b0
> > [   42.111008] RDX: 00a8 RSI: 0f01 RDI: 
> > a1456e00
> > [   42.111008] RBP: 0801364600307073 R08: 2000 R09: 
> > 0801364600307073
> > [   42.111008] R10: fff0 R11: 00a8 R12: 
> > a1e98330
> > [   42.111009] R13: d7efbe00 R14: 00a8 R15: 
> > c000
> > [   42.111009] FS:  7f7c5642a980() GS:a075df5c() 
> > knlGS:
> > [   42.111010] CS:  0010 DS:  ES:  CR0: 80050033
> > [   42.111010] CR2: 7ffe95f4c4c0 CR3: 00084fbfc004 CR4: 
> > 003606e0
> > [   42.111011] DR0:  DR1:  DR2: 
> > 
> > [   42.111011] DR3:  DR6: fffe0ff0 DR7: 
> > 0400
> > [   42.111012] Call Trace:
> > [   42.111012]  _prb_read_valid+0xd8/0x190
> > [   42.111012]  prb_read_valid+0x15/0x20
> > [   42.111013]  devkmsg_read+0x9d/0x2a0
> > [   42.111013]  vfs_read+0x91/0x140
> > [   42.111013]  ksys_read+0x59/0xd0
> > [   42.111014]  do_syscall_64+0x55/0x1b0
> > [   42.111014]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > [   42.111014] RIP: 0033:0x7f7c55740b62
> > [   42.111015] Code: 94 20 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b6 0f 
> > 1f 80 00 00 00 00 f3 0f 1e fa 8b 05 e6 d8 20 00 85 c0 75 12 31 c0 0f 05 
> > <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 41 54 49 89 d4 55 48 89
> > [   42.111015] RSP: 002b:7ffe95f4c4a8 EFLAGS: 0246 ORIG_RAX: 
> > 
> > [   42.111016] RAX: ffda RBX: 7ffe95f4e500 RCX: 
> > 7f7c55740b62
> > [   42.111016] RDX: 2000 RSI: 7ffe95f4c4b0 RDI: 
> > 0008
> > [   42.111017] RBP:  R08: 0100 R09: 
> > 0003
> > [   42.111017] R10: 0100 R11: 0246 R12: 
> > 7ffe95f4c4b0
> 
> So there is a General protection fault. That's the type of a problem that
> kills the boot for me as well (different backtrace, tho).

Do you have CONFIG_RELOCATABLE and CONFIG_RANDOMIZE_BASE (KASLR) enabled?

-ss

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

Re: [PATCH 0/2] printk: replace ringbuffer

2020-02-04 Thread Sergey Senozhatsky

On (20/02/05 12:25), lijiang wrote:
[..]
> [   42.111004] Kernel Offset: 0x1f00 from 0x8100 (relocation 
> range: 0x8000-0xbfff)
> [   42.111005] general protection fault:  [#1] SMP PTI
> [   42.111005] CPU: 15 PID: 1395 Comm: systemd-journal Not tainted 5.5.0-rc7+ 
> #4
> [   42.111005] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS 
> SE5C610.86B.01.01.6024.071720181717 07/17/2018
> [   42.111006] RIP: 0010:copy_data+0xf2/0x1e0
> [   42.111006] Code: eb 08 49 83 c4 08 0f 84 8e 00 00 00 4c 89 74 24 08 4c 89 
> cd 41 89 d6 44 89 44 24 04 49 39 db 0f 87 c6 00 00 00 4d 85 c9 74 43 <41> c7 
> 01 00 00 00 00 48 85 db 74 37 4c 89 e7 48 89 da 41 bf 01 00
> [   42.111007] RSP: 0018:bbe207a7bd80 EFLAGS: 00010002
> [   42.111007] RAX: a075d44ca000 RBX: 00a8 RCX: 
> fff000b0
> [   42.111008] RDX: 00a8 RSI: 0f01 RDI: 
> a1456e00
> [   42.111008] RBP: 0801364600307073 R08: 2000 R09: 
> 0801364600307073
> [   42.111008] R10: fff0 R11: 00a8 R12: 
> a1e98330
> [   42.111009] R13: d7efbe00 R14: 00a8 R15: 
> c000
> [   42.111009] FS:  7f7c5642a980() GS:a075df5c() 
> knlGS:
> [   42.111010] CS:  0010 DS:  ES:  CR0: 80050033
> [   42.111010] CR2: 7ffe95f4c4c0 CR3: 00084fbfc004 CR4: 
> 003606e0
> [   42.111011] DR0:  DR1:  DR2: 
> 
> [   42.111011] DR3:  DR6: fffe0ff0 DR7: 
> 0400
> [   42.111012] Call Trace:
> [   42.111012]  _prb_read_valid+0xd8/0x190
> [   42.111012]  prb_read_valid+0x15/0x20
> [   42.111013]  devkmsg_read+0x9d/0x2a0
> [   42.111013]  vfs_read+0x91/0x140
> [   42.111013]  ksys_read+0x59/0xd0
> [   42.111014]  do_syscall_64+0x55/0x1b0
> [   42.111014]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [   42.111014] RIP: 0033:0x7f7c55740b62
> [   42.111015] Code: 94 20 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b6 0f 1f 
> 80 00 00 00 00 f3 0f 1e fa 8b 05 e6 d8 20 00 85 c0 75 12 31 c0 0f 05 <48> 3d 
> 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 41 54 49 89 d4 55 48 89
> [   42.111015] RSP: 002b:7ffe95f4c4a8 EFLAGS: 0246 ORIG_RAX: 
> 
> [   42.111016] RAX: ffda RBX: 7ffe95f4e500 RCX: 
> 7f7c55740b62
> [   42.111016] RDX: 2000 RSI: 7ffe95f4c4b0 RDI: 
> 0008
> [   42.111017] RBP:  R08: 0100 R09: 
> 0003
> [   42.111017] R10: 0100 R11: 0246 R12: 
> 7ffe95f4c4b0

So there is a General protection fault. That's the type of a problem that
kills the boot for me as well (different backtrace, tho).

-ss

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

Re: [PATCH 0/2] printk: replace ringbuffer

2020-02-04 Thread Sergey Senozhatsky

On (20/02/05 12:25), lijiang wrote:
> Hi, John Ogness
> 
> Thank you for improving the patch series and making great efforts.
> 
> I'm not sure if I missed anything else. Or are there any other related 
> patches to be applied?
> 
> After applying this patch series, NMI watchdog detected a hard lockup, which 
> caused that kernel can not boot, please refer to
> the following call trace. And I put the complete kernel log in the attachment.

I'm also having some problems running the code on my laptop. But may be
I did something wrong while applying patch 0002 (which didn't apply
cleanly). Will look more.

-ss

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

Re: [RFC PATCH v5 1/3] printk-rb: new printk ringbuffer implementation (writer)

2019-12-09 Thread Sergey Senozhatsky

On (19/11/28 02:58), John Ogness wrote:
> + * Sample reader code::
> + *
> + *   struct printk_info info;
> + *   char text_buf[32];
> + *   char dict_buf[32];
> + *   u64 next_seq = 0;
> + *   struct printk_record r = {
> + *   .info   = ,
> + *   .text_buf   = _buf[0],
> + *   .dict_buf   = _buf[0],
> + *   .text_buf_size  = sizeof(text_buf),
> + *   .dict_buf_size  = sizeof(dict_buf),
> + *   };
> + *
> + *   while (prb_read_valid(, next_seq, )) {
> + *   if (info.seq != next_seq)
> + *   pr_warn("lost %llu records\n", info.seq - next_seq);
> + *
> + *   if (info.text_len > r.text_buf_size) {
> + *   pr_warn("record %llu text truncated\n", info.seq);
> + *   text_buf[sizeof(text_buf) - 1] = 0;
> + *   }
> + *
> + *   if (info.dict_len > r.dict_buf_size) {
> + *   pr_warn("record %llu dict truncated\n", info.seq);
> + *   dict_buf[sizeof(dict_buf) - 1] = 0;
> + *   }
> + *
> + *   pr_info("%llu: %llu: %s;%s\n", info.seq, info.ts_nsec,
> + *   _buf[0], info.dict_len ? _buf[0] : "");
> + *
> + *   next_seq = info.seq + 1;
> + *   }
> + */

Will this loop ever end? :)

pr_info() adds data to ringbuffer, which prb_read_valid() reads, so pr_info()
can add more data, which prb_read_valid() will read, so pr_info()...

-ss

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

Re: [RFC PATCH v5 1/3] printk-rb: new printk ringbuffer implementation (writer)

2019-12-09 Thread Sergey Senozhatsky

On (19/12/02 16:48), Petr Mladek wrote:
> Hi,
> 
> I have seen few prelimitary versions before this public one.
> I am either happy with it or blind to see new problems[*].

I guess I'm happy with it.

-ss

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

Re: [RFC PATCH v5 2/3] printk-rb: new printk ringbuffer implementation (reader)

2019-12-09 Thread Sergey Senozhatsky

On (19/12/09 17:43), Sergey Senozhatsky wrote:
> > +static int desc_read_committed(struct prb_desc_ring *desc_ring, u32 id,
> > +  u64 seq, struct prb_desc *desc)
> > +{
> > +   enum desc_state d_state;
> > +
> > +   d_state = desc_read(desc_ring, id, desc);
> > +   if (desc->info.seq != seq)
> > +   return -EINVAL;
> > +   else if (d_state == desc_reusable)
> > +   return -ENOENT;
> > +   else if (d_state != desc_committed)
> > +   return -EINVAL;
> > +
> > +   return 0;
> > +}
> > +
> > +/*
> > + * Copy the ringbuffer data from the record with @seq to the provided
> > + * @r buffer. On success, 0 is returned.
> > + *
> > + * See desc_read_committed() for error return values.
> > + */
> > +static int prb_read(struct printk_ringbuffer *rb, u64 seq,
> > +   struct printk_record *r)
> > +{
> > +   struct prb_desc_ring *desc_ring = >desc_ring;
> > +   struct prb_desc *rdesc = to_desc(desc_ring, seq);
> > +   atomic_t *state_var = >state_var;
> > +   struct prb_desc desc;
> > +   int err;
> > +   u32 id;
> > +
> > +   /* Get a reliable local copy of the descriptor and check validity. */
> > +   id = DESC_ID(atomic_read(state_var));
> > +   err = desc_read_committed(desc_ring, id, seq, );
> > +   if (err)
> > +   return err;
> > +
> > +   /* If requested, copy meta data. */
> > +   if (r->info)
> > +   memcpy(r->info, , sizeof(*(r->info)));
> 
> I wonder if those WARN_ON-s will trigger false positive sometimes.
> 
> A theoretical case.
> 
> What if reader gets preempted/interrupted in the middle of
> desc_read_committed()->desc_read()->memcpy(). The context which interrupts
> the reader recycles the descriptor and pushes new data. Suppose that
> reader was interrupted right after it copied ->info.seq and ->info.text_len.
> So the first desc_read_committed() will pass - we have matching ->seq
> and committed state. copy_data(), however, most likely, will generate
> WARNs. The final desc_read_committed() will notice that local copy
> of desc was in non-consistent state and everything is fine, but we have
> WARNs in the log buffer now.

Hmm. No, that won't happen. We should get desc_miss first, and then -EINVAL.

-ss

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

Re: [RFC PATCH v5 3/3] printk-rb: add test module

2019-12-09 Thread Sergey Senozhatsky

On (19/11/28 02:58), John Ogness wrote:
> + * This is a test module that starts "num_online_cpus()" writer threads
> + * that each write data of varying length. They do this as fast as
> + * they can.

Can we also add some IRQ (hard or soft) writers and (if possible) some NMI
writers?

-ss

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

Re: [RFC PATCH v5 2/3] printk-rb: new printk ringbuffer implementation (reader)

2019-12-09 Thread Sergey Senozhatsky

On (19/11/28 02:58), John Ogness wrote:
> +/* Given @blk_lpos, copy an expected @len of data into the provided buffer. 
> */
> +static bool copy_data(struct prb_data_ring *data_ring,
> +   struct prb_data_blk_lpos *blk_lpos, u16 len, char *buf,
> +   unsigned int buf_size)
> +{
> + unsigned long data_size;
> + char *data;
> +
> + /* Caller might not want the data. */
> + if (!buf || !buf_size)
> + return true;
> +
> + data = get_data(data_ring, blk_lpos, _size);
> + if (!data)
> + return false;
> +
> + /* Actual cannot be less than expected. */
> + if (WARN_ON_ONCE(data_size < len))
> + return false;
> +
> + data_size = min_t(u16, buf_size, len);
> +
> + if (!WARN_ON_ONCE(!data_size))
> + memcpy([0], data, data_size);
> + return true;
> +}
> +
> +/*
> + * Read the record @id and verify that it is committed and has the sequence
> + * number @seq.
> + *
> + * Error return values:
> + * -EINVAL: The record @seq does not exist.
> + * -ENOENT: The record @seq exists, but its data is not available. This is a
> + *  valid record, so readers should continue with the next seq.
> + */
> +static int desc_read_committed(struct prb_desc_ring *desc_ring, u32 id,
> +u64 seq, struct prb_desc *desc)
> +{
> + enum desc_state d_state;
> +
> + d_state = desc_read(desc_ring, id, desc);
> + if (desc->info.seq != seq)
> + return -EINVAL;
> + else if (d_state == desc_reusable)
> + return -ENOENT;
> + else if (d_state != desc_committed)
> + return -EINVAL;
> +
> + return 0;
> +}
> +
> +/*
> + * Copy the ringbuffer data from the record with @seq to the provided
> + * @r buffer. On success, 0 is returned.
> + *
> + * See desc_read_committed() for error return values.
> + */
> +static int prb_read(struct printk_ringbuffer *rb, u64 seq,
> + struct printk_record *r)
> +{
> + struct prb_desc_ring *desc_ring = >desc_ring;
> + struct prb_desc *rdesc = to_desc(desc_ring, seq);
> + atomic_t *state_var = >state_var;
> + struct prb_desc desc;
> + int err;
> + u32 id;
> +
> + /* Get a reliable local copy of the descriptor and check validity. */
> + id = DESC_ID(atomic_read(state_var));
> + err = desc_read_committed(desc_ring, id, seq, );
> + if (err)
> + return err;
> +
> + /* If requested, copy meta data. */
> + if (r->info)
> + memcpy(r->info, , sizeof(*(r->info)));

I wonder if those WARN_ON-s will trigger false positive sometimes.

A theoretical case.

What if reader gets preempted/interrupted in the middle of
desc_read_committed()->desc_read()->memcpy(). The context which interrupts
the reader recycles the descriptor and pushes new data. Suppose that
reader was interrupted right after it copied ->info.seq and ->info.text_len.
So the first desc_read_committed() will pass - we have matching ->seq
and committed state. copy_data(), however, most likely, will generate
WARNs. The final desc_read_committed() will notice that local copy
of desc was in non-consistent state and everything is fine, but we have
WARNs in the log buffer now.

-ss

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

Re: [RFC PATCH v5 1/3] printk-rb: new printk ringbuffer implementation (writer)

2019-12-08 Thread Sergey Senozhatsky

On (19/11/28 02:58), John Ogness wrote:
[..]
> +
> +#define _DATA_SIZE(sz_bits)  (1UL << (sz_bits))
> +#define _DESCS_COUNT(ct_bits)(1U << (ct_bits))
> +#define DESC_SV_BITS (sizeof(int) * 8)
> +#define DESC_COMMITTED_MASK  (1U << (DESC_SV_BITS - 1))

What does SV state for? State Value?

-ss

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

Re: [RFC PATCH v5 1/3] printk-rb: new printk ringbuffer implementation (writer)

2019-12-02 Thread Sergey Senozhatsky

On (19/12/02 17:37), John Ogness wrote:
> On 2019-12-02, Petr Mladek  wrote:
> >> > +/* Reserve a new descriptor, invalidating the oldest if necessary. */
> >> > +static bool desc_reserve(struct printk_ringbuffer *rb, u32 *id_out)
> >> > +{
> >> > +struct prb_desc_ring *desc_ring = >desc_ring;
> >> > +struct prb_desc *desc;
> >> > +u32 id_prev_wrap;
> >> > +u32 head_id;
> >> > +u32 id;
> >> > +
> >> > +head_id = atomic_read(_ring->head_id);
> >> > +
> >> > +do {
> >> > +desc = to_desc(desc_ring, head_id);
> >> > +
> >> > +id = DESC_ID(head_id + 1);
> >> > +id_prev_wrap = DESC_ID_PREV_WRAP(desc_ring, id);
> >> > +
> >> > +if (id_prev_wrap == atomic_read(_ring->tail_id)) {
> >> > +if (!desc_push_tail(rb, id_prev_wrap))
> >> > +return false;
> >> > +}
> >> > +} while (!atomic_try_cmpxchg(_ring->head_id, _id, 
> >> > id));
> >> 
> >> Hmm, in theory, ABA problem might cause that we successfully
> >> move desc_ring->head_id when tail has not been pushed yet.
> >> 
> >> As a result we would never call desc_push_tail() until
> >> it overflows again.
> >> 
> >> I am not sure if we need to take care of it. The code is called with
> >> interrupts disabled. IMHO, only NMI could cause ABA problem
> >> in reality. But the game (debugging) is lost anyway when NMI ovewrites
> >> the buffer several times.
> >
> > BTW: If I am counting correctly. The ABA problem would happen when
> > exactly 2^30 (1G) messages is written in the mean time.
> 
> All the ringbuffer code assumes that the use of index numbers handles
> the ABA problem (i.e. there must not be 1 billion printk's within an
> NMI). If we want to support 1 billion+ printk's within an NMI, then
> perhaps the index number should be increased. For 64-bit systems it
> would be no problem to go to 62 bits. For 32-bit systems, I don't know
> how well the 64-bit atomic operations are supported.

ftrace dumps from NMI (DUMP_ALL type ftrace_dump_on_oops on a $BIG
machine)? 1G seems large enough, but who knows.

-ss

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

Re: [PATCH V2] print kdump kernel loaded status in stack dump

2018-01-30 Thread Sergey Senozhatsky

On (01/27/18 12:11), Dave Young wrote:
> It is useful to print kdump kernel loaded status in dump_stack() 
> especially when panic happens so that we can differenciate 
> kdump kernel early hang and a normal panic in a bug report.
> 
> Signed-off-by: Dave Young <dyo...@redhat.com>

Looks OK to me.

Reviewed-by: Sergey Senozhatsky <sergey.senozhat...@gmail.com>


I agree with Steven, would be better to move the whole thing
to lib/dump_stack.c

-ss

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

Re: [PATCH] print kdump kernel loaded status in stack dump

2018-01-19 Thread Sergey Senozhatsky

On (01/19/18 16:42), Dave Young wrote:
[..]
> > [I'm not entirely sure I see why do we have printk_delay() in
> >vprintk_emit()... I mean I probably can see some reasoning behind
> >it, but at the same it makes sense to slow down console_unlock()
> >as well]
> 
> Looks like I am the guy who added the code :)

LOL :)

> Actually no special reason, just did not thinking about the performance
> issue at all at that time..

it's quite reasonable to have it in vprintk_emit(), actually.

the thing is that it limits the rate at which new messages appear in
the kernel log buffer, which does not necessarily correspond to the rate
at which new messages appear on the consoles.

printk has a "direct path"  printk -> console_unlock() -> consoles,
and "indirect path" - when it fails to acquire console semaphore,
because it's locked by someone else. that someone else, a console_sem
owner, might be scheduled out under console_sem; all printks in the
meantime will just log_store() messages [after printk_delay()].
once the console_sem owner will be back to running it will resume
console_unlock() printing loop and print out all pending messages
immediately [modulo preemption]. so there are ways for messages to
bypass printk_delay() and appear on the consoles with no visible
delay.

additionally printk_delay() does touch_nmi_watchdog() so we,
technically, are fine with moving it to console_unlock().

-ss

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

Re: [PATCH] print kdump kernel loaded status in stack dump

2018-01-19 Thread Sergey Senozhatsky

On (01/19/18 16:16), Dave Young wrote:
> I thought about adding an option to improve printk_delay so it can
> delay every n lines, eg. 25 rows. Maybe that idea works for this
> issue?

/* sent the message too soon */

printk_delay() has no effect there. it limits only printk()->vprintk_emit()
and we don't have any new printk()-s once the system has panic-ed. it's
console_unlock() that prints out the oops.

-ss

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

Re: [PATCH] print kdump kernel loaded status in stack dump

2018-01-19 Thread Sergey Senozhatsky

On (01/19/18 16:16), Dave Young wrote:
> On 01/19/18 at 02:45pm, Sergey Senozhatsky wrote:
> > On (01/18/18 10:02), Andi Kleen wrote:
> > > Dave Young <dyo...@redhat.com> writes:
> > > > printk("%sHardware name: %s\n",
> > > >log_lvl, dump_stack_arch_desc_str);
> > > > +   if (kexec_crash_loaded())
> > > > +   printk("%skdump kernel loaded\n", log_lvl);
> > > 
> > > Oops/warnings are getting longer and longer, often scrolling away
> > > from the screen, and if the kernel crashes backscroll does not work
> > > anymore, so precious information is lost.
> > 
> > true. I even ended up having a console_reflush_on_panic() function. it
> > simply re-prints with a delay [so I can at least read the oops] logbuf
> > entries every once in a while, staring with the first oops_in_progress
> > record.
> > 
> 
> If too many messages printed on screen, then the next flush will 
> still scroll up. 

right. but it re-prints Oops with a new console_unlock_delay() delay
which gives me enough time to either read it as many times as I want,
or take a picture, etc. it's not as fast as the normal oops print out.

[I'm not entirely sure I see why do we have printk_delay() in
   vprintk_emit()... I mean I probably can see some reasoning behind
   it, but at the same it makes sense to slow down console_unlock()
   as well]

-ss

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

Re: [PATCH] print kdump kernel loaded status in stack dump

2018-01-18 Thread Sergey Senozhatsky

On (01/18/18 10:02), Andi Kleen wrote:
> Dave Young  writes:
> > printk("%sHardware name: %s\n",
> >log_lvl, dump_stack_arch_desc_str);
> > +   if (kexec_crash_loaded())
> > +   printk("%skdump kernel loaded\n", log_lvl);
> 
> Oops/warnings are getting longer and longer, often scrolling away
> from the screen, and if the kernel crashes backscroll does not work
> anymore, so precious information is lost.

true. I even ended up having a console_reflush_on_panic() function. it
simply re-prints with a delay [so I can at least read the oops] logbuf
entries every once in a while, staring with the first oops_in_progress
record.

something like below [it's completely hacked up, but at least gives
an idea]

---

 include/linux/console.h |  1 +
 kernel/panic.c  |  7 +++
 kernel/printk/printk.c  | 39 ++-
 3 files changed, 46 insertions(+), 1 deletion(-)

diff --git a/include/linux/console.h b/include/linux/console.h
index b8920a031a3e..502e3f539448 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -168,6 +168,7 @@ extern void console_unlock(void);
 extern void console_conditional_schedule(void);
 extern void console_unblank(void);
 extern void console_flush_on_panic(void);
+extern void console_reflush_on_panic(void);
 extern struct tty_driver *console_device(int *);
 extern void console_stop(struct console *);
 extern void console_start(struct console *);
diff --git a/kernel/panic.c b/kernel/panic.c
index 2cfef408fec9..39cd59bbfaab 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -137,6 +137,7 @@ void panic(const char *fmt, ...)
va_list args;
long i, i_next = 0;
int state = 0;
+   int reflush_tick = 0;
int old_cpu, this_cpu;
bool _crash_kexec_post_notifiers = crash_kexec_post_notifiers;
 
@@ -298,6 +299,12 @@ void panic(const char *fmt, ...)
i_next = i + 3600 / PANIC_BLINK_SPD;
}
mdelay(PANIC_TIMER_STEP);
+
+   reflush_tick++;
+   if (reflush_tick == 32) { /* don't reflush too often */
+   console_reflush_on_panic();
+   reflush_tick = 0;
+   }
}
 }
 
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 9cb943c90d98..ef3f28d4c741 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -426,6 +426,10 @@ static u32 log_next_idx;
 static u64 console_seq;
 static u32 console_idx;
 
+/* index and sequence number of the record which started the oops print out */
+static u64 log_oops_seq;
+static u32 log_oops_idx;
+
 /* the next printk record to read after the last 'clear' command */
 static u64 clear_seq;
 static u32 clear_idx;
@@ -1736,6 +1740,15 @@ static inline void printk_delay(void)
}
 }
 
+/*
+ * Why do we have printk_delay() in vprintk_emit()
+ * and not in console_unlock()?
+ */
+static inline void console_unlock_delay(void)
+{
+   printk_delay();
+}
+
 /*
  * Continuation lines are buffered, and not committed to the record buffer
  * until the line is complete, or a race forces it. The line fragments
@@ -1849,6 +1862,7 @@ asmlinkage int vprintk_emit(int facility, int level,
 
/* This stops the holder of console_sem just where we want him */
logbuf_lock_irqsave(flags);
+
/*
 * The printf needs to come first; we need the syslog
 * prefix which might be passed-in as a parameter.
@@ -1890,7 +1904,11 @@ asmlinkage int vprintk_emit(int facility, int level,
lflags |= LOG_PREFIX|LOG_NEWLINE;
 
printed_len = log_output(facility, level, lflags, dict, dictlen, text, 
text_len);
-
+   /* Oops... */
+   if (oops_in_progress && !log_oops_seq) {
+   log_oops_seq = log_next_seq;
+   log_oops_idx = log_next_idx;
+   }
logbuf_unlock_irqrestore(flags);
 
/* If called from the scheduler, we can not call up(). */
@@ -2396,6 +2414,7 @@ void console_unlock(void)
 
stop_critical_timings();/* don't trace print latency */
call_console_drivers(ext_text, ext_len, text, len);
+   console_unlock_delay();
start_critical_timings();
 
if (console_lock_spinning_disable_and_check()) {
@@ -2495,6 +2514,24 @@ void console_flush_on_panic(void)
console_unlock();
 }
 
+/**
+ * console_reflush_on_panic - re-flush console content starting from the
+ * first oops_in_progress record
+ */
+void console_reflush_on_panic(void)
+{
+   unsigned long flags;
+
+   logbuf_lock_irqsave(flags);
+   console_seq = log_oops_seq;
+   console_idx = log_oops_idx;
+   logbuf_unlock_irqrestore(flags);
+
+   if (!printk_delay_msec)
+   printk_delay_msec = 273; /* I can't read any faster */
+   console_flush_on_panic();
+}
+
 /*
  * Return the console tty driver structure and its

61 matches

Mail list logo