Re: [PATCH V3] panic: Move panic_print before kmsg dumpers

2022-01-20 Thread Baoquan He
On 01/20/22 at 06:36pm, Guilherme G. Piccoli wrote:
> Hi Baoquan, some comments inline below:
> 
> 
> On 20/01/2022 05:51, Baoquan He wrote:
> > [...]
> >> From my POV, the function of panic notifiers is not well defined. They
> >> do various things, for example:
> >> [...]
> >> The do more that just providing information. Some are risky. It is not
> >> easy to disable a particular one.
> > 
> > Yes, agree. Not all of them just provide information. 
> > 
> > Now panic_notifier_filter Guilherme added can help to disable some of
> > them.
> 
> So, just for completeness, worth to mention Petr had some interesting
> suggestions in the other thread (about the filter) and we may end-up not
> having this implemented - in other words, maybe a refactor of that
> mechanism is going to be proposed.

OK, saw that. We can continue discuss that there.

> 
> 
> > [...] 
> >>
> >>   + Guilherme uses crash dump only to dump the kernel log. It might
> >> be more reliable than kmsg_dump. In this case, panic_print_sys_info()
> >> is the only way to get the extra information.
> > 
> > Hmm, I haven't made clear what Guilherme really wants in his recent
> > post. In this patch he wants to get panic print info into pstore. He
> > also want to dump the kernel log poked by panic_print in kdump kernel. 
> > And it's very weird people try to collect kernel log via crash dump
> > mechnism, that is obviously using a sledgehammer to crack a nut.
> > Sometime, we should not add or change code to a too specific corner
> > case.
> 
> OK, I'll try to be really clear, hopefully I can explain the use case in
> better and simpler words. First of all, I wouldn't call it a corner case
> - it's just a valid use case that, in my opinion, should be allowed. Why
> not, right? Kernel shouldn't push policy on users, we should instead let
> the users decide how to use the tools/options.

Agree, sorry about my wrong expression.

> 
> So imagine you cannot collect a vmcore, due to the lack of storage
> space. Yet, you want the most information as possible to investigate the
> cause of a panic. The kernel flag "panic_print" is the perfect fit, we
> can dump backtraces, task list, memory info...right on a panic event.
> 
> But then, how to save this panic log with lots of information after a
> reboot? There are 2 ways in my understanding:
> 
> (a) pstore/kmsg_dump()
> (b) kdump
> 
> The option (a) is easily the best - we don't need to reserve lots of
> memory, then boot another kernel, etc. This patch (being hereby
> discussed) aims to enable the "panic_print" output for this case!
> But...there are cases in which option (a) cannot work. We need a backend
> of persistent storage, either a block device or, more common, RAM memory
> that is persistent across a reboot. What if it's not available?
> 
> Then, we fallback to option (b) - kind of a sledgehammer, in your words heh
> It's not ideal, but might be a last resort for users wanting to collect
> the most information they can without saving a full vmcore. And for
> that, we need to be able to invoke "panic_print" function before the
> __crash_kexec() call. Continue below...

OK, pstore via kmsg_dump is first option, then fallback to kdump.
This is what I suggested at below. This is what panic notifier has done
at below. I think both of them are similar, thus should take the same
way to handle.

 void panic()
 {
 if (!_crash_kexec_post_notifiers && !panic_print) {
 __crash_kexec(NULL);
 smp_send_stop();
 } else {
 crash_smp_send_stop();
 }
 
atomic_notifier_call_chain(_notifier_list, 0, buf);
panic_print_sys_info(false);
kmsg_dump(KMSG_DUMP_PANIC);
if (_crash_kexec_post_notifiers || panic_print)
 __crash_kexec(NULL);
...
debug_locks_off();
 console_flush_on_panic(CONSOLE_FLUSH_PENDING);
 
 panic_print_sys_info(true);
..
 }
> >
> 
> So, your idea is good and it mostly works, except it *requires* users to
> make use of "crash_kexec_post_notifiers" in order to use "panic_print"
> in the case (b) above discussed.

I don't get. Why it has to *require* users to make use of
"crash_kexec_post_notifiers" in order to use "panic_print"? 
To enable panic notifiers and panic_print, we need add below parameter
to kernel cmdline separately.

crash_kexec_post_notifiers=1
panic_print=0x7f

With above code, we have:
1) None specified in cmdline, only kdump enabled.
   Crash dump will work to get vmcore.
2) crash_kexec_post_notifiers=1 , kdump enabled
   panic_notifers are executed, then crash dump
3) panic_print=0x7f, kdump enabled,
   Panic_print get system info printed, then crash dump
4) crash_kexec_post_notifiers=1 panic_print=0x7f, kdump enabled
   panic_notifers are executed firstly, then panic_print, at last crash dump

Here I don't list the no kdump enabled case. Please help point out if I
misunderstood anything.
> 
> Do you think 

Re: [PATCHv4 1/4] arm64: make phys_offset signed

2022-01-20 Thread Pingfan Liu
On Fri, Jan 21, 2022 at 2:09 AM Philipp Rudo  wrote:
>
> Hi Pingfan,
>
> On Tue, 18 Jan 2022 15:48:09 +0800
> Pingfan Liu  wrote:
>
> > After kernel commit 7bc1a0f9e176 ("arm64: mm: use single quantity to
> > represent the PA to VA translation"), phys_offset can be negative if
> > running 52-bits kernel on 48-bits hardware.
> >
> > So changing phys_offset from unsigned to signed.
> >
> > Signed-off-by: Pingfan Liu 
> > Cc: Kairui Song 
> > Cc: Simon Horman 
> > Cc: Philipp Rudo 
> > To: kexec@lists.infradead.org
> > ---
> >  kexec/arch/arm64/kexec-arm64.c | 12 ++--
> >  kexec/arch/arm64/kexec-arm64.h |  2 +-
> >  util_lib/elf_info.c|  2 +-
> >  util_lib/include/elf_info.h|  2 +-
> >  4 files changed, 9 insertions(+), 9 deletions(-)
> >
>
> [...]
>
> > diff --git a/kexec/arch/arm64/kexec-arm64.h b/kexec/arch/arm64/kexec-arm64.h
> > index ed447ac..1844b67 100644
> > --- a/kexec/arch/arm64/kexec-arm64.h
> > +++ b/kexec/arch/arm64/kexec-arm64.h
> > @@ -58,7 +58,7 @@ extern off_t initrd_size;
> >   */
> >
> >  struct arm64_mem {
> > - uint64_t phys_offset;
> > + long phys_offset;
>
> I think this one should be int64_t as well.
>
Yes, you are right. Thanks for your careful review.

@Simon, could you help to correct it or prefer my V5 to fix it.

Thanks,

PIngfan


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCHv4 4/4] arm64: fix PAGE_OFFSET calc for flipped mm

2022-01-20 Thread Pingfan Liu
On Fri, Jan 21, 2022 at 2:09 AM Philipp Rudo  wrote:
>
> Hi Pingfan,
>
> On Tue, 18 Jan 2022 15:48:12 +0800
> Pingfan Liu  wrote:
>
> > From: Kairui Song 
> >
> > Since kernel commit 14c127c957c1 ('arm64: mm: Flip kernel VA space'),
> > the memory layout on arm64 have changed, and kexec-tools can no longer
> > get the the right PAGE_OFFSET based on _text symbol.
> >
> > Prior to that, the kimage (_text) lays above PAGE_END with this layout:
> > 0   -> VA_START : Usespace
> > VA_START-> VA_START + 256M  : BPF JIT, Modules
> > VA_START + 256M -> PAGE_OFFSET - (~GB misc) : Vmalloc (KERNEL _text HERE)
> > PAGE_OFFSET -> ...  : * Linear map *
> >
> > And here we have:
> > VA_START= -1UL << VA_BITS
> > PAGE_OFFSET = -1UL << (VA_BITS - 1)
> > _text < -1UL << (VA_BITS - 1)
> >
> > Kernel image lays somewhere between VA_START and PAGE_OFFSET, so we just
> > calc VA_BITS by getting the highest unset bit of _text symbol address,
> > and shift one less bit of VA_BITS to get page offset. This works as long
> > as KASLR don't put kernel in a too high location (which is commented 
> > inline).
> >
> > And after that commit, kernel layout have changed:
> > 0   -> PAGE_OFFSET  : Userspace
> > PAGE_OFFSET -> PAGE_END : * Linear map *
> > PAGE_END-> PAGE_END + 128M  : bpf jit region
> > PAGE_END + 128M -> PAGE_END + 256MB : modules
> > PAGE_END + 256M -> ...  : vmalloc (KERNEL _text HERE)
> >
> > Here we have:
> > PAGE_OFFSET = -1UL << VA_BITS
> > PAGE_END= -1UL << (VA_BITS - 1)
> > _text > -1UL << (VA_BITS - 1)
> >
> > Kernel image now lays above PAGE_END, so we have to shift one more bit to
> > get the VA_BITS, and shift the exact VA_BITS for PAGE_OFFSET.
> >
> > We can simply check if "_text > -1UL << (VA_BITS - 1)" is true to judge
> > which layout is being used and shift the page offset occordingly.
> >
> > Signed-off-by: Kairui Song 
> > (rebased and stripped by Pingfan )
> > Signed-off-by: Pingfan Liu 
> > Cc: Simon Horman 
> > Cc: Philipp Rudo 
> > To: kexec@lists.infradead.org
> > ---
> >  kexec/arch/arm64/kexec-arm64.c | 14 +-
> >  1 file changed, 13 insertions(+), 1 deletion(-)
> >
> > diff --git a/kexec/arch/arm64/kexec-arm64.c b/kexec/arch/arm64/kexec-arm64.c
> > index 793799b..ce7a5bb 100644
> > --- a/kexec/arch/arm64/kexec-arm64.c
> > +++ b/kexec/arch/arm64/kexec-arm64.c
> > @@ -923,13 +923,25 @@ out:
> >
> >  int get_page_offset(unsigned long *page_offset)
> >  {
> > + unsigned long long text_sym_addr, kernel_va_mid;
> >   int ret;
> >
> > + text_sym_addr = get_kernel_sym("_text");
> > + if (text_sym_addr == 0) {
> > + fprintf(stderr, "Can't get the symbol of _text to calculate 
> > page_offset.\n");
> > + return -1;
> > + }
> > +
> >   ret = get_va_bits();
> >   if (ret < 0)
> >   return ret;
> >
> > - if (va_bits < 52)
> > + /* Since kernel 5.4, kernel image is put above
> > +  * UINT64_MAX << (va_bits - 1)
> > +  */
> > + kernel_va_mid = UINT64_MAX << (va_bits - 1);
> > + /* older kernel */
> > + if (text_sym_addr < kernel_va_mid)
> >   *page_offset = UINT64_MAX << (va_bits - 1);
> >   else
> >   *page_offset = UINT64_MAX << va_bits;
>
> I would drop the kernel_va_mid and simply use
>
> *page_offset = UINT64_MAX << (va_bits - 1)
> if (*page_offset > text_sym_addr > *page_offset)
> *page_offset = UINT64_MAX << va_bits
>
> but that's more a matter of taste.
>
Ah, I kept kernel_va_mid dedicatedly to illustrate the purpose.

> Reviewed-by: Philipp Rudo 
>
Thanks for your reviewing.

Regards,

Pingfan


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH V3] panic: Move panic_print before kmsg dumpers

2022-01-20 Thread Guilherme G. Piccoli
Hi Baoquan, some comments inline below:


On 20/01/2022 05:51, Baoquan He wrote:
> [...]
>> From my POV, the function of panic notifiers is not well defined. They
>> do various things, for example:
>> [...]
>> The do more that just providing information. Some are risky. It is not
>> easy to disable a particular one.
> 
> Yes, agree. Not all of them just provide information. 
> 
> Now panic_notifier_filter Guilherme added can help to disable some of
> them.

So, just for completeness, worth to mention Petr had some interesting
suggestions in the other thread (about the filter) and we may end-up not
having this implemented - in other words, maybe a refactor of that
mechanism is going to be proposed.


> [...] 
>>
>>   + Guilherme uses crash dump only to dump the kernel log. It might
>> be more reliable than kmsg_dump. In this case, panic_print_sys_info()
>> is the only way to get the extra information.
> 
> Hmm, I haven't made clear what Guilherme really wants in his recent
> post. In this patch he wants to get panic print info into pstore. He
> also want to dump the kernel log poked by panic_print in kdump kernel. 
> And it's very weird people try to collect kernel log via crash dump
> mechnism, that is obviously using a sledgehammer to crack a nut.
> Sometime, we should not add or change code to a too specific corner
> case.

OK, I'll try to be really clear, hopefully I can explain the use case in
better and simpler words. First of all, I wouldn't call it a corner case
- it's just a valid use case that, in my opinion, should be allowed. Why
not, right? Kernel shouldn't push policy on users, we should instead let
the users decide how to use the tools/options.

So imagine you cannot collect a vmcore, due to the lack of storage
space. Yet, you want the most information as possible to investigate the
cause of a panic. The kernel flag "panic_print" is the perfect fit, we
can dump backtraces, task list, memory info...right on a panic event.

But then, how to save this panic log with lots of information after a
reboot? There are 2 ways in my understanding:

(a) pstore/kmsg_dump()
(b) kdump

The option (a) is easily the best - we don't need to reserve lots of
memory, then boot another kernel, etc. This patch (being hereby
discussed) aims to enable the "panic_print" output for this case!
But...there are cases in which option (a) cannot work. We need a backend
of persistent storage, either a block device or, more common, RAM memory
that is persistent across a reboot. What if it's not available?

Then, we fallback to option (b) - kind of a sledgehammer, in your words heh
It's not ideal, but might be a last resort for users wanting to collect
the most information they can without saving a full vmcore. And for
that, we need to be able to invoke "panic_print" function before the
__crash_kexec() call. Continue below...


> [...] I noticed
> below patch from Guilherme has been queued in linux-next. At least, from
> the commit log, I don't understand why a kernel log need be collected
> via crash dump. Now, this patch is posted, kernel log need be collected
> via kmsg_dump. Really hope we can make all things clear, then a final
> agreement is made.
> 
> commit ab693ae2140afdf797cc376b3569ca9850a7681d
> Author: Guilherme G. Piccoli 
> Date:   Thu Dec 30 20:29:14 2021 +1100
> 
> panic: allow printing extra panic information on kdump
> 
> 
> In fact, my suggestion is as below. I would like to see kmsg_dump()
> being moved above panic_notifer after Guilherme's careful evaluation.
> 
> void panic()
> {
> if (!_crash_kexec_post_notifiers && !panic_print) {
> __crash_kexec(NULL);
> smp_send_stop();
> } else {
> crash_smp_send_stop();
> }
> 
>   atomic_notifier_call_chain(_notifier_list, 0, buf);
>   panic_print_sys_info(false);
>   kmsg_dump(KMSG_DUMP_PANIC);
>   if (_crash_kexec_post_notifiers || panic_print)
> __crash_kexec(NULL);
>   ...
>   debug_locks_off();
> console_flush_on_panic(CONSOLE_FLUSH_PENDING);
> 
> panic_print_sys_info(true);
>   ..
> }
>

So, your idea is good and it mostly works, except it *requires* users to
make use of "crash_kexec_post_notifiers" in order to use "panic_print"
in the case (b) above discussed.

Do you think it should be necessary?
How about if we allow users to just "panic_print" with or without the
"crash_kexec_post_notifiers", then we pursue Petr suggestion of
refactoring the panic notifiers? So, after this future refactor, we
might have a much clear code.


> Please, don't name 'after_kmsg_dumpers', that's too nerd, bro :-)
> static void panic_print_sys_info(bool console_flush)

Sure, I'll rename "after_kmsg_dumpers" to "console_flush" in next
iteration, although my nerd side won't be so happy ;-)

Thanks for your review/comments once more.
Cheers,


Guilherme

___
kexec mailing list

Re: [PATCHv4 1/4] arm64: make phys_offset signed

2022-01-20 Thread Philipp Rudo
Hi Pingfan,

On Tue, 18 Jan 2022 15:48:09 +0800
Pingfan Liu  wrote:

> After kernel commit 7bc1a0f9e176 ("arm64: mm: use single quantity to
> represent the PA to VA translation"), phys_offset can be negative if
> running 52-bits kernel on 48-bits hardware.
> 
> So changing phys_offset from unsigned to signed.
> 
> Signed-off-by: Pingfan Liu 
> Cc: Kairui Song 
> Cc: Simon Horman 
> Cc: Philipp Rudo 
> To: kexec@lists.infradead.org
> ---
>  kexec/arch/arm64/kexec-arm64.c | 12 ++--
>  kexec/arch/arm64/kexec-arm64.h |  2 +-
>  util_lib/elf_info.c|  2 +-
>  util_lib/include/elf_info.h|  2 +-
>  4 files changed, 9 insertions(+), 9 deletions(-)
> 

[...]

> diff --git a/kexec/arch/arm64/kexec-arm64.h b/kexec/arch/arm64/kexec-arm64.h
> index ed447ac..1844b67 100644
> --- a/kexec/arch/arm64/kexec-arm64.h
> +++ b/kexec/arch/arm64/kexec-arm64.h
> @@ -58,7 +58,7 @@ extern off_t initrd_size;
>   */
>  
>  struct arm64_mem {
> - uint64_t phys_offset;
> + long phys_offset;

I think this one should be int64_t as well.

Other than that
Reviewed-by: Philipp Rudo 

>   uint64_t text_offset;
>   uint64_t image_size;
>   uint64_t vp_offset;


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCHv4 2/4] arm64/crashdump: unify routine to get page_offset

2022-01-20 Thread Philipp Rudo
Hi Pingfan,

On Tue, 18 Jan 2022 15:48:10 +0800
Pingfan Liu  wrote:

> There are two funcs to get page_offset:
>   get_kernel_page_offset()
>   get_page_offset()
> 
> Since get_kernel_page_offset() does not observe the kernel formula, and
> remove it. Unify them in order to introduce 52-bits VA kernel more
> easily in the coming patch.
> 
> Signed-off-by: Pingfan Liu 
> Cc: Kairui Song 
> Cc: Simon Horman 
> Cc: Philipp Rudo 
> To: kexec@lists.infradead.org

looks good

Reviewed-by: Philipp Rudo 

> ---
>  kexec/arch/arm64/crashdump-arm64.c | 23 +--
>  kexec/arch/arm64/kexec-arm64.c |  8 
>  kexec/arch/arm64/kexec-arm64.h |  1 +
>  3 files changed, 6 insertions(+), 26 deletions(-)
> 
> diff --git a/kexec/arch/arm64/crashdump-arm64.c 
> b/kexec/arch/arm64/crashdump-arm64.c
> index a02019a..0a8d44c 100644
> --- a/kexec/arch/arm64/crashdump-arm64.c
> +++ b/kexec/arch/arm64/crashdump-arm64.c
> @@ -46,27 +46,6 @@ static struct crash_elf_info elf_info = {
>   .machine= EM_AARCH64,
>  };
>  
> -/*
> - * Note: The returned value is correct only if !CONFIG_RANDOMIZE_BASE.
> - */
> -static uint64_t get_kernel_page_offset(void)
> -{
> - int i;
> -
> - if (elf_info.kern_vaddr_start == UINT64_MAX)
> - return UINT64_MAX;
> -
> - /* Current max virtual memory range is 48-bits. */
> - for (i = 48; i > 0; i--)
> - if (!(elf_info.kern_vaddr_start & (1UL << i)))
> - break;
> -
> - if (i <= 0)
> - return UINT64_MAX;
> - else
> - return UINT64_MAX << i;
> -}
> -
>  /*
>   * iomem_range_callback() - callback called for each iomem region
>   * @data: not used
> @@ -198,7 +177,7 @@ int load_crashdump_segments(struct kexec_info *info)
>   if (err)
>   return EFAILED;
>  
> - elf_info.page_offset = get_kernel_page_offset();
> + get_page_offset(_info.page_offset);
>   dbgprintf("%s: page_offset:   %016llx\n", __func__,
>   elf_info.page_offset);
>  
> diff --git a/kexec/arch/arm64/kexec-arm64.c b/kexec/arch/arm64/kexec-arm64.c
> index c6c67e8..33cc258 100644
> --- a/kexec/arch/arm64/kexec-arm64.c
> +++ b/kexec/arch/arm64/kexec-arm64.c
> @@ -909,7 +909,7 @@ static int get_va_bits(void)
>   * get_page_offset - Helper for getting PAGE_OFFSET
>   */
>  
> -static int get_page_offset(void)
> +int get_page_offset(unsigned long *page_offset)
>  {
>   int ret;
>  
> @@ -917,8 +917,8 @@ static int get_page_offset(void)
>   if (ret < 0)
>   return ret;
>  
> - page_offset = (0xUL) << (va_bits - 1);
> - dbgprintf("page_offset : %lx\n", page_offset);
> + *page_offset = UINT64_MAX << (va_bits - 1);
> + dbgprintf("page_offset : %lx\n", *page_offset);
>  
>   return 0;
>  }
> @@ -954,7 +954,7 @@ int get_phys_base_from_pt_load(long *phys_offset)
>   unsigned long long phys_start;
>   unsigned long long virt_start;
>  
> - ret = get_page_offset();
> + ret = get_page_offset(_offset);
>   if (ret < 0)
>   return ret;
>  
> diff --git a/kexec/arch/arm64/kexec-arm64.h b/kexec/arch/arm64/kexec-arm64.h
> index 1844b67..ed99d9d 100644
> --- a/kexec/arch/arm64/kexec-arm64.h
> +++ b/kexec/arch/arm64/kexec-arm64.h
> @@ -69,6 +69,7 @@ extern struct arm64_mem arm64_mem;
>  
>  uint64_t get_phys_offset(void);
>  uint64_t get_vp_offset(void);
> +int get_page_offset(unsigned long *offset);
>  
>  static inline void reset_vp_offset(void)
>  {


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCHv4 3/4] arm64: read VA_BITS from kcore for 52-bits VA kernel

2022-01-20 Thread Philipp Rudo
Hi Pingfan,

On Tue, 18 Jan 2022 15:48:11 +0800
Pingfan Liu  wrote:

> phys_to_virt() calculates virtual address. As a important factor,
> page_offset is excepted to be accurate.
> 
> Since arm64 kernel exposes va_bits through vmcore, using it.
> 
> Signed-off-by: Pingfan Liu 
> Cc: Kairui Song 
> Cc: Simon Horman 
> Cc: Philipp Rudo 
> To: kexec@lists.infradead.org

looks good

Reviewed-by: Philipp Rudo 

> ---
>  kexec/arch/arm64/kexec-arm64.c | 34 ++
>  util_lib/elf_info.c|  5 +
>  util_lib/include/elf_info.h|  1 +
>  3 files changed, 36 insertions(+), 4 deletions(-)
> 
> diff --git a/kexec/arch/arm64/kexec-arm64.c b/kexec/arch/arm64/kexec-arm64.c
> index 33cc258..793799b 100644
> --- a/kexec/arch/arm64/kexec-arm64.c
> +++ b/kexec/arch/arm64/kexec-arm64.c
> @@ -54,7 +54,7 @@
>  static bool try_read_phys_offset_from_kcore = false;
>  
>  /* Machine specific details. */
> -static int va_bits;
> +static int va_bits = -1;
>  static unsigned long page_offset;
>  
>  /* Global varables the core kexec routines expect. */
> @@ -876,7 +876,18 @@ static inline void set_phys_offset(int64_t v, char 
> *set_method)
>  
>  static int get_va_bits(void)
>  {
> - unsigned long long stext_sym_addr = get_kernel_sym("_stext");
> + unsigned long long stext_sym_addr;
> +
> + /*
> +  * if already got from kcore
> +  */
> + if (va_bits != -1)
> + goto out;
> +
> +
> + /* For kernel older than v4.19 */
> + fprintf(stderr, "Warning, can't get the VA_BITS from kcore\n");
> + stext_sym_addr = get_kernel_sym("_stext");
>  
>   if (stext_sym_addr == 0) {
>   fprintf(stderr, "Can't get the symbol of _stext.\n");
> @@ -900,6 +911,7 @@ static int get_va_bits(void)
>   return -1;
>   }
>  
> +out:
>   dbgprintf("va_bits : %d\n", va_bits);
>  
>   return 0;
> @@ -917,14 +929,27 @@ int get_page_offset(unsigned long *page_offset)
>   if (ret < 0)
>   return ret;
>  
> - *page_offset = UINT64_MAX << (va_bits - 1);
> + if (va_bits < 52)
> + *page_offset = UINT64_MAX << (va_bits - 1);
> + else
> + *page_offset = UINT64_MAX << va_bits;
> +
>   dbgprintf("page_offset : %lx\n", *page_offset);
>  
>   return 0;
>  }
>  
> +static void arm64_scan_vmcoreinfo(char *pos)
> +{
> + const char *str;
> +
> + str = "NUMBER(VA_BITS)=";
> + if (memcmp(str, pos, strlen(str)) == 0)
> + va_bits = strtoul(pos + strlen(str), NULL, 10);
> +}
> +
>  /**
> - * get_phys_offset_from_vmcoreinfo_pt_note - Helper for getting PHYS_OFFSET
> + * get_phys_offset_from_vmcoreinfo_pt_note - Helper for getting PHYS_OFFSET 
> (and va_bits)
>   * from VMCOREINFO note inside 'kcore'.
>   */
>  
> @@ -937,6 +962,7 @@ static int get_phys_offset_from_vmcoreinfo_pt_note(long 
> *phys_offset)
>   return EFAILED;
>   }
>  
> + arch_scan_vmcoreinfo = arm64_scan_vmcoreinfo;
>   ret = read_phys_offset_elf_kcore(fd, phys_offset);
>  
>   close(fd);
> diff --git a/util_lib/elf_info.c b/util_lib/elf_info.c
> index 5574c7f..d252eff 100644
> --- a/util_lib/elf_info.c
> +++ b/util_lib/elf_info.c
> @@ -310,6 +310,8 @@ int get_pt_load(int idx,
>  
>  #define NOT_FOUND_LONG_VALUE (-1)
>  
> +void (*arch_scan_vmcoreinfo)(char *pos);
> +
>  void scan_vmcoreinfo(char *start, size_t size)
>  {
>   char *last = start + size - 1;
> @@ -551,6 +553,9 @@ void scan_vmcoreinfo(char *start, size_t size)
>   }
>   }
>  
> + if (arch_scan_vmcoreinfo != NULL)
> + (*arch_scan_vmcoreinfo)(pos);
> +
>   if (last_line)
>   break;
>   }
> diff --git a/util_lib/include/elf_info.h b/util_lib/include/elf_info.h
> index f550d86..fdf4c3d 100644
> --- a/util_lib/include/elf_info.h
> +++ b/util_lib/include/elf_info.h
> @@ -31,5 +31,6 @@ int get_pt_load(int idx,
>  int read_phys_offset_elf_kcore(int fd, long *phys_off);
>  int read_elf(int fd);
>  void dump_dmesg(int fd, void (*handler)(char*, unsigned int));
> +extern void (*arch_scan_vmcoreinfo)(char *pos);
>  
>  #endif /* ELF_INFO_H */


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCHv4 4/4] arm64: fix PAGE_OFFSET calc for flipped mm

2022-01-20 Thread Philipp Rudo
Hi Pingfan,

On Tue, 18 Jan 2022 15:48:12 +0800
Pingfan Liu  wrote:

> From: Kairui Song 
> 
> Since kernel commit 14c127c957c1 ('arm64: mm: Flip kernel VA space'),
> the memory layout on arm64 have changed, and kexec-tools can no longer
> get the the right PAGE_OFFSET based on _text symbol.
> 
> Prior to that, the kimage (_text) lays above PAGE_END with this layout:
> 0   -> VA_START : Usespace
> VA_START-> VA_START + 256M  : BPF JIT, Modules
> VA_START + 256M -> PAGE_OFFSET - (~GB misc) : Vmalloc (KERNEL _text HERE)
> PAGE_OFFSET -> ...  : * Linear map *
> 
> And here we have:
> VA_START= -1UL << VA_BITS
> PAGE_OFFSET = -1UL << (VA_BITS - 1)
> _text < -1UL << (VA_BITS - 1)
> 
> Kernel image lays somewhere between VA_START and PAGE_OFFSET, so we just
> calc VA_BITS by getting the highest unset bit of _text symbol address,
> and shift one less bit of VA_BITS to get page offset. This works as long
> as KASLR don't put kernel in a too high location (which is commented inline).
> 
> And after that commit, kernel layout have changed:
> 0   -> PAGE_OFFSET  : Userspace
> PAGE_OFFSET -> PAGE_END : * Linear map *
> PAGE_END-> PAGE_END + 128M  : bpf jit region
> PAGE_END + 128M -> PAGE_END + 256MB : modules
> PAGE_END + 256M -> ...  : vmalloc (KERNEL _text HERE)
> 
> Here we have:
> PAGE_OFFSET = -1UL << VA_BITS
> PAGE_END= -1UL << (VA_BITS - 1)
> _text > -1UL << (VA_BITS - 1)
> 
> Kernel image now lays above PAGE_END, so we have to shift one more bit to
> get the VA_BITS, and shift the exact VA_BITS for PAGE_OFFSET.
> 
> We can simply check if "_text > -1UL << (VA_BITS - 1)" is true to judge
> which layout is being used and shift the page offset occordingly.
> 
> Signed-off-by: Kairui Song 
> (rebased and stripped by Pingfan )
> Signed-off-by: Pingfan Liu 
> Cc: Simon Horman 
> Cc: Philipp Rudo 
> To: kexec@lists.infradead.org
> ---
>  kexec/arch/arm64/kexec-arm64.c | 14 +-
>  1 file changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/kexec/arch/arm64/kexec-arm64.c b/kexec/arch/arm64/kexec-arm64.c
> index 793799b..ce7a5bb 100644
> --- a/kexec/arch/arm64/kexec-arm64.c
> +++ b/kexec/arch/arm64/kexec-arm64.c
> @@ -923,13 +923,25 @@ out:
>  
>  int get_page_offset(unsigned long *page_offset)
>  {
> + unsigned long long text_sym_addr, kernel_va_mid;
>   int ret;
>  
> + text_sym_addr = get_kernel_sym("_text");
> + if (text_sym_addr == 0) {
> + fprintf(stderr, "Can't get the symbol of _text to calculate 
> page_offset.\n");
> + return -1;
> + }
> +
>   ret = get_va_bits();
>   if (ret < 0)
>   return ret;
>  
> - if (va_bits < 52)
> + /* Since kernel 5.4, kernel image is put above
> +  * UINT64_MAX << (va_bits - 1)
> +  */
> + kernel_va_mid = UINT64_MAX << (va_bits - 1);
> + /* older kernel */
> + if (text_sym_addr < kernel_va_mid)
>   *page_offset = UINT64_MAX << (va_bits - 1);
>   else
>   *page_offset = UINT64_MAX << va_bits;

I would drop the kernel_va_mid and simply use

*page_offset = UINT64_MAX << (va_bits - 1)
if (*page_offset > text_sym_addr > *page_offset)
*page_offset = UINT64_MAX << va_bits

but that's more a matter of taste.

Reviewed-by: Philipp Rudo 


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH V3] panic: Move panic_print before kmsg dumpers

2022-01-20 Thread Guilherme G. Piccoli
On 20/01/2022 06:39, Petr Mladek wrote:
> [...]
> It makes perfect sense to disable the watchdogs during panic().
> For example, rcu_panic() just sets a variable:
> 
> static int rcu_panic(struct notifier_block *this, unsigned long ev, void *ptr)
> {
>   rcu_cpu_stall_suppress = 1;
>   return NOTIFY_DONE;
> }
> 
> It is quick and super-safe. It might make sense to implement similar
> thing for other watchdogs and do something like:
> 
> void panic_supress_watchdogs(void)
> {
>   rcu_panic();
>   softlockup_watchog_panic();
>   wq_watchog_panic();
> }
> 
> It might be caller early in panic().
> 

For reference, I saw your great input about this subject in another
related thread, which I think we should mention here:

https://lore.kernel.org/lkml/Yel8WQiBn%2FHNQN83@alley/


> JFYI, there is an extension for the crash tool that might be able to read
> the trace log, see https://crash-utility.github.io/extensions.html
> 
> I haven't tested it myself yet.

Thanks, nice to know =)

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH V4] notifier/panic: Introduce panic_notifier_filter

2022-01-20 Thread Petr Mladek
Adding some more people into Cc. Some modified the logic in the past.
Some are familiar with some interesting areas where the panic
notfiers are used.

On Sat 2022-01-08 12:34:51, Guilherme G. Piccoli wrote:
> The kernel notifier infrastructure allows function callbacks to be
> added in multiple lists, which are then called in the proper time,
> like in a reboot or panic event. The panic_notifier_list specifically
> contains the callbacks that are executed during a panic event. As any
> other notifier list, the panic one has no filtering and all functions
> previously registered are executed.
> 
> The kdump infrastructure, on the other hand, enables users to set
> a crash kernel that is kexec'ed in a panic event, and vmcore/logs
> are collected in such crash kernel. When kdump is set, by default
> the panic notifiers are ignored - the kexec jumps to the crash kernel
> before the list is checked and callbacks executed.
> 
> There are some cases though in which kdump users might want to
> allow panic notifier callbacks to execute _before_ the kexec to
> the crash kernel, for a variety of reasons - for example, users
> may think kexec is very prone to fail and want to give a chance
> to kmsg dumpers to run (and save logs using pstore),

Yes, this seems to be original intention for the
"crash_kexec_post_notifiers" option, see the commit
f06e5153f4ae2e2f3b0300f ("kernel/panic.c: add
"crash_kexec_post_notifiers" option for kdump after panic_notifiers")


> some panic notifier is required to properly quiesce some hardware
> that must be used to the crash kernel.

Do you have any example, please? The above mentioned commit
says "crash_kexec_post_notifiers" actually increases risk
of kdump failure.

Note that kmsg_dump() is called after the notifiers only because
some are printing more information, see the commit
6723734cdff15211bb78a ("panic: call panic handlers before kmsg_dump").
They might still increase the change that kmsg_dump() will never
be called.


> But there's a problem: currently it's an "all-or-nothing" situation,
> the kdump user choice is either to execute all panic notifiers or
> none of them. Given that panic notifiers may increase the risk of a
> kdump failure, this is a tough decision and may affect the debug of
> hard to reproduce bugs, if for some reason the user choice is to
> enable panic notifiers, but kdump then fails.
>
> So, this patch aims to ease this decision: we hereby introduce a filter
> for the panic notifier list, in which users may select specifically
> which callbacks they wish to run, allowing a safer kdump. The allowlist
> should be provided using the parameter "panic_notifier_filter=a,b,..."
> where a, b are valid callback names. Invalid symbols are discarded.

I am afraid that this is almost unusable solution:

   + requires deep knowledge of what each notifier does
   + might need debugging what notifier causes problems
   + the list might need to be updated when new notifiers are added
   + function names are implementation detail and might change
   + requires kallsyms


It is only workaround for a real problem. The problem is that
"panic_notifier_list" is used for many purposes that break
each other.

I checked some notifiers and found few groups:

   + disable watchdogs:
  + hung_task_panic()
  + rcu_panic()

   + dump information:
  + kernel_offset_notifier()
  + trace_panic_handler() (duplicate of panic_print=0x10)

   + inform hypervisor
  + xen_panic_event()
  + pvpanic_panic_notify()
  + hyperv_panic_event()

   + misc cleanup / flush / blinking
  + panic_event()   in ipmi_msghandler.c
  + panic_happened()   in heartbeat.c
  + led_trigger_panic_notifier()


IMHO, the right solution is to split the callbacks into 2 or more
notifier list. Then we might rework panic() to do:

void panic(void)
{
[...]

/* stop watchdogs + extra info */
atomic_notifier_call_chain(_disable_watchdogs_notifier_list, 0, 
buf);
atomic_notifier_call_chain(_info_notifier_list, 0, buf);
panic_print_sys_info();

/* crash_kexec + kmsg_dump in configurable order */
if (!_crash_kexec_post_kmsg_dump) {
__crash_kexec(NULL);
smp_send_stop();
} else {
crash_smp_send_stop();
}

kmsg_dump();
if (_crash_kexec_post_kmsg_dump)
__crash_kexec(NULL);

/* infinite loop or reboot */
atomic_notifier_call_chain(_hypervisor_notifier_list, 0, buf);
atomic_notifier_call_chain(_rest_notifier_list, 0, buf);

console_flush_on_panic(CONSOLE_FLUSH_PENDING);

if (panic_timeout >= 0) {
timeout();
emergency_restart();
}

for (i = 0; ; i += PANIC_TIMER_STEP) {
if (i >= i_next) {
i += panic_blink(state ^= 1);
i_next = i + 3600 / PANIC_BLINK_SPD;
}

Re: [PATCH v2 0/5] kexec: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef

2022-01-20 Thread Baoquan He
On 12/07/21 at 12:05am, Jisheng Zhang wrote:
> Replace the conditional compilation using "#ifdef CONFIG_KEXEC_CORE"
> by a check for "IS_ENABLED(CONFIG_KEXEC_CORE)", to simplify the code
> and increase compile coverage.

Only checked the x86 patch, but the whole patchset looks good to me,
thanks, Jisheng.

Acked-by: Baoquan He 

Maybe Andrew can help pick this whole series lest each patch need be taken
care of by its own ARCH maintainer.

> 
> I only modify x86, arm, arm64 and riscv, other architectures such as
> sh, powerpc and s390 are better to be kept kexec code as-is so they
> are not touched.
> 
> Since v1:
>  - collect Reviewed-by tag
>  - fix misleading commit msg.
> 
> Jisheng Zhang (5):
>   kexec: make crashk_res, crashk_low_res and crash_notes symbols always
> visible
>   riscv: mm: init: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef
>   x86/setup: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef
>   arm64: mm: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef
>   arm: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef
> 
>  arch/arm/kernel/setup.c |  7 +++
>  arch/arm64/mm/init.c|  9 +++--
>  arch/riscv/mm/init.c|  6 ++
>  arch/x86/kernel/setup.c | 10 +++---
>  include/linux/kexec.h   | 12 ++--
>  5 files changed, 17 insertions(+), 27 deletions(-)
> 
> -- 
> 2.34.1
> 


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v2 0/5] kexec: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef

2022-01-20 Thread Baoquan He
On 01/19/22 at 07:44pm, Jisheng Zhang wrote:
> On Wed, Jan 19, 2022 at 05:33:22PM +0800, Baoquan He wrote:
> > On 01/19/22 at 09:52am, Alexandre Ghiti wrote:
> > > Hi Baoquan,
> > > 
> > > On Wed, Jan 19, 2022 at 9:11 AM Baoquan He  wrote:
> > > >
> > > > On 01/18/22 at 10:13pm, Jisheng Zhang wrote:
> > > > > On Sun, Jan 16, 2022 at 09:38:47PM +0800, Baoquan He wrote:
> > > > > > Hi Jisheng,
> > > > >
> > > > > Hi Baoquan,
> > > > >
> > > > > >
> > > > > > On 12/07/21 at 12:05am, Jisheng Zhang wrote:
> > > > > > > Replace the conditional compilation using "#ifdef 
> > > > > > > CONFIG_KEXEC_CORE"
> > > > > > > by a check for "IS_ENABLED(CONFIG_KEXEC_CORE)", to simplify the 
> > > > > > > code
> > > > > > > and increase compile coverage.
> > > > > >
> > > > > > I go through this patchset, You mention the benefits it brings are
> > > > > > 1) simplity the code;
> > > > > > 2) increase compile coverage;
> > > > > >
> > > > > > For benefit 1), it mainly removes the dummy function in x86, arm and
> > > > > > arm64, right?
> > > > >
> > > > > Another benefit: remove those #ifdef #else #endif usage. Recently, I
> > > > > fixed a bug due to lots of "#ifdefs":
> > > > > http://lists.infradead.org/pipermail/linux-riscv/2021-December/010607.html
> > > >
> > > > Glad to know the fix. While, sometime the ifdeffery is necessary. I am
> > > > sorry about the one in riscv and you have fixed, it's truly a bug . But,
> > > > the increasing compile coverage at below you tried to make, it may cause
> > > > issue. Please see below my comment.
> > > >
> > > > >
> > > > > >
> > > > > > For benefit 2), increasing compile coverage, could you tell more 
> > > > > > how it
> > > > > > achieves and why it matters? What if people disables 
> > > > > > CONFIG_KEXEC_CORE in
> > > > > > purpose? Please forgive my poor compiling knowledge.
> > > > >
> > > > > Just my humble opinion, let's compare the code::
> > > > >
> > > > > #ifdef CONFIG_KEXEC_CORE
> > > > >
> > > > > code block A;
> > > > >
> > > > > #endif
> > > > >
> > > > > If KEXEC_CORE is disabled, code block A won't be compiled at all, the
> > > > > preprocessor will remove code block A;
> > > > >
> > > > > If we convert the code to:
> > > > >
> > > > > if (IS_ENABLED(CONFIG_KEXEC_CORE)) {
> > > > >   code block A;
> > > > > }
> > > > >
> > > > > Even if KEXEC_CORE is disabled, code block A is still compiled.
> > > >
> > > > This is what I am worried about. Before, if CONFIG_KEXEC_CORE is
> > > > unset, those relevant codes are not compiled in. I can't see what
> > > > benefit is brought in if compiled in the unneeded code block. Do I miss
> > > > anything?
> > > >
> > > 
> > > This is explained in Documentation/process/coding-style.rst "21)
> > > Conditional Compilation".
> > 
> > Thanks for the pointer, Alex.
> > 
> > I read that part, while my confusion isn't gone. With the current code,
> > CONFIG_KEXEC_CORE is set,
> >   - reserve_crashkernel_low() and reserve_crashkernel() compiled in.
> 
> Although the code block will be compiled, but the code block will be
> optimized out.
> 
> > CONFIG_KEXEC_CORE is unset,
> >   - reserve_crashkernel_low() and reserve_crashkernel() compiled out. 
> > 
> > After this patch applied, does it have the same effect as the old code?
> 
> I compared the .o, and can confirm they acchieve the same effect.

Checked the .o, it's truly as you said. I didn't know this before,
thank you and Alex, learned this now.

Seems only static function has this effect. I tested your x86 patch,
those two functions are all optimized out. If I remove the static,
the entire reserve_crashkernel_low() exists, while reserve_crashkernel()
will be optimized as a empty function. 


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH V3] panic: Move panic_print before kmsg dumpers

2022-01-20 Thread Petr Mladek
On Wed 2022-01-19 13:03:15, Guilherme G. Piccoli wrote:
> Thanks again Petr, for the deep analysis! Much appreciated.
> Some comments inline below:
> 
> 
> On 19/01/2022 12:48, Petr Mladek wrote:
> >[...]
> > From my POV, the function of panic notifiers is not well defined. They
> > do various things, for example:
> > [...] 
> > 
> > The do more that just providing information. Some are risky. It is not
> > easy to disable a particular one.
> 
> We are trying to change that here:
> https://lore.kernel.org/lkml/20220108153451.195121-1-gpicc...@igalia.com/
> 
> Your review/comments are very welcome =)
> 
> 
> > [...] 
> > It might make sense to allow to call kmsg_dump before panic notifiers
> > to reduce the risk of a breakage. But I do not have enough experience
> > with them to judge this.
> > 
> > I can't remember any bug report in this code. I guess that only
> > few people use kmsg_dump.
> 
> One of the problems doing that is that RCU and hung task detector, for
> example, have panic notifiers to disable themselves in the panic
> scenario - if we kmsg_dump() _before_ the panic notifiers, we may have
> intermixed messages, all messy...so for me it makes sense to keep the
> kmsg_dump() after panic notifiers.

It makes perfect sense to disable the watchdogs during panic().
For example, rcu_panic() just sets a variable:

static int rcu_panic(struct notifier_block *this, unsigned long ev, void *ptr)
{
rcu_cpu_stall_suppress = 1;
return NOTIFY_DONE;
}

It is quick and super-safe. It might make sense to implement similar
thing for other watchdogs and do something like:

void panic_supress_watchdogs(void)
{
rcu_panic();
softlockup_watchog_panic();
wq_watchog_panic();
}

It might be caller early in panic().

> 
> > [...]
> > Yes, panic_print_sys_info() increases the risk that the crash dump
> > will not succeed. But the change makes sense because:
> > 
> >   + panic_print_sys_info() might be useful even with full crash dump.
> > For example, ftrace messages are not easy to read from the memory
> > dump.
> 
> The last point is really good, I didn't consider that before but makes a
> lot of sense - we can now dump (a hopefully small!) ftrace/event trace
> buffer to dmesg before a kdump, making it pretty easy to read that later.
> Cheers,

JFYI, there is an extension for the crash tool that might be able to read
the trace log, see https://crash-utility.github.io/extensions.html

I haven't tested it myself yet.

Best Regards,
Petr

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH V3] panic: Move panic_print before kmsg dumpers

2022-01-20 Thread Baoquan He
On 01/19/22 at 04:48pm, Petr Mladek wrote:
> On Wed 2022-01-19 15:13:18, Baoquan He wrote:
> > On 01/14/22 at 03:30pm, Guilherme G. Piccoli wrote:
> > > The panic_print setting allows users to collect more information in a
> > > panic event, like memory stats, tasks, CPUs backtraces, etc.
> > > This is a pretty interesting debug mechanism, but currently the print
> > > event happens *after* kmsg_dump(), meaning that pstore, for example,
> > > cannot collect a dmesg with the panic_print information.
> > .. 
> > 
> > Thanks for the effort.  
> > 
> > 
> > I have some concerns about this patch, and it's still not clear to me
> > why the thing done in this patch is needed. Hope you don't mind if I ask
> > stupid question or things have been discussed in the earlier post.
> > 
> > Firstly, let me add some background information and details about the
> > problem with my understanding, please correct me if anthing is wrong.
> > 
> > Background:
> > ===
> > Currently, in panic(), there are 5 different ways to try best to
> > collect information:
> > 1) Vmcore dumping
> >done via kdump, high weight, almost get all information we want;
> >need switch to kdump kernel immediately.
> 
Thanks for the input, Petr.

> My experience is that it basically always works when it is correctly
> configured. It might be tested before the crash.

I guess you mean on a certain machine, admin usually triggers a crash
manually to test the vmcore dumping. If it works with the test, it
always works later. However, the success of switching sometime may
depend on what is crashed.

We got report of on guest of hyperV, they enable panic_notifier by
default, w/o the need of adding in cmdline. When crash triggered, it
will enter into the panic_notifier and execute the HyperV's specific
task, then it failed and never reach to kdump jumping.
> 
> 
> 
> > 2) Panic notifier
> >iterating those registered handlers, light weight, information got
> >depends on those panic handlers. Try its best after panic, do not reboot.
> 
> 
> From my POV, the function of panic notifiers is not well defined. They
> do various things, for example:
> 
>   + on_panic_nb() flushes s390 console buffers. It looks like a
> complex and risky code. For example, __sclp_vt220_flush_buffer()
> takes spin locks and calls timer code.
> 
>   + dump_gisb_error() prints some info but it also communicates with
> the device, see gisb_read() and gisb_write() in
> brcmstb_gisb_arb_decode_addr(). I am not sure how reliable this is.
> 
>   + parisc_panic_event() re-enables the soft-power switch. I am not
> sure how safe this is.
> 
>   + pvpanic_panic_notify() takes spin lock and does in iowrite().
> 
> 
> The do more that just providing information. Some are risky. It is not
> easy to disable a particular one.

Yes, agree. Not all of them just provide information. 

Now panic_notifier_filter Guilherme added can help to disable some of
them.

> 
> 
> > 3) kmsg_dump
> >iterating registered dumpers to collect as much kernel log as possible
> >to store. E.g on pstore, light weight, only kernel log, do not reboot,
> >log can be checked after system reboot.
> 
> From my POV, it is similar functionality like the crash dump.
> 
> It might make sense to allow to call kmsg_dump before panic notifiers
> to reduce the risk of a breakage. But I do not have enough experience
> with them to judge this.

Hmm, from the content it saved, it's more like the vmcore-dmesg
we call in kdump kernel. From behaviour, it's a little similar with
crash dump, dumped file are saved and can be checked after system
reboot. While it can't be compared to crash dump. With my understanding,
kmsg_dump is mostly used on embeded systems or mobile devices.

I tend to agree with you to move kmsg_dump before panic notifiers since
kmsg_dump looks lighter weight than panic notifier. Let's see how the
users of them say.
> 
> I can't remember any bug report in this code. I guess that only
> few people use kmsg_dump.
> 
> 
> > 4)console_flush_on_panic(CONSOLE_FLUSH_PENDING);
> >Flush to get the pending logbuf printed out to console.
> 
> My experience is that there are many problems with consoles.
> There might be deadlocks because of internal locks. Some
> serial consoles are incredibly slow. Messages in the console
> drivers might cause infinite output. Some drivers are pretty
> complex, especially tty that is often enabled.
> 
> console_flush_on_panic() makes it even worse. It resets console_sem.
> It means that it tries to use the consoles even when they are
> already in use.
> 
> Note that printk() tries to show the messages on the console
> immediately. console_flush_on_panic() is just the last
> instance when all safe ways failed.

OK, then putting them at last stage to execute sounds reasonable. 

> 
> 
> > 5)panic_print
> >Print out more system