Re: [Crash-utility] [External Mail]Re: [PATCH] Bugfix and optimization for ARM64 getting crash notes
Hi Dave, I have mistaken understand about the first point of view, thanks for correct my fault. But how about my second point about the optimization for ARM64 getting crash notes? 2) arm64_get_crash_notes() check the sanity of NT_PRSTATUS notes for online cpus. If one cpu contains invalid note, it's better to continue finding the crash notes for other online cpus. So we can extract the backtraces for the online cpus which contain valid note by using command "bt -a". As a result without this change, for example, on an ELF-format kdump created by arm64 kernel, I can see the below warning messeges during crash initialization: WARNING: invalid note (name != "CORE" WARNING: cannot retrieve registers for active tasks From this dump, we can see online cpu number is 0,1,2,5,6,7: crash> p -x __cpu_online_mask __cpu_online_mask = $1 = { bits = {0xe7} } However, we cannot retrieve registers and backtraces for all active tasks by using command "bt -a": crash> bt -a PID: 244TASK: ffdcb7e40f80 CPU: 0 COMMAND: "hang_detect" bt: WARNING: cannot determine starting stack frame for task ffdcb7e40f80 PID: 0 TASK: ffdd0066dd00 CPU: 1 COMMAND: "swapper/1" bt: WARNING: cannot determine starting stack frame for task ffdd0066dd00 PID: 0 TASK: ffdd00669f00 CPU: 2 COMMAND: "swapper/2" bt: WARNING: cannot determine starting stack frame for task ffdd00669f00 PID: 0 TASK: ffdd0066cd80 CPU: 3 COMMAND: "swapper/3" bt: WARNING: cannot determine starting stack frame for task ffdd0066cd80 PID: 0 TASK: ffdd0066ae80 CPU: 4 COMMAND: "swapper/4" bt: WARNING: cannot determine starting stack frame for task ffdd0066ae80 PID: 0 TASK: ffdd0066be00 CPU: 5 COMMAND: "swapper/5" bt: WARNING: cannot determine starting stack frame for task ffdd0066be00 PID: 0 TASK: ffdd006f CPU: 6 COMMAND: "swapper/6" bt: WARNING: cannot determine starting stack frame for task ffdd006f PID: 0 TASK: ffdd006f6c80 CPU: 7 COMMAND: "swapper/7" bt: WARNING: cannot determine starting stack frame for task ffdd006f6c80 Since online cpu0 has invaid NT_PRSTATUS note, lead to skipping the reading the crash notes for other online cpus: crash> help -D | grep PC: LR: SP: PC: LR: ffa5f58b8688 SP: ffdd3fe4ea40 PC: ffa5f6932a10 LR: ffa5f58b8688 SP: ffdd3fe6ca40 PC: ffa5f6932a10 LR: SP: PC: LR: SP: PC: LR: ffa5f58b8688 SP: ffdd3fec6a40 PC: ffa5f6932a10 LR: ffa5f58b8688 SP: ffdd3fee4a40 PC: ffa5f6932a10 LR: ffa5f58b8688 SP: ffdd3ff02a40 PC: ffa5f6932a10 I test for patch v3 which achieves the proper result. crash> bt -a PID: 244TASK: ffdcb7e40f80 CPU: 0 COMMAND: "hang_detect" bt: WARNING: cannot determine starting stack frame for task ffdcb7e40f80 PID: 0 TASK: ffdd0066dd00 CPU: 1 COMMAND: "swapper/1" #0 [ffdd3fe4ea40] mrdump_stop_noncore_cpu at ffa5f6932a0c #1 [ffdd3fe4ebc0] flush_smp_call_function_queue at ffa5f58b8684 #2 [ffdd3fe4ec20] generic_smp_call_function_single_interrupt at ffa5f58ba8c4 #3 [ffdd3fe4ec30] handle_IPI at ffa5f56a8a10 #4 [ffdd3fe4eca0] gic_handle_irq at ffa5f5682294 --- --- #5 [ffdd006c7de0] el1_irq at ffa5f56844e4 PC: ffa5f7120b28 [cpuidle_enter_state+336] LR: ffa5f7120d70 [cpuidle_enter_state+920] SP: ffdd006c7df0 PSTATE: 10c00145 X29: ffdd006c7df0 X28: 0001 X27: 0001 X26: ffa5fa04e000 X25: 0001 X24: 378c55cbccab X23: ffa5f9970980 X22: 0001 X21: ffdcb6f6a400 X20: ffa5fa04efa8 X19: 378c56210393 X18: X17: X16: X15: X14: ffdd0066dd00 X13: 0037475a6000 X12: 3455591d X11: X10: 1000 X9: X8: ff8ba00d8f6c X7: X6: 1ff4bf57ac45 X5: 00209246a095bf14 X4: 348a8d795b93 X3: 431bde82d7b634db X2: 1ffba00cdba3 X1: X0: #6 [ffdd006c7df0] cpuidle_enter_state at ffa5f7120b24 #7 [ffdd006c7e70] cpuidle_enter at ffa5f712150c #8 [ffdd006c7e80] call_cpuidle at ffa5f5805e24 #9 [ffdd006c7eb0] do_idle at ffa5f5806320 #10 [ffdd006c7f80] cpu_startup_entry at ffa5f5806910 #11 [ffdd006c7fa0] secondary_start_kernel at ffa5f56a7fac PID: 0 TASK: ffdd00669f00 CPU: 2 COMMAND: "swapper/2"
Re: [Crash-utility] [External Mail]Re: [PATCH] Bugfix and optimization for ARM64 getting crash notes
- Original Message - > Hi Dave, > Above your suggestion, I made changes for patch v2. > > Best regards, > Qiwu Sorry, but I'm going to NAK this patch in its current form. I'm not exactly sure what you're trying to accomplish, but in my testing, it breaks things unnecessarily. For example, moving the call to arm64_get_crash_notes() from POST_VM to POST_INIT breaks this logic in task_init(): 645 else { 646 if (KDUMP_DUMPFILE()) 647 map_cpus_to_prstatus(); 648 else if (ELF_NOTES_VALID() && DISKDUMP_DUMPFILE()) 649 map_cpus_to_prstatus_kdump_cmprs(); 650 please_wait("determining panic task"); 651 set_context(get_panic_context(), NO_PID); 652 please_wait_done(); 653 } The get_panic_context() call requires that the ARM64 arm64_get_crash_notes() has already been called, and that ms->panic_task_regs[] array has been allocated and populated. But with your patch applied, it has not been called yet, and therefore ms->page_task_regs is still NULL when get_panic_context() is called above. As a result, for example, on a compressed kdump clone created by virsh dump, I now see this during initialization: please wait... (determining panic task) WARNING: cannot determine starting stack frame for task 08c25280 WARNING: cannot determine starting stack frame for task 8000fa01b000 WARNING: cannot determine starting stack frame for task 8000fa01c000 WARNING: cannot determine starting stack frame for task 8000fa01d000 But all of those active tasks have NT_PRSTATUS notes and backtraces: crash> help -D | grep PC: LR: 08085938 SP: 08be3ee0 PC: 08099cf8 LR: 08085938 SP: 8000fa06ff30 PC: 08099cf8 LR: 08085938 SP: 8000fa073f30 PC: 08099cf8 LR: 08085938 SP: 8000fa077f30 PC: 08099cf8 crash> And therefore have legitimate starting stack frames: crash> bt 08c25280 8000fa01b000 8000fa01c000 8000fa01d000 PID: 0 TASK: 08c25280 CPU: 0 COMMAND: "swapper/0" #0 [08be3ee0] cpu_do_idle at 08099cf4 #1 [08be3f10] default_idle_call at 08766f90 #2 [08be3f20] cpu_startup_entry at 08110fe4 #3 [08be3f80] rest_init at 08761674 #4 [08be3f90] start_kernel at 08ab0be8 PID: 0 TASK: 8000fa01b000 CPU: 1 COMMAND: "swapper/1" #0 [8000fa06ff30] cpu_do_idle at 08099cf4 #1 [8000fa06ff60] default_idle_call at 08766f90 #2 [8000fa06ff70] cpu_startup_entry at 08110fe4 #3 [8000fa06ffd0] secondary_start_kernel at 0808ecb8 PID: 0 TASK: 8000fa01c000 CPU: 2 COMMAND: "swapper/2" #0 [8000fa073f30] cpu_do_idle at 08099cf4 #1 [8000fa073f60] default_idle_call at 08766f90 #2 [8000fa073f70] cpu_startup_entry at 08110fe4 #3 [8000fa073fd0] secondary_start_kernel at 0808ecb8 PID: 0 TASK: 8000fa01d000 CPU: 3 COMMAND: "swapper/3" #0 [8000fa077f30] cpu_do_idle at 08099cf4 #1 [8000fa077f60] default_idle_call at 08766f90 #2 [8000fa077f70] cpu_startup_entry at 08110fe4 #3 [8000fa077fd0] secondary_start_kernel at 0808ecb8 crash> Dave > -Original Message- > From: Dave Anderson > Sent: Wednesday, December 11, 2019 12:51 AM > To: qiwuche...@gmail.com > Cc: crash-utility@redhat.com; 陈启武 > Subject: [External Mail]Re: [PATCH] Bugfix and optimization for ARM64 getting > crash notes > > > > - Original Message - > > From: chenqiwu > > > > 1) ARM64 call arm64_get_crash_notes() to retrieve active task > > registers when POST_VM before calling map_cpus_to_prstatus() to remap > > the NT_PRSTATUS elf notes to the online cpus. It's better to call > > arm64_get_crash_notes() when POST_INIT. > > 2) arm64_get_crash_notes() check the sanity of NT_PRSTATUS notes only > > for online cpus. If one cpu contains invalid note, it's better to > > continue finding the crash notes for other online cpus. > > So we can extract the backtraces for the online cpus which contain > > valid note by using command "bt -a". > > 3) map_cpus_to_prstatus() remap the NT_PRSTATUS notes only to the > > online cpus. Make sure there must be a one-to-one relationship between > > the number of online cpus and the number of notes. > > The code in map_cpus_to_prstatus() and map_cpus_to_prstatus_kdump_cmprs() > has been in place forever. Both the nd->nt_prstatus_percpu[] and > dd->nt_prstatus_percpu[] arrays are per-cpu regardless whether > they are online or offline. However, since kdump
Re: [Crash-utility] [PATCH] Debugging xen hypervisor failed
- Original Message - > Hi Dave, > > Am Mittwoch, 11. Dezember 2019, 20:28:49 CET schrieb Dave Anderson: > > > ... > > > > If you don't have a 32-bit x86 machine, or don't have the proper > > libraries to build a 32-bit crash binary on an x86_64 host with > > "make target=X86", just re-post the patch with your best effort > > and I'll build-test it. > > > > Thanks, > > Dave > > Sorry, my bad, I didn't think of 32 bit. > I installed the 32 bit libraries and hopefully fixed the issuses. > But I'm not able to test the 32 bit variant because I have no proper dumps. > Many thanks. > > Dietmar. Thanks Dietmar -- your patch is queued for crash-7.2.8: https://github.com/crash-utility/crash/commit/4e4e5859731da650d3520150d7ea2ef07094c7af Dave > > Signed-off-by: Dietmar Hahn > > --- > x86.c | 12 ++-- > x86_64.c| 20 +++- > xen_hyper.c | 22 -- > xen_hyper_defs.h| 8 > xen_hyper_dump_tables.c | 8 > 5 files changed, 41 insertions(+), 29 deletions(-) > > diff --git a/x86.c b/x86.c > index 88562b6..de0d3d3 100644 > --- a/x86.c > +++ b/x86.c > @@ -5600,18 +5600,18 @@ x86_get_stackbase_hyper(ulong task) > > if (symbol_exists("init_tss")) { > init_tss = symbol_value("init_tss"); > - init_tss += XEN_HYPER_SIZE(tss_struct) * pcpu; > + init_tss += XEN_HYPER_SIZE(tss) * pcpu; > } else { > init_tss = symbol_value("per_cpu__init_tss"); > init_tss = xen_hyper_per_cpu(init_tss, pcpu); > } > > - buf = GETBUF(XEN_HYPER_SIZE(tss_struct)); > + buf = GETBUF(XEN_HYPER_SIZE(tss)); > if (!readmem(init_tss, KVADDR, buf, > - XEN_HYPER_SIZE(tss_struct), "init_tss", > RETURN_ON_ERROR)) { > + XEN_HYPER_SIZE(tss), "init_tss", RETURN_ON_ERROR)) { > error(FATAL, "cannot read init_tss.\n"); > } > - esp = ULONG(buf + XEN_HYPER_OFFSET(tss_struct_esp0)); > + esp = ULONG(buf + XEN_HYPER_OFFSET(tss_esp0)); > FREEBUF(buf); > base = esp & (~(STACKSIZE() - 1)); > > @@ -5745,8 +5745,8 @@ x86_init_hyper(int when) > #endif > XEN_HYPER_STRUCT_SIZE_INIT(cpu_time, "cpu_time"); > XEN_HYPER_STRUCT_SIZE_INIT(cpuinfo_x86, "cpuinfo_x86"); > - XEN_HYPER_STRUCT_SIZE_INIT(tss_struct, "tss_struct"); > - XEN_HYPER_MEMBER_OFFSET_INIT(tss_struct_esp0, "tss_struct", > "esp0"); > + XEN_HYPER_STRUCT_SIZE_INIT(tss, "tss_struct"); > + XEN_HYPER_MEMBER_OFFSET_INIT(tss_esp0, "tss_struct", "esp0"); > XEN_HYPER_MEMBER_OFFSET_INIT(cpu_time_local_tsc_stamp, > "cpu_time", > "local_tsc_stamp"); > XEN_HYPER_MEMBER_OFFSET_INIT(cpu_time_stime_local_stamp, > "cpu_time", > "stime_local_stamp"); > XEN_HYPER_MEMBER_OFFSET_INIT(cpu_time_stime_master_stamp, > "cpu_time", > "stime_master_stamp"); > diff --git a/x86_64.c b/x86_64.c > index a4138ed..4f1a6d7 100644 > --- a/x86_64.c > +++ b/x86_64.c > @@ -7973,13 +7973,23 @@ x86_64_init_hyper(int when) > > case POST_GDB: > XEN_HYPER_STRUCT_SIZE_INIT(cpuinfo_x86, "cpuinfo_x86"); > - XEN_HYPER_STRUCT_SIZE_INIT(tss_struct, "tss_struct"); > - if (MEMBER_EXISTS("tss_struct", "__blh")) { > - XEN_HYPER_ASSIGN_OFFSET(tss_struct_rsp0) = > MEMBER_OFFSET("tss_struct", > "__blh") + sizeof(short unsigned int); > + if (symbol_exists("per_cpu__tss_page")) { > + XEN_HYPER_STRUCT_SIZE_INIT(tss, "tss64"); > + XEN_HYPER_ASSIGN_OFFSET(tss_rsp0) = > + MEMBER_OFFSET("tss64", > "rsp0"); > + XEN_HYPER_MEMBER_OFFSET_INIT(tss_ist, "tss64", "ist"); > } else { > - XEN_HYPER_ASSIGN_OFFSET(tss_struct_rsp0) = > MEMBER_OFFSET("tss_struct", > "rsp0"); > + XEN_HYPER_STRUCT_SIZE_INIT(tss, "tss_struct"); > + XEN_HYPER_MEMBER_OFFSET_INIT(tss_ist, "tss_struct", > "ist"); > + if (MEMBER_EXISTS("tss_struct", "__blh")) { > + XEN_HYPER_ASSIGN_OFFSET(tss_rsp0) = > + MEMBER_OFFSET("tss_struct", "__blh") + > + sizeof(short > unsigned int); > + } else { > + XEN_HYPER_ASSIGN_OFFSET(tss_rsp0) = > + > MEMBER_OFFSET("tss_struct", "rsp0"); > + } > } > - XEN_HYPER_MEMBER_OFFSET_INIT(tss_struct_ist, "tss_struct", > "ist"); > if (symbol_exists("cpu_data")) { > xht->cpu_data_address = symbol_value("cpu_data"); >
Re: [Crash-utility] [PATCH] Debugging xen hypervisor failed
Hi Dave, Am Mittwoch, 11. Dezember 2019, 20:28:49 CET schrieb Dave Anderson: > ... > > If you don't have a 32-bit x86 machine, or don't have the proper > libraries to build a 32-bit crash binary on an x86_64 host with > "make target=X86", just re-post the patch with your best effort > and I'll build-test it. > > Thanks, > Dave Sorry, my bad, I didn't think of 32 bit. I installed the 32 bit libraries and hopefully fixed the issuses. But I'm not able to test the 32 bit variant because I have no proper dumps. Many thanks. Dietmar. Signed-off-by: Dietmar Hahn --- x86.c | 12 ++-- x86_64.c| 20 +++- xen_hyper.c | 22 -- xen_hyper_defs.h| 8 xen_hyper_dump_tables.c | 8 5 files changed, 41 insertions(+), 29 deletions(-) diff --git a/x86.c b/x86.c index 88562b6..de0d3d3 100644 --- a/x86.c +++ b/x86.c @@ -5600,18 +5600,18 @@ x86_get_stackbase_hyper(ulong task) if (symbol_exists("init_tss")) { init_tss = symbol_value("init_tss"); - init_tss += XEN_HYPER_SIZE(tss_struct) * pcpu; + init_tss += XEN_HYPER_SIZE(tss) * pcpu; } else { init_tss = symbol_value("per_cpu__init_tss"); init_tss = xen_hyper_per_cpu(init_tss, pcpu); } - buf = GETBUF(XEN_HYPER_SIZE(tss_struct)); + buf = GETBUF(XEN_HYPER_SIZE(tss)); if (!readmem(init_tss, KVADDR, buf, - XEN_HYPER_SIZE(tss_struct), "init_tss", RETURN_ON_ERROR)) { + XEN_HYPER_SIZE(tss), "init_tss", RETURN_ON_ERROR)) { error(FATAL, "cannot read init_tss.\n"); } - esp = ULONG(buf + XEN_HYPER_OFFSET(tss_struct_esp0)); + esp = ULONG(buf + XEN_HYPER_OFFSET(tss_esp0)); FREEBUF(buf); base = esp & (~(STACKSIZE() - 1)); @@ -5745,8 +5745,8 @@ x86_init_hyper(int when) #endif XEN_HYPER_STRUCT_SIZE_INIT(cpu_time, "cpu_time"); XEN_HYPER_STRUCT_SIZE_INIT(cpuinfo_x86, "cpuinfo_x86"); - XEN_HYPER_STRUCT_SIZE_INIT(tss_struct, "tss_struct"); - XEN_HYPER_MEMBER_OFFSET_INIT(tss_struct_esp0, "tss_struct", "esp0"); + XEN_HYPER_STRUCT_SIZE_INIT(tss, "tss_struct"); + XEN_HYPER_MEMBER_OFFSET_INIT(tss_esp0, "tss_struct", "esp0"); XEN_HYPER_MEMBER_OFFSET_INIT(cpu_time_local_tsc_stamp, "cpu_time", "local_tsc_stamp"); XEN_HYPER_MEMBER_OFFSET_INIT(cpu_time_stime_local_stamp, "cpu_time", "stime_local_stamp"); XEN_HYPER_MEMBER_OFFSET_INIT(cpu_time_stime_master_stamp, "cpu_time", "stime_master_stamp"); diff --git a/x86_64.c b/x86_64.c index a4138ed..4f1a6d7 100644 --- a/x86_64.c +++ b/x86_64.c @@ -7973,13 +7973,23 @@ x86_64_init_hyper(int when) case POST_GDB: XEN_HYPER_STRUCT_SIZE_INIT(cpuinfo_x86, "cpuinfo_x86"); - XEN_HYPER_STRUCT_SIZE_INIT(tss_struct, "tss_struct"); - if (MEMBER_EXISTS("tss_struct", "__blh")) { - XEN_HYPER_ASSIGN_OFFSET(tss_struct_rsp0) = MEMBER_OFFSET("tss_struct", "__blh") + sizeof(short unsigned int); + if (symbol_exists("per_cpu__tss_page")) { + XEN_HYPER_STRUCT_SIZE_INIT(tss, "tss64"); + XEN_HYPER_ASSIGN_OFFSET(tss_rsp0) = + MEMBER_OFFSET("tss64", "rsp0"); + XEN_HYPER_MEMBER_OFFSET_INIT(tss_ist, "tss64", "ist"); } else { - XEN_HYPER_ASSIGN_OFFSET(tss_struct_rsp0) = MEMBER_OFFSET("tss_struct", "rsp0"); + XEN_HYPER_STRUCT_SIZE_INIT(tss, "tss_struct"); + XEN_HYPER_MEMBER_OFFSET_INIT(tss_ist, "tss_struct", "ist"); + if (MEMBER_EXISTS("tss_struct", "__blh")) { + XEN_HYPER_ASSIGN_OFFSET(tss_rsp0) = + MEMBER_OFFSET("tss_struct", "__blh") + + sizeof(short unsigned int); + } else { + XEN_HYPER_ASSIGN_OFFSET(tss_rsp0) = + MEMBER_OFFSET("tss_struct", "rsp0"); + } } - XEN_HYPER_MEMBER_OFFSET_INIT(tss_struct_ist, "tss_struct", "ist"); if (symbol_exists("cpu_data")) { xht->cpu_data_address = symbol_value("cpu_data"); } diff --git a/xen_hyper.c b/xen_hyper.c index f2f00e6..1030c0a 100644 --- a/xen_hyper.c +++ b/xen_hyper.c @@ -338,33 +338,35 @@ xen_hyper_x86_pcpu_init(void) if((xhpct->pcpu_struct = malloc(XEN_HYPER_SIZE(cpu_info))) == NULL) { error(FATAL, "cannot malloc pcpu struct space.\n"); } - /* get physical cpu