Re: [Crash-utility] [External Mail]Re: [PATCH] Bugfix and optimization for ARM64 getting crash notes

2019-12-12 Thread 陈启武
Hi Dave,
I have mistaken understand about the first point of view, thanks for correct my 
fault.
But how about my second point about the optimization for ARM64 getting crash 
notes?
2) arm64_get_crash_notes() check the sanity of NT_PRSTATUS notes for online 
cpus.
If one cpu contains invalid note, it's better to continue finding the crash 
notes for other
online cpus. So we can extract the backtraces for the online cpus which contain 
valid note
by using command "bt -a".

As a result without this change, for example, on an ELF-format kdump created by 
arm64 kernel,
I can see the below warning messeges during crash initialization:
WARNING: invalid note (name != "CORE"
WARNING: cannot retrieve registers for active tasks

From this dump, we can see online cpu number is 0,1,2,5,6,7:
crash> p -x __cpu_online_mask
__cpu_online_mask = $1 = {
  bits = {0xe7}
}

However, we cannot retrieve registers and backtraces for all active tasks by 
using command "bt -a":
crash> bt -a
PID: 244TASK: ffdcb7e40f80  CPU: 0   COMMAND: "hang_detect"
bt: WARNING: cannot determine starting stack frame for task ffdcb7e40f80

PID: 0  TASK: ffdd0066dd00  CPU: 1   COMMAND: "swapper/1"
bt: WARNING: cannot determine starting stack frame for task ffdd0066dd00

PID: 0  TASK: ffdd00669f00  CPU: 2   COMMAND: "swapper/2"
bt: WARNING: cannot determine starting stack frame for task ffdd00669f00

PID: 0  TASK: ffdd0066cd80  CPU: 3   COMMAND: "swapper/3"
bt: WARNING: cannot determine starting stack frame for task ffdd0066cd80

PID: 0  TASK: ffdd0066ae80  CPU: 4   COMMAND: "swapper/4"
bt: WARNING: cannot determine starting stack frame for task ffdd0066ae80

PID: 0  TASK: ffdd0066be00  CPU: 5   COMMAND: "swapper/5"
bt: WARNING: cannot determine starting stack frame for task ffdd0066be00

PID: 0  TASK: ffdd006f  CPU: 6   COMMAND: "swapper/6"
bt: WARNING: cannot determine starting stack frame for task ffdd006f

PID: 0  TASK: ffdd006f6c80  CPU: 7   COMMAND: "swapper/7"
bt: WARNING: cannot determine starting stack frame for task ffdd006f6c80

Since online cpu0 has invaid NT_PRSTATUS note, lead to skipping the reading the 
crash notes for other online cpus:
crash> help -D | grep PC:
  LR:    SP:    PC: 

  LR: ffa5f58b8688   SP: ffdd3fe4ea40   PC: 
ffa5f6932a10
  LR: ffa5f58b8688   SP: ffdd3fe6ca40   PC: 
ffa5f6932a10
  LR:    SP:    PC: 

  LR:    SP:    PC: 

  LR: ffa5f58b8688   SP: ffdd3fec6a40   PC: 
ffa5f6932a10
  LR: ffa5f58b8688   SP: ffdd3fee4a40   PC: 
ffa5f6932a10
  LR: ffa5f58b8688   SP: ffdd3ff02a40   PC: 
ffa5f6932a10

I test for patch v3 which achieves the proper result.
crash> bt -a
PID: 244TASK: ffdcb7e40f80  CPU: 0   COMMAND: "hang_detect"
bt: WARNING: cannot determine starting stack frame for task ffdcb7e40f80

PID: 0  TASK: ffdd0066dd00  CPU: 1   COMMAND: "swapper/1"
 #0 [ffdd3fe4ea40] mrdump_stop_noncore_cpu at ffa5f6932a0c
 #1 [ffdd3fe4ebc0] flush_smp_call_function_queue at ffa5f58b8684
 #2 [ffdd3fe4ec20] generic_smp_call_function_single_interrupt at 
ffa5f58ba8c4
 #3 [ffdd3fe4ec30] handle_IPI at ffa5f56a8a10
 #4 [ffdd3fe4eca0] gic_handle_irq at ffa5f5682294
---  ---
 #5 [ffdd006c7de0] el1_irq at ffa5f56844e4
 PC: ffa5f7120b28  [cpuidle_enter_state+336]
 LR: ffa5f7120d70  [cpuidle_enter_state+920]
 SP: ffdd006c7df0  PSTATE: 10c00145
X29: ffdd006c7df0  X28: 0001  X27: 0001
X26: ffa5fa04e000  X25: 0001  X24: 378c55cbccab
X23: ffa5f9970980  X22: 0001  X21: ffdcb6f6a400
X20: ffa5fa04efa8  X19: 378c56210393  X18: 
X17:   X16:   X15: 
X14: ffdd0066dd00  X13: 0037475a6000  X12: 3455591d
X11:   X10: 1000   X9: 
 X8: ff8ba00d8f6c   X7:    X6: 1ff4bf57ac45
 X5: 00209246a095bf14   X4: 348a8d795b93   X3: 431bde82d7b634db
 X2: 1ffba00cdba3   X1:    X0: 
 #6 [ffdd006c7df0] cpuidle_enter_state at ffa5f7120b24
 #7 [ffdd006c7e70] cpuidle_enter at ffa5f712150c
 #8 [ffdd006c7e80] call_cpuidle at ffa5f5805e24
 #9 [ffdd006c7eb0] do_idle at ffa5f5806320
#10 [ffdd006c7f80] cpu_startup_entry at ffa5f5806910
#11 [ffdd006c7fa0] secondary_start_kernel at ffa5f56a7fac

PID: 0  TASK: ffdd00669f00  CPU: 2   COMMAND: "swapper/2"
 

Re: [Crash-utility] [External Mail]Re: [PATCH] Bugfix and optimization for ARM64 getting crash notes

2019-12-12 Thread Dave Anderson


- Original Message -
> Hi Dave,
> Above your suggestion, I made changes for patch v2.
> 
> Best regards,
> Qiwu

Sorry, but I'm going to NAK this patch in its current form.  I'm not exactly 
sure
what you're trying to accomplish, but in my testing, it breaks things 
unnecessarily.

For example, moving the call to arm64_get_crash_notes() from POST_VM to 
POST_INIT
breaks this logic in task_init():

645 else {
646 if (KDUMP_DUMPFILE())
647 map_cpus_to_prstatus();
648 else if (ELF_NOTES_VALID() && DISKDUMP_DUMPFILE())
649 map_cpus_to_prstatus_kdump_cmprs();
650 please_wait("determining panic task");
651 set_context(get_panic_context(), NO_PID);
652 please_wait_done();
653 }

The get_panic_context() call requires that the ARM64 arm64_get_crash_notes()
has already been called, and that ms->panic_task_regs[] array has been allocated
and populated.  But with your patch applied, it has not been called yet, and
therefore ms->page_task_regs is still NULL when get_panic_context() is called
above.
  
As a result, for example, on a compressed kdump clone created by virsh dump,
I now see this during initialization:

  please wait... (determining panic task) 
  WARNING: cannot determine starting stack frame for task 08c25280
  
  WARNING: cannot determine starting stack frame for task 8000fa01b000
  
  WARNING: cannot determine starting stack frame for task 8000fa01c000
  
  WARNING: cannot determine starting stack frame for task 8000fa01d000
  
But all of those active tasks have NT_PRSTATUS notes and backtraces:
  
  crash> help -D | grep PC:
 LR: 08085938   SP: 08be3ee0   PC: 
08099cf8
 LR: 08085938   SP: 8000fa06ff30   PC: 
08099cf8
 LR: 08085938   SP: 8000fa073f30   PC: 
08099cf8
 LR: 08085938   SP: 8000fa077f30   PC: 
08099cf8
  crash>

And therefore have legitimate starting stack frames:

  crash> bt 08c25280 8000fa01b000 8000fa01c000 8000fa01d000
  PID: 0  TASK: 08c25280  CPU: 0   COMMAND: "swapper/0"
   #0 [08be3ee0] cpu_do_idle at 08099cf4
   #1 [08be3f10] default_idle_call at 08766f90
   #2 [08be3f20] cpu_startup_entry at 08110fe4
   #3 [08be3f80] rest_init at 08761674
   #4 [08be3f90] start_kernel at 08ab0be8
  
  PID: 0  TASK: 8000fa01b000  CPU: 1   COMMAND: "swapper/1"
   #0 [8000fa06ff30] cpu_do_idle at 08099cf4
   #1 [8000fa06ff60] default_idle_call at 08766f90
   #2 [8000fa06ff70] cpu_startup_entry at 08110fe4
   #3 [8000fa06ffd0] secondary_start_kernel at 0808ecb8
  
  PID: 0  TASK: 8000fa01c000  CPU: 2   COMMAND: "swapper/2"
   #0 [8000fa073f30] cpu_do_idle at 08099cf4
   #1 [8000fa073f60] default_idle_call at 08766f90
   #2 [8000fa073f70] cpu_startup_entry at 08110fe4
   #3 [8000fa073fd0] secondary_start_kernel at 0808ecb8
  
  PID: 0  TASK: 8000fa01d000  CPU: 3   COMMAND: "swapper/3"
   #0 [8000fa077f30] cpu_do_idle at 08099cf4
   #1 [8000fa077f60] default_idle_call at 08766f90
   #2 [8000fa077f70] cpu_startup_entry at 08110fe4
   #3 [8000fa077fd0] secondary_start_kernel at 0808ecb8
  crash> 

Dave


  


> -Original Message-
> From: Dave Anderson 
> Sent: Wednesday, December 11, 2019 12:51 AM
> To: qiwuche...@gmail.com
> Cc: crash-utility@redhat.com; 陈启武 
> Subject: [External Mail]Re: [PATCH] Bugfix and optimization for ARM64 getting
> crash notes
> 
> 
> 
> - Original Message -
> > From: chenqiwu 
> >
> > 1) ARM64 call arm64_get_crash_notes() to retrieve active task
> > registers when POST_VM before calling map_cpus_to_prstatus() to remap
> > the NT_PRSTATUS elf notes to the online cpus. It's better to call
> > arm64_get_crash_notes() when POST_INIT.
> > 2) arm64_get_crash_notes() check the sanity of NT_PRSTATUS notes only
> > for online cpus. If one cpu contains invalid note, it's better to
> > continue finding the crash notes for other online cpus.
> > So we can extract the backtraces for the online cpus which contain
> > valid note by using command "bt -a".
> > 3) map_cpus_to_prstatus() remap the NT_PRSTATUS notes only to the
> > online cpus. Make sure there must be a one-to-one relationship between
> > the number of online cpus and the number of notes.
> 
> The code in map_cpus_to_prstatus() and map_cpus_to_prstatus_kdump_cmprs()
> has been in place forever.  Both the nd->nt_prstatus_percpu[] and
> dd->nt_prstatus_percpu[] arrays are per-cpu regardless whether
> they are online or offline.  However, since kdump 

Re: [Crash-utility] [PATCH] Debugging xen hypervisor failed

2019-12-12 Thread Dave Anderson



- Original Message -
> Hi Dave,
> 
> Am Mittwoch, 11. Dezember 2019, 20:28:49 CET schrieb Dave Anderson:
> > 
> ...
> > 
> > If you don't have a 32-bit x86 machine, or don't have the proper
> > libraries to build a 32-bit crash binary on an x86_64 host with
> > "make target=X86", just re-post the patch with your best effort
> > and I'll build-test it.
> > 
> > Thanks,
> >   Dave
> 
> Sorry, my bad, I didn't think of 32 bit.
> I installed the 32 bit libraries and hopefully fixed the issuses.
> But I'm not able to test the 32 bit variant because I have no proper dumps.
> Many thanks.
> 
> Dietmar.


Thanks Dietmar -- your patch is queued for crash-7.2.8:

  
https://github.com/crash-utility/crash/commit/4e4e5859731da650d3520150d7ea2ef07094c7af

Dave


> 
> Signed-off-by: Dietmar Hahn 
> 
> ---
>  x86.c   | 12 ++--
>  x86_64.c| 20 +++-
>  xen_hyper.c | 22 --
>  xen_hyper_defs.h|  8 
>  xen_hyper_dump_tables.c |  8 
>  5 files changed, 41 insertions(+), 29 deletions(-)
> 
> diff --git a/x86.c b/x86.c
> index 88562b6..de0d3d3 100644
> --- a/x86.c
> +++ b/x86.c
> @@ -5600,18 +5600,18 @@ x86_get_stackbase_hyper(ulong task)
>  
>   if (symbol_exists("init_tss")) {
>   init_tss = symbol_value("init_tss");
> - init_tss += XEN_HYPER_SIZE(tss_struct) * pcpu;
> + init_tss += XEN_HYPER_SIZE(tss) * pcpu;
>   } else {
>   init_tss = symbol_value("per_cpu__init_tss");
>   init_tss = xen_hyper_per_cpu(init_tss, pcpu);
>   }
>   
> - buf = GETBUF(XEN_HYPER_SIZE(tss_struct));
> + buf = GETBUF(XEN_HYPER_SIZE(tss));
>   if (!readmem(init_tss, KVADDR, buf,
> - XEN_HYPER_SIZE(tss_struct), "init_tss", 
> RETURN_ON_ERROR)) {
> + XEN_HYPER_SIZE(tss), "init_tss", RETURN_ON_ERROR)) {
>   error(FATAL, "cannot read init_tss.\n");
>   }
> - esp = ULONG(buf + XEN_HYPER_OFFSET(tss_struct_esp0));
> + esp = ULONG(buf + XEN_HYPER_OFFSET(tss_esp0));
>   FREEBUF(buf);
>   base = esp & (~(STACKSIZE() - 1));
>  
> @@ -5745,8 +5745,8 @@ x86_init_hyper(int when)
>  #endif
>   XEN_HYPER_STRUCT_SIZE_INIT(cpu_time, "cpu_time");
>   XEN_HYPER_STRUCT_SIZE_INIT(cpuinfo_x86, "cpuinfo_x86");
> - XEN_HYPER_STRUCT_SIZE_INIT(tss_struct, "tss_struct");
> - XEN_HYPER_MEMBER_OFFSET_INIT(tss_struct_esp0, "tss_struct", 
> "esp0");
> + XEN_HYPER_STRUCT_SIZE_INIT(tss, "tss_struct");
> + XEN_HYPER_MEMBER_OFFSET_INIT(tss_esp0, "tss_struct", "esp0");
>   XEN_HYPER_MEMBER_OFFSET_INIT(cpu_time_local_tsc_stamp, 
> "cpu_time",
>   "local_tsc_stamp");
>   XEN_HYPER_MEMBER_OFFSET_INIT(cpu_time_stime_local_stamp, 
> "cpu_time",
>   "stime_local_stamp");
>   XEN_HYPER_MEMBER_OFFSET_INIT(cpu_time_stime_master_stamp, 
> "cpu_time",
>   "stime_master_stamp");
> diff --git a/x86_64.c b/x86_64.c
> index a4138ed..4f1a6d7 100644
> --- a/x86_64.c
> +++ b/x86_64.c
> @@ -7973,13 +7973,23 @@ x86_64_init_hyper(int when)
>  
>   case POST_GDB:
>   XEN_HYPER_STRUCT_SIZE_INIT(cpuinfo_x86, "cpuinfo_x86");
> - XEN_HYPER_STRUCT_SIZE_INIT(tss_struct, "tss_struct");
> - if (MEMBER_EXISTS("tss_struct", "__blh")) {
> - XEN_HYPER_ASSIGN_OFFSET(tss_struct_rsp0) = 
> MEMBER_OFFSET("tss_struct",
> "__blh") + sizeof(short unsigned int);
> + if (symbol_exists("per_cpu__tss_page")) {
> + XEN_HYPER_STRUCT_SIZE_INIT(tss, "tss64");
> + XEN_HYPER_ASSIGN_OFFSET(tss_rsp0) =
> + MEMBER_OFFSET("tss64", 
> "rsp0");
> + XEN_HYPER_MEMBER_OFFSET_INIT(tss_ist, "tss64", "ist");
>   } else {
> - XEN_HYPER_ASSIGN_OFFSET(tss_struct_rsp0) = 
> MEMBER_OFFSET("tss_struct",
> "rsp0");
> + XEN_HYPER_STRUCT_SIZE_INIT(tss, "tss_struct");
> + XEN_HYPER_MEMBER_OFFSET_INIT(tss_ist, "tss_struct", 
> "ist");
> + if (MEMBER_EXISTS("tss_struct", "__blh")) {
> + XEN_HYPER_ASSIGN_OFFSET(tss_rsp0) =
> + MEMBER_OFFSET("tss_struct", "__blh") +
> + sizeof(short 
> unsigned int);
> + } else  {
> + XEN_HYPER_ASSIGN_OFFSET(tss_rsp0) =
> + 
> MEMBER_OFFSET("tss_struct", "rsp0");
> + }
>   }
> - XEN_HYPER_MEMBER_OFFSET_INIT(tss_struct_ist, "tss_struct", 
> "ist");
>   if (symbol_exists("cpu_data")) {
>   xht->cpu_data_address = symbol_value("cpu_data");
>

Re: [Crash-utility] [PATCH] Debugging xen hypervisor failed

2019-12-12 Thread Dietmar Hahn
Hi Dave,

Am Mittwoch, 11. Dezember 2019, 20:28:49 CET schrieb Dave Anderson:
> 
...
> 
> If you don't have a 32-bit x86 machine, or don't have the proper 
> libraries to build a 32-bit crash binary on an x86_64 host with
> "make target=X86", just re-post the patch with your best effort 
> and I'll build-test it.
> 
> Thanks,
>   Dave

Sorry, my bad, I didn't think of 32 bit.
I installed the 32 bit libraries and hopefully fixed the issuses.
But I'm not able to test the 32 bit variant because I have no proper dumps.
Many thanks.

Dietmar.


Signed-off-by: Dietmar Hahn 

---
 x86.c   | 12 ++--
 x86_64.c| 20 +++-
 xen_hyper.c | 22 --
 xen_hyper_defs.h|  8 
 xen_hyper_dump_tables.c |  8 
 5 files changed, 41 insertions(+), 29 deletions(-)

diff --git a/x86.c b/x86.c
index 88562b6..de0d3d3 100644
--- a/x86.c
+++ b/x86.c
@@ -5600,18 +5600,18 @@ x86_get_stackbase_hyper(ulong task)
 
if (symbol_exists("init_tss")) {
init_tss = symbol_value("init_tss");
-   init_tss += XEN_HYPER_SIZE(tss_struct) * pcpu;
+   init_tss += XEN_HYPER_SIZE(tss) * pcpu;
} else {
init_tss = symbol_value("per_cpu__init_tss");
init_tss = xen_hyper_per_cpu(init_tss, pcpu);
}

-   buf = GETBUF(XEN_HYPER_SIZE(tss_struct));
+   buf = GETBUF(XEN_HYPER_SIZE(tss));
if (!readmem(init_tss, KVADDR, buf,
-   XEN_HYPER_SIZE(tss_struct), "init_tss", 
RETURN_ON_ERROR)) {
+   XEN_HYPER_SIZE(tss), "init_tss", RETURN_ON_ERROR)) {
error(FATAL, "cannot read init_tss.\n");
}
-   esp = ULONG(buf + XEN_HYPER_OFFSET(tss_struct_esp0));
+   esp = ULONG(buf + XEN_HYPER_OFFSET(tss_esp0));
FREEBUF(buf);
base = esp & (~(STACKSIZE() - 1));
 
@@ -5745,8 +5745,8 @@ x86_init_hyper(int when)
 #endif
XEN_HYPER_STRUCT_SIZE_INIT(cpu_time, "cpu_time");
XEN_HYPER_STRUCT_SIZE_INIT(cpuinfo_x86, "cpuinfo_x86");
-   XEN_HYPER_STRUCT_SIZE_INIT(tss_struct, "tss_struct");
-   XEN_HYPER_MEMBER_OFFSET_INIT(tss_struct_esp0, "tss_struct", 
"esp0");
+   XEN_HYPER_STRUCT_SIZE_INIT(tss, "tss_struct");
+   XEN_HYPER_MEMBER_OFFSET_INIT(tss_esp0, "tss_struct", "esp0");
XEN_HYPER_MEMBER_OFFSET_INIT(cpu_time_local_tsc_stamp, 
"cpu_time", "local_tsc_stamp");
XEN_HYPER_MEMBER_OFFSET_INIT(cpu_time_stime_local_stamp, 
"cpu_time", "stime_local_stamp");
XEN_HYPER_MEMBER_OFFSET_INIT(cpu_time_stime_master_stamp, 
"cpu_time", "stime_master_stamp");
diff --git a/x86_64.c b/x86_64.c
index a4138ed..4f1a6d7 100644
--- a/x86_64.c
+++ b/x86_64.c
@@ -7973,13 +7973,23 @@ x86_64_init_hyper(int when)
 
case POST_GDB:
XEN_HYPER_STRUCT_SIZE_INIT(cpuinfo_x86, "cpuinfo_x86");
-   XEN_HYPER_STRUCT_SIZE_INIT(tss_struct, "tss_struct");
-   if (MEMBER_EXISTS("tss_struct", "__blh")) {
-   XEN_HYPER_ASSIGN_OFFSET(tss_struct_rsp0) = 
MEMBER_OFFSET("tss_struct", "__blh") + sizeof(short unsigned int);
+   if (symbol_exists("per_cpu__tss_page")) {
+   XEN_HYPER_STRUCT_SIZE_INIT(tss, "tss64");
+   XEN_HYPER_ASSIGN_OFFSET(tss_rsp0) =
+   MEMBER_OFFSET("tss64", 
"rsp0");
+   XEN_HYPER_MEMBER_OFFSET_INIT(tss_ist, "tss64", "ist");
} else {
-   XEN_HYPER_ASSIGN_OFFSET(tss_struct_rsp0) = 
MEMBER_OFFSET("tss_struct", "rsp0");
+   XEN_HYPER_STRUCT_SIZE_INIT(tss, "tss_struct");
+   XEN_HYPER_MEMBER_OFFSET_INIT(tss_ist, "tss_struct", 
"ist");
+   if (MEMBER_EXISTS("tss_struct", "__blh")) {
+   XEN_HYPER_ASSIGN_OFFSET(tss_rsp0) =
+   MEMBER_OFFSET("tss_struct", "__blh") +
+   sizeof(short 
unsigned int);
+   } else  {
+   XEN_HYPER_ASSIGN_OFFSET(tss_rsp0) =
+   
MEMBER_OFFSET("tss_struct", "rsp0");
+   }
}
-   XEN_HYPER_MEMBER_OFFSET_INIT(tss_struct_ist, "tss_struct", 
"ist");
if (symbol_exists("cpu_data")) {
xht->cpu_data_address = symbol_value("cpu_data");
}
diff --git a/xen_hyper.c b/xen_hyper.c
index f2f00e6..1030c0a 100644
--- a/xen_hyper.c
+++ b/xen_hyper.c
@@ -338,33 +338,35 @@ xen_hyper_x86_pcpu_init(void)
if((xhpct->pcpu_struct = malloc(XEN_HYPER_SIZE(cpu_info))) == NULL) {
error(FATAL, "cannot malloc pcpu struct space.\n");
}
-
/* get physical cpu