Re: [PATCH] xen/trace: Don't dump offline CPUs in debugtrace_dump_worker()

2020-05-22 Thread Jan Beulich
On 21.05.2020 10:44, Andrew Cooper wrote:
> The 'T' debugkey reliably wedges on one of my systems, which has a sparse
> APIC_ID layout due to a non power-of-2 number of cores per socket.  The
> per_cpu(dt_cpu_data, cpu) calculation falls over the deliberately non-canonical
> poison value.
> 
> Signed-off-by: Andrew Cooper 

Acked-by: Jan Beulich 




[PATCH] xen/trace: Don't dump offline CPUs in debugtrace_dump_worker()

2020-05-21 Thread Andrew Cooper
The 'T' debugkey reliably wedges on one of my systems, which has a sparse
APIC_ID layout due to a non power-of-2 number of cores per socket.  The
per_cpu(dt_cpu_data, cpu) calculation falls over the deliberately non-canonical
poison value.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Wei Liu 
CC: Roger Pau Monné 
CC: Julien Grall 
CC: Juergen Gross 

What is weird, however, is that instead of crashing, Xen wedges without
printing a clean backtrace.  Usually it blocks after just a few characters.
The best I managed to get (and can't reproduce) is:

88 cpupool_rm_domain(dom=1,pool=0) n_dom 1
(XEN) wrap: 0
(XEN) debugtrace_dump() global buffer finished
(XEN) [ Xen-4.14-unstable  x86_64  debug=y   Not tainted ]
(XEN) CPU:3
(XEN) RIP:e008:[] 
common/debugtrace.c#debugtrace_dump_worker+0x6c/0xa1
(XEN) RFLAGS: 00010006   CONTEXT: hypervisor (d0v13)
(XEN) rax: 80007d2fbf6d1000   rbx: 0030   rcx: fffa
(XEN) rdx: 82d040473c04   rsi: 83103ff0fc48   rdi: 83103ff0fc3e
(XEN) rbp: 83103ff0fc78   rsp: 83103ff0fc38   r8:  0001
(XEN) r9:  0038   r10: 0030   r11: 0002
(XEN) r12: 82d0409535a0   r13: 83103ff0fc38   r14: 82d04093
(XEN) r15: 82d040473bfe   cr0: 80050033   cr4: 00362660
(XEN) cr3: 001dd0f74000   cr2: 0041e5b0
(XEN) fsb: 7f5bb0f15780   gsb: 88827ad4   gss: 
(XEN) ds:    es:    fs:    gs:    ss: e010   cs: e008
(XEN) Xen code around  
(common/debugtrace.c#debugtrace_dump_worker+0x6c/0xa1):
(XEN)  5e 2d 03 00 49 8b 04 24 <4a> 8b 3c 30 4c 89 ee e8 e6 fe ff ff 83 c3 01 49
(XEN) Xen stack trace from rsp=83103ff0fc38:
(XEN)ff00383420757063 83103ff0fc48 0292 0292
(XEN)83103ff0fef8 83103ff0 83103ff0fd28 83103ff0fef8
(XEN)83103ff0fc98 82d040207c05 83103ff0fc98 0054
(XEN)83103ff0fcb8 82d04021d04a  
(XEN)83103ff0fe48 82d0402329f6 831033cd9000 
(XEN)831000800027 0001 3ff0fcf8 0286
(XEN)83103ff0fd28  7f5bb0f27010 
(XEN) 82004009c938 83103ff0fe54 82d04035b055
(XEN)831033cd9000 82c0 83103ff0fd68 82d040234612
(XEN)831033c8e068 0003 00823679 83103ff0fdf8
(XEN)83103ff0fd88 82d040350f14 83103ff0fdb8 00130007
(XEN)7f5bb0f28010 0001 7f5bafeb02c4 001c
(XEN)7f5bb01f02a0 0001 7ffd208ae538 00637a70
(XEN)00637a30 00424e59 7f5baff0d88b 7f5bb01f33c0
(XEN) 0002 7f5baff0d913 
(XEN)82d0403b33d4 83103ff0fef8 0230 831033c8e000
(XEN)0001 deadbeefdeadf00d 83103ff0fee8 82d04032f0e0
(XEN)82d0403b33d4 7f5bb0f27010 deadbeefdeadf00d deadbeefdeadf00d
(XEN)deadbeefdeadf00d deadbeefdeadf00d 82d0403b33d4 82d0403b33c8
(XEN)82d0403b33d4 82d0403b33c8 82d0403b33d4 82d0403b33c8
(XEN) Xen call trace:
(XEN)[] R 
common/debugtrace.c#debugtrace_dump_worker+0x6c/0xa1
(XEN)[] F common/debugtrace.c#debugtrace_key+0x7f/0x81
(XEN)[] F handle_keypress+0xb2/0xc9
(XEN)[] F do_sysctl+0x6bc/0x148b
(XEN)[] F pv_hypercall+0x2fd/0x578
(XEN)[] F lstar_enter+0x112/0x120
(XEN)

which is lacking the remainder of the #GP output from the non-canonical memory
reference in mov (%rax,%r14,1), %rdi.  The wedge also doesn't suffer a
watchdog timeout, which is even more concerning.
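
As a rough illustration of why such a reference faults: on x86-64 with 48-bit
virtual addresses, an address is canonical only when bits 63..47 are all
equal.  The sketch below is a standalone check, not Xen code; the helper name
is_canonical_48() is made up for this example and 4-level paging is assumed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 48-bit virtual addresses (4-level paging). */
#define VADDR_BITS 48

/*
 * Hypothetical helper, not a Xen function.
 * An address is canonical iff bits 63..47 are all zeros or all ones.
 */
static bool is_canonical_48(uint64_t addr)
{
    /* Shift out the low 47 bits, leaving the 17 bits 63..47. */
    uint64_t top = addr >> (VADDR_BITS - 1);

    return top == 0 || top == 0x1ffff;
}
```

Applied to the rax value in the dump above, is_canonical_48(0x80007d2fbf6d1000)
is false: bit 63 is set while bits 62..47 are clear.  The actual faulting
address is rax plus r14, and r14 is truncated in the dump, so that sum is not
reproduced here.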
---
 xen/common/debugtrace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/common/debugtrace.c b/xen/common/debugtrace.c
index c21ec99ee0..f3794b9453 100644
--- a/xen/common/debugtrace.c
+++ b/xen/common/debugtrace.c
@@ -95,7 +95,7 @@ static void debugtrace_dump_worker(void)
 
     debugtrace_dump_buffer(dt_data, "global");
 
-    for ( cpu = 0; cpu < nr_cpu_ids; cpu++ )
+    for_each_online_cpu ( cpu )
     {
         char buf[16];
 
-- 
2.11.0
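
For readers unfamiliar with the pattern, the effect of the one-line change can
be modelled outside Xen roughly as follows.  This is only a sketch under
stated assumptions: struct cpu_model, NR_IDS and count_dumped() are
hypothetical names invented for the example, a plain bitmask stands in for
Xen's online CPU map, and a NULL slot stands in for the poisoned per-CPU area
of a CPU id that is not online.

```c
#include <stddef.h>
#include <stdint.h>

#define NR_IDS 8   /* hypothetical stand-in for nr_cpu_ids */

/* Hypothetical model of per-CPU trace buffers; not Xen's data layout. */
struct cpu_model {
    uint32_t online_mask;        /* bit n set => CPU id n is online */
    const char *trace[NR_IDS];   /* NULL models a poisoned, unusable slot */
};

/*
 * Walk every CPU id but touch only the online ones, which is the effect the
 * patch gets from for_each_online_cpu.  Returns the number of CPUs visited.
 * emit may be NULL, in which case the buffers are only counted.
 */
static unsigned int count_dumped(const struct cpu_model *m,
                                 void (*emit)(unsigned int cpu,
                                              const char *buf))
{
    unsigned int cpu, visited = 0;

    for ( cpu = 0; cpu < NR_IDS; cpu++ )
    {
        /* Skip ids with no online CPU: their slot must not be dereferenced. */
        if ( !(m->online_mask & (1u << cpu)) )
            continue;

        if ( emit )
            emit(cpu, m->trace[cpu]);

        visited++;
    }

    return visited;
}
```

With a sparse mask such as 0x33 (ids 0, 1, 4 and 5 online), only four slots
are visited and the holes at ids 2, 3, 6 and 7 are never touched, whereas the
old "cpu < nr_cpu_ids" loop would have visited all eight.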