Re: [Xen-devel] [PATCH 00/60] xen: add core scheduling support
Hi Juergen,

I've found another regression that happens only with sched-gran=core:
a CentOS 5.11 VM (PV, 32 CPUs, 6 GB RAM) kernel hangs during a suspend
attempt. The last kernel messages are:

  CPU 1 offline: Remove Rx thread
  CPU 2 offline: Remove Rx thread

Kernel: Linux localhost 2.6.18-398.el5xen #1 SMP Tue Sep 16 21:31:50 EDT 2014 x86_64 x86_64 x86_64 GNU/Linux

"xl top" shows 100% CPU utilization for the hung VM. And here's its state:

(XEN) [ 1907.976356] *** Dumping CPU14 guest state (d1v0): ***
(XEN) [ 1907.982558] [ Xen-4.13.0-8.0.6-d x86_64 debug=y Not tainted ]
(XEN) [ 1907.990704] CPU:14
(XEN) [ 1907.993901] RIP:e033:[]
(XEN) [ 1907.999333] RFLAGS: 0286 EM: 1 CONTEXT: pv guest (d1v0)
(XEN) [ 1908.007282] rax: 0001 rbx: 80522b80 rcx:
(XEN) [ 1908.016203] rdx: 80522b90 rsi: 0079 rdi: 8052a980
(XEN) [ 1908.025121] rbp: 80522980 rsp: 88017106dcf8 r8: 88017106c000
(XEN) [ 1908.034040] r9: r10: r11: 880176fad8c0
(XEN) [ 1908.042962] r12: 0001 r13: r14: 80522980
(XEN) [ 1908.051881] r15: 0003 cr0: 8005003b cr4: 00142660
(XEN) [ 1908.060800] cr3: 00801d8c cr2: 2b540097
(XEN) [ 1908.067393] fsb: gsb: 80639000 gss:
(XEN) [ 1908.076311] ds: es: fs: gs: ss: e02b cs: e033
(XEN) [ 1908.084650] Guest stack trace from rsp=88017106dcf8:
(XEN) [ 1908.091147] 802c3dd4 01168460 80522b90 0079
(XEN) [ 1908.100164] 80522b80 80522980 0001
(XEN) [ 1908.109179] 0003 0003 802c4041 88017d68b040
(XEN) [ 1908.118197] 0003 0003 0007 0003
(XEN) [ 1908.127213] ffea 8029f0ad 802c4092 8050ff90
(XEN) [ 1908.136229] 80268111 0003 88017d68b040
(XEN) [ 1908.145245] 802a408b 8800011d3860 fff7
(XEN) [ 1908.154263] 7fff 0001
(XEN) [ 1908.163278]
(XEN) [ 1908.172296] 0003 0003 88017f5ebce0
(XEN) [ 1908.181312] 88017f5ebcd0 803be1a6 0001
(XEN) [ 1908.190328] 803be9dd 803bea43 88017106deb0
(XEN) [ 1908.199345] 80289495 0003 88017f5ebce0 88017f5ebce8
(XEN) [ 1908.208362] 88017f5ebcd0 8029f0ad 88017106dee0
(XEN) [ 1908.217379] 88017f5ebce0 88017f5ebcd0
(XEN) [ 1908.226396] 8029f0ad 80233ee4
(XEN) [ 1908.235413] 88017d8687f0
(XEN) [ 1908.244427] 7fff 8800011d3860 88017f5ebcd0
(XEN) [ 1908.253444] 88017f5ebc78 88017f5ce6c0 80260b2c
(XEN) [ 1908.262461] 8029f0ad 88017f5ebcd0 88017f5ce6c0

Thanks,
Sergey

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 00/60] xen: add core scheduling support
On 24.07.19 16:54, Sergey Dyasli wrote:
> On 24/07/2019 10:13, Juergen Gross wrote:
>> The fix is a one-liner. :-)
>>
>> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
>> index f0bc5b3161..da9efb147f 100644
>> --- a/xen/common/schedule.c
>> +++ b/xen/common/schedule.c
>> @@ -2207,6 +2207,7 @@ static struct sched_unit *sched_wait_rendezvous_in(struct sched_unit *prev,
>>          if ( unlikely(!scheduler_active) )
>>          {
>>              ASSERT(is_idle_unit(prev));
>> +            atomic_set(&next_task->rendezvous_out_cnt, 0);
>>              prev->rendezvous_in_cnt = 0;
>>          }
>>      }
>
> Even with that applied, I'm still seeing it :(

Interesting, for me it was gone.

Time for more tests and some debug code...

Juergen
Re: [Xen-devel] [PATCH 00/60] xen: add core scheduling support
On 24/07/2019 10:13, Juergen Gross wrote:
> The fix is a one-liner. :-)
>
> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index f0bc5b3161..da9efb147f 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -2207,6 +2207,7 @@ static struct sched_unit *sched_wait_rendezvous_in(struct sched_unit *prev,
>          if ( unlikely(!scheduler_active) )
>          {
>              ASSERT(is_idle_unit(prev));
> +            atomic_set(&next_task->rendezvous_out_cnt, 0);
>              prev->rendezvous_in_cnt = 0;
>          }
>      }

Even with that applied, I'm still seeing it :(

(XEN) [ 311.223780] Watchdog timer detects that CPU1 is stuck!
(XEN) [ 311.229413] [ Xen-4.13.0 x86_64 debug=y Not tainted ]
(XEN) [ 311.236002] CPU:1
(XEN) [ 311.238774] RIP:e008:[] sched_context_switched+0x92/0x101
(XEN) [ 311.246575] RFLAGS: 0202 CONTEXT: hypervisor
(XEN) [ 311.252556] rax: 0002 rbx: 83081cc635b0 rcx: 0001
(XEN) [ 311.260530] rdx: 83081cc63634 rsi: 83081cc8f000 rdi: 83081cc8f000
(XEN) [ 311.268502] rbp: 83081cc87df0 rsp: 83081cc87dd0 r8:
(XEN) [ 311.276474] r9: 83081cc62000 r10: 83081cc62000 r11: 83081cc6b000
(XEN) [ 311.284448] r12: 83081cc8f000 r13: 83081cc8f000 r14: 83081cc61e80
(XEN) [ 311.292422] r15: 82d0805e2260 cr0: 8005003b cr4: 001526e0
(XEN) [ 311.300395] cr3: dd4ac000 cr2: 559b05a94048
(XEN) [ 311.306288] fsb: gsb: 8880a394 gss:
(XEN) [ 311.314262] ds: 002b es: 002b fs: gs: ss: e010 cs: e008
(XEN) [ 311.321716] Xen code around (sched_context_switched+0x92/0x101):
(XEN) [ 311.329862] 85 c0 74 08 f3 90 8b 02 <85> c0 75 f8 49 8b 44 24 10 66 81 38 ff 7f 75 05
(XEN) [ 311.338269] Xen stack trace from rsp=83081cc87dd0:
(XEN) [ 311.343904] 83081cc8f000 83081cc8f000 83081cc635b0
(XEN) [ 311.351963] 83081cc87e28 82d080240996 83081cc61e98 82d08060a4a8
(XEN) [ 311.360022] 83081cc61e98 82d08060a4a8 83081cc635b0 83081cc87e80
(XEN) [ 311.368083] 82d080240f7a 0001 83081cc8f000 0047588837ec
(XEN) [ 311.376142] 00011cc87ec0 82d0805c3a00 82d0805c3980
(XEN) [ 311.384205] 82d0805d3980 82d0805e2260 83081cc87eb0 82d08024274a
(XEN) [ 311.392263] 0001 82d0805c3a00 0001 0001
(XEN) [ 311.400324] 83081cc87ec0 82d0802427bf 83081cc87ef0 82d080279a1d
(XEN) [ 311.408385] 83081cc8f000 83081cc8f000 0001 83081cc635b0
(XEN) [ 311.416443] 83081cc87df0 88809ee1ba00 88809ee1ba00
(XEN) [ 311.424504] 0005 88809ee1ba00 0246
(XEN) [ 311.432563] 0001ca00
(XEN) [ 311.440625] 810013aa 8203c190 deadbeefdeadf00d deadbeefdeadf00d
(XEN) [ 311.448685] 0100 810013aa e033 0246
(XEN) [ 311.456747] c900400bfeb0 e02b beef beef
(XEN) [ 311.464807] beef beef e011 83081cc8f000
(XEN) [ 311.472864] 00379c665d00 001526e0
(XEN) [ 311.480926] 0600
(XEN) [ 311.486041] Xen call trace:
(XEN) [ 311.489332] [] sched_context_switched+0x92/0x101
(XEN) [ 311.496266] [] schedule.c#sched_context_switch+0x7f/0x160
(XEN) [ 311.503980] [] schedule.c#sched_slave+0x28f/0x2b5
(XEN) [ 311.510999] [] softirq.c#__do_softirq+0x61/0x8c
(XEN) [ 311.517846] [] do_softirq+0x13/0x15
(XEN) [ 311.523653] [] domain.c#idle_loop+0x52/0xa7
(XEN) [ 311.530152]
(XEN) [ 311.532144] CPU0 @ e008:82d08024334d (stop_machine.c#stopmachine_wait_state+0x19/0x24)
(XEN) [ 311.540899] CPU5 @ e008:82d080243398 (stop_machine.c#stopmachine_action+0x40/0x93)
(XEN) [ 311.549307] CPU3 @ e008:82d08024339e (stop_machine.c#stopmachine_action+0x46/0x93)
(XEN) [ 311.557712] CPU4 @ e008:82d08024339e (stop_machine.c#stopmachine_action+0x46/0x93)
(XEN) [ 311.566119] CPU7 @ e008:82d08024339e (stop_machine.c#stopmachine_action+0x46/0x93)
(XEN) [ 311.574526] CPU2 @ e008:82d080243398 (stop_machine.c#stopmachine_action+0x40/0x93)
(XEN) [ 311.582931] CPU6 @ e008:82d080243398 (stop_machine.c#stopmachine_action+0x40/0x93)
(XEN) [ 311.591919]
(XEN) [ 311.593914]
(XEN) [ 311.599374] Panic on CPU 1:
(XEN) [ 311.602669] FATAL TRAP: vector = 2 (nmi)
(XEN) [ 311.607088] [error_code=]
(XEN) [ 311.610641]
Re: [Xen-devel] [PATCH 00/60] xen: add core scheduling support
On 22.07.19 16:22, Sergey Dyasli wrote: On 19/07/2019 14:57, Juergen Gross wrote: I have now a git branch with the two problems corrected and rebased to current staging available: github.com/jgross1/xen.git sched-v1b Many thanks for the branch! As for the crashes, vcpu_sleep_sync() one seems to be fixed now. But I can still reproduce the shutdown one. Interestingly, it now happens only if a host has running VMs (which are automatically powered off via PV tools): (XEN) [ 332.981355] Preparing system for ACPI S5 state. (XEN) [ 332.981419] Disabling non-boot CPUs ... (XEN) [ 337.703896] Watchdog timer detects that CPU1 is stuck! (XEN) [ 337.709532] [ Xen-4.13.0-8.0.6-d x86_64 debug=y Not tainted ] (XEN) [ 337.716808] CPU:1 (XEN) [ 337.719582] RIP:e008:[] sched_context_switched+0xaf/0x101 (XEN) [ 337.727384] RFLAGS: 0202 CONTEXT: hypervisor (XEN) [ 337.733364] rax: 0002 rbx: 83081cc615b0 rcx: 0001 (XEN) [ 337.741338] rdx: 83081cc61634 rsi: 83081cc72000 rdi: 83081cc72000 (XEN) [ 337.749312] rbp: 83081cc8fdc0 rsp: 83081cc8fda0 r8: (XEN) [ 337.757284] r9: r10: 004d88fc535e r11: 004df8675ce7 (XEN) [ 337.765256] r12: 83081cc72000 r13: 83081cc72000 r14: 83081ccb0e80 (XEN) [ 337.773232] r15: 83081cc615b0 cr0: 8005003b cr4: 001526e0 (XEN) [ 337.781206] cr3: dd2a1000 cr2: 88809ed1fb80 (XEN) [ 337.787100] fsb: gsb: 8880a38c gss: (XEN) [ 337.795072] ds: 002b es: 002b fs: gs: ss: e010 cs: e008 (XEN) [ 337.802525] Xen code around (sched_context_switched+0xaf/0x101): (XEN) [ 337.810672] 00 00 eb 18 f3 90 8b 02 <85> c0 75 f8 eb 0e 49 8b 7e 30 48 85 ff 74 05 e8 (XEN) [ 337.819080] Xen stack trace from rsp=83081cc8fda0: (XEN) [ 337.824713]83081cc72000 83081cc72000 83081cc615b0 (XEN) [ 337.832772]83081cc8fe00 82d0802404e0 0082 83081ccb0e98 (XEN) [ 337.840832]0001 83081ccb0e98 0001 82d080602628 (XEN) [ 337.848895]83081cc8fe60 82d080240aca 004d873bd669 0001 (XEN) [ 337.856952]83081cc72000 004d873bdc1c 830800ff 82d0805bba00 (XEN) [ 337.865012]82d0805bb980 83081cc8 0001 (XEN) [ 
337.873072]83081cc8fe90 82d080242315 0080 82d0805bb980 (XEN) [ 337.881132]0001 82d0806026f0 83081cc8fea0 82d08024236a (XEN) [ 337.889196]83081cc8fef0 82d08027a151 82d080242315 00010665f000 (XEN) [ 337.897256]83081cc72000 83081cc72000 83080665f000 83081cc63000 (XEN) [ 337.905313]0001 830806684000 83081cc8fd78 88809ee08000 (XEN) [ 337.913373]88809ee08000 0003 (XEN) [ 337.921434]88809ee08000 0246 (XEN) [ 337.929497]96968abe 810013aa 8203c190 (XEN) [ 337.937554]deadbeefdeadf00d deadbeefdeadf00d 0100 810013aa (XEN) [ 337.945615]e033 0246 c900400afeb0 e02b (XEN) [ 337.953674]beef beef beef beef (XEN) [ 337.961736]e011 83081cc72000 00379c66db80 001526e0 (XEN) [ 337.969797] 0600 (XEN) [ 337.977856] Xen call trace: (XEN) [ 337.981152][] sched_context_switched+0xaf/0x101 (XEN) [ 337.988083][] schedule.c#sched_context_switch+0x72/0x151 (XEN) [ 337.995796][] schedule.c#sched_slave+0x2a3/0x2b2 (XEN) [ 338.002817][] softirq.c#__do_softirq+0x85/0x90 (XEN) [ 338.009664][] do_softirq+0x13/0x15 (XEN) [ 338.015471][] domain.c#idle_loop+0xb2/0xc9 (XEN) [ 338.021970] (XEN) [ 338.023965] CPU7 @ e008:82d080242f94 (stop_machine.c#stopmachine_action+0x30/0xa0) (XEN) [ 338.032372] CPU5 @ e008:82d080242f94 (stop_machine.c#stopmachine_action+0x30/0xa0) (XEN) [ 338.040776] CPU4 @ e008:82d080242f94 (stop_machine.c#stopmachine_action+0x30/0xa0) (XEN) [ 338.049182] CPU2 @ e008:82d080242f9a (stop_machine.c#stopmachine_action+0x36/0xa0) (XEN) [ 338.057591] CPU6 @ e008:82d080242f9a (stop_machine.c#stopmachine_action+0x36/0xa0) (XEN) [ 338.065999] CPU3 @ e008:82d080242f9a (stop_machine.c#stopmachine_action+0x36/0xa0) (XEN) [ 338.074406] CPU0 @ e008:82d0802532d1 (ns16550.c#ns_read_reg+0x21/0x42) (XEN) [ 338.081773] (XEN) [ 338.083764] (XEN) [ 338.089226] Panic on CPU 1: (XEN) [ 338.092521] FATAL TRAP: vector = 2 (nmi) (XEN) [ 338.096940] [error_code=] (XEN)
Re: [Xen-devel] [PATCH 00/60] xen: add core scheduling support
On 19/07/2019 14:57, Juergen Gross wrote: > I have now a git branch with the two problems corrected and rebased to > current staging available: > > github.com/jgross1/xen.git sched-v1b Many thanks for the branch! As for the crashes, vcpu_sleep_sync() one seems to be fixed now. But I can still reproduce the shutdown one. Interestingly, it now happens only if a host has running VMs (which are automatically powered off via PV tools): (XEN) [ 332.981355] Preparing system for ACPI S5 state. (XEN) [ 332.981419] Disabling non-boot CPUs ... (XEN) [ 337.703896] Watchdog timer detects that CPU1 is stuck! (XEN) [ 337.709532] [ Xen-4.13.0-8.0.6-d x86_64 debug=y Not tainted ] (XEN) [ 337.716808] CPU:1 (XEN) [ 337.719582] RIP:e008:[] sched_context_switched+0xaf/0x101 (XEN) [ 337.727384] RFLAGS: 0202 CONTEXT: hypervisor (XEN) [ 337.733364] rax: 0002 rbx: 83081cc615b0 rcx: 0001 (XEN) [ 337.741338] rdx: 83081cc61634 rsi: 83081cc72000 rdi: 83081cc72000 (XEN) [ 337.749312] rbp: 83081cc8fdc0 rsp: 83081cc8fda0 r8: (XEN) [ 337.757284] r9: r10: 004d88fc535e r11: 004df8675ce7 (XEN) [ 337.765256] r12: 83081cc72000 r13: 83081cc72000 r14: 83081ccb0e80 (XEN) [ 337.773232] r15: 83081cc615b0 cr0: 8005003b cr4: 001526e0 (XEN) [ 337.781206] cr3: dd2a1000 cr2: 88809ed1fb80 (XEN) [ 337.787100] fsb: gsb: 8880a38c gss: (XEN) [ 337.795072] ds: 002b es: 002b fs: gs: ss: e010 cs: e008 (XEN) [ 337.802525] Xen code around (sched_context_switched+0xaf/0x101): (XEN) [ 337.810672] 00 00 eb 18 f3 90 8b 02 <85> c0 75 f8 eb 0e 49 8b 7e 30 48 85 ff 74 05 e8 (XEN) [ 337.819080] Xen stack trace from rsp=83081cc8fda0: (XEN) [ 337.824713]83081cc72000 83081cc72000 83081cc615b0 (XEN) [ 337.832772]83081cc8fe00 82d0802404e0 0082 83081ccb0e98 (XEN) [ 337.840832]0001 83081ccb0e98 0001 82d080602628 (XEN) [ 337.848895]83081cc8fe60 82d080240aca 004d873bd669 0001 (XEN) [ 337.856952]83081cc72000 004d873bdc1c 830800ff 82d0805bba00 (XEN) [ 337.865012]82d0805bb980 83081cc8 0001 (XEN) [ 337.873072]83081cc8fe90 82d080242315 0080 
82d0805bb980 (XEN) [ 337.881132]0001 82d0806026f0 83081cc8fea0 82d08024236a (XEN) [ 337.889196]83081cc8fef0 82d08027a151 82d080242315 00010665f000 (XEN) [ 337.897256]83081cc72000 83081cc72000 83080665f000 83081cc63000 (XEN) [ 337.905313]0001 830806684000 83081cc8fd78 88809ee08000 (XEN) [ 337.913373]88809ee08000 0003 (XEN) [ 337.921434]88809ee08000 0246 (XEN) [ 337.929497]96968abe 810013aa 8203c190 (XEN) [ 337.937554]deadbeefdeadf00d deadbeefdeadf00d 0100 810013aa (XEN) [ 337.945615]e033 0246 c900400afeb0 e02b (XEN) [ 337.953674]beef beef beef beef (XEN) [ 337.961736]e011 83081cc72000 00379c66db80 001526e0 (XEN) [ 337.969797] 0600 (XEN) [ 337.977856] Xen call trace: (XEN) [ 337.981152][] sched_context_switched+0xaf/0x101 (XEN) [ 337.988083][] schedule.c#sched_context_switch+0x72/0x151 (XEN) [ 337.995796][] schedule.c#sched_slave+0x2a3/0x2b2 (XEN) [ 338.002817][] softirq.c#__do_softirq+0x85/0x90 (XEN) [ 338.009664][] do_softirq+0x13/0x15 (XEN) [ 338.015471][] domain.c#idle_loop+0xb2/0xc9 (XEN) [ 338.021970] (XEN) [ 338.023965] CPU7 @ e008:82d080242f94 (stop_machine.c#stopmachine_action+0x30/0xa0) (XEN) [ 338.032372] CPU5 @ e008:82d080242f94 (stop_machine.c#stopmachine_action+0x30/0xa0) (XEN) [ 338.040776] CPU4 @ e008:82d080242f94 (stop_machine.c#stopmachine_action+0x30/0xa0) (XEN) [ 338.049182] CPU2 @ e008:82d080242f9a (stop_machine.c#stopmachine_action+0x36/0xa0) (XEN) [ 338.057591] CPU6 @ e008:82d080242f9a (stop_machine.c#stopmachine_action+0x36/0xa0) (XEN) [ 338.065999] CPU3 @ e008:82d080242f9a (stop_machine.c#stopmachine_action+0x36/0xa0) (XEN) [ 338.074406] CPU0 @ e008:82d0802532d1 (ns16550.c#ns_read_reg+0x21/0x42) (XEN) [ 338.081773] (XEN) [ 338.083764] (XEN) [ 338.089226] Panic on CPU 1: (XEN) [ 338.092521] FATAL TRAP: vector = 2 (nmi) (XEN) [ 338.096940] [error_code=] (XEN) [ 338.100491]
Re: [Xen-devel] [PATCH 00/60] xen: add core scheduling support
On 18.07.19 17:14, Sergey Dyasli wrote: On 18/07/2019 15:48, Juergen Gross wrote: On 15.07.19 16:08, Sergey Dyasli wrote: On 05/07/2019 14:56, Dario Faggioli wrote: On Fri, 2019-07-05 at 14:17 +0100, Sergey Dyasli wrote: 1) This crash is quite likely to happen: [2019-07-04 18:22:46 UTC] (XEN) [ 3425.220660] Watchdog timer detects that CPU2 is stuck! [2019-07-04 18:22:46 UTC] (XEN) [ 3425.226293] [ Xen-4.13.0- 8.0.6-d x86_64 debug=y Not tainted ] [...] [2019-07-04 18:22:47 UTC] (XEN) [ 3425.501989] Xen call trace: [2019-07-04 18:22:47 UTC] (XEN) [ 3425.505278] [] vcpu_sleep_sync+0x50/0x71 [2019-07-04 18:22:47 UTC] (XEN) [ 3425.511518] [] vcpu_pause+0x21/0x23 [2019-07-04 18:22:47 UTC] (XEN) [ 3425.517326] [] vcpu_set_periodic_timer+0x27/0x73 [2019-07-04 18:22:47 UTC] (XEN) [ 3425.524258] [] do_vcpu_op+0x2c9/0x668 [2019-07-04 18:22:47 UTC] (XEN) [ 3425.530238] [] compat_vcpu_op+0x250/0x390 [2019-07-04 18:22:47 UTC] (XEN) [ 3425.536566] [] pv_hypercall+0x364/0x564 [2019-07-04 18:22:47 UTC] (XEN) [ 3425.542719] [] do_entry_int82+0x26/0x2d [2019-07-04 18:22:47 UTC] (XEN) [ 3425.548876] [] entry_int82+0xbb/0xc0 Mmm... vcpu_set_periodic_timer? What kernel is this and when does this crash happen? Hi Dario, I can easily reproduce this crash using a Debian 7 PV VM (2 vCPUs, 2GB RAM) which has the following kernel: # uname -a Linux localhost 3.2.0-4-amd64 #1 SMP Debian 3.2.78-1 x86_64 GNU/Linux All I need to do is suspend and resume the VM. Happens with a more recent kernel, too. I can easily reproduce the issue with any PV guest with more than 1 vcpu by doing "xl save" and then "xl restore" again. With the reproducer being available I'm now diving into the issue... One further thing to add is that I was able to avoid the crash by reverting xen/sched: rework and rename vcpu_force_reschedule() which is a part of the series. This made all tests with PV guests pass. 
> Another question I have is do you have a git branch with core-scheduling
> patches rebased on top of current staging available somewhere?

I have now a git branch with the two problems corrected and rebased to
current staging available:

  github.com/jgross1/xen.git sched-v1b

Juergen
Re: [Xen-devel] [PATCH 00/60] xen: add core scheduling support
On 16.07.19 17:45, Sergey Dyasli wrote: On 05/07/2019 14:17, Sergey Dyasli wrote: [2019-07-05 00:37:16 UTC] (XEN) [24907.482686] Watchdog timer detects that CPU30 is stuck! [2019-07-05 00:37:16 UTC] (XEN) [24907.514180] [ Xen-4.13.0-8.0.6-d x86_64 debug=y Not tainted ] [2019-07-05 00:37:16 UTC] (XEN) [24907.552070] CPU:30 [2019-07-05 00:37:16 UTC] (XEN) [24907.565281] RIP: e008:[] sched_context_switched+0xaf/0x101 [2019-07-05 00:37:16 UTC] (XEN) [24907.601232] RFLAGS: 0202 CONTEXT: hypervisor [2019-07-05 00:37:16 UTC] (XEN) [24907.629998] rax: 0002 rbx: 83202782e880 rcx: 001e [2019-07-05 00:37:16 UTC] (XEN) [24907.669651] rdx: 83202782e904 rsi: 832027823000 rdi: 832027823000 [2019-07-05 00:37:16 UTC] (XEN) [24907.706560] rbp: 83403cab7d20 rsp: 83403cab7d00 r8: [2019-07-05 00:37:16 UTC] (XEN) [24907.743258] r9: r10: 0200200200200200 r11: 0100100100100100 [2019-07-05 00:37:16 UTC] (XEN) [24907.779940] r12: 832027823000 r13: 832027823000 r14: 83202782e7b0 [2019-07-05 00:37:16 UTC] (XEN) [24907.816849] r15: 83202782e880 cr0: 8005003b cr4: 000426e0 [2019-07-05 00:37:16 UTC] (XEN) [24907.854125] cr3: bd8a1000 cr2: 1851b798 [2019-07-05 00:37:16 UTC] (XEN) [24907.881483] fsb: gsb: gss: [2019-07-05 00:37:16 UTC] (XEN) [24907.918309] ds: es: fs: gs: ss: cs: e008 [2019-07-05 00:37:16 UTC] (XEN) [24907.952619] Xen code around (sched_context_switched+0xaf/0x101): [2019-07-05 00:37:16 UTC] (XEN) [24907.990277] 00 00 eb 18 f3 90 8b 02 <85> c0 75 f8 eb 0e 49 8b 7e 30 48 85 ff 74 05 e8 [2019-07-05 00:37:16 UTC] (XEN) [24908.032393] Xen stack trace from rsp=83403cab7d00: [2019-07-05 00:37:16 UTC] (XEN) [24908.061298]832027823000 832027823000 83202782e880 [2019-07-05 00:37:16 UTC] (XEN) [24908.098529]83403cab7d60 82d0802407c0 0082 83202782e7c8 [2019-07-05 00:37:16 UTC] (XEN) [24908.135622]001e 83202782e7c8 001e 82d080602628 [2019-07-05 00:37:16 UTC] (XEN) [24908.172671]83403cab7dc0 82d080240d83 df99 001e [2019-07-05 00:37:16 UTC] (XEN) [24908.210212]832027823000 16a62dc8c6bc 00fc 
001e [2019-07-05 00:37:16 UTC] (XEN) [24908.247181]83202782e7c8 82d080602628 82d0805da460 001e [2019-07-05 00:37:16 UTC] (XEN) [24908.284279]83403cab7e60 82d080240ea4 0002802aecc5 832027823000 [2019-07-05 00:37:16 UTC] (XEN) [24908.321128]83202782e7b0 83202782e880 83403cab7e10 82d080273b4e [2019-07-05 00:37:16 UTC] (XEN) [24908.358308]83403cab7e10 82d080242f7f 83403cab7e60 82d08024663a [2019-07-05 00:37:17 UTC] (XEN) [24908.395662]83403cab7ea0 82d0802ec32a 834000ff 82d0805bc880 [2019-07-05 00:37:17 UTC] (XEN) [24908.432376]82d0805bb980 83403cab7fff 001e [2019-07-05 00:37:17 UTC] (XEN) [24908.469812]83403cab7e90 82d080242575 0f00 82d0805bb980 [2019-07-05 00:37:17 UTC] (XEN) [24908.508373]001e 82d0806026f0 83403cab7ea0 82d0802425ca [2019-07-05 00:37:17 UTC] (XEN) [24908.549856]83403cab7ef0 82d08027a601 82d080242575 001e7ffde000 [2019-07-05 00:37:17 UTC] (XEN) [24908.588022]832027823000 832027823000 83127ffde000 83203ffe5000 [2019-07-05 00:37:17 UTC] (XEN) [24908.625217]001e 831204092000 83403cab7d78 ffed [2019-07-05 00:37:17 UTC] (XEN) [24908.662932]8180 8180 [2019-07-05 00:37:17 UTC] (XEN) [24908.703246]818f4580 880039118848 0e6a3c4b2698 148900db [2019-07-05 00:37:17 UTC] (XEN) [24908.743671] 8101e650 8185c3e0 [2019-07-05 00:37:17 UTC] (XEN) [24908.781927] beefbeef 81054eb2 [2019-07-05 00:37:17 UTC] (XEN) [24908.820986] Xen call trace: [2019-07-05 00:37:17 UTC] (XEN) [24908.836789][] sched_context_switched+0xaf/0x101 [2019-07-05 00:37:17 UTC] (XEN) [24908.869916][] schedule.c#sched_context_switch+0x72/0x151 [2019-07-05 00:37:17 UTC] (XEN) [24908.907384][] schedule.c#sched_slave+0x2a3/0x2b2 [2019-07-05 00:37:17 UTC] (XEN) [24908.941241][] schedule.c#schedule+0x112/0x2a1 [2019-07-05 00:37:17 UTC] (XEN) [24908.973939][] softirq.c#__do_softirq+0x85/0x90 [2019-07-05 00:37:17 UTC] (XEN) [24909.007101][] do_softirq+0x13/0x15 [2019-07-05 00:37:17 UTC] (XEN) [24909.035971][] domain.c#idle_loop+0xad/0xc0 [2019-07-05 00:37:17 UTC] (XEN) [24909.070546] [2019-07-05
Re: [Xen-devel] [PATCH 00/60] xen: add core scheduling support
On 19.07.19 07:41, Juergen Gross wrote: On 18.07.19 17:14, Sergey Dyasli wrote: On 18/07/2019 15:48, Juergen Gross wrote: On 15.07.19 16:08, Sergey Dyasli wrote: On 05/07/2019 14:56, Dario Faggioli wrote: On Fri, 2019-07-05 at 14:17 +0100, Sergey Dyasli wrote: 1) This crash is quite likely to happen: [2019-07-04 18:22:46 UTC] (XEN) [ 3425.220660] Watchdog timer detects that CPU2 is stuck! [2019-07-04 18:22:46 UTC] (XEN) [ 3425.226293] [ Xen-4.13.0- 8.0.6-d x86_64 debug=y Not tainted ] [...] [2019-07-04 18:22:47 UTC] (XEN) [ 3425.501989] Xen call trace: [2019-07-04 18:22:47 UTC] (XEN) [ 3425.505278] [] vcpu_sleep_sync+0x50/0x71 [2019-07-04 18:22:47 UTC] (XEN) [ 3425.511518] [] vcpu_pause+0x21/0x23 [2019-07-04 18:22:47 UTC] (XEN) [ 3425.517326] [] vcpu_set_periodic_timer+0x27/0x73 [2019-07-04 18:22:47 UTC] (XEN) [ 3425.524258] [] do_vcpu_op+0x2c9/0x668 [2019-07-04 18:22:47 UTC] (XEN) [ 3425.530238] [] compat_vcpu_op+0x250/0x390 [2019-07-04 18:22:47 UTC] (XEN) [ 3425.536566] [] pv_hypercall+0x364/0x564 [2019-07-04 18:22:47 UTC] (XEN) [ 3425.542719] [] do_entry_int82+0x26/0x2d [2019-07-04 18:22:47 UTC] (XEN) [ 3425.548876] [] entry_int82+0xbb/0xc0 Mmm... vcpu_set_periodic_timer? What kernel is this and when does this crash happen? Hi Dario, I can easily reproduce this crash using a Debian 7 PV VM (2 vCPUs, 2GB RAM) which has the following kernel: # uname -a Linux localhost 3.2.0-4-amd64 #1 SMP Debian 3.2.78-1 x86_64 GNU/Linux All I need to do is suspend and resume the VM. Happens with a more recent kernel, too. I can easily reproduce the issue with any PV guest with more than 1 vcpu by doing "xl save" and then "xl restore" again. With the reproducer being available I'm now diving into the issue... One further thing to add is that I was able to avoid the crash by reverting xen/sched: rework and rename vcpu_force_reschedule() which is a part of the series. This made all tests with PV guests pass. Yeah, but removing this patch is just papering over a general issue. 
> The main problem seems to be a vcpu trying to pause another vcpu of the
> same sched_unit. I already have an idea what is really happening, I just
> need to verify it.

It was another problem than I initially thought, but I've found it.
"xl restore" is working now. :-)

Juergen
Re: [Xen-devel] [PATCH 00/60] xen: add core scheduling support
On 18.07.19 17:14, Sergey Dyasli wrote: On 18/07/2019 15:48, Juergen Gross wrote: On 15.07.19 16:08, Sergey Dyasli wrote: On 05/07/2019 14:56, Dario Faggioli wrote: On Fri, 2019-07-05 at 14:17 +0100, Sergey Dyasli wrote: 1) This crash is quite likely to happen: [2019-07-04 18:22:46 UTC] (XEN) [ 3425.220660] Watchdog timer detects that CPU2 is stuck! [2019-07-04 18:22:46 UTC] (XEN) [ 3425.226293] [ Xen-4.13.0- 8.0.6-d x86_64 debug=y Not tainted ] [...] [2019-07-04 18:22:47 UTC] (XEN) [ 3425.501989] Xen call trace: [2019-07-04 18:22:47 UTC] (XEN) [ 3425.505278] [] vcpu_sleep_sync+0x50/0x71 [2019-07-04 18:22:47 UTC] (XEN) [ 3425.511518] [] vcpu_pause+0x21/0x23 [2019-07-04 18:22:47 UTC] (XEN) [ 3425.517326] [] vcpu_set_periodic_timer+0x27/0x73 [2019-07-04 18:22:47 UTC] (XEN) [ 3425.524258] [] do_vcpu_op+0x2c9/0x668 [2019-07-04 18:22:47 UTC] (XEN) [ 3425.530238] [] compat_vcpu_op+0x250/0x390 [2019-07-04 18:22:47 UTC] (XEN) [ 3425.536566] [] pv_hypercall+0x364/0x564 [2019-07-04 18:22:47 UTC] (XEN) [ 3425.542719] [] do_entry_int82+0x26/0x2d [2019-07-04 18:22:47 UTC] (XEN) [ 3425.548876] [] entry_int82+0xbb/0xc0 Mmm... vcpu_set_periodic_timer? What kernel is this and when does this crash happen? Hi Dario, I can easily reproduce this crash using a Debian 7 PV VM (2 vCPUs, 2GB RAM) which has the following kernel: # uname -a Linux localhost 3.2.0-4-amd64 #1 SMP Debian 3.2.78-1 x86_64 GNU/Linux All I need to do is suspend and resume the VM. Happens with a more recent kernel, too. I can easily reproduce the issue with any PV guest with more than 1 vcpu by doing "xl save" and then "xl restore" again. With the reproducer being available I'm now diving into the issue... One further thing to add is that I was able to avoid the crash by reverting xen/sched: rework and rename vcpu_force_reschedule() which is a part of the series. This made all tests with PV guests pass. Yeah, but removing this patch is just papering over a general issue. 
The main problem seems to be a vcpu trying to pause another vcpu of the
same sched_unit. I already have an idea what is really happening, I just
need to verify it.

> Another question I have is do you have a git branch with core-scheduling
> patches rebased on top of current staging available somewhere?

Only the one Dario already mentioned.

Juergen
Re: [Xen-devel] [PATCH 00/60] xen: add core scheduling support
On Thu, 2019-07-18 at 16:14 +0100, Sergey Dyasli wrote:
> On 18/07/2019 15:48, Juergen Gross wrote:
> > I can easily reproduce the issue with any PV guest with more than 1
> > vcpu by doing "xl save" and then "xl restore" again.
> >
> > With the reproducer being available I'm now diving into the issue...
>
> One further thing to add is that I was able to avoid the crash by
> reverting
>
>   xen/sched: rework and rename vcpu_force_reschedule()
>
Ah, interesting!

> which is a part of the series. This made all tests with PV guests pass.
>
That's good news. :-)

> Another question I have is do you have a git branch with core-scheduling
> patches rebased on top of current staging available somewhere?
>
For my benchmarks, I used this:

  https://github.com/jgross1/xen/tree/sched-v1-rebase

I don't know if Juergen has another one which is even more updated.

Regards
--
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
Re: [Xen-devel] [PATCH 00/60] xen: add core scheduling support
On 18/07/2019 15:48, Juergen Gross wrote: > On 15.07.19 16:08, Sergey Dyasli wrote: >> On 05/07/2019 14:56, Dario Faggioli wrote: >>> On Fri, 2019-07-05 at 14:17 +0100, Sergey Dyasli wrote: 1) This crash is quite likely to happen: [2019-07-04 18:22:46 UTC] (XEN) [ 3425.220660] Watchdog timer detects that CPU2 is stuck! [2019-07-04 18:22:46 UTC] (XEN) [ 3425.226293] [ Xen-4.13.0- 8.0.6-d x86_64 debug=y Not tainted ] [...] [2019-07-04 18:22:47 UTC] (XEN) [ 3425.501989] Xen call trace: [2019-07-04 18:22:47 UTC] (XEN) [ 3425.505278] [] vcpu_sleep_sync+0x50/0x71 [2019-07-04 18:22:47 UTC] (XEN) [ 3425.511518] [] vcpu_pause+0x21/0x23 [2019-07-04 18:22:47 UTC] (XEN) [ 3425.517326] [] vcpu_set_periodic_timer+0x27/0x73 [2019-07-04 18:22:47 UTC] (XEN) [ 3425.524258] [] do_vcpu_op+0x2c9/0x668 [2019-07-04 18:22:47 UTC] (XEN) [ 3425.530238] [] compat_vcpu_op+0x250/0x390 [2019-07-04 18:22:47 UTC] (XEN) [ 3425.536566] [] pv_hypercall+0x364/0x564 [2019-07-04 18:22:47 UTC] (XEN) [ 3425.542719] [] do_entry_int82+0x26/0x2d [2019-07-04 18:22:47 UTC] (XEN) [ 3425.548876] [] entry_int82+0xbb/0xc0 >>> Mmm... vcpu_set_periodic_timer? >>> >>> What kernel is this and when does this crash happen? >> >> Hi Dario, >> >> I can easily reproduce this crash using a Debian 7 PV VM (2 vCPUs, 2GB RAM) >> which has the following kernel: >> >> # uname -a >> >> Linux localhost 3.2.0-4-amd64 #1 SMP Debian 3.2.78-1 x86_64 GNU/Linux >> >> All I need to do is suspend and resume the VM. > > Happens with a more recent kernel, too. > > I can easily reproduce the issue with any PV guest with more than 1 vcpu > by doing "xl save" and then "xl restore" again. > > With the reproducer being available I'm now diving into the issue... One further thing to add is that I was able to avoid the crash by reverting xen/sched: rework and rename vcpu_force_reschedule() which is a part of the series. This made all tests with PV guests pass. 
Another question I have is: do you have a git branch with core-scheduling
patches rebased on top of current staging available somewhere?

Thanks,
Sergey
Re: [Xen-devel] [PATCH 00/60] xen: add core scheduling support
On 15.07.19 16:08, Sergey Dyasli wrote:
> On 05/07/2019 14:56, Dario Faggioli wrote:
>> On Fri, 2019-07-05 at 14:17 +0100, Sergey Dyasli wrote:
>>> 1) This crash is quite likely to happen:
>>>
>>> [2019-07-04 18:22:46 UTC] (XEN) [ 3425.220660] Watchdog timer detects that CPU2 is stuck!
>>> [2019-07-04 18:22:46 UTC] (XEN) [ 3425.226293] [ Xen-4.13.0-8.0.6-d x86_64 debug=y Not tainted ]
>>> [...]
>>> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.501989] Xen call trace:
>>> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.505278] [] vcpu_sleep_sync+0x50/0x71
>>> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.511518] [] vcpu_pause+0x21/0x23
>>> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.517326] [] vcpu_set_periodic_timer+0x27/0x73
>>> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.524258] [] do_vcpu_op+0x2c9/0x668
>>> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.530238] [] compat_vcpu_op+0x250/0x390
>>> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.536566] [] pv_hypercall+0x364/0x564
>>> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.542719] [] do_entry_int82+0x26/0x2d
>>> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.548876] [] entry_int82+0xbb/0xc0
>>
>> Mmm... vcpu_set_periodic_timer?
>>
>> What kernel is this and when does this crash happen?
>
> Hi Dario,
>
> I can easily reproduce this crash using a Debian 7 PV VM (2 vCPUs, 2GB RAM)
> which has the following kernel:
>
> # uname -a
> Linux localhost 3.2.0-4-amd64 #1 SMP Debian 3.2.78-1 x86_64 GNU/Linux
>
> All I need to do is suspend and resume the VM.

Happens with a more recent kernel, too.

I can easily reproduce the issue with any PV guest with more than 1 vcpu
by doing "xl save" and then "xl restore" again.

With the reproducer being available I'm now diving into the issue...

Juergen
Re: [Xen-devel] [PATCH 00/60] xen: add core scheduling support
On 05/07/2019 14:17, Sergey Dyasli wrote:
> [2019-07-05 00:37:16 UTC] (XEN) [24907.482686] Watchdog timer detects that CPU30 is stuck!
> [2019-07-05 00:37:16 UTC] (XEN) [24907.514180] [ Xen-4.13.0-8.0.6-d x86_64 debug=y Not tainted ]
> [2019-07-05 00:37:16 UTC] (XEN) [24907.552070] CPU:30
> [2019-07-05 00:37:16 UTC] (XEN) [24907.565281] RIP: e008:[] sched_context_switched+0xaf/0x101
> [2019-07-05 00:37:16 UTC] (XEN) [24907.601232] RFLAGS: 0202 CONTEXT: hypervisor
> [2019-07-05 00:37:16 UTC] (XEN) [24907.629998] rax: 0002 rbx: 83202782e880 rcx: 001e
> [2019-07-05 00:37:16 UTC] (XEN) [24907.669651] rdx: 83202782e904 rsi: 832027823000 rdi: 832027823000
> [2019-07-05 00:37:16 UTC] (XEN) [24907.706560] rbp: 83403cab7d20 rsp: 83403cab7d00 r8:
> [2019-07-05 00:37:16 UTC] (XEN) [24907.743258] r9: r10: 0200200200200200 r11: 0100100100100100
> [2019-07-05 00:37:16 UTC] (XEN) [24907.779940] r12: 832027823000 r13: 832027823000 r14: 83202782e7b0
> [2019-07-05 00:37:16 UTC] (XEN) [24907.816849] r15: 83202782e880 cr0: 8005003b cr4: 000426e0
> [2019-07-05 00:37:16 UTC] (XEN) [24907.854125] cr3: bd8a1000 cr2: 1851b798
> [2019-07-05 00:37:16 UTC] (XEN) [24907.881483] fsb: gsb: gss:
> [2019-07-05 00:37:16 UTC] (XEN) [24907.918309] ds: es: fs: gs: ss: cs: e008
> [2019-07-05 00:37:16 UTC] (XEN) [24907.952619] Xen code around (sched_context_switched+0xaf/0x101):
> [2019-07-05 00:37:16 UTC] (XEN) [24907.990277] 00 00 eb 18 f3 90 8b 02 <85> c0 75 f8 eb 0e 49 8b 7e 30 48 85 ff 74 05 e8
> [2019-07-05 00:37:16 UTC] (XEN) [24908.032393] Xen stack trace from rsp=83403cab7d00:
> [2019-07-05 00:37:16 UTC] (XEN) [24908.061298] 832027823000 832027823000 83202782e880
> [2019-07-05 00:37:16 UTC] (XEN) [24908.098529] 83403cab7d60 82d0802407c0 0082 83202782e7c8
> [2019-07-05 00:37:16 UTC] (XEN) [24908.135622] 001e 83202782e7c8 001e 82d080602628
> [2019-07-05 00:37:16 UTC] (XEN) [24908.172671] 83403cab7dc0 82d080240d83 df99 001e
> [2019-07-05 00:37:16 UTC] (XEN) [24908.210212] 832027823000 16a62dc8c6bc 00fc 001e
> [2019-07-05 00:37:16 UTC] (XEN) [24908.247181] 83202782e7c8 82d080602628 82d0805da460 001e
> [2019-07-05 00:37:16 UTC] (XEN) [24908.284279] 83403cab7e60 82d080240ea4 0002802aecc5 832027823000
> [2019-07-05 00:37:16 UTC] (XEN) [24908.321128] 83202782e7b0 83202782e880 83403cab7e10 82d080273b4e
> [2019-07-05 00:37:16 UTC] (XEN) [24908.358308] 83403cab7e10 82d080242f7f 83403cab7e60 82d08024663a
> [2019-07-05 00:37:17 UTC] (XEN) [24908.395662] 83403cab7ea0 82d0802ec32a 834000ff 82d0805bc880
> [2019-07-05 00:37:17 UTC] (XEN) [24908.432376] 82d0805bb980 83403cab7fff 001e
> [2019-07-05 00:37:17 UTC] (XEN) [24908.469812] 83403cab7e90 82d080242575 0f00 82d0805bb980
> [2019-07-05 00:37:17 UTC] (XEN) [24908.508373] 001e 82d0806026f0 83403cab7ea0 82d0802425ca
> [2019-07-05 00:37:17 UTC] (XEN) [24908.549856] 83403cab7ef0 82d08027a601 82d080242575 001e7ffde000
> [2019-07-05 00:37:17 UTC] (XEN) [24908.588022] 832027823000 832027823000 83127ffde000 83203ffe5000
> [2019-07-05 00:37:17 UTC] (XEN) [24908.625217] 001e 831204092000 83403cab7d78 ffed
> [2019-07-05 00:37:17 UTC] (XEN) [24908.662932] 8180 8180
> [2019-07-05 00:37:17 UTC] (XEN) [24908.703246] 818f4580 880039118848 0e6a3c4b2698 148900db
> [2019-07-05 00:37:17 UTC] (XEN) [24908.743671] 8101e650 8185c3e0
> [2019-07-05 00:37:17 UTC] (XEN) [24908.781927] beefbeef 81054eb2
> [2019-07-05 00:37:17 UTC] (XEN) [24908.820986] Xen call trace:
> [2019-07-05 00:37:17 UTC] (XEN) [24908.836789] [] sched_context_switched+0xaf/0x101
> [2019-07-05 00:37:17 UTC] (XEN) [24908.869916] [] schedule.c#sched_context_switch+0x72/0x151
> [2019-07-05 00:37:17 UTC] (XEN) [24908.907384] [] schedule.c#sched_slave+0x2a3/0x2b2
> [2019-07-05 00:37:17 UTC] (XEN) [24908.941241] [] schedule.c#schedule+0x112/0x2a1
> [2019-07-05 00:37:17 UTC] (XEN) [24908.973939] [] softirq.c#__do_softirq+0x85/0x90
> [2019-07-05 00:37:17 UTC] (XEN) [24909.007101] [] do_softirq+0x13/0x15
> [2019-07-05
Re: [Xen-devel] [PATCH 00/60] xen: add core scheduling support
On 05/07/2019 14:56, Dario Faggioli wrote:
> On Fri, 2019-07-05 at 14:17 +0100, Sergey Dyasli wrote:
>> 1) This crash is quite likely to happen:
>>
>> [2019-07-04 18:22:46 UTC] (XEN) [ 3425.220660] Watchdog timer detects that CPU2 is stuck!
>> [2019-07-04 18:22:46 UTC] (XEN) [ 3425.226293] [ Xen-4.13.0-8.0.6-d x86_64 debug=y Not tainted ]
>> [...]
>> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.501989] Xen call trace:
>> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.505278] [] vcpu_sleep_sync+0x50/0x71
>> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.511518] [] vcpu_pause+0x21/0x23
>> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.517326] [] vcpu_set_periodic_timer+0x27/0x73
>> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.524258] [] do_vcpu_op+0x2c9/0x668
>> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.530238] [] compat_vcpu_op+0x250/0x390
>> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.536566] [] pv_hypercall+0x364/0x564
>> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.542719] [] do_entry_int82+0x26/0x2d
>> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.548876] [] entry_int82+0xbb/0xc0
>>
> Mmm... vcpu_set_periodic_timer?
>
> What kernel is this and when does this crash happen?

Hi Dario,

I can easily reproduce this crash using a Debian 7 PV VM (2 vCPUs, 2GB RAM) which has the following kernel:

# uname -a
Linux localhost 3.2.0-4-amd64 #1 SMP Debian 3.2.78-1 x86_64 GNU/Linux

All I need to do is suspend and resume the VM.

Thanks,
Sergey

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
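[For reference, the suspend/resume cycle Sergey describes can be driven from dom0 with the standard xl toolstack commands. This is just an illustrative sketch of that repro procedure; the domain name "deb7" and the checkpoint path are placeholders, not from the report.]

```shell
# Checkpoint the PV guest to a file, then resume it from the same file.
# Domain name and path are placeholders.
xl save deb7 /var/lib/xen/save/deb7.chk
xl restore /var/lib/xen/save/deb7.chk
```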
Re: [Xen-devel] [PATCH 00/60] xen: add core scheduling support
On Tue, 2019-05-28 at 12:32 +0200, Juergen Gross wrote:
> Add support for core- and socket-scheduling in the Xen hypervisor.
>
> [...]
>
> I have done some very basic performance testing: on a 4 cpu system
> (2 cores with 2 threads each) I did a "make -j 4" for building the Xen
> hypervisor. The test has been run on dom0, once with no other guest
> active and once with another guest with 4 vcpus running the same
> test. The results are (always elapsed time, system time, user time):
>
> sched-gran=cpu,  no other guest: 116.10 177.65 207.84
> sched-gran=core, no other guest: 114.04 175.47 207.45
> sched-gran=cpu,  other guest:    202.30 334.21 384.63
> sched-gran=core, other guest:    207.24 293.04 371.37
>
> The performance tests have been performed with credit2, the other
> schedulers are tested only briefly to be able to create a domain in a
> cpupool.
>
I have done some more, and I'd like to report the results here.

For those who are attending the Xen-Project Dev Summit and have seen Juergen's talk about core-scheduling, these are the numbers he had in his slides. There are quite a few numbers, and there are multiple ways to show them. We arranged them in two different ways, and I'm showing both.

Since it's quite likely that the tables will render poorly in mail clients, here are the links to view them in a browser or download the text files:

http://xenbits.xen.org/people/dariof/benchmarks/results/2019/07-July/xen/core-sched/mmtests/boxes/wayrath/summary.txt
http://xenbits.xen.org/people/dariof/benchmarks/results/2019/07-July/xen/core-sched/mmtests/boxes/wayrath/summary-5columns.txt

The two files contain the same numbers, but 'summary-5columns.txt' arranges them in tables that show, for each combination of benchmark and configuration, the differences between the various options (i.e., no core-scheduling, core-scheduling not used, core-scheduling in use, etc).
The 'summary.txt' file contains some more data (such as the results of runs done on bare metal), arranged in different tables. It also contains some of my thoughts and analysis about what the numbers tell us.

It's quite hard to come up with a concise summary, as results vary a lot on a case by case basis, and there are a few things that need to be investigated more. I'll try anyway, but please, if you are interested in the subject, do have a look at the numbers themselves, even if there are a lot of them:

- Overhead: the cost of having this patch series applied, and not using core-scheduling, seems acceptable to me. In most cases, the overhead is within the noise margin of the measurements. There are a couple of benchmarks where this is not the case. But that means we can go try to figure out why this happens only there and, potentially, optimize and tune.

- PV vs. HVM: there seem to be some differences, in some of the results, for different types of guest (well, for PV I used dom0). In general, HVM seems to behave a little worse, i.e., it suffers from more overhead and performance degradation, but this is not the case for all benchmarks, so it's hard to tell whether it's something specific or an actual trend. I don't have the numbers for proper PV guests and for PVH. I expect the former to be close to the dom0 numbers and the latter to the HVM numbers, but I'll try to do those runs as well (as soon as the testbox is free again).

- HT vs. no-HT: even before considering core-scheduling at all, the debate is still open about whether or not Hyperthreading helps in the first place. These numbers show that this very much depends on the workload and on the load, which is no big surprise. It is quite a solid trend, however, that when load is high (look, for instance, at runs that saturate the CPU, or at oversubscribed runs), Hyperthreading lets us achieve better results.

- Regressions versus no core-scheduling: this happens, as could have been expected.
It does not happen 100% of the time, and mileage may vary, but in most benchmarks and in most configurations, we do regress.

- Core-scheduling vs. no-Hyperthreading: this is again a mixed bag. There are cases where things are faster in one setup, and cases where it is the other one that wins. Especially in the non-overloaded case.

- Core-scheduling and overloading: when more vCPUs than pCPUs are used (and there is actual overload, i.e., the vCPUs actually generate more load than there are pCPUs to satisfy it), core-scheduling shows pretty decent performance. This is easy to see when comparing core-scheduling with no-Hyperthreading in the overloaded cases. In most benchmarks, both configurations perform worse than the default, but core-scheduling regresses a lot less than no-Hyperthreading. And this, I think, is quite important!

- Core-scheduling and HT-aware scheduling: currently, the scheduler tends to spread vCPUs among cores. That is, if we have 2 vCPUs and 2 cores with two threads each, the
Re: [Xen-devel] [PATCH 00/60] xen: add core scheduling support
On Fri, 2019-07-05 at 14:17 +0100, Sergey Dyasli wrote:
> 1) This crash is quite likely to happen:
>
> [2019-07-04 18:22:46 UTC] (XEN) [ 3425.220660] Watchdog timer detects that CPU2 is stuck!
> [2019-07-04 18:22:46 UTC] (XEN) [ 3425.226293] [ Xen-4.13.0-8.0.6-d x86_64 debug=y Not tainted ]
> [...]
> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.501989] Xen call trace:
> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.505278] [] vcpu_sleep_sync+0x50/0x71
> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.511518] [] vcpu_pause+0x21/0x23
> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.517326] [] vcpu_set_periodic_timer+0x27/0x73
> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.524258] [] do_vcpu_op+0x2c9/0x668
> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.530238] [] compat_vcpu_op+0x250/0x390
> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.536566] [] pv_hypercall+0x364/0x564
> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.542719] [] do_entry_int82+0x26/0x2d
> [2019-07-04 18:22:47 UTC] (XEN) [ 3425.548876] [] entry_int82+0xbb/0xc0

Mmm... vcpu_set_periodic_timer?

What kernel is this and when does this crash happen?

Thanks and Regards
--
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
Re: [Xen-devel] [PATCH 00/60] xen: add core scheduling support
On 05.07.19 15:17, Sergey Dyasli wrote:
> Hi Juergen,
>
> I did some testing of this series (with sched-gran=core) and I'm posting a couple of crash backtraces here for your information. Additionally, resuming a Debian 7 guest after suspend is broken. I will be able to provide any additional information only after XenSummit :)

Thanks for the reports! I will look at this after XenSummit. :-)

Juergen
Re: [Xen-devel] [PATCH 00/60] xen: add core scheduling support
Hi Juergen,

I did some testing of this series (with sched-gran=core) and I'm posting a couple of crash backtraces here for your information. Additionally, resuming a Debian 7 guest after suspend is broken. I will be able to provide any additional information only after XenSummit :)

1) This crash is quite likely to happen:

[2019-07-04 18:22:46 UTC] (XEN) [ 3425.220660] Watchdog timer detects that CPU2 is stuck!
[2019-07-04 18:22:46 UTC] (XEN) [ 3425.226293] [ Xen-4.13.0-8.0.6-d x86_64 debug=y Not tainted ]
[2019-07-04 18:22:46 UTC] (XEN) [ 3425.233576] CPU:2
[2019-07-04 18:22:46 UTC] (XEN) [ 3425.236348] RIP: e008:[] vcpu_sleep_sync+0x50/0x71
[2019-07-04 18:22:46 UTC] (XEN) [ 3425.243458] RFLAGS: 0202 CONTEXT: hypervisor (d34v0)
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.250129] rax: 0001 rbx: 8305f29e6000 rcx: 8305f29e6128
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.258101] rdx: rsi: 0296 rdi: 8308066f9128
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.266076] rbp: 8308066f7cb8 rsp: 8308066f7ca8 r8: deadf00d
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.274052] r9: deadf00d r10: r11:
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.282026] r12: r13: 8305f29e6000 r14:
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.289994] r15: 0003 cr0: 8005003b cr4: 001526e0
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.297970] cr3: 0005f2de3000 cr2: c012ae78
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.303864] fsb: 04724000 gsb: c52c4a20 gss:
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.311836] ds: 007b es: 007b fs: 00d8 gs: 00e0 ss: cs: e008
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.319290] Xen code around (vcpu_sleep_sync+0x50/0x71):
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.326744] ec 01 00 00 09 d0 48 98 <48> 0b 83 20 01 00 00 74 09 80 bb 07 01 00 00 00
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.335152] Xen stack trace from rsp=8308066f7ca8:
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.340783] 82d0802aede4 8305f29e6000 8308066f7cc8 82d080208370
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.348844] 8308066f7ce8 82d08023e25d 0001 8305f33f
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.356904] 8308066f7d58 82d080209682 031c63c966ad ed601000
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.364963] 92920063 0009 8305f33f 0001
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.373024] 0292 82d080242ee2 0001 8305f29e6000
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.381084] 8305f33f 8308066f7e28 82d08024f970
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.389144] 8305f33f00d4 000c 8305f33f deadf00d
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.397207] 8308066f7da8 82d0802b3754 82d080209d46 82d08020b6e7
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.405262] 8308066f7e28 82d08020c658 0002ec86be74 0002
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.413325] 8305c33d8300 8305f33f00d4 000c0008
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.421383] 0009 83081cca1000 82d08038835a 8308066f7ef8
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.429445] 8306a2b11000 deadf00d 0180 0003
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.437503] 8308066f7ec8 82d080383964 82d08038835a 82d7
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.445565] 82d1 82d0 82d0deadf00d 82d0deadf00d
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.453624] 82d08038835a 82d08038834e 82d08038835a 82d08038834e
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.461683] 82d08038835a 82d08038834e 82d08038835a 8308066f7ef8
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.469744]
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.477803] 8308066f7ee8 82d080385644 82d08038835a 8306a2b11000
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.485865] 7cf7f99080e7 82d08038839b
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.493923] 0001 0007
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.501989] Xen call trace:
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.505278] [] vcpu_sleep_sync+0x50/0x71
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.511518] [] vcpu_pause+0x21/0x23
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.517326] [] vcpu_set_periodic_timer+0x27/0x73
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.524258] [] do_vcpu_op+0x2c9/0x668
[2019-07-04 18:22:47 UTC] (XEN) [ 3425.530238] [] compat_vcpu_op+0x250/0x390
[2019-07-04 18:22:47 UTC] (XEN) [
[Xen-devel] [PATCH 00/60] xen: add core scheduling support
Add support for core- and socket-scheduling in the Xen hypervisor.

Via boot parameter sched-gran=core (or sched-gran=socket) it is possible to change the scheduling granularity from cpu (the default) to either whole cores or even sockets. All logical cpus (threads) of the core or socket are always scheduled together. This means that on a core always vcpus of the same domain will be active, and those vcpus will always be scheduled at the same time.

This is achieved by switching the scheduler to no longer see vcpus as the primary object to schedule, but "schedule units". Each schedule unit consists of as many vcpus as each core has threads on the current system. The vcpu->unit relation is fixed.

I have done some very basic performance testing: on a 4 cpu system (2 cores with 2 threads each) I did a "make -j 4" for building the Xen hypervisor. The test has been run on dom0, once with no other guest active and once with another guest with 4 vcpus running the same test. The results are (always elapsed time, system time, user time):

sched-gran=cpu,  no other guest: 116.10 177.65 207.84
sched-gran=core, no other guest: 114.04 175.47 207.45
sched-gran=cpu,  other guest:    202.30 334.21 384.63
sched-gran=core, other guest:    207.24 293.04 371.37

The performance tests have been performed with credit2, the other schedulers are tested only briefly to be able to create a domain in a cpupool.

Cpupools have been moderately tested (cpu add/remove, create, destroy, move domain). Cpu on-/offlining has been moderately tested, too.
The complete patch series is available under:

  git://github.com/jgross1/xen/ sched-v1

Changes in V1:
- cpupools are working now
- cpu on-/offlining working now
- all schedulers working now
- renamed "items" to "units"
- introduction of "idle scheduler"
- several new patches (see individual patches, mostly splits of former patches or cpupool and cpu on-/offlining support)
- all review comments addressed
- some minor changes (see individual patches)

Changes in RFC V2:
- ARM is building now
- HVM domains are working now
- idling will always be done with idle_vcpu active
- other small changes see individual patches

Juergen Gross (60):
  xen/sched: only allow schedulers with all mandatory functions available
  xen/sched: add inline wrappers for calling per-scheduler functions
  xen/sched: let sched_switch_sched() return new lock address
  xen/sched: use new sched_unit instead of vcpu in scheduler interfaces
  xen/sched: alloc struct sched_unit for each vcpu
  xen/sched: move per-vcpu scheduler private data pointer to sched_unit
  xen/sched: build a linked list of struct sched_unit
  xen/sched: introduce struct sched_resource
  xen/sched: let pick_cpu return a scheduler resource
  xen/sched: switch schedule_data.curr to point at sched_unit
  xen/sched: move per cpu scheduler private data into struct sched_resource
  xen/sched: switch vcpu_schedule_lock to unit_schedule_lock
  xen/sched: move some per-vcpu items to struct sched_unit
  xen/sched: add scheduler helpers hiding vcpu
  xen/sched: add domain pointer to struct sched_unit
  xen/sched: add id to struct sched_unit
  xen/sched: rename scheduler related perf counters
  xen/sched: switch struct task_slice from vcpu to sched_unit
  xen/sched: add is_running indicator to struct sched_unit
  xen/sched: make null scheduler vcpu agnostic.
  xen/sched: make rt scheduler vcpu agnostic.
  xen/sched: make credit scheduler vcpu agnostic.
  xen/sched: make credit2 scheduler vcpu agnostic.
  xen/sched: make arinc653 scheduler vcpu agnostic.
  xen: add sched_unit_pause_nosync() and sched_unit_unpause()
  xen: let vcpu_create() select processor
  xen/sched: use sched_resource cpu instead smp_processor_id in schedulers
  xen/sched: switch schedule() from vcpus to sched_units
  xen/sched: switch sched_move_irqs() to take sched_unit as parameter
  xen: switch from for_each_vcpu() to for_each_sched_unit()
  xen/sched: add runstate counters to struct sched_unit
  xen/sched: rework and rename vcpu_force_reschedule()
  xen/sched: Change vcpu_migrate_*() to operate on schedule unit
  xen/sched: move struct task_slice into struct sched_unit
  xen/sched: add code to sync scheduling of all vcpus of a sched unit
  xen/sched: introduce unit_runnable_state()
  xen/sched: add support for multiple vcpus per sched unit where missing
  x86: make loading of GDT at context switch more modular
  x86: optimize loading of GDT at context switch
  xen/sched: modify cpupool_domain_cpumask() to be an unit mask
  xen/sched: support allocating multiple vcpus into one sched unit
  xen/sched: add a scheduler_percpu_init() function
  xen/sched: add a percpu resource index
  xen/sched: add fall back to idle vcpu when scheduling unit
  xen/sched: make vcpu_wake() and vcpu_sleep() core scheduling aware
  xen/sched: carve out freeing sched_unit memory into dedicated function
  xen/sched: move per-cpu variable scheduler to struct sched_resource
  xen/sched: