Re: [Xen-devel] Regression, host crash with 4.5rc1
On 27.02.15 at 18:50, len.br...@intel.com wrote:
> If this issue were to happen on Linux/bare-metal, this is how I'd debug it. Hopefully some of this will translate to Xen in one way or another.

Sadly not really - the kernel plays only a minor role (forwarding ACPI data to the hypervisor) in C-state handling under Xen.

> dmesg | grep idle will tell us what idle driver is running (on Dom0 kernel) and if it is intel_idle, it will also tell us the supported sub-states (CPUID.MWAIT.EDX value)

Yeah, we call the driver mwait-idle in the hypervisor, and the log would be accessible via xl dmesg, but yes, that information is available there too.

> (XEN) C1: type[C1] latency[003] usage[12219860] method[ FFH] duration[1190961948551]
> (XEN) C2: type[C1] latency[010] usage[10205554] method[ FFH] duration[2015393965907]
> (XEN) C3: type[C2] latency[020] usage[50926286] method[ FFH] duration[30527997858148]
>
> I'm hopeful that this information comes from the hardware's BIOS and not some hypervisor tricking out Dom0 with a fake BIOS, yes?

In the case of mwait-idle (intel_idle on Linux) it would be built-in knowledge of the driver. For acpi-cpuidle it would come from actual firmware, not anything fake/virtual.

> Next, hopefully the attached turbostat utility can be invoked on Dom0 and it can read the MSRs on at least 1 processor via the /dev/cpu interface.

Yes, that would be possible, provided it's not important what specific CPU it gets executed on.

> It may tell us just the same thing I think we learned here:
>
> (XEN) PC2[0] PC3[8589642315848] PC6[0] PC7[0]
> (XEN) CC3[28794734145697] CC6[0] CC7[0]
>
> which I'm assuming are a dump of the MSR residency counters. If yes, it appears to be that this platform is not invoking c6 and pc6 at all, and that the deepest state being used is actually cc3 and pc3. I don't know if that is because you've booted the kernel with max_cstate=N of some kind, or if this is default.

Sadly I haven't been able to tell which original mail the quotes above are from, and since I had Steve experiment with disabling the deepest C-state permitted to be used, it may well be that this output was from one of those experiments. Remember, we already know that with use of C6 alone disabled things work for him (Steve - please correct me if I'm misremembering).

> Guessing...
>
> If no surprises in the debug stuff requested above, and If the XEN debug stuff above is with c6 explicitly disabled...
>
> Note that there are two kinds of c6 -- CC6 (core) and PC6 (package). If this box supports both, the next thing to try will be to keep CC6 enabled, but to just disable PC6. This is done via an MSR that turbostat dumps out (MSR_NHM_SNB_PKG_CST_CFG_CTL) via the wrmsr(8) utility.

I don't think the wrmsr tool can be used (unmodified) to reliably do this on all CPUs in the system - we'd likely have to cook up a patch to the hypervisor instead, or I'd have to hand my patch to msr-tools to Steve so he could use the tool under Xen (albeit that would also require him to use one of our forward ported kernels, as the upstream one doesn't have a pCPU sysfs interface yet afaik).

> Though if that MSR is locked by the BIOS, then BIOS SETUP option may be the only way to disable the package C-state limit without also disabling the associated core C-state.

Steve, could you check whether any such option exists (it's been a while, so apologies if we had asked already)?

Jan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
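For readers following the PC6 suggestion, here is a minimal sketch of how a raw value of the package C-state configuration MSR could be decoded before deciding whether a write is even possible. It is not taken from turbostat or Xen. The field layout (limit in bits 2:0, lock in bit 15) and the Nehalem/Westmere limit encoding are assumptions based on Intel's public MSR documentation and should be checked against the documentation for the exact CPU; the function names and the sample values are hypothetical.

```python
# Sketch only: assumed layout of the package C-state config MSR on
# Nehalem/Westmere. Verify against Intel's documentation for your CPU.
PKG_CST_LIMIT_MASK = 0x7   # bits 2:0, package C-state limit (assumed)
CFG_LOCK_BIT = 1 << 15     # bit 15, lock bit set by firmware (assumed)

# Assumed Nehalem/Westmere encoding of the limit field.
NHM_PKG_LIMITS = {0: "PC0", 1: "PC1", 2: "PC3", 3: "PC6", 4: "PC7", 7: "unlimited"}


def decode_pkg_cst_cfg(value):
    """Return the package C-state limit name and the lock status."""
    limit = value & PKG_CST_LIMIT_MASK
    return {
        "limit": NHM_PKG_LIMITS.get(limit, "reserved"),
        "locked": bool(value & CFG_LOCK_BIT),
    }


def with_pkg_limit(value, limit_code):
    """Compute the value to write for a new limit, refusing if locked."""
    if value & CFG_LOCK_BIT:
        raise PermissionError("MSR is locked by firmware; use BIOS setup instead")
    return (value & ~PKG_CST_LIMIT_MASK) | (limit_code & PKG_CST_LIMIT_MASK)


if __name__ == "__main__":
    sample = 0x8403  # hypothetical raw value, not from Steve's machine
    print(decode_pkg_cst_cfg(sample))
```

The lock check mirrors Len's caveat: if firmware has locked the register, only a BIOS setup option can change the package limit. Jan's point also still applies, since under Xen the write would have to reach every physical CPU, which an unmodified wrmsr run from Dom0 may not do.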
Re: [Xen-devel] Regression, host crash with 4.5rc1
Len (CC'd on this email) is our power expert who has some ideas on this issue, I'll let him explain further. -- Don Dugger Censeo Toto nos in Kansa esse decisse. - D. Gale Ph: 303/443-3786 -Original Message- From: Jan Beulich [mailto:jbeul...@suse.com] Sent: Thursday, November 27, 2014 2:28 AM To: Steve Freitas; Dugger, Donald D; Nakajima, Jun Cc: xen-devel@lists.xen.org; Don Slutz Subject: Re: [Xen-devel] Regression, host crash with 4.5rc1 On 27.11.14 at 06:29, sfl...@ihonk.com wrote: On 11/25/2014 03:00 AM, Jan Beulich wrote: Okay, so it's not really the mwait-idle driver causing the regression, but it is C-state related. Hence we're now down to seeing whether all or just the deeper C states are affected, i.e. I now need to ask you to play with max_cstate=. For that you'll have to remember that the option's effect differs between the ACPI and the MWAIT idle drivers. In the spirit of bisection I'd suggest using max_cstate=2 first no matter which of the two scenarios you pick. If that still hangs, max_cstate=1 obviously is the only other thing to try. Should that not hang (and you left out mwait-idle=0), trying max_cstate=3 in that same scenario would be the other case to check. No need for 'd' and 'a' output for the time being, but 'c' output would be much appreciated for all cases where you observe hangs. Okay, working through that now. I tried max_cstate=2 and got no hangs, whether with or without mwait-idle=0. 
However, I was puzzled by this: (XEN) 'c' pressed - printing ACPI Cx structures (XEN) ==cpu0== (XEN) active state: C0 (XEN) max_cstate: C2 (XEN) states: (XEN) C1: type[C1] latency[003] usage[12219860] method[ FFH] duration[1190961948551] (XEN) C2: type[C1] latency[010] usage[10205554] method[ FFH] duration[2015393965907] (XEN) C3: type[C2] latency[020] usage[50926286] method[ FFH] duration[30527997858148] (XEN)*C0: usage[73351700] duration[9974627547595] (XEN) max=0 pwr=0 urg=0 nxt=0 (XEN) PC2[0] PC3[8589642315848] PC6[0] PC7[0] (XEN) CC3[28794734145697] CC6[0] CC7[0] (XEN) ==cpu1== (XEN) active state: C3 (XEN) max_cstate: C2 (XEN) states: (XEN) C1: type[C1] latency[003] usage[10699950] method[ FFH] duration[1141422044112] (XEN) C2: type[C1] latency[010] usage[06382904] method[ FFH] duration[1329739264322] (XEN)*C3: type[C2] latency[020] usage[44630764] method[ FFH] duration[31676618425954] (XEN) C0: usage[61713618] duration[9561201640320] (XEN) max=0 pwr=0 urg=0 nxt=0 (XEN) PC2[0] PC3[8589642315848] PC6[0] PC7[0] (XEN) CC3[30066495105056] CC6[0] CC7[0] [...] Why would some of the cores be in C3 even though they list max_cstate as C2? This was precisely the reason why I told you that the numbering differs (and is confusing and has nothing to do with actual C state numbers): What max_cstate refers to in the mwait-idle driver is what above is listed as type[Cx], i.e. the state at index 1 is C1, at 2 we've got C1E, and at 3 we've got C2. And those still aren't in line with the numbering the CPU documentation uses, it's rather kind of meant to refer to the ACPI numbering (but probably also not fully matching up). So max_cstate=2 working suggests a problem with what the CPU calls C6, which presumably isn't all that surprising considering the many errata (BD35, BD38, BD40, BD59, BD87, and BD104). Not sure how to proceed from here - I suppose you already made sure you run with the latest available BIOS. 
And with 6 errata documented it's not all that unlikely that there's a 7th one with MONITOR/MWAIT behavior. The commit you bisected to (and which you had verified to be the culprit by just forcing arch_skip_send_event_check() to always return false) could be reasonably assumed to be broken only when MWAIT use for all C states didn't work. Don, Jun - is there anything known but not yet publicly documented for Family 6 Model 44 Xeons? Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Regression, host crash with 4.5rc1
(Please forgive my lack of Xen-fu knowledge in advance)

If this issue were to happen on Linux/bare-metal, this is how I'd debug it. Hopefully some of this will translate to Xen in one way or another.

    dmesg | grep idle

will tell us what idle driver is running (on Dom0 kernel) and if it is intel_idle, it will also tell us the supported sub-states (CPUID.MWAIT.EDX value)

    grep . /sys/devices/system/cpu/cpu0/cpuidle/*/*

will tell us what states the OS is requesting. It will expand on the FFH bit here:

(XEN) C1: type[C1] latency[003] usage[12219860] method[ FFH] duration[1190961948551]
(XEN) C2: type[C1] latency[010] usage[10205554] method[ FFH] duration[2015393965907]
(XEN) C3: type[C2] latency[020] usage[50926286] method[ FFH] duration[30527997858148]

I'm hopeful that this information comes from the hardware's BIOS and not some hypervisor tricking out Dom0 with a fake BIOS, yes?

If Xen doesn't have cpuidle, or its sysfs, then acpidump for the platform should be able to tell us what the platform is exporting.

Next, hopefully the attached turbostat utility can be invoked on Dom0 and it can read the MSRs on at least 1 processor via the /dev/cpu interface. This will tell you what the hardware supports, and what HW states are actually being invoked. (which may be different from what the OS asks for...)

It may tell us just the same thing I think we learned here:

(XEN) PC2[0] PC3[8589642315848] PC6[0] PC7[0]
(XEN) CC3[28794734145697] CC6[0] CC7[0]

which I'm assuming are a dump of the MSR residency counters. If yes, it appears to be that this platform is not invoking c6 and pc6 at all, and that the deepest state being used is actually cc3 and pc3. I don't know if that is because you've booted the kernel with max_cstate=N of some kind, or if this is default.

attached is turbostat, source and binary, run it this way and send the ts.out file:

    # ./turbostat --debug sleep 5 > ts.out 2>&1

Guessing...

If no surprises in the debug stuff requested above, and If the XEN debug stuff above is with c6 explicitly disabled...

Note that there are two kinds of c6 -- CC6 (core) and PC6 (package). If this box supports both, the next thing to try will be to keep CC6 enabled, but to just disable PC6. This is done via an MSR that turbostat dumps out (MSR_NHM_SNB_PKG_CST_CFG_CTL) via the wrmsr(8) utility.

Though if that MSR is locked by the BIOS, then BIOS SETUP option may be the only way to disable the package C-state limit without also disabling the associated core C-state.

cheers,
-Len

ps.

turbostat-test.tar.gz
Description: turbostat-test.tar.gz
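Len's reading of the residency lines can be reproduced mechanically. The sketch below parses the PCn[...] and CCn[...] fields of the 'c' debug-key output and reports the deepest state with non-zero residency. The function names are hypothetical, and the line format is assumed to be exactly what appears in this thread.

```python
import re

# Matches fields such as "PC3[8589642315848]" or "CC6[0]".
FIELD_RE = re.compile(r"\b([PC]C\d)\[(\d+)\]")


def parse_residency(line):
    """Map each residency counter name on the line to its integer value."""
    return {name: int(value) for name, value in FIELD_RE.findall(line)}


def deepest_used(counters, prefix):
    """Return the deepest state with the given prefix ("PC" or "CC")
    whose counter is non-zero, or None if none was ever entered."""
    used = [name for name, ticks in counters.items()
            if name.startswith(prefix) and ticks > 0]
    return max(used, key=lambda name: int(name[2:]), default=None)


if __name__ == "__main__":
    pkg = parse_residency("(XEN) PC2[0] PC3[8589642315848] PC6[0] PC7[0]")
    core = parse_residency("(XEN) CC3[28794734145697] CC6[0] CC7[0]")
    print(deepest_used(pkg, "PC"), deepest_used(core, "CC"))
```

For the lines quoted above this gives PC3 and CC3, which matches Len's observation that neither CC6 nor PC6 was entered in that run.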
Re: [Xen-devel] Regression, host crash with 4.5rc1
Jan- No, I have no knowledge of an unpublished errata related to C State issues. -- Don Dugger Censeo Toto nos in Kansa esse decisse. - D. Gale Ph: 303/443-3786 -Original Message- From: Jan Beulich [mailto:jbeul...@suse.com] Sent: Thursday, November 27, 2014 2:28 AM To: Steve Freitas; Dugger, Donald D; Nakajima, Jun Cc: xen-devel@lists.xen.org; Don Slutz Subject: Re: [Xen-devel] Regression, host crash with 4.5rc1 On 27.11.14 at 06:29, sfl...@ihonk.com wrote: On 11/25/2014 03:00 AM, Jan Beulich wrote: Okay, so it's not really the mwait-idle driver causing the regression, but it is C-state related. Hence we're now down to seeing whether all or just the deeper C states are affected, i.e. I now need to ask you to play with max_cstate=. For that you'll have to remember that the option's effect differs between the ACPI and the MWAIT idle drivers. In the spirit of bisection I'd suggest using max_cstate=2 first no matter which of the two scenarios you pick. If that still hangs, max_cstate=1 obviously is the only other thing to try. Should that not hang (and you left out mwait-idle=0), trying max_cstate=3 in that same scenario would be the other case to check. No need for 'd' and 'a' output for the time being, but 'c' output would be much appreciated for all cases where you observe hangs. Okay, working through that now. I tried max_cstate=2 and got no hangs, whether with or without mwait-idle=0. 
However, I was puzzled by this: (XEN) 'c' pressed - printing ACPI Cx structures (XEN) ==cpu0== (XEN) active state: C0 (XEN) max_cstate: C2 (XEN) states: (XEN) C1: type[C1] latency[003] usage[12219860] method[ FFH] duration[1190961948551] (XEN) C2: type[C1] latency[010] usage[10205554] method[ FFH] duration[2015393965907] (XEN) C3: type[C2] latency[020] usage[50926286] method[ FFH] duration[30527997858148] (XEN)*C0: usage[73351700] duration[9974627547595] (XEN) max=0 pwr=0 urg=0 nxt=0 (XEN) PC2[0] PC3[8589642315848] PC6[0] PC7[0] (XEN) CC3[28794734145697] CC6[0] CC7[0] (XEN) ==cpu1== (XEN) active state: C3 (XEN) max_cstate: C2 (XEN) states: (XEN) C1: type[C1] latency[003] usage[10699950] method[ FFH] duration[1141422044112] (XEN) C2: type[C1] latency[010] usage[06382904] method[ FFH] duration[1329739264322] (XEN)*C3: type[C2] latency[020] usage[44630764] method[ FFH] duration[31676618425954] (XEN) C0: usage[61713618] duration[9561201640320] (XEN) max=0 pwr=0 urg=0 nxt=0 (XEN) PC2[0] PC3[8589642315848] PC6[0] PC7[0] (XEN) CC3[30066495105056] CC6[0] CC7[0] [...] Why would some of the cores be in C3 even though they list max_cstate as C2? This was precisely the reason why I told you that the numbering differs (and is confusing and has nothing to do with actual C state numbers): What max_cstate refers to in the mwait-idle driver is what above is listed as type[Cx], i.e. the state at index 1 is C1, at 2 we've got C1E, and at 3 we've got C2. And those still aren't in line with the numbering the CPU documentation uses, it's rather kind of meant to refer to the ACPI numbering (but probably also not fully matching up). So max_cstate=2 working suggests a problem with what the CPU calls C6, which presumably isn't all that surprising considering the many errata (BD35, BD38, BD40, BD59, BD87, and BD104). Not sure how to proceed from here - I suppose you already made sure you run with the latest available BIOS. 
And with 6 errata documented it's not all that unlikely that there's a 7th one with MONITOR/MWAIT behavior. The commit you bisected to (and which you had verified to be the culprit by just forcing arch_skip_send_event_check() to always return false) could be reasonably assumed to be broken only when MWAIT use for all C states didn't work. Don, Jun - is there anything known but not yet publicly documented for Family 6 Model 44 Xeons? Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Regression, host crash with 4.5rc1
On 11/27/2014 01:27 AM, Jan Beulich wrote:
> This was precisely the reason why I told you that the numbering differs (and is confusing and has nothing to do with actual C state numbers): What max_cstate refers to in the mwait-idle driver is what above is listed as type[Cx], i.e. the state at index 1 is C1, at 2 we've got C1E, and at 3 we've got C2. And those still aren't in line with the numbering the CPU documentation uses, it's rather kind of meant to refer to the ACPI numbering (but probably also not fully matching up).

Ah, thanks for the explanation.

> So max_cstate=2 working suggests a problem with what the CPU calls C6, which presumably isn't all that surprising considering the many errata (BD35, BD38, BD40, BD59, BD87, and BD104). Not sure how to proceed from here - I suppose you already made sure you run with the latest available BIOS.

Yes, latest available BIOS.

> And with 6 errata documented it's not all that unlikely that there's a 7th one with MONITOR/MWAIT behavior. The commit you bisected to (and which you had verified to be the culprit by just forcing arch_skip_send_event_check() to always return false) could be reasonably assumed to be broken only when MWAIT use for all C states didn't work.

Now I did get a hang with max_cstate=3 and mwait-idle=0. May I assume that mwait-idle=0 means that ACPI is responsible for the throttling?

Thanks again for all your help!

Steve
Re: [Xen-devel] Regression, host crash with 4.5rc1
On Nov 28, 2014, at 00:50, Jan Beulich jbeul...@suse.com wrote:
> On 28.11.14 at 09:24, sfl...@ihonk.com wrote:
>>> And with 6 errata documented it's not all that unlikely that there's a 7th one with MONITOR/MWAIT behavior. The commit you bisected to (and which you had verified to be the culprit by just forcing arch_skip_send_event_check() to always return false) could be reasonably assumed to be broken only when MWAIT use for all C states didn't work.
>>
>> Now I did get a hang with max_cstate=3 and mwait-idle=0.
>
> According to the data you provided earlier, max_cstate=3 is identical to not using that option at all when you also use mwait-idle=0. It would make a difference only when not using that latter option (and I specifically pointed this out in earlier replies).

Apologies for asking you to repeat yourself. Most of this stuff is over my head -- the only time I was this far down the rabbit hole was on an 8051. Thanks for your patience. :-)

Steve
Re: [Xen-devel] Regression, host crash with 4.5rc1
On 27.11.14 at 06:29, sfl...@ihonk.com wrote: On 11/25/2014 03:00 AM, Jan Beulich wrote: Okay, so it's not really the mwait-idle driver causing the regression, but it is C-state related. Hence we're now down to seeing whether all or just the deeper C states are affected, i.e. I now need to ask you to play with max_cstate=. For that you'll have to remember that the option's effect differs between the ACPI and the MWAIT idle drivers. In the spirit of bisection I'd suggest using max_cstate=2 first no matter which of the two scenarios you pick. If that still hangs, max_cstate=1 obviously is the only other thing to try. Should that not hang (and you left out mwait-idle=0), trying max_cstate=3 in that same scenario would be the other case to check. No need for 'd' and 'a' output for the time being, but 'c' output would be much appreciated for all cases where you observe hangs. Okay, working through that now. I tried max_cstate=2 and got no hangs, whether with or without mwait-idle=0. However, I was puzzled by this: (XEN) 'c' pressed - printing ACPI Cx structures (XEN) ==cpu0== (XEN) active state: C0 (XEN) max_cstate: C2 (XEN) states: (XEN) C1: type[C1] latency[003] usage[12219860] method[ FFH] duration[1190961948551] (XEN) C2: type[C1] latency[010] usage[10205554] method[ FFH] duration[2015393965907] (XEN) C3: type[C2] latency[020] usage[50926286] method[ FFH] duration[30527997858148] (XEN)*C0: usage[73351700] duration[9974627547595] (XEN) max=0 pwr=0 urg=0 nxt=0 (XEN) PC2[0] PC3[8589642315848] PC6[0] PC7[0] (XEN) CC3[28794734145697] CC6[0] CC7[0] (XEN) ==cpu1== (XEN) active state: C3 (XEN) max_cstate: C2 (XEN) states: (XEN) C1: type[C1] latency[003] usage[10699950] method[ FFH] duration[1141422044112] (XEN) C2: type[C1] latency[010] usage[06382904] method[ FFH] duration[1329739264322] (XEN)*C3: type[C2] latency[020] usage[44630764] method[ FFH] duration[31676618425954] (XEN) C0: usage[61713618] duration[9561201640320] (XEN) max=0 pwr=0 urg=0 nxt=0 (XEN) PC2[0] 
PC3[8589642315848] PC6[0] PC7[0] (XEN) CC3[30066495105056] CC6[0] CC7[0] [...] Why would some of the cores be in C3 even though they list max_cstate as C2? This was precisely the reason why I told you that the numbering differs (and is confusing and has nothing to do with actual C state numbers): What max_cstate refers to in the mwait-idle driver is what above is listed as type[Cx], i.e. the state at index 1 is C1, at 2 we've got C1E, and at 3 we've got C2. And those still aren't in line with the numbering the CPU documentation uses, it's rather kind of meant to refer to the ACPI numbering (but probably also not fully matching up). So max_cstate=2 working suggests a problem with what the CPU calls C6, which presumably isn't all that surprising considering the many errata (BD35, BD38, BD40, BD59, BD87, and BD104). Not sure how to proceed from here - I suppose you already made sure you run with the latest available BIOS. And with 6 errata documented it's not all that unlikely that there's a 7th one with MONITOR/MWAIT behavior. The commit you bisected to (and which you had verified to be the culprit by just forcing arch_skip_send_event_check() to always return false) could be reasonably assumed to be broken only when MWAIT use for all C states didn't work. Don, Jun - is there anything known but not yet publicly documented for Family 6 Model 44 Xeons? Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
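The numbering confusion Jan explains above can be illustrated with a small sketch. It parses the state lines of the 'c' dump and shows which entries stay eligible when max_cstate is compared against the type[Cx] field, as Jan describes for the mwait-idle driver. For contrast it also shows the by-index reading that Steve expected. The helper names are hypothetical and this is not the hypervisor's actual code.

```python
import re

# Matches "C3: type[C2]" with or without the leading "*" or extra spaces.
STATE_RE = re.compile(r"C(\d+):\s*type\[C(\d+)\]")


def parse_states(dump):
    """Return (index, type) pairs for every state line in the dump."""
    return [(int(index), int(ctype)) for index, ctype in STATE_RE.findall(dump)]


def allowed_by_type(states, max_cstate):
    """States still eligible when the limit applies to type[Cx]."""
    return [f"C{index}" for index, ctype in states if ctype <= max_cstate]


def allowed_by_index(states, max_cstate):
    """States eligible under the naive reading, where the limit is the index."""
    return [f"C{index}" for index, _ in states if index <= max_cstate]


DUMP = """
(XEN) C1: type[C1] latency[003] usage[10699950] method[ FFH] duration[1141422044112]
(XEN) C2: type[C1] latency[010] usage[06382904] method[ FFH] duration[1329739264322]
(XEN)*C3: type[C2] latency[020] usage[44630764] method[ FFH] duration[31676618425954]
(XEN) C0: usage[61713618] duration[9561201640320]
"""

if __name__ == "__main__":
    states = parse_states(DUMP)
    print(allowed_by_type(states, 2))   # limit applied to type[Cx]
    print(allowed_by_index(states, 2))  # naive by-index reading
```

With max_cstate=2 the entry labelled C3 has type[C2] and so remains eligible under the by-type rule, which is why cpu1 could legitimately show "active state: C3" next to "max_cstate: C2".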
Re: [Xen-devel] Regression, host crash with 4.5rc1
On 11/25/2014 03:00 AM, Jan Beulich wrote: Okay, so it's not really the mwait-idle driver causing the regression, but it is C-state related. Hence we're now down to seeing whether all or just the deeper C states are affected, i.e. I now need to ask you to play with max_cstate=. For that you'll have to remember that the option's effect differs between the ACPI and the MWAIT idle drivers. In the spirit of bisection I'd suggest using max_cstate=2 first no matter which of the two scenarios you pick. If that still hangs, max_cstate=1 obviously is the only other thing to try. Should that not hang (and you left out mwait-idle=0), trying max_cstate=3 in that same scenario would be the other case to check. No need for 'd' and 'a' output for the time being, but 'c' output would be much appreciated for all cases where you observe hangs. Okay, working through that now. I tried max_cstate=2 and got no hangs, whether with or without mwait-idle=0. However, I was puzzled by this: (XEN) 'c' pressed - printing ACPI Cx structures (XEN) ==cpu0== (XEN) active state: C0 (XEN) max_cstate: C2 (XEN) states: (XEN) C1: type[C1] latency[003] usage[12219860] method[ FFH] duration[1190961948551] (XEN) C2: type[C1] latency[010] usage[10205554] method[ FFH] duration[2015393965907] (XEN) C3: type[C2] latency[020] usage[50926286] method[ FFH] duration[30527997858148] (XEN)*C0: usage[73351700] duration[9974627547595] (XEN) max=0 pwr=0 urg=0 nxt=0 (XEN) PC2[0] PC3[8589642315848] PC6[0] PC7[0] (XEN) CC3[28794734145697] CC6[0] CC7[0] (XEN) ==cpu1== (XEN) active state: C3 (XEN) max_cstate: C2 (XEN) states: (XEN) C1: type[C1] latency[003] usage[10699950] method[ FFH] duration[1141422044112] (XEN) C2: type[C1] latency[010] usage[06382904] method[ FFH] duration[1329739264322] (XEN)*C3: type[C2] latency[020] usage[44630764] method[ FFH] duration[31676618425954] (XEN) C0: usage[61713618] duration[9561201640320] (XEN) max=0 pwr=0 urg=0 nxt=0 (XEN) PC2[0] PC3[8589642315848] PC6[0] PC7[0] (XEN) 
CC3[30066495105056] CC6[0] CC7[0] (XEN) ==cpu2== (XEN) active state: C3 (XEN) max_cstate: C2 (XEN) states: (XEN) C1: type[C1] latency[003] usage[10829791] method[ FFH] duration[1145244102917] (XEN) C2: type[C1] latency[010] usage[06392468] method[ FFH] duration[1330830147023] (XEN)*C3: type[C2] latency[020] usage[44705668] method[ FFH] duration[31741190985486] (XEN) C0: usage[61927927] duration[9491716216846] (XEN) max=0 pwr=0 urg=0 nxt=0 (XEN) PC2[0] PC3[8589642315848] PC6[0] PC7[0] (XEN) CC3[30117696095715] CC6[0] CC7[0] (XEN) ==cpu3== (XEN) active state: C3 (XEN) max_cstate: C2 (XEN) states: (XEN) C1: type[C1] latency[003] usage[10692336] method[ FFH] duration[1144876437514] (XEN) C2: type[C1] latency[010] usage[06394051] method[ FFH] duration[1333961503379] (XEN)*C3: type[C2] latency[020] usage[44744178] method[ FFH] duration[31803488799434] (XEN) C0: usage[61830565] duration[9426654792908] (XEN) max=0 pwr=0 urg=0 nxt=0 (XEN) PC2[0] PC3[8589642315848] PC6[0] PC7[0] (XEN) CC3[30191557548300] CC6[0] CC7[0] (XEN) ==cpu4== (XEN) active state: C3 (XEN) max_cstate: C2 (XEN) states: (XEN) C1: type[C1] latency[003] usage[10746634] method[ FFH] duration[1144044534459] (XEN) C2: type[C1] latency[010] usage[06444054] method[ FFH] duration[1340637424913] (XEN)*C3: type[C2] latency[020] usage[44690901] method[ FFH] duration[31663207165902] (XEN) C0: usage[61881589] duration[9561092487876] (XEN) max=0 pwr=0 urg=0 nxt=0 (XEN) PC2[0] PC3[8589642315848] PC6[0] PC7[0] (XEN) CC3[30049235012919] CC6[0] CC7[0] (XEN) ==cpu5== (XEN) active state: C3 (XEN) max_cstate: C2 (XEN) states: (XEN) C1: type[C1] latency[003] usage[10694684] method[ FFH] duration[1140625901110] (XEN) C2: type[C1] latency[010] usage[06461563] method[ FFH] duration[1342115502967] (XEN)*C3: type[C2] latency[020] usage[44834522] method[ FFH] duration[31719560664023] (XEN) C0: usage[61990769] duration[9506679619986] (XEN) max=0 pwr=0 urg=0 nxt=0 (XEN) PC2[0] PC3[8589642315848] PC6[0] PC7[0] Why would some of the 
cores be in C3 even though they list max_cstate as C2? Steve ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Regression, host crash with 4.5rc1
On 11/25/2014 12:16 AM, Jan Beulich wrote: (XEN) 'c' pressed - printing ACPI Cx structures (XEN) ==cpu0== (XEN) active state:C0 (XEN) max_cstate:C7 (XEN) states: (XEN) C1:type[C1] latency[001] usage[5664] method[ FFH] duration[4042540627] (XEN) C2:type[C3] latency[064] usage[0414] method[ FFH] duration[44725] (XEN) C3:type[C3] latency[096] usage[2366] method[ FFH] duration[28183588810] (XEN)*C0:usage[8444] duration[26752178344] (XEN) max=0 pwr=0 urg=0 nxt=0 (XEN) PC2[0] PC3[112428588] PC6[21869019218] PC7[0] (XEN) CC3[484210884] CC6[27943480555] CC7[0] Interesting, so other than for me (perhaps due to other patches I have in my tree) the change resulted in C states now being used again despite mwait-idle=0, which is good. Question now is - with this being the case, did the hangs re-occur? Unfortunately they did. (Happened unusually quick this time, though I doubt the statistical significance.) Not sure what the desirable output is, so I did a couple of 'a' and 'd' requests, capped off by a 'c': (XEN) *** Dumping CPU0 host state: *** (XEN) [ Xen-4.5.0-rc x86_64 debug=y Not tainted ] (XEN) CPU:0 (XEN) RIP:e008:[82d08012c9a2] _spin_unlock_irq+0x30/0x31 (XEN) RFLAGS: 0246 CONTEXT: hypervisor (XEN) rax: rbx: 8300a943d000 rcx: (XEN) rdx: 82d0802e rsi: 0008 rdi: 82d080329308 (XEN) rbp: 82d0802e7ec8 rsp: 82d0802e7e40 r8: 82d080329320 (XEN) r9: r10: f88002f2c2a0 r11: f88002f36d70 (XEN) r12: 01227e5280e4 r13: 8300a943d000 r14: 82d080329308 (XEN) r15: 01c9c380 cr0: 8005003b cr4: 26f0 (XEN) cr3: 000c1b57c000 cr2: 07fefca62000 (XEN) ds: es: fs: gs: ss: cs: e008 (XEN) Xen stack trace from rsp=82d0802e7e40: (XEN)82d080128cb5 82d080329300 82d080329320 002e7e78 (XEN)82d080329300 82d0801b977e 8300a943d000 f88002f36d70 (XEN)8300a943d000 01c9c380 82d0801e5600 82d0802e7f08 (XEN)82d080300080 82d080300080 82d0802e (XEN)fa8004fdff80 82d0802e7ef8 82d08012bfa3 8300a943d000 (XEN)f88002f36d70 018507f7ef25 000f 82d0802e7f08 (XEN)82d08012bffb 000f 82d0801e849a fa8004fdff80 (XEN)000f 
018507f7ef25 f88002f36d70 000f (XEN)f88002f2c180 f88002f36d70 f88002f2c2a0 f880043cb960 (XEN)f88002f2c2a0 0002 f88002f2c1c0 0400 (XEN) f88002f36eb0 beefbeef f8000293e20c (XEN)00bfbeef 0046 f88002f36c20 beef (XEN)beef beef beef beef (XEN) 8300a943d000 (XEN) Xen call trace: (XEN)[82d08012c9a2] _spin_unlock_irq+0x30/0x31 (XEN)[82d08012bfa3] __do_softirq+0x81/0x8c (XEN)[82d08012bffb] do_softirq+0x13/0x15 (XEN)[82d0801e849a] vmx_asm_do_vmentry+0x2a/0x45 (XEN) (XEN) *** Dumping CPU0 guest state (d1v3): *** (XEN) [ Xen-4.5.0-rc x86_64 debug=y Not tainted ] (XEN) CPU:0 (XEN) RIP:0010:[f8000293e20c] (XEN) RFLAGS: 0046 CONTEXT: hvm guest (XEN) rax: 0002 rbx: f88002f2c180 rcx: f88002f2c1c0 (XEN) rdx: 0400 rsi: rdi: f88002f36eb0 (XEN) rbp: 000f rsp: f88002f36c20 r8: f88002f2c2a0 (XEN) r9: f880043cb960 r10: f88002f2c2a0 r11: f88002f36d70 (XEN) r12: f88002f36d70 r13: 018507f7ef25 r14: 000f (XEN) r15: fa8004fdff80 cr0: 80050031 cr4: 06f8 (XEN) cr3: 536d9000 cr2: 07fefca62000 (XEN) ds: 002b es: 002b fs: 0053 gs: 002b ss: cs: 0010 (XEN) (XEN) *** Dumping CPU1 guest state (d1v4): *** (XEN) [ Xen-4.5.0-rc x86_64 debug=y Not tainted ] (XEN) CPU:1 (XEN) RIP:0010:[f8000293e20e] (XEN) RFLAGS: 0046 CONTEXT: hvm guest (XEN) rax: 0002 rbx: f88002fa2180 rcx: f88002fa21c0 (XEN) rdx: 0400 rsi: rdi: f88002faceb0 (XEN) rbp: 000f rsp: f88002facc20 r8: f88002fa22a0 (XEN) r9: f88002fcaca0 r10: f88002fa22a0 r11: f88002facd70 (XEN) r12: f88002facd70 r13: r14: 000f (XEN) r15: f88002fa6fc0 cr0: 80050031 cr4: 06f8 (XEN) cr3: 00187000 cr2: 07faf478 (XEN) ds: 002b es: 002b fs: 0053 gs: 002b ss: 0018 cs: 0010 (XEN) (XEN) *** Dumping CPU2 guest state (d1v5): *** (XEN) [ Xen-4.5.0-rc x86_64 debug=y Not tainted ] (XEN) CPU:2 (XEN)
Re: [Xen-devel] Regression, host crash with 4.5rc1
On 25.11.14 at 10:38, sfl...@ihonk.com wrote:
> On 11/25/2014 12:16 AM, Jan Beulich wrote:
>> Interesting, so other than for me (perhaps due to other patches I have in my tree) the change resulted in C states now being used again despite mwait-idle=0, which is good. Question now is - with this being the case, did the hangs re-occur?
>
> Unfortunately they did. (Happened unusually quick this time, though I doubt the statistical significance.) Not sure what the desirable output is, so I did a couple of 'a' and 'd' requests, capped off by a 'c':

Okay, so it's not really the mwait-idle driver causing the regression, but it is C-state related. Hence we're now down to seeing whether all or just the deeper C states are affected, i.e. I now need to ask you to play with max_cstate=. For that you'll have to remember that the option's effect differs between the ACPI and the MWAIT idle drivers. In the spirit of bisection I'd suggest using max_cstate=2 first no matter which of the two scenarios you pick. If that still hangs, max_cstate=1 obviously is the only other thing to try. Should that not hang (and you left out mwait-idle=0), trying max_cstate=3 in that same scenario would be the other case to check.

No need for 'd' and 'a' output for the time being, but 'c' output would be much appreciated for all cases where you observe hangs.

Jan
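The test plan in this mail is a three-way decision, sketched below as a function. It only restates the steps Jan lists. The function and parameter names are hypothetical.

```python
def next_max_cstate(hung_at_2, mwait_idle_disabled):
    """Given the result of a run with max_cstate=2, return the next
    max_cstate value to try, or None if no further run is needed."""
    if hung_at_2:
        # Still hanging with the limit at 2: only 1 is left to try.
        return 1
    if not mwait_idle_disabled:
        # No hang and the mwait-idle driver is in use: check 3 next.
        return 3
    # No hang with mwait-idle=0: the plan lists no further run here.
    return None


if __name__ == "__main__":
    # Steve later reported no hang at max_cstate=2 in either scenario.
    print(next_max_cstate(hung_at_2=False, mwait_idle_disabled=False))
```

The last branch returns None because the plan asks for max_cstate=3 only in the scenario where mwait-idle=0 was left out.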
Re: [Xen-devel] Regression, host crash with 4.5rc1
On 23.11.14 at 02:28, sfl...@ihonk.com wrote:
> With mwait-idle=0:
>
> (XEN) 'c' pressed - printing ACPI Cx structures
> (XEN) ==cpu0==
> (XEN) active state: C0
> (XEN) max_cstate: C7
> (XEN) states:
> (XEN) C1: type[C1] latency[001] usage[] method[ FFH] duration[0]
> (XEN) C2: type[C0] latency[000] usage[] method[ NONE] duration[0]
> (XEN) C3: type[C3] latency[064] usage[] method[ FFH] duration[0]
> (XEN) C4: type[C3] latency[096] usage[] method[ FFH] duration[0]
> (XEN)*C0: usage[] duration[46930624784]
> (XEN) PC2[0] PC3[0] PC6[0] PC7[0]
> (XEN) CC3[0] CC6[0] CC7[0]
> [...]

Very interesting - the hypervisor has C-state information, but never entered any of them. That certainly explains the difference between using/not using the mwait-idle driver, but puts us back to there being a more general issue with C-state use on this CPU model. Possibly related to C2 having entry method NONE, but then again I can't see how such a state could get entered into the table in the first place: set_cx() bails upon check_cx() returning an error, and hence its switch()'s default statement should never be reached. Plus even if an array entry was set to NONE, it should simply be ignored when looking for a state to enter. I'll probably need to put together a debugging patch to figure out what's going on here.

In any event C2 being set to NONE and that information presumably coming from firmware is an indication that there's a problem with C2 (note that the numbering doesn't really match up with what the document says, this likely really is C1E) on that CPU. Which gets us back to ...

> CPU information for one of the cores, 2.8 GHz is nominal, stepping is 2. Not sure how to translate that stepping number into Intel's format:
>
> processor : 0
> vendor_id : GenuineIntel
> cpu family : 6
> model : 44
> model name : Intel(R) Xeon(R) CPU X5660 @ 2.80GHz
> stepping: 2
> [...]
>
>> There are a couple potentially relevant errata (BC36, BC38, BC54, BC77, BC110). To exclude BC36, a boot log with apic-verbosity=debug and debug key 'i' output would be necessary.
>
> Done, see the very end of the email.
>
>> BC38 should not affect us since we don't enter C states from ISRs. BC54 is probably irrelevant since we meanwhile know that your system doesn't really hang hard. For BC77 it would be worth trying to disable turbo mode instead of disabling the mwait-idle driver (xenpm disable-turbo-mode right after boot).
>
> I looked up BC77 but as a result found this document[1], which seems to relate to the i7. Would this[2] not be the relevant document?
>
> [1] http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/core-i7-900-ee-and-desktop-processor-series-32nm-spec-update.pdf
> [2] http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-5600-specification-update.pdf

Indeed. I wasn't aware that there are family/model/stepping tuples that can be both Xeon and desktop CPUs.

> As promised, below is the apic-verbosity=debug log, with 'i'. Thanks!

I'm sorry, I misspelled the option, it's really apic_verbosity=debug. The 'i' output at least already confirms that there are no ExtINT entries among the IO-APIC RTEs.

Jan
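On Steve's question about relating the /proc/cpuinfo numbers to Intel's documents: specification updates generally identify a part by its CPUID signature rather than by the decimal family/model/stepping that Linux prints. The sketch below decodes such a signature using the standard CPUID leaf 1 EAX layout. The value 0x206C2 is an assumed example, reconstructed here from family 6, model 44, stepping 2, and was not posted in the thread.

```python
def decode_cpuid_signature(eax):
    """Split a CPUID leaf 1 EAX value into (family, model, stepping),
    applying the extended family/model rules for family 6 and 15."""
    stepping = eax & 0xF
    base_model = (eax >> 4) & 0xF
    base_family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF

    family = base_family + ext_family if base_family == 0xF else base_family
    if base_family in (0x6, 0xF):
        model = (ext_model << 4) | base_model
    else:
        model = base_model
    return family, model, stepping


if __name__ == "__main__":
    # Assumed example signature for family 6, model 44 (0x2C), stepping 2.
    print(decode_cpuid_signature(0x206C2))
```

Going the other way, model 44 is 0x2C, so the extended model nibble is 2 and the base model nibble is 0xC, which is where the "206C" prefix of the assumed signature comes from.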
Re: [Xen-devel] Regression, host crash with 4.5rc1
On Nov 24, 2014, at 00:45, Jan Beulich jbeul...@suse.com wrote:

On 23.11.14 at 02:28, sfl...@ihonk.com wrote:

With mwait-idle=0:

(XEN) 'c' pressed - printing ACPI Cx structures
(XEN) ==cpu0==
(XEN) active state: C0
(XEN) max_cstate: C7
(XEN) states:
(XEN) C1: type[C1] latency[001] usage[] method[ FFH] duration[0]
(XEN) C2: type[C0] latency[000] usage[] method[ NONE] duration[0]
(XEN) C3: type[C3] latency[064] usage[] method[ FFH] duration[0]
(XEN) C4: type[C3] latency[096] usage[] method[ FFH] duration[0]
(XEN)*C0: usage[] duration[46930624784]
(XEN) PC2[0] PC3[0] PC6[0] PC7[0]
(XEN) CC3[0] CC6[0] CC7[0]
[...]

Very interesting - the hypervisor has C-state information, but never entered any of them. That certainly explains the difference between using/not using the mwait-idle driver, but puts us back to there being a more general issue with C-state use on this CPU model. Possibly related to C2 having entry method NONE, but then again I can't see how such a state could get entered into the table in the first place: set_cx() bails upon check_cx() returning an error, and hence its switch()'s default statement should never be reached. Plus even if an array entry was set to NONE, it should simply be ignored when looking for a state to enter. I'll probably need to put together a debugging patch to figure out what's going on here.

Okay, happy to give it a go whenever you have the time to put something together.

As promised, below is the apic-verbosity=debug log, with 'i'.

Thanks! I'm sorry, I misspelled the option, it's really apic_verbosity=debug. The 'i' output at least already confirms that there are no ExtINT entries among the IO-APIC RTEs.

No sweat. Do you need me to run it again with the corrected option?

Thanks!

Steve
Re: [Xen-devel] Regression, host crash with 4.5rc1
On 24.11.14 at 10:08, sfl...@ihonk.com wrote:

On Nov 24, 2014, at 00:45, Jan Beulich jbeul...@suse.com wrote:

On 23.11.14 at 02:28, sfl...@ihonk.com wrote:

With mwait-idle=0:

(XEN) 'c' pressed - printing ACPI Cx structures
(XEN) ==cpu0==
(XEN) active state: C0
(XEN) max_cstate: C7
(XEN) states:
(XEN) C1: type[C1] latency[001] usage[] method[ FFH] duration[0]
(XEN) C2: type[C0] latency[000] usage[] method[ NONE] duration[0]
(XEN) C3: type[C3] latency[064] usage[] method[ FFH] duration[0]
(XEN) C4: type[C3] latency[096] usage[] method[ FFH] duration[0]
(XEN)*C0: usage[] duration[46930624784]
(XEN) PC2[0] PC3[0] PC6[0] PC7[0]
(XEN) CC3[0] CC6[0] CC7[0]
[...]

Very interesting - the hypervisor has C-state information, but never entered any of them. That certainly explains the difference between using/not using the mwait-idle driver, but puts us back to there being a more general issue with C-state use on this CPU model. Possibly related to C2 having entry method NONE, but then again I can't see how such a state could get entered into the table in the first place: set_cx() bails upon check_cx() returning an error, and hence its switch()'s default statement should never be reached. Plus even if an array entry was set to NONE, it should simply be ignored when looking for a state to enter. I'll probably need to put together a debugging patch to figure out what's going on here.

Okay, happy to give it a go whenever you have the time to put something together.

While putting this together I found the reason for the strange C2: line, and the attached debugging patch already has the fix for it (which I'll also submit separately, and hence you may need to drop that specific hunk should you end up applying it on a tree which already has that fix). You'll need to again run with mwait-idle=0, and it's the boot messages along with the 'c' debug key output that are of interest.
Thanks, Jan

--- unstable.orig/xen/arch/x86/acpi/cpu_idle.c
+++ unstable/xen/arch/x86/acpi/cpu_idle.c
@@ -58,7 +58,7 @@
 #include <xen/notifier.h>
 #include <xen/cpu.h>
 
-/*#define DEBUG_PM_CX*/
+#define DEBUG_PM_CX
 
 #define GET_HW_RES_IN_NS(msr, val) \
     do { rdmsrl(msr, val); val = tsc_ticks2ns(val); } while( 0 )
@@ -238,6 +238,9 @@ static char* acpi_cstate_method_name[] =
     "HALT"
 };
 
+struct reasons { unsigned long max, pwr, urg, nxt; };//temp
+static DEFINE_PER_CPU(struct reasons, reasons);//temp
+
 static void print_acpi_power(uint32_t cpu, struct acpi_processor_power *power)
 {
     uint32_t i, idle_usage = 0;
@@ -273,6 +276,8 @@ static void print_acpi_power(uint32_t cp
     printk((last_state_idx == 0) ? "   *" : "    ");
     printk("C0:\tusage[%08d] duration[%"PRId64"]\n",
            idle_usage, NOW() - idle_res);
+printk("max=%lx pwr=%lx urg=%lx nxt=%lx\n",//temp
+       per_cpu(reasons.max, cpu), per_cpu(reasons.pwr, cpu), per_cpu(reasons.urg, cpu), per_cpu(reasons.nxt, cpu));
 
     print_hw_residencies(cpu);
 }
@@ -501,6 +506,7 @@ static void acpi_processor_idle(void)
     u32 exp = 0, pred = 0;
     u32 irq_traced[4] = { 0 };
 
+next_state = 1;//temp
     if ( max_cstate > 0 && power && !sched_has_urgent_vcpu() &&
          (next_state = cpuidle_current_governor->select(power)) > 0 )
     {
@@ -519,6 +525,10 @@ static void acpi_processor_idle(void)
     }
     if ( !cx )
     {
+this_cpu(reasons.max) += max_cstate <= 0;//temp
+this_cpu(reasons.pwr) += !power;//temp
+this_cpu(reasons.urg) += !!sched_has_urgent_vcpu();//temp
+this_cpu(reasons.nxt) += next_state <= 0;//temp
         if ( pm_idle_save )
             pm_idle_save();
         else
@@ -1007,6 +1017,7 @@ static void set_cx(
         cx->entry_method = ACPI_CSTATE_EM_SYSIO;
         break;
     default:
+printk("CPU%u: C%u space %x?\n", acpi_power->cpu, xen_cx->type, xen_cx->reg.space_id);//temp
         cx->entry_method = ACPI_CSTATE_EM_NONE;
         break;
     }
@@ -1015,7 +1026,7 @@ static void set_cx(
     cx->target_residency = cx->latency * latency_factor;
 
     smp_wmb();
-    acpi_power->count++;
+    acpi_power->count += (cx->type != ACPI_STATE_C1);
     if ( cx->type == ACPI_STATE_C1 || cx->type == ACPI_STATE_C2 )
         acpi_power->safe_state = cx;
 }
@@ -1141,6 +1152,7 @@ long set_cx_pminfo(uint32_t cpu, struct
 
     /* FIXME: C-state dependency is not supported by far */
 
+printk("CPU%u: %pS, %pS\n", cpu, pm_idle_save, pm_idle);//temp
     if ( cpu_id == 0 )
     {
         if ( pm_idle_save == NULL )
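To make the intent of the four temporary counters in the debugging patch easier to follow, here is a small Python model of the bookkeeping. It is not Xen code: the class and function names are invented for illustration, and it only mirrors the condition in acpi_processor_idle() as read from the patch context. In particular, next_state is preset to 1 so that a governor which was never consulted is not blamed in the nxt counter.

```python
from dataclasses import dataclass


@dataclass
class Reasons:
    """Per-CPU tallies of why no deeper C-state was chosen (names follow the patch)."""
    max: int = 0  # max_cstate <= 0
    pwr: int = 0  # no power (C-state) data for this CPU
    urg: int = 0  # an urgent vCPU is present
    nxt: int = 0  # the governor returned a state <= 0


def idle_attempt(reasons, max_cstate, power, urgent, select):
    """Model one pass through the idle decision.

    `select` stands in for the governor's select() hook and is only called
    when the earlier checks pass, mirroring C's short-circuit evaluation.
    Returns True if a C-state would be looked up, False otherwise.
    """
    next_state = 1  # preset, as in the patch
    ok = max_cstate > 0 and power is not None and not urgent
    if ok:
        next_state = select(power)
        ok = next_state > 0
    if not ok:
        # Each reason is tallied independently, so several can be counted
        # for the same failed attempt.
        reasons.max += int(max_cstate <= 0)
        reasons.pwr += int(power is None)
        reasons.urg += int(bool(urgent))
        reasons.nxt += int(next_state <= 0)
    return ok


if __name__ == "__main__":
    r = Reasons()
    idle_attempt(r, 7, object(), False, lambda p: 0)  # governor picks nothing
    idle_attempt(r, 0, None, True, lambda p: 3)       # several reasons at once
    print(r)
```

In the real patch the tallies sit in the `if ( !cx )` branch, which may also be reached in ways this simplified model does not cover.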
Re: [Xen-devel] Regression, host crash with 4.5rc1
On 20.11.14 at 21:07, sfl...@ihonk.com wrote:

Running with mwait-idle=0 solves (hides?) the problem. Next step is to fiddle with the C states?

So this also prompted me to go over the list of errata. Just to confirm - your CPU is family 6 model 44? What stepping? And what nominal frequency?

There are a couple potentially relevant errata (BC36, BC38, BC54, BC77, BC110). To exclude BC36, a boot log with apic-verbosity=debug and debug key 'i' output would be necessary. BC38 should not affect us since we don't enter C states from ISRs. BC54 is probably irrelevant since we meanwhile know that your system doesn't really hang hard. For BC77 it would be worth trying to disable turbo mode instead of disabling the mwait-idle driver (xenpm disable-turbo-mode right after boot). And BC110 would be relevant only if without the mwait-idle driver there would be no use of C3. Plus anyway this would more likely end up in a hard hang too.

And then, considering that my system with a family 6 model 44 CPU has never shown anything similar (albeit that doesn't mean all that much since our workloads are likely very different), you're not over-clocking? And did you disable hyper-threading on purpose (if so could you check whether enabling it makes a difference)?

Jan
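For reference when comparing a CPU against errata tables, the family/model/stepping triple reported by the OS can be folded back into the CPUID leaf 1 EAX signature using the standard x86 bit layout. The sketch below assumes the processor-type field is zero and uses an illustrative function name. Whether a given specification update keys its stepping table on this exact value has to be checked in the document itself.

```python
def cpuid_signature(family, model, stepping):
    """Compose the CPUID.1:EAX signature from displayed family/model/stepping.

    Layout: extended family [27:20], extended model [19:16],
    family [11:8], model [7:4], stepping [3:0].
    The processor-type bits [13:12] are assumed to be 0.
    """
    if family >= 15:
        base_family, ext_family = 15, family - 15
    else:
        base_family, ext_family = family, 0

    # The extended model field only contributes for family 6 and family >= 15.
    if family == 6 or family >= 15:
        ext_model, base_model = model >> 4, model & 0xF
    else:
        ext_model, base_model = 0, model & 0xF

    return (
        (ext_family << 20)
        | (ext_model << 16)
        | (base_family << 8)
        | (base_model << 4)
        | (stepping & 0xF)
    )


if __name__ == "__main__":
    # Family 6, model 44, stepping 2, as discussed in this thread.
    print(hex(cpuid_signature(6, 44, 2)))
```

For family 6, model 44 (0x2C), stepping 2 this yields 0x206c2.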
Re: [Xen-devel] Regression, host crash with 4.5rc1
On 20.11.14 at 02:23, sfl...@ihonk.com wrote:

On 11/17/2014 23:54, Jan Beulich wrote:

Another thing - now that serial logging appears to be working for you, did you try whether the host, once hung, still reacts to serial input (perhaps force input to go to Xen right at boot via the conswitch= option)? If so, 'd' debug-key output would likely be the piece of most interest.

Here you go. Performed with a checkout of 9a727a81 (because it was handy), let me know if you'd rather see the results from 4.5-rc2 or any other Xen debugging info:

Interesting - all CPUs are executing stuff from Dom1, which by itself is not indicating a problem, but may (considering the host hang) hint at guest vCPU-s not getting de-scheduled anymore on one or more of the CPUs. The 'a' debug key output would hopefully give us enough information to know whether that's the case. Ideally do 'd' and 'a' a couple of times each (alternating, and with a few seconds in between).

Also, for double checking purposes, could you make the xen-syms of a build you observed the problem with available somewhere?

Jan
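When taking several 'd' dumps in a row as suggested, it may help to summarise which domain and vCPU each physical CPU was running in each dump. The following is a minimal sketch with illustrative names: the header pattern is taken from the dump excerpts posted in this thread, and other Xen versions may format the line differently.

```python
import re
from collections import Counter

# Header lines as they appear in this thread, e.g.
#   *** Dumping CPU0 guest state (d1v4): ***
#   *** Dumping CPU1 host state: ***
HEADER = re.compile(
    r"\*\*\* Dumping CPU(\d+) (guest|host) state(?: \(d(\d+)v(\d+)\))?: \*\*\*"
)

SAMPLE = (
    "(XEN) *** Dumping CPU0 guest state (d1v4): *** "
    "(XEN) *** Dumping CPU1 host state: *** "
    "(XEN) *** Dumping CPU1 guest state (d1v1): ***"
)


def guest_contexts(log_text):
    """Map physical CPU number -> (domain id, vCPU id) for guest-state headers."""
    contexts = {}
    for cpu, kind, dom, vcpu in HEADER.findall(log_text):
        if kind == "guest":
            contexts[int(cpu)] = (int(dom), int(vcpu))
    return contexts


def domains_running(log_text):
    """Count how many physical CPUs were running each domain."""
    return Counter(dom for dom, _ in guest_contexts(log_text).values())


if __name__ == "__main__":
    print(guest_contexts(SAMPLE))
    print(domains_running(SAMPLE))
```

Comparing the mapping across successive dumps shows whether the same vCPUs stay pinned to the same CPUs, which is the pattern the 'a' output is meant to confirm.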
Re: [Xen-devel] Regression, host crash with 4.5rc1
Hi Jan,

Thanks for all your help so far! Here's my latest update.

On 11/17/2014 23:54, Jan Beulich wrote:

Plus, without said adjustment, first just disable the MWAIT CPU idle driver (mwait-idle=0) and then, if that didn't make a difference, use of C states altogether (cpuidle=0). If any of this does make a difference, limiting use of C states without fully excluding their use may need to be the next step.

Running with mwait-idle=0 solves (hides?) the problem. Next step is to fiddle with the C states?

On 11/19/2014 23:59, Jan Beulich wrote:

Also, for double checking purposes, could you make the xen-syms of a build you observed the problem with available somewhere?

The xen-syms (from my build of 9a727a81) can be found here: http://steve.freitas.org/xen-syms-4.5-unstable.gz

Interesting - all CPUs are executing stuff from Dom1, which by itself is not indicating a problem, but may (considering the host hang) hint at guest vCPU-s not getting de-scheduled anymore on one or more of the CPUs. The 'a' debug key output would hopefully give us enough information to know whether that's the case. Ideally do 'd' and 'a' a couple of times each (alternating, and with a few seconds in between).

Here ya go (as before, from 9a727a81). I had to pick and choose what parts to pull from the log because it was getting spammed with kernel complaints about the disk subsystem being locked up, so the hypervisor debugging info was hard to read amidst the kernel errors. As ever, let me know if you need more.

Thanks again!
Steve

(XEN) CPU00:
(XEN) ex=1445us timer=8300bf526060 cb=vcpu_singleshot_timer_fn(8300bf526000)
(XEN) ex=9918us timer=830c3dc4b1e0 cb=csched_acct(830c3dc4b1c0)
(XEN) ex=8390us timer=830c3dcc2d08 cb=csched_tick()
(XEN) ex=70409499us timer=82d08031d1e0 cb=plt_overflow()
(XEN) ex=12265483us timer=82d08031f4e0 cb=mce_work_fn()
(XEN) ex= 94364us timer=82d08031d280 cb=time_calibration()
(XEN) ex= 18390us timer=82d080321560 cb=do_dbs_timer(82d0803215a0)
(XEN) CPU01:
(XEN) ex= 390us timer=830c17ceb460 cb=pt_timer_fn(830c17ceb420)
(XEN) ex=14101194us timer=830c17ceb4e0 cb=pt_timer_fn(830c17ceb4a0)
(XEN) ex= 153445us timer=8300bf524060 cb=vcpu_singleshot_timer_fn(8300bf524000)
(XEN) ex= 44171681527us timer=830c17ceb290 cb=rtc_alarm_cb(830c17ceb1f0)
(XEN) CPU02:
(XEN) ex=1445us timer=8300bf798060 cb=vcpu_singleshot_timer_fn(8300bf798000)
(XEN) ex=8390us timer=830c3dc797c8 cb=csched_tick(0002)
(XEN) ex= 18390us timer=830c3dcb8360 cb=do_dbs_timer(830c3dcb83a0)
(XEN) ex= 29570us timer=830c3dcb80a0 cb=s_timer_fn()
(XEN) CPU03:
(XEN) ex= 25445us timer=8300bf2fb060 cb=vcpu_singleshot_timer_fn(8300bf2fb000)
(XEN) CPU04:
(XEN) ex= 634us timer=8300bf525060 cb=vcpu_singleshot_timer_fn(8300bf525000)
(XEN) CPU05:
(XEN) ex=1445us timer=8300bf527060 cb=vcpu_singleshot_timer_fn(8300bf527000)
(XEN) ex= 388096702us timer=830c17ceb5d0 cb=pmt_timer_callback(830c17ceb5a8)

(XEN) 'd' pressed - dumping registers
(XEN)
(XEN) *** Dumping CPU0 guest state (d1v4): ***
(XEN) [ Xen-4.5-unstable x86_64 debug=y Not tainted ]
(XEN) CPU:0
(XEN) RIP:0010:[f8000298a20e]
(XEN) RFLAGS: 0046 CONTEXT: hvm guest
(XEN) rax: 0002 rbx: f88002fa2180 rcx: f88002fa21c0
(XEN) rdx: 0400 rsi: rdi: f88002faceb0
(XEN) rbp: 000f rsp: f88002facc20 r8: f88002fa22a0
(XEN) r9: 03295cdc3c57 r10: f88002fa22a0 r11: f88002facd70
(XEN) r12: f88002facd70 r13: 03940027203c r14: 000f
(XEN) r15: 0001 cr0: 80050031 cr4: 06f8
(XEN) cr3: 00187000 cr2: 01cb8300
(XEN) ds: 002b es: 002b fs: 0053 gs: 002b ss: 0018 cs: 0010
(XEN)
(XEN) *** Dumping CPU1 guest state (d1v1): ***
(XEN) [ Xen-4.5-unstable x86_64 debug=y Not tainted ]
(XEN) CPU:1
(XEN) RIP:0010:[f8000298a20e]
(XEN) RFLAGS: 0046 CONTEXT: hvm guest
(XEN) rax: 0002 rbx: f88002e40180 rcx: f88002e401c0
(XEN) rdx: 0400 rsi: rdi: f88002e4aeb0
(XEN) rbp: 000f rsp: f88002e4ac20 r8: f88002e402a0
(XEN) r9: 0256dc1c8f33 r10: f88002e402a0 r11: f88002e4ad70
(XEN) r12: f88002e4ad70 r13: 0394002722bb r14: 000f
(XEN) r15: 0001 cr0: 80050031 cr4: 06f8
(XEN) cr3: 00187000 cr2: 002f7d38
(XEN) ds: 002b es: 002b fs: 0053 gs: 002b ss: 0018 cs: 0010
(XEN)
(XEN) *** Dumping CPU2 guest state (d1v0):
Re: [Xen-devel] Regression, host crash with 4.5rc1
On 11/17/2014 23:54, Jan Beulich wrote:

On 17.11.14 at 20:21, sfl...@ihonk.com wrote:

Okay, I did a bisection and was not able to correlate the above error message with the problem I'm seeing. Not saying it's not related, but I had plenty of successful test runs in the presence of that error. Took me about a week (sometimes it takes as much as 6 hours to produce the error), but bisect narrowed it down to this commit:

http://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=9a727a813e9b25003e433b3dc3fa47e621f9e238

What do you think?

Thanks for narrowing this, even if this change didn't show any other bad effects so far (and it's been widely tested by now), and even if problems here would generally be expected to surface independent of the use of PCI pass-through. But a hang (rather than a crash) would indeed be the most natural result of something being wrong here.

To double check the result, could you, in an up-to-date tree, simply make x86's arch_skip_send_event_check() return 0 unconditionally?

Made this change and the host was happy.

Plus, without said adjustment, first just disable the MWAIT CPU idle driver (mwait-idle=0) and then, if that didn't make a difference, use of C states altogether (cpuidle=0). If any of this does make a difference, limiting use of C states without fully excluding their use may need to be the next step.

Will do this next.

Another thing - now that serial logging appears to be working for you, did you try whether the host, once hung, still reacts to serial input (perhaps force input to go to Xen right at boot via the conswitch= option)? If so, 'd' debug-key output would likely be the piece of most interest.

Here you go.
Performed with a checkout of 9a727a81 (because it was handy), let me know if you'd rather see the results from 4.5-rc2 or any other Xen debugging info:

(XEN) 'd' pressed - dumping registers
(XEN)
(XEN) *** Dumping CPU0 guest state (d1v2): ***
(XEN) [ Xen-4.5-unstable x86_64 debug=y Not tainted ]
(XEN) CPU:0
(XEN) RIP:0010:[f8000281e2c1]
(XEN) RFLAGS: 0002 CONTEXT: hvm guest
(XEN) rax: 3acd4939f3e7 rbx: 3acd493a0cce rcx:
(XEN) rdx: 3acd rsi: rdi: 0057
(XEN) rbp: 645c rsp: f880033edf90 r8: f880033edff0
(XEN) r9: r10: f880033ee040 r11: 000342934690
(XEN) r12: f880033ee3c8 r13: 1000 r14:
(XEN) r15: 0058 cr0: 80050031 cr4: 06f8
(XEN) cr3: 66aca000 cr2: f9800268
(XEN) ds: 002b es: 002b fs: 0053 gs: 002b ss: 0018 cs: 0010
(XEN)
(XEN) *** Dumping CPU1 host state: ***
(XEN) [ Xen-4.5-unstable x86_64 debug=y Not tainted ]
(XEN) CPU:1
(XEN) RIP:e008:[82d08012a9a1] _spin_unlock_irq+0x30/0x31
(XEN) RFLAGS: 0246 CONTEXT: hypervisor
(XEN) rax: rbx: 8300a943e000 rcx: 0001
(XEN) rdx: 830c3dc7 rsi: 0004 rdi: 830c3dc7a088
(XEN) rbp: 830c3dc77ec8 rsp: 830c3dc77e40 r8: 830c3dc7a0a0
(XEN) r9: r10: f88002fd82a0 r11: f88002fe2d70
(XEN) r12: 151cc8b48756 r13: 8300a943e000 r14: 830c3dc7a088
(XEN) r15: 01c9c380 cr0: 8005003b cr4: 26f0
(XEN) cr3: 000c18962000 cr2: ff331aa0
(XEN) ds: es: fs: gs: ss: cs: e008
(XEN) Xen stack trace from rsp=830c3dc77e40:
(XEN)    82d080126ec5 82d080321280 830c3dc7a0a0 000100c77e78
(XEN)    830c3dc7a080 82d0801b5277 8300a943e000 f88002fe2d70
(XEN)    8300a943e000 01c9c380 82d0801e0f00 830c3dc77f08
(XEN)    82d0802f8080 82d0802f8000 830c3dc7
(XEN)    0001 830c3dc77ef8 82d08012a1b3 8300a943e000
(XEN)    f88002fe2d70 36d08fbeebe8 000f 830c3dc77f08
(XEN)    82d08012a20b 000f 82d0801e3d2a 0001
(XEN)    000f 36d08fbeebe8 f88002fe2d70 000f
(XEN)    f88002fd8180 f88002fe2d70 f88002fd82a0 34711df61755
(XEN)    f88002fd82a0 0002 f88002fd81c0 0400
(XEN)    f88002fe2eb0 beefbeef f8000298520c
(XEN)    00bfbeef 0046 f88002fe2c20 beef
(XEN)    c2c2c2c2c2c2beef c2c2c2c2c2c2beef c2c2c2c2c2c2beef c2c2c2c2c2c2beef
(XEN)    c2c2c2c20001 8300a943e000 003bbd958e00 c2c2c2c2c2c2c2c2
(XEN) Xen call trace:
(XEN)    [82d08012a9a1] _spin_unlock_irq+0x30/0x31
(XEN)    [82d08012a1b3] __do_softirq+0x81/0x8c
(XEN)    [82d08012a20b] do_softirq+0x13/0x15
(XEN)    [82d0801e3d2a] vmx_asm_do_vmentry+0x2a/0x45
(XEN)
(XEN) *** Dumping CPU1 guest state (d1v5): ***
(XEN) [ Xen-4.5-unstable x86_64 debug=y Not tainted ]
(XEN) CPU:
Re: [Xen-devel] Regression, host crash with 4.5rc1
On 11/10/2014 0:51, Jan Beulich wrote:

On 10.11.14 at 09:03, sfl...@ihonk.com wrote:

Sorry for the delay, took some debugging on another computer to get serial logging working. Due to its size, I've posted the entire log of a crashed session here: http://pastebin.com/AiPHUZRH

In this case I used a 3.0 gig memory size for the Windows domU. As I mentioned before, sometimes it's the SATA that goes first, other times the tg3 ethernet driver. Also note that between 4.4.1 and 4.5rc1, the kernel I'm using (stock Debian Jessie) has not changed. Please let me know if you need any other information. Thanks!

Raising the kernel log level to maximum too would have helped.

Okay, I've done that and the output is here, let me know if you have any preferred logging flags instead: http://pastebin.com/M3yvWNTT

Regardless of that, the first device showing anomalies here appears to be the UHCI controller:

[ 147.415713] usb 7-1: reset low-speed USB device number 2 using uhci_hcd

while booting the guest.

I assume this is related to the USB device (a keyboard) I'm passing through to the domU.

And these

[ 199.775209] pcieport :00:03.0: AER: Multiple Corrected error received: id=0018
[ 199.775238] pcieport :00:03.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0018(Transmitter ID)
[ 199.775251] pcieport :00:03.0: device [8086:340a] error status/mask=1100/2000
[ 199.775255] pcieport :00:03.0:[ 8] RELAY_NUM Rollover
[ 199.775258] pcieport :00:03.0:[12] Replay Timer Timeout

hint at a problem in the system's design. 00:03.0 is the parent bridge of 02:00.0 (and from what I can tell that's the only device behind that bridge), and hence the above messages can only reasonably have their origin at the passed through VGA device.

You are correct that the VGA card is the only device on 03.0:

root@g2:~# lspci -tv
-[:00]-+-00.0 Intel Corporation 5520 I/O Hub to ESI Port
       +-01.0-[01]00.0 Marvell Technology Group Ltd. MV64460/64461/64462 System Controller, Revision B
       +-03.0-[02]00.0 NVIDIA Corporation GT200GL [Quadro FX 4800]
       +-07.0-[03]--
       +-14.0 Intel Corporation 7500/5520/5500/X58 I/O Hub System Management Registers
       +-14.1 Intel Corporation 7500/5520/5500/X58 I/O Hub GPIO and Scratch Pad Registers
       +-14.2 Intel Corporation 7500/5520/5500/X58 I/O Hub Control Status and RAS Registers
       +-16.0 Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device
       +-16.1 Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device
       +-16.2 Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device
       +-16.3 Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device
       +-16.4 Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device
       +-16.5 Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device
       +-16.6 Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device
       +-16.7 Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device
       +-1a.0 Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #4
       +-1a.1 Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #5
       +-1a.7 Intel Corporation 82801JI (ICH10 Family) USB2 EHCI Controller #2
       +-1b.0 Intel Corporation 82801JI (ICH10 Family) HD Audio Controller
       +-1c.0-[04]--
       +-1c.4-[05]00.0 Broadcom Corporation NetXtreme BCM5755 Gigabit Ethernet PCI Express
       +-1c.5-[06-09]00.0-[07-09]--+-02.0-[08]--
       |                           \-03.0-[09]00.0 Broadcom Corporation NetXtreme BCM5754 Gigabit Ethernet PCI Express
       +-1d.0 Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #1
       +-1d.1 Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #2
       +-1d.2 Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #3
       +-1d.3 Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #6
       +-1d.7 Intel Corporation 82801JI (ICH10 Family) USB2 EHCI Controller #1
       +-1e.0-[0a]0e.0 Advanced Micro Devices, Inc. [AMD/ATI] RV100 [Radeon 7000 / Radeon VE]
       +-1f.0 Intel Corporation 82801JIB (ICH10) LPC Interface Controller
       +-1f.2 Intel Corporation 82801JI (ICH10 Family) SATA AHCI Controller
       \-1f.3 Intel Corporation 82801JI (ICH10 Family) SMBus Controller

What problem in the system's design does this hint at?

IOW it may well be that you were just lucky that things worked earlier on.

Certainly possible but this is a very common machine in the corporate world -- a Lenovo ThinkStation D20 running the X58 chipset. If it's an inherent defect in the machine and somebody else hasn't already tripped over it I would be very surprised.
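The claim that the graphics card is the only device behind bridge 00:03.0 can be checked from the lspci -tv text with a small script. This is a rough sketch only: the function name is invented, the pattern handles just the simple "bridge, single secondary bus, one child" form seen in the listing above (with or without the dashes lspci normally prints between bus and device), and nested bridges with bus ranges such as [06-09] are deliberately not resolved.

```python
import re

# One tree entry of the form  +-03.0-[02]----00.0  Name   (dashes optional),
# or an empty bridge of the form  +-07.0-[03]--
BRIDGE = re.compile(
    r"[+\\]-(?P<devfn>[0-9a-f]{2}\.[0-7])-\[(?P<bus>[0-9a-f]{2})\]-*"
    r"(?:(?P<child>[0-9a-f]{2}\.[0-7])\s+(?P<name>.*))?"
)

SAMPLE = "\n".join([
    "-[:00]-+-00.0 Intel Corporation 5520 I/O Hub to ESI Port",
    "       +-03.0-[02]00.0 NVIDIA Corporation GT200GL [Quadro FX 4800]",
    "       +-07.0-[03]--",
    "       +-1c.4-[05]----00.0  Broadcom Corporation NetXtreme BCM5755",
])


def devices_behind(tree_text, bridge_devfn):
    """List (bus:dev.fn, name) for devices directly behind the given root-bus bridge."""
    found = []
    for line in tree_text.splitlines():
        m = BRIDGE.search(line)
        if m and m["devfn"] == bridge_devfn and m["child"]:
            found.append((m["bus"] + ":" + m["child"], m["name"].strip()))
    return found


if __name__ == "__main__":
    print(devices_behind(SAMPLE, "03.0"))
    print(devices_behind(SAMPLE, "07.0"))
```

An empty result for a bridge means the tree shows no device on its secondary bus, as for 07.0 in the listing.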