Re: [Xen-devel] Regression, host crash with 4.5rc1

2015-03-02 Thread Jan Beulich
 On 27.02.15 at 18:50, len.br...@intel.com wrote:
 If this issue were to happen on Linux/bare-metal, this is how I'd debug it.
 Hopefully some of this will translate to Xen in one way or another.

Sadly not really - the kernel plays only a minor role (forwarding ACPI
data to the hypervisor) in C-state handling under Xen.

 dmesg | grep idle
 will tell us what idle driver is running (on Dom0 kernel)
 and if it is intel_idle, it will also tell us the supported sub-states 
 (CPUID.MWAIT.EDX value)

Yeah, we call the driver mwait-idle in the hypervisor, and the log
would be accessible via xl dmesg, but yes, that information is
available there too.

  (XEN) C1:   type[C1] latency[003] usage[12219860] method[  FFH]
  duration[1190961948551]
  (XEN) C2:   type[C1] latency[010] usage[10205554] method[  FFH]
  duration[2015393965907]
  (XEN) C3:   type[C2] latency[020] usage[50926286] method[  FFH]
  duration[30527997858148]
 
 I'm hopeful that this information comes from the hardware's BIOS
 and not some hypervisor tricking out Dom0 with a fake BIOS, yes?

In the case of mwait-idle (intel_idle on Linux) it would be built-in
knowledge of the driver. For acpi-cpuidle it would come from
actual firmware, not anything fake/virtual.

 Next, hopefully the attached turbostat utility can be invoked on Dom0
 and it can read the MSRs on at least 1 processor via the /dev/cpu interface.

Yes, that would be possible, provided it's not important what specific
CPU it gets executed on.

 It may tell us just the same thing I think we learned here:
 
  (XEN) PC2[0] PC3[8589642315848] PC6[0] PC7[0]
  (XEN) CC3[28794734145697] CC6[0] CC7[0]
 
 which I'm assuming are a dump of the MSR residency counters.
 If yes, it appears to be that this platform is not invoking c6 and pc6 at all,
 and that the deepest state being used is actually cc3 and pc3.
 I don't know if that is because you've booted the kernel with max_cstate=N
 of some kind, or if this is default.

Sadly I haven't been able to tell which original mail the quotes
above are from, and since I had Steve experiment with disabling
the deepest C-state permitted to be used, it may well be that
this output was from one of those experiments. Remember, we
already know that with use of C6 alone disabled things work for
him (Steve - please correct me if I'm misremembering).

 Guessing...
 If no surprises in the debug stuff requested above, and
 If the XEN debug stuff above is with c6 explicitly disabled...
 Note that there are two kinds of c6 -- CC6 (core) and PC6 (package).
 If this box supports both, the next thing to try will be to keep CC6
 enabled, but to just disable PC6.  This is done via an MSR that turbostat
 dumps out (MSR_NHM_SNB_PKG_CST_CFG_CTL) via the wrmsr(8) utility.

I don't think the wrmsr tool can be used (unmodified) to reliably do
this on all CPUs in the system - we'd likely have to cook up a patch
to the hypervisor instead, or I'd have to hand my patch to msr-tools
to Steve so he could use the tool under Xen (albeit that would also
require him to use one of our forward ported kernels, as the
upstream one doesn't have a pCPU sysfs interface yet afaik).

 Though if that MSR is locked by the BIOS, then BIOS SETUP option
 may be the only way to disable the package C-state limit without
 also disabling the associated core C-state.

Steve, could you check whether any such option exists (it's been
a while, so apologies if we had asked already)?

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Regression, host crash with 4.5rc1

2015-02-27 Thread Dugger, Donald D
Len (CC'd on this email) is our power expert who has some ideas on this issue, 
I'll let him explain further.

--
Don Dugger
Censeo Toto nos in Kansa esse decisse. - D. Gale
Ph: 303/443-3786

-----Original Message-----
From: Jan Beulich [mailto:jbeul...@suse.com] 
Sent: Thursday, November 27, 2014 2:28 AM
To: Steve Freitas; Dugger, Donald D; Nakajima, Jun
Cc: xen-devel@lists.xen.org; Don Slutz
Subject: Re: [Xen-devel] Regression, host crash with 4.5rc1

 On 27.11.14 at 06:29, sfl...@ihonk.com wrote:
 On 11/25/2014 03:00 AM, Jan Beulich wrote:
 Okay, so it's not really the mwait-idle driver causing the 
 regression, but it is C-state related. Hence we're now down to seeing 
 whether all or just the deeper C states are affected, i.e. I now need 
 to ask you to play with max_cstate=. For that you'll have to 
 remember that the option's effect differs between the ACPI and the MWAIT 
 idle drivers.
 In the spirit of bisection I'd suggest using max_cstate=2 first no 
 matter which of the two scenarios you pick. If that still hangs, 
 max_cstate=1 obviously is the only other thing to try. Should that 
 not hang (and you left out mwait-idle=0), trying max_cstate=3
 in that same scenario would be the other case to check.

 No need for 'd' and 'a' output for the time being, but 'c' output 
 would be much appreciated for all cases where you observe hangs.

 
 Okay, working through that now. I tried max_cstate=2 and got no hangs, 
 whether with or without mwait-idle=0. However, I was puzzled by this:
 
 (XEN) 'c' pressed - printing ACPI Cx structures
 (XEN) ==cpu0==
 (XEN) active state: C0
 (XEN) max_cstate:   C2
 (XEN) states:
 (XEN) C1:   type[C1] latency[003] usage[12219860] method[  FFH] 
 duration[1190961948551]
 (XEN) C2:   type[C1] latency[010] usage[10205554] method[  FFH] 
 duration[2015393965907]
 (XEN) C3:   type[C2] latency[020] usage[50926286] method[  FFH] 
 duration[30527997858148]
 (XEN)*C0:   usage[73351700] duration[9974627547595]
 (XEN) max=0 pwr=0 urg=0 nxt=0
 (XEN) PC2[0] PC3[8589642315848] PC6[0] PC7[0]
 (XEN) CC3[28794734145697] CC6[0] CC7[0]
 (XEN) ==cpu1==
 (XEN) active state: C3
 (XEN) max_cstate:   C2
 (XEN) states:
 (XEN) C1:   type[C1] latency[003] usage[10699950] method[  FFH] 
 duration[1141422044112]
 (XEN) C2:   type[C1] latency[010] usage[06382904] method[  FFH] 
 duration[1329739264322]
 (XEN)*C3:   type[C2] latency[020] usage[44630764] method[  FFH] 
 duration[31676618425954]
 (XEN) C0:   usage[61713618] duration[9561201640320]
 (XEN) max=0 pwr=0 urg=0 nxt=0
 (XEN) PC2[0] PC3[8589642315848] PC6[0] PC7[0]
 (XEN) CC3[30066495105056] CC6[0] CC7[0] [...]
 
 Why would some of the cores be in C3 even though they list max_cstate as C2?

This was precisely the reason why I told you that the numbering differs (and is 
confusing and has nothing to do with actual C state
numbers): What max_cstate refers to in the mwait-idle driver is what above is 
listed as type[Cx], i.e. the state at index 1 is C1, at
2 we've got C1E, and at 3 we've got C2. And those still aren't in line with the 
numbering the CPU documentation uses, it's rather kind of meant to refer to the 
ACPI numbering (but probably also not fully matching up).

So max_cstate=2 working suggests a problem with what the CPU calls C6, which 
presumably isn't all that surprising considering the many errata (BD35, BD38, 
BD40, BD59, BD87, and BD104). Not sure how to proceed from here - I suppose you 
already made sure you run with the latest available BIOS. And with 6 errata 
documented it's not all that unlikely that there's a 7th one with MONITOR/MWAIT 
behavior. The commit you bisected to (and which you had verified to be the 
culprit by just forcing
arch_skip_send_event_check() to always return false) could be reasonably 
assumed to be broken only when MWAIT use for all C states didn't work.

Don, Jun - is there anything known but not yet publicly documented for Family 6 
Model 44 Xeons?

Jan




Re: [Xen-devel] Regression, host crash with 4.5rc1

2015-02-27 Thread Brown, Len
(Please forgive my lack of Xen-fu knowledge in advance)

If this issue were to happen on Linux/bare-metal, this is how I'd debug it.
Hopefully some of this will translate to Xen in one way or another.

dmesg | grep idle
will tell us what idle driver is running (on Dom0 kernel)
and if it is intel_idle, it will also tell us the supported sub-states 
(CPUID.MWAIT.EDX value)

grep . /sys/devices/system/cpu/cpu0/cpuidle/*/*
will tell us what states the OS is requesting.
It will expand on the FFH bit here:

  (XEN) C1:   type[C1] latency[003] usage[12219860] method[  FFH]
  duration[1190961948551]
  (XEN) C2:   type[C1] latency[010] usage[10205554] method[  FFH]
  duration[2015393965907]
  (XEN) C3:   type[C2] latency[020] usage[50926286] method[  FFH]
  duration[30527997858148]

I'm hopeful that this information comes from the hardware's BIOS
and not some hypervisor tricking out Dom0 with a fake BIOS, yes?

If Xen doesn't have cpuidle, or its sysfs, then acpidump for the platform
should be able to tell us what the platform is exporting.

Next, hopefully the attached turbostat utility can be invoked on Dom0
and it can read the MSRs on at least 1 processor via the /dev/cpu interface.

This will tell you what the hardware supports, and what HW states are actually
being invoked.  (which  may be different from what the OS asks for...)

It may tell us just the same thing I think we learned here:

  (XEN) PC2[0] PC3[8589642315848] PC6[0] PC7[0]
  (XEN) CC3[28794734145697] CC6[0] CC7[0]

which I'm assuming are a dump of the MSR residency counters.
If yes, it appears to be that this platform is not invoking c6 and pc6 at all,
and that the deepest state being used is actually cc3 and pc3.
I don't know if that is because you've booted the kernel with max_cstate=N
of some kind, or if this is default.
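As a rough illustration of the reading above, the sketch below parses the two residency lines and reports the deepest state with a non-zero counter. It assumes the PCn[...] and CCn[...] fields are cumulative residency counters, which is the same assumption made in the text; the function names are made up for this example.

```python
import re

# Matches one counter such as "PC3[8589642315848]" or "CC6[0]".
COUNTER = re.compile(r"\b(PC|CC)(\d+)\[(\d+)\]")


def parse_residency(lines):
    """Return {"PC": {state: count}, "CC": {state: count}} from '(XEN)' lines."""
    counters = {"PC": {}, "CC": {}}
    for line in lines:
        for kind, state, value in COUNTER.findall(line):
            counters[kind][int(state)] = int(value)
    return counters


def deepest_used(states):
    """Deepest state number with a non-zero counter, or None if all are zero."""
    used = [state for state, value in states.items() if value > 0]
    return max(used) if used else None


dump = [
    "(XEN) PC2[0] PC3[8589642315848] PC6[0] PC7[0]",
    "(XEN) CC3[28794734145697] CC6[0] CC7[0]",
]
counters = parse_residency(dump)
print("deepest package state used:", deepest_used(counters["PC"]))
print("deepest core state used:", deepest_used(counters["CC"]))
```

For the dump quoted here both answers are 3, which matches the observation that neither CC6 nor PC6 is being entered.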

attached is turbostat, source and binary, run it this way
and send the ts.out file:

# ./turbostat --debug sleep 5 > ts.out 2>&1

Guessing...
If no surprises in the debug stuff requested above, and
If the XEN debug stuff above is with c6 explicitly disabled...
Note that there are two kinds of c6 -- CC6 (core) and PC6 (package).
If this box supports both, the next thing to try will be to keep CC6
enabled, but to just disable PC6.  This is done via an MSR that turbostat
dumps out (MSR_NHM_SNB_PKG_CST_CFG_CTL) via the wrmsr(8) utility.
Though if that MSR is locked by the BIOS, then BIOS SETUP option
may be the only way to disable the package C-state limit without
also disabling the associated core C-state.
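To make the lock check concrete, here is a minimal sketch of decoding that register on bare-metal Linux. It assumes the register sits at address 0xE2 with the package C-state limit in bits 2:0 and the lock flag in bit 15, which is my understanding of how turbostat decodes it; the meaning of each limit value is model-specific and is deliberately not interpreted. The raw values at the bottom are hypothetical, not taken from this machine.

```python
import os
import struct

# Assumed register layout, following my reading of how turbostat decodes it.
MSR_NHM_SNB_PKG_CST_CFG_CTL = 0xE2
PKG_LIMIT_MASK = 0x7    # bits 2:0: package C-state limit (encoding is model-specific)
CFG_LOCK = 1 << 15      # bit 15: once set, the limit cannot be rewritten until reset


def read_msr(cpu, reg):
    """Read a 64-bit MSR through the Linux msr driver (needs root and 'modprobe msr')."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        return struct.unpack("<Q", os.pread(fd, 8, reg))[0]
    finally:
        os.close(fd)


def decode(value):
    """Split a raw register value into the limit field and the lock flag."""
    return {"limit": value & PKG_LIMIT_MASK, "locked": bool(value & CFG_LOCK)}


def with_limit(value, new_limit):
    """Return the value to write back for a new limit, or raise if locked."""
    if value & CFG_LOCK:
        raise PermissionError("register is locked; only BIOS setup can change the limit")
    return (value & ~PKG_LIMIT_MASK) | (new_limit & PKG_LIMIT_MASK)


# Hypothetical raw values, for illustration only.
print(decode(0x8403))
print(hex(with_limit(0x0403, 2)))
```

read_msr() is only defined here, not called, since it needs root and real hardware. The value returned by with_limit() is what would be handed to wrmsr(8) if the lock bit turns out to be clear.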

cheers,
-Len


ps. 



turbostat-test.tar.gz
Description: turbostat-test.tar.gz


Re: [Xen-devel] Regression, host crash with 4.5rc1

2014-12-03 Thread Dugger, Donald D
Jan-

No, I have no knowledge of any unpublished errata related to C-state issues.

--
Don Dugger
Censeo Toto nos in Kansa esse decisse. - D. Gale
Ph: 303/443-3786

-----Original Message-----
From: Jan Beulich [mailto:jbeul...@suse.com] 
Sent: Thursday, November 27, 2014 2:28 AM
To: Steve Freitas; Dugger, Donald D; Nakajima, Jun
Cc: xen-devel@lists.xen.org; Don Slutz
Subject: Re: [Xen-devel] Regression, host crash with 4.5rc1

 On 27.11.14 at 06:29, sfl...@ihonk.com wrote:
 On 11/25/2014 03:00 AM, Jan Beulich wrote:
 Okay, so it's not really the mwait-idle driver causing the 
 regression, but it is C-state related. Hence we're now down to seeing 
 whether all or just the deeper C states are affected, i.e. I now need 
 to ask you to play with max_cstate=. For that you'll have to 
 remember that the option's effect differs between the ACPI and the MWAIT 
 idle drivers.
 In the spirit of bisection I'd suggest using max_cstate=2 first no 
 matter which of the two scenarios you pick. If that still hangs, 
 max_cstate=1 obviously is the only other thing to try. Should that 
 not hang (and you left out mwait-idle=0), trying max_cstate=3
 in that same scenario would be the other case to check.

 No need for 'd' and 'a' output for the time being, but 'c' output 
 would be much appreciated for all cases where you observe hangs.

 
 Okay, working through that now. I tried max_cstate=2 and got no hangs, 
 whether with or without mwait-idle=0. However, I was puzzled by this:
 
 (XEN) 'c' pressed - printing ACPI Cx structures
 (XEN) ==cpu0==
 (XEN) active state: C0
 (XEN) max_cstate:   C2
 (XEN) states:
 (XEN) C1:   type[C1] latency[003] usage[12219860] method[  FFH] 
 duration[1190961948551]
 (XEN) C2:   type[C1] latency[010] usage[10205554] method[  FFH] 
 duration[2015393965907]
 (XEN) C3:   type[C2] latency[020] usage[50926286] method[  FFH] 
 duration[30527997858148]
 (XEN)*C0:   usage[73351700] duration[9974627547595]
 (XEN) max=0 pwr=0 urg=0 nxt=0
 (XEN) PC2[0] PC3[8589642315848] PC6[0] PC7[0]
 (XEN) CC3[28794734145697] CC6[0] CC7[0]
 (XEN) ==cpu1==
 (XEN) active state: C3
 (XEN) max_cstate:   C2
 (XEN) states:
 (XEN) C1:   type[C1] latency[003] usage[10699950] method[  FFH] 
 duration[1141422044112]
 (XEN) C2:   type[C1] latency[010] usage[06382904] method[  FFH] 
 duration[1329739264322]
 (XEN)*C3:   type[C2] latency[020] usage[44630764] method[  FFH] 
 duration[31676618425954]
 (XEN) C0:   usage[61713618] duration[9561201640320]
 (XEN) max=0 pwr=0 urg=0 nxt=0
 (XEN) PC2[0] PC3[8589642315848] PC6[0] PC7[0]
 (XEN) CC3[30066495105056] CC6[0] CC7[0] [...]
 
 Why would some of the cores be in C3 even though they list max_cstate as C2?

This was precisely the reason why I told you that the numbering differs (and is 
confusing and has nothing to do with actual C state
numbers): What max_cstate refers to in the mwait-idle driver is what above is 
listed as type[Cx], i.e. the state at index 1 is C1, at
2 we've got C1E, and at 3 we've got C2. And those still aren't in line with the 
numbering the CPU documentation uses, it's rather kind of meant to refer to the 
ACPI numbering (but probably also not fully matching up).

So max_cstate=2 working suggests a problem with what the CPU calls C6, which 
presumably isn't all that surprising considering the many errata (BD35, BD38, 
BD40, BD59, BD87, and BD104). Not sure how to proceed from here - I suppose you 
already made sure you run with the latest available BIOS. And with 6 errata 
documented it's not all that unlikely that there's a 7th one with MONITOR/MWAIT 
behavior. The commit you bisected to (and which you had verified to be the 
culprit by just forcing
arch_skip_send_event_check() to always return false) could be reasonably 
assumed to be broken only when MWAIT use for all C states didn't work.

Don, Jun - is there anything known but not yet publicly documented for Family 6 
Model 44 Xeons?

Jan




Re: [Xen-devel] Regression, host crash with 4.5rc1

2014-11-28 Thread Steve Freitas

On 11/27/2014 01:27 AM, Jan Beulich wrote:

This was precisely the reason why I told you that the numbering
differs (and is confusing and has nothing to do with actual C state
numbers): What max_cstate refers to in the mwait-idle driver is
what above is listed as type[Cx], i.e. the state at index 1 is C1, at
2 we've got C1E, and at 3 we've got C2. And those still aren't in
line with the numbering the CPU documentation uses, it's rather
kind of meant to refer to the ACPI numbering (but probably also
not fully matching up).


Ah, thanks for the explanation.


So max_cstate=2 working suggests a problem with what the CPU
calls C6, which presumably isn't all that surprising considering the
many errata (BD35, BD38, BD40, BD59, BD87, and BD104). Not
sure how to proceed from here - I suppose you already made
sure you run with the latest available BIOS.


Yes, latest available BIOS.


And with 6 errata
documented it's not all that unlikely that there's a 7th one with
MONITOR/MWAIT behavior. The commit you bisected to (and
which you had verified to be the culprit by just forcing
arch_skip_send_event_check() to always return false) could be
reasonably assumed to be broken only when MWAIT use for all
C states didn't work.


Now I did get a hang with max_cstate=3 and mwait-idle=0. May I assume 
that mwait-idle=0 means that ACPI is responsible for the throttling?


Thanks again for all your help!

Steve



Re: [Xen-devel] Regression, host crash with 4.5rc1

2014-11-28 Thread Steve Freitas

On Nov 28, 2014, at 00:50, Jan Beulich jbeul...@suse.com wrote:

 On 28.11.14 at 09:24, sfl...@ihonk.com wrote:
 And with 6 errata
 documented it's not all that unlikely that there's a 7th one with
 MONITOR/MWAIT behavior. The commit you bisected to (and
 which you had verified to be the culprit by just forcing
 arch_skip_send_event_check() to always return false) could be
 reasonably assumed to be broken only when MWAIT use for all
 C states didn't work.
 
 Now I did get a hang with max_cstate=3 and mwait-idle=0.
 
 According to the data you provided earlier, max_cstate=3 is
 identical to not using that option at all when you also use
 mwait-idle=0. It would make a difference only when not using
 that latter option (and I specifically pointed this out in earlier
 replies).
 

Apologies for asking you to repeat yourself. Most of this stuff is over my head 
-- the only time I was this far down the rabbit hole was on an 8051.  Thanks 
for your patience. :-)

Steve




Re: [Xen-devel] Regression, host crash with 4.5rc1

2014-11-27 Thread Jan Beulich
 On 27.11.14 at 06:29, sfl...@ihonk.com wrote:
 On 11/25/2014 03:00 AM, Jan Beulich wrote:
 Okay, so it's not really the mwait-idle driver causing the regression,
 but it is C-state related. Hence we're now down to seeing whether all
 or just the deeper C states are affected, i.e. I now need to ask you
 to play with max_cstate=. For that you'll have to remember that the
 option's effect differs between the ACPI and the MWAIT idle drivers.
 In the spirit of bisection I'd suggest using max_cstate=2 first no
 matter which of the two scenarios you pick. If that still hangs,
 max_cstate=1 obviously is the only other thing to try. Should that
 not hang (and you left out mwait-idle=0), trying max_cstate=3
 in that same scenario would be the other case to check.

 No need for 'd' and 'a' output for the time being, but 'c' output would
 be much appreciated for all cases where you observe hangs.

 
 Okay, working through that now. I tried max_cstate=2 and got no hangs, 
 whether with or without mwait-idle=0. However, I was puzzled by this:
 
 (XEN) 'c' pressed - printing ACPI Cx structures
 (XEN) ==cpu0==
 (XEN) active state: C0
 (XEN) max_cstate:   C2
 (XEN) states:
 (XEN) C1:   type[C1] latency[003] usage[12219860] method[  FFH] 
 duration[1190961948551]
 (XEN) C2:   type[C1] latency[010] usage[10205554] method[  FFH] 
 duration[2015393965907]
 (XEN) C3:   type[C2] latency[020] usage[50926286] method[  FFH] 
 duration[30527997858148]
 (XEN)*C0:   usage[73351700] duration[9974627547595]
 (XEN) max=0 pwr=0 urg=0 nxt=0
 (XEN) PC2[0] PC3[8589642315848] PC6[0] PC7[0]
 (XEN) CC3[28794734145697] CC6[0] CC7[0]
 (XEN) ==cpu1==
 (XEN) active state: C3
 (XEN) max_cstate:   C2
 (XEN) states:
 (XEN) C1:   type[C1] latency[003] usage[10699950] method[  FFH] 
 duration[1141422044112]
 (XEN) C2:   type[C1] latency[010] usage[06382904] method[  FFH] 
 duration[1329739264322]
 (XEN)*C3:   type[C2] latency[020] usage[44630764] method[  FFH] 
 duration[31676618425954]
 (XEN) C0:   usage[61713618] duration[9561201640320]
 (XEN) max=0 pwr=0 urg=0 nxt=0
 (XEN) PC2[0] PC3[8589642315848] PC6[0] PC7[0]
 (XEN) CC3[30066495105056] CC6[0] CC7[0]
[...]
 
 Why would some of the cores be in C3 even though they list max_cstate as C2?

This was precisely the reason why I told you that the numbering
differs (and is confusing and has nothing to do with actual C state
numbers): What max_cstate refers to in the mwait-idle driver is
what above is listed as type[Cx], i.e. the state at index 1 is C1, at
2 we've got C1E, and at 3 we've got C2. And those still aren't in
line with the numbering the CPU documentation uses, it's rather
kind of meant to refer to the ACPI numbering (but probably also
not fully matching up).
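To illustrate the distinction, here is a small sketch that parses the 'c' dump lines and applies the limit to the type[Cx] value rather than to the table index. This is only a model of the explanation above, not the hypervisor's actual code, and the helper names are invented for the example.

```python
import re

# Captures the table index and the type[Cx] number from one dump line.
STATE = re.compile(r"C(\d+):\s+type\[C(\d+)\]")


def parse_states(lines):
    """Return [(index, type), ...] for every state line that carries a type."""
    found = []
    for line in lines:
        match = STATE.search(line)
        if match:
            found.append((int(match.group(1)), int(match.group(2))))
    return found


def permitted(states, max_cstate):
    """Indices still usable when the limit is compared against the type."""
    return [index for index, ctype in states if ctype <= max_cstate]


dump = [
    "(XEN) C1:   type[C1] latency[003] usage[10699950] method[  FFH]",
    "(XEN) C2:   type[C1] latency[010] usage[06382904] method[  FFH]",
    "(XEN)*C3:   type[C2] latency[020] usage[44630764] method[  FFH]",
    "(XEN) C0:   usage[61713618] duration[9561201640320]",
]
states = parse_states(dump)
print(permitted(states, 2))
print(permitted(states, 1))
```

With max_cstate=2 the entry at index 3 remains permitted because its type is C2, which is why a core can show "active state: C3" alongside "max_cstate: C2".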

So max_cstate=2 working suggests a problem with what the CPU
calls C6, which presumably isn't all that surprising considering the
many errata (BD35, BD38, BD40, BD59, BD87, and BD104). Not
sure how to proceed from here - I suppose you already made
sure you run with the latest available BIOS. And with 6 errata
documented it's not all that unlikely that there's a 7th one with
MONITOR/MWAIT behavior. The commit you bisected to (and
which you had verified to be the culprit by just forcing
arch_skip_send_event_check() to always return false) could be
reasonably assumed to be broken only when MWAIT use for all
C states didn't work.

Don, Jun - is there anything known but not yet publicly
documented for Family 6 Model 44 Xeons?

Jan




Re: [Xen-devel] Regression, host crash with 4.5rc1

2014-11-26 Thread Steve Freitas

On 11/25/2014 03:00 AM, Jan Beulich wrote:

Okay, so it's not really the mwait-idle driver causing the regression,
but it is C-state related. Hence we're now down to seeing whether all
or just the deeper C states are affected, i.e. I now need to ask you
to play with max_cstate=. For that you'll have to remember that the
option's effect differs between the ACPI and the MWAIT idle drivers.
In the spirit of bisection I'd suggest using max_cstate=2 first no
matter which of the two scenarios you pick. If that still hangs,
max_cstate=1 obviously is the only other thing to try. Should that
not hang (and you left out mwait-idle=0), trying max_cstate=3
in that same scenario would be the other case to check.

No need for 'd' and 'a' output for the time being, but 'c' output would
be much appreciated for all cases where you observe hangs.



Okay, working through that now. I tried max_cstate=2 and got no hangs, 
whether with or without mwait-idle=0. However, I was puzzled by this:


(XEN) 'c' pressed - printing ACPI Cx structures
(XEN) ==cpu0==
(XEN) active state: C0
(XEN) max_cstate:   C2
(XEN) states:
(XEN) C1:   type[C1] latency[003] usage[12219860] method[  FFH] duration[1190961948551]
(XEN) C2:   type[C1] latency[010] usage[10205554] method[  FFH] duration[2015393965907]
(XEN) C3:   type[C2] latency[020] usage[50926286] method[  FFH] duration[30527997858148]
(XEN)*C0:   usage[73351700] duration[9974627547595]
(XEN) max=0 pwr=0 urg=0 nxt=0
(XEN) PC2[0] PC3[8589642315848] PC6[0] PC7[0]
(XEN) CC3[28794734145697] CC6[0] CC7[0]
(XEN) ==cpu1==
(XEN) active state: C3
(XEN) max_cstate:   C2
(XEN) states:
(XEN) C1:   type[C1] latency[003] usage[10699950] method[  FFH] duration[1141422044112]
(XEN) C2:   type[C1] latency[010] usage[06382904] method[  FFH] duration[1329739264322]
(XEN)*C3:   type[C2] latency[020] usage[44630764] method[  FFH] duration[31676618425954]
(XEN) C0:   usage[61713618] duration[9561201640320]
(XEN) max=0 pwr=0 urg=0 nxt=0
(XEN) PC2[0] PC3[8589642315848] PC6[0] PC7[0]
(XEN) CC3[30066495105056] CC6[0] CC7[0]
(XEN) ==cpu2==
(XEN) active state: C3
(XEN) max_cstate:   C2
(XEN) states:
(XEN) C1:   type[C1] latency[003] usage[10829791] method[  FFH] duration[1145244102917]
(XEN) C2:   type[C1] latency[010] usage[06392468] method[  FFH] duration[1330830147023]
(XEN)*C3:   type[C2] latency[020] usage[44705668] method[  FFH] duration[31741190985486]
(XEN) C0:   usage[61927927] duration[9491716216846]
(XEN) max=0 pwr=0 urg=0 nxt=0
(XEN) PC2[0] PC3[8589642315848] PC6[0] PC7[0]
(XEN) CC3[30117696095715] CC6[0] CC7[0]
(XEN) ==cpu3==
(XEN) active state: C3
(XEN) max_cstate:   C2
(XEN) states:
(XEN) C1:   type[C1] latency[003] usage[10692336] method[  FFH] duration[1144876437514]
(XEN) C2:   type[C1] latency[010] usage[06394051] method[  FFH] duration[1333961503379]
(XEN)*C3:   type[C2] latency[020] usage[44744178] method[  FFH] duration[31803488799434]
(XEN) C0:   usage[61830565] duration[9426654792908]
(XEN) max=0 pwr=0 urg=0 nxt=0
(XEN) PC2[0] PC3[8589642315848] PC6[0] PC7[0]
(XEN) CC3[30191557548300] CC6[0] CC7[0]
(XEN) ==cpu4==
(XEN) active state: C3
(XEN) max_cstate:   C2
(XEN) states:
(XEN) C1:   type[C1] latency[003] usage[10746634] method[  FFH] duration[1144044534459]
(XEN) C2:   type[C1] latency[010] usage[06444054] method[  FFH] duration[1340637424913]
(XEN)*C3:   type[C2] latency[020] usage[44690901] method[  FFH] duration[31663207165902]
(XEN) C0:   usage[61881589] duration[9561092487876]
(XEN) max=0 pwr=0 urg=0 nxt=0
(XEN) PC2[0] PC3[8589642315848] PC6[0] PC7[0]
(XEN) CC3[30049235012919] CC6[0] CC7[0]
(XEN) ==cpu5==
(XEN) active state: C3
(XEN) max_cstate:   C2
(XEN) states:
(XEN) C1:   type[C1] latency[003] usage[10694684] method[  FFH] duration[1140625901110]
(XEN) C2:   type[C1] latency[010] usage[06461563] method[  FFH] duration[1342115502967]
(XEN)*C3:   type[C2] latency[020] usage[44834522] method[  FFH] duration[31719560664023]
(XEN) C0:   usage[61990769] duration[9506679619986]
(XEN) max=0 pwr=0 urg=0 nxt=0
(XEN) PC2[0] PC3[8589642315848] PC6[0] PC7[0]

Why would some of the cores be in C3 even though they list max_cstate as C2?

Steve



Re: [Xen-devel] Regression, host crash with 4.5rc1

2014-11-25 Thread Steve Freitas

On 11/25/2014 12:16 AM, Jan Beulich wrote:

(XEN) 'c' pressed - printing ACPI Cx structures
(XEN) ==cpu0==
(XEN) active state:C0
(XEN) max_cstate:C7
(XEN) states:
(XEN) C1:type[C1] latency[001] usage[5664] method[  FFH] 
duration[4042540627]
(XEN) C2:type[C3] latency[064] usage[0414] method[  FFH] 
duration[44725]
(XEN) C3:type[C3] latency[096] usage[2366] method[  FFH] 
duration[28183588810]
(XEN)*C0:usage[8444] duration[26752178344]
(XEN) max=0 pwr=0 urg=0 nxt=0
(XEN) PC2[0] PC3[112428588] PC6[21869019218] PC7[0]
(XEN) CC3[484210884] CC6[27943480555] CC7[0]

Interesting, so other than for me (perhaps due to other patches
I have in my tree) the change resulted in C states now being used
again despite mwait-idle=0, which is good. Question now is - with
this being the case, did the hangs re-occur?


Unfortunately they did. (Happened unusually quick this time, though I 
doubt the statistical significance.) Not sure what the desirable output 
is, so I did a couple of 'a' and 'd' requests, capped off by a 'c':


(XEN) *** Dumping CPU0 host state: ***
(XEN) [ Xen-4.5.0-rc  x86_64  debug=y  Not tainted ]
(XEN) CPU:0
(XEN) RIP:e008:[82d08012c9a2] _spin_unlock_irq+0x30/0x31
(XEN) RFLAGS: 0246   CONTEXT: hypervisor
(XEN) rax:    rbx: 8300a943d000   rcx: 
(XEN) rdx: 82d0802e   rsi: 0008   rdi: 82d080329308
(XEN) rbp: 82d0802e7ec8   rsp: 82d0802e7e40   r8: 82d080329320
(XEN) r9:     r10: f88002f2c2a0   r11: f88002f36d70
(XEN) r12: 01227e5280e4   r13: 8300a943d000   r14: 82d080329308
(XEN) r15: 01c9c380   cr0: 8005003b   cr4: 26f0
(XEN) cr3: 000c1b57c000   cr2: 07fefca62000
(XEN) ds:    es:    fs:    gs:    ss:    cs: e008
(XEN) Xen stack trace from rsp=82d0802e7e40:
(XEN)82d080128cb5 82d080329300 82d080329320 002e7e78
(XEN)82d080329300 82d0801b977e 8300a943d000 f88002f36d70
(XEN)8300a943d000 01c9c380 82d0801e5600 82d0802e7f08
(XEN)82d080300080 82d080300080  82d0802e
(XEN)fa8004fdff80 82d0802e7ef8 82d08012bfa3 8300a943d000
(XEN)f88002f36d70 018507f7ef25 000f 82d0802e7f08
(XEN)82d08012bffb 000f 82d0801e849a fa8004fdff80
(XEN)000f 018507f7ef25 f88002f36d70 000f
(XEN)f88002f2c180 f88002f36d70 f88002f2c2a0 f880043cb960
(XEN)f88002f2c2a0 0002 f88002f2c1c0 0400
(XEN) f88002f36eb0 beefbeef f8000293e20c
(XEN)00bfbeef 0046 f88002f36c20 beef
(XEN)beef beef beef beef
(XEN) 8300a943d000  
(XEN) Xen call trace:
(XEN)[82d08012c9a2] _spin_unlock_irq+0x30/0x31
(XEN)[82d08012bfa3] __do_softirq+0x81/0x8c
(XEN)[82d08012bffb] do_softirq+0x13/0x15
(XEN)[82d0801e849a] vmx_asm_do_vmentry+0x2a/0x45
(XEN)
(XEN) *** Dumping CPU0 guest state (d1v3): ***
(XEN) [ Xen-4.5.0-rc  x86_64  debug=y  Not tainted ]
(XEN) CPU:0
(XEN) RIP:0010:[f8000293e20c]
(XEN) RFLAGS: 0046   CONTEXT: hvm guest
(XEN) rax: 0002   rbx: f88002f2c180   rcx: f88002f2c1c0
(XEN) rdx: 0400   rsi:    rdi: f88002f36eb0
(XEN) rbp: 000f   rsp: f88002f36c20   r8: f88002f2c2a0
(XEN) r9:  f880043cb960   r10: f88002f2c2a0   r11: f88002f36d70
(XEN) r12: f88002f36d70   r13: 018507f7ef25   r14: 000f
(XEN) r15: fa8004fdff80   cr0: 80050031   cr4: 06f8
(XEN) cr3: 536d9000   cr2: 07fefca62000
(XEN) ds: 002b   es: 002b   fs: 0053   gs: 002b   ss:    cs: 0010
(XEN)
(XEN) *** Dumping CPU1 guest state (d1v4): ***
(XEN) [ Xen-4.5.0-rc  x86_64  debug=y  Not tainted ]
(XEN) CPU:1
(XEN) RIP:0010:[f8000293e20e]
(XEN) RFLAGS: 0046   CONTEXT: hvm guest
(XEN) rax: 0002   rbx: f88002fa2180   rcx: f88002fa21c0
(XEN) rdx: 0400   rsi:    rdi: f88002faceb0
(XEN) rbp: 000f   rsp: f88002facc20   r8: f88002fa22a0
(XEN) r9:  f88002fcaca0   r10: f88002fa22a0   r11: f88002facd70
(XEN) r12: f88002facd70   r13:    r14: 000f
(XEN) r15: f88002fa6fc0   cr0: 80050031   cr4: 06f8
(XEN) cr3: 00187000   cr2: 07faf478
(XEN) ds: 002b   es: 002b   fs: 0053   gs: 002b   ss: 0018   cs: 0010
(XEN)
(XEN) *** Dumping CPU2 guest state (d1v5): ***
(XEN) [ Xen-4.5.0-rc  x86_64  debug=y  Not tainted ]
(XEN) CPU:2
(XEN) 

Re: [Xen-devel] Regression, host crash with 4.5rc1

2014-11-25 Thread Jan Beulich
 On 25.11.14 at 10:38, sfl...@ihonk.com wrote:
 On 11/25/2014 12:16 AM, Jan Beulich wrote:
 Interesting, so other than for me (perhaps due to other patches
 I have in my tree) the change resulted in C states now being used
 again despite mwait-idle=0, which is good. Question now is - with
 this being the case, did the hangs re-occur?
 
 Unfortunately they did. (Happened unusually quick this time, though I 
 doubt the statistical significance.) Not sure what the desirable output 
 is, so I did a couple of 'a' and 'd' requests, capped off by a 'c':

Okay, so it's not really the mwait-idle driver causing the regression,
but it is C-state related. Hence we're now down to seeing whether all
or just the deeper C states are affected, i.e. I now need to ask you
to play with max_cstate=. For that you'll have to remember that the
option's effect differs between the ACPI and the MWAIT idle drivers.
In the spirit of bisection I'd suggest using max_cstate=2 first no
matter which of the two scenarios you pick. If that still hangs,
max_cstate=1 obviously is the only other thing to try. Should that
not hang (and you left out mwait-idle=0), trying max_cstate=3
in that same scenario would be the other case to check.
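The test order suggested above can be written down as a tiny decision helper. This is only a sketch of the plan as worded here; the function name and the mwait_idle_enabled flag are made up for the example, and the flag stands for "mwait-idle=0 was left out".

```python
def next_max_cstate(results, mwait_idle_enabled=True):
    """Pick the next max_cstate= value to boot with, or None when done.

    results maps values already tried to True (hang seen) or False (no hang).
    """
    if 2 not in results:
        return 2                      # always start in the middle
    if results[2]:
        # Still hangs at 2: only the shallowest setting is left to try.
        return 1 if 1 not in results else None
    # No hang at 2: 3 is only worth a run with the mwait-idle driver in use.
    if mwait_idle_enabled and 3 not in results:
        return 3
    return None


print(next_max_cstate({}))
print(next_max_cstate({2: True}))
print(next_max_cstate({2: False}))
print(next_max_cstate({2: False}, mwait_idle_enabled=False))
```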

No need for 'd' and 'a' output for the time being, but 'c' output would
be much appreciated for all cases where you observe hangs.

Jan




Re: [Xen-devel] Regression, host crash with 4.5rc1

2014-11-24 Thread Jan Beulich
 On 23.11.14 at 02:28, sfl...@ihonk.com wrote:
 With mwait-idle=0:
 
 (XEN) 'c' pressed - printing ACPI Cx structures
 (XEN) ==cpu0==
 (XEN) active state: C0
 (XEN) max_cstate:   C7
 (XEN) states:
 (XEN) C1:   type[C1] latency[001] usage[] method[  FFH] 
 duration[0]
 (XEN) C2:   type[C0] latency[000] usage[] method[ NONE] 
 duration[0]
 (XEN) C3:   type[C3] latency[064] usage[] method[  FFH] 
 duration[0]
 (XEN) C4:   type[C3] latency[096] usage[] method[  FFH] 
 duration[0]
 (XEN)*C0:   usage[] duration[46930624784]
 (XEN) PC2[0] PC3[0] PC6[0] PC7[0]
 (XEN) CC3[0] CC6[0] CC7[0]
[...]

Very interesting - the hypervisor has C-state information, but never
entered any of them. That certainly explains the difference between
using/not using the mwait-idle driver, but puts us back to there being
a more general issue with C-state use on this CPU model. Possibly
related to C2 having entry method NONE, but then again I can't
see how such a state could get entered into the table in the first place:
set_cx() bails upon check_cx() returning an error, and hence its
switch()'s default statement should never be reached. Plus even if an
array entry was set to NONE, it should simply be ignored when
looking for a state to enter. I'll probably need to put together a
debugging patch to figure out what's going on here.
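
For spotting such entries in a longer log, a small Python sketch along the following lines may help. It is not part of Xen; the regular expression is an assumption based solely on the 'c' output format quoted in this thread, and the names `CState`, `parse_cstates` and `suspicious` are made up for illustration.

```python
import re
from typing import List, NamedTuple


class CState(NamedTuple):
    index: int      # table position (the "C2" in "C2:")
    acpi_type: int  # reported ACPI type (the "C0" in "type[C0]")
    method: str     # entry method: FFH, SYSIO, HALT or NONE


# Matches lines such as
#   (XEN) C2:   type[C0] latency[000] usage[] method[ NONE]
# The trailing "*C0: usage[...]" summary line has no type[] field and is skipped.
LINE = re.compile(r"C(\d+):\s+type\[C(\d+)\].*?method\[\s*(\w+)\s*\]")


def parse_cstates(log: str) -> List[CState]:
    """Extract every per-state line from debug-key 'c' output."""
    return [CState(int(m.group(1)), int(m.group(2)), m.group(3))
            for m in LINE.finditer(log)]


def suspicious(states: List[CState]) -> List[CState]:
    """States that should never be selectable: no entry method, or type C0."""
    return [s for s in states if s.method == "NONE" or s.acpi_type == 0]
```

Run against the cpu0 block quoted above, the only entry flagged is C2.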

In any event C2 being set to NONE and that information presumably
coming from firmware is an indication that there's a problem with C2
(note that the numbering doesn't really match up with what the
document says, this likely really is C1E) on that CPU. Which gets us
back to ...

 CPU information for one of the cores, 2.8 GHz is nominal, stepping is 2. 
 Not sure how to translate that stepping number into Intel's format:
 
 processor   : 0
 vendor_id   : GenuineIntel
 cpu family  : 6
 model   : 44
 model name  : Intel(R) Xeon(R) CPU   X5660  @ 2.80GHz
 stepping: 2
[...]
 There are a couple potentially relevant errata (BC36, BC38, BC54,
 BC77, BC110).

 To exclude BC36, a boot log with apic-verbosity=debug and debug
 key 'i' output would be necessary.
 
 Done, see the very end of the email.
 
 BC38 should not affect us since we don't enter C states from ISRs.

 BC54 is probably irrelevant since we meanwhile know that your
 system doesn't really hang hard.

 For BC77 it would be worth trying to disable turbo mode instead of
 disabling the mwait-idle driver (xenpm disable-turbo-mode right
 after boot).
 
 I looked up BC77 but as a result found this document[1], which seems to 
 relate to the i7.  Would this[2] not be the relevant document?
 
 [1] 
 http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/core-i7-900-ee-and-desktop-processor-series-32nm-spec-update.pdf
 
 [2] 
 http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-5600-specification-update.pdf

Indeed. I wasn't aware that there are family/model/stepping tuples
that can be both Xeon and desktop CPUs.

 As promised, below is the apic-verbosity=debug log, with 'i'. Thanks!

I'm sorry, I misspelled the option, it's really apic_verbosity=debug.
The 'i' output at least already confirms that there are no ExtINT
entries among the IO-APIC RTEs.

Jan




Re: [Xen-devel] Regression, host crash with 4.5rc1

2014-11-24 Thread Steve Freitas
On Nov 24, 2014, at 00:45, Jan Beulich jbeul...@suse.com wrote:

 On 23.11.14 at 02:28, sfl...@ihonk.com wrote:
 With mwait-idle=0:
 
 (XEN) 'c' pressed - printing ACPI Cx structures
 (XEN) ==cpu0==
 (XEN) active state: C0
 (XEN) max_cstate:   C7
 (XEN) states:
 (XEN) C1:   type[C1] latency[001] usage[] method[  FFH] duration[0]
 (XEN) C2:   type[C0] latency[000] usage[] method[ NONE] duration[0]
 (XEN) C3:   type[C3] latency[064] usage[] method[  FFH] duration[0]
 (XEN) C4:   type[C3] latency[096] usage[] method[  FFH] duration[0]
 (XEN)*C0:   usage[] duration[46930624784]
 (XEN) PC2[0] PC3[0] PC6[0] PC7[0]
 (XEN) CC3[0] CC6[0] CC7[0]
 [...]
 
 Very interesting - the hypervisor has C-state information, but never
 entered any of them. That certainly explains the difference between
 using/not using the mwait-idle driver, but puts us back to there being
 a more general issue with C-state use on this CPU model. Possibly
 related to C2 having entry method NONE, but then again I can't
 see how such a state could get entered into the table in the first place:
 set_cx() bails upon check_cx() returning an error, and hence its
 switch()'s default statement should never be reached. Plus even if an
 array entry was set to NONE, it should simply be ignored when
 looking for a state to enter. I'll probably need to put together a
 debugging patch to figure out what's going on here.
 

Okay, happy to give it a go whenever you have the time to put something 
together.

 
 
 As promised, below is the apic-verbosity=debug log, with 'i'. Thanks!
 
 I'm sorry, I misspelled the option, it's really apic_verbosity=debug.
 The 'i' output at least already confirms that there are no ExtINT
 entries among the IO-APIC RTEs.
 
 

No sweat. Do you need me to run it again with the corrected option?

Thanks!

Steve


Re: [Xen-devel] Regression, host crash with 4.5rc1

2014-11-24 Thread Jan Beulich
 On 24.11.14 at 10:08, sfl...@ihonk.com wrote:
 On Nov 24, 2014, at 00:45, Jan Beulich jbeul...@suse.com wrote:
 
 On 23.11.14 at 02:28, sfl...@ihonk.com wrote:
 With mwait-idle=0:
 
 (XEN) 'c' pressed - printing ACPI Cx structures
 (XEN) ==cpu0==
 (XEN) active state: C0
 (XEN) max_cstate:   C7
 (XEN) states:
 (XEN) C1:   type[C1] latency[001] usage[] method[  FFH] duration[0]
 (XEN) C2:   type[C0] latency[000] usage[] method[ NONE] duration[0]
 (XEN) C3:   type[C3] latency[064] usage[] method[  FFH] duration[0]
 (XEN) C4:   type[C3] latency[096] usage[] method[  FFH] duration[0]
 (XEN)*C0:   usage[] duration[46930624784]
 (XEN) PC2[0] PC3[0] PC6[0] PC7[0]
 (XEN) CC3[0] CC6[0] CC7[0]
 [...]
 
 Very interesting - the hypervisor has C-state information, but never
 entered any of them. That certainly explains the difference between
 using/not using the mwait-idle driver, but puts us back to there being
 a more general issue with C-state use on this CPU model. Possibly
 related to C2 having entry method NONE, but then again I can't
 see how such a state could get entered into the table in the first place:
 set_cx() bails upon check_cx() returning an error, and hence its
 switch()'s default statement should never be reached. Plus even if an
 array entry was set to NONE, it should simply be ignored when
 looking for a state to enter. I'll probably need to put together a
 debugging patch to figure out what's going on here.
 
 Okay, happy to give it a go whenever you have the time to put something 
 together.

While putting this together I found the reason for the strange
C2: line, and the attached debugging patch already has the fix
for it (which I'll also submit separately, and hence you may need
to drop that specific hunk should you end up applying it on a tree
which already has that fix). You'll need to again run with
mwait-idle=0, and it's the boot messages along with the 'c'
debug key output that's of interest.

Thanks, Jan

--- unstable.orig/xen/arch/x86/acpi/cpu_idle.c
+++ unstable/xen/arch/x86/acpi/cpu_idle.c
@@ -58,7 +58,7 @@
 #include <xen/notifier.h>
 #include <xen/cpu.h>
 
-/*#define DEBUG_PM_CX*/
+#define DEBUG_PM_CX
 
 #define GET_HW_RES_IN_NS(msr, val) \
     do { rdmsrl(msr, val); val = tsc_ticks2ns(val); } while( 0 )
@@ -238,6 +238,9 @@ static char* acpi_cstate_method_name[] =
     "HALT"
 };
 
+struct reasons { unsigned long max, pwr, urg, nxt; };//temp
+static DEFINE_PER_CPU(struct reasons, reasons);//temp
+
 static void print_acpi_power(uint32_t cpu, struct acpi_processor_power *power)
 {
     uint32_t i, idle_usage = 0;
@@ -273,6 +276,8 @@ static void print_acpi_power(uint32_t cp
     printk((last_state_idx == 0) ? "   *" : "    ");
     printk("C0:\tusage[%08d] duration[%"PRId64"]\n",
            idle_usage, NOW() - idle_res);
+    printk("max=%lx pwr=%lx urg=%lx nxt=%lx\n",//temp
+           per_cpu(reasons.max, cpu), per_cpu(reasons.pwr, cpu), per_cpu(reasons.urg, cpu), per_cpu(reasons.nxt, cpu));
 
     print_hw_residencies(cpu);
 }
@@ -501,6 +506,7 @@ static void acpi_processor_idle(void)
     u32 exp = 0, pred = 0;
     u32 irq_traced[4] = { 0 };
 
+    next_state = 1;//temp
     if ( max_cstate > 0 && power && !sched_has_urgent_vcpu() &&
          (next_state = cpuidle_current_governor->select(power)) > 0 )
     {
@@ -519,6 +525,10 @@ static void acpi_processor_idle(void)
     }
     if ( !cx )
     {
+        this_cpu(reasons.max) += max_cstate <= 0;//temp
+        this_cpu(reasons.pwr) += !power;//temp
+        this_cpu(reasons.urg) += !!sched_has_urgent_vcpu();//temp
+        this_cpu(reasons.nxt) += next_state <= 0;//temp
         if ( pm_idle_save )
             pm_idle_save();
         else
@@ -1007,6 +1017,7 @@ static void set_cx(
         cx->entry_method = ACPI_CSTATE_EM_SYSIO;
         break;
     default:
+        printk("CPU%u: C%u space %x?\n", acpi_power->cpu, xen_cx->type, xen_cx->reg.space_id);//temp
         cx->entry_method = ACPI_CSTATE_EM_NONE;
         break;
     }
@@ -1015,7 +1026,7 @@ static void set_cx(
     cx->target_residency = cx->latency * latency_factor;
 
     smp_wmb();
-    acpi_power->count++;
+    acpi_power->count += (cx->type != ACPI_STATE_C1);
     if ( cx->type == ACPI_STATE_C1 || cx->type == ACPI_STATE_C2 )
         acpi_power->safe_state = cx;
 }
@@ -1141,6 +1152,7 @@ long set_cx_pminfo(uint32_t cpu, struct
 
     /* FIXME: C-state dependency is not supported by far */
 
+    printk("CPU%u: %pS, %pS\n", cpu, pm_idle_save, pm_idle);//temp
     if ( cpu_id == 0 )
     {
         if ( pm_idle_save == NULL )


Re: [Xen-devel] Regression, host crash with 4.5rc1

2014-11-21 Thread Jan Beulich
 On 20.11.14 at 21:07, sfl...@ihonk.com wrote:
 Running with mwait-idle=0 solves (hides?) the problem. Next step is to 
 fiddle with the C states?

So this also prompted me to go over the list of errata. Just to confirm
- your CPU is family 6 model 44? What stepping? And what nominal
frequency?

There are a couple potentially relevant errata (BC36, BC38, BC54,
BC77, BC110).

To exclude BC36, a boot log with apic-verbosity=debug and debug
key 'i' output would be necessary.

BC38 should not affect us since we don't enter C states from ISRs.

BC54 is probably irrelevant since we meanwhile know that your
system doesn't really hang hard.

For BC77 it would be worth trying to disable turbo mode instead of
disabling the mwait-idle driver (xenpm disable-turbo-mode right
after boot).

And BC110 would be relevant only if without the mwait-idle driver
there would be no use of C3. Plus anyway this would more likely end
up in a hard hang too.

And then, considering that my system with a family 6 model 44 CPU
has never shown anything similar (albeit that doesn't mean all that
much since our workloads are likely very different), you're not
over-clocking? And did you disable hyper-threading on purpose (if
so could you check whether enabling it makes a difference)?

Jan




Re: [Xen-devel] Regression, host crash with 4.5rc1

2014-11-20 Thread Jan Beulich
 On 20.11.14 at 02:23, sfl...@ihonk.com wrote:
 On 11/17/2014 23:54, Jan Beulich wrote:
 Another thing - now that serial logging appears to be working for
 you, did you try whether the host, once hung, still reacts to serial
 input (perhaps force input to go to Xen right at boot via the
 conswitch= option)? If so, 'd' debug-key output would likely be
 the piece of most interest.
 
 Here you go. Performed with a checkout of 9a727a81 (because it was 
 handy), let me know if you'd rather see the results from 4.5-rc2 or any 
 other Xen debugging info:

Interesting - all CPUs are executing stuff from Dom1, which by itself
is not indicating a problem, but may (considering the host hang) hint
at guest vCPU-s not getting de-scheduled anymore on one or more of
the CPUs. The 'a' debug key output would hopefully give us enough
information to know whether that's the case. Ideally do 'd' and 'a'
a couple of times each (alternating, and with a few seconds in
between).
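
Comparing several 'd' dumps by eye gets tedious, so here is a rough Python sketch for doing it mechanically. It is only a helper for reading logs, not anything Xen provides; the pattern is an assumption taken from the "Dumping CPUn guest state (dXvY)" lines seen in this thread, and the function names are hypothetical. A CPU showing the same vCPU in every sample is merely a hint that nothing got de-scheduled there, not proof.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Matches e.g. "(XEN) *** Dumping CPU0 guest state (d1v4): ***"
GUEST = re.compile(r"Dumping CPU(\d+) guest state \(d(\d+)v(\d+)\)")


def vcpus_per_cpu(samples: List[str]) -> Dict[int, List[Tuple[int, int]]]:
    """Map each physical CPU to the (domain, vcpu) pairs seen across samples."""
    seen = defaultdict(list)
    for log in samples:
        for cpu, dom, vcpu in GUEST.findall(log):
            seen[int(cpu)].append((int(dom), int(vcpu)))
    return dict(seen)


def never_rescheduled(samples: List[str]) -> List[int]:
    """CPUs that showed the very same vCPU in every one of >= 2 samples."""
    if len(samples) < 2:
        return []  # a single dump cannot show a lack of rescheduling
    return sorted(cpu for cpu, runs in vcpus_per_cpu(samples).items()
                  if len(runs) == len(samples) and len(set(runs)) == 1)
```

Each element of `samples` is meant to be the text of one complete 'd' dump, taken a few seconds apart as described above.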

Also, for double checking purposes, could you make the xen-syms
of a build you observed the problem with available somewhere?

Jan




Re: [Xen-devel] Regression, host crash with 4.5rc1

2014-11-20 Thread Steve Freitas

Hi Jan,

Thanks for all your help so far! Here's my latest update.

On 11/17/2014 23:54, Jan Beulich wrote:

Plus, without said adjustment, first just disable the
MWAIT CPU idle driver (mwait-idle=0) and then, if that didn't make
a difference, use of C states altogether (cpuidle=0). If any of this
does make a difference, limiting use of C states without fully
excluding their use may need to be the next step.


Running with mwait-idle=0 solves (hides?) the problem. Next step is to 
fiddle with the C states?


On 11/19/2014 23:59, Jan Beulich wrote:
Also, for double checking purposes, could you make the xen-syms of a 
build you observed the problem with available somewhere? 


The xen-syms (from my build of 9a727a81) can be found here: 
http://steve.freitas.org/xen-syms-4.5-unstable.gz


Interesting - all CPUs are executing stuff from Dom1, which by itself 
is not indicating a problem, but may (considering the host hang) hint 
at guest vCPU-s not getting de-scheduled anymore on one or more of the 
CPUs. The 'a' debug key output would hopefully give us enough 
information to know whether that's the case. Ideally do 'd' and 'a' a 
couple of times each (alternating, and with a few seconds in between). 


Here ya go (as before, from 9a727a81). I had to pick and choose what 
parts to pull from the log because it was getting spammed with kernel 
complaints about the disk subsystem being locked up, so the hypervisor 
debugging info was hard to read amidst the kernel errors. As ever, let 
me know if you need more.


Thanks again!

Steve

(XEN) CPU00:
(XEN)   ex=1445us timer=8300bf526060 cb=vcpu_singleshot_timer_fn(8300bf526000)
(XEN)   ex=9918us timer=830c3dc4b1e0 cb=csched_acct(830c3dc4b1c0)
(XEN)   ex=8390us timer=830c3dcc2d08 cb=csched_tick()
(XEN)   ex=70409499us timer=82d08031d1e0 cb=plt_overflow()
(XEN)   ex=12265483us timer=82d08031f4e0 cb=mce_work_fn()
(XEN)   ex=   94364us timer=82d08031d280 cb=time_calibration()
(XEN)   ex=   18390us timer=82d080321560 cb=do_dbs_timer(82d0803215a0)

(XEN) CPU01:
(XEN)   ex= 390us timer=830c17ceb460 cb=pt_timer_fn(830c17ceb420)
(XEN)   ex=14101194us timer=830c17ceb4e0 cb=pt_timer_fn(830c17ceb4a0)
(XEN)   ex=  153445us timer=8300bf524060 cb=vcpu_singleshot_timer_fn(8300bf524000)
(XEN)   ex= 44171681527us timer=830c17ceb290 cb=rtc_alarm_cb(830c17ceb1f0)

(XEN) CPU02:
(XEN)   ex=1445us timer=8300bf798060 cb=vcpu_singleshot_timer_fn(8300bf798000)
(XEN)   ex=8390us timer=830c3dc797c8 cb=csched_tick(0002)
(XEN)   ex=   18390us timer=830c3dcb8360 cb=do_dbs_timer(830c3dcb83a0)
(XEN)   ex=   29570us timer=830c3dcb80a0 cb=s_timer_fn()

(XEN) CPU03:
(XEN)   ex=   25445us timer=8300bf2fb060 cb=vcpu_singleshot_timer_fn(8300bf2fb000)

(XEN) CPU04:
(XEN)   ex= 634us timer=8300bf525060 cb=vcpu_singleshot_timer_fn(8300bf525000)

(XEN) CPU05:
(XEN)   ex=1445us timer=8300bf527060 cb=vcpu_singleshot_timer_fn(8300bf527000)
(XEN)   ex=   388096702us timer=830c17ceb5d0 cb=pmt_timer_callback(830c17ceb5a8)

(XEN) 'd' pressed - dumping registers
(XEN)
(XEN) *** Dumping CPU0 guest state (d1v4): ***
(XEN) [ Xen-4.5-unstable  x86_64  debug=y  Not tainted ]
(XEN) CPU:0
(XEN) RIP:0010:[f8000298a20e]
(XEN) RFLAGS: 0046   CONTEXT: hvm guest
(XEN) rax: 0002   rbx: f88002fa2180   rcx: f88002fa21c0
(XEN) rdx: 0400   rsi:    rdi: f88002faceb0
(XEN) rbp: 000f   rsp: f88002facc20   r8: f88002fa22a0
(XEN) r9:  03295cdc3c57   r10: f88002fa22a0   r11: f88002facd70
(XEN) r12: f88002facd70   r13: 03940027203c   r14: 000f
(XEN) r15: 0001   cr0: 80050031   cr4: 06f8
(XEN) cr3: 00187000   cr2: 01cb8300
(XEN) ds: 002b   es: 002b   fs: 0053   gs: 002b   ss: 0018   cs: 0010
(XEN)
(XEN) *** Dumping CPU1 guest state (d1v1): ***
(XEN) [ Xen-4.5-unstable  x86_64  debug=y  Not tainted ]
(XEN) CPU:1
(XEN) RIP:0010:[f8000298a20e]
(XEN) RFLAGS: 0046   CONTEXT: hvm guest
(XEN) rax: 0002   rbx: f88002e40180   rcx: f88002e401c0
(XEN) rdx: 0400   rsi:    rdi: f88002e4aeb0
(XEN) rbp: 000f   rsp: f88002e4ac20   r8: f88002e402a0
(XEN) r9:  0256dc1c8f33   r10: f88002e402a0   r11: f88002e4ad70
(XEN) r12: f88002e4ad70   r13: 0394002722bb   r14: 000f
(XEN) r15: 0001   cr0: 80050031   cr4: 06f8
(XEN) cr3: 00187000   cr2: 002f7d38
(XEN) ds: 002b   es: 002b   fs: 0053   gs: 002b   ss: 0018   cs: 0010
(XEN)
(XEN) *** Dumping CPU2 guest state (d1v0): 

Re: [Xen-devel] Regression, host crash with 4.5rc1

2014-11-19 Thread Steve Freitas

On 11/17/2014 23:54, Jan Beulich wrote:

On 17.11.14 at 20:21, sfl...@ihonk.com wrote:

Okay, I did a bisection and was not able to correlate the above error
message with the problem I'm seeing. Not saying it's not related, but I
had plenty of successful test runs in the presence of that error.

Took me about a week (sometimes it takes as much as 6 hours to produce
the error), but bisect narrowed it down to this commit:

http://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=9a727a813e9b25003e433b3dc3fa47e621f9e238

What do you think?

Thanks for narrowing this, even if this change didn't show any other
bad effects so far (and it's been widely tested by now), and even if
problems here would generally be expected to surface independent
of the use of PCI pass-through. But a hang (rather than a crash)
would indeed be the most natural result of something being wrong
here. To double check the result, could you, in an up-to-date tree,
simply make x86's arch_skip_send_event_check() return 0
unconditionally?


Made this change and the host was happy.


  Plus, without said adjustment, first just disable the
MWAIT CPU idle driver (mwait-idle=0) and then, if that didn't make
a difference, use of C states altogether (cpuidle=0). If any of this
does make a difference, limiting use of C states without fully
excluding their use may need to be the next step.


Will do this next.


Another thing - now that serial logging appears to be working for
you, did you try whether the host, once hung, still reacts to serial
input (perhaps force input to go to Xen right at boot via the
conswitch= option)? If so, 'd' debug-key output would likely be
the piece of most interest.


Here you go. Performed with a checkout of 9a727a81 (because it was 
handy), let me know if you'd rather see the results from 4.5-rc2 or any 
other Xen debugging info:


(XEN) 'd' pressed - dumping registers
(XEN)
(XEN) *** Dumping CPU0 guest state (d1v2): ***
(XEN) [ Xen-4.5-unstable  x86_64  debug=y  Not tainted ]
(XEN) CPU:0
(XEN) RIP:0010:[f8000281e2c1]
(XEN) RFLAGS: 0002   CONTEXT: hvm guest
(XEN) rax: 3acd4939f3e7   rbx: 3acd493a0cce   rcx: 
(XEN) rdx: 3acd   rsi:    rdi: 0057
(XEN) rbp: 645c   rsp: f880033edf90   r8: f880033edff0
(XEN) r9:     r10: f880033ee040   r11: 000342934690
(XEN) r12: f880033ee3c8   r13: 1000   r14: 
(XEN) r15: 0058   cr0: 80050031   cr4: 06f8
(XEN) cr3: 66aca000   cr2: f9800268
(XEN) ds: 002b   es: 002b   fs: 0053   gs: 002b   ss: 0018   cs: 0010
(XEN)
(XEN) *** Dumping CPU1 host state: ***
(XEN) [ Xen-4.5-unstable  x86_64  debug=y  Not tainted ]
(XEN) CPU:1
(XEN) RIP:e008:[82d08012a9a1] _spin_unlock_irq+0x30/0x31
(XEN) RFLAGS: 0246   CONTEXT: hypervisor
(XEN) rax:    rbx: 8300a943e000   rcx: 0001
(XEN) rdx: 830c3dc7   rsi: 0004   rdi: 830c3dc7a088
(XEN) rbp: 830c3dc77ec8   rsp: 830c3dc77e40   r8: 830c3dc7a0a0
(XEN) r9:     r10: f88002fd82a0   r11: f88002fe2d70
(XEN) r12: 151cc8b48756   r13: 8300a943e000   r14: 830c3dc7a088
(XEN) r15: 01c9c380   cr0: 8005003b   cr4: 26f0
(XEN) cr3: 000c18962000   cr2: ff331aa0
(XEN) ds:    es:    fs:    gs:    ss:    cs: e008
(XEN) Xen stack trace from rsp=830c3dc77e40:
(XEN)82d080126ec5 82d080321280 830c3dc7a0a0 000100c77e78
(XEN)830c3dc7a080 82d0801b5277 8300a943e000 f88002fe2d70
(XEN)8300a943e000 01c9c380 82d0801e0f00 830c3dc77f08
(XEN)82d0802f8080 82d0802f8000  830c3dc7
(XEN)0001 830c3dc77ef8 82d08012a1b3 8300a943e000
(XEN)f88002fe2d70 36d08fbeebe8 000f 830c3dc77f08
(XEN)82d08012a20b 000f 82d0801e3d2a 0001
(XEN)000f 36d08fbeebe8 f88002fe2d70 000f
(XEN)f88002fd8180 f88002fe2d70 f88002fd82a0 34711df61755
(XEN)f88002fd82a0 0002 f88002fd81c0 0400
(XEN) f88002fe2eb0 beefbeef f8000298520c
(XEN)00bfbeef 0046 f88002fe2c20 beef
(XEN)c2c2c2c2c2c2beef c2c2c2c2c2c2beef c2c2c2c2c2c2beef c2c2c2c2c2c2beef
(XEN)c2c2c2c20001 8300a943e000 003bbd958e00 c2c2c2c2c2c2c2c2
(XEN) Xen call trace:
(XEN)[82d08012a9a1] _spin_unlock_irq+0x30/0x31
(XEN)[82d08012a1b3] __do_softirq+0x81/0x8c
(XEN)[82d08012a20b] do_softirq+0x13/0x15
(XEN)[82d0801e3d2a] vmx_asm_do_vmentry+0x2a/0x45
(XEN)
(XEN) *** Dumping CPU1 guest state (d1v5): ***
(XEN) [ Xen-4.5-unstable  x86_64  debug=y  Not tainted ]
(XEN) CPU:  

Re: [Xen-devel] Regression, host crash with 4.5rc1

2014-11-10 Thread Steve Freitas

On 11/10/2014 0:51, Jan Beulich wrote:

On 10.11.14 at 09:03, sfl...@ihonk.com wrote:

Sorry for the delay, took some debugging on another computer to get
serial logging working. Due to its size, I've posted the entire log of a
crashed session here: http://pastebin.com/AiPHUZRH In this case I used a
3.0 gig memory size for the Windows domU.

As I mentioned before, sometimes it's the SATA that goes first, other
times the tg3 ethernet driver. Also note that between 4.4.1 and 4.5rc1,
the kernel I'm using (stock Debian Jessie) has not changed.

Please let me know if you need any other information. Thanks!

Raising the kernel log level to maximum too would have helped.


Okay, I've done that and the output is here, let me know if you have any 
preferred logging flags instead:


http://pastebin.com/M3yvWNTT


Regardless of that, the first device showing anomalies here appears
to be the UHCI controller:

 [  147.415713] usb 7-1: reset low-speed USB device number 2 using uhci_hcd

while booting the guest.


I assume this is related to the USB device (a keyboard) I'm passing 
through to the domU.



And these

 [  199.775209] pcieport :00:03.0: AER: Multiple Corrected error received: id=0018
 [  199.775238] pcieport :00:03.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0018(Transmitter ID)
 [  199.775251] pcieport :00:03.0:   device [8086:340a] error status/mask=1100/2000
 [  199.775255] pcieport :00:03.0:[ 8] RELAY_NUM Rollover
 [  199.775258] pcieport :00:03.0:[12] Replay Timer Timeout

hint at a problem in the system's design. 00:03.0 is the parent bridge
of 02:00.0 (and from what I can tell that's the only device behind that
bridge), and hence the above messages can only reasonably have
their origin at the passed through VGA device.
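
As a cross-check of the AER lines, the status/mask pair can be decoded by hand. The Python sketch below assumes the two values are hexadecimal (0x1100 and 0x2000) and uses the bit positions of the PCIe AER Correctable Error Status register; the function name is made up for illustration.

```python
from typing import List, Tuple

# Bit positions in the PCIe AER Correctable Error Status register.
# (The kernel log above spells bit 8 "RELAY_NUM Rollover".)
CORRECTABLE_BITS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
}


def decode_correctable(status: int, mask: int = 0) -> List[Tuple[int, str, bool]]:
    """Return (bit, name, masked?) for every known bit set in status."""
    return [(bit, name, bool((mask >> bit) & 1))
            for bit, name in sorted(CORRECTABLE_BITS.items())
            if (status >> bit) & 1]


for bit, name, masked in decode_correctable(0x1100, 0x2000):
    print(f"[{bit:2d}] {name}{' (masked)' if masked else ''}")
```

Under that assumption 0x1100 has bits 8 and 12 set, matching the two bracketed lines in the log, and the mask 0x2000 covers only bit 13, so neither reported error is masked. Both are Data Link Layer replay conditions, consistent with the "type=Data Link Layer" field.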


You are correct that the VGA card is the only device on 03.0:

root@g2:~# lspci -tv
-[:00]-+-00.0  Intel Corporation 5520 I/O Hub to ESI Port
   +-01.0-[01]00.0  Marvell Technology Group Ltd. MV64460/64461/64462 System Controller, Revision B
   +-03.0-[02]00.0  NVIDIA Corporation GT200GL [Quadro FX 4800]
   +-07.0-[03]--
   +-14.0  Intel Corporation 7500/5520/5500/X58 I/O Hub System Management Registers
   +-14.1  Intel Corporation 7500/5520/5500/X58 I/O Hub GPIO and Scratch Pad Registers
   +-14.2  Intel Corporation 7500/5520/5500/X58 I/O Hub Control Status and RAS Registers
   +-16.0  Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device
   +-16.1  Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device
   +-16.2  Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device
   +-16.3  Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device
   +-16.4  Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device
   +-16.5  Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device
   +-16.6  Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device
   +-16.7  Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device
   +-1a.0  Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #4
   +-1a.1  Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #5
   +-1a.7  Intel Corporation 82801JI (ICH10 Family) USB2 EHCI Controller #2
   +-1b.0  Intel Corporation 82801JI (ICH10 Family) HD Audio Controller
   +-1c.0-[04]--
   +-1c.4-[05]00.0  Broadcom Corporation NetXtreme BCM5755 Gigabit Ethernet PCI Express
   +-1c.5-[06-09]00.0-[07-09]--+-02.0-[08]--
   |   \-03.0-[09]00.0 Broadcom Corporation NetXtreme BCM5754 Gigabit Ethernet PCI Express
   +-1d.0  Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #1
   +-1d.1  Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #2
   +-1d.2  Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #3
   +-1d.3  Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #6
   +-1d.7  Intel Corporation 82801JI (ICH10 Family) USB2 EHCI Controller #1
   +-1e.0-[0a]0e.0  Advanced Micro Devices, Inc. [AMD/ATI] RV100 [Radeon 7000 / Radeon VE]
   +-1f.0  Intel Corporation 82801JIB (ICH10) LPC Interface Controller
   +-1f.2  Intel Corporation 82801JI (ICH10 Family) SATA AHCI Controller
   \-1f.3  Intel Corporation 82801JI (ICH10 Family) SMBus Controller


What problem in the system's design does this hint at?


IOW it may well be that
you were just lucky that things worked earlier on.


Certainly possible but this is a very common machine in the corporate 
world -- a Lenovo ThinkStation D20 running the X58 chipset. If it's an 
inherent defect in the machine and somebody else hasn't already tripped 
over it I would be very surprised.