Re: [Xen-devel] [BUG] Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1163

2016-01-22 Thread Andrew Cooper
On 22/01/2016 08:57, Håkon Alstadheim wrote:
> On 17 Jan 2016 16:25, Andrew Cooper wrote:
>> On 17/01/16 15:16, Andrew Cooper wrote:
>>>>> This isn't the first time we have seen this on Haswell processors. Do
>>>>> you have microcode loading set up?
>>>>>
>>>>> ~Andrew
>>>>>
>>>> Still happening with kernel-genkernel-x86_64-4.1.15-gentoo and updated
>>>> cpu microcode, using microcode from 20151106.
> ...
>> Actually, this will be more useful:
>>
>> diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
>> index 1228568..4e75b03 100644
>> --- a/xen/arch/x86/irq.c
>> +++ b/xen/arch/x86/irq.c
>> @@ -1165,6 +1165,15 @@ static void __do_IRQ_guest(int irq)
>>      if ( action->ack_type == ACKTYPE_EOI )
>>      {
>>          sp = pending_eoi_sp(peoi);
>> +        if ( unlikely(!((sp == 0) || (peoi[sp-1].vector < vector))) )
>> +        {
>> +            int p;
>> +
>> +            printk("** sp %d, irq %d, vec %#x\n", sp, irq, vector);
>> +            for ( p = sp; p > 0; --p )
>> +                printk("**peoi[%d] = {%d, %#x, %d}\n",
>> +                       p-1, peoi[p-1].irq, peoi[p-1].vector, peoi[p-1].ready);
>> +        }
>>          ASSERT((sp == 0) || (peoi[sp-1].vector < vector));
>>          ASSERT(sp < (NR_DYNAMIC_VECTORS-1));
>>          peoi[sp].irq = irq;



> Got one again. dom5 is my desktop, dom1 is my
> mail-server/router/firewall. (planning to split that up ... ) . Is there
> any additional info that would be useful?
>
> Running now with gentoo xen 4.6.0-r8 and xen-tools 4.6.0-r7. dom0 kernel
> is gentoo-sources-4.1.15-r1 , and the above patch.
>
> I tried running with maxcpus=6 for a while, but I had to disable some
> services to get that running. So, when nothing happened for a while I
> re-enabled all my cores (two cpus, 12 cores, 24 threads). I was running
> with two cpu-pools, one for each cpu. I have not re-enabled that.

grant_table.c:1491:d1v3 Expanding dom (1) grant table from (12) to (13)
frames.
** sp 1, irq 107, vec 0x3b
**peoi[0] = {107, 0x3b, 0}
Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1172
[ Xen-4.6.0  x86_64  debug=y  Tainted:C ]

Xen call trace:
   [] do_IRQ+0x451/0x6ea
   [] common_interrupt+0x62/0x70
   [] mwait_idle+0x2cb/0x315
   [] idle_loop+0x51/0x6b

So we have been interrupted with an interrupt we already believe to be
pending.  I wonder if there is an erratum to do with going to sleep with
a pending interrupt.

I will see about extending the debugging patch to stash the IRR/ISR
before going to sleep.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [BUG] Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1163

2016-01-22 Thread Jan Beulich
>>> On 22.01.16 at 10:20,  wrote:
> ** sp 1, irq 107, vec 0x3b
> **peoi[0] = {107, 0x3b, 0}
> Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1172
> [ Xen-4.6.0  x86_64  debug=y  Tainted:C ]
> 
> Xen call trace:
>[] do_IRQ+0x451/0x6ea
>[] common_interrupt+0x62/0x70
>[] mwait_idle+0x2cb/0x315
>[] idle_loop+0x51/0x6b
> 
> So we have been interrupted with an interrupt we already believe to be
> pending.  I wonder if there is an erratum to do with going to sleep with
> a pending interrupt.

An immediate way to check whether that's (part of) the problem
would be to run with "cpuidle=0" for a while.

Jan




Re: [Xen-devel] [BUG] Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1163

2016-01-18 Thread Jan Beulich
>>> On 18.01.16 at 11:35,  wrote:
> On 18/01/16 10:31, Jan Beulich wrote:
>> On 18.01.16 at 00:07,  wrote:
>>> There we go :-/ . Log attached from boot to assertion-failure with
>>> loglvl=all guest_loglvl=all . Some of the log output might be a bit
>>> cryptic, they are notes to myself from local boot-scripts, basically
>>> firing up my router/name-server/dhcp-server and waiting until services
>>> are ready before continuing.
>>>
>>> ---
>>> (XEN) [2016-01-17 22:46:38] **peoi[0] = {107, 0x40, 0}
>> According to
>>
>> (XEN) [2016-01-17 16:50:49] IOAPIC[0]: Set PCI routing entry (1-3 -> 0x40 ->
>> IRQ 3 Mode:0 Active:0)
>>
>> this might be the serial console, albeit IRQ 107 contradicts this
>> afaict. Does this also occur without serial console? Are we
>> perhaps wrongly re-using vector 0x40 (and if so might this be
>> fixed with -unstable commit fc0c3fa2ad, in turn requiring
>> e509b8e09c)?
> 
> I also had a bug in the first patch which printed the vector as 0x%u,
> fixed in the second to be %#x.  As such, the actual vector on the
> pending EOI stack is 0x28.

That wouldn't make it any better, as then, considering the other
similar messages, we would have to conclude it's the vector of
some other Xen internally used device (the IOMMU?), which again
shouldn't be used by guest IRQ unless it got recycled (albeit I don't
think e.g. IOMMU vectors get recycled at all).

Håkon, considering

(XEN) Failed to enable Interrupt Remapping: Will not enable x2APIC.

plus

(XEN) Intel VT-d Interrupt Remapping enabled.

(a logging inconsistency addressed on -unstable already) could you
check your BIOS setup whether you can make firmware permit use
of x2APIC mode? And could you try whether the issue goes away
with "maxcpus=6" (or less) on the Xen command line?

Also, you appear to be doing GPU pass-through - is the problem
connected to that?

Jan



Re: [Xen-devel] [BUG] Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1163

2016-01-18 Thread Andrew Cooper
On 18/01/16 10:31, Jan Beulich wrote:
> On 18.01.16 at 00:07,  wrote:
>> There we go :-/ . Log attached from boot to assertion-failure with
>> loglvl=all guest_loglvl=all . Some of the log output might be a bit
>> cryptic, they are notes to myself from local boot-scripts, basically
>> firing up my router/name-server/dhcp-server and waiting until services
>> are ready before continuing.
>>
>> ---
>> (XEN) [2016-01-17 22:46:38] **peoi[0] = {107, 0x40, 0}
> According to
>
> (XEN) [2016-01-17 16:50:49] IOAPIC[0]: Set PCI routing entry (1-3 -> 0x40 -> 
> IRQ 3 Mode:0 Active:0)
>
> this might be the serial console, albeit IRQ 107 contradicts this
> afaict. Does this also occur without serial console? Are we
> perhaps wrongly re-using vector 0x40 (and if so might this be
> fixed with -unstable commit fc0c3fa2ad, in turn requiring
> e509b8e09c)?

I also had a bug in the first patch which printed the vector as 0x%u,
fixed in the second to be %#x.  As such, the actual vector on the
pending EOI stack is 0x28.

~Andrew



Re: [Xen-devel] [BUG] Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1163

2016-01-18 Thread Jan Beulich
>>> On 18.01.16 at 00:07,  wrote:
> There we go :-/ . Log attached from boot to assertion-failure with
> loglvl=all guest_loglvl=all . Some of the log output might be a bit
> cryptic, they are notes to myself from local boot-scripts, basically
> firing up my router/name-server/dhcp-server and waiting until services
> are ready before continuing.
> 
> ---
> (XEN) [2016-01-17 22:46:38] **peoi[0] = {107, 0x40, 0}

According to

(XEN) [2016-01-17 16:50:49] IOAPIC[0]: Set PCI routing entry (1-3 -> 0x40 -> 
IRQ 3 Mode:0 Active:0)

this might be the serial console, albeit IRQ 107 contradicts this
afaict. Does this also occur without serial console? Are we
perhaps wrongly re-using vector 0x40 (and if so might this be
fixed with -unstable commit fc0c3fa2ad, in turn requiring
e509b8e09c)?

Jan




Re: [Xen-devel] [BUG] Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1163

2016-01-18 Thread Håkon Alstadheim
On 18 Jan 2016 11:31, Jan Beulich wrote:
> On 18.01.16 at 00:07,  wrote:
>> There we go :-/ . Log attached from boot to assertion-failure with
>> loglvl=all guest_loglvl=all . Some of the log output might be a bit
>> cryptic, they are notes to myself from local boot-scripts, basically
>> firing up my router/name-server/dhcp-server and waiting until services
>> are ready before continuing.
>>
>> ---
>> (XEN) [2016-01-17 22:46:38] **peoi[0] = {107, 0x40, 0}
> According to
>
> (XEN) [2016-01-17 16:50:49] IOAPIC[0]: Set PCI routing entry (1-3 -> 0x40 -> 
> IRQ 3 Mode:0 Active:0)
>
> this might be the serial console, albeit IRQ 107 contradicts this
> afaict. Does this also occur without serial console? Are we
> perhaps wrongly re-using vector 0x40 (and if so might this be
> fixed with -unstable commit fc0c3fa2ad, in turn requiring
> e509b8e09c)?
>
I don't understand all this, but fyi I believe I have two "serial ports"
on the motherboard, one old fashioned serial-port and one created by BMC
for "SOL". They show up in dom0 as ttyS0 and ttyS1. Only ttyS0 is ever
used that I am aware of. I also have one usb-rs232 emulation thingy
which is actually my UPS. All of these are used directly by dom0.




Re: [Xen-devel] [BUG] Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1163

2016-01-17 Thread Andrew Cooper
On 17/01/2016 23:07, Håkon Alstadheim wrote:
> On 17 Jan 2016 17:30, Håkon Alstadheim wrote:
>> On 17 Jan 2016 16:16, Andrew Cooper wrote:
>>> On 17/01/16 14:50, Håkon Alstadheim wrote:
>>>> On 15 Jan 2016 12:05, Andrew Cooper wrote:
>>>>> On 15/01/16 10:58, Håkon Alstadheim wrote:
>>>>>> CPUINFO:
>>>>>> vendor_id: GenuineIntel
>>>>>> cpu family: 6
>>>>>> model: 63
>>>>>> model name: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
>>>>>>
>>>>>> # smbios-sys-info
>>>>>> Libsmbios version:  2.2.28
>>>>>> Product Name:   Z10PE-D8 WS
>>>>>> Vendor: ASUSTeK COMPUTER INC.
>>>>>> BIOS Version:   3101
>>>>>>
>>>>>>
>>>>>> I have been experiencing issues with domains with passed through PCIe
>>>>>> devices since I first installed xen. Then at version 4.5.x , I'm now
>>>>>> at 4.6.0 with gentoo patches. Crashes SEEM mostly related to this pci
>>>>>> pass through and interrupts (usb-cards, sound cards).
>>>>>>
>>>>>> Recently the system has been more stable, whether it is because I pass
>>>>>> through as few things as possible, or because of improvements in Xen I
>>>>>> do not know. I have also taken to building with debug, which leads to
>>>>>> more abrupt but less mysterious failures. Earlier (w/o debug and under
>>>>>> xen 4.5 ) stuff would just gradually stop working and end up in total
>>>>>> hang of everything. So, hey, things are improving :-b
>>>>> This isn't the first time we have seen this on Haswell processors. Do
>>>>> you have microcode loading set up?
>>>>>
>>>>> ~Andrew
>>>>>
>>>> Still happening with kernel-genkernel-x86_64-4.1.15-gentoo and updated
>>>> cpu microcode, using microcode from 20151106.
>>> Ok - I previously investigated this issue, but my repro evaporated from
>>> under my feet with a firmware update, and I never got to the bottom of it.
>>>
>>> Please can you start with the following patch which will dump some more
>>> information on crash.
>>>
>>> ---8<---
>>> diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
>>> index 1228568..588b562 100644
>>> --- a/xen/arch/x86/irq.c
>>> +++ b/xen/arch/x86/irq.c
>>> @@ -1165,6 +1165,13 @@ static void __do_IRQ_guest(int irq)
>>>  if ( action->ack_type == ACKTYPE_EOI )
>>>  {
>>>  sp = pending_eoi_sp(peoi);
>>> +if ( unlikely(!((sp == 0) || (peoi[sp-1].vector < vector))) )
>>> +{
>>> +int p;
>>> +for ( p = sp; p > 0; --p )
>>> +printk("**peoi[%d] = {%d, 0x%u, %d}\n",
>>> +   p-1, peoi[p-1].irq, peoi[p-1].vector,
>>> peoi[p-1].ready);
>>> +}
>>>  ASSERT((sp == 0) || (peoi[sp-1].vector < vector));
>>>  ASSERT(sp < (NR_DYNAMIC_VECTORS-1));
>>>  peoi[sp].irq = irq;
>>>
>>>
>> Will do. Building now.
>> Seems there is a line accidentally folded "peoi[p-1].ready);" belongs at
>> the end of preceding line I presume?
>>
> There we go :-/ . Log attached from boot to assertion-failure with
> loglvl=all guest_loglvl=all . Some of the log output might be a bit
> cryptic, they are notes to myself from local boot-scripts, basically
> firing up my router/name-server/dhcp-server and waiting until services
> are ready before continuing.

Would you mind running with the second patch I sent? It gathers more
information.

~Andrew



Re: [Xen-devel] [BUG] Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1163

2016-01-17 Thread Andrew Cooper
On 17/01/16 14:50, Håkon Alstadheim wrote:
> On 15 Jan 2016 12:05, Andrew Cooper wrote:
>> On 15/01/16 10:58, Håkon Alstadheim wrote:
>>> CPUINFO:
>>> vendor_id: GenuineIntel
>>> cpu family: 6
>>> model: 63
>>> model name: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
>>>
>>> # smbios-sys-info
>>> Libsmbios version:  2.2.28
>>> Product Name:   Z10PE-D8 WS
>>> Vendor: ASUSTeK COMPUTER INC.
>>> BIOS Version:   3101
>>>
>>>
>>> I have been experiencing issues with domains with passed through PCIe
>>> devices since I first installed xen. Then at version 4.5.x , I'm now
>>> at 4.6.0 with gentoo patches. Crashes SEEM mostly related to this pci
>>> pass through and interrupts (usb-cards, sound cards).
>>>
>>> Recently the system has been more stable, whether it is because I pass
>>> through as few things as possible, or because of improvements in Xen I
>>> do not know. I have also taken to building with debug, which leads to
>>> more abrupt but less mysterious failures. Earlier (w/o debug and under
>>> xen 4.5 ) stuff would just gradually stop working and end up in total
>>> hang of everything. So, hey, things are improving :-b
>> This isn't the first time we have seen this on Haswell processors. Do
>> you have microcode loading set up?
>>
>> ~Andrew
>>
> Still happening with kernel-genkernel-x86_64-4.1.15-gentoo and updated
> cpu microcode, using microcode from 20151106.

Ok - I previously investigated this issue, but my repro evaporated from
under my feet with a firmware update, and I never got to the bottom of it.

Please can you start with the following patch which will dump some more
information on crash.

---8<---
diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index 1228568..588b562 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -1165,6 +1165,13 @@ static void __do_IRQ_guest(int irq)
     if ( action->ack_type == ACKTYPE_EOI )
     {
         sp = pending_eoi_sp(peoi);
+        if ( unlikely(!((sp == 0) || (peoi[sp-1].vector < vector))) )
+        {
+            int p;
+            for ( p = sp; p > 0; --p )
+                printk("**peoi[%d] = {%d, 0x%u, %d}\n",
+                       p-1, peoi[p-1].irq, peoi[p-1].vector,
peoi[p-1].ready);
+        }
         ASSERT((sp == 0) || (peoi[sp-1].vector < vector));
         ASSERT(sp < (NR_DYNAMIC_VECTORS-1));
         peoi[sp].irq = irq;




Re: [Xen-devel] [BUG] Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1163

2016-01-17 Thread Andrew Cooper
On 17/01/16 15:16, Andrew Cooper wrote:
>
>>> This isn't the first time we have seen this on Haswell processors. Do
>>> you have microcode loading set up?
>>>
>>> ~Andrew
>>>
>> Still happening with kernel-genkernel-x86_64-4.1.15-gentoo and updated
>> cpu microcode, using microcode from 20151106.
> Ok - I previously investigated this issue, but my repro evaporated from
> under my feet with a firmware update, and I never got to the bottom of it.
>
> Please can you start with the following patch which will dump some more
> information on crash.
>
> ---8<---
> diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
> index 1228568..588b562 100644
> --- a/xen/arch/x86/irq.c
> +++ b/xen/arch/x86/irq.c
> @@ -1165,6 +1165,13 @@ static void __do_IRQ_guest(int irq)
>  if ( action->ack_type == ACKTYPE_EOI )
>  {
>  sp = pending_eoi_sp(peoi);
> +if ( unlikely(!((sp == 0) || (peoi[sp-1].vector < vector))) )
> +{
> +int p;
> +for ( p = sp; p > 0; --p )
> +printk("**peoi[%d] = {%d, 0x%u, %d}\n",
> +   p-1, peoi[p-1].irq, peoi[p-1].vector,
> peoi[p-1].ready);
> +}
>  ASSERT((sp == 0) || (peoi[sp-1].vector < vector));
>  ASSERT(sp < (NR_DYNAMIC_VECTORS-1));
>  peoi[sp].irq = irq;

Actually, this will be more useful:

diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index 1228568..4e75b03 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -1165,6 +1165,15 @@ static void __do_IRQ_guest(int irq)
     if ( action->ack_type == ACKTYPE_EOI )
     {
         sp = pending_eoi_sp(peoi);
+        if ( unlikely(!((sp == 0) || (peoi[sp-1].vector < vector))) )
+        {
+            int p;
+
+            printk("** sp %d, irq %d, vec %#x\n", sp, irq, vector);
+            for ( p = sp; p > 0; --p )
+                printk("**peoi[%d] = {%d, %#x, %d}\n",
+                       p-1, peoi[p-1].irq, peoi[p-1].vector, peoi[p-1].ready);
+        }
         ASSERT((sp == 0) || (peoi[sp-1].vector < vector));
         ASSERT(sp < (NR_DYNAMIC_VECTORS-1));
         peoi[sp].irq = irq;




Re: [Xen-devel] [BUG] Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1163

2016-01-17 Thread Håkon Alstadheim
On 17 Jan 2016 16:16, Andrew Cooper wrote:
> On 17/01/16 14:50, Håkon Alstadheim wrote:
>> On 15 Jan 2016 12:05, Andrew Cooper wrote:
>>> On 15/01/16 10:58, Håkon Alstadheim wrote:
>>>> CPUINFO:
>>>> vendor_id: GenuineIntel
>>>> cpu family: 6
>>>> model: 63
>>>> model name: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
>>>>
>>>> # smbios-sys-info
>>>> Libsmbios version:  2.2.28
>>>> Product Name:   Z10PE-D8 WS
>>>> Vendor: ASUSTeK COMPUTER INC.
>>>> BIOS Version:   3101
>>>>
>>>>
>>>> I have been experiencing issues with domains with passed through PCIe
>>>> devices since I first installed xen. Then at version 4.5.x , I'm now
>>>> at 4.6.0 with gentoo patches. Crashes SEEM mostly related to this pci
>>>> pass through and interrupts (usb-cards, sound cards).
>>>>
>>>> Recently the system has been more stable, whether it is because I pass
>>>> through as few things as possible, or because of improvements in Xen I
>>>> do not know. I have also taken to building with debug, which leads to
>>>> more abrupt but less mysterious failures. Earlier (w/o debug and under
>>>> xen 4.5 ) stuff would just gradually stop working and end up in total
>>>> hang of everything. So, hey, things are improving :-b
>>> This isn't the first time we have seen this on Haswell processors. Do
>>> you have microcode loading set up?
>>>
>>> ~Andrew
>>>
>> Still happening with kernel-genkernel-x86_64-4.1.15-gentoo and updated
>> cpu microcode, using microcode from 20151106.
> Ok - I previously investigated this issue, but my repro evaporated from
> under my feet with a firmware update, and I never got to the bottom of it.
>
> Please can you start with the following patch which will dump some more
> information on crash.
>
> ---8<---
> diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
> index 1228568..588b562 100644
> --- a/xen/arch/x86/irq.c
> +++ b/xen/arch/x86/irq.c
> @@ -1165,6 +1165,13 @@ static void __do_IRQ_guest(int irq)
>  if ( action->ack_type == ACKTYPE_EOI )
>  {
>  sp = pending_eoi_sp(peoi);
> +if ( unlikely(!((sp == 0) || (peoi[sp-1].vector < vector))) )
> +{
> +int p;
> +for ( p = sp; p > 0; --p )
> +printk("**peoi[%d] = {%d, 0x%u, %d}\n",
> +   p-1, peoi[p-1].irq, peoi[p-1].vector,
> peoi[p-1].ready);
> +}
>  ASSERT((sp == 0) || (peoi[sp-1].vector < vector));
>  ASSERT(sp < (NR_DYNAMIC_VECTORS-1));
>  peoi[sp].irq = irq;
>
>
Will do. Building now.
Seems there is a line accidentally folded "peoi[p-1].ready);" belongs at
the end of preceding line I presume?




Re: [Xen-devel] [BUG] Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1163

2016-01-15 Thread Andrew Cooper
On 15/01/16 10:58, Håkon Alstadheim wrote:
> This is just a preliminary report, mostly just for the record.
>
> I will report again if this keeps happening after 4.7 is out, or upon
> request. Anyone working on this, please mail me and request more
> information. I have available logs from dom0 boot (I dump dmesg and xl
> dmesg to disk after every boot, and log dom0 serial console to disk).
> I will send boot logs if requested. I will turn on maximum verbosity
> and provide all output. My serial console is very slow, so I can not
> keep running at max verbosity all the time.
>
> At the end of this mail there is "xl info" and output from dom0 serial
> console.
>
> CPUINFO:
> vendor_id: GenuineIntel
> cpu family: 6
> model: 63
> model name: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
>
> # smbios-sys-info
> Libsmbios version:  2.2.28
> Product Name:   Z10PE-D8 WS
> Vendor: ASUSTeK COMPUTER INC.
> BIOS Version:   3101
>
> Dom0 OS:
> Linux gentoo 4.1.12-gentoo #1 SMP Sat Jan 2 09:36:31 CET 2016 x86_64
> Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz GenuineIntel GNU/Linux.
> Kernel is gentoo-sources, with experimental use-flag. Cpu type set to
> Haswell. Issue also happened without experimental.
> # cat /proc/cmdline
> placeholder root=LABEL=ssdroot ro
> xen-pciback.hide=(02:00.*)(08:00.*)(00:1b.*)(81:00.*)(82:00.*)(83:00.*) 
> console=hvc0
> console=vga domodules domdadm dolvm intel_iommu=on earlyprintk=xen
> usbcore.autosuspend=-1
>
> The system is mostly built with stable packages, xen and xen-tools
> keyworded to ~amd64.
>
> I have been experiencing issues with domains with passed through PCIe
> devices since I first installed xen. Then at version 4.5.x , I'm now
> at 4.6.0 with gentoo patches. Crashes SEEM mostly related to this pci
> pass through and interrupts (usb-cards, sound cards).
>
> Recently the system has been more stable, whether it is because I pass
> through as few things as possible, or because of improvements in Xen I
> do not know. I have also taken to building with debug, which leads to
> more abrupt but less mysterious failures. Earlier (w/o debug and under
> xen 4.5 ) stuff would just gradually stop working and end up in total
> hang of everything. So, hey, things are improving :-b

This isn't the first time we have seen this on Haswell processors. Do
you have microcode loading set up?

~Andrew



Re: [Xen-devel] [BUG] Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1163

2016-01-15 Thread Håkon Alstadheim



On 01/15/2016 12:05 PM, Andrew Cooper wrote:
> On 15/01/16 10:58, Håkon Alstadheim wrote:
>> This is just a preliminary report, mostly just for the record.
>>
>> I will report again if this keeps happening after 4.7 is out, or upon
>> request. Anyone working on this, please mail me and request more
>> information. I have available logs from dom0 boot (I dump dmesg and xl
>> dmesg to disk after every boot, and log dom0 serial console to disk).
>> I will send boot logs if requested. I will turn on maximum verbosity
>> and provide all output. My serial console is very slow, so I can not
>> keep running at max verbosity all the time.
>>
>> At the end of this mail there is "xl info" and output from dom0 serial
>> console.
>>
>> CPUINFO:
>> vendor_id: GenuineIntel
>> cpu family: 6
>> model: 63
>> model name: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
>>
>> # smbios-sys-info
>> Libsmbios version:  2.2.28
>> Product Name:   Z10PE-D8 WS
>> Vendor: ASUSTeK COMPUTER INC.
>> BIOS Version:   3101
>>
>> Dom0 OS:
>> Linux gentoo 4.1.12-gentoo #1 SMP Sat Jan 2 09:36:31 CET 2016 x86_64
>> Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz GenuineIntel GNU/Linux.
>> Kernel is gentoo-sources, with experimental use-flag. Cpu type set to
>> Haswell. Issue also happened without experimental.
>> # cat /proc/cmdline
>> placeholder root=LABEL=ssdroot ro
>> xen-pciback.hide=(02:00.*)(08:00.*)(00:1b.*)(81:00.*)(82:00.*)(83:00.*)
>> console=hvc0
>> console=vga domodules domdadm dolvm intel_iommu=on earlyprintk=xen
>> usbcore.autosuspend=-1
>>
>> The system is mostly built with stable packages, xen and xen-tools
>> keyworded to ~amd64.
>>
>> I have been experiencing issues with domains with passed through PCIe
>> devices since I first installed xen. Then at version 4.5.x , I'm now
>> at 4.6.0 with gentoo patches. Crashes SEEM mostly related to this pci
>> pass through and interrupts (usb-cards, sound cards).
>>
>> Recently the system has been more stable, whether it is because I pass
>> through as few things as possible, or because of improvements in Xen I
>> do not know. I have also taken to building with debug, which leads to
>> more abrupt but less mysterious failures. Earlier (w/o debug and under
>> xen 4.5 ) stuff would just gradually stop working and end up in total
>> hang of everything. So, hey, things are improving :-b
>
> This isn't the first time we have seen this on Haswell processors. Do
> you have microcode loading set up?

Not entirely sure to be honest. Is microcode: 0x31 the newest?

I AM running the very latest bios from Asus, but I do not have 
confidence in my microcode loading setup, so I have not had one in place.


Trying now.

Downloading microcode.dat from Intel
Installing iucode_tool, which in its --help states:
  -w, --write-to=file        Write selected microcodes to a file in binary
                             format.  The binary format is suitable to be
                             uploaded to the kernel

Ran "iucode_tool microcode.dat -w microcode.bin"

# ls -l micro*
-rwxr-xr-x 1 root root  693248 Jan 15 12:40 microcode.bin
-rwxr-xr-x 1 root root 2081807 Nov  6 04:04 microcode.dat

placed microcode.bin in /boot/microcode.bin

 booted with :
---
xen_commandline: ssd-xen-debug-marker console_timestamps=date 
loglvl=all guest_loglvl=all sync_console iommu=1,verbose,debug 
iommu_inclusive_mapping=1 com1=115200,8n1 console=com1 dom0_max_vcpus=4 
dom0_vcpus_pin=1 dom0_mem=8G,max:8G cpufreq=xen:performance,verbose 
tmem=1 sched_smt_power_savings=1 apic_verbosity=debug e820-verbose=1 
core_parking=power ucode=microcode.bin

---

#cat /proc/cpuinfo | grep micro
says: microcode: 0x31

This is no change from previous boot.
Now: How do I know whether 0x31 is the newest?

Grepping the console output reveals no reference to ucode or microcode 
other than the Xen command-line.

---
Håkon




Re: [Xen-devel] [BUG] Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1163

2016-01-15 Thread Håkon Alstadheim



On 01/15/2016 01:42 PM, Jan Beulich wrote:
> On 15.01.16 at 13:32,  wrote:
>> placed microcode.bin in /boot/microcode.bin
>>
>>    booted with :
>> ---
>> xen_commandline: ssd-xen-debug-marker console_timestamps=date
>> loglvl=all guest_loglvl=all sync_console iommu=1,verbose,debug
>> iommu_inclusive_mapping=1 com1=115200,8n1 console=com1 dom0_max_vcpus=4
>> dom0_vcpus_pin=1 dom0_mem=8G,max:8G cpufreq=xen:performance,verbose
>> tmem=1 sched_smt_power_savings=1 apic_verbosity=debug e820-verbose=1
>> core_parking=power ucode=microcode.bin
>> ---
>
> This can't work - did you look at the command line documentation?
> You can't specify a file name here - there's no file system driver
> inside the hypervisor, and hence it can't read files (it instead has
> to rely on the boot loader bringing those into memory for it).

Get with the times :-) . Under EFI it most definitely wants a file-name.
Not entirely sure about the file FORMAT though.

From xen-command-line.html:
 "Note further that use of this option has an unspecified effect when
 used with xen.efi (there the concept of modules doesn't exist, and the
 blob gets specified via the ucode= config file/section entry;
 see EFI configuration file description)."

From efi.html:
 "ucode=
 Specifies a CPU microcode blob to load. (x86 only)"

>> #cat /proc/cpuinfo | grep micro
>> says: microcode: 0x31
>>
>> This is no change from previous boot.
>> Now: How do I know whether 0x31 is the newest?
>
> By checking - for the precise model and stepping of your CPU(s) -
> the information in the blob (which admittedly is a little cumbersome,
> but without knowing model and stepping I also can't try to help).
>
> Jan







Re: [Xen-devel] [BUG] Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1163

2016-01-15 Thread Jan Beulich
>>> On 15.01.16 at 13:32,  wrote:
> placed microcode.bin in /boot/microcode.bin
> 
>   booted with :
> ---
> xen_commandline: ssd-xen-debug-marker console_timestamps=date 
> loglvl=all guest_loglvl=all sync_console iommu=1,verbose,debug 
> iommu_inclusive_mapping=1 com1=115200,8n1 console=com1 dom0_max_vcpus=4 
> dom0_vcpus_pin=1 dom0_mem=8G,max:8G cpufreq=xen:performance,verbose 
> tmem=1 sched_smt_power_savings=1 apic_verbosity=debug e820-verbose=1 
> core_parking=power ucode=microcode.bin
> ---

This can't work - did you look at the command line documentation?
You can't specify a file name here - there's no file system driver
inside the hypervisor, and hence it can't read files (it instead has
to rely on the boot loader bringing those into memory for it).

> #cat /proc/cpuinfo | grep micro
> says: microcode: 0x31
> 
> This is no change from previous boot.
> Now: How do I know whether 0x31 is the newest?

By checking - for the precise model and stepping of your CPU(s) -
the information in the blob (which admittedly is a little cumbersome,
but without knowing model and stepping I also can't try to help).

Jan




Re: [Xen-devel] [BUG] Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1163

2016-01-15 Thread Håkon Alstadheim



On 01/15/2016 02:09 PM, Ian Campbell wrote:
> On Fri, 2016-01-15 at 13:49 +0100, Håkon Alstadheim wrote:
>> On 01/15/2016 01:42 PM, Jan Beulich wrote:
>>> On 15.01.16 at 13:32,  wrote:
>>>> placed microcode.bin in /boot/microcode.bin
>>>>
>>>> booted with :
>>>> ---
>>>> xen_commandline: ssd-xen-debug-marker console_timestamps=date
>>>> loglvl=all guest_loglvl=all sync_console iommu=1,verbose,debug
>>>> iommu_inclusive_mapping=1 com1=115200,8n1 console=com1
>>>> dom0_max_vcpus=4
>>>> dom0_vcpus_pin=1 dom0_mem=8G,max:8G cpufreq=xen:performance,verbose
>>>> tmem=1 sched_smt_power_savings=1 apic_verbosity=debug e820-verbose=1
>>>> core_parking=power ucode=microcode.bin
>>>> ---
>>>
>>> This can't work - did you look at the command line documentation?
>>> You can't specify a file name here - there's no file system driver
>>> inside the hypervisor, and hence it can't read files (it instead has
>>> to rely on the boot loader bringing those into memory for it).
>>
>> Get with the times :-) . Under EFI it most definitely wants a file-name.
>> Not entirely sure about the file FORMAT though.
>>
>>   From xen-command-line.html
>>    "Note further that use of this option has an unspecified effect when
>> used with xen.efi (there the concept of modules doesn't exist, and the
>> blob gets specified via the ucode= config file/section entry;
>> see EFI configuration file description).
>>
>>   From efi.html
>>
>>    "ucode=
>>
>>   Specifies a CPU microcode blob to load. (x86 only)
>
> This needs to go in your xen.cfg file (alongside kernel= ramdisk= etc), not
> on the xen command line.
>
> Ian.



Ahh (face + palm) .  It dawned on me right after I sent my previous. Now 
I DO get some acknowledgement of microcode.bin in the console-log, but 
/proc/cpuinfo still reports microcode: 0x31, so it seems stale 
microcode is not the issue :-/






Re: [Xen-devel] [BUG] Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1163

2016-01-15 Thread Håkon Alstadheim



On 01/15/2016 01:49 PM, Håkon Alstadheim wrote:
> On 01/15/2016 01:42 PM, Jan Beulich wrote:
>> On 15.01.16 at 13:32,  wrote:
>>> placed microcode.bin in /boot/microcode.bin
>>>
>>>    booted with :
>>> ---
>>> xen_commandline: ssd-xen-debug-marker console_timestamps=date
>>> loglvl=all guest_loglvl=all sync_console iommu=1,verbose,debug
>>> iommu_inclusive_mapping=1 com1=115200,8n1 console=com1 dom0_max_vcpus=4
>>> dom0_vcpus_pin=1 dom0_mem=8G,max:8G cpufreq=xen:performance,verbose
>>> tmem=1 sched_smt_power_savings=1 apic_verbosity=debug e820-verbose=1
>>> core_parking=power ucode=microcode.bin
>>> ---
>>
>> This can't work - did you look at the command line documentation?
>> You can't specify a file name here - there's no file system driver
>> inside the hypervisor, and hence it can't read files (it instead has
>> to rely on the boot loader bringing those into memory for it).
>
> Get with the times :-) . Under EFI it most definitely wants a
> file-name. Not entirely sure about the file FORMAT though.
>
> From xen-command-line.html
>  "Note further that use of this option has an unspecified effect when
> used with xen.efi (there the concept of modules doesn't exist, and the
> blob gets specified via the ucode= config file/section
> entry; see EFI configuration file description).
>
> From efi.html
>
>  "ucode=
>
> Specifies a CPU microcode blob to load. (x86 only)
>
>>> #cat /proc/cpuinfo | grep micro
>>> says: microcode: 0x31
>>>
>>> This is no change from previous boot.
>>> Now: How do I know whether 0x31 is the newest?
>>
>> By checking - for the precise model and stepping of your CPU(s) -
>> the information in the blob (which admittedly is a little cumbersome,
>> but without knowing model and stepping I also can't try to help).
>>
>> Jan


My fingers running faster than my head here. Managed to generate a blob 
that Xen accepts with command "iucode_tool microcode.dat -S -w 
microcode.bin"  (missed the -S before).

ucode=microcode.bin on a line by itself in the config.

Now the file actually loads, there is indeed an update, to 0x36 in my case.
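
For anyone landing here with the same problem, a xen.cfg along the lines
described above might look roughly like the fragment below. This is a
sketch, not a copy of my actual file: the section name and the kernel and
ramdisk file names are placeholders, the option strings are only excerpts
of the command lines quoted earlier in this thread, and the layout
(a [global] section naming a default section, with options=, kernel=,
ramdisk= and ucode= entries) is my reading of the EFI configuration file
description.

```
# Section and file names are placeholders for illustration.
[global]
default=xen

[xen]
options=console=com1 com1=115200,8n1 loglvl=all guest_loglvl=all
kernel=vmlinuz root=LABEL=ssdroot ro console=hvc0
ramdisk=initramfs.img
# The microcode blob is named here, not on the Xen command line.
ucode=microcode.bin
```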

If the error at irq.c:1163 keeps happening, I'll be sure to report 
again. :-~


Humbly, thanks Håkon. Sorry for all the noise.

