On 22/10/2018 08:33, Chao Gao wrote:
> On Mon, Oct 15, 2018 at 01:06:12PM +0100, Andrew Cooper wrote:
>> On 15/10/18 11:30, Roger Pau Monné wrote:
>>> Hello,
>>>
>>> Wei recently discovered an issue when running a Linux PVH Dom0 on a
>>> box with an Intel Family 6 (0x6), Model 158 (0x9e), Stepping 9 (raw
>>> 000906e9) CPU. We are not sure whether the issue is limited to a PVH
>>> Dom0, or it just happens to be easier to trigger in this scenario.
>> This issue has been seen very occasionally for years.  My debugging
>> patch dates back to 2013, and it has been observed on Haswell systems as
>> well.  There have also been a handful of reports on xen-devel over the
>> years.
>>
>> Wei is the first person to get a reliable enough repro to debug.  It is
>> not exclusive to PVH Dom0, but that appears to be the easiest way to
>> tickle the problem.
>>
>>> The issue is caused by what seems to be an interrupt injection while
>>> Xen is still servicing a previous interrupt (ie: the interrupt hasn't
>>> been EOI'ed and ISR for the vector is set) with the same or lower
>>> priority than the interrupt currently being serviced. This injection
>>> always happens when returning from idle from a state ACPI_STATE_C3 or
>>> lower.
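
For readers less familiar with LAPIC priorities, here is a minimal sketch in
C of the rule that appears to be violated. It follows the SDM description:
the PPR class is the larger of the TPR class and the class of the highest
vector in service, and a fixed interrupt is only dispatched when its class is
strictly above the PPR class. The function names are hypothetical, and the
TPR value of 0x10 used in the note below is an assumption read off the trace
(PPR is 0x10 whenever ISR is empty).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_VECTORS 256

/* Highest vector set in a 256-bit register image (8 x 32 bits), or -1. */
static int highest_vector(const uint32_t reg[NR_VECTORS / 32])
{
    assert(reg != NULL);
    for (int v = NR_VECTORS - 1; v >= 0; v--)
        if (reg[v / 32] & (UINT32_C(1) << (v % 32)))
            return v;
    return -1;
}

/*
 * PPR: class is max(TPR class, class of highest in-service vector).
 * When the classes are equal the low nibble is model specific; this
 * sketch simply returns TPR in that case.
 */
static uint8_t compute_ppr(uint8_t tpr, const uint32_t isr[NR_VECTORS / 32])
{
    int isrv = highest_vector(isr);
    uint8_t isr_class = isrv < 0 ? 0 : (uint8_t)(isrv >> 4);
    uint8_t tpr_class = tpr >> 4;

    if (tpr_class >= isr_class)
        return tpr;
    return (uint8_t)(isr_class << 4);
}

/* A fixed interrupt is dispatched only if its class is above PPR's class. */
static bool deliverable(uint8_t vector, uint8_t ppr)
{
    return (vector >> 4) > (ppr >> 4);
}
```

With vector 0x21 in service and TPR at 0x10, compute_ppr() gives 0x20, and
deliverable(0x21, 0x20) is false. That is why a second 0x21 arriving before
the EOI is unexpected.
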
>> As a bit of background, for some guest irqs, we need to inject the
>> interrupt into the guest and wait for an explicit ack.
>>
>> If the irq source doesn't have a mask bit which Xen can use, the only
>> option we have to avoid repeated interruption is to leave the irq in
>> service at the LAPIC.  The purpose of the Pending EOI stack is to manage
>> these as acks arrive back from guest context.
>>
>> For reasons which aren't clear, guest-bound MSI vectors which don't have
>> a mask bit also use this PEOI stack mechanism.  I think this is probably
>> a Xen bug, but it is also relevant to the issue.
>>
>> In Wei's case, the interrupt in question is an MSI non-maskable
>> interrupt from the USB controller.
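
The Pending EOI stack described above can be sketched roughly as follows.
This is a simplified model only loosely based on the mechanism, not Xen's
actual code: the struct layout, the depth and the function names are all
hypothetical. The assertion in peoi_push() mirrors the invariant that a newly
arriving vector must outrank everything still in service.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PEOI_DEPTH 16   /* hypothetical depth, chosen for the sketch */

struct peoi_entry {
    int     irq;
    uint8_t vector;
    bool    ready;      /* set once the guest has acked the irq */
};

struct peoi_stack {
    struct peoi_entry e[PEOI_DEPTH];
    int sp;             /* number of entries currently pushed */
};

/* Interrupt taken: leave it in service at the LAPIC and remember it. */
static void peoi_push(struct peoi_stack *s, int irq, uint8_t vector)
{
    assert(s->sp < PEOI_DEPTH);
    /* Anything delivered now must be above what is still in service. */
    assert(s->sp == 0 || vector > s->e[s->sp - 1].vector);
    s->e[s->sp++] = (struct peoi_entry){ .irq = irq, .vector = vector,
                                         .ready = false };
}

/* Guest acked the irq: the matching entry may be EOI'ed once on top. */
static void peoi_set_ready(struct peoi_stack *s, int irq)
{
    for (int i = 0; i < s->sp; i++)
        if (s->e[i].irq == irq)
            s->e[i].ready = true;
}

/*
 * Pop ready entries from the top, stopping at the first one the guest
 * has not acked yet.  Real code would write the LAPIC EOI register for
 * each popped entry.  Returns the number of entries popped.
 */
static int peoi_flush(struct peoi_stack *s)
{
    int n = 0;

    while (s->sp > 0 && s->e[s->sp - 1].ready) {
        s->sp--;
        n++;
    }
    return n;
}
```

In this model, pushing vector 0x21 while an entry for 0x21 is still on the
stack trips the assertion in peoi_push(), which matches the shape of the
failure reported in the trace.
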
>>
>>> Note that I haven't been able to reproduce this issue when using
>>> mwait-idle=0 or max_cstate=2 on the Xen command line, but again
>>> without knowing the underlying issue it's impossible to tell whether
>>> it's relevant.
>>>
>>> Andrew provided a debug patch which I've expanded to also log power
>>> state transition, and is attached to this email.
>>>
>>> Here is a trace of a crash, together with the debug info.
>>>
>>> (XEN) *** Pending EOI error ***
>>> (XEN)   cpu #1, irq 30, vector 0x21, sp 1
>>> (XEN) Peoi stack: sp 1
>>> (XEN)   [ 0] irq  30, vec 0x21, ready 0, ISR 1, TMR 0, IRR 0
>>> (XEN) Peoi stack trace records:
>>> (XEN)   [22619] POP      {sp  1, irq  30, vec 0x21}
>>> (XEN)   [22620] POWER    TYPE 4
>>> (XEN)   [22621] IDLE     PPR 0x00000010
>>> (XEN)                    IRR 
>>> 0000000000000000000000000000000000000000000000000000000000000000
>>> (XEN)                    ISR 
>>> 0000000000000000000000000000000000000000000000000000000000000000
>>> (XEN)   [22622] WAKE     PPR 0x00000010
>>> (XEN)                    IRR 
>>> 0000000000000000000000000000000000000000000000000000000000000004
>>> (XEN)                    ISR 
>>> 0000000000000000000000000000000000000000000000000000000000000000
>>> (XEN)   [22623] ACK_PRE  PPR 0x000000f0
>>> (XEN)                    IRR 
>>> 0000000000000000000000000000000000000000000000000000000000000000
>>> (XEN)                    ISR 
>>> 0000000000000000000000000000000000000000000000000000000000000004
>>> (XEN)   [22624] ACK_POST PPR 0x00000010
>>> (XEN)                    IRR 
>>> 0000000000000000000000000000000000000000000000000000000000000000
>>> (XEN)                    ISR 
>>> 0000000000000000000000000000000000000000000000000000000000000000
>>> (XEN)   [22625] POWER    TYPE 5
>>> (XEN)   [22626] IDLE     PPR 0x00000010
>>> (XEN)                    IRR 
>>> 0000000000000000000000000000000000000000000000000000000000000000
>>> (XEN)                    ISR 
>>> 0000000000000000000000000000000000000000000000000000000000000000
>>> (XEN)   [22627] WAKE     PPR 0x00000010
>>> (XEN)                    IRR 
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN)                    ISR 
>>> 0000000000000000000000000000000000000000000000000000000000000000
>>> (XEN)   [22628] PUSH     {sp  0, irq  30, vec 0x21}
>>> (XEN)   [22629] POWER    TYPE 5
>>> (XEN)   [22630] IDLE     PPR 0x00000020
>>> (XEN)                    IRR 
>>> 0000000000000000000000000000000000000000000000000000000000000000
>>> (XEN)                    ISR 
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN)   [22631] WAKE     PPR 0x00000020
>>> (XEN)                    IRR 
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN)                    ISR 
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN)   [22632] POWER    TYPE 5
>>> (XEN)   [22633] IDLE     PPR 0x00000020
>>> (XEN)                    IRR 
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN)                    ISR 
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN)   [22634] WAKE     PPR 0x00000020
>>> (XEN)                    IRR 
>>> 0000000002000000000000000000000000000000000000000000000000000004
>>> (XEN)                    ISR 
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN)   [22635] ACK_PRE  PPR 0x000000f0
>>> (XEN)                    IRR 
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN)                    ISR 
>>> 0000000002000000000000000000000000000000000000000000000000000004
>>> (XEN)   [22636] ACK_POST PPR 0x00000020
>>> (XEN)                    IRR 
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN)                    ISR 
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN)   [22637] READY    {sp  1, irq  30, vec 0x21}
>>> (XEN)   [22638] ACK_PRE  PPR 0x00000020
>>> (XEN)                    IRR 
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN)                    ISR 
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN)   [22639] ACK_POST PPR 0x00000010
>>> (XEN)                    IRR 
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN)                    ISR 
>>> 0000000000000000000000000000000000000000000000000000000000000000
>>> (XEN)   [22640] POP      {sp  1, irq  30, vec 0x21}
>>> (XEN)   [22641] PUSH     {sp  0, irq  30, vec 0x21}
>>> (XEN)   [22642] POWER    TYPE 4
>>> (XEN)   [22643] IDLE     PPR 0x00000020
>>> (XEN)                    IRR 
>>> 0000000000000000000000000000000000000000000000000000000000000000
>>> (XEN)                    ISR 
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN)   [22644] WAKE     PPR 0x00000020
>>> (XEN)                    IRR 
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN)                    ISR 
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN)   [22645] POWER    TYPE 3
>>> (XEN)   [22646] IDLE     PPR 0x00000020
>>> (XEN)                    IRR 
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN)                    ISR 
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN)   [22647] WAKE     PPR 0x00000020
>>> (XEN)                    IRR 
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN)                    ISR 
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN)   [22648] POWER    TYPE 3
>>> (XEN)   [22649] IDLE     PPR 0x00000020
>>> (XEN)                    IRR 
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN)                    ISR 
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN)   [22650] WAKE     PPR 0x00000020
>>> (XEN)                    IRR 
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN)                    ISR 
>>> 0000000002000000000000000000000000000000000000000000000000000000
>> What has happened here is that, despite vector 0x21 being in service
>> (starting at the PUSH), we see it injected a second time.  The ASSERT()
>> fires because we find this vector still on the pending EOI stack.
>>
>> After that, we go idle a few times, but haven't yet acked the
>> vector (i.e. whatever we're waiting for the guest to acknowledge hasn't
>> happened yet, and Xen has nothing else to do on this CPU).
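
As a reading aid for the IRR/ISR dumps in the trace, here is a small C sketch
that turns one 64-digit line into vector numbers. It assumes the debug patch
prints the 256-bit register one byte at a time, lowest byte first. That
layout is inferred from the trace rather than taken from the patch: it is the
reading under which "02" in the fifth byte is vector 0x21 (PPR 0x20) and "04"
in the last byte is vector 0xfa (PPR 0xf0). The function names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Value of one hex digit, or -1 if the character is not a hex digit. */
static int hex_nibble(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Decode a 64-digit dump (byte 0 printed first, an assumption) into the
 * vectors whose bits are set.  Stores at most max vectors in out, in
 * ascending order.  Returns the number stored, or -1 on malformed input.
 */
static int decode_vectors(const char *hex, int *out, int max)
{
    int n = 0;

    assert(hex != NULL && out != NULL);
    if (strlen(hex) != 64)
        return -1;

    for (int byte = 0; byte < 32; byte++) {
        int hi = hex_nibble(hex[2 * byte]);
        int lo = hex_nibble(hex[2 * byte + 1]);

        if (hi < 0 || lo < 0)
            return -1;
        for (int bit = 0; bit < 8; bit++)
            if ((((hi << 4) | lo) >> bit) & 1)
                if (n < max)
                    out[n++] = byte * 8 + bit;
    }
    return n;
}
```

Under that assumption, the ISR line at record 22635 decodes to vectors 0x21
and 0xfa, and the IRR line at record 22644 decodes to 0x21 while the ISR still
holds 0x21.
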
>>
>> From the debugging, we see that PPR/IRR/ISR appear to retain their state
>> across the mwait, and there is nothing in the manual which I can see
>> discussing the interaction of LAPIC state and C states.
>>
>> However, from the behaviour seen here, we occasionally get woken from
>> mwait by an interrupt which is already pending.  I can only conclude that
>> there is some issue with priority calculations for edge triggered
>> interrupts when idle, which allows another one to slip in.  The fact
> Hi, Roger, Andrew and Wei,
>
> Jan's patch
> (https://lists.xen.org/archives/html/xen-devel/2018-10/msg01031.html)
> fixes an issue in handling SVI. Currently, when dealing with an EOI from the
> guest, the SVI is cleared. The correct way is to clear the corresponding bit
> in VISR and then set SVI to the highest index of any bit set in VISR (please
> refer to SDM 29.1.4). If SVI is set to a value lower than the vector of the
> highest priority interrupt that is in service, the PPR virtualization (29.1.3)
> might set the VPPR to a lower value on VMEntry too. Thus an interrupt with the
> same or lower priority, which should be blocked by VPPR, slips in.
>
> Could you apply Jan's patch and try to reproduce it again?
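
For reference, the virtual EOI handling described in the quoted text can be
sketched in C as below. This follows the SDM pseudocode for EOI and PPR
virtualization as I understand it, and it is not code from Jan's patch or
from Xen: the struct and function names are hypothetical, and the virtual
APIC state is reduced to the few fields involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down view of the virtual APIC state. */
struct vapic {
    uint32_t visr[8];   /* virtual ISR, 256 bits */
    uint8_t  svi;       /* servicing virtual interrupt */
    uint8_t  vtpr;
    uint8_t  vppr;
};

/* Highest bit set in a 256-bit register image, or -1 if none. */
static int highest_set(const uint32_t reg[8])
{
    for (int v = 255; v >= 0; v--)
        if (reg[v / 32] & (UINT32_C(1) << (v % 32)))
            return v;
    return -1;
}

/* PPR virtualization: VPPR follows the larger of VTPR and SVI classes. */
static void virt_ppr(struct vapic *v)
{
    if ((v->vtpr >> 4) >= (v->svi >> 4))
        v->vppr = v->vtpr;
    else
        v->vppr = v->svi & 0xf0;
}

/*
 * EOI virtualization: clear the bit for SVI in VISR, then point SVI at
 * the highest vector still in service (0 if none) instead of clearing
 * it outright, and recompute VPPR.
 */
static void virt_eoi(struct vapic *v)
{
    int top;

    assert(v != NULL);
    v->visr[v->svi / 32] &= ~(UINT32_C(1) << (v->svi % 32));
    top = highest_set(v->visr);
    v->svi = top < 0 ? 0 : (uint8_t)top;
    virt_ppr(v);
}
```

With vectors 0x31 and 0x51 both in service, an EOI for 0x51 leaves SVI at
0x31 and VPPR at 0x30. Setting SVI to 0 at that point would drop VPPR to the
VTPR value and let a class 3 interrupt through.
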

Hello,

I'm aware of Jan's patch, but it pertains to Xen's emulation of the virtual
Local APIC for a guest.

This bug is with the real hardware APIC, as it pertains to waking from
MWAIT.  At the point that things go wrong, there is no VT-x involved at all.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
