Re: [Xen-devel] [PATCH] x86/IRQ: make internally used IRQs also honor the pending EOI stack

2019-11-28 Thread Jan Beulich
On 28.11.2019 15:30, Roger Pau Monné wrote:
> On Thu, Nov 28, 2019 at 03:19:50PM +0100, Jan Beulich wrote:
>> On 28.11.2019 15:13, Roger Pau Monné wrote:
>>> On Thu, Nov 28, 2019 at 02:33:08PM +0100, Jan Beulich wrote:
>>>> On 28.11.2019 12:39, Roger Pau Monné wrote:
>>>>> On Thu, Nov 28, 2019 at 12:03:47PM +0100, Jan Beulich wrote:
>>>>>> At the time the pending EOI stack was introduced there were no
>>>>>> internally used IRQs which would have the LAPIC EOI issued from the
>>>>>> ->end() hook. This had then changed with the introduction of IOMMUs,
>>>>>> but the interaction issue was presumably masked by
>>>>>> irq_guest_eoi_timer_fn() frequently EOI-ing interrupts way too early
>>>>>> (which got fixed by 359cf6f8a0ec ["x86/IRQ: don't keep EOI timer
>>>>>> running without need"]).
>>>>>>
>>>>>> The problem is that with us re-enabling interrupts across handler
>>>>>> invocation, a higher priority (guest) interrupt may trigger while
>>>>>> handling a lower priority (internal) one. The EOI issued from
>>>>>> ->end() (for ACKTYPE_EOI kind interrupts) would then mistakenly
>>>>>> EOI the higher priority (guest) interrupt, breaking (among other
>>>>>> things) pending EOI stack logic's assumptions.
>>>>>
>>>>> Maybe there's something that I'm missing, but shouldn't hypervisor
>>>>> vectors always be higher priority than guest ones?
>>>>
>>>> Depends - IOMMU ones imo aren't something that needs urgently
>>>> dealing with, so a little bit of delay won't hurt. There would
>>>> only be a problem if such interrupts could be deferred
>>>> indefinitely.
>>>>
>>>>> I see there's already a range reserved for high priority vectors
>>>>> ({FIRST/LAST}_HIPRIORITY_VECTOR), what's the reason for hypervisor
>>>>> interrupts not using this range?
>>>>
>>>> We'd quickly run out of high priority vectors on systems with
>>>> multiple (and perhaps indeed many) IOMMUs.
>>>
>>> Well, there's no limit on the number of high priority vectors, since
>>> this is all a software abstraction. It only matters that such vectors
>>> are higher than guest owned ones.
>>>
>>> I have to take a look, but I would think that Xen used vectors are the
>>> first ones to be allocated, and hence could start from
>>> FIRST_HIPRIORITY_VECTOR - 1 and go down from there.
>>
>> If this was the case, then we wouldn't have observed the issue (despite
>> it being there) this patch tries to address. The IOMMUs for both Andrew
>> and me ended up using vector 0x28, below everything that e.g. the
>> IO-APIC RTE got assigned.
> 
> I know it's not like that ATM, and hence I wonder whether it would be
> possible to make it so: Xen vectors get allocated down from
> FIRST_HIPRIORITY_VECTOR - 1 and then we won't have this issue.
> 
>> Also don't forget that we don't allocate
>> vectors contiguously, but such that they'd get spread across the
>> different priority levels. (Whether that's an awfully good idea is a
>> separate question.)
> 
> Well, vectors used by Xen would be allocated downwards contiguously
> from FIRST_HIPRIORITY_VECTOR - 1, and hence won't be spread.
> 
> Guest used vectors could continue to use the same allocation
> mechanism, since that's a different issue.

The issue would go away only if guest vectors are at strictly
lower priority than Xen ones. I.e. we'd need to go in steps of
16. And there aren't that many vectors ... (I'm happy to see
changes here, but it'll need to be very careful ones.)

Jan
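
The "steps of 16" above follow from the LAPIC's priority rule: an
interrupt's priority class is simply the upper four bits of its vector,
so Xen vectors only strictly outrank guest ones if the two ranges are
separated at a multiple of 16. A minimal sketch of that rule (plain
illustrative C, not Xen code; 0x28 is the IOMMU vector observed in this
thread, 0x38 a made-up guest vector):

#include <stdint.h>
#include <stdio.h>

/* LAPIC priority class: bits 7:4 of the vector.  A pending interrupt
 * is only delivered while another is in service if its class is
 * strictly higher, hence the granularity of 16. */
static unsigned int priority_class(uint8_t vector)
{
    return vector >> 4;
}

int main(void)
{
    /* Vector 0x28 is in class 2; a guest vector such as 0x38 is in
     * class 3 and can therefore interrupt the 0x28 handler once IRQs
     * are re-enabled across handler invocation. */
    printf("class(0x28)=%u class(0x38)=%u\n",
           priority_class(0x28), priority_class(0x38));
    return 0;
}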


Re: [Xen-devel] [PATCH] x86/IRQ: make internally used IRQs also honor the pending EOI stack

2019-11-28 Thread Andrew Cooper
On 28/11/2019 14:00, Jan Beulich wrote:
> On 28.11.2019 14:53, Andrew Cooper wrote:
>> On 28/11/2019 12:10, Andrew Cooper wrote:
>>> On 28/11/2019 11:03, Jan Beulich wrote:
>>>> Notes:
>>>>
>>>> - In principle we could get away without the check_eoi_deferral flag.
>>>>   I've introduced it just to make sure there's as little change as
>>>>   possible to unaffected paths.
>>>> - Similarly the cpu_has_pending_apic_eoi() check in do_IRQ() isn't
>>>>   strictly necessary.
>>> I don't think the cpu_has_pending_apic_eoi() check is necessary.  It is
>>> checked at the head of end_nonmaskable_irq() as well.
>>>
>>> Similarly, I'm not sure that check_eoi_deferral is something that we'd
>>> want to introduce.
>>>
>>> I'll drop both of these and test, seeing as I have a repro of the problem.
>> Dropping cpu_has_pending_apic_eoi() wasn't possible in a trivial way (so
>> I didn't), and dropping just check_eoi_deferral on its own definitely
>> breaks things.
>>
>> Given the 4.13 timeline, let's go with it in this form, seeing as it is
>> the version which had all of last night's worth of testing.
>>
>> Acked-by: Andrew Cooper 
>> Tested-by: Andrew Cooper 
> Thanks! I've taken note to produce a patch (if possible at all, given
> the results of your attempt) to remove the extra pieces again, ideally
> to go in pretty soon after the branching.

Just for morbid curiosity, this is what happens without the
check_eoi_deferral check:

[  383.845620] dracut-initqueue[302]: Warning: Could not boot.
[  383.892655] dracut-initqueue[302]: Warning:
/dev/disk/by-label/root-qomhwa does not exist
 Starting Dracut Emergency Shell...

dracut:/# [  439.282461] BUG: unable to handle kernel NULL pointer
dereference at 09a0
[  439.282773] PGD 0 P4D 0
[  439.282928] Oops:  [#1] SMP NOPTI
[  439.283088] CPU: 0 PID: 281 Comm: systemd-udevd Tainted: G  
O  4.19.0+1 #1
[  439.283337] Hardware name: Dell Inc. PowerEdge R415/0GXH08, BIOS
2.0.2 10/22/2012
[  439.283591] RIP: e030:mptscsih_remove+0x5b/0xb0 [mptscsih]
[  439.283751] Code: ff 74 1e 8b 83 9c 01 00 00 44 8d 2c c5 00 00 00 00
e8 c9 4d e9 c0 48 c7 83 c0 0c 00 00 00 00 00 00 f6 83 c8 00 00 00 01 75
34 <48> 8b bd a0
[  439.284153] RSP: e02b:c900404b7b50 EFLAGS: 00010246
[  439.284312] RAX: 0049 RBX: 88809a7d9000 RCX:
0006
[  439.284477] RDX:  RSI: 0001 RDI:

[  439.284644] RBP:  R08:  R09:
0384
[  439.284810] R10: 0004 R11:  R12:
88809a7cc000
[  439.284975] R13:  R14: c038b180 R15:
c900404b7e98
[  439.285149] FS:  7ff0922a38c0() GS:8880a360()
knlGS:
[  439.285428] CS:  e033 DS:  ES:  CR0: 80050033
[  439.285590] CR2: 09a0 CR3: 04012000 CR4:
0660
[  439.285765] Call Trace:
[  439.285924]  mptsas_probe+0x384/0x530 [mptsas]
[  439.286133]  pci_device_probe+0xc9/0x140
[  439.286297]  really_probe+0x238/0x3e0
[  439.286455]  driver_probe_device+0x115/0x130
[  439.286622]  __driver_attach+0x103/0x110
[  439.286805]  ? driver_probe_device+0x130/0x130
[  439.286964]  bus_for_each_dev+0x67/0xc0
[  439.287120]  bus_add_driver+0x41/0x260
[  439.287274]  ? 0xc038f000
[  439.287426]  driver_register+0x5b/0xe0
[  439.287578]  ? 0xc038f000
[  439.287732]  mptsas_init+0x114/0x1000 [mptsas]
[  439.287890]  do_one_initcall+0x4e/0x1d4
[  439.288045]  ? _cond_resched+0x15/0x30
[  439.288232]  ? kmem_cache_alloc_trace+0x15f/0x1c0
[  439.288392]  do_init_module+0x5a/0x21b
[  439.288546]  load_module+0x2270/0x2800
[  439.288700]  ? m_show+0x1c0/0x1c0
[  439.288854]  __do_sys_finit_module+0x94/0xe0
[  439.289012]  do_syscall_64+0x4e/0x100
[  439.289168]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  439.289326] RIP: 0033:0x7ff090ef9ec9
[  439.289482] Code: 01 00 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00
48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f
05 <48> 3d 01 f8
[  439.289871] RSP: 002b:7ffd96d355d8 EFLAGS: 0246 ORIG_RAX:
0139
[  439.290120] RAX: ffda RBX: 5581851f9f70 RCX:
7ff090ef9ec9
[  439.290287] RDX:  RSI: 7ff09181a099 RDI:
000c
[  439.290458] RBP: 7ff09181a099 R08:  R09:
5581851f6670
[  439.290625] R10: 000c R11: 0246 R12:

[  439.290792] R13: 5581851f6240 R14: 0002 R15:

[  439.290993] Modules linked in: ata_generic pata_acpi ohci_pci ahci
serio_raw pata_atiixp libahci mptsas(+) scsi_transport_sas ehci_pci
bnx2(O) libata mptscst
[  439.291484] CR2: 09a0
[  439.291665] ---[ end trace 519b6f631d8509d1 ]---
[  439.295635] RIP: e030:mptscsih_remove+0x5b/0xb0 [mptscsih]
[  439.295804] Code: ff 74 1e 8b 83 9c 01 00 00 44 8d 2c c5 00 00 00 00
e8 c9 4d e9 c0 48 c7 83 c0 0c 00 00 00 00 00 00 f6 83 

Re: [Xen-devel] [PATCH] x86/IRQ: make internally used IRQs also honor the pending EOI stack

2019-11-28 Thread Roger Pau Monné
On Thu, Nov 28, 2019 at 03:19:50PM +0100, Jan Beulich wrote:
> On 28.11.2019 15:13, Roger Pau Monné  wrote:
> > On Thu, Nov 28, 2019 at 02:33:08PM +0100, Jan Beulich wrote:
> >> On 28.11.2019 12:39, Roger Pau Monné wrote:
> >>> On Thu, Nov 28, 2019 at 12:03:47PM +0100, Jan Beulich wrote:
> >>>> At the time the pending EOI stack was introduced there were no
> >>>> internally used IRQs which would have the LAPIC EOI issued from the
> >>>> ->end() hook. This had then changed with the introduction of IOMMUs,
> >>>> but the interaction issue was presumably masked by
> >>>> irq_guest_eoi_timer_fn() frequently EOI-ing interrupts way too early
> >>>> (which got fixed by 359cf6f8a0ec ["x86/IRQ: don't keep EOI timer
> >>>> running without need"]).
> >>>>
> >>>> The problem is that with us re-enabling interrupts across handler
> >>>> invocation, a higher priority (guest) interrupt may trigger while
> >>>> handling a lower priority (internal) one. The EOI issued from
> >>>> ->end() (for ACKTYPE_EOI kind interrupts) would then mistakenly
> >>>> EOI the higher priority (guest) interrupt, breaking (among other
> >>>> things) pending EOI stack logic's assumptions.
> >>>
> >>> Maybe there's something that I'm missing, but shouldn't hypervisor
> >>> vectors always be higher priority than guest ones?
> >>
> >> Depends - IOMMU ones imo aren't something that needs urgently
> >> dealing with, so a little bit of delay won't hurt. There would
> >> only be a problem if such interrupts could be deferred
> >> indefinitely.
> >>
> >>> I see there's already a range reserved for high priority vectors
> >>> ({FIRST/LAST}_HIPRIORITY_VECTOR), what's the reason for hypervisor
> >>> interrupts not using this range?
> >>
> >> We'd quickly run out of high priority vectors on systems with
> >> multiple (and perhaps indeed many) IOMMUs.
> > 
> > Well, there's no limit on the number of high priority vectors, since
> > this is all a software abstraction. It only matters that such vectors
> > are higher than guest owned ones.
> > 
> > I have to take a look, but I would think that Xen used vectors are the
> > first ones to be allocated, and hence could start from
> > FIRST_HIPRIORITY_VECTOR - 1 and go down from there.
> 
> If this was the case, then we wouldn't have observed the issue (despite
> it being there) this patch tries to address. The IOMMUs for both Andrew
> and me ended up using vector 0x28, below everything that e.g. the
> IO-APIC RTE got assigned.

I know it's not like that ATM, and hence I wonder whether it would be
possible to make it so: Xen vectors get allocated down from
FIRST_HIPRIORITY_VECTOR - 1 and then we won't have this issue.

> Also don't forget that we don't allocate
> vectors contiguously, but such that they'd get spread across the
> different priority levels. (Whether that's an awfully good idea is a
> separate question.)

Well, vectors used by Xen would be allocated downwards contiguously
from FIRST_HIPRIORITY_VECTOR - 1, and hence won't be spread.

Guest used vectors could continue to use the same allocation
mechanism, since that's a different issue.

Thanks, Roger.


Re: [Xen-devel] [PATCH] x86/IRQ: make internally used IRQs also honor the pending EOI stack

2019-11-28 Thread Jan Beulich
On 28.11.2019 15:13, Roger Pau Monné  wrote:
> On Thu, Nov 28, 2019 at 02:33:08PM +0100, Jan Beulich wrote:
>> On 28.11.2019 12:39, Roger Pau Monné wrote:
>>> On Thu, Nov 28, 2019 at 12:03:47PM +0100, Jan Beulich wrote:
>>>> At the time the pending EOI stack was introduced there were no
>>>> internally used IRQs which would have the LAPIC EOI issued from the
>>>> ->end() hook. This had then changed with the introduction of IOMMUs,
>>>> but the interaction issue was presumably masked by
>>>> irq_guest_eoi_timer_fn() frequently EOI-ing interrupts way too early
>>>> (which got fixed by 359cf6f8a0ec ["x86/IRQ: don't keep EOI timer
>>>> running without need"]).
>>>>
>>>> The problem is that with us re-enabling interrupts across handler
>>>> invocation, a higher priority (guest) interrupt may trigger while
>>>> handling a lower priority (internal) one. The EOI issued from
>>>> ->end() (for ACKTYPE_EOI kind interrupts) would then mistakenly
>>>> EOI the higher priority (guest) interrupt, breaking (among other
>>>> things) pending EOI stack logic's assumptions.
>>>
>>> Maybe there's something that I'm missing, but shouldn't hypervisor
>>> vectors always be higher priority than guest ones?
>>
>> Depends - IOMMU ones imo aren't something that needs urgently
>> dealing with, so a little bit of delay won't hurt. There would
>> only be a problem if such interrupts could be deferred
>> indefinitely.
>>
>>> I see there's already a range reserved for high priority vectors
>>> ({FIRST/LAST}_HIPRIORITY_VECTOR), what's the reason for hypervisor
>>> interrupts not using this range?
>>
>> We'd quickly run out of high priority vectors on systems with
>> multiple (and perhaps indeed many) IOMMUs.
> 
> Well, there's no limit on the number of high priority vectors, since
> this is all a software abstraction. It only matters that such vectors
> are higher than guest owned ones.
> 
> I have to take a look, but I would think that Xen used vectors are the
> first ones to be allocated, and hence could start from
> FIRST_HIPRIORITY_VECTOR - 1 and go down from there.

If this was the case, then we wouldn't have observed the issue (despite
it being there) this patch tries to address. The IOMMUs for both Andrew
and me ended up using vector 0x28, below everything that e.g. the
IO-APIC RTE got assigned. Also don't forget that we don't allocate
vectors contiguously, but such that they'd get spread across the
different priority levels. (Whether that's an awfully good idea is a
separate question.)

Jan


Re: [Xen-devel] [PATCH] x86/IRQ: make internally used IRQs also honor the pending EOI stack

2019-11-28 Thread Roger Pau Monné
On Thu, Nov 28, 2019 at 02:33:08PM +0100, Jan Beulich wrote:
> On 28.11.2019 12:39, Roger Pau Monné wrote:
> > On Thu, Nov 28, 2019 at 12:03:47PM +0100, Jan Beulich wrote:
> >> At the time the pending EOI stack was introduced there were no
> >> internally used IRQs which would have the LAPIC EOI issued from the
> >> ->end() hook. This had then changed with the introduction of IOMMUs,
> >> but the interaction issue was presumably masked by
> >> irq_guest_eoi_timer_fn() frequently EOI-ing interrupts way too early
> >> (which got fixed by 359cf6f8a0ec ["x86/IRQ: don't keep EOI timer
> >> running without need"]).
> >>
> >> The problem is that with us re-enabling interrupts across handler
> >> invocation, a higher priority (guest) interrupt may trigger while
> >> handling a lower priority (internal) one. The EOI issued from
> >> ->end() (for ACKTYPE_EOI kind interrupts) would then mistakenly
> >> EOI the higher priority (guest) interrupt, breaking (among other
> >> things) pending EOI stack logic's assumptions.
> > 
> > Maybe there's something that I'm missing, but shouldn't hypervisor
> > vectors always be higher priority than guest ones?
> 
> Depends - IOMMU ones imo aren't something that needs urgently
> dealing with, so a little bit of delay won't hurt. There would
> only be a problem if such interrupts could be deferred
> indefinitely.
> 
> > I see there's already a range reserved for high priority vectors
> > ({FIRST/LAST}_HIPRIORITY_VECTOR), what's the reason for hypervisor
> > interrupts not using this range?
> 
> We'd quickly run out of high priority vectors on systems with
> multiple (and perhaps indeed many) IOMMUs.

Well, there's no limit on the number of high priority vectors, since
this is all a software abstraction. It only matters that such vectors
are higher than guest owned ones.

I have to take a look, but I would think that Xen used vectors are the
first ones to be allocated, and hence could start from
FIRST_HIPRIORITY_VECTOR - 1 and go down from there. This way we will
end up with high priority vectors first, Xen used vectors, and
finally guest vectors.

Thanks, Roger.
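
To make the proposal above concrete, here is a minimal sketch of such a
downward allocator - purely illustrative, not existing Xen code, and
with a placeholder value for FIRST_HIPRIORITY_VECTOR:

#include <stdint.h>

#define FIRST_HIPRIORITY_VECTOR 0xf0   /* placeholder value, for illustration */

static uint8_t next_xen_vector = FIRST_HIPRIORITY_VECTOR - 1;

/* Hand out Xen-internal vectors downwards from just below the high
 * priority range; guest vectors would then only ever be allocated
 * below whatever this has consumed. */
static uint8_t alloc_xen_vector(void)
{
    return next_xen_vector--;
}

Note that, as Jan points out in his reply, this alone would not give
Xen vectors strictly higher priority: the boundary between the two
ranges would also have to fall on a multiple of 16, since only bits 7:4
of a vector determine its priority class.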


Re: [Xen-devel] [PATCH] x86/IRQ: make internally used IRQs also honor the pending EOI stack

2019-11-28 Thread Jan Beulich
On 28.11.2019 14:53, Andrew Cooper wrote:
> On 28/11/2019 12:10, Andrew Cooper wrote:
>> On 28/11/2019 11:03, Jan Beulich wrote:
>>> Notes:
>>>
>>> - In principle we could get away without the check_eoi_deferral flag.
>>>   I've introduced it just to make sure there's as little change as
>>>   possible to unaffected paths.
>>> - Similarly the cpu_has_pending_apic_eoi() check in do_IRQ() isn't
>>>   strictly necessary.
>> I don't think the cpu_has_pending_apic_eoi() check is necessary.  It is
>> checked at the head of end_nonmaskable_irq() as well.
>>
>> Similarly, I'm not sure that check_eoi_deferral is something that we'd
>> want to introduce.
>>
>> I'll drop both of these and test, seeing as I have a repro of the problem.
> 
> Dropping cpu_has_pending_apic_eoi() wasn't possible in a trivial way (so
> I didn't), and dropping just check_eoi_deferral on its own definitely
> breaks things.
> 
> Given the 4.13 timeline, let's go with it in this form, seeing as it is
> the version which had all of last night's worth of testing.
> 
> Acked-by: Andrew Cooper 
> Tested-by: Andrew Cooper 

Thanks! I've taken note to produce a patch (if possible at all, given
the results of your attempt) to remove the extra pieces again, ideally
to go in pretty soon after the branching.

Jan


Re: [Xen-devel] [PATCH] x86/IRQ: make internally used IRQs also honor the pending EOI stack

2019-11-28 Thread Andrew Cooper
On 28/11/2019 12:10, Andrew Cooper wrote:
> On 28/11/2019 11:03, Jan Beulich wrote:
>> Notes:
>>
>> - In principle we could get away without the check_eoi_deferral flag.
>>   I've introduced it just to make sure there's as little change as
>>   possible to unaffected paths.
>> - Similarly the cpu_has_pending_apic_eoi() check in do_IRQ() isn't
>>   strictly necessary.
> I don't think the cpu_has_pending_apic_eoi() check is necessary.  It is
> checked at the head of end_nonmaskable_irq() as well.
>
> Similarly, I'm not sure that check_eoi_deferral is something that we'd
> want to introduce.
>
> I'll drop both of these and test, seeing as I have a repro of the problem.

Dropping cpu_has_pending_apic_eoi() wasn't possible in a trivial way (so
I didn't), and dropping just check_eoi_deferral on its own definitely
breaks things.

Given the 4.13 timeline, let's go with it in this form, seeing as it is
the version which had all of last night's worth of testing.

Acked-by: Andrew Cooper 
Tested-by: Andrew Cooper 


Re: [Xen-devel] [PATCH] x86/IRQ: make internally used IRQs also honor the pending EOI stack

2019-11-28 Thread Jan Beulich
On 28.11.2019 12:39, Roger Pau Monné wrote:
> On Thu, Nov 28, 2019 at 12:03:47PM +0100, Jan Beulich wrote:
>> At the time the pending EOI stack was introduced there were no
>> internally used IRQs which would have the LAPIC EOI issued from the
>> ->end() hook. This had then changed with the introduction of IOMMUs,
>> but the interaction issue was presumably masked by
>> irq_guest_eoi_timer_fn() frequently EOI-ing interrupts way too early
>> (which got fixed by 359cf6f8a0ec ["x86/IRQ: don't keep EOI timer
>> running without need"]).
>>
>> The problem is that with us re-enabling interrupts across handler
>> invocation, a higher priority (guest) interrupt may trigger while
>> handling a lower priority (internal) one. The EOI issued from
>> ->end() (for ACKTYPE_EOI kind interrupts) would then mistakenly
>> EOI the higher priority (guest) interrupt, breaking (among other
>> things) pending EOI stack logic's assumptions.
> 
> Maybe there's something that I'm missing, but shouldn't hypervisor
> vectors always be higher priority than guest ones?

Depends - IOMMU ones imo aren't something that needs urgently
dealing with, so a little bit of delay won't hurt. There would
only be a problem if such interrupts could be deferred
indefinitely.

> I see there's already a range reserved for high priority vectors
> ({FIRST/LAST}_HIPRIORITY_VECTOR), what's the reason for hypervisor
> interrupts not using this range?

We'd quickly run out of high priority vectors on systems with
multiple (and perhaps indeed many) IOMMUs.

> IMO it seems troublesome that pending guest vectors can delay the
> injection of hypervisor interrupt vectors.

As per above - depends.

Jan


Re: [Xen-devel] [PATCH] x86/IRQ: make internally used IRQs also honor the pending EOI stack

2019-11-28 Thread Andrew Cooper
On 28/11/2019 11:03, Jan Beulich wrote:
> Notes:
>
> - In principle we could get away without the check_eoi_deferral flag.
>   I've introduced it just to make sure there's as little change as
>   possible to unaffected paths.
> - Similarly the cpu_has_pending_apic_eoi() check in do_IRQ() isn't
>   strictly necessary.

I don't think the cpu_has_pending_apic_eoi() check is necessary.  It is
checked at the head of end_nonmaskable_irq() as well.

Similarly, I'm not sure that check_eoi_deferral is something that we'd
want to introduce.

I'll drop both of these and test, seeing as I have a repro of the problem.

> - The new function's name isn't very helpful with its use in
>   end_level_ioapic_irq_new(). I did also consider eoi_APIC_irq() (to
>   parallel ack_APIC_irq()), but then liked this even less.

I don't have a better suggestion.

> - Contrary to what I first thought, serial console IRQs shouldn't be
>   affected, as we run them on specifically allocated high priority
>   vectors.

Good, although I don't think this wants to find its way into the final
commit message.

~Andrew


Re: [Xen-devel] [PATCH] x86/IRQ: make internally used IRQs also honor the pending EOI stack

2019-11-28 Thread Roger Pau Monné
On Thu, Nov 28, 2019 at 12:03:47PM +0100, Jan Beulich wrote:
> At the time the pending EOI stack was introduced there were no
> internally used IRQs which would have the LAPIC EOI issued from the
> ->end() hook. This had then changed with the introduction of IOMMUs,
> but the interaction issue was presumably masked by
> irq_guest_eoi_timer_fn() frequently EOI-ing interrupts way too early
> (which got fixed by 359cf6f8a0ec ["x86/IRQ: don't keep EOI timer
> running without need"]).
> 
> The problem is that with us re-enabling interrupts across handler
> invocation, a higher priority (guest) interrupt may trigger while
> handling a lower priority (internal) one. The EOI issued from
> ->end() (for ACKTYPE_EOI kind interrupts) would then mistakenly
> EOI the higher priority (guest) interrupt, breaking (among other
> things) pending EOI stack logic's assumptions.

Maybe there's something that I'm missing, but shouldn't hypervisor
vectors always be higher priority than guest ones?

I see there's already a range reserved for high priority vectors
({FIRST/LAST}_HIPRIORITY_VECTOR), what's the reason for hypervisor
interrupts not using this range?

IMO it seems troublesome that pending guest vectors can delay the
injection of hypervisor interrupt vectors.

Thanks, Roger.


Re: [Xen-devel] [PATCH] x86/IRQ: make internally used IRQs also honor the pending EOI stack

2019-11-28 Thread Jürgen Groß

On 28.11.19 12:19, Jan Beulich wrote:
> On 28.11.2019 12:03, Jan Beulich wrote:
>> At the time the pending EOI stack was introduced there were no
>> internally used IRQs which would have the LAPIC EOI issued from the
>> ->end() hook. This had then changed with the introduction of IOMMUs,
>> but the interaction issue was presumably masked by
>> irq_guest_eoi_timer_fn() frequently EOI-ing interrupts way too early
>> (which got fixed by 359cf6f8a0ec ["x86/IRQ: don't keep EOI timer
>> running without need"]).
>>
>> The problem is that with us re-enabling interrupts across handler
>> invocation, a higher priority (guest) interrupt may trigger while
>> handling a lower priority (internal) one. The EOI issued from
>> ->end() (for ACKTYPE_EOI kind interrupts) would then mistakenly
>> EOI the higher priority (guest) interrupt, breaking (among other
>> things) pending EOI stack logic's assumptions.
>>
>> Notes:
>>
>> - In principle we could get away without the check_eoi_deferral flag.
>>   I've introduced it just to make sure there's as little change as
>>   possible to unaffected paths.
>> - Similarly the cpu_has_pending_apic_eoi() check in do_IRQ() isn't
>>   strictly necessary.
>> - The new function's name isn't very helpful with its use in
>>   end_level_ioapic_irq_new(). I did also consider eoi_APIC_irq() (to
>>   parallel ack_APIC_irq()), but then liked this even less.
>> - Contrary to what I first thought, serial console IRQs shouldn't be
>>   affected, as we run them on specifically allocated high priority
>>   vectors.
>>
>> Reported-by: Igor Druzhinin 
>> Diagnosed-by: Andrew Cooper 
>> Signed-off-by: Jan Beulich 
>
> In case it's not explicit enough from the description: While the
> issue appears to be long standing, it looks to have been masked
> by other shortcomings prior to 4.13. Therefore this should be
> considered at least a perceived regression, even if it may not
> strictly be one.

Assuming an Ack: in a timely manner:

Release-acked-by: Juergen Gross 


Juergen


Re: [Xen-devel] [PATCH] x86/IRQ: make internally used IRQs also honor the pending EOI stack

2019-11-28 Thread Jan Beulich
On 28.11.2019 12:03, Jan Beulich wrote:
> At the time the pending EOI stack was introduced there were no
> internally used IRQs which would have the LAPIC EOI issued from the
> ->end() hook. This had then changed with the introduction of IOMMUs,
> but the interaction issue was presumably masked by
> irq_guest_eoi_timer_fn() frequently EOI-ing interrupts way too early
> (which got fixed by 359cf6f8a0ec ["x86/IRQ: don't keep EOI timer
> running without need"]).
> 
> The problem is that with us re-enabling interrupts across handler
> invocation, a higher priority (guest) interrupt may trigger while
> handling a lower priority (internal) one. The EOI issued from
> ->end() (for ACKTYPE_EOI kind interrupts) would then mistakenly
> EOI the higher priority (guest) interrupt, breaking (among other
> things) pending EOI stack logic's assumptions.
> 
> Notes:
> 
> - In principle we could get away without the check_eoi_deferral flag.
>   I've introduced it just to make sure there's as little change as
>   possible to unaffected paths.
> - Similarly the cpu_has_pending_apic_eoi() check in do_IRQ() isn't
>   strictly necessary.
> - The new function's name isn't very helpful with its use in
>   end_level_ioapic_irq_new(). I did also consider eoi_APIC_irq() (to
>   parallel ack_APIC_irq()), but then liked this even less.
> - Contrary to what I first thought, serial console IRQs shouldn't be
>   affected, as we run them on specifically allocated high priority
>   vectors.
> 
> Reported-by: Igor Druzhinin 
> Diagnosed-by: Andrew Cooper 
> Signed-off-by: Jan Beulich 

In case it's not explicit enough from the description: While the
issue appears to be long standing, it looks to have been masked
by other shortcomings prior to 4.13. Therefore this should be
considered at least a perceived regression, even if it may not
strictly be one.

Jan


[Xen-devel] [PATCH] x86/IRQ: make internally used IRQs also honor the pending EOI stack

2019-11-28 Thread Jan Beulich
At the time the pending EOI stack was introduced there were no
internally used IRQs which would have the LAPIC EOI issued from the
->end() hook. This had then changed with the introduction of IOMMUs,
but the interaction issue was presumably masked by
irq_guest_eoi_timer_fn() frequently EOI-ing interrupts way too early
(which got fixed by 359cf6f8a0ec ["x86/IRQ: don't keep EOI timer
running without need"]).

The problem is that with us re-enabling interrupts across handler
invocation, a higher priority (guest) interrupt may trigger while
handling a lower priority (internal) one. The EOI issued from
->end() (for ACKTYPE_EOI kind interrupts) would then mistakenly
EOI the higher priority (guest) interrupt, breaking (among other
things) pending EOI stack logic's assumptions.
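
To make the failure mode concrete: a LAPIC EOI is positional, i.e.
writing the EOI register retires whatever is currently the
highest-priority in-service vector, not a named one. Xen therefore
keeps a per-CPU stack of deferred EOIs sorted by vector. A simplified
model of an entry, for illustration only (the real struct pending_eoi
in xen/arch/x86/irq.c uses a different, packed layout):

#include <stdbool.h>
#include <stdint.h>

/* One deferred EOI.  The per-CPU stack of these is kept sorted by
 * vector, so the top entry is always the one the next LAPIC EOI
 * write would retire. */
struct pending_eoi_model {
    unsigned int irq;   /* IRQ whose LAPIC EOI is being held back */
    uint8_t vector;     /* its vector; priority class is vector >> 4 */
    bool ready;         /* ->end() has run; EOI may go out once on top */
};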

Notes:

- In principle we could get away without the check_eoi_deferral flag.
  I've introduced it just to make sure there's as little change as
  possible to unaffected paths.
- Similarly the cpu_has_pending_apic_eoi() check in do_IRQ() isn't
  strictly necessary.
- The new function's name isn't very helpful with its use in
  end_level_ioapic_irq_new(). I did also consider eoi_APIC_irq() (to
  parallel ack_APIC_irq()), but then liked this even less.
- Contrary to what I first thought, serial console IRQs shouldn't be
  affected, as we run them on specifically allocated high priority
  vectors.

Reported-by: Igor Druzhinin 
Diagnosed-by: Andrew Cooper 
Signed-off-by: Jan Beulich 

--- a/xen/arch/x86/io_apic.c
+++ b/xen/arch/x86/io_apic.c
@@ -1734,7 +1734,7 @@ static void end_level_ioapic_irq_new(str
 
     v = apic_read(APIC_TMR + ((i & ~0x1f) >> 1));
 
-    ack_APIC_irq();
+    end_nonmaskable_irq(desc, vector);
 
     if ( (desc->status & IRQ_MOVE_PENDING) &&
          !io_apic_level_ack_pending(desc->irq) )
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -438,6 +438,7 @@ int __init init_irq_data(void)
 }
 
 static void __do_IRQ_guest(int vector);
+static void flush_ready_eoi(void);
 
 static void ack_none(struct irq_desc *desc)
 {
@@ -865,6 +866,7 @@ void pirq_set_affinity(struct domain *d,
 }
 
 DEFINE_PER_CPU(unsigned int, irq_count);
+static DEFINE_PER_CPU(bool, check_eoi_deferral);
 
 uint8_t alloc_hipriority_vector(void)
 {
@@ -1008,7 +1010,25 @@ void do_IRQ(struct cpu_user_regs *regs)
 
  out:
     if ( desc->handler->end )
+    {
+        /*
+         * If higher priority vectors still have their EOIs pending, we may
+         * not issue an EOI here, as this would EOI the highest priority one.
+         */
+        if ( cpu_has_pending_apic_eoi() )
+        {
+            this_cpu(check_eoi_deferral) = true;
+            desc->handler->end(desc, vector);
+            this_cpu(check_eoi_deferral) = false;
+
+            spin_unlock(&desc->lock);
+            flush_ready_eoi();
+            goto out_no_unlock;
+        }
+
         desc->handler->end(desc, vector);
+    }
+
  out_no_end:
     spin_unlock(&desc->lock);
  out_no_unlock:
@@ -1163,6 +1183,29 @@ bool cpu_has_pending_apic_eoi(void)
     return pending_eoi_sp(this_cpu(pending_eoi)) != 0;
 }
 
+void end_nonmaskable_irq(struct irq_desc *desc, uint8_t vector)
+{
+    struct pending_eoi *peoi = this_cpu(pending_eoi);
+    unsigned int sp = pending_eoi_sp(peoi);
+
+    if ( !this_cpu(check_eoi_deferral) || !sp || peoi[sp - 1].vector < vector )
+    {
+        ack_APIC_irq();
+        return;
+    }
+
+    /* Defer this vector's EOI until all higher ones have been EOI-ed. */
+    pending_eoi_sp(peoi) = sp + 1;
+    do {
+        peoi[sp] = peoi[sp - 1];
+    } while ( --sp && peoi[sp - 1].vector > vector );
+    ASSERT(!sp || peoi[sp - 1].vector < vector);
+
+    peoi[sp].irq = desc->irq;
+    peoi[sp].vector = vector;
+    peoi[sp].ready = 1;
+}
+
 static inline void set_pirq_eoi(struct domain *d, unsigned int irq)
 {
     if ( d->arch.pirq_eoi_map )
--- a/xen/arch/x86/msi.c
+++ b/xen/arch/x86/msi.c
@@ -512,11 +512,6 @@ static void ack_maskable_msi_irq(struct
     ack_APIC_irq(); /* ACKTYPE_NONE */
 }
 
-void end_nonmaskable_msi_irq(struct irq_desc *desc, u8 vector)
-{
-    ack_APIC_irq(); /* ACKTYPE_EOI */
-}
-
 /*
  * IRQ chip for MSI PCI/PCI-X/PCI-Express devices,
  * which implement the MSI or MSI-X capability structure.
@@ -539,7 +534,7 @@ static hw_irq_controller pci_msi_nonmask
     .enable       = irq_enable_none,
     .disable      = irq_disable_none,
     .ack          = ack_nonmaskable_msi_irq,
-    .end          = end_nonmaskable_msi_irq,
+    .end          = end_nonmaskable_irq,
     .set_affinity = set_msi_affinity
 };
 
--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -427,7 +427,7 @@ static unsigned int iommu_msi_startup(st
 static void iommu_msi_end(struct irq_desc *desc, u8 vector)
 {
     iommu_msi_unmask(desc);
-    ack_APIC_irq();
+    end_nonmaskable_irq(desc, vector);
 }
 
 
@@ -460,7 +460,7 @@ static void iommu_maskable_msi_shutdown(
  * maskable flavors here, as we want the ACK to be issued in ->end().
  */
 #define
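
The archived copy of the patch is truncated above. As an illustration
of the deferral step added by end_nonmaskable_irq(), here is a small
self-contained C simulation - not Xen code, and the IRQ and vector
numbers are made up:

#include <stdint.h>
#include <stdio.h>

/* Simplified stand-in for Xen's struct pending_eoi. */
struct peoi_entry {
    unsigned int irq;
    uint8_t vector;
    int ready;
};

/* Mirrors the deferral branch of end_nonmaskable_irq() above: insert
 * (irq, vector) so the stack stays sorted by vector, marking the new
 * entry ready since ->end() has already run.  Only call this when
 * sp != 0 and the top entry's vector is above 'vector', matching the
 * condition guarding that branch in the patch. */
static unsigned int defer_eoi(struct peoi_entry *peoi, unsigned int sp,
                              unsigned int irq, uint8_t vector)
{
    unsigned int i = sp++;

    do {
        peoi[i] = peoi[i - 1];
    } while ( --i && peoi[i - 1].vector > vector );

    peoi[i].irq = irq;
    peoi[i].vector = vector;
    peoi[i].ready = 1;
    return sp;
}

int main(void)
{
    /* Made-up state: two guest interrupts still await their EOIs. */
    struct peoi_entry stack[8] = {
        { .irq = 10, .vector = 0x21, .ready = 0 },
        { .irq = 34, .vector = 0x38, .ready = 0 },
    };
    unsigned int sp = 2;

    /* A hypothetical IOMMU interrupt (irq 9, vector 0x28) ends while
     * vector 0x38 is still in service: its EOI gets queued in order
     * instead of mistakenly retiring 0x38's in-service bit. */
    sp = defer_eoi(stack, sp, 9, 0x28);

    for ( unsigned int i = 0; i < sp; ++i )
        printf("[%u] irq=%u vector=%#x ready=%d\n",
               i, stack[i].irq, (unsigned int)stack[i].vector,
               stack[i].ready);
    return 0;
}

The stack stays sorted (0x21, 0x28, 0x38) with only the 0x28 entry
marked ready; its deferred EOI can be issued by flush_ready_eoi() only
after the 0x38 entry above it has itself been EOI-ed and popped.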