Re: [Xen-devel] [PATCH] x86/IRQ: make internally used IRQs also honor the pending EOI stack
On 28.11.2019 15:30, Roger Pau Monné wrote:
> On Thu, Nov 28, 2019 at 03:19:50PM +0100, Jan Beulich wrote:
>> On 28.11.2019 15:13, Roger Pau Monné wrote:
>>> On Thu, Nov 28, 2019 at 02:33:08PM +0100, Jan Beulich wrote:
>>>> On 28.11.2019 12:39, Roger Pau Monné wrote:
>>>>> On Thu, Nov 28, 2019 at 12:03:47PM +0100, Jan Beulich wrote:
>>>>>> At the time the pending EOI stack was introduced there were no
>>>>>> internally used IRQs which would have the LAPIC EOI issued from the
>>>>>> ->end() hook. This had then changed with the introduction of IOMMUs,
>>>>>> but the interaction issue was presumably masked by
>>>>>> irq_guest_eoi_timer_fn() frequently EOI-ing interrupts way too early
>>>>>> (which got fixed by 359cf6f8a0ec ["x86/IRQ: don't keep EOI timer
>>>>>> running without need"]).
>>>>>>
>>>>>> The problem is that with us re-enabling interrupts across handler
>>>>>> invocation, a higher priority (guest) interrupt may trigger while
>>>>>> handling a lower priority (internal) one. The EOI issued from
>>>>>> ->end() (for ACKTYPE_EOI kind interrupts) would then mistakenly
>>>>>> EOI the higher priority (guest) interrupt, breaking (among other
>>>>>> things) pending EOI stack logic's assumptions.
>>>>>
>>>>> Maybe there's something that I'm missing, but shouldn't hypervisor
>>>>> vectors always be higher priority than guest ones?
>>>>
>>>> Depends - IOMMU ones imo aren't something that needs urgently
>>>> dealing with, so a little bit of delay won't hurt. There would
>>>> only be a problem if such interrupts could be deferred
>>>> indefinitely.
>>>>
>>>>> I see there's already a range reserved for high priority vectors
>>>>> ({FIRST/LAST}_HIPRIORITY_VECTOR), what's the reason for hypervisor
>>>>> interrupts not using this range?
>>>>
>>>> We'd quickly run out of high priority vectors on systems with
>>>> multiple (and perhaps indeed many) IOMMUs.
>>>
>>> Well, there's no limit on the number of high priority vectors, since
>>> this is all a software abstraction. It only matters that such vectors
>>> are higher than guest owned ones.
>>>
>>> I have to take a look, but I would think that Xen used vectors are the
>>> first ones to be allocated, and hence could start from
>>> FIRST_HIPRIORITY_VECTOR - 1 and go down from there.
>>
>> If this was the case, then we wouldn't have observed the issue (despite
>> it being there) this patch tries to address. The IOMMUs for both Andrew
>> and me ended up using vector 0x28, below everything that e.g. the
>> IO-APIC RTE got assigned.
>
> I know it's not like that ATM, and hence I wonder whether it would be
> possible to make it so: Xen vectors get allocated down from
> FIRST_HIPRIORITY_VECTOR - 1 and then we won't have this issue.
>
>> Also don't forget that we don't allocate
>> vectors continuously, but such that they'd get spread across the
>> different priority levels. (Whether that's an awfully good idea is a
>> separate question.)
>
> Well, vectors used by Xen would be allocated downwards continuously
> from FIRST_HIPRIORITY_VECTOR - 1, and hence won't be spread.
>
> Guest used vectors could continue to use the same allocation
> mechanism, since that's a different issue.

The issue would go away only if guest vectors are at strictly lower
priority than Xen ones. I.e. we'd need to go in steps of 16. And there
aren't that many vectors ...

(I'm happy to see changes here, but it'll need to be very careful ones.)

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] x86/IRQ: make internally used IRQs also honor the pending EOI stack
On 28/11/2019 14:00, Jan Beulich wrote:
> On 28.11.2019 14:53, Andrew Cooper wrote:
>> On 28/11/2019 12:10, Andrew Cooper wrote:
>>> On 28/11/2019 11:03, Jan Beulich wrote:
>>>> Notes:
>>>>
>>>> - In principle we could get away without the check_eoi_deferral flag.
>>>>   I've introduced it just to make sure there's as little change as
>>>>   possible to unaffected paths.
>>>> - Similarly the cpu_has_pending_apic_eoi() check in do_IRQ() isn't
>>>>   strictly necessary.
>>>
>>> I don't think the cpu_has_pending_apic_eoi() check is necessary. It is
>>> checked at the head of end_nonmaskable_irq() as well.
>>>
>>> Similarly, I'm not sure that check_eoi_deferral is something that we'd
>>> want to introduce.
>>>
>>> I'll drop both of these and test, seeing as I have a repro of the problem.
>>
>> Dropping cpu_has_pending_apic_eoi() wasn't possible in a trivial way (so
>> I didn't), and dropping just check_eoi_deferral on its own definitely
>> breaks things.
>>
>> Given the 4.13 timeline, let's go with it in this form, seeing as it is
>> the version which had all of last night's worth of testing.
>>
>> Acked-by: Andrew Cooper
>> Tested-by: Andrew Cooper
>
> Thanks! I've taken note to produce a patch (if possible at all, given
> the results of your attempt) to remove the extra pieces again, ideally
> to go in pretty soon after the branching.

Just for morbid curiosity, this is what happens without the
check_eoi_deferral check:

[  383.845620] dracut-initqueue[302]: Warning: Could not boot.
[  383.892655] dracut-initqueue[302]: Warning: /dev/disk/by-label/root-qomhwa does not exist
         Starting Dracut Emergency Shell...
dracut:/#
[  439.282461] BUG: unable to handle kernel NULL pointer dereference at 09a0
[  439.282773] PGD 0 P4D 0
[  439.282928] Oops: [#1] SMP NOPTI
[  439.283088] CPU: 0 PID: 281 Comm: systemd-udevd Tainted: G O 4.19.0+1 #1
[  439.283337] Hardware name: Dell Inc. PowerEdge R415/0GXH08, BIOS 2.0.2 10/22/2012
[  439.283591] RIP: e030:mptscsih_remove+0x5b/0xb0 [mptscsih]
[  439.283751] Code: ff 74 1e 8b 83 9c 01 00 00 44 8d 2c c5 00 00 00 00 e8 c9 4d e9 c0 48 c7 83 c0 0c 00 00 00 00 00 00 f6 83 c8 00 00 00 01 75 34 <48> 8b bd a0
[  439.284153] RSP: e02b:c900404b7b50 EFLAGS: 00010246
[  439.284312] RAX: 0049 RBX: 88809a7d9000 RCX: 0006
[  439.284477] RDX: RSI: 0001 RDI:
[  439.284644] RBP: R08: R09: 0384
[  439.284810] R10: 0004 R11: R12: 88809a7cc000
[  439.284975] R13: R14: c038b180 R15: c900404b7e98
[  439.285149] FS: 7ff0922a38c0() GS:8880a360() knlGS:
[  439.285428] CS: e033 DS: ES: CR0: 80050033
[  439.285590] CR2: 09a0 CR3: 04012000 CR4: 0660
[  439.285765] Call Trace:
[  439.285924]  mptsas_probe+0x384/0x530 [mptsas]
[  439.286133]  pci_device_probe+0xc9/0x140
[  439.286297]  really_probe+0x238/0x3e0
[  439.286455]  driver_probe_device+0x115/0x130
[  439.286622]  __driver_attach+0x103/0x110
[  439.286805]  ? driver_probe_device+0x130/0x130
[  439.286964]  bus_for_each_dev+0x67/0xc0
[  439.287120]  bus_add_driver+0x41/0x260
[  439.287274]  ? 0xc038f000
[  439.287426]  driver_register+0x5b/0xe0
[  439.287578]  ? 0xc038f000
[  439.287732]  mptsas_init+0x114/0x1000 [mptsas]
[  439.287890]  do_one_initcall+0x4e/0x1d4
[  439.288045]  ? _cond_resched+0x15/0x30
[  439.288232]  ? kmem_cache_alloc_trace+0x15f/0x1c0
[  439.288392]  do_init_module+0x5a/0x21b
[  439.288546]  load_module+0x2270/0x2800
[  439.288700]  ? m_show+0x1c0/0x1c0
[  439.288854]  __do_sys_finit_module+0x94/0xe0
[  439.289012]  do_syscall_64+0x4e/0x100
[  439.289168]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  439.289326] RIP: 0033:0x7ff090ef9ec9
[  439.289482] Code: 01 00 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f8
[  439.289871] RSP: 002b:7ffd96d355d8 EFLAGS: 0246 ORIG_RAX: 0139
[  439.290120] RAX: ffda RBX: 5581851f9f70 RCX: 7ff090ef9ec9
[  439.290287] RDX: RSI: 7ff09181a099 RDI: 000c
[  439.290458] RBP: 7ff09181a099 R08: R09: 5581851f6670
[  439.290625] R10: 000c R11: 0246 R12:
[  439.290792] R13: 5581851f6240 R14: 0002 R15:
[  439.290993] Modules linked in: ata_generic pata_acpi ohci_pci ahci serio_raw pata_atiixp libahci mptsas(+) scsi_transport_sas ehci_pci bnx2(O) libata mptscst
[  439.291484] CR2: 09a0
[  439.291665] ---[ end trace 519b6f631d8509d1 ]---
[  439.295635] RIP: e030:mptscsih_remove+0x5b/0xb0 [mptscsih]
[  439.295804] Code: ff 74 1e 8b 83 9c 01 00 00 44 8d 2c c5 00 00 00 00 e8 c9 4d e9 c0 48 c7 83 c0 0c 00 00 00 00 00 00 f6 83
Re: [Xen-devel] [PATCH] x86/IRQ: make internally used IRQs also honor the pending EOI stack
On Thu, Nov 28, 2019 at 03:19:50PM +0100, Jan Beulich wrote:
> On 28.11.2019 15:13, Roger Pau Monné wrote:
>> On Thu, Nov 28, 2019 at 02:33:08PM +0100, Jan Beulich wrote:
>>> On 28.11.2019 12:39, Roger Pau Monné wrote:
>>>> On Thu, Nov 28, 2019 at 12:03:47PM +0100, Jan Beulich wrote:
>>>>> At the time the pending EOI stack was introduced there were no
>>>>> internally used IRQs which would have the LAPIC EOI issued from the
>>>>> ->end() hook. This had then changed with the introduction of IOMMUs,
>>>>> but the interaction issue was presumably masked by
>>>>> irq_guest_eoi_timer_fn() frequently EOI-ing interrupts way too early
>>>>> (which got fixed by 359cf6f8a0ec ["x86/IRQ: don't keep EOI timer
>>>>> running without need"]).
>>>>>
>>>>> The problem is that with us re-enabling interrupts across handler
>>>>> invocation, a higher priority (guest) interrupt may trigger while
>>>>> handling a lower priority (internal) one. The EOI issued from
>>>>> ->end() (for ACKTYPE_EOI kind interrupts) would then mistakenly
>>>>> EOI the higher priority (guest) interrupt, breaking (among other
>>>>> things) pending EOI stack logic's assumptions.
>>>>
>>>> Maybe there's something that I'm missing, but shouldn't hypervisor
>>>> vectors always be higher priority than guest ones?
>>>
>>> Depends - IOMMU ones imo aren't something that needs urgently
>>> dealing with, so a little bit of delay won't hurt. There would
>>> only be a problem if such interrupts could be deferred
>>> indefinitely.
>>>
>>>> I see there's already a range reserved for high priority vectors
>>>> ({FIRST/LAST}_HIPRIORITY_VECTOR), what's the reason for hypervisor
>>>> interrupts not using this range?
>>>
>>> We'd quickly run out of high priority vectors on systems with
>>> multiple (and perhaps indeed many) IOMMUs.
>>
>> Well, there's no limit on the number of high priority vectors, since
>> this is all a software abstraction. It only matters that such vectors
>> are higher than guest owned ones.
>>
>> I have to take a look, but I would think that Xen used vectors are the
>> first ones to be allocated, and hence could start from
>> FIRST_HIPRIORITY_VECTOR - 1 and go down from there.
>
> If this was the case, then we wouldn't have observed the issue (despite
> it being there) this patch tries to address. The IOMMUs for both Andrew
> and me ended up using vector 0x28, below everything that e.g. the
> IO-APIC RTE got assigned.

I know it's not like that ATM, and hence I wonder whether it would be
possible to make it so: Xen vectors get allocated down from
FIRST_HIPRIORITY_VECTOR - 1 and then we won't have this issue.

> Also don't forget that we don't allocate
> vectors continuously, but such that they'd get spread across the
> different priority levels. (Whether that's an awfully good idea is a
> separate question.)

Well, vectors used by Xen would be allocated downwards continuously
from FIRST_HIPRIORITY_VECTOR - 1, and hence won't be spread.

Guest used vectors could continue to use the same allocation
mechanism, since that's a different issue.

Thanks, Roger.
Re: [Xen-devel] [PATCH] x86/IRQ: make internally used IRQs also honor the pending EOI stack
On 28.11.2019 15:13, Roger Pau Monné wrote:
> On Thu, Nov 28, 2019 at 02:33:08PM +0100, Jan Beulich wrote:
>> On 28.11.2019 12:39, Roger Pau Monné wrote:
>>> On Thu, Nov 28, 2019 at 12:03:47PM +0100, Jan Beulich wrote:
>>>> At the time the pending EOI stack was introduced there were no
>>>> internally used IRQs which would have the LAPIC EOI issued from the
>>>> ->end() hook. This had then changed with the introduction of IOMMUs,
>>>> but the interaction issue was presumably masked by
>>>> irq_guest_eoi_timer_fn() frequently EOI-ing interrupts way too early
>>>> (which got fixed by 359cf6f8a0ec ["x86/IRQ: don't keep EOI timer
>>>> running without need"]).
>>>>
>>>> The problem is that with us re-enabling interrupts across handler
>>>> invocation, a higher priority (guest) interrupt may trigger while
>>>> handling a lower priority (internal) one. The EOI issued from
>>>> ->end() (for ACKTYPE_EOI kind interrupts) would then mistakenly
>>>> EOI the higher priority (guest) interrupt, breaking (among other
>>>> things) pending EOI stack logic's assumptions.
>>>
>>> Maybe there's something that I'm missing, but shouldn't hypervisor
>>> vectors always be higher priority than guest ones?
>>
>> Depends - IOMMU ones imo aren't something that needs urgently
>> dealing with, so a little bit of delay won't hurt. There would
>> only be a problem if such interrupts could be deferred
>> indefinitely.
>>
>>> I see there's already a range reserved for high priority vectors
>>> ({FIRST/LAST}_HIPRIORITY_VECTOR), what's the reason for hypervisor
>>> interrupts not using this range?
>>
>> We'd quickly run out of high priority vectors on systems with
>> multiple (and perhaps indeed many) IOMMUs.
>
> Well, there's no limit on the number of high priority vectors, since
> this is all a software abstraction. It only matters that such vectors
> are higher than guest owned ones.
>
> I have to take a look, but I would think that Xen used vectors are the
> first ones to be allocated, and hence could start from
> FIRST_HIPRIORITY_VECTOR - 1 and go down from there.

If this was the case, then we wouldn't have observed the issue (despite
it being there) this patch tries to address. The IOMMUs for both Andrew
and me ended up using vector 0x28, below everything that e.g. the
IO-APIC RTE got assigned.

Also don't forget that we don't allocate vectors continuously, but such
that they'd get spread across the different priority levels. (Whether
that's an awfully good idea is a separate question.)

Jan
Re: [Xen-devel] [PATCH] x86/IRQ: make internally used IRQs also honor the pending EOI stack
On Thu, Nov 28, 2019 at 02:33:08PM +0100, Jan Beulich wrote:
> On 28.11.2019 12:39, Roger Pau Monné wrote:
>> On Thu, Nov 28, 2019 at 12:03:47PM +0100, Jan Beulich wrote:
>>> At the time the pending EOI stack was introduced there were no
>>> internally used IRQs which would have the LAPIC EOI issued from the
>>> ->end() hook. This had then changed with the introduction of IOMMUs,
>>> but the interaction issue was presumably masked by
>>> irq_guest_eoi_timer_fn() frequently EOI-ing interrupts way too early
>>> (which got fixed by 359cf6f8a0ec ["x86/IRQ: don't keep EOI timer
>>> running without need"]).
>>>
>>> The problem is that with us re-enabling interrupts across handler
>>> invocation, a higher priority (guest) interrupt may trigger while
>>> handling a lower priority (internal) one. The EOI issued from
>>> ->end() (for ACKTYPE_EOI kind interrupts) would then mistakenly
>>> EOI the higher priority (guest) interrupt, breaking (among other
>>> things) pending EOI stack logic's assumptions.
>>
>> Maybe there's something that I'm missing, but shouldn't hypervisor
>> vectors always be higher priority than guest ones?
>
> Depends - IOMMU ones imo aren't something that needs urgently
> dealing with, so a little bit of delay won't hurt. There would
> only be a problem if such interrupts could be deferred
> indefinitely.
>
>> I see there's already a range reserved for high priority vectors
>> ({FIRST/LAST}_HIPRIORITY_VECTOR), what's the reason for hypervisor
>> interrupts not using this range?
>
> We'd quickly run out of high priority vectors on systems with
> multiple (and perhaps indeed many) IOMMUs.

Well, there's no limit on the number of high priority vectors, since
this is all a software abstraction. It only matters that such vectors
are higher than guest owned ones.

I have to take a look, but I would think that Xen used vectors are the
first ones to be allocated, and hence could start from
FIRST_HIPRIORITY_VECTOR - 1 and go down from there. This way we will
end up with high priority vectors first, Xen used vectors, and finally
guest vectors.

Thanks, Roger.
Re: [Xen-devel] [PATCH] x86/IRQ: make internally used IRQs also honor the pending EOI stack
On 28.11.2019 14:53, Andrew Cooper wrote:
> On 28/11/2019 12:10, Andrew Cooper wrote:
>> On 28/11/2019 11:03, Jan Beulich wrote:
>>> Notes:
>>>
>>> - In principle we could get away without the check_eoi_deferral flag.
>>>   I've introduced it just to make sure there's as little change as
>>>   possible to unaffected paths.
>>> - Similarly the cpu_has_pending_apic_eoi() check in do_IRQ() isn't
>>>   strictly necessary.
>>
>> I don't think the cpu_has_pending_apic_eoi() check is necessary. It is
>> checked at the head of end_nonmaskable_irq() as well.
>>
>> Similarly, I'm not sure that check_eoi_deferral is something that we'd
>> want to introduce.
>>
>> I'll drop both of these and test, seeing as I have a repro of the problem.
>
> Dropping cpu_has_pending_apic_eoi() wasn't possible in a trivial way (so
> I didn't), and dropping just check_eoi_deferral on its own definitely
> breaks things.
>
> Given the 4.13 timeline, let's go with it in this form, seeing as it is
> the version which had all of last night's worth of testing.
>
> Acked-by: Andrew Cooper
> Tested-by: Andrew Cooper

Thanks! I've taken note to produce a patch (if possible at all, given
the results of your attempt) to remove the extra pieces again, ideally
to go in pretty soon after the branching.

Jan
Re: [Xen-devel] [PATCH] x86/IRQ: make internally used IRQs also honor the pending EOI stack
On 28/11/2019 12:10, Andrew Cooper wrote:
> On 28/11/2019 11:03, Jan Beulich wrote:
>> Notes:
>>
>> - In principle we could get away without the check_eoi_deferral flag.
>>   I've introduced it just to make sure there's as little change as
>>   possible to unaffected paths.
>> - Similarly the cpu_has_pending_apic_eoi() check in do_IRQ() isn't
>>   strictly necessary.
>
> I don't think the cpu_has_pending_apic_eoi() check is necessary. It is
> checked at the head of end_nonmaskable_irq() as well.
>
> Similarly, I'm not sure that check_eoi_deferral is something that we'd
> want to introduce.
>
> I'll drop both of these and test, seeing as I have a repro of the problem.

Dropping cpu_has_pending_apic_eoi() wasn't possible in a trivial way (so
I didn't), and dropping just check_eoi_deferral on its own definitely
breaks things.

Given the 4.13 timeline, let's go with it in this form, seeing as it is
the version which had all of last night's worth of testing.

Acked-by: Andrew Cooper
Tested-by: Andrew Cooper
Re: [Xen-devel] [PATCH] x86/IRQ: make internally used IRQs also honor the pending EOI stack
On 28.11.2019 12:39, Roger Pau Monné wrote:
> On Thu, Nov 28, 2019 at 12:03:47PM +0100, Jan Beulich wrote:
>> At the time the pending EOI stack was introduced there were no
>> internally used IRQs which would have the LAPIC EOI issued from the
>> ->end() hook. This had then changed with the introduction of IOMMUs,
>> but the interaction issue was presumably masked by
>> irq_guest_eoi_timer_fn() frequently EOI-ing interrupts way too early
>> (which got fixed by 359cf6f8a0ec ["x86/IRQ: don't keep EOI timer
>> running without need"]).
>>
>> The problem is that with us re-enabling interrupts across handler
>> invocation, a higher priority (guest) interrupt may trigger while
>> handling a lower priority (internal) one. The EOI issued from
>> ->end() (for ACKTYPE_EOI kind interrupts) would then mistakenly
>> EOI the higher priority (guest) interrupt, breaking (among other
>> things) pending EOI stack logic's assumptions.
>
> Maybe there's something that I'm missing, but shouldn't hypervisor
> vectors always be higher priority than guest ones?

Depends - IOMMU ones imo aren't something that needs urgently
dealing with, so a little bit of delay won't hurt. There would
only be a problem if such interrupts could be deferred
indefinitely.

> I see there's already a range reserved for high priority vectors
> ({FIRST/LAST}_HIPRIORITY_VECTOR), what's the reason for hypervisor
> interrupts not using this range?

We'd quickly run out of high priority vectors on systems with
multiple (and perhaps indeed many) IOMMUs.

> IMO it seems troublesome that pending guest vectors can delay the
> injection of hypervisor interrupt vectors.

As per above - depends.

Jan
Re: [Xen-devel] [PATCH] x86/IRQ: make internally used IRQs also honor the pending EOI stack
On 28/11/2019 11:03, Jan Beulich wrote:
> Notes:
>
> - In principle we could get away without the check_eoi_deferral flag.
>   I've introduced it just to make sure there's as little change as
>   possible to unaffected paths.
> - Similarly the cpu_has_pending_apic_eoi() check in do_IRQ() isn't
>   strictly necessary.

I don't think the cpu_has_pending_apic_eoi() check is necessary. It is
checked at the head of end_nonmaskable_irq() as well.

Similarly, I'm not sure that check_eoi_deferral is something that we'd
want to introduce.

I'll drop both of these and test, seeing as I have a repro of the problem.

> - The new function's name isn't very helpful with its use in
>   end_level_ioapic_irq_new(). I did also consider eoi_APIC_irq() (to
>   parallel ack_APIC_irq()), but then liked this even less.

I don't have a better suggestion.

> - Other than I first thought serial console IRQs shouldn't be
>   affected, as we run them on specifically allocated high priority
>   vectors.

Good, although I don't think this wants to find its way into the final
commit message.

~Andrew
Re: [Xen-devel] [PATCH] x86/IRQ: make internally used IRQs also honor the pending EOI stack
On Thu, Nov 28, 2019 at 12:03:47PM +0100, Jan Beulich wrote:
> At the time the pending EOI stack was introduced there were no
> internally used IRQs which would have the LAPIC EOI issued from the
> ->end() hook. This had then changed with the introduction of IOMMUs,
> but the interaction issue was presumably masked by
> irq_guest_eoi_timer_fn() frequently EOI-ing interrupts way too early
> (which got fixed by 359cf6f8a0ec ["x86/IRQ: don't keep EOI timer
> running without need"]).
>
> The problem is that with us re-enabling interrupts across handler
> invocation, a higher priority (guest) interrupt may trigger while
> handling a lower priority (internal) one. The EOI issued from
> ->end() (for ACKTYPE_EOI kind interrupts) would then mistakenly
> EOI the higher priority (guest) interrupt, breaking (among other
> things) pending EOI stack logic's assumptions.

Maybe there's something that I'm missing, but shouldn't hypervisor
vectors always be higher priority than guest ones?

I see there's already a range reserved for high priority vectors
({FIRST/LAST}_HIPRIORITY_VECTOR), what's the reason for hypervisor
interrupts not using this range?

IMO it seems troublesome that pending guest vectors can delay the
injection of hypervisor interrupt vectors.

Thanks, Roger.
Re: [Xen-devel] [PATCH] x86/IRQ: make internally used IRQs also honor the pending EOI stack
On 28.11.19 12:19, Jan Beulich wrote:
> On 28.11.2019 12:03, Jan Beulich wrote:
>> At the time the pending EOI stack was introduced there were no
>> internally used IRQs which would have the LAPIC EOI issued from the
>> ->end() hook. This had then changed with the introduction of IOMMUs,
>> but the interaction issue was presumably masked by
>> irq_guest_eoi_timer_fn() frequently EOI-ing interrupts way too early
>> (which got fixed by 359cf6f8a0ec ["x86/IRQ: don't keep EOI timer
>> running without need"]).
>>
>> The problem is that with us re-enabling interrupts across handler
>> invocation, a higher priority (guest) interrupt may trigger while
>> handling a lower priority (internal) one. The EOI issued from
>> ->end() (for ACKTYPE_EOI kind interrupts) would then mistakenly
>> EOI the higher priority (guest) interrupt, breaking (among other
>> things) pending EOI stack logic's assumptions.
>>
>> Notes:
>>
>> - In principle we could get away without the check_eoi_deferral flag.
>>   I've introduced it just to make sure there's as little change as
>>   possible to unaffected paths.
>> - Similarly the cpu_has_pending_apic_eoi() check in do_IRQ() isn't
>>   strictly necessary.
>> - The new function's name isn't very helpful with its use in
>>   end_level_ioapic_irq_new(). I did also consider eoi_APIC_irq() (to
>>   parallel ack_APIC_irq()), but then liked this even less.
>> - Other than I first thought serial console IRQs shouldn't be
>>   affected, as we run them on specifically allocated high priority
>>   vectors.
>>
>> Reported-by: Igor Druzhinin
>> Diagnosed-by: Andrew Cooper
>> Signed-off-by: Jan Beulich
>
> In case it's not explicit enough from the description: While the issue
> appears to be long standing, it looks to have been masked by other
> shortcomings prior to 4.13. Therefore this should be considered at
> least a perceived regression, even if it may not strictly be one.

Assuming an Ack: in a timely manner:

Release-acked-by: Juergen Gross

Juergen
Re: [Xen-devel] [PATCH] x86/IRQ: make internally used IRQs also honor the pending EOI stack
On 28.11.2019 12:03, Jan Beulich wrote:
> At the time the pending EOI stack was introduced there were no
> internally used IRQs which would have the LAPIC EOI issued from the
> ->end() hook. This had then changed with the introduction of IOMMUs,
> but the interaction issue was presumably masked by
> irq_guest_eoi_timer_fn() frequently EOI-ing interrupts way too early
> (which got fixed by 359cf6f8a0ec ["x86/IRQ: don't keep EOI timer
> running without need"]).
>
> The problem is that with us re-enabling interrupts across handler
> invocation, a higher priority (guest) interrupt may trigger while
> handling a lower priority (internal) one. The EOI issued from
> ->end() (for ACKTYPE_EOI kind interrupts) would then mistakenly
> EOI the higher priority (guest) interrupt, breaking (among other
> things) pending EOI stack logic's assumptions.
>
> Notes:
>
> - In principle we could get away without the check_eoi_deferral flag.
>   I've introduced it just to make sure there's as little change as
>   possible to unaffected paths.
> - Similarly the cpu_has_pending_apic_eoi() check in do_IRQ() isn't
>   strictly necessary.
> - The new function's name isn't very helpful with its use in
>   end_level_ioapic_irq_new(). I did also consider eoi_APIC_irq() (to
>   parallel ack_APIC_irq()), but then liked this even less.
> - Other than I first thought serial console IRQs shouldn't be
>   affected, as we run them on specifically allocated high priority
>   vectors.
>
> Reported-by: Igor Druzhinin
> Diagnosed-by: Andrew Cooper
> Signed-off-by: Jan Beulich

In case it's not explicit enough from the description: While the issue
appears to be long standing, it looks to have been masked by other
shortcomings prior to 4.13. Therefore this should be considered at
least a perceived regression, even if it may not strictly be one.

Jan
[Xen-devel] [PATCH] x86/IRQ: make internally used IRQs also honor the pending EOI stack
At the time the pending EOI stack was introduced there were no
internally used IRQs which would have the LAPIC EOI issued from the
->end() hook. This had then changed with the introduction of IOMMUs,
but the interaction issue was presumably masked by
irq_guest_eoi_timer_fn() frequently EOI-ing interrupts way too early
(which got fixed by 359cf6f8a0ec ["x86/IRQ: don't keep EOI timer
running without need"]).

The problem is that with us re-enabling interrupts across handler
invocation, a higher priority (guest) interrupt may trigger while
handling a lower priority (internal) one. The EOI issued from
->end() (for ACKTYPE_EOI kind interrupts) would then mistakenly
EOI the higher priority (guest) interrupt, breaking (among other
things) pending EOI stack logic's assumptions.

Notes:

- In principle we could get away without the check_eoi_deferral flag.
  I've introduced it just to make sure there's as little change as
  possible to unaffected paths.
- Similarly the cpu_has_pending_apic_eoi() check in do_IRQ() isn't
  strictly necessary.
- The new function's name isn't very helpful with its use in
  end_level_ioapic_irq_new(). I did also consider eoi_APIC_irq() (to
  parallel ack_APIC_irq()), but then liked this even less.
- Other than I first thought serial console IRQs shouldn't be
  affected, as we run them on specifically allocated high priority
  vectors.
Reported-by: Igor Druzhinin
Diagnosed-by: Andrew Cooper
Signed-off-by: Jan Beulich

--- a/xen/arch/x86/io_apic.c
+++ b/xen/arch/x86/io_apic.c
@@ -1734,7 +1734,7 @@ static void end_level_ioapic_irq_new(str
         v = apic_read(APIC_TMR + ((i & ~0x1f) >> 1));

-        ack_APIC_irq();
+        end_nonmaskable_irq(desc, vector);

     if ( (desc->status & IRQ_MOVE_PENDING) &&
          !io_apic_level_ack_pending(desc->irq) )
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -438,6 +438,7 @@ int __init init_irq_data(void)
 }

 static void __do_IRQ_guest(int vector);
+static void flush_ready_eoi(void);

 static void ack_none(struct irq_desc *desc)
 {
@@ -865,6 +866,7 @@ void pirq_set_affinity(struct domain *d,
 }

 DEFINE_PER_CPU(unsigned int, irq_count);
+static DEFINE_PER_CPU(bool, check_eoi_deferral);

 uint8_t alloc_hipriority_vector(void)
 {
@@ -1008,7 +1010,25 @@ void do_IRQ(struct cpu_user_regs *regs)

  out:
     if ( desc->handler->end )
+    {
+        /*
+         * If higher priority vectors still have their EOIs pending, we may
+         * not issue an EOI here, as this would EOI the highest priority one.
+         */
+        if ( cpu_has_pending_apic_eoi() )
+        {
+            this_cpu(check_eoi_deferral) = true;
+            desc->handler->end(desc, vector);
+            this_cpu(check_eoi_deferral) = false;
+
+            spin_unlock(&desc->lock);
+            flush_ready_eoi();
+            goto out_no_unlock;
+        }
+
         desc->handler->end(desc, vector);
+    }
+
  out_no_end:
     spin_unlock(&desc->lock);
  out_no_unlock:
@@ -1163,6 +1183,29 @@ bool cpu_has_pending_apic_eoi(void)
     return pending_eoi_sp(this_cpu(pending_eoi)) != 0;
 }

+void end_nonmaskable_irq(struct irq_desc *desc, uint8_t vector)
+{
+    struct pending_eoi *peoi = this_cpu(pending_eoi);
+    unsigned int sp = pending_eoi_sp(peoi);
+
+    if ( !this_cpu(check_eoi_deferral) || !sp || peoi[sp - 1].vector < vector )
+    {
+        ack_APIC_irq();
+        return;
+    }
+
+    /* Defer this vector's EOI until all higher ones have been EOI-ed. */
+    pending_eoi_sp(peoi) = sp + 1;
+    do {
+        peoi[sp] = peoi[sp - 1];
+    } while ( --sp && peoi[sp - 1].vector > vector );
+    ASSERT(!sp || peoi[sp - 1].vector < vector);
+
+    peoi[sp].irq = desc->irq;
+    peoi[sp].vector = vector;
+    peoi[sp].ready = 1;
+}
+
 static inline void set_pirq_eoi(struct domain *d, unsigned int irq)
 {
     if ( d->arch.pirq_eoi_map )
--- a/xen/arch/x86/msi.c
+++ b/xen/arch/x86/msi.c
@@ -512,11 +512,6 @@ static void ack_maskable_msi_irq(struct
     ack_APIC_irq(); /* ACKTYPE_NONE */
 }

-void end_nonmaskable_msi_irq(struct irq_desc *desc, u8 vector)
-{
-    ack_APIC_irq(); /* ACKTYPE_EOI */
-}
-
 /*
  * IRQ chip for MSI PCI/PCI-X/PCI-Express devices,
  * which implement the MSI or MSI-X capability structure.
@@ -539,7 +534,7 @@ static hw_irq_controller pci_msi_nonmask
     .enable = irq_enable_none,
     .disable = irq_disable_none,
     .ack = ack_nonmaskable_msi_irq,
-    .end = end_nonmaskable_msi_irq,
+    .end = end_nonmaskable_irq,
     .set_affinity = set_msi_affinity
 };
--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -427,7 +427,7 @@ static unsigned int iommu_msi_startup(st
 static void iommu_msi_end(struct irq_desc *desc, u8 vector)
 {
     iommu_msi_unmask(desc);
-    ack_APIC_irq();
+    end_nonmaskable_irq(desc, vector);
 }

@@ -460,7 +460,7 @@ static void iommu_maskable_msi_shutdown(
  * maskable flavors here, as we want the ACK to be issued in ->end().
  */
 #define