Re: [Xen-devel] [PATCH v2 1/4] x86/MSI-X: be more careful during teardown
On Mon, 2015-04-13 at 10:11 +0100, Jan Beulich wrote: On 02.04.15 at 18:49, stefano.stabell...@eu.citrix.com wrote: In any case we should make it clear somewhere who is supposed to write to the command register (and other PCI reigsters) at any given time, otherwise it would be very easy for a new kernel update to break the hypervisor and we wouldn't even know why it happened. We should, but at this point in time this is going to be rather problematic. Such a separation of responsibilities should have been done before all the pass-through code got written. Wasn't Stefano just asking to write down the current semantics, problems and all? I think that would be a wonderful idea even if for some parts the documentation is things like: * we don't really know; * there are multiple parties X, Y and Z involved and it's a mess; * currently qemu, but really ought to be Xen; * etc. I think half the problem with coming up with a plan to move forward is that exactly where we are now isn't very clear, so having a thing to refer to there might be helpful above and beyond it being a good idea to write these things down on principal. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 1/4] x86/MSI-X: be more careful during teardown
On 02.04.15 at 18:49, stefano.stabell...@eu.citrix.com wrote: On Wed, 25 Mar 2015, Jan Beulich wrote: When a device gets detached from a guest, pciback will clear its command register, thus disabling both memory and I/O decoding. The disabled memory decoding, however, has an effect on the MSI-X table accesses the hypervisor does: These won't have the intended effect anymore. Even worse, for PCIe devices (but not SR-IOV virtual functions) such accesses may (will?) be treated as Unsupported Requests, causing respective errors to be surfaced, potentially in the form of NMIs that may be fatal to the hypervisor or Dom0 is different ways. Hence rather than carrying out these accesses, we should avoid them where we can, and use alternative (e.g. PCI config space based) mechanisms to achieve at least the same effect. I don't think that it is a good idea for both Xen and Linux to access the command register simultaneously. Working around Linux in Xen doesn't sound like an optimal solution. Maybe we could just fix the pciback and that would be enough. I'm afraid that would just eliminate the specific case, but not the general issue. While we trust Dom0 to not do outright bad things, the hypervisor should still avoid doing things that can go wrong due to the state a device is put (or left) in by Dom0. In any case we should make it clear somewhere who is supposed to write to the command register (and other PCI reigsters) at any given time, otherwise it would be very easy for a new kernel update to break the hypervisor and we wouldn't even know why it happened. We should, but at this point in time this is going to be rather problematic. Such a separation of responsibilities should have been done before all the pass-through code got written. @@ -369,24 +401,52 @@ static void msi_set_mask_bit(struct irq_ } break; case PCI_CAP_ID_MSIX: -{ -int offset = PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET; -writel(flag, entry-mask_base + offset); -readl(entry-mask_base + offset); -break; -} +if ( likely(memory_decoded(pdev)) ) +{ +writel(flag, entry-mask_base + PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET); +readl(entry-mask_base + PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET); +break; +} +if ( flag ) +{ +u16 control; +domid_t domid = pdev-domain-domain_id; + +control = pci_conf_read16(seg, bus, slot, func, + msix_control_reg(entry-msi_attrib.pos)); +if ( control PCI_MSIX_FLAGS_MASKALL ) +break; +pci_conf_write16(seg, bus, slot, func, + msix_control_reg(entry-msi_attrib.pos), + control | PCI_MSIX_FLAGS_MASKALL); How is that going to interact with Linux (Dom0) writing to the command register? Moreover QEMU writes to the PCI_MSIX_FLAGS_MASKALL bit for devices assigned to HVM guests. Could this cause any conflicts? I don't see the relation to the comment register here - all quoted code only deals with the MSI-X control register. One might argue that QEMU should not touch PCI_MSIX_FLAGS_MASKALL, but as a matter of fact QEMU has done that for years and we cannot break that behaviour without introducing regressions. In fact as it stands QEMU is the owner of PCI_MSIX_FLAGS_MASKALL for devices assigned to HVM guests, not Xen. Can we avoid messing with PCI_MSIX_FLAGS_MASKALL in Xen for passed through devices to running domains? I think that might be a good enough separation of responsibilities between Xen and QEMU. No, the bits has to be under hypervisor control (just like any other hardware bits controlling interrupts). Dealing with this is the subject of patches I have ready, but didn't post yet. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 1/4] x86/MSI-X: be more careful during teardown
On 13.04.15 at 12:50, stefano.stabell...@eu.citrix.com wrote: On Mon, 13 Apr 2015, Jan Beulich wrote: On 02.04.15 at 18:49, stefano.stabell...@eu.citrix.com wrote: On Wed, 25 Mar 2015, Jan Beulich wrote: When a device gets detached from a guest, pciback will clear its command register, thus disabling both memory and I/O decoding. The disabled memory decoding, however, has an effect on the MSI-X table accesses the hypervisor does: These won't have the intended effect anymore. Even worse, for PCIe devices (but not SR-IOV virtual functions) such accesses may (will?) be treated as Unsupported Requests, causing respective errors to be surfaced, potentially in the form of NMIs that may be fatal to the hypervisor or Dom0 is different ways. Hence rather than carrying out these accesses, we should avoid them where we can, and use alternative (e.g. PCI config space based) mechanisms to achieve at least the same effect. I don't think that it is a good idea for both Xen and Linux to access the command register simultaneously. Working around Linux in Xen doesn't sound like an optimal solution. Maybe we could just fix the pciback and that would be enough. I'm afraid that would just eliminate the specific case, but not the general issue. If we trust Dom0 to do the right thing, then I don't think there is a general issue to be solved. Dom0 can break the system at any time, I don't see any differences here, unless we have a plan to actually be able to handle a misbehaving dom0, in that case I am all for it. No, that gets us in the wrong direction. Dom0 can have legitimate reasons to have to clear memory or I/O decoding on a device at run time (even if current Linux doesn't do so). The more general problem we may need to solve is that of racing config space accesses (one by Dom0, the other by the hypervisor). But that's beyond this series' scope. While we trust Dom0 to not do outright bad things, the hypervisor should still avoid doing things that can go wrong due to the state a device is put (or left) in by Dom0. Xen should also avoid doing things that can go wrong because of the state a device is put in by QEMU or other components in the system. There isn't much room for Xen to play with. Qemu is either part of Dom0, or doesn't play with devices directly. And how are we going to deal with older unfixed QEMUs? So far we have been using the same policy for QEMU and the Dom0 kernel: Xen doesn't break them -- old Linux kernels and QEMUs are supposed to just work. I'm not sure that's really true for qemu, or if it is, then only by pure luck: The tool stack interface of the hypervisor as well as the libxc interfaces are subject to change between any two releases. I view it as unavoidable to break older qemu here. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 1/4] x86/MSI-X: be more careful during teardown
On Mon, 13 Apr 2015, Jan Beulich wrote: On 02.04.15 at 18:49, stefano.stabell...@eu.citrix.com wrote: On Wed, 25 Mar 2015, Jan Beulich wrote: When a device gets detached from a guest, pciback will clear its command register, thus disabling both memory and I/O decoding. The disabled memory decoding, however, has an effect on the MSI-X table accesses the hypervisor does: These won't have the intended effect anymore. Even worse, for PCIe devices (but not SR-IOV virtual functions) such accesses may (will?) be treated as Unsupported Requests, causing respective errors to be surfaced, potentially in the form of NMIs that may be fatal to the hypervisor or Dom0 is different ways. Hence rather than carrying out these accesses, we should avoid them where we can, and use alternative (e.g. PCI config space based) mechanisms to achieve at least the same effect. I don't think that it is a good idea for both Xen and Linux to access the command register simultaneously. Working around Linux in Xen doesn't sound like an optimal solution. Maybe we could just fix the pciback and that would be enough. I'm afraid that would just eliminate the specific case, but not the general issue. If we trust Dom0 to do the right thing, then I don't think there is a general issue to be solved. Dom0 can break the system at any time, I don't see any differences here, unless we have a plan to actually be able to handle a misbehaving dom0, in that case I am all for it. While we trust Dom0 to not do outright bad things, the hypervisor should still avoid doing things that can go wrong due to the state a device is put (or left) in by Dom0. Xen should also avoid doing things that can go wrong because of the state a device is put in by QEMU or other components in the system. There isn't much room for Xen to play with. In any case we should make it clear somewhere who is supposed to write to the command register (and other PCI reigsters) at any given time, otherwise it would be very easy for a new kernel update to break the hypervisor and we wouldn't even know why it happened. We should, but at this point in time this is going to be rather problematic. Such a separation of responsibilities should have been done before all the pass-through code got written. That is true. @@ -369,24 +401,52 @@ static void msi_set_mask_bit(struct irq_ } break; case PCI_CAP_ID_MSIX: -{ -int offset = PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET; -writel(flag, entry-mask_base + offset); -readl(entry-mask_base + offset); -break; -} +if ( likely(memory_decoded(pdev)) ) +{ +writel(flag, entry-mask_base + PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET); +readl(entry-mask_base + PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET); +break; +} +if ( flag ) +{ +u16 control; +domid_t domid = pdev-domain-domain_id; + +control = pci_conf_read16(seg, bus, slot, func, + msix_control_reg(entry-msi_attrib.pos)); +if ( control PCI_MSIX_FLAGS_MASKALL ) +break; +pci_conf_write16(seg, bus, slot, func, + msix_control_reg(entry-msi_attrib.pos), + control | PCI_MSIX_FLAGS_MASKALL); How is that going to interact with Linux (Dom0) writing to the command register? Moreover QEMU writes to the PCI_MSIX_FLAGS_MASKALL bit for devices assigned to HVM guests. Could this cause any conflicts? I don't see the relation to the comment register here - all quoted code only deals with the MSI-X control register. I meant control register, sorry for the confusion. One might argue that QEMU should not touch PCI_MSIX_FLAGS_MASKALL, but as a matter of fact QEMU has done that for years and we cannot break that behaviour without introducing regressions. In fact as it stands QEMU is the owner of PCI_MSIX_FLAGS_MASKALL for devices assigned to HVM guests, not Xen. Can we avoid messing with PCI_MSIX_FLAGS_MASKALL in Xen for passed through devices to running domains? I think that might be a good enough separation of responsibilities between Xen and QEMU. No, the bits has to be under hypervisor control (just like any other hardware bits controlling interrupts). Maybe they should be under hypervisor control, but keep in mind that they are not and they haven't been for years now. We cannot go ahead and break existing code paths and use cases. As it stands, QEMU is in charge of PCI_MSIX_FLAGS_MASKALL for passthrough devices and this is a regression. Dealing with this is the subject of patches I have ready, but didn't post yet. OK, that is very good. But if we really want to make this change, then I think that those patches would need to be part of this series so that they can be committed
Re: [Xen-devel] [PATCH v2 1/4] x86/MSI-X: be more careful during teardown
On Mon, 13 Apr 2015, Jan Beulich wrote: On 13.04.15 at 12:50, stefano.stabell...@eu.citrix.com wrote: On Mon, 13 Apr 2015, Jan Beulich wrote: On 02.04.15 at 18:49, stefano.stabell...@eu.citrix.com wrote: On Wed, 25 Mar 2015, Jan Beulich wrote: When a device gets detached from a guest, pciback will clear its command register, thus disabling both memory and I/O decoding. The disabled memory decoding, however, has an effect on the MSI-X table accesses the hypervisor does: These won't have the intended effect anymore. Even worse, for PCIe devices (but not SR-IOV virtual functions) such accesses may (will?) be treated as Unsupported Requests, causing respective errors to be surfaced, potentially in the form of NMIs that may be fatal to the hypervisor or Dom0 is different ways. Hence rather than carrying out these accesses, we should avoid them where we can, and use alternative (e.g. PCI config space based) mechanisms to achieve at least the same effect. I don't think that it is a good idea for both Xen and Linux to access the command register simultaneously. Working around Linux in Xen doesn't sound like an optimal solution. Maybe we could just fix the pciback and that would be enough. I'm afraid that would just eliminate the specific case, but not the general issue. If we trust Dom0 to do the right thing, then I don't think there is a general issue to be solved. Dom0 can break the system at any time, I don't see any differences here, unless we have a plan to actually be able to handle a misbehaving dom0, in that case I am all for it. No, that gets us in the wrong direction. Dom0 can have legitimate reasons to have to clear memory or I/O decoding on a device at run time (even if current Linux doesn't do so). The more general problem we may need to solve is that of racing config space accesses (one by Dom0, the other by the hypervisor). But that's beyond this series' scope. This is why I was asking for a document that describes who is in charge of what and when. I don't think we can move forward without it. While we trust Dom0 to not do outright bad things, the hypervisor should still avoid doing things that can go wrong due to the state a device is put (or left) in by Dom0. Xen should also avoid doing things that can go wrong because of the state a device is put in by QEMU or other components in the system. There isn't much room for Xen to play with. Qemu is either part of Dom0, or doesn't play with devices directly. I don't understand the point you are trying to make here. If your intention is to point out that QEMU shouldn't be writing to the control register, as I wrote earlier, the current codebase disagrees with you and has been that way for years. And how are we going to deal with older unfixed QEMUs? So far we have been using the same policy for QEMU and the Dom0 kernel: Xen doesn't break them -- old Linux kernels and QEMUs are supposed to just work. I'm not sure that's really true for qemu, or if it is, then only by pure luck: This is false, it is so by design. QEMU has an internal libxc compatibility layer. The tool stack interface of the hypervisor as well as the libxc interfaces are subject to change between any two releases. QEMU knows how to cope with libxc interface changes. I view it as unavoidable to break older qemu here. I disagree and I am opposed to any patches series that would deliberately break QEMU. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 1/4] x86/MSI-X: be more careful during teardown
On 13.04.15 at 14:01, stefano.stabell...@eu.citrix.com wrote: On Mon, 13 Apr 2015, Jan Beulich wrote: On 13.04.15 at 12:50, stefano.stabell...@eu.citrix.com wrote: On Mon, 13 Apr 2015, Jan Beulich wrote: While we trust Dom0 to not do outright bad things, the hypervisor should still avoid doing things that can go wrong due to the state a device is put (or left) in by Dom0. Xen should also avoid doing things that can go wrong because of the state a device is put in by QEMU or other components in the system. There isn't much room for Xen to play with. Qemu is either part of Dom0, or doesn't play with devices directly. I don't understand the point you are trying to make here. If your intention is to point out that QEMU shouldn't be writing to the control register, as I wrote earlier, the current codebase disagrees with you and has been that way for years. Yes, this was precisely my intention. Qemu having done a certain thing for many years doesn't mean this is correct, and shouldn't be fixed if broken. And how are we going to deal with older unfixed QEMUs? So far we have been using the same policy for QEMU and the Dom0 kernel: Xen doesn't break them -- old Linux kernels and QEMUs are supposed to just work. I'm not sure that's really true for qemu, or if it is, then only by pure luck: This is false, it is so by design. QEMU has an internal libxc compatibility layer. Which could necessarily only ever be updated after the fact (of a libxc change) and only ever in maintained branches. I.e. as widely or as narrow as fixing the issues here would require. I view it as unavoidable to break older qemu here. I disagree and I am opposed to any patches series that would deliberately break QEMU. If that was without also adjusting qemu I would understand you. Considering that the issues here haven't been brought up just yesterday, the lack of any substantial counter proposal on how to fix this is signaling to me that there are little alternatives. Or do you have any going beyond do it another way? Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 1/4] x86/MSI-X: be more careful during teardown
On Mon, 13 Apr 2015, Jan Beulich wrote: On 13.04.15 at 14:01, stefano.stabell...@eu.citrix.com wrote: On Mon, 13 Apr 2015, Jan Beulich wrote: On 13.04.15 at 12:50, stefano.stabell...@eu.citrix.com wrote: On Mon, 13 Apr 2015, Jan Beulich wrote: While we trust Dom0 to not do outright bad things, the hypervisor should still avoid doing things that can go wrong due to the state a device is put (or left) in by Dom0. Xen should also avoid doing things that can go wrong because of the state a device is put in by QEMU or other components in the system. There isn't much room for Xen to play with. Qemu is either part of Dom0, or doesn't play with devices directly. I don't understand the point you are trying to make here. If your intention is to point out that QEMU shouldn't be writing to the control register, as I wrote earlier, the current codebase disagrees with you and has been that way for years. Yes, this was precisely my intention. Qemu having done a certain thing for many years doesn't mean this is correct, and shouldn't be fixed if broken. Sure, as long as the fix doesn't cause regressions. Do you have a specific example of how QEMU's behaviour is broken and this patch is supposed to correct it? I thought you were trying to work-around a dom0 kernel issue with this patch, not a QEMU bug. Working-around a Dom0 kernel bug by creating a QEMU bug is not a great move. And how are we going to deal with older unfixed QEMUs? So far we have been using the same policy for QEMU and the Dom0 kernel: Xen doesn't break them -- old Linux kernels and QEMUs are supposed to just work. I'm not sure that's really true for qemu, or if it is, then only by pure luck: This is false, it is so by design. QEMU has an internal libxc compatibility layer. Which could necessarily only ever be updated after the fact (of a libxc change) and only ever in maintained branches. I.e. as widely or as narrow as fixing the issues here would require. True, in fact we used it with older Xen releases, when upstream QEMU wasn't the default device model yet. Nothing from Xen 4.2 onward has required a change in QEMU and its compatibility shim and I would like to keep it that way (note that Paul's ioreq server changes are not required to build older QEMU versions against new Xen versions), unless we have a very compelling reason. I view it as unavoidable to break older qemu here. I disagree and I am opposed to any patches series that would deliberately break QEMU. If that was without also adjusting qemu I would understand you. You say that, but the initial submission only included the Xen hypervisor changes. I am not completely against stopping QEMU from writing to the control register, but we cannot go ahead and commit patches to Xen that break QEMU compatibility willy nilly. At least we need to: - review the QEMU and Xen patches together - commit the Xen patches only after the QEMU patches have been reviewed and acked - we should consider making the QEMU changes acceptable for QEMU stable trees and mark them for backport - we need to write down which is the minimal version of QEMU that can work for PCI passthrough Considering that the issues here haven't been brought up just yesterday, the lack of any substantial counter proposal on how to fix this is signaling to me that there are little alternatives. Or do you have any going beyond do it another way? TBH in case of this issue, I think that having the Dom0 kernel do the right thing is good enough. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 1/4] x86/MSI-X: be more careful during teardown
On Wed, 25 Mar 2015, Jan Beulich wrote: When a device gets detached from a guest, pciback will clear its command register, thus disabling both memory and I/O decoding. The disabled memory decoding, however, has an effect on the MSI-X table accesses the hypervisor does: These won't have the intended effect anymore. Even worse, for PCIe devices (but not SR-IOV virtual functions) such accesses may (will?) be treated as Unsupported Requests, causing respective errors to be surfaced, potentially in the form of NMIs that may be fatal to the hypervisor or Dom0 is different ways. Hence rather than carrying out these accesses, we should avoid them where we can, and use alternative (e.g. PCI config space based) mechanisms to achieve at least the same effect. I don't think that it is a good idea for both Xen and Linux to access the command register simultaneously. Working around Linux in Xen doesn't sound like an optimal solution. Maybe we could just fix the pciback and that would be enough. In any case we should make it clear somewhere who is supposed to write to the command register (and other PCI reigsters) at any given time, otherwise it would be very easy for a new kernel update to break the hypervisor and we wouldn't even know why it happened. @@ -369,24 +401,52 @@ static void msi_set_mask_bit(struct irq_ } break; case PCI_CAP_ID_MSIX: -{ -int offset = PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET; -writel(flag, entry-mask_base + offset); -readl(entry-mask_base + offset); -break; -} +if ( likely(memory_decoded(pdev)) ) +{ +writel(flag, entry-mask_base + PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET); +readl(entry-mask_base + PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET); +break; +} +if ( flag ) +{ +u16 control; +domid_t domid = pdev-domain-domain_id; + +control = pci_conf_read16(seg, bus, slot, func, + msix_control_reg(entry-msi_attrib.pos)); +if ( control PCI_MSIX_FLAGS_MASKALL ) +break; +pci_conf_write16(seg, bus, slot, func, + msix_control_reg(entry-msi_attrib.pos), + control | PCI_MSIX_FLAGS_MASKALL); How is that going to interact with Linux (Dom0) writing to the command register? Moreover QEMU writes to the PCI_MSIX_FLAGS_MASKALL bit for devices assigned to HVM guests. Could this cause any conflicts? One might argue that QEMU should not touch PCI_MSIX_FLAGS_MASKALL, but as a matter of fact QEMU has done that for years and we cannot break that behaviour without introducing regressions. In fact as it stands QEMU is the owner of PCI_MSIX_FLAGS_MASKALL for devices assigned to HVM guests, not Xen. Can we avoid messing with PCI_MSIX_FLAGS_MASKALL in Xen for passed through devices to running domains? I think that might be a good enough separation of responsibilities between Xen and QEMU. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 1/4] x86/MSI-X: be more careful during teardown
On 25/03/15 16:39, Jan Beulich wrote: When a device gets detached from a guest, pciback will clear its command register, thus disabling both memory and I/O decoding. The disabled memory decoding, however, has an effect on the MSI-X table accesses the hypervisor does: These won't have the intended effect anymore. Even worse, for PCIe devices (but not SR-IOV virtual functions) such accesses may (will?) be treated as Unsupported Requests, causing respective errors to be surfaced, potentially in the form of NMIs that may be fatal to the hypervisor or Dom0 is different ways. Hence rather than carrying out these accesses, we should avoid them where we can, and use alternative (e.g. PCI config space based) mechanisms to achieve at least the same effect. Signed-off-by: Jan Beulich jbeul...@suse.com Reviewed-by: Andrew Cooper andrew.coop...@citrix.com ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v2 1/4] x86/MSI-X: be more careful during teardown
When a device gets detached from a guest, pciback will clear its command register, thus disabling both memory and I/O decoding. The disabled memory decoding, however, has an effect on the MSI-X table accesses the hypervisor does: These won't have the intended effect anymore. Even worse, for PCIe devices (but not SR-IOV virtual functions) such accesses may (will?) be treated as Unsupported Requests, causing respective errors to be surfaced, potentially in the form of NMIs that may be fatal to the hypervisor or Dom0 is different ways. Hence rather than carrying out these accesses, we should avoid them where we can, and use alternative (e.g. PCI config space based) mechanisms to achieve at least the same effect. Signed-off-by: Jan Beulich jbeul...@suse.com --- v2: mask_msi_irq()'s BUG() needs to be conditional upon IRQ_DISABLED not already being set, as -shutdown() may be called on error paths when the IRQ never got enabled. This in turn required the changes to irq.c. Backporting note (largely to myself): Depends on (not yet backported) commit 061eebe0e x86/MSI: drop workaround for insecure Dom0 kernels (due to re-use of struct arch_msix's warned field). --- a/xen/arch/x86/irq.c +++ b/xen/arch/x86/irq.c @@ -217,9 +217,9 @@ void destroy_irq(unsigned int irq) } spin_lock_irqsave(desc-lock, flags); -desc-status |= IRQ_DISABLED; desc-status = ~IRQ_GUEST; desc-handler-shutdown(desc); +desc-status |= IRQ_DISABLED; action = desc-action; desc-action = NULL; desc-msi_desc = NULL; @@ -995,8 +995,8 @@ void __init release_irq(unsigned int irq spin_lock_irqsave(desc-lock,flags); action = desc-action; desc-action = NULL; -desc-status |= IRQ_DISABLED; desc-handler-shutdown(desc); +desc-status |= IRQ_DISABLED; spin_unlock_irqrestore(desc-lock,flags); /* Wait to make sure it's not being used on another CPU */ @@ -1725,8 +1725,8 @@ static irq_guest_action_t *__pirq_guest_ BUG_ON(action-in_flight != 0); /* Disabling IRQ before releasing the desc_lock avoids an IRQ storm. */ -desc-status |= IRQ_DISABLED; desc-handler-disable(desc); +desc-status |= IRQ_DISABLED; /* * Mark any remaining pending EOIs as ready to flush. --- a/xen/arch/x86/msi.c +++ b/xen/arch/x86/msi.c @@ -121,6 +121,27 @@ static void msix_put_fixmap(struct arch_ spin_unlock(msix-table_lock); } +static bool_t memory_decoded(const struct pci_dev *dev) +{ +u8 bus, slot, func; + +if ( !dev-info.is_virtfn ) +{ +bus = dev-bus; +slot = PCI_SLOT(dev-devfn); +func = PCI_FUNC(dev-devfn); +} +else +{ +bus = dev-info.physfn.bus; +slot = PCI_SLOT(dev-info.physfn.devfn); +func = PCI_FUNC(dev-info.physfn.devfn); +} + +return !!(pci_conf_read16(dev-seg, bus, slot, func, PCI_COMMAND) + PCI_COMMAND_MEMORY); +} + /* * MSI message composition */ @@ -162,7 +183,7 @@ void msi_compose_msg(unsigned vector, co } } -static void read_msi_msg(struct msi_desc *entry, struct msi_msg *msg) +static bool_t read_msi_msg(struct msi_desc *entry, struct msi_msg *msg) { switch ( entry-msi_attrib.type ) { @@ -198,6 +219,8 @@ static void read_msi_msg(struct msi_desc void __iomem *base; base = entry-mask_base; +if ( unlikely(!memory_decoded(entry-dev)) ) +return 0; msg-address_lo = readl(base + PCI_MSIX_ENTRY_LOWER_ADDR_OFFSET); msg-address_hi = readl(base + PCI_MSIX_ENTRY_UPPER_ADDR_OFFSET); msg-data = readl(base + PCI_MSIX_ENTRY_DATA_OFFSET); @@ -209,6 +232,8 @@ static void read_msi_msg(struct msi_desc if ( iommu_intremap ) iommu_read_msi_from_ire(entry, msg); + +return 1; } static int write_msi_msg(struct msi_desc *entry, struct msi_msg *msg) @@ -260,6 +285,8 @@ static int write_msi_msg(struct msi_desc void __iomem *base; base = entry-mask_base; +if ( unlikely(!memory_decoded(entry-dev)) ) +return -ENXIO; writel(msg-address_lo, base + PCI_MSIX_ENTRY_LOWER_ADDR_OFFSET); writel(msg-address_hi, @@ -287,7 +314,8 @@ void set_msi_affinity(struct irq_desc *d ASSERT(spin_is_locked(desc-lock)); memset(msg, 0, sizeof(msg)); -read_msi_msg(msi_desc, msg); +if ( !read_msi_msg(msi_desc, msg) ) +return; msg.data = ~MSI_DATA_VECTOR_MASK; msg.data |= MSI_DATA_VECTOR(desc-arch.vector); @@ -347,20 +375,24 @@ int msi_maskable_irq(const struct msi_de || entry-msi_attrib.maskbit; } -static void msi_set_mask_bit(struct irq_desc *desc, int flag) +static bool_t msi_set_mask_bit(struct irq_desc *desc, int flag) { struct msi_desc *entry = desc-msi_desc; +struct pci_dev *pdev; +u16 seg; +u8 bus, slot, func; ASSERT(spin_is_locked(desc-lock)); BUG_ON(!entry || !entry-dev); +pdev = entry-dev; +seg =