Re: [Xen-devel] [PATCH v2 1/4] x86/MSI-X: be more careful during teardown

2015-04-14 Thread Ian Campbell
On Mon, 2015-04-13 at 10:11 +0100, Jan Beulich wrote:
  On 02.04.15 at 18:49, stefano.stabell...@eu.citrix.com wrote:
  In any case we should make it clear somewhere who is supposed to write
  to the command register (and other PCI reigsters) at any given time,
  otherwise it would be very easy for a new kernel update to break the
  hypervisor and we wouldn't even know why it happened.
 
 We should, but at this point in time this is going to be rather
 problematic. Such a separation of responsibilities should have been
 done before all the pass-through code got written.

Wasn't Stefano just asking to write down the current semantics, problems
and all?

I think that would be a wonderful idea even if for some parts the
documentation is things like:
  * we don't really know;
  * there are multiple parties X, Y and Z involved and it's a
mess;
  * currently qemu, but really ought to be Xen;
  * etc.

I think half the problem with coming up with a plan to move forward is
that exactly where we are now isn't very clear, so having a thing to
refer to there might be helpful above and beyond it being a good idea to
write these things down on principal.

Ian.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 1/4] x86/MSI-X: be more careful during teardown

2015-04-13 Thread Jan Beulich
 On 02.04.15 at 18:49, stefano.stabell...@eu.citrix.com wrote:
 On Wed, 25 Mar 2015, Jan Beulich wrote:
 When a device gets detached from a guest, pciback will clear its
 command register, thus disabling both memory and I/O decoding. The
 disabled memory decoding, however, has an effect on the MSI-X table
 accesses the hypervisor does: These won't have the intended effect
 anymore. Even worse, for PCIe devices (but not SR-IOV virtual
 functions) such accesses may (will?) be treated as Unsupported
 Requests, causing respective errors to be surfaced, potentially in the
 form of NMIs that may be fatal to the hypervisor or Dom0 is different
 ways. Hence rather than carrying out these accesses, we should avoid
 them where we can, and use alternative (e.g. PCI config space based)
 mechanisms to achieve at least the same effect.
 
 I don't think that it is a good idea for both Xen and Linux to access
 the command register simultaneously.  Working around Linux in Xen
 doesn't sound like an optimal solution.   Maybe we could just fix the
 pciback and that would be enough.

I'm afraid that would just eliminate the specific case, but not the
general issue. While we trust Dom0 to not do outright bad things,
the hypervisor should still avoid doing things that can go wrong
due to the state a device is put (or left) in by Dom0.

 In any case we should make it clear somewhere who is supposed to write
 to the command register (and other PCI reigsters) at any given time,
 otherwise it would be very easy for a new kernel update to break the
 hypervisor and we wouldn't even know why it happened.

We should, but at this point in time this is going to be rather
problematic. Such a separation of responsibilities should have been
done before all the pass-through code got written.

 @@ -369,24 +401,52 @@ static void msi_set_mask_bit(struct irq_
  }
  break;
  case PCI_CAP_ID_MSIX:
 -{
 -int offset = PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET;
 -writel(flag, entry-mask_base + offset);
 -readl(entry-mask_base + offset);
 -break;
 -}
 +if ( likely(memory_decoded(pdev)) )
 +{
 +writel(flag, entry-mask_base + 
 PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET);
 +readl(entry-mask_base + PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET);
 +break;
 +}
 +if ( flag )
 +{
 +u16 control;
 +domid_t domid = pdev-domain-domain_id;
 +
 +control = pci_conf_read16(seg, bus, slot, func,
 +  
 msix_control_reg(entry-msi_attrib.pos));
 +if ( control  PCI_MSIX_FLAGS_MASKALL )
 +break;
 +pci_conf_write16(seg, bus, slot, func,
 + msix_control_reg(entry-msi_attrib.pos),
 + control | PCI_MSIX_FLAGS_MASKALL);
 
 How is that going to interact with Linux (Dom0) writing to the command
 register? Moreover QEMU writes to the PCI_MSIX_FLAGS_MASKALL bit for
 devices assigned to HVM guests. Could this cause any conflicts?

I don't see the relation to the comment register here - all quoted code
only deals with the MSI-X control register.

 One might argue that QEMU should not touch PCI_MSIX_FLAGS_MASKALL, but
 as a matter of fact QEMU has done that for years and we cannot break
 that behaviour without introducing regressions.  In fact as it stands
 QEMU is the owner of PCI_MSIX_FLAGS_MASKALL for devices assigned to HVM
 guests, not Xen.
 
 Can we avoid messing with PCI_MSIX_FLAGS_MASKALL in Xen for passed
 through devices to running domains?  I think that might be a good enough
 separation of responsibilities between Xen and QEMU.

No, the bits has to be under hypervisor control (just like any other
hardware bits controlling interrupts). Dealing with this is the subject
of patches I have ready, but didn't post yet.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 1/4] x86/MSI-X: be more careful during teardown

2015-04-13 Thread Jan Beulich
 On 13.04.15 at 12:50, stefano.stabell...@eu.citrix.com wrote:
 On Mon, 13 Apr 2015, Jan Beulich wrote:
  On 02.04.15 at 18:49, stefano.stabell...@eu.citrix.com wrote:
  On Wed, 25 Mar 2015, Jan Beulich wrote:
  When a device gets detached from a guest, pciback will clear its
  command register, thus disabling both memory and I/O decoding. The
  disabled memory decoding, however, has an effect on the MSI-X table
  accesses the hypervisor does: These won't have the intended effect
  anymore. Even worse, for PCIe devices (but not SR-IOV virtual
  functions) such accesses may (will?) be treated as Unsupported
  Requests, causing respective errors to be surfaced, potentially in the
  form of NMIs that may be fatal to the hypervisor or Dom0 is different
  ways. Hence rather than carrying out these accesses, we should avoid
  them where we can, and use alternative (e.g. PCI config space based)
  mechanisms to achieve at least the same effect.
  
  I don't think that it is a good idea for both Xen and Linux to access
  the command register simultaneously.  Working around Linux in Xen
  doesn't sound like an optimal solution.   Maybe we could just fix the
  pciback and that would be enough.
 
 I'm afraid that would just eliminate the specific case, but not the
 general issue.
 
 If we trust Dom0 to do the right thing, then I don't think there is a
 general issue to be solved. Dom0 can break the system at any time, I
 don't see any differences here, unless we have a plan to actually be
 able to handle a misbehaving dom0, in that case I am all for it.

No, that gets us in the wrong direction. Dom0 can have legitimate
reasons to have to clear memory or I/O decoding on a device at
run time (even if current Linux doesn't do so). The more general
problem we may need to solve is that of racing config space
accesses (one by Dom0, the other by the hypervisor). But that's
beyond this series' scope.

 While we trust Dom0 to not do outright bad things,
 the hypervisor should still avoid doing things that can go wrong
 due to the state a device is put (or left) in by Dom0.
 
 Xen should also avoid doing things that can go wrong because of the
 state a device is put in by QEMU or other components in the system.
 There isn't much room for Xen to play with.

Qemu is either part of Dom0, or doesn't play with devices directly.

 And how are we going to deal with older unfixed QEMUs?
 So far we have been using the same policy for QEMU and the Dom0 kernel:
 Xen doesn't break them -- old Linux kernels and QEMUs are supposed to
 just work.

I'm not sure that's really true for qemu, or if it is, then only by pure
luck: The tool stack interface of the hypervisor as well as the libxc
interfaces are subject to change between any two releases. I view
it as unavoidable to break older qemu here.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 1/4] x86/MSI-X: be more careful during teardown

2015-04-13 Thread Stefano Stabellini
On Mon, 13 Apr 2015, Jan Beulich wrote:
  On 02.04.15 at 18:49, stefano.stabell...@eu.citrix.com wrote:
  On Wed, 25 Mar 2015, Jan Beulich wrote:
  When a device gets detached from a guest, pciback will clear its
  command register, thus disabling both memory and I/O decoding. The
  disabled memory decoding, however, has an effect on the MSI-X table
  accesses the hypervisor does: These won't have the intended effect
  anymore. Even worse, for PCIe devices (but not SR-IOV virtual
  functions) such accesses may (will?) be treated as Unsupported
  Requests, causing respective errors to be surfaced, potentially in the
  form of NMIs that may be fatal to the hypervisor or Dom0 is different
  ways. Hence rather than carrying out these accesses, we should avoid
  them where we can, and use alternative (e.g. PCI config space based)
  mechanisms to achieve at least the same effect.
  
  I don't think that it is a good idea for both Xen and Linux to access
  the command register simultaneously.  Working around Linux in Xen
  doesn't sound like an optimal solution.   Maybe we could just fix the
  pciback and that would be enough.
 
 I'm afraid that would just eliminate the specific case, but not the
 general issue.

If we trust Dom0 to do the right thing, then I don't think there is a
general issue to be solved. Dom0 can break the system at any time, I
don't see any differences here, unless we have a plan to actually be
able to handle a misbehaving dom0, in that case I am all for it.


 While we trust Dom0 to not do outright bad things,
 the hypervisor should still avoid doing things that can go wrong
 due to the state a device is put (or left) in by Dom0.

Xen should also avoid doing things that can go wrong because of the
state a device is put in by QEMU or other components in the system.
There isn't much room for Xen to play with.


  In any case we should make it clear somewhere who is supposed to write
  to the command register (and other PCI reigsters) at any given time,
  otherwise it would be very easy for a new kernel update to break the
  hypervisor and we wouldn't even know why it happened.
 
 We should, but at this point in time this is going to be rather
 problematic. Such a separation of responsibilities should have been
 done before all the pass-through code got written.

That is true.


  @@ -369,24 +401,52 @@ static void msi_set_mask_bit(struct irq_
   }
   break;
   case PCI_CAP_ID_MSIX:
  -{
  -int offset = PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET;
  -writel(flag, entry-mask_base + offset);
  -readl(entry-mask_base + offset);
  -break;
  -}
  +if ( likely(memory_decoded(pdev)) )
  +{
  +writel(flag, entry-mask_base + 
  PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET);
  +readl(entry-mask_base + PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET);
  +break;
  +}
  +if ( flag )
  +{
  +u16 control;
  +domid_t domid = pdev-domain-domain_id;
  +
  +control = pci_conf_read16(seg, bus, slot, func,
  +  
  msix_control_reg(entry-msi_attrib.pos));
  +if ( control  PCI_MSIX_FLAGS_MASKALL )
  +break;
  +pci_conf_write16(seg, bus, slot, func,
  + msix_control_reg(entry-msi_attrib.pos),
  + control | PCI_MSIX_FLAGS_MASKALL);
  
  How is that going to interact with Linux (Dom0) writing to the command
  register? Moreover QEMU writes to the PCI_MSIX_FLAGS_MASKALL bit for
  devices assigned to HVM guests. Could this cause any conflicts?
 
 I don't see the relation to the comment register here - all quoted code
 only deals with the MSI-X control register.

I meant control register, sorry for the confusion.


  One might argue that QEMU should not touch PCI_MSIX_FLAGS_MASKALL, but
  as a matter of fact QEMU has done that for years and we cannot break
  that behaviour without introducing regressions.  In fact as it stands
  QEMU is the owner of PCI_MSIX_FLAGS_MASKALL for devices assigned to HVM
  guests, not Xen.
  
  Can we avoid messing with PCI_MSIX_FLAGS_MASKALL in Xen for passed
  through devices to running domains?  I think that might be a good enough
  separation of responsibilities between Xen and QEMU.
 
 No, the bits has to be under hypervisor control (just like any other
 hardware bits controlling interrupts).

Maybe they should be under hypervisor control, but keep in mind that
they are not and they haven't been for years now.  We cannot go ahead
and break existing code paths and use cases. As it stands, QEMU is in
charge of PCI_MSIX_FLAGS_MASKALL for passthrough devices and this is a
regression.


 Dealing with this is the subject of patches I have ready, but didn't post yet.

OK, that is very good. But if we really want to make this change,
then I think that those patches would need to be part of this series so
that they can be committed 

Re: [Xen-devel] [PATCH v2 1/4] x86/MSI-X: be more careful during teardown

2015-04-13 Thread Stefano Stabellini
On Mon, 13 Apr 2015, Jan Beulich wrote:
  On 13.04.15 at 12:50, stefano.stabell...@eu.citrix.com wrote:
  On Mon, 13 Apr 2015, Jan Beulich wrote:
   On 02.04.15 at 18:49, stefano.stabell...@eu.citrix.com wrote:
   On Wed, 25 Mar 2015, Jan Beulich wrote:
   When a device gets detached from a guest, pciback will clear its
   command register, thus disabling both memory and I/O decoding. The
   disabled memory decoding, however, has an effect on the MSI-X table
   accesses the hypervisor does: These won't have the intended effect
   anymore. Even worse, for PCIe devices (but not SR-IOV virtual
   functions) such accesses may (will?) be treated as Unsupported
   Requests, causing respective errors to be surfaced, potentially in the
   form of NMIs that may be fatal to the hypervisor or Dom0 is different
   ways. Hence rather than carrying out these accesses, we should avoid
   them where we can, and use alternative (e.g. PCI config space based)
   mechanisms to achieve at least the same effect.
   
   I don't think that it is a good idea for both Xen and Linux to access
   the command register simultaneously.  Working around Linux in Xen
   doesn't sound like an optimal solution.   Maybe we could just fix the
   pciback and that would be enough.
  
  I'm afraid that would just eliminate the specific case, but not the
  general issue.
  
  If we trust Dom0 to do the right thing, then I don't think there is a
  general issue to be solved. Dom0 can break the system at any time, I
  don't see any differences here, unless we have a plan to actually be
  able to handle a misbehaving dom0, in that case I am all for it.
 
 No, that gets us in the wrong direction. Dom0 can have legitimate
 reasons to have to clear memory or I/O decoding on a device at
 run time (even if current Linux doesn't do so). The more general
 problem we may need to solve is that of racing config space
 accesses (one by Dom0, the other by the hypervisor). But that's
 beyond this series' scope.

This is why I was asking for a document that describes who is in charge
of what and when. I don't think we can move forward without it.


  While we trust Dom0 to not do outright bad things,
  the hypervisor should still avoid doing things that can go wrong
  due to the state a device is put (or left) in by Dom0.
  
  Xen should also avoid doing things that can go wrong because of the
  state a device is put in by QEMU or other components in the system.
  There isn't much room for Xen to play with.
 
 Qemu is either part of Dom0, or doesn't play with devices directly.

I don't understand the point you are trying to make here.
If your intention is to point out that QEMU shouldn't be writing to the
control register, as I wrote earlier, the current codebase disagrees
with you and has been that way for years.


  And how are we going to deal with older unfixed QEMUs?
  So far we have been using the same policy for QEMU and the Dom0 kernel:
  Xen doesn't break them -- old Linux kernels and QEMUs are supposed to
  just work.
 
 I'm not sure that's really true for qemu, or if it is, then only by pure
 luck:

This is false, it is so by design.  QEMU has an internal libxc
compatibility layer.


 The tool stack interface of the hypervisor as well as the libxc
 interfaces are subject to change between any two releases.

QEMU knows how to cope with libxc interface changes.


 I view it as unavoidable to break older qemu here.

I disagree and I am opposed to any patches series that would
deliberately break QEMU.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 1/4] x86/MSI-X: be more careful during teardown

2015-04-13 Thread Jan Beulich
 On 13.04.15 at 14:01, stefano.stabell...@eu.citrix.com wrote:
 On Mon, 13 Apr 2015, Jan Beulich wrote:
  On 13.04.15 at 12:50, stefano.stabell...@eu.citrix.com wrote:
  On Mon, 13 Apr 2015, Jan Beulich wrote:
  While we trust Dom0 to not do outright bad things,
  the hypervisor should still avoid doing things that can go wrong
  due to the state a device is put (or left) in by Dom0.
  
  Xen should also avoid doing things that can go wrong because of the
  state a device is put in by QEMU or other components in the system.
  There isn't much room for Xen to play with.
 
 Qemu is either part of Dom0, or doesn't play with devices directly.
 
 I don't understand the point you are trying to make here.
 If your intention is to point out that QEMU shouldn't be writing to the
 control register, as I wrote earlier, the current codebase disagrees
 with you and has been that way for years.

Yes, this was precisely my intention. Qemu having done a certain
thing for many years doesn't mean this is correct, and shouldn't
be fixed if broken.

  And how are we going to deal with older unfixed QEMUs?
  So far we have been using the same policy for QEMU and the Dom0 kernel:
  Xen doesn't break them -- old Linux kernels and QEMUs are supposed to
  just work.
 
 I'm not sure that's really true for qemu, or if it is, then only by pure
 luck:
 
 This is false, it is so by design.  QEMU has an internal libxc
 compatibility layer.

Which could necessarily only ever be updated after the fact (of
a libxc change) and only ever in maintained branches. I.e. as
widely or as narrow as fixing the issues here would require.

 I view it as unavoidable to break older qemu here.
 
 I disagree and I am opposed to any patches series that would
 deliberately break QEMU.

If that was without also adjusting qemu I would understand you.
Considering that the issues here haven't been brought up just
yesterday, the lack of any substantial counter proposal on how
to fix this is signaling to me that there are little alternatives. Or
do you have any going beyond do it another way?

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 1/4] x86/MSI-X: be more careful during teardown

2015-04-13 Thread Stefano Stabellini
On Mon, 13 Apr 2015, Jan Beulich wrote:
  On 13.04.15 at 14:01, stefano.stabell...@eu.citrix.com wrote:
  On Mon, 13 Apr 2015, Jan Beulich wrote:
   On 13.04.15 at 12:50, stefano.stabell...@eu.citrix.com wrote:
   On Mon, 13 Apr 2015, Jan Beulich wrote:
   While we trust Dom0 to not do outright bad things,
   the hypervisor should still avoid doing things that can go wrong
   due to the state a device is put (or left) in by Dom0.
   
   Xen should also avoid doing things that can go wrong because of the
   state a device is put in by QEMU or other components in the system.
   There isn't much room for Xen to play with.
  
  Qemu is either part of Dom0, or doesn't play with devices directly.
  
  I don't understand the point you are trying to make here.
  If your intention is to point out that QEMU shouldn't be writing to the
  control register, as I wrote earlier, the current codebase disagrees
  with you and has been that way for years.
 
 Yes, this was precisely my intention. Qemu having done a certain
 thing for many years doesn't mean this is correct, and shouldn't
 be fixed if broken.

Sure, as long as the fix doesn't cause regressions.

Do you have a specific example of how QEMU's behaviour is broken and
this patch is supposed to correct it?  I thought you were trying to
work-around a dom0 kernel issue with this patch, not a QEMU bug.

Working-around a Dom0 kernel bug by creating a QEMU bug is not a great
move.


   And how are we going to deal with older unfixed QEMUs?
   So far we have been using the same policy for QEMU and the Dom0 kernel:
   Xen doesn't break them -- old Linux kernels and QEMUs are supposed to
   just work.
  
  I'm not sure that's really true for qemu, or if it is, then only by pure
  luck:
  
  This is false, it is so by design.  QEMU has an internal libxc
  compatibility layer.
 
 Which could necessarily only ever be updated after the fact (of
 a libxc change) and only ever in maintained branches. I.e. as
 widely or as narrow as fixing the issues here would require.

True, in fact we used it with older Xen releases, when upstream QEMU
wasn't the default device model yet. Nothing from Xen 4.2 onward has
required a change in QEMU and its compatibility shim and I would like to
keep it that way (note that Paul's ioreq server changes are not required
to build older QEMU versions against new Xen versions), unless we have a
very compelling reason.


  I view it as unavoidable to break older qemu here.
  
  I disagree and I am opposed to any patches series that would
  deliberately break QEMU.
 
 If that was without also adjusting qemu I would understand you.

You say that, but the initial submission only included the Xen
hypervisor changes.

I am not completely against stopping QEMU from writing to the control
register, but we cannot go ahead and commit patches to Xen that break
QEMU compatibility willy nilly.

At least we need to:

- review the QEMU and Xen patches together
- commit the Xen patches only after the QEMU patches have been reviewed
and acked
- we should consider making the QEMU changes acceptable for QEMU stable
trees and mark them for backport
- we need to write down which is the minimal version of QEMU that can
  work for PCI passthrough


 Considering that the issues here haven't been brought up just
 yesterday, the lack of any substantial counter proposal on how
 to fix this is signaling to me that there are little alternatives. Or
 do you have any going beyond do it another way?

TBH in case of this issue, I think that having the Dom0 kernel do the
right thing is good enough.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 1/4] x86/MSI-X: be more careful during teardown

2015-04-02 Thread Stefano Stabellini
On Wed, 25 Mar 2015, Jan Beulich wrote:
 When a device gets detached from a guest, pciback will clear its
 command register, thus disabling both memory and I/O decoding. The
 disabled memory decoding, however, has an effect on the MSI-X table
 accesses the hypervisor does: These won't have the intended effect
 anymore. Even worse, for PCIe devices (but not SR-IOV virtual
 functions) such accesses may (will?) be treated as Unsupported
 Requests, causing respective errors to be surfaced, potentially in the
 form of NMIs that may be fatal to the hypervisor or Dom0 is different
 ways. Hence rather than carrying out these accesses, we should avoid
 them where we can, and use alternative (e.g. PCI config space based)
 mechanisms to achieve at least the same effect.

I don't think that it is a good idea for both Xen and Linux to access
the command register simultaneously.  Working around Linux in Xen
doesn't sound like an optimal solution.   Maybe we could just fix the
pciback and that would be enough.

In any case we should make it clear somewhere who is supposed to write
to the command register (and other PCI reigsters) at any given time,
otherwise it would be very easy for a new kernel update to break the
hypervisor and we wouldn't even know why it happened.

 
 @@ -369,24 +401,52 @@ static void msi_set_mask_bit(struct irq_
  }
  break;
  case PCI_CAP_ID_MSIX:
 -{
 -int offset = PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET;
 -writel(flag, entry-mask_base + offset);
 -readl(entry-mask_base + offset);
 -break;
 -}
 +if ( likely(memory_decoded(pdev)) )
 +{
 +writel(flag, entry-mask_base + 
 PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET);
 +readl(entry-mask_base + PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET);
 +break;
 +}
 +if ( flag )
 +{
 +u16 control;
 +domid_t domid = pdev-domain-domain_id;
 +
 +control = pci_conf_read16(seg, bus, slot, func,
 +  
 msix_control_reg(entry-msi_attrib.pos));
 +if ( control  PCI_MSIX_FLAGS_MASKALL )
 +break;
 +pci_conf_write16(seg, bus, slot, func,
 + msix_control_reg(entry-msi_attrib.pos),
 + control | PCI_MSIX_FLAGS_MASKALL);

How is that going to interact with Linux (Dom0) writing to the command
register? Moreover QEMU writes to the PCI_MSIX_FLAGS_MASKALL bit for
devices assigned to HVM guests. Could this cause any conflicts?

One might argue that QEMU should not touch PCI_MSIX_FLAGS_MASKALL, but
as a matter of fact QEMU has done that for years and we cannot break
that behaviour without introducing regressions.  In fact as it stands
QEMU is the owner of PCI_MSIX_FLAGS_MASKALL for devices assigned to HVM
guests, not Xen.

Can we avoid messing with PCI_MSIX_FLAGS_MASKALL in Xen for passed
through devices to running domains?  I think that might be a good enough
separation of responsibilities between Xen and QEMU.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 1/4] x86/MSI-X: be more careful during teardown

2015-03-30 Thread Andrew Cooper
On 25/03/15 16:39, Jan Beulich wrote:
 When a device gets detached from a guest, pciback will clear its
 command register, thus disabling both memory and I/O decoding. The
 disabled memory decoding, however, has an effect on the MSI-X table
 accesses the hypervisor does: These won't have the intended effect
 anymore. Even worse, for PCIe devices (but not SR-IOV virtual
 functions) such accesses may (will?) be treated as Unsupported
 Requests, causing respective errors to be surfaced, potentially in the
 form of NMIs that may be fatal to the hypervisor or Dom0 is different
 ways. Hence rather than carrying out these accesses, we should avoid
 them where we can, and use alternative (e.g. PCI config space based)
 mechanisms to achieve at least the same effect.

 Signed-off-by: Jan Beulich jbeul...@suse.com

Reviewed-by: Andrew Cooper andrew.coop...@citrix.com

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 1/4] x86/MSI-X: be more careful during teardown

2015-03-25 Thread Jan Beulich
When a device gets detached from a guest, pciback will clear its
command register, thus disabling both memory and I/O decoding. The
disabled memory decoding, however, has an effect on the MSI-X table
accesses the hypervisor does: These won't have the intended effect
anymore. Even worse, for PCIe devices (but not SR-IOV virtual
functions) such accesses may (will?) be treated as Unsupported
Requests, causing respective errors to be surfaced, potentially in the
form of NMIs that may be fatal to the hypervisor or Dom0 is different
ways. Hence rather than carrying out these accesses, we should avoid
them where we can, and use alternative (e.g. PCI config space based)
mechanisms to achieve at least the same effect.

Signed-off-by: Jan Beulich jbeul...@suse.com
---
v2: mask_msi_irq()'s BUG() needs to be conditional upon IRQ_DISABLED
not already being set, as -shutdown() may be called on error paths
when the IRQ never got enabled. This in turn required the changes
to irq.c.

Backporting note (largely to myself):
   Depends on (not yet backported) commit 061eebe0e x86/MSI: drop
   workaround for insecure Dom0 kernels (due to re-use of struct
   arch_msix's warned field).

--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -217,9 +217,9 @@ void destroy_irq(unsigned int irq)
 }
 
 spin_lock_irqsave(desc-lock, flags);
-desc-status  |= IRQ_DISABLED;
 desc-status  = ~IRQ_GUEST;
 desc-handler-shutdown(desc);
+desc-status |= IRQ_DISABLED;
 action = desc-action;
 desc-action  = NULL;
 desc-msi_desc = NULL;
@@ -995,8 +995,8 @@ void __init release_irq(unsigned int irq
 spin_lock_irqsave(desc-lock,flags);
 action = desc-action;
 desc-action  = NULL;
-desc-status |= IRQ_DISABLED;
 desc-handler-shutdown(desc);
+desc-status |= IRQ_DISABLED;
 spin_unlock_irqrestore(desc-lock,flags);
 
 /* Wait to make sure it's not being used on another CPU */
@@ -1725,8 +1725,8 @@ static irq_guest_action_t *__pirq_guest_
 BUG_ON(action-in_flight != 0);
 
 /* Disabling IRQ before releasing the desc_lock avoids an IRQ storm. */
-desc-status |= IRQ_DISABLED;
 desc-handler-disable(desc);
+desc-status |= IRQ_DISABLED;
 
 /*
  * Mark any remaining pending EOIs as ready to flush.
--- a/xen/arch/x86/msi.c
+++ b/xen/arch/x86/msi.c
@@ -121,6 +121,27 @@ static void msix_put_fixmap(struct arch_
 spin_unlock(msix-table_lock);
 }
 
+static bool_t memory_decoded(const struct pci_dev *dev)
+{
+u8 bus, slot, func;
+
+if ( !dev-info.is_virtfn )
+{
+bus = dev-bus;
+slot = PCI_SLOT(dev-devfn);
+func = PCI_FUNC(dev-devfn);
+}
+else
+{
+bus = dev-info.physfn.bus;
+slot = PCI_SLOT(dev-info.physfn.devfn);
+func = PCI_FUNC(dev-info.physfn.devfn);
+}
+
+return !!(pci_conf_read16(dev-seg, bus, slot, func, PCI_COMMAND) 
+  PCI_COMMAND_MEMORY);
+}
+
 /*
  * MSI message composition
  */
@@ -162,7 +183,7 @@ void msi_compose_msg(unsigned vector, co
 }
 }
 
-static void read_msi_msg(struct msi_desc *entry, struct msi_msg *msg)
+static bool_t read_msi_msg(struct msi_desc *entry, struct msi_msg *msg)
 {
 switch ( entry-msi_attrib.type )
 {
@@ -198,6 +219,8 @@ static void read_msi_msg(struct msi_desc
 void __iomem *base;
 base = entry-mask_base;
 
+if ( unlikely(!memory_decoded(entry-dev)) )
+return 0;
 msg-address_lo = readl(base + PCI_MSIX_ENTRY_LOWER_ADDR_OFFSET);
 msg-address_hi = readl(base + PCI_MSIX_ENTRY_UPPER_ADDR_OFFSET);
 msg-data = readl(base + PCI_MSIX_ENTRY_DATA_OFFSET);
@@ -209,6 +232,8 @@ static void read_msi_msg(struct msi_desc
 
 if ( iommu_intremap )
 iommu_read_msi_from_ire(entry, msg);
+
+return 1;
 }
 
 static int write_msi_msg(struct msi_desc *entry, struct msi_msg *msg)
@@ -260,6 +285,8 @@ static int write_msi_msg(struct msi_desc
 void __iomem *base;
 base = entry-mask_base;
 
+if ( unlikely(!memory_decoded(entry-dev)) )
+return -ENXIO;
 writel(msg-address_lo,
base + PCI_MSIX_ENTRY_LOWER_ADDR_OFFSET);
 writel(msg-address_hi,
@@ -287,7 +314,8 @@ void set_msi_affinity(struct irq_desc *d
 ASSERT(spin_is_locked(desc-lock));
 
 memset(msg, 0, sizeof(msg));
-read_msi_msg(msi_desc, msg);
+if ( !read_msi_msg(msi_desc, msg) )
+return;
 
 msg.data = ~MSI_DATA_VECTOR_MASK;
 msg.data |= MSI_DATA_VECTOR(desc-arch.vector);
@@ -347,20 +375,24 @@ int msi_maskable_irq(const struct msi_de
|| entry-msi_attrib.maskbit;
 }
 
-static void msi_set_mask_bit(struct irq_desc *desc, int flag)
+static bool_t msi_set_mask_bit(struct irq_desc *desc, int flag)
 {
 struct msi_desc *entry = desc-msi_desc;
+struct pci_dev *pdev;
+u16 seg;
+u8 bus, slot, func;
 
 ASSERT(spin_is_locked(desc-lock));
 BUG_ON(!entry || !entry-dev);
+pdev = entry-dev;
+seg =