RE: [RFC PATCH 3/3] vfio-pci: Allow to mmap MSI-X table if EEH is supported

2015-12-18 Thread David Laight
From: Alex Williamson
> Sent: 17 December 2015 21:07
...
> > Is this all related to the statements in the PCI(e) spec that the
> > MSI-X table and Pending bit array should in their own BARs?
> > (ISTR it even suggests a BAR each.)
> >
> > Since the MSI-X table exists in device memory/registers there is
> > nothing to stop the device modifying the table contents (or even
> > ignoring the contents and writing address+data pairs that are known
> > to reference the CPUs MSI-X interrupt generation logic).
> >
> > We've an fpga based PCIe slave that has some additional PCIe slaves
> > (associated with the interrupt generation logic) that are currently
> > next to the PBA (which is 8k from the MSI-X table).
> > If we can't map the PBA we can't actually raise any interrupts.
> > The same would be true if page size is 64k and mapping the MSI-X
> > table banned.
> >
> > Do we need to change our PCIe slave address map so we don't need
> > to access anything in the same page (which might be 64k were we to
> > target large ppc - which we don't at the moment) as both the
> > MSI-X table and the PBA?
> >
> > I'd also note that being able to read the MSI-X table is a useful
> > diagnostic that the relevant interrupts are enabled properly.
> 
> Yes, the spec requirement is that MSI-X structures must reside in a 4k
> aligned area that doesn't overlap with other configuration registers
> for the device.  It's only an advisement to put them into their own
> BAR, and 4k clearly wasn't as forward looking as we'd hope.  Vfio
> doesn't particularly care about the PBA, but if it resides in the same
> host PAGE_SIZE area as the MSI-X vector table, you currently won't be
> able to get to it.  Most devices are not at all dependent on the PBA
> for any sort of functionality.

Having some hint in the spec as to why these mapping rules might be
needed would have been useful.

> It's really more correct to say that both the vector table and PBA are
> emulated by QEMU than paravirtualized.  Only PPC64 has the guest OS
> taking a paravirtual path to program the vector table, everyone else
> attempts to read/write to the device MMIO space, which gets trapped and
> emulated in QEMU.  This is why the QEMU side patch has further ugly
> hacks to mess with the ordering of MemoryRegions since even if we can
> access and mmap the MSI-X vector table, we'll still trap into QEMU for
> emulation.

Thanks for that explanation.

> How exactly does the ability to map the PBA affect your ability to
> raise an interrupt?

There are other registers for the logic block that converts internal
interrupt requests into the PCIe writes in the locations following the PBA.
These include interrupt enable bits, and the ability to remove pending
interrupt requests (and other stuff for testing the interrupt block).
Yes I know I probably shouldn't have done that, but it all worked.

At least it is better than the previous version of the hardware that
required the driver read back the MSI-X table entries in order to
set up an on-board PTE to convert a 32bit on-board address to the
64bit PCIe address needed for the MSI-X.

I'll 'fix' our board by making both the MSI-X table and PBA area
accessible through one of the other BARs. (Annoyingly this will
be confusing because the BAR offsets will have to differ.)
Note that this makes a complete mockery of disallowing the mapping
in the first place.

David


N�r��yb�X��ǧv�^�)޺{.n�+h����ܨ}���Ơz�:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf

Re: [RFC PATCH 3/3] vfio-pci: Allow to mmap MSI-X table if EEH is supported

2015-12-17 Thread yongji xie



On 2015/12/17 4:14, Alex Williamson wrote:

On Fri, 2015-12-11 at 16:53 +0800, Yongji Xie wrote:

Current vfio-pci implementation disallows to mmap MSI-X table in
case that user get to touch this directly.

However, EEH mechanism could ensure that a given pci device
can only shoot the MSIs assigned for its PE and guest kernel also
would not write to MSI-X table in pci_enable_msix() because
para-virtualization on PPC64 platform. So MSI-X table is safe to
access directly from the guest with EEH mechanism enabled.

The MSI-X table is paravirtualized on vfio in general and interrupt
remapping theoretically protects against errant interrupts, so why is
this PPC64 specific?  We have the same safeguards on x86 if we want to
decide they're sufficient.  Offhand, the only way I can think that a
device can touch the MSI-X table is via backdoors or p2p DMA with
another device.

Maybe I didn't make my point clear. The reasons why we can mmap MSI-X
table on PPC64 are:

1. EEH mechanism could ensure that a given pci device can only shoot
the MSIs assigned for its PE. So it would not do harm to other memory
space when the guest write a garbage MSI-X address/data to the vector table
if we passthough MSI-X tables to guest.

2. The guest kernel would not write to MSI-X table on PPC64 platform
when device drivers call pci_enable_msix() to initialize MSI-X interrupts.

So I think it is safe to mmap/passthrough MSI-X table on PPC64 platform.
And I'm not sure whether other architectures can ensure these two 
points. Thanks.


Regards
Yongji Xie

This patch adds support for this case and allow to mmap MSI-X
table if EEH is supported on PPC64 platform.

And we also add a VFIO_DEVICE_FLAGS_PCI_MSIX_MMAP flag to notify
userspace that it's safe to mmap MSI-X table.

Signed-off-by: Yongji Xie 
---
  drivers/vfio/pci/vfio_pci.c |5 -
  drivers/vfio/pci/vfio_pci_private.h |5 +
  include/uapi/linux/vfio.h   |2 ++
  3 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index dbcad99..85d9980 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -446,6 +446,9 @@ static long vfio_pci_ioctl(void *device_data,
if (vfio_pci_bar_page_aligned())
info.flags |= VFIO_DEVICE_FLAGS_PCI_PAGE_ALIGNED;
  
+		if (vfio_msix_table_mmap_enabled())

+   info.flags |= VFIO_DEVICE_FLAGS_PCI_MSIX_MMAP;
+
info.num_regions = VFIO_PCI_NUM_REGIONS;
info.num_irqs = VFIO_PCI_NUM_IRQS;
  
@@ -871,7 +874,7 @@ static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)

if (phys_len < PAGE_SIZE || req_start + req_len > phys_len)
return -EINVAL;
  
-	if (index == vdev->msix_bar) {

+   if (index == vdev->msix_bar && !vfio_msix_table_mmap_enabled()) {
/*
 * Disallow mmaps overlapping the MSI-X table; users don't
 * get to touch this directly.  We could find somewhere
diff --git a/drivers/vfio/pci/vfio_pci_private.h 
b/drivers/vfio/pci/vfio_pci_private.h
index 319352a..835619e 100644
--- a/drivers/vfio/pci/vfio_pci_private.h
+++ b/drivers/vfio/pci/vfio_pci_private.h
@@ -74,6 +74,11 @@ static inline bool vfio_pci_bar_page_aligned(void)
return IS_ENABLED(CONFIG_PPC64);
  }
  
+static inline bool vfio_msix_table_mmap_enabled(void)

+{
+   return IS_ENABLED(CONFIG_EEH);
+}

I really dislike these.


+
  extern void vfio_pci_intx_mask(struct vfio_pci_device *vdev);
  extern void vfio_pci_intx_unmask(struct vfio_pci_device *vdev);
  
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h

index 1fc8066..289e662 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -173,6 +173,8 @@ struct vfio_device_info {
  #define VFIO_DEVICE_FLAGS_AMBA  (1 << 3)/* vfio-amba device */
  /* Platform support all PCI MMIO BARs to be page aligned */
  #define VFIO_DEVICE_FLAGS_PCI_PAGE_ALIGNED(1 << 4)
+/* Platform support mmapping PCI MSI-X vector table */
+#define VFIO_DEVICE_FLAGS_PCI_MSIX_MMAP(1 << 5)

Again, not sure why this is on the device versus the region, but I'd
prefer to investigate whether we can handle this with the sparse mmap
capability (or lack of) in the capability chains I proposed[1]. Thanks,

Alex

[1] https://lkml.org/lkml/2015/11/23/748


Good idea! I wiil investigate it. Thanks.

Regards
Yongji Xie

__u32   num_regions;/* Max region index + 1 */
__u32   num_irqs;   /* Max IRQ index + 1 */
  };


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 3/3] vfio-pci: Allow to mmap MSI-X table if EEH is supported

2015-12-17 Thread Alex Williamson
On Thu, 2015-12-17 at 18:37 +0800, yongji xie wrote:
> 
> On 2015/12/17 4:14, Alex Williamson wrote:
> > On Fri, 2015-12-11 at 16:53 +0800, Yongji Xie wrote:
> > > Current vfio-pci implementation disallows to mmap MSI-X table in
> > > case that user get to touch this directly.
> > > 
> > > However, EEH mechanism could ensure that a given pci device
> > > can only shoot the MSIs assigned for its PE and guest kernel also
> > > would not write to MSI-X table in pci_enable_msix() because
> > > para-virtualization on PPC64 platform. So MSI-X table is safe to
> > > access directly from the guest with EEH mechanism enabled.
> > The MSI-X table is paravirtualized on vfio in general and interrupt
> > remapping theoretically protects against errant interrupts, so why
> > is
> > this PPC64 specific?  We have the same safeguards on x86 if we want
> > to
> > decide they're sufficient.  Offhand, the only way I can think that
> > a
> > device can touch the MSI-X table is via backdoors or p2p DMA with
> > another device.
> Maybe I didn't make my point clear. The reasons why we can mmap MSI-X
> table on PPC64 are:
> 
> 1. EEH mechanism could ensure that a given pci device can only shoot
> the MSIs assigned for its PE. So it would not do harm to other memory
> space when the guest write a garbage MSI-X address/data to the vector
> table
> if we passthough MSI-X tables to guest.

Interrupt remapping does the same on x86.

> 2. The guest kernel would not write to MSI-X table on PPC64 platform
> when device drivers call pci_enable_msix() to initialize MSI-X
> interrupts.

This is irrelevant to the vfio API.  vfio is a userspace driver
interface, QEMU is just one possible consumer of the interface.  Even
in the case of PPC64 & QEMU, the guest is still capable of writing to
the vector table, it just probably won't.

> So I think it is safe to mmap/passthrough MSI-X table on PPC64
> platform.
> And I'm not sure whether other architectures can ensure these two 
> points. 

There is another consideration, which is the API exposed to the user.
 vfio currently enforces interrupt setup through ioctls by making the
PCI mechanisms for interrupt programming inaccessible through the
device regions.  Ignoring that you are only focused on PPC64 with QEMU,
does it make sense for the vfio API to allow a user to manipulate
interrupt programming in a way that not only will not work, but in a
way that we expect to fail and require error isolation to recover from?
 I can't say I'm fully convinced that a footnote in the documentation
is sufficient for that.  Thanks,

Alex
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 3/3] vfio-pci: Allow to mmap MSI-X table if EEH is supported

2015-12-17 Thread Alex Williamson
On Thu, 2015-12-17 at 10:08 +, David Laight wrote:
> > The MSI-X table is paravirtualized on vfio in general and interrupt
> > remapping theoretically protects against errant interrupts, so why
> > is
> > this PPC64 specific? We have the same safeguards on x86 if we want
> > to
> > decide they're sufficient. Offhand, the only way I can think that a
> > device can touch the MSI-X table is via backdoors or p2p DMA with
> > another device.
> 
> Is this all related to the statements in the PCI(e) spec that the
> MSI-X table and Pending bit array should in their own BARs?
> (ISTR it even suggests a BAR each.)
> 
> Since the MSI-X table exists in device memory/registers there is
> nothing to stop the device modifying the table contents (or even
> ignoring the contents and writing address+data pairs that are known
> to reference the CPUs MSI-X interrupt generation logic).
> 
> We've an fpga based PCIe slave that has some additional PCIe slaves
> (associated with the interrupt generation logic) that are currently
> next to the PBA (which is 8k from the MSI-X table).
> If we can't map the PBA we can't actually raise any interrupts.
> The same would be true if page size is 64k and mapping the MSI-X
> table banned.
> 
> Do we need to change our PCIe slave address map so we don't need
> to access anything in the same page (which might be 64k were we to
> target large ppc - which we don't at the moment) as both the
> MSI-X table and the PBA?
> 
> I'd also note that being able to read the MSI-X table is a useful
> diagnostic that the relevant interrupts are enabled properly.

Yes, the spec requirement is that MSI-X structures must reside in a 4k
aligned area that doesn't overlap with other configuration registers
for the device.  It's only an advisement to put them into their own
BAR, and 4k clearly wasn't as forward looking as we'd hope.  Vfio
doesn't particularly care about the PBA, but if it resides in the same
host PAGE_SIZE area as the MSI-X vector table, you currently won't be
able to get to it.  Most devices are not at all dependent on the PBA
for any sort of functionality.

It's really more correct to say that both the vector table and PBA are
emulated by QEMU than paravirtualized.  Only PPC64 has the guest OS
taking a paravirtual path to program the vector table, everyone else
attempts to read/write to the device MMIO space, which gets trapped and
emulated in QEMU.  This is why the QEMU side patch has further ugly
hacks to mess with the ordering of MemoryRegions since even if we can
access and mmap the MSI-X vector table, we'll still trap into QEMU for
emulation.

How exactly does the ability to map the PBA affect your ability to
raise an interrupt?  I can only think that maybe you're writing PBA
bits to clear them, but the spec indicates that software should never
write to the PBA, only read, and that writes are undefined.  So that
would be very non-standard, QEMU drops writes, they don't even make it
to the hardware.  Thanks,

Alex
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 3/3] vfio-pci: Allow to mmap MSI-X table if EEH is supported

2015-12-17 Thread Benjamin Herrenschmidt
On Thu, 2015-12-17 at 14:41 -0700, Alex Williamson wrote:
> > So I think it is safe to mmap/passthrough MSI-X table on PPC64
> > platform.
> > And I'm not sure whether other architectures can ensure these two 
> > points. 
> 
> There is another consideration, which is the API exposed to the user.
>  vfio currently enforces interrupt setup through ioctls by making the
> PCI mechanisms for interrupt programming inaccessible through the
> device regions.  Ignoring that you are only focused on PPC64 with QEMU,
> does it make sense for the vfio API to allow a user to manipulate
> interrupt programming in a way that not only will not work, but in a
> way that we expect to fail and require error isolation to recover from?
>  I can't say I'm fully convinced that a footnote in the documentation
> is sufficient for that.  Thanks,

Well, one could argue that the "isolation" provided by qemu here is
fairly weak anyway ;-)

I mean. .. how do you know the device doesn't have a backdoor path into
that table via some other MMIO registers anyway ?

In any case, the HW isolation on platforms like pseries means that the
worst the guest can do si shoot itself in the foot. Big deal. On the
other hand, not bothering with intercepting the table has benefits,
such as reducing the memory region clutter, but also removing all the
massive performacne problems we see because adapters have critical
registers in the same 64K page as the MSI-X table.

So I don't think there is any question here that we *need* that
functionality in power. The filtering of the table by Qemu doesn't
provide any practical benefit, it just gets in the way.

Cheers,
Ben.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [RFC PATCH 3/3] vfio-pci: Allow to mmap MSI-X table if EEH is supported

2015-12-17 Thread David Laight
> The MSI-X table is paravirtualized on vfio in general and interrupt
> remapping theoretically protects against errant interrupts, so why is
> this PPC64 specific? We have the same safeguards on x86 if we want to
> decide they're sufficient. Offhand, the only way I can think that a
> device can touch the MSI-X table is via backdoors or p2p DMA with
> another device.

Is this all related to the statements in the PCI(e) spec that the
MSI-X table and Pending bit array should in their own BARs?
(ISTR it even suggests a BAR each.)

Since the MSI-X table exists in device memory/registers there is
nothing to stop the device modifying the table contents (or even
ignoring the contents and writing address+data pairs that are known
to reference the CPUs MSI-X interrupt generation logic).

We've an fpga based PCIe slave that has some additional PCIe slaves
(associated with the interrupt generation logic) that are currently
next to the PBA (which is 8k from the MSI-X table).
If we can't map the PBA we can't actually raise any interrupts.
The same would be true if page size is 64k and mapping the MSI-X
table banned.

Do we need to change our PCIe slave address map so we don't need
to access anything in the same page (which might be 64k were we to
target large ppc - which we don't at the moment) as both the
MSI-X table and the PBA?

I'd also note that being able to read the MSI-X table is a useful
diagnostic that the relevant interrupts are enabled properly.

David



Re: [RFC PATCH 3/3] vfio-pci: Allow to mmap MSI-X table if EEH is supported

2015-12-16 Thread Alex Williamson
On Fri, 2015-12-11 at 16:53 +0800, Yongji Xie wrote:
> Current vfio-pci implementation disallows to mmap MSI-X table in
> case that user get to touch this directly.
> 
> However, EEH mechanism could ensure that a given pci device
> can only shoot the MSIs assigned for its PE and guest kernel also
> would not write to MSI-X table in pci_enable_msix() because
> para-virtualization on PPC64 platform. So MSI-X table is safe to
> access directly from the guest with EEH mechanism enabled.

The MSI-X table is paravirtualized on vfio in general and interrupt
remapping theoretically protects against errant interrupts, so why is
this PPC64 specific?  We have the same safeguards on x86 if we want to
decide they're sufficient.  Offhand, the only way I can think that a
device can touch the MSI-X table is via backdoors or p2p DMA with
another device.

> This patch adds support for this case and allow to mmap MSI-X
> table if EEH is supported on PPC64 platform.
> 
> And we also add a VFIO_DEVICE_FLAGS_PCI_MSIX_MMAP flag to notify
> userspace that it's safe to mmap MSI-X table.
> 
> Signed-off-by: Yongji Xie 
> ---
>  drivers/vfio/pci/vfio_pci.c |5 -
>  drivers/vfio/pci/vfio_pci_private.h |5 +
>  include/uapi/linux/vfio.h   |2 ++
>  3 files changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index dbcad99..85d9980 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -446,6 +446,9 @@ static long vfio_pci_ioctl(void *device_data,
>   if (vfio_pci_bar_page_aligned())
>   info.flags |= VFIO_DEVICE_FLAGS_PCI_PAGE_ALIGNED;
>  
> + if (vfio_msix_table_mmap_enabled())
> + info.flags |= VFIO_DEVICE_FLAGS_PCI_MSIX_MMAP;
> +
>   info.num_regions = VFIO_PCI_NUM_REGIONS;
>   info.num_irqs = VFIO_PCI_NUM_IRQS;
>  
> @@ -871,7 +874,7 @@ static int vfio_pci_mmap(void *device_data, struct 
> vm_area_struct *vma)
>   if (phys_len < PAGE_SIZE || req_start + req_len > phys_len)
>   return -EINVAL;
>  
> - if (index == vdev->msix_bar) {
> + if (index == vdev->msix_bar && !vfio_msix_table_mmap_enabled()) {
>   /*
>    * Disallow mmaps overlapping the MSI-X table; users don't
>    * get to touch this directly.  We could find somewhere
> diff --git a/drivers/vfio/pci/vfio_pci_private.h 
> b/drivers/vfio/pci/vfio_pci_private.h
> index 319352a..835619e 100644
> --- a/drivers/vfio/pci/vfio_pci_private.h
> +++ b/drivers/vfio/pci/vfio_pci_private.h
> @@ -74,6 +74,11 @@ static inline bool vfio_pci_bar_page_aligned(void)
>   return IS_ENABLED(CONFIG_PPC64);
>  }
>  
> +static inline bool vfio_msix_table_mmap_enabled(void)
> +{
> + return IS_ENABLED(CONFIG_EEH);
> +}

I really dislike these.

> +
>  extern void vfio_pci_intx_mask(struct vfio_pci_device *vdev);
>  extern void vfio_pci_intx_unmask(struct vfio_pci_device *vdev);
>  
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 1fc8066..289e662 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -173,6 +173,8 @@ struct vfio_device_info {
>  #define VFIO_DEVICE_FLAGS_AMBA  (1 << 3) /* vfio-amba device */
>  /* Platform support all PCI MMIO BARs to be page aligned */
>  #define VFIO_DEVICE_FLAGS_PCI_PAGE_ALIGNED   (1 << 4)
> +/* Platform support mmapping PCI MSI-X vector table */
> +#define VFIO_DEVICE_FLAGS_PCI_MSIX_MMAP  (1 << 5)

Again, not sure why this is on the device versus the region, but I'd
prefer to investigate whether we can handle this with the sparse mmap
capability (or lack of) in the capability chains I proposed[1]. Thanks,

Alex

[1] https://lkml.org/lkml/2015/11/23/748

>   __u32   num_regions;/* Max region index + 1 */
>   __u32   num_irqs;   /* Max IRQ index + 1 */
>  };

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html