RE: [RFC PATCH 3/3] vfio-pci: Allow to mmap MSI-X table if EEH is supported
From: Alex Williamson > Sent: 17 December 2015 21:07 ... > > Is this all related to the statements in the PCI(e) spec that the > > MSI-X table and Pending bit array should in their own BARs? > > (ISTR it even suggests a BAR each.) > > > > Since the MSI-X table exists in device memory/registers there is > > nothing to stop the device modifying the table contents (or even > > ignoring the contents and writing address+data pairs that are known > > to reference the CPUs MSI-X interrupt generation logic). > > > > We've an fpga based PCIe slave that has some additional PCIe slaves > > (associated with the interrupt generation logic) that are currently > > next to the PBA (which is 8k from the MSI-X table). > > If we can't map the PBA we can't actually raise any interrupts. > > The same would be true if page size is 64k and mapping the MSI-X > > table banned. > > > > Do we need to change our PCIe slave address map so we don't need > > to access anything in the same page (which might be 64k were we to > > target large ppc - which we don't at the moment) as both the > > MSI-X table and the PBA? > > > > I'd also note that being able to read the MSI-X table is a useful > > diagnostic that the relevant interrupts are enabled properly. > > Yes, the spec requirement is that MSI-X structures must reside in a 4k > aligned area that doesn't overlap with other configuration registers > for the device. It's only an advisement to put them into their own > BAR, and 4k clearly wasn't as forward looking as we'd hope. Vfio > doesn't particularly care about the PBA, but if it resides in the same > host PAGE_SIZE area as the MSI-X vector table, you currently won't be > able to get to it. Most devices are not at all dependent on the PBA > for any sort of functionality. Having some hint in the spec as to why these mapping rules might be needed would have been useful. > It's really more correct to say that both the vector table and PBA are > emulated by QEMU than paravirtualized. Only PPC64 has the guest OS > taking a paravirtual path to program the vector table, everyone else > attempts to read/write to the device MMIO space, which gets trapped and > emulated in QEMU. This is why the QEMU side patch has further ugly > hacks to mess with the ordering of MemoryRegions since even if we can > access and mmap the MSI-X vector table, we'll still trap into QEMU for > emulation. Thanks for that explanation. > How exactly does the ability to map the PBA affect your ability to > raise an interrupt? There are other registers for the logic block that converts internal interrupt requests into the PCIe writes in the locations following the PBA. These include interrupt enable bits, and the ability to remove pending interrupt requests (and other stuff for testing the interrupt block). Yes I know I probably shouldn't have done that, but it all worked. At least it is better than the previous version of the hardware that required the driver read back the MSI-X table entries in order to set up an on-board PTE to convert a 32bit on-board address to the 64bit PCIe address needed for the MSI-X. I'll 'fix' our board by making both the MSI-X table and PBA area accessible through one of the other BARs. (Annoyingly this will be confusing because the BAR offsets will have to differ.) Note that this makes a complete mockery of disallowing the mapping in the first place. David N�r��yb�X��ǧv�^�){.n�+h����ܨ}���Ơz�&j:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf
Re: [RFC PATCH 3/3] vfio-pci: Allow to mmap MSI-X table if EEH is supported
On Thu, 2015-12-17 at 14:41 -0700, Alex Williamson wrote: > > So I think it is safe to mmap/passthrough MSI-X table on PPC64 > > platform. > > And I'm not sure whether other architectures can ensure these two > > points. > > There is another consideration, which is the API exposed to the user. > vfio currently enforces interrupt setup through ioctls by making the > PCI mechanisms for interrupt programming inaccessible through the > device regions. Ignoring that you are only focused on PPC64 with QEMU, > does it make sense for the vfio API to allow a user to manipulate > interrupt programming in a way that not only will not work, but in a > way that we expect to fail and require error isolation to recover from? > I can't say I'm fully convinced that a footnote in the documentation > is sufficient for that. Thanks, Well, one could argue that the "isolation" provided by qemu here is fairly weak anyway ;-) I mean. .. how do you know the device doesn't have a backdoor path into that table via some other MMIO registers anyway ? In any case, the HW isolation on platforms like pseries means that the worst the guest can do si shoot itself in the foot. Big deal. On the other hand, not bothering with intercepting the table has benefits, such as reducing the memory region clutter, but also removing all the massive performacne problems we see because adapters have critical registers in the same 64K page as the MSI-X table. So I don't think there is any question here that we *need* that functionality in power. The filtering of the table by Qemu doesn't provide any practical benefit, it just gets in the way. Cheers, Ben. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 3/3] vfio-pci: Allow to mmap MSI-X table if EEH is supported
On Thu, 2015-12-17 at 18:37 +0800, yongji xie wrote: > > On 2015/12/17 4:14, Alex Williamson wrote: > > On Fri, 2015-12-11 at 16:53 +0800, Yongji Xie wrote: > > > Current vfio-pci implementation disallows to mmap MSI-X table in > > > case that user get to touch this directly. > > > > > > However, EEH mechanism could ensure that a given pci device > > > can only shoot the MSIs assigned for its PE and guest kernel also > > > would not write to MSI-X table in pci_enable_msix() because > > > para-virtualization on PPC64 platform. So MSI-X table is safe to > > > access directly from the guest with EEH mechanism enabled. > > The MSI-X table is paravirtualized on vfio in general and interrupt > > remapping theoretically protects against errant interrupts, so why > > is > > this PPC64 specific? We have the same safeguards on x86 if we want > > to > > decide they're sufficient. Offhand, the only way I can think that > > a > > device can touch the MSI-X table is via backdoors or p2p DMA with > > another device. > Maybe I didn't make my point clear. The reasons why we can mmap MSI-X > table on PPC64 are: > > 1. EEH mechanism could ensure that a given pci device can only shoot > the MSIs assigned for its PE. So it would not do harm to other memory > space when the guest write a garbage MSI-X address/data to the vector > table > if we passthough MSI-X tables to guest. Interrupt remapping does the same on x86. > 2. The guest kernel would not write to MSI-X table on PPC64 platform > when device drivers call pci_enable_msix() to initialize MSI-X > interrupts. This is irrelevant to the vfio API. vfio is a userspace driver interface, QEMU is just one possible consumer of the interface. Even in the case of PPC64 & QEMU, the guest is still capable of writing to the vector table, it just probably won't. > So I think it is safe to mmap/passthrough MSI-X table on PPC64 > platform. > And I'm not sure whether other architectures can ensure these two > points. There is another consideration, which is the API exposed to the user. vfio currently enforces interrupt setup through ioctls by making the PCI mechanisms for interrupt programming inaccessible through the device regions. Ignoring that you are only focused on PPC64 with QEMU, does it make sense for the vfio API to allow a user to manipulate interrupt programming in a way that not only will not work, but in a way that we expect to fail and require error isolation to recover from? I can't say I'm fully convinced that a footnote in the documentation is sufficient for that. Thanks, Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 3/3] vfio-pci: Allow to mmap MSI-X table if EEH is supported
On Thu, 2015-12-17 at 10:08 +, David Laight wrote: > > The MSI-X table is paravirtualized on vfio in general and interrupt > > remapping theoretically protects against errant interrupts, so why > > is > > this PPC64 specific? We have the same safeguards on x86 if we want > > to > > decide they're sufficient. Offhand, the only way I can think that a > > device can touch the MSI-X table is via backdoors or p2p DMA with > > another device. > > Is this all related to the statements in the PCI(e) spec that the > MSI-X table and Pending bit array should in their own BARs? > (ISTR it even suggests a BAR each.) > > Since the MSI-X table exists in device memory/registers there is > nothing to stop the device modifying the table contents (or even > ignoring the contents and writing address+data pairs that are known > to reference the CPUs MSI-X interrupt generation logic). > > We've an fpga based PCIe slave that has some additional PCIe slaves > (associated with the interrupt generation logic) that are currently > next to the PBA (which is 8k from the MSI-X table). > If we can't map the PBA we can't actually raise any interrupts. > The same would be true if page size is 64k and mapping the MSI-X > table banned. > > Do we need to change our PCIe slave address map so we don't need > to access anything in the same page (which might be 64k were we to > target large ppc - which we don't at the moment) as both the > MSI-X table and the PBA? > > I'd also note that being able to read the MSI-X table is a useful > diagnostic that the relevant interrupts are enabled properly. Yes, the spec requirement is that MSI-X structures must reside in a 4k aligned area that doesn't overlap with other configuration registers for the device. It's only an advisement to put them into their own BAR, and 4k clearly wasn't as forward looking as we'd hope. Vfio doesn't particularly care about the PBA, but if it resides in the same host PAGE_SIZE area as the MSI-X vector table, you currently won't be able to get to it. Most devices are not at all dependent on the PBA for any sort of functionality. It's really more correct to say that both the vector table and PBA are emulated by QEMU than paravirtualized. Only PPC64 has the guest OS taking a paravirtual path to program the vector table, everyone else attempts to read/write to the device MMIO space, which gets trapped and emulated in QEMU. This is why the QEMU side patch has further ugly hacks to mess with the ordering of MemoryRegions since even if we can access and mmap the MSI-X vector table, we'll still trap into QEMU for emulation. How exactly does the ability to map the PBA affect your ability to raise an interrupt? I can only think that maybe you're writing PBA bits to clear them, but the spec indicates that software should never write to the PBA, only read, and that writes are undefined. So that would be very non-standard, QEMU drops writes, they don't even make it to the hardware. Thanks, Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [RFC PATCH 3/3] vfio-pci: Allow to mmap MSI-X table if EEH is supported
> The MSI-X table is paravirtualized on vfio in general and interrupt > remapping theoretically protects against errant interrupts, so why is > this PPC64 specific? We have the same safeguards on x86 if we want to > decide they're sufficient. Offhand, the only way I can think that a > device can touch the MSI-X table is via backdoors or p2p DMA with > another device. Is this all related to the statements in the PCI(e) spec that the MSI-X table and Pending bit array should in their own BARs? (ISTR it even suggests a BAR each.) Since the MSI-X table exists in device memory/registers there is nothing to stop the device modifying the table contents (or even ignoring the contents and writing address+data pairs that are known to reference the CPUs MSI-X interrupt generation logic). We've an fpga based PCIe slave that has some additional PCIe slaves (associated with the interrupt generation logic) that are currently next to the PBA (which is 8k from the MSI-X table). If we can't map the PBA we can't actually raise any interrupts. The same would be true if page size is 64k and mapping the MSI-X table banned. Do we need to change our PCIe slave address map so we don't need to access anything in the same page (which might be 64k were we to target large ppc - which we don't at the moment) as both the MSI-X table and the PBA? I'd also note that being able to read the MSI-X table is a useful diagnostic that the relevant interrupts are enabled properly. David
Re: [RFC PATCH 3/3] vfio-pci: Allow to mmap MSI-X table if EEH is supported
On 2015/12/17 4:14, Alex Williamson wrote: On Fri, 2015-12-11 at 16:53 +0800, Yongji Xie wrote: Current vfio-pci implementation disallows to mmap MSI-X table in case that user get to touch this directly. However, EEH mechanism could ensure that a given pci device can only shoot the MSIs assigned for its PE and guest kernel also would not write to MSI-X table in pci_enable_msix() because para-virtualization on PPC64 platform. So MSI-X table is safe to access directly from the guest with EEH mechanism enabled. The MSI-X table is paravirtualized on vfio in general and interrupt remapping theoretically protects against errant interrupts, so why is this PPC64 specific? We have the same safeguards on x86 if we want to decide they're sufficient. Offhand, the only way I can think that a device can touch the MSI-X table is via backdoors or p2p DMA with another device. Maybe I didn't make my point clear. The reasons why we can mmap MSI-X table on PPC64 are: 1. EEH mechanism could ensure that a given pci device can only shoot the MSIs assigned for its PE. So it would not do harm to other memory space when the guest write a garbage MSI-X address/data to the vector table if we passthough MSI-X tables to guest. 2. The guest kernel would not write to MSI-X table on PPC64 platform when device drivers call pci_enable_msix() to initialize MSI-X interrupts. So I think it is safe to mmap/passthrough MSI-X table on PPC64 platform. And I'm not sure whether other architectures can ensure these two points. Thanks. Regards Yongji Xie This patch adds support for this case and allow to mmap MSI-X table if EEH is supported on PPC64 platform. And we also add a VFIO_DEVICE_FLAGS_PCI_MSIX_MMAP flag to notify userspace that it's safe to mmap MSI-X table. Signed-off-by: Yongji Xie --- drivers/vfio/pci/vfio_pci.c |5 - drivers/vfio/pci/vfio_pci_private.h |5 + include/uapi/linux/vfio.h |2 ++ 3 files changed, 11 insertions(+), 1 deletion(-) diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c index dbcad99..85d9980 100644 --- a/drivers/vfio/pci/vfio_pci.c +++ b/drivers/vfio/pci/vfio_pci.c @@ -446,6 +446,9 @@ static long vfio_pci_ioctl(void *device_data, if (vfio_pci_bar_page_aligned()) info.flags |= VFIO_DEVICE_FLAGS_PCI_PAGE_ALIGNED; + if (vfio_msix_table_mmap_enabled()) + info.flags |= VFIO_DEVICE_FLAGS_PCI_MSIX_MMAP; + info.num_regions = VFIO_PCI_NUM_REGIONS; info.num_irqs = VFIO_PCI_NUM_IRQS; @@ -871,7 +874,7 @@ static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma) if (phys_len < PAGE_SIZE || req_start + req_len > phys_len) return -EINVAL; - if (index == vdev->msix_bar) { + if (index == vdev->msix_bar && !vfio_msix_table_mmap_enabled()) { /* * Disallow mmaps overlapping the MSI-X table; users don't * get to touch this directly. We could find somewhere diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h index 319352a..835619e 100644 --- a/drivers/vfio/pci/vfio_pci_private.h +++ b/drivers/vfio/pci/vfio_pci_private.h @@ -74,6 +74,11 @@ static inline bool vfio_pci_bar_page_aligned(void) return IS_ENABLED(CONFIG_PPC64); } +static inline bool vfio_msix_table_mmap_enabled(void) +{ + return IS_ENABLED(CONFIG_EEH); +} I really dislike these. + extern void vfio_pci_intx_mask(struct vfio_pci_device *vdev); extern void vfio_pci_intx_unmask(struct vfio_pci_device *vdev); diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h index 1fc8066..289e662 100644 --- a/include/uapi/linux/vfio.h +++ b/include/uapi/linux/vfio.h @@ -173,6 +173,8 @@ struct vfio_device_info { #define VFIO_DEVICE_FLAGS_AMBA (1 << 3)/* vfio-amba device */ /* Platform support all PCI MMIO BARs to be page aligned */ #define VFIO_DEVICE_FLAGS_PCI_PAGE_ALIGNED(1 << 4) +/* Platform support mmapping PCI MSI-X vector table */ +#define VFIO_DEVICE_FLAGS_PCI_MSIX_MMAP(1 << 5) Again, not sure why this is on the device versus the region, but I'd prefer to investigate whether we can handle this with the sparse mmap capability (or lack of) in the capability chains I proposed[1]. Thanks, Alex [1] https://lkml.org/lkml/2015/11/23/748 Good idea! I wiil investigate it. Thanks. Regards Yongji Xie __u32 num_regions;/* Max region index + 1 */ __u32 num_irqs; /* Max IRQ index + 1 */ }; -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 3/3] vfio-pci: Allow to mmap MSI-X table if EEH is supported
On Fri, 2015-12-11 at 16:53 +0800, Yongji Xie wrote: > Current vfio-pci implementation disallows to mmap MSI-X table in > case that user get to touch this directly. > > However, EEH mechanism could ensure that a given pci device > can only shoot the MSIs assigned for its PE and guest kernel also > would not write to MSI-X table in pci_enable_msix() because > para-virtualization on PPC64 platform. So MSI-X table is safe to > access directly from the guest with EEH mechanism enabled. The MSI-X table is paravirtualized on vfio in general and interrupt remapping theoretically protects against errant interrupts, so why is this PPC64 specific? We have the same safeguards on x86 if we want to decide they're sufficient. Offhand, the only way I can think that a device can touch the MSI-X table is via backdoors or p2p DMA with another device. > This patch adds support for this case and allow to mmap MSI-X > table if EEH is supported on PPC64 platform. > > And we also add a VFIO_DEVICE_FLAGS_PCI_MSIX_MMAP flag to notify > userspace that it's safe to mmap MSI-X table. > > Signed-off-by: Yongji Xie > --- > drivers/vfio/pci/vfio_pci.c |5 - > drivers/vfio/pci/vfio_pci_private.h |5 + > include/uapi/linux/vfio.h |2 ++ > 3 files changed, 11 insertions(+), 1 deletion(-) > > diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c > index dbcad99..85d9980 100644 > --- a/drivers/vfio/pci/vfio_pci.c > +++ b/drivers/vfio/pci/vfio_pci.c > @@ -446,6 +446,9 @@ static long vfio_pci_ioctl(void *device_data, > if (vfio_pci_bar_page_aligned()) > info.flags |= VFIO_DEVICE_FLAGS_PCI_PAGE_ALIGNED; > > + if (vfio_msix_table_mmap_enabled()) > + info.flags |= VFIO_DEVICE_FLAGS_PCI_MSIX_MMAP; > + > info.num_regions = VFIO_PCI_NUM_REGIONS; > info.num_irqs = VFIO_PCI_NUM_IRQS; > > @@ -871,7 +874,7 @@ static int vfio_pci_mmap(void *device_data, struct > vm_area_struct *vma) > if (phys_len < PAGE_SIZE || req_start + req_len > phys_len) > return -EINVAL; > > - if (index == vdev->msix_bar) { > + if (index == vdev->msix_bar && !vfio_msix_table_mmap_enabled()) { > /* > * Disallow mmaps overlapping the MSI-X table; users don't > * get to touch this directly. We could find somewhere > diff --git a/drivers/vfio/pci/vfio_pci_private.h > b/drivers/vfio/pci/vfio_pci_private.h > index 319352a..835619e 100644 > --- a/drivers/vfio/pci/vfio_pci_private.h > +++ b/drivers/vfio/pci/vfio_pci_private.h > @@ -74,6 +74,11 @@ static inline bool vfio_pci_bar_page_aligned(void) > return IS_ENABLED(CONFIG_PPC64); > } > > +static inline bool vfio_msix_table_mmap_enabled(void) > +{ > + return IS_ENABLED(CONFIG_EEH); > +} I really dislike these. > + > extern void vfio_pci_intx_mask(struct vfio_pci_device *vdev); > extern void vfio_pci_intx_unmask(struct vfio_pci_device *vdev); > > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h > index 1fc8066..289e662 100644 > --- a/include/uapi/linux/vfio.h > +++ b/include/uapi/linux/vfio.h > @@ -173,6 +173,8 @@ struct vfio_device_info { > #define VFIO_DEVICE_FLAGS_AMBA (1 << 3) /* vfio-amba device */ > /* Platform support all PCI MMIO BARs to be page aligned */ > #define VFIO_DEVICE_FLAGS_PCI_PAGE_ALIGNED (1 << 4) > +/* Platform support mmapping PCI MSI-X vector table */ > +#define VFIO_DEVICE_FLAGS_PCI_MSIX_MMAP (1 << 5) Again, not sure why this is on the device versus the region, but I'd prefer to investigate whether we can handle this with the sparse mmap capability (or lack of) in the capability chains I proposed[1]. Thanks, Alex [1] https://lkml.org/lkml/2015/11/23/748 > __u32 num_regions;/* Max region index + 1 */ > __u32 num_irqs; /* Max IRQ index + 1 */ > }; -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html