Re: [PATCH v2 0/3] Introduce MSI hardware mapping for VFIO

2015-12-03 Thread Alex Williamson
On Thu, 2015-12-03 at 16:16 +0300, Pavel Fedin wrote:
>  Hello!
> 
> > I like that you're making this transparent
> > for the user, but at the same time, directly calling function pointers
> > through the msi_domain_ops is quite ugly.
> 
>  Do you mean dereferencing info->ops->vfio_map() in .c code? I can
> introduce some wrappers in include/linux/msi.h like
> msi_domain_vfio_map()/msi_domain_vfio_unmap(), this would not
> conceptually change anything.

But otherwise we're parsing data structures that are possibly intended
to be private.  An interface also abstracts the implementation from the
caller.

> >  There needs to be a real, interface there that isn't specific to
> vfio.
> 
>  Hm... What else is going to use this?

I don't know, but pushing vfio specific data structures and concepts
into core kernel callbacks is clearly the wrong direction to go.

>  Actually, in my implementation the only thing specific to vfio is
> using struct vfio_iommu_driver_ops. This is because we have to perform
> MSI mapping for all "vfio domains" registered for this container. At
> least this is how original type1 driver works.
>  Can anybody explain me, what these "vfio domains" are? From the code
> it looks like we can have several IOMMU instances belonging to one
> VFIO container, and in this case one IOMMU == one "vfio domain". So is
> my understanding correct that "vfio domain" is IOMMU instance?

There's no such thing as a vfio domain, I think you mean iommu domains.
A vfio container represents a user iommu context.  All of the groups
(and thus devices) within a container have the same iommu mappings.
However, not all of the groups are necessarily behind iommu hardware
units that support the same set of features.  We might therefore need to
mirror the user iommu context for the container across multiple physical
iommu contexts (aka domains).  When we walk the iommu->domain_list,
we're mirroring mappings across these multiple iommu domains within the
container.

>  And here come completely different ideas...
>  First of all, can anybody explain, why do i perform all mappings on
> per-IOMMU basis, not on per-device basis? AFAIK at least ARM SMMU
> knows about "stream IDs", and therefore it should be capable of
> distinguishing between different devices. So can i have per-device
> mapping? This would make things much simpler.

vfio is built on iommu groups with the premise being that an iommu group
represents the smallest set of devices that are isolated from all other
devices both by iommu visibility and by DMA isolation (peer-to-peer).
Therefore we base iommu mappings on an iommu group because we cannot
enforce userspace isolation at a sub-group level.  In a system or
topology that is well architected for device isolation, there will be a
one-to-one mapping of iommu groups to devices.

So, iommu mappings are always on a per-group basis, but the user may
attach multiple groups to a single container, which as discussed above
represents a single iommu context.  That single context may be backed by
one or more iommu domains, depending on the capabilities of the iommu
hardware.  You're therefore not performing all mappings on a per-iommu
basis unless you have a user defined container spanning multiple iommus
which are incompatible in a way that requires us to manage them with
separate iommu domains.

The iommu's ability to do per device mappings here is irrelevant.
You're working within a user defined IOMMU context where they have
decided that all of the devices should have the same context.

>  So:
>  Idea 1: do per-device mappings. In this case i don't have to track
> down which devices belong to which group and which IOMMU...

Nak, that doesn't solve anything.

>  Idea 2: What if we indeed simply simulate x86 behavior? What if we
> just do 1:1 mapping for MSI register when IOMMU is initialized and
> forget about it, so that MSI messages are guaranteed to reach the
> host? Or would this mean that we would have to do 1:1 mapping for the
> whole address range? Looks like (1) tried to do something similar,
> with address reservation.

x86 isn't problem-free in this space.  An x86 VM is going to know that
the 0xfee0 address range is special, it won't be backed by RAM and
won't be a DMA target, thus we'll never attempt to map it for an iova
address.  However, if we run a non-x86 VM or a userspace driver, it
doesn't necessarily know that there's anything special about that range
of iovas.  I intend to resolve this with an extension to the iommu info
ioctl that describes the available iova space for the iommu.  The
interrupt region would simply be excluded.

This may be an option for you too, but you need to consider whether it
precludes things like hotplug.  Even in the x86 case, if we have a
non-x86 VM and we try to hot-add a PCI device, we can't dynamically
remove the RAM that would interfere with with the MSI vector block.  I
don't know what that looks like on your platform, whether you can pick a
fixed range for the V

Re: [PATCH v2 0/3] Introduce MSI hardware mapping for VFIO

2015-12-03 Thread Marc Zyngier
On Thu, 3 Dec 2015 16:16:54 +0300
Pavel Fedin  wrote:

Pavel,

>  Hello!
> 
> > I like that you're making this transparent
> > for the user, but at the same time, directly calling function pointers
> > through the msi_domain_ops is quite ugly.
> 
>  Do you mean dereferencing info->ops->vfio_map() in .c code? I can introduce 
> some wrappers in include/linux/msi.h like 
> msi_domain_vfio_map()/msi_domain_vfio_unmap(), this would not conceptually 
> change anything.
> 
> >  There needs to be a real, interface there that isn't specific to vfio.
> 
>  Hm... What else is going to use this?
>  Actually, in my implementation the only thing specific to vfio is
> using struct vfio_iommu_driver_ops. This is because we have to
> perform MSI mapping for all "vfio domains" registered for this
> container. At least this is how original type1 driver works.

>  Can anybody explain me, what these "vfio domains" are? From the code
> it looks like we can have several IOMMU instances belonging to one
> VFIO container, and in this case one IOMMU == one "vfio domain". So
> is my understanding correct that "vfio domain" is IOMMU instance?

>  And here come completely different ideas...

>  First of all, can anybody explain, why do i perform all mappings on
> per-IOMMU basis, not on per-device basis? AFAIK at least ARM SMMU
> knows about "stream IDs", and therefore it should be capable of
> distinguishing between different devices. So can i have per-device
> mapping? This would make things much simpler.

Not exactly. You can have a a bunch of devices between a single
StreamID (making them completely indistinguishable). This is a system
integration decision.

>  So:
>  Idea 1: do per-device mappings. In this case i don't have to track
> down which devices belong to which group and which IOMMU...

As explained above, this doesn't work.

>  Idea 2: What if we indeed simply simulate x86 behavior? What if we
> just do 1:1 mapping for MSI register when IOMMU is initialized and
> forget about it, so that MSI messages are guaranteed to reach the
> host? Or would this mean that we would have to do 1:1 mapping for the
> whole address range? Looks like (1) tried to do something similar,
> with address reservation.

Neither. We want userspace to be in control of the memory map, and it
is the kernel's job to tell us whether or not this matches the HW
capabilities or not. A fixed mapping may completely clash with the
memory map I want (think emulating HW x on platform y), and there is no
reason why we should have the restrictions x86 has.

My understanding is that userspace should be in charge here, and maybe
obtain a range of acceptable IOVAs for the MSI doorbell to be mapped.

Thanks,

M.
-- 
Jazz is not dead. It just smells funny.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v2 0/3] Introduce MSI hardware mapping for VFIO

2015-12-03 Thread Pavel Fedin
 Hello!

> I like that you're making this transparent
> for the user, but at the same time, directly calling function pointers
> through the msi_domain_ops is quite ugly.

 Do you mean dereferencing info->ops->vfio_map() in .c code? I can introduce 
some wrappers in include/linux/msi.h like 
msi_domain_vfio_map()/msi_domain_vfio_unmap(), this would not conceptually 
change anything.

>  There needs to be a real, interface there that isn't specific to vfio.

 Hm... What else is going to use this?
 Actually, in my implementation the only thing specific to vfio is using struct 
vfio_iommu_driver_ops. This is because we have to perform MSI mapping for all 
"vfio domains" registered for this container. At least this is how original 
type1 driver works.
 Can anybody explain me, what these "vfio domains" are? From the code it looks 
like we can have several IOMMU instances belonging to one VFIO container, and 
in this case one IOMMU == one "vfio domain". So is my understanding correct 
that "vfio domain" is IOMMU instance?
 And here come completely different ideas...
 First of all, can anybody explain, why do i perform all mappings on per-IOMMU 
basis, not on per-device basis? AFAIK at least ARM SMMU knows about "stream 
IDs", and therefore it should be capable of distinguishing between different 
devices. So can i have per-device mapping? This would make things much simpler.
 So:
 Idea 1: do per-device mappings. In this case i don't have to track down which 
devices belong to which group and which IOMMU...
 Idea 2: What if we indeed simply simulate x86 behavior? What if we just do 1:1 
mapping for MSI register when IOMMU is initialized and forget about it, so that 
MSI messages are guaranteed to reach the host? Or would this mean that we would 
have to do 1:1 mapping for the whole address range? Looks like (1) tried to do 
something similar, with address reservation.
 Idea 3: Is single device guaranteed to correspond to a single "vfio domain" 
(and, as a consequence, to a single IOMMU)? In this case it's very easy to 
unlink interface introduced by 0002 of my series from vfio, and pass just raw 
struct iommu_domain * without any driver_ops? irqchip code would only need 
iommu_map() and iommu_unmap() then, no calling back to vfio layer.

Cc'ed to authors of all mentioned series

> [1] http://www.spinics.net/lists/kvm/msg121669.html,
> http://www.spinics.net/lists/kvm/msg121662.html
> [2] http://www.spinics.net/lists/kvm/msg119236.html

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 0/3] Introduce MSI hardware mapping for VFIO

2015-12-02 Thread Alex Williamson
On Tue, 2015-11-24 at 16:50 +0300, Pavel Fedin wrote:
> On some architectures (e.g. ARM64) if the device is behind an IOMMU, and
> is being mapped by VFIO, it is necessary to also add mappings for MSI
> translation register for interrupts to work. This series implements the
> necessary API to do this, and makes use of this API for GICv3 ITS on
> ARM64.
> 
> v1 => v2:
> - Adde dependency on CONFIG_GENERIC_MSI_IRQ_DOMAIN in some parts of the
>   code, should fix build without this option
> 
> Pavel Fedin (3):
>   vfio: Introduce map and unmap operations
>   gicv3, its: Introduce VFIO map and unmap operations
>   vfio: Introduce generic MSI mapping operations
> 
>  drivers/irqchip/irq-gic-v3-its.c   |  31 ++
>  drivers/vfio/pci/vfio_pci_intrs.c  |  11 
>  drivers/vfio/vfio.c| 116 
> +
>  drivers/vfio/vfio_iommu_type1.c|  29 ++
>  include/linux/irqchip/arm-gic-v3.h |   2 +
>  include/linux/msi.h|  12 
>  include/linux/vfio.h   |  17 +-
>  7 files changed, 217 insertions(+), 1 deletion(-)


Some good points and bad.  I like that you're making this transparent
for the user, but at the same time, directly calling function pointers
through the msi_domain_ops is quite ugly.  There needs to be a real,
interface there that isn't specific to vfio.  The down side of making it
transparent to the user is that parts of their IOVA space are being
claimed and they have no way to figure out what they are.  In fact, the
IOMMU mappings bypass the rb-tree that the type1 driver uses, so these
mappings might stomp on existing mappings for the user or the user might
stomp on these.  Neither of which would be much fun to debug.

There have been previous efforts to support MSI mapping in VFIO[1,2],
but none of them have really gone anywhere.  Whatever solution we use
needs to support everyone who needs it.  Thanks,

Alex

[1] http://www.spinics.net/lists/kvm/msg121669.html, 
http://www.spinics.net/lists/kvm/msg121662.html
[2] http://www.spinics.net/lists/kvm/msg119236.html

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 0/3] Introduce MSI hardware mapping for VFIO

2015-11-24 Thread Pavel Fedin
On some architectures (e.g. ARM64) if the device is behind an IOMMU, and
is being mapped by VFIO, it is necessary to also add mappings for MSI
translation register for interrupts to work. This series implements the
necessary API to do this, and makes use of this API for GICv3 ITS on
ARM64.

v1 => v2:
- Adde dependency on CONFIG_GENERIC_MSI_IRQ_DOMAIN in some parts of the
  code, should fix build without this option

Pavel Fedin (3):
  vfio: Introduce map and unmap operations
  gicv3, its: Introduce VFIO map and unmap operations
  vfio: Introduce generic MSI mapping operations

 drivers/irqchip/irq-gic-v3-its.c   |  31 ++
 drivers/vfio/pci/vfio_pci_intrs.c  |  11 
 drivers/vfio/vfio.c| 116 +
 drivers/vfio/vfio_iommu_type1.c|  29 ++
 include/linux/irqchip/arm-gic-v3.h |   2 +
 include/linux/msi.h|  12 
 include/linux/vfio.h   |  17 +-
 7 files changed, 217 insertions(+), 1 deletion(-)

-- 
2.4.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html