Re: [PATCH v13 01/15] iommu: Introduce attach/detach_pasid_table API

2020-11-18 Thread Jacob Pan
Hi Eric,

On Wed, 18 Nov 2020 12:21:37 +0100, Eric Auger 
wrote:

> In the virtualization use case, when a guest is assigned
> a PCI host device, protected by a virtual IOMMU on the guest,
> the physical IOMMU must be programmed to be consistent with
> the guest mappings. If the physical IOMMU supports two
> translation stages it makes sense to program guest mappings
> onto the first stage/level (ARM/Intel terminology) while the host
> owns the stage/level 2.
> 
> In that case, it is mandated to trap on guest configuration
> settings and pass those to the physical iommu driver.
> 
> This patch adds a new API to the iommu subsystem that allows
> setting/unsetting the pasid table information.
> 
> A generic iommu_pasid_table_config struct is introduced in
> a new iommu.h uapi header. This is going to be used by the VFIO
> user API.
> 
> Signed-off-by: Jean-Philippe Brucker 
> Signed-off-by: Liu, Yi L 
> Signed-off-by: Ashok Raj 
> Signed-off-by: Jacob Pan 
> Signed-off-by: Eric Auger 
> 
> ---
> 
> v12 -> v13:
> - Fix config check
> 
> v11 -> v12:
> - add argsz, name the union
> ---
>  drivers/iommu/iommu.c  | 68 ++
>  include/linux/iommu.h  | 21 
>  include/uapi/linux/iommu.h | 54 ++
>  3 files changed, 143 insertions(+)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index b53446bb8c6b..978fe34378fb 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -2171,6 +2171,74 @@ int iommu_uapi_sva_unbind_gpasid(struct iommu_domain *domain, struct device *dev
>  }
>  EXPORT_SYMBOL_GPL(iommu_uapi_sva_unbind_gpasid);
>  
> +int iommu_attach_pasid_table(struct iommu_domain *domain,
> +  struct iommu_pasid_table_config *cfg)
> +{
> + if (unlikely(!domain->ops->attach_pasid_table))
> + return -ENODEV;
> +
> + return domain->ops->attach_pasid_table(domain, cfg);
> +}
> +
> +int iommu_uapi_attach_pasid_table(struct iommu_domain *domain,
> +   void __user *uinfo)
> +{
> + struct iommu_pasid_table_config pasid_table_data = { 0 };
> + u32 minsz;
> +
> + if (unlikely(!domain->ops->attach_pasid_table))
> + return -ENODEV;
> +
> + /*
> +  * No new spaces can be added before the variable sized union, the
> +  * minimum size is the offset to the union.
> +  */
> + minsz = offsetof(struct iommu_pasid_table_config, vendor_data);
> +
> + /* Copy minsz from user to get flags and argsz */
> + if (copy_from_user(&pasid_table_data, uinfo, minsz))
> + return -EFAULT;
> +
> + /* Fields before the variable size union are mandatory */
> + if (pasid_table_data.argsz < minsz)
> + return -EINVAL;
> +
> + /* PASID and address granu require additional info beyond minsz */
> + if (pasid_table_data.version != PASID_TABLE_CFG_VERSION_1)
> + return -EINVAL;
> + if (pasid_table_data.format == IOMMU_PASID_FORMAT_SMMUV3 &&
> +     pasid_table_data.argsz <
> +     offsetofend(struct iommu_pasid_table_config, vendor_data.smmuv3))
> + return -EINVAL;
> +
> + /*
> +  * User might be using a newer UAPI header which has a larger data
> +  * size, we shall support the existing flags within the current
> +  * size. Copy the remaining user data _after_ minsz but not more
> +  * than the current kernel supported size.
> +  */
> + if (copy_from_user((void *)&pasid_table_data + minsz, uinfo + minsz,
> +        min_t(u32, pasid_table_data.argsz,
> +              sizeof(pasid_table_data)) - minsz))
> + return -EFAULT;
> +
> + /* Now the argsz is validated, check the content */
> + if (pasid_table_data.config < IOMMU_PASID_CONFIG_TRANSLATE ||
> + pasid_table_data.config > IOMMU_PASID_CONFIG_ABORT)
> + return -EINVAL;
> +
> + return domain->ops->attach_pasid_table(domain, &pasid_table_data);
> +}
> +EXPORT_SYMBOL_GPL(iommu_uapi_attach_pasid_table);
> +
> +void iommu_detach_pasid_table(struct iommu_domain *domain)
> +{
> + if (unlikely(!domain->ops->detach_pasid_table))
> + return;
> +
> + domain->ops->detach_pasid_table(domain);
> +}
> +EXPORT_SYMBOL_GPL(iommu_detach_pasid_table);
> +
>  static void __iommu_detach_device(struct iommu_domain *domain,
> struct device *dev)
>  {
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index b95a6f8db6ff..4
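
For reference, a user of this new UAPI (e.g. QEMU trapping the guest's PASID-table
configuration and handing it to VFIO) would fill the generic struct roughly as below.
This is only a sketch using the fields visible in the excerpts above; the vendor_data
layout and the VFIO wrapping are not shown in this digest and are only hinted at in
the comments.

  /* Sketch: userspace fills the generic PASID table config (SMMUv3 flavour). */
  struct iommu_pasid_table_config cfg = {
          .argsz   = sizeof(cfg),                  /* size userspace was built with */
          .version = PASID_TABLE_CFG_VERSION_1,
          .format  = IOMMU_PASID_FORMAT_SMMUV3,
          .config  = IOMMU_PASID_CONFIG_TRANSLATE, /* or IOMMU_PASID_CONFIG_ABORT */
  };
  /*
   * cfg.vendor_data.smmuv3.* would carry the stage-1 context-descriptor table
   * details trapped from the guest vIOMMU (fields not shown in the excerpt
   * above), and the whole struct would then be wrapped in the VFIO
   * set-PASID-table ioctl argument discussed later in this digest.
   */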

Re: [PATCH v11 01/13] iommu: Introduce attach/detach_pasid_table API

2020-04-15 Thread Jacob Pan
On Wed, 15 Apr 2020 16:52:10 +0200
Auger Eric  wrote:

> Hi Jacob,
> On 4/15/20 12:15 AM, Jacob Pan wrote:
> > Hi Eric,
> > 
> > There are some discussions about how to size the uAPI data.
> > https://lkml.org/lkml/2020/4/14/939
> > 
> > I think the problem with the current scheme is that when uAPI data
> > gets extended, if VFIO continues to use:
> > 
> > minsz = offsetofend(struct vfio_iommu_type1_set_pasid_table,
> > config); if (copy_from_user(&spt, (void __user *)arg, minsz))
> > 
> > It may copy more data from the user than what was set up by the user.
> > 
> > So, as suggested by Alex, we could add argsz to the IOMMU uAPI
> > struct. If argsz > minsz, then fail the attach_table, since the
> > kernel might be old and not know about the extra data.
> > If argsz <= minsz, the kernel can support the attach_table but must
> > process the data based on flags or config.
> 
> So I guess we would need both an argsz __u32 + a new flags __u32, right?
> 
Yes.
> I am ok with that idea. Besides how will you manage for existing IOMMU
> UAPIs?
I plan to add argsz and flags (if a struct does not already have them).

> At some point you envisioned having a getter at the iommu api
> level to retrieve the size of a structure for a given version, right?
> 
That idea was shot down; there is no version-size lookup.
So the current plan is for the user to fill out argsz in each IOMMU uAPI
struct. VFIO does the copy_from_user() based on argsz (sanitized
against the size of the current kernel struct).

The IOMMU vendor driver then processes the data based on flags, which
indicate new capabilities/extensions.
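
As a concrete sketch of that plan (the v13 hunk quoted at the top of this digest
implements essentially the same pattern), the sanitized copy on the kernel side
would look roughly like this; the wrapper function name is illustrative, not part
of the series:

  /*
   * Sketch of the argsz-sanitized copy: trust the user-provided argsz only
   * up to the size the running kernel knows about.
   */
  static int example_attach_pasid_table_user(struct iommu_domain *domain,
                                             void __user *uarg)
  {
          struct iommu_pasid_table_config cfg = { 0 };
          u32 minsz = offsetof(struct iommu_pasid_table_config, vendor_data);

          if (copy_from_user(&cfg, uarg, minsz))
                  return -EFAULT;
          if (cfg.argsz < minsz)          /* fields before the union are mandatory */
                  return -EINVAL;

          /* Copy the rest, but never more than the kernel's struct size. */
          if (copy_from_user((void *)&cfg + minsz, uarg + minsz,
                             min_t(u32, cfg.argsz, sizeof(cfg)) - minsz))
                  return -EFAULT;

          return iommu_attach_pasid_table(domain, &cfg);
  }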

> Thanks
> 
> Eric
> > 
> > Does it make sense to you?
> > 
> > 
> > On Tue, 14 Apr 2020 17:05:55 +0200
> > Eric Auger  wrote:
> >   
> >> From: Jacob Pan 
> >>
> >> In the virtualization use case, when a guest is assigned
> >> a PCI host device, protected by a virtual IOMMU on the guest,
> >> the physical IOMMU must be programmed to be consistent with
> >> the guest mappings. If the physical IOMMU supports two
> >> translation stages it makes sense to program guest mappings
> >> onto the first stage/level (ARM/Intel terminology) while the host
> >> owns the stage/level 2.
> >>
> >> In that case, it is mandated to trap on guest configuration
> >> settings and pass those to the physical iommu driver.
> >>
> >> This patch adds a new API to the iommu subsystem that allows
> >> setting/unsetting the pasid table information.
> >>
> >> A generic iommu_pasid_table_config struct is introduced in
> >> a new iommu.h uapi header. This is going to be used by the VFIO
> >> user API.
> >>
> >> Signed-off-by: Jean-Philippe Brucker
> >>  Signed-off-by: Liu, Yi L
> >>  Signed-off-by: Ashok Raj
> >>  Signed-off-by: Jacob Pan
> >>  Signed-off-by: Eric Auger
> >>  Reviewed-by: Jean-Philippe Brucker
> >>  ---
> >>  drivers/iommu/iommu.c  | 19 ++
> >>  include/linux/iommu.h  | 18 ++
> >>  include/uapi/linux/iommu.h | 51 ++
> >>  3 files changed, 88 insertions(+)
> >>
> >> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> >> index 2b471419e26c..b71ad56f8c99 100644
> >> --- a/drivers/iommu/iommu.c
> >> +++ b/drivers/iommu/iommu.c
> >> @@ -1723,6 +1723,25 @@ int iommu_sva_unbind_gpasid(struct iommu_domain *domain, struct device *dev,
> >>  }
> >>  EXPORT_SYMBOL_GPL(iommu_sva_unbind_gpasid);
> >>  
> >> +int iommu_attach_pasid_table(struct iommu_domain *domain,
> >> +   struct iommu_pasid_table_config *cfg)
> >> +{
> >> +  if (unlikely(!domain->ops->attach_pasid_table))
> >> +  return -ENODEV;
> >> +
> >> +  return domain->ops->attach_pasid_table(domain, cfg);
> >> +}
> >> +EXPORT_SYMBOL_GPL(iommu_attach_pasid_table);
> >> +
> >> +void iommu_detach_pasid_table(struct iommu_domain *domain)
> >> +{
> >> +  if (unlikely(!domain->ops->detach_pasid_table))
> >> +  return;
> >> +
> >> +  domain->ops->detach_pasid_table(domain);
> >> +}
> >> +EXPORT_SYMBOL_GPL(iommu_detach_pasid_table);
> >> +
> >>  static void __iommu_detach_device(struct iommu_domain *domain,
> >>  struct device *dev)
> >>  {
> >> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> >> index

Re: [PATCH v11 01/13] iommu: Introduce attach/detach_pasid_table API

2020-04-14 Thread Jacob Pan
Hi Eric,

There are some discussions about how to size the uAPI data.
https://lkml.org/lkml/2020/4/14/939

I think the problem with the current scheme is that when uAPI data gets
extended, if VFIO continues to use:

minsz = offsetofend(struct vfio_iommu_type1_set_pasid_table, config);
if (copy_from_user(&spt, (void __user *)arg, minsz))

It may copy more data from the user than what was set up by the user.

So, as suggested by Alex, we could add argsz to the IOMMU uAPI struct.
If argsz > minsz, then fail the attach_table, since the kernel might be
old and not know about the extra data.
If argsz <= minsz, the kernel can support the attach_table but must process
the data based on flags or config.

Does it make sense to you?


On Tue, 14 Apr 2020 17:05:55 +0200
Eric Auger  wrote:

> From: Jacob Pan 
> 
> In the virtualization use case, when a guest is assigned
> a PCI host device, protected by a virtual IOMMU on the guest,
> the physical IOMMU must be programmed to be consistent with
> the guest mappings. If the physical IOMMU supports two
> translation stages it makes sense to program guest mappings
> onto the first stage/level (ARM/Intel terminology) while the host
> owns the stage/level 2.
> 
> In that case, it is mandated to trap on guest configuration
> settings and pass those to the physical iommu driver.
> 
> This patch adds a new API to the iommu subsystem that allows
> setting/unsetting the pasid table information.
> 
> A generic iommu_pasid_table_config struct is introduced in
> a new iommu.h uapi header. This is going to be used by the VFIO
> user API.
> 
> Signed-off-by: Jean-Philippe Brucker 
> Signed-off-by: Liu, Yi L 
> Signed-off-by: Ashok Raj 
> Signed-off-by: Jacob Pan 
> Signed-off-by: Eric Auger 
> Reviewed-by: Jean-Philippe Brucker 
> ---
>  drivers/iommu/iommu.c  | 19 ++
>  include/linux/iommu.h  | 18 ++
>  include/uapi/linux/iommu.h | 51 ++
>  3 files changed, 88 insertions(+)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 2b471419e26c..b71ad56f8c99 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -1723,6 +1723,25 @@ int iommu_sva_unbind_gpasid(struct iommu_domain *domain, struct device *dev,
>  }
>  EXPORT_SYMBOL_GPL(iommu_sva_unbind_gpasid);
>  
> +int iommu_attach_pasid_table(struct iommu_domain *domain,
> +  struct iommu_pasid_table_config *cfg)
> +{
> + if (unlikely(!domain->ops->attach_pasid_table))
> + return -ENODEV;
> +
> + return domain->ops->attach_pasid_table(domain, cfg);
> +}
> +EXPORT_SYMBOL_GPL(iommu_attach_pasid_table);
> +
> +void iommu_detach_pasid_table(struct iommu_domain *domain)
> +{
> + if (unlikely(!domain->ops->detach_pasid_table))
> + return;
> +
> + domain->ops->detach_pasid_table(domain);
> +}
> +EXPORT_SYMBOL_GPL(iommu_detach_pasid_table);
> +
>  static void __iommu_detach_device(struct iommu_domain *domain,
> struct device *dev)
>  {
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 7ef8b0bda695..3e1057c3585a 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -248,6 +248,8 @@ struct iommu_iotlb_gather {
>   * @cache_invalidate: invalidate translation caches
>   * @sva_bind_gpasid: bind guest pasid and mm
>   * @sva_unbind_gpasid: unbind guest pasid and mm
> + * @attach_pasid_table: attach a pasid table
> + * @detach_pasid_table: detach the pasid table
>   * @pgsize_bitmap: bitmap of all possible supported page sizes
>   * @owner: Driver module providing these ops
>   */
> @@ -307,6 +309,9 @@ struct iommu_ops {
> void *drvdata);
>   void (*sva_unbind)(struct iommu_sva *handle);
>   int (*sva_get_pasid)(struct iommu_sva *handle);
> + int (*attach_pasid_table)(struct iommu_domain *domain,
> +                           struct iommu_pasid_table_config *cfg);
> + void (*detach_pasid_table)(struct iommu_domain *domain);
>  
>   int (*page_response)(struct device *dev,
>struct iommu_fault_event *evt,
> @@ -446,6 +451,9 @@ extern int iommu_sva_bind_gpasid(struct iommu_domain *domain, struct device *dev,
>                                   struct iommu_gpasid_bind_data *data);
>  extern int iommu_sva_unbind_gpasid(struct iommu_domain *domain, struct device *dev, ioasid_t pasid);
> +extern int iommu_attach_pasid_table(struct iommu_domain *domain,
> +                                    struct iommu_pasid_table_config *cfg);
> +extern void iommu_detach_pasid_table(struct iommu_domain *domain);
>  extern struct iommu_domain *iommu_get_domain_for_dev(struct de

Re: [PATCH v8 26/29] vfio-pci: Register an iommu fault handler

2019-06-18 Thread Jacob Pan
On Tue, 18 Jun 2019 15:04:36 +0100
Jean-Philippe Brucker  wrote:

> On 12/06/2019 19:53, Jacob Pan wrote:
> >>> You are right, the worst case of the spurious PS is to terminate
> >>> the group prematurely. Need to know the scope of the HW damage in
> >>> case of mdev where group IDs can be shared among mdevs belong to
> >>> the same PF.
> >>
> >> But from the IOMMU fault API point of view, the full page request
> >> is identified by both PRGI and PASID. Given that each mdev has its
> >> own set of PASIDs, it should be easy to isolate page responses per
> >> mdev. 
> > On Intel platforms, devices sending a page request with private data
> > must receive a page response with matching private data. If we solely
> > depend on PRGI and PASID, we may send stale private data to the
> > device in an incorrect page response. Since private data may
> > represent PF-device-wide contexts, the consequence of sending a page
> > response with the wrong private data may affect other mdevs/PASIDs.
> > 
> > One solution we are thinking of is to inject a sequence # (e.g.
> > ktime raw mono clock) as vIOMMU private data into the guest.
> > The guest would return this fake private data in the page response, then
> > the host will send the page response back to the device, matching PRGI,
> > PASID and private_data.
> > 
> > This solution does not expose HW-context-related private data to the
> > guest but needs to extend the page response in the iommu uapi.
> > 
> > /**
> >  * struct iommu_page_response - Generic page response information
> >  * @version: API version of this structure
> >  * @flags: encodes whether the corresponding fields are valid
> >  * (IOMMU_FAULT_PAGE_RESPONSE_* values)
> >  * @pasid: Process Address Space ID
> >  * @grpid: Page Request Group Index
> >  * @code: response code from &enum iommu_page_response_code
> >  * @private_data: private data for the matching page request
> >  */
> > struct iommu_page_response {
> > #define IOMMU_PAGE_RESP_VERSION_1   1
> > __u32   version;
> > #define IOMMU_PAGE_RESP_PASID_VALID (1 << 0)
> > #define IOMMU_PAGE_RESP_PRIVATE_DATA(1 << 1)
> > __u32   flags;
> > __u32   pasid;
> > __u32   grpid;
> > __u32   code;
> > __u32   padding;
> > __u64   private_data[2];
> > };
> > 
> > There is also the change needed for separating storage for the real
> > and fake private data.
> > 
> > Sorry for the last minute change, did not realize the HW
> > implications.
> > 
> > I see this as a future extension due to limited testing,   
> 
> I'm wondering how we deal with:
> (1) old userspace that won't fill the new private_data field in
> page_response. A new kernel still has to support it.
> (2) old kernel that won't recognize the new PRIVATE_DATA flag.
> Currently iommu_page_response() rejects page responses with unknown
> flags.
> 
> I guess we'll need a two-way negotiation, where userspace queries
> whether the kernel supports the flag (2), and the kernel learns
> whether it should expect the private data to come back (1).
> 
I am not sure case (1) exists, in that there is no existing user space
that supports PRQ w/o private data. Am I missing something?

For VT-d emulation, private data is always part of the scalable-mode
PASID capability. If the vIOMMU queries the host and the host supports
PASID and scalable mode, it will always support private data once PRQ
is enabled.

So I think we only need to negotiate (2), which should be covered by
the VT-d PASID cap.
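
To make the proposal quoted above more concrete, the host-side bookkeeping for
the fake private data could be sketched like this. Every name here is
illustrative (this is not code from the series), it assumes the extended
iommu_page_response with private_data shown earlier in the thread, and locking
is elided:

  /*
   * Sketch only: when a PRQ is injected into the guest, remember the device's
   * real private data under a fake cookie; when the guest's page response
   * comes back, swap the real value in before forwarding it to the hardware.
   */
  struct prq_priv_entry {
          struct list_head node;
          u32 pasid;
          u32 grpid;
          u64 fake_cookie;        /* e.g. ktime_get_raw_ns() at injection time */
          u64 real_priv[2];       /* device-provided private data */
  };

  static bool swap_in_real_priv(struct list_head *pending,
                                struct iommu_page_response *resp)
  {
          struct prq_priv_entry *e;

          list_for_each_entry(e, pending, node) {
                  if (e->pasid == resp->pasid && e->grpid == resp->grpid &&
                      e->fake_cookie == resp->private_data[0]) {
                          resp->private_data[0] = e->real_priv[0];
                          resp->private_data[1] = e->real_priv[1];
                          return true;
                  }
          }
          return false;   /* stale/spurious response: drop it */
  }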

> > perhaps for
> > now, can you add paddings similar to page request? Make it 64B as
> > well.  
> 
> I don't think padding is necessary, because iommu_page_response is
> sent by userspace to the kernel, unlike iommu_fault which is
> allocated by userspace and filled by the kernel.
> 
> Page response looks a lot more like existing VFIO mechanisms, so I
> suppose we'll wrap the iommu_page_response structure and include an
> argsz parameter at the top:
> 
>   struct vfio_iommu_page_response {
>   u32 argsz;
>   struct iommu_page_response pr;
>   };
> 
>   struct vfio_iommu_page_response vpr = {
>   .argsz = sizeof(vpr),
>   .pr = ...
>   ...
>   };
> 
>   ioctl(devfd, VFIO_IOMMU_PAGE_RESPONSE, &vpr);
> 
> In that case supporting private data can be done by simply appending a
> field at the end (plus the negotiation above).
> 
Do you mean at the end of struct vfio_iommu_page_response{}, or at
the end of struct iommu_page_response{}?

The consumer of the private data is the iommu driver, not vfio. So I think
you want to add the new field at the end of struct iommu_page_response,
right? I think that would work; just wanted to clarify.


Re: [PATCH v8 26/29] vfio-pci: Register an iommu fault handler

2019-06-12 Thread Jacob Pan
On Tue, 11 Jun 2019 14:14:33 +0100
Jean-Philippe Brucker  wrote:

> On 10/06/2019 22:31, Jacob Pan wrote:
> > On Mon, 10 Jun 2019 13:45:02 +0100
> > Jean-Philippe Brucker  wrote:
> >   
> >> On 07/06/2019 18:43, Jacob Pan wrote:  
> >>>>> So it seems we agree on the following:
> >>>>> - iommu_unregister_device_fault_handler() will never fail
> >>>>> - iommu driver cleans up all pending faults when handler is
> >>>>> unregistered
> >>>>> - assume device driver or guest not sending more page response
> >>>>> _after_ handler is unregistered.
> >>>>> - system will tolerate rare spurious response
> >>>>>
> >>>>> Sounds right?  
> >>>>
> >>>> Yes, I'll add that to the fault series
> >>> Hold on a second please, I think we need more clarifications.
> >>> Ashok pointed out to me that the spurious response can be harmful
> >>> to other devices when it comes to mdev, where PRQ group id is not
> >>> per PASID, device may reuse the group number and receiving
> >>> spurious page response can confuse the entire PF. 
> >>
> >> I don't understand how mdev differs from the non-mdev situation
> >> (but I also still don't fully get how mdev+PASID will be
> >> implemented). Is the following the case you're worried about?
> >>
> >>   M#: mdev #
> >>
> >> # Dev Hostmdev drv   VFIO/QEMUGuest
> >> 
> >> 1 <- reg(handler)
> >> 2 PR1 G1 P1-> M1 PR1 G1inject -> M1 PR1 G1
> >> 3 <- unreg(handler)
> >> 4   <- PS1 G1 P1 (F)  |
> >> 5unreg(handler)
> >> 6 <- reg(handler)
> >> 7 PR2 G1 P1-> M2 PR2 G1inject -> M2 PR2 G1
> >> 8 <- M1 PS1 G1
> >> 9 accept ??<- PS1 G1 P1
> >> 10<- M2 PS2 G1
> >> 11accept   <- PS2 G1 P1
> >>  
> > Not really. I am not worried about PASID reuse or unbind. Just
> > within the same PASID bind lifetime of a single mdev, back to back
> > register/unregister fault handler.
> > After Step 4, device will think G1 is done. Device could reuse G1
> > for the next PR, if we accept PS1 in step 9, device will terminate
> > G1 before the real G1 PS arrives in Step 11. The real G1 PS might
> > have a different response code. Then we just drop the PS in Step
> > 11?  
> 
> Yes, I think we do. Two possibilities:
> 
> * G1 is reused at step 7 for the same PASID context, which means that
> it is for the same mdev. The problem is then identical to the non-mdev
> case, new page faults and old page response may cross:
> 
> # Dev Hostmdev drv   VFIO/QEMUGuest
> 
> 7 PR2 G1 P1  --.
> 8   \ .- M1 PS1 G1
> 9'->  PR2 G1 P1  ->  /   inject  --> M1 PR2 G1
> 10   accept <---  PS1 G1 P1  <--'
> 11   reject <---  PS2 G1 P1  <-- M1 PS2 G1
> 
> And the incorrect page response is returned to the guest. However it
> affects a single mdev/guest context, it doesn't affect other mdevs.
> 
> * Or G1 is reused at step 7 for a different PASID. At step 10 the
> fault handler rejects the page response because the PASID is
> different, and step 11 is accepted.
> 
> 
> >>> Having spurious page response is also not
> >>> abiding the PCIe spec. exactly.
> >>
> >> We are following the PCI spec though, in that we don't send page
> >> responses for PRGIs that aren't in flight.
> >>  
> > You are right, the worst case of the spurious PS is to terminate the
> > group prematurely. Need to know the scope of the HW damage in case
> > of mdev where group IDs can be shared among mdevs belong to the
> > same PF.  
> 
> But from the IOMMU fault API point of view, the full page request is
> identified by both PRGI and PASID. Given that each mdev has its own
> set of PASIDs, it should be easy to isolate page responses per mdev.
> 
On Intel platform, devices sending page request with private data must
receive page response with ma

Re: [PATCH v8 26/29] vfio-pci: Register an iommu fault handler

2019-06-10 Thread Jacob Pan
On Mon, 10 Jun 2019 13:45:02 +0100
Jean-Philippe Brucker  wrote:

> On 07/06/2019 18:43, Jacob Pan wrote:
> >>> So it seems we agree on the following:
> >>> - iommu_unregister_device_fault_handler() will never fail
> >>> - iommu driver cleans up all pending faults when handler is
> >>> unregistered
> >>> - assume device driver or guest not sending more page response
> >>> _after_ handler is unregistered.
> >>> - system will tolerate rare spurious response
> >>>
> >>> Sounds right?
> >>
> >> Yes, I'll add that to the fault series  
> > Hold on a second please, I think we need more clarifications. Ashok
> > pointed out to me that the spurious response can be harmful to other
> > devices when it comes to mdev, where PRQ group id is not per PASID,
> > device may reuse the group number and receiving spurious page
> > response can confuse the entire PF.   
> 
> I don't understand how mdev differs from the non-mdev situation (but I
> also still don't fully get how mdev+PASID will be implemented). Is the
> following the case you're worried about?
> 
>   M#: mdev #
> 
> # Dev Hostmdev drv   VFIO/QEMUGuest
> 
> 1 <- reg(handler)
> 2 PR1 G1 P1-> M1 PR1 G1inject -> M1 PR1 G1
> 3 <- unreg(handler)
> 4   <- PS1 G1 P1 (F)  |
> 5unreg(handler)
> 6 <- reg(handler)
> 7 PR2 G1 P1-> M2 PR2 G1inject -> M2 PR2 G1
> 8 <- M1 PS1 G1
> 9 accept ??<- PS1 G1 P1
> 10<- M2 PS2 G1
> 11accept   <- PS2 G1 P1
> 
Not really. I am not worried about PASID reuse or unbind, just about
back-to-back register/unregister of the fault handler within the same
PASID bind lifetime of a single mdev.
After Step 4, the device will think G1 is done. The device could reuse G1
for the next PR; if we accept PS1 in step 9, the device will terminate G1
before the real G1 PS arrives in Step 11. The real G1 PS might have a
different response code. Then we just drop the PS in Step 11?

If the device does not reuse G1 immediately, the spurious response to
G1 will get dropped; no issue there.

> 
> Step 2 injects PR1 for mdev#1. Step 4 auto-responds to PR1. Between
> steps 5 and 6, we re-allocate PASID #1 for mdev #2. At step 7, we
> inject PR2 for mdev #2. Step 8 is the spurious Page Response for PR1.
> 
> But I don't think step 9 is possible, because the mdev driver knows
> that mdev #1 isn't using PASID #1 anymore. If the configuration is
> valid at all (a page response channel still exists for mdev #1), then
> mdev #1 now has a different PASID, e.g. #2, and step 9 would be "<-
> PS1 G1 P2" which is rejected by iommu.c (no such pending page
> request). And step 11 will be accepted.
> 
> If PASIDs are allocated through VCMD, then the situation seems
> similar: at step 2 you inject "M1 PR1 G1 P1" into the guest, and at
> step 8 the spurious response is "M1 PS1 G1 P1". If mdev #1 doesn't
> have PASID #1 anymore, then the mdev driver can check that the PASID
> is invalid and can reject the page response.
> 
> > Having spurious page response is also not
> > abiding the PCIe spec. exactly.  
> 
> We are following the PCI spec though, in that we don't send page
> responses for PRGIs that aren't in flight.
> 
You are right, the worst case of the spurious PS is to terminate the
group prematurely. We need to know the scope of the HW damage in the case
of mdev, where group IDs can be shared among mdevs belonging to the same PF.

> > We have two options here:
> > 1. unregister handler will get -EBUSY if outstanding fault exists.
> > -PROs: block offending device unbind only, eventually
> > timeout will clear.
> > -CONs: flooded faults can prevent clearing
> > 2. unregister handle will block until all faults are clear in the
> > host. Never fails unregistration  
> 
> Here the host completes the faults itself or wait for a response from
> the guest? I'm slightly confused by the word "blocking". I'd rather we
> don't introduce an uninterruptible sleep in the IOMMU core, since it's
> unlikely to ever finish if we rely on the guest to complete things.
> 
No uninterruptible sleep; I meant that unregister_handler is a sync call,
but it does not wait for the guest's response.
> > -PROs: simple flow for VFIO, no need to worry about device
> > ho

Re: [PATCH v8 26/29] vfio-pci: Register an iommu fault handler

2019-06-07 Thread Jacob Pan
On Fri, 7 Jun 2019 11:28:13 +0100
Jean-Philippe Brucker  wrote:

> On 06/06/2019 21:29, Jacob Pan wrote:
> >>>>>> iommu_unregister_device_fault_handler(&vdev->pdev->dev);  
> >>>>>
> >>>>>
> >>>>> But this can fail if there are pending faults which leaves a
> >>>>> device reference and then the system is broken :(  
> >>>> This series only features unrecoverable errors and for those the
> >>>> unregistration cannot fail. Now unrecoverable errors were added I
> >>>> admit this is confusing. We need to sort this out or clean the
> >>>> dependencies.
> >>> As Alex pointed out in 4/29, we can make
> >>> iommu_unregister_device_fault_handler() never fail and clean up
> >>> all the pending faults in the host IOMMU belong to that device.
> >>> But the problem is that if a fault, such as PRQ, has already been
> >>> injected into the guest, the page response may come back after
> >>> handler is unregistered and registered again.
> >>
> >> I'm trying to figure out if that would be harmful in any way. I
> >> guess it can be a bit nasty if we handle the page response right
> >> after having injected a new page request that uses the same PRGI.
> >> In any other case we discard the page response, but here we
> >> forward it to the endpoint and:
> >>
> >> * If the response status is success, endpoint retries the
> >> translation. The guest probably hasn't had time to handle the new
> >> page request and translation will fail, which may lead the endpoint
> >> to give up (two unsuccessful translation requests). Or send a new
> >> request
> >>  
> > Good point, there shouldn't be any harm if the page response is a
> > "fake" success. In fact it could happen in the normal operation when
> > PRQs to two devices share the same non-leaf translation structure.
> > The worst case is just a retry. I am not aware of the retry limit,
> > is it in the PCIe spec? I cannot find it.  
> 
> I don't think so, it's the implementation's choice. In general I don't
> think devices will have a retry limit, but it doesn't seem like the
> PCI spec prevents them from implementing one either. It could be
> useful to stop retrying after a certain number of faults, for
> preventing livelocks when the OS doesn't fix up the page tables and
> the device would just repeat the fault indefinitely.
> 
> > I think we should just document it, similar to having a spurious
> > interrupt. The PRQ trace event should capture that as well.
> >   
> >> * otherwise the endpoint won't retry the access, and could also
> >> disable PRI if the status is failure.
> >>  
> > That would be true regardless this race condition with handler
> > registration. So should be fine.  
> 
> We do give an invalid response for the old PRG (because of
> unregistering), but also for the new one, which has a different
> address that the guest might be able to page in and would normally
> return success.
> 
> >>> We need a way to reject such page response belong
> >>> to the previous life of the handler. Perhaps a sync call to the
> >>> guest with your fault queue eventfd? I am not sure.
> >>
> >> We could simply expect the device driver not to send any page
> >> response after unregistering the fault handler. Is there any
> >> reason VFIO would need to unregister and re-register the fault
> >> handler on a live guest? 
> > There is no reason for VFIO to unregister and register again; I was
> > just thinking from a security perspective. Someone could write a VFIO
> > app to do this attack. But I agree the damage is contained within the
> > device, which may get PRI disabled as a result.  
> 
> Yes I think the damage would always be contained within the
> misbehaving software
> 
> > So it seems we agree on the following:
> > - iommu_unregister_device_fault_handler() will never fail
> > - iommu driver cleans up all pending faults when handler is
> > unregistered
> > - assume device driver or guest not sending more page response
> > _after_ handler is unregistered.
> > - system will tolerate rare spurious response
> > 
> > Sounds right?  
> 
> Yes, I'll add that to the fault series
Hold on a second please, I think we need more clarifications. Ashok
pointed out to me that the spurious response can be harmful to other
devices when it comes to mdev, where PRQ group id is not per 

Re: [PATCH v8 26/29] vfio-pci: Register an iommu fault handler

2019-06-06 Thread Jacob Pan
On Thu, 6 Jun 2019 19:54:05 +0100
Jean-Philippe Brucker  wrote:

> On 05/06/2019 23:45, Jacob Pan wrote:
> > On Tue, 4 Jun 2019 18:11:08 +0200
> > Auger Eric  wrote:
> >   
> >> Hi Alex,
> >>
> >> On 6/4/19 12:31 AM, Alex Williamson wrote:  
> >>> On Sun, 26 May 2019 18:10:01 +0200
> >>> Eric Auger  wrote:
> >>> 
> >>>> This patch registers a fault handler which records faults in
> >>>> a circular buffer and then signals an eventfd. This buffer is
> >>>> exposed within the fault region.
> >>>>
> >>>> Signed-off-by: Eric Auger 
> >>>>
> >>>> ---
> >>>>
> >>>> v3 -> v4:
> >>>> - move iommu_unregister_device_fault_handler to vfio_pci_release
> >>>> ---
> >>>>  drivers/vfio/pci/vfio_pci.c | 49
> >>>> + drivers/vfio/pci/vfio_pci_private.h
> >>>> |  1 + 2 files changed, 50 insertions(+)
> >>>>
> >>>> diff --git a/drivers/vfio/pci/vfio_pci.c
> >>>> b/drivers/vfio/pci/vfio_pci.c index f75f61127277..52094ba8
> >>>> 100644 --- a/drivers/vfio/pci/vfio_pci.c
> >>>> +++ b/drivers/vfio/pci/vfio_pci.c
> >>>> @@ -30,6 +30,7 @@
> >>>>  #include 
> >>>>  #include 
> >>>>  #include 
> >>>> +#include 
> >>>>  
> >>>>  #include "vfio_pci_private.h"
> >>>>  
> >>>> @@ -296,6 +297,46 @@ static const struct vfio_pci_regops
> >>>> vfio_pci_fault_prod_regops = { .add_capability =
> >>>> vfio_pci_fault_prod_add_capability, };
> >>>>  
> >>>> +int vfio_pci_iommu_dev_fault_handler(struct iommu_fault_event
> >>>> *evt, void *data) +{
> >>>> +struct vfio_pci_device *vdev = (struct vfio_pci_device
> >>>> *) data;
> >>>> +struct vfio_region_fault_prod *prod_region =
> >>>> +(struct vfio_region_fault_prod
> >>>> *)vdev->fault_pages;
> >>>> +struct vfio_region_fault_cons *cons_region =
> >>>> +(struct vfio_region_fault_cons
> >>>> *)(vdev->fault_pages + 2 * PAGE_SIZE);
> >>>> +struct iommu_fault *new =
> >>>> +(struct iommu_fault *)(vdev->fault_pages +
> >>>> prod_region->offset +
> >>>> +prod_region->prod *
> >>>> prod_region->entry_size);
> >>>> +int prod, cons, size;
> >>>> +
> >>>> +mutex_lock(&vdev->fault_queue_lock);
> >>>> +
> >>>> +if (!vdev->fault_abi)
> >>>> +goto unlock;
> >>>> +
> >>>> +prod = prod_region->prod;
> >>>> +cons = cons_region->cons;
> >>>> +size = prod_region->nb_entries;
> >>>> +
> >>>> +if (CIRC_SPACE(prod, cons, size) < 1)
> >>>> +goto unlock;
> >>>> +
> >>>> +*new = evt->fault;
> >>>> +prod = (prod + 1) % size;
> >>>> +prod_region->prod = prod;
> >>>> +mutex_unlock(&vdev->fault_queue_lock);
> >>>> +
> >>>> +mutex_lock(&vdev->igate);
> >>>> +if (vdev->dma_fault_trigger)
> >>>> +eventfd_signal(vdev->dma_fault_trigger, 1);
> >>>> +mutex_unlock(&vdev->igate);
> >>>> +return 0;
> >>>> +
> >>>> +unlock:
> >>>> +mutex_unlock(&vdev->fault_queue_lock);
> >>>> +return -EINVAL;
> >>>> +}
> >>>> +
> >>>>  static int vfio_pci_init_fault_region(struct vfio_pci_device
> >>>> *vdev) {
> >>>>  struct vfio_region_fault_prod *header;
> >>>> @@ -328,6 +369,13 @@ static int vfio_pci_init_fault_region(struct
> >>>> vfio_pci_device *vdev) header = (struct vfio_region_fault_prod
> >>>> *)vdev->fault_pages; header->version = -1;
> >>>>  header->offset = PAGE_SIZE;
> >>>> +
> >>>> +ret =
> >>>> iommu_register_device_fault_handler(&vdev

Re: [PATCH v8 26/29] vfio-pci: Register an iommu fault handler

2019-06-05 Thread Jacob Pan
On Tue, 4 Jun 2019 18:11:08 +0200
Auger Eric  wrote:

> Hi Alex,
> 
> On 6/4/19 12:31 AM, Alex Williamson wrote:
> > On Sun, 26 May 2019 18:10:01 +0200
> > Eric Auger  wrote:
> >   
> >> This patch registers a fault handler which records faults in
> >> a circular buffer and then signals an eventfd. This buffer is
> >> exposed within the fault region.
> >>
> >> Signed-off-by: Eric Auger 
> >>
> >> ---
> >>
> >> v3 -> v4:
> >> - move iommu_unregister_device_fault_handler to vfio_pci_release
> >> ---
> >>  drivers/vfio/pci/vfio_pci.c | 49
> >> + drivers/vfio/pci/vfio_pci_private.h
> >> |  1 + 2 files changed, 50 insertions(+)
> >>
> >> diff --git a/drivers/vfio/pci/vfio_pci.c
> >> b/drivers/vfio/pci/vfio_pci.c index f75f61127277..52094ba8
> >> 100644 --- a/drivers/vfio/pci/vfio_pci.c
> >> +++ b/drivers/vfio/pci/vfio_pci.c
> >> @@ -30,6 +30,7 @@
> >>  #include 
> >>  #include 
> >>  #include 
> >> +#include 
> >>  
> >>  #include "vfio_pci_private.h"
> >>  
> >> @@ -296,6 +297,46 @@ static const struct vfio_pci_regops
> >> vfio_pci_fault_prod_regops = { .add_capability =
> >> vfio_pci_fault_prod_add_capability, };
> >>  
> >> +int vfio_pci_iommu_dev_fault_handler(struct iommu_fault_event
> >> *evt, void *data) +{
> >> +  struct vfio_pci_device *vdev = (struct vfio_pci_device *)
> >> data;
> >> +  struct vfio_region_fault_prod *prod_region =
> >> +  (struct vfio_region_fault_prod
> >> *)vdev->fault_pages;
> >> +  struct vfio_region_fault_cons *cons_region =
> >> +  (struct vfio_region_fault_cons
> >> *)(vdev->fault_pages + 2 * PAGE_SIZE);
> >> +  struct iommu_fault *new =
> >> +  (struct iommu_fault *)(vdev->fault_pages +
> >> prod_region->offset +
> >> +  prod_region->prod *
> >> prod_region->entry_size);
> >> +  int prod, cons, size;
> >> +
> >> +  mutex_lock(&vdev->fault_queue_lock);
> >> +
> >> +  if (!vdev->fault_abi)
> >> +  goto unlock;
> >> +
> >> +  prod = prod_region->prod;
> >> +  cons = cons_region->cons;
> >> +  size = prod_region->nb_entries;
> >> +
> >> +  if (CIRC_SPACE(prod, cons, size) < 1)
> >> +  goto unlock;
> >> +
> >> +  *new = evt->fault;
> >> +  prod = (prod + 1) % size;
> >> +  prod_region->prod = prod;
> >> +  mutex_unlock(&vdev->fault_queue_lock);
> >> +
> >> +  mutex_lock(&vdev->igate);
> >> +  if (vdev->dma_fault_trigger)
> >> +  eventfd_signal(vdev->dma_fault_trigger, 1);
> >> +  mutex_unlock(&vdev->igate);
> >> +  return 0;
> >> +
> >> +unlock:
> >> +  mutex_unlock(&vdev->fault_queue_lock);
> >> +  return -EINVAL;
> >> +}
> >> +
> >>  static int vfio_pci_init_fault_region(struct vfio_pci_device
> >> *vdev) {
> >>struct vfio_region_fault_prod *header;
> >> @@ -328,6 +369,13 @@ static int vfio_pci_init_fault_region(struct
> >> vfio_pci_device *vdev) header = (struct vfio_region_fault_prod
> >> *)vdev->fault_pages; header->version = -1;
> >>header->offset = PAGE_SIZE;
> >> +
> >> +  ret =
> >> iommu_register_device_fault_handler(&vdev->pdev->dev,
> >> +
> >> vfio_pci_iommu_dev_fault_handler,
> >> +  vdev);
> >> +  if (ret)
> >> +  goto out;
> >> +
> >>return 0;
> >>  out:
> >>kfree(vdev->fault_pages);
> >> @@ -570,6 +618,7 @@ static void vfio_pci_release(void *device_data)
> >>if (!(--vdev->refcnt)) {
> >>vfio_spapr_pci_eeh_release(vdev->pdev);
> >>vfio_pci_disable(vdev);
> >> +
> >> iommu_unregister_device_fault_handler(&vdev->pdev->dev);  
> > 
> > 
> > But this can fail if there are pending faults which leaves a device
> > reference and then the system is broken :(  
> This series only features unrecoverable errors and for those the
> unregistration cannot fail. Now unrecoverable errors were added I
> admit this is confusing. We need to sort this out or clean the
> dependencies.
As Alex pointed out in 4/29, we can make
iommu_unregister_device_fault_handler() never fail and clean up all the
pending faults in the host IOMMU belonging to that device. But the problem
is that if a fault, such as a PRQ, has already been injected into the
guest, the page response may come back after the handler is unregistered
and registered again. We need a way to reject such a page response
belonging to the previous life of the handler. Perhaps a sync call to the
guest with your fault queue eventfd? I am not sure.

Jacob
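
For context on the patch quoted at the top of this thread: the kernel handler
above is the producer side of the circular fault buffer. A matching userspace
consumer, woken by the eventfd, might look roughly like this. The field names
(prod, cons, offset, entry_size, nb_entries) come from the quoted code; how the
region is mmap'ed and the exact header definitions are assumptions, and all
setup (mmap, eventfd, includes) is omitted:

  /* Userspace sketch: drain struct iommu_fault records from the fault region. */
  static void drain_faults(struct vfio_region_fault_prod *prod_hdr,
                           struct vfio_region_fault_cons *cons_hdr,
                           void (*handle)(struct iommu_fault *))
  {
          uint32_t prod = prod_hdr->prod;         /* advanced by the kernel    */
          uint32_t cons = cons_hdr->cons;         /* advanced by userspace     */
          char *base = (char *)prod_hdr;          /* start of the producer map */

          while (cons != prod) {
                  struct iommu_fault *evt = (struct iommu_fault *)
                          (base + prod_hdr->offset + cons * prod_hdr->entry_size);

                  handle(evt);
                  cons = (cons + 1) % prod_hdr->nb_entries;
          }
          cons_hdr->cons = cons;                  /* let the producer reuse the slots */
  }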


Re: [PATCH v8 05/29] iommu: Add a timeout parameter for PRQ response

2019-06-04 Thread Jacob Pan
On Tue, 4 Jun 2019 11:52:18 +0100
Jean-Philippe Brucker  wrote:

> On 03/06/2019 23:32, Alex Williamson wrote:
> > It doesn't seem to make much sense to include this patch without
> > also including "iommu: handle page response timeout".  Was that one
> > lost? Dropped?  Lives elsewhere?  
> 
> The first 7 patches come from my sva/api branch, where I had forgotten
> to add the "handle page response timeout" patch. I added it back,
> probably after Eric sent this version. But I don't think the patch is
> ready for upstream, as we still haven't decided how to proceed with
> timeouts. Patches 6 and 7 are for debugging, I don't know if they
> should go upstream.
Yeah, we can wait until we all agree on timeouts. It was introduced as
a basic safeguard against unresponsive guests.


Re: [PATCH v8 04/29] iommu: Add recoverable fault reporting

2019-06-04 Thread Jacob Pan
On Mon, 3 Jun 2019 16:31:45 -0600
Alex Williamson  wrote:

> On Sun, 26 May 2019 18:09:39 +0200
> Eric Auger  wrote:
> 
> > From: Jean-Philippe Brucker 
> > 
> > Some IOMMU hardware features, for example PCI's PRI and Arm SMMU's
> > Stall, enable recoverable I/O page faults. Allow IOMMU drivers to
> > report PRI Page Requests and Stall events through the new fault
> > reporting API. The consumer of the fault can be either an I/O page
> > fault handler in the host, or a guest OS.
> > 
> > Once handled, the fault must be completed by sending a page
> > response back to the IOMMU. Add an iommu_page_response() function
> > to complete a page fault.
> > 
> > Signed-off-by: Jacob Pan 
> > Signed-off-by: Jean-Philippe Brucker 
> > ---
> >  drivers/iommu/iommu.c | 77
> > ++- include/linux/iommu.h |
> > 51  2 files changed, 127 insertions(+),
> > 1 deletion(-)
> > 
> > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > index 795518445a3a..13b301cfb10f 100644
> > --- a/drivers/iommu/iommu.c
> > +++ b/drivers/iommu/iommu.c
> > @@ -869,7 +869,14 @@
> > EXPORT_SYMBOL_GPL(iommu_group_unregister_notifier);
> >   * @data: private data passed as argument to the handler
> >   *
> >   * When an IOMMU fault event is received, this handler gets called
> > with the
> > - * fault event and data as argument.
> > + * fault event and data as argument. The handler should return 0
> > on success. If
> > + * the fault is recoverable (IOMMU_FAULT_PAGE_REQ), the handler
> > should also
> > + * complete the fault by calling iommu_page_response() with one of
> > the following
> > + * response code:
> > + * - IOMMU_PAGE_RESP_SUCCESS: retry the translation
> > + * - IOMMU_PAGE_RESP_INVALID: terminate the fault
> > + * - IOMMU_PAGE_RESP_FAILURE: terminate the fault and stop
> > reporting
> > + *   page faults if possible.
> >   *
> >   * Return 0 if the fault handler was installed successfully, or an
> > error. */
> > @@ -904,6 +911,8 @@ int iommu_register_device_fault_handler(struct
> > device *dev, }
> > param->fault_param->handler = handler;
> > param->fault_param->data = data;
> > +   mutex_init(¶m->fault_param->lock);
> > +   INIT_LIST_HEAD(¶m->fault_param->faults);
> >  
> >  done_unlock:
> > mutex_unlock(¶m->lock);
> > @@ -934,6 +943,12 @@ int
> > iommu_unregister_device_fault_handler(struct device *dev) if
> > (!param->fault_param) goto unlock;
> >  
> > +   /* we cannot unregister handler if there are pending
> > faults */
> > +   if (!list_empty(¶m->fault_param->faults)) {
> > +   ret = -EBUSY;
> > +   goto unlock;
> > +   }  
> 
> Why?  Attempting to unregister a fault handler suggests the handler
> doesn't care about outstanding faults.  Can't we go ahead and dispatch
> them as failed?  Otherwise we need to be careful that we don't
> introduce an environment where the registered fault handler is blocked
> trying to shutdown and release the device due to a flood of errors.
> Thanks,
> 
My original thinking was that outstanding faults such as PRQs can be
cleared if the handler does not send a page response (PRS) within the
timeout. This could be the case with a malicious guest.

But now I think your suggestion makes sense; it is better to clear out
the pending faults immediately. Then the registered fault handler will not
be blocked, and a flood of faults will not be reported outside the IOMMU
after the handler is unregistered.

Jean, would you agree? I guess you are taking care of it in your
sva/api tree now :).
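
A minimal sketch of that immediate cleanup at unregister time, completing any
pending recoverable faults so the handler can always be removed. Names beyond
those visible in this digest (the list member in iommu_fault_event, the prm
field names, the exact iommu_page_response() signature) are assumptions, and
locking is elided:

  static void auto_complete_pending_faults(struct device *dev,
                                           struct iommu_fault_param *fparam)
  {
          struct iommu_fault_event *evt, *next;

          list_for_each_entry_safe(evt, next, &fparam->faults, list) {
                  struct iommu_page_response resp = {
                          .version = IOMMU_PAGE_RESP_VERSION_1,
                          .pasid   = evt->fault.prm.pasid,
                          .grpid   = evt->fault.prm.grpid,
                          .code    = IOMMU_PAGE_RESP_INVALID,
                  };

                  /* Responding is assumed to drop evt from the pending list;
                   * if the driver does not, free it here instead. */
                  iommu_page_response(dev, &resp);
          }
  }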

> Alex
> 
> > +
> > kfree(param->fault_param);
> > param->fault_param = NULL;
> > put_device(dev);
> > @@ -958,6 +973,7 @@
> > EXPORT_SYMBOL_GPL(iommu_unregister_device_fault_handler); int
> > iommu_report_device_fault(struct device *dev, struct
> > iommu_fault_event *evt) { struct iommu_param *param =
> > dev->iommu_param;
> > +   struct iommu_fault_event *evt_pending;
> > struct iommu_fault_param *fparam;
> > int ret = 0;
> >  
> > @@ -972,6 +988,20 @@ int iommu_report_device_fault(struct device
> > *dev, struct iommu_fault_event *evt) ret = -EINVAL;
> > goto done_unlock;
> > }
> > +
> > +   if (evt->fault.type == IOMMU_FAULT_PAGE_REQ &&
> > +   (evt->fault.prm.flags &
> > IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE)) {
> > +   evt_pending = kmemdup(evt, sizeof(struc

Re: [PATCH v7 05/23] iommu: Introduce cache_invalidate API

2019-05-02 Thread Jacob Pan
On Thu, 2 May 2019 11:53:34 +0100
Jean-Philippe Brucker  wrote:

> On 02/05/2019 07:58, Auger Eric wrote:
> > Hi Jean-Philippe,
> > 
> > On 5/1/19 12:38 PM, Jean-Philippe Brucker wrote:  
> >> On 08/04/2019 13:18, Eric Auger wrote:  
> >>> +int iommu_cache_invalidate(struct iommu_domain *domain, struct device *dev,
> >>> +                           struct iommu_cache_invalidate_info *inv_info)
> >>> +{
> >>> + int ret = 0;
> >>> +
> >>> + if (unlikely(!domain->ops->cache_invalidate))
> >>> + return -ENODEV;
> >>> +
> >>> + ret = domain->ops->cache_invalidate(domain, dev, inv_info);
> >>> +
> >>> + return ret;  
> >>
> >> Nit: you don't really need ret
> >>
> >> The UAPI looks good to me, so
> >>
> >> Reviewed-by: Jean-Philippe Brucker
> >>   
> > Just to make sure, do you accept changes proposed by Jacob in
> > https://lkml.org/lkml/2019/4/29/659 ie.
> > - the addition of NR_IOMMU_INVAL_GRANU in enum
> > iommu_inv_granularity and
> > - the addition of NR_IOMMU_CACHE_TYPE  
> 
> Ah sorry, I forgot about that, I'll review the next version. Yes they
> can be useful (maybe call them IOMMU_INV_GRANU_NR and
> IOMMU_CACHE_INV_TYPE_NR?). I guess it's legal to export in UAPI values
> that will change over time, as VFIO also does it in its enums.
> 
I am fine with the names. Maybe you can put this patch in your sva/api
branch once you have reviewed it? Having a common branch for common code
makes life so much easier.
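
For readers following along, the sentinel values under discussion just close the
enums so both sides can range-check what userspace passes in. Illustrative only,
using the spellings Jean-Philippe suggests:

  /* Illustration of the trailing-sentinel pattern being discussed. */
  enum iommu_inv_granularity {
          IOMMU_INV_GRANU_DOMAIN,         /* domain-selective invalidation */
          IOMMU_INV_GRANU_PASID,          /* PASID-selective invalidation  */
          IOMMU_INV_GRANU_ADDR,           /* page-selective invalidation   */
          IOMMU_INV_GRANU_NR,             /* number of granularities       */
  };

  /* A consumer can then reject out-of-range input from userspace:
   *      if (inv_info->granularity >= IOMMU_INV_GRANU_NR)
   *              return -EINVAL;
   */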


Re: [PATCH v6 05/22] iommu: Introduce cache_invalidate API

2019-03-21 Thread Jacob Pan
On Thu, 21 Mar 2019 15:32:45 +0100
Auger Eric  wrote:

> Hi jean, Jacob,
> 
> On 3/21/19 3:13 PM, Jean-Philippe Brucker wrote:
> > On 21/03/2019 13:54, Auger Eric wrote:  
> >> Hi Jacob, Jean-Philippe,
> >>
> >> On 3/20/19 5:50 PM, Jean-Philippe Brucker wrote:  
> >>> On 20/03/2019 16:37, Jacob Pan wrote:
> >>> [...]  
> >>>>> +struct iommu_inv_addr_info {
> >>>>> +#define IOMMU_INV_ADDR_FLAGS_PASID (1 << 0)
> >>>>> +#define IOMMU_INV_ADDR_FLAGS_ARCHID(1 << 1)
> >>>>> +#define IOMMU_INV_ADDR_FLAGS_LEAF  (1 << 2)
> >>>>> +   __u32   flags;
> >>>>> +   __u32   archid;
> >>>>> +   __u64   pasid;
> >>>>> +   __u64   addr;
> >>>>> +   __u64   granule_size;
> >>>>> +   __u64   nb_granules;
> >>>>> +};
> >>>>> +
> >>>>> +/**
> >>>>> + * First level/stage invalidation information
> >>>>> + * @cache: bitfield that allows to select which caches to
> >>>>> invalidate
> >>>>> + * @granularity: defines the lowest granularity used for the
> >>>>> invalidation:
> >>>>> + * domain > pasid > addr
> >>>>> + *
> >>>>> + * Not all the combinations of cache/granularity make sense:
> >>>>> + *
> >>>>> + * type        |   DEV_IOTLB   |     IOTLB     |  PASID cache  |
> >>>>> + * granularity |               |               |               |
> >>>>> + * ------------+---------------+---------------+---------------+
> >>>>> + * DOMAIN      |      N/A      |       Y       |       Y       |
> >>>>> + * PASID       |       Y       |       Y       |       Y       |
> >>>>> + * ADDR        |       Y       |       Y       |      N/A      |
> >>>>> + */
> >>>>> +struct iommu_cache_invalidate_info {
> >>>>> +#define IOMMU_CACHE_INVALIDATE_INFO_VERSION_1 1
> >>>>> +   __u32   version;
> >>>>> +/* IOMMU paging structure cache */
> >>>>> +#define IOMMU_CACHE_INV_TYPE_IOTLB     (1 << 0) /* IOMMU IOTLB */
> >>>>> +#define IOMMU_CACHE_INV_TYPE_DEV_IOTLB (1 << 1) /* Device IOTLB */
> >>>>> +#define IOMMU_CACHE_INV_TYPE_PASID     (1 << 2) /* PASID cache */
> >>>> Just a clarification, this used to be an enum. You do intend to
> >>>> issue a single invalidation request on multiple cache types?
> >>>> Perhaps for virtio-IOMMU? I only see a single cache type in your
> >>>> patch #14. For VT-d we plan to issue one cache type at a time
> >>>> for now. So this format works for us.  
> >>>
> >>> Yes for virtio-iommu I'd like as little overhead as possible,
> >>> which means a single invalidation message to hit both IOTLB and
> >>> ATC at once, and the ability to specify multiple pages with
> >>> @nb_granules.  
> >> The original request/explanation from Jean-Philippe can be found
> >> here: https://lkml.org/lkml/2019/1/28/1497
> >>  
> >>>  
> >>>> However, if multiple cache types are issued in a single
> >>>> invalidation, they must share a single granularity, and not all
> >>>> combinations are valid, e.g. dev IOTLB does not support domain
> >>>> granularity. Just a reminder, not an issue. The driver could filter
> >>>> out invalid combinations.  
> >> Sure I will add a comment about this restriction.  
> >>>
> >>> Agreed. Even the core could filter out invalid combinations based
> >>> on the table above: IOTLB and domain granularity are N/A.  
> >> I don't get this sentence. What about vtd IOTLB domain-selective
> >> invalidation:  
> > 
> > My mistake: I meant dev-IOTLB and domain granularity are N/A  
> 
> Ah OK, no worries.
> 
> How do we proceed further with those user APIs? Besides the comment to
> be added above and previous suggestion from Jean ("Invalidations by
> @granularity use field ...), have we reached a consensus now on:
> 
> - attach/detach_pasid_table
> - cach
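
The filtering mentioned in this thread follows directly from the
cache/granularity table in the quoted patch: dev-IOTLB invalidation by domain
and PASID-cache invalidation by address are the two N/A combinations. A sketch
of such a check, with cache-type names from the quoted UAPI and illustrative
granularity names:

  static bool inv_combination_valid(u32 cache, u32 granularity)
  {
          /* dev-IOTLB cannot be invalidated by domain */
          if ((cache & IOMMU_CACHE_INV_TYPE_DEV_IOTLB) &&
              granularity == IOMMU_INV_GRANU_DOMAIN)
                  return false;
          /* the PASID cache cannot be invalidated by address */
          if ((cache & IOMMU_CACHE_INV_TYPE_PASID) &&
              granularity == IOMMU_INV_GRANU_ADDR)
                  return false;
          return true;
  }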

Re: [PATCH v6 02/22] iommu: introduce device fault data

2019-03-21 Thread Jacob Pan
On Sun, 17 Mar 2019 18:22:12 +0100
Eric Auger  wrote:

> From: Jacob Pan 
> 
> Device faults detected by IOMMU can be reported outside the IOMMU
> subsystem for further processing. This patch introduces
> a generic device fault data structure.
> 
> The fault can be either an unrecoverable fault or a page request,
> also referred to as a recoverable fault.
> 
> We only care about non internal faults that are likely to be reported
> to an external subsystem.
> 
> Signed-off-by: Jacob Pan 
> Signed-off-by: Jean-Philippe Brucker 
> Signed-off-by: Liu, Yi L 
> Signed-off-by: Ashok Raj 
> Signed-off-by: Eric Auger 
> 
> ---
> v4 -> v5:
> - simplified struct iommu_fault_event comment
> - Moved IOMMU_FAULT_PERM outside of the struct
> - Removed IOMMU_FAULT_PERM_INST
> - s/IOMMU_FAULT_PAGE_REQUEST_PASID_PRESENT/
>   IOMMU_FAULT_PAGE_REQUEST_PASID_VALID
> 
> v3 -> v4:
> - use a union containing aither an unrecoverable fault or a page
>   request message. Move the device private data in the page request
>   structure. Reshuffle the fields and use flags.
> - move fault perm attributes to the uapi
> - remove a bunch of iommu_fault_reason enum values that were related
>   to internal errors
> ---
>  include/linux/iommu.h  |  44 ++
>  include/uapi/linux/iommu.h | 115
> + 2 files changed, 159
> insertions(+) create mode 100644 include/uapi/linux/iommu.h
> 
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index ffbbc7e39cee..c6f398f7e6e0 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -25,6 +25,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #define IOMMU_READ   (1 << 0)
>  #define IOMMU_WRITE  (1 << 1)
> @@ -48,6 +49,7 @@ struct bus_type;
>  struct device;
>  struct iommu_domain;
>  struct notifier_block;
> +struct iommu_fault_event;
>  
>  /* iommu fault flags */
>  #define IOMMU_FAULT_READ 0x0
> @@ -55,6 +57,7 @@ struct notifier_block;
>  
>  typedef int (*iommu_fault_handler_t)(struct iommu_domain *,
>   struct device *, unsigned long, int, void *);
> +typedef int (*iommu_dev_fault_handler_t)(struct iommu_fault_event *,
> void *); 
>  struct iommu_domain_geometry {
>   dma_addr_t aperture_start; /* First address that can be
> mapped*/ @@ -247,6 +250,46 @@ struct iommu_device {
>   struct device *dev;
>  };
>  
> +/**
> + * struct iommu_fault_event - Generic fault event
> + *
> + * Can represent recoverable faults such as a page requests or
> + * unrecoverable faults such as DMA or IRQ remapping faults.
> + *
> + * @fault: fault descriptor
> + * @iommu_private: used by the IOMMU driver for storing
> fault-specific
> + * data. Users should not modify this field before
> + * sending the fault response.
> + */
> +struct iommu_fault_event {
> + struct iommu_fault fault;
> + u64 iommu_private;
> +};
> +
> +/**
> + * struct iommu_fault_param - per-device IOMMU fault data
> + * @dev_fault_handler: Callback function to handle IOMMU faults at
> device level
> + * @data: handler private data
> + *
> + */
> +struct iommu_fault_param {
> + iommu_dev_fault_handler_t handler;
> + void *data;
> +};
> +
> +/**
> + * struct iommu_param - collection of per-device IOMMU data
> + *
> + * @fault_param: IOMMU detected device fault reporting data
> + *
> + * TODO: migrate other per device data pointers under
> iommu_dev_data, e.g.
> + *   struct iommu_group  *iommu_group;
> + *   struct iommu_fwspec *iommu_fwspec;
> + */
> +struct iommu_param {
> + struct iommu_fault_param *fault_param;
> +};
> +
>  int  iommu_device_register(struct iommu_device *iommu);
>  void iommu_device_unregister(struct iommu_device *iommu);
>  int  iommu_device_sysfs_add(struct iommu_device *iommu,
> @@ -422,6 +465,7 @@ struct iommu_ops {};
>  struct iommu_group {};
>  struct iommu_fwspec {};
>  struct iommu_device {};
> +struct iommu_fault_param {};
>  
>  static inline bool iommu_present(struct bus_type *bus)
>  {
> diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
> new file mode 100644
> index ..edcc0dda7993
> --- /dev/null
> +++ b/include/uapi/linux/iommu.h
> @@ -0,0 +1,115 @@
> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> +/*
> + * IOMMU user API definitions
> + */
> +
> +#ifndef _UAPI_IOMMU_H
> +#define _UAPI_IOMMU_H
> +
> +#include 
> +
> +#define IOMMU_FAULT_PERM_WRITE   (1 << 0) /* write */
> +#define IOMMU_FAULT_PERM_EXEC(1 << 1) /* e
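
To show how these types fit together, here is a minimal consumer of the
fault-reporting path: a callback matching the iommu_dev_fault_handler_t typedef
quoted above, registered with iommu_register_device_fault_handler() as
introduced elsewhere in this digest. A sketch only; real consumers such as VFIO
do considerably more:

  static int demo_dev_fault_handler(struct iommu_fault_event *evt, void *data)
  {
          struct device *dev = data;

          if (evt->fault.type == IOMMU_FAULT_PAGE_REQ)
                  dev_dbg(dev, "recoverable fault (page request) reported\n");
          else
                  dev_dbg(dev, "unrecoverable fault reported\n");

          return 0;
  }

  /* Registration, e.g. from a driver's probe path:
   *      iommu_register_device_fault_handler(dev, demo_dev_fault_handler, dev);
   */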

Re: [PATCH v6 05/22] iommu: Introduce cache_invalidate API

2019-03-20 Thread Jacob Pan
On Sun, 17 Mar 2019 18:22:15 +0100
Eric Auger  wrote:

> From: "Liu, Yi L" 
> 
> In any virtualization use case, when the first translation stage
> is "owned" by the guest OS, the host IOMMU driver has no knowledge
> of caching structure updates unless the guest invalidation activities
> are trapped by the virtualizer and passed down to the host.
> 
> Since the invalidation data are obtained from user space and will be
> written into the physical IOMMU, we must allow security checks at various
> layers. Therefore, a generic invalidation data format is proposed here;
> model-specific IOMMU drivers need to convert it into their own
> format.
> 
> Signed-off-by: Liu, Yi L 
> Signed-off-by: Jean-Philippe Brucker 
> Signed-off-by: Jacob Pan 
> Signed-off-by: Ashok Raj 
> Signed-off-by: Eric Auger 
> 
> ---
> v5 -> v6:
> - fix merge issue
> 
> v3 -> v4:
> - full reshape of the API following Alex' comments
> 
> v1 -> v2:
> - add arch_id field
> - renamed tlb_invalidate into cache_invalidate as this API allows
>   to invalidate context caches on top of IOTLBs
> 
> v1:
> renamed sva_invalidate into tlb_invalidate and add iommu_ prefix in
> header. Commit message reworded.
> ---
>  drivers/iommu/iommu.c  | 14 
>  include/linux/iommu.h  | 15 
>  include/uapi/linux/iommu.h | 71 ++
>  3 files changed, 100 insertions(+)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 7d9285cea100..b72e326ddd41 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -1544,6 +1544,20 @@ void iommu_detach_pasid_table(struct iommu_domain *domain)
>  }
>  EXPORT_SYMBOL_GPL(iommu_detach_pasid_table);
>  
> +int iommu_cache_invalidate(struct iommu_domain *domain, struct device *dev,
> +                           struct iommu_cache_invalidate_info *inv_info)
> +{
> + int ret = 0;
> +
> + if (unlikely(!domain->ops->cache_invalidate))
> + return -ENODEV;
> +
> + ret = domain->ops->cache_invalidate(domain, dev, inv_info);
> +
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(iommu_cache_invalidate);
> +
>  static void __iommu_detach_device(struct iommu_domain *domain,
> struct device *dev)
>  {
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index fb9b7a8de25f..7c7c6bad1420 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -191,6 +191,7 @@ struct iommu_resv_region {
>   *  driver init to device driver init (default
> no)
>   * @attach_pasid_table: attach a pasid table
>   * @detach_pasid_table: detach the pasid table
> + * @cache_invalidate: invalidate translation caches
>   * @pgsize_bitmap: bitmap of all possible supported page sizes
>   */
>  struct iommu_ops {
> @@ -239,6 +240,9 @@ struct iommu_ops {
>                                    struct iommu_pasid_table_config *cfg);
>       void (*detach_pasid_table)(struct iommu_domain *domain);
>  
> +     int (*cache_invalidate)(struct iommu_domain *domain, struct device *dev,
> +                             struct iommu_cache_invalidate_info *inv_info);
> +
>   unsigned long pgsize_bitmap;
>  };
>  
> @@ -349,6 +353,9 @@ extern void iommu_detach_device(struct iommu_domain *domain,
>  extern int iommu_attach_pasid_table(struct iommu_domain *domain,
>                                      struct iommu_pasid_table_config *cfg);
>  extern void iommu_detach_pasid_table(struct iommu_domain *domain);
> +extern int iommu_cache_invalidate(struct iommu_domain *domain,
> +                                  struct device *dev,
> +                                  struct iommu_cache_invalidate_info *inv_info);
>  extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
>  extern struct iommu_domain *iommu_get_dma_domain(struct device *dev);
>  extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
> @@ -797,6 +804,14 @@ int iommu_attach_pasid_table(struct iommu_domain *domain,
>  static inline void iommu_detach_pasid_table(struct iommu_domain *domain) {}
> +static inline int
> +iommu_cache_invalidate(struct iommu_domain *domain,
> +struct device *dev,
> +struct iommu_cache_invalidate_info *inv_info)
> +{
> + return -ENODEV;
> +}
> +
>  #endif /* CONFIG_IOMMU_API */
>  
>  #ifdef CONFIG_IOMMU_DEBUGFS
> diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
> index 532a64075f23..e4c6a447e85a 100644
> --- a/include/uapi/linux/iommu.h
> +++ b/include/uapi/linux/iommu.h
> @@ -159,4 +159,75 @@ struct iommu_pasid_table_config {
>   };
>  };
>  
> 
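
A usage sketch for the API quoted above: invalidating a range of first-stage
mappings for one PASID. The field and flag names come from the quoted UAPI; the
granularity member name is illustrative, and the name of the union member
holding iommu_inv_addr_info is not visible in this excerpt and is assumed to be
"addr_info":

  static int demo_invalidate_range(struct iommu_domain *domain,
                                   struct device *dev,
                                   u64 pasid, u64 iova, u64 size)
  {
          struct iommu_cache_invalidate_info inv_info = {
                  .version     = IOMMU_CACHE_INVALIDATE_INFO_VERSION_1,
                  .cache       = IOMMU_CACHE_INV_TYPE_IOTLB,
                  .granularity = IOMMU_INV_GRANU_ADDR,    /* name illustrative */
          };

          inv_info.addr_info.flags        = IOMMU_INV_ADDR_FLAGS_PASID;
          inv_info.addr_info.pasid        = pasid;
          inv_info.addr_info.addr         = iova;
          inv_info.addr_info.granule_size = SZ_4K;
          inv_info.addr_info.nb_granules  = size / SZ_4K;

          return iommu_cache_invalidate(domain, dev, &inv_info);
  }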

Re: [PATCH v5 05/22] iommu: Introduce cache_invalidate API

2019-03-15 Thread Jacob Pan
On Fri, 15 Mar 2019 17:08:49 +0100
Eric Auger  wrote:

> From: "Liu, Yi L" 
> 
> In any virtualization use case, when the first translation stage
> is "owned" by the guest OS, the host IOMMU driver has no knowledge
> of caching structure updates unless the guest invalidation activities
> are trapped by the virtualizer and passed down to the host.
> 
> Since the invalidation data are obtained from user space and will be
> written into physical IOMMU, we must allow security check at various
> layers. Therefore, generic invalidation data format are proposed here,
> model specific IOMMU drivers need to convert them into their own
> format.
> 
> Signed-off-by: Liu, Yi L 
> Signed-off-by: Jean-Philippe Brucker 
> Signed-off-by: Jacob Pan 
> Signed-off-by: Ashok Raj 
> Signed-off-by: Eric Auger 
> 
> ---
> v3 -> v4:
> - full reshape of the API following Alex' comments
> 
> v1 -> v2:
> - add arch_id field
> - renamed tlb_invalidate into cache_invalidate as this API allows
>   to invalidate context caches on top of IOTLBs
> 
> v1:
> renamed sva_invalidate into tlb_invalidate and add iommu_ prefix in
> header. Commit message reworded.
> ---
>  drivers/iommu/iommu.c      | 14
>  include/linux/iommu.h      | 21
>  include/uapi/linux/iommu.h | 71
>  3 files changed, 106 insertions(+)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 7d9285cea100..b72e326ddd41 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -1544,6 +1544,20 @@ void iommu_detach_pasid_table(struct
> iommu_domain *domain) }
>  EXPORT_SYMBOL_GPL(iommu_detach_pasid_table);
>  
> +int iommu_cache_invalidate(struct iommu_domain *domain, struct device *dev,
> +                           struct iommu_cache_invalidate_info *inv_info)
> +{
> + int ret = 0;
> +
> + if (unlikely(!domain->ops->cache_invalidate))
> + return -ENODEV;
> +
> + ret = domain->ops->cache_invalidate(domain, dev, inv_info);
> +
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(iommu_cache_invalidate);
> +
>  static void __iommu_detach_device(struct iommu_domain *domain,
> struct device *dev)
>  {
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index fb9b7a8de25f..3d8e48876162 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -191,6 +191,7 @@ struct iommu_resv_region {
>   *  driver init to device driver init (default
> no)
>   * @attach_pasid_table: attach a pasid table
>   * @detach_pasid_table: detach the pasid table
> + * @cache_invalidate: invalidate translation caches
>   * @pgsize_bitmap: bitmap of all possible supported page sizes
>   */
>  struct iommu_ops {
> @@ -239,6 +240,9 @@ struct iommu_ops {
>  				struct iommu_pasid_table_config *cfg);
>  	void (*detach_pasid_table)(struct iommu_domain *domain);
>  
> +	int (*cache_invalidate)(struct iommu_domain *domain, struct device *dev,
> +				struct iommu_cache_invalidate_info *inv_info);
> +
>   unsigned long pgsize_bitmap;
>  };
>  
> @@ -349,6 +353,9 @@ extern void iommu_detach_device(struct iommu_domain *domain,
>  extern int iommu_attach_pasid_table(struct iommu_domain *domain,
>  				    struct iommu_pasid_table_config *cfg);
>  extern void iommu_detach_pasid_table(struct iommu_domain *domain);
> +extern int iommu_cache_invalidate(struct iommu_domain *domain,
> +				  struct device *dev,
> +				  struct iommu_cache_invalidate_info *inv_info);
>  extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
>  extern struct iommu_domain *iommu_get_dma_domain(struct device *dev);
>  extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
> @@ -795,7 +802,21 @@ int iommu_attach_pasid_table(struct iommu_domain *domain,
>  }
>  
>  static inline
> +<<<<<<< HEAD
>  void iommu_detach_pasid_table(struct iommu_domain *domain) {}
> +===
> +void iommu_detach_pasid_table(struct iommu_domain *domain)
> +{
> + return -ENODEV;
> +}
> +static inline int
> +iommu_cache_invalidate(struct iommu_domain *domain,
> +struct device *dev,
> +struct iommu_cache_invalidate_info *inv_info)
> +{
> + return -ENODEV;
> +}
> +>>>>>>> 56df871916e5... iommu: Introduce cache_invalidate API  
forgot to merge :)
>  
>  #endif /* CONFIG_IOMMU_API */
>  
> diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
> index 532a

Re: [PATCH v4 03/22] iommu: introduce device fault report API

2019-03-06 Thread Jacob Pan
On Tue, 5 Mar 2019 15:03:41 +
Jean-Philippe Brucker  wrote:

> On 18/02/2019 13:54, Eric Auger wrote:
> [...]> +/**
> > + * iommu_register_device_fault_handler() - Register a device fault
> > handler
> > + * @dev: the device
> > + * @handler: the fault handler
> > + * @data: private data passed as argument to the handler
> > + *
> > + * When an IOMMU fault event is received, call this handler with
> > the fault event
> > + * and data as argument. The handler should return 0 on success.
> > If the fault is
> > + * recoverable (IOMMU_FAULT_PAGE_REQ), the handler can also
> > complete
> > + * the fault by calling iommu_page_response() with one of the
> > following
> > + * response code:
> > + * - IOMMU_PAGE_RESP_SUCCESS: retry the translation
> > + * - IOMMU_PAGE_RESP_INVALID: terminate the fault
> > + * - IOMMU_PAGE_RESP_FAILURE: terminate the fault and stop
> > reporting
> > + *   page faults if possible.  
> 
> The comment refers to function and values that haven't been defined
> yet. Either the page_response() patch should come before, or we need
> to split this patch.
> 
> Something I missed before: if the handler fails (returns != 0) it
> should complete the fault by calling iommu_page_response(), if we're
> not doing it in iommu_report_device_fault(). It should be indicated
> in this comment. It's safe for the handler to call page_response()
> since we're not holding fault_param->lock when calling the handler.
> 
If the page request fault is to be reported to a guest, the report
function cannot wait for the completion status. As long as the fault is
injected into the guest, the handler should complete with success. If
the PRQ report fails, IMHO, the caller of iommu_report_device_fault()
should send the page response, perhaps after cleaning up any partial
responses of the group as well.
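
For illustration, a minimal sketch of that caller-side fallback, assuming the
iommu_page_response() helper and the IOMMU_PAGE_RESP_FAILURE code proposed
elsewhere in this series; the page_response_msg layout and field names used
here are assumptions, not the final uapi:

static void report_prq(struct device *dev, struct iommu_fault_event *evt)
{
	int ret;

	ret = iommu_report_device_fault(dev, evt);
	if (ret && evt->fault.type == IOMMU_FAULT_PAGE_REQ) {
		/*
		 * Injection into the guest failed, so no response will
		 * ever come back: complete the fault from the caller.
		 * (Cleanup of partial responses of the group is left
		 * out of this sketch.)
		 */
		struct page_response_msg msg = {
			.pasid     = evt->fault.prm.pasid,  /* assumed field */
			.resp_code = IOMMU_PAGE_RESP_FAILURE,
		};

		iommu_page_response(dev, &msg);
	}
}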

> > + *
> > + * Return 0 if the fault handler was installed successfully, or an
> > error.
> > + */  
> [...]
> > +/**
> > + * iommu_report_device_fault() - Report fault event to device
> > + * @dev: the device
> > + * @evt: fault event data
> > + *
> > + * Called by IOMMU model specific drivers when fault is detected,
> > typically
> > + * in a threaded IRQ handler.
> > + *
> > + * Return 0 on success, or an error.
> > + */
> > +int iommu_report_device_fault(struct device *dev, struct
> > iommu_fault_event *evt) +{
> > +   int ret = 0;
> > +   struct iommu_fault_event *evt_pending;
> > +   struct iommu_fault_param *fparam;
> > +
> > +   /* iommu_param is allocated when device is added to group
> > */
> > +   if (!dev->iommu_param | !evt)  
> 
> Typo: ||
> 
> Thanks,
> Jean
> 
> > +   return -EINVAL;
> > +   /* we only report device fault if there is a handler
> > registered */
> > +   mutex_lock(&dev->iommu_param->lock);
> > +   if (!dev->iommu_param->fault_param ||
> > +   !dev->iommu_param->fault_param->handler) {
> > +   ret = -EINVAL;
> > +   goto done_unlock;
> > +   }
> > +   fparam = dev->iommu_param->fault_param;
> > +   if (evt->fault.type == IOMMU_FAULT_PAGE_REQ &&
> > +   evt->fault.prm.flags &
> > IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE) {
> > +   evt_pending = kmemdup(evt, sizeof(struct
> > iommu_fault_event),
> > +   GFP_KERNEL);
> > +       if (!evt_pending) {
> > +   ret = -ENOMEM;
> > +   goto done_unlock;
> > +   }
> > +   mutex_lock(&fparam->lock);
> > +   list_add_tail(&evt_pending->list, &fparam->faults);
> > +   mutex_unlock(&fparam->lock);
> > +   }
> > +   ret = fparam->handler(evt, fparam->data);
> > +done_unlock:
> > +   mutex_unlock(&dev->iommu_param->lock);
> > +   return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(iommu_report_device_fault);  
> [...]

[Jacob Pan]


Re: [PATCH v4 05/22] iommu: Introduce cache_invalidate API

2019-03-06 Thread Jacob Pan
On Tue, 5 Mar 2019 19:14:42 +0100
Auger Eric  wrote:

> Hi Kevin, Yi,
> 
> On 3/5/19 4:28 PM, Jean-Philippe Brucker wrote:
> > On 18/02/2019 13:54, Eric Auger wrote:  
> >> From: "Liu, Yi L" 
> >>
> >> In any virtualization use case, when the first translation stage
> >> is "owned" by the guest OS, the host IOMMU driver has no knowledge
> >> of caching structure updates unless the guest invalidation
> >> activities are trapped by the virtualizer and passed down to the
> >> host.
> >>
> >> Since the invalidation data are obtained from user space and will
> >> be written into physical IOMMU, we must allow security check at
> >> various layers. Therefore, generic invalidation data format are
> >> proposed here, model specific IOMMU drivers need to convert them
> >> into their own format.
> >>
> >> Signed-off-by: Liu, Yi L 
> >> Signed-off-by: Jean-Philippe Brucker
> >>  Signed-off-by: Jacob Pan
> >>  Signed-off-by: Ashok Raj
> >>  Signed-off-by: Eric Auger
> >> 
> >>
> >> ---
> >> v3 -> v4:
> >> - full reshape of the API following Alex' comments
> >>
> >> v1 -> v2:
> >> - add arch_id field
> >> - renamed tlb_invalidate into cache_invalidate as this API allows
> >>   to invalidate context caches on top of IOTLBs
> >>
> >> v1:
> >> renamed sva_invalidate into tlb_invalidate and add iommu_ prefix in
> >> header. Commit message reworded.
> >> ---
> >>  drivers/iommu/iommu.c      | 14
> >>  include/linux/iommu.h      | 14
> >>  include/uapi/linux/iommu.h | 71
> >>  3 files changed, 99 insertions(+)
> >>
> >> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> >> index b3adb77cb14c..bcb8eb15426c 100644
> >> --- a/drivers/iommu/iommu.c
> >> +++ b/drivers/iommu/iommu.c
> >> @@ -1564,6 +1564,20 @@ void iommu_detach_pasid_table(struct
> >> iommu_domain *domain) }
> >>  EXPORT_SYMBOL_GPL(iommu_detach_pasid_table);
> >>  
> >> +int iommu_cache_invalidate(struct iommu_domain *domain, struct device *dev,
> >> +                           struct iommu_cache_invalidate_info *inv_info)
> >> +{
> >> +  int ret = 0;
> >> +
> >> +  if (unlikely(!domain->ops->cache_invalidate))
> >> +  return -ENODEV;
> >> +
> >> +  ret = domain->ops->cache_invalidate(domain, dev,
> >> inv_info); +
> >> +  return ret;
> >> +}
> >> +EXPORT_SYMBOL_GPL(iommu_cache_invalidate);
> >> +
> >>  static void __iommu_detach_device(struct iommu_domain *domain,
> >>  struct device *dev)
> >>  {
> >> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> >> index 7045e26f3a7d..a3b879d0753c 100644
> >> --- a/include/linux/iommu.h
> >> +++ b/include/linux/iommu.h
> >> @@ -189,6 +189,7 @@ struct iommu_resv_region {
> >>   * @pgsize_bitmap: bitmap of all possible supported page sizes
> >>   * @attach_pasid_table: attach a pasid table
> >>   * @detach_pasid_table: detach the pasid table
> >> + * @cache_invalidate: invalidate translation caches
> >>   */
> >>  struct iommu_ops {
> >>bool (*capable)(enum iommu_cap);
> >> @@ -235,6 +236,9 @@ struct iommu_ops {
> >>  				struct iommu_pasid_table_config *cfg);
> >>  	void (*detach_pasid_table)(struct iommu_domain *domain);
> >>  
> >> +	int (*cache_invalidate)(struct iommu_domain *domain, struct device *dev,
> >> +				struct iommu_cache_invalidate_info *inv_info);
> >> +
> >>unsigned long pgsize_bitmap;
> >>  };
> >>  
> >> @@ -348,6 +352,9 @@ extern void iommu_detach_device(struct iommu_domain *domain,
> >>  extern int iommu_attach_pasid_table(struct iommu_domain *domain,
> >>  				    struct iommu_pasid_table_config *cfg);
> >>  extern void iommu_detach_pasid_table(struct iommu_domain *domain);
> >> +extern int iommu_cache_invalidate(struct iommu_domain *domain,
> >> +				  struct device *dev,
> >> +				  struct iommu_cache_invalidate_info *inv_info);
> >>  extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
> >>  extern struct iommu_domain *iommu_get_

Re: [RFC v3 14/21] iommu: introduce device fault data

2019-01-14 Thread Jacob Pan
On Fri, 11 Jan 2019 11:06:29 +
Jean-Philippe Brucker  wrote:

> On 10/01/2019 18:45, Jacob Pan wrote:
> > On Tue,  8 Jan 2019 11:26:26 +0100
> > Eric Auger  wrote:
> >   
> >> From: Jacob Pan 
> >>
> >> Device faults detected by IOMMU can be reported outside IOMMU
> >> subsystem for further processing. This patch intends to provide
> >> a generic device fault data such that device drivers can be
> >> communicated with IOMMU faults without model specific knowledge.
> >>
> >> The proposed format is the result of discussion at:
> >> https://lkml.org/lkml/2017/11/10/291
> >> Part of the code is based on Jean-Philippe Brucker's patchset
> >> (https://patchwork.kernel.org/patch/9989315/).
> >>
> >> The assumption is that model specific IOMMU driver can filter and
> >> handle most of the internal faults if the cause is within IOMMU
> >> driver control. Therefore, the fault reasons can be reported are
> >> grouped and generalized based common specifications such as PCI
> >> ATS.
> >>
> >> Signed-off-by: Jacob Pan 
> >> Signed-off-by: Jean-Philippe Brucker
> >>  Signed-off-by: Liu, Yi L
> >>  Signed-off-by: Ashok Raj
> >>  Signed-off-by: Eric Auger
> >>  [moved part of the iommu_fault_event
> >> struct in the uapi, enriched the fault reasons to be able to map
> >> unrecoverable SMMUv3 errors]
> >> ---
> >>  include/linux/iommu.h      | 55
> >>  include/uapi/linux/iommu.h | 83
> >>  2 files changed, 136 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> >> index 244c1a3d5989..1dedc2d247c2 100644
> >> --- a/include/linux/iommu.h
> >> +++ b/include/linux/iommu.h
> >> @@ -49,13 +49,17 @@ struct bus_type;
> >>  struct device;
> >>  struct iommu_domain;
> >>  struct notifier_block;
> >> +struct iommu_fault_event;
> >>  
> >>  /* iommu fault flags */
> >> -#define IOMMU_FAULT_READ  0x0
> >> -#define IOMMU_FAULT_WRITE 0x1
> >> +#define IOMMU_FAULT_READ  (1 << 0)
> >> +#define IOMMU_FAULT_WRITE (1 << 1)
> >> +#define IOMMU_FAULT_EXEC  (1 << 2)
> >> +#define IOMMU_FAULT_PRIV  (1 << 3)
> >>  
> >>  typedef int (*iommu_fault_handler_t)(struct iommu_domain *,
> >>struct device *, unsigned long, int, void
> >> *); +typedef int (*iommu_dev_fault_handler_t)(struct
> >> iommu_fault_event *, void *); 
> >>  struct iommu_domain_geometry {
> >>dma_addr_t aperture_start; /* First address that can be
> >> mapped*/ @@ -255,6 +259,52 @@ struct iommu_device {
> >>struct device *dev;
> >>  };
> >>  
> >> +/**
> >> + * struct iommu_fault_event - Generic per device fault data
> >> + *
> >> + * - PCI and non-PCI devices
> >> + * - Recoverable faults (e.g. page request), information based on
> >> PCI ATS
> >> + * and PASID spec.
> >> + * - Un-recoverable faults of device interest
> >> + * - DMA remapping and IRQ remapping faults
> >> + *
> >> + * @fault: fault descriptor
> >> + * @device_private: if present, uniquely identify device-specific
> >> + *  private data for an individual page request.
> >> + * @iommu_private: used by the IOMMU driver for storing
> >> fault-specific
> >> + * data. Users should not modify this field before
> >> + * sending the fault response.
> >> + */
> >> +struct iommu_fault_event {
> >> +  struct iommu_fault fault;
> >> +  u64 device_private;  
> > I think we want to move device_private to uapi since it gets
> > injected into the guest, then returned by guest in case of page
> > response. For VT-d we also need 128 bits of private data. VT-d
> > spec. 7.7.1  
> 
> Ah, I didn't notice the format changed in VT-d rev3. On that topic,
> how do we manage future extensions to the iommu_fault struct? Should
> we add ~48 bytes of padding after device_private, along with some
> flags telling which field is valid, or deal with it using a structure
> version like we do for the invalidate and bind structs? In the first
> case, iommu_fault wouldn't fit in a 64-byte cacheline anymore, but
> I'm not sure we care.
> 
IMHO, I like version and padding. I don't see a need for fl
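
Purely to make the "version and padding" option concrete, a sketch of what the
uapi fault record could look like; the struct name, field names and sizes
below are assumptions for illustration, not what this series defines:

#define IOMMU_FAULT_VERSION_1	1

struct iommu_fault_example {
	__u32	version;		/* IOMMU_FAULT_VERSION_1 */
	__u32	flags;			/* tells which optional fields are valid */
	/* ... fault type, reason, address, pasid, etc. ... */
	__u64	device_private[2];	/* e.g. 128 bits of VT-d private data */
	__u8	padding[48];		/* headroom for future extensions */
};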

Re: [RFC v3 14/21] iommu: introduce device fault data

2019-01-10 Thread Jacob Pan
On Tue,  8 Jan 2019 11:26:26 +0100
Eric Auger  wrote:

> From: Jacob Pan 
> 
> Device faults detected by IOMMU can be reported outside IOMMU
> subsystem for further processing. This patch intends to provide
> a generic device fault data such that device drivers can be
> communicated with IOMMU faults without model specific knowledge.
> 
> The proposed format is the result of discussion at:
> https://lkml.org/lkml/2017/11/10/291
> Part of the code is based on Jean-Philippe Brucker's patchset
> (https://patchwork.kernel.org/patch/9989315/).
> 
> The assumption is that model specific IOMMU driver can filter and
> handle most of the internal faults if the cause is within IOMMU driver
> control. Therefore, the fault reasons can be reported are grouped
> and generalized based common specifications such as PCI ATS.
> 
> Signed-off-by: Jacob Pan 
> Signed-off-by: Jean-Philippe Brucker 
> Signed-off-by: Liu, Yi L 
> Signed-off-by: Ashok Raj 
> Signed-off-by: Eric Auger 
> [moved part of the iommu_fault_event struct in the uapi, enriched
>  the fault reasons to be able to map unrecoverable SMMUv3 errors]
> ---
>  include/linux/iommu.h      | 55
>  include/uapi/linux/iommu.h | 83
>  2 files changed, 136 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 244c1a3d5989..1dedc2d247c2 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -49,13 +49,17 @@ struct bus_type;
>  struct device;
>  struct iommu_domain;
>  struct notifier_block;
> +struct iommu_fault_event;
>  
>  /* iommu fault flags */
> -#define IOMMU_FAULT_READ	0x0
> -#define IOMMU_FAULT_WRITE	0x1
> +#define IOMMU_FAULT_READ	(1 << 0)
> +#define IOMMU_FAULT_WRITE	(1 << 1)
> +#define IOMMU_FAULT_EXEC	(1 << 2)
> +#define IOMMU_FAULT_PRIV	(1 << 3)
>  
>  typedef int (*iommu_fault_handler_t)(struct iommu_domain *,
>   struct device *, unsigned long, int, void *);
> +typedef int (*iommu_dev_fault_handler_t)(struct iommu_fault_event *,
> void *); 
>  struct iommu_domain_geometry {
>   dma_addr_t aperture_start; /* First address that can be
> mapped*/ @@ -255,6 +259,52 @@ struct iommu_device {
>   struct device *dev;
>  };
>  
> +/**
> + * struct iommu_fault_event - Generic per device fault data
> + *
> + * - PCI and non-PCI devices
> + * - Recoverable faults (e.g. page request), information based on
> PCI ATS
> + * and PASID spec.
> + * - Un-recoverable faults of device interest
> + * - DMA remapping and IRQ remapping faults
> + *
> + * @fault: fault descriptor
> + * @device_private: if present, uniquely identify device-specific
> + *  private data for an individual page request.
> + * @iommu_private: used by the IOMMU driver for storing
> fault-specific
> + * data. Users should not modify this field before
> + * sending the fault response.
> + */
> +struct iommu_fault_event {
> + struct iommu_fault fault;
> + u64 device_private;
I think we want to move device_private to uapi since it gets injected
into the guest, then returned by guest in case of page response. For
VT-d we also need 128 bits of private data. VT-d spec. 7.7.1

For exception tracking (e.g. unanswered page requests), I can add timer
and list info later when I include PRQ. Sounds OK?
> + u64 iommu_private;

> +};
> +
> +/**
> + * struct iommu_fault_param - per-device IOMMU fault data
> + * @dev_fault_handler: Callback function to handle IOMMU faults at
> device level
> + * @data: handler private data
> + *
> + */
> +struct iommu_fault_param {
> + iommu_dev_fault_handler_t handler;
> + void *data;
> +};
> +
> +/**
> + * struct iommu_param - collection of per-device IOMMU data
> + *
> + * @fault_param: IOMMU detected device fault reporting data
> + *
> + * TODO: migrate other per device data pointers under
> iommu_dev_data, e.g.
> + *   struct iommu_group  *iommu_group;
> + *   struct iommu_fwspec *iommu_fwspec;
> + */
> +struct iommu_param {
> + struct iommu_fault_param *fault_param;
> +};
> +
>  int  iommu_device_register(struct iommu_device *iommu);
>  void iommu_device_unregister(struct iommu_device *iommu);
>  int  iommu_device_sysfs_add(struct iommu_device *iommu,
> @@ -438,6 +488,7 @@ struct iommu_ops {};
>  struct iommu_group {};
>  struct iommu_fwspec {};
>  struct iommu_device {};
> +struct iommu_fault_param {};
>  
>  static inline bool iommu_present(struct bus_type *bus)
>  {
> diff --git 

Re: [RFC v2 14/20] iommu: introduce device fault data

2018-12-14 Thread Jacob Pan
On Wed, 12 Dec 2018 09:21:43 +0100
Auger Eric  wrote:

> Hi Jacob,
> 
> On 9/21/18 12:06 AM, Jacob Pan wrote:
> > On Tue, 18 Sep 2018 16:24:51 +0200
> > Eric Auger  wrote:
> >   
> >> From: Jacob Pan 
> >>
> >> Device faults detected by IOMMU can be reported outside IOMMU
> >> subsystem for further processing. This patch intends to provide
> >> a generic device fault data such that device drivers can be
> >> communicated with IOMMU faults without model specific knowledge.
> >>
> >> The proposed format is the result of discussion at:
> >> https://lkml.org/lkml/2017/11/10/291
> >> Part of the code is based on Jean-Philippe Brucker's patchset
> >> (https://patchwork.kernel.org/patch/9989315/).
> >>
> >> The assumption is that model specific IOMMU driver can filter and
> >> handle most of the internal faults if the cause is within IOMMU
> >> driver control. Therefore, the fault reasons can be reported are
> >> grouped and generalized based common specifications such as PCI
> >> ATS.
> >>
> >> Signed-off-by: Jacob Pan 
> >> Signed-off-by: Jean-Philippe Brucker
> >>  Signed-off-by: Liu, Yi L
> >>  Signed-off-by: Ashok Raj
> >>  Signed-off-by: Eric Auger
> >>  [moved part of the iommu_fault_event
> >> struct in the uapi, enriched the fault reasons to be able to map
> >> unrecoverable SMMUv3 errors]  
> > Sounds good to me.
> > There are also other "enrichment" we need to do to support mdev or
> > finer granularity fault reporting below physical device. e.g. PASID
> > level.
> > 
> > The current scheme works for PCIe physical device level, where each
> > device registers a single handler only once. When device fault is
> > detected by the IOMMU, it will find the matching handler and private
> > data to report back. However, for devices partitioned by PASID and
> > represented by mdev this may not work. Since IOMMU is not mdev aware
> > and only works at physical device level.
> > So I am thinking we should allow multiple registration of fault
> > handler with different data and ID. i.e.
> > 
> > int iommu_register_device_fault_handler(struct device *dev,
> > iommu_dev_fault_handler_t
> > handler, int id,
> > void *data)
> > 
> > where the new "id field" is
> >  * @id: Identification of the handler private data, will be used by
> > fault
> >  *  reporting code to match the handler data to be returned.
> > For page
> >  *  request, this can be the PASID. ID must be unique per
> > device, i.e.
> >  *  each ID can only be registered once per device.
> >  *  - IOMMU_DEV_FAULT_ID_UNRECOVERY (~0U) is reserved for fault
> > reporting
> >  *  w/o ID. e.g. unrecoverable faults.
> > 
> > I am still testing, but just wanted to have feedback on this idea.  
> 
> I am currently respinning this series. Do you have a respin for this
> patch including iommu_register_device_fault_handler with the @id param
> as you suggested above? Otherwise 2 solutions: I keep the code as is
> or I do the modification myself implementing a list of fault_params?
> 
You can keep the code as is if it fits your current needs. Yi and I
have thought of some new cases for supporting mdev. We are thinking of
supporting a many-to-many handler-to-PASID relationship, i.e. allowing
registration of many fault handlers per device, each associated with an
ID and data. The use case is that a physical device may register a
fault handler for its own PASID or non-PASID related faults. Such a
physical device can also be partitioned into sub-devices, e.g. mdev, but
fault handler registration stays at the physical device level since the
IOMMU is not mdev aware.
Anyway, this still needs some work to flesh out the details.
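
As a rough sketch of the "list of fault_params" variant mentioned above (all
names below are hypothetical, for illustration only), the per-device data
could key each registered handler by an ID:

/* Hypothetical per-device fault data allowing several handlers,
 * each registered under an ID (e.g. a PASID).
 */
struct iommu_fault_param {
	struct list_head		list;
	u32				id;	/* PASID, or a reserved unrecoverable-fault ID */
	iommu_dev_fault_handler_t	handler;
	void				*data;
};

struct iommu_param {
	struct mutex		lock;
	struct list_head	fault_params;
};

/* Lookup used by the report path to match an event to its handler. */
static struct iommu_fault_param *
iommu_find_fault_param(struct iommu_param *param, u32 id)
{
	struct iommu_fault_param *fparam;

	list_for_each_entry(fparam, &param->fault_params, list)
		if (fparam->id == id)
			return fparam;
	return NULL;
}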
> Besides do you plans for "[PATCH v5 00/23] IOMMU and VT-d driver
> support for Shared Virtual Address (SVA)" respin - hope I didn't miss
> anything? - ?
> 
You did not miss anything. Yes, we are still working on some internal
integration issues. It should not affect the common interface much. Or
I can send out a common API spin first once we have the functionality
tested.

Thanks for checking.

> Thanks
> 
> Eric
> > 
> > Thanks,
> > 
> > Jacob
> > 
> >   
> >> ---
> >>  include/linux/iommu.h      | 55
> >>  include/uapi/linux/iommu.h | 83
> >>  2 files changed, 136 insertions(+), 2 deletions(-)
> >>

Re: [RFC v2 01/20] iommu: Introduce bind_pasid_table API

2018-09-21 Thread Jacob Pan
On Tue, 18 Sep 2018 16:24:38 +0200
Eric Auger  wrote:

> From: Jacob Pan 
> 
> In virtualization use case, when a guest is assigned
> a PCI host device, protected by a virtual IOMMU on a guest,
> the physical IOMMU must be programmed to be consistent with
> the guest mappings. If the physical IOMMU supports two
> translation stages it makes sense to program guest mappings
> onto the first stage/level (ARM/VTD terminology) while to host
> owns the stage/level 2.
> 
> In that case, it is mandated to trap on guest configuration
> settings and pass those to the physical iommu driver.
> 
> This patch adds a new API to the iommu subsystem that allows
> to bind and unbind the guest configuration data to the host.
> 
> A generic iommu_pasid_table_config struct is introduced in
> a new iommu.h uapi header. This is going to be used by the VFIO
> user API. We foresee at least two specializations of this struct,
> for PASID table passing and ARM SMMUv3.
> 
> Signed-off-by: Jean-Philippe Brucker 
> Signed-off-by: Liu, Yi L 
> Signed-off-by: Ashok Raj 
> Signed-off-by: Jacob Pan 
> Signed-off-by: Eric Auger 
> 
> ---
> 
> In practice, I think it would be simpler to have a single
> set_pasid_table function instead of bind/unbind. The "bypass" field
> tells the stage 1 is bypassed (equivalent to the unbind actually).
> On userspace we have notifications that the device context has
> changed. Calling either bind or unbind requires to have an understand
> of what was the previous state and call different notifiers. So to me
> the bind/unbind complexifies the user integration while not bring much
> benefits.
> 
I don't have a strong preference, and I think having a single function
makes sense. In VT-d2, the bind/unbind operation is the result of a PASID
cache invalidation from the guest, so there are no symmetrical
bind/unbind user calls.
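
To make the comparison concrete, a sketch of the single-entry-point
alternative described above; the set_pasid_table name and the bypass field are
taken from Eric's description, and the op shown is hypothetical, not what the
series defines:

/* Hypothetical single call replacing bind/unbind: setting "bypass"
 * in the config is equivalent to today's unbind.
 */
int iommu_set_pasid_table(struct iommu_domain *domain,
			  struct iommu_pasid_table_config *cfg)
{
	if (unlikely(!domain->ops->set_pasid_table))
		return -ENODEV;

	return domain->ops->set_pasid_table(domain, cfg);
}

On a trapped guest context change, the caller either fills the config from the
guest tables or sets cfg.bypass, so a single notifier path covers both the old
bind and the old unbind.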

> This patch generalizes the API introduced by Jacob & co-authors in
> https://lwn.net/Articles/754331/
> 
> v1 -> v2:
> - restore the original pasid table name
> - remove the struct device * parameter in the API
> - reworked iommu_pasid_smmuv3
> ---
>  drivers/iommu/iommu.c      | 19
>  include/linux/iommu.h      | 21
>  include/uapi/linux/iommu.h | 52
>  3 files changed, 92 insertions(+)
>  create mode 100644 include/uapi/linux/iommu.h
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 8c15c5980299..db2c7c9502ae 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -1362,6 +1362,25 @@ int iommu_attach_device(struct iommu_domain
> *domain, struct device *dev) }
>  EXPORT_SYMBOL_GPL(iommu_attach_device);
>  
> +int iommu_bind_pasid_table(struct iommu_domain *domain,
> +struct iommu_pasid_table_config *cfg)
> +{
> + if (unlikely(!domain->ops->bind_pasid_table))
> + return -ENODEV;
> +
> + return domain->ops->bind_pasid_table(domain, cfg);
> +}
> +EXPORT_SYMBOL_GPL(iommu_bind_pasid_table);
> +
> +void iommu_unbind_pasid_table(struct iommu_domain *domain)
> +{
> + if (unlikely(!domain->ops->unbind_pasid_table))
> + return;
> +
> + domain->ops->unbind_pasid_table(domain);
> +}
> +EXPORT_SYMBOL_GPL(iommu_unbind_pasid_table);
> +
>  static void __iommu_detach_device(struct iommu_domain *domain,
> struct device *dev)
>  {
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 87994c265bf5..e56cad4863f7 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -25,6 +25,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #define IOMMU_READ   (1 << 0)
>  #define IOMMU_WRITE  (1 << 1)
> @@ -185,6 +186,8 @@ struct iommu_resv_region {
>   * @domain_get_windows: Return the number of windows for a domain
>   * @of_xlate: add OF master IDs to iommu grouping
>   * @pgsize_bitmap: bitmap of all possible supported page sizes
> + * @bind_pasid_table: bind pasid table
> + * @unbind_pasid_table: unbind pasid table and restore defaults
>   */
>  struct iommu_ops {
>   bool (*capable)(enum iommu_cap);
> @@ -231,6 +234,10 @@ struct iommu_ops {
>  	int (*of_xlate)(struct device *dev, struct of_phandle_args *args);
>  	bool (*is_attach_deferred)(struct iommu_domain *domain, struct device *dev);
>  
> +	int (*bind_pasid_table)(struct iommu_domain *domain,
> +				struct iommu_pasid_table_config *cfg);
> + void (*unbind_pasid_table)(struct iommu_domain *domain);
> +
>   unsigned long pgsize_bitmap;
>  };
>  
> @@ -292,6 +299,9 @@ extern int 

Re: [RFC v2 14/20] iommu: introduce device fault data

2018-09-21 Thread Jacob Pan
On Tue, 18 Sep 2018 16:24:51 +0200
Eric Auger  wrote:

> From: Jacob Pan 
> 
> Device faults detected by IOMMU can be reported outside IOMMU
> subsystem for further processing. This patch intends to provide
> a generic device fault data such that device drivers can be
> communicated with IOMMU faults without model specific knowledge.
> 
> The proposed format is the result of discussion at:
> https://lkml.org/lkml/2017/11/10/291
> Part of the code is based on Jean-Philippe Brucker's patchset
> (https://patchwork.kernel.org/patch/9989315/).
> 
> The assumption is that model specific IOMMU driver can filter and
> handle most of the internal faults if the cause is within IOMMU driver
> control. Therefore, the fault reasons can be reported are grouped
> and generalized based common specifications such as PCI ATS.
> 
> Signed-off-by: Jacob Pan 
> Signed-off-by: Jean-Philippe Brucker 
> Signed-off-by: Liu, Yi L 
> Signed-off-by: Ashok Raj 
> Signed-off-by: Eric Auger 
> [moved part of the iommu_fault_event struct in the uapi, enriched
>  the fault reasons to be able to map unrecoverable SMMUv3 errors]
Sounds good to me.
There are also other "enrichment" we need to do to support mdev or
finer granularity fault reporting below physical device. e.g. PASID
level.

The current scheme works for PCIe physical device level, where each
device registers a single handler only once. When device fault is
detected by the IOMMU, it will find the matching handler and private
data to report back. However, for devices partitioned by PASID and
represented by mdev this may not work. Since IOMMU is not mdev aware
and only works at physical device level.
So I am thinking we should allow multiple registration of fault handler
with different data and ID. i.e.

int iommu_register_device_fault_handler(struct device *dev,
iommu_dev_fault_handler_t handler,
int id,
void *data)

where the new "id field" is
 * @id: Identification of the handler private data, will be used by fault
 *  reporting code to match the handler data to be returned. For page
 *  request, this can be the PASID. ID must be unique per device, i.e.
 *  each ID can only be registered once per device.
 *  - IOMMU_DEV_FAULT_ID_UNRECOVERY (~0U) is reserved for fault reporting
 *  w/o ID. e.g. unrecoverable faults.

I am still testing, but just wanted to have feedback on this idea.

Thanks,

Jacob


> ---
>  include/linux/iommu.h      | 55
>  include/uapi/linux/iommu.h | 83
>  2 files changed, 136 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 9bd3e63d562b..7529c14ff506 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -49,13 +49,17 @@ struct bus_type;
>  struct device;
>  struct iommu_domain;
>  struct notifier_block;
> +struct iommu_fault_event;
>  
>  /* iommu fault flags */
> -#define IOMMU_FAULT_READ	0x0
> -#define IOMMU_FAULT_WRITE	0x1
> +#define IOMMU_FAULT_READ	(1 << 0)
> +#define IOMMU_FAULT_WRITE	(1 << 1)
> +#define IOMMU_FAULT_EXEC	(1 << 2)
> +#define IOMMU_FAULT_PRIV	(1 << 3)
>  
>  typedef int (*iommu_fault_handler_t)(struct iommu_domain *,
>   struct device *, unsigned long, int, void *);
> +typedef int (*iommu_dev_fault_handler_t)(struct iommu_fault_event *,
> void *); 
>  struct iommu_domain_geometry {
>   dma_addr_t aperture_start; /* First address that can be
> mapped*/ @@ -262,6 +266,52 @@ struct iommu_device {
>   struct device *dev;
>  };
>  
> +/**
> + * struct iommu_fault_event - Generic per device fault data
> + *
> + * - PCI and non-PCI devices
> + * - Recoverable faults (e.g. page request), information based on
> PCI ATS
> + * and PASID spec.
> + * - Un-recoverable faults of device interest
> + * - DMA remapping and IRQ remapping faults
> + *
> + * @fault: fault descriptor
> + * @device_private: if present, uniquely identify device-specific
> + *  private data for an individual page request.
> + * @iommu_private: used by the IOMMU driver for storing
> fault-specific
> + * data. Users should not modify this field before
> + * sending the fault response.
> + */
> +struct iommu_fault_event {
> + struct iommu_fault fault;
> + u64 device_private;
> + u64 iommu_private;
> +};
> +
> +/**
> + * struct iommu_fault_param - per-device IOMMU fault data
> + * @dev_fault_handler: Callback function to handle IOMMU faults at
> device leve

Re: [RFC v2 14/20] iommu: introduce device fault data

2018-09-21 Thread Jacob Pan
On Fri, 21 Sep 2018 11:54:56 +0200
Auger Eric  wrote:

> Hi Jacob,
> 
> On 9/21/18 12:06 AM, Jacob Pan wrote:
> > On Tue, 18 Sep 2018 16:24:51 +0200
> > Eric Auger  wrote:
> >   
> >> From: Jacob Pan 
> >>
> >> Device faults detected by IOMMU can be reported outside IOMMU
> >> subsystem for further processing. This patch intends to provide
> >> a generic device fault data such that device drivers can be
> >> communicated with IOMMU faults without model specific knowledge.
> >>
> >> The proposed format is the result of discussion at:
> >> https://lkml.org/lkml/2017/11/10/291
> >> Part of the code is based on Jean-Philippe Brucker's patchset
> >> (https://patchwork.kernel.org/patch/9989315/).
> >>
> >> The assumption is that model specific IOMMU driver can filter and
> >> handle most of the internal faults if the cause is within IOMMU
> >> driver control. Therefore, the fault reasons can be reported are
> >> grouped and generalized based common specifications such as PCI
> >> ATS.
> >>
> >> Signed-off-by: Jacob Pan 
> >> Signed-off-by: Jean-Philippe Brucker
> >>  Signed-off-by: Liu, Yi L
> >>  Signed-off-by: Ashok Raj
> >>  Signed-off-by: Eric Auger
> >>  [moved part of the iommu_fault_event
> >> struct in the uapi, enriched the fault reasons to be able to map
> >> unrecoverable SMMUv3 errors]  
> > Sounds good to me.
> > There are also other "enrichment" we need to do to support mdev or
> > finer granularity fault reporting below physical device. e.g. PASID
> > level.  
> 
> Actually I intended to send you an email about those
> iommu_fault_reason enum value changes. To attach this discussion to
> your original series, I will send a separate email in the "[PATCH v5
> 00/23] IOMMU and VT-d driver support for Shared Virtual Address
> (SVA)" thread.
> > 
> > The current scheme works for PCIe physical device level, where each
> > device registers a single handler only once. When device fault is
> > detected by the IOMMU, it will find the matching handler and private
> > data to report back. However, for devices partitioned by PASID and
> > represented by mdev this may not work. Since IOMMU is not mdev aware
> > and only works at physical device level.
> > So I am thinking we should allow multiple registration of fault
> > handler with different data and ID. i.e.
> > 
> > int iommu_register_device_fault_handler(struct device *dev,
> > iommu_dev_fault_handler_t
> > handler, int id,
> > void *data)
> > 
> > where the new "id field" is
> >  * @id: Identification of the handler private data, will be used by
> > fault
> >  *  reporting code to match the handler data to be returned.
> > For page
> >  *  request, this can be the PASID. ID must be unique per
> > device, i.e.
> >  *  each ID can only be registered once per device.
> >  *  - IOMMU_DEV_FAULT_ID_UNRECOVERY (~0U) is reserved for fault
> > reporting
> >  *  w/o ID. e.g. unrecoverable faults.  
> I don't get this last sentence. Don't you need the feature also for
> unrecoverable faults, ie. isn't it requested to report an
> unrecoverable fault on a specific id?
> 
For unrecoverable faults which are not associated with a specific PASID,
we reserve a range of special IDs. Let me rewrite the comments.
The usage would be:
For handler registration by VFIO or a device driver:
1.  PRQ of PASID1
iommu_register_device_fault_handler(pdev, handler, pasid1, data1);
2.  PRQ of PASID2
iommu_register_device_fault_handler(pdev, handler, pasid2, data2);
3. unrecoverable fault
iommu_register_device_fault_handler(pdev, handler,
IOMMU_DEV_FAULT_ID_UNRECOVERY, NULL);

For IOMMU driver reporting fault back to vfio or kernel driver:
1. PRQ of PASID1
iommu_report_device_fault(dev, evt1)
where evt1->data = data1, evt1->pasid = pasid1
2. PRQ of PASID2
iommu_report_device_fault(dev, evt2)
where evt2->data = data2, evt2->pasid = pasid2
3. unrecoverable fault
iommu_report_device_fault(dev, evt)
where evt->data = NULL, evt->pasid = IOMMU_DEV_FAULT_ID_UNRECOVERY

where evt is of type struct iommu_fault_event.
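
Condensed into code form, a sketch of the usage enumerated above; pdev,
pasid1/pasid2 and data1/data2 are placeholders, and the @id parameter is the
proposed extension, not the current prototype:

	/* registration side (VFIO or a device driver) */
	iommu_register_device_fault_handler(pdev, handler, pasid1, data1);
	iommu_register_device_fault_handler(pdev, handler, pasid2, data2);
	iommu_register_device_fault_handler(pdev, handler,
					    IOMMU_DEV_FAULT_ID_UNRECOVERY, NULL);

	/*
	 * reporting side (IOMMU driver): the report path matches
	 * evt->pasid against the registered IDs and hands back the
	 * data registered under that ID.
	 */
	iommu_report_device_fault(dev, evt);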

> Otherwise looks OK; but I still need to carefully review "[RFC PATCH
> v2 00/10] vfio/mdev: IOMMU aware mediated device".
> 
> Thanks
> 
> Eric
> > 
> > I am still testing, but just wanted to have feedback on this idea.
> > 
> > Thanks,
> > 
> > Jacob
> > 
> >   
> >>