Re: [RFC] iommu: arm-smmu: stall support

2019-05-13 Thread Rob Clark
On Mon, May 13, 2019 at 11:37 AM Jean-Philippe Brucker
 wrote:
>
> Hi Rob,
>
> On 10/05/2019 19:23, Rob Clark wrote:
> > On Fri, Sep 22, 2017 at 2:58 AM Jean-Philippe Brucker
> >  wrote:
> >>
> >> On 22/09/17 10:02, Joerg Roedel wrote:
> >>> On Tue, Sep 19, 2017 at 10:23:43AM -0400, Rob Clark wrote:
>  I would like to decide in the IRQ whether to queue work or not,
>  because when we get a gpu fault, we tend to get 1000's of gpu faults
>  all at once (and I really only need to handle the first one).  I
>  suppose that could also be achieved by having a special return value
>  from the fault handler to say "call me again from a wq"..
> 
>  Note that in the drm driver I already have a suitable wq to queue the
>  work, so it really doesn't buy me anything to have the iommu driver
>  toss things off to a wq for me.  Might be a different situation for
>  other drivers (but I guess mostly other drivers are using iommu API
>  indirectly via dma-mapping?)
> >>>
> >>> Okay, so since you are the only user for now, we don't need a
> >>> work-queue. But I still want the ->resume call-back to be hidden in the
> >>> iommu code and not be exposed to users.
> >>>
> >>> We already have per-domain fault-handlers, so the best solution for now
> >>> is to call ->resume from report_iommu_fault() when the fault-handler
> >>> returns a special value.
> >>
> >> The problem is that report_iommu_fault is called from IRQ context by the
> >> SMMU driver, so the device driver callback cannot sleep.
> >>
> >> So if the device driver needs to be able to sleep between fault report and
> >> resume, as I understand Rob needs for writing debugfs, we can either:
> >>
> >> * call report_iommu_fault from higher up, in a thread or workqueue.
> >> * split the fault reporting as this patch proposes. The exact same
> >>   mechanism is needed for the vSVM work by Intel: in order to inject fault
> >>   into the guest, they would like to have an atomic notifier registered by
> >>   VFIO for passing down the Page Request, and a new function in the IOMMU
> >>   API to resume/complete the fault.
> >>
> >
> > So I was thinking about this topic again.. I would still like to get
> > some sort of async resume so that I can wire up GPU cmdstream/state
> > logging on iommu fault (without locally resurrecting and rebasing this
> > patch and drm/msm side changes each time I need to debug iommu
> > faults)..
>
> We've been working on the new fault reporting API with Jacob and Eric,
> and I intend to send it out soon. It is supposed to be used for
> reporting faults to guests via VFIO, handling page faults via mm, and
> also reporting events directly to device drivers. Please let us know
> what works and what doesn't in your case.
>
> The most recent version of the patches is at
> http://www.linux-arm.org/git?p=linux-jpb.git;a=shortlog;h=refs/heads/sva/api
> (git://www.linux-arm.org/linux-jpb.git branch sva/api). Hopefully on the
> list sometime next week, I'll add you on Cc.
>
> In particular, see commits
> iommu: Introduce device fault data
> iommu: Introduce device fault report API
> iommu: Add recoverable fault reporting
>
> The device driver calls iommu_register_device_fault_handler(dev, cb,
> data). To report a fault, the SMMU driver calls
> iommu_report_device_fault(dev, fault). This calls into the device driver
> directly, there isn't any workqueue. If the fault is recoverable (the
> SMMU driver sets type IOMMU_FAULT_PAGE_REQ rather than
> IOMMU_FAULT_DMA_UNRECOV), the device driver calls iommu_page_response()
> once it has dealt with the fault (after sleeping if it needs to). This
> invokes the SMMU driver's resume callback.

Ok, this sounds at a high level similar to my earlier RFC, in that
resume is split (and that was the main thing I was interested in).
And it does solve one thing I was struggling with, namely that when
the domain is created it doesn't know which iommu device it will be
attached to (given that at least the original arm-smmu.c driver cannot
support stall in all cases)..

For GPU translation faults, I also don't really need to know if the
faulting translation is stalled until the callback (I mainly want to
avoid snapshotting GPU state when the translation is not stalled,
because in that case the snapshot is unlikely to be related to the
fault).

> At the moment we use mutexes, so iommu_report_device_fault() can only be
> called from an IRQ thread, which is incompatible with the current SMMUv2
> driver. Either we need to switch the SMMUv2 driver to an IRQ thread, or
> rework the fault handler to be called from an IRQ handler. The reporting
> also has to be per device rather than per domain, and I'm not sure if
> the SMMUv2 driver can deal with this.

I'll take a closer look at the branch and try to formulate some plan
to add v2 support for this.

For my cases, the GPU always has its own iommu device, while display
and other blocks 

Re: [PATCH v3 02/16] iommu: Introduce cache_invalidate API

2019-05-13 Thread Jacob Pan
On Mon, 13 May 2019 18:09:48 +0100
Jean-Philippe Brucker  wrote:

> On 13/05/2019 17:50, Auger Eric wrote:
> >> struct iommu_inv_pasid_info {
> >> #define IOMMU_INV_PASID_FLAGS_PASID(1 << 0)
> >> #define IOMMU_INV_PASID_FLAGS_ARCHID   (1 << 1)
> >>__u32   flags;
> >>__u32   archid;
> >>__u64   pasid;
> >> };  
> > I agree it does the job now. However it looks a bit strange to do a
> > PASID based invalidation in my case - SMMUv3 nested stage - where I
> > don't have any PASID involved.
> > 
> > Couldn't we call it context based invalidation then? A context can
> > be tagged by a PASID or/and an ARCHID.  
> 
> I think calling it "context" would be confusing as well (I shouldn't
> have used it earlier), since VT-d uses that name for device table
> entries (=STE on Arm SMMU). Maybe "addr_space"?
> 
I am still struggling to understand what ARCHID is after scanning
through the SMMUv3.1 spec. It seems to be a constant for a given SMMU. Why
do you need to pass it down every time? Could you point me to the
document or explain a little more about the ARCHID use cases?
We have three fields called pasid under struct
iommu_cache_invalidate_info{}, which gets confusing :)
> Thanks,
> Jean
> 
> > 
> > Domain invalidation would invalidate all the contexts belonging to
> > that domain.
> > 
> > Thanks
> > 
> > Eric  

[Jacob Pan]


Re: [RFC] iommu: arm-smmu: stall support

2019-05-13 Thread Jean-Philippe Brucker
Hi Rob,

On 10/05/2019 19:23, Rob Clark wrote:
> On Fri, Sep 22, 2017 at 2:58 AM Jean-Philippe Brucker
>  wrote:
>>
>> On 22/09/17 10:02, Joerg Roedel wrote:
>>> On Tue, Sep 19, 2017 at 10:23:43AM -0400, Rob Clark wrote:
 I would like to decide in the IRQ whether to queue work or not,
 because when we get a gpu fault, we tend to get 1000's of gpu faults
 all at once (and I really only need to handle the first one).  I
 suppose that could also be achieved by having a special return value
 from the fault handler to say "call me again from a wq"..

 Note that in the drm driver I already have a suitable wq to queue the
 work, so it really doesn't buy me anything to have the iommu driver
 toss things off to a wq for me.  Might be a different situation for
 other drivers (but I guess mostly other drivers are using iommu API
 indirectly via dma-mapping?)
>>>
>>> Okay, so since you are the only user for now, we don't need a
>>> work-queue. But I still want the ->resume call-back to be hidden in the
>>> iommu code and not be exposed to users.
>>>
>>> We already have per-domain fault-handlers, so the best solution for now
>>> is to call ->resume from report_iommu_fault() when the fault-handler
>>> returns a special value.
>>
>> The problem is that report_iommu_fault is called from IRQ context by the
>> SMMU driver, so the device driver callback cannot sleep.
>>
>> So if the device driver needs to be able to sleep between fault report and
>> resume, as I understand Rob needs for writing debugfs, we can either:
>>
>> * call report_iommu_fault from higher up, in a thread or workqueue.
>> * split the fault reporting as this patch proposes. The exact same
>>   mechanism is needed for the vSVM work by Intel: in order to inject fault
>>   into the guest, they would like to have an atomic notifier registered by
>>   VFIO for passing down the Page Request, and a new function in the IOMMU
>>   API to resume/complete the fault.
>>
> 
> So I was thinking about this topic again.. I would still like to get
> some sort of async resume so that I can wire up GPU cmdstream/state
> logging on iommu fault (without locally resurrecting and rebasing this
> patch and drm/msm side changes each time I need to debug iommu
> faults)..

We've been working on the new fault reporting API with Jacob and Eric,
and I intend to send it out soon. It is supposed to be used for
reporting faults to guests via VFIO, handling page faults via mm, and
also reporting events directly to device drivers. Please let us know
what works and what doesn't in your case.

The most recent version of the patches is at
http://www.linux-arm.org/git?p=linux-jpb.git;a=shortlog;h=refs/heads/sva/api
(git://www.linux-arm.org/linux-jpb.git branch sva/api). Hopefully on the
list sometime next week, I'll add you on Cc.

In particular, see commits
iommu: Introduce device fault data
iommu: Introduce device fault report API
iommu: Add recoverable fault reporting

The device driver calls iommu_register_device_fault_handler(dev, cb,
data). To report a fault, the SMMU driver calls
iommu_report_device_fault(dev, fault). This calls into the device driver
directly, there isn't any workqueue. If the fault is recoverable (the
SMMU driver sets type IOMMU_FAULT_PAGE_REQ rather than
IOMMU_FAULT_DMA_UNRECOV), the device driver calls iommu_page_response()
once it has dealt with the fault (after sleeping if it needs to). This
invokes the SMMU driver's resume callback.
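
As a rough sketch (using the signatures from the sva/api branch above;
untested, and the struct/field names may still change before hitting
the list), driver-side usage would look something like:

	/* called by the IOMMU core when the SMMU driver reports a fault */
	static int my_gpu_fault_handler(struct iommu_fault *fault, void *data)
	{
		struct my_gpu *gpu = data;	/* driver-private state */

		if (fault->type != IOMMU_FAULT_PAGE_REQ)
			return -ENOENT;		/* unrecoverable, nothing to resume */

		/* defer snapshotting/debugfs work to the driver's own wq */
		queue_work(gpu->wq, &gpu->fault_work);
		return 0;
	}

	/* at probe */
	ret = iommu_register_device_fault_handler(dev, my_gpu_fault_handler, gpu);

	/* later, from the workqueue, once the fault has been dealt with */
	struct iommu_page_response resp = {
		.version	= IOMMU_PAGE_RESP_VERSION_1,
		.pasid		= pasid,
		.grpid		= grpid,
		.code		= IOMMU_PAGE_RESP_SUCCESS,
	};
	iommu_page_response(dev, &resp);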

At the moment we use mutexes, so iommu_report_device_fault() can only be
called from an IRQ thread, which is incompatible with the current SMMUv2
driver. Either we need to switch the SMMUv2 driver to an IRQ thread, or
rework the fault handler to be called from an IRQ handler. The reporting
also has to be per device rather than per domain, and I'm not sure if
the SMMUv2 driver can deal with this.
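
For reference, the threaded-IRQ option would mean something along these
lines for the SMMUv2 context fault (illustrative only; the real
conversion needs to audit everything the handler touches):

	/* instead of devm_request_irq(smmu->dev, irq, arm_smmu_context_fault, ...) */
	ret = devm_request_threaded_irq(smmu->dev, irq, NULL,
					arm_smmu_context_fault,
					IRQF_ONESHOT | IRQF_SHARED,
					"arm-smmu-context-fault", domain);

so that arm_smmu_context_fault() runs in thread context and is allowed
to sleep in the reporting path.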

> 
> And I do still prefer the fault cb in irq (or not requiring it in
> wq)..  but on thinking about it, the two ideas aren't entirely
> conflicting, ie. add some flags either when we register handler[1], or
> they could be handled thru domain_set_attr, like:
> 
>  _EXPLICIT_RESUME - iommu API user calls iommu_domain_resume(),
> potentially from wq/thread after fault handler returns
>  _HANDLER_SLEEPS  - iommu core handles the wq, and calls ops->resume()
> internally
> 
> In both cases, from the iommu driver PoV it just implements
> iommu_ops::resume().. in the first case it is called by the iommu user either
> from the fault handler or at some point later (ie. wq or thread).
> 
> I don't particularly need the _HANDLER_SLEEPS case (unless I can't
> convince anyone that iommu_domain_resume() called from outside iommu
> core is a good idea).. so probably I wouldn't wire up the wq plumbing
> for the _HANDLER_SLEEPS case unless someone really wanted me to.
> 
> Since there are more iommu drivers than places that register fault
> handlers, I like the idea that in either case, 
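
(Purely to illustrate the proposal above; these names are hypothetical,
not an existing kernel API. With iommu_set_fault_handler() grown a
flags argument, registration and resume could look like the following.)

	/* hypothetical flags for the proposal above */
	#define IOMMU_FAULT_EXPLICIT_RESUME	(1 << 0)	/* user calls iommu_domain_resume() */
	#define IOMMU_FAULT_HANDLER_SLEEPS	(1 << 1)	/* core defers handler to a wq */

	iommu_set_fault_handler(domain, msm_fault_handler, priv,
				IOMMU_FAULT_EXPLICIT_RESUME);

	/* later, e.g. from the drm/msm workqueue */
	iommu_domain_resume(domain, false);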

Re: [PATCH v3 02/16] iommu: Introduce cache_invalidate API

2019-05-13 Thread Jean-Philippe Brucker
On 13/05/2019 17:50, Auger Eric wrote:
>> struct iommu_inv_pasid_info {
>> #define IOMMU_INV_PASID_FLAGS_PASID  (1 << 0)
>> #define IOMMU_INV_PASID_FLAGS_ARCHID (1 << 1)
>>  __u32   flags;
>>  __u32   archid;
>>  __u64   pasid;
>> };
> I agree it does the job now. However it looks a bit strange to do a
> PASID based invalidation in my case - SMMUv3 nested stage - where I
> don't have any PASID involved.
> 
> Couldn't we call it context based invalidation then? A context can be
> tagged by a PASID or/and an ARCHID.

I think calling it "context" would be confusing as well (I shouldn't
have used it earlier), since VT-d uses that name for device table
entries (=STE on Arm SMMU). Maybe "addr_space"?

Thanks,
Jean

> 
> Domain invalidation would invalidate all the contexts belonging to that
> domain.
> 
> Thanks
> 
> Eric


Re: [PATCH 3/4] iommu/vt-d: Handle RMRR with PCI bridge device scopes

2019-05-13 Thread Auger Eric
Hi Jacob,

On 5/13/19 6:41 PM, Jacob Pan wrote:
> On Mon, 13 May 2019 09:13:01 +0200
> Eric Auger  wrote:
> 
>> When reading the vtd specification and especially the
>> Reserved Memory Region Reporting Structure chapter,
>> it is not obvious that a device scope element cannot be a
>> PCI-PCI bridge, in which case all downstream ports are
>> likely to access the reserved memory region. Let's handle
>> this case in device_has_rmrr.
>>
>> Fixes: ea2447f700ca ("intel-iommu: Prevent devices with RMRRs from
>> being placed into SI Domain")
>>
>> Signed-off-by: Eric Auger 
>> ---
>>  drivers/iommu/intel-iommu.c | 32 +++-
>>  1 file changed, 23 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
>> index e2134b13c9ae..89d82a1d50b1 100644
>> --- a/drivers/iommu/intel-iommu.c
>> +++ b/drivers/iommu/intel-iommu.c
>> @@ -736,12 +736,31 @@ static int iommu_dummy(struct device *dev)
>>  return dev->archdata.iommu == DUMMY_DEVICE_DOMAIN_INFO;
>>  }
>>  
>> +static bool
>> +is_downstream_to_pci_bridge(struct device *deva, struct device *devb)
>> +{
>> +struct pci_dev *pdeva, *pdevb;
>> +
> is there a more illustrative name for these? I guess deva is the
> bridge dev?
deva is the candidate PCI device likely to belong to the PCI
sub-hierarchy of devb (the candidate bridge). For instance, if devb's
secondary/subordinate bus range is [2..4], any device on buses 2-4 is
considered downstream of it.

My concern about the naming was that they are not necessarily a pci
device or a bridge. But at least I can add a comment or rename according
to your suggestion ;-)


>> +if (!dev_is_pci(deva) || !dev_is_pci(devb))
>> +return false;
>> +
>> +pdeva = to_pci_dev(deva);
>> +pdevb = to_pci_dev(devb);
>> +
>> +if (pdevb->subordinate &&
>> +pdevb->subordinate->number <= pdeva->bus->number &&
>> +pdevb->subordinate->busn_res.end >= pdeva->bus->number)
>> +return true;
>> +
>> +return false;
>> +}
>> +
> this seems to be a separate cleanup patch.
I can split this into two patches: one introducing the helper and using it
in device_to_iommu, and a second using it in device_has_rmrr.

>>  static struct intel_iommu *device_to_iommu(struct device *dev, u8
>> *bus, u8 *devfn) {
>>  struct dmar_drhd_unit *drhd = NULL;
>>  struct intel_iommu *iommu;
>>  struct device *tmp;
>> -struct pci_dev *ptmp, *pdev = NULL;
>> +struct pci_dev *pdev = NULL;
>>  u16 segment = 0;
>>  int i;
>>  
>> @@ -787,13 +806,7 @@ static struct intel_iommu
>> *device_to_iommu(struct device *dev, u8 *bus, u8 *devf goto out;
>>  }
>>  
>> -if (!pdev || !dev_is_pci(tmp))
>> -continue;
>> -
>> -ptmp = to_pci_dev(tmp);
>> -if (ptmp->subordinate &&
>> -ptmp->subordinate->number <=
>> pdev->bus->number &&
>> -ptmp->subordinate->busn_res.end >=
>> pdev->bus->number)
>> +if (is_downstream_to_pci_bridge(dev, tmp))
>>  goto got_pdev;
>>  }
>>  
>> @@ -2886,7 +2899,8 @@ static bool device_has_rmrr(struct device *dev)
>>   */
>>  for_each_active_dev_scope(rmrr->devices,
>>rmrr->devices_cnt, i, tmp)
>> -if (tmp == dev) {
>> +if (tmp == dev ||
>> +is_downstream_to_pci_bridge(dev, tmp)) {
>>  rcu_read_unlock();
>>  return true;
>>  }
> 
> [Jacob Pan]
> 
Thanks

Eric


Re: [PATCH v3 02/16] iommu: Introduce cache_invalidate API

2019-05-13 Thread Auger Eric
Hi Jean-Philippe,

On 5/13/19 1:20 PM, Jean-Philippe Brucker wrote:
> Hi Eric,
> 
> On 13/05/2019 10:14, Auger Eric wrote:
>> I noticed my qemu integration was incorrectly using PASID
>> invalidation for ASID-based invalidation (SMMUv3 stage 1 CMD_TLBI_NH_ASID
>> invalidation command). So I think we also need ARCHID invalidation.
>> Sorry for the late notice.
>>>  
>>> +/* defines the granularity of the invalidation */
>>> +enum iommu_inv_granularity {
>>> +   IOMMU_INV_GRANU_DOMAIN, /* domain-selective invalidation */
>> IOMMU_INV_GRANU_ARCHID, /* archid-selective invalidation */
>>> +   IOMMU_INV_GRANU_PASID,  /* pasid-selective invalidation */
> 
> In terms of granularity, these values have the same meaning: invalidate
> the whole address space of a context. Then you can communicate two
> things using the same struct:
> * If ATS is enabled, an Arm host needs to invalidate all ATC entries
> using PASID.
> * If BTM isn't used by the guest, the host needs to invalidate all TLB
> entries using ARCHID.
> 
> Rather than introducing a new granule here, could we just add an archid
> field to the struct associated with IOMMU_INV_GRANU_PASID? Something like...
> 
>>> +   IOMMU_INV_GRANU_ADDR,   /* page-selective invalidation */
>>> +   IOMMU_INVAL_GRANU_NR,   /* number of invalidation granularities */
>>> +};
>>> +
>>> +/**
>>> + * Address Selective Invalidation Structure
>>> + *
>>> + * @flags indicates the granularity of the address-selective invalidation
>>> + * - if PASID bit is set, @pasid field is populated and the invalidation
>>> + *   relates to cache entries tagged with this PASID and matching the
>>> + *   address range.
>>> + * - if ARCHID bit is set, @archid is populated and the invalidation 
>>> relates
>>> + *   to cache entries tagged with this architecture specific id and 
>>> matching
>>> + *   the address range.
>>> + * - Both PASID and ARCHID can be set as they may tag different caches.
>>> + * - if neither PASID or ARCHID is set, global addr invalidation applies
>>> + * - LEAF flag indicates whether only the leaf PTE caching needs to be
>>> + *   invalidated and other paging structure caches can be preserved.
>>> + * @pasid: process address space id
>>> + * @archid: architecture-specific id
>>> + * @addr: first stage/level input address
>>> + * @granule_size: page/block size of the mapping in bytes
>>> + * @nb_granules: number of contiguous granules to be invalidated
>>> + */
>>> +struct iommu_inv_addr_info {
>>> +#define IOMMU_INV_ADDR_FLAGS_PASID (1 << 0)
>>> +#define IOMMU_INV_ADDR_FLAGS_ARCHID(1 << 1)
>>> +#define IOMMU_INV_ADDR_FLAGS_LEAF  (1 << 2)
>>> +   __u32   flags;
>>> +   __u32   archid;
>>> +   __u64   pasid;
>>> +   __u64   addr;
>>> +   __u64   granule_size;
>>> +   __u64   nb_granules;
>>> +};
> 
> struct iommu_inv_pasid_info {
> #define IOMMU_INV_PASID_FLAGS_PASID   (1 << 0)
> #define IOMMU_INV_PASID_FLAGS_ARCHID  (1 << 1)
>   __u32   flags;
>   __u32   archid;
>   __u64   pasid;
> };
I agree it does the job now. However it looks a bit strange to do a
PASID based invalidation in my case - SMMUv3 nested stage - where I
don't have any PASID involved.

Couldn't we call it context based invalidation then? A context can be
tagged by a PASID or/and an ARCHID.

Domain invalidation would invalidate all the contexts belonging to that
domain.
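
(For illustration: assuming the struct above ends up as a pasid_info
member of iommu_cache_invalidate_info, the ASID-only invalidation I need
for the SMMUv3 nested stage would be filled in as below. Untested, and
guest_asid stands for whatever ASID the guest used in its
CMD_TLBI_NH_ASID command.)

	struct iommu_cache_invalidate_info info = {
		.version	= IOMMU_CACHE_INVALIDATE_INFO_VERSION_1,
		.cache		= IOMMU_CACHE_INV_TYPE_IOTLB,
		.granularity	= IOMMU_INV_GRANU_PASID,
	};

	/* no PASID involved: tag the invalidation by ARCHID (= guest ASID) only */
	info.pasid_info.flags  = IOMMU_INV_PASID_FLAGS_ARCHID;
	info.pasid_info.archid = guest_asid;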

Thanks

Eric
> 
>>> +
>>> +/**
>>> + * First level/stage invalidation information
>>> + * @cache: bitfield that allows to select which caches to invalidate
>>> + * @granularity: defines the lowest granularity used for the invalidation:
>>> + * domain > pasid > addr
>>> + *
>>> + * Not all the combinations of cache/granularity make sense:
>>> + *
>>> + * type |   DEV_IOTLB   | IOTLB |  PASID|
>>> + * granularity |   |   |  cache|
>>> + * -+---+---+---+
>>> + * DOMAIN  |   N/A |   Y   |   Y   |
>>  * ARCHID   |   N/A |   Y   |   N/A |
>>
>>> + * PASID   |   Y   |   Y   |   Y   |
>>> + * ADDR|   Y   |   Y   |   N/A |
>>> + *
>>> + * Invalidations by %IOMMU_INV_GRANU_ADDR use field @addr_info.
>>  * Invalidations by %IOMMU_INV_GRANU_ARCHID use field @archid.
>>> + * Invalidations by %IOMMU_INV_GRANU_PASID use field @pasid.
>>> + * Invalidations by %IOMMU_INV_GRANU_DOMAIN don't take any argument.
>>> + *
>>> + * If multiple cache types are invalidated simultaneously, they all
>>> + * must support the used granularity.
>>> + */
>>> +struct iommu_cache_invalidate_info {
>>> +#define IOMMU_CACHE_INVALIDATE_INFO_VERSION_1 1
>>> +   __u32   version;
>>> +/* IOMMU paging structure cache */
>>> +#define IOMMU_CACHE_INV_TYPE_IOTLB (1 << 0) /* IOMMU IOTLB */
>>> +#define IOMMU_CACHE_INV_TYPE_DEV_IOTLB (1 << 1) /* Device IOTLB */
>>> +#define IOMMU_CACHE_INV_TYPE_PASID (1 << 2) /* PASID cache */
>>> 

Re: [PATCH 3/4] iommu/vt-d: Handle RMRR with PCI bridge device scopes

2019-05-13 Thread Jacob Pan
On Mon, 13 May 2019 09:13:01 +0200
Eric Auger  wrote:

> When reading the vtd specification and especially the
> Reserved Memory Region Reporting Structure chapter,
> it is not obvious that a device scope element cannot be a
> PCI-PCI bridge, in which case all downstream ports are
> likely to access the reserved memory region. Let's handle
> this case in device_has_rmrr.
> 
> Fixes: ea2447f700ca ("intel-iommu: Prevent devices with RMRRs from
> being placed into SI Domain")
> 
> Signed-off-by: Eric Auger 
> ---
>  drivers/iommu/intel-iommu.c | 32 +++-
>  1 file changed, 23 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index e2134b13c9ae..89d82a1d50b1 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -736,12 +736,31 @@ static int iommu_dummy(struct device *dev)
>   return dev->archdata.iommu == DUMMY_DEVICE_DOMAIN_INFO;
>  }
>  
> +static bool
> +is_downstream_to_pci_bridge(struct device *deva, struct device *devb)
> +{
> + struct pci_dev *pdeva, *pdevb;
> +
is there a more illustrative name for these? I guess deva is the
bridge dev?
> + if (!dev_is_pci(deva) || !dev_is_pci(devb))
> + return false;
> +
> + pdeva = to_pci_dev(deva);
> + pdevb = to_pci_dev(devb);
> +
> + if (pdevb->subordinate &&
> + pdevb->subordinate->number <= pdeva->bus->number &&
> + pdevb->subordinate->busn_res.end >= pdeva->bus->number)
> + return true;
> +
> + return false;
> +}
> +
this seems to be a separate cleanup patch.
>  static struct intel_iommu *device_to_iommu(struct device *dev, u8
> *bus, u8 *devfn) {
>   struct dmar_drhd_unit *drhd = NULL;
>   struct intel_iommu *iommu;
>   struct device *tmp;
> - struct pci_dev *ptmp, *pdev = NULL;
> + struct pci_dev *pdev = NULL;
>   u16 segment = 0;
>   int i;
>  
> @@ -787,13 +806,7 @@ static struct intel_iommu
> *device_to_iommu(struct device *dev, u8 *bus, u8 *devf goto out;
>   }
>  
> - if (!pdev || !dev_is_pci(tmp))
> - continue;
> -
> - ptmp = to_pci_dev(tmp);
> - if (ptmp->subordinate &&
> - ptmp->subordinate->number <=
> pdev->bus->number &&
> - ptmp->subordinate->busn_res.end >=
> pdev->bus->number)
> + if (is_downstream_to_pci_bridge(dev, tmp))
>   goto got_pdev;
>   }
>  
> @@ -2886,7 +2899,8 @@ static bool device_has_rmrr(struct device *dev)
>*/
>   for_each_active_dev_scope(rmrr->devices,
> rmrr->devices_cnt, i, tmp)
> - if (tmp == dev) {
> + if (tmp == dev ||
> + is_downstream_to_pci_bridge(dev, tmp)) {
>   rcu_read_unlock();
>   return true;
>   }

[Jacob Pan]


Re: [PATCH v7 12/23] iommu/smmuv3: Get prepared for nested stage support

2019-05-13 Thread Auger Eric
Hi Robin,

On 5/13/19 1:43 PM, Robin Murphy wrote:
> On 10/05/2019 15:34, Auger Eric wrote:
>> Hi Robin,
>>
>> On 5/8/19 4:24 PM, Robin Murphy wrote:
>>> On 08/04/2019 13:19, Eric Auger wrote:
 To allow nested stage support, we need to store both
 stage 1 and stage 2 configurations (and remove the former
 union).

 A nested setup is characterized by both s1_cfg and s2_cfg
 set.

 We introduce a new ste.abort field that will be set upon
 guest stage1 configuration passing. If s1_cfg is NULL and
 ste.abort is set, traffic can't pass. If ste.abort is not set,
 S1 is bypassed.

 arm_smmu_write_strtab_ent() is modified to write both stage
 fields in the STE and deal with the abort field.

 In nested mode, only stage 2 is "finalized" as the host does
 not own/configure the stage 1 context descriptor, guest does.

 Signed-off-by: Eric Auger 

 ---

 v4 -> v5:
 - reset ste.abort on detach

 v3 -> v4:
 - s1_cfg.nested_abort and nested_bypass removed.
 - s/ste.nested/ste.abort
 - arm_smmu_write_strtab_ent modifications with introduction
     of local abort, bypass and translate local variables
 - comment updated

 v1 -> v2:
 - invalidate the STE before moving from a live STE config to another
 - add the nested_abort and nested_bypass fields
 ---
    drivers/iommu/arm-smmu-v3.c | 35 ---
    1 file changed, 20 insertions(+), 15 deletions(-)

 diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
 index 21d027695181..e22e944ffc05 100644
 --- a/drivers/iommu/arm-smmu-v3.c
 +++ b/drivers/iommu/arm-smmu-v3.c
 @@ -211,6 +211,7 @@
    #define STRTAB_STE_0_CFG_BYPASS    4
    #define STRTAB_STE_0_CFG_S1_TRANS    5
    #define STRTAB_STE_0_CFG_S2_TRANS    6
 +#define STRTAB_STE_0_CFG_NESTED    7
      #define STRTAB_STE_0_S1FMT    GENMASK_ULL(5, 4)
    #define STRTAB_STE_0_S1FMT_LINEAR    0
 @@ -514,6 +515,7 @@ struct arm_smmu_strtab_ent {
     * configured according to the domain type.
     */
    bool    assigned;
 +    bool    abort;
    struct arm_smmu_s1_cfg    *s1_cfg;
    struct arm_smmu_s2_cfg    *s2_cfg;
    };
 @@ -628,10 +630,8 @@ struct arm_smmu_domain {
    bool    non_strict;
      enum arm_smmu_domain_stage    stage;
 -    union {
 -    struct arm_smmu_s1_cfg    s1_cfg;
 -    struct arm_smmu_s2_cfg    s2_cfg;
 -    };
 +    struct arm_smmu_s1_cfg    s1_cfg;
 +    struct arm_smmu_s2_cfg    s2_cfg;
      struct iommu_domain    domain;
    @@ -1108,12 +1108,13 @@ static void arm_smmu_write_strtab_ent(struct
 arm_smmu_device *smmu, u32 sid,
  __le64 *dst, struct arm_smmu_strtab_ent *ste)
    {
    /*
 - * This is hideously complicated, but we only really care about
 - * three cases at the moment:
 + * We care about the following transitions:
     *
     * 1. Invalid (all zero) -> bypass/fault (init)
 - * 2. Bypass/fault -> translation/bypass (attach)
 - * 3. Translation/bypass -> bypass/fault (detach)
 + * 2. Bypass/fault -> single stage translation/bypass (attach)
 + * 3. single stage Translation/bypass -> bypass/fault (detach)
 + * 4. S2 -> S1 + S2 (attach_pasid_table)
 + * 5. S1 + S2 -> S2 (detach_pasid_table)
     *
     * Given that we can't update the STE atomically and the SMMU
     * doesn't read the thing in a defined order, that leaves us
 @@ -1124,7 +1125,7 @@ static void arm_smmu_write_strtab_ent(struct
 arm_smmu_device *smmu, u32 sid,
     * 3. Update Config, sync
     */
    u64 val = le64_to_cpu(dst[0]);
 -    bool ste_live = false;
 +    bool abort, bypass, translate, ste_live = false;
    struct arm_smmu_cmdq_ent prefetch_cmd = {
    .opcode    = CMDQ_OP_PREFETCH_CFG,
    .prefetch    = {
 @@ -1138,11 +1139,11 @@ static void arm_smmu_write_strtab_ent(struct
 arm_smmu_device *smmu, u32 sid,
    break;
    case STRTAB_STE_0_CFG_S1_TRANS:
    case STRTAB_STE_0_CFG_S2_TRANS:
 +    case STRTAB_STE_0_CFG_NESTED:
    ste_live = true;
    break;
    case STRTAB_STE_0_CFG_ABORT:
 -    if (disable_bypass)
 -    break;
 +    break;
    default:
    BUG(); /* STE corruption */
    }
 @@ -1152,8 +1153,13 @@ static void arm_smmu_write_strtab_ent(struct
 arm_smmu_device *smmu, u32 sid,
    val = STRTAB_STE_0_V;
      /* Bypass/fault */
 -    if (!ste->assigned || 

Re: [PATCH v6 1/1] iommu: enhance IOMMU dma mode build options

2019-05-13 Thread Leizhen (ThunderTown)



On 2019/5/8 17:42, John Garry wrote:
> On 18/04/2019 14:57, Zhen Lei wrote:
>> First, add build option IOMMU_DEFAULT_{LAZY|STRICT}, so that we have the
>> opportunity to set {lazy|strict} mode as default at build time. Then put
>> the three config options in a choice, so that people can only choose one of
>> the three at a time.
>>
>> The default IOMMU dma modes on each ARCHs have no change.
>>
>> Signed-off-by: Zhen Lei 
>> ---
>>  arch/ia64/kernel/pci-dma.c|  2 +-
>>  arch/powerpc/platforms/powernv/pci-ioda.c |  3 ++-
>>  arch/s390/pci/pci_dma.c   |  2 +-
>>  arch/x86/kernel/pci-dma.c |  7 ++---
>>  drivers/iommu/Kconfig | 44 
>> ++-
>>  drivers/iommu/amd_iommu_init.c|  3 ++-
>>  drivers/iommu/intel-iommu.c   |  2 +-
>>  drivers/iommu/iommu.c |  3 ++-
>>  8 files changed, 48 insertions(+), 18 deletions(-)
>>
>> diff --git a/arch/ia64/kernel/pci-dma.c b/arch/ia64/kernel/pci-dma.c
>> index fe988c49f01ce6a..655511dbf3c3b34 100644
>> --- a/arch/ia64/kernel/pci-dma.c
>> +++ b/arch/ia64/kernel/pci-dma.c
>> @@ -22,7 +22,7 @@
>>  int force_iommu __read_mostly;
>>  #endif
>>
>> -int iommu_pass_through;
>> +int iommu_pass_through = IS_ENABLED(CONFIG_IOMMU_DEFAULT_PASSTHROUGH);
>>
>>  static int __init pci_iommu_init(void)
>>  {
>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
>> b/arch/powerpc/platforms/powernv/pci-ioda.c
>> index 3ead4c237ed0ec9..383e082a9bb985c 100644
>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>> @@ -85,7 +85,8 @@ void pe_level_printk(const struct pnv_ioda_pe *pe, const 
>> char *level,
>>  va_end(args);
>>  }
>>
>> -static bool pnv_iommu_bypass_disabled __read_mostly;
>> +static bool pnv_iommu_bypass_disabled __read_mostly =
>> +!IS_ENABLED(CONFIG_IOMMU_DEFAULT_PASSTHROUGH);
>>  static bool pci_reset_phbs __read_mostly;
>>
>>  static int __init iommu_setup(char *str)
>> diff --git a/arch/s390/pci/pci_dma.c b/arch/s390/pci/pci_dma.c
>> index 9e52d1527f71495..784ad1e0acecfb1 100644
>> --- a/arch/s390/pci/pci_dma.c
>> +++ b/arch/s390/pci/pci_dma.c
>> @@ -17,7 +17,7 @@
>>
>>  static struct kmem_cache *dma_region_table_cache;
>>  static struct kmem_cache *dma_page_table_cache;
>> -static int s390_iommu_strict;
>> +static int s390_iommu_strict = IS_ENABLED(CONFIG_IOMMU_DEFAULT_STRICT);
>>
>>  static int zpci_refresh_global(struct zpci_dev *zdev)
>>  {
>> diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
>> index d460998ae828514..fb2bab42a0a3173 100644
>> --- a/arch/x86/kernel/pci-dma.c
>> +++ b/arch/x86/kernel/pci-dma.c
>> @@ -43,11 +43,8 @@
>>   * It is also possible to disable by default in kernel config, and enable 
>> with
>>   * iommu=nopt at boot time.
>>   */
>> -#ifdef CONFIG_IOMMU_DEFAULT_PASSTHROUGH
>> -int iommu_pass_through __read_mostly = 1;
>> -#else
>> -int iommu_pass_through __read_mostly;
>> -#endif
>> +int iommu_pass_through __read_mostly =
>> +IS_ENABLED(CONFIG_IOMMU_DEFAULT_PASSTHROUGH);
>>
>>  extern struct iommu_table_entry __iommu_table[], __iommu_table_end[];
>>
>> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
>> index 6f07f3b21816c64..8a1f1793cde76b4 100644
>> --- a/drivers/iommu/Kconfig
>> +++ b/drivers/iommu/Kconfig
>> @@ -74,17 +74,47 @@ config IOMMU_DEBUGFS
>>debug/iommu directory, and then populate a subdirectory with
>>entries as required.
>>
>> -config IOMMU_DEFAULT_PASSTHROUGH
>> -bool "IOMMU passthrough by default"
>> +choice
>> +prompt "IOMMU dma mode"
> 
> /s/dma/DMA/
OK

> 
> And how about add "default", as in "Default IOMMU DMA mode" or "IOMMU default 
> DMA mode"?
Yes. I prefer "IOMMU default DMA mode".

> 
>>  depends on IOMMU_API
>> -help
>> -  Enable passthrough by default, removing the need to pass in
>> -  iommu.passthrough=on or iommu=pt through command line. If this
>> -  is enabled, you can still disable with iommu.passthrough=off
>> -  or iommu=nopt depending on the architecture.
>> +default IOMMU_DEFAULT_PASSTHROUGH if (PPC_POWERNV && PCI)
>> +default IOMMU_DEFAULT_LAZY if (AMD_IOMMU || INTEL_IOMMU || S390_IOMMU)
>> +default IOMMU_DEFAULT_STRICT
>> +help
>> +  This option allows IOMMU dma mode to be chose at build time, to
> 
> again, capitalize acronyms, i.e. /s/dma/DMA/ (more of these above and below)
OK, I will check it all. Thanks.

> 
>> +  override the default dma mode of each ARCHs, removing the need to
>> +  pass in kernel parameters through command line. You can still use
>> +  ARCHs specific boot options to override this option again.
>> +
>> +config IOMMU_DEFAULT_PASSTHROUGH
> 
> I think that it may need to be indented, along with the other choices
There is no problem; I referred to mm/Kconfig, which does the same.

> 
>> +bool "passthrough"
>> +help
>> +  In this mode, the dma access through IOMMU 

Re: [PATCH v7 14/23] iommu/smmuv3: Implement cache_invalidate

2019-05-13 Thread Auger Eric
Hi Robin,

On 5/13/19 4:01 PM, Robin Murphy wrote:
> On 13/05/2019 13:16, Auger Eric wrote:
>> Hi Robin,
>> On 5/8/19 5:01 PM, Robin Murphy wrote:
>>> On 08/04/2019 13:19, Eric Auger wrote:
 Implement domain-selective and page-selective IOTLB invalidations.

 Signed-off-by: Eric Auger 

 ---
 v6 -> v7
 - check the uapi version

 v3 -> v4:
 - adapt to changes in the uapi
 - add support for leaf parameter
 - do not use arm_smmu_tlb_inv_range_nosync or arm_smmu_tlb_inv_context
     anymore

 v2 -> v3:
 - replace __arm_smmu_tlb_sync by arm_smmu_cmdq_issue_sync

 v1 -> v2:
 - properly pass the asid
 ---
    drivers/iommu/arm-smmu-v3.c | 60
 +
    1 file changed, 60 insertions(+)

 diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
 index 1486baf53425..4366921d8318 100644
 --- a/drivers/iommu/arm-smmu-v3.c
 +++ b/drivers/iommu/arm-smmu-v3.c
 @@ -2326,6 +2326,65 @@ static void arm_smmu_detach_pasid_table(struct
 iommu_domain *domain)
    mutex_unlock(&smmu_domain->init_mutex);
    }
    +static int
 +arm_smmu_cache_invalidate(struct iommu_domain *domain, struct device
 *dev,
 +  struct iommu_cache_invalidate_info *inv_info)
 +{
 +    struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
 +    struct arm_smmu_device *smmu = smmu_domain->smmu;
 +
 +    if (smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
 +    return -EINVAL;
 +
 +    if (!smmu)
 +    return -EINVAL;
 +
 +    if (inv_info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
 +    return -EINVAL;
 +
 +    if (inv_info->cache & IOMMU_CACHE_INV_TYPE_IOTLB) {
 +    if (inv_info->granularity == IOMMU_INV_GRANU_PASID) {
 +    struct arm_smmu_cmdq_ent cmd = {
 +    .opcode = CMDQ_OP_TLBI_NH_ASID,
 +    .tlbi = {
 +    .vmid = smmu_domain->s2_cfg.vmid,
 +    .asid = inv_info->pasid,
 +    },
 +    };
 +
 +    arm_smmu_cmdq_issue_cmd(smmu, &cmd);
 +    arm_smmu_cmdq_issue_sync(smmu);
>>>
>>> I'd much rather make arm_smmu_tlb_inv_context() understand nested
>>> domains than open-code commands all over the place.
>>
>>
>>>
 +
 +    } else if (inv_info->granularity == IOMMU_INV_GRANU_ADDR) {
 +    struct iommu_inv_addr_info *info = &inv_info->addr_info;
 +    size_t size = info->nb_granules * info->granule_size;
 +    bool leaf = info->flags & IOMMU_INV_ADDR_FLAGS_LEAF;
 +    struct arm_smmu_cmdq_ent cmd = {
 +    .opcode = CMDQ_OP_TLBI_NH_VA,
 +    .tlbi = {
 +    .addr = info->addr,
 +    .vmid = smmu_domain->s2_cfg.vmid,
 +    .asid = info->pasid,
 +    .leaf = leaf,
 +    },
 +    };
 +
 +    do {
 +    arm_smmu_cmdq_issue_cmd(smmu, &cmd);
 +    cmd.tlbi.addr += info->granule_size;
 +    } while (size -= info->granule_size);
 +    arm_smmu_cmdq_issue_sync(smmu);
>>>
>>> An this in particular I would really like to go all the way through
>>> io_pgtable_tlb_add_flush()/io_pgtable_sync() if at all possible. Hooking
>>> up range-based invalidations is going to be a massive headache if the
>>> abstraction isn't solid.
>>
>> The concern is the host does not "own" the s1 config asid
>> (smmu_domain->s1_cfg.cd.asid is not set, practically). In our case the
>> asid only is passed by the userspace on CACHE_INVALIDATE ioctl call.
>>
>> arm_smmu_tlb_inv_context and arm_smmu_tlb_inv_range_nosync use this field
> 
> Right, but that's not exactly hard to solve. Even just something like
> the (untested, purely illustrative) refactoring below would be beneficial.
Sure I can go this way.

Thank you for detailing

Eric
> 
> Robin.
> 
> ->8-
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index d3880010c6cf..31ef703cf671 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -1423,11 +1423,9 @@ static void arm_smmu_tlb_inv_context(void *cookie)
>  arm_smmu_cmdq_issue_sync(smmu);
>  }
> 
> -static void arm_smmu_tlb_inv_range_nosync(unsigned long iova, size_t size,
> -  size_t granule, bool leaf, void *cookie)
> +static void __arm_smmu_tlb_inv_range(struct arm_smmu_domain
> *smmu_domain, u16 asid,
> +    unsigned long iova, size_t size, size_t granule, bool leaf)
>  {
> -    struct arm_smmu_domain *smmu_domain = cookie;
> -    struct arm_smmu_device *smmu = smmu_domain->smmu;
>  struct arm_smmu_cmdq_ent cmd = {
>  .tlbi = {
>  .leaf    = leaf,
> @@ -1437,18 +1435,27 @@ static void
> 

Re: [PATCH v7 14/23] iommu/smmuv3: Implement cache_invalidate

2019-05-13 Thread Robin Murphy

On 13/05/2019 13:16, Auger Eric wrote:

Hi Robin,
On 5/8/19 5:01 PM, Robin Murphy wrote:

On 08/04/2019 13:19, Eric Auger wrote:

Implement domain-selective and page-selective IOTLB invalidations.

Signed-off-by: Eric Auger 

---
v6 -> v7
- check the uapi version

v3 -> v4:
- adapt to changes in the uapi
- add support for leaf parameter
- do not use arm_smmu_tlb_inv_range_nosync or arm_smmu_tlb_inv_context
    anymore

v2 -> v3:
- replace __arm_smmu_tlb_sync by arm_smmu_cmdq_issue_sync

v1 -> v2:
- properly pass the asid
---
   drivers/iommu/arm-smmu-v3.c | 60 +
   1 file changed, 60 insertions(+)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 1486baf53425..4366921d8318 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -2326,6 +2326,65 @@ static void arm_smmu_detach_pasid_table(struct
iommu_domain *domain)
   mutex_unlock(&smmu_domain->init_mutex);
   }
   +static int
+arm_smmu_cache_invalidate(struct iommu_domain *domain, struct device
*dev,
+  struct iommu_cache_invalidate_info *inv_info)
+{
+    struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+    struct arm_smmu_device *smmu = smmu_domain->smmu;
+
+    if (smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
+    return -EINVAL;
+
+    if (!smmu)
+    return -EINVAL;
+
+    if (inv_info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
+    return -EINVAL;
+
+    if (inv_info->cache & IOMMU_CACHE_INV_TYPE_IOTLB) {
+    if (inv_info->granularity == IOMMU_INV_GRANU_PASID) {
+    struct arm_smmu_cmdq_ent cmd = {
+    .opcode = CMDQ_OP_TLBI_NH_ASID,
+    .tlbi = {
+    .vmid = smmu_domain->s2_cfg.vmid,
+    .asid = inv_info->pasid,
+    },
+    };
+
+    arm_smmu_cmdq_issue_cmd(smmu, &cmd);
+    arm_smmu_cmdq_issue_sync(smmu);


I'd much rather make arm_smmu_tlb_inv_context() understand nested
domains than open-code commands all over the place.






+
+    } else if (inv_info->granularity == IOMMU_INV_GRANU_ADDR) {
+    struct iommu_inv_addr_info *info = &inv_info->addr_info;
+    size_t size = info->nb_granules * info->granule_size;
+    bool leaf = info->flags & IOMMU_INV_ADDR_FLAGS_LEAF;
+    struct arm_smmu_cmdq_ent cmd = {
+    .opcode = CMDQ_OP_TLBI_NH_VA,
+    .tlbi = {
+    .addr = info->addr,
+    .vmid = smmu_domain->s2_cfg.vmid,
+    .asid = info->pasid,
+    .leaf = leaf,
+    },
+    };
+
+    do {
+    arm_smmu_cmdq_issue_cmd(smmu, &cmd);
+    cmd.tlbi.addr += info->granule_size;
+    } while (size -= info->granule_size);
+    arm_smmu_cmdq_issue_sync(smmu);


An this in particular I would really like to go all the way through
io_pgtable_tlb_add_flush()/io_pgtable_sync() if at all possible. Hooking
up range-based invalidations is going to be a massive headache if the
abstraction isn't solid.


The concern is the host does not "own" the s1 config asid
(smmu_domain->s1_cfg.cd.asid is not set, practically). In our case the
asid only is passed by the userspace on CACHE_INVALIDATE ioctl call.

arm_smmu_tlb_inv_context and arm_smmu_tlb_inv_range_nosync use this field


Right, but that's not exactly hard to solve. Even just something like 
the (untested, purely illustrative) refactoring below would be beneficial.


Robin.

->8-
diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index d3880010c6cf..31ef703cf671 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -1423,11 +1423,9 @@ static void arm_smmu_tlb_inv_context(void *cookie)
arm_smmu_cmdq_issue_sync(smmu);
 }

-static void arm_smmu_tlb_inv_range_nosync(unsigned long iova, size_t size,
- size_t granule, bool leaf, void 
*cookie)
+static void __arm_smmu_tlb_inv_range(struct arm_smmu_domain 
*smmu_domain, u16 asid,

+   unsigned long iova, size_t size, size_t granule, bool leaf)
 {
-   struct arm_smmu_domain *smmu_domain = cookie;
-   struct arm_smmu_device *smmu = smmu_domain->smmu;
struct arm_smmu_cmdq_ent cmd = {
.tlbi = {
.leaf   = leaf,
@@ -1437,18 +1435,27 @@ static void 
arm_smmu_tlb_inv_range_nosync(unsigned long iova, size_t size,


if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
cmd.opcode  = CMDQ_OP_TLBI_NH_VA;
-   cmd.tlbi.asid   = smmu_domain->s1_cfg.cd.asid;
+   cmd.tlbi.asid   = asid;
} else {
cmd.opcode  = CMDQ_OP_TLBI_S2_IPA;
cmd.tlbi.vmid   = smmu_domain->s2_cfg.vmid;
}

do {
-   arm_smmu_cmdq_issue_cmd(smmu, &cmd);
+   arm_smmu_cmdq_issue_cmd(smmu_domain->smmu, &cmd);
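
With a helper like that, the cache_invalidate path could then pass the
caller-supplied ASID straight through instead of open-coding the
commands, roughly (untested, keeping info->pasid as the ASID source as
in the patch under review):

	} else if (inv_info->granularity == IOMMU_INV_GRANU_ADDR) {
		struct iommu_inv_addr_info *info = &inv_info->addr_info;
		size_t size = info->nb_granules * info->granule_size;
		bool leaf = info->flags & IOMMU_INV_ADDR_FLAGS_LEAF;

		__arm_smmu_tlb_inv_range(smmu_domain, info->pasid, info->addr,
					 size, info->granule_size, leaf);
		arm_smmu_cmdq_issue_sync(smmu);
	}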
 

Re: [PATCH v7 18/23] iommu/smmuv3: Report non recoverable faults

2019-05-13 Thread Robin Murphy

On 13/05/2019 13:32, Auger Eric wrote:

Hi Robin,
On 5/13/19 1:54 PM, Robin Murphy wrote:

On 13/05/2019 08:46, Auger Eric wrote:

Hi Robin,

On 5/8/19 7:20 PM, Robin Murphy wrote:

On 08/04/2019 13:19, Eric Auger wrote:

When a stage 1 related fault event is read from the event queue,
let's propagate it to potential external fault listeners, ie. users
who registered a fault handler.

Signed-off-by: Eric Auger 

---
v4 -> v5:
- s/IOMMU_FAULT_PERM_INST/IOMMU_FAULT_PERM_EXEC
---
    drivers/iommu/arm-smmu-v3.c | 169
+---
    1 file changed, 158 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 805bc32a..1fd320788dcb 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -167,6 +167,26 @@
    #define ARM_SMMU_PRIQ_IRQ_CFG1    0xd8
    #define ARM_SMMU_PRIQ_IRQ_CFG2    0xdc
    +/* Events */
+#define ARM_SMMU_EVT_F_UUT    0x01
+#define ARM_SMMU_EVT_C_BAD_STREAMID    0x02
+#define ARM_SMMU_EVT_F_STE_FETCH    0x03
+#define ARM_SMMU_EVT_C_BAD_STE    0x04
+#define ARM_SMMU_EVT_F_BAD_ATS_TREQ    0x05
+#define ARM_SMMU_EVT_F_STREAM_DISABLED    0x06
+#define ARM_SMMU_EVT_F_TRANSL_FORBIDDEN    0x07
+#define ARM_SMMU_EVT_C_BAD_SUBSTREAMID    0x08
+#define ARM_SMMU_EVT_F_CD_FETCH    0x09
+#define ARM_SMMU_EVT_C_BAD_CD    0x0a
+#define ARM_SMMU_EVT_F_WALK_EABT    0x0b
+#define ARM_SMMU_EVT_F_TRANSLATION    0x10
+#define ARM_SMMU_EVT_F_ADDR_SIZE    0x11
+#define ARM_SMMU_EVT_F_ACCESS    0x12
+#define ARM_SMMU_EVT_F_PERMISSION    0x13
+#define ARM_SMMU_EVT_F_TLB_CONFLICT    0x20
+#define ARM_SMMU_EVT_F_CFG_CONFLICT    0x21
+#define ARM_SMMU_EVT_E_PAGE_REQUEST    0x24
+
    /* Common MSI config fields */
    #define MSI_CFG0_ADDR_MASK    GENMASK_ULL(51, 2)
    #define MSI_CFG2_SH    GENMASK(5, 4)
@@ -332,6 +352,15 @@
    #define EVTQ_MAX_SZ_SHIFT    7
      #define EVTQ_0_ID    GENMASK_ULL(7, 0)
+#define EVTQ_0_SSV    GENMASK_ULL(11, 11)
+#define EVTQ_0_SUBSTREAMID    GENMASK_ULL(31, 12)
+#define EVTQ_0_STREAMID    GENMASK_ULL(63, 32)
+#define EVTQ_1_PNU    GENMASK_ULL(33, 33)
+#define EVTQ_1_IND    GENMASK_ULL(34, 34)
+#define EVTQ_1_RNW    GENMASK_ULL(35, 35)
+#define EVTQ_1_S2    GENMASK_ULL(39, 39)
+#define EVTQ_1_CLASS    GENMASK_ULL(40, 41)
+#define EVTQ_3_FETCH_ADDR    GENMASK_ULL(51, 3)
      /* PRI queue */
    #define PRIQ_ENT_DWORDS    2
@@ -639,6 +668,64 @@ struct arm_smmu_domain {
    spinlock_t    devices_lock;
    };
    +/* fault propagation */
+
+#define IOMMU_FAULT_F_FIELDS    (IOMMU_FAULT_UNRECOV_PASID_VALID | \
+ IOMMU_FAULT_UNRECOV_PERM_VALID | \
+ IOMMU_FAULT_UNRECOV_ADDR_VALID)
+
+struct arm_smmu_fault_propagation_data {
+    enum iommu_fault_reason reason;
+    bool s1_check;
+    u32 fields; /* IOMMU_FAULT_UNRECOV_*_VALID bits */
+};
+
+/*
+ * Describes how SMMU faults translate into generic IOMMU faults
+ * and if they need to be reported externally
+ */
+static const struct arm_smmu_fault_propagation_data
fault_propagation[] = {
+[ARM_SMMU_EVT_F_UUT]    = { },
+[ARM_SMMU_EVT_C_BAD_STREAMID]    = { },
+[ARM_SMMU_EVT_F_STE_FETCH]    = { },
+[ARM_SMMU_EVT_C_BAD_STE]    = { },
+[ARM_SMMU_EVT_F_BAD_ATS_TREQ]    = { },
+[ARM_SMMU_EVT_F_STREAM_DISABLED]    = { },
+[ARM_SMMU_EVT_F_TRANSL_FORBIDDEN]    = { },
+[ARM_SMMU_EVT_C_BAD_SUBSTREAMID]    =
{IOMMU_FAULT_REASON_PASID_INVALID,
+   false,
+   IOMMU_FAULT_UNRECOV_PASID_VALID
+  },
+[ARM_SMMU_EVT_F_CD_FETCH]    = {IOMMU_FAULT_REASON_PASID_FETCH,
+   false,
+   IOMMU_FAULT_UNRECOV_PASID_VALID |


It doesn't make sense to presume validity here, or in any of the faults
below...






+   IOMMU_FAULT_UNRECOV_FETCH_ADDR_VALID
+  },
+[ARM_SMMU_EVT_C_BAD_CD]    =
{IOMMU_FAULT_REASON_BAD_PASID_ENTRY,
+   false,
+   IOMMU_FAULT_UNRECOV_PASID_VALID
+  },
+[ARM_SMMU_EVT_F_WALK_EABT]    = {IOMMU_FAULT_REASON_WALK_EABT,
true,
+   IOMMU_FAULT_F_FIELDS |
+   IOMMU_FAULT_UNRECOV_FETCH_ADDR_VALID
+  },
+[ARM_SMMU_EVT_F_TRANSLATION]    = {IOMMU_FAULT_REASON_PTE_FETCH,
true,
+   IOMMU_FAULT_F_FIELDS
+  },
+[ARM_SMMU_EVT_F_ADDR_SIZE]    = {IOMMU_FAULT_REASON_OOR_ADDRESS,
true,
+   IOMMU_FAULT_F_FIELDS
+  },
+[ARM_SMMU_EVT_F_ACCESS]    = {IOMMU_FAULT_REASON_ACCESS, true,
+   IOMMU_FAULT_F_FIELDS
+  },
+[ARM_SMMU_EVT_F_PERMISSION]    = {IOMMU_FAULT_REASON_PERMISSION,
true,
+   IOMMU_FAULT_F_FIELDS
+  },

Re: [git pull] IOMMU Updates for Linux v5.2

2019-05-13 Thread pr-tracker-bot
The pull request you sent on Mon, 13 May 2019 13:53:34 +0200:

> git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git 
> tags/iommu-updates-v5.2

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/a13f0655503a4a89df67fdc7cac6a7810795d4b3

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker


Re: [ARM SMMU] Dynamic StreamID allocation

2019-05-13 Thread Robin Murphy

On 13/05/2019 13:42, Jean-Philippe Brucker wrote:

On 13/05/2019 08:09, Pankaj Bansal wrote:

Hi Jean,


-Original Message-
From: Jean-Philippe Brucker 
Sent: Friday, 10 May, 2019 07:07 PM
To: Pankaj Bansal ; Will Deacon
; Robin Murphy ; Joerg
Roedel 
Cc: iommu@lists.linux-foundation.org; Varun Sethi ; linux-
arm-ker...@lists.infradead.org; Nipun Gupta 
Subject: Re: [ARM SMMU] Dynamic StreamID allocation

On 10/05/2019 13:33, Pankaj Bansal wrote:

Hi Will/Robin/Joerg,

I am s/w engineer from NXP India Pvt. Ltd.
We are using SMMU-V3 in one of NXP SOC.
I have a question about the SMMU Stream ID allocation in linux.

Right now the Stream IDs allocated to a device are mapped via device tree to

the device.

https://elixir.bootlin.com/linux/latest/source/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt#L39

As the device tree is passed from bootloader to linux, we detect all the stream

IDs needed by a device in bootloader and add their IDs in respective device
nodes.

For each PCIE Endpoint (a unique BDF (Bus Device Function)) on PCIE bus, we

are assigning a unique Stream ID in bootloader.


However, this poses an issue with PCIE hot plug.
If we plug in a pcie device while linux is running, a unique BDF is assigned to

the device, for which there is no stream ID in device tree.


How can this problem be solved in linux?


Assuming the streamID associated to a BDF is predictable (streamID = BDF
+ constant), using the iommu-map property should just work:

https://elixir.bootlin.com/linux/latest/source/Documentation/devicetree/bindings/pci/pci-iommu.txt

It describes the streamIDs of all possible BDFs, including hotplugged functions.


You mean that we should increase the "length" parameter (in the
(rid-base,iommu,iommu-base,length) tuple)?
This would cater to any *new* Bus Device Function being detected on the PCIE bus?
Is that right?


No, the iommu-map solution only works when you can predict at boot time
which streamID will correspond to any BDF, for example if the PCIe
controller or firmware automatically assigns streamID = BDF.


Right now when we make iommu-map in bootloader, we are giving one RID per BDF:
https://elixir.bootlin.com/u-boot/latest/source/drivers/pci/pcie_layerscape_fixup.c#L168


Right, that won't work with hotplug. You can't derive a unique streamID
from any BDF at boot, if the streamID range is limited to 16 values on
this hardware.


Furthermore, does U-Boot enumerate all possible SR-IOV capabilities, or 
is this already broken regardless of hotplug?



Maybe you could move this allocator to the Linux Layerscape driver, and
call iommu_fwspec_add_ids() from there, rather than using iommu-map? Not
sure how feasible that is, but it might still be the simplest.


Assuming there's still the same silly 7-bit limitation as LS2085, you 
could in theory carve out that entire space of ICIDs to allow a static 
mapping like this:


iommu-map-mask = <0x0f07>;
iommu-map = <0x000  0x00 0x8>,
<0x100  0x08 0x8>,
<0x200  0x10 0x8>,
<0x300  0x18 0x8>,
<0x400  0x20 0x8>,
<0x500  0x28 0x8>,
<0x600  0x30 0x8>,
<0x700  0x38 0x8>,
<0x800  0x40 0x8>,
<0x900  0x48 0x8>,
<0xa00  0x50 0x8>,
<0xb00  0x58 0x8>,
<0xc00  0x60 0x8>,
<0xd00  0x68 0x8>,
<0xe00  0x70 0x8>,
<0xf00  0x78 0x8>;

That gives you 16 buses before IDs start aliasing awkwardly (any devices 
sharing a bus will alias per-function, but that's probably acceptable 
since they'd likely get grouped together anyway).
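
For example, endpoint 03:00.2 has RID 0x0302; masked with 0x0f07 that
stays 0x0302, which falls in the <0x300 0x18 0x8> entry and therefore
gets streamID 0x18 + (0x302 - 0x300) = 0x1a. A second device on the same
bus, 03:01.2 (RID 0x030a), also masks down to 0x0302 and aliases to the
same streamID, which is the per-function aliasing mentioned above.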


Either way the IOMMU layer itself is actually relatively easy-going 
these days. Even with the existing approach, if you could just update 
Linux's internal idea of the firmware data before the bus_add_device() 
call happens for the hotplugged device, then things should just work. 
You're also going to face the exact same problem for ITS Device IDs, 
though, and things may be less forgiving there.


Robin.


RE: [ARM SMMU] Dynamic StreamID allocation

2019-05-13 Thread Laurentiu Tudor
Hi Pankaj,

> -Original Message-
> From: linux-arm-kernel  On
> Behalf Of Pankaj Bansal
> Sent: Friday, May 10, 2019 3:34 PM
> 
> Hi Will/Robin/Joerg,
> 
> I am s/w engineer from NXP India Pvt. Ltd.
> We are using SMMU-V3 in one of NXP SOC.
> I have a question about the SMMU Stream ID allocation in linux.
> 
> Right now the Stream IDs allocated to a device are mapped via device tree
> to the device.

[snip]

> 
> As the device tree is passed from bootloader to linux, we detect all the
> stream IDs needed by a device in bootloader and add their IDs in
> respective device nodes.
> For each PCIE Endpoint (a unique BDF (Bus Device Function)) on PCIE bus,
> we are assigning a unique Stream ID in bootloader.
> 
> However, this poses an issue with PCIE hot plug.
> If we plug in a pcie device while linux is running, a unique BDF is
> assigned to the device, for which there is no stream ID in device tree.
> 
> How can this problem be solved in linux?
> 
> Is there a way to assign (and revoke) stream IDs at run time?

I think that our main problem is that we enumerate the PCI EPs in the 
bootloader (u-boot) and allocate StreamIDs just for them, completely 
disregarding hotplug scenarios. One simple fix would be to not do this and 
simply allocate a decently sized, fixed range of StreamIDs per PCI controller.

---
Best Regards, Laurentiu


Re: [ARM SMMU] Dynamic StreamID allocation

2019-05-13 Thread Jean-Philippe Brucker
On 13/05/2019 08:09, Pankaj Bansal wrote:
> Hi Jean,
> 
>> -Original Message-
>> From: Jean-Philippe Brucker 
>> Sent: Friday, 10 May, 2019 07:07 PM
>> To: Pankaj Bansal ; Will Deacon
>> ; Robin Murphy ; Joerg
>> Roedel 
>> Cc: iommu@lists.linux-foundation.org; Varun Sethi ; linux-
>> arm-ker...@lists.infradead.org; Nipun Gupta 
>> Subject: Re: [ARM SMMU] Dynamic StreamID allocation
>>
>> On 10/05/2019 13:33, Pankaj Bansal wrote:
>>> Hi Will/Robin/Joerg,
>>>
>>> I am s/w engineer from NXP India Pvt. Ltd.
>>> We are using SMMU-V3 in one of NXP SOC.
>>> I have a question about the SMMU Stream ID allocation in linux.
>>>
>>> Right now the Stream IDs allocated to a device are mapped via device tree to
>> the device.
>>> https://elixir.bootlin.com/linux/latest/source/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt#L39
>>>
>>> As the device tree is passed from bootloader to linux, we detect all the 
>>> stream
>> IDs needed by a device in bootloader and add their IDs in respective device
>> nodes.
>>> For each PCIE Endpoint (a unique BDF (Bus Device Function)) on PCIE bus, we
>> are assigning a unique Stream ID in bootloader.
>>>
>>> However, this poses an issue with PCIE hot plug.
>>> If we plug in a pcie device while linux is running, a unique BDF is 
>>> assigned to
>> the device, for which there is no stream ID in device tree.
>>>
>>> How can this problem be solved in linux?
>>
>> Assuming the streamID associated to a BDF is predictable (streamID = BDF
>> + constant), using the iommu-map property should just work:
>>
>> https://elixir.bootlin.com/linux/latest/source/Documentation/devicetree/bindings/pci/pci-iommu.txt
>>
>> It describes the streamIDs of all possible BDFs, including hotplugged 
>> functions.
> 
> You mean that we should increase the "length" parameter (in the
> (rid-base,iommu,iommu-base,length) tuple)?
> This would cater to any *new* Bus Device Function being detected on the PCIE bus?
> Is that right?

No, the iommu-map solution only works when you can predict at boot time
which streamID will correspond to any BDF, for example if the PCIe
controller or firmware automatically assigns streamID = BDF.

> Right now when we make iommu-map in bootloader, we are giving one RID per BDF:
> https://elixir.bootlin.com/u-boot/latest/source/drivers/pci/pcie_layerscape_fixup.c#L168

Right, that won't work with hotplug. You can't derive a unique streamID
from any BDF at boot, if the streamID range is limited to 16 values on
this hardware.

Maybe you could move this allocator to the Linux Layerscape driver, and
call iommu_fwspec_add_ids() from there, rather than using iommu-map? Not
sure how feasible that is, but it might still be the simplest.
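
A very rough sketch of that idea (hypothetical driver code: the pool
base/size are made up, the fwspec is assumed to be initialized already,
and the notifier would have to run before bus_add_device() sees the new
device):

	static DEFINE_IDA(ls_sid_pool);

	static int ls_pcie_bus_notify(struct notifier_block *nb,
				      unsigned long action, void *data)
	{
		struct device *dev = data;
		u32 sid;
		int id;

		if (action != BUS_NOTIFY_ADD_DEVICE || !dev_is_pci(dev))
			return NOTIFY_DONE;

		/* pick a free streamID from the controller's 7-bit ICID space */
		id = ida_alloc_range(&ls_sid_pool, 0, 0x7f, GFP_KERNEL);
		if (id < 0)
			return notifier_from_errno(id);

		sid = id;
		return notifier_from_errno(iommu_fwspec_add_ids(dev, &sid, 1));
	}

	/* registered once at init: bus_register_notifier(&pci_bus_type, &nb); */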

Thanks,
Jean

> 
> But isn't the better approach to make it dynamic in linux?
> i.e. as soon as a new device is detected, a "requester id" is allocated to it
> from an available pool.
> When the device is removed, the "requester id" is returned to the pool.
> Is there any h/w limitation which prevents it?
> 
> Regards,
> Pankaj Bansal
>>
>> Thanks,
>> Jean
>>
>>>
>>> Is there a way to assign (and revoke) stream IDs at run time?
>>>
>>> Regards,
>>> Pankaj Bansal
>>>
> 



Re: [PATCH v7 18/23] iommu/smmuv3: Report non recoverable faults

2019-05-13 Thread Auger Eric
Hi Robin,
On 5/13/19 1:54 PM, Robin Murphy wrote:
> On 13/05/2019 08:46, Auger Eric wrote:
>> Hi Robin,
>>
>> On 5/8/19 7:20 PM, Robin Murphy wrote:
>>> On 08/04/2019 13:19, Eric Auger wrote:
 When a stage 1 related fault event is read from the event queue,
 let's propagate it to potential external fault listeners, ie. users
 who registered a fault handler.

 Signed-off-by: Eric Auger 

 ---
 v4 -> v5:
 - s/IOMMU_FAULT_PERM_INST/IOMMU_FAULT_PERM_EXEC
 ---
    drivers/iommu/arm-smmu-v3.c | 169 +---
    1 file changed, 158 insertions(+), 11 deletions(-)

 diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
 index 805bc32a..1fd320788dcb 100644
 --- a/drivers/iommu/arm-smmu-v3.c
 +++ b/drivers/iommu/arm-smmu-v3.c
 @@ -167,6 +167,26 @@
    #define ARM_SMMU_PRIQ_IRQ_CFG1    0xd8
    #define ARM_SMMU_PRIQ_IRQ_CFG2    0xdc
    +/* Events */
 +#define ARM_SMMU_EVT_F_UUT    0x01
 +#define ARM_SMMU_EVT_C_BAD_STREAMID    0x02
 +#define ARM_SMMU_EVT_F_STE_FETCH    0x03
 +#define ARM_SMMU_EVT_C_BAD_STE    0x04
 +#define ARM_SMMU_EVT_F_BAD_ATS_TREQ    0x05
 +#define ARM_SMMU_EVT_F_STREAM_DISABLED    0x06
 +#define ARM_SMMU_EVT_F_TRANSL_FORBIDDEN    0x07
 +#define ARM_SMMU_EVT_C_BAD_SUBSTREAMID    0x08
 +#define ARM_SMMU_EVT_F_CD_FETCH    0x09
 +#define ARM_SMMU_EVT_C_BAD_CD    0x0a
 +#define ARM_SMMU_EVT_F_WALK_EABT    0x0b
 +#define ARM_SMMU_EVT_F_TRANSLATION    0x10
 +#define ARM_SMMU_EVT_F_ADDR_SIZE    0x11
 +#define ARM_SMMU_EVT_F_ACCESS    0x12
 +#define ARM_SMMU_EVT_F_PERMISSION    0x13
 +#define ARM_SMMU_EVT_F_TLB_CONFLICT    0x20
 +#define ARM_SMMU_EVT_F_CFG_CONFLICT    0x21
 +#define ARM_SMMU_EVT_E_PAGE_REQUEST    0x24
 +
    /* Common MSI config fields */
    #define MSI_CFG0_ADDR_MASK    GENMASK_ULL(51, 2)
    #define MSI_CFG2_SH    GENMASK(5, 4)
 @@ -332,6 +352,15 @@
    #define EVTQ_MAX_SZ_SHIFT    7
      #define EVTQ_0_ID    GENMASK_ULL(7, 0)
 +#define EVTQ_0_SSV    GENMASK_ULL(11, 11)
 +#define EVTQ_0_SUBSTREAMID    GENMASK_ULL(31, 12)
 +#define EVTQ_0_STREAMID    GENMASK_ULL(63, 32)
 +#define EVTQ_1_PNU    GENMASK_ULL(33, 33)
 +#define EVTQ_1_IND    GENMASK_ULL(34, 34)
 +#define EVTQ_1_RNW    GENMASK_ULL(35, 35)
 +#define EVTQ_1_S2    GENMASK_ULL(39, 39)
 +#define EVTQ_1_CLASS    GENMASK_ULL(41, 40)
 +#define EVTQ_3_FETCH_ADDR    GENMASK_ULL(51, 3)
      /* PRI queue */
    #define PRIQ_ENT_DWORDS    2
 @@ -639,6 +668,64 @@ struct arm_smmu_domain {
    spinlock_t    devices_lock;
    };
    +/* fault propagation */
 +
 +#define IOMMU_FAULT_F_FIELDS    (IOMMU_FAULT_UNRECOV_PASID_VALID | \
 + IOMMU_FAULT_UNRECOV_PERM_VALID | \
 + IOMMU_FAULT_UNRECOV_ADDR_VALID)
 +
 +struct arm_smmu_fault_propagation_data {
 +    enum iommu_fault_reason reason;
 +    bool s1_check;
 +    u32 fields; /* IOMMU_FAULT_UNRECOV_*_VALID bits */
 +};
 +
 +/*
 + * Describes how SMMU faults translate into generic IOMMU faults
 + * and if they need to be reported externally
 + */
 +static const struct arm_smmu_fault_propagation_data
 fault_propagation[] = {
 +[ARM_SMMU_EVT_F_UUT]    = { },
 +[ARM_SMMU_EVT_C_BAD_STREAMID]    = { },
 +[ARM_SMMU_EVT_F_STE_FETCH]    = { },
 +[ARM_SMMU_EVT_C_BAD_STE]    = { },
 +[ARM_SMMU_EVT_F_BAD_ATS_TREQ]    = { },
 +[ARM_SMMU_EVT_F_STREAM_DISABLED]    = { },
 +[ARM_SMMU_EVT_F_TRANSL_FORBIDDEN]    = { },
 +[ARM_SMMU_EVT_C_BAD_SUBSTREAMID]    = {IOMMU_FAULT_REASON_PASID_INVALID,
 +   false,
 +   IOMMU_FAULT_UNRECOV_PASID_VALID
 +  },
 +[ARM_SMMU_EVT_F_CD_FETCH]    = {IOMMU_FAULT_REASON_PASID_FETCH,
 +   false,
 +   IOMMU_FAULT_UNRECOV_PASID_VALID |
>>>
>>> It doesn't make sense to presume validity here, or in any of the faults
>>> below...
>>
>>
>>>
 +   IOMMU_FAULT_UNRECOV_FETCH_ADDR_VALID
 +  },
 +[ARM_SMMU_EVT_C_BAD_CD]    = {IOMMU_FAULT_REASON_BAD_PASID_ENTRY,
 +   false,
 +   IOMMU_FAULT_UNRECOV_PASID_VALID
 +  },
 +[ARM_SMMU_EVT_F_WALK_EABT]    = {IOMMU_FAULT_REASON_WALK_EABT, true,
 +   IOMMU_FAULT_F_FIELDS |
 +   IOMMU_FAULT_UNRECOV_FETCH_ADDR_VALID
 +  },
 +[ARM_SMMU_EVT_F_TRANSLATION]    = {IOMMU_FAULT_REASON_PTE_FETCH, true,
 + 

Re: [PATCH v7 14/23] iommu/smmuv3: Implement cache_invalidate

2019-05-13 Thread Auger Eric
Hi Robin,
On 5/8/19 5:01 PM, Robin Murphy wrote:
> On 08/04/2019 13:19, Eric Auger wrote:
>> Implement domain-selective and page-selective IOTLB invalidations.
>>
>> Signed-off-by: Eric Auger 
>>
>> ---
>> v6 -> v7
>> - check the uapi version
>>
>> v3 -> v4:
>> - adapt to changes in the uapi
>> - add support for leaf parameter
>> - do not use arm_smmu_tlb_inv_range_nosync or arm_smmu_tlb_inv_context
>>    anymore
>>
>> v2 -> v3:
>> - replace __arm_smmu_tlb_sync by arm_smmu_cmdq_issue_sync
>>
>> v1 -> v2:
>> - properly pass the asid
>> ---
>>   drivers/iommu/arm-smmu-v3.c | 60 +
>>   1 file changed, 60 insertions(+)
>>
>> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
>> index 1486baf53425..4366921d8318 100644
>> --- a/drivers/iommu/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm-smmu-v3.c
>> @@ -2326,6 +2326,65 @@ static void arm_smmu_detach_pasid_table(struct iommu_domain *domain)
>>   mutex_unlock(&smmu_domain->init_mutex);
>>   }
>>   +static int
>> +arm_smmu_cache_invalidate(struct iommu_domain *domain, struct device *dev,
>> +  struct iommu_cache_invalidate_info *inv_info)
>> +{
>> +    struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>> +    struct arm_smmu_device *smmu = smmu_domain->smmu;
>> +
>> +    if (smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
>> +    return -EINVAL;
>> +
>> +    if (!smmu)
>> +    return -EINVAL;
>> +
>> +    if (inv_info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
>> +    return -EINVAL;
>> +
>> +    if (inv_info->cache & IOMMU_CACHE_INV_TYPE_IOTLB) {
>> +    if (inv_info->granularity == IOMMU_INV_GRANU_PASID) {
>> +    struct arm_smmu_cmdq_ent cmd = {
>> +    .opcode = CMDQ_OP_TLBI_NH_ASID,
>> +    .tlbi = {
>> +    .vmid = smmu_domain->s2_cfg.vmid,
>> +    .asid = inv_info->pasid,
>> +    },
>> +    };
>> +
>> +    arm_smmu_cmdq_issue_cmd(smmu, &cmd);
>> +    arm_smmu_cmdq_issue_sync(smmu);
> 
> I'd much rather make arm_smmu_tlb_inv_context() understand nested
> domains than open-code commands all over the place.


> 
>> +
>> +    } else if (inv_info->granularity == IOMMU_INV_GRANU_ADDR) {
>> +    struct iommu_inv_addr_info *info = &inv_info->addr_info;
>> +    size_t size = info->nb_granules * info->granule_size;
>> +    bool leaf = info->flags & IOMMU_INV_ADDR_FLAGS_LEAF;
>> +    struct arm_smmu_cmdq_ent cmd = {
>> +    .opcode = CMDQ_OP_TLBI_NH_VA,
>> +    .tlbi = {
>> +    .addr = info->addr,
>> +    .vmid = smmu_domain->s2_cfg.vmid,
>> +    .asid = info->pasid,
>> +    .leaf = leaf,
>> +    },
>> +    };
>> +
>> +    do {
>> +    arm_smmu_cmdq_issue_cmd(smmu, &cmd);
>> +    cmd.tlbi.addr += info->granule_size;
>> +    } while (size -= info->granule_size);
>> +    arm_smmu_cmdq_issue_sync(smmu);
> 
> An this in particular I would really like to go all the way through
> io_pgtable_tlb_add_flush()/io_pgtable_sync() if at all possible. Hooking
> up range-based invalidations is going to be a massive headache if the
> abstraction isn't solid.

The concern is that the host does not "own" the stage 1 config asid
(smmu_domain->s1_cfg.cd.asid is not set, practically). In our case the
asid is only passed by the userspace on the CACHE_INVALIDATE ioctl call.

arm_smmu_tlb_inv_context and arm_smmu_tlb_inv_range_nosync use this field.

Jean-Philippe highlighted that fact in
https://lkml.org/lkml/2019/1/11/1062 and suggested to open-code commands
instead, hence the current form.
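
For the record, a shared helper along the lines Robin suggests would have
to take the asid explicitly instead of reading it from s1_cfg - a sketch
only, not part of the posted series:

	static void arm_smmu_tlb_inv_asid(struct arm_smmu_device *smmu,
					  u16 vmid, u16 asid)
	{
		struct arm_smmu_cmdq_ent cmd = {
			.opcode	= CMDQ_OP_TLBI_NH_ASID,
			.tlbi	= { .vmid = vmid, .asid = asid },
		};

		arm_smmu_cmdq_issue_cmd(smmu, &cmd);
		arm_smmu_cmdq_issue_sync(smmu);
	}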

Thanks

Eric
> 
> Robin.
> 
>> +    } else {
>> +    return -EINVAL;
>> +    }
>> +    }
>> +    if (inv_info->cache & IOMMU_CACHE_INV_TYPE_PASID ||
>> +    inv_info->cache & IOMMU_CACHE_INV_TYPE_DEV_IOTLB) {
>> +    return -ENOENT;
>> +    }
>> +    return 0;
>> +}
>> +
>>   static struct iommu_ops arm_smmu_ops = {
>>   .capable    = arm_smmu_capable,
>>   .domain_alloc    = arm_smmu_domain_alloc,
>> @@ -2346,6 +2405,7 @@ static struct iommu_ops arm_smmu_ops = {
>>   .put_resv_regions    = arm_smmu_put_resv_regions,
>>   .attach_pasid_table    = arm_smmu_attach_pasid_table,
>>   .detach_pasid_table    = arm_smmu_detach_pasid_table,
>> +    .cache_invalidate    = arm_smmu_cache_invalidate,
>>   .pgsize_bitmap    = -1UL, /* Restricted during device attach */
>>   };
>>  


Re: [PATCH v7 13/23] iommu/smmuv3: Implement attach/detach_pasid_table

2019-05-13 Thread Robin Murphy

On 10/05/2019 15:35, Auger Eric wrote:

Hi Robin,

On 5/8/19 4:38 PM, Robin Murphy wrote:

On 08/04/2019 13:19, Eric Auger wrote:

On attach_pasid_table() we program STE S1 related info set
by the guest into the actual physical STEs. At minimum
we need to program the context descriptor GPA and compute
whether the stage1 is translated/bypassed or aborted.

Signed-off-by: Eric Auger 

---
v6 -> v7:
- check versions and comment the fact we don't need to take
    into account s1dss and s1fmt
v3 -> v4:
- adapt to changes in iommu_pasid_table_config
- different programming convention at s1_cfg/s2_cfg/ste.abort

v2 -> v3:
- callback now is named set_pasid_table and struct fields
    are laid out differently.

v1 -> v2:
- invalidate the STE before changing them
- hold init_mutex
- handle new fields
---
   drivers/iommu/arm-smmu-v3.c | 121 
   1 file changed, 121 insertions(+)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index e22e944ffc05..1486baf53425 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -2207,6 +2207,125 @@ static void arm_smmu_put_resv_regions(struct device *dev,
   kfree(entry);
   }
   +static int arm_smmu_attach_pasid_table(struct iommu_domain *domain,
+   struct iommu_pasid_table_config *cfg)
+{
+    struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+    struct arm_smmu_master_data *entry;
+    struct arm_smmu_s1_cfg *s1_cfg;
+    struct arm_smmu_device *smmu;
+    unsigned long flags;
+    int ret = -EINVAL;
+
+    if (cfg->format != IOMMU_PASID_FORMAT_SMMUV3)
+    return -EINVAL;
+
+    if (cfg->version != PASID_TABLE_CFG_VERSION_1 ||
+    cfg->smmuv3.version != PASID_TABLE_SMMUV3_CFG_VERSION_1)
+    return -EINVAL;
+
+    mutex_lock(&smmu_domain->init_mutex);
+
+    smmu = smmu_domain->smmu;
+
+    if (!smmu)
+    goto out;
+
+    if (!((smmu->features & ARM_SMMU_FEAT_TRANS_S1) &&
+  (smmu->features & ARM_SMMU_FEAT_TRANS_S2))) {
+    dev_info(smmu_domain->smmu->dev,
+ "does not implement two stages\n");
+    goto out;
+    }


That check is redundant (and frankly looks a little bit spammy). If the
one below is not enough, there is a problem elsewhere - if it's possible
for smmu_domain->stage to ever get set to ARM_SMMU_DOMAIN_NESTED without
both stages of translation present, we've already gone fundamentally wrong.


Makes sense. Moved that check to arm_smmu_domain_finalise() instead and
remove redundant ones.


Urgh, I forgot exactly how the crazy domain-allocation dance worked, 
such that we're not in a position to refuse the domain_set_attr() call 
itself, but that does sound like the best compromise for now.


Thanks,
Robin.

Re: [PATCH v7 18/23] iommu/smmuv3: Report non recoverable faults

2019-05-13 Thread Robin Murphy

On 13/05/2019 08:46, Auger Eric wrote:

Hi Robin,

On 5/8/19 7:20 PM, Robin Murphy wrote:

On 08/04/2019 13:19, Eric Auger wrote:

When a stage 1 related fault event is read from the event queue,
let's propagate it to potential external fault listeners, ie. users
who registered a fault handler.

Signed-off-by: Eric Auger 

---
v4 -> v5:
- s/IOMMU_FAULT_PERM_INST/IOMMU_FAULT_PERM_EXEC
---
   drivers/iommu/arm-smmu-v3.c | 169 +---
   1 file changed, 158 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 805bc32a..1fd320788dcb 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -167,6 +167,26 @@
   #define ARM_SMMU_PRIQ_IRQ_CFG1    0xd8
   #define ARM_SMMU_PRIQ_IRQ_CFG2    0xdc
   +/* Events */
+#define ARM_SMMU_EVT_F_UUT    0x01
+#define ARM_SMMU_EVT_C_BAD_STREAMID    0x02
+#define ARM_SMMU_EVT_F_STE_FETCH    0x03
+#define ARM_SMMU_EVT_C_BAD_STE    0x04
+#define ARM_SMMU_EVT_F_BAD_ATS_TREQ    0x05
+#define ARM_SMMU_EVT_F_STREAM_DISABLED    0x06
+#define ARM_SMMU_EVT_F_TRANSL_FORBIDDEN    0x07
+#define ARM_SMMU_EVT_C_BAD_SUBSTREAMID    0x08
+#define ARM_SMMU_EVT_F_CD_FETCH    0x09
+#define ARM_SMMU_EVT_C_BAD_CD    0x0a
+#define ARM_SMMU_EVT_F_WALK_EABT    0x0b
+#define ARM_SMMU_EVT_F_TRANSLATION    0x10
+#define ARM_SMMU_EVT_F_ADDR_SIZE    0x11
+#define ARM_SMMU_EVT_F_ACCESS    0x12
+#define ARM_SMMU_EVT_F_PERMISSION    0x13
+#define ARM_SMMU_EVT_F_TLB_CONFLICT    0x20
+#define ARM_SMMU_EVT_F_CFG_CONFLICT    0x21
+#define ARM_SMMU_EVT_E_PAGE_REQUEST    0x24
+
   /* Common MSI config fields */
   #define MSI_CFG0_ADDR_MASK    GENMASK_ULL(51, 2)
   #define MSI_CFG2_SH    GENMASK(5, 4)
@@ -332,6 +352,15 @@
   #define EVTQ_MAX_SZ_SHIFT    7
     #define EVTQ_0_ID    GENMASK_ULL(7, 0)
+#define EVTQ_0_SSV    GENMASK_ULL(11, 11)
+#define EVTQ_0_SUBSTREAMID    GENMASK_ULL(31, 12)
+#define EVTQ_0_STREAMID    GENMASK_ULL(63, 32)
+#define EVTQ_1_PNU    GENMASK_ULL(33, 33)
+#define EVTQ_1_IND    GENMASK_ULL(34, 34)
+#define EVTQ_1_RNW    GENMASK_ULL(35, 35)
+#define EVTQ_1_S2    GENMASK_ULL(39, 39)
+#define EVTQ_1_CLASS    GENMASK_ULL(41, 40)
+#define EVTQ_3_FETCH_ADDR    GENMASK_ULL(51, 3)
     /* PRI queue */
   #define PRIQ_ENT_DWORDS    2
@@ -639,6 +668,64 @@ struct arm_smmu_domain {
   spinlock_t    devices_lock;
   };
   +/* fault propagation */
+
+#define IOMMU_FAULT_F_FIELDS    (IOMMU_FAULT_UNRECOV_PASID_VALID | \
+ IOMMU_FAULT_UNRECOV_PERM_VALID | \
+ IOMMU_FAULT_UNRECOV_ADDR_VALID)
+
+struct arm_smmu_fault_propagation_data {
+    enum iommu_fault_reason reason;
+    bool s1_check;
+    u32 fields; /* IOMMU_FAULT_UNRECOV_*_VALID bits */
+};
+
+/*
+ * Describes how SMMU faults translate into generic IOMMU faults
+ * and if they need to be reported externally
+ */
+static const struct arm_smmu_fault_propagation_data
fault_propagation[] = {
+[ARM_SMMU_EVT_F_UUT]    = { },
+[ARM_SMMU_EVT_C_BAD_STREAMID]    = { },
+[ARM_SMMU_EVT_F_STE_FETCH]    = { },
+[ARM_SMMU_EVT_C_BAD_STE]    = { },
+[ARM_SMMU_EVT_F_BAD_ATS_TREQ]    = { },
+[ARM_SMMU_EVT_F_STREAM_DISABLED]    = { },
+[ARM_SMMU_EVT_F_TRANSL_FORBIDDEN]    = { },
+[ARM_SMMU_EVT_C_BAD_SUBSTREAMID]    = {IOMMU_FAULT_REASON_PASID_INVALID,
+   false,
+   IOMMU_FAULT_UNRECOV_PASID_VALID
+  },
+[ARM_SMMU_EVT_F_CD_FETCH]    = {IOMMU_FAULT_REASON_PASID_FETCH,
+   false,
+   IOMMU_FAULT_UNRECOV_PASID_VALID |


It doesn't make sense to presume validity here, or in any of the faults
below...






+   IOMMU_FAULT_UNRECOV_FETCH_ADDR_VALID
+  },
+[ARM_SMMU_EVT_C_BAD_CD]    = {IOMMU_FAULT_REASON_BAD_PASID_ENTRY,
+   false,
+   IOMMU_FAULT_UNRECOV_PASID_VALID
+  },
+[ARM_SMMU_EVT_F_WALK_EABT]    = {IOMMU_FAULT_REASON_WALK_EABT, true,
+   IOMMU_FAULT_F_FIELDS |
+   IOMMU_FAULT_UNRECOV_FETCH_ADDR_VALID
+  },
+[ARM_SMMU_EVT_F_TRANSLATION]    = {IOMMU_FAULT_REASON_PTE_FETCH, true,
+   IOMMU_FAULT_F_FIELDS
+  },
+[ARM_SMMU_EVT_F_ADDR_SIZE]    = {IOMMU_FAULT_REASON_OOR_ADDRESS, true,
+   IOMMU_FAULT_F_FIELDS
+  },
+[ARM_SMMU_EVT_F_ACCESS]    = {IOMMU_FAULT_REASON_ACCESS, true,
+   IOMMU_FAULT_F_FIELDS
+  },
+[ARM_SMMU_EVT_F_PERMISSION]    = {IOMMU_FAULT_REASON_PERMISSION, true,
+   IOMMU_FAULT_F_FIELDS
+  },
+[ARM_SMMU_EVT_F_TLB_CONFLICT]    = { },
+[ARM_SMMU_EVT_F_CFG_CONFLICT]    = { },
+[ARM_SMMU_EVT_E_PAGE_REQUEST]  

[git pull] IOMMU Updates for Linux v5.2

2019-05-13 Thread Joerg Roedel
Hi Linus,

this pull-request includes two reverts which I had to do after the merge
window started, because the reverted patches caused issues in
linux-next. But the rest of this was ready before the merge window. With
this in mind:

The following changes since commit 37624b58542fb9f2d9a70e6ea006ef8a5f66c30b:

  Linux 5.1-rc7 (2019-04-28 17:04:13 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git 
tags/iommu-updates-v5.2

for you to fetch changes up to b5531563e8a0b8fcc5344a38d1fad9217e08e09b:

  Merge branches 'arm/tegra', 'arm/mediatek', 'arm/smmu', 'x86/vt-d', 'x86/amd' 
and 'core' into next (2019-05-07 09:40:12 +0200)


IOMMU Updates for Linux v5.2

Including:

- ATS support for ARM-SMMU-v3.

- AUX domain support in the IOMMU-API and the Intel VT-d driver.
  This adds support for multiple DMA address spaces per
  (PCI-)device. The use-case is to multiplex devices between
  host and KVM guests in a more flexible way than supported by
  SR-IOV.

- The rest are smaller cleanups and fixes, two of which needed
  to be reverted after testing in linux-next.


Andy Shevchenko (1):
  iommu/vt-d: Switch to bitmap_zalloc()

Christoph Hellwig (4):
  iommu/amd: Remove the leftover of bypass support
  iommu/vt-d: Clean up iommu_no_mapping
  iommu/vt-d: Use dma_direct for bypass devices
  iommu/vt-d: Don't clear GFP_DMA and GFP_DMA32 flags

Dmitry Osipenko (3):
  iommu/tegra-smmu: Fix invalid ASID bits on Tegra30/114
  iommu/tegra-smmu: Properly release domain resources
  iommu/tegra-smmu: Respect IOMMU API read-write protections

Douglas Anderson (1):
  iommu/arm-smmu: Break insecure users by disabling bypass by default

Eric Auger (1):
  iommu/vt-d: Fix leak in intel_pasid_alloc_table on error path

Gustavo A. R. Silva (1):
  iommu/vt-d: Use struct_size() helper

Jean-Philippe Brucker (11):
  iommu: Bind process address spaces to devices
  iommu/amd: Use pci_prg_resp_pasid_required()
  PCI: Move ATS declarations outside of CONFIG_PCI
  PCI: Add a stub for pci_ats_disabled()
  ACPI/IORT: Check ATS capability in root complex nodes
  iommu/arm-smmu-v3: Rename arm_smmu_master_data to arm_smmu_master
  iommu/arm-smmu-v3: Store SteamIDs in master
  iommu/arm-smmu-v3: Add a master->domain pointer
  iommu/arm-smmu-v3: Link domains and devices
  iommu/arm-smmu-v3: Add support for PCI ATS
  iommu/arm-smmu-v3: Disable tagged pointers

Jinyu Qi (1):
  iommu/iova: Separate atomic variables to improve performance

Joerg Roedel (7):
  Merge branch 'api-features' into x86/vt-d
  iommu/amd: Remove amd_iommu_pd_list
  Merge branch 'for-joerg/arm-smmu/updates' of 
git://git.kernel.org/.../will/linux into arm/smmu
  Merge branch 'api-features' into arm/smmu
  Revert "iommu/amd: Remove the leftover of bypass support"
  Revert "iommu/amd: Flush not present cache in iommu_map_page"
  Merge branches 'arm/tegra', 'arm/mediatek', 'arm/smmu', 'x86/vt-d', 
'x86/amd' and 'core' into next

Lu Baolu (15):
  iommu: Remove iommu_callback_data
  iommu: Add APIs for multiple domains per device
  iommu/vt-d: Make intel_iommu_enable_pasid() more generic
  iommu/vt-d: Add per-device IOMMU feature ops entries
  iommu/vt-d: Move common code out of iommu_attch_device()
  iommu/vt-d: Aux-domain specific domain attach/detach
  iommu/vt-d: Return ID associated with an auxiliary domain
  vfio/mdev: Add iommu related member in mdev_device
  vfio/type1: Add domain at(de)taching group helpers
  vfio/type1: Handle different mdev isolation type
  iommu/vt-d: Flush IOTLB for untrusted device in time
  iommu/vt-d: Don't request page request irq under dmar_global_lock
  iommu/vt-d: Cleanup: no spaces at the start of a line
  iommu/vt-d: Set intel_iommu_gfx_mapped correctly
  iommu/vt-d: Make kernel parameter igfx_off work with vIOMMU

Tom Murphy (1):
  iommu/amd: Flush not present cache in iommu_map_page

Vivek Gautam (1):
  iommu/arm-smmu: Log CBFRSYNRA register on context fault

Wen Yang (1):
  iommu/mediatek: Fix leaked of_node references

Will Deacon (1):
  iommu/arm-smmu-v3: Don't disable SMMU in kdump kernel

 drivers/acpi/arm64/iort.c   |  11 +
 drivers/iommu/Kconfig   |  25 ++
 drivers/iommu/amd_iommu.c   |  52 +---
 drivers/iommu/amd_iommu_init.c  |   8 -
 drivers/iommu/amd_iommu_types.h |   6 -
 drivers/iommu/arm-smmu-regs.h   |   2 +
 drivers/iommu/arm-smmu-v3.c | 355 +-
 drivers/iommu/arm-smmu.c|  11 +-
 drivers/iommu/dmar.c|   2 +-
 drivers/iommu/intel-iommu.c | 584 
 

Re: [PATCH v7 12/23] iommu/smmuv3: Get prepared for nested stage support

2019-05-13 Thread Robin Murphy

On 10/05/2019 15:34, Auger Eric wrote:

Hi Robin,

On 5/8/19 4:24 PM, Robin Murphy wrote:

On 08/04/2019 13:19, Eric Auger wrote:

To allow nested stage support, we need to store both
stage 1 and stage 2 configurations (and remove the former
union).

A nested setup is characterized by both s1_cfg and s2_cfg
set.

We introduce a new ste.abort field that will be set upon
guest stage1 configuration passing. If s1_cfg is NULL and
ste.abort is set, traffic can't pass. If ste.abort is not set,
S1 is bypassed.

arm_smmu_write_strtab_ent() is modified to write both stage
fields in the STE and deal with the abort field.

In nested mode, only stage 2 is "finalized" as the host does
not own/configure the stage 1 context descriptor, guest does.

Signed-off-by: Eric Auger 

---

v4 -> v5:
- reset ste.abort on detach

v3 -> v4:
- s1_cfg.nested_abort and nested_bypass removed.
- s/ste.nested/ste.abort
- arm_smmu_write_strtab_ent modifications with introduction
    of local abort, bypass and translate local variables
- comment updated

v1 -> v2:
- invalidate the STE before moving from a live STE config to another
- add the nested_abort and nested_bypass fields
---
   drivers/iommu/arm-smmu-v3.c | 35 ---
   1 file changed, 20 insertions(+), 15 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 21d027695181..e22e944ffc05 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -211,6 +211,7 @@
   #define STRTAB_STE_0_CFG_BYPASS    4
   #define STRTAB_STE_0_CFG_S1_TRANS    5
   #define STRTAB_STE_0_CFG_S2_TRANS    6
+#define STRTAB_STE_0_CFG_NESTED    7
     #define STRTAB_STE_0_S1FMT    GENMASK_ULL(5, 4)
   #define STRTAB_STE_0_S1FMT_LINEAR    0
@@ -514,6 +515,7 @@ struct arm_smmu_strtab_ent {
    * configured according to the domain type.
    */
   bool    assigned;
+    bool    abort;
   struct arm_smmu_s1_cfg    *s1_cfg;
   struct arm_smmu_s2_cfg    *s2_cfg;
   };
@@ -628,10 +630,8 @@ struct arm_smmu_domain {
   bool    non_strict;
     enum arm_smmu_domain_stage    stage;
-    union {
-    struct arm_smmu_s1_cfg    s1_cfg;
-    struct arm_smmu_s2_cfg    s2_cfg;
-    };
+    struct arm_smmu_s1_cfg    s1_cfg;
+    struct arm_smmu_s2_cfg    s2_cfg;
     struct iommu_domain    domain;
   @@ -1108,12 +1108,13 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid,
     __le64 *dst, struct arm_smmu_strtab_ent *ste)
   {
   /*
- * This is hideously complicated, but we only really care about
- * three cases at the moment:
+ * We care about the following transitions:
    *
    * 1. Invalid (all zero) -> bypass/fault (init)
- * 2. Bypass/fault -> translation/bypass (attach)
- * 3. Translation/bypass -> bypass/fault (detach)
+ * 2. Bypass/fault -> single stage translation/bypass (attach)
+ * 3. single stage Translation/bypass -> bypass/fault (detach)
+ * 4. S2 -> S1 + S2 (attach_pasid_table)
+ * 5. S1 + S2 -> S2 (detach_pasid_table)
    *
    * Given that we can't update the STE atomically and the SMMU
    * doesn't read the thing in a defined order, that leaves us
@@ -1124,7 +1125,7 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid,
    * 3. Update Config, sync
    */
   u64 val = le64_to_cpu(dst[0]);
-    bool ste_live = false;
+    bool abort, bypass, translate, ste_live = false;
   struct arm_smmu_cmdq_ent prefetch_cmd = {
   .opcode    = CMDQ_OP_PREFETCH_CFG,
   .prefetch    = {
@@ -1138,11 +1139,11 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid,
   break;
   case STRTAB_STE_0_CFG_S1_TRANS:
   case STRTAB_STE_0_CFG_S2_TRANS:
+    case STRTAB_STE_0_CFG_NESTED:
   ste_live = true;
   break;
   case STRTAB_STE_0_CFG_ABORT:
-    if (disable_bypass)
-    break;
+    break;
   default:
   BUG(); /* STE corruption */
   }
@@ -1152,8 +1153,13 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid,
   val = STRTAB_STE_0_V;
     /* Bypass/fault */
-    if (!ste->assigned || !(ste->s1_cfg || ste->s2_cfg)) {
-    if (!ste->assigned && disable_bypass)
+
+    abort = (!ste->assigned && disable_bypass) || ste->abort;
+    translate = ste->s1_cfg || ste->s2_cfg;
+    bypass = !abort && !translate;
+
+    if (abort || bypass) {
+    if (abort)
   val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_ABORT);
   else
   val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_BYPASS);
@@ -1172,7 +1178,6 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid,
   }
     if (ste->s1_cfg) {
-    BUG_ON(ste_live);


Hmm, I'm a little uneasy 

Re: [PATCH v4 1/1] iommu/io-pgtable-arm: Add support to use system cache

2019-05-13 Thread Robin Murphy

On 13/05/2019 11:04, Vivek Gautam wrote:

A few Qualcomm platforms, such as sdm845, have an additional outer
cache called the System cache, aka. Last level cache (LLC), that
allows non-coherent devices to upgrade to using caching.
This cache sits right before the DDR, and is tightly coupled
with the memory controller. The clients using this cache request
their slices from this system cache, make it active, and can then
start using it.

There is a fundamental assumption that non-coherent devices can't
access caches. This change adds an exception where they *can* use
some level of cache despite still being non-coherent overall.
The coherent devices that use cacheable memory, and CPU make use of
this system cache by default.

Looking at memory types, we have following -
a) Normal uncached :- MAIR 0x44, inner non-cacheable,
   outer non-cacheable;
b) Normal cached :-   MAIR 0xff, inner read write-back non-transient,
   outer read write-back non-transient;
   attribute setting for coherent I/O devices.
and, for non-coherent i/o devices that can allocate in system cache
another type gets added -
c) Normal sys-cached :- MAIR 0xf4, inner non-cacheable,
 outer read write-back non-transient

Coherent I/O devices use system cache by marking the memory as
normal cached.
Non-coherent I/O devices should mark the memory as normal
sys-cached in page tables to use system cache.

Signed-off-by: Vivek Gautam 
---

V3 version of this patch and related series can be found at [1].

This change is a realisation of following changes from downstream msm-4.9:
iommu: io-pgtable-arm: Implement IOMMU_USE_UPSTREAM_HINT[2]

Changes since v3:
  - Dropping support to cache i/o page tables to system cache. Getting support
for data buffers is the first step.
Removed io-pgtable quirk and related change to add domain attribute.

Glmark2 numbers on SDM845 based cheza board:

S.No. | with LLC support  | without LLC support
      | for data buffers  |
------+-------------------+--------------------
1     | 4480; 72.3fps     | 4042; 65.2fps
2     | 4500; 72.6fps     | 4039; 65.1fps
3     | 4523; 72.9fps     | 4106; 66.2fps
4     | 4489; 72.4fps     | 4104; 66.2fps
5     | 4518; 72.9fps     | 4072; 65.7fps

[1] https://patchwork.kernel.org/cover/10772629/
[2] 
https://source.codeaurora.org/quic/la/kernel/msm-4.9/commit/?h=msm-4.9&id=d4c72c413ea27c43f60825193d4de9cb8ffd9602

  drivers/iommu/io-pgtable-arm.c | 9 -
  include/linux/iommu.h  | 1 +
  2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index d3700ec15cbd..2dbafe697531 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -167,10 +167,12 @@
  #define ARM_LPAE_MAIR_ATTR_MASK   0xff
  #define ARM_LPAE_MAIR_ATTR_DEVICE 0x04
  #define ARM_LPAE_MAIR_ATTR_NC 0x44
+#define ARM_LPAE_MAIR_ATTR_QCOM_SYS_CACHE  0xf4
  #define ARM_LPAE_MAIR_ATTR_WBRWA  0xff
  #define ARM_LPAE_MAIR_ATTR_IDX_NC 0
  #define ARM_LPAE_MAIR_ATTR_IDX_CACHE  1
  #define ARM_LPAE_MAIR_ATTR_IDX_DEV2
+#define ARM_LPAE_MAIR_ATTR_IDX_QCOM_SYS_CACHE  3


Here at the implementation level, I'd rather just call these what they 
are, i.e. s/QCOM_SYS_CACHE/INC_OWBRWA/.


  
  /* IOPTE accessors */

  #define iopte_deref(pte,d) __va(iopte_to_paddr(pte, d))
@@ -442,6 +444,9 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct 
arm_lpae_io_pgtable *data,
else if (prot & IOMMU_CACHE)
pte |= (ARM_LPAE_MAIR_ATTR_IDX_CACHE
<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
+   else if (prot & IOMMU_QCOM_SYS_CACHE)


Where in the call stack is this going to be decided? (I don't recall the 
previous discussions ever really reaching a solid conclusion on how to 
separate responsibilities).



+   pte |= (ARM_LPAE_MAIR_ATTR_IDX_QCOM_SYS_CACHE
+   << ARM_LPAE_PTE_ATTRINDX_SHIFT);
} else {
pte = ARM_LPAE_PTE_HAP_FAULT;
if (prot & IOMMU_READ)
@@ -841,7 +846,9 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, 
void *cookie)
  (ARM_LPAE_MAIR_ATTR_WBRWA
   << ARM_LPAE_MAIR_ATTR_SHIFT(ARM_LPAE_MAIR_ATTR_IDX_CACHE)) |
  (ARM_LPAE_MAIR_ATTR_DEVICE
-  << ARM_LPAE_MAIR_ATTR_SHIFT(ARM_LPAE_MAIR_ATTR_IDX_DEV));
+  << ARM_LPAE_MAIR_ATTR_SHIFT(ARM_LPAE_MAIR_ATTR_IDX_DEV)) |
+ (ARM_LPAE_MAIR_ATTR_QCOM_SYS_CACHE
+  << ARM_LPAE_MAIR_ATTR_SHIFT(ARM_LPAE_MAIR_ATTR_IDX_QCOM_SYS_CACHE));
  
  	cfg->arm_lpae_s1_cfg.mair[0] = reg;
  	cfg->arm_lpae_s1_cfg.mair[1] = 0;
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index a815cf6f6f47..29dd2c624348 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -31,6 +31,7 @@

Re: [PATCH v3 02/16] iommu: Introduce cache_invalidate API

2019-05-13 Thread Jean-Philippe Brucker
Hi Eric,

On 13/05/2019 10:14, Auger Eric wrote:
> I noticed my qemu integration was currently incorrectly using PASID
> invalidation for ASID based invalidation (SMMUV3 Stage1 CMD_TLBI_NH_ASID
> invalidation command). So I think we also need ARCHID invalidation.
> Sorry for the late notice.
>>  
>> +/* defines the granularity of the invalidation */
>> +enum iommu_inv_granularity {
>> +IOMMU_INV_GRANU_DOMAIN, /* domain-selective invalidation */
> IOMMU_INV_GRANU_ARCHID, /* archid-selective invalidation */
>> +IOMMU_INV_GRANU_PASID,  /* pasid-selective invalidation */

In terms of granularity, these values have the same meaning: invalidate
the whole address space of a context. Then you can communicate two
things using the same struct:
* If ATS is enabled, an Arm host needs to invalidate all ATC entries
using PASID.
* If BTM isn't used by the guest, the host needs to invalidate all TLB
entries using ARCHID.

Rather than introducing a new granule here, could we just add an archid
field to the struct associated with IOMMU_INV_GRANU_PASID? Something like...

>> +IOMMU_INV_GRANU_ADDR,   /* page-selective invalidation */
>> +IOMMU_INVAL_GRANU_NR,   /* number of invalidation granularities */
>> +};
>> +
>> +/**
>> + * Address Selective Invalidation Structure
>> + *
>> + * @flags indicates the granularity of the address-selective invalidation
>> + * - if PASID bit is set, @pasid field is populated and the invalidation
>> + *   relates to cache entries tagged with this PASID and matching the
>> + *   address range.
>> + * - if ARCHID bit is set, @archid is populated and the invalidation relates
>> + *   to cache entries tagged with this architecture specific id and matching
>> + *   the address range.
>> + * - Both PASID and ARCHID can be set as they may tag different caches.
>> + * - if neither PASID or ARCHID is set, global addr invalidation applies
>> + * - LEAF flag indicates whether only the leaf PTE caching needs to be
>> + *   invalidated and other paging structure caches can be preserved.
>> + * @pasid: process address space id
>> + * @archid: architecture-specific id
>> + * @addr: first stage/level input address
>> + * @granule_size: page/block size of the mapping in bytes
>> + * @nb_granules: number of contiguous granules to be invalidated
>> + */
>> +struct iommu_inv_addr_info {
>> +#define IOMMU_INV_ADDR_FLAGS_PASID  (1 << 0)
>> +#define IOMMU_INV_ADDR_FLAGS_ARCHID (1 << 1)
>> +#define IOMMU_INV_ADDR_FLAGS_LEAF   (1 << 2)
>> +__u32   flags;
>> +__u32   archid;
>> +__u64   pasid;
>> +__u64   addr;
>> +__u64   granule_size;
>> +__u64   nb_granules;
>> +};

struct iommu_inv_pasid_info {
#define IOMMU_INV_PASID_FLAGS_PASID (1 << 0)
#define IOMMU_INV_PASID_FLAGS_ARCHID (1 << 1)
__u32   flags;
__u32   archid;
__u64   pasid;
};
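
For example (a sketch, assuming the layout above and the pasid_info union
member suggested below), invalidating all IOTLB entries tagged with a
guest ASID, without touching the ATC, would then be:

	struct iommu_cache_invalidate_info info = {
		.version     = IOMMU_CACHE_INVALIDATE_INFO_VERSION_1,
		.cache       = IOMMU_CACHE_INV_TYPE_IOTLB,
		.granularity = IOMMU_INV_GRANU_PASID,
		.pasid_info  = {
			.flags  = IOMMU_INV_PASID_FLAGS_ARCHID,
			.archid = asid,	/* guest ASID */
		},
	};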

>> +
>> +/**
>> + * First level/stage invalidation information
>> + * @cache: bitfield that allows to select which caches to invalidate
>> + * @granularity: defines the lowest granularity used for the invalidation:
>> + * domain > pasid > addr
>> + *
>> + * Not all the combinations of cache/granularity make sense:
>> + *
>> + * type |   DEV_IOTLB   | IOTLB |  PASID|
>> + * granularity  |   |   |  cache|
>> + * -+---+---+---+
>> + * DOMAIN   |   N/A |   Y   |   Y   |
>  * ARCHID   |   N/A |   Y   |   N/A |
> 
>> + * PASID|   Y   |   Y   |   Y   |
>> + * ADDR |   Y   |   Y   |   N/A |
>> + *
>> + * Invalidations by %IOMMU_INV_GRANU_ADDR use field @addr_info.
>  * Invalidations by %IOMMU_INV_GRANU_ARCHID use field @archid.
>> + * Invalidations by %IOMMU_INV_GRANU_PASID use field @pasid.
>> + * Invalidations by %IOMMU_INV_GRANU_DOMAIN don't take any argument.
>> + *
>> + * If multiple cache types are invalidated simultaneously, they all
>> + * must support the used granularity.
>> + */
>> +struct iommu_cache_invalidate_info {
>> +#define IOMMU_CACHE_INVALIDATE_INFO_VERSION_1 1
>> +__u32   version;
>> +/* IOMMU paging structure cache */
>> +#define IOMMU_CACHE_INV_TYPE_IOTLB  (1 << 0) /* IOMMU IOTLB */
>> +#define IOMMU_CACHE_INV_TYPE_DEV_IOTLB  (1 << 1) /* Device IOTLB */
>> +#define IOMMU_CACHE_INV_TYPE_PASID  (1 << 2) /* PASID cache */
>> +#define IOMMU_CACHE_TYPE_NR (3)
>> +__u8    cache;
>> +__u8    granularity;
>> +__u8    padding[2];
>> +union {
>> +__u64   pasid;
> __u32   archid;

struct iommu_inv_pasid_info pasid_info;

Thanks,
Jean

> 
> Thanks
> 
> Eric
>> +struct iommu_inv_addr_info addr_info;
>> +};
>> +};
>> +
>> +
>>  #endif /* _UAPI_IOMMU_H */
>>


[PATCH v4 1/1] iommu/io-pgtable-arm: Add support to use system cache

2019-05-13 Thread Vivek Gautam
A few Qualcomm platforms, such as sdm845, have an additional outer
cache called the System cache, aka. Last level cache (LLC), that
allows non-coherent devices to upgrade to using caching.
This cache sits right before the DDR, and is tightly coupled
with the memory controller. The clients using this cache request
their slices from this system cache, make it active, and can then
start using it.

There is a fundamental assumption that non-coherent devices can't
access caches. This change adds an exception where they *can* use
some level of cache despite still being non-coherent overall.
The coherent devices that use cacheable memory, and CPU make use of
this system cache by default.

Looking at memory types, we have following -
a) Normal uncached :- MAIR 0x44, inner non-cacheable,
  outer non-cacheable;
b) Normal cached :-   MAIR 0xff, inner read write-back non-transient,
  outer read write-back non-transient;
  attribute setting for coherent I/O devices.
and, for non-coherent i/o devices that can allocate in system cache
another type gets added -
c) Normal sys-cached :- MAIR 0xf4, inner non-cacheable,
outer read write-back non-transient

Coherent I/O devices use system cache by marking the memory as
normal cached.
Non-coherent I/O devices should mark the memory as normal
sys-cached in page tables to use system cache.
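
To make the encoding concrete, with the index assignments in the patch
below, byte n of the MAIR register holds the attributes for ATTRINDX n,
so the resulting value is (a sketch of the arithmetic):

	/*
	 * byte 0: IDX_NC             = 0x44  inner/outer non-cacheable
	 * byte 1: IDX_CACHE          = 0xff  inner/outer write-back RWA
	 * byte 2: IDX_DEV            = 0x04  device memory
	 * byte 3: IDX_QCOM_SYS_CACHE = 0xf4  inner NC, outer write-back
	 *
	 * => mair[0] = 0xf404ff44
	 */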

Signed-off-by: Vivek Gautam 
---

V3 version of this patch and related series can be found at [1].

This change is a realisation of following changes from downstream msm-4.9:
iommu: io-pgtable-arm: Implement IOMMU_USE_UPSTREAM_HINT[2]

Changes since v3:
 - Dropping support to cache i/o page tables to system cache. Getting support
   for data buffers is the first step.
   Removed io-pgtable quirk and related change to add domain attribute.

Glmark2 numbers on SDM845 based cheza board:

S.No. | with LLC support  | without LLC support
      | for data buffers  |
------+-------------------+--------------------
1     | 4480; 72.3fps     | 4042; 65.2fps
2     | 4500; 72.6fps     | 4039; 65.1fps
3     | 4523; 72.9fps     | 4106; 66.2fps
4     | 4489; 72.4fps     | 4104; 66.2fps
5     | 4518; 72.9fps     | 4072; 65.7fps

[1] https://patchwork.kernel.org/cover/10772629/
[2] 
https://source.codeaurora.org/quic/la/kernel/msm-4.9/commit/?h=msm-4.9&id=d4c72c413ea27c43f60825193d4de9cb8ffd9602

 drivers/iommu/io-pgtable-arm.c | 9 -
 include/linux/iommu.h  | 1 +
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index d3700ec15cbd..2dbafe697531 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -167,10 +167,12 @@
 #define ARM_LPAE_MAIR_ATTR_MASK0xff
 #define ARM_LPAE_MAIR_ATTR_DEVICE  0x04
 #define ARM_LPAE_MAIR_ATTR_NC  0x44
+#define ARM_LPAE_MAIR_ATTR_QCOM_SYS_CACHE  0xf4
 #define ARM_LPAE_MAIR_ATTR_WBRWA   0xff
 #define ARM_LPAE_MAIR_ATTR_IDX_NC  0
 #define ARM_LPAE_MAIR_ATTR_IDX_CACHE   1
 #define ARM_LPAE_MAIR_ATTR_IDX_DEV 2
+#define ARM_LPAE_MAIR_ATTR_IDX_QCOM_SYS_CACHE  3
 
 /* IOPTE accessors */
 #define iopte_deref(pte,d) __va(iopte_to_paddr(pte, d))
@@ -442,6 +444,9 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct 
arm_lpae_io_pgtable *data,
else if (prot & IOMMU_CACHE)
pte |= (ARM_LPAE_MAIR_ATTR_IDX_CACHE
<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
+   else if (prot & IOMMU_QCOM_SYS_CACHE)
+   pte |= (ARM_LPAE_MAIR_ATTR_IDX_QCOM_SYS_CACHE
+   << ARM_LPAE_PTE_ATTRINDX_SHIFT);
} else {
pte = ARM_LPAE_PTE_HAP_FAULT;
if (prot & IOMMU_READ)
@@ -841,7 +846,9 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, 
void *cookie)
  (ARM_LPAE_MAIR_ATTR_WBRWA
   << ARM_LPAE_MAIR_ATTR_SHIFT(ARM_LPAE_MAIR_ATTR_IDX_CACHE)) |
  (ARM_LPAE_MAIR_ATTR_DEVICE
-  << ARM_LPAE_MAIR_ATTR_SHIFT(ARM_LPAE_MAIR_ATTR_IDX_DEV));
+  << ARM_LPAE_MAIR_ATTR_SHIFT(ARM_LPAE_MAIR_ATTR_IDX_DEV)) |
+ (ARM_LPAE_MAIR_ATTR_QCOM_SYS_CACHE
+  << ARM_LPAE_MAIR_ATTR_SHIFT(ARM_LPAE_MAIR_ATTR_IDX_QCOM_SYS_CACHE));
 
cfg->arm_lpae_s1_cfg.mair[0] = reg;
cfg->arm_lpae_s1_cfg.mair[1] = 0;
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index a815cf6f6f47..29dd2c624348 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -31,6 +31,7 @@
 #define IOMMU_CACHE(1 << 2) /* DMA cache coherency */
 #define IOMMU_NOEXEC   (1 << 3)
 #define IOMMU_MMIO (1 << 4) /* e.g. things like MSI doorbells */
+#define IOMMU_QCOM_SYS_CACHE   (1 << 6)
 /*
  * Where the bus hardware includes a privilege level as part of its access type
  * markings, and certain devices are capable of issuing 

Re: [PATCH v3 02/16] iommu: Introduce cache_invalidate API

2019-05-13 Thread Auger Eric
Hi Jacob, Jean-Philippe,

On 5/4/19 12:32 AM, Jacob Pan wrote:
> From: "Liu, Yi L" 
> 
> In any virtualization use case, when the first translation stage
> is "owned" by the guest OS, the host IOMMU driver has no knowledge
> of caching structure updates unless the guest invalidation activities
> are trapped by the virtualizer and passed down to the host.
> 
> Since the invalidation data are obtained from user space and will be
> written into physical IOMMU, we must allow security check at various
> layers. Therefore, generic invalidation data format are proposed here,
> model specific IOMMU drivers need to convert them into their own format.
> 
> Signed-off-by: Liu, Yi L 
> Signed-off-by: Jean-Philippe Brucker 
> Signed-off-by: Jacob Pan 
> Signed-off-by: Ashok Raj 
> Signed-off-by: Eric Auger 
> 
> ---
> v6 -> v7:
> - detail which fields are used for each invalidation type
> - add a comment about multiple cache invalidation
> 
> v5 -> v6:
> - fix merge issue
> 
> v3 -> v4:
> - full reshape of the API following Alex' comments
> 
> v1 -> v2:
> - add arch_id field
> - renamed tlb_invalidate into cache_invalidate as this API allows
>   to invalidate context caches on top of IOTLBs
> 
> v1:
> renamed sva_invalidate into tlb_invalidate and add iommu_ prefix in
> header. Commit message reworded.
> ---
>  drivers/iommu/iommu.c  | 14 
>  include/linux/iommu.h  | 15 -
>  include/uapi/linux/iommu.h | 80 ++
>  3 files changed, 108 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 8df9d34..a2f6f3e 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -1645,6 +1645,20 @@ void iommu_detach_pasid_table(struct iommu_domain *domain)
>  }
>  EXPORT_SYMBOL_GPL(iommu_detach_pasid_table);
>  
> +int iommu_cache_invalidate(struct iommu_domain *domain, struct device *dev,
> +struct iommu_cache_invalidate_info *inv_info)
> +{
> + int ret = 0;
> +
> + if (unlikely(!domain->ops->cache_invalidate))
> + return -ENODEV;
> +
> + ret = domain->ops->cache_invalidate(domain, dev, inv_info);
> +
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(iommu_cache_invalidate);
> +
>  static void __iommu_detach_device(struct iommu_domain *domain,
> struct device *dev)
>  {
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index ab4d922..d182525 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -266,6 +266,7 @@ struct page_response_msg {
>   * @page_response: handle page request response
>   * @attach_pasid_table: attach a pasid table
>   * @detach_pasid_table: detach the pasid table
> + * @cache_invalidate: invalidate translation caches
>   * @pgsize_bitmap: bitmap of all possible supported page sizes
>   */
>  struct iommu_ops {
> @@ -328,8 +329,9 @@ struct iommu_ops {
>   int (*attach_pasid_table)(struct iommu_domain *domain,
> struct iommu_pasid_table_config *cfg);
>   void (*detach_pasid_table)(struct iommu_domain *domain);
> -
>   int (*page_response)(struct device *dev, struct page_response_msg *msg);
> + int (*cache_invalidate)(struct iommu_domain *domain, struct device *dev,
> + struct iommu_cache_invalidate_info *inv_info);
>  
>   unsigned long pgsize_bitmap;
>  };
> @@ -442,6 +444,9 @@ extern void iommu_detach_device(struct iommu_domain *domain,
>  extern int iommu_attach_pasid_table(struct iommu_domain *domain,
>   struct iommu_pasid_table_config *cfg);
>  extern void iommu_detach_pasid_table(struct iommu_domain *domain);
> +extern int iommu_cache_invalidate(struct iommu_domain *domain,
> +   struct device *dev,
> +   struct iommu_cache_invalidate_info *inv_info);
>  extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
>  extern struct iommu_domain *iommu_get_dma_domain(struct device *dev);
>  extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
> @@ -982,6 +987,14 @@ static inline int iommu_sva_get_pasid(struct iommu_sva *handle)
>  static inline
>  void iommu_detach_pasid_table(struct iommu_domain *domain) {}
>  
> +static inline int
> +iommu_cache_invalidate(struct iommu_domain *domain,
> +struct device *dev,
> +struct iommu_cache_invalidate_info *inv_info)
> +{
> + return -ENODEV;
> +}
> +
>  #endif /* CONFIG_IOMMU_API */
>  
>  #ifdef CONFIG_IOMMU_DEBUGFS
> diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
> index 8848514..fa96ecb 100644
> --- a/include/uapi/linux/iommu.h
> +++ b/include/uapi/linux/iommu.h
> @@ -162,4 +162,84 @@ struct iommu_pasid_table_config {
>   };
>  };

I noticed my qemu integration was currently incorrectly using PASID
invalidation for ASID based invalidation (SMMUV3 Stage1 

Re: [PATCH v7 18/23] iommu/smmuv3: Report non recoverable faults

2019-05-13 Thread Auger Eric
Hi Robin,

On 5/8/19 7:20 PM, Robin Murphy wrote:
> On 08/04/2019 13:19, Eric Auger wrote:
>> When a stage 1 related fault event is read from the event queue,
>> let's propagate it to potential external fault listeners, ie. users
>> who registered a fault handler.
>>
>> Signed-off-by: Eric Auger 
>>
>> ---
>> v4 -> v5:
>> - s/IOMMU_FAULT_PERM_INST/IOMMU_FAULT_PERM_EXEC
>> ---
>>   drivers/iommu/arm-smmu-v3.c | 169 +---
>>   1 file changed, 158 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
>> index 805bc32a..1fd320788dcb 100644
>> --- a/drivers/iommu/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm-smmu-v3.c
>> @@ -167,6 +167,26 @@
>>   #define ARM_SMMU_PRIQ_IRQ_CFG1    0xd8
>>   #define ARM_SMMU_PRIQ_IRQ_CFG2    0xdc
>>   +/* Events */
>> +#define ARM_SMMU_EVT_F_UUT    0x01
>> +#define ARM_SMMU_EVT_C_BAD_STREAMID    0x02
>> +#define ARM_SMMU_EVT_F_STE_FETCH    0x03
>> +#define ARM_SMMU_EVT_C_BAD_STE    0x04
>> +#define ARM_SMMU_EVT_F_BAD_ATS_TREQ    0x05
>> +#define ARM_SMMU_EVT_F_STREAM_DISABLED    0x06
>> +#define ARM_SMMU_EVT_F_TRANSL_FORBIDDEN    0x07
>> +#define ARM_SMMU_EVT_C_BAD_SUBSTREAMID    0x08
>> +#define ARM_SMMU_EVT_F_CD_FETCH    0x09
>> +#define ARM_SMMU_EVT_C_BAD_CD    0x0a
>> +#define ARM_SMMU_EVT_F_WALK_EABT    0x0b
>> +#define ARM_SMMU_EVT_F_TRANSLATION    0x10
>> +#define ARM_SMMU_EVT_F_ADDR_SIZE    0x11
>> +#define ARM_SMMU_EVT_F_ACCESS    0x12
>> +#define ARM_SMMU_EVT_F_PERMISSION    0x13
>> +#define ARM_SMMU_EVT_F_TLB_CONFLICT    0x20
>> +#define ARM_SMMU_EVT_F_CFG_CONFLICT    0x21
>> +#define ARM_SMMU_EVT_E_PAGE_REQUEST    0x24
>> +
>>   /* Common MSI config fields */
>>   #define MSI_CFG0_ADDR_MASK    GENMASK_ULL(51, 2)
>>   #define MSI_CFG2_SH    GENMASK(5, 4)
>> @@ -332,6 +352,15 @@
>>   #define EVTQ_MAX_SZ_SHIFT    7
>>     #define EVTQ_0_ID    GENMASK_ULL(7, 0)
>> +#define EVTQ_0_SSV    GENMASK_ULL(11, 11)
>> +#define EVTQ_0_SUBSTREAMID    GENMASK_ULL(31, 12)
>> +#define EVTQ_0_STREAMID    GENMASK_ULL(63, 32)
>> +#define EVTQ_1_PNU    GENMASK_ULL(33, 33)
>> +#define EVTQ_1_IND    GENMASK_ULL(34, 34)
>> +#define EVTQ_1_RNW    GENMASK_ULL(35, 35)
>> +#define EVTQ_1_S2    GENMASK_ULL(39, 39)
>> +#define EVTQ_1_CLASS    GENMASK_ULL(41, 40)
>> +#define EVTQ_3_FETCH_ADDR    GENMASK_ULL(51, 3)
>>     /* PRI queue */
>>   #define PRIQ_ENT_DWORDS    2
>> @@ -639,6 +668,64 @@ struct arm_smmu_domain {
>>   spinlock_t    devices_lock;
>>   };
>>   +/* fault propagation */
>> +
>> +#define IOMMU_FAULT_F_FIELDS    (IOMMU_FAULT_UNRECOV_PASID_VALID | \
>> + IOMMU_FAULT_UNRECOV_PERM_VALID | \
>> + IOMMU_FAULT_UNRECOV_ADDR_VALID)
>> +
>> +struct arm_smmu_fault_propagation_data {
>> +    enum iommu_fault_reason reason;
>> +    bool s1_check;
>> +    u32 fields; /* IOMMU_FAULT_UNRECOV_*_VALID bits */
>> +};
>> +
>> +/*
>> + * Describes how SMMU faults translate into generic IOMMU faults
>> + * and if they need to be reported externally
>> + */
>> +static const struct arm_smmu_fault_propagation_data
>> fault_propagation[] = {
>> +[ARM_SMMU_EVT_F_UUT]    = { },
>> +[ARM_SMMU_EVT_C_BAD_STREAMID]    = { },
>> +[ARM_SMMU_EVT_F_STE_FETCH]    = { },
>> +[ARM_SMMU_EVT_C_BAD_STE]    = { },
>> +[ARM_SMMU_EVT_F_BAD_ATS_TREQ]    = { },
>> +[ARM_SMMU_EVT_F_STREAM_DISABLED]    = { },
>> +[ARM_SMMU_EVT_F_TRANSL_FORBIDDEN]    = { },
>> +[ARM_SMMU_EVT_C_BAD_SUBSTREAMID]    = {IOMMU_FAULT_REASON_PASID_INVALID,
>> +   false,
>> +   IOMMU_FAULT_UNRECOV_PASID_VALID
>> +  },
>> +[ARM_SMMU_EVT_F_CD_FETCH]    = {IOMMU_FAULT_REASON_PASID_FETCH,
>> +   false,
>> +   IOMMU_FAULT_UNRECOV_PASID_VALID |
> 
> It doesn't make sense to presume validity here, or in any of the faults
> below...


> 
>> +   IOMMU_FAULT_UNRECOV_FETCH_ADDR_VALID
>> +  },
>> +[ARM_SMMU_EVT_C_BAD_CD]    = {IOMMU_FAULT_REASON_BAD_PASID_ENTRY,
>> +   false,
>> +   IOMMU_FAULT_UNRECOV_PASID_VALID
>> +  },
>> +[ARM_SMMU_EVT_F_WALK_EABT]    = {IOMMU_FAULT_REASON_WALK_EABT, true,
>> +   IOMMU_FAULT_F_FIELDS |
>> +   IOMMU_FAULT_UNRECOV_FETCH_ADDR_VALID
>> +  },
>> +[ARM_SMMU_EVT_F_TRANSLATION]    = {IOMMU_FAULT_REASON_PTE_FETCH, true,
>> +   IOMMU_FAULT_F_FIELDS
>> +  },
>> +[ARM_SMMU_EVT_F_ADDR_SIZE]    = {IOMMU_FAULT_REASON_OOR_ADDRESS, true,
>> +   IOMMU_FAULT_F_FIELDS
>> +  },
>> +[ARM_SMMU_EVT_F_ACCESS]    = {IOMMU_FAULT_REASON_ACCESS, true,
>> +   IOMMU_FAULT_F_FIELDS
>> +   

[PATCH 3/4] iommu/vt-d: Handle RMRR with PCI bridge device scopes

2019-05-13 Thread Eric Auger
When reading the VT-d specification, and especially the
Reserved Memory Region Reporting Structure chapter,
it is not obvious that a device scope element cannot be a
PCI-PCI bridge, in which case all devices downstream of the
bridge are likely to access the reserved memory region. For
example, a bridge whose subordinate bus range is [2..4]
should match a device sitting on bus 3. Let's handle
this case in device_has_rmrr.

Fixes: ea2447f700ca ("intel-iommu: Prevent devices with RMRRs from being placed into SI Domain")

Signed-off-by: Eric Auger 
---
 drivers/iommu/intel-iommu.c | 32 +++-
 1 file changed, 23 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index e2134b13c9ae..89d82a1d50b1 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -736,12 +736,31 @@ static int iommu_dummy(struct device *dev)
return dev->archdata.iommu == DUMMY_DEVICE_DOMAIN_INFO;
 }
 
+static bool
+is_downstream_to_pci_bridge(struct device *deva, struct device *devb)
+{
+   struct pci_dev *pdeva, *pdevb;
+
+   if (!dev_is_pci(deva) || !dev_is_pci(devb))
+   return false;
+
+   pdeva = to_pci_dev(deva);
+   pdevb = to_pci_dev(devb);
+
+   if (pdevb->subordinate &&
+   pdevb->subordinate->number <= pdeva->bus->number &&
+   pdevb->subordinate->busn_res.end >= pdeva->bus->number)
+   return true;
+
+   return false;
+}
+
 static struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devfn)
 {
struct dmar_drhd_unit *drhd = NULL;
struct intel_iommu *iommu;
struct device *tmp;
-   struct pci_dev *ptmp, *pdev = NULL;
+   struct pci_dev *pdev = NULL;
u16 segment = 0;
int i;
 
@@ -787,13 +806,7 @@ static struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devfn)
goto out;
}
 
-   if (!pdev || !dev_is_pci(tmp))
-   continue;
-
-   ptmp = to_pci_dev(tmp);
-   if (ptmp->subordinate &&
-   ptmp->subordinate->number <= pdev->bus->number &&
-   ptmp->subordinate->busn_res.end >= pdev->bus->number)
+   if (is_downstream_to_pci_bridge(dev, tmp))
goto got_pdev;
}
 
@@ -2886,7 +2899,8 @@ static bool device_has_rmrr(struct device *dev)
 */
for_each_active_dev_scope(rmrr->devices,
  rmrr->devices_cnt, i, tmp)
-   if (tmp == dev) {
+   if (tmp == dev ||
+   is_downstream_to_pci_bridge(dev, tmp)) {
rcu_read_unlock();
return true;
}
-- 
2.20.1



RE: [PATCH v3 0/6] iommu/ipmmu-vmsa: Suspend/resume support and assorted cleanups

2019-05-13 Thread Yoshihiro Shimoda
Hi Geert-san,

> From: Geert Uytterhoeven, Sent: Wednesday, April 24, 2019 10:55 PM
> 
>   Hi Jörg, Magnus,
> 
> On R-Car Gen3 systems with PSCI, PSCI may power down the SoC during
> system suspend, thus losing all IOMMU state.  Hence after s2ram, devices
> behind an IPMMU (e.g. SATA), and configured to use it, will fail to
> complete their I/O operations.
> 
> This patch series adds suspend/resume support to the Renesas IPMMU-VMSA
> IOMMU driver, and performs some smaller cleanups and fixes during the
> process.  Most patches are fairly independent, except for patch 6/6,
> which depends on patches 4/6 and 5/6.
> 
> Changes compared to v2:
>   - Fix sysfs path typo in patch description,
>   - Add Reviewed-by.
> 
> Changes compared to v1:
>   - Dropped "iommu/ipmmu-vmsa: Call ipmmu_ctx_write_root() instead of
> open coding",
>   - Add Reviewed-by,
>   - Merge IMEAR/IMELAR,
>   - s/ipmmu_context_init/ipmmu_domain_setup_context/,
>   - Drop PSCI checks.
> 
> This has been tested on Salvator-XS with R-Car H3 ES2.0, with IPMMU
> support for SATA enabled.  To play safe, the resume operation has also
> been tested on R-Car M2-W.

Thank you for the patch! I reviewed this patch series and tested it on
R-Car H3 ES3.0 with IPMMU support for USB3.0 host and SDHI. So,

Reviewed-by: Yoshihiro Shimoda 
Tested-by: Yoshihiro Shimoda 

Best regards,
Yoshihiro Shimoda



[PATCH 0/4] RMRR related fixes

2019-05-13 Thread Eric Auger
Currently the Intel reserved region is attached to the
RMRR unit and when building the list of RMRR seen by a device
we link this unique reserved region without taking care of
potential multiple usage of this reserved region by several devices.

Also, while reading the VT-d spec, it is unclear to me whether
the RMRR device scope referenced by an RMRR ACPI struct could
be a PCI-PCI bridge, in which case I think we also need to
check the device belongs to the PCI sub-hierarchy of the device
referenced in the scope. This would be true for device_has_rmrr()
and intel_iommu_get_resv_regions().

Eric Auger (4):
  iommu: Pass a GFP flag parameter to iommu_alloc_resv_region()
  iommu/vt-d: Duplicate iommu_resv_region objects per device list
  iommu/vt-d: Handle RMRR with PCI bridge device scopes
  iommu/vt-d: Handle PCI bridge RMRR device scopes in
intel_iommu_get_resv_regions

 drivers/acpi/arm64/iort.c   |  3 +-
 drivers/iommu/amd_iommu.c   |  7 ++--
 drivers/iommu/arm-smmu-v3.c |  2 +-
 drivers/iommu/arm-smmu.c|  2 +-
 drivers/iommu/intel-iommu.c | 68 +++--
 drivers/iommu/iommu.c   |  7 ++--
 include/linux/iommu.h   |  2 +-
 7 files changed, 55 insertions(+), 36 deletions(-)

-- 
2.20.1



[PATCH 2/4] iommu/vt-d: Duplicate iommu_resv_region objects per device list

2019-05-13 Thread Eric Auger
intel_iommu_get_resv_regions() aims to return the list of
reserved regions accessible by a given @device. However several
devices can access the same reserved memory region and when
building the list it is not safe to use a single iommu_resv_region
object, whose container is the RMRR. This iommu_resv_region must
be duplicated into each device's reserved region list.

Let's remove the struct iommu_resv_region from the RMRR unit
and allocate the iommu_resv_region directly in
intel_iommu_get_resv_regions().
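
The underlying problem is that a struct list_head can only be linked into
one list at a time, so sharing one iommu_resv_region between several
per-device lists corrupts them. A minimal illustration (sketch):

	struct iommu_resv_region shared;	/* the single object in the RMRR */
	LIST_HEAD(dev_a_head);
	LIST_HEAD(dev_b_head);

	list_add_tail(&shared.list, &dev_a_head);
	/* Overwrites shared.list's links: dev_a_head now threads into
	 * dev_b_head's list, leaving both lists corrupt. */
	list_add_tail(&shared.list, &dev_b_head);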

Fixes: 0659b8dc45a6 ("iommu/vt-d: Implement reserved region get/put callbacks")
Signed-off-by: Eric Auger 
---
 drivers/iommu/intel-iommu.c | 31 ---
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 2075abdb174d..e2134b13c9ae 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -322,7 +322,6 @@ struct dmar_rmrr_unit {
u64 end_address;/* reserved end address */
struct dmar_dev_scope *devices; /* target devices */
int devices_cnt;/* target device count */
-   struct iommu_resv_region *resv; /* reserved region handle */
 };
 
 struct dmar_atsr_unit {
@@ -4206,7 +4205,6 @@ static inline void init_iommu_pm_ops(void) {}
 int __init dmar_parse_one_rmrr(struct acpi_dmar_header *header, void *arg)
 {
struct acpi_dmar_reserved_memory *rmrr;
-   int prot = DMA_PTE_READ|DMA_PTE_WRITE;
struct dmar_rmrr_unit *rmrru;
size_t length;
 
@@ -4220,22 +4218,16 @@ int __init dmar_parse_one_rmrr(struct acpi_dmar_header *header, void *arg)
rmrru->end_address = rmrr->end_address;
 
length = rmrr->end_address - rmrr->base_address + 1;
-   rmrru->resv = iommu_alloc_resv_region(rmrr->base_address, length, prot,
- IOMMU_RESV_DIRECT, GFP_KERNEL);
-   if (!rmrru->resv)
-   goto free_rmrru;
 
rmrru->devices = dmar_alloc_dev_scope((void *)(rmrr + 1),
((void *)rmrr) + rmrr->header.length,
&rmrru->devices_cnt);
if (rmrru->devices_cnt && rmrru->devices == NULL)
-   goto free_all;
+   goto free_rmrru;
 
	list_add(&rmrru->list, &dmar_rmrr_units);
 
return 0;
-free_all:
-   kfree(rmrru->resv);
 free_rmrru:
kfree(rmrru);
 out:
@@ -4453,7 +4445,6 @@ static void intel_iommu_free_dmars(void)
list_for_each_entry_safe(rmrru, rmrr_n, _rmrr_units, list) {
	list_del(&rmrru->list);
	dmar_free_dev_scope(&rmrru->devices, &rmrru->devices_cnt);
-   kfree(rmrru->resv);
kfree(rmrru);
}
 
@@ -5271,6 +5262,7 @@ static void intel_iommu_remove_device(struct device *dev)
 static void intel_iommu_get_resv_regions(struct device *device,
 struct list_head *head)
 {
+   int prot = DMA_PTE_READ|DMA_PTE_WRITE;
struct iommu_resv_region *reg;
struct dmar_rmrr_unit *rmrr;
struct device *i_dev;
@@ -5280,10 +5272,21 @@ static void intel_iommu_get_resv_regions(struct device *device,
for_each_rmrr_units(rmrr) {
for_each_active_dev_scope(rmrr->devices, rmrr->devices_cnt,
  i, i_dev) {
+   struct iommu_resv_region *resv;
+   size_t length;
+
if (i_dev != device)
continue;
 
-   list_add_tail(&rmrru->resv->list, head);
+   length = rmrr->end_address - rmrr->base_address + 1;
+   resv = iommu_alloc_resv_region(rmrr->base_address,
+  length, prot,
+  IOMMU_RESV_DIRECT,
+  GFP_ATOMIC);
+   if (!resv)
+   break;
+
+   list_add_tail(&resv->list, head);
}
}
rcu_read_unlock();
@@ -5301,10 +5304,8 @@ static void intel_iommu_put_resv_regions(struct device *dev,
 {
struct iommu_resv_region *entry, *next;
 
-   list_for_each_entry_safe(entry, next, head, list) {
-   if (entry->type == IOMMU_RESV_MSI)
-   kfree(entry);
-   }
+   list_for_each_entry_safe(entry, next, head, list)
+   kfree(entry);
 }
 
 #ifdef CONFIG_INTEL_IOMMU_SVM
-- 
2.20.1



[PATCH 1/4] iommu: Pass a GFP flag parameter to iommu_alloc_resv_region()

2019-05-13 Thread Eric Auger
We plan to call iommu_alloc_resv_region() in a non-preemptible section.
Pass a GFP flag to the function and update all the call sites.
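
With the new parameter, the RMRR path reworked in patch 2/4 can allocate
while holding rcu_read_lock() - a sketch, assuming the updated prototype:

	struct iommu_resv_region *resv;

	rcu_read_lock();
	/* GFP_ATOMIC: we must not sleep under rcu_read_lock() */
	resv = iommu_alloc_resv_region(base, length, prot,
				       IOMMU_RESV_DIRECT, GFP_ATOMIC);
	if (resv)
		list_add_tail(&resv->list, head);
	rcu_read_unlock();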

Signed-off-by: Eric Auger 
---
 drivers/acpi/arm64/iort.c   | 3 ++-
 drivers/iommu/amd_iommu.c   | 7 ---
 drivers/iommu/arm-smmu-v3.c | 2 +-
 drivers/iommu/arm-smmu.c| 2 +-
 drivers/iommu/intel-iommu.c | 4 ++--
 drivers/iommu/iommu.c   | 7 ---
 include/linux/iommu.h   | 2 +-
 7 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index adbf7cbedf80..20b56ae91513 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -868,7 +868,8 @@ int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head)
struct iommu_resv_region *region;
 
region = iommu_alloc_resv_region(base + SZ_64K, SZ_64K,
-prot, IOMMU_RESV_MSI);
+prot, IOMMU_RESV_MSI,
+GFP_KERNEL);
if (region) {
list_add_tail(&region->list, head);
resv++;
diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index f7cdd2ab7f11..a9aab13a9487 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -3186,7 +3186,8 @@ static void amd_iommu_get_resv_regions(struct device *dev,
type = IOMMU_RESV_RESERVED;
 
region = iommu_alloc_resv_region(entry->address_start,
-length, prot, type);
+length, prot, type,
+GFP_KERNEL);
if (!region) {
dev_err(dev, "Out of memory allocating dm-regions\n");
return;
@@ -3196,14 +3197,14 @@ static void amd_iommu_get_resv_regions(struct device *dev,
 
region = iommu_alloc_resv_region(MSI_RANGE_START,
 MSI_RANGE_END - MSI_RANGE_START + 1,
-0, IOMMU_RESV_MSI);
+0, IOMMU_RESV_MSI, GFP_KERNEL);
if (!region)
return;
list_add_tail(&region->list, head);
 
region = iommu_alloc_resv_region(HT_RANGE_START,
 HT_RANGE_END - HT_RANGE_START + 1,
-0, IOMMU_RESV_RESERVED);
+0, IOMMU_RESV_RESERVED, GFP_KERNEL);
if (!region)
return;
list_add_tail(&region->list, head);
diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index d3880010c6cf..5aae50c811b3 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -2031,7 +2031,7 @@ static void arm_smmu_get_resv_regions(struct device *dev,
int prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;
 
region = iommu_alloc_resv_region(MSI_IOVA_BASE, MSI_IOVA_LENGTH,
-prot, IOMMU_RESV_SW_MSI);
+prot, IOMMU_RESV_SW_MSI, GFP_KERNEL);
if (!region)
return;
 
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 045d93884164..3c28bc0555d4 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1667,7 +1667,7 @@ static void arm_smmu_get_resv_regions(struct device *dev,
int prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;
 
region = iommu_alloc_resv_region(MSI_IOVA_BASE, MSI_IOVA_LENGTH,
-prot, IOMMU_RESV_SW_MSI);
+prot, IOMMU_RESV_SW_MSI, GFP_KERNEL);
if (!region)
return;
 
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 28cb713d728c..2075abdb174d 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -4221,7 +4221,7 @@ int __init dmar_parse_one_rmrr(struct acpi_dmar_header *header, void *arg)
 
length = rmrr->end_address - rmrr->base_address + 1;
rmrru->resv = iommu_alloc_resv_region(rmrr->base_address, length, prot,
- IOMMU_RESV_DIRECT);
+ IOMMU_RESV_DIRECT, GFP_KERNEL);
if (!rmrru->resv)
goto free_rmrru;
 
@@ -5290,7 +5290,7 @@ static void intel_iommu_get_resv_regions(struct device *device,
 
reg = iommu_alloc_resv_region(IOAPIC_RANGE_START,
  IOAPIC_RANGE_END - IOAPIC_RANGE_START + 1,
- 0, IOMMU_RESV_MSI);
+ 0, IOMMU_RESV_MSI, GFP_KERNEL);
if (!reg)
return;
list_add_tail(&reg->list, head);

[PATCH 4/4] iommu/vt-d: Handle PCI bridge RMRR device scopes in intel_iommu_get_resv_regions

2019-05-13 Thread Eric Auger
In case the RMRR device scope is a PCI-PCI bridge, let's check whether
the device belongs to that bridge's PCI sub-hierarchy.
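
The check relies on the is_downstream_to_pci_bridge() helper, presumably
introduced by an earlier patch. A sketch of what such a helper might look
like, assuming it tests whether the candidate device's bus falls within the
bridge's secondary bus range:

        static bool is_downstream_to_pci_bridge(struct device *dev,
                                                struct device *bridge)
        {
                struct pci_dev *pdev, *pbridge;

                if (!dev_is_pci(dev) || !dev_is_pci(bridge))
                        return false;

                pdev = to_pci_dev(dev);
                pbridge = to_pci_dev(bridge);

                /* is dev's bus inside the bus range behind the bridge? */
                if (pbridge->subordinate &&
                    pbridge->subordinate->number <= pdev->bus->number &&
                    pbridge->subordinate->busn_res.end >= pdev->bus->number)
                        return true;

                return false;
        }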

Fixes: 0659b8dc45a6 ("iommu/vt-d: Implement reserved region get/put callbacks")

Signed-off-by: Eric Auger 
---
 drivers/iommu/intel-iommu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 89d82a1d50b1..9c1a765eca8b 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5289,7 +5289,8 @@ static void intel_iommu_get_resv_regions(struct device *device,
struct iommu_resv_region *resv;
size_t length;
 
-   if (i_dev != device)
+   if (i_dev != device &&
+   !is_downstream_to_pci_bridge(device, i_dev))
continue;
 
length = rmrr->end_address - rmrr->base_address + 1;
-- 
2.20.1



RE: [ARM SMMU] Dynamic StreamID allocation

2019-05-13 Thread Pankaj Bansal
Hi Jean,

> -----Original Message-----
> From: Jean-Philippe Brucker 
> Sent: Friday, 10 May, 2019 07:07 PM
> To: Pankaj Bansal ; Will Deacon
> ; Robin Murphy ; Joerg
> Roedel 
> Cc: iommu@lists.linux-foundation.org; Varun Sethi ; linux-
> arm-ker...@lists.infradead.org; Nipun Gupta 
> Subject: Re: [ARM SMMU] Dynamic StreamID allocation
> 
> On 10/05/2019 13:33, Pankaj Bansal wrote:
> > Hi Will/Robin/Joerg,
> >
> > I am s/w engineer from NXP India Pvt. Ltd.
> > We are using SMMU-V3 in one of NXP SOC.
> > I have a question about the SMMU Stream ID allocation in linux.
> >
> > Right now the Stream IDs allocated to a device are mapped via device tree to
> > the device.
> > https://elixir.bootlin.com/linux/latest/source/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt#L39
> >
> > As the device tree is passed from bootloader to linux, we detect all the
> > stream IDs needed by a device in bootloader and add their IDs in respective
> > device nodes.
> > For each PCIE Endpoint (a unique BDF (Bus Device Function)) on PCIE bus, we
> > are assigning a unique Stream ID in bootloader.
> >
> > However, this poses an issue with PCIE hot plug.
> > If we plug in a pcie device while linux is running, a unique BDF is
> > assigned to the device, for which there is no stream ID in device tree.
> >
> > How can this problem be solved in linux?
> 
> Assuming the streamID associated to a BDF is predictable (streamID = BDF
> + constant), using the iommu-map property should just work:
> 
> https://elixir.bootlin.com/linux/latest/source/Documentation/devicetree/bindings/pci/pci-iommu.txt
> 
> It describes the streamIDs of all possible BDFs, including hotplugged 
> functions.

You mean that we should increase the "length" parameter (in the
(rid-base, iommu, iommu-base, length) tuple)?
This would cater to any *new* Bus Device Function being detected on the PCIe bus?
Is that right?
Right now, when we build the iommu-map in the bootloader, we emit one entry per BDF:
https://elixir.bootlin.com/u-boot/latest/source/drivers/pci/pcie_layerscape_fixup.c#L168
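
For illustration, the translation an iommu-map entry describes is just a
fixed offset over a RID window, so a single wide entry also covers BDFs that
only appear after hotplug (hypothetical helper mirroring the pci-iommu.txt
binding; not an existing kernel function):

        /*
         * An entry (rid-base, iommu, iommu-base, length) maps every RID in
         * [rid_base, rid_base + length) to iommu_base + (rid - rid_base).
         */
        static bool rid_to_streamid(u32 rid, u32 rid_base, u32 iommu_base,
                                    u32 length, u32 *streamid)
        {
                if (rid < rid_base || rid >= rid_base + length)
                        return false;   /* not covered by this entry */

                *streamid = iommu_base + (rid - rid_base);
                return true;
        }

With a length spanning the whole RID space, any BDF enumerated later already
has a stream ID without touching the device tree again.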

But isn't the better approach to make it dynamic in Linux?
I.e. as soon as a new device is detected, a "requester id" is allocated to it
from an available pool.
When the device is removed, the "requester id" is returned to the pool.
Is there any h/w limitation which prevents this?

Regards,
Pankaj Bansal
> 
> Thanks,
> Jean
> 
> >
> > Is there a way to assign (and revoke) stream IDs at run time?
> >
> > Regards,
> > Pankaj Bansal
> >
> >



Re: [PATCH v3 02/10] swiotlb: Factor out slot allocation and free

2019-05-13 Thread Christoph Hellwig
On Mon, May 06, 2019 at 09:54:30AM +0800, Lu Baolu wrote:
> Agreed. I will prepare the next version simply without the optimization, so 
> the offset is not required.
>
> For your changes in swiotlb, will you formalize them in patches or want
> me to do this?

Please do it yourself given that you still need the offset and thus a
rework of the patches anyway.