[PATCH AUTOSEL 5.4 13/46] dma-mapping: Fix dma_pgprot() for unencrypted coherent pages

2020-04-09 Thread Sasha Levin
From: Thomas Hellstrom 

[ Upstream commit 17c4a2ae15a7aaefe84bdb271952678c5c9cd8e1 ]

When dma_mmap_coherent() sets up a mapping to unencrypted coherent memory
under SEV encryption and sometimes under SME encryption, it will actually
set up an encrypted mapping rather than an unencrypted, causing devices
that DMAs from that memory to read encrypted contents. Fix this.

When force_dma_unencrypted() returns true, the linear kernel map of the
coherent pages have had the encryption bit explicitly cleared and the
page content is unencrypted. Make sure that any additional PTEs we set
up to these pages also have the encryption bit cleared by having
dma_pgprot() return a protection with the encryption bit cleared in this
case.

Signed-off-by: Thomas Hellstrom 
Signed-off-by: Borislav Petkov 
Reviewed-by: Christoph Hellwig 
Acked-by: Tom Lendacky 
Link: https://lkml.kernel.org/r/20200304114527.3636-3-thomas...@shipmail.org
Signed-off-by: Sasha Levin 
---
 kernel/dma/mapping.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index d9334f31a5afb..8682a5305cb36 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -169,6 +169,8 @@ EXPORT_SYMBOL(dma_get_sgtable_attrs);
  */
 pgprot_t dma_pgprot(struct device *dev, pgprot_t prot, unsigned long attrs)
 {
+   if (force_dma_unencrypted(dev))
+   prot = pgprot_decrypted(prot);
if (dev_is_dma_coherent(dev) ||
(IS_ENABLED(CONFIG_DMA_NONCOHERENT_CACHE_SYNC) &&
  (attrs & DMA_ATTR_NON_CONSISTENT)))
-- 
2.20.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to userspace

2020-04-09 Thread Liu, Yi L
Hi Eric,

> From: Auger Eric 
> Sent: Friday, April 10, 2020 11:28 AM
> To: Liu, Yi L ; Jean-Philippe Brucker  Subject: Re: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to
> userspace
> 
> Hi Yi,
> 
> On 4/9/20 2:47 PM, Liu, Yi L wrote:
> > Hi Jean,
> >
> >> From: Jean-Philippe Brucker 
> >> Sent: Thursday, April 9, 2020 4:15 PM
> >> Subject: Re: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1
> >> format to userspace
> >>
> >> On Wed, Apr 08, 2020 at 12:27:58PM +0200, Auger Eric wrote:
> >>> Hi Yi,
> >>>
> >>> On 4/7/20 11:43 AM, Liu, Yi L wrote:
>  Hi Jean,
> 
> > From: Jean-Philippe Brucker 
> > Sent: Friday, April 3, 2020 4:23 PM
> > To: Auger Eric  userspace
> >
> > On Wed, Apr 01, 2020 at 03:01:12PM +0200, Auger Eric wrote:
> > header = vfio_info_cap_add(caps, sizeof(*nesting_cap),
> >
> >> VFIO_IOMMU_TYPE1_INFO_CAP_NESTING, 1);
> > @@ -2254,6 +2309,7
> > @@ static int vfio_iommu_info_add_nesting_cap(struct
>  vfio_iommu *iommu,
> > /* nesting iommu type supports PASID requests
> (alloc/free)
> >> */
> > nesting_cap->nesting_capabilities |=
> >> VFIO_IOMMU_PASID_REQS;
>  What is the meaning for ARM?
> >>>
> >>> I think it's just a software capability exposed to userspace, on
> >>> userspace side, it has a choice to use it or not. :-) The reason
> >>> define it and report it in cap nesting is that I'd like to make
> >>> the pasid alloc/free be available just for IOMMU with type
> >>> VFIO_IOMMU_TYPE1_NESTING. Please feel free tell me if it is not
> >>> good for ARM. We can find a proper way to report the availability.
> >>
> >> Well it is more a question for jean-Philippe. Do we have a system
> >> wide PASID allocation on ARM?
> >
> > We don't, the PASID spaces are per-VM on Arm, so this function
> > should consult
> >> the
> > IOMMU driver before setting flags. As you said on patch 3, nested
> > doesn't necessarily imply PASID support. The SMMUv2 does not
> > support PASID but does support nesting stages 1 and 2 for the IOVA 
> > space.
> > SMMUv3 support of PASID depends on HW capabilities. So I think
> > this needs to
> >> be
> > finer grained:
> >
> > Does the container support:
> > * VFIO_IOMMU_PASID_REQUEST?
> >   -> Yes for VT-d 3
> >   -> No for Arm SMMU
> > * VFIO_IOMMU_{,UN}BIND_GUEST_PGTBL?
> >   -> Yes for VT-d 3
> >   -> Sometimes for SMMUv2
> >   -> No for SMMUv3 (if we go with BIND_PASID_TABLE, which is simpler
> due to
> >  PASID tables being in GPA space.)
> > * VFIO_IOMMU_BIND_PASID_TABLE?
> >   -> No for VT-d
> >   -> Sometimes for SMMUv3
> >
> > Any bind support implies VFIO_IOMMU_CACHE_INVALIDATE support.
> 
>  good summary. do you expect to see any
> 
> >
> > +   nesting_cap->stage1_formats = formats;
>  as spotted by Kevin, since a single format is supported, rename
> >>>
> >>> ok, I was believing it may be possible on ARM or so. :-) will
> >>> rename it.
> >
> > Yes I don't think an u32 is going to cut it for Arm :( We need to
> > describe all sorts
> >> of
> > capabilities for page and PASID tables (granules, GPA size,
> > ASID/PASID size, HW access/dirty, etc etc.) Just saying "Arm
> > stage-1 format" wouldn't mean much. I guess we could have a secondary
> vendor capability for these?
> 
>  Actually, I'm wondering if we can define some formats to stands for
>  a set of capabilities. e.g. VTD_STAGE1_FORMAT_V1 which may
>  indicates the 1st level page table related caps (aw, a/d, SRE, EA
>  and etc.). And vIOMMU can parse the capabilities.
> >>>
> >>> But eventually do we really need all those capability getters? I
> >>> mean can't we simply rely on the actual call to
> >>> VFIO_IOMMU_BIND_GUEST_PGTBL() to detect any mismatch? Definitively
> >>> the error handling may be heavier on userspace but can't we manage.
> >>
> >> I think we need to present these capabilities at boot time, long
> >> before the guest triggers a bind(). For example if the host SMMU
> >> doesn't support 16-bit ASID, we need to communicate that to the guest
> >> using vSMMU ID registers or PROBE properties. Otherwise a bind() will
> >> succeed, but if the guest uses 16-bit ASIDs in its CD, DMA will
> >> result in C_BAD_CD events which we'll inject into the guest, for no
> >> apparent reason from their perspective.
> >>
> >> In addition some VMMs may have fallbacks if shared page tables are
> >> not available. They could fall back to a MAP/UNMAP interface, or
> >> simply not present a vIOMMU to the guest.
> >>
> >
> > Based on the comments, I think it would be a need to report iommu caps
> > in detail. So I guess iommu uapi needs to provide something alike vfio
> > cap chain in iommu uapi. Please feel free let me know your thoughts.
> > :-)
> 
> 

Re: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to userspace

2020-04-09 Thread Auger Eric
Hi Yi,

On 4/9/20 2:47 PM, Liu, Yi L wrote:
> Hi Jean,
> 
>> From: Jean-Philippe Brucker 
>> Sent: Thursday, April 9, 2020 4:15 PM
>> Subject: Re: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to
>> userspace
>>
>> On Wed, Apr 08, 2020 at 12:27:58PM +0200, Auger Eric wrote:
>>> Hi Yi,
>>>
>>> On 4/7/20 11:43 AM, Liu, Yi L wrote:
 Hi Jean,

> From: Jean-Philippe Brucker 
> Sent: Friday, April 3, 2020 4:23 PM
> To: Auger Eric 
> userspace
>
> On Wed, Apr 01, 2020 at 03:01:12PM +0200, Auger Eric wrote:
>   header = vfio_info_cap_add(caps, sizeof(*nesting_cap),
>
>> VFIO_IOMMU_TYPE1_INFO_CAP_NESTING, 1);
> @@ -2254,6 +2309,7
> @@ static int vfio_iommu_info_add_nesting_cap(struct
 vfio_iommu *iommu,
>   /* nesting iommu type supports PASID requests 
> (alloc/free)
>> */
>   nesting_cap->nesting_capabilities |=
>> VFIO_IOMMU_PASID_REQS;
 What is the meaning for ARM?
>>>
>>> I think it's just a software capability exposed to userspace, on
>>> userspace side, it has a choice to use it or not. :-) The reason
>>> define it and report it in cap nesting is that I'd like to make the
>>> pasid alloc/free be available just for IOMMU with type
>>> VFIO_IOMMU_TYPE1_NESTING. Please feel free tell me if it is not good
>>> for ARM. We can find a proper way to report the availability.
>>
>> Well it is more a question for jean-Philippe. Do we have a system wide
>> PASID allocation on ARM?
>
> We don't, the PASID spaces are per-VM on Arm, so this function should 
> consult
>> the
> IOMMU driver before setting flags. As you said on patch 3, nested doesn't
> necessarily imply PASID support. The SMMUv2 does not support PASID but 
> does
> support nesting stages 1 and 2 for the IOVA space.
> SMMUv3 support of PASID depends on HW capabilities. So I think this needs 
> to
>> be
> finer grained:
>
> Does the container support:
> * VFIO_IOMMU_PASID_REQUEST?
>   -> Yes for VT-d 3
>   -> No for Arm SMMU
> * VFIO_IOMMU_{,UN}BIND_GUEST_PGTBL?
>   -> Yes for VT-d 3
>   -> Sometimes for SMMUv2
>   -> No for SMMUv3 (if we go with BIND_PASID_TABLE, which is simpler due 
> to
>  PASID tables being in GPA space.)
> * VFIO_IOMMU_BIND_PASID_TABLE?
>   -> No for VT-d
>   -> Sometimes for SMMUv3
>
> Any bind support implies VFIO_IOMMU_CACHE_INVALIDATE support.

 good summary. do you expect to see any

>
> + nesting_cap->stage1_formats = formats;
 as spotted by Kevin, since a single format is supported, rename
>>>
>>> ok, I was believing it may be possible on ARM or so. :-) will rename
>>> it.
>
> Yes I don't think an u32 is going to cut it for Arm :( We need to 
> describe all sorts
>> of
> capabilities for page and PASID tables (granules, GPA size, ASID/PASID 
> size, HW
> access/dirty, etc etc.) Just saying "Arm stage-1 format" wouldn't mean 
> much. I
> guess we could have a secondary vendor capability for these?

 Actually, I'm wondering if we can define some formats to stands for a set 
 of
 capabilities. e.g. VTD_STAGE1_FORMAT_V1 which may indicates the 1st level
 page table related caps (aw, a/d, SRE, EA and etc.). And vIOMMU can parse
 the capabilities.
>>>
>>> But eventually do we really need all those capability getters? I mean
>>> can't we simply rely on the actual call to VFIO_IOMMU_BIND_GUEST_PGTBL()
>>> to detect any mismatch? Definitively the error handling may be heavier
>>> on userspace but can't we manage.
>>
>> I think we need to present these capabilities at boot time, long before
>> the guest triggers a bind(). For example if the host SMMU doesn't support
>> 16-bit ASID, we need to communicate that to the guest using vSMMU ID
>> registers or PROBE properties. Otherwise a bind() will succeed, but if the
>> guest uses 16-bit ASIDs in its CD, DMA will result in C_BAD_CD events
>> which we'll inject into the guest, for no apparent reason from their
>> perspective.
>>
>> In addition some VMMs may have fallbacks if shared page tables are not
>> available. They could fall back to a MAP/UNMAP interface, or simply not
>> present a vIOMMU to the guest.
>>
> 
> Based on the comments, I think it would be a need to report iommu caps
> in detail. So I guess iommu uapi needs to provide something alike vfio
> cap chain in iommu uapi. Please feel free let me know your thoughts. :-)

Yes to me it sounds sensible.
> 
> In vfio, we can define a cap as below:
> 
> struct vfio_iommu_type1_info_cap_nesting {
>   struct  vfio_info_cap_header header;
>   __u64   iommu_model;
> #define VFIO_IOMMU_PASID_REQS (1 << 0)
I still think the name shall be changed
> #define VFIO_IOMMU_BIND_GPASID(1 << 1)
> #define VFIO_IOMMU_CACHE_INV  (1 

Re: [PATCH 10/28] mm: only allow page table mappings for built-in zsmalloc

2020-04-09 Thread Sergey Senozhatsky
On (20/04/09 10:08), Minchan Kim wrote:
> > > Even though I don't know how many usecase we have using zsmalloc as
> > > module(I heard only once by dumb reason), it could affect existing
> > > users. Thus, please include concrete explanation in the patch to
> > > justify when the complain occurs.
> > 
> > The justification is 'we can unexport functions that have no sane reason
> > of being exported in the first place'.
> > 
> > The Changelog pretty much says that.
> 
> Okay, I hope there is no affected user since this patch.
> If there are someone, they need to provide sane reason why they want
> to have zsmalloc as module.

I'm one of those who use zsmalloc as a module - mainly because I use zram
as a compressing general purpose block device, not as a swap device.
I create zram0, mkfs, mount, checkout and compile code, once done -
umount, rmmod. This reduces the number of writes to SSD. Some people use
tmpfs, but zram device(-s) can be much larger in size. That's a niche use
case and I'm not against the patch.

-ss
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/1] iommu/vt-d: use DMA domain for real DMA devices and subdevices

2020-04-09 Thread Lu Baolu

Hi,

On 2020/4/10 3:17, Jon Derrick wrote:

The PCI devices handled by intel-iommu may have a DMA requester on another bus,
such as VMD subdevices needing to use the VMD endpoint.

The real DMA device is now used for the DMA mapping, but one case was missed
earlier: if the VMD device (and hence subdevices too) are under
IOMMU_DOMAIN_IDENTITY, mappings do not work.

Codepaths like intel_map_page() handle the IOMMU_DOMAIN_DMA case by creating an
iommu DMA mapping, and fall back on dma_direct_map_page() for the
IOMMU_DOMAIN_IDENTITY case. However, handling of the IDENTITY case is broken
when intel_page_page() handles a subdevice.

We observe that at iommu attach time, dmar_insert_one_dev_info() for the
subdevices will never set dev->archdata.iommu. This is because that function


Do you mind telling why not setting this?


uses find_domain() to check if there is already an IOMMU for the device, and
find_domain() then defers to the real DMA device which does have one. Thus
dmar_insert_one_dev_info() returns without assigning dev->archdata.iommu.

Then, later:

1. intel_map_page() checks if an IOMMU mapping is needed by calling
iommu_need_mapping() on the subdevice. identity_mapping() returns
false because dev->archdata.iommu is NULL, so this function
returns false indicating that mapping is needed.
2. __intel_map_single() is called to create the mapping.
3. __intel_map_single() calls find_domain(). This function now returns
the IDENTITY domain corresponding to the real DMA device.
4. __intel_map_single() calls domain_get_iommu() on this "real" domain.
A failure is hit and the entire operation is aborted, because this
codepath is not intended to handle IDENTITY mappings:
if (WARN_ON(domain->domain.type != IOMMU_DOMAIN_DMA))
return NULL;


This is caused by the fragile private domain implementation. We are in
process of removing it by enhancing the iommu subsystem with per-group
default domain.

https://www.spinics.net/lists/iommu/msg42976.html

So ultimately VMD subdevices should have their own per-device iommu data
and support per-device dma ops.

Best regards,
baolu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v6 3/5] drm/msm: Attach the IOMMU device during initialization

2020-04-09 Thread Jordan Crouse
Everywhere an IOMMU object is created by msm_gpu_create_address_space
the IOMMU device is attached immediately after. Instead of carrying around
the infrastructure to do the attach from the device specific code do it
directly in the msm_iommu_init() function. This gets it out of the way for
more aggressive cleanups that follow.

Reviewed-by: Rob Clark 
Signed-off-by: Jordan Crouse 
---

 drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c  |  8 
 drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c |  4 
 drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c |  7 ---
 drivers/gpu/drm/msm/msm_gem_vma.c| 23 +++
 drivers/gpu/drm/msm/msm_gpu.c| 11 +--
 drivers/gpu/drm/msm/msm_gpummu.c |  6 --
 drivers/gpu/drm/msm/msm_iommu.c  | 15 +++
 drivers/gpu/drm/msm/msm_mmu.h|  1 -
 8 files changed, 27 insertions(+), 48 deletions(-)

diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
index ce19f1d39367..6629a142574e 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
@@ -772,7 +772,6 @@ static int _dpu_kms_mmu_init(struct dpu_kms *dpu_kms)
 {
struct iommu_domain *domain;
struct msm_gem_address_space *aspace;
-   int ret;
 
domain = iommu_domain_alloc(_bus_type);
if (!domain)
@@ -788,13 +787,6 @@ static int _dpu_kms_mmu_init(struct dpu_kms *dpu_kms)
return PTR_ERR(aspace);
}
 
-   ret = aspace->mmu->funcs->attach(aspace->mmu);
-   if (ret) {
-   DPU_ERROR("failed to attach iommu %d\n", ret);
-   msm_gem_address_space_put(aspace);
-   return ret;
-   }
-
dpu_kms->base.aspace = aspace;
return 0;
 }
diff --git a/drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c 
b/drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c
index dda05436f716..9dba37c6344f 100644
--- a/drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c
+++ b/drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c
@@ -518,10 +518,6 @@ struct msm_kms *mdp4_kms_init(struct drm_device *dev)
}
 
kms->aspace = aspace;
-
-   ret = aspace->mmu->funcs->attach(aspace->mmu);
-   if (ret)
-   goto fail;
} else {
DRM_DEV_INFO(dev->dev, "no iommu, fallback to phys "
"contig buffers for scanout\n");
diff --git a/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c 
b/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c
index 47b989834af1..1e9ba99fd9eb 100644
--- a/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c
+++ b/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c
@@ -644,13 +644,6 @@ struct msm_kms *mdp5_kms_init(struct drm_device *dev)
}
 
kms->aspace = aspace;
-
-   ret = aspace->mmu->funcs->attach(aspace->mmu);
-   if (ret) {
-   DRM_DEV_ERROR(>dev, "failed to attach iommu: 
%d\n",
-   ret);
-   goto fail;
-   }
} else {
DRM_DEV_INFO(>dev,
 "no iommu, fallback to phys contig buffers for 
scanout\n");
diff --git a/drivers/gpu/drm/msm/msm_gem_vma.c 
b/drivers/gpu/drm/msm/msm_gem_vma.c
index 1af5354bcd46..91d993a16850 100644
--- a/drivers/gpu/drm/msm/msm_gem_vma.c
+++ b/drivers/gpu/drm/msm/msm_gem_vma.c
@@ -131,8 +131,8 @@ msm_gem_address_space_create(struct device *dev, struct 
iommu_domain *domain,
const char *name)
 {
struct msm_gem_address_space *aspace;
-   u64 size = domain->geometry.aperture_end -
-   domain->geometry.aperture_start;
+   u64 start = domain->geometry.aperture_start;
+   u64 size = domain->geometry.aperture_end - start;
 
aspace = kzalloc(sizeof(*aspace), GFP_KERNEL);
if (!aspace)
@@ -141,9 +141,18 @@ msm_gem_address_space_create(struct device *dev, struct 
iommu_domain *domain,
spin_lock_init(>lock);
aspace->name = name;
aspace->mmu = msm_iommu_new(dev, domain);
+   if (IS_ERR(aspace->mmu)) {
+   int ret = PTR_ERR(aspace->mmu);
 
-   drm_mm_init(>mm, (domain->geometry.aperture_start >> 
PAGE_SHIFT),
-   size >> PAGE_SHIFT);
+   kfree(aspace);
+   return ERR_PTR(ret);
+   }
+
+   /*
+* Attaching the IOMMU device changes the aperture values so use the
+* cached values instead
+*/
+   drm_mm_init(>mm, start >> PAGE_SHIFT, size >> PAGE_SHIFT);
 
kref_init(>kref);
 
@@ -164,6 +173,12 @@ msm_gem_address_space_create_a2xx(struct device *dev, 
struct msm_gpu *gpu,
spin_lock_init(>lock);
aspace->name = name;
aspace->mmu = msm_gpummu_new(dev, gpu);
+   if (IS_ERR(aspace->mmu)) {
+   int ret = PTR_ERR(aspace->mmu);
+
+   kfree(aspace);
+   return ERR_PTR(ret);
+   }
 
drm_mm_init(>mm, (va_start >> 

[PATCH v6 2/5] iommu/arm-smmu: Add support for TTBR1

2020-04-09 Thread Jordan Crouse
Add support to enable TTBR1 if the domain requests it via the
DOMAIN_ATTR_SPLIT_TABLES attribute. If enabled by the hardware
and pagetable configuration the driver will configure the TTBR1 region
and program the domain pagetable on TTBR1. TTBR0 will be disabled.

After attaching the device the value of he domain attribute can
be queried to see if the split pagetables were successfully programmed.
The domain geometry will be updated as well so that the caller can
determine the active region for the pagetable that was programmed.

Signed-off-by: Jordan Crouse 
---

 drivers/iommu/arm-smmu.c | 48 ++--
 drivers/iommu/arm-smmu.h | 24 +++-
 2 files changed, 59 insertions(+), 13 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index a6a5796e9c41..db6d503c1673 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -555,11 +555,16 @@ static void arm_smmu_init_context_bank(struct 
arm_smmu_domain *smmu_domain,
cb->ttbr[0] = pgtbl_cfg->arm_v7s_cfg.ttbr;
cb->ttbr[1] = 0;
} else {
-   cb->ttbr[0] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
-   cb->ttbr[0] |= FIELD_PREP(ARM_SMMU_TTBRn_ASID,
- cfg->asid);
-   cb->ttbr[1] = FIELD_PREP(ARM_SMMU_TTBRn_ASID,
-cfg->asid);
+   cb->ttbr[0] = FIELD_PREP(ARM_SMMU_TTBRn_ASID,
+   cfg->asid);
+
+   if (pgtbl_cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1) {
+   cb->ttbr[1] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
+   } else {
+   cb->ttbr[0] |= pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
+   cb->ttbr[1] = FIELD_PREP(ARM_SMMU_TTBRn_ASID,
+cfg->asid);
+   }
}
} else {
cb->ttbr[0] = pgtbl_cfg->arm_lpae_s2_cfg.vttbr;
@@ -673,6 +678,7 @@ static int arm_smmu_init_domain_context(struct iommu_domain 
*domain,
enum io_pgtable_fmt fmt;
struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
struct arm_smmu_cfg *cfg = _domain->cfg;
+   unsigned long quirks = 0;
 
mutex_lock(_domain->init_mutex);
if (smmu_domain->smmu)
@@ -741,6 +747,14 @@ static int arm_smmu_init_domain_context(struct 
iommu_domain *domain,
oas = smmu->ipa_size;
if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH64) {
fmt = ARM_64_LPAE_S1;
+
+   /*
+* We are assuming that split pagetables will always use
+* SEP_UPSTREAM so we don't need to reduce the size of
+* the ias to account for the sign extension bit
+*/
+   if (smmu_domain->split_pagetables)
+   quirks |= IO_PGTABLE_QUIRK_ARM_TTBR1;
} else if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH32_L) {
fmt = ARM_32_LPAE_S1;
ias = min(ias, 32UL);
@@ -810,6 +824,7 @@ static int arm_smmu_init_domain_context(struct iommu_domain 
*domain,
.coherent_walk  = smmu->features & ARM_SMMU_FEAT_COHERENT_WALK,
.tlb= smmu_domain->flush_ops,
.iommu_dev  = smmu->dev,
+   .quirks = quirks,
};
 
if (smmu_domain->non_strict)
@@ -823,8 +838,16 @@ static int arm_smmu_init_domain_context(struct 
iommu_domain *domain,
 
/* Update the domain's page sizes to reflect the page table format */
domain->pgsize_bitmap = pgtbl_cfg.pgsize_bitmap;
-   domain->geometry.aperture_end = (1UL << ias) - 1;
-   domain->geometry.force_aperture = true;
+
+   if (pgtbl_cfg.quirks & IO_PGTABLE_QUIRK_ARM_TTBR1) {
+   domain->geometry.aperture_start = ~0UL << ias;
+   domain->geometry.aperture_end = ~0UL;
+   domain->geometry.force_aperture = true;
+   } else {
+   domain->geometry.aperture_end = (1UL << ias) - 1;
+   domain->geometry.force_aperture = true;
+   smmu_domain->split_pagetables = false;
+   }
 
/* Initialise the context bank with our page table cfg */
arm_smmu_init_context_bank(smmu_domain, _cfg);
@@ -1526,6 +1549,9 @@ static int arm_smmu_domain_get_attr(struct iommu_domain 
*domain,
case DOMAIN_ATTR_NESTING:
*(int *)data = (smmu_domain->stage == 
ARM_SMMU_DOMAIN_NESTED);
return 0;
+   case DOMAIN_ATTR_SPLIT_TABLES:
+   *(int *)data = smmu_domain->split_pagetables;
+   return 0;
default:
 

[PATCH v6 0/5] iommu/arm-smmu: Split pagetable support for arm-smmu-v2

2020-04-09 Thread Jordan Crouse
This is another iteration for the split pagetable support based on the
suggestions from Robin and Will [1].

Background: In order to support per-context pagetables the GPU needs to enable
split tables so that we can store global buffers in the TTBR1 space leaving the
GPU free to program the TTBR0 register with the address of a context specific
pagetable.

If the DOMAIN_ATTR_SPLIT_TABLES attribute is set on the domain before attaching,
the context bank assigned to the domain will be programmed to allow translations
in the TTBR1 space. Translations in the TTBR0 region will be disallowed because,
as Robin pointe out, having a un-programmed TTBR0 register is dangerous.

The driver can determine if TTBR1 was successfully programmed by querying
DOMAIN_ATTR_SPLIT_TABLES after attaching. The domain geometry will also be
updated to reflect the virtual address space for the TTBR1 range.

Upcoming changes will allow auxiliary domains to be attached to the device which
will enable and program TTBR0.

This patchset is based on top of linux-next-20200409

Change log:

v6: Cleanups for the arm-smmu TTBR1 patch from Will Deacon
v4: Only program TTBR1 when split pagetables are requested. TTBR0 will be
enabled later when an auxiliary domain is attached
v3: Remove the implementation specific and make split pagetable support
part of the generic configuration

[1] https://lists.linuxfoundation.org/pipermail/iommu/2020-January/041373.html


Jordan Crouse (5):
  iommu: Add DOMAIN_ATTR_SPLIT_TABLES
  iommu/arm-smmu: Add support for TTBR1
  drm/msm: Attach the IOMMU device during initialization
  drm/msm: Refactor address space initialization
  drm/msm/a6xx: Support split pagetables

 drivers/gpu/drm/msm/adreno/a2xx_gpu.c| 16 
 drivers/gpu/drm/msm/adreno/a3xx_gpu.c|  1 +
 drivers/gpu/drm/msm/adreno/a4xx_gpu.c|  1 +
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c|  1 +
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c| 51 
 drivers/gpu/drm/msm/adreno/adreno_gpu.c  | 23 ---
 drivers/gpu/drm/msm/adreno/adreno_gpu.h  |  8 
 drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c  | 18 +++--
 drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c | 18 -
 drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.c |  4 --
 drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c | 18 -
 drivers/gpu/drm/msm/msm_drv.h|  8 +---
 drivers/gpu/drm/msm/msm_gem_vma.c| 36 +++--
 drivers/gpu/drm/msm/msm_gpu.c| 49 +--
 drivers/gpu/drm/msm/msm_gpu.h|  4 +-
 drivers/gpu/drm/msm/msm_gpummu.c |  6 ---
 drivers/gpu/drm/msm/msm_iommu.c  | 18 +
 drivers/gpu/drm/msm/msm_mmu.h|  1 -
 drivers/iommu/arm-smmu.c | 48 ++
 drivers/iommu/arm-smmu.h | 24 ---
 include/linux/iommu.h|  2 +
 21 files changed, 200 insertions(+), 155 deletions(-)

-- 
2.17.1
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v6 5/5] drm/msm/a6xx: Support split pagetables

2020-04-09 Thread Jordan Crouse
Attempt to enable split pagetables if the arm-smmu driver supports it.
This will move the default address space from the default region to
the address range assigned to TTBR1. The behavior should be transparent
to the driver for now but it gets the default buffers out of the way
when we want to start swapping TTBR0 for context-specific pagetables.

Signed-off-by: Jordan Crouse 
---

 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 52 ++-
 1 file changed, 51 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 02ade43d6335..b27daa77723c 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -825,6 +825,56 @@ static unsigned long a6xx_gpu_busy(struct msm_gpu *gpu)
return (unsigned long)busy_time;
 }
 
+static struct msm_gem_address_space *
+a6xx_create_address_space(struct msm_gpu *gpu, struct platform_device *pdev)
+{
+   struct iommu_domain *iommu = iommu_domain_alloc(_bus_type);
+   struct msm_gem_address_space *aspace;
+   struct msm_mmu *mmu;
+   u64 start, size;
+   u32 val = 1;
+   int ret;
+
+   if (!iommu)
+   return ERR_PTR(-ENOMEM);
+
+   /*
+* Try to request split pagetables - the request has to be made before
+* the domian is attached
+*/
+   iommu_domain_set_attr(iommu, DOMAIN_ATTR_SPLIT_TABLES, );
+
+   mmu = msm_iommu_new(>dev, iommu);
+   if (IS_ERR(mmu)) {
+   iommu_domain_free(iommu);
+   return ERR_CAST(mmu);
+   }
+
+   /*
+* After the domain is attached, see if the split tables were actually
+* successful.
+*/
+   ret = iommu_domain_get_attr(iommu, DOMAIN_ATTR_SPLIT_TABLES, );
+   if (!ret && val) {
+   /*
+* The aperture start will be at the beginning of the TTBR1
+* space so use that as a base
+*/
+   start = iommu->geometry.aperture_start;
+   size = 0x;
+   } else {
+   /* Otherwise use the legacy 32 bit region */
+   start = SZ_16M;
+   size = 0x - SZ_16M;
+   }
+
+   aspace = msm_gem_address_space_create(mmu, "gpu", start, size);
+   if (IS_ERR(aspace))
+   iommu_domain_free(iommu);
+
+   return aspace;
+}
+
 static const struct adreno_gpu_funcs funcs = {
.base = {
.get_param = adreno_get_param,
@@ -847,7 +897,7 @@ static const struct adreno_gpu_funcs funcs = {
.gpu_state_get = a6xx_gpu_state_get,
.gpu_state_put = a6xx_gpu_state_put,
 #endif
-   .create_address_space = adreno_iommu_create_address_space,
+   .create_address_space = a6xx_create_address_space,
},
.get_timestamp = a6xx_get_timestamp,
 };
-- 
2.17.1
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v6 1/5] iommu: Add DOMAIN_ATTR_SPLIT_TABLES

2020-04-09 Thread Jordan Crouse
Add a new attribute to enable and query the state of split pagetables
for the domain.

Acked-by: Will Deacon 
Signed-off-by: Jordan Crouse 
---

 include/linux/iommu.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 7ef8b0bda695..d0f96f748a00 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -126,6 +126,8 @@ enum iommu_attr {
DOMAIN_ATTR_FSL_PAMUV1,
DOMAIN_ATTR_NESTING,/* two stages of translation */
DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE,
+   /* Enable split pagetables (for example, TTBR1 on arm-smmu) */
+   DOMAIN_ATTR_SPLIT_TABLES,
DOMAIN_ATTR_MAX,
 };
 
-- 
2.17.1
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v6 4/5] drm/msm: Refactor address space initialization

2020-04-09 Thread Jordan Crouse
Refactor how address space initialization works. Instead of having the
address space function create the MMU object (and thus require separate but
equal functions for gpummu and iommu) use a single function and pass the
MMU struct in. Make the generic code cleaner by using target specific
functions to create the address space so a2xx can do its own thing in its
own space.  For all the other targets use a generic helper to initialize
IOMMU but leave the door open for newer targets to use customization
if they need it.

Reviewed-by: Rob Clark 
Signed-off-by: Jordan Crouse 
---

 drivers/gpu/drm/msm/adreno/a2xx_gpu.c| 16 
 drivers/gpu/drm/msm/adreno/a3xx_gpu.c|  1 +
 drivers/gpu/drm/msm/adreno/a4xx_gpu.c|  1 +
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c|  1 +
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c|  1 +
 drivers/gpu/drm/msm/adreno/adreno_gpu.c  | 23 ---
 drivers/gpu/drm/msm/adreno/adreno_gpu.h  |  8 
 drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c  | 10 ++---
 drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c | 14 ---
 drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.c |  4 --
 drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c | 11 -
 drivers/gpu/drm/msm/msm_drv.h|  8 +---
 drivers/gpu/drm/msm/msm_gem_vma.c| 51 +++-
 drivers/gpu/drm/msm/msm_gpu.c| 40 +--
 drivers/gpu/drm/msm/msm_gpu.h|  4 +-
 drivers/gpu/drm/msm/msm_iommu.c  |  3 ++
 16 files changed, 82 insertions(+), 114 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a2xx_gpu.c 
b/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
index 1f83bc18d500..60f6472a3e58 100644
--- a/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
@@ -401,6 +401,21 @@ static struct msm_gpu_state *a2xx_gpu_state_get(struct 
msm_gpu *gpu)
return state;
 }
 
+static struct msm_gem_address_space *
+a2xx_create_address_space(struct msm_gpu *gpu, struct platform_device *pdev)
+{
+   struct msm_mmu *mmu = msm_gpummu_new(>dev, gpu);
+   struct msm_gem_address_space *aspace;
+
+   aspace = msm_gem_address_space_create(mmu, "gpu", SZ_16M,
+   SZ_16M + 0xfff * SZ_64K);
+
+   if (IS_ERR(aspace) && !IS_ERR(mmu))
+   mmu->funcs->destroy(mmu);
+
+   return aspace;
+}
+
 /* Register offset defines for A2XX - copy of A3XX */
 static const unsigned int a2xx_register_offsets[REG_ADRENO_REGISTER_MAX] = {
REG_ADRENO_DEFINE(REG_ADRENO_CP_RB_BASE, REG_AXXX_CP_RB_BASE),
@@ -429,6 +444,7 @@ static const struct adreno_gpu_funcs funcs = {
 #endif
.gpu_state_get = a2xx_gpu_state_get,
.gpu_state_put = adreno_gpu_state_put,
+   .create_address_space = a2xx_create_address_space,
},
 };
 
diff --git a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c 
b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
index b67f88872726..0a5ea9f56cb8 100644
--- a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
@@ -441,6 +441,7 @@ static const struct adreno_gpu_funcs funcs = {
 #endif
.gpu_state_get = a3xx_gpu_state_get,
.gpu_state_put = adreno_gpu_state_put,
+   .create_address_space = adreno_iommu_create_address_space,
},
 };
 
diff --git a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c 
b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
index 253d8d85daad..b626afb0627d 100644
--- a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
@@ -532,6 +532,7 @@ static const struct adreno_gpu_funcs funcs = {
 #endif
.gpu_state_get = a4xx_gpu_state_get,
.gpu_state_put = adreno_gpu_state_put,
+   .create_address_space = adreno_iommu_create_address_space,
},
.get_timestamp = a4xx_get_timestamp,
 };
diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c 
b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
index 724024a2243a..e81b1deaf535 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
@@ -1439,6 +1439,7 @@ static const struct adreno_gpu_funcs funcs = {
.gpu_busy = a5xx_gpu_busy,
.gpu_state_get = a5xx_gpu_state_get,
.gpu_state_put = a5xx_gpu_state_put,
+   .create_address_space = adreno_iommu_create_address_space,
},
.get_timestamp = a5xx_get_timestamp,
 };
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 68af24150de5..02ade43d6335 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -847,6 +847,7 @@ static const struct adreno_gpu_funcs funcs = {
.gpu_state_get = a6xx_gpu_state_get,
.gpu_state_put = a6xx_gpu_state_put,
 #endif
+   .create_address_space = adreno_iommu_create_address_space,
},
.get_timestamp = a6xx_get_timestamp,
 };
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c 

Re: [PATCH 0/2] iommu/arm-smmu: Allow client devices to select direct mapping

2020-04-09 Thread Matthias Kaehlcke
On Tue, Feb 04, 2020 at 11:12:17PM +0530, Sai Prakash Ranjan wrote:
> Hello Robin, Will
> 
> On 2020-01-22 17:18, Sai Prakash Ranjan wrote:
> > This series allows drm devices to set a default identity
> > mapping using iommu_request_dm_for_dev(). First patch is
> > a cleanup to support other SoCs to call into QCOM specific
> > implementation and preparation for second patch.
> > Second patch sets the default identity domain for drm devices.
> > 
> > Jordan Crouse (1):
> >   iommu/arm-smmu: Allow client devices to select direct mapping
> > 
> > Sai Prakash Ranjan (1):
> >   iommu: arm-smmu-impl: Convert to a generic reset implementation
> > 
> >  drivers/iommu/arm-smmu-impl.c |  8 +++--
> >  drivers/iommu/arm-smmu-qcom.c | 55 +--
> >  drivers/iommu/arm-smmu.c  |  3 ++
> >  drivers/iommu/arm-smmu.h  |  5 
> >  4 files changed, 65 insertions(+), 6 deletions(-)
> 
> Any review comments?

Ping

What is the status of this series, is it ready to land or are any changes
needed?

Thanks

Matthias
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 19/28] gpu/drm: remove the powerpc hack in drm_legacy_sg_alloc

2020-04-09 Thread Benjamin Herrenschmidt
On Thu, 2020-04-09 at 11:41 +0200, Daniel Vetter wrote:
> Now if these boxes didn't ever have agp then I think we can get away
> with deleting this, since we've already deleted the legacy radeon
> driver. And that one used vmalloc for everything. The new kms one does
> use the dma-api if the gpu isn't connected through agp

Definitely no AGP there.

Cheers
Ben.


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [rfc v2 4/6] dma-direct: atomic allocations must come from atomic coherent pools

2020-04-09 Thread Tom Lendacky

On 4/8/20 4:21 PM, David Rientjes wrote:

When a device required unencrypted memory and the context does not allow


required => requires


blocking, memory must be returned from the atomic coherent pools.

This avoids the remap when CONFIG_DMA_DIRECT_REMAP is not enabled and the
config only requires CONFIG_DMA_COHERENT_POOL.  This will be used for
CONFIG_AMD_MEM_ENCRYPT in a subsequent patch.

Keep all memory in these pools unencrypted.

Signed-off-by: David Rientjes 
---
  kernel/dma/direct.c | 16 
  kernel/dma/pool.c   | 15 +--
  2 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 70800ca64f13..44165263c185 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -124,6 +124,18 @@ void *dma_direct_alloc_pages(struct device *dev, size_t 
size,
struct page *page;
void *ret;
  
+	/*

+* Unencrypted memory must come directly from DMA atomic pools if
+* blocking is not allowed.
+*/
+   if (IS_ENABLED(CONFIG_DMA_COHERENT_POOL) &&
+   force_dma_unencrypted(dev) && !gfpflags_allow_blocking(gfp)) {
+   ret = dma_alloc_from_pool(dev, PAGE_ALIGN(size), , gfp);
+   if (!ret)
+   return NULL;
+   goto done;
+   }
+
if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
dma_alloc_need_uncached(dev, attrs) &&
!gfpflags_allow_blocking(gfp)) {
@@ -203,6 +215,10 @@ void dma_direct_free_pages(struct device *dev, size_t 
size, void *cpu_addr,
  {
unsigned int page_order = get_order(size);
  
+	if (IS_ENABLED(CONFIG_DMA_COHERENT_POOL) &&

+   dma_free_from_pool(dev, cpu_addr, PAGE_ALIGN(size)))
+   return;
+
if ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) &&
!force_dma_unencrypted(dev)) {
/* cpu_addr is a struct page cookie, not a kernel address */
diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
index e14c5a2da734..6685ab89cfa7 100644
--- a/kernel/dma/pool.c
+++ b/kernel/dma/pool.c
@@ -9,6 +9,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -55,12 +56,20 @@ static int atomic_pool_expand(struct gen_pool *pool, size_t 
pool_size,
  
  	arch_dma_prep_coherent(page, pool_size);
  
+#ifdef CONFIG_DMA_DIRECT_REMAP

addr = dma_common_contiguous_remap(page, pool_size,
   pgprot_dmacoherent(PAGE_KERNEL),
   __builtin_return_address(0));
if (!addr)
goto free_page;
-
+#else
+   addr = page_to_virt(page);
+#endif
+   /*
+* Memory in the atomic DMA pools must be unencrypted, the pools do not
+* shrink so no re-encryption occurs in dma_direct_free_pages().
+*/
+   set_memory_decrypted((unsigned long)page_to_virt(page), 1 << order);
ret = gen_pool_add_virt(pool, (unsigned long)addr, page_to_phys(page),
pool_size, NUMA_NO_NODE);
if (ret)
@@ -69,8 +78,10 @@ static int atomic_pool_expand(struct gen_pool *pool, size_t 
pool_size,
return 0;
  
  remove_mapping:

+#ifdef CONFIG_DMA_DIRECT_REMAP
dma_common_free_remap(addr, pool_size);


You're about to free the memory, but you've called set_memory_decrypted() 
against it, so you need to do a set_memory_encrypted() to bring it back to 
a state ready for allocation again.


Thanks,
Tom


-free_page:
+#endif
+free_page: __maybe_unused
if (!dma_release_from_contiguous(NULL, page, 1 << order))
__free_pages(page, order);
  out:


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 25/28] mm: remove vmalloc_user_node_flags

2020-04-09 Thread Andrii Nakryiko
cc Johannes who suggested this API call originally

On Wed, Apr 8, 2020 at 5:03 AM Christoph Hellwig  wrote:
>
> Open code it in __bpf_map_area_alloc, which is the only caller.  Also
> clean up __bpf_map_area_alloc to have a single vmalloc call with
> slightly different flags instead of the current two different calls.
>
> For this to compile for the nommu case add a __vmalloc_node_range stub
> to nommu.c.
>
> Signed-off-by: Christoph Hellwig 
> ---
>  include/linux/vmalloc.h |  1 -
>  kernel/bpf/syscall.c| 23 +--
>  mm/nommu.c  | 14 --
>  mm/vmalloc.c| 20 
>  4 files changed, 21 insertions(+), 37 deletions(-)
>
> diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
> index 108f49b47756..f90f2946aac2 100644
> --- a/include/linux/vmalloc.h
> +++ b/include/linux/vmalloc.h
> @@ -106,7 +106,6 @@ extern void *vzalloc(unsigned long size);
>  extern void *vmalloc_user(unsigned long size);
>  extern void *vmalloc_node(unsigned long size, int node);
>  extern void *vzalloc_node(unsigned long size, int node);
> -extern void *vmalloc_user_node_flags(unsigned long size, int node, gfp_t 
> flags);
>  extern void *vmalloc_exec(unsigned long size);
>  extern void *vmalloc_32(unsigned long size);
>  extern void *vmalloc_32_user(unsigned long size);
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 48d98ea8fad6..249d9bd43321 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -281,26 +281,29 @@ static void *__bpf_map_area_alloc(u64 size, int 
> numa_node, bool mmapable)
>  * __GFP_RETRY_MAYFAIL to avoid such situations.
>  */
>
> -   const gfp_t flags = __GFP_NOWARN | __GFP_ZERO;
> +   const gfp_t gfp = __GFP_NOWARN | __GFP_ZERO;
> +   unsigned int flags = 0;
> +   unsigned long align = 1;
> void *area;
>
> if (size >= SIZE_MAX)
> return NULL;
>
> /* kmalloc()'ed memory can't be mmap()'ed */
> -   if (!mmapable && size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
> -   area = kmalloc_node(size, GFP_USER | __GFP_NORETRY | flags,
> +   if (mmapable) {
> +   BUG_ON(!PAGE_ALIGNED(size));
> +   align = SHMLBA;
> +   flags = VM_USERMAP;
> +   } else if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
> +   area = kmalloc_node(size, gfp | GFP_USER | __GFP_NORETRY,
> numa_node);
> if (area != NULL)
> return area;
> }
> -   if (mmapable) {
> -   BUG_ON(!PAGE_ALIGNED(size));
> -   return vmalloc_user_node_flags(size, numa_node, GFP_KERNEL |
> -  __GFP_RETRY_MAYFAIL | flags);
> -   }
> -   return __vmalloc_node(size, 1, GFP_KERNEL | __GFP_RETRY_MAYFAIL | 
> flags,
> - numa_node, __builtin_return_address(0));
> +
> +   return __vmalloc_node_range(size, align, VMALLOC_START, VMALLOC_END,
> +   gfp | GFP_KERNEL | __GFP_RETRY_MAYFAIL, PAGE_KERNEL,
> +   flags, numa_node, __builtin_return_address(0));
>  }
>
>  void *bpf_map_area_alloc(u64 size, int numa_node)
> diff --git a/mm/nommu.c b/mm/nommu.c
> index 81a86cd85893..b42cd6003d7d 100644
> --- a/mm/nommu.c
> +++ b/mm/nommu.c
> @@ -150,6 +150,14 @@ void *__vmalloc(unsigned long size, gfp_t gfp_mask)
>  }
>  EXPORT_SYMBOL(__vmalloc);
>
> +void *__vmalloc_node_range(unsigned long size, unsigned long align,
> +   unsigned long start, unsigned long end, gfp_t gfp_mask,
> +   pgprot_t prot, unsigned long vm_flags, int node,
> +   const void *caller)
> +{
> +   return __vmalloc(size, flags);
> +}
> +
>  void *__vmalloc_node(unsigned long size, unsigned long align, gfp_t gfp_mask,
> int node, const void *caller)
>  {
> @@ -180,12 +188,6 @@ void *vmalloc_user(unsigned long size)
>  }
>  EXPORT_SYMBOL(vmalloc_user);
>
> -void *vmalloc_user_node_flags(unsigned long size, int node, gfp_t flags)
> -{
> -   return __vmalloc_user_flags(size, flags | __GFP_ZERO);
> -}
> -EXPORT_SYMBOL(vmalloc_user_node_flags);
> -
>  struct page *vmalloc_to_page(const void *addr)
>  {
> return virt_to_page(addr);
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 333fbe77255a..f6f2acdaf70c 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2658,26 +2658,6 @@ void *vzalloc_node(unsigned long size, int node)
>  }
>  EXPORT_SYMBOL(vzalloc_node);
>
> -/**
> - * vmalloc_user_node_flags - allocate memory for userspace on a specific node
> - * @size: allocation size
> - * @node: numa node
> - * @flags: flags for the page level allocator
> - *
> - * The resulting memory area is zeroed so it can be mapped to userspace
> - * without leaking data.
> - *
> - * Return: pointer to the allocated memory or %NULL on error
> - */
> -void 

[PATCH 0/1] Real DMA dev DMA domain patch

2020-04-09 Thread Jon Derrick
Sorry for the late patch here, but I hit the issue Baolu and Daniel
pointed out could occur, and requires this fix (or iommu=nopt).
Hoping to get it into an rc

Jon Derrick (1):
  iommu/vt-d: use DMA domain for real DMA devices and subdevices

 drivers/iommu/intel-iommu.c | 56 -
 1 file changed, 43 insertions(+), 13 deletions(-)

-- 
2.18.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 1/1] iommu/vt-d: use DMA domain for real DMA devices and subdevices

2020-04-09 Thread Jon Derrick
The PCI devices handled by intel-iommu may have a DMA requester on another bus,
such as VMD subdevices needing to use the VMD endpoint.

The real DMA device is now used for the DMA mapping, but one case was missed
earlier: if the VMD device (and hence subdevices too) are under
IOMMU_DOMAIN_IDENTITY, mappings do not work.

Codepaths like intel_map_page() handle the IOMMU_DOMAIN_DMA case by creating an
iommu DMA mapping, and fall back on dma_direct_map_page() for the
IOMMU_DOMAIN_IDENTITY case. However, handling of the IDENTITY case is broken
when intel_page_page() handles a subdevice.

We observe that at iommu attach time, dmar_insert_one_dev_info() for the
subdevices will never set dev->archdata.iommu. This is because that function
uses find_domain() to check if there is already an IOMMU for the device, and
find_domain() then defers to the real DMA device which does have one. Thus
dmar_insert_one_dev_info() returns without assigning dev->archdata.iommu.

Then, later:

1. intel_map_page() checks if an IOMMU mapping is needed by calling
   iommu_need_mapping() on the subdevice. identity_mapping() returns
   false because dev->archdata.iommu is NULL, so this function
   returns false indicating that mapping is needed.
2. __intel_map_single() is called to create the mapping.
3. __intel_map_single() calls find_domain(). This function now returns
   the IDENTITY domain corresponding to the real DMA device.
4. __intel_map_single() calls domain_get_iommu() on this "real" domain.
   A failure is hit and the entire operation is aborted, because this
   codepath is not intended to handle IDENTITY mappings:
   if (WARN_ON(domain->domain.type != IOMMU_DOMAIN_DMA))
   return NULL;

This becomes problematic if the real DMA device and the subdevices have
different addressing capabilities and some require translation. Instead we can
put the real DMA dev and any subdevices on the DMA domain. This change assigns
subdevices to the DMA domain, and moves the real DMA device to the DMA domain
if necessary.

Reported-by: Daniel Drake 
Fixes: b0140c69637e ("iommu/vt-d: Use pci_real_dma_dev() for mapping")
Signed-off-by: Jon Derrick 
Signed-off-by: Daniel Drake 
---
 drivers/iommu/intel-iommu.c | 56 -
 1 file changed, 43 insertions(+), 13 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index ef0a5246700e..b4844a502499 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3049,6 +3049,9 @@ static int device_def_domain_type(struct device *dev)
if ((iommu_identity_mapping & IDENTMAP_GFX) && 
IS_GFX_DEVICE(pdev))
return IOMMU_DOMAIN_IDENTITY;
 
+   if (pci_real_dma_dev(pdev) != pdev)
+   return IOMMU_DOMAIN_DMA;
+
/*
 * We want to start off with all devices in the 1:1 domain, and
 * take them out later if we find they can't access all of 
memory.
@@ -5781,12 +5784,32 @@ static bool intel_iommu_capable(enum iommu_cap cap)
return false;
 }
 
+static int intel_iommu_request_dma_domain_for_dev(struct device *dev,
+  struct dmar_domain *domain)
+{
+   int ret;
+
+   ret = iommu_request_dma_domain_for_dev(dev);
+   if (ret) {
+   dmar_remove_one_dev_info(dev);
+   domain->flags |= DOMAIN_FLAG_LOSE_CHILDREN;
+   if (!get_private_domain_for_dev(dev)) {
+   dev_warn(dev,
+"Failed to get a private domain.\n");
+   return -ENOMEM;
+   }
+   }
+
+   return 0;
+}
+
 static int intel_iommu_add_device(struct device *dev)
 {
struct dmar_domain *dmar_domain;
struct iommu_domain *domain;
struct intel_iommu *iommu;
struct iommu_group *group;
+   struct device *real_dev = dev;
u8 bus, devfn;
int ret;
 
@@ -5810,6 +5833,21 @@ static int intel_iommu_add_device(struct device *dev)
 
domain = iommu_get_domain_for_dev(dev);
dmar_domain = to_dmar_domain(domain);
+
+   if (dev_is_pci(dev))
+   real_dev = _real_dma_dev(to_pci_dev(dev))->dev;
+
+   if (real_dev != dev) {
+   domain = iommu_get_domain_for_dev(real_dev);
+   if (domain->type != IOMMU_DOMAIN_DMA) {
+   dmar_remove_one_dev_info(real_dev);
+
+   ret = intel_iommu_request_dma_domain_for_dev(real_dev, 
dmar_domain);
+   if (ret)
+   goto unlink;
+   }
+   }
+
if (domain->type == IOMMU_DOMAIN_DMA) {
if (device_def_domain_type(dev) == IOMMU_DOMAIN_IDENTITY) {
ret = iommu_request_dm_for_dev(dev);
@@ -5823,20 +5861,12 @@ static int intel_iommu_add_device(struct device *dev)
}
} else {
   

Re: [PATCH 10/28] mm: only allow page table mappings for built-in zsmalloc

2020-04-09 Thread Minchan Kim
On Wed, Apr 08, 2020 at 01:59:08PM +0200, Christoph Hellwig wrote:
> This allows to unexport map_vm_area and unmap_kernel_range, which are
> rather deep internal and should not be available to modules.

Even though I don't know how many usecase we have using zsmalloc as
module(I heard only once by dumb reason), it could affect existing
users. Thus, please include concrete explanation in the patch to
justify when the complain occurs.

> 
> Signed-off-by: Christoph Hellwig 
> ---
>  mm/Kconfig   | 2 +-
>  mm/vmalloc.c | 2 --
>  2 files changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/mm/Kconfig b/mm/Kconfig
> index 36949a9425b8..614cc786b519 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -702,7 +702,7 @@ config ZSMALLOC
>  
>  config ZSMALLOC_PGTABLE_MAPPING
>   bool "Use page table mapping to access object in zsmalloc"
> - depends on ZSMALLOC
> + depends on ZSMALLOC=y
>   help
> By default, zsmalloc uses a copy-based object mapping method to
> access allocations that span two pages. However, if a particular
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 3375f9508ef6..9183fc0d365a 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2046,7 +2046,6 @@ void unmap_kernel_range(unsigned long addr, unsigned 
> long size)
>   vunmap_page_range(addr, end);
>   flush_tlb_kernel_range(addr, end);
>  }
> -EXPORT_SYMBOL_GPL(unmap_kernel_range);
>  
>  int map_vm_area(struct vm_struct *area, pgprot_t prot, struct page **pages)
>  {
> @@ -2058,7 +2057,6 @@ int map_vm_area(struct vm_struct *area, pgprot_t prot, 
> struct page **pages)
>  
>   return err > 0 ? 0 : err;
>  }
> -EXPORT_SYMBOL_GPL(map_vm_area);
>  
>  static inline void setup_vmalloc_vm_locked(struct vm_struct *vm,
>   struct vmap_area *va, unsigned long flags, const void *caller)
> -- 
> 2.25.1
> 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 09/28] mm: rename CONFIG_PGTABLE_MAPPING to CONFIG_ZSMALLOC_PGTABLE_MAPPING

2020-04-09 Thread Minchan Kim
On Wed, Apr 08, 2020 at 01:59:07PM +0200, Christoph Hellwig wrote:
> Rename the Kconfig variable to clarify the scope.
> 
> Signed-off-by: Christoph Hellwig 
Acked-by: Minchan Kim 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 10/28] mm: only allow page table mappings for built-in zsmalloc

2020-04-09 Thread Minchan Kim
On Thu, Apr 09, 2020 at 06:50:30PM +0200, Peter Zijlstra wrote:
> On Thu, Apr 09, 2020 at 09:08:26AM -0700, Minchan Kim wrote:
> > On Wed, Apr 08, 2020 at 01:59:08PM +0200, Christoph Hellwig wrote:
> > > This allows to unexport map_vm_area and unmap_kernel_range, which are
> > > rather deep internal and should not be available to modules.
> > 
> > Even though I don't know how many usecase we have using zsmalloc as
> > module(I heard only once by dumb reason), it could affect existing
> > users. Thus, please include concrete explanation in the patch to
> > justify when the complain occurs.
> 
> The justification is 'we can unexport functions that have no sane reason
> of being exported in the first place'.
> 
> The Changelog pretty much says that.

Okay, I hope there is no affected user since this patch.
If there are someone, they need to provide sane reason why they want
to have zsmalloc as module.

Acked-by: Minchan Kim 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 01/28] x86/hyperv: use vmalloc_exec for the hypercall page

2020-04-09 Thread Wei Liu
On Wed, Apr 08, 2020 at 01:58:59PM +0200, Christoph Hellwig wrote:
> Use the designated helper for allocating executable kernel memory, and
> remove the now unused PAGE_KERNEL_RX define.
> 
> Signed-off-by: Christoph Hellwig 

Acked-by: Wei Liu 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 0/2] iommu: Remove iommu_sva_ops::mm_exit()

2020-04-09 Thread Jason Gunthorpe
On Thu, Apr 09, 2020 at 09:21:34AM -0700, Jacob Pan wrote:
> On Thu, 9 Apr 2020 11:25:19 -0300
> Jason Gunthorpe  wrote:
> 
> > On Thu, Apr 09, 2020 at 07:14:24AM -0700, Jacob Pan wrote:
> > > > When the process is killed, mm release can happen before fds are
> > > > released. If you look at do_exit() in kernel/exit.c:
> > > > 
> > > > exit_mm()
> > > >   mmput()  
> > > >-> mmu release notifier
> > > > ...
> > > > exit_files()
> > > >   close_files()
> > > > fput()
> > > > exit_task_work()
> > > >   __fput()  
> > > >-> unbind()
> > > >   
> > > So unbind is coming anyway, the difference in handling in mmu
> > > release notifier is whether we silently drop DMA fault vs.
> > > reporting fault?  
> > 
> > Userspace can significantly delay the final fput triggering the
> > unbind, the above is only for the trivial case where the process
> > owning the mm_struct is the only process holding the fd.
> > 
> Are you talking about FDs owned buy children after fork() or FDs sent
> over to another process. I think, in either case SVA is not supported.

Supported or not a hostile user space can trigger these conditions and
it should not cause misbehavior from the kernel (eg log spamming)

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 10/28] mm: only allow page table mappings for built-in zsmalloc

2020-04-09 Thread Peter Zijlstra
On Thu, Apr 09, 2020 at 09:08:26AM -0700, Minchan Kim wrote:
> On Wed, Apr 08, 2020 at 01:59:08PM +0200, Christoph Hellwig wrote:
> > This allows to unexport map_vm_area and unmap_kernel_range, which are
> > rather deep internal and should not be available to modules.
> 
> Even though I don't know how many usecase we have using zsmalloc as
> module(I heard only once by dumb reason), it could affect existing
> users. Thus, please include concrete explanation in the patch to
> justify when the complain occurs.

The justification is 'we can unexport functions that have no sane reason
of being exported in the first place'.

The Changelog pretty much says that.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 0/2] iommu: Remove iommu_sva_ops::mm_exit()

2020-04-09 Thread Jacob Pan
On Thu, 9 Apr 2020 09:08:21 -0300
Jason Gunthorpe  wrote:

> On Wed, Apr 08, 2020 at 04:48:02PM -0700, Jacob Pan wrote:
> > > Yes, this is the proper way, when the DMA is stopped and no use
> > > of the PASID remains then you can drop the mmu notifier and
> > > release the PASID entirely. If that is linked to the lifetime of
> > > the FD then forget completely about the mm_struct lifetime, it
> > > doesn't matter.. 
> > Got everything above, thanks a lot.
> > 
> > If everything is in order with the FD close. Why do we need to 
> > "ask IOMMU drivers to silently abort DMA and Page Requests in the
> > meantime." in mm_exit notifier? This will be done orderly in unbind
> > anyway.  
> 
> I thought this is exactly what Jean-Phillippe is removing here, it is
> a bad idea for the reasons he explained.
> 
I think this patch only removed driver side callbacks, i.e. to stop
DMA. But not removing IOMMU side of stop DMA, PRS.

Before uacce, (universal accelerator framework), sva bind/unbind is not
guaranteed at open/close FD time. Therefore, mmu notifier is needed if
mmexit comes without unbind.

> > > > Enforcing unbind upon FD close might be a precarious path,
> > > > perhaps that is why we have to deal with out of order
> > > > situation?
> > > 
> > > How so? You just put it in the FD release function :)  
> > 
> > I was thinking some driver may choose to defer unbind in some
> > workqueue etc.  
> 
> Doesn't really change anything, the lifetime of the PASID wouuld be
> the lifetime of the notifier and the bind, and DMA can't continue
> without the notifier registered.
> 
True, it is just better not to defer. Otherwise, the window of
suppressing error gets longer.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 0/2] iommu: Remove iommu_sva_ops::mm_exit()

2020-04-09 Thread Jacob Pan
On Thu, 9 Apr 2020 16:50:58 +0200
Jean-Philippe Brucker  wrote:

> On Thu, Apr 09, 2020 at 07:14:24AM -0700, Jacob Pan wrote:
> > On Thu, 9 Apr 2020 08:39:05 +0200
> > Jean-Philippe Brucker  wrote:
> >   
> > > On Wed, Apr 08, 2020 at 04:48:02PM -0700, Jacob Pan wrote:  
> > > > On Wed, 8 Apr 2020 19:32:18 -0300
> > > > Jason Gunthorpe  wrote:
> > > > 
> > > > > On Wed, Apr 08, 2020 at 02:35:52PM -0700, Jacob Pan wrote:
> > > > > > > On Wed, Apr 08, 2020 at 11:35:52AM -0700, Jacob Pan
> > > > > > > wrote:  
> > > > > > > > Hi Jean,
> > > > > > > > 
> > > > > > > > On Wed,  8 Apr 2020 16:04:25 +0200
> > > > > > > > Jean-Philippe Brucker  wrote:
> > > > > > > > 
> > > > > > > > > The IOMMU SVA API currently requires device drivers to
> > > > > > > > > implement an mm_exit() callback, which stops device
> > > > > > > > > jobs that do DMA. This function is called in the
> > > > > > > > > release() MMU notifier, when an address space that is
> > > > > > > > > shared with a device exits.
> > > > > > > > > 
> > > > > > > > > It has been noted several time during discussions
> > > > > > > > > about SVA that cancelling DMA jobs can be slow and
> > > > > > > > > complex, and doing it in the release() notifier might
> > > > > > > > > cause synchronization issues (patch 2 has more
> > > > > > > > > background). Device drivers must in any case call
> > > > > > > > > unbind() to remove their bond, after stopping DMA
> > > > > > > > > from a more favorable context (release of a file
> > > > > > > > > descriptor).
> > > > > > > > > 
> > > > > > > > > So after mm exits, rather than notifying device
> > > > > > > > > drivers, we can hold on to the PASID until unbind(),
> > > > > > > > > ask IOMMU drivers to silently abort DMA and Page
> > > > > > > > > Requests in the meantime. This change should relieve
> > > > > > > > > the mmput() path.
> > > > > > > >
> > > > > > > > I assume mm is destroyed after all the FDs are
> > > > > > > > closed
> > > > > > > 
> > > > > > > FDs do not hold a mmget(), but they may hold a mmgrab(),
> > > > > > > ie anything using mmu_notifiers has to hold a grab until
> > > > > > > the notifier is destroyed, which is often triggered by FD
> > > > > > > close. 
> > > > > > Sorry, I don't get this. Are you saying we have to hold a
> > > > > > mmgrab() between svm_bind/mmu_notifier_register and
> > > > > > svm_unbind/mmu_notifier_unregister?  
> > > > > 
> > > > > Yes. This is done automatically for the caller inside the
> > > > > mmu_notifier implementation. We now even store the mm_struct
> > > > > pointer inside the notifier.
> > > > > 
> > > > > Once a notifier is registered the mm_struct remains valid
> > > > > memory until the notifier is unregistered.
> > > > > 
> > > > > > Isn't the idea of mmu_notifier is to avoid holding the mm
> > > > > > reference and rely on the notifier to tell us when mm is
> > > > > > going away?  
> > > > > 
> > > > > The notifier only holds a mmgrab(), not a mmget() - this
> > > > > allows exit_mmap to proceed, but the mm_struct memory remains.
> > > > > 
> > > > > This is also probably why it is a bad idea to tie the
> > > > > lifetime of something like a pasid to the mmdrop as a evil
> > > > > user could cause a large number of mm structs to be released
> > > > > but not freed, probably defeating cgroup limits and so forth
> > > > > (not sure) 
> > > > > > It seems both Intel and AMD iommu drivers don't hold mmgrab
> > > > > > after mmu_notifier_register.  
> > > > > 
> > > > > It is done internally to the implementation.
> > > > > 
> > > > > > > So the exit_mmap() -> release() may happen before the FDs
> > > > > > > are destroyed, but the final mmdrop() will be during some
> > > > > > > FD release when the final mmdrop() happens.  
> > > > > > 
> > > > > > Do you mean mmdrop() is after FD release?   
> > > > > 
> > > > > Yes, it will be done by the mmu_notifier_unregister(), which
> > > > > should be called during FD release if the iommu lifetime is
> > > > > linked to some FD.   
> > > > > > If so, unbind is called in FD release should take care of
> > > > > > everything, i.e. stops DMA, clear PASID context on IOMMU,
> > > > > > flush PRS queue etc.  
> > > > > 
> > > > > Yes, this is the proper way, when the DMA is stopped and no
> > > > > use of the PASID remains then you can drop the mmu notifier
> > > > > and release the PASID entirely. If that is linked to the
> > > > > lifetime of the FD then forget completely about the mm_struct
> > > > > lifetime, it doesn't matter..   
> > > > Got everything above, thanks a lot.
> > > > 
> > > > If everything is in order with the FD close. Why do we need to 
> > > > "ask IOMMU drivers to silently abort DMA and Page Requests in
> > > > the meantime." in mm_exit notifier? This will be done orderly
> > > > in unbind anyway.
> > > 
> > > When the process is killed, mm release can happen before fds are
> > > released. If you look at do_exit() in 

Re: [PATCH 0/2] iommu: Remove iommu_sva_ops::mm_exit()

2020-04-09 Thread Jacob Pan
On Thu, 9 Apr 2020 11:25:19 -0300
Jason Gunthorpe  wrote:

> On Thu, Apr 09, 2020 at 07:14:24AM -0700, Jacob Pan wrote:
> > > When the process is killed, mm release can happen before fds are
> > > released. If you look at do_exit() in kernel/exit.c:
> > > 
> > >   exit_mm()
> > > mmput()  
> > >  -> mmu release notifier
> > >   ...
> > >   exit_files()
> > > close_files()
> > >   fput()
> > >   exit_task_work()
> > > __fput()  
> > >  -> unbind()
> > >   
> > So unbind is coming anyway, the difference in handling in mmu
> > release notifier is whether we silently drop DMA fault vs.
> > reporting fault?  
> 
> Userspace can significantly delay the final fput triggering the
> unbind, the above is only for the trivial case where the process
> owning the mm_struct is the only process holding the fd.
> 
Are you talking about FDs owned buy children after fork() or FDs sent
over to another process. I think, in either case SVA is not supported.

> The destruction of a mm_struct should be treated the same as unmapping
> every vma in the process. The observable effect should be no different
> than munmap.
> 
Good point. I agree, we should suppress the error in the window before
unbind.

> Jason

[Jacob Pan]
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 19/28] gpu/drm: remove the powerpc hack in drm_legacy_sg_alloc

2020-04-09 Thread Daniel Vetter
On Thu, Apr 9, 2020 at 4:19 PM Alex Deucher  wrote:
>
> On Thu, Apr 9, 2020 at 5:41 AM Daniel Vetter  wrote:
> >
> > On Thu, Apr 9, 2020 at 10:54 AM Benjamin Herrenschmidt
> >  wrote:
> > >
> > > On Wed, 2020-04-08 at 14:25 +0200, Daniel Vetter wrote:
> > > > On Wed, Apr 08, 2020 at 01:59:17PM +0200, Christoph Hellwig wrote:
> > > > > If this code was broken for non-coherent caches a crude powerpc hack
> > > > > isn't going to help anyone else.  Remove the hack as it is the last
> > > > > user of __vmalloc passing a page protection flag other than 
> > > > > PAGE_KERNEL.
> > > >
> > > > Well Ben added this to make stuff work on ppc, ofc the home grown dma
> > > > layer in drm from back then isn't going to work in other places. I guess
> > > > should have at least an ack from him, in case anyone still cares about
> > > > this on ppc. Adding Ben to cc.
> > >
> > > This was due to some drivers (radeon ?) trying to use vmalloc pages for
> > > coherent DMA, which means on those 4xx powerpc's need to be non-cached.
> > >
> > > There were machines using that (440 based iirc), though I honestly
> > > can't tell if anybody still uses any of it.
> >
> > agp subsystem still seems to happily do that (vmalloc memory for
> > device access), never having been ported to dma apis (or well
> > converted to iommu drivers, which they kinda are really). So I think
> > this all still works exactly as back then, even with the kms radeon
> > drivers. Question really is whether we have users left, and I have no
> > clue about that either.
> >
> > Now if these boxes didn't ever have agp then I think we can get away
> > with deleting this, since we've already deleted the legacy radeon
> > driver. And that one used vmalloc for everything. The new kms one does
> > use the dma-api if the gpu isn't connected through agp.
>
> All radeons have a built in remapping table to handle non-AGP systems.
> On the earlier radeons it wasn't quite as performant as AGP, but it
> was always more reliable because AGP is AGP.  Maybe it's time to let
> AGP go?

I'd be very much in favour of that, if we can just use the integrated
gart and drop agp fast writes wobbliness on the floor. I think the
only other modern driver using agp would be nouveau at that point.
-Daniel

>
> Alex
>
> > -Daniel
> >
> > > Cheers,
> > > Ben.
> > >
> > > > -Daniel
> > > >
> > > > >
> > > > > Signed-off-by: Christoph Hellwig 
> > > > > ---
> > > > >  drivers/gpu/drm/drm_scatter.c | 11 +--
> > > > >  1 file changed, 1 insertion(+), 10 deletions(-)
> > > > >
> > > > > diff --git a/drivers/gpu/drm/drm_scatter.c 
> > > > > b/drivers/gpu/drm/drm_scatter.c
> > > > > index ca520028b2cb..f4e6184d1877 100644
> > > > > --- a/drivers/gpu/drm/drm_scatter.c
> > > > > +++ b/drivers/gpu/drm/drm_scatter.c
> > > > > @@ -43,15 +43,6 @@
> > > > >
> > > > >  #define DEBUG_SCATTER 0
> > > > >
> > > > > -static inline void *drm_vmalloc_dma(unsigned long size)
> > > > > -{
> > > > > -#if defined(__powerpc__) && defined(CONFIG_NOT_COHERENT_CACHE)
> > > > > -   return __vmalloc(size, GFP_KERNEL, 
> > > > > pgprot_noncached_wc(PAGE_KERNEL));
> > > > > -#else
> > > > > -   return vmalloc_32(size);
> > > > > -#endif
> > > > > -}
> > > > > -
> > > > >  static void drm_sg_cleanup(struct drm_sg_mem * entry)
> > > > >  {
> > > > > struct page *page;
> > > > > @@ -126,7 +117,7 @@ int drm_legacy_sg_alloc(struct drm_device *dev, 
> > > > > void *data,
> > > > > return -ENOMEM;
> > > > > }
> > > > >
> > > > > -   entry->virtual = drm_vmalloc_dma(pages << PAGE_SHIFT);
> > > > > +   entry->virtual = vmalloc_32(pages << PAGE_SHIFT);
> > > > > if (!entry->virtual) {
> > > > > kfree(entry->busaddr);
> > > > > kfree(entry->pagelist);
> > > > > --
> > > > > 2.25.1
> > > > >
> > > >
> > > >
> > >
> >
> >
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > +41 (0) 79 365 57 48 - http://blog.ffwll.ch
> > ___
> > dri-devel mailing list
> > dri-de...@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/dri-devel



-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 0/2] iommu: Remove iommu_sva_ops::mm_exit()

2020-04-09 Thread Jean-Philippe Brucker
On Thu, Apr 09, 2020 at 07:14:24AM -0700, Jacob Pan wrote:
> On Thu, 9 Apr 2020 08:39:05 +0200
> Jean-Philippe Brucker  wrote:
> 
> > On Wed, Apr 08, 2020 at 04:48:02PM -0700, Jacob Pan wrote:
> > > On Wed, 8 Apr 2020 19:32:18 -0300
> > > Jason Gunthorpe  wrote:
> > >   
> > > > On Wed, Apr 08, 2020 at 02:35:52PM -0700, Jacob Pan wrote:  
> > > > > > On Wed, Apr 08, 2020 at 11:35:52AM -0700, Jacob Pan wrote:
> > > > > > > Hi Jean,
> > > > > > > 
> > > > > > > On Wed,  8 Apr 2020 16:04:25 +0200
> > > > > > > Jean-Philippe Brucker  wrote:
> > > > > > >   
> > > > > > > > The IOMMU SVA API currently requires device drivers to
> > > > > > > > implement an mm_exit() callback, which stops device jobs
> > > > > > > > that do DMA. This function is called in the release() MMU
> > > > > > > > notifier, when an address space that is shared with a
> > > > > > > > device exits.
> > > > > > > > 
> > > > > > > > It has been noted several time during discussions about
> > > > > > > > SVA that cancelling DMA jobs can be slow and complex, and
> > > > > > > > doing it in the release() notifier might cause
> > > > > > > > synchronization issues (patch 2 has more background).
> > > > > > > > Device drivers must in any case call unbind() to remove
> > > > > > > > their bond, after stopping DMA from a more favorable
> > > > > > > > context (release of a file descriptor).
> > > > > > > > 
> > > > > > > > So after mm exits, rather than notifying device drivers,
> > > > > > > > we can hold on to the PASID until unbind(), ask IOMMU
> > > > > > > > drivers to silently abort DMA and Page Requests in the
> > > > > > > > meantime. This change should relieve the mmput()
> > > > > > > > path.  
> > > > > > >
> > > > > > > I assume mm is destroyed after all the FDs are closed  
> > > > > > 
> > > > > > FDs do not hold a mmget(), but they may hold a mmgrab(), ie
> > > > > > anything using mmu_notifiers has to hold a grab until the
> > > > > > notifier is destroyed, which is often triggered by FD close.
> > > > > > 
> > > > > Sorry, I don't get this. Are you saying we have to hold a
> > > > > mmgrab() between svm_bind/mmu_notifier_register and
> > > > > svm_unbind/mmu_notifier_unregister?
> > > > 
> > > > Yes. This is done automatically for the caller inside the
> > > > mmu_notifier implementation. We now even store the mm_struct
> > > > pointer inside the notifier.
> > > > 
> > > > Once a notifier is registered the mm_struct remains valid memory
> > > > until the notifier is unregistered.
> > > >   
> > > > > Isn't the idea of mmu_notifier is to avoid holding the mm
> > > > > reference and rely on the notifier to tell us when mm is going
> > > > > away?
> > > > 
> > > > The notifier only holds a mmgrab(), not a mmget() - this allows
> > > > exit_mmap to proceed, but the mm_struct memory remains.
> > > > 
> > > > This is also probably why it is a bad idea to tie the lifetime of
> > > > something like a pasid to the mmdrop as a evil user could cause a
> > > > large number of mm structs to be released but not freed, probably
> > > > defeating cgroup limits and so forth (not sure)
> > > >   
> > > > > It seems both Intel and AMD iommu drivers don't hold mmgrab
> > > > > after mmu_notifier_register.
> > > > 
> > > > It is done internally to the implementation.
> > > >   
> > > > > > So the exit_mmap() -> release() may happen before the FDs are
> > > > > > destroyed, but the final mmdrop() will be during some FD
> > > > > > release when the final mmdrop() happens.
> > > > > 
> > > > > Do you mean mmdrop() is after FD release? 
> > > > 
> > > > Yes, it will be done by the mmu_notifier_unregister(), which
> > > > should be called during FD release if the iommu lifetime is
> > > > linked to some FD. 
> > > > > If so, unbind is called in FD release should take care of
> > > > > everything, i.e. stops DMA, clear PASID context on IOMMU, flush
> > > > > PRS queue etc.
> > > > 
> > > > Yes, this is the proper way, when the DMA is stopped and no use
> > > > of the PASID remains then you can drop the mmu notifier and
> > > > release the PASID entirely. If that is linked to the lifetime of
> > > > the FD then forget completely about the mm_struct lifetime, it
> > > > doesn't matter.. 
> > > Got everything above, thanks a lot.
> > > 
> > > If everything is in order with the FD close. Why do we need to 
> > > "ask IOMMU drivers to silently abort DMA and Page Requests in the
> > > meantime." in mm_exit notifier? This will be done orderly in unbind
> > > anyway.  
> > 
> > When the process is killed, mm release can happen before fds are
> > released. If you look at do_exit() in kernel/exit.c:
> > 
> > exit_mm()
> >   mmput()
> >-> mmu release notifier  
> > ...
> > exit_files()
> >   close_files()
> > fput()
> > exit_task_work()
> >   __fput()
> >-> unbind()  
> > 
> So unbind is coming anyway, the difference in handling in mmu release
> notifier is whether we 

Re: [PATCH] iommu/exynos: Get rid of 'struct exynos_iommu_owner' exynos_iommu_owner

2020-04-09 Thread Joerg Roedel
Hi Marek,

On Thu, Apr 09, 2020 at 03:58:00PM +0200, Marek Szyprowski wrote:
> The main problem after your conversion is the fact that ->probe_device() 
> is called very early, before any other platform device (thus IOMMU 
> controller) is is probed. It doesn't handle EPROBE_DEFER too.

I don't quite understand why probe_device() is called too early, as it
is called at the same time add_device() was called before. But anyway,
I have seen a similar problem on OMAP. If the SYSMMU for a master is not
probed yet when probe_device() is called, it can just return -ENODEV and
in your driver you just call but_iommu_probe() when a new SYSMMU got
initialized to re-probe uninitialized masters on the bus. This patch-set
contains a change to export bus_iommu_probe() for exactly that reason.

What do you think?

Regards,

Joerg
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 0/2] iommu: Remove iommu_sva_ops::mm_exit()

2020-04-09 Thread Jason Gunthorpe
On Thu, Apr 09, 2020 at 07:14:24AM -0700, Jacob Pan wrote:
> > When the process is killed, mm release can happen before fds are
> > released. If you look at do_exit() in kernel/exit.c:
> > 
> > exit_mm()
> >   mmput()
> >-> mmu release notifier  
> > ...
> > exit_files()
> >   close_files()
> > fput()
> > exit_task_work()
> >   __fput()
> >-> unbind()  
> > 
> So unbind is coming anyway, the difference in handling in mmu release
> notifier is whether we silently drop DMA fault vs. reporting fault?

Userspace can significantly delay the final fput triggering the
unbind, the above is only for the trivial case where the process
owning the mm_struct is the only process holding the fd.

The destruction of a mm_struct should be treated the same as unmapping
every vma in the process. The observable effect should be no different
than munmap.

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 19/28] gpu/drm: remove the powerpc hack in drm_legacy_sg_alloc

2020-04-09 Thread Alex Deucher
On Thu, Apr 9, 2020 at 5:41 AM Daniel Vetter  wrote:
>
> On Thu, Apr 9, 2020 at 10:54 AM Benjamin Herrenschmidt
>  wrote:
> >
> > On Wed, 2020-04-08 at 14:25 +0200, Daniel Vetter wrote:
> > > On Wed, Apr 08, 2020 at 01:59:17PM +0200, Christoph Hellwig wrote:
> > > > If this code was broken for non-coherent caches a crude powerpc hack
> > > > isn't going to help anyone else.  Remove the hack as it is the last
> > > > user of __vmalloc passing a page protection flag other than PAGE_KERNEL.
> > >
> > > Well Ben added this to make stuff work on ppc, ofc the home grown dma
> > > layer in drm from back then isn't going to work in other places. I guess
> > > should have at least an ack from him, in case anyone still cares about
> > > this on ppc. Adding Ben to cc.
> >
> > This was due to some drivers (radeon ?) trying to use vmalloc pages for
> > coherent DMA, which means on those 4xx powerpc's need to be non-cached.
> >
> > There were machines using that (440 based iirc), though I honestly
> > can't tell if anybody still uses any of it.
>
> agp subsystem still seems to happily do that (vmalloc memory for
> device access), never having been ported to dma apis (or well
> converted to iommu drivers, which they kinda are really). So I think
> this all still works exactly as back then, even with the kms radeon
> drivers. Question really is whether we have users left, and I have no
> clue about that either.
>
> Now if these boxes didn't ever have agp then I think we can get away
> with deleting this, since we've already deleted the legacy radeon
> driver. And that one used vmalloc for everything. The new kms one does
> use the dma-api if the gpu isn't connected through agp.

All radeons have a built in remapping table to handle non-AGP systems.
On the earlier radeons it wasn't quite as performant as AGP, but it
was always more reliable because AGP is AGP.  Maybe it's time to let
AGP go?

Alex

> -Daniel
>
> > Cheers,
> > Ben.
> >
> > > -Daniel
> > >
> > > >
> > > > Signed-off-by: Christoph Hellwig 
> > > > ---
> > > >  drivers/gpu/drm/drm_scatter.c | 11 +--
> > > >  1 file changed, 1 insertion(+), 10 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/drm_scatter.c 
> > > > b/drivers/gpu/drm/drm_scatter.c
> > > > index ca520028b2cb..f4e6184d1877 100644
> > > > --- a/drivers/gpu/drm/drm_scatter.c
> > > > +++ b/drivers/gpu/drm/drm_scatter.c
> > > > @@ -43,15 +43,6 @@
> > > >
> > > >  #define DEBUG_SCATTER 0
> > > >
> > > > -static inline void *drm_vmalloc_dma(unsigned long size)
> > > > -{
> > > > -#if defined(__powerpc__) && defined(CONFIG_NOT_COHERENT_CACHE)
> > > > -   return __vmalloc(size, GFP_KERNEL, 
> > > > pgprot_noncached_wc(PAGE_KERNEL));
> > > > -#else
> > > > -   return vmalloc_32(size);
> > > > -#endif
> > > > -}
> > > > -
> > > >  static void drm_sg_cleanup(struct drm_sg_mem * entry)
> > > >  {
> > > > struct page *page;
> > > > @@ -126,7 +117,7 @@ int drm_legacy_sg_alloc(struct drm_device *dev, 
> > > > void *data,
> > > > return -ENOMEM;
> > > > }
> > > >
> > > > -   entry->virtual = drm_vmalloc_dma(pages << PAGE_SHIFT);
> > > > +   entry->virtual = vmalloc_32(pages << PAGE_SHIFT);
> > > > if (!entry->virtual) {
> > > > kfree(entry->busaddr);
> > > > kfree(entry->pagelist);
> > > > --
> > > > 2.25.1
> > > >
> > >
> > >
> >
>
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
> ___
> dri-devel mailing list
> dri-de...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH] iommu/exynos: Rework intialization

2020-04-09 Thread Marek Szyprowski
Fix initialization after driver conversion to
probe_device()/release_device(). Prepared on top of:
https://git.kernel.org/pub/scm/linux/kernel/git/joro/linux.git/log/?h=iommu-probe-device

Signed-off-by: Marek Szyprowski 
---
 drivers/iommu/exynos-iommu.c | 80 +---
 1 file changed, 46 insertions(+), 34 deletions(-)

diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index f865c90..53c784f 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -565,6 +565,7 @@ static void sysmmu_tlb_invalidate_entry(struct 
sysmmu_drvdata *data,
 }
 
 static const struct iommu_ops exynos_iommu_ops;
+static int exynos_iommu_initialize_owner(struct device *sysmmu);
 
 static int exynos_sysmmu_probe(struct platform_device *pdev)
 {
@@ -573,6 +574,8 @@ static int exynos_sysmmu_probe(struct platform_device *pdev)
struct sysmmu_drvdata *data;
struct resource *res;
 
+   dev_info(dev, "%s %d\n", __func__, __LINE__);
+
data = devm_kzalloc(dev, sizeof(*data), GFP_KERNEL);
if (!data)
return -ENOMEM;
@@ -649,6 +652,8 @@ static int exynos_sysmmu_probe(struct platform_device *pdev)
 
pm_runtime_enable(dev);
 
+   exynos_iommu_initialize_owner(dev);
+
return 0;
 }
 
@@ -1225,24 +1230,8 @@ static phys_addr_t exynos_iommu_iova_to_phys(struct 
iommu_domain *iommu_domain,
 
 static struct iommu_device *exynos_iommu_probe_device(struct device *dev)
 {
-   struct exynos_iommu_owner *owner = dev->archdata.iommu;
-   struct sysmmu_drvdata *data;
-
-   if (!has_sysmmu(dev))
-   return ERR_PTR(-ENODEV);
-
-   list_for_each_entry(data, >controllers, owner_node) {
-   /*
-* SYSMMU will be runtime activated via device link
-* (dependency) to its master device, so there are no
-* direct calls to pm_runtime_get/put in this driver.
-*/
-   data->link = device_link_add(dev, data->sysmmu,
-DL_FLAG_STATELESS |
-DL_FLAG_PM_RUNTIME);
-   }
-
-   return >iommu;
+   /* this is called too early on ARM 32bit to do anything usefull */
+   return ERR_PTR(-ENODEV);
 }
 
 static void exynos_iommu_release_device(struct device *dev)
@@ -1268,7 +1257,8 @@ static void exynos_iommu_release_device(struct device 
*dev)
device_link_del(data->link);
 }
 
-static int exynos_iommu_device_init(struct exynos_iommu_owner *owner)
+static int exynos_iommu_device_init(struct device *dev,
+   struct exynos_iommu_owner *owner)
 {
static u32 counter = 0;
int ret;
@@ -1287,6 +1277,12 @@ static int exynos_iommu_device_init(struct 
exynos_iommu_owner *owner)
 
iommu_device_set_ops(>iommu, _iommu_ops);
 
+   /*
+* the above iommu_device_set_ops is not enough, initializing fwspec
+* is also required
+*/
+   iommu_fwspec_init(dev, >of_node->fwnode, _iommu_ops);
+
return 0;
 }
 
@@ -1308,7 +1304,7 @@ static int exynos_owner_init(struct device *dev)
if (!owner)
return -ENOMEM;
 
-   ret = exynos_iommu_device_init(owner);
+   ret = exynos_iommu_device_init(dev, owner);
if (ret)
goto out_free_owner;
 
@@ -1330,34 +1326,51 @@ static int exynos_owner_init(struct device *dev)
return ret;
 }
 
-static int exynos_iommu_of_xlate(struct device *dev,
-struct of_phandle_args *spec)
+static int exynos_iommu_dev_match_owner(struct device *dev, const void *data)
+{
+   const struct device *sysmmu = data;
+   struct device_node *np;
+   int idx = 0;
+
+   do {
+   np = of_parse_phandle(dev->of_node, "iommus", idx++);
+   if (np == sysmmu->of_node)
+   return true;
+   } while (np);
+
+   return false;
+}
+
+static int exynos_iommu_initialize_owner(struct device *sysmmu)
 {
-   struct platform_device *sysmmu = of_find_device_by_node(spec->np);
-   struct sysmmu_drvdata *data, *entry;
+   struct sysmmu_drvdata *data = dev_get_drvdata(sysmmu);
struct exynos_iommu_owner *owner;
+   struct device *dev;
int ret;
 
-   if (!sysmmu)
+   dev = bus_find_device(_bus_type, NULL, sysmmu,
+ exynos_iommu_dev_match_owner);
+   if (!dev)
return -ENODEV;
 
-   data = platform_get_drvdata(sysmmu);
-   if (!data)
-   return -ENODEV;
+   dev_info(sysmmu, "found master device %s\n", dev_name(dev));
 
ret = exynos_owner_init(dev);
if (ret)
return ret;
 
owner = dev->archdata.iommu;
-
-   list_for_each_entry(entry, >controllers, owner_node)
-   if (entry == data)
-   return 0;
-

Re: [PATCH 0/2] iommu: Remove iommu_sva_ops::mm_exit()

2020-04-09 Thread Jacob Pan
On Thu, 9 Apr 2020 08:39:05 +0200
Jean-Philippe Brucker  wrote:

> On Wed, Apr 08, 2020 at 04:48:02PM -0700, Jacob Pan wrote:
> > On Wed, 8 Apr 2020 19:32:18 -0300
> > Jason Gunthorpe  wrote:
> >   
> > > On Wed, Apr 08, 2020 at 02:35:52PM -0700, Jacob Pan wrote:  
> > > > > On Wed, Apr 08, 2020 at 11:35:52AM -0700, Jacob Pan wrote:
> > > > > > Hi Jean,
> > > > > > 
> > > > > > On Wed,  8 Apr 2020 16:04:25 +0200
> > > > > > Jean-Philippe Brucker  wrote:
> > > > > >   
> > > > > > > The IOMMU SVA API currently requires device drivers to
> > > > > > > implement an mm_exit() callback, which stops device jobs
> > > > > > > that do DMA. This function is called in the release() MMU
> > > > > > > notifier, when an address space that is shared with a
> > > > > > > device exits.
> > > > > > > 
> > > > > > > It has been noted several time during discussions about
> > > > > > > SVA that cancelling DMA jobs can be slow and complex, and
> > > > > > > doing it in the release() notifier might cause
> > > > > > > synchronization issues (patch 2 has more background).
> > > > > > > Device drivers must in any case call unbind() to remove
> > > > > > > their bond, after stopping DMA from a more favorable
> > > > > > > context (release of a file descriptor).
> > > > > > > 
> > > > > > > So after mm exits, rather than notifying device drivers,
> > > > > > > we can hold on to the PASID until unbind(), ask IOMMU
> > > > > > > drivers to silently abort DMA and Page Requests in the
> > > > > > > meantime. This change should relieve the mmput()
> > > > > > > path.  
> > > > > >
> > > > > > I assume mm is destroyed after all the FDs are closed  
> > > > > 
> > > > > FDs do not hold a mmget(), but they may hold a mmgrab(), ie
> > > > > anything using mmu_notifiers has to hold a grab until the
> > > > > notifier is destroyed, which is often triggered by FD close.
> > > > > 
> > > > Sorry, I don't get this. Are you saying we have to hold a
> > > > mmgrab() between svm_bind/mmu_notifier_register and
> > > > svm_unbind/mmu_notifier_unregister?
> > > 
> > > Yes. This is done automatically for the caller inside the
> > > mmu_notifier implementation. We now even store the mm_struct
> > > pointer inside the notifier.
> > > 
> > > Once a notifier is registered the mm_struct remains valid memory
> > > until the notifier is unregistered.
> > >   
> > > > Isn't the idea of mmu_notifier is to avoid holding the mm
> > > > reference and rely on the notifier to tell us when mm is going
> > > > away?
> > > 
> > > The notifier only holds a mmgrab(), not a mmget() - this allows
> > > exit_mmap to proceed, but the mm_struct memory remains.
> > > 
> > > This is also probably why it is a bad idea to tie the lifetime of
> > > something like a pasid to the mmdrop as a evil user could cause a
> > > large number of mm structs to be released but not freed, probably
> > > defeating cgroup limits and so forth (not sure)
> > >   
> > > > It seems both Intel and AMD iommu drivers don't hold mmgrab
> > > > after mmu_notifier_register.
> > > 
> > > It is done internally to the implementation.
> > >   
> > > > > So the exit_mmap() -> release() may happen before the FDs are
> > > > > destroyed, but the final mmdrop() will be during some FD
> > > > > release when the final mmdrop() happens.
> > > > 
> > > > Do you mean mmdrop() is after FD release? 
> > > 
> > > Yes, it will be done by the mmu_notifier_unregister(), which
> > > should be called during FD release if the iommu lifetime is
> > > linked to some FD. 
> > > > If so, unbind is called in FD release should take care of
> > > > everything, i.e. stops DMA, clear PASID context on IOMMU, flush
> > > > PRS queue etc.
> > > 
> > > Yes, this is the proper way, when the DMA is stopped and no use
> > > of the PASID remains then you can drop the mmu notifier and
> > > release the PASID entirely. If that is linked to the lifetime of
> > > the FD then forget completely about the mm_struct lifetime, it
> > > doesn't matter.. 
> > Got everything above, thanks a lot.
> > 
> > If everything is in order with the FD close. Why do we need to 
> > "ask IOMMU drivers to silently abort DMA and Page Requests in the
> > meantime." in mm_exit notifier? This will be done orderly in unbind
> > anyway.  
> 
> When the process is killed, mm release can happen before fds are
> released. If you look at do_exit() in kernel/exit.c:
> 
>   exit_mm()
> mmput()
>  -> mmu release notifier  
>   ...
>   exit_files()
> close_files()
>   fput()
>   exit_task_work()
> __fput()
>  -> unbind()  
> 
So unbind is coming anyway, the difference in handling in mmu release
notifier is whether we silently drop DMA fault vs. reporting fault?
If a process crash during unbind, something already went seriously
wrong, DMA fault is expected.
I think having some error indication is useful, compared to "silently
drop"

Thanks,

Jacob

> Thanks,
> Jean
> 
> >   

Re: [PATCH] iommu/exynos: Get rid of 'struct exynos_iommu_owner' exynos_iommu_owner

2020-04-09 Thread Marek Szyprowski
Hi Joerg,

On 09.04.2020 13:46, Joerg Roedel wrote:
> Hi Marek,
>
> I had some more thoughts and discussions with Robin about how to make
> this work with the Exynos driver. The result is the patch below, can you
> please test it and report back? Even better if you can fix up any
> breakage it might cause :)

I've checked and it works fine on top of 
ff68eb23308e6538ec7864c83d39540f423bbe90. However I'm not a fan of 
removing this 'owner' structure. It gave a nice abstraction for the all 
SYSMMU controllers for the given device (although most devices in the 
system have only one SYSMMU). Why this structure is a problem for your 
rework?

I've also spent some time trying to fix exynos-iommu on top of your 
iommu-probe-device branch. I really wonder if it works on any ARM 32bit 
or 64bit systems for other IOMMUs.

I got something working on ARM32bit, but I have to move all the 
initialization from exynos_iommu_probe_device/exynos_iommu_of_xlate to 
exynos_sysmmu_probe(). I don't like such approach, because I had to use 
bus_find_device() and manually find the owner for the every SYSMMU 
controller in the system. This approach also lack a proper symmetry and 
release path.

The main problem after your conversion is the fact that ->probe_device() 
is called very early, before any other platform device (thus IOMMU 
controller) is is probed. It doesn't handle EPROBE_DEFER too.

The other issue I've noticed is that iommu_device_set_ops() is not 
enough to assign ops properly. I had to add iommu_fwspec_init(dev, 
>of_node->fwnode, _iommu_ops) to ensure that the 'dev' gets 
proper iommu ops.

I will send my patch in a few minutes to show you the changes.

> >From 60a288509baa34df6a0bf437c977925a0a617c72 Mon Sep 17 00:00:00 2001
> From: Joerg Roedel 
> Date: Thu, 9 Apr 2020 13:38:18 +0200
> Subject: [PATCH] iommu/exynos: Get rid of 'struct exynos_iommu_owner'
>
> Remove 'struct exynos_iommu_owner' and replace it with a single-linked
> list of 'struct sysmmu_drvdata'. The first item in the list acts as a
> replacement for the previous exynos_iommu_owner structure. The
> iommu_device member of the first list item is reported to the IOMMU
> core code for the master device.
>
> Signed-off-by: Joerg Roedel 
> ---
>   drivers/iommu/exynos-iommu.c | 155 ---
>   1 file changed, 88 insertions(+), 67 deletions(-)
>
> diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
> index 186ff5cc975c..e70eb360093f 100644
> --- a/drivers/iommu/exynos-iommu.c
> +++ b/drivers/iommu/exynos-iommu.c
> @@ -225,18 +225,6 @@ static const struct sysmmu_fault_info sysmmu_v5_faults[] 
> = {
>   { 20, REG_V5_FAULT_AW_VA, "AW SECURITY PROTECTION", IOMMU_FAULT_WRITE },
>   };
>   
> -/*
> - * This structure is attached to dev.archdata.iommu of the master device
> - * on device add, contains a list of SYSMMU controllers defined by device 
> tree,
> - * which are bound to given master device. It is usually referenced by 
> 'owner'
> - * pointer.
> -*/
> -struct exynos_iommu_owner {
> - struct list_head controllers;   /* list of sysmmu_drvdata.owner_node */
> - struct iommu_domain *domain;/* domain this device is attached */
> - struct mutex rpm_lock;  /* for runtime pm of all sysmmus */
> -};
> -
>   /*
>* This structure exynos specific generalization of struct iommu_domain.
>* It contains list of SYSMMU controllers from all master devices, which has
> @@ -271,13 +259,23 @@ struct sysmmu_drvdata {
>   bool active;/* current status */
>   struct exynos_iommu_domain *domain; /* domain we belong to */
>   struct list_head domain_node;   /* node for domain clients list */
> - struct list_head owner_node;/* node for owner controllers list */
> + struct sysmmu_drvdata *next;/* Single-linked list to group SMMUs for
> +one master. NULL means not in any
> +list, ERR_PTR(-ENODEV) means end of
> +list */
> + struct mutex rpm_lock;  /* for runtime pm of all sysmmus */
>   phys_addr_t pgtable;/* assigned page table structure */
>   unsigned int version;   /* our version */
>   
>   struct iommu_device iommu;  /* IOMMU core handle */
>   };
>   
> +/* Helper to iterate over all SYSMMUs for a given platform device */
> +#define for_each_sysmmu(dev, drvdata)\
> + for (drvdata = (dev)->archdata.iommu;   \
> +  drvdata != ERR_PTR(-ENODEV);   \
> +  drvdata = drvdata->next)
> +
>   static struct exynos_iommu_domain *to_exynos_domain(struct iommu_domain 
> *dom)
>   {
>   return container_of(dom, struct exynos_iommu_domain, domain);
> @@ -624,6 +622,8 @@ static int exynos_sysmmu_probe(struct platform_device 
> *pdev)
>   
>   data->sysmmu = dev;
>   spin_lock_init(>lock);
> + data->next = NULL;

RE: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to userspace

2020-04-09 Thread Liu, Yi L
Hi Jean,

> From: Jean-Philippe Brucker 
> Sent: Thursday, April 9, 2020 4:15 PM
> Subject: Re: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to
> userspace
> 
> On Wed, Apr 08, 2020 at 12:27:58PM +0200, Auger Eric wrote:
> > Hi Yi,
> >
> > On 4/7/20 11:43 AM, Liu, Yi L wrote:
> > > Hi Jean,
> > >
> > >> From: Jean-Philippe Brucker 
> > >> Sent: Friday, April 3, 2020 4:23 PM
> > >> To: Auger Eric 
> > >> userspace
> > >>
> > >> On Wed, Apr 01, 2020 at 03:01:12PM +0200, Auger Eric wrote:
> > >>  header = vfio_info_cap_add(caps, sizeof(*nesting_cap),
> > >>
> VFIO_IOMMU_TYPE1_INFO_CAP_NESTING, 1);
> > >> @@ -2254,6 +2309,7
> > >> @@ static int vfio_iommu_info_add_nesting_cap(struct
> > > vfio_iommu *iommu,
> > >>  /* nesting iommu type supports PASID requests 
> > >> (alloc/free)
> */
> > >>  nesting_cap->nesting_capabilities |=
> VFIO_IOMMU_PASID_REQS;
> > > What is the meaning for ARM?
> > 
> >  I think it's just a software capability exposed to userspace, on
> >  userspace side, it has a choice to use it or not. :-) The reason
> >  define it and report it in cap nesting is that I'd like to make the
> >  pasid alloc/free be available just for IOMMU with type
> >  VFIO_IOMMU_TYPE1_NESTING. Please feel free tell me if it is not good
> >  for ARM. We can find a proper way to report the availability.
> > >>>
> > >>> Well it is more a question for jean-Philippe. Do we have a system wide
> > >>> PASID allocation on ARM?
> > >>
> > >> We don't, the PASID spaces are per-VM on Arm, so this function should 
> > >> consult
> the
> > >> IOMMU driver before setting flags. As you said on patch 3, nested doesn't
> > >> necessarily imply PASID support. The SMMUv2 does not support PASID but 
> > >> does
> > >> support nesting stages 1 and 2 for the IOVA space.
> > >> SMMUv3 support of PASID depends on HW capabilities. So I think this 
> > >> needs to
> be
> > >> finer grained:
> > >>
> > >> Does the container support:
> > >> * VFIO_IOMMU_PASID_REQUEST?
> > >>   -> Yes for VT-d 3
> > >>   -> No for Arm SMMU
> > >> * VFIO_IOMMU_{,UN}BIND_GUEST_PGTBL?
> > >>   -> Yes for VT-d 3
> > >>   -> Sometimes for SMMUv2
> > >>   -> No for SMMUv3 (if we go with BIND_PASID_TABLE, which is simpler due 
> > >> to
> > >>  PASID tables being in GPA space.)
> > >> * VFIO_IOMMU_BIND_PASID_TABLE?
> > >>   -> No for VT-d
> > >>   -> Sometimes for SMMUv3
> > >>
> > >> Any bind support implies VFIO_IOMMU_CACHE_INVALIDATE support.
> > >
> > > good summary. do you expect to see any
> > >
> > >>
> > >> +nesting_cap->stage1_formats = formats;
> > > as spotted by Kevin, since a single format is supported, rename
> > 
> >  ok, I was believing it may be possible on ARM or so. :-) will rename
> >  it.
> > >>
> > >> Yes I don't think an u32 is going to cut it for Arm :( We need to 
> > >> describe all sorts
> of
> > >> capabilities for page and PASID tables (granules, GPA size, ASID/PASID 
> > >> size, HW
> > >> access/dirty, etc etc.) Just saying "Arm stage-1 format" wouldn't mean 
> > >> much. I
> > >> guess we could have a secondary vendor capability for these?
> > >
> > > Actually, I'm wondering if we can define some formats to stands for a set 
> > > of
> > > capabilities. e.g. VTD_STAGE1_FORMAT_V1 which may indicates the 1st level
> > > page table related caps (aw, a/d, SRE, EA and etc.). And vIOMMU can parse
> > > the capabilities.
> >
> > But eventually do we really need all those capability getters? I mean
> > can't we simply rely on the actual call to VFIO_IOMMU_BIND_GUEST_PGTBL()
> > to detect any mismatch? Definitively the error handling may be heavier
> > on userspace but can't we manage.
> 
> I think we need to present these capabilities at boot time, long before
> the guest triggers a bind(). For example if the host SMMU doesn't support
> 16-bit ASID, we need to communicate that to the guest using vSMMU ID
> registers or PROBE properties. Otherwise a bind() will succeed, but if the
> guest uses 16-bit ASIDs in its CD, DMA will result in C_BAD_CD events
> which we'll inject into the guest, for no apparent reason from their
> perspective.
> 
> In addition some VMMs may have fallbacks if shared page tables are not
> available. They could fall back to a MAP/UNMAP interface, or simply not
> present a vIOMMU to the guest.
> 

Based on the comments, I think it would be a need to report iommu caps
in detail. So I guess iommu uapi needs to provide something alike vfio
cap chain in iommu uapi. Please feel free let me know your thoughts. :-)

In vfio, we can define a cap as below:

struct vfio_iommu_type1_info_cap_nesting {
struct  vfio_info_cap_header header;
__u64   iommu_model;
#define VFIO_IOMMU_PASID_REQS   (1 << 0)
#define VFIO_IOMMU_BIND_GPASID  (1 << 1)
#define VFIO_IOMMU_CACHE_INV(1 << 2)
__u32   nesting_capabilities;
__u32   pasid_bits;
#define 

Re: [PATCH 0/2] iommu: Remove iommu_sva_ops::mm_exit()

2020-04-09 Thread Jason Gunthorpe
On Wed, Apr 08, 2020 at 04:49:01PM -0700, Fenghua Yu wrote:

> > > Isn't the idea of mmu_notifier is to avoid holding the mm reference and
> > > rely on the notifier to tell us when mm is going away?
> > 
> > The notifier only holds a mmgrab(), not a mmget() - this allows
> > exit_mmap to proceed, but the mm_struct memory remains.
> > 
> > This is also probably why it is a bad idea to tie the lifetime of
> > something like a pasid to the mmdrop as a evil user could cause a
> > large number of mm structs to be released but not freed, probably
> > defeating cgroup limits and so forth (not sure)
> 
> The max number of processes can be limited for a user. PASID is per
> address space so the max number of PASID can be limited for the user.
> So the user cannot exhaust PASID so easily, right?

With the patch Jacob pointed to the PASID lifetime is tied to mmdrop,
and I think (but did not check) that the cgroup accounting happens
before mmdrop.

> Binding the PASID to the mm and freeing the PASID in __mmdrop() can get
> ride of the complexity.

This is a much more reasonable explanation and should be in the patch
commit instead of what is there.

However, it still seems unnecessary to reach for arch code - the
singleton notifier can be arranged to live until exit_mmap or fd
release, whichever is longer by putting a mmu_notififer_put() in the
release() method

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 0/2] iommu: Remove iommu_sva_ops::mm_exit()

2020-04-09 Thread Jason Gunthorpe
On Wed, Apr 08, 2020 at 04:48:02PM -0700, Jacob Pan wrote:
> > Yes, this is the proper way, when the DMA is stopped and no use of the
> > PASID remains then you can drop the mmu notifier and release the PASID
> > entirely. If that is linked to the lifetime of the FD then forget
> > completely about the mm_struct lifetime, it doesn't matter..
> > 
> Got everything above, thanks a lot.
> 
> If everything is in order with the FD close. Why do we need to 
> "ask IOMMU drivers to silently abort DMA and Page Requests in the
> meantime." in mm_exit notifier? This will be done orderly in unbind
> anyway.

I thought this is exactly what Jean-Phillippe is removing here, it is
a bad idea for the reasons he explained.

> > > Enforcing unbind upon FD close might be a precarious path, perhaps
> > > that is why we have to deal with out of order situation?  
> > 
> > How so? You just put it in the FD release function :)
> 
> I was thinking some driver may choose to defer unbind in some workqueue
> etc.

Doesn't really change anything, the lifetime of the PASID wouuld be
the lifetime of the notifier and the bind, and DMA can't continue
without the notifier registered.

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: "DMAR hardware is malfunctioning" error when powering off

2020-04-09 Thread Kenneth R. Crudup


BTW, I normally don't run with "intel_iommu=on" (but I do have "CONFIG_IRQ_REMAP
turned on), as I figure that if I'm a single-user laptop and my only VM is
VMWare (running Win10 guests), and I only use my Thunderbolt ports for my own
docks, that I really don't need an IOMMU anyway- but is there a benefit to
having the IOMMU turned on (were it to work, that is) in my situation?

-Kenny

-- 
Kenneth R. Crudup  Sr. SW Engineer, Scott County Consulting, Silicon Valley
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 19/28] gpu/drm: remove the powerpc hack in drm_legacy_sg_alloc

2020-04-09 Thread Gerhard Pircher
Am 09.04.20 um 10:54 schrieb Benjamin Herrenschmidt:
> On Wed, 2020-04-08 at 14:25 +0200, Daniel Vetter wrote:
>> On Wed, Apr 08, 2020 at 01:59:17PM +0200, Christoph Hellwig wrote:
>>> If this code was broken for non-coherent caches a crude powerpc hack
>>> isn't going to help anyone else.  Remove the hack as it is the last
>>> user of __vmalloc passing a page protection flag other than PAGE_KERNEL.
>>
>> Well Ben added this to make stuff work on ppc, ofc the home grown dma
>> layer in drm from back then isn't going to work in other places. I guess
>> should have at least an ack from him, in case anyone still cares about
>> this on ppc. Adding Ben to cc.
>
> This was due to some drivers (radeon ?) trying to use vmalloc pages for
> coherent DMA, which means on those 4xx powerpc's need to be non-cached.
>
> There were machines using that (440 based iirc), though I honestly
> can't tell if anybody still uses any of it.
The first-gen amigaone platform (6xx/book32s) uses the radeon driver
together with non-coherent DMA. However this only ever worked reliably
for DRI1.

br,
Gerhard

> Cheers,
> Ben.
>
>> -Daniel
>>
>>>
>>> Signed-off-by: Christoph Hellwig 
>>> ---
>>>  drivers/gpu/drm/drm_scatter.c | 11 +--
>>>  1 file changed, 1 insertion(+), 10 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/drm_scatter.c b/drivers/gpu/drm/drm_scatter.c
>>> index ca520028b2cb..f4e6184d1877 100644
>>> --- a/drivers/gpu/drm/drm_scatter.c
>>> +++ b/drivers/gpu/drm/drm_scatter.c
>>> @@ -43,15 +43,6 @@
>>>
>>>  #define DEBUG_SCATTER 0
>>>
>>> -static inline void *drm_vmalloc_dma(unsigned long size)
>>> -{
>>> -#if defined(__powerpc__) && defined(CONFIG_NOT_COHERENT_CACHE)
>>> -   return __vmalloc(size, GFP_KERNEL, pgprot_noncached_wc(PAGE_KERNEL));
>>> -#else
>>> -   return vmalloc_32(size);
>>> -#endif
>>> -}
>>> -
>>>  static void drm_sg_cleanup(struct drm_sg_mem * entry)
>>>  {
>>> struct page *page;
>>> @@ -126,7 +117,7 @@ int drm_legacy_sg_alloc(struct drm_device *dev, void 
>>> *data,
>>> return -ENOMEM;
>>> }
>>>
>>> -   entry->virtual = drm_vmalloc_dma(pages << PAGE_SHIFT);
>>> +   entry->virtual = vmalloc_32(pages << PAGE_SHIFT);
>>> if (!entry->virtual) {
>>> kfree(entry->busaddr);
>>> kfree(entry->pagelist);
>>> --
>>> 2.25.1
>>>
>>
>>
>

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH] iommu/exynos: Get rid of 'struct exynos_iommu_owner' exynos_iommu_owner

2020-04-09 Thread Joerg Roedel
Hi Marek,

I had some more thoughts and discussions with Robin about how to make
this work with the Exynos driver. The result is the patch below, can you
please test it and report back? Even better if you can fix up any
breakage it might cause :)

>From 60a288509baa34df6a0bf437c977925a0a617c72 Mon Sep 17 00:00:00 2001
From: Joerg Roedel 
Date: Thu, 9 Apr 2020 13:38:18 +0200
Subject: [PATCH] iommu/exynos: Get rid of 'struct exynos_iommu_owner'

Remove 'struct exynos_iommu_owner' and replace it with a single-linked
list of 'struct sysmmu_drvdata'. The first item in the list acts as a
replacement for the previous exynos_iommu_owner structure. The
iommu_device member of the first list item is reported to the IOMMU
core code for the master device.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/exynos-iommu.c | 155 ---
 1 file changed, 88 insertions(+), 67 deletions(-)

diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index 186ff5cc975c..e70eb360093f 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -225,18 +225,6 @@ static const struct sysmmu_fault_info sysmmu_v5_faults[] = 
{
{ 20, REG_V5_FAULT_AW_VA, "AW SECURITY PROTECTION", IOMMU_FAULT_WRITE },
 };
 
-/*
- * This structure is attached to dev.archdata.iommu of the master device
- * on device add, contains a list of SYSMMU controllers defined by device tree,
- * which are bound to given master device. It is usually referenced by 'owner'
- * pointer.
-*/
-struct exynos_iommu_owner {
-   struct list_head controllers;   /* list of sysmmu_drvdata.owner_node */
-   struct iommu_domain *domain;/* domain this device is attached */
-   struct mutex rpm_lock;  /* for runtime pm of all sysmmus */
-};
-
 /*
  * This structure exynos specific generalization of struct iommu_domain.
  * It contains list of SYSMMU controllers from all master devices, which has
@@ -271,13 +259,23 @@ struct sysmmu_drvdata {
bool active;/* current status */
struct exynos_iommu_domain *domain; /* domain we belong to */
struct list_head domain_node;   /* node for domain clients list */
-   struct list_head owner_node;/* node for owner controllers list */
+   struct sysmmu_drvdata *next;/* Single-linked list to group SMMUs for
+  one master. NULL means not in any
+  list, ERR_PTR(-ENODEV) means end of
+  list */
+   struct mutex rpm_lock;  /* for runtime pm of all sysmmus */
phys_addr_t pgtable;/* assigned page table structure */
unsigned int version;   /* our version */
 
struct iommu_device iommu;  /* IOMMU core handle */
 };
 
+/* Helper to iterate over all SYSMMUs for a given platform device */
+#define for_each_sysmmu(dev, drvdata)  \
+   for (drvdata = (dev)->archdata.iommu;   \
+drvdata != ERR_PTR(-ENODEV);   \
+drvdata = drvdata->next)
+
 static struct exynos_iommu_domain *to_exynos_domain(struct iommu_domain *dom)
 {
return container_of(dom, struct exynos_iommu_domain, domain);
@@ -624,6 +622,8 @@ static int exynos_sysmmu_probe(struct platform_device *pdev)
 
data->sysmmu = dev;
spin_lock_init(>lock);
+   data->next = NULL;
+   mutex_init(>rpm_lock);
 
ret = iommu_device_sysfs_add(>iommu, >dev, NULL,
 dev_name(data->sysmmu));
@@ -668,17 +668,20 @@ static int __maybe_unused exynos_sysmmu_suspend(struct 
device *dev)
 {
struct sysmmu_drvdata *data = dev_get_drvdata(dev);
struct device *master = data->master;
+   struct sysmmu_drvdata *master_data;
 
-   if (master) {
-   struct exynos_iommu_owner *owner = master->archdata.iommu;
+   if (!master)
+   return 0;
 
-   mutex_lock(>rpm_lock);
-   if (data->domain) {
-   dev_dbg(data->sysmmu, "saving state\n");
-   __sysmmu_disable(data);
-   }
-   mutex_unlock(>rpm_lock);
+   master_data = master->archdata.iommu;
+
+   mutex_lock(_data->rpm_lock);
+   if (data->domain) {
+   dev_dbg(data->sysmmu, "saving state\n");
+   __sysmmu_disable(data);
}
+   mutex_unlock(_data->rpm_lock);
+
return 0;
 }
 
@@ -686,17 +689,20 @@ static int __maybe_unused exynos_sysmmu_resume(struct 
device *dev)
 {
struct sysmmu_drvdata *data = dev_get_drvdata(dev);
struct device *master = data->master;
+   struct sysmmu_drvdata *master_data;
 
-   if (master) {
-   struct exynos_iommu_owner *owner = master->archdata.iommu;
+   if (!master)
+   return 0;
 
-   mutex_lock(>rpm_lock);
-   if (data->domain) {
-

Re: [PATCH v11 10/10] iommu/vt-d: Add custom allocator for IOASID

2020-04-09 Thread Auger Eric
Hi Jacob,

On 4/3/20 8:42 PM, Jacob Pan wrote:
> When VT-d driver runs in the guest, PASID allocation must be
> performed via virtual command interface. This patch registers a
> custom IOASID allocator which takes precedence over the default
> XArray based allocator. The resulting IOASID allocation will always
> come from the host. This ensures that PASID namespace is system-
> wide.
> 
> Signed-off-by: Lu Baolu 
> Signed-off-by: Liu, Yi L 
> Signed-off-by: Jacob Pan 
> ---
>  drivers/iommu/intel-iommu.c | 84 
> +
>  include/linux/intel-iommu.h |  2 ++
>  2 files changed, 86 insertions(+)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 045c5c08d71d..ff3f0386951f 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -1732,6 +1732,9 @@ static void free_dmar_iommu(struct intel_iommu *iommu)
>   if (ecap_prs(iommu->ecap))
>   intel_svm_finish_prq(iommu);
>   }
> + if (ecap_vcs(iommu->ecap) && vccap_pasid(iommu->vccap))
> + ioasid_unregister_allocator(>pasid_allocator);
> +
>  #endif
>  }
>  
> @@ -3266,6 +3269,84 @@ static int copy_translation_tables(struct intel_iommu 
> *iommu)
>   return ret;
>  }
>  
> +#ifdef CONFIG_INTEL_IOMMU_SVM
> +static ioasid_t intel_vcmd_ioasid_alloc(ioasid_t min, ioasid_t max, void 
> *data)
> +{
> + struct intel_iommu *iommu = data;
> + ioasid_t ioasid;
> +
> + if (!iommu)
> + return INVALID_IOASID;
> + /*
> +  * VT-d virtual command interface always uses the full 20 bit
> +  * PASID range. Host can partition guest PASID range based on
> +  * policies but it is out of guest's control.
> +  */
> + if (min < PASID_MIN || max > intel_pasid_max_id)
> + return INVALID_IOASID;
> +
> + if (vcmd_alloc_pasid(iommu, ))
> + return INVALID_IOASID;
> +
> + return ioasid;
> +}
> +
> +static void intel_vcmd_ioasid_free(ioasid_t ioasid, void *data)
> +{
> + struct intel_iommu *iommu = data;
> +
> + if (!iommu)
> + return;
> + /*
> +  * Sanity check the ioasid owner is done at upper layer, e.g. VFIO
> +  * We can only free the PASID when all the devices are unbound.
> +  */
> + if (ioasid_find(NULL, ioasid, NULL)) {
> + pr_alert("Cannot free active IOASID %d\n", ioasid);
> + return;
> + }
> + vcmd_free_pasid(iommu, ioasid);
> +}
> +
> +static void register_pasid_allocator(struct intel_iommu *iommu)
> +{
> + /*
> +  * If we are running in the host, no need for custom allocator
> +  * in that PASIDs are allocated from the host system-wide.
> +  */
> + if (!cap_caching_mode(iommu->cap))
> + return;
> +
> + if (!sm_supported(iommu)) {
> + pr_warn("VT-d Scalable Mode not enabled, no PASID 
> allocation\n");
> + return;
> + }
> +
> + /*
> +  * Register a custom PASID allocator if we are running in a guest,
> +  * guest PASID must be obtained via virtual command interface.
> +  * There can be multiple vIOMMUs in each guest but only one allocator
> +  * is active. All vIOMMU allocators will eventually be calling the same
> +  * host allocator.
> +  */
nit: I prefer
if (!ecap_vcs(iommu->ecap) || !vccap_pasid(iommu->vccap))
returns;
as it removes indents.

> + if (ecap_vcs(iommu->ecap) && vccap_pasid(iommu->vccap)) {
> + pr_info("Register custom PASID allocator\n");
> + iommu->pasid_allocator.alloc = intel_vcmd_ioasid_alloc;
> + iommu->pasid_allocator.free = intel_vcmd_ioasid_free;
> + iommu->pasid_allocator.pdata = (void *)iommu;
> + if (ioasid_register_allocator(>pasid_allocator)) {
> + pr_warn("Custom PASID allocator failed, scalable mode 
> disabled\n");
> + /*
> +  * Disable scalable mode on this IOMMU if there
> +  * is no custom allocator. Mixing SM capable vIOMMU
> +  * and non-SM vIOMMU are not supported.
> +  */
> + intel_iommu_sm = 0;
> + }
> + }
> +}
> +#endif
> +
>  static int __init init_dmars(void)
>  {
>   struct dmar_drhd_unit *drhd;
> @@ -3383,6 +3464,9 @@ static int __init init_dmars(void)
>*/
>   for_each_active_iommu(iommu, drhd) {
>   iommu_flush_write_buffer(iommu);
> +#ifdef CONFIG_INTEL_IOMMU_SVM
> + register_pasid_allocator(iommu);
> +#endif
>   iommu_set_root_entry(iommu);
>   iommu->flush.flush_context(iommu, 0, 0, 0, 
> DMA_CCMD_GLOBAL_INVL);
>   iommu->flush.flush_iotlb(iommu, 0, 0, 0, DMA_TLB_GLOBAL_FLUSH);
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index f652db3198d9..e122cb30388e 100644
> --- a/include/linux/intel-iommu.h
> 

Re: [PATCH v11 08/10] iommu/vt-d: Cache virtual command capability register

2020-04-09 Thread Auger Eric
Hi Jacob,

On 4/3/20 8:42 PM, Jacob Pan wrote:
> Virtual command registers are used in the guest only, to prevent
> vmexit cost, we cache the capability and store it during initialization.
> 
> Signed-off-by: Jacob Pan 
> Reviewed-by: Eric Auger 
> Reviewed-by: Lu Baolu 
nit: following Joerg's comment on 02/10, may be worth squashing it into
the patch that uses it (10/10)

Thanks

Eric
> 
> ---
> v7 Reviewed by Eric & Baolu
> ---
> 
> Signed-off-by: Jacob Pan 
> ---
>  drivers/iommu/dmar.c| 1 +
>  include/linux/intel-iommu.h | 5 +
>  2 files changed, 6 insertions(+)
> 
> diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
> index 4d6b7b5b37ee..3b36491c8bbb 100644
> --- a/drivers/iommu/dmar.c
> +++ b/drivers/iommu/dmar.c
> @@ -963,6 +963,7 @@ static int map_iommu(struct intel_iommu *iommu, u64 
> phys_addr)
>   warn_invalid_dmar(phys_addr, " returns all ones");
>   goto unmap;
>   }
> + iommu->vccap = dmar_readq(iommu->reg + DMAR_VCCAP_REG);
>  
>   /* the registers might be more than one page */
>   map_size = max_t(int, ecap_max_iotlb_offset(iommu->ecap),
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index f775bb825343..e02a96848586 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -194,6 +194,9 @@
>  #define ecap_max_handle_mask(e) ((e >> 20) & 0xf)
>  #define ecap_sc_support(e)   ((e >> 7) & 0x1) /* Snooping Control */
>  
> +/* Virtual command interface capability */
> +#define vccap_pasid(v)   ((v & DMA_VCS_PAS)) /* PASID allocation 
> */
> +
>  /* IOTLB_REG */
>  #define DMA_TLB_FLUSH_GRANU_OFFSET  60
>  #define DMA_TLB_GLOBAL_FLUSH (((u64)1) << 60)
> @@ -287,6 +290,7 @@
>  
>  /* PRS_REG */
>  #define DMA_PRS_PPR  ((u32)1)
> +#define DMA_VCS_PAS  ((u64)1)
>  
>  #define IOMMU_WAIT_OP(iommu, offset, op, cond, sts)  \
>  do { \
> @@ -562,6 +566,7 @@ struct intel_iommu {
>   u64 reg_size; /* size of hw register set */
>   u64 cap;
>   u64 ecap;
> + u64 vccap;
>   u32 gcmd; /* Holds TE, EAFL. Don't need SRTP, SFL, WBF */
>   raw_spinlock_t  register_lock; /* protect register handling */
>   int seq_id; /* sequence id of the iommu */
> 

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/2] uacce: Remove mm_exit() op

2020-04-09 Thread Jean-Philippe Brucker
On Thu, Apr 09, 2020 at 05:07:34PM +0800, Zhangfei Gao wrote:
> 
> 
> On 2020/4/8 下午10:04, Jean-Philippe Brucker wrote:
> > The mm_exit() op will be removed from the SVA API. When a process dies
> > and its mm goes away, the IOMMU driver won't notify device drivers
> > anymore. Drivers should expect to handle a lot more aborted DMA. On the
> > upside, it does greatly simplify the queue management.
> > 
> > The uacce_mm struct, that tracks all queues bound to an mm, was only
> > used by the mm_exit() callback. Remove it.
> > 
> > Signed-off-by: Jean-Philippe Brucker 
> Thanks Jean for doing this.
> 
> Tested-by: Zhangfei Gao 
> 
> Except one line.
> > -static void uacce_mm_put(struct uacce_queue *q)
> > +static void uacce_unbind_queue(struct uacce_queue *q)
> >   {
> > -   struct uacce_mm *uacce_mm = q->uacce_mm;
> > -
> > -   lockdep_assert_held(>uacce->mm_lock);
> > -
> > -   mutex_lock(_mm->lock);
> > -   list_del(>list);
> > -   mutex_unlock(_mm->lock);
> > -
> > -   if (list_empty(_mm->queues)) {
> > -   if (uacce_mm->handle)
> > -   iommu_sva_unbind_device(uacce_mm->handle);
> > -   list_del(_mm->list);
> > -   kfree(uacce_mm);
> > -   }
> > +   if (!q->handle)
> > +   return;
> > +   iommu_sva_unbind_device(q->handle);
> + q->handle = 0;
> 
> Otherwise iommu_sva_unbind_device maybe called twice.
> Since uacce_unbind_queue can be called by uacce_remove and uacce_fops_release.

Thanks, I'll add it in v2

Jean
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH 19/28] gpu/drm: remove the powerpc hack in drm_legacy_sg_alloc

2020-04-09 Thread Daniel Vetter
On Thu, Apr 9, 2020 at 10:54 AM Benjamin Herrenschmidt
 wrote:
>
> On Wed, 2020-04-08 at 14:25 +0200, Daniel Vetter wrote:
> > On Wed, Apr 08, 2020 at 01:59:17PM +0200, Christoph Hellwig wrote:
> > > If this code was broken for non-coherent caches a crude powerpc hack
> > > isn't going to help anyone else.  Remove the hack as it is the last
> > > user of __vmalloc passing a page protection flag other than PAGE_KERNEL.
> >
> > Well Ben added this to make stuff work on ppc, ofc the home grown dma
> > layer in drm from back then isn't going to work in other places. I guess
> > should have at least an ack from him, in case anyone still cares about
> > this on ppc. Adding Ben to cc.
>
> This was due to some drivers (radeon ?) trying to use vmalloc pages for
> coherent DMA, which means on those 4xx powerpc's need to be non-cached.
>
> There were machines using that (440 based iirc), though I honestly
> can't tell if anybody still uses any of it.

agp subsystem still seems to happily do that (vmalloc memory for
device access), never having been ported to dma apis (or well
converted to iommu drivers, which they kinda are really). So I think
this all still works exactly as back then, even with the kms radeon
drivers. Question really is whether we have users left, and I have no
clue about that either.

Now if these boxes didn't ever have agp then I think we can get away
with deleting this, since we've already deleted the legacy radeon
driver. And that one used vmalloc for everything. The new kms one does
use the dma-api if the gpu isn't connected through agp.
-Daniel

> Cheers,
> Ben.
>
> > -Daniel
> >
> > >
> > > Signed-off-by: Christoph Hellwig 
> > > ---
> > >  drivers/gpu/drm/drm_scatter.c | 11 +--
> > >  1 file changed, 1 insertion(+), 10 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/drm_scatter.c b/drivers/gpu/drm/drm_scatter.c
> > > index ca520028b2cb..f4e6184d1877 100644
> > > --- a/drivers/gpu/drm/drm_scatter.c
> > > +++ b/drivers/gpu/drm/drm_scatter.c
> > > @@ -43,15 +43,6 @@
> > >
> > >  #define DEBUG_SCATTER 0
> > >
> > > -static inline void *drm_vmalloc_dma(unsigned long size)
> > > -{
> > > -#if defined(__powerpc__) && defined(CONFIG_NOT_COHERENT_CACHE)
> > > -   return __vmalloc(size, GFP_KERNEL, pgprot_noncached_wc(PAGE_KERNEL));
> > > -#else
> > > -   return vmalloc_32(size);
> > > -#endif
> > > -}
> > > -
> > >  static void drm_sg_cleanup(struct drm_sg_mem * entry)
> > >  {
> > > struct page *page;
> > > @@ -126,7 +117,7 @@ int drm_legacy_sg_alloc(struct drm_device *dev, void 
> > > *data,
> > > return -ENOMEM;
> > > }
> > >
> > > -   entry->virtual = drm_vmalloc_dma(pages << PAGE_SHIFT);
> > > +   entry->virtual = vmalloc_32(pages << PAGE_SHIFT);
> > > if (!entry->virtual) {
> > > kfree(entry->busaddr);
> > > kfree(entry->pagelist);
> > > --
> > > 2.25.1
> > >
> >
> >
>


-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v1 6/8] vfio/type1: Bind guest page tables to host

2020-04-09 Thread Jean-Philippe Brucker
On Thu, Apr 09, 2020 at 09:15:29AM +, Liu, Yi L wrote:
> > From: Jean-Philippe Brucker 
> > Sent: Thursday, April 9, 2020 4:29 PM
> > To: Liu, Yi L 
> > 
> > On Tue, Apr 07, 2020 at 10:33:25AM +, Liu, Yi L wrote:
> > > Hi Jean,
> > >
> > > > From: Jean-Philippe Brucker < jean-phili...@linaro.org >
> > > > Sent: Friday, April 3, 2020 4:35 PM
> > > > Subject: Re: [PATCH v1 6/8] vfio/type1: Bind guest page tables to host
> > > >
> > > > On Thu, Apr 02, 2020 at 08:05:29AM +, Liu, Yi L wrote:
> > > > > > > > > static long vfio_iommu_type1_ioctl(void *iommu_data,
> > > > > > > > >   default:
> > > > > > > > >   return -EINVAL;
> > > > > > > > >   }
> > > > > > > > > +
> > > > > > > > > + } else if (cmd == VFIO_IOMMU_BIND) {
> > > > > > > >
> > > > > > > > BIND what? VFIO_IOMMU_BIND_PASID sounds clearer to me.
> > > > > > >
> > > > > > > Emm, it's up to the flags to indicate bind what. It was proposed 
> > > > > > > to
> > > > > > > cover the three cases below:
> > > > > > > a) BIND/UNBIND_GPASID
> > > > > > > b) BIND/UNBIND_GPASID_TABLE
> > > > > > > c) BIND/UNBIND_PROCESS
> > > > > > > 
> > > > > > > So it's called VFIO_IOMMU_BIND.
> > > > > >
> > > > > > but aren't they all about PASID related binding?
> > > > >
> > > > > yeah, I can rename it. :-)
> > > >
> > > > I don't know if anyone intends to implement it, but SMMUv2 supports
> > > > nesting translation without any PASID support. For that case the name
> > > > VFIO_IOMMU_BIND_GUEST_PGTBL without "PASID" anywhere makes more
> > sense.
> > > > Ideally we'd also use a neutral name for the IOMMU API instead of
> > > > bind_gpasid(), but that's easier to change later.
> > >
> > > I agree VFIO_IOMMU_BIND is somehow not straight-forward. Especially, it 
> > > may
> > > cause confusion when thinking about VFIO_SET_IOMMU. How about using
> > > VFIO_NESTING_IOMMU_BIND_STAGE1 to cover a) and b)? And has another
> > > VFIO_BIND_PROCESS in future for the SVA bind case.
> > 
> > I think minimizing the number of ioctls is more important than finding the
> > ideal name. VFIO_IOMMU_BIND was fine to me, but if it's too vague then
> > rename it to VFIO_IOMMU_BIND_PASID and we'll just piggy-back on it for
> > non-PASID things (they should be rare enough).
> maybe we can start with VFIO_IOMMU_BIND_PASID. Actually, there is
> also a discussion on reusing the same ioctl and vfio structure for
> pasid_alloc/free, bind/unbind_gpasid. and cache_inv. how about your
> opinion?

Merging bind with unbind and alloc with free makes sense. I'd leave at
least invalidate a separate ioctl, because alloc/bind/unbind/free are
setup functions while invalidate is a runtime thing and performance
sensitive. But I can't see a good reason not to merge them all together,
so either way is fine by me.

Thanks,
Jean
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v1 6/8] vfio/type1: Bind guest page tables to host

2020-04-09 Thread Liu, Yi L
> From: Jean-Philippe Brucker 
> Sent: Thursday, April 9, 2020 4:29 PM
> To: Liu, Yi L 
> 
> On Tue, Apr 07, 2020 at 10:33:25AM +, Liu, Yi L wrote:
> > Hi Jean,
> >
> > > From: Jean-Philippe Brucker < jean-phili...@linaro.org >
> > > Sent: Friday, April 3, 2020 4:35 PM
> > > Subject: Re: [PATCH v1 6/8] vfio/type1: Bind guest page tables to host
> > >
> > > On Thu, Apr 02, 2020 at 08:05:29AM +, Liu, Yi L wrote:
> > > > > > > > static long vfio_iommu_type1_ioctl(void *iommu_data,
> > > > > > > > default:
> > > > > > > > return -EINVAL;
> > > > > > > > }
> > > > > > > > +
> > > > > > > > +   } else if (cmd == VFIO_IOMMU_BIND) {
> > > > > > >
> > > > > > > BIND what? VFIO_IOMMU_BIND_PASID sounds clearer to me.
> > > > > >
> > > > > > Emm, it's up to the flags to indicate bind what. It was proposed to
> > > > > > cover the three cases below:
> > > > > > a) BIND/UNBIND_GPASID
> > > > > > b) BIND/UNBIND_GPASID_TABLE
> > > > > > c) BIND/UNBIND_PROCESS
> > > > > > 
> > > > > > So it's called VFIO_IOMMU_BIND.
> > > > >
> > > > > but aren't they all about PASID related binding?
> > > >
> > > > yeah, I can rename it. :-)
> > >
> > > I don't know if anyone intends to implement it, but SMMUv2 supports
> > > nesting translation without any PASID support. For that case the name
> > > VFIO_IOMMU_BIND_GUEST_PGTBL without "PASID" anywhere makes more
> sense.
> > > Ideally we'd also use a neutral name for the IOMMU API instead of
> > > bind_gpasid(), but that's easier to change later.
> >
> > I agree VFIO_IOMMU_BIND is somehow not straight-forward. Especially, it may
> > cause confusion when thinking about VFIO_SET_IOMMU. How about using
> > VFIO_NESTING_IOMMU_BIND_STAGE1 to cover a) and b)? And has another
> > VFIO_BIND_PROCESS in future for the SVA bind case.
> 
> I think minimizing the number of ioctls is more important than finding the
> ideal name. VFIO_IOMMU_BIND was fine to me, but if it's too vague then
> rename it to VFIO_IOMMU_BIND_PASID and we'll just piggy-back on it for
> non-PASID things (they should be rare enough).
maybe we can start with VFIO_IOMMU_BIND_PASID. Actually, there is
also a discussion on reusing the same ioctl and vfio structure for
pasid_alloc/free, bind/unbind_gpasid. and cache_inv. how about your
opinion?

https://lkml.org/lkml/2020/4/3/833

Regards,
Yi Liu



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 19/28] gpu/drm: remove the powerpc hack in drm_legacy_sg_alloc

2020-04-09 Thread Benjamin Herrenschmidt
On Wed, 2020-04-08 at 14:25 +0200, Daniel Vetter wrote:
> On Wed, Apr 08, 2020 at 01:59:17PM +0200, Christoph Hellwig wrote:
> > If this code was broken for non-coherent caches a crude powerpc hack
> > isn't going to help anyone else.  Remove the hack as it is the last
> > user of __vmalloc passing a page protection flag other than PAGE_KERNEL.
> 
> Well Ben added this to make stuff work on ppc, ofc the home grown dma
> layer in drm from back then isn't going to work in other places. I guess
> should have at least an ack from him, in case anyone still cares about
> this on ppc. Adding Ben to cc.

This was due to some drivers (radeon ?) trying to use vmalloc pages for
coherent DMA, which means on those 4xx powerpc's need to be non-cached.

There were machines using that (440 based iirc), though I honestly
can't tell if anybody still uses any of it.

Cheers,
Ben.

> -Daniel
> 
> > 
> > Signed-off-by: Christoph Hellwig 
> > ---
> >  drivers/gpu/drm/drm_scatter.c | 11 +--
> >  1 file changed, 1 insertion(+), 10 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/drm_scatter.c b/drivers/gpu/drm/drm_scatter.c
> > index ca520028b2cb..f4e6184d1877 100644
> > --- a/drivers/gpu/drm/drm_scatter.c
> > +++ b/drivers/gpu/drm/drm_scatter.c
> > @@ -43,15 +43,6 @@
> >  
> >  #define DEBUG_SCATTER 0
> >  
> > -static inline void *drm_vmalloc_dma(unsigned long size)
> > -{
> > -#if defined(__powerpc__) && defined(CONFIG_NOT_COHERENT_CACHE)
> > -   return __vmalloc(size, GFP_KERNEL, pgprot_noncached_wc(PAGE_KERNEL));
> > -#else
> > -   return vmalloc_32(size);
> > -#endif
> > -}
> > -
> >  static void drm_sg_cleanup(struct drm_sg_mem * entry)
> >  {
> > struct page *page;
> > @@ -126,7 +117,7 @@ int drm_legacy_sg_alloc(struct drm_device *dev, void 
> > *data,
> > return -ENOMEM;
> > }
> >  
> > -   entry->virtual = drm_vmalloc_dma(pages << PAGE_SHIFT);
> > +   entry->virtual = vmalloc_32(pages << PAGE_SHIFT);
> > if (!entry->virtual) {
> > kfree(entry->busaddr);
> > kfree(entry->pagelist);
> > -- 
> > 2.25.1
> > 
> 
> 

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/2] uacce: Remove mm_exit() op

2020-04-09 Thread Zhangfei Gao



On 2020/4/8 下午10:04, Jean-Philippe Brucker wrote:

The mm_exit() op will be removed from the SVA API. When a process dies
and its mm goes away, the IOMMU driver won't notify device drivers
anymore. Drivers should expect to handle a lot more aborted DMA. On the
upside, it does greatly simplify the queue management.

The uacce_mm struct, that tracks all queues bound to an mm, was only
used by the mm_exit() callback. Remove it.

Signed-off-by: Jean-Philippe Brucker 

Thanks Jean for doing this.

Tested-by: Zhangfei Gao 

Except one line.

-static void uacce_mm_put(struct uacce_queue *q)
+static void uacce_unbind_queue(struct uacce_queue *q)
  {
-   struct uacce_mm *uacce_mm = q->uacce_mm;
-
-   lockdep_assert_held(>uacce->mm_lock);
-
-   mutex_lock(_mm->lock);
-   list_del(>list);
-   mutex_unlock(_mm->lock);
-
-   if (list_empty(_mm->queues)) {
-   if (uacce_mm->handle)
-   iommu_sva_unbind_device(uacce_mm->handle);
-   list_del(_mm->list);
-   kfree(uacce_mm);
-   }
+   if (!q->handle)
+   return;
+   iommu_sva_unbind_device(q->handle);

+ q->handle = 0;

Otherwise iommu_sva_unbind_device maybe called twice.
Since uacce_unbind_queue can be called by uacce_remove and uacce_fops_release.

Thanks

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to userspace

2020-04-09 Thread Auger Eric
Hi Jean,

On 4/9/20 10:14 AM, Jean-Philippe Brucker wrote:
> On Wed, Apr 08, 2020 at 12:27:58PM +0200, Auger Eric wrote:
>> Hi Yi,
>>
>> On 4/7/20 11:43 AM, Liu, Yi L wrote:
>>> Hi Jean,
>>>
 From: Jean-Philippe Brucker 
 Sent: Friday, April 3, 2020 4:23 PM
 To: Auger Eric 
 userspace

 On Wed, Apr 01, 2020 at 03:01:12PM +0200, Auger Eric wrote:
header = vfio_info_cap_add(caps, sizeof(*nesting_cap),
   VFIO_IOMMU_TYPE1_INFO_CAP_NESTING, 
 1);
 @@ -2254,6 +2309,7
 @@ static int vfio_iommu_info_add_nesting_cap(struct
>>> vfio_iommu *iommu,
/* nesting iommu type supports PASID requests 
 (alloc/free) */
nesting_cap->nesting_capabilities |= 
 VFIO_IOMMU_PASID_REQS;
>>> What is the meaning for ARM?
>>
>> I think it's just a software capability exposed to userspace, on
>> userspace side, it has a choice to use it or not. :-) The reason
>> define it and report it in cap nesting is that I'd like to make the
>> pasid alloc/free be available just for IOMMU with type
>> VFIO_IOMMU_TYPE1_NESTING. Please feel free tell me if it is not good
>> for ARM. We can find a proper way to report the availability.
>
> Well it is more a question for jean-Philippe. Do we have a system wide
> PASID allocation on ARM?

 We don't, the PASID spaces are per-VM on Arm, so this function should 
 consult the
 IOMMU driver before setting flags. As you said on patch 3, nested doesn't
 necessarily imply PASID support. The SMMUv2 does not support PASID but does
 support nesting stages 1 and 2 for the IOVA space.
 SMMUv3 support of PASID depends on HW capabilities. So I think this needs 
 to be
 finer grained:

 Does the container support:
 * VFIO_IOMMU_PASID_REQUEST?
   -> Yes for VT-d 3
   -> No for Arm SMMU
 * VFIO_IOMMU_{,UN}BIND_GUEST_PGTBL?
   -> Yes for VT-d 3
   -> Sometimes for SMMUv2
   -> No for SMMUv3 (if we go with BIND_PASID_TABLE, which is simpler due to
  PASID tables being in GPA space.)
 * VFIO_IOMMU_BIND_PASID_TABLE?
   -> No for VT-d
   -> Sometimes for SMMUv3

 Any bind support implies VFIO_IOMMU_CACHE_INVALIDATE support.
>>>
>>> good summary. do you expect to see any 
>>>

 +  nesting_cap->stage1_formats = formats;
>>> as spotted by Kevin, since a single format is supported, rename
>>
>> ok, I was believing it may be possible on ARM or so. :-) will rename
>> it.

 Yes I don't think an u32 is going to cut it for Arm :( We need to describe 
 all sorts of
 capabilities for page and PASID tables (granules, GPA size, ASID/PASID 
 size, HW
 access/dirty, etc etc.) Just saying "Arm stage-1 format" wouldn't mean 
 much. I
 guess we could have a secondary vendor capability for these?
>>>
>>> Actually, I'm wondering if we can define some formats to stands for a set of
>>> capabilities. e.g. VTD_STAGE1_FORMAT_V1 which may indicates the 1st level
>>> page table related caps (aw, a/d, SRE, EA and etc.). And vIOMMU can parse
>>> the capabilities.
>>
>> But eventually do we really need all those capability getters? I mean
>> can't we simply rely on the actual call to VFIO_IOMMU_BIND_GUEST_PGTBL()
>> to detect any mismatch? Definitively the error handling may be heavier
>> on userspace but can't we manage.
> 
> I think we need to present these capabilities at boot time, long before
> the guest triggers a bind(). For example if the host SMMU doesn't support
> 16-bit ASID, we need to communicate that to the guest using vSMMU ID
> registers or PROBE properties. Otherwise a bind() will succeed, but if the
> guest uses 16-bit ASIDs in its CD, DMA will result in C_BAD_CD events
> which we'll inject into the guest, for no apparent reason from their
> perspective.
OK I understand this case as in this situation we may be able to change
the way to iommu is exposed to the guest.
> 
> In addition some VMMs may have fallbacks if shared page tables are not
> available. They could fall back to a MAP/UNMAP interface, or simply not
> present a vIOMMU to the guest.
fair enough, there is a need for such capability checker in the mid
term. But this patch introduces the capability to check whether system
wide PASID alloc is supported and this may not be requested at that
stage for the whole vSVM integration?

Thanks

Eric
> 
> Thanks,
> Jean
> 
>> My fear is we end up with an overly
>> complex series. This capability getter may be interesting if we can
>> switch to a fallback implementation but here I guess we don't have any
>> fallback. With smmuv3 nested stage we don't have any fallback solution
>> either. For the versions, it is different because the userspace shall be
>> able to adapt (or not) to the max version supported by the kernel.
>>
>> Thanks
>>
>> Eric
>>>
>>> Regards,
>>> Yi 

Re: [PATCH v11 07/10] iommu/vt-d: Add svm/sva invalidate function

2020-04-09 Thread Auger Eric
Hi Jacob,

On 4/3/20 8:42 PM, Jacob Pan wrote:
> When Shared Virtual Address (SVA) is enabled for a guest OS via
> vIOMMU, we need to provide invalidation support at IOMMU API and driver
> level. This patch adds Intel VT-d specific function to implement
> iommu passdown invalidate API for shared virtual address.
> 
> The use case is for supporting caching structure invalidation
> of assigned SVM capable devices. Emulated IOMMU exposes queue
> invalidation capability and passes down all descriptors from the guest
> to the physical IOMMU.
> 
> The assumption is that guest to host device ID mapping should be
> resolved prior to calling IOMMU driver. Based on the device handle,
> host IOMMU driver can replace certain fields before submit to the
> invalidation queue.
> 
> ---
> v11 - Removed 2D map array, use -EINVAL in granularity lookup array.
>   Fixed devTLB invalidation granularity mapping. Disregard G=1 case
>   and use address selective invalidation only.
> 
> v7 review fixed in v10
> ---
> 
> Signed-off-by: Jacob Pan 
> Signed-off-by: Ashok Raj 
> Signed-off-by: Liu, Yi L 
> ---
>  drivers/iommu/intel-iommu.c | 158 
> 
>  1 file changed, 158 insertions(+)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 94c7993dac6a..045c5c08d71d 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -5594,6 +5594,163 @@ static void intel_iommu_aux_detach_device(struct 
> iommu_domain *domain,
>   aux_domain_remove_dev(to_dmar_domain(domain), dev);
>  }
>  
> +/*
> + * 2D array for converting and sanitizing IOMMU generic TLB granularity to
> + * VT-d granularity. Invalidation is typically included in the unmap 
> operation
> + * as a result of DMA or VFIO unmap. However, for assigned devices guest
> + * owns the first level page tables. Invalidations of translation caches in 
> the
> + * guest are trapped and passed down to the host.
> + *
> + * vIOMMU in the guest will only expose first level page tables, therefore
> + * we do not support IOTLB granularity for request without PASID (second 
> level).
> + *
> + * For example, to find the VT-d granularity encoding for IOTLB
> + * type and page selective granularity within PASID:
> + * X: indexed by iommu cache type
> + * Y: indexed by enum iommu_inv_granularity
> + * [IOMMU_CACHE_INV_TYPE_IOTLB][IOMMU_INV_GRANU_ADDR]
> + */
> +
> +const static int 
> inv_type_granu_table[IOMMU_CACHE_INV_TYPE_NR][IOMMU_INV_GRANU_NR] = {
> + /*
> +  * PASID based IOTLB invalidation: PASID selective (per PASID),
> +  * page selective (address granularity)
> +  */
> + {-EINVAL, QI_GRAN_NONG_PASID, QI_GRAN_PSI_PASID},
> + /* PASID based dev TLBs */
> + {-EINVAL, -EINVAL, QI_DEV_IOTLB_GRAN_PASID_SEL},
> + /* PASID cache */
> + {-EINVAL, -EINVAL, -EINVAL}
> +};
> +
> +static inline int to_vtd_granularity(int type, int granu)
> +{
> + return inv_type_granu_table[type][granu];
> +}
> +
> +static inline u64 to_vtd_size(u64 granu_size, u64 nr_granules)
> +{
> + u64 nr_pages = (granu_size * nr_granules) >> VTD_PAGE_SHIFT;
> +
> + /* VT-d size is encoded as 2^size of 4K pages, 0 for 4k, 9 for 2MB, etc.
> +  * IOMMU cache invalidate API passes granu_size in bytes, and number of
> +  * granu size in contiguous memory.
> +  */
> + return order_base_2(nr_pages);
> +}
> +
> +#ifdef CONFIG_INTEL_IOMMU_SVM
> +static int intel_iommu_sva_invalidate(struct iommu_domain *domain,
> + struct device *dev, struct iommu_cache_invalidate_info 
> *inv_info)
> +{
> + struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> + struct device_domain_info *info;
> + struct intel_iommu *iommu;
> + unsigned long flags;
> + int cache_type;
> + u8 bus, devfn;
> + u16 did, sid;
> + int ret = 0;
> + u64 size = 0;
> +
> + if (!inv_info || !dmar_domain ||
> + inv_info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
> + return -EINVAL;
> +
> + if (!dev || !dev_is_pci(dev))
> + return -ENODEV;

Check (domain->flags & DOMAIN_FLAG_NESTING_MODE)?
> +
> + iommu = device_to_iommu(dev, , );
> + if (!iommu)
> + return -ENODEV;
> +
> + spin_lock_irqsave(_domain_lock, flags);
> + spin_lock(>lock);
> + info = iommu_support_dev_iotlb(dmar_domain, iommu, bus, devfn);
> + if (!info) {
> + ret = -EINVAL;
> + goto out_unlock;
> + }
> + did = dmar_domain->iommu_did[iommu->seq_id];
> + sid = PCI_DEVID(bus, devfn);
> +
> + /* Size is only valid in non-PASID selective invalidation */
> + if (inv_info->granularity != IOMMU_INV_GRANU_PASID)
> + size = to_vtd_size(inv_info->addr_info.granule_size,
> +inv_info->addr_info.nb_granules);
> +
> + for_each_set_bit(cache_type, (unsigned long *)_info->cache, 
> IOMMU_CACHE_INV_TYPE_NR) {
> + int 

Re: [PATCH v1 6/8] vfio/type1: Bind guest page tables to host

2020-04-09 Thread Jean-Philippe Brucker
On Tue, Apr 07, 2020 at 10:33:25AM +, Liu, Yi L wrote:
> Hi Jean,
> 
> > From: Jean-Philippe Brucker < jean-phili...@linaro.org >
> > Sent: Friday, April 3, 2020 4:35 PM
> > Subject: Re: [PATCH v1 6/8] vfio/type1: Bind guest page tables to host
> > 
> > On Thu, Apr 02, 2020 at 08:05:29AM +, Liu, Yi L wrote:
> > > > > > > static long vfio_iommu_type1_ioctl(void *iommu_data,
> > > > > > >   default:
> > > > > > >   return -EINVAL;
> > > > > > >   }
> > > > > > > +
> > > > > > > + } else if (cmd == VFIO_IOMMU_BIND) {
> > > > > >
> > > > > > BIND what? VFIO_IOMMU_BIND_PASID sounds clearer to me.
> > > > >
> > > > > Emm, it's up to the flags to indicate bind what. It was proposed to
> > > > > cover the three cases below:
> > > > > a) BIND/UNBIND_GPASID
> > > > > b) BIND/UNBIND_GPASID_TABLE
> > > > > c) BIND/UNBIND_PROCESS
> > > > > 
> > > > > So it's called VFIO_IOMMU_BIND.
> > > >
> > > > but aren't they all about PASID related binding?
> > >
> > > yeah, I can rename it. :-)
> > 
> > I don't know if anyone intends to implement it, but SMMUv2 supports
> > nesting translation without any PASID support. For that case the name
> > VFIO_IOMMU_BIND_GUEST_PGTBL without "PASID" anywhere makes more sense.
> > Ideally we'd also use a neutral name for the IOMMU API instead of
> > bind_gpasid(), but that's easier to change later.
> 
> I agree VFIO_IOMMU_BIND is somehow not straight-forward. Especially, it may
> cause confusion when thinking about VFIO_SET_IOMMU. How about using
> VFIO_NESTING_IOMMU_BIND_STAGE1 to cover a) and b)? And has another
> VFIO_BIND_PROCESS in future for the SVA bind case.

I think minimizing the number of ioctls is more important than finding the
ideal name. VFIO_IOMMU_BIND was fine to me, but if it's too vague then
rename it to VFIO_IOMMU_BIND_PASID and we'll just piggy-back on it for
non-PASID things (they should be rare enough).

Thanks,
Jean
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to userspace

2020-04-09 Thread Jean-Philippe Brucker
On Wed, Apr 08, 2020 at 12:27:58PM +0200, Auger Eric wrote:
> Hi Yi,
> 
> On 4/7/20 11:43 AM, Liu, Yi L wrote:
> > Hi Jean,
> > 
> >> From: Jean-Philippe Brucker 
> >> Sent: Friday, April 3, 2020 4:23 PM
> >> To: Auger Eric 
> >> userspace
> >>
> >> On Wed, Apr 01, 2020 at 03:01:12PM +0200, Auger Eric wrote:
> >>header = vfio_info_cap_add(caps, sizeof(*nesting_cap),
> >>   VFIO_IOMMU_TYPE1_INFO_CAP_NESTING, 
> >> 1);
> >> @@ -2254,6 +2309,7
> >> @@ static int vfio_iommu_info_add_nesting_cap(struct
> > vfio_iommu *iommu,
> >>/* nesting iommu type supports PASID requests 
> >> (alloc/free) */
> >>nesting_cap->nesting_capabilities |= 
> >> VFIO_IOMMU_PASID_REQS;
> > What is the meaning for ARM?
> 
>  I think it's just a software capability exposed to userspace, on
>  userspace side, it has a choice to use it or not. :-) The reason
>  define it and report it in cap nesting is that I'd like to make the
>  pasid alloc/free be available just for IOMMU with type
>  VFIO_IOMMU_TYPE1_NESTING. Please feel free tell me if it is not good
>  for ARM. We can find a proper way to report the availability.
> >>>
> >>> Well it is more a question for jean-Philippe. Do we have a system wide
> >>> PASID allocation on ARM?
> >>
> >> We don't, the PASID spaces are per-VM on Arm, so this function should 
> >> consult the
> >> IOMMU driver before setting flags. As you said on patch 3, nested doesn't
> >> necessarily imply PASID support. The SMMUv2 does not support PASID but does
> >> support nesting stages 1 and 2 for the IOVA space.
> >> SMMUv3 support of PASID depends on HW capabilities. So I think this needs 
> >> to be
> >> finer grained:
> >>
> >> Does the container support:
> >> * VFIO_IOMMU_PASID_REQUEST?
> >>   -> Yes for VT-d 3
> >>   -> No for Arm SMMU
> >> * VFIO_IOMMU_{,UN}BIND_GUEST_PGTBL?
> >>   -> Yes for VT-d 3
> >>   -> Sometimes for SMMUv2
> >>   -> No for SMMUv3 (if we go with BIND_PASID_TABLE, which is simpler due to
> >>  PASID tables being in GPA space.)
> >> * VFIO_IOMMU_BIND_PASID_TABLE?
> >>   -> No for VT-d
> >>   -> Sometimes for SMMUv3
> >>
> >> Any bind support implies VFIO_IOMMU_CACHE_INVALIDATE support.
> > 
> > good summary. do you expect to see any 
> > 
> >>
> >> +  nesting_cap->stage1_formats = formats;
> > as spotted by Kevin, since a single format is supported, rename
> 
>  ok, I was believing it may be possible on ARM or so. :-) will rename
>  it.
> >>
> >> Yes I don't think an u32 is going to cut it for Arm :( We need to describe 
> >> all sorts of
> >> capabilities for page and PASID tables (granules, GPA size, ASID/PASID 
> >> size, HW
> >> access/dirty, etc etc.) Just saying "Arm stage-1 format" wouldn't mean 
> >> much. I
> >> guess we could have a secondary vendor capability for these?
> > 
> > Actually, I'm wondering if we can define some formats to stands for a set of
> > capabilities. e.g. VTD_STAGE1_FORMAT_V1 which may indicates the 1st level
> > page table related caps (aw, a/d, SRE, EA and etc.). And vIOMMU can parse
> > the capabilities.
> 
> But eventually do we really need all those capability getters? I mean
> can't we simply rely on the actual call to VFIO_IOMMU_BIND_GUEST_PGTBL()
> to detect any mismatch? Definitively the error handling may be heavier
> on userspace but can't we manage.

I think we need to present these capabilities at boot time, long before
the guest triggers a bind(). For example if the host SMMU doesn't support
16-bit ASID, we need to communicate that to the guest using vSMMU ID
registers or PROBE properties. Otherwise a bind() will succeed, but if the
guest uses 16-bit ASIDs in its CD, DMA will result in C_BAD_CD events
which we'll inject into the guest, for no apparent reason from their
perspective.

In addition some VMMs may have fallbacks if shared page tables are not
available. They could fall back to a MAP/UNMAP interface, or simply not
present a vIOMMU to the guest.

Thanks,
Jean

> My fear is we end up with an overly
> complex series. This capability getter may be interesting if we can
> switch to a fallback implementation but here I guess we don't have any
> fallback. With smmuv3 nested stage we don't have any fallback solution
> either. For the versions, it is different because the userspace shall be
> able to adapt (or not) to the max version supported by the kernel.
> 
> Thanks
> 
> Eric
> > 
> > Regards,
> > Yi Liu
> > 
> 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v11 05/10] iommu/vt-d: Add bind guest PASID support

2020-04-09 Thread Auger Eric
Hi Jacob,

On 4/3/20 8:42 PM, Jacob Pan wrote:
> When supporting guest SVA with emulated IOMMU, the guest PASID
> table is shadowed in VMM. Updates to guest vIOMMU PASID table
> will result in PASID cache flush which will be passed down to
> the host as bind guest PASID calls.
> 
> For the SL page tables, it will be harvested from device's
> default domain (request w/o PASID), or aux domain in case of
> mediated device.
> 
> .-.  .---.
> |   vIOMMU|  | Guest process CR3, FL only|
> | |  '---'
> ./
> | PASID Entry |--- PASID cache flush -
> '-'   |
> | |   V
> | |CR3 in GPA
> '-'
> Guest
> --| Shadow |--|
>   vv  v
> Host
> .-.  .--.
> |   pIOMMU|  | Bind FL for GVA-GPA  |
> | |  '--'
> ./  |
> | PASID Entry | V (Nested xlate)
> '\.--.
> | |   |SL for GPA-HPA, default domain|
> | |   '--'
> '-'
> Where:
>  - FL = First level/stage one page tables
>  - SL = Second level/stage two page tables
> 
> ---
> v11: Fixed locking, avoid duplicated paging mode check, added helper to
> free svm if device list is empty. Use rate limited error message since
> the bind gpasid call comes from user space.
> ---
> 
> Signed-off-by: Jacob Pan 
> Signed-off-by: Liu, Yi L 
> ---
>  drivers/iommu/intel-iommu.c |   4 +
>  drivers/iommu/intel-svm.c   | 206 
> 
>  include/linux/intel-iommu.h |   8 +-
>  include/linux/intel-svm.h   |  17 
>  4 files changed, 234 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index c0dadec5a6b3..94c7993dac6a 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -6178,6 +6178,10 @@ const struct iommu_ops intel_iommu_ops = {
>   .dev_disable_feat   = intel_iommu_dev_disable_feat,
>   .is_attach_deferred = intel_iommu_is_attach_deferred,
>   .pgsize_bitmap  = INTEL_IOMMU_PGSIZES,
> +#ifdef CONFIG_INTEL_IOMMU_SVM
> + .sva_bind_gpasid= intel_svm_bind_gpasid,
> + .sva_unbind_gpasid  = intel_svm_unbind_gpasid,
> +#endif
>  };
>  
>  static void quirk_iommu_igfx(struct pci_dev *dev)
> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> index d7f2a5358900..7cf711318b87 100644
> --- a/drivers/iommu/intel-svm.c
> +++ b/drivers/iommu/intel-svm.c
> @@ -226,6 +226,212 @@ static LIST_HEAD(global_svm_list);
>   list_for_each_entry((sdev), &(svm)->devs, list) \
>   if ((d) != (sdev)->dev) {} else
>  
> +
> +static inline void intel_svm_free_if_empty(struct intel_svm *svm, u64 pasid)
> +{
> + if (list_empty(>devs)) {
> + ioasid_set_data(pasid, NULL);
> + kfree(svm);
> + }
> +}
> +
> +int intel_svm_bind_gpasid(struct iommu_domain *domain,
> + struct device *dev,
> + struct iommu_gpasid_bind_data *data)
> +{
> + struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
> + struct dmar_domain *dmar_domain;
> + struct intel_svm_dev *sdev;
> + struct intel_svm *svm;
> + int ret = 0;
> +
> + if (WARN_ON(!iommu) || !data)
> + return -EINVAL;
> +
> + if (data->version != IOMMU_GPASID_BIND_VERSION_1 ||
> + data->format != IOMMU_PASID_FORMAT_INTEL_VTD)
> + return -EINVAL;
> +
> + if (dev_is_pci(dev)) {
> + /* VT-d supports devices with full 20 bit PASIDs only */
> + if (pci_max_pasids(to_pci_dev(dev)) != PASID_MAX)
> + return -EINVAL;
> + } else {
> + return -ENOTSUPP;
> + }
> +
> + /*
> +  * We only check host PASID range, we have no knowledge to check
> +  * guest PASID range.
> +  */
> + if (data->hpasid <= 0 || data->hpasid >= PASID_MAX)
> + return -EINVAL;
> +
> + dmar_domain = to_dmar_domain(domain);
> +
> + mutex_lock(_mutex);
> + svm = ioasid_find(NULL, data->hpasid, NULL);
> + if (IS_ERR(svm)) {
> + ret = PTR_ERR(svm);
> + goto out;
> + }
> +
> + if (svm) {
> + /*
> +  * If we found svm for the PASID, there must be at
> +  * least one device bond, otherwise svm should be freed.
> +  */
> + if (WARN_ON(list_empty(>devs))) {
> + ret = -EINVAL;
> + goto out;
> + }
> +
> + for_each_svm_dev(sdev, svm, dev) {
> + /* In case of multiple sub-devices of the same pdev
> 

Re: [PATCH v1 2/2] vfio/pci: Emulate PASID/PRI capability for VFs

2020-04-09 Thread Jean-Philippe Brucker
On Wed, Apr 08, 2020 at 10:19:40AM -0600, Alex Williamson wrote:
> On Tue, 7 Apr 2020 21:00:21 -0700
> "Raj, Ashok"  wrote:
> 
> > Hi Alex
> > 
> > + Bjorn
> 
>  + Don
> 
> > FWIW I can't understand why PCI SIG went different ways with ATS, 
> > where its enumerated on PF and VF. But for PASID and PRI its only
> > in PF. 
> > 
> > I'm checking with our internal SIG reps to followup on that.
> > 
> > On Tue, Apr 07, 2020 at 09:58:01AM -0600, Alex Williamson wrote:
> > > > Is there vendor guarantee that hidden registers will locate at the
> > > > same offset between PF and VF config space?   
> > > 
> > > I'm not sure if the spec really precludes hidden registers, but the
> > > fact that these registers are explicitly outside of the capability
> > > chain implies they're only intended for device specific use, so I'd say
> > > there are no guarantees about anything related to these registers.  
> > 
> > As you had suggested in the other thread, we could consider
> > using the same offset as in PF, but even that's a better guess
> > still not reliable.
> > 
> > The other option is to maybe extend driver ops in the PF to expose
> > where the offsets should be. Sort of adding the quirk in the 
> > implementation. 
> > 
> > I'm not sure how prevalent are PASID and PRI in VF devices. If SIG is 
> > resisting 
> > making VF's first class citizen, we might ask them to add some verbiage
> > to suggest leave the same offsets as PF open to help emulation software.
> 
> Even if we know where to expose these capabilities on the VF, it's not
> clear to me how we can actually virtualize the capability itself.  If
> the spec defines, for example, an enable bit as r/w then software that
> interacts with that register expects the bit is settable.  There's no
> protocol for "try to set the bit and re-read it to see if the hardware
> accepted it".  Therefore a capability with a fixed enable bit
> representing the state of the PF, not settable by the VF, is
> disingenuous to the spec.

Would it be OK to implement a lock down mechanism for the PF PASID
capability, preventing changes to the PF cap when the VF is in use by
VFIO?  The emulation would still break the spec: since the PF cap would
always be enabled the VF configuration bits would have no effect, but it
seems preferable to having the Enable bit not enable anything.

> 
> If what we're trying to do is expose that PASID and PRI are enabled on
> the PF to a VF driver, maybe duplicating the PF capabilities on the VF
> without the ability to control it is not the right approach.  Maybe we
> need new capabilities exposing these as slave features that cannot be
> controlled?  We could define our own vendor capability for this, but of
> course we have both the where to put it in config space issue, as well
> as the issue of trying to push an ad-hoc standard.  vfio could expose
> these as device features rather than emulating capabilities, but that
> still leaves a big gap between vfio in the hypervisor and the driver in
> the guest VM.  That might still help push the responsibility and policy
> for how to expose it to the VM as a userspace problem though.

Userspace might have more difficulty working around the issues mentioned
in this thread. They would still need a guarantee that the PF PASID
configuration doesn't change at runtime, and they wouldn't have the
ability to talk to a vendor driver to figure out where to place the fake
PASID capability.

Thanks,
Jean

> 
> I agree though, I don't know why the SIG would preclude implementing
> per VF control of these features.  Thanks,
> 
> Alex
> 
> > > FWIW, vfio started out being more strict about restricting config space
> > > access to defined capabilities, until...
> > > 
> > > commit a7d1ea1c11b33bda2691f3294b4d735ed635535a
> > > Author: Alex Williamson 
> > > Date:   Mon Apr 1 09:04:12 2013 -0600
> > >   
> > 
> > Cheers,
> > Ashok
> > 
> 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 17/28] mm: remove the prot argument from vm_map_ram

2020-04-09 Thread Gao Xiang via iommu
On Wed, Apr 08, 2020 at 01:59:15PM +0200, Christoph Hellwig wrote:
> This is always GFP_KERNEL - for long term mappings with other properties
> vmap should be used.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/gpu/drm/i915/gem/selftests/mock_dmabuf.c   | 2 +-
>  drivers/media/common/videobuf2/videobuf2-dma-sg.c  | 3 +--
>  drivers/media/common/videobuf2/videobuf2-vmalloc.c | 3 +--
>  fs/erofs/decompressor.c| 2 +-

For EROFS part,

Acked-by: Gao Xiang 

Thanks,
Gao Xiang

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 0/2] iommu: Remove iommu_sva_ops::mm_exit()

2020-04-09 Thread Jean-Philippe Brucker
On Wed, Apr 08, 2020 at 04:48:02PM -0700, Jacob Pan wrote:
> On Wed, 8 Apr 2020 19:32:18 -0300
> Jason Gunthorpe  wrote:
> 
> > On Wed, Apr 08, 2020 at 02:35:52PM -0700, Jacob Pan wrote:
> > > > On Wed, Apr 08, 2020 at 11:35:52AM -0700, Jacob Pan wrote:  
> > > > > Hi Jean,
> > > > > 
> > > > > On Wed,  8 Apr 2020 16:04:25 +0200
> > > > > Jean-Philippe Brucker  wrote:
> > > > > 
> > > > > > The IOMMU SVA API currently requires device drivers to
> > > > > > implement an mm_exit() callback, which stops device jobs that
> > > > > > do DMA. This function is called in the release() MMU
> > > > > > notifier, when an address space that is shared with a device
> > > > > > exits.
> > > > > > 
> > > > > > It has been noted several time during discussions about SVA
> > > > > > that cancelling DMA jobs can be slow and complex, and doing
> > > > > > it in the release() notifier might cause synchronization
> > > > > > issues (patch 2 has more background). Device drivers must in
> > > > > > any case call unbind() to remove their bond, after stopping
> > > > > > DMA from a more favorable context (release of a file
> > > > > > descriptor).
> > > > > > 
> > > > > > So after mm exits, rather than notifying device drivers, we
> > > > > > can hold on to the PASID until unbind(), ask IOMMU drivers to
> > > > > > silently abort DMA and Page Requests in the meantime. This
> > > > > > change should relieve the mmput() path.
> > > > >
> > > > > I assume mm is destroyed after all the FDs are closed
> > > > 
> > > > FDs do not hold a mmget(), but they may hold a mmgrab(), ie
> > > > anything using mmu_notifiers has to hold a grab until the
> > > > notifier is destroyed, which is often triggered by FD close.
> > > >   
> > > Sorry, I don't get this. Are you saying we have to hold a mmgrab()
> > > between svm_bind/mmu_notifier_register and
> > > svm_unbind/mmu_notifier_unregister?  
> > 
> > Yes. This is done automatically for the caller inside the mmu_notifier
> > implementation. We now even store the mm_struct pointer inside the
> > notifier.
> > 
> > Once a notifier is registered the mm_struct remains valid memory until
> > the notifier is unregistered.
> > 
> > > Isn't the idea of mmu_notifier is to avoid holding the mm reference
> > > and rely on the notifier to tell us when mm is going away?  
> > 
> > The notifier only holds a mmgrab(), not a mmget() - this allows
> > exit_mmap to proceed, but the mm_struct memory remains.
> > 
> > This is also probably why it is a bad idea to tie the lifetime of
> > something like a pasid to the mmdrop as a evil user could cause a
> > large number of mm structs to be released but not freed, probably
> > defeating cgroup limits and so forth (not sure)
> > 
> > > It seems both Intel and AMD iommu drivers don't hold mmgrab after
> > > mmu_notifier_register.  
> > 
> > It is done internally to the implementation.
> > 
> > > > So the exit_mmap() -> release() may happen before the FDs are
> > > > destroyed, but the final mmdrop() will be during some FD release
> > > > when the final mmdrop() happens.  
> > > 
> > > Do you mean mmdrop() is after FD release?   
> > 
> > Yes, it will be done by the mmu_notifier_unregister(), which should be
> > called during FD release if the iommu lifetime is linked to some FD.
> > 
> > > If so, unbind is called in FD release should take care of
> > > everything, i.e. stops DMA, clear PASID context on IOMMU, flush PRS
> > > queue etc.  
> > 
> > Yes, this is the proper way, when the DMA is stopped and no use of the
> > PASID remains then you can drop the mmu notifier and release the PASID
> > entirely. If that is linked to the lifetime of the FD then forget
> > completely about the mm_struct lifetime, it doesn't matter..
> > 
> Got everything above, thanks a lot.
> 
> If everything is in order with the FD close. Why do we need to 
> "ask IOMMU drivers to silently abort DMA and Page Requests in the
> meantime." in mm_exit notifier? This will be done orderly in unbind
> anyway.

When the process is killed, mm release can happen before fds are released.
If you look at do_exit() in kernel/exit.c:

exit_mm()
  mmput()
   -> mmu release notifier
...
exit_files()
  close_files()
fput()
exit_task_work()
  __fput()
   -> unbind()

Thanks,
Jean

> 
> > > Enforcing unbind upon FD close might be a precarious path, perhaps
> > > that is why we have to deal with out of order situation?  
> > 
> > How so? You just put it in the FD release function :)
> > 
> I was thinking some driver may choose to defer unbind in some workqueue
> etc.
> 
> > > > > In VT-d, because of enqcmd and lazy PASID free we plan to hold
> > > > > on to the PASID until mmdrop.
> > > > > https://lore.kernel.org/patchwork/patch/1217762/
> > > > 
> > > > Why? The bind already gets a mmu_notifier which has refcounts and
> > > > the right lifetime for PASID.. This code could already be
> > > > simplified by using the