IOMMU Maintainership

2020-11-17 Thread Joerg Roedel
Hi,

I spent last week in the hospital and had unplanned surgery, from
which I am recovering now. The recovery will take a few weeks, which
unfortunately does not allow me to fulfill my IOMMU maintainer duties or
do any other serious work in front of a computer.

Luckily Will Deacon volunteered to handle incoming IOMMU patches and
send them upstream. So please Cc him on any patches that you want to
have merged upstream for the next release and on important fixes for
v5.10. The patches will go through another tree for the time being, Will
can share the details on that.

I hope to return to my duties when the next merge window is over.

Thanks a lot for your help, Will! Thank you also to the others on Cc
who will help Will handle the patch flow.

Regards,

Joerg


Re: [PATCH] iommu: fix return error code in iommu_probe_device()

2020-11-17 Thread Lu Baolu

Hi Yingliang,

On 2020/11/17 10:52, Yang Yingliang wrote:

If iommu_group_get() fails, iommu_probe_device() needs to return an
error code.

Fixes: cf193888bfbd ("iommu: Move new probe_device path...")
Reported-by: Hulk Robot 
Signed-off-by: Yang Yingliang 
---
  drivers/iommu/iommu.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index b53446bb8c6b..6f4a32df90f6 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -253,8 +253,10 @@ int iommu_probe_device(struct device *dev)
goto err_out;
  
  	group = iommu_group_get(dev);

-   if (!group)
+   if (!group) {
+   ret = -ENODEV;


Can you please explain why you use -ENODEV here?

Best regards,
baolu


goto err_release;
+   }
  
  	/*
  	 * Try to allocate a default domain - needs support from the
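
As a reference, the resulting error path in iommu_probe_device() would look
roughly as follows (a sketch assembled from the diff above, not the complete
function):

	int iommu_probe_device(struct device *dev)
	{
		struct iommu_group *group;
		int ret;

		ret = __iommu_probe_device(dev, NULL);
		if (ret)
			goto err_out;

		group = iommu_group_get(dev);
		if (!group) {
			/* without this, ret would still be 0 on this path */
			ret = -ENODEV;
			goto err_release;
		}

		/* default domain allocation and the err_release/err_out
		 * labels follow; they are unchanged by this patch. */
	}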




Re: [PATCH v11 00/13] SMMUv3 Nested Stage Setup (IOMMU part)

2020-11-17 Thread Auger Eric
Hi Shameer,

On 5/13/20 5:57 PM, Shameerali Kolothum Thodi wrote:
> Hi Eric,
> 
>> -Original Message-
>> From: Auger Eric [mailto:eric.au...@redhat.com]
>> Sent: 13 May 2020 14:29
>> To: Shameerali Kolothum Thodi ;
>> Zhangfei Gao ; eric.auger@gmail.com;
>> iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org;
>> k...@vger.kernel.org; kvm...@lists.cs.columbia.edu; w...@kernel.org;
>> j...@8bytes.org; m...@kernel.org; robin.mur...@arm.com
>> Cc: jean-phili...@linaro.org; alex.william...@redhat.com;
>> jacob.jun@linux.intel.com; yi.l@intel.com; peter.mayd...@linaro.org;
>> t...@semihalf.com; bbhush...@marvell.com
>> Subject: Re: [PATCH v11 00/13] SMMUv3 Nested Stage Setup (IOMMU part)
>>
> [...]
> 
> Yes, that's normal; this series is not meant to support vSVM at this stage.
>
> I intend to add the missing pieces during the next weeks.

 Thanks for that. I have made an attempt to add the vSVA based on
 your v10 + JPBs sva patches. The host kernel and Qemu changes can
 be found here[1][2].

 This basically adds multiple pasid support on top of your changes.
 I have done some basic sanity testing and we have some initial success
 with the zip vf dev on our D06 platform. Please note that the STALL event
 is not yet supported though, but works fine if we mlock() guest usr mem.
>>>
>>> I have added STALL support for our vSVA prototype and it seems to be
>>> working (on our hardware). I have updated the kernel and qemu branches
>>> with the same [1][2]. I should warn you though that this is prototype
>>> code and I am pretty much re-using the VFIO_IOMMU_SET_PASID_TABLE
>>> interface for almost everything. But thought of sharing, in case it is
>>> useful somehow!
>>
>> Thank you again for sharing the POC. I looked at the kernel and QEMU
>> branches.
>>
>> Here are some preliminary comments:
>> - "arm-smmu-v3: Reset S2TTB while switching back from nested stage":  as
>> you mentionned S2TTB reset now is featured in v11
> 
> Yes.
> 
>> - "arm-smmu-v3: Add support for multiple pasid in nested mode": I could
>> easily integrate this into my series. Update the iommu api first and
>> pass multiple CD info in a separate patch
> 
> Ok.
in v12, I added
[PATCH v12 14/15] iommu/smmuv3: Accept configs with more than one
context descriptor

I don't think you need the s1cdmax addition, as we already have
pasid_bits, which should do the job.

>> - "arm-smmu-v3: Add support to Invalidate CD": CD invalidation should be
>> cascaded to host through the PASID cache invalidation uapi (no pb you
>> warned us for the POC you simply used VFIO_IOMMU_SET_PASID_TABLE). I
>> think I should add this support in my original series although it does
>> not seem to trigger any issue up to now.
> 
> Agree. Cache invalidation uapi is a better interface for this. Also I don't
> think this matters for non-vSVA cases, as the guest kernel table/CD (pasid 0)
> will never get invalidated.
in v12 I added [PATCH v12 15/15] iommu/smmuv3: Add PASID cache
invalidation per PASID. I have not tested it though.
> 
>> - "arm-smmu-v3: Remove duplication of fault propagation". I understand
>> the transcode is done somewhere else with SVA but we still need to do it
>> if a single CD is used, right? I will review the SVA code to better
>> understand.

Since I have rebased on 5.10-rc4, you will still have this duplication
to handle.
> 
> Hmm..not sure. Need to take another look to see whether we need a special
> handling for single CD or not.
> 
>> - for the STALL response injection I would tend to use a new VFIO region
>> for responses. At the moment there is a single VFIO region for reporting
>> the fault.

In v12 I added a new VFIO region to inject the fault response. This was
tested with dummy event injection, so it should work properly.

If we clearly identify all the public dependencies needed for vSVA/ARM, I
can help you respin on top of them.

Thanks

Eric
> 
> Sure. That will be much cleaner and probably improve the context switch
> latency. Another thing I noted with STALL is that the pasid_valid flag needs
> to be taken care of in the SVA kernel path.
> 
> "iommu: Remove pasid validity check for STALL model page response msg"
> Not sure this one is a proper way to handle this.
>  
>> On QEMU side:
>> - I am currently working on 3.2 range invalidation support which is
>> needed for DPDK/VFIO
>> - While at it I will look at how to incrementally introduce some of the
>> features you need in this series.
> 
> Ok. 
> 
> Thanks for taking a look at the POC.
> 
> Cheers,
> Shameer
> 


RE: [PATCH v11 00/13] SMMUv3 Nested Stage Setup (IOMMU part)

2020-11-17 Thread Shameerali Kolothum Thodi
Hi Eric,

First, many thanks for the respin. I will go through all of
these (iommu/vfio/Qemu) and will do thorough verification/tests on our
hardware.

> -Original Message-
> From: Auger Eric [mailto:eric.au...@redhat.com]
> Sent: 17 November 2020 08:40
> To: Shameerali Kolothum Thodi ;
> Zhangfei Gao ; eric.auger@gmail.com;
> iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org;
> k...@vger.kernel.org; kvm...@lists.cs.columbia.edu; w...@kernel.org;
> j...@8bytes.org; m...@kernel.org; robin.mur...@arm.com
> Cc: jean-phili...@linaro.org; alex.william...@redhat.com;
> jacob.jun@linux.intel.com; yi.l@intel.com; peter.mayd...@linaro.org;
> t...@semihalf.com; bbhush...@marvell.com
> Subject: Re: [PATCH v11 00/13] SMMUv3 Nested Stage Setup (IOMMU part)
> 
> Hi Shameer,
> 
> On 5/13/20 5:57 PM, Shameerali Kolothum Thodi wrote:
> > Hi Eric,
> >
> >> -Original Message-
> >> From: Auger Eric [mailto:eric.au...@redhat.com]
> >> Sent: 13 May 2020 14:29
> >> To: Shameerali Kolothum Thodi ;
> >> Zhangfei Gao ; eric.auger@gmail.com;
> >> iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org;
> >> k...@vger.kernel.org; kvm...@lists.cs.columbia.edu; w...@kernel.org;
> >> j...@8bytes.org; m...@kernel.org; robin.mur...@arm.com
> >> Cc: jean-phili...@linaro.org; alex.william...@redhat.com;
> >> jacob.jun@linux.intel.com; yi.l@intel.com;
> peter.mayd...@linaro.org;
> >> t...@semihalf.com; bbhush...@marvell.com
> >> Subject: Re: [PATCH v11 00/13] SMMUv3 Nested Stage Setup (IOMMU part)
> >>
> > [...]
> >
> > Yes, that's normal; this series is not meant to support vSVM at this
> > stage.
> >
> > I intend to add the missing pieces during the next weeks.
> 
>  Thanks for that. I have made an attempt to add the vSVA based on
>  your v10 + JPBs sva patches. The host kernel and Qemu changes can
>  be found here[1][2].
> 
>  This basically adds multiple pasid support on top of your changes.
>  I have done some basic sanity testing and we have some initial success
>  with the zip vf dev on our D06 platform. Please note that the STALL event
>  is not yet supported though, but works fine if we mlock() guest usr mem.
> >>>
> >>> I have added STALL support for our vSVA prototype and it seems to be
> >>> working (on our hardware). I have updated the kernel and qemu branches
> >>> with the same [1][2]. I should warn you though that this is prototype
> >>> code and I am pretty much re-using the VFIO_IOMMU_SET_PASID_TABLE
> >>> interface for almost everything. But thought of sharing, in case it is
> >>> useful somehow!
> >>
> >> Thank you again for sharing the POC. I looked at the kernel and QEMU
> >> branches.
> >>
> >> Here are some preliminary comments:
> >> - "arm-smmu-v3: Reset S2TTB while switching back from nested stage":
> as
> >> you mentionned S2TTB reset now is featured in v11
> >
> > Yes.
> >
> >> - "arm-smmu-v3: Add support for multiple pasid in nested mode": I could
> >> easily integrate this into my series. Update the iommu api first and
> >> pass multiple CD info in a separate patch
> >
> > Ok.
> in v12, I added
> [PATCH v12 14/15] iommu/smmuv3: Accept configs with more than one
> context descriptor
> 
> I don't think you need to add s1cdmax addition as we already have
> pasid_bits which should do the job.

Ok.
 
> >> - "arm-smmu-v3: Add support to Invalidate CD": CD invalidation should be
> >> cascaded to host through the PASID cache invalidation uapi (no pb you
> >> warned us for the POC you simply used VFIO_IOMMU_SET_PASID_TABLE). I
> >> think I should add this support in my original series although it does
> >> not seem to trigger any issue up to now.
> >
> > Agree. Cache invalidation uapi is a better interface for this. Also I don't
> > think this matters for non-vSVA cases, as the guest kernel table/CD (pasid 0)
> > will never get invalidated.
> in v12 I added [PATCH v12 15/15] iommu/smmuv3: Add PASID cache
> invalidation per PASID. I have not tested it though.

Ok. Will verify this.

> >> - "arm-smmu-v3: Remove duplication of fault propagation". I understand
> >> the transcode is done somewhere else with SVA but we still need to do it
> >> if a single CD is used, right? I will review the SVA code to better
> >> understand.
> 
> Since I have rebased on 5.10-rc4, you will still have this duplication to
> handle.

OK.

> > Hmm..not sure. Need to take another look to see whether we need a special
> > handling for single CD or not.
> >
> >> - for the STALL response injection I would tend to use a new VFIO region
> >> for responses. At the moment there is a single VFIO region for reporting
> >> the fault.
> 
> In v12 I added a new VFIO region to inject the fault response. This was
> tested with dummy event injection, so it should work properly.

That’s cool. I will check this.
 
> If we clearly identify all the public dependencies needed for vSVA/ARM, I
> can help you respin on top of them.

Sure. I will 

[RESEND PATCH v3 0/4] iommu/iova: Solve longterm IOVA issue

2020-11-17 Thread John Garry
This series contains a patch to solve the longterm IOVA issue which
Leizhen originally tried to address at [0].

A sieved kernel log is at the following, showing periodic dumps of IOVA
sizes, per CPU and per depot bin, per IOVA size granule:
https://raw.githubusercontent.com/hisilicon/kernel-dev/topic-iommu-5.10-iova-debug-v3/aging_test

Notice, for example, the following logs:
[13175.355584] print_iova1 cpu_total=40135 depot_total=3866 total=44001
[83483.457858] print_iova1 cpu_total=62532 depot_total=24476 total=87008

Here the total IOVA rcache size has grown from 44K to 87K entries over a
long period.

Along with this patch, I included the following:
- A smaller helper to clear all IOVAs for a domain
- Change polarity of the IOVA magazine helpers
- Small optimisation from Cong Wang included, which was never applied [1].
  There was some debate of the other patches in that series, but this one
  is quite straightforward.

Differences to v2:
- Update commit message for patch 3/4

Differences to v1:
- Add IOVA clearing helper
- Add patch to change polarity of mag helpers
- Avoid logically-redundant extra variable in __iova_rcache_insert()

[0] 
https://lore.kernel.org/linux-iommu/20190815121104.29140-3-thunder.leiz...@huawei.com/
[1] 
https://lore.kernel.org/linux-iommu/4b74d40a-22d1-af53-fcb6-5d7018370...@huawei.com/

Cong Wang (1):
  iommu: avoid taking iova_rbtree_lock twice

John Garry (3):
  iommu/iova: Add free_all_cpu_cached_iovas()
  iommu/iova: Avoid double-negatives in magazine helpers
  iommu/iova: Flush CPU rcache for when a depot fills

 drivers/iommu/iova.c | 66 +---
 1 file changed, 38 insertions(+), 28 deletions(-)

-- 
2.26.2



[RESEND PATCH v3 1/4] iommu/iova: Add free_all_cpu_cached_iovas()

2020-11-17 Thread John Garry
Add a helper function to free the CPU rcache for all online CPUs.

There also exists a function of the same name in
drivers/iommu/intel/iommu.c, but the parameters are different, and there
should be no conflict.

Signed-off-by: John Garry 
---
 drivers/iommu/iova.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 30d969a4c5fd..81b7399dd5e8 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -227,6 +227,14 @@ static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
return -ENOMEM;
 }
 
+static void free_all_cpu_cached_iovas(struct iova_domain *iovad)
+{
+   unsigned int cpu;
+
+   for_each_online_cpu(cpu)
+   free_cpu_cached_iovas(cpu, iovad);
+}
+
 static struct kmem_cache *iova_cache;
 static unsigned int iova_cache_users;
 static DEFINE_MUTEX(iova_cache_mutex);
@@ -422,15 +430,12 @@ alloc_iova_fast(struct iova_domain *iovad, unsigned long size,
 retry:
new_iova = alloc_iova(iovad, size, limit_pfn, true);
if (!new_iova) {
-   unsigned int cpu;
-
if (!flush_rcache)
return 0;
 
/* Try replenishing IOVAs by flushing rcache. */
flush_rcache = false;
-   for_each_online_cpu(cpu)
-   free_cpu_cached_iovas(cpu, iovad);
+   free_all_cpu_cached_iovas(iovad);
goto retry;
}
 
-- 
2.26.2



[RESEND PATCH v3 4/4] iommu: avoid taking iova_rbtree_lock twice

2020-11-17 Thread John Garry
From: Cong Wang 

Both find_iova() and __free_iova() take iova_rbtree_lock,
there is no reason to take and release it twice inside
free_iova().

Fold them into one critical section by calling the unlock
versions instead.

Signed-off-by: Cong Wang 
Reviewed-by: Robin Murphy 
Signed-off-by: John Garry 
---
 drivers/iommu/iova.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 386005055aca..3b32e6746c70 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -398,10 +398,14 @@ EXPORT_SYMBOL_GPL(__free_iova);
 void
 free_iova(struct iova_domain *iovad, unsigned long pfn)
 {
-   struct iova *iova = find_iova(iovad, pfn);
+   unsigned long flags;
+   struct iova *iova;
 
+	spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
+   iova = private_find_iova(iovad, pfn);
if (iova)
-   __free_iova(iovad, iova);
+   private_free_iova(iovad, iova);
+	spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
 
 }
 EXPORT_SYMBOL_GPL(free_iova);
-- 
2.26.2



Re: [PATCH v12 04/15] iommu/smmuv3: Dynamically allocate s1_cfg and s2_cfg

2020-11-17 Thread Auger Eric
Hi Shameer,

On 11/17/20 12:39 PM, Shameerali Kolothum Thodi wrote:
> Hi Eric,
> 
>> -Original Message-
>> From: Eric Auger [mailto:eric.au...@redhat.com]
>> Sent: 16 November 2020 10:43
>> To: eric.auger@gmail.com; eric.au...@redhat.com;
>> iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org;
>> k...@vger.kernel.org; kvm...@lists.cs.columbia.edu; w...@kernel.org;
>> j...@8bytes.org; m...@kernel.org; robin.mur...@arm.com
>> Cc: jean-phili...@linaro.org; zhangfei@linaro.org;
>> zhangfei@gmail.com; vivek.gau...@arm.com; Shameerali Kolothum
>> Thodi ;
>> alex.william...@redhat.com; jacob.jun@linux.intel.com;
>> yi.l@intel.com; t...@semihalf.com; nicoleots...@gmail.com
>> Subject: [PATCH v12 04/15] iommu/smmuv3: Dynamically allocate s1_cfg and
>> s2_cfg
>>
>> In preparation for the introduction of nested stages
>> let's turn s1_cfg and s2_cfg fields into pointers which are
>> dynamically allocated depending on the smmu_domain stage.
> 
> This will break the build if we have CONFIG_ARM_SMMU_V3_SVA
> because of:
> https://github.com/eauger/linux/blob/5.10-rc4-2stage-v12/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c#L40
> 
> Do we really need to make these pointers?

Thanks for reporting. I think I can do this differently. Working on it now.

Thanks

Eric
> 
> Thanks,
> Shameer
>  
>> In nested mode, both stages will coexist and s1_cfg will
>> be allocated when the guest configuration gets passed.
>>
>> Signed-off-by: Eric Auger 
>> ---
>>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 83 -
>>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  6 +-
>>  2 files changed, 48 insertions(+), 41 deletions(-)
>>
>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> index d828d6cbeb0e..4baf9fafe462 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> @@ -953,9 +953,9 @@ static __le64 *arm_smmu_get_cd_ptr(struct arm_smmu_domain *smmu_domain,
>>  unsigned int idx;
>>  struct arm_smmu_l1_ctx_desc *l1_desc;
>>  struct arm_smmu_device *smmu = smmu_domain->smmu;
>> -	struct arm_smmu_ctx_desc_cfg *cdcfg = &smmu_domain->s1_cfg.cdcfg;
>> +	struct arm_smmu_ctx_desc_cfg *cdcfg = &smmu_domain->s1_cfg->cdcfg;
>>
>> -if (smmu_domain->s1_cfg.s1fmt == STRTAB_STE_0_S1FMT_LINEAR)
>> +if (smmu_domain->s1_cfg->s1fmt == STRTAB_STE_0_S1FMT_LINEAR)
>>  return cdcfg->cdtab + ssid * CTXDESC_CD_DWORDS;
>>
>>  idx = ssid >> CTXDESC_SPLIT;
>> @@ -990,7 +990,7 @@ int arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain, int ssid,
>>  __le64 *cdptr;
>>  struct arm_smmu_device *smmu = smmu_domain->smmu;
>>
>> -if (WARN_ON(ssid >= (1 << smmu_domain->s1_cfg.s1cdmax)))
>> +if (WARN_ON(ssid >= (1 << smmu_domain->s1_cfg->s1cdmax)))
>>  return -E2BIG;
>>
>>  cdptr = arm_smmu_get_cd_ptr(smmu_domain, ssid);
>> @@ -1056,7 +1056,7 @@ static int arm_smmu_alloc_cd_tables(struct arm_smmu_domain *smmu_domain)
>>  size_t l1size;
>>  size_t max_contexts;
>>  struct arm_smmu_device *smmu = smmu_domain->smmu;
>> -	struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
>> +	struct arm_smmu_s1_cfg *cfg = smmu_domain->s1_cfg;
>>  	struct arm_smmu_ctx_desc_cfg *cdcfg = &cfg->cdcfg;
>>
>>  max_contexts = 1 << cfg->s1cdmax;
>> @@ -1104,7 +1104,7 @@ static void arm_smmu_free_cd_tables(struct arm_smmu_domain *smmu_domain)
>>  int i;
>>  size_t size, l1size;
>>  struct arm_smmu_device *smmu = smmu_domain->smmu;
>> -	struct arm_smmu_ctx_desc_cfg *cdcfg = &smmu_domain->s1_cfg.cdcfg;
>> +	struct arm_smmu_ctx_desc_cfg *cdcfg = &smmu_domain->s1_cfg->cdcfg;
>>
>>  if (cdcfg->l1_desc) {
>>  size = CTXDESC_L2_ENTRIES * (CTXDESC_CD_DWORDS << 3);
>> @@ -1211,17 +1211,8 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>>  }
>>
>>  if (smmu_domain) {
>> -switch (smmu_domain->stage) {
>> -case ARM_SMMU_DOMAIN_S1:
>> -		s1_cfg = &smmu_domain->s1_cfg;
>> -break;
>> -case ARM_SMMU_DOMAIN_S2:
>> -case ARM_SMMU_DOMAIN_NESTED:
>> -		s2_cfg = &smmu_domain->s2_cfg;
>> -break;
>> -default:
>> -break;
>> -}
>> +s1_cfg = smmu_domain->s1_cfg;
>> +s2_cfg = smmu_domain->s2_cfg;
>>  }
>>
>>  if (val & STRTAB_STE_0_V) {
>> @@ -1664,10 +1655,10 @@ static void arm_smmu_tlb_inv_context(void *cookie)
>>   * careful, 007.
>>   */
>>  if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
>> -arm_smmu_tlb_inv_asid(smmu, smmu_domain->s1_cfg.cd.asid);
>> +arm_smmu_tlb_inv_asid(smmu, smmu_domain->s1_cfg->cd.asid);
>>  } else {
>>  cmd.opcode  = CMDQ_OP_TLBI_S12_VMALL;
>> -cmd.tlbi.vmid   = smmu_domain->s2_cfg.vmid;
>> +		cmd.tlbi.vmid	= smmu_domain->s2_cfg->vmid;

Re: [PATCH v6 3/3] firmware: QCOM_SCM: Allow qcom_scm driver to be loadable as a permenent module

2020-11-17 Thread Thierry Reding
On Mon, Nov 16, 2020 at 11:48:39AM -0800, John Stultz wrote:
> On Mon, Nov 16, 2020 at 8:36 AM Will Deacon  wrote:
> > On Mon, Nov 16, 2020 at 04:59:36PM +0100, Thierry Reding wrote:
> > > On Fri, Nov 06, 2020 at 04:27:10AM +, John Stultz wrote:
> > > Unfortunately, the ARM SMMU module will eventually end up being loaded
> > > once the root filesystem has been mounted (for example via SDHCI or
> > > Ethernet, both with using just plain, non-IOMMU-backed DMA API) and then
> > > initialize, configuring as "fault by default", which then results in a
> > > slew of SMMU faults from all the devices that have previously configured
> > > themselves without IOMMU support.
> >
> > I wonder if fw_devlink=on would help here?
> >
> > But either way, I'd be more inclined to revert this change if it's causing
> > problems for !QCOM devices.
> >
> > Linus -- please can you drop this one (patch 3/3) for now, given that it's
> > causing problems?
> 
> Agreed. Apologies again for the trouble.
> 
> I do feel like the probe timeout to handle optional links is causing a
> lot of the trouble here. I expect fw_devlink would solve this, but it
> may be a while before it can always be enabled.  I may see about
> pushing the default probe timeout value to be a little further out
> than init (I backed away from my last attempt as I didn't want to
> cause long (30 second) delays for cases like NFS root, but maybe 2-5
> seconds would be enough to make things work better for everyone).

I think there are two problems here: 1) the deferred probe timeout can
cause a mismatch between what SMMU masters and the SMMU think is going
on and 2) a logistical problem of dealing with the SMMU driver being a
loadable module.

The second problem can be dealt with by shipping the module in the
initial ramdisk. That's a bit annoying, but perhaps the right thing to
do. At least on Tegra we need this because all the devices that carry
the root filesystem (Ethernet for NFS and SDHCI/USB/SATA/PCI for disk
boot) are SMMU masters and will start to fault once the SMMU driver is
loaded.

The first problem is trickier, but if the ARM SMMU driver is built as a
module and shipped in the initial ramdisk it should work. Like I said,
this is annoying because it makes the development a bit more complicated
than just rebuilding a kernel image and flashing it (or booting it straight
from TFTP), since now every time the ARM SMMU module is built the
initial ramdisk needs to be updated (and potentially flashed) as well.

Thierry

P.S.: Interestingly this is very similar to the problem that I've been
trying to address for display hardware that's left on by the bootloader.
Given that, one potential solution would be to somehow retrieve memory
allocations done by these devices and create identity mappings in the
ARM SMMU address spaces for such devices, much like we plan to do for
devices left on by the bootloader (like the display controller for
showing a boot splash). I suspect that it's not really worth doing this
for devices that are only initialized by the kernel because we have a
bit of control over when we enable them, so I'd prefer if we just kept
things reasonably simple and made sure the SMMU was either always used
by a device from the start or not at all. Dynamically switching between
SMMU and no-SMMU seems a bit eccentric.



[PATCHv8 1/8] iommu/io-pgtable-arm: Add support to use system cache

2020-11-17 Thread Sai Prakash Ranjan
Add a quirk IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to override
the attributes set in TCR for the page table walker when
using system cache.

Signed-off-by: Sai Prakash Ranjan 
---
 drivers/iommu/io-pgtable-arm.c | 10 --
 include/linux/io-pgtable.h |  4 
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index a7a9bc08dcd1..7c9ea9d7874a 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -761,7 +761,8 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
 
if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS |
IO_PGTABLE_QUIRK_NON_STRICT |
-   IO_PGTABLE_QUIRK_ARM_TTBR1))
+   IO_PGTABLE_QUIRK_ARM_TTBR1 |
+   IO_PGTABLE_QUIRK_ARM_OUTER_WBWA))
return NULL;
 
data = arm_lpae_alloc_pgtable(cfg);
@@ -773,10 +774,15 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
tcr->sh = ARM_LPAE_TCR_SH_IS;
tcr->irgn = ARM_LPAE_TCR_RGN_WBWA;
tcr->orgn = ARM_LPAE_TCR_RGN_WBWA;
+   if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_OUTER_WBWA)
+   goto out_free_data;
} else {
tcr->sh = ARM_LPAE_TCR_SH_OS;
tcr->irgn = ARM_LPAE_TCR_RGN_NC;
-   tcr->orgn = ARM_LPAE_TCR_RGN_NC;
+   if (!(cfg->quirks & IO_PGTABLE_QUIRK_ARM_OUTER_WBWA))
+   tcr->orgn = ARM_LPAE_TCR_RGN_NC;
+   else
+   tcr->orgn = ARM_LPAE_TCR_RGN_WBWA;
}
 
tg1 = cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1;
diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index 4cde111e425b..a9a2c59fab37 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -86,6 +86,9 @@ struct io_pgtable_cfg {
 *
 * IO_PGTABLE_QUIRK_ARM_TTBR1: (ARM LPAE format) Configure the table
 *  for use in the upper half of a split address space.
+*
+* IO_PGTABLE_QUIRK_ARM_OUTER_WBWA: Override the attributes set in TCR for
+*  the page table walker when using system cache.
 */
#define IO_PGTABLE_QUIRK_ARM_NS BIT(0)
#define IO_PGTABLE_QUIRK_NO_PERMS   BIT(1)
@@ -93,6 +96,7 @@ struct io_pgtable_cfg {
#define IO_PGTABLE_QUIRK_ARM_MTK_EXT   BIT(3)
#define IO_PGTABLE_QUIRK_NON_STRICT BIT(4)
#define IO_PGTABLE_QUIRK_ARM_TTBR1  BIT(5)
+   #define IO_PGTABLE_QUIRK_ARM_OUTER_WBWA BIT(6)
unsigned long   quirks;
unsigned long   pgsize_bitmap;
unsigned intias;
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation



[PATCHv8 5/8] drm/msm/a6xx: Add support for using system cache(LLC)

2020-11-17 Thread Sai Prakash Ranjan
From: Sharat Masetty 

The last level system cache can be partitioned into 32 different
slices, of which the GPU has two slices preallocated. One slice is
used for caching GPU buffers and the other slice is used for
caching the GPU SMMU pagetables. This patch talks to the core system
cache driver to acquire the slice handles, configures the SCIDs for
those slices, and activates and deactivates the slices upon
GPU power collapse and restore.

Some support from the IOMMU driver is also needed to make use
of the system cache and set the right TCR attributes. The GPU then
has the ability to override a few cacheability parameters, which
it uses to change write-allocate to write-no-allocate, as the
GPU hardware does not benefit much from it.

DOMAIN_ATTR_IO_PGTABLE_CFG is another domain-level attribute used
by the IOMMU driver for pagetable configuration; initially it will be
used to set a quirk that selects the right attributes to cache the
hardware pagetables in the system cache.
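
As a rough sketch of how a client driver would use this (based on the
interface added earlier in this series, not the actual adreno code), the
domain attribute is set before attaching the device:

	struct domain_attr_io_pgtbl_cfg pgtbl_cfg = {
		.quirks = IO_PGTABLE_QUIRK_ARM_OUTER_WBWA,
	};

	/* must be done before attach, while the domain is unconfigured */
	iommu_domain_set_attr(domain, DOMAIN_ATTR_IO_PGTABLE_CFG, &pgtbl_cfg);
	iommu_attach_device(domain, dev);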

Signed-off-by: Sharat Masetty 
[saiprakash.ranjan: fix to set attr before device attach to iommu and rebase]
Signed-off-by: Sai Prakash Ranjan 
---
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c   | 83 +
 drivers/gpu/drm/msm/adreno/a6xx_gpu.h   |  4 ++
 drivers/gpu/drm/msm/adreno/adreno_gpu.c | 17 +
 3 files changed, 104 insertions(+)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 948f3656c20c..95c98c642876 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -8,7 +8,9 @@
 #include "a6xx_gpu.h"
 #include "a6xx_gmu.xml.h"
 
+#include 
 #include 
+#include 
 
 #define GPU_PAS_ID 13
 
@@ -1022,6 +1024,79 @@ static irqreturn_t a6xx_irq(struct msm_gpu *gpu)
return IRQ_HANDLED;
 }
 
+static void a6xx_llc_rmw(struct a6xx_gpu *a6xx_gpu, u32 reg, u32 mask, u32 or)
+{
+   return msm_rmw(a6xx_gpu->llc_mmio + (reg << 2), mask, or);
+}
+
+static void a6xx_llc_write(struct a6xx_gpu *a6xx_gpu, u32 reg, u32 value)
+{
+   return msm_writel(value, a6xx_gpu->llc_mmio + (reg << 2));
+}
+
+static void a6xx_llc_deactivate(struct a6xx_gpu *a6xx_gpu)
+{
+   llcc_slice_deactivate(a6xx_gpu->llc_slice);
+   llcc_slice_deactivate(a6xx_gpu->htw_llc_slice);
+}
+
+static void a6xx_llc_activate(struct a6xx_gpu *a6xx_gpu)
+{
+   u32 cntl1_regval = 0;
+
+   if (IS_ERR(a6xx_gpu->llc_mmio))
+   return;
+
+   if (!llcc_slice_activate(a6xx_gpu->llc_slice)) {
+   u32 gpu_scid = llcc_get_slice_id(a6xx_gpu->llc_slice);
+
+   gpu_scid &= 0x1f;
+		cntl1_regval = (gpu_scid << 0) | (gpu_scid << 5) | (gpu_scid << 10) |
+			       (gpu_scid << 15) | (gpu_scid << 20);
+   }
+
+   if (!llcc_slice_activate(a6xx_gpu->htw_llc_slice)) {
+   u32 gpuhtw_scid = llcc_get_slice_id(a6xx_gpu->htw_llc_slice);
+
+   gpuhtw_scid &= 0x1f;
+   cntl1_regval |= FIELD_PREP(GENMASK(29, 25), gpuhtw_scid);
+   }
+
+   if (cntl1_regval) {
+   /*
+* Program the slice IDs for the various GPU blocks and GPU MMU
+* pagetables
+*/
+		a6xx_llc_write(a6xx_gpu, REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_1, cntl1_regval);
+
+   /*
+* Program cacheability overrides to not allocate cache lines on
+* a write miss
+*/
+		a6xx_llc_rmw(a6xx_gpu, REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_0, 0xF, 0x03);
+   }
+}
+
+static void a6xx_llc_slices_destroy(struct a6xx_gpu *a6xx_gpu)
+{
+   llcc_slice_putd(a6xx_gpu->llc_slice);
+   llcc_slice_putd(a6xx_gpu->htw_llc_slice);
+}
+
+static void a6xx_llc_slices_init(struct platform_device *pdev,
+   struct a6xx_gpu *a6xx_gpu)
+{
+   a6xx_gpu->llc_mmio = msm_ioremap(pdev, "cx_mem", "gpu_cx");
+   if (IS_ERR(a6xx_gpu->llc_mmio))
+   return;
+
+   a6xx_gpu->llc_slice = llcc_slice_getd(LLCC_GPU);
+   a6xx_gpu->htw_llc_slice = llcc_slice_getd(LLCC_GPUHTW);
+
+   if (IS_ERR(a6xx_gpu->llc_slice) && IS_ERR(a6xx_gpu->htw_llc_slice))
+   a6xx_gpu->llc_mmio = ERR_PTR(-EINVAL);
+}
+
 static int a6xx_pm_resume(struct msm_gpu *gpu)
 {
struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
@@ -1038,6 +1113,8 @@ static int a6xx_pm_resume(struct msm_gpu *gpu)
 
msm_gpu_resume_devfreq(gpu);
 
+   a6xx_llc_activate(a6xx_gpu);
+
return 0;
 }
 
@@ -1048,6 +1125,8 @@ static int a6xx_pm_suspend(struct msm_gpu *gpu)
 
trace_msm_gpu_suspend(0);
 
+   a6xx_llc_deactivate(a6xx_gpu);
+
devfreq_suspend_device(gpu->devfreq.devfreq);
 
return a6xx_gmu_stop(a6xx_gpu);
@@ -1091,6 +1170,8 @@ static void a6xx_destroy(struct msm_gpu *gpu)
drm_gem_object_put(a6xx_gpu->shadow_bo);
}
 
+   a6xx_llc_slices_destroy(a6xx_gpu);
+
a6xx_gmu_remove(a6xx_gpu);
 

[PATCHv8 7/8] iommu: arm-smmu-impl: Use table to list QCOM implementations

2020-11-17 Thread Sai Prakash Ranjan
Use a table and of_match_node() to match the qcom implementation
instead of multiple of_device_is_compatible() calls for each
QCOM SMMU implementation.

Signed-off-by: Sai Prakash Ranjan 
Acked-by: Will Deacon 
---
 drivers/iommu/arm/arm-smmu/arm-smmu-impl.c |  9 +
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 21 -
 drivers/iommu/arm/arm-smmu/arm-smmu.h  |  1 -
 3 files changed, 17 insertions(+), 14 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c b/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c
index 7fed89c9d18a..26e2734eb4d7 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c
@@ -214,14 +214,7 @@ struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu)
if (of_device_is_compatible(np, "nvidia,tegra194-smmu"))
return nvidia_smmu_impl_init(smmu);
 
-   if (of_device_is_compatible(np, "qcom,sdm845-smmu-500") ||
-   of_device_is_compatible(np, "qcom,sc7180-smmu-500") ||
-   of_device_is_compatible(np, "qcom,sm8150-smmu-500") ||
-   of_device_is_compatible(np, "qcom,sm8250-smmu-500"))
-   return qcom_smmu_impl_init(smmu);
-
-   if (of_device_is_compatible(smmu->dev->of_node, "qcom,adreno-smmu"))
-   return qcom_adreno_smmu_impl_init(smmu);
+   smmu = qcom_smmu_impl_init(smmu);
 
if (of_device_is_compatible(np, "marvell,ap806-smmu-500"))
smmu->impl = &arm_mmu500_impl;
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
index d0636c803a36..add1859b2899 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
@@ -318,12 +318,23 @@ static struct arm_smmu_device *qcom_smmu_create(struct arm_smmu_device *smmu,
	return &qsmmu->smmu;
 }
 
+static const struct of_device_id __maybe_unused qcom_smmu_impl_of_match[] = {
+   { .compatible = "qcom,sc7180-smmu-500" },
+   { .compatible = "qcom,sdm845-smmu-500" },
+   { .compatible = "qcom,sm8150-smmu-500" },
+   { .compatible = "qcom,sm8250-smmu-500" },
+   { }
+};
+
 struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device *smmu)
 {
-	return qcom_smmu_create(smmu, &qcom_smmu_impl);
-}
+   const struct device_node *np = smmu->dev->of_node;
 
-struct arm_smmu_device *qcom_adreno_smmu_impl_init(struct arm_smmu_device *smmu)
-{
-	return qcom_smmu_create(smmu, &qcom_adreno_smmu_impl);
+	if (of_match_node(qcom_smmu_impl_of_match, np))
+		return qcom_smmu_create(smmu, &qcom_smmu_impl);
+
+	if (of_device_is_compatible(np, "qcom,adreno-smmu"))
+		return qcom_smmu_create(smmu, &qcom_adreno_smmu_impl);
+
+   return smmu;
 }
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h b/drivers/iommu/arm/arm-smmu/arm-smmu.h
index caae543ea077..7db81c7c7833 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.h
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h
@@ -523,7 +523,6 @@ static inline void arm_smmu_writeq(struct arm_smmu_device *smmu, int page,
 struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu);
 struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu);
 struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device *smmu);
-struct arm_smmu_device *qcom_adreno_smmu_impl_init(struct arm_smmu_device *smmu);
 
 void arm_smmu_write_context_bank(struct arm_smmu_device *smmu, int idx);
 int arm_mmu500_reset(struct arm_smmu_device *smmu);
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation



[PATCHv8 6/8] drm/msm/a6xx: Add support for using system cache on MMU500 based targets

2020-11-17 Thread Sai Prakash Ranjan
From: Jordan Crouse 

GPU targets with an MMU-500 attached have a slightly different process for
enabling system cache. Use the compatible string on the IOMMU phandle
to see if an MMU-500 is attached and modify the programming sequence
accordingly.

Signed-off-by: Jordan Crouse 
Signed-off-by: Sai Prakash Ranjan 
---
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 46 +--
 drivers/gpu/drm/msm/adreno/a6xx_gpu.h |  1 +
 2 files changed, 37 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 95c98c642876..3f8b92da8cba 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -1042,6 +1042,8 @@ static void a6xx_llc_deactivate(struct a6xx_gpu *a6xx_gpu)
 
 static void a6xx_llc_activate(struct a6xx_gpu *a6xx_gpu)
 {
+	struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
+	struct msm_gpu *gpu = &adreno_gpu->base;
u32 cntl1_regval = 0;
 
if (IS_ERR(a6xx_gpu->llc_mmio))
@@ -1055,11 +1057,17 @@ static void a6xx_llc_activate(struct a6xx_gpu *a6xx_gpu)
   (gpu_scid << 15) | (gpu_scid << 20);
}
 
+   /*
+* For targets with a MMU500, activate the slice but don't program the
+* register.  The XBL will take care of that.
+*/
if (!llcc_slice_activate(a6xx_gpu->htw_llc_slice)) {
-   u32 gpuhtw_scid = llcc_get_slice_id(a6xx_gpu->htw_llc_slice);
+   if (!a6xx_gpu->have_mmu500) {
+			u32 gpuhtw_scid = llcc_get_slice_id(a6xx_gpu->htw_llc_slice);
 
-   gpuhtw_scid &= 0x1f;
-   cntl1_regval |= FIELD_PREP(GENMASK(29, 25), gpuhtw_scid);
+   gpuhtw_scid &= 0x1f;
+			cntl1_regval |= FIELD_PREP(GENMASK(29, 25), gpuhtw_scid);
+   }
}
 
if (cntl1_regval) {
@@ -1067,13 +1075,20 @@ static void a6xx_llc_activate(struct a6xx_gpu *a6xx_gpu)
 * Program the slice IDs for the various GPU blocks and GPU MMU
 * pagetables
 */
-		a6xx_llc_write(a6xx_gpu, REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_1, cntl1_regval);
-
-   /*
-* Program cacheability overrides to not allocate cache lines on
-* a write miss
-*/
-		a6xx_llc_rmw(a6xx_gpu, REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_0, 0xF, 0x03);
+   if (a6xx_gpu->have_mmu500)
+   gpu_rmw(gpu, REG_A6XX_GBIF_SCACHE_CNTL1, GENMASK(24, 0),
+   cntl1_regval);
+   else {
+			a6xx_llc_write(a6xx_gpu,
+				REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_1, cntl1_regval);
+
+   /*
+* Program cacheability overrides to not allocate cache
+* lines on a write miss
+*/
+			a6xx_llc_rmw(a6xx_gpu,
+				REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_0, 0xF, 0x03);
+   }
}
 }
 
@@ -1086,10 +1101,21 @@ static void a6xx_llc_slices_destroy(struct a6xx_gpu *a6xx_gpu)
 static void a6xx_llc_slices_init(struct platform_device *pdev,
struct a6xx_gpu *a6xx_gpu)
 {
+   struct device_node *phandle;
+
a6xx_gpu->llc_mmio = msm_ioremap(pdev, "cx_mem", "gpu_cx");
if (IS_ERR(a6xx_gpu->llc_mmio))
return;
 
+   /*
+* There is a different programming path for targets with an mmu500
+* attached, so detect if that is the case
+*/
+   phandle = of_parse_phandle(pdev->dev.of_node, "iommus", 0);
+   a6xx_gpu->have_mmu500 = (phandle &&
+   of_device_is_compatible(phandle, "arm,mmu-500"));
+   of_node_put(phandle);
+
a6xx_gpu->llc_slice = llcc_slice_getd(LLCC_GPU);
a6xx_gpu->htw_llc_slice = llcc_slice_getd(LLCC_GPUHTW);
 
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
index 9e6079af679c..e793d329e77b 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
@@ -32,6 +32,7 @@ struct a6xx_gpu {
void __iomem *llc_mmio;
void *llc_slice;
void *htw_llc_slice;
+   bool have_mmu500;
 };
 
 #define to_a6xx_gpu(x) container_of(x, struct a6xx_gpu, base)
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation



[PATCHv8 2/8] iommu/arm-smmu: Add domain attribute for pagetable configuration

2020-11-17 Thread Sai Prakash Ranjan
Add an iommu domain attribute for pagetable configuration. Initially it
will be used to set quirks, such as the one for the system cache (aka
last level cache), which lets client drivers like the GPU set the right
attributes for caching the hardware pagetables in the system cache.
Later it can be extended to include other page table configuration data.

Signed-off-by: Sai Prakash Ranjan 
---
 drivers/iommu/arm/arm-smmu/arm-smmu.c | 25 +
 drivers/iommu/arm/arm-smmu/arm-smmu.h |  1 +
 include/linux/io-pgtable.h|  4 
 include/linux/iommu.h |  1 +
 4 files changed, 31 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 0f28a8614da3..7b05782738e2 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -789,6 +789,9 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
if (smmu_domain->non_strict)
pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;
 
+   if (smmu_domain->pgtbl_cfg.quirks)
+   pgtbl_cfg.quirks |= smmu_domain->pgtbl_cfg.quirks;
+
	pgtbl_ops = alloc_io_pgtable_ops(fmt, &pgtbl_cfg, smmu_domain);
if (!pgtbl_ops) {
ret = -ENOMEM;
@@ -1511,6 +1514,12 @@ static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
case DOMAIN_ATTR_NESTING:
		*(int *)data = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED);
return 0;
+   case DOMAIN_ATTR_IO_PGTABLE_CFG: {
+   struct domain_attr_io_pgtbl_cfg *pgtbl_cfg = data;
+   *pgtbl_cfg = smmu_domain->pgtbl_cfg;
+
+   return 0;
+   }
default:
return -ENODEV;
}
@@ -1551,6 +1560,22 @@ static int arm_smmu_domain_set_attr(struct iommu_domain *domain,
else
smmu_domain->stage = ARM_SMMU_DOMAIN_S1;
break;
+   case DOMAIN_ATTR_IO_PGTABLE_CFG: {
+   struct domain_attr_io_pgtbl_cfg *pgtbl_cfg = data;
+
+   if (smmu_domain->smmu) {
+   ret = -EPERM;
+   goto out_unlock;
+   }
+
+   if (!pgtbl_cfg) {
+   ret = -ENODEV;
+   goto out_unlock;
+   }
+
+   smmu_domain->pgtbl_cfg = *pgtbl_cfg;
+   break;
+   }
default:
ret = -ENODEV;
}
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h b/drivers/iommu/arm/arm-smmu/arm-smmu.h
index 04288b6fc619..18fbed376afb 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.h
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h
@@ -364,6 +364,7 @@ enum arm_smmu_domain_stage {
 struct arm_smmu_domain {
struct arm_smmu_device  *smmu;
struct io_pgtable_ops   *pgtbl_ops;
+   struct domain_attr_io_pgtbl_cfg pgtbl_cfg;
const struct iommu_flush_ops*flush_ops;
struct arm_smmu_cfg cfg;
enum arm_smmu_domain_stage  stage;
diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index a9a2c59fab37..686b37d48743 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -212,6 +212,10 @@ struct io_pgtable {
 
 #define io_pgtable_ops_to_pgtable(x) container_of((x), struct io_pgtable, ops)
 
+struct domain_attr_io_pgtbl_cfg {
+   unsigned long quirks;
+};
+
 static inline void io_pgtable_tlb_flush_all(struct io_pgtable *iop)
 {
iop->cfg.tlb->tlb_flush_all(iop->cookie);
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index b95a6f8db6ff..ffaa389ea128 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -118,6 +118,7 @@ enum iommu_attr {
DOMAIN_ATTR_FSL_PAMUV1,
DOMAIN_ATTR_NESTING,/* two stages of translation */
DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE,
+   DOMAIN_ATTR_IO_PGTABLE_CFG,
DOMAIN_ATTR_MAX,
 };
 
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation



[RESEND PATCH v3 3/4] iommu/iova: Flush CPU rcache for when a depot fills

2020-11-17 Thread John Garry
Leizhen reported some time ago that IOVA performance may degrade over time
[0], but unfortunately his solution to fix this problem was not given
attention.

To summarize, the issue is that as time goes by, the CPU rcache and depot
rcache continue to grow. As such, IOVA RB tree access time also continues
to grow.

At a certain point, the depot may become full, and some CPU rcaches may
also be full when an insertion of another IOVA is attempted. In this scenario,
currently the "loaded" CPU rcache is freed and a new one is created. This
freeing means that many IOVAs in the RB tree need to be freed, which
makes IO throughput performance fall off a cliff in some storage scenarios:

Jobs: 12 (f=12): [] [0.0% done] [6314MB/0KB/0KB /s] [1616K/0/0 iops]
Jobs: 12 (f=12): [] [0.0% done] [5669MB/0KB/0KB /s] [1451K/0/0 iops]
Jobs: 12 (f=12): [] [0.0% done] [6031MB/0KB/0KB /s] [1544K/0/0 iops]
Jobs: 12 (f=12): [] [0.0% done] [6673MB/0KB/0KB /s] [1708K/0/0 iops]
Jobs: 12 (f=12): [] [0.0% done] [6705MB/0KB/0KB /s] [1717K/0/0 iops]
Jobs: 12 (f=12): [] [0.0% done] [6031MB/0KB/0KB /s] [1544K/0/0 iops]
Jobs: 12 (f=12): [] [0.0% done] [6761MB/0KB/0KB /s] [1731K/0/0 iops]
Jobs: 12 (f=12): [] [0.0% done] [6705MB/0KB/0KB /s] [1717K/0/0 iops]
Jobs: 12 (f=12): [] [0.0% done] [6685MB/0KB/0KB /s] [1711K/0/0 iops]
Jobs: 12 (f=12): [] [0.0% done] [6178MB/0KB/0KB /s] [1582K/0/0 iops]
Jobs: 12 (f=12): [] [0.0% done] [6731MB/0KB/0KB /s] [1723K/0/0 iops]
Jobs: 12 (f=12): [] [0.0% done] [2387MB/0KB/0KB /s] [611K/0/0 iops]
Jobs: 12 (f=12): [] [0.0% done] [2689MB/0KB/0KB /s] [688K/0/0 iops]
Jobs: 12 (f=12): [] [0.0% done] [2278MB/0KB/0KB /s] [583K/0/0 iops]
Jobs: 12 (f=12): [] [0.0% done] [1288MB/0KB/0KB /s] [330K/0/0 iops]
Jobs: 12 (f=12): [] [0.0% done] [1632MB/0KB/0KB /s] [418K/0/0 iops]
Jobs: 12 (f=12): [] [0.0% done] [1765MB/0KB/0KB /s] [452K/0/0 iops]

And it continues in this fashion, without recovering. Note that in this
example it took 16 hours of waiting for this to occur. Also note that
IO throughput gradually becomes more unstable leading up to
this point.

This problem is only seen for non-strict mode. For strict mode, the rcaches
stay quite compact.

As a solution to this issue, judge that the IOVA caches have grown too big
when a cached magazine needs to be freed, and just flush all the CPU rcaches
instead.

The depot rcaches, however, are not flushed, as they can be used to
immediately replenish active CPUs.

In future, some IOVA compaction could be implemented to solve the
instability issue, although I figure it could be quite complex to implement.

[0] 
https://lore.kernel.org/linux-iommu/20190815121104.29140-3-thunder.leiz...@huawei.com/

Analyzed-by: Zhen Lei 
Reported-by: Xiang Chen 
Signed-off-by: John Garry 
---
 drivers/iommu/iova.c | 16 ++--
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 1f3f0f8b12e0..386005055aca 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -901,7 +901,6 @@ static bool __iova_rcache_insert(struct iova_domain *iovad,
 struct iova_rcache *rcache,
 unsigned long iova_pfn)
 {
-   struct iova_magazine *mag_to_free = NULL;
struct iova_cpu_rcache *cpu_rcache;
bool can_insert = false;
unsigned long flags;
@@ -923,13 +922,12 @@ static bool __iova_rcache_insert(struct iova_domain *iovad,
if (cpu_rcache->loaded)
rcache->depot[rcache->depot_size++] =
cpu_rcache->loaded;
-   } else {
-   mag_to_free = cpu_rcache->loaded;
+   can_insert = true;
+   cpu_rcache->loaded = new_mag;
}
		spin_unlock(&rcache->lock);
-
-   cpu_rcache->loaded = new_mag;
-   can_insert = true;
+   if (!can_insert)
+   iova_magazine_free(new_mag);
}
}
 
@@ -938,10 +936,8 @@ static bool __iova_rcache_insert(struct iova_domain *iovad,
 
	spin_unlock_irqrestore(&cpu_rcache->lock, flags);
 
-   if (mag_to_free) {
-   iova_magazine_free_pfns(mag_to_free, iovad);
-   iova_magazine_free(mag_to_free);
-   }
+   if (!can_insert)
+   free_all_cpu_cached_iovas(iovad);
 
return can_insert;
 }
-- 
2.26.2



[RESEND PATCH v3 2/4] iommu/iova: Avoid double-negatives in magazine helpers

2020-11-17 Thread John Garry
A similar crash to the following could be observed if initial CPU rcache
magazine allocations fail in init_iova_rcaches():

Unable to handle kernel NULL pointer dereference at virtual address 

Mem abort info:
   ESR = 0x9604
   EC = 0x25: DABT (current EL), IL = 32 bits
   SET = 0, FnV = 0
   EA = 0, S1PTW = 0
Data abort info:
   ISV = 0, ISS = 0x0004
   CM = 0, WnR = 0
[] user address but active_mm is swapper
Internal error: Oops: 9604 [#1] PREEMPT SMP
Modules linked in:
CPU: 11 PID: 696 Comm: irq/40-hisi_sas Not tainted 5.9.0-rc7-dirty #109
Hardware name: Huawei D06 /D06, BIOS Hisilicon D06 UEFI RC0 - V1.16.01 
03/15/2019
Call trace:
  free_iova_fast+0xfc/0x280
  iommu_dma_free_iova+0x64/0x70
  __iommu_dma_unmap+0x9c/0xf8
  iommu_dma_unmap_sg+0xa8/0xc8
  dma_unmap_sg_attrs+0x28/0x50
  cq_thread_v3_hw+0x2dc/0x528
  irq_thread_fn+0x2c/0xa0
  irq_thread+0x130/0x1e0
  kthread+0x154/0x158
  ret_from_fork+0x10/0x34

Code: f9400060 f102001f 54000981 d421 (f9400043)

 ---[ end trace 4afcbdfc61b60467 ]---

The issue is that the expression !iova_magazine_full(NULL) evaluates true; this
falls over in __iova_rcache_insert() when we attempt to cache a mag
and cpu_rcache->loaded == NULL:

if (!iova_magazine_full(cpu_rcache->loaded)) {
can_insert = true;
...

if (can_insert)
iova_magazine_push(cpu_rcache->loaded, iova_pfn);

As above, can_insert is evaluated true, which it shouldn't be, and we try
to insert pfns in a NULL mag, which is not safe.

To avoid this, stop using double-negatives, like !iova_magazine_full() and
!iova_magazine_empty(), and use positive tests, like
iova_magazine_has_space() and iova_magazine_has_pfns(), respectively; these
can safely deal with cpu_rcache->{loaded, prev} = NULL.

Signed-off-by: John Garry 
---
 drivers/iommu/iova.c | 29 +
 1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 81b7399dd5e8..1f3f0f8b12e0 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -827,14 +827,18 @@ iova_magazine_free_pfns(struct iova_magazine *mag, struct iova_domain *iovad)
mag->size = 0;
 }
 
-static bool iova_magazine_full(struct iova_magazine *mag)
+static bool iova_magazine_has_space(struct iova_magazine *mag)
 {
-   return (mag && mag->size == IOVA_MAG_SIZE);
+   if (!mag)
+   return false;
+   return mag->size < IOVA_MAG_SIZE;
 }
 
-static bool iova_magazine_empty(struct iova_magazine *mag)
+static bool iova_magazine_has_pfns(struct iova_magazine *mag)
 {
-   return (!mag || mag->size == 0);
+   if (!mag)
+   return false;
+   return mag->size;
 }
 
 static unsigned long iova_magazine_pop(struct iova_magazine *mag,
@@ -843,7 +847,7 @@ static unsigned long iova_magazine_pop(struct iova_magazine *mag,
int i;
unsigned long pfn;
 
-   BUG_ON(iova_magazine_empty(mag));
+   BUG_ON(!iova_magazine_has_pfns(mag));
 
/* Only fall back to the rbtree if we have no suitable pfns at all */
for (i = mag->size - 1; mag->pfns[i] > limit_pfn; i--)
@@ -859,7 +863,7 @@ static unsigned long iova_magazine_pop(struct iova_magazine *mag,
 
 static void iova_magazine_push(struct iova_magazine *mag, unsigned long pfn)
 {
-   BUG_ON(iova_magazine_full(mag));
+   BUG_ON(!iova_magazine_has_space(mag));
 
mag->pfns[mag->size++] = pfn;
 }
@@ -905,9 +909,9 @@ static bool __iova_rcache_insert(struct iova_domain *iovad,
cpu_rcache = raw_cpu_ptr(rcache->cpu_rcaches);
	spin_lock_irqsave(&cpu_rcache->lock, flags);
 
-   if (!iova_magazine_full(cpu_rcache->loaded)) {
+   if (iova_magazine_has_space(cpu_rcache->loaded)) {
can_insert = true;
-   } else if (!iova_magazine_full(cpu_rcache->prev)) {
+   } else if (iova_magazine_has_space(cpu_rcache->prev)) {
swap(cpu_rcache->prev, cpu_rcache->loaded);
can_insert = true;
} else {
@@ -916,8 +920,9 @@ static bool __iova_rcache_insert(struct iova_domain *iovad,
if (new_mag) {
			spin_lock(&rcache->lock);
if (rcache->depot_size < MAX_GLOBAL_MAGS) {
-   rcache->depot[rcache->depot_size++] =
-   cpu_rcache->loaded;
+   if (cpu_rcache->loaded)
+   rcache->depot[rcache->depot_size++] =
+   cpu_rcache->loaded;
} else {
mag_to_free = cpu_rcache->loaded;
}
@@ -968,9 +973,9 @@ static unsigned long __iova_rcache_get(struct iova_rcache *rcache,
cpu_rcache = raw_cpu_ptr(rcache->cpu_rcaches);
	spin_lock_irqsave(&cpu_rcache->lock, flags);
 
-   if (!iova_magazine_empty(cpu_rcache->loaded)) {
+	if (iova_magazine_has_pfns(cpu_rcache->loaded)) {

Re: [PATCH] iommu: fix return error code in iommu_probe_device()

2020-11-17 Thread Yang Yingliang


On 2020/11/17 17:40, Lu Baolu wrote:

Hi Yingliang,

On 2020/11/17 10:52, Yang Yingliang wrote:

If iommu_group_get() fails, iommu_probe_device() needs to return an
error code.

Fixes: cf193888bfbd ("iommu: Move new probe_device path...")
Reported-by: Hulk Robot 
Signed-off-by: Yang Yingliang 
---
  drivers/iommu/iommu.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index b53446bb8c6b..6f4a32df90f6 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -253,8 +253,10 @@ int iommu_probe_device(struct device *dev)
  goto err_out;
    group = iommu_group_get(dev);
-    if (!group)
+    if (!group) {
+    ret = -ENODEV;


Can you please explain why you use -ENODEV here?


Before 79659190ee97 ("iommu: Don't take group reference in
iommu_alloc_default_domain()"), iommu_alloc_default_domain() would
return -ENODEV if the group was NULL:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/iommu/iommu.c?h=v5.10-rc4&id=70b8170e55d3ca9503a53211967faee6b5f18b19
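
For context, the removed check looked roughly like this (a sketch from the
linked tree, trimmed to the relevant part):

	static int iommu_alloc_default_domain(struct device *dev)
	{
		struct iommu_group *group;

		group = iommu_group_get(dev);
		if (!group)
			return -ENODEV;

		/* default domain allocation follows */
	}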



Best regards,
baolu


  goto err_release;
+    }
    /*
   * Try to allocate a default domain - needs support from the




[PATCHv8 3/8] iommu/arm-smmu: Move non-strict mode to use domain_attr_io_pgtbl_cfg

2020-11-17 Thread Sai Prakash Ranjan
Now that we have a struct domain_attr_io_pgtbl_cfg with quirks, use that
for non-strict mode as well, thereby removing the need for more members
of arm_smmu_domain in the future.

Signed-off-by: Sai Prakash Ranjan 
---
 drivers/iommu/arm/arm-smmu/arm-smmu.c | 7 ++-
 drivers/iommu/arm/arm-smmu/arm-smmu.h | 1 -
 2 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 7b05782738e2..5f066a1b7221 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -786,9 +786,6 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
goto out_clear_smmu;
}
 
-   if (smmu_domain->non_strict)
-   pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;
-
if (smmu_domain->pgtbl_cfg.quirks)
pgtbl_cfg.quirks |= smmu_domain->pgtbl_cfg.quirks;
 
@@ -1527,7 +1524,7 @@ static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
case IOMMU_DOMAIN_DMA:
switch (attr) {
case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE:
-   *(int *)data = smmu_domain->non_strict;
+   *(int *)data = smmu_domain->pgtbl_cfg.quirks;
return 0;
default:
return -ENODEV;
@@ -1583,7 +1580,7 @@ static int arm_smmu_domain_set_attr(struct iommu_domain *domain,
case IOMMU_DOMAIN_DMA:
switch (attr) {
case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE:
-   smmu_domain->non_strict = *(int *)data;
+			smmu_domain->pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;
break;
default:
ret = -ENODEV;
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h b/drivers/iommu/arm/arm-smmu/arm-smmu.h
index 18fbed376afb..caae543ea077 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.h
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h
@@ -368,7 +368,6 @@ struct arm_smmu_domain {
const struct iommu_flush_ops*flush_ops;
struct arm_smmu_cfg cfg;
enum arm_smmu_domain_stage  stage;
-   boolnon_strict;
struct mutexinit_mutex; /* Protects smmu pointer */
	spinlock_t		cb_lock; /* Serialises ATS1* ops and TLB syncs */
struct iommu_domain domain;
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation



[PATCHv8 0/8] System Cache support for GPU and required SMMU support

2020-11-17 Thread Sai Prakash Ranjan
Some hardware variants contain a system cache, or last level
cache (LLC). This cache is typically a large block which is shared
by multiple clients on the SOC. The GPU uses the system cache to cache
both the GPU data buffers (like textures) as well as the SMMU pagetables.
This helps with improved render performance as well as lower power
consumption by reducing the bus traffic to the system memory.

The system cache architecture allows the cache to be split into slices
which can then be used by multiple SOC clients. This patch series is an
effort to enable and use two of those slices preallocated for the GPU,
one for the GPU data buffers and another for the GPU SMMU hardware
pagetables.
Patches 1-6 add system cache support in the SMMU and GPU drivers.
Patches 7 and 8 are minor cleanups for the arm-smmu impl.

Changes in v8:
 * Introduce a generic domain attribute for pagetable config (Will)
 * Rename quirk to more generic IO_PGTABLE_QUIRK_ARM_OUTER_WBWA (Will)
 * Move non-strict mode to use new struct domain_attr_io_pgtbl_config (Will)

Changes in v7:
 * Squash Jordan's patch to support MMU500 targets
 * Rebase on top of for-joerg/arm-smmu/updates and Jordan's short series for 
adreno-smmu impl

Changes in v6:
 * Move table to arm-smmu-qcom (Robin)

Changes in v5:
 * Drop cleanup of blank lines since it was intentional (Robin)
 * Rebase again on top of msm-next-pgtables as it moves pretty fast

Changes in v4:
 * Drop IOMMU_SYS_CACHE prot flag
 * Rebase on top of 
https://gitlab.freedesktop.org/drm/msm/-/tree/msm-next-pgtables

Changes in v3:
 * Set the domain attribute before iommu_attach_device()
 * Fix a few code style and checkpatch warnings
 * Rebase on top of Jordan's latest split pagetables and per-instance
   pagetables support

Changes in v2:
 * Addressed review comments and rebased on top of Jordan's split
   pagetables series

Jordan Crouse (1):
  drm/msm/a6xx: Add support for using system cache on MMU500 based
targets

Sai Prakash Ranjan (5):
  iommu/io-pgtable-arm: Add support to use system cache
  iommu/arm-smmu: Add domain attribute for pagetable configuration
  iommu/arm-smmu: Move non-strict mode to use domain_attr_io_pgtbl_cfg
  iommu: arm-smmu-impl: Use table to list QCOM implementations
  iommu: arm-smmu-impl: Add a space before open parenthesis

Sharat Masetty (2):
  drm/msm: rearrange the gpu_rmw() function
  drm/msm/a6xx: Add support for using system cache(LLC)

 drivers/gpu/drm/msm/adreno/a6xx_gpu.c  | 109 +
 drivers/gpu/drm/msm/adreno/a6xx_gpu.h  |   5 +
 drivers/gpu/drm/msm/adreno/adreno_gpu.c|  17 
 drivers/gpu/drm/msm/msm_drv.c  |   8 ++
 drivers/gpu/drm/msm/msm_drv.h  |   1 +
 drivers/gpu/drm/msm/msm_gpu.h  |   5 +-
 drivers/iommu/arm/arm-smmu/arm-smmu-impl.c |  11 +--
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c |  21 +++-
 drivers/iommu/arm/arm-smmu/arm-smmu.c  |  30 +-
 drivers/iommu/arm/arm-smmu/arm-smmu.h  |   3 +-
 drivers/iommu/io-pgtable-arm.c |  10 +-
 include/linux/io-pgtable.h |   8 ++
 include/linux/iommu.h  |   1 +
 13 files changed, 203 insertions(+), 26 deletions(-)


base-commit: a29bbb0861f487a5e144dc997a9f71a36c7a2404
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCHv8 4/8] drm/msm: rearrange the gpu_rmw() function

2020-11-17 Thread Sai Prakash Ranjan
From: Sharat Masetty 

The register read-modify-write construct is generic enough
that it can be used by other subsystems as needed, so create
a more generic rmw() function and have gpu_rmw() use
this new function.
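
For illustration, a caller updating a two-bit register field with the
new helper might look like the sketch below (the register offset and
field are made up for the example):

#include <linux/bitfield.h>
#include <linux/io.h>

#define EXAMPLE_CTRL_REG	0x0040		/* hypothetical offset */
#define EXAMPLE_MODE_MASK	GENMASK(5, 4)	/* hypothetical field  */

static void example_set_mode(void __iomem *base, u32 mode)
{
	/*
	 * Clears bits [5:4] and writes the new mode value into them,
	 * leaving the remaining bits of the register untouched.
	 */
	msm_rmw(base + EXAMPLE_CTRL_REG, EXAMPLE_MODE_MASK,
		FIELD_PREP(EXAMPLE_MODE_MASK, mode));
}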

Signed-off-by: Sharat Masetty 
Reviewed-by: Jordan Crouse 
Signed-off-by: Sai Prakash Ranjan 
---
 drivers/gpu/drm/msm/msm_drv.c | 8 
 drivers/gpu/drm/msm/msm_drv.h | 1 +
 drivers/gpu/drm/msm/msm_gpu.h | 5 +
 3 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index 49685571dc0e..a1e22b974b77 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -180,6 +180,14 @@ u32 msm_readl(const void __iomem *addr)
return val;
 }
 
+void msm_rmw(void __iomem *addr, u32 mask, u32 or)
+{
+   u32 val = msm_readl(addr);
+
+   val &= ~mask;
+   msm_writel(val | or, addr);
+}
+
 struct msm_vblank_work {
struct work_struct work;
int crtc_id;
diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h
index b9dd8f8f4887..655b3b0424a1 100644
--- a/drivers/gpu/drm/msm/msm_drv.h
+++ b/drivers/gpu/drm/msm/msm_drv.h
@@ -478,6 +478,7 @@ void __iomem *msm_ioremap_quiet(struct platform_device *pdev, const char *name,
const char *dbgname);
 void msm_writel(u32 data, void __iomem *addr);
 u32 msm_readl(const void __iomem *addr);
+void msm_rmw(void __iomem *addr, u32 mask, u32 or);
 
 struct msm_gpu_submitqueue;
 int msm_submitqueue_init(struct drm_device *drm, struct msm_file_private *ctx);
diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
index 6c9e1fdc1a76..b2b419277953 100644
--- a/drivers/gpu/drm/msm/msm_gpu.h
+++ b/drivers/gpu/drm/msm/msm_gpu.h
@@ -246,10 +246,7 @@ static inline u32 gpu_read(struct msm_gpu *gpu, u32 reg)
 
 static inline void gpu_rmw(struct msm_gpu *gpu, u32 reg, u32 mask, u32 or)
 {
-   uint32_t val = gpu_read(gpu, reg);
-
-   val &= ~mask;
-   gpu_write(gpu, reg, val | or);
+   msm_rmw(gpu->mmio + (reg << 2), mask, or);
 }
 
 static inline u64 gpu_read64(struct msm_gpu *gpu, u32 lo, u32 hi)
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v12 04/15] iommu/smmuv3: Dynamically allocate s1_cfg and s2_cfg

2020-11-17 Thread Shameerali Kolothum Thodi
Hi Eric,

> -Original Message-
> From: Eric Auger [mailto:eric.au...@redhat.com]
> Sent: 16 November 2020 10:43
> To: eric.auger@gmail.com; eric.au...@redhat.com;
> iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org;
> k...@vger.kernel.org; kvm...@lists.cs.columbia.edu; w...@kernel.org;
> j...@8bytes.org; m...@kernel.org; robin.mur...@arm.com
> Cc: jean-phili...@linaro.org; zhangfei@linaro.org;
> zhangfei@gmail.com; vivek.gau...@arm.com; Shameerali Kolothum
> Thodi ;
> alex.william...@redhat.com; jacob.jun@linux.intel.com;
> yi.l@intel.com; t...@semihalf.com; nicoleots...@gmail.com
> Subject: [PATCH v12 04/15] iommu/smmuv3: Dynamically allocate s1_cfg and
> s2_cfg
> 
> In preparation for the introduction of nested stages
> let's turn s1_cfg and s2_cfg fields into pointers which are
> dynamically allocated depending on the smmu_domain stage.

This will break the build if we have CONFIG_ARM_SMMU_V3_SVA, because of:
https://github.com/eauger/linux/blob/5.10-rc4-2stage-v12/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c#L40

Do we really need to make these pointers?
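
(For context, the breakage looks roughly like the sketch below; the
exact code is at the link above, and only one of these two variants can
exist for a given definition of s1_cfg. The snippet is illustrative only.)

/* While s1_cfg is embedded, the SVA code can take its address directly: */
static struct arm_smmu_ctx_desc *example_get_cd(struct arm_smmu_domain *d)
{
	return &d->s1_cfg.cd;
}

/* Once s1_cfg becomes a pointer, the same access needs a dereference
 * plus a NULL check, otherwise the build breaks:
 */
static struct arm_smmu_ctx_desc *example_get_cd_v12(struct arm_smmu_domain *d)
{
	return d->s1_cfg ? &d->s1_cfg->cd : NULL;
}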

Thanks,
Shameer
 
> In nested mode, both stages will coexist and s1_cfg will
> be allocated when the guest configuration gets passed.
> 
> Signed-off-by: Eric Auger 
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 83 -
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  6 +-
>  2 files changed, 48 insertions(+), 41 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index d828d6cbeb0e..4baf9fafe462 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -953,9 +953,9 @@ static __le64 *arm_smmu_get_cd_ptr(struct
> arm_smmu_domain *smmu_domain,
>   unsigned int idx;
>   struct arm_smmu_l1_ctx_desc *l1_desc;
>   struct arm_smmu_device *smmu = smmu_domain->smmu;
> -	struct arm_smmu_ctx_desc_cfg *cdcfg = &smmu_domain->s1_cfg.cdcfg;
> +	struct arm_smmu_ctx_desc_cfg *cdcfg = &smmu_domain->s1_cfg->cdcfg;
> 
> - if (smmu_domain->s1_cfg.s1fmt == STRTAB_STE_0_S1FMT_LINEAR)
> + if (smmu_domain->s1_cfg->s1fmt == STRTAB_STE_0_S1FMT_LINEAR)
>   return cdcfg->cdtab + ssid * CTXDESC_CD_DWORDS;
> 
>   idx = ssid >> CTXDESC_SPLIT;
> @@ -990,7 +990,7 @@ int arm_smmu_write_ctx_desc(struct
> arm_smmu_domain *smmu_domain, int ssid,
>   __le64 *cdptr;
>   struct arm_smmu_device *smmu = smmu_domain->smmu;
> 
> - if (WARN_ON(ssid >= (1 << smmu_domain->s1_cfg.s1cdmax)))
> + if (WARN_ON(ssid >= (1 << smmu_domain->s1_cfg->s1cdmax)))
>   return -E2BIG;
> 
>   cdptr = arm_smmu_get_cd_ptr(smmu_domain, ssid);
> @@ -1056,7 +1056,7 @@ static int arm_smmu_alloc_cd_tables(struct
> arm_smmu_domain *smmu_domain)
>   size_t l1size;
>   size_t max_contexts;
>   struct arm_smmu_device *smmu = smmu_domain->smmu;
> -	struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
> +	struct arm_smmu_s1_cfg *cfg = smmu_domain->s1_cfg;
> 	struct arm_smmu_ctx_desc_cfg *cdcfg = &cfg->cdcfg;
> 
>   max_contexts = 1 << cfg->s1cdmax;
> @@ -1104,7 +1104,7 @@ static void arm_smmu_free_cd_tables(struct
> arm_smmu_domain *smmu_domain)
>   int i;
>   size_t size, l1size;
>   struct arm_smmu_device *smmu = smmu_domain->smmu;
> -	struct arm_smmu_ctx_desc_cfg *cdcfg = &smmu_domain->s1_cfg.cdcfg;
> +	struct arm_smmu_ctx_desc_cfg *cdcfg = &smmu_domain->s1_cfg->cdcfg;
> 
>   if (cdcfg->l1_desc) {
>   size = CTXDESC_L2_ENTRIES * (CTXDESC_CD_DWORDS << 3);
> @@ -1211,17 +1211,8 @@ static void arm_smmu_write_strtab_ent(struct
> arm_smmu_master *master, u32 sid,
>   }
> 
>   if (smmu_domain) {
> - switch (smmu_domain->stage) {
> - case ARM_SMMU_DOMAIN_S1:
> -	s1_cfg = &smmu_domain->s1_cfg;
> - break;
> - case ARM_SMMU_DOMAIN_S2:
> - case ARM_SMMU_DOMAIN_NESTED:
> -	s2_cfg = &smmu_domain->s2_cfg;
> - break;
> - default:
> - break;
> - }
> + s1_cfg = smmu_domain->s1_cfg;
> + s2_cfg = smmu_domain->s2_cfg;
>   }
> 
>   if (val & STRTAB_STE_0_V) {
> @@ -1664,10 +1655,10 @@ static void arm_smmu_tlb_inv_context(void
> *cookie)
>* careful, 007.
>*/
>   if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
> - arm_smmu_tlb_inv_asid(smmu, smmu_domain->s1_cfg.cd.asid);
> + arm_smmu_tlb_inv_asid(smmu, smmu_domain->s1_cfg->cd.asid);
>   } else {
>   cmd.opcode  = CMDQ_OP_TLBI_S12_VMALL;
> - cmd.tlbi.vmid   = smmu_domain->s2_cfg.vmid;
> + cmd.tlbi.vmid   = smmu_domain->s2_cfg->vmid;
>   arm_smmu_cmdq_issue_cmd(smmu, &cmd);
>   arm_smmu_cmdq_issue_sync(smmu);
>   }
> @@ -1693,10 +1684,10 @@ static void arm_smmu_tlb_inv_range(unsigned

Re: [PATCH 1/1] vfio/type1: Add subdev_ioasid callback to vfio_iommu_driver_ops

2020-11-17 Thread Lu Baolu

Hi Alex,

On 2020/11/17 3:56, Alex Williamson wrote:

On Thu, 12 Nov 2020 10:24:07 +0800
Lu Baolu  wrote:


Add an API for getting the ioasid of a subdevice (vfio/mdev). This calls
into the backend IOMMU module to get the actual value, or an error number
if ioasid is not supported for the subdevice. The physical device driver
implementations which rely on the vfio/mdev framework for mediated
device user-level access could typically consume this interface like
below:

struct device *dev = mdev_dev(mdev);
unsigned int pasid;
int ret;

ret = vfio_subdev_ioasid(dev, &pasid);
if (ret < 0)
return ret;

  /* Program device context with pasid value. */
  


Seems like an overly specific callback.  We already export means for
you to get a vfio_group, test that a device is an mdev, and get the
iommu device from an mdev.  So you can already test whether a given
device is an mdev with an iommu backing device that supports aux
domains.  The only missing piece seems to be that you can't get the
domain for a group in order to retrieve the pasid.  So why aren't we
exporting a callback that given a vfio_group provides the iommu domain?


Make sense! Thanks for your guidance. :-)

So what we want to export in vfio.c is

struct iommu_domain *vfio_group_get_domain(struct vfio_group *group)

What the callers need to do are:

int pasid;	/* iommu_aux_get_pasid() returns a negative errno on failure */
struct vfio_group *vfio_group;
struct iommu_domain *iommu_domain;
struct device *dev = mdev_dev(mdev);
struct device *iommu_device = mdev_get_iommu_device(dev);

if (!iommu_device ||
!iommu_dev_feature_enabled(iommu_device, IOMMU_DEV_FEAT_AUX))
return -EINVAL;

vfio_group = vfio_group_get_external_user_from_dev(dev);
if (IS_ERR_OR_NULL(vfio_group))
return -EFAULT;

iommu_domain = vfio_group_get_domain(vfio_group);
if (IS_ERR_OR_NULL(iommu_domain)) {
vfio_group_put_external_user(vfio_group);
return -EFAULT;
}

pasid = iommu_aux_get_pasid(iommu_domain, iommu_device);
if (pasid < 0) {
vfio_group_put_external_user(vfio_group);
return -EFAULT;
}

/* Program device context with pasid value. */
...

/* After use of this pasid */

/* Clear the pasid value in device context */
...

vfio_group_put_external_user(vfio_group);

Do I understand your points correctly?

Best regards,
baolu



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3] swiotlb: Adjust SWIOTLB bounce buffer size for SEV guests.

2020-11-17 Thread Ashish Kalra
Hello Konrad, 

On Tue, Nov 17, 2020 at 12:00:03PM -0500, Konrad Rzeszutek Wilk wrote:
> .snip..
> > > > > Lets break this down:
> > > > > 
> > > > > How does the performance improve for one single device if you 
> > > > > increase the SWIOTLB?
> > > > > Is there a specific device/driver that you can talk about that 
> > > > > improve with this patch?
> > > > > 
> > > > > 
> > > > 
> > > > Yes, these are mainly for multi-queue devices such as NICs or even
> > > > multi-queue virtio. 
> > > > 
> > > > This basically improves performance with concurrent DMA, hence,
> > > > basically multi-queue devices.
> > > 
> > > OK, and for _1GB_ guest - what are the "internal teams/external 
> > > customers" amount 
> > > of CPUs they use? Please lets use real use-cases.
> > 
> > >> I am sure you will understand we cannot share any external customer
> > >> data as all that customer information is proprietary.
> > >>
> > >> In similar situation if you have to share Oracle data, you will
> > >> surely have the same concerns and i don't think you will be able
> > >> to share any such information externally, i.e., outside Oracle.
> > >>
> > >I am asking for a simple query - what amount of CPUs does a 1GB
> > >guest have? The reason for this should be fairly obvious - if
> > >it is a 1vCPU, then there is no multi-queue and the existing
> > >SWIOTLB pool size as it is OK.
> > >
> > >If however there are say 2 and multiqueue is enabled, that
> > >gives me an idea of how many you use and I can find out what
> > >the maximum pool size usage of virtio there is with that configuration.
> > 
> > Again we cannot share any customer data.
> > 
> > Also i don't think there can be a definitive answer to how many vCPUs a
> > 1GB guest will have, it will depend on what kind of configuration we are
> > testing.
> > 
> > For example, i usually setup 4-16 vCPUs for as low as 512M configured
> > guest memory.
> 
> Sure, but you are not the normal user.
> 
> That is I don't like that for 1GB guests your patch ends up doubling the
> SWIOTLB memory pool. That seems to me we are trying to solve a problem
> that normal users will not hit. That is why I want 'here is the customer
> bug'.
> 
> Here is what I am going to do - I will take the 1GB and 4GB cases out of
> your patch and call it a day. If there are customers who start reporting
> issues we can revisit that. Nothing wrong with 'Reported-by' XYZ (we often
> ask the customer if he or she would like to be recognized on upstream bugs).
>

Ok.

> And in the meantime I am going to look about adding ..
> > 
> > I have been also testing with 16 vCPUs configuration for 512M-1G guest
> > memory with Mellanox SRIOV NICs, and this will be a multi-queue NIC
> > device environment.
> 
> .. late SWIOTLB expansion to stitch the DMA pools together as both
> Mellanox and VirtIO are not 32-bit DMA bound.
> 
> > 
> > So we might be having less configured guest memory, but we still might
> > be using that configuration with I/O intensive workloads.
> >

I am going to submit v4 of my current patch-set which uses max() instead
of clamp() and also replaces constants defined in this patch with the
pre-defined ones in sizes.h

Thanks,
Ashish
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: remove dma_virt_ops v2

2020-11-17 Thread Mike Marciniszyn

Fixes: 551199aca1c3 ("lib/dma-virt: Add dma_virt_ops")


Note that the drivers had open coded versions of this earlier.  I think
this goes back to the addition of the qib driver which is now gone
or the addition of the hfi1 or rxe drivers for something that still
matters


Christoph, Jason,

I built a branch using the following recipe of patches from
https://patchwork.kernel.org/project/linux-rdma/patch/:

20201106181941.1878556-11-...@lst.de/  dma-mapping: remove dma_virt_ops
20201106181941.1878556-10-...@lst.de/  PCI/P2PDMA: Cleanup __pci_p2pdma_map_sg a bit
20201106181941.1878556-9-...@lst.de/   PCI/P2PDMA: Remove the DMA_VIRT_OPS hacks
20201106181941.1878556-8-...@lst.de/   RDMA/core: remove use of dma_virt_ops
20201106181941.1878556-7-...@lst.de/   RDMA/core: remove ib_dma_{alloc,free}_coherent
20201106181941.1878556-6-...@lst.de/   rds: stop using dmapool
20201106181941.1878556-5-...@lst.de/   nvme-rdma: use ibdev_to_node instead of dereferencing ->dma_device
20201106181941.1878556-4-...@lst.de/   RDMA: lift ibdev_to_node from rds to common code
20201106181941.1878556-3-...@lst.de/   RDMA/umem: use ib_dma_max_seg_size instead of dma_get_max_seg_size
rdma/for-rc dabbd6abcdbe, which has "RDMA/sw: don't allow drivers using dma_virt_ops on highmem configs"


All of our rdmavt/hfi1 tests passed.

So I can at least vouch for "RDMA/core: remove use of dma_virt_ops"

Mike
Tested-by: Mike Marciniszyn 


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3] swiotlb: Adjust SWIOTLB bounce buffer size for SEV guests.

2020-11-17 Thread Ashish Kalra
On Fri, Nov 13, 2020 at 04:19:25PM -0500, Konrad Rzeszutek Wilk wrote:
> On Thu, Nov 05, 2020 at 09:20:45PM +, Ashish Kalra wrote:
> > On Thu, Nov 05, 2020 at 03:20:07PM -0500, Konrad Rzeszutek Wilk wrote:
> > > On Thu, Nov 05, 2020 at 07:38:28PM +, Ashish Kalra wrote:
> > > > On Thu, Nov 05, 2020 at 02:06:49PM -0500, Konrad Rzeszutek Wilk wrote:
> > > > > .
> > > > > > > Right, so I am wondering if we can do this better.
> > > > > > > 
> > > > > > > That is you are never going to get any 32-bit devices with SEV 
> > > > > > > right? That
> > > > > > > is there is nothing that bounds you to always use the memory 
> > > > > > > below 4GB?
> > > > > > > 
> > > > > > 
> > > > > > We do support 32-bit PCIe passthrough devices with SEV.
> > > > > 
> > > > > Ewww..  Which devices would this be?
> > > > 
> > > > That will be difficult to predict as customers could be doing
> > > > passthrough of all kinds of devices.
> > > 
> > > But SEV is not on some 1990s hardware. It has PCIe; there are no PCI
> > > slots in there.
> > > 
> > > Is it really possible to have a PCIe device that can't do more than 
> > > 32-bit DMA?
> > > 
> > > > 
> > > > > > 
> > > > > > Therefore, we can't just depend on >4G memory for SWIOTLB bounce 
> > > > > > buffering
> > > > > > when there is I/O pressure, because we do need to support device
> > > > > > passthrough of 32-bit devices.
> > > > > 
> > > > > Presumably there is just a handful of them?
> > > > >
> > > > Again, it will be incorrect to assume this.
> > > > 
> > > > > > 
> > > > > > Considering this, we believe that this patch needs to adjust/extend
> > > > > > boot-allocation of SWIOTLB and we want to keep it simple to do this
> > > > > > within a range determined by the amount of allocated guest memory.
> > > > > 
> > > > > I would prefer to not have to revert this in a year as customers
> > > > > complain about "I paid $$$ and I am wasting half a gig on something 
> > > > > I am not using" and giving customers knobs to tweak this instead of
> > > > > doing the right thing from the start.
> > > > 
> > > > Currently, we face a lot of situations where we have to tell our
> > > > internal teams/external customers to explicitly increase SWIOTLB buffer
> > > > via the swiotlb parameter on the kernel command line, especially to
> > > > get better I/O performance numbers with SEV. 
> > > 
> > > Presumably these are 64-bit?
> > > 
> > > And what devices do you speak off that are actually affected by 
> > > this performance? Increasing the SWIOTLB just means we have more
> > > memory, which in mind means you can have _more_ devices in the guest
> > > that won't handle the fact that DMA mapping returns an error.
> > > 
> > > Not necessarily that one device can suddenly go faster.
> > > 
> > > > 
> > > > So by having this SWIOTLB size adjustment done implicitly (even using a
> > > > static logic) is a great win-win situation. In other words, having even
> > > > a simple and static default increase of SWIOTLB buffer size for SEV is
> > > > really useful for us.
> > > > 
> > > > We can always think of adding all kinds of heuristics to this, but that
> > > > just adds too much complexity without any predictable performance gain.
> > > > 
> > > > And to add, the patch extends the SWIOTLB size as an architecture
> > > > specific callback, currently it is a simple and static logic for SEV/x86
> > > > specific, but there is always an option to tweak/extend it with
> > > > additional logic in the future.
> > > 
> > > Right, and that is what I would like to talk about as I think you
> > > are going to disappear (aka, busy with other stuff) after this patch goes 
> > > in.
> > > 
> > > I need to understand this more than "performance" and "internal teams"
> > > requirements to come up with a better way going forward as surely other
> > > platforms will hit the same issue anyhow.
> > > 
> > > Lets break this down:
> > > 
> > > How does the performance improve for one single device if you increase 
> > > the SWIOTLB?
> > > Is there a specific device/driver that you can talk about that improve 
> > > with this patch?
> > > 
> > > 
> > 
> > Yes, these are mainly for multi-queue devices such as NICs or even
> > multi-queue virtio. 
> > 
> > This basically improves performance with concurrent DMA, hence,
> > basically multi-queue devices.
> 
> OK, and for _1GB_ guest - what are the "internal teams/external customers" 
> amount 
> of CPUs they use? Please lets use real use-cases.

>> I am sure you will understand we cannot share any external customer
>> data as all that customer information is proprietary.
>>
>> In similar situation if you have to share Oracle data, you will
>> surely have the same concerns and i don't think you will be able
>> to share any such information externally, i.e., outside Oracle.
>>
>I am asking for a simple query - what amount of CPUs does a 1GB
>guest have? The reason for this should be fairly obvious - if
>it is a 1vCPU, then there is no multi-queue and the existing

Re: remove dma_virt_ops v2

2020-11-17 Thread Ka-Cheong Poon

On 11/13/20 1:36 AM, santosh.shilim...@oracle.com wrote:

+ Ka-Cheong

On 11/12/20 5:23 AM, Jason Gunthorpe wrote:

On Thu, Nov 12, 2020 at 10:40:30AM +0100, Christoph Hellwig wrote:

ping?

On Fri, Nov 06, 2020 at 07:19:31PM +0100, Christoph Hellwig wrote:

Hi Jason,

this series switches the RDMA core to opencode the special case of
devices bypassing the DMA mapping in the RDMA ULPs.  The virt ops
have caused a bit of trouble due to the P2P code not working with
them due to the fact that we'd do two dma mapping iterations for a
single I/O, but also are a bit of layering violation and lead to
more code than necessary.

Tested with nvme-rdma over rxe.

Note that the rds changes are untested, as I could not find any
simple rds test setup.

Changes since v2:
  - simplify the INFINIBAND_VIRT_DMA dependencies
  - add a ib_uses_virt_dma helper
  - use ib_uses_virt_dma in nvmet-rdma to disable p2p for virt_dma devices
  - use ib_dma_max_seg_size in umem
  - stop using dmapool in rds

Changes since v1:
  - disable software RDMA drivers for highmem configs
  - update the PCI commit logs


Santosh, can you please check the RDS parts?



Hi Ka-Cheong,

Can you please check Christoph's change [1], which cleans up the
dma-pool API to use ib_dma_* and the slab allocator? This was added
as part of your "net/rds: Use DMA memory pool allocation for rds_header"
commit.



I applied the patch and ran some basic testing.  And it seems to
work fine.

Thanks.


--
K. Poon
ka-cheong.p...@oracle.com


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v3] swiotlb: Adjust SWIOTLB bounce buffer size for SEV guests.

2020-11-17 Thread Konrad Rzeszutek Wilk
.snip..
> > > > Lets break this down:
> > > > 
> > > > How does the performance improve for one single device if you increase 
> > > > the SWIOTLB?
> > > > Is there a specific device/driver that you can talk about that improve 
> > > > with this patch?
> > > > 
> > > > 
> > > 
> > > Yes, these are mainly for multi-queue devices such as NICs or even
> > > multi-queue virtio. 
> > > 
> > > This basically improves performance with concurrent DMA, hence,
> > > basically multi-queue devices.
> > 
> > OK, and for _1GB_ guest - what are the "internal teams/external customers" 
> > amount 
> > of CPUs they use? Please lets use real use-cases.
> 
> >> I am sure you will understand we cannot share any external customer
> >> data as all that customer information is proprietary.
> >>
> >> In similar situation if you have to share Oracle data, you will
> >> surely have the same concerns and i don't think you will be able
> >> to share any such information externally, i.e., outside Oracle.
> >>
> >I am asking for a simple query - what amount of CPUs does a 1GB
> >guest have? The reason for this should be fairly obvious - if
> >it is a 1vCPU, then there is no multi-queue and the existing
> >SWIOTLB pool size as it is OK.
> >
> >If however there are say 2 and multiqueue is enabled, that
> >gives me an idea of how many you use and I can find out what
> >the maximum pool size usage of virtio there is with that configuration.
> 
> Again we cannot share any customer data.
> 
> Also i don't think there can be a definitive answer to how many vCPUs a
> 1GB guest will have, it will depend on what kind of configuration we are
> testing.
> 
> For example, i usually setup 4-16 vCPUs for as low as 512M configured
> guest memory.

Sure, but you are not the normal user.

That is I don't like that for 1GB guests your patch ends up doubling the
SWIOTLB memory pool. That seems to me we are trying to solve a problem
that normal users will not hit. That is why I want 'here is the customer
bug'.

Here is what I am going to do - I will take the 1GB and 4GB cases out of
your patch and call it a day. If there are customers who start reporting issues
we can revisit that. Nothing wrong with 'Reported-by' XYZ (we often ask the
customer if he or she would like to be recognized on upstream bugs).

And in the meantime I am going to look about adding ..
> 
> I have been also testing with 16 vCPUs configuration for 512M-1G guest
> memory with Mellanox SRIOV NICs, and this will be a multi-queue NIC
> device environment.

.. late SWIOTLB expansion to stitch the DMA pools together as both
Mellanox and VirtIO are not 32-bit DMA bound.

> 
> So we might be having less configured guest memory, but we still might
> be using that configuration with I/O intensive workloads.
> 
> Thanks,
> Ashish
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3] swiotlb: Adjust SWIOTLB bounce buffer size for SEV guests.

2020-11-17 Thread Kalra, Ashish
Hello Konrad,

Actually I didn't get that. Do you mean you are taking the 1G and <=4G cases
out of the patch and only going to apply the >4G case as part of the patch?

Thanks,
Ashish 

> On Nov 17, 2020, at 11:38 AM, Kalra, Ashish  wrote:
> 
> Hello Konrad, 
> 
>> On Tue, Nov 17, 2020 at 12:00:03PM -0500, Konrad Rzeszutek Wilk wrote:
>> .snip..
>> Lets break this down:
>> 
>> How does the performance improve for one single device if you increase 
>> the SWIOTLB?
>> Is there a specific device/driver that you can talk about that improve 
>> with this patch?
>> 
>> 
> 
> Yes, these are mainly for multi-queue devices such as NICs or even
> multi-queue virtio. 
> 
> This basically improves performance with concurrent DMA, hence,
> basically multi-queue devices.
 
 OK, and for _1GB_ guest - what are the "internal teams/external customers" 
 amount 
 of CPUs they use? Please lets use real use-cases.
>>> 
> I am sure you will understand we cannot share any external customer
> data as all that customer information is proprietary.
> 
> In similar situation if you have to share Oracle data, you will
> surely have the same concerns and i don't think you will be able
> to share any such information externally, i.e., outside Oracle.
> 
 I am asking for a simple query - what amount of CPUs does a 1GB
 guest have? The reason for this should be fairly obvious - if
 it is a 1vCPU, then there is no multi-queue and the existing
 SWIOTLB pool size as it is OK.
 
 If however there are say 2 and multiqueue is enabled, that
 gives me an idea of how many you use and I can find out what
 the maximum pool size usage of virtio there is with that configuration.
>>> 
>>> Again we cannot share any customer data.
>>> 
>>> Also i don't think there can be a definitive answer to how many vCPUs a
>>> 1GB guest will have, it will depend on what kind of configuration we are
>>> testing.
>>> 
>>> For example, i usually setup 4-16 vCPUs for as low as 512M configured
>>> guest memory.
>> 
>> Sure, but you are not the normal user.
>> 
>> That is I don't like that for 1GB guests your patch ends up doubling the
>> SWIOTLB memory pool. That seems to me we are trying to solve a problem
>> that normal users will not hit. That is why I want 'here is the customer
>> bug'.
>> 
>> Here is what I am going to do - I will take the 1GB and 4GB cases out of
>> your patch and call it a day. If there are customers who start reporting
>> issues we can revisit that. Nothing wrong with 'Reported-by' XYZ (we often
>> ask the customer if he or she would like to be recognized on upstream bugs).
>> 
> 
> Ok.
> 
>> And in the meantime I am going to look about adding ..
>>> 
>>> I have been also testing with 16 vCPUs configuration for 512M-1G guest
>>> memory with Mellanox SRIOV NICs, and this will be a multi-queue NIC
>>> device environment.
>> 
>> .. late SWIOTLB expansion to stitch the DMA pools together as both
>> Mellanox and VirtIO are not 32-bit DMA bound.
>> 
>>> 
>>> So we might be having less configured guest memory, but we still might
>>> be using that configuration with I/O intensive workloads.
>>> 
> 
> I am going to submit v4 of my current patch-set which uses max() instead
> of clamp() and also replaces constants defined in this patch with the
> pre-defined ones in sizes.h
> 
> Thanks,
> Ashish
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: remove dma_virt_ops v2

2020-11-17 Thread santosh . shilimkar

On 11/17/20 2:50 AM, Ka-Cheong Poon wrote:

On 11/13/20 1:36 AM, santosh.shilim...@oracle.com wrote:

+ Ka-Cheong

On 11/12/20 5:23 AM, Jason Gunthorpe wrote:

On Thu, Nov 12, 2020 at 10:40:30AM +0100, Christoph Hellwig wrote:

ping?

On Fri, Nov 06, 2020 at 07:19:31PM +0100, Christoph Hellwig wrote:

Hi Jason,

this series switches the RDMA core to opencode the special case of
devices bypassing the DMA mapping in the RDMA ULPs.  The virt ops
have caused a bit of trouble due to the P2P code not working with
them due to the fact that we'd do two dma mapping iterations for a
single I/O, but also are a bit of layering violation and lead to
more code than necessary.

Tested with nvme-rdma over rxe.

Note that the rds changes are untested, as I could not find any
simple rds test setup.

Changes since v2:
  - simplify the INFINIBAND_VIRT_DMA dependencies
  - add a ib_uses_virt_dma helper
  - use ib_uses_virt_dma in nvmet-rdma to disable p2p for virt_dma 
devices

  - use ib_dma_max_seg_size in umem
  - stop using dmapool in rds

Changes since v1:
  - disable software RDMA drivers for highmem configs
  - update the PCI commit logs


Santosh, can you please check the RDS parts?



Hi Ka-Cheong,

Can you please check Christoph's change [1], which cleans up the
dma-pool API to use ib_dma_* and the slab allocator? This was added
as part of your "net/rds: Use DMA memory pool allocation for rds_header"
commit.



I applied the patch and ran some basic testing.  And it seems to
work fine.


Thanks Ka-Cheong.

Jason, feel free to add an ack for the RDS part.

Regards,
Santosh


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH v4] swiotlb: Adjust SWIOTLB bounce buffer size for SEV guests.

2020-11-17 Thread Ashish Kalra
From: Ashish Kalra 

For SEV, all DMA to and from the guest has to use shared
(un-encrypted) pages. SEV uses SWIOTLB to make this
happen without requiring changes to device drivers.
However, depending on the workload being run, the default
64MB of SWIOTLB might not be enough and SWIOTLB
may run out of buffers to use for DMA, resulting
in I/O errors and/or performance degradation for
high I/O workloads.

Increase the default size of SWIOTLB for SEV guests
using a minimum value of 128MB and a maximum value
of 512MB, depending on the amount of provisioned guest
memory.
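
For example, under this policy a 2GB SEV guest gets max(64MB, 256MB) =
256MB of SWIOTLB and a 16GB guest gets max(64MB, 512MB) = 512MB, while
a non-SEV guest keeps the 64MB default.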

Using the late_initcall() interface to invoke
swiotlb_adjust() does not work, as the size
adjustment needs to be done before mem_encrypt_init()
and reserve_crashkernel(), which use the allocated
SWIOTLB buffer size. Hence it is called explicitly
from setup_arch().

The SWIOTLB default size adjustment is added as an
architecture specific interface/callback to allow
architectures such as those supporting memory
encryption to adjust/expand SWIOTLB size for their
use.

Signed-off-by: Ashish Kalra 
---
 arch/x86/kernel/setup.c   |  2 ++
 arch/x86/mm/mem_encrypt.c | 32 
 include/linux/swiotlb.h   |  1 +
 kernel/dma/swiotlb.c  | 27 +++
 4 files changed, 62 insertions(+)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 3511736fbc74..b073d58dd4a3 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1166,6 +1166,8 @@ void __init setup_arch(char **cmdline_p)
if (boot_cpu_has(X86_FEATURE_GBPAGES))
hugetlb_cma_reserve(PUD_SHIFT - PAGE_SHIFT);
 
+   swiotlb_adjust();
+
/*
 * Reserve memory for crash kernel after SRAT is parsed so that it
 * won't consume hotpluggable memory.
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 3f248f0d0e07..f6c04a3ac830 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -490,6 +490,38 @@ static void print_mem_encrypt_feature_info(void)
 }
 
 /* Architecture __weak replacement functions */
+unsigned long __init arch_swiotlb_adjust(unsigned long iotlb_default_size)
+{
+   unsigned long size = 0;
+
+   /*
+* For SEV, all DMA has to occur via shared/unencrypted pages.
+* SEV uses SWIOTLB to make this happen without changing device
+* drivers. However, depending on the workload being run, the
+* default 64MB of SWIOTLB may not be enough & SWIOTLB may
+* run out of buffers for DMA, resulting in I/O errors and/or
+* performance degradation especially with high I/O workloads.
+* Increase the default size of SWIOTLB for SEV guests using
+* a minimum value of 128MB and a maximum value of 512MB,
+* depending on amount of provisioned guest memory.
+*/
+   if (sev_active()) {
+   phys_addr_t total_mem = memblock_phys_mem_size();
+
+   if (total_mem <= SZ_1G)
+   size = max(iotlb_default_size, (unsigned long) SZ_128M);
+   else if (total_mem <= SZ_4G)
+   size = max(iotlb_default_size, (unsigned long) SZ_256M);
+   else
+   size = max(iotlb_default_size, (unsigned long) SZ_512M);
+
+   pr_info("SEV adjusted max SWIOTLB size = %luMB",
+   size >> 20);
+   }
+
+   return size;
+}
+
 void __init mem_encrypt_init(void)
 {
if (!sme_me_mask)
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 046bb94bd4d6..01ae6d891327 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -33,6 +33,7 @@ extern void swiotlb_init(int verbose);
 int swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose);
 extern unsigned long swiotlb_nr_tbl(void);
 unsigned long swiotlb_size_or_default(void);
+extern void __init swiotlb_adjust(void);
 extern int swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs);
 extern void __init swiotlb_update_mem_attributes(void);
 
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index c19379fabd20..66a9e627bb51 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -163,6 +163,33 @@ unsigned long swiotlb_size_or_default(void)
return size ? size : (IO_TLB_DEFAULT_SIZE);
 }
 
+unsigned long __init __weak arch_swiotlb_adjust(unsigned long size)
+{
+   return 0;
+}
+
+void __init swiotlb_adjust(void)
+{
+   unsigned long size;
+
+   /*
+* If swiotlb parameter has not been specified, give a chance to
+* architectures such as those supporting memory encryption to
+* adjust/expand SWIOTLB size for their use.
+*/
+   if (!io_tlb_nslabs) {
+   size = arch_swiotlb_adjust(IO_TLB_DEFAULT_SIZE);
+   if (size) {
+   size = ALIGN(size, 1 << IO_TLB_SHIFT);
+   io_tlb_nslabs = size >> IO_TLB_SHIFT;
+   io_tlb_nslabs = ALIGN(io_tlb_nslabs, IO_TLB_SEGSIZE);
+   }
+   }
+}

Re: remove dma_virt_ops v2

2020-11-17 Thread Jason Gunthorpe
On Fri, Nov 06, 2020 at 07:19:31PM +0100, Christoph Hellwig wrote:
> Hi Jason,
> 
> this series switches the RDMA core to opencode the special case of
> devices bypassing the DMA mapping in the RDMA ULPs.  The virt ops
> have caused a bit of trouble due to the P2P code not working with
> them due to the fact that we'd do two dma mapping iterations for a
> single I/O, but also are a bit of layering violation and lead to
> more code than necessary.
> 
> Tested with nvme-rdma over rxe.
> 
> Note that the rds changes are untested, as I could not find any
> simple rds test setup.
> 
> Changes since v2:
>  - simplify the INFINIBAND_VIRT_DMA dependencies
>  - add a ib_uses_virt_dma helper
>  - use ib_uses_virt_dma in nvmet-rdma to disable p2p for virt_dma devices
>  - use ib_dma_max_seg_size in umem
>  - stop using dmapool in rds
> 
> Changes since v1:
>  - disable software RDMA drivers for highmem configs
>  - update the PCI commit logs

All applied to for-next, thanks everyone

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3] swiotlb: Adjust SWIOTLB bounce buffer size for SEV guests.

2020-11-17 Thread Konrad Rzeszutek Wilk
On Tue, Nov 17, 2020 at 07:04:59PM +, Kalra, Ashish wrote:
> Hello Konrad,
> 
> Actually I didn't get that. Do you mean you are taking the 1G and <=4G cases
> out of the patch and only going to apply the >4G case as part of the patch?

That was the thought, but now I am wondering how TDX is going to work
with this. That is, the __weak override won't work on a distro kernel
that has to run on both AMD and Intel. Hmm.

Let me brush off the late-SWIOTLB patches that we developed internally
some time ago.

> 
> Thanks,
> Ashish 
> 
> > On Nov 17, 2020, at 11:38 AM, Kalra, Ashish  wrote:
> > 
> > Hello Konrad, 
> > 
> >> On Tue, Nov 17, 2020 at 12:00:03PM -0500, Konrad Rzeszutek Wilk wrote:
> >> .snip..
> >> Lets break this down:
> >> 
> >> How does the performance improve for one single device if you increase 
> >> the SWIOTLB?
> >> Is there a specific device/driver that you can talk about that improve 
> >> with this patch?
> >> 
> >> 
> > 
> > Yes, these are mainly for multi-queue devices such as NICs or even
> > multi-queue virtio. 
> > 
> > This basically improves performance with concurrent DMA, hence,
> > basically multi-queue devices.
>  
>  OK, and for _1GB_ guest - what are the "internal teams/external 
>  customers" amount 
>  of CPUs they use? Please lets use real use-cases.
> >>> 
> > I am sure you will understand we cannot share any external customer
> > data as all that customer information is proprietary.
> > 
> > In similar situation if you have to share Oracle data, you will
> > surely have the same concerns and i don't think you will be able
> > to share any such information externally, i.e., outside Oracle.
> > 
>  I am asking for a simple query - what amount of CPUs does a 1GB
>  guest have? The reason for this should be fairly obvious - if
>  it is a 1vCPU, then there is no multi-queue and the existing
>  SWIOTLB pool size as it is OK.
>  
>  If however there are say 2 and multiqueue is enabled, that
>  gives me an idea of how many you use and I can find out what
>  the maximum pool size usage of virtio there is with that configuration.
> >>> 
> >>> Again we cannot share any customer data.
> >>> 
> >>> Also i don't think there can be a definitive answer to how many vCPUs a
> >>> 1GB guest will have, it will depend on what kind of configuration we are
> >>> testing.
> >>> 
> >>> For example, i usually setup 4-16 vCPUs for as low as 512M configured
> >>> guest memory.
> >> 
> >> Sure, but you are not the normal user.
> >> 
> >> That is I don't like that for 1GB guests your patch ends up doubling the
> >> SWIOTLB memory pool. That seems to me we are trying to solve a problem
> >> that normal users will not hit. That is why I want 'here is the customer
> >> bug'.
> >> 
> >> Here is what I am going to do - I will take the 1GB and 4GB cases out of
> >> your patch and call it a day. If there are customers who start reporting
> >> issues we can revisit that. Nothing wrong with 'Reported-by' XYZ (we often
> >> ask the customer if he or she would like to be recognized on upstream bugs).
> >> 
> > 
> > Ok.
> > 
> >> And in the meantime I am going to look about adding ..
> >>> 
> >>> I have been also testing with 16 vCPUs configuration for 512M-1G guest
> >>> memory with Mellanox SRIOV NICs, and this will be a multi-queue NIC
> >>> device environment.
> >> 
> >> .. late SWIOTLB expansion to stitch the DMA pools together as both
> >> Mellanox and VirtIO are not 32-bit DMA bound.
> >> 
> >>> 
> >>> So we might be having less configured guest memory, but we still might
> >>> be using that configuration with I/O intensive workloads.
> >>> 
> > 
> > I am going to submit v4 of my current patch-set which uses max() instead
> > of clamp() and also replaces constants defined in this patch with the
> > pre-defined ones in sizes.h
> > 
> > Thanks,
> > Ashish
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v5 1/2] iommu/iova: Retry from last rb tree node if iova search fails

2020-11-17 Thread Will Deacon
On Wed, 30 Sep 2020 13:14:23 +0530, vji...@codeaurora.org wrote:
> Whenever a new iova alloc request comes, the iova is always searched
> from the cached node and the nodes which are previous to the cached
> node. So, even if there is free iova space available in the nodes
> which are next to the cached node, iova allocation can still fail
> because of this approach.
> 
> Consider the following sequence of iova alloc and frees on
> 1GB of iova space
> 
> [...]

Applied to arm64 (for-next/iommu/iova), thanks!

[1/2] iommu/iova: Retry from last rb tree node if iova search fails
  https://git.kernel.org/arm64/c/4e89dce72521
[2/2] iommu/iova: Free global iova rcache on iova alloc failure
  https://git.kernel.org/arm64/c/6fa3525b455a

Cheers,
-- 
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu: Modify the description of iommu_sva_unbind_device

2020-11-17 Thread Will Deacon
On Fri, 23 Oct 2020 06:48:27 +, Chen Jun wrote:
> iommu_sva_unbind_device has no return value.
> 
> Remove the description of the return value of the function.

Applied to arm64 (for-next/iommu/misc), thanks!

[1/1] iommu: Modify the description of iommu_sva_unbind_device
  https://git.kernel.org/arm64/c/6243f572a18d

Cheers,
-- 
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu/vt-d: include conditionally on CONFIG_INTEL_IOMMU_SVM

2020-11-17 Thread Will Deacon
On Sun, 15 Nov 2020 21:59:51 +0100, Lukas Bulwahn wrote:
> Commit 6ee1b77ba3ac ("iommu/vt-d: Add svm/sva invalidate function")
> introduced intel_iommu_sva_invalidate() when CONFIG_INTEL_IOMMU_SVM.
> This function uses the dedicated static variable inv_type_granu_table
> and functions to_vtd_granularity() and to_vtd_size().
> 
> These parts are unused when !CONFIG_INTEL_IOMMU_SVM, and hence
> make CC=clang W=1 warns with a -Wunused-function warning.
> 
> [...]

Applied to arm64 (for-next/iommu/vt-d), thanks!

[1/1] iommu/vt-d: include conditionally on CONFIG_INTEL_IOMMU_SVM
  https://git.kernel.org/arm64/c/68dd9d89eaf5

Cheers,
-- 
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [GIT PULL] iommu/arm-smmu: First batch of updates for 5.11

2020-11-17 Thread Will Deacon
On Tue, Nov 10, 2020 at 01:56:57PM +, Will Deacon wrote:
> Please can you pull these Arm SMMU updates for 5.11 so that they can get
> into -next? I think Bjorn is keen to get a bunch of DT updates moving, so
> the sooner we can get this lot out there, the better. Summary in the tag.
> 
> There are a few other patches kicking around on the list, so I may send
> a second pull on top in a couple of weeks or so.

This is now queued on a stable branch for -next targetting 5.11:

https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=for-next/iommu/arm-smmu

Bjorn, is this what you needed?

Cheers,

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 00/14] iommu/amd: Add Generic IO Page Table Framework Support

2020-11-17 Thread Will Deacon
Hey Suravee (it's been a while!),

On Fri, Nov 13, 2020 at 12:57:18PM +0700, Suravee Suthikulpanit wrote:
> Please ignore the V3. I am working on a V4 to resubmit.

Please can you put me on CC for that?

Thanks,

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu/amd: Enforce 4k mapping for certain IOMMU data structures

2020-11-17 Thread Will Deacon
On Wed, Oct 28, 2020 at 11:18:24PM +, Suravee Suthikulpanit wrote:
> AMD IOMMU requires 4k-aligned pages for the event log, the PPR log,
> and the completion wait write-back regions. However, when allocating
> the pages, they could be part of a large mapping (e.g. 2M) page.
> This causes a #PF because the SNP RMP hardware enforces the check
> based on the page level for these data structures.

Please could you include an example backtrace here?

> So, fix by calling set_memory_4k() on the allocated pages.

I think I'm missing something here. set_memory_4k() will break the kernel
linear mapping up into page granular mappings, but the IOMMU isn't using
that mapping, right? It's just using the physical address returned by
iommu_virt_to_phys(), so why does it matter?

It would just be nice to capture some of this rationale in the log,
especially as I'm not familiar with this device.

> Fixes: commit c69d89aff393 ("iommu/amd: Use 4K page for completion wait 
> write-back semaphore")

I couldn't figure out how that commit could cause this problem. Please can
you explain that to me?

Cheers,

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2] iommu/vt-d: avoid unnecessary panic if iommu init fail in tboot system

2020-11-17 Thread Lu Baolu

+Will

Please consider this patch for v5.10.

Best regards,
baolu

On 2020/11/10 15:19, Zhenzhong Duan wrote:

"intel_iommu=off" command line is used to disable iommu but iommu is force
enabled in a tboot system for security reason.

However for better performance on high speed network device, a new option
"intel_iommu=tboot_noforce" is introduced to disable the force on.

By default kernel should panic if iommu init fail in tboot for security
reason, but it's unnecessory if we use "intel_iommu=tboot_noforce,off".

Fix the code setting force_on and move intel_iommu_tboot_noforce
from tboot code to intel iommu code.

Fixes: 7304e8f28bb2 ("iommu/vt-d: Correctly disable Intel IOMMU force on")
Signed-off-by: Zhenzhong Duan 
---
v2: move ckeck of intel_iommu_tboot_noforce into iommu code per Baolu.

  arch/x86/kernel/tboot.c | 3 ---
  drivers/iommu/intel/iommu.c | 5 +++--
  include/linux/intel-iommu.h | 1 -
  3 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/tboot.c b/arch/x86/kernel/tboot.c
index 992fb14..420be87 100644
--- a/arch/x86/kernel/tboot.c
+++ b/arch/x86/kernel/tboot.c
@@ -514,9 +514,6 @@ int tboot_force_iommu(void)
if (!tboot_enabled())
return 0;
  
-	if (intel_iommu_tboot_noforce)
-		return 1;
-
if (no_iommu || swiotlb || dmar_disabled)
pr_warn("Forcing Intel-IOMMU to enabled\n");
  
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 1b1ca63..4d9b298 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -179,7 +179,7 @@ static inline unsigned long virt_to_dma_pfn(void *p)
   * (used when kernel is launched w/ TXT)
   */
  static int force_on = 0;
-int intel_iommu_tboot_noforce;
+static int intel_iommu_tboot_noforce;
  static int no_platform_optin;
  
  #define ROOT_ENTRY_NR (VTD_PAGE_SIZE/sizeof(struct root_entry))

@@ -4885,7 +4885,8 @@ int __init intel_iommu_init(void)
 * Intel IOMMU is required for a TXT/tboot launch or platform
 * opt in, so enforce that.
 */
-   force_on = tboot_force_iommu() || platform_optin_force_iommu();
+   force_on = (!intel_iommu_tboot_noforce && tboot_force_iommu()) ||
+   platform_optin_force_iommu();
  
  	if (iommu_init_mempool()) {

if (force_on)
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index fbf5b3e..d956987 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -798,7 +798,6 @@ struct context_entry *iommu_context_addr(struct intel_iommu 
*iommu, u8 bus,
  extern int iommu_calculate_max_sagaw(struct intel_iommu *iommu);
  extern int dmar_disabled;
  extern int intel_iommu_enabled;
-extern int intel_iommu_tboot_noforce;
  extern int intel_iommu_gfx_mapped;
  #else
  static inline int iommu_calculate_agaw(struct intel_iommu *iommu)
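
In short, with this change a plain tboot launch still forces the IOMMU
on, "intel_iommu=tboot_noforce" skips the forcing while leaving the
platform opt-in path intact, and "intel_iommu=tboot_noforce,off" now
disables the IOMMU without triggering the force_on panic path.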


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 8/8] WIP: add a dma_alloc_contiguous API

2020-11-17 Thread Ricardo Ribalda
Hi Christoph

I have been testing your patchset with real hardware on arm64, and uvc
performs 20 times better using Kieran's test:

https://github.com/ribalda/linux/tree/uvc-noncontiguous

These are the results of running yavta --capture=1000:


dma_alloc_noncontiguous

frames:  999
packets: 999
empty:   0 (0 %)
errors:  0
invalid: 0
pts: 0 early, 0 initial, 999 ok
scr: 0 count ok, 0 diff ok
sof: 2048 <= sof <= 0, freq 0.000 kHz
bytes 78466000 : duration 33303
FPS: 29.99
URB: 418105/5000 uS/qty: 83.621 avg 98.783 std 17.396 min 1264.688 max (uS)
header: 100040/5000 uS/qty: 20.008 avg 19.458 std 2.969 min 454.167 max (uS)
latency: 347653/5000 uS/qty: 69.530 avg 98.937 std 9.114 min 1256.875 max (uS)
decode: 70452/5000 uS/qty: 14.090 avg 11.547 std 6.146 min 271.510 max (uS)
raw decode speed: 8.967 Gbits/s
raw URB handling speed: 1.501 Gbits/s
throughput: 18.848 Mbits/s
URB decode CPU usage 0.211500 %


usb_alloc_coherent

frames:  999
packets: 999
empty:   0 (0 %)
errors:  0
invalid: 0
pts: 0 early, 0 initial, 999 ok
scr: 0 count ok, 0 diff ok
sof: 2048 <= sof <= 0, freq 0.000 kHz
bytes 70501712 : duration 33319
FPS: 29.98
URB: 1854128/5000 uS/qty: 370.825 avg 417.133 std 14.539 min 2875.760 max (uS)
header: 98765/5000 uS/qty: 19.753 avg 30.714 std 1.042 min 573.463 max (uS)
latency: 453316/5000 uS/qty: 90.663 avg 114.987 std 4.065 min 860.795 max (uS)
decode: 1400811/5000 uS/qty: 280.162 avg 330.786 std 6.305 min 2758.202 max (uS)
raw decode speed: 402.866 Mbits/s
raw URB handling speed: 304.214 Mbits/s
throughput: 16.927 Mbits/s
URB decode CPU usage 4.204200 %
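
Comparing the two runs: average URB handling time drops from ~371 us to
~84 us, URB decode time from ~280 us to ~14 us, and URB decode CPU usage
from ~4.2% to ~0.21%, which is roughly the 20x figure above.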


Best regards

On Tue, Nov 10, 2020 at 10:57 AM Christoph Hellwig  wrote:
>
> On Tue, Nov 10, 2020 at 06:50:32PM +0900, Tomasz Figa wrote:
> > In what terms it doesn't actually work? Last time I checked some
> > platforms actually defined CONFIG_DMA_NONCOHERENT, so those would
> > instead use the kmalloc() + dma_map() path. I don't have any
> > background on why that was added and whether it needs to be preserved,
> > though. Kieran, Laurent, do you have any insight?
>
> CONFIG_DMA_NONCOHERENT is set on sh and mips for platforms that may
> support non-coherent DMA at compile time (but at least for mips that
> doesn't actually means this gets used).  Using that ifdef to decide
> on using usb_alloc_coherent vs letting the usb layer map the data
> seems at best odd, and if we are unlucky papering over a bug somewhere.



-- 
Ricardo Ribalda
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: IOMMU Maintainership

2020-11-17 Thread Will Deacon
On Tue, Nov 17, 2020 at 11:09:53AM +0100, Joerg Roedel wrote:
> Luckily Will Deacon volunteered to handle incoming IOMMU patches and
> send them upstream. So please Cc him on any patches that you want to
> have merged upstream for the next release and on important fixes for
> v5.10. The patches will go through another tree for the time being, Will
> can share the details on that.

Thanks Joerg, and please try to get some rest.

As for the temporary new workflow; I'll be queueing IOMMU patches on
branches in the arm64 tree and merging them into a non-stable branch
(for-next/iommu/core) which will go into -next. I'll send a separate pull
for the IOMMU bits when the time comes.

If you have IOMMU patches targeting 5.10 or 5.11, then please CC me to
make sure I don't miss them. Alex, Robin and Lu have also offered to help
with review and I can pull from them too.

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu: fix return error code in iommu_probe_device()

2020-11-17 Thread Will Deacon
On Tue, Nov 17, 2020 at 07:11:28PM +0800, Yang Yingliang wrote:
> On 2020/11/17 17:40, Lu Baolu wrote:
> > On 2020/11/17 10:52, Yang Yingliang wrote:
> > > If iommu_group_get() failed, it need return error code
> > > in iommu_probe_device().
> > > 
> > > Fixes: cf193888bfbd ("iommu: Move new probe_device path...")
> > > Reported-by: Hulk Robot 
> > > Signed-off-by: Yang Yingliang 
> > > ---
> > >   drivers/iommu/iommu.c | 4 +++-
> > >   1 file changed, 3 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > > index b53446bb8c6b..6f4a32df90f6 100644
> > > --- a/drivers/iommu/iommu.c
> > > +++ b/drivers/iommu/iommu.c
> > > @@ -253,8 +253,10 @@ int iommu_probe_device(struct device *dev)
> > >   goto err_out;
> > >     group = iommu_group_get(dev);
> > > -    if (!group)
> > > +    if (!group) {
> > > +    ret = -ENODEV;
> > 
> > Can you please explain why you use -ENODEV here?
> 
> Before 79659190ee97 ("iommu: Don't take group reference in
> iommu_alloc_default_domain()"), iommu_alloc_default_domain() would
> return -ENODEV if group was NULL.

Hmm. While I think the patch is ok, I'm not sure it qualifies as a fix.
Has iommu_probe_device() ever propagated this error? The commit you
identify in the 'Fixes:' tag doesn't seem to change this afaict.

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu