RE: [RFC v2] /dev/iommu uAPI proposal

2021-08-10 Thread Tian, Kevin
> From: David Gibson 
> Sent: Tuesday, August 10, 2021 12:48 PM
> 
> On Mon, Aug 09, 2021 at 08:34:06AM +, Tian, Kevin wrote:
> > > From: David Gibson 
> > > Sent: Friday, August 6, 2021 12:45 PM
> > > > > > In concept I feel the purpose of DMA endpoint is equivalent to the
> > > routing
> > > > > > info in this proposal.
> > > > >
> > > > > Maybe?  I'm afraid I never quite managed to understand the role of
> the
> > > > > routing info in your proposal.
> > > > >
> > > >
> > > > the IOMMU routes incoming DMA packets to a specific I/O page table,
> > > > according to RID or RID+PASID carried in the packet. RID or RID+PASID
> > > > is the routing information (represented by device cookie +PASID in
> > > > proposed uAPI) and what the iommu driver really cares when activating
> > > > the I/O page table in the iommu.
> > >
> > > Ok, so yes, endpoint is roughly equivalent to that.  But my point is
> > > that the IOMMU layer really only cares about that (device+routing)
> > > combination, not other aspects of what the device is.  So that's the
> > > concept we should give a name and put front and center in the API.
> > >
> >
> > This is how this proposal works, centered around the routing info. the
> > uAPI doesn't care what the device is. It just requires the user to specify
> > the user view of routing info (device fd + optional pasid) to tag an IOAS.
> 
> Which works as long as (just device fd) and (device fd + PASID) covers
> all the options.  If we have new possibilities we need new interfaces.
> And, that can't even handle the case of one endpoint for multiple
> devices (e.g. ACS-incapable bridge).

Why not? We went through a long debate in v1 to reach the conclusion
that a device-centric uAPI can cover the above group scenario (see section 1.3).

> 
> > the user view is then converted to the kernel view of routing (rid or
> > rid+pasid) by vfio driver and then passed to iommu fd in the attaching
> > operation. and GET_INFO interface is provided for the user to check
> > whether a device supports multiple IOASes and whether pasid space is
> > delegated to the user. We just need a better name if pasid is considered
> > too pci specific...
> >
> > But creating an endpoint per ioasid and making it centered in uAPI is not
> > what the IOMMU layer cares about.
> 
> It's not an endpoint per ioasid.  You can have multiple endpoints per
> ioasid, just not the other way around.  As it is multiple IOASes per

you need to create one endpoint per device fd to attach to gpa_ioasid.

then one endpoint per device fd to attach to pasidtbl_ioasid on arm/amd.

then one endpoint per pasid to attach to gva_ioasid on intel.

In the end you just create one endpoint per attached ioasid for a given
device.

> device means *some* sort of disambiguation (generally by PASID) which
> is hard to describe generally.  Having endpoints as a first-class
> concept makes that simpler.
> 

I don't think pasid causes any ambiguity (except the name itself being
pci specific). With multiple IOASes you always need an id to tag each of
them. This id is what the iommu layer cares about; which endpoint on the
device uses the id is none of the iommu's business.
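
To make this concrete, here is a minimal sketch of the attach flow being
described. All ioctl numbers, flag names and the struct below are
illustrative placeholders for discussion, not the actual proposed uAPI:

#include <sys/ioctl.h>
#include <linux/types.h>

/* hypothetical placeholders, not from the RFC */
#define VFIO_DEVICE_ATTACH_IOASID	_IO(';', 0x60)
#define ATTACH_FLAG_PASID		(1u << 0)

struct attach_data {
	__u32	ioasid;	/* target I/O address space */
	__u32	flags;
	__u32	pasid;	/* routing tag, valid only with ATTACH_FLAG_PASID */
};

/* pasid < 0 means default (RID-only) routing */
static int attach_dev_to_ioas(int device_fd, __u32 ioasid, int pasid)
{
	struct attach_data data = { .ioasid = ioasid };

	if (pasid >= 0) {
		data.flags = ATTACH_FLAG_PASID;
		data.pasid = (__u32)pasid;
	}
	/* VFIO converts (device cookie, pasid) to (RID, pasid) for the iommu layer */
	return ioctl(device_fd, VFIO_DEVICE_ATTACH_IOASID, &data);
}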

Thanks
Kevin



Re: [PATCH] iommu/arm-smmu: Add clk_bulk_{prepare/unprepare} to system pm callbacks

2021-08-10 Thread Sai Prakash Ranjan

On 2021-08-03 11:36, Sai Prakash Ranjan wrote:

On 2021-08-02 21:42, Will Deacon wrote:

On Tue, Jul 27, 2021 at 03:03:22PM +0530, Sai Prakash Ranjan wrote:
Some clocks for SMMU can have parent as XO such as gpu_cc_hub_cx_int_clk
of GPU SMMU in QTI SC7280 SoC, and in order to enter deep sleep states in
such cases, we would need to drop the XO clock vote in the unprepare call;
this unprepare callback for XO is in the RPMh (Resource Power
Manager-Hardened) clock driver, which controls RPMh-managed clock
resources for new QTI SoCs, and is a blocking call.

Given we cannot have sleeping calls such as clk_bulk_prepare() and
clk_bulk_unprepare() in arm-smmu runtime pm callbacks, since iommu
operations like map and unmap can be in atomic context and are in the
fast path, add this prepare and unprepare call to drop the XO vote only
for system pm callbacks, since that is not a fast path and we expect the
system to enter deep sleep states with system pm as opposed to runtime pm.

This is similar to the sequence of clock requests (prepare,enable and
disable,unprepare) in arm-smmu probe and remove.

Signed-off-by: Sai Prakash Ranjan 
Co-developed-by: Rajendra Nayak 
Signed-off-by: Rajendra Nayak 
---
 drivers/iommu/arm/arm-smmu/arm-smmu.c | 20 ++--
 1 file changed, 18 insertions(+), 2 deletions(-)


[+Rob]

How does this work with that funny GPU which writes to the SMMU registers
directly? Does the SMMU need to remain independently clocked for that to
work or is it all in the same clock domain?



As Rob mentioned, device link should take care of all the dependencies
between SMMU and its consumers. But I am not sure how the question
relates to this patch, as this change is for system pm and not runtime
pm; it is exactly the sequence of SMMU probe/remove, so if that
currently works for that GPU SMMU, then it should work just fine for
system suspend and resume as well.
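
For reference, this is how the callbacks are wired up (a sketch mirroring
the existing arm-smmu code): runtime pm only enables/disables the clocks
(non-sleeping, safe for the atomic map/unmap paths), while the system pm
callbacks touched by this patch additionally prepare/unprepare them, which
may sleep but only runs on suspend/resume:

static const struct dev_pm_ops arm_smmu_pm_ops = {
	SET_SYSTEM_SLEEP_PM_OPS(arm_smmu_pm_suspend, arm_smmu_pm_resume)
	SET_RUNTIME_PM_OPS(arm_smmu_runtime_suspend,
			   arm_smmu_runtime_resume, NULL)
};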

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c

index d3c6f54110a5..9561ba4c5d39 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -2277,6 +2277,13 @@ static int __maybe_unused arm_smmu_runtime_suspend(struct device *dev)
 
 static int __maybe_unused arm_smmu_pm_resume(struct device *dev)
 {
+	int ret;
+	struct arm_smmu_device *smmu = dev_get_drvdata(dev);
+
+	ret = clk_bulk_prepare(smmu->num_clks, smmu->clks);
+	if (ret)
+		return ret;
+
 	if (pm_runtime_suspended(dev))
 		return 0;


If we subsequently fail to enable the clks in arm_smmu_runtime_resume(),
should we unprepare them again?



If we are unable to turn on the clks then it's fatal and we will not
live for long.



Nonetheless, it won't hurt to unprepare if clk enable fails as that is
the correct thing anyway, so I have added it and sent a v2.

Thanks,
Sai




Will

@@ -2285,10 +2292,19 @@ static int __maybe_unused arm_smmu_pm_resume(struct device *dev)
 
 static int __maybe_unused arm_smmu_pm_suspend(struct device *dev)
 {
+	int ret = 0;
+	struct arm_smmu_device *smmu = dev_get_drvdata(dev);
+
 	if (pm_runtime_suspended(dev))
-		return 0;
+		goto clk_unprepare;
 
-	return arm_smmu_runtime_suspend(dev);
+	ret = arm_smmu_runtime_suspend(dev);
+	if (ret)
+		return ret;
+
+clk_unprepare:
+	clk_bulk_unprepare(smmu->num_clks, smmu->clks);
+	return ret;
 }

 static const struct dev_pm_ops arm_smmu_pm_ops = {
--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation


Re: [iommu:apple/dart 3/3] drivers/iommu/apple-dart.c:730:17: error: initialization of 'size_t (*)(struct iommu_domain *, long unsigned int, size_t, struct iommu_iotlb_gather *)' {aka 'long unsigned

2021-08-10 Thread Sven Peter via iommu
Hi Joerg,

This happens because apple/dart is missing the "Optimizing iommu_[map/unmap]
performance" series, which is already in the core branch [1].
The same commit works fine in iommu/next since that branch merges both
iommu/core and apple/dart.

Thanks,

Sven

[1] https://lore.kernel.org/lkml/1623850736-389584-1-git-send-email-quic_c_gdj...@quicinc.com/
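
For anyone skimming the log below: the expected ops signatures can be read
straight out of the compiler messages. In sketch form (types inferred from
the error text, not copied from the series itself):

/* what apple-dart expects iommu/core to provide after the series lands */
int (*map_pages)(struct iommu_domain *domain, unsigned long iova,
		 phys_addr_t paddr, size_t pgsize, size_t pgcount,
		 int prot, gfp_t gfp, size_t *mapped);
size_t (*unmap_pages)(struct iommu_domain *domain, unsigned long iova,
		      size_t pgsize, size_t pgcount,
		      struct iommu_iotlb_gather *gather);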

On Tue, Aug 10, 2021, at 02:12, kernel test robot wrote:
> tree:   https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git 
> apple/dart
> head:   05ce9d20d699b093dec985192a7db63b48f26ca2
> commit: 05ce9d20d699b093dec985192a7db63b48f26ca2 [3/3] iommu/dart: Add 
> DART iommu driver
> config: sparc-allyesconfig (attached as .config)
> compiler: sparc64-linux-gcc (GCC) 10.3.0
> reproduce (this is a W=1 build):
>         wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         # https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git/commit/?id=05ce9d20d699b093dec985192a7db63b48f26ca2
>         git remote add iommu https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git
>         git fetch --no-tags iommu apple/dart
>         git checkout 05ce9d20d699b093dec985192a7db63b48f26ca2
>         # save the attached .config to linux build tree
>         COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-10.3.0 make.cross ARCH=sparc
> 
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot 
> 
> All errors (new ones prefixed by >>):
> 
>drivers/iommu/apple-dart.c: In function 'apple_dart_map_pages':
>drivers/iommu/apple-dart.c:380:12: error: 'struct io_pgtable_ops' 
> has no member named 'map_pages'
>  380 |  return ops->map_pages(ops, iova, paddr, pgsize, pgcount, 
> prot, gfp,
>  |^~
>drivers/iommu/apple-dart.c: In function 'apple_dart_unmap_pages':
>drivers/iommu/apple-dart.c:392:12: error: 'struct io_pgtable_ops' 
> has no member named 'unmap_pages'
>  392 |  return ops->unmap_pages(ops, iova, pgsize, pgcount, gather);
>  |^~
>drivers/iommu/apple-dart.c: At top level:
>drivers/iommu/apple-dart.c:729:3: error: 'const struct iommu_ops' 
> has no member named 'map_pages'
>  729 |  .map_pages = apple_dart_map_pages,
>  |   ^
>drivers/iommu/apple-dart.c:729:15: error: initialization of 'int 
> (*)(struct iommu_domain *, long unsigned int,  phys_addr_t,  size_t,  
> int,  gfp_t)' {aka 'int (*)(struct iommu_domain *, long unsigned int,  
> long long unsigned int,  long unsigned int,  int,  unsigned int)'} from 
> incompatible pointer type 'int (*)(struct iommu_domain *, long unsigned 
> int,  phys_addr_t,  size_t,  size_t,  int,  gfp_t,  size_t *)' {aka 
> 'int (*)(struct iommu_domain *, long unsigned int,  long long unsigned 
> int,  long unsigned int,  long unsigned int,  int,  unsigned int,  long 
> unsigned int *)'} [-Werror=incompatible-pointer-types]
>  729 |  .map_pages = apple_dart_map_pages,
>  |   ^~~~
>drivers/iommu/apple-dart.c:729:15: note: (near initialization for 
> 'apple_dart_iommu_ops.map')
>drivers/iommu/apple-dart.c:730:3: error: 'const struct iommu_ops' 
> has no member named 'unmap_pages'
>  730 |  .unmap_pages = apple_dart_unmap_pages,
>  |   ^~~
> >> drivers/iommu/apple-dart.c:730:17: error: initialization of 'size_t 
> (*)(struct iommu_domain *, long unsigned int,  size_t,  struct 
> iommu_iotlb_gather *)' {aka 'long unsigned int (*)(struct iommu_domain 
> *, long unsigned int,  long unsigned int,  struct iommu_iotlb_gather 
> *)'} from incompatible pointer type 'size_t (*)(struct iommu_domain *, 
> long unsigned int,  size_t,  size_t,  struct iommu_iotlb_gather *)' 
> {aka 'long unsigned int (*)(struct iommu_domain *, long unsigned int,  
> long unsigned int,  long unsigned int,  struct iommu_iotlb_gather *)'} 
> [-Werror=incompatible-pointer-types]
>  730 |  .unmap_pages = apple_dart_unmap_pages,
>  | ^~
>drivers/iommu/apple-dart.c:730:17: note: (near initialization for 
> 'apple_dart_iommu_ops.unmap')
>drivers/iommu/apple-dart.c: In function 'apple_dart_unmap_pages':
>drivers/iommu/apple-dart.c:393:1: error: control reaches end of 
> non-void function [-Werror=return-type]
>  393 | }
>  | ^
>drivers/iommu/apple-dart.c: In function 'apple_dart_map_pages':
>drivers/iommu/apple-dart.c:382:1: error: control reaches end of 
> non-void function [-Werror=return-type]
>  382 | }
>  | ^
>cc1: some warnings being treated as errors
> 
> 
> vim +730 drivers/iommu/apple-dart.c
> 
>723
>724static const struct iommu_ops apple_dart_iommu_ops = {
>725.domain_alloc = apple_dart_domain_alloc,
>726.domain_free = apple_dart_domain_free,
>727.attach_dev = 

[PATCHv2] iommu/arm-smmu: Add clk_bulk_{prepare/unprepare} to system pm callbacks

2021-08-10 Thread Sai Prakash Ranjan
Some clocks for SMMU can have parent as XO such as gpu_cc_hub_cx_int_clk
of GPU SMMU in QTI SC7280 SoC, and in order to enter deep sleep states in
such cases, we would need to drop the XO clock vote in the unprepare call;
this unprepare callback for XO is in the RPMh (Resource Power
Manager-Hardened) clock driver, which controls RPMh-managed clock
resources for new QTI SoCs.

Given we cannot have sleeping calls such as clk_bulk_prepare() and
clk_bulk_unprepare() in arm-smmu runtime pm callbacks, since iommu
operations like map and unmap can be in atomic context and are in the
fast path, add this prepare and unprepare call to drop the XO vote only
for system pm callbacks, since that is not a fast path and we expect the
system to enter deep sleep states with system pm as opposed to runtime pm.

This is similar to the sequence of clock requests (prepare,enable and
disable,unprepare) in arm-smmu probe and remove.

Signed-off-by: Sai Prakash Ranjan 
Co-developed-by: Rajendra Nayak 
Signed-off-by: Rajendra Nayak 
---

Changes in v2:
 * Add clk unprepare when clk enable fails in resume (Will)

---
 drivers/iommu/arm/arm-smmu/arm-smmu.c | 26 +++---
 1 file changed, 23 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index d3c6f54110a5..da8ef9d82d79 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -2277,18 +2277,38 @@ static int __maybe_unused arm_smmu_runtime_suspend(struct device *dev)
 
 static int __maybe_unused arm_smmu_pm_resume(struct device *dev)
 {
+   int ret;
+   struct arm_smmu_device *smmu = dev_get_drvdata(dev);
+
+   ret = clk_bulk_prepare(smmu->num_clks, smmu->clks);
+   if (ret)
+   return ret;
+
if (pm_runtime_suspended(dev))
return 0;
 
-   return arm_smmu_runtime_resume(dev);
+   ret = arm_smmu_runtime_resume(dev);
+   if (ret)
+   clk_bulk_unprepare(smmu->num_clks, smmu->clks);
+
+   return ret;
 }
 
 static int __maybe_unused arm_smmu_pm_suspend(struct device *dev)
 {
+   int ret = 0;
+   struct arm_smmu_device *smmu = dev_get_drvdata(dev);
+
if (pm_runtime_suspended(dev))
-   return 0;
+   goto clk_unprepare;
 
-   return arm_smmu_runtime_suspend(dev);
+   ret = arm_smmu_runtime_suspend(dev);
+   if (ret)
+   return ret;
+
+clk_unprepare:
+   clk_bulk_unprepare(smmu->num_clks, smmu->clks);
+   return ret;
 }
 
 static const struct dev_pm_ops arm_smmu_pm_ops = {
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation



Re: [RFC v2] /dev/iommu uAPI proposal

2021-08-10 Thread Eric Auger
Hi Kevin,

On 8/5/21 2:36 AM, Tian, Kevin wrote:
>> From: Eric Auger 
>> Sent: Wednesday, August 4, 2021 11:59 PM
>>
> [...] 
>>> 1.2. Attach Device to I/O address space
>>> +++
>>>
>>> Device attach/bind is initiated through passthrough framework uAPI.
>>>
>>> Device attaching is allowed only after a device is successfully bound to
>>> the IOMMU fd. User should provide a device cookie when binding the
>>> device through VFIO uAPI. This cookie is used when the user queries
>>> device capability/format, issues per-device iotlb invalidation and
>>> receives per-device I/O page fault data via IOMMU fd.
>>>
>>> Successful binding puts the device into a security context which isolates
>>> its DMA from the rest system. VFIO should not allow user to access the
>> s/from the rest system/from the rest of the system
>>> device before binding is completed. Similarly, VFIO should prevent the
>>> user from unbinding the device before user access is withdrawn.
>> With Intel Scalable IOV, I understand you could assign an RID/PASID to
>> one VM and another one to another VM (which is not the case for ARM). Is
>> it a targeted use case? How would it be handled? Is it related to the
>> sub-groups evoked hereafter?
> Not related to sub-group. Each mdev is bound to the IOMMU fd respectively
> with the defPASID which represents the mdev.
But how does it work in terms of security? The device (RID) is bound to
an IOMMU fd, but then each SID/PASID may be working for a different VM.
How do you detect that this is safe, i.e. that each SID can work safely
for a different VM, versus the ARM case where it is not possible?

1.3 says
"

1)  A successful binding call for the first device in the group creates
the security context for the entire group, by:
"
What does it mean for the above Scalable IOV use case?

>
>> Actually all devices bound to an IOMMU fd should have the same parent
>> I/O address space or root address space, am I correct? If so, maybe add
>> this comment explicitly?
> in most cases yes but it's not mandatory. multiple roots are allowed
> (e.g. with vIOMMU but no nesting).
OK, right, this corresponds to example 4.2, for instance. I misinterpreted
the notion of security context. The security context does not match the
IOMMU fd but is something implicit, created on the first device binding.
>
> [...]
>>> The device in the /dev/iommu context always refers to a physical one
>>> (pdev) which is identifiable via RID. Physically each pdev can support
>>> one default I/O address space (routed via RID) and optionally multiple
>>> non-default I/O address spaces (via RID+PASID).
>>>
>>> The device in VFIO context is a logic concept, being either a physical
>>> device (pdev) or mediated device (mdev or subdev). Each vfio device
>>> is represented by RID+cookie in IOMMU fd. User is allowed to create
>>> one default I/O address space (routed by vRID from user p.o.v) per
>>> each vfio_device.
>> The concept of default address space is not fully clear to me. I
>> currently understand this is a root address space (not nested). Is that
>> correct? This may need clarification.
> w/o PASID there is only one address space (either GPA or GIOVA)
> per device. This one is called default. whether it's root is orthogonal
> (e.g. GIOVA could be also nested) to the device view of this space.
>
> w/ PASID additional address spaces can be targeted by the device.
> those are called non-default.
>
> I could also rename default to RID address space and non-default to 
> RID+PASID address space if doing so makes it clearer.
Yes, I think it is worth having a kind of glossary, defining root and
default just as you clearly defined child/parent.
>
>>> VFIO decides the routing information for this default
>>> space based on device type:
>>>
>>> 1)  pdev, routed via RID;
>>>
>>> 2)  mdev/subdev with IOMMU-enforced DMA isolation, routed via
>>> the parent's RID plus the PASID marking this mdev;
>>>
>>> 3)  a purely sw-mediated device (sw mdev), no routing required i.e. no
>>> need to install the I/O page table in the IOMMU. sw mdev just uses
>>> the metadata to assist its internal DMA isolation logic on top of
>>> the parent's IOMMU page table;
>> Maybe you should introduce this concept of SW mediated device earlier
>> because it seems to special case the way the attach behaves. I am
>> especially refering to
>>
>> "Successful attaching activates an I/O address space in the IOMMU, if the
>> device is not purely software mediated"
> makes sense.
>
>>> In addition, VFIO may allow user to create additional I/O address spaces
>>> on a vfio_device based on the hardware capability. In such case the user
>>> has its own view of the virtual routing information (vPASID) when marking
>>> these non-default address spaces.
>> I do not catch what "marking these non-default address spaces" means.
> as explained above, those non-default address spaces are identified/routed
> via PASID. 
>
>>> 1.3. Group isolation
>>> 

Re: [PATCH v10 01/17] iova: Export alloc_iova_fast() and free_iova_fast()

2021-08-10 Thread Yongji Xie
On Tue, Aug 10, 2021 at 11:02 AM Jason Wang  wrote:
>
>
> On 2021/8/9 1:56 PM, Yongji Xie wrote:
> > On Thu, Aug 5, 2021 at 9:31 PM Jason Wang  wrote:
> >>
> >> On 2021/8/5 8:34 PM, Yongji Xie wrote:
>  My main point, though, is that if you've already got something else
>  keeping track of the actual addresses, then the way you're using an
>  iova_domain appears to be something you could do with a trivial bitmap
>  allocator. That's why I don't buy the efficiency argument. The main
>  design points of the IOVA allocator are to manage large address spaces
>  while trying to maximise spatial locality to minimise the underlying
>  pagetable usage, and allocating with a flexible limit to support
>  multiple devices with different addressing capabilities in the same
>  address space. If none of those aspects are relevant to the use-case -
>  which AFAICS appears to be true here - then as a general-purpose
>  resource allocator it's rubbish and has an unreasonably massive memory
>  overhead and there are many, many better choices.
> 
> >>> OK, I get your point. Actually we used the genpool allocator in the
> >>> early version. Maybe we can fall back to using it.
> >>
> >> I think maybe you can share some perf numbers to see how much
> >> alloc_iova_fast() can help.
> >>
> > I did some fio tests[1] with a ram-backend vduse block device[2].
> >
> > Following are some performance data:
> >
> >                  numjobs=1   numjobs=2   numjobs=4   numjobs=8
> > iova_alloc_fast  145k iops   265k iops   514k iops   758k iops
> > iova_alloc       137k iops   170k iops   128k iops   113k iops
> > gen_pool_alloc   143k iops   270k iops   458k iops   521k iops
> >
> > The iova_alloc_fast() has the best performance since we always hit the
> > per-cpu cache. Regardless of the per-cpu cache, the genpool allocator
> > should be better than the iova allocator.
>
>
> I think we see convincing numbers for using alloc_iova_fast() over
> gen_pool_alloc() (45% improvement at numjobs=8).
>

Yes, so alloc_iova_fast() still seems to be the best choice based on
performance considerations.
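
For reference, a minimal sketch of the fast path in question (the vduse_*
wrapper names are illustrative; only the iova APIs are real).
alloc_iova_fast() hits a per-cpu rcache before falling back to the
rbtree-based allocator, which is where the gap at high numjobs comes from:

#include <linux/iova.h>

static dma_addr_t vduse_alloc_range(struct iova_domain *iovad,
				    size_t size, unsigned long limit_pfn)
{
	/* size is converted to a page-frame count for the iova allocator */
	unsigned long pfn = alloc_iova_fast(iovad, size >> iova_shift(iovad),
					    limit_pfn, true);

	return pfn ? (dma_addr_t)pfn << iova_shift(iovad) : 0;
}

static void vduse_free_range(struct iova_domain *iovad,
			     dma_addr_t iova, size_t size)
{
	/* returns the range to the per-cpu cache when possible */
	free_iova_fast(iovad, iova >> iova_shift(iovad),
		       size >> iova_shift(iovad));
}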

Hi Robin, any comments?

Thanks,
Yongji

[PATCH v3 08/13] memory: mtk-smi: Add clocks for smi-sub-common

2021-08-10 Thread Yong Wu
SMI sub-common has only one output port, thus it has only one gals
clock (gals0). Then smi-sub-common requires three clocks (apb/smi/gals0)
in the has_gals case.

Signed-off-by: Yong Wu 
---
 drivers/memory/mtk-smi.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/memory/mtk-smi.c b/drivers/memory/mtk-smi.c
index 5c2bd5795cfd..58d9f7667490 100644
--- a/drivers/memory/mtk-smi.c
+++ b/drivers/memory/mtk-smi.c
@@ -74,10 +74,12 @@ static const char * const mtk_smi_larb_clks[] = {"apb", "smi", "gals"};
 
 /*
  * common: Require these four clocks in has_gals case. Otherwise, only apb/smi are required.
+ * sub common: Require apb/smi/gals0 clocks in has_gals case. Otherwise, only apb/smi are required.
  */
 static const char * const mtk_smi_common_clks[] = {"apb", "smi", "gals0", "gals1"};
 #define MTK_SMI_COM_REQ_CLK_NR		2
 #define MTK_SMI_COM_GALS_REQ_CLK_NR	MTK_SMI_CLK_NR_MAX
+#define MTK_SMI_SUB_COM_GALS_REQ_CLK_NR	3
 
 struct mtk_smi_common_plat {
 	enum mtk_smi_type	type;
@@ -467,8 +469,12 @@ static int mtk_smi_common_probe(struct platform_device *pdev)
 	common->dev = dev;
 	common->plat = of_device_get_match_data(dev);
 
-	if (common->plat->has_gals)
-		clk_required = MTK_SMI_COM_GALS_REQ_CLK_NR;
+	if (common->plat->has_gals) {
+		if (common->plat->type == MTK_SMI_GEN2)
+			clk_required = MTK_SMI_COM_GALS_REQ_CLK_NR;
+		else if (common->plat->type == MTK_SMI_GEN2_SUB_COMM)
+			clk_required = MTK_SMI_SUB_COM_GALS_REQ_CLK_NR;
+	}
 	ret = mtk_smi_dts_clk_init(dev, common, mtk_smi_common_clks, clk_required, 0);
 	if (ret)
 		return ret;
-- 
2.18.0



[PATCH v3 09/13] memory: mtk-smi: Use devm_platform_ioremap_resource

2021-08-10 Thread Yong Wu
No functional change. Simplify probing code.

Signed-off-by: Yong Wu 
Reviewed-by: Ikjoon Jang 
---
 drivers/memory/mtk-smi.c | 11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/drivers/memory/mtk-smi.c b/drivers/memory/mtk-smi.c
index 58d9f7667490..a001e41f5074 100644
--- a/drivers/memory/mtk-smi.c
+++ b/drivers/memory/mtk-smi.c
@@ -328,7 +328,6 @@ static int mtk_smi_dts_clk_init(struct device *dev, struct mtk_smi *smi,
 static int mtk_smi_larb_probe(struct platform_device *pdev)
 {
 	struct mtk_smi_larb *larb;
-	struct resource *res;
 	struct device *dev = &pdev->dev;
 	int ret;
 
@@ -337,8 +336,7 @@ static int mtk_smi_larb_probe(struct platform_device *pdev)
 		return -ENOMEM;
 
 	larb->larb_gen = of_device_get_match_data(dev);
-	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
-	larb->base = devm_ioremap_resource(dev, res);
+	larb->base = devm_platform_ioremap_resource(pdev, 0);
 	if (IS_ERR(larb->base))
 		return PTR_ERR(larb->base);
 
@@ -460,7 +458,6 @@ static int mtk_smi_common_probe(struct platform_device *pdev)
 {
 	struct device *dev = &pdev->dev;
 	struct mtk_smi *common;
-	struct resource *res;
 	int ret, clk_required = MTK_SMI_COM_REQ_CLK_NR;
 
 	common = devm_kzalloc(dev, sizeof(*common), GFP_KERNEL);
@@ -486,8 +483,7 @@ static int mtk_smi_common_probe(struct platform_device *pdev)
 	 * base.
 	 */
 	if (common->plat->type == MTK_SMI_GEN1) {
-		res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
-		common->smi_ao_base = devm_ioremap_resource(dev, res);
+		common->smi_ao_base = devm_platform_ioremap_resource(pdev, 0);
 		if (IS_ERR(common->smi_ao_base))
 			return PTR_ERR(common->smi_ao_base);
 
@@ -499,8 +495,7 @@ static int mtk_smi_common_probe(struct platform_device *pdev)
 		if (ret)
 			return ret;
 	} else {
-		res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
-		common->base = devm_ioremap_resource(dev, res);
+		common->base = devm_platform_ioremap_resource(pdev, 0);
 		if (IS_ERR(common->base))
 			return PTR_ERR(common->base);
 	}
-- 
2.18.0



[PATCH v3 03/13] memory: mtk-smi: Use clk_bulk clock ops

2021-08-10 Thread Yong Wu
Use the clk_bulk interface instead of the original one to simplify the code.

For SMI larbs: Require apb/smi clocks while gals is optional.
For SMI common: Require apb/smi/gals0/gals1 in the has_gals case.
Otherwise, only require apb/smi; no optional clk here.

About the "has_gals" flag: for smi larbs, the gals clock may be optional
even if the platform supports it, thus larbs always use
*_bulk_get_optional and the has_gals flag is unnecessary there. Remove it.
The smi_common's has_gals is kept.

Also remove the clk fail logs since the bulk interface already outputs a
failure log.
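
For illustration, the pattern this converts to looks like the following
standalone sketch (the real code is mtk_smi_dts_clk_init() in the diff
below; names here are made up). One bulk get replaces N devm_clk_get()
calls, and one bulk prepare_enable replaces a hand-rolled enable/unwind
ladder:

#include <linux/clk.h>
#include <linux/device.h>
#include <linux/kernel.h>

static struct clk_bulk_data example_clks[] = {
	{ .id = "apb" },
	{ .id = "smi" },
};

static int example_clk_init(struct device *dev)
{
	int ret = devm_clk_bulk_get(dev, ARRAY_SIZE(example_clks), example_clks);

	if (ret)
		return ret;	/* the bulk API already logs which clock failed */
	/* prepares then enables all clocks, unwinding internally on failure */
	return clk_bulk_prepare_enable(ARRAY_SIZE(example_clks), example_clks);
}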

Signed-off-by: Yong Wu 
---
change note:
Still keep smi-common's has_gals flag; it is more strict.
But it did change a lot of code, thus I didn't keep Ikjoon's R-b.
---
 drivers/memory/mtk-smi.c | 143 +++
 1 file changed, 55 insertions(+), 88 deletions(-)

diff --git a/drivers/memory/mtk-smi.c b/drivers/memory/mtk-smi.c
index c5fb51f73b34..f91eaf5c3ab0 100644
--- a/drivers/memory/mtk-smi.c
+++ b/drivers/memory/mtk-smi.c
@@ -60,6 +60,20 @@ enum mtk_smi_gen {
MTK_SMI_GEN2
 };
 
+#define MTK_SMI_CLK_NR_MAX		4
+
+/* larbs: Require apb/smi clocks while gals is optional. */
+static const char * const mtk_smi_larb_clks[] = {"apb", "smi", "gals"};
+#define MTK_SMI_LARB_REQ_CLK_NR		2
+#define MTK_SMI_LARB_OPT_CLK_NR		1
+
+/*
+ * common: Require these four clocks in has_gals case. Otherwise, only apb/smi are required.
+ */
+static const char * const mtk_smi_common_clks[] = {"apb", "smi", "gals0", "gals1"};
+#define MTK_SMI_COM_REQ_CLK_NR		2
+#define MTK_SMI_COM_GALS_REQ_CLK_NR	MTK_SMI_CLK_NR_MAX
+
 struct mtk_smi_common_plat {
enum mtk_smi_gen gen;
bool has_gals;
@@ -70,13 +84,12 @@ struct mtk_smi_larb_gen {
int port_in_larb[MTK_LARB_NR_MAX + 1];
void (*config_port)(struct device *dev);
unsigned intlarb_direct_to_common_mask;
-   boolhas_gals;
 };
 
 struct mtk_smi {
struct device   *dev;
-   struct clk  *clk_apb, *clk_smi;
-   struct clk  *clk_gals0, *clk_gals1;
+   unsigned intclk_num;
+   struct clk_bulk_dataclks[MTK_SMI_CLK_NR_MAX];
struct clk  *clk_async; /*only needed by mt2701*/
union {
void __iomem*smi_ao_base; /* only for gen1 */
@@ -95,45 +108,6 @@ struct mtk_smi_larb { /* larb: local arbiter */
unsigned char   *bank;
 };
 
-static int mtk_smi_clk_enable(const struct mtk_smi *smi)
-{
-   int ret;
-
-   ret = clk_prepare_enable(smi->clk_apb);
-   if (ret)
-   return ret;
-
-   ret = clk_prepare_enable(smi->clk_smi);
-   if (ret)
-   goto err_disable_apb;
-
-   ret = clk_prepare_enable(smi->clk_gals0);
-   if (ret)
-   goto err_disable_smi;
-
-   ret = clk_prepare_enable(smi->clk_gals1);
-   if (ret)
-   goto err_disable_gals0;
-
-   return 0;
-
-err_disable_gals0:
-   clk_disable_unprepare(smi->clk_gals0);
-err_disable_smi:
-   clk_disable_unprepare(smi->clk_smi);
-err_disable_apb:
-   clk_disable_unprepare(smi->clk_apb);
-   return ret;
-}
-
-static void mtk_smi_clk_disable(const struct mtk_smi *smi)
-{
-   clk_disable_unprepare(smi->clk_gals1);
-   clk_disable_unprepare(smi->clk_gals0);
-   clk_disable_unprepare(smi->clk_smi);
-   clk_disable_unprepare(smi->clk_apb);
-}
-
 int mtk_smi_larb_get(struct device *larbdev)
 {
int ret = pm_runtime_resume_and_get(larbdev);
@@ -270,7 +244,6 @@ static const struct mtk_smi_larb_gen mtk_smi_larb_mt6779 = {
 };
 
 static const struct mtk_smi_larb_gen mtk_smi_larb_mt8183 = {
-   .has_gals   = true,
.config_port= mtk_smi_larb_config_port_gen2_general,
.larb_direct_to_common_mask = BIT(2) | BIT(3) | BIT(7),
  /* IPU0 | IPU1 | CCU */
@@ -312,6 +285,27 @@ static const struct of_device_id mtk_smi_larb_of_ids[] = {
{}
 };
 
+static int mtk_smi_dts_clk_init(struct device *dev, struct mtk_smi *smi,
+   const char * const clks[],
+   unsigned int clk_nr_required,
+   unsigned int clk_nr_optional)
+{
+   int i, ret;
+
+   for (i = 0; i < clk_nr_required; i++)
+   smi->clks[i].id = clks[i];
+   ret = devm_clk_bulk_get(dev, clk_nr_required, smi->clks);
+   if (ret)
+   return ret;
+
+	for (i = clk_nr_required; i < clk_nr_required + clk_nr_optional; i++)
+		smi->clks[i].id = clks[i];
+	ret = devm_clk_bulk_get_optional(dev, clk_nr_optional,
+					 smi->clks + clk_nr_required);
+	smi->clk_num = 

[PATCH v3 04/13] memory: mtk-smi: Rename smi_gen to smi_type

2021-08-10 Thread Yong Wu
Prepare for adding smi sub common. Only rename from smi_gen to smi_type.
No functional change.

About the current "smi_gen", we have gen1/gen2 that stand for the
generation number for HW. I plan to add a new type(sub_common), then the
name "gen" is not proper.

Signed-off-by: Yong Wu 
Reviewed-by: Ikjoon Jang 
---
 drivers/memory/mtk-smi.c | 28 ++--
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/drivers/memory/mtk-smi.c b/drivers/memory/mtk-smi.c
index f91eaf5c3ab0..02a584dfb9b1 100644
--- a/drivers/memory/mtk-smi.c
+++ b/drivers/memory/mtk-smi.c
@@ -55,7 +55,7 @@
 /* All are MMU0 defaultly. Only specialize mmu1 here. */
 #define F_MMU1_LARB(larbid)(0x1 << SMI_BUS_LARB_SHIFT(larbid))
 
-enum mtk_smi_gen {
+enum mtk_smi_type {
MTK_SMI_GEN1,
MTK_SMI_GEN2
 };
@@ -75,9 +75,9 @@ static const char * const mtk_smi_common_clks[] = {"apb", "smi", "gals0", "gals1
 #define MTK_SMI_COM_GALS_REQ_CLK_NR	MTK_SMI_CLK_NR_MAX
 
 struct mtk_smi_common_plat {
-	enum mtk_smi_gen gen;
-	bool has_gals;
-	u32  bus_sel; /* Balance some larbs to enter mmu0 or mmu1 */
+	enum mtk_smi_type	type;
+	bool			has_gals;
+	u32			bus_sel; /* Balance some larbs to enter mmu0 or mmu1 */
 };
 
 struct mtk_smi_larb_gen {
@@ -409,29 +409,29 @@ static struct platform_driver mtk_smi_larb_driver = {
 };
 
 static const struct mtk_smi_common_plat mtk_smi_common_gen1 = {
-   .gen = MTK_SMI_GEN1,
+   .type = MTK_SMI_GEN1,
 };
 
 static const struct mtk_smi_common_plat mtk_smi_common_gen2 = {
-   .gen = MTK_SMI_GEN2,
+   .type = MTK_SMI_GEN2,
 };
 
 static const struct mtk_smi_common_plat mtk_smi_common_mt6779 = {
-   .gen= MTK_SMI_GEN2,
-   .has_gals   = true,
-   .bus_sel= F_MMU1_LARB(1) | F_MMU1_LARB(2) | F_MMU1_LARB(4) |
- F_MMU1_LARB(5) | F_MMU1_LARB(6) | F_MMU1_LARB(7),
+   .type = MTK_SMI_GEN2,
+   .has_gals = true,
+   .bus_sel  = F_MMU1_LARB(1) | F_MMU1_LARB(2) | F_MMU1_LARB(4) |
+   F_MMU1_LARB(5) | F_MMU1_LARB(6) | F_MMU1_LARB(7),
 };
 
 static const struct mtk_smi_common_plat mtk_smi_common_mt8183 = {
-   .gen  = MTK_SMI_GEN2,
+   .type = MTK_SMI_GEN2,
.has_gals = true,
.bus_sel  = F_MMU1_LARB(1) | F_MMU1_LARB(2) | F_MMU1_LARB(5) |
F_MMU1_LARB(7),
 };
 
 static const struct mtk_smi_common_plat mtk_smi_common_mt8192 = {
-   .gen  = MTK_SMI_GEN2,
+   .type = MTK_SMI_GEN2,
.has_gals = true,
.bus_sel  = F_MMU1_LARB(1) | F_MMU1_LARB(2) | F_MMU1_LARB(5) |
F_MMU1_LARB(6),
@@ -494,7 +494,7 @@ static int mtk_smi_common_probe(struct platform_device *pdev)
 	 * clock into emi clock domain, but for mtk smi gen2, there's no smi ao
 	 * base.
 	 */
-	if (common->plat->gen == MTK_SMI_GEN1) {
+	if (common->plat->type == MTK_SMI_GEN1) {
 		res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
 		common->smi_ao_base = devm_ioremap_resource(dev, res);
 		if (IS_ERR(common->smi_ao_base))
@@ -534,7 +534,7 @@ static int __maybe_unused mtk_smi_common_resume(struct device *dev)
if (ret)
return ret;
 
-   if (common->plat->gen == MTK_SMI_GEN2 && bus_sel)
+   if (common->plat->type == MTK_SMI_GEN2 && bus_sel)
writel(bus_sel, common->base + SMI_BUS_SEL);
return 0;
 }
-- 
2.18.0



[PATCH v3 10/13] memory: mtk-smi: mt8195: Add smi support

2021-08-10 Thread Yong Wu
MT8195 has two smi-commons whose IP is the same. Only the larbs that
connect with each smi-common are different, thus the bus_sel differs for
the two smi-commons.

Signed-off-by: Yong Wu 
Reviewed-by: Ikjoon Jang 
---
 drivers/memory/mtk-smi.c | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/drivers/memory/mtk-smi.c b/drivers/memory/mtk-smi.c
index a001e41f5074..35853ba980c9 100644
--- a/drivers/memory/mtk-smi.c
+++ b/drivers/memory/mtk-smi.c
@@ -261,6 +261,10 @@ static const struct mtk_smi_larb_gen mtk_smi_larb_mt8192 = {
 	.config_port		= mtk_smi_larb_config_port_gen2_general,
 };
 
+static const struct mtk_smi_larb_gen mtk_smi_larb_mt8195 = {
+	.config_port		= mtk_smi_larb_config_port_gen2_general,
+};
+
 static const struct of_device_id mtk_smi_larb_of_ids[] = {
 	{.compatible = "mediatek,mt2701-smi-larb", .data = &mtk_smi_larb_mt2701},
 	{.compatible = "mediatek,mt2712-smi-larb", .data = &mtk_smi_larb_mt2712},
@@ -269,6 +273,7 @@ static const struct of_device_id mtk_smi_larb_of_ids[] = {
 	{.compatible = "mediatek,mt8173-smi-larb", .data = &mtk_smi_larb_mt8173},
 	{.compatible = "mediatek,mt8183-smi-larb", .data = &mtk_smi_larb_mt8183},
 	{.compatible = "mediatek,mt8192-smi-larb", .data = &mtk_smi_larb_mt8192},
+	{.compatible = "mediatek,mt8195-smi-larb", .data = &mtk_smi_larb_mt8195},
 	{}
 };
 
@@ -443,6 +448,24 @@ static const struct mtk_smi_common_plat mtk_smi_common_mt8192 = {
 			F_MMU1_LARB(6),
 };
 
+static const struct mtk_smi_common_plat mtk_smi_common_mt8195_vdo = {
+   .type = MTK_SMI_GEN2,
+   .has_gals = true,
+   .bus_sel  = F_MMU1_LARB(1) | F_MMU1_LARB(3) | F_MMU1_LARB(5) |
+   F_MMU1_LARB(7),
+};
+
+static const struct mtk_smi_common_plat mtk_smi_common_mt8195_vpp = {
+   .type = MTK_SMI_GEN2,
+   .has_gals = true,
+   .bus_sel  = F_MMU1_LARB(1) | F_MMU1_LARB(2) | F_MMU1_LARB(7),
+};
+
+static const struct mtk_smi_common_plat mtk_smi_sub_common_mt8195 = {
+   .type = MTK_SMI_GEN2_SUB_COMM,
+   .has_gals = true,
+};
+
 static const struct of_device_id mtk_smi_common_of_ids[] = {
 	{.compatible = "mediatek,mt2701-smi-common", .data = &mtk_smi_common_gen1},
 	{.compatible = "mediatek,mt2712-smi-common", .data = &mtk_smi_common_gen2},
@@ -451,6 +474,9 @@ static const struct of_device_id mtk_smi_common_of_ids[] = {
 	{.compatible = "mediatek,mt8173-smi-common", .data = &mtk_smi_common_gen2},
 	{.compatible = "mediatek,mt8183-smi-common", .data = &mtk_smi_common_mt8183},
 	{.compatible = "mediatek,mt8192-smi-common", .data = &mtk_smi_common_mt8192},
+	{.compatible = "mediatek,mt8195-smi-common-vdo", .data = &mtk_smi_common_mt8195_vdo},
+	{.compatible = "mediatek,mt8195-smi-common-vpp", .data = &mtk_smi_common_mt8195_vpp},
+	{.compatible = "mediatek,mt8195-smi-sub-common", .data = &mtk_smi_sub_common_mt8195},
 	{}
 };
 
-- 
2.18.0



[PATCH v3 11/13] memory: mtk-smi: mt8195: Add initial setting for smi-common

2021-08-10 Thread Yong Wu
To improve the performance, add an initial setting for smi-common.
Some registers use fixed settings (suggested by the DE).

Signed-off-by: Yong Wu 
---
 drivers/memory/mtk-smi.c | 42 
 1 file changed, 38 insertions(+), 4 deletions(-)

diff --git a/drivers/memory/mtk-smi.c b/drivers/memory/mtk-smi.c
index 35853ba980c9..689a45b39a65 100644
--- a/drivers/memory/mtk-smi.c
+++ b/drivers/memory/mtk-smi.c
@@ -18,11 +18,19 @@
 #include 
 
 /* SMI COMMON */
+#define SMI_L1LEN  0x100
+
 #define SMI_BUS_SEL0x220
 #define SMI_BUS_LARB_SHIFT(larbid) ((larbid) << 1)
 /* All are MMU0 defaultly. Only specialize mmu1 here. */
 #define F_MMU1_LARB(larbid)(0x1 << SMI_BUS_LARB_SHIFT(larbid))
 
+#define SMI_M4U_TH 0x234
+#define SMI_FIFO_TH1   0x238
+#define SMI_FIFO_TH2   0x23c
+#define SMI_DCM0x300
+#define SMI_DUMMY  0x444
+
 /* SMI LARB */
 
 /* Below are about mmu enable registers, they are different in SoCs */
@@ -59,6 +67,13 @@
(_id << 8 | _id << 10 | _id << 12 | _id << 14); \
 })
 
+#define SMI_COMMON_INIT_REGS_NR	6
+
+struct mtk_smi_reg_pair {
+   unsigned intoffset;
+   u32 value;
+};
+
 enum mtk_smi_type {
MTK_SMI_GEN1,
MTK_SMI_GEN2,   /* gen2 smi common */
@@ -85,6 +100,8 @@ struct mtk_smi_common_plat {
enum mtk_smi_type   type;
boolhas_gals;
u32 bus_sel; /* Balance some larbs to enter mmu0 or 
mmu1 */
+
+   const struct mtk_smi_reg_pair   *init;
 };
 
 struct mtk_smi_larb_gen {
@@ -419,6 +436,15 @@ static struct platform_driver mtk_smi_larb_driver = {
 	}
 };
 
+static const struct mtk_smi_reg_pair mtk_smi_common_mt8195_init[SMI_COMMON_INIT_REGS_NR] = {
+   {SMI_L1LEN, 0xb},
+   {SMI_M4U_TH, 0xe100e10},
+   {SMI_FIFO_TH1, 0x506090a},
+   {SMI_FIFO_TH2, 0x506090a},
+   {SMI_DCM, 0x4f1},
+   {SMI_DUMMY, 0x1},
+};
+
 static const struct mtk_smi_common_plat mtk_smi_common_gen1 = {
.type = MTK_SMI_GEN1,
 };
@@ -453,12 +479,14 @@ static const struct mtk_smi_common_plat mtk_smi_common_mt8195_vdo = {
 	.has_gals = true,
 	.bus_sel  = F_MMU1_LARB(1) | F_MMU1_LARB(3) | F_MMU1_LARB(5) |
 		    F_MMU1_LARB(7),
+	.init = mtk_smi_common_mt8195_init,
 };
 
 static const struct mtk_smi_common_plat mtk_smi_common_mt8195_vpp = {
.type = MTK_SMI_GEN2,
.has_gals = true,
.bus_sel  = F_MMU1_LARB(1) | F_MMU1_LARB(2) | F_MMU1_LARB(7),
+   .init = mtk_smi_common_mt8195_init,
 };
 
 static const struct mtk_smi_common_plat mtk_smi_sub_common_mt8195 = {
@@ -551,15 +579,21 @@ static int mtk_smi_common_remove(struct platform_device *pdev)
 static int __maybe_unused mtk_smi_common_resume(struct device *dev)
 {
struct mtk_smi *common = dev_get_drvdata(dev);
-   u32 bus_sel = common->plat->bus_sel;
-   int ret;
+   const struct mtk_smi_reg_pair *init = common->plat->init;
+   u32 bus_sel = common->plat->bus_sel; /* default is 0 */
+   int ret, i;
 
ret = clk_bulk_prepare_enable(common->clk_num, common->clks);
if (ret)
return ret;
 
-   if (common->plat->type == MTK_SMI_GEN2 && bus_sel)
-   writel(bus_sel, common->base + SMI_BUS_SEL);
+   if (common->plat->type != MTK_SMI_GEN2)
+   return 0;
+
+   for (i = 0; i < SMI_COMMON_INIT_REGS_NR && init && init[i].offset; i++)
+   writel_relaxed(init[i].value, common->base + init[i].offset);
+
+   writel(bus_sel, common->base + SMI_BUS_SEL);
return 0;
 }
 
-- 
2.18.0



[PATCH v3 12/13] memory: mtk-smi: mt8195: Add initial setting for smi-larb

2021-08-10 Thread Yong Wu
To improve the performance, we add some initial settings for the smi
larbs. There are two parts:
1) Each port has a special ostd (outstanding) value in each larb.
2) Two general settings for each larb:
   a. THRT_UPDATE: the value in bits[7:4] of 0x24 is not so good.
   The HW default is 4, and we expect it to be 5, thus add a flag to
   update it. This is only a DE-recommended value, not an actual issue.
   The register name (THRT_CON) means: throttling control, and the field
   RD_NU_LMT means: read non-ultra commands limit.
   This change updates the read non-ultra command limit from 4 to 5.

   b. SW_FLAG: Set 1 to the FLAG register. This is only to help
   debugging; we can confirm whether the larb has been reset by checking
   whether this value is 1 or 0.

In some SoCs this setting may be changed dynamically for some special
cases like 4K, but this initial setting is enough on mt8195.

Signed-off-by: Yong Wu 
---
 drivers/memory/mtk-smi.c | 79 +++-
 1 file changed, 78 insertions(+), 1 deletion(-)

diff --git a/drivers/memory/mtk-smi.c b/drivers/memory/mtk-smi.c
index 689a45b39a65..b883dcc0bbfa 100644
--- a/drivers/memory/mtk-smi.c
+++ b/drivers/memory/mtk-smi.c
@@ -32,6 +32,15 @@
 #define SMI_DUMMY  0x444
 
 /* SMI LARB */
+#define SMI_LARB_CMD_THRT_CON  0x24
+#define SMI_LARB_THRT_RD_NU_LMT_MSKGENMASK(7, 4)
+#define SMI_LARB_THRT_RD_NU_LMT(5 << 4)
+
+#define SMI_LARB_SW_FLAG   0x40
+#define SMI_LARB_SW_FLAG_1 0x1
+
+#define SMI_LARB_OSTDL_PORT		0x200
+#define SMI_LARB_OSTDL_PORTx(id)	(SMI_LARB_OSTDL_PORT + (((id) & 0x1f) << 2))
 
 /* Below are about mmu enable registers, they are different in SoCs */
 /* gen1: mt2701 */
@@ -68,6 +77,11 @@
 })
 
 #define SMI_COMMON_INIT_REGS_NR6
+#define SMI_LARB_PORT_NR_MAX   32
+
+#define MTK_SMI_FLAG_THRT_UPDATE   BIT(0)
+#define MTK_SMI_FLAG_SW_FLAG   BIT(1)
+#define MTK_SMI_CAPS(flags, _x)(!!((flags) & (_x)))
 
 struct mtk_smi_reg_pair {
unsigned intoffset;
@@ -108,6 +122,8 @@ struct mtk_smi_larb_gen {
int port_in_larb[MTK_LARB_NR_MAX + 1];
void (*config_port)(struct device *dev);
unsigned intlarb_direct_to_common_mask;
+   unsigned intflags_general;
+   const u8(*ostd)[SMI_LARB_PORT_NR_MAX];
 };
 
 struct mtk_smi {
@@ -224,12 +240,26 @@ static void mtk_smi_larb_config_port_mt8173(struct device *dev)
 static void mtk_smi_larb_config_port_gen2_general(struct device *dev)
 {
struct mtk_smi_larb *larb = dev_get_drvdata(dev);
-   u32 reg;
+   u32 reg, flags_general = larb->larb_gen->flags_general;
+   const u8 *larbostd = larb->larb_gen->ostd[larb->larbid];
int i;
 
if (BIT(larb->larbid) & larb->larb_gen->larb_direct_to_common_mask)
return;
 
+	if (MTK_SMI_CAPS(flags_general, MTK_SMI_FLAG_THRT_UPDATE)) {
+		reg = readl_relaxed(larb->base + SMI_LARB_CMD_THRT_CON);
+		reg &= ~SMI_LARB_THRT_RD_NU_LMT_MSK;
+		reg |= SMI_LARB_THRT_RD_NU_LMT;
+		writel_relaxed(reg, larb->base + SMI_LARB_CMD_THRT_CON);
+	}
+
+	if (MTK_SMI_CAPS(flags_general, MTK_SMI_FLAG_SW_FLAG))
+		writel_relaxed(SMI_LARB_SW_FLAG_1, larb->base + SMI_LARB_SW_FLAG);
+
+	for (i = 0; i < SMI_LARB_PORT_NR_MAX && larbostd && !!larbostd[i]; i++)
+		writel_relaxed(larbostd[i], larb->base + SMI_LARB_OSTDL_PORTx(i));
+
for_each_set_bit(i, (unsigned long *)larb->mmu, 32) {
reg = readl_relaxed(larb->base + SMI_LARB_NONSEC_CON(i));
reg |= F_MMU_EN;
@@ -238,6 +268,51 @@ static void mtk_smi_larb_config_port_gen2_general(struct device *dev)
}
 }
 
+static const u8 mtk_smi_larb_mt8195_ostd[][SMI_LARB_PORT_NR_MAX] = {
+   [0] = {0x0a, 0xc, 0x22, 0x22, 0x01, 0x0a,}, /* larb0 */
+   [1] = {0x0a, 0xc, 0x22, 0x22, 0x01, 0x0a,}, /* larb1 */
+   [2] = {0x12, 0x12, 0x12, 0x12, 0x0a,},  /* ... */
+   [3] = {0x12, 0x12, 0x12, 0x12, 0x28, 0x28, 0x0a,},
+   [4] = {0x06, 0x01, 0x17, 0x06, 0x0a,},
+   [5] = {0x06, 0x01, 0x17, 0x06, 0x06, 0x01, 0x06, 0x0a,},
+   [6] = {0x06, 0x01, 0x06, 0x0a,},
+   [7] = {0x0c, 0x0c, 0x12,},
+   [8] = {0x0c, 0x0c, 0x12,},
+   [9] = {0x0a, 0x08, 0x04, 0x06, 0x01, 0x01, 0x10, 0x18, 0x11, 0x0a,
+   0x08, 0x04, 0x11, 0x06, 0x02, 0x06, 0x01, 0x11, 0x11, 0x06,},
+   [10] = {0x18, 0x08, 0x01, 0x01, 0x20, 0x12, 0x18, 0x06, 0x05, 0x10,
+   0x08, 0x08, 0x10, 0x08, 0x08, 0x18, 0x0c, 0x09, 0x0b, 0x0d,
+   0x0d, 0x06, 0x10, 0x10,},
+   [11] = {0x0e, 0x0e, 0x0e, 0x0e, 0x0e, 0x0e, 0x01, 0x01, 0x01, 0x01,},
+   [12] = {0x09, 0x09, 0x05, 0x05, 0x0c, 0x18, 0x02, 0x02, 0x04, 0x02,},
+   [13] = {0x02, 0x02, 0x12, 0x12, 0x02, 0x02, 0x02, 0x02, 0x08, 0x01,},
+   

[PATCH v3 05/13] memory: mtk-smi: Adjust some code position

2021-08-10 Thread Yong Wu
No functional change. Only move the code position to make the code more
readable.
1. Put the register smi-common above smi-larb. Prepare to add some others
   register setting.
2. Put mtk_smi_larb_unbind around larb_bind.
3. Sort the SoC data alphabetically, and put them in one line as the
   current kernel allows it.

Signed-off-by: Yong Wu 
Reviewed-by: Ikjoon Jang 
---
 drivers/memory/mtk-smi.c | 188 ---
 1 file changed, 75 insertions(+), 113 deletions(-)

diff --git a/drivers/memory/mtk-smi.c b/drivers/memory/mtk-smi.c
index 02a584dfb9b1..33b6c5efe102 100644
--- a/drivers/memory/mtk-smi.c
+++ b/drivers/memory/mtk-smi.c
@@ -17,13 +17,16 @@
 #include 
 #include 
 
-/* mt8173 */
-#define SMI_LARB_MMU_EN		0xf00
+/* SMI COMMON */
+#define SMI_BUS_SEL0x220
+#define SMI_BUS_LARB_SHIFT(larbid) ((larbid) << 1)
+/* All are MMU0 defaultly. Only specialize mmu1 here. */
+#define F_MMU1_LARB(larbid)(0x1 << SMI_BUS_LARB_SHIFT(larbid))
 
-/* mt8167 */
-#define MT8167_SMI_LARB_MMU_EN 0xfc0
+/* SMI LARB */
 
-/* mt2701 */
+/* Below are about mmu enable registers, they are different in SoCs */
+/* gen1: mt2701 */
 #define REG_SMI_SECUR_CON_BASE 0x5c0
 
 /* every register control 8 port, register offset 0x4 */
@@ -41,20 +44,21 @@
 /* mt2701 domain should be set to 3 */
 #define SMI_SECUR_CON_VAL_DOMAIN(id)	(0x3 << ((((id) & 0x7) << 2) + 1))
 
-/* mt2712 */
-#define SMI_LARB_NONSEC_CON(id)(0x380 + ((id) * 4))
-#define F_MMU_EN   BIT(0)
-#define BANK_SEL(id)   ({  \
+/* gen2: */
+/* mt8167 */
+#define MT8167_SMI_LARB_MMU_EN 0xfc0
+
+/* mt8173 */
+#define MT8173_SMI_LARB_MMU_EN 0xf00
+
+/* general */
+#define SMI_LARB_NONSEC_CON(id)(0x380 + ((id) * 4))
+#define F_MMU_EN   BIT(0)
+#define BANK_SEL(id)   ({  \
u32 _id = (id) & 0x3;   \
(_id << 8 | _id << 10 | _id << 12 | _id << 14); \
 })
 
-/* SMI COMMON */
-#define SMI_BUS_SEL0x220
-#define SMI_BUS_LARB_SHIFT(larbid) ((larbid) << 1)
-/* All are MMU0 defaultly. Only specialize mmu1 here. */
-#define F_MMU1_LARB(larbid)(0x1 << SMI_BUS_LARB_SHIFT(larbid))
-
 enum mtk_smi_type {
MTK_SMI_GEN1,
MTK_SMI_GEN2
@@ -140,36 +144,16 @@ mtk_smi_larb_bind(struct device *dev, struct device *master, void *data)
return -ENODEV;
 }
 
-static void mtk_smi_larb_config_port_gen2_general(struct device *dev)
-{
-   struct mtk_smi_larb *larb = dev_get_drvdata(dev);
-   u32 reg;
-   int i;
-
-   if (BIT(larb->larbid) & larb->larb_gen->larb_direct_to_common_mask)
-   return;
-
-   for_each_set_bit(i, (unsigned long *)larb->mmu, 32) {
-   reg = readl_relaxed(larb->base + SMI_LARB_NONSEC_CON(i));
-   reg |= F_MMU_EN;
-   reg |= BANK_SEL(larb->bank[i]);
-   writel(reg, larb->base + SMI_LARB_NONSEC_CON(i));
-   }
-}
-
-static void mtk_smi_larb_config_port_mt8173(struct device *dev)
+static void
+mtk_smi_larb_unbind(struct device *dev, struct device *master, void *data)
 {
-   struct mtk_smi_larb *larb = dev_get_drvdata(dev);
-
-   writel(*larb->mmu, larb->base + SMI_LARB_MMU_EN);
+   /* Do nothing as the iommu is always enabled. */
 }
 
-static void mtk_smi_larb_config_port_mt8167(struct device *dev)
-{
-   struct mtk_smi_larb *larb = dev_get_drvdata(dev);
-
-   writel(*larb->mmu, larb->base + MT8167_SMI_LARB_MMU_EN);
-}
+static const struct component_ops mtk_smi_larb_component_ops = {
+   .bind = mtk_smi_larb_bind,
+   .unbind = mtk_smi_larb_unbind,
+};
 
 static void mtk_smi_larb_config_port_gen1(struct device *dev)
 {
@@ -202,26 +186,36 @@ static void mtk_smi_larb_config_port_gen1(struct device *dev)
}
 }
 
-static void
-mtk_smi_larb_unbind(struct device *dev, struct device *master, void *data)
+static void mtk_smi_larb_config_port_mt8167(struct device *dev)
 {
-   /* Do nothing as the iommu is always enabled. */
+   struct mtk_smi_larb *larb = dev_get_drvdata(dev);
+
+   writel(*larb->mmu, larb->base + MT8167_SMI_LARB_MMU_EN);
 }
 
-static const struct component_ops mtk_smi_larb_component_ops = {
-   .bind = mtk_smi_larb_bind,
-   .unbind = mtk_smi_larb_unbind,
-};
+static void mtk_smi_larb_config_port_mt8173(struct device *dev)
+{
+   struct mtk_smi_larb *larb = dev_get_drvdata(dev);
 
-static const struct mtk_smi_larb_gen mtk_smi_larb_mt8173 = {
-   /* mt8173 do not need the port in larb */
-   .config_port = mtk_smi_larb_config_port_mt8173,
-};
+   writel(*larb->mmu, larb->base + MT8173_SMI_LARB_MMU_EN);
+}
 
-static const struct mtk_smi_larb_gen mtk_smi_larb_mt8167 = {
-   /* mt8167 do not need the port in larb */
-   .config_port = mtk_smi_larb_config_port_mt8167,
-};
+static void 

[PATCH v3 06/13] memory: mtk-smi: Add error handle for smi_probe

2021-08-10 Thread Yong Wu
Add error handle while component_add fail.

Signed-off-by: Yong Wu 
Reviewed-by: Ikjoon Jang 
---
 drivers/memory/mtk-smi.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/memory/mtk-smi.c b/drivers/memory/mtk-smi.c
index 33b6c5efe102..b362d528944e 100644
--- a/drivers/memory/mtk-smi.c
+++ b/drivers/memory/mtk-smi.c
@@ -338,7 +338,15 @@ static int mtk_smi_larb_probe(struct platform_device *pdev)
 
pm_runtime_enable(dev);
platform_set_drvdata(pdev, larb);
-	return component_add(dev, &mtk_smi_larb_component_ops);
+	ret = component_add(dev, &mtk_smi_larb_component_ops);
+   if (ret)
+   goto err_pm_disable;
+   return 0;
+
+err_pm_disable:
+   pm_runtime_disable(dev);
+   device_link_remove(dev, larb->smi_common_dev);
+   return ret;
 }
 
 static int mtk_smi_larb_remove(struct platform_device *pdev)
-- 
2.18.0



[PATCH v3 07/13] memory: mtk-smi: Add device link for smi-sub-common

2021-08-10 Thread Yong Wu
In mt8195, there are some larbs that connect with the smi-sub-common,
which then connects with the smi-common.

Previously we created a device link between smi-larb and smi-common. If
we have a sub-common, we should instead link the smi-larb with the
smi-sub-common, and then link the smi-sub-common with the smi-common.
This is for enabling the clock/power automatically.

Move the device link code to a new interface for reuse.

Signed-off-by: Yong Wu 
Reviewed-by: Ikjoon Jang 
---
 drivers/memory/mtk-smi.c | 75 +++-
 1 file changed, 51 insertions(+), 24 deletions(-)

diff --git a/drivers/memory/mtk-smi.c b/drivers/memory/mtk-smi.c
index b362d528944e..5c2bd5795cfd 100644
--- a/drivers/memory/mtk-smi.c
+++ b/drivers/memory/mtk-smi.c
@@ -61,7 +61,8 @@
 
 enum mtk_smi_type {
MTK_SMI_GEN1,
-   MTK_SMI_GEN2
+   MTK_SMI_GEN2,   /* gen2 smi common */
+   MTK_SMI_GEN2_SUB_COMM,  /* gen2 smi sub common */
 };
 
 #define MTK_SMI_CLK_NR_MAX 4
@@ -99,13 +100,14 @@ struct mtk_smi {
void __iomem*smi_ao_base; /* only for gen1 */
void __iomem*base;/* only for gen2 */
};
+   struct device   *smi_common_dev; /* for sub common */
const struct mtk_smi_common_plat *plat;
 };
 
 struct mtk_smi_larb { /* larb: local arbiter */
struct mtk_smi  smi;
void __iomem*base;
-   struct device   *smi_common_dev;
+   struct device   *smi_common_dev; /* common or 
sub-common dev */
const struct mtk_smi_larb_gen   *larb_gen;
int larbid;
u32 *mmu;
@@ -268,6 +270,38 @@ static const struct of_device_id mtk_smi_larb_of_ids[] = {
{}
 };
 
+static int mtk_smi_device_link_common(struct device *dev, struct device **com_dev)
+{
+   struct platform_device *smi_com_pdev;
+   struct device_node *smi_com_node;
+   struct device *smi_com_dev;
+   struct device_link *link;
+
+   smi_com_node = of_parse_phandle(dev->of_node, "mediatek,smi", 0);
+   if (!smi_com_node)
+   return -EINVAL;
+
+   smi_com_pdev = of_find_device_by_node(smi_com_node);
+   of_node_put(smi_com_node);
+   if (smi_com_pdev) {
+   /* smi common is the supplier, Make sure it is ready before */
+   if (!platform_get_drvdata(smi_com_pdev))
+   return -EPROBE_DEFER;
+		smi_com_dev = &smi_com_pdev->dev;
+   link = device_link_add(dev, smi_com_dev,
+  DL_FLAG_PM_RUNTIME | DL_FLAG_STATELESS);
+   if (!link) {
+   dev_err(dev, "Unable to link smi-common dev\n");
+   return -ENODEV;
+   }
+   *com_dev = smi_com_dev;
+   } else {
+   dev_err(dev, "Failed to get the smi_common device\n");
+   return -EINVAL;
+   }
+   return 0;
+}
+
 static int mtk_smi_dts_clk_init(struct device *dev, struct mtk_smi *smi,
const char * const clks[],
unsigned int clk_nr_required,
@@ -294,9 +328,6 @@ static int mtk_smi_larb_probe(struct platform_device *pdev)
struct mtk_smi_larb *larb;
struct resource *res;
 	struct device *dev = &pdev->dev;
-   struct device_node *smi_node;
-   struct platform_device *smi_pdev;
-   struct device_link *link;
int ret;
 
larb = devm_kzalloc(dev, sizeof(*larb), GFP_KERNEL);
@@ -315,26 +346,10 @@ static int mtk_smi_larb_probe(struct platform_device *pdev)
return ret;
 
larb->smi.dev = dev;
-   smi_node = of_parse_phandle(dev->of_node, "mediatek,smi", 0);
-   if (!smi_node)
-   return -EINVAL;
 
-	smi_pdev = of_find_device_by_node(smi_node);
-	of_node_put(smi_node);
-	if (smi_pdev) {
-		if (!platform_get_drvdata(smi_pdev))
-			return -EPROBE_DEFER;
-		larb->smi_common_dev = &smi_pdev->dev;
-   link = device_link_add(dev, larb->smi_common_dev,
-  DL_FLAG_PM_RUNTIME | DL_FLAG_STATELESS);
-   if (!link) {
-   dev_err(dev, "Unable to link smi-common dev\n");
-   return -ENODEV;
-   }
-   } else {
-   dev_err(dev, "Failed to get the smi_common device\n");
-   return -EINVAL;
-   }
+	ret = mtk_smi_device_link_common(dev, &larb->smi_common_dev);
+   if (ret < 0)
+   return ret;
 
pm_runtime_enable(dev);
platform_set_drvdata(pdev, larb);
@@ -483,6 +498,14 @@ static int mtk_smi_common_probe(struct platform_device *pdev)
if (IS_ERR(common->base))
return PTR_ERR(common->base);

Re: [Freedreno] [PATCH 0/3] iommu/drm/msm: Allow non-coherent masters to use system cache

2021-08-10 Thread Will Deacon
On Mon, Aug 09, 2021 at 11:17:40PM +0530, Sai Prakash Ranjan wrote:
> On 2021-08-09 23:10, Will Deacon wrote:
> > On Mon, Aug 09, 2021 at 10:18:21AM -0700, Rob Clark wrote:
> > > On Mon, Aug 9, 2021 at 10:05 AM Will Deacon  wrote:
> > > > On Mon, Aug 09, 2021 at 09:57:08AM -0700, Rob Clark wrote:
> > > > > But I suppose we could call it instead IOMMU_QCOM_LLC or something
> > > > > like that to make it more clear that it is not necessarily something
> > > > > that would work with a different outer level cache implementation?
> > > >
> > > > ... or we could just deal with the problem so that other people can 
> > > > reuse
> > > > the code. I haven't really understood the reluctance to solve this 
> > > > properly.
> > > >
> > > > Am I missing some reason this isn't solvable?
> > > 
> > > Oh, was there another way to solve it (other than foregoing setting
> > > INC_OCACHE in the pgtables)?  Maybe I misunderstood, is there a
> > > corresponding setting on the MMU pgtables side of things?
> > 
> > Right -- we just need to program the CPU's MMU with the matching memory
> > attributes! It's a bit more fiddly if you're just using ioremap_wc()
> > though, as it's usually the DMA API which handles the attributes under
> > the
> > hood.
> > 
> > Anyway, sorry, I should've said that explicitly earlier on. We've done
> > this
> > sort of thing in the Android tree so I assumed Sai knew what needed to
> > be
> > done and then I didn't think to explain to you :(
> > 
> 
> Right I was aware of that but even in the android tree there is no user :)

I'm assuming there are vendor modules using it there, otherwise we wouldn't
have been asked to put it in. Since you work at Qualcomm, maybe you could
talk to your colleagues (Isaac and Patrick) directly?

> I think we can't have a new memory type without any user right in upstream
> like android tree?

Correct. But I don't think we should be adding IOMMU_* anything upstream
if we don't have a user.

Will


Re: [bug report] iommu_dma_unmap_sg() is very slow then running IO from remote numa node

2021-08-10 Thread John Garry

On 28/07/2021 16:17, Ming Lei wrote:

Have you tried turning off the IOMMU to ensure that this is really just
an IOMMU problem?

You can try setting CONFIG_ARM_SMMU_V3=n in the defconfig or passing
cmdline param iommu.passthrough=1 to bypass the SMMU (equivalent to
disabling it for kernel drivers).

Bypassing SMMU via iommu.passthrough=1 basically doesn't make a difference
on this issue.

A ~90% throughput drop still seems to me to be too high to be a software
issue. More so since I don't see similar on my system. And that throughput
drop does not lead to a total CPU usage drop, from the fio log.

Do you know if anyone has run memory benchmark tests on this board to find
out NUMA effect? I think lmbench or stream could be used for this.

https://lore.kernel.org/lkml/YOhbc5C47IzC893B@T590/


Hi Ming,

Out of curiosity, did you investigate this topic any further?

And you also asked about my results earlier:

On 22/07/2021 16:54, Ming Lei wrote:
>> [   52.035895] nvme :81:00.0: Adding to iommu group 5
>> [   52.047732] nvme nvme0: pci function :81:00.0
>> [   52.067216] nvme nvme0: 22/0/2 default/read/poll queues
>> [   52.087318]  nvme0n1: p1
>>
>> So I get these results:
>> cpu0 335K
>> cpu32 346K
>> cpu64 300K
>> cpu96 300K
>>
>> So still not massive changes.
> In your last email, the results are the following with irq mode io_uring:
>
>   cpu0  497K
>   cpu4  307K
>   cpu32 566K
>   cpu64 488K
>   cpu96 508K
>
> So looks you get much worse result with real io_polling?
>

Would the expectation be that at least I get the same performance with 
io_polling here? Anything else to try which you can suggest to 
investigate this lower performance?


Thanks,
John


Re: [Freedreno] [PATCH 0/3] iommu/drm/msm: Allow non-coherent masters to use system cache

2021-08-10 Thread Sai Prakash Ranjan

On 2021-08-10 14:46, Will Deacon wrote:

On Mon, Aug 09, 2021 at 11:17:40PM +0530, Sai Prakash Ranjan wrote:
> > On 2021-08-09 23:10, Will Deacon wrote:
> > > On Mon, Aug 09, 2021 at 10:18:21AM -0700, Rob Clark wrote:
> > > > On Mon, Aug 9, 2021 at 10:05 AM Will Deacon  wrote:
> > > > > On Mon, Aug 09, 2021 at 09:57:08AM -0700, Rob Clark wrote:
> > > > > > But I suppose we could call it instead IOMMU_QCOM_LLC or something
> > > > > > like that to make it more clear that it is not necessarily something
> > > > > > that would work with a different outer level cache implementation?
> > > > >
> > > > > ... or we could just deal with the problem so that other people can reuse
> > > > > the code. I haven't really understood the reluctance to solve this properly.
> > > > >
> > > > > Am I missing some reason this isn't solvable?
> > > >
> > > > Oh, was there another way to solve it (other than foregoing setting
> > > > INC_OCACHE in the pgtables)?  Maybe I misunderstood, is there a
> > > > corresponding setting on the MMU pgtables side of things?
> > >
> > > Right -- we just need to program the CPU's MMU with the matching memory
> > > attributes! It's a bit more fiddly if you're just using ioremap_wc()
> > > though, as it's usually the DMA API which handles the attributes under the
> > > hood.
> > >
> > > Anyway, sorry, I should've said that explicitly earlier on. We've done this
> > > sort of thing in the Android tree so I assumed Sai knew what needed to be
> > > done and then I didn't think to explain to you :(
> >
> > Right I was aware of that but even in the android tree there is no user :)
>
> I'm assuming there are vendor modules using it there, otherwise we wouldn't
> have been asked to put it in. Since you work at Qualcomm, maybe you could
> talk to your colleagues (Isaac and Patrick) directly?

Right, I will check with them regarding the vendor modules in android.

> > I think we can't have a new memory type without any user right in upstream
> > like android tree?
>
> Correct. But I don't think we should be adding IOMMU_* anything upstream
> if we don't have a user.

Agreed, once we have the fix for the GPU crash I can continue further on
using this properly.

Thanks,
Sai

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation


[PATCH v3 00/13] MT8195 SMI support

2021-08-10 Thread Yong Wu
This patchset mainly adds SMI support for mt8195.

Compared with the previous version, this adds two new functions:
a) add smi sub common
b) add initial setting for smi-common and smi-larb.

Change note:
v3:1) In the dt-binding:
   a. Change the mediatek,smi type from phandle-array to phandle, per Rob.
   b. Add a new bool property (mediatek,smi_sub_common)
      to indicate if this is smi-sub-common.
   2) Change the clocks to use the bulk interface;
      keep the smi-common's has_gals flag (more strict).
   3) More comments about the larb initial setting.
   4) Add a MAINTAINERS patch.
   
v2: 
https://lore.kernel.org/linux-mediatek/20210715121209.31024-1-yong...@mediatek.com/
rebase on v5.14-rc1
1) Adjust clk_bulk flow: use devm_clk_bulk_get for necessary clocks.
2) Add two new little patches:
   a) use devm_platform_ioremap_resource
   b) Add error handle for smi_probe

v1: 
https://lore.kernel.org/linux-mediatek/20210616114346.18812-1-yong...@mediatek.com/

Yong Wu (13):
  dt-bindings: memory: mediatek: Add mt8195 smi binding
  dt-bindings: memory: mediatek: Add mt8195 smi sub common
  memory: mtk-smi: Use clk_bulk clock ops
  memory: mtk-smi: Rename smi_gen to smi_type
  memory: mtk-smi: Adjust some code position
  memory: mtk-smi: Add error handle for smi_probe
  memory: mtk-smi: Add device link for smi-sub-common
  memory: mtk-smi: Add clocks for smi-sub-common
  memory: mtk-smi: Use devm_platform_ioremap_resource
  memory: mtk-smi: mt8195: Add smi support
  memory: mtk-smi: mt8195: Add initial setting for smi-common
  memory: mtk-smi: mt8195: Add initial setting for smi-larb
  MAINTAINERS: Add entry for MediaTek SMI

 .../mediatek,smi-common.yaml  |  36 +-
 .../memory-controllers/mediatek,smi-larb.yaml |   3 +
 MAINTAINERS   |   8 +
 drivers/memory/mtk-smi.c  | 596 ++
 4 files changed, 395 insertions(+), 248 deletions(-)

-- 
2.18.0




[PATCH v3 01/13] dt-bindings: memory: mediatek: Add mt8195 smi binding

2021-08-10 Thread Yong Wu
Add mt8195 smi support in the bindings.

In mt8195, there are two smi-common HWs: one is for vdo (video output),
the other is for vpp (video processing pipe). They connect with different
smi-larbs, so some settings (bus_sel) differ. Differentiate them
with the compatible string.

Something like this:

   IOMMU(VDO)       IOMMU(VPP)
       |                |
 SMI_COMMON_VDO   SMI_COMMON_VPP
 --------------   --------------
  |    |   ...     |    |   ...
larb0 larb2 ...  larb1 larb3 ...

Signed-off-by: Yong Wu 
Acked-by: Rob Herring 
---
 .../bindings/memory-controllers/mediatek,smi-common.yaml| 6 +-
 .../bindings/memory-controllers/mediatek,smi-larb.yaml  | 3 +++
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git 
a/Documentation/devicetree/bindings/memory-controllers/mediatek,smi-common.yaml 
b/Documentation/devicetree/bindings/memory-controllers/mediatek,smi-common.yaml
index e87e4382807c..602592b6c3f5 100644
--- 
a/Documentation/devicetree/bindings/memory-controllers/mediatek,smi-common.yaml
+++ 
b/Documentation/devicetree/bindings/memory-controllers/mediatek,smi-common.yaml
@@ -16,7 +16,7 @@ description: |
   MediaTek SMI have two generations of HW architecture, here is the list
   which generation the SoCs use:
   generation 1: mt2701 and mt7623.
-  generation 2: mt2712, mt6779, mt8167, mt8173, mt8183 and mt8192.
+  generation 2: mt2712, mt6779, mt8167, mt8173, mt8183, mt8192 and mt8195.
 
   There's slight differences between the two SMI, for generation 2, the
   register which control the iommu port is at each larb's register base. But
@@ -36,6 +36,8 @@ properties:
   - mediatek,mt8173-smi-common
   - mediatek,mt8183-smi-common
   - mediatek,mt8192-smi-common
+  - mediatek,mt8195-smi-common-vdo
+  - mediatek,mt8195-smi-common-vpp
 
   - description: for mt7623
 items:
@@ -98,6 +100,8 @@ allOf:
 - mediatek,mt6779-smi-common
 - mediatek,mt8183-smi-common
 - mediatek,mt8192-smi-common
+- mediatek,mt8195-smi-common-vdo
+- mediatek,mt8195-smi-common-vpp
 
 then:
   properties:
diff --git 
a/Documentation/devicetree/bindings/memory-controllers/mediatek,smi-larb.yaml 
b/Documentation/devicetree/bindings/memory-controllers/mediatek,smi-larb.yaml
index 2353f6cf3c80..eaeff1ada7f8 100644
--- 
a/Documentation/devicetree/bindings/memory-controllers/mediatek,smi-larb.yaml
+++ 
b/Documentation/devicetree/bindings/memory-controllers/mediatek,smi-larb.yaml
@@ -24,6 +24,7 @@ properties:
   - mediatek,mt8173-smi-larb
   - mediatek,mt8183-smi-larb
   - mediatek,mt8192-smi-larb
+  - mediatek,mt8195-smi-larb
 
   - description: for mt7623
 items:
@@ -74,6 +75,7 @@ allOf:
 compatible:
   enum:
 - mediatek,mt8183-smi-larb
+- mediatek,mt8195-smi-larb
 
 then:
   properties:
@@ -108,6 +110,7 @@ allOf:
   - mediatek,mt6779-smi-larb
   - mediatek,mt8167-smi-larb
   - mediatek,mt8192-smi-larb
+  - mediatek,mt8195-smi-larb
 
 then:
   required:
-- 
2.18.0



[PATCH v3 02/13] dt-bindings: memory: mediatek: Add mt8195 smi sub common

2021-08-10 Thread Yong Wu
Add the binding for smi-sub-common. The SMI block diagram looks like this:

        IOMMU
         |  |
     smi-common
     ----------
     |        |
  larb0 ... larb7   <- max is 8

The smi-common connects with the smi-larbs and the IOMMU. The maximum
number of larbs that connect with one smi-common is 8. If the number of
engines exceeds 8, we sometimes use an smi-sub-common, which is nearly
the same as smi-common. It supports up to 8 inputs and 1 output
(smi-common has 2 outputs).

Something like:

        IOMMU
         |  |
     smi-common
     ---------------
     |      |    ...
  larb0  sub-common        <- max is 8
         ----------
          |      |  ...    <- max is 8 too.
        larb2  larb5

We don't need any extra SW setting for smi-sub-common; the sub-common only
has special clocks that need to be enabled when the engines access DRAM.

If it is a sub-common, it should have a "mediatek,smi" phandle pointing to
its smi-common. Meanwhile, the sub-common has only one GALS clock.

Additionally, add a new property "mediatek,smi_sub_common" for this
sub-common. This is needed in the IOMMU driver, in which we create a device
link between smi-common and the IOMMU device. If we add the smi-sub-common
here, the IOMMU driver still needs to find the smi-common device; thus,
add this bool property to indicate if it is a sub-common.

Signed-off-by: Yong Wu 
---
change note:
a. Change mediatek,smi type from phandle-array to phandle, per Rob.
b. Add a new bool property (mediatek,smi_sub_common) to indicate this is
   smi-sub-common. the reason is as above.
---
 .../mediatek,smi-common.yaml  | 30 +++
 1 file changed, 30 insertions(+)

diff --git 
a/Documentation/devicetree/bindings/memory-controllers/mediatek,smi-common.yaml 
b/Documentation/devicetree/bindings/memory-controllers/mediatek,smi-common.yaml
index 602592b6c3f5..26bb9903864b 100644
--- 
a/Documentation/devicetree/bindings/memory-controllers/mediatek,smi-common.yaml
+++ 
b/Documentation/devicetree/bindings/memory-controllers/mediatek,smi-common.yaml
@@ -38,6 +38,7 @@ properties:
   - mediatek,mt8192-smi-common
   - mediatek,mt8195-smi-common-vdo
   - mediatek,mt8195-smi-common-vpp
+  - mediatek,mt8195-smi-sub-common
 
   - description: for mt7623
 items:
@@ -67,6 +68,14 @@ properties:
 minItems: 2
 maxItems: 4
 
+  mediatek,smi:
+$ref: /schemas/types.yaml#/definitions/phandle
+description: a phandle to the smi-common node above. Only for sub-common.
+
+  mediatek,smi_sub_common:
+type: boolean
+description: Indicate if this is smi-sub-common.
+
 required:
   - compatible
   - reg
@@ -93,6 +102,27 @@ allOf:
 - const: smi
 - const: async
 
+  - if:  # only for sub common
+  properties:
+compatible:
+  contains:
+enum:
+  - mediatek,mt8195-smi-sub-common
+then:
+  required:
+- mediatek,smi
+- mediatek,smi_sub_common
+  properties:
+clock:
+  items:
+minItems: 3
+maxItems: 3
+clock-names:
+  items:
+- const: apb
+- const: smi
+- const: gals0
+
   - if:  # for gen2 HW that have gals
   properties:
 compatible:
-- 
2.18.0



RE: [RFC v2] /dev/iommu uAPI proposal

2021-08-10 Thread Tian, Kevin
> From: Eric Auger 
> Sent: Tuesday, August 10, 2021 3:17 PM
> 
> Hi Kevin,
> 
> On 8/5/21 2:36 AM, Tian, Kevin wrote:
> >> From: Eric Auger 
> >> Sent: Wednesday, August 4, 2021 11:59 PM
> >>
> > [...]
> >>> 1.2. Attach Device to I/O address space
> >>> +++
> >>>
> >>> Device attach/bind is initiated through passthrough framework uAPI.
> >>>
> >>> Device attaching is allowed only after a device is successfully bound to
> >>> the IOMMU fd. User should provide a device cookie when binding the
> >>> device through VFIO uAPI. This cookie is used when the user queries
> >>> device capability/format, issues per-device iotlb invalidation and
> >>> receives per-device I/O page fault data via IOMMU fd.
> >>>
> >>> Successful binding puts the device into a security context which isolates
> >>> its DMA from the rest system. VFIO should not allow user to access the
> >> s/from the rest system/from the rest of the system
> >>> device before binding is completed. Similarly, VFIO should prevent the
> >>> user from unbinding the device before user access is withdrawn.
> >> With Intel scalable IOV, I understand you could assign an RID/PASID to
> >> one VM and another one to another VM (which is not the case for ARM).
> Is
> >> it a targetted use case?How would it be handled? Is it related to the
> >> sub-groups evoked hereafter?
> > Not related to sub-group. Each mdev is bound to the IOMMU fd
> respectively
> > with the defPASID which represents the mdev.
> But how does it work in term of security. The device (RID) is bound to
> an IOMMU fd. But then each SID/PASID may be working for a different VM.
> How do you detect this is safe as each SID can work safely for a
> different VM versus the ARM case where it is not possible.

The PASID is managed by the parent driver, which knows which PASID to
use for a given mdev when later attaching it to an IOASID.

> 
> 1.3 says
> "
> 
> 1)  A successful binding call for the first device in the group creates
> the security context for the entire group, by:
> "
> What does it mean for above scalable IOV use case?
> 

This is a good question (which Alex also raised) and needs more explanation
in the next version:

https://lore.kernel.org/linux-iommu/20210712124150.2bf421d1.alex.william...@redhat.com/

In general we need to provide different helpers for binding pdev/mdev/
sw mdev. 1.3 in v2 describes the behavior for pdev via iommu_register_
device(). For mdev a new helper (e.g. iommu_register_device_pasid())
is required, and the IOMMU API will also provide a PASID variation for
creating a security context per PASID. sw mdev will also have its own
binding helper to indicate that no routing info is required when
attaching to an IOASID.

Thanks
Kevin 


Re: [RFC PATCH 2/3] iommu/dma-iommu: Support iovad->granule > PAGE_SIZE

2021-08-10 Thread Robin Murphy

On 2021-08-09 21:45, Sven Peter wrote:
> On Mon, Aug 9, 2021, at 19:41, Robin Murphy wrote:
> > On 2021-08-07 12:47, Sven Peter via iommu wrote:
> > > On Fri, Aug 6, 2021, at 20:04, Robin Murphy wrote:
> > > > On 2021-08-06 16:55, Sven Peter via iommu wrote:
> > > > > @@ -1006,6 +1019,31 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
> > > > >  	if (dev_is_untrusted(dev))
> > > > >  		return iommu_dma_map_sg_swiotlb(dev, sg, nents, dir, attrs);
> > > > >  
> > > > > +	/*
> > > > > +	 * If the IOMMU pagesize is larger than the CPU pagesize we will
> > > > > +	 * very likely run into sgs with a physical address that is not aligned
> > > > > +	 * to an IOMMU page boundary. Fall back to just mapping every entry
> > > > > +	 * independently with __iommu_dma_map then.
> > > > > +	 */
> > > >
> > > > Scatterlist segments often don't have nicely aligned ends, which is why
> > > > we already align things to IOVA granules in main loop here. I think in
> > > > principle we'd just need to move the non-IOVA-aligned part of the
> > > > address from sg->page to sg->offset in the temporary transformation for
> > > > the rest of the assumptions to hold. I don't blame you for being timid
> > > > about touching that, though - it took me 3 tries to get right when I
> > > > first wrote it...
> > >
> > > I've spent some time with that code now and I think we cannot use it
> > > but have to fall back to iommu_dma_map_sg_swiotlb (even though that swiotlb
> > > part is a lie then):
> > >
> > > When we have sg_phys(s) = 0x802e65000 with s->offset = 0 the paddr
> > > is aligned to PAGE_SIZE but has an offset of 0x1000 from something
> > > the IOMMU can map.
> > > Now this would result in s->offset = -0x1000 which is already weird
> > > enough.
> > > Offset is unsigned (and 32bit) so this will actually look like
> > > s->offset = 0xfffff000 then, which isn't much better.
> > > And then sg_phys(s) = 0x902e64000 (instead of 0x802e64000) and
> > > we'll map some random memory in iommu_map_sg_atomic and a little bit later
> > > everything explodes.
> > >
> > > Now I could probably adjust the phys addr backwards and make sure offset is
> > > always positive (and possibly larger than PAGE_SIZE) and later restore it
> > > in __finalise_sg then but I feel like that's pushing this a little bit too far.
> >
> > Yes, that's what I meant. At a quick guess, something like the
> > completely untested diff below.
>
> That unfortunately results in unaligned mappings

You mean it even compiles!? :D

> [9.630334] iommu: unaligned: iova 0xbff4 pa 0x000801a3b000 size 0x4000 min_pagesz 0x4000
>
> I'll take a closer look later this week and see if I can fix it.

On reflection, "s->offset ^ s_iova_off" is definitely wrong, that more
likely wants to be "s->offset & ~s_iova_off".
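To make the arithmetic above concrete, here is a tiny standalone demo using
Sven's example values (an illustration only, not the kernel code):

#include <stdio.h>

int main(void)
{
	unsigned long granule = 0x4000;		/* 16K IOMMU page, 4K CPU page */
	unsigned long paddr = 0x802e65000UL;	/* CPU-page aligned only */
	unsigned long s_iova_off = paddr & (granule - 1);

	/* 0x1000: the part that must move into sg->offset */
	printf("s_iova_off    = 0x%lx\n", s_iova_off);
	/* 0x802e64000: the IOMMU-mappable base address */
	printf("aligned paddr = 0x%lx\n", paddr - s_iova_off);
	return 0;
}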


Robin.


> > It really comes down to what we want to
> > achieve here - if it's just to make this thing work at all, then I'd
> > favour bolting on the absolute minimum changes, possibly even cheating
> > by tainting the kernel and saying all bets are off instead of trying to
> > handle the more involved corners really properly. However if you want to
> > work towards this being a properly-supported thing, then I think it's
> > worth generalising the existing assumptions of page alignment from the
> > beginning.
>
> I'd like to try and see if we can make this a properly-supported thing.
>
> That will likely take a few iterations but realistically the rest of the
> drivers required to make this platform actually useful (and especially the
> display controller and GPU drivers) won't be ready for a few more months
> anyway. And even on 4KB PAGE_SIZE kernels half the USB ports and NVMe will
> work fine, which should be enough to install a distro and some third-party
> package that just ships the distro kernel with 16KB pages.
>
> Sven




[PATCH v3 13/13] MAINTAINERS: Add entry for MediaTek SMI

2021-08-10 Thread Yong Wu
I am the author of the MediaTek SMI driver, and will maintain
and develop it further.
Add myself to cover these items.

Signed-off-by: Yong Wu 
---
 MAINTAINERS | 8 
 1 file changed, 8 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index a61f4f3b78a9..91af55b45e80 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11754,6 +11754,14 @@ M: Sean Wang 
 S: Maintained
 F: drivers/char/hw_random/mtk-rng.c
 
+MEDIATEK SMI DRIVER
+M: Yong Wu 
+L: linux-media...@lists.infradead.org (moderated for non-subscribers)
+S: Supported
+F: Documentation/devicetree/bindings/memory-controllers/mediatek,smi*
+F: drivers/memory/mtk-smi.c
+F: include/soc/mediatek/smi.h
+
 MEDIATEK SWITCH DRIVER
 M: Sean Wang 
 M: Landen Chao 
-- 
2.18.0



[RFC PATCH v2 4/5] dma/pool: create dma atomic pool only if dma zone has managed pages

2021-08-10 Thread Baoquan He
Currently, three DMA atomic pools are initialized as long as the relevant
kernel code is built in. But in an x86_64 kdump kernel this is not right
when trying to create atomic_pool_dma, because there are no managed pages
in the DMA zone. In that case, the DMA zone only has the low 1M of memory
present, locked down by the memblock allocator, so no pages are added into
the buddy allocator of the DMA zone. Please check commit f1d4d47c5851
("x86/setup: Always reserve the first 1M of RAM").

Then the x86_64 kdump kernel always prints the failure message below:

 DMA: preallocated 128 KiB GFP_KERNEL pool for atomic allocations
 swapper/0: page allocation failure: order:5, mode:0xcc1(GFP_KERNEL|GFP_DMA), 
nodemask=(null),cpuset=/,mems_allowed=0
 CPU: 0 PID: 1 Comm: swapper/0 Not tainted 
5.13.0-0.rc5.20210611git929d931f2b40.42.fc35.x86_64 #1
 Hardware name: Dell Inc. PowerEdge R910/0P658H, BIOS 2.12.0 06/04/2018
 Call Trace:
  dump_stack+0x7f/0xa1
  warn_alloc.cold+0x72/0xd6
  ? _raw_spin_unlock_irq+0x24/0x40
  ? __alloc_pages_direct_compact+0x90/0x1b0
  __alloc_pages_slowpath.constprop.0+0xf29/0xf50
  ? __cond_resched+0x16/0x50
  ? prepare_alloc_pages.constprop.0+0x19d/0x1b0
  __alloc_pages+0x24d/0x2c0
  ? __dma_atomic_pool_init+0x93/0x93
  alloc_page_interleave+0x13/0xb0
  atomic_pool_expand+0x118/0x210
  ? __dma_atomic_pool_init+0x93/0x93
  __dma_atomic_pool_init+0x45/0x93
  dma_atomic_pool_init+0xdb/0x176
  do_one_initcall+0x67/0x320
  ? rcu_read_lock_sched_held+0x3f/0x80
  kernel_init_freeable+0x290/0x2dc
  ? rest_init+0x24f/0x24f
  kernel_init+0xa/0x111
  ret_from_fork+0x22/0x30
 Mem-Info:
 ..
 DMA: failed to allocate 128 KiB GFP_KERNEL|GFP_DMA pool for atomic allocation
 DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations

Here, check whether the DMA zone has managed pages and create
atomic_pool_dma only if it does; otherwise just skip it.
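For context, a sketch of the has_managed_dma() helper this patch relies on
(introduced earlier in this series; the exact implementation may differ, and
it assumes CONFIG_ZONE_DMA so that node_zones[ZONE_DMA] is valid):

bool has_managed_dma(void)
{
	struct pglist_data *pgdat;

	for_each_online_pgdat(pgdat) {
		struct zone *zone = &pgdat->node_zones[ZONE_DMA];

		/* managed pages = pages handed to the buddy allocator */
		if (managed_zone(zone))
			return true;
	}
	return false;
}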

Signed-off-by: Baoquan He 
Cc: Christoph Hellwig 
Cc: Marek Szyprowski 
Cc: Robin Murphy 
Cc: iommu@lists.linux-foundation.org
---
 kernel/dma/pool.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
index 5a85804b5beb..00df3edd6c5d 100644
--- a/kernel/dma/pool.c
+++ b/kernel/dma/pool.c
@@ -206,7 +206,7 @@ static int __init dma_atomic_pool_init(void)
GFP_KERNEL);
if (!atomic_pool_kernel)
ret = -ENOMEM;
-   if (IS_ENABLED(CONFIG_ZONE_DMA)) {
+   if (has_managed_dma()) {
atomic_pool_dma = __dma_atomic_pool_init(atomic_pool_size,
GFP_KERNEL | GFP_DMA);
if (!atomic_pool_dma)
@@ -229,7 +229,7 @@ static inline struct gen_pool *dma_guess_pool(struct 
gen_pool *prev, gfp_t gfp)
if (prev == NULL) {
if (IS_ENABLED(CONFIG_ZONE_DMA32) && (gfp & GFP_DMA32))
return atomic_pool_dma32;
-   if (IS_ENABLED(CONFIG_ZONE_DMA) && (gfp & GFP_DMA))
+   if (atomic_pool_dma && (gfp & GFP_DMA))
return atomic_pool_dma;
return atomic_pool_kernel;
}
-- 
2.17.2



Re: [RFC v1 6/8] mshv: command line option to skip devices in PV-IOMMU

2021-08-10 Thread Wei Liu
On Wed, Aug 04, 2021 at 12:33:54PM +0530, Praveen Kumar wrote:
> On 04-08-2021 03:26, Wei Liu wrote:
> >>>   struct iommu_domain domain;
> >>> @@ -774,6 +784,41 @@ static struct iommu_device 
> >>> *hv_iommu_probe_device(struct device *dev)
> >>>   if (!dev_is_pci(dev))
> >>>   return ERR_PTR(-ENODEV);
> >>>  
> >>> + /*
> >>> +  * Skip the PCI device specified in `pci_devs_to_skip`. This is a
> >>> +  * temporary solution until we figure out a way to extract information
> >>> +  * from the hypervisor what devices it is already using.
> >>> +  */
> >>> + if (pci_devs_to_skip && *pci_devs_to_skip) {
> >>> + int pos = 0;
> >>> + int parsed;
> >>> + int segment, bus, slot, func;
> >>> + struct pci_dev *pdev = to_pci_dev(dev);
> >>> +
> >>> + do {
> >>> + parsed = 0;
> >>> +
> >>> + sscanf(pci_devs_to_skip + pos,
> >>> + " (%x:%x:%x.%x) %n",
> >>> + , , , , );
> >>> +
> >>> + if (parsed <= 0)
> >>> + break;
> >>> +
> >>> + if (pci_domain_nr(pdev->bus) == segment &&
> >>> + pdev->bus->number == bus &&
> >>> + PCI_SLOT(pdev->devfn) == slot &&
> >>> + PCI_FUNC(pdev->devfn) == func)
> >>> + {
> >>> + dev_info(dev, "skipped by MSHV IOMMU\n");
> >>> + return ERR_PTR(-ENODEV);
> >>> + }
> >>> +
> >>> + pos += parsed;
> >>> +
> >>> + } while (pci_devs_to_skip[pos]);
> >>
> >> Is there a possibility of pci_devs_to_skip + pos > sizeof(pci_devs_to_skip)
> >> and also a valid memory ?
> > 
> > pos should point to the last parsed position. If parsing fails pos does
> > not get updated and the code breaks out of the loop. If parsing is
> > success pos should point to either the start of next element of '\0'
> > (end of string). To me this is good enough.
> 
> The point is, hypothetically the address to pci_devs_to_skip + pos can
> be valid address (later to '\0'), and thus there is a possibility,
> that parsing may not fail.

Have you found an example how at any given point in time
pci_devs_to_skip + pos can point outside of user provided string?

> Another, there is also a possibility of sscanf faulting accessing the
> illegal address, if pci_devs_to_skip[pos] turns out to be not NULL or
> valid address.

That depends on pci_devs_to_skip + pos can point to an invalid address
in the first place, so that goes back to the question above.
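To illustrate why pos stays inside the string: %n only records an offset
after the preceding conversions succeed, and the loop stops at the
terminating '\0'. A standalone userspace demo of the same loop (a sketch,
not the kernel code):

#include <stdio.h>

int main(void)
{
	const char *pci_devs_to_skip = "(0:1:2.3) (4:5:6.7)";
	int pos = 0, parsed, segment, bus, slot, func;

	do {
		parsed = 0;
		sscanf(pci_devs_to_skip + pos, " (%x:%x:%x.%x) %n",
		       &segment, &bus, &slot, &func, &parsed);
		if (parsed <= 0)
			break;		/* malformed input: stop, pos unchanged */

		printf("%04x:%02x:%02x.%x\n", segment, bus, slot, func);
		pos += parsed;		/* always lands on a parsed boundary */
	} while (pci_devs_to_skip[pos]);

	return 0;
}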

> 
> > 
> >> I would recommend to have a check of size as well before accessing the
> >> array content, just to be safer accessing any memory.
> >>
> > 
> > What check do you have in mind?
> 
> Something like,
> size_t len = strlen(pci_devs_to_skip);
> do {
> 
>   len -= parsed;
> } while (len);
> 
> OR
> 
> do {
> ...
>   pos += parsed;
> } while (pos < len);
> 
> Further, I'm also fine with the existing code, if you think this won't
> break and already been taken care. Thanks.

But in the loop somewhere you will still need to parse pci_devs_to_skip
+ some_offset. The new code structure does not remove that, right?

Given this is for debugging and is supposed to be temporary, I think the
code is good enough. But I want to make sure there isn't anything I
missed.

Wei.

> 
> Regards,
> 
> ~Praveen.


Re: [RFC v1 5/8] mshv: add paravirtualized IOMMU support

2021-08-10 Thread Wei Liu
On Wed, Aug 04, 2021 at 12:13:42PM +0530, Praveen Kumar wrote:
> On 04-08-2021 03:17, Wei Liu wrote:
> >>> +static size_t hv_iommu_unmap(struct iommu_domain *d, unsigned long iova,
> >>> +size_t size, struct iommu_iotlb_gather *gather)
> >>> +{
> >>> + size_t unmapped;
> >>> + struct hv_iommu_domain *domain = to_hv_iommu_domain(d);
> >>> + unsigned long flags, npages;
> >>> + struct hv_input_unmap_device_gpa_pages *input;
> >>> + u64 status;
> >>> +
> >>> + unmapped = hv_iommu_del_mappings(domain, iova, size);
> >>> + if (unmapped < size)
> >>> + return 0;
> >> Is there a case where unmapped > 0 && unmapped < size ?
> >>
> > There could be such a case -- hv_iommu_del_mappings' return value is >= 0.
> > Is there a problem with this predicate?
> 
> What I understand, if we are unmapping and return 0, means nothing was
> unmapped, and will that not cause any corruption or illegal access of
> unmapped memory later?  From __iommu_unmap

Those pages are not really unmapped. The hypercall is skipped.

> ...
> 	while (unmapped < size) {
> 		size_t pgsize = iommu_pgsize(domain, iova, size - unmapped);
>
> 		unmapped_page = ops->unmap(domain, iova, pgsize, iotlb_gather);
> 		if (!unmapped_page)
> 			break;	<<< we just break here, thinking there is
> 				nothing unmapped, but actually
> 				hv_iommu_del_mappings has removed some pages.
>
> 		pr_debug("unmapped: iova 0x%lx size 0x%zx\n",
> 			 iova, unmapped_page);
>
> 		iova += unmapped_page;
> 		unmapped += unmapped_page;
> 	}
> ...
> 
> Am I missing something ?
> 
> Regards,
> 
> ~Praveen.


Re: [bug report] iommu_dma_unmap_sg() is very slow when running IO from remote numa node

2021-08-10 Thread Ming Lei
On Tue, Aug 10, 2021 at 10:36:47AM +0100, John Garry wrote:
> On 28/07/2021 16:17, Ming Lei wrote:
> > > > > Have you tried turning off the IOMMU to ensure that this is really 
> > > > > just
> > > > > an IOMMU problem?
> > > > > 
> > > > > You can try setting CONFIG_ARM_SMMU_V3=n in the defconfig or passing
> > > > > cmdline param iommu.passthrough=1 to bypass the the SMMU (equivalent 
> > > > > to
> > > > > disabling for kernel drivers).
> > > > Bypassing SMMU via iommu.passthrough=1 basically doesn't make a 
> > > > difference
> > > > on this issue.
> > > A ~90% throughput drop still seems to me to be too high to be a software
> > > issue. More so since I don't see similar on my system. And that throughput
> > > drop does not lead to a total CPU usage drop, from the fio log.
> > > 
> > > Do you know if anyone has run memory benchmark tests on this board to find
> > > out NUMA effect? I think lmbench or stream could be used for this.
> > https://lore.kernel.org/lkml/YOhbc5C47IzC893B@T590/
> 
> Hi Ming,
> 
> Out of curiosity, did you investigate this topic any further?

IMO, the issue is probably on the device/system side, since completion
latency is increased a lot while submission latency is unchanged.

Either the submission isn't committed to hardware in time, or the
completion status isn't updated in time from the CPU's viewpoint.

We have tried updating to new FW, but saw no difference.

> 
> And you also asked about my results earlier:
> 
> On 22/07/2021 16:54, Ming Lei wrote:
> >> [   52.035895] nvme :81:00.0: Adding to iommu group 5
> >> [   52.047732] nvme nvme0: pci function :81:00.0
> >> [   52.067216] nvme nvme0: 22/0/2 default/read/poll queues
> >> [   52.087318]  nvme0n1: p1
> >>
> >> So I get these results:
> >> cpu0 335K
> >> cpu32 346K
> >> cpu64 300K
> >> cpu96 300K
> >>
> >> So still not massive changes.
> > In your last email, the results are the following with irq mode io_uring:
> >
> >   cpu0  497K
> >   cpu4  307K
> >   cpu32 566K
> >   cpu64 488K
> >   cpu96 508K
> >
> > So looks you get much worse result with real io_polling?
> >
> 
> Would the expectation be that at least I get the same performance with
> io_polling here?

io_polling is supposed to improve IO latency a lot compared with irq
mode, and the perf data shows that clearly on x86_64.

> Anything else to try which you can suggest to investigate
> this lower performance?

You may try comparing irq mode and polling to narrow down the possible
reasons; I have no exact suggestion on how to investigate it, :-(


Thanks,
Ming



Re: [PATCH V3 01/13] x86/HV: Initialize GHCB page in Isolation VM

2021-08-10 Thread Tianyu Lan

Hi Wei:
  Thanks for the review.

On 8/10/2021 6:56 PM, Wei Liu wrote:

> On Mon, Aug 09, 2021 at 01:56:05PM -0400, Tianyu Lan wrote:
> [...]
> >  static int hv_cpu_init(unsigned int cpu)
> >  {
> >  	union hv_vp_assist_msr_contents msr = { 0 };
> > @@ -85,6 +111,8 @@ static int hv_cpu_init(unsigned int cpu)
> >  		}
> >  	}
> >  
> > +	hyperv_init_ghcb();
> > +
>
> Why is the return value not checked here? If that's not required, can
> you leave a comment?

The check is necessary here. Will update in the next version.
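i.e., something along these lines (a sketch; it assumes hv_cpu_init() has,
or gains, an int ret local to propagate the error):

-	hyperv_init_ghcb();
+	ret = hyperv_init_ghcb();
+	if (ret)
+		return ret;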

Thanks.


Re: [PATCH V3 03/13] x86/HV: Add new hvcall guest address host visibility support

2021-08-10 Thread Tianyu Lan

On 8/10/2021 7:03 PM, Wei Liu wrote:

> > diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> > index 0bb4d9ca7a55..b3683083208a 100644
> > --- a/arch/x86/hyperv/hv_init.c
> > +++ b/arch/x86/hyperv/hv_init.c
> > @@ -607,6 +607,12 @@ EXPORT_SYMBOL_GPL(hv_get_isolation_type);
> >  
> >  bool hv_is_isolation_supported(void)
> >  {
> > +	if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR))
> > +		return 0;
>
> Nit: false instead of 0.

OK. Will fix in the next version.

> > +int hv_mark_gpa_visibility(u16 count, const u64 pfn[],
> > +			   enum hv_mem_host_visibility visibility)
> > +{
> > +	struct hv_gpa_range_for_visibility **input_pcpu, *input;
> > +	u16 pages_processed;
> > +	u64 hv_status;
> > +	unsigned long flags;
> > +
> > +	/* no-op if partition isolation is not enabled */
> > +	if (!hv_is_isolation_supported())
> > +		return 0;
> > +
> > +	if (count > HV_MAX_MODIFY_GPA_REP_COUNT) {
> > +		pr_err("Hyper-V: GPA count:%d exceeds supported:%lu\n", count,
> > +			HV_MAX_MODIFY_GPA_REP_COUNT);
> > +		return -EINVAL;
> > +	}
> > +
> > +	local_irq_save(flags);
> > +	input_pcpu = (struct hv_gpa_range_for_visibility **)
> > +			this_cpu_ptr(hyperv_pcpu_input_arg);
> > +	input = *input_pcpu;
> > +	if (unlikely(!input)) {
> > +		local_irq_restore(flags);
> > +		return -EINVAL;
> > +	}
> > +
> > +	input->partition_id = HV_PARTITION_ID_SELF;
> > +	input->host_visibility = visibility;
> > +	input->reserved0 = 0;
> > +	input->reserved1 = 0;
> > +	memcpy((void *)input->gpa_page_list, pfn, count * sizeof(*pfn));
> > +	hv_status = hv_do_rep_hypercall(
> > +			HVCALL_MODIFY_SPARSE_GPA_PAGE_HOST_VISIBILITY, count,
> > +			0, input, &pages_processed);
> > +	local_irq_restore(flags);
> > +
> > +	if (!(hv_status & HV_HYPERCALL_RESULT_MASK))
> > +		return 0;
> > +
> > +	return hv_status & HV_HYPERCALL_RESULT_MASK;
>
> Joseph introduced a few helper functions in 753ed9c95c37d. They will
> make the code simpler.

OK. Will update in the next version.

Thanks.


Re: [RFC v2] /dev/iommu uAPI proposal

2021-08-10 Thread David Gibson
On Fri, Aug 06, 2021 at 09:32:11AM -0300, Jason Gunthorpe wrote:
> On Fri, Aug 06, 2021 at 02:45:26PM +1000, David Gibson wrote:
> 
> > Well, that's kind of what I'm doing.  PCI currently has the notion of
> > "default" address space for a RID, but there's no guarantee that other
> > buses (or even future PCI extensions) will.  The idea is that
> > "endpoint" means exactly the (RID, PASID) or (SID, SSID) or whatever
> > future variations are on that.
> 
> This is already happening in this proposal, it is why I insisted that
> the driver facing API has to be very explicit. That API specifies
> exactly what the device silicon is doing.
> 
> However, that is placed at the IOASID level. There is no reason to
> create endpoint objects that are 1:1 with IOASID objects, eg for
> PASID.

They're not 1:1 though.  You can have multiple endpoints in the same
IOAS, that's the whole point.

> We need to have clear software layers and responsibilities, I think
> this is where the VFIO container design has fallen behind.
> 
> The device driver is responsible to delcare what TLPs the device it
> controls will issue

Right.. and I'm envisaging an endpoint as an abstraction to represent a
single TLP.

> The system layer is responsible to determine how those TLPs can be
> matched to IO page tables, if at all
> 
> The IO page table layer is responsible to map the TLPs to physical
> memory.
> 
> Each must stay in its box and we should not create objects that smush
> together, say, the device and system layers because it will only make
> a mess of the software design.

I agree... and endpoints are explicitly an attempt to do that.  I
don't see how you think they're smushing things together.

> Since the system layer doesn't have any concrete objects in our
> environment (which is based on devices and IO page tables) it has to
> exist as metadata attached to the other two objects.

Whereas I'm suggesting clarifying this by *creating* concrete objects
to represent the concept we need.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [PATCH V3 01/13] x86/HV: Initialize GHCB page in Isolation VM

2021-08-10 Thread Wei Liu
On Mon, Aug 09, 2021 at 01:56:05PM -0400, Tianyu Lan wrote:
[...]
>  static int hv_cpu_init(unsigned int cpu)
>  {
>   union hv_vp_assist_msr_contents msr = { 0 };
> @@ -85,6 +111,8 @@ static int hv_cpu_init(unsigned int cpu)
>   }
>   }
>  
> + hyperv_init_ghcb();
> +

Why is the return value not checked here? If that's not required, can
you leave a comment?

Wei.


Re: [PATCH V3 03/13] x86/HV: Add new hvcall guest address host visibility support

2021-08-10 Thread Wei Liu
On Mon, Aug 09, 2021 at 01:56:07PM -0400, Tianyu Lan wrote:
> From: Tianyu Lan 
> 
> Add new hvcall guest address host visibility support to mark
> memory visible to host. Call it inside set_memory_decrypted
> /encrypted(). Add HYPERVISOR feature check in the
> hv_is_isolation_supported() to optimize in non-virtualization
> environment.
> 
> Signed-off-by: Tianyu Lan 
> ---
> Change since v2:
>* Rework __set_memory_enc_dec() and call Hyper-V and AMD function
>  according to platform check.
> 
> Change since v1:
>* Use new staic call x86_set_memory_enc to avoid add Hyper-V
>  specific check in the set_memory code.
> ---
>  arch/x86/hyperv/Makefile   |   2 +-
>  arch/x86/hyperv/hv_init.c  |   6 ++
>  arch/x86/hyperv/ivm.c  | 114 +
>  arch/x86/include/asm/hyperv-tlfs.h |  20 +
>  arch/x86/include/asm/mshyperv.h|   4 +-
>  arch/x86/mm/pat/set_memory.c   |  19 +++--
>  include/asm-generic/hyperv-tlfs.h  |   1 +
>  include/asm-generic/mshyperv.h |   1 +
>  8 files changed, 160 insertions(+), 7 deletions(-)
>  create mode 100644 arch/x86/hyperv/ivm.c
> 
> diff --git a/arch/x86/hyperv/Makefile b/arch/x86/hyperv/Makefile
> index 48e2c51464e8..5d2de10809ae 100644
> --- a/arch/x86/hyperv/Makefile
> +++ b/arch/x86/hyperv/Makefile
> @@ -1,5 +1,5 @@
>  # SPDX-License-Identifier: GPL-2.0-only
> -obj-y:= hv_init.o mmu.o nested.o irqdomain.o
> +obj-y:= hv_init.o mmu.o nested.o irqdomain.o ivm.o
>  obj-$(CONFIG_X86_64) += hv_apic.o hv_proc.o
>  
>  ifdef CONFIG_X86_64
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index 0bb4d9ca7a55..b3683083208a 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -607,6 +607,12 @@ EXPORT_SYMBOL_GPL(hv_get_isolation_type);
>  
>  bool hv_is_isolation_supported(void)
>  {
> + if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR))
> + return 0;

Nit: false instead of 0.

> +
> + if (!hypervisor_is_type(X86_HYPER_MS_HYPERV))
> + return 0;
> +
>   return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;
>  }
>  
[...]
> +int hv_mark_gpa_visibility(u16 count, const u64 pfn[],
> +enum hv_mem_host_visibility visibility)
> +{
> + struct hv_gpa_range_for_visibility **input_pcpu, *input;
> + u16 pages_processed;
> + u64 hv_status;
> + unsigned long flags;
> +
> + /* no-op if partition isolation is not enabled */
> + if (!hv_is_isolation_supported())
> + return 0;
> +
> + if (count > HV_MAX_MODIFY_GPA_REP_COUNT) {
> + pr_err("Hyper-V: GPA count:%d exceeds supported:%lu\n", count,
> + HV_MAX_MODIFY_GPA_REP_COUNT);
> + return -EINVAL;
> + }
> +
> + local_irq_save(flags);
> + input_pcpu = (struct hv_gpa_range_for_visibility **)
> + this_cpu_ptr(hyperv_pcpu_input_arg);
> + input = *input_pcpu;
> + if (unlikely(!input)) {
> + local_irq_restore(flags);
> + return -EINVAL;
> + }
> +
> + input->partition_id = HV_PARTITION_ID_SELF;
> + input->host_visibility = visibility;
> + input->reserved0 = 0;
> + input->reserved1 = 0;
> + memcpy((void *)input->gpa_page_list, pfn, count * sizeof(*pfn));
> + hv_status = hv_do_rep_hypercall(
> + HVCALL_MODIFY_SPARSE_GPA_PAGE_HOST_VISIBILITY, count,
> + 0, input, &pages_processed);
> + local_irq_restore(flags);
> +
> + if (!(hv_status & HV_HYPERCALL_RESULT_MASK))
> + return 0;
> +
> + return hv_status & HV_HYPERCALL_RESULT_MASK;

Joseph introduced a few helper functions in 753ed9c95c37d. They will
make the code simpler.
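For reference, the tail of hv_mark_gpa_visibility() could then become
something like this (a sketch using the hv_result()/hv_result_success()
helpers from that commit; double-check the exact helper names):

	hv_status = hv_do_rep_hypercall(
			HVCALL_MODIFY_SPARSE_GPA_PAGE_HOST_VISIBILITY, count,
			0, input, &pages_processed);
	local_irq_restore(flags);

	if (hv_result_success(hv_status))
		return 0;

	return hv_result(hv_status);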

Wei.


Re: [PATCH V3 03/13] x86/HV: Add new hvcall guest address host visibility support

2021-08-10 Thread Tianyu Lan

On 8/10/2021 6:12 AM, Dave Hansen wrote:

> On 8/9/21 10:56 AM, Tianyu Lan wrote:
> > From: Tianyu Lan 
> >
> > Add new hvcall guest address host visibility support to mark
> > memory visible to host. Call it inside set_memory_decrypted
> > /encrypted(). Add HYPERVISOR feature check in the
> > hv_is_isolation_supported() to optimize in non-virtualization
> > environment.
>
> From an x86/mm perspective:
>
> Acked-by: Dave Hansen 

Thanks for your ACK.

> A tiny nit:
>
> > diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> > index 0bb4d9ca7a55..b3683083208a 100644
> > --- a/arch/x86/hyperv/hv_init.c
> > +++ b/arch/x86/hyperv/hv_init.c
> > @@ -607,6 +607,12 @@ EXPORT_SYMBOL_GPL(hv_get_isolation_type);
> >  
> >  bool hv_is_isolation_supported(void)
> >  {
> > +	if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR))
> > +		return 0;
> > +
> > +	if (!hypervisor_is_type(X86_HYPER_MS_HYPERV))
> > +		return 0;
> > +
> >  	return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;
> >  }
>
> This might be worthwhile to move to a header.  That ensures that
> hv_is_isolation_supported() use can avoid even a function call.  But, I
> see this is used in modules and its use here is also in a slow path, so
> it's not a big deal.

I will move it to header in the following version.
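A sketch of what the header version might look like (assuming it lands in
arch/x86/include/asm/mshyperv.h and that hv_get_isolation_type() is visible
there):

static inline bool hv_is_isolation_supported(void)
{
	if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR))
		return false;

	if (!hypervisor_is_type(X86_HYPER_MS_HYPERV))
		return false;

	return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;
}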


Thanks.


[PATCH] iommu: APPLE_DART should depend on ARCH_APPLE

2021-08-10 Thread Geert Uytterhoeven
The Apple DART (Device Address Resolution Table) IOMMU is only present
on Apple ARM SoCs like the M1.  Hence add a dependency on ARCH_APPLE, to
prevent asking the user about this driver when configuring a kernel
without support for the Apple Silicon SoC family.

Fixes: 05ce9d20d699b093 ("iommu/dart: Add DART iommu driver")
Signed-off-by: Geert Uytterhoeven 
---
 drivers/iommu/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index dfe81da483e9e073..e908b8222e4ed679 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -292,7 +292,7 @@ config SPAPR_TCE_IOMMU
 
 config APPLE_DART
tristate "Apple DART IOMMU Support"
-   depends on ARM64 || (COMPILE_TEST && !GENERIC_ATOMIC64)
+   depends on ARCH_APPLE || (COMPILE_TEST && !GENERIC_ATOMIC64)
select IOMMU_API
select IOMMU_IO_PGTABLE_LPAE
default ARCH_APPLE
-- 
2.25.1



Re: [PATCH RFC 0/8] iommu/arm-smmu-v3: add support for ECMDQ register mode

2021-08-10 Thread Will Deacon
On Sat, Jun 26, 2021 at 07:01:22PM +0800, Zhen Lei wrote:
> SMMU v3.3 added a new feature, which is Enhanced Command queue interface
> for reducing contention when submitting Commands to the SMMU, in this
> patch set, ECMDQ is the abbreviation of Enhanced Command Queue.
> 
> When the hardware supports ECMDQ and each core can exclusively use one ECMDQ,
> each core does not need to compete with other cores when using its own ECMDQ.
> This means that each core can insert commands in parallel. If each ECMDQ can
> execute commands in parallel, the overall performance may be better. However,
> our hardware currently does not support multiple ECMDQ execute commands in
> parallel.
> 
> In order to reuse existing code, I originally still call
> arm_smmu_cmdq_issue_cmdlist() to insert commands. Even so, however, there
> was a performance improvement of nearly 12% in strict mode.
>
> The test environment is the EMU, which simulates the connection of the
> 200 Gbit/s NIC.
> Number of queues:  passthrough  lazy  strict(ECMDQ)  strict(CMDQ)
>        6              188       180       162           145     --> 11.7% improvement
>        8              188       188       184           183     --> 0.55% improvement

Sorry, I don't quite follow the numbers here. Why does the number of queues
affect the classic "CMDQ" mode? We only have one queue there, right?

> In recent days, I implemented a new function without competition with other
> cores to replace arm_smmu_cmdq_issue_cmdlist() when a core can have an ECMDQ.
> I'm guessing it might get better performance results. Because the EMU is too
> slow, it will take a while before the relevant data is available.

I'd certainly prefer to wait until we have something we know is
representative. However, I can take the first four prep patches now if you
respin the second one. At least that's then less for you to carry.

I'd also like review from the Arm side on this (and thank you for adopting
the architecture unlike others seem to have done judging by the patches
floating around).

Will


Re: [PATCHv3] iommu/arm-smmu: Optimize ->tlb_flush_walk() for qcom implementation

2021-08-10 Thread Will Deacon
On Tue, Aug 03, 2021 at 11:09:17AM +0530, Sai Prakash Ranjan wrote:
> On 2021-08-02 21:13, Will Deacon wrote:
> > On Wed, Jun 23, 2021 at 07:12:01PM +0530, Sai Prakash Ranjan wrote:
> > > diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > > b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > > index d3c6f54110a5..f3845e822565 100644
> > > --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > > +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > > @@ -341,6 +341,12 @@ static void arm_smmu_tlb_add_page_s1(struct
> > > iommu_iotlb_gather *gather,
> > > ARM_SMMU_CB_S1_TLBIVAL);
> > >  }
> > > 
> > > +static void arm_smmu_tlb_inv_walk_impl_s1(unsigned long iova,
> > > size_t size,
> > > +  size_t granule, void *cookie)
> > > +{
> > > + arm_smmu_tlb_inv_context_s1(cookie);
> > > +}
> > > +
> > >  static void arm_smmu_tlb_inv_walk_s2(unsigned long iova, size_t size,
> > >size_t granule, void *cookie)
> > >  {
> > > @@ -388,6 +394,12 @@ static const struct iommu_flush_ops
> > > arm_smmu_s1_tlb_ops = {
> > >   .tlb_add_page   = arm_smmu_tlb_add_page_s1,
> > >  };
> > > 
> > > +const struct iommu_flush_ops arm_smmu_s1_tlb_impl_ops = {
> > > + .tlb_flush_all  = arm_smmu_tlb_inv_context_s1,
> > > + .tlb_flush_walk = arm_smmu_tlb_inv_walk_impl_s1,
> > > + .tlb_add_page   = arm_smmu_tlb_add_page_s1,
> > > +};
> > 
> > Hmm, dunno about this. Wouldn't it be a lot cleaner if the
> > tlb_flush_walk
> > callbacks just did the right thing based on the smmu_domain (maybe in
> > the
> > arm_smmu_cfg?) rather than having an entirely new set of ops just
> > because
> > they're const and you can't overide the bit you want?
> > 
> > I don't think there's really an awful lot qcom-specific about the
> > principle
> > here -- there's a trade-off between over-invalidation and invalidation
> > latency. That happens on the CPU as well.
> > 
> 
> Sorry didn't understand, based on smmu_domain what? How do we make
> this implementation specific? Do you mean something like a quirk?
> The reason we didn't make this common was because nvidia folks weren't
> so happy with that, you can find the discussion in this thread [1].
> 
> [1] 
> https://lore.kernel.org/lkml/20210609145315.25750-1-saiprakash.ran...@codeaurora.org/

The ->tlb_flush_walk() callbacks take a 'void *cookie' which, for this
driver, is a 'struct arm_smmu_domain *'. From that, you can get to the
'struct arm_smmu_cfg' which could have something as coarse as:

boolflush_walk_prefer_tlbiasid;

which you can set when you initialise the domain (maybe in the
->init_context callback?). It shouldn't affect anybody else.
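A sketch of how the S1 walk callback might consume that flag (assuming the
existing range-invalidation helpers keep their current shape):

static void arm_smmu_tlb_inv_walk_s1(unsigned long iova, size_t size,
				     size_t granule, void *cookie)
{
	struct arm_smmu_domain *smmu_domain = cookie;

	if (smmu_domain->cfg.flush_walk_prefer_tlbiasid) {
		/* Over-invalidate by ASID instead of walking the range. */
		arm_smmu_tlb_inv_context_s1(cookie);
		return;
	}

	arm_smmu_tlb_inv_range_s1(iova, size, granule, cookie,
				  ARM_SMMU_CB_S1_TLBIVA);
	arm_smmu_tlb_sync_context(cookie);
}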

Will


Re: [PATCH 07/11] treewide: Replace the use of mem_encrypt_active() with prot_guest_has()

2021-08-10 Thread Kuppuswamy, Sathyanarayanan



On 8/10/21 12:48 PM, Tom Lendacky wrote:
> On 8/10/21 1:45 PM, Kuppuswamy, Sathyanarayanan wrote:
> > On 7/27/21 3:26 PM, Tom Lendacky wrote:
> > > diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
> > > index de01903c3735..cafed6456d45 100644
> > > --- a/arch/x86/kernel/head64.c
> > > +++ b/arch/x86/kernel/head64.c
> > > @@ -19,7 +19,7 @@
> > >  #include 
> > >  #include 
> > >  #include 
> > > -#include 
> > > +#include 
> > >  #include 
> > >  
> > >  #include 
> > > @@ -285,7 +285,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
> > >  	 * there is no need to zero it after changing the memory encryption
> > >  	 * attribute.
> > >  	 */
> > > -	if (mem_encrypt_active()) {
> > > +	if (prot_guest_has(PATTR_MEM_ENCRYPT)) {
> > >  		vaddr = (unsigned long)__start_bss_decrypted;
> > >  		vaddr_end = (unsigned long)__end_bss_decrypted;
> >
> > Since this change is specific to AMD, can you replace PATTR_MEM_ENCRYPT with
> > prot_guest_has(PATTR_SME) || prot_guest_has(PATTR_SEV). It is not used in
> > TDX.
>
> This is a direct replacement for now. I think the change you're requesting
> should be done as part of the TDX support patches so it's clear why it is
> being changed.

OK. I will include it as part of the TDX changes.

> But, wouldn't TDX still need to do something with this shared/unencrypted
> area, though? Or since it is shared, there's actually nothing you need to
> do (the bss decrypted section exists even if CONFIG_AMD_MEM_ENCRYPT is not
> configured)?

Kirill had a requirement to turn on CONFIG_AMD_MEM_ENCRYPT for adding lazy
accept support in the TDX guest kernel. Kirill, can you add details here?

> Thanks,
> Tom





--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

Re: [PATCH v3 14/14] tpm: Allow locality 2 to be set when initializing the TPM for Secure Launch

2021-08-10 Thread Jarkko Sakkinen
On Mon, Aug 09, 2021 at 12:38:56PM -0400, Ross Philipson wrote:
> The Secure Launch MLE environment uses PCRs that are only accessible from
> the DRTM locality 2. By default the TPM drivers always initialize the
> locality to 0. When a Secure Launch is in progress, initialize the
> locality to 2.
> 
> Signed-off-by: Ross Philipson 
> ---
>  drivers/char/tpm/tpm-chip.c | 13 +++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm-chip.c
> index ddaeceb..48b9351 100644
> --- a/drivers/char/tpm/tpm-chip.c
> +++ b/drivers/char/tpm/tpm-chip.c
> @@ -23,6 +23,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include "tpm.h"
>  
>  DEFINE_IDR(dev_nums_idr);
> @@ -34,12 +35,20 @@
>  
>  static int tpm_request_locality(struct tpm_chip *chip)
>  {
> - int rc;
> + int rc, locality;

int locality;
int rc;

>  
>   if (!chip->ops->request_locality)
>   return 0;
>  
> - rc = chip->ops->request_locality(chip, 0);
> + if (slaunch_get_flags() & SL_FLAG_ACTIVE) {
> + dev_dbg(&chip->dev, "setting TPM locality to 2 for MLE\n");
> + locality = 2;
> + } else {
> + dev_dbg(&chip->dev, "setting TPM locality to 0\n");
> + locality = 0;
> + }

Please, remove dev_dbg()'s.

> +
> + rc = chip->ops->request_locality(chip, locality);
>   if (rc < 0)
>   return rc;
>  
> -- 
> 1.8.3.1
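Folding both review comments in, the function might end up as (a sketch; the
remainder of the original function is unchanged and elided here):

static int tpm_request_locality(struct tpm_chip *chip)
{
	int locality;
	int rc;

	if (!chip->ops->request_locality)
		return 0;

	if (slaunch_get_flags() & SL_FLAG_ACTIVE)
		locality = 2;
	else
		locality = 0;

	rc = chip->ops->request_locality(chip, locality);
	if (rc < 0)
		return rc;

	/* ... rest of the function as before ... */
	return 0;
}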

/Jarkko


Re: [PATCH v3 02/14] x86/boot: Add missing handling of setup_indirect structures

2021-08-10 Thread Jarkko Sakkinen
On Mon, Aug 09, 2021 at 12:38:44PM -0400, Ross Philipson wrote:
> One of the two functions in ioremap.c that handles setup_data was
> missing the correct handling of setup_indirect structures.

What is "correct handling", and how was it broken?

What is 'setup_indirect'?
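For context: setup_indirect is the indirection record from
arch/x86/include/uapi/asm/bootparam.h. A setup_data entry whose type has the
SETUP_INDIRECT bit set carries one of these records in its payload, so large
blobs can live outside the setup_data chain. The definition, reproduced from
memory (verify against the header):

struct setup_indirect {
	__u32 type;
	__u32 reserved;	/* must be zero */
	__u64 len;
	__u64 addr;
};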

> Functionality missing from original commit:

Remove this sentence.

> commit b3c72fc9a78e (x86/boot: Introduce setup_indirect)

Should be.

Fixes: b3c72fc9a78e ("x86/boot: Introduce setup_indirect")

 
> Signed-off-by: Ross Philipson 
> ---
>  arch/x86/mm/ioremap.c | 21 +++--
>  1 file changed, 19 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
> index ab74e4f..f2b34c5 100644
> --- a/arch/x86/mm/ioremap.c
> +++ b/arch/x86/mm/ioremap.c
> @@ -669,17 +669,34 @@ static bool __init early_memremap_is_setup_data(resource_size_t phys_addr,
>  
>   paddr = boot_params.hdr.setup_data;
>   while (paddr) {
> - unsigned int len;
> + unsigned int len, size;
>  
>   if (phys_addr == paddr)
>   return true;
>  
>   data = early_memremap_decrypted(paddr, sizeof(*data));
> + size = sizeof(*data);
>  
>   paddr_next = data->next;
>   len = data->len;
>  
> - early_memunmap(data, sizeof(*data));
> + if ((phys_addr > paddr) && (phys_addr < (paddr + len))) {
> + early_memunmap(data, sizeof(*data));
> + return true;
> + }
> +
> + if (data->type == SETUP_INDIRECT) {
> + size += len;
> + early_memunmap(data, sizeof(*data));
> + data = early_memremap_decrypted(paddr, size);
> +
> + if (((struct setup_indirect *)data->data)->type != SETUP_INDIRECT) {
> + paddr = ((struct setup_indirect *)data->data)->addr;
> + len = ((struct setup_indirect *)data->data)->len;
> + }
> + }
> +
> + early_memunmap(data, size);
>  
>   if ((phys_addr > paddr) && (phys_addr < (paddr + len)))
>   return true;
> -- 
> 1.8.3.1
> 
> 

/Jarkko


Re: [PATCH v3 00/14] x86: Trenchboot secure dynamic launch Linux kernel support

2021-08-10 Thread Jarkko Sakkinen
On Mon, Aug 09, 2021 at 12:38:42PM -0400, Ross Philipson wrote:
> The focus of Trechboot project (https://github.com/TrenchBoot) is to
> enhance the boot security and integrity. This requires the linux kernel
 ~
 Linux

How does it enhance it? The following sentence explains the requirements
for the Linux kernel, i.e. it's a question without answer. And if there
is no answer, there is no need to merge this.

/Jarkko


Re: [PATCH 07/11] treewide: Replace the use of mem_encrypt_active() with prot_guest_has()

2021-08-10 Thread Tom Lendacky via iommu
On 8/10/21 1:45 PM, Kuppuswamy, Sathyanarayanan wrote:
> 
> 
> On 7/27/21 3:26 PM, Tom Lendacky wrote:
>> diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
>> index de01903c3735..cafed6456d45 100644
>> --- a/arch/x86/kernel/head64.c
>> +++ b/arch/x86/kernel/head64.c
>> @@ -19,7 +19,7 @@
>>   #include 
>>   #include 
>>   #include 
>> -#include 
>> +#include 
>>   #include 
>>     #include 
>> @@ -285,7 +285,7 @@ unsigned long __head __startup_64(unsigned long
>> physaddr,
>>    * there is no need to zero it after changing the memory encryption
>>    * attribute.
>>    */
>> -    if (mem_encrypt_active()) {
>> +    if (prot_guest_has(PATTR_MEM_ENCRYPT)) {
>>   vaddr = (unsigned long)__start_bss_decrypted;
>>   vaddr_end = (unsigned long)__end_bss_decrypted;
> 
> 
> Since this change is specific to AMD, can you replace PATTR_MEM_ENCRYPT with
> prot_guest_has(PATTR_SME) || prot_guest_has(PATTR_SEV). It is not used in
> TDX.

This is a direct replacement for now. I think the change you're requesting
should be done as part of the TDX support patches so it's clear why it is
being changed.

But, wouldn't TDX still need to do something with this shared/unencrypted
area, though? Or since it is shared, there's actually nothing you need to
do (the bss decrypted section exists even if CONFIG_AMD_MEM_ENCRYPT is not
configured)?

Thanks,
Tom

> 

Re: [PATCH 07/11] treewide: Replace the use of mem_encrypt_active() with prot_guest_has()

2021-08-10 Thread Kuppuswamy, Sathyanarayanan




On 7/27/21 3:26 PM, Tom Lendacky wrote:
> diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
> index de01903c3735..cafed6456d45 100644
> --- a/arch/x86/kernel/head64.c
> +++ b/arch/x86/kernel/head64.c
> @@ -19,7 +19,7 @@
>  #include 
>  #include 
>  #include 
> -#include 
> +#include 
>  #include 
>  
>  #include 
> @@ -285,7 +285,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
>  	 * there is no need to zero it after changing the memory encryption
>  	 * attribute.
>  	 */
> -	if (mem_encrypt_active()) {
> +	if (prot_guest_has(PATTR_MEM_ENCRYPT)) {
>  		vaddr = (unsigned long)__start_bss_decrypted;
>  		vaddr_end = (unsigned long)__end_bss_decrypted;


Since this change is specific to AMD, can you replace PATTR_MEM_ENCRYPT with
prot_guest_has(PATTR_SME) || prot_guest_has(PATTR_SEV). It is not used in
TDX.
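i.e., the requested change would look something like this (sketch):

-	if (prot_guest_has(PATTR_MEM_ENCRYPT)) {
+	if (prot_guest_has(PATTR_SME) || prot_guest_has(PATTR_SEV)) {
 		vaddr = (unsigned long)__start_bss_decrypted;
 		vaddr_end = (unsigned long)__end_bss_decrypted;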

--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer


Re: [PATCH RFC 2/8] iommu/arm-smmu-v3: Add and use static helper function arm_smmu_cmdq_issue_cmd_with_sync()

2021-08-10 Thread Will Deacon
On Sat, Jun 26, 2021 at 07:01:24PM +0800, Zhen Lei wrote:
> The obvious key to the performance optimization of commit 587e6c10a7ce
> ("iommu/arm-smmu-v3: Reduce contention during command-queue insertion") is
> to allow multiple cores to insert commands in parallel after a brief mutex
> contention.
> 
> Obviously, inserting as many commands at a time as possible reduces the
> number of times cores take part in the mutex contention, thereby improving
> the overall performance. At the very least, it reduces the number of calls
> to arm_smmu_cmdq_issue_cmdlist().
> 
> Therefore, function arm_smmu_cmdq_issue_cmd_with_sync() is added to insert
> the 'cmd+sync' commands in a single call.
> 
> Signed-off-by: Zhen Lei 
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 33 +
>  1 file changed, 21 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 2433d3c29b49ff2..a5361153ca1d6a4 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -858,11 +858,25 @@ static int arm_smmu_cmdq_issue_cmd(struct arm_smmu_device *smmu,
>   return arm_smmu_cmdq_issue_cmdlist(smmu, cmd, 1, false);
>  }
>  
> -static int arm_smmu_cmdq_issue_sync(struct arm_smmu_device *smmu)
> +static int __maybe_unused arm_smmu_cmdq_issue_sync(struct arm_smmu_device *smmu)
>  {
>   return arm_smmu_cmdq_issue_cmdlist(smmu, NULL, 0, true);
>  }
>  
> +static int arm_smmu_cmdq_issue_cmd_with_sync(struct arm_smmu_device *smmu,
> +  struct arm_smmu_cmdq_ent *ent)
> +{
> + u64 cmd[CMDQ_ENT_DWORDS];
> +
> + if (arm_smmu_cmdq_build_cmd(cmd, ent)) {
> + dev_warn(smmu->dev, "ignoring unknown CMDQ opcode 0x%x\n",
> +  ent->opcode);
> + return -EINVAL;
> + }
> +
> + return arm_smmu_cmdq_issue_cmdlist(smmu, cmd, 1, true);
> +}

This function is almost identical to arm_smmu_cmdq_issue_cmd(). How about
moving the guts out into a helper:

static int __arm_smmu_cmdq_issue_cmd(struct arm_smmu_device *smmu,
 struct arm_smmu_cmdq_ent *ent,
 bool sync);

and then having arm_smmu_cmdq_issue_cmd_with_sync() and
arm_smmu_cmdq_issue_cmd() wrap that?
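
A sketch of that shape (untested, just to illustrate the suggestion):

	static int __arm_smmu_cmdq_issue_cmd(struct arm_smmu_device *smmu,
					     struct arm_smmu_cmdq_ent *ent,
					     bool sync)
	{
		u64 cmd[CMDQ_ENT_DWORDS];

		/* Build the command once, then issue it with or without a sync. */
		if (arm_smmu_cmdq_build_cmd(cmd, ent)) {
			dev_warn(smmu->dev, "ignoring unknown CMDQ opcode 0x%x\n",
				 ent->opcode);
			return -EINVAL;
		}

		return arm_smmu_cmdq_issue_cmdlist(smmu, cmd, 1, sync);
	}

	static int arm_smmu_cmdq_issue_cmd(struct arm_smmu_device *smmu,
					   struct arm_smmu_cmdq_ent *ent)
	{
		return __arm_smmu_cmdq_issue_cmd(smmu, ent, false);
	}

	static int arm_smmu_cmdq_issue_cmd_with_sync(struct arm_smmu_device *smmu,
						     struct arm_smmu_cmdq_ent *ent)
	{
		return __arm_smmu_cmdq_issue_cmd(smmu, ent, true);
	}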

Will


Re: [PATCH RFC 0/2] dma-pool: allow user to disable atomic pool

2021-08-10 Thread Tom Lendacky via iommu
On 8/5/21 1:54 AM, Baoquan He wrote:
> On 06/24/21 at 11:47am, Robin Murphy wrote:
>> On 2021-06-24 10:29, Baoquan He wrote:
>>> On 06/24/21 at 08:40am, Christoph Hellwig wrote:
 So reduce the amount allocated.  But the pool is needed for proper
 operation on systems with memory encryption.  And please add the right
 maintainer or at least mailing list for the code you're touching next
 time.
>>>
>>> Oh, I thought it was a memory issue only; I should have run
>>> ./scripts/get_maintainer.pl. Sorry.
>>>
>>> About reducing the amount allocated, it may not help. Because on x86_64,
>>> the kdump kernel doesn't put any page of memory into the buddy allocator
>>> of the DMA zone, it will definitely OOM during atomic_pool_dma
>>> initialization.
>>>
>>> I am wondering in which case, or on which device, the atomic pool is
>>> needed on an AMD system with memory encryption enabled. As we can see,
>>> the OOM will happen in the kdump kernel on Intel systems too, even
>>> though it's not necessary there.
> 
> Sorry for very late response, and thank both for your comments.
> 
>>
>> Hmm, I think the Kconfig reshuffle has actually left a slight wrinkle here.
>> For DMA_DIRECT_REMAP=y we can assume an atomic pool is always needed, since
>> that was the original behaviour anyway. However the implications of
>> AMD_MEM_ENCRYPT=y are different - even if support is enabled, it still
>> should only be relevant if mem_encrypt_active(), so it probably does make
>> sense to have an additional runtime gate on that.
> 
>>
>> From a quick scan, use of dma_alloc_from_pool() already depends on
>> force_dma_unencrypted() so that's probably fine already, but I think we'd
>> need a bit of extra protection around dma_free_from_pool() to prevent
>> gen_pool_has_addr() dereferencing NULL if the pools are uninitialised, even
>> with your proposed patch as it is. Presumably nothing actually called
>> dma_direct_free() when you tested this?
> 
> Yes, enforcing the conditional check of force_dma_unencrypted() around
> dma_free_from_pool sounds reasonable, just as we have done in
> dma_alloc_from_pool().
> 
> I have tested this patchset on normal x86_64 systems and one AMD system
> with SME support. Disabling the atomic pool can fix the issue that there
> are no managed pages in the DMA zone, so requesting a page from the DMA
> zone will cause an allocation failure. And even disabling the atomic pool
> in the 1st kernel didn't cause any problem on one AMD EPYC system which
> supports SME. I am not an expert in the DMA area, and I am wondering what
> the atomic pool is supposed to do in an SME/SEV system.

I think the atomic pool is used by the NVMe driver. My understanding is
that the driver will do a dma_alloc_coherent() from interrupt context, so it
needs to use GFP_ATOMIC. The pool was created because dma_alloc_coherent()
would perform a set_memory_decrypted() call, which can sleep. The pool
eliminates that issue (David can correct me if I got that wrong).
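
Roughly, the pattern is (a minimal sketch with a hypothetical driver, not
actual NVMe code):

	/*
	 * Coherent DMA allocation from a context that cannot sleep.  With
	 * SME/SEV active, a blocking allocation may need
	 * set_memory_decrypted(), which can sleep, so GFP_ATOMIC requests
	 * are served from the pre-decrypted atomic pool instead.
	 */
	buf = dma_alloc_coherent(dev, SZ_4K, &dma_handle, GFP_ATOMIC);
	if (!buf)
		return -ENOMEM;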

Thanks,
Tom

> 
> Besides, even when the atomic pool is disabled, slub page allocation for
> dma-kmalloc also triggers a page allocation failure. So I changed to
> another way to fix them; please check the v2 post. Disabling the atomic
> pool can be a good-to-have change.
> 
> Thanks
> Baoquan
> 


Re: [Patch V3 0/2] iommu/arm-smmu: Fix races in iommu domain/group creation

2021-08-10 Thread Will Deacon
On Tue, 10 Aug 2021 10:13:59 +0530, Ashish Mhetre wrote:
> When two devices with same SID are getting probed concurrently through
> iommu_probe_device(), the iommu_group and iommu_domain are allocated more
> than once because they are not protected for concurrency. This is leading
> to context faults when one device is accessing IOVA from other device.
> Fix this by protecting iommu_domain and iommu_group creation with mutexes.
> 
> Changes in v3:
> * Updated commit messages.
> * Added Signed-off-by in patch 2.
> 
> [...]
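
The core idea behind both patches, as a simplified sketch (not the exact
merged diffs; see the links below):

	/*
	 * Serialize the check-then-allocate of the group's default domain so
	 * that two devices sharing a SID, probed concurrently, cannot both
	 * observe "no domain yet" and each allocate one.
	 */
	mutex_lock(&group->mutex);
	if (!group->default_domain)
		ret = iommu_alloc_default_domain(group, dev);
	mutex_unlock(&group->mutex);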

Applied to will (for-joerg/arm-smmu/updates), thanks!

[1/2] iommu: Fix race condition during default domain allocation
  https://git.kernel.org/will/c/211ff31b3d33
[2/2] iommu/arm-smmu: Fix race condition during iommu_group creation
  https://git.kernel.org/will/c/b1a1347912a7

Cheers,
-- 
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev


Re: [PATCHv2] iommu/arm-smmu: Add clk_bulk_{prepare/unprepare} to system pm callbacks

2021-08-10 Thread Will Deacon
On Tue, 10 Aug 2021 12:18:08 +0530, Sai Prakash Ranjan wrote:
> Some clocks for SMMU can have parent as XO such as gpu_cc_hub_cx_int_clk
> of GPU SMMU in QTI SC7280 SoC and in order to enter deep sleep states in
> such cases, we would need to drop the XO clock vote in unprepare call and
> this unprepare callback for XO is in RPMh (Resource Power Manager-Hardened)
> clock driver which controls RPMh managed clock resources for new QTI SoCs.
> 
> Given we cannot have a sleeping calls such as clk_bulk_prepare() and
> clk_bulk_unprepare() in arm-smmu runtime pm callbacks since the iommu
> operations like map and unmap can be in atomic context and are in fast
> path, add this prepare and unprepare call to drop the XO vote only for
> system pm callbacks since it is not a fast path and we expect the system
> to enter deep sleep states with system pm as opposed to runtime pm.
> 
> [...]
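
In outline, the change looks something like this (a sketch based on the
commit message, with smmu->clks/num_clks assumed from the driver; not the
applied diff):

	static int __maybe_unused arm_smmu_pm_suspend(struct device *dev)
	{
		struct arm_smmu_device *smmu = dev_get_drvdata(dev);
		int ret;

		ret = pm_runtime_force_suspend(dev);
		if (ret)
			return ret;

		/* Sleeping is allowed here, so the XO vote can be dropped. */
		clk_bulk_unprepare(smmu->num_clks, smmu->clks);
		return 0;
	}

	static int __maybe_unused arm_smmu_pm_resume(struct device *dev)
	{
		struct arm_smmu_device *smmu = dev_get_drvdata(dev);
		int ret;

		ret = clk_bulk_prepare(smmu->num_clks, smmu->clks);
		if (ret)
			return ret;

		ret = pm_runtime_force_resume(dev);
		if (ret)
			clk_bulk_unprepare(smmu->num_clks, smmu->clks);

		return ret;
	}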

Applied to will (for-joerg/arm-smmu/updates), thanks!

[1/1] iommu/arm-smmu: Add clk_bulk_{prepare/unprepare} to system pm callbacks
  https://git.kernel.org/will/c/afefe67e0893

Cheers,
-- 
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev


Re: [PATCH RFC 2/8] iommu/arm-smmu-v3: Add and use static helper function arm_smmu_cmdq_issue_cmd_with_sync()

2021-08-10 Thread Leizhen (ThunderTown)



On 2021/8/11 2:24, Will Deacon wrote:
> On Sat, Jun 26, 2021 at 07:01:24PM +0800, Zhen Lei wrote:
>> The obvious key to the performance optimization of commit 587e6c10a7ce
>> ("iommu/arm-smmu-v3: Reduce contention during command-queue insertion") is
>> to allow multiple cores to insert commands in parallel after a brief mutex
>> contention.
>>
>> Obviously, inserting as many commands at a time as possible reduces the
>> number of times cores take part in the mutex contention, thereby improving
>> the overall performance. At the very least, it reduces the number of calls
>> to arm_smmu_cmdq_issue_cmdlist().
>>
>> Therefore, function arm_smmu_cmdq_issue_cmd_with_sync() is added to insert
>> the 'cmd+sync' commands in a single call.
>>
>> Signed-off-by: Zhen Lei 
>> ---
>>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 33 +
>>  1 file changed, 21 insertions(+), 12 deletions(-)
>>
>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> index 2433d3c29b49ff2..a5361153ca1d6a4 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> @@ -858,11 +858,25 @@ static int arm_smmu_cmdq_issue_cmd(struct arm_smmu_device *smmu,
>>  return arm_smmu_cmdq_issue_cmdlist(smmu, cmd, 1, false);
>>  }
>>  
>> -static int arm_smmu_cmdq_issue_sync(struct arm_smmu_device *smmu)
>> +static int __maybe_unused arm_smmu_cmdq_issue_sync(struct arm_smmu_device *smmu)
>>  {
>>  return arm_smmu_cmdq_issue_cmdlist(smmu, NULL, 0, true);
>>  }
>>  
>> +static int arm_smmu_cmdq_issue_cmd_with_sync(struct arm_smmu_device *smmu,
>> + struct arm_smmu_cmdq_ent *ent)
>> +{
>> +u64 cmd[CMDQ_ENT_DWORDS];
>> +
>> +if (arm_smmu_cmdq_build_cmd(cmd, ent)) {
>> +dev_warn(smmu->dev, "ignoring unknown CMDQ opcode 0x%x\n",
>> + ent->opcode);
>> +return -EINVAL;
>> +}
>> +
>> +return arm_smmu_cmdq_issue_cmdlist(smmu, cmd, 1, true);
>> +}
> 
> This function is almost identical to arm_smmu_cmdq_issue_cmd(). How about
> moving the guts out into a helper:
> 
>   static int __arm_smmu_cmdq_issue_cmd(struct arm_smmu_device *smmu,
>struct arm_smmu_cmdq_ent *ent,
>bool sync);
> 
> and then having arm_smmu_cmdq_issue_cmd_with_sync() and
> arm_smmu_cmdq_issue_cmd() wrap that?

OK, I will do it.

How about removing arm_smmu_cmdq_issue_sync()? It's unused now.

> 
> Will
> .
> 


Re: [PATCH RFC 0/2] dma-pool: allow user to disable atomic pool

2021-08-10 Thread Baoquan He
On 08/10/21 at 03:52pm, Tom Lendacky wrote:
> On 8/5/21 1:54 AM, Baoquan He wrote:
> > On 06/24/21 at 11:47am, Robin Murphy wrote:
> >> On 2021-06-24 10:29, Baoquan He wrote:
> >>> On 06/24/21 at 08:40am, Christoph Hellwig wrote:
>  So reduce the amount allocated.  But the pool is needed for proper
>  operation on systems with memory encryption.  And please add the right
>  maintainer or at least mailing list for the code you're touching next
>  time.
> >>>
> >>> Oh, I thought it was a memory issue only; I should have run
> >>> ./scripts/get_maintainer.pl. Sorry.
> >>>
> >>> About reducing the amount allocated, it may not help. Because on x86_64,
> >>> the kdump kernel doesn't put any page of memory into the buddy allocator
> >>> of the DMA zone, it will definitely OOM during atomic_pool_dma
> >>> initialization.
> >>>
> >>> I am wondering in which case, or on which device, the atomic pool is
> >>> needed on an AMD system with memory encryption enabled. As we can see,
> >>> the OOM will happen in the kdump kernel on Intel systems too, even
> >>> though it's not necessary there.
> > 
> > Sorry for very late response, and thank both for your comments.
> > 
> >>
> >> Hmm, I think the Kconfig reshuffle has actually left a slight wrinkle here.
> >> For DMA_DIRECT_REMAP=y we can assume an atomic pool is always needed, since
> >> that was the original behaviour anyway. However the implications of
> >> AMD_MEM_ENCRYPT=y are different - even if support is enabled, it still
> >> should only be relevant if mem_encrypt_active(), so it probably does make
> >> sense to have an additional runtime gate on that.
> > 
> >>
> >> From a quick scan, use of dma_alloc_from_pool() already depends on
> >> force_dma_unencrypted() so that's probably fine already, but I think we'd
> >> need a bit of extra protection around dma_free_from_pool() to prevent
> >> gen_pool_has_addr() dereferencing NULL if the pools are uninitialised, even
> >> with your proposed patch as it is. Presumably nothing actually called
> >> dma_direct_free() when you tested this?
> > 
> > Yes, enforcing the conditional check of force_dma_unencrypted() around
> > dma_free_from_pool sounds reasonable, just as we have done in
> > dma_alloc_from_pool().
> > 
> > I have tested this patchset on normal x86_64 systems and one AMD system
> > with SME support. Disabling the atomic pool can fix the issue that there
> > are no managed pages in the DMA zone, so requesting a page from the DMA
> > zone will cause an allocation failure. And even disabling the atomic pool
> > in the 1st kernel didn't cause any problem on one AMD EPYC system which
> > supports SME. I am not an expert in the DMA area, and I am wondering what
> > the atomic pool is supposed to do in an SME/SEV system.
> 
> I think the atomic pool is used by the NVMe driver. My understanding is
> that the driver will do a dma_alloc_coherent() from interrupt context, so it
> needs to use GFP_ATOMIC. The pool was created because dma_alloc_coherent()
> would perform a set_memory_decrypted() call, which can sleep. The pool
> eliminates that issue (David can correct me if I got that wrong).

Thanks for the information.

From the code comment around where the atomic pool is requested, on AMD
systems it's used to satisfy memory decryption that can't block. Combined
with what you said, is it that the NVMe driver needs decrypted memory on
AMD systems?

void *dma_direct_alloc(struct device *dev, size_t size,
dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs)
{
..
/*
 * Remapping or decrypting memory may block. If either is required and
 * we can't block, allocate the memory from the atomic pools.
 */
	if (IS_ENABLED(CONFIG_DMA_COHERENT_POOL) &&
	    !gfpflags_allow_blocking(gfp) &&
	    (force_dma_unencrypted(dev) ||
	     (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && !dev_is_dma_coherent(dev))))
		return dma_direct_alloc_from_pool(dev, size, dma_handle, gfp);
..
}

Looking at the related commits, the one below from David says that the
atomic DMA pool is used when a device requires non-blocking and
unencrypted buffers. The system I borrowed is an AMD EPYC with SME
enabled, and it has many PCI devices; its 'lspci' output has 113 lines.
Yet disabling the three atomic pools didn't trigger any error on that
AMD system. Does that mean only specific devices need this atomic pool
in the SME/SEV case? Should we add more details in the documentation or
code comments to make this clear?

commit 82fef0ad811fb5976cf36ccc3d2c3bc0195dfb72
Author: David Rientjes 
Date:   Tue Apr 14 17:05:01 2020 -0700

x86/mm: unencrypted non-blocking DMA allocations use coherent pools

When CONFIG_AMD_MEM_ENCRYPT is enabled and a device requires unencrypted
DMA, all non-blocking allocations must originate from the atomic DMA
coherent pools.

Select CONFIG_DMA_COHERENT_POOL for CONFIG_AMD_MEM_ENCRYPT.

Signed-off-by: David Rientjes 
Signed-off-by: Christoph 

Re: [PATCH RFC 0/8] iommu/arm-smmu-v3: add support for ECMDQ register mode

2021-08-10 Thread Leizhen (ThunderTown)



On 2021/8/11 2:35, Will Deacon wrote:
> On Sat, Jun 26, 2021 at 07:01:22PM +0800, Zhen Lei wrote:
>> SMMU v3.3 added a new feature, the Enhanced Command Queue interface, for
>> reducing contention when submitting commands to the SMMU. In this patch
>> set, ECMDQ is the abbreviation for Enhanced Command Queue.
>>
>> When the hardware supports ECMDQ and each core can exclusively use one ECMDQ,
>> each core does not need to compete with other cores when using its own ECMDQ.
>> This means that each core can insert commands in parallel. If each ECMDQ can
>> execute commands in parallel, the overall performance may be better. However,
>> our hardware currently does not support multiple ECMDQs executing commands
>> in parallel.
>>
>> In order to reuse existing code, I originally still call
>> arm_smmu_cmdq_issue_cmdlist() to insert commands. Even so, there was a
>> performance improvement of nearly 12% in strict mode.
>>
>> The test environment is the EMU, which simulates the connection of a
>> 200 Gbit/s NIC.
>>
>> Number of queues:  passthrough   lazy   strict(ECMDQ)   strict(CMDQ)
>>         6              188        180        162            145      --> 11.7% improvement
>>         8              188        188        184            183      --> 0.55% improvement
> 
> Sorry, I don't quite follow the numbers here. Why does the number of queues
> affect the classic "CMDQ" mode? We only have one queue there, right?

These queues indicate the network concurrency; maybe I should have said
channels or threads. 6 means six threads are deployed on different cores,
each using its own channel to send and receive network packets.

> 
>> In recent days, I implemented a new function that does not compete with
>> other cores, to replace arm_smmu_cmdq_issue_cmdlist() when a core can have
>> its own ECMDQ. I'm guessing it might get better performance results.
>> Because the EMU is too slow, it will take a while before the relevant data
>> is available.
> 
> I'd certainly prefer to wait until we have something we know is
> representative. 

Yes, it would be better to have an actual set of performance data. The EMU
is currently being used to analyze hardware problems, so this test has not
been scheduled yet.

> However, I can take the first four prep patches now if you
> respin the second one. At least that's then less for you to carry.

Great. Thank you. I will respin the second one.

> 
> I'd also like review from the Arm side on this (and thank you for adopting
> the architecture unlike others seem to have done judging by the patches
> floating around).
> 
> Will
> .
> 


[PATCH v3 0/5] Fixes for dma-iommu swiotlb bounce buffers

2021-08-10 Thread David Stevens
From: David Stevens 

This patch set includes various fixes for dma-iommu's swiotlb bounce
buffers for untrusted devices. There are three fixes for correctness
issues, one performance issue, and one general cleanup.

The min_align_mask issue was found when running fio on an untrusted nvme
device with bs=512. The other issues were found via code inspection, so
I don't have any specific use cases where things were not working, nor
any concrete performance numbers.

v2 -> v3:
 - Add new patch to address min_align_mask bug
 - Set SKIP_CPU_SYNC flag after syncing in map/unmap
 - Properly call arch_sync_dma_for_cpu in iommu_dma_sync_sg_for_cpu

v1 -> v2:
 - Split fixes into dedicated patches
 - Less invasive changes to fix arch_sync when mapping
 - Leave dev_is_untrusted check for strict iommu

David Stevens (5):
  dma-iommu: fix sync_sg with swiotlb
  dma-iommu: fix arch_sync_dma for map
  dma-iommu: add SKIP_CPU_SYNC after syncing
  dma-iommu: Check CONFIG_SWIOTLB more broadly
  dma-iommu: account for min_align_mask

 drivers/iommu/dma-iommu.c | 97 +--
 1 file changed, 53 insertions(+), 44 deletions(-)

-- 
2.32.0.605.g8dce9f2422-goog



[PATCH v3 1/5] dma-iommu: fix sync_sg with swiotlb

2021-08-10 Thread David Stevens
From: David Stevens 

The is_swiotlb_buffer function takes the physical address of the swiotlb
buffer, not the physical address of the original buffer. The sglist
contains the physical addresses of the original buffer, so for the
sync_sg functions to work properly when a bounce buffer might have been
used, we need to use iommu_iova_to_phys to look up the physical address.
This is what sync_single does, so call that function on each sglist
segment.

The previous code mostly worked because swiotlb does the transfer on map
and unmap. However, any callers which use DMA_ATTR_SKIP_CPU_SYNC with
sglists or which call sync_sg would not have had anything copied to the
bounce buffer.

Fixes: 82612d66d51d ("iommu: Allow the dma-iommu api to use bounce buffers")
Signed-off-by: David Stevens 
---
 drivers/iommu/dma-iommu.c | 27 +--
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 98ba927aee1a..54e103b989d9 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -813,14 +813,13 @@ static void iommu_dma_sync_sg_for_cpu(struct device *dev,
if (dev_is_dma_coherent(dev) && !dev_is_untrusted(dev))
return;
 
-   for_each_sg(sgl, sg, nelems, i) {
-   if (!dev_is_dma_coherent(dev))
+   if (dev_is_untrusted(dev))
+   for_each_sg(sgl, sg, nelems, i)
+   iommu_dma_sync_single_for_cpu(dev, sg_dma_address(sg),
+ sg->length, dir);
+   else
+   for_each_sg(sgl, sg, nelems, i)
arch_sync_dma_for_cpu(sg_phys(sg), sg->length, dir);
-
-   if (is_swiotlb_buffer(sg_phys(sg)))
-   swiotlb_sync_single_for_cpu(dev, sg_phys(sg),
-   sg->length, dir);
-   }
 }
 
 static void iommu_dma_sync_sg_for_device(struct device *dev,
@@ -833,14 +832,14 @@ static void iommu_dma_sync_sg_for_device(struct device *dev,
if (dev_is_dma_coherent(dev) && !dev_is_untrusted(dev))
return;
 
-   for_each_sg(sgl, sg, nelems, i) {
-   if (is_swiotlb_buffer(sg_phys(sg)))
-   swiotlb_sync_single_for_device(dev, sg_phys(sg),
-  sg->length, dir);
-
-   if (!dev_is_dma_coherent(dev))
+   if (dev_is_untrusted(dev))
+   for_each_sg(sgl, sg, nelems, i)
+   iommu_dma_sync_single_for_device(dev,
+sg_dma_address(sg),
+sg->length, dir);
+   else
+   for_each_sg(sgl, sg, nelems, i)
arch_sync_dma_for_device(sg_phys(sg), sg->length, dir);
-   }
 }
 
 static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
-- 
2.32.0.605.g8dce9f2422-goog



[PATCH v3 4/5] dma-iommu: Check CONFIG_SWIOTLB more broadly

2021-08-10 Thread David Stevens
From: David Stevens 

Introduce a new dev_use_swiotlb function to guard swiotlb code, instead
of overloading dev_is_untrusted. This allows CONFIG_SWIOTLB to be
checked more broadly, so the swiotlb related code can be removed more
aggressively.

Signed-off-by: David Stevens 
---
 drivers/iommu/dma-iommu.c | 24 ++--
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index be0214b1455c..89b689bf801f 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -317,6 +317,11 @@ static bool dev_is_untrusted(struct device *dev)
return dev_is_pci(dev) && to_pci_dev(dev)->untrusted;
 }
 
+static bool dev_use_swiotlb(struct device *dev)
+{
+   return IS_ENABLED(CONFIG_SWIOTLB) && dev_is_untrusted(dev);
+}
+
 /**
  * iommu_dma_init_domain - Initialise a DMA mapping domain
  * @domain: IOMMU domain previously prepared by iommu_get_dma_cookie()
@@ -553,8 +558,7 @@ static dma_addr_t __iommu_dma_map_swiotlb(struct device *dev, phys_addr_t phys,
 * If both the physical buffer start address and size are
 * page aligned, we don't need to use a bounce page.
 */
-   if (IS_ENABLED(CONFIG_SWIOTLB) && dev_is_untrusted(dev) &&
-   iova_offset(iovad, phys | org_size)) {
+   if (dev_use_swiotlb(dev) && iova_offset(iovad, phys | org_size)) {
aligned_size = iova_align(iovad, org_size);
phys = swiotlb_tbl_map_single(dev, phys, org_size,
  aligned_size, dir, attrs);
@@ -779,7 +783,7 @@ static void iommu_dma_sync_single_for_cpu(struct device *dev,
 {
phys_addr_t phys;
 
-   if (dev_is_dma_coherent(dev) && !dev_is_untrusted(dev))
+   if (dev_is_dma_coherent(dev) && !dev_use_swiotlb(dev))
return;
 
phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dma_handle);
@@ -795,7 +799,7 @@ static void iommu_dma_sync_single_for_device(struct device *dev,
 {
phys_addr_t phys;
 
-   if (dev_is_dma_coherent(dev) && !dev_is_untrusted(dev))
+   if (dev_is_dma_coherent(dev) && !dev_use_swiotlb(dev))
return;
 
phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dma_handle);
@@ -813,10 +817,10 @@ static void iommu_dma_sync_sg_for_cpu(struct device *dev,
struct scatterlist *sg;
int i;
 
-   if (dev_is_dma_coherent(dev) && !dev_is_untrusted(dev))
+   if (dev_is_dma_coherent(dev) && !dev_use_swiotlb(dev))
return;
 
-   if (dev_is_untrusted(dev))
+   if (dev_use_swiotlb(dev))
for_each_sg(sgl, sg, nelems, i)
iommu_dma_sync_single_for_cpu(dev, sg_dma_address(sg),
  sg->length, dir);
@@ -832,10 +836,10 @@ static void iommu_dma_sync_sg_for_device(struct device *dev,
struct scatterlist *sg;
int i;
 
-   if (dev_is_dma_coherent(dev) && !dev_is_untrusted(dev))
+   if (dev_is_dma_coherent(dev) && !dev_use_swiotlb(dev))
return;
 
-   if (dev_is_untrusted(dev))
+   if (dev_use_swiotlb(dev))
for_each_sg(sgl, sg, nelems, i)
iommu_dma_sync_single_for_device(dev,
 sg_dma_address(sg),
@@ -999,7 +1003,7 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
iommu_deferred_attach(dev, domain))
return 0;
 
-   if (dev_is_untrusted(dev))
+   if (dev_use_swiotlb(dev))
return iommu_dma_map_sg_swiotlb(dev, sg, nents, dir, attrs);
 
if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) {
@@ -1078,7 +1082,7 @@ static void iommu_dma_unmap_sg(struct device *dev, struct scatterlist *sg,
attrs |= DMA_ATTR_SKIP_CPU_SYNC;
}
 
-   if (dev_is_untrusted(dev)) {
+   if (dev_use_swiotlb(dev)) {
iommu_dma_unmap_sg_swiotlb(dev, sg, nents, dir, attrs);
return;
}
-- 
2.32.0.605.g8dce9f2422-goog



[PATCH v3 5/5] dma-iommu: account for min_align_mask

2021-08-10 Thread David Stevens
From: David Stevens 

For devices which set min_align_mask, swiotlb preserves the offset of
the original physical address within that mask. Since __iommu_dma_map
accounts for non-aligned addresses, passing a non-aligned swiotlb
address with the swiotlb aligned size results in the offset being
accounted for twice in the size passed to iommu_map_atomic. The extra
page exposed to DMA is also not cleaned up by __iommu_dma_unmap, since
that function unmaps with the correct size. This causes mapping failures
if the iova gets reused, due to collisions in the iommu page tables.

To fix this, pass the original size to __iommu_dma_map, since that
function already handles alignment.

Additionally, when swiotlb returns non-aligned addresses, there is
padding at the start of the bounce buffer that needs to be cleared.
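
A worked example with made-up numbers (assuming a 4 KiB IOVA granule):

	/*
	 * min_align_mask = 0xfff, org_size = 0x200, and the original buffer
	 * starts at offset 0x678 within a page.  swiotlb returns a bounce
	 * address that preserves the 0x678 offset, with aligned_size = 0x1000.
	 * Mapping with aligned_size makes __iommu_dma_map() round up to
	 * iova_align(0x1000 + 0x678) = 0x2000, counting the offset twice and
	 * exposing an extra page; unmap only tears down
	 * iova_align(0x200 + 0x678) = 0x1000, leaving that extra page mapped.
	 */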

Fixes: 1f221a0d0dbf ("swiotlb: respect min_align_mask")
Signed-off-by: David Stevens 
---
 drivers/iommu/dma-iommu.c | 23 ---
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 89b689bf801f..ffa7e8ef5db4 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -549,9 +549,8 @@ static dma_addr_t __iommu_dma_map_swiotlb(struct device *dev, phys_addr_t phys,
struct iommu_domain *domain = iommu_get_dma_domain(dev);
struct iommu_dma_cookie *cookie = domain->iova_cookie;
	struct iova_domain *iovad = &cookie->iovad;
-   size_t aligned_size = org_size;
-   void *padding_start;
-   size_t padding_size;
+   void *tlb_start;
+   size_t aligned_size, iova_off, mapping_end_off;
dma_addr_t iova;
 
/*
@@ -566,24 +565,26 @@ static dma_addr_t __iommu_dma_map_swiotlb(struct device *dev, phys_addr_t phys,
if (phys == DMA_MAPPING_ERROR)
return DMA_MAPPING_ERROR;
 
-   /* Cleanup the padding area. */
-   padding_start = phys_to_virt(phys);
-   padding_size = aligned_size;
+   iova_off = iova_offset(iovad, phys);
+   tlb_start = phys_to_virt(phys - iova_off);
 
+   /* Cleanup the padding area. */
if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
(dir == DMA_TO_DEVICE ||
 dir == DMA_BIDIRECTIONAL)) {
-   padding_start += org_size;
-   padding_size -= org_size;
+   mapping_end_off = iova_off + org_size;
+   memset(tlb_start, 0, iova_off);
+   memset(tlb_start + mapping_end_off, 0,
+  aligned_size - mapping_end_off);
+   } else {
+   memset(tlb_start, 0, aligned_size);
}
-
-   memset(padding_start, 0, padding_size);
}
 
if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
arch_sync_dma_for_device(phys, org_size, dir);
 
-   iova = __iommu_dma_map(dev, phys, aligned_size, prot, dma_mask);
+   iova = __iommu_dma_map(dev, phys, org_size, prot, dma_mask);
if (iova == DMA_MAPPING_ERROR && is_swiotlb_buffer(phys))
swiotlb_tbl_unmap_single(dev, phys, org_size, dir, attrs);
return iova;
-- 
2.32.0.605.g8dce9f2422-goog



[PATCH v3 2/5] dma-iommu: fix arch_sync_dma for map

2021-08-10 Thread David Stevens
From: David Stevens 

When calling arch_sync_dma, we need to pass it the memory that's
actually being used for dma. When using swiotlb bounce buffers, this is
the bounce buffer. Move arch_sync_dma into the __iommu_dma_map_swiotlb
helper, so it can use the bounce buffer address if necessary. This also
means it is no longer necessary to call iommu_dma_sync_sg_for_device in
iommu_dma_map_sg for untrusted devices.

Fixes: 82612d66d51d ("iommu: Allow the dma-iommu api to use bounce buffers")
Signed-off-by: David Stevens 
---
 drivers/iommu/dma-iommu.c | 16 +++-
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 54e103b989d9..4f0cc4a0a61f 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -576,6 +576,9 @@ static dma_addr_t __iommu_dma_map_swiotlb(struct device *dev, phys_addr_t phys,
memset(padding_start, 0, padding_size);
}
 
+   if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+   arch_sync_dma_for_device(phys, org_size, dir);
+
iova = __iommu_dma_map(dev, phys, aligned_size, prot, dma_mask);
if (iova == DMA_MAPPING_ERROR && is_swiotlb_buffer(phys))
swiotlb_tbl_unmap_single(dev, phys, org_size, dir, attrs);
@@ -848,14 +851,9 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
 {
phys_addr_t phys = page_to_phys(page) + offset;
bool coherent = dev_is_dma_coherent(dev);
-   dma_addr_t dma_handle;
 
-   dma_handle = __iommu_dma_map_swiotlb(dev, phys, size, dma_get_mask(dev),
+   return __iommu_dma_map_swiotlb(dev, phys, size, dma_get_mask(dev),
coherent, dir, attrs);
-   if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
-   dma_handle != DMA_MAPPING_ERROR)
-   arch_sync_dma_for_device(phys, size, dir);
-   return dma_handle;
 }
 
 static void iommu_dma_unmap_page(struct device *dev, dma_addr_t dma_handle,
@@ -998,12 +996,12 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
iommu_deferred_attach(dev, domain))
return 0;
 
-   if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
-   iommu_dma_sync_sg_for_device(dev, sg, nents, dir);
-
if (dev_is_untrusted(dev))
return iommu_dma_map_sg_swiotlb(dev, sg, nents, dir, attrs);
 
+   if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+   iommu_dma_sync_sg_for_device(dev, sg, nents, dir);
+
/*
 * Work out how much IOVA space we need, and align the segments to
 * IOVA granules for the IOMMU driver to handle. With some clever
-- 
2.32.0.605.g8dce9f2422-goog



[PATCH v3 3/5] dma-iommu: add SKIP_CPU_SYNC after syncing

2021-08-10 Thread David Stevens
From: David Stevens 

After syncing in map/unmap, add the DMA_ATTR_SKIP_CPU_SYNC flag so
anything that uses attrs later on will skip any sync work that has
already been completed. In particular, this skips copying from the
swiotlb twice during unmap.

Signed-off-by: David Stevens 
---
 drivers/iommu/dma-iommu.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 4f0cc4a0a61f..be0214b1455c 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -859,8 +859,11 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
 static void iommu_dma_unmap_page(struct device *dev, dma_addr_t dma_handle,
size_t size, enum dma_data_direction dir, unsigned long attrs)
 {
-   if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+   if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) {
iommu_dma_sync_single_for_cpu(dev, dma_handle, size, dir);
+   attrs |= DMA_ATTR_SKIP_CPU_SYNC;
+   }
+
__iommu_dma_unmap_swiotlb(dev, dma_handle, size, dir, attrs);
 }
 
@@ -999,8 +1002,10 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
if (dev_is_untrusted(dev))
return iommu_dma_map_sg_swiotlb(dev, sg, nents, dir, attrs);
 
-   if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+   if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) {
iommu_dma_sync_sg_for_device(dev, sg, nents, dir);
+   attrs |= DMA_ATTR_SKIP_CPU_SYNC;
+   }
 
/*
 * Work out how much IOVA space we need, and align the segments to
@@ -1068,8 +1073,10 @@ static void iommu_dma_unmap_sg(struct device *dev, struct scatterlist *sg,
struct scatterlist *tmp;
int i;
 
-   if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+   if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) {
iommu_dma_sync_sg_for_cpu(dev, sg, nents, dir);
+   attrs |= DMA_ATTR_SKIP_CPU_SYNC;
+   }
 
if (dev_is_untrusted(dev)) {
iommu_dma_unmap_sg_swiotlb(dev, sg, nents, dir, attrs);
-- 
2.32.0.605.g8dce9f2422-goog



Re: [PATCH RFC 0/2] dma-pool: allow user to disable atomic pool

2021-08-10 Thread Christoph Hellwig
On Tue, Aug 10, 2021 at 03:52:25PM -0500, Tom Lendacky via iommu wrote:
> I think the atomic pool is used by the NVMe driver. My understanding is
> that the driver will do a dma_alloc_coherent() from interrupt context, so it
> needs to use GFP_ATOMIC. The pool was created because dma_alloc_coherent()
> would perform a set_memory_decrypted() call, which can sleep. The pool
> eliminates that issue (David can correct me if I got that wrong).

Not just the NVMe driver.  We have plenty of drivers doing that; just
do a quick grep for dma_alloc_*, dma_pool_alloc, or dma_pool_zalloc with
GFP_ATOMIC (and that won't even find multi-line strings).
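
e.g. the dma_pool flavour of the same pattern (hypothetical names):

	/* Non-blocking path: the pool may grow via a GFP_ATOMIC coherent
	 * allocation, which again relies on the atomic pool when the memory
	 * must be unencrypted. */
	desc = dma_pool_zalloc(desc_pool, GFP_ATOMIC, &desc_dma);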


Re: [PATCH v3 1/5] dma-iommu: fix sync_sg with swiotlb

2021-08-10 Thread Christoph Hellwig
On Wed, Aug 11, 2021 at 11:42:43AM +0900, David Stevens wrote:
> From: David Stevens 
> 
> The is_swiotlb_buffer function takes the physical address of the swiotlb
> buffer, not the physical address of the original buffer. The sglist
> contains the physical addresses of the original buffer, so for the
> sync_sg functions to work properly when a bounce buffer might have been
> used, we need to use iommu_iova_to_phys to look up the physical address.
> This is what sync_single does, so call that function on each sglist
> segment.
> 
> The previous code mostly worked because swiotlb does the transfer on map
> and unmap. However, any callers which use DMA_ATTR_SKIP_CPU_SYNC with
> sglists or which call sync_sg would not have had anything copied to the
> bounce buffer.
> 
> Fixes: 82612d66d51d ("iommu: Allow the dma-iommu api to use bounce buffers")
> Signed-off-by: David Stevens 
> ---
>  drivers/iommu/dma-iommu.c | 27 +--
>  1 file changed, 13 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 98ba927aee1a..54e103b989d9 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -813,14 +813,13 @@ static void iommu_dma_sync_sg_for_cpu(struct device *dev,
>   if (dev_is_dma_coherent(dev) && !dev_is_untrusted(dev))
>   return;
>  
> + if (dev_is_untrusted(dev))
> + for_each_sg(sgl, sg, nelems, i)
> + iommu_dma_sync_single_for_cpu(dev, sg_dma_address(sg),
> +   sg->length, dir);
> + else
> + for_each_sg(sgl, sg, nelems, i)
>   arch_sync_dma_for_cpu(sg_phys(sg), sg->length, dir);
>  }

I'd remove the above check and fold the if (!dev_is_dma_coherent(dev))
into the else line.  Same for iommu_dma_sync_sg_for_device.