Re: [PATCH] iommu/arm-smmu: Add clk_bulk_{prepare/unprepare} to system pm callbacks

2021-08-02 Thread Sai Prakash Ranjan

On 2021-08-02 21:42, Will Deacon wrote:

On Tue, Jul 27, 2021 at 03:03:22PM +0530, Sai Prakash Ranjan wrote:
Some clocks for SMMU can have parent as XO such as gpu_cc_hub_cx_int_clk
of GPU SMMU in QTI SC7280 SoC and in order to enter deep sleep states in
such cases, we would need to drop the XO clock vote in unprepare call and
this unprepare callback for XO is in RPMh (Resource Power Manager-Hardened)
clock driver which controls RPMh managed clock resources for new QTI SoCs
and is a blocking call.

Given we cannot have sleeping calls such as clk_bulk_prepare() and
clk_bulk_unprepare() in arm-smmu runtime pm callbacks since the iommu
operations like map and unmap can be in atomic context and are in the fast
path, add this prepare and unprepare call to drop the XO vote only for
system pm callbacks since it is not a fast path and we expect the system
to enter deep sleep states with system pm as opposed to runtime pm.

This is a similar sequence of clock requests (prepare,enable and
disable,unprepare) as in arm-smmu probe and remove.

Signed-off-by: Sai Prakash Ranjan 
Co-developed-by: Rajendra Nayak 
Signed-off-by: Rajendra Nayak 
---
 drivers/iommu/arm/arm-smmu/arm-smmu.c | 20 ++--
 1 file changed, 18 insertions(+), 2 deletions(-)


[+Rob]

How does this work with that funny GPU which writes to the SMMU registers
directly? Does the SMMU need to remain independently clocked for that to
work or is it all in the same clock domain?



As Rob mentioned, device link should take care of all the dependencies
between the SMMU and its consumers. But I'm not sure how the question
relates to this patch, as this change is for system pm and not runtime pm:
it follows exactly the sequence of SMMU probe/remove, so if that currently
works for that GPU SMMU, then it should work just fine for system suspend
and resume as well.

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c

index d3c6f54110a5..9561ba4c5d39 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -2277,6 +2277,13 @@ static int __maybe_unused arm_smmu_runtime_suspend(struct device *dev)


 static int __maybe_unused arm_smmu_pm_resume(struct device *dev)
 {
+   int ret;
+   struct arm_smmu_device *smmu = dev_get_drvdata(dev);
+
+   ret = clk_bulk_prepare(smmu->num_clks, smmu->clks);
+   if (ret)
+   return ret;
+
if (pm_runtime_suspended(dev))
return 0;


If we subsequently fail to enable the clks in arm_smmu_runtime_resume()
should we unprepare them again?

If we are unable to turn on the clks then it's fatal and we will not live
for long.


Thanks,
Sai
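
For reference, the error handling Will is asking about would look roughly
like this; a minimal sketch only, based on the hunk quoted above and the
existing arm_smmu_runtime_resume() call, not part of the posted patch:

static int __maybe_unused arm_smmu_pm_resume(struct device *dev)
{
	int ret;
	struct arm_smmu_device *smmu = dev_get_drvdata(dev);

	ret = clk_bulk_prepare(smmu->num_clks, smmu->clks);
	if (ret)
		return ret;

	if (pm_runtime_suspended(dev))
		return 0;

	/* Undo the prepare if the clocks cannot actually be enabled. */
	ret = arm_smmu_runtime_resume(dev);
	if (ret)
		clk_bulk_unprepare(smmu->num_clks, smmu->clks);

	return ret;
}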


Will

@@ -2285,10 +2292,19 @@ static int __maybe_unused arm_smmu_pm_resume(struct device *dev)


 static int __maybe_unused arm_smmu_pm_suspend(struct device *dev)
 {
+   int ret = 0;
+   struct arm_smmu_device *smmu = dev_get_drvdata(dev);
+
if (pm_runtime_suspended(dev))
-   return 0;
+   goto clk_unprepare;

-   return arm_smmu_runtime_suspend(dev);
+   ret = arm_smmu_runtime_suspend(dev);
+   if (ret)
+   return ret;
+
+clk_unprepare:
+   clk_bulk_unprepare(smmu->num_clks, smmu->clks);
+   return ret;
 }

 static const struct dev_pm_ops arm_smmu_pm_ops = {
--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation



--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation


RE: [PATCH] iommu/iova: kmemleak when disable SRIOV.

2021-08-02 Thread Zhou, Peng Ju via iommu
[AMD Official Use Only]

Hi Chris
I hit a kmemleak with your following patch. Can you help to fix it?

According to the info in this thread, it seems the patch wasn't merged into
the iommu mainline branch, but I can see your patch in my kernel: 5.11.0

 
commit 48a64dd561a53fb809e3f2210faf5dd442cfc56d
Author: Chris Wilson 
Date:   Sat Jan 16 11:10:35 2021 +

iommu/iova: Use bottom-up allocation for DMA32

Since DMA32 allocations are currently allocated top-down from the 4G
boundary, this causes fragmentation and reduces the maximum allocation
size. On some systems, the dma address space may be extremely
constrained by PCIe windows, requiring a stronger anti-fragmentation
strategy.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2929
Signed-off-by: Chris Wilson 


-- 
BW
Pengju Zhou





> -Original Message-
> From: Robin Murphy 
> Sent: Tuesday, July 27, 2021 10:23 PM
> To: Zhou, Peng Ju ; iommu@lists.linux-
> foundation.org
> Cc: Deucher, Alexander ; Wang, Yin
> ; w...@kernel.org
> Subject: Re: [PATCH] iommu/iova: kmemleak when disable SRIOV.
> 
> On 2021-07-27 15:05, Zhou, Peng Ju wrote:
> > [AMD Official Use Only]
> >
> > Hi Robin
> > The patch which add "head" :
> >
> > commit 48a64dd561a53fb809e3f2210faf5dd442cfc56d
> > Author: Chris Wilson 
> > Date:   Sat Jan 16 11:10:35 2021 +
> >
> >  iommu/iova: Use bottom-up allocation for DMA32
> >
> >  Since DMA32 allocations are currently allocated top-down from the 4G
> >  boundary, this causes fragmentation and reduces the maximum allocation
> >  size. On some systems, the dma address space may be extremely
> >  constrained by PCIe windows, requiring a stronger anti-fragmentation
> >  strategy.
> >
> >  Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2929
> >  Signed-off-by: Chris Wilson 
> 
> ...which is not in mainline. I've never even seen it posted for review.
> In fact two search engines can't seem to find any trace of that SHA or patch
> subject on the internet at all.
> 
> By all means tell Chris that his patch, wherever you got it from, is broken,
> but once again there's nothing the upstream maintainers/reviewers can do about
> code which isn't upstream.
> 
> Thanks,
> Robin.
> 
> > --
> > BW
> > Pengju Zhou
> >
> >
> >
> >> -Original Message-
> >> From: Robin Murphy 
> >> Sent: Tuesday, July 27, 2021 4:52 PM
> >> To: Zhou, Peng Ju ; iommu@lists.linux-
> >> foundation.org
> >> Cc: Deucher, Alexander ; Wang, Yin
> >> ; w...@kernel.org
> >> Subject: Re: [PATCH] iommu/iova: kmemleak when disable SRIOV.
> >>
> >> On 2021-07-27 05:46, Zhou, Peng Ju wrote:
> >>> [AMD Official Use Only]
> >>>
> >>> Hi Robin
> >>> 1. It is not a good manner to free a statically allocated object (in
> >>> this case, iovad->head) dynamically, even though the free only occurs
> >>> when shutting down the OS in most cases.
> >>> 2. The kmemleak occurred when disabling SRIOV (removing a PCIe device).
> >>> I posted the log below; in it, the line "kmemleak: Found object by
> >>> alias at 0x9221ae647050" means the OS frees a non-existing object
> >>> (iovad->head) which was added to the RB_TREE in the function
> >>> init_iova_domain.
> >>
> >> Sure, it was apparent enough what the bug was; my point is that the
> >> bug does not exist in mainline. This is the current mainline
> >> definition of struct
> >> iova_domain:
> >>
> >>
> >> /* holds all the iova translations for a domain */
> >> struct iova_domain {
> >>	spinlock_t	iova_rbtree_lock; /* Lock to protect update of rbtree */
> >>	struct rb_root	rbroot;		/* iova domain rbtree root */
> >>	struct rb_node	*cached_node;	/* Save last alloced node */
> >>	struct rb_node	*cached32_node; /* Save last 32-bit alloced node */
> >>	unsigned long	granule;	/* pfn granularity for this domain */
> >>	unsigned long	start_pfn;	/* Lower limit for this domain */
> >>	unsigned long	dma_32bit_pfn;
> >>	unsigned long	max32_alloc_size; /* Size of last failed allocation */
> >>	struct iova_fq __percpu *fq;	/* Flush Queue */
> >>
> >>	atomic64_t	fq_flush_start_cnt;	/* Number of TLB flushes that
> >>						   have been started */
> >>
> >>	atomic64_t	fq_flush_finish_cnt;	/* Number of TLB flushes that
> >>						   have been finished */
> >
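
The quoted struct ends abruptly above. For readers who do not have the
out-of-tree patch in front of them, the leak pattern being described is
roughly the following; this is a hypothetical reconstruction for
illustration, not the actual patch:

/*
 * Hypothetical reconstruction -- not the actual out-of-tree patch.
 * If a struct iova ("head") is embedded inside struct iova_domain and
 * inserted into iovad->rbroot, the generic teardown loop below will
 * eventually pass a pointer to that embedded object to free_iova_mem(),
 * i.e. free something that was never allocated from the iova slab.
 * kmemleak then reports it as "Found object by alias".
 */
void put_iova_domain(struct iova_domain *iovad)
{
	struct iova *iova, *tmp;

	rbtree_postorder_for_each_entry_safe(iova, tmp, &iovad->rbroot, node)
		free_iova_mem(iova);	/* would also hit an embedded &iovad->head */
}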

Re: [PATCHv3] iommu/arm-smmu: Optimize ->tlb_flush_walk() for qcom implementation

2021-08-02 Thread Sai Prakash Ranjan

On 2021-08-02 21:13, Will Deacon wrote:

On Wed, Jun 23, 2021 at 07:12:01PM +0530, Sai Prakash Ranjan wrote:
Currently for iommu_unmap() of large scatter-gather list with page size
elements, the majority of time is spent in flushing of partial walks in
__arm_lpae_unmap() which is a VA based TLB invalidation invalidating
page-by-page on iommus like arm-smmu-v2 (TLBIVA).

For example: to unmap a 32MB scatter-gather list with page size elements
(8192 entries), there are 16->2MB buffer unmaps based on the pgsize (2MB
for 4K granule) and each of 2MB will further result in 512 TLBIVAs (2MB/4K)
resulting in a total of 8192 TLBIVAs (512*16) for 16->2MB causing a huge
overhead.

On qcom implementation, there are several performance improvements for
TLB cache invalidations in HW like wait-for-safe (for realtime clients
such as camera and display) and few others to allow for cache
lookups/updates when TLBI is in progress for the same context bank.
So the cost of over-invalidation is less compared to the unmap latency
on several usecases like camera which deals with large buffers. So,
ASID based TLB invalidations (TLBIASID) can be used to invalidate the
entire context for partial walk flush thereby improving the unmap
latency.

Non-strict mode can use this by default for all platforms given its
all about over-invalidation saving time on individual unmaps and
non-deterministic generally.

For this example of 32MB scatter-gather list unmap, this change results
in just 16 ASID based TLB invalidations (TLBIASIDs) as opposed to 8192
TLBIVAs thereby increasing the performance of unmaps drastically.

Test on QTI SM8150 SoC for 10 iterations of iommu_{map_sg}/unmap:
(average over 10 iterations)

Before this optimization:

size    iommu_map_sg     iommu_unmap
  4K       2.067 us        1.854 us
 64K       9.598 us        8.802 us
  1M     148.890 us      130.718 us
  2M     305.864 us       67.291 us
 12M    1793.604 us      390.838 us
 16M    2386.848 us      518.187 us
 24M    3563.296 us      775.989 us
 32M    4747.171 us     1033.364 us

After this optimization:

size    iommu_map_sg     iommu_unmap
  4K       1.723 us        1.765 us
 64K       9.880 us        8.869 us
  1M     155.364 us      135.223 us
  2M     303.906 us        5.385 us
 12M    1786.557 us       21.250 us
 16M    2391.890 us       27.437 us
 24M    3570.895 us       39.937 us
 32M    4755.234 us       51.797 us

This is further reduced once the map/unmap_pages() support gets in, which
will result in just 1 TLBIASID as compared to 16 TLBIASIDs.

Real world data also shows big difference in unmap performance as below:

There were reports of camera frame drops because of high overhead in
iommu unmap without this optimization because of frequent unmaps issued
by camera of about 100MB/s taking more than 100ms thereby causing frame
drops.

Signed-off-by: Sai Prakash Ranjan 
---

Changes in v3:
 * Move the logic to arm-smmu driver from io-pgtable (Robin)
 * Use a new set of iommu_flush_ops->arm_smmu_s1_tlb_impl_ops and use it for qcom impl


Changes in v2:
 * Add a quirk to choose tlb_flush_all in partial walk flush
 * Set the quirk for QTI SoC implementation

---
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 13 +
 drivers/iommu/arm/arm-smmu/arm-smmu.c  | 17 -
 2 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c

index 7771d40176de..218c71465819 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
@@ -10,6 +10,8 @@

 #include "arm-smmu.h"

+extern const struct iommu_flush_ops arm_smmu_s1_tlb_impl_ops;
+
 struct qcom_smmu {
struct arm_smmu_device smmu;
bool bypass_quirk;
@@ -146,6 +148,8 @@ static int qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain,

 {
struct adreno_smmu_priv *priv;

+   pgtbl_cfg->tlb = &arm_smmu_s1_tlb_impl_ops;
+
/* Only enable split pagetables for the GPU device (SID 0) */
if (!qcom_adreno_smmu_is_gpu_device(dev))
return 0;
@@ -185,6 +189,14 @@ static const struct of_device_id qcom_smmu_client_of_match[] __maybe_unused = {

{ }
 };

+static int qcom_smmu_init_context(struct arm_smmu_domain *smmu_domain,

+   struct io_pgtable_cfg *pgtbl_cfg, struct device *dev)
+{
+   pgtbl_cfg->tlb = &arm_smmu_s1_tlb_impl_ops;
+
+   return 0;
+}
+
 static int qcom_smmu_cfg_probe(struct arm_smmu_device *smmu)
 {
 	unsigned int last_s2cr = ARM_SMMU_GR0_S2CR(smmu->num_mapping_groups - 1);
@@ -308,6 +320,7 @@ static int qcom_smmu500_reset(struct arm_smmu_device *smmu)

 }

 static const struct arm_smmu_impl qcom_smmu_impl = {
+   .init_context = qcom_smmu_init_context,
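
The quoted diff is truncated here. For orientation, the arm-smmu.c side
that the changelog describes would look roughly like the sketch below,
reusing the existing ASID-based context invalidation as the partial-walk
flush; treat the names other than arm_smmu_s1_tlb_impl_ops as
approximations rather than the literal patch hunks:

/*
 * Sketch of the idea only: let the implementation-specific flush_ops
 * invalidate the whole context by ASID (a single TLBIASID) instead of
 * issuing per-page TLBIVA operations for a partial walk flush.
 */
static void arm_smmu_tlb_inv_walk_impl_s1(unsigned long iova, size_t size,
					  size_t granule, void *cookie)
{
	arm_smmu_tlb_inv_context_s1(cookie);
}

const struct iommu_flush_ops arm_smmu_s1_tlb_impl_ops = {
	.tlb_flush_all	= arm_smmu_tlb_inv_context_s1,
	.tlb_flush_walk	= arm_smmu_tlb_inv_walk_impl_s1,
	.tlb_add_page	= arm_smmu_tlb_add_page_s1,
};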
  

Re: [PATCH v2 11/11] memory: mtk-smi: mt8195: Add initial setting for smi-larb

2021-08-02 Thread Ikjoon Jang
Hi,

On Thu, Jul 29, 2021 at 2:41 PM Yong Wu  wrote:
>
> Hi Ikjoon,
>
> Just a ping.
>
> On Thu, 2021-07-22 at 14:38 +0800, Yong Wu wrote:
> > On Wed, 2021-07-21 at 21:40 +0800, Ikjoon Jang wrote:
> > > On Thu, Jul 15, 2021 at 8:23 PM Yong Wu  wrote:
> > > >
> > > > To improve the performance, we add some initial settings for smi larbs.
> > > > There are two parts:
> > > > 1) Each port has the special ostd (outstanding) value in each larb.
> > > > 2) Two general settings for each larb.

Honestly, I think nobody outside Mediatek will understand this.
Can you please update this to be more generic?
Like "Apply default bus settings for mt8195, without this, XXX
problems can happen.. "?

Or for example, adding brief descriptions on what
MTK_SMI_FLAG_LARB_THRT_EN, MTK_SMI_FLAG_LARB_SW_FLAG,
and MTK_SMI_FLAG_LARB_SW_FLAG[] are for would be better if it's available.

> > > >
> > > > In some SoCs, this setting may be changed dynamically for some special
> > > > cases like 4K, and this initial setting is enough in mt8195.
> > > >
> > > > Signed-off-by: Yong Wu 
> > > > ---
> > [...]
> > > >  struct mtk_smi {
> > > > @@ -213,12 +228,22 @@ static void 
> > > > mtk_smi_larb_config_port_mt8173(struct device *dev)
> > > >  static void mtk_smi_larb_config_port_gen2_general(struct device *dev)
> > > >  {
> > > > struct mtk_smi_larb *larb = dev_get_drvdata(dev);
> > > > -   u32 reg;
> > > > +   u32 reg, flags_general = larb->larb_gen->flags_general;
> > > > +   const u8 *larbostd = larb->larb_gen->ostd[larb->larbid];
> > > > int i;
> > > >
> > > > if (BIT(larb->larbid) & 
> > > > larb->larb_gen->larb_direct_to_common_mask)
> > > > return;
> > > >
> > > > +   if (MTK_SMI_CAPS(flags_general, MTK_SMI_FLAG_LARB_THRT_EN))
> > > > +   writel_relaxed(SMI_LARB_THRT_EN, larb->base + 
> > > > SMI_LARB_CMD_THRT_CON);
> > > > +
> > > > +   if (MTK_SMI_CAPS(flags_general, MTK_SMI_FLAG_LARB_SW_FLAG))
> > > > +   writel_relaxed(SMI_LARB_SW_FLAG_1, larb->base + 
> > > > SMI_LARB_SW_FLAG);
> > > > +
> > > > +   for (i = 0; i < SMI_LARB_PORT_NR_MAX && larbostd && 
> > > > !!larbostd[i]; i++)
> > > > +   writel_relaxed(larbostd[i], larb->base + 
> > > > SMI_LARB_OSTDL_PORTx(i));
> > >
> > > All other mtk platform's larbs have the same format for 
> > > SMI_LARB_OSTDL_PORTx()
> > > registers at the same offset? or is this unique feature for mt8195?
> >
> > All the other Platform's larbs have the same format at the same offset.
>
> In this case, Do you have some other further comment? If no, I will keep
> the current solution for this.

Sorry for the late response.
I have no further comments or objections here. Please go ahead with v3.
I just had no idea about the register definitions and wanted to be sure that
the newly added register definitions are common to all MTK platforms.

Thanks!

>
> Thanks.
>
> >
> > >
> > > > +
> > > > for_each_set_bit(i, (unsigned long *)larb->mmu, 32) {
> > > > reg = readl_relaxed(larb->base + 
> > > > SMI_LARB_NONSEC_CON(i));
> > > > reg |= F_MMU_EN;
> > > > @@ -227,6 +252,51 @@ static void 
> > > > mtk_smi_larb_config_port_gen2_general(struct device *dev)
> > > > }
> > > >  }
> > > >
> >
> > [...]
> >
>


RE: [RFC v2] /dev/iommu uAPI proposal

2021-08-02 Thread Tian, Kevin
> From: David Gibson 
> Sent: Tuesday, August 3, 2021 9:51 AM
> 
> On Wed, Jul 28, 2021 at 04:04:24AM +, Tian, Kevin wrote:
> > Hi, David,
> >
> > > From: David Gibson 
> > > Sent: Monday, July 26, 2021 12:51 PM
> > >
> > > On Fri, Jul 09, 2021 at 07:48:44AM +, Tian, Kevin wrote:
> > > > /dev/iommu provides an unified interface for managing I/O page tables
> for
> > > > devices assigned to userspace. Device passthrough frameworks (VFIO,
> > > vDPA,
> > > > etc.) are expected to use this interface instead of creating their own
> logic to
> > > > isolate untrusted device DMAs initiated by userspace.
> > > >
> > > > This proposal describes the uAPI of /dev/iommu and also sample
> > > sequences
> > > > with VFIO as example in typical usages. The driver-facing kernel API
> > > provided
> > > > by the iommu layer is still TBD, which can be discussed after consensus
> is
> > > > made on this uAPI.
> > > >
> > > > It's based on a lengthy discussion starting from here:
> > > > https://lore.kernel.org/linux-
> > > iommu/20210330132830.go2356...@nvidia.com/
> > > >
> > > > v1 can be found here:
> > > > https://lore.kernel.org/linux-
> > >
> iommu/PH0PR12MB54811863B392C644E5365446DC3E9@PH0PR12MB5481.n
> > > amprd12.prod.outlook.com/T/
> > > >
> > > > This doc is also tracked on github, though it's not very useful for 
> > > > v1->v2
> > > > given dramatic refactoring:
> > > > https://github.com/luxis1999/dev_iommu_uapi
> > >
> > > Thanks for all your work on this, Kevin.  Apart from the actual
> > > semantic improvements, I'm finding v2 significantly easier to read and
> > > understand than v1.
> > >
> > > [snip]
> > > > 1.2. Attach Device to I/O address space
> > > > +++
> > > >
> > > > Device attach/bind is initiated through passthrough framework uAPI.
> > > >
> > > > Device attaching is allowed only after a device is successfully bound to
> > > > the IOMMU fd. User should provide a device cookie when binding the
> > > > device through VFIO uAPI. This cookie is used when the user queries
> > > > device capability/format, issues per-device iotlb invalidation and
> > > > receives per-device I/O page fault data via IOMMU fd.
> > > >
> > > > Successful binding puts the device into a security context which 
> > > > isolates
> > > > its DMA from the rest system. VFIO should not allow user to access the
> > > > device before binding is completed. Similarly, VFIO should prevent the
> > > > user from unbinding the device before user access is withdrawn.
> > > >
> > > > When a device is in an iommu group which contains multiple devices,
> > > > all devices within the group must enter/exit the security context
> > > > together. Please check {1.3} for more info about group isolation via
> > > > this device-centric design.
> > > >
> > > > Successful attaching activates an I/O address space in the IOMMU,
> > > > if the device is not purely software mediated. VFIO must provide device
> > > > specific routing information for where to install the I/O page table in
> > > > the IOMMU for this device. VFIO must also guarantee that the attached
> > > > device is configured to compose DMAs with the routing information
> that
> > > > is provided in the attaching call. When handling DMA requests, IOMMU
> > > > identifies the target I/O address space according to the routing
> > > > information carried in the request. Misconfiguration breaks DMA
> > > > isolation thus could lead to severe security vulnerability.
> > > >
> > > > Routing information is per-device and bus specific. For PCI, it is
> > > > Requester ID (RID) identifying the device plus optional Process Address
> > > > Space ID (PASID). For ARM, it is Stream ID (SID) plus optional Sub-
> Stream
> > > > ID (SSID). PASID or SSID is used when multiple I/O address spaces are
> > > > enabled on a single device. For simplicity and continuity reason the
> > > > following context uses RID+PASID though SID+SSID may sound a clearer
> > > > naming from device p.o.v. We can decide the actual naming when
> coding.
> > > >
> > > > Because one I/O address space can be attached by multiple devices,
> > > > per-device routing information (plus device cookie) is tracked under
> > > > each IOASID and is used respectively when activating the I/O address
> > > > space in the IOMMU for each attached device.
> > > >
> > > > The device in the /dev/iommu context always refers to a physical one
> > > > (pdev) which is identifiable via RID. Physically each pdev can support
> > > > one default I/O address space (routed via RID) and optionally multiple
> > > > non-default I/O address spaces (via RID+PASID).
> > > >
> > > > The device in VFIO context is a logic concept, being either a physical
> > > > device (pdev) or mediated device (mdev or subdev). Each vfio device
> > > > is represented by RID+cookie in IOMMU fd. User is allowed to create
> > > > one default I/O address space (routed by vRID from user p.o.v) per
> > > > each vfio_device. 

Re: [RFC v2] /dev/iommu uAPI proposal

2021-08-02 Thread David Gibson
On Fri, Jul 30, 2021 at 11:51:23AM -0300, Jason Gunthorpe wrote:
> On Mon, Jul 26, 2021 at 02:50:48PM +1000, David Gibson wrote:
> 
> > That said, I'm still finding the various ways a device can attach to
> > an ioasid pretty confusing.  Here are some thoughts on some extra
> > concepts that might make it easier to handle [note, I haven't thought
> > this all the way through so far, so there might be fatal problems with
> > this approach].
> 
> I think you've summarized how I've been viewing this problem. All the
> concepts you pointed to should show through in the various APIs at the
> end, one way or another.
> 
> How much we need to expose to userspace, I don't know.
> 
> Does userspace need to care how the system labels traffic between DMA
> endpoint and the IOASID? At some point maybe yes since stuff like
> PASID does leak out in various spots

Yeah, I'm not sure.  I think it probably doesn't for the "main path"
of the API, though we might want to expose that for debugging and some
edge cases.

We *should* however be exposing the address type for each IOAS, since
that affects how your MAP operations will work, as well as what
endpoints are compatible with the IOAS.

> > /dev/iommu would work entirely (or nearly so) in terms of endpoint
> > handles, not device handles.  Endpoints are what get bound to an IOAS,
> > and endpoints are what get the user chosen endpoint cookie.
> 
> While an accurate modeling of groups, it feels like an
> overcomplication at this point in history where new HW largely doesn't
> need it.

So.. first, is that really true across the board?  I expect it's true
of high end server hardware, but for consumer level and embedded
hardware as well?  Then there's virtual hardware - I could point to
several things still routinely using emulated PCIe to PCI bridges in
qemu.

Second, we can't just ignore older hardware.

> The user interface VFIO and others presents is device
> centric, inserting a new endpoint object is going going back to some
> kind of group centric view of the world.

Well, kind of, yeah, because I still think the concept has value.
Part of the trouble is that "device" is pretty ambiguous.  "Device" in
terms of the PCI register interface is not the same as "device" in
terms of RID / DMA identifiability, which is not the same as "device"
in terms of Linux struct device.

> I'd rather deduce the endpoint from a collection of devices than the
> other way around...

Which I think is confusing, and in any case doesn't cover the case of
one "device" with multiple endpoints.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [RFC v2] /dev/iommu uAPI proposal

2021-08-02 Thread David Gibson
On Wed, Jul 28, 2021 at 04:04:24AM +, Tian, Kevin wrote:
> Hi, David,
> 
> > From: David Gibson 
> > Sent: Monday, July 26, 2021 12:51 PM
> > 
> > On Fri, Jul 09, 2021 at 07:48:44AM +, Tian, Kevin wrote:
> > > /dev/iommu provides an unified interface for managing I/O page tables for
> > > devices assigned to userspace. Device passthrough frameworks (VFIO,
> > vDPA,
> > > etc.) are expected to use this interface instead of creating their own 
> > > logic to
> > > isolate untrusted device DMAs initiated by userspace.
> > >
> > > This proposal describes the uAPI of /dev/iommu and also sample
> > sequences
> > > with VFIO as example in typical usages. The driver-facing kernel API
> > provided
> > > by the iommu layer is still TBD, which can be discussed after consensus is
> > > made on this uAPI.
> > >
> > > It's based on a lengthy discussion starting from here:
> > >   https://lore.kernel.org/linux-
> > iommu/20210330132830.go2356...@nvidia.com/
> > >
> > > v1 can be found here:
> > >   https://lore.kernel.org/linux-
> > iommu/PH0PR12MB54811863B392C644E5365446DC3E9@PH0PR12MB5481.n
> > amprd12.prod.outlook.com/T/
> > >
> > > This doc is also tracked on github, though it's not very useful for v1->v2
> > > given dramatic refactoring:
> > >   https://github.com/luxis1999/dev_iommu_uapi
> > 
> > Thanks for all your work on this, Kevin.  Apart from the actual
> > semantic improvements, I'm finding v2 significantly easier to read and
> > understand than v1.
> > 
> > [snip]
> > > 1.2. Attach Device to I/O address space
> > > +++
> > >
> > > Device attach/bind is initiated through passthrough framework uAPI.
> > >
> > > Device attaching is allowed only after a device is successfully bound to
> > > the IOMMU fd. User should provide a device cookie when binding the
> > > device through VFIO uAPI. This cookie is used when the user queries
> > > device capability/format, issues per-device iotlb invalidation and
> > > receives per-device I/O page fault data via IOMMU fd.
> > >
> > > Successful binding puts the device into a security context which isolates
> > > its DMA from the rest system. VFIO should not allow user to access the
> > > device before binding is completed. Similarly, VFIO should prevent the
> > > user from unbinding the device before user access is withdrawn.
> > >
> > > When a device is in an iommu group which contains multiple devices,
> > > all devices within the group must enter/exit the security context
> > > together. Please check {1.3} for more info about group isolation via
> > > this device-centric design.
> > >
> > > Successful attaching activates an I/O address space in the IOMMU,
> > > if the device is not purely software mediated. VFIO must provide device
> > > specific routing information for where to install the I/O page table in
> > > the IOMMU for this device. VFIO must also guarantee that the attached
> > > device is configured to compose DMAs with the routing information that
> > > is provided in the attaching call. When handling DMA requests, IOMMU
> > > identifies the target I/O address space according to the routing
> > > information carried in the request. Misconfiguration breaks DMA
> > > isolation thus could lead to severe security vulnerability.
> > >
> > > Routing information is per-device and bus specific. For PCI, it is
> > > Requester ID (RID) identifying the device plus optional Process Address
> > > Space ID (PASID). For ARM, it is Stream ID (SID) plus optional Sub-Stream
> > > ID (SSID). PASID or SSID is used when multiple I/O address spaces are
> > > enabled on a single device. For simplicity and continuity reason the
> > > following context uses RID+PASID though SID+SSID may sound a clearer
> > > naming from device p.o.v. We can decide the actual naming when coding.
> > >
> > > Because one I/O address space can be attached by multiple devices,
> > > per-device routing information (plus device cookie) is tracked under
> > > each IOASID and is used respectively when activating the I/O address
> > > space in the IOMMU for each attached device.
> > >
> > > The device in the /dev/iommu context always refers to a physical one
> > > (pdev) which is identifiable via RID. Physically each pdev can support
> > > one default I/O address space (routed via RID) and optionally multiple
> > > non-default I/O address spaces (via RID+PASID).
> > >
> > > The device in VFIO context is a logic concept, being either a physical
> > > device (pdev) or mediated device (mdev or subdev). Each vfio device
> > > is represented by RID+cookie in IOMMU fd. User is allowed to create
> > > one default I/O address space (routed by vRID from user p.o.v) per
> > > each vfio_device. VFIO decides the routing information for this default
> > > space based on device type:
> > >
> > > 1)  pdev, routed via RID;
> > >
> > > 2)  mdev/subdev with IOMMU-enforced DMA isolation, routed via
> > > the parent's RID plus the PASID marking this mdev;
> > >
> > > 3)  a purely sw-

Re: [Freedreno] [PATCH 0/3] iommu/drm/msm: Allow non-coherent masters to use system cache

2021-08-02 Thread Rob Clark
On Mon, Aug 2, 2021 at 8:14 AM Will Deacon  wrote:
>
> On Mon, Aug 02, 2021 at 08:08:07AM -0700, Rob Clark wrote:
> > On Mon, Aug 2, 2021 at 3:55 AM Will Deacon  wrote:
> > >
> > > On Thu, Jul 29, 2021 at 10:08:22AM +0530, Sai Prakash Ranjan wrote:
> > > > On 2021-07-28 19:30, Georgi Djakov wrote:
> > > > > On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash Ranjan wrote:
> > > > > > commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY 
> > > > > > flag")
> > > > > > removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it went
> > > > > > the memory type setting required for the non-coherent masters to use
> > > > > > system cache. Now that system cache support for GPU is added, we 
> > > > > > will
> > > > > > need to set the right PTE attribute for GPU buffers to be sys 
> > > > > > cached.
> > > > > > Without this, the system cache lines are not allocated for GPU.
> > > > > >
> > > > > > So the patches in this series introduces a new prot flag IOMMU_LLC,
> > > > > > renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to IO_PGTABLE_QUIRK_PTW_LLC
> > > > > > and makes GPU the user of this protection flag.
> > > > >
> > > > > Thank you for the patchset! Are you planning to refresh it, as it does
> > > > > not apply anymore?
> > > > >
> > > >
> > > > I was waiting on Will's reply [1]. If there are no changes needed, then
> > > > I can repost the patch.
> > >
> > > I still think you need to handle the mismatched alias, no? You're adding
> > > a new memory type to the SMMU which doesn't exist on the CPU side. That
> > > can't be right.
> > >
> >
> > Just curious, and maybe this is a dumb question, but what is your
> > concern about mismatched aliases?  I mean the cache hierarchy on the
> > GPU device side (anything beyond the LLC) is pretty different and
> > doesn't really care about the smmu pgtable attributes..
>
> If the CPU accesses a shared buffer with different attributes to those which
> the device is using then you fall into the "mismatched memory attributes"
> part of the Arm architecture. It's reasonably unforgiving (you should go and
> read it) and in some cases can apply to speculative accesses as well, but
> the end result is typically loss of coherency.

Ok, I might have a few other sections to read first to decipher the
terminology..

But my understanding of LLC is that it looks just like system memory
to the CPU and GPU (I think that would make it "the point of
coherence" between the GPU and CPU?)  If that is true, shouldn't it be
invisible from the point of view of different CPU mapping options?

BR,
-R


Re: [PATCH] iommu/arm-smmu: Add clk_bulk_{prepare/unprepare} to system pm callbacks

2021-08-02 Thread Rob Clark
On Mon, Aug 2, 2021 at 9:12 AM Will Deacon  wrote:
>
> On Tue, Jul 27, 2021 at 03:03:22PM +0530, Sai Prakash Ranjan wrote:
> > Some clocks for SMMU can have parent as XO such as gpu_cc_hub_cx_int_clk
> > of GPU SMMU in QTI SC7280 SoC and in order to enter deep sleep states in
> > such cases, we would need to drop the XO clock vote in unprepare call and
> > this unprepare callback for XO is in RPMh (Resource Power Manager-Hardened)
> > clock driver which controls RPMh managed clock resources for new QTI SoCs
> > and is a blocking call.
> >
> > Given we cannot have a sleeping calls such as clk_bulk_prepare() and
> > clk_bulk_unprepare() in arm-smmu runtime pm callbacks since the iommu
> > operations like map and unmap can be in atomic context and are in fast
> > path, add this prepare and unprepare call to drop the XO vote only for
> > system pm callbacks since it is not a fast path and we expect the system
> > to enter deep sleep states with system pm as opposed to runtime pm.
> >
> > This is a similar sequence of clock requests (prepare,enable and
> > disable,unprepare) in arm-smmu probe and remove.
> >
> > Signed-off-by: Sai Prakash Ranjan 
> > Co-developed-by: Rajendra Nayak 
> > Signed-off-by: Rajendra Nayak 
> > ---
> >  drivers/iommu/arm/arm-smmu/arm-smmu.c | 20 ++--
> >  1 file changed, 18 insertions(+), 2 deletions(-)
>
> [+Rob]
>
> How does this work with that funny GPU which writes to the SMMU registers
> directly? Does the SMMU need to remain independently clocked for that to
> work or is it all in the same clock domain?

AFAIU the device_link stuff should keep the SMMU clocked as long as
the GPU is alive, so I think this should work out ok.. ie. the SMMU
won't suspend while the GPU is not suspended.

BR,
-R
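
For context, the supplier/consumer ordering Rob refers to comes from the
runtime-PM device links that arm-smmu sets up for its masters; a minimal
sketch of that pattern (illustrative only, with a made-up function name):

/*
 * Illustrative sketch: a link like this makes the SMMU (supplier) follow
 * the power state of its master (consumer), so the SMMU is runtime-resumed
 * whenever the GPU is, and only suspends after the GPU has suspended.
 */
static int example_link_master_to_smmu(struct device *master,
				       struct device *smmu_dev)
{
	struct device_link *link;

	link = device_link_add(master, smmu_dev,
			       DL_FLAG_PM_RUNTIME | DL_FLAG_AUTOREMOVE_SUPPLIER);

	return link ? 0 : -ENODEV;
}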


> > diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
> > b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > index d3c6f54110a5..9561ba4c5d39 100644
> > --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > @@ -2277,6 +2277,13 @@ static int __maybe_unused 
> > arm_smmu_runtime_suspend(struct device *dev)
> >
> >  static int __maybe_unused arm_smmu_pm_resume(struct device *dev)
> >  {
> > + int ret;
> > + struct arm_smmu_device *smmu = dev_get_drvdata(dev);
> > +
> > + ret = clk_bulk_prepare(smmu->num_clks, smmu->clks);
> > + if (ret)
> > + return ret;
> > +
> >   if (pm_runtime_suspended(dev))
> >   return 0;
>
> If we subsequently fail to enable the clks in arm_smmu_runtime_resume()
> should we unprepare them again?
>
> Will
>
> > @@ -2285,10 +2292,19 @@ static int __maybe_unused arm_smmu_pm_resume(struct 
> > device *dev)
> >
> >  static int __maybe_unused arm_smmu_pm_suspend(struct device *dev)
> >  {
> > + int ret = 0;
> > + struct arm_smmu_device *smmu = dev_get_drvdata(dev);
> > +
> >   if (pm_runtime_suspended(dev))
> > - return 0;
> > + goto clk_unprepare;
> >
> > - return arm_smmu_runtime_suspend(dev);
> > + ret = arm_smmu_runtime_suspend(dev);
> > + if (ret)
> > + return ret;
> > +
> > +clk_unprepare:
> > + clk_bulk_unprepare(smmu->num_clks, smmu->clks);
> > + return ret;
> >  }
> >
> >  static const struct dev_pm_ops arm_smmu_pm_ops = {
> > --
> > QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
> > of Code Aurora Forum, hosted by The Linux Foundation
> >


Re: [PATCH v2 0/3] iommu: Enable non-strict DMA on QCom SD/MMC

2021-08-02 Thread Rajat Jain via iommu
Hi Rob,

On Mon, Aug 2, 2021 at 5:09 PM Rajat Jain  wrote:
>
> Hi Robin, Doug,
>
> On Wed, Jul 14, 2021 at 8:14 AM Doug Anderson  wrote:
> >
> > Hi,
> >
> > On Tue, Jul 13, 2021 at 11:07 AM Robin Murphy  wrote:
> > >
> > > On 2021-07-08 15:36, Doug Anderson wrote:
> > > [...]
> > > >> Or document for the users that want performance how to
> > > >> change the setting, so that they can decide.
> > > >
> > > > Pushing this to the users can make sense for a Linux distribution but
> > > > probably less sense for an embedded platform. So I'm happy to make
> > > > some way for a user to override this (like via kernel command line),
> > > > but I also strongly believe there should be a default that users don't
> > > > have to futz with that we think is correct.
> > >
> > > FYI I did make progress on the "punt it to userspace" approach. I'm not
> > > posting it even as an RFC yet because I still need to set up a machine
> > > to try actually testing any of it (it's almost certainly broken
> > > somewhere), but in the end it comes out looking surprisingly not too bad
> > > overall. If you're curious to take a look in the meantime I put it here:
> > >
> > > https://gitlab.arm.com/linux-arm/linux-rm/-/commits/iommu/fq

BTW, is there another mirror to this? I (and another colleague) are
getting the following error when trying to clone it:

rajatja@rajat2:~/rob_iommu$ git clone
https://git.gitlab.arm.com/linux-arm/linux-rm.git
Cloning into 'linux-rm'...
remote: Enumerating objects: 125712, done.
remote: Counting objects: 100% (125712/125712), done.
remote: Compressing objects: 100% (41203/41203), done.
error: RPC failed; curl 18 transfer closed with outstanding read data remaining
error: 804 bytes of body are still expected
fetch-pack: unexpected disconnect while reading sideband packet
fatal: early EOF
fatal: fetch-pack: invalid index-pack output
rajatja@rajat2:~/rob_iommu$

We've tried both git and https methods.

>
> I was wondering if you got any closer to testing / sending it out? I
> looked at the patches and am trying to understand, would they also
> make it possible to convert at runtime, an existing "non-strict"
> domain (for a particular device) into a "strict" domain leaving the
> other devices/domains as-is? Please let me know when you think your
> patches are good to be tested, and I'd also be interested in trying
> them out.
>
> >
> > Being able to change this at runtime through sysfs sounds great and it
> > fills all the needs I'm aware of, thanks! In Chrome OS we can just use
> > this with some udev rules and get everything we need.
>
> I still have another (inverse) use case where this does not work:
> We have an Intel chromebook with the default domain type being
> non-strict. There is an LTE modem (an internal PCI device which cannot
> be marked external), which we'd like to be treated as a "Strict" DMA
> domain.
>
> Do I understand it right that using Rob's patches, I could potentially
> switch the domain to "strict" *after* booting (since we don't use
> initramfs), but by that time, the driver might have already attached
> to the modem device (using "non-strict" domain), and thus the damage
> may have already been done? So perhaps we still need a device property
> that the firmware could use to indicate "strictness" for certain
> devices at boot?
>
> Thanks,
> Rajat


Re: [PATCH v2 0/3] iommu: Enable non-strict DMA on QCom SD/MMC

2021-08-02 Thread Rajat Jain via iommu
Hi Robin, Doug,

On Wed, Jul 14, 2021 at 8:14 AM Doug Anderson  wrote:
>
> Hi,
>
> On Tue, Jul 13, 2021 at 11:07 AM Robin Murphy  wrote:
> >
> > On 2021-07-08 15:36, Doug Anderson wrote:
> > [...]
> > >> Or document for the users that want performance how to
> > >> change the setting, so that they can decide.
> > >
> > > Pushing this to the users can make sense for a Linux distribution but
> > > probably less sense for an embedded platform. So I'm happy to make
> > > some way for a user to override this (like via kernel command line),
> > > but I also strongly believe there should be a default that users don't
> > > have to futz with that we think is correct.
> >
> > FYI I did make progress on the "punt it to userspace" approach. I'm not
> > posting it even as an RFC yet because I still need to set up a machine
> > to try actually testing any of it (it's almost certainly broken
> > somewhere), but in the end it comes out looking surprisingly not too bad
> > overall. If you're curious to take a look in the meantime I put it here:
> >
> > https://gitlab.arm.com/linux-arm/linux-rm/-/commits/iommu/fq

I was wondering if you got any closer to testing / sending it out? I
looked at the patches and am trying to understand, would they also
make it possible to convert at runtime, an existing "non-strict"
domain (for a particular device) into a "strict" domain leaving the
other devices/domains as-is? Please let me know when you think your
patches are good to be tested, and I'd also be interested in trying
them out.

>
> Being able to change this at runtime through sysfs sounds great and it
> fills all the needs I'm aware of, thanks! In Chrome OS we can just use
> this with some udev rules and get everything we need.

I still have another (inverse) use case where this does not work:
We have an Intel chromebook with the default domain type being
non-strict. There is an LTE modem (an internal PCI device which cannot
be marked external), which we'd like to be treated as a "Strict" DMA
domain.

Do I understand it right that using Rob's patches, I could potentially
switch the domain to "strict" *after* booting (since we don't use
initramfs), but by that time, the driver might have already attached
to the modem device (using "non-strict" domain), and thus the damage
may have already been done? So perhaps we still need a device property
that the firmware could use to indicate "strictness" for certain
devices at boot?

Thanks,
Rajat
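
As an aside for anyone reproducing this setup: the boot-time override
alluded to above is the global iommu.strict= kernel parameter, which is
all-or-nothing rather than per-device -- exactly the limitation being
discussed here. For example:

  iommu.strict=1    # strict (synchronous) TLB invalidation for DMA domains
  iommu.strict=0    # lazy/deferred invalidation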


Re: [RFC 3/5] dma-mapping: Enable global non-coherent pool support for RISC-V

2021-08-02 Thread Atish Patra
On Tue, Jul 27, 2021 at 1:52 AM Christoph Hellwig  wrote:
>
> On Mon, Jul 26, 2021 at 03:47:54PM -0700, Atish Patra wrote:
> > arch_dma_set_uncached works as well in this case. However, mips,
> > nios2 & xtensa use a fixed (via config) value for the offset. A similar
> > approach can't be used here because the platform-specific offset value
> > has to be determined at runtime so that a single kernel image can boot
> > on all platforms.
>
> Nothing in the interface requires a fixed offset.  And using the offset
> has one enormous advantage in that there is no need to declare a
> statically sized pool - allocations are fully dynamic.  And any kind of
> fixed pool tends to cause huge problems.
>
> > 1. a new DT property so that arch specific code is aware of the
> > non-cacheable window offset.
>
> Yes.
>
> > individual device if a per-device non-cacheable
> >window support is required in future. As of now, the beagleV memory
>
> If you require a per-device noncachable area you can use the per-device
> coherent pools.  But why would you want that?
>
> > region lies in 0x10__0 - x17__
> >which is mapped to start of DRAM 0x8000. All of the
> > non-coherent devices can do 32bit DMA only.
>
> Adjust ZONE_DMA32 so that it takes the uncached offset into account.
>
> > > > - mem = dma_init_coherent_memory(phys_addr, phys_addr, size, true);
> > > > + if (phys_addr == device_addr)
> > > > + mem = dma_init_coherent_memory(phys_addr, device_addr, 
> > > > size, true);
> > > > + else
> > > > + mem = dma_init_coherent_memory(phys_addr, device_addr, 
> > > > size, false);
> > >
> > > Nak.  The phys_addr != device_addr support is goign away.  This needs
> >
> > ok.
> >
> > > to be filled in using dma-ranges property hanging of the struct device.
> >
> > struct device is only accessible in rmem_dma_device_init. I couldn't
> > find a proper way to access it during
> > dma_reserved_default_memory setup under global pool.
> >
> > Does that mean we should use a per-device memory pool instead of a
> > global non-coherent pool ?
>
> Indeed, that would be a problem in this case.  But if we can just
> use the uncached offset directly I think everything will be much
> simpler.

Yes. I was planning to change this to use an uncached offset.
However, the planned mass production for the beaglev starlight sbc is
cancelled now [1]. As there is no other board that requires an uncached
offset, I don't think there is any usecase for adding uncached offset
support for RISC-V right now. I will revisit this (hopefully we don't
have to) in case any platform implements uncached window support in
the future.

[1] 
https://www.cnx-software.com/2021/07/31/beaglev-starlight-sbc-wont-be-mass-manufactured-redesigned-beaglev-risc-v-sbc-expected-in-q1-2022/
-- 
Regards,
Atish
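
For anyone unfamiliar with the interface under discussion, a rough sketch
of what a runtime uncached-offset hook could look like on RISC-V; this is
purely hypothetical (it was never implemented), and riscv_uncached_offset
is a made-up variable assumed to be filled in from a DT property at boot:

#include <linux/dma-map-ops.h>
#include <linux/err.h>

static unsigned long riscv_uncached_offset;	/* assumed: parsed from DT at boot */

/* Remap a freshly allocated, cache-flushed buffer through the uncached window. */
void *arch_dma_set_uncached(void *addr, size_t size)
{
	if (!riscv_uncached_offset)
		return ERR_PTR(-EOPNOTSUPP);

	return addr + riscv_uncached_offset;
}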


Re: [PATCH v4 5/6] iova: Add iova_len argument to init_iova_domain()

2021-08-02 Thread John Garry

On 02/08/2021 17:40, John Garry wrote:

On 02/08/2021 17:16, Robin Murphy wrote:

On 2021-08-02 17:06, John Garry wrote:

On 02/08/2021 16:06, Will Deacon wrote:

On Wed, Jul 14, 2021 at 06:36:42PM +0800, John Garry wrote:
Add max opt argument to init_iova_domain(), and use it to set the rcaches
range.

Also fix up all users to set this value (at 0, meaning use default).

Wrap that in init_iova_domain_defaults() to avoid the mysterious 0?


Sure, I can do something like that. I actually did have separate 
along those lines in v3 before I decided to change it.


Y'know, at this point I'm now starting to seriously wonder whether 
moving the rcaches into iommu_dma_cookie wouldn't make a whole lot of 
things simpler... :/


As I see, the rcache stuff isn't really specific to IOVA anyway, so it 
seems sane.




Does that sound like crazy talk to you, or an idea worth entertaining?


If you're going to start moving things, has anyone considered putting 
rcache support in lib as a generic solution to "Magazines and Vmem: .." 
paper?


Having said that, I still think that the rcache code has certain
scalability issues, as discussed before. So making it more generic and then
discarding it would be less than ideal.


Thanks,
john


Re: [PATCH v4 5/6] iova: Add iova_len argument to init_iova_domain()

2021-08-02 Thread John Garry

On 02/08/2021 17:16, Robin Murphy wrote:

On 2021-08-02 17:06, John Garry wrote:

On 02/08/2021 16:06, Will Deacon wrote:

On Wed, Jul 14, 2021 at 06:36:42PM +0800, John Garry wrote:
Add max opt argument to init_iova_domain(), and use it to set the 
rcaches

range.

Also fix up all users to set this value (at 0, meaning use default).

Wrap that in init_iova_domain_defaults() to avoid the mysterious 0?


Sure, I can do something like that. I actually did have separate along 
those lines in v3 before I decided to change it.


Y'know, at this point I'm now starting to seriously wonder whether 
moving the rcaches into iommu_dma_cookie wouldn't make a whole lot of 
things simpler... :/


As I see, the rcache stuff isn't really specific to IOVA anyway, so it 
seems sane.




Does that sound like crazy talk to you, or an idea worth entertaining?


If you're going to start moving things, has anyone considered putting 
rcache support in lib as a generic solution to "Magazines and Vmem: .." 
paper?


Thanks,
John


Re: [PATCH v4 5/6] iova: Add iova_len argument to init_iova_domain()

2021-08-02 Thread Robin Murphy

On 2021-08-02 17:06, John Garry wrote:

On 02/08/2021 16:06, Will Deacon wrote:

On Wed, Jul 14, 2021 at 06:36:42PM +0800, John Garry wrote:
Add max opt argument to init_iova_domain(), and use it to set the 
rcaches

range.

Also fix up all users to set this value (at 0, meaning use default).

Wrap that in init_iova_domain_defaults() to avoid the mysterious 0?


Sure, I can do something like that. I actually did have separate along 
those lines in v3 before I decided to change it.


Y'know, at this point I'm now starting to seriously wonder whether 
moving the rcaches into iommu_dma_cookie wouldn't make a whole lot of 
things simpler... :/


Does that sound like crazy talk to you, or an idea worth entertaining?

Robin.


Re: [PATCH] iommu/arm-smmu: Add clk_bulk_{prepare/unprepare} to system pm callbacks

2021-08-02 Thread Will Deacon
On Tue, Jul 27, 2021 at 03:03:22PM +0530, Sai Prakash Ranjan wrote:
> Some clocks for SMMU can have parent as XO such as gpu_cc_hub_cx_int_clk
> of GPU SMMU in QTI SC7280 SoC and in order to enter deep sleep states in
> such cases, we would need to drop the XO clock vote in unprepare call and
> this unprepare callback for XO is in RPMh (Resource Power Manager-Hardened)
> clock driver which controls RPMh managed clock resources for new QTI SoCs
> and is a blocking call.
> 
> Given we cannot have a sleeping calls such as clk_bulk_prepare() and
> clk_bulk_unprepare() in arm-smmu runtime pm callbacks since the iommu
> operations like map and unmap can be in atomic context and are in fast
> path, add this prepare and unprepare call to drop the XO vote only for
> system pm callbacks since it is not a fast path and we expect the system
> to enter deep sleep states with system pm as opposed to runtime pm.
> 
> This is a similar sequence of clock requests (prepare,enable and
> disable,unprepare) in arm-smmu probe and remove.
> 
> Signed-off-by: Sai Prakash Ranjan 
> Co-developed-by: Rajendra Nayak 
> Signed-off-by: Rajendra Nayak 
> ---
>  drivers/iommu/arm/arm-smmu/arm-smmu.c | 20 ++--
>  1 file changed, 18 insertions(+), 2 deletions(-)

[+Rob]

How does this work with that funny GPU which writes to the SMMU registers
directly? Does the SMMU need to remain independently clocked for that to
work or is it all in the same clock domain?

> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
> b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> index d3c6f54110a5..9561ba4c5d39 100644
> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> @@ -2277,6 +2277,13 @@ static int __maybe_unused 
> arm_smmu_runtime_suspend(struct device *dev)
>  
>  static int __maybe_unused arm_smmu_pm_resume(struct device *dev)
>  {
> + int ret;
> + struct arm_smmu_device *smmu = dev_get_drvdata(dev);
> +
> + ret = clk_bulk_prepare(smmu->num_clks, smmu->clks);
> + if (ret)
> + return ret;
> +
>   if (pm_runtime_suspended(dev))
>   return 0;

If we subsequently fail to enable the clks in arm_smmu_runtime_resume()
should we unprepare them again?

Will

> @@ -2285,10 +2292,19 @@ static int __maybe_unused arm_smmu_pm_resume(struct 
> device *dev)
>  
>  static int __maybe_unused arm_smmu_pm_suspend(struct device *dev)
>  {
> + int ret = 0;
> + struct arm_smmu_device *smmu = dev_get_drvdata(dev);
> +
>   if (pm_runtime_suspended(dev))
> - return 0;
> + goto clk_unprepare;
>  
> - return arm_smmu_runtime_suspend(dev);
> + ret = arm_smmu_runtime_suspend(dev);
> + if (ret)
> + return ret;
> +
> +clk_unprepare:
> + clk_bulk_unprepare(smmu->num_clks, smmu->clks);
> + return ret;
>  }
>  
>  static const struct dev_pm_ops arm_smmu_pm_ops = {
> -- 
> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
> of Code Aurora Forum, hosted by The Linux Foundation
> 


Re: [PATCH v4 2/6] iova: Allow rcache range upper limit to be flexible

2021-08-02 Thread Robin Murphy

On 2021-08-02 16:23, John Garry wrote:

On 02/08/2021 16:01, Will Deacon wrote:

On Wed, Jul 14, 2021 at 06:36:39PM +0800, John Garry wrote:

Some LLDs may request DMA mappings whose IOVA length exceeds that of the
current rcache upper limit.


What's an LLD?



low-level driver

maybe I'll stick with simply "drivers"


This means that allocations for those IOVAs will never be cached, and
always must be allocated and freed from the RB tree per DMA mapping 
cycle.

This has a significant effect on performance, more so since commit
4e89dce72521 ("iommu/iova: Retry from last rb tree node if iova search
fails"), as discussed at [0].

As a first step towards allowing the rcache range upper limit be
configured, hold this value in the IOVA rcache structure, and allocate
the rcaches separately.

[0] 
https://lore.kernel.org/linux-iommu/20210129092120.1482-1-thunder.leiz...@huawei.com/ 



Signed-off-by: John Garry 
---
  drivers/iommu/dma-iommu.c |  2 +-
  drivers/iommu/iova.c  | 23 +--
  include/linux/iova.h  |  4 ++--
  3 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 98ba927aee1a..4772278aa5da 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -434,7 +434,7 @@ static dma_addr_t iommu_dma_alloc_iova(struct iommu_domain *domain,
   * rounding up anything cacheable to make sure that can't happen. The
   * order of the unadjusted size will still match upon freeing.
   */
-    if (iova_len < (1 << (IOVA_RANGE_CACHE_MAX_SIZE - 1)))
+    if (iova_len < (1 << (iovad->rcache_max_size - 1)))
  iova_len = roundup_pow_of_two(iova_len);
  dma_limit = min_not_zero(dma_limit, dev->bus_dma_limit);
diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index b6cf5f16123b..07ce73fdd8c1 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -15,6 +15,8 @@
  /* The anchor node sits above the top of the usable address space */
  #define IOVA_ANCHOR    ~0UL
+#define IOVA_RANGE_CACHE_MAX_SIZE 6	/* log of max cached IOVA range size (in pages) */


Is that the same as an 'order'? i.e. IOVA_RANGE_CACHE_MAX_ORDER?


Yeah, that may be better. I was just using the same name as before.
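
(For scale: with the default value of 6 and a 4 KiB IOVA granule, orders 0
through 5 are cached, i.e. allocations of up to 2^5 = 32 pages or 128 KiB
go through the rcaches, and anything larger always falls back to the
rbtree -- an illustrative reading of the hunks quoted here.)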




+
  static bool iova_rcache_insert(struct iova_domain *iovad,
 unsigned long pfn,
 unsigned long size);
@@ -881,7 +883,14 @@ static void init_iova_rcaches(struct iova_domain *iovad)

  unsigned int cpu;
  int i;
-    for (i = 0; i < IOVA_RANGE_CACHE_MAX_SIZE; ++i) {
+    iovad->rcache_max_size = IOVA_RANGE_CACHE_MAX_SIZE;
+
+    iovad->rcaches = kcalloc(iovad->rcache_max_size,
+ sizeof(*iovad->rcaches), GFP_KERNEL);
+    if (!iovad->rcaches)
+    return;


Returning quietly here doesn't seem like the right thing to do. At least, I
don't think the rest of the functions here are checking rcaches against
NULL.



For sure, but that is what other code which can fail here already does, 
like:


static void init_iova_rcaches(struct iova_domain *iovad)
{
	...

	for (i = 0; i < IOVA_RANGE_CACHE_MAX_SIZE; ++i) {
		...

		rcache->cpu_rcaches = __alloc_percpu(sizeof(*cpu_rcache), cache_line_size());
		if (WARN_ON(!rcache->cpu_rcaches))
			continue;
	}
}

and that is not safe either.


Yeah, along with flush queues, historically this has all been 
super-dodgy in terms of failure handling (or lack of).


This issue was raised a while ago. I don't mind trying to fix it - a 
slightly painful part is that it touches a few subsystems.


Maybe pull the rcache init out of iova_domain_init() entirely? Only 
iommu-dma uses {alloc,free}_iova_fast(), so TBH it's only a great big 
waste of memory for all the other IOVA domain users anyway.


The other week I started pondering how much of iommu-dma only needs to 
be exposed to the IOMMU core rather than the whole kernel now; I suppose 
there's probably an equal argument to be made for some of these bits of 
the IOVA API, and this might pave the way towards some more logical 
separation, but let's get the functional side dealt with before we worry 
too much about splitting headers.


Robin.

Re: [PATCH v4 5/6] iova: Add iova_len argument to init_iova_domain()

2021-08-02 Thread John Garry

On 02/08/2021 16:06, Will Deacon wrote:

On Wed, Jul 14, 2021 at 06:36:42PM +0800, John Garry wrote:

Add max opt argument to init_iova_domain(), and use it to set the rcaches
range.

Also fix up all users to set this value (at 0, meaning use default).

Wrap that in init_iova_domain_defaults() to avoid the mysterious 0?


Sure, I can do something like that. I actually did have separate along 
those lines in v3 before I decided to change it.


Thanks,
John
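
A minimal sketch of the wrapper being suggested (hypothetical: neither the
extra parameter nor the helper exists upstream, and the argument order
follows the patch under discussion):

/* Keep existing callers free of the "mysterious 0" by hiding the new
 * max-opt/iova_len argument behind a default-initialising helper. */
static inline void init_iova_domain_defaults(struct iova_domain *iovad,
					     unsigned long granule,
					     unsigned long start_pfn)
{
	init_iova_domain(iovad, granule, start_pfn, 0);	/* 0 => default rcache range */
}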


Re: [Patch V2 0/2] iommu/arm-smmu: Fix races in iommu domain/group creation

2021-08-02 Thread Robin Murphy

On 2021-08-02 16:16, Will Deacon wrote:

On Fri, Jun 18, 2021 at 02:00:35AM +0530, Ashish Mhetre wrote:

Multiple iommu domains and iommu groups are getting created for devices
sharing the same SID. Devices sharing the same SID are expected to be in the
same iommu group and the same iommu domain.
This leads to context faults when one device accesses an IOVA from another
device, which shouldn't be the case for devices sharing the same SID.
Fix this by protecting iommu domain and iommu group creation with mutexes.


Robin -- any chance you could take a look at these, please? You had some
comments on the first version which convinced me that they are needed,
but I couldn't tell whether you wanted to solve this a different way or not.


Sorry, I was lamenting that this came to light due to the 
of_iommu_configure() flow being yucky, but that wasn't meant to imply 
that there aren't - or couldn't be in future - better reasons for 
iommu_probe_device() to be robust against concurrency anyway. I do think 
these are legitimate fixes to make in their own right, even if the 
current need might get swept back under the rug in future.


I would say, however, that the commit messages seem to focus too much on 
the wrong details and aren't overly useful, and patch #2 is missing 
Ashish's sign-off.


Thanks,
Robin.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCHv3] iommu/arm-smmu: Optimize ->tlb_flush_walk() for qcom implementation

2021-08-02 Thread Will Deacon
On Wed, Jun 23, 2021 at 07:12:01PM +0530, Sai Prakash Ranjan wrote:
> Currently for iommu_unmap() of large scatter-gather list with page size
> elements, the majority of time is spent in flushing of partial walks in
> __arm_lpae_unmap() which is a VA based TLB invalidation invalidating
> page-by-page on iommus like arm-smmu-v2 (TLBIVA).
> 
> For example: to unmap a 32MB scatter-gather list with page size elements
> (8192 entries), there are 16->2MB buffer unmaps based on the pgsize (2MB
> for 4K granule) and each of 2MB will further result in 512 TLBIVAs (2MB/4K)
> resulting in a total of 8192 TLBIVAs (512*16) for 16->2MB causing a huge
> overhead.
> 
> On qcom implementation, there are several performance improvements for
> TLB cache invalidations in HW like wait-for-safe (for realtime clients
> such as camera and display) and few others to allow for cache
> lookups/updates when TLBI is in progress for the same context bank.
> So the cost of over-invalidation is less compared to the unmap latency
> on several usecases like camera which deals with large buffers. So,
> ASID based TLB invalidations (TLBIASID) can be used to invalidate the
> entire context for partial walk flush thereby improving the unmap
> latency.
> 
> Non-strict mode can use this by default on all platforms, given that it is
> all about over-invalidation to save time on individual unmaps, and is
> generally non-deterministic anyway.
> 
> For this example of 32MB scatter-gather list unmap, this change results
> in just 16 ASID based TLB invalidations (TLBIASIDs) as opposed to 8192
> TLBIVAs thereby increasing the performance of unmaps drastically.
> 
> Test on QTI SM8150 SoC for 10 iterations of iommu_{map_sg}/unmap:
> (average over 10 iterations)
> 
> Before this optimization:
> 
> sizeiommu_map_sg  iommu_unmap
>   4K2.067 us 1.854 us
>  64K9.598 us 8.802 us
>   1M  148.890 us   130.718 us
>   2M  305.864 us67.291 us
>  12M 1793.604 us   390.838 us
>  16M 2386.848 us   518.187 us
>  24M 3563.296 us   775.989 us
>  32M 4747.171 us  1033.364 us
> 
> After this optimization:
> 
> sizeiommu_map_sg  iommu_unmap
>   4K1.723 us 1.765 us
>  64K9.880 us 8.869 us
>   1M  155.364 us   135.223 us
>   2M  303.906 us 5.385 us
>  12M 1786.557 us21.250 us
>  16M 2391.890 us27.437 us
>  24M 3570.895 us39.937 us
>  32M 4755.234 us51.797 us
> 
> This is further reduced once the map/unmap_pages() support gets in which
> will result in just 1 TLBIASID as compared to 16 TLBIASIDs.
> 
> Real world data also shows big difference in unmap performance as below:
> 
> There were reports of camera frame drops because of the high overhead of
> iommu unmap without this optimization: frequent unmaps issued by the camera
> at about 100MB/s were taking more than 100ms, thereby causing frame drops.
> 
> Signed-off-by: Sai Prakash Ranjan 
> ---
> 
> Changes in v3:
>  * Move the logic to arm-smmu driver from io-pgtable (Robin)
>  * Use a new set of iommu_flush_ops->arm_smmu_s1_tlb_impl_ops and use it for 
> qcom impl
> 
> Changes in v2:
>  * Add a quirk to choose tlb_flush_all in partial walk flush
>  * Set the quirk for QTI SoC implementation
> 
> ---
>  drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 13 +
>  drivers/iommu/arm/arm-smmu/arm-smmu.c  | 17 -
>  2 files changed, 29 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c 
> b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> index 7771d40176de..218c71465819 100644
> --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> @@ -10,6 +10,8 @@
>  
>  #include "arm-smmu.h"
>  
> +extern const struct iommu_flush_ops arm_smmu_s1_tlb_impl_ops;
> +
>  struct qcom_smmu {
>   struct arm_smmu_device smmu;
>   bool bypass_quirk;
> @@ -146,6 +148,8 @@ static int qcom_adreno_smmu_init_context(struct 
> arm_smmu_domain *smmu_domain,
>  {
>   struct adreno_smmu_priv *priv;
>  
> + pgtbl_cfg->tlb = &arm_smmu_s1_tlb_impl_ops;
> +
>   /* Only enable split pagetables for the GPU device (SID 0) */
>   if (!qcom_adreno_smmu_is_gpu_device(dev))
>   return 0;
> @@ -185,6 +189,14 @@ static const struct of_device_id 
> qcom_smmu_client_of_match[] __maybe_unused = {
>   { }
>  };
>  
> +static int qcom_smmu_init_context(struct arm_smmu_domain *smmu_domain,
> + struct io_pgtable_cfg *pgtbl_cfg, struct device *dev)
> +{
> + pgtbl_cfg->tlb = &arm_smmu_s1_tlb_impl_ops;
> +
> + return 0;
> +}
> +
>  static int qcom_smmu_cfg_probe(struct arm_smmu_device *smmu)
>  {
>   unsigned int last_s2cr = ARM_SMMU_GR0_S2CR(smmu->num_mapping_groups - 
> 1);
> @@ -308,6 +320,7 @@

Re: [PATCH v4 2/6] iova: Allow rcache range upper limit to be flexible

2021-08-02 Thread John Garry

On 02/08/2021 16:01, Will Deacon wrote:

On Wed, Jul 14, 2021 at 06:36:39PM +0800, John Garry wrote:

Some LLDs may request DMA mappings whose IOVA length exceeds that of the
current rcache upper limit.


What's an LLD?



low-level driver

maybe I'll stick with simply "drivers"


This means that allocations for those IOVAs will never be cached, and
always must be allocated and freed from the RB tree per DMA mapping cycle.
This has a significant effect on performance, more so since commit
4e89dce72521 ("iommu/iova: Retry from last rb tree node if iova search
fails"), as discussed at [0].

As a first step towards allowing the rcache range upper limit be
configured, hold this value in the IOVA rcache structure, and allocate
the rcaches separately.

[0] 
https://lore.kernel.org/linux-iommu/20210129092120.1482-1-thunder.leiz...@huawei.com/

Signed-off-by: John Garry 
---
  drivers/iommu/dma-iommu.c |  2 +-
  drivers/iommu/iova.c  | 23 +--
  include/linux/iova.h  |  4 ++--
  3 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 98ba927aee1a..4772278aa5da 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -434,7 +434,7 @@ static dma_addr_t iommu_dma_alloc_iova(struct iommu_domain *domain,
 * rounding up anything cacheable to make sure that can't happen. The
 * order of the unadjusted size will still match upon freeing.
 */
-   if (iova_len < (1 << (IOVA_RANGE_CACHE_MAX_SIZE - 1)))
+   if (iova_len < (1 << (iovad->rcache_max_size - 1)))
iova_len = roundup_pow_of_two(iova_len);
  
  	dma_limit = min_not_zero(dma_limit, dev->bus_dma_limit);

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index b6cf5f16123b..07ce73fdd8c1 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -15,6 +15,8 @@
  /* The anchor node sits above the top of the usable address space */
  #define IOVA_ANCHOR   ~0UL
  
+#define IOVA_RANGE_CACHE_MAX_SIZE 6/* log of max cached IOVA range size (in pages) */


Is that the same as an 'order'? i.e. IOVA_RANGE_CACHE_MAX_ORDER?


Yeah, that may be better. I was just using the same name as before.




+
  static bool iova_rcache_insert(struct iova_domain *iovad,
   unsigned long pfn,
   unsigned long size);
@@ -881,7 +883,14 @@ static void init_iova_rcaches(struct iova_domain *iovad)
unsigned int cpu;
int i;
  
-	for (i = 0; i < IOVA_RANGE_CACHE_MAX_SIZE; ++i) {

+   iovad->rcache_max_size = IOVA_RANGE_CACHE_MAX_SIZE;
+
+   iovad->rcaches = kcalloc(iovad->rcache_max_size,
+sizeof(*iovad->rcaches), GFP_KERNEL);
+   if (!iovad->rcaches)
+   return;


Returning quietly here doesn't seem like the right thing to do. At least, I
don't think the rest of the functions here are checking rcaches against
NULL.



For sure, but that is what other code which can fail here already does, 
like:


static void init_iova_rcaches(struct iova_domain *iovad)
{
...

for (i = 0; i < IOVA_RANGE_CACHE_MAX_SIZE; ++i) {
...

		rcache->cpu_rcaches = __alloc_percpu(sizeof(*cpu_rcache), cache_line_size());

if (WARN_ON(!rcache->cpu_rcaches))
continue;
}

and that is not safe either.

This issue was raised a while ago. I don't mind trying to fix it - a 
slightly painful part is that it touches a few subsystems.


Thanks,
John
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [Patch V2 0/2] iommu/arm-smmu: Fix races in iommu domain/group creation

2021-08-02 Thread Will Deacon
On Fri, Jun 18, 2021 at 02:00:35AM +0530, Ashish Mhetre wrote:
> Multiple iommu domains and iommu groups are getting created for devices
> sharing the same SID. Devices sharing the same SID are expected to be in the
> same iommu group and the same iommu domain.
> This leads to context faults when one device accesses an IOVA from another
> device, which shouldn't be the case for devices sharing the same SID.
> Fix this by protecting iommu domain and iommu group creation with mutexes.

Robin -- any chance you could take a look at these, please? You had some
comments on the first version which convinced me that they are needed,
but I couldn't tell whether you wanted to solve this a different way or not.

Cheers,

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [Freedreno] [PATCH 0/3] iommu/drm/msm: Allow non-coherent masters to use system cache

2021-08-02 Thread Will Deacon
On Mon, Aug 02, 2021 at 08:08:07AM -0700, Rob Clark wrote:
> On Mon, Aug 2, 2021 at 3:55 AM Will Deacon  wrote:
> >
> > On Thu, Jul 29, 2021 at 10:08:22AM +0530, Sai Prakash Ranjan wrote:
> > > On 2021-07-28 19:30, Georgi Djakov wrote:
> > > > On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash Ranjan wrote:
> > > > > commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY flag")
> > > > > removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it went
> > > > > the memory type setting required for the non-coherent masters to use
> > > > > system cache. Now that system cache support for GPU is added, we will
> > > > > need to set the right PTE attribute for GPU buffers to be sys cached.
> > > > > Without this, the system cache lines are not allocated for GPU.
> > > > >
> > > > > So the patches in this series introduces a new prot flag IOMMU_LLC,
> > > > > renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to IO_PGTABLE_QUIRK_PTW_LLC
> > > > > and makes GPU the user of this protection flag.
> > > >
> > > > Thank you for the patchset! Are you planning to refresh it, as it does
> > > > not apply anymore?
> > > >
> > >
> > > I was waiting on Will's reply [1]. If there are no changes needed, then
> > > I can repost the patch.
> >
> > I still think you need to handle the mismatched alias, no? You're adding
> > a new memory type to the SMMU which doesn't exist on the CPU side. That
> > can't be right.
> >
> 
> Just curious, and maybe this is a dumb question, but what is your
> concern about mismatched aliases?  I mean the cache hierarchy on the
> GPU device side (anything beyond the LLC) is pretty different and
> doesn't really care about the smmu pgtable attributes..

If the CPU accesses a shared buffer with different attributes to those which
the device is using then you fall into the "mismatched memory attributes"
part of the Arm architecture. It's reasonably unforgiving (you should go and
read it) and in some cases can apply to speculative accesses as well, but
the end result is typically loss of coherency.

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 5/6] iova: Add iova_len argument to init_iova_domain()

2021-08-02 Thread Will Deacon
On Wed, Jul 14, 2021 at 06:36:42PM +0800, John Garry wrote:
> Add max opt argument to init_iova_domain(), and use it to set the rcaches
> range.
> 
> Also fix up all users to set this value (at 0, meaning use default).

Wrap that in init_iova_domain_defaults() to avoid the mysterious 0?

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH] iommu/amd: Remove stale amd_iommu_unmap_flush usage

2021-08-02 Thread Joerg Roedel
From: Joerg Roedel 

Remove the new use of the variable introduced in the AMD driver branch.
The variable was already removed in the iommu core branch, causing build
errors when the branches are merged.

Cc: Nadav Amit 
Cc: Zhen Lei 
Signed-off-by: Joerg Roedel 
---
 drivers/iommu/amd/init.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 239556c1f698..bdcf167b4afe 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -1850,11 +1850,9 @@ static int __init iommu_init_pci(struct amd_iommu *iommu)
return ret;
 
if (iommu->cap & (1UL << IOMMU_CAP_NPCACHE)) {
-   if (!amd_iommu_unmap_flush)
-   pr_info("IOMMU batching is disabled due to 
virtualization\n");
-
+   pr_info("Using strict mode due to virtualization\n");
+   iommu_set_dma_strict();
amd_iommu_np_cache = true;
-   amd_iommu_unmap_flush = true;
}
 
init_iommu_perf_ctr(iommu);
-- 
2.31.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [Freedreno] [PATCH 0/3] iommu/drm/msm: Allow non-coherent masters to use system cache

2021-08-02 Thread Rob Clark
On Mon, Aug 2, 2021 at 3:55 AM Will Deacon  wrote:
>
> On Thu, Jul 29, 2021 at 10:08:22AM +0530, Sai Prakash Ranjan wrote:
> > On 2021-07-28 19:30, Georgi Djakov wrote:
> > > On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash Ranjan wrote:
> > > > commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY flag")
> > > > removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it went
> > > > the memory type setting required for the non-coherent masters to use
> > > > system cache. Now that system cache support for GPU is added, we will
> > > > need to set the right PTE attribute for GPU buffers to be sys cached.
> > > > Without this, the system cache lines are not allocated for GPU.
> > > >
> > > > So the patches in this series introduces a new prot flag IOMMU_LLC,
> > > > renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to IO_PGTABLE_QUIRK_PTW_LLC
> > > > and makes GPU the user of this protection flag.
> > >
> > > Thank you for the patchset! Are you planning to refresh it, as it does
> > > not apply anymore?
> > >
> >
> > I was waiting on Will's reply [1]. If there are no changes needed, then
> > I can repost the patch.
>
> I still think you need to handle the mismatched alias, no? You're adding
> a new memory type to the SMMU which doesn't exist on the CPU side. That
> can't be right.
>

Just curious, and maybe this is a dumb question, but what is your
concern about mismatched aliases?  I mean the cache hierarchy on the
GPU device side (anything beyond the LLC) is pretty different and
doesn't really care about the smmu pgtable attributes..

BR,
-R
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 2/6] iova: Allow rcache range upper limit to be flexible

2021-08-02 Thread Will Deacon
On Wed, Jul 14, 2021 at 06:36:39PM +0800, John Garry wrote:
> Some LLDs may request DMA mappings whose IOVA length exceeds that of the
> current rcache upper limit.

What's an LLD?

> This means that allocations for those IOVAs will never be cached, and
> always must be allocated and freed from the RB tree per DMA mapping cycle.
> This has a significant effect on performance, more so since commit
> 4e89dce72521 ("iommu/iova: Retry from last rb tree node if iova search
> fails"), as discussed at [0].
> 
> As a first step towards allowing the rcache range upper limit be
> configured, hold this value in the IOVA rcache structure, and allocate
> the rcaches separately.
> 
> [0] 
> https://lore.kernel.org/linux-iommu/20210129092120.1482-1-thunder.leiz...@huawei.com/
> 
> Signed-off-by: John Garry 
> ---
>  drivers/iommu/dma-iommu.c |  2 +-
>  drivers/iommu/iova.c  | 23 +--
>  include/linux/iova.h  |  4 ++--
>  3 files changed, 20 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 98ba927aee1a..4772278aa5da 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -434,7 +434,7 @@ static dma_addr_t iommu_dma_alloc_iova(struct iommu_domain *domain,
>* rounding up anything cacheable to make sure that can't happen. The
>* order of the unadjusted size will still match upon freeing.
>*/
> - if (iova_len < (1 << (IOVA_RANGE_CACHE_MAX_SIZE - 1)))
> + if (iova_len < (1 << (iovad->rcache_max_size - 1)))
>   iova_len = roundup_pow_of_two(iova_len);
>  
>   dma_limit = min_not_zero(dma_limit, dev->bus_dma_limit);
> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
> index b6cf5f16123b..07ce73fdd8c1 100644
> --- a/drivers/iommu/iova.c
> +++ b/drivers/iommu/iova.c
> @@ -15,6 +15,8 @@
>  /* The anchor node sits above the top of the usable address space */
>  #define IOVA_ANCHOR  ~0UL
>  
> +#define IOVA_RANGE_CACHE_MAX_SIZE 6    /* log of max cached IOVA range size (in pages) */

Is that the same as an 'order'? i.e. IOVA_RANGE_CACHE_MAX_ORDER?

> +
>  static bool iova_rcache_insert(struct iova_domain *iovad,
>  unsigned long pfn,
>  unsigned long size);
> @@ -881,7 +883,14 @@ static void init_iova_rcaches(struct iova_domain *iovad)
>   unsigned int cpu;
>   int i;
>  
> - for (i = 0; i < IOVA_RANGE_CACHE_MAX_SIZE; ++i) {
> + iovad->rcache_max_size = IOVA_RANGE_CACHE_MAX_SIZE;
> +
> + iovad->rcaches = kcalloc(iovad->rcache_max_size,
> +  sizeof(*iovad->rcaches), GFP_KERNEL);
> + if (!iovad->rcaches)
> + return;

Returning quietly here doesn't seem like the right thing to do. At least, I
don't think the rest of the functions here are checking rcaches against
NULL.

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 0/2] Implement [map/unmap]_pages callbacks for ARM SMMUV3

2021-08-02 Thread Joerg Roedel
On Mon, Aug 02, 2021 at 03:43:20PM +0100, Will Deacon wrote:
> For both patches:
> 
> Acked-by: Will Deacon 
> 
> Joerg -- please can you take these directly? They build on top of the
> changes from Isaac which you have queued on your 'core' branch.

Sure, applied to core branch now.

Thanks,

Joerg
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 1/6] iommu: Refactor iommu_group_store_type()

2021-08-02 Thread Will Deacon
On Wed, Jul 14, 2021 at 06:36:38PM +0800, John Garry wrote:
> Function iommu_group_store_type() supports changing the default domain
> of an IOMMU group.
> 
> Many conditions need to be satisfied and steps taken for this action to be
> successful.
> 
> Satisfying these conditions and steps will be required for setting other
> IOMMU group attributes, so factor into a common part and a part specific
> to update the IOMMU group attribute.
> 
> No functional change intended.
> 
> Some code comments are tidied up also.
> 
> Signed-off-by: John Garry 
> ---
>  drivers/iommu/iommu.c | 73 +++
>  1 file changed, 46 insertions(+), 27 deletions(-)

Acked-by: Will Deacon 

Although likely to conflict with Robin's monster series.

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 0/2] Implement [map/unmap]_pages callbacks for ARM SMMUV3

2021-08-02 Thread Will Deacon
On Sat, Jul 31, 2021 at 10:17:09AM +0800, chenxiang wrote:
> From: Xiang Chen 
> 
> The series ("Optimizing iommu_[map/unmap] performance") improve the
> iommu_[map/unmap] performance. Based on the series, implement 
> [map/unmap]_pages
> callbacks for ARM SMMUV3.
> Use tool dma_map_benchmark to test the latency of map/unmap, and it promotes
> much on it. The test result is as follows:
> t = 1(thread = 1):
>before opt(us)   after opt(us)
> g=1(4K size)0.1/1.3  0.1/0.8
> g=2(8K size)0.2/1.5  0.2/0.9
> g=4(16K size)   0.3/1.9  0.1/1.1
> g=8(32K size)   0.5/2.7  0.2/1.4
> g=16(64K size)  1.0/4.5  0.2/2.0
> g=32(128K size) 1.8/7.9  0.2/3.3
> g=64(256K size) 3.7/14.8 0.4/6.1
> g=128(512K size)7.1/14.7 0.5/10.4
> g=256(1M size)  14.0/53.90.8/19.3
> g=512(2M size)  0.2/0.9  0.2/0.9
> g=1024(4M size) 0.5/1.5  0.4/1.0
> 
> t = 10(thread = 10):
>before opt(us)   after opt(us)
> g=1(4K size)0.3/7.0  0.1/5.8
> g=2(8K size)0.4/6.7  0.3/6.0
> g=4(16K size)   0.5/6.3  0.3/5.6
> g=8(32K size)   0.5/8.3  0.2/6.3
> g=16(64K size)  1.0/17.3 0.3/12.4
> g=32(128K size) 1.8/36.0 0.2/24.2
> g=64(256K size) 4.3/67.2 1.2/46.4
> g=128(512K size)7.8/93.7 1.3/94.2
> g=256(1M size)  14.7/280.8   1.8/191.5
> g=512(2M size)  3.6/3.2  1.5/2.5
> g=1024(4M size) 2.0/3.1  1.8/2.6 
> 
> Xiang Chen (2):
>   iommu/arm-smmu-v3: Implement the unmap_pages() IOMMU driver callback
>   iommu/arm-smmu-v3: Implement the map_pages() IOMMU driver callback

For both patches:

Acked-by: Will Deacon 

Joerg -- please can you take these directly? They build on top of the
changes from Isaac which you have queued on your 'core' branch.

Cheers,

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2] iommu: Check if group is NULL before remove device

2021-08-02 Thread Joerg Roedel
On Sat, Jul 31, 2021 at 09:47:37AM +0200, Frank Wunderlich wrote:
> Fixes: d72e31c93746 ("iommu: IOMMU Groups")
> Signed-off-by: Frank Wunderlich 
> ---
> v2:
> - commit message with capital letters at the beginning of sentences
> - added more information, many thanks to Yong Wu

Applied, thanks.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 23/24] iommu/arm-smmu: Allow non-strict in pgtable_quirks interface

2021-08-02 Thread Robin Murphy

On 2021-08-02 14:04, Will Deacon wrote:

On Wed, Jul 28, 2021 at 04:58:44PM +0100, Robin Murphy wrote:

To make io-pgtable aware of a flush queue being dynamically enabled,
allow IO_PGTABLE_QUIRK_NON_STRICT to be set even after a domain has been
attached to, and hook up the final piece of the puzzle in iommu-dma.

Signed-off-by: Robin Murphy 
---
  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 15 +++
  drivers/iommu/arm/arm-smmu/arm-smmu.c   | 11 +++
  drivers/iommu/dma-iommu.c   |  3 +++
  3 files changed, 29 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 19400826eba7..40fa9cb382c3 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2711,6 +2711,20 @@ static int arm_smmu_enable_nesting(struct iommu_domain 
*domain)
return ret;
  }
  
+static int arm_smmu_set_pgtable_quirks(struct iommu_domain *domain,

+   unsigned long quirks)
+{
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+
+   if (quirks == IO_PGTABLE_QUIRK_NON_STRICT && smmu_domain->pgtbl_ops) {
+   struct io_pgtable *iop = 
io_pgtable_ops_to_pgtable(smmu_domain->pgtbl_ops);
+
+   iop->cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;
+   return 0;
+   }
+   return -EINVAL;
+}


I don't see anything serialising this against a concurrent iommu_unmap(), so
the ordering and atomicity looks quite suspicious to me here. I don't think
it's just the page-table quirks either, but also setting cookie->fq_domain.


Heh, I confess to very much taking the cheeky "let's say nothing and see 
what Will thinks about concurrency" approach here :)


The beauty of only allowing relaxation in the strict->non-strict 
direction is that it shouldn't need serialising as such - it doesn't 
matter if the update to cookie->fq_domain is observed between 
iommu_unmap() and iommu_dma_free_iova(), since there's no correctness 
impact to queueing IOVAs which may already have been invalidated and may 
or may not have been synced. AFAICS the only condition which matters is 
that the setting of the io-pgtable quirk must observe fq_domain being 
set. It feels like there must be enough dependencies on the read side, 
but we might need an smp_wmb() between the two in iommu_dma_init_fq()?


I've also flip-flopped a bit on whether fq_domain needs to be accessed 
with READ_ONCE/WRITE_ONCE - by the time of posting I'd convinced myself 
that it was probably OK, but looking again now I suppose this wacky 
reordering is theoretically possible:



iommu_dma_unmap() {
bool free_fq = cookie->fq_domain; // == false

iommu_unmap();

if (!cookie->fq_domain) // observes new non-NULL value
iommu_tlb_sync(); // skipped

iommu_dma_free_iova { // inlined
if (free_fq) // false
queue_iova();
else
free_iova_fast(); // Uh-oh!
}
}

so although I still can't see atomicity being a problem I guess we do 
need it for the sake of reordering at least.
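
Concretely, something like the fragments below; snapshotting fq_domain and passing it down as an extra argument is an assumption of the sketch, not the actual patch:

/* __iommu_dma_unmap(), sketched: snapshot once so the sync decision and
 * the free-vs-queue decision cannot disagree */
struct iommu_domain *fq_domain = READ_ONCE(cookie->fq_domain);

iommu_unmap_fast(domain, dma_addr, size, &iotlb_gather);
if (!fq_domain)
	iommu_iotlb_sync(domain, &iotlb_gather);
iommu_dma_free_iova(cookie, dma_addr, size, fq_domain);

/* iommu_dma_init_fq(), sketched: whoever observes the quirk must also
 * observe fq_domain (error handling elided) */
init_iova_flush_queue(iovad, iommu_dma_flush_iotlb_all, iommu_dma_entry_dtor);
WRITE_ONCE(cookie->fq_domain, domain);
smp_wmb();
iommu_set_pgtable_quirks(domain, IO_PGTABLE_QUIRK_NON_STRICT);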


Cheers,
Robin.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 13/13] HV/Storvsc: Add Isolation VM support for storvsc driver

2021-08-02 Thread Tianyu Lan

On 8/2/2021 9:20 PM, Joerg Roedel wrote:

On Wed, Jul 28, 2021 at 10:52:28AM -0400, Tianyu Lan wrote:

In Isolation VM, all shared memory with host needs to mark visible
to host via hvcall. vmbus_establish_gpadl() has already done it for
storvsc rx/tx ring buffer. The page buffer used by vmbus_sendpacket_
mpb_desc() still need to handle. Use DMA API to map/umap these
memory during sending/receiving packet and Hyper-V DMA ops callback
will use swiotlb function to allocate bounce buffer and copy data
from/to bounce buffer.


I am wondering why you dont't use DMA-API unconditionally? It provides
enough abstraction to do the right thing for isolated and legacy VMs.



In VMbus, there is already a similar bounce buffer design, so there is no
need to call the DMA API for such buffers. The DMA API is called in order to
use the swiotlb bounce buffer for those buffers which haven't been covered
yet. This is why the DMA API needs to be called conditionally.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 3/4] dma-iommu: pass SKIP_CPU_SYNC to swiotlb unmap

2021-08-02 Thread Will Deacon
On Fri, Jul 09, 2021 at 12:35:01PM +0900, David Stevens wrote:
> From: David Stevens 
> 
> If SKIP_CPU_SYNC isn't already set, then iommu_dma_unmap_(page|sg) has
> already called iommu_dma_sync_(single|sg)_for_cpu, so there is no need
> to copy from the bounce buffer again.
> 
> Signed-off-by: David Stevens 
> ---
>  drivers/iommu/dma-iommu.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index e79e274d2dc5..0a9a9a343e64 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -505,7 +505,8 @@ static void __iommu_dma_unmap_swiotlb(struct device *dev, 
> dma_addr_t dma_addr,
>   __iommu_dma_unmap(dev, dma_addr, size);
>  
>   if (unlikely(is_swiotlb_buffer(phys)))
> - swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
> + swiotlb_tbl_unmap_single(dev, phys, size, dir,
> +  attrs | DMA_ATTR_SKIP_CPU_SYNC);
>  }

I think it would be cleaner to drop DMA_ATTR_SKIP_CPU_SYNC in the callers
once they've called iommu_dma_sync_*_for_cpu().
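
i.e. roughly this shape in the callers, assuming unmap_page keeps its current structure (sketch, not tested):

static void iommu_dma_unmap_page(struct device *dev, dma_addr_t dma_handle,
		size_t size, enum dma_data_direction dir, unsigned long attrs)
{
	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) {
		iommu_dma_sync_single_for_cpu(dev, dma_handle, size, dir);
		/* the copy-back is done; don't let swiotlb do it again */
		attrs |= DMA_ATTR_SKIP_CPU_SYNC;
	}
	__iommu_dma_unmap_swiotlb(dev, dma_handle, size, dir, attrs);
}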

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 2/4] dma-iommu: fix arch_sync_dma for map with swiotlb

2021-08-02 Thread Will Deacon
On Mon, Aug 02, 2021 at 02:40:59PM +0100, Will Deacon wrote:
> On Fri, Jul 09, 2021 at 12:35:00PM +0900, David Stevens wrote:
> > From: David Stevens 
> > 
> > When calling arch_sync_dma, we need to pass it the memory that's
> > actually being used for dma. When using swiotlb bounce buffers, this is
> > the bounce buffer. Move arch_sync_dma into the __iommu_dma_map_swiotlb
> > helper, so it can use the bounce buffer address if necessary. This also
> > means it is no longer necessary to call iommu_dma_sync_sg_for_device in
> > iommu_dma_map_sg for untrusted devices.
> > 
> > Fixes: 82612d66d51d ("iommu: Allow the dma-iommu api to use bounce buffers")
> > Signed-off-by: David Stevens 
> > ---
> >  drivers/iommu/dma-iommu.c | 16 +++-
> >  1 file changed, 7 insertions(+), 9 deletions(-)
> > 
> > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> > index eac65302439e..e79e274d2dc5 100644
> > --- a/drivers/iommu/dma-iommu.c
> > +++ b/drivers/iommu/dma-iommu.c
> > @@ -574,6 +574,9 @@ static dma_addr_t __iommu_dma_map_swiotlb(struct device 
> > *dev, phys_addr_t phys,
> > memset(padding_start, 0, padding_size);
> > }
> >  
> > +   if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
> > +   arch_sync_dma_for_device(phys, org_size, dir);
> 
> I think this relies on the swiotlb buffers residing in the linear mapping
> (i.e. where phys_to_virt() is reliable), which doesn't look like a safe
> assumption to me.

No, sorry, ignore me here. I misread swiotlb_bounce(), so I think your
change is good.

As an aside, it strikes me that we'd probably be better off using
uncacheable bounce buffers for non-coherent devices so we could avoid all
this maintenance entirely.

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 2/4] dma-iommu: fix arch_sync_dma for map with swiotlb

2021-08-02 Thread Will Deacon
On Fri, Jul 09, 2021 at 12:35:00PM +0900, David Stevens wrote:
> From: David Stevens 
> 
> When calling arch_sync_dma, we need to pass it the memory that's
> actually being used for dma. When using swiotlb bounce buffers, this is
> the bounce buffer. Move arch_sync_dma into the __iommu_dma_map_swiotlb
> helper, so it can use the bounce buffer address if necessary. This also
> means it is no longer necessary to call iommu_dma_sync_sg_for_device in
> iommu_dma_map_sg for untrusted devices.
> 
> Fixes: 82612d66d51d ("iommu: Allow the dma-iommu api to use bounce buffers")
> Signed-off-by: David Stevens 
> ---
>  drivers/iommu/dma-iommu.c | 16 +++-
>  1 file changed, 7 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index eac65302439e..e79e274d2dc5 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -574,6 +574,9 @@ static dma_addr_t __iommu_dma_map_swiotlb(struct device 
> *dev, phys_addr_t phys,
>   memset(padding_start, 0, padding_size);
>   }
>  
> + if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
> + arch_sync_dma_for_device(phys, org_size, dir);

I think this relies on the swiotlb buffers residing in the linear mapping
(i.e. where phys_to_virt() is reliable), which doesn't look like a safe
assumption to me.

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 06/13] HV: Add ghcb hvcall support for SNP VM

2021-08-02 Thread Tianyu Lan

On 8/2/2021 8:39 PM, Joerg Roedel wrote:

On Wed, Jul 28, 2021 at 10:52:21AM -0400, Tianyu Lan wrote:

+   hv_ghcb->ghcb.protocol_version = 1;
+   hv_ghcb->ghcb.ghcb_usage = 1;


The values set to ghcb_usage deserve some defines (here and below).



OK. Will update in the next version.

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 1/4] dma-iommu: fix sync_sg with swiotlb

2021-08-02 Thread Will Deacon
On Fri, Jul 09, 2021 at 12:34:59PM +0900, David Stevens wrote:
> From: David Stevens 
> 
> The is_swiotlb_buffer function takes the physical address of the swiotlb
> buffer, not the physical address of the original buffer. The sglist
> contains the physical addresses of the original buffer, so for the
> sync_sg functions to work properly when a bounce buffer might have been
> used, we need to use iommu_iova_to_phys to look up the physical address.
> This is what sync_single does, so call that function on each sglist
> segment.
> 
> The previous code mostly worked because swiotlb does the transfer on map
> and unmap. However, any callers which use DMA_ATTR_SKIP_CPU_SYNC with
> sglists or which call sync_sg would not have had anything copied to the
> bounce buffer.
> 
> Fixes: 82612d66d51d ("iommu: Allow the dma-iommu api to use bounce buffers")
> Signed-off-by: David Stevens 
> ---
>  drivers/iommu/dma-iommu.c | 26 +-
>  1 file changed, 13 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 7bcdd1205535..eac65302439e 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -811,14 +811,14 @@ static void iommu_dma_sync_sg_for_cpu(struct device 
> *dev,
>   if (dev_is_dma_coherent(dev) && !dev_is_untrusted(dev))
>   return;
>  
> - for_each_sg(sgl, sg, nelems, i) {
> - if (!dev_is_dma_coherent(dev))
> - arch_sync_dma_for_cpu(sg_phys(sg), sg->length, dir);
> -
> - if (is_swiotlb_buffer(sg_phys(sg)))
> + if (dev_is_untrusted(dev))
> + for_each_sg(sgl, sg, nelems, i)
> + iommu_dma_sync_single_for_cpu(dev, sg_dma_address(sg),
> +   sg->length, dir);
> + else
> + for_each_sg(sgl, sg, nelems, i)
>   swiotlb_sync_single_for_cpu(dev, sg_phys(sg),
>   sg->length, dir);

Doesn't this skip arch_sync_dma_for_cpu() for non-coherent trusted devices?

Why not skip the extra dev_is_untrusted(dev) call here and just call
iommu_dma_sync_single_for_cpu() for each entry regardless?
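
i.e. something like the sketch below; the one thing worth double-checking is whether sg_dma_address() is valid for every one of the nelems entries in the non-untrusted case:

static void iommu_dma_sync_sg_for_cpu(struct device *dev,
		struct scatterlist *sgl, int nelems,
		enum dma_data_direction dir)
{
	struct scatterlist *sg;
	int i;

	if (dev_is_dma_coherent(dev) && !dev_is_untrusted(dev))
		return;

	for_each_sg(sgl, sg, nelems, i)
		iommu_dma_sync_single_for_cpu(dev, sg_dma_address(sg),
					      sg->length, dir);
}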

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 03/13] x86/HV: Add new hvcall guest address host visibility support

2021-08-02 Thread Joerg Roedel
On Mon, Aug 02, 2021 at 03:11:40PM +0200, Juergen Gross wrote:
> As those cases are all mutually exclusive, wouldn't a static_call() be
> the appropriate solution?

Right, static_call() is even better, thanks.
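
As a sketch of the static_call() variant (every name below is invented purely for illustration):

#include <linux/static_call.h>

static int enc_status_change_noop(unsigned long vaddr, int npages, bool enc)
{
	return 0;	/* bare metal: nothing to tell a hypervisor */
}

DEFINE_STATIC_CALL(x86_enc_status_change, enc_status_change_noop);

/* boot code for each guest type (Hyper-V shown, hook name made up) */
static void __init hv_vtom_init(void)
{
	static_call_update(x86_enc_status_change, hv_mark_gpa_visibility_change);
}

/* __set_memory_enc_dec() then calls the hook instead of open-coded checks */
static int notify_enc_status_change(unsigned long addr, int numpages, bool enc)
{
	return static_call(x86_enc_status_change)(addr, numpages, enc);
}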
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 13/13] HV/Storvsc: Add Isolation VM support for storvsc driver

2021-08-02 Thread Joerg Roedel
On Wed, Jul 28, 2021 at 10:52:28AM -0400, Tianyu Lan wrote:
> In Isolation VM, all shared memory with host needs to mark visible
> to host via hvcall. vmbus_establish_gpadl() has already done it for
> storvsc rx/tx ring buffer. The page buffer used by vmbus_sendpacket_
> mpb_desc() still need to handle. Use DMA API to map/umap these
> memory during sending/receiving packet and Hyper-V DMA ops callback
> will use swiotlb function to allocate bounce buffer and copy data
> from/to bounce buffer.

I am wondering why you dont't use DMA-API unconditionally? It provides
enough abstraction to do the right thing for isolated and legacy VMs.

Regards,

Joerg
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 05/13] HV: Add Write/Read MSR registers via ghcb page

2021-08-02 Thread Tianyu Lan

On 8/2/2021 8:28 PM, Joerg Roedel wrote:

On Wed, Jul 28, 2021 at 10:52:20AM -0400, Tianyu Lan wrote:

+void hv_ghcb_msr_write(u64 msr, u64 value)
+{
+   union hv_ghcb *hv_ghcb;
+   void **ghcb_base;
+   unsigned long flags;
+
+   if (!ms_hyperv.ghcb_base)
+   return;
+
+   WARN_ON(in_nmi());
+
+   local_irq_save(flags);
+   ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
+   hv_ghcb = (union hv_ghcb *)*ghcb_base;
+   if (!hv_ghcb) {
+   local_irq_restore(flags);
+   return;
+   }
+
+   memset(hv_ghcb, 0x00, HV_HYP_PAGE_SIZE);


Do you really need to zero out the whole 4k? The validation bitmap
should be enough, there are no secrets on the page anyway.
Same in hv_ghcb_msr_read().


OK. Thanks for the suggestion. I will give it a try.
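
If the usual GHCB layout applies here, clearing only the exit code and the validation bitmap should be enough (untested sketch):

static void hv_ghcb_invalidate(struct ghcb *ghcb)
{
	ghcb->save.sw_exit_code = 0;
	memset(ghcb->save.valid_bitmap, 0, sizeof(ghcb->save.valid_bitmap));
}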




+enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb,
+  struct es_em_ctxt *ctxt,
+  u64 exit_code, u64 exit_info_1,
+  u64 exit_info_2)
  {
enum es_result ret;
  
@@ -109,7 +109,16 @@ static enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb,

ghcb_set_sw_exit_info_1(ghcb, exit_info_1);
ghcb_set_sw_exit_info_2(ghcb, exit_info_2);
  
-	sev_es_wr_ghcb_msr(__pa(ghcb));

+   /*
+* Hyper-V runs paravisor with SEV. Ghcb page is allocated by
+* paravisor and not needs to be updated in the Linux guest.
+* Otherwise, the ghcb page's PA reported by paravisor is above
+* VTOM. Hyper-V use this function with NULL for ctxt point and
+* skip setting ghcb page in such case.
+*/
+   if (ctxt)
+   sev_es_wr_ghcb_msr(__pa(ghcb));


No, do not make this function work with ctxt==NULL. Instead, factor out
a helper function which contains what Hyper-V needs and use that in
sev_es_ghcb_hv_call() and Hyper-V code.



OK. Will update.


+union hv_ghcb {
+   struct ghcb ghcb;
+} __packed __aligned(PAGE_SIZE);


I am curious what this will end up being good for.



Hyper-V introduces a specific hypercall request in the GHCB page, and the
Linux Hyper-V code uses the same union to read/write MSRs and to issue the
new hypercall request.

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 03/13] x86/HV: Add new hvcall guest address host visibility support

2021-08-02 Thread Juergen Gross via iommu

On 02.08.21 14:01, Joerg Roedel wrote:

On Wed, Jul 28, 2021 at 08:29:41AM -0700, Dave Hansen wrote:

__set_memory_enc_dec() is turning into a real mess.  SEV, TDX and now
Hyper-V are messing around in here.


I was going to suggest a PV_OPS call where the fitting implementation
for the guest environment can be plugged in at boot. There is TDX and an
SEV(-SNP) case, a Hyper-V case, and likely more coming up from other
cloud/hypervisor vendors. Hiding all these behind feature checks is not
going to make things cleaner.


As those cases are all mutually exclusive, wouldn't a static_call() be
the appropriate solution?


Juergen


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH 04/13] HV: Mark vmbus ring buffer visible to host in Isolation VM

2021-08-02 Thread Tianyu Lan

On 8/2/2021 8:59 PM, Joerg Roedel wrote:

On Mon, Aug 02, 2021 at 08:56:29PM +0800, Tianyu Lan wrote:

Both the second and the third are of HV_GPADL_RING type. One is the send
ring and the other is the receive ring. The driver relies on the order in
which it allocates the rx and tx buffers. You are right that this is not
robust; I will add a mutex to keep the order.


Or you introduce fixed indexes for the RX and TX buffers?



The interface just allocates a buffer, and the driver then configures the
buffer as rx or tx after the call.


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 23/24] iommu/arm-smmu: Allow non-strict in pgtable_quirks interface

2021-08-02 Thread Will Deacon
On Wed, Jul 28, 2021 at 04:58:44PM +0100, Robin Murphy wrote:
> To make io-pgtable aware of a flush queue being dynamically enabled,
> allow IO_PGTABLE_QUIRK_NON_STRICT to be set even after a domain has been
> attached to, and hook up the final piece of the puzzle in iommu-dma.
> 
> Signed-off-by: Robin Murphy 
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 15 +++
>  drivers/iommu/arm/arm-smmu/arm-smmu.c   | 11 +++
>  drivers/iommu/dma-iommu.c   |  3 +++
>  3 files changed, 29 insertions(+)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 19400826eba7..40fa9cb382c3 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -2711,6 +2711,20 @@ static int arm_smmu_enable_nesting(struct iommu_domain 
> *domain)
>   return ret;
>  }
>  
> +static int arm_smmu_set_pgtable_quirks(struct iommu_domain *domain,
> + unsigned long quirks)
> +{
> + struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> +
> + if (quirks == IO_PGTABLE_QUIRK_NON_STRICT && smmu_domain->pgtbl_ops) {
> + struct io_pgtable *iop = 
> io_pgtable_ops_to_pgtable(smmu_domain->pgtbl_ops);
> +
> + iop->cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;
> + return 0;
> + }
> + return -EINVAL;
> +}

I don't see anything serialising this against a concurrent iommu_unmap(), so
the ordering and atomicity looks quite suspicious to me here. I don't think
it's just the page-table quirks either, but also setting cookie->fq_domain.

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 03/13] x86/HV: Add new hvcall guest address host visibility support

2021-08-02 Thread Tianyu Lan




On 8/2/2021 8:01 PM, Joerg Roedel wrote:

On Wed, Jul 28, 2021 at 08:29:41AM -0700, Dave Hansen wrote:

__set_memory_enc_dec() is turning into a real mess.  SEV, TDX and now
Hyper-V are messing around in here.


I was going to suggest a PV_OPS call where the fitting implementation
for the guest environment can be plugged in at boot. There is TDX and an
SEV(-SNP) case, a Hyper-V case, and likely more coming up from other
cloud/hypervisor vendors. Hiding all these behind feature checks is not
going to make things cleaner.



Yes, that makes sense. I will do this in the next version.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 04/13] HV: Mark vmbus ring buffer visible to host in Isolation VM

2021-08-02 Thread Joerg Roedel
On Mon, Aug 02, 2021 at 08:56:29PM +0800, Tianyu Lan wrote:
> Both the second and the third are of HV_GPADL_RING type. One is the send
> ring and the other is the receive ring. The driver relies on the order in
> which it allocates the rx and tx buffers. You are right that this is not
> robust; I will add a mutex to keep the order.

Or you introduce fixed indexes for the RX and TX buffers?
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 07/13] HV/Vmbus: Add SNP support for VMbus channel initiate message

2021-08-02 Thread Joerg Roedel
On Wed, Jul 28, 2021 at 10:52:22AM -0400, Tianyu Lan wrote:
> + if (hv_is_isolation_supported()) {
> + vmbus_connection.monitor_pages_va[0]
> + = vmbus_connection.monitor_pages[0];
> + vmbus_connection.monitor_pages[0]
> + = memremap(msg->monitor_page1, HV_HYP_PAGE_SIZE,
> +MEMREMAP_WB);
> + if (!vmbus_connection.monitor_pages[0])
> + return -ENOMEM;
> +
> + vmbus_connection.monitor_pages_va[1]
> + = vmbus_connection.monitor_pages[1];
> + vmbus_connection.monitor_pages[1]
> + = memremap(msg->monitor_page2, HV_HYP_PAGE_SIZE,
> +MEMREMAP_WB);
> + if (!vmbus_connection.monitor_pages[1]) {
> + memunmap(vmbus_connection.monitor_pages[0]);
> + return -ENOMEM;
> + }
> +
> + memset(vmbus_connection.monitor_pages[0], 0x00,
> +HV_HYP_PAGE_SIZE);
> + memset(vmbus_connection.monitor_pages[1], 0x00,
> +HV_HYP_PAGE_SIZE);
> + }

Okay, let me see if I got this right. In Hyper-V Isolation VMs, when the
guest wants to make memory shared, it does":

- Call to the Hypervisor the mark the pages shared. The
  Hypervisor will do the RMP update and remap the pages at
  (VTOM + pa)

- The guest maps the memory again into its page-table, so that
  the entries point to the correct GPA (which is above VTOM
  now).

Or in other words, Hyper-V implements a hardware-independent and
configurable c-bit position, as the VTOM value is always power-of-two
aligned. Is that correct?
This would at least explain why there is no separate
allocation/deallocation of memory for the shared range.
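
In code terms the model would be roughly the following (the field name is a guess, purely for illustration):

/*
 * VTOM is power-of-two aligned, so it acts like a configurable C-bit: the
 * same page has a private alias below VTOM and a shared alias above it.
 */
static inline u64 hv_shared_gpa(u64 pa)
{
	return pa | ms_hyperv.shared_gpa_boundary;
}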

Thanks,

Joerg
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 07/11] treewide: Replace the use of mem_encrypt_active() with prot_guest_has()

2021-08-02 Thread Christophe Leroy



Le 28/07/2021 à 00:26, Tom Lendacky a écrit :

Replace occurrences of mem_encrypt_active() with calls to prot_guest_has()
with the PATTR_MEM_ENCRYPT attribute.



What about 
https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20210730114231.23445-1-w...@kernel.org/ ?


Christophe




Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Dave Hansen 
Cc: Andy Lutomirski 
Cc: Peter Zijlstra 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: Maarten Lankhorst 
Cc: Maxime Ripard 
Cc: Thomas Zimmermann 
Cc: VMware Graphics 
Cc: Joerg Roedel 
Cc: Will Deacon 
Cc: Dave Young 
Cc: Baoquan He 
Signed-off-by: Tom Lendacky 
---
  arch/x86/kernel/head64.c| 4 ++--
  arch/x86/mm/ioremap.c   | 4 ++--
  arch/x86/mm/mem_encrypt.c   | 5 ++---
  arch/x86/mm/pat/set_memory.c| 3 ++-
  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 4 +++-
  drivers/gpu/drm/drm_cache.c | 4 ++--
  drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 4 ++--
  drivers/gpu/drm/vmwgfx/vmwgfx_msg.c | 6 +++---
  drivers/iommu/amd/iommu.c   | 3 ++-
  drivers/iommu/amd/iommu_v2.c| 3 ++-
  drivers/iommu/iommu.c   | 3 ++-
  fs/proc/vmcore.c| 6 +++---
  kernel/dma/swiotlb.c| 4 ++--
  13 files changed, 29 insertions(+), 24 deletions(-)

diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index de01903c3735..cafed6456d45 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -19,7 +19,7 @@
  #include 
  #include 
  #include 
-#include 
+#include 
  #include 
  
  #include 

@@ -285,7 +285,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
 * there is no need to zero it after changing the memory encryption
 * attribute.
 */
-   if (mem_encrypt_active()) {
+   if (prot_guest_has(PATTR_MEM_ENCRYPT)) {
vaddr = (unsigned long)__start_bss_decrypted;
vaddr_end = (unsigned long)__end_bss_decrypted;
for (; vaddr < vaddr_end; vaddr += PMD_SIZE) {
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 0f2d5ace5986..5e1c1f5cbbe8 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -693,7 +693,7 @@ static bool __init 
early_memremap_is_setup_data(resource_size_t phys_addr,
  bool arch_memremap_can_ram_remap(resource_size_t phys_addr, unsigned long 
size,
 unsigned long flags)
  {
-   if (!mem_encrypt_active())
+   if (!prot_guest_has(PATTR_MEM_ENCRYPT))
return true;
  
  	if (flags & MEMREMAP_ENC)

@@ -723,7 +723,7 @@ pgprot_t __init 
early_memremap_pgprot_adjust(resource_size_t phys_addr,
  {
bool encrypted_prot;
  
-	if (!mem_encrypt_active())

+   if (!prot_guest_has(PATTR_MEM_ENCRYPT))
return prot;
  
  	encrypted_prot = true;

diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 451de8e84fce..0f1533dbe81c 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -364,8 +364,7 @@ int __init early_set_memory_encrypted(unsigned long vaddr, 
unsigned long size)
  /*
   * SME and SEV are very similar but they are not the same, so there are
   * times that the kernel will need to distinguish between SME and SEV. The
- * sme_active() and sev_active() functions are used for this.  When a
- * distinction isn't needed, the mem_encrypt_active() function can be used.
+ * sme_active() and sev_active() functions are used for this.
   *
   * The trampoline code is a good example for this requirement.  Before
   * paging is activated, SME will access all memory as decrypted, but SEV
@@ -451,7 +450,7 @@ void __init mem_encrypt_free_decrypted_mem(void)
 * The unused memory range was mapped decrypted, change the encryption
 * attribute from decrypted to encrypted before freeing it.
 */
-   if (mem_encrypt_active()) {
+   if (sme_me_mask) {
r = set_memory_encrypted(vaddr, npages);
if (r) {
pr_warn("failed to free unused decrypted pages\n");
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index ad8a5c586a35..6925f2bb4be1 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -18,6 +18,7 @@
  #include 
  #include 
  #include 
+#include 
  
  #include 

  #include 
@@ -1986,7 +1987,7 @@ static int __set_memory_enc_dec(unsigned long addr, int 
numpages, bool enc)
int ret;
  
  	/* Nothing to do if memory encryption is not active */

-   if (!mem_encrypt_active())
+   if (!prot_guest_has(PATTR_MEM_ENCRYPT))
return 0;
  
  	/* Should not be working on unaligned addresses */

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index abb928894eac..8407224717df 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -38,6 +38,7 @@
  #i

Re: [PATCH 04/13] HV: Mark vmbus ring buffer visible to host in Isolation VM

2021-08-02 Thread Tianyu Lan




On 8/2/2021 8:07 PM, Joerg Roedel wrote:

On Wed, Jul 28, 2021 at 10:52:19AM -0400, Tianyu Lan wrote:

+   if (type == HV_GPADL_BUFFER)
+   index = 0;
+   else
+   index = channel->gpadl_range[1].gpadlhandle ? 2 : 1;


Hmm... This doesn't look very robust. Can you set fixed indexes for
different buffer types? HV_GPADL_BUFFER already has fixed index 0. But
as it is implemented here you risk that index 2 gets overwritten by
subsequent calls.


Both the second and the third are of HV_GPADL_RING type. One is the send
ring and the other is the receive ring. The driver relies on the order in
which it allocates the rx and tx buffers. You are right that this is not
robust; I will add a mutex to keep the order.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 06/13] HV: Add ghcb hvcall support for SNP VM

2021-08-02 Thread Joerg Roedel
On Wed, Jul 28, 2021 at 10:52:21AM -0400, Tianyu Lan wrote:
> + hv_ghcb->ghcb.protocol_version = 1;
> + hv_ghcb->ghcb.ghcb_usage = 1;

The values set to ghcb_usage deserve some defines (here and below).

> +
> + hv_ghcb->hypercall.outputgpa = (u64)output;
> + hv_ghcb->hypercall.hypercallinput.asuint64 = 0;
> + hv_ghcb->hypercall.hypercallinput.callcode = control;
> +
> + if (input_size)
> + memcpy(hv_ghcb->hypercall.hypercalldata, input, input_size);
> +
> + VMGEXIT();
> +
> + hv_ghcb->ghcb.ghcb_usage = 0x;

...

>  union hv_ghcb {
>   struct ghcb ghcb;
> + struct {
> + u64 hypercalldata[509];
> + u64 outputgpa;
> + union {
> + union {
> + struct {
> + u32 callcode: 16;
> + u32 isfast  : 1;
> + u32 reserved1   : 14;
> + u32 isnested: 1;
> + u32 countofelements : 12;
> + u32 reserved2   : 4;
> + u32 repstartindex   : 12;
> + u32 reserved3   : 4;
> + };
> + u64 asuint64;
> + } hypercallinput;
> + union {
> + struct {
> + u16 callstatus;
> + u16 reserved1;
> + u32 elementsprocessed : 12;
> + u32 reserved2 : 20;
> + };
> + u64 asunit64;
> + } hypercalloutput;
> + };
> + u64 reserved2;
> + } hypercall;

Okay, this answers my previous question :)

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 01/13] x86/HV: Initialize GHCB page in Isolation VM

2021-08-02 Thread Tianyu Lan

Hi Joerg:
 Thanks for your review.


On 8/2/2021 7:53 PM, Joerg Roedel wrote:

On Wed, Jul 28, 2021 at 10:52:16AM -0400, Tianyu Lan wrote:

+static int hyperv_init_ghcb(void)
+{
+   u64 ghcb_gpa;
+   void *ghcb_va;
+   void **ghcb_base;
+
+   if (!ms_hyperv.ghcb_base)
+   return -EINVAL;
+
+   rdmsrl(MSR_AMD64_SEV_ES_GHCB, ghcb_gpa);
+   ghcb_va = memremap(ghcb_gpa, HV_HYP_PAGE_SIZE, MEMREMAP_WB);


This deserves a comment. As I understand it, the GHCB pa is set by
Hyper-V or the paravisor, so the page does not need to be allocated by
Linux.
And it is not mapped unencrypted because the GHCB page is allocated
above the VTOM boundary?


You are right. The GHCB page is allocated by the paravisor and its physical
address is above the VTOM boundary. I will add a comment to describe this.

Thanks for the suggestion.




@@ -167,6 +190,31 @@ static int hv_cpu_die(unsigned int cpu)
  {
struct hv_reenlightenment_control re_ctrl;
unsigned int new_cpu;
+   unsigned long flags;
+   void **input_arg;
+   void *pg;
+   void **ghcb_va = NULL;
+
+   local_irq_save(flags);
+   input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
+   pg = *input_arg;


Pg is never used later on, why is it set?


Sorry for the noise. This should have been removed during the rebase; I will
fix it in the next version.

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 05/13] HV: Add Write/Read MSR registers via ghcb page

2021-08-02 Thread Joerg Roedel
On Wed, Jul 28, 2021 at 10:52:20AM -0400, Tianyu Lan wrote:
> +void hv_ghcb_msr_write(u64 msr, u64 value)
> +{
> + union hv_ghcb *hv_ghcb;
> + void **ghcb_base;
> + unsigned long flags;
> +
> + if (!ms_hyperv.ghcb_base)
> + return;
> +
> + WARN_ON(in_nmi());
> +
> + local_irq_save(flags);
> + ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
> + hv_ghcb = (union hv_ghcb *)*ghcb_base;
> + if (!hv_ghcb) {
> + local_irq_restore(flags);
> + return;
> + }
> +
> + memset(hv_ghcb, 0x00, HV_HYP_PAGE_SIZE);

Do you really need to zero out the whole 4k? The validation bitmap
should be enough, there are no secrets on the page anyway.
Same in hv_ghcb_msr_read().

> +enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb,
> +struct es_em_ctxt *ctxt,
> +u64 exit_code, u64 exit_info_1,
> +u64 exit_info_2)
>  {
>   enum es_result ret;
>  
> @@ -109,7 +109,16 @@ static enum es_result sev_es_ghcb_hv_call(struct ghcb 
> *ghcb,
>   ghcb_set_sw_exit_info_1(ghcb, exit_info_1);
>   ghcb_set_sw_exit_info_2(ghcb, exit_info_2);
>  
> - sev_es_wr_ghcb_msr(__pa(ghcb));
> + /*
> +  * Hyper-V runs paravisor with SEV. Ghcb page is allocated by
> +  * paravisor and not needs to be updated in the Linux guest.
> +  * Otherwise, the ghcb page's PA reported by paravisor is above
> +  * VTOM. Hyper-V use this function with NULL for ctxt point and
> +  * skip setting ghcb page in such case.
> +  */
> + if (ctxt)
> + sev_es_wr_ghcb_msr(__pa(ghcb));

No, do not make this function work with ctxt==NULL. Instead, factor out
a helper function which contains what Hyper-V needs and use that in
sev_es_ghcb_hv_call() and Hyper-V code.
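
Something along these lines, as a sketch (helper name invented, protocol/usage setup omitted):

static void ghcb_do_vmgexit(struct ghcb *ghcb, u64 exit_code,
			    u64 exit_info_1, u64 exit_info_2)
{
	ghcb_set_sw_exit_code(ghcb, exit_code);
	ghcb_set_sw_exit_info_1(ghcb, exit_info_1);
	ghcb_set_sw_exit_info_2(ghcb, exit_info_2);
	VMGEXIT();
}

static enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb,
					  struct es_em_ctxt *ctxt,
					  u64 exit_code, u64 exit_info_1,
					  u64 exit_info_2)
{
	/* the SEV path keeps writing the GHCB MSR itself */
	sev_es_wr_ghcb_msr(__pa(ghcb));
	ghcb_do_vmgexit(ghcb, exit_code, exit_info_1, exit_info_2);
	return verify_exception_info(ghcb, ctxt);
}

while the Hyper-V code would call ghcb_do_vmgexit() directly and never needs a ctxt.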

> +union hv_ghcb {
> + struct ghcb ghcb;
> +} __packed __aligned(PAGE_SIZE);

I am curious what this will end up being good for.

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 04/13] HV: Mark vmbus ring buffer visible to host in Isolation VM

2021-08-02 Thread Joerg Roedel
On Wed, Jul 28, 2021 at 10:52:19AM -0400, Tianyu Lan wrote:
> + if (type == HV_GPADL_BUFFER)
> + index = 0;
> + else
> + index = channel->gpadl_range[1].gpadlhandle ? 2 : 1;

Hmm... This doesn't look very robust. Can you set fixed indexes for
different buffer types? HV_GPADL_BUFFER already has fixed index 0. But
as it is implemented here you risk that index 2 gets overwritten by
subsequent calls.
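
For illustration, fixed slots could look like:

/* fixed slots instead of "first free index wins" */
enum hv_gpadl_index {
	HV_GPADL_BUFFER_IDX	= 0,
	HV_GPADL_RING_TX_IDX	= 1,
	HV_GPADL_RING_RX_IDX	= 2,
};

Callers would then pass the slot explicitly, so allocation order no longer matters.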

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 03/13] x86/HV: Add new hvcall guest address host visibility support

2021-08-02 Thread Joerg Roedel
On Wed, Jul 28, 2021 at 08:29:41AM -0700, Dave Hansen wrote:
> __set_memory_enc_dec() is turning into a real mess.  SEV, TDX and now
> Hyper-V are messing around in here.

I was going to suggest a PV_OPS call where the fitting implementation
for the guest environment can be plugged in at boot. There is TDX and an
SEV(-SNP) case, a Hyper-V case, and likely more coming up from other
cloud/hypervisor vendors. Hiding all these behind feature checks is not
going to make things cleaner.

Regards,

Joerg
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 01/13] x86/HV: Initialize GHCB page in Isolation VM

2021-08-02 Thread Joerg Roedel
On Wed, Jul 28, 2021 at 10:52:16AM -0400, Tianyu Lan wrote:
> +static int hyperv_init_ghcb(void)
> +{
> + u64 ghcb_gpa;
> + void *ghcb_va;
> + void **ghcb_base;
> +
> + if (!ms_hyperv.ghcb_base)
> + return -EINVAL;
> +
> + rdmsrl(MSR_AMD64_SEV_ES_GHCB, ghcb_gpa);
> + ghcb_va = memremap(ghcb_gpa, HV_HYP_PAGE_SIZE, MEMREMAP_WB);

This deserves a comment. As I understand it, the GHCB pa is set by
Hyper-V or the paravisor, so the page does not need to be allocated by
Linux.
And it is not mapped unencrypted because the GHCB page is allocated
above the VTOM boundary?
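
Something along these lines would do, if that reading is correct (the
wording below is an assumption for the patch author to confirm):

rdmsrl(MSR_AMD64_SEV_ES_GHCB, ghcb_gpa);
/*
 * The GHCB page is allocated by Hyper-V/the paravisor and its GPA is
 * advertised to the guest via MSR_AMD64_SEV_ES_GHCB.  It lives above
 * the VTOM boundary, i.e. it is already host-visible, so it only has
 * to be remapped here rather than allocated and decrypted by Linux.
 */
ghcb_va = memremap(ghcb_gpa, HV_HYP_PAGE_SIZE, MEMREMAP_WB);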

> @@ -167,6 +190,31 @@ static int hv_cpu_die(unsigned int cpu)
>  {
>   struct hv_reenlightenment_control re_ctrl;
>   unsigned int new_cpu;
> + unsigned long flags;
> + void **input_arg;
> + void *pg;
> + void **ghcb_va = NULL;
> +
> + local_irq_save(flags);
> + input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
> + pg = *input_arg;

pg is never used later on, so why is it set?



Re: [PATCH] powerpc/svm: Don't issue ultracalls if !mem_encrypt_active()

2021-08-02 Thread Michael Ellerman
Will Deacon  writes:
> Commit ad6c00283163 ("swiotlb: Free tbl memory in swiotlb_exit()")
> introduced a set_memory_encrypted() call to swiotlb_exit() so that the
> buffer pages are returned to an encrypted state prior to being freed.
>
> Sachin reports that this leads to the following crash on a Power server:
>
> [0.010799] software IO TLB: tearing down default memory pool
> [0.010805] [ cut here ]
> [0.010808] kernel BUG at arch/powerpc/kernel/interrupt.c:98!
>
> Nick spotted that this is because set_memory_encrypted() is issuing an
> ultracall which doesn't exist for the processor, and should therefore
> be gated by mem_encrypt_active() to mirror the x86 implementation.
>
> Cc: Konrad Rzeszutek Wilk 
> Cc: Claire Chang 
> Cc: Christoph Hellwig 
> Cc: Robin Murphy 
> Fixes: ad6c00283163 ("swiotlb: Free tbl memory in swiotlb_exit()")
> Suggested-by: Nicholas Piggin 
> Reported-by: Sachin Sant 
> Tested-by: Sachin Sant 
> Tested-by: Nathan Chancellor 
> Link: https://lore.kernel.org/r/1905cd70-7656-42ae-99e2-a31fc3812...@linux.vnet.ibm.com/
> Signed-off-by: Will Deacon 
> ---
>  arch/powerpc/platforms/pseries/svm.c | 6 ++
>  1 file changed, 6 insertions(+)
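
For reference, a hedged sketch of the gating described above, applied to
the existing helper in arch/powerpc/platforms/pseries/svm.c (the exact
placement of the check is an assumption):

int set_memory_encrypted(unsigned long addr, int numpages)
{
        /* Skip the ultracall entirely when memory encryption is not active. */
        if (!mem_encrypt_active())
                return 0;

        if (!PAGE_ALIGNED(addr))
                return -EINVAL;

        uv_unshare_page(PHYS_PFN(__pa(addr)), numpages);

        return 0;
}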

Thanks.

Acked-by: Michael Ellerman 


I assume Konrad will take this via the swiotlb tree?

cheers


Re: [PATCH 0/3] iommu/drm/msm: Allow non-coherent masters to use system cache

2021-08-02 Thread Will Deacon
On Thu, Jul 29, 2021 at 10:08:22AM +0530, Sai Prakash Ranjan wrote:
> On 2021-07-28 19:30, Georgi Djakov wrote:
> > On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash Ranjan wrote:
> > > commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY flag")
> > > removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it went
> > > the memory type setting required for the non-coherent masters to use
> > > system cache. Now that system cache support for GPU is added, we will
> > > need to set the right PTE attribute for GPU buffers to be sys cached.
> > > Without this, the system cache lines are not allocated for GPU.
> > > 
> > > So the patches in this series introduces a new prot flag IOMMU_LLC,
> > > renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to IO_PGTABLE_QUIRK_PTW_LLC
> > > and makes GPU the user of this protection flag.
> > 
> > Thank you for the patchset! Are you planning to refresh it, as it does
> > not apply anymore?
> > 
> 
> I was waiting on Will's reply [1]. If there are no changes needed, then
> I can repost the patch.

I still think you need to handle the mismatched alias, no? You're adding
a new memory type to the SMMU which doesn't exist on the CPU side. That
can't be right.

Will


Re: [PATCH 08/11] mm: Remove the now unused mem_encrypt_active() function

2021-08-02 Thread Joerg Roedel
On Tue, Jul 27, 2021 at 05:26:11PM -0500, Tom Lendacky wrote:
> The mem_encrypt_active() function has been replaced by prot_guest_has(),
> so remove the implementation.
> 
> Signed-off-by: Tom Lendacky 

Reviewed-by: Joerg Roedel 


Re: [PATCH 09/11] x86/sev: Remove the now unused mem_encrypt_active() function

2021-08-02 Thread Joerg Roedel
On Tue, Jul 27, 2021 at 05:26:12PM -0500, Tom Lendacky wrote:
> The mem_encrypt_active() function has been replaced by prot_guest_has(),
> so remove the implementation.
> 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: Borislav Petkov 
> Signed-off-by: Tom Lendacky 

Reviewed-by: Joerg Roedel 


Re: [PATCH 06/11] x86/sev: Replace occurrences of sev_es_active() with prot_guest_has()

2021-08-02 Thread Joerg Roedel
On Tue, Jul 27, 2021 at 05:26:09PM -0500, Tom Lendacky wrote:
> @@ -48,7 +47,7 @@ static void sme_sev_setup_real_mode(struct trampoline_header *th)
>   if (prot_guest_has(PATTR_HOST_MEM_ENCRYPT))
>   th->flags |= TH_FLAGS_SME_ACTIVE;
>  
> - if (sev_es_active()) {
> + if (prot_guest_has(PATTR_GUEST_PROT_STATE)) {
>   /*
>* Skip the call to verify_cpu() in secondary_startup_64 as it
>* will cause #VC exceptions when the AP can't handle them yet.

I am not sure how TDX will handle AP booting; are you sure it needs
this special setup as well? Otherwise, a check specifically for SEV-ES
would be better than the generic PATTR_GUEST_PROT_STATE.
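
In other words something like the below, assuming the series provides an
SEV-ES-specific attribute (PATTR_SEV_ES is a guess at the series'
naming; the body is reproduced from memory of the existing SEV-ES path
and should be double-checked):

if (prot_guest_has(PATTR_SEV_ES)) {
        /*
         * Skip the call to verify_cpu() in secondary_startup_64, as it
         * would raise #VC exceptions the AP cannot handle yet.
         */
        th->start = (u64)secondary_startup_64_no_verify;
}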

Regards,

Joerg


Re: [PATCH 05/11] x86/sev: Replace occurrences of sev_active() with prot_guest_has()

2021-08-02 Thread Joerg Roedel
On Tue, Jul 27, 2021 at 05:26:08PM -0500, Tom Lendacky wrote:
> Replace occurrences of sev_active() with the more generic prot_guest_has()
> using PATTR_GUEST_MEM_ENCRYPT, except for in arch/x86/mm/mem_encrypt*.c
> where PATTR_SEV will be used. If future support is added for other memory
> encryption technologies, the use of PATTR_GUEST_MEM_ENCRYPT can be
> updated, as required, to use PATTR_SEV.
> 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: Borislav Petkov 
> Cc: Dave Hansen 
> Cc: Andy Lutomirski 
> Cc: Peter Zijlstra 
> Cc: Ard Biesheuvel 
> Signed-off-by: Tom Lendacky 

Reviewed-by: Joerg Roedel 


Re: [PATCH 04/11] x86/sme: Replace occurrences of sme_active() with prot_guest_has()

2021-08-02 Thread Joerg Roedel
On Tue, Jul 27, 2021 at 05:26:07PM -0500, Tom Lendacky wrote:
> Replace occurrences of sme_active() with the more generic prot_guest_has()
> using PATTR_HOST_MEM_ENCRYPT, except for in arch/x86/mm/mem_encrypt*.c
> where PATTR_SME will be used. If future support is added for other memory
> encryption technologies, the use of PATTR_HOST_MEM_ENCRYPT can be
> updated, as required, to use PATTR_SME.
> 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: Borislav Petkov 
> Cc: Dave Hansen 
> Cc: Andy Lutomirski 
> Cc: Peter Zijlstra 
> Cc: Joerg Roedel 
> Cc: Will Deacon 
> Signed-off-by: Tom Lendacky 

Reviewed-by: Joerg Roedel 


Re: [PATCH 02/11] x86/sev: Add an x86 version of prot_guest_has()

2021-08-02 Thread Joerg Roedel
On Tue, Jul 27, 2021 at 05:26:05PM -0500, Tom Lendacky wrote:
> Introduce an x86 version of the prot_guest_has() function. This will be
> used in the more generic x86 code to replace vendor specific calls like
> sev_active(), etc.
> 
> While the name suggests this is intended mainly for guests, it will
> also be used for host memory encryption checks in place of sme_active().
> 
> The amd_prot_guest_has() function does not use EXPORT_SYMBOL_GPL for the
> same reasons previously stated when changing sme_active(), sev_active() and
> sme_me_mask to EXPORT_SYMBOL:
>   commit 87df26175e67 ("x86/mm: Unbreak modules that rely on external PAGE_KERNEL availability")
>   commit 9d5f38ba6c82 ("x86/mm: Unbreak modules that use the DMA API")
> 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: Borislav Petkov 
> Cc: Dave Hansen 
> Cc: Andy Lutomirski 
> Cc: Peter Zijlstra 
> Co-developed-by: Andi Kleen 
> Signed-off-by: Andi Kleen 
> Co-developed-by: Kuppuswamy Sathyanarayanan 
> 
> Signed-off-by: Kuppuswamy Sathyanarayanan 
> 
> Signed-off-by: Tom Lendacky 

Reviewed-by: Joerg Roedel 


Re: [PATCH 01/11] mm: Introduce a function to check for virtualization protection features

2021-08-02 Thread Joerg Roedel
On Tue, Jul 27, 2021 at 05:26:04PM -0500, Tom Lendacky wrote:
> In prep for other protected virtualization technologies, introduce a
> generic helper function, prot_guest_has(), that can be used to check
> for specific protection attributes, like memory encryption. This is
> intended to eliminate having to add multiple technology-specific checks
> to the code (e.g. if (sev_active() || tdx_active())).
> 
> Co-developed-by: Andi Kleen 
> Signed-off-by: Andi Kleen 
> Co-developed-by: Kuppuswamy Sathyanarayanan 
> 
> Signed-off-by: Kuppuswamy Sathyanarayanan 
> 
> Signed-off-by: Tom Lendacky 

Reviewed-by: Joerg Roedel 
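
For illustration, a hedged before/after sketch of a caller; the call
site below is hypothetical, and PATTR_GUEST_MEM_ENCRYPT is one of the
attributes introduced later in the quoted series:

/* Before: one explicit check per technology keeps accreting. */
if (sev_active() || tdx_active())
        swiotlb_force = SWIOTLB_FORCE;

/* After: a single generic attribute check. */
if (prot_guest_has(PATTR_GUEST_MEM_ENCRYPT))
        swiotlb_force = SWIOTLB_FORCE;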


Re: [PATCH v2 13/21] s390/pci: don't set failed sg dma_address to DMA_MAPPING_ERROR

2021-08-02 Thread Niklas Schnelle
On Fri, 2021-07-23 at 11:50 -0600, Logan Gunthorpe wrote:
> Setting the ->dma_address to DMA_MAPPING_ERROR is not part of
> the ->map_sg calling convention, so remove it.
> 
> Link: https://lore.kernel.org/linux-mips/20210716063241.gc13...@lst.de/
> Suggested-by: Christoph Hellwig 
> Signed-off-by: Logan Gunthorpe 
> Cc: Niklas Schnelle 
> Cc: Gerald Schaefer 
> Cc: Heiko Carstens 
> Cc: Vasily Gorbik 
> Cc: Christian Borntraeger 
> ---
>  arch/s390/pci/pci_dma.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/arch/s390/pci/pci_dma.c b/arch/s390/pci/pci_dma.c
> index c78b02012764..be48e5b5bfcf 100644
> --- a/arch/s390/pci/pci_dma.c
> +++ b/arch/s390/pci/pci_dma.c
> @@ -492,7 +492,6 @@ static int s390_dma_map_sg(struct device *dev, struct scatterlist *sg,
>   for (i = 1; i < nr_elements; i++) {
>   s = sg_next(s);
>  
> - s->dma_address = DMA_MAPPING_ERROR;
>   s->dma_length = 0;
>  
>   if (s->offset || (size & ~PAGE_MASK) ||

Acked-by: Niklas Schnelle 

Thanks!



Re: [PATCH v7 00/12] Clean up "mediatek,larb"

2021-08-02 Thread Joerg Roedel
On Fri, Jul 30, 2021 at 10:52:26AM +0800, Yong Wu wrote:
>  .../display/mediatek/mediatek,disp.txt|  9 
>  .../bindings/media/mediatek-jpeg-decoder.yaml |  9 
>  .../bindings/media/mediatek-jpeg-encoder.yaml |  9 
>  .../bindings/media/mediatek-mdp.txt   |  8 
>  .../bindings/media/mediatek-vcodec.txt|  4 --
>  arch/arm/boot/dts/mt2701.dtsi |  2 -
>  arch/arm/boot/dts/mt7623n.dtsi|  5 --
>  arch/arm64/boot/dts/mediatek/mt8173.dtsi  | 16 ---
>  arch/arm64/boot/dts/mediatek/mt8183.dtsi  |  6 ---
>  drivers/gpu/drm/mediatek/mtk_disp_ovl.c   |  9 +++-
>  drivers/gpu/drm/mediatek/mtk_disp_rdma.c  |  9 +++-
>  drivers/gpu/drm/mediatek/mtk_drm_crtc.c   | 15 +++---
>  drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.c   | 36 +--
>  drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.h   |  1 -
>  drivers/gpu/drm/mediatek/mtk_drm_drv.c|  5 +-
>  drivers/iommu/mtk_iommu.c | 24 +-
>  drivers/iommu/mtk_iommu_v1.c  | 31 -
>  .../media/platform/mtk-jpeg/mtk_jpeg_core.c   | 45 +-
>  .../media/platform/mtk-jpeg/mtk_jpeg_core.h   |  2 -
>  drivers/media/platform/mtk-mdp/mtk_mdp_comp.c | 46 +--
>  drivers/media/platform/mtk-mdp/mtk_mdp_comp.h |  2 -
>  drivers/media/platform/mtk-mdp/mtk_mdp_core.c |  1 -
>  .../platform/mtk-vcodec/mtk_vcodec_dec_pm.c   | 37 ++-
>  .../platform/mtk-vcodec/mtk_vcodec_drv.h  |  3 --
>  .../platform/mtk-vcodec/mtk_vcodec_enc.c  |  1 -
>  .../platform/mtk-vcodec/mtk_vcodec_enc_pm.c   | 44 ++
>  drivers/memory/mtk-smi.c  | 14 --
>  include/soc/mediatek/smi.h| 20 
>  28 files changed, 92 insertions(+), 321 deletions(-)

So this is likely not going through the IOMMU tree. Given that Matthias
has reviewed the IOMMU changes, you can add my Acked-by: Joerg Roedel 



Re: [PATCH v6 0/7] iommu/amd: Enable page-selective flushes

2021-08-02 Thread Joerg Roedel
On Fri, Jul 23, 2021 at 02:32:02AM -0700, Nadav Amit wrote:
> Nadav Amit (6):
>   iommu/amd: Selective flush on unmap
>   iommu/amd: Do not use flush-queue when NpCache is on
>   iommu: Factor iommu_iotlb_gather_is_disjoint() out
>   iommu/amd: Tailored gather logic for AMD
>   iommu/amd: Sync once for scatter-gather operations
>   iommu/amd: Use only natural aligned flushes in a VM
> 
> Robin Murphy (1):
>   iommu: Improve iommu_iotlb_gather helpers

Applied, thanks Nadav.


Re: [PATCH v6 0/9] ACPI/IORT: Support for IORT RMR node

2021-08-02 Thread j...@8bytes.org
On Tue, Jul 27, 2021 at 06:51:56AM +, Shameerali Kolothum Thodi wrote:
> A gentle ping on this...

This needs more reviews. Please also add

Will Deacon 

when you post the next version of this patch set.

Regards,

Joerg