Re: [PATCH v2 2/2] drm/msm: Hangcheck progress detection

2022-11-03 Thread Rob Clark
On Wed, Nov 2, 2022 at 4:26 PM Dmitry Baryshkov
 wrote:
>
> On 02/11/2022 01:33, Rob Clark wrote:
> > From: Rob Clark 
> >
> > If the hangcheck timer expires, check if the fw's position in the
> > cmdstream has advanced (changed) since last timer expiration, and
> > allow it up to three additional "extensions" to its allotted time.
> > The intention is to continue to catch "shader stuck in a loop" type
> > hangs quickly, but allow more time for things that are actually
> > making forward progress.
>
> Just out of curiosity: wouldn't position also change for a 'shader stuck
> in a loop'?

There is some pipelining, in that there can be a couple draws in
flight at the same time, and SQE is running ahead of that, but with a
shader stuck in a loop the associated draw will not complete, and that
will halt forward progress through the cmdstream.  Basically what this
is doing is detecting that forward progress through the cmdstream has
stopped.

BR,
-R

>
> > Because we need to sample the CP state twice to detect if there has
> > not been progress, this also cuts the timer's duration in half.
> >
> > v2: Fix typo (REG_A6XX_CP_CSQ_IB2_STAT), add comment
> >
> > Signed-off-by: Rob Clark 
> > Reviewed-by: Akhil P Oommen 
>
>
>
> --
> With best wishes
> Dmitry
>
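
The mechanism in brief, as a simplified composite of the
a6xx_progress()/made_progress() pair from the patches quoted in this
thread -- a sketch for illustration, not the exact driver code:

static bool ring_made_progress(struct msm_gpu *gpu,
                               struct msm_ringbuffer *ring)
{
        /* Snapshot where the CP currently is in the cmdstream */
        struct msm_cp_state cp_state = {
                .ib1_base = gpu_read64(gpu, REG_A6XX_CP_IB1_BASE),
                .ib2_base = gpu_read64(gpu, REG_A6XX_CP_IB2_BASE),
                .ib1_rem  = gpu_read(gpu, REG_A6XX_CP_IB1_REM_SIZE),
                .ib2_rem  = gpu_read(gpu, REG_A6XX_CP_IB2_REM_SIZE),
        };
        bool progress = !!memcmp(&cp_state, &ring->last_cp_state,
                                 sizeof(cp_state));

        ring->last_cp_state = cp_state;

        /*
         * A draw that never completes (shader stuck in a loop) freezes
         * this state, so such hangs are still declared after two of the
         * now half-length timer periods, while a job that keeps moving
         * through the cmdstream earns a bounded number of extensions.
         */
        if (!progress ||
            ring->hangcheck_progress_retries >= DRM_MSM_HANGCHECK_PROGRESS_RETRIES)
                return false;

        ring->hangcheck_progress_retries++;
        return true;
}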


Re: [PATCH v4 1/2] drm/msm: move domain allocation into msm_iommu_new()

2022-11-02 Thread Rob Clark
On Wed, Nov 2, 2022 at 10:54 AM Dmitry Baryshkov
 wrote:
>
> After the msm_iommu instance is created, the IOMMU domain is completely
> handled inside the msm_iommu code. Move the iommu_domain_alloc() call
> into the msm_iommu_new() to simplify callers code.
>
> Reported-by: kernel test robot 
> Signed-off-by: Dmitry Baryshkov 

Reviewed-by: Rob Clark 

> ---
>  drivers/gpu/drm/msm/adreno/a6xx_gmu.c| 12 +---
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c| 23 +++---
>  drivers/gpu/drm/msm/adreno/adreno_gpu.c  | 25 +---
>  drivers/gpu/drm/msm/adreno/adreno_gpu.h  |  2 --
>  drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c | 19 +-
>  drivers/gpu/drm/msm/msm_drv.c| 18 -
>  drivers/gpu/drm/msm/msm_iommu.c  | 20 ---
>  drivers/gpu/drm/msm/msm_mmu.h|  3 ++-
>  8 files changed, 62 insertions(+), 60 deletions(-)
>
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c 
> b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> index e033d6a67a20..6484b97c5344 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> @@ -1213,19 +1213,17 @@ static int a6xx_gmu_memory_alloc(struct a6xx_gmu 
> *gmu, struct a6xx_gmu_bo *bo,
>
>  static int a6xx_gmu_memory_probe(struct a6xx_gmu *gmu)
>  {
> -   struct iommu_domain *domain;
> struct msm_mmu *mmu;
>
> -   domain = iommu_domain_alloc(&platform_bus_type);
> -   if (!domain)
> +   mmu = msm_iommu_new(gmu->dev, 0);
> +   if (!mmu)
> return -ENODEV;
> +   if (IS_ERR(mmu))
> +   return PTR_ERR(mmu);
>
> -   mmu = msm_iommu_new(gmu->dev, domain);
> gmu->aspace = msm_gem_address_space_create(mmu, "gmu", 0x0, 
> 0x80000000);
> -   if (IS_ERR(gmu->aspace)) {
> -   iommu_domain_free(domain);
> +   if (IS_ERR(gmu->aspace))
> return PTR_ERR(gmu->aspace);
> -   }
>
> return 0;
>  }
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
> b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index fdc578016e0b..db4b3a48c708 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -1786,35 +1786,34 @@ a6xx_create_address_space(struct msm_gpu *gpu, struct 
> platform_device *pdev)
>  {
> struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> -   struct iommu_domain *iommu;
> +   struct iommu_domain_geometry *geometry;
> struct msm_mmu *mmu;
> struct msm_gem_address_space *aspace;
> u64 start, size;
> -
> -   iommu = iommu_domain_alloc(&platform_bus_type);
> -   if (!iommu)
> -   return NULL;
> +   unsigned long quirks = 0;
>
> /*
>  * This allows GPU to set the bus attributes required to use system
>  * cache on behalf of the iommu page table walker.
>  */
> if (!IS_ERR_OR_NULL(a6xx_gpu->htw_llc_slice))
> -   adreno_set_llc_attributes(iommu);
> +   quirks |= IO_PGTABLE_QUIRK_ARM_OUTER_WBWA;
>
> -   mmu = msm_iommu_new(&pdev->dev, iommu);
> -   if (IS_ERR(mmu)) {
> -   iommu_domain_free(iommu);
> +   mmu = msm_iommu_new(&pdev->dev, quirks);
> +   if (IS_ERR_OR_NULL(mmu))
> return ERR_CAST(mmu);
> -   }
> +
> +   geometry = msm_iommu_get_geometry(mmu);
> +   if (IS_ERR(geometry))
> +   return ERR_CAST(geometry);
>
> /*
>  * Use the aperture start or SZ_16M, whichever is greater. This will
>  * ensure that we align with the allocated pagetable range while still
>  * allowing room in the lower 32 bits for GMEM and whatnot
>  */
> -   start = max_t(u64, SZ_16M, iommu->geometry.aperture_start);
> -   size = iommu->geometry.aperture_end - start + 1;
> +   start = max_t(u64, SZ_16M, geometry->aperture_start);
> +   size = geometry->aperture_end - start + 1;
>
> aspace = msm_gem_address_space_create(mmu, "gpu",
> start & GENMASK_ULL(48, 0), size);
> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c 
> b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> index 382fb7f9e497..12d0497f57e1 100644
> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> @@ -191,37 +191,30 @@ int adreno_zap_shader_load(struct msm_gpu *gpu, u32 
> pasid)
> return zap_shader_load_mdt(gpu, adreno_gpu->info->zapfw, pasid);
>  }
>
> -void adreno_set_llc_attributes(struct iommu_doma

Re: [PATCH v4 2/2] drm/msm: remove duplicated code from a6xx_create_address_space

2022-11-02 Thread Rob Clark
On Wed, Nov 2, 2022 at 10:54 AM Dmitry Baryshkov
 wrote:
>
> The function a6xx_create_address_space() is mostly a copy of
> adreno_iommu_create_address_space() with added quirk setting. Rework
> these two functions to be a thin wrappers around a common helper.
>
> Signed-off-by: Dmitry Baryshkov 

Reviewed-by: Rob Clark 


> ---
>  drivers/gpu/drm/msm/adreno/a3xx_gpu.c   |  2 +-
>  drivers/gpu/drm/msm/adreno/a4xx_gpu.c   |  2 +-
>  drivers/gpu/drm/msm/adreno/a5xx_gpu.c   |  2 +-
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c   | 28 +
>  drivers/gpu/drm/msm/adreno/adreno_gpu.c | 12 +--
>  drivers/gpu/drm/msm/adreno/adreno_gpu.h |  7 ++-
>  6 files changed, 20 insertions(+), 33 deletions(-)
>
> diff --git a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c 
> b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
> index 2c8b9899625b..948785ed07bb 100644
> --- a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
> @@ -500,7 +500,7 @@ static const struct adreno_gpu_funcs funcs = {
>  #endif
> .gpu_state_get = a3xx_gpu_state_get,
> .gpu_state_put = adreno_gpu_state_put,
> -   .create_address_space = adreno_iommu_create_address_space,
> +   .create_address_space = adreno_create_address_space,
> .get_rptr = a3xx_get_rptr,
> },
>  };
> diff --git a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c 
> b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
> index 7cb8d9849c07..2fb32d5552c4 100644
> --- a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
> @@ -635,7 +635,7 @@ static const struct adreno_gpu_funcs funcs = {
>  #endif
> .gpu_state_get = a4xx_gpu_state_get,
> .gpu_state_put = adreno_gpu_state_put,
> -   .create_address_space = adreno_iommu_create_address_space,
> +   .create_address_space = adreno_create_address_space,
> .get_rptr = a4xx_get_rptr,
> },
> .get_timestamp = a4xx_get_timestamp,
> diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c 
> b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> index 3dcec7acb384..3c537c0016fa 100644
> --- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> @@ -1705,7 +1705,7 @@ static const struct adreno_gpu_funcs funcs = {
> .gpu_busy = a5xx_gpu_busy,
> .gpu_state_get = a5xx_gpu_state_get,
> .gpu_state_put = a5xx_gpu_state_put,
> -   .create_address_space = adreno_iommu_create_address_space,
> +   .create_address_space = adreno_create_address_space,
> .get_rptr = a5xx_get_rptr,
> },
> .get_timestamp = a5xx_get_timestamp,
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
> b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index db4b3a48c708..e87196457b9a 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -1786,10 +1786,6 @@ a6xx_create_address_space(struct msm_gpu *gpu, struct 
> platform_device *pdev)
>  {
> struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> -   struct iommu_domain_geometry *geometry;
> -   struct msm_mmu *mmu;
> -   struct msm_gem_address_space *aspace;
> -   u64 start, size;
> unsigned long quirks = 0;
>
> /*
> @@ -1799,29 +1795,7 @@ a6xx_create_address_space(struct msm_gpu *gpu, struct 
> platform_device *pdev)
> if (!IS_ERR_OR_NULL(a6xx_gpu->htw_llc_slice))
> quirks |= IO_PGTABLE_QUIRK_ARM_OUTER_WBWA;
>
> -   mmu = msm_iommu_new(&pdev->dev, quirks);
> -   if (IS_ERR_OR_NULL(mmu))
> -   return ERR_CAST(mmu);
> -
> -   geometry = msm_iommu_get_geometry(mmu);
> -   if (IS_ERR(geometry))
> -   return ERR_CAST(geometry);
> -
> -   /*
> -* Use the aperture start or SZ_16M, whichever is greater. This will
> -* ensure that we align with the allocated pagetable range while still
> -* allowing room in the lower 32 bits for GMEM and whatnot
> -*/
> -   start = max_t(u64, SZ_16M, geometry->aperture_start);
> -   size = geometry->aperture_end - start + 1;
> -
> -   aspace = msm_gem_address_space_create(mmu, "gpu",
> -   start & GENMASK_ULL(48, 0), size);
> -
> -   if (IS_ERR(aspace) && !IS_ERR(mmu))
> -   mmu->funcs->destroy(mmu);
> -
> -   return aspace;
> +   return adreno_iommu_create_address_space(gpu, pdev, quirks);
>  }
>
>  static struct msm_gem_address_space *
> d

[PATCH] drm/syncobj: Remove unused field

2022-11-02 Thread Rob Clark
From: Rob Clark 

Seems to be a leftover after commit e7cdf5c82f17 ("drm/syncobj: Stop
reusing the same struct file for all syncobj -> fd").

Signed-off-by: Rob Clark 
---
 include/drm/drm_syncobj.h | 4 
 1 file changed, 4 deletions(-)

diff --git a/include/drm/drm_syncobj.h b/include/drm/drm_syncobj.h
index 6cf7243a1dc5..affc4d8e50e2 100644
--- a/include/drm/drm_syncobj.h
+++ b/include/drm/drm_syncobj.h
@@ -57,10 +57,6 @@ struct drm_syncobj {
 * @lock: Protects &cb_list and write-locks &fence.
 */
spinlock_t lock;
-   /**
-* @file: A file backing for this syncobj.
-*/
-   struct file *file;
 };
 
 void drm_syncobj_free(struct kref *kref);
-- 
2.38.1



Re: [PATCH] drm/msm: Remove exclusive-fence hack

2022-11-02 Thread Rob Clark
On Wed, Nov 2, 2022 at 3:46 AM Christian König  wrote:
>
> Am 01.11.22 um 22:40 schrieb Rob Clark:
> > From: Rob Clark 
> >
> > The workaround was initially necessary due to dma_resv having only a
> > single exclusive fence slot, yet we don't necessarily know what order
> > the gpu scheduler will schedule jobs.  Unfortunately this workaround
> > also has the result of forcing implicit sync, even when userspace does
> > not want it.
> >
> > However, since commit 047a1b877ed4 ("dma-buf & drm/amdgpu: remove
> > dma_resv workaround") the workaround is no longer needed.  So remove
> > it.  This effectively reverts commit f1b3f696a084 ("drm/msm: Don't
> > break exclusive fence ordering")
> >
> > Signed-off-by: Rob Clark 
>
> Oh, yes please. I had that on my todo list for after the initial patch
> had landed, but couldn't find the time to look into it once more.
>
> There was another case with one of the other ARM drivers which could be
> cleaned up now, but I can't find it any more off hand.
>
> Anyway this patch here is Acked-by: Christian König
> .

Thanks.. I had a quick look for the other driver but couldn't spot
anything, so perhaps it has already been fixed?

BR,
-R

>
> Regards,
> Christian.
>
> > ---
> >   drivers/gpu/drm/msm/msm_gem_submit.c | 3 +--
> >   1 file changed, 1 insertion(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c 
> > b/drivers/gpu/drm/msm/msm_gem_submit.c
> > index 5599d93ec0d2..cc48f73adadf 100644
> > --- a/drivers/gpu/drm/msm/msm_gem_submit.c
> > +++ b/drivers/gpu/drm/msm/msm_gem_submit.c
> > @@ -334,8 +334,7 @@ static int submit_fence_sync(struct msm_gem_submit 
> > *submit, bool no_implicit)
> >   if (ret)
> >   return ret;
> >
> > - /* exclusive fences must be ordered */
> > - if (no_implicit && !write)
> > + if (no_implicit)
> >   continue;
> >
> >   ret = drm_sched_job_add_implicit_dependencies(&submit->base,
>


[PATCH v2 2/2] drm/msm: Hangcheck progress detection

2022-11-01 Thread Rob Clark
From: Rob Clark 

If the hangcheck timer expires, check if the fw's position in the
cmdstream has advanced (changed) since last timer expiration, and
allow it up to three additional "extensions" to its allotted time.
The intention is to continue to catch "shader stuck in a loop" type
hangs quickly, but allow more time for things that are actually
making forward progress.

Because we need to sample the CP state twice to detect if there has
not been progress, this also cuts the timer's duration in half.

v2: Fix typo (REG_A6XX_CP_CSQ_IB2_STAT), add comment

Signed-off-by: Rob Clark 
Reviewed-by: Akhil P Oommen 
---
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c | 16 +
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 34 +++
 drivers/gpu/drm/msm/msm_drv.h |  8 ++-
 drivers/gpu/drm/msm/msm_gpu.c | 20 +++-
 drivers/gpu/drm/msm/msm_gpu.h |  5 +++-
 drivers/gpu/drm/msm/msm_ringbuffer.h  | 24 +++
 6 files changed, 104 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c 
b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
index ba22d3c918bc..9638ce71e172 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
@@ -1677,6 +1677,22 @@ static uint32_t a5xx_get_rptr(struct msm_gpu *gpu, 
struct msm_ringbuffer *ring)
return ring->memptrs->rptr = gpu_read(gpu, REG_A5XX_CP_RB_RPTR);
 }
 
+static bool a5xx_progress(struct msm_gpu *gpu, struct msm_ringbuffer *ring)
+{
+   struct msm_cp_state cp_state = {
+   .ib1_base = gpu_read64(gpu, REG_A5XX_CP_IB1_BASE),
+   .ib2_base = gpu_read64(gpu, REG_A5XX_CP_IB2_BASE),
+   .ib1_rem  = gpu_read(gpu, REG_A5XX_CP_IB1_BUFSZ),
+   .ib2_rem  = gpu_read(gpu, REG_A5XX_CP_IB2_BUFSZ),
+   };
+   bool progress =
+   !!memcmp(&cp_state, &ring->last_cp_state, sizeof(cp_state));
+
+   ring->last_cp_state = cp_state;
+
+   return progress;
+}
+
 static const struct adreno_gpu_funcs funcs = {
.base = {
.get_param = adreno_get_param,
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 1ff605c18ee6..7fe60c65a1eb 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -1843,6 +1843,39 @@ static uint32_t a6xx_get_rptr(struct msm_gpu *gpu, 
struct msm_ringbuffer *ring)
return ring->memptrs->rptr = gpu_read(gpu, REG_A6XX_CP_RB_RPTR);
 }
 
+static bool a6xx_progress(struct msm_gpu *gpu, struct msm_ringbuffer *ring)
+{
+   struct msm_cp_state cp_state = {
+   .ib1_base = gpu_read64(gpu, REG_A6XX_CP_IB1_BASE),
+   .ib2_base = gpu_read64(gpu, REG_A6XX_CP_IB2_BASE),
+   .ib1_rem  = gpu_read(gpu, REG_A6XX_CP_IB1_REM_SIZE),
+   .ib2_rem  = gpu_read(gpu, REG_A6XX_CP_IB2_REM_SIZE),
+   };
+   bool progress;
+
+   /*
+* Adjust the remaining data to account for what has already been
+* fetched from memory, but not yet consumed by the SQE.
+*
+* This is not *technically* correct, the amount buffered could
+* exceed the IB size due to hw prefetching ahead, but:
+*
+* (1) We aren't trying to find the exact position, just whether
+* progress has been made
+* (2) The CP_REG_TO_MEM at the end of a submit should be enough
+* to prevent prefetching into an unrelated submit.  (And
+* either way, at some point the ROQ will be full.)
+*/
+   cp_state.ib1_rem += gpu_read(gpu, REG_A6XX_CP_CSQ_IB1_STAT) >> 16;
+   cp_state.ib2_rem += gpu_read(gpu, REG_A6XX_CP_CSQ_IB2_STAT) >> 16;
+
+   progress = !!memcmp(&cp_state, &ring->last_cp_state, sizeof(cp_state));
+
+   ring->last_cp_state = cp_state;
+
+   return progress;
+}
+
 static u32 a618_get_speed_bin(u32 fuse)
 {
if (fuse == 0)
@@ -1961,6 +1994,7 @@ static const struct adreno_gpu_funcs funcs = {
.create_address_space = a6xx_create_address_space,
.create_private_address_space = 
a6xx_create_private_address_space,
.get_rptr = a6xx_get_rptr,
+   .progress = a6xx_progress,
},
.get_timestamp = a6xx_get_timestamp,
 };
diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h
index 0609daf4fa4c..876d8d5eec2f 100644
--- a/drivers/gpu/drm/msm/msm_drv.h
+++ b/drivers/gpu/drm/msm/msm_drv.h
@@ -225,7 +225,13 @@ struct msm_drm_private {
 
struct drm_atomic_state *pm_state;
 
-   /* For hang detection, in ms */
+   /**
+* hangcheck_period: For hang detection, in ms
+*
+* Note that in practice, a submit/job will get at least two hangcheck
+* periods, due to checking for progress being implemented as simply
+* "have the CP position registers changed since 

[PATCH v2 1/2] drm/msm/adreno: Simplify read64/write64 helpers

2022-11-01 Thread Rob Clark
From: Rob Clark 

The _HI reg always follows the _LO reg, so there is no need to pass these
offsets separately.

Signed-off-by: Rob Clark 
Reviewed-by: Dmitry Baryshkov 
Reviewed-by: Akhil P Oommen 
---
 drivers/gpu/drm/msm/adreno/a4xx_gpu.c   |  3 +--
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c   | 27 -
 drivers/gpu/drm/msm/adreno/a5xx_preempt.c   |  4 +--
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c   | 24 ++
 drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c |  3 +--
 drivers/gpu/drm/msm/msm_gpu.h   | 12 -
 6 files changed, 27 insertions(+), 46 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c 
b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
index 7cb8d9849c07..a10feb8a4194 100644
--- a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
@@ -606,8 +606,7 @@ static int a4xx_pm_suspend(struct msm_gpu *gpu) {
 
 static int a4xx_get_timestamp(struct msm_gpu *gpu, uint64_t *value)
 {
-   *value = gpu_read64(gpu, REG_A4XX_RBBM_PERFCTR_CP_0_LO,
-   REG_A4XX_RBBM_PERFCTR_CP_0_HI);
+   *value = gpu_read64(gpu, REG_A4XX_RBBM_PERFCTR_CP_0_LO);
 
return 0;
 }
diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c 
b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
index 3dcec7acb384..ba22d3c918bc 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
@@ -605,11 +605,9 @@ static int a5xx_ucode_init(struct msm_gpu *gpu)
a5xx_ucode_check_version(a5xx_gpu, a5xx_gpu->pfp_bo);
}
 
-   gpu_write64(gpu, REG_A5XX_CP_ME_INSTR_BASE_LO,
-   REG_A5XX_CP_ME_INSTR_BASE_HI, a5xx_gpu->pm4_iova);
+   gpu_write64(gpu, REG_A5XX_CP_ME_INSTR_BASE_LO, a5xx_gpu->pm4_iova);
 
-   gpu_write64(gpu, REG_A5XX_CP_PFP_INSTR_BASE_LO,
-   REG_A5XX_CP_PFP_INSTR_BASE_HI, a5xx_gpu->pfp_iova);
+   gpu_write64(gpu, REG_A5XX_CP_PFP_INSTR_BASE_LO, a5xx_gpu->pfp_iova);
 
return 0;
 }
@@ -868,8 +866,7 @@ static int a5xx_hw_init(struct msm_gpu *gpu)
 * memory rendering at this point in time and we don't want to block off
 * part of the virtual memory space.
 */
-   gpu_write64(gpu, REG_A5XX_RBBM_SECVID_TSB_TRUSTED_BASE_LO,
-   REG_A5XX_RBBM_SECVID_TSB_TRUSTED_BASE_HI, 0x00000000);
+   gpu_write64(gpu, REG_A5XX_RBBM_SECVID_TSB_TRUSTED_BASE_LO, 0x00000000);
 gpu_write(gpu, REG_A5XX_RBBM_SECVID_TSB_TRUSTED_SIZE, 0x00000000);
 
/* Put the GPU into 64 bit by default */
@@ -908,8 +905,7 @@ static int a5xx_hw_init(struct msm_gpu *gpu)
return ret;
 
/* Set the ringbuffer address */
-   gpu_write64(gpu, REG_A5XX_CP_RB_BASE, REG_A5XX_CP_RB_BASE_HI,
-   gpu->rb[0]->iova);
+   gpu_write64(gpu, REG_A5XX_CP_RB_BASE, gpu->rb[0]->iova);
 
/*
 * If the microcode supports the WHERE_AM_I opcode then we can use that
@@ -936,7 +932,7 @@ static int a5xx_hw_init(struct msm_gpu *gpu)
}
 
gpu_write64(gpu, REG_A5XX_CP_RB_RPTR_ADDR,
-   REG_A5XX_CP_RB_RPTR_ADDR_HI, shadowptr(a5xx_gpu, 
gpu->rb[0]));
+   shadowptr(a5xx_gpu, gpu->rb[0]));
} else if (gpu->nr_rings > 1) {
/* Disable preemption if WHERE_AM_I isn't available */
a5xx_preempt_fini(gpu);
@@ -1239,9 +1235,9 @@ static void a5xx_fault_detect_irq(struct msm_gpu *gpu)
gpu_read(gpu, REG_A5XX_RBBM_STATUS),
gpu_read(gpu, REG_A5XX_CP_RB_RPTR),
gpu_read(gpu, REG_A5XX_CP_RB_WPTR),
-   gpu_read64(gpu, REG_A5XX_CP_IB1_BASE, REG_A5XX_CP_IB1_BASE_HI),
+   gpu_read64(gpu, REG_A5XX_CP_IB1_BASE),
gpu_read(gpu, REG_A5XX_CP_IB1_BUFSZ),
-   gpu_read64(gpu, REG_A5XX_CP_IB2_BASE, REG_A5XX_CP_IB2_BASE_HI),
+   gpu_read64(gpu, REG_A5XX_CP_IB2_BASE),
gpu_read(gpu, REG_A5XX_CP_IB2_BUFSZ));
 
/* Turn off the hangcheck timer to keep it from bothering us */
@@ -1427,8 +1423,7 @@ static int a5xx_pm_suspend(struct msm_gpu *gpu)
 
 static int a5xx_get_timestamp(struct msm_gpu *gpu, uint64_t *value)
 {
-   *value = gpu_read64(gpu, REG_A5XX_RBBM_ALWAYSON_COUNTER_LO,
-   REG_A5XX_RBBM_ALWAYSON_COUNTER_HI);
+   *value = gpu_read64(gpu, REG_A5XX_RBBM_ALWAYSON_COUNTER_LO);
 
return 0;
 }
@@ -1465,8 +1460,7 @@ static int a5xx_crashdumper_run(struct msm_gpu *gpu,
if (IS_ERR_OR_NULL(dumper->ptr))
return -EINVAL;
 
-   gpu_write64(gpu, REG_A5XX_CP_CRASH_SCRIPT_BASE_LO,
-   REG_A5XX_CP_CRASH_SCRIPT_BASE_HI, dumper->iova);
+   gpu_write64(gpu, REG_A5XX_CP_CRASH_SCRIPT_BASE_LO, dumper->iova);
 
gpu_write(gpu, REG_A5XX_CP_CRASH_DUMP_CNTL, 1);
 
@@ -1666,8 +1660,7 @@ static u64 a5xx_gpu_busy(struct msm_gpu *gpu, unsigned 
long *out_sample_rate)
 {
u64 bus
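
Since the msm_gpu.h hunk is cut off above: the simplification amounts to
deriving the _HI offset as reg + 1 instead of passing it in. A minimal
sketch of the resulting helper, assuming gpu_read() as the underlying
32-bit register accessor:

static inline u64 gpu_read64(struct msm_gpu *gpu, u32 reg)
{
        /* _HI always directly follows _LO, so read reg and reg + 1 */
        return (u64)gpu_read(gpu, reg) |
               ((u64)gpu_read(gpu, reg + 1) << 32);
}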

[PATCH v2 0/2] drm/msm: Improved hang detection

2022-11-01 Thread Rob Clark
From: Rob Clark 

Try to detect when submit jobs are making forward progress and give them
a bit more time.

Rob Clark (2):
  drm/msm/adreno: Simplify read64/write64 helpers
  drm/msm: Hangcheck progress detection

 drivers/gpu/drm/msm/adreno/a4xx_gpu.c   |  3 +-
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c   | 43 +--
 drivers/gpu/drm/msm/adreno/a5xx_preempt.c   |  4 +-
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c   | 58 +++--
 drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c |  3 +-
 drivers/gpu/drm/msm/msm_drv.h   |  8 ++-
 drivers/gpu/drm/msm/msm_gpu.c   | 20 ++-
 drivers/gpu/drm/msm/msm_gpu.h   | 17 +++---
 drivers/gpu/drm/msm/msm_ringbuffer.h| 24 +
 9 files changed, 131 insertions(+), 49 deletions(-)

-- 
2.38.1



Re: [PATCH 2/2] drm/msm: Hangcheck progress detection

2022-11-01 Thread Rob Clark
On Tue, Nov 1, 2022 at 12:54 PM Akhil P Oommen  wrote:
>
> On 11/1/2022 4:24 AM, Rob Clark wrote:
> > From: Rob Clark 
> >
> > If the hangcheck timer expires, check if the fw's position in the
> > cmdstream has advanced (changed) since last timer expiration, and
> > allow it up to three additional "extensions" to its allotted time.
> > The intention is to continue to catch "shader stuck in a loop" type
> > hangs quickly, but allow more time for things that are actually
> > making forward progress.
> >
> > Because we need to sample the CP state twice to detect if there has
> > not been progress, this also cuts the timer's duration in half.
> >
> > Signed-off-by: Rob Clark 
> > ---
> >   drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 34 +++
> >   drivers/gpu/drm/msm/msm_drv.h |  8 ++-
> >   drivers/gpu/drm/msm/msm_gpu.c | 20 +++-
> >   drivers/gpu/drm/msm/msm_gpu.h |  5 +++-
> >   drivers/gpu/drm/msm/msm_ringbuffer.h  | 24 +++
> >   5 files changed, 88 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
> > b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > index 1ff605c18ee6..3b8fb7a11dff 100644
> > --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > @@ -1843,6 +1843,39 @@ static uint32_t a6xx_get_rptr(struct msm_gpu *gpu, 
> > struct msm_ringbuffer *ring)
> >   return ring->memptrs->rptr = gpu_read(gpu, REG_A6XX_CP_RB_RPTR);
> >   }
> >
> > +static bool a6xx_progress(struct msm_gpu *gpu, struct msm_ringbuffer *ring)
> > +{
> > + struct msm_cp_state cp_state = {
> > + .ib1_base = gpu_read64(gpu, REG_A6XX_CP_IB1_BASE),
> > + .ib2_base = gpu_read64(gpu, REG_A6XX_CP_IB2_BASE),
> > + .ib1_rem  = gpu_read(gpu, REG_A6XX_CP_IB1_REM_SIZE),
> > + .ib2_rem  = gpu_read(gpu, REG_A6XX_CP_IB2_REM_SIZE),
> > + };
> > + bool progress;
> > +
> > + /*
> > +  * Adjust the remaining data to account for what has already been
> > +  * fetched from memory, but not yet consumed by the SQE.
> > +  *
> > +  * This is not *technically* correct, the amount buffered could
> > +  * exceed the IB size due to hw prefetching ahead, but:
> > +  *
> > +  * (1) We aren't trying to find the exact position, just whether
> > +  * progress has been made
> > +  * (2) The CP_REG_TO_MEM at the end of a submit should be enough
> > +  * to prevent prefetching into an unrelated submit.  (And
> > +  * either way, at some point the ROQ will be full.)
> > +  */
> > + cp_state.ib1_rem += gpu_read(gpu, REG_A6XX_CP_CSQ_IB1_STAT) >> 16;
> > + cp_state.ib2_rem += gpu_read(gpu, REG_A6XX_CP_CSQ_IB1_STAT) >> 16;
> REG_A6XX_CP_CSQ_IB1_STAT -> REG_A6XX_CP_CSQ_IB2_STAT

oops, will fix in v2

> With that, Reviewed-by: Akhil P Oommen 
>
> -Akhil.
> > +
> > + progress = !!memcmp(&cp_state, &ring->last_cp_state, 
> > sizeof(cp_state));
> > +
> > + ring->last_cp_state = cp_state;
> > +
> > + return progress;
> > +}
> > +
> >   static u32 a618_get_speed_bin(u32 fuse)
> >   {
> >   if (fuse == 0)
> > @@ -1961,6 +1994,7 @@ static const struct adreno_gpu_funcs funcs = {
> >   .create_address_space = a6xx_create_address_space,
> >   .create_private_address_space = 
> > a6xx_create_private_address_space,
> >   .get_rptr = a6xx_get_rptr,
> > + .progress = a6xx_progress,
> >   },
> >   .get_timestamp = a6xx_get_timestamp,
> >   };
> > diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h
> > index efcd7260f428..970a1a0ab34f 100644
> > --- a/drivers/gpu/drm/msm/msm_drv.h
> > +++ b/drivers/gpu/drm/msm/msm_drv.h
> > @@ -226,7 +226,13 @@ struct msm_drm_private {
> >
> >   struct drm_atomic_state *pm_state;
> >
> > - /* For hang detection, in ms */
> > + /**
> > +  * hangcheck_period: For hang detection, in ms
> > +  *
> > +  * Note that in practice, a submit/job will get at least two hangcheck
> > +  * periods, due to checking for progress being implemented as simply
> > +  * "have the CP position registers changed since last time?"
> > +  */
> >   unsigned int hangcheck_period;
> >
> >   /**
> > diff --git a/drivers

[PATCH] drm/msm: Remove exclusive-fence hack

2022-11-01 Thread Rob Clark
From: Rob Clark 

The workaround was initially necessary due to dma_resv having only a
single exclusive fence slot, yet we don't necessarily know what order
the gpu scheduler will schedule jobs.  Unfortunately this workaround
also has the result of forcing implicit sync, even when userspace does
not want it.

However, since commit 047a1b877ed4 ("dma-buf & drm/amdgpu: remove
dma_resv workaround") the workaround is no longer needed.  So remove
it.  This effectively reverts commit f1b3f696a084 ("drm/msm: Don't
break exclusive fence ordering")

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_gem_submit.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c 
b/drivers/gpu/drm/msm/msm_gem_submit.c
index 5599d93ec0d2..cc48f73adadf 100644
--- a/drivers/gpu/drm/msm/msm_gem_submit.c
+++ b/drivers/gpu/drm/msm/msm_gem_submit.c
@@ -334,8 +334,7 @@ static int submit_fence_sync(struct msm_gem_submit *submit, 
bool no_implicit)
if (ret)
return ret;
 
-   /* exclusive fences must be ordered */
-   if (no_implicit && !write)
+   if (no_implicit)
continue;
 
	ret = drm_sched_job_add_implicit_dependencies(&submit->base,
-- 
2.38.1



[PATCH 1/2] drm/msm/adreno: Simplify read64/write64 helpers

2022-10-31 Thread Rob Clark
From: Rob Clark 

The _HI reg always follows the _LO reg, so there is no need to pass these
offsets separately.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/adreno/a4xx_gpu.c   |  3 +--
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c   | 27 -
 drivers/gpu/drm/msm/adreno/a5xx_preempt.c   |  4 +--
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c   | 24 ++
 drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c |  3 +--
 drivers/gpu/drm/msm/msm_gpu.h   | 12 -
 6 files changed, 27 insertions(+), 46 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c 
b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
index 7cb8d9849c07..a10feb8a4194 100644
--- a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
@@ -606,8 +606,7 @@ static int a4xx_pm_suspend(struct msm_gpu *gpu) {
 
 static int a4xx_get_timestamp(struct msm_gpu *gpu, uint64_t *value)
 {
-   *value = gpu_read64(gpu, REG_A4XX_RBBM_PERFCTR_CP_0_LO,
-   REG_A4XX_RBBM_PERFCTR_CP_0_HI);
+   *value = gpu_read64(gpu, REG_A4XX_RBBM_PERFCTR_CP_0_LO);
 
return 0;
 }
diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c 
b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
index 3dcec7acb384..ba22d3c918bc 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
@@ -605,11 +605,9 @@ static int a5xx_ucode_init(struct msm_gpu *gpu)
a5xx_ucode_check_version(a5xx_gpu, a5xx_gpu->pfp_bo);
}
 
-   gpu_write64(gpu, REG_A5XX_CP_ME_INSTR_BASE_LO,
-   REG_A5XX_CP_ME_INSTR_BASE_HI, a5xx_gpu->pm4_iova);
+   gpu_write64(gpu, REG_A5XX_CP_ME_INSTR_BASE_LO, a5xx_gpu->pm4_iova);
 
-   gpu_write64(gpu, REG_A5XX_CP_PFP_INSTR_BASE_LO,
-   REG_A5XX_CP_PFP_INSTR_BASE_HI, a5xx_gpu->pfp_iova);
+   gpu_write64(gpu, REG_A5XX_CP_PFP_INSTR_BASE_LO, a5xx_gpu->pfp_iova);
 
return 0;
 }
@@ -868,8 +866,7 @@ static int a5xx_hw_init(struct msm_gpu *gpu)
 * memory rendering at this point in time and we don't want to block off
 * part of the virtual memory space.
 */
-   gpu_write64(gpu, REG_A5XX_RBBM_SECVID_TSB_TRUSTED_BASE_LO,
-   REG_A5XX_RBBM_SECVID_TSB_TRUSTED_BASE_HI, 0x00000000);
+   gpu_write64(gpu, REG_A5XX_RBBM_SECVID_TSB_TRUSTED_BASE_LO, 0x00000000);
 gpu_write(gpu, REG_A5XX_RBBM_SECVID_TSB_TRUSTED_SIZE, 0x00000000);
 
/* Put the GPU into 64 bit by default */
@@ -908,8 +905,7 @@ static int a5xx_hw_init(struct msm_gpu *gpu)
return ret;
 
/* Set the ringbuffer address */
-   gpu_write64(gpu, REG_A5XX_CP_RB_BASE, REG_A5XX_CP_RB_BASE_HI,
-   gpu->rb[0]->iova);
+   gpu_write64(gpu, REG_A5XX_CP_RB_BASE, gpu->rb[0]->iova);
 
/*
 * If the microcode supports the WHERE_AM_I opcode then we can use that
@@ -936,7 +932,7 @@ static int a5xx_hw_init(struct msm_gpu *gpu)
}
 
gpu_write64(gpu, REG_A5XX_CP_RB_RPTR_ADDR,
-   REG_A5XX_CP_RB_RPTR_ADDR_HI, shadowptr(a5xx_gpu, 
gpu->rb[0]));
+   shadowptr(a5xx_gpu, gpu->rb[0]));
} else if (gpu->nr_rings > 1) {
/* Disable preemption if WHERE_AM_I isn't available */
a5xx_preempt_fini(gpu);
@@ -1239,9 +1235,9 @@ static void a5xx_fault_detect_irq(struct msm_gpu *gpu)
gpu_read(gpu, REG_A5XX_RBBM_STATUS),
gpu_read(gpu, REG_A5XX_CP_RB_RPTR),
gpu_read(gpu, REG_A5XX_CP_RB_WPTR),
-   gpu_read64(gpu, REG_A5XX_CP_IB1_BASE, REG_A5XX_CP_IB1_BASE_HI),
+   gpu_read64(gpu, REG_A5XX_CP_IB1_BASE),
gpu_read(gpu, REG_A5XX_CP_IB1_BUFSZ),
-   gpu_read64(gpu, REG_A5XX_CP_IB2_BASE, REG_A5XX_CP_IB2_BASE_HI),
+   gpu_read64(gpu, REG_A5XX_CP_IB2_BASE),
gpu_read(gpu, REG_A5XX_CP_IB2_BUFSZ));
 
/* Turn off the hangcheck timer to keep it from bothering us */
@@ -1427,8 +1423,7 @@ static int a5xx_pm_suspend(struct msm_gpu *gpu)
 
 static int a5xx_get_timestamp(struct msm_gpu *gpu, uint64_t *value)
 {
-   *value = gpu_read64(gpu, REG_A5XX_RBBM_ALWAYSON_COUNTER_LO,
-   REG_A5XX_RBBM_ALWAYSON_COUNTER_HI);
+   *value = gpu_read64(gpu, REG_A5XX_RBBM_ALWAYSON_COUNTER_LO);
 
return 0;
 }
@@ -1465,8 +1460,7 @@ static int a5xx_crashdumper_run(struct msm_gpu *gpu,
if (IS_ERR_OR_NULL(dumper->ptr))
return -EINVAL;
 
-   gpu_write64(gpu, REG_A5XX_CP_CRASH_SCRIPT_BASE_LO,
-   REG_A5XX_CP_CRASH_SCRIPT_BASE_HI, dumper->iova);
+   gpu_write64(gpu, REG_A5XX_CP_CRASH_SCRIPT_BASE_LO, dumper->iova);
 
gpu_write(gpu, REG_A5XX_CP_CRASH_DUMP_CNTL, 1);
 
@@ -1666,8 +1660,7 @@ static u64 a5xx_gpu_busy(struct msm_gpu *gpu, unsigned 
long *out_sample_rate)
 {
u64 busy_cycles;
 
-   busy_cycles =

[PATCH 2/2] drm/msm: Hangcheck progress detection

2022-10-31 Thread Rob Clark
From: Rob Clark 

If the hangcheck timer expires, check if the fw's position in the
cmdstream has advanced (changed) since last timer expiration, and
allow it up to three additional "extensions" to its allotted time.
The intention is to continue to catch "shader stuck in a loop" type
hangs quickly, but allow more time for things that are actually
making forward progress.

Because we need to sample the CP state twice to detect if there has
not been progress, this also cuts the timer's duration in half.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 34 +++
 drivers/gpu/drm/msm/msm_drv.h |  8 ++-
 drivers/gpu/drm/msm/msm_gpu.c | 20 +++-
 drivers/gpu/drm/msm/msm_gpu.h |  5 +++-
 drivers/gpu/drm/msm/msm_ringbuffer.h  | 24 +++
 5 files changed, 88 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 1ff605c18ee6..3b8fb7a11dff 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -1843,6 +1843,39 @@ static uint32_t a6xx_get_rptr(struct msm_gpu *gpu, 
struct msm_ringbuffer *ring)
return ring->memptrs->rptr = gpu_read(gpu, REG_A6XX_CP_RB_RPTR);
 }
 
+static bool a6xx_progress(struct msm_gpu *gpu, struct msm_ringbuffer *ring)
+{
+   struct msm_cp_state cp_state = {
+   .ib1_base = gpu_read64(gpu, REG_A6XX_CP_IB1_BASE),
+   .ib2_base = gpu_read64(gpu, REG_A6XX_CP_IB2_BASE),
+   .ib1_rem  = gpu_read(gpu, REG_A6XX_CP_IB1_REM_SIZE),
+   .ib2_rem  = gpu_read(gpu, REG_A6XX_CP_IB2_REM_SIZE),
+   };
+   bool progress;
+
+   /*
+* Adjust the remaining data to account for what has already been
+* fetched from memory, but not yet consumed by the SQE.
+*
+* This is not *technically* correct, the amount buffered could
+* exceed the IB size due to hw prefetching ahead, but:
+*
+* (1) We aren't trying to find the exact position, just whether
+* progress has been made
+* (2) The CP_REG_TO_MEM at the end of a submit should be enough
+* to prevent prefetching into an unrelated submit.  (And
+* either way, at some point the ROQ will be full.)
+*/
+   cp_state.ib1_rem += gpu_read(gpu, REG_A6XX_CP_CSQ_IB1_STAT) >> 16;
+   cp_state.ib2_rem += gpu_read(gpu, REG_A6XX_CP_CSQ_IB1_STAT) >> 16;
+
+   progress = !!memcmp(&cp_state, &ring->last_cp_state, sizeof(cp_state));
+
+   ring->last_cp_state = cp_state;
+
+   return progress;
+}
+
 static u32 a618_get_speed_bin(u32 fuse)
 {
if (fuse == 0)
@@ -1961,6 +1994,7 @@ static const struct adreno_gpu_funcs funcs = {
.create_address_space = a6xx_create_address_space,
.create_private_address_space = 
a6xx_create_private_address_space,
.get_rptr = a6xx_get_rptr,
+   .progress = a6xx_progress,
},
.get_timestamp = a6xx_get_timestamp,
 };
diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h
index efcd7260f428..970a1a0ab34f 100644
--- a/drivers/gpu/drm/msm/msm_drv.h
+++ b/drivers/gpu/drm/msm/msm_drv.h
@@ -226,7 +226,13 @@ struct msm_drm_private {
 
struct drm_atomic_state *pm_state;
 
-   /* For hang detection, in ms */
+   /**
+* hangcheck_period: For hang detection, in ms
+*
+* Note that in practice, a submit/job will get at least two hangcheck
+* periods, due to checking for progress being implemented as simply
+* "have the CP position registers changed since last time?"
+*/
unsigned int hangcheck_period;
 
/**
diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index 3dffee54a951..136f5977b0bf 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -500,6 +500,21 @@ static void hangcheck_timer_reset(struct msm_gpu *gpu)
round_jiffies_up(jiffies + 
msecs_to_jiffies(priv->hangcheck_period)));
 }
 
+static bool made_progress(struct msm_gpu *gpu, struct msm_ringbuffer *ring)
+{
+   if (ring->hangcheck_progress_retries >= 
DRM_MSM_HANGCHECK_PROGRESS_RETRIES)
+   return false;
+
+   if (!gpu->funcs->progress)
+   return false;
+
+   if (!gpu->funcs->progress(gpu, ring))
+   return false;
+
+   ring->hangcheck_progress_retries++;
+   return true;
+}
+
 static void hangcheck_handler(struct timer_list *t)
 {
struct msm_gpu *gpu = from_timer(gpu, t, hangcheck_timer);
@@ -511,9 +526,12 @@ static void hangcheck_handler(struct timer_list *t)
if (fence != ring->hangcheck_fence) {
/* some progress has been made.. ya! */
ring->hangcheck_fence = fence;
- 

[PATCH 0/2] drm/msm: Improved hang detection

2022-10-31 Thread Rob Clark
From: Rob Clark 

Try to detect when submit jobs are making forward progress and give them
a bit more time.

Rob Clark (2):
  drm/msm/adreno: Simplify read64/write64 helpers
  drm/msm: Hangcheck progress detection

 drivers/gpu/drm/msm/adreno/a4xx_gpu.c   |  3 +-
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c   | 27 --
 drivers/gpu/drm/msm/adreno/a5xx_preempt.c   |  4 +-
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c   | 58 +++--
 drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c |  3 +-
 drivers/gpu/drm/msm/msm_drv.h   |  8 ++-
 drivers/gpu/drm/msm/msm_gpu.c   | 20 ++-
 drivers/gpu/drm/msm/msm_gpu.h   | 17 +++---
 drivers/gpu/drm/msm/msm_ringbuffer.h| 24 +
 9 files changed, 115 insertions(+), 49 deletions(-)

-- 
2.37.3



Re: [PATCH 2/3] drm/prime: set the dma_coherent flag for export

2022-10-27 Thread Rob Clark
On Thu, Oct 20, 2022 at 7:57 AM Christian König
 wrote:
>
> Am 20.10.22 um 16:43 schrieb Rob Clark:
> > On Thu, Oct 20, 2022 at 5:13 AM Christian König
> >  wrote:
> >> When a device driver is snooping the CPU cache during access we assume
> >> that all importers need to be able to snoop the CPU cache as well.
> >>
> >> Signed-off-by: Christian König 
> >> ---
> >>   drivers/gpu/drm/drm_prime.c | 2 ++
> >>   1 file changed, 2 insertions(+)
> >>
> >> diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
> >> index 20e109a802ae..d5c70b6fe8a4 100644
> >> --- a/drivers/gpu/drm/drm_prime.c
> >> +++ b/drivers/gpu/drm/drm_prime.c
> >> @@ -28,6 +28,7 @@
> >>
> >>   #include 
> >>   #include 
> >> +#include 
> >>   #include 
> >>   #include 
> >>
> >> @@ -889,6 +890,7 @@ struct dma_buf *drm_gem_prime_export(struct 
> >> drm_gem_object *obj,
> >>  .size = obj->size,
> >>  .flags = flags,
> >>  .priv = obj,
> >> +   .coherent = dev_is_dma_coherent(dev->dev),
> > To set the coherent flag correctly, I think I'd need a way to override
> > on a per buffer basis, since coherency is a property of the gpu
> > pgtables (which in the msm case is an immutable property of the gem
> > object).  We also have some awkwardness that drm->dev isn't actually
> > the GPU, thanks to the kernel's device model seeing a collection of
> > other small devices shoehorned into a single drm device to fit
> > userspace's view of the world.  So relying on drm->dev isn't really
> > going to give sensible results.
>
> Yeah, I've the same problem for amdgpu where some buffers are snooped
> while others aren't.
>
> But this should be unproblematic since the flag can always be cleared by
> the driver later on (it just can't be set).
>
> Additional to that I've just noted that armada, i915, omap and tegra use
> their own DMA-buf export function. MSM could do the same as well if the
> device itself is marked as not coherent while some buffers are mapped
> cache coherent.

yeah, I guess that would work.. it would be a bit unfortunate to need
to use our own export function, but I guess it is a small price to pay
and I like the overall idea, so a-b for the series

For the VMM case, it would be nice to expose this to userspace, but
I've sent a patch to do this in an msm specific way, and I guess at
least solving that problem for one driver is better than the current
state of "iff driver == "i915" { it's mmap'd cached } else { it's
writecombine }" in crosvm

Admittedly the VMM case is a rather special case compared to your
average userspace dealing with dmabuf's, but it would be nice to get
out of the current situation where it is having to make assumptions
which are quite possibly wrong, so giving the VMM some information
even if it is "the cachability isn't static, you should bail now if
your arch can't cope" would be an improvement.  (For background, this
case is also a bit specific for android/gralloc.. for driver allocated
buffers in a VM, with the native usermode driver (UMD) in guest, you
still have some control within the UMD)

BR,
-R


> Regards,
> Christian.
>
> > I guess msm could just bury our heads in the sand and continue to do
> > things the way we have been (buffers that are mapped cached-coherent
> > are only self-shared) but would be nice to catch if userspace tried to
> > import one into (for ex) v4l2..
> >
> > BR,
> > -R
> >
> >>  .resv = obj->resv,
> >>  };
> >>
> >> --
> >> 2.25.1
> >>
>
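
A sketch of the driver-specific export hook Christian suggests, in terms
of the dma_buf "coherent" flag proposed by this series; msm_gem_cached()
is a hypothetical stand-in for however the driver would check a buffer's
per-object cacheability:

static struct dma_buf *msm_gem_prime_export(struct drm_gem_object *obj,
                                            int flags)
{
        struct dma_buf *dmabuf = drm_gem_prime_export(obj, flags);

        /* The flag may be cleared (never set) by the driver after
         * export, for buffers whose GPU-side mapping is not
         * cache-coherent; msm_gem_cached() is assumed. */
        if (!IS_ERR(dmabuf) && !msm_gem_cached(obj))
                dmabuf->coherent = false;

        return dmabuf;
}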


Re: [PATCH v3 1/2] drm/msm: move domain allocation into msm_iommu_new()

2022-10-27 Thread Rob Clark
On Tue, Oct 25, 2022 at 1:04 PM Dmitry Baryshkov
 wrote:
>
> After the msm_iommu instance is created, the IOMMU domain is completely
> handled inside the msm_iommu code. Move the iommu_domain_alloc() call
> into the msm_iommu_new() to simplify callers code.
>
> Reported-by: kernel test robot 
> Signed-off-by: Dmitry Baryshkov 
> ---
>  drivers/gpu/drm/msm/adreno/a6xx_gmu.c| 12 +---
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c| 25 +---
>  drivers/gpu/drm/msm/adreno/adreno_gpu.c  | 25 +---
>  drivers/gpu/drm/msm/adreno/adreno_gpu.h  |  2 --
>  drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c | 19 +-
>  drivers/gpu/drm/msm/msm_drv.c| 18 -
>  drivers/gpu/drm/msm/msm_iommu.c  | 20 ---
>  drivers/gpu/drm/msm/msm_mmu.h|  3 ++-
>  8 files changed, 60 insertions(+), 64 deletions(-)
>
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c 
> b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> index e033d6a67a20..6484b97c5344 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> @@ -1213,19 +1213,17 @@ static int a6xx_gmu_memory_alloc(struct a6xx_gmu 
> *gmu, struct a6xx_gmu_bo *bo,
>
>  static int a6xx_gmu_memory_probe(struct a6xx_gmu *gmu)
>  {
> -   struct iommu_domain *domain;
> struct msm_mmu *mmu;
>
> -   domain = iommu_domain_alloc(&platform_bus_type);
> -   if (!domain)
> +   mmu = msm_iommu_new(gmu->dev, 0);
> +   if (!mmu)
> return -ENODEV;
> +   if (IS_ERR(mmu))
> +   return PTR_ERR(mmu);
>
> -   mmu = msm_iommu_new(gmu->dev, domain);
> gmu->aspace = msm_gem_address_space_create(mmu, "gmu", 0x0, 
> 0x80000000);
> -   if (IS_ERR(gmu->aspace)) {
> -   iommu_domain_free(domain);
> +   if (IS_ERR(gmu->aspace))
> return PTR_ERR(gmu->aspace);
> -   }
>
> return 0;
>  }
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
> b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index fdc578016e0b..7a1b4397b842 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -1784,37 +1784,30 @@ static void a6xx_gpu_set_freq(struct msm_gpu *gpu, 
> struct dev_pm_opp *opp,
>  static struct msm_gem_address_space *
>  a6xx_create_address_space(struct msm_gpu *gpu, struct platform_device *pdev)
>  {
> -   struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> -   struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> -   struct iommu_domain *iommu;
> struct msm_mmu *mmu;
> struct msm_gem_address_space *aspace;
> +   struct iommu_domain_geometry *geometry;
> u64 start, size;
>
> -   iommu = iommu_domain_alloc(&platform_bus_type);
> -   if (!iommu)
> -   return NULL;
> -
> /*
>  * This allows GPU to set the bus attributes required to use system
>  * cache on behalf of the iommu page table walker.
>  */
> -   if (!IS_ERR_OR_NULL(a6xx_gpu->htw_llc_slice))
> -   adreno_set_llc_attributes(iommu);
> -
> -   mmu = msm_iommu_new(&pdev->dev, iommu);
> -   if (IS_ERR(mmu)) {
> -   iommu_domain_free(iommu);
> +   mmu = msm_iommu_new(&pdev->dev, IO_PGTABLE_QUIRK_ARM_OUTER_WBWA);

I think/assume the quirk still needs to be conditional on
a6xx_gpu->htw_llc_slice.. or at least I'm not sure what happens if we
set it but do not have an LLCC (or allocated slice)

BR,
-R

> +   if (IS_ERR_OR_NULL(mmu))
> return ERR_CAST(mmu);
> -   }
> +
> +   geometry = msm_iommu_get_geometry(mmu);
> +   if (IS_ERR(geometry))
> +   return ERR_CAST(geometry);
>
> /*
>  * Use the aperture start or SZ_16M, whichever is greater. This will
>  * ensure that we align with the allocated pagetable range while still
>  * allowing room in the lower 32 bits for GMEM and whatnot
>  */
> -   start = max_t(u64, SZ_16M, iommu->geometry.aperture_start);
> -   size = iommu->geometry.aperture_end - start + 1;
> +   start = max_t(u64, SZ_16M, geometry->aperture_start);
> +   size = geometry->aperture_end - start + 1;
>
> aspace = msm_gem_address_space_create(mmu, "gpu",
> start & GENMASK_ULL(48, 0), size);
> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c 
> b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> index 382fb7f9e497..5808911899c7 100644
> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> @@ -191,37 +191,30 @@ int adreno_zap_shader_load(struct msm_gpu *gpu, u32 
> pasid)
> return zap_shader_load_mdt(gpu, adreno_gpu->info->zapfw, pasid);
>  }
>
> -void adreno_set_llc_attributes(struct iommu_domain *iommu)
> -{
> -   iommu_set_pgtable_quirks(iommu, IO_PGTABLE_QUIRK_ARM_OUTER_WBWA);
> -}
> -
>  struct msm_gem_address_space *
>  adreno_iommu_create_address_space(struct msm_gpu *gpu,
>   

Re: Must-Pass Test Suite for KMS drivers

2022-10-27 Thread Rob Clark
On Wed, Oct 26, 2022 at 1:17 AM  wrote:
>
> Hi Rob,
>
> On Mon, Oct 24, 2022 at 08:48:15AM -0700, Rob Clark wrote:
> > On Mon, Oct 24, 2022 at 5:43 AM  wrote:
> > > I've been discussing the idea for the past year to add an IGT test suite that
> > > all well-behaved KMS drivers must pass.
> > >
> > > The main idea behind it comes from v4l2-compliance and cec-compliance,
> > > that are being used to validate that the drivers are sane.
> > >
> > > We should probably start building up the test list, and eventually
> > > mandate that all tests pass for all the new KMS drivers we would merge
> > > in the kernel, and be run by KCi or similar.
> >
> > Let's get https://patchwork.freedesktop.org/patch/502641/ merged
> > first, that already gives us a mechanism similar to what we use in
> > mesa to track pass/fail/flake
>
> I'm not sure it's a dependency per-se, and I believe both can (and
> should) happen separately.

Basically my reasoning is that getting IGT green is a process that so
far consists of equal parts IGT test fixes, to clear out
lingering i915'isms, etc, and driver fixes.  Yes, you could do this
manually but the drm/ci approach seems like it would make it easier to
track, so it is easier to see what tests are being run on which hw,
and what the pass/fail/flake status is.  And the expectation files can
also be updated as we uprev the igt version being used in CI.

I could be biased by how CI has been deployed (IMHO, successfully) in
mesa.. my experience there doesn't make me see any value in a
"mustpass" list.  But does make me see value in automating and
tracking status.  Obviously we want all the tests to pass, but getting
there is going to be a process.  Tracking that progress is the thing
that is useful now.

BR,
-R

> AFAIU, the CI patches are here to track which tests are supposed to be
> working and which aren't so that we can track regressions.
>
> The list I was talking about is here to identify issues in the first
> place. All tests must pass, and if one fails it should be considered a
> hard failure.
>
> This would be eligible for CI only for drivers which have been known to
> pass them all already, but we wouldn't need to track which ones can fail
> or not, all of them must.
>
> > Beyond that, I think some of the igt tests need to get more stable
> > before we could consider a "mustpass" list.
>
> I agree that IGT tests could get more stable on ARM platforms, but it's
> also a chicken-and-egg issue. If no-one is using them regularly on ARM,
> then they'll never get fixed.
>
> > The kms_lease tests seem to fail on msm due to bad assumptions in the
> > test about which CRTCs primary planes can attach to. The legacy-cursor
> > crc tests seem a bit racy (there was a patch posted for that, not sure
> > if it landed yet), etc.
>
> And this is fine, we can merge that list without them, and if and when
> they get stable, we'll add them later.
>
> > The best thing to do is actually start running CI and tracking xfails
> > and flakes ;-)
>
> Again, I wouldn't oppose them.
>
> The issue I'm trying to solve is that there's just no way to know, at
> the moment:
>
>   - When you're running IGT, which tests are relevant for your platform
> exactly.
>
>   - If some of them fail, is it expected for them to fail or not. The
> ci/ patch you mentioned help for that a bit, but only for platforms
> where someone already did that work. When you want to do that work
> in the first place, it's extremely tedious and obscure.
>
>   - And if some of them fail, is it something that I should actually fix
> or not.
>
> The mustpass list addresses all those issues by providing a baseline.
>
> Maxime


[pull] drm/msm: drm-msm-fixes-2022-10-24 for v6.1-rc3

2022-10-24 Thread Rob Clark
Hi Dave,

A few fixes for the v6.1 cycle.  Summary below.

The following changes since commit e8b595f7b058c7909e410f3e0736d95e8f909d01:

  drm/msm/hdmi: make hdmi_phy_8996 OF clk provider (2022-09-18 09:38:07 -0700)

are available in the Git repository at:

  https://gitlab.freedesktop.org/drm/msm.git tags/drm-msm-fixes-2022-10-24

for you to fetch changes up to e0e86f25fd469ca76c1b50091372aed1ff99ca1a:

  drm/msm: Kconfig: Fix spelling mistake "throught" -> "through"
(2022-10-14 09:33:12 -0700)


msm-fixes for v6.1

- Fix shrinker deadlock
- Fix crash during suspend after unbind
- Fix IRQ lifetime issues
- Fix potential memory corruption with too many bridges
- Fix memory corruption on GPU state capture


Aashish Sharma (1):
  drm/msm: Remove redundant check for 'submit'

Akhil P Oommen (2):
  drm/msm/a6xx: Replace kcalloc() with kvzalloc()
  drm/msm/gpu: Fix crash during system suspend after unbind

Colin Ian King (1):
  drm/msm: Kconfig: Fix spelling mistake "throught" -> "through"

Johan Hovold (8):
  drm/msm: fix use-after-free on probe deferral
  drm/msm/dp: fix memory corruption with too many bridges
  drm/msm/dsi: fix memory corruption with too many bridges
  drm/msm/hdmi: fix memory corruption with too many bridges
  drm/msm/dp: fix IRQ lifetime
  drm/msm/dp: fix aux-bus EP lifetime
  drm/msm/dp: fix bridge lifetime
  drm/msm/hdmi: fix IRQ lifetime

Kuogee Hsieh (2):
  drm/msm/dp: add atomic_check to bridge ops
  drm/msm/dp: cleared DP_DOWNSPREAD_CTRL register before start link training

Nathan Huckleberry (1):
  drm/msm: Fix return type of mdp4_lvds_connector_mode_valid

Rob Clark (4):
  drm/msm/gem: Unpin objects slightly later
  drm/msm/a6xx: Fix kvzalloc vs state_kcalloc usage
  drm/msm/a6xx: Skip snapshotting unused GMU buffers
  drm/msm/a6xx: Remove state objects from list before freeing

 drivers/gpu/drm/msm/Kconfig|  2 +-
 drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c| 14 +++--
 drivers/gpu/drm/msm/adreno/adreno_device.c | 10 ++-
 drivers/gpu/drm/msm/adreno/adreno_gpu.c|  7 -
 .../gpu/drm/msm/disp/mdp4/mdp4_lvds_connector.c|  5 ++--
 drivers/gpu/drm/msm/dp/dp_ctrl.c   | 13 -
 drivers/gpu/drm/msm/dp/dp_display.c| 23 +--
 drivers/gpu/drm/msm/dp/dp_drm.c| 34 ++
 drivers/gpu/drm/msm/dp/dp_parser.c |  6 ++--
 drivers/gpu/drm/msm/dp/dp_parser.h |  5 ++--
 drivers/gpu/drm/msm/dsi/dsi.c  |  6 
 drivers/gpu/drm/msm/hdmi/hdmi.c|  7 -
 drivers/gpu/drm/msm/msm_drv.c  |  1 +
 drivers/gpu/drm/msm/msm_gem_submit.c   |  9 +++---
 drivers/gpu/drm/msm/msm_gpu.c  |  2 ++
 drivers/gpu/drm/msm/msm_gpu.h  |  4 +++
 drivers/gpu/drm/msm/msm_ringbuffer.c   |  3 +-
 17 files changed, 120 insertions(+), 31 deletions(-)


Re: [PATCH v2 1/2] drm/msm: remove duplicated code from a6xx_create_address_space

2022-10-24 Thread Rob Clark
On Mon, Oct 24, 2022 at 8:14 AM Dmitry Baryshkov
 wrote:
>
> The function a6xx_create_address_space() is mostly a copy of
> adreno_iommu_create_address_space() with added quirk setting. Reuse the
> original function to do the work, while introducing the wrapper to set
> the quirk.
>
> Signed-off-by: Dmitry Baryshkov 
> ---
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c   | 31 -
>  drivers/gpu/drm/msm/adreno/adreno_gpu.c |  4 ++--
>  drivers/gpu/drm/msm/adreno/adreno_gpu.h |  2 +-
>  drivers/gpu/drm/msm/msm_iommu.c |  7 ++
>  drivers/gpu/drm/msm/msm_mmu.h   |  1 +
>  5 files changed, 15 insertions(+), 30 deletions(-)
>
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
> b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index fdc578016e0b..7640f5b960d6 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -1786,41 +1786,18 @@ a6xx_create_address_space(struct msm_gpu *gpu, struct 
> platform_device *pdev)
>  {
> struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> -   struct iommu_domain *iommu;
> -   struct msm_mmu *mmu;
> struct msm_gem_address_space *aspace;
> -   u64 start, size;
>
> -   iommu = iommu_domain_alloc(&platform_bus_type);
> -   if (!iommu)
> -   return NULL;
> +   aspace = adreno_iommu_create_address_space(gpu, pdev);
> +   if (IS_ERR_OR_NULL(aspace))
> +   return ERR_CAST(aspace);
>
> /*
>  * This allows GPU to set the bus attributes required to use system
>  * cache on behalf of the iommu page table walker.
>  */
> if (!IS_ERR_OR_NULL(a6xx_gpu->htw_llc_slice))
> -   adreno_set_llc_attributes(iommu);
> -
> -   mmu = msm_iommu_new(&pdev->dev, iommu);
> -   if (IS_ERR(mmu)) {
> -   iommu_domain_free(iommu);
> -   return ERR_CAST(mmu);
> -   }
> -
> -   /*
> -* Use the aperture start or SZ_16M, whichever is greater. This will
> -* ensure that we align with the allocated pagetable range while still
> -* allowing room in the lower 32 bits for GMEM and whatnot
> -*/
> -   start = max_t(u64, SZ_16M, iommu->geometry.aperture_start);
> -   size = iommu->geometry.aperture_end - start + 1;
> -
> -   aspace = msm_gem_address_space_create(mmu, "gpu",
> -   start & GENMASK_ULL(48, 0), size);
> -
> -   if (IS_ERR(aspace) && !IS_ERR(mmu))
> -   mmu->funcs->destroy(mmu);
> +   adreno_set_llc_attributes(aspace->mmu);
>
> return aspace;
>  }
> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c 
> b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> index 382fb7f9e497..ed26b8dfc789 100644
> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> @@ -191,9 +191,9 @@ int adreno_zap_shader_load(struct msm_gpu *gpu, u32 pasid)
> return zap_shader_load_mdt(gpu, adreno_gpu->info->zapfw, pasid);
>  }
>
> -void adreno_set_llc_attributes(struct iommu_domain *iommu)
> +void adreno_set_llc_attributes(struct msm_mmu *mmu)
>  {
> -   iommu_set_pgtable_quirks(iommu, IO_PGTABLE_QUIRK_ARM_OUTER_WBWA);
> +   msm_iommu_set_pgtable_quirks(mmu, IO_PGTABLE_QUIRK_ARM_OUTER_WBWA);
>  }

This won't actually work.. looking at the arm-smmu code, the quirks
need to be set before attaching the device.  But there is an even
simpler way, just pass the quirks bitmask to msm_iommu_new() and get
rid of adreno_set_llc_attributes(), and msm_iommu_set_pgtable_quirks()

BR,
-R

>
>  struct msm_gem_address_space *
> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h 
> b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> index e7adc5c632d0..723729e463e8 100644
> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> @@ -338,7 +338,7 @@ struct msm_gem_address_space *
>  adreno_iommu_create_address_space(struct msm_gpu *gpu,
> struct platform_device *pdev);
>
> -void adreno_set_llc_attributes(struct iommu_domain *iommu);
> +void adreno_set_llc_attributes(struct msm_mmu *mmu);
>
>  int adreno_read_speedbin(struct device *dev, u32 *speedbin);
>
> diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
> index 5577cea7c009..768ab71cc43e 100644
> --- a/drivers/gpu/drm/msm/msm_iommu.c
> +++ b/drivers/gpu/drm/msm/msm_iommu.c
> @@ -186,6 +186,13 @@ int msm_iommu_pagetable_params(struct msm_mmu *mmu,
> return 0;
>  }
>
> +int msm_iommu_set_pgtable_quirks(struct msm_mmu *mmu, unsigned long quirk)
> +{
> +   struct msm_iommu *iommu = to_msm_iommu(mmu);
> +
> +   return iommu_set_pgtable_quirks(iommu->domain, quirk);
> +}
> +
>  static const struct msm_mmu_funcs pagetable_funcs = {
> .map = msm_iommu_pagetable_map,
> .unmap = msm_iommu_pagetable_unmap,
> diff --git a/drivers/gpu/drm/msm/msm_mmu.h 
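
Rob's suggestion is roughly the shape the later revisions of the series
take: a sketch of msm_iommu_new() accepting the quirks bitmask directly
and applying it before the domain is attached (any names beyond those
quoted above are assumptions, not the exact merged code):

struct msm_mmu *msm_iommu_new(struct device *dev, unsigned long quirks)
{
        struct iommu_domain *domain;
        struct msm_iommu *iommu;
        int ret;

        domain = iommu_domain_alloc(&platform_bus_type);
        if (!domain)
                return NULL;

        /* must happen before iommu_attach_device() for arm-smmu */
        iommu_set_pgtable_quirks(domain, quirks);

        iommu = kzalloc(sizeof(*iommu), GFP_KERNEL);
        if (!iommu) {
                iommu_domain_free(domain);
                return ERR_PTR(-ENOMEM);
        }

        iommu->domain = domain;
        msm_mmu_init(&iommu->base, dev, &funcs, MSM_MMU_IOMMU);

        ret = iommu_attach_device(domain, dev);
        if (ret) {
                iommu_domain_free(domain);
                kfree(iommu);
                return ERR_PTR(ret);
        }

        return &iommu->base;
}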

Re: Must-Pass Test Suite for KMS drivers

2022-10-24 Thread Rob Clark
On Mon, Oct 24, 2022 at 5:43 AM  wrote:
>
> Hi,
>
> I've been discussing the idea for the past year to add an IGT test suite that
> all well-behaved KMS drivers must pass.
>
> The main idea behind it comes from v4l2-compliance and cec-compliance,
> that are being used to validate that the drivers are sane.
>
> We should probably start building up the test list, and eventually
> mandate that all tests pass for all the new KMS drivers we would merge
> in the kernel, and be run by KCi or similar.

Let's get https://patchwork.freedesktop.org/patch/502641/ merged
first, that already gives us a mechanism similar to what we use in
mesa to track pass/fail/flake

Beyond that, I think some of the igt tests need to get more stable
before we could consider a "mustpass" list.  The kms_lease tests seem
to fail on msm due to bad assumptions in the test about which CRTCs
primary planes can attach to.  The legacy-cursor crc tests seem a bit
racy (there was a patch posted for that, not sure if it landed yet),
etc.

The best thing to do is actually start running CI and tracking xfails
and flakes ;-)
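
fwiw, the expectations files are just flat lists of test name plus
status, one per line, roughly like (made-up entries, purely for
illustration):

    kms_cursor_legacy@basic-flip-after-cursor,Fail
    kms_lease@lease-get,Fail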

BR,
-R

> I did a first pass to create a draft of such a test-suite, which would
> contain:
>
> igt@core_auth@basic-auth
> igt@core_auth@getclient-master-drop
> igt@core_auth@getclient-simple
> igt@core_auth@many-magics
> igt@core_getclient
> igt@core_getstats
> igt@core_getversion
> igt@core_hotunplug@hotrebind-lateclose
> igt@core_hotunplug@hotunbind-rebind
> igt@core_hotunplug@unbind-rebind
> igt@core_setmaster
> igt@core_setmaster_vs_auth
> igt@device_reset@unbind-reset-rebind
> igt@drm_read
> igt@dumb_buffer
> igt@fbdev
> igt@feature_discovery@display
> igt@kms_3d
> igt@kms_addfb_basic
> igt@kms_async_flips
> igt@kms_color
> igt@kms_concurrent
> igt@kms_cursor_crc
> igt@kms_cursor_edge_walk
> igt@kms_cursor_legacy@basic-busy-flip-before-cursor
> igt@kms_cursor_legacy@basic-flip-after-cursor
> igt@kms_cursor_legacy@basic-flip-after-cursor
> igt@kms_display_modes
> igt@kms_dither
> igt@kms_dp_aux_dev
> igt@kms_flip@basic-flip-vs-dpms
> igt@kms_flip@basic-flip-vs-modeset
> igt@kms_flip@basic-flip-vs-wf_vblank
> igt@kms_flip@basic-plain-flip
> igt@kms_flip_event_leak@basic
> igt@kms_force_connector_basic@force-connector-state
> igt@kms_force_connector_basic@force-edid
> igt@kms_force_connector_basic@force-load-detect
> igt@kms_force_connector_basic@prune-stale-modes
> igt@kms_getfb
> igt@kms_hdmi_inject
> igt@kms_hdr
> igt@kms_invalid_mode
> igt@kms_lease
> igt@kms_panel_fitting
> igt@kms_pipe_crc_basic
> igt@kms_plane_alpha_blend
> igt@kms_plane
> igt@kms_plane_cursor
> igt@kms_plane_lowres
> igt@kms_plane_multiple
> igt@kms_plane_scaling
> igt@kms_prop_blob
> igt@kms_properties
> igt@kms_rmfb
> igt@kms_scaling_modes
> igt@kms_sequence
> igt@kms_setmode
> igt@kms_sysfs_edid_timing
> igt@kms_tv_load_detect
> igt@kms_universal_plane
> igt@kms_vblank
> igt@kms_vrr
> igt@kms_writeback
>
> Most of them are skipped on vc4 right now, but I could see that some of
> them fail (kms_rmfb, core_hotunplug), so the list already proves to be
> useful.
>
> What do you think? Is there some more tests needed, or did I include
> some tests that shouldn't have been there?
>
> Thanks!
> Maxime


Re: [PATCH v2 00/10] drm/msm: probe deferral fixes

2022-10-24 Thread Rob Clark
On Mon, Oct 24, 2022 at 4:34 AM Johan Hovold  wrote:
>
> On Fri, Oct 21, 2022 at 09:05:52AM -0700, Abhinav Kumar wrote:
> > Hi Johan
> >
> > On 10/20/2022 11:27 PM, Johan Hovold wrote:
> > > On Tue, Sep 20, 2022 at 11:06:30AM +0200, Johan Hovold wrote:
> > >> On Tue, Sep 13, 2022 at 10:53:10AM +0200, Johan Hovold wrote:
> > >>> The MSM DRM driver is currently broken in multiple ways with respect to
> > >>> probe deferral. Not only does the driver currently fail to probe again
> > >>> after a late deferral, but due to a related use-after-free bug this also
> > >>> triggers NULL-pointer dereferences.
> > >>>
> > >>> These bugs are not new but have become critical with the release of
> > >>> 5.19 where probe is deferred in case the aux-bus EP panel driver has not
> > >>> yet been loaded.
> > >>>
> > >>> The underlying problem is lifetime issues due to careless use of
> > >>> device-managed resources.
> > >>
> > >> Any chance of getting this merged for 6.1?
> > >
> > > Is anyone picking these up as fixes for 6.1-rc as we discussed?
> > >
> > > Johan
> >
> > All of these except the last two ( as discussed ) have landed in the
> > -fixes tree
> >
> > https://gitlab.freedesktop.org/drm/msm/-/commit/6808abdb33bf90330e70a687d29f038507e06ebb
>
> Ah, perfect, thanks.
>
> When do you expect to send these on so that they end up in linux-next
> and eventually Linus's tree?

I'll send a -fixes PR this week

> Note that it looks like something happened with the commit messages when
> you applied these. Specifically, the Fixes tags appears to now have a
> line break in them and there's also some random whitespace before your
> SoB:
>
> Fixes: c3bf8e21
>
>  ("drm/msm/dp: Add eDP support via aux_bus")

naw, that is just some problem with gitlab's html generation, the
actual patch is fine ;-)

BR,
-R

> Cc: sta...@vger.kernel.org  # 5.19
> Reviewed-by: Dmitry Baryshkov 
> Signed-off-by: Johan Hovold 
> Tested-by: Kuogee Hsieh 
> Reviewed-by: Kuogee Hsieh 
> Patchwork: https://patchwork.freedesktop.org/patch/502667/
> Link: 
> https://lore.kernel.org/r/20220913085320.8577-8-johan+lin...@kernel.org
>
>
> Signed-off-by: Abhinav Kumar's avatarAbhinav Kumar 
> 
>
> It's possible just the gitlab UI that's messed up, but perhaps you can
> double check before they hit linux-next, which should complain about
> this otherwise.
>
> Johan


Re: [PATCH 2/3] drm/prime: set the dma_coherent flag for export

2022-10-20 Thread Rob Clark
On Thu, Oct 20, 2022 at 5:13 AM Christian König
 wrote:
>
> When a device driver is snooping the CPU cache during access we assume
> that all importers need to be able to snoop the CPU cache as well.
>
> Signed-off-by: Christian König 
> ---
>  drivers/gpu/drm/drm_prime.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
> index 20e109a802ae..d5c70b6fe8a4 100644
> --- a/drivers/gpu/drm/drm_prime.c
> +++ b/drivers/gpu/drm/drm_prime.c
> @@ -28,6 +28,7 @@
>
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>
> @@ -889,6 +890,7 @@ struct dma_buf *drm_gem_prime_export(struct 
> drm_gem_object *obj,
> .size = obj->size,
> .flags = flags,
> .priv = obj,
> +   .coherent = dev_is_dma_coherent(dev->dev),

To set the coherent flag correctly, I think I'd need a way to override
on a per buffer basis, since coherency is a property of the gpu
pgtables (which in the msm case is an immutable property of the gem
object).  We also have some awkwardness that drm->dev isn't actually
the GPU, thanks to the kernels device model seeing a collection of
other small devices shoehorned into a single drm device to fit
userspace's view of the world.  So relying on drm->dev isn't really
going to give sensible results.

I guess msm could just bury our heads in the sand and continue to do
things the way we have been (buffers that are mapped cached-coherent
are only self-shared) but would be nice to catch if userspace tried to
import one into (for ex) v4l2..

BR,
-R
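
To sketch the kind of per-buffer override that would be needed on the
msm side (purely illustrative, assuming the series' new coherent flag
ends up somewhere the exporter can override per-object):

    static struct dma_buf *msm_gem_prime_export(struct drm_gem_object *obj,
                                                int flags)
    {
            struct dma_buf *dmabuf = drm_gem_prime_export(obj, flags);

            /* coherency is a property of this bo's gpu pgtable mapping,
             * not of drm->dev:
             */
            if (!IS_ERR(dmabuf))
                    dmabuf->coherent =
                            !!(to_msm_bo(obj)->flags & MSM_BO_CACHED_COHERENT);

            return dmabuf;
    }

MSM_BO_CACHED_COHERENT is the existing msm uapi flag; the ->coherent
field stands in for whatever this series ends up exposing.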

> .resv = obj->resv,
> };
>
> --
> 2.25.1
>


Re: [PATCH v9] drm: Add initial ci/ subdirectory

2022-10-14 Thread Rob Clark
On Mon, Sep 12, 2022 at 12:29 AM Tomeu Vizoso
 wrote:
>
> And use it to store expectations about what the DRM drivers are
> supposed to pass in the IGT test suite.
>
> Also include a configuration file that points to the out-of-tree CI
> scripts.
>
> By storing the test expectations along the code we can make sure both
> stay in sync with each other, and so we can know when a code change
> breaks those expectations.
>
> This will allow all contributors to drm to reuse the infrastructure
> already in gitlab.freedesktop.org to test the driver on several
> generations of the hardware.
>
> v2:
>   - Fix names of result expectation files to match SoC
>   - Don't execute tests that are going to skip on all boards
>
> v3:
>   - Remove tracking of dmesg output during test execution
>
> v4:
>   - Move up to drivers/gpu/drm
>   - Add support for a bunch of other drivers
>   - Explain how to incorporate fixes for CI from a
> ${TARGET_BRANCH}-external-fixes branch
>   - Remove tests that pass from expected results file, to reduce the
> size of in-tree files
>   - Add docs about how to deal with outages in automated testing labs
>   - Specify the exact SHA of the CI scripts to be used
>
> v5:
>   - Remove unneeded skips from Meson expectations file
>   - Use a more advanced runner that detects flakes automatically
>   - Use a more succint format for the expectations
>   - Run many more tests (and use sharding to finish in time)
>   - Use skip lists to avoid hanging machines
>   - Add some build testing
>   - Build IGT in each pipeline for faster uprevs
>   - List failures in the GitLab UI
>
> v6:
>   - Rebase on top of latest drm-next
>   - Lower priority of LAVA jobs to not impact Mesa CI as much
>   - Update docs
>
> v7:
>   - Rebase on top of latest drm-next
>
> v8:
>   - Move all files specific to testing the kernel into the kernel tree
> (thus I have dropped the r-bs I had collected so far)
>   - Uprev Gitlab CI infrastructure scripts to the latest from Mesa
>   - Add MAINTAINERS entry
>   - Fix boot on MT8173 by adding some Kconfigs that are now needed
>   - Link to the docs from index.rst and hard-wrap the file
>
> v9:
>   - Only automatically run the pipelines for merge requests
>   - Switch to zstd for the build artifacts to align with Mesa
>   - Add Qcom USB PHYs to config as they are now =m in the defconfig
>
> Signed-off-by: Tomeu Vizoso 

Reviewed-by: Rob Clark 

> ---
>  Documentation/gpu/automated_testing.rst   |  144 +
>  Documentation/gpu/index.rst   |1 +
>  MAINTAINERS   |8 +
>  drivers/gpu/drm/ci/arm.config |   57 +
>  drivers/gpu/drm/ci/arm64.config   |  179 ++
>  drivers/gpu/drm/ci/build-igt.sh   |   43 +
>  drivers/gpu/drm/ci/build.sh   |  158 +
>  drivers/gpu/drm/ci/build.yml  |  110 +
>  drivers/gpu/drm/ci/check-patch.py |   57 +
>  drivers/gpu/drm/ci/container.yml  |   54 +
>  drivers/gpu/drm/ci/gitlab-ci.yml  |  225 ++
>  drivers/gpu/drm/ci/igt_runner.sh  |   77 +
>  drivers/gpu/drm/ci/image-tags.yml |   13 +
>  drivers/gpu/drm/ci/lava-submit.sh |   53 +
>  drivers/gpu/drm/ci/static-checks.yml  |   12 +
>  drivers/gpu/drm/ci/test.yml   |  322 ++
>  drivers/gpu/drm/ci/testlist.txt   | 2763 +
>  drivers/gpu/drm/ci/x86_64.config  |  105 +
>  .../gpu/drm/ci/xfails/amdgpu-stoney-fails.txt |   19 +
>  .../drm/ci/xfails/amdgpu-stoney-flakes.txt|   15 +
>  .../gpu/drm/ci/xfails/amdgpu-stoney-skips.txt |2 +
>  .../gpu/drm/ci/xfails/i915-amly-flakes.txt|   32 +
>  drivers/gpu/drm/ci/xfails/i915-amly-skips.txt |2 +
>  drivers/gpu/drm/ci/xfails/i915-apl-fails.txt  |   29 +
>  drivers/gpu/drm/ci/xfails/i915-apl-flakes.txt |1 +
>  drivers/gpu/drm/ci/xfails/i915-apl-skips.txt  |2 +
>  drivers/gpu/drm/ci/xfails/i915-cml-flakes.txt |   37 +
>  drivers/gpu/drm/ci/xfails/i915-glk-flakes.txt |   40 +
>  drivers/gpu/drm/ci/xfails/i915-glk-skips.txt  |2 +
>  drivers/gpu/drm/ci/xfails/i915-kbl-fails.txt  |8 +
>  drivers/gpu/drm/ci/xfails/i915-kbl-flakes.txt |   25 +
>  drivers/gpu/drm/ci/xfails/i915-kbl-skips.txt  |2 +
>  drivers/gpu/drm/ci/xfails/i915-tgl-fails.txt  |   19 +
>  drivers/gpu/drm/ci/xfails/i915-tgl-flakes.txt |5 +
>  drivers/gpu/drm/ci/xfails/i915-tgl-skips.txt  |8 +
>  drivers/gpu/drm/ci/xfails/i915-whl-fails.txt  |   30 +
>  drivers/gpu/drm/ci/xfails/i915-whl-flakes.txt |1 +
>  .../drm/ci/xfails/mediatek-mt8173-fails.txt   |   29 +
>  .../drm/ci/xfails/mediatek-mt8

[PATCH 1/3] drm/msm/a6xx: Fix kvzalloc vs state_kcalloc usage

2022-10-13 Thread Rob Clark
From: Rob Clark 

adreno_show_object() is a trap!  It will re-allocate the pointer it is
passed on first call, when the data is ascii85 encoded, using kvmalloc/
kvfree().  Which means the data *passed* to it must be kvmalloc'd, ie.
we cannot use the state_kcalloc() helper.

This partially reverts
ec8f1813bf8d ("drm/msm/a6xx: Replace kcalloc() with kvzalloc()"), but
fix the missing kvfree() to fix the memory leak that was present
previously.  And adds a warning comment.

Fixes: ec8f1813bf8d ("drm/msm/a6xx: Replace kcalloc() with kvzalloc()")
Closes: https://gitlab.freedesktop.org/drm/msm/-/issues/20
Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c | 11 ++-
 drivers/gpu/drm/msm/adreno/adreno_gpu.c |  7 ++-
 2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c 
b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
index 3c112a6cc8a2..730355f9e2d4 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
@@ -819,7 +819,7 @@ static struct msm_gpu_state_bo *a6xx_snapshot_gmu_bo(
 
snapshot->iova = bo->iova;
snapshot->size = bo->size;
-   snapshot->data = state_kcalloc(a6xx_state, 1, snapshot->size);
+   snapshot->data = kvzalloc(snapshot->size, GFP_KERNEL);
if (!snapshot->data)
return NULL;
 
@@ -1034,6 +1034,15 @@ static void a6xx_gpu_state_destroy(struct kref *kref)
struct a6xx_gpu_state *a6xx_state = container_of(state,
struct a6xx_gpu_state, base);
 
+   if (a6xx_state->gmu_log)
+   kvfree(a6xx_state->gmu_log->data);
+
+   if (a6xx_state->gmu_hfi)
+   kvfree(a6xx_state->gmu_hfi->data);
+
+   if (a6xx_state->gmu_debug)
+   kvfree(a6xx_state->gmu_debug->data);
+
	list_for_each_entry_safe(obj, tmp, &a6xx_state->objs, node)
kvfree(obj);
 
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c 
b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
index 382fb7f9e497..5a0e8491cd3a 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
@@ -729,7 +729,12 @@ static char *adreno_gpu_ascii85_encode(u32 *src, size_t 
len)
return buf;
 }
 
-/* len is expected to be in bytes */
+/* len is expected to be in bytes
+ *
+ * WARNING: *ptr should be allocated with kvmalloc or friends.  It can be 
free'd
+ * with kvfree() and replaced with a newly kvmalloc'd buffer on the first call
+ * when the unencoded raw data is encoded
+ */
 void adreno_show_object(struct drm_printer *p, void **ptr, int len,
bool *encoded)
 {
-- 
2.37.3
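
In other words, the expected usage pattern is roughly the following
(illustrative sketch only, with `p` being the drm_printer in use):

    void *buf = kvzalloc(len, GFP_KERNEL);  /* must be kvmalloc'd! */
    bool encoded = false;

    /* ... fill buf with the raw data ... */

    /* may kvfree(buf) and replace it with a freshly kvmalloc'd
     * ascii85-encoded copy on the first call:
     */
    adreno_show_object(&p, &buf, len, &encoded);

    kvfree(buf);

The final kvfree() frees whichever buffer adreno_show_object() left
behind.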



[PATCH 3/3] drm/msm/a6xx: Remove state objects from list before freeing

2022-10-13 Thread Rob Clark
From: Rob Clark 

Technically it worked as it was before, only because it was using the
_safe version of the iterator.  But it is sloppy practice to leave
dangling pointers.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c 
b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
index b0124d0f286c..a5c3d1ed255a 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
@@ -1046,8 +1046,10 @@ static void a6xx_gpu_state_destroy(struct kref *kref)
if (a6xx_state->gmu_debug)
kvfree(a6xx_state->gmu_debug->data);
 
-   list_for_each_entry_safe(obj, tmp, &a6xx_state->objs, node)
+   list_for_each_entry_safe(obj, tmp, &a6xx_state->objs, node) {
+   list_del(&obj->node);
kvfree(obj);
+   }
 
adreno_gpu_state_destroy(state);
kfree(a6xx_state);
-- 
2.37.3



[PATCH 2/3] drm/msm/a6xx: Skip snapshotting unused GMU buffers

2022-10-13 Thread Rob Clark
From: Rob Clark 

Some buffers are unused on certain sub-generations of a6xx.  So just
skip them.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c 
b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
index 730355f9e2d4..b0124d0f286c 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
@@ -813,6 +813,9 @@ static struct msm_gpu_state_bo *a6xx_snapshot_gmu_bo(
 {
struct msm_gpu_state_bo *snapshot;
 
+   if (!bo->size)
+   return NULL;
+
snapshot = state_kcalloc(a6xx_state, 1, sizeof(*snapshot));
if (!snapshot)
return NULL;
-- 
2.37.3



[PATCH 0/3] drm/msm/a6xx: devcore dump fixes

2022-10-13 Thread Rob Clark
From: Rob Clark 

First patch fixes a recently introduced memory corruption; the remaining
two are cleanups.

Rob Clark (3):
  drm/msm/a6xx: Fix kvzalloc vs state_kcalloc usage
  drm/msm/a6xx: Skip snapshotting unused GMU buffers
  drm/msm/a6xx: Remove state objects from list before freeing

 drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c | 18 --
 drivers/gpu/drm/msm/adreno/adreno_gpu.c |  7 ++-
 2 files changed, 22 insertions(+), 3 deletions(-)

-- 
2.37.3



Re: [DNM RFC PATCH] drm/msm: Use lowercase hex for !defines

2022-10-12 Thread Rob Clark
On Sat, Oct 8, 2022 at 10:43 AM Konrad Dybcio
 wrote:
>
> drm/msm capitalizes hex numbers rather randomly. Try to unify it.

yeah, there were some different preferences of various patch authors
for shouty HEX vs quiet hex... tbh I prefer the latter, but not really
sure it is worth the noise in git history to change it

BR,
-R

> Generated with:
>
> grep -rEl "\s0x\w*[A-Z]+*\w*" drivers/gpu/drm/msm | \
> xargs sed -i '/define/! s/\s0x\w*[A-Z]+*\w*/\L&/g'
> ---
> I could not find any strict hex capitalization rules for Linux, so
> I'm sending this very loosely, without an S-o-b and as a DNM RFC.
> Funnily enough, this patch somehow broke get_maintainer.pl
>
>  drivers/gpu/drm/msm/adreno/a2xx_gpu.c | 138 +++
>  drivers/gpu/drm/msm/adreno/a4xx_gpu.c | 164 +-
>  drivers/gpu/drm/msm/adreno/a5xx_gpu.c | 126 +++---
>  drivers/gpu/drm/msm/adreno/a5xx_gpu.h |   2 +-
>  drivers/gpu/drm/msm/adreno/a5xx_power.c   | 128 +++---
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c |  26 +--
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.h |   4 +-
>  drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c   |   4 +-
>  drivers/gpu/drm/msm/adreno/a6xx_gpu_state.h   |   2 +-
>  drivers/gpu/drm/msm/adreno/adreno_gpu.h   |  12 +-
>  .../gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c| 162 -
>  .../gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h|   4 +-
>  drivers/gpu/drm/msm/disp/dpu1/dpu_hw_ctl.c|  26 +--
>  drivers/gpu/drm/msm/disp/dpu1/dpu_hw_lm.c |  10 +-
>  .../gpu/drm/msm/disp/dpu1/dpu_hw_pingpong.c   |   8 +-
>  drivers/gpu/drm/msm/disp/dpu1/dpu_hw_sspp.c   |  16 +-
>  drivers/gpu/drm/msm/disp/dpu1/dpu_hw_util.c   |  98 +--
>  drivers/gpu/drm/msm/disp/dpu1/dpu_hw_vbif.c   |   2 +-
>  drivers/gpu/drm/msm/disp/dpu1/dpu_hw_wb.c |   2 +-
>  drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c   |   6 +-
>  drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c |  18 +-
>  drivers/gpu/drm/msm/disp/dpu1/dpu_vbif.c  |   2 +-
>  drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.c  |  28 +--
>  drivers/gpu/drm/msm/disp/mdp5/mdp5_crtc.c |   2 +-
>  drivers/gpu/drm/msm/disp/mdp5/mdp5_encoder.c  |   4 +-
>  drivers/gpu/drm/msm/dp/dp_audio.c |   8 +-
>  drivers/gpu/drm/msm/dp/dp_aux.c   |   2 +-
>  drivers/gpu/drm/msm/dp/dp_catalog.c   |  14 +-
>  drivers/gpu/drm/msm/dp/dp_ctrl.c  |   4 +-
>  drivers/gpu/drm/msm/dp/dp_display.c   |   2 +-
>  drivers/gpu/drm/msm/dp/dp_link.c  |  10 +-
>  drivers/gpu/drm/msm/dsi/phy/dsi_phy_10nm.c|   4 +-
>  drivers/gpu/drm/msm/dsi/phy/dsi_phy_7nm.c |   2 +-
>  drivers/gpu/drm/msm/hdmi/hdmi.c   |   4 +-
>  drivers/gpu/drm/msm/hdmi/hdmi_hdcp.c  |  20 +--
>  drivers/gpu/drm/msm/hdmi/hdmi_phy_8996.c  |  22 +--
>  36 files changed, 543 insertions(+), 543 deletions(-)
>
> diff --git a/drivers/gpu/drm/msm/adreno/a2xx_gpu.c 
> b/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
> index 6c9a747eb4ad..f207588218c6 100644
> --- a/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
> @@ -236,7 +236,7 @@ static int a2xx_hw_init(struct msm_gpu *gpu)
> for (i = 1; i < len; i++)
> gpu_write(gpu, REG_A2XX_CP_PFP_UCODE_DATA, ptr[i]);
>
> -   gpu_write(gpu, REG_AXXX_CP_QUEUE_THRESHOLDS, 0x000C0804);
> +   gpu_write(gpu, REG_AXXX_CP_QUEUE_THRESHOLDS, 0x000c0804);
>
> /* clear ME_HALT to start micro engine */
> gpu_write(gpu, REG_AXXX_CP_ME_CNTL, 0);
> @@ -335,90 +335,90 @@ static irqreturn_t a2xx_irq(struct msm_gpu *gpu)
>  }
>
>  static const unsigned int a200_registers[] = {
> -   0x, 0x0002, 0x0004, 0x000B, 0x003B, 0x003D, 0x0040, 0x0044,
> -   0x0046, 0x0047, 0x01C0, 0x01C1, 0x01C3, 0x01C8, 0x01D5, 0x01D9,
> -   0x01DC, 0x01DD, 0x01EA, 0x01EA, 0x01EE, 0x01F3, 0x01F6, 0x01F7,
> -   0x01FC, 0x01FF, 0x0391, 0x0392, 0x039B, 0x039E, 0x03B2, 0x03B5,
> -   0x03B7, 0x03B7, 0x03F8, 0x03FB, 0x0440, 0x0440, 0x0443, 0x0444,
> -   0x044B, 0x044B, 0x044D, 0x044F, 0x0452, 0x0452, 0x0454, 0x045B,
> -   0x047F, 0x047F, 0x0578, 0x0587, 0x05C9, 0x05C9, 0x05D0, 0x05D0,
> -   0x0601, 0x0604, 0x0606, 0x0609, 0x060B, 0x060E, 0x0613, 0x0614,
> -   0x0A29, 0x0A2B, 0x0A2F, 0x0A31, 0x0A40, 0x0A43, 0x0A45, 0x0A45,
> -   0x0A4E, 0x0A4F, 0x0C2C, 0x0C2C, 0x0C30, 0x0C30, 0x0C38, 0x0C3C,
> -   0x0C40, 0x0C40, 0x0C44, 0x0C44, 0x0C80, 0x0C86, 0x0C88, 0x0C94,
> -   0x0C99, 0x0C9A, 0x0CA4, 0x0CA5, 0x0D00, 0x0D03, 0x0D06, 0x0D06,
> -   0x0D08, 0x0D0B, 0x0D34, 0x0D35, 0x0DAE, 0x0DC1, 0x0DC8, 0x0DD4,
> -   0x0DD8, 0x0DD9, 0x0E00, 0x0E00, 0x0E02, 0x0E04, 0x0E17, 0x0E1E,
> -   0x0EC0, 0x0EC9, 0x0ECB, 0x0ECC, 0x0ED0, 0x0ED0, 0x0ED4, 0x0ED7,
> -   0x0EE0, 0x0EE2, 0x0F01, 0x0F02, 0x0F0C, 0x0F0C, 0x0F0E, 0x0F12,
> -   0x0F26, 0x0F2A, 0x0F2C, 0x0F2C, 0x2000, 0x2002, 0x2006, 0x200F,
> -   0x2080, 0x2082, 0x2100, 0x2109, 0x210C, 0x2114, 0x2180, 

Re: [Freedreno] [PATCH -next] drm/msm/msm_gem_shrinker: fix compile error in can_block()

2022-09-29 Thread Rob Clark
On Thu, Sep 29, 2022 at 4:51 AM Akhil P Oommen  wrote:
>
> On 9/29/2022 3:00 PM, Yang Yingliang wrote:
> > I got the compile error:
> >
> >drivers/gpu/drm/msm/msm_gem_shrinker.c: In function ‘can_block’:
> >drivers/gpu/drm/msm/msm_gem_shrinker.c:29:21: error: ‘__GFP_ATOMIC’ 
> > undeclared (first use in this function); did you mean ‘GFP_ATOMIC’?
> >  if (sc->gfp_mask & __GFP_ATOMIC)
> > ^~~~
> > GFP_ATOMIC
> >drivers/gpu/drm/msm/msm_gem_shrinker.c:29:21: note: each undeclared 
> > identifier is reported only once for each function it appears in
> >
> > __GFP_ATOMIC is dropped by commit 6708fe6bec50 ("mm: discard __GFP_ATOMIC").
> > Use __GFP_HIGH instead.
> >
> > Fixes: 025d27239a2f ("drm/msm/gem: Evict active GEM objects when necessary")
> > Signed-off-by: Yang Yingliang 
> > ---
> >   drivers/gpu/drm/msm/msm_gem_shrinker.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/msm/msm_gem_shrinker.c 
> > b/drivers/gpu/drm/msm/msm_gem_shrinker.c
> > index 58e0513be5f4..6a0de6cdb82b 100644
> > --- a/drivers/gpu/drm/msm/msm_gem_shrinker.c
> > +++ b/drivers/gpu/drm/msm/msm_gem_shrinker.c
> > @@ -26,7 +26,7 @@ static bool can_swap(void)
> >
> >   static bool can_block(struct shrink_control *sc)
> >   {
> > - if (sc->gfp_mask & __GFP_ATOMIC)
> > + if (sc->gfp_mask & __GFP_HIGH)
> >   return false;
> >   return current_is_kswapd() || (sc->gfp_mask & __GFP_RECLAIM);
> >   }
>
> Reviewed-by: Akhil P Oommen 
>

Somehow the original patch didn't show up in my inbox, but I've sent this:

https://patchwork.freedesktop.org/series/109255/

I guess __GFP_HIGH could also be used to detect GFP_ATOMIC, but
checking that direct reclaim is ok seems safer (ie. it should always
be safe to sleep in that case)

BR,
-R

>
> -Akhil.


[PATCH] drm/msm: Fix build break with recent mm tree

2022-09-29 Thread Rob Clark
From: Rob Clark 

9178e3dcb121 ("mm: discard __GFP_ATOMIC") removed __GFP_ATOMIC,
replacing it with a check for not __GFP_DIRECT_RECLAIM.

Reported-by: Randy Dunlap 
Reported-by: Stephen Rothwell 
Signed-off-by: Rob Clark 
---
Sorry, this was reported by Stephen earlier in the month, while
I was on the other side of the globe and jetlagged.  Unfortunately
I forgot about it by the time I got back home.  Applying this patch
after 025d27239a2f ("drm/msm/gem: Evict active GEM objects when necessary")
but before or after 9178e3dcb121 ("mm: discard __GFP_ATOMIC") should
resolve the build breakage.

 drivers/gpu/drm/msm/msm_gem_shrinker.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/msm_gem_shrinker.c 
b/drivers/gpu/drm/msm/msm_gem_shrinker.c
index 473ced14e520..8f83454ceedf 100644
--- a/drivers/gpu/drm/msm/msm_gem_shrinker.c
+++ b/drivers/gpu/drm/msm/msm_gem_shrinker.c
@@ -27,7 +27,7 @@ static bool can_swap(void)
 
 static bool can_block(struct shrink_control *sc)
 {
-   if (sc->gfp_mask & __GFP_ATOMIC)
+   if (!(sc->gfp_mask & __GFP_DIRECT_RECLAIM))
return false;
return current_is_kswapd() || (sc->gfp_mask & __GFP_RECLAIM);
 }
-- 
2.37.2



Re: linux-next: Tree for Sep 28 (drivers/gpu/drm/msm/msm_gem_shrinker.c)

2022-09-29 Thread Rob Clark
On Thu, Sep 29, 2022 at 12:09 AM Geert Uytterhoeven
 wrote:
>
> On Thu, Sep 29, 2022 at 8:10 AM Randy Dunlap  wrote:
> > On 9/28/22 12:26, broo...@kernel.org wrote:
> > > Changes since 20220927:
> > >
> >
> > on x86_64:
> >
> > ../drivers/gpu/drm/msm/msm_gem_shrinker.c: In function ‘can_block’:
> > ../drivers/gpu/drm/msm/msm_gem_shrinker.c:29:28: error: ‘__GFP_ATOMIC’ 
> > undeclared (first use in this function); did you mean ‘GFP_ATOMIC’?
> >29 | if (sc->gfp_mask & __GFP_ATOMIC)
> >   |^~~~
> >   |GFP_ATOMIC
>
> Also on m68k, as reported by nore...@ellerman.id.au
>
> I have bisected it to commit 1ccea29f90329e35 ("Merge branch
> 'mm-everything' of
> git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm"), but I didn't
> see immediately what caused it.

I'll send a patch for this shortly

BR,
-R


> Gr{oetje,eeting}s,
>
> Geert
>
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- 
> ge...@linux-m68k.org
>
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like 
> that.
> -- Linus Torvalds


[PATCH] drm/msm/gem: Unpin objects slightly later

2022-09-23 Thread Rob Clark
From: Rob Clark 

The introduction of 025d27239a2f exposes a problem with f371bcc0c2ac, in
that we need to keep the object pinned while the submit is queued
up in the gpu scheduler.  Otherwise the shrinker will see it as a thing
that can be evicted if we wait for it to be signaled.  But if the
shrinker path is waiting on it with the obj lock held, the job cannot be
scheduled, as that also requires briefly grabbing the obj lock, leading
to deadlock.  (Not to mention, we don't want the shrinker to evict an
obj queued up in gpu scheduler.)

Fixes: f371bcc0c2ac ("drm/msm/gem: Unpin buffers earlier")
Fixes: 025d27239a2f ("drm/msm/gem: Evict active GEM objects when necessary")
Closes: https://gitlab.freedesktop.org/drm/msm/-/issues/19
Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_gem_submit.c | 4 ++--
 drivers/gpu/drm/msm/msm_ringbuffer.c | 3 ++-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c 
b/drivers/gpu/drm/msm/msm_gem_submit.c
index 5599d93ec0d2..c670591995e6 100644
--- a/drivers/gpu/drm/msm/msm_gem_submit.c
+++ b/drivers/gpu/drm/msm/msm_gem_submit.c
@@ -501,11 +501,11 @@ static int submit_reloc(struct msm_gem_submit *submit, 
struct msm_gem_object *ob
  */
 static void submit_cleanup(struct msm_gem_submit *submit, bool error)
 {
-   unsigned cleanup_flags = BO_LOCKED | BO_OBJ_PINNED;
+   unsigned cleanup_flags = BO_LOCKED;
unsigned i;
 
if (error)
-   cleanup_flags |= BO_VMA_PINNED;
+   cleanup_flags |= BO_VMA_PINNED | BO_OBJ_PINNED;
 
for (i = 0; i < submit->nr_bos; i++) {
struct msm_gem_object *msm_obj = submit->bos[i].obj;
diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.c 
b/drivers/gpu/drm/msm/msm_ringbuffer.c
index cad4c3525f0b..57a8e9564540 100644
--- a/drivers/gpu/drm/msm/msm_ringbuffer.c
+++ b/drivers/gpu/drm/msm/msm_ringbuffer.c
@@ -25,7 +25,8 @@ static struct dma_fence *msm_job_run(struct drm_sched_job 
*job)
 
msm_gem_lock(obj);
msm_gem_unpin_vma_fenced(submit->bos[i].vma, fctx);
-   submit->bos[i].flags &= ~BO_VMA_PINNED;
+   msm_gem_unpin_locked(obj);
+   submit->bos[i].flags &= ~(BO_VMA_PINNED | BO_OBJ_PINNED);
msm_gem_unlock(obj);
}
 
-- 
2.37.2



[PATCH] drm/msm: Add MSM_INFO_GET_FLAGS

2022-09-23 Thread Rob Clark
From: Rob Clark 

In some cases crosvm needs a way to query the cache flags to communicate
them to the guest kernel for guest userspace mapping.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_drv.c | 10 ++
 include/uapi/drm/msm_drm.h|  1 +
 2 files changed, 11 insertions(+)

diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index 07f66412533b..66b515a956c1 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -818,6 +818,7 @@ static int msm_ioctl_gem_info(struct drm_device *dev, void 
*data,
case MSM_INFO_GET_OFFSET:
case MSM_INFO_GET_IOVA:
case MSM_INFO_SET_IOVA:
+   case MSM_INFO_GET_FLAGS:
/* value returned as immediate, not pointer, so len==0: */
if (args->len)
return -EINVAL;
@@ -845,6 +846,15 @@ static int msm_ioctl_gem_info(struct drm_device *dev, void 
*data,
case MSM_INFO_SET_IOVA:
ret = msm_ioctl_gem_info_set_iova(dev, file, obj, args->value);
break;
+   case MSM_INFO_GET_FLAGS:
+   if (obj->import_attach) {
+   ret = -EINVAL;
+   break;
+   }
+   /* Hide internal kernel-only flags: */
+   args->value = to_msm_bo(obj)->flags & MSM_BO_FLAGS;
+   ret = 0;
+   break;
case MSM_INFO_SET_NAME:
/* length check should leave room for terminating null: */
if (args->len >= sizeof(msm_obj->name)) {
diff --git a/include/uapi/drm/msm_drm.h b/include/uapi/drm/msm_drm.h
index 3c7b097c4e3d..f54b48ef6a2d 100644
--- a/include/uapi/drm/msm_drm.h
+++ b/include/uapi/drm/msm_drm.h
@@ -138,6 +138,7 @@ struct drm_msm_gem_new {
 #define MSM_INFO_SET_NAME  0x02   /* set the debug name (by pointer) */
 #define MSM_INFO_GET_NAME  0x03   /* get debug name, returned by pointer */
 #define MSM_INFO_SET_IOVA  0x04   /* set the iova, passed by value */
+#define MSM_INFO_GET_FLAGS 0x05   /* get the MSM_BO_x flags */
 
 struct drm_msm_gem_info {
__u32 handle; /* in */
-- 
2.37.2
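
On the userspace side this would be queried with the usual gem-info
ioctl, e.g. something like the following (illustrative sketch using
libdrm's drmCommandWriteRead() wrapper):

    struct drm_msm_gem_info req = {
            .handle = bo_handle,
            .info   = MSM_INFO_GET_FLAGS,
    };

    if (drmCommandWriteRead(fd, DRM_MSM_GEM_INFO, &req, sizeof(req)) == 0) {
            /* req.value now holds the MSM_BO_* flags: */
            uint32_t cache_mode = req.value & MSM_BO_CACHE_MASK;
    }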



[pull] drm/msm: drm-msm-next-2022-09-22 for v6.1

2022-09-22 Thread Rob Clark
drm/msm/dsi_phy_28nm_8960: Use stack memory for temporary clock names
  drm/msm/dsi/phy: Replace hardcoded char-array length with sizeof()
  drm/msm/dsi_phy_28nm_8960: Replace parent names with clk_hw pointers
  drm/msm/dsi_phy_28nm: Replace parent names with clk_hw pointers
  drm/msm/dsi_phy_14nm: Replace parent names with clk_hw pointers
  drm/msm/dsi_phy_10nm: Replace parent names with clk_hw pointers
  drm/msm/dsi_phy_7nm: Replace parent names with clk_hw pointers

Nathan Chancellor (1):
  drm/msm/dsi: Remove use of device_node in dsi_host_parse_dt()

Rob Clark (19):
  drm/msm: Reorder lock vs submit alloc
  drm/msm: Small submit cleanup
  drm/msm: Split out idr_lock
  drm/msm/gem: Check for active in shrinker path
  drm/msm/gem: Rename update_inactive
  drm/msm/gem: Rename to pin/unpin_pages
  drm/msm/gem: Consolidate pin/unpin paths
  drm/msm/gem: Remove active refcnt
  drm/gem: Add LRU/shrinker helper
  drm/msm/gem: Convert to using drm_gem_lru
  drm/msm/gem: Unpin buffers earlier
  drm/msm/gem: Consolidate shrinker trace
  drm/msm/gem: Evict active GEM objects when necessary
  drm/msm/gem: Add msm_gem_assert_locked()
  drm/msm/gem: Convert to lockdep assert
  drm/msm: Add fault-injection support
  drm/msm/iommu: optimize map/unmap
  drm/msm: De-open-code some CP_EVENT_WRITE
  drm/msm/rd: Fix FIFO-full deadlock

Stephen Boyd (4):
  drm/msm/dp: Reorganize code to avoid forward declaration
  drm/msm/dp: Remove pixel_rate from struct dp_ctrl
  drm/msm/dp: Get rid of dp_ctrl_on_stream_phy_test_report()
  drm/msm/dp: Silence inconsistent indent warning

sunliming (1):
  drm/msm/dsi: fix the inconsistent indenting

ye xingchen (1):
  drm/msm/dsi: Remove the unneeded result variable

 .../bindings/display/msm/dp-controller.yaml|  47 +++-
 .../bindings/display/msm/dpu-msm8998.yaml  |   4 +
 .../bindings/display/msm/dpu-qcm2290.yaml  |   3 +
 .../bindings/display/msm/dpu-sc7180.yaml   |   3 +
 .../bindings/display/msm/dpu-sc7280.yaml   |   3 +
 .../bindings/display/msm/dpu-sdm845.yaml   |   4 +
 .../devicetree/bindings/display/msm/gmu.yaml   | 166 ++--
 .../devicetree/bindings/display/msm/gpu.yaml   |   3 +-
 .../devicetree/bindings/display/msm/mdp4.yaml  |   2 +-
 .../devicetree/bindings/phy/qcom,hdmi-phy-qmp.yaml |  15 +-
 drivers/gpu/drm/drm_gem.c  | 170 
 drivers/gpu/drm/msm/adreno/a3xx_gpu.c  |   2 +-
 drivers/gpu/drm/msm/adreno/a4xx_gpu.c  |   2 +-
 drivers/gpu/drm/msm/adreno/a6xx.xml.h  |   4 +
 drivers/gpu/drm/msm/adreno/a6xx_gmu.c  |  83 +++---
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c  |  45 +++-
 drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c   |  50 ++--
 drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.h   |   1 -
 drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c|  37 +--
 drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.h|   2 +-
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c |   9 +-
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h |   2 +-
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_ctl.c |  78 +++---
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_ctl.h |  35 ++-
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_dsc.c |  74 ++---
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_dsc.h |   4 +-
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_mdss.h|   6 +-
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_sspp.c|   3 +-
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_sspp.h|   4 +-
 drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c|  27 +-
 drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c  |  94 +--
 drivers/gpu/drm/msm/disp/dpu1/dpu_plane.h  |  22 +-
 drivers/gpu/drm/msm/disp/dpu1/dpu_vbif.c   |  65 +++--
 drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c   |   9 +-
 drivers/gpu/drm/msm/dp/dp_catalog.c|   2 +-
 drivers/gpu/drm/msm/dp/dp_ctrl.c   | 150 +--
 drivers/gpu/drm/msm/dp/dp_ctrl.h   |   1 -
 drivers/gpu/drm/msm/dp/dp_link.c   |   5 +-
 drivers/gpu/drm/msm/dsi/dsi.c  |  37 +--
 drivers/gpu/drm/msm/dsi/dsi.h  |  31 +--
 drivers/gpu/drm/msm/dsi/dsi_cfg.c  | 172 ++--
 drivers/gpu/drm/msm/dsi/dsi_cfg.h  |   3 +-
 drivers/gpu/drm/msm/dsi/dsi_host.c | 299 ++---
 drivers/gpu/drm/msm/dsi/dsi_manager.c  | 288 +++-
 drivers/gpu/drm/msm/dsi/phy/dsi_phy.c  | 162 +++
 drivers/gpu/drm/msm/dsi/phy/dsi_phy.h  |   5 +-
 drivers/gpu/drm/msm/dsi/phy/dsi_phy_10nm.c | 185 ++---
 drivers/gpu/drm/msm/dsi/phy/dsi_phy_14nm.c |  87 +++---
 drivers/gpu/drm/msm/dsi/phy/dsi_phy_20nm.c |  14 +-
 drivers/gpu/drm/msm/dsi/phy/dsi_phy_28nm.c   

Re: [PATCH 2/2] drm/msm/dpu: Add support for P010 format

2022-09-12 Thread Rob Clark
On Thu, Sep 1, 2022 at 1:34 PM Jessica Zhang  wrote:
>
> Add support for P010 color format. This adds support for both linear and
> compressed formats.
>
> Signed-off-by: Jessica Zhang 

Reviewed-by: Rob Clark 

> ---
>  drivers/gpu/drm/msm/disp/dpu1/dpu_formats.c| 17 -
>  drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c |  1 +
>  drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c  |  1 +
>  3 files changed, 18 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_formats.c 
> b/drivers/gpu/drm/msm/disp/dpu1/dpu_formats.c
> index 57971c08f57c..d95540309d4d 100644
> --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_formats.c
> +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_formats.c
> @@ -434,6 +434,12 @@ static const struct dpu_format dpu_format_map[] = {
> DPU_CHROMA_H2V1, DPU_FORMAT_FLAG_YUV,
> DPU_FETCH_LINEAR, 2),
>
> +   PSEUDO_YUV_FMT_LOOSE(P010,
> +   0, COLOR_8BIT, COLOR_8BIT, COLOR_8BIT,
> +   C1_B_Cb, C2_R_Cr,
> +   DPU_CHROMA_420, DPU_FORMAT_FLAG_DX | DPU_FORMAT_FLAG_YUV,
> +   DPU_FETCH_LINEAR, 2),
> +
> INTERLEAVED_YUV_FMT(VYUY,
> 0, COLOR_8BIT, COLOR_8BIT, COLOR_8BIT,
> C2_R_Cr, C0_G_Y, C1_B_Cb, C0_G_Y,
> @@ -536,6 +542,14 @@ static const struct dpu_format dpu_format_map_ubwc[] = {
> DPU_CHROMA_420, DPU_FORMAT_FLAG_YUV |
> DPU_FORMAT_FLAG_COMPRESSED,
> DPU_FETCH_UBWC, 4, DPU_TILE_HEIGHT_NV12),
> +
> +   PSEUDO_YUV_FMT_TILED(P010,
> +   0, COLOR_8BIT, COLOR_8BIT, COLOR_8BIT,
> +   C1_B_Cb, C2_R_Cr,
> +   DPU_CHROMA_420, DPU_FORMAT_FLAG_DX |
> +   DPU_FORMAT_FLAG_YUV |
> +   DPU_FORMAT_FLAG_COMPRESSED,
> +   DPU_FETCH_UBWC, 4, DPU_TILE_HEIGHT_UBWC),
>  };
>
>  /* _dpu_get_v_h_subsample_rate - Get subsample rates for all formats we 
> support
> @@ -584,7 +598,8 @@ static int _dpu_format_get_media_color_ubwc(const struct 
> dpu_format *fmt)
> int color_fmt = -1;
> int i;
>
> -   if (fmt->base.pixel_format == DRM_FORMAT_NV12) {
> +   if (fmt->base.pixel_format == DRM_FORMAT_NV12 ||
> +   fmt->base.pixel_format == DRM_FORMAT_P010) {
> if (DPU_FORMAT_IS_DX(fmt)) {
> if (fmt->unpack_tight)
> color_fmt = COLOR_FMT_NV12_BPP10_UBWC;
> diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c 
> b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c
> index 53b6edb2f563..199a2f755db4 100644
> --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c
> +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c
> @@ -210,6 +210,7 @@ static const uint32_t plane_formats_yuv[] = {
> DRM_FORMAT_RGBX8888,
> DRM_FORMAT_BGRX8888,
>
> +   DRM_FORMAT_P010,
> DRM_FORMAT_NV12,
> DRM_FORMAT_NV21,
> DRM_FORMAT_NV16,
> diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c 
> b/drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c
> index 60ea834dc8d6..f130bf783081 100644
> --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c
> +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c
> @@ -73,6 +73,7 @@ static const uint32_t qcom_compressed_supported_formats[] = 
> {
> DRM_FORMAT_BGR565,
>
> DRM_FORMAT_NV12,
> +   DRM_FORMAT_P010,
>  };
>
>  /**
> --
> 2.35.1
>


Re: [PATCH 1/2] drm/msm/dpu: Add support for XR30 format

2022-09-12 Thread Rob Clark
On Thu, Sep 1, 2022 at 1:34 PM Jessica Zhang  wrote:
>
> Add support for XR30 color format. This supports both linear and
> compressed formats.
>
> Signed-off-by: Jessica Zhang 

Reviewed-by: Rob Clark 

> ---
>  drivers/gpu/drm/msm/disp/dpu1/dpu_formats.c| 7 +++
>  drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c | 2 ++
>  drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c  | 1 +
>  3 files changed, 10 insertions(+)
>
> diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_formats.c 
> b/drivers/gpu/drm/msm/disp/dpu1/dpu_formats.c
> index f436a1f3419d..57971c08f57c 100644
> --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_formats.c
> +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_formats.c
> @@ -524,6 +524,12 @@ static const struct dpu_format dpu_format_map_ubwc[] = {
> true, 4, DPU_FORMAT_FLAG_DX | DPU_FORMAT_FLAG_COMPRESSED,
> DPU_FETCH_UBWC, 2, DPU_TILE_HEIGHT_UBWC),
>
> +   INTERLEAVED_RGB_FMT_TILED(XRGB2101010,
> +   COLOR_8BIT, COLOR_8BIT, COLOR_8BIT, COLOR_8BIT,
> +   C2_R_Cr, C0_G_Y, C1_B_Cb, C3_ALPHA, 4,
> +   true, 4, DPU_FORMAT_FLAG_DX | DPU_FORMAT_FLAG_COMPRESSED,
> +   DPU_FETCH_UBWC, 2, DPU_TILE_HEIGHT_UBWC),
> +
> PSEUDO_YUV_FMT_TILED(NV12,
> 0, COLOR_8BIT, COLOR_8BIT, COLOR_8BIT,
> C1_B_Cb, C2_R_Cr,
> @@ -571,6 +577,7 @@ static int _dpu_format_get_media_color_ubwc(const struct 
> dpu_format *fmt)
> {DRM_FORMAT_XBGR8888, COLOR_FMT_RGBA8888_UBWC},
> {DRM_FORMAT_XRGB8888, COLOR_FMT_RGBA8888_UBWC},
> {DRM_FORMAT_ABGR2101010, COLOR_FMT_RGBA1010102_UBWC},
> +   {DRM_FORMAT_XRGB2101010, COLOR_FMT_RGBA1010102_UBWC},
> {DRM_FORMAT_XBGR2101010, COLOR_FMT_RGBA1010102_UBWC},
> {DRM_FORMAT_BGR565, COLOR_FMT_RGB565_UBWC},
> };
> diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c 
> b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c
> index 0239a811d5ec..53b6edb2f563 100644
> --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c
> +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c
> @@ -156,6 +156,7 @@ static const uint32_t plane_formats[] = {
> DRM_FORMAT_RGBX8888,
> DRM_FORMAT_BGRX8888,
> DRM_FORMAT_XBGR8888,
> +   DRM_FORMAT_XRGB2101010,
> DRM_FORMAT_RGB888,
> DRM_FORMAT_BGR888,
> DRM_FORMAT_RGB565,
> @@ -184,6 +185,7 @@ static const uint32_t plane_formats_yuv[] = {
> DRM_FORMAT_RGBA8888,
> DRM_FORMAT_BGRX8888,
> DRM_FORMAT_BGRA8888,
> +   DRM_FORMAT_XRGB2101010,
> DRM_FORMAT_XRGB8888,
> DRM_FORMAT_XBGR8888,
> DRM_FORMAT_RGBX8888,
> diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c 
> b/drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c
> index a617a3d8b1bc..60ea834dc8d6 100644
> --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c
> +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c
> @@ -69,6 +69,7 @@ static const uint32_t qcom_compressed_supported_formats[] = 
> {
> DRM_FORMAT_ARGB8888,
> DRM_FORMAT_XBGR8888,
> DRM_FORMAT_XRGB8888,
> +   DRM_FORMAT_XRGB2101010,
> DRM_FORMAT_BGR565,
>
> DRM_FORMAT_NV12,
> --
> 2.35.1
>


Re: [PATCH v8 2/2] drm/gem: Don't map imported GEMs

2022-09-11 Thread Rob Clark
On Wed, Sep 7, 2022 at 3:25 AM Dmitry Osipenko
 wrote:
>
> On 8/23/22 19:47, Rob Clark wrote:
> > On Tue, Aug 23, 2022 at 3:01 AM Christian König
> >  wrote:
> >>
> >> Am 22.08.22 um 19:26 schrieb Dmitry Osipenko:
> >>> On 8/16/22 22:55, Dmitry Osipenko wrote:
> >>>> On 8/16/22 15:03, Christian König wrote:
> >>>>> Am 16.08.22 um 13:44 schrieb Dmitry Osipenko:
> >>>>>> [SNIP]
> >>>>>>> The other complication I noticed is that we don't seem to keep around
> >>>>>>> the fd after importing to a GEM handle.  And I could imagine that
> >>>>>>> doing so could cause issues with too many fd's.  So I guess the best
> >>>>>>> thing is to keep the status quo and let drivers that cannot mmap
> >>>>>>> imported buffers just fail mmap?
> >>>>>> That actually should be all the drivers excluding those that use
> >>>>>> DRM-SHMEM because only DRM-SHMEM uses dma_buf_mmap(), that's why it
> >>>>>> works for Panfrost. I'm pretty sure mmaping of imported GEMs doesn't
> >>>>>> work for the MSM driver, isn't it?
> >>>>>>
> >>>>>> Intel and AMD drivers don't allow to map the imported dma-bufs. Both
> >>>>>> refuse to do the mapping.
> >>>>>>
> >>>>>> Although, AMDGPU "succeeds" to do the mapping using
> >>>>>> AMDGPU_GEM_DOMAIN_GTT, but then touching the mapping causes bus fault,
> >>>>>> hence mapping actually fails. I think it might be the AMDGPU
> >>>>>> driver/libdrm bug, haven't checked yet.
> >>>>> That's then certainly broken somehow. Amdgpu should never ever have
> >>>>> allowed to mmap() imported DMA-bufs and the last time I checked it didn't.
> >>>> I'll take a closer look. So far I can only tell that it's a kernel
> >>>> driver issue because once I re-applied this "Don't map imported GEMs"
> >>>> patch, AMDGPU began to refuse mapping AMDGPU_GEM_DOMAIN_GTT.
> >>>>
> >>>>>> So we're back to the point that neither of DRM drivers need to map
> >>>>>> imported dma-bufs and this was never tested. In this case this patch is
> >>>>>> valid, IMO.
> >>>> Actually, I'm now looking at Etnaviv and Nouveau and seems they should
> >>>> map imported dma-buf properly. I know that people ran Android on
> >>>> Etnaviv. So maybe devices with a separated GPU/display need to map
> >>>> imported display BO for Android support. Wish somebody who ran Android
> >>>> on one of these devices using upstream drivers could give a definitive
> >>>> answer. I may try to test Nouveau later on.
> >>>>
> >>> Nouveau+Intel combo doesn't work because of [1] that says:
> >>>
> >>> "Refuse to fault imported pages. This should be handled (if at all) by
> >>> redirecting mmap to the exporter."
> >>>
> >>> [1]
> >>> https://elixir.bootlin.com/linux/v5.19/source/drivers/gpu/drm/ttm/ttm_bo_vm.c#L154
> >>>
> >>> Interestingly, I noticed that there are IGT tests which check prime
> >>> mmaping of Nouveau+Intel [2] (added 9 years ago), but they fail as well,
> >>> as expected. The fact that IGT has such tests is interesting because it
> >>> suggests that the mapping worked in the past. It's also surprising that
> >>> nobody cared to fix the failing tests. For the reference, I checked
> >>> v5.18 and today's linux-next.
> >>>
> >>> [2]
> >>> https://gitlab.freedesktop.org/drm/igt-gpu-tools/-/blob/master/tests/prime_nv_test.c#L132
> >>>
> >>> Starting subtest: nv_write_i915_cpu_mmap_read
> >>> Received signal SIGBUS.
> >>> Stack trace:
> >>>   #0 [fatal_sig_handler+0x163]
> >>>   #1 [__sigaction+0x50]
> >>>   #2 [__igt_uniquereal_main354+0x406]
> >>>   #3 [main+0x23]
> >>>   #4 [__libc_start_call_main+0x80]
> >>>   #5 [__libc_start_main+0x89]
> >>>   #6 [_start+0x25]
> >>> Subtest nv_write_i915_cpu_mmap_read: CRASH (0,005s)
> >>>
> >>> Starting subtest: nv_write_i915_gtt_mmap_read
> >>> Received signal SIGBUS.
> >>> Stack trace:
> >>>   #0 [fatal_sig_handler+0x163]
> >>>   #1 [__sigaction+0x50]

Re: [PATCH v1] drm/ttm: Refcount allocated tail pages

2022-09-08 Thread Rob Clark
On Tue, Sep 6, 2022 at 1:01 PM Daniel Vetter  wrote:
>
> On Mon, Aug 15, 2022 at 12:05:19PM +0200, Christian König wrote:
> > Am 15.08.22 um 11:54 schrieb Dmitry Osipenko:
> > > Higher order pages allocated using alloc_pages() aren't refcounted and 
> > > they
> > > need to be refcounted, otherwise it's impossible to map them by KVM. This
> > > patch sets the refcount of the tail pages and fixes the KVM memory mapping
> > > faults.
> > >
> > > Without this change guest virgl driver can't map host buffers into guest
> > > and can't provide OpenGL 4.5 profile support to the guest. The host
> > > mappings are also needed for enabling the Venus driver using host GPU
> > > drivers that are utilizing TTM.
> > >
> > > Based on a patch proposed by Trigger Huang.
> >
> > Well I can't count how often I have repeated this: This is an absolutely
> > clear NAK!
> >
> > TTM pages are not reference counted in the first place and because of this
> > giving them to virgl is illegal.
> >
> > Please immediately stop this completely broken approach. We have discussed
> > this multiple times now.
>
> Yeah we need to get this stuff closed for real by tagging them all with
> VM_IO or VM_PFNMAP asap.
>
> It seems to be a recurring amount of fun that people try to mmap dma-buf
> and then call get_user_pages on them.
>
> Which just doesn't work. I guess this is also why Rob Clark sent out that
> dma-buf patch to expose mapping information (i.e. wc vs wb vs uncached).

No, not really.. my patch was simply so that the VMM side of virtgpu
could send the correct cache mode to the guest when handed a dma-buf
;-)

BR,
-R

>
> There seems to be some serious bonghits going on :-/
> -Daniel
>
> >
> > Regards,
> > Christian.
> >
> > >
> > > Cc: sta...@vger.kernel.org
> > > Cc: Trigger Huang 
> > > Link: 
> > > https://www.collabora.com/news-and-blog/blog/2021/11/26/venus-on-qemu-enabling-new-virtual-vulkan-driver/#qcom1343
> > > Tested-by: Dmitry Osipenko  # AMDGPU (Qemu 
> > > and crosvm)
> > > Signed-off-by: Dmitry Osipenko 
> > > ---
> > >   drivers/gpu/drm/ttm/ttm_pool.c | 25 -
> > >   1 file changed, 24 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/gpu/drm/ttm/ttm_pool.c 
> > > b/drivers/gpu/drm/ttm/ttm_pool.c
> > > index 21b61631f73a..11e92bb149c9 100644
> > > --- a/drivers/gpu/drm/ttm/ttm_pool.c
> > > +++ b/drivers/gpu/drm/ttm/ttm_pool.c
> > > @@ -81,6 +81,7 @@ static struct page *ttm_pool_alloc_page(struct ttm_pool 
> > > *pool, gfp_t gfp_flags,
> > > unsigned long attr = DMA_ATTR_FORCE_CONTIGUOUS;
> > > struct ttm_pool_dma *dma;
> > > struct page *p;
> > > +   unsigned int i;
> > > void *vaddr;
> > > /* Don't set the __GFP_COMP flag for higher order allocations.
> > > @@ -93,8 +94,10 @@ static struct page *ttm_pool_alloc_page(struct 
> > > ttm_pool *pool, gfp_t gfp_flags,
> > > if (!pool->use_dma_alloc) {
> > > p = alloc_pages(gfp_flags, order);
> > > -   if (p)
> > > +   if (p) {
> > > p->private = order;
> > > +   goto ref_tail_pages;
> > > +   }
> > > return p;
> > > }
> > > @@ -120,6 +123,23 @@ static struct page *ttm_pool_alloc_page(struct 
> > > ttm_pool *pool, gfp_t gfp_flags,
> > > dma->vaddr = (unsigned long)vaddr | order;
> > > p->private = (unsigned long)dma;
> > > +
> > > +ref_tail_pages:
> > > +   /*
> > > +* KVM requires mapped tail pages to be refcounted because put_page()
> > > +* is invoked on them in the end of the page fault handling, and thus,
> > > +* tail pages need to be protected from the premature releasing.
> > > +* In fact, KVM page fault handler refuses to map tail pages to guest
> > > +* if they aren't refcounted because hva_to_pfn_remapped() checks the
> > > +* refcount specifically for this case.
> > > +*
> > > +* In particular, unreferenced tail pages result in a KVM "Bad 
> > > address"
> > > +* failure for VMMs that use VirtIO-GPU when guest's Mesa VirGL driver
> > > +* accesses mapped host TTM buffer that contains tail pages.
> > > +*/
> > > +   for (i = 1; i < 1 << order; i++)
> > > +   page_ref_inc(p + i);
> > >

Re: [PATCH v2 1/3] dma-buf: Add ioctl to query mmap coherency/cache info

2022-09-07 Thread Rob Clark
On Wed, Sep 7, 2022 at 9:55 AM Daniel Vetter  wrote:
>
> On Thu, Aug 18, 2022 at 08:01:53AM -0700, Rob Clark wrote:
> > On Thu, Aug 18, 2022 at 7:54 AM Christian König
> >  wrote:
> > >
> > > Am 18.08.22 um 16:25 schrieb Rob Clark:
> > > > On Thu, Aug 18, 2022 at 4:21 AM Christian König
> > > >  wrote:
> > > >> Am 17.08.22 um 15:44 schrieb Rob Clark:
> > > >>> On Wed, Aug 17, 2022 at 2:57 AM Christian König
> > > >>>  wrote:
> > > >>>> [SNIP]
> > > >>>>
> > > >>>> The resulting cache attrs from combination of S1 and S2 translation
> > > >>>> can differ.  So ideally we setup the S2 pgtables in guest aligned 
> > > >>>> with
> > > >>>> host userspace mappings
> > > >>>> Well exactly that is not very convincing.
> > > >>>>
> > > >>>> What you want to do is to use one channel for the address and a
> > > >>>> different one for the cache attrs, that's not something I would
> > > >>>> recommend doing in general.
> > > >>> How would that work.. mmap() is the channel for the address, we'd need
> > > >>> to introduce a new syscall that returned additional information?
> > > >> The channel for the address is not mmap(), but rather the page faults.
> > > >> mmap() is just the function setting up that channel.
> > > >>
> > > >> The page faults then insert both the address as well as the caching
> > > >> attributes (at least on x86).
> > > > This is true on arm64 as well, but only in the S1 tables (which I
> > > > would have to assume is the case on x86 as well)
> > > >
> > > >> That we then need to forward the caching attributes manually once more
> > > >> seems really misplaced.
> > > >>
> > > >>>> Instead the client pgtables should be setup in a way so that host can
> > > >>>> overwrite them.
> > > >>> How?  That is completely not how VMs work.  Even if the host knew
> > > >>> where the pgtables were and somehow magically knew the various guest
> > > >>> userspace VAs, it would be racey.
> > > >> Well you mentioned that the client page tables can be setup in a way
> > > >> that the host page tables determine what caching to use. As far as I 
> > > >> can
> > > >> see this is what we should use here.
> > > > On arm64/aarch64, they *can*.. but the system (on some versions of
> > > > armv8) can also be configured to let S2 determine the attributes.  And
> > > > apparently there are benefits to this (avoids unnecessary cache
> > > > flushing in the host, AFAIU.)  This is the case where we need this new
> > > > api.
> > > >
> > > > IMO it is fine for the exporter to return a value indicating that the
> > > > attributes change dynamically or that S1 attributes must somehow be
> > > > used by the hw.  This would at least let the VMM return an error in
> > > > cases where S1 attrs cannot be relied on.  But there are enough
> > > > exporters where the cache attrs are static for the life of the buffer.
> > > > So even if you need to return DMA_BUF_MAP_I_DONT_KNOW, maybe that is
> > > > fine (if x86 can always rely on S1 attrs), or at least will let the
> > > > VMM return an error rather than just blindly assuming things will
> > > > work.
> > > >
> > > > But it makes no sense to reject the whole idea just because of some
> > > > exporters (which may not even need this).  There is always room to let
> > > > them return a map-info value that describes the situation or
> > > > limitations to the VMM.
> > >
> > > Well it does make sense as far as I can see.
> > >
> > > This is a very specific workaround for a platform problem which only
> > > matters there, but increases complexity for everybody.
> >
> > I'm not sure how this adds complexity for everybody.. or at least the
> > intention was the default value for the new enum is the same as
> > current status-quo, so no need to plumb something thru every single
> > exporter.
>
> I think what König freaks out about here, and I think it's the same
> concern I have, is that this is for _all_ dma-buf exporters.
>
> Yeah I know we're having this "anything might not be impleme

Re: [Linaro-mm-sig] [PATCH v2 1/3] dma-buf: Add ioctl to query mmap info

2022-09-07 Thread Rob Clark
On Tue, Sep 6, 2022 at 12:46 PM Daniel Vetter  wrote:
>
> On Mon, Aug 01, 2022 at 10:04:55AM -0700, Rob Clark wrote:
> > From: Rob Clark 
> >
> > This is a fairly narrowly focused interface, providing a way for a VMM
> > in userspace to tell the guest kernel what pgprot settings to use when
> > mapping a buffer to guest userspace.
> >
> > For buffers that get mapped into guest userspace, virglrenderer returns
> > a dma-buf fd to the VMM (crosvm or qemu).  In addition to mapping the
> > pages into the guest VM, it needs to report to drm/virtio in the guest
> > the cache settings to use for guest userspace.  In particular, on some
> > architectures, creating aliased mappings with different cache attributes
> > is frowned upon, so it is important that the guest mappings have the
> > same cache attributes as any potential host mappings.
> >
> > Signed-off-by: Rob Clark 
> > ---
> > v2. fix compiler warning
>
> I think I bikeshedded this on irc already, here for the record too.

You should look at v3 (which I confusingly titled as v2, sry):

https://patchwork.freedesktop.org/patch/497799/?series=106847&rev=3

> - this won't work for buffers which do change the mapping when they move
>   (ttm can do that). And cros does make noises about discrete gpus I've
>   heard, this matters even for you :-)

Correct, in v3 you could use DMA_BUF_MAP_INCOHERENT for this case (or
we could add additional enum values.. DMA_BUF_MAP_IDK or whatever)
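
Roughly, the enum shape is along these lines (only
DMA_BUF_MAP_INCOHERENT is named above; the other values here are
illustrative -- see the linked v3 patch for the authoritative
definition):

    enum dma_buf_map_info {
            /* exporter can't make promises, or attrs can change: */
            DMA_BUF_MAP_INCOHERENT,
            /* mapping attrs are fixed for the buffer's lifetime: */
            DMA_BUF_MAP_COHERENT_WC,
            DMA_BUF_MAP_COHERENT_CACHED,
    };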

re: dgpu, I guess those will be pass-thru so not so relevant for this issue

> - I'm pretty sure this will put us even more onto the nasty people list
>   that dma-api folks maintain, especially with passing this all to
>   userspace
> - follow_pte() can figure this out internally in the kernel and kvm is
>   already using this, and I think doing this all internally with mmu
>   notifier and what not to make sure it all stays in sync is the right
>   approach. So your kvm/whatever combo should be able to figure out wth
>   it's supposed to be doing.

This doesn't help, because the VMM is in userspace.. it is the VMM
which needs this information.

> I think if you make this a virtio special case like we've done with the
> magic uuid stuff, then that would make sense. Making it a full dma-buf
> interface doesn't imo.

IMHO we can consider this (at least in case of v3) as a virtio
special, at least in the sense that it is opt-in for the exporting
driver, and exporting drivers are free to not play along

BR,
-R

>
> Cheers, Daniel
>
> >
> >  drivers/dma-buf/dma-buf.c| 26 ++
> >  include/linux/dma-buf.h  |  7 +++
> >  include/uapi/linux/dma-buf.h | 28 
> >  3 files changed, 61 insertions(+)
> >
> > diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> > index 32f55640890c..87c52f080274 100644
> > --- a/drivers/dma-buf/dma-buf.c
> > +++ b/drivers/dma-buf/dma-buf.c
> > @@ -326,6 +326,29 @@ static long dma_buf_set_name(struct dma_buf *dmabuf, 
> > const char __user *buf)
> >   return 0;
> >  }
> >
> > +static long dma_buf_info(struct dma_buf *dmabuf, void __user *uarg)
> > +{
> > + struct dma_buf_info arg;
> > +
> > + if (copy_from_user(&arg, uarg, sizeof(arg)))
> > + return -EFAULT;
> > +
> > + switch (arg.param) {
> > + case DMA_BUF_INFO_VM_PROT:
> > + if (!dmabuf->ops->mmap_info)
> > + return -ENOSYS;
> > + arg.value = dmabuf->ops->mmap_info(dmabuf);
> > + break;
> > + default:
> > + return -EINVAL;
> > + }
> > +
> > + if (copy_to_user(uarg, &arg, sizeof(arg)))
> > + return -EFAULT;
> > +
> > + return 0;
> > +}
> > +
> >  static long dma_buf_ioctl(struct file *file,
> > unsigned int cmd, unsigned long arg)
> >  {
> > @@ -369,6 +392,9 @@ static long dma_buf_ioctl(struct file *file,
> >   case DMA_BUF_SET_NAME_B:
> >   return dma_buf_set_name(dmabuf, (const char __user *)arg);
> >
> > + case DMA_BUF_IOCTL_INFO:
> > + return dma_buf_info(dmabuf, (void __user *)arg);
> > +
> >   default:
> >   return -ENOTTY;
> >   }
> > diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
> > index 71731796c8c3..6f4de64a5937 100644
> > --- a/include/linux/dma-buf.h
> > +++ b/include/linux/dma-buf.h
> > @@ -283,6 +283,13 @@ struct dma_buf_ops {
> >*/
> >   int (*mmap)(struct dma_buf *, struct vm_are

Re: [PATCH] drm/msm: fix repeated words in comments

2022-08-27 Thread Rob Clark
On Fri, Aug 26, 2022 at 2:43 AM Dmitry Baryshkov
 wrote:
>
> On 23/08/2022 14:54, Jilin Yuan wrote:
> >   Delete the redundant word 'one'.
>
> The whitespace is unnecessary.
>
> >
> > Signed-off-by: Jilin Yuan 
>
> Reviewed-by: Dmitry Baryshkov 
> Fixes: 7198e6b03155 ("drm/msm: add a3xx gpu support")
>

jfyi, this comment (and associated list-head) is removed by:

https://patchwork.freedesktop.org/patch/496131/?series=105633&rev=4

BR,
-R

>
> > ---
> >   drivers/gpu/drm/msm/msm_gem.h | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/msm/msm_gem.h b/drivers/gpu/drm/msm/msm_gem.h
> > index c75d3b879a53..e300c70e8904 100644
> > --- a/drivers/gpu/drm/msm/msm_gem.h
> > +++ b/drivers/gpu/drm/msm/msm_gem.h
> > @@ -118,7 +118,7 @@ struct msm_gem_object {
> >* An object is either:
> >*  inactive - on priv->inactive_dontneed or priv->inactive_willneed
> >* (depending on purgeability status)
> > -  *  active   - on one one of the gpu's active_list..  well, at
> > +  *  active   - on one of the gpu's active_list..  well, at
> >* least for now we don't have (I don't think) hw sync between
> >* 2d and 3d one devices which have both, meaning we need to
> >* block on submit if a bo is already on other ring
>
> --
> With best wishes
> Dmitry
>


[pull] drm/msm: drm-msm-fixes-2022-08-27 for v6.0

2022-08-27 Thread Rob Clark
(one more time without forgetting dri-devel this time)

Hi Dave,

A few fixes for the v6.0 cycle.  I meant to send this a bit earlier
but ended up at the bottom of other rabbit holes.  Summary below (and
in tag msg)

The following changes since commit cb77085b1f0a86ef9dfba86b5f3ed6c3340c2ea3:

  drm/msm/dpu: Fix for non-visible planes (2022-07-08 08:10:58 -0700)

are available in the Git repository at:

  https://gitlab.freedesktop.org/drm/msm.git tags/drm-msm-fixes-2022-08-27

for you to fetch changes up to 174974d8463b77c2b4065e98513adb204e64de7d:

  drm/msm/rd: Fix FIFO-full deadlock (2022-08-15 10:19:53 -0700)


Fixes for v6.0

- Fix for inconsistent indenting in function msm_dsi_dphy_timing_calc_v3.
  This fixes a smatch warning reported by kbot
- Fix to make eDP the first connector in the connected list. This was
  mainly done to address a screen corruption issue we were seeing on
  sc7280 boards which have eDP as the primary display. The corruption
  itself is from usermode but we decided to fix it this way because
  things work correctly with the primary display as the first one for
  usermode
- Fix to populate intf_cfg correctly before calling reset_intf_cfg().
  Without this, the display pipeline is not torn down correctly for
  writeback
- Specify the correct number of DSI regulators for SDM660. It should
  have been 1 but 2 was specified
- Specify the correct number of DSI regulators for MSM8996. It should
  have been 3 but 2 was specified
- Fix for removing DP_RECOVERED_CLOCK_OUT_EN bit for tps4 link training
  for DP. This was causing link training failures and hence no display
  for a specific DP to HDMI cable on chromebooks
- Fix probe-deferral crash in gpu devfreq
- Fix gpu debugfs deadlock


Abhinav Kumar (1):
  drm/msm/dpu: populate wb or intf before reset_intf_cfg

Bjorn Andersson (1):
  drm/msm/gpu: Drop qos request if devm_devfreq_add_device() fails

Douglas Anderson (2):
  drm/msm/dsi: Fix number of regulators for msm8996_dsi_cfg
  drm/msm/dsi: Fix number of regulators for SDM660

Kuogee Hsieh (2):
  drm/msm/dp: make eDP panel as the first connected connector
  drm/msm/dp: delete DP_RECOVERED_CLOCK_OUT_EN to fix tps4

Rob Clark (1):
  drm/msm/rd: Fix FIFO-full deadlock

sunliming (1):
  drm/msm/dsi: fix the inconsistent indenting

 drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c | 6 ++
 drivers/gpu/drm/msm/dp/dp_ctrl.c| 2 +-
 drivers/gpu/drm/msm/dsi/dsi_cfg.c   | 4 ++--
 drivers/gpu/drm/msm/dsi/phy/dsi_phy.c   | 2 +-
 drivers/gpu/drm/msm/msm_drv.c   | 2 ++
 drivers/gpu/drm/msm/msm_gpu_devfreq.c   | 2 ++
 drivers/gpu/drm/msm/msm_rd.c| 3 +++
 7 files changed, 17 insertions(+), 4 deletions(-)


Re: [PATCH 5/5] drm/msm: Skip tlbinv on unmap from non-current pgtables

2022-08-24 Thread Rob Clark
On Wed, Aug 24, 2022 at 10:46 AM Akhil P Oommen
 wrote:
>
> On 8/21/2022 11:49 PM, Rob Clark wrote:
> > From: Rob Clark 
> >
> > We can rely on the tlbinv done by CP_SMMU_TABLE_UPDATE in this case.
> >
> > Signed-off-by: Rob Clark 
> > ---
> >   drivers/gpu/drm/msm/adreno/a6xx_gpu.c |  6 ++
> >   drivers/gpu/drm/msm/msm_iommu.c   | 29 +++
> >   2 files changed, 35 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
> > b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > index c8ad8aeca777..1ba0ed629549 100644
> > --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > @@ -1180,6 +1180,12 @@ static int hw_init(struct msm_gpu *gpu)
> >   /* Always come up on rb 0 */
> >   a6xx_gpu->cur_ring = gpu->rb[0];
> >
> > + /*
> > +  * Note, we cannot assume anything about the state of the SMMU when
> > +  * coming back from power collapse, so force a CP_SMMU_TABLE_UPDATE
> > +  * on the first submit.  Also, msm_iommu_pagetable_unmap() relies on
> > +  * this behavior.
> > +  */
> >   gpu->cur_ctx_seqno = 0;
> >
> >   /* Enable the SQE to start the CP engine */
> > diff --git a/drivers/gpu/drm/msm/msm_iommu.c 
> > b/drivers/gpu/drm/msm/msm_iommu.c
> > index 94c8c09980d1..218074a58081 100644
> > --- a/drivers/gpu/drm/msm/msm_iommu.c
> > +++ b/drivers/gpu/drm/msm/msm_iommu.c
> > @@ -45,8 +45,37 @@ static int msm_iommu_pagetable_unmap(struct msm_mmu 
> > *mmu, u64 iova,
> >   size -= 4096;
> >   }
> >
> > + /*
> > +  * A CP_SMMU_TABLE_UPDATE is always sent for the first
> > +  * submit after resume, and that does a TLB invalidate.
> > +  * So we can skip that if the device is not currently
> > +  * powered.
> > +  */
> > + if (!pm_runtime_get_if_in_use(pagetable->parent->dev))
> > + goto out;
> > +
> > + /*
> > +  * If we are not the current pgtables, we can rely on the
> > +  * TLB invalidate done by CP_SMMU_TABLE_UPDATE.
> > +  *
> > +  * We'll always be racing with the GPU updating ttbr0,
> > +  * but there are only two cases:
> > +  *
> > +  *  + either we are not the current pgtables and there
> > +  *will be a tlbinv done by the GPU before we are again
> > +  *
> > +  *  + or we are.. there might have already been a tlbinv
> > +  *if we raced with the GPU, but we have to assume the
> > +  *worst and do the tlbinv
> > +  */
> > + if (adreno_smmu->get_ttbr0(adreno_smmu->cookie) != pagetable->ttbr)
> > + goto out_put;
> > +
> >   adreno_smmu->tlb_inv_by_id(adreno_smmu->cookie, pagetable->asid);
> >
> > +out_put:
> > + pm_runtime_put(pagetable->parent->dev);
> > +out:
> >   return (unmapped == size) ? 0 : -EINVAL;
> >   }
> >
> Asking because it is a *security issue* if we get this wrong:
> 1. Is there any measurable benefit with this patch? I believe tlb
> invalidation doesn't contribute much to the unmap latency.

It turned out to not make a huge difference.. although I expect the
part about skipping the inv when runtime suspended is still useful
from a power standpoint (but don't have a great setup to measure that)

BR,
-R

> 2. We at least should insert a full memory barrier before reading the
> ttbr0 register to ensure that everything we did prior to that is visible
> to smmu. But then I guess the cost of the full barrier would be similar
> to the tlb invalidation.
>
> Because it could lead to security issues or other very hard to debug
> issues, I would prefer this optimization only if there is a significant
> measurable gain.
>
> -Akhil.
>


Re: [Freedreno] [PATCH v2] drm/msm/iommu: optimize map/unmap

2022-08-23 Thread Rob Clark
On Tue, Aug 23, 2022 at 2:37 PM Akhil P Oommen  wrote:
>
> On 8/23/2022 10:07 PM, Rob Clark wrote:
> > From: Rob Clark 
> >
> > Using map_pages/unmap_pages cuts down on the # of pgtable walks needed
> > in the process of finding where to insert/remove an entry.  The end
> > result is ~5-10x faster than mapping a single page at a time.
> >
> > v2: Rename iommu_pgsize(), drop obsolete comments, fix error handling
> >  in msm_iommu_pagetable_map()
> >
> > Signed-off-by: Rob Clark 
> > ---
> >   drivers/gpu/drm/msm/msm_iommu.c | 101 +++-
> >   1 file changed, 86 insertions(+), 15 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/msm/msm_iommu.c 
> > b/drivers/gpu/drm/msm/msm_iommu.c
> > index a54ed354578b..5577cea7c009 100644
> > --- a/drivers/gpu/drm/msm/msm_iommu.c
> > +++ b/drivers/gpu/drm/msm/msm_iommu.c
> > @@ -21,6 +21,7 @@ struct msm_iommu_pagetable {
> >   struct msm_mmu base;
> >   struct msm_mmu *parent;
> >   struct io_pgtable_ops *pgtbl_ops;
> > + unsigned long pgsize_bitmap;/* Bitmap of page sizes in use */
> >   phys_addr_t ttbr;
> >   u32 asid;
> >   };
> > @@ -29,23 +30,84 @@ static struct msm_iommu_pagetable *to_pagetable(struct 
> > msm_mmu *mmu)
> >   return container_of(mmu, struct msm_iommu_pagetable, base);
> >   }
> >
> > +/* based on iommu_pgsize() in iommu.c: */
> > +static size_t calc_pgsize(struct msm_iommu_pagetable *pagetable,
> > +unsigned long iova, phys_addr_t paddr,
> > +size_t size, size_t *count)
> > +{
> > + unsigned int pgsize_idx, pgsize_idx_next;
> > + unsigned long pgsizes;
> > + size_t offset, pgsize, pgsize_next;
> > + unsigned long addr_merge = paddr | iova;
> > +
> > + /* Page sizes supported by the hardware and small enough for @size */
> > + pgsizes = pagetable->pgsize_bitmap & GENMASK(__fls(size), 0);
> > +
> > + /* Constrain the page sizes further based on the maximum alignment */
> > + if (likely(addr_merge))
> > + pgsizes &= GENMASK(__ffs(addr_merge), 0);
> > +
> > + /* Make sure we have at least one suitable page size */
> > + BUG_ON(!pgsizes);
> > +
> > + /* Pick the biggest page size remaining */
> > + pgsize_idx = __fls(pgsizes);
> > + pgsize = BIT(pgsize_idx);
> > + if (!count)
> > + return pgsize;
> > +
> > + /* Find the next biggest supported page size, if it exists */
> > + pgsizes = pagetable->pgsize_bitmap & ~GENMASK(pgsize_idx, 0);
> > + if (!pgsizes)
> > + goto out_set_count;
> > +
> > + pgsize_idx_next = __ffs(pgsizes);
> > + pgsize_next = BIT(pgsize_idx_next);
> > +
> > + /*
> > +  * There's no point trying a bigger page size unless the virtual
> > +  * and physical addresses are similarly offset within the larger page.
> > +  */
> > + if ((iova ^ paddr) & (pgsize_next - 1))
> > + goto out_set_count;
> > +
> > + /* Calculate the offset to the next page size alignment boundary */
> > + offset = pgsize_next - (addr_merge & (pgsize_next - 1));
> > +
> > + /*
> > +  * If size is big enough to accommodate the larger page, reduce
> > +  * the number of smaller pages.
> > +  */
> > + if (offset + pgsize_next <= size)
> > + size = offset;
> > +
> > +out_set_count:
> > + *count = size >> pgsize_idx;
> > + return pgsize;
> > +}
> > +
> Can we keep this in the iommu driver? Seems useful to other drivers too.

This might end up being only temporary.. Robin had the idea of adding
a private way to create "dummy" iommu_domain's which we could use
instead of the pgtbl ops directly.  On one hand, it would simplify
this quite a bit.  On the other hand it would force powering up (at
least the SMMU) for unmaps/maps, and make it harder to do things like
this:

  https://patchwork.freedesktop.org/patch/498660/?series=107536&rev=1

> Perhaps implement an sg-friendly version of the iopgtbl ops, like
> unmap_sg() maybe!

Probably not a good idea to push more into the iopgtbl
implementations.. __iommu_map_sg() does have a bit of cleverness, but
that shouldn't really be required if we get our sg from
drm_prime_pages_to_sg() since sg_alloc_append_table_from_pages()
already performs a similar merging
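
(For reference, a minimal sketch of the merging being referred to; the
helper and its signature exist upstream, the wrapper itself is just
illustrative:)

  #include <linux/scatterlist.h>

  /*
   * Build an sg_table from an array of pages.  Physically contiguous
   * pages are coalesced into single sg entries, so a per-sg-entry loop
   * like the one in msm_iommu_pagetable_map() already sees merged
   * ranges without any extra cleverness:
   */
  static int build_merged_sgt(struct sg_table *sgt, struct page **pages,
                              unsigned int npages)
  {
          return sg_alloc_append_table_from_pages(sgt, pages, npages, 0,
                          (unsigned long)npages << PAGE_SHIFT,
                          UINT_MAX, 0, GFP_KERNEL);
  }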

BR,
-R

>
> -Akhil.
> >   static int msm_iommu_pagetable_unmap(struct msm_mmu *mmu, u64 iova,
> 

Re: [PATCH v8 2/2] drm/gem: Don't map imported GEMs

2022-08-23 Thread Rob Clark
On Tue, Aug 23, 2022 at 3:01 AM Christian König
 wrote:
>
> Am 22.08.22 um 19:26 schrieb Dmitry Osipenko:
> > On 8/16/22 22:55, Dmitry Osipenko wrote:
> >> On 8/16/22 15:03, Christian König wrote:
> >>> Am 16.08.22 um 13:44 schrieb Dmitry Osipenko:
>  [SNIP]
> > The other complication I noticed is that we don't seem to keep around
> > the fd after importing to a GEM handle.  And I could imagine that
> > doing so could cause issues with too many fd's.  So I guess the best
> > thing is to keep the status quo and let drivers that cannot mmap
> > imported buffers just fail mmap?
>  That actually should be all the drivers excluding those that use
>  DRM-SHMEM because only DRM-SHMEM uses dma_buf_mmap(), that's why it
>  works for Panfrost. I'm pretty sure mmaping of imported GEMs doesn't
>  work for the MSM driver, does it?
> 
>  Intel and AMD drivers don't allow to map the imported dma-bufs. Both
>  refuse to do the mapping.
> 
>  Although, AMDGPU "succeeds" in doing the mapping using
>  AMDGPU_GEM_DOMAIN_GTT, but then touching the mapping causes a bus fault,
>  hence the mapping actually fails. I think it might be an AMDGPU
>  driver/libdrm bug, haven't checked yet.
> >>> That's then certainly broken somehow. Amdgpu should never ever have
> >>> allowed to mmap() imported DMA-bufs and the last time I checked it didn't.
> >> I'll take a closer look. So far I can only tell that it's a kernel
> >> driver issue because once I re-applied this "Don't map imported GEMs"
> >> patch, AMDGPU began to refuse mapping AMDGPU_GEM_DOMAIN_GTT.
> >>
>  So we're back to the point that none of the DRM drivers need to map
>  imported dma-bufs and this was never tested. In this case this patch is
>  valid, IMO.
> >> Actually, I'm now looking at Etnaviv and Nouveau and it seems they should
> >> map imported dma-bufs properly. I know that people ran Android on
> >> Etnaviv. So maybe devices with a separate GPU/display need to map
> >> imported display BOs for Android support. I wish somebody who ran Android
> >> on one of these devices using upstream drivers could give a definitive
> >> answer. I may try to test Nouveau later on.
> >>
> > Nouveau+Intel combo doesn't work because of [1] that says:
> >
> > "Refuse to fault imported pages. This should be handled (if at all) by
> > redirecting mmap to the exporter."
> >
> > [1]
> > https://elixir.bootlin.com/linux/v5.19/source/drivers/gpu/drm/ttm/ttm_bo_vm.c#L154
> >
> > Interestingly, I noticed that there are IGT tests which check prime
> > mmaping of Nouveau+Intel [2] (added 9 years ago), but they fail as well,
> > as expected. The fact that IGT has such tests is interesting because it
> > suggests that the mapping worked in the past. It's also surprising that
> > nobody cared to fix the failing tests. For the reference, I checked
> > v5.18 and today's linux-next.
> >
> > [2]
> > https://gitlab.freedesktop.org/drm/igt-gpu-tools/-/blob/master/tests/prime_nv_test.c#L132
> >
> > Starting subtest: nv_write_i915_cpu_mmap_read
> > Received signal SIGBUS.
> > Stack trace:
> >   #0 [fatal_sig_handler+0x163]
> >   #1 [__sigaction+0x50]
> >   #2 [__igt_uniquereal_main354+0x406]
> >   #3 [main+0x23]
> >   #4 [__libc_start_call_main+0x80]
> >   #5 [__libc_start_main+0x89]
> >   #6 [_start+0x25]
> > Subtest nv_write_i915_cpu_mmap_read: CRASH (0,005s)
> >
> > Starting subtest: nv_write_i915_gtt_mmap_read
> > Received signal SIGBUS.
> > Stack trace:
> >   #0 [fatal_sig_handler+0x163]
> >   #1 [__sigaction+0x50]
> >   #2 [__igt_uniquereal_main354+0x33d]
> >   #3 [main+0x23]
> >   #4 [__libc_start_call_main+0x80]
> >   #5 [__libc_start_main+0x89]
> >   #6 [_start+0x25]
> > Subtest nv_write_i915_gtt_mmap_read: CRASH (0,004s)
> >
> > I'm curious about the Etnaviv driver because it uses its own shmem
> > implementation and maybe it has working mmapping of imported GEMs since
> > it imports the dma-buf pages into the Etnaviv BO. Although, it would be
> > risky to map pages using different caching attributes (WC) than the
> > exporter, which is prohibited on ARM, and then one may try to map an
> > imported udmabuf.
> >
> > Apparently, the Intel DG TTM driver should be able to map imported
> > dma-buf because it sets TTM_TT_FLAG_EXTERNAL_MAPPABLE.
>
> Even with that flag set it is illegal for an importer to map the pages
> directly.
>
> If that ever worked then the only real solution is to redirect mmap()
> calls on importer BOs to dma_buf_mmap().

Yeah, I think this is the best option.  Forcing userspace to hang on
to the fd just in case someone calls readpix would be pretty harsh.
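
(Roughly what drm_gem_shmem already does for imported objects; a sketch
of how other drivers could adopt the same redirect, untested:)

  static int my_gem_mmap(struct drm_gem_object *obj,
                         struct vm_area_struct *vma)
  {
          if (obj->import_attach) {
                  /*
                   * Clear vm_private_data so the driver's own fault
                   * handling stays out of the way, then let the
                   * exporter set up the mapping (and cache attrs):
                   */
                  vma->vm_private_data = NULL;
                  return dma_buf_mmap(obj->dma_buf, vma, 0);
          }

          /* ... native (non-imported) mapping path ... */
          return -EOPNOTSUPP;
  }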

BR,
-R

> Regards,
> Christian.
>
> >
> > Overall, it's still questionable to me whether it's worthwhile to allow
> > the mmaping of imported GEMs since only Panfrost/Lima can do it out of
> > all drivers and h/w that I tested. Feels like drivers that can do the
> > mapping have it just because they can and not because they need it.
> >
>


[PATCH v2] drm/msm/iommu: optimize map/unmap

2022-08-23 Thread Rob Clark
From: Rob Clark 

Using map_pages/unmap_pages cuts down on the # of pgtable walks needed
in the process of finding where to insert/remove an entry.  The end
result is ~5-10x faster than mapping a single page at a time.

v2: Rename iommu_pgsize(), drop obsolete comments, fix error handling
in msm_iommu_pagetable_map()

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_iommu.c | 101 +++-
 1 file changed, 86 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
index a54ed354578b..5577cea7c009 100644
--- a/drivers/gpu/drm/msm/msm_iommu.c
+++ b/drivers/gpu/drm/msm/msm_iommu.c
@@ -21,6 +21,7 @@ struct msm_iommu_pagetable {
struct msm_mmu base;
struct msm_mmu *parent;
struct io_pgtable_ops *pgtbl_ops;
+   unsigned long pgsize_bitmap;/* Bitmap of page sizes in use */
phys_addr_t ttbr;
u32 asid;
 };
@@ -29,23 +30,84 @@ static struct msm_iommu_pagetable *to_pagetable(struct 
msm_mmu *mmu)
return container_of(mmu, struct msm_iommu_pagetable, base);
 }
 
+/* based on iommu_pgsize() in iommu.c: */
+static size_t calc_pgsize(struct msm_iommu_pagetable *pagetable,
+  unsigned long iova, phys_addr_t paddr,
+  size_t size, size_t *count)
+{
+   unsigned int pgsize_idx, pgsize_idx_next;
+   unsigned long pgsizes;
+   size_t offset, pgsize, pgsize_next;
+   unsigned long addr_merge = paddr | iova;
+
+   /* Page sizes supported by the hardware and small enough for @size */
+   pgsizes = pagetable->pgsize_bitmap & GENMASK(__fls(size), 0);
+
+   /* Constrain the page sizes further based on the maximum alignment */
+   if (likely(addr_merge))
+   pgsizes &= GENMASK(__ffs(addr_merge), 0);
+
+   /* Make sure we have at least one suitable page size */
+   BUG_ON(!pgsizes);
+
+   /* Pick the biggest page size remaining */
+   pgsize_idx = __fls(pgsizes);
+   pgsize = BIT(pgsize_idx);
+   if (!count)
+   return pgsize;
+
+   /* Find the next biggest supported page size, if it exists */
+   pgsizes = pagetable->pgsize_bitmap & ~GENMASK(pgsize_idx, 0);
+   if (!pgsizes)
+   goto out_set_count;
+
+   pgsize_idx_next = __ffs(pgsizes);
+   pgsize_next = BIT(pgsize_idx_next);
+
+   /*
+* There's no point trying a bigger page size unless the virtual
+* and physical addresses are similarly offset within the larger page.
+*/
+   if ((iova ^ paddr) & (pgsize_next - 1))
+   goto out_set_count;
+
+   /* Calculate the offset to the next page size alignment boundary */
+   offset = pgsize_next - (addr_merge & (pgsize_next - 1));
+
+   /*
+* If size is big enough to accommodate the larger page, reduce
+* the number of smaller pages.
+*/
+   if (offset + pgsize_next <= size)
+   size = offset;
+
+out_set_count:
+   *count = size >> pgsize_idx;
+   return pgsize;
+}
+
 static int msm_iommu_pagetable_unmap(struct msm_mmu *mmu, u64 iova,
size_t size)
 {
struct msm_iommu_pagetable *pagetable = to_pagetable(mmu);
struct io_pgtable_ops *ops = pagetable->pgtbl_ops;
-   size_t unmapped = 0;
 
-   /* Unmap the block one page at a time */
while (size) {
-   unmapped += ops->unmap(ops, iova, 4096, NULL);
-   iova += 4096;
-   size -= 4096;
+   size_t unmapped, pgsize, count;
+
+   pgsize = calc_pgsize(pagetable, iova, iova, size, &count);
+
+   unmapped = ops->unmap_pages(ops, iova, pgsize, count, NULL);
+   if (!unmapped)
+   break;
+
+   iova += unmapped;
+   size -= unmapped;
}
 
iommu_flush_iotlb_all(to_msm_iommu(pagetable->parent)->domain);
 
-   return (unmapped == size) ? 0 : -EINVAL;
+   return (size == 0) ? 0 : -EINVAL;
 }
 
 static int msm_iommu_pagetable_map(struct msm_mmu *mmu, u64 iova,
@@ -54,7 +116,6 @@ static int msm_iommu_pagetable_map(struct msm_mmu *mmu, u64 
iova,
struct msm_iommu_pagetable *pagetable = to_pagetable(mmu);
struct io_pgtable_ops *ops = pagetable->pgtbl_ops;
struct scatterlist *sg;
-   size_t mapped = 0;
u64 addr = iova;
unsigned int i;
 
@@ -62,17 +123,26 @@ static int msm_iommu_pagetable_map(struct msm_mmu *mmu, 
u64 iova,
size_t size = sg->length;
phys_addr_t phys = sg_phys(sg);
 
-   /* Map the block one page at a time */
while (size) {
-   if (ops->map(ops, addr, phys, 4096, prot, GFP_KERNEL)) {
-   msm_iommu_pagetable_unmap(mmu, iova, mapped);
+   size_t pgsize, count, mapped = 0;
+ 

[PATCH] drm/msm/iommu: optimize map/unmap

2022-08-22 Thread Rob Clark
From: Rob Clark 

Using map_pages/unmap_pages cuts down on the # of pgtable walks needed
in the process of finding where to insert/remove an entry.  The end
result is ~5-10x faster than mapping a single page at a time.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_iommu.c | 91 -
 1 file changed, 79 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
index a54ed354578b..0f3f60da3314 100644
--- a/drivers/gpu/drm/msm/msm_iommu.c
+++ b/drivers/gpu/drm/msm/msm_iommu.c
@@ -21,6 +21,7 @@ struct msm_iommu_pagetable {
struct msm_mmu base;
struct msm_mmu *parent;
struct io_pgtable_ops *pgtbl_ops;
+   unsigned long pgsize_bitmap;/* Bitmap of page sizes in use */
phys_addr_t ttbr;
u32 asid;
 };
@@ -29,23 +30,85 @@ static struct msm_iommu_pagetable *to_pagetable(struct 
msm_mmu *mmu)
return container_of(mmu, struct msm_iommu_pagetable, base);
 }
 
+/* based on iommu_pgsize() in iommu.c: */
+static size_t iommu_pgsize(struct msm_iommu_pagetable *pagetable,
+  unsigned long iova, phys_addr_t paddr,
+  size_t size, size_t *count)
+{
+   unsigned int pgsize_idx, pgsize_idx_next;
+   unsigned long pgsizes;
+   size_t offset, pgsize, pgsize_next;
+   unsigned long addr_merge = paddr | iova;
+
+   /* Page sizes supported by the hardware and small enough for @size */
+   pgsizes = pagetable->pgsize_bitmap & GENMASK(__fls(size), 0);
+
+   /* Constrain the page sizes further based on the maximum alignment */
+   if (likely(addr_merge))
+   pgsizes &= GENMASK(__ffs(addr_merge), 0);
+
+   /* Make sure we have at least one suitable page size */
+   BUG_ON(!pgsizes);
+
+   /* Pick the biggest page size remaining */
+   pgsize_idx = __fls(pgsizes);
+   pgsize = BIT(pgsize_idx);
+   if (!count)
+   return pgsize;
+
+   /* Find the next biggest supported page size, if it exists */
+   pgsizes = pagetable->pgsize_bitmap & ~GENMASK(pgsize_idx, 0);
+   if (!pgsizes)
+   goto out_set_count;
+
+   pgsize_idx_next = __ffs(pgsizes);
+   pgsize_next = BIT(pgsize_idx_next);
+
+   /*
+* There's no point trying a bigger page size unless the virtual
+* and physical addresses are similarly offset within the larger page.
+*/
+   if ((iova ^ paddr) & (pgsize_next - 1))
+   goto out_set_count;
+
+   /* Calculate the offset to the next page size alignment boundary */
+   offset = pgsize_next - (addr_merge & (pgsize_next - 1));
+
+   /*
+* If size is big enough to accommodate the larger page, reduce
+* the number of smaller pages.
+*/
+   if (offset + pgsize_next <= size)
+   size = offset;
+
+out_set_count:
+   *count = size >> pgsize_idx;
+   return pgsize;
+}
+
 static int msm_iommu_pagetable_unmap(struct msm_mmu *mmu, u64 iova,
size_t size)
 {
struct msm_iommu_pagetable *pagetable = to_pagetable(mmu);
struct io_pgtable_ops *ops = pagetable->pgtbl_ops;
-   size_t unmapped = 0;
 
/* Unmap the block one page at a time */
while (size) {
-   unmapped += ops->unmap(ops, iova, 4096, NULL);
-   iova += 4096;
-   size -= 4096;
+   size_t unmapped, pgsize, count;
+
+   pgsize = iommu_pgsize(pagetable, iova, iova, size, &count);
+
+   unmapped = ops->unmap_pages(ops, iova, pgsize, count, NULL);
+   if (!unmapped)
+   break;
+
+   iova += unmapped;
+   size -= unmapped;
}
 
iommu_flush_iotlb_all(to_msm_iommu(pagetable->parent)->domain);
 
-   return (unmapped == size) ? 0 : -EINVAL;
+   return (size == 0) ? 0 : -EINVAL;
 }
 
 static int msm_iommu_pagetable_map(struct msm_mmu *mmu, u64 iova,
@@ -54,7 +117,6 @@ static int msm_iommu_pagetable_map(struct msm_mmu *mmu, u64 
iova,
struct msm_iommu_pagetable *pagetable = to_pagetable(mmu);
struct io_pgtable_ops *ops = pagetable->pgtbl_ops;
struct scatterlist *sg;
-   size_t mapped = 0;
u64 addr = iova;
unsigned int i;
 
@@ -64,15 +126,19 @@ static int msm_iommu_pagetable_map(struct msm_mmu *mmu, 
u64 iova,
 
/* Map the block one page at a time */
while (size) {
-   if (ops->map(ops, addr, phys, 4096, prot, GFP_KERNEL)) {
-   msm_iommu_pagetable_unmap(mmu, iova, mapped);
+   size_t pgsize, count, mapped;
+
+   pgsize = iommu_pgsize(pagetable, addr, phys, size, 
&count);
+
+   if (ops->map_pages(ops, addr, phys, pgsize, count,

Re: Rust in our code base

2022-08-21 Thread Rob Clark
On Sun, Aug 21, 2022 at 10:45 AM Karol Herbst  wrote:
>
> On Sun, Aug 21, 2022 at 7:43 PM Karol Herbst  wrote:
> >
> > On Sun, Aug 21, 2022 at 5:46 PM Rob Clark  wrote:
> > >
> > > On Sat, Aug 20, 2022 at 5:23 AM Karol Herbst  wrote:
> > > >
> > > > Hey everybody,
> > > >
> > > > so I think it's time to have this discussion for real.
> > > >
> > > > I am working on Rusticl
> > > > (https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15439)
> > > > which I would like to merge quite soon.
> > > >
> > > > Others might also plan on starting kernel drivers written in Rust (and
> > > > if people feel comfortable to discuss this as well, they might reply
> > > > here)
> > > >
> > > > The overall implication of that is: if we are doing this, people (that
> > > > is we) have to accept that touching Rust code will be part of our
> > > > development process. There is no other sane way of doing it.
> > > >
> > > > I am not willing to wrap things in Rusticl so changing gallium APIs
> > > > won't involve touching Rust code, and we also can't expect people to
> > > > design their kernel drivers in weird ways "just because somebody
> > > > doesn't want to deal with Rust"
> > > >
> > > > If we are going to do this, we have to do it for real, which means,
> > > > Rust code will call C APIs directly and a change in those APIs will
> > > > also require changes in Rust code making use of those APIs.
> > > >
> > > > I am so explicit on this very point, because we had some discussion on
> > > > IRC where this was seen as a no-go at least from some people, which
> > > > makes me think we have to find a mutual agreement on how it should be
> > > > going forward.
> > > >
> > > > And I want to be very explicit here about the future of Rusticl as
> > > > well: if the agreement is that people don't want to have to deal with
> > > > Rust changing e.g. gallium, Rusticl is a dead project. I am not
> > > > willing to come up with some trashy external-internal API just to
> > > > maintain Rusticl outside of the mesa git repo.
> > > > And doing it on a kernel level is even more of a no-go.
> > > >
> > > > So what are we all thinking about Rust in our core repos?
> > >
> > > I think there has to be willingness on the part of rust folks to help
> > > others who aren't so familiar with rust with these sorts of API
> > > changes.  You can't completely impose the burden on others who have
> > > never touched rust before.  That said, I expect a lot of API changes
> > > over time are simple enough that other devs could figure out the
> > > related rust side changes.
> > >
> >
> > yeah, I agree here. I wouldn't say it's all the responsibility of
> > developers changing APIs to also know how to change the code. So e.g.
> > if an MR fails to compile and it's because of rusticl, I will help out
> > and do the changes myself if necessary. But long term we have to
> > accept that API changes also come with the implication of also having
> > to touch Rust code.
> >
> > Short term it might be a learning opportunity for some/most, but long
> > term it has to be accepted as a part of development to deal with Rust.
> >
> > > As long as folks who want to start introducing rust in mesa and drm
> > > realize they are also signing up to play the role of rust tutor and
> > > technical assistance, I don't see a problem.  But if they aren't
> > > around and willing to help, I could see this going badly.
> > >
> >
> > Yep, I fully agree here. This is also the main reason I am bringing
> > this up. Nobody should be left alone with having to deal with changing
> > the code. On the other hand a "not having to touch Rust code when
> > changing APIs" guarantee is something which is simply impossible to
> > have in any sane architecture. So we should figure out under which
> > circumstances it will be okay for everybody.

Yeah, this sounds fine to me.

> > At least I don't see a way how I can structure Rusticl so that
> > somebody working on gallium won't have to also deal with rusticl. One
> > possibility would be to have a libgallium.so file I can link to, but
> > then it's all about "stable gallium API" vs "not having to touch rust
> > code" and I hope everybody prefers th

[PATCH 5/5] drm/msm: Skip tlbinv on unmap from non-current pgtables

2022-08-21 Thread Rob Clark
From: Rob Clark 

We can rely on the tlbinv done by CP_SMMU_TABLE_UPDATE in this case.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c |  6 ++
 drivers/gpu/drm/msm/msm_iommu.c   | 29 +++
 2 files changed, 35 insertions(+)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index c8ad8aeca777..1ba0ed629549 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -1180,6 +1180,12 @@ static int hw_init(struct msm_gpu *gpu)
/* Always come up on rb 0 */
a6xx_gpu->cur_ring = gpu->rb[0];
 
+   /*
+* Note, we cannot assume anything about the state of the SMMU when
+* coming back from power collapse, so force a CP_SMMU_TABLE_UPDATE
+* on the first submit.  Also, msm_iommu_pagetable_unmap() relies on
+* this behavior.
+*/
gpu->cur_ctx_seqno = 0;
 
/* Enable the SQE to start the CP engine */
diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
index 94c8c09980d1..218074a58081 100644
--- a/drivers/gpu/drm/msm/msm_iommu.c
+++ b/drivers/gpu/drm/msm/msm_iommu.c
@@ -45,8 +45,37 @@ static int msm_iommu_pagetable_unmap(struct msm_mmu *mmu, 
u64 iova,
size -= 4096;
}
 
+   /*
+* A CP_SMMU_TABLE_UPDATE is always sent for the first
+* submit after resume, and that does a TLB invalidate.
+* So we can skip that if the device is not currently
+* powered.
+*/
+   if (!pm_runtime_get_if_in_use(pagetable->parent->dev))
+   goto out;
+
+   /*
+* If we are not the current pgtables, we can rely on the
+* TLB invalidate done by CP_SMMU_TABLE_UPDATE.
+*
+* We'll always be racing with the GPU updating ttbr0,
+* but there are only two cases:
+*
+*  + either we are not the current pgtables and there
+*will be a tlbinv done by the GPU before we are again
+*
+*  + or we are.. there might have already been a tlbinv
+*if we raced with the GPU, but we have to assume the
+*worst and do the tlbinv
+*/
+   if (adreno_smmu->get_ttbr0(adreno_smmu->cookie) != pagetable->ttbr)
+   goto out_put;
+
adreno_smmu->tlb_inv_by_id(adreno_smmu->cookie, pagetable->asid);
 
+out_put:
+   pm_runtime_put(pagetable->parent->dev);
+out:
return (unmapped == size) ? 0 : -EINVAL;
 }
 
-- 
2.37.2



[PATCH 3/5] iommu/arm-smmu-qcom: Add private interface to tlbinv by ASID

2022-08-21 Thread Rob Clark
From: Rob Clark 

This will let the drm driver use different ASID values for each set of
pgtables to avoid over-invalidation on unmap.

Signed-off-by: Rob Clark 
---
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c |  1 +
 drivers/iommu/arm/arm-smmu/arm-smmu.c  | 43 --
 drivers/iommu/arm/arm-smmu/arm-smmu.h  |  1 +
 include/linux/adreno-smmu-priv.h   |  2 +
 4 files changed, 43 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
index 59b460c1c9a5..3230348729ab 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
@@ -229,6 +229,7 @@ static int qcom_adreno_smmu_init_context(struct 
arm_smmu_domain *smmu_domain,
priv->get_fault_info = qcom_adreno_smmu_get_fault_info;
priv->set_stall = qcom_adreno_smmu_set_stall;
priv->resume_translation = qcom_adreno_smmu_resume_translation;
+   priv->tlb_inv_by_id = arm_smmu_tlb_inv_by_id;
 
return 0;
 }
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 2ed3594f384e..624359bb2092 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -252,7 +252,7 @@ static void arm_smmu_tlb_sync_context(struct 
arm_smmu_domain *smmu_domain)
spin_unlock_irqrestore(&smmu_domain->cb_lock, flags);
 }
 
-static void arm_smmu_tlb_inv_context_s1(void *cookie)
+static void arm_smmu_tlb_inv_context_s1_asid(void *cookie, u16 asid)
 {
struct arm_smmu_domain *smmu_domain = cookie;
/*
@@ -261,21 +261,56 @@ static void arm_smmu_tlb_inv_context_s1(void *cookie)
 */
wmb();
arm_smmu_cb_write(smmu_domain->smmu, smmu_domain->cfg.cbndx,
- ARM_SMMU_CB_S1_TLBIASID, smmu_domain->cfg.asid);
+ ARM_SMMU_CB_S1_TLBIASID, asid);
arm_smmu_tlb_sync_context(smmu_domain);
 }
 
-static void arm_smmu_tlb_inv_context_s2(void *cookie)
+static void arm_smmu_tlb_inv_context_s1(void *cookie)
+{
+   struct arm_smmu_domain *smmu_domain = cookie;
+
+   arm_smmu_tlb_inv_context_s1_asid(cookie, smmu_domain->cfg.asid);
+}
+
+static void arm_smmu_tlb_inv_context_s2_vmid(void *cookie, u16 vmid)
 {
struct arm_smmu_domain *smmu_domain = cookie;
struct arm_smmu_device *smmu = smmu_domain->smmu;
 
/* See above */
wmb();
-   arm_smmu_gr0_write(smmu, ARM_SMMU_GR0_TLBIVMID, smmu_domain->cfg.vmid);
+   arm_smmu_gr0_write(smmu, ARM_SMMU_GR0_TLBIVMID, vmid);
arm_smmu_tlb_sync_global(smmu);
 }
 
+static void arm_smmu_tlb_inv_context_s2(void *cookie)
+{
+   struct arm_smmu_domain *smmu_domain = cookie;
+
+   arm_smmu_tlb_inv_context_s2_vmid(cookie, smmu_domain->cfg.vmid);
+}
+
+void arm_smmu_tlb_inv_by_id(const void *cookie, u16 id)
+{
+   struct arm_smmu_domain *smmu_domain = (void *)cookie;
+   struct arm_smmu_device *smmu = smmu_domain->smmu;
+
+   arm_smmu_rpm_get(smmu);
+   switch (smmu_domain->stage) {
+   case ARM_SMMU_DOMAIN_S1:
+   arm_smmu_tlb_inv_context_s1_asid(smmu_domain, id);
+   break;
+   case ARM_SMMU_DOMAIN_S2:
+   case ARM_SMMU_DOMAIN_NESTED:
+   arm_smmu_tlb_inv_context_s2_vmid(smmu_domain, id);
+   break;
+   case ARM_SMMU_DOMAIN_BYPASS:
+   break;
+   }
+
+   arm_smmu_rpm_put(smmu);
+}
+
 static void arm_smmu_tlb_inv_range_s1(unsigned long iova, size_t size,
  size_t granule, void *cookie, int reg)
 {
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h 
b/drivers/iommu/arm/arm-smmu/arm-smmu.h
index 2b9b42fb6f30..f6fb52d6f841 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.h
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h
@@ -527,6 +527,7 @@ struct arm_smmu_device *arm_smmu_impl_init(struct 
arm_smmu_device *smmu);
 struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu);
 struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device *smmu);
 
+void arm_smmu_tlb_inv_by_id(const void *cookie, u16 id);
 void arm_smmu_write_context_bank(struct arm_smmu_device *smmu, int idx);
 int arm_mmu500_reset(struct arm_smmu_device *smmu);
 
diff --git a/include/linux/adreno-smmu-priv.h b/include/linux/adreno-smmu-priv.h
index 4ad90541a095..c44fc68d4de8 100644
--- a/include/linux/adreno-smmu-priv.h
+++ b/include/linux/adreno-smmu-priv.h
@@ -50,6 +50,7 @@ struct adreno_smmu_fault_info {
  * before set_ttbr0_cfg().  If stalling on fault is enabled,
  * the GPU driver must call resume_translation()
  * @resume_translation: Resume translation after a fault
+ * @tlb_inv_by_id: Flush TLB by ASID/VMID
  *
  *
  * The GPU driver (drm/msm) and adreno-smmu work together for controlling
@@ -69,6 +70,7 @@ struct adreno_smmu_priv {
void (*get_fault_info)(const void *co

[PATCH 4/5] drm/msm: Use separate ASID for each set of pgtables

2022-08-21 Thread Rob Clark
From: Rob Clark 

Optimize TLB invalidation by using different ASID for each set of
pgtables.  There can be scenarios where multiple processes end up
with the same ASID (such as >256 processes using the GPU), but this
is harmless, it will only result in some over-invalidation (but
less over-invalidation compared to using ASID=0 for all processes)

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_iommu.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
index a54ed354578b..94c8c09980d1 100644
--- a/drivers/gpu/drm/msm/msm_iommu.c
+++ b/drivers/gpu/drm/msm/msm_iommu.c
@@ -33,6 +33,8 @@ static int msm_iommu_pagetable_unmap(struct msm_mmu *mmu, u64 
iova,
size_t size)
 {
struct msm_iommu_pagetable *pagetable = to_pagetable(mmu);
+   struct adreno_smmu_priv *adreno_smmu =
+   dev_get_drvdata(pagetable->parent->dev);
struct io_pgtable_ops *ops = pagetable->pgtbl_ops;
size_t unmapped = 0;
 
@@ -43,7 +45,7 @@ static int msm_iommu_pagetable_unmap(struct msm_mmu *mmu, u64 
iova,
size -= 4096;
}
 
-   iommu_flush_iotlb_all(to_msm_iommu(pagetable->parent)->domain);
+   adreno_smmu->tlb_inv_by_id(adreno_smmu->cookie, pagetable->asid);
 
return (unmapped == size) ? 0 : -EINVAL;
 }
@@ -147,6 +149,7 @@ static int msm_fault_handler(struct iommu_domain *domain, 
struct device *dev,
 
 struct msm_mmu *msm_iommu_pagetable_create(struct msm_mmu *parent)
 {
+   static atomic_t asid = ATOMIC_INIT(1);
struct adreno_smmu_priv *adreno_smmu = dev_get_drvdata(parent->dev);
struct msm_iommu *iommu = to_msm_iommu(parent);
struct msm_iommu_pagetable *pagetable;
@@ -210,12 +213,14 @@ struct msm_mmu *msm_iommu_pagetable_create(struct msm_mmu 
*parent)
pagetable->ttbr = ttbr0_cfg.arm_lpae_s1_cfg.ttbr;
 
/*
-* TODO we would like each set of page tables to have a unique ASID
-* to optimize TLB invalidation.  But iommu_flush_iotlb_all() will
-* end up flushing the ASID used for TTBR1 pagetables, which is not
-* what we want.  So for now just use the same ASID as TTBR1.
+* ASID 0 is used for kernel mapped buffers in TTBR1, which we
+* do not need to invalidate when unmapping from TTBR0 pgtables.
+* The hw ASID is at *least* 8b, but can be 16b.  We just assume
+* the worst:
 */
pagetable->asid = 0;
+   while (!pagetable->asid)
pagetable->asid = atomic_inc_return(&asid) & 0xff;
 
return &pagetable->base;
 }
-- 
2.37.2



[PATCH 2/5] iommu/arm-smmu-qcom: Provide way to access current TTBR0

2022-08-21 Thread Rob Clark
From: Rob Clark 

The drm driver can skip tlbinv when unmapping from something that isn't
the current pgtables, as there is already a tlbinv on context switch.

Signed-off-by: Rob Clark 
---
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 9 +
 include/linux/adreno-smmu-priv.h   | 2 ++
 2 files changed, 11 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
index 7820711c4560..59b460c1c9a5 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
@@ -157,6 +157,14 @@ static int qcom_adreno_smmu_set_ttbr0_cfg(const void 
*cookie,
return 0;
 }
 
+static u64 qcom_adreno_smmu_get_ttbr0(const void *cookie)
+{
+   struct arm_smmu_domain *smmu_domain = (void *)cookie;
struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
+
+   return arm_smmu_cb_readq(smmu_domain->smmu, cfg->cbndx, 
ARM_SMMU_CB_TTBR0);
+}
+
 static int qcom_adreno_smmu_alloc_context_bank(struct arm_smmu_domain 
*smmu_domain,
   struct arm_smmu_device *smmu,
   struct device *dev, int start)
@@ -217,6 +225,7 @@ static int qcom_adreno_smmu_init_context(struct 
arm_smmu_domain *smmu_domain,
priv->cookie = smmu_domain;
priv->get_ttbr1_cfg = qcom_adreno_smmu_get_ttbr1_cfg;
priv->set_ttbr0_cfg = qcom_adreno_smmu_set_ttbr0_cfg;
+   priv->get_ttbr0 = qcom_adreno_smmu_get_ttbr0;
priv->get_fault_info = qcom_adreno_smmu_get_fault_info;
priv->set_stall = qcom_adreno_smmu_set_stall;
priv->resume_translation = qcom_adreno_smmu_resume_translation;
diff --git a/include/linux/adreno-smmu-priv.h b/include/linux/adreno-smmu-priv.h
index ac4c2c0ab724..4ad90541a095 100644
--- a/include/linux/adreno-smmu-priv.h
+++ b/include/linux/adreno-smmu-priv.h
@@ -43,6 +43,7 @@ struct adreno_smmu_fault_info {
  * @set_ttbr0_cfg: Set the TTBR0 config for the GPUs context bank.  A
  * NULL config disables TTBR0 translation, otherwise
  * TTBR0 translation is enabled with the specified cfg
+ * @get_ttbr0: Get current TTBR0 value
  * @get_fault_info: Called by the GPU fault handler to get information about
  *  the fault
  * @set_stall: Configure whether stall on fault (CFCFG) is enabled.  Call
@@ -64,6 +65,7 @@ struct adreno_smmu_priv {
const void *cookie;
const struct io_pgtable_cfg *(*get_ttbr1_cfg)(const void *cookie);
int (*set_ttbr0_cfg)(const void *cookie, const struct io_pgtable_cfg 
*cfg);
+   u64 (*get_ttbr0)(const void *cookie);
void (*get_fault_info)(const void *cookie, struct 
adreno_smmu_fault_info *info);
void (*set_stall)(const void *cookie, bool enabled);
void (*resume_translation)(const void *cookie, bool terminate);
-- 
2.37.2



[PATCH 1/5] iommu/arm-smmu-qcom: Fix indentation

2022-08-21 Thread Rob Clark
From: Rob Clark 

Plus typo.

Signed-off-by: Rob Clark 
---
 include/linux/adreno-smmu-priv.h | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/include/linux/adreno-smmu-priv.h b/include/linux/adreno-smmu-priv.h
index c637e0997f6d..ac4c2c0ab724 100644
--- a/include/linux/adreno-smmu-priv.h
+++ b/include/linux/adreno-smmu-priv.h
@@ -37,7 +37,7 @@ struct adreno_smmu_fault_info {
 /**
  * struct adreno_smmu_priv - private interface between adreno-smmu and GPU
  *
- * @cookie:An opque token provided by adreno-smmu and passed
+ * @cookie:An opaque token provided by adreno-smmu and passed
  * back into the callbacks
  * @get_ttbr1_cfg: Get the TTBR1 config for the GPUs context-bank
  * @set_ttbr0_cfg: Set the TTBR0 config for the GPUs context bank.  A
@@ -61,12 +61,12 @@ struct adreno_smmu_fault_info {
  * it's domain.
  */
 struct adreno_smmu_priv {
-const void *cookie;
-const struct io_pgtable_cfg *(*get_ttbr1_cfg)(const void *cookie);
-int (*set_ttbr0_cfg)(const void *cookie, const struct io_pgtable_cfg *cfg);
-void (*get_fault_info)(const void *cookie, struct adreno_smmu_fault_info 
*info);
-void (*set_stall)(const void *cookie, bool enabled);
-void (*resume_translation)(const void *cookie, bool terminate);
+   const void *cookie;
+   const struct io_pgtable_cfg *(*get_ttbr1_cfg)(const void *cookie);
+   int (*set_ttbr0_cfg)(const void *cookie, const struct io_pgtable_cfg 
*cfg);
+   void (*get_fault_info)(const void *cookie, struct 
adreno_smmu_fault_info *info);
+   void (*set_stall)(const void *cookie, bool enabled);
+   void (*resume_translation)(const void *cookie, bool terminate);
 };
 
 #endif /* __ADRENO_SMMU_PRIV_H */
-- 
2.37.2



[PATCH 0/5] drm/msm+iommu/arm-smmu-qcom: tlbinv optimizations

2022-08-21 Thread Rob Clark
From: Rob Clark 

Two additions to adreno_smmu_priv to allow for a couple of
optimizations:

 + Use a separate ASID for each set of pgtables to avoid
   over-invalidation.
 + Detect the case of unmapping from non-current pgtables
   where we can skip the redundant tlbinv

Rob Clark (5):
  iommu/arm-smmu-qcom: Fix indentation
  iommu/arm-smmu-qcom: Provide way to access current TTBR0
  iommu/arm-smmu-qcom: Add private interface to tlbinv by ASID
  drm/msm: Use separate ASID for each set of pgtables
  drm/msm: Skip tlbinv on unmap from non-current pgtables

 drivers/gpu/drm/msm/adreno/a6xx_gpu.c  |  6 +++
 drivers/gpu/drm/msm/msm_iommu.c| 44 +++---
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 10 +
 drivers/iommu/arm/arm-smmu/arm-smmu.c  | 43 +++--
 drivers/iommu/arm/arm-smmu/arm-smmu.h  |  1 +
 include/linux/adreno-smmu-priv.h   | 18 +
 6 files changed, 106 insertions(+), 16 deletions(-)

-- 
2.37.2



[PATCH] drm/msm: De-open-code some CP_EVENT_WRITE

2022-08-21 Thread Rob Clark
From: Rob Clark 

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/adreno/a3xx_gpu.c | 2 +-
 drivers/gpu/drm/msm/adreno/a4xx_gpu.c | 2 +-
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c 
b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
index 0ab0e1dd8bbb..2c8b9899625b 100644
--- a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
@@ -68,7 +68,7 @@ static void a3xx_submit(struct msm_gpu *gpu, struct 
msm_gem_submit *submit)
 
/* BIT(31) of CACHE_FLUSH_TS triggers CACHE_FLUSH_TS IRQ from GPU */
OUT_PKT3(ring, CP_EVENT_WRITE, 3);
-   OUT_RING(ring, CACHE_FLUSH_TS | BIT(31));
+   OUT_RING(ring, CACHE_FLUSH_TS | CP_EVENT_WRITE_0_IRQ);
OUT_RING(ring, rbmemptr(ring, fence));
OUT_RING(ring, submit->seqno);
 
diff --git a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c 
b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
index 0c6b2a6d0b4c..7cb8d9849c07 100644
--- a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
@@ -62,7 +62,7 @@ static void a4xx_submit(struct msm_gpu *gpu, struct 
msm_gem_submit *submit)
 
/* BIT(31) of CACHE_FLUSH_TS triggers CACHE_FLUSH_TS IRQ from GPU */
OUT_PKT3(ring, CP_EVENT_WRITE, 3);
-   OUT_RING(ring, CACHE_FLUSH_TS | BIT(31));
+   OUT_RING(ring, CACHE_FLUSH_TS | CP_EVENT_WRITE_0_IRQ);
OUT_RING(ring, rbmemptr(ring, fence));
OUT_RING(ring, submit->seqno);
 
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 4d501100b9e4..c8ad8aeca777 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -146,7 +146,7 @@ static void a6xx_set_pagetable(struct a6xx_gpu *a6xx_gpu,
 */
 
OUT_PKT7(ring, CP_EVENT_WRITE, 1);
-   OUT_RING(ring, 0x31);
+   OUT_RING(ring, CACHE_INVALIDATE);
 
if (!sysprof) {
/*
-- 
2.37.2



Re: Rust in our code base

2022-08-21 Thread Rob Clark
On Sat, Aug 20, 2022 at 5:23 AM Karol Herbst  wrote:
>
> Hey everybody,
>
> so I think it's time to have this discussion for real.
>
> I am working on Rusticl
> (https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15439)
> which I would like to merge quite soon.
>
> Others might also plan on starting kernel drivers written in Rust (and
> if people feel comfortable to discuss this as well, they might reply
> here)
>
> The overall implication of that is: if we are doing this, people (that
> is we) have to accept that touching Rust code will be part of our
> development process. There is no other sane way of doing it.
>
> I am not willing to wrap things in Rusticl so changing gallium APIs
> won't involve touching Rust code, and we also can't expect people to
> design their kernel drivers in weird ways "just because somebody
> doesn't want to deal with Rust"
>
> If we are going to do this, we have to do it for real, which means,
> Rust code will call C APIs directly and a change in those APIs will
> also require changes in Rust code making use of those APIs.
>
> I am so explicit on this very point, because we had some discussion on
> IRC where this was seen as a no-go at least from some people, which
> makes me think we have to find a mutual agreement on how it should be
> going forward.
>
> And I want to be very explicit here about the future of Rusticl as
> well: if the agreement is that people don't want to have to deal with
> Rust changing e.g. gallium, Rusticl is a dead project. I am not
> willing to come up with some trashy external-internal API just to
> maintain Rusticl outside of the mesa git repo.
> And doing it on a kernel level is even more of a no-go.
>
> So what are we all thinking about Rust in our core repos?

I think there has to be willingness on the part of rust folks to help
others who aren't so familiar with rust with these sorts of API
changes.  You can't completely impose the burden on others who have
never touched rust before.  That said, I expect a lot of API changes
over time are simple enough that other devs could figure out the
related rust side changes.

As long as folks who want to start introducing rust in mesa and drm
realize they are also signing up to play the role of rust tutor and
technical assistance, I don't see a problem.  But if they aren't
around and willing to help, I could see this going badly.

I do also wonder a bit about code tooling (indexers, etc).. I'm not
sure what the state of things is when it comes to cross c<->rust
integration.  Ie. it is usually straightforward enough to track down
all the spots in C code which would be affected by some change.  It
might be easier to overlook things on the rust side.  On the mesa
side, pre-merge CI jobs help to catch these issues.  Less sure about
how to handle that on the kernel side.

BR,
-R

> CCing a bunch of people who think are the most impacted by it either
> from a reviewer/maintainer or developer perspective. If you think some
> others should be explicitly aware of this, please point them to this
> discussion here.
>
> Karol
>


Re: [PATCH v2 1/3] dma-buf: Add ioctl to query mmap coherency/cache info

2022-08-18 Thread Rob Clark
On Thu, Aug 18, 2022 at 7:54 AM Christian König
 wrote:
>
> Am 18.08.22 um 16:25 schrieb Rob Clark:
> > On Thu, Aug 18, 2022 at 4:21 AM Christian König
> >  wrote:
> >> Am 17.08.22 um 15:44 schrieb Rob Clark:
> >>> On Wed, Aug 17, 2022 at 2:57 AM Christian König
> >>>  wrote:
> >>>> [SNIP]
> >>>>
> >>>> The resulting cache attrs from combination of S1 and S2 translation
> >>>> can differ.  So ideally we setup the S2 pgtables in guest aligned with
> >>>> host userspace mappings
> >>>> Well exactly that is not very convincing.
> >>>>
> >>>> What you want to do is to use one channel for the address and a
> >>>> different one for the cache attrs, that's not something I would
> >>>> recommend doing in general.
> >>> How would that work.. mmap() is the channel for the address, we'd need
> >>> to introduce a new syscall that returned additional information?
> >> The channel for the address is not mmap(), but rather the page faults.
> >> mmap() is just the function setting up that channel.
> >>
> >> The page faults then insert both the address as well as the caching
> >> attributes (at least on x86).
> > This is true on arm64 as well, but only in the S1 tables (which I
> > would have to assume is the case on x86 as well)
> >
> >> That we then need to forward the caching attributes manually once more
> >> seems really misplaced.
> >>
> >>>> Instead the client pgtables should be setup in a way so that host can
> >>>> overwrite them.
> >>> How?  That is completely not how VMs work.  Even if the host knew
> >>> where the pgtables were and somehow magically knew the various guest
> >>> userspace VAs, it would be racey.
> >> Well you mentioned that the client page tables can be setup in a way
> >> that the host page tables determine what caching to use. As far as I can
> >> see this is what we should use here.
> > On arm64/aarch64, they *can*.. but the system (on some versions of
> > armv8) can also be configured to let S2 determine the attributes.  And
> > apparently there are benefits to this (avoids unnecessary cache
> > flushing in the host, AFAIU.)  This is the case where we need this new
> > api.
> >
> > IMO it is fine for the exporter to return a value indicating that the
> > attributes change dynamically or that S1 attributes must somehow be
> > used by the hw.  This would at least let the VMM return an error in
> > cases where S1 attrs cannot be relied on.  But there are enough
> > exporters where the cache attrs are static for the life of the buffer.
> > So even if you need to return DMA_BUF_MAP_I_DONT_KNOW, maybe that is
> > fine (if x86 can always rely on S1 attrs), or at least will let the
> > VMM return an error rather than just blindly assuming things will
> > work.
> >
> > But it makes no sense to reject the whole idea just because of some
> > exporters (which may not even need this).  There is always room to let
> > them return a map-info value that describes the situation or
> > limitations to the VMM.
>
> Well it does make sense as far as I can see.
>
> This is a very specific workaround for a platform problem which only
> matters there, but increases complexity for everybody.

I'm not sure how this adds complexity for everybody.. or at least the
intention was that the default value for the new enum is the same as the
current status quo, so no need to plumb something thru every single
exporter.
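
ie. something along these lines (a sketch; only DMA_BUF_MAP_INCOHERENT
is named elsewhere in the thread, the other enumerators and the
zero-default convention are illustrative):

  enum dma_buf_map_info {
          /*
           * 0 == today's behavior, so exporters that never set
           * map_info are unaffected and nothing needs plumbing:
           */
          DMA_BUF_MAP_DEFAULT = 0,
          DMA_BUF_MAP_WC,         /* static write-combined mapping */
          DMA_BUF_MAP_CACHED,     /* static cached, coherent mapping */
          DMA_BUF_MAP_INCOHERENT, /* VMM should refuse to map to guest */
  };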

BR,
-R

> If we don't have any other choice on the problem to work around that I
> would say ok we add an ARM specific workaround.
>
> But as long as that's not the case the whole idea is pretty clearly a
> NAK from my side.
>
> Regards,
> Christian.
>
> >
> > BR,
> > -R
> >
> >> Regards,
> >> Christian.
> >>
> >>> BR,
> >>> -R
> >>>
> >>>> Regards,
> >>>> Christian.
> >>>>
> >>>>> BR,
> >>>>> -R
> >>>>>
> >>>>>> Regards,
> >>>>>> Christian.
> >>>>>>
> >>>>>>> BR,
> >>>>>>> -R
> >>>>>>>
> >>>>>>>> If the hardware can't use the caching information from the host CPU 
> >>>>>>>> page
> >>>>>>>> tables directly then that pretty much completely breaks the 

Re: [PATCH v2 1/3] dma-buf: Add ioctl to query mmap coherency/cache info

2022-08-18 Thread Rob Clark
On Thu, Aug 18, 2022 at 4:21 AM Christian König
 wrote:
>
> Am 17.08.22 um 15:44 schrieb Rob Clark:
> > On Wed, Aug 17, 2022 at 2:57 AM Christian König
> >  wrote:
> >> [SNIP]
> >>
> >> The resulting cache attrs from combination of S1 and S2 translation
> >> can differ.  So ideally we setup the S2 pgtables in guest aligned with
> >> host userspace mappings
> >> Well exactly that is not very convincing.
> >>
> >> What you want to do is to use one channel for the address and a
> >> different one for the cache attrs, that's not something I would
> >> recommend doing in general.
> > How would that work.. mmap() is the channel for the address, we'd need
> > to introduce a new syscall that returned additional information?
>
> The channel for the address is not mmap(), but rather the page faults.
> mmap() is just the function setting up that channel.
>
> The page faults then insert both the address as well as the caching
> attributes (at least on x86).

This is true on arm64 as well, but only in the S1 tables (which I
would have to assume is the case on x86 as well)

> That we then need to forward the caching attributes manually once more
> seems really misplaced.
>
> >> Instead the client pgtables should be setup in a way so that host can
> >> overwrite them.
> > How?  That is completely not how VMs work.  Even if the host knew
> > where the pgtables were and somehow magically knew the various guest
> > userspace VAs, it would be racey.
>
> Well you mentioned that the client page tables can be setup in a way
> that the host page tables determine what caching to use. As far as I can
> see this is what we should use here.

On arm64/aarch64, they *can*.. but the system (on some versions of
armv8) can also be configured to let S2 determine the attributes.  And
apparently there are benefits to this (avoids unnecessary cache
flushing in the host, AFAIU.)  This is the case where we need this new
api.

IMO it is fine for the exporter to return a value indicating that the
attributes change dynamically or that S1 attributes must somehow be
used by the hw.  This would at least let the VMM return an error in
cases where S1 attrs cannot be relied on.  But there are enough
exporters where the cache attrs are static for the life of the buffer.
So even if you need to return DMA_BUF_MAP_I_DONT_KNOW, maybe that is
fine (if x86 can always rely on S1 attrs), or at least will let the
VMM return an error rather than just blindly assuming things will
work.

But it makes no sense to reject the whole idea just because of some
exporters (which may not even need this).  There is always room to let
them return a map-info value that describes the situation or
limitations to the VMM.
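
(As a concrete point of reference for the static-attrs case, the
guest-side pgprot selection can look like what virtio-gpu already does
for blob resources; a simplified sketch using the existing uapi
map_info values from virtio_gpu.h:)

  static pgprot_t blob_pgprot(pgprot_t prot, u32 map_info)
  {
          switch (map_info & VIRTIO_GPU_MAP_CACHE_MASK) {
          case VIRTIO_GPU_MAP_CACHE_CACHED:
                  return prot;                     /* normal cached */
          case VIRTIO_GPU_MAP_CACHE_WC:
                  return pgprot_writecombine(prot);
          default:
                  return pgprot_noncached(prot);   /* play it safe */
          }
  }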

BR,
-R

> Regards,
> Christian.
>
> >
> > BR,
> > -R
> >
> >> Regards,
> >> Christian.
> >>
> >>> BR,
> >>> -R
> >>>
> >>>> Regards,
> >>>> Christian.
> >>>>
> >>>>> BR,
> >>>>> -R
> >>>>>
> >>>>>> If the hardware can't use the caching information from the host CPU 
> >>>>>> page
> >>>>>> tables directly then that pretty much completely breaks the concept 
> >>>>>> that
> >>>>>> the exporter is responsible for setting up those page tables.
> >>>>>>
> >>>>>> Regards,
> >>>>>> Christian.
> >>>>>>
> >>>>>>>  drivers/dma-buf/dma-buf.c| 63 
> >>>>>>> +++--
> >>>>>>>  include/linux/dma-buf.h  | 11 ++
> >>>>>>>  include/uapi/linux/dma-buf.h | 68 
> >>>>>>> 
> >>>>>>>  3 files changed, 132 insertions(+), 10 deletions(-)
> >>>>>>>
> >>>>>>> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> >>>>>>> index 32f55640890c..262c4706f721 100644
> >>>>>>> --- a/drivers/dma-buf/dma-buf.c
> >>>>>>> +++ b/drivers/dma-buf/dma-buf.c
> >>>>>>> @@ -125,6 +125,32 @@ static struct file_system_type dma_buf_fs_type = 
> >>>>>>> {
> >>>>>>>  .kill_sb = kill_anon_super,
> >>>>>>>  };
> >>>>>>>
> >>>>>>> +static int __dma_buf_mm

Re: [PATCH v2 1/3] dma-buf: Add ioctl to query mmap coherency/cache info

2022-08-17 Thread Rob Clark
On Wed, Aug 17, 2022 at 2:57 AM Christian König
 wrote:
>
>
>
> Am 16.08.22 um 19:29 schrieb Rob Clark:
> > On Tue, Aug 16, 2022 at 9:51 AM Christian König
> >  wrote:
> >> Am 16.08.22 um 16:26 schrieb Rob Clark:
> >>> On Tue, Aug 16, 2022 at 1:27 AM Christian König
> >>>  wrote:
> >>>> Am 15.08.22 um 23:15 schrieb Rob Clark:
> >>>>> From: Rob Clark 
> >>>>>
> >>>>> This is a fairly narrowly focused interface, providing a way for a VMM
> >>>>> in userspace to tell the guest kernel what pgprot settings to use when
> >>>>> mapping a buffer to guest userspace.
> >>>>>
> >>>>> For buffers that get mapped into guest userspace, virglrenderer returns
> >>>>> a dma-buf fd to the VMM (crosvm or qemu).  In addition to mapping the
> >>>>> pages into the guest VM, it needs to report to drm/virtio in the guest
> >>>>> the cache settings to use for guest userspace.  In particular, on some
> >>>>> architectures, creating aliased mappings with different cache attributes
> >>>>> is frowned upon, so it is important that the guest mappings have the
> >>>>> same cache attributes as any potential host mappings.
> >>>>>
> >>>>> Signed-off-by: Rob Clark 
> >>>>> ---
> >>>>> v2: Combine with coherency, as that is a related concept.. and it is
> >>>>>relevant to the VMM whether coherent access without the SYNC 
> >>>>> ioctl
> >>>>>is possible; set map_info at export time to make it more clear
> >>>>>that it applies for the lifetime of the dma-buf (for any mmap
> >>>>>created via the dma-buf)
> >>>> Well, exactly that's a conceptual NAK from my side.
> >>>>
> >>>> The caching information can change at any time. For CPU mappings even
> >>>> without further notice from the exporter.
> >>> You should look before you criticize, as I left you a way out.. the
> >>> idea was that DMA_BUF_MAP_INCOHERENT should indicate that the buffer
> >>> cannot be mapped to the guest.  We could ofc add more DMA_BUF_MAP_*
> >>> values if something else would suit you better.  But the goal is to
> >>> give the VMM enough information to dtrt, or return an error if mapping
> >>> to guest is not possible.  That seems better than assuming mapping to
> >>> guest will work and guessing about cache attrs
> >> Well I'm not rejecting the implementation, I'm rejecting this from the
> >> conceptual point of view.
> >>
> >> We intentionally gave the exporter full control over the CPU mappings.
> >> This approach here breaks that now.
> >>
> >> I haven't seen the full detailed reason why we should do that and to be
> >> honest KVM seems to mess with things it is not supposed to touch.
> >>
> >> For example the page reference count of mappings marked with VM_IO is a
> >> complete no-go. This is very strong evidence that this was based on
> >> rather dangerous half-knowledge about the background of the handling here.
> >>
> >> So as long as I don't see a full explanation why KVM is grabbing
> >> reference to pages while faulting them and why we manually need to
> >> forward the caching while the hardware documentation indicates otherwise
> >> I will be rejecting this whole approach.
> > Didn't we cover this on the previous iteration already?  From a host
> > kernel PoV these are just normal host userspace mappings.  The
> > userspace VMM mapping becomes the "physical address" in the guest and
> > the stage 2 translation tables map it to the guest userspace.
> >
> > The resulting cache attrs from combination of S1 and S2 translation
> > can differ.  So ideally we set up the S2 pgtables in the guest aligned
> > with host userspace mappings.
>
> Well exactly that is not very convincing.
>
> What you want to do is to use one channel for the address and a
> different one for the cache attrs, that's not something I would
> recommend doing in general.

How would that work.. mmap() is the channel for the address, we'd need
to introduce a new syscall that returned additional information?

> Instead the client pgtables should be set up in a way so that host can
> overwrite them.

How?  That is completely not how VMs work.  Even if the host knew
where the pgtables were and somehow magically knew the various guest
userspace VAs, it would be racey.

Re: [PATCH v8 2/2] drm/gem: Don't map imported GEMs

2022-08-16 Thread Rob Clark
On Tue, Aug 16, 2022 at 4:45 AM Dmitry Osipenko
 wrote:
>
> On 8/12/22 18:01, Rob Clark wrote:
> > On Fri, Aug 12, 2022 at 7:57 AM Rob Clark  wrote:
> >>
> >> On Fri, Aug 12, 2022 at 4:26 AM Dmitry Osipenko
> >>  wrote:
> >>>
> >>> On 8/11/22 02:19, Rob Clark wrote:
> >>>> On Wed, Aug 10, 2022 at 3:23 PM Dmitry Osipenko
> >>>>  wrote:
> >>>>>
> >>>>> On 8/11/22 01:03, Rob Clark wrote:
> >>>>>> On Wed, Aug 10, 2022 at 12:26 PM Dmitry Osipenko
> >>>>>>  wrote:
> >>>>>>>
> >>>>>>> On 8/10/22 18:08, Rob Clark wrote:
> >>>>>>>> On Wed, Aug 10, 2022 at 4:47 AM Daniel Vetter  
> >>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>> On Wed, Jul 06, 2022 at 10:02:07AM +0300, Dmitry Osipenko wrote:
> >>>>>>>>>> On 7/6/22 00:48, Rob Clark wrote:
> >>>>>>>>>>> On Tue, Jul 5, 2022 at 4:51 AM Christian König 
> >>>>>>>>>>>  wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> Am 01.07.22 um 11:02 schrieb Dmitry Osipenko:
> >>>>>>>>>>>>> Drivers that use drm_gem_mmap() and drm_gem_mmap_obj() helpers 
> >>>>>>>>>>>>> don't
> >>>>>>>>>>>>> handle imported dma-bufs properly, which results in mapping of 
> >>>>>>>>>>>>> something
> >>>>>>>>>>>>> else than the imported dma-buf. On NVIDIA Tegra we get a hard 
> >>>>>>>>>>>>> lockup when
> >>>>>>>>>>>>> userspace writes to the memory mapping of a dma-buf that was 
> >>>>>>>>>>>>> imported into
> >>>>>>>>>>>>> Tegra's DRM GEM.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Majority of DRM drivers prohibit mapping of the imported GEM 
> >>>>>>>>>>>>> objects.
> >>>>>>>>>>>>> Mapping of imported GEMs requires special care from userspace
> >>>>>>>>>>>>> since it
> >>>>>>>>>>>>> should sync dma-buf because mapping coherency of the exporter 
> >>>>>>>>>>>>> device may
> >>>>>>>>>>>>> not match the DRM device. Let's prohibit the mapping for all 
> >>>>>>>>>>>>> DRM drivers
> >>>>>>>>>>>>> for consistency.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Suggested-by: Thomas Hellström 
> >>>>>>>>>>>>> 
> >>>>>>>>>>>>> Signed-off-by: Dmitry Osipenko 
> >>>>>>>>>>>>
> >>>>>>>>>>>> I'm pretty sure that this is the right approach, but it's 
> >>>>>>>>>>>> certainly more
> >>>>>>>>>>>> than possible that somebody abused this already.
> >>>>>>>>>>>
> >>>>>>>>>>> I suspect that this is abused if you run deqp cts on android.. 
> >>>>>>>>>>> ie. all
> >>>>>>>>>>> winsys buffers are dma-buf imports from gralloc.  And then when 
> >>>>>>>>>>> you
> >>>>>>>>>>> hit readpix...
> >>>>>>>>>>>
> >>>>>>>>>>> You might only hit this in scenarios with separate gpu and 
> >>>>>>>>>>> display (or
> >>>>>>>>>>> dGPU+iGPU) because self-imports are handled differently in
> >>>>>>>>>>> drm_gem_prime_import_dev().. and maybe not in cases where you end 
> >>>>>>>>>>> up
> >>>>>>>>>>> with a blit from tiled/compressed to linear.. maybe that narrows 
> >>>>>>>>>>> the
> >>>>>>>>>>> scope enough to just fix it in userspace?

Re: [PATCH v2 1/3] dma-buf: Add ioctl to query mmap coherency/cache info

2022-08-16 Thread Rob Clark
On Tue, Aug 16, 2022 at 9:51 AM Christian König
 wrote:
>
> Am 16.08.22 um 16:26 schrieb Rob Clark:
> > On Tue, Aug 16, 2022 at 1:27 AM Christian König
> >  wrote:
> >> Am 15.08.22 um 23:15 schrieb Rob Clark:
> >>> From: Rob Clark 
> >>>
> >>> This is a fairly narrowly focused interface, providing a way for a VMM
> >>> in userspace to tell the guest kernel what pgprot settings to use when
> >>> mapping a buffer to guest userspace.
> >>>
> >>> For buffers that get mapped into guest userspace, virglrenderer returns
> >>> a dma-buf fd to the VMM (crosvm or qemu).  In addition to mapping the
> >>> pages into the guest VM, it needs to report to drm/virtio in the guest
> >>> the cache settings to use for guest userspace.  In particular, on some
> >>> architectures, creating aliased mappings with different cache attributes
> >>> is frowned upon, so it is important that the guest mappings have the
> >>> same cache attributes as any potential host mappings.
> >>>
> >>> Signed-off-by: Rob Clark 
> >>> ---
> >>> v2: Combine with coherency, as that is a related concept.. and it is
> >>>   relevant to the VMM whether coherent access without the SYNC ioctl
> >>>   is possible; set map_info at export time to make it more clear
> >>>   that it applies for the lifetime of the dma-buf (for any mmap
> >>>   created via the dma-buf)
> >> Well, exactly that's a conceptual NAK from my side.
> >>
> >> The caching information can change at any time. For CPU mappings even
> >> without further notice from the exporter.
> > You should look before you criticize, as I left you a way out.. the
> > idea was that DMA_BUF_MAP_INCOHERENT should indicate that the buffer
> > cannot be mapped to the guest.  We could ofc add more DMA_BUF_MAP_*
> > values if something else would suit you better.  But the goal is to
> > give the VMM enough information to dtrt, or return an error if mapping
> > to guest is not possible.  That seems better than assuming mapping to
> > guest will work and guessing about cache attrs
>
> Well I'm not rejecting the implementation, I'm rejecting this from the
> conceptual point of view.
>
> We intentionally gave the exporter full control over the CPU mappings.
> This approach here breaks that now.
>
> I haven't seen the full detailed reason why we should do that and to be
> honest KVM seems to mess with things it is not supposed to touch.
>
> For example the page reference count of mappings marked with VM_IO is a
> complete no-go. This is very strong evidence that this was based on
> rather dangerous half-knowledge about the background of the handling here.
>
> So as long as I don't see a full explanation why KVM is grabbing
> reference to pages while faulting them and why we manually need to
> forward the caching while the hardware documentation indicates otherwise
> I will be rejecting this whole approach.

Didn't we cover this on the previous iteration already?  From a host
kernel PoV these are just normal host userspace mappings.  The
userspace VMM mapping becomes the "physical address" in the guest and
the stage 2 translation tables map it to the guest userspace.

The resulting cache attrs from combination of S1 and S2 translation
can differ.  So ideally we set up the S2 pgtables in the guest aligned
with host userspace mappings.

BR,
-R

>
> Regards,
> Christian.
>
> >
> > BR,
> > -R
> >
> >> If the hardware can't use the caching information from the host CPU page
> >> tables directly then that pretty much completely breaks the concept that
> >> the exporter is responsible for setting up those page tables.
> >>
> >> Regards,
> >> Christian.
> >>
> >>>drivers/dma-buf/dma-buf.c| 63 +++--
> >>>include/linux/dma-buf.h  | 11 ++
> >>>include/uapi/linux/dma-buf.h | 68 
> >>>3 files changed, 132 insertions(+), 10 deletions(-)
> >>>
> >>> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> >>> index 32f55640890c..262c4706f721 100644
> >>> --- a/drivers/dma-buf/dma-buf.c
> >>> +++ b/drivers/dma-buf/dma-buf.c
> >>> @@ -125,6 +125,32 @@ static struct file_system_type dma_buf_fs_type = {
> >>>.kill_sb = kill_anon_super,
> >>>};
> >>>
> >>> +static int __dma_buf_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma)

Re: [PATCH v2 1/3] dma-buf: Add ioctl to query mmap coherency/cache info

2022-08-16 Thread Rob Clark
On Tue, Aug 16, 2022 at 1:27 AM Christian König
 wrote:
>
> Am 15.08.22 um 23:15 schrieb Rob Clark:
> > From: Rob Clark 
> >
> > This is a fairly narrowly focused interface, providing a way for a VMM
> > in userspace to tell the guest kernel what pgprot settings to use when
> > mapping a buffer to guest userspace.
> >
> > For buffers that get mapped into guest userspace, virglrenderer returns
> > a dma-buf fd to the VMM (crosvm or qemu).  In addition to mapping the
> > pages into the guest VM, it needs to report to drm/virtio in the guest
> > the cache settings to use for guest userspace.  In particular, on some
> > architectures, creating aliased mappings with different cache attributes
> > is frowned upon, so it is important that the guest mappings have the
> > same cache attributes as any potential host mappings.
> >
> > Signed-off-by: Rob Clark 
> > ---
> > v2: Combine with coherency, as that is a related concept.. and it is
> >  relevant to the VMM whether coherent access without the SYNC ioctl
> >  is possible; set map_info at export time to make it more clear
> >  that it applies for the lifetime of the dma-buf (for any mmap
> >  created via the dma-buf)
>
> Well, exactly that's a conceptual NAK from my side.
>
> The caching information can change at any time. For CPU mappings even
> without further notice from the exporter.

You should look before you criticize, as I left you a way out.. the
idea was that DMA_BUF_MAP_INCOHERENT should indicate that the buffer
cannot be mapped to the guest.  We could ofc add more DMA_BUF_MAP_*
values if something else would suit you better.  But the goal is to
give the VMM enough information to dtrt, or return an error if mapping
to guest is not possible.  That seems better than assuming mapping to
guest will work and guessing about cache attrs

BR,
-R

> If the hardware can't use the caching information from the host CPU page
> tables directly then that pretty much completely breaks the concept that
> the exporter is responsible for setting up those page tables.
>
> Regards,
> Christian.
>
> >
> >   drivers/dma-buf/dma-buf.c| 63 +++--
> >   include/linux/dma-buf.h  | 11 ++
> >   include/uapi/linux/dma-buf.h | 68 
> >   3 files changed, 132 insertions(+), 10 deletions(-)
> >
> > diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> > index 32f55640890c..262c4706f721 100644
> > --- a/drivers/dma-buf/dma-buf.c
> > +++ b/drivers/dma-buf/dma-buf.c
> > @@ -125,6 +125,32 @@ static struct file_system_type dma_buf_fs_type = {
> >   .kill_sb = kill_anon_super,
> >   };
> >
> > +static int __dma_buf_mmap(struct dma_buf *dmabuf, struct vm_area_struct 
> > *vma)
> > +{
> > + int ret;
> > +
> > + /* check if buffer supports mmap */
> > + if (!dmabuf->ops->mmap)
> > + return -EINVAL;
> > +
> > + ret = dmabuf->ops->mmap(dmabuf, vma);
> > +
> > + /*
> > +  * If the exporter claims to support coherent access, ensure the
> > +  * pgprot flags match the claim.
> > +  */
> > + if ((dmabuf->map_info != DMA_BUF_MAP_INCOHERENT) && !ret) {
> > + pgprot_t wc_prot = pgprot_writecombine(vma->vm_page_prot);
> > + if (dmabuf->map_info == DMA_BUF_COHERENT_WC) {
> > + WARN_ON_ONCE(pgprot_val(vma->vm_page_prot) != 
> > pgprot_val(wc_prot));
> > + } else {
> > + WARN_ON_ONCE(pgprot_val(vma->vm_page_prot) == 
> > pgprot_val(wc_prot));
> > + }
> > + }
> > +
> > + return ret;
> > +}
> > +
> >   static int dma_buf_mmap_internal(struct file *file, struct vm_area_struct 
> > *vma)
> >   {
> >   struct dma_buf *dmabuf;
> > @@ -134,16 +160,12 @@ static int dma_buf_mmap_internal(struct file *file, 
> > struct vm_area_struct *vma)
> >
> >   dmabuf = file->private_data;
> >
> > - /* check if buffer supports mmap */
> > - if (!dmabuf->ops->mmap)
> > - return -EINVAL;
> > -
> >   /* check for overflowing the buffer's size */
> >   if (vma->vm_pgoff + vma_pages(vma) >
> >   dmabuf->size >> PAGE_SHIFT)
> >   return -EINVAL;
> >
> > - return dmabuf->ops->mmap(dmabuf, vma);
> > + return __dma_buf_mmap(dmabuf, vma);
> >   }
> >
> >   static loff_t dma_buf_llseek(struct file *file, loff_t offset, int whence)

[PATCH v2 3/3] drm/msm/prime: Add mmap_info support

2022-08-15 Thread Rob Clark
From: Rob Clark 

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_gem.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index 1dee0d18abbb..1db53545ac40 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -1048,6 +1048,17 @@ static const struct vm_operations_struct vm_ops = {
.close = drm_gem_vm_close,
 };
 
+static enum dma_buf_map_info msm_gem_map_info(struct drm_gem_object *obj)
+{
+   struct msm_gem_object *msm_obj = to_msm_bo(obj);
+
+   switch (msm_obj->flags & MSM_BO_CACHE_MASK) {
+   case MSM_BO_WC:return DMA_BUF_COHERENT_WC;
+   case MSM_BO_CACHED_COHERENT:   return DMA_BUF_COHERENT_CACHED;
+   default:   return DMA_BUF_MAP_INCOHERENT;
+   }
+}
+
 static const struct drm_gem_object_funcs msm_gem_object_funcs = {
.free = msm_gem_free_object,
.pin = msm_gem_prime_pin,
@@ -1057,6 +1068,7 @@ static const struct drm_gem_object_funcs 
msm_gem_object_funcs = {
.vunmap = msm_gem_prime_vunmap,
.mmap = msm_gem_object_mmap,
+   .vm_ops = &vm_ops,
+   .map_info = msm_gem_map_info,
 };
 
 static int msm_gem_new_impl(struct drm_device *dev,
-- 
2.36.1



[PATCH v2 2/3] drm/prime: Wire up mmap_info support

2022-08-15 Thread Rob Clark
From: Rob Clark 

Just plumbing the thing thru an extra layer.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/drm_prime.c |  3 +++
 include/drm/drm_gem.h   | 11 +++
 2 files changed, 14 insertions(+)

diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
index e3f09f18110c..4457fedde1ec 100644
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -888,6 +888,9 @@ struct dma_buf *drm_gem_prime_export(struct drm_gem_object 
*obj,
.resv = obj->resv,
};
 
+   if (obj->funcs && obj->funcs->map_info)
+   exp_info.map_info = obj->funcs->map_info(obj);
+
return drm_gem_dmabuf_export(dev, &exp_info);
 }
 EXPORT_SYMBOL(drm_gem_prime_export);
diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
index f28a48a6f846..a573ebfc529a 100644
--- a/include/drm/drm_gem.h
+++ b/include/drm/drm_gem.h
@@ -172,6 +172,17 @@ struct drm_gem_object_funcs {
 * This is optional but necessary for mmap support.
 */
const struct vm_operations_struct *vm_ops;
+
+   /**
+* @map_info:
+*
+* Return dma_buf_map_info indicating the coherency of an exported
+* dma-buf.
+*
+* This callback is optional.  If not provided, exported dma-bufs are
+* assumed to be DMA_BUF_MAP_INCOHERENT.
+*/
+   enum dma_buf_map_info (*map_info)(struct drm_gem_object *obj);
 };
 
 /**
-- 
2.36.1



[PATCH v2 1/3] dma-buf: Add ioctl to query mmap coherency/cache info

2022-08-15 Thread Rob Clark
From: Rob Clark 

This is a fairly narrowly focused interface, providing a way for a VMM
in userspace to tell the guest kernel what pgprot settings to use when
mapping a buffer to guest userspace.

For buffers that get mapped into guest userspace, virglrenderer returns
a dma-buf fd to the VMM (crosvm or qemu).  In addition to mapping the
pages into the guest VM, it needs to report to drm/virtio in the guest
the cache settings to use for guest userspace.  In particular, on some
architectures, creating aliased mappings with different cache attributes
is frowned upon, so it is important that the guest mappings have the
same cache attributes as any potential host mappings.

Signed-off-by: Rob Clark 
---
v2: Combine with coherency, as that is a related concept.. and it is
relevant to the VMM whether coherent access without the SYNC ioctl
is possible; set map_info at export time to make it more clear
that it applies for the lifetime of the dma-buf (for any mmap
created via the dma-buf)

 drivers/dma-buf/dma-buf.c| 63 +++--
 include/linux/dma-buf.h  | 11 ++
 include/uapi/linux/dma-buf.h | 68 
 3 files changed, 132 insertions(+), 10 deletions(-)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 32f55640890c..262c4706f721 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -125,6 +125,32 @@ static struct file_system_type dma_buf_fs_type = {
.kill_sb = kill_anon_super,
 };
 
+static int __dma_buf_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma)
+{
+   int ret;
+
+   /* check if buffer supports mmap */
+   if (!dmabuf->ops->mmap)
+   return -EINVAL;
+
+   ret = dmabuf->ops->mmap(dmabuf, vma);
+
+   /*
+* If the exporter claims to support coherent access, ensure the
+* pgprot flags match the claim.
+*/
+   if ((dmabuf->map_info != DMA_BUF_MAP_INCOHERENT) && !ret) {
+   pgprot_t wc_prot = pgprot_writecombine(vma->vm_page_prot);
+   if (dmabuf->map_info == DMA_BUF_COHERENT_WC) {
+   WARN_ON_ONCE(pgprot_val(vma->vm_page_prot) != 
pgprot_val(wc_prot));
+   } else {
+   WARN_ON_ONCE(pgprot_val(vma->vm_page_prot) == 
pgprot_val(wc_prot));
+   }
+   }
+
+   return ret;
+}
+
 static int dma_buf_mmap_internal(struct file *file, struct vm_area_struct *vma)
 {
struct dma_buf *dmabuf;
@@ -134,16 +160,12 @@ static int dma_buf_mmap_internal(struct file *file, 
struct vm_area_struct *vma)
 
dmabuf = file->private_data;
 
-   /* check if buffer supports mmap */
-   if (!dmabuf->ops->mmap)
-   return -EINVAL;
-
/* check for overflowing the buffer's size */
if (vma->vm_pgoff + vma_pages(vma) >
dmabuf->size >> PAGE_SHIFT)
return -EINVAL;
 
-   return dmabuf->ops->mmap(dmabuf, vma);
+   return __dma_buf_mmap(dmabuf, vma);
 }
 
 static loff_t dma_buf_llseek(struct file *file, loff_t offset, int whence)
@@ -326,6 +348,27 @@ static long dma_buf_set_name(struct dma_buf *dmabuf, const 
char __user *buf)
return 0;
 }
 
+static long dma_buf_info(struct dma_buf *dmabuf, void __user *uarg)
+{
+   struct dma_buf_info arg;
+
+   if (copy_from_user(&arg, uarg, sizeof(arg)))
+   return -EFAULT;
+
+   switch (arg.param) {
+   case DMA_BUF_INFO_MAP_INFO:
+   arg.value = dmabuf->map_info;
+   break;
+   default:
+   return -EINVAL;
+   }
+
+   if (copy_to_user(uarg, &arg, sizeof(arg)))
+   return -EFAULT;
+
+   return 0;
+}
+
 static long dma_buf_ioctl(struct file *file,
  unsigned int cmd, unsigned long arg)
 {
@@ -369,6 +412,9 @@ static long dma_buf_ioctl(struct file *file,
case DMA_BUF_SET_NAME_B:
return dma_buf_set_name(dmabuf, (const char __user *)arg);
 
+   case DMA_BUF_IOCTL_INFO:
+   return dma_buf_info(dmabuf, (void __user *)arg);
+
default:
return -ENOTTY;
}
@@ -530,6 +576,7 @@ struct dma_buf *dma_buf_export(const struct 
dma_buf_export_info *exp_info)
dmabuf->priv = exp_info->priv;
dmabuf->ops = exp_info->ops;
dmabuf->size = exp_info->size;
+   dmabuf->map_info = exp_info->map_info;
dmabuf->exp_name = exp_info->exp_name;
dmabuf->owner = exp_info->owner;
spin_lock_init(&dmabuf->name_lock);
@@ -1245,10 +1292,6 @@ int dma_buf_mmap(struct dma_buf *dmabuf, struct 
vm_area_struct *vma,
if (WARN_ON(!dmabuf || !vma))
return -EINVAL;
 
-   /* check if buffer supports mmap */
-   if (!dmabuf->ops->mmap)
-   return -EINVAL;
-
/* check for offset overflow */
if 

[PATCH v2 0/3] dma-buf: map-info support

2022-08-15 Thread Rob Clark
From: Rob Clark 

See 1/3 for motivation.

Rob Clark (3):
  dma-buf: Add ioctl to query mmap coherency/cache info
  drm/prime: Wire up mmap_info support
  drm/msm/prime: Add mmap_info support

 drivers/dma-buf/dma-buf.c | 63 ++--
 drivers/gpu/drm/drm_prime.c   |  3 ++
 drivers/gpu/drm/msm/msm_gem.c | 12 +++
 include/drm/drm_gem.h | 11 ++
 include/linux/dma-buf.h   | 11 ++
 include/uapi/linux/dma-buf.h  | 68 +++
 6 files changed, 158 insertions(+), 10 deletions(-)

-- 
2.36.1



[PATCH] drm/virtio: Fix same-context optimization

2022-08-12 Thread Rob Clark
From: Rob Clark 

When VIRTGPU_EXECBUF_RING_IDX is used, we should be considering the
timeline that the EB is running on rather than the global driver fence
context.

Fixes: 85c83ea915ed ("drm/virtio: implement context init: allocate an array of 
fence contexts")
Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/virtio/virtgpu_ioctl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_ioctl.c 
b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
index fa2f757583f7..9e6cb3c2666e 100644
--- a/drivers/gpu/drm/virtio/virtgpu_ioctl.c
+++ b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
@@ -168,7 +168,7 @@ static int virtio_gpu_execbuffer_ioctl(struct drm_device 
*dev, void *data,
 * array contains any fence from a foreign context.
 */
ret = 0;
-   if (!dma_fence_match_context(in_fence, 
vgdev->fence_drv.context))
+   if (!dma_fence_match_context(in_fence, fence_ctx + ring_idx))
ret = dma_fence_wait(in_fence, true);
 
dma_fence_put(in_fence);
-- 
2.36.1
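For context, the per-ring timelines boil down to something like this (a
rough sketch, the names are illustrative rather than the actual
virtio_gpu structures): each ring reserves its own fence context at
init, so the "same timeline" check has to use the per-ring context:

	/* at init: reserve one fence context per ring */
	u64 base_ctx = dma_fence_context_alloc(num_rings);

	/* at submit: only wait if the fence is from a foreign timeline */
	if (!dma_fence_match_context(in_fence, base_ctx + ring_idx))
		ret = dma_fence_wait(in_fence, true);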



Re: [PATCH v8 2/2] drm/gem: Don't map imported GEMs

2022-08-12 Thread Rob Clark
On Fri, Aug 12, 2022 at 7:57 AM Rob Clark  wrote:
>
> On Fri, Aug 12, 2022 at 4:26 AM Dmitry Osipenko
>  wrote:
> >
> > On 8/11/22 02:19, Rob Clark wrote:
> > > On Wed, Aug 10, 2022 at 3:23 PM Dmitry Osipenko
> > >  wrote:
> > >>
> > >> On 8/11/22 01:03, Rob Clark wrote:
> > >>> On Wed, Aug 10, 2022 at 12:26 PM Dmitry Osipenko
> > >>>  wrote:
> > >>>>
> > >>>> On 8/10/22 18:08, Rob Clark wrote:
> > >>>>> On Wed, Aug 10, 2022 at 4:47 AM Daniel Vetter  wrote:
> > >>>>>>
> > >>>>>> On Wed, Jul 06, 2022 at 10:02:07AM +0300, Dmitry Osipenko wrote:
> > >>>>>>> On 7/6/22 00:48, Rob Clark wrote:
> > >>>>>>>> On Tue, Jul 5, 2022 at 4:51 AM Christian König 
> > >>>>>>>>  wrote:
> > >>>>>>>>>
> > >>>>>>>>> Am 01.07.22 um 11:02 schrieb Dmitry Osipenko:
> > >>>>>>>>>> Drivers that use drm_gem_mmap() and drm_gem_mmap_obj() helpers 
> > >>>>>>>>>> don't
> > >>>>>>>>>> handle imported dma-bufs properly, which results in mapping of 
> > >>>>>>>>>> something
> > >>>>>>>>>> else than the imported dma-buf. On NVIDIA Tegra we get a hard 
> > >>>>>>>>>> lockup when
> > >>>>>>>>>> userspace writes to the memory mapping of a dma-buf that was 
> > >>>>>>>>>> imported into
> > >>>>>>>>>> Tegra's DRM GEM.
> > >>>>>>>>>>
> > >>>>>>>>>> Majority of DRM drivers prohibit mapping of the imported GEM 
> > >>>>>>>>>> objects.
> > >>>>>>>>>> Mapping of imported GEMs requires special care from userspace
> > >>>>>>>>>> since it
> > >>>>>>>>>> should sync dma-buf because mapping coherency of the exporter 
> > >>>>>>>>>> device may
> > >>>>>>>>>> not match the DRM device. Let's prohibit the mapping for all DRM 
> > >>>>>>>>>> drivers
> > >>>>>>>>>> for consistency.
> > >>>>>>>>>>
> > >>>>>>>>>> Suggested-by: Thomas Hellström 
> > >>>>>>>>>> Signed-off-by: Dmitry Osipenko 
> > >>>>>>>>>
> > >>>>>>>>> I'm pretty sure that this is the right approach, but it's 
> > >>>>>>>>> certainly more
> > >>>>>>>>> than possible that somebody abused this already.
> > >>>>>>>>
> > >>>>>>>> I suspect that this is abused if you run deqp cts on android.. ie. 
> > >>>>>>>> all
> > >>>>>>>> winsys buffers are dma-buf imports from gralloc.  And then when you
> > >>>>>>>> hit readpix...
> > >>>>>>>>
> > >>>>>>>> You might only hit this in scenarios with separate gpu and display 
> > >>>>>>>> (or
> > >>>>>>>> dGPU+iGPU) because self-imports are handled differently in
> > >>>>>>>> drm_gem_prime_import_dev().. and maybe not in cases where you end 
> > >>>>>>>> up
> > >>>>>>>> with a blit from tiled/compressed to linear.. maybe that narrows 
> > >>>>>>>> the
> > >>>>>>>> scope enough to just fix it in userspace?
> > >>>>>>>
> > >>>>>>> Given that only drivers which use DRM-SHMEM potentially could've
> > >>>>>>> mapped imported dma-bufs (Panfrost, Lima) and they already don't
> > >>>>>>> allow to do that, I think we're good.
> > >>>>>>
> > >>>>>> So can I have an ack from Rob here or are there still questions that 
> > >>>>>> this
> > >>>>>> might go boom?
> > >>>>>>
> > >>>>>> D

Re: [PATCH v8 2/2] drm/gem: Don't map imported GEMs

2022-08-12 Thread Rob Clark
On Fri, Aug 12, 2022 at 4:26 AM Dmitry Osipenko
 wrote:
>
> On 8/11/22 02:19, Rob Clark wrote:
> > On Wed, Aug 10, 2022 at 3:23 PM Dmitry Osipenko
> >  wrote:
> >>
> >> On 8/11/22 01:03, Rob Clark wrote:
> >>> On Wed, Aug 10, 2022 at 12:26 PM Dmitry Osipenko
> >>>  wrote:
> >>>>
> >>>> On 8/10/22 18:08, Rob Clark wrote:
> >>>>> On Wed, Aug 10, 2022 at 4:47 AM Daniel Vetter  wrote:
> >>>>>>
> >>>>>> On Wed, Jul 06, 2022 at 10:02:07AM +0300, Dmitry Osipenko wrote:
> >>>>>>> On 7/6/22 00:48, Rob Clark wrote:
> >>>>>>>> On Tue, Jul 5, 2022 at 4:51 AM Christian König 
> >>>>>>>>  wrote:
> >>>>>>>>>
> >>>>>>>>> Am 01.07.22 um 11:02 schrieb Dmitry Osipenko:
> >>>>>>>>>> Drivers that use drm_gem_mmap() and drm_gem_mmap_obj() helpers 
> >>>>>>>>>> don't
> >>>>>>>>>> handle imported dma-bufs properly, which results in mapping of 
> >>>>>>>>>> something
> >>>>>>>>>> else than the imported dma-buf. On NVIDIA Tegra we get a hard 
> >>>>>>>>>> lockup when
> >>>>>>>>>> userspace writes to the memory mapping of a dma-buf that was 
> >>>>>>>>>> imported into
> >>>>>>>>>> Tegra's DRM GEM.
> >>>>>>>>>>
> >>>>>>>>>> Majority of DRM drivers prohibit mapping of the imported GEM 
> >>>>>>>>>> objects.
> >>>>>>>>>> Mapping of imported GEMs requires special care from userspace since
> >>>>>>>>>> it
> >>>>>>>>>> should sync dma-buf because mapping coherency of the exporter 
> >>>>>>>>>> device may
> >>>>>>>>>> not match the DRM device. Let's prohibit the mapping for all DRM 
> >>>>>>>>>> drivers
> >>>>>>>>>> for consistency.
> >>>>>>>>>>
> >>>>>>>>>> Suggested-by: Thomas Hellström 
> >>>>>>>>>> Signed-off-by: Dmitry Osipenko 
> >>>>>>>>>
> >>>>>>>>> I'm pretty sure that this is the right approach, but it's certainly 
> >>>>>>>>> more
> >>>>>>>>> than possible that somebody abused this already.
> >>>>>>>>
> >>>>>>>> I suspect that this is abused if you run deqp cts on android.. ie. 
> >>>>>>>> all
> >>>>>>>> winsys buffers are dma-buf imports from gralloc.  And then when you
> >>>>>>>> hit readpix...
> >>>>>>>>
> >>>>>>>> You might only hit this in scenarios with separate gpu and display 
> >>>>>>>> (or
> >>>>>>>> dGPU+iGPU) because self-imports are handled differently in
> >>>>>>>> drm_gem_prime_import_dev().. and maybe not in cases where you end up
> >>>>>>>> with a blit from tiled/compressed to linear.. maybe that narrows the
> >>>>>>>> scope enough to just fix it in userspace?
> >>>>>>>
> >>>>>>> Given that only drivers which use DRM-SHMEM potentially could've
> >>>>>>> mapped imported dma-bufs (Panfrost, Lima) and they already don't
> >>>>>>> allow to do that, I think we're good.
> >>>>>>
> >>>>>> So can I have an ack from Rob here or are there still questions that 
> >>>>>> this
> >>>>>> might go boom?
> >>>>>>
> >>>>>> Dmitry, since you have a bunch of patches merged now I think would 
> >>>>>> also be
> >>>>>> good to get commit rights so you can drive this more yourself. I've 
> >>>>>> asked
> >>>>>> Daniel Stone to help you out with getting that.
> >>>>>
> >>>>> I *think* we'd be ok with this on msm, mostly just by dumb luck.
> >>>>> Because the dma-buf's we import will be self-import.  I'm less sure
> >>>>> about 

Re: [PATCH v8 2/2] drm/gem: Don't map imported GEMs

2022-08-10 Thread Rob Clark
On Wed, Aug 10, 2022 at 3:23 PM Dmitry Osipenko
 wrote:
>
> On 8/11/22 01:03, Rob Clark wrote:
> > On Wed, Aug 10, 2022 at 12:26 PM Dmitry Osipenko
> >  wrote:
> >>
> >> On 8/10/22 18:08, Rob Clark wrote:
> >>> On Wed, Aug 10, 2022 at 4:47 AM Daniel Vetter  wrote:
> >>>>
> >>>> On Wed, Jul 06, 2022 at 10:02:07AM +0300, Dmitry Osipenko wrote:
> >>>>> On 7/6/22 00:48, Rob Clark wrote:
> >>>>>> On Tue, Jul 5, 2022 at 4:51 AM Christian König 
> >>>>>>  wrote:
> >>>>>>>
> >>>>>>> Am 01.07.22 um 11:02 schrieb Dmitry Osipenko:
> >>>>>>>> Drivers that use drm_gem_mmap() and drm_gem_mmap_obj() helpers don't
> >>>>>>>> handle imported dma-bufs properly, which results in mapping of 
> >>>>>>>> something
> >>>>>>>> else than the imported dma-buf. On NVIDIA Tegra we get a hard lockup 
> >>>>>>>> when
> >>>>>>>> userspace writes to the memory mapping of a dma-buf that was 
> >>>>>>>> imported into
> >>>>>>>> Tegra's DRM GEM.
> >>>>>>>>
> >>>>>>>> Majority of DRM drivers prohibit mapping of the imported GEM objects.
> >>>>>>>> Mapping of imported GEMs requires special care from userspace since it
> >>>>>>>> should sync dma-buf because mapping coherency of the exporter device 
> >>>>>>>> may
> >>>>>>>> not match the DRM device. Let's prohibit the mapping for all DRM 
> >>>>>>>> drivers
> >>>>>>>> for consistency.
> >>>>>>>>
> >>>>>>>> Suggested-by: Thomas Hellström 
> >>>>>>>> Signed-off-by: Dmitry Osipenko 
> >>>>>>>
> >>>>>>> I'm pretty sure that this is the right approach, but it's certainly 
> >>>>>>> more
> >>>>>>> than possible that somebody abused this already.
> >>>>>>
> >>>>>> I suspect that this is abused if you run deqp cts on android.. ie. all
> >>>>>> winsys buffers are dma-buf imports from gralloc.  And then when you
> >>>>>> hit readpix...
> >>>>>>
> >>>>>> You might only hit this in scenarios with separate gpu and display (or
> >>>>>> dGPU+iGPU) because self-imports are handled differently in
> >>>>>> drm_gem_prime_import_dev().. and maybe not in cases where you end up
> >>>>>> with a blit from tiled/compressed to linear.. maybe that narrows the
> >>>>>> scope enough to just fix it in userspace?
> >>>>>
> >>>>> Given that only drivers which use DRM-SHMEM potentially could've
> >>>>> mapped imported dma-bufs (Panfrost, Lima) and they already don't
> >>>>> allow to do that, I think we're good.
> >>>>
> >>>> So can I have an ack from Rob here or are there still questions that this
> >>>> might go boom?
> >>>>
> >>>> Dmitry, since you have a bunch of patches merged now I think would also 
> >>>> be
> >>>> good to get commit rights so you can drive this more yourself. I've asked
> >>>> Daniel Stone to help you out with getting that.
> >>>
> >>> I *think* we'd be ok with this on msm, mostly just by dumb luck.
> >>> Because the dma-buf's we import will be self-import.  I'm less sure
> >>> about panfrost (src/panfrost/lib/pan_bo.c doesn't seem to have a
> >>> special path for imported dma-bufs either, and in that case they won't
> >>> be self-imports.. but I guess no one has tried to run android cts on
> >>> panfrost).
> >>
> >> The last time I tried to mmap dma-buf imported to Panfrost didn't work
> >> because Panfrost didn't implement something needed for that. I'll need
> >> to take a look again because can't recall what it was.
> >>
> >>> What about something less drastic to start, like (apologies for
> >>> hand-edited patch):
> >>>
> >>> diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
> >>> index 86d670c71286..fc9ec42fa0ab 100644
> >>> --- a/drivers/gpu/drm/drm_gem.c
> >>> +++ b/drivers/gpu/drm/drm_gem.c

Re: [PATCH v5] drm: Add initial ci/ subdirectory

2022-08-10 Thread Rob Clark
On Wed, Aug 10, 2022 at 11:25 AM Rodrigo Siqueira Jordao
 wrote:
>
> Hi Tomeu,
>
> First of all, nice patch! I just saw it, and I have some basic questions
> (I don't understand many of these CI details). I also CC some CI folks
> from the display team at AMD.
>
> On 2022-07-26 14:16, Tomeu Vizoso wrote:
> > And use it to store expectations about what the DRM drivers are
> > supposed to pass in the IGT test suite.
> >
> > Also include a configuration file that points to the out-of-tree CI
> > scripts.
> >
> > By storing the test expectations along the code we can make sure both
> > stay in sync with each other, and so we can know when a code change
> > breaks those expectations.
> >
> > This will allow all contributors to drm to reuse the infrastructure
> > already in gitlab.freedesktop.org to test the driver on several
> > generations of the hardware.
> >
> > v2:
> >- Fix names of result expectation files to match SoC
> >- Don't execute tests that are going to skip on all boards
> >
> > v3:
> >- Remove tracking of dmesg output during test execution
> >
> > v4:
> >- Move up to drivers/gpu/drm
> >- Add support for a bunch of other drivers
> >- Explain how to incorporate fixes for CI from a
> >  ${TARGET_BRANCH}-external-fixes branch
> >- Remove tests that pass from expected results file, to reduce the
> >  size of in-tree files
> >- Add docs about how to deal with outages in automated testing labs
> >- Specify the exact SHA of the CI scripts to be used
> >
> > v5:
> >- Remove unneeded skips from Meson expectations file
> >- Use a more advanced runner that detects flakes automatically
> >- Use a more succint format for the expectations
> >- Run many more tests (and use sharding to finish in time)
> >- Use skip lists to avoid hanging machines
> >- Add some build testing
> >- Build IGT in each pipeline for faster uprevs
> >- List failures in the GitLab UI
> >
> > Signed-off-by: Tomeu Vizoso 
> > Reviewed-by: Neil Armstrong 
> > ---
> >   Documentation/gpu/automated_testing.rst   | 84 ++
> >   drivers/gpu/drm/ci/amdgpu-stoney-fails.txt| 13 +++
> >   drivers/gpu/drm/ci/amdgpu-stoney-flakes.txt   | 20 +
> >   drivers/gpu/drm/ci/amdgpu-stoney-skips.txt|  2 +
> >   drivers/gpu/drm/ci/gitlab-ci.yml  | 13 +++
> >   drivers/gpu/drm/ci/i915-amly-flakes.txt   | 32 +++
> >   drivers/gpu/drm/ci/i915-amly-skips.txt|  2 +
> >   drivers/gpu/drm/ci/i915-apl-fails.txt | 29 +++
> >   drivers/gpu/drm/ci/i915-apl-flakes.txt|  1 +
> >   drivers/gpu/drm/ci/i915-apl-skips.txt |  2 +
> >   drivers/gpu/drm/ci/i915-cml-flakes.txt| 36 
> >   drivers/gpu/drm/ci/i915-glk-flakes.txt| 40 +
> >   drivers/gpu/drm/ci/i915-glk-skips.txt |  2 +
> >   drivers/gpu/drm/ci/i915-kbl-fails.txt |  8 ++
> >   drivers/gpu/drm/ci/i915-kbl-flakes.txt| 24 ++
> >   drivers/gpu/drm/ci/i915-kbl-skips.txt |  2 +
> >   drivers/gpu/drm/ci/i915-tgl-fails.txt | 19 
> >   drivers/gpu/drm/ci/i915-tgl-flakes.txt|  6 ++
> >   drivers/gpu/drm/ci/i915-tgl-skips.txt |  8 ++
> >   drivers/gpu/drm/ci/i915-whl-fails.txt | 30 +++
> >   drivers/gpu/drm/ci/i915-whl-flakes.txt|  1 +
> >   drivers/gpu/drm/ci/mediatek-mt8173-fails.txt  | 29 +++
> >   drivers/gpu/drm/ci/mediatek-mt8183-fails.txt  | 10 +++
> >   drivers/gpu/drm/ci/mediatek-mt8183-flakes.txt | 14 +++
> >   drivers/gpu/drm/ci/meson-g12b-fails.txt   |  5 ++
> >   drivers/gpu/drm/ci/meson-g12b-flakes.txt  |  4 +
> >   drivers/gpu/drm/ci/msm-apq8016-fails.txt  | 15 
> >   drivers/gpu/drm/ci/msm-apq8016-flakes.txt |  4 +
> >   drivers/gpu/drm/ci/msm-apq8096-fails.txt  |  2 +
> >   drivers/gpu/drm/ci/msm-apq8096-flakes.txt |  4 +
> >   drivers/gpu/drm/ci/msm-apq8096-skips.txt  |  2 +
> >   drivers/gpu/drm/ci/msm-sc7180-fails.txt   | 22 +
> >   drivers/gpu/drm/ci/msm-sc7180-flakes.txt  | 14 +++
> >   drivers/gpu/drm/ci/msm-sc7180-skips.txt   | 18 
> >   drivers/gpu/drm/ci/msm-sdm845-fails.txt   | 44 ++
> >   drivers/gpu/drm/ci/msm-sdm845-flakes.txt  | 33 +++
> >   drivers/gpu/drm/ci/msm-sdm845-skips.txt   |  2 +
> >   drivers/gpu/drm/ci/rockchip-rk3288-fails.txt  | 75 
> >   drivers/gpu/drm/ci/rockchip-rk3288-flakes.txt |  5 ++
> >   drivers/gpu/drm/ci/rockchip-rk3288-skips.txt  | 46 ++
> >   drivers/gpu/drm/ci/rockchip-rk3399-fails.txt  | 86 +++
> >   drivers/gpu/drm/ci/rockchip-rk3399-flakes.txt | 25 ++
> >   drivers/gpu/drm/ci/rockchip-rk3399-skips.txt  |  5 ++
> >   drivers/gpu/drm/ci/virtio_gpu-none-fails.txt  | 38 
> >   drivers/gpu/drm/ci/virtio_gpu-none-flakes.txt |  0
> >   drivers/gpu/drm/ci/virtio_gpu-none-skips.txt  |  6 ++
> >   46 files changed, 882 insertions(+)
> >   create mode 100644 

Re: [PATCH v8 2/2] drm/gem: Don't map imported GEMs

2022-08-10 Thread Rob Clark
On Wed, Aug 10, 2022 at 12:26 PM Dmitry Osipenko
 wrote:
>
> On 8/10/22 18:08, Rob Clark wrote:
> > On Wed, Aug 10, 2022 at 4:47 AM Daniel Vetter  wrote:
> >>
> >> On Wed, Jul 06, 2022 at 10:02:07AM +0300, Dmitry Osipenko wrote:
> >>> On 7/6/22 00:48, Rob Clark wrote:
> >>>> On Tue, Jul 5, 2022 at 4:51 AM Christian König 
> >>>>  wrote:
> >>>>>
> >>>>> Am 01.07.22 um 11:02 schrieb Dmitry Osipenko:
> >>>>>> Drivers that use drm_gem_mmap() and drm_gem_mmap_obj() helpers don't
> >>>>>> handle imported dma-bufs properly, which results in mapping of 
> >>>>>> something
> >>>>>> else than the imported dma-buf. On NVIDIA Tegra we get a hard lockup 
> >>>>>> when
> >>>>>> userspace writes to the memory mapping of a dma-buf that was imported 
> >>>>>> into
> >>>>>> Tegra's DRM GEM.
> >>>>>>
> >>>>>> Majority of DRM drivers prohibit mapping of the imported GEM objects.
> >>>>>> Mapping of imported GEMs requires special care from userspace since it
> >>>>>> should sync dma-buf because mapping coherency of the exporter device 
> >>>>>> may
> >>>>>> not match the DRM device. Let's prohibit the mapping for all DRM 
> >>>>>> drivers
> >>>>>> for consistency.
> >>>>>>
> >>>>>> Suggested-by: Thomas Hellström 
> >>>>>> Signed-off-by: Dmitry Osipenko 
> >>>>>
> >>>>> I'm pretty sure that this is the right approach, but it's certainly more
> >>>>> than possible that somebody abused this already.
> >>>>
> >>>> I suspect that this is abused if you run deqp cts on android.. ie. all
> >>>> winsys buffers are dma-buf imports from gralloc.  And then when you
> >>>> hit readpix...
> >>>>
> >>>> You might only hit this in scenarios with separate gpu and display (or
> >>>> dGPU+iGPU) because self-imports are handled differently in
> >>>> drm_gem_prime_import_dev().. and maybe not in cases where you end up
> >>>> with a blit from tiled/compressed to linear.. maybe that narrows the
> >>>> scope enough to just fix it in userspace?
> >>>
> >>> Given that only drivers which use DRM-SHMEM potentially could've
> >>> mapped imported dma-bufs (Panfrost, Lima) and they already don't
> >>> allow to do that, I think we're good.
> >>
> >> So can I have an ack from Rob here or are there still questions that this
> >> might go boom?
> >>
> >> Dmitry, since you have a bunch of patches merged now I think would also be
> >> good to get commit rights so you can drive this more yourself. I've asked
> >> Daniel Stone to help you out with getting that.
> >
> > I *think* we'd be ok with this on msm, mostly just by dumb luck.
> > Because the dma-buf's we import will be self-import.  I'm less sure
> > about panfrost (src/panfrost/lib/pan_bo.c doesn't seem to have a
> > special path for imported dma-bufs either, and in that case they won't
> > be self-imports.. but I guess no one has tried to run android cts on
> > panfrost).
>
> The last time I tried to mmap dma-buf imported to Panfrost didn't work
> because Panfrost didn't implement something needed for that. I'll need
> to take a look again because can't recall what it was.
>
> > What about something less drastic to start, like (apologies for
> > hand-edited patch):
> >
> > diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
> > index 86d670c71286..fc9ec42fa0ab 100644
> > --- a/drivers/gpu/drm/drm_gem.c
> > +++ b/drivers/gpu/drm/drm_gem.c
> > @@ -1034,6 +1034,10 @@ int drm_gem_mmap_obj(struct drm_gem_object
> > *obj, unsigned long obj_size,
> >  {
> > int ret;
> >
> > +   WARN_ON_ONCE(obj->import_attach);
>
> This will hang NVIDIA Tegra, which is what this patch fixed initially.
> If neither of upstream DRM drivers need to map imported dma-bufs and
> never needed, then why do we need this?

oh, tegra isn't using shmem helpers?  I assumed it was.  Well, my point
was to make a more targeted fail on tegra, and a WARN_ON for everyone
else to make it clear that what they are doing is undefined behavior.
Because so far existing userspace (or well, panfrost and freedreno at
least, those are the two I know of or checked) doesn't make special
cases for mmap'ing against the dmabuf fd instead of the drm device fd.
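For reference, the "special case" in question would look something like
this on the userspace side, ie. mmap through the dma-buf fd and bracket
CPU access with the SYNC ioctl, rather than using the GEM mmap offset
on the drm device fd (a sketch, error handling elided):

	void *map = mmap(NULL, size, PROT_READ | PROT_WRITE,
			 MAP_SHARED, dmabuf_fd, 0);

	struct dma_buf_sync sync = {
		.flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_RW,
	};
	ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);
	/* ... CPU access through map ... */
	sync.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_RW;
	ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);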

I *think* it should work out that we don't hit this path with
freedreno but on android I can't really guarantee or prove it.  So
your patch would potentially break existing working userspace.  Maybe
it is userspace that isn't portable (but OTOH it isn't like you are
going to be using freedreno on tegra).  So why don't you go for a more
targeted fix that only returns an error on hw where this is
problematic?

BR,
-R


Re: [PATCH v8 2/2] drm/gem: Don't map imported GEMs

2022-08-10 Thread Rob Clark
On Wed, Aug 10, 2022 at 4:47 AM Daniel Vetter  wrote:
>
> On Wed, Jul 06, 2022 at 10:02:07AM +0300, Dmitry Osipenko wrote:
> > On 7/6/22 00:48, Rob Clark wrote:
> > > On Tue, Jul 5, 2022 at 4:51 AM Christian König  
> > > wrote:
> > >>
> > >> Am 01.07.22 um 11:02 schrieb Dmitry Osipenko:
> > >>> Drivers that use drm_gem_mmap() and drm_gem_mmap_obj() helpers don't
> > >>> handle imported dma-bufs properly, which results in mapping of something
> > >>> else than the imported dma-buf. On NVIDIA Tegra we get a hard lockup 
> > >>> when
> > >>> userspace writes to the memory mapping of a dma-buf that was imported 
> > >>> into
> > >>> Tegra's DRM GEM.
> > >>>
> > >>> Majority of DRM drivers prohibit mapping of the imported GEM objects.
> > >>> Mapping of imported GEMs requires special care from userspace since it
> > >>> should sync dma-buf because mapping coherency of the exporter device may
> > >>> not match the DRM device. Let's prohibit the mapping for all DRM drivers
> > >>> for consistency.
> > >>>
> > >>> Suggested-by: Thomas Hellström 
> > >>> Signed-off-by: Dmitry Osipenko 
> > >>
> > >> I'm pretty sure that this is the right approach, but it's certainly more
> > >> than possible that somebody abused this already.
> > >
> > > I suspect that this is abused if you run deqp cts on android.. ie. all
> > > winsys buffers are dma-buf imports from gralloc.  And then when you
> > > hit readpix...
> > >
> > > You might only hit this in scenarios with separate gpu and display (or
> > > dGPU+iGPU) because self-imports are handled differently in
> > > drm_gem_prime_import_dev().. and maybe not in cases where you end up
> > > with a blit from tiled/compressed to linear.. maybe that narrows the
> > > scope enough to just fix it in userspace?
> >
> > Given that only drivers which use DRM-SHMEM potentially could've
> > mapped imported dma-bufs (Panfrost, Lima) and they already don't
> > allow to do that, I think we're good.
>
> So can I have an ack from Rob here or are there still questions that this
> might go boom?
>
> Dmitry, since you have a bunch of patches merged now I think would also be
> good to get commit rights so you can drive this more yourself. I've asked
> Daniel Stone to help you out with getting that.

I *think* we'd be ok with this on msm, mostly just by dumb luck.
Because the dma-buf's we import will be self-import.  I'm less sure
about panfrost (src/panfrost/lib/pan_bo.c doesn't seem to have a
special path for imported dma-bufs either, and in that case they won't
be self-imports.. but I guess no one has tried to run android cts on
panfrost).

What about something less drastic to start, like (apologies for
hand-edited patch):

diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index 86d670c71286..fc9ec42fa0ab 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -1034,6 +1034,10 @@ int drm_gem_mmap_obj(struct drm_gem_object
*obj, unsigned long obj_size,
 {
int ret;

+   WARN_ON_ONCE(obj->import_attach);
+
/* Check for valid size. */
if (obj_size < vma->vm_end - vma->vm_start)
return -EINVAL;
diff --git a/drivers/gpu/drm/drm_gem_shmem_helper.c
b/drivers/gpu/drm/drm_gem_shmem_helper.c
index 8ad0e02991ca..6190f5018986 100644
--- a/drivers/gpu/drm/drm_gem_shmem_helper.c
+++ b/drivers/gpu/drm/drm_gem_shmem_helper.c
@@ -609,17 +609,8 @@ EXPORT_SYMBOL_GPL(drm_gem_shmem_vm_ops);
  */
 int drm_gem_shmem_mmap(struct drm_gem_shmem_object *shmem, struct
vm_area_struct *vma)
 {
-   struct drm_gem_object *obj = &shmem->base;
int ret;

if (obj->import_attach) {
-   /* Drop the reference drm_gem_mmap_obj() acquired.*/
-   drm_gem_object_put(obj);
-   vma->vm_private_data = NULL;
-
-   return dma_buf_mmap(obj->dma_buf, vma, 0);
+   return -EINVAL;
}

ret = drm_gem_shmem_get_pages(shmem);
if (ret) {
drm_gem_vm_close(vma);
--
2.36.1


Re: DMA-buf and uncached system memory

2022-08-09 Thread Rob Clark
On Mon, Feb 15, 2021 at 12:58 AM Christian König
 wrote:
>
> Hi guys,
>
> we are currently working an Freesync and direct scan out from system
> memory on AMD APUs in A+A laptops.
>
> On problem we stumbled over is that our display hardware needs to scan
> out from uncached system memory and we currently don't have a way to
> communicate that through DMA-buf.
>
> For our specific use case at hand we are going to implement something
> driver specific, but the question is should we have something more
> generic for this?

I'm a bit late to this party (and sorry, I didn't read the entire
thread), but it occurs to me that dmabuf mmap_info[1] would also get
you what you need, ie. display importing dma-buf could check whether
the exporter is mapping cached or not, and reject the import if
needed?

[1] https://patchwork.freedesktop.org/patch/496069/?series=106847&rev=2

> After all the system memory access pattern is a PCIe extension and as
> such something generic.
>
> Regards,
> Christian.
> ___
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [Linaro-mm-sig] [PATCH 1/3] dma-buf: Add ioctl to query mmap info

2022-08-08 Thread Rob Clark
On Mon, Aug 8, 2022 at 7:56 AM Christian König  wrote:
>
> Am 08.08.22 um 15:26 schrieb Rob Clark:
> > On Mon, Aug 8, 2022 at 4:22 AM Christian König  
> > wrote:
> >
> > [SNIP]
> >>>> If the virtio/virtgpu UAPI was build around the idea that this is
> >>>> possible then it is most likely fundamental broken.
> >>> How else can you envision mmap'ing to guest userspace working?
> >> Well long story short: You can't.
> >>
> >> See userspace mappings are not persistent, but rather faulted in on
> >> demand. The exporter is responsible for setting those up to be able to
> >> add reverse tracking and so can invalidate those mappings when the
> >> backing store changes.
> > I think that is not actually a problem.  At least for how it works on
> > arm64 but I'm almost positive x86 is similar.. I'm not sure how else
> > you could virtualize mmu/iommu/etc in a way that didn't have horrible
> > performance.
> >
> > There are two levels of pagetable translation, the first controlled by
> > the host kernel, the second by the guest.  From the PoV of host
> > kernel, it is just memory mapped to userspace, getting faulted in on
> > demand, just as normal.  First the guest controlled translation
> > triggers a fault in the guest which sets up guest mapping.  And then
> > the second level of translation to translate from what guest sees as
> > PA (but host sees as VA) to actual PA triggers a fault in the host.
>
> Ok, that's calming.
>
> At least that's not the approach talked about the last time this came up
> and it doesn't rip a massive security hole somewhere.

Hmm, tbh I'm not sure which thread/discussion this was.. it could have
been before I was paying much attention to the vm use-case

> The question is why is the guest then not using the caching attributes
> setup by the host page tables when the translation is forwarded anyway?

The guest kernel itself doesn't know.  AFAICT, at least on arm, the hw
will combine the attributes of the mapping in S1 and S2 pagetables and
use the most restrictive.  So if S1 (host) is cached but S2 (guest) is
WC, you'll end up w/ WC.

That said, at least on aarch64, it seems like we could always tell the
guest it is cached, and if mapped WC in S1 you'll end up with WC
access.  But this seems to depend on FWB (an optional feature which
allows S2 to override S1 attributes) not being enabled.  And I'm not
entirely sure how it works on x86.
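As a rough mental model (simplified, the real rules in the Arm ARM are
more detailed than this), the no-FWB combining behaves like "the most
restrictive memory type wins":

	/* illustrative only, not real kernel code */
	enum mem_type {
		MT_DEVICE,
		MT_NORMAL_NC,	/* ie. WC */
		MT_NORMAL_WB,	/* ie. cached */
	};

	static enum mem_type combine_s1_s2(enum mem_type s1, enum mem_type s2)
	{
		/* eg. S1 cached + S2 WC -> WC */
		return s1 < s2 ? s1 : s2;
	}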

BR,
-R

> > [SNIP]
> > This is basically what happens, although via the two levels of pgtable
> > translation.  This patch provides the missing piece, the caching
> > attributes.
>
> Yeah, but that won't work like this. See the backing store migrates all
> the time and when it is backed by PCIe/VRAM/local memory you need to use
> write combine while system memory is usually cached.
>
> >>   Because otherwise you can't accommodate that the exporter is
> >> changing those caching attributes.
> > Changing the attributes dynamically isn't going to work.. or at least
> > not easily.  If you had some sort of synchronous notification to host
> > userspace, it could trigger an irq to the guest, I suppose.  But it
> > would mean host kernel has to block waiting for host userspace to
> > interrupt the guest, then wait for guest vgpu process to be scheduled
> > and handle the irq.
>
> We basically change that on every page flip on APUs and that doesn't
> sound like something fast.
>
> Thanks for the explanation how this works,
> Christian.
>
> >
> > At least in the case of msm, the cache attributes are static for the
> > life of the buffer, so this scenario isn't a problem.  AFAICT this
> > should work fine for at least all UMA hw.. I'm a bit less sure when it
> > comes to TTM, but shouldn't you at least be able to use worst-case
> > cache attributes for buffers that are allowed to be mapped to guest?
> >
> > BR,
> > -R
> >
> >>> But more seriously, let's take a step back here.. what scenarios are
> >>> you seeing this being problematic for?  Then we can see how to come up
> >>> with solutions.  The current situation of host userspace VMM just
> >>> guessing isn't great.
> >> Well "isn't great" is a complete understatement. When KVM/virtio/virtgpu
> >> is doing what I guess they are doing here then that is a really major
> >> security hole.
> >>
> >>> And sticking our heads in the sand and
> >>> pretending VMs don't exist isn't great.  So what can we do?  I can
> >>> instead add a msm ioctl to return this info and solve the problem even
>

Re: [Linaro-mm-sig] [PATCH 1/3] dma-buf: Add ioctl to query mmap info

2022-08-08 Thread Rob Clark
On Mon, Aug 8, 2022 at 4:22 AM Christian König  wrote:
>
> Am 07.08.22 um 21:10 schrieb Rob Clark:
> > On Sun, Aug 7, 2022 at 11:05 AM Christian König
> >  wrote:
> >> Am 07.08.22 um 19:56 schrieb Rob Clark:
> >>> On Sun, Aug 7, 2022 at 10:38 AM Christian König
> >>>  wrote:
> >>>> [SNIP]
> >>>> And exactly that was declared completely illegal the last time it came
> >>>> up on the mailing list.
> >>>>
> >>>> Daniel implemented a whole bunch of patches into the DMA-buf layer to
> >>>> make it impossible for KVM to do this.
> >>> This issue isn't really with KVM, it is not making any CPU mappings
> >>> itself.  KVM is just making the pages available to the guest.
> >> Well I can only repeat myself: This is strictly illegal.
> >>
> >> Please try this approach with CONFIG_DMABUF_DEBUG set. I'm pretty sure
> >> you will immediately run into a crash.
> >>
> >> See this here as well
> >> https://elixir.bootlin.com/linux/v5.19/source/drivers/dma-buf/dma-buf.c#L653
> >>
> >> Daniel intentionally added code to mangle the page pointers to make it
> >> impossible for KVM to do this.
> > I don't believe KVM is using the sg table, so this isn't going to stop
> > anything ;-)
>
> Then I have no idea how KVM actually works. Can you please briefly
> describe that?
>
> >> If the virtio/virtgpu UAPI was build around the idea that this is
> >> possible then it is most likely fundamental broken.
> > How else can you envision mmap'ing to guest userspace working?
>
> Well long story short: You can't.
>
> See userspace mappings are not persistent, but rather faulted in on
> demand. The exporter is responsible for setting those up to be able to
> add reverse tracking and so can invalidate those mappings when the
> backing store changes.

I think that is not actually a problem.  At least for how it works on
arm64 but I'm almost positive x86 is similar.. I'm not sure how else
you could virtualize mmu/iommu/etc in a way that didn't have horrible
performance.

There are two levels of pagetable translation, the first controlled by
the host kernel, the second by the guest.  From the PoV of host
kernel, it is just memory mapped to userspace, getting faulted in on
demand, just as normal.  First the guest controlled translation
triggers a fault in the guest which sets up guest mapping.  And then
the second level of translation to translate from what guest sees as
PA (but host sees as VA) to actual PA triggers a fault in the host.
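This is essentially what the KVM memslot API expresses, fwiw: the VMM
takes a normal host userspace mapping and hands it to the guest as
"physical" memory (a sketch, error handling elided):

	void *va = mmap(NULL, size, PROT_READ | PROT_WRITE,
			MAP_SHARED, dmabuf_fd, 0);

	struct kvm_userspace_memory_region region = {
		.slot            = slot,
		.guest_phys_addr = guest_pa,	/* what the guest sees as PA */
		.memory_size     = size,
		.userspace_addr  = (__u64)va,	/* host VA backing it */
	};
	ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region);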

>
> > The guest kernel is the one that controls the guest userspace pagetables,
> > not the host kernel.  I guess your complaint is about VMs in general,
> > but unfortunately I don't think you'll convince the rest of the
> > industry to abandon VMs ;-)
>
> I'm not arguing against the usefulness of VM, it's just that what you
> describe here technically is just utterly nonsense as far as I can tell.
>
> I have to confess that I don't fully understand how this KVM mapping works,
> but when the struct pages pointers from the sg_table are not used I see
> two possibilities what was implemented here:
>
> 1. KVM is somehow walking the page tables to figure out what to map into
> the guest VM.

it is just mapping host VA to the guest.. the guest kernel sees this
as the PA and uses the level of pgtable translation that it controls
to map to guest userspace.  *All* that is needed (which this patch
provides) is the correct cache attributes.

>  This would be *HIGHLY* illegal and not just with DMA-buf, but with
> pretty much a whole bunch of other drivers/subsystems as well.
>  In other words it would be trivial for the guest to take over the
> host with that because it doesn't take into account that the underlying
> backing store of DMA-buf and other mmaped() areas can change at any time.
>
> 2. The guest VM triggers the fault handler for the mappings to fill in
> their page tables on demand.
>
>  That would actually work with DMA-buf, but then the guest needs to
> somehow use the caching attributes from the host side and not use its own.

This is basically what happens, although via the two levels of pgtable
translation.  This patch provides the missing piece, the caching
attributes.

>  Because otherwise you can't accommod

Re: [Freedreno] [PATCH 1/3] dma-buf: Add ioctl to query mmap info

2022-08-08 Thread Rob Clark
On Sun, Aug 7, 2022 at 1:25 PM Akhil P Oommen  wrote:
>
> On 7/29/2022 10:37 PM, Rob Clark wrote:
> > From: Rob Clark 
> >
> > This is a fairly narrowly focused interface, providing a way for a VMM
> > in userspace to tell the guest kernel what pgprot settings to use when
> > mapping a buffer to guest userspace.
> >
> > For buffers that get mapped into guest userspace, virglrenderer returns
> > a dma-buf fd to the VMM (crosvm or qemu).  In addition to mapping the
> > pages into the guest VM, it needs to report to drm/virtio in the guest
> > the cache settings to use for guest userspace.  In particular, on some
> > architectures, creating aliased mappings with different cache attributes
> > is frowned upon, so it is important that the guest mappings have the
> > same cache attributes as any potential host mappings.
> >
> > Signed-off-by: Rob Clark 
> > ---
> >   drivers/dma-buf/dma-buf.c| 26 ++
> >   include/linux/dma-buf.h  |  7 +++
> >   include/uapi/linux/dma-buf.h | 28 
> >   3 files changed, 61 insertions(+)
> >
> > diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> > index 32f55640890c..d02d6c2a3b49 100644
> > --- a/drivers/dma-buf/dma-buf.c
> > +++ b/drivers/dma-buf/dma-buf.c
> > @@ -326,6 +326,29 @@ static long dma_buf_set_name(struct dma_buf *dmabuf, 
> > const char __user *buf)
> >   return 0;
> >   }
> >
> > +static long dma_buf_info(struct dma_buf *dmabuf, const void __user *uarg)
> > +{
> > + struct dma_buf_info arg;
> > +
> > + if (copy_from_user(&arg, uarg, sizeof(arg)))
> > + return -EFAULT;
> > +
> > + switch (arg.param) {
> > + case DMA_BUF_INFO_VM_PROT:
> > + if (!dmabuf->ops->mmap_info)
> > + return -ENOSYS;
> > + arg.value = dmabuf->ops->mmap_info(dmabuf);
> > + break;
> > + default:
> > + return -EINVAL;
> > + }
> > +
> > + if (copy_to_user(uarg, &arg, sizeof(arg)))
> > + return -EFAULT;
> > +
> > + return 0;
> > +}
> > +
> >   static long dma_buf_ioctl(struct file *file,
> > unsigned int cmd, unsigned long arg)
> >   {
> > @@ -369,6 +392,9 @@ static long dma_buf_ioctl(struct file *file,
> >   case DMA_BUF_SET_NAME_B:
> >   return dma_buf_set_name(dmabuf, (const char __user *)arg);
> >
> > + case DMA_BUF_IOCTL_INFO:
> > + return dma_buf_info(dmabuf, (const void __user *)arg);
> > +
> >   default:
> >   return -ENOTTY;
> >   }
> > diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
> > index 71731796c8c3..6f4de64a5937 100644
> > --- a/include/linux/dma-buf.h
> > +++ b/include/linux/dma-buf.h
> > @@ -283,6 +283,13 @@ struct dma_buf_ops {
> >*/
> >   int (*mmap)(struct dma_buf *, struct vm_area_struct *vma);
> >
> > + /**
> > +  * @mmap_info:
> > +  *
> > +  * Return mmapping info for the buffer.  See DMA_BUF_INFO_VM_PROT.
> > +  */
> > + int (*mmap_info)(struct dma_buf *);
> > +
> >   int (*vmap)(struct dma_buf *dmabuf, struct iosys_map *map);
> >   void (*vunmap)(struct dma_buf *dmabuf, struct iosys_map *map);
> >   };
> > diff --git a/include/uapi/linux/dma-buf.h b/include/uapi/linux/dma-buf.h
> > index b1523cb8ab30..a41adac0f46a 100644
> > --- a/include/uapi/linux/dma-buf.h
> > +++ b/include/uapi/linux/dma-buf.h
> > @@ -85,6 +85,32 @@ struct dma_buf_sync {
> >
> >   #define DMA_BUF_NAME_LEN 32
> >
> > +
> > +/**
> > + * struct dma_buf_info - Query info about the buffer.
> > + */
> > +struct dma_buf_info {
> > +
> > +#define DMA_BUF_INFO_VM_PROT  1
> > +#  define DMA_BUF_VM_PROT_WC  0
> > +#  define DMA_BUF_VM_PROT_CACHED  1
> > +
> > + /**
> > +  * @param: Which param to query
> > +  *
> > +  * DMA_BUF_INFO_BM_PROT:
> Is there a typo here? BM -> VM ?

yes, fixed locally

>
> -Akhil.
> > +  * Query the access permissions of userspace mmap's of this 
> > buffer.
> > +  * Returns one of DMA_BUF_VM_PROT_x
> > +  */
> > + __u32 param;
> > + __u32 pad;
> > +
> > + /**
> > +  * @value: Return value of the query.
> > +  */
> > + __u64 value;
> > +};
> > +
> >   #define DMA_BUF_BASE 'b'
> >   #define DMA_BUF_IOCTL_SYNC  _IOW(DMA_BUF_BASE, 0, struct dma_buf_sync)
> >
> > @@ -95,4 +121,6 @@ struct dma_buf_sync {
> >   #define DMA_BUF_SET_NAME_A  _IOW(DMA_BUF_BASE, 1, __u32)
> >   #define DMA_BUF_SET_NAME_B  _IOW(DMA_BUF_BASE, 1, __u64)
> >
> > +#define DMA_BUF_IOCTL_INFO   _IOWR(DMA_BUF_BASE, 2, struct dma_buf_info)
> > +
> >   #endif
>
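
For completeness, the userspace side of the query would look roughly like
this (a sketch based on the uapi above; dmabuf_fd is assumed to be a
dma-buf fd handed to the VMM):

  #include <sys/ioctl.h>
  #include <linux/dma-buf.h>

  static int query_vm_prot(int dmabuf_fd)
  {
          struct dma_buf_info info = {
                  .param = DMA_BUF_INFO_VM_PROT,
          };

          /* errno is ENOTTY on kernels without the ioctl, ENOSYS if
           * the exporter does not implement ->mmap_info():
           */
          if (ioctl(dmabuf_fd, DMA_BUF_IOCTL_INFO, &info) < 0)
                  return -1;

          /* DMA_BUF_VM_PROT_WC or DMA_BUF_VM_PROT_CACHED */
          return info.value;
  }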


Re: [Linaro-mm-sig] [PATCH 1/3] dma-buf: Add ioctl to query mmap info

2022-08-07 Thread Rob Clark
On Sun, Aug 7, 2022 at 11:05 AM Christian König
 wrote:
>
> Am 07.08.22 um 19:56 schrieb Rob Clark:
> > On Sun, Aug 7, 2022 at 10:38 AM Christian König
> >  wrote:
> >> [SNIP]
> >> And exactly that was declared completely illegal the last time it came
> >> up on the mailing list.
> >>
> >> Daniel implemented a whole bunch of patches into the DMA-buf layer to
> >> make it impossible for KVM to do this.
> > This issue isn't really with KVM, it is not making any CPU mappings
> > itself.  KVM is just making the pages available to the guest.
>
> Well I can only repeat myself: This is strictly illegal.
>
> Please try this approach with CONFIG_DMABUF_DEBUG set. I'm pretty sure
> you will immediately run into a crash.
>
> See this here as well
> https://elixir.bootlin.com/linux/v5.19/source/drivers/dma-buf/dma-buf.c#L653
>
> Daniel intentionally added code to mangle the page pointers to make it
> impossible for KVM to do this.

I don't believe KVM is using the sg table, so this isn't going to stop
anything ;-)

> If the virtio/virtgpu UAPI was built around the idea that this is
> possible then it is most likely fundamentally broken.

How else can you envision mmap'ing to guest userspace working?  The
guest kernel is the one that controls the guest userspace pagetables,
not the host kernel.  I guess your complaint is about VMs in general,
but unfortunately I don't think you'll convince the rest of the
industry to abandon VMs ;-)

But more seriously, let's take a step back here.. what scenarios are
you seeing this being problematic for?  Then we can see how to come up
with solutions.  The current situation of host userspace VMM just
guessing isn't great.  And sticking our heads in the sand and
pretending VMs don't exist isn't great.  So what can we do?  I can
instead add a msm ioctl to return this info and solve the problem even
more narrowly for a single platform.  But then the problem still
remains on other platforms.

Slightly implicit in this is that mapping dma-bufs to the guest won't
work for anything that requires DMA_BUF_IOCTL_SYNC for coherency.. we
could add a possible return value for DMA_BUF_INFO_VM_PROT indicating
that the buffer does not support mapping to guest or CPU access
without DMA_BUF_IOCTL_SYNC.  Then at least the VMM can fail gracefully
instead of subtly.
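
Concretely, that could just be another value for the existing param,
something like (hypothetical, not part of this patch):

  #  define DMA_BUF_VM_PROT_NONE  2  /* hypothetical: CPU access only
                                      * valid with DMA_BUF_IOCTL_SYNC
                                      * bracketing; do not map to guest
                                      */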

BR,
-R


Re: [Linaro-mm-sig] [PATCH 1/3] dma-buf: Add ioctl to query mmap info

2022-08-07 Thread Rob Clark
On Sun, Aug 7, 2022 at 10:38 AM Christian König
 wrote:
>
> Am 07.08.22 um 19:35 schrieb Rob Clark:
> > On Sun, Aug 7, 2022 at 10:14 AM Christian König
> >  wrote:
> >> Am 07.08.22 um 19:02 schrieb Rob Clark:
> >>> On Sun, Aug 7, 2022 at 9:09 AM Christian König
> >>>  wrote:
> >>>> Am 29.07.22 um 19:07 schrieb Rob Clark:
> >>>>> From: Rob Clark 
> >>>>>
> >>>>> This is a fairly narrowly focused interface, providing a way for a VMM
> >>>>> in userspace to tell the guest kernel what pgprot settings to use when
> >>>>> mapping a buffer to guest userspace.
> >>>>>
> >>>>> For buffers that get mapped into guest userspace, virglrenderer returns
> >>>>> a dma-buf fd to the VMM (crosvm or qemu).
> >>>> Wow, wait a second. Who is giving whom the DMA-buf fd here?
> >>> Not sure I understand the question.. the dma-buf fd could come from
> >>> EGL_MESA_image_dma_buf_export, gbm, or similar.
> >>>
> >>>> My last status was that this design was illegal and couldn't be
> >>>> implemented because it requires internal knowledge only the exporting
> >>>> driver can have.
> >>> This ioctl provides that information from the exporting driver so that
> >>> a VMM doesn't have to make assumptions ;-)
> >> And exactly that was NAKed the last time it came up. Only the exporting
> >> driver is allowed to mmap() the DMA-buf into the guest.
> > except the exporting driver is in the host ;-)
> >
> >> This way you also don't need to transport any caching information anywhere.
> >>
> >>> Currently crosvm assumes if (drivername == "i915") then it is a cached
> >>> mapping, otherwise it is wc.  I'm trying to find a way to fix this.
> >>> Suggestions welcome, but because of how mapping to a guest VM works, a
> >>> VMM is a somewhat special case where this information is needed in
> >>> userspace.
> >> Ok that leaves me completely puzzled. How does that work in the first 
> >> place?
> >>
> >> In other words how does the mapping into the guest page tables happen?
> > There are multiple levels to this, but in short mapping to guest
> > userspace happens via drm/virtio (aka "virtio_gpu" or "virtgpu"), the
> > cache attributes are set via "map_info" attribute returned from the
> > host VMM (host userspace, hence the need for this ioctl).
> >
> > In the host, the host kernel driver mmaps to host userspace (VMM).
> > Here the exporting driver is performing the mmap with correct cache
> > attributes.
>
> > The VMM uses KVM to map these pages into the guest so
>
> And exactly that was declared completely illegal the last time it came
> up on the mailing list.
>
> Daniel implemented a whole bunch of patches into the DMA-buf layer to
> make it impossible for KVM to do this.

This issue isn't really with KVM, it is not making any CPU mappings
itself.  KVM is just making the pages available to the guest.  Like I
said the CPU mapping to the guest userspace is setup by virtgpu.  But
based on information that the host VMM provides.  This patch simply
provides a way for the host VMM to provide the correct information.

> I have absolutely no idea why that is now a topic again and why anybody
> is still using this approach.

Because this is how VMMs work.  And it is how the virtgpu device
spec[1] is designed.

[1] https://github.com/oasis-tcs/virtio-spec/blob/master/virtio-gpu.tex#L767

> @Daniel can you clarify.

Like I've said, I'd be happy to hear alternative suggestions.  But the
root problem is that it is not possible for the host kernel to
directly map into guest userspace.  So I don't really see an
alternative.  Even if it were possible for host kernel to directly map
to guest userspace, that ship has already sailed as far as virtio
device specification.

BR,
-R

> Thanks,
> Christian.
>
> > they appear as physical pages to the guest kernel.  The guest kernel
> > (virtgpu) in turn maps them to guest userspace.
> >
> > BR,
> > -R
> >
> >> Regards,
> >> Christian.
> >>
> >>> BR,
> >>> -R
> >>>
> >>>> @Daniel has anything changed on that is or my status still valid?
> >>>>
> >>>> Regards,
> >>>> Christian.
> >>>>
> >>>>>  In addition to mapping the
> >>>>> pages into the guest VM, it needs to report to drm/virtio in the guest
> >>>>

Re: [Linaro-mm-sig] [PATCH 1/3] dma-buf: Add ioctl to query mmap info

2022-08-07 Thread Rob Clark
On Sun, Aug 7, 2022 at 10:14 AM Christian König
 wrote:
>
> Am 07.08.22 um 19:02 schrieb Rob Clark:
> > On Sun, Aug 7, 2022 at 9:09 AM Christian König
> >  wrote:
> >> Am 29.07.22 um 19:07 schrieb Rob Clark:
> >>> From: Rob Clark 
> >>>
> >>> This is a fairly narrowly focused interface, providing a way for a VMM
> >>> in userspace to tell the guest kernel what pgprot settings to use when
> >>> mapping a buffer to guest userspace.
> >>>
> >>> For buffers that get mapped into guest userspace, virglrenderer returns
> >>> a dma-buf fd to the VMM (crosvm or qemu).
> >> Wow, wait a second. Who is giving whom the DMA-buf fd here?
> > Not sure I understand the question.. the dma-buf fd could come from
> > EGL_MESA_image_dma_buf_export, gbm, or similar.
> >
> >> My last status was that this design was illegal and couldn't be
> >> implemented because it requires internal knowledge only the exporting
> >> driver can have.
> > This ioctl provides that information from the exporting driver so that
> > a VMM doesn't have to make assumptions ;-)
>
> And exactly that was NAKed the last time it came up. Only the exporting
> driver is allowed to mmap() the DMA-buf into the guest.

except the exporting driver is in the host ;-)

> This way you also don't need to transport any caching information anywhere.
>
> > Currently crosvm assumes if (drivername == "i915") then it is a cached
> > mapping, otherwise it is wc.  I'm trying to find a way to fix this.
> > Suggestions welcome, but because of how mapping to a guest VM works, a
> > VMM is a somewhat special case where this information is needed in
> > userspace.
>
> Ok that leaves me completely puzzled. How does that work in the first place?
>
> In other words how does the mapping into the guest page tables happen?

There are multiple levels to this, but in short mapping to guest
userspace happens via drm/virtio (aka "virtio_gpu" or "virtgpu"), the
cache attributes are set via "map_info" attribute returned from the
host VMM (host userspace, hence the need for this ioctl).

In the host, the host kernel driver mmaps to host userspace (VMM).
Here the exporting driver is performing the mmap with correct cache
attributes.  The VMM uses KVM to map these pages into the guest so
they appear as physical pages to the guest kernel.  The guest kernel
(virtgpu) in turn maps them to guest userspace.

BR,
-R

>
> Regards,
> Christian.
>
> >
> > BR,
> > -R
> >
> >> @Daniel has anything changed on that is or my status still valid?
> >>
> >> Regards,
> >> Christian.
> >>
> >>> In addition to mapping the
> >>> pages into the guest VM, it needs to report to drm/virtio in the guest
> >>> the cache settings to use for guest userspace.  In particular, on some
> >>> architectures, creating aliased mappings with different cache attributes
> >>> is frowned upon, so it is important that the guest mappings have the
> >>> same cache attributes as any potential host mappings.
> >>>
> >>> Signed-off-by: Rob Clark 
> >>> ---
> >>>drivers/dma-buf/dma-buf.c| 26 ++
> >>>include/linux/dma-buf.h  |  7 +++
> >>>include/uapi/linux/dma-buf.h | 28 
> >>>3 files changed, 61 insertions(+)
> >>>
> >>> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> >>> index 32f55640890c..d02d6c2a3b49 100644
> >>> --- a/drivers/dma-buf/dma-buf.c
> >>> +++ b/drivers/dma-buf/dma-buf.c
> >>> @@ -326,6 +326,29 @@ static long dma_buf_set_name(struct dma_buf *dmabuf, 
> >>> const char __user *buf)
> >>>return 0;
> >>>}
> >>>
> >>> +static long dma_buf_info(struct dma_buf *dmabuf, const void __user *uarg)
> >>> +{
> >>> + struct dma_buf_info arg;
> >>> +
> >>> + if (copy_from_user(&arg, uarg, sizeof(arg)))
> >>> + return -EFAULT;
> >>> +
> >>> + switch (arg.param) {
> >>> + case DMA_BUF_INFO_VM_PROT:
> >>> + if (!dmabuf->ops->mmap_info)
> >>> + return -ENOSYS;
> >>> + arg.value = dmabuf->ops->mmap_info(dmabuf);
> >>> + break;
> >>> + default:
> >>> + return -EINVAL;
> >>> + }
> >>> +

[PATCH] drm/msm: Add fault-injection support

2022-08-07 Thread Rob Clark
From: Rob Clark 

Intended as a way to trigger error paths in mesa.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_debugfs.c |  8 
 drivers/gpu/drm/msm/msm_drv.c | 15 +++
 drivers/gpu/drm/msm/msm_drv.h |  7 +++
 3 files changed, 30 insertions(+)

diff --git a/drivers/gpu/drm/msm/msm_debugfs.c 
b/drivers/gpu/drm/msm/msm_debugfs.c
index ea2a20699cb4..a515ddcec007 100644
--- a/drivers/gpu/drm/msm/msm_debugfs.c
+++ b/drivers/gpu/drm/msm/msm_debugfs.c
@@ -7,6 +7,7 @@
 #ifdef CONFIG_DEBUG_FS
 
 #include 
+#include <linux/fault-inject.h>
 
 #include 
 #include 
@@ -325,6 +326,13 @@ void msm_debugfs_init(struct drm_minor *minor)
 
if (priv->kms && priv->kms->funcs->debugfs_init)
priv->kms->funcs->debugfs_init(priv->kms, minor);
+
+#ifdef CONFIG_FAULT_INJECTION
+   fault_create_debugfs_attr("fail_gem_alloc", minor->debugfs_root,
+ &fail_gem_alloc);
+   fault_create_debugfs_attr("fail_gem_iova", minor->debugfs_root,
+ &fail_gem_iova);
+#endif
 }
 #endif
 
diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index 4979aa8187ec..6b1b483ddd59 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -6,6 +6,7 @@
  */
 
 #include 
+#include <linux/fault-inject.h>
 #include 
 #include 
 #include 
@@ -78,6 +79,11 @@ static bool modeset = true;
 MODULE_PARM_DESC(modeset, "Use kernel modesetting [KMS] (1=on (default), 
0=disable)");
 module_param(modeset, bool, 0600);
 
+#ifdef CONFIG_FAULT_INJECTION
+DECLARE_FAULT_ATTR(fail_gem_alloc);
+DECLARE_FAULT_ATTR(fail_gem_iova);
+#endif
+
 static irqreturn_t msm_irq(int irq, void *arg)
 {
struct drm_device *dev = arg;
@@ -701,6 +707,9 @@ static int msm_ioctl_gem_new(struct drm_device *dev, void 
*data,
flags |= MSM_BO_WC;
}
 
+   if (should_fail(&fail_gem_alloc, args->size))
+   return -ENOMEM;
+
return msm_gem_new_handle(dev, file, args->size,
args->flags, >handle, NULL);
 }
@@ -762,6 +771,9 @@ static int msm_ioctl_gem_info_iova(struct drm_device *dev,
if (!priv->gpu)
return -EINVAL;
 
+   if (should_fail(&fail_gem_iova, obj->size))
+   return -ENOMEM;
+
/*
 * Don't pin the memory here - just get an address so that userspace can
 * be productive
@@ -783,6 +795,9 @@ static int msm_ioctl_gem_info_set_iova(struct drm_device 
*dev,
if (priv->gpu->aspace == ctx->aspace)
return -EOPNOTSUPP;
 
+   if (should_fail(&fail_gem_iova, obj->size))
+   return -ENOMEM;
+
return msm_gem_set_iova(obj, ctx->aspace, iova);
 }
 
diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h
index b4ace34ec889..e830f9609f2d 100644
--- a/drivers/gpu/drm/msm/msm_drv.h
+++ b/drivers/gpu/drm/msm/msm_drv.h
@@ -34,6 +34,13 @@
 #include 
 #include 
 
+#ifdef CONFIG_FAULT_INJECTION
+extern struct fault_attr fail_gem_alloc;
+extern struct fault_attr fail_gem_iova;
+#else
+#  define should_fail(attr, size) 0
+#endif
+
 struct msm_kms;
 struct msm_gpu;
 struct msm_mmu;
-- 
2.36.1
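
As a usage note, the knobs under each of these debugfs directories are the
standard fault_attr attributes (probability, interval, times, etc).  A
sketch of driving them from a test tool (the debugfs mount point and card
index are assumptions):

  #include <fcntl.h>
  #include <string.h>
  #include <unistd.h>

  static void set_knob(const char *path, const char *val)
  {
          int fd = open(path, O_WRONLY);

          if (fd >= 0) {
                  write(fd, val, strlen(val));
                  close(fd);
          }
  }

  int main(void)
  {
          /* Fail every GEM allocation until turned back off: */
          set_knob("/sys/kernel/debug/dri/0/fail_gem_alloc/probability", "100");
          set_knob("/sys/kernel/debug/dri/0/fail_gem_alloc/times", "-1");
          return 0;
  }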



Re: [Linaro-mm-sig] [PATCH 1/3] dma-buf: Add ioctl to query mmap info

2022-08-07 Thread Rob Clark
On Sun, Aug 7, 2022 at 9:09 AM Christian König
 wrote:
>
> Am 29.07.22 um 19:07 schrieb Rob Clark:
> > From: Rob Clark 
> >
> > This is a fairly narrowly focused interface, providing a way for a VMM
> > in userspace to tell the guest kernel what pgprot settings to use when
> > mapping a buffer to guest userspace.
> >
> > For buffers that get mapped into guest userspace, virglrenderer returns
> > a dma-buf fd to the VMM (crosvm or qemu).
>
> Wow, wait a second. Who is giving whom the DMA-buf fd here?

Not sure I understand the question.. the dma-buf fd could come from
EGL_MESA_image_dma_buf_export, gbm, or similar.

> My last status was that this design was illegal and couldn't be
> implemented because it requires internal knowledge only the exporting
> driver can have.

This ioctl provides that information from the exporting driver so that
a VMM doesn't have to make assumptions ;-)

Currently crosvm assumes if (drivername == "i915") then it is a cached
mapping, otherwise it is wc.  I'm trying to find a way to fix this.
Suggestions welcome, but because of how mapping to a guest VM works, a
VMM is a somewhat special case where this information is needed in
userspace.

BR,
-R

> @Daniel has anything changed on that is or my status still valid?
>
> Regards,
> Christian.
>
> >In addition to mapping the
> > pages into the guest VM, it needs to report to drm/virtio in the guest
> > the cache settings to use for guest userspace.  In particular, on some
> > architectures, creating aliased mappings with different cache attributes
> > is frowned upon, so it is important that the guest mappings have the
> > same cache attributes as any potential host mappings.
> >
> > Signed-off-by: Rob Clark 
> > ---
> >   drivers/dma-buf/dma-buf.c| 26 ++
> >   include/linux/dma-buf.h  |  7 +++
> >   include/uapi/linux/dma-buf.h | 28 
> >   3 files changed, 61 insertions(+)
> >
> > diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> > index 32f55640890c..d02d6c2a3b49 100644
> > --- a/drivers/dma-buf/dma-buf.c
> > +++ b/drivers/dma-buf/dma-buf.c
> > @@ -326,6 +326,29 @@ static long dma_buf_set_name(struct dma_buf *dmabuf, 
> > const char __user *buf)
> >   return 0;
> >   }
> >
> > +static long dma_buf_info(struct dma_buf *dmabuf, const void __user *uarg)
> > +{
> > + struct dma_buf_info arg;
> > +
> > + if (copy_from_user(, uarg, sizeof(arg)))
> > + return -EFAULT;
> > +
> > + switch (arg.param) {
> > + case DMA_BUF_INFO_VM_PROT:
> > + if (!dmabuf->ops->mmap_info)
> > + return -ENOSYS;
> > + arg.value = dmabuf->ops->mmap_info(dmabuf);
> > + break;
> > + default:
> > + return -EINVAL;
> > + }
> > +
> > + if (copy_to_user(uarg, , sizeof(arg)))
> > + return -EFAULT;
> > +
> > + return 0;
> > +}
> > +
> >   static long dma_buf_ioctl(struct file *file,
> > unsigned int cmd, unsigned long arg)
> >   {
> > @@ -369,6 +392,9 @@ static long dma_buf_ioctl(struct file *file,
> >   case DMA_BUF_SET_NAME_B:
> >   return dma_buf_set_name(dmabuf, (const char __user *)arg);
> >
> > + case DMA_BUF_IOCTL_INFO:
> > + return dma_buf_info(dmabuf, (const void __user *)arg);
> > +
> >   default:
> >   return -ENOTTY;
> >   }
> > diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
> > index 71731796c8c3..6f4de64a5937 100644
> > --- a/include/linux/dma-buf.h
> > +++ b/include/linux/dma-buf.h
> > @@ -283,6 +283,13 @@ struct dma_buf_ops {
> >*/
> >   int (*mmap)(struct dma_buf *, struct vm_area_struct *vma);
> >
> > + /**
> > +  * @mmap_info:
> > +  *
> > +  * Return mmapping info for the buffer.  See DMA_BUF_INFO_VM_PROT.
> > +  */
> > + int (*mmap_info)(struct dma_buf *);
> > +
> >   int (*vmap)(struct dma_buf *dmabuf, struct iosys_map *map);
> >   void (*vunmap)(struct dma_buf *dmabuf, struct iosys_map *map);
> >   };
> > diff --git a/include/uapi/linux/dma-buf.h b/include/uapi/linux/dma-buf.h
> > index b1523cb8ab30..a41adac0f46a 100644
> > --- a/include/uapi/linux/dma-buf.h
> > +++ b/include/uapi/linux/dma-buf.h
> > @@ -85,6 +85,32 @@ struct dma_buf_sync {
> >
> >   #define DMA_BUF_NAME_LEN 32
>

[PATCH v2 2/2] drm/msm/rd: Fix FIFO-full deadlock

2022-08-07 Thread Rob Clark
From: Rob Clark 

If the previous thing cat'ing $debugfs/rd left the FIFO full, then
subsequent open could deadlock in rd_write() (because open is blocked,
not giving a chance for read() to consume any data in the FIFO).  Also
it is generally a good idea to clear out old data from the FIFO.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_rd.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/msm/msm_rd.c b/drivers/gpu/drm/msm/msm_rd.c
index a92ffde53f0b..db2f847c8535 100644
--- a/drivers/gpu/drm/msm/msm_rd.c
+++ b/drivers/gpu/drm/msm/msm_rd.c
@@ -196,6 +196,9 @@ static int rd_open(struct inode *inode, struct file *file)
file->private_data = rd;
rd->open = true;
 
+   /* Reset fifo to clear any previously unread data: */
+   rd->fifo.head = rd->fifo.tail = 0;
+
/* the parsing tools need to know gpu-id to know which
 * register database to load.
 *
-- 
2.36.1



[PATCH v2 1/2] drm/msm: Move hangcheck timer restart

2022-08-07 Thread Rob Clark
From: Rob Clark 

Don't directly restart the hangcheck timer from the timer handler, but
instead start it after the recover_worker replays remaining jobs.

If the kthread is blocked for other reasons, there is no point in
immediately restarting the timer.  Fixes a random symptom of the problem
fixed in the next patch.

v2: Keep the hangcheck timer restart in the timer handler in the case
where we aren't scheduling recover_worker

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_gpu.c | 19 +--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index fba85f894314..6762001d9945 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -328,6 +328,7 @@ find_submit(struct msm_ringbuffer *ring, uint32_t fence)
 }
 
 static void retire_submits(struct msm_gpu *gpu);
+static void hangcheck_timer_reset(struct msm_gpu *gpu);
 
 static void get_comm_cmdline(struct msm_gem_submit *submit, char **comm, char 
**cmd)
 {
@@ -420,6 +421,8 @@ static void recover_worker(struct kthread_work *work)
}
 
if (msm_gpu_active(gpu)) {
+   bool restart_hangcheck = false;
+
/* retire completed submits, plus the one that hung: */
retire_submits(gpu);
 
@@ -436,10 +439,15 @@ static void recover_worker(struct kthread_work *work)
unsigned long flags;
 
spin_lock_irqsave(&gpu->submit_lock, flags);
-   list_for_each_entry(submit, &gpu->submits, node)
+   list_for_each_entry(submit, &gpu->submits, node) {
gpu->funcs->submit(gpu, submit);
+   restart_hangcheck = true;
+   }
spin_unlock_irqrestore(&gpu->submit_lock, flags);
}
+
+   if (restart_hangcheck)
+   hangcheck_timer_reset(gpu);
}
 
mutex_unlock(&gpu->lock);
@@ -498,6 +506,7 @@ static void hangcheck_handler(struct timer_list *t)
struct drm_device *dev = gpu->dev;
struct msm_ringbuffer *ring = gpu->funcs->active_ring(gpu);
uint32_t fence = ring->memptrs->fence;
+   bool restart_hangcheck = true;
 
if (fence != ring->hangcheck_fence) {
/* some progress has been made.. ya! */
@@ -513,10 +522,16 @@ static void hangcheck_handler(struct timer_list *t)
gpu->name, ring->fctx->last_fence);
 
kthread_queue_work(gpu->worker, &gpu->recover_work);
+
+   /* If we do recovery, we want to defer restarting the hangcheck
+* timer until recovery completes and the remaining non-guilty
+* jobs are re-played.
+*/
+   restart_hangcheck = false;
}
 
/* if still more pending work, reset the hangcheck timer: */
-   if (fence_after(ring->fctx->last_fence, ring->hangcheck_fence))
+   if (restart_hangcheck && fence_after(ring->fctx->last_fence, 
ring->hangcheck_fence))
hangcheck_timer_reset(gpu);
 
/* workaround for missing irq: */
-- 
2.36.1



Re: [PATCH 1/2] drm/msm: Move hangcheck timer restart

2022-08-04 Thread Rob Clark
On Thu, Aug 4, 2022 at 12:53 AM Akhil P Oommen  wrote:
>
> On 8/4/2022 1:59 AM, Rob Clark wrote:
> > On Wed, Aug 3, 2022 at 12:52 PM Akhil P Oommen  
> > wrote:
> >> On 8/3/2022 10:53 PM, Rob Clark wrote:
> >>> From: Rob Clark 
> >>>
> >>> Don't directly restart the hangcheck timer from the timer handler, but
> >>> instead start it after the recover_worker replays remaining jobs.
> >>>
> >>> If the kthread is blocked for other reasons, there is no point in
> >>> immediately restarting the timer.  Fixes a random symptom of the problem
> >>> fixed in the next patch.
> >>>
> >>> Signed-off-by: Rob Clark 
> >>> ---
> >>>drivers/gpu/drm/msm/msm_gpu.c | 14 +-
> >>>1 file changed, 9 insertions(+), 5 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
> >>> index fba85f894314..8f9c48eabf7d 100644
> >>> --- a/drivers/gpu/drm/msm/msm_gpu.c
> >>> +++ b/drivers/gpu/drm/msm/msm_gpu.c
> >>> @@ -328,6 +328,7 @@ find_submit(struct msm_ringbuffer *ring, uint32_t 
> >>> fence)
> >>>}
> >>>
> >>>static void retire_submits(struct msm_gpu *gpu);
> >>> +static void hangcheck_timer_reset(struct msm_gpu *gpu);
> >>>
> >>>static void get_comm_cmdline(struct msm_gem_submit *submit, char 
> >>> **comm, char **cmd)
> >>>{
> >>> @@ -420,6 +421,8 @@ static void recover_worker(struct kthread_work *work)
> >>>}
> >>>
> >>>if (msm_gpu_active(gpu)) {
> >>> + bool restart_hangcheck = false;
> >>> +
> >>>/* retire completed submits, plus the one that hung: */
> >>>retire_submits(gpu);
> >>>
> >>> @@ -436,10 +439,15 @@ static void recover_worker(struct kthread_work 
> >>> *work)
> >>>unsigned long flags;
> >>>
> >>>spin_lock_irqsave(&gpu->submit_lock, flags);
> >>> - list_for_each_entry(submit, &gpu->submits, node)
> >>> + list_for_each_entry(submit, &gpu->submits, node) {
> >>>gpu->funcs->submit(gpu, submit);
> >>> + restart_hangcheck = true;
> >>> + }
> >>>spin_unlock_irqrestore(&gpu->submit_lock, flags);
> >>>}
> >>> +
> >>> + if (restart_hangcheck)
> >>> + hangcheck_timer_reset(gpu);
> >>>}
> >>>
> >>>mutex_unlock(&gpu->lock);
> >>> @@ -515,10 +523,6 @@ static void hangcheck_handler(struct timer_list *t)
> >>>kthread_queue_work(gpu->worker, &gpu->recover_work);
> >>>}
> >>>
> >>> - /* if still more pending work, reset the hangcheck timer: */
> >> In the scenario mentioned here, shouldn't we restart the timer?
> > yeah, actually the case where we don't want to restart the timer is
> > *only* when we schedule recover_work..
> >
> > BR,
> > -R
> Not sure if your codebase is different but based on msm-next branch,
> when "if (fence != ring->hangcheck_fence)" is true, we now skip
> rescheduling the timer. I don't think that is what we want. There should
> be a hangcheck timer running as long as there is an active submit,
> unless we have scheduled a recover_work here.
>

right, v2 will change that to only skip rescheduling the timer in the
recover path

BR,
-R

> -Akhil.
> >
> >> -Akhil.
> >>> - if (fence_after(ring->fctx->last_fence, ring->hangcheck_fence))
> >>> - hangcheck_timer_reset(gpu);
> >>> -
> >>>/* workaround for missing irq: */
> >>>msm_gpu_retire(gpu);
> >>>}
> >>>
>


Re: [PATCH 1/2] drm/msm: Move hangcheck timer restart

2022-08-03 Thread Rob Clark
On Wed, Aug 3, 2022 at 12:52 PM Akhil P Oommen  wrote:
>
> On 8/3/2022 10:53 PM, Rob Clark wrote:
> > From: Rob Clark 
> >
> > Don't directly restart the hangcheck timer from the timer handler, but
> > instead start it after the recover_worker replays remaining jobs.
> >
> > If the kthread is blocked for other reasons, there is no point in
> > immediately restarting the timer.  Fixes a random symptom of the problem
> > fixed in the next patch.
> >
> > Signed-off-by: Rob Clark 
> > ---
> >   drivers/gpu/drm/msm/msm_gpu.c | 14 +-
> >   1 file changed, 9 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
> > index fba85f894314..8f9c48eabf7d 100644
> > --- a/drivers/gpu/drm/msm/msm_gpu.c
> > +++ b/drivers/gpu/drm/msm/msm_gpu.c
> > @@ -328,6 +328,7 @@ find_submit(struct msm_ringbuffer *ring, uint32_t fence)
> >   }
> >
> >   static void retire_submits(struct msm_gpu *gpu);
> > +static void hangcheck_timer_reset(struct msm_gpu *gpu);
> >
> >   static void get_comm_cmdline(struct msm_gem_submit *submit, char **comm, 
> > char **cmd)
> >   {
> > @@ -420,6 +421,8 @@ static void recover_worker(struct kthread_work *work)
> >   }
> >
> >   if (msm_gpu_active(gpu)) {
> > + bool restart_hangcheck = false;
> > +
> >   /* retire completed submits, plus the one that hung: */
> >   retire_submits(gpu);
> >
> > @@ -436,10 +439,15 @@ static void recover_worker(struct kthread_work *work)
> >   unsigned long flags;
> >
> >   spin_lock_irqsave(&gpu->submit_lock, flags);
> > - list_for_each_entry(submit, &gpu->submits, node)
> > + list_for_each_entry(submit, &gpu->submits, node) {
> >   gpu->funcs->submit(gpu, submit);
> > + restart_hangcheck = true;
> > + }
> >   spin_unlock_irqrestore(&gpu->submit_lock, flags);
> >   }
> > +
> > + if (restart_hangcheck)
> > + hangcheck_timer_reset(gpu);
> >   }
> >
> >   mutex_unlock(&gpu->lock);
> > @@ -515,10 +523,6 @@ static void hangcheck_handler(struct timer_list *t)
> >   kthread_queue_work(gpu->worker, &gpu->recover_work);
> >   }
> >
> > - /* if still more pending work, reset the hangcheck timer: */
> In the scenario mentioned here, shouldn't we restart the timer?

yeah, actually the case where we don't want to restart the timer is
*only* when we schedule recover_work..

BR,
-R

>
> -Akhil.
> > - if (fence_after(ring->fctx->last_fence, ring->hangcheck_fence))
> > - hangcheck_timer_reset(gpu);
> > -
> >   /* workaround for missing irq: */
> >   msm_gpu_retire(gpu);
> >   }
> >
>


[PATCH 2/2] drm/msm/rd: Fix FIFO-full deadlock

2022-08-03 Thread Rob Clark
From: Rob Clark 

If the previous thing cat'ing $debugfs/rd left the FIFO full, then
subsequent open could deadlock in rd_write() (because open is blocked,
not giving a chance for read() to consume any data in the FIFO).  Also
it is generally a good idea to clear out old data from the FIFO.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_rd.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/msm/msm_rd.c b/drivers/gpu/drm/msm/msm_rd.c
index a92ffde53f0b..db2f847c8535 100644
--- a/drivers/gpu/drm/msm/msm_rd.c
+++ b/drivers/gpu/drm/msm/msm_rd.c
@@ -196,6 +196,9 @@ static int rd_open(struct inode *inode, struct file *file)
file->private_data = rd;
rd->open = true;
 
+   /* Reset fifo to clear any previously unread data: */
+   rd->fifo.head = rd->fifo.tail = 0;
+
/* the parsing tools need to know gpu-id to know which
 * register database to load.
 *
-- 
2.36.1



[PATCH 1/2] drm/msm: Move hangcheck timer restart

2022-08-03 Thread Rob Clark
From: Rob Clark 

Don't directly restart the hangcheck timer from the timer handler, but
instead start it after the recover_worker replays remaining jobs.

If the kthread is blocked for other reasons, there is no point in
immediately restarting the timer.  Fixes a random symptom of the problem
fixed in the next patch.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_gpu.c | 14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index fba85f894314..8f9c48eabf7d 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -328,6 +328,7 @@ find_submit(struct msm_ringbuffer *ring, uint32_t fence)
 }
 
 static void retire_submits(struct msm_gpu *gpu);
+static void hangcheck_timer_reset(struct msm_gpu *gpu);
 
 static void get_comm_cmdline(struct msm_gem_submit *submit, char **comm, char 
**cmd)
 {
@@ -420,6 +421,8 @@ static void recover_worker(struct kthread_work *work)
}
 
if (msm_gpu_active(gpu)) {
+   bool restart_hangcheck = false;
+
/* retire completed submits, plus the one that hung: */
retire_submits(gpu);
 
@@ -436,10 +439,15 @@ static void recover_worker(struct kthread_work *work)
unsigned long flags;
 
spin_lock_irqsave(&gpu->submit_lock, flags);
-   list_for_each_entry(submit, &gpu->submits, node)
+   list_for_each_entry(submit, &gpu->submits, node) {
gpu->funcs->submit(gpu, submit);
+   restart_hangcheck = true;
+   }
spin_unlock_irqrestore(&gpu->submit_lock, flags);
}
+
+   if (restart_hangcheck)
+   hangcheck_timer_reset(gpu);
}
 
mutex_unlock(&gpu->lock);
@@ -515,10 +523,6 @@ static void hangcheck_handler(struct timer_list *t)
kthread_queue_work(gpu->worker, &gpu->recover_work);
}
 
-   /* if still more pending work, reset the hangcheck timer: */
-   if (fence_after(ring->fctx->last_fence, ring->hangcheck_fence))
-   hangcheck_timer_reset(gpu);
-
/* workaround for missing irq: */
msm_gpu_retire(gpu);
 }
-- 
2.36.1



Re: [PATCH 0/5] clk/qcom: Support gdsc collapse polling using 'reset' inteface

2022-08-02 Thread Rob Clark
On Tue, Aug 2, 2022 at 12:02 AM Dmitry Baryshkov
 wrote:
>
> On 30/07/2022 12:17, Akhil P Oommen wrote:
> >
> > Some clients, like the adreno GPU driver, would like to ensure that their
> > gdsc is collapsed in hardware during a GPU reset sequence. This is because
> > it has a votable gdsc which could be ON due to a vote from another
> > subsystem like tz, hyp, etc., or due to an internal hardware signal.
>
> If this is votable, do we have any guarantee that the gdsc will collapse
> at all? How can we proceed if it did not collapse?

Other potential votes should be transient.  But I guess we eventually
need to time out and give up, at which point we are no worse off than
before.

But hmm, we aren't using RBBM_SW_RESET_CMD for sw reset like we have
on previous generations?  That does seem a bit odd.  Looks like kgsl
does use it.

BR,
-R

> > To allow
> > this, the gpucc driver can expose an interface to the client driver using
> > the reset framework. Using this, the client driver can trigger polling
> > within the gdsc driver.
>
> "Trigger the polling" made me think initially that we would actually
> trigger something in the HW. Instead the client uses the reset framework to
> poll for the gdsc to be reset.
>
> >
> > This series is rebased on top of linus's master branch.
> >
> > Related discussion: https://patchwork.freedesktop.org/patch/493144/
> >
> >
> > Akhil P Oommen (5):
> >dt-bindings: clk: qcom: Support gpu cx gdsc reset
> >clk: qcom: Allow custom reset ops
> >clk: qcom: gpucc-sc7280: Add cx collapse reset support
> >clk: qcom: gdsc: Add a reset op to poll gdsc collapse
> >arm64: dts: qcom: sc7280: Add Reset support for gpu
> >
> >   arch/arm64/boot/dts/qcom/sc7280.dtsi  |  3 +++
> >   drivers/clk/qcom/gdsc.c   | 23 +++
> >   drivers/clk/qcom/gdsc.h   |  7 +++
> >   drivers/clk/qcom/gpucc-sc7280.c   |  6 ++
> >   drivers/clk/qcom/reset.c  |  6 ++
> >   drivers/clk/qcom/reset.h  |  2 ++
> >   include/dt-bindings/clock/qcom,gpucc-sc7280.h |  3 +++
> >   7 files changed, 46 insertions(+), 4 deletions(-)
> >
>
>
> --
> With best wishes
> Dmitry
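
For context, the consumer side of such a reset interface in the GPU driver
would look roughly like this (a sketch; the "cx_collapse" line name is an
assumption based on this series):

  /* at probe time: */
  gpu->cx_collapse = devm_reset_control_get_optional_exclusive(&pdev->dev,
                                                               "cx_collapse");

  /* in the recovery path, after shutting the GPU down, poll (via the
   * gdsc driver's reset op) until the cx gdsc has actually collapsed:
   */
  reset_control_reset(gpu->cx_collapse);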


[PATCH v4 15/15] drm/msm/gem: Convert to lockdep assert

2022-08-02 Thread Rob Clark
From: Rob Clark 

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_gem.h | 15 ++-
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_gem.h b/drivers/gpu/drm/msm/msm_gem.h
index 3c6add51d13b..c4844cf3a585 100644
--- a/drivers/gpu/drm/msm/msm_gem.h
+++ b/drivers/gpu/drm/msm/msm_gem.h
@@ -197,8 +197,8 @@ msm_gem_unlock(struct drm_gem_object *obj)
dma_resv_unlock(obj->resv);
 }
 
-static inline bool
-msm_gem_is_locked(struct drm_gem_object *obj)
+static inline void
+msm_gem_assert_locked(struct drm_gem_object *obj)
 {
/*
 * Destroying the object is a special case.. msm_gem_free_object()
@@ -212,13 +212,10 @@ msm_gem_is_locked(struct drm_gem_object *obj)
 * Unfortunately lockdep is not aware of this detail.  So when the
 * refcount drops to zero, we pretend it is already locked.
 */
-   return dma_resv_is_locked(obj->resv) || (kref_read(&obj->refcount) == 0);
-}
-
-static inline void
-msm_gem_assert_locked(struct drm_gem_object *obj)
-{
-   GEM_WARN_ON(!msm_gem_is_locked(obj));
+   lockdep_assert_once(
+   (kref_read(&obj->refcount) == 0) ||
+   (lockdep_is_held(&obj->resv->lock.base) != LOCK_STATE_NOT_HELD)
+   );
 }
 
 /* imported/exported objects are not purgeable: */
-- 
2.36.1



[PATCH v4 09/15] drm/gem: Add LRU/shrinker helper

2022-08-02 Thread Rob Clark
From: Rob Clark 

Add a simple LRU helper to assist with a driver's shrinker implementation.
It handles tracking the number of backing pages associated with a given
LRU, and provides a helper to implement shrinker_scan.

A driver can use multiple LRU instances to track objects in various
states, for example a dontneed LRU for purgeable objects, a willneed LRU
for evictable objects, and an unpinned LRU for objects without backing
pages.

All LRUs that the object can be moved between must share a single lock.

v2: lockdep_assert_held() instead of WARN_ON(!mutex_is_locked())
v3: make drm_gem_lru_move_tail_locked() static until there is a user

Cc: Daniel Vetter 
Cc: Thomas Zimmermann 
Cc: Dmitry Osipenko 
Signed-off-by: Rob Clark 
Reviewed-by: Dmitry Osipenko 
---
 drivers/gpu/drm/drm_gem.c | 170 ++
 include/drm/drm_gem.h |  55 
 2 files changed, 225 insertions(+)

diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index eb0c2d041f13..556714c10472 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -165,6 +165,7 @@ void drm_gem_private_object_init(struct drm_device *dev,
obj->resv = &obj->_resv;
 
drm_vma_node_reset(>vma_node);
+   INIT_LIST_HEAD(&obj->lru_node);
 }
 EXPORT_SYMBOL(drm_gem_private_object_init);
 
@@ -951,6 +952,7 @@ drm_gem_object_release(struct drm_gem_object *obj)
 
dma_resv_fini(&obj->_resv);
drm_gem_free_mmap_offset(obj);
+   drm_gem_lru_remove(obj);
 }
 EXPORT_SYMBOL(drm_gem_object_release);
 
@@ -1274,3 +1276,171 @@ drm_gem_unlock_reservations(struct drm_gem_object 
**objs, int count,
ww_acquire_fini(acquire_ctx);
 }
 EXPORT_SYMBOL(drm_gem_unlock_reservations);
+
+/**
+ * drm_gem_lru_init - initialize a LRU
+ *
+ * @lru: The LRU to initialize
+ * @lock: The lock protecting the LRU
+ */
+void
+drm_gem_lru_init(struct drm_gem_lru *lru, struct mutex *lock)
+{
+   lru->lock = lock;
+   lru->count = 0;
+   INIT_LIST_HEAD(&lru->list);
+}
+EXPORT_SYMBOL(drm_gem_lru_init);
+
+static void
+drm_gem_lru_remove_locked(struct drm_gem_object *obj)
+{
+   obj->lru->count -= obj->size >> PAGE_SHIFT;
+   WARN_ON(obj->lru->count < 0);
+   list_del(&obj->lru_node);
+   obj->lru = NULL;
+}
+
+/**
+ * drm_gem_lru_remove - remove object from whatever LRU it is in
+ *
+ * If the object is currently in any LRU, remove it.
+ *
+ * @obj: The GEM object to remove from current LRU
+ */
+void
+drm_gem_lru_remove(struct drm_gem_object *obj)
+{
+   struct drm_gem_lru *lru = obj->lru;
+
+   if (!lru)
+   return;
+
+   mutex_lock(lru->lock);
+   drm_gem_lru_remove_locked(obj);
+   mutex_unlock(lru->lock);
+}
+EXPORT_SYMBOL(drm_gem_lru_remove);
+
+static void
+drm_gem_lru_move_tail_locked(struct drm_gem_lru *lru, struct drm_gem_object 
*obj)
+{
+   lockdep_assert_held_once(lru->lock);
+
+   if (obj->lru)
+   drm_gem_lru_remove_locked(obj);
+
+   lru->count += obj->size >> PAGE_SHIFT;
+   list_add_tail(&obj->lru_node, &lru->list);
+   obj->lru = lru;
+}
+
+/**
+ * drm_gem_lru_move_tail - move the object to the tail of the LRU
+ *
+ * If the object is already in this LRU it will be moved to the
+ * tail.  Otherwise it will be removed from whichever other LRU
+ * it is in (if any) and moved into this LRU.
+ *
+ * @lru: The LRU to move the object into.
+ * @obj: The GEM object to move into this LRU
+ */
+void
+drm_gem_lru_move_tail(struct drm_gem_lru *lru, struct drm_gem_object *obj)
+{
+   mutex_lock(lru->lock);
+   drm_gem_lru_move_tail_locked(lru, obj);
+   mutex_unlock(lru->lock);
+}
+EXPORT_SYMBOL(drm_gem_lru_move_tail);
+
+/**
+ * drm_gem_lru_scan - helper to implement shrinker.scan_objects
+ *
+ * If the shrink callback succeeds, it is expected that the driver
+ * move the object out of this LRU.
+ *
+ * If the LRU possibly contains active buffers, it is the responsibility
+ * of the shrink callback to check for this (ie. dma_resv_test_signaled())
+ * or if necessary block until the buffer becomes idle.
+ *
+ * @lru: The LRU to scan
+ * @nr_to_scan: The number of pages to try to reclaim
+ * @shrink: Callback to try to shrink/reclaim the object.
+ */
+unsigned long
+drm_gem_lru_scan(struct drm_gem_lru *lru, unsigned nr_to_scan,
+bool (*shrink)(struct drm_gem_object *obj))
+{
+   struct drm_gem_lru still_in_lru;
+   struct drm_gem_object *obj;
+   unsigned freed = 0;
+
+   drm_gem_lru_init(&still_in_lru, lru->lock);
+
+   mutex_lock(lru->lock);
+
+   while (freed < nr_to_scan) {
+   obj = list_first_entry_or_null(&lru->list, typeof(*obj), lru_node);
+
+   if (!obj)
+   break;
+
+   drm_gem_lru_move_tail_locked(&still_in_lru, obj);
+
+   /*
+* If it's in the process of being freed, gem_obj

[PATCH v4 14/15] drm/msm/gem: Add msm_gem_assert_locked()

2022-08-02 Thread Rob Clark
From: Rob Clark 

All use of msm_gem_is_locked() is just for WARN_ON()s, so extract out
into an msm_gem_assert_locked() patch.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_gem.c | 36 +--
 drivers/gpu/drm/msm/msm_gem.h |  8 +++-
 2 files changed, 25 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index d4e8af46f4ef..1dee0d18abbb 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -97,7 +97,7 @@ static struct page **get_pages(struct drm_gem_object *obj)
 {
struct msm_gem_object *msm_obj = to_msm_bo(obj);
 
-   GEM_WARN_ON(!msm_gem_is_locked(obj));
+   msm_gem_assert_locked(obj);
 
if (!msm_obj->pages) {
struct drm_device *dev = obj->dev;
@@ -183,7 +183,7 @@ static struct page **msm_gem_pin_pages_locked(struct 
drm_gem_object *obj)
struct msm_gem_object *msm_obj = to_msm_bo(obj);
struct page **p;
 
-   GEM_WARN_ON(!msm_gem_is_locked(obj));
+   msm_gem_assert_locked(obj);
 
if (GEM_WARN_ON(msm_obj->madv != MSM_MADV_WILLNEED)) {
return ERR_PTR(-EBUSY);
@@ -278,7 +278,7 @@ static uint64_t mmap_offset(struct drm_gem_object *obj)
struct drm_device *dev = obj->dev;
int ret;
 
-   GEM_WARN_ON(!msm_gem_is_locked(obj));
+   msm_gem_assert_locked(obj);
 
/* Make it mmapable */
ret = drm_gem_create_mmap_offset(obj);
@@ -307,7 +307,7 @@ static struct msm_gem_vma *add_vma(struct drm_gem_object 
*obj,
struct msm_gem_object *msm_obj = to_msm_bo(obj);
struct msm_gem_vma *vma;
 
-   GEM_WARN_ON(!msm_gem_is_locked(obj));
+   msm_gem_assert_locked(obj);
 
vma = kzalloc(sizeof(*vma), GFP_KERNEL);
if (!vma)
@@ -326,7 +326,7 @@ static struct msm_gem_vma *lookup_vma(struct drm_gem_object 
*obj,
struct msm_gem_object *msm_obj = to_msm_bo(obj);
struct msm_gem_vma *vma;
 
-   GEM_WARN_ON(!msm_gem_is_locked(obj));
+   msm_gem_assert_locked(obj);
 
list_for_each_entry(vma, &msm_obj->vmas, list) {
if (vma->aspace == aspace)
@@ -357,7 +357,7 @@ put_iova_spaces(struct drm_gem_object *obj, bool close)
struct msm_gem_object *msm_obj = to_msm_bo(obj);
struct msm_gem_vma *vma;
 
-   GEM_WARN_ON(!msm_gem_is_locked(obj));
+   msm_gem_assert_locked(obj);
 
list_for_each_entry(vma, &msm_obj->vmas, list) {
if (vma->aspace) {
@@ -375,7 +375,7 @@ put_iova_vmas(struct drm_gem_object *obj)
struct msm_gem_object *msm_obj = to_msm_bo(obj);
struct msm_gem_vma *vma, *tmp;
 
-   GEM_WARN_ON(!msm_gem_is_locked(obj));
+   msm_gem_assert_locked(obj);
 
list_for_each_entry_safe(vma, tmp, &msm_obj->vmas, list) {
del_vma(vma);
@@ -388,7 +388,7 @@ static struct msm_gem_vma *get_vma_locked(struct 
drm_gem_object *obj,
 {
struct msm_gem_vma *vma;
 
-   GEM_WARN_ON(!msm_gem_is_locked(obj));
+   msm_gem_assert_locked(obj);
 
vma = lookup_vma(obj, aspace);
 
@@ -428,7 +428,7 @@ int msm_gem_pin_vma_locked(struct drm_gem_object *obj, 
struct msm_gem_vma *vma)
if (msm_obj->flags & MSM_BO_CACHED_COHERENT)
prot |= IOMMU_CACHE;
 
-   GEM_WARN_ON(!msm_gem_is_locked(obj));
+   msm_gem_assert_locked(obj);
 
if (GEM_WARN_ON(msm_obj->madv != MSM_MADV_WILLNEED))
return -EBUSY;
@@ -448,7 +448,7 @@ void msm_gem_unpin_locked(struct drm_gem_object *obj)
 {
struct msm_gem_object *msm_obj = to_msm_bo(obj);
 
-   GEM_WARN_ON(!msm_gem_is_locked(obj));
+   msm_gem_assert_locked(obj);
 
msm_obj->pin_count--;
GEM_WARN_ON(msm_obj->pin_count < 0);
@@ -469,7 +469,7 @@ static int get_and_pin_iova_range_locked(struct 
drm_gem_object *obj,
struct msm_gem_vma *vma;
int ret;
 
-   GEM_WARN_ON(!msm_gem_is_locked(obj));
+   msm_gem_assert_locked(obj);
 
vma = get_vma_locked(obj, aspace, range_start, range_end);
if (IS_ERR(vma))
@@ -630,7 +630,7 @@ static void *get_vaddr(struct drm_gem_object *obj, unsigned 
madv)
struct msm_gem_object *msm_obj = to_msm_bo(obj);
int ret = 0;
 
-   GEM_WARN_ON(!msm_gem_is_locked(obj));
+   msm_gem_assert_locked(obj);
 
if (obj->import_attach)
return ERR_PTR(-ENODEV);
@@ -703,7 +703,7 @@ void msm_gem_put_vaddr_locked(struct drm_gem_object *obj)
 {
struct msm_gem_object *msm_obj = to_msm_bo(obj);
 
-   GEM_WARN_ON(!msm_gem_is_locked(obj));
+   msm_gem_assert_locked(obj);
GEM_WARN_ON(msm_obj->vmap_count < 1);
 
msm_obj->vmap_count--;
@@ -745,7 +745,7 @@ void msm_gem_purge(struct drm_gem_object *obj)
struct drm_device *dev = obj->dev;
struct msm_gem_object *msm_obj = to_msm_bo(obj);
 
-   GEM_WARN_ON(!msm_gem_is_locked(obj));
+   

[PATCH v4 12/15] drm/msm/gem: Consolidate shrinker trace

2022-08-02 Thread Rob Clark
From: Rob Clark 

Combine separate trace events for purge vs evict into one.  When we add
support for purging/evicting active buffers we'll just add more info
into this one trace event, rather than adding a bunch more events.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_gem_shrinker.c | 19 ++-
 drivers/gpu/drm/msm/msm_gpu_trace.h| 32 +++---
 2 files changed, 20 insertions(+), 31 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_gem_shrinker.c 
b/drivers/gpu/drm/msm/msm_gem_shrinker.c
index 530b1102b46d..5cc05d669a08 100644
--- a/drivers/gpu/drm/msm/msm_gem_shrinker.c
+++ b/drivers/gpu/drm/msm/msm_gem_shrinker.c
@@ -71,25 +71,20 @@ msm_gem_shrinker_scan(struct shrinker *shrinker, struct 
shrink_control *sc)
struct msm_drm_private *priv =
container_of(shrinker, struct msm_drm_private, shrinker);
long nr = sc->nr_to_scan;
-   unsigned long freed;
+   unsigned long freed, purged, evicted = 0;
 
-   freed = drm_gem_lru_scan(&priv->lru.dontneed, nr, purge);
-   nr -= freed;
-
-   if (freed > 0)
-   trace_msm_gem_purge(freed << PAGE_SHIFT);
+   purged = drm_gem_lru_scan(&priv->lru.dontneed, nr, purge);
+   nr -= purged;
 
if (can_swap() && nr > 0) {
-   unsigned long evicted;
-
evicted = drm_gem_lru_scan(&priv->lru.willneed, nr, evict);
nr -= evicted;
+   }
 
-   if (evicted > 0)
-   trace_msm_gem_evict(evicted << PAGE_SHIFT);
+   freed = purged + evicted;
 
-   freed += evicted;
-   }
+   if (freed)
+   trace_msm_gem_shrink(sc->nr_to_scan, purged, evicted);
 
return (freed > 0) ? freed : SHRINK_STOP;
 }
diff --git a/drivers/gpu/drm/msm/msm_gpu_trace.h 
b/drivers/gpu/drm/msm/msm_gpu_trace.h
index ca0b08d7875b..8867fa0a0306 100644
--- a/drivers/gpu/drm/msm/msm_gpu_trace.h
+++ b/drivers/gpu/drm/msm/msm_gpu_trace.h
@@ -115,29 +115,23 @@ TRACE_EVENT(msm_gmu_freq_change,
 );
 
 
-TRACE_EVENT(msm_gem_purge,
-   TP_PROTO(u32 bytes),
-   TP_ARGS(bytes),
+TRACE_EVENT(msm_gem_shrink,
+   TP_PROTO(u32 nr_to_scan, u32 purged, u32 evicted),
+   TP_ARGS(nr_to_scan, purged, evicted),
TP_STRUCT__entry(
-   __field(u32, bytes)
+   __field(u32, nr_to_scan)
+   __field(u32, purged)
+   __field(u32, evicted)
),
TP_fast_assign(
-   __entry->bytes = bytes;
+   __entry->nr_to_scan = nr_to_scan;
+   __entry->purged = purged;
+   __entry->evicted = evicted;
),
-   TP_printk("Purging %u bytes", __entry->bytes)
-);
-
-
-TRACE_EVENT(msm_gem_evict,
-   TP_PROTO(u32 bytes),
-   TP_ARGS(bytes),
-   TP_STRUCT__entry(
-   __field(u32, bytes)
-   ),
-   TP_fast_assign(
-   __entry->bytes = bytes;
-   ),
-   TP_printk("Evicting %u bytes", __entry->bytes)
+   TP_printk("nr_to_scan=%u pages, purged=%u pages, evicted=%u 
pages",
+ __entry->nr_to_scan,
+ __entry->purged,
+ __entry->evicted)
 );
 
 
-- 
2.36.1



[PATCH v4 11/15] drm/msm/gem: Unpin buffers earlier

2022-08-02 Thread Rob Clark
From: Rob Clark 

We've already attached the fences, so obj->resv (which the shrinker checks)
tells us whether they are still active.  So we can unpin sooner, before
we drop the queue lock.

This also avoids the need to grab the obj lock in the retire path,
avoiding potential for lock contention between submit and retire.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_gem_submit.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c 
b/drivers/gpu/drm/msm/msm_gem_submit.c
index adf358fb8e9d..5599d93ec0d2 100644
--- a/drivers/gpu/drm/msm/msm_gem_submit.c
+++ b/drivers/gpu/drm/msm/msm_gem_submit.c
@@ -501,11 +501,11 @@ static int submit_reloc(struct msm_gem_submit *submit, 
struct msm_gem_object *ob
  */
 static void submit_cleanup(struct msm_gem_submit *submit, bool error)
 {
-   unsigned cleanup_flags = BO_LOCKED;
+   unsigned cleanup_flags = BO_LOCKED | BO_OBJ_PINNED;
unsigned i;
 
if (error)
-   cleanup_flags |= BO_VMA_PINNED | BO_OBJ_PINNED;
+   cleanup_flags |= BO_VMA_PINNED;
 
for (i = 0; i < submit->nr_bos; i++) {
struct msm_gem_object *msm_obj = submit->bos[i].obj;
@@ -522,10 +522,6 @@ void msm_submit_retire(struct msm_gem_submit *submit)
for (i = 0; i < submit->nr_bos; i++) {
struct drm_gem_object *obj = &submit->bos[i].obj->base;
 
-   msm_gem_lock(obj);
-   /* Note, VMA already fence-unpinned before submit: */
-   submit_cleanup_bo(submit, i, BO_OBJ_PINNED);
-   msm_gem_unlock(obj);
drm_gem_object_put(obj);
}
 }
-- 
2.36.1



[PATCH v4 13/15] drm/msm/gem: Evict active GEM objects when necessary

2022-08-02 Thread Rob Clark
From: Rob Clark 

If we are under enough memory pressure, we should stall waiting for
active buffers to become idle in order to evict.

v2: Check for __GFP_ATOMIC before blocking

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_gem_shrinker.c | 70 +-
 drivers/gpu/drm/msm/msm_gpu_trace.h| 16 +++---
 2 files changed, 68 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_gem_shrinker.c 
b/drivers/gpu/drm/msm/msm_gem_shrinker.c
index 5cc05d669a08..f31054d25314 100644
--- a/drivers/gpu/drm/msm/msm_gem_shrinker.c
+++ b/drivers/gpu/drm/msm/msm_gem_shrinker.c
@@ -24,6 +24,13 @@ static bool can_swap(void)
return enable_eviction && get_nr_swap_pages() > 0;
 }
 
+static bool can_block(struct shrink_control *sc)
+{
+   if (sc->gfp_mask & __GFP_ATOMIC)
+   return false;
+   return current_is_kswapd() || (sc->gfp_mask & __GFP_RECLAIM);
+}
+
 static unsigned long
 msm_gem_shrinker_count(struct shrinker *shrinker, struct shrink_control *sc)
 {
@@ -65,26 +72,65 @@ evict(struct drm_gem_object *obj)
return true;
 }
 
+static bool
+wait_for_idle(struct drm_gem_object *obj)
+{
+   enum dma_resv_usage usage = dma_resv_usage_rw(true);
+   return dma_resv_wait_timeout(obj->resv, usage, false, 1000) > 0;
+}
+
+static bool
+active_purge(struct drm_gem_object *obj)
+{
+   if (!wait_for_idle(obj))
+   return false;
+
+   return purge(obj);
+}
+
+static bool
+active_evict(struct drm_gem_object *obj)
+{
+   if (!wait_for_idle(obj))
+   return false;
+
+   return evict(obj);
+}
+
 static unsigned long
 msm_gem_shrinker_scan(struct shrinker *shrinker, struct shrink_control *sc)
 {
struct msm_drm_private *priv =
container_of(shrinker, struct msm_drm_private, shrinker);
+   struct {
+   struct drm_gem_lru *lru;
+   bool (*shrink)(struct drm_gem_object *obj);
+   bool cond;
+   unsigned long freed;
+   } stages[] = {
+   /* Stages of progressively more aggressive/expensive reclaim: */
+   { &priv->lru.dontneed, purge,        true },
+   { &priv->lru.willneed, evict,        can_swap() },
+   { &priv->lru.dontneed, active_purge, can_block(sc) },
+   { &priv->lru.willneed, active_evict, can_swap() && can_block(sc) },
+   };
long nr = sc->nr_to_scan;
-   unsigned long freed, purged, evicted = 0;
-
-   purged = drm_gem_lru_scan(&priv->lru.dontneed, nr, purge);
-   nr -= purged;
-
-   if (can_swap() && nr > 0) {
-   evicted = drm_gem_lru_scan(&priv->lru.willneed, nr, evict);
-   nr -= evicted;
+   unsigned long freed = 0;
+
+   for (unsigned i = 0; (nr > 0) && (i < ARRAY_SIZE(stages)); i++) {
+   if (!stages[i].cond)
+   continue;
+   stages[i].freed =
+   drm_gem_lru_scan(stages[i].lru, nr, stages[i].shrink);
+   nr -= stages[i].freed;
+   freed += stages[i].freed;
}
 
-   freed = purged + evicted;
-
-   if (freed)
-   trace_msm_gem_shrink(sc->nr_to_scan, purged, evicted);
+   if (freed) {
+   trace_msm_gem_shrink(sc->nr_to_scan, stages[0].freed,
+stages[1].freed, stages[2].freed,
+stages[3].freed);
+   }
 
return (freed > 0) ? freed : SHRINK_STOP;
 }
diff --git a/drivers/gpu/drm/msm/msm_gpu_trace.h 
b/drivers/gpu/drm/msm/msm_gpu_trace.h
index 8867fa0a0306..ac40d857bc45 100644
--- a/drivers/gpu/drm/msm/msm_gpu_trace.h
+++ b/drivers/gpu/drm/msm/msm_gpu_trace.h
@@ -116,22 +116,26 @@ TRACE_EVENT(msm_gmu_freq_change,
 
 
 TRACE_EVENT(msm_gem_shrink,
-   TP_PROTO(u32 nr_to_scan, u32 purged, u32 evicted),
-   TP_ARGS(nr_to_scan, purged, evicted),
+   TP_PROTO(u32 nr_to_scan, u32 purged, u32 evicted,
+u32 active_purged, u32 active_evicted),
+   TP_ARGS(nr_to_scan, purged, evicted, active_purged, 
active_evicted),
TP_STRUCT__entry(
__field(u32, nr_to_scan)
__field(u32, purged)
__field(u32, evicted)
+   __field(u32, active_purged)
+   __field(u32, active_evicted)
),
TP_fast_assign(
__entry->nr_to_scan = nr_to_scan;
__entry->purged = purged;
__entry->evicted = evicted;
+   __entry->active_purged = active_purged;
+   __entry->active_evicted = active_evicted;
),
-   TP_printk("nr_to_scan=%u pages, purged=%u pages, evicted=%u 
pages",
-

[PATCH v4 10/15] drm/msm/gem: Convert to using drm_gem_lru

2022-08-02 Thread Rob Clark
From: Rob Clark 

This converts over to use the shared GEM LRU/shrinker helpers.  Note
that it means we are no longer tracking purgeable or willneed buffers
that are active separately.  But the most recently pinned buffers should
be at the tail of the various LRUs, and the shrinker is already prepared
to encounter objects which are still active.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_drv.c  |  14 +--
 drivers/gpu/drm/msm/msm_drv.h  |  70 +++
 drivers/gpu/drm/msm/msm_gem.c  |  58 
 drivers/gpu/drm/msm/msm_gem.h  |  93 
 drivers/gpu/drm/msm/msm_gem_shrinker.c | 117 ++---
 drivers/gpu/drm/msm/msm_gpu.c  |   3 -
 drivers/gpu/drm/msm/msm_gpu.h  |   6 --
 7 files changed, 104 insertions(+), 257 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index d7ca025457b6..1ca4a92ba96e 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -418,14 +418,18 @@ static int msm_drm_init(struct device *dev, const struct drm_driver *drv)
INIT_LIST_HEAD(&priv->objects);
mutex_init(&priv->obj_lock);
 
-   INIT_LIST_HEAD(&priv->inactive_willneed);
-   INIT_LIST_HEAD(&priv->inactive_dontneed);
-   INIT_LIST_HEAD(&priv->inactive_unpinned);
-   mutex_init(&priv->mm_lock);
+   /*
+* Initialize the LRUs:
+*/
+   mutex_init(&priv->lru.lock);
+   drm_gem_lru_init(&priv->lru.unbacked, &priv->lru.lock);
+   drm_gem_lru_init(&priv->lru.pinned,   &priv->lru.lock);
+   drm_gem_lru_init(&priv->lru.willneed, &priv->lru.lock);
+   drm_gem_lru_init(&priv->lru.dontneed, &priv->lru.lock);
 
/* Teach lockdep about lock ordering wrt. shrinker: */
fs_reclaim_acquire(GFP_KERNEL);
-   might_lock(&priv->mm_lock);
+   might_lock(&priv->lru.lock);
fs_reclaim_release(GFP_KERNEL);
 
drm_mode_config_init(ddev);
diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h
index b3689a2d27d7..208ae5bc5e6b 100644
--- a/drivers/gpu/drm/msm/msm_drv.h
+++ b/drivers/gpu/drm/msm/msm_drv.h
@@ -142,28 +142,60 @@ struct msm_drm_private {
struct mutex obj_lock;
 
/**
-* LRUs of inactive GEM objects.  Every bo is either in one of the
-* inactive lists (depending on whether or not it is shrinkable) or
-* gpu->active_list (for the gpu it is active on[1]), or transiently
-* on a temporary list as the shrinker is running.
+* lru:
 *
-* Note that inactive_willneed also contains pinned and vmap'd bos,
-* but the number of pinned-but-not-active objects is small (scanout
-* buffers, ringbuffer, etc).
+* The various LRU's that a GEM object is in at various stages of
+* it's lifetime.  Objects start out in the unbacked LRU.  When
+* pinned (for scannout or permanently mapped GPU buffers, like
+* ringbuffer, memptr, fw, etc) it moves to the pinned LRU.  When
+* unpinned, it moves into willneed or dontneed LRU depending on
+* madvise state.  When backing pages are evicted (willneed) or
+* purged (dontneed) it moves back into the unbacked LRU.
 *
-* These lists are protected by mm_lock (which should be acquired
-* before per GEM object lock).  One should *not* hold mm_lock in
-* get_pages()/vmap()/etc paths, as they can trigger the shrinker.
-*
-* [1] if someone ever added support for the old 2d cores, there could be
-* more than one gpu object
+* The dontneed LRU is considered by the shrinker for objects
+* that are candidates for purging, and the willneed LRU is
+* considered for objects that could be evicted.
 */
-   struct list_head inactive_willneed;  /* inactive + potentially unpin/evictable */
-   struct list_head inactive_dontneed;  /* inactive + shrinkable */
-   struct list_head inactive_unpinned;  /* inactive + purged or unpinned */
-   long shrinkable_count;   /* write access under mm_lock */
-   long evictable_count;    /* write access under mm_lock */
-   struct mutex mm_lock;
+   struct {
+   /**
+* unbacked:
+*
+* The LRU for GEM objects without backing pages allocated.
+* This mostly exists so that objects are always in one
+* LRU.
+*/
+   struct drm_gem_lru unbacked;
+
+   /**
+* pinned:
+*
+* The LRU for pinned GEM objects
+*/
+   struct drm_gem_lru pinned;
+
+   /**
+* willneed:
+*
+* The LRU for unpinned GEM objects which are in madvise
+* WILLNEED state (ie. can be evicted)
+
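
The placement policy the (truncated) comment above describes ends up
being a single state check in update_lru().  Roughly, using the shared
drm_gem_lru_move_tail() helper, it looks like the following condensed
sketch of the direction this patch takes (not the exact hunk, which is
cut off here):

  static void update_lru(struct drm_gem_object *obj)
  {
          struct msm_drm_private *priv = obj->dev->dev_private;
          struct msm_gem_object *msm_obj = to_msm_bo(obj);

          if (!msm_obj->pages) {
                  /* no backing pages (never allocated, evicted, or purged) */
                  drm_gem_lru_move_tail(&priv->lru.unbacked, obj);
          } else if (msm_obj->pin_count || msm_obj->vaddr) {
                  /* pinned for scanout, ringbuffer, vmap, etc */
                  drm_gem_lru_move_tail(&priv->lru.pinned, obj);
          } else if (msm_obj->madv == MSM_MADV_WILLNEED) {
                  /* unpinned, evictable */
                  drm_gem_lru_move_tail(&priv->lru.willneed, obj);
          } else {
                  /* unpinned, purgeable */
                  drm_gem_lru_move_tail(&priv->lru.dontneed, obj);
          }
  }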

[PATCH v4 08/15] drm/msm/gem: Remove active refcnt

2022-08-02 Thread Rob Clark
From: Rob Clark 

At this point the pinned refcnt is sufficient, and the shrinker is
already prepared to encounter objects which are still active according
to fences attached to the resv.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_gem.c| 45 ++--
 drivers/gpu/drm/msm/msm_gem.h| 14 ++---
 drivers/gpu/drm/msm/msm_gem_submit.c | 22 ++
 3 files changed, 8 insertions(+), 73 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index 407b18a24dc4..209438744bab 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -734,8 +734,7 @@ int msm_gem_madvise(struct drm_gem_object *obj, unsigned madv)
/* If the obj is inactive, we might need to move it
 * between inactive lists
 */
-   if (msm_obj->active_count == 0)
-   update_lru(obj);
+   update_lru(obj);
 
msm_gem_unlock(obj);
 
@@ -788,7 +787,6 @@ void msm_gem_evict(struct drm_gem_object *obj)
GEM_WARN_ON(!msm_gem_is_locked(obj));
GEM_WARN_ON(is_unevictable(msm_obj));
GEM_WARN_ON(!msm_obj->evictable);
-   GEM_WARN_ON(msm_obj->active_count);
 
/* Get rid of any iommu mapping(s): */
put_iova_spaces(obj, false);
@@ -813,37 +811,6 @@ void msm_gem_vunmap(struct drm_gem_object *obj)
msm_obj->vaddr = NULL;
 }
 
-void msm_gem_active_get(struct drm_gem_object *obj, struct msm_gpu *gpu)
-{
-   struct msm_gem_object *msm_obj = to_msm_bo(obj);
-   struct msm_drm_private *priv = obj->dev->dev_private;
-
-   might_sleep();
-   GEM_WARN_ON(!msm_gem_is_locked(obj));
-   GEM_WARN_ON(msm_obj->madv != MSM_MADV_WILLNEED);
-   GEM_WARN_ON(msm_obj->dontneed);
-
-   if (msm_obj->active_count++ == 0) {
-   mutex_lock(&priv->mm_lock);
-   if (msm_obj->evictable)
-   mark_unevictable(msm_obj);
-   list_move_tail(&msm_obj->mm_list, &gpu->active_list);
-   mutex_unlock(&priv->mm_lock);
-   }
-}
-
-void msm_gem_active_put(struct drm_gem_object *obj)
-{
-   struct msm_gem_object *msm_obj = to_msm_bo(obj);
-
-   might_sleep();
-   GEM_WARN_ON(!msm_gem_is_locked(obj));
-
-   if (--msm_obj->active_count == 0) {
-   update_lru(obj);
-   }
-}
-
 static void update_lru(struct drm_gem_object *obj)
 {
struct msm_drm_private *priv = obj->dev->dev_private;
@@ -851,9 +818,6 @@ static void update_lru(struct drm_gem_object *obj)
 
GEM_WARN_ON(!msm_gem_is_locked(&msm_obj->base));
 
-   if (msm_obj->active_count != 0)
-   return;
-
mutex_lock(&priv->mm_lock);
 
if (msm_obj->dontneed)
@@ -926,7 +890,7 @@ void msm_gem_describe(struct drm_gem_object *obj, struct seq_file *m,
stats->all.count++;
stats->all.size += obj->size;
 
-   if (is_active(msm_obj)) {
+   if (msm_gem_active(obj)) {
stats->active.count++;
stats->active.size += obj->size;
}
@@ -954,7 +918,7 @@ void msm_gem_describe(struct drm_gem_object *obj, struct seq_file *m,
}
 
seq_printf(m, "%08x: %c %2d (%2d) %08llx %p",
-   msm_obj->flags, is_active(msm_obj) ? 'A' : 'I',
+   msm_obj->flags, msm_gem_active(obj) ? 'A' : 'I',
obj->name, kref_read(&obj->refcount),
off, msm_obj->vaddr);
 
@@ -1037,9 +1001,6 @@ static void msm_gem_free_object(struct drm_gem_object *obj)
list_del(&msm_obj->mm_list);
mutex_unlock(&priv->mm_lock);
 
-   /* object should not be on active list: */
-   GEM_WARN_ON(is_active(msm_obj));
-
put_iova_spaces(obj, true);
 
if (obj->import_attach) {
diff --git a/drivers/gpu/drm/msm/msm_gem.h b/drivers/gpu/drm/msm/msm_gem.h
index 6fe521ccda45..420ba49bf21a 100644
--- a/drivers/gpu/drm/msm/msm_gem.h
+++ b/drivers/gpu/drm/msm/msm_gem.h
@@ -138,7 +138,6 @@ struct msm_gem_object {
 
char name[32]; /* Identifier to print for the debugfs files */
 
-   int active_count;
int pin_count;
 };
 #define to_msm_bo(x) container_of(x, struct msm_gem_object, base)
@@ -171,8 +170,6 @@ void *msm_gem_get_vaddr_active(struct drm_gem_object *obj);
 void msm_gem_put_vaddr_locked(struct drm_gem_object *obj);
 void msm_gem_put_vaddr(struct drm_gem_object *obj);
 int msm_gem_madvise(struct drm_gem_object *obj, unsigned madv);
-void msm_gem_active_get(struct drm_gem_object *obj, struct msm_gpu *gpu);
-void msm_gem_active_put(struct drm_gem_object *obj);
 bool msm_gem_active(struct drm_gem_object *obj);
int msm_gem_cpu_prep(struct drm_gem_object *obj, uint32_t op, ktime_t *timeout);
 int msm_gem_cpu_fini(struct drm_gem_object *obj);
@@ -245,12 +242,6 @@ msm_gem_is_locked(struct drm_gem_object *obj)
return dma_resv_is_locked(obj->resv) || (
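
With the refcount gone, "active" becomes a property derived on demand
rather than tracked state.  The hunk above is cut off here, but the
check this relies on (introduced as msm_gem_active() in patch 04 of
this series, quoted below for context) is just the pin count plus
unsignaled fences on the object's reservation:

  bool msm_gem_active(struct drm_gem_object *obj)
  {
          GEM_WARN_ON(!msm_gem_is_locked(obj));

          if (to_msm_bo(obj)->pin_count)
                  return true;

          /* any unsignaled read or write fence on the resv counts as active */
          return !dma_resv_test_signaled(obj->resv, dma_resv_usage_rw(true));
  }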

[PATCH v4 06/15] drm/msm/gem: Rename to pin/unpin_pages

2022-08-02 Thread Rob Clark
From: Rob Clark 

Since that is what these fxns actually do: they get *pinned* pages
(as opposed to cases where we need pages, but don't need them pinned,
like CPU mappings).

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_gem.c   | 18 +-
 drivers/gpu/drm/msm/msm_gem.h   |  4 ++--
 drivers/gpu/drm/msm/msm_gem_prime.c |  4 ++--
 3 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index 97467364dc0a..3da64c7f65a2 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -177,30 +177,38 @@ static void put_pages(struct drm_gem_object *obj)
}
 }
 
-struct page **msm_gem_get_pages(struct drm_gem_object *obj)
+static struct page **msm_gem_pin_pages_locked(struct drm_gem_object *obj)
 {
struct msm_gem_object *msm_obj = to_msm_bo(obj);
struct page **p;
 
-   msm_gem_lock(obj);
+   GEM_WARN_ON(!msm_gem_is_locked(obj));
 
if (GEM_WARN_ON(msm_obj->madv != MSM_MADV_WILLNEED)) {
-   msm_gem_unlock(obj);
return ERR_PTR(-EBUSY);
}
 
p = get_pages(obj);
-
if (!IS_ERR(p)) {
msm_obj->pin_count++;
update_lru(obj);
}
 
+   return p;
+}
+
+struct page **msm_gem_pin_pages(struct drm_gem_object *obj)
+{
+   struct page **p;
+
+   msm_gem_lock(obj);
+   p = msm_gem_pin_pages_locked(obj);
msm_gem_unlock(obj);
+
return p;
 }
 
-void msm_gem_put_pages(struct drm_gem_object *obj)
+void msm_gem_unpin_pages(struct drm_gem_object *obj)
 {
struct msm_gem_object *msm_obj = to_msm_bo(obj);
 
diff --git a/drivers/gpu/drm/msm/msm_gem.h b/drivers/gpu/drm/msm/msm_gem.h
index 0ab0dc4f8c25..6fe521ccda45 100644
--- a/drivers/gpu/drm/msm/msm_gem.h
+++ b/drivers/gpu/drm/msm/msm_gem.h
@@ -159,8 +159,8 @@ int msm_gem_get_and_pin_iova(struct drm_gem_object *obj,
struct msm_gem_address_space *aspace, uint64_t *iova);
 void msm_gem_unpin_iova(struct drm_gem_object *obj,
struct msm_gem_address_space *aspace);
-struct page **msm_gem_get_pages(struct drm_gem_object *obj);
-void msm_gem_put_pages(struct drm_gem_object *obj);
+struct page **msm_gem_pin_pages(struct drm_gem_object *obj);
+void msm_gem_unpin_pages(struct drm_gem_object *obj);
 int msm_gem_dumb_create(struct drm_file *file, struct drm_device *dev,
struct drm_mode_create_dumb *args);
 int msm_gem_dumb_map_offset(struct drm_file *file, struct drm_device *dev,
diff --git a/drivers/gpu/drm/msm/msm_gem_prime.c b/drivers/gpu/drm/msm/msm_gem_prime.c
index dcc8a573bc76..c1d91863df05 100644
--- a/drivers/gpu/drm/msm/msm_gem_prime.c
+++ b/drivers/gpu/drm/msm/msm_gem_prime.c
@@ -63,12 +63,12 @@ struct drm_gem_object *msm_gem_prime_import_sg_table(struct drm_device *dev,
 int msm_gem_prime_pin(struct drm_gem_object *obj)
 {
if (!obj->import_attach)
-   msm_gem_get_pages(obj);
+   msm_gem_pin_pages(obj);
return 0;
 }
 
 void msm_gem_prime_unpin(struct drm_gem_object *obj)
 {
if (!obj->import_attach)
-   msm_gem_put_pages(obj);
+   msm_gem_unpin_pages(obj);
 }
-- 
2.36.1
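
For callers the rename is purely cosmetic, but it makes the pin
lifetime explicit at the call site.  A hypothetical usage pattern
(do_something_with_pages() is illustrative, not driver code):

  static int do_something_with_pages(struct drm_gem_object *obj)
  {
          struct page **pages;

          pages = msm_gem_pin_pages(obj);   /* pages pinned from here... */
          if (IS_ERR(pages))
                  return PTR_ERR(pages);

          /* ... safe to use 'pages'; the shrinker cannot evict or purge
           * the backing store while the pin is held ... */

          msm_gem_unpin_pages(obj);         /* ...until the matching unpin */
          return 0;
  }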



[PATCH v4 04/15] drm/msm/gem: Check for active in shrinker path

2022-08-02 Thread Rob Clark
From: Rob Clark 

Currently in our shrinker path we shouldn't be encountering anything
that is active, but this will change in subsequent patches.  So check
if there are unsignaled fences.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_gem.c  | 10 ++
 drivers/gpu/drm/msm/msm_gem.h  |  1 +
 drivers/gpu/drm/msm/msm_gem_shrinker.c |  6 ++
 3 files changed, 17 insertions(+)

diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index 8ddbd2e001d4..b55d252aef17 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -870,6 +870,16 @@ static void update_inactive(struct msm_gem_object *msm_obj)
mutex_unlock(&priv->mm_lock);
 }
 
+bool msm_gem_active(struct drm_gem_object *obj)
+{
+   GEM_WARN_ON(!msm_gem_is_locked(obj));
+
+   if (to_msm_bo(obj)->pin_count)
+   return true;
+
+   return !dma_resv_test_signaled(obj->resv, dma_resv_usage_rw(true));
+}
+
 int msm_gem_cpu_prep(struct drm_gem_object *obj, uint32_t op, ktime_t *timeout)
 {
bool write = !!(op & MSM_PREP_WRITE);
diff --git a/drivers/gpu/drm/msm/msm_gem.h b/drivers/gpu/drm/msm/msm_gem.h
index 432032ad4aed..0ab0dc4f8c25 100644
--- a/drivers/gpu/drm/msm/msm_gem.h
+++ b/drivers/gpu/drm/msm/msm_gem.h
@@ -173,6 +173,7 @@ void msm_gem_put_vaddr(struct drm_gem_object *obj);
 int msm_gem_madvise(struct drm_gem_object *obj, unsigned madv);
 void msm_gem_active_get(struct drm_gem_object *obj, struct msm_gpu *gpu);
 void msm_gem_active_put(struct drm_gem_object *obj);
+bool msm_gem_active(struct drm_gem_object *obj);
int msm_gem_cpu_prep(struct drm_gem_object *obj, uint32_t op, ktime_t *timeout);
 int msm_gem_cpu_fini(struct drm_gem_object *obj);
 int msm_gem_new_handle(struct drm_device *dev, struct drm_file *file,
diff --git a/drivers/gpu/drm/msm/msm_gem_shrinker.c b/drivers/gpu/drm/msm/msm_gem_shrinker.c
index 6e39d959b9f0..ea8ed74982c1 100644
--- a/drivers/gpu/drm/msm/msm_gem_shrinker.c
+++ b/drivers/gpu/drm/msm/msm_gem_shrinker.c
@@ -43,6 +43,9 @@ purge(struct msm_gem_object *msm_obj)
if (!is_purgeable(msm_obj))
return false;
 
+   if (msm_gem_active(&msm_obj->base))
+   return false;
+
/*
 * This will move the obj out of still_in_list to
 * the purged list
@@ -58,6 +61,9 @@ evict(struct msm_gem_object *msm_obj)
if (is_unevictable(msm_obj))
return false;
 
+   if (msm_gem_active(&msm_obj->base))
+   return false;
+
msm_gem_evict(&msm_obj->base);
 
return true;
-- 
2.36.1
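
A note on the dma_resv_usage_rw(true) argument used in msm_gem_active()
above: the helper maps a read/write intent to the widest fence class
that intent must wait on.  As defined in include/linux/dma-resv.h at
the time of this series (comment paraphrased):

  static inline enum dma_resv_usage dma_resv_usage_rw(bool write)
  {
          /* A new write must wait for existing readers and writers,
           * while a new read only needs to wait for existing writers. */
          return write ? DMA_RESV_USAGE_READ : DMA_RESV_USAGE_WRITE;
  }

So dma_resv_test_signaled(obj->resv, dma_resv_usage_rw(true)) returns
true only once every reader and writer fence has signaled, i.e. the
object is idle for all GPU access, which is exactly the condition the
shrinker needs before purging or evicting.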



[PATCH v4 03/15] drm/msm: Split out idr_lock

2022-08-02 Thread Rob Clark
From: Rob Clark 

Otherwise, if we hit reclaim while pinning objects in the submit path,
we'll block retire_worker when it tries to free a submit.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_drv.c |  4 ++--
 drivers/gpu/drm/msm/msm_gem_submit.c  | 10 --
 drivers/gpu/drm/msm/msm_gpu.h |  4 +++-
 drivers/gpu/drm/msm/msm_submitqueue.c |  1 +
 4 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index 1ed4cd09dbf8..d7ca025457b6 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -883,13 +883,13 @@ static int wait_fence(struct msm_gpu_submitqueue *queue, uint32_t fence_id,
 * retired, so if the fence is not found it means there is nothing
 * to wait for
 */
-   ret = mutex_lock_interruptible(&queue->lock);
+   ret = mutex_lock_interruptible(&queue->idr_lock);
if (ret)
return ret;
fence = idr_find(&queue->fence_idr, fence_id);
if (fence)
fence = dma_fence_get_rcu(fence);
-   mutex_unlock(&queue->lock);
+   mutex_unlock(&queue->idr_lock);
 
if (!fence)
return 0;
diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c
index c7819781879c..16c662808522 100644
--- a/drivers/gpu/drm/msm/msm_gem_submit.c
+++ b/drivers/gpu/drm/msm/msm_gem_submit.c
@@ -72,9 +72,9 @@ void __msm_gem_submit_destroy(struct kref *kref)
unsigned i;
 
if (submit->fence_id) {
-   mutex_lock(&submit->queue->lock);
+   mutex_lock(&submit->queue->idr_lock);
idr_remove(&submit->queue->fence_idr, submit->fence_id);
-   mutex_unlock(&submit->queue->lock);
+   mutex_unlock(&submit->queue->idr_lock);
}
 
dma_fence_put(submit->user_fence);
@@ -881,6 +881,8 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void *data,
 
submit->nr_cmds = i;
 
+   mutex_lock(&queue->idr_lock);
+
/*
 * If using userspace provided seqno fence, validate that the id
 * is available before arming sched job.  Since access to fence_idr
@@ -889,6 +891,7 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void *data,
 */
if ((args->flags & MSM_SUBMIT_FENCE_SN_IN) &&
idr_find(&queue->fence_idr, args->fence)) {
+   mutex_unlock(&queue->idr_lock);
ret = -EINVAL;
goto out;
}
@@ -921,6 +924,9 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void *data,
submit->user_fence, 1,
INT_MAX, GFP_KERNEL);
}
+
+   mutex_unlock(&queue->idr_lock);
+
if (submit->fence_id < 0) {
ret = submit->fence_id;
submit->fence_id = 0;
diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
index 4d935fedd2ac..962d2070bcdf 100644
--- a/drivers/gpu/drm/msm/msm_gpu.h
+++ b/drivers/gpu/drm/msm/msm_gpu.h
@@ -466,7 +466,8 @@ static inline int msm_gpu_convert_priority(struct msm_gpu *gpu, int prio,
  * @node:  node in the context's list of submitqueues
  * @fence_idr: maps fence-id to dma_fence for userspace visible fence
  * seqno, protected by submitqueue lock
- * @lock:  submitqueue lock
+ * @idr_lock:  for serializing access to fence_idr
+ * @lock:  submitqueue lock for serializing submits on a queue
  * @ref:   reference count
  * @entity:the submit job-queue
  */
@@ -479,6 +480,7 @@ struct msm_gpu_submitqueue {
struct msm_file_private *ctx;
struct list_head node;
struct idr fence_idr;
+   struct mutex idr_lock;
struct mutex lock;
struct kref ref;
struct drm_sched_entity *entity;
diff --git a/drivers/gpu/drm/msm/msm_submitqueue.c b/drivers/gpu/drm/msm/msm_submitqueue.c
index f486a3cd4e55..c6929e205b51 100644
--- a/drivers/gpu/drm/msm/msm_submitqueue.c
+++ b/drivers/gpu/drm/msm/msm_submitqueue.c
@@ -200,6 +200,7 @@ int msm_submitqueue_create(struct drm_device *drm, struct msm_file_private *ctx,
*id = queue->id;
 
idr_init(&queue->fence_idr);
+   mutex_init(&queue->idr_lock);
mutex_init(&queue->lock);
 
list_add_tail(&queue->node, &ctx->submitqueues);
-- 
2.36.1
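
The blocking scenario being avoided is easiest to see from the retire
side.  Condensed from the hunks above, fence-id lookup and removal now
only ever take the narrow idr_lock, which is never held across
anything that can recurse into reclaim:

  /* retire path (__msm_gem_submit_destroy), condensed: */
  mutex_lock(&submit->queue->idr_lock);
  idr_remove(&submit->queue->fence_idr, submit->fence_id);
  mutex_unlock(&submit->queue->idr_lock);

Previously this took queue->lock, which the submit path can hold while
pinning buffers; if pinning enters direct reclaim, retire_worker then
blocks on queue->lock and cannot free submits for as long as reclaim
runs.  Splitting the idr out under its own narrow lock breaks that
dependency.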



[PATCH v4 07/15] drm/msm/gem: Consolidate pin/unpin paths

2022-08-02 Thread Rob Clark
From: Rob Clark 

Avoid having multiple spots where we increment/decrement pin_count (and
do the associated LRU updating).

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_gem.c | 13 +
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index 3da64c7f65a2..407b18a24dc4 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -190,7 +190,7 @@ static struct page **msm_gem_pin_pages_locked(struct drm_gem_object *obj)
 
p = get_pages(obj);
if (!IS_ERR(p)) {
-   msm_obj->pin_count++;
+   to_msm_bo(obj)->pin_count++;
update_lru(obj);
}
 
@@ -213,9 +213,7 @@ void msm_gem_unpin_pages(struct drm_gem_object *obj)
struct msm_gem_object *msm_obj = to_msm_bo(obj);
 
msm_gem_lock(obj);
-   msm_obj->pin_count--;
-   GEM_WARN_ON(msm_obj->pin_count < 0);
-   update_lru(obj);
+   msm_gem_unpin_locked(obj);
msm_gem_unlock(obj);
 }
 
@@ -436,14 +434,13 @@ int msm_gem_pin_vma_locked(struct drm_gem_object *obj, struct msm_gem_vma *vma)
if (GEM_WARN_ON(msm_obj->madv != MSM_MADV_WILLNEED))
return -EBUSY;
 
-   pages = get_pages(obj);
+   pages = msm_gem_pin_pages_locked(obj);
if (IS_ERR(pages))
return PTR_ERR(pages);
 
ret = msm_gem_map_vma(vma->aspace, vma, prot, msm_obj->sgt, obj->size);
-
-   if (!ret)
-   msm_obj->pin_count++;
+   if (ret)
+   msm_gem_unpin_locked(obj);
 
return ret;
 }
-- 
2.36.1
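
One subtlety in the msm_gem_pin_vma_locked() hunk above: now that
msm_gem_pin_pages_locked() takes the pin ref itself, the error path
has to drop it again, instead of the old style of only incrementing
on success.  The resulting shape (condensed, with the prot setup
elided):

  pages = msm_gem_pin_pages_locked(obj);    /* pin_count++ on success */
  if (IS_ERR(pages))
          return PTR_ERR(pages);

  ret = msm_gem_map_vma(vma->aspace, vma, prot, msm_obj->sgt, obj->size);
  if (ret)
          msm_gem_unpin_locked(obj);        /* undo the pin on failure */

  return ret;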


