Re: [PATCH v18 04/26] drm/shmem-helper: Refactor locked/unlocked functions

2023-11-28 Thread Boris Brezillon
On Wed, 29 Nov 2023 01:05:14 +0300
Dmitry Osipenko  wrote:

> On 11/28/23 15:37, Boris Brezillon wrote:
> > On Tue, 28 Nov 2023 12:14:42 +0100
> > Maxime Ripard  wrote:
> >   
> >> Hi,
> >>
> >> On Fri, Nov 24, 2023 at 11:59:11AM +0100, Boris Brezillon wrote:  
> >>> On Fri, 24 Nov 2023 11:40:06 +0100
> >>> Maxime Ripard  wrote:
> >>> 
>  On Mon, Oct 30, 2023 at 02:01:43AM +0300, Dmitry Osipenko wrote:
> > Add locked and remove unlocked postfixes from drm-shmem function names,
> > making names consistent with the drm/gem core code.
> >
> > Reviewed-by: Boris Brezillon 
> > Suggested-by: Boris Brezillon 
> > Signed-off-by: Dmitry Osipenko   
> 
>  This contradicts my earlier ack on a patch but...
>  
> > ---
> >  drivers/gpu/drm/drm_gem_shmem_helper.c| 64 +--
> >  drivers/gpu/drm/lima/lima_gem.c   |  8 +--
> >  drivers/gpu/drm/panfrost/panfrost_drv.c   |  2 +-
> >  drivers/gpu/drm/panfrost/panfrost_gem.c   |  6 +-
> >  .../gpu/drm/panfrost/panfrost_gem_shrinker.c  |  2 +-
> >  drivers/gpu/drm/panfrost/panfrost_mmu.c   |  2 +-
> >  drivers/gpu/drm/v3d/v3d_bo.c  |  4 +-
> >  drivers/gpu/drm/virtio/virtgpu_object.c   |  4 +-
> >  include/drm/drm_gem_shmem_helper.h| 36 +--
> >  9 files changed, 64 insertions(+), 64 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/drm_gem_shmem_helper.c 
> > b/drivers/gpu/drm/drm_gem_shmem_helper.c
> > index 0d61f2b3e213..154585ddae08 100644
> > --- a/drivers/gpu/drm/drm_gem_shmem_helper.c
> > +++ b/drivers/gpu/drm/drm_gem_shmem_helper.c
> > @@ -43,8 +43,8 @@ static const struct drm_gem_object_funcs 
> > drm_gem_shmem_funcs = {
> > .pin = drm_gem_shmem_object_pin,
> > .unpin = drm_gem_shmem_object_unpin,
> > .get_sg_table = drm_gem_shmem_object_get_sg_table,
> > -   .vmap = drm_gem_shmem_object_vmap,
> > -   .vunmap = drm_gem_shmem_object_vunmap,
> > +   .vmap = drm_gem_shmem_object_vmap_locked,
> > +   .vunmap = drm_gem_shmem_object_vunmap_locked,  
> 
>  While I think we should indeed be consistent with the names, I would
>  also expect helpers to get the locking right by default.
> >>>
> >>> Wait, actually I think this patch does what you suggest already. The
> >>> _locked() suffix tells the caller: "you should take care of the locking,
> >>> I expect the lock to be held when this hook/function is called". So
> >>> helpers without the _locked() suffix take care of the locking (which I
> >>> guess matches your 'helpers get the locking right' expectation), and
> >>> those with the _locked() suffix don't.
> >>
> >> What I meant by "getting the locking right" is indeed a bit ambiguous,
> >> sorry. What I'm trying to say I guess is that, in this particular case,
> >> I don't think you can expect the vmap implementation to be called with
> >> or without the locks held. The doc for that function will say that it's
> >> either one or the other, but not both.
> >>
> >> So helpers should follow what is needed to provide a default vmap/vunmap
> >> implementation, including what locking is expected from a vmap/vunmap
> >> implementation.  
> > 
> > Hm, yeah, I think that's a matter of taste. When locking is often
> > deferrable, like it is in DRM, I find it beneficial for functions and
> > function pointers to reflect the locking scheme, rather than relying on
> > people properly reading the doc, especially when this is the only
> > outlier in the group of drm_gem_object_funcs we already have, and it's
> > not even documented at the drm_gem_object_funcs level [1] :P.
> >   
> >>
> >> If that means that vmap is always called with the locks taken, then
> >> drm_gem_shmem_object_vmap can just assume that it will be called with
> >> the locks taken and there's no need to mention it in the name (and you
> >> can probably sprinkle a couple of lockdep assertions to make sure the
> >> locking is indeed consistent).  
> > 
> > Things get very confusing when you end up having drm_gem_shmem helpers
> > that are suffixed with _locked() to encode the fact locking is the
> > caller's responsibility and no suffix for the
> > callee-takes-care-of-the-locking semantics, while other helpers that are
> > not suffixed at all actually implement the
> > caller-should-take-care-of-the-locking semantics.
> >   
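A minimal userspace sketch of the naming convention discussed above, not the drm_gem_shmem code itself: every name here (demo_obj, demo_vmap, demo_vmap_locked) is hypothetical, the boolean stands in for the reservation lock, and assert() plays the role a lockdep assertion would play in the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a GEM object; `locked` models the reservation lock. */
struct demo_obj {
	bool locked;
	int vmap_count;
};

static void demo_lock(struct demo_obj *obj)
{
	assert(!obj->locked);
	obj->locked = true;
}

static void demo_unlock(struct demo_obj *obj)
{
	assert(obj->locked);
	obj->locked = false;
}

/* _locked suffix: the caller must already hold the lock. */
static int demo_vmap_locked(struct demo_obj *obj)
{
	assert(obj->locked);	/* plays the role of a lockdep assertion */
	return ++obj->vmap_count;
}

/* No suffix: the helper takes and releases the lock itself. */
static int demo_vmap(struct demo_obj *obj)
{
	int ret;

	demo_lock(obj);
	ret = demo_vmap_locked(obj);
	demo_unlock(obj);
	return ret;
}
```

With this split, a caller that already holds the lock uses demo_vmap_locked(), and everyone else uses demo_vmap(); a misuse trips the assertion instead of silently racing.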
> >>  
>  I'm not sure how reasonable it is, but I think I'd prefer to turn this
>  around and keep the drm_gem_shmem_object_vmap/unmap helpers name, and
>  convert whatever function needs to be converted to the unlock suffix so
>  we get a consistent naming.
> >>>
> >>> That would be an _unlocked() suffix if we do it the other way around. I
> >>> think the main confusion comes from the names of the hooks in
> >>> drm_gem_shmem_funcs. Some of them, like 

Re: [PATCH v3 10/17] drm/v3d: Detach the CSD job BO setup

2023-11-28 Thread Iago Toral
On Tue, 28-11-2023 at 07:47 -0300, Maira Canal wrote:
> Hi Iago,
> 
> On 11/28/23 05:42, Iago Toral wrote:
> > On Mon, 27-11-2023 at 15:48 -0300, Maíra Canal wrote:
> > > From: Melissa Wen 
> > > 
> > > Detach CSD job setup from CSD submission ioctl to reuse it in CPU
> > > submission ioctl for indirect CSD job.
> > > 
> > > Signed-off-by: Melissa Wen 
> > > Co-developed-by: Maíra Canal 
> > > Signed-off-by: Maíra Canal 
> > > ---
> > >   drivers/gpu/drm/v3d/v3d_submit.c | 68 -
> > > -
> > > --
> > >   1 file changed, 42 insertions(+), 26 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/v3d/v3d_submit.c
> > > b/drivers/gpu/drm/v3d/v3d_submit.c
> > > index c134b113b181..eb26fe1e27e3 100644
> > > --- a/drivers/gpu/drm/v3d/v3d_submit.c
> > > +++ b/drivers/gpu/drm/v3d/v3d_submit.c
> > > @@ -256,6 +256,45 @@
> > > v3d_attach_fences_and_unlock_reservation(struct
> > > drm_file *file_priv,
> > >  }
> > >   }
> > >   
> > > +static int
> > > +v3d_setup_csd_jobs_and_bos(struct drm_file *file_priv,
> > > +  struct v3d_dev *v3d,
> > > +  struct drm_v3d_submit_csd *args,
> > > +  struct v3d_csd_job **job,
> > > +  struct v3d_job **clean_job,
> > > +  struct v3d_submit_ext *se,
> > > +  struct ww_acquire_ctx *acquire_ctx)
> > > +{
> > > +   int ret;
> > > +
> > > +   ret = v3d_job_allocate((void *)job, sizeof(**job));
> > > +   if (ret)
> > > +   return ret;
> > > +
> > > +   ret = v3d_job_init(v3d, file_priv, &(*job)->base,
> > > +  v3d_job_free, args->in_sync, se,
> > > V3D_CSD);
> > > +   if (ret)
> > 
> > 
> > We should free the job here.
> > 
> > > +   return ret;
> > > +
> > > +   ret = v3d_job_allocate((void *)clean_job,
> > > sizeof(**clean_job));
> > > +   if (ret)
> > > +   return ret;
> > > +
> > > +   ret = v3d_job_init(v3d, file_priv, *clean_job,
> > > +  v3d_job_free, 0, NULL,
> > > V3D_CACHE_CLEAN);
> > > +   if (ret)
> > 
> > We should free job and clean_job here.
> > 
> > > +   return ret;
> > > +
> > > +   (*job)->args = *args;
> > > +
> > > +   ret = v3d_lookup_bos(>drm, file_priv, *clean_job,
> > > +    args->bo_handles, args-
> > > > bo_handle_count);
> > > +   if (ret)
> > 
> > Same here.
> > 
> > I think we probably want to have a fail label where we do this and
> > just
> > jump there from all the paths I mentioned above.
> 
> Actually, we are freeing the job in `v3d_submit_csd_ioctl`. Take a
> look
> here:
> 
>     ret = v3d_setup_csd_jobs_and_bos(file_priv, v3d, args,
>                                      &job, &clean_job, &se,
>                                      &acquire_ctx);
>     if (ret)
>             goto fail;
> 
> If `v3d_setup_csd_jobs_and_bos` fails, we go to fail.
> 
>
>     if (args->perfmon_id) {
>             job->base.perfmon = v3d_perfmon_find(v3d_priv,
>                                                  args->perfmon_id);
>             if (!job->base.perfmon) {
>                     ret = -ENOENT;
>                     goto fail_perfmon;
>             }
>     }
>
>     mutex_lock(&v3d->sched_lock);
>     v3d_push_job(&job->base);
>
>     ret = drm_sched_job_add_dependency(&clean_job->base,
>                                        dma_fence_get(job->base.done_fence));
>     if (ret)
>             goto fail_unreserve;
>
>     v3d_push_job(clean_job);
>     mutex_unlock(&v3d->sched_lock);
>
>     v3d_attach_fences_and_unlock_reservation(file_priv,
>                                              clean_job,
>                                              &acquire_ctx,
>                                              args->out_sync,
>                                              &se,
>                                              clean_job->done_fence);
>
>     v3d_job_put(&job->base);
>     v3d_job_put(clean_job);
>
>     return 0;
>
> fail_unreserve:
>     mutex_unlock(&v3d->sched_lock);
> fail_perfmon:
>     drm_gem_unlock_reservations(clean_job->bo, clean_job->bo_count,
>                                 &acquire_ctx);
> fail:
>     v3d_job_cleanup((void *)job);
>     v3d_job_cleanup(clean_job);
> 
> Here we cleanup `job` and `clean_job`. This will call `v3d_job_free`
> and
> free the jobs.


Ah, yes, ignore my previous comment then.

Iago
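For readers following the exchange above, here is a small userspace sketch of the pattern Maíra describes: the setup helper returns early without freeing, and the caller's single fail label cleans up whatever was allocated. All names (demo_job, demo_setup_jobs, demo_submit) are hypothetical, and the sketch assumes the cleanup helper tolerates a NULL pointer; it is not the v3d code.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct demo_job { int id; };

static int demo_freed;	/* counts frees so the behaviour can be observed */

/* Cleanup tolerates NULL, so it is safe on partially set-up state. */
static void demo_job_cleanup(struct demo_job *job)
{
	if (!job)
		return;
	free(job);
	demo_freed++;
}

/*
 * Allocates up to two jobs. On failure it returns early WITHOUT freeing:
 * whatever was allocated stays reachable through the out-pointers.
 * fail_at: 0 = succeed, 1 = fail after the first allocation,
 *          2 = fail after both allocations.
 */
static int demo_setup_jobs(struct demo_job **job,
			   struct demo_job **clean_job, int fail_at)
{
	*job = calloc(1, sizeof(**job));
	if (!*job)
		return -ENOMEM;
	if (fail_at == 1)
		return -EINVAL;

	*clean_job = calloc(1, sizeof(**clean_job));
	if (!*clean_job)
		return -ENOMEM;
	if (fail_at == 2)
		return -EINVAL;

	return 0;
}

static int demo_submit(int fail_at)
{
	struct demo_job *job = NULL, *clean_job = NULL;
	int ret;

	ret = demo_setup_jobs(&job, &clean_job, fail_at);
	if (ret)
		goto fail;

	/* Success: drop our references (modelled here by the same cleanup). */
	demo_job_cleanup(job);
	demo_job_cleanup(clean_job);
	return 0;

fail:
	/* One label covers every early return of the setup helper. */
	demo_job_cleanup(job);
	demo_job_cleanup(clean_job);
	return ret;
}
```

The trade-off is that the helper is only correct when every caller has such a label; a comment on the helper stating that ownership stays with the caller on error would make that contract explicit.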

> 
> Best Regards,
> - Maíra
> 
>     v3d_put_multisync_post_deps(&se);
>
>     return ret;
> 
> > 
> > > +   return ret;
> > > +
> > > +   return v3d_lock_bo_reservations(*clean_job, acquire_ctx);
> > > +}
> > > +
> > >   static void
> 

Re: [PATCH v2 05/12] drm/rockchip: vop2: Set YUV/RGB overlay mode

2023-11-28 Thread Andy Yan

Hi Sascha:

On 11/27/23 22:16, Sascha Hauer wrote:

On Wed, Nov 22, 2023 at 08:54:38PM +0800, Andy Yan wrote:

From: Andy Yan 

Set the overlay mode register according to whether the
output mode is YUV or RGB.

Signed-off-by: Andy Yan 
---

(no changes since v1)

  drivers/gpu/drm/rockchip/rockchip_drm_drv.h  |  1 +
  drivers/gpu/drm/rockchip/rockchip_drm_vop2.c | 19 ---
  2 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_drv.h 
b/drivers/gpu/drm/rockchip/rockchip_drm_drv.h
index 3d8ab2defa1b..7a58c5c9d4ec 100644
--- a/drivers/gpu/drm/rockchip/rockchip_drm_drv.h
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_drv.h
@@ -51,6 +51,7 @@ struct rockchip_crtc_state {
u32 bus_format;
u32 bus_flags;
int color_space;
+   bool yuv_overlay;

This struct already contains a bool type variable. Please add this one
next to it to keep the struct size smaller.



Okay, will do.
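To illustrate the struct-size remark, here is a sketch with two hypothetical layouts (the field names are illustrative and are not the real rockchip_crtc_state). On common ABIs, where uint32_t is 4-byte aligned and bool is one byte, separating the two bools usually costs extra padding, while grouping them lets them share one padding slot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layouts; field names are illustrative only. */
struct state_scattered {
	bool flag_a;
	uint32_t bus_format;
	bool flag_b;		/* separated from flag_a: typically extra padding */
	uint32_t bus_flags;
};

struct state_grouped {
	uint32_t bus_format;
	uint32_t bus_flags;
	bool flag_a;
	bool flag_b;		/* shares the trailing padding slot with flag_a */
};
```

The exact sizes are ABI-dependent, so the only portable expectation is that the grouped layout is no larger than the scattered one; a tool such as pahole can show the real holes in a kernel build.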




  };
  #define to_rockchip_crtc_state(s) \
container_of(s, struct rockchip_crtc_state, base)
diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_vop2.c 
b/drivers/gpu/drm/rockchip/rockchip_drm_vop2.c
index a019cc9bbd54..b32a291c5caa 100644
--- a/drivers/gpu/drm/rockchip/rockchip_drm_vop2.c
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_vop2.c
@@ -1612,6 +1612,8 @@ static void vop2_crtc_atomic_enable(struct drm_crtc *crtc,
  
  	vop2->enable_count++;
  
+	vcstate->yuv_overlay = is_yuv_output(vcstate->bus_format);

+
vop2_crtc_enable_irq(vp, VP_INT_POST_BUF_EMPTY);
  
  	polflags = 0;

@@ -1639,7 +1641,7 @@ static void vop2_crtc_atomic_enable(struct drm_crtc *crtc,
if (vop2_output_uv_swap(vcstate->bus_format, vcstate->output_mode))
dsp_ctrl |= RK3568_VP_DSP_CTRL__DSP_RB_SWAP;
  
-	if (is_yuv_output(vcstate->bus_format))

+   if (vcstate->yuv_overlay)
dsp_ctrl |= RK3568_VP_DSP_CTRL__POST_DSP_OUT_R2Y;
  
  	vop2_dither_setup(crtc, &dsp_ctrl);

@@ -1948,10 +1950,12 @@ static void vop2_setup_layer_mixer(struct 
vop2_video_port *vp)
u16 hdisplay;
u32 bg_dly;
u32 pre_scan_dly;
+   u32 ovl_ctrl;
int i;
struct vop2_video_port *vp0 = &vop2->vps[0];
struct vop2_video_port *vp1 = &vop2->vps[1];
struct vop2_video_port *vp2 = &vop2->vps[2];
+   struct rockchip_crtc_state *vcstate = 
to_rockchip_crtc_state(vp->crtc.state);
  
  	adjusted_mode = &vp->crtc.state->adjusted_mode;

hsync_len = adjusted_mode->crtc_hsync_end - 
adjusted_mode->crtc_hsync_start;
@@ -1964,7 +1968,14 @@ static void vop2_setup_layer_mixer(struct 
vop2_video_port *vp)
pre_scan_dly = ((bg_dly + (hdisplay >> 1) - 1) << 16) | hsync_len;
vop2_vp_write(vp, RK3568_VP_PRE_SCAN_HTIMING, pre_scan_dly);
  
-	vop2_writel(vop2, RK3568_OVL_CTRL, 0);

+   ovl_ctrl = vop2_readl(vop2, RK3568_OVL_CTRL);
+   if (vcstate->yuv_overlay)
+   ovl_ctrl |= BIT(vp->id);
+   else
+   ovl_ctrl &= ~BIT(vp->id);

Some

#define RK3568_OVL_CTRL__YUV_MODE(vp)   BIT(vp)

Would be nice.



Okay, will do.
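A sketch of what the suggested macro could look like in use, written as plain userspace C. The macro name comes from the review comment above; BIT() is redefined locally, and the helper ovl_ctrl_set_yuv_mode() is a hypothetical name that works on a register value rather than touching hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BIT(n)				(UINT32_C(1) << (n))
/* Name taken from the review suggestion; one YUV-mode bit per video port. */
#define RK3568_OVL_CTRL__YUV_MODE(vp)	BIT(vp)

/* Returns the new register value with only the given port's bit changed. */
static uint32_t ovl_ctrl_set_yuv_mode(uint32_t ovl_ctrl, unsigned int vp_id,
				      bool yuv)
{
	if (yuv)
		ovl_ctrl |= RK3568_OVL_CTRL__YUV_MODE(vp_id);
	else
		ovl_ctrl &= ~RK3568_OVL_CTRL__YUV_MODE(vp_id);
	return ovl_ctrl;
}
```

Because the helper is a pure read-modify-write on a value, the caller can compute the final ovl_ctrl once and issue a single register write at the end.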


+
+   vop2_writel(vop2, RK3568_OVL_CTRL, ovl_ctrl);

Is it necessary to write this register twice?


I don't think so. I just followed the original code, which writes it here.

Anyway, I will write it only once in the next version.


And would you please check my response about debugfs patch[0] when it is 
convenient for you?

I want to know what you think, and prepare the next version.


[0]https://patchwork.kernel.org/project/dri-devel/patch/20231122125601.3455031-1-andys...@163.com/




+
port_sel = vop2_readl(vop2, RK3568_OVL_PORT_SEL);
port_sel &= RK3568_OVL_PORT_SEL__SEL_PORT;
  
@@ -2036,9 +2047,11 @@ static void vop2_setup_layer_mixer(struct vop2_video_port *vp)

layer_sel |= RK3568_OVL_LAYER_SEL__LAYER(nlayer + ofs, 5);
}
  
+	ovl_ctrl |= RK3568_OVL_CTRL__LAYERSEL_REGDONE_IMD;

+
vop2_writel(vop2, RK3568_OVL_LAYER_SEL, layer_sel);
vop2_writel(vop2, RK3568_OVL_PORT_SEL, port_sel);
-   vop2_writel(vop2, RK3568_OVL_CTRL, 
RK3568_OVL_CTRL__LAYERSEL_REGDONE_IMD);
+   vop2_writel(vop2, RK3568_OVL_CTRL, ovl_ctrl);

Sascha



Re: [PATCH 10/10] ACPI: IORT: Allow COMPILE_TEST of IORT

2023-11-28 Thread Moritz Fischer

On Tue, Nov 28, 2023 at 08:48:06PM -0400, Jason Gunthorpe wrote:

The arm-smmu driver can COMPILE_TEST on x86, so expand this to also
enable the IORT code so it can be COMPILE_TEST'd too.



Signed-off-by: Jason Gunthorpe 
---
  drivers/acpi/Kconfig| 2 --
  drivers/acpi/Makefile   | 2 +-
  drivers/acpi/arm64/Kconfig  | 1 +
  drivers/acpi/arm64/Makefile | 2 +-
  drivers/iommu/Kconfig   | 1 +
  5 files changed, 4 insertions(+), 4 deletions(-)



diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
index f819e760ff195a..3b7f77b227d13a 100644
--- a/drivers/acpi/Kconfig
+++ b/drivers/acpi/Kconfig
@@ -541,9 +541,7 @@ config ACPI_PFRUT
  To compile the drivers as modules, choose M here:
  the modules will be called pfr_update and pfr_telemetry.



-if ARM64
  source "drivers/acpi/arm64/Kconfig"
-endif



  config ACPI_PPTT
bool
diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
index eaa09bf52f1760..4e77ae37b80726 100644
--- a/drivers/acpi/Makefile
+++ b/drivers/acpi/Makefile
@@ -127,7 +127,7 @@ obj-y   += pmic/
  video-objs+= acpi_video.o video_detect.o
  obj-y += dptf/



-obj-$(CONFIG_ARM64)+= arm64/
+obj-y  += arm64/



  obj-$(CONFIG_ACPI_VIOT)   += viot.o



diff --git a/drivers/acpi/arm64/Kconfig b/drivers/acpi/arm64/Kconfig
index b3ed6212244c1e..537d49d8ace69e 100644
--- a/drivers/acpi/arm64/Kconfig
+++ b/drivers/acpi/arm64/Kconfig
@@ -11,6 +11,7 @@ config ACPI_GTDT



  config ACPI_AGDI
bool "Arm Generic Diagnostic Dump and Reset Device Interface"
+   depends on ARM64
depends on ARM_SDE_INTERFACE
help
  Arm Generic Diagnostic Dump and Reset Device Interface (AGDI) is
diff --git a/drivers/acpi/arm64/Makefile b/drivers/acpi/arm64/Makefile
index 143debc1ba4a9d..71d0e635599390 100644
--- a/drivers/acpi/arm64/Makefile
+++ b/drivers/acpi/arm64/Makefile
@@ -4,4 +4,4 @@ obj-$(CONFIG_ACPI_IORT) += iort.o
  obj-$(CONFIG_ACPI_GTDT)   += gtdt.o
  obj-$(CONFIG_ACPI_APMT)   += apmt.o
  obj-$(CONFIG_ARM_AMBA)+= amba.o
-obj-y  += dma.o init.o
+obj-$(CONFIG_ARM64)+= dma.o init.o
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 7673bb82945b6c..309378e76a9bc9 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -318,6 +318,7 @@ config ARM_SMMU
select IOMMU_API
select IOMMU_IO_PGTABLE_LPAE
select ARM_DMA_USE_IOMMU if ARM
+   select ACPI_IORT if ACPI
help
  Support for implementations of the ARM System MMU architecture
  versions 1 and 2.
--
2.42.0



Reviewed-by: Moritz Fischer 

Ok, now the previous patch makes sense :)

Cheers,
Moritz


Re: [PATCH 09/10] ACPI: IORT: Cast from ULL to phys_addr_t

2023-11-28 Thread Moritz Fischer

On Tue, Nov 28, 2023 at 08:48:05PM -0400, Jason Gunthorpe wrote:

gcc on i386 (when compile testing) warns:


This is a weird test. The Makefile for drivers/acpi/arm64 is conditional
on CONFIG_ARM64. How does this happen?


8->8

obj-$(CONFIG_ARM64) += arm64/

8->8



  drivers/acpi/arm64/iort.c:2014:18: warning: implicit conversion  
from 'unsigned long long' to 'phys_addr_t' (aka 'unsigned int') changes  
value from 18446744073709551615 to 4294967295 [-Wconstant-conversion]
local_limit =  
DMA_BIT_MASK(ncomp->memory_address_limit);



Because DMA_BIT_MASK returns a large ULL constant. Explicitly truncate it
to phys_addr_t.
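A small userspace sketch of the truncation being discussed. The macro has the same shape as the kernel's DMA_BIT_MASK(); demo_phys_addr_t and demo_limit_from_bits() are hypothetical names, and the 32-bit typedef is an assumption that models the i386 case from the warning.

```c
#include <assert.h>
#include <stdint.h>

/* Same shape as the kernel's DMA_BIT_MASK(): an unsigned long long constant. */
#define DMA_BIT_MASK(n)	(((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Assumption for illustration: a 32-bit phys_addr_t, as in the i386 warning. */
typedef uint32_t demo_phys_addr_t;

static demo_phys_addr_t demo_limit_from_bits(unsigned int bits)
{
	/* The explicit cast documents that truncation is intended. */
	return (demo_phys_addr_t)DMA_BIT_MASK(bits);
}
```

With a 64-bit mask the value is clamped to the widest address the 32-bit type can hold, which is the behaviour the patch makes explicit rather than implicit.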



Signed-off-by: Jason Gunthorpe 
---
  drivers/acpi/arm64/iort.c | 6 --
  1 file changed, 4 insertions(+), 2 deletions(-)



diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index 6496ff5a6ba20d..bdaf9256870d92 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -2011,7 +2011,8 @@ phys_addr_t __init  
acpi_iort_dma_get_max_cpu_address(void)



case ACPI_IORT_NODE_NAMED_COMPONENT:
ncomp = (struct acpi_iort_named_component 
*)node->node_data;
-   local_limit = DMA_BIT_MASK(ncomp->memory_address_limit);
+   local_limit = (phys_addr_t)DMA_BIT_MASK(
+   ncomp->memory_address_limit);
limit = min_not_zero(limit, local_limit);
break;


@@ -2020,7 +2021,8 @@ phys_addr_t __init  
acpi_iort_dma_get_max_cpu_address(void)

break;



rc = (struct acpi_iort_root_complex *)node->node_data;
-   local_limit = DMA_BIT_MASK(rc->memory_address_limit);
+   local_limit = (phys_addr_t)DMA_BIT_MASK(
+   rc->memory_address_limit);
limit = min_not_zero(limit, local_limit);
break;
}
--
2.42.0



Cheers,
Moritz


Re: [PATCH 07/10] acpi: Do not return struct iommu_ops from acpi_iommu_configure_id()

2023-11-28 Thread Moritz Fischer

On Tue, Nov 28, 2023 at 08:48:03PM -0400, Jason Gunthorpe wrote:

Nothing needs this pointer. Return a normal error code with the usual
IOMMU semantic that ENODEV means 'there is no IOMMU driver'.
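The return-code convention described here can be sketched in userspace as follows. The function names are hypothetical stand-ins, and EPROBE_DEFER is defined locally because it is a kernel-internal errno value that userspace headers do not provide.

```c
#include <assert.h>
#include <errno.h>

/* EPROBE_DEFER is kernel-internal; defined here only so the sketch builds. */
#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517
#endif

/* Hypothetical stand-in for the configure step. */
static int demo_iommu_configure(int backend_err, int have_ops)
{
	if (backend_err == -EPROBE_DEFER)
		return backend_err;	/* the only error passed through unchanged */
	if (backend_err)
		return -ENODEV;		/* other failures read as "no IOMMU driver" */
	return have_ops ? 0 : -ENODEV;
}

/* Caller: only a probe deferral aborts; any other outcome is non-fatal. */
static int demo_dma_configure(int backend_err, int have_ops)
{
	int ret = demo_iommu_configure(backend_err, have_ops);

	if (ret == -EPROBE_DEFER)
		return -EPROBE_DEFER;
	return 0;
}
```

Returning a plain int removes the need for the caller to mix NULL checks with PTR_ERR() on the same pointer.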



Acked-by: Rafael J. Wysocki 
Reviewed-by: Jerry Snitselaar 
Tested-by: Hector Martin 
Signed-off-by: Jason Gunthorpe 
---
  drivers/acpi/scan.c | 29 +
  1 file changed, 17 insertions(+), 12 deletions(-)



diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
index 444a0b3c72f2d8..340ba720c72129 100644
--- a/drivers/acpi/scan.c
+++ b/drivers/acpi/scan.c
@@ -1562,8 +1562,7 @@ static inline const struct iommu_ops  
*acpi_iommu_fwspec_ops(struct device *dev)

return fwspec ? fwspec->ops : NULL;
  }


-static const struct iommu_ops *acpi_iommu_configure_id(struct device  
*dev,

-  const u32 *id_in)
+static int acpi_iommu_configure_id(struct device *dev, const u32 *id_in)
  {
int err;
const struct iommu_ops *ops;
@@ -1577,7 +1576,7 @@ static const struct iommu_ops  
*acpi_iommu_configure_id(struct device *dev,

ops = acpi_iommu_fwspec_ops(dev);
if (ops) {
mutex_unlock(&iommu_probe_device_lock);
-   return ops;
+   return 0;
}



err = iort_iommu_configure_id(dev, id_in);
@@ -1594,12 +1593,14 @@ static const struct iommu_ops  
*acpi_iommu_configure_id(struct device *dev,



/* Ignore all other errors apart from EPROBE_DEFER */
if (err == -EPROBE_DEFER) {
-   return ERR_PTR(err);
+   return err;
} else if (err) {
dev_dbg(dev, "Adding to IOMMU failed: %d\n", err);
-   return NULL;
+   return -ENODEV;
}
-   return acpi_iommu_fwspec_ops(dev);
+   if (!acpi_iommu_fwspec_ops(dev))
+   return -ENODEV;
+   return 0;
  }



  #else /* !CONFIG_IOMMU_API */
@@ -1611,10 +1612,9 @@ int acpi_iommu_fwspec_init(struct device *dev, u32  
id,

return -ENODEV;
  }


-static const struct iommu_ops *acpi_iommu_configure_id(struct device  
*dev,

-  const u32 *id_in)
+static int acpi_iommu_configure_id(struct device *dev, const u32 *id_in)
  {
-   return NULL;
+   return -ENODEV;
  }



  #endif /* !CONFIG_IOMMU_API */
@@ -1628,7 +1628,7 @@ static const struct iommu_ops  
*acpi_iommu_configure_id(struct device *dev,

  int acpi_dma_configure_id(struct device *dev, enum dev_dma_attr attr,
  const u32 *input_id)
  {
-   const struct iommu_ops *iommu;
+   int ret;



if (attr == DEV_DMA_NOT_SUPPORTED) {
set_dma_ops(dev, &dma_dummy_ops);
@@ -1637,10 +1637,15 @@ int acpi_dma_configure_id(struct device *dev,  
enum dev_dma_attr attr,



acpi_arch_dma_setup(dev);



-   iommu = acpi_iommu_configure_id(dev, input_id);
-   if (PTR_ERR(iommu) == -EPROBE_DEFER)
+   ret = acpi_iommu_configure_id(dev, input_id);
+   if (ret == -EPROBE_DEFER)
return -EPROBE_DEFER;



+   /*
+* Historically this routine doesn't fail driver probing due to errors
+* in acpi_iommu_configure_id()
+*/
+
arch_setup_dma_ops(dev, 0, U64_MAX, attr == DEV_DMA_COHERENT);



return 0;
--
2.42.0



Reviewed-by: Moritz Fischer 

Cheers,
Moritz


Re: [PATCH 04/10] iommu: Mark dev_iommu_get() with lockdep

2023-11-28 Thread Moritz Fischer

On Tue, Nov 28, 2023 at 08:48:00PM -0400, Jason Gunthorpe wrote:

Allocation of dev->iommu must be done under the
iommu_probe_device_lock. Mark this with lockdep to discourage future
mistakes.



Reviewed-by: Jerry Snitselaar 
Tested-by: Hector Martin 
Signed-off-by: Jason Gunthorpe 
---
  drivers/iommu/iommu.c | 2 ++
  1 file changed, 2 insertions(+)



diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 0d25468d53a68a..4323b6276e977f 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -334,6 +334,8 @@ static struct dev_iommu *dev_iommu_get(struct device  
*dev)

  {
struct dev_iommu *param = dev->iommu;



+   lockdep_assert_held(&iommu_probe_device_lock);
+
if (param)
return param;



--
2.42.0



Reviewed-by: Moritz Fischer 

Cheers,
Moritz


Re: [PATCH 03/10] iommu/of: Use -ENODEV consistently in of_iommu_configure()

2023-11-28 Thread Moritz Fischer

On Tue, Nov 28, 2023 at 08:47:59PM -0400, Jason Gunthorpe wrote:

Instead of returning 1 and trying to handle positive error codes just
stick to the convention of returning -ENODEV. Remove references to ops
from of_iommu_configure(), a NULL ops will already generate an error code.



There is no reason to check dev->bus, if err=0 at this point then the
called configure functions thought there was an iommu and we should try to
probe it. Remove it.



Reviewed-by: Jerry Snitselaar 
Tested-by: Hector Martin 
Signed-off-by: Jason Gunthorpe 
---
  drivers/iommu/of_iommu.c | 49 
  1 file changed, 15 insertions(+), 34 deletions(-)



diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
index c6510d7e7b241b..164317bfb8a81f 100644
--- a/drivers/iommu/of_iommu.c
+++ b/drivers/iommu/of_iommu.c
@@ -17,8 +17,6 @@
  #include 
  #include 



-#define NO_IOMMU   1
-
  static int of_iommu_xlate(struct device *dev,
  struct of_phandle_args *iommu_spec)
  {
@@ -29,7 +27,7 @@ static int of_iommu_xlate(struct device *dev,
ops = iommu_ops_from_fwnode(fwnode);
if ((ops && !ops->of_xlate) ||
!of_device_is_available(iommu_spec->np))
-   return NO_IOMMU;
+   return -ENODEV;



ret = iommu_fwspec_init(dev, &iommu_spec->np->fwnode, ops);
if (ret)
@@ -61,7 +59,7 @@ static int of_iommu_configure_dev_id(struct device_node *master_np,
 "iommu-map-mask", &iommu_spec.np,
 iommu_spec.args);
if (err)
-   return err == -ENODEV ? NO_IOMMU : err;
+   return err;



err = of_iommu_xlate(dev, &iommu_spec);
of_node_put(iommu_spec.np);
@@ -72,7 +70,7 @@ static int of_iommu_configure_dev(struct device_node  
*master_np,

  struct device *dev)
  {
struct of_phandle_args iommu_spec;
-   int err = NO_IOMMU, idx = 0;
+   int err = -ENODEV, idx = 0;



while (!of_parse_phandle_with_args(master_np, "iommus",
   "#iommu-cells",
@@ -117,9 +115,8 @@ static int of_iommu_configure_device(struct  
device_node *master_np,

  int of_iommu_configure(struct device *dev, struct device_node *master_np,
   const u32 *id)
  {
-   const struct iommu_ops *ops = NULL;
struct iommu_fwspec *fwspec;
-   int err = NO_IOMMU;
+   int err;



if (!master_np)
return -ENODEV;
@@ -153,37 +150,21 @@ int of_iommu_configure(struct device *dev, struct  
device_node *master_np,

} else {
err = of_iommu_configure_device(master_np, dev, id);
}
-
-   /*
-* Two success conditions can be represented by non-negative err here:
-* >0 : there is no IOMMU, or one was unavailable for non-fatal reasons
-*  0 : we found an IOMMU, and dev->fwspec is initialised appropriately
-* <0 : any actual error
-*/
-   if (!err) {
-   /* The fwspec pointer changed, read it again */
-   fwspec = dev_iommu_fwspec_get(dev);
-   ops= fwspec->ops;
-   }
mutex_unlock(&iommu_probe_device_lock);



-   /*
-* If we have reason to believe the IOMMU driver missed the initial
-* probe for dev, replay it to get things in order.
-*/
-   if (!err && dev->bus)
-   err = iommu_probe_device(dev);
-
-   /* Ignore all other errors apart from EPROBE_DEFER */
-   if (err < 0) {
-   if (err == -EPROBE_DEFER)
-   return err;
-   dev_dbg(dev, "Adding to IOMMU failed: %pe\n", ERR_PTR(err));
+   if (err == -ENODEV || err == -EPROBE_DEFER)
return err;
-   }
-   if (!ops)
-   return -ENODEV;
+   if (err)
+   goto err_log;
+
+   err = iommu_probe_device(dev);
+   if (err)
+   goto err_log;
return 0;
+
+err_log:
+   dev_dbg(dev, "Adding to IOMMU failed: %pe\n", ERR_PTR(err));
+   return err;
  }



  static enum iommu_resv_type __maybe_unused
--
2.42.0



Reviewed-by: Moritz Fischer 


Re: [RFC PATCH 0/6] Supporting GMEM (generalized memory management) for external memory devices

2023-11-28 Thread Dave Airlie
On Tue, 28 Nov 2023 at 23:07, Christian König  wrote:
>
> On 28.11.23 at 13:50, Weixi Zhu wrote:
> > The problem:
> >
> > Accelerator driver developers are forced to reinvent external MM subsystems
> > case by case, because Linux core MM only considers host memory resources.
> > These reinvented MM subsystems have similar orders of magnitude of LoC as
> > Linux MM (80K), e.g. Nvidia-UVM has 70K, AMD GPU has 14K and Huawei NPU has
> > 30K. Meanwhile, more and more vendors are implementing their own
> > accelerators, e.g. Microsoft's Maia 100. At the same time,
> > application-level developers suffer from poor programmability -- they must
> > consider parallel address spaces and be careful about the limited device
> > DRAM capacity. This can be alleviated if a malloc()-ed virtual address can
> > be shared by the accelerator, or the abundant host DRAM can further
> > transparently backup the device local memory.
> >
> > These external MM systems share similar mechanisms except for the
> > hardware-dependent part, so reinventing them is effectively introducing
> > redundant code (14K~70K for each case). Such developing/maintaining is not
> > cheap. Furthermore, to share a malloc()-ed virtual address, device drivers
> > need to deeply interact with Linux MM via low-level MM APIs, e.g. MMU
> > notifiers/HMM. This raises the bar for driver development, since developers
> > must understand how Linux MM works. Further, it creates code maintenance
> > problems -- any changes to Linux MM potentially require coordinated changes
> > to accelerator drivers using low-level MM APIs.
> >
> > Putting a cache-coherent bus between host and device will not make these
> > external MM subsystems disappear. For example, a throughput-oriented
> > accelerator will not tolerate executing heavy memory access workload with
> > a host MMU/IOMMU via a remote bus. Therefore, devices will still have
> > their own MMU and pick a simpler page table format for lower address
> > translation overhead, requiring external MM subsystems.
> >
> > 
> >
> > What GMEM (Generalized Memory Management [1]) does:
> >
> > GMEM extends Linux MM to share its machine-independent MM code. Only
> > high-level interface is provided for device drivers. This prevents
> > accelerator drivers from reinventing the wheel, but relies on drivers to
> > implement their hardware-dependent functions declared by GMEM. GMEM's key
> > interfaces include gm_dev_create(), gm_as_create(), gm_as_attach() and
> > gm_dev_register_physmem(). Here is a brief description of how a device
> > driver utilizes them:
> > 1. At boot time, call gm_dev_create() and register the implementation of
> > hardware-dependent functions as declared in struct gm_mmu.
> >   - If the device has local DRAM, call gm_dev_register_physmem() to
> > register available physical addresses.
> > 2. When a device context is initialized (e.g. triggered by ioctl), check if
> > the current CPU process has been attached to a gmem address space
> > (struct gm_as). If not, call gm_as_create() and point current->mm->gm_as
> > to it.
> > 3. Call gm_as_attach() to attach the device context to a gmem address space.
> > 4. Invoke gm_dev_fault() to resolve a page fault or prepare data before
> > device computation happens.
> >
> > GMEM has changed the following assumptions in Linux MM:
> >1. An mm_struct not only handles a single CPU context, but may also handle
> >   external memory contexts encapsulated as gm_context listed in
> >   mm->gm_as. An external memory context can include a few or all of the
> >   following parts: an external MMU (that requires TLB invalidation), an
> >   external page table (that requires PTE manipulation) and external DRAM
> >   (that requires physical memory management).
>
> Well that is pretty much exactly what AMD has already proposed with KFD
> and was rejected for rather good reasons.

> >
> > MMU functions
> > The MMU functions peer_map() and peer_unmap() overlap other functions,
> > leaving a question if the MMU functions should be decoupled as more basic
> > operations. Decoupling them could potentially prevent device drivers
> > coalescing these basic steps within a single host-device communication
> > operation, while coupling them makes it more difficult for device drivers
> > to utilize GMEM interface.
>
> Well to be honest all of this sounds like history to me. We have already
> seen the same basic approach in KFD, HMM and to some extent in TTM as well.
>
> And all of them more or less failed. Why should this here be different?


Any info we have on why this has failed to work in the past would be
useful to provide. This is one of those cases where we may not have
documented the bad ideas to stop future developers from thinking they
are good.

I do think we would want more common code in this area, but I would
think we'd have it more on the driver infrastructure side, than in the
core mm.

Dave.


Re: Radeon regression in 6.6 kernel

2023-11-28 Thread Luben Tuikov
On 2023-11-28 17:13, Alex Deucher wrote:
> On Mon, Nov 27, 2023 at 6:24 PM Phillip Susi  wrote:
>>
>> Alex Deucher  writes:
>>
 In that case those are the already known problems with the scheduler
 changes, aren't they?
>>>
>>> Yes.  Those changes went into 6.7 though, not 6.6 AFAIK.  Maybe I'm
>>> misunderstanding what the original report was actually testing.  If it
>>> was 6.7, then try reverting:
>>> 56e449603f0ac580700621a356d35d5716a62ce5
>>> b70438004a14f4d0f9890b3297cd66248728546c
>>
>> At some point it was suggested that I file a gitlab issue, but I took
>> this to mean it was already known and being worked on.  -rc3 came out
>> today and still has the problem.  Is there a known issue I could track?
>>
> 
> At this point, unless there are any objections, I think we should just
> revert the two patches
Uhm, no.

Why "the two" patches?

This email, part of this thread,

https://lore.kernel.org/all/87r0kircdo@vps.thesusis.net/

clearly states that reverting *only* this commit,
56e449603f0ac5 drm/sched: Convert the GPU scheduler to variable number of 
run-queues
*does not* mitigate the failed suspend. (Furthermore, this commit doesn't 
really change
anything operational, other than using an allocated array, instead of a static 
one, in DRM,
while the 2nd patch is solely contained within the amdgpu driver code.)

Leaving us with only this change,
b70438004a14f4 drm/amdgpu: move buffer funcs setting up a level
to be at fault, as the kernel log attached in the linked email above shows.

The conclusion is that only b70438004a14f4 needs reverting.
-- 
Regards,
Luben




Re: [PATCH v2 1/6] dt-bindings: display: Add yamls for JH7110 display system

2023-11-28 Thread Keith Zhao



On 2023/10/25 20:50, Krzysztof Kozlowski wrote:
> On 25/10/2023 12:39, Keith Zhao wrote:
>> StarFive SoCs JH7110 display system:
> 
> A nit, subject: drop second/last, redundant "yamls for". The
> "dt-bindings" prefix is already stating that these are bindings, so
> format is fixed.
> 
>> lcd-controller bases verisilicon dc8200 IP,
>> and hdmi bases Innosilicon IP. Add bindings for them.
> 
> Please make it a proper sentences, with proper wrapping.
> 
>> 
>> also update MAINTAINERS for dt-bindings
> 
> Not a sentence, but also not really needed.
ok I see.
> 
>> 
>> about this patch, I tested the dtbs_check and dt_binding_check
>> with the result pass.
>> Based on the feedback of the previous version, the corresponding arrangement 
>> is made
> 
> Not relevant, so not really suitable for commit msg.
> 
>> 
>> Signed-off-by: Keith Zhao 
>> ---
>>  .../starfive/starfive,display-subsystem.yaml  |  41 +++
>>  .../starfive/starfive,jh7110-dc8200.yaml  | 109 ++
>>  .../starfive/starfive,jh7110-inno-hdmi.yaml   |  85 ++
>>  MAINTAINERS   |   7 ++
>>  4 files changed, 242 insertions(+)
>>  create mode 100644 
>> Documentation/devicetree/bindings/display/starfive/starfive,display-subsystem.yaml
>>  create mode 100644 
>> Documentation/devicetree/bindings/display/starfive/starfive,jh7110-dc8200.yaml
>>  create mode 100644 
>> Documentation/devicetree/bindings/display/starfive/starfive,jh7110-inno-hdmi.yaml
>> 
>> diff --git 
>> a/Documentation/devicetree/bindings/display/starfive/starfive,display-subsystem.yaml
>>  
>> b/Documentation/devicetree/bindings/display/starfive/starfive,display-subsystem.yaml
>> new file mode 100644
>> index 0..f45b97b08
>> --- /dev/null
>> +++ 
>> b/Documentation/devicetree/bindings/display/starfive/starfive,display-subsystem.yaml
>> @@ -0,0 +1,41 @@
>> +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
>> +%YAML 1.2
>> +---
>> +$id: 
>> http://devicetree.org/schemas/display/starfive/starfive,display-subsystem.yaml#
>> +$schema: http://devicetree.org/meta-schemas/core.yaml#
>> +
>> +title: Starfive DRM master device
> 
> What is DRM in hardware? I know Digital Rights Management, but then
> subsystem seems wrong. If you mean Linux DRM, then Linux is not a
> hardware, so drop all Linuxisms and describe hardware.
OK, I will keep only the hardware description in my next version.
> 
> 
>> +
>> +maintainers:
>> +  - Keith Zhao 
>> +  - ShengYang Chen 
>> +
>> +description:
>> +  The Starfive DRM master device is a virtual device needed to list all
> 
> Virtual device? Then not suitable for bindings, sorry.
> 
>> +  display controller or other display interface nodes that comprise the
>> +  graphics subsystem.
>> +
>> +properties:
>> +  compatible:
>> +const: starfive,display-subsystem
>> +
>> +  ports:
>> +$ref: /schemas/types.yaml#/definitions/phandle-array
> 
> No, ports is not phandle-array. ports is object, always.
> 
>> +description:
>> +  Should contain a list of phandles pointing to display interface ports
>> +  of display controller devices. Display controller definitions as 
>> defined
>> +  in Documentation/devicetree/bindings/display/starfive/
>> +  starfive,jh7110-dc8200.yaml
> 
> Use standard graph ports, not some own, custom property.
> 
> Anyway, entire binding should be dropped. You do not need it even.
Hi Krzysztof:
A virtual device is not suitable for bindings, so maybe I need to associate it
with the real hardware, such as the top clocks & resets, IRQs, etc.
Currently I configure them in another yaml file. Logically speaking, this is
more suitable.

Can adding the corresponding hardware description save the binding from being
dropped?
 
> 
>> +
>> +required:
>> +  - compatible
>> +  - ports
>> +
>> +additionalProperties: false
>> +
>> +examples:
>> +  - |
>> +display-subsystem {
>> +compatible = "starfive,display-subsystem";
>> +ports = <_out>;
>> +};
>> diff --git 
>> a/Documentation/devicetree/bindings/display/starfive/starfive,jh7110-dc8200.yaml
>>  
>> b/Documentation/devicetree/bindings/display/starfive/starfive,jh7110-dc8200.yaml
>> new file mode 100644
>> index 0..87051cddf
>> --- /dev/null
>> +++ 
>> b/Documentation/devicetree/bindings/display/starfive/starfive,jh7110-dc8200.yaml
>> @@ -0,0 +1,109 @@
>> +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
>> +%YAML 1.2
>> +---
>> +$id: 
>> http://devicetree.org/schemas/display/starfive/starfive,jh7110-dc8200.yaml#
>> +$schema: http://devicetree.org/meta-schemas/core.yaml#
>> +
>> +title: StarFive display controller
>> +
>> +description:
>> +  The StarFive SoC uses the display controller based on Verisilicon IP
>> +  to transfer the image data from a video memory buffer to an external
>> +  LCD interface.
>> +
>> +maintainers:
>> +  - Keith Zhao 
>> +
>> +properties:
>> +  compatible:
>> +const: starfive,jh7110-dc8200
>> +
>> +  reg:
>> +minItems: 1
>> +items:
>> +  - 

Re: [PATCH 04/10] iommu: Mark dev_iommu_get() with lockdep

2023-11-28 Thread Baolu Lu

On 2023/11/29 8:48, Jason Gunthorpe wrote:

Allocation of dev->iommu must be done under the
iommu_probe_device_lock. Mark this with lockdep to discourage future
mistakes.

Reviewed-by: Jerry Snitselaar
Tested-by: Hector Martin
Signed-off-by: Jason Gunthorpe
---
  drivers/iommu/iommu.c | 2 ++
  1 file changed, 2 insertions(+)


Reviewed-by: Lu Baolu 

Best regards,
baolu


Re: [PATCH 07/10] acpi: Do not return struct iommu_ops from acpi_iommu_configure_id()

2023-11-28 Thread Baolu Lu

On 2023/11/29 8:48, Jason Gunthorpe wrote:

Nothing needs this pointer. Return a normal error code with the usual
IOMMU semantic that ENODEV means 'there is no IOMMU driver'.

Acked-by: Rafael J. Wysocki
Reviewed-by: Jerry Snitselaar
Tested-by: Hector Martin
Signed-off-by: Jason Gunthorpe
---
  drivers/acpi/scan.c | 29 +
  1 file changed, 17 insertions(+), 12 deletions(-)


Reviewed-by: Lu Baolu 

Best regards,
baolu


Re: [PATCH 02/10] iommmu/of: Do not return struct iommu_ops from of_iommu_configure()

2023-11-28 Thread Baolu Lu

On 2023/11/29 8:47, Jason Gunthorpe wrote:

Nothing needs this pointer. Return a normal error code with the usual
IOMMU semantic that ENODEV means 'there is no IOMMU driver'.

Reviewed-by: Jerry Snitselaar
Acked-by: Rob Herring
Tested-by: Hector Martin
Signed-off-by: Jason Gunthorpe
---
  drivers/iommu/of_iommu.c | 31 +++
  drivers/of/device.c  | 22 +++---
  include/linux/of_iommu.h | 13 ++---
  3 files changed, 40 insertions(+), 26 deletions(-)


Reviewed-by: Lu Baolu 

Best regards,
baolu


Friendly ping. I think this patch was forgotten.//Re: [PATCH] drm/qxl: remove unused declaration

2023-11-28 Thread 何敏红
Friendly ping. I think this patch was forgotten.


 

Subject: [PATCH] drm/qxl: remove unused declaration
Date: 2023-11-10 13:50
From: 何敏红
To: airlied; kraxel; maarten.lankhorst; mripard; tzimmermann; airlied; daniel



Some functions are never used by the driver. Remove their declarations to
reduce program size and improve code readability and maintainability.

Signed-off-by: heminhong 
---
 drivers/gpu/drm/qxl/qxl_drv.h | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/drivers/gpu/drm/qxl/qxl_drv.h b/drivers/gpu/drm/qxl/qxl_drv.h
index 307a890fde13..32069acd93f8 100644
--- a/drivers/gpu/drm/qxl/qxl_drv.h
+++ b/drivers/gpu/drm/qxl/qxl_drv.h
@@ -119,7 +119,6 @@ struct qxl_output {
 #define to_qxl_crtc(x) container_of(x, struct qxl_crtc, base)
 #define drm_connector_to_qxl_output(x) container_of(x, struct qxl_output, base)
-#define drm_encoder_to_qxl_output(x) container_of(x, struct qxl_output, enc)
 struct qxl_mman {
 struct ttm_device bdev;
@@ -256,8 +255,6 @@ struct qxl_device {
 #define to_qxl(dev) container_of(dev, struct qxl_device, ddev)
-int qxl_debugfs_fence_init(struct qxl_device *rdev);
-
 int qxl_device_init(struct qxl_device *qdev, struct pci_dev *pdev);
 void qxl_device_fini(struct qxl_device *qdev);
@@ -344,8 +341,6 @@ qxl_image_alloc_objects(struct qxl_device *qdev,
 int height, int stride);
 void qxl_image_free_objects(struct qxl_device *qdev, struct qxl_drm_image *dimage);
-void qxl_update_screen(struct qxl_device *qxl);
-
 /* qxl io operations (qxl_cmd.c) */
 void qxl_io_create_primary(struct qxl_device *qdev,
@@ -445,8 +440,6 @@ int qxl_hw_surface_dealloc(struct qxl_device *qdev,
 int qxl_bo_check_id(struct qxl_device *qdev, struct qxl_bo *bo);
-struct qxl_drv_surface *
-qxl_surface_lookup(struct drm_device *dev, int surface_id);
 void qxl_surface_evict(struct qxl_device *qdev, struct qxl_bo *surf, bool freeing);
 /* qxl_ioctl.c */
-- 
2.25.1




Re: [PATCH v2] drm/i915: correct the input parameter on _intel_dsb_commit()

2023-11-28 Thread 何敏红
Friendly ping. I think this patch was forgotten.


 

Subject: [PATCH v2] drm/i915: correct the input parameter on _intel_dsb_commit()
Date: 2023-11-14 10:43
From: 何敏红
To: 何敏红



Currently, the dewake_scanline parameter is declared as unsigned int, so it
is always greater than or equal to 0. However, when _intel_dsb_commit() is
called from intel_dsb_commit(), the value passed for dewake_scanline may be
a signed int. Declare dewake_scanline as int instead.

Fixes: f83b94d23770 ("drm/i915/dsb: Use DEwake to combat PkgC latency")
Reported-by: kernel test robot 
Closes: https://lore.kernel.org/oe-kbuild-all/202310052201.anvbpgpr-...@intel.com/
Cc: Ville Syrjälä 
Cc: Uma Shankar 
Signed-off-by: heminhong 
---
 drivers/gpu/drm/i915/display/intel_dsb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/display/intel_dsb.c b/drivers/gpu/drm/i915/display/intel_dsb.c
index 78b6fe24dcd8..7fd6280c54a7 100644
--- a/drivers/gpu/drm/i915/display/intel_dsb.c
+++ b/drivers/gpu/drm/i915/display/intel_dsb.c
@@ -340,7 +340,7 @@ static int intel_dsb_dewake_scanline(const struct intel_crtc_state *crtc_state)
 }
 static void _intel_dsb_commit(struct intel_dsb *dsb, u32 ctrl,
- unsigned int dewake_scanline)
+ int dewake_scanline)
 {
 struct intel_crtc *crtc = dsb->crtc;
 struct drm_i915_private *dev_priv = to_i915(crtc->base.dev);
-- 
2.25.1




Re: [Nouveau] [PATCH][next] nouveau/gsp: replace zero-length array with flex-array member and use __counted_by

2023-11-28 Thread Danilo Krummrich

On 11/29/23 02:06, Gustavo A. R. Silva wrote:



On 11/28/23 19:01, Danilo Krummrich wrote:

On 11/16/23 20:55, Timur Tabi wrote:

On Thu, 2023-11-16 at 20:45 +0100, Danilo Krummrich wrote:

As I already mentioned for Timur's patch [2], I'd prefer to get a fix
upstream
(meaning [1] in this case). Of course, that's probably more up to Timur to
tell
if this will work out.


Don't count on it.


I see. Well, I think it's fine. Once we implement a decent abstraction we likely
don't need those header files in the kernel anymore.

@Gustavo, if you agree I will discard the indentation change when applying the
patch to keep the diff as small as possible.


No problem.


Applied to drm-misc-fixes.



Thanks
--
Gustavo






Re: [PATCH] nouveau/gsp/r535: remove a stray unlock in r535_gsp_rpc_send()

2023-11-28 Thread Danilo Krummrich

On 11/27/23 13:56, Dan Carpenter wrote:

This unlock doesn't belong here and it leads to a double unlock in
the caller, r535_gsp_rpc_push().

Fixes: 176fdcbddfd2 ("drm/nouveau/gsp/r535: add support for booting GSP-RM")
Signed-off-by: Dan Carpenter 


Good catch - applied to drm-misc-fixes.

- Danilo


---
  drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c | 4 +---
  1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c 
b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
index dc44f5c7833f..818e5c73b7a6 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
@@ -365,10 +365,8 @@ r535_gsp_rpc_send(struct nvkm_gsp *gsp, void *argv, bool 
wait, u32 repc)
}
  
  	ret = r535_gsp_cmdq_push(gsp, rpc);

-   if (ret) {
-   mutex_unlock(&gsp->cmdq.mutex);
+   if (ret)
return ERR_PTR(ret);
-   }
  
  	if (wait) {

msg = r535_gsp_msg_recv(gsp, fn, repc);
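
The double unlock described in this patch can be illustrated with a small
userspace sketch. Everything below is hypothetical (mock_mutex, send_buggy,
send_fixed and push are made-up names, not nouveau code); it only assumes what
the commit message states, namely that the caller owns the lock and the callee
used to release it on the error path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mock lock that records unlocks of a lock that is not held. */
struct mock_mutex {
    bool held;
    int bad_unlocks;
};

static void mock_lock(struct mock_mutex *m)
{
    assert(!m->held);
    m->held = true;
}

static void mock_unlock(struct mock_mutex *m)
{
    if (!m->held)
        m->bad_unlocks++;   /* unlock without a matching lock */
    m->held = false;
}

/* Callee before the fix: releases the caller's lock on the error path. */
static int send_buggy(struct mock_mutex *m, int push_ret)
{
    if (push_ret) {
        mock_unlock(m);
        return push_ret;
    }
    return 0;
}

/* Callee after the fix: leaves lock handling entirely to the caller. */
static int send_fixed(struct mock_mutex *m, int push_ret)
{
    (void)m;
    return push_ret;
}

/* Caller: takes the lock, calls the callee, always releases the lock. */
static int push(struct mock_mutex *m,
                int (*send)(struct mock_mutex *, int), int push_ret)
{
    int ret;

    mock_lock(m);
    ret = send(m, push_ret);
    mock_unlock(m);
    return ret;
}

/* Usage: the buggy callee causes one unbalanced unlock, the fixed one none. */
void run_unlock_demo(void)
{
    struct mock_mutex a = { false, 0 };
    struct mock_mutex b = { false, 0 };

    push(&a, send_buggy, -5);
    push(&b, send_fixed, -5);
    assert(a.bad_unlocks == 1);
    assert(b.bad_unlocks == 0);
}
```

With a real mutex the second unlock is undefined behaviour rather than a
counter increment, which is why the fix simply removes the stray unlock.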




Re: [Nouveau] [PATCH][next] nouveau/gsp: replace zero-length array with flex-array member and use __counted_by

2023-11-28 Thread Gustavo A. R. Silva




On 11/28/23 19:01, Danilo Krummrich wrote:

On 11/16/23 20:55, Timur Tabi wrote:

On Thu, 2023-11-16 at 20:45 +0100, Danilo Krummrich wrote:

As I already mentioned for Timur's patch [2], I'd prefer to get a fix
upstream
(meaning [1] in this case). Of course, that's probably more up to Timur to
tell
if this will work out.


Don't count on it.


I see. Well, I think it's fine. Once we implement a decent abstraction we likely
don't need those header files in the kernel anymore.

@Gustavo, if you agree I will discard the indentation change when applying the
patch to keep the diff as small as possible.


No problem.

Thanks
--
Gustavo




Re: [Nouveau] [PATCH][next] nouveau/gsp: replace zero-length array with flex-array member and use __counted_by

2023-11-28 Thread Danilo Krummrich

On 11/16/23 20:55, Timur Tabi wrote:

On Thu, 2023-11-16 at 20:45 +0100, Danilo Krummrich wrote:

As I already mentioned for Timur's patch [2], I'd prefer to get a fix
upstream
(meaning [1] in this case). Of course, that's probably more up to Timur to
tell
if this will work out.


Don't count on it.


I see. Well, I think it's fine. Once we implement a decent abstraction we likely
don't need those header files in the kernel anymore.

@Gustavo, if you agree I will discard the indentation change when applying the
patch to keep the diff as small as possible.

- Danilo



Even if I did change [0] to [], I'm not going to be able to add the
"__counted_by(numEntries);" because that's just not something that our build
system uses.

And even then, I would need to change all [0] to [].

You're not going to be able to use RM's header files as-is anyway in the
long term.  If we changed the layout of PACKED_REGISTRY_TABLE, we're not
going to create a PACKED_REGISTRY_TABLE2 and keep both around.  We're just
going to change PACKED_REGISTRY_TABLE and pretend the previous version never
existed.  You will then have to manually copy the new struct to your header
files and maintain two versions yourself.







Re: [PATCH 1/3] drm/bridge: ti-sn65dsi86: Simplify using pm_runtime_resume_and_get()

2023-11-28 Thread Doug Anderson
Hi,

On Thu, Nov 23, 2023 at 9:54 AM Uwe Kleine-König
 wrote:
>
> pm_runtime_resume_and_get() already drops the runtime PM usage counter
> in the error case. So a call to pm_runtime_put_sync() can be dropped.
>
> Signed-off-by: Uwe Kleine-König 
> ---
>  drivers/gpu/drm/bridge/ti-sn65dsi86.c | 6 ++
>  1 file changed, 2 insertions(+), 4 deletions(-)

Pushed this patch to drm-misc-next:

c9d99c73940e drm/bridge: ti-sn65dsi86: Simplify using
pm_runtime_resume_and_get()
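
The usage-counter behaviour this patch relies on can be sketched in userspace
C. This is a mock, not kernel code: struct mock_dev and the mock_* helpers are
hypothetical stand-ins that only model the counter semantics (the get_sync
variant leaves the counter raised on failure, the resume_and_get variant drops
it again).

```c
#include <assert.h>

/* Hypothetical stand-in for a device's runtime PM state. */
struct mock_dev {
    int usage_count;    /* models the runtime PM usage counter */
    int resume_error;   /* 0 = resume succeeds, negative = resume fails */
};

/* Models pm_runtime_get_sync(): the counter stays raised even on failure. */
static int mock_get_sync(struct mock_dev *dev)
{
    dev->usage_count++;
    return dev->resume_error;
}

/* Models pm_runtime_put_sync(): drops one reference. */
static void mock_put_sync(struct mock_dev *dev)
{
    assert(dev->usage_count > 0);
    dev->usage_count--;
}

/* Models pm_runtime_resume_and_get(): drops the reference itself on failure. */
static int mock_resume_and_get(struct mock_dev *dev)
{
    int ret = mock_get_sync(dev);

    if (ret < 0) {
        dev->usage_count--;
        return ret;
    }
    return 0;
}

/* Old caller pattern: has to rebalance the counter by hand on error. */
static int old_style_enable(struct mock_dev *dev)
{
    int ret = mock_get_sync(dev);

    if (ret < 0) {
        mock_put_sync(dev);
        return ret;
    }
    return 0;
}

/* New caller pattern: no extra put on the error path. */
static int new_style_enable(struct mock_dev *dev)
{
    return mock_resume_and_get(dev);
}

/* Usage: both patterns leave the counter balanced when resume fails. */
void run_pm_demo(void)
{
    struct mock_dev dev = { 0, -5 };

    assert(old_style_enable(&dev) == -5 && dev.usage_count == 0);
    assert(new_style_enable(&dev) == -5 && dev.usage_count == 0);
}
```

Adding a put after a failed resume_and_get call would underflow the counter in
this model, which is the reason the patch removes the pm_runtime_put_sync()
call.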


[PATCH 02/10] iommmu/of: Do not return struct iommu_ops from of_iommu_configure()

2023-11-28 Thread Jason Gunthorpe
Nothing needs this pointer. Return a normal error code with the usual
IOMMU semantic that ENODEV means 'there is no IOMMU driver'.

Reviewed-by: Jerry Snitselaar 
Acked-by: Rob Herring 
Tested-by: Hector Martin 
Signed-off-by: Jason Gunthorpe 
---
 drivers/iommu/of_iommu.c | 31 +++
 drivers/of/device.c  | 22 +++---
 include/linux/of_iommu.h | 13 ++---
 3 files changed, 40 insertions(+), 26 deletions(-)

diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
index 5ecca53847d325..c6510d7e7b241b 100644
--- a/drivers/iommu/of_iommu.c
+++ b/drivers/iommu/of_iommu.c
@@ -107,16 +107,22 @@ static int of_iommu_configure_device(struct device_node 
*master_np,
  of_iommu_configure_dev(master_np, dev);
 }
 
-const struct iommu_ops *of_iommu_configure(struct device *dev,
-  struct device_node *master_np,
-  const u32 *id)
+/*
+ * Returns:
+ *  0 on success, an iommu was configured
+ *  -ENODEV if the device does not have any IOMMU
+ *  -EPROBE_DEFER if probing should be tried again
+ *  -errno fatal errors
+ */
+int of_iommu_configure(struct device *dev, struct device_node *master_np,
+  const u32 *id)
 {
const struct iommu_ops *ops = NULL;
struct iommu_fwspec *fwspec;
int err = NO_IOMMU;
 
if (!master_np)
-   return NULL;
+   return -ENODEV;
 
/* Serialise to make dev->iommu stable under our potential fwspec */
mutex_lock(&iommu_probe_device_lock);
@@ -124,7 +130,7 @@ const struct iommu_ops *of_iommu_configure(struct device 
*dev,
if (fwspec) {
if (fwspec->ops) {
mutex_unlock(&iommu_probe_device_lock);
-   return fwspec->ops;
+   return 0;
}
/* In the deferred case, start again from scratch */
iommu_fwspec_free(dev);
@@ -169,14 +175,15 @@ const struct iommu_ops *of_iommu_configure(struct device 
*dev,
err = iommu_probe_device(dev);
 
/* Ignore all other errors apart from EPROBE_DEFER */
-   if (err == -EPROBE_DEFER) {
-   ops = ERR_PTR(err);
-   } else if (err < 0) {
-   dev_dbg(dev, "Adding to IOMMU failed: %d\n", err);
-   ops = NULL;
+   if (err < 0) {
+   if (err == -EPROBE_DEFER)
+   return err;
+   dev_dbg(dev, "Adding to IOMMU failed: %pe\n", ERR_PTR(err));
+   return err;
}
-
-   return ops;
+   if (!ops)
+   return -ENODEV;
+   return 0;
 }
 
 static enum iommu_resv_type __maybe_unused
diff --git a/drivers/of/device.c b/drivers/of/device.c
index 65c71be71a8d45..873d933e8e6d1d 100644
--- a/drivers/of/device.c
+++ b/drivers/of/device.c
@@ -93,12 +93,12 @@ of_dma_set_restricted_buffer(struct device *dev, struct 
device_node *np)
 int of_dma_configure_id(struct device *dev, struct device_node *np,
bool force_dma, const u32 *id)
 {
-   const struct iommu_ops *iommu;
const struct bus_dma_region *map = NULL;
struct device_node *bus_np;
u64 dma_start = 0;
u64 mask, end, size = 0;
bool coherent;
+   int iommu_ret;
int ret;
 
if (np == dev->of_node)
@@ -181,21 +181,29 @@ int of_dma_configure_id(struct device *dev, struct 
device_node *np,
dev_dbg(dev, "device is%sdma coherent\n",
coherent ? " " : " not ");
 
-   iommu = of_iommu_configure(dev, np, id);
-   if (PTR_ERR(iommu) == -EPROBE_DEFER) {
+   iommu_ret = of_iommu_configure(dev, np, id);
+   if (iommu_ret == -EPROBE_DEFER) {
/* Don't touch range map if it wasn't set from a valid 
dma-ranges */
if (!ret)
dev->dma_range_map = NULL;
kfree(map);
return -EPROBE_DEFER;
-   }
+   } else if (iommu_ret == -ENODEV) {
+   dev_dbg(dev, "device is not behind an iommu\n");
+   } else if (iommu_ret) {
+   dev_err(dev, "iommu configuration for device failed with %pe\n",
+   ERR_PTR(iommu_ret));
 
-   dev_dbg(dev, "device is%sbehind an iommu\n",
-   iommu ? " " : " not ");
+   /*
+* Historically this routine doesn't fail driver probing
+* due to errors in of_iommu_configure()
+*/
+   } else
+   dev_dbg(dev, "device is behind an iommu\n");
 
arch_setup_dma_ops(dev, dma_start, size, coherent);
 
-   if (!iommu)
+   if (iommu_ret)
of_dma_set_restricted_buffer(dev, np);
 
return 0;
diff --git a/include/linux/of_iommu.h b/include/linux/of_iommu.h
index 9a5e6b410dd2fb..e61cbbe12dac6f 100644
--- a/include/linux/of_iommu.h
+++ b/include/linux/of_iommu.h
@@ 

[PATCH 03/10] iommu/of: Use -ENODEV consistently in of_iommu_configure()

2023-11-28 Thread Jason Gunthorpe
Instead of returning 1 and trying to handle positive error codes just
stick to the convention of returning -ENODEV. Remove references to ops
from of_iommu_configure(), a NULL ops will already generate an error code.

There is no reason to check dev->bus, if err=0 at this point then the
called configure functions thought there was an iommu and we should try to
probe it. Remove it.

Reviewed-by: Jerry Snitselaar 
Tested-by: Hector Martin 
Signed-off-by: Jason Gunthorpe 
---
 drivers/iommu/of_iommu.c | 49 
 1 file changed, 15 insertions(+), 34 deletions(-)

diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
index c6510d7e7b241b..164317bfb8a81f 100644
--- a/drivers/iommu/of_iommu.c
+++ b/drivers/iommu/of_iommu.c
@@ -17,8 +17,6 @@
 #include 
 #include 
 
-#define NO_IOMMU   1
-
 static int of_iommu_xlate(struct device *dev,
  struct of_phandle_args *iommu_spec)
 {
@@ -29,7 +27,7 @@ static int of_iommu_xlate(struct device *dev,
ops = iommu_ops_from_fwnode(fwnode);
if ((ops && !ops->of_xlate) ||
!of_device_is_available(iommu_spec->np))
-   return NO_IOMMU;
+   return -ENODEV;
 
ret = iommu_fwspec_init(dev, &iommu_spec->np->fwnode, ops);
if (ret)
@@ -61,7 +59,7 @@ static int of_iommu_configure_dev_id(struct device_node 
*master_np,
 "iommu-map-mask", &iommu_spec.np,
 iommu_spec.args);
if (err)
-   return err == -ENODEV ? NO_IOMMU : err;
+   return err;
 
err = of_iommu_xlate(dev, &iommu_spec);
of_node_put(iommu_spec.np);
@@ -72,7 +70,7 @@ static int of_iommu_configure_dev(struct device_node 
*master_np,
  struct device *dev)
 {
struct of_phandle_args iommu_spec;
-   int err = NO_IOMMU, idx = 0;
+   int err = -ENODEV, idx = 0;
 
while (!of_parse_phandle_with_args(master_np, "iommus",
   "#iommu-cells",
@@ -117,9 +115,8 @@ static int of_iommu_configure_device(struct device_node 
*master_np,
 int of_iommu_configure(struct device *dev, struct device_node *master_np,
   const u32 *id)
 {
-   const struct iommu_ops *ops = NULL;
struct iommu_fwspec *fwspec;
-   int err = NO_IOMMU;
+   int err;
 
if (!master_np)
return -ENODEV;
@@ -153,37 +150,21 @@ int of_iommu_configure(struct device *dev, struct 
device_node *master_np,
} else {
err = of_iommu_configure_device(master_np, dev, id);
}
-
-   /*
-* Two success conditions can be represented by non-negative err here:
-* >0 : there is no IOMMU, or one was unavailable for non-fatal reasons
-*  0 : we found an IOMMU, and dev->fwspec is initialised appropriately
-* <0 : any actual error
-*/
-   if (!err) {
-   /* The fwspec pointer changed, read it again */
-   fwspec = dev_iommu_fwspec_get(dev);
-   ops= fwspec->ops;
-   }
mutex_unlock(&iommu_probe_device_lock);
 
-   /*
-* If we have reason to believe the IOMMU driver missed the initial
-* probe for dev, replay it to get things in order.
-*/
-   if (!err && dev->bus)
-   err = iommu_probe_device(dev);
-
-   /* Ignore all other errors apart from EPROBE_DEFER */
-   if (err < 0) {
-   if (err == -EPROBE_DEFER)
-   return err;
-   dev_dbg(dev, "Adding to IOMMU failed: %pe\n", ERR_PTR(err));
+   if (err == -ENODEV || err == -EPROBE_DEFER)
return err;
-   }
-   if (!ops)
-   return -ENODEV;
+   if (err)
+   goto err_log;
+
+   err = iommu_probe_device(dev);
+   if (err)
+   goto err_log;
return 0;
+
+err_log:
+   dev_dbg(dev, "Adding to IOMMU failed: %pe\n", ERR_PTR(err));
+   return err;
 }
 
 static enum iommu_resv_type __maybe_unused
-- 
2.42.0
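
The return convention that patches 02/10 and 03/10 converge on (0, -ENODEV,
-EPROBE_DEFER, any other negative errno) can be sketched from the caller's
side. This is a userspace illustration only: classify_iommu_ret() and the enum
are hypothetical names, and EPROBE_DEFER is defined locally because it is a
kernel-internal code that userspace errno.h does not provide.

```c
#include <assert.h>
#include <errno.h>

/* Kernel-internal errno; defined here only so the sketch builds in userspace. */
#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517
#endif

/* Hypothetical outcomes a caller distinguishes after IOMMU configuration. */
enum iommu_outcome {
    OUTCOME_BEHIND_IOMMU,   /* 0: an IOMMU was configured */
    OUTCOME_NO_IOMMU,       /* -ENODEV: there is no IOMMU driver */
    OUTCOME_TRY_LATER,      /* -EPROBE_DEFER: probing should be retried */
    OUTCOME_FAILED          /* any other negative errno */
};

/* Map a plain int return value to the outcome the caller acts on. */
static enum iommu_outcome classify_iommu_ret(int ret)
{
    if (ret == 0)
        return OUTCOME_BEHIND_IOMMU;
    if (ret == -ENODEV)
        return OUTCOME_NO_IOMMU;
    if (ret == -EPROBE_DEFER)
        return OUTCOME_TRY_LATER;
    return OUTCOME_FAILED;
}

/* Usage: no positive "NO_IOMMU" value and no ops pointer is involved. */
void run_classify_demo(void)
{
    assert(classify_iommu_ret(0) == OUTCOME_BEHIND_IOMMU);
    assert(classify_iommu_ret(-ENODEV) == OUTCOME_NO_IOMMU);
    assert(classify_iommu_ret(-EPROBE_DEFER) == OUTCOME_TRY_LATER);
    assert(classify_iommu_ret(-EINVAL) == OUTCOME_FAILED);
}
```

Compared with returning an ops pointer, a single int keeps the four cases
distinguishable without ERR_PTR() and NULL checks at the call site.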



[PATCH 09/10] ACPI: IORT: Cast from ULL to phys_addr_t

2023-11-28 Thread Jason Gunthorpe
gcc on i386 (when compile testing) warns:

 drivers/acpi/arm64/iort.c:2014:18: warning: implicit conversion from 'unsigned 
long long' to 'phys_addr_t' (aka 'unsigned int') changes value from 
18446744073709551615 to 4294967295 [-Wconstant-conversion]
   local_limit = 
DMA_BIT_MASK(ncomp->memory_address_limit);

Because DMA_BIT_MASK returns a large ULL constant. Explicitly truncate it
to phys_addr_t.

Signed-off-by: Jason Gunthorpe 
---
 drivers/acpi/arm64/iort.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index 6496ff5a6ba20d..bdaf9256870d92 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -2011,7 +2011,8 @@ phys_addr_t __init acpi_iort_dma_get_max_cpu_address(void)
 
case ACPI_IORT_NODE_NAMED_COMPONENT:
ncomp = (struct acpi_iort_named_component 
*)node->node_data;
-   local_limit = DMA_BIT_MASK(ncomp->memory_address_limit);
+   local_limit = (phys_addr_t)DMA_BIT_MASK(
+   ncomp->memory_address_limit);
limit = min_not_zero(limit, local_limit);
break;
 
@@ -2020,7 +2021,8 @@ phys_addr_t __init acpi_iort_dma_get_max_cpu_address(void)
break;
 
rc = (struct acpi_iort_root_complex *)node->node_data;
-   local_limit = DMA_BIT_MASK(rc->memory_address_limit);
+   local_limit = (phys_addr_t)DMA_BIT_MASK(
+   rc->memory_address_limit);
limit = min_not_zero(limit, local_limit);
break;
}
-- 
2.42.0
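
The truncation that the compiler warns about can be reproduced in userspace.
In this sketch DMA_BIT_MASK() is copied in the form the kernel defines it,
while phys_addr32_t and limit_from_bits() are hypothetical names standing in
for a 32-bit phys_addr_t and the assignment in the IORT code.

```c
#include <assert.h>
#include <stdint.h>

/* Same shape as the kernel macro: an unsigned long long mask of n low bits. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical stand-in for phys_addr_t on a 32-bit build such as i386. */
typedef uint32_t phys_addr32_t;

/* The explicit cast documents that dropping the upper 32 bits is intended. */
static phys_addr32_t limit_from_bits(unsigned int bits)
{
    assert(bits >= 1 && bits <= 64);
    return (phys_addr32_t)DMA_BIT_MASK(bits);
}

/* Usage: a 64-bit limit collapses to the largest 32-bit address. */
void run_mask_demo(void)
{
    assert(DMA_BIT_MASK(64) == 18446744073709551615ULL);
    assert(limit_from_bits(64) == 4294967295U);
    assert(limit_from_bits(24) == 0x00FFFFFFU);
}
```

The two constants in the demo are the same values quoted in the warning text,
18446744073709551615 before the conversion and 4294967295 after it.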



[PATCH 05/10] iommu: Mark dev_iommu_priv_set() with a lockdep

2023-11-28 Thread Jason Gunthorpe
A perfect driver would only call dev_iommu_priv_set() from its probe
callback. We've made it functionally correct to call it from the of_xlate
by adding a lock around that call.

lockdep assert that iommu_probe_device_lock is held to discourage misuse.

Exclude PPC kernels with CONFIG_FSL_PAMU turned on because FSL_PAMU uses a
global static for its priv and abuses priv for its domain.

Remove the pointless stores of NULL, all these are on paths where the core
code will free dev->iommu after the op returns.

Reviewed-by: Lu Baolu 
Reviewed-by: Jerry Snitselaar 
Tested-by: Hector Martin 
Signed-off-by: Jason Gunthorpe 
---
 drivers/iommu/amd/iommu.c   | 2 --
 drivers/iommu/apple-dart.c  | 1 -
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 1 -
 drivers/iommu/arm/arm-smmu/arm-smmu.c   | 1 -
 drivers/iommu/intel/iommu.c | 2 --
 drivers/iommu/iommu.c   | 9 +
 drivers/iommu/omap-iommu.c  | 1 -
 include/linux/iommu.h   | 5 +
 8 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 9f706436082833..be58644a6fa518 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -551,8 +551,6 @@ static void amd_iommu_uninit_device(struct device *dev)
if (dev_data->domain)
detach_device(dev);
 
-   dev_iommu_priv_set(dev, NULL);
-
/*
 * We keep dev_data around for unplugged devices and reuse it when the
 * device is re-plugged - not doing so would introduce a ton of races.
diff --git a/drivers/iommu/apple-dart.c b/drivers/iommu/apple-dart.c
index 7438e9c82ba982..25135440b5dd54 100644
--- a/drivers/iommu/apple-dart.c
+++ b/drivers/iommu/apple-dart.c
@@ -743,7 +743,6 @@ static void apple_dart_release_device(struct device *dev)
 {
struct apple_dart_master_cfg *cfg = dev_iommu_priv_get(dev);
 
-   dev_iommu_priv_set(dev, NULL);
kfree(cfg);
 }
 
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index fc4317c25b6d53..1855d3892b15f8 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2695,7 +2695,6 @@ static struct iommu_device *arm_smmu_probe_device(struct 
device *dev)
 
 err_free_master:
kfree(master);
-   dev_iommu_priv_set(dev, NULL);
return ERR_PTR(ret);
 }
 
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 4d09c004789274..adc7937fd8a3a3 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -1420,7 +1420,6 @@ static void arm_smmu_release_device(struct device *dev)
 
arm_smmu_rpm_put(cfg->smmu);
 
-   dev_iommu_priv_set(dev, NULL);
kfree(cfg);
 }
 
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 897159dba47de4..511589341074f0 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4461,7 +4461,6 @@ static struct iommu_device 
*intel_iommu_probe_device(struct device *dev)
ret = intel_pasid_alloc_table(dev);
if (ret) {
dev_err(dev, "PASID table allocation failed\n");
-   dev_iommu_priv_set(dev, NULL);
kfree(info);
return ERR_PTR(ret);
}
@@ -4479,7 +4478,6 @@ static void intel_iommu_release_device(struct device *dev)
dmar_remove_one_dev_info(dev);
intel_pasid_free_table(dev);
intel_iommu_debugfs_remove_dev(info);
-   dev_iommu_priv_set(dev, NULL);
kfree(info);
set_dma_ops(dev, NULL);
 }
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 4323b6276e977f..08f29a1dfcd5f8 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -387,6 +387,15 @@ static u32 dev_iommu_get_max_pasids(struct device *dev)
return min_t(u32, max_pasids, dev->iommu->iommu_dev->max_pasids);
 }
 
+void dev_iommu_priv_set(struct device *dev, void *priv)
+{
+   /* FSL_PAMU does something weird */
+   if (!IS_ENABLED(CONFIG_FSL_PAMU))
+   lockdep_assert_held(&iommu_probe_device_lock);
+   dev->iommu->priv = priv;
+}
+EXPORT_SYMBOL_GPL(dev_iommu_priv_set);
+
 /*
  * Init the dev->iommu and dev->iommu_group in the struct device and get the
  * driver probed
diff --git a/drivers/iommu/omap-iommu.c b/drivers/iommu/omap-iommu.c
index c66b070841dd41..c9528065a59afa 100644
--- a/drivers/iommu/omap-iommu.c
+++ b/drivers/iommu/omap-iommu.c
@@ -1719,7 +1719,6 @@ static void omap_iommu_release_device(struct device *dev)
if (!dev->of_node || !arch_data)
return;
 
-   dev_iommu_priv_set(dev, NULL);
kfree(arch_data);
 
 }
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index c7394b39599c84..c24933a1d0d643 100644
--- a/include/linux/iommu.h

[PATCH 10/10] ACPI: IORT: Allow COMPILE_TEST of IORT

2023-11-28 Thread Jason Gunthorpe
The arm-smmu driver can COMPILE_TEST on x86, so expand this to also
enable the IORT code so it can be COMPILE_TEST'd too.

Signed-off-by: Jason Gunthorpe 
---
 drivers/acpi/Kconfig| 2 --
 drivers/acpi/Makefile   | 2 +-
 drivers/acpi/arm64/Kconfig  | 1 +
 drivers/acpi/arm64/Makefile | 2 +-
 drivers/iommu/Kconfig   | 1 +
 5 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
index f819e760ff195a..3b7f77b227d13a 100644
--- a/drivers/acpi/Kconfig
+++ b/drivers/acpi/Kconfig
@@ -541,9 +541,7 @@ config ACPI_PFRUT
  To compile the drivers as modules, choose M here:
  the modules will be called pfr_update and pfr_telemetry.
 
-if ARM64
 source "drivers/acpi/arm64/Kconfig"
-endif
 
 config ACPI_PPTT
bool
diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
index eaa09bf52f1760..4e77ae37b80726 100644
--- a/drivers/acpi/Makefile
+++ b/drivers/acpi/Makefile
@@ -127,7 +127,7 @@ obj-y   += pmic/
 video-objs += acpi_video.o video_detect.o
 obj-y  += dptf/
 
-obj-$(CONFIG_ARM64)+= arm64/
+obj-y  += arm64/
 
 obj-$(CONFIG_ACPI_VIOT)+= viot.o
 
diff --git a/drivers/acpi/arm64/Kconfig b/drivers/acpi/arm64/Kconfig
index b3ed6212244c1e..537d49d8ace69e 100644
--- a/drivers/acpi/arm64/Kconfig
+++ b/drivers/acpi/arm64/Kconfig
@@ -11,6 +11,7 @@ config ACPI_GTDT
 
 config ACPI_AGDI
bool "Arm Generic Diagnostic Dump and Reset Device Interface"
+   depends on ARM64
depends on ARM_SDE_INTERFACE
help
  Arm Generic Diagnostic Dump and Reset Device Interface (AGDI) is
diff --git a/drivers/acpi/arm64/Makefile b/drivers/acpi/arm64/Makefile
index 143debc1ba4a9d..71d0e635599390 100644
--- a/drivers/acpi/arm64/Makefile
+++ b/drivers/acpi/arm64/Makefile
@@ -4,4 +4,4 @@ obj-$(CONFIG_ACPI_IORT) += iort.o
 obj-$(CONFIG_ACPI_GTDT)+= gtdt.o
 obj-$(CONFIG_ACPI_APMT)+= apmt.o
 obj-$(CONFIG_ARM_AMBA) += amba.o
-obj-y  += dma.o init.o
+obj-$(CONFIG_ARM64)+= dma.o init.o
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 7673bb82945b6c..309378e76a9bc9 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -318,6 +318,7 @@ config ARM_SMMU
select IOMMU_API
select IOMMU_IO_PGTABLE_LPAE
select ARM_DMA_USE_IOMMU if ARM
+   select ACPI_IORT if ACPI
help
  Support for implementations of the ARM System MMU architecture
  versions 1 and 2.
-- 
2.42.0



[PATCH 06/10] iommu: Replace iommu_device_lock with iommu_probe_device_lock

2023-11-28 Thread Jason Gunthorpe
The iommu_device_lock protects the iommu_device_list which is only read by
iommu_ops_from_fwnode().

This is now always called under the iommu_probe_device_lock, so we don't
need to double lock the linked list. Use the iommu_probe_device_lock on
the write side too.

Signed-off-by: Jason Gunthorpe 
---
 drivers/iommu/iommu.c | 30 +-
 1 file changed, 13 insertions(+), 17 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 08f29a1dfcd5f8..9557c2ec08d915 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -146,7 +146,6 @@ struct iommu_group_attribute iommu_group_attr_##_name = 
\
container_of(_kobj, struct iommu_group, kobj)
 
 static LIST_HEAD(iommu_device_list);
-static DEFINE_SPINLOCK(iommu_device_lock);
 
 static const struct bus_type * const iommu_buses[] = {
&platform_bus_type,
@@ -262,9 +261,9 @@ int iommu_device_register(struct iommu_device *iommu,
if (hwdev)
iommu->fwnode = dev_fwnode(hwdev);
 
-   spin_lock(&iommu_device_lock);
+   mutex_lock(&iommu_probe_device_lock);
list_add_tail(&iommu->list, &iommu_device_list);
-   spin_unlock(&iommu_device_lock);
+   mutex_unlock(&iommu_probe_device_lock);
 
for (int i = 0; i < ARRAY_SIZE(iommu_buses) && !err; i++)
err = bus_iommu_probe(iommu_buses[i]);
@@ -279,9 +278,9 @@ void iommu_device_unregister(struct iommu_device *iommu)
for (int i = 0; i < ARRAY_SIZE(iommu_buses); i++)
bus_for_each_dev(iommu_buses[i], NULL, iommu, 
remove_iommu_group);
 
-   spin_lock(&iommu_device_lock);
+   mutex_lock(&iommu_probe_device_lock);
list_del(&iommu->list);
-   spin_unlock(&iommu_device_lock);
+   mutex_unlock(&iommu_probe_device_lock);
 
/* Pairs with the alloc in generic_single_device_group() */
iommu_group_put(iommu->singleton_group);
@@ -316,9 +315,9 @@ int iommu_device_register_bus(struct iommu_device *iommu,
if (err)
return err;
 
-   spin_lock(&iommu_device_lock);
+   mutex_lock(&iommu_probe_device_lock);
list_add_tail(&iommu->list, &iommu_device_list);
-   spin_unlock(&iommu_device_lock);
+   mutex_unlock(&iommu_probe_device_lock);
 
err = bus_iommu_probe(bus);
if (err) {
@@ -2033,9 +2032,9 @@ bool iommu_present(const struct bus_type *bus)
 
for (int i = 0; i < ARRAY_SIZE(iommu_buses); i++) {
if (iommu_buses[i] == bus) {
-   spin_lock(&iommu_device_lock);
+   mutex_lock(&iommu_probe_device_lock);
ret = !list_empty(&iommu_device_list);
-   spin_unlock(&iommu_device_lock);
+   mutex_unlock(&iommu_probe_device_lock);
}
}
return ret;
@@ -2980,17 +2979,14 @@ EXPORT_SYMBOL_GPL(iommu_default_passthrough);
 
 const struct iommu_ops *iommu_ops_from_fwnode(struct fwnode_handle *fwnode)
 {
-   const struct iommu_ops *ops = NULL;
struct iommu_device *iommu;
 
-   spin_lock(&iommu_device_lock);
+   lockdep_assert_held(&iommu_probe_device_lock);
+
list_for_each_entry(iommu, &iommu_device_list, list)
-   if (iommu->fwnode == fwnode) {
-   ops = iommu->ops;
-   break;
-   }
-   spin_unlock(&iommu_device_lock);
-   return ops;
+   if (iommu->fwnode == fwnode)
+   return iommu->ops;
+   return NULL;
 }
 
 int iommu_fwspec_init(struct device *dev, struct fwnode_handle *iommu_fwnode,
-- 
2.42.0



[PATCH 08/10] iommu/tegra: Use tegra_dev_iommu_get_stream_id() in the remaining places

2023-11-28 Thread Jason Gunthorpe
This API was defined to formalize the access to internal iommu details on
some Tegra SOCs, but a few callers got missed. Add them.

The helper already masks by 0xFFFF so remove this code from the callers.

Suggested-by: Thierry Reding 
Signed-off-by: Jason Gunthorpe 
---
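Not part of the commit: a hedged userspace sketch of what a masking helper of this shape looks like from the caller's side. struct demo_fwspec and demo_get_stream_id() are hypothetical; the 0xffff mask width and the "exactly one ID" condition are assumptions about the real helper, not something this patch shows.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-in for struct iommu_fwspec. */
struct demo_fwspec {
	unsigned int num_ids;
	uint32_t ids[1];
};

/* Returns true and stores the masked ID only when exactly one ID exists. */
static bool demo_get_stream_id(const struct demo_fwspec *fwspec,
			       uint32_t *stream_id)
{
	if (fwspec && fwspec->num_ids == 1) {
		*stream_id = fwspec->ids[0] & 0xffff;	/* assumed mask width */
		return true;
	}
	return false;
}
```

Because the helper reports failure through its return value and masks internally, callers need neither a NULL check on the fwspec nor their own mask.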
 drivers/dma/tegra186-gpc-dma.c  |  8 +++-
 drivers/gpu/drm/nouveau/nvkm/subdev/ltc/gp10b.c |  7 ++-
 drivers/memory/tegra/tegra186.c | 12 ++--
 3 files changed, 11 insertions(+), 16 deletions(-)

diff --git a/drivers/dma/tegra186-gpc-dma.c b/drivers/dma/tegra186-gpc-dma.c
index fa4d4142a68a21..88547a23825b18 100644
--- a/drivers/dma/tegra186-gpc-dma.c
+++ b/drivers/dma/tegra186-gpc-dma.c
@@ -1348,8 +1348,8 @@ static int tegra_dma_program_sid(struct tegra_dma_channel 
*tdc, int stream_id)
 static int tegra_dma_probe(struct platform_device *pdev)
 {
const struct tegra_dma_chip_data *cdata = NULL;
-   struct iommu_fwspec *iommu_spec;
-   unsigned int stream_id, i;
+   unsigned int i;
+   u32 stream_id;
struct tegra_dma *tdma;
int ret;
 
@@ -1378,12 +1378,10 @@ static int tegra_dma_probe(struct platform_device *pdev)
 
tdma->dma_dev.dev = &pdev->dev;
 
-   iommu_spec = dev_iommu_fwspec_get(&pdev->dev);
-   if (!iommu_spec) {
+   if (!tegra_dev_iommu_get_stream_id(&pdev->dev, &stream_id)) {
dev_err(&pdev->dev, "Missing iommu stream-id\n");
return -EINVAL;
}
-   stream_id = iommu_spec->ids[0] & 0xffff;
 
ret = device_property_read_u32(&pdev->dev, "dma-channel-mask",
   &tdma->chan_mask);
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/ltc/gp10b.c 
b/drivers/gpu/drm/nouveau/nvkm/subdev/ltc/gp10b.c
index e7e8fdf3adab7a..b40fd1dbb21617 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/ltc/gp10b.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/ltc/gp10b.c
@@ -28,16 +28,13 @@ static void
 gp10b_ltc_init(struct nvkm_ltc *ltc)
 {
struct nvkm_device *device = ltc->subdev.device;
-   struct iommu_fwspec *spec;
+   u32 sid;
 
nvkm_wr32(device, 0x17e27c, ltc->ltc_nr);
nvkm_wr32(device, 0x17e000, ltc->ltc_nr);
nvkm_wr32(device, 0x100800, ltc->ltc_nr);
 
-   spec = dev_iommu_fwspec_get(device->dev);
-   if (spec) {
-   u32 sid = spec->ids[0] & 0xffff;
-
+   if (tegra_dev_iommu_get_stream_id(device->dev, &sid)) {
/* stream ID */
nvkm_wr32(device, 0x160000, sid << 2);
}
diff --git a/drivers/memory/tegra/tegra186.c b/drivers/memory/tegra/tegra186.c
index 533f85a4b2bdb7..3e4fbe94dd666e 100644
--- a/drivers/memory/tegra/tegra186.c
+++ b/drivers/memory/tegra/tegra186.c
@@ -111,21 +111,21 @@ static void tegra186_mc_client_sid_override(struct 
tegra_mc *mc,
 static int tegra186_mc_probe_device(struct tegra_mc *mc, struct device *dev)
 {
 #if IS_ENABLED(CONFIG_IOMMU_API)
-   struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
struct of_phandle_args args;
unsigned int i, index = 0;
+   u32 sid;
 
+   WARN_ON(!tegra_dev_iommu_get_stream_id(dev, &sid));
while (!of_parse_phandle_with_args(dev->of_node, "interconnects", 
"#interconnect-cells",
   index, &args)) {
if (args.np == mc->dev->of_node && args.args_count != 0) {
for (i = 0; i < mc->soc->num_clients; i++) {
const struct tegra_mc_client *client = 
&mc->soc->clients[i];
 
-   if (client->id == args.args[0]) {
-   u32 sid = fwspec->ids[0] & 
MC_SID_STREAMID_OVERRIDE_MASK;
-
-   tegra186_mc_client_sid_override(mc, 
client, sid);
-   }
+   if (client->id == args.args[0])
+   tegra186_mc_client_sid_override(
+   mc, client,
+   sid & 
MC_SID_STREAMID_OVERRIDE_MASK);
}
}
 
-- 
2.42.0



[PATCH 04/10] iommu: Mark dev_iommu_get() with lockdep

2023-11-28 Thread Jason Gunthorpe
Allocation of dev->iommu must be done under the
iommu_probe_device_lock. Mark this with lockdep to discourage future
mistakes.

Reviewed-by: Jerry Snitselaar 
Tested-by: Hector Martin 
Signed-off-by: Jason Gunthorpe 
---
 drivers/iommu/iommu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 0d25468d53a68a..4323b6276e977f 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -334,6 +334,8 @@ static struct dev_iommu *dev_iommu_get(struct device *dev)
 {
struct dev_iommu *param = dev->iommu;
 
+   lockdep_assert_held(&iommu_probe_device_lock);
+
if (param)
return param;
 
-- 
2.42.0



[PATCH 01/10] iommu: Remove struct iommu_ops *iommu from arch_setup_dma_ops()

2023-11-28 Thread Jason Gunthorpe
This is not being used to pass ops, it is just a way to tell if an
iommu driver was probed. These days this can be detected directly via
device_iommu_mapped(). Call device_iommu_mapped() in the two places that
need to check it and remove the iommu parameter everywhere.

Reviewed-by: Jerry Snitselaar 
Reviewed-by: Lu Baolu 
Reviewed-by: Moritz Fischer 
Acked-by: Christoph Hellwig 
Acked-by: Rob Herring 
Tested-by: Hector Martin 
Signed-off-by: Jason Gunthorpe 
---
 arch/arc/mm/dma.c   |  2 +-
 arch/arm/mm/dma-mapping-nommu.c |  2 +-
 arch/arm/mm/dma-mapping.c   | 10 +-
 arch/arm64/mm/dma-mapping.c |  4 ++--
 arch/mips/mm/dma-noncoherent.c  |  2 +-
 arch/riscv/mm/dma-noncoherent.c |  2 +-
 drivers/acpi/scan.c |  3 +--
 drivers/hv/hv_common.c  |  2 +-
 drivers/of/device.c |  2 +-
 include/linux/dma-map-ops.h |  4 ++--
 10 files changed, 16 insertions(+), 17 deletions(-)

diff --git a/arch/arc/mm/dma.c b/arch/arc/mm/dma.c
index 2a7fbbb83b7056..197707bc765889 100644
--- a/arch/arc/mm/dma.c
+++ b/arch/arc/mm/dma.c
@@ -91,7 +91,7 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
  * Plug in direct dma map ops.
  */
 void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
-   const struct iommu_ops *iommu, bool coherent)
+   bool coherent)
 {
/*
 * IOC hardware snoops all DMA traffic keeping the caches consistent
diff --git a/arch/arm/mm/dma-mapping-nommu.c b/arch/arm/mm/dma-mapping-nommu.c
index cfd9c933d2f09c..b94850b579952a 100644
--- a/arch/arm/mm/dma-mapping-nommu.c
+++ b/arch/arm/mm/dma-mapping-nommu.c
@@ -34,7 +34,7 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
 }
 
 void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
-   const struct iommu_ops *iommu, bool coherent)
+   bool coherent)
 {
if (IS_ENABLED(CONFIG_CPU_V7M)) {
/*
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 5409225b4abc06..6c359a3af8d9c7 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -1713,7 +1713,7 @@ void arm_iommu_detach_device(struct device *dev)
 EXPORT_SYMBOL_GPL(arm_iommu_detach_device);
 
 static void arm_setup_iommu_dma_ops(struct device *dev, u64 dma_base, u64 size,
-   const struct iommu_ops *iommu, bool 
coherent)
+   bool coherent)
 {
struct dma_iommu_mapping *mapping;
 
@@ -1748,7 +1748,7 @@ static void arm_teardown_iommu_dma_ops(struct device *dev)
 #else
 
 static void arm_setup_iommu_dma_ops(struct device *dev, u64 dma_base, u64 size,
-   const struct iommu_ops *iommu, bool 
coherent)
+   bool coherent)
 {
 }
 
@@ -1757,7 +1757,7 @@ static void arm_teardown_iommu_dma_ops(struct device 
*dev) { }
 #endif /* CONFIG_ARM_DMA_USE_IOMMU */
 
 void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
-   const struct iommu_ops *iommu, bool coherent)
+   bool coherent)
 {
/*
 * Due to legacy code that sets the ->dma_coherent flag from a bus
@@ -1776,8 +1776,8 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, 
u64 size,
if (dev->dma_ops)
return;
 
-   if (iommu)
-   arm_setup_iommu_dma_ops(dev, dma_base, size, iommu, coherent);
+   if (device_iommu_mapped(dev))
+   arm_setup_iommu_dma_ops(dev, dma_base, size, coherent);
 
xen_setup_dma_ops(dev);
dev->archdata.dma_ops_setup = true;
diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 3cb101e8cb29ba..61886e43e3a10f 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -47,7 +47,7 @@ void arch_teardown_dma_ops(struct device *dev)
 #endif
 
 void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
-   const struct iommu_ops *iommu, bool coherent)
+   bool coherent)
 {
int cls = cache_line_size_of_cpu();
 
@@ -58,7 +58,7 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 
size,
   ARCH_DMA_MINALIGN, cls);
 
dev->dma_coherent = coherent;
-   if (iommu)
+   if (device_iommu_mapped(dev))
iommu_setup_dma_ops(dev, dma_base, dma_base + size - 1);
 
xen_setup_dma_ops(dev);
diff --git a/arch/mips/mm/dma-noncoherent.c b/arch/mips/mm/dma-noncoherent.c
index 3c4fc97b9f394b..0f3cec663a12cd 100644
--- a/arch/mips/mm/dma-noncoherent.c
+++ b/arch/mips/mm/dma-noncoherent.c
@@ -138,7 +138,7 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
 
 #ifdef CONFIG_ARCH_HAS_SETUP_DMA_OPS
 void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
-   const struct iommu_ops *iommu, bool coherent)
+   bool coherent)
 {

[PATCH 07/10] acpi: Do not return struct iommu_ops from acpi_iommu_configure_id()

2023-11-28 Thread Jason Gunthorpe
Nothing needs this pointer. Return a normal error code with the usual
IOMMU semantic that ENODEV means 'there is no IOMMU driver'.

Acked-by: Rafael J. Wysocki 
Reviewed-by: Jerry Snitselaar 
Tested-by: Hector Martin 
Signed-off-by: Jason Gunthorpe 
---
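Not part of the commit: a small userspace sketch of the return-code convention described above. The demo_* functions and their parameters are hypothetical models, and EPROBE_DEFER is defined locally because it is a kernel-internal errno that userspace headers do not provide.

```c
#include <errno.h>
#include <stdbool.h>

#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517	/* kernel-internal value; not in userspace errno.h */
#endif

/*
 * Hypothetical model of the configure step: "err" is the result of the
 * firmware lookup, "have_ops" says whether a driver ended up attached.
 */
static int demo_iommu_configure(int err, bool have_ops)
{
	if (err == -EPROBE_DEFER)
		return err;		/* the only error callers act on */
	if (err)
		return -ENODEV;		/* all other failures: no IOMMU driver */
	return have_ops ? 0 : -ENODEV;
}

/* Caller: propagate probe deferral, tolerate everything else. */
static int demo_dma_configure(int err, bool have_ops)
{
	int ret = demo_iommu_configure(err, have_ops);

	if (ret == -EPROBE_DEFER)
		return -EPROBE_DEFER;
	/* DMA setup would continue here regardless of -ENODEV */
	return 0;
}
```

The point of the sketch is that the caller never needs the ops pointer: three integer outcomes (0, -ENODEV, -EPROBE_DEFER) carry everything it acts on.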
 drivers/acpi/scan.c | 29 +
 1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
index 444a0b3c72f2d8..340ba720c72129 100644
--- a/drivers/acpi/scan.c
+++ b/drivers/acpi/scan.c
@@ -1562,8 +1562,7 @@ static inline const struct iommu_ops 
*acpi_iommu_fwspec_ops(struct device *dev)
return fwspec ? fwspec->ops : NULL;
 }
 
-static const struct iommu_ops *acpi_iommu_configure_id(struct device *dev,
-  const u32 *id_in)
+static int acpi_iommu_configure_id(struct device *dev, const u32 *id_in)
 {
int err;
const struct iommu_ops *ops;
@@ -1577,7 +1576,7 @@ static const struct iommu_ops 
*acpi_iommu_configure_id(struct device *dev,
ops = acpi_iommu_fwspec_ops(dev);
if (ops) {
mutex_unlock(&iommu_probe_device_lock);
-   return ops;
+   return 0;
}
 
err = iort_iommu_configure_id(dev, id_in);
@@ -1594,12 +1593,14 @@ static const struct iommu_ops 
*acpi_iommu_configure_id(struct device *dev,
 
/* Ignore all other errors apart from EPROBE_DEFER */
if (err == -EPROBE_DEFER) {
-   return ERR_PTR(err);
+   return err;
} else if (err) {
dev_dbg(dev, "Adding to IOMMU failed: %d\n", err);
-   return NULL;
+   return -ENODEV;
}
-   return acpi_iommu_fwspec_ops(dev);
+   if (!acpi_iommu_fwspec_ops(dev))
+   return -ENODEV;
+   return 0;
 }
 
 #else /* !CONFIG_IOMMU_API */
@@ -1611,10 +1612,9 @@ int acpi_iommu_fwspec_init(struct device *dev, u32 id,
return -ENODEV;
 }
 
-static const struct iommu_ops *acpi_iommu_configure_id(struct device *dev,
-  const u32 *id_in)
+static int acpi_iommu_configure_id(struct device *dev, const u32 *id_in)
 {
-   return NULL;
+   return -ENODEV;
 }
 
 #endif /* !CONFIG_IOMMU_API */
@@ -1628,7 +1628,7 @@ static const struct iommu_ops 
*acpi_iommu_configure_id(struct device *dev,
 int acpi_dma_configure_id(struct device *dev, enum dev_dma_attr attr,
  const u32 *input_id)
 {
-   const struct iommu_ops *iommu;
+   int ret;
 
if (attr == DEV_DMA_NOT_SUPPORTED) {
set_dma_ops(dev, &dma_dummy_ops);
@@ -1637,10 +1637,15 @@ int acpi_dma_configure_id(struct device *dev, enum 
dev_dma_attr attr,
 
acpi_arch_dma_setup(dev);
 
-   iommu = acpi_iommu_configure_id(dev, input_id);
-   if (PTR_ERR(iommu) == -EPROBE_DEFER)
+   ret = acpi_iommu_configure_id(dev, input_id);
+   if (ret == -EPROBE_DEFER)
return -EPROBE_DEFER;
 
+   /*
+* Historically this routine doesn't fail driver probing due to errors
+* in acpi_iommu_configure_id()
+*/
+
arch_setup_dma_ops(dev, 0, U64_MAX, attr == DEV_DMA_COHERENT);
 
return 0;
-- 
2.42.0



[PATCH 00/10] IOMMU related FW parsing cleanup

2023-11-28 Thread Jason Gunthorpe
These are the patches from the prior series without the "fwspec
polishing":
 https://lore.kernel.org/r/0-v2-36a0088ecaa7+22c6e-iommu_fwspec_...@nvidia.com

Rebased onto Robin's patch:
 
https://lore.kernel.org/all/16f433658661d7cadfea51e7c65da95826112a2b.1700071477.git.robin.mur...@arm.com/

Does a few things to prepare for the next:

- Clean up the call chains around dma_configure so the iommu_ops isn't being
  exposed.

- Add additional lockdep annotations now that we can.

- Replace the iommu_device_lock with iommu_probe_device_lock.

- Fix some missed places that need to call tegra_dev_iommu_get_stream_id()

Jason Gunthorpe (10):
  iommu: Remove struct iommu_ops *iommu from arch_setup_dma_ops()
  iommmu/of: Do not return struct iommu_ops from of_iommu_configure()
  iommu/of: Use -ENODEV consistently in of_iommu_configure()
  iommu: Mark dev_iommu_get() with lockdep
  iommu: Mark dev_iommu_priv_set() with a lockdep
  iommu: Replace iommu_device_lock with iommu_probe_device_lock
  acpi: Do not return struct iommu_ops from acpi_iommu_configure_id()
  iommu/tegra: Use tegra_dev_iommu_get_stream_id() in the remaining
places
  ACPI: IORT: Cast from ULL to phys_addr_t
  ACPI: IORT: Allow COMPILE_TEST of IORT

 arch/arc/mm/dma.c |  2 +-
 arch/arm/mm/dma-mapping-nommu.c   |  2 +-
 arch/arm/mm/dma-mapping.c | 10 +--
 arch/arm64/mm/dma-mapping.c   |  4 +-
 arch/mips/mm/dma-noncoherent.c|  2 +-
 arch/riscv/mm/dma-noncoherent.c   |  2 +-
 drivers/acpi/Kconfig  |  2 -
 drivers/acpi/Makefile |  2 +-
 drivers/acpi/arm64/Kconfig|  1 +
 drivers/acpi/arm64/Makefile   |  2 +-
 drivers/acpi/arm64/iort.c |  6 +-
 drivers/acpi/scan.c   | 32 ++
 drivers/dma/tegra186-gpc-dma.c|  8 +--
 .../gpu/drm/nouveau/nvkm/subdev/ltc/gp10b.c   |  7 +-
 drivers/hv/hv_common.c|  2 +-
 drivers/iommu/Kconfig |  1 +
 drivers/iommu/amd/iommu.c |  2 -
 drivers/iommu/apple-dart.c|  1 -
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   |  1 -
 drivers/iommu/arm/arm-smmu/arm-smmu.c |  1 -
 drivers/iommu/intel/iommu.c   |  2 -
 drivers/iommu/iommu.c | 41 +++-
 drivers/iommu/of_iommu.c  | 64 ---
 drivers/iommu/omap-iommu.c|  1 -
 drivers/memory/tegra/tegra186.c   | 12 ++--
 drivers/of/device.c   | 24 ---
 include/linux/dma-map-ops.h   |  4 +-
 include/linux/iommu.h |  5 +-
 include/linux/of_iommu.h  | 13 ++--
 29 files changed, 124 insertions(+), 132 deletions(-)


base-commit: 173ff345925a394284250bfa6e47d231e62031c7
-- 
2.42.0



Re: [PATCH 0/3] drm/bridge: ti-sn65dsi86: Some updates

2023-11-28 Thread Laurent Pinchart
Hello,

On Fri, Nov 24, 2023 at 09:56:55AM +0100, Neil Armstrong wrote:
> On 23/11/2023 18:54, Uwe Kleine-König wrote:
> > Hello,
> > 
> > this is a series I created while staring at the ti-sn65dsi86 driver in
> > the context of my pwm-lifetime series.
> > 
> > The first patch should be fine. The last one has a few rough edges, but
> > maybe you like the direction this is going to? The 2nd patch probably
> > only makes sense if you also take the third.
> > 
> > Best regards
> > Uwe
> > 
> > Uwe Kleine-König (3):
> >drm/bridge: ti-sn65dsi86: Simplify using pm_runtime_resume_and_get()
> >drm/bridge: ti-sn65dsi86: Change parameters of
> >  ti_sn65dsi86_{read,write}_u16
> >drm/bridge: ti-sn65dsi86: Loosen coupling of PWM to ti-sn65dsi86 core
> > 
> >   drivers/gpu/drm/bridge/ti-sn65dsi86.c | 146 +++---
> >   1 file changed, 83 insertions(+), 63 deletions(-)
> > 
> > base-commit: 4e87148f80d198ba5febcbcc969c6b9471099a09
> 
> It looks fine to me, even without the goal to move the driver to drivers/pwm
> I think it's sane to move the pwm ddata out of the main pdata and associate
> it to the pwm aux device lifetime.
> 
> I don't see anything wrong, and so far it's OK for me, let's see if there are
> comments from other people before applying!

I like 1/3 very much, but as mentioned in a reply to 3/3, I'm not
convinced by that one at all. Not only does it make the driver more
complex for, I believe, very little gain (if any), usage of
devm_kzalloc() in ti_sn_pwm_probe() is most likely wrong. The lifetime of
driver-specific structures needs to be handled through reference
counting.

-- 
Regards,

Laurent Pinchart


Re: [PATCH 2/3] drm/bridge: ti-sn65dsi86: Change parameters of ti_sn65dsi86_{read,write}_u16

2023-11-28 Thread Laurent Pinchart
On Tue, Nov 28, 2023 at 04:34:30PM -0800, Doug Anderson wrote:
> On Thu, Nov 23, 2023 at 9:54 AM Uwe Kleine-König wrote:
> >
> > This aligns the function's parameters to regmap_{read,write} and
> > simplifies the next change that takes pwm driver data out of struct
> > ti_sn65dsi86.
> >
> > Signed-off-by: Uwe Kleine-König 
> > ---
> >  drivers/gpu/drm/bridge/ti-sn65dsi86.c | 20 ++--
> >  1 file changed, 10 insertions(+), 10 deletions(-)
> 
> I'm on the fence for this one. It almost feels like if this takes a
> "regmap" as the first field then it should be part of the regmap API.
> Adding a concept like this to the regmap API might be interesting if
> there were more than one user, but that's a pretty big yak to shave.

See include/media/v4l2-cci.h and drivers/media/v4l2-core/v4l2-cci.c. We
could discuss moving it to regmap.

> I'd tend to agree with your statement in the cover letter that this
> patch really makes more sense if we were to take patch #3, and (as per
> my response there) I'm not convinced.

Likewise :-) 1/3 is good, but without 3/3, I'm not convinced by 2/3.

> That being said, similar to patch #3 if everyone else thinks this is
> great then I won't stand in the way.

-- 
Regards,

Laurent Pinchart


Re: [PATCH 1/3] drm/bridge: ti-sn65dsi86: Simplify using pm_runtime_resume_and_get()

2023-11-28 Thread Laurent Pinchart
Hi Uwe,

Thank you for the patch.

On Thu, Nov 23, 2023 at 06:54:27PM +0100, Uwe Kleine-König wrote:
> pm_runtime_resume_and_get() already drops the runtime PM usage counter
> in the error case. So a call to pm_runtime_put_sync() can be dropped.
> 
> Signed-off-by: Uwe Kleine-König 

I wonder if checkpatch should warn about usage of pm_runtime_get_sync().
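To illustrate the difference, here is a simplified userspace model of the two calls. The demo_* names and struct demo_dev are hypothetical; the only behaviour taken from the patch description is that the combined helper drops the usage counter itself on failure, so this is a sketch and not the PM core's implementation.

```c
/* Hypothetical model of a device with a runtime PM usage counter. */
struct demo_dev {
	int usage_count;
	int resume_err;		/* 0 on success, negative errno on failure */
};

/* Models pm_runtime_get_sync(): the counter is raised even on failure. */
static int demo_get_sync(struct demo_dev *d)
{
	d->usage_count++;
	return d->resume_err;
}

static void demo_put(struct demo_dev *d)
{
	d->usage_count--;
}

/* Models pm_runtime_resume_and_get(): the counter is balanced on failure. */
static int demo_resume_and_get(struct demo_dev *d)
{
	int ret = demo_get_sync(d);

	if (ret < 0) {
		demo_put(d);
		return ret;
	}
	return 0;
}
```

In this model a caller of demo_get_sync() has to remember the put in its error path, while a caller of demo_resume_and_get() can simply return the error.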

Reviewed-by: Laurent Pinchart 

> ---
>  drivers/gpu/drm/bridge/ti-sn65dsi86.c | 6 ++
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/bridge/ti-sn65dsi86.c 
> b/drivers/gpu/drm/bridge/ti-sn65dsi86.c
> index c45c07840f64..5b8e1dfc458d 100644
> --- a/drivers/gpu/drm/bridge/ti-sn65dsi86.c
> +++ b/drivers/gpu/drm/bridge/ti-sn65dsi86.c
> @@ -1413,11 +1413,9 @@ static int ti_sn_pwm_apply(struct pwm_chip *chip, 
> struct pwm_device *pwm,
>   int ret;
>  
>   if (!pdata->pwm_enabled) {
> - ret = pm_runtime_get_sync(pdata->dev);
> - if (ret < 0) {
> - pm_runtime_put_sync(pdata->dev);
> + ret = pm_runtime_resume_and_get(pdata->dev);
> + if (ret < 0)
>   return ret;
> - }
>   }
>  
>   if (state->enabled) {

-- 
Regards,

Laurent Pinchart


Re: [REGRESSION]: nouveau: Asynchronous wait on fence

2023-11-28 Thread Owen T. Heisler

On 11/21/23 14:23, Owen T. Heisler wrote:

On 11/21/23 09:16, Linux regression tracking (Thorsten Leemhuis) wrote:

On 15.11.23 07:19, Owen T. Heisler wrote:

On 10/31/23 04:18, Linux regression tracking (Thorsten Leemhuis) wrote:

On 28.10.23 04:46, Owen T. Heisler wrote:

#regzbot introduced: d386a4b54607cf6f76e23815c2c9a3abc1d66882
#regzbot link: https://gitlab.freedesktop.org/drm/nouveau/-/issues/180

3. Suddenly the secondary Nvidia-connected display turns off and X 
stops responding to keyboard/mouse input.



I am currently testing v6.6 with the culprit commit reverted.


- v6.6: fails
- v6.6 with the culprit commit reverted: works

See  for full 
details including a decoded kernel log.


Thanks,
Owen

--
Owen T. Heisler



Re: [PATCH 3/3] drm/bridge: ti-sn65dsi86: Loosen coupling of PWM to ti-sn65dsi86 core

2023-11-28 Thread Laurent Pinchart
On Tue, Nov 28, 2023 at 04:32:10PM -0800, Doug Anderson wrote:
> On Thu, Nov 23, 2023 at 9:54 AM Uwe Kleine-König wrote:
> >
> > Introduce a dedicated private data structure for the pwm aux driver
> > provided by the sn65dsi86 driver. This way data needed for PWM operation
> > is (to a certain degree) nicely separated and doesn't occupy memory in
> > the ti_sn65dsi86 core's private data if the PWM isn't used.
> 
> I suspect we still end up at a loss memory-wise. All of the extra code
> + the overhead of another kmalloc seems like it would take up more
> space than the tiny bit of data in the structure.
> 
> 
> > The eventual goal is to decouple the PWM driver completely from the
> > ti-sn65dsi86 core and maybe even move it to a dedicated driver below
> > drivers/pwm. There are a few obstacles to that quest though:
> >
> >  - The busy pin check (implemented in ti_sn_pwm_pin_request() and
> >ti_sn_pwm_pin_release()) would need to be available unconditionally.
> >
> >  - The refclk should probably be abstracted by a struct clk such that the
> >pwm_refclk_freq member that currently still lives in ti-sn65dsi86
> >core driver data can be dropped.
> 
> Right that the above could be done with more abstraction layers. I
> guess the question I have is: how much do we gain with that?
> 
> Personally I'm not really sold on the idea. If others think this is a
> great change then I won't stand in the way, but IMO without a
> compelling reason this is just extra abstraction / complexity without
> any gain...

I'm not convinced either, especially on moving to a separate driver, but
even when it comes to dynamically allocating a new structure. Splitting
the PWM fields to a new ti_sn65dsi86_pwm would be fine (and I think
would increase readability), but we can then simply embed it in
ti_sn65dsi86.

> > +/*
> > + * struct ti_sn65dsi86_pwm_ddata - Platform data for ti-sn65dsi86 pwm 
> > driver.
> 
> Why "ddata" exactly? It seems like this is just the pwm "data" ?
> 
> 
> > + * @chip: pwm_chip if the PWM is exposed.
> > + * @pwm_enabled:  Used to track if the PWM signal is currently enabled.
> > + * @regmap:   Regmap for accessing i2c.
> > + * @pdata:   platform data of the parent device
> 
> "pdata" isn't a member of the struct, but "pwm_refclk_freq" is.

-- 
Regards,

Laurent Pinchart


Re: [PATCH 2/3] drm/bridge: ti-sn65dsi86: Change parameters of ti_sn65dsi86_{read, write}_u16

2023-11-28 Thread Doug Anderson
Hi,

On Thu, Nov 23, 2023 at 9:54 AM Uwe Kleine-König
 wrote:
>
> This aligns the function's parameters to regmap_{read,write} and
> simplifies the next change that takes pwm driver data out of struct
> ti_sn65dsi86.
>
> Signed-off-by: Uwe Kleine-König 
> ---
>  drivers/gpu/drm/bridge/ti-sn65dsi86.c | 20 ++--
>  1 file changed, 10 insertions(+), 10 deletions(-)

I'm on the fence for this one. It almost feels like if this takes a
"regmap" as the first field then it should be part of the regmap API.
Adding a concept like this to the regmap API might be interesting if
there were more than one user, but that's a pretty big yak to shave.

I'd tend to agree with your statement in the cover letter that this
patch really makes more sense if we were to take patch #3, and (as per
my response there) I'm not convinced.

That being said, similar to patch #3 if everyone else thinks this is
great then I won't stand in the way.

-Doug


Re: [PATCH 3/3] drm/bridge: ti-sn65dsi86: Loosen coupling of PWM to ti-sn65dsi86 core

2023-11-28 Thread Doug Anderson
Hi,

On Thu, Nov 23, 2023 at 9:54 AM Uwe Kleine-König
 wrote:
>
> Introduce a dedicated private data structure for the pwm aux driver
> provided by the sn65dsi86 driver. This way data needed for PWM operation
> is (to a certain degree) nicely separated and doesn't occupy memory in
> the ti_sn65dsi86 core's private data if the PWM isn't used.

I suspect we still end up at a loss memory-wise. All of the extra code
+ the overhead of another kmalloc seems like it would take up more
space than the tiny bit of data in the structure.


> The eventual goal is to decouple the PWM driver completely from the
> ti-sn65dsi86 core and maybe even move it to a dedicated driver below
> drivers/pwm. There are a few obstacles to that quest though:
>
>  - The busy pin check (implemented in ti_sn_pwm_pin_request() and
>ti_sn_pwm_pin_release()) would need to be available unconditionally.
>
>  - The refclk should probably be abstracted by a struct clk such that the
>pwm_refclk_freq member that currently still lives in ti-sn65dsi86
>core driver data can be dropped.

Right that the above could be done with more abstraction layers. I
guess the question I have is: how much do we gain with that?

Personally I'm not really sold on the idea. If others think this is a
great change then I won't stand in the way, but IMO without a
compelling reason this is just extra abstraction / complexity without
any gain...

> +/*
> + * struct ti_sn65dsi86_pwm_ddata - Platform data for ti-sn65dsi86 pwm driver.

Why "ddata" exactly? It seems like this is just the pwm "data" ?


> + * @chip: pwm_chip if the PWM is exposed.
> + * @pwm_enabled:  Used to track if the PWM signal is currently enabled.
> + * @regmap:   Regmap for accessing i2c.
> + * @pdata:   platform data of the parent device

"pdata" isn't a member of the struct, but "pwm_refclk_freq" is.


-Doug


Re: [RFC PATCH 03/10] drm/mipi-dsi: add API for manual control over the DSI link power state

2023-11-28 Thread Michael Walle

>> >> > DSI device lifetime has three different stages:
>> >> > 1. before the DSI link being powered up and clocking,
>> >> > 2. when the DSI link is in LP state (for the purpose of this question,
>> >> > this is the time between the DSI link being powered up and the video
>> >> > stream start)
>> >> > 3. when the DSI link is in HS state (while streaming the video).
>> >>
>> >> It's not clear to me what (2) is. What is the state of the clock and
>> >> data lanes?
>> >
>> > Clk and Data0 should be in the LP mode, ready for LP Data Transfer.
>>
>> Then this is somehow missing
>> https://docs.kernel.org/gpu/drm-kms-helpers.html#mipi-dsi-bridge-operation
>>
>>    A DSI host should keep the PHY powered down until the pre_enable operation
>>    is called. All lanes are in an undefined idle state up to this point, and
>>    it must not be assumed that it is LP-11. pre_enable should initialise the
>>    PHY, set the data lanes to LP-11, and the clock lane to either LP-11 or HS
>>    depending on the mode_flag MIPI_DSI_CLOCK_NON_CONTINUOUS.
>>
>> So I don't think these three states are sufficient, see below, that there
>> should be at least four.
>
>Which one is #4?

enabled clock lane (HS mode), data lanes in LP-11


What is the purpose of such a mode?


To repeat my first mail:

I'm facing similar issues with the tc358775 bridge. This bridge needs
to release its reset while both clock and data lanes are in LP-11
mode.
But then it needs to be configured (via I2C) while the clock lane is
enabled (HS mode), but the data lanes are still in LP-11 mode.

Therefore, the correct init sequence is:
(1) dsi host enables lanes, that is clock and data are in lp-11
(2) dsi bridge driver releases reset of the bridge
(3) dsi host enables clock lane, leaves data lanes in lp-11
(4) dsi bridge driver configures the bridge
(5) dsi host enables the video stream
(6) dsi bridge enables the output port of the bridge
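The sequence above can be sketched as a small state check. This is only an illustration: the demo_* enums and functions are hypothetical names, not an existing DRM API, and the mapping merely restates which link state each bridge-side step (2), (4) and (6) expects.

```c
#include <stdbool.h>

/* Hypothetical link states, matching the four cases discussed here. */
enum demo_dsi_state {
	DEMO_DSI_OFF,		/* link unpowered */
	DEMO_DSI_LP11,		/* clock and data lanes in LP-11 */
	DEMO_DSI_CLK_HS,	/* clock lane in HS, data lanes in LP-11 */
	DEMO_DSI_VIDEO,		/* clock and data in HS, video streaming */
};

/* Hypothetical bridge-side steps from the init sequence. */
enum demo_bridge_step {
	DEMO_RELEASE_RESET,	/* step (2) */
	DEMO_CONFIGURE,		/* step (4) */
	DEMO_ENABLE_OUTPUT,	/* step (6) */
};

/* Link state that each bridge step requires. */
static enum demo_dsi_state demo_required_state(enum demo_bridge_step step)
{
	switch (step) {
	case DEMO_RELEASE_RESET:
		return DEMO_DSI_LP11;
	case DEMO_CONFIGURE:
		return DEMO_DSI_CLK_HS;
	case DEMO_ENABLE_OUTPUT:
		return DEMO_DSI_VIDEO;
	}
	return DEMO_DSI_OFF;
}

/* True if the bridge may perform "step" while the link is in "link". */
static bool demo_step_allowed(enum demo_dsi_state link,
			      enum demo_bridge_step step)
{
	return link == demo_required_state(step);
}
```

The sketch makes the gap visible: with only "LP" and "HS" as link states there is nothing for DEMO_CONFIGURE to map to.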

-michael


>> > I don't think we support ULPS currently.
>> >
>> >
>> >>
>> >> I'm facing similar issues with the tc358775 bridge. This bridge needs
>> >> to release its reset while both clock and data lanes are in LP-11
>> >> mode.
>> >> But then it needs to be configured (via I2C) while the clock lane is
>> >> enabled (HS mode), but the data lanes are still in LP-11 mode.
>> >>
>> >> To me it looks like there is a fourth case then:
>> >> 1. unpowered
>> >> 2. DSI clock and data are in LP-11
>> >> 3. DSI clock is in HS and data are in LP-11
>> >> 4. DSI clock is in HS and data is in HS
>> >>
>> >> (And of course the bridge needs continuous clock mode).
>> >>
>> >> > Different DSI bridges have different requirements with respect to the
>> >> > code being executed at stages 1 and 2. For example several DSI-to-eDP
>> >> > bridges (ps8640, tc358767) require the link to be quiet during
>> >> > reset time.
>> >> > The DSI-controlled bridges and DSI panels need to send some commands
>> >> > in stage 2, before starting up video.
>> >> >
>> >> > In the DRM subsystem stage 3 naturally maps to the
>> >> > drm_bridge_funcs::enable, stage 1 also naturally maps to the
>> >> > drm_bridge_funcs::pre_enable. Stage 2 doesn't have its own place in
>> >> > the DRM call chain.
>> >> > Earlier we attempted to solve that using the pre_enable_prev_first,
>> >> > which remapped pre-enable callback execution order. However it has led
>> >> > us to two issues. First, at the DSI host driver we do not know
>> >> > whether the panel / bridge were updated to use pre_enable_prev_first
>> >> > or not. Second, if the bridge has to perform steps during both stages
>> >> > 1 and 2, it can not do that.
>> >> >
>> >> > I'm trying to find a way to express the difference between stages 1
>> >> > and 2 in the generic code, so that we do not have to worry about particular
>> >> > DSI host and DSI bridge / panel peculiarities when implementing the
>> >> > DSI host and/or DSI panel driver.
>> >>
>> >> For now, I have a rather hacky ".dsi_lp11_notify" callback in
>> >> drm_bridge_funcs which is supposed to be called by the DSI host while the
>> >> clock and data lanes are in LP-11 mode. But that is rather an RFC and me
>> >> needing something to get the driver for this bridge working. Because it's
>> >> badly broken. FWIW, you can find my work-in-progress patches at
>> >> https://github.com/mwalle/linux/tree/feature-tc358775-fixes
>> >>
>> >> -michael
>> >>
>> >
>> >
>> > --
>> > With best wishes
>> > Dmitry
>
>
>



Re: [RFC PATCH 03/10] drm/mipi-dsi: add API for manual control over the DSI link power state

2023-11-28 Thread Michael Walle

I'm facing similar issues with the tc358775 bridge. This bridge needs
to release its reset while both clock and data lanes are in LP-11
mode.
But then it needs to be configured (via I2C) while the clock lane is
enabled (HS mode), but the data lanes are still in LP-11 mode.


This is quite an interesting requirement. For example, I'm not 100%
sure whether we can get that done on our (msm) hosts. I need to double
check that.
What frequency is expected on the CLK lane? Can it be an arbitrary
frequency or it should be the same freq as the one used later for the
video transfer?


I presume it has to be the same frequency as the video stream later.
That's at least what I have successfully tested.
The datasheet doesn't mention if a frequency switch is allowed on the
clock lane (which would need a brief switch to LP mode, I presume). I'd
say it's not allowed/supported as the bridge is very picky regarding the
init sequence in general.

I'm using the Mediatek DSI host, where that sequence is possible. I.e. you
just enable the clock and data lanes in continuous clock mode, but don't
enable the video stream, which should leave the data lanes in LP-11 mode.


Sometimes you also have a command mode (instead of a video mode). And if
you don't send any commands, the data lanes are in LP-11 mode, too.

-michael


>> Therefore, the correct init sequence is:
>> (1) dsi host enables lanes, that is clock and data are in lp-11
>> (2) dsi bridge driver releases reset of the bridge
>> (3) dsi host enables clock lane, leaves data lanes in lp-11
>> (4) dsi bridge driver configures the bridge
>> (5) dsi host enables the video stream
>> (6) dsi bridge enables the output port of the bridge
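
As a rough illustration only (not code from this thread or from any driver), the six-step sequence above can be modelled as a small state machine over the four link states discussed in this thread. All names here (dsi_link_state, init_step, step_rules, link_model, apply_step) are hypothetical and exist only for this sketch; the real DRM bridge and DSI host callbacks are not modelled.

```c
#include <stdbool.h>

/* Hypothetical link states, following the four cases listed in the thread. */
enum dsi_link_state {
	DSI_LINK_UNPOWERED,         /* PHY off, lanes in an undefined state */
	DSI_LINK_LP11,              /* clock and data lanes in LP-11 */
	DSI_LINK_CLK_HS_DATA_LP11,  /* clock lane in HS, data lanes in LP-11 */
	DSI_LINK_HS,                /* clock and data in HS (video streaming) */
};

/* Hypothetical step identifiers, numbered like the init sequence. */
enum init_step {
	STEP_HOST_ENABLE_LANES = 1,
	STEP_BRIDGE_RELEASE_RESET,
	STEP_HOST_ENABLE_CLOCK,
	STEP_BRIDGE_CONFIGURE,
	STEP_HOST_ENABLE_VIDEO,
	STEP_BRIDGE_ENABLE_OUTPUT,
};

/* For each step: the link state it requires and the state it leaves behind. */
struct step_rule {
	enum dsi_link_state required;
	enum dsi_link_state result;
};

static const struct step_rule step_rules[] = {
	[STEP_HOST_ENABLE_LANES]    = { DSI_LINK_UNPOWERED,        DSI_LINK_LP11 },
	[STEP_BRIDGE_RELEASE_RESET] = { DSI_LINK_LP11,             DSI_LINK_LP11 },
	[STEP_HOST_ENABLE_CLOCK]    = { DSI_LINK_LP11,             DSI_LINK_CLK_HS_DATA_LP11 },
	[STEP_BRIDGE_CONFIGURE]     = { DSI_LINK_CLK_HS_DATA_LP11, DSI_LINK_CLK_HS_DATA_LP11 },
	[STEP_HOST_ENABLE_VIDEO]    = { DSI_LINK_CLK_HS_DATA_LP11, DSI_LINK_HS },
	[STEP_BRIDGE_ENABLE_OUTPUT] = { DSI_LINK_HS,               DSI_LINK_HS },
};

struct link_model {
	enum dsi_link_state state;
	int last_step;  /* 0 means no step has run yet */
};

/*
 * Apply one step. Returns false (and changes nothing) if the step is
 * out of order or the link is not in the state the step requires.
 */
static bool apply_step(struct link_model *m, enum init_step step)
{
	if (step < STEP_HOST_ENABLE_LANES || step > STEP_BRIDGE_ENABLE_OUTPUT)
		return false;
	if ((int)step != m->last_step + 1)
		return false;
	if (m->state != step_rules[step].required)
		return false;

	m->state = step_rules[step].result;
	m->last_step = (int)step;
	return true;
}
```

Starting from { DSI_LINK_UNPOWERED, 0 } and applying the steps in order succeeds every time; trying to configure the bridge (step 4) before the host has switched the clock lane to HS (step 3) is rejected, which is the ordering constraint the tc358775 imposes.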


Re: [PATCH v7 3/3] drm/panel-edp: Avoid adding multiple preferred modes

2023-11-28 Thread Doug Anderson
Hi,

On Fri, Nov 17, 2023 at 1:51 PM Hsin-Yi Wang  wrote:
>
> If a non generic edp-panel is under aux-bus, the mode read from edid would
> still be selected as preferred and results in multiple preferred modes,
> which is ambiguous.
>
> If both hard-coded mode and edid exists, only add mode from hard-coded.
>
> Signed-off-by: Hsin-Yi Wang 
> Reviewed-by: Douglas Anderson 
> ---
> v6->v7: no change
> ---
>  drivers/gpu/drm/panel/panel-edp.c | 14 +++---
>  1 file changed, 7 insertions(+), 7 deletions(-)

Pushed to drm-misc-next:

fb3f43d50d9b drm/panel-edp: Avoid adding multiple preferred modes


Re: [PATCH v7 2/3] drm/panel-edp: Add auo_b116xa3_mode

2023-11-28 Thread Doug Anderson
Hi,

On Fri, Nov 17, 2023 at 1:51 PM Hsin-Yi Wang  wrote:
>
> Add auo_b116xa3_mode to override the original modes parsed from edid
> of the panels 0x405c B116XAK01.0 and 0x615c B116XAN06.1 which result
> in glitches on panel.
>
> Signed-off-by: Hsin-Yi Wang 
> ---
> v6->v7: split usecase to another patch.
> ---
>  drivers/gpu/drm/panel/panel-edp.c | 19 +--
>  1 file changed, 17 insertions(+), 2 deletions(-)

Pushed to drm-misc-next:

70e0d5550f5c drm/panel-edp: Add auo_b116xa3_mode


Re: [PATCH v7 1/3] drm/panel-edp: Add override_edid_mode quirk for generic edp

2023-11-28 Thread Doug Anderson
Hi,

On Fri, Nov 17, 2023 at 1:51 PM Hsin-Yi Wang  wrote:
>
> Generic edp gets mode from edid. However, some panels report incorrect
> mode in this way, resulting in glitches on panel. Introduce a new quirk
> additional_mode to the generic edid to pick a correct hardcoded mode.
>
> Signed-off-by: Hsin-Yi Wang 
> Reviewed-by: Douglas Anderson 
> ---
> v6->v7: split usecase to another patch.
> ---
>  drivers/gpu/drm/panel/panel-edp.c | 48 +--
>  1 file changed, 45 insertions(+), 3 deletions(-)

Pushed to drm-misc-next:

9f7843b51581 drm/panel-edp: Add override_edid_mode quirk for generic edp


Re: [RFC PATCH 03/10] drm/mipi-dsi: add API for manual control over the DSI link power state

2023-11-28 Thread Dmitry Baryshkov
On Wed, 29 Nov 2023 at 00:20, Michael Walle  wrote:
>
> [sorry I fat fingered my former reply and converted all CCs to BCCs..]
>
> >> >> >> > DSI device lifetime has three different stages:
> >> >> >> > 1. before the DSI link being powered up and clocking,
> >> >> >> > 2. when the DSI link is in LP state (for the purpose of this 
> >> >> >> > question,
> >> >> >> > this is the time between the DSI link being powered up and the 
> >> >> >> > video
> >> >> >> > stream start)
> >> >> >> > 3. when the DSI link is in HS state (while streaming the video).
> >> >> >>
> >> >> >> It's not clear to me what (2) is. What is the state of the clock and
> >> >> >> data lanes?
> >> >> >
> >> >> > Clk and Data0 should be in the LP mode, ready for LP Data Transfer.
> >> >>
> >> >> Then this is somehow missing
> >> >> https://docs.kernel.org/gpu/drm-kms-helpers.html#mipi-dsi-bridge-operation
> >> >>
> >> >>A DSI host should keep the PHY powered down until the pre_enable
> >> >> operation
> >> >>is called. All lanes are in an undefined idle state up to this point,
> >> >> and
> >> >>it must not be assumed that it is LP-11. pre_enable should initialise
> >> >> the
> >> >>PHY, set the data lanes to LP-11, and the clock lane to either LP-11
> >> >> or HS
> >> >>depending on the mode_flag MIPI_DSI_CLOCK_NON_CONTINUOUS.
> >> >>
> >> >> So I don't think these three states are sufficient, see below, that
> >> >> there
> >> >> should be at least four.
> >> >
> >> >Which one is #4?
> >>
> >> enabled clock lane (HS mode), data lanes in LP-11
> >
> > What is the purpose of such a mode?
>
> To repeat my first mail:

Excuse me please, I either missed it, or forgot it.

>
> I'm facing similar issues with the tc358775 bridge. This bridge needs
> to release its reset while both clock and data lanes are in LP-11
> mode.
> But then it needs to be configured (via I2C) while the clock lane is
> enabled (HS mode), but the data lanes are still in LP-11 mode.

This is quite an interesting requirement. For example, I'm not 100%
sure whether we can get that done on our (msm) hosts. I need to double
check that.
What frequency is expected on the CLK lane? Can it be an arbitrary
frequency or it should be the same freq as the one used later for the
video transfer?

>
> Therefore, the correct init sequence is:
> (1) dsi host enables lanes, that is clock and data are in lp-11
> (2) dsi bridge driver releases reset of the bridge
> (3) dsi host enables clock lane, leaves data lanes in lp-11
> (4) dsi bridge driver configures the bridge
> (5) dsi host enables the video stream
> (6) dsi bridge enables the output port of the bridge
>
> -michael
>
> >> >> > I don't think we support ULPS currently.
> >> >> >
> >> >> >
> >> >> >>
> >> >> >> I'm facing similar issues with the tc358775 bridge. This bridge needs
> >> >> >> to release its reset while both clock and data lanes are in LP-11
> >> >> >> mode.
> >> >> >> But then it needs to be configured (via I2C) while the clock lane is
> >> >> >> enabled (HS mode), but the data lanes are still in LP-11 mode.
> >> >> >>
> >> >> >> To me it looks like there is a fourth case then:
> >> >> >> 1. unpowered
> >> >> >> 2. DSI clock and data are in LP-11
> >> >> >> 3. DSI clock is in HS and data are in LP-11
> >> >> >> 4. DSI clock is in HS and data is in HS
> >> >> >>
> >> >> >> (And of course the bridge needs continuous clock mode).
> >> >> >>
> >> >> >> > Different DSI bridges have different requirements with respect to 
> >> >> >> > the
> >> >> >> > code being executed at stages 1 and 2. For example several 
> >> >> >> > DSI-to-eDP
> >> >> >> > bridges (ps8640, tc358767) require the link to be quiet during
> >> >> >> > reset time.
> >> >> >> > The DSI-controlled bridges and DSI panels need to send some 
> >> >> >> > commands
> >> >> >> > in stage 2, before starting up video
> >> >> >> >
> >> >> >> > In the DRM subsystem stage 3 naturally maps to the
> >> >> >> > drm_bridge_funcs::enable, stage 1 also naturally maps to the
> >> >> >> > drm_bridge_funcs::pre_enable. Stage 2 doesn't have its own place in
> >> >> >> > the DRM call chain.
> >> >> >> > Earlier we attempted to solve that using the pre_enable_prev_first,
> >> >> >> > which remapped pre-enable callback execution order. However it has 
> >> >> >> > led
> >> >> >> > us to the two issues. First, at the DSI host driver we do not know
> >> >> >> > whether the panel / bridge were updated to use 
> >> >> >> > pre_enable_prev_first
> >> >> >> > or not. Second, if the bridge has to perform steps during both 
> >> >> >> > stages
> >> >> >> > 1 and 2, it can not do that.
> >> >> >> >
> >> >> >> > I'm trying to find a way to express the difference between stages 1
> >> >> >> > and 2 in the generic code, so that we do not need to worry about 
> >> >> >> > particular
> >> >> >> > DSI host and DSI bridge / panel peculiarities when implementing the
> >> >> >> > DSI host and/or DSI panel driver.
> >> >> >>
> >> >> >> For now, I have a rather hacky 

Re: [RFC PATCH 03/10] drm/mipi-dsi: add API for manual control over the DSI link power state

2023-11-28 Thread Michael Walle

[sorry I fat fingered my former reply and converted all CCs to BCCs..]


>> >> > DSI device lifetime has three different stages:
>> >> > 1. before the DSI link being powered up and clocking,
>> >> > 2. when the DSI link is in LP state (for the purpose of this question,
>> >> > this is the time between the DSI link being powered up and the video
>> >> > stream start)
>> >> > 3. when the DSI link is in HS state (while streaming the video).
>> >>
>> >> It's not clear to me what (2) is. What is the state of the clock and
>> >> data lanes?
>> >
>> > Clk and Data0 should be in the LP mode, ready for LP Data Transfer.
>>
>> Then this is somehow missing
>> https://docs.kernel.org/gpu/drm-kms-helpers.html#mipi-dsi-bridge-operation
>>
>>A DSI host should keep the PHY powered down until the pre_enable
>> operation
>>is called. All lanes are in an undefined idle state up to this point,
>> and
>>it must not be assumed that it is LP-11. pre_enable should initialise
>> the
>>PHY, set the data lanes to LP-11, and the clock lane to either LP-11
>> or HS
>>depending on the mode_flag MIPI_DSI_CLOCK_NON_CONTINUOUS.
>>
>> So I don't think these three states are sufficient, see below, that
>> there
>> should be at least four.
>
>Which one is #4?

>> enabled clock lane (HS mode), data lanes in LP-11
>
> What is the purpose of such a mode?


To repeat my first mail:

I'm facing similar issues with the tc358775 bridge. This bridge needs
to release its reset while both clock and data lanes are in LP-11
mode.
But then it needs to be configured (via I2C) while the clock lane is
enabled (HS mode), but the data lanes are still in LP-11 mode.

Therefore, the correct init sequence is:
(1) dsi host enables lanes, that is clock and data are in lp-11
(2) dsi bridge driver releases reset of the bridge
(3) dsi host enables clock lane, leaves data lanes in lp-11
(4) dsi bridge driver configures the bridge
(5) dsi host enables the video stream
(6) dsi bridge enables the output port of the bridge

-michael


>> > I don't think we support ULPS currently.
>> >
>> >
>> >>
>> >> I'm facing similar issues with the tc358775 bridge. This bridge needs
>> >> to release its reset while both clock and data lanes are in LP-11
>> >> mode.
>> >> But then it needs to be configured (via I2C) while the clock lane is
>> >> enabled (HS mode), but the data lanes are still in LP-11 mode.
>> >>
>> >> To me it looks like there is a fourth case then:
>> >> 1. unpowered
>> >> 2. DSI clock and data are in LP-11
>> >> 3. DSI clock is in HS and data are in LP-11
>> >> 4. DSI clock is in HS and data is in HS
>> >>
>> >> (And of course the bridge needs continuous clock mode).
>> >>
>> >> > Different DSI bridges have different requirements with respect to the
>> >> > code being executed at stages 1 and 2. For example several DSI-to-eDP
>> >> > bridges (ps8640, tc358767) require the link to be quiet during
>> >> > reset time.
>> >> > The DSI-controlled bridges and DSI panels need to send some commands
>> >> > in stage 2, before starting up video
>> >> >
>> >> > In the DRM subsystem stage 3 naturally maps to the
>> >> > drm_bridge_funcs::enable, stage 1 also naturally maps to the
>> >> > drm_bridge_funcs::pre_enable. Stage 2 doesn't have its own place in
>> >> > the DRM call chain.
>> >> > Earlier we attempted to solve that using the pre_enable_prev_first,
>> >> > which remapped pre-enable callback execution order. However it has led
>> >> > us to the two issues. First, at the DSI host driver we do not know
>> >> > whether the panel / bridge were updated to use pre_enable_prev_first
>> >> > or not. Second, if the bridge has to perform steps during both stages
>> >> > 1 and 2, it can not do that.
>> >> >
>> >> > I'm trying to find a way to express the difference between stages 1
>> >> > and 2 in the generic code, so that we do not need to worry about particular
>> >> > DSI host and DSI bridge / panel peculiarities when implementing the
>> >> > DSI host and/or DSI panel driver.
>> >>
>> >> For now, I have a rather hacky ".dsi_lp11_notify" callback in
>> >> drm_bridge_funcs which is supposed to be called by the DSI host while
>> >> the
>> >> clock and data lanes are in LP-11 mode. But that is rather an RFC and
>> >> me
>> >> needing something to get the driver for this bridge working. Because
>> >> it's
>> >> badly broken. FWIW, you can find my work-in-progress patches at
>> >> https://github.com/mwalle/linux/tree/feature-tc358775-fixes
>> >>
>> >> -michael
>> >>
>> >
>> >
>> > --
>> > With best wishes
>> > Dmitry
>
>
>



Re: Radeon regression in 6.6 kernel

2023-11-28 Thread Alex Deucher
On Mon, Nov 27, 2023 at 6:24 PM Phillip Susi  wrote:
>
> Alex Deucher  writes:
>
> >> In that case those are the already known problems with the scheduler
> >> changes, aren't they?
> >
> > Yes.  Those changes went into 6.7 though, not 6.6 AFAIK.  Maybe I'm
> > misunderstanding what the original report was actually testing.  If it
> > was 6.7, then try reverting:
> > 56e449603f0ac580700621a356d35d5716a62ce5
> > b70438004a14f4d0f9890b3297cd66248728546c
>
> At some point it was suggested that I file a gitlab issue, but I took
> this to mean it was already known and being worked on.  -rc3 came out
> today and still has the problem.  Is there a known issue I could track?
>

At this point, unless there are any objections, I think we should just
revert the two patches.

Alex


Re: [PATCH v5 00/32] drm/amd/display: add AMD driver-specific properties for color mgmt

2023-11-28 Thread Harry Wentland
On 2023-11-16 14:57, Melissa Wen wrote:
> Hello,
> 
> This series extends the current KMS color management API with AMD
> driver-specific properties to enhance the color management support on
> AMD Steam Deck. The key additions to the color pipeline include:
> 

snip

> Melissa Wen (18):
>   drm/drm_mode_object: increase max objects to accommodate new color
> props
>   drm/drm_property: make replace_property_blob_from_id a DRM helper
>   drm/drm_plane: track color mgmt changes per plane

If all patches are merged through amd-staging-drm-next I worry that
conflicts creep in if any code around replace_property_blob_from_id
changes in DRM.

My plan is to merge DRM patches through drm-misc-next, as well
as include them in the amd-staging-drm-next merge. They should then
fall out at the next amd-staging-drm-next pull and (hopefully)
ensure that there is no conflict.

If no objections I'll go ahead with that later this week.

Harry

>   drm/amd/display: add driver-specific property for plane degamma LUT
>   drm/amd/display: explicitly define EOTF and inverse EOTF
>   drm/amd/display: document AMDGPU pre-defined transfer functions
>   drm/amd/display: add plane 3D LUT driver-specific properties
>   drm/amd/display: add plane shaper LUT and TF driver-specific
> properties
>   drm/amd/display: add CRTC gamma TF driver-specific property
>   drm/amd/display: add comments to describe DM crtc color mgmt behavior
>   drm/amd/display: encapsulate atomic regamma operation
>   drm/amd/display: decouple steps for mapping CRTC degamma to DC plane
>   drm/amd/display: reject atomic commit if setting both plane and CRTC
> degamma
>   drm/amd/display: add plane shaper LUT support
>   drm/amd/display: add plane shaper TF support
>   drm/amd/display: add plane 3D LUT support
>   drm/amd/display: add plane CTM driver-specific property
>   drm/amd/display: add plane CTM support
> 
>  drivers/gpu/drm/amd/amdgpu/amdgpu_mode.h  |  91 ++
>  .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c |  34 +-
>  .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h | 108 +++
>  .../amd/display/amdgpu_dm/amdgpu_dm_color.c   | 818 --
>  .../amd/display/amdgpu_dm/amdgpu_dm_crtc.c|  72 ++
>  .../amd/display/amdgpu_dm/amdgpu_dm_plane.c   | 232 -
>  .../gpu/drm/amd/display/include/fixed31_32.h  |  12 +
>  drivers/gpu/drm/arm/malidp_crtc.c |   2 +-
>  drivers/gpu/drm/drm_atomic.c  |   1 +
>  drivers/gpu/drm/drm_atomic_state_helper.c |   1 +
>  drivers/gpu/drm/drm_atomic_uapi.c |  43 +-
>  drivers/gpu/drm/drm_property.c|  49 ++
>  include/drm/drm_mode_object.h |   2 +-
>  include/drm/drm_plane.h   |   7 +
>  include/drm/drm_property.h|   6 +
>  include/uapi/drm/drm_mode.h   |   8 +
>  16 files changed, 1377 insertions(+), 109 deletions(-)
> 



Re: [PATCH v18 04/26] drm/shmem-helper: Refactor locked/unlocked functions

2023-11-28 Thread Dmitry Osipenko
On 11/28/23 15:37, Boris Brezillon wrote:
> On Tue, 28 Nov 2023 12:14:42 +0100
> Maxime Ripard  wrote:
> 
>> Hi,
>>
>> On Fri, Nov 24, 2023 at 11:59:11AM +0100, Boris Brezillon wrote:
>>> On Fri, 24 Nov 2023 11:40:06 +0100
>>> Maxime Ripard  wrote:
>>>   
>>>> On Mon, Oct 30, 2023 at 02:01:43AM +0300, Dmitry Osipenko wrote:
>>>>> Add locked and remove unlocked postfixes from drm-shmem function names,
>>>>> making names consistent with the drm/gem core code.
>>>>>
>>>>> Reviewed-by: Boris Brezillon 
>>>>> Suggested-by: Boris Brezillon 
>>>>> Signed-off-by: Dmitry Osipenko 
>>>>
>>>> This contradicts my earlier ack on a patch but...
>>>>
>>>>> ---
>>>>>  drivers/gpu/drm/drm_gem_shmem_helper.c| 64 +--
>>>>>  drivers/gpu/drm/lima/lima_gem.c   |  8 +--
>>>>>  drivers/gpu/drm/panfrost/panfrost_drv.c   |  2 +-
>>>>>  drivers/gpu/drm/panfrost/panfrost_gem.c   |  6 +-
>>>>>  .../gpu/drm/panfrost/panfrost_gem_shrinker.c  |  2 +-
>>>>>  drivers/gpu/drm/panfrost/panfrost_mmu.c   |  2 +-
>>>>>  drivers/gpu/drm/v3d/v3d_bo.c  |  4 +-
>>>>>  drivers/gpu/drm/virtio/virtgpu_object.c   |  4 +-
>>>>>  include/drm/drm_gem_shmem_helper.h| 36 +--
>>>>>  9 files changed, 64 insertions(+), 64 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/drm_gem_shmem_helper.c b/drivers/gpu/drm/drm_gem_shmem_helper.c
>>>>> index 0d61f2b3e213..154585ddae08 100644
>>>>> --- a/drivers/gpu/drm/drm_gem_shmem_helper.c
>>>>> +++ b/drivers/gpu/drm/drm_gem_shmem_helper.c
>>>>> @@ -43,8 +43,8 @@ static const struct drm_gem_object_funcs drm_gem_shmem_funcs = {
>>>>>   .pin = drm_gem_shmem_object_pin,
>>>>>   .unpin = drm_gem_shmem_object_unpin,
>>>>>   .get_sg_table = drm_gem_shmem_object_get_sg_table,
>>>>> - .vmap = drm_gem_shmem_object_vmap,
>>>>> - .vunmap = drm_gem_shmem_object_vunmap,
>>>>> + .vmap = drm_gem_shmem_object_vmap_locked,
>>>>> + .vunmap = drm_gem_shmem_object_vunmap_locked,
>>>>
>>>> While I think we should indeed be consistent with the names, I would
>>>> also expect helpers to get the locking right by default.
>>>
>>> Wait, actually I think this patch does what you suggest already. The
>>> _locked() suffix tells the caller: "you should take care of the locking,
>>> I expect the lock to be held when this hook/function is called". So
>>> helpers without the _locked() suffix take care of the locking (which I
>>> guess matches your 'helpers get the locking right' expectation), and
>>> those with the _locked() suffix don't.
>>
>> What I meant by "getting the locking right" is indeed a bit ambiguous,
>> sorry. What I'm trying to say I guess is that, in this particular case,
>> I don't think you can expect the vmap implementation to be called with
>> or without the locks held. The doc for that function will say that it's
>> either one or the other, but not both.
>>
>> So helpers should follow what is needed to provide a default vmap/vunmap
>> implementation, including what locking is expected from a vmap/vunmap
>> implementation.
> 
> Hm, yeah, I think that's a matter of taste. When locking is often
> deferrable, like it is in DRM, I find it beneficial for functions and
> function pointers to reflect the locking scheme, rather than relying on
> people properly reading the doc, especially when this is the only
> outlier in the group of drm_gem_object_funcs we already have, and it's
> not even documented at the drm_gem_object_funcs level [1] :P.
> 
>>
>> If that means that vmap is always called with the locks taken, then
>> drm_gem_shmem_object_vmap can just assume that it will be called with
>> the locks taken and there's no need to mention it in the name (and you
>> can probably sprinkle a couple of lockdep assertions to make sure the
>> locking is indeed consistent).
> 
> Things get very confusing when you end up having drm_gem_shmem helpers
> that are suffixed with _locked() to encode the fact locking is the
> caller's responsibility and no suffix for the
> callee-takes-care-of-the-locking semantics, while other helpers that are
> not suffixed at all actually implement the
> caller-should-take-care-of-the-locking semantics.
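
To make the convention under discussion concrete, here is a minimal userspace sketch, assuming a toy "held" flag in place of the real GEM reservation lock. The names toy_resv, toy_obj, toy_obj_vmap() and toy_obj_vmap_locked() are hypothetical and are not part of drm_gem_shmem_helper; the assert() only stands in, in spirit, for the lockdep assertion mentioned above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a reservation lock: tracks only "held". */
struct toy_resv {
	bool held;
};

/* Hypothetical object with a lock and a vmap use counter. */
struct toy_obj {
	struct toy_resv resv;
	int vmap_use_count;
};

static void toy_resv_lock(struct toy_resv *r)
{
	assert(!r->held);  /* toy lock: no recursion, no contention */
	r->held = true;
}

static void toy_resv_unlock(struct toy_resv *r)
{
	assert(r->held);
	r->held = false;
}

/*
 * _locked() suffix: the CALLER must already hold the lock.
 * The assertion documents and checks that expectation.
 */
static int toy_obj_vmap_locked(struct toy_obj *obj)
{
	assert(obj->resv.held);
	return ++obj->vmap_use_count;
}

/*
 * No suffix: the helper takes and releases the lock itself,
 * then defers to the _locked() variant for the actual work.
 */
static int toy_obj_vmap(struct toy_obj *obj)
{
	int ret;

	toy_resv_lock(&obj->resv);
	ret = toy_obj_vmap_locked(obj);
	toy_resv_unlock(&obj->resv);
	return ret;
}
```

A caller that does not hold the lock uses toy_obj_vmap(); a hook that the core invokes with the lock already held would point at toy_obj_vmap_locked(), so the name alone tells the reader which side owns the locking.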
> 
>>
>>>> I'm not sure how reasonable it is, but I think I'd prefer to turn this
>>>> around and keep the drm_gem_shmem_object_vmap/unmap helpers name, and
>>>> convert whatever function needs to be converted to the unlock suffix so
>>>> we get a consistent naming.
>>>
>>> That would be an _unlocked() suffix if we do it the other way around. I
>>> think the main confusion comes from the names of the hooks in
>>> drm_gem_shmem_funcs. Some of them, like drm_gem_shmem_funcs::v[un]map()
>>> are called with the GEM resv lock held, and locking is handled by the
>>> core, others, like drm_gem_shmem_funcs::[un]pin() are called
>>> without the GEM resv lock held, and locking is deferred to the
>>> implementation. As I said, I don't mind suffixing hooks/helpers with
>>> _unlocked() for those that take 

Re: (subset) [PATCH 00/17] dt-bindings: samsung: add specific compatibles for existing SoC

2023-11-28 Thread Uwe Kleine-König
On Tue, Nov 28, 2023 at 06:49:23PM +0100, Thierry Reding wrote:
> 
> On Wed, 08 Nov 2023 11:43:26 +0100, Krzysztof Kozlowski wrote:
> > Merging
> > ===
> > I propose to take entire patchset through my tree (Samsung SoC), because:
^^^

> > 1. Next cycle two new SoCs will be coming (Google GS101 and 
> > ExynosAutov920), so
> >they will touch the same lines in some of the DT bindings (not all, 
> > though).
> >It is reasonable for me to take the bindings for the new SoCs, to have 
> > clean
> >`make dtbs_check` on the new DTS.
> > 2. Having it together helps me to have clean `make dtbs_check` within my 
> > tree
> >on the existing DTS.
> > 3. No drivers are affected by this change.
> > 4. I plan to do the same for Tesla FSD and Exynos ARM32 SoCs, thus expect
> >follow up patchsets.
> > 
> > [...]
> 
> Applied, thanks!
> 
> [12/17] dt-bindings: pwm: samsung: add specific compatibles for existing SoC
> commit: 5d67b8f81b9d598599366214e3b2eb5f84003c9f

You didn't honor (or even comment on) Krzysztof's proposal to take the
whole patchset via his tree (marked above). Was there some off-list
agreement?

Best regards
Uwe

-- 
Pengutronix e.K.   | Uwe Kleine-König|
Industrial Linux Solutions | https://www.pengutronix.de/ |




Re: [PATCH] drm/msm/dpu: Capture dpu snapshot when frame_done_timer timeouts

2023-11-28 Thread Dmitry Baryshkov
On Tue, 28 Nov 2023 at 19:43, Paloma Arellano  wrote:
>
>
> On 11/27/2023 5:48 PM, Dmitry Baryshkov wrote:
> > On Tue, 28 Nov 2023 at 03:12, Paloma Arellano  
> > wrote:
> >> Trigger a devcoredump to dump dpu registers and capture the drm atomic
> >> state when the frame_done_timer timeouts.
> >>
> >> Signed-off-by: Paloma Arellano 
> >> ---
> >>   drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c | 13 +++--
> >>   1 file changed, 11 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c 
> >> b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
> >> index 1cf7ff6caff4..5cf7594feb5a 100644
> >> --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
> >> +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
> >> @@ -191,6 +191,7 @@ struct dpu_encoder_virt {
> >>  void *crtc_frame_event_cb_data;
> >>
> >>  atomic_t frame_done_timeout_ms;
> >> +   atomic_t frame_done_timeout_cnt;
> >>  struct timer_list frame_done_timer;
> >>
> >>  struct msm_display_info disp_info;
> >> @@ -1204,6 +1205,8 @@ static void dpu_encoder_virt_atomic_enable(struct 
> >> drm_encoder *drm_enc,
> >>
> >>  dpu_enc->dsc = dpu_encoder_get_dsc_config(drm_enc);
> >>
> >> +   atomic_set(&dpu_enc->frame_done_timeout_cnt, 0);
> >> +
> >>  if (disp_info->intf_type == INTF_DP)
> >>  dpu_enc->wide_bus_en = 
> >> msm_dp_wide_bus_available(priv->dp[index]);
> >>  else if (disp_info->intf_type == INTF_DSI)
> >> @@ -2115,11 +2118,12 @@ static int _dpu_encoder_status_show(struct 
> >> seq_file *s, void *data)
> >>  for (i = 0; i < dpu_enc->num_phys_encs; i++) {
> >>  struct dpu_encoder_phys *phys = dpu_enc->phys_encs[i];
> >>
> >> -   seq_printf(s, "intf:%d  wb:%d  vsync:%8d underrun:%8d  
> >>   ",
> >> +   seq_printf(s, "intf:%d  wb:%d  vsync:%8d underrun:%8d  
> >>   frame_done_cnt:%d",
> >>  phys->hw_intf ? phys->hw_intf->idx - 
> >> INTF_0 : -1,
> >>  phys->hw_wb ? phys->hw_wb->idx - WB_0 : 
> >> -1,
> >>  atomic_read(&phys->vsync_cnt),
> >> -   atomic_read(&phys->underrun_cnt));
> >> +   atomic_read(&phys->underrun_cnt),
> >> +   atomic_read(&dpu_enc->frame_done_timeout_cnt));
> >>
> >>  seq_printf(s, "mode: %s\n", 
> >> dpu_encoder_helper_get_intf_type(phys->intf_mode));
> >>  }
> >> @@ -2341,6 +2345,10 @@ static void dpu_encoder_frame_done_timeout(struct 
> >> timer_list *t)
> >>
> >>  DPU_ERROR_ENC(dpu_enc, "frame done timeout\n");
> >>
> >> +   atomic_inc(&dpu_enc->frame_done_timeout_cnt);
> >> +   if (atomic_read(&dpu_enc->frame_done_timeout_cnt) == 1)
> >> +   msm_disp_snapshot_state(drm_enc->dev);
> > atomic_inc_and_test(), please
>
> Hi Dmitry,
>
> We only want to create a snapshot the first time the timer times out.
> atomic_inc_and_test() increments the value and then returns whether it
> is zero. FWIW I think I should change it to
> 'atomic_add_return(1, &dpu_enc->frame_done_timeout_cnt)' so that we can
> check whether this value equals one.

Works for me too.

I suggested atomic_inc_and_test() because then we can let devcoredump take
care of duplicate events.
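
For readers following along, the difference between the two checks can be sketched in userspace with C11 atomics. This is only an illustration under stated assumptions: add_return(), inc_and_test() and is_first_timeout() are hypothetical helpers that mimic the semantics of the kernel's atomic_add_return() and atomic_inc_and_test(); they are not the kernel implementations, and no DPU code is modelled.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Mimics atomic_add_return(): add i and return the NEW value. */
static int add_return(atomic_int *v, int i)
{
	/* atomic_fetch_add() returns the old value, so add i again. */
	return atomic_fetch_add(v, i) + i;
}

/* Mimics atomic_inc_and_test(): increment, true if the result is zero. */
static bool inc_and_test(atomic_int *v)
{
	return add_return(v, 1) == 0;
}

/*
 * True only for the caller whose increment takes the counter from 0 to 1.
 * Increment and comparison are a single atomic step, unlike a separate
 * atomic_inc() followed by atomic_read().
 */
static bool is_first_timeout(atomic_int *timeout_cnt)
{
	return add_return(timeout_cnt, 1) == 1;
}
```

With a counter that starts at zero, is_first_timeout() is true exactly once, whereas inc_and_test() would only fire if the counter started at -1. That is why the "equals one" check matches the "snapshot on the first timeout only" intent described above.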

>
> Thank you,
>
> Paloma
>
> >
> >> +
> >>  event = DPU_ENCODER_FRAME_EVENT_ERROR;
> >>  trace_dpu_enc_frame_done_timeout(DRMID(drm_enc), event);
> >>  dpu_enc->crtc_frame_event_cb(dpu_enc->crtc_frame_event_cb_data, 
> >> event);
> >> @@ -2392,6 +2400,7 @@ struct drm_encoder *dpu_encoder_init(struct 
> >> drm_device *dev,
> >>  goto fail;
> >>
> >>  atomic_set(&dpu_enc->frame_done_timeout_ms, 0);
> >> +   atomic_set(&dpu_enc->frame_done_timeout_cnt, 0);
> >>  timer_setup(&dpu_enc->frame_done_timer,
> >>  dpu_encoder_frame_done_timeout, 0);
> >>
> >> --
> >> 2.41.0
> >>
> >



-- 
With best wishes
Dmitry


Re: [RFC PATCH 03/10] drm/mipi-dsi: add API for manual control over the DSI link power state

2023-11-28 Thread Dmitry Baryshkov
On Tue, 28 Nov 2023 at 21:50, Michael Walle  wrote:
>
> >> >> > DSI device lifetime has three different stages:
> >> >> > 1. before the DSI link being powered up and clocking,
> >> >> > 2. when the DSI link is in LP state (for the purpose of this question,
> >> >> > this is the time between the DSI link being powered up and the video
> >> >> > stream start)
> >> >> > 3. when the DSI link is in HS state (while streaming the video).
> >> >>
> >> >> It's not clear to me what (2) is. What is the state of the clock and
> >> >> data lanes?
> >> >
> >> > Clk and Data0 should be in the LP mode, ready for LP Data Transfer.
> >>
> >> Then this is somehow missing
> >> https://docs.kernel.org/gpu/drm-kms-helpers.html#mipi-dsi-bridge-operation
> >>
> >>A DSI host should keep the PHY powered down until the pre_enable
> >> operation
> >>is called. All lanes are in an undefined idle state up to this point,
> >> and
> >>it must not be assumed that it is LP-11. pre_enable should initialise
> >> the
> >>PHY, set the data lanes to LP-11, and the clock lane to either LP-11
> >> or HS
> >>depending on the mode_flag MIPI_DSI_CLOCK_NON_CONTINUOUS.
> >>
> >> So I don't think these three states are sufficient, see below, that
> >> there
> >> should be at least four.
> >
> >Which one is #4?
>
> enabled clock lane (HS mode), data lanes in LP-11

What is the purpose of such a mode?

>
> -michael
>
> >>
> >> >
> >> > I don't think we support ULPS currently.
> >> >
> >> >
> >> >>
> >> >> I'm facing similar issues with the tc358775 bridge. This bridge needs
> >> >> to release its reset while both clock and data lanes are in LP-11
> >> >> mode.
> >> >> But then it needs to be configured (via I2C) while the clock lane is
> >> >> enabled (HS mode), but the data lanes are still in LP-11 mode.
> >> >>
> >> >> To me it looks like there is a fourth case then:
> >> >> 1. unpowered
> >> >> 2. DSI clock and data are in LP-11
> >> >> 3. DSI clock is in HS and data are in LP-11
> >> >> 4. DSI clock is in HS and data is in HS
> >> >>
> >> >> (And of course the bridge needs continuous clock mode).
> >> >>
> >> >> > Different DSI bridges have different requirements with respect to the
> >> >> > code being executed at stages 1 and 2. For example several DSI-to-eDP
> >> >> > bridges (ps8640, tc358767) require the link to be quiet during
> >> >> > reset time.
> >> >> > The DSI-controlled bridges and DSI panels need to send some commands
> >> >> > in stage 2, before starting up video
> >> >> >
> >> >> > In the DRM subsystem stage 3 naturally maps to the
> >> >> > drm_bridge_funcs::enable, stage 1 also naturally maps to the
> >> >> > drm_bridge_funcs::pre_enable. Stage 2 doesn't have its own place in
> >> >> > the DRM call chain.
> >> >> > Earlier we attempted to solve that using the pre_enable_prev_first,
> >> >> > which remapped pre-enable callback execution order. However it has led
> >> >> > us to the two issues. First, at the DSI host driver we do not know
> >> >> > whether the panel / bridge were updated to use pre_enable_prev_first
> >> >> > or not. Second, if the bridge has to perform steps during both stages
> >> >> > 1 and 2, it can not do that.
> >> >> >
> >> >> > I'm trying to find a way to express the difference between stages 1
> >> >> > and 2 in the generic code, so that we do not need to worry about particular
> >> >> > DSI host and DSI bridge / panel peculiarities when implementing the
> >> >> > DSI host and/or DSI panel driver.
> >> >>
> >> >> For now, I have a rather hacky ".dsi_lp11_notify" callback in
> >> >> drm_bridge_funcs which is supposed to be called by the DSI host while
> >> >> the
> >> >> clock and data lanes are in LP-11 mode. But that is rather an RFC and
> >> >> me
> >> >> needing something to get the driver for this bridge working. Because
> >> >> it's
> >> >> badly broken. FWIW, you can find my work-in-progress patches at
> >> >> https://github.com/mwalle/linux/tree/feature-tc358775-fixes
> >> >>
> >> >> -michael
> >> >>
> >> >
> >> >
> >> > --
> >> > With best wishes
> >> > Dmitry
> >
> >
> >
>


-- 
With best wishes
Dmitry


Re: [PATCH 1/3] Revert "drm/prime: Unexport helpers for fd/handle conversion"

2023-11-28 Thread Felix Kuehling

On 2023-11-28 12:22, Alex Deucher wrote:

On Thu, Nov 23, 2023 at 6:12 PM Felix Kuehling  wrote:

[+Alex]

On 2023-11-17 16:44, Felix Kuehling wrote:


This reverts commit 71a7974ac7019afeec105a54447ae1dc7216cbb3.

These helper functions are needed for KFD to export and import DMABufs
the right way without duplicating the tracking of DMABufs associated with
GEM objects while ensuring that move notifier callbacks are working as
intended.

CC: Christian König 
CC: Thomas Zimmermann 
Signed-off-by: Felix Kuehling 

Re: our discussion about v2 of this patch: If this version is
acceptable, can I get an R-b or A-b?

I would like to get this patch into drm-next as a prerequisite for
patches 2 and 3. I cannot submit it to the current amd-staging-drm-next
because the patch I'm reverting doesn't exist there yet.

Patch 2 and 3 could go into drm-next as well, or go through Alex's
amd-staging-drm-next branch once patch 1 is in drm-next. Alex, how do
you prefer to coordinate this?

I guess ideally this would go through my drm-next tree since your
other patches depend on it unless others feel strongly that it should
go through drm-misc.


Yes, drm-next would work best for applying this patch and the two 
patches that depend on it. I can send you the rebased patches from my 
drm-next based branch that I used for testing this.


Regards,
  Felix




Alex



Regards,
Felix



---
   drivers/gpu/drm/drm_prime.c | 33 ++---
   include/drm/drm_prime.h |  7 +++
   2 files changed, 25 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
index 63b709a67471..834a5e28abbe 100644
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -278,7 +278,7 @@ void drm_gem_dmabuf_release(struct dma_buf *dma_buf)
   }
   EXPORT_SYMBOL(drm_gem_dmabuf_release);

-/*
+/**
* drm_gem_prime_fd_to_handle - PRIME import function for GEM drivers
* @dev: drm_device to import into
* @file_priv: drm file-private structure
@@ -292,9 +292,9 @@ EXPORT_SYMBOL(drm_gem_dmabuf_release);
*
* Returns 0 on success or a negative error code on failure.
*/
-static int drm_gem_prime_fd_to_handle(struct drm_device *dev,
-   struct drm_file *file_priv, int prime_fd,
-   uint32_t *handle)
+int drm_gem_prime_fd_to_handle(struct drm_device *dev,
+struct drm_file *file_priv, int prime_fd,
+uint32_t *handle)
   {
   struct dma_buf *dma_buf;
   struct drm_gem_object *obj;
@@ -360,6 +360,7 @@ static int drm_gem_prime_fd_to_handle(struct drm_device 
*dev,
   dma_buf_put(dma_buf);
   return ret;
   }
+EXPORT_SYMBOL(drm_gem_prime_fd_to_handle);

   int drm_prime_fd_to_handle_ioctl(struct drm_device *dev, void *data,
struct drm_file *file_priv)
@@ -408,7 +409,7 @@ static struct dma_buf *export_and_register_object(struct 
drm_device *dev,
   return dmabuf;
   }

-/*
+/**
* drm_gem_prime_handle_to_fd - PRIME export function for GEM drivers
* @dev: dev to export the buffer from
* @file_priv: drm file-private structure
@@ -421,10 +422,10 @@ static struct dma_buf *export_and_register_object(struct 
drm_device *dev,
* The actual exporting from GEM object to a dma-buf is done through the
* &drm_gem_object_funcs.export callback.
*/
-static int drm_gem_prime_handle_to_fd(struct drm_device *dev,
-   struct drm_file *file_priv, uint32_t handle,
-   uint32_t flags,
-   int *prime_fd)
+int drm_gem_prime_handle_to_fd(struct drm_device *dev,
+struct drm_file *file_priv, uint32_t handle,
+uint32_t flags,
+int *prime_fd)
   {
   struct drm_gem_object *obj;
   int ret = 0;
@@ -506,6 +507,7 @@ static int drm_gem_prime_handle_to_fd(struct drm_device 
*dev,

   return ret;
   }
+EXPORT_SYMBOL(drm_gem_prime_handle_to_fd);

   int drm_prime_handle_to_fd_ioctl(struct drm_device *dev, void *data,
struct drm_file *file_priv)
@@ -864,9 +866,9 @@ EXPORT_SYMBOL(drm_prime_get_contiguous_size);
* @obj: GEM object to export
* @flags: flags like DRM_CLOEXEC and DRM_RDWR
*
- * This is the implementation of the &drm_gem_object_funcs.export functions
- * for GEM drivers using the PRIME helpers. It is used as the default for
- * drivers that do not set their own.
+ * This is the implementation of the &drm_gem_object_funcs.export functions for GEM drivers
+ * using the PRIME helpers. It is used as the default in
+ * drm_gem_prime_handle_to_fd().
*/
   struct dma_buf *drm_gem_prime_export(struct drm_gem_object *obj,
int flags)
@@ -962,9 +964,10 @@ EXPORT_SYMBOL(drm_gem_prime_import_dev);
* @dev: drm_device to import into
* 

Re: [PATCH v3 16/16] drm/tilcdc: Convert to platform remove callback returning void

2023-11-28 Thread jyri . sarha
November 2, 2023 at 6:56 PM, "Uwe Kleine-König" <u.kleine-koe...@pengutronix.de> wrote:

> 
> The .remove() callback for a platform driver returns an int which makes
> many driver authors wrongly assume it's possible to do error handling by
> returning an error code. However the value returned is (mostly) ignored
> and this typically results in resource leaks. To improve here there is a
> quest to make the remove callback return void. In the first step of this
> quest all drivers are converted to .remove_new() which already returns
> void.
> 
> There is one error path in tilcdc_pdev_remove() that potentially could
> yield a non-zero return code. In this case an error message describing
> the failure is emitted now instead of
> 
>  remove callback returned a non-zero value. This will be ignored.
> 
> before. Otherwise there is no difference. Also note that currently
> tilcdc_get_external_components() doesn't return negative values.
> 
> Signed-off-by: Uwe Kleine-König 

Applied this on top of drm-misc-next, dug up my good old Beaglebone-Black, and 
tested that everything still works, so:

Tested-by: Jyri Sarha 

I'll apply this shortly to drm-misc-next.

Best regards,
Jyri

> ---
> drivers/gpu/drm/tilcdc/tilcdc_drv.c | 9 -
> 1 file changed, 4 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/tilcdc/tilcdc_drv.c 
> b/drivers/gpu/drm/tilcdc/tilcdc_drv.c
> index 8ebd7134ee21..137cd9f62e9f 100644
> --- a/drivers/gpu/drm/tilcdc/tilcdc_drv.c
> +++ b/drivers/gpu/drm/tilcdc/tilcdc_drv.c
> @@ -570,19 +570,18 @@ static int tilcdc_pdev_probe(struct platform_device 
> *pdev)
>  match);
> }
> 
> -static int tilcdc_pdev_remove(struct platform_device *pdev)
> +static void tilcdc_pdev_remove(struct platform_device *pdev)
> {
>  int ret;
> 
>  ret = tilcdc_get_external_components(&pdev->dev, NULL);
>  if (ret < 0)
> - return ret;
> + dev_err(&pdev->dev, "tilcdc_get_external_components() failed (%pe)\n",
> + ERR_PTR(ret));
>  else if (ret == 0)
>  tilcdc_fini(platform_get_drvdata(pdev));
>  else
>  component_master_del(&pdev->dev, &tilcdc_comp_ops);
> -
> - return 0;
> }
> 
> static void tilcdc_pdev_shutdown(struct platform_device *pdev)
> @@ -599,7 +598,7 @@ MODULE_DEVICE_TABLE(of, tilcdc_of_match);
> 
> static struct platform_driver tilcdc_platform_driver = {
>  .probe = tilcdc_pdev_probe,
> - .remove = tilcdc_pdev_remove,
> + .remove_new = tilcdc_pdev_remove,
>  .shutdown = tilcdc_pdev_shutdown,
>  .driver = {
>  .name = "tilcdc",
> -- 
> 2.42.0
>
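
For readers following the conversion quoted above, here is a minimal userspace sketch of the control flow of the converted callback. It is not the driver code: `mock_pdev_remove()` and the `mock_action` codes are hypothetical names, and the integer `ret` stands in for the value returned by tilcdc_get_external_components(). The point it illustrates is that the error case is reported instead of being returned to a caller that would ignore it.

```c
#include <stdio.h>

/* Hypothetical action codes, used only to make the sketch observable. */
enum mock_action { MOCK_LOGGED_ERROR, MOCK_FINI, MOCK_COMPONENT_DEL };

/*
 * Stand-in for the converted remove callback. 'ret' plays the role of the
 * value returned by tilcdc_get_external_components(). Nothing is returned
 * to the caller; a negative value is only reported.
 */
static void mock_pdev_remove(int ret, enum mock_action *taken)
{
	if (ret < 0) {
		/* Error path: emit a message, there is nobody to return it to. */
		fprintf(stderr, "get_external_components() failed (%d)\n", ret);
		*taken = MOCK_LOGGED_ERROR;
	} else if (ret == 0) {
		/* No external components: tear the device down directly. */
		*taken = MOCK_FINI;
	} else {
		/* External components present: unregister the component master. */
		*taken = MOCK_COMPONENT_DEL;
	}
}
```

The out-parameter exists only so the three branches can be checked in a test; the real callback has no such argument.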


Re: [RFC PATCH 03/10] drm/mipi-dsi: add API for manual control over the DSI link power state

2023-11-28 Thread Michael Walle
>> >> > DSI device lifetime has three different stages:
>> >> > 1. before the DSI link being powered up and clocking,
>> >> > 2. when the DSI link is in LP state (for the purpose of this question,
>> >> > this is the time between the DSI link being powered up and the video
>> >> > stream start)
>> >> > 3. when the DSI link is in HS state (while streaming the video).
>> >>
>> >> It's not clear to me what (2) is. What is the state of the clock and
>> >> data lanes?
>> >
>> > Clk and Data0 should be in the LP mode, ready for LP Data Transfer.
>>
>> Then this is somehow missing
>> https://docs.kernel.org/gpu/drm-kms-helpers.html#mipi-dsi-bridge-operation
>>
>>A DSI host should keep the PHY powered down until the pre_enable
>> operation
>>is called. All lanes are in an undefined idle state up to this point,
>> and
>>it must not be assumed that it is LP-11. pre_enable should initialise
>> the
>>PHY, set the data lanes to LP-11, and the clock lane to either LP-11
>> or HS
>>depending on the mode_flag MIPI_DSI_CLOCK_NON_CONTINUOUS.
>>
>> So I don't think these three states are sufficient, see below, that
>> there
>> should be at least four.
>
>Which one is #4?

enabled clock lane (HS mode), data lanes in LP-11

-michael

>>
>> >
>> > I don't think we support ULPS currently.
>> >
>> >
>> >>
>> >> I'm facing similar issues with the tc358775 bridge. This bridge needs
>> >> to release its reset while both clock and data lanes are in LP-11
>> >> mode.
>> >> But then it needs to be configured (via I2C) while the clock lane is
>> >> enabled (HS mode), but the data lanes are still in LP-11 mode.
>> >>
>> >> To me it looks like there is a fourth case then:
>> >> 1. unpowered
>> >> 2. DSI clock and data are in LP-11
>> >> 3. DSI clock is in HS and data are in LP-11
>> >> 4. DSI clock is in HS and data is in HS
>> >>
>> >> (And of course the bridge needs continuous clock mode).
>> >>
>> >> > Different DSI bridges have different requirements with respect to the
>> >> > code being executed at stages 1 and 2. For example several DSI-to-eDP
>> >> > bridges (ps8640, tc358767) require the link to be quiet during
>> >> > reset time.
>> >> > The DSI-controlled bridges and DSI panels need to send some commands
>> >> > in stage 2, before starting up video
>> >> >
>> >> > In the DRM subsystem stage 3 naturally maps to the
>> >> > drm_bridge_funcs::enable, stage 1 also naturally maps to the
>> >> > drm_bridge_funcs::pre_enable. Stage 2 doesn't have its own place in
>> >> > the DRM call chain.
>> >> > Earlier we attempted to solve that using the pre_enable_prev_first,
>> >> > which remapped pre-enable callback execution order. However it has led
>> >> > us to the two issues. First, at the DSI host driver we do not know
>> >> > whether the panel / bridge were updated to use pre_enable_prev_first
>> >> > or not. Second, if the bridge has to perform steps during both stages
>> >> > 1 and 2, it can not do that.
>> >> >
>> >> > I'm trying to find a way to express the difference between stages 1
>> >> > and 2 in the generic code, so that we do not have to worry about particular
>> >> > DSI host and DSI bridge / panel peculiarities when implementing the
>> >> > DSI host and/or DSI panel driver.
>> >>
>> >> For now, I have a rather hacky ".dsi_lp11_notify" callback in
>> >> drm_bridge_funcs which is supposed to be called by the DSI host while
>> >> the
>> >> clock and data lanes are in LP-11 mode. But that is rather an RFC and
>> >> me
>> >> needing something to get the driver for this bridge working. Because
>> >> it's
>> >> badly broken. FWIW, you can find my work-in-progress patches at
>> >> https://github.com/mwalle/linux/tree/feature-tc358775-fixes
>> >>
>> >> -michael
>> >>
>> >
>> >
>> > --
>> > With best wishes
>> > Dmitry
>
>
>
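
The four link states listed in this thread (unpowered; clock and data in LP-11; clock in HS with data in LP-11; clock and data in HS) can be written down as a small sketch. This is only an illustration under stated assumptions: the enum and helper names below are hypothetical and are not part of the DRM MIPI-DSI API, and the two predicates merely encode the tc358775 requirements as described above (reset released while everything is in LP-11, I2C configuration while the clock lane is in HS and the data lanes are still in LP-11).

```c
#include <stdbool.h>

/* Hypothetical names for illustration; not an existing kernel API. */
enum dsi_link_state {
	DSI_LINK_UNPOWERED,       /* 1. PHY powered down, lanes undefined   */
	DSI_LINK_LP11,            /* 2. clock and data lanes in LP-11       */
	DSI_LINK_CLK_HS_DATA_LP,  /* 3. clock lane in HS, data lanes LP-11  */
	DSI_LINK_HS,              /* 4. clock and data lanes in HS          */
};

/* Per the thread: the bridge reset must be released in state 2. */
static bool bridge_can_release_reset(enum dsi_link_state s)
{
	return s == DSI_LINK_LP11;
}

/* Per the thread: I2C configuration must happen in state 3. */
static bool bridge_can_configure(enum dsi_link_state s)
{
	return s == DSI_LINK_CLK_HS_DATA_LP;
}
```

With only three states there is no value for which `bridge_can_configure()` could return true, which is the gap being pointed out.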



Re: [PATCH v3 16/16] drm/tilcdc: Convert to platform remove callback returning void

2023-11-28 Thread sarha
November 28, 2023 at 6:49 PM, "Uwe Kleine-König" <u.kleine-koe...@pengutronix.de> wrote:

> 
> On Fri, Nov 03, 2023 at 09:58:07AM +0200, Tomi Valkeinen wrote:
> 
> > 
> > On 02/11/2023 18:56, Uwe Kleine-König wrote:
> > The .remove() callback for a platform driver returns an int which makes
> > many driver authors wrongly assume it's possible to do error handling by
> > returning an error code. However the value returned is (mostly) ignored
> > and this typically results in resource leaks. To improve here there is a
> > quest to make the remove callback return void. In the first step of this
> > quest all drivers are converted to .remove_new() which already returns
> > void.
> > [...]
> > 
> > Reviewed-by: Tomi Valkeinen 
> > 
> 
> This patch didn't make it into next yet. Who is responsible to pick this
> up?
> 

I expected the whole series had been applied at once. But yes, I can apply this 
patch.

Best regards,
Jyri

> Best regards
> Uwe
> 
> -- 
> Pengutronix e.K. | Uwe Kleine-König |
> Industrial Linux Solutions | https://www.pengutronix.de/|
>


Re: [PATCH] drm/imagination: DRM_POWERVR should depend on ARCH_K3

2023-11-28 Thread Javier Martinez Canillas
Geert Uytterhoeven  writes:

> Hi Javier,
>
> On Tue, Nov 28, 2023 at 8:03 PM Javier Martinez Canillas
>  wrote:
>> Geert Uytterhoeven  writes:
>> > The Imagination Technologies PowerVR Series 6 GPU is currently only
>> > supported on Texas Instruments K3 AM62x SoCs.  Hence add a dependency on
>> > ARCH_K3, to prevent asking the user about this driver when configuring a
>> > kernel without Texas Instruments K3 Multicore SoC support.
>> >
>> > Fixes: 4babef0708656c54 ("drm/imagination: Add skeleton PowerVR driver")
>> > Signed-off-by: Geert Uytterhoeven 
>> > ---
>>
>> Indeed. Although I wonder what is the supposed policy since for example
>> the DRM_PANFROST symbol only depends on ARM || ARM64 and others such as
>
> I think ARM Mali is sufficiently ubiquitous on ARM/ARM64 systems to
> have just an ARM/ARM64 dependency...
>

Fair.

>> DRM_ETNAVIV don't even have an SoC or architecture dependency.
>
> Vivante GPUs are found in DTS files on at least 4 architectures.
> Might be worthwhile to add some dependencies, though...
>

Yeah, that's what I was thinking.

>> In any case, I agree with you that restricting to only K3 makes sense.
>
> I am looking forward to adding || SOC_AM33XX || ARCH_RENESAS || ...,
> eventually ;-)
>

Same! :)

>> Reviewed-by: Javier Martinez Canillas 
>
> Thanks!
>
> Gr{oetje,eeting}s,
>
> Geert
>
> -- 
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- 
> ge...@linux-m68k.org
>
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like 
> that.
> -- Linus Torvalds
>

-- 
Best regards,

Javier Martinez Canillas
Core Platforms
Red Hat



Re: [PATCH] drm/sched: Partial revert of "Qualify drm_sched_wakeup() by drm_sched_entity_is_ready()"

2023-11-28 Thread Luben Tuikov
On 2023-11-27 11:09, Bert Karwatzki wrote:
> Commit f3123c2590005c, in combination with the use of work queues by the GPU
> scheduler, leads to random lock-ups of the GUI.
> 
> This is a partial revert of commit f3123c2590005c since drm_sched_wakeup() still
> needs its entity argument to pass it to drm_sched_can_queue().
> 
> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2994
> Link: 
> https://lists.freedesktop.org/archives/dri-devel/2023-November/431606.html
> Fixes: f3123c2590005c ("drm/sched: Qualify drm_sched_wakeup() by 
> drm_sched_entity_is_ready()")
> 
> Signed-off-by: Bert Karwatzki 
> ---
>  drivers/gpu/drm/scheduler/sched_main.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
> b/drivers/gpu/drm/scheduler/sched_main.c
> index 682aebe96db7..550492a7a031 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -1029,9 +1029,8 @@ EXPORT_SYMBOL(drm_sched_job_cleanup);
>  void drm_sched_wakeup(struct drm_gpu_scheduler *sched,
> struct drm_sched_entity *entity)
>  {
> - if (drm_sched_entity_is_ready(entity))
> - if (drm_sched_can_queue(sched, entity))
> - drm_sched_run_job_queue(sched);
> + if (drm_sched_can_queue(sched, entity))
> + drm_sched_run_job_queue(sched);
>  }
> 
>  /**
> --
> 2.43.0
> 

Reviewed-by: Luben Tuikov 

Pushed to drm-misc-next.

Thanks!
-- 
Regards,
Luben
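
As a rough illustration of what the partial revert changes, the sketch below models the two variants of the wakeup check in plain C. The `mock_*` types and functions are hypothetical stand-ins, not the scheduler's real structures; the only behaviour taken from the patch is that the outer "entity is ready" test is dropped while the entity is still passed to the can-queue check.

```c
#include <stdbool.h>

/* Simplified stand-ins; the real structures live in the DRM GPU scheduler. */
struct mock_entity { bool ready; };
struct mock_sched  { bool can_queue; int runs; };

/* Stand-in for the can-queue check, which still receives the entity. */
static bool mock_can_queue(const struct mock_sched *s,
			   const struct mock_entity *e)
{
	(void)e; /* unused in this simplified model */
	return s->can_queue;
}

/* Before the partial revert: wake only if the entity reports ready. */
static void mock_wakeup_qualified(struct mock_sched *s,
				  const struct mock_entity *e)
{
	if (e->ready)
		if (mock_can_queue(s, e))
			s->runs++;
}

/* After the partial revert: readiness of the entity is not consulted. */
static void mock_wakeup_reverted(struct mock_sched *s,
				 const struct mock_entity *e)
{
	if (mock_can_queue(s, e))
		s->runs++;
}
```

In this model a wakeup for an entity that is not ready is silently dropped by the qualified variant and acted on by the reverted one.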



Re: [PATCH] drm/imagination: DRM_POWERVR should depend on ARCH_K3

2023-11-28 Thread Geert Uytterhoeven
Hi Javier,

On Tue, Nov 28, 2023 at 8:03 PM Javier Martinez Canillas
 wrote:
> Geert Uytterhoeven  writes:
> > The Imagination Technologies PowerVR Series 6 GPU is currently only
> > supported on Texas Instruments K3 AM62x SoCs.  Hence add a dependency on
> > ARCH_K3, to prevent asking the user about this driver when configuring a
> > kernel without Texas Instruments K3 Multicore SoC support.
> >
> > Fixes: 4babef0708656c54 ("drm/imagination: Add skeleton PowerVR driver")
> > Signed-off-by: Geert Uytterhoeven 
> > ---
>
> Indeed. Although I wonder what is the supposed policy since for example
> the DRM_PANFROST symbol only depends on ARM || ARM64 and others such as

I think ARM Mali is sufficiently ubiquitous on ARM/ARM64 systems to
have just an ARM/ARM64 dependency...

> DRM_ETNAVIV don't even have an SoC or architecture dependency.

Vivante GPUs are found in DTS files on at least 4 architectures.
Might be worthwhile to add some dependencies, though...

> In any case, I agree with you that restricting to only K3 makes sense.

I am looking forward to adding || SOC_AM33XX || ARCH_RENESAS || ...,
eventually ;-)

> Reviewed-by: Javier Martinez Canillas 

Thanks!

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: [PATCH] drm/imagination: DRM_POWERVR should depend on ARCH_K3

2023-11-28 Thread Javier Martinez Canillas
Geert Uytterhoeven  writes:

Hello Geert,

> The Imagination Technologies PowerVR Series 6 GPU is currently only
> supported on Texas Instruments K3 AM62x SoCs.  Hence add a dependency on
> ARCH_K3, to prevent asking the user about this driver when configuring a
> kernel without Texas Instruments K3 Multicore SoC support.
>
> Fixes: 4babef0708656c54 ("drm/imagination: Add skeleton PowerVR driver")
> Signed-off-by: Geert Uytterhoeven 
> ---

Indeed. Although I wonder what is the supposed policy since for example
the DRM_PANFROST symbol only depends on ARM || ARM64 and others such as
DRM_ETNAVIV don't even have an SoC or architecture dependency.

In any case, I agree with you that restricting to only K3 makes sense.

Reviewed-by: Javier Martinez Canillas 

-- 
Best regards,

Javier Martinez Canillas
Core Platforms
Red Hat



Re: [EXTERNAL] [PATCH drm-misc-next 5/5] drm/imagination: vm: make use of GPUVM's drm_exec helper

2023-11-28 Thread Danilo Krummrich

On 11/28/23 11:47, Donald Robson wrote:

Hi Danilo,

Apologies - I guess I should have submitted a patch to handle zero fences in 
your
locking functions with the final patch series.

On Sat, 2023-11-25 at 00:36 +0100, Danilo Krummrich wrote:

*** CAUTION: This email originates from a source not known to Imagination 
Technologies. Think before you click a link or open an attachment ***

Make use of GPUVM's drm_exec helper functions preventing direct access
to GPUVM internal data structures, such as the external object list.

This is especially important to ensure following the locking rules
around the GPUVM external object list.

Fixes: ff5f643de0bf ("drm/imagination: Add GEM and VM related code")
Signed-off-by: Danilo Krummrich 
---
  drivers/gpu/drm/imagination/pvr_vm.c | 16 +---
  1 file changed, 5 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/imagination/pvr_vm.c 
b/drivers/gpu/drm/imagination/pvr_vm.c
index e0d74d9a6190..3f7888f5cc53 100644
--- a/drivers/gpu/drm/imagination/pvr_vm.c
+++ b/drivers/gpu/drm/imagination/pvr_vm.c
@@ -337,27 +337,21 @@ static int
  pvr_vm_bind_op_lock_resvs(struct drm_exec *exec, struct pvr_vm_bind_op 
*bind_op)
  {
drm_exec_until_all_locked(exec) {
-   struct drm_gem_object *r_obj = &bind_op->vm_ctx->dummy_gem;
struct drm_gpuvm *gpuvm = &bind_op->vm_ctx->gpuvm_mgr;
struct pvr_gem_object *pvr_obj = bind_op->pvr_obj;
-   struct drm_gpuvm_bo *gpuvm_bo;
  
  		/* Acquire lock on the vm_context's reserve object. */

-   int err = drm_exec_lock_obj(exec, r_obj);
+   int err = drm_gpuvm_prepare_vm(gpuvm, exec, 0);
  
  		drm_exec_retry_on_contention(exec);

if (err)
return err;
  
  		/* Acquire lock on all BOs in the context. */

-   list_for_each_entry(gpuvm_bo, &gpuvm->extobj.list,
-   list.entry.extobj) {
-   err = drm_exec_lock_obj(exec, gpuvm_bo->obj);
-
-   drm_exec_retry_on_contention(exec);
-   if (err)
-   return err;
-   }
+   err = drm_gpuvm_prepare_objects(gpuvm, exec, 0);
+   drm_exec_retry_on_contention(exec);
+   if (err)
+   return err;


Before I discovered the problem when not reserving fences, I was trying to use
drm_gpuvm_exec_lock() with vm_exec->extra.fn() for the part below.  Is there
a reason not to do that now?


No, that works - gonna change that.

- Danilo



Many thanks,
Donald

  
  		/* Unmap operations don't have an object to lock. */

if (!pvr_obj)




[PATCH] drm/imagination: DRM_POWERVR should depend on ARCH_K3

2023-11-28 Thread Geert Uytterhoeven
The Imagination Technologies PowerVR Series 6 GPU is currently only
supported on Texas Instruments K3 AM62x SoCs.  Hence add a dependency on
ARCH_K3, to prevent asking the user about this driver when configuring a
kernel without Texas Instruments K3 Multicore SoC support.

Fixes: 4babef0708656c54 ("drm/imagination: Add skeleton PowerVR driver")
Signed-off-by: Geert Uytterhoeven 
---
 drivers/gpu/drm/imagination/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/imagination/Kconfig 
b/drivers/gpu/drm/imagination/Kconfig
index 3bfa2ac212dccb73..af492dbd9afd4ed9 100644
--- a/drivers/gpu/drm/imagination/Kconfig
+++ b/drivers/gpu/drm/imagination/Kconfig
@@ -6,6 +6,7 @@ config DRM_POWERVR
depends on ARM64
depends on DRM
depends on PM
+   depends on ARCH_K3 || COMPILE_TEST
select DRM_EXEC
select DRM_GEM_SHMEM_HELPER
select DRM_SCHED
-- 
2.34.1



Re: [PATCH] backlight: mp3309c: fix uninitialized local variable

2023-11-28 Thread Daniel Thompson
On Tue, Nov 28, 2023 at 04:08:39PM +0100, Flavio Suligoi wrote:
> In the function "pm3309c_parse_dt_node", when the dimming analog control
> mode (by I2C messages) is enabled, the local variable "prop_levels" is
> tested without any initialization, as indicated by the following smatch
> warning (thanks to Dan Carpenter for the report):

Good to see credit for the reporter but please use a "Reported-by:" tag
for that. There should probably be a "Fixes:" tag too.


> drivers/video/backlight/mp3309c.c:279 pm3309c_parse_dt_node() error: 
> uninitialized symbol 'prop_levels'.
>
> To avoid any problem in case of undefined behavior, we need to initialize
> it to "NULL".
> For consistency, I also initialize the other similar variable
> "prop_pwms" in the same way.

I don't love redundant initializations... but I can live with it ;-) .


Daniel.
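
A small sketch of the pattern under discussion, assuming (as the commit message describes) that the pointer is only assigned in one dimming mode but tested in both. The function and type names are hypothetical and do not come from the mp3309c driver.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a device-tree property. */
struct mock_property { int length; };

static int mock_parse_levels(bool pwm_mode, const struct mock_property *dt_prop)
{
	/* Initialized up front, so the test below is well defined in both modes. */
	const struct mock_property *prop_levels = NULL;

	if (pwm_mode)
		prop_levels = dt_prop;	/* only looked up in PWM mode */

	if (prop_levels)
		return prop_levels->length;

	return 0;	/* analog (I2C) mode, or property absent */
}
```

Without the `= NULL`, the `if (prop_levels)` test in analog mode would read an indeterminate value, which is what the smatch warning flags.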


Re: [Intel-gfx] [PATCH v3 1/1] drm/i915/pxp: Add missing tag for Wa_14019159160

2023-11-28 Thread Matt Roper
On Mon, Nov 27, 2023 at 12:11:50PM -0800, Alan Previn wrote:
> Add missing tag for "Wa_14019159160 - Case 2" (for existing
> PXP code that ensures the run-alone mode bit is set to allow
> PxP-decryption).
> 
>  v3: - Check targeted platforms using IP_VAL. (John Harrison)
>  v2: - Fix WA id number (John Harrison).
>  - Improve comments and code to be specific
>for the targeted platforms (John Harrison)
> 
> Signed-off-by: Alan Previn 
> ---
>  drivers/gpu/drm/i915/gt/intel_lrc.c | 8 +---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c 
> b/drivers/gpu/drm/i915/gt/intel_lrc.c
> index 7c367ba8d9dc..1152cf25d578 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> @@ -863,10 +863,12 @@ static bool ctx_needs_runalone(const struct 
> intel_context *ce)
>   bool ctx_is_protected = false;
>  
>   /*
> -  * On MTL and newer platforms, protected contexts require setting
> -  * the LRC run-alone bit or else the encryption will not happen.
> +  * Wa_14019159160 - Case 2: mtl
> +  * On some platforms, protected contexts require setting
> +  * the LRC run-alone bit or else the encryption/decryption will not 
> happen.
> +  * NOTE: Case 2 only applies to PXP use-case of said workaround.
>*/
> - if (GRAPHICS_VER_FULL(ce->engine->i915) >= IP_VER(12, 70) &&
> + if (GRAPHICS_VER_FULL(ce->engine->i915) == IP_VER(12, 70) &&

The workaround database lists this as being needed on both 12.70 and
12.71.  Should this be a

IS_GFX_GT_IP_RANGE(gt, IP_VER(12, 70), IP_VER(12, 71))

check instead?

The workaround is also listed in the database as applying to DG2; is
this "case 2" subset of the workaround not relevant to that platform?


Matt

>   (ce->engine->class == COMPUTE_CLASS || ce->engine->class == 
> RENDER_CLASS)) {
>   rcu_read_lock();
>   gem_ctx = rcu_dereference(ce->gem_context);
> 
> base-commit: 5429d55de723544dfc0630cf39d96392052b27a1
> -- 
> 2.39.0
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation
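
To make the difference between the `==` check and a range check concrete, here is a minimal sketch. It assumes the version encoding packs major and release as `ver << 8 | rel`, and it assumes the range is inclusive at both ends; `mock_ip_in_range()` is a hypothetical helper and not the driver's IS_GFX_GT_IP_RANGE() macro.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed encoding: major version in the high byte, release in the low byte. */
#define IP_VER(ver, rel) ((uint32_t)(ver) << 8 | (uint32_t)(rel))

/* The check as written in the patch: matches 12.70 only. */
static bool mock_ip_is_1270(uint32_t ip)
{
	return ip == IP_VER(12, 70);
}

/* Hypothetical inclusive range check, covering 12.70 and 12.71 alike. */
static bool mock_ip_in_range(uint32_t ip, uint32_t from, uint32_t until)
{
	return ip >= from && ip <= until;
}
```

Under these assumptions a 12.71 part fails the equality test but passes the range test, which is the case raised in the review.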


Re: (subset) [PATCH] drm/imagination: Numerous documentation fixes.

2023-11-28 Thread Maxime Ripard
On Tue, 28 Nov 2023 17:35:07 +, Donald Robson wrote:
> Some reported by Stephen Rothwell. The rest were found by running the
> kernel-doc build script.
> Some indentation fixes.
> 
> 

Applied to drm/drm-misc (drm-misc-next).

Thanks!
Maxime



[PATCH v3 6/9] drm/amd/display: create DCN3-specific log for MPC state

2023-11-28 Thread Melissa Wen
Logging DCN3 MPC state was following DCN1 implementation that doesn't
consider new DCN3 MPC color blocks. Create new elements according to
DCN3 MPC color caps and a new DCN3-specific function for reading MPC
data.

v3:
- remove gamut remap reg reading in favor of fixed31_32 matrix data

Signed-off-by: Melissa Wen 
---
 .../gpu/drm/amd/display/dc/dcn30/dcn30_mpc.c  | 48 ++-
 drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h   |  7 +++
 2 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_mpc.c 
b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_mpc.c
index a6a4c3413f89..bf3386cd444d 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_mpc.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_mpc.c
@@ -1440,8 +1440,54 @@ static void mpc3_set_mpc_mem_lp_mode(struct mpc *mpc)
}
 }
 
+static void mpc3_read_mpcc_state(
+   struct mpc *mpc,
+   int mpcc_inst,
+   struct mpcc_state *s)
+{
+   struct dcn30_mpc *mpc30 = TO_DCN30_MPC(mpc);
+   uint32_t rmu_status = 0xf;
+
+   REG_GET(MPCC_OPP_ID[mpcc_inst], MPCC_OPP_ID, &s->opp_id);
+   REG_GET(MPCC_TOP_SEL[mpcc_inst], MPCC_TOP_SEL, &s->dpp_id);
+   REG_GET(MPCC_BOT_SEL[mpcc_inst], MPCC_BOT_SEL, &s->bot_mpcc_id);
+   REG_GET_4(MPCC_CONTROL[mpcc_inst], MPCC_MODE, &s->mode,
+   MPCC_ALPHA_BLND_MODE, &s->alpha_mode,
+   MPCC_ALPHA_MULTIPLIED_MODE, &s->pre_multiplied_alpha,
+   MPCC_BLND_ACTIVE_OVERLAP_ONLY, &s->overlap_only);
+   REG_GET_2(MPCC_STATUS[mpcc_inst], MPCC_IDLE, &s->idle,
+   MPCC_BUSY, &s->busy);
+
+   /* Color blocks state */
+   REG_GET(MPC_RMU_CONTROL, MPC_RMU0_MUX_STATUS, &rmu_status);
+
+   if (rmu_status == mpcc_inst) {
+   REG_GET(SHAPER_CONTROL[0],
+   MPC_RMU_SHAPER_LUT_MODE_CURRENT, &s->shaper_lut_mode);
+   REG_GET(RMU_3DLUT_MODE[0],
+   MPC_RMU_3DLUT_MODE_CURRENT,  &s->lut3d_mode);
+   REG_GET(RMU_3DLUT_READ_WRITE_CONTROL[0],
+   MPC_RMU_3DLUT_30BIT_EN, &s->lut3d_bit_depth);
+   REG_GET(RMU_3DLUT_MODE[0],
+   MPC_RMU_3DLUT_SIZE, &s->lut3d_size);
+   } else {
+   REG_GET(SHAPER_CONTROL[1],
+   MPC_RMU_SHAPER_LUT_MODE_CURRENT, &s->shaper_lut_mode);
+   REG_GET(RMU_3DLUT_MODE[1],
+   MPC_RMU_3DLUT_MODE_CURRENT,  &s->lut3d_mode);
+   REG_GET(RMU_3DLUT_READ_WRITE_CONTROL[1],
+   MPC_RMU_3DLUT_30BIT_EN, &s->lut3d_bit_depth);
+   REG_GET(RMU_3DLUT_MODE[1],
+   MPC_RMU_3DLUT_SIZE, &s->lut3d_size);
+   }
+
+REG_GET_2(MPCC_OGAM_CONTROL[mpcc_inst],
+ MPCC_OGAM_MODE_CURRENT, &s->rgam_mode,
+ MPCC_OGAM_SELECT_CURRENT, &s->rgam_lut);
+}
+
 static const struct mpc_funcs dcn30_mpc_funcs = {
-   .read_mpcc_state = mpc1_read_mpcc_state,
+   .read_mpcc_state = mpc3_read_mpcc_state,
.insert_plane = mpc1_insert_plane,
.remove_mpcc = mpc1_remove_mpcc,
.mpc_init = mpc1_mpc_init,
diff --git a/drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h 
b/drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h
index 61a2406dcc53..a11e40fddc44 100644
--- a/drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h
+++ b/drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h
@@ -199,6 +199,13 @@ struct mpcc_state {
uint32_t overlap_only;
uint32_t idle;
uint32_t busy;
+   uint32_t shaper_lut_mode;
+   uint32_t lut3d_mode;
+   uint32_t lut3d_bit_depth;
+   uint32_t lut3d_size;
+   uint32_t rgam_mode;
+   uint32_t rgam_lut;
+   struct mpc_grph_gamut_adjustment gamut_remap;
 };
 
 /**
-- 
2.42.0



[PATCH v3 8/9] drm/amd/display: add DPP and MPC color caps to DTN log

2023-11-28 Thread Melissa Wen
Add color caps information for DPP and MPC block to show HW color caps.

Signed-off-by: Melissa Wen 
---
 .../amd/display/dc/hwss/dcn10/dcn10_hwseq.c   | 23 +++
 .../amd/display/dc/hwss/dcn30/dcn30_hwseq.c   | 23 +++
 2 files changed, 46 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c 
b/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c
index f7d9bcdbc6c6..d3cab6fb270b 100644
--- a/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c
@@ -346,6 +346,24 @@ static void dcn10_log_color_state(struct dc *dc,
DTN_INFO("\n");
}
DTN_INFO("\n");
+   DTN_INFO("DPP Color Caps: input_lut_shared:%d  icsc:%d"
+"  dgam_ram:%d  dgam_rom: 
srgb:%d,bt2020:%d,gamma2_2:%d,pq:%d,hlg:%d"
+"  post_csc:%d  gamcor:%d  dgam_rom_for_yuv:%d  3d_lut:%d"
+"  blnd_lut:%d  oscs:%d\n\n",
+dc->caps.color.dpp.input_lut_shared,
+dc->caps.color.dpp.icsc,
+dc->caps.color.dpp.dgam_ram,
+dc->caps.color.dpp.dgam_rom_caps.srgb,
+dc->caps.color.dpp.dgam_rom_caps.bt2020,
+dc->caps.color.dpp.dgam_rom_caps.gamma2_2,
+dc->caps.color.dpp.dgam_rom_caps.pq,
+dc->caps.color.dpp.dgam_rom_caps.hlg,
+dc->caps.color.dpp.post_csc,
+dc->caps.color.dpp.gamma_corr,
+dc->caps.color.dpp.dgam_rom_for_yuv,
+dc->caps.color.dpp.hw_3d_lut,
+dc->caps.color.dpp.ogam_ram,
+dc->caps.color.dpp.ocsc);
 
DTN_INFO("MPCC:  OPP  DPP  MPCCBOT  MODE  ALPHA_MODE  PREMULT  
OVERLAP_ONLY  IDLE\n");
for (i = 0; i < pool->pipe_count; i++) {
@@ -359,6 +377,11 @@ static void dcn10_log_color_state(struct dc *dc,
s.idle);
}
DTN_INFO("\n");
+   DTN_INFO("MPC Color Caps: gamut_remap:%d, 3dlut:%d, ogam_ram:%d, 
ocsc:%d\n\n",
+dc->caps.color.mpc.gamut_remap,
+dc->caps.color.mpc.num_3dluts,
+dc->caps.color.mpc.ogam_ram,
+dc->caps.color.mpc.ocsc);
 }
 
 void dcn10_log_hw_state(struct dc *dc,
diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn30/dcn30_hwseq.c 
b/drivers/gpu/drm/amd/display/dc/hwss/dcn30/dcn30_hwseq.c
index 1e07f0a6be1f..3b38af592101 100644
--- a/drivers/gpu/drm/amd/display/dc/hwss/dcn30/dcn30_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn30/dcn30_hwseq.c
@@ -140,6 +140,24 @@ void dcn30_log_color_state(struct dc *dc,
DTN_INFO("\n");
}
DTN_INFO("\n");
+   DTN_INFO("DPP Color Caps: input_lut_shared:%d  icsc:%d"
+"  dgam_ram:%d  dgam_rom: 
srgb:%d,bt2020:%d,gamma2_2:%d,pq:%d,hlg:%d"
+"  post_csc:%d  gamcor:%d  dgam_rom_for_yuv:%d  3d_lut:%d"
+"  blnd_lut:%d  oscs:%d\n\n",
+dc->caps.color.dpp.input_lut_shared,
+dc->caps.color.dpp.icsc,
+dc->caps.color.dpp.dgam_ram,
+dc->caps.color.dpp.dgam_rom_caps.srgb,
+dc->caps.color.dpp.dgam_rom_caps.bt2020,
+dc->caps.color.dpp.dgam_rom_caps.gamma2_2,
+dc->caps.color.dpp.dgam_rom_caps.pq,
+dc->caps.color.dpp.dgam_rom_caps.hlg,
+dc->caps.color.dpp.post_csc,
+dc->caps.color.dpp.gamma_corr,
+dc->caps.color.dpp.dgam_rom_for_yuv,
+dc->caps.color.dpp.hw_3d_lut,
+dc->caps.color.dpp.ogam_ram,
+dc->caps.color.dpp.ocsc);
 
DTN_INFO("MPCC:  OPP  DPP  MPCCBOT  MODE  ALPHA_MODE  PREMULT  
OVERLAP_ONLY  IDLE"
 "  SHAPER mode  3DLUT mode  3DLUT bit-depth  3DLUT size  OGAM 
mode  OGAM LUT"
@@ -193,6 +211,11 @@ void dcn30_log_color_state(struct dc *dc,
 
}
DTN_INFO("\n");
+   DTN_INFO("MPC Color Caps: gamut_remap:%d, 3dlut:%d, ogam_ram:%d, 
ocsc:%d\n\n",
+dc->caps.color.mpc.gamut_remap,
+dc->caps.color.mpc.num_3dluts,
+dc->caps.color.mpc.ogam_ram,
+dc->caps.color.mpc.ocsc);
 }
 
 bool dcn30_set_blend_lut(
-- 
2.42.0



[PATCH v3 9/9] drm/amd/display: hook up DCN20 color blocks data to DTN log

2023-11-28 Thread Melissa Wen
Color caps changed between HW versions, which caused the DCN10 color
state sections in the DTN log to no longer match DCN2+ state. Create a
color state log specific to DCN2.0 and hook it up to DCN2 family
drivers. Instead of reading gamut remap reg values, display gamut remap
matrix data in fixed 31.32.

Signed-off-by: Melissa Wen 
---
 .../gpu/drm/amd/display/dc/dcn20/dcn20_dpp.c  |  30 ++---
 .../gpu/drm/amd/display/dc/dcn20/dcn20_init.c |   1 +
 .../gpu/drm/amd/display/dc/dcn20/dcn20_mpc.c  |  24 +++-
 .../gpu/drm/amd/display/dc/dcn21/dcn21_init.c |   1 +
 .../amd/display/dc/hwss/dcn20/dcn20_hwseq.c   | 106 ++
 .../amd/display/dc/hwss/dcn20/dcn20_hwseq.h   |   2 +
 6 files changed, 149 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_dpp.c 
b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_dpp.c
index dedc2dcf2691..1516c0a48726 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_dpp.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_dpp.c
@@ -55,21 +55,23 @@ void dpp20_read_state(struct dpp *dpp_base,
 
REG_GET(DPP_CONTROL,
DPP_CLOCK_ENABLE, >is_enabled);
+
+   // Degamma LUT (RAM)
REG_GET(CM_DGAM_CONTROL,
-   CM_DGAM_LUT_MODE, >dgam_lut_mode);
-   // BGAM has no ROM, and definition is different, can't reuse same dump
-   //REG_GET(CM_BLNDGAM_CONTROL,
-   //  CM_BLNDGAM_LUT_MODE, >rgam_lut_mode);
-   REG_GET(CM_GAMUT_REMAP_CONTROL,
-   CM_GAMUT_REMAP_MODE, >gamut_remap_mode);
-   if (s->gamut_remap_mode) {
-   s->gamut_remap_c11_c12 = REG_READ(CM_GAMUT_REMAP_C11_C12);
-   s->gamut_remap_c13_c14 = REG_READ(CM_GAMUT_REMAP_C13_C14);
-   s->gamut_remap_c21_c22 = REG_READ(CM_GAMUT_REMAP_C21_C22);
-   s->gamut_remap_c23_c24 = REG_READ(CM_GAMUT_REMAP_C23_C24);
-   s->gamut_remap_c31_c32 = REG_READ(CM_GAMUT_REMAP_C31_C32);
-   s->gamut_remap_c33_c34 = REG_READ(CM_GAMUT_REMAP_C33_C34);
-   }
+   CM_DGAM_LUT_MODE, &s->dgam_lut_mode);
+
+   // Shaper LUT (RAM), 3D LUT (mode, bit-depth, size)
+   REG_GET(CM_SHAPER_CONTROL,
+   CM_SHAPER_LUT_MODE, &s->shaper_lut_mode);
+   REG_GET_2(CM_3DLUT_READ_WRITE_CONTROL,
+ CM_3DLUT_CONFIG_STATUS, &s->lut3d_mode,
+ CM_3DLUT_30BIT_EN, &s->lut3d_bit_depth);
+   REG_GET(CM_3DLUT_MODE,
+   CM_3DLUT_SIZE, &s->lut3d_size);
+
+   // Blend/Out Gamma (RAM)
+   REG_GET(CM_BLNDGAM_LUT_WRITE_EN_MASK,
+   CM_BLNDGAM_CONFIG_STATUS, &s->rgam_lut_mode);
 }
 
 void dpp2_power_on_obuf(
diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_init.c 
b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_init.c
index 884e3e323338..ef6488165b8f 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_init.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_init.c
@@ -67,6 +67,7 @@ static const struct hw_sequencer_funcs dcn20_funcs = {
.setup_stereo = dcn10_setup_stereo,
.set_avmute = dce110_set_avmute,
.log_hw_state = dcn10_log_hw_state,
+   .log_color_state = dcn20_log_color_state,
.get_hw_state = dcn10_get_hw_state,
.clear_status_bits = dcn10_clear_status_bits,
.wait_for_mpcc_disconnect = dcn10_wait_for_mpcc_disconnect,
diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_mpc.c 
b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_mpc.c
index 5da6e44f284a..16b5ff208d14 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_mpc.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_mpc.c
@@ -542,8 +542,30 @@ static struct mpcc *mpc2_get_mpcc_for_dpp(struct mpc_tree 
*tree, int dpp_id)
return NULL;
 }
 
+static void mpc2_read_mpcc_state(
+   struct mpc *mpc,
+   int mpcc_inst,
+   struct mpcc_state *s)
+{
+   struct dcn20_mpc *mpc20 = TO_DCN20_MPC(mpc);
+
+   REG_GET(MPCC_OPP_ID[mpcc_inst], MPCC_OPP_ID, &s->opp_id);
+   REG_GET(MPCC_TOP_SEL[mpcc_inst], MPCC_TOP_SEL, &s->dpp_id);
+   REG_GET(MPCC_BOT_SEL[mpcc_inst], MPCC_BOT_SEL, &s->bot_mpcc_id);
+   REG_GET_4(MPCC_CONTROL[mpcc_inst], MPCC_MODE, &s->mode,
+   MPCC_ALPHA_BLND_MODE, &s->alpha_mode,
+   MPCC_ALPHA_MULTIPLIED_MODE, &s->pre_multiplied_alpha,
+   MPCC_BLND_ACTIVE_OVERLAP_ONLY, &s->overlap_only);
+   REG_GET_2(MPCC_STATUS[mpcc_inst], MPCC_IDLE, &s->idle,
+   MPCC_BUSY, &s->busy);
+
+   /* Gamma block state */
+   REG_GET(MPCC_OGAM_LUT_RAM_CONTROL[mpcc_inst],
+   MPCC_OGAM_CONFIG_STATUS, &s->rgam_mode);
+}
+
 static const struct mpc_funcs dcn20_mpc_funcs = {
-   .read_mpcc_state = mpc1_read_mpcc_state,
+   .read_mpcc_state = mpc2_read_mpcc_state,
.insert_plane = mpc1_insert_plane,
.remove_mpcc = mpc1_remove_mpcc,
.mpc_init = mpc1_mpc_init,
diff --git a/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_init.c 

[PATCH v3 7/9] drm/amd/display: hook up DCN30 color blocks data to DTN log

2023-11-28 Thread Melissa Wen
Color caps changed between HW versions, which caused the DCN10 color
state sections in the DTN log to no longer match DCN3+ state. Create a
color state log specific to DCN3.0 and hook it up to DCN3.0+ and DCN3.1+
drivers.
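
For readers skimming the series, the dispatch this patch adds to
dcn10_log_hw_state() can be sketched outside the driver as below. This
is only an illustration under stated assumptions: the mock_* types and
functions are hypothetical stand-ins, not the real struct dc or
hw_sequencer_funcs, and the loggers merely return a family number
instead of printing a log.

```c
#include <assert.h>
#include <stddef.h>

struct mock_dc;

/* Hypothetical stand-in for struct hw_sequencer_funcs. */
struct mock_hwss {
	/* Optional hook; left NULL by families without their own logger. */
	int (*log_color_state)(struct mock_dc *dc);
};

/* Hypothetical stand-in for struct dc. */
struct mock_dc {
	struct mock_hwss hwss;
};

/* Each mock logger just reports which family's layout it would print. */
static int mock_dcn10_log_color_state(struct mock_dc *dc)
{
	(void)dc;
	return 10;
}

static int mock_dcn30_log_color_state(struct mock_dc *dc)
{
	(void)dc;
	return 30;
}

/* Prefer the family-specific hook, fall back to the DCN10 logger. */
static int mock_log_color_state(struct mock_dc *dc)
{
	assert(dc != NULL);

	if (dc->hwss.log_color_state)
		return dc->hwss.log_color_state(dc);

	return mock_dcn10_log_color_state(dc);
}
```

Keeping the DCN10 logger as the fallback means families that never set
the hook keep their current output unchanged.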

rfc-v2:
- detail RAM mode for gamcor and blnd gamma blocks
- add MPC gamut remap matrix log

v3:
- read MPC gamut remap matrix in fixed 31.32 format
- extend to DCN3.0+ and DCN3.1+ drivers (Harry)

Signed-off-by: Melissa Wen 
---
 .../gpu/drm/amd/display/dc/dcn30/dcn30_init.c |   1 +
 .../drm/amd/display/dc/dcn301/dcn301_init.c   |   1 +
 .../gpu/drm/amd/display/dc/dcn31/dcn31_init.c |   1 +
 .../drm/amd/display/dc/dcn314/dcn314_init.c   |   1 +
 .../amd/display/dc/hwss/dcn10/dcn10_hwseq.c   |   5 +-
 .../amd/display/dc/hwss/dcn30/dcn30_hwseq.c   | 126 ++
 .../amd/display/dc/hwss/dcn30/dcn30_hwseq.h   |   3 +
 .../drm/amd/display/dc/hwss/hw_sequencer.h|   2 +
 8 files changed, 139 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_init.c 
b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_init.c
index 9894caedffed..4064e6b7f599 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_init.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_init.c
@@ -68,6 +68,7 @@ static const struct hw_sequencer_funcs dcn30_funcs = {
.setup_stereo = dcn10_setup_stereo,
.set_avmute = dcn30_set_avmute,
.log_hw_state = dcn10_log_hw_state,
+   .log_color_state = dcn30_log_color_state,
.get_hw_state = dcn10_get_hw_state,
.clear_status_bits = dcn10_clear_status_bits,
.wait_for_mpcc_disconnect = dcn10_wait_for_mpcc_disconnect,
diff --git a/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_init.c 
b/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_init.c
index 6477009ce065..1a9122252702 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_init.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_init.c
@@ -72,6 +72,7 @@ static const struct hw_sequencer_funcs dcn301_funcs = {
.setup_stereo = dcn10_setup_stereo,
.set_avmute = dcn30_set_avmute,
.log_hw_state = dcn10_log_hw_state,
+   .log_color_state = dcn30_log_color_state,
.get_hw_state = dcn10_get_hw_state,
.clear_status_bits = dcn10_clear_status_bits,
.wait_for_mpcc_disconnect = dcn10_wait_for_mpcc_disconnect,
diff --git a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_init.c 
b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_init.c
index 669f524bd064..61577a3678a0 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_init.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_init.c
@@ -71,6 +71,7 @@ static const struct hw_sequencer_funcs dcn31_funcs = {
.setup_stereo = dcn10_setup_stereo,
.set_avmute = dcn30_set_avmute,
.log_hw_state = dcn10_log_hw_state,
+   .log_color_state = dcn30_log_color_state,
.get_hw_state = dcn10_get_hw_state,
.clear_status_bits = dcn10_clear_status_bits,
.wait_for_mpcc_disconnect = dcn10_wait_for_mpcc_disconnect,
diff --git a/drivers/gpu/drm/amd/display/dc/dcn314/dcn314_init.c 
b/drivers/gpu/drm/amd/display/dc/dcn314/dcn314_init.c
index ccb7e317e86a..094b912832c1 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn314/dcn314_init.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn314/dcn314_init.c
@@ -73,6 +73,7 @@ static const struct hw_sequencer_funcs dcn314_funcs = {
.setup_stereo = dcn10_setup_stereo,
.set_avmute = dcn30_set_avmute,
.log_hw_state = dcn10_log_hw_state,
+   .log_color_state = dcn30_log_color_state,
.get_hw_state = dcn10_get_hw_state,
.clear_status_bits = dcn10_clear_status_bits,
.wait_for_mpcc_disconnect = dcn10_wait_for_mpcc_disconnect,
diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c 
b/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c
index f0a9f8818909..f7d9bcdbc6c6 100644
--- a/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c
@@ -374,7 +374,10 @@ void dcn10_log_hw_state(struct dc *dc,
 
dcn10_log_hubp_states(dc, log_ctx);
 
-   dcn10_log_color_state(dc, log_ctx);
+   if (dc->hwss.log_color_state)
+   dc->hwss.log_color_state(dc, log_ctx);
+   else
+   dcn10_log_color_state(dc, log_ctx);
 
DTN_INFO("OTG:  v_bs  v_be  v_ss  v_se  vpol  vmax  vmin  vmax_sel  
vmin_sel  h_bs  h_be  h_ss  h_se  hpol  htot  vtot  underflow blank_en\n");
 
diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn30/dcn30_hwseq.c 
b/drivers/gpu/drm/amd/display/dc/hwss/dcn30/dcn30_hwseq.c
index d71faf2ecd41..1e07f0a6be1f 100644
--- a/drivers/gpu/drm/amd/display/dc/hwss/dcn30/dcn30_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn30/dcn30_hwseq.c
@@ -69,6 +69,132 @@
 #define FN(reg_name, field_name) \
hws->shifts->field_name, hws->masks->field_name
 
+void dcn30_log_color_state(struct dc *dc,
+  struct dc_log_buffer_ctx *log_ctx)
+{
+

[PATCH v3 5/9] drm/amd/display: add get_gamut_remap helper for MPC3

2023-11-28 Thread Melissa Wen
We want to be able to read the MPC's gamut remap matrix, similar to
what we do with the .dpp_get_gamut_remap functions. However, we don't
need a hook here because only DCN3+ has the MPC gamut remap block; it
is absent in previous families.

Signed-off-by: Melissa Wen 
---
 .../gpu/drm/amd/display/dc/dcn30/dcn30_mpc.c  | 58 +++
 .../gpu/drm/amd/display/dc/dcn30/dcn30_mpc.h  |  4 ++
 2 files changed, 62 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_mpc.c 
b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_mpc.c
index d1500b223858..a6a4c3413f89 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_mpc.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_mpc.c
@@ -1129,6 +1129,64 @@ void mpc3_set_gamut_remap(
}
 }
 
+static void read_gamut_remap(struct dcn30_mpc *mpc30,
+int mpcc_id,
+uint16_t *regval,
+uint32_t *select)
+{
+   struct color_matrices_reg gam_regs;
+
+   //current coefficient set in use
+   REG_GET(MPCC_GAMUT_REMAP_MODE[mpcc_id], MPCC_GAMUT_REMAP_MODE_CURRENT, 
select);
+
+   gam_regs.shifts.csc_c11 = mpc30->mpc_shift->MPCC_GAMUT_REMAP_C11_A;
+   gam_regs.masks.csc_c11  = mpc30->mpc_mask->MPCC_GAMUT_REMAP_C11_A;
+   gam_regs.shifts.csc_c12 = mpc30->mpc_shift->MPCC_GAMUT_REMAP_C12_A;
+   gam_regs.masks.csc_c12 = mpc30->mpc_mask->MPCC_GAMUT_REMAP_C12_A;
+
+   if (*select == GAMUT_REMAP_COEFF) {
+   gam_regs.csc_c11_c12 = REG(MPC_GAMUT_REMAP_C11_C12_A[mpcc_id]);
+   gam_regs.csc_c33_c34 = REG(MPC_GAMUT_REMAP_C33_C34_A[mpcc_id]);
+
+   cm_helper_read_color_matrices(
+   mpc30->base.ctx,
+   regval,
+   &gam_regs);
+
+   } else  if (*select == GAMUT_REMAP_COMA_COEFF) {
+
+   gam_regs.csc_c11_c12 = REG(MPC_GAMUT_REMAP_C11_C12_B[mpcc_id]);
+   gam_regs.csc_c33_c34 = REG(MPC_GAMUT_REMAP_C33_C34_B[mpcc_id]);
+
+   cm_helper_read_color_matrices(
+   mpc30->base.ctx,
+   regval,
+   &gam_regs);
+
+   }
+
+}
+
+void mpc3_get_gamut_remap(struct mpc *mpc,
+ int mpcc_id,
+ struct mpc_grph_gamut_adjustment *adjust)
+{
+   struct dcn30_mpc *mpc30 = TO_DCN30_MPC(mpc);
+   uint16_t arr_reg_val[12];
+   int select;
+
+   read_gamut_remap(mpc30, mpcc_id, arr_reg_val, &select);
+
+   if (select == GAMUT_REMAP_BYPASS) {
+   adjust->gamut_adjust_type = GRAPHICS_GAMUT_ADJUST_TYPE_BYPASS;
+   return;
+   }
+
+   adjust->gamut_adjust_type = GRAPHICS_GAMUT_ADJUST_TYPE_SW;
+   convert_hw_matrix(adjust->temperature_matrix,
+ arr_reg_val, ARRAY_SIZE(arr_reg_val));
+}
+
 bool mpc3_program_3dlut(
struct mpc *mpc,
const struct tetrahedral_params *params,
diff --git a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_mpc.h 
b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_mpc.h
index 5198f2167c7c..9cb96ae95a2f 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_mpc.h
+++ b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_mpc.h
@@ -1056,6 +1056,10 @@ void mpc3_set_gamut_remap(
int mpcc_id,
const struct mpc_grph_gamut_adjustment *adjust);
 
+void mpc3_get_gamut_remap(struct mpc *mpc,
+ int mpcc_id,
+ struct mpc_grph_gamut_adjustment *adjust);
+
 void mpc3_set_rmu_mux(
struct mpc *mpc,
int rmu_idx,
-- 
2.42.0



[PATCH v3 2/9] drm/amd/display: Add dpp_get_gamut_remap functions

2023-11-28 Thread Melissa Wen
From: Harry Wentland 

We want to be able to read the DPP's gamut remap matrix.
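
As a rough illustration of the read-back loop in
cm_helper_read_color_matrices() (two coefficients per register, stored
at regval[2 * i] and regval[2 * i + 1]), here is a standalone sketch.
The mock_* names are hypothetical, the registers are a plain array
rather than MMIO, and the shift/mask layout is supplied by the caller
as an assumption rather than taken from the hardware headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical field description: where one coefficient sits in a register. */
struct mock_field {
	uint32_t shift;
	uint32_t mask;
};

/* Extract one field from a 32-bit register value. */
static uint16_t mock_get_field(uint32_t reg, struct mock_field f)
{
	return (uint16_t)((reg & f.mask) >> f.shift);
}

/* Unpack n_regs registers, two coefficients each, into out[0..2*n_regs-1]. */
static void mock_read_color_matrix(const uint32_t *regs, size_t n_regs,
				   struct mock_field first,
				   struct mock_field second,
				   uint16_t *out)
{
	size_t i;

	assert(regs != NULL && out != NULL);

	for (i = 0; i < n_regs; i++) {
		out[2 * i] = mock_get_field(regs[i], first);
		out[2 * i + 1] = mock_get_field(regs[i], second);
	}
}
```

With six registers this fills the 12-entry array that the callers then
hand to convert_hw_matrix().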

v2:
- code-style and doc comments clean-up (Melissa)

Signed-off-by: Harry Wentland 
Signed-off-by: Melissa Wen 
---
 .../drm/amd/display/dc/basics/conversion.c| 34 +
 .../drm/amd/display/dc/basics/conversion.h|  4 ++
 .../amd/display/dc/dcn10/dcn10_cm_common.c| 20 ++
 .../amd/display/dc/dcn10/dcn10_cm_common.h|  4 +-
 .../gpu/drm/amd/display/dc/dcn10/dcn10_dpp.c  |  3 +-
 .../gpu/drm/amd/display/dc/dcn10/dcn10_dpp.h  |  3 +
 .../drm/amd/display/dc/dcn10/dcn10_dpp_cm.c   | 70 ++-
 .../gpu/drm/amd/display/dc/dcn20/dcn20_dpp.c  |  1 +
 .../gpu/drm/amd/display/dc/dcn20/dcn20_dpp.h  |  3 +
 .../drm/amd/display/dc/dcn20/dcn20_dpp_cm.c   | 55 +++
 .../drm/amd/display/dc/dcn201/dcn201_dpp.c|  1 +
 .../gpu/drm/amd/display/dc/dcn30/dcn30_dpp.c  |  1 +
 .../gpu/drm/amd/display/dc/dcn30/dcn30_dpp.h  |  2 +
 .../drm/amd/display/dc/dcn30/dcn30_dpp_cm.c   | 54 ++
 .../gpu/drm/amd/display/dc/dcn32/dcn32_dpp.c  |  1 +
 drivers/gpu/drm/amd/display/dc/inc/hw/dpp.h   |  3 +
 16 files changed, 256 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/basics/conversion.c 
b/drivers/gpu/drm/amd/display/dc/basics/conversion.c
index e295a839ab47..f0065a51beb9 100644
--- a/drivers/gpu/drm/amd/display/dc/basics/conversion.c
+++ b/drivers/gpu/drm/amd/display/dc/basics/conversion.c
@@ -101,6 +101,40 @@ void convert_float_matrix(
}
 }
 
+static struct fixed31_32 int_frac_to_fixed_point(uint16_t arg,
+uint8_t integer_bits,
+uint8_t fractional_bits)
+{
+   struct fixed31_32 result;
+   uint16_t sign_mask = 1 << (fractional_bits + integer_bits);
+   uint16_t value_mask = sign_mask - 1;
+
+   result.value = (long long)(arg & value_mask) <<
+  (FIXED31_32_BITS_PER_FRACTIONAL_PART - fractional_bits);
+
+   if (arg & sign_mask)
+   result = dc_fixpt_neg(result);
+
+   return result;
+}
+
+/**
+ * convert_hw_matrix - converts HW values into fixed31_32 matrix.
+ * @matrix: fixed point 31.32 matrix
+ * @reg: array of register values
+ * @buffer_size: size of the array of register values
+ *
+ * Converts HW register spec defined format S2D13 into a fixed-point 31.32
+ * matrix.
+ */
+void convert_hw_matrix(struct fixed31_32 *matrix,
+  uint16_t *reg,
+  uint32_t buffer_size)
+{
+   for (int i = 0; i < buffer_size; ++i)
+   matrix[i] = int_frac_to_fixed_point(reg[i], 2, 13);
+}
+
 static uint32_t find_gcd(uint32_t a, uint32_t b)
 {
uint32_t remainder = 0;
diff --git a/drivers/gpu/drm/amd/display/dc/basics/conversion.h 
b/drivers/gpu/drm/amd/display/dc/basics/conversion.h
index 81da4e6f7a1a..a433cef78496 100644
--- a/drivers/gpu/drm/amd/display/dc/basics/conversion.h
+++ b/drivers/gpu/drm/amd/display/dc/basics/conversion.h
@@ -41,6 +41,10 @@ void convert_float_matrix(
 void reduce_fraction(uint32_t num, uint32_t den,
uint32_t *out_num, uint32_t *out_den);
 
+void convert_hw_matrix(struct fixed31_32 *matrix,
+  uint16_t *reg,
+  uint32_t buffer_size);
+
 static inline unsigned int log_2(unsigned int num)
 {
return ilog2(num);
diff --git a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_cm_common.c 
b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_cm_common.c
index 3538973bd0c6..b7e57aa27361 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_cm_common.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_cm_common.c
@@ -62,6 +62,26 @@ void cm_helper_program_color_matrices(
 
 }
 
+void cm_helper_read_color_matrices(struct dc_context *ctx,
+  uint16_t *regval,
+  const struct color_matrices_reg *reg)
+{
+   uint32_t cur_csc_reg, regval0, regval1;
+   unsigned int i = 0;
+
+   for (cur_csc_reg = reg->csc_c11_c12;
+cur_csc_reg <= reg->csc_c33_c34; cur_csc_reg++) {
+   REG_GET_2(cur_csc_reg,
+   csc_c11, &regval0,
+   csc_c12, &regval1);
+
+   regval[2 * i] = regval0;
+   regval[(2 * i) + 1] = regval1;
+
+   i++;
+   }
+}
+
 void cm_helper_program_xfer_func(
struct dc_context *ctx,
const struct pwl_params *params,
diff --git a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_cm_common.h 
b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_cm_common.h
index 0a68b63d6126..decc50b1ac53 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_cm_common.h
+++ b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_cm_common.h
@@ -114,5 +114,7 @@ bool cm_helper_translate_curve_to_degamma_hw_format(
const struct dc_transfer_func *output_tf,
struct pwl_params 

[PATCH v3 4/9] drm/amd/display: fill up DCN3 DPP color state

2023-11-28 Thread Melissa Wen
DCN3 DPP color state was uncollected and some state elements from DCN1
don't fit DCN3. Create new elements according to DCN3 color caps and
fill them up for DTN log output.

rfc-v2:
- fix reading of gamcor and blnd gamma states
- remove gamut remap register in favor of gamut remap matrix reading

Signed-off-by: Melissa Wen 
---
 .../gpu/drm/amd/display/dc/dcn30/dcn30_dpp.c  | 37 ++-
 drivers/gpu/drm/amd/display/dc/inc/hw/dpp.h   |  8 
 2 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_dpp.c 
b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_dpp.c
index 7c18f31bb56c..a3a769aad042 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_dpp.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_dpp.c
@@ -44,12 +44,45 @@
 void dpp30_read_state(struct dpp *dpp_base, struct dcn_dpp_state *s)
 {
struct dcn3_dpp *dpp = TO_DCN30_DPP(dpp_base);
+   uint32_t gamcor_lut_mode, rgam_lut_mode;
 
REG_GET(DPP_CONTROL,
-   DPP_CLOCK_ENABLE, &s->is_enabled);
+   DPP_CLOCK_ENABLE, &s->is_enabled);
 
-   // TODO: Implement for DCN3
+   // Pre-degamma (ROM)
+   REG_GET_2(PRE_DEGAM,
+ PRE_DEGAM_MODE, &s->pre_dgam_mode,
+ PRE_DEGAM_SELECT, &s->pre_dgam_select);
+
+   // Gamma Correction (RAM)
+   REG_GET(CM_GAMCOR_CONTROL,
+   CM_GAMCOR_MODE_CURRENT, &s->gamcor_mode);
+   if (s->gamcor_mode) {
+   REG_GET(CM_GAMCOR_CONTROL, CM_GAMCOR_SELECT_CURRENT, &gamcor_lut_mode);
+   if (!gamcor_lut_mode)
+   s->gamcor_mode = LUT_RAM_A; // Otherwise, LUT_RAM_B
+   }
+
+   // Shaper LUT (RAM), 3D LUT (mode, bit-depth, size)
+   REG_GET(CM_SHAPER_CONTROL,
+   CM_SHAPER_LUT_MODE, &s->shaper_lut_mode);
+   REG_GET(CM_3DLUT_MODE,
+   CM_3DLUT_MODE_CURRENT, &s->lut3d_mode);
+   REG_GET(CM_3DLUT_READ_WRITE_CONTROL,
+   CM_3DLUT_30BIT_EN, &s->lut3d_bit_depth);
+   REG_GET(CM_3DLUT_MODE,
+   CM_3DLUT_SIZE, &s->lut3d_size);
+
+   // Blend/Out Gamma (RAM)
+   REG_GET(CM_BLNDGAM_CONTROL,
+   CM_BLNDGAM_MODE_CURRENT, &s->rgam_lut_mode);
+   if (s->rgam_lut_mode){
+   REG_GET(CM_BLNDGAM_CONTROL, CM_BLNDGAM_SELECT_CURRENT, &rgam_lut_mode);
+   if (!rgam_lut_mode)
+   s->rgam_lut_mode = LUT_RAM_A; // Otherwise, LUT_RAM_B
+   }
 }
+
 /*program post scaler scs block in dpp CM*/
 void dpp3_program_post_csc(
struct dpp *dpp_base,
diff --git a/drivers/gpu/drm/amd/display/dc/inc/hw/dpp.h 
b/drivers/gpu/drm/amd/display/dc/inc/hw/dpp.h
index b6acfd86642a..4e604bf24f51 100644
--- a/drivers/gpu/drm/amd/display/dc/inc/hw/dpp.h
+++ b/drivers/gpu/drm/amd/display/dc/inc/hw/dpp.h
@@ -151,6 +151,14 @@ struct dcn_dpp_state {
uint32_t gamut_remap_c33_c34;
// gamut_remap data for dcn*_log_color_state()
struct dpp_grph_csc_adjustment gamut_remap;
+   uint32_t shaper_lut_mode;
+   uint32_t lut3d_mode;
+   uint32_t lut3d_bit_depth;
+   uint32_t lut3d_size;
+   uint32_t blnd_lut_mode;
+   uint32_t pre_dgam_mode;
+   uint32_t pre_dgam_select;
+   uint32_t gamcor_mode;
 };
 
 struct CM_bias_params {
-- 
2.42.0



[PATCH v3 0/9] drm/amd/display: improve DTN color state log

2023-11-28 Thread Melissa Wen
This series updates the color state section of the AMD DTN log to match
color resource differences between DCN versions.

Currently, the DTN log considers the DCN10 color pipeline, which is
useless for newer DCN versions due to all the color capability
differences. In addition to the new color blocks and features, some
semantic differences meant that the DCN10 output was no longer suitable
for newer families.

This version addresses comments from Siqueira and Harry [1]. It also
contains some improvements: DPP and MPC gamut remap matrix data in 31.32
fixed point format and coverage of DCN2+ and DCN3+.

- The first patch decouples DCN10 color state from the HW log in
  preparation for color logs specific to each DCN family.
- Harry kindly provided the second patch with a way to read Gamut Remap
  Matrix data in 31.32 fixed point format instead of HW values.
- With this, I replaced the DCN10 gamut remap output to display values
  in the new format (third patch).
- Patches 4 and 6 fill up the color state of DPP and MPC blocks for DCN3
  from the right registers.
- As DCN3+ has a new MPC color block for post-blending Gamut Remap
  matrix, in patch 5 I reuse Harry's approach for reading DPP gamut
  remap matrix and create a helper to read data of MPC gamut remap
  matrix.
- Patches 7 and 9 create the new color state logs specific to DCN3+ and
  DCN2+, respectively. I didn't extend to DCN32 (and also DCN35) because
  I observed some differences in the shaper and 3D LUT registers of this
  version.
- Patch 8 adds a description of DPP and MPC color blocks for better
  interpretation of data.

This new approach works well with the driver-specific color
properties[2] and steamdeck/gamescope[3] together, where we can see
color state changing from default values. I also tested with
steamdeck/KDE and DCN21/GNOME.

Please find some `before vs after` results below:

===

DCN301 - Before:
---

DPP:IGAM format  IGAM modeDGAM modeRGAM mode  GAMUT mode  C11 C12   
C13 C14   C21 C22   C23 C24   C31 C32   C33 C34
[ 0]:0h  BypassFixed  Bypass   Bypass0h 
h h h h h
[ 1]:0h  BypassFixed  Bypass   Bypass0h 
h h h h h
[ 2]:0h  BypassFixed  Bypass   Bypass0h 
h h h h h
[ 3]:0h  BypassFixed  Bypass   Bypass0h 
h h h h h

MPCC:  OPP  DPP  MPCCBOT  MODE  ALPHA_MODE  PREMULT  OVERLAP_ONLY  IDLE
[ 0]:   0h   0h   2h 3   01 0 0
[ 1]:   0h   1h   fh 2   20 0 0
[ 2]:   0h   2h   3h 3   01 0 0
[ 3]:   0h   3h   1h 3   20 0 0


DCN301 - After (Gamescope):
---

DPP:  DGAM ROM  DGAM ROM type  DGAM LUT  SHAPER mode  3DLUT mode  3DLUT bit 
depth  3DLUT size  RGAM mode  GAMUT adjust  C11C12C13
C14C21C22C23C24C31C32
C33C34
[ 0]:1   sRGBBypassRAM B   RAM A   
12-bit17x17x17  RAM ABypass  00 00 00 
00 00 00 00 00 00 00 
00 00
[ 1]:1   sRGBBypassRAM B   RAM A   
12-bit17x17x17  RAM ABypass  00 00 00 
00 00 00 00 00 00 00 
00 00
[ 2]:1   sRGBBypassRAM B   RAM A   
12-bit17x17x17  RAM ABypass  00 00 00 
00 00 00 00 00 00 00 
00 00
[ 3]:1   sRGBBypassRAM B   RAM A   
12-bit17x17x17  RAM ABypass  00 00 00 
00 00 00 00 00 00 00 
00 00

DPP Color Caps: input_lut_shared:0  icsc:1  dgam_ram:0  dgam_rom: 
srgb:1,bt2020:1,gamma2_2:1,pq:1,hlg:1  post_csc:1  gamcor:1  dgam_rom_for_yuv:0 
 3d_lut:1  blnd_lut:1  oscs:0

MPCC:  OPP  DPP  MPCCBOT  MODE  ALPHA_MODE  PREMULT  OVERLAP_ONLY  IDLE  SHAPER 
mode  3DLUT mode  3DLUT bit-depth  3DLUT size  OGAM mode  OGAM LUT  GAMUT 
adjust  C11C12C13C14C21C22C23   
 C24C31C32C33C34
[ 0]:   0h   0h   2h 3   01 0 0   
Bypass  Bypass   12-bit17x17x17RAM A  Bypass
00 00 00 00 00 00 
00 00 00 

[PATCH v3 3/9] drm/amd/display: read gamut remap matrix in fixed-point 31.32 format

2023-11-28 Thread Melissa Wen
Instead of reading gamut remap data from hw values, convert HW register
values (S2D13) into a fixed-point 31.32 matrix for color state log.
Change DCN10 log to print data in the format of the gamut remap matrix.
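
To make the printed numbers easier to interpret, the S2D13 to 31.32
conversion can be sketched as a standalone function. This mirrors the
sign-magnitude handling of int_frac_to_fixed_point() from patch 2/9,
but it is only a sketch: sign_mag_to_fixed31_32() is a hypothetical
name, and a plain int64_t stands in for struct fixed31_32.

```c
#include <assert.h>
#include <stdint.h>

/* Number of fractional bits in the 31.32 fixed-point format. */
#define FRAC_BITS_31_32 32

/*
 * Hypothetical standalone helper: convert a sign-magnitude value with
 * integer_bits integer bits and fractional_bits fractional bits
 * (S2D13 when called with 2 and 13) into a signed 31.32 value.
 */
static int64_t sign_mag_to_fixed31_32(uint16_t arg,
				      uint8_t integer_bits,
				      uint8_t fractional_bits)
{
	uint16_t sign_mask;
	uint16_t value_mask;
	int64_t result;

	/* The sign bit plus the magnitude must fit in 16 bits. */
	assert(integer_bits + fractional_bits < 16);

	sign_mask = (uint16_t)(1u << (integer_bits + fractional_bits));
	value_mask = (uint16_t)(sign_mask - 1);

	/* Align the magnitude's binary point with the 31.32 binary point. */
	result = (int64_t)(arg & value_mask) <<
		 (FRAC_BITS_31_32 - fractional_bits);

	if (arg & sign_mask)
		result = -result;

	return result;
}
```

Under these assumptions a register value of 0x2000 (1.0 in S2D13) maps
to 1 << 32, which is 4294967296 when the .value field is printed with
the %lld-based format used in the log.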

Signed-off-by: Melissa Wen 
---
 .../amd/display/dc/hwss/dcn10/dcn10_hwseq.c   | 38 +--
 drivers/gpu/drm/amd/display/dc/inc/hw/dpp.h   |  3 ++
 2 files changed, 29 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c 
b/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c
index 9b801488eb9d..f0a9f8818909 100644
--- a/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c
@@ -289,20 +289,26 @@ static void dcn10_log_color_state(struct dc *dc,
struct resource_pool *pool = dc->res_pool;
int i;
 
-   DTN_INFO("DPP:IGAM format  IGAM modeDGAM modeRGAM mode"
-   "  GAMUT mode  C11 C12   C13 C14   C21 C22   C23 C24   "
-   "C31 C32   C33 C34\n");
+   DTN_INFO("DPP:IGAM formatIGAM modeDGAM modeRGAM mode"
+"  GAMUT adjust  "
+"C11C12C13C14"
+"C21C22C23C24"
+"C31C32C33C34\n");
for (i = 0; i < pool->pipe_count; i++) {
struct dpp *dpp = pool->dpps[i];
struct dcn_dpp_state s = {0};
 
dpp->funcs->dpp_read_state(dpp, &s);
+   dpp->funcs->dpp_get_gamut_remap(dpp, &s.gamut_remap);
 
if (!s.is_enabled)
continue;
 
-   DTN_INFO("[%2d]:  %11xh  %-11s  %-11s  %-11s"
-   "%8x%08xh %08xh %08xh %08xh %08xh %08xh",
+   DTN_INFO("[%2d]:  %11xh  %11s%9s%9s"
+"  %12s  "
+"%010lld %010lld %010lld %010lld "
+"%010lld %010lld %010lld %010lld "
+"%010lld %010lld %010lld %010lld",
dpp->inst,
s.igam_input_format,
(s.igam_lut_mode == 0) ? "BypassFixed" :
@@ -322,13 +328,21 @@ static void dcn10_log_color_state(struct dc *dc,
((s.rgam_lut_mode == 3) ? "RAM" :
((s.rgam_lut_mode == 4) ? "RAM" :
 "Unknown",
-   s.gamut_remap_mode,
-   s.gamut_remap_c11_c12,
-   s.gamut_remap_c13_c14,
-   s.gamut_remap_c21_c22,
-   s.gamut_remap_c23_c24,
-   s.gamut_remap_c31_c32,
-   s.gamut_remap_c33_c34);
+   (s.gamut_remap.gamut_adjust_type == 0) ? 
"Bypass" :
+   ((s.gamut_remap.gamut_adjust_type == 1) 
? "HW" :
+   
  "SW"),
+   s.gamut_remap.temperature_matrix[0].value,
+   s.gamut_remap.temperature_matrix[1].value,
+   s.gamut_remap.temperature_matrix[2].value,
+   s.gamut_remap.temperature_matrix[3].value,
+   s.gamut_remap.temperature_matrix[4].value,
+   s.gamut_remap.temperature_matrix[5].value,
+   s.gamut_remap.temperature_matrix[6].value,
+   s.gamut_remap.temperature_matrix[7].value,
+   s.gamut_remap.temperature_matrix[8].value,
+   s.gamut_remap.temperature_matrix[9].value,
+   s.gamut_remap.temperature_matrix[10].value,
+   s.gamut_remap.temperature_matrix[11].value);
DTN_INFO("\n");
}
DTN_INFO("\n");
diff --git a/drivers/gpu/drm/amd/display/dc/inc/hw/dpp.h 
b/drivers/gpu/drm/amd/display/dc/inc/hw/dpp.h
index 597ebdb4da4c..b6acfd86642a 100644
--- a/drivers/gpu/drm/amd/display/dc/inc/hw/dpp.h
+++ b/drivers/gpu/drm/amd/display/dc/inc/hw/dpp.h
@@ -141,6 +141,7 @@ struct dcn_dpp_state {
uint32_t igam_input_format;
uint32_t dgam_lut_mode;
uint32_t rgam_lut_mode;
+   // gamut_remap data for dcn10_get_cm_states()
uint32_t gamut_remap_mode;
uint32_t gamut_remap_c11_c12;
uint32_t gamut_remap_c13_c14;
@@ -148,6 +149,8 @@ struct dcn_dpp_state {
uint32_t gamut_remap_c23_c24;
uint32_t gamut_remap_c31_c32;
uint32_t gamut_remap_c33_c34;
+   // gamut_remap data for dcn*_log_color_state()
+   struct 

[PATCH v3 1/9] drm/amd/display: decouple color state from hw state log

2023-11-28 Thread Melissa Wen
Prepare to hook up color state log according to the DCN version.

v3:
- put functions in single line (Siqueira)

Signed-off-by: Melissa Wen 
---
 .../amd/display/dc/hwss/dcn10/dcn10_hwseq.c   | 26 +--
 1 file changed, 18 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c 
b/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c
index 2b8b8366538e..9b801488eb9d 100644
--- a/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c
@@ -282,19 +282,13 @@ static void dcn10_log_hubp_states(struct dc *dc, void 
*log_ctx)
DTN_INFO("\n");
 }
 
-void dcn10_log_hw_state(struct dc *dc,
-   struct dc_log_buffer_ctx *log_ctx)
+static void dcn10_log_color_state(struct dc *dc,
+ struct dc_log_buffer_ctx *log_ctx)
 {
struct dc_context *dc_ctx = dc->ctx;
struct resource_pool *pool = dc->res_pool;
int i;
 
-   DTN_INFO_BEGIN();
-
-   dcn10_log_hubbub_state(dc, log_ctx);
-
-   dcn10_log_hubp_states(dc, log_ctx);
-
DTN_INFO("DPP:IGAM format  IGAM modeDGAM modeRGAM mode"
"  GAMUT mode  C11 C12   C13 C14   C21 C22   C23 C24   "
"C31 C32   C33 C34\n");
@@ -351,6 +345,22 @@ void dcn10_log_hw_state(struct dc *dc,
s.idle);
}
DTN_INFO("\n");
+}
+
+void dcn10_log_hw_state(struct dc *dc,
+   struct dc_log_buffer_ctx *log_ctx)
+{
+   struct dc_context *dc_ctx = dc->ctx;
+   struct resource_pool *pool = dc->res_pool;
+   int i;
+
+   DTN_INFO_BEGIN();
+
+   dcn10_log_hubbub_state(dc, log_ctx);
+
+   dcn10_log_hubp_states(dc, log_ctx);
+
+   dcn10_log_color_state(dc, log_ctx);
 
DTN_INFO("OTG:  v_bs  v_be  v_ss  v_se  vpol  vmax  vmin  vmax_sel  
vmin_sel  h_bs  h_be  h_ss  h_se  hpol  htot  vtot  underflow blank_en\n");
 
-- 
2.42.0



Re: [PATCH] drm/amdgpu: add shared fdinfo stats

2023-11-28 Thread Rob Clark
On Tue, Nov 28, 2023 at 6:28 AM Alex Deucher  wrote:
>
> On Tue, Nov 28, 2023 at 9:17 AM Christian König
>  wrote:
> >
> > Am 17.11.23 um 20:56 schrieb Alex Deucher:
> > > Add shared stats.  Useful for seeing shared memory.
> > >
> > > Signed-off-by: Alex Deucher 
> > > ---
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c |  4 
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 11 +++
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_object.h |  6 ++
> > >   3 files changed, 21 insertions(+)
> > >
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c 
> > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
> > > index 5706b282a0c7..c7df7fa3459f 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
> > > @@ -97,6 +97,10 @@ void amdgpu_show_fdinfo(struct drm_printer *p, struct 
> > > drm_file *file)
> > >  stats.requested_visible_vram/1024UL);
> > >   drm_printf(p, "amd-requested-gtt:\t%llu KiB\n",
> > >  stats.requested_gtt/1024UL);
> > > + drm_printf(p, "drm-shared-vram:\t%llu KiB\n", 
> > > stats.vram_shared/1024UL);
> > > + drm_printf(p, "drm-shared-gtt:\t%llu KiB\n", 
> > > stats.gtt_shared/1024UL);
> > > + drm_printf(p, "drm-shared-cpu:\t%llu KiB\n", 
> > > stats.cpu_shared/1024UL);
> > > +
> > >   for (hw_ip = 0; hw_ip < AMDGPU_HW_IP_NUM; ++hw_ip) {
> > >   if (!usage[hw_ip])
> > >   continue;
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
> > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> > > index d79b4ca1ecfc..c24f7b2c04c1 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> > > @@ -1287,25 +1287,36 @@ void amdgpu_bo_get_memory(struct amdgpu_bo *bo,
> > > struct amdgpu_mem_stats *stats)
> > >   {
> > >   uint64_t size = amdgpu_bo_size(bo);
> > > + struct drm_gem_object *obj;
> > >   unsigned int domain;
> > > + bool shared;
> > >
> > >   /* Abort if the BO doesn't currently have a backing store */
> > >   if (!bo->tbo.resource)
> > >   return;
> > >
> > > + obj = &bo->tbo.base;
> > > + shared = obj->handle_count > 1;
> >
> > Interesting approach but I don't think that this is correct.
> >
> > The handle_count is basically how many GEM handles are there for BO, so
> > for example it doesn't catch sharing things with V4L.
> >
> > What we should probably rather do is to take a look if
> > bo->tbo.base.dma_buf is NULL or not.
>
> +Rob, dri-devel
>
> This is what the generic drm helper code does.  See
> drm_show_memory_stats().  If that is not correct that code should
> probably be fixed too.

OTOH, v4l doesn't expose fdinfo.  What "shared" is telling you is
whether the BO is counted multiple times when you look at all
processes fdinfo.

But I guess it would be ok to look for obj->handle_count > 1 || obj->dma_buf
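
A minimal sketch of that combined check, assuming a stand-in struct
that carries only the two fields involved (mock_gem_object and
mock_bo_is_shared() are hypothetical names, not the real
struct drm_gem_object or any existing helper):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two drm_gem_object fields used here. */
struct mock_gem_object {
	unsigned int handle_count; /* number of GEM handles for the BO */
	void *dma_buf;             /* non-NULL when a dma-buf is associated */
};

/* Count a BO as shared if it has several handles or a dma-buf attached. */
static bool mock_bo_is_shared(const struct mock_gem_object *obj)
{
	assert(obj != NULL);

	return obj->handle_count > 1 || obj->dma_buf != NULL;
}
```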

BR,
-R

>
> Alex
>
> >
> > Regards,
> > Christian.
> >
> >
> > > +
> > >   domain = amdgpu_mem_type_to_domain(bo->tbo.resource->mem_type);
> > >   switch (domain) {
> > >   case AMDGPU_GEM_DOMAIN_VRAM:
> > >   stats->vram += size;
> > >   if (amdgpu_bo_in_cpu_visible_vram(bo))
> > >   stats->visible_vram += size;
> > > + if (shared)
> > > + stats->vram_shared += size;
> > >   break;
> > >   case AMDGPU_GEM_DOMAIN_GTT:
> > >   stats->gtt += size;
> > > + if (shared)
> > > + stats->gtt_shared += size;
> > >   break;
> > >   case AMDGPU_GEM_DOMAIN_CPU:
> > >   default:
> > >   stats->cpu += size;
> > > + if (shared)
> > > + stats->cpu_shared += size;
> > >   break;
> > >   }
> > >
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h 
> > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
> > > index d28e21baef16..0503af75dc26 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
> > > @@ -138,12 +138,18 @@ struct amdgpu_bo_vm {
> > >   struct amdgpu_mem_stats {
> > >   /* current VRAM usage, includes visible VRAM */
> > >   uint64_t vram;
> > > + /* current shared VRAM usage, includes visible VRAM */
> > > + uint64_t vram_shared;
> > >   /* current visible VRAM usage */
> > >   uint64_t visible_vram;
> > >   /* current GTT usage */
> > >   uint64_t gtt;
> > > + /* current shared GTT usage */
> > > + uint64_t gtt_shared;
> > >   /* current system memory usage */
> > >   uint64_t cpu;
> > > + /* current shared system memory usage */
> > > + uint64_t cpu_shared;
> > >   /* sum of evicted buffers, includes visible VRAM */
> > >   uint64_t evicted_vram;
> > >   /* sum of evicted buffers due to CPU access */
> >


Re: (subset) [PATCH 00/17] dt-bindings: samsung: add specific compatibles for existing SoC

2023-11-28 Thread Thierry Reding


On Wed, 08 Nov 2023 11:43:26 +0100, Krzysztof Kozlowski wrote:
> Merging
> ===
> I propose to take entire patchset through my tree (Samsung SoC), because:
> 1. Next cycle two new SoCs will be coming (Google GS101 and ExynosAutov920), so
>    they will touch the same lines in some of the DT bindings (not all, though).
>    It is reasonable for me to take the bindings for the new SoCs, to have clean
>    `make dtbs_check` on the new DTS.
> 2. Having it together helps me to have clean `make dtbs_check` within my tree
>    on the existing DTS.
> 3. No drivers are affected by this change.
> 4. I plan to do the same for Tesla FSD and Exynos ARM32 SoCs, thus expect
>    follow up patchsets.
> 
> [...]

Applied, thanks!

[12/17] dt-bindings: pwm: samsung: add specific compatibles for existing SoC
commit: 5d67b8f81b9d598599366214e3b2eb5f84003c9f

Best regards,
-- 
Thierry Reding 


Re: [PATCH] drm/msm/dpu: Capture dpu snapshot when frame_done_timer timeouts

2023-11-28 Thread Paloma Arellano



On 11/27/2023 5:48 PM, Dmitry Baryshkov wrote:

On Tue, 28 Nov 2023 at 03:12, Paloma Arellano  wrote:

Trigger a devcoredump to dump dpu registers and capture the drm atomic
state when the frame_done_timer timeouts.

Signed-off-by: Paloma Arellano 
---
  drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c | 13 +++--
  1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
index 1cf7ff6caff4..5cf7594feb5a 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
@@ -191,6 +191,7 @@ struct dpu_encoder_virt {
 void *crtc_frame_event_cb_data;

 atomic_t frame_done_timeout_ms;
+   atomic_t frame_done_timeout_cnt;
 struct timer_list frame_done_timer;

 struct msm_display_info disp_info;
@@ -1204,6 +1205,8 @@ static void dpu_encoder_virt_atomic_enable(struct 
drm_encoder *drm_enc,

 dpu_enc->dsc = dpu_encoder_get_dsc_config(drm_enc);

+   atomic_set(&dpu_enc->frame_done_timeout_cnt, 0);
+
 if (disp_info->intf_type == INTF_DP)
 dpu_enc->wide_bus_en = 
msm_dp_wide_bus_available(priv->dp[index]);
 else if (disp_info->intf_type == INTF_DSI)
@@ -2115,11 +2118,12 @@ static int _dpu_encoder_status_show(struct seq_file *s, 
void *data)
 for (i = 0; i < dpu_enc->num_phys_encs; i++) {
 struct dpu_encoder_phys *phys = dpu_enc->phys_encs[i];

-   seq_printf(s, "intf:%d  wb:%d  vsync:%8d underrun:%8d",
+   seq_printf(s, "intf:%d  wb:%d  vsync:%8d underrun:%8d
frame_done_cnt:%d",
 phys->hw_intf ? phys->hw_intf->idx - INTF_0 : 
-1,
 phys->hw_wb ? phys->hw_wb->idx - WB_0 : -1,
 atomic_read(&phys->vsync_cnt),
-   atomic_read(&phys->underrun_cnt));
+   atomic_read(&phys->underrun_cnt),
+   atomic_read(&dpu_enc->frame_done_timeout_cnt));

 seq_printf(s, "mode: %s\n", 
dpu_encoder_helper_get_intf_type(phys->intf_mode));
 }
@@ -2341,6 +2345,10 @@ static void dpu_encoder_frame_done_timeout(struct 
timer_list *t)

 DPU_ERROR_ENC(dpu_enc, "frame done timeout\n");

+   atomic_inc(&dpu_enc->frame_done_timeout_cnt);
+   if (atomic_read(&dpu_enc->frame_done_timeout_cnt) == 1)
+   msm_disp_snapshot_state(drm_enc->dev);

atomic_inc_and_test(), please


Hi Dmitry,

We only want to create a snapshot the first time the timer times out.
atomic_inc_and_test() increments the value and then returns whether the
result is zero, so it would not catch the first timeout of a counter that
starts at zero. FWIW I think I should change it to
'atomic_add_return(1, &dpu_enc->frame_done_timeout_cnt)' so that we can
check whether the new value equals one.
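
To illustrate the difference outside the kernel, here is a minimal
userspace sketch. It uses C11 <stdatomic.h> rather than the kernel's
atomic_t API, and the helper names are made up for this example; the two
helpers only mimic the documented return semantics of
atomic_inc_and_test() and atomic_add_return().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Userspace stand-in for atomic_inc_and_test():
 * increment, then report whether the new value is zero.
 */
static bool inc_and_test(atomic_int *v)
{
	return atomic_fetch_add(v, 1) + 1 == 0;
}

/*
 * Userspace stand-in for "atomic_add_return(1, v) == 1":
 * increment, then report whether the new value is one, i.e. whether
 * this was the first increment of a counter that started at zero.
 */
static bool first_timeout(atomic_int *v)
{
	return atomic_fetch_add(v, 1) + 1 == 1;
}

/* Simulate a number of timeouts and count how many would snapshot. */
static int snapshots_taken(int timeouts)
{
	atomic_int cnt = 0;
	int snapshots = 0;

	assert(timeouts >= 0);

	for (int i = 0; i < timeouts; i++)
		if (first_timeout(&cnt))
			snapshots++;

	return snapshots;
}
```

Because the increment and the comparison happen on the value returned by
a single atomic operation, two racing timeouts cannot both observe "one",
which the separate atomic_inc() + atomic_read() pair does not guarantee.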


Thank you,

Paloma




+
 event = DPU_ENCODER_FRAME_EVENT_ERROR;
 trace_dpu_enc_frame_done_timeout(DRMID(drm_enc), event);
 dpu_enc->crtc_frame_event_cb(dpu_enc->crtc_frame_event_cb_data, event);
@@ -2392,6 +2400,7 @@ struct drm_encoder *dpu_encoder_init(struct drm_device 
*dev,
 goto fail;

 atomic_set(&dpu_enc->frame_done_timeout_ms, 0);
+   atomic_set(&dpu_enc->frame_done_timeout_cnt, 0);
 timer_setup(&dpu_enc->frame_done_timer,
 dpu_encoder_frame_done_timeout, 0);

--
2.41.0





[PATCH] drm/imagination: Numerous documentation fixes.

2023-11-28 Thread Donald Robson
Some reported by Stephen Rothwell. The rest were found by running the
kernel-doc build script.
Some indentation fixes.

Reported-by: Stephen Rothwell 
Reported-by: kernel test robot 
Closes: 
https://lore.kernel.org/oe-kbuild-all/202311241526.y2wzeuau-...@intel.com/
Signed-off-by: Donald Robson 
---
 Documentation/gpu/imagination/index.rst   |  2 +-
 Documentation/gpu/imagination/uapi.rst|  5 +
 drivers/gpu/drm/imagination/pvr_cccb.h|  1 +
 drivers/gpu/drm/imagination/pvr_device.h  | 25 ---
 drivers/gpu/drm/imagination/pvr_fw.h  |  3 ++-
 drivers/gpu/drm/imagination/pvr_fw_info.h |  8 
 drivers/gpu/drm/imagination/pvr_hwrt.h|  1 +
 drivers/gpu/drm/imagination/pvr_job.c |  4 +---
 drivers/gpu/drm/imagination/pvr_mmu.c |  3 ++-
 drivers/gpu/drm/imagination/pvr_queue.h   |  4 ++--
 drivers/gpu/drm/imagination/pvr_vm.c  |  2 +-
 include/uapi/drm/pvr_drm.h| 10 -
 12 files changed, 38 insertions(+), 30 deletions(-)

diff --git a/Documentation/gpu/imagination/index.rst 
b/Documentation/gpu/imagination/index.rst
index dc9579e758c3..0c1e247cea41 100644
--- a/Documentation/gpu/imagination/index.rst
+++ b/Documentation/gpu/imagination/index.rst
@@ -3,7 +3,7 @@ drm/imagination PowerVR Graphics Driver
 ===
 
 .. kernel-doc:: drivers/gpu/drm/imagination/pvr_drv.c
-   :doc: PowerVR Graphics Driver
+   :doc: PowerVR (Series 6 and later) and IMG Graphics Driver
 
 Contents
 
diff --git a/Documentation/gpu/imagination/uapi.rst 
b/Documentation/gpu/imagination/uapi.rst
index 2227ea7e6222..7502413d0a93 100644
--- a/Documentation/gpu/imagination/uapi.rst
+++ b/Documentation/gpu/imagination/uapi.rst
@@ -45,9 +45,6 @@ DEV_QUERY
  drm_pvr_heap
  drm_pvr_dev_query_heap_info
 
-.. kernel-doc:: include/uapi/drm/pvr_drm.h
-   :doc: Flags for DRM_PVR_DEV_QUERY_HEAP_INFO_GET.
-
 .. kernel-doc:: include/uapi/drm/pvr_drm.h
:identifiers: drm_pvr_static_data_area_usage
  drm_pvr_static_data_area
@@ -121,7 +118,7 @@ CREATE_FREE_LIST and DESTROY_FREE_LIST
:identifiers: drm_pvr_ioctl_destroy_free_list_args
 
 CREATE_HWRT_DATASET and DESTROY_HWRT_DATASET
---
+
 .. kernel-doc:: include/uapi/drm/pvr_drm.h
:doc: PowerVR IOCTL CREATE_HWRT_DATASET and DESTROY_HWRT_DATASET interfaces
 
diff --git a/drivers/gpu/drm/imagination/pvr_cccb.h 
b/drivers/gpu/drm/imagination/pvr_cccb.h
index f35b3d4c9575..943fe8f2c963 100644
--- a/drivers/gpu/drm/imagination/pvr_cccb.h
+++ b/drivers/gpu/drm/imagination/pvr_cccb.h
@@ -86,6 +86,7 @@ pvr_cccb_get_size_of_cmd_with_hdr(u32 cmd_size)
 
 /**
  * pvr_cccb_cmdseq_can_fit() - Check if a command sequence can fit in the CCCB.
+ * @pvr_cccb: Target Client CCB.
  * @size: Command sequence size.
  *
  * Returns:
diff --git a/drivers/gpu/drm/imagination/pvr_device.h 
b/drivers/gpu/drm/imagination/pvr_device.h
index e07655fc65e8..2ca7e535799f 100644
--- a/drivers/gpu/drm/imagination/pvr_device.h
+++ b/drivers/gpu/drm/imagination/pvr_device.h
@@ -203,17 +203,29 @@ struct pvr_device {
struct mutex lock;
} queues;
 
+   /**
+* @watchdog: Watchdog for communications with firmware.
+*/
struct {
/** @work: Work item for watchdog callback. */
struct delayed_work work;
 
-   /** @old_kccb_cmds_executed: KCCB command execution count at 
last watchdog poll. */
+   /**
+* @old_kccb_cmds_executed: KCCB command execution count at last
+* watchdog poll.
+*/
u32 old_kccb_cmds_executed;
 
-   /** @kccb_stall_count: Number of watchdog polls KCCB has been 
stalled for. */
+   /**
+* @kccb_stall_count: Number of watchdog polls KCCB has been
+* stalled for.
+*/
u32 kccb_stall_count;
} watchdog;
 
+   /**
+* @kccb: Circular buffer for communications with firmware.
+*/
struct {
/** @ccb: Kernel CCB. */
struct pvr_ccb ccb;
@@ -225,8 +237,8 @@ struct pvr_device {
struct pvr_fw_object *rtn_obj;
 
/**
-* @rtn: Pointer to CPU mapping of KCCB return slots. Must be 
accessed by
-*   READ_ONCE()/WRITE_ONCE().
+* @rtn: Pointer to CPU mapping of KCCB return slots. Must be
+* accessed by READ_ONCE()/WRITE_ONCE().
 */
u32 *rtn;
 
@@ -293,14 +305,13 @@ struct pvr_file {
 
/**
 * @pvr_dev: A reference to the powervr-specific wrapper for the
-*   associated device. Saves on repeated calls to
-*   to_pvr_device().
+* associated device. Saves on repeated calls to to_pvr_device().
 */
 

Re: [PATCH 1/3] Revert "drm/prime: Unexport helpers for fd/handle conversion"

2023-11-28 Thread Alex Deucher
On Thu, Nov 23, 2023 at 6:12 PM Felix Kuehling  wrote:
>
> [+Alex]
>
> On 2023-11-17 16:44, Felix Kuehling wrote:
>
> > This reverts commit 71a7974ac7019afeec105a54447ae1dc7216cbb3.
> >
> > These helper functions are needed for KFD to export and import DMABufs
> > the right way without duplicating the tracking of DMABufs associated with
> > GEM objects while ensuring that move notifier callbacks are working as
> > intended.
> >
> > CC: Christian König 
> > CC: Thomas Zimmermann 
> > Signed-off-by: Felix Kuehling 
>
> Re: our discussion about v2 of this patch: If this version is
> acceptable, can I get an R-b or A-b?
>
> I would like to get this patch into drm-next as a prerequisite for
> patches 2 and 3. I cannot submit it to the current amd-staging-drm-next
> because the patch I'm reverting doesn't exist there yet.
>
> Patch 2 and 3 could go into drm-next as well, or go through Alex's
> amd-staging-drm-next branch once patch 1 is in drm-next. Alex, how do
> you prefer to coordinate this?

I guess ideally this would go through my drm-next tree since your
other patches depend on it unless others feel strongly that it should
go through drm-misc.

Alex


>
> Regards,
>Felix
>
>
> > ---
> >   drivers/gpu/drm/drm_prime.c | 33 ++---
> >   include/drm/drm_prime.h |  7 +++
> >   2 files changed, 25 insertions(+), 15 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
> > index 63b709a67471..834a5e28abbe 100644
> > --- a/drivers/gpu/drm/drm_prime.c
> > +++ b/drivers/gpu/drm/drm_prime.c
> > @@ -278,7 +278,7 @@ void drm_gem_dmabuf_release(struct dma_buf *dma_buf)
> >   }
> >   EXPORT_SYMBOL(drm_gem_dmabuf_release);
> >
> > -/*
> > +/**
> >* drm_gem_prime_fd_to_handle - PRIME import function for GEM drivers
> >* @dev: drm_device to import into
> >* @file_priv: drm file-private structure
> > @@ -292,9 +292,9 @@ EXPORT_SYMBOL(drm_gem_dmabuf_release);
> >*
> >* Returns 0 on success or a negative error code on failure.
> >*/
> > -static int drm_gem_prime_fd_to_handle(struct drm_device *dev,
> > -   struct drm_file *file_priv, int 
> > prime_fd,
> > -   uint32_t *handle)
> > +int drm_gem_prime_fd_to_handle(struct drm_device *dev,
> > +struct drm_file *file_priv, int prime_fd,
> > +uint32_t *handle)
> >   {
> >   struct dma_buf *dma_buf;
> >   struct drm_gem_object *obj;
> > @@ -360,6 +360,7 @@ static int drm_gem_prime_fd_to_handle(struct drm_device 
> > *dev,
> >   dma_buf_put(dma_buf);
> >   return ret;
> >   }
> > +EXPORT_SYMBOL(drm_gem_prime_fd_to_handle);
> >
> >   int drm_prime_fd_to_handle_ioctl(struct drm_device *dev, void *data,
> >struct drm_file *file_priv)
> > @@ -408,7 +409,7 @@ static struct dma_buf 
> > *export_and_register_object(struct drm_device *dev,
> >   return dmabuf;
> >   }
> >
> > -/*
> > +/**
> >* drm_gem_prime_handle_to_fd - PRIME export function for GEM drivers
> >* @dev: dev to export the buffer from
> >* @file_priv: drm file-private structure
> > @@ -421,10 +422,10 @@ static struct dma_buf 
> > *export_and_register_object(struct drm_device *dev,
> >* The actual exporting from GEM object to a dma-buf is done through the
> >* _gem_object_funcs.export callback.
> >*/
> > -static int drm_gem_prime_handle_to_fd(struct drm_device *dev,
> > -   struct drm_file *file_priv, uint32_t 
> > handle,
> > -   uint32_t flags,
> > -   int *prime_fd)
> > +int drm_gem_prime_handle_to_fd(struct drm_device *dev,
> > +struct drm_file *file_priv, uint32_t handle,
> > +uint32_t flags,
> > +int *prime_fd)
> >   {
> >   struct drm_gem_object *obj;
> >   int ret = 0;
> > @@ -506,6 +507,7 @@ static int drm_gem_prime_handle_to_fd(struct drm_device 
> > *dev,
> >
> >   return ret;
> >   }
> > +EXPORT_SYMBOL(drm_gem_prime_handle_to_fd);
> >
> >   int drm_prime_handle_to_fd_ioctl(struct drm_device *dev, void *data,
> >struct drm_file *file_priv)
> > @@ -864,9 +866,9 @@ EXPORT_SYMBOL(drm_prime_get_contiguous_size);
> >* @obj: GEM object to export
> >* @flags: flags like DRM_CLOEXEC and DRM_RDWR
> >*
> > - * This is the implementation of the _gem_object_funcs.export functions
> > - * for GEM drivers using the PRIME helpers. It is used as the default for
> > - * drivers that do not set their own.
> > + * This is the implementation of the _gem_object_funcs.export 
> > functions for GEM drivers
> > + * using the PRIME helpers. It is used as the default in
> > + * drm_gem_prime_handle_to_fd().
> >*/
> >   struct dma_buf *drm_gem_prime_export(struct drm_gem_object *obj,
> >  

Re: [RFC PATCH 03/10] drm/mipi-dsi: add API for manual control over the DSI link power state

2023-11-28 Thread Dmitry Baryshkov
On Tue, 28 Nov 2023 at 18:56, Michael Walle  wrote:
>
> >> > DSI device lifetime has three different stages:
> >> > 1. before the DSI link being powered up and clocking,
> >> > 2. when the DSI link is in LP state (for the purpose of this question,
> >> > this is the time between the DSI link being powered up and the video
> >> > stream start)
> >> > 3. when the DSI link is in HS state (while streaming the video).
> >>
> >> It's not clear to me what (2) is. What is the state of the clock and
> >> data lanes?
> >
> > Clk and Data0 should be in the LP mode, ready for LP Data Transfer.
>
> Then this is somehow missing
> https://docs.kernel.org/gpu/drm-kms-helpers.html#mipi-dsi-bridge-operation
>
>    A DSI host should keep the PHY powered down until the pre_enable operation
>    is called. All lanes are in an undefined idle state up to this point, and
>    it must not be assumed that it is LP-11. pre_enable should initialise the
>    PHY, set the data lanes to LP-11, and the clock lane to either LP-11 or HS
>    depending on the mode_flag MIPI_DSI_CLOCK_NON_CONTINUOUS.
>
> So I don't think these three states are sufficient; see below, there
> should be at least four.

Which one is #4?

>
> -michael
>
> >
> > I don't think we support ULPS currently.
> >
> >
> >>
> >> I'm facing similar issues with the tc358775 bridge. This bridge needs
> >> to release its reset while both clock and data lanes are in LP-11
> >> mode.
> >> But then it needs to be configured (via I2C) while the clock lane is
> >> enabled (HS mode), but the data lanes are still in LP-11 mode.
> >>
> >> To me it looks like there is a fourth case then:
> >> 1. unpowered
> >> 2. DSI clock and data are in LP-11
> >> 3. DSI clock is in HS and data are in LP-11
> >> 4. DSI clock is in HS and data is in HS
> >>
> >> (And of course the bridge needs continuous clock mode).
> >>
> >> > Different DSI bridges have different requirements with respect to the
> >> > code being executed at stages 1 and 2. For example several DSI-to-eDP
> >> > bridges (ps8640, tc358767) require the link to be quiet during
> >> > reset time.
> >> > The DSI-controlled bridges and DSI panels need to send some commands
> >> > in stage 2, before starting up video
> >> >
> >> > In the DRM subsystem stage 3 naturally maps to the
> >> > drm_bridge_funcs::enable, stage 1 also naturally maps to the
> >> > drm_bridge_funcs::pre_enable. Stage 2 doesn't have its own place in
> >> > the DRM call chain.
> >> > Earlier we attempted to solve that using the pre_enable_prev_first,
> >> > which remapped pre-enable callback execution order. However it has led
> >> > us to the two issues. First, at the DSI host driver we do not know
> >> > whether the panel / bridge were updated to use pre_enable_prev_first
> >> > or not. Second, if the bridge has to perform steps during both stages
> >> > 1 and 2, it can not do that.
> >> >
> >> > I'm trying to find a way to express the difference between stages 1
> >> > and 2 in the generic code, so that we do not have to worry about particular
> >> > DSI host and DSI bridge / panel peculiarities when implementing the
> >> > DSI host and/or DSI panel driver.
> >>
> >> For now, I have a rather hacky ".dsi_lp11_notify" callback in
> >> drm_bridge_funcs which is supposed to be called by the DSI host while the
> >> clock and data lanes are in LP-11 mode. But that is rather an RFC and me
> >> needing something to get the driver for this bridge working. Because it's
> >> badly broken. FWIW, you can find my work-in-progress patches at
> >> https://github.com/mwalle/linux/tree/feature-tc358775-fixes
> >>
> >> -michael
> >>
> >
> >
> > --
> > With best wishes
> > Dmitry



-- 
With best wishes
Dmitry


Re: [PATCH v3 08/16] drm/exynos: Convert to platform remove callback returning void

2023-11-28 Thread Krzysztof Kozlowski
On 28/11/2023 17:55, Uwe Kleine-König wrote:
> Hello Inki,
> 
> On Wed, Nov 08, 2023 at 08:54:54AM +0100, Uwe Kleine-König wrote:
>> Hello Inki,
>>
>> On Wed, Nov 08, 2023 at 01:16:18PM +0900, Inki Dae wrote:
>>> Sorry for late. There was a merge conflict so I fixed it manually and
>>> merged. And seems your patch description is duplicated so dropped
>>> duplicated one.
>>
>> Ah. I have a template that generates one patch per driver. I guess this
>> is the result of using squash instead of fixup while putting all exynos
>> changes into a single patch.
> 
> This patch didn't make it into next yet even though it's included in
> your exynos-drm-next branch at
> https://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos.git.
> 
> Is this on purpose?

Not exactly on purpose, but the problem is that the drm-exynos tree is
not in next.

Reminds me of my talk from Plumbers this year. :)  Slides are here and
serve as reference:
https://lpc.events/event/17/contributions/1498/

Best regards,
Krzysztof



Re: [RFC PATCH 03/10] drm/mipi-dsi: add API for manual control over the DSI link power state

2023-11-28 Thread Michael Walle

>> > DSI device lifetime has three different stages:
>> > 1. before the DSI link being powered up and clocking,
>> > 2. when the DSI link is in LP state (for the purpose of this question,
>> > this is the time between the DSI link being powered up and the video
>> > stream start)
>> > 3. when the DSI link is in HS state (while streaming the video).
>>
>> It's not clear to me what (2) is. What is the state of the clock and
>> data lanes?
>
> Clk and Data0 should be in the LP mode, ready for LP Data Transfer.

Then this is somehow missing
https://docs.kernel.org/gpu/drm-kms-helpers.html#mipi-dsi-bridge-operation

   A DSI host should keep the PHY powered down until the pre_enable operation
   is called. All lanes are in an undefined idle state up to this point, and
   it must not be assumed that it is LP-11. pre_enable should initialise the
   PHY, set the data lanes to LP-11, and the clock lane to either LP-11 or HS
   depending on the mode_flag MIPI_DSI_CLOCK_NON_CONTINUOUS.

So I don't think these three states are sufficient; see below, there
should be at least four.
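
As a rough illustration of the four link states listed further down in
this mail, here is a small standalone sketch. Every name in it is
hypothetical (none of this is existing DRM or MIPI-DSI API); it only
encodes the clock/data lane combinations and the two tc358775
requirements as described in the text.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lane modes; not an existing kernel enum. */
enum dsi_lane_mode {
	DSI_LANE_OFF,  /* unpowered, undefined idle state */
	DSI_LANE_LP11, /* low-power stop state */
	DSI_LANE_HS,   /* high-speed mode */
};

/* Hypothetical link stages, numbered as in the list in this mail. */
enum dsi_link_stage {
	DSI_STAGE_UNPOWERED = 1, /* 1. unpowered */
	DSI_STAGE_LP11,          /* 2. clock and data in LP-11 */
	DSI_STAGE_CLK_HS,        /* 3. clock in HS, data in LP-11 */
	DSI_STAGE_VIDEO,         /* 4. clock and data in HS */
};

struct dsi_lanes {
	enum dsi_lane_mode clk;
	enum dsi_lane_mode data;
};

/* Map a stage to the lane modes it implies. */
static struct dsi_lanes dsi_stage_lanes(enum dsi_link_stage stage)
{
	struct dsi_lanes lanes = { DSI_LANE_OFF, DSI_LANE_OFF };

	assert(stage >= DSI_STAGE_UNPOWERED && stage <= DSI_STAGE_VIDEO);

	switch (stage) {
	case DSI_STAGE_UNPOWERED:
		break;
	case DSI_STAGE_LP11:
		lanes.clk = DSI_LANE_LP11;
		lanes.data = DSI_LANE_LP11;
		break;
	case DSI_STAGE_CLK_HS:
		lanes.clk = DSI_LANE_HS;
		lanes.data = DSI_LANE_LP11;
		break;
	case DSI_STAGE_VIDEO:
		lanes.clk = DSI_LANE_HS;
		lanes.data = DSI_LANE_HS;
		break;
	}
	return lanes;
}

/* tc358775, per the description: reset release needs clock and data in LP-11. */
static bool tc358775_can_release_reset(enum dsi_link_stage stage)
{
	struct dsi_lanes l = dsi_stage_lanes(stage);

	return l.clk == DSI_LANE_LP11 && l.data == DSI_LANE_LP11;
}

/* tc358775, per the description: I2C setup needs clock in HS, data in LP-11. */
static bool tc358775_can_configure(enum dsi_link_stage stage)
{
	struct dsi_lanes l = dsi_stage_lanes(stage);

	return l.clk == DSI_LANE_HS && l.data == DSI_LANE_LP11;
}
```

The point of the sketch is only that stage 3 is distinct from both the
"LP" and the "HS" stage of the three-stage model.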

-michael



> I don't think we support ULPS currently.
>
>> I'm facing similar issues with the tc358775 bridge. This bridge needs
>> to release its reset while both clock and data lanes are in LP-11 mode.
>> But then it needs to be configured (via I2C) while the clock lane is
>> enabled (HS mode), but the data lanes are still in LP-11 mode.
>>
>> To me it looks like there is a fourth case then:
>> 1. unpowered
>> 2. DSI clock and data are in LP-11
>> 3. DSI clock is in HS and data are in LP-11
>> 4. DSI clock is in HS and data is in HS
>>
>> (And of course the bridge needs continuous clock mode).
>>
>> > Different DSI bridges have different requirements with respect to the
>> > code being executed at stages 1 and 2. For example several DSI-to-eDP
>> > bridges (ps8640, tc358767) require the link to be quiet during
>> > reset time.
>> > The DSI-controlled bridges and DSI panels need to send some commands
>> > in stage 2, before starting up video
>> >
>> > In the DRM subsystem stage 3 naturally maps to the
>> > drm_bridge_funcs::enable, stage 1 also naturally maps to the
>> > drm_bridge_funcs::pre_enable. Stage 2 doesn't have its own place in
>> > the DRM call chain.
>> > Earlier we attempted to solve that using the pre_enable_prev_first,
>> > which remapped pre-enable callback execution order. However it has led
>> > us to two issues. First, at the DSI host driver we do not know
>> > whether the panel / bridge were updated to use pre_enable_prev_first
>> > or not. Second, if the bridge has to perform steps during both stages
>> > 1 and 2, it can not do that.
>> >
>> > I'm trying to find a way to express the difference between stages 1
>> > and 2 in the generic code, so that we do not have to worry about particular
>> > DSI host and DSI bridge / panel peculiarities when implementing the
>> > DSI host and/or DSI panel driver.
>>
>> For now, I have a rather hacky ".dsi_lp11_notify" callback in
>> drm_bridge_funcs which is supposed to be called by the DSI host while the
>> clock and data lanes are in LP-11 mode. But that is rather an RFC and me
>> needing something to get the driver for this bridge working. Because it's
>> badly broken. FWIW, you can find my work-in-progress patches at
>> https://github.com/mwalle/linux/tree/feature-tc358775-fixes
>>
>> -michael
>
> --
> With best wishes
> Dmitry

Re: [PATCH v3 08/16] drm/exynos: Convert to platform remove callback returning void

2023-11-28 Thread Uwe Kleine-König
Hello Inki,

On Wed, Nov 08, 2023 at 08:54:54AM +0100, Uwe Kleine-König wrote:
> Hello Inki,
> 
> On Wed, Nov 08, 2023 at 01:16:18PM +0900, Inki Dae wrote:
> > Sorry for late. There was a merge conflict so I fixed it manually and
> > merged. And seems your patch description is duplicated so dropped
> > duplicated one.
> 
> Ah. I have a template that generates one patch per driver. I guess this
> is the result of using squash instead of fixup while putting all exynos
> changes into a single patch.

This patch didn't make it into next yet even though it's included in
your exynos-drm-next branch at
https://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos.git.

Is this on purpose?

Best regards
Uwe

-- 
Pengutronix e.K.   | Uwe Kleine-König|
Industrial Linux Solutions | https://www.pengutronix.de/ |




Re: [PATCH v3 16/16] drm/tilcdc: Convert to platform remove callback returning void

2023-11-28 Thread Uwe Kleine-König
On Fri, Nov 03, 2023 at 09:58:07AM +0200, Tomi Valkeinen wrote:
> On 02/11/2023 18:56, Uwe Kleine-König wrote:
> > The .remove() callback for a platform driver returns an int which makes
> > many driver authors wrongly assume it's possible to do error handling by
> > returning an error code. However the value returned is (mostly) ignored
> > and this typically results in resource leaks. To improve here there is a
> > quest to make the remove callback return void. In the first step of this
> > quest all drivers are converted to .remove_new() which already returns
> > void.
> > [...]
> 
> Reviewed-by: Tomi Valkeinen 

This patch didn't make it into next yet. Who is responsible to pick this
up?

Best regards
Uwe

-- 
Pengutronix e.K.   | Uwe Kleine-König|
Industrial Linux Solutions | https://www.pengutronix.de/ |




Re: [RFC PATCH 03/10] drm/mipi-dsi: add API for manual control over the DSI link power state

2023-11-28 Thread Dmitry Baryshkov
On Mon, 27 Nov 2023 at 18:07, Michael Walle  wrote:
>
> Hi,
>
> > DSI device lifetime has three different stages:
> > 1. before the DSI link being powered up and clocking,
> > 2. when the DSI link is in LP state (for the purpose of this question,
> > this is the time between the DSI link being powered up and the video
> > stream start)
> > 3. when the DSI link is in HS state (while streaming the video).
>
> It's not clear to me what (2) is. What is the state of the clock and
> data lanes?

Clk and Data0 should be in the LP mode, ready for LP Data Transfer.

I don't think we support ULPS currently.


>
> I'm facing similar issues with the tc358775 bridge. This bridge needs
> to release its reset while both clock and data lanes are in LP-11 mode.
> But then it needs to be configured (via I2C) while the clock lane is
> enabled (HS mode), but the data lanes are still in LP-11 mode.
>
> To me it looks like there is a fourth case then:
> 1. unpowered
> 2. DSI clock and data are in LP-11
> 3. DSI clock is in HS and data are in LP-11
> 4. DSI clock is in HS and data is in HS
>
> (And of course the bridge needs continuous clock mode).
>
> > Different DSI bridges have different requirements with respect to the
> > code being executed at stages 1 and 2. For example several DSI-to-eDP
> > bridges (ps8640, tc358767) require the link to be quiet during
> > reset time.
> > The DSI-controlled bridges and DSI panels need to send some commands
> > in stage 2, before starting up video
> >
> > In the DRM subsystem stage 3 naturally maps to the
> > drm_bridge_funcs::enable, stage 1 also naturally maps to the
> > drm_bridge_funcs::pre_enable. Stage 2 doesn't have its own place in
> > the DRM call chain.
> > Earlier we attempted to solve that using the pre_enable_prev_first,
> > which remapped pre-enable callback execution order. However it has led
> > us to the two issues. First, at the DSI host driver we do not know
> > whether the panel / bridge were updated to use pre_enable_prev_first
> > or not. Second, if the bridge has to perform steps during both stages
> > 1 and 2, it can not do that.
> >
> > I'm trying to find a way to express the difference between stages 1
> > and 2 in the generic code, so that we do not have to worry about particular
> > DSI host and DSI bridge / panel peculiarities when implementing the
> > DSI host and/or DSI panel driver.
>
> For now, I have a rather hacky ".dsi_lp11_notify" callback in
> drm_bridge_funcs which is supposed to be called by the DSI host while the
> clock and data lanes are in LP-11 mode. But that is rather an RFC and me
> needing something to get the driver for this bridge working. Because it's
> badly broken. FWIW, you can find my work-in-progress patches at
> https://github.com/mwalle/linux/tree/feature-tc358775-fixes
>
> -michael
>


--
With best wishes
Dmitry


Re: [PATCH v2 3/3] drm/panfrost: Synchronize and disable interrupts before powering off

2023-11-28 Thread Boris Brezillon
On Tue, 28 Nov 2023 17:10:17 +0100
AngeloGioacchino Del Regno 
wrote:

> >>if (status)
> >>panfrost_job_irq_handler_thread(pfdev->js->irq, (void*)pfdev);  
> > 
> > Nope, we don't need to read the STAT reg and forcibly call the threaded
> > handler if it's != 0. The synchronize_irq() call should do exactly that
> > (make sure all pending interrupts are processed before returning), and
> > our previous job_write(pfdev, JOB_INT_MASK, 0) guarantees that no new
> > interrupts will kick in after that point.
> >   
> 
> Unless we synchronize_irq() *before* masking all interrupts (which would be
> wrong, as some interrupt could still fire after execution of the ISR), we get
> *either of* two scenarios:
> 
>   - COMP_BIT_JOB is not set, softirq thread unmasks some interrupts by
> writing to JOB_INT_MASK; or
>   - COMP_BIT_JOB is set, hardirq handler returns IRQ_NONE, the threaded
> interrupt handler doesn't get executed, jobs are not canceled.
> 
> So if we don't forcibly call the threaded handler if RAWSTAT != 0 in there,
> and if the extra check is present in the hardirq handler, and if the hardirq
> handler wasn't executed already before our synchronize_irq() call (so: if the
> hardirq execution has to be done to synchronize irqs), we are not guaranteeing
> that jobs cancellation/dequeuing/removal/whatever-handling is done before
> entering suspend.

Except the job event processing should have happened before we reached
that point. panfrost_xxx_suspend_irq() are just here to make sure

- we're done processing pending IRQs that we started processing before
  the _INT_MASK update happened
- we ignore new ones, if any

If we end up with unprocessed JOB/MMU irqs we care about when we're
here, this should be fixed by:

1. Making sure the paths updating the MMU AS are retaining a runtime PM
  ref (pm_runtime_get_sync()) before doing their stuff, and releasing
  it (pm_runtime_put()) when they are done

2. Making sure we retain a runtime PM ref while we have jobs queued to
   the various JM queues

3. Making sure we acquire a runtime PM ref when we are about to push a
   job to one of the JM queues

For #2 and #3, we retain one runtime PM ref per active job, just before
queuing it [1], and release the ref when the job is completed [2][3].
We're not supposed to receive interrupts if we have no active jobs, and
if we do, we can safely ignore them, because there's not much we would
do with those anyway.
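
The one-ref-per-active-job rule can be pictured with a tiny userspace
model. This is only a sketch: the struct and the fake_* helpers are
hypothetical, and the plain counter merely stands in for the runtime PM
usage count that pm_runtime_get_sync()/pm_runtime_put() manage in the
real driver.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a GPU's runtime PM bookkeeping. */
struct fake_gpu_pm {
	int pm_refs;     /* models the runtime PM usage count */
	int active_jobs; /* jobs queued to the hardware */
};

/* Take a PM ref just before queuing the job. */
static void fake_job_submit(struct fake_gpu_pm *gpu)
{
	gpu->pm_refs++;     /* stands in for pm_runtime_get_sync() */
	gpu->active_jobs++;
}

/* Drop the PM ref once the job has completed. */
static void fake_job_done(struct fake_gpu_pm *gpu)
{
	assert(gpu->active_jobs > 0);

	gpu->active_jobs--;
	gpu->pm_refs--;     /* stands in for pm_runtime_put() */
}

/* Suspend is only possible once no job holds a PM ref. */
static bool fake_may_suspend(const struct fake_gpu_pm *gpu)
{
	return gpu->pm_refs == 0;
}
```

In this model the suspend path can only be reached with zero active jobs,
which is why a job interrupt seen at that point carries no work to do.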

For #1, we retain the runtime PM ref when flushing TLBs of an
active AS, and when destroying an active MMU context. The last
operation that requires touching GPU regs is panfrost_mmu_enable(),
which is called from panfrost_mmu_as_get(), which in turn is called
from panfrost_job_hw_submit() after this function has acquired a
runtime PM ref. All MMU updates are synchronous, and the interrupts
that might result from an AS are caused by GPU jobs. Meaning that any
MMU interrupt remaining when we're in the suspend path can safely be
ignored.

> 
> That, unless the suggestion was to call panfrost_job_handle_irqs() instead of
> the handler thread like that (because reading it back, it makes sense to do 
> so).

Nope, the suggestion was to keep things unchanged in
panfrost_job_suspend_irq(), and just add the extra is_suspended check
in panfrost_job_irq_handler().
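
A minimal userspace model of that arrangement might look as follows. It
is a sketch under assumptions, not the panfrost code: the struct, the
fake_* names and the return values are made up, the "registers" are plain
fields, and _STAT is modelled as RAWSTAT & MASK.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for IRQ_NONE / IRQ_WAKE_THREAD. */
enum fake_irqreturn { FAKE_IRQ_NONE, FAKE_IRQ_WAKE_THREAD };

/* Hypothetical model of the job block's interrupt state. */
struct fake_gpu {
	bool is_suspended;    /* models the per-component suspend flag */
	uint32_t int_mask;    /* models JOB_INT_MASK */
	uint32_t int_rawstat; /* models JOB_INT_RAWSTAT */
};

/* Hard IRQ handler: bail out without touching registers when suspended. */
static enum fake_irqreturn fake_job_irq_handler(struct fake_gpu *gpu)
{
	assert(gpu != NULL);

	/* The line may be shared; a suspended GPU did not raise this IRQ. */
	if (gpu->is_suspended)
		return FAKE_IRQ_NONE;

	/* _STAT is modelled as the raw status filtered by the mask. */
	if (!(gpu->int_rawstat & gpu->int_mask))
		return FAKE_IRQ_NONE;

	/* Mask everything until the threaded handler has run. */
	gpu->int_mask = 0;
	return FAKE_IRQ_WAKE_THREAD;
}

/* Suspend path: flag first, then mask; no forced handler call. */
static void fake_job_suspend_irq(struct fake_gpu *gpu)
{
	assert(gpu != NULL);

	gpu->is_suspended = true;
	gpu->int_mask = 0;
	/* The real driver would call synchronize_irq() at this point. */
}
```

After fake_job_suspend_irq() any late event left in the raw status is
simply ignored, which matches the argument that such interrupts are safe
to drop.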

[1]https://elixir.bootlin.com/linux/v6.7-rc3/source/drivers/gpu/drm/panfrost/panfrost_job.c#L207
[2]https://elixir.bootlin.com/linux/v6.7-rc3/source/drivers/gpu/drm/panfrost/panfrost_job.c#L462
[3]https://elixir.bootlin.com/linux/v6.7-rc3/source/drivers/gpu/drm/panfrost/panfrost_job.c#L481
[4]https://elixir.bootlin.com/linux/v6.7-rc3/source/drivers/gpu/drm/panfrost/panfrost_mmu.c#L279
[5]https://elixir.bootlin.com/linux/v6.7-rc3/source/drivers/gpu/drm/panfrost/panfrost_mmu.c#L555


Re: [PATCH v2] drm/bridge: imx93-mipi-dsi: Fix a couple of building warnings

2023-11-28 Thread Robert Foss
On Thu, 23 Nov 2023 13:18:07 +0800, Liu Ying wrote:
> Fix a couple of building warnings on used uninitialized 'best_m' and
> 'best_n' local variables by initializing 'best_m' to zero and 'best_n'
> to UINT_MAX.  This makes compiler happy only.  No functional change.
> 
> 

Applied, thanks!

[1/1] drm/bridge: imx93-mipi-dsi: Fix a couple of building warnings
  https://cgit.freedesktop.org/drm/drm-misc/commit/?id=9f83f37ca76d



Rob



Re: [PATCH v2 3/3] drm/panfrost: Synchronize and disable interrupts before powering off

2023-11-28 Thread AngeloGioacchino Del Regno

Il 28/11/23 16:53, Boris Brezillon ha scritto:

On Tue, 28 Nov 2023 16:10:45 +0100
AngeloGioacchino Del Regno 
wrote:


   static void panfrost_job_handle_err(struct panfrost_device *pfdev,
struct panfrost_job *job,
unsigned int js)
@@ -792,9 +800,13 @@ static irqreturn_t panfrost_job_irq_handler_thread(int 
irq, void *data)
struct panfrost_device *pfdev = data;
   
   	panfrost_job_handle_irqs(pfdev);

-   job_write(pfdev, JOB_INT_MASK,
- GENMASK(16 + NUM_JOB_SLOTS - 1, 16) |
- GENMASK(NUM_JOB_SLOTS - 1, 0));
+
+   /* Enable interrupts only if we're not about to get suspended */
+   if (!test_bit(PANFROST_COMP_BIT_JOB, pfdev->is_suspending))


The irq-line is requested with IRQF_SHARED, meaning the line might be
shared between all three GPU IRQs, but also with other devices. I think
if we want to be totally safe, we need to also check this is_suspending
field in the hard irq handlers before accessing the xxx_INT_yyy
registers.
   


This would mean that we would have to force canceling jobs in the suspend
handler, but if the IRQ never fired, would we still be able to find the
right bits flipped in JOB_INT_RAWSTAT?


There should be no jobs left if we enter suspend. If there is, that's a
bug we should fix, but I'm digressing.



  From what I understand, are you suggesting to call, in job_suspend_irq()
something like

void panfrost_job_suspend_irq(struct panfrost_device *pfdev)
{
  u32 status;

set_bit(PANFROST_COMP_BIT_JOB, pfdev->is_suspending);

job_write(pfdev, JOB_INT_MASK, 0);
synchronize_irq(pfdev->js->irq);

status = job_read(pfdev, JOB_INT_STAT);


I guess you meant _RAWSTAT. _STAT should always be zero after we've
written 0 to _INT_MASK.



Whoops! Yes, as I wrote up there, I meant _RAWSTAT, sorry! :-)


if (status)
panfrost_job_irq_handler_thread(pfdev->js->irq, (void*)pfdev);


Nope, we don't need to read the STAT reg and forcibly call the threaded
handler if it's != 0. The synchronize_irq() call should do exactly that
(make sure all pending interrupts are processed before returning), and
our previous job_write(pfdev, JOB_INT_MASK, 0) guarantees that no new
interrupts will kick in after that point.



Unless we synchronize_irq() *before* masking all interrupts (which would be
wrong, as some interrupt could still fire after execution of the ISR), we get
*either of* two scenarios:

 - COMP_BIT_JOB is not set, softirq thread unmasks some interrupts by
   writing to JOB_INT_MASK; or
 - COMP_BIT_JOB is set, hardirq handler returns IRQ_NONE, the threaded
   interrupt handler doesn't get executed, jobs are not canceled.

So if we don't forcibly call the threaded handler if RAWSTAT != 0 in there,
and if the extra check is present in the hardirq handler, and if the hardirq
handler wasn't executed already before our synchronize_irq() call (so: if the
hardirq execution has to be done to synchronize irqs), we are not guaranteeing
that jobs cancellation/dequeuing/removal/whatever-handling is done before
entering suspend.

That, unless the suggestion was to call panfrost_job_handle_irqs() instead of
the handler thread like that (because reading it back, it makes sense to do so).

Cheers!


}

and then while still retaining the check in the IRQ thread handler, also
check it in the hardirq handler like

static irqreturn_t panfrost_job_irq_handler(int irq, void *data)
{
struct panfrost_device *pfdev = data;
u32 status;

if (test_bit(PANFROST_COMP_BIT_JOB, pfdev->is_suspending))
return IRQ_NONE;


Yes, that's the extra check I was talking about, and that's also the
very reason I'm suggesting to call this field suspended_irqs instead of
is_suspending. Ultimately, each bit in this bitmap encodes the status
of a specific IRQ, not the transition from active-to-suspended,
otherwise we'd be clearing the bit at the end of
panfrost_job_suspend_irq(), right after the synchronize_irq(). But if
we were doing that, our hard IRQ handler could be called because other
devices raised an interrupt on the very same IRQ line while we are
suspended, and we'd be doing an invalid GPU reg read while the
clks/power-domains are off.



status = job_read(pfdev, JOB_INT_STAT);
if (!status)
return IRQ_NONE;

job_write(pfdev, JOB_INT_MASK, 0);
return IRQ_WAKE_THREAD;
}

(rinse and repeat for panfrost_mmu)

..or am I misunderstanding you?

Cheers,
Angelo
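
The ordering discussed in this thread can be sketched in plain userspace C. This is only a model under stated assumptions, not the panfrost code: `struct fake_dev`, `job_hardirq()` and `job_suspend_irq()` are hypothetical stand-ins, the registers are ordinary struct fields, and the `synchronize_irq()` step is reduced to a comment. It shows the one property the thread cares about: once the per-IRQ bit is set, the hard IRQ handler returns IRQ_NONE without touching any register, so a shared IRQ line firing while the GPU is powered off stays harmless.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit numbers, mirroring the enum proposed in the patch. */
enum { COMP_BIT_MMU, COMP_BIT_JOB };

enum irq_ret { IRQ_NONE, IRQ_WAKE_THREAD };

/* Hypothetical model of the device: registers are plain fields. */
struct fake_dev {
	unsigned long is_suspended; /* one bit per component IRQ */
	uint32_t job_int_mask;      /* models JOB_INT_MASK */
	uint32_t job_int_rawstat;   /* models JOB_INT_RAWSTAT */
	bool powered;               /* register reads are only valid when true */
	unsigned int reg_reads;     /* counts modelled register reads */
};

/* Models a read of JOB_INT_STAT, which is RAWSTAT filtered by the mask. */
static uint32_t job_int_stat(struct fake_dev *d)
{
	assert(d->powered); /* reading a powered-off GPU would be a bug */
	d->reg_reads++;
	return d->job_int_rawstat & d->job_int_mask;
}

/* Hard IRQ handler: bail out before any register access when suspended. */
static enum irq_ret job_hardirq(struct fake_dev *d)
{
	if (d->is_suspended & (1UL << COMP_BIT_JOB))
		return IRQ_NONE;

	if (!job_int_stat(d))
		return IRQ_NONE;

	d->job_int_mask = 0; /* mask until the threaded handler has run */
	return IRQ_WAKE_THREAD;
}

/* Suspend path: set the bit first, then mask the interrupts. */
static void job_suspend_irq(struct fake_dev *d)
{
	d->is_suspended |= 1UL << COMP_BIT_JOB;
	d->job_int_mask = 0;
	/* The real driver would call synchronize_irq() at this point. */
}
```

The bit is deliberately left set after `job_suspend_irq()` returns; clearing it belongs to the resume path, which is why the thread settles on a name describing the suspended state rather than the transition.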








Re: [Intel-xe] [PATCH v5] Documentation/gpu: VM_BIND locking document

2023-11-28 Thread Rodrigo Vivi
On Tue, Nov 28, 2023 at 04:51:25PM +0100, Thomas Hellström wrote:
> On Mon, 2023-11-27 at 14:36 -0500, Rodrigo Vivi wrote:
> > On Tue, Nov 21, 2023 at 11:40:46AM +0100, Thomas Hellström wrote:
> > > Add the first version of the VM_BIND locking document which is
> > > intended to be part of the xe driver upstreaming agreement.
> > > 
> > > > The document describes and discusses the locking used during exec-
> > > > functions, eviction and for userptr gpu-vmas. The intention is to use
> > > > the same nomenclature as drm-vm-bind-async.rst.
> > > 
> > > v2:
> > > - s/gvm/gpu_vm/g (Rodrigo Vivi)
> > > - Clarify the userptr seqlock with a pointer to mm/mmu_notifier.c
> > >   (Rodrigo Vivi)
> > > - Adjust commit message accordingly.
> > > - Add SPDX license header.
> > > 
> > > v3:
> > > - Large update to align with the drm_gpuvm manager locking
> > > - Add "Efficient userptr gpu_vma exec function iteration" section
> > > - Add "Locking at bind- and unbind time" section.
> > > 
> > > v4:
> > > - Fix tabs vs space errors by untabifying (Rodrigo Vivi)
> > > - Minor style fixes and typos (Rodrigo Vivi)
> > > - Clarify situations where stale GPU mappings are occurring and how
> > >   access through these mappings are blocked. (Rodrigo Vivi)
> > > - Insert into the toctree in implementation_guidelines.rst
> > > 
> > > v5:
> > > - Add a section about recoverable page-faults.
> > > - Use local references to other documentation where possible
> > >   (Bagas Sanjaya)
> > > - General documentation fixes and typos (Danilo Krummrich and
> > >   Boris Brezillon)
> > > > - Improve the documentation around locks that need to be grabbed
> > > >   from the dma-fence critical section (Boris Brezillon)
> > > > - Add more references to the DRM GPUVM helpers (Danilo Krummrich
> > > >   and Boris Brezillon)
> > > - Update the rfc/xe.rst document.
> > > 
> > > Cc: Rodrigo Vivi 
> > > Signed-off-by: Thomas Hellström 
> > 
> > > First of all, with Bagas' and Boris' latest suggestions, feel
> > > free to use:
> > 
> > Reviewed-by: Rodrigo Vivi 
> > 
> > > But a few minor comments below, mostly trying to address Boris'
> > > feeling of long sentences. However, take them with a grain of salt
> > > since I'm not a native English speaker. :)
> 
> Hi, Rodrigo.
> 
> Thanks for reviewing. I've added most but not all of the
> suggestions in v6. Regarding the comment about "zapping", that's used
> by the core mm for the process of unmapping page-table entries;
> zap_vma_ptes() etc. Merely following that, although I'm not really
> against using unmapping etc.

Perfect then. No concerns from my side.

Thanks,
Rodrigo.

> 
> /Thomas
> 


Re: [PATCH] drm/bridge: Fix typo in post_disable() description

2023-11-28 Thread Robert Foss
On Fri, 24 Nov 2023 10:42:30 +0100, Dario Binacchi wrote:
> s/singals/signals/
> 
> 

Applied, thanks!

[1/1] drm/bridge: Fix typo in post_disable() description
  https://cgit.freedesktop.org/drm/drm-misc/commit/?id=288b039db225



Rob



Re: [PATCH v2 3/3] drm/panfrost: Synchronize and disable interrupts before powering off

2023-11-28 Thread Boris Brezillon
On Tue, 28 Nov 2023 16:42:25 +0100
AngeloGioacchino Del Regno 
wrote:

> >>  
>   panfrost_device_reset(pfdev);
>   panfrost_devfreq_resume(pfdev);
> 
>  @@ -421,6 +422,9 @@ static int panfrost_device_runtime_suspend(struct 
>  device *dev)
>   return -EBUSY;
> 
>   panfrost_devfreq_suspend(pfdev);
>  +panfrost_job_suspend_irq(pfdev);
>  +panfrost_mmu_suspend_irq(pfdev);
>  +panfrost_gpu_suspend_irq(pfdev);
>   panfrost_gpu_power_off(pfdev);
> 
>   return 0;
>  diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h 
>  b/drivers/gpu/drm/panfrost/panfrost_device.h
>  index 54a8aad54259..29f89f2d3679 100644
>  --- a/drivers/gpu/drm/panfrost/panfrost_device.h
>  +++ b/drivers/gpu/drm/panfrost/panfrost_device.h
>  @@ -25,6 +25,12 @@ struct panfrost_perfcnt;
> #define NUM_JOB_SLOTS 3
> #define MAX_PM_DOMAINS 5
> 
>  +enum panfrost_drv_comp_bits {
>  +PANFROST_COMP_BIT_MMU,
>  +PANFROST_COMP_BIT_JOB,
>  +PANFROST_COMP_BIT_MAX
>  +};
>  +
> /**
>  * enum panfrost_gpu_pm - Supported kernel power management features
>  * @GPU_PM_CLK_DIS:  Allow disabling clocks during system suspend
>  @@ -109,6 +115,7 @@ struct panfrost_device {
> 
>   struct panfrost_features features;
>   const struct panfrost_compatible *comp;
>  +DECLARE_BITMAP(is_suspending, PANFROST_COMP_BIT_MAX);  
> >>>
> >>> nit: Maybe s/is_suspending/suspended_irqs/, given the state remains
> >>> until the device is resumed.  
> >>
> >> If we keep the `is_suspending` name, we can use this one more generically 
> >> in
> >> case we ever need to, what do you think?  
> > 
> > I'm lost. Why would we want to reserve a name for something we don't
> > know about? My comment was mostly relating to the fact this bitmap
> > doesn't reflect the is_suspending state, but rather is_suspended,
> > because it remains set until the device is resumed. And we actually want
> > it to reflect the is_suspended state, so we can catch interrupts that
> > are not for us without reading regs in the hard irq handler, when the
> > GPU is suspended.  
> 
> `is_suspended` (fun story: that's the first name I gave it) looks good to me,
> the doubt I raised was about calling it `suspended_irqs` instead, as I would
> prefer to keep names "more generic", but that's just personal preference at
> this point anyway.

Ah, sure, is_suspended is fine.



Re: [PATCH v2] drm/tests: Add KUnit tests for drm_mode_create_dvi_i_properties()

2023-11-28 Thread Dipam Turkar
Will work on that.

Dipam Turkar

On Tue, Nov 28, 2023 at 8:39 PM Maxime Ripard  wrote:

> Hi,
>
> On Sat, Nov 11, 2023 at 12:54:53AM +0530, Dipam Turkar wrote:
> > Introduce unit tests for the drm_mode_create_dvi_i_properties() function
> to ensure
> > the proper creation of DVI-I specific connector properties.
> >
> > Signed-off-by: Dipam Turkar 
> > ---
> >  drivers/gpu/drm/tests/drm_connector_test.c | 38 ++
> >  1 file changed, 38 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/tests/drm_connector_test.c
> b/drivers/gpu/drm/tests/drm_connector_test.c
> > index c66aa2dc8d9d..9ac1fd32c579 100644
> > --- a/drivers/gpu/drm/tests/drm_connector_test.c
> > +++ b/drivers/gpu/drm/tests/drm_connector_test.c
> > @@ -4,6 +4,9 @@
> >   */
> >
> >  #include 
> > +#include 
> > +#include 
> > +#include 
> >
> >  #include 
> >
> > @@ -58,6 +61,30 @@ static void
> drm_test_get_tv_mode_from_name_truncated(struct kunit *test)
> >   KUNIT_EXPECT_LT(test, ret, 0);
> >  };
> >
> > +/*
> > + * Test that drm_mode_create_dvi_i_properties() succeeds and
> > + * DVI-I subconnector and select subconnector properties have
> > + * been created.
> > + */
> > +static void drm_test_mode_create_dvi_i_properties(struct kunit *test)
> > +{
> > + struct drm_device *drm;
> > + struct device *dev;
> > +
> > + dev = drm_kunit_helper_alloc_device(test);
> > + KUNIT_ASSERT_NOT_ERR_OR_NULL(test, dev);
> > +
> > + drm = __drm_kunit_helper_alloc_drm_device(test, dev, sizeof(*drm),
> 0, DRIVER_MODESET);
> > + KUNIT_ASSERT_NOT_ERR_OR_NULL(test, drm);
> > +
> > + KUNIT_EXPECT_EQ(test, drm_mode_create_dvi_i_properties(drm), 0);
> > + KUNIT_EXPECT_NOT_ERR_OR_NULL(test,
> drm->mode_config.dvi_i_select_subconnector_property);
> > + KUNIT_EXPECT_NOT_ERR_OR_NULL(test,
> drm->mode_config.dvi_i_subconnector_property);
> > +
> > + // Expect the function to return 0 if called twice.
>
> This is not the proper comment format
>
> > + KUNIT_EXPECT_EQ(test, drm_mode_create_dvi_i_properties(drm), 0);
>
> This should be in a separate test, with a separate description. We want
> to test two things: that the function works well, and that the function
> still works if we call it a second time.
>
> > +}
> > +
> >  static struct kunit_case drm_get_tv_mode_from_name_tests[] = {
> >   KUNIT_CASE_PARAM(drm_test_get_tv_mode_from_name_valid,
> >drm_get_tv_mode_from_name_valid_gen_params),
> > @@ -70,7 +97,18 @@ static struct kunit_suite
> drm_get_tv_mode_from_name_test_suite = {
> >   .test_cases = drm_get_tv_mode_from_name_tests,
> >  };
>
> The test should be next to the test suite definition
>
> > +static struct kunit_case drm_connector_tests[] = {
> > + KUNIT_CASE(drm_test_mode_create_dvi_i_properties),
> > + { }
> > +};
> > +
> > +static struct kunit_suite drm_connector_test_suite = {
> > + .name = "drm_connector",
>
> That's too generic, the test suite is only about
> drm_mode_create_dvi_i_properties(), not drm_connector in general.
>
> > + .test_cases = drm_connector_tests,
> > +};
> > +
> >  kunit_test_suite(drm_get_tv_mode_from_name_test_suite);
> > +kunit_test_suite(drm_connector_test_suite);
>
> kunit_test_suites
>
> Maxime
>


[RFC PATCH 3/6] mm/gmem: add GMEM (Generalized Memory Management) interface for external accelerators

2023-11-28 Thread Weixi Zhu
Accelerator driver developers are forced to reinvent external MM subsystems
case by case, introducing redundant code (14K~70K for each case). This is
because Linux core MM only considers host memory resources. At the same
time, application-level developers suffer from poor programmability -- they
must consider parallel address spaces and be careful about the limited
device DRAM capacity.

This patch adds GMEM interface to help accelerator drivers directly reuse
Linux core MM, preventing them from reinventing the wheel. Drivers which
utilize GMEM interface can directly support unified virtual address spaces
for application users -- memory allocated with malloc()/mmap() can be
directly used by both the CPU and accelerators, providing a coherent view of
memory.

The GMEM device interface prefixed with "gm_dev" is used to decouple
accelerator-specific operations. Device drivers should invoke
gm_dev_create() to register a device instance at the device boot time. A
device-specific implementation of "struct gm_mmu" must be provided, so
Linux can invoke hardware-related functions at the right time. If the
driver wants Linux to take charge of the local DRAM of the accelerator,
then it should register a range of physical addresses to be managed by
gm_dev_register_physmem().

The GMEM address space interface prefixed with "gm_as" is used to connect a
device context with a CPU context, i.e. an mm_struct. Struct gm_as is
created as a unified address space that not only includes a CPU context,
but may also include one or more device contexts. Device drivers should
use gm_as_attach() to add a device context to a created struct gm_as.
gm_dev_fault() can then serve as a generic device page fault handler. It is
important that a device driver invokes gm_as_attach() at the beginning of a
CPU program. This invocation can happen inside an ioctl() call when a device
context is initialized.

Signed-off-by: Weixi Zhu 
---
 include/linux/gmem.h | 196 +++
 include/linux/mm_types.h |   1 +
 mm/gmem.c| 408 +++
 3 files changed, 605 insertions(+)

diff --git a/include/linux/gmem.h b/include/linux/gmem.h
index 529ff6755a99..f424225daa03 100644
--- a/include/linux/gmem.h
+++ b/include/linux/gmem.h
@@ -13,6 +13,35 @@
 
 #ifdef CONFIG_GMEM
 
+#define GMEM_MMAP_RETRY_TIMES 10 /* gmem retry times before OOM */
+
+DECLARE_STATIC_KEY_FALSE(gmem_status);
+
+static inline bool gmem_is_enabled(void)
+{
+   return static_branch_likely(&gmem_status);
+}
+
+struct gm_dev {
+   int id;
+
+   /*
+* TODO: define more device capabilities and consider different device
+* base page sizes
+*/
+   unsigned long capability;
+   struct gm_mmu *mmu;
+   void *dev_data;
+   /* A device may support time-sliced context switch. */
+   struct gm_context *current_ctx;
+
+   struct list_head gm_ctx_list;
+
+   /* Add tracking of registered device local physical memory. */
+   nodemask_t registered_hnodes;
+   struct device *dma_dev;
+};
+
 #define GM_PAGE_CPU    0x10 /* Determines whether page is a pointer or a pfn number. */
 #define GM_PAGE_DEVICE 0x20
 #define GM_PAGE_NOMAP  0x40
@@ -96,7 +125,161 @@ void unmap_gm_mappings_range(struct vm_area_struct *vma, 
unsigned long start,
 unsigned long end);
 void munmap_in_peer_devices(struct mm_struct *mm, unsigned long start,
unsigned long end);
+
+/* core gmem */
+enum gm_ret {
+   GM_RET_SUCCESS = 0,
+   GM_RET_NOMEM,
+   GM_RET_PAGE_EXIST,
+   GM_RET_MIGRATING,
+   GM_RET_FAILURE_UNKNOWN,
+};
+
+/**
+ * enum gm_mmu_mode - defines the method to share a physical page table.
+ *
+ * @GM_MMU_MODE_SHARE: Share a physical page table with another attached
+ * device's MMU, requiring one of the attached MMUs to be compatible. For
+ * example, the IOMMU is compatible with the CPU MMU on most modern machines.
+ * This mode requires the device physical memory to be cache-coherent.
+ * TODO: add MMU cookie to detect compatible MMUs.
+ *
+ * @GM_MMU_MODE_COHERENT_EXCLUSIVE: Maintain a coherent page table that holds
+ * exclusive mapping entries, so that device memory accesses can trigger
+ * fault-driven migration for automatic data locality optimizations.
+ * This mode does not require a cache-coherent link between the CPU and device.
+ *
+ * @GM_MMU_MODE_REPLICATE: Maintain a coherent page table that replicates
+ * physical mapping entries whenever a physical mapping is installed inside the
+ * address space, so that it may minimize the page faults to be triggered by
+ * this device.
+ * This mode requires the device physical memory to be cache-coherent.
+ */
+enum gm_mmu_mode {
+   GM_MMU_MODE_SHARE,
+   GM_MMU_MODE_COHERENT_EXCLUSIVE,
+   GM_MMU_MODE_REPLICATE,
+};
+
+enum gm_fault_hint {
+   GM_FAULT_HINT_MARK_HOT,
+   /*
+* TODO: introduce other fault hints, e.g. read-only 

[RFC PATCH 1/6] mm/gmem: add heterogeneous NUMA node

2023-11-28 Thread Weixi Zhu
This patch adds a new NUMA node state, named N_HETEROGENEOUS. It is
used to identify heterogeneous NUMA (hNUMA) nodes. Note that an hNUMA node
may not be directly accessible by the CPU.

Each hNUMA node can be identified with a NUMA id. This can be extended to
provide NUMA topology including device local DRAM, where a cache-coherent
bus does not need to exist between the CPU and device local DRAM.
Furthermore, this allows an application user to issue memory hints that
bind with specific hNUMA nodes.
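
As a rough illustration of what an extra node state means, the userspace sketch below keeps one bitmask per state and counts the nodes marked heterogeneous. It is a simplified model, not the kernel's nodemask implementation: the 64-node limit, the reduced state list and the function bodies are assumptions made for the example, although the names echo the kernel helpers that the patch's num_hnodes() macro builds on.

```c
#include <assert.h>

#define MAX_NODES 64 /* assumption: small fixed limit for the sketch */

/* Reduced state list; N_HETEROGENEOUS is the state added by this patch. */
enum node_states {
	N_POSSIBLE,
	N_ONLINE,
	N_MEMORY,
	N_CPU,
	N_HETEROGENEOUS,
	NR_NODE_STATES
};

/* One bitmask per state; bit n set means node n is in that state. */
static unsigned long long node_states[NR_NODE_STATES];

static void node_set_state(int node, enum node_states state)
{
	assert(node >= 0 && node < MAX_NODES);
	node_states[state] |= 1ULL << node;
}

static int node_state(int node, enum node_states state)
{
	assert(node >= 0 && node < MAX_NODES);
	return (int)((node_states[state] >> node) & 1ULL);
}

/* Counts the nodes in a given state, like num_hnodes() for hNUMA nodes. */
static int num_node_state(enum node_states state)
{
	int count = 0;

	for (int node = 0; node < MAX_NODES; node++)
		count += node_state(node, state);
	return count;
}
```

A node holding only device-local DRAM would be marked N_HETEROGENEOUS without N_CPU, which matches the note that such a node may not be directly accessible by the CPU.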

Signed-off-by: Weixi Zhu 
---
 drivers/base/node.c  |  6 
 include/linux/gmem.h | 19 ++
 include/linux/nodemask.h | 10 ++
 init/main.c  |  2 ++
 mm/Kconfig   | 14 
 mm/Makefile  |  1 +
 mm/gmem.c| 78 
 mm/page_alloc.c  |  3 ++
 8 files changed, 133 insertions(+)
 create mode 100644 include/linux/gmem.h
 create mode 100644 mm/gmem.c

diff --git a/drivers/base/node.c b/drivers/base/node.c
index 493d533f8375..aa4d2ca266aa 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -928,6 +928,9 @@ static struct node_attr node_state_attr[] = {
[N_CPU] = _NODE_ATTR(has_cpu, N_CPU),
[N_GENERIC_INITIATOR] = _NODE_ATTR(has_generic_initiator,
   N_GENERIC_INITIATOR),
+#ifdef CONFIG_GMEM
+   [N_HETEROGENEOUS] = _NODE_ATTR(has_hetero_memory, N_HETEROGENEOUS),
+#endif
 };
 
 static struct attribute *node_state_attrs[] = {
@@ -940,6 +943,9 @@ static struct attribute *node_state_attrs[] = {
&node_state_attr[N_MEMORY].attr.attr,
&node_state_attr[N_CPU].attr.attr,
&node_state_attr[N_GENERIC_INITIATOR].attr.attr,
+#ifdef CONFIG_GMEM
+   &node_state_attr[N_HETEROGENEOUS].attr.attr,
+#endif
NULL
 };
 
diff --git a/include/linux/gmem.h b/include/linux/gmem.h
new file mode 100644
index ..fff877873557
--- /dev/null
+++ b/include/linux/gmem.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Generalized Memory Management.
+ *
+ * Copyright (C) 2023- Huawei, Inc.
+ * Author: Weixi Zhu
+ *
+ */
+#ifndef _GMEM_H
+#define _GMEM_H
+
+#ifdef CONFIG_GMEM
+/* h-NUMA topology */
+void __init hnuma_init(void);
+#else
+static inline void hnuma_init(void) {}
+#endif
+
+#endif /* _GMEM_H */
diff --git a/include/linux/nodemask.h b/include/linux/nodemask.h
index 8d07116caaf1..66e4640a52ba 100644
--- a/include/linux/nodemask.h
+++ b/include/linux/nodemask.h
@@ -407,6 +407,9 @@ enum node_states {
N_MEMORY,   /* The node has memory(regular, high, movable) 
*/
N_CPU,  /* The node has one or more cpus */
N_GENERIC_INITIATOR,/* The node has one or more Generic Initiators 
*/
+#ifdef CONFIG_GMEM
+   N_HETEROGENEOUS,/* The node has heterogeneous memory */
+#endif
NR_NODE_STATES
 };
 
@@ -536,6 +539,13 @@ static inline int node_random(const nodemask_t *maskp)
 #define for_each_node(node)   for_each_node_state(node, N_POSSIBLE)
 #define for_each_online_node(node) for_each_node_state(node, N_ONLINE)
 
+#ifdef CONFIG_GMEM
+/* For h-NUMA topology */
+#define hnode_map  node_states[N_HETEROGENEOUS]
+#define num_hnodes()   num_node_state(N_HETEROGENEOUS)
+#define for_each_hnode(node)   for_each_node_state(node, N_HETEROGENEOUS)
+#endif
+
 /*
  * For nodemask scratch area.
  * NODEMASK_ALLOC(type, name) allocates an object with a specified type and
diff --git a/init/main.c b/init/main.c
index e24b0780fdff..12dfb5b63d51 100644
--- a/init/main.c
+++ b/init/main.c
@@ -100,6 +100,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -901,6 +902,7 @@ void start_kernel(void)
setup_per_cpu_areas();
smp_prepare_boot_cpu(); /* arch-specific boot-cpu hooks */
boot_cpu_hotplug_init();
+   hnuma_init();
 
pr_notice("Kernel command line: %s\n", saved_command_line);
/* parameters may set static keys */
diff --git a/mm/Kconfig b/mm/Kconfig
index 89971a894b60..1a7d8194513c 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -1270,6 +1270,20 @@ config LOCK_MM_AND_FIND_VMA
bool
depends on !STACK_GROWSUP
 
+config GMEM
+   bool "generalized memory management for external memory devices"
+   depends on (ARM64 || X86_64) && MMU && TRANSPARENT_HUGEPAGE
+   select ARCH_USES_HIGH_VMA_FLAGS
+   default y
+   help
+ Supporting GMEM (generalized memory management) for external memory
+ devices
+
+ GMEM extends Linux MM to share its machine-independent MM code. Only
+ high-level interface is provided for device drivers. This prevents
+ accelerator drivers from reinventing the wheel, but relies on drivers 
to
+ implement their hardware-dependent functions declared by GMEM.
+
 source "mm/damon/Kconfig"
 
 endmenu
diff --git a/mm/Makefile b/mm/Makefile
index 33873c8aedb3..f48ea2eb4a44 100644
--- a/mm/Makefile
+++ 

[RFC PATCH 5/6] mm/gmem: resolve VMA conflicts for attached peer devices

2023-11-28 Thread Weixi Zhu
This patch resolves potential VMA conflicts when
mmap(MAP_PRIVATE | MAP_PEER_SHARED) is invoked. Note that the semantics of
mmap(MAP_PRIVATE | MAP_PEER_SHARED) is to provide a coherent view of memory
through the allocated virtual addresses between the CPU and all attached
devices. However, an attached device may create its own computing context
that does not necessarily share the same address space layout with the CPU
process. Therefore, the mmap() syscall must return virtual addresses that
are guaranteed to be valid across all attached peer devices.

In the current implementation, if a candidate VMA is detected to be
conflicting, it is temporarily blacklisted. The mmap_region() function
then retries with other VMA candidates for a predefined number of
iterations.
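
The bounded retry described above can be sketched in userspace C. This is a deliberately simplified model and not the patch's algorithm: instead of blacklisting conflicting VMAs on a list, it moves the candidate address past the conflicting range, and `struct va_range` and `pick_peer_shared_va()` are hypothetical names. Only the retry bound, GMEM_MMAP_RETRY_TIMES, is taken from the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define GMEM_MMAP_RETRY_TIMES 10 /* retry bound taken from the series */

/* Hypothetical: a VA range [start, end) already in use on a peer device. */
struct va_range {
	unsigned long start;
	unsigned long end;
};

static bool ranges_overlap(unsigned long start, unsigned long end,
			   const struct va_range *r)
{
	return start < r->end && r->start < end;
}

/*
 * Returns an address that is free on all peers, or 0 if none was found
 * within the retry budget. Each conflict consumes one attempt.
 */
static unsigned long pick_peer_shared_va(unsigned long hint, unsigned long len,
					 const struct va_range *busy,
					 size_t nbusy)
{
	unsigned long addr = hint;

	for (int attempt = 0; attempt < GMEM_MMAP_RETRY_TIMES; attempt++) {
		bool conflict = false;

		for (size_t i = 0; i < nbusy; i++) {
			if (ranges_overlap(addr, addr + len, &busy[i])) {
				conflict = true;
				addr = busy[i].end; /* try just past the conflict */
				break;
			}
		}
		if (!conflict)
			return addr;
	}
	return 0;
}
```

Bounding the loop keeps mmap() from spinning indefinitely when a device address space is heavily fragmented; the caller has to treat a zero return as an allocation failure.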

Signed-off-by: Weixi Zhu 
---
 fs/proc/task_mmu.c |  3 ++
 include/linux/gmem.h   | 26 +++-
 include/linux/mm.h |  8 +
 include/uapi/asm-generic/mman-common.h |  1 +
 kernel/fork.c  |  4 +++
 mm/gmem.c  | 38 
 mm/mempolicy.c |  4 +++
 mm/mmap.c  | 38 ++--
 mm/vm_object.c | 41 ++
 9 files changed, 159 insertions(+), 4 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index ef2eb12906da..5af03d8f0319 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -701,6 +701,9 @@ static void show_smap_vma_flags(struct seq_file *m, struct 
vm_area_struct *vma)
 #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */
 #ifdef CONFIG_X86_USER_SHADOW_STACK
[ilog2(VM_SHADOW_STACK)] = "ss",
+#endif
+#ifdef CONFIG_GMEM
+   [ilog2(VM_PEER_SHARED)] = "ps",
 #endif
};
size_t i;
diff --git a/include/linux/gmem.h b/include/linux/gmem.h
index 97186f29638d..82d88df5ce44 100644
--- a/include/linux/gmem.h
+++ b/include/linux/gmem.h
@@ -24,7 +24,10 @@ static inline bool gmem_is_enabled(void)
 
 static inline bool vma_is_peer_shared(struct vm_area_struct *vma)
 {
-   return false;
+   if (!gmem_is_enabled())
+   return false;
+
+   return !!(vma->vm_flags & VM_PEER_SHARED);
 }
 
 struct gm_dev {
@@ -130,6 +133,8 @@ void unmap_gm_mappings_range(struct vm_area_struct *vma, 
unsigned long start,
 unsigned long end);
 void munmap_in_peer_devices(struct mm_struct *mm, unsigned long start,
unsigned long end);
+void gm_reserve_vma(struct vm_area_struct *value, struct list_head *head);
+void gm_release_vma(struct mm_struct *mm, struct list_head *head);
 
 /* core gmem */
 enum gm_ret {
@@ -283,6 +288,10 @@ int gm_as_create(unsigned long begin, unsigned long end, 
struct gm_as **new_as);
 int gm_as_destroy(struct gm_as *as);
 int gm_as_attach(struct gm_as *as, struct gm_dev *dev, enum gm_mmu_mode mode,
 bool activate, struct gm_context **out_ctx);
+
+int gm_alloc_va_in_peer_devices(struct mm_struct *mm,
+   struct vm_area_struct *vma, unsigned long addr,
+   unsigned long len, vm_flags_t vm_flags);
 #else
 static inline bool gmem_is_enabled(void) { return false; }
 static inline bool vma_is_peer_shared(struct vm_area_struct *vma)
@@ -339,6 +348,21 @@ int gm_as_attach(struct gm_as *as, struct gm_dev *dev, 
enum gm_mmu_mode mode,
 {
return 0;
 }
+static inline void gm_reserve_vma(struct vm_area_struct *value,
+ struct list_head *head)
+{
+}
+static inline void gm_release_vma(struct mm_struct *mm, struct list_head *head)
+{
+}
+static inline int gm_alloc_va_in_peer_devices(struct mm_struct *mm,
+ struct vm_area_struct *vma,
+ unsigned long addr,
+ unsigned long len,
+ vm_flags_t vm_flags)
+{
+   return 0;
+}
 #endif
 
 #endif /* _GMEM_H */
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 418d26608ece..8837624e4c66 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -320,14 +320,22 @@ extern unsigned int kobjsize(const void *objp);
 #define VM_HIGH_ARCH_BIT_3 35  /* bit only usable on 64-bit 
architectures */
 #define VM_HIGH_ARCH_BIT_4 36  /* bit only usable on 64-bit 
architectures */
 #define VM_HIGH_ARCH_BIT_5 37  /* bit only usable on 64-bit 
architectures */
+#define VM_HIGH_ARCH_BIT_6 38  /* bit only usable on 64-bit 
architectures */
 #define VM_HIGH_ARCH_0 BIT(VM_HIGH_ARCH_BIT_0)
 #define VM_HIGH_ARCH_1 BIT(VM_HIGH_ARCH_BIT_1)
 #define VM_HIGH_ARCH_2 BIT(VM_HIGH_ARCH_BIT_2)
 #define VM_HIGH_ARCH_3 BIT(VM_HIGH_ARCH_BIT_3)
 #define VM_HIGH_ARCH_4 BIT(VM_HIGH_ARCH_BIT_4)
 #define VM_HIGH_ARCH_5 BIT(VM_HIGH_ARCH_BIT_5)
+#define VM_HIGH_ARCH_6 

[RFC PATCH 2/6] mm/gmem: add arch-independent abstraction to track address mapping status

2023-11-28 Thread Weixi Zhu
This patch adds an abstraction layer, struct vm_object, that maintains
per-process virtual-to-physical mapping status stored in struct gm_mapping.
For example, a virtual page may be mapped to a CPU physical page or to a
device physical page. Struct vm_object effectively maintains an
arch-independent page table, referred to here as a "logical page table",
whereas the arch-dependent page table used by a real MMU is referred to as
a "physical page table". The logical page table is useful if Linux core MM
is extended to handle a unified virtual address space with external
accelerators using customized MMUs.
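
To make the "CPU page or device page" distinction concrete, here is a userspace sketch of the flag handling from this patch's gmem.h. The flag values and the body of gm_mapping_flags_set() follow the diff below; the struct is trimmed to its flag word for the example, so the union and the mutex are omitted. The point it illustrates is that the type bits (CPU, DEVICE, NOMAP) are mutually exclusive, while GM_PAGE_WILLNEED is an independent hint that survives a type change.

```c
#include <assert.h>
#include <stdbool.h>

#define GM_PAGE_CPU       0x10
#define GM_PAGE_DEVICE    0x20
#define GM_PAGE_NOMAP     0x40
#define GM_PAGE_WILLNEED  0x80

#define GM_PAGE_TYPE_MASK (GM_PAGE_CPU | GM_PAGE_DEVICE | GM_PAGE_NOMAP)

/* Trimmed for the sketch: only the flag word of struct gm_mapping. */
struct gm_mapping {
	unsigned int flag;
};

/* Setting any type bit first clears the previous type bits. */
static void gm_mapping_flags_set(struct gm_mapping *gm_mapping, int flags)
{
	if (flags & GM_PAGE_TYPE_MASK)
		gm_mapping->flag &= ~GM_PAGE_TYPE_MASK;

	gm_mapping->flag |= flags;
}

static bool gm_mapping_cpu(struct gm_mapping *gm_mapping)
{
	return !!(gm_mapping->flag & GM_PAGE_CPU);
}

static bool gm_mapping_device(struct gm_mapping *gm_mapping)
{
	return !!(gm_mapping->flag & GM_PAGE_DEVICE);
}

static bool gm_mapping_nomap(struct gm_mapping *gm_mapping)
{
	return !!(gm_mapping->flag & GM_PAGE_NOMAP);
}

static bool gm_mapping_willneed(struct gm_mapping *gm_mapping)
{
	return !!(gm_mapping->flag & GM_PAGE_WILLNEED);
}
```

For instance, a mapping that starts as NOMAP with the WILLNEED hint and is later set to DEVICE reports device, not nomap, and still willneed.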

In this patch, struct vm_object utilizes a radix
tree (xarray) to track where a virtual page is mapped to. This adds extra
memory consumption from xarray, but provides a nice abstraction to isolate
mapping status from the machine-dependent layer (PTEs). Besides supporting
accelerators with external MMUs, struct vm_object is planned to be further
unified with i_pages in struct address_space for file-backed memory.

The idea of struct vm_object originates from the FreeBSD VM design, which
provides a unified abstraction for anonymous memory, file-backed memory,
the page cache, etc. [1].

Currently, Linux utilizes a set of hierarchical page walk functions to
abstract page table manipulations of different CPU architectures. The
problem arises when a device wants to reuse Linux MM code to manage its
page table -- the device page table may not be accessible to the CPU.
Existing solutions like Linux HMM utilize the MMU notifier mechanisms to
invoke device-specific MMU functions, but rely on encoding the mapping
status in the CPU page table entries. This entangles machine-independent
code with machine-dependent code, and also brings unnecessary restrictions.
The PTE size and format vary from arch to arch, which harms extensibility.

[1] https://docs.freebsd.org/en/articles/vm-design/

Signed-off-by: Weixi Zhu 
---
 include/linux/gmem.h | 120 +
 include/linux/mm_types.h |   4 +
 mm/Makefile  |   2 +-
 mm/vm_object.c   | 184 +++
 4 files changed, 309 insertions(+), 1 deletion(-)
 create mode 100644 mm/vm_object.c

diff --git a/include/linux/gmem.h b/include/linux/gmem.h
index fff877873557..529ff6755a99 100644
--- a/include/linux/gmem.h
+++ b/include/linux/gmem.h
@@ -9,11 +9,131 @@
 #ifndef _GMEM_H
 #define _GMEM_H
 
+#include 
+
 #ifdef CONFIG_GMEM
+
+#define GM_PAGE_CPU    0x10 /* Determines whether page is a pointer or a pfn number. */
+#define GM_PAGE_DEVICE 0x20
+#define GM_PAGE_NOMAP  0x40
+#define GM_PAGE_WILLNEED   0x80
+
+#define GM_PAGE_TYPE_MASK  (GM_PAGE_CPU | GM_PAGE_DEVICE | GM_PAGE_NOMAP)
+
+struct gm_mapping {
+   unsigned int flag;
+
+   union {
+   struct page *page;  /* CPU node */
+   struct gm_dev *dev; /* hetero-node. TODO: support multiple 
devices */
+   unsigned long pfn;
+   };
+
+   struct mutex lock;
+};
+
+static inline void gm_mapping_flags_set(struct gm_mapping *gm_mapping, int 
flags)
+{
+   if (flags & GM_PAGE_TYPE_MASK)
+   gm_mapping->flag &= ~GM_PAGE_TYPE_MASK;
+
+   gm_mapping->flag |= flags;
+}
+
+static inline void gm_mapping_flags_clear(struct gm_mapping *gm_mapping, int 
flags)
+{
+   gm_mapping->flag &= ~flags;
+}
+
+static inline bool gm_mapping_cpu(struct gm_mapping *gm_mapping)
+{
+   return !!(gm_mapping->flag & GM_PAGE_CPU);
+}
+
+static inline bool gm_mapping_device(struct gm_mapping *gm_mapping)
+{
+   return !!(gm_mapping->flag & GM_PAGE_DEVICE);
+}
+
+static inline bool gm_mapping_nomap(struct gm_mapping *gm_mapping)
+{
+   return !!(gm_mapping->flag & GM_PAGE_NOMAP);
+}
+
+static inline bool gm_mapping_willneed(struct gm_mapping *gm_mapping)
+{
+   return !!(gm_mapping->flag & GM_PAGE_WILLNEED);
+}
+
 /* h-NUMA topology */
 void __init hnuma_init(void);
+
+/* vm object */
+/*
+ * Each per-process vm_object tracks the mapping status of virtual pages from
+ * all VMAs mmap()-ed with MAP_PRIVATE | MAP_PEER_SHARED.
+ */
+struct vm_object {
+   spinlock_t lock;
+
+   /*
+* The logical_page_table is a container that holds the mapping
+* information between a VA and a struct page.
+*/
+   struct xarray *logical_page_table;
+   atomic_t nr_pages;
+};
+
+int __init vm_object_init(void);
+struct vm_object *vm_object_create(struct mm_struct *mm);
+void vm_object_drop_locked(struct mm_struct *mm);
+
+struct gm_mapping *alloc_gm_mapping(void);
+void free_gm_mappings(struct vm_area_struct *vma);
+struct gm_mapping *vm_object_lookup(struct vm_object *obj, unsigned long va);
+void vm_object_mapping_create(struct vm_object *obj, unsigned long start);
+void unmap_gm_mappings_range(struct vm_area_struct *vma, unsigned long start,
+unsigned long end);
+void munmap_in_peer_devices(struct mm_struct *mm, unsigned long start,
+   

[RFC PATCH 6/6] mm/gmem: extending Linux core MM to support unified virtual address space

2023-11-28 Thread Weixi Zhu
This patch extends Linux core MM to support unified virtual address space.
A unified virtual address space provides a coherent view of memory for the
CPU and devices. This is achieved by maintaining coherent page tables for
the CPU and any attached devices for each process, without assuming that
the underlying interconnect between the CPU and peripheral device is
cache-coherent.

Specifically, for each mm_struct that is attached with one or more device
computing contexts, a per-process logical page table is utilized to track
the mapping status of anonymous memory allocated via mmap(MAP_PRIVATE |
MAP_PEER_SHARED). The CPU page fault handling path is modified to examine
whether a faulted virtual page has already been faulted elsewhere, e.g. on
a device, by looking up the logical page table in vm_object. If so, a page
migration operation should be orchestrated by the core MM to prepare the
CPU physical page, instead of zero-filling. This is achieved by invoking
gm_host_fault_locked(). The logical page table must also be updated once
the CPU page table gets modified.
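
The decision taken on a CPU fault can be sketched as follows. This is a toy model under stated assumptions, not the code in mm/memory.c or mm/huge_memory.c: the logical page table is a small array indexed by virtual page number rather than an xarray, and `cpu_fault()`, `enum page_state` and `enum fault_action` are hypothetical names. It shows the two points made above: a page last touched by a device is migrated rather than zero-filled, and the logical page table is updated whenever the CPU mapping changes.

```c
#include <assert.h>
#include <stddef.h>

#define NR_PAGES 8 /* assumption: tiny address space for the sketch */

/* Where the logical page table says a virtual page currently lives. */
enum page_state { PAGE_NOMAP, PAGE_ON_CPU, PAGE_ON_DEVICE };

/* What the CPU fault path has to do for the faulting page. */
enum fault_action {
	FAULT_ZERO_FILL,           /* first touch anywhere */
	FAULT_MIGRATE_FROM_DEVICE, /* data lives in device memory */
	FAULT_ALREADY_MAPPED       /* nothing to do */
};

/* Hypothetical logical page table: one state per virtual page number. */
struct logical_page_table {
	enum page_state state[NR_PAGES];
};

static enum fault_action cpu_fault(struct logical_page_table *lpt, size_t vpn)
{
	enum fault_action action;

	assert(vpn < NR_PAGES);

	switch (lpt->state[vpn]) {
	case PAGE_ON_DEVICE:
		action = FAULT_MIGRATE_FROM_DEVICE;
		break;
	case PAGE_ON_CPU:
		return FAULT_ALREADY_MAPPED;
	default:
		action = FAULT_ZERO_FILL;
		break;
	}

	/* Keep the logical page table in sync with the CPU page table. */
	lpt->state[vpn] = PAGE_ON_CPU;
	return action;
}
```

In the series itself the migration step is the job of gm_host_fault_locked(); the sketch only models which branch is taken.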

Ideally, the logical page table should always be looked up or modified
first if the CPU page table is changed, but the current implementation does
the reverse. Also, the current implementation only considers anonymous
memory, while a device may want to operate on a disk file directly via
mmap(fd). In the future, the logical page table is planned to play a more
generic role for anonymous memory, folios/huge pages and file-backed memory,
as well as to provide a clean abstraction for CPU page table functions
(including these stage-2 functions). Moreover, the page fault handler path
will be enhanced to deal with cache-coherent buses as well, since it might
be desirable for devices to operate on sparse data remotely instead of
migrating data at page granularity.

Signed-off-by: Weixi Zhu 
---
 kernel/fork.c|  1 +
 mm/huge_memory.c | 85 +++-
 mm/memory.c  | 42 +---
 mm/mmap.c|  2 ++
 mm/oom_kill.c|  2 ++
 mm/vm_object.c   | 84 +++
 6 files changed, 203 insertions(+), 13 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index eab96cdb25a6..06130c73bf2e 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -543,6 +543,7 @@ static void vm_area_free_rcu_cb(struct rcu_head *head)
 
 void vm_area_free(struct vm_area_struct *vma)
 {
+   free_gm_mappings(vma);
 #ifdef CONFIG_PER_VMA_LOCK
call_rcu(&vma->vm_rcu, vm_area_free_rcu_cb);
 #else
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 4f542444a91f..59f63f04 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -37,6 +37,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -684,6 +685,10 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct 
vm_fault *vmf,
pgtable_t pgtable;
unsigned long haddr = vmf->address & HPAGE_PMD_MASK;
vm_fault_t ret = 0;
+   struct gm_mapping *gm_mapping = NULL;
+
+   if (vma_is_peer_shared(vma))
+   gm_mapping = vm_object_lookup(vma->vm_mm->vm_obj, haddr);
 
VM_BUG_ON_FOLIO(!folio_test_large(folio), folio);
 
@@ -691,7 +696,8 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct 
vm_fault *vmf,
folio_put(folio);
count_vm_event(THP_FAULT_FALLBACK);
count_vm_event(THP_FAULT_FALLBACK_CHARGE);
-   return VM_FAULT_FALLBACK;
+   ret = VM_FAULT_FALLBACK;
+   goto gm_mapping_release;
}
folio_throttle_swaprate(folio, gfp);
 
@@ -701,7 +707,14 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct 
vm_fault *vmf,
goto release;
}
 
-   clear_huge_page(page, vmf->address, HPAGE_PMD_NR);
+   /*
+* Skip zero-filling page if the logical mapping indicates
+* that page contains valid data of the virtual address. This
+* could happen if the page was a victim of device memory
+* oversubscription.
+*/
+   if (!(vma_is_peer_shared(vma) && gm_mapping_cpu(gm_mapping)))
+   clear_huge_page(page, vmf->address, HPAGE_PMD_NR);
/*
 * The memory barrier inside __folio_mark_uptodate makes sure that
 * clear_huge_page writes become visible before the set_pmd_at()
@@ -726,7 +739,7 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct 
vm_fault *vmf,
pte_free(vma->vm_mm, pgtable);
ret = handle_userfault(vmf, VM_UFFD_MISSING);
VM_BUG_ON(ret & VM_FAULT_FALLBACK);
-   return ret;
+   goto gm_mapping_release;
}
 
entry = mk_huge_pmd(page, vma->vm_page_prot);
@@ -734,6 +747,13 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct 
vm_fault *vmf,
folio_add_new_anon_rmap(folio, vma, haddr);
folio_add_lru_vma(folio, vma);
   

[RFC PATCH 0/6] Supporting GMEM (generalized memory management) for external memory devices

2023-11-28 Thread Weixi Zhu
The problem:

Accelerator driver developers are forced to reinvent external MM subsystems
case by case, because Linux core MM only considers host memory resources.
These reinvented MM subsystems have similar orders of magnitude of LoC as
Linux MM (80K), e.g. Nvidia-UVM has 70K, AMD GPU has 14K and Huawei NPU has
30K. Meanwhile, more and more vendors are implementing their own
accelerators, e.g. Microsoft's Maia 100. At the same time,
application-level developers suffer from poor programmability -- they must
consider parallel address spaces and be careful about the limited device
DRAM capacity. This can be alleviated if a malloc()-ed virtual address can
be shared by the accelerator, or if the abundant host DRAM can
transparently back up the device local memory.

These external MM systems share similar mechanisms except for the
hardware-dependent part, so reinventing them is effectively introducing
redundant code (14K~70K for each case). Developing and maintaining them is
not cheap. Furthermore, to share a malloc()-ed virtual address, device drivers
need to deeply interact with Linux MM via low-level MM APIs, e.g. MMU
notifiers/HMM. This raises the bar for driver development, since developers
must understand how Linux MM works. Further, it creates code maintenance
problems -- any changes to Linux MM potentially require coordinated changes
to accelerator drivers using low-level MM APIs.

Putting a cache-coherent bus between host and device will not make these
external MM subsystems disappear. For example, a throughput-oriented
accelerator will not tolerate running a heavy memory-access workload
through a host MMU/IOMMU over a remote bus. Therefore, devices will still have
their own MMU and pick a simpler page table format for lower address
translation overhead, requiring external MM subsystems.



What GMEM (Generalized Memory Management [1]) does:

GMEM extends Linux MM to share its machine-independent MM code. Only a
high-level interface is provided for device drivers. This prevents
accelerator drivers from reinventing the wheel, but relies on drivers to
implement the hardware-dependent functions declared by GMEM. GMEM's key
interfaces include gm_dev_create(), gm_as_create(), gm_as_attach() and
gm_dev_register_physmem(). The following briefly describes how a device
driver uses them:
1. At boot time, call gm_dev_create() and register the implementation of
   hardware-dependent functions as declared in struct gm_mmu.
 - If the device has local DRAM, call gm_dev_register_physmem() to
   register available physical addresses.
2. When a device context is initialized (e.g. triggered by ioctl), check if
   the current CPU process has been attached to a gmem address space
   (struct gm_as). If not, call gm_as_create() and point current->mm->gm_as
   to it.
3. Call gm_as_attach() to attach the device context to a gmem address space.
4. Invoke gm_dev_fault() to resolve a page fault or prepare data before
   device computation happens.
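
The order of steps 1-3 above can be sketched with user-space stand-ins. None
of the sketch_* types or functions below is the GMEM API; their names,
fields and signatures are invented for illustration, and the comments only
point at the GMEM call each line stands in for.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct gm_dev, struct gm_as and mm_struct. */
struct sketch_dev { bool created; bool physmem_registered; };
struct sketch_as { int nr_contexts; };
struct sketch_mm { struct sketch_as *gm_as; };

static struct sketch_as sketch_as_storage;

/* Step 1: boot-time setup. */
static void sketch_boot(struct sketch_dev *dev, bool has_local_dram)
{
	assert(dev != NULL);
	dev->created = true;                      /* stands in for gm_dev_create() */
	dev->physmem_registered = has_local_dram; /* gm_dev_register_physmem() */
}

/* Steps 2-3: device context init; returns the number of attached contexts. */
static int sketch_context_init(struct sketch_mm *mm, struct sketch_dev *dev)
{
	assert(mm != NULL && dev != NULL && dev->created);

	if (!mm->gm_as) {
		/* Stands in for gm_as_create(): first context of this process. */
		sketch_as_storage.nr_contexts = 0;
		mm->gm_as = &sketch_as_storage;
	}

	/* Stands in for gm_as_attach(): add this device context. */
	mm->gm_as->nr_contexts++;
	return mm->gm_as->nr_contexts;
}
```

The point of the sketch is that the address space is created once per
process and then shared by every device context attached afterwards.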

GMEM has changed the following assumptions in Linux MM:
  1. An mm_struct not only handles a single CPU context, but may also handle
 external memory contexts encapsulated as gm_context listed in
 mm->gm_as. An external memory context can include a few or all of the
 following parts: an external MMU (that requires TLB invalidation), an
 external page table (that requires PTE manipulation) and external DRAM
 (that requires physical memory management).
  2. Faulting a MAP_PRIVATE VMA with no CPU PTE found does not necessarily
 mean that a zero-filled physical page should be mapped. The virtual
 page may have been mapped to an external memory device.
  3. Unmapping a page may include sending device TLB invalidation (even if
 its MMU shares CPU page table) and manipulating device PTEs.



Semantics of new syscalls:

1. mmap(..., MAP_PRIVATE | MAP_PEER_SHARED)
Allocate a virtual address range that is shared between the CPU and all
attached devices. Data is guaranteed to be coherent whenever the
address is accessed by either the CPU or any attached device. If the
device does not support page faults, then the device driver is
responsible for faulting memory before data gets accessed. By default,
CPU DRAM can be used as a swap backup for the device local memory.
2. hmadvise(NUMA_id, va_start, size, memory_hint)
Issue a memory hint for a given VMA. This extends the traditional
madvise() syscall with an extra argument so that programmers have better
control over heterogeneous devices registered as NUMA nodes. One useful
memory hint could be MADV_PREFETCH, which guarantees that the physical
data of the given VMA [VA, VA+size) is migrated to NUMA node #id.
Another useful memory hint is MADV_DONTNEED, which helps to increase
device memory utilization. It is worth considering extending the
existing madvise() syscall with one additional argument.
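
From user space, the two calls might be combined roughly as follows. This is
a sketch under several assumptions: MAP_PEER_SHARED is defined as 0 here
because its value is not given in this text, MAP_ANONYMOUS is added because
the series describes anonymous memory, and SKETCH_NR_hmadvise is a
placeholder of -1 rather than a real syscall number, so on an unpatched
kernel the hint is expected to fail. Only MADV_PREFETCH (26) is taken from
the series.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef MAP_PEER_SHARED
#define MAP_PEER_SHARED 0	/* placeholder: real flag value not shown here */
#endif
#ifndef MADV_PREFETCH
#define MADV_PREFETCH 26	/* value from the mman-common.h hunk in patch 4/6 */
#endif
#ifndef SKETCH_NR_hmadvise
#define SKETCH_NR_hmadvise (-1L)	/* placeholder: not a real syscall number */
#endif

/* Thin wrapper mirroring hmadvise(NUMA_id, va_start, size, memory_hint). */
static long sketch_hmadvise(int numa_id, void *start, size_t len, int hint)
{
	return syscall(SKETCH_NR_hmadvise, numa_id, (unsigned long)start, len, hint);
}

/*
 * Map a peer-shared anonymous buffer and ask for it to be prefetched to
 * numa_id. Returns 0 on success or a negative errno; *out is set whenever
 * the mapping itself succeeded, so the caller can still use the buffer.
 */
static int sketch_alloc_and_prefetch(size_t len, int numa_id, void **out)
{
	void *buf;

	assert(out != NULL);
	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS | MAP_PEER_SHARED, -1, 0);
	if (buf == MAP_FAILED)
		return -errno;
	*out = buf;

	if (sketch_hmadvise(numa_id, buf, len, MADV_PREFETCH) < 0)
		return -errno;
	return 0;
}
```

Since a hint is advisory, a caller can treat a failed prefetch as non-fatal
and keep using the buffer, releasing it with munmap() when done.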



Implementation 

[RFC PATCH 4/6] mm/gmem: add new syscall hmadvise() to issue memory hints for heterogeneous NUMA nodes

2023-11-28 Thread Weixi Zhu
This patch adds a new syscall, hmadvise(), to issue memory hints for
heterogeneous NUMA nodes. The new syscall effectively extends madvise()
with one additional argument that indicates the NUMA id of a heterogeneous
device, which is not necessarily accessible by the CPU.

The implemented memory hint is MADV_PREFETCH, which guarantees that the
physical data of the given VMA [VA, VA+size) is migrated to a designated
NUMA id, so subsequent accesses from the corresponding device can obtain
local memory access speed. This prefetch hint is internally parallelized with
multiple workqueue threads, allowing the page table management to be
overlapped. In a test with Huawei's Ascend NPU card, MADV_PREFETCH is
able to saturate the host-device bandwidth if the given VMA size is larger
than 16MB.
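
One way to picture the parallel prefetch is as a range split into
page-aligned chunks, one per work item. The sketch below shows only that
splitting step in user-space C; the sketch_* names, the 4 KiB page size and
the even-split policy are assumptions, and the patch's own chunking and
workqueue code is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL

/* Hypothetical per-work-item range, loosely modelled on an addr/size pair. */
struct sketch_chunk {
	unsigned long addr;
	size_t size;
};

/*
 * Split [addr, addr + size) into at most max_chunks page-aligned pieces.
 * Returns the number of chunks written to out.
 */
static size_t sketch_split_prefetch(unsigned long addr, size_t size,
				    size_t max_chunks, struct sketch_chunk *out)
{
	size_t npages, per_chunk, n = 0;

	assert(addr % SKETCH_PAGE_SIZE == 0 && size % SKETCH_PAGE_SIZE == 0);
	if (size == 0 || max_chunks == 0)
		return 0;

	npages = size / SKETCH_PAGE_SIZE;
	/* Round up so that the chunk count never exceeds max_chunks. */
	per_chunk = (npages + max_chunks - 1) / max_chunks;

	while (npages > 0) {
		size_t take = npages < per_chunk ? npages : per_chunk;

		out[n].addr = addr;
		out[n].size = take * SKETCH_PAGE_SIZE;
		addr += out[n].size;
		npages -= take;
		n++;
	}
	return n;
}
```

Each resulting chunk could then be handed to its own worker, so that page
table updates for one chunk overlap with data movement for another.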

Signed-off-by: Weixi Zhu 
---
 arch/arm64/include/asm/unistd.h |   2 +-
 arch/arm64/include/asm/unistd32.h   |   2 +
 include/linux/gmem.h|   9 +
 include/uapi/asm-generic/mman-common.h  |   3 +
 include/uapi/asm-generic/unistd.h   |   5 +-
 kernel/sys_ni.c |   2 +
 mm/gmem.c   | 222 
 tools/include/uapi/asm-generic/unistd.h |   5 +-
 8 files changed, 247 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h
index 531effca5f1f..298313d2e0af 100644
--- a/arch/arm64/include/asm/unistd.h
+++ b/arch/arm64/include/asm/unistd.h
@@ -39,7 +39,7 @@
 #define __ARM_NR_compat_set_tls		(__ARM_NR_COMPAT_BASE + 5)
 #define __ARM_NR_COMPAT_END		(__ARM_NR_COMPAT_BASE + 0x800)
 
-#define __NR_compat_syscalls   457
+#define __NR_compat_syscalls   458
 #endif
 
 #define __ARCH_WANT_SYS_CLONE
diff --git a/arch/arm64/include/asm/unistd32.h 
b/arch/arm64/include/asm/unistd32.h
index 9f7c1bf99526..0d44383b98be 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -919,6 +919,8 @@ __SYSCALL(__NR_futex_wake, sys_futex_wake)
 __SYSCALL(__NR_futex_wait, sys_futex_wait)
 #define __NR_futex_requeue 456
 __SYSCALL(__NR_futex_requeue, sys_futex_requeue)
+#define __NR_hmadvise 457
+__SYSCALL(__NR_hmadvise, sys_hmadvise)
 
 /*
  * Please add new compat syscalls above this comment and update
diff --git a/include/linux/gmem.h b/include/linux/gmem.h
index f424225daa03..97186f29638d 100644
--- a/include/linux/gmem.h
+++ b/include/linux/gmem.h
@@ -22,6 +22,11 @@ static inline bool gmem_is_enabled(void)
return static_branch_likely(&gmem_status);
 }
 
+static inline bool vma_is_peer_shared(struct vm_area_struct *vma)
+{
+   return false;
+}
+
 struct gm_dev {
int id;
 
@@ -280,6 +285,10 @@ int gm_as_attach(struct gm_as *as, struct gm_dev *dev, 
enum gm_mmu_mode mode,
 bool activate, struct gm_context **out_ctx);
 #else
 static inline bool gmem_is_enabled(void) { return false; }
+static inline bool vma_is_peer_shared(struct vm_area_struct *vma)
+{
+   return false;
+}
 static inline void hnuma_init(void) {}
 static inline void __init vm_object_init(void)
 {
diff --git a/include/uapi/asm-generic/mman-common.h 
b/include/uapi/asm-generic/mman-common.h
index 6ce1f1ceb432..49b22a497c5d 100644
--- a/include/uapi/asm-generic/mman-common.h
+++ b/include/uapi/asm-generic/mman-common.h
@@ -79,6 +79,9 @@
 
 #define MADV_COLLAPSE  25  /* Synchronous hugepage collapse */
 
+/* for hmadvise */
+#define MADV_PREFETCH  26  /* prefetch pages for hNUMA node */
+
 /* compatibility flags */
 #define MAP_FILE   0
 
diff --git a/include/uapi/asm-generic/unistd.h 
b/include/uapi/asm-generic/unistd.h
index 756b013fb832..a0773d4f7fa5 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -829,8 +829,11 @@ __SYSCALL(__NR_futex_wait, sys_futex_wait)
 #define __NR_futex_requeue 456
 __SYSCALL(__NR_futex_requeue, sys_futex_requeue)
 
+#define __NR_hmadvise 457
+__SYSCALL(__NR_hmadvise, sys_hmadvise)
+
 #undef __NR_syscalls
-#define __NR_syscalls 457
+#define __NR_syscalls 458
 
 /*
  * 32 bit systems traditionally used different
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index e1a6e3c675c0..73bc1b35b8c6 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -374,3 +374,5 @@ COND_SYSCALL(setuid16);
 
 /* restartable sequence */
 COND_SYSCALL(rseq);
+
+COND_SYSCALL(hmadvise);
diff --git a/mm/gmem.c b/mm/gmem.c
index b95b6b42ed6d..4eb522026a0d 100644
--- a/mm/gmem.c
+++ b/mm/gmem.c
@@ -9,6 +9,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 DEFINE_STATIC_KEY_FALSE(gmem_status);
 EXPORT_SYMBOL_GPL(gmem_status);
@@ -484,3 +486,223 @@ int gm_as_attach(struct gm_as *as, struct gm_dev *dev, 
enum gm_mmu_mode mode,
return GM_RET_SUCCESS;
 }
 EXPORT_SYMBOL_GPL(gm_as_attach);
+
+struct prefetch_data {
+   struct mm_struct *mm;
+   struct gm_dev *dev;
+   unsigned long addr;
+   size_t size;
+   struct work_struct work;
+
