Re: [PATCH v4 09/11] drm/msm/A6XX: Add a flag to allow preemption to submitqueue_create

2024-09-20 Thread Rob Clark
On Fri, Sep 20, 2024 at 9:54 AM Akhil P Oommen  wrote:
>
> On Tue, Sep 17, 2024 at 01:14:19PM +0200, Antonino Maniscalco wrote:
> > Some userspace changes are necessary so add a flag for userspace to
> > advertise support for preemption when creating the submitqueue.
> >
> > When this flag is not set, preemption will not be allowed in the middle
> > of the submitted IBs, therefore maintaining compatibility with older
> > userspace.
> >
> > The flag is rejected if preemption is not supported on the target, which
> > allows userspace to know whether preemption is supported.
>
> Just curious, what is the motivation behind informing userspace about
> preemption support?

I think I requested that, as a "just in case" (because it would
otherwise be awkward if we later needed to know the difference between
drm/sched "preemption", which can only happen before the submit is
written to the ring, and "real" preemption)

BR,
-R
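
(For reference, a minimal sketch of how userspace can use the rejection
behavior described above to probe for preemption support.  Illustrative
only: the helper name is made up and error handling is trimmed, but
struct drm_msm_submitqueue and DRM_IOCTL_MSM_SUBMITQUEUE_NEW are the
actual uapi:)

	#include <stdbool.h>
	#include <stdint.h>
	#include <sys/ioctl.h>
	#include "drm/msm_drm.h"

	static bool msm_queue_supports_preempt(int drm_fd, uint32_t prio)
	{
		struct drm_msm_submitqueue req = {
			.flags = MSM_SUBMITQUEUE_ALLOW_PREEMPT,
			.prio  = prio,
		};

		/* EINVAL here means the target (or kernel) cannot preempt */
		if (ioctl(drm_fd, DRM_IOCTL_MSM_SUBMITQUEUE_NEW, &req))
			return false;

		/* req.id now names a preemption-capable submitqueue */
		return true;
	}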

> -Akhil
>
> >
> > Signed-off-by: Antonino Maniscalco 
> > Tested-by: Neil Armstrong  # on SM8650-QRD
> > ---
> >  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 12 ++++++++----
> >  drivers/gpu/drm/msm/msm_submitqueue.c |  3 +++
> >  include/uapi/drm/msm_drm.h            |  5 ++++-
> >  3 files changed, 15 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > index 736f475d696f..edbcb6d229ba 100644
> > --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > @@ -430,8 +430,10 @@ static void a7xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
> >  	OUT_PKT7(ring, CP_SET_MARKER, 1);
> >  	OUT_RING(ring, 0x101); /* IFPC disable */
> >  
> > -	OUT_PKT7(ring, CP_SET_MARKER, 1);
> > -	OUT_RING(ring, 0x00d); /* IB1LIST start */
> > +	if (submit->queue->flags & MSM_SUBMITQUEUE_ALLOW_PREEMPT) {
> > +		OUT_PKT7(ring, CP_SET_MARKER, 1);
> > +		OUT_RING(ring, 0x00d); /* IB1LIST start */
> > +	}
> >  
> >  	/* Submit the commands */
> >  	for (i = 0; i < submit->nr_cmds; i++) {
> > @@ -462,8 +464,10 @@ static void a7xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
> >  		update_shadow_rptr(gpu, ring);
> >  	}
> >  
> > -	OUT_PKT7(ring, CP_SET_MARKER, 1);
> > -	OUT_RING(ring, 0x00e); /* IB1LIST end */
> > +	if (submit->queue->flags & MSM_SUBMITQUEUE_ALLOW_PREEMPT) {
> > +		OUT_PKT7(ring, CP_SET_MARKER, 1);
> > +		OUT_RING(ring, 0x00e); /* IB1LIST end */
> > +	}
> >  
> >  	get_stats_counter(ring, REG_A7XX_RBBM_PERFCTR_CP(0),
> >  			rbmemptr_stats(ring, index, cpcycles_end));
> > diff --git a/drivers/gpu/drm/msm/msm_submitqueue.c b/drivers/gpu/drm/msm/msm_submitqueue.c
> > index 0e803125a325..9b3ffca3f3b4 100644
> > --- a/drivers/gpu/drm/msm/msm_submitqueue.c
> > +++ b/drivers/gpu/drm/msm/msm_submitqueue.c
> > @@ -170,6 +170,9 @@ int msm_submitqueue_create(struct drm_device *drm, struct msm_file_private *ctx,
> >  	if (!priv->gpu)
> >  		return -ENODEV;
> >  
> > +	if (flags & MSM_SUBMITQUEUE_ALLOW_PREEMPT && priv->gpu->nr_rings == 1)
> > +		return -EINVAL;
> > +
> >  	ret = msm_gpu_convert_priority(priv->gpu, prio, &ring_nr, &sched_prio);
> >  	if (ret)
> >  		return ret;
> > diff --git a/include/uapi/drm/msm_drm.h b/include/uapi/drm/msm_drm.h
> > index 3fca72f73861..f37858db34e6 100644
> > --- a/include/uapi/drm/msm_drm.h
> > +++ b/include/uapi/drm/msm_drm.h
> > @@ -345,7 +345,10 @@ struct drm_msm_gem_madvise {
> >   * backwards compatibility as a "default" submitqueue
> >   */
> >  
> > -#define MSM_SUBMITQUEUE_FLAGS (0)
> > +#define MSM_SUBMITQUEUE_ALLOW_PREEMPT	0x0001
> > +#define MSM_SUBMITQUEUE_FLAGS		( \
> > +		MSM_SUBMITQUEUE_ALLOW_PREEMPT | \
> > +		0)
> >
> >  /*
> >   * The submitqueue priority should be between 0 and MSM_PARAM_PRIORITIES-1,
> >
> > --
> > 2.46.0
> >


Re: [PATCH v4 00/11] Preemption support for A7XX

2024-09-18 Thread Rob Clark
On Wed, Sep 18, 2024 at 12:46 AM Neil Armstrong
 wrote:
>
> Hi,
>
> On 17/09/2024 13:14, Antonino Maniscalco wrote:
> > This series implements preemption for A7XX targets, which allows the GPU to
> > switch to a higher priority ring when work is pushed to it, reducing 
> > latency
> > for high priority submissions.
> >
> > This series enables L1 preemption with skip_save_restore which requires
> > the following userspace patches to function:
> >
> > https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30544
> >
> > A flag is added to `msm_submitqueue_create` to only allow submissions
> > from compatible userspace to be preempted, therefore maintaining
> > compatibility.
> >
> > Preemption is currently only enabled by default on A750; it can be
> > enabled on other targets through the `enable_preemption` module
> > parameter. This is because more testing is required on other targets.
> >
> > For testing on other HW it is sufficient to set that parameter to a
> > value of 1; then, using the branch of mesa linked above, `TU_DEBUG=hiprio`
> > allows running any application as high priority, thereby preempting
> > submissions from other applications.
> >
> > The `msm_gpu_preemption_trigger` and `msm_gpu_preemption_irq` traces
> > added in this series can be used to observe preemption's behavior as
> > well as to measure preemption latency.
> >
> > Some commits from this series are based on a previous series to enable
> > preemption on A6XX targets:
> >
> > https://lkml.kernel.org/1520489185-21828-1-git-send-email-smase...@codeaurora.org
> >
> > Signed-off-by: Antonino Maniscalco 
> > ---
> > Changes in v4:
> > - Added missing register in pwrup list
> > - Removed and rearrange barriers
> > - Renamed `skip_inline_wptr` to `restore_wptr`
> > - Track ctx seqno per ring
> > - Removed secure preempt context
> > - NOP out postamble to disable it instantly
> > - Only emit pwrup reglist once
> > - Document bv_rptr_addr
> > - Removed unused A6XX_PREEMPT_USER_RECORD_SIZE
> > - Set name on preempt record buffer
> > - Link to v3: 
> > https://lore.kernel.org/r/20240905-preemption-a750-t-v3-0-fd947699f...@gmail.com
> >
> > Changes in v3:
> > - Added documentation about preemption
> > - Use quirks to determine which target supports preemption
> > - Add a module parameter to force disabling or enabling preemption
> > - Clear postamble when profiling
> > - Define A6XX_CP_CONTEXT_SWITCH_CNTL_LEVEL fields in a6xx.xml
> > - Make preemption records MAP_PRIV
> > - Removed user ctx record (NON_PRIV) and patch 2/9 as it's not needed
> >anymore
> > - Link to v2: 
> > https://lore.kernel.org/r/20240830-preemption-a750-t-v2-0-86aeead2c...@gmail.com
> >
> > Changes in v2:
> > - Added preempt_record_size for X185 in PATCH 3/7
> > - Added patches to reset perf counters
> > - Dropped unused defines
> > - Dropped unused variable (fixes warning)
> > - Only enable preemption on a750
> > - Reject MSM_SUBMITQUEUE_ALLOW_PREEMPT for unsupported targets
> > - Added Akhil's Reviewed-By tags to patches 1/9,2/9,3/9
> > - Added Neil's Tested-By tags
> > - Added explanation for UAPI changes in commit message
> > - Link to v1: 
> > https://lore.kernel.org/r/20240815-preemption-a750-t-v1-0-7bda26c34...@gmail.com
> >
> > ---
> > Antonino Maniscalco (11):
> >drm/msm: Fix bv_fence being used as bv_rptr
> >drm/msm/A6XX: Track current_ctx_seqno per ring
> >drm/msm: Add a `preempt_record_size` field
> >drm/msm: Add CONTEXT_SWITCH_CNTL bitfields
> >drm/msm/A6xx: Implement preemption for A7XX targets
> >drm/msm/A6xx: Sync relevant adreno_pm4.xml changes
> >    drm/msm/A6xx: Use postamble to reset counters on preemption
> >drm/msm/A6xx: Add traces for preemption
> >drm/msm/A6XX: Add a flag to allow preemption to submitqueue_create
> >drm/msm/A6xx: Enable preemption for A750
> >Documentation: document adreno preemption
> >
> >   Documentation/gpu/msm-preemption.rst   |  98 +
> >   drivers/gpu/drm/msm/Makefile   |   1 +
> >   drivers/gpu/drm/msm/adreno/a2xx_gpu.c  |   2 +-
> >   drivers/gpu/drm/msm/adreno/a3xx_gpu.c  |   2 +-
> >   drivers/gpu/drm/msm/adreno/a4xx_gpu.c  |   2 +-
> >   drivers/gpu/drm/msm/adreno/a5xx_gpu.c  |   6 +-
> >   drivers/gpu/drm/msm/adreno/a6xx_catalog.c  |   7 +-
> >   drivers/gpu/drm/msm/adreno/a6xx_gpu.c  | 325 ++-
> >   drivers/gpu/drm/msm/adreno/a6xx_gpu.h  | 174 
> >   drivers/gpu/drm/msm/adreno/a6xx_preempt.c  | 440 +
> >   drivers/gpu/drm/msm/adreno/adreno_gpu.h|   9 +-
> >   drivers/gpu/drm/msm/msm_drv.c  |   4 +
> >   drivers/gpu/drm/msm/msm_gpu.c  |   2 +-
> >   drivers/gpu/drm/msm/msm_gpu.h  |  11 -
> >   drivers/gpu/drm/msm/msm_gpu_trace.h|  28 ++
> >   drivers/gpu/drm/msm/msm_ringbuffer.h

Re: [PATCH] drm/msm/a6xx+: Insert a fence wait before SMMU table update

2024-09-17 Thread Rob Clark
On Tue, Sep 17, 2024 at 4:37 PM Konrad Dybcio  wrote:
>
> On 17.09.2024 5:30 PM, Rob Clark wrote:
> > On Tue, Sep 17, 2024 at 6:47 AM Konrad Dybcio  
> > wrote:
> >>
> >> On 13.09.2024 9:51 PM, Rob Clark wrote:
> >>> From: Rob Clark 
> >>>
> >>> The CP_SMMU_TABLE_UPDATE _should_ be waiting for idle, but on some
> >>> devices (x1-85, possibly others), it seems to pass that barrier while
> >>> there are still things in the event completion FIFO waiting to be
> >>> written back to memory.
> >>
> >> Can we try to force-fault around here on other GPUs and perhaps
> >> limit this workaround?
> >
> > not sure what you mean by "force-fault"...
>
> I suppose 'reproduce' is what I meant

I haven't _noticed_ it yet.. if you want to try on devices you have,
glmark2 seems to be good at reproducing..

I think the reason is a combo of high fps (on x1-85 most scenes are
north of 8k fps) so you get a lot of context switches between compositor
and glmark2.  Most scenes are just a clear plus single draw, and I
guess the compositor is just doing a single draw/blit.  A6xx can be
two draws/blits deep in its pipeline, a7xx can be four, which maybe
exacerbates this.

> > we could probably limit
> > this to certain GPUs, the only reason I didn't is (a) it should be
> > harmless when it is not needed,
>
> Do we have any realistic perf hits here?

I don't think so; we can't switch ttbr0 while the gpu is still busy, so
what the sqe does for CP_SMMU_TABLE_UPDATE _should_ be equivalent.
Maybe it amounts to some extra CP cycles and memory read, but I think
that should be negligible given that the expensive thing is that we
are stalling the gpu until it is idle.

> > and (b) I have no real good way to get
> > an exhaustive list of where it is needed.  Maybe/hopefully it is only
> > x1-85, but idk.
> >
> > It does bring up an interesting question about preemption, though
>
> Yeah..

The KMD does set up an xAMBLE to clear the perfcntrs on context switch.
We could maybe piggy back on that, but I guess we'd have to patch in
the fence value to wait for?

> Do we know what windows does here?

not sure, maybe akhil has some way to check.  Whether a similar
scenario comes up with windows probably depends on how the winsys
works.  If it dropped frames when rendering >vblank rate, you'd get
fewer context switches.

BR,
-R

> Konrad


Re: [PATCH] drm/msm/a6xx+: Insert a fence wait before SMMU table update

2024-09-17 Thread Rob Clark
On Tue, Sep 17, 2024 at 6:47 AM Konrad Dybcio  wrote:
>
> On 13.09.2024 9:51 PM, Rob Clark wrote:
> > From: Rob Clark 
> >
> > The CP_SMMU_TABLE_UPDATE _should_ be waiting for idle, but on some
> > devices (x1-85, possibly others), it seems to pass that barrier while
> > there are still things in the event completion FIFO waiting to be
> > written back to memory.
>
> Can we try to force-fault around here on other GPUs and perhaps
> limit this workaround?

not sure what you mean by "force-fault"... we could probably limit
this to certain GPUs, the only reason I didn't is (a) it should be
harmless when it is not needed, and (b) I have no real good way to get
an exhaustive list of where it is needed.  Maybe/hopefully it is only
x1-85, but idk.

It does bring up an interesting question about preemption, though

BR,
-R

> Akhil, do we have any insight on this?
>
> Konrad


[PATCH] drm/msm/a6xx+: Insert a fence wait before SMMU table update

2024-09-13 Thread Rob Clark
From: Rob Clark 

The CP_SMMU_TABLE_UPDATE _should_ be waiting for idle, but on some
devices (x1-85, possibly others), it seems to pass that barrier while
there are still things in the event completion FIFO waiting to be
written back to memory.

Work around that by adding a fence wait before context switch.  The
CP_EVENT_WRITE that writes the fence is the last write from a submit,
so seeing this value hit memory is a reliable indication that it is
safe to proceed with the context switch.

Closes: https://gitlab.freedesktop.org/drm/msm/-/issues/63
Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index bcaec86ac67a..ba5b35502e6d 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -101,9 +101,10 @@ static void get_stats_counter(struct msm_ringbuffer *ring, u32 counter,
 }
 
 static void a6xx_set_pagetable(struct a6xx_gpu *a6xx_gpu,
-		struct msm_ringbuffer *ring, struct msm_file_private *ctx)
+		struct msm_ringbuffer *ring, struct msm_gem_submit *submit)
 {
 	bool sysprof = refcount_read(&a6xx_gpu->base.base.sysprof_active) > 1;
+	struct msm_file_private *ctx = submit->queue->ctx;
 	struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
 	phys_addr_t ttbr;
 	u32 asid;
@@ -115,6 +116,13 @@ static void a6xx_set_pagetable(struct a6xx_gpu *a6xx_gpu,
 	if (msm_iommu_pagetable_params(ctx->aspace->mmu, &ttbr, &asid))
 		return;
 
+	/* Wait for previous submit to complete before continuing: */
+	OUT_PKT7(ring, CP_WAIT_TIMESTAMP, 4);
+	OUT_RING(ring, 0);
+	OUT_RING(ring, lower_32_bits(rbmemptr(ring, fence)));
+	OUT_RING(ring, upper_32_bits(rbmemptr(ring, fence)));
+	OUT_RING(ring, submit->seqno - 1);
+
 	if (!sysprof) {
 		if (!adreno_is_a7xx(adreno_gpu)) {
 			/* Turn off protected mode to write to special registers */
@@ -193,7 +201,7 @@ static void a6xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
 	struct msm_ringbuffer *ring = submit->ring;
 	unsigned int i, ibs = 0;
 
-	a6xx_set_pagetable(a6xx_gpu, ring, submit->queue->ctx);
+	a6xx_set_pagetable(a6xx_gpu, ring, submit);
 
 	get_stats_counter(ring, REG_A6XX_RBBM_PERFCTR_CP(0),
 		rbmemptr_stats(ring, index, cpcycles_start));
@@ -283,7 +291,7 @@ static void a7xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
 	OUT_PKT7(ring, CP_THREAD_CONTROL, 1);
 	OUT_RING(ring, CP_THREAD_CONTROL_0_SYNC_THREADS | CP_SET_THREAD_BR);
 
-	a6xx_set_pagetable(a6xx_gpu, ring, submit->queue->ctx);
+	a6xx_set_pagetable(a6xx_gpu, ring, submit);
 
 	get_stats_counter(ring, REG_A7XX_RBBM_PERFCTR_CP(0),
 		rbmemptr_stats(ring, index, cpcycles_start));
-- 
2.46.0



Re: [PATCH v3 00/10] Preemption support for A7XX

2024-09-09 Thread Rob Clark
On Fri, Sep 6, 2024 at 12:59 PM Akhil P Oommen  wrote:
>
> On Thu, Sep 05, 2024 at 04:51:18PM +0200, Antonino Maniscalco wrote:
> > This series implements preemption for A7XX targets, which allows the GPU to
> > switch to a higher priority ring when work is pushed to it, reducing 
> > latency
> > for high priority submissions.
> >
> > This series enables L1 preemption with skip_save_restore which requires
> > the following userspace patches to function:
> >
> > https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30544
> >
> > A flag is added to `msm_submitqueue_create` to only allow submissions
> > from compatible userspace to be preempted, therefore maintaining
> > compatibility.
> >
> > Preemption is currently only enabled by default on A750; it can be
> > enabled on other targets through the `enable_preemption` module
> > parameter. This is because more testing is required on other targets.
> >
> > For testing on other HW it is sufficient to set that parameter to a
> > value of 1; then, using the branch of mesa linked above, `TU_DEBUG=hiprio`
> > allows running any application as high priority, thereby preempting
> > submissions from other applications.
> >
> > The `msm_gpu_preemption_trigger` and `msm_gpu_preemption_irq` traces
> > added in this series can be used to observe preemption's behavior as
> > well as to measure preemption latency.
> >
> > Some commits from this series are based on a previous series to enable
> > preemption on A6XX targets:
> >
> > https://lkml.kernel.org/1520489185-21828-1-git-send-email-smase...@codeaurora.org
> >
> > Signed-off-by: Antonino Maniscalco 
>
> Antonino, can you please test this once with per-process pt disabled to
> ensure that is not broken? It is handy sometimes while debugging.
> We just need to remove "adreno-smmu" compatible string from gpu smmu
> node in DT.

fwiw, I'd be ok supporting preemption only on devices that have per-process
pgtables.  (And maybe we should be tainting the kernel if per-process
pgtables are disabled on a6xx+)

BR,
-R
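
(A minimal sketch of the tainting idea -- purely illustrative: the
probe-time placement and the condition helper are assumptions, not from
any posted patch; add_taint() and LOCKDEP_STILL_OK are the real kernel
API:)

	/* hypothetical check during GPU probe: */
	if (!per_process_pgtables_available(gpu)) {
		DRM_DEV_INFO(&pdev->dev,
			     "no per-process pagetables, GPU isolation is weakened\n");
		add_taint(TAINT_USER, LOCKDEP_STILL_OK);
	}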

> -Akhil.
>
> > ---
> > Changes in v3:
> > - Added documentation about preemption
> > - Use quirks to determine which target supports preemption
> > - Add a module parameter to force disabling or enabling preemption
> > - Clear postamble when profiling
> > - Define A6XX_CP_CONTEXT_SWITCH_CNTL_LEVEL fields in a6xx.xml
> > - Make preemption records MAP_PRIV
> > - Removed user ctx record (NON_PRIV) and patch 2/9 as it's not needed
> >   anymore
> > - Link to v2: 
> > https://lore.kernel.org/r/20240830-preemption-a750-t-v2-0-86aeead2c...@gmail.com
> >
> > Changes in v2:
> > - Added preempt_record_size for X185 in PATCH 3/7
> > - Added patches to reset perf counters
> > - Dropped unused defines
> > - Dropped unused variable (fixes warning)
> > - Only enable preemption on a750
> > - Reject MSM_SUBMITQUEUE_ALLOW_PREEMPT for unsupported targets
> > - Added Akhil's Reviewed-By tags to patches 1/9,2/9,3/9
> > - Added Neil's Tested-By tags
> > - Added explanation for UAPI changes in commit message
> > - Link to v1: 
> > https://lore.kernel.org/r/20240815-preemption-a750-t-v1-0-7bda26c34...@gmail.com
> >
> > ---
> > Antonino Maniscalco (10):
> >   drm/msm: Fix bv_fence being used as bv_rptr
> >   drm/msm: Add a `preempt_record_size` field
> >   drm/msm: Add CONTEXT_SWITCH_CNTL bitfields
> >   drm/msm/A6xx: Implement preemption for A7XX targets
> >   drm/msm/A6xx: Sync relevant adreno_pm4.xml changes
> >   drm/msm/A6xx: Use postamble to reset counters on preemption
> >   drm/msm/A6xx: Add traces for preemption
> >   drm/msm/A6XX: Add a flag to allow preemption to submitqueue_create
> >   drm/msm/A6xx: Enable preemption for A750
> >   Documentation: document adreno preemption
> >
> >  Documentation/gpu/msm-preemption.rst   |  98 +
> >  drivers/gpu/drm/msm/Makefile   |   1 +
> >  drivers/gpu/drm/msm/adreno/a6xx_catalog.c  |   7 +-
> >  drivers/gpu/drm/msm/adreno/a6xx_gpu.c  | 331 +++-
> >  drivers/gpu/drm/msm/adreno/a6xx_gpu.h  | 166 
> >  drivers/gpu/drm/msm/adreno/a6xx_preempt.c  | 430 +
> >  drivers/gpu/drm/msm/adreno/adreno_gpu.h|   9 +-
> >  drivers/gpu/drm/msm/msm_drv.c  |   4 +
> >  drivers/gpu/drm/msm/msm_gpu_trace.h|  28 ++
> >  drivers/gpu/drm/msm/msm_ringbuffer.h   |   8 +
> >  drivers/gpu/drm/msm/msm_submitqueue.c  |   3 +
> >  drivers/gpu/drm/msm/registers/adreno/a6xx.xml  |   7 +-
> >  .../gpu/drm/msm/registers/adreno/adreno_pm4.xml|  39 +-
> >  include/uapi/drm/msm_drm.h |   5 +-
> >  14 files changed, 1094 insertions(+), 42 deletions(-)
> > ---
> > base-commit: 7c626ce4bae1ac14f60076d00eafe71af30450ba
> > change-id: 20240815-preemption-a750-t-fcee9a844b39
> >
> > Best regards,
> > --
> > Antonino Maniscalco 
> >


Re: [PATCH v3 04/10] drm/msm/A6xx: Implement preemption for A7XX targets

2024-09-09 Thread Rob Clark
On Mon, Sep 9, 2024 at 6:43 AM Connor Abbott  wrote:
>
> On Mon, Sep 9, 2024 at 2:15 PM Antonino Maniscalco
>  wrote:
> >
> > On 9/6/24 9:54 PM, Akhil P Oommen wrote:
> > > On Thu, Sep 05, 2024 at 04:51:22PM +0200, Antonino Maniscalco wrote:
> > >> This patch implements the preemption feature for A6xx targets; this allows
> > >> the GPU to switch to a higher priority ringbuffer if one is ready. A6XX
> > >> hardware as such supports multiple levels of preemption granularities,
> > >> ranging from coarse grained (ringbuffer level) to a more fine grained
> > >> such as draw-call level or a bin boundary level preemption. This patch
> > >> enables the basic preemption level, with more fine grained preemption
> > >> support to follow.
> > >>
> > >> Signed-off-by: Sharat Masetty 
> > >> Signed-off-by: Antonino Maniscalco 
> > >> Tested-by: Neil Armstrong  # on SM8650-QRD
> > >> ---
> > >>   drivers/gpu/drm/msm/Makefile  |   1 +
> > >>   drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 293 +-
> > >>   drivers/gpu/drm/msm/adreno/a6xx_gpu.h | 161 
> > ...
> > >
> > > we can use the lighter smp variant here.
> > >
> > >> +
> > >> +if (a6xx_gpu->cur_ring == ring)
> > >> +gpu_write(gpu, REG_A6XX_CP_RB_WPTR, wptr);
> > >> +else
> > >> +ring->skip_inline_wptr = true;
> > >> +} else {
> > >> +ring->skip_inline_wptr = true;
> > >> +}
> > >> +
> > >> +spin_unlock_irqrestore(&ring->preempt_lock, flags);
> > >>   }
> > >>
> > >>   static void get_stats_counter(struct msm_ringbuffer *ring, u32 counter,
> > >> @@ -138,12 +231,14 @@ static void a6xx_set_pagetable(struct a6xx_gpu *a6xx_gpu,
> > >
> > > set_pagetable checks "cur_ctx_seqno" to see if pt switch is needed or
> > > not. This is currently not tracked separately for each ring. Can you
> > > please check that?
> >
> > I totally missed that. Thanks for catching it!
> >
> > >
> > > I wonder why that didn't cause any gpu errors in testing. Not sure if I
> > > am missing something.
> > >
> >
> > I think this is because, so long as a single context doesn't submit to
> > two different rings with different priorities, we will only be incorrect
> > in the sense that we emit more page table switches than necessary and
> > never less. However, untrusted userspace could create a context that
> > submits to two different rings, and that would lead to execution in the
> > wrong context, so we must fix this.
>
> FWIW, in Mesa in the future we may want to expose multiple Vulkan
> queues per device. Then this would definitely blow up.

This will actually be required by future android versions, with the
switch to the vk hwui backend (because apparently locking is hard, the
solution was to use different queues for different threads)

https://gitlab.freedesktop.org/mesa/mesa/-/issues/11326

BR,
-R
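
(What the eventual "track ctx seqno per ring" fix boils down to in
a6xx_set_pagetable() is roughly this -- a sketch, with field names
assumed from the discussion rather than copied from the actual patch:)

	/* Compare against the context that last ran on *this* ring, not a
	 * single GPU-global seqno, so a context submitting to two different
	 * rings still gets a pagetable switch emitted on the second ring: */
	if (ctx->seqno == ring->cur_ctx_seqno)
		return;
	...
	ring->cur_ctx_seqno = ctx->seqno;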


Re: [PATCH] Revert "iommu/io-pgtable-arm: Optimise non-coherent unmap"

2024-09-06 Thread Rob Clark
On Fri, Sep 6, 2024 at 5:24 AM Robin Murphy  wrote:
>
> On 2024-09-05 6:10 pm, Rob Clark wrote:
> > On Thu, Sep 5, 2024 at 10:00 AM Rob Clark  wrote:
> >>
> >> On Thu, Sep 5, 2024 at 9:27 AM Robin Murphy  wrote:
> >>>
> >>> On 05/09/2024 4:53 pm, Will Deacon wrote:
> >>>> Hi Rob,
> >>>>
> >>>> On Thu, Sep 05, 2024 at 05:49:56AM -0700, Rob Clark wrote:
> >>>>> From: Rob Clark 
> >>>>>
> >>>>> This reverts commit 85b715a334583488ad7fbd3001fe6fd617b7d4c0.
> >>>>>
> >>>>> It was causing gpu smmu faults on x1e80100.
> >>>>>
> >>>>> I _think_ what is causing this is the change in ordering of
> >>>>> __arm_lpae_clear_pte() (dma_sync_single_for_device() on the pgtable
> >>>>> memory) and io_pgtable_tlb_flush_walk().  I'm not entirely sure how
> >>>>> this patch is supposed to work correctly in the face of other
> >>>>> concurrent translations (to buffers unrelated to the one being
> >>>>> unmapped), because after the io_pgtable_tlb_flush_walk() we can have
> >>>>> stale data read back into the tlb.
> >>>>>
> >>>>> Signed-off-by: Rob Clark 
> >>>>> ---
> >>>>>drivers/iommu/io-pgtable-arm.c | 31 ++-
> >>>>>1 file changed, 14 insertions(+), 17 deletions(-)
> >>>>
> >>>> Please can you try the diff below, instead?
> >>>
> >>> Given that the GPU driver's .tlb_add_page is a no-op, I can't see this
> >>> making a difference. In fact, given that msm_iommu_pagetable_unmap()
> >>> still does a brute-force iommu_flush_iotlb_all() after io-pgtable
> >>> returns, and in fact only recently made .tlb_flush_walk start doing
> >>> anything either for the sake of the map path, I'm now really wondering
> >>> how this patch has had any effect at all... :/
> >>
> >> Yeah..  and unfortunately the TBU code only supports two devices so
> >> far, so I can't easily repro with TBU enabled atm.  Hmm..
> >> __arm_lpae_unmap() is also called in the ->map() path, although not
> >> sure how that changes things.
> >
> > Ok, an update.. after a reboot, still with this patch reverted, I once
> > again see faults.  So I guess that vindicates the original patch, and
> > leaves me still searching..
> >
> > fwiw, fault info from the gpu devcore:
> >
> > -
> > fault-info:
> >- ttbr0=000919306000
> >- iova=000100c17000
> >- dir=WRITE
> >- type=UNKNOWN
> >- source=CP
> > pgtable-fault-info:
> >- ttbr0: 00090ca4
> >- asid: 0
> >- ptes: 00095db47003 00095db48003 000914c8f003 
> > 0008fd7f0f47
> > -
> >
> > the 'ptes' part shows the table walk, which looks ok to me..
>
> But is it the right pagetable at all, given that the "ttbr0" values
> appear to be indicating different places?

hmm, the gpu does seem to be switching to the new table before it is
done with the old one..

BR,
-R


Re: [PATCH] Revert "iommu/io-pgtable-arm: Optimise non-coherent unmap"

2024-09-05 Thread Rob Clark
On Thu, Sep 5, 2024 at 10:00 AM Rob Clark  wrote:
>
> On Thu, Sep 5, 2024 at 9:27 AM Robin Murphy  wrote:
> >
> > On 05/09/2024 4:53 pm, Will Deacon wrote:
> > > Hi Rob,
> > >
> > > On Thu, Sep 05, 2024 at 05:49:56AM -0700, Rob Clark wrote:
> > >> From: Rob Clark 
> > >>
> > >> This reverts commit 85b715a334583488ad7fbd3001fe6fd617b7d4c0.
> > >>
> > >> It was causing gpu smmu faults on x1e80100.
> > >>
> > >> I _think_ what is causing this is the change in ordering of
> > >> __arm_lpae_clear_pte() (dma_sync_single_for_device() on the pgtable
> > >> memory) and io_pgtable_tlb_flush_walk().  I'm not entirely sure how
> > >> this patch is supposed to work correctly in the face of other
> > >> concurrent translations (to buffers unrelated to the one being
> > >> unmapped), because after the io_pgtable_tlb_flush_walk() we can have
> > >> stale data read back into the tlb.
> > >>
> > >> Signed-off-by: Rob Clark 
> > >> ---
> > >>   drivers/iommu/io-pgtable-arm.c | 31 ++-
> > >>   1 file changed, 14 insertions(+), 17 deletions(-)
> > >
> > > Please can you try the diff below, instead?
> >
> > Given that the GPU driver's .tlb_add_page is a no-op, I can't see this
> > making a difference. In fact, given that msm_iommu_pagetable_unmap()
> > still does a brute-force iommu_flush_iotlb_all() after io-pgtable
> > returns, and in fact only recently made .tlb_flush_walk start doing
> > anything either for the sake of the map path, I'm now really wondering
> > how this patch has had any effect at all... :/
>
> Yeah..  and unfortunately the TBU code only supports two devices so
> far, so I can't easily repro with TBU enabled atm.  Hmm..
> __arm_lpae_unmap() is also called in the ->map() path, although not
> sure how that changes things.

Ok, an update.. after a reboot, still with this patch reverted, I once
again see faults.  So I guess that vindicates the original patch, and
leaves me still searching..

fwiw, fault info from the gpu devcore:

-
fault-info:
  - ttbr0=000919306000
  - iova=000100c17000
  - dir=WRITE
  - type=UNKNOWN
  - source=CP
pgtable-fault-info:
  - ttbr0: 00090ca4
  - asid: 0
  - ptes: 00095db47003 00095db48003 000914c8f003 0008fd7f0f47
-

the 'ptes' part shows the table walk, which looks ok to me..

BR,
-R

> BR,
> -R
>
> > >
> > > Will
> > >
> > > --->8
> > >
> > > diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> > > index 0e67f1721a3d..0a32e9499e2c 100644
> > > --- a/drivers/iommu/io-pgtable-arm.c
> > > +++ b/drivers/iommu/io-pgtable-arm.c
> > > @@ -672,7 +672,7 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
> > >  /* Clear the remaining entries */
> > >  __arm_lpae_clear_pte(ptep, &iop->cfg, i);
> > >
> > > -   if (gather && !iommu_iotlb_gather_queued(gather))
> > > +   if (!iommu_iotlb_gather_queued(gather))
> >
> > Note that this would reintroduce the latent issue which was present
> > originally, wherein iommu_iotlb_gather_queued(NULL) is false, but if we
> > actually allow a NULL gather to be passed to io_pgtable_tlb_add_page()
> > it may end up being dereferenced (e.g. in arm-smmu-v3).
> >
> > Thanks,
> > Robin.
> >
> > >  for (int j = 0; j < i; j++)
> > >  io_pgtable_tlb_add_page(iop, gather, iova + j * size, size);
> > >


Re: [PATCH] Revert "iommu/io-pgtable-arm: Optimise non-coherent unmap"

2024-09-05 Thread Rob Clark
On Thu, Sep 5, 2024 at 9:27 AM Robin Murphy  wrote:
>
> On 05/09/2024 4:53 pm, Will Deacon wrote:
> > Hi Rob,
> >
> > On Thu, Sep 05, 2024 at 05:49:56AM -0700, Rob Clark wrote:
> >> From: Rob Clark 
> >>
> >> This reverts commit 85b715a334583488ad7fbd3001fe6fd617b7d4c0.
> >>
> >> It was causing gpu smmu faults on x1e80100.
> >>
> >> I _think_ what is causing this is the change in ordering of
> >> __arm_lpae_clear_pte() (dma_sync_single_for_device() on the pgtable
> >> memory) and io_pgtable_tlb_flush_walk().  I'm not entirely sure how
> >> this patch is supposed to work correctly in the face of other
> >> concurrent translations (to buffers unrelated to the one being
> >> unmapped), because after the io_pgtable_tlb_flush_walk() we can have
> >> stale data read back into the tlb.
> >>
> >> Signed-off-by: Rob Clark 
> >> ---
> >>   drivers/iommu/io-pgtable-arm.c | 31 ++-
> >>   1 file changed, 14 insertions(+), 17 deletions(-)
> >
> > Please can you try the diff below, instead?
>
> Given that the GPU driver's .tlb_add_page is a no-op, I can't see this
> making a difference. In fact, given that msm_iommu_pagetable_unmap()
> still does a brute-force iommu_flush_iotlb_all() after io-pgtable
> returns, and in fact only recently made .tlb_flush_walk start doing
> anything either for the sake of the map path, I'm now really wondering
> how this patch has had any effect at all... :/

Yeah..  and unfortunately the TBU code only supports two devices so
far, so I can't easily repro with TBU enabled atm.  Hmm..
__arm_lpae_unmap() is also called in the ->map() path, although not
sure how that changes things.

BR,
-R

> >
> > Will
> >
> > --->8
> >
> > diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> > index 0e67f1721a3d..0a32e9499e2c 100644
> > --- a/drivers/iommu/io-pgtable-arm.c
> > +++ b/drivers/iommu/io-pgtable-arm.c
> > @@ -672,7 +672,7 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
> >  /* Clear the remaining entries */
> >  __arm_lpae_clear_pte(ptep, &iop->cfg, i);
> >
> > -   if (gather && !iommu_iotlb_gather_queued(gather))
> > +   if (!iommu_iotlb_gather_queued(gather))
>
> Note that this would reintroduce the latent issue which was present
> originally, wherein iommu_iotlb_gather_queued(NULL) is false, but if we
> actually allow a NULL gather to be passed to io_pgtable_tlb_add_page()
> it may end up being dereferenced (e.g. in arm-smmu-v3).
>
> Thanks,
> Robin.
>
> >  for (int j = 0; j < i; j++)
> >  io_pgtable_tlb_add_page(iop, gather, iova + j * size, size);
> >


Re: [PATCH] Revert "iommu/io-pgtable-arm: Optimise non-coherent unmap"

2024-09-05 Thread Rob Clark
On Thu, Sep 5, 2024 at 6:24 AM Robin Murphy  wrote:
>
> On 05/09/2024 1:49 pm, Rob Clark wrote:
> > From: Rob Clark 
> >
> > This reverts commit 85b715a334583488ad7fbd3001fe6fd617b7d4c0.
> >
> > It was causing gpu smmu faults on x1e80100.
> >
> > I _think_ what is causing this is the change in ordering of
> > __arm_lpae_clear_pte() (dma_sync_single_for_device() on the pgtable
> > memory) and io_pgtable_tlb_flush_walk().
>
> As I just commented, how do you believe the order of operations between:
>
> __arm_lpae_clear_pte();
> if (!iopte_leaf()) {
> io_pgtable_tlb_flush_walk();
>
> and:
>
> if (!iopte_leaf()) {
> __arm_lpae_clear_pte();
> io_pgtable_tlb_flush_walk();
>
> fundamentally differs?

from my reading of the original patch, the ordering is the same for
non-leaf nodes, but not for leaf nodes

> I'm not saying there couldn't be some subtle bug in the implementation
> which we've all missed, but I still can't see an issue with the intended
> logic.
>
> >  I'm not entirely sure how
> > this patch is supposed to work correctly in the face of other
> > concurrent translations (to buffers unrelated to the one being
> > unmapped), because after the io_pgtable_tlb_flush_walk() we can have
> > stale data read back into the tlb.
>
> Read back from where? The ex-table PTE which was already set to zero
> before tlb_flush_walk was called?
>
> And isn't the hilariously overcomplicated TBU driver supposed to be
> telling you exactly what happened here? Otherwise I'm going to continue
> to seriously question the purpose of shoehorning that upstream at all...

I guess I could try the TBU driver.  But I already had my patchset to
extract the pgtable walk for gpu devcore dump, and that is telling me
that the CPU view of the pgtable is fine.  Which I think just leaves a
tlbinv problem.  If that is the case, swapping the order of leaf node
cpu cache ops and tlbinv ops seems like the cause.  But maybe I'm
missing something.

BR,
-R
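
(To spell out the suspected ordering hazard -- an illustration of the
hypothesis above, not a confirmed trace:)

	/*
	 *  CPU (unmap path)               SMMU walker (unrelated translation)
	 *  ----------------               -----------------------------------
	 *  *ptep = 0  (write still
	 *   sitting in CPU cache)
	 *  io_pgtable_tlb_flush_walk()
	 *                                 walk fetches the stale, non-zero
	 *                                 PTE from memory and re-caches the
	 *                                 just-invalidated translation
	 *  dma_sync_single_for_device()
	 *   (cleared PTE finally visible)
	 *
	 * i.e. if the cache maintenance for the cleared PTE lands after the
	 * TLB invalidation, the TLB can hold a stale entry for the unmapped
	 * range until the next invalidation.
	 */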

> Thanks,
> Robin.
>
> > Signed-off-by: Rob Clark 
> > ---
> >   drivers/iommu/io-pgtable-arm.c | 31 ++-
> >   1 file changed, 14 insertions(+), 17 deletions(-)
> >
> > diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> > index 16e51528772d..85261baa3a04 100644
> > --- a/drivers/iommu/io-pgtable-arm.c
> > +++ b/drivers/iommu/io-pgtable-arm.c
> > @@ -274,13 +274,13 @@ static void __arm_lpae_sync_pte(arm_lpae_iopte *ptep, 
> > int num_entries,
> >  sizeof(*ptep) * num_entries, 
> > DMA_TO_DEVICE);
> >   }
> >
> > -static void __arm_lpae_clear_pte(arm_lpae_iopte *ptep, struct 
> > io_pgtable_cfg *cfg, int num_entries)
> > +static void __arm_lpae_clear_pte(arm_lpae_iopte *ptep, struct 
> > io_pgtable_cfg *cfg)
> >   {
> > - for (int i = 0; i < num_entries; i++)
> > - ptep[i] = 0;
> >
> > - if (!cfg->coherent_walk && num_entries)
> > - __arm_lpae_sync_pte(ptep, num_entries, cfg);
> > + *ptep = 0;
> > +
> > + if (!cfg->coherent_walk)
> > + __arm_lpae_sync_pte(ptep, 1, cfg);
> >   }
> >
> >   static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
> > @@ -653,28 +653,25 @@ static size_t __arm_lpae_unmap(struct 
> > arm_lpae_io_pgtable *data,
> >   max_entries = ARM_LPAE_PTES_PER_TABLE(data) - unmap_idx_start;
> >   num_entries = min_t(int, pgcount, max_entries);
> >
> > - /* Find and handle non-leaf entries */
> > - for (i = 0; i < num_entries; i++) {
> > - pte = READ_ONCE(ptep[i]);
> > + while (i < num_entries) {
> > + pte = READ_ONCE(*ptep);
> >   if (WARN_ON(!pte))
> >   break;
> >
> > - if (!iopte_leaf(pte, lvl, iop->fmt)) {
> > - __arm_lpae_clear_pte(&ptep[i], &iop->cfg, 1);
> > + __arm_lpae_clear_pte(ptep, &iop->cfg);
> >
> > + if (!iopte_leaf(pte, lvl, iop->fmt)) {
> >   /* Also flush any partial walks */
> >   io_pgtable_tlb_flush_walk(iop, iova + i * 
> > size, size,
> > 
> > ARM_LPAE_GRANULE(data));
> >   __arm_lp

[PATCH] Revert "iommu/io-pgtable-arm: Optimise non-coherent unmap"

2024-09-05 Thread Rob Clark
From: Rob Clark 

This reverts commit 85b715a334583488ad7fbd3001fe6fd617b7d4c0.

It was causing gpu smmu faults on x1e80100.

I _think_ what is causing this is the change in ordering of
__arm_lpae_clear_pte() (dma_sync_single_for_device() on the pgtable
memory) and io_pgtable_tlb_flush_walk().  I'm not entirely sure how
this patch is supposed to work correctly in the face of other
concurrent translations (to buffers unrelated to the one being
unmapped), because after the io_pgtable_tlb_flush_walk() we can have
stale data read back into the tlb.

Signed-off-by: Rob Clark 
---
 drivers/iommu/io-pgtable-arm.c | 31 ++++++++++++++-----------------
 1 file changed, 14 insertions(+), 17 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 16e51528772d..85261baa3a04 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -274,13 +274,13 @@ static void __arm_lpae_sync_pte(arm_lpae_iopte *ptep, int num_entries,
 				   sizeof(*ptep) * num_entries, DMA_TO_DEVICE);
 }
 
-static void __arm_lpae_clear_pte(arm_lpae_iopte *ptep, struct io_pgtable_cfg *cfg, int num_entries)
+static void __arm_lpae_clear_pte(arm_lpae_iopte *ptep, struct io_pgtable_cfg *cfg)
 {
-	for (int i = 0; i < num_entries; i++)
-		ptep[i] = 0;
+	*ptep = 0;
 
-	if (!cfg->coherent_walk && num_entries)
-		__arm_lpae_sync_pte(ptep, num_entries, cfg);
+	if (!cfg->coherent_walk)
+		__arm_lpae_sync_pte(ptep, 1, cfg);
 }
 
 static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
@@ -653,28 +653,25 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
 		max_entries = ARM_LPAE_PTES_PER_TABLE(data) - unmap_idx_start;
 		num_entries = min_t(int, pgcount, max_entries);
 
-		/* Find and handle non-leaf entries */
-		for (i = 0; i < num_entries; i++) {
-			pte = READ_ONCE(ptep[i]);
+		while (i < num_entries) {
+			pte = READ_ONCE(*ptep);
 			if (WARN_ON(!pte))
 				break;
 
-			if (!iopte_leaf(pte, lvl, iop->fmt)) {
-				__arm_lpae_clear_pte(&ptep[i], &iop->cfg, 1);
+			__arm_lpae_clear_pte(ptep, &iop->cfg);
 
+			if (!iopte_leaf(pte, lvl, iop->fmt)) {
 				/* Also flush any partial walks */
 				io_pgtable_tlb_flush_walk(iop, iova + i * size, size,
 							  ARM_LPAE_GRANULE(data));
 				__arm_lpae_free_pgtable(data, lvl + 1, iopte_deref(pte, data));
+			} else if (!iommu_iotlb_gather_queued(gather)) {
+				io_pgtable_tlb_add_page(iop, gather, iova + i * size, size);
 			}
-		}
 
-		/* Clear the remaining entries */
-		__arm_lpae_clear_pte(ptep, &iop->cfg, i);
-
-		if (gather && !iommu_iotlb_gather_queued(gather))
-			for (int j = 0; j < i; j++)
-				io_pgtable_tlb_add_page(iop, gather, iova + j * size, size);
+			ptep++;
+			i++;
+		}
 
 		return i * size;
 	} else if (iopte_leaf(pte, lvl, iop->fmt)) {
-- 
2.46.0



Re: [PATCH v2 6/9] drm/msm/A6xx: Use postamble to reset counters on preemption

2024-09-04 Thread Rob Clark
On Wed, Sep 4, 2024 at 6:39 AM Antonino Maniscalco
 wrote:
>
> On 8/30/24 8:32 PM, Rob Clark wrote:
> > On Fri, Aug 30, 2024 at 8:33 AM Antonino Maniscalco
> >  wrote:
> >>
> >> Use the postamble to reset perf counters when switching between rings,
> >> except when sysprof is enabled, analogously to how they are reset
> >> between submissions when switching pagetables.
> >>
> >> Signed-off-by: Antonino Maniscalco 
> >> ---
> >>   drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 14 +-
> >>   drivers/gpu/drm/msm/adreno/a6xx_gpu.h |  6 ++
> >>   drivers/gpu/drm/msm/adreno/a6xx_preempt.c | 26 +-
> >>   drivers/gpu/drm/msm/adreno/adreno_gpu.h   |  7 +--
> >>   4 files changed, 49 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> >> index 1a90db5759b8..3528ecbbc1ab 100644
> >> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> >> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> >> @@ -366,7 +366,8 @@ static void a6xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
> >>   static void a6xx_emit_set_pseudo_reg(struct msm_ringbuffer *ring,
> >>  struct a6xx_gpu *a6xx_gpu, struct msm_gpu_submitqueue *queue)
> >>   {
> >> -   u64 preempt_offset_priv_secure;
> >> +   bool sysprof = refcount_read(&a6xx_gpu->base.base.sysprof_active) > 1;
> >> +   u64 preempt_offset_priv_secure, preempt_postamble;
> >>
> >>  OUT_PKT7(ring, CP_SET_PSEUDO_REG, 15);
> >>
> >> @@ -403,6 +404,17 @@ static void a6xx_emit_set_pseudo_reg(struct msm_ringbuffer *ring,
> >>  /* seems OK to set to 0 to disable it */
> >>  OUT_RING(ring, 0);
> >>  OUT_RING(ring, 0);
> >> +
> >> +   if (!sysprof && a6xx_gpu->preempt_postamble_len) {
> >> +   preempt_postamble = SCRATCH_PREEMPT_POSTAMBLE_IOVA(a6xx_gpu);
> >> +
> >> +   OUT_PKT7(ring, CP_SET_AMBLE, 3);
> >> +   OUT_RING(ring, lower_32_bits(preempt_postamble));
> >> +   OUT_RING(ring, upper_32_bits(preempt_postamble));
> >> +   OUT_RING(ring, CP_SET_AMBLE_2_DWORDS(
> >> +   a6xx_gpu->preempt_postamble_len) |
> >> +   CP_SET_AMBLE_2_TYPE(KMD_AMBLE_TYPE));
> >> +   }
> >
> > Hmm, ok, we set this in the submit path.. but do we need to clear it
> > somehow when transitioning from !sysprof to sysprof?
> >
>
> We can always emit the packet and 0 fields out when sysprof is enabled.
> Would that be ok for you? Only emitting it when needed might be
> nontrivial given that there are multiple rings and we would be paying
> the overhead for emitting it in the more common case (not profiling) anyway.

That sounds like it would work
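
(For illustration, the always-emit variant could look roughly like this;
a sketch only -- per the v4 changelog the series ended up NOPing out the
postamble to disable it instead:)

	OUT_PKT7(ring, CP_SET_AMBLE, 3);
	if (!sysprof && a6xx_gpu->preempt_postamble_len) {
		OUT_RING(ring, lower_32_bits(preempt_postamble));
		OUT_RING(ring, upper_32_bits(preempt_postamble));
		OUT_RING(ring, CP_SET_AMBLE_2_DWORDS(a6xx_gpu->preempt_postamble_len) |
			       CP_SET_AMBLE_2_TYPE(KMD_AMBLE_TYPE));
	} else {
		/* zero address/size clears any previously installed KMD amble */
		OUT_RING(ring, 0);
		OUT_RING(ring, 0);
		OUT_RING(ring, CP_SET_AMBLE_2_TYPE(KMD_AMBLE_TYPE));
	}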

> > Also, how does this interact with UMD perfctr queries, I would expect
> > they would prefer save/restore?
>
> Right so my understanding given previous discussions is that we want to
> disable preemption from userspace in that case? The vulkan extension
> requires to acquire and release a lock so we could use that to emit the
> packets that enable and disable preemption perhaps.

ack

BR,
-R

> >
> > BR,
> > -R
> >
> >>   }
> >>
> >>   static void a7xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
> >> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> >> index 652e49f01428..2338e36c8f47 100644
> >> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> >> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> >> @@ -66,6 +66,7 @@ struct a6xx_gpu {
> >>  atomic_t preempt_state;
> >>  spinlock_t eval_lock;
> >>  struct timer_list preempt_timer;
> >> +   uint64_t preempt_postamble_len;
> >>
> >>  unsigned int preempt_level;
> >>  bool uses_gmem;
> >> @@ -99,6 +100,11 @@ struct a6xx_gpu {
> >>   #define SCRATCH_USER_CTX_IOVA(ring_id, a6xx_gpu) \
> >>  (a6xx_gpu->scratch_iova + (ring_id * sizeof(uint64_t)))
> >>
> >> +#define SCRATCH_PREEMPT_POSTAMBLE_OFFSET (100 * sizeof(u64))
> >> +
> >> +#define SCRATCH_PREEMPT_POSTAMBLE_IOVA(a6xx_gpu) \
> >> +  

[pull] drm/msm: drm-msm-next-2024-09-02 for v6.12

2024-09-02 Thread Rob Clark
Hi Dave, Sima,

This is the main pull for v6.12.  It ended up a bit smaller this time;
there are a few series on the dpu and gpu side that weren't quite
ready to go this time around.

Further description below.

The following changes since commit 6d0ebb3904853d18eeec7af5e8b4ca351b6f9025:

  Merge tag 'drm-intel-next-2024-08-29' of
https://gitlab.freedesktop.org/drm/i915/kernel into drm-next
(2024-08-30 13:41:32 +1000)

are available in the Git repository at:

  https://gitlab.freedesktop.org/drm/msm.git tags/drm-msm-next-2024-09-02

for you to fetch changes up to 15302579373ed2c8ada629e9e7bcf9569393a48d:

  drm/msm/dpu: enable writeback on SM6350 (2024-09-02 02:53:44 +0300)


Updates for v6.12

DPU:
- Fix/implement DP/PHY mapping on SC8180X
- Enable writeback on SM8150, SC8180X, SM6125, SM6350

DP:
- Enable widebus on all relevant chipsets

DSI:
- Fix PHY programming on SM8350 / SM8450

HDMI:
- Add support for HDMI on MSM8998

MDP5:
- NULL string fix

GPU:
- A642L speedbin support
- A615 support
- A306 support
- A621 support
- Expand UBWC uapi
- A7xx GPU devcoredump fixes
- A5xx preemption fixes
- cleanups


Abhinav Kumar (1):
  drm/msm/dp: enable widebus on all relevant chipsets

Aleksandr Mishin (1):
  drm/msm: Fix incorrect file name output in adreno_request_fw()

Arnaud Vrac (1):
  drm/msm: add msm8998 hdmi phy/pll support

Connor Abbott (7):
  drm/msm: Use a7xx family directly in gpu_state
  drm/msm: Dump correct dbgahb clusters on a750
  drm/msm: Fix CP_BV_DRAW_STATE_ADDR name
  drm/msm: Update a6xx register XML
  drm/msm: Expand UBWC config setting
  drm/msm: Expose expanded UBWC config uapi
  drm/msm: Fix UBWC macrotile_mode for a680

Dmitry Baryshkov (6):
  drm/msm/dpu: Configure DP INTF/PHY selector
  drm/msm/dsi: correct programming sequence for SM8350 / SM8450
  drm/msm/dpu: enable writeback on SM8150
  drm/msm/dpu: enable writeback on SC8180X
  drm/msm/dpu: enable writeback on SM6125
  drm/msm/dpu: enable writeback on SM6350

Eugene Lepshy (1):
  drm/msm/a6xx: Add A642L speedbin (0x81)

Konrad Dybcio (7):
  drm/msm/adreno: Assign msm_gpu->pdev earlier to avoid nullptrs
  drm/msm/a6xx: Evaluate adreno_is_a650_family in pdc_in_aop check
  drm/msm/a6xx: Store primFifoThreshold in struct a6xx_info
  drm/msm/a6xx: Store correct gmu_cgc_mode in struct a6xx_info
  drm/msm/a6xx: Use the per-GPU value for gmu_cgc_mode
  drm/msm/a6xx: Set GMU CGC properties on a6xx too
  drm/msm/a6xx: Add A621 support

Laurent Pinchart (1):
  drm/msm: Remove prototypes for non-existing functions

Li Zetao (1):
  drm/msm/adreno: Use kvmemdup to simplify the code

Marc Gonzalez (3):
  dt-bindings: phy: add qcom,hdmi-phy-8998
  dt-bindings: display/msm: hdmi: add qcom,hdmi-tx-8998
  drm/msm/hdmi: add "qcom,hdmi-tx-8998" compatible

Otto Pflüger (1):
  drm/msm/adreno: Add A306A support

Richard Acayan (1):
  drm/msm/adreno: add a615 support

Rob Clark (1):
  drm/msm: Remove unused pm_state

Sherry Yang (1):
  drm/msm: fix %s null argument error

Vladimir Lypak (4):
  drm/msm/a5xx: disable preemption in submits by default
  drm/msm/a5xx: properly clear preemption records on resume
  drm/msm/a5xx: fix races in preemption evaluation stage
  drm/msm/a5xx: workaround early ring-buffer emptiness check

 .../devicetree/bindings/display/msm/hdmi.yaml  |   28 +-
 .../devicetree/bindings/phy/qcom,hdmi-phy-qmp.yaml |1 +
 drivers/gpu/drm/msm/Makefile   |1 +
 drivers/gpu/drm/msm/adreno/a3xx_catalog.c  |   11 +
 drivers/gpu/drm/msm/adreno/a3xx_gpu.c  |   14 +-
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c  |   16 +-
 drivers/gpu/drm/msm/adreno/a5xx_gpu.h  |2 +
 drivers/gpu/drm/msm/adreno/a5xx_preempt.c  |   30 +-
 drivers/gpu/drm/msm/adreno/a6xx_catalog.c  |  141 ++-
 drivers/gpu/drm/msm/adreno/a6xx_gmu.c  |   21 +-
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c  |   89 +-
 drivers/gpu/drm/msm/adreno/a6xx_gpu.h  |2 +
 drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c|   46 +-
 .../gpu/drm/msm/adreno/adreno_gen7_9_0_snapshot.h  |2 +-
 drivers/gpu/drm/msm/adreno/adreno_gpu.c|   15 +-
 drivers/gpu/drm/msm/adreno/adreno_gpu.h|   51 +-
 .../gpu/drm/msm/disp/dpu1/catalog/dpu_5_0_sm8150.h |   18 +
 .../drm/msm/disp/dpu1/catalog/dpu_5_1_sc8180x.h|   18 +
 .../gpu/drm/msm/disp/dpu1/catalog/dpu_5_4_sm6125.h |   18 +
 .../gpu/drm/msm/disp/dpu1/catalog/dpu_6_4_sm6350.h |   18 +
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c |6 +-
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_top.c |   41 +-
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_top.h |   18 +-
 drivers/gpu/drm/msm/disp/dpu1/dpu_hwio.h

Re: [PATCH v2 4/9] drm/msm/A6xx: Implement preemption for A7XX targets

2024-08-30 Thread Rob Clark
On Fri, Aug 30, 2024 at 8:33 AM Antonino Maniscalco
 wrote:
>
> This patch implements the preemption feature for A6xx targets; this allows
> the GPU to switch to a higher priority ringbuffer if one is ready. A6XX
> hardware as such supports multiple levels of preemption granularities,
> ranging from coarse grained (ringbuffer level) to a more fine grained
> such as draw-call level or a bin boundary level preemption. This patch
> enables the basic preemption level, with more fine grained preemption
> support to follow.
>
> Signed-off-by: Sharat Masetty 
> Signed-off-by: Antonino Maniscalco 
> Tested-by: Neil Armstrong  # on SM8650-QRD
> ---
>  drivers/gpu/drm/msm/Makefile  |   1 +
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 323 +-
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.h | 168 
>  drivers/gpu/drm/msm/adreno/a6xx_preempt.c | 431 ++
>  drivers/gpu/drm/msm/msm_ringbuffer.h  |   7 +
>  5 files changed, 921 insertions(+), 9 deletions(-)
>

[snip]

> +void a6xx_preempt_trigger(struct msm_gpu *gpu)
> +{
> +   struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> +   struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> +   u64 preempt_offset_priv_secure;
> +   unsigned long flags;
> +   struct msm_ringbuffer *ring;
> +   uint64_t user_ctx_iova;
> +   unsigned int cntl;
> +
> +   if (gpu->nr_rings == 1)
> +   return;
> +
> +   /*
> +* Lock to make sure another thread attempting preemption doesn't skip it
> +* while we are still evaluating the next ring. This makes sure the other
> +* thread does start preemption if we abort it and avoids a soft lock.
> +*/
> +   spin_lock_irqsave(&a6xx_gpu->eval_lock, flags);
> +
> +   /*
> +* Try to start preemption by moving from NONE to START. If
> +* unsuccessful, a preemption is already in flight
> +*/
> +   if (!try_preempt_state(a6xx_gpu, PREEMPT_NONE, PREEMPT_START)) {
> +   spin_unlock_irqrestore(&a6xx_gpu->eval_lock, flags);
> +   return;
> +   }
> +
> +   cntl = (((a6xx_gpu->preempt_level << 6) & 0xC0) |
> +   ((a6xx_gpu->skip_save_restore << 9) & 0x200) |
> +   ((a6xx_gpu->uses_gmem << 8) & 0x100) | 0x1);

nit, could we define these fields in the xml, and not open-code
register building?

BR,
-R
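
(i.e. something along these lines once the bitfields are in a6xx.xml --
the LEVEL field name matches the v3 changelog entry; the other field
names here are assumptions:)

	cntl = A6XX_CP_CONTEXT_SWITCH_CNTL_LEVEL(a6xx_gpu->preempt_level) |
	       A6XX_CP_CONTEXT_SWITCH_CNTL_STOP |
	       (a6xx_gpu->skip_save_restore ?
			A6XX_CP_CONTEXT_SWITCH_CNTL_SKIP_SAVE_RESTORE : 0) |
	       (a6xx_gpu->uses_gmem ? A6XX_CP_CONTEXT_SWITCH_CNTL_USES_GMEM : 0);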


Re: [PATCH v2 4/9] drm/msm/A6xx: Implement preemption for A7XX targets

2024-08-30 Thread Rob Clark
On Fri, Aug 30, 2024 at 8:33 AM Antonino Maniscalco
 wrote:
>
> This patch implements the preemption feature for A6xx targets; this allows
> the GPU to switch to a higher priority ringbuffer if one is ready. A6XX
> hardware as such supports multiple levels of preemption granularities,
> ranging from coarse grained (ringbuffer level) to a more fine grained
> such as draw-call level or a bin boundary level preemption. This patch
> enables the basic preemption level, with more fine grained preemption
> support to follow.
>
> Signed-off-by: Sharat Masetty 
> Signed-off-by: Antonino Maniscalco 
> Tested-by: Neil Armstrong  # on SM8650-QRD
> ---
>  drivers/gpu/drm/msm/Makefile  |   1 +
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 323 +-
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.h | 168 
>  drivers/gpu/drm/msm/adreno/a6xx_preempt.c | 431 ++
>  drivers/gpu/drm/msm/msm_ringbuffer.h  |   7 +
>  5 files changed, 921 insertions(+), 9 deletions(-)
>

[snip]

> @@ -784,6 +1062,16 @@ static int a6xx_ucode_load(struct msm_gpu *gpu)
> msm_gem_object_set_name(a6xx_gpu->shadow_bo, "shadow");
> }
>
> +   a6xx_gpu->pwrup_reglist_ptr = msm_gem_kernel_new(gpu->dev, PAGE_SIZE,
> +            MSM_BO_WC | MSM_BO_MAP_PRIV,
> +            gpu->aspace, &a6xx_gpu->pwrup_reglist_bo,
> +            &a6xx_gpu->pwrup_reglist_iova);

I guess this could also be MSM_BO_GPU_READONLY?

BR,
-R
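
(i.e., assuming the reglist is only ever written from the CPU side, the
allocation above would become -- a sketch:)

	a6xx_gpu->pwrup_reglist_ptr = msm_gem_kernel_new(gpu->dev, PAGE_SIZE,
			MSM_BO_WC | MSM_BO_MAP_PRIV | MSM_BO_GPU_READONLY,
			gpu->aspace, &a6xx_gpu->pwrup_reglist_bo,
			&a6xx_gpu->pwrup_reglist_iova);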


Re: [PATCH v2 4/9] drm/msm/A6xx: Implement preemption for A7XX targets

2024-08-30 Thread Rob Clark
On Fri, Aug 30, 2024 at 12:09 PM Connor Abbott  wrote:
>
> On Fri, Aug 30, 2024 at 8:00 PM Rob Clark  wrote:
> >
> > On Fri, Aug 30, 2024 at 11:54 AM Connor Abbott  wrote:
> > >
> > > On Fri, Aug 30, 2024 at 7:08 PM Rob Clark  wrote:
> > > >
> > > > On Fri, Aug 30, 2024 at 8:33 AM Antonino Maniscalco
> > > >  wrote:
> > > > >
> > > > > This patch implements the preemption feature for A6xx targets; this allows
> > > > > the GPU to switch to a higher priority ringbuffer if one is ready. A6XX
> > > > > hardware as such supports multiple levels of preemption granularities,
> > > > > ranging from coarse grained (ringbuffer level) to a more fine grained
> > > > > such as draw-call level or a bin boundary level preemption. This patch
> > > > > enables the basic preemption level, with more fine grained preemption
> > > > > support to follow.
> > > > >
> > > > > Signed-off-by: Sharat Masetty 
> > > > > Signed-off-by: Antonino Maniscalco 
> > > > > Tested-by: Neil Armstrong  # on SM8650-QRD
> > > > > ---
> > > > >  drivers/gpu/drm/msm/Makefile  |   1 +
> > > > >  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 323 
> > > > > +-
> > > > >  drivers/gpu/drm/msm/adreno/a6xx_gpu.h | 168 
> > > > >  drivers/gpu/drm/msm/adreno/a6xx_preempt.c | 431 ++
> > > > >  drivers/gpu/drm/msm/msm_ringbuffer.h  |   7 +
> > > > >  5 files changed, 921 insertions(+), 9 deletions(-)
> > > > >
> > > >
> > > > [snip]
> > > >
> > > > > +
> > > > > +int a6xx_preempt_submitqueue_setup(struct msm_gpu *gpu,
> > > > > +   struct msm_gpu_submitqueue *queue)
> > > > > +{
> > > > > +   void *ptr;
> > > > > +
> > > > > +   /*
> > > > +* Create a per submitqueue buffer for the CP to save and restore user
> > > > > +* specific information such as the VPC streamout data.
> > > > > +*/
> > > > +   ptr = msm_gem_kernel_new(gpu->dev, A6XX_PREEMPT_USER_RECORD_SIZE,
> > > > +   MSM_BO_WC, gpu->aspace, &queue->bo, &queue->bo_iova);
> > > >
> > > > Can this be MSM_BO_MAP_PRIV?  Otherwise it is visible (and writeable)
> > > > by other processes' userspace-generated cmdstream.
> > > >
> > > > And a similar question for the scratch_bo..  I'd have to give some
> > > > thought to what sort of mischief could be had, but generally kernel
> > > > mappings that are not MAP_PRIV are a thing to be careful about.
> > > >
> > >
> > > It seems like the idea behind this is that it's supposed to be
> > > per-context. kgsl allocates it as part of the context, in the
> > > userspace address space. Then, in order to know which user record to
> > > use when preempting, before each submit (although really it only
> > > needs to be done when setting the pagetable) it does a CP_MEM_WRITE
> > > of the user record address to a scratch buffer holding an array of
> > > the current user record for each ring. When preempting, it reads the
> > > address for the next ring from the scratch buffer and sets it. I
> > > think we need to do that dance too.
> >
> > Moving it into userspace's address space (vm) would be better.
> >
> > I assume the preempt record is where state is saved/restored?  So
> > would need to be in kernel aspace/vm?  Or is the fw changing ttbr0
> > after saving state but before restoring?
> >
> > BR,
> > -R
>
> The preempt record is split into a number of pieces, each with their
> own address. One of those pieces is the SMMU record with ttbr0 and
> other SMMU things. Another piece is the "private" context record with
> sensitive things like RB address/rptr/wptr, although actually the bulk
> of the registers are saved here. Then the user or "non-private" record
> is its own piece, which is presumably saved before switching ttbr0 and
> restored after the SMMU record is restored and ttbr0 is switched.
>

Ok, and all these are offsets in the preempt record.. but that part is
allocated with MAP_PRIV, so that part should be ok.
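
The tracking part would be something like this on our side (untested
sketch, assuming a per-ring u64 slot in the scratch bo, along the lines
of a SCRATCH_USER_CTX_IOVA() macro):

   u64 slot_iova = SCRATCH_USER_CTX_IOVA(ring->id, a6xx_gpu);

   /* Have the CP record which user record belongs to this ring, so
    * the preempt sequence can look it up for the incoming ring:
    */
   OUT_PKT7(ring, CP_MEM_WRITE, 4);
   OUT_RING(ring, lower_32_bits(slot_iova));
   OUT_RING(ring, upper_32_bits(slot_iova));
   OUT_RING(ring, lower_32_bits(submit->queue->bo_iova));
   OUT_RING(ring, upper_32_bits(submit->queue->bo_iova));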

Why is the VPC streamout state handled differently?

BR,
-R


Re: [PATCH v2 4/9] drm/msm/A6xx: Implement preemption for A7XX targets

2024-08-30 Thread Rob Clark
On Fri, Aug 30, 2024 at 11:54 AM Connor Abbott  wrote:
>
> On Fri, Aug 30, 2024 at 7:08 PM Rob Clark  wrote:
> >
> > On Fri, Aug 30, 2024 at 8:33 AM Antonino Maniscalco
> >  wrote:
> > >
> > > This patch implements the preemption feature for A6xx targets, which
> > > allows the GPU to switch to a higher-priority ringbuffer if one is
> > > ready. A6XX hardware supports multiple levels of preemption
> > > granularity, ranging from coarse-grained (ringbuffer-level) to
> > > finer-grained options such as draw-call-level or bin-boundary-level
> > > preemption. This patch enables the basic preemption level, with more
> > > fine-grained preemption support to follow.
> > >
> > > Signed-off-by: Sharat Masetty 
> > > Signed-off-by: Antonino Maniscalco 
> > > Tested-by: Neil Armstrong  # on SM8650-QRD
> > > ---
> > >  drivers/gpu/drm/msm/Makefile  |   1 +
> > >  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 323 +-
> > >  drivers/gpu/drm/msm/adreno/a6xx_gpu.h | 168 
> > >  drivers/gpu/drm/msm/adreno/a6xx_preempt.c | 431 ++
> > >  drivers/gpu/drm/msm/msm_ringbuffer.h  |   7 +
> > >  5 files changed, 921 insertions(+), 9 deletions(-)
> > >
> >
> > [snip]
> >
> > > +
> > > +int a6xx_preempt_submitqueue_setup(struct msm_gpu *gpu,
> > > +   struct msm_gpu_submitqueue *queue)
> > > +{
> > > +   void *ptr;
> > > +
> > > +   /*
> > > +* Create a per submitqueue buffer for the CP to save and restore 
> > > user
> > > +* specific information such as the VPC streamout data.
> > > +*/
> > > +   ptr = msm_gem_kernel_new(gpu->dev, A6XX_PREEMPT_USER_RECORD_SIZE,
> > > +   MSM_BO_WC, gpu->aspace, &queue->bo, &queue->bo_iova);
> >
> > Can this be MSM_BO_MAP_PRIV?  Otherwise it is visible (and writeable)
> > by other process's userspace-generated cmdstream.
> >
> > And a similar question for the scratch_bo..  I'd have to give some
> > thought to what sort of mischief could be had, but generally kernel
> > mappings that are not MAP_PRIV are a thing to be careful about.
> >
>
> It seems like the idea behind this is that it's supposed to be
> per-context. kgsl allocates it as part of the context, in the
> userspace address space. Then, in order to know which user record to
> use when preempting, before each submit (although really it only
> needs to be done when setting the pagetable) it does a CP_MEM_WRITE
> of the user record address to a scratch buffer holding an array of
> the current user record for each ring. When preempting, it reads the
> address for the next ring from the scratch buffer and sets it. I
> think we need to do that dance too.

Moving it into userspace's address space (vm) would be better.

I assume the preempt record is where state is saved/restored?  So
would need to be in kernel aspace/vm?  Or is the fw changing ttbr0
after saving state but before restoring?

BR,
-R

> Connor
>
> > BR,
> > -R


Re: [PATCH v2 6/9] drm/msm/A6xx: Use postamble to reset counters on preemption

2024-08-30 Thread Rob Clark
On Fri, Aug 30, 2024 at 8:33 AM Antonino Maniscalco
 wrote:
>
> Use the postamble to reset perf counters when switching between rings,
> except when sysprof is enabled, analogously to how they are reset
> between submissions when switching pagetables.
>
> Signed-off-by: Antonino Maniscalco 
> ---
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 14 +-
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.h |  6 ++
>  drivers/gpu/drm/msm/adreno/a6xx_preempt.c | 26 +-
>  drivers/gpu/drm/msm/adreno/adreno_gpu.h   |  7 +--
>  4 files changed, 49 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
> b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index 1a90db5759b8..3528ecbbc1ab 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -366,7 +366,8 @@ static void a6xx_submit(struct msm_gpu *gpu, struct 
> msm_gem_submit *submit)
>  static void a6xx_emit_set_pseudo_reg(struct msm_ringbuffer *ring,
> struct a6xx_gpu *a6xx_gpu, struct msm_gpu_submitqueue *queue)
>  {
> -   u64 preempt_offset_priv_secure;
> +   bool sysprof = refcount_read(&a6xx_gpu->base.base.sysprof_active) > 1;
> +   u64 preempt_offset_priv_secure, preempt_postamble;
>
> OUT_PKT7(ring, CP_SET_PSEUDO_REG, 15);
>
> @@ -403,6 +404,17 @@ static void a6xx_emit_set_pseudo_reg(struct 
> msm_ringbuffer *ring,
> /* seems OK to set to 0 to disable it */
> OUT_RING(ring, 0);
> OUT_RING(ring, 0);
> +
> +   if (!sysprof && a6xx_gpu->preempt_postamble_len) {
> +   preempt_postamble = SCRATCH_PREEMPT_POSTAMBLE_IOVA(a6xx_gpu);
> +
> +   OUT_PKT7(ring, CP_SET_AMBLE, 3);
> +   OUT_RING(ring, lower_32_bits(preempt_postamble));
> +   OUT_RING(ring, upper_32_bits(preempt_postamble));
> +   OUT_RING(ring, CP_SET_AMBLE_2_DWORDS(
> +   a6xx_gpu->preempt_postamble_len) |
> +   CP_SET_AMBLE_2_TYPE(KMD_AMBLE_TYPE));
> +   }

Hmm, ok, we set this in the submit path.. but do we need to clear it
somehow when transitioning from !sysprof to sysprof?
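
i.e. maybe an else branch along these lines (untested sketch, assuming
a zero-dword KMD amble effectively disables it):

   } else {
           /* Clear out any previously set KMD amble: */
           OUT_PKT7(ring, CP_SET_AMBLE, 3);
           OUT_RING(ring, 0);
           OUT_RING(ring, 0);
           OUT_RING(ring, CP_SET_AMBLE_2_DWORDS(0) |
                          CP_SET_AMBLE_2_TYPE(KMD_AMBLE_TYPE));
   }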

Also, how does this interact with UMD perfctr queries, I would expect
they would prefer save/restore?

BR,
-R

>  }
>
>  static void a7xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h 
> b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> index 652e49f01428..2338e36c8f47 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> @@ -66,6 +66,7 @@ struct a6xx_gpu {
> atomic_t preempt_state;
> spinlock_t eval_lock;
> struct timer_list preempt_timer;
> +   uint64_t preempt_postamble_len;
>
> unsigned int preempt_level;
> bool uses_gmem;
> @@ -99,6 +100,11 @@ struct a6xx_gpu {
>  #define SCRATCH_USER_CTX_IOVA(ring_id, a6xx_gpu) \
> (a6xx_gpu->scratch_iova + (ring_id * sizeof(uint64_t)))
>
> +#define SCRATCH_PREEMPT_POSTAMBLE_OFFSET (100 * sizeof(u64))
> +
> +#define SCRATCH_PREEMPT_POSTAMBLE_IOVA(a6xx_gpu) \
> +   (a6xx_gpu->scratch_iova + SCRATCH_PREEMPT_POSTAMBLE_OFFSET)
> +
>  /*
>   * In order to do lockless preemption we use a simple state machine to 
> progress
>   * through the process.
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_preempt.c 
> b/drivers/gpu/drm/msm/adreno/a6xx_preempt.c
> index 4b61b993f75f..f586615db97e 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_preempt.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_preempt.c
> @@ -351,6 +351,28 @@ static int preempt_init_ring(struct a6xx_gpu *a6xx_gpu,
> return 0;
>  }
>
> +static void preempt_prepare_postamble(struct a6xx_gpu *a6xx_gpu)
> +{
> +   u32 *postamble = a6xx_gpu->scratch_ptr + SCRATCH_PREEMPT_POSTAMBLE_OFFSET;
> +   u32 count = 0;
> +
> +   postamble[count++] = PKT7(CP_REG_RMW, 3);
> +   postamble[count++] = REG_A6XX_RBBM_PERFCTR_SRAM_INIT_CMD;
> +   postamble[count++] = 0;
> +   postamble[count++] = 1;
> +
> +   postamble[count++] = PKT7(CP_WAIT_REG_MEM, 6);
> +   postamble[count++] = CP_WAIT_REG_MEM_0_FUNCTION(WRITE_EQ);
> +   postamble[count++] = CP_WAIT_REG_MEM_1_POLL_ADDR_LO(
> +   REG_A6XX_RBBM_PERFCTR_SRAM_INIT_STATUS);
> +   postamble[count++] = CP_WAIT_REG_MEM_2_POLL_ADDR_HI(0);
> +   postamble[count++] = CP_WAIT_REG_MEM_3_REF(0x1);
> +   postamble[count++] = CP_WAIT_REG_MEM_4_MASK(0x1);
> +   postamble[count++] = CP_WAIT_REG_MEM_5_DELAY_LOOP_CYCLES(0);
> +
> +   a6xx_gpu->preempt_postamble_len = count;
> +}
> +
>  void a6xx_preempt_fini(struct msm_gpu *gpu)
>  {
> struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> @@ -382,10 +404,12 @@ void a6xx_preempt_init(struct msm_gpu *gpu)
> a6xx_gpu->skip_save_restore = 1;
>
> a6xx_gpu->scratch_ptr  = msm_gem_kernel_new(gpu->dev,
> -  

Re: [PATCH v2 9/9] drm/msm/A6xx: Enable preemption for A750

2024-08-30 Thread Rob Clark
On Fri, Aug 30, 2024 at 8:33 AM Antonino Maniscalco
 wrote:
>
> Initialize with 4 rings to enable preemption.
>
> For now only on A750 as other targets require testing.
>
> Signed-off-by: Antonino Maniscalco 
> Tested-by: Neil Armstrong  # on SM8650-QRD
> ---
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
> b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index a2853309288b..ea518209c03d 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -2609,7 +2609,9 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
> return ERR_PTR(ret);
> }
>
> -   if (is_a7xx)
> +   if (adreno_is_a750(adreno_gpu))
> +   ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs_a7xx, 4);

perhaps it would be useful (to enable others to more easily test) to
allow this to be overridden with a modparam.. something like

   if ((enable_preemption == 1) || (enable_preemption == -1 &&
       (config->info->quirks & ADRENO_QUIRK_PREEMPTION)))

That would allow overriding enable_preemption to 1 or 0 on cmdline to
force enable/disable preemption.
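
A rough sketch of the modparam part (hypothetical names;
ADRENO_QUIRK_PREEMPTION would be a new quirk flag set in the catalog
for targets known to work):

   /* -1 = per-GPU default, 0 = force off, 1 = force on: */
   static int enable_preemption = -1;
   module_param(enable_preemption, int, 0600);
   MODULE_PARM_DESC(enable_preemption, "1=on, 0=disable, -1=auto (default)");

and then a6xx_gpu_init() would pick nr_rings based on that condition
instead of hard-coding the a750 check.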

That plus some instructions about how to test preemption (i.e. what
deqp tests to run, or similar) would make it easier to "crowd source"
the testing (assuming you don't have every a7xx device there is).

BR,
-R

> +   else if (is_a7xx)
> ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs_a7xx, 1);
> else if (adreno_has_gmu_wrapper(adreno_gpu))
> ret = adreno_gpu_init(dev, pdev, adreno_gpu, 
> &funcs_gmuwrapper, 1);
>
> --
> 2.46.0
>


Re: [PATCH v2 4/9] drm/msm/A6xx: Implement preemption for A7XX targets

2024-08-30 Thread Rob Clark
On Fri, Aug 30, 2024 at 8:33 AM Antonino Maniscalco
 wrote:
>
> This patch implements the preemption feature for A6xx targets, which
> allows the GPU to switch to a higher-priority ringbuffer if one is
> ready. A6XX hardware supports multiple levels of preemption
> granularity, ranging from coarse-grained (ringbuffer-level) to
> finer-grained options such as draw-call-level or bin-boundary-level
> preemption. This patch enables the basic preemption level, with more
> fine-grained preemption support to follow.
>
> Signed-off-by: Sharat Masetty 
> Signed-off-by: Antonino Maniscalco 
> Tested-by: Neil Armstrong  # on SM8650-QRD
> ---
>  drivers/gpu/drm/msm/Makefile  |   1 +
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 323 +-
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.h | 168 
>  drivers/gpu/drm/msm/adreno/a6xx_preempt.c | 431 ++
>  drivers/gpu/drm/msm/msm_ringbuffer.h  |   7 +
>  5 files changed, 921 insertions(+), 9 deletions(-)
>

[snip]

> +
> +int a6xx_preempt_submitqueue_setup(struct msm_gpu *gpu,
> +   struct msm_gpu_submitqueue *queue)
> +{
> +   void *ptr;
> +
> +   /*
> +* Create a per submitqueue buffer for the CP to save and restore user
> +* specific information such as the VPC streamout data.
> +*/
> +   ptr = msm_gem_kernel_new(gpu->dev, A6XX_PREEMPT_USER_RECORD_SIZE,
> +   MSM_BO_WC, gpu->aspace, &queue->bo, &queue->bo_iova);

Can this be MSM_BO_MAP_PRIV?  Otherwise it is visible (and writeable)
by other process's userspace-generated cmdstream.

And a similar question for the scratch_bo..  I'd have to give some
thought to what sort of mischief could be had, but generally kernel
mappings that are not MAP_PRIV are a thing to be careful about.
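
i.e. (untested, assuming the CP doesn't need this mapped in the user's
address space):

   ptr = msm_gem_kernel_new(gpu->dev, A6XX_PREEMPT_USER_RECORD_SIZE,
                            MSM_BO_WC | MSM_BO_MAP_PRIV, gpu->aspace,
                            &queue->bo, &queue->bo_iova);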

BR,
-R


Re: [PATCH 4/7] drm/msm/A6xx: Implement preemption for A7XX targets

2024-08-28 Thread Rob Clark
On Wed, Aug 28, 2024 at 6:42 AM Rob Clark  wrote:
>
> On Tue, Aug 27, 2024 at 3:56 PM Antonino Maniscalco
>  wrote:
> >
> > On 8/27/24 11:07 PM, Rob Clark wrote:
> > > On Tue, Aug 27, 2024 at 1:25 PM Antonino Maniscalco
> > >  wrote:
> > >>
> > >> On 8/27/24 9:48 PM, Akhil P Oommen wrote:
> > >>> On Fri, Aug 23, 2024 at 10:23:48AM +0100, Connor Abbott wrote:
> > >>>> On Fri, Aug 23, 2024 at 10:21 AM Connor Abbott  
> > >>>> wrote:
> > >>>>>
> > >>>>> On Thu, Aug 22, 2024 at 9:06 PM Akhil P Oommen 
> > >>>>>  wrote:
> > >>>>>>
> > >>>>>> On Wed, Aug 21, 2024 at 05:02:56PM +0100, Connor Abbott wrote:
> > >>>>>>> On Mon, Aug 19, 2024 at 9:09 PM Akhil P Oommen 
> > >>>>>>>  wrote:
> > >>>>>>>>
> > >>>>>>>> On Thu, Aug 15, 2024 at 08:26:14PM +0200, Antonino Maniscalco 
> > >>>>>>>> wrote:
> > >>>>>>>>> This patch implements the preemption feature for A6xx targets,
> > >>>>>>>>> which allows the GPU to switch to a higher-priority ringbuffer
> > >>>>>>>>> if one is ready. A6XX hardware supports multiple levels of
> > >>>>>>>>> preemption granularity, ranging from coarse-grained
> > >>>>>>>>> (ringbuffer-level) to finer-grained options such as
> > >>>>>>>>> draw-call-level or bin-boundary-level preemption. This patch
> > >>>>>>>>> enables the basic preemption level, with more fine-grained
> > >>>>>>>>> preemption support to follow.
> > >>>>>>>>>
> > >>>>>>>>> Signed-off-by: Sharat Masetty 
> > >>>>>>>>> Signed-off-by: Antonino Maniscalco 
> > >>>>>>>>> ---
> > >>>>>>>>
> > >>>>>>>> No postamble packets which resets perfcounters? It is necessary. 
> > >>>>>>>> Also, I
> > >>>>>>>> think we should disable preemption during profiling like we 
> > >>>>>>>> disable slumber.
> > >>>>>>>>
> > >>>>>>>> -Akhil.
> > >>>>>>>
> > >>>>>>> I don't see anything in kgsl which disables preemption during
> > >>>>>>> profiling. It disables resetting perfcounters when doing system-wide
> > >>>>>>> profiling, like freedreno, and in that case I assume preempting is
> > >>>>>>> fine because the system profiler has a complete view of everything 
> > >>>>>>> and
> > >>>>>>> should "see" preemptions through the traces. For something like
> > >>>>>>> VK_KHR_performance_query I suppose we'd want to disable preemption
> > >>>>>>> because we disable saving/restoring perf counters, but that has to
> > >>>>>>> happen in userspace because the kernel doesn't know what userspace
> > >>>>>>> does.
> > >>>>>>>
> > >>>>>>
> > >>>>>> KGSL does some sort of arbitration of perfcounter configurations and
> > >>>>>> adds the select/enablement reg configuration as part of the dynamic
> > >>>>>> power-up register list, which we are not doing here. Is this something
> > >>>>>> you are taking care of from userspace via preamble?
> > >>>>>>
> > >>>>>> -Akhil
> > >>>>>
> > >>>>> I don't think we have to take care of that in userspace, because Mesa
> > >>>>> will always configure the counter registers before reading them in the
> > >>>>> same submission, and if it gets preempted in the meantime then we're
> > >>>>> toast anyways (due to not saving/restoring perf counters). kgsl sets
> > >>>>> them from userspace, which is why it

Re: [PATCH 3/5] drm/msm/a6xx: Store gmu_cgc_mode in struct a6xx_info

2024-08-28 Thread Rob Clark
On Wed, Aug 28, 2024 at 4:16 AM Konrad Dybcio  wrote:
>
> On 27.08.2024 10:12 PM, Rob Clark wrote:
> > resending with updated Konrad email addr
> >
> > On Mon, Aug 26, 2024 at 2:09 PM Rob Clark  wrote:
> >>
> >> On Mon, Aug 26, 2024 at 2:07 PM Rob Clark  wrote:
> >>>
> >>> On Fri, Jul 19, 2024 at 3:03 AM Konrad Dybcio  
> >>> wrote:
> >>>>
> >>>> This was apparently almost never set on a6xx.. move the existing values
> >>>> and fill out the remaining ones within the catalog.
> >>>>
> >>>> Signed-off-by: Konrad Dybcio 
> >>>> ---
>
> [...]
>
> >>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> >>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> >>>> @@ -402,7 +402,7 @@ static void a6xx_set_hwcg(struct msm_gpu *gpu, bool 
> >>>> state)
> >>>> struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
> >>>> const struct adreno_reglist *reg;
> >>>> unsigned int i;
> >>>> -   u32 val, clock_cntl_on, cgc_mode;
> >>>> +   u32 val, clock_cntl_on;
> >>>>
> >>>> if (!(adreno_gpu->info->a6xx->hwcg || 
> >>>> adreno_is_a7xx(adreno_gpu)))
> >>>> return;
> >>>> @@ -417,10 +417,8 @@ static void a6xx_set_hwcg(struct msm_gpu *gpu, bool 
> >>>> state)
> >>>> clock_cntl_on = 0x8aa8aa82;
> >>>>
> >>>> if (adreno_is_a7xx(adreno_gpu)) {
> >>>> -   cgc_mode = adreno_is_a740_family(adreno_gpu) ? 0x20222 : 
> >>>> 0x2;
> >>>> -
> >>>
> >>> This does appear to change the gmu_cgc_mode in nearly all cases.. was
> >>> this intended?
> >>
> >> Hmm, and this will only get written for a7xx, so we're dropping the
> >> reg write for a690..
>
> Right, this patch is a lot to chew through.. It:
>
> - adds the proper magic value per gpu gen
> - removes the sneaky a690 write
> - uses the new struct entry
>
> but also
>
> - fails to remove the if (a7xx) check
>
> so I suppose for v2 I can split it into:
>
> 1. add the magic values
> 2. fix the if (a7xx) check
> 3. use the struct value and drop the a690 one
>
> does that sound good?

Yeah, I would prefer if it were split up to make it clear that the
magic value changes were intentional

PS: there is a _bit_ more time to get this into msm-next for v6.12,
but not much

BR,
-R

> Konrad


Re: [PATCH 4/7] drm/msm/A6xx: Implement preemption for A7XX targets

2024-08-28 Thread Rob Clark
On Tue, Aug 27, 2024 at 3:56 PM Antonino Maniscalco
 wrote:
>
> On 8/27/24 11:07 PM, Rob Clark wrote:
> > On Tue, Aug 27, 2024 at 1:25 PM Antonino Maniscalco
> >  wrote:
> >>
> >> On 8/27/24 9:48 PM, Akhil P Oommen wrote:
> >>> On Fri, Aug 23, 2024 at 10:23:48AM +0100, Connor Abbott wrote:
> >>>> On Fri, Aug 23, 2024 at 10:21 AM Connor Abbott  
> >>>> wrote:
> >>>>>
> >>>>> On Thu, Aug 22, 2024 at 9:06 PM Akhil P Oommen 
> >>>>>  wrote:
> >>>>>>
> >>>>>> On Wed, Aug 21, 2024 at 05:02:56PM +0100, Connor Abbott wrote:
> >>>>>>> On Mon, Aug 19, 2024 at 9:09 PM Akhil P Oommen 
> >>>>>>>  wrote:
> >>>>>>>>
> >>>>>>>> On Thu, Aug 15, 2024 at 08:26:14PM +0200, Antonino Maniscalco wrote:
> >>>>>>>>> This patch implements the preemption feature for A6xx targets,
> >>>>>>>>> which allows the GPU to switch to a higher-priority ringbuffer
> >>>>>>>>> if one is ready. A6XX hardware supports multiple levels of
> >>>>>>>>> preemption granularity, ranging from coarse-grained
> >>>>>>>>> (ringbuffer-level) to finer-grained options such as
> >>>>>>>>> draw-call-level or bin-boundary-level preemption. This patch
> >>>>>>>>> enables the basic preemption level, with more fine-grained
> >>>>>>>>> preemption support to follow.
> >>>>>>>>>
> >>>>>>>>> Signed-off-by: Sharat Masetty 
> >>>>>>>>> Signed-off-by: Antonino Maniscalco 
> >>>>>>>>> ---
> >>>>>>>>
> >>>>>>>> No postamble packets which resets perfcounters? It is necessary. 
> >>>>>>>> Also, I
> >>>>>>>> think we should disable preemption during profiling like we disable 
> >>>>>>>> slumber.
> >>>>>>>>
> >>>>>>>> -Akhil.
> >>>>>>>
> >>>>>>> I don't see anything in kgsl which disables preemption during
> >>>>>>> profiling. It disables resetting perfcounters when doing system-wide
> >>>>>>> profiling, like freedreno, and in that case I assume preempting is
> >>>>>>> fine because the system profiler has a complete view of everything and
> >>>>>>> should "see" preemptions through the traces. For something like
> >>>>>>> VK_KHR_performance_query I suppose we'd want to disable preemption
> >>>>>>> because we disable saving/restoring perf counters, but that has to
> >>>>>>> happen in userspace because the kernel doesn't know what userspace
> >>>>>>> does.
> >>>>>>>
> >>>>>>
> >>>>>> KGSL does some sort of arbitration of perfcounter configurations and
> >>>>>> adds the select/enablement reg configuration as part of the dynamic
> >>>>>> power-up register list, which we are not doing here. Is this something
> >>>>>> you are taking care of from userspace via preamble?
> >>>>>>
> >>>>>> -Akhil
> >>>>>
> >>>>> I don't think we have to take care of that in userspace, because Mesa
> >>>>> will always configure the counter registers before reading them in the
> >>>>> same submission, and if it gets preempted in the meantime then we're
> >>>>> toast anyways (due to not saving/restoring perf counters). kgsl sets
> >>>>> them from userspace, which is why it has to do something to set them
> >>>>
> >>>> Sorry, should be "kgsl sets them from the kernel".
> >>>>
> >>>>> after IFPC slumber or a context switch when the HW state is gone.
> >>>>> Also, because the upstream approach doesn't play nicely with system
> >>>>> profilers like perfetto, VK_KHR_performance_query is hidden by default
> >>>>> behind a debug flag in turnip. So there's 

Re: [PATCH 4/7] drm/msm/A6xx: Implement preemption for A7XX targets

2024-08-27 Thread Rob Clark
On Tue, Aug 27, 2024 at 1:25 PM Antonino Maniscalco
 wrote:
>
> On 8/27/24 9:48 PM, Akhil P Oommen wrote:
> > On Fri, Aug 23, 2024 at 10:23:48AM +0100, Connor Abbott wrote:
> >> On Fri, Aug 23, 2024 at 10:21 AM Connor Abbott  wrote:
> >>>
> >>> On Thu, Aug 22, 2024 at 9:06 PM Akhil P Oommen  
> >>> wrote:
> 
>  On Wed, Aug 21, 2024 at 05:02:56PM +0100, Connor Abbott wrote:
> > On Mon, Aug 19, 2024 at 9:09 PM Akhil P Oommen 
> >  wrote:
> >>
> >> On Thu, Aug 15, 2024 at 08:26:14PM +0200, Antonino Maniscalco wrote:
> >>> This patch implements the preemption feature for A6xx targets, which
> >>> allows the GPU to switch to a higher-priority ringbuffer if one is
> >>> ready. A6XX hardware supports multiple levels of preemption
> >>> granularity, ranging from coarse-grained (ringbuffer-level) to
> >>> finer-grained options such as draw-call-level or bin-boundary-level
> >>> preemption. This patch enables the basic preemption level, with more
> >>> fine-grained preemption support to follow.
> >>>
> >>> Signed-off-by: Sharat Masetty 
> >>> Signed-off-by: Antonino Maniscalco 
> >>> ---
> >>
> >> No postamble packets which resets perfcounters? It is necessary. Also, 
> >> I
> >> think we should disable preemption during profiling like we disable 
> >> slumber.
> >>
> >> -Akhil.
> >
> > I don't see anything in kgsl which disables preemption during
> > profiling. It disables resetting perfcounters when doing system-wide
> > profiling, like freedreno, and in that case I assume preempting is
> > fine because the system profiler has a complete view of everything and
> > should "see" preemptions through the traces. For something like
> > VK_KHR_performance_query I suppose we'd want to disable preemption
> > because we disable saving/restoring perf counters, but that has to
> > happen in userspace because the kernel doesn't know what userspace
> > does.
> >
> 
>  KGSL does some sort of arbitration of perfcounter configurations and
>  adds the select/enablement reg configuration as part of the dynamic
>  power-up register list, which we are not doing here. Is this something
>  you are taking care of from userspace via preamble?
> 
>  -Akhil
> >>>
> >>> I don't think we have to take care of that in userspace, because Mesa
> >>> will always configure the counter registers before reading them in the
> >>> same submission, and if it gets preempted in the meantime then we're
> >>> toast anyways (due to not saving/restoring perf counters). kgsl sets
> >>> them from userspace, which is why it has to do something to set them
> >>
> >> Sorry, should be "kgsl sets them from the kernel".
> >>
> >>> after IFPC slumber or a context switch when the HW state is gone.
> >>> Also, because the upstream approach doesn't play nicely with system
> >>> profilers like perfetto, VK_KHR_performance_query is hidden by default
> >>> behind a debug flag in turnip. So there's already an element of "this
> >>> is unsupported, you have to know what you're doing to use it."
> >
> > But when you have composition on GPU enabled, there will be very frequent
> > preemption. And I don't know how usable profiling tools will be in that
> > case unless you disable preemption with a Mesa debug flag. But for that
> > to work, all existing submitqueues should be destroyed and recreated.
> >
> > So I was thinking that we can use the sysprof property to force L0
> > preemption from the kernel.
> >
> > -Akhil.
> >
>
> Right but when using a system profiler I imagined the expectation would
> be to be able to understand how applications and compositor interact. A
> use case could be measuring latency and understanding what contributes
> to it. That is actually the main reason I added traces for preemption.
> Disabling preemption would make it less useful for this type of
> analysis. Did you have a use case in mind for a system profiler that
> would benefit from disabling preemption and that is not covered by
> VK_KHR_performance_query (or equivalent GL ext)?

I would think that we want to generate an event, with GPU timestamp
(i.e. RB_DONE) and which ring we are switching to, so that perfetto/etc
could display multiple GPU timelines and where the switch from one to
the other happens.
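
Roughly this shape, as a hypothetical sketch (field names made up, and
the GPU timestamp would need to be sampled from the CP or the preempt
records somewhere):

   TRACE_EVENT(msm_preempt_ring_switch,
           TP_PROTO(u32 from_ring, u32 to_ring, u64 gpu_ts),
           TP_ARGS(from_ring, to_ring, gpu_ts),
           TP_STRUCT__entry(
                   __field(u32, from_ring)
                   __field(u32, to_ring)
                   __field(u64, gpu_ts)
           ),
           TP_fast_assign(
                   __entry->from_ring = from_ring;
                   __entry->to_ring = to_ring;
                   __entry->gpu_ts = gpu_ts;
           ),
           TP_printk("from=%u to=%u gpu_ts=%llu",
                   __entry->from_ring, __entry->to_ring, __entry->gpu_ts)
   );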

I'm a bit curious how this is handled on Android, with AGI/etc.. I
don't see any support in perfetto for this.

BR,
-R

> Best regards,
> --
> Antonino Maniscalco 
>


Re: [PATCH 3/5] drm/msm/a6xx: Store gmu_cgc_mode in struct a6xx_info

2024-08-27 Thread Rob Clark
resending with updated Konrad email addr

On Mon, Aug 26, 2024 at 2:09 PM Rob Clark  wrote:
>
> On Mon, Aug 26, 2024 at 2:07 PM Rob Clark  wrote:
> >
> > On Fri, Jul 19, 2024 at 3:03 AM Konrad Dybcio  
> > wrote:
> > >
> > > This was apparently almost never set on a6xx.. move the existing values
> > > and fill out the remaining ones within the catalog.
> > >
> > > Signed-off-by: Konrad Dybcio 
> > > ---
> > >  drivers/gpu/drm/msm/adreno/a6xx_catalog.c | 19 ++-
> > >  drivers/gpu/drm/msm/adreno/a6xx_gpu.c |  6 ++
> > >  drivers/gpu/drm/msm/adreno/a6xx_gpu.h |  1 +
> > >  3 files changed, 21 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_catalog.c 
> > > b/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
> > > index 1ea535960f32..deee0b686962 100644
> > > --- a/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
> > > +++ b/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
> > > @@ -448,7 +448,6 @@ static const struct adreno_reglist a690_hwcg[] = {
> > > {REG_A6XX_RBBM_CLOCK_CNTL_GMU_GX, 0x0222},
> > > {REG_A6XX_RBBM_CLOCK_DELAY_GMU_GX, 0x0111},
> > > {REG_A6XX_RBBM_CLOCK_HYST_GMU_GX, 0x0555},
> > > -   {REG_A6XX_GPU_GMU_AO_GMU_CGC_MODE_CNTL, 0x20200},
> > > {REG_A6XX_GPU_GMU_AO_GMU_CGC_DELAY_CNTL, 0x10111},
> > > {REG_A6XX_GPU_GMU_AO_GMU_CGC_HYST_CNTL, 0x},
> > > {}
> > > @@ -636,6 +635,7 @@ static const struct adreno_info a6xx_gpus[] = {
> > > .a6xx = &(const struct a6xx_info) {
> > > .hwcg = a612_hwcg,
> > > .protect = &a630_protect,
> > > +   .gmu_cgc_mode = 0x00020202,
> > > .prim_fifo_threshold = 0x0008,
> > > },
> > > /*
> > > @@ -668,6 +668,7 @@ static const struct adreno_info a6xx_gpus[] = {
> > > .a6xx = &(const struct a6xx_info) {
> > > .hwcg = a615_hwcg,
> > > .protect = &a630_protect,
> > > +   .gmu_cgc_mode = 0x0222,
> > > .prim_fifo_threshold = 0x0018,
> > > },
> > > .speedbins = ADRENO_SPEEDBINS(
> > > @@ -691,6 +692,7 @@ static const struct adreno_info a6xx_gpus[] = {
> > > .init = a6xx_gpu_init,
> > > .a6xx = &(const struct a6xx_info) {
> > > .protect = &a630_protect,
> > > +   .gmu_cgc_mode = 0x0222,
> > > .prim_fifo_threshold = 0x0018,
> > > },
> > > .speedbins = ADRENO_SPEEDBINS(
> > > @@ -714,6 +716,7 @@ static const struct adreno_info a6xx_gpus[] = {
> > > .a6xx = &(const struct a6xx_info) {
> > > .hwcg = a615_hwcg,
> > > .protect = &a630_protect,
> > > +   .gmu_cgc_mode = 0x0222,
> > > .prim_fifo_threshold = 0x00018000,
> > > },
> > > .speedbins = ADRENO_SPEEDBINS(
> > > @@ -737,6 +740,7 @@ static const struct adreno_info a6xx_gpus[] = {
> > > .a6xx = &(const struct a6xx_info) {
> > > .hwcg = a615_hwcg,
> > > .protect = &a630_protect,
> > > +   .gmu_cgc_mode = 0x0222,
> > > .prim_fifo_threshold = 0x00018000,
> > > },
> > > .speedbins = ADRENO_SPEEDBINS(
> > > @@ -760,6 +764,7 @@ static const struct adreno_info a6xx_gpus[] = {
> > > .a6xx = &(const struct a6xx_info) {
> > > .hwcg = a615_hwcg,
> > > .protect = &a630_protect,
> > > +   .gmu_cgc_mode = 0x0222,
> > > .prim_fifo_threshold = 0x00018000,
> > > },
> > > .speedbins = ADRENO_SPEEDBINS(
> > > @@ -788,6 +793,7 @@ static const struct adreno_info a6xx_gpus[] = {
> > > .a6xx = &(const struct a6xx_info) {
> > > .hwcg = a630_hwcg,
> > >

[PATCH v9 4/4] drm/msm: Extend gpu devcore dumps with pgtbl info

2024-08-27 Thread Rob Clark
From: Rob Clark 

In the case of iova fault triggered devcore dumps, include additional
debug information based on what we think are the current page tables,
including the TTBR0 value (which should match what we have in
adreno_smmu_fault_info unless things have gone horribly wrong), and
the pagetable entries traversed in the process of resolving the
faulting iova.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/adreno/adreno_gpu.c | 10 ++
 drivers/gpu/drm/msm/msm_gpu.c   |  9 +
 drivers/gpu/drm/msm/msm_gpu.h   |  8 
 drivers/gpu/drm/msm/msm_iommu.c | 22 ++
 drivers/gpu/drm/msm/msm_mmu.h   |  3 ++-
 5 files changed, 51 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c 
b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
index 1c6626747b98..3848b5a64351 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
@@ -864,6 +864,16 @@ void adreno_show(struct msm_gpu *gpu, struct msm_gpu_state 
*state,
drm_printf(p, "  - dir=%s\n", info->flags & IOMMU_FAULT_WRITE ? 
"WRITE" : "READ");
drm_printf(p, "  - type=%s\n", info->type);
drm_printf(p, "  - source=%s\n", info->block);
+
+   /* Information extracted from what we think are the current
+* pgtables.  Hopefully the TTBR0 matches what we've extracted
+* from the SMMU registers in smmu_info!
+*/
+   drm_puts(p, "pgtable-fault-info:\n");
+   drm_printf(p, "  - ttbr0: %.16llx\n", (u64)info->pgtbl_ttbr0);
+   drm_printf(p, "  - asid: %d\n", info->asid);
+   drm_printf(p, "  - ptes: %.16llx %.16llx %.16llx %.16llx\n",
+  info->ptes[0], info->ptes[1], info->ptes[2], 
info->ptes[3]);
}
 
drm_printf(p, "rbbm-status: 0x%08x\n", state->rbbm_status);
diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index 3666b42b4ecd..bf2f8b2a7ccc 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -281,6 +281,15 @@ static void msm_gpu_crashstate_capture(struct msm_gpu *gpu,
if (submit) {
int i;
 
+   if (state->fault_info.ttbr0) {
+   struct msm_gpu_fault_info *info = &state->fault_info;
+   struct msm_mmu *mmu = submit->aspace->mmu;
+
+   msm_iommu_pagetable_params(mmu, &info->pgtbl_ttbr0,
+  &info->asid);
+   msm_iommu_pagetable_walk(mmu, info->iova, info->ptes);
+   }
+
state->bos = kcalloc(submit->nr_bos,
sizeof(struct msm_gpu_state_bo), GFP_KERNEL);
 
diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
index 1f02bb9956be..82e838ba8c80 100644
--- a/drivers/gpu/drm/msm/msm_gpu.h
+++ b/drivers/gpu/drm/msm/msm_gpu.h
@@ -101,6 +101,14 @@ struct msm_gpu_fault_info {
int flags;
const char *type;
const char *block;
+
+   /* Information about what we think/expect is the current SMMU state,
+* for example expected_ttbr0 should match smmu_info.ttbr0 which
+* was read back from SMMU registers.
+*/
+   phys_addr_t pgtbl_ttbr0;
+   u64 ptes[4];
+   int asid;
 };
 
 /**
diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
index 2a94e82316f9..3e692818ba1f 100644
--- a/drivers/gpu/drm/msm/msm_iommu.c
+++ b/drivers/gpu/drm/msm/msm_iommu.c
@@ -195,6 +195,28 @@ struct iommu_domain_geometry 
*msm_iommu_get_geometry(struct msm_mmu *mmu)
return &iommu->domain->geometry;
 }
 
+int
+msm_iommu_pagetable_walk(struct msm_mmu *mmu, unsigned long iova, uint64_t 
ptes[4])
+{
+   struct msm_iommu_pagetable *pagetable;
+   struct arm_lpae_io_pgtable_walk_data wd = {};
+
+   if (mmu->type != MSM_MMU_IOMMU_PAGETABLE)
+   return -EINVAL;
+
+   pagetable = to_pagetable(mmu);
+
+   if (!pagetable->pgtbl_ops->pgtable_walk)
+   return -EINVAL;
+
+   pagetable->pgtbl_ops->pgtable_walk(pagetable->pgtbl_ops, iova, &wd);
+
+   for (int i = 0; i < ARRAY_SIZE(wd.ptes); i++)
+   ptes[i] = wd.ptes[i];
+
+   return 0;
+}
+
 static const struct msm_mmu_funcs pagetable_funcs = {
.map = msm_iommu_pagetable_map,
.unmap = msm_iommu_pagetable_unmap,
diff --git a/drivers/gpu/drm/msm/msm_mmu.h b/drivers/gpu/drm/msm/msm_mmu.h
index 88af4f490881..96e509bd96a6 100644
--- a/drivers/gpu/drm/msm/msm_mmu.h
+++ b/drivers/gpu/drm/msm/msm_mmu.h
@@ -53,7 +53,8 @@ static inline void msm_mmu_set

[PATCH v9 0/4] io-pgtable-arm + drm/msm: Extend iova fault debugging

2024-08-27 Thread Rob Clark
From: Rob Clark 

This series extends io-pgtable-arm with a method to retrieve the page
table entries traversed in the process of address translation, and then
beefs up drm/msm gpu devcore dump to include this (and additional info)
in the devcore dump.

This is a respin of https://patchwork.freedesktop.org/series/94968/
(minus a patch that was already merged)

v2: Fix an armv7/32b build error in the last patch
v3: Incorporate Will Deacon's suggestion to make the interface
callback based.
v4: Actually wire up the callback
v5: Drop the callback approach
v6: Make walk-data struct pgtable specific and rename
io_pgtable_walk_data to arm_lpae_io_pgtable_walk_data
v7: Re-use the pgtable walker added for arm_lpae_read_and_clear_dirty()
v8: Pass pte pointer to callback so it can modify the actual pte
v9: Fix selftests_running case

Rob Clark (4):
  iommu/io-pgtable-arm: Make pgtable walker more generic
  iommu/io-pgtable-arm: Re-use the pgtable walk for iova_to_phys
  iommu/io-pgtable-arm: Add way to debug pgtable walk
  drm/msm: Extend gpu devcore dumps with pgtbl info

 drivers/gpu/drm/msm/adreno/adreno_gpu.c |  10 ++
 drivers/gpu/drm/msm/msm_gpu.c   |   9 ++
 drivers/gpu/drm/msm/msm_gpu.h   |   8 ++
 drivers/gpu/drm/msm/msm_iommu.c |  22 
 drivers/gpu/drm/msm/msm_mmu.h   |   3 +-
 drivers/iommu/io-pgtable-arm.c  | 149 +++-
 include/linux/io-pgtable.h  |  15 +++
 7 files changed, 160 insertions(+), 56 deletions(-)

-- 
2.46.0



Re: [PATCH 3/5] drm/msm/a6xx: Store gmu_cgc_mode in struct a6xx_info

2024-08-26 Thread Rob Clark
On Mon, Aug 26, 2024 at 2:07 PM Rob Clark  wrote:
>
> On Fri, Jul 19, 2024 at 3:03 AM Konrad Dybcio  
> wrote:
> >
> > This was apparently almost never set on a6xx.. move the existing values
> > and fill out the remaining ones within the catalog.
> >
> > Signed-off-by: Konrad Dybcio 
> > ---
> >  drivers/gpu/drm/msm/adreno/a6xx_catalog.c | 19 ++-
> >  drivers/gpu/drm/msm/adreno/a6xx_gpu.c |  6 ++
> >  drivers/gpu/drm/msm/adreno/a6xx_gpu.h |  1 +
> >  3 files changed, 21 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_catalog.c 
> > b/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
> > index 1ea535960f32..deee0b686962 100644
> > --- a/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
> > +++ b/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
> > @@ -448,7 +448,6 @@ static const struct adreno_reglist a690_hwcg[] = {
> > {REG_A6XX_RBBM_CLOCK_CNTL_GMU_GX, 0x0222},
> > {REG_A6XX_RBBM_CLOCK_DELAY_GMU_GX, 0x0111},
> > {REG_A6XX_RBBM_CLOCK_HYST_GMU_GX, 0x0555},
> > -   {REG_A6XX_GPU_GMU_AO_GMU_CGC_MODE_CNTL, 0x20200},
> > {REG_A6XX_GPU_GMU_AO_GMU_CGC_DELAY_CNTL, 0x10111},
> > {REG_A6XX_GPU_GMU_AO_GMU_CGC_HYST_CNTL, 0x},
> > {}
> > @@ -636,6 +635,7 @@ static const struct adreno_info a6xx_gpus[] = {
> > .a6xx = &(const struct a6xx_info) {
> > .hwcg = a612_hwcg,
> > .protect = &a630_protect,
> > +   .gmu_cgc_mode = 0x00020202,
> > .prim_fifo_threshold = 0x0008,
> > },
> > /*
> > @@ -668,6 +668,7 @@ static const struct adreno_info a6xx_gpus[] = {
> > .a6xx = &(const struct a6xx_info) {
> > .hwcg = a615_hwcg,
> > .protect = &a630_protect,
> > +   .gmu_cgc_mode = 0x0222,
> > .prim_fifo_threshold = 0x0018,
> > },
> > .speedbins = ADRENO_SPEEDBINS(
> > @@ -691,6 +692,7 @@ static const struct adreno_info a6xx_gpus[] = {
> > .init = a6xx_gpu_init,
> > .a6xx = &(const struct a6xx_info) {
> > .protect = &a630_protect,
> > +   .gmu_cgc_mode = 0x0222,
> > .prim_fifo_threshold = 0x0018,
> > },
> > .speedbins = ADRENO_SPEEDBINS(
> > @@ -714,6 +716,7 @@ static const struct adreno_info a6xx_gpus[] = {
> > .a6xx = &(const struct a6xx_info) {
> > .hwcg = a615_hwcg,
> > .protect = &a630_protect,
> > +   .gmu_cgc_mode = 0x0222,
> > .prim_fifo_threshold = 0x00018000,
> > },
> > .speedbins = ADRENO_SPEEDBINS(
> > @@ -737,6 +740,7 @@ static const struct adreno_info a6xx_gpus[] = {
> > .a6xx = &(const struct a6xx_info) {
> > .hwcg = a615_hwcg,
> > .protect = &a630_protect,
> > +   .gmu_cgc_mode = 0x0222,
> > .prim_fifo_threshold = 0x00018000,
> > },
> > .speedbins = ADRENO_SPEEDBINS(
> > @@ -760,6 +764,7 @@ static const struct adreno_info a6xx_gpus[] = {
> > .a6xx = &(const struct a6xx_info) {
> > .hwcg = a615_hwcg,
> > .protect = &a630_protect,
> > +   .gmu_cgc_mode = 0x0222,
> > .prim_fifo_threshold = 0x00018000,
> > },
> > .speedbins = ADRENO_SPEEDBINS(
> > @@ -788,6 +793,7 @@ static const struct adreno_info a6xx_gpus[] = {
> > .a6xx = &(const struct a6xx_info) {
> > .hwcg = a630_hwcg,
> > .protect = &a630_protect,
> > +   .gmu_cgc_mode = 0x00020202,
> > .prim_fifo_threshold = 0x0018,
> > },
> > }, {
> > @@ -806,6 +812,7 @@ static const struct adreno_info a6xx_gpus[] = {
> > .a6xx = &(const struct a6xx_info) {
> > .hwcg = a640_hwcg,
> > .protect = &a630_protect,
> > +   .gmu_cgc_mode = 0x00020

Re: [PATCH 3/5] drm/msm/a6xx: Store gmu_cgc_mode in struct a6xx_info

2024-08-26 Thread Rob Clark
On Fri, Jul 19, 2024 at 3:03 AM Konrad Dybcio  wrote:
>
> This was apparently almost never set on a6xx.. move the existing values
> and fill out the remaining ones within the catalog.
>
> Signed-off-by: Konrad Dybcio 
> ---
>  drivers/gpu/drm/msm/adreno/a6xx_catalog.c | 19 ++-
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c |  6 ++
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.h |  1 +
>  3 files changed, 21 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_catalog.c 
> b/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
> index 1ea535960f32..deee0b686962 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
> @@ -448,7 +448,6 @@ static const struct adreno_reglist a690_hwcg[] = {
> {REG_A6XX_RBBM_CLOCK_CNTL_GMU_GX, 0x0222},
> {REG_A6XX_RBBM_CLOCK_DELAY_GMU_GX, 0x0111},
> {REG_A6XX_RBBM_CLOCK_HYST_GMU_GX, 0x0555},
> -   {REG_A6XX_GPU_GMU_AO_GMU_CGC_MODE_CNTL, 0x20200},
> {REG_A6XX_GPU_GMU_AO_GMU_CGC_DELAY_CNTL, 0x10111},
> {REG_A6XX_GPU_GMU_AO_GMU_CGC_HYST_CNTL, 0x},
> {}
> @@ -636,6 +635,7 @@ static const struct adreno_info a6xx_gpus[] = {
> .a6xx = &(const struct a6xx_info) {
> .hwcg = a612_hwcg,
> .protect = &a630_protect,
> +   .gmu_cgc_mode = 0x00020202,
> .prim_fifo_threshold = 0x0008,
> },
> /*
> @@ -668,6 +668,7 @@ static const struct adreno_info a6xx_gpus[] = {
> .a6xx = &(const struct a6xx_info) {
> .hwcg = a615_hwcg,
> .protect = &a630_protect,
> +   .gmu_cgc_mode = 0x0222,
> .prim_fifo_threshold = 0x0018,
> },
> .speedbins = ADRENO_SPEEDBINS(
> @@ -691,6 +692,7 @@ static const struct adreno_info a6xx_gpus[] = {
> .init = a6xx_gpu_init,
> .a6xx = &(const struct a6xx_info) {
> .protect = &a630_protect,
> +   .gmu_cgc_mode = 0x0222,
> .prim_fifo_threshold = 0x0018,
> },
> .speedbins = ADRENO_SPEEDBINS(
> @@ -714,6 +716,7 @@ static const struct adreno_info a6xx_gpus[] = {
> .a6xx = &(const struct a6xx_info) {
> .hwcg = a615_hwcg,
> .protect = &a630_protect,
> +   .gmu_cgc_mode = 0x0222,
> .prim_fifo_threshold = 0x00018000,
> },
> .speedbins = ADRENO_SPEEDBINS(
> @@ -737,6 +740,7 @@ static const struct adreno_info a6xx_gpus[] = {
> .a6xx = &(const struct a6xx_info) {
> .hwcg = a615_hwcg,
> .protect = &a630_protect,
> +   .gmu_cgc_mode = 0x0222,
> .prim_fifo_threshold = 0x00018000,
> },
> .speedbins = ADRENO_SPEEDBINS(
> @@ -760,6 +764,7 @@ static const struct adreno_info a6xx_gpus[] = {
> .a6xx = &(const struct a6xx_info) {
> .hwcg = a615_hwcg,
> .protect = &a630_protect,
> +   .gmu_cgc_mode = 0x0222,
> .prim_fifo_threshold = 0x00018000,
> },
> .speedbins = ADRENO_SPEEDBINS(
> @@ -788,6 +793,7 @@ static const struct adreno_info a6xx_gpus[] = {
> .a6xx = &(const struct a6xx_info) {
> .hwcg = a630_hwcg,
> .protect = &a630_protect,
> +   .gmu_cgc_mode = 0x00020202,
> .prim_fifo_threshold = 0x0018,
> },
> }, {
> @@ -806,6 +812,7 @@ static const struct adreno_info a6xx_gpus[] = {
> .a6xx = &(const struct a6xx_info) {
> .hwcg = a640_hwcg,
> .protect = &a630_protect,
> +   .gmu_cgc_mode = 0x00020202,
> .prim_fifo_threshold = 0x0018,
> },
> .speedbins = ADRENO_SPEEDBINS(
> @@ -829,6 +836,7 @@ static const struct adreno_info a6xx_gpus[] = {
> .a6xx = &(const struct a6xx_info) {
> .hwcg = a650_hwcg,
> .protect = &a650_protect,
> +   .gmu_cgc_mode = 0x00020202,
> .prim_fifo_threshold = 0x00300200,
> },
> .address_space_size = SZ_16G,
> @@ -855,6 +863,7 @@ static const struct adreno_info a6xx_gpus[] = {
> .a6xx = &(const struct a6xx_info) {
> .hwcg = a660_hwcg,
> .protect = &a660_protect,
> +   .gmu_cgc_mode = 0x00020

[PATCH v8 4/4] drm/msm: Extend gpu devcore dumps with pgtbl info

2024-08-26 Thread Rob Clark
From: Rob Clark 

In the case of iova fault triggered devcore dumps, include additional
debug information based on what we think are the current page tables,
including the TTBR0 value (which should match what we have in
adreno_smmu_fault_info unless things have gone horribly wrong), and
the pagetable entries traversed in the process of resolving the
faulting iova.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/adreno/adreno_gpu.c | 10 ++
 drivers/gpu/drm/msm/msm_gpu.c   |  9 +
 drivers/gpu/drm/msm/msm_gpu.h   |  8 
 drivers/gpu/drm/msm/msm_iommu.c | 22 ++
 drivers/gpu/drm/msm/msm_mmu.h   |  3 ++-
 5 files changed, 51 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c 
b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
index 1c6626747b98..3848b5a64351 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
@@ -864,6 +864,16 @@ void adreno_show(struct msm_gpu *gpu, struct msm_gpu_state 
*state,
drm_printf(p, "  - dir=%s\n", info->flags & IOMMU_FAULT_WRITE ? 
"WRITE" : "READ");
drm_printf(p, "  - type=%s\n", info->type);
drm_printf(p, "  - source=%s\n", info->block);
+
+   /* Information extracted from what we think are the current
+* pgtables.  Hopefully the TTBR0 matches what we've extracted
+* from the SMMU registers in smmu_info!
+*/
+   drm_puts(p, "pgtable-fault-info:\n");
+   drm_printf(p, "  - ttbr0: %.16llx\n", (u64)info->pgtbl_ttbr0);
+   drm_printf(p, "  - asid: %d\n", info->asid);
+   drm_printf(p, "  - ptes: %.16llx %.16llx %.16llx %.16llx\n",
+  info->ptes[0], info->ptes[1], info->ptes[2], 
info->ptes[3]);
}
 
drm_printf(p, "rbbm-status: 0x%08x\n", state->rbbm_status);
diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index 3666b42b4ecd..bf2f8b2a7ccc 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -281,6 +281,15 @@ static void msm_gpu_crashstate_capture(struct msm_gpu *gpu,
if (submit) {
int i;
 
+   if (state->fault_info.ttbr0) {
+   struct msm_gpu_fault_info *info = &state->fault_info;
+   struct msm_mmu *mmu = submit->aspace->mmu;
+
+   msm_iommu_pagetable_params(mmu, &info->pgtbl_ttbr0,
+  &info->asid);
+   msm_iommu_pagetable_walk(mmu, info->iova, info->ptes);
+   }
+
state->bos = kcalloc(submit->nr_bos,
sizeof(struct msm_gpu_state_bo), GFP_KERNEL);
 
diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
index 1f02bb9956be..82e838ba8c80 100644
--- a/drivers/gpu/drm/msm/msm_gpu.h
+++ b/drivers/gpu/drm/msm/msm_gpu.h
@@ -101,6 +101,14 @@ struct msm_gpu_fault_info {
int flags;
const char *type;
const char *block;
+
+   /* Information about what we think/expect is the current SMMU state,
+* for example expected_ttbr0 should match smmu_info.ttbr0 which
+* was read back from SMMU registers.
+*/
+   phys_addr_t pgtbl_ttbr0;
+   u64 ptes[4];
+   int asid;
 };
 
 /**
diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
index 2a94e82316f9..3e692818ba1f 100644
--- a/drivers/gpu/drm/msm/msm_iommu.c
+++ b/drivers/gpu/drm/msm/msm_iommu.c
@@ -195,6 +195,28 @@ struct iommu_domain_geometry 
*msm_iommu_get_geometry(struct msm_mmu *mmu)
return &iommu->domain->geometry;
 }
 
+int
+msm_iommu_pagetable_walk(struct msm_mmu *mmu, unsigned long iova, uint64_t 
ptes[4])
+{
+   struct msm_iommu_pagetable *pagetable;
+   struct arm_lpae_io_pgtable_walk_data wd = {};
+
+   if (mmu->type != MSM_MMU_IOMMU_PAGETABLE)
+   return -EINVAL;
+
+   pagetable = to_pagetable(mmu);
+
+   if (!pagetable->pgtbl_ops->pgtable_walk)
+   return -EINVAL;
+
+   pagetable->pgtbl_ops->pgtable_walk(pagetable->pgtbl_ops, iova, &wd);
+
+   for (int i = 0; i < ARRAY_SIZE(wd.ptes); i++)
+   ptes[i] = wd.ptes[i];
+
+   return 0;
+}
+
 static const struct msm_mmu_funcs pagetable_funcs = {
.map = msm_iommu_pagetable_map,
.unmap = msm_iommu_pagetable_unmap,
diff --git a/drivers/gpu/drm/msm/msm_mmu.h b/drivers/gpu/drm/msm/msm_mmu.h
index 88af4f490881..96e509bd96a6 100644
--- a/drivers/gpu/drm/msm/msm_mmu.h
+++ b/drivers/gpu/drm/msm/msm_mmu.h
@@ -53,7 +53,8 @@ static inline void msm_mmu_set

[PATCH v8 0/4] io-pgtable-arm + drm/msm: Extend iova fault debugging

2024-08-26 Thread Rob Clark
From: Rob Clark 

This series extends io-pgtable-arm with a method to retrieve the page
table entries traversed in the process of address translation, and then
beefs up drm/msm gpu devcore dump to include this (and additional info)
in the devcore dump.

This is a respin of https://patchwork.freedesktop.org/series/94968/
(minus a patch that was already merged)

v2: Fix an armv7/32b build error in the last patch
v3: Incorporate Will Deacon's suggestion to make the interface
callback based.
v4: Actually wire up the callback
v5: Drop the callback approach
v6: Make walk-data struct pgtable specific and rename
io_pgtable_walk_data to arm_lpae_io_pgtable_walk_data
v7: Re-use the pgtable walker added for arm_lpae_read_and_clear_dirty()
v8: Pass pte pointer to callback so it can modify the actual pte

Rob Clark (4):
  iommu/io-pgtable-arm: Make pgtable walker more generic
  iommu/io-pgtable-arm: Re-use the pgtable walk for iova_to_phys
  iommu/io-pgtable-arm: Add way to debug pgtable walk
  drm/msm: Extend gpu devcore dumps with pgtbl info

 drivers/gpu/drm/msm/adreno/adreno_gpu.c |  10 ++
 drivers/gpu/drm/msm/msm_gpu.c   |   9 ++
 drivers/gpu/drm/msm/msm_gpu.h   |   8 ++
 drivers/gpu/drm/msm/msm_iommu.c |  22 
 drivers/gpu/drm/msm/msm_mmu.h   |   3 +-
 drivers/iommu/io-pgtable-arm.c  | 147 +++-
 include/linux/io-pgtable.h  |  15 +++
 7 files changed, 158 insertions(+), 56 deletions(-)

-- 
2.46.0



Re: [PATCH v7 1/4] iommu/io-pgtable-arm: Make pgtable walker more generic

2024-08-23 Thread Rob Clark
On Fri, Aug 23, 2024 at 9:09 AM Will Deacon  wrote:
>
> On Tue, Aug 20, 2024 at 10:16:44AM -0700, Rob Clark wrote:
> > diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> > index f5d9fd1f45bf..b4bc358740e0 100644
> > --- a/drivers/iommu/io-pgtable-arm.c
> > +++ b/drivers/iommu/io-pgtable-arm.c
> > @@ -747,33 +747,31 @@ static phys_addr_t arm_lpae_iova_to_phys(struct 
> > io_pgtable_ops *ops,
> >  }
> >
> >  struct io_pgtable_walk_data {
> > - struct iommu_dirty_bitmap   *dirty;
> > + void*data;
> > + int (*visit)(struct io_pgtable_walk_data *walk_data, int lvl,
> > +  arm_lpae_iopte pte, size_t size);
> >   unsigned long   flags;
> >   u64 addr;
> >   const u64   end;
> >  };
> >
> > -static int __arm_lpae_iopte_walk_dirty(struct arm_lpae_io_pgtable *data,
> > -struct io_pgtable_walk_data *walk_data,
> > -arm_lpae_iopte *ptep,
> > -int lvl);
> > +static int __arm_lpae_iopte_walk(struct arm_lpae_io_pgtable *data,
> > +  struct io_pgtable_walk_data *walk_data,
> > +  arm_lpae_iopte *ptep,
> > +  int lvl);
> >
> > -static int io_pgtable_visit_dirty(struct arm_lpae_io_pgtable *data,
> > -   struct io_pgtable_walk_data *walk_data,
> > -   arm_lpae_iopte *ptep, int lvl)
> > +static int io_pgtable_visit(struct arm_lpae_io_pgtable *data,
> > + struct io_pgtable_walk_data *walk_data,
> > + arm_lpae_iopte *ptep, int lvl)
> >  {
> >   struct io_pgtable *iop = &data->iop;
> >   arm_lpae_iopte pte = READ_ONCE(*ptep);
> >
> >   if (iopte_leaf(pte, lvl, iop->fmt)) {
> >   size_t size = ARM_LPAE_BLOCK_SIZE(lvl, data);
> > -
> > - if (iopte_writeable_dirty(pte)) {
> > - iommu_dirty_bitmap_record(walk_data->dirty,
> > -   walk_data->addr, size);
> > - if (!(walk_data->flags & IOMMU_DIRTY_NO_CLEAR))
> > - iopte_set_writeable_clean(ptep);
> > - }
> > + int ret = walk_data->visit(walk_data, lvl, pte, size);
> > + if (ret)
> > + return ret;
> >   walk_data->addr += size;
> >   return 0;
> >   }
> > @@ -782,13 +780,13 @@ static int io_pgtable_visit_dirty(struct 
> > arm_lpae_io_pgtable *data,
> >   return -EINVAL;
> >
> >   ptep = iopte_deref(pte, data);
> > - return __arm_lpae_iopte_walk_dirty(data, walk_data, ptep, lvl + 1);
> > + return __arm_lpae_iopte_walk(data, walk_data, ptep, lvl + 1);
> >  }
> >
> > -static int __arm_lpae_iopte_walk_dirty(struct arm_lpae_io_pgtable *data,
> > -struct io_pgtable_walk_data *walk_data,
> > -arm_lpae_iopte *ptep,
> > -int lvl)
> > +static int __arm_lpae_iopte_walk(struct arm_lpae_io_pgtable *data,
> > +  struct io_pgtable_walk_data *walk_data,
> > +  arm_lpae_iopte *ptep,
> > +  int lvl)
> >  {
> >   u32 idx;
> >   int max_entries, ret;
> > @@ -803,7 +801,7 @@ static int __arm_lpae_iopte_walk_dirty(struct 
> > arm_lpae_io_pgtable *data,
> >
> >   for (idx = ARM_LPAE_LVL_IDX(walk_data->addr, lvl, data);
> >(idx < max_entries) && (walk_data->addr < walk_data->end); 
> > ++idx) {
> > - ret = io_pgtable_visit_dirty(data, walk_data, ptep + idx, 
> > lvl);
> > + ret = io_pgtable_visit(data, walk_data, ptep + idx, lvl);
> >   if (ret)
> >   return ret;
> >   }
> > @@ -811,6 +809,20 @@ static int __arm_lpae_iopte_walk_dirty(struct 
> > arm_lpae_io_pgtable *data,
> >   return 0;
> >  }
> >
> > +static int visit_dirty(struct io_pgtable_walk_data *walk_data, int lvl,
> > +arm_lpae_iopte pte, size_t size)
> > +{
> > + struct iommu_dirty_bitmap *dirty = walk_data->data;
> > +
> > + if (iopte_writeable_dirty(pte)) {
> > + iommu_dirty_bitmap_record(dirty, walk_data->addr, size);
> > + if (!(walk_data->flags & IOMMU_DIRTY_NO_CLEAR))
> > + iopte_set_writeable_clean(&pte);
>
> Are you sure that's correct? I suspect we really want to update the actual
> page-table in this case, so we probably want to pass the pointer in instead
> of the pte value.

oh, right
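
So for the next revision, something like this instead (sketch; pass the
ptep through so the visitor can write back the cleaned pte):

   static int visit_dirty(struct io_pgtable_walk_data *walk_data, int lvl,
                          arm_lpae_iopte *ptep, size_t size)
   {
           struct iommu_dirty_bitmap *dirty = walk_data->data;
           arm_lpae_iopte pte = READ_ONCE(*ptep);

           if (iopte_writeable_dirty(pte)) {
                   iommu_dirty_bitmap_record(dirty, walk_data->addr, size);
                   if (!(walk_data->flags & IOMMU_DIRTY_NO_CLEAR))
                           iopte_set_writeable_clean(ptep);
           }

           return 0;
   }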

> Will


Re: [PATCH v7 2/4] iommu/io-pgtable-arm: Re-use the pgtable walk for iova_to_phys

2024-08-23 Thread Rob Clark
On Fri, Aug 23, 2024 at 9:11 AM Will Deacon  wrote:
>
> On Tue, Aug 20, 2024 at 10:16:45AM -0700, Rob Clark wrote:
> > From: Rob Clark 
> >
> > Re-use the generic pgtable walk path.
> >
> > Signed-off-by: Rob Clark 
> > ---
> >  drivers/iommu/io-pgtable-arm.c | 73 +-
> >  1 file changed, 36 insertions(+), 37 deletions(-)
> >
> > diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> > index b4bc358740e0..5fa1274a665a 100644
> > --- a/drivers/iommu/io-pgtable-arm.c
> > +++ b/drivers/iommu/io-pgtable-arm.c
> > @@ -710,42 +710,6 @@ static size_t arm_lpae_unmap_pages(struct 
> > io_pgtable_ops *ops, unsigned long iov
> >   data->start_level, ptep);
> >  }
> >
> > -static phys_addr_t arm_lpae_iova_to_phys(struct io_pgtable_ops *ops,
> > -  unsigned long iova)
> > -{
> > - struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
> > - arm_lpae_iopte pte, *ptep = data->pgd;
> > - int lvl = data->start_level;
> > -
> > - do {
> > - /* Valid IOPTE pointer? */
> > - if (!ptep)
> > - return 0;
> > -
> > - /* Grab the IOPTE we're interested in */
> > - ptep += ARM_LPAE_LVL_IDX(iova, lvl, data);
> > - pte = READ_ONCE(*ptep);
> > -
> > - /* Valid entry? */
> > - if (!pte)
> > - return 0;
> > -
> > - /* Leaf entry? */
> > - if (iopte_leaf(pte, lvl, data->iop.fmt))
> > - goto found_translation;
> > -
> > - /* Take it to the next level */
> > - ptep = iopte_deref(pte, data);
> > - } while (++lvl < ARM_LPAE_MAX_LEVELS);
> > -
> > - /* Ran out of page tables to walk */
> > - return 0;
> > -
> > -found_translation:
> > - iova &= (ARM_LPAE_BLOCK_SIZE(lvl, data) - 1);
> > - return iopte_to_paddr(pte, data) | iova;
> > -}
> > -
> >  struct io_pgtable_walk_data {
> >   void*data;
> >   int (*visit)(struct io_pgtable_walk_data *walk_data, int lvl,
> > @@ -760,6 +724,41 @@ static int __arm_lpae_iopte_walk(struct 
> > arm_lpae_io_pgtable *data,
> >arm_lpae_iopte *ptep,
> >int lvl);
> >
> > +struct iova_to_phys_data {
> > + arm_lpae_iopte pte;
> > + int lvl;
> > +};
> > +
> > +static int visit_iova_to_phys(struct io_pgtable_walk_data *walk_data, int 
> > lvl,
> > +   arm_lpae_iopte pte, size_t size)
> > +{
> > + struct iova_to_phys_data *data = walk_data->data;
> > + data->pte = pte;
> > + data->lvl = lvl;
> > + return 0;
> > +}
> > +
> > +static phys_addr_t arm_lpae_iova_to_phys(struct io_pgtable_ops *ops,
> > +  unsigned long iova)
> > +{
> > + struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
> > + struct iova_to_phys_data d;
> > + struct io_pgtable_walk_data walk_data = {
> > + .data = &d,
> > + .visit = visit_iova_to_phys,
> > + .addr = iova,
> > + .end = iova + 1,
> > + };
> > + int ret;
> > +
> > + ret = __arm_lpae_iopte_walk(data, &walk_data, data->pgd, 
> > data->start_level);
> > + if (ret)
> > + return 0;
> > +
> > + iova &= (ARM_LPAE_BLOCK_SIZE(d.lvl, data) - 1);
> > + return iopte_to_paddr(d.pte, data) | iova;
> > +}
> > +
> >  static int io_pgtable_visit(struct arm_lpae_io_pgtable *data,
> >   struct io_pgtable_walk_data *walk_data,
> >   arm_lpae_iopte *ptep, int lvl)
> > @@ -776,7 +775,7 @@ static int io_pgtable_visit(struct arm_lpae_io_pgtable 
> > *data,
> >   return 0;
> >   }
> >
> > - if (WARN_ON(!iopte_table(pte, lvl)))
> > + if (WARN_ON(!iopte_table(pte, lvl) && !selftest_running))
>
> Why do you care about the selftest here?

Otherwise we see a flood of WARN_ON from negative tests in the selftests

BR,
-R

> Will


Re: [PATCH v7 4/4] drm/msm: Extend gpu devcore dumps with pgtbl info

2024-08-22 Thread Rob Clark
On Thu, Aug 22, 2024 at 1:34 PM Akhil P Oommen  wrote:
>
> On Tue, Aug 20, 2024 at 10:16:47AM -0700, Rob Clark wrote:
> > From: Rob Clark 
> >
> > In the case of iova fault triggered devcore dumps, include additional
> > debug information based on what we think are the current page tables,
> > including the TTBR0 value (which should match what we have in
> > adreno_smmu_fault_info unless things have gone horribly wrong), and
> > the pagetable entries traversed in the process of resolving the
> > faulting iova.
> >
> > Signed-off-by: Rob Clark 
> > ---
> >  drivers/gpu/drm/msm/adreno/adreno_gpu.c | 10 ++
> >  drivers/gpu/drm/msm/msm_gpu.c   |  9 +
> >  drivers/gpu/drm/msm/msm_gpu.h   |  8 
> >  drivers/gpu/drm/msm/msm_iommu.c | 22 ++
> >  drivers/gpu/drm/msm/msm_mmu.h   |  3 ++-
> >  5 files changed, 51 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c 
> > b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> > index 1c6626747b98..3848b5a64351 100644
> > --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> > +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> > @@ -864,6 +864,16 @@ void adreno_show(struct msm_gpu *gpu, struct 
> > msm_gpu_state *state,
> >   drm_printf(p, "  - dir=%s\n", info->flags & IOMMU_FAULT_WRITE 
> > ? "WRITE" : "READ");
> >   drm_printf(p, "  - type=%s\n", info->type);
> >   drm_printf(p, "  - source=%s\n", info->block);
> > +
> > + /* Information extracted from what we think are the current
> > +  * pgtables.  Hopefully the TTBR0 matches what we've extracted
> > +  * from the SMMU registers in smmu_info!
> > +  */
> > + drm_puts(p, "pgtable-fault-info:\n");
> > + drm_printf(p, "  - ttbr0: %.16llx\n", (u64)info->pgtbl_ttbr0);
>
> "0x" prefix? Otherwise, it is a bit confusing when the below one is
> decimal.

Mixed feelings, the extra 0x is annoying when pasting into calc, which
is a simple way to get binary decoding.

OTOH none of this is machine decoded, so I guess we could change it.

> > + drm_printf(p, "  - asid: %d\n", info->asid);
> > + drm_printf(p, "  - ptes: %.16llx %.16llx %.16llx %.16llx\n",
> > +info->ptes[0], info->ptes[1], info->ptes[2], 
> > info->ptes[3]);
>
> Does crashdec decodes this?

No, it's just passed thru for human eyeballs.

crashdec _does_ have some logic to flag buffers that are "near" the
faulting iova to help identify if the fault is an underflow/overflow
(which has been, along with the pte trail, useful to debug some
issues)
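
The check is nothing fancy, roughly along these lines (a sketch of the
idea, not crashdec's actual code; the window size is arbitrary):

  #include <stdbool.h>
  #include <stdint.h>

  /* flag buffers whose range falls within some window of the faulting
   * iova, to spot underflow/overflow into a neighboring buffer */
  #define NEAR_WINDOW 0x100000ULL

  struct bo { uint64_t iova, size; };

  static bool bo_is_near_fault(const struct bo *bo, uint64_t fault_iova)
  {
          uint64_t start = bo->iova > NEAR_WINDOW ? bo->iova - NEAR_WINDOW : 0;
          uint64_t end = bo->iova + bo->size + NEAR_WINDOW;

          return fault_iova >= start && fault_iova < end;
  }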

BR,
-R

> -Akhil.
>
> >   }
> >
> >   drm_printf(p, "rbbm-status: 0x%08x\n", state->rbbm_status);
> > diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
> > index 3666b42b4ecd..bf2f8b2a7ccc 100644
> > --- a/drivers/gpu/drm/msm/msm_gpu.c
> > +++ b/drivers/gpu/drm/msm/msm_gpu.c
> > @@ -281,6 +281,15 @@ static void msm_gpu_crashstate_capture(struct msm_gpu 
> > *gpu,
> >   if (submit) {
> >   int i;
> >
> > + if (state->fault_info.ttbr0) {
> > + struct msm_gpu_fault_info *info = &state->fault_info;
> > + struct msm_mmu *mmu = submit->aspace->mmu;
> > +
> > + msm_iommu_pagetable_params(mmu, &info->pgtbl_ttbr0,
> > +&info->asid);
> > + msm_iommu_pagetable_walk(mmu, info->iova, info->ptes);
> > + }
> > +
> >   state->bos = kcalloc(submit->nr_bos,
> >   sizeof(struct msm_gpu_state_bo), GFP_KERNEL);
> >
> > diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
> > index 1f02bb9956be..82e838ba8c80 100644
> > --- a/drivers/gpu/drm/msm/msm_gpu.h
> > +++ b/drivers/gpu/drm/msm/msm_gpu.h
> > @@ -101,6 +101,14 @@ struct msm_gpu_fault_info {
> >   int flags;
> >   const char *type;
> >   const char *block;
> > +
> > + /* Information about what we think/expect is the current SMMU state,
> > +  * for example expected_ttbr0 should match smmu_info.ttbr0 which
> > +  * was read back from SMMU registers.
> > +  */
> > + phys_addr_t

[PATCH v7 4/4] drm/msm: Extend gpu devcore dumps with pgtbl info

2024-08-20 Thread Rob Clark
From: Rob Clark 

In the case of iova fault triggered devcore dumps, include additional
debug information based on what we think is the current page tables,
including the TTBR0 value (which should match what we have in
adreno_smmu_fault_info unless things have gone horribly wrong), and
the pagetable entries traversed in the process of resolving the
faulting iova.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/adreno/adreno_gpu.c | 10 ++
 drivers/gpu/drm/msm/msm_gpu.c   |  9 +
 drivers/gpu/drm/msm/msm_gpu.h   |  8 
 drivers/gpu/drm/msm/msm_iommu.c | 22 ++
 drivers/gpu/drm/msm/msm_mmu.h   |  3 ++-
 5 files changed, 51 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c 
b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
index 1c6626747b98..3848b5a64351 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
@@ -864,6 +864,16 @@ void adreno_show(struct msm_gpu *gpu, struct msm_gpu_state 
*state,
drm_printf(p, "  - dir=%s\n", info->flags & IOMMU_FAULT_WRITE ? 
"WRITE" : "READ");
drm_printf(p, "  - type=%s\n", info->type);
drm_printf(p, "  - source=%s\n", info->block);
+
+   /* Information extracted from what we think are the current
+* pgtables.  Hopefully the TTBR0 matches what we've extracted
+* from the SMMU registers in smmu_info!
+*/
+   drm_puts(p, "pgtable-fault-info:\n");
+   drm_printf(p, "  - ttbr0: %.16llx\n", (u64)info->pgtbl_ttbr0);
+   drm_printf(p, "  - asid: %d\n", info->asid);
+   drm_printf(p, "  - ptes: %.16llx %.16llx %.16llx %.16llx\n",
+  info->ptes[0], info->ptes[1], info->ptes[2], 
info->ptes[3]);
}
 
drm_printf(p, "rbbm-status: 0x%08x\n", state->rbbm_status);
diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index 3666b42b4ecd..bf2f8b2a7ccc 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -281,6 +281,15 @@ static void msm_gpu_crashstate_capture(struct msm_gpu *gpu,
if (submit) {
int i;
 
+   if (state->fault_info.ttbr0) {
+   struct msm_gpu_fault_info *info = &state->fault_info;
+   struct msm_mmu *mmu = submit->aspace->mmu;
+
+   msm_iommu_pagetable_params(mmu, &info->pgtbl_ttbr0,
+  &info->asid);
+   msm_iommu_pagetable_walk(mmu, info->iova, info->ptes);
+   }
+
state->bos = kcalloc(submit->nr_bos,
sizeof(struct msm_gpu_state_bo), GFP_KERNEL);
 
diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
index 1f02bb9956be..82e838ba8c80 100644
--- a/drivers/gpu/drm/msm/msm_gpu.h
+++ b/drivers/gpu/drm/msm/msm_gpu.h
@@ -101,6 +101,14 @@ struct msm_gpu_fault_info {
int flags;
const char *type;
const char *block;
+
+   /* Information about what we think/expect is the current SMMU state,
+* for example expected_ttbr0 should match smmu_info.ttbr0 which
+* was read back from SMMU registers.
+*/
+   phys_addr_t pgtbl_ttbr0;
+   u64 ptes[4];
+   int asid;
 };
 
 /**
diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
index 2a94e82316f9..3e692818ba1f 100644
--- a/drivers/gpu/drm/msm/msm_iommu.c
+++ b/drivers/gpu/drm/msm/msm_iommu.c
@@ -195,6 +195,28 @@ struct iommu_domain_geometry 
*msm_iommu_get_geometry(struct msm_mmu *mmu)
return &iommu->domain->geometry;
 }
 
+int
+msm_iommu_pagetable_walk(struct msm_mmu *mmu, unsigned long iova, uint64_t 
ptes[4])
+{
+   struct msm_iommu_pagetable *pagetable;
+   struct arm_lpae_io_pgtable_walk_data wd = {};
+
+   if (mmu->type != MSM_MMU_IOMMU_PAGETABLE)
+   return -EINVAL;
+
+   pagetable = to_pagetable(mmu);
+
+   if (!pagetable->pgtbl_ops->pgtable_walk)
+   return -EINVAL;
+
+   pagetable->pgtbl_ops->pgtable_walk(pagetable->pgtbl_ops, iova, &wd);
+
+   for (int i = 0; i < ARRAY_SIZE(wd.ptes); i++)
+   ptes[i] = wd.ptes[i];
+
+   return 0;
+}
+
 static const struct msm_mmu_funcs pagetable_funcs = {
.map = msm_iommu_pagetable_map,
.unmap = msm_iommu_pagetable_unmap,
diff --git a/drivers/gpu/drm/msm/msm_mmu.h b/drivers/gpu/drm/msm/msm_mmu.h
index 88af4f490881..96e509bd96a6 100644
--- a/drivers/gpu/drm/msm/msm_mmu.h
+++ b/drivers/gpu/drm/msm/msm_mmu.h
@@ -53,7 +53,8 @@ static inline void msm_mmu_set

[PATCH v7 3/4] iommu/io-pgtable-arm: Add way to debug pgtable walk

2024-08-20 Thread Rob Clark
From: Rob Clark 

Add an io-pgtable method to walk the pgtable returning the raw PTEs that
would be traversed for a given iova access.
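
A minimal sketch of a caller (modeled on the msm usage in the next
patch; the op is optional, so callers must check for it first):

  struct arm_lpae_io_pgtable_walk_data wd = {};

  if (!ops->pgtable_walk)
          return -EINVAL;

  if (!ops->pgtable_walk(ops, iova, &wd)) {
          /* the walk filled wd.level entries of wd.ptes[] */
          for (int i = 0; i < wd.level; i++)
                  pr_info("pte[%d]: %016llx\n", i, wd.ptes[i]);
  }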

Signed-off-by: Rob Clark 
---
 drivers/iommu/io-pgtable-arm.c | 25 +
 include/linux/io-pgtable.h | 15 +++
 2 files changed, 40 insertions(+)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 5fa1274a665a..a666ee03de47 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -759,6 +759,30 @@ static phys_addr_t arm_lpae_iova_to_phys(struct 
io_pgtable_ops *ops,
return iopte_to_paddr(d.pte, data) | iova;
 }
 
+static int visit_pgtable_walk(struct io_pgtable_walk_data *walk_data, int lvl,
+ arm_lpae_iopte pte, size_t size)
+{
+   struct arm_lpae_io_pgtable_walk_data *data = walk_data->data;
+   data->ptes[data->level++] = pte;
+   return 0;
+}
+
+static int arm_lpae_pgtable_walk(struct io_pgtable_ops *ops, unsigned long 
iova,
+void *wd)
+{
+   struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
+   struct io_pgtable_walk_data walk_data = {
+   .data = wd,
+   .visit = visit_pgtable_walk,
+   .addr = iova,
+   .end = iova + 1,
+   };
+
+   ((struct arm_lpae_io_pgtable_walk_data *)wd)->level = 0;
+
+   return __arm_lpae_iopte_walk(data, &walk_data, data->pgd, 
data->start_level);
+}
+
 static int io_pgtable_visit(struct arm_lpae_io_pgtable *data,
struct io_pgtable_walk_data *walk_data,
arm_lpae_iopte *ptep, int lvl)
@@ -928,6 +952,7 @@ arm_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg)
.unmap_pages= arm_lpae_unmap_pages,
.iova_to_phys   = arm_lpae_iova_to_phys,
.read_and_clear_dirty = arm_lpae_read_and_clear_dirty,
+   .pgtable_walk   = arm_lpae_pgtable_walk,
};
 
return data;
diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index f9a81761bfce..76eabd890e6a 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -174,12 +174,26 @@ struct io_pgtable_cfg {
};
 };
 
+/**
+ * struct arm_lpae_io_pgtable_walk_data - information from a pgtable walk
+ *
+ * @ptes: The recorded PTE values from the walk
+ * @level: The level of the last PTE
+ *
+ * @level also specifies the last valid index in @ptes
+ */
+struct arm_lpae_io_pgtable_walk_data {
+   u64 ptes[4];
+   int level;
+};
+
 /**
  * struct io_pgtable_ops - Page table manipulation API for IOMMU drivers.
  *
  * @map_pages:Map a physically contiguous range of pages of the same size.
  * @unmap_pages:  Unmap a range of virtually contiguous pages of the same size.
  * @iova_to_phys: Translate iova to physical address.
+ * @pgtable_walk: (optional) Perform a page table walk for a given iova.
  *
  * These functions map directly onto the iommu_ops member functions with
  * the same names.
@@ -193,6 +207,7 @@ struct io_pgtable_ops {
  struct iommu_iotlb_gather *gather);
phys_addr_t (*iova_to_phys)(struct io_pgtable_ops *ops,
unsigned long iova);
+   int (*pgtable_walk)(struct io_pgtable_ops *ops, unsigned long iova, 
void *wd);
int (*read_and_clear_dirty)(struct io_pgtable_ops *ops,
unsigned long iova, size_t size,
unsigned long flags,
-- 
2.46.0



[PATCH v7 2/4] iommu/io-pgtable-arm: Re-use the pgtable walk for iova_to_phys

2024-08-20 Thread Rob Clark
From: Rob Clark 

Re-use the generic pgtable walk path.

Signed-off-by: Rob Clark 
---
 drivers/iommu/io-pgtable-arm.c | 73 +-
 1 file changed, 36 insertions(+), 37 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index b4bc358740e0..5fa1274a665a 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -710,42 +710,6 @@ static size_t arm_lpae_unmap_pages(struct io_pgtable_ops 
*ops, unsigned long iov
data->start_level, ptep);
 }
 
-static phys_addr_t arm_lpae_iova_to_phys(struct io_pgtable_ops *ops,
-unsigned long iova)
-{
-   struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
-   arm_lpae_iopte pte, *ptep = data->pgd;
-   int lvl = data->start_level;
-
-   do {
-   /* Valid IOPTE pointer? */
-   if (!ptep)
-   return 0;
-
-   /* Grab the IOPTE we're interested in */
-   ptep += ARM_LPAE_LVL_IDX(iova, lvl, data);
-   pte = READ_ONCE(*ptep);
-
-   /* Valid entry? */
-   if (!pte)
-   return 0;
-
-   /* Leaf entry? */
-   if (iopte_leaf(pte, lvl, data->iop.fmt))
-   goto found_translation;
-
-   /* Take it to the next level */
-   ptep = iopte_deref(pte, data);
-   } while (++lvl < ARM_LPAE_MAX_LEVELS);
-
-   /* Ran out of page tables to walk */
-   return 0;
-
-found_translation:
-   iova &= (ARM_LPAE_BLOCK_SIZE(lvl, data) - 1);
-   return iopte_to_paddr(pte, data) | iova;
-}
-
 struct io_pgtable_walk_data {
void*data;
int (*visit)(struct io_pgtable_walk_data *walk_data, int lvl,
@@ -760,6 +724,41 @@ static int __arm_lpae_iopte_walk(struct 
arm_lpae_io_pgtable *data,
 arm_lpae_iopte *ptep,
 int lvl);
 
+struct iova_to_phys_data {
+   arm_lpae_iopte pte;
+   int lvl;
+};
+
+static int visit_iova_to_phys(struct io_pgtable_walk_data *walk_data, int lvl,
+ arm_lpae_iopte pte, size_t size)
+{
+   struct iova_to_phys_data *data = walk_data->data;
+   data->pte = pte;
+   data->lvl = lvl;
+   return 0;
+}
+
+static phys_addr_t arm_lpae_iova_to_phys(struct io_pgtable_ops *ops,
+unsigned long iova)
+{
+   struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
+   struct iova_to_phys_data d;
+   struct io_pgtable_walk_data walk_data = {
+   .data = &d,
+   .visit = visit_iova_to_phys,
+   .addr = iova,
+   .end = iova + 1,
+   };
+   int ret;
+
+   ret = __arm_lpae_iopte_walk(data, &walk_data, data->pgd, 
data->start_level);
+   if (ret)
+   return 0;
+
+   iova &= (ARM_LPAE_BLOCK_SIZE(d.lvl, data) - 1);
+   return iopte_to_paddr(d.pte, data) | iova;
+}
+
 static int io_pgtable_visit(struct arm_lpae_io_pgtable *data,
struct io_pgtable_walk_data *walk_data,
arm_lpae_iopte *ptep, int lvl)
@@ -776,7 +775,7 @@ static int io_pgtable_visit(struct arm_lpae_io_pgtable 
*data,
return 0;
}
 
-   if (WARN_ON(!iopte_table(pte, lvl)))
+   if (WARN_ON(!iopte_table(pte, lvl) && !selftest_running))
return -EINVAL;
 
ptep = iopte_deref(pte, data);
-- 
2.46.0



[PATCH v7 1/4] iommu/io-pgtable-arm: Make pgtable walker more generic

2024-08-20 Thread Rob Clark
From: Rob Clark 

We can re-use this basic pgtable walk logic in a few places.

Signed-off-by: Rob Clark 
---
 drivers/iommu/io-pgtable-arm.c | 59 +-
 1 file changed, 36 insertions(+), 23 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index f5d9fd1f45bf..b4bc358740e0 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -747,33 +747,31 @@ static phys_addr_t arm_lpae_iova_to_phys(struct 
io_pgtable_ops *ops,
 }
 
 struct io_pgtable_walk_data {
-   struct iommu_dirty_bitmap   *dirty;
+   void*data;
+   int (*visit)(struct io_pgtable_walk_data *walk_data, int lvl,
+arm_lpae_iopte pte, size_t size);
unsigned long   flags;
u64 addr;
const u64   end;
 };
 
-static int __arm_lpae_iopte_walk_dirty(struct arm_lpae_io_pgtable *data,
-  struct io_pgtable_walk_data *walk_data,
-  arm_lpae_iopte *ptep,
-  int lvl);
+static int __arm_lpae_iopte_walk(struct arm_lpae_io_pgtable *data,
+struct io_pgtable_walk_data *walk_data,
+arm_lpae_iopte *ptep,
+int lvl);
 
-static int io_pgtable_visit_dirty(struct arm_lpae_io_pgtable *data,
- struct io_pgtable_walk_data *walk_data,
- arm_lpae_iopte *ptep, int lvl)
+static int io_pgtable_visit(struct arm_lpae_io_pgtable *data,
+   struct io_pgtable_walk_data *walk_data,
+   arm_lpae_iopte *ptep, int lvl)
 {
struct io_pgtable *iop = &data->iop;
arm_lpae_iopte pte = READ_ONCE(*ptep);
 
if (iopte_leaf(pte, lvl, iop->fmt)) {
size_t size = ARM_LPAE_BLOCK_SIZE(lvl, data);
-
-   if (iopte_writeable_dirty(pte)) {
-   iommu_dirty_bitmap_record(walk_data->dirty,
- walk_data->addr, size);
-   if (!(walk_data->flags & IOMMU_DIRTY_NO_CLEAR))
-   iopte_set_writeable_clean(ptep);
-   }
+   int ret = walk_data->visit(walk_data, lvl, pte, size);
+   if (ret)
+   return ret;
walk_data->addr += size;
return 0;
}
@@ -782,13 +780,13 @@ static int io_pgtable_visit_dirty(struct 
arm_lpae_io_pgtable *data,
return -EINVAL;
 
ptep = iopte_deref(pte, data);
-   return __arm_lpae_iopte_walk_dirty(data, walk_data, ptep, lvl + 1);
+   return __arm_lpae_iopte_walk(data, walk_data, ptep, lvl + 1);
 }
 
-static int __arm_lpae_iopte_walk_dirty(struct arm_lpae_io_pgtable *data,
-  struct io_pgtable_walk_data *walk_data,
-  arm_lpae_iopte *ptep,
-  int lvl)
+static int __arm_lpae_iopte_walk(struct arm_lpae_io_pgtable *data,
+struct io_pgtable_walk_data *walk_data,
+arm_lpae_iopte *ptep,
+int lvl)
 {
u32 idx;
int max_entries, ret;
@@ -803,7 +801,7 @@ static int __arm_lpae_iopte_walk_dirty(struct 
arm_lpae_io_pgtable *data,
 
for (idx = ARM_LPAE_LVL_IDX(walk_data->addr, lvl, data);
 (idx < max_entries) && (walk_data->addr < walk_data->end); ++idx) {
-   ret = io_pgtable_visit_dirty(data, walk_data, ptep + idx, lvl);
+   ret = io_pgtable_visit(data, walk_data, ptep + idx, lvl);
if (ret)
return ret;
}
@@ -811,6 +809,20 @@ static int __arm_lpae_iopte_walk_dirty(struct 
arm_lpae_io_pgtable *data,
return 0;
 }
 
+static int visit_dirty(struct io_pgtable_walk_data *walk_data, int lvl,
+  arm_lpae_iopte pte, size_t size)
+{
+   struct iommu_dirty_bitmap *dirty = walk_data->data;
+
+   if (iopte_writeable_dirty(pte)) {
+   iommu_dirty_bitmap_record(dirty, walk_data->addr, size);
+   if (!(walk_data->flags & IOMMU_DIRTY_NO_CLEAR))
+   iopte_set_writeable_clean(&pte);
+   }
+
+   return 0;
+}
+
 static int arm_lpae_read_and_clear_dirty(struct io_pgtable_ops *ops,
 unsigned long iova, size_t size,
 unsigned long flags,
@@ -819,7 +831,8 @@ static int arm_lpae_read_and_clear_dirty(struct 
io_pgtable_ops *ops,
struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
struct io_pgtable_cfg *cfg = &data->iop.cfg;

[PATCH v7 0/4] io-pgtable-arm + drm/msm: Extend iova fault debugging

2024-08-20 Thread Rob Clark
From: Rob Clark 

This series extends io-pgtable-arm with a method to retrieve the page
table entries traversed in the process of address translation, and then
beefs up drm/msm gpu devcore dump to include this (and additional info)
in the devcore dump.

This is a respin of https://patchwork.freedesktop.org/series/94968/
(minus a patch that was already merged)

v2: Fix an armv7/32b build error in the last patch
v3: Incorporate Will Deacon's suggestion to make the interface
callback based.
v4: Actually wire up the callback
v5: Drop the callback approach
v6: Make walk-data struct pgtable specific and rename
io_pgtable_walk_data to arm_lpae_io_pgtable_walk_data
v7: Re-use the pgtable walker added for arm_lpae_read_and_clear_dirty()

Rob Clark (4):
  iommu/io-pgtable-arm: Make pgtable walker more generic
  iommu/io-pgtable-arm: Re-use the pgtable walk for iova_to_phys
  iommu/io-pgtable-arm: Add way to debug pgtable walk
  drm/msm: Extend gpu devcore dumps with pgtbl info

 drivers/gpu/drm/msm/adreno/adreno_gpu.c |  10 ++
 drivers/gpu/drm/msm/msm_gpu.c   |   9 ++
 drivers/gpu/drm/msm/msm_gpu.h   |   8 ++
 drivers/gpu/drm/msm/msm_iommu.c |  22 
 drivers/gpu/drm/msm/msm_mmu.h   |   3 +-
 drivers/iommu/io-pgtable-arm.c  | 147 +++-
 include/linux/io-pgtable.h  |  15 +++
 7 files changed, 158 insertions(+), 56 deletions(-)

-- 
2.46.0



[pull] drm/msm: drm-msm-fixes-2024-08-19 for v6.11-rc5

2024-08-19 Thread Rob Clark
Hi Dave,

A few fixes for v6.11, see description below.

The following changes since commit fe34394ecdad459d2d7b1f30e4a39ac27fcd77f8:

  dt-bindings: display/msm: dsi-controller-main: Add SM7150
(2024-07-03 05:57:35 -0700)

are available in the Git repository at:

  https://gitlab.freedesktop.org/drm/msm.git tags/drm-msm-fixes-2024-08-19

for you to fetch changes up to 624ab9cde26a9f150b4fd268b0f3dae3184dc40c:

  drm/msm/adreno: Fix error return if missing firmware-name
(2024-08-15 10:12:07 -0700)


Fixes for v6.11-rc5

1) Fixes from the virtual plane series, namely
   - fix the list of formats for QCM2290 since it has no YUV support
   - minor fix in dpu_plane_atomic_check_pipe() to check only for csc and
 not csc and scaler while allowing yuv formats
   - take rotation into account while allocating virtual planes

2) Fix to cleanup FB if dpu_format_populate_layout() fails. This fixes the
   warning splat during DRM file closure

3) Fix to reset the phy link params before re-starting link training. This
   fixes the 100% link training failure when someone starts modetest while
   cable is connected

4) Long-pending fix for a visual corruption seen with 4k modes. Root-cause
   was that we cannot support 4k@30 at 30bpp with 2 lanes, so this is a
   critical fix to use 24bpp for such cases

5) Fix to move dpu encoder's connector assignment to atomic_enable(). This
   fixes a NULL ptr crash, reported during hotplug, for cases when there is
   an atomic_enable() without atomic_modeset() after atomic_disable(). This
   happens for the connectors_changed case of the crtc.

6) Fix to simplify DPU's debug macros without which dynamic debug does not
   work as expected

7) Fix the highest bank bit setting for sc7180

8) adreno: fix error return if missing firmware-name


Abhinav Kumar (4):
  drm/msm/dp: fix the max supported bpp logic
  drm/msm/dpu: move dpu_encoder's connector assignment to atomic_enable()
  drm/msm/dp: reset the link phy params before link training
  drm/msm: fix the highest_bank_bit for sc7180

Dmitry Baryshkov (5):
  drm/msm/dpu: don't play tricks with debug macros
  drm/msm/dpu: cleanup FB if dpu_format_populate_layout fails
  drm/msm/dpu: limit QCM2290 to RGB formats only
  drm/msm/dpu: relax YUV requirements
  drm/msm/dpu: take plane rotation into account for wide planes

Rob Clark (1):
  drm/msm/adreno: Fix error return if missing firmware-name

 drivers/gpu/drm/msm/adreno/adreno_gpu.c|  2 +-
 drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c|  4 ++--
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c |  4 ++--
 drivers/gpu/drm/msm/disp/dpu1/dpu_kms.h| 14 ++
 drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c  | 20 +---
 drivers/gpu/drm/msm/dp/dp_ctrl.c   |  2 ++
 drivers/gpu/drm/msm/dp/dp_panel.c  | 19 ++-
 drivers/gpu/drm/msm/msm_mdss.c |  2 +-
 8 files changed, 37 insertions(+), 30 deletions(-)


Re: [PATCH 0/7] Preemption support for A7XX

2024-08-16 Thread Rob Clark
On Thu, Aug 15, 2024 at 11:27 AM Antonino Maniscalco
 wrote:
>
> This series implements preemption for A7XX targets, which allows the GPU to
> switch to an higher priority ring when work is pushed to it, reducing latency
> for high priority submissions.
>
> This series enables L1 preemption with skip_save_restore which requires
> the following userspace patches to function:
>
> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30544
>
> A flag is added to `msm_gem_submit` to only allow submissions from compatible
> userspace to be preempted, therefore maintaining compatibility.

I guess this last para is from an earlier iteration of this series?
Looks like instead you are making this a submitqueue flag (which is an
approach that I prefer)

BR,
-R

> Some commits from this series are based on a previous series to enable
> preemption on A6XX targets:
>
> https://lkml.kernel.org/1520489185-21828-1-git-send-email-smase...@codeaurora.org
>
> Signed-off-by: Antonino Maniscalco 
> ---
> Antonino Maniscalco (7):
>   drm/msm: Fix bv_fence being used as bv_rptr
>   drm/msm: Add submitqueue setup and close
>   drm/msm: Add a `preempt_record_size` field
>   drm/msm/A6xx: Implement preemption for A7XX targets
>   drm/msm/A6xx: Add traces for preemption
>   drm/msm/A6XX: Add a flag to allow preemption to submitqueue_create
>   drm/msm/A6xx: Enable preemption for A7xx targets
>
>  drivers/gpu/drm/msm/Makefile  |   1 +
>  drivers/gpu/drm/msm/adreno/a6xx_catalog.c |   3 +
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 339 ++-
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.h | 168 
>  drivers/gpu/drm/msm/adreno/a6xx_preempt.c | 441 
> ++
>  drivers/gpu/drm/msm/adreno/adreno_gpu.h   |   1 +
>  drivers/gpu/drm/msm/msm_gpu.h |   7 +
>  drivers/gpu/drm/msm/msm_gpu_trace.h   |  28 ++
>  drivers/gpu/drm/msm/msm_ringbuffer.h  |   8 +
>  drivers/gpu/drm/msm/msm_submitqueue.c |  10 +
>  include/uapi/drm/msm_drm.h|   5 +-
>  11 files changed, 995 insertions(+), 16 deletions(-)
> ---
> base-commit: 7c626ce4bae1ac14f60076d00eafe71af30450ba
> change-id: 20240815-preemption-a750-t-fcee9a844b39
>
> Best regards,
> --
> Antonino Maniscalco 
>


Re: [PATCH] drm/msm: fix the highest_bank_bit for sc7180

2024-08-12 Thread Rob Clark
On Thu, Aug 8, 2024 at 4:52 PM Abhinav Kumar  wrote:
>
> sc7180 programs the ubwc settings as 0x1e as that would mean a
> highest bank bit of 14 which matches what the GPU sets as well.
>
> However, the highest_bank_bit field of the msm_mdss_data which is
> being used to program the SSPP's fetch configuration is programmed
> to a highest bank bit of 16 as 0x3 translates to 16 and not 14.
>
> Fix the highest bank bit field used for the SSPP to match the mdss
> and gpu settings.
>
> Fixes: 6f410b246209 ("drm/msm/mdss: populate missing data")
> Signed-off-by: Abhinav Kumar 

Reviewed-by: Rob Clark 

> ---
>  drivers/gpu/drm/msm/msm_mdss.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/msm/msm_mdss.c b/drivers/gpu/drm/msm/msm_mdss.c
> index d90b9471ba6f..faa88fd6eb4d 100644
> --- a/drivers/gpu/drm/msm/msm_mdss.c
> +++ b/drivers/gpu/drm/msm/msm_mdss.c
> @@ -577,7 +577,7 @@ static const struct msm_mdss_data sc7180_data = {
> .ubwc_enc_version = UBWC_2_0,
> .ubwc_dec_version = UBWC_2_0,
> .ubwc_static = 0x1e,
> -   .highest_bank_bit = 0x3,
> +   .highest_bank_bit = 0x1,
> .reg_bus_bw = 76800,
>  };
>
> --
> 2.44.0
>


Re: [PATCH v2 1/3] drm/msm: Use a7xx family directly in gpu_state

2024-08-12 Thread Rob Clark
On Sun, Aug 11, 2024 at 11:09 PM Akhil P Oommen
 wrote:
>
> On Wed, Aug 07, 2024 at 01:34:27PM +0100, Connor Abbott wrote:
> > With a7xx, we need to import a new header for each new generation and
> > switch to a different list of registers, instead of making
> > backwards-compatible changes. Using the helpers inadvertently made a750
> > use the a740 list of registers, instead use the family directly to fix
> > this.
>
> This won't scale. What about other gpus in the same generation but has a
> different register list? You don't see that issue currently because
> there are no support for lower tier a7x GPUs yet.
>
> I think we should move to a "snapshot block list" like in the downstream
> driver if you want to simplify the whole logic. Otherwise, we should
> leave the chipid check as it is and just fix up a750 configurations.

Or maybe just move some of this to the device catalogue?

BR,
-R

> -Akhil
>
> >
> > Fixes: f3f8207d8aed ("drm/msm: Add devcoredump support for a750")
> > Signed-off-by: Connor Abbott 
> > ---
> >  drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c | 41 
> > ++---
> >  1 file changed, 20 insertions(+), 21 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c 
> > b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
> > index 77146d30bcaa..c641ee7dec78 100644
> > --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
> > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
> > @@ -390,18 +390,18 @@ static void a7xx_get_debugbus_blocks(struct msm_gpu 
> > *gpu,
> >   const u32 *debugbus_blocks, *gbif_debugbus_blocks;
> >   int i;
> >
> > - if (adreno_is_a730(adreno_gpu)) {
> > + if (adreno_gpu->info->family == ADRENO_7XX_GEN1) {
> >   debugbus_blocks = gen7_0_0_debugbus_blocks;
> >   debugbus_blocks_count = ARRAY_SIZE(gen7_0_0_debugbus_blocks);
> >   gbif_debugbus_blocks = a7xx_gbif_debugbus_blocks;
> >   gbif_debugbus_blocks_count = 
> > ARRAY_SIZE(a7xx_gbif_debugbus_blocks);
> > - } else if (adreno_is_a740_family(adreno_gpu)) {
> > + } else if (adreno_gpu->info->family == ADRENO_7XX_GEN2) {
> >   debugbus_blocks = gen7_2_0_debugbus_blocks;
> >   debugbus_blocks_count = ARRAY_SIZE(gen7_2_0_debugbus_blocks);
> >   gbif_debugbus_blocks = a7xx_gbif_debugbus_blocks;
> >   gbif_debugbus_blocks_count = 
> > ARRAY_SIZE(a7xx_gbif_debugbus_blocks);
> >   } else {
> > - BUG_ON(!adreno_is_a750(adreno_gpu));
> > + BUG_ON(adreno_gpu->info->family != ADRENO_7XX_GEN3);
> >   debugbus_blocks = gen7_9_0_debugbus_blocks;
> >   debugbus_blocks_count = ARRAY_SIZE(gen7_9_0_debugbus_blocks);
> >   gbif_debugbus_blocks = gen7_9_0_gbif_debugbus_blocks;
> > @@ -511,7 +511,7 @@ static void a6xx_get_debugbus(struct msm_gpu *gpu,
> >   const struct a6xx_debugbus_block *cx_debugbus_blocks;
> >
> >   if (adreno_is_a7xx(adreno_gpu)) {
> > - BUG_ON(!(adreno_is_a730(adreno_gpu) || 
> > adreno_is_a740_family(adreno_gpu)));
> > + BUG_ON(adreno_gpu->info->family > ADRENO_7XX_GEN3);
> >   cx_debugbus_blocks = a7xx_cx_debugbus_blocks;
> >   nr_cx_debugbus_blocks = 
> > ARRAY_SIZE(a7xx_cx_debugbus_blocks);
> >   } else {
> > @@ -662,11 +662,11 @@ static void a7xx_get_dbgahb_clusters(struct msm_gpu 
> > *gpu,
> >   const struct gen7_sptp_cluster_registers *dbgahb_clusters;
> >   unsigned dbgahb_clusters_size;
> >
> > - if (adreno_is_a730(adreno_gpu)) {
> > + if (adreno_gpu->info->family == ADRENO_7XX_GEN1) {
> >   dbgahb_clusters = gen7_0_0_sptp_clusters;
> >   dbgahb_clusters_size = ARRAY_SIZE(gen7_0_0_sptp_clusters);
> >   } else {
> > - BUG_ON(!adreno_is_a740_family(adreno_gpu));
> > + BUG_ON(adreno_gpu->info->family > ADRENO_7XX_GEN3);
> >   dbgahb_clusters = gen7_2_0_sptp_clusters;
> >   dbgahb_clusters_size = ARRAY_SIZE(gen7_2_0_sptp_clusters);
> >   }
> > @@ -820,14 +820,14 @@ static void a7xx_get_clusters(struct msm_gpu *gpu,
> >   const struct gen7_cluster_registers *clusters;
> >   unsigned clusters_size;
> >
> > - if (adreno_is_a730(adreno_gpu)) {
> > + if (adreno_gpu->info->family == ADRENO_7XX_GEN1) {
> >   clusters = gen7_0_0_clusters;
> >   clusters_size = ARRAY_SIZE(gen7_0_0_clusters);
> > - } else if (adreno_is_a740_family(adreno_gpu)) {
> > + } else if (adreno_gpu->info->family == ADRENO_7XX_GEN2) {
> >   clusters = gen7_2_0_clusters;
> >   clusters_size = ARRAY_SIZE(gen7_2_0_clusters);
> >   } else {
> > - BUG_ON(!adreno_is_a750(adreno_gpu));
> > + BUG_ON(adreno_gpu->info->family != ADRENO_7XX_GEN3);
> >   clusters = gen7_9_0_clusters;
> >   clusters_size = ARRAY_SIZE(

[PATCH] drm/msm: Remove unused pm_state

2024-08-09 Thread Rob Clark
From: Rob Clark 

This was added in commit ec446d09366c ("drm/msm: call
drm_atomic_helper_suspend() and drm_atomic_helper_resume()"), but unused
since commit ca8199f13498 ("drm/msm/dpu: ensure device suspend happens
during PM sleep") which switched to drm_mode_config_helper_suspend()/
drm_mode_config_helper_resume()..

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_drv.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h
index be016d7b4ef1..c2eb9f14323e 100644
--- a/drivers/gpu/drm/msm/msm_drv.h
+++ b/drivers/gpu/drm/msm/msm_drv.h
@@ -215,8 +215,6 @@ struct msm_drm_private {
struct notifier_block vmap_notifier;
struct shrinker *shrinker;
 
-   struct drm_atomic_state *pm_state;
-
/**
 * hangcheck_period: For hang detection, in ms
 *
-- 
2.46.0



Re: [PATCH v2 0/3] drm/msm: Further expose UBWC tiling parameters

2024-07-31 Thread Rob Clark
On Wed, Jul 3, 2024 at 3:54 AM Connor Abbott  wrote:
>
> After testing, there are more parameters that we're programming which
> affect how UBWC tiles are laid out in memory and therefore affect
> the Mesa implementation of VK_EXT_host_image_copy [1], which includes a
> CPU implementation of tiling and detiling images. In particular we have:
>
> 1. ubwc_mode, which is used to enable level 1 bank swizzling to go back
>to UBWC 1.0 when the implementation supports UBWC 2.0. a610 sets
>this.
> 2. macrotile_mode, which we previously left as default but according to
>downstream we shouldn't for a680.
> 3. level2_swizzling_dis, which according to downstream has to be set
>differently for a663.
>
> I want as much as possible to avoid problems from people trying to
> upstream Mesa/kernel support not knowing what they're doing and blindly
> copying things, so let's make this very explicit that you must set the
> correct parameters in the kernel and then make sure that Mesa always
> gets the right parameters from the "source of truth" in the kernel by
> adding two new UAPI parameters. The Mesa MR has already been updated to
> use this if available.
>
> A secondary goal is to make the adreno settings look more like the MDSS
> settings, by combining ubwc_mode and level2_swizzling_dis into a single
> ubwc_swizzle parameter that matches the MDSS one. This will help with
> creating a single source of truth for all drivers later. The UAPI also
> matches this, and it makes the Mesa tiling and detiling implementation
> simpler/more straightforward.
>
> For more information on what all these parameters mean, see the comments
> I've added in the first commit and the giant comment in
> src/freedreno/fdl/fd6_tiled_memcpy.c I've added in [1].
>
> Testing of the Mesa MR both with and without this series is appreciated,
> there are many different SoCs out there with different UBWC
> configurations and I cannot test them all.
>
> [1] https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26578
>
> Signed-off-by: Connor Abbott 

for the series,

Reviewed-by: Rob Clark 

But could you resend with updated a6xx.xml now that the perf cntrs
have been corrected (to avoid further churn later)

BR,
-R

> ---
> Changes in v2:
> - Move ubwc_config field descriptions to kerneldoc comments on the struct
> - Link to v1: 
> https://lore.kernel.org/r/20240702-msm-tiling-config-v1-0-adaa6a6e4...@gmail.com
>
> ---
> Connor Abbott (3):
>   drm/msm: Update a6xx register XML
>   drm/msm: Expand UBWC config setting
>   drm/msm: Expose expanded UBWC config uapi
>
>  drivers/gpu/drm/msm/adreno/a5xx_gpu.c |4 +
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c |   34 +-
>  drivers/gpu/drm/msm/adreno/adreno_gpu.c   |6 +
>  drivers/gpu/drm/msm/adreno/adreno_gpu.h   |   32 +-
>  drivers/gpu/drm/msm/registers/adreno/a6xx.xml | 1617 
> -
>  include/uapi/drm/msm_drm.h|2 +
>  6 files changed, 1664 insertions(+), 31 deletions(-)
> ---
> base-commit: 269b88cb92e62e52718cd44c07b7517265193157
> change-id: 20240701-msm-tiling-config-c5f222f5db1c
>
> Best regards,
> --
> Connor Abbott 
>


[PATCH v6 2/2] drm/msm: Extend gpu devcore dumps with pgtbl info

2024-07-17 Thread Rob Clark
From: Rob Clark 

In the case of iova fault triggered devcore dumps, include additional
debug information based on what we think is the current page tables,
including the TTBR0 value (which should match what we have in
adreno_smmu_fault_info unless things have gone horribly wrong), and
the pagetable entries traversed in the process of resolving the
faulting iova.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/adreno/adreno_gpu.c | 10 ++
 drivers/gpu/drm/msm/msm_gpu.c   |  9 +
 drivers/gpu/drm/msm/msm_gpu.h   |  8 
 drivers/gpu/drm/msm/msm_iommu.c | 22 ++
 drivers/gpu/drm/msm/msm_mmu.h   |  3 ++-
 5 files changed, 51 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c 
b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
index 99661af8d941..422dae873b6b 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
@@ -861,6 +861,16 @@ void adreno_show(struct msm_gpu *gpu, struct msm_gpu_state 
*state,
drm_printf(p, "  - dir=%s\n", info->flags & IOMMU_FAULT_WRITE ? 
"WRITE" : "READ");
drm_printf(p, "  - type=%s\n", info->type);
drm_printf(p, "  - source=%s\n", info->block);
+
+   /* Information extracted from what we think are the current
+* pgtables.  Hopefully the TTBR0 matches what we've extracted
+* from the SMMU registers in smmu_info!
+*/
+   drm_puts(p, "pgtable-fault-info:\n");
+   drm_printf(p, "  - ttbr0: %.16llx\n", (u64)info->pgtbl_ttbr0);
+   drm_printf(p, "  - asid: %d\n", info->asid);
+   drm_printf(p, "  - ptes: %.16llx %.16llx %.16llx %.16llx\n",
+  info->ptes[0], info->ptes[1], info->ptes[2], 
info->ptes[3]);
}
 
drm_printf(p, "rbbm-status: 0x%08x\n", state->rbbm_status);
diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index 3666b42b4ecd..bf2f8b2a7ccc 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -281,6 +281,15 @@ static void msm_gpu_crashstate_capture(struct msm_gpu *gpu,
if (submit) {
int i;
 
+   if (state->fault_info.ttbr0) {
+   struct msm_gpu_fault_info *info = &state->fault_info;
+   struct msm_mmu *mmu = submit->aspace->mmu;
+
+   msm_iommu_pagetable_params(mmu, &info->pgtbl_ttbr0,
+  &info->asid);
+   msm_iommu_pagetable_walk(mmu, info->iova, info->ptes);
+   }
+
state->bos = kcalloc(submit->nr_bos,
sizeof(struct msm_gpu_state_bo), GFP_KERNEL);
 
diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
index 1f02bb9956be..82e838ba8c80 100644
--- a/drivers/gpu/drm/msm/msm_gpu.h
+++ b/drivers/gpu/drm/msm/msm_gpu.h
@@ -101,6 +101,14 @@ struct msm_gpu_fault_info {
int flags;
const char *type;
const char *block;
+
+   /* Information about what we think/expect is the current SMMU state,
+* for example expected_ttbr0 should match smmu_info.ttbr0 which
+* was read back from SMMU registers.
+*/
+   phys_addr_t pgtbl_ttbr0;
+   u64 ptes[4];
+   int asid;
 };
 
 /**
diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
index d5512037c38b..0c35a5b597f3 100644
--- a/drivers/gpu/drm/msm/msm_iommu.c
+++ b/drivers/gpu/drm/msm/msm_iommu.c
@@ -195,6 +195,28 @@ struct iommu_domain_geometry 
*msm_iommu_get_geometry(struct msm_mmu *mmu)
return &iommu->domain->geometry;
 }
 
+int
+msm_iommu_pagetable_walk(struct msm_mmu *mmu, unsigned long iova, uint64_t 
ptes[4])
+{
+   struct msm_iommu_pagetable *pagetable;
+   struct arm_lpae_io_pgtable_walk_data wd = {};
+
+   if (mmu->type != MSM_MMU_IOMMU_PAGETABLE)
+   return -EINVAL;
+
+   pagetable = to_pagetable(mmu);
+
+   if (!pagetable->pgtbl_ops->pgtable_walk)
+   return -EINVAL;
+
+   pagetable->pgtbl_ops->pgtable_walk(pagetable->pgtbl_ops, iova, &wd);
+
+   for (int i = 0; i < ARRAY_SIZE(wd.ptes); i++)
+   ptes[i] = wd.ptes[i];
+
+   return 0;
+}
+
 static const struct msm_mmu_funcs pagetable_funcs = {
.map = msm_iommu_pagetable_map,
.unmap = msm_iommu_pagetable_unmap,
diff --git a/drivers/gpu/drm/msm/msm_mmu.h b/drivers/gpu/drm/msm/msm_mmu.h
index 88af4f490881..96e509bd96a6 100644
--- a/drivers/gpu/drm/msm/msm_mmu.h
+++ b/drivers/gpu/drm/msm/msm_mmu.h
@@ -53,7 +53,8 @@ static inline void msm_mmu_set

[PATCH v6 1/2] iommu/io-pgtable-arm: Add way to debug pgtable walk

2024-07-17 Thread Rob Clark
From: Rob Clark 

Add an io-pgtable method to walk the pgtable returning the raw PTEs that
would be traversed for a given iova access.

Signed-off-by: Rob Clark 
Acked-by: Joerg Roedel 

---
 drivers/iommu/io-pgtable-arm.c | 36 +-
 include/linux/io-pgtable.h | 17 
 2 files changed, 44 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 3d23b924cec1..e70803940b46 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -690,9 +690,11 @@ static size_t arm_lpae_unmap_pages(struct io_pgtable_ops 
*ops, unsigned long iov
data->start_level, ptep);
 }
 
-static phys_addr_t arm_lpae_iova_to_phys(struct io_pgtable_ops *ops,
-unsigned long iova)
+static int arm_lpae_pgtable_walk(struct io_pgtable_ops *ops,
+unsigned long iova,
+void *_wd)
 {
+   struct arm_lpae_io_pgtable_walk_data *wd = _wd;
struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
arm_lpae_iopte pte, *ptep = data->pgd;
int lvl = data->start_level;
@@ -700,7 +702,7 @@ static phys_addr_t arm_lpae_iova_to_phys(struct 
io_pgtable_ops *ops,
do {
/* Valid IOPTE pointer? */
if (!ptep)
-   return 0;
+   return -ENOENT;
 
/* Grab the IOPTE we're interested in */
ptep += ARM_LPAE_LVL_IDX(iova, lvl, data);
@@ -708,22 +710,37 @@ static phys_addr_t arm_lpae_iova_to_phys(struct 
io_pgtable_ops *ops,
 
/* Valid entry? */
if (!pte)
-   return 0;
+   return -ENOENT;
 
-   /* Leaf entry? */
+   wd->ptes[wd->level++] = pte;
+
+   /* Leaf entry?  If so, we've found the translation */
if (iopte_leaf(pte, lvl, data->iop.fmt))
-   goto found_translation;
+   return 0;
 
/* Take it to the next level */
ptep = iopte_deref(pte, data);
} while (++lvl < ARM_LPAE_MAX_LEVELS);
 
/* Ran out of page tables to walk */
-   return 0;
+   return -ENOENT;
+}
+
+static phys_addr_t arm_lpae_iova_to_phys(struct io_pgtable_ops *ops,
+unsigned long iova)
+{
+   struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
+   struct arm_lpae_io_pgtable_walk_data wd = {};
+   int ret, lvl;
+
+   ret = arm_lpae_pgtable_walk(ops, iova, &wd);
+   if (ret)
+   return 0;
+
+   lvl = wd.level + data->start_level;
 
-found_translation:
iova &= (ARM_LPAE_BLOCK_SIZE(lvl, data) - 1);
-   return iopte_to_paddr(pte, data) | iova;
+   return iopte_to_paddr(wd.ptes[wd.level - 1], data) | iova;
 }
 
 static void arm_lpae_restrict_pgsizes(struct io_pgtable_cfg *cfg)
@@ -804,6 +821,7 @@ arm_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg)
.map_pages  = arm_lpae_map_pages,
.unmap_pages= arm_lpae_unmap_pages,
.iova_to_phys   = arm_lpae_iova_to_phys,
+   .pgtable_walk   = arm_lpae_pgtable_walk,
};
 
return data;
diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index 86cf1f7ae389..df6f6e58310c 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -171,12 +171,28 @@ struct io_pgtable_cfg {
};
 };
 
+/**
+ * struct arm_lpae_io_pgtable_walk_data - information from a pgtable walk
+ *
+ * @ptes: The recorded PTE values from the walk
+ * @level: The level of the last PTE
+ *
+ * @level also specifies the last valid index in @ptes
+ */
+struct arm_lpae_io_pgtable_walk_data {
+   u64 ptes[4];
+   int level;
+};
+
 /**
  * struct io_pgtable_ops - Page table manipulation API for IOMMU drivers.
  *
  * @map_pages:Map a physically contiguous range of pages of the same size.
  * @unmap_pages:  Unmap a range of virtually contiguous pages of the same size.
  * @iova_to_phys: Translate iova to physical address.
+ * @pgtable_walk: (optional) Perform a page table walk for a given iova.  The
+ *type for the wd parameter is specific to pgtable type, as
+ *the PTE size and number of levels differs per pgtable type.
  *
  * These functions map directly onto the iommu_ops member functions with
  * the same names.
@@ -190,6 +206,7 @@ struct io_pgtable_ops {
  struct iommu_iotlb_gather *gather);
phys_addr_t (*iova_to_phys)(struct io_pgtable_ops *ops,
unsigned long iova);
+   int (*pgtable_walk)(struct io_pgtable_ops *ops, unsigned long iova, 
void *wd);
int (*read_a

[PATCH v6 0/2] io-pgtable-arm + drm/msm: Extend iova fault debugging

2024-07-17 Thread Rob Clark
From: Rob Clark 

This series extends io-pgtable-arm with a method to retrieve the page
table entries traversed in the process of address translation, and then
beefs up drm/msm gpu devcore dump to include this (and additional info)
in the devcore dump.

This is a respin of https://patchwork.freedesktop.org/series/94968/
(minus a patch that was already merged)

v2: Fix an armv7/32b build error in the last patch
v3: Incorporate Will Deacon's suggestion to make the interface
callback based.
v4: Actually wire up the callback
v5: Drop the callback approach
v6: Make walk-data struct pgtable specific and rename
io_pgtable_walk_data to arm_lpae_io_pgtable_walk_data

Rob Clark (2):
  iommu/io-pgtable-arm: Add way to debug pgtable walk
  drm/msm: Extend gpu devcore dumps with pgtbl info

 drivers/gpu/drm/msm/adreno/adreno_gpu.c | 10 +++
 drivers/gpu/drm/msm/msm_gpu.c   |  9 +++
 drivers/gpu/drm/msm/msm_gpu.h   |  8 ++
 drivers/gpu/drm/msm/msm_iommu.c | 22 +++
 drivers/gpu/drm/msm/msm_mmu.h   |  3 ++-
 drivers/iommu/io-pgtable-arm.c  | 36 ++---
 include/linux/io-pgtable.h  | 17 
 7 files changed, 95 insertions(+), 10 deletions(-)

-- 
2.45.2



Re: [PATCH 5/5] drm/msm/dpu: rate limit snapshot capture for mmu faults

2024-07-16 Thread Rob Clark
On Tue, Jul 16, 2024 at 2:45 PM Abhinav Kumar  wrote:
>
>
>
> On 7/15/2024 12:51 PM, Rob Clark wrote:
> > On Mon, Jul 1, 2024 at 12:43 PM Dmitry Baryshkov
> >  wrote:
> >>
> >> On Fri, Jun 28, 2024 at 02:48:47PM GMT, Abhinav Kumar wrote:
> >>> There is no recovery mechanism in place yet to recover from mmu
> >>> faults for DPU. We can only prevent the faults by making sure there
> >>> is no misconfiguration.
> >>>
> >>> Rate-limit the snapshot capture for mmu faults to once per
> >>> msm_kms_init_aspace() as that should be sufficient to capture
> >>> the snapshot for debugging otherwise there will be a lot of
> >>> dpu snapshots getting captured for the same fault which is
> >>> redundant and also might affect capturing even one snapshot
> >>> accurately.
> >>
> >> Please squash this into the first patch. There is no need to add code
> >> with a known defficiency.
> >>
> >> Also, is there a reason why you haven't used  ?
> >
> > So, in some ways devcoredump is ratelimited by userspace needing to
> > clear an existing devcore..
> >
>
> Yes, a new devcoredump device will not be created until the previous one
> is consumed or times out but here I am trying to limit even the DPU
> snapshot capture because DPU register space is really huge and the rate
> at which smmu faults occur is quite fast that its causing instability
> while snapshots are being captured.
>
> > What I'd suggest would be more useful is to limit the devcores to once
> > per atomic update, ie. if display state hasn't changed, maybe an
> > additional devcore isn't useful
> >
> > BR,
> > -R
> >
>
> By display state change, do you mean like the checks we have in
> drm_atomic_crtc_needs_modeset()?
>
> OR do you mean we need to cache the previous (currently picked up by hw)
> state and trigger a new devcores if the new state is different by
> comparing more things?
>
> This will help to reduce the snapshots to unique frame updates but I do
> not think it will reduce the rate enough for the case where DPU did not
> recover from the previous fault.

I was thinking the easy thing of just resetting the counter in
msm_atomic_commit_tail()..  I suppose we could be clever and filter
out updates that only change the scanout address, or hash the atomic
state and only generate devcoredumps for unique states.  But I'm not
sure how over-complicated we should make this.
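
For the easy version, something like this untested sketch (reusing the
fault_snapshot_capture counter from this series):

  static void msm_atomic_commit_tail(struct drm_atomic_state *state)
  {
          struct msm_drm_private *priv = state->dev->dev_private;
          struct msm_kms *kms = priv->kms;

          /* ... existing flush/wait logic ... */

          /*
           * The new display state is latched, so re-arm the snapshot
           * and the next fault after this update gets captured.
           */
          kms->fault_snapshot_capture = 0;
  }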

BR,
-R

>
> >>
> >>>
> >>> Signed-off-by: Abhinav Kumar 
> >>> ---
> >>>   drivers/gpu/drm/msm/msm_kms.c | 6 +-
> >>>   drivers/gpu/drm/msm/msm_kms.h | 3 +++
> >>>   2 files changed, 8 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/msm/msm_kms.c b/drivers/gpu/drm/msm/msm_kms.c
> >>> index d5d3117259cf..90a333920c01 100644
> >>> --- a/drivers/gpu/drm/msm/msm_kms.c
> >>> +++ b/drivers/gpu/drm/msm/msm_kms.c
> >>> @@ -168,7 +168,10 @@ static int msm_kms_fault_handler(void *arg, unsigned 
> >>> long iova, int flags, void
> >>>   {
> >>>struct msm_kms *kms = arg;
> >>>
> >>> - msm_disp_snapshot_state(kms->dev);
> >>> + if (!kms->fault_snapshot_capture) {
> >>> + msm_disp_snapshot_state(kms->dev);
> >>> + kms->fault_snapshot_capture++;
> >>
> >> When is it decremented?
> >>
> >>> + }
> >>>
> >>>return -ENOSYS;
> >>>   }
> >>> @@ -208,6 +211,7 @@ struct msm_gem_address_space 
> >>> *msm_kms_init_aspace(struct drm_device *dev)
> >>>mmu->funcs->destroy(mmu);
> >>>}
> >>>
> >>> + kms->fault_snapshot_capture = 0;
> >>>msm_mmu_set_fault_handler(aspace->mmu, kms, msm_kms_fault_handler);
> >>>
> >>>return aspace;
> >>> diff --git a/drivers/gpu/drm/msm/msm_kms.h b/drivers/gpu/drm/msm/msm_kms.h
> >>> index 1e0c54de3716..240b39e60828 100644
> >>> --- a/drivers/gpu/drm/msm/msm_kms.h
> >>> +++ b/drivers/gpu/drm/msm/msm_kms.h
> >>> @@ -134,6 +134,9 @@ struct msm_kms {
> >>>int irq;
> >>>bool irq_requested;
> >>>
> >>> + /* rate limit the snapshot capture to once per attach */
> >>> + int fault_snapshot_capture;
> >>> +
> >>>/* mapper-id used to request GEM buffer mapped for scanout: */
> >>>struct msm_gem_address_space *aspace;
> >>>
> >>> --
> >>> 2.44.0
> >>>
> >>
> >> --
> >> With best wishes
> >> Dmitry


[PATCH] drm/msm/adreno: Fix error return if missing firmware-name

2024-07-16 Thread Rob Clark
From: Rob Clark 

-ENODEV is used to signify that there is no zap shader for the platform,
and the CPU can directly take the GPU out of secure mode.  We want to
use this return code when there is no zap-shader node, but not when
the node exists without a firmware-name property.  That case we want
to treat as if the needed fw was not found.
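
For context, the a6xx init path already treats the two codes
differently, roughly like this (paraphrased sketch, not the verbatim
code):

  ret = a6xx_zap_shader_init(gpu);
  if (!ret) {
          /* have a zap shader: ask it to drop out of secure mode */
          OUT_PKT7(gpu->rb[0], CP_SET_SECURE_MODE, 1);
          OUT_RING(gpu->rb[0], 0x00000000);
          a6xx_flush(gpu, gpu->rb[0]);
          if (!a6xx_idle(gpu, gpu->rb[0]))
                  return -EINVAL;
  } else if (ret == -ENODEV) {
          /* no zap shader on this platform: CPU can do it directly */
          dev_warn_once(gpu->dev->dev,
                  "Zap shader not enabled - using SECVID_TRUST_CNTL instead\n");
          gpu_write(gpu, REG_A6XX_RBBM_SECVID_TRUST_CNTL, 0x0);
  } else {
          return ret;   /* includes -ENOENT for missing firmware */
  }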

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/adreno/adreno_gpu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c 
b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
index b46e7e93b3ed..0d84be3be0b7 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
@@ -99,7 +99,7 @@ static int zap_shader_load_mdt(struct msm_gpu *gpu, const 
char *fwname,
 * was a bad idea, and is only provided for backwards
 * compatibility for older targets.
 */
-   return -ENODEV;
+   return -ENOENT;
}
 
if (IS_ERR(fw)) {
-- 
2.45.2



Re: [PATCH 1/4] drm/msm/a5xx: disable preemption in submits by default

2024-07-15 Thread Rob Clark
On Thu, Jul 11, 2024 at 3:02 AM Vladimir Lypak  wrote:
>
> Fine grain preemption (switching from/to points within submits)
> requires extra handling in command stream of those submits, especially
> when rendering with tiling (using GMEM). However this handling is
> missing at this point in mesa (and always was). For this reason we get
> random GPU faults and hangs if more than one priority level is used
> because local preemption is enabled prior to executing command stream
> from submit.
> With that said it was ahead of time to enable local preemption by
> default considering the fact that even on downstream kernel it is only
> enabled if requested via UAPI.
>
> Fixes: a7a4c19c36de ("drm/msm/a5xx: fix setting of the 
> CP_PREEMPT_ENABLE_LOCAL register")
> Signed-off-by: Vladimir Lypak 
> ---
>  drivers/gpu/drm/msm/adreno/a5xx_gpu.c | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c 
> b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> index c0b5373e90d7..6c80d3003966 100644
> --- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> @@ -150,9 +150,13 @@ static void a5xx_submit(struct msm_gpu *gpu, struct 
> msm_gem_submit *submit)
> OUT_PKT7(ring, CP_SET_PROTECTED_MODE, 1);
> OUT_RING(ring, 1);
>
> -   /* Enable local preemption for finegrain preemption */
> +   /*
> +* Disable local preemption by default because it requires
> +* user-space to be aware of it and provide additional handling
> +* to restore rendering state or do various flushes on switch.
> +*/
> OUT_PKT7(ring, CP_PREEMPT_ENABLE_LOCAL, 1);
> -   OUT_RING(ring, 0x1);
> +   OUT_RING(ring, 0x0);

From a quick look at the a530 pfp fw, it looks like
CP_PREEMPT_ENABLE_LOCAL is allowed in IB1/IB2 (ie. not restricted to
kernel RB).  So we should just disable it in the kernel, and let
userspace send a CP_PREEMPT_ENABLE_LOCAL to enable local preemption.
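
I.e. on the userspace side it would just be something like (sketch,
using the same packet payload the kernel currently emits):

  /* opt in to local (finegrain) preemption at a point where the
   * command stream can safely handle a switch */
  OUT_PKT7(ring, CP_PREEMPT_ENABLE_LOCAL, 1);
  OUT_RING(ring, 0x1);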

BR,
-R

> /* Allow CP_CONTEXT_SWITCH_YIELD packets in the IB2 */
> OUT_PKT7(ring, CP_YIELD_ENABLE, 1);
> --
> 2.45.2
>


Re: [PATCH 5/5] drm/msm/dpu: rate limit snapshot capture for mmu faults

2024-07-15 Thread Rob Clark
On Mon, Jul 1, 2024 at 12:43 PM Dmitry Baryshkov
 wrote:
>
> On Fri, Jun 28, 2024 at 02:48:47PM GMT, Abhinav Kumar wrote:
> > There is no recovery mechanism in place yet to recover from mmu
> > faults for DPU. We can only prevent the faults by making sure there
> > is no misconfiguration.
> >
> > Rate-limit the snapshot capture for mmu faults to once per
> > msm_kms_init_aspace() as that should be sufficient to capture
> > the snapshot for debugging otherwise there will be a lot of
> > dpu snapshots getting captured for the same fault which is
> > redundant and also might affect capturing even one snapshot
> > accurately.
>
> Please squash this into the first patch. There is no need to add code
> with a known defficiency.
>
> Also, is there a reason why you haven't used  ?

So, in some ways devcoredump is ratelimited by userspace needing to
clear an existing devcore..

What I'd suggest would be more useful is limiting the devcores to once
per atomic update, ie. if display state hasn't changed, maybe an
additional devcore isn't useful.

BR,
-R

>
> >
> > Signed-off-by: Abhinav Kumar 
> > ---
> >  drivers/gpu/drm/msm/msm_kms.c | 6 +-
> >  drivers/gpu/drm/msm/msm_kms.h | 3 +++
> >  2 files changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/msm/msm_kms.c b/drivers/gpu/drm/msm/msm_kms.c
> > index d5d3117259cf..90a333920c01 100644
> > --- a/drivers/gpu/drm/msm/msm_kms.c
> > +++ b/drivers/gpu/drm/msm/msm_kms.c
> > @@ -168,7 +168,10 @@ static int msm_kms_fault_handler(void *arg, unsigned 
> > long iova, int flags, void
> >  {
> >   struct msm_kms *kms = arg;
> >
> > - msm_disp_snapshot_state(kms->dev);
> > + if (!kms->fault_snapshot_capture) {
> > + msm_disp_snapshot_state(kms->dev);
> > + kms->fault_snapshot_capture++;
>
> When is it decremented?
>
> > + }
> >
> >   return -ENOSYS;
> >  }
> > @@ -208,6 +211,7 @@ struct msm_gem_address_space 
> > *msm_kms_init_aspace(struct drm_device *dev)
> >   mmu->funcs->destroy(mmu);
> >   }
> >
> > + kms->fault_snapshot_capture = 0;
> >   msm_mmu_set_fault_handler(aspace->mmu, kms, msm_kms_fault_handler);
> >
> >   return aspace;
> > diff --git a/drivers/gpu/drm/msm/msm_kms.h b/drivers/gpu/drm/msm/msm_kms.h
> > index 1e0c54de3716..240b39e60828 100644
> > --- a/drivers/gpu/drm/msm/msm_kms.h
> > +++ b/drivers/gpu/drm/msm/msm_kms.h
> > @@ -134,6 +134,9 @@ struct msm_kms {
> >   int irq;
> >   bool irq_requested;
> >
> > + /* rate limit the snapshot capture to once per attach */
> > + int fault_snapshot_capture;
> > +
> >   /* mapper-id used to request GEM buffer mapped for scanout: */
> >   struct msm_gem_address_space *aspace;
> >
> > --
> > 2.44.0
> >
>
> --
> With best wishes
> Dmitry


[pull] drm/msm: drm-msm-next-2024-07-04 for v6.11

2024-07-04 Thread Rob Clark
Hi Dave, Sima,

This is the main pull for v6.11.  It includes a merge of the immutable
tag qcom/20240430-a750-raytracing-v3-2-7f57c5ac0...@gmail.com to pick
up dependencies for raytracing and SMEM speedbin.

Further description below.

The following changes since commit 92815da4576a495cb6362cdfb132152fcccd:

  Merge remote-tracking branch 'drm-misc/drm-misc-next' into HEAD
(2024-06-12 16:52:39 +0300)

are available in the Git repository at:

  https://gitlab.freedesktop.org/drm/msm.git tags/drm-msm-next-2024-07-04

for you to fetch changes up to fe34394ecdad459d2d7b1f30e4a39ac27fcd77f8:

  dt-bindings: display/msm: dsi-controller-main: Add SM7150
(2024-07-03 05:57:35 -0700)


Updates for v6.11

Core:
- SM7150 support

DPU:
- SM7150 support
- Fix DSC support for DSI panels in video mode
- Fixed TE vsync source support for DSI command-mode panels
- Fix for devices without UBWC in the display controller (ie.
  QCM2290)

DSI:
- Remove unused register-writing wrappers
- Fix DSC support for panels in video mode
- Add support for parsing TE vsync source
- Add support for MSM8937 (28nm DSI PHY)

MDP5:
- Add support for MSM8937
- Fix configuration for MSM8953

GPU:
- Split giant device table into per-gen "hw catalog" similar to
  what is done on the display side of the driver
- Fix a702 UBWC mode
- Fix unused variably warnings
- GPU memory traces
- Add param for userspace to know if raytracing is supported
- Memory barrier cleanup and GBIF unhalt fix
- X185 support (aka gpu in X1 laptop chips)
- a505 support
- fixes


Abhinav Kumar (3):
  drm/msm/a6xx: use __unused__ to fix compiler warnings for gen7_* includes
  drm/msm/dpu: drop validity checks for clear_pending_flush() ctl op
  drm/msm/dpu: check ubwc support before adding compressed formats

Akhil P Oommen (3):
  dt-bindings: display/msm/gmu: Add Adreno X185 GMU
  drm/msm/adreno: Add support for X185 GPU
  drm/msm/adreno: Introduce gmu_chipid for a740 & a750

Barnabás Czémán (4):
  drm/msm/dpu: guard ctl irq callback register/unregister
  drm/msm/mdp5: Remove MDP_CAP_SRC_SPLIT from msm8x53_config
  dt-bindings: display/msm: qcom, mdp5: Add msm8937 compatible
  dt-bindings: msm: dsi-phy-28nm: Document msm8937 compatible

Connor Abbott (5):
  firmware: qcom: scm: Add gpu_init_regs call
  firmware: qcom_scm: Add gpu_init_regs call
  drm/msm/a7xx: Initialize a750 "software fuse"
  drm/msm: Add MSM_PARAM_RAYTRACING uapi
  drm/msm/a7xx: Add missing register writes from downstream

Daniil Titov (3):
  drm/msm/mdp5: Add MDP5 configuration for MSM8937
  drm/msm/dsi: Add phy configuration for MSM8937
  drm/msm/adreno: Add support for Adreno 505 GPU

Danila Tikhonov (5):
  dt-bindings: display/msm: Add SM7150 DPU
  drm/msm/dpu: Add SM7150 support
  dt-bindings: display/msm: Add SM7150 MDSS
  drm/msm: mdss: Add SM7150 support
  dt-bindings: display/msm: dsi-controller-main: Add SM7150

Dmitry Baryshkov (9):
  dt-bindings: display/msm/dsi: allow specifying TE source
  drm/msm/dpu: convert vsync source defines to the enum
  drm/msm/dsi: drop unused GPIOs handling
  drm/msm/dpu: pull the is_cmd_mode out of
_dpu_encoder_update_vsync_source()
  drm/msm/dpu: rework vsync_source handling
  drm/msm/dsi: parse vsync source from device tree
  drm/msm/dpu: support setting the TE source
  drm/msm/dpu: rename dpu_hw_setup_vsync_source functions
  drm/msm/dpu: remove CRTC frame event callback registration

Jani Nikula (1):
  drm/msm/dp: switch to struct drm_edid

Jonathan Marek (4):
  drm/msm/dpu: fix video mode DSC for DSI
  drm/msm/dsi: set video mode widebus enable bit when widebus is enabled
  drm/msm/dsi: set VIDEO_COMPRESSION_MODE_CTRL_WC
  drm/msm/dsi: add a comment to explain pkt_per_line encoding

Jun Nie (2):
  drm/msm/dpu: adjust data width for widen bus case
  drm/msm/dpu: enable compression bit in cfg2 for DSC

Konrad Dybcio (7):
  drm/msm/a6xx: Fix A702 UBWC mode
  soc: qcom: Move some socinfo defines to the header
  soc: qcom: smem: Add a feature code getter
  drm/msm/dsi: Remove dsi_phy_read/write()
  drm/msm/dsi: Remove dsi_phy_write_[un]delay()
  drm/msm/adreno: De-spaghettify the use of memory barriers
  Revert "drm/msm/a6xx: Poll for GBIF unhalt status in hw_init"

Krzysztof Kozlowski (4):
  dt-bindings: display/msm/gpu: constrain clocks in top-level
  dt-bindings: display/msm/gpu: define reg-names in top-level
  dt-bindings: display/msm/gpu: simplify compatible regex
  dt-bindings: display/msm/gpu: fix the schema being not applied

Neil Armstrong (2):
  drm/msm/adreno: fix a7xx gpu init
  drm/msm/adreno: fix a743 and a740 cx mem init

Rob Clark (11):
  drm/msm/adreno: Split up giant device table

Re: [PATCH 2/3] drm/msm: Expand UBWC config setting

2024-07-02 Thread Rob Clark
On Tue, Jul 2, 2024 at 5:56 AM Connor Abbott  wrote:
>
> According to downstream we should be setting RBBM_NC_MODE_CNTL to a
> non-default value on a663 and a680, we don't support a663 and on a680
> we're leaving it at the wrong (suboptimal) value. Just set it on all
> GPUs. Similarly, plumb through level2_swizzling_dis which will be
> necessary on a663.
>
> ubwc_mode is expanded and renamed to ubwc_swizzle to match the name on
> the display side. Similarly macrotile_mode should match the display
> side.
>
> Signed-off-by: Connor Abbott 
> ---
>  drivers/gpu/drm/msm/adreno/a5xx_gpu.c   |  4 
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c   | 36 
> -
>  drivers/gpu/drm/msm/adreno/adreno_gpu.h |  3 ++-
>  3 files changed, 33 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c 
> b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> index c003f970189b..33b0f607f913 100644
> --- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> @@ -1788,5 +1788,9 @@ struct msm_gpu *a5xx_gpu_init(struct drm_device *dev)
> else
> adreno_gpu->ubwc_config.highest_bank_bit = 14;
>
> +   /* a5xx only supports UBWC 1.0, these are not configurable */
> +   adreno_gpu->ubwc_config.macrotile_mode = 0;
> +   adreno_gpu->ubwc_config.ubwc_swizzle = 0x7;
> +
> return gpu;
>  }
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
> b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index c98cdb1e9326..7a3564dd7941 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -499,8 +499,17 @@ static void a6xx_calc_ubwc_config(struct adreno_gpu *gpu)
> gpu->ubwc_config.uavflagprd_inv = 0;
> /* Whether the minimum access length is 64 bits */
> gpu->ubwc_config.min_acc_len = 0;
> -   /* Entirely magic, per-GPU-gen value */
> -   gpu->ubwc_config.ubwc_mode = 0;
> +   /* Whether to enable level 1, 2 & 3 bank swizzling.
> +* UBWC 1.0 always enables all three levels.
> +* UBWC 2.0 removes level 1 bank swizzling, leaving levels 2 & 3.
> +* UBWC 4.0 adds the optional ability to disable levels 2 & 3.

I guess this is a bitmask for BIT(level_n)?

> +*/
> +   gpu->ubwc_config.ubwc_swizzle = 0x6;
> +   /* Whether to use 4-channel macrotiling mode or the newer 8-channel
> +* macrotiling mode introduced in UBWC 3.1. 0 is 4-channel and 1 is
> +* 8-channel.
> +*/

Can we add these comments as kerneldoc comments in the ubwc_config
struct?  That would be a more natural place for eventually moving ubwc
config to a central systemwide table (and perhaps finally properly
dealing with the setting differences for DDR vs LPDDR)

BR,
-R

> +   gpu->ubwc_config.macrotile_mode = 0;
> /*
>  * The Highest Bank Bit value represents the bit of the highest DDR 
> bank.
>  * This should ideally use DRAM type detection.
> @@ -510,7 +519,7 @@ static void a6xx_calc_ubwc_config(struct adreno_gpu *gpu)
> if (adreno_is_a610(gpu)) {
> gpu->ubwc_config.highest_bank_bit = 13;
> gpu->ubwc_config.min_acc_len = 1;
> -   gpu->ubwc_config.ubwc_mode = 1;
> +   gpu->ubwc_config.ubwc_swizzle = 0x7;
> }
>
> if (adreno_is_a618(gpu))
> @@ -536,6 +545,7 @@ static void a6xx_calc_ubwc_config(struct adreno_gpu *gpu)
> gpu->ubwc_config.amsbc = 1;
> gpu->ubwc_config.rgb565_predicator = 1;
> gpu->ubwc_config.uavflagprd_inv = 2;
> +   gpu->ubwc_config.macrotile_mode = 1;
> }
>
> if (adreno_is_7c3(gpu)) {
> @@ -543,12 +553,12 @@ static void a6xx_calc_ubwc_config(struct adreno_gpu 
> *gpu)
> gpu->ubwc_config.amsbc = 1;
> gpu->ubwc_config.rgb565_predicator = 1;
> gpu->ubwc_config.uavflagprd_inv = 2;
> +   gpu->ubwc_config.macrotile_mode = 1;
> }
>
> if (adreno_is_a702(gpu)) {
> gpu->ubwc_config.highest_bank_bit = 14;
> gpu->ubwc_config.min_acc_len = 1;
> -   gpu->ubwc_config.ubwc_mode = 0;
> }
>  }
>
> @@ -564,21 +574,26 @@ static void a6xx_set_ubwc_config(struct msm_gpu *gpu)
> u32 hbb = adreno_gpu->ubwc_config.highest_bank_bit - 13;
> u32 hbb_hi = hbb >> 2;
> u32 hbb_lo = hbb & 3;
> +   u32 ubwc_mode = adreno_gpu->ubwc_config.ubwc_swizzle & 1;
> +   u32 level2_swizzling_dis = !(adreno_gpu->ubwc_config.ubwc_swizzle & 
> 2);
>
> gpu_write(gpu, REG_A6XX_RB_NC_MODE_CNTL,
> + level2_swizzling_dis << 12 |
>   adreno_gpu->ubwc_config.rgb565_predicator << 11 |
>   hbb_hi << 10 | adreno_gpu->ubwc_config.amsbc << 4 |
>   adreno_gpu->ubwc_config.min_acc_len << 3 |
> - hbb_lo << 1 | adreno_gpu->ubwc_config.ubwc_mode);
> + hbb_lo << 1 | ubwc_mode);
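
The two review comments above can be read together. A sketch of what the
suggested kerneldoc on the ubwc_config fields might look like, reading
ubwc_swizzle as a bitmask with one bit per swizzle level per the
BIT(level_n) question (illustrative guess only, not the in-tree struct):

/*
 * Hypothetical kerneldoc sketch; field names follow the patch, the
 * struct and doc text here are illustrative.
 */
struct example_ubwc_config {
	/**
	 * @ubwc_swizzle: Whether to enable level 1, 2 & 3 bank swizzling.
	 *
	 * UBWC 1.0 always enables all three levels (0x7).  UBWC 2.0 drops
	 * level 1 bank swizzling, leaving levels 2 & 3 (0x6).  UBWC 4.0
	 * adds the optional ability to disable levels 2 & 3 as well.
	 */
	u32 ubwc_swizzle;
	/**
	 * @macrotile_mode: 0 selects the 4-channel macrotiling mode, 1 the
	 * 8-channel macrotiling mode introduced in UBWC 3.1.
	 */
	u32 macrotile_mode;
};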

[PATCH] drm/msm/gem: Add missing rcu_dereference()

2024-07-01 Thread Rob Clark
From: Rob Clark 

Fixes a sparse "different address spaces" error.

Reported-by: kernel test robot 
Closes: 
https://lore.kernel.org/oe-kbuild-all/202406280050.syeewlte-...@intel.com/
Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_gem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index ddc6a131c041..ebc9ba66efb8 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -48,7 +48,7 @@ static void update_ctx_mem(struct drm_file *file, ssize_t 
size)
uint64_t ctx_mem = atomic64_add_return(size, &ctx->ctx_mem);
 
rcu_read_lock(); /* Locks file->pid! */
-   trace_gpu_mem_total(0, pid_nr(file->pid), ctx_mem);
+   trace_gpu_mem_total(0, pid_nr(rcu_dereference(file->pid)), ctx_mem);
rcu_read_unlock();
 
 }
-- 
2.45.2
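
For readers unfamiliar with the sparse warning being fixed: a minimal
sketch of the pattern, assuming drm_file::pid carries the __rcu
annotation (reduced, hypothetical declaration for illustration only):

#include <linux/pid.h>
#include <linux/rcupdate.h>

struct example_file {
	struct pid __rcu *pid;	/* sparse puts this in the RCU address space */
};

static pid_t example_ctx_pid(struct example_file *file)
{
	pid_t nr;

	rcu_read_lock();
	/* rcu_dereference() is a checked load that also strips the __rcu
	 * address space, so sparse no longer sees a mismatch. */
	nr = pid_nr(rcu_dereference(file->pid));
	rcu_read_unlock();

	return nr;
}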



[PATCH] drm/msm/a6xx: Add missing __always_unused

2024-07-01 Thread Rob Clark
From: Rob Clark 

The __build_asserts() function only exists to have a place to put
build-time asserts.

Reported-by: kernel test robot 
Closes: 
https://lore.kernel.org/oe-kbuild-all/202407010401.rfunrbsx-...@intel.com/
Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/adreno/a6xx_catalog.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_catalog.c 
b/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
index bdafca7267a8..68ba9aed5506 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
@@ -1249,7 +1249,7 @@ static const struct adreno_info a7xx_gpus[] = {
 };
 DECLARE_ADRENO_GPULIST(a7xx);
 
-static inline void __build_asserts(void)
+static inline __always_unused void __build_asserts(void)
 {
BUILD_BUG_ON(a630_protect.count > a630_protect.count_max);
BUILD_BUG_ON(a650_protect.count > a650_protect.count_max);
-- 
2.45.2
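
The pattern at issue, reduced to a standalone sketch: BUILD_BUG_ON()
needs a function scope, and a never-called static function is a
convenient home for it, but some compiler/W= configurations then warn
about the unused function unless it is annotated:

#include <linux/build_bug.h>
#include <linux/types.h>

/* Never called anywhere; exists only so BUILD_BUG_ON() has a scope. */
static inline __always_unused void __example_build_asserts(void)
{
	BUILD_BUG_ON(sizeof(u64) != 8);
}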



Re: [PATCH 3/5] drm/msm/iommu: introduce msm_iommu_disp_new() for msm_kms

2024-07-01 Thread Rob Clark
On Fri, Jun 28, 2024 at 2:49 PM Abhinav Kumar  wrote:
>
> Introduce a new API msm_iommu_disp_new() for display use-cases.
>
> Signed-off-by: Abhinav Kumar 
> ---
>  drivers/gpu/drm/msm/msm_iommu.c | 26 ++
>  drivers/gpu/drm/msm/msm_mmu.h   |  1 +
>  2 files changed, 27 insertions(+)
>
> diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
> index a79cd18bc4c9..0420bdc4a224 100644
> --- a/drivers/gpu/drm/msm/msm_iommu.c
> +++ b/drivers/gpu/drm/msm/msm_iommu.c
> @@ -343,6 +343,17 @@ static int msm_gpu_fault_handler(struct iommu_domain 
> *domain, struct device *dev
> return 0;
>  }
>
> +static int msm_disp_fault_handler(struct iommu_domain *domain, struct device 
> *dev,
> + unsigned long iova, int flags, void *arg)
> +{
> +   struct msm_iommu *iommu = arg;
> +
> +   if (iommu->base.handler)
> +   return iommu->base.handler(iommu->base.arg, iova, flags, 
> NULL);
> +
> +   return -ENOSYS;
> +}
> +
>  static void msm_iommu_resume_translation(struct msm_mmu *mmu)
>  {
> struct adreno_smmu_priv *adreno_smmu = dev_get_drvdata(mmu->dev);
> @@ -434,6 +445,21 @@ struct msm_mmu *msm_iommu_new(struct device *dev, 
> unsigned long quirks)
> return &iommu->base;
>  }
>
> +struct msm_mmu *msm_iommu_disp_new(struct device *dev, unsigned long quirks)
> +{
> +   struct msm_iommu *iommu;
> +   struct msm_mmu *mmu;
> +
> +   mmu = msm_iommu_new(dev, quirks);
> +   if (IS_ERR_OR_NULL(mmu))
> +   return mmu;
> +
> +   iommu = to_msm_iommu(mmu);
> +   iommu_set_fault_handler(iommu->domain, msm_disp_fault_handler, iommu);
> +
> +   return mmu;
> +}

Hmm, are we using dev drvdata for the display pdev?  If
dev_get_drvdata() returns NULL for display pdev, we could get away
without having a different fault handler.

BR,
-R

> +
>  struct msm_mmu *msm_iommu_gpu_new(struct device *dev, struct msm_gpu *gpu, 
> unsigned long quirks)
>  {
> struct adreno_smmu_priv *adreno_smmu = dev_get_drvdata(dev);
> diff --git a/drivers/gpu/drm/msm/msm_mmu.h b/drivers/gpu/drm/msm/msm_mmu.h
> index 88af4f490881..730458d08d6b 100644
> --- a/drivers/gpu/drm/msm/msm_mmu.h
> +++ b/drivers/gpu/drm/msm/msm_mmu.h
> @@ -42,6 +42,7 @@ static inline void msm_mmu_init(struct msm_mmu *mmu, struct 
> device *dev,
>
>  struct msm_mmu *msm_iommu_new(struct device *dev, unsigned long quirks);
>  struct msm_mmu *msm_iommu_gpu_new(struct device *dev, struct msm_gpu *gpu, 
> unsigned long quirks);
> +struct msm_mmu *msm_iommu_disp_new(struct device *dev, unsigned long quirks);
>
>  static inline void msm_mmu_set_fault_handler(struct msm_mmu *mmu, void *arg,
> int (*handler)(void *arg, unsigned long iova, int flags, void 
> *data))
> --
> 2.44.0
>
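
A sketch of the alternative being floated in the reply above: if the
display pdev's drvdata is NULL (no adreno_smmu_priv), a single handler
could branch instead of needing a display-specific one. Entirely
hypothetical; it depends on what dev_get_drvdata() actually returns for
the display device:

/* Hypothetical combined handler, assuming display pdevs have no
 * adreno_smmu_priv drvdata.  Not the actual patch. */
static int msm_any_fault_handler(struct iommu_domain *domain,
				 struct device *dev, unsigned long iova,
				 int flags, void *arg)
{
	struct msm_iommu *iommu = arg;
	struct adreno_smmu_priv *adreno_smmu = dev_get_drvdata(dev);

	if (adreno_smmu) {
		/* GPU case: could capture per-fault SMMU state here */
	}

	if (iommu->base.handler)
		return iommu->base.handler(iommu->base.arg, iova, flags, NULL);

	return -ENOSYS;
}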


Re: [PATCH v4 1/5] drm/msm/adreno: Split up giant device table

2024-06-29 Thread Rob Clark
On Fri, Jun 28, 2024 at 6:58 PM Akhil P Oommen  wrote:
>
> On Tue, Jun 18, 2024 at 09:42:47AM -0700, Rob Clark wrote:
> > From: Rob Clark 
> >
> > Split into a separate table per generation, in preparation to move each
> > gen's device table to it's own file.
> >
> > Signed-off-by: Rob Clark 
> > Reviewed-by: Dmitry Baryshkov 
> > Reviewed-by: Konrad Dybcio 
> > ---
> >  drivers/gpu/drm/msm/adreno/adreno_device.c | 67 +-
> >  drivers/gpu/drm/msm/adreno/adreno_gpu.h| 10 
> >  2 files changed, 63 insertions(+), 14 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/msm/adreno/adreno_device.c 
> > b/drivers/gpu/drm/msm/adreno/adreno_device.c
> > index c3703a51287b..a57659eaddc2 100644
> > --- a/drivers/gpu/drm/msm/adreno/adreno_device.c
> > +++ b/drivers/gpu/drm/msm/adreno/adreno_device.c
> > @@ -20,7 +20,7 @@ bool allow_vram_carveout = false;
> >  MODULE_PARM_DESC(allow_vram_carveout, "Allow using VRAM Carveout, in place 
> > of IOMMU");
> >  module_param_named(allow_vram_carveout, allow_vram_carveout, bool, 0600);
> >
> > -static const struct adreno_info gpulist[] = {
> > +static const struct adreno_info a2xx_gpus[] = {
> >   {
> >   .chip_ids = ADRENO_CHIP_IDS(0x0200),
> >   .family = ADRENO_2XX_GEN1,
> > @@ -54,7 +54,12 @@ static const struct adreno_info gpulist[] = {
> >   .gmem  = SZ_512K,
> >   .inactive_period = DRM_MSM_INACTIVE_PERIOD,
> >   .init  = a2xx_gpu_init,
> > - }, {
> > + }
> > +};
> > +DECLARE_ADRENO_GPULIST(a2xx);
> > +
> > +static const struct adreno_info a3xx_gpus[] = {
> > + {
> >   .chip_ids = ADRENO_CHIP_IDS(0x03000512),
> >   .family = ADRENO_3XX,
> >   .fw = {
> > @@ -116,7 +121,12 @@ static const struct adreno_info gpulist[] = {
> >   .gmem  = SZ_1M,
> >   .inactive_period = DRM_MSM_INACTIVE_PERIOD,
> >   .init  = a3xx_gpu_init,
> > - }, {
> > + }
> > +};
> > +DECLARE_ADRENO_GPULIST(a3xx);
> > +
> > +static const struct adreno_info a4xx_gpus[] = {
> > + {
> >   .chip_ids = ADRENO_CHIP_IDS(0x04000500),
> >   .family = ADRENO_4XX,
> >   .revn  = 405,
> > @@ -149,7 +159,12 @@ static const struct adreno_info gpulist[] = {
> >   .gmem  = (SZ_1M + SZ_512K),
> >   .inactive_period = DRM_MSM_INACTIVE_PERIOD,
> >   .init  = a4xx_gpu_init,
> > - }, {
> > + }
> > +};
> > +DECLARE_ADRENO_GPULIST(a4xx);
> > +
> > +static const struct adreno_info a5xx_gpus[] = {
> > + {
> >   .chip_ids = ADRENO_CHIP_IDS(0x05000600),
> >   .family = ADRENO_5XX,
> >   .revn = 506,
> > @@ -274,7 +289,12 @@ static const struct adreno_info gpulist[] = {
> >   .quirks = ADRENO_QUIRK_LMLOADKILL_DISABLE,
> >   .init = a5xx_gpu_init,
> >   .zapfw = "a540_zap.mdt",
> > - }, {
> > + }
> > +};
> > +DECLARE_ADRENO_GPULIST(a5xx);
> > +
> > +static const struct adreno_info a6xx_gpus[] = {
> > + {
> >   .chip_ids = ADRENO_CHIP_IDS(0x0601),
> >   .family = ADRENO_6XX_GEN1,
> >   .revn = 610,
> > @@ -520,7 +540,12 @@ static const struct adreno_info gpulist[] = {
> >   .zapfw = "a690_zap.mdt",
> >   .hwcg = a690_hwcg,
> >   .address_space_size = SZ_16G,
> > - }, {
> > + }
> > +};
> > +DECLARE_ADRENO_GPULIST(a6xx);
> > +
> > +static const struct adreno_info a7xx_gpus[] = {
> > + {
> >   .chip_ids = ADRENO_CHIP_IDS(0x07000200),
> >   .family = ADRENO_6XX_GEN1, /* NOT a mistake! */
> >   .fw = {
> > @@ -582,7 +607,17 @@ static const struct adreno_info gpulist[] = {
> >   .init = a6xx_gpu_init,
> >   .zapfw = "gen70900_zap.mbn",
> >   .address_space_size = SZ_16G,
> > - },
> > + }
> > +};
> > +DECLARE_ADRENO_GPULIST(a7xx);
> > +
> > +static const struct adreno_gpulist *gpulists[] = {
> > + &a2xx_gpulist,
> > + &a3xx_gpulist,
> > + &a4xx_gpulist,
> > + &a5xx_gpulist,
> > + &a6xx_gpulist,
> > + &a6xx_gpulist,
>
> Typo. a6xx_gpulist -> a7xx_gpulist
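
For context, a guess at the shape of the DECLARE_ADRENO_GPULIST() helper
based on how it is used in the hunks above (the real definition lives in
adreno_gpu.h and may differ):

/* Hypothetical expansion, inferred from usage; not the in-tree macro. */
struct adreno_gpulist {
	const struct adreno_info *gpus;
	unsigned int gpus_count;
};

#define DECLARE_ADRENO_GPULIST(name)				\
	static const struct adreno_gpulist name ## _gpulist = {	\
		.gpus = name ## _gpus,				\
		.gpus_count = ARRAY_SIZE(name ## _gpus),	\
	}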

Re: [RFC PATCH] drm/msm/dpu: check ubwc support before adding compressed formats

2024-06-27 Thread Rob Clark
On Thu, Jun 27, 2024 at 1:53 PM Abhinav Kumar  wrote:
>
> On QCM2290 chipset DPU does not support UBWC.
>
> Add a dpu cap to indicate this and do not expose compressed formats
> in this case.
>
> Signed-off-by: Abhinav Kumar 
> ---
>  drivers/gpu/drm/msm/disp/dpu1/catalog/dpu_6_5_qcm2290.h | 1 +
>  drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h  | 2 ++
>  drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c   | 5 -
>  3 files changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/msm/disp/dpu1/catalog/dpu_6_5_qcm2290.h 
> b/drivers/gpu/drm/msm/disp/dpu1/catalog/dpu_6_5_qcm2290.h
> index 3cbb2fe8aba2..6671f798bacc 100644
> --- a/drivers/gpu/drm/msm/disp/dpu1/catalog/dpu_6_5_qcm2290.h
> +++ b/drivers/gpu/drm/msm/disp/dpu1/catalog/dpu_6_5_qcm2290.h
> @@ -12,6 +12,7 @@ static const struct dpu_caps qcm2290_dpu_caps = {
> .max_mixer_blendstages = 0x4,
> .has_dim_layer = true,
> .has_idle_pc = true,
> +   .has_no_ubwc = true,
> .max_linewidth = 2160,
> .pixel_ram_size = DEFAULT_PIXEL_RAM_SIZE,
>  };
> diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h 
> b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h
> index af2ead1c4886..676d0a283922 100644
> --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h
> +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h
> @@ -342,6 +342,7 @@ struct dpu_rotation_cfg {
>   * @has_dim_layer  dim layer feature status
>   * @has_idle_pcindicate if idle power collapse feature is supported
>   * @has_3d_merge   indicate if 3D merge is supported
> + * @has_no_ubwcindicate if UBWC is supported
>   * @max_linewidth  max linewidth for sspp
>   * @pixel_ram_size size of latency hiding and de-tiling buffer in bytes
>   * @max_hdeci_exp  max horizontal decimation supported (max is 2^value)
> @@ -354,6 +355,7 @@ struct dpu_caps {
> bool has_dim_layer;
> bool has_idle_pc;
> bool has_3d_merge;
> +   bool has_no_ubwc;

has_no_ubwc sounds kinda awkward compared to has_ubwc.  But I guess
you wanted to avoid all that churn..

How about instead, if msm_mdss_data::ubwc_{enc,dec}_version are zero,
then we know there is no ubwc support in the display.

BR,
-R


> /* SSPP limits */
> u32 max_linewidth;
> u32 pixel_ram_size;
> diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c 
> b/drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c
> index 6000e84598c2..31fe0fc4c02e 100644
> --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c
> +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c
> @@ -1341,10 +1341,13 @@ void dpu_plane_danger_signal_ctrl(struct drm_plane 
> *plane, bool enable)
>  static bool dpu_plane_format_mod_supported(struct drm_plane *plane,
> uint32_t format, uint64_t modifier)
>  {
> +   struct dpu_plane *pdpu = to_dpu_plane(plane);
> +   const struct dpu_caps *caps = pdpu->catalog->caps;
> +
> if (modifier == DRM_FORMAT_MOD_LINEAR)
> return true;
>
> -   if (modifier == DRM_FORMAT_MOD_QCOM_COMPRESSED)
> +   if (modifier == DRM_FORMAT_MOD_QCOM_COMPRESSED && !caps->has_no_ubwc)
> return dpu_find_format(format, 
> qcom_compressed_supported_formats,
> 
> ARRAY_SIZE(qcom_compressed_supported_formats));
>
> --
> 2.44.0
>
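
A sketch of the zero-version convention Rob proposes instead of the
has_no_ubwc flag; the field names come from msm_mdss_data as referenced
above, while the helper itself is hypothetical:

/* Hypothetical helper: zero enc/dec versions mean "no UBWC in display". */
static bool msm_mdss_has_ubwc(const struct msm_mdss_data *data)
{
	return data->ubwc_enc_version || data->ubwc_dec_version;
}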


Re: [PATCH v1 2/3] drm/msm/adreno: Add support for X185 GPU

2024-06-26 Thread Rob Clark
On Wed, Jun 26, 2024 at 2:38 PM Konrad Dybcio  wrote:
>
> On 26.06.2024 8:43 PM, Rob Clark wrote:
> > On Wed, Jun 26, 2024 at 1:24 AM Akhil P Oommen  
> > wrote:
> >>
> >> On Mon, Jun 24, 2024 at 03:53:48PM +0200, Konrad Dybcio wrote:
> >>>
> >>>
> >>> On 6/23/24 13:06, Akhil P Oommen wrote:
> >>>> Add support in drm/msm driver for the Adreno X185 gpu found in
> >>>> Snapdragon X1 Elite chipset.
> >>>>
> >>>> Signed-off-by: Akhil P Oommen 
> >>>> ---
> >>>>
> >>>>   drivers/gpu/drm/msm/adreno/a6xx_gmu.c  | 19 +++
> >>>>   drivers/gpu/drm/msm/adreno/a6xx_gpu.c  |  6 ++
> >>>>   drivers/gpu/drm/msm/adreno/adreno_device.c | 14 ++
> >>>>   drivers/gpu/drm/msm/adreno/adreno_gpu.h|  5 +
> >>>>   4 files changed, 36 insertions(+), 8 deletions(-)
> >>>>
> >>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c 
> >>>> b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> >>>> index 0e3dfd4c2bc8..168a4bddfaf2 100644
> >>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> >>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> >>>> @@ -830,8 +830,10 @@ static int a6xx_gmu_fw_start(struct a6xx_gmu *gmu, 
> >>>> unsigned int state)
> >>>>  */
> >>>> gmu_write(gmu, REG_A6XX_GMU_CM3_CFG, 0x4052);
> >>>> +   if (adreno_is_x185(adreno_gpu)) {
> >>>> +   chipid = 0x7050001;
> >>>
> >>> What's wrong with using the logic below?
> >>
> >> patchid is BITS(7, 0), not (15, 8) in the case of x185. Due to the
> >> changes in the chipid scheme within the a7x family, this is a bit
> >> confusing. I will try to improve here in another series.
> >
> > I'm thinking we should just add gmu_chipid to struct a6xx_info, tbh
> >
> > Maybe to start with, we can fall back to the existing logic if
> > a6xx_info::gmu_chipid is zero so we don't have to add it for _every_
> > a6xx/a7xx
>
> If X185 is not the only occurence, I'd second this..

basically all a7xx are "special" compared to the original logic, so we
can start with using gmu_chipid for just a7xx

BR,
-R

> Konrad
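
A sketch of the fallback Rob describes, with gmu_chipid as an assumed
new field in the per-GPU a6xx_info (illustrative only; the "derived"
argument stands in for the existing obfuscated-chip-id logic quoted
above, and 0x7050001 is the X185 value from the patch):

/* Hypothetical: a zero gmu_chipid means "not specified, derive it". */
static u32 example_gmu_chipid(const struct a6xx_info *info,
			      u32 derived_chipid)
{
	return info->gmu_chipid ? info->gmu_chipid : derived_chipid;
}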


Re: [PATCH v1 2/3] drm/msm/adreno: Add support for X185 GPU

2024-06-26 Thread Rob Clark
On Wed, Jun 26, 2024 at 1:49 PM Akhil P Oommen  wrote:
>
> On Mon, Jun 24, 2024 at 07:28:06AM -0700, Rob Clark wrote:
> > On Mon, Jun 24, 2024 at 7:25 AM Rob Clark  wrote:
> > >
> > > On Sun, Jun 23, 2024 at 4:08 AM Akhil P Oommen  
> > > wrote:
> > > >
> > > > Add support in drm/msm driver for the Adreno X185 gpu found in
> > > > Snapdragon X1 Elite chipset.
> > > >
> > > > Signed-off-by: Akhil P Oommen 
> > > > ---
> > > >
> > > >  drivers/gpu/drm/msm/adreno/a6xx_gmu.c  | 19 +++
> > > >  drivers/gpu/drm/msm/adreno/a6xx_gpu.c  |  6 ++
> > > >  drivers/gpu/drm/msm/adreno/adreno_device.c | 14 ++
> > > >  drivers/gpu/drm/msm/adreno/adreno_gpu.h|  5 +
> > > >  4 files changed, 36 insertions(+), 8 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c 
> > > > b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > > > index 0e3dfd4c2bc8..168a4bddfaf2 100644
> > > > --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > > > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > > > @@ -830,8 +830,10 @@ static int a6xx_gmu_fw_start(struct a6xx_gmu *gmu, 
> > > > unsigned int state)
> > > >  */
> > > > gmu_write(gmu, REG_A6XX_GMU_CM3_CFG, 0x4052);
> > > >
> > > > +   if (adreno_is_x185(adreno_gpu)) {
> > > > +   chipid = 0x7050001;
> > > > /* NOTE: A730 may also fall in this if-condition with a future 
> > > > GMU fw update. */
> > > > -   if (adreno_is_a7xx(adreno_gpu) && !adreno_is_a730(adreno_gpu)) {
> > > > +   } else if (adreno_is_a7xx(adreno_gpu) && 
> > > > !adreno_is_a730(adreno_gpu)) {
> > > > /* A7xx GPUs have obfuscated chip IDs. Use constant maj 
> > > > = 7 */
> > > > chipid = FIELD_PREP(GENMASK(31, 24), 0x7);
> > > >
> > > > @@ -1329,9 +1331,18 @@ static int a6xx_gmu_rpmh_arc_votes_init(struct 
> > > > device *dev, u32 *votes,
> > > > if (!pri_count)
> > > > return -EINVAL;
> > > >
> > > > -   sec = cmd_db_read_aux_data("mx.lvl", &sec_count);
> > > > -   if (IS_ERR(sec))
> > > > -   return PTR_ERR(sec);
> > > > +   /*
> > > > +* Some targets have a separate gfx mxc rail. So try to read 
> > > > that first and then fall back
> > > > +* to regular mx rail if it is missing
> > > > +*/
> > > > +   sec = cmd_db_read_aux_data("gmxc.lvl", &sec_count);
> > > > +   if (PTR_ERR_OR_ZERO(sec) == -EPROBE_DEFER) {
> > > > +   return -EPROBE_DEFER;
> > > > +   } else if (IS_ERR(sec)) {
> > > > +   sec = cmd_db_read_aux_data("mx.lvl", &sec_count);
> > > > +   if (IS_ERR(sec))
> > > > +   return PTR_ERR(sec);
> > > > +   }
> > > >
> > > > sec_count >>= 1;
> > > > if (!sec_count)
> > > > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
> > > > b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > > > index 973872ad0474..97837f7f2a40 100644
> > > > --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > > > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > > > @@ -1319,9 +1319,7 @@ static void a6xx_set_cp_protect(struct msm_gpu 
> > > > *gpu)
> > > > count = ARRAY_SIZE(a660_protect);
> > > > count_max = 48;
> > > > BUILD_BUG_ON(ARRAY_SIZE(a660_protect) > 48);
> > > > -   } else if (adreno_is_a730(adreno_gpu) ||
> > > > -  adreno_is_a740(adreno_gpu) ||
> > > > -  adreno_is_a750(adreno_gpu)) {
> > > > +   } else if (adreno_is_a7xx(adreno_gpu)) {
> > > > regs = a730_protect;
> > > > count = ARRAY_SIZE(a730_protect);
> > > > count_max = 48;
> > > > @@ -1891,7 +1889,7 @@ static int hw_init(struct msm_gpu *gpu)
> > > > gpu_write(gpu, REG_A6XX_UCHE_CLIENT_PF, BIT(7) | 0x1);
> > > >
> > > > /* Set weights for bicubic filtering */
> > > > -   if (adreno_is_a6

[PATCH v5 2/2] drm/msm: Extend gpu devcore dumps with pgtbl info

2024-06-26 Thread Rob Clark
From: Rob Clark 

In the case of iova fault triggered devcore dumps, include additional
debug information based on what we think is the current page tables,
including the TTBR0 value (which should match what we have in
adreno_smmu_fault_info unless things have gone horribly wrong), and
the pagetable entries traversed in the process of resolving the
faulting iova.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/adreno/adreno_gpu.c | 10 ++
 drivers/gpu/drm/msm/msm_gpu.c   |  9 +
 drivers/gpu/drm/msm/msm_gpu.h   |  8 
 drivers/gpu/drm/msm/msm_iommu.c | 25 +
 drivers/gpu/drm/msm/msm_mmu.h   |  3 ++-
 5 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c 
b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
index 99661af8d941..422dae873b6b 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
@@ -861,6 +861,16 @@ void adreno_show(struct msm_gpu *gpu, struct msm_gpu_state 
*state,
drm_printf(p, "  - dir=%s\n", info->flags & IOMMU_FAULT_WRITE ? 
"WRITE" : "READ");
drm_printf(p, "  - type=%s\n", info->type);
drm_printf(p, "  - source=%s\n", info->block);
+
+   /* Information extracted from what we think are the current
+* pgtables.  Hopefully the TTBR0 matches what we've extracted
+* from the SMMU registers in smmu_info!
+*/
+   drm_puts(p, "pgtable-fault-info:\n");
+   drm_printf(p, "  - ttbr0: %.16llx\n", (u64)info->pgtbl_ttbr0);
+   drm_printf(p, "  - asid: %d\n", info->asid);
+   drm_printf(p, "  - ptes: %.16llx %.16llx %.16llx %.16llx\n",
+  info->ptes[0], info->ptes[1], info->ptes[2], 
info->ptes[3]);
}
 
drm_printf(p, "rbbm-status: 0x%08x\n", state->rbbm_status);
diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index 3666b42b4ecd..bf2f8b2a7ccc 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -281,6 +281,15 @@ static void msm_gpu_crashstate_capture(struct msm_gpu *gpu,
if (submit) {
int i;
 
+   if (state->fault_info.ttbr0) {
+   struct msm_gpu_fault_info *info = &state->fault_info;
+   struct msm_mmu *mmu = submit->aspace->mmu;
+
+   msm_iommu_pagetable_params(mmu, &info->pgtbl_ttbr0,
+  &info->asid);
+   msm_iommu_pagetable_walk(mmu, info->iova, info->ptes);
+   }
+
state->bos = kcalloc(submit->nr_bos,
sizeof(struct msm_gpu_state_bo), GFP_KERNEL);
 
diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
index 1f02bb9956be..82e838ba8c80 100644
--- a/drivers/gpu/drm/msm/msm_gpu.h
+++ b/drivers/gpu/drm/msm/msm_gpu.h
@@ -101,6 +101,14 @@ struct msm_gpu_fault_info {
int flags;
const char *type;
const char *block;
+
+   /* Information about what we think/expect is the current SMMU state,
+* for example expected_ttbr0 should match smmu_info.ttbr0 which
+* was read back from SMMU registers.
+*/
+   phys_addr_t pgtbl_ttbr0;
+   u64 ptes[4];
+   int asid;
 };
 
 /**
diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
index d5512037c38b..a235b0d0afb5 100644
--- a/drivers/gpu/drm/msm/msm_iommu.c
+++ b/drivers/gpu/drm/msm/msm_iommu.c
@@ -195,6 +195,31 @@ struct iommu_domain_geometry 
*msm_iommu_get_geometry(struct msm_mmu *mmu)
return &iommu->domain->geometry;
 }
 
+int
+msm_iommu_pagetable_walk(struct msm_mmu *mmu, unsigned long iova, uint64_t 
ptes[4])
+{
+   struct msm_iommu_pagetable *pagetable;
+   struct io_pgtable_walk_data wd = {};
+   int ret;
+
+   if (mmu->type != MSM_MMU_IOMMU_PAGETABLE)
+   return -EINVAL;
+
+   pagetable = to_pagetable(mmu);
+
+   if (!pagetable->pgtbl_ops->pgtable_walk)
+   return -EINVAL;
+
+   ret = pagetable->pgtbl_ops->pgtable_walk(pagetable->pgtbl_ops, iova, 
&wd);
+   if (ret)
+   return ret;
+
+   for (int i = 0; i < ARRAY_SIZE(wd.ptes); i++)
+   ptes[i] = wd.ptes[i];
+
+   return 0;
+}
+
 static const struct msm_mmu_funcs pagetable_funcs = {
.map = msm_iommu_pagetable_map,
.unmap = msm_iommu_pagetable_unmap,
diff --git a/drivers/gpu/drm/msm/msm_mmu.h b/drivers/gpu/drm/msm/msm_mmu.h
index 88af4f490881..96e509bd96a6 100644
--- a/drivers/gpu/drm/msm/msm_mmu.h
+++ b/drivers/gpu/drm/msm/msm_mmu.h
@@ -53,7 +53,8 
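
The walk above depends on the pgtable_walk op added in patch 1/2 of the
series; from how it is consumed here (wd.ptes, four u64 entries), the
walk-data structure is presumably along these lines (a sketch, not the
final header):

/* Presumed shape, inferred from the usage in msm_iommu_pagetable_walk()
 * above: one collected PTE per page-table level, u64 as the widest size. */
struct io_pgtable_walk_data {
	u64 ptes[4];
};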

[PATCH v5 0/2] io-pgtable-arm + drm/msm: Extend iova fault debugging

2024-06-26 Thread Rob Clark
From: Rob Clark 

This series extends io-pgtable-arm with a method to retrieve the page
table entries traversed in the process of address translation, and then
beefs up drm/msm gpu devcore dump to include this (and additional info)
in the devcore dump.

This is a respin of https://patchwork.freedesktop.org/series/94968/
(minus a patch that was already merged)

v2: Fix an armv7/32b build error in the last patch
v3: Incorporate Will Deacon's suggestion to make the interface
callback based.
v4: Actually wire up the callback
v5: Drop the callback approach

Rob Clark (2):
  iommu/io-pgtable-arm: Add way to debug pgtable walk
  drm/msm: Extend gpu devcore dumps with pgtbl info

 drivers/gpu/drm/msm/adreno/adreno_gpu.c | 10 
 drivers/gpu/drm/msm/msm_gpu.c   |  9 +++
 drivers/gpu/drm/msm/msm_gpu.h   |  8 ++
 drivers/gpu/drm/msm/msm_iommu.c | 25 ++
 drivers/gpu/drm/msm/msm_mmu.h   |  3 ++-
 drivers/iommu/io-pgtable-arm.c  | 34 ++---
 include/linux/io-pgtable.h  | 16 
 7 files changed, 95 insertions(+), 10 deletions(-)

-- 
2.45.2



Re: [PATCH v4 1/2] iommu/io-pgtable-arm: Add way to debug pgtable walk

2024-06-26 Thread Rob Clark
On Mon, Jun 24, 2024 at 8:14 AM Will Deacon  wrote:
>
> On Thu, May 23, 2024 at 10:52:21AM -0700, Rob Clark wrote:
> > From: Rob Clark 
> >
> > Add an io-pgtable method to walk the pgtable returning the raw PTEs that
> > would be traversed for a given iova access.
> >
> > Signed-off-by: Rob Clark 
> > ---
> >  drivers/iommu/io-pgtable-arm.c | 51 --
> >  include/linux/io-pgtable.h |  4 +++
> >  2 files changed, 46 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> > index f7828a7aad41..f47a0e64bb35 100644
> > --- a/drivers/iommu/io-pgtable-arm.c
> > +++ b/drivers/iommu/io-pgtable-arm.c
> > @@ -693,17 +693,19 @@ static size_t arm_lpae_unmap_pages(struct 
> > io_pgtable_ops *ops, unsigned long iov
> >   data->start_level, ptep);
> >  }
> >
> > -static phys_addr_t arm_lpae_iova_to_phys(struct io_pgtable_ops *ops,
> > -  unsigned long iova)
> > +static int arm_lpae_pgtable_walk(struct io_pgtable_ops *ops, unsigned long 
> > iova,
> > + int (*cb)(void *cb_data, void *pte, int level),
> > + void *cb_data)
> >  {
> >   struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
> >   arm_lpae_iopte pte, *ptep = data->pgd;
> >   int lvl = data->start_level;
> > + int ret;
> >
> >   do {
> >   /* Valid IOPTE pointer? */
> >   if (!ptep)
> > - return 0;
> > + return -EFAULT;
>
> nit: -ENOENT might be a little better, as we're only checking against a
> NULL entry rather than strictly any faulting entry.
>
> >   /* Grab the IOPTE we're interested in */
> >   ptep += ARM_LPAE_LVL_IDX(iova, lvl, data);
> > @@ -711,22 +713,52 @@ static phys_addr_t arm_lpae_iova_to_phys(struct 
> > io_pgtable_ops *ops,
> >
> >   /* Valid entry? */
> >   if (!pte)
> > - return 0;
> > + return -EFAULT;
>
> Same here (and at the end of the function).
>
> > +
> > + ret = cb(cb_data, &pte, lvl);
>
> Since pte is on the stack, rather than pointing into the actual pgtable,
> I think it would be clearer to pass it by value to the callback.
>
> > + if (ret)
> > + return ret;
> >
> > - /* Leaf entry? */
> > + /* Leaf entry?  If so, we've found the translation */
> >   if (iopte_leaf(pte, lvl, data->iop.fmt))
> > - goto found_translation;
> > + return 0;
> >
> >   /* Take it to the next level */
> >   ptep = iopte_deref(pte, data);
> >   } while (++lvl < ARM_LPAE_MAX_LEVELS);
> >
> >   /* Ran out of page tables to walk */
> > + return -EFAULT;
> > +}
> > +
> > +struct iova_to_phys_walk_data {
> > + arm_lpae_iopte pte;
> > + int level;
> > +};
>
> Expanding a little on Robin's suggestion, why don't we drop this structure
> in favour of something more generic:
>
> struct arm_lpae_walk_data {
> arm_lpae_iopte ptes[ARM_LPAE_MAX_LEVELS];
> };
>
> and then do something in the walker like:
>
> if (cb && !cb(pte, lvl))
> walk_data->ptes[lvl] = pte;
>

So thinking about this some more... if I use a walk_data struct to
return the PTEs, I can just get rid of the callback entirely.  That
ends up looking more like my first version.   The callback taking
void* was mainly to avoid coding the PTE size in the generic
io_pgtable interface.  But if we just go with u64, because that is the
biggest PTE size we need to deal with, then it all gets simpler.  (The
callback was actually a semi-awkward interface to use from drm/msm.)

BR,
-R

> which could return the physical address at the end, if it reaches a leaf
> entry. That way arm_lpae_iova_to_phys() is just passing a NULL callback
> to the walker and your debug callback just needs to return 0 (i.e. the
> callback is basically just saying whether or not to continue the walk).
>
> Will
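
Roughly what Rob sketches in reply: drop the callback, record one PTE
per level into a fixed array, and let iova_to_phys() reuse the same
walker. A rough sketch under those assumptions, using the helpers and
types from the quoted io-pgtable-arm code:

/* Rough sketch of the callback-free direction (what became v5). */
struct arm_lpae_walk_data {
	arm_lpae_iopte ptes[ARM_LPAE_MAX_LEVELS];
};

static int arm_lpae_walk_sketch(struct arm_lpae_io_pgtable *data,
				unsigned long iova,
				struct arm_lpae_walk_data *wd)
{
	arm_lpae_iopte pte, *ptep = data->pgd;
	int lvl = data->start_level;

	do {
		/* Valid IOPTE pointer? */
		if (!ptep)
			return -ENOENT;	/* per the review nit above */

		/* Grab the IOPTE we're interested in */
		ptep += ARM_LPAE_LVL_IDX(iova, lvl, data);
		pte = READ_ONCE(*ptep);
		if (!pte)
			return -ENOENT;

		wd->ptes[lvl] = pte;

		/* Leaf entry terminates the walk successfully */
		if (iopte_leaf(pte, lvl, data->iop.fmt))
			return 0;

		/* Take it to the next level */
		ptep = iopte_deref(pte, data);
	} while (++lvl < ARM_LPAE_MAX_LEVELS);

	/* Ran out of page tables to walk */
	return -ENOENT;
}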


Re: [PATCH v1 2/3] drm/msm/adreno: Add support for X185 GPU

2024-06-26 Thread Rob Clark
On Wed, Jun 26, 2024 at 1:24 AM Akhil P Oommen  wrote:
>
> On Mon, Jun 24, 2024 at 03:53:48PM +0200, Konrad Dybcio wrote:
> >
> >
> > On 6/23/24 13:06, Akhil P Oommen wrote:
> > > Add support in drm/msm driver for the Adreno X185 gpu found in
> > > Snapdragon X1 Elite chipset.
> > >
> > > Signed-off-by: Akhil P Oommen 
> > > ---
> > >
> > >   drivers/gpu/drm/msm/adreno/a6xx_gmu.c  | 19 +++
> > >   drivers/gpu/drm/msm/adreno/a6xx_gpu.c  |  6 ++
> > >   drivers/gpu/drm/msm/adreno/adreno_device.c | 14 ++
> > >   drivers/gpu/drm/msm/adreno/adreno_gpu.h|  5 +
> > >   4 files changed, 36 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c 
> > > b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > > index 0e3dfd4c2bc8..168a4bddfaf2 100644
> > > --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > > @@ -830,8 +830,10 @@ static int a6xx_gmu_fw_start(struct a6xx_gmu *gmu, 
> > > unsigned int state)
> > >  */
> > > gmu_write(gmu, REG_A6XX_GMU_CM3_CFG, 0x4052);
> > > +   if (adreno_is_x185(adreno_gpu)) {
> > > +   chipid = 0x7050001;
> >
> > What's wrong with using the logic below?
>
> patchid is BITS(7, 0), not (15, 8) in the case of x185. Due to the
> changes in the chipid scheme within the a7x family, this is a bit
> confusing. I will try to improve here in another series.

I'm thinking we should just add gmu_chipid to struct a6xx_info, tbh

Maybe to start with, we can fall back to the existing logic if
a6xx_info::gmu_chipid is zero so we don't have to add it for _every_
a6xx/a7xx

BR,
-R

> >
> > > /* NOTE: A730 may also fall in this if-condition with a future GMU fw 
> > > update. */
> > > -   if (adreno_is_a7xx(adreno_gpu) && !adreno_is_a730(adreno_gpu)) {
> > > +   } else if (adreno_is_a7xx(adreno_gpu) && !adreno_is_a730(adreno_gpu)) 
> > > {
> > > /* A7xx GPUs have obfuscated chip IDs. Use constant maj = 7 */
> > > chipid = FIELD_PREP(GENMASK(31, 24), 0x7);
> > > @@ -1329,9 +1331,18 @@ static int a6xx_gmu_rpmh_arc_votes_init(struct 
> > > device *dev, u32 *votes,
> > > if (!pri_count)
> > > return -EINVAL;
> > > -   sec = cmd_db_read_aux_data("mx.lvl", &sec_count);
> > > -   if (IS_ERR(sec))
> > > -   return PTR_ERR(sec);
> > > +   /*
> > > +* Some targets have a separate gfx mxc rail. So try to read that 
> > > first and then fall back
> > > +* to regular mx rail if it is missing
> > > +*/
> > > +   sec = cmd_db_read_aux_data("gmxc.lvl", &sec_count);
> > > +   if (PTR_ERR_OR_ZERO(sec) == -EPROBE_DEFER) {
> > > +   return -EPROBE_DEFER;
> > > +   } else if (IS_ERR(sec)) {
> > > +   sec = cmd_db_read_aux_data("mx.lvl", &sec_count);
> > > +   if (IS_ERR(sec))
> > > +   return PTR_ERR(sec);
> > > +   }
> >
> > I assume GMXC would always be used if present, although please use the
> > approach Dmitry suggested
>
> Correct.
>
> -Akhil
> >
> >
> > The rest looks good!
> >
> > Konrad


Re: [PATCH v2 2/2] Revert "drm/msm/a6xx: Poll for GBIF unhalt status in hw_init"

2024-06-25 Thread Rob Clark
On Tue, Jun 25, 2024 at 1:18 PM Dmitry Baryshkov
 wrote:
>
> On Tue, 25 Jun 2024 at 21:54, Konrad Dybcio  wrote:
> >
> > Commit c9707bcbd0f3 ("drm/msm/adreno: De-spaghettify the use of memory
>
> ID is not present in next

it ofc wouldn't be, because it was the previous patch in this series ;-)

I've fixed that up (and below) while applying the patch

BR,
-R

> > barriers") made some fixups relating to write arrival, ensuring that
> > the GPU's memory interface has *really really really* been told to come
> > out of reset. That in turn rendered the hacky commit being reverted no
> > longer necessary.
> >
> > Get rid of it.
> >
> > This reverts commit b77532803d11f2b03efab2ebfd8c0061cd7f8b30.
>
> b77532803d11 ("drm/msm/a6xx: Poll for GBIF unhalt status in hw_init")
>
> >
> > Signed-off-by: Konrad Dybcio 
> > ---
> >  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 4 
> >  1 file changed, 4 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
> > b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > index 4083d0cad782..03e23eef5126 100644
> > --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > @@ -867,10 +867,6 @@ static int hw_init(struct msm_gpu *gpu)
> > gpu_read(gpu, REG_A6XX_RBBM_GBIF_HALT);
> > }
> >
> > -   /* Some GPUs are stubborn and take their sweet time to unhalt GBIF! 
> > */
> > -   if (adreno_is_a7xx(adreno_gpu) && a6xx_has_gbif(adreno_gpu))
> > -   spin_until(!gpu_read(gpu, REG_A6XX_GBIF_HALT_ACK));
> > -
> > gpu_write(gpu, REG_A6XX_RBBM_SECVID_TSB_CNTL, 0);
> >
> > if (adreno_is_a619_holi(adreno_gpu))
> >
> > --
> > 2.45.2
> >
>
>
> --
> With best wishes
> Dmitry


Re: [PATCH] drm/msm/a6xx: request memory region

2024-06-25 Thread Rob Clark
On Tue, Jun 25, 2024 at 1:23 PM Akhil P Oommen  wrote:
>
> On Tue, Jun 25, 2024 at 11:03:42AM -0700, Rob Clark wrote:
> > On Tue, Jun 25, 2024 at 10:59 AM Akhil P Oommen  wrote:
> > >
> > > On Fri, Jun 21, 2024 at 02:09:58PM -0700, Rob Clark wrote:
> > > > On Sat, Jun 8, 2024 at 8:44 AM Kiarash Hajian
> > > >  wrote:
> > > > >
> > > > > The driver's memory regions are currently just ioremap()ed, but not
> > > > > reserved through a request. That's not a bug, but having the request 
> > > > > is
> > > > > a little more robust.
> > > > >
> > > > > Implement the region-request through the corresponding managed
> > > > > devres-function.
> > > > >
> > > > > Signed-off-by: Kiarash Hajian 
> > > > > ---
> > > > > Changes in v6:
> > > > > -Fix compile error
> > > > > -Link to v5: 
> > > > > https://lore.kernel.org/all/20240607-memory-v1-1-8664f52fc...@gmail.com
> > > > >
> > > > > Changes in v5:
> > > > > - Fix error handling problems.
> > > > > - Link to v4: 
> > > > > https://lore.kernel.org/r/20240512-msm-adreno-memory-region-v4-1-3881a6408...@gmail.com
> > > > >
> > > > > Changes in v4:
> > > > > - Combine v3 commits into a single commit
> > > > > - Link to v3: 
> > > > > https://lore.kernel.org/r/20240512-msm-adreno-memory-region-v3-0-0a728ad45...@gmail.com
> > > > >
> > > > > Changes in v3:
> > > > > - Remove redundant devm_iounmap calls, relying on devres for 
> > > > > automatic resource cleanup.
> > > > >
> > > > > Changes in v2:
> > > > > - update the subject prefix to "drm/msm/a6xx:", to match the 
> > > > > majority of other changes to this file.
> > > > > ---
> > > > >  drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 33 
> > > > > +++--
> > > > >  1 file changed, 11 insertions(+), 22 deletions(-)
> > > > >
> > > > > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c 
> > > > > b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > > > > index 8bea8ef26f77..d26cc6254ef9 100644
> > > > > --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > > > > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > > > > @@ -525,7 +525,7 @@ static void a6xx_gmu_rpmh_init(struct a6xx_gmu 
> > > > > *gmu)
> > > > > bool pdc_in_aop = false;
> > > > >
> > > > > if (IS_ERR(pdcptr))
> > > > > -   goto err;
> > > > > +   return;
> > > > >
> > > > > if (adreno_is_a650(adreno_gpu) ||
> > > > > adreno_is_a660_family(adreno_gpu) ||
> > > > > @@ -541,7 +541,7 @@ static void a6xx_gmu_rpmh_init(struct a6xx_gmu 
> > > > > *gmu)
> > > > > if (!pdc_in_aop) {
> > > > > seqptr = a6xx_gmu_get_mmio(pdev, "gmu_pdc_seq");
> > > > > if (IS_ERR(seqptr))
> > > > > -   goto err;
> > > > > +   return;
> > > > > }
> > > > >
> > > > > /* Disable SDE clock gating */
> > > > > @@ -633,12 +633,6 @@ static void a6xx_gmu_rpmh_init(struct a6xx_gmu 
> > > > > *gmu)
> > > > > wmb();
> > > > >
> > > > > a6xx_rpmh_stop(gmu);
> > > > > -
> > > > > -err:
> > > > > -   if (!IS_ERR_OR_NULL(pdcptr))
> > > > > -   iounmap(pdcptr);
> > > > > -   if (!IS_ERR_OR_NULL(seqptr))
> > > > > -   iounmap(seqptr);
> > > > >  }
> > > > >
> > > > >  /*
> > > > > @@ -1503,7 +1497,7 @@ static void __iomem *a6xx_gmu_get_mmio(struct 
> > > > > platform_device *pdev,
> > > > > return ERR_PTR(-EINVAL);
> > > > > }
> > > > >
> > > > > -   ret = ioremap(res->start, resource_size(res));
> > > > > +   ret = devm_ioremap_resource(&pdev->dev, res);
> > > >
> > > > So, this

Re: [PATCH] drm/msm/a6xx: request memory region

2024-06-25 Thread Rob Clark
On Tue, Jun 25, 2024 at 10:59 AM Akhil P Oommen
 wrote:
>
> On Fri, Jun 21, 2024 at 02:09:58PM -0700, Rob Clark wrote:
> > On Sat, Jun 8, 2024 at 8:44 AM Kiarash Hajian
> >  wrote:
> > >
> > > The driver's memory regions are currently just ioremap()ed, but not
> > > reserved through a request. That's not a bug, but having the request is
> > > a little more robust.
> > >
> > > Implement the region-request through the corresponding managed
> > > devres-function.
> > >
> > > Signed-off-by: Kiarash Hajian 
> > > ---
> > > Changes in v6:
> > > -Fix compile error
> > > -Link to v5: 
> > > https://lore.kernel.org/all/20240607-memory-v1-1-8664f52fc...@gmail.com
> > >
> > > Changes in v5:
> > > - Fix error handling problems.
> > > - Link to v4: 
> > > https://lore.kernel.org/r/20240512-msm-adreno-memory-region-v4-1-3881a6408...@gmail.com
> > >
> > > Changes in v4:
> > > - Combine v3 commits into a single commit
> > > - Link to v3: 
> > > https://lore.kernel.org/r/20240512-msm-adreno-memory-region-v3-0-0a728ad45...@gmail.com
> > >
> > > Changes in v3:
> > > - Remove redundant devm_iounmap calls, relying on devres for 
> > > automatic resource cleanup.
> > >
> > > Changes in v2:
> > > - update the subject prefix to "drm/msm/a6xx:", to match the majority 
> > > of other changes to this file.
> > > ---
> > >  drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 33 
> > > +++--
> > >  1 file changed, 11 insertions(+), 22 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c 
> > > b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > > index 8bea8ef26f77..d26cc6254ef9 100644
> > > --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > > @@ -525,7 +525,7 @@ static void a6xx_gmu_rpmh_init(struct a6xx_gmu *gmu)
> > > bool pdc_in_aop = false;
> > >
> > > if (IS_ERR(pdcptr))
> > > -   goto err;
> > > +   return;
> > >
> > > if (adreno_is_a650(adreno_gpu) ||
> > > adreno_is_a660_family(adreno_gpu) ||
> > > @@ -541,7 +541,7 @@ static void a6xx_gmu_rpmh_init(struct a6xx_gmu *gmu)
> > > if (!pdc_in_aop) {
> > > seqptr = a6xx_gmu_get_mmio(pdev, "gmu_pdc_seq");
> > > if (IS_ERR(seqptr))
> > > -   goto err;
> > > +   return;
> > > }
> > >
> > > /* Disable SDE clock gating */
> > > @@ -633,12 +633,6 @@ static void a6xx_gmu_rpmh_init(struct a6xx_gmu *gmu)
> > > wmb();
> > >
> > > a6xx_rpmh_stop(gmu);
> > > -
> > > -err:
> > > -   if (!IS_ERR_OR_NULL(pdcptr))
> > > -   iounmap(pdcptr);
> > > -   if (!IS_ERR_OR_NULL(seqptr))
> > > -   iounmap(seqptr);
> > >  }
> > >
> > >  /*
> > > @@ -1503,7 +1497,7 @@ static void __iomem *a6xx_gmu_get_mmio(struct 
> > > platform_device *pdev,
> > > return ERR_PTR(-EINVAL);
> > > }
> > >
> > > -   ret = ioremap(res->start, resource_size(res));
> > > +   ret = devm_ioremap_resource(&pdev->dev, res);
> >
> > So, this doesn't actually work, failing in __request_region_locked(),
> > because the gmu region partially overlaps with the gpucc region (which
> > is busy).  I think this is intentional, since gmu is controlling the
> > gpu clocks, etc.  In particular REG_A6XX_GPU_CC_GX_GDSCR is in this
> > overlapping region.  Maybe Akhil knows more about GMU.
>
> We don't really need to map gpucc region from driver on behalf of gmu.
> Since we don't access any gpucc register from drm-msm driver, we can
> update the range size to correct this. But due to backward compatibility
> requirement with older dt, can we still enable region locking? I prefer
> it if that is possible.

Actually, when I reduced the region size to not overlap with gpucc,
the region is smaller than REG_A6XX_GPU_CC_GX_GDSCR * 4.

So I guess that register is actually part of gpucc?

BR,
-R

> FYI, kgsl accesses gpucc registers to ensure gdsc has collapsed. So
> gpucc region has to be mapped by kgsl and that is reflected in the kgsl
> device tree.
>
> -Akhil
>
>
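
The failure mode being debugged in this thread comes from the request
step: devm_ioremap_resource() is request_mem_region() plus ioremap()
with devres cleanup, so it returns an error pointer if the range
overlaps a region someone else (here, apparently gpucc) has already
claimed, whereas bare ioremap() never checks ownership. A generic
illustration, not the gmu code:

/* Generic illustration: the devm variant both requests and maps the
 * region, failing on any overlap with an already-claimed region. */
static void __iomem *example_map(struct platform_device *pdev, int index)
{
	struct resource *res;

	res = platform_get_resource(pdev, IORESOURCE_MEM, index);
	if (!res)
		return IOMEM_ERR_PTR(-EINVAL);

	return devm_ioremap_resource(&pdev->dev, res);
}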

Re: [PATCH v2 4/7] drm/msm/adreno: Add speedbin data for SM8550 / A740

2024-06-25 Thread Rob Clark
On Wed, Jun 5, 2024 at 1:10 PM Konrad Dybcio  wrote:
>
> Add speebin data for A740, as found on SM8550 and derivative SoCs.
>
> Reviewed-by: Dmitry Baryshkov 
> Signed-off-by: Konrad Dybcio 
> ---
>  drivers/gpu/drm/msm/adreno/adreno_device.c | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/drivers/gpu/drm/msm/adreno/adreno_device.c 
> b/drivers/gpu/drm/msm/adreno/adreno_device.c
> index 901ef767e491..e00eef8099ae 100644
> --- a/drivers/gpu/drm/msm/adreno/adreno_device.c
> +++ b/drivers/gpu/drm/msm/adreno/adreno_device.c
> @@ -570,6 +570,10 @@ static const struct adreno_info gpulist[] = {
> .zapfw = "a740_zap.mdt",
> .hwcg = a740_hwcg,
> .address_space_size = SZ_16G,
> +   .speedbins = ADRENO_SPEEDBINS(
> +   { ADRENO_SKU_ID(SOCINFO_FC_AC), 0 },
> +   { ADRENO_SKU_ID(SOCINFO_FC_AF), 0 },

Did you really mean for these both to map to the same speedbin?

> +   ),
> }, {
> .chip_ids = ADRENO_CHIP_IDS(0x43051401), /* "C520v2" */
> .family = ADRENO_7XX_GEN3,
>
> --
> 2.43.0
>


Re: [PATCH v2 3/7] drm/msm/adreno: Implement SMEM-based speed bin

2024-06-25 Thread Rob Clark
On Wed, Jun 5, 2024 at 1:10 PM Konrad Dybcio  wrote:
>
> On recent (SM8550+) Snapdragon platforms, the GPU speed bin data is
> abstracted through SMEM, instead of being directly available in a fuse.
>
> Add support for SMEM-based speed binning, which includes getting
> "feature code" and "product code" from said source and parsing them
> to form something that lets us match OPPs against.
>
> Due to the product code being ignored in the context of Adreno on
> production parts (as of SM8650), hardcode it to SOCINFO_PC_UNKNOWN.
>
> Signed-off-by: Konrad Dybcio 
> ---
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c  |  8 +++---
>  drivers/gpu/drm/msm/adreno/adreno_device.c |  2 ++
>  drivers/gpu/drm/msm/adreno/adreno_gpu.c| 41 
> +++---
>  drivers/gpu/drm/msm/adreno/adreno_gpu.h| 12 ++---
>  4 files changed, 53 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
> b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index 973872ad0474..3f84417ff027 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -2894,13 +2894,15 @@ static u32 fuse_to_supp_hw(const struct adreno_info 
> *info, u32 fuse)
> return UINT_MAX;
>  }
>
> -static int a6xx_set_supported_hw(struct device *dev, const struct 
> adreno_info *info)
> +static int a6xx_set_supported_hw(struct adreno_gpu *adreno_gpu,
> +struct device *dev,
> +const struct adreno_info *info)
>  {
> u32 supp_hw;
> u32 speedbin;
> int ret;
>
> -   ret = adreno_read_speedbin(dev, &speedbin);
> +   ret = adreno_read_speedbin(adreno_gpu, dev, &speedbin);
> /*
>  * -ENOENT means that the platform doesn't support speedbin which is
>  * fine
> @@ -3060,7 +3062,7 @@ struct msm_gpu *a6xx_gpu_init(struct drm_device *dev)
>
> a6xx_llc_slices_init(pdev, a6xx_gpu, is_a7xx);
>
> -   ret = a6xx_set_supported_hw(&pdev->dev, config->info);
> +   ret = a6xx_set_supported_hw(adreno_gpu, &pdev->dev, config->info);
> if (ret) {
> a6xx_llc_slices_destroy(a6xx_gpu);
> kfree(a6xx_gpu);
> diff --git a/drivers/gpu/drm/msm/adreno/adreno_device.c 
> b/drivers/gpu/drm/msm/adreno/adreno_device.c
> index c3703a51287b..901ef767e491 100644
> --- a/drivers/gpu/drm/msm/adreno/adreno_device.c
> +++ b/drivers/gpu/drm/msm/adreno/adreno_device.c
> @@ -6,6 +6,8 @@
>   * Copyright (c) 2014,2017 The Linux Foundation. All rights reserved.
>   */
>
> +#include 
> +
>  #include "adreno_gpu.h"
>
>  bool hang_debug = false;
> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c 
> b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> index 074fb498706f..055072260b3d 100644
> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> @@ -21,6 +21,9 @@
>  #include "msm_gem.h"
>  #include "msm_mmu.h"
>
> +#include 
> +#include 
> +
>  static u64 address_space_size = 0;
>  MODULE_PARM_DESC(address_space_size, "Override for size of processes private 
> GPU address space");
>  module_param(address_space_size, ullong, 0600);
> @@ -1057,9 +1060,39 @@ void adreno_gpu_ocmem_cleanup(struct adreno_ocmem 
> *adreno_ocmem)
>adreno_ocmem->hdl);
>  }
>
> -int adreno_read_speedbin(struct device *dev, u32 *speedbin)
> +int adreno_read_speedbin(struct adreno_gpu *adreno_gpu,
> +struct device *dev, u32 *fuse)
>  {
> -   return nvmem_cell_read_variable_le_u32(dev, "speed_bin", speedbin);
> +   u32 fcode;
> +   int ret;
> +
> +   /*
> +* Try reading the speedbin via a nvmem cell first
> +* -ENOENT means "no nvmem-cells" and essentially means "old DT" or
> +* "nvmem fuse is irrelevant", simply assume it's fine.
> +*/
> +   ret = nvmem_cell_read_variable_le_u32(dev, "speed_bin", fuse);
> +   if (!ret)
> +   return 0;
> +   else if (ret != -ENOENT)
> +   return dev_err_probe(dev, ret, "Couldn't read the speed bin 
> fuse value\n");
> +
> +#ifdef CONFIG_QCOM_SMEM
> +   /*
> +* Only check the feature code - the product code only matters for
> +* proto SoCs unavailable outside Qualcomm labs, as far as GPU bin
> +* matching is concerned.
> +*
> +* Ignore EOPNOTSUPP, as not all SoCs expose this info through SMEM.
> +*/
> +   ret = qcom_smem_get_feature_code(&fcode);
> +   if (!ret) {
> +   *fuse = ADRENO_SKU_ID(fcode);
> +   } else if (ret != -EOPNOTSUPP)
> +   return dev_err_probe(dev, ret, "Couldn't get feature code 
> from SMEM\n");
> +#endif
> +
> +   return 0;
>  }
>
>  int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
> @@ -1098,9 +1131,9 @@ int adreno_gpu_init(struct drm_device *drm, struct 
> platform_device *pdev,
> devm_pm_opp_set_clkname(dev, "core");
>  
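
For context on how the fuse value ends up used: fuse_to_supp_hw(),
referenced in the quoted hunk above, matches the fuse/SKU against the
per-GPU speedbins table to build the supported-hardware mask handed to
dev_pm_opp. Roughly, with the table fields assumed from the
ADRENO_SPEEDBINS() usage elsewhere in the series:

/* Illustrative only: returning UINT_MAX on no match effectively
 * rejects all OPPs for an unrecognized fuse value. */
static u32 example_fuse_to_supp_hw(const struct adreno_info *info, u32 fuse)
{
	for (int i = 0; i < info->speedbins_count; i++) {	/* assumed count field */
		if (info->speedbins[i].fuse == fuse)
			return BIT(info->speedbins[i].speedbin);
	}

	return UINT_MAX;
}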

Re: [PATCH v4 1/2] iommu/io-pgtable-arm: Add way to debug pgtable walk

2024-06-25 Thread Rob Clark
On Tue, Jun 25, 2024 at 4:27 AM Will Deacon  wrote:
>
> On Mon, Jun 24, 2024 at 08:37:26AM -0700, Rob Clark wrote:
> > On Mon, Jun 24, 2024 at 8:14 AM Will Deacon  wrote:
> > >
> > > On Thu, May 23, 2024 at 10:52:21AM -0700, Rob Clark wrote:
> > > > From: Rob Clark 
> > > >
> > > > Add an io-pgtable method to walk the pgtable returning the raw PTEs that
> > > > would be traversed for a given iova access.
> > > >
> > > > Signed-off-by: Rob Clark 
> > > > ---
> > > >  drivers/iommu/io-pgtable-arm.c | 51 --
> > > >  include/linux/io-pgtable.h |  4 +++
> > > >  2 files changed, 46 insertions(+), 9 deletions(-)
> > > >
> > > > diff --git a/drivers/iommu/io-pgtable-arm.c 
> > > > b/drivers/iommu/io-pgtable-arm.c
> > > > index f7828a7aad41..f47a0e64bb35 100644
> > > > --- a/drivers/iommu/io-pgtable-arm.c
> > > > +++ b/drivers/iommu/io-pgtable-arm.c
> > > > @@ -693,17 +693,19 @@ static size_t arm_lpae_unmap_pages(struct 
> > > > io_pgtable_ops *ops, unsigned long iov
> > > >   data->start_level, ptep);
> > > >  }
> > > >
> > > > -static phys_addr_t arm_lpae_iova_to_phys(struct io_pgtable_ops *ops,
> > > > -  unsigned long iova)
> > > > +static int arm_lpae_pgtable_walk(struct io_pgtable_ops *ops, unsigned 
> > > > long iova,
> > > > + int (*cb)(void *cb_data, void *pte, int level),
> > > > + void *cb_data)
> > > >  {
> > > >   struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
> > > >   arm_lpae_iopte pte, *ptep = data->pgd;
> > > >   int lvl = data->start_level;
> > > > + int ret;
> > > >
> > > >   do {
> > > >   /* Valid IOPTE pointer? */
> > > >   if (!ptep)
> > > > - return 0;
> > > > + return -EFAULT;
> > >
> > > nit: -ENOENT might be a little better, as we're only checking against a
> > > NULL entry rather than strictly any faulting entry.
> > >
> > > >   /* Grab the IOPTE we're interested in */
> > > >   ptep += ARM_LPAE_LVL_IDX(iova, lvl, data);
> > > > @@ -711,22 +713,52 @@ static phys_addr_t arm_lpae_iova_to_phys(struct 
> > > > io_pgtable_ops *ops,
> > > >
> > > >   /* Valid entry? */
> > > >   if (!pte)
> > > > - return 0;
> > > > + return -EFAULT;
> > >
> > > Same here (and at the end of the function).
> > >
> > > > +
> > > > + ret = cb(cb_data, &pte, lvl);
> > >
> > > Since pte is on the stack, rather than pointing into the actual pgtable,
> > > I think it would be clearer to pass it by value to the callback.
> >
> > fwiw, I passed it as a void* to avoid the pte size.. although I guess
> > it could be a union of all the possible pte types
>
> Can you just get away with a u64?

yeah, that wfm if you're ok with it

BR,
-R


Re: [PATCH] drm/msm: log iommu init failure

2024-06-24 Thread Rob Clark
On Mon, Jun 24, 2024 at 11:29 AM Dmitry Baryshkov
 wrote:
>
> On Mon, 24 Jun 2024 at 20:59, Rob Clark  wrote:
> >
> > On Thu, Jun 20, 2024 at 11:48 PM Luca Weiss  
> > wrote:
> > >
> > > On Fri Jun 21, 2024 at 12:47 AM CEST, Konrad Dybcio wrote:
> > > >
> > > >
> > > > On 6/20/24 20:24, Dmitry Baryshkov wrote:
> > > > > On Thu, 20 Jun 2024 at 20:32, Rob Clark  wrote:
> > > > >>
> > > > >> On Thu, May 30, 2024 at 2:48 AM Marc Gonzalez  
> > > > >> wrote:
> > > > >>>
> > > > >>> On 16/05/2024 10:43, Marijn Suijten wrote:
> > > > >>>
> > > > >>>> On 2024-05-15 17:09:02, Marc Gonzalez wrote:
> > > > >>>>
> > > > >>>>> When create_address_space() fails (e.g. when smmu node is 
> > > > >>>>> disabled)
> > > > >>
> > > > >> Note that smmu support is going to become a hard dependency with the
> > > > >> drm_gpuvm/VM_BIND conversion.. which I think means we should never 
> > > > >> get
> > > > >> far enough to hit this error path..
> > > > >
> > > > > Does that mean that we will lose GPU support on  MSM8974?
> >
> > And display support as well :-/
> >
> > Note that GPU should be disabled by default without smmu.. you can
> > override with modparam, but please don't.  It is incredibly insecure,
> > you might as well make /dev/mem world readable/writeable.
> >
> > Is simplefb an option on 8974 or 8226 to keep display support?
>
> Not in a longer term, I still hope to push HDMI PHY/PLL support for
> MSM8974, which means dynamic resolution support.

Hmm, maybe it would be possible to re-add carveout support.. but my
hopes aren't too high.  It would be better if we could get smmu going.
(Not to mention, I don't really like the idea of people using the gpu
without an smmu... it is a really insecure thing to do.)

BR,
-R

> >
> > BR,
> > -R
> >
> > > >
> > > > Yeah, that was brought up on #freedreno some time ago
> > >
> > > Also on MSM8226 which I also care about...
> > >
> > > Anyone at all knowledgable on IOMMU would be very welcome to help out
> > > with IOMMU support on these two platforms (and anything else that
> > > old?) in any case, since me and some other people have looked at this
> > > (on and off) for years but haven't gotten to any stable or usable point
> > > unfortunately.
> > >
> > > Regards
> > > Luca
> > >
> > > >
> > > > Konrad
> > >
>
>
>
> --
> With best wishes
> Dmitry


Re: [PATCH] drm/msm: log iommu init failure

2024-06-24 Thread Rob Clark
On Thu, Jun 20, 2024 at 11:48 PM Luca Weiss  wrote:
>
> On Fri Jun 21, 2024 at 12:47 AM CEST, Konrad Dybcio wrote:
> >
> >
> > On 6/20/24 20:24, Dmitry Baryshkov wrote:
> > > On Thu, 20 Jun 2024 at 20:32, Rob Clark  wrote:
> > >>
> > >> On Thu, May 30, 2024 at 2:48 AM Marc Gonzalez  
> > >> wrote:
> > >>>
> > >>> On 16/05/2024 10:43, Marijn Suijten wrote:
> > >>>
> > >>>> On 2024-05-15 17:09:02, Marc Gonzalez wrote:
> > >>>>
> > >>>>> When create_address_space() fails (e.g. when smmu node is disabled)
> > >>
> > >> Note that smmu support is going to become a hard dependency with the
> > >> drm_gpuvm/VM_BIND conversion.. which I think means we should never get
> > >> far enough to hit this error path..
> > >
> > > Does that mean that we will lose GPU support on  MSM8974?

And display support as well :-/

Note that GPU should be disabled by default without smmu.. you can
override with modparam, but please don't.  It is incredibly insecure,
you might as well make /dev/mem world readable/writeable.

Is simplefb an option on 8974 or 8226 to keep display support?

BR,
-R

> >
> > Yeah, that was brought up on #freedreno some time ago
>
> Also on MSM8226 which I also care about...
>
> Anyone at all knowledgable on IOMMU would be very welcome to help out
> with IOMMU support on these two platforms (and anything else that
> old?) in any case, since me and some other people have looked at this
> (on and off) for years but haven't gotten to any stable or usable point
> unfortunately.
>
> Regards
> Luca
>
> >
> > Konrad
>


Re: [PATCH v4 1/2] iommu/io-pgtable-arm: Add way to debug pgtable walk

2024-06-24 Thread Rob Clark
On Mon, Jun 24, 2024 at 8:14 AM Will Deacon  wrote:
>
> On Thu, May 23, 2024 at 10:52:21AM -0700, Rob Clark wrote:
> > From: Rob Clark 
> >
> > Add an io-pgtable method to walk the pgtable returning the raw PTEs that
> > would be traversed for a given iova access.
> >
> > Signed-off-by: Rob Clark 
> > ---
> >  drivers/iommu/io-pgtable-arm.c | 51 --
> >  include/linux/io-pgtable.h |  4 +++
> >  2 files changed, 46 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> > index f7828a7aad41..f47a0e64bb35 100644
> > --- a/drivers/iommu/io-pgtable-arm.c
> > +++ b/drivers/iommu/io-pgtable-arm.c
> > @@ -693,17 +693,19 @@ static size_t arm_lpae_unmap_pages(struct 
> > io_pgtable_ops *ops, unsigned long iov
> >   data->start_level, ptep);
> >  }
> >
> > -static phys_addr_t arm_lpae_iova_to_phys(struct io_pgtable_ops *ops,
> > -  unsigned long iova)
> > +static int arm_lpae_pgtable_walk(struct io_pgtable_ops *ops, unsigned long 
> > iova,
> > + int (*cb)(void *cb_data, void *pte, int level),
> > + void *cb_data)
> >  {
> >   struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
> >   arm_lpae_iopte pte, *ptep = data->pgd;
> >   int lvl = data->start_level;
> > + int ret;
> >
> >   do {
> >   /* Valid IOPTE pointer? */
> >   if (!ptep)
> > - return 0;
> > + return -EFAULT;
>
> nit: -ENOENT might be a little better, as we're only checking against a
> NULL entry rather than strictly any faulting entry.
>
> >   /* Grab the IOPTE we're interested in */
> >   ptep += ARM_LPAE_LVL_IDX(iova, lvl, data);
> > @@ -711,22 +713,52 @@ static phys_addr_t arm_lpae_iova_to_phys(struct 
> > io_pgtable_ops *ops,
> >
> >   /* Valid entry? */
> >   if (!pte)
> > - return 0;
> > + return -EFAULT;
>
> Same here (and at the end of the function).
>
> > +
> > + ret = cb(cb_data, &pte, lvl);
>
> Since pte is on the stack, rather than pointing into the actual pgtable,
> I think it would be clearer to pass it by value to the callback.

fwiw, I passed it as a void* to avoid the pte size.. although I guess
it could be a union of all the possible pte types

BR,
-R

>
> > + if (ret)
> > + return ret;
> >
> > - /* Leaf entry? */
> > + /* Leaf entry?  If so, we've found the translation */
> >   if (iopte_leaf(pte, lvl, data->iop.fmt))
> > - goto found_translation;
> > + return 0;
> >
> >   /* Take it to the next level */
> >   ptep = iopte_deref(pte, data);
> >   } while (++lvl < ARM_LPAE_MAX_LEVELS);
> >
> >   /* Ran out of page tables to walk */
> > + return -EFAULT;
> > +}
> > +
> > +struct iova_to_phys_walk_data {
> > + arm_lpae_iopte pte;
> > + int level;
> > +};
>
> Expanding a little on Robin's suggestion, why don't we drop this structure
> in favour of something more generic:
>
> struct arm_lpae_walk_data {
> arm_lpae_iopte ptes[ARM_LPAE_MAX_LEVELS];
> };
>
> and then do something in the walker like:
>
> if (cb && !cb(pte, lvl))
> walk_data->ptes[lvl] = pte;
>
> which could return the physical address at the end, if it reaches a leaf
> entry. That way arm_lpae_iova_to_phys() is just passing a NULL callback
> to the walker and your debug callback just needs to return 0 (i.e. the
> callback is basically just saying whether or not to continue the walk).
>
> Will
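
A minimal sketch of the shape Will is suggesting, for readers following
along (the walk_data struct, the NULL-callback iova_to_phys case, and the
return-nonzero-to-stop contract are assumptions about a future revision,
not the merged API):

struct arm_lpae_walk_data {
	arm_lpae_iopte ptes[ARM_LPAE_MAX_LEVELS];
};

static int arm_lpae_do_walk(struct arm_lpae_io_pgtable *data,
			    unsigned long iova,
			    int (*cb)(arm_lpae_iopte pte, int lvl),
			    struct arm_lpae_walk_data *walk_data)
{
	arm_lpae_iopte pte, *ptep = data->pgd;
	int lvl = data->start_level;

	do {
		/* Valid IOPTE pointer? */
		if (!ptep)
			return -ENOENT;

		/* Grab the IOPTE we're interested in */
		ptep += ARM_LPAE_LVL_IDX(iova, lvl, data);
		pte = READ_ONCE(*ptep);
		if (!pte)
			return -ENOENT;

		/* The optional callback only decides whether to keep
		 * walking; a NULL cb degenerates to iova_to_phys */
		if (cb) {
			int ret = cb(pte, lvl);
			if (ret)
				return ret;
		}
		walk_data->ptes[lvl] = pte;

		/* Leaf entry? Then the translation is complete */
		if (iopte_leaf(pte, lvl, data->iop.fmt))
			return 0;

		/* Take it to the next level */
		ptep = iopte_deref(pte, data);
	} while (++lvl < ARM_LPAE_MAX_LEVELS);

	/* Ran out of page tables to walk */
	return -ENOENT;
}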


Re: [PATCH v1 2/3] drm/msm/adreno: Add support for X185 GPU

2024-06-24 Thread Rob Clark
On Sun, Jun 23, 2024 at 4:08 AM Akhil P Oommen  wrote:
>
> Add support in drm/msm driver for the Adreno X185 gpu found in
> Snapdragon X1 Elite chipset.
>
> Signed-off-by: Akhil P Oommen 
> ---
>
>  drivers/gpu/drm/msm/adreno/a6xx_gmu.c  | 19 +++
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c  |  6 ++
>  drivers/gpu/drm/msm/adreno/adreno_device.c | 14 ++
>  drivers/gpu/drm/msm/adreno/adreno_gpu.h|  5 +
>  4 files changed, 36 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c 
> b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> index 0e3dfd4c2bc8..168a4bddfaf2 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> @@ -830,8 +830,10 @@ static int a6xx_gmu_fw_start(struct a6xx_gmu *gmu, 
> unsigned int state)
>  */
> gmu_write(gmu, REG_A6XX_GMU_CM3_CFG, 0x4052);
>
> +   if (adreno_is_x185(adreno_gpu)) {
> +   chipid = 0x7050001;
> /* NOTE: A730 may also fall in this if-condition with a future GMU fw 
> update. */
> -   if (adreno_is_a7xx(adreno_gpu) && !adreno_is_a730(adreno_gpu)) {
> +   } else if (adreno_is_a7xx(adreno_gpu) && !adreno_is_a730(adreno_gpu)) 
> {
> /* A7xx GPUs have obfuscated chip IDs. Use constant maj = 7 */
> chipid = FIELD_PREP(GENMASK(31, 24), 0x7);
>
> @@ -1329,9 +1331,18 @@ static int a6xx_gmu_rpmh_arc_votes_init(struct device 
> *dev, u32 *votes,
> if (!pri_count)
> return -EINVAL;
>
> -   sec = cmd_db_read_aux_data("mx.lvl", &sec_count);
> -   if (IS_ERR(sec))
> -   return PTR_ERR(sec);
> +   /*
> +* Some targets have a separate gfx mxc rail. So try to read that 
> first and then fall back
> +* to regular mx rail if it is missing
> +*/
> +   sec = cmd_db_read_aux_data("gmxc.lvl", &sec_count);
> +   if (PTR_ERR_OR_ZERO(sec) == -EPROBE_DEFER) {
> +   return -EPROBE_DEFER;
> +   } else if (IS_ERR(sec)) {
> +   sec = cmd_db_read_aux_data("mx.lvl", &sec_count);
> +   if (IS_ERR(sec))
> +   return PTR_ERR(sec);
> +   }
>
> sec_count >>= 1;
> if (!sec_count)
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
> b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index 973872ad0474..97837f7f2a40 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -1319,9 +1319,7 @@ static void a6xx_set_cp_protect(struct msm_gpu *gpu)
> count = ARRAY_SIZE(a660_protect);
> count_max = 48;
> BUILD_BUG_ON(ARRAY_SIZE(a660_protect) > 48);
> -   } else if (adreno_is_a730(adreno_gpu) ||
> -  adreno_is_a740(adreno_gpu) ||
> -  adreno_is_a750(adreno_gpu)) {
> +   } else if (adreno_is_a7xx(adreno_gpu)) {
> regs = a730_protect;
> count = ARRAY_SIZE(a730_protect);
> count_max = 48;
> @@ -1891,7 +1889,7 @@ static int hw_init(struct msm_gpu *gpu)
> gpu_write(gpu, REG_A6XX_UCHE_CLIENT_PF, BIT(7) | 0x1);
>
> /* Set weights for bicubic filtering */
> -   if (adreno_is_a650_family(adreno_gpu)) {
> +   if (adreno_is_a650_family(adreno_gpu) || adreno_is_x185(adreno_gpu)) {
> gpu_write(gpu, REG_A6XX_TPL1_BICUBIC_WEIGHTS_TABLE_0, 0);
> gpu_write(gpu, REG_A6XX_TPL1_BICUBIC_WEIGHTS_TABLE_1,
> 0x3fe05ff4);
> diff --git a/drivers/gpu/drm/msm/adreno/adreno_device.c 
> b/drivers/gpu/drm/msm/adreno/adreno_device.c
> index c3703a51287b..139c7d828749 100644
> --- a/drivers/gpu/drm/msm/adreno/adreno_device.c
> +++ b/drivers/gpu/drm/msm/adreno/adreno_device.c
> @@ -568,6 +568,20 @@ static const struct adreno_info gpulist[] = {
> .zapfw = "a740_zap.mdt",
> .hwcg = a740_hwcg,
> .address_space_size = SZ_16G,
> +   }, {
> +   .chip_ids = ADRENO_CHIP_IDS(0x43050c01), /* "C512v2" */
> +   .family = ADRENO_7XX_GEN2,
> +   .fw = {
> +   [ADRENO_FW_SQE] = "gen70500_sqe.fw",
> +   [ADRENO_FW_GMU] = "gen70500_gmu.bin",
> +   },
> +   .gmem = 3 * SZ_1M,
> +   .inactive_period = DRM_MSM_INACTIVE_PERIOD,
> +   .quirks = ADRENO_QUIRK_HAS_CACHED_COHERENT |
> + ADRENO_QUIRK_HAS_HW_APRIV,
> +   .init = a6xx_gpu_init,
> +   .hwcg = a740_hwcg,
> +   .address_space_size = SZ_16G,

I'm kinda thinking we should drop the address_space_size and add
instead ADRENO_QUIRK_4G or something along those lines, since there
are devices with 32 or 64G

(a690 is incorrect in this way too)

BR,
-R

> }, {
> .chip_ids = ADRENO_CHIP_IDS(0x43051401), /* "C520v2" */
> .family = ADRENO_7XX_GEN3,
> diff --git a/
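
To make that suggestion concrete, a sketch of how a quirk bit could
replace the per-entry field (the ADRENO_QUIRK_4G name, its bit position,
and the 48-bit fallback are hypothetical, chosen only to illustrate the
idea):

/* hypothetical next free quirk bit in adreno_gpu.h */
#define ADRENO_QUIRK_4G			BIT(5)

static u64 adreno_address_space_size(const struct adreno_info *info)
{
	/*
	 * GPUs limited to a 32-bit VA opt in via the quirk; everything
	 * else gets the full VA range instead of a hardcoded SZ_16G
	 * that undersizes 32G/64G systems.
	 */
	return (info->quirks & ADRENO_QUIRK_4G) ? SZ_4G : (1ULL << 48);
}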

Re: [PATCH] drm/msm/a6xx: request memory region

2024-06-21 Thread Rob Clark
On Sat, Jun 8, 2024 at 8:44 AM Kiarash Hajian
 wrote:
>
> The driver's memory regions are currently just ioremap()ed, but not
> reserved through a request. That's not a bug, but having the request is
> a little more robust.
>
> Implement the region-request through the corresponding managed
> devres-function.
>
> Signed-off-by: Kiarash Hajian 
> ---
> Changes in v6:
> -Fix compile error
> -Link to v5: 
> https://lore.kernel.org/all/20240607-memory-v1-1-8664f52fc...@gmail.com
>
> Changes in v5:
> - Fix error handling problems.
> - Link to v4: 
> https://lore.kernel.org/r/20240512-msm-adreno-memory-region-v4-1-3881a6408...@gmail.com
>
> Changes in v4:
> - Combine v3 commits into a single commit
> - Link to v3: 
> https://lore.kernel.org/r/20240512-msm-adreno-memory-region-v3-0-0a728ad45...@gmail.com
>
> Changes in v3:
> - Remove redundant devm_iounmap calls, relying on devres for automatic 
> resource cleanup.
>
> Changes in v2:
> - update the subject prefix to "drm/msm/a6xx:", to match the majority of 
> other changes to this file.
> ---
>  drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 33 +++--
>  1 file changed, 11 insertions(+), 22 deletions(-)
>
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c 
> b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> index 8bea8ef26f77..d26cc6254ef9 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> @@ -525,7 +525,7 @@ static void a6xx_gmu_rpmh_init(struct a6xx_gmu *gmu)
> bool pdc_in_aop = false;
>
> if (IS_ERR(pdcptr))
> -   goto err;
> +   return;
>
> if (adreno_is_a650(adreno_gpu) ||
> adreno_is_a660_family(adreno_gpu) ||
> @@ -541,7 +541,7 @@ static void a6xx_gmu_rpmh_init(struct a6xx_gmu *gmu)
> if (!pdc_in_aop) {
> seqptr = a6xx_gmu_get_mmio(pdev, "gmu_pdc_seq");
> if (IS_ERR(seqptr))
> -   goto err;
> +   return;
> }
>
> /* Disable SDE clock gating */
> @@ -633,12 +633,6 @@ static void a6xx_gmu_rpmh_init(struct a6xx_gmu *gmu)
> wmb();
>
> a6xx_rpmh_stop(gmu);
> -
> -err:
> -   if (!IS_ERR_OR_NULL(pdcptr))
> -   iounmap(pdcptr);
> -   if (!IS_ERR_OR_NULL(seqptr))
> -   iounmap(seqptr);
>  }
>
>  /*
> @@ -1503,7 +1497,7 @@ static void __iomem *a6xx_gmu_get_mmio(struct 
> platform_device *pdev,
> return ERR_PTR(-EINVAL);
> }
>
> -   ret = ioremap(res->start, resource_size(res));
> +   ret = devm_ioremap_resource(&pdev->dev, res);

So, this doesn't actually work, failing in __request_region_locked(),
because the gmu region partially overlaps with the gpucc region (which
is busy).  I think this is intentional, since gmu is controlling the
gpu clocks, etc.  In particular REG_A6XX_GPU_CC_GX_GDSCR is in this
overlapping region.  Maybe Akhil knows more about GMU.

BR,
-R

> if (!ret) {
> DRM_DEV_ERROR(&pdev->dev, "Unable to map the %s registers\n", 
> name);
> return ERR_PTR(-EINVAL);
> @@ -1613,13 +1607,13 @@ int a6xx_gmu_wrapper_init(struct a6xx_gpu *a6xx_gpu, 
> struct device_node *node)
> gmu->mmio = a6xx_gmu_get_mmio(pdev, "gmu");
> if (IS_ERR(gmu->mmio)) {
> ret = PTR_ERR(gmu->mmio);
> -   goto err_mmio;
> +   goto err_cleanup;
> }
>
> gmu->cxpd = dev_pm_domain_attach_by_name(gmu->dev, "cx");
> if (IS_ERR(gmu->cxpd)) {
> ret = PTR_ERR(gmu->cxpd);
> -   goto err_mmio;
> +   goto err_cleanup;
> }
>
> if (!device_link_add(gmu->dev, gmu->cxpd, DL_FLAG_PM_RUNTIME)) {
> @@ -1635,7 +1629,7 @@ int a6xx_gmu_wrapper_init(struct a6xx_gpu *a6xx_gpu, 
> struct device_node *node)
> gmu->gxpd = dev_pm_domain_attach_by_name(gmu->dev, "gx");
> if (IS_ERR(gmu->gxpd)) {
> ret = PTR_ERR(gmu->gxpd);
> -   goto err_mmio;
> +   goto err_cleanup;
> }
>
> gmu->initialized = true;
> @@ -1645,9 +1639,7 @@ int a6xx_gmu_wrapper_init(struct a6xx_gpu *a6xx_gpu, 
> struct device_node *node)
>  detach_cxpd:
> dev_pm_domain_detach(gmu->cxpd, false);
>
> -err_mmio:
> -   iounmap(gmu->mmio);
> -
> +err_cleanup:
> /* Drop reference taken in of_find_device_by_node */
> put_device(gmu->dev);
>
> @@ -1762,7 +1754,7 @@ int a6xx_gmu_init(struct a6xx_gpu *a6xx_gpu, struct 
> device_node *node)
> gmu->rscc = a6xx_gmu_get_mmio(pdev, "rscc");
> if (IS_ERR(gmu->rscc)) {
> ret = -ENODEV;
> -   goto err_mmio;
> +   goto err_cleanup;
> }
> } else {
> gmu->rscc = gmu->mmio + 0x23000;
> @@ -1774,13 +1766,13 @@ int a6xx_gmu_init(struct a6xx_gpu *a6xx_gpu, struct 
> device_node *node)
>
>   
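
The distinction Rob is pointing at, in sketch form (error handling
trimmed): devm_ioremap_resource() is a request_mem_region() plus an
ioremap(), and it is the request step that collides with the
already-claimed gpucc range:

	struct resource *res;
	void __iomem *base;

	res = platform_get_resource_byname(pdev, IORESOURCE_MEM, name);

	/*
	 * Request + map: returns ERR_PTR(-EBUSY) if any part of the range
	 * overlaps a region another driver (here: gpucc) already claimed.
	 */
	base = devm_ioremap_resource(&pdev->dev, res);

	/* Map only: no exclusivity check, overlapping ranges are fine */
	base = devm_ioremap(&pdev->dev, res->start, resource_size(res));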

Re: [PATCH] drm/msm: log iommu init failure

2024-06-20 Thread Rob Clark
On Thu, May 30, 2024 at 2:48 AM Marc Gonzalez  wrote:
>
> On 16/05/2024 10:43, Marijn Suijten wrote:
>
> > On 2024-05-15 17:09:02, Marc Gonzalez wrote:
> >
> >> When create_address_space() fails (e.g. when smmu node is disabled)

Note that smmu support is going to become a hard dependency with the
drm_gpuvm/VM_BIND conversion.. which I think means we should never get
far enough to hit this error path..

BR,
-R

> >> msm_gpu_init() silently fails:
> >>
> >> msm_dpu c901000.display-controller: failed to load adreno gpu
> >> msm_dpu c901000.display-controller: failed to bind 500.gpu (ops 
> >> a3xx_ops): -19
> >>
> >> Log create_address_space() failure.
> >>
> >> Signed-off-by: Marc Gonzalez 
> >
> > Thanks!
> >
> > Suggested-by: Marijn Suijten 
> >
> > And, after checking the below:
> >
> > Reviewed-by: Marijn Suijten 
> >
> >> ---
> >>  drivers/gpu/drm/msm/msm_gpu.c | 1 +
> >>  1 file changed, 1 insertion(+)
> >>
> >> diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
> >> index 655002b21b0d5..f1e692866cc38 100644
> >> --- a/drivers/gpu/drm/msm/msm_gpu.c
> >> +++ b/drivers/gpu/drm/msm/msm_gpu.c
> >> @@ -941,6 +941,7 @@ int msm_gpu_init(struct drm_device *drm, struct 
> >> platform_device *pdev,
> >>  DRM_DEV_INFO(drm->dev, "%s: no IOMMU, fallback to VRAM 
> >> carveout!\n", name);
> >>  else if (IS_ERR(gpu->aspace)) {
> >>  ret = PTR_ERR(gpu->aspace);
> >> +DRM_DEV_ERROR(drm->dev, "could not create address space: 
> >> %d\n", ret);
> >
> > Maybe this wasn't done before because this also includes `-EPROBE_DEFER`, 
> > so you
> > might want to wrap this in
> >
> >   if (ret != -EPROBE_DEFER)
> >   DRM_DEV_ERROR...
> >
> > But then dev_err_probe() was built specifically to be less verbose about 
> > this
> > (and track defer reasons).  While this is an init and not probe function, 
> > it's
> > called from struct component_ops->bind where it should be okay to call that,
> > as long as you have access to the component `struct device*` and not its 
> > master
> > (IIRC).
>
>
> Hello Marijn,
>
> I have moved on to HDMI.
>
> Feel free to take full ownership of this submission,
> as I won't have the energy to get it accepted.
>
> Regards,
>
> Marc
>
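
For reference, Marijn's dev_err_probe() suggestion would look roughly
like this in msm_gpu_init() (a sketch; whether the component device or
its master should be used here is the open question from the thread):

	else if (IS_ERR(gpu->aspace)) {
		ret = PTR_ERR(gpu->aspace);
		/*
		 * dev_err_probe() stays silent for -EPROBE_DEFER and
		 * records the deferral reason in debugfs
		 * (devices_deferred), so no explicit
		 * "if (ret != -EPROBE_DEFER)" guard is needed.
		 */
		return dev_err_probe(drm->dev, ret,
				     "could not create address space\n");
	}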


Re: [PATCH] drm/msm/adreno: Add A306A support

2024-06-20 Thread Rob Clark
On Wed, May 29, 2024 at 3:41 AM Konrad Dybcio  wrote:
>
> On 28.05.2024 9:43 PM, Barnabás Czémán wrote:
> > From: Otto Pflüger 
> >
> > Add support for Adreno 306A GPU what is found in MSM8917 SoC.
> > This GPU marketing name is Adreno 308.
> >
> > Signed-off-by: Otto Pflüger 
> > [use internal name of the GPU, reword the commit message]
> > Signed-off-by: Barnabás Czémán 
> > ---
>
> [...]
>
>
> >
> > +static inline bool adreno_is_a306a(const struct adreno_gpu *gpu)
> > +{
> > + /* a306a marketing name is a308 */
> > + return adreno_is_revn(gpu, 308);
> > +}
>
> The .c changes look good. Rob, do we still want .rev nowadays?

mostly I just want to avoid revn for newer GPUs, but I suppose we
should be consistent and drop it for "new old" GPUs..

Also, it would be nice to rebase on
https://patchwork.freedesktop.org/series/127393/

BR,
-R


Re: [PATCH] drm/msm/dpu: protect ctl ops calls with validity checks

2024-06-20 Thread Rob Clark
On Thu, Jun 20, 2024 at 6:08 AM Dmitry Baryshkov
 wrote:
>
> On Thu, 20 Jun 2024 at 00:27, Abhinav Kumar  wrote:
> >
> > dpu_encoder_helper_phys_cleanup() calls the ctl ops without checking
> > whether the ops are assigned, causing a discrepancy between its callers,
> > which perform the checks, and the API itself, which does not.
> >
> > Two approaches can be taken: either drop the checks even in the caller
> > OR add the checks even in dpu_encoder_helper_phys_cleanup().
> >
> > Adopt the latter approach, as ctl ops are assigned based on revision and
> > so may not always be assigned.
>
> NAK, these calls are always assigned. Please make sure that they are
> documented as required and drop offending checks.

agreed, I'd rather see the obvious crash if somehow a required
callback didn't get set up, than a subtle/silent problem.  It is
easier to debug that way.

BR,
-R

> >
> > Fixes: d7d0e73f7de3 ("drm/msm/dpu: introduce the dpu_encoder_phys_* for 
> > writeback")
> > Reported-by: Dan Carpenter 
> > Closes: 
> > https://lore.kernel.org/all/464fbd84-0d1c-43c3-a40b-31656ac06456@moroto.mountain/T/
> > Signed-off-by: Abhinav Kumar 
> > ---
> >  drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c | 9 ++---
> >  1 file changed, 6 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c 
> > b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
> > index 708657598cce..7f7e6d4e974b 100644
> > --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
> > +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
> > @@ -2180,9 +2180,12 @@ void dpu_encoder_helper_phys_cleanup(struct 
> > dpu_encoder_phys *phys_enc)
> > if (ctl->ops.reset_intf_cfg)
> > ctl->ops.reset_intf_cfg(ctl, &intf_cfg);
> >
> > -   ctl->ops.trigger_flush(ctl);
> > -   ctl->ops.trigger_start(ctl);
> > -   ctl->ops.clear_pending_flush(ctl);
> > +   if (ctl->ops.trigger_flush)
> > +   ctl->ops.trigger_flush(ctl);
> > +   if (ctl->ops.trigger_start)
> > +   ctl->ops.trigger_start(ctl);
> > +   if (ctl->ops.clear_pending_flush)
> > +   ctl->ops.clear_pending_flush(ctl);
> >  }
> >
> >  void dpu_encoder_helper_phys_setup_cdm(struct dpu_encoder_phys *phys_enc,
> > --
> > 2.44.0
> >
>
>
> --
> With best wishes
> Dmitry
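
What Dmitry is asking for instead is to document the callbacks as
mandatory and drop the callers' checks. A sketch of that kerneldoc
convention (illustrative, not the actual dpu_hw_ctl.h text):

struct dpu_hw_ctl_ops {
	/**
	 * @trigger_flush: kick off the pending flush for this ctl path.
	 * Mandatory: assigned for every ctl revision, so callers may
	 * invoke it without a NULL check.
	 */
	void (*trigger_flush)(struct dpu_hw_ctl *ctx);

	/**
	 * @trigger_start: mandatory, same contract as @trigger_flush.
	 */
	void (*trigger_start)(struct dpu_hw_ctl *ctx);

	/* truly optional ops keep their "if (ops.foo)" call-site guards */
};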


Re: [PATCH v4 5/5] drm/msm/adreno: Move CP_PROTECT settings to hw catalog

2024-06-20 Thread Rob Clark
On Tue, Jun 18, 2024 at 12:02 PM Konrad Dybcio  wrote:
>
>
>
> On 6/18/24 18:42, Rob Clark wrote:
> > From: Rob Clark 
> >
> > Move the CP_PROTECT settings into the hw catalog.
> >
> > Signed-off-by: Rob Clark 
> > Reviewed-by: Dmitry Baryshkov 
> > ---
>
> [...]
>
> > +static inline void __build_asserts(void)
> > +{
> > + BUILD_BUG_ON(a630_protect.count > a630_protect.count_max);
> > + BUILD_BUG_ON(a650_protect.count > a650_protect.count_max);
> > + BUILD_BUG_ON(a660_protect.count > a660_protect.count_max);
> > + BUILD_BUG_ON(a690_protect.count > a690_protect.count_max);
> > + BUILD_BUG_ON(a730_protect.count > a730_protect.count_max);
> > +}
> > +
>
> patch:394: new blank line at EOF

removed the extra blank line while applying, thx

BR,
-R

> other than that:
>
> Reviewed-by: Konrad Dybcio 
>
> Konrad
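
The .count/.count_max pair those asserts rely on comes from the
DECLARE_ADRENO_PROTECT() helper added to adreno_gpu.h in this series (not
shown in this excerpt); it plausibly expands along these lines (struct
layout and macro shape inferred from the usage above, so treat as a
sketch):

struct adreno_protect {
	const u32 *regs;
	u32 count;	/* entries actually used */
	u32 count_max;	/* CP_PROTECT slots the hardware provides */
};

#define DECLARE_ADRENO_PROTECT(name, __count_max)	\
static const struct adreno_protect name = {		\
	.regs = name ## _regs,				\
	.count = ARRAY_SIZE(name ## _regs),		\
	.count_max = __count_max,			\
};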


[PATCH v4 5/5] drm/msm/adreno: Move CP_PROTECT settings to hw catalog

2024-06-18 Thread Rob Clark
From: Rob Clark 

Move the CP_PROTECT settings into the hw catalog.

Signed-off-by: Rob Clark 
Reviewed-by: Dmitry Baryshkov 
---
 drivers/gpu/drm/msm/adreno/a6xx_catalog.c | 248 +
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 257 +-
 drivers/gpu/drm/msm/adreno/a6xx_gpu.h |   2 +
 drivers/gpu/drm/msm/adreno/adreno_gpu.h   |  13 ++
 4 files changed, 269 insertions(+), 251 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_catalog.c 
b/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
index b81bcae59ac3..329b88b24b80 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
@@ -454,6 +454,173 @@ static const struct adreno_reglist a690_hwcg[] = {
{}
 };
 
+/* For a615, a616, a618, a619, a630, a640 and a680 */
+static const u32 a630_protect_regs[] = {
+   A6XX_PROTECT_RDONLY(0x0, 0x04ff),
+   A6XX_PROTECT_RDONLY(0x00501, 0x0005),
+   A6XX_PROTECT_RDONLY(0x0050b, 0x02f4),
+   A6XX_PROTECT_NORDWR(0x0050e, 0x),
+   A6XX_PROTECT_NORDWR(0x00510, 0x),
+   A6XX_PROTECT_NORDWR(0x00534, 0x),
+   A6XX_PROTECT_NORDWR(0x00800, 0x0082),
+   A6XX_PROTECT_NORDWR(0x008a0, 0x0008),
+   A6XX_PROTECT_NORDWR(0x008ab, 0x0024),
+   A6XX_PROTECT_RDONLY(0x008de, 0x00ae),
+   A6XX_PROTECT_NORDWR(0x00900, 0x004d),
+   A6XX_PROTECT_NORDWR(0x0098d, 0x0272),
+   A6XX_PROTECT_NORDWR(0x00e00, 0x0001),
+   A6XX_PROTECT_NORDWR(0x00e03, 0x000c),
+   A6XX_PROTECT_NORDWR(0x03c00, 0x00c3),
+   A6XX_PROTECT_RDONLY(0x03cc4, 0x1fff),
+   A6XX_PROTECT_NORDWR(0x08630, 0x01cf),
+   A6XX_PROTECT_NORDWR(0x08e00, 0x),
+   A6XX_PROTECT_NORDWR(0x08e08, 0x),
+   A6XX_PROTECT_NORDWR(0x08e50, 0x001f),
+   A6XX_PROTECT_NORDWR(0x09624, 0x01db),
+   A6XX_PROTECT_NORDWR(0x09e70, 0x0001),
+   A6XX_PROTECT_NORDWR(0x09e78, 0x0187),
+   A6XX_PROTECT_NORDWR(0x0a630, 0x01cf),
+   A6XX_PROTECT_NORDWR(0x0ae02, 0x),
+   A6XX_PROTECT_NORDWR(0x0ae50, 0x032f),
+   A6XX_PROTECT_NORDWR(0x0b604, 0x),
+   A6XX_PROTECT_NORDWR(0x0be02, 0x0001),
+   A6XX_PROTECT_NORDWR(0x0be20, 0x17df),
+   A6XX_PROTECT_NORDWR(0x0f000, 0x0bff),
+   A6XX_PROTECT_RDONLY(0x0fc00, 0x1fff),
+   A6XX_PROTECT_NORDWR(0x11c00, 0x), /* note: infinite range */
+};
+DECLARE_ADRENO_PROTECT(a630_protect, 32);
+
+/* These are for a620 and a650 */
+static const u32 a650_protect_regs[] = {
+   A6XX_PROTECT_RDONLY(0x0, 0x04ff),
+   A6XX_PROTECT_RDONLY(0x00501, 0x0005),
+   A6XX_PROTECT_RDONLY(0x0050b, 0x02f4),
+   A6XX_PROTECT_NORDWR(0x0050e, 0x),
+   A6XX_PROTECT_NORDWR(0x00510, 0x),
+   A6XX_PROTECT_NORDWR(0x00534, 0x),
+   A6XX_PROTECT_NORDWR(0x00800, 0x0082),
+   A6XX_PROTECT_NORDWR(0x008a0, 0x0008),
+   A6XX_PROTECT_NORDWR(0x008ab, 0x0024),
+   A6XX_PROTECT_RDONLY(0x008de, 0x00ae),
+   A6XX_PROTECT_NORDWR(0x00900, 0x004d),
+   A6XX_PROTECT_NORDWR(0x0098d, 0x0272),
+   A6XX_PROTECT_NORDWR(0x00e00, 0x0001),
+   A6XX_PROTECT_NORDWR(0x00e03, 0x000c),
+   A6XX_PROTECT_NORDWR(0x03c00, 0x00c3),
+   A6XX_PROTECT_RDONLY(0x03cc4, 0x1fff),
+   A6XX_PROTECT_NORDWR(0x08630, 0x01cf),
+   A6XX_PROTECT_NORDWR(0x08e00, 0x),
+   A6XX_PROTECT_NORDWR(0x08e08, 0x),
+   A6XX_PROTECT_NORDWR(0x08e50, 0x001f),
+   A6XX_PROTECT_NORDWR(0x08e80, 0x027f),
+   A6XX_PROTECT_NORDWR(0x09624, 0x01db),
+   A6XX_PROTECT_NORDWR(0x09e60, 0x0011),
+   A6XX_PROTECT_NORDWR(0x09e78, 0x0187),
+   A6XX_PROTECT_NORDWR(0x0a630, 0x01cf),
+   A6XX_PROTECT_NORDWR(0x0ae02, 0x),
+   A6XX_PROTECT_NORDWR(0x0ae50, 0x032f),
+   A6XX_PROTECT_NORDWR(0x0b604, 0x),
+   A6XX_PROTECT_NORDWR(0x0b608, 0x0007),
+   A6XX_PROTECT_NORDWR(0x0be02, 0x0001),
+   A6XX_PROTECT_NORDWR(0x0be20, 0x17df),
+   A6XX_PROTECT_NORDWR(0x0f000, 0x0bff),
+   A6XX_PROTECT_RDONLY(0x0fc00, 0x1fff),
+   A6XX_PROTECT_NORDWR(0x18400, 0x1fff),
+   A6XX_PROTECT_NORDWR(0x1a800, 0x1fff),
+   A6XX_PROTECT_NORDWR(0x1f400, 0x0443),
+   A6XX_PROTECT_RDONLY(0x1f844, 0x007b),
+   A6XX_PROTECT_NORDWR(0x1f887, 0x001b),
+   A6XX_PROTECT_NORDWR(0x1f8c0, 0x), /* note: infinite range */
+};
+DECLARE_ADRENO_PROTECT(a650_protect, 48);
+
+/* These are for a635 and a660 */
+static const u32 a660_protect_regs[] = {
+   A6XX_PROTECT_RDONLY(0x0, 0x04ff),
+   A6XX_PROTECT_RDONLY(0x00501, 0x0005),
+   A6XX_PROTECT_RDONLY(0x0050b, 0x02f4),
+   A6XX_PROTECT_NORDWR(0x0050e, 0x),
+   A6XX_PROTECT_NORDWR(0x00510, 0x),
+   A6XX_PROTECT_NORDWR(0x00534, 0x),
+   A6XX_PROTECT_NORDWR(0x00800, 0x0082),
+   A6XX_PROTECT_NORDWR(0x008a0, 0x0008),
+   A6XX_PROTECT_NORDWR(0x008ab, 0x0024),
+   A6XX_PROTECT_RDONLY(0x008de, 0x00ae),
+   A6XX_PROTECT_NORDWR(0x00900, 0x004d),
+   A6XX_PROTECT_NORDWR(0x0098d, 0x0272

[PATCH v4 4/5] drm/msm/adreno: Move hwcg table into a6xx specific info

2024-06-18 Thread Rob Clark
From: Rob Clark 

Introduce a6xx_info where we can stash gen specific stuff without
polluting the toplevel adreno_info struct.

Signed-off-by: Rob Clark 
Reviewed-by: Dmitry Baryshkov 
Reviewed-by: Konrad Dybcio 
---
 drivers/gpu/drm/msm/adreno/a6xx_catalog.c | 65 +--
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c |  6 +--
 drivers/gpu/drm/msm/adreno/a6xx_gpu.h |  9 
 drivers/gpu/drm/msm/adreno/adreno_gpu.h   |  6 ++-
 4 files changed, 67 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_catalog.c 
b/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
index bcc2f4d8cfc6..b81bcae59ac3 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
@@ -7,6 +7,7 @@
  */
 
 #include "adreno_gpu.h"
+#include "a6xx_gpu.h"
 #include "a6xx.xml.h"
 #include "a6xx_gmu.xml.h"
 
@@ -465,7 +466,9 @@ static const struct adreno_info a6xx_gpus[] = {
.inactive_period = DRM_MSM_INACTIVE_PERIOD,
.init = a6xx_gpu_init,
.zapfw = "a610_zap.mdt",
-   .hwcg = a612_hwcg,
+   .a6xx = &(const struct a6xx_info) {
+   .hwcg = a612_hwcg,
+   },
/*
 * There are (at least) three SoCs implementing A610: SM6125
 * (trinket), SM6115 (bengal) and SM6225 (khaje). Trinket does
@@ -493,7 +496,9 @@ static const struct adreno_info a6xx_gpus[] = {
.quirks = ADRENO_QUIRK_HAS_CACHED_COHERENT,
.init = a6xx_gpu_init,
.zapfw = "a615_zap.mbn",
-   .hwcg = a615_hwcg,
+   .a6xx = &(const struct a6xx_info) {
+   .hwcg = a615_hwcg,
+   },
.speedbins = ADRENO_SPEEDBINS(
{ 0,   0 },
{ 128, 1 },
@@ -513,6 +518,8 @@ static const struct adreno_info a6xx_gpus[] = {
.inactive_period = DRM_MSM_INACTIVE_PERIOD,
.quirks = ADRENO_QUIRK_HAS_CACHED_COHERENT,
.init = a6xx_gpu_init,
+   .a6xx = &(const struct a6xx_info) {
+   },
.speedbins = ADRENO_SPEEDBINS(
{ 0,   0 },
{ 169, 1 },
@@ -531,7 +538,9 @@ static const struct adreno_info a6xx_gpus[] = {
.inactive_period = DRM_MSM_INACTIVE_PERIOD,
.init = a6xx_gpu_init,
.zapfw = "a615_zap.mdt",
-   .hwcg = a615_hwcg,
+   .a6xx = &(const struct a6xx_info) {
+   .hwcg = a615_hwcg,
+   },
.speedbins = ADRENO_SPEEDBINS(
{ 0,   0 },
{ 138, 1 },
@@ -550,7 +559,9 @@ static const struct adreno_info a6xx_gpus[] = {
.inactive_period = DRM_MSM_INACTIVE_PERIOD,
.init = a6xx_gpu_init,
.zapfw = "a615_zap.mdt",
-   .hwcg = a615_hwcg,
+   .a6xx = &(const struct a6xx_info) {
+   .hwcg = a615_hwcg,
+   },
.speedbins = ADRENO_SPEEDBINS(
{ 0,   0 },
{ 190, 1 },
@@ -569,7 +580,9 @@ static const struct adreno_info a6xx_gpus[] = {
.quirks = ADRENO_QUIRK_HAS_CACHED_COHERENT,
.init = a6xx_gpu_init,
.zapfw = "a615_zap.mdt",
-   .hwcg = a615_hwcg,
+   .a6xx = &(const struct a6xx_info) {
+   .hwcg = a615_hwcg,
+   },
.speedbins = ADRENO_SPEEDBINS(
{ 0,   0 },
{ 120, 4 },
@@ -593,7 +606,9 @@ static const struct adreno_info a6xx_gpus[] = {
.quirks = ADRENO_QUIRK_HAS_CACHED_COHERENT,
.init = a6xx_gpu_init,
.zapfw = "a630_zap.mdt",
-   .hwcg = a630_hwcg,
+   .a6xx = &(const struct a6xx_info) {
+   .hwcg = a630_hwcg,
+   },
}, {
.chip_ids = ADRENO_CHIP_IDS(0x06040001),
.family = ADRENO_6XX_GEN2,
@@ -607,7 +622,9 @@ static const struct adreno_info a6xx_gpus[] = {
.quirks = ADRENO_QUIRK_HAS_CACHED_COHERENT,
.init = a6xx_gpu_init,
.zapfw = "a640_zap.mdt",
-   .hwcg = a640_hwcg,
+   .a6xx = &(const struct a6xx_info) {
+   .hwcg = a640_hwcg,
+   },
.speedbins = ADRENO_SPEEDBINS(
{ 0, 0 },
{ 1, 1 },
@@ -626,7 +643,9 @@ static const struct adreno_info a6xx_gpus[] = {
ADRENO_QUIRK_HAS_HW_APRIV,
.init = a6xx_gpu_init,
.zapfw = &q

[PATCH v4 3/5] drm/msm/adreno: Move hwcg regs to a6xx hw catalog

2024-06-18 Thread Rob Clark
From: Rob Clark 

Move the hwcg tables into the hw catalog.

Signed-off-by: Rob Clark 
Reviewed-by: Dmitry Baryshkov 
Reviewed-by: Konrad Dybcio 
---
 drivers/gpu/drm/msm/adreno/a6xx_catalog.c | 619 ++
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 617 -
 drivers/gpu/drm/msm/adreno/adreno_gpu.h   |   3 -
 3 files changed, 619 insertions(+), 620 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_catalog.c 
b/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
index 10a92eab0232..bcc2f4d8cfc6 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
@@ -7,6 +7,451 @@
  */
 
 #include "adreno_gpu.h"
+#include "a6xx.xml.h"
+#include "a6xx_gmu.xml.h"
+
+static const struct adreno_reglist a612_hwcg[] = {
+   {REG_A6XX_RBBM_CLOCK_CNTL_SP0, 0x},
+   {REG_A6XX_RBBM_CLOCK_CNTL2_SP0, 0x0220},
+   {REG_A6XX_RBBM_CLOCK_DELAY_SP0, 0x0081},
+   {REG_A6XX_RBBM_CLOCK_HYST_SP0, 0xf3cf},
+   {REG_A6XX_RBBM_CLOCK_CNTL_TP0, 0x},
+   {REG_A6XX_RBBM_CLOCK_CNTL2_TP0, 0x},
+   {REG_A6XX_RBBM_CLOCK_CNTL3_TP0, 0x},
+   {REG_A6XX_RBBM_CLOCK_CNTL4_TP0, 0x0002},
+   {REG_A6XX_RBBM_CLOCK_DELAY_TP0, 0x},
+   {REG_A6XX_RBBM_CLOCK_DELAY2_TP0, 0x},
+   {REG_A6XX_RBBM_CLOCK_DELAY3_TP0, 0x},
+   {REG_A6XX_RBBM_CLOCK_DELAY4_TP0, 0x0001},
+   {REG_A6XX_RBBM_CLOCK_HYST_TP0, 0x},
+   {REG_A6XX_RBBM_CLOCK_HYST2_TP0, 0x},
+   {REG_A6XX_RBBM_CLOCK_HYST3_TP0, 0x},
+   {REG_A6XX_RBBM_CLOCK_HYST4_TP0, 0x0007},
+   {REG_A6XX_RBBM_CLOCK_CNTL_RB0, 0x},
+   {REG_A6XX_RBBM_CLOCK_CNTL2_RB0, 0x0120},
+   {REG_A6XX_RBBM_CLOCK_CNTL_CCU0, 0x2220},
+   {REG_A6XX_RBBM_CLOCK_HYST_RB_CCU0, 0x00040f00},
+   {REG_A6XX_RBBM_CLOCK_CNTL_RAC, 0x05522022},
+   {REG_A6XX_RBBM_CLOCK_CNTL2_RAC, 0x},
+   {REG_A6XX_RBBM_CLOCK_DELAY_RAC, 0x0011},
+   {REG_A6XX_RBBM_CLOCK_HYST_RAC, 0x00445044},
+   {REG_A6XX_RBBM_CLOCK_CNTL_TSE_RAS_RBBM, 0x0422},
+   {REG_A6XX_RBBM_CLOCK_MODE_VFD, 0x},
+   {REG_A6XX_RBBM_CLOCK_MODE_GPC, 0x0222},
+   {REG_A6XX_RBBM_CLOCK_DELAY_HLSQ_2, 0x0002},
+   {REG_A6XX_RBBM_CLOCK_MODE_HLSQ, 0x},
+   {REG_A6XX_RBBM_CLOCK_DELAY_TSE_RAS_RBBM, 0x4000},
+   {REG_A6XX_RBBM_CLOCK_DELAY_VFD, 0x},
+   {REG_A6XX_RBBM_CLOCK_DELAY_GPC, 0x0200},
+   {REG_A6XX_RBBM_CLOCK_DELAY_HLSQ, 0x},
+   {REG_A6XX_RBBM_CLOCK_HYST_TSE_RAS_RBBM, 0x},
+   {REG_A6XX_RBBM_CLOCK_HYST_VFD, 0x},
+   {REG_A6XX_RBBM_CLOCK_HYST_GPC, 0x04104004},
+   {REG_A6XX_RBBM_CLOCK_HYST_HLSQ, 0x},
+   {REG_A6XX_RBBM_CLOCK_CNTL_UCHE, 0x},
+   {REG_A6XX_RBBM_CLOCK_HYST_UCHE, 0x0004},
+   {REG_A6XX_RBBM_CLOCK_DELAY_UCHE, 0x0002},
+   {REG_A6XX_RBBM_ISDB_CNT, 0x0182},
+   {REG_A6XX_RBBM_RAC_THRESHOLD_CNT, 0x},
+   {REG_A6XX_RBBM_SP_HYST_CNT, 0x},
+   {REG_A6XX_RBBM_CLOCK_CNTL_GMU_GX, 0x0222},
+   {REG_A6XX_RBBM_CLOCK_DELAY_GMU_GX, 0x0111},
+   {REG_A6XX_RBBM_CLOCK_HYST_GMU_GX, 0x0555},
+   {},
+};
+
+/* For a615 family (a615, a616, a618 and a619) */
+static const struct adreno_reglist a615_hwcg[] = {
+   {REG_A6XX_RBBM_CLOCK_CNTL_SP0,  0x0222},
+   {REG_A6XX_RBBM_CLOCK_CNTL2_SP0, 0x0220},
+   {REG_A6XX_RBBM_CLOCK_DELAY_SP0, 0x0080},
+   {REG_A6XX_RBBM_CLOCK_HYST_SP0,  0xF3CF},
+   {REG_A6XX_RBBM_CLOCK_CNTL_TP0,  0x0222},
+   {REG_A6XX_RBBM_CLOCK_CNTL_TP1,  0x0222},
+   {REG_A6XX_RBBM_CLOCK_CNTL2_TP0, 0x},
+   {REG_A6XX_RBBM_CLOCK_CNTL2_TP1, 0x},
+   {REG_A6XX_RBBM_CLOCK_CNTL3_TP0, 0x},
+   {REG_A6XX_RBBM_CLOCK_CNTL3_TP1, 0x},
+   {REG_A6XX_RBBM_CLOCK_CNTL4_TP0, 0x0002},
+   {REG_A6XX_RBBM_CLOCK_CNTL4_TP1, 0x0002},
+   {REG_A6XX_RBBM_CLOCK_HYST_TP0,  0x},
+   {REG_A6XX_RBBM_CLOCK_HYST_TP1,  0x},
+   {REG_A6XX_RBBM_CLOCK_HYST2_TP0, 0x},
+   {REG_A6XX_RBBM_CLOCK_HYST2_TP1, 0x},
+   {REG_A6XX_RBBM_CLOCK_HYST3_TP0, 0x},
+   {REG_A6XX_RBBM_CLOCK_HYST3_TP1, 0x},
+   {REG_A6XX_RBBM_CLOCK_HYST4_TP0, 0x0007},
+   {REG_A6XX_RBBM_CLOCK_HYST4_TP1, 0x0007},
+   {REG_A6XX_RBBM_CLOCK_DELAY_TP0, 0x},
+   {REG_A6XX_RBBM_CLOCK_DELAY_TP1, 0x},
+   {REG_A6XX_RBBM_CLOCK_DELAY2_TP0, 0x},
+   {REG_A6XX_RBBM_CLOCK_DELAY2_TP1, 0x},
+   {REG_A6XX_RBBM_CLOCK_DELAY3_TP0, 0x},
+   {REG_A6XX_RBBM_CLOCK_DELAY3_TP1, 0x},
+   {REG_A6XX_RBBM_CLOCK_DELAY4_TP0, 0x0001},
+   {REG_A6XX_RBBM_CLOCK_DELAY4_TP1, 0x0001},
+   {REG_A6XX_RBBM_CLOCK_CNTL_UCHE,  0x},
+   {REG_A6XX_RBB

[PATCH v4 2/5] drm/msm/adreno: Split catalog into separate files

2024-06-18 Thread Rob Clark
From: Rob Clark 

Split each gen's gpu table into its own file.  Only code-motion, no
functional change.

Signed-off-by: Rob Clark 
Reviewed-by: Dmitry Baryshkov 
Reviewed-by: Konrad Dybcio 
---
 drivers/gpu/drm/msm/Makefile   |   5 +
 drivers/gpu/drm/msm/adreno/a2xx_catalog.c  |  52 ++
 drivers/gpu/drm/msm/adreno/a3xx_catalog.c  |  81 +++
 drivers/gpu/drm/msm/adreno/a4xx_catalog.c  |  50 ++
 drivers/gpu/drm/msm/adreno/a5xx_catalog.c  | 148 +
 drivers/gpu/drm/msm/adreno/a6xx_catalog.c  | 338 +++
 drivers/gpu/drm/msm/adreno/adreno_device.c | 625 +
 7 files changed, 680 insertions(+), 619 deletions(-)
 create mode 100644 drivers/gpu/drm/msm/adreno/a2xx_catalog.c
 create mode 100644 drivers/gpu/drm/msm/adreno/a3xx_catalog.c
 create mode 100644 drivers/gpu/drm/msm/adreno/a4xx_catalog.c
 create mode 100644 drivers/gpu/drm/msm/adreno/a5xx_catalog.c
 create mode 100644 drivers/gpu/drm/msm/adreno/a6xx_catalog.c

diff --git a/drivers/gpu/drm/msm/Makefile b/drivers/gpu/drm/msm/Makefile
index eb788921ff4f..f5e2838c6a76 100644
--- a/drivers/gpu/drm/msm/Makefile
+++ b/drivers/gpu/drm/msm/Makefile
@@ -8,13 +8,18 @@ ccflags-$(CONFIG_DRM_MSM_DP) += -I $(src)/dp
 adreno-y := \
adreno/adreno_device.o \
adreno/adreno_gpu.o \
+   adreno/a2xx_catalog.o \
adreno/a2xx_gpu.o \
adreno/a2xx_gpummu.o \
+   adreno/a3xx_catalog.o \
adreno/a3xx_gpu.o \
+   adreno/a4xx_catalog.o \
adreno/a4xx_gpu.o \
+   adreno/a5xx_catalog.o \
adreno/a5xx_gpu.o \
adreno/a5xx_power.o \
adreno/a5xx_preempt.o \
+   adreno/a6xx_catalog.o \
adreno/a6xx_gpu.o \
adreno/a6xx_gmu.o \
adreno/a6xx_hfi.o \
diff --git a/drivers/gpu/drm/msm/adreno/a2xx_catalog.c 
b/drivers/gpu/drm/msm/adreno/a2xx_catalog.c
new file mode 100644
index ..9ddb7b31fd98
--- /dev/null
+++ b/drivers/gpu/drm/msm/adreno/a2xx_catalog.c
@@ -0,0 +1,52 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2013-2014 Red Hat
+ * Author: Rob Clark 
+ *
+ * Copyright (c) 2014,2017 The Linux Foundation. All rights reserved.
+ */
+
+#include "adreno_gpu.h"
+
+static const struct adreno_info a2xx_gpus[] = {
+   {
+   .chip_ids = ADRENO_CHIP_IDS(0x0200),
+   .family = ADRENO_2XX_GEN1,
+   .revn  = 200,
+   .fw = {
+   [ADRENO_FW_PM4] = "yamato_pm4.fw",
+   [ADRENO_FW_PFP] = "yamato_pfp.fw",
+   },
+   .gmem  = SZ_256K,
+   .inactive_period = DRM_MSM_INACTIVE_PERIOD,
+   .init  = a2xx_gpu_init,
+   }, { /* a200 on i.mx51 has only 128kib gmem */
+   .chip_ids = ADRENO_CHIP_IDS(0x0201),
+   .family = ADRENO_2XX_GEN1,
+   .revn  = 201,
+   .fw = {
+   [ADRENO_FW_PM4] = "yamato_pm4.fw",
+   [ADRENO_FW_PFP] = "yamato_pfp.fw",
+   },
+   .gmem  = SZ_128K,
+   .inactive_period = DRM_MSM_INACTIVE_PERIOD,
+   .init  = a2xx_gpu_init,
+   }, {
+   .chip_ids = ADRENO_CHIP_IDS(0x0202),
+   .family = ADRENO_2XX_GEN2,
+   .revn  = 220,
+   .fw = {
+   [ADRENO_FW_PM4] = "leia_pm4_470.fw",
+   [ADRENO_FW_PFP] = "leia_pfp_470.fw",
+   },
+   .gmem  = SZ_512K,
+   .inactive_period = DRM_MSM_INACTIVE_PERIOD,
+   .init  = a2xx_gpu_init,
+   }
+};
+DECLARE_ADRENO_GPULIST(a2xx);
+
+MODULE_FIRMWARE("qcom/leia_pfp_470.fw");
+MODULE_FIRMWARE("qcom/leia_pm4_470.fw");
+MODULE_FIRMWARE("qcom/yamato_pfp.fw");
+MODULE_FIRMWARE("qcom/yamato_pm4.fw");
diff --git a/drivers/gpu/drm/msm/adreno/a3xx_catalog.c 
b/drivers/gpu/drm/msm/adreno/a3xx_catalog.c
new file mode 100644
index ..0de8465b6cf0
--- /dev/null
+++ b/drivers/gpu/drm/msm/adreno/a3xx_catalog.c
@@ -0,0 +1,81 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2013-2014 Red Hat
+ * Author: Rob Clark 
+ *
+ * Copyright (c) 2014,2017 The Linux Foundation. All rights reserved.
+ */
+
+#include "adreno_gpu.h"
+
+static const struct adreno_info a3xx_gpus[] = {
+   {
+   .chip_ids = ADRENO_CHIP_IDS(0x03000512),
+   .family = ADRENO_3XX,
+   .fw = {
+   [ADRENO_FW_PM4] = "a330_pm4.fw",
+   [ADRENO_FW_PFP] = "a330_pfp.fw",
+   },
+   .gmem  = SZ_128K,
+   .inactive_period = DRM_MSM_INACTIVE_PERIOD,
+   .init  = a3xx_gpu_init,
+   }, {
+   .chip_ids = ADRENO_CHIP_IDS(0x03000520),
+   .family = ADRENO_3XX,
+  

[PATCH v4 1/5] drm/msm/adreno: Split up giant device table

2024-06-18 Thread Rob Clark
From: Rob Clark 

Split into a separate table per generation, in preparation to move each
gen's device table to its own file.

Signed-off-by: Rob Clark 
Reviewed-by: Dmitry Baryshkov 
Reviewed-by: Konrad Dybcio 
---
 drivers/gpu/drm/msm/adreno/adreno_device.c | 67 +-
 drivers/gpu/drm/msm/adreno/adreno_gpu.h| 10 
 2 files changed, 63 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/adreno_device.c 
b/drivers/gpu/drm/msm/adreno/adreno_device.c
index c3703a51287b..a57659eaddc2 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_device.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_device.c
@@ -20,7 +20,7 @@ bool allow_vram_carveout = false;
 MODULE_PARM_DESC(allow_vram_carveout, "Allow using VRAM Carveout, in place of 
IOMMU");
 module_param_named(allow_vram_carveout, allow_vram_carveout, bool, 0600);
 
-static const struct adreno_info gpulist[] = {
+static const struct adreno_info a2xx_gpus[] = {
{
.chip_ids = ADRENO_CHIP_IDS(0x0200),
.family = ADRENO_2XX_GEN1,
@@ -54,7 +54,12 @@ static const struct adreno_info gpulist[] = {
.gmem  = SZ_512K,
.inactive_period = DRM_MSM_INACTIVE_PERIOD,
.init  = a2xx_gpu_init,
-   }, {
+   }
+};
+DECLARE_ADRENO_GPULIST(a2xx);
+
+static const struct adreno_info a3xx_gpus[] = {
+   {
.chip_ids = ADRENO_CHIP_IDS(0x03000512),
.family = ADRENO_3XX,
.fw = {
@@ -116,7 +121,12 @@ static const struct adreno_info gpulist[] = {
.gmem  = SZ_1M,
.inactive_period = DRM_MSM_INACTIVE_PERIOD,
.init  = a3xx_gpu_init,
-   }, {
+   }
+};
+DECLARE_ADRENO_GPULIST(a3xx);
+
+static const struct adreno_info a4xx_gpus[] = {
+   {
.chip_ids = ADRENO_CHIP_IDS(0x04000500),
.family = ADRENO_4XX,
.revn  = 405,
@@ -149,7 +159,12 @@ static const struct adreno_info gpulist[] = {
.gmem  = (SZ_1M + SZ_512K),
.inactive_period = DRM_MSM_INACTIVE_PERIOD,
.init  = a4xx_gpu_init,
-   }, {
+   }
+};
+DECLARE_ADRENO_GPULIST(a4xx);
+
+static const struct adreno_info a5xx_gpus[] = {
+   {
.chip_ids = ADRENO_CHIP_IDS(0x05000600),
.family = ADRENO_5XX,
.revn = 506,
@@ -274,7 +289,12 @@ static const struct adreno_info gpulist[] = {
.quirks = ADRENO_QUIRK_LMLOADKILL_DISABLE,
.init = a5xx_gpu_init,
.zapfw = "a540_zap.mdt",
-   }, {
+   }
+};
+DECLARE_ADRENO_GPULIST(a5xx);
+
+static const struct adreno_info a6xx_gpus[] = {
+   {
.chip_ids = ADRENO_CHIP_IDS(0x0601),
.family = ADRENO_6XX_GEN1,
.revn = 610,
@@ -520,7 +540,12 @@ static const struct adreno_info gpulist[] = {
.zapfw = "a690_zap.mdt",
.hwcg = a690_hwcg,
.address_space_size = SZ_16G,
-   }, {
+   }
+};
+DECLARE_ADRENO_GPULIST(a6xx);
+
+static const struct adreno_info a7xx_gpus[] = {
+   {
.chip_ids = ADRENO_CHIP_IDS(0x07000200),
.family = ADRENO_6XX_GEN1, /* NOT a mistake! */
.fw = {
@@ -582,7 +607,17 @@ static const struct adreno_info gpulist[] = {
.init = a6xx_gpu_init,
.zapfw = "gen70900_zap.mbn",
.address_space_size = SZ_16G,
-   },
+   }
+};
+DECLARE_ADRENO_GPULIST(a7xx);
+
+static const struct adreno_gpulist *gpulists[] = {
+   &a2xx_gpulist,
+   &a3xx_gpulist,
+   &a4xx_gpulist,
+   &a5xx_gpulist,
+   &a6xx_gpulist,
+   &a7xx_gpulist,
 };
 
 MODULE_FIRMWARE("qcom/a300_pm4.fw");
@@ -617,13 +652,17 @@ MODULE_FIRMWARE("qcom/yamato_pm4.fw");
 static const struct adreno_info *adreno_info(uint32_t chip_id)
 {
/* identify gpu: */
-   for (int i = 0; i < ARRAY_SIZE(gpulist); i++) {
-   const struct adreno_info *info = &gpulist[i];
-   if (info->machine && !of_machine_is_compatible(info->machine))
-   continue;
-   for (int j = 0; info->chip_ids[j]; j++)
-   if (info->chip_ids[j] == chip_id)
-   return info;
+   for (int i = 0; i < ARRAY_SIZE(gpulists); i++) {
+   for (int j = 0; j < gpulists[i]->gpus_count; j++) {
+   const struct adreno_info *info = &gpulists[i]->gpus[j];
+
+   if (info->machine && 
!of_machine_is_compatible(info->machine))
+   continue;
+
+   for (int k = 0; info->chip_ids[k]; k++)
+   if (info->chip_ids[k] == chip_id)
+

[PATCH v4 0/5] drm/msm/adreno: Introduce/rework device hw catalog

2024-06-18 Thread Rob Clark
From: Rob Clark 

Split the single flat gpulist table into per-gen tables that exist in
their own per-gen files, and start moving more info into the device
table.  This at least gets all the big tables of register settings out
of the heart of the a6xx_gpu code.  Probably more could be moved, to
remove at least some of the per-gen if/else ladders, but this seemed
like a reasonably good start.

v2: Drop sentinel table entries
v3: Fix typo
v4: More const, fix missing a702 protect regs

Rob Clark (5):
  drm/msm/adreno: Split up giant device table
  drm/msm/adreno: Split catalog into separate files
  drm/msm/adreno: Move hwcg regs to a6xx hw catalog
  drm/msm/adreno: Move hwcg table into a6xx specific info
  drm/msm/adreno: Move CP_PROTECT settings to hw catalog

 drivers/gpu/drm/msm/Makefile   |5 +
 drivers/gpu/drm/msm/adreno/a2xx_catalog.c  |   52 +
 drivers/gpu/drm/msm/adreno/a3xx_catalog.c  |   81 ++
 drivers/gpu/drm/msm/adreno/a4xx_catalog.c  |   50 +
 drivers/gpu/drm/msm/adreno/a5xx_catalog.c  |  148 +++
 drivers/gpu/drm/msm/adreno/a6xx_catalog.c  | 1240 
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c  |  880 +-
 drivers/gpu/drm/msm/adreno/a6xx_gpu.h  |   11 +
 drivers/gpu/drm/msm/adreno/adreno_device.c |  624 +-
 drivers/gpu/drm/msm/adreno/adreno_gpu.h|   32 +-
 10 files changed, 1649 insertions(+), 1474 deletions(-)
 create mode 100644 drivers/gpu/drm/msm/adreno/a2xx_catalog.c
 create mode 100644 drivers/gpu/drm/msm/adreno/a3xx_catalog.c
 create mode 100644 drivers/gpu/drm/msm/adreno/a4xx_catalog.c
 create mode 100644 drivers/gpu/drm/msm/adreno/a5xx_catalog.c
 create mode 100644 drivers/gpu/drm/msm/adreno/a6xx_catalog.c

-- 
2.45.2



Re: [PATCH v3 4/5] drm/msm/adreno: Move hwcg table into a6xx specific info

2024-06-18 Thread Rob Clark
On Tue, Jun 18, 2024 at 1:30 AM Dmitry Baryshkov
 wrote:
>
> On Mon, Jun 17, 2024 at 03:51:14PM GMT, Rob Clark wrote:
> > From: Rob Clark 
> >
> > Introduce a6xx_info where we can stash gen specific stuff without
> > polluting the toplevel adreno_info struct.
> >
> > Signed-off-by: Rob Clark 
> > ---
> >  drivers/gpu/drm/msm/adreno/a6xx_catalog.c | 65 +--
> >  drivers/gpu/drm/msm/adreno/a6xx_gpu.c |  6 +--
> >  drivers/gpu/drm/msm/adreno/a6xx_gpu.h |  9 
> >  drivers/gpu/drm/msm/adreno/adreno_gpu.h   |  6 ++-
> >  4 files changed, 67 insertions(+), 19 deletions(-)
> >
>
> Reviewed-by: Dmitry Baryshkov 
>
>
> > @@ -98,7 +100,9 @@ struct adreno_info {
> >   struct msm_gpu *(*init)(struct drm_device *dev);
> >   const char *zapfw;
> >   u32 inactive_period;
> > - const struct adreno_reglist *hwcg;
> > + union {
> > + const struct a6xx_info *a6xx;
> > + };
> >   u64 address_space_size;
> >   /**
> >* @speedbins: Optional table of fuse to speedbin mappings
>
> My preference would be towards wrapping the adreno_gpu, but that would
> require more significant rework of the driver. Let's see if we can get
> to that later.
>

yeah, it was going to be more re-work, and I'm neck deep in
gpuvm/vm_bind.. I just wanted to land this since it is a pita (and
error prone) to rebase as more gpu's get added ;-)

It isn't entirely unlike how we handle gpu gen specific options in
mesa, where we have a somewhat bigger set of options, so I wouldn't
say that this approach was worse than extending adreno_info.. just
different..

BR,
-R
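
The pattern that landed in v4, in brief (the a7xx_info member below is a
hypothetical illustration of how a later gen would slot in, not part of
this series):

struct adreno_info {
	/* ... common fields ... */
	union {
		const struct a6xx_info *a6xx;
		const struct a7xx_info *a7xx;	/* hypothetical later gen */
	};
	/* ... */
};

	/* gen-specific code then reads only its own member: */
	const struct a6xx_info *info = adreno_gpu->info->a6xx;
	if (info->hwcg)
		a6xx_set_hwcg(gpu, true);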


[PATCH v3 4/5] drm/msm/adreno: Move hwcg table into a6xx specific info

2024-06-17 Thread Rob Clark
From: Rob Clark 

Introduce a6xx_info where we can stash gen specific stuff without
polluting the toplevel adreno_info struct.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/adreno/a6xx_catalog.c | 65 +--
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c |  6 +--
 drivers/gpu/drm/msm/adreno/a6xx_gpu.h |  9 
 drivers/gpu/drm/msm/adreno/adreno_gpu.h   |  6 ++-
 4 files changed, 67 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_catalog.c 
b/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
index bcc2f4d8cfc6..96d93251fdd6 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
@@ -7,6 +7,7 @@
  */
 
 #include "adreno_gpu.h"
+#include "a6xx_gpu.h"
 #include "a6xx.xml.h"
 #include "a6xx_gmu.xml.h"
 
@@ -465,7 +466,9 @@ static const struct adreno_info a6xx_gpus[] = {
.inactive_period = DRM_MSM_INACTIVE_PERIOD,
.init = a6xx_gpu_init,
.zapfw = "a610_zap.mdt",
-   .hwcg = a612_hwcg,
+   .a6xx = &(struct a6xx_info) {
+   .hwcg = a612_hwcg,
+   },
/*
 * There are (at least) three SoCs implementing A610: SM6125
 * (trinket), SM6115 (bengal) and SM6225 (khaje). Trinket does
@@ -493,7 +496,9 @@ static const struct adreno_info a6xx_gpus[] = {
.quirks = ADRENO_QUIRK_HAS_CACHED_COHERENT,
.init = a6xx_gpu_init,
.zapfw = "a615_zap.mbn",
-   .hwcg = a615_hwcg,
+   .a6xx = &(struct a6xx_info) {
+   .hwcg = a615_hwcg,
+   },
.speedbins = ADRENO_SPEEDBINS(
{ 0,   0 },
{ 128, 1 },
@@ -513,6 +518,8 @@ static const struct adreno_info a6xx_gpus[] = {
.inactive_period = DRM_MSM_INACTIVE_PERIOD,
.quirks = ADRENO_QUIRK_HAS_CACHED_COHERENT,
.init = a6xx_gpu_init,
+   .a6xx = &(struct a6xx_info) {
+   },
.speedbins = ADRENO_SPEEDBINS(
{ 0,   0 },
{ 169, 1 },
@@ -531,7 +538,9 @@ static const struct adreno_info a6xx_gpus[] = {
.inactive_period = DRM_MSM_INACTIVE_PERIOD,
.init = a6xx_gpu_init,
.zapfw = "a615_zap.mdt",
-   .hwcg = a615_hwcg,
+   .a6xx = &(struct a6xx_info) {
+   .hwcg = a615_hwcg,
+   },
.speedbins = ADRENO_SPEEDBINS(
{ 0,   0 },
{ 138, 1 },
@@ -550,7 +559,9 @@ static const struct adreno_info a6xx_gpus[] = {
.inactive_period = DRM_MSM_INACTIVE_PERIOD,
.init = a6xx_gpu_init,
.zapfw = "a615_zap.mdt",
-   .hwcg = a615_hwcg,
+   .a6xx = &(struct a6xx_info) {
+   .hwcg = a615_hwcg,
+   },
.speedbins = ADRENO_SPEEDBINS(
{ 0,   0 },
{ 190, 1 },
@@ -569,7 +580,9 @@ static const struct adreno_info a6xx_gpus[] = {
.quirks = ADRENO_QUIRK_HAS_CACHED_COHERENT,
.init = a6xx_gpu_init,
.zapfw = "a615_zap.mdt",
-   .hwcg = a615_hwcg,
+   .a6xx = &(struct a6xx_info) {
+   .hwcg = a615_hwcg,
+   },
.speedbins = ADRENO_SPEEDBINS(
{ 0,   0 },
{ 120, 4 },
@@ -593,7 +606,9 @@ static const struct adreno_info a6xx_gpus[] = {
.quirks = ADRENO_QUIRK_HAS_CACHED_COHERENT,
.init = a6xx_gpu_init,
.zapfw = "a630_zap.mdt",
-   .hwcg = a630_hwcg,
+   .a6xx = &(struct a6xx_info) {
+   .hwcg = a630_hwcg,
+   },
}, {
.chip_ids = ADRENO_CHIP_IDS(0x06040001),
.family = ADRENO_6XX_GEN2,
@@ -607,7 +622,9 @@ static const struct adreno_info a6xx_gpus[] = {
.quirks = ADRENO_QUIRK_HAS_CACHED_COHERENT,
.init = a6xx_gpu_init,
.zapfw = "a640_zap.mdt",
-   .hwcg = a640_hwcg,
+   .a6xx = &(struct a6xx_info) {
+   .hwcg = a640_hwcg,
+   },
.speedbins = ADRENO_SPEEDBINS(
{ 0, 0 },
{ 1, 1 },
@@ -626,7 +643,9 @@ static const struct adreno_info a6xx_gpus[] = {
ADRENO_QUIRK_HAS_HW_APRIV,
.init = a6xx_gpu_init,
.zapfw = "a650_zap.mdt",
-   .hwcg = a650_hwcg,
+   .a6xx = &(struct a6xx_info)

[PATCH v3 5/5] drm/msm/adreno: Move CP_PROTECT settings to hw catalog

2024-06-17 Thread Rob Clark
From: Rob Clark 

Move the CP_PROTECT settings into the hw catalog.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/adreno/a6xx_catalog.c | 247 +
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 257 +-
 drivers/gpu/drm/msm/adreno/a6xx_gpu.h |   2 +
 drivers/gpu/drm/msm/adreno/adreno_gpu.h   |  13 ++
 4 files changed, 268 insertions(+), 251 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_catalog.c 
b/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
index 96d93251fdd6..f64b5a7e86c9 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
@@ -454,6 +454,173 @@ static const struct adreno_reglist a690_hwcg[] = {
{}
 };
 
+/* For a615, a616, a618, a619, a630, a640 and a680 */
+static const u32 a630_protect_regs[] = {
+   A6XX_PROTECT_RDONLY(0x0, 0x04ff),
+   A6XX_PROTECT_RDONLY(0x00501, 0x0005),
+   A6XX_PROTECT_RDONLY(0x0050b, 0x02f4),
+   A6XX_PROTECT_NORDWR(0x0050e, 0x),
+   A6XX_PROTECT_NORDWR(0x00510, 0x),
+   A6XX_PROTECT_NORDWR(0x00534, 0x),
+   A6XX_PROTECT_NORDWR(0x00800, 0x0082),
+   A6XX_PROTECT_NORDWR(0x008a0, 0x0008),
+   A6XX_PROTECT_NORDWR(0x008ab, 0x0024),
+   A6XX_PROTECT_RDONLY(0x008de, 0x00ae),
+   A6XX_PROTECT_NORDWR(0x00900, 0x004d),
+   A6XX_PROTECT_NORDWR(0x0098d, 0x0272),
+   A6XX_PROTECT_NORDWR(0x00e00, 0x0001),
+   A6XX_PROTECT_NORDWR(0x00e03, 0x000c),
+   A6XX_PROTECT_NORDWR(0x03c00, 0x00c3),
+   A6XX_PROTECT_RDONLY(0x03cc4, 0x1fff),
+   A6XX_PROTECT_NORDWR(0x08630, 0x01cf),
+   A6XX_PROTECT_NORDWR(0x08e00, 0x),
+   A6XX_PROTECT_NORDWR(0x08e08, 0x),
+   A6XX_PROTECT_NORDWR(0x08e50, 0x001f),
+   A6XX_PROTECT_NORDWR(0x09624, 0x01db),
+   A6XX_PROTECT_NORDWR(0x09e70, 0x0001),
+   A6XX_PROTECT_NORDWR(0x09e78, 0x0187),
+   A6XX_PROTECT_NORDWR(0x0a630, 0x01cf),
+   A6XX_PROTECT_NORDWR(0x0ae02, 0x),
+   A6XX_PROTECT_NORDWR(0x0ae50, 0x032f),
+   A6XX_PROTECT_NORDWR(0x0b604, 0x),
+   A6XX_PROTECT_NORDWR(0x0be02, 0x0001),
+   A6XX_PROTECT_NORDWR(0x0be20, 0x17df),
+   A6XX_PROTECT_NORDWR(0x0f000, 0x0bff),
+   A6XX_PROTECT_RDONLY(0x0fc00, 0x1fff),
+   A6XX_PROTECT_NORDWR(0x11c00, 0x), /* note: infinite range */
+};
+DECLARE_ADRENO_PROTECT(a630_protect, 32);
+
+/* These are for a620 and a650 */
+static const u32 a650_protect_regs[] = {
+   A6XX_PROTECT_RDONLY(0x0, 0x04ff),
+   A6XX_PROTECT_RDONLY(0x00501, 0x0005),
+   A6XX_PROTECT_RDONLY(0x0050b, 0x02f4),
+   A6XX_PROTECT_NORDWR(0x0050e, 0x),
+   A6XX_PROTECT_NORDWR(0x00510, 0x),
+   A6XX_PROTECT_NORDWR(0x00534, 0x),
+   A6XX_PROTECT_NORDWR(0x00800, 0x0082),
+   A6XX_PROTECT_NORDWR(0x008a0, 0x0008),
+   A6XX_PROTECT_NORDWR(0x008ab, 0x0024),
+   A6XX_PROTECT_RDONLY(0x008de, 0x00ae),
+   A6XX_PROTECT_NORDWR(0x00900, 0x004d),
+   A6XX_PROTECT_NORDWR(0x0098d, 0x0272),
+   A6XX_PROTECT_NORDWR(0x00e00, 0x0001),
+   A6XX_PROTECT_NORDWR(0x00e03, 0x000c),
+   A6XX_PROTECT_NORDWR(0x03c00, 0x00c3),
+   A6XX_PROTECT_RDONLY(0x03cc4, 0x1fff),
+   A6XX_PROTECT_NORDWR(0x08630, 0x01cf),
+   A6XX_PROTECT_NORDWR(0x08e00, 0x),
+   A6XX_PROTECT_NORDWR(0x08e08, 0x),
+   A6XX_PROTECT_NORDWR(0x08e50, 0x001f),
+   A6XX_PROTECT_NORDWR(0x08e80, 0x027f),
+   A6XX_PROTECT_NORDWR(0x09624, 0x01db),
+   A6XX_PROTECT_NORDWR(0x09e60, 0x0011),
+   A6XX_PROTECT_NORDWR(0x09e78, 0x0187),
+   A6XX_PROTECT_NORDWR(0x0a630, 0x01cf),
+   A6XX_PROTECT_NORDWR(0x0ae02, 0x),
+   A6XX_PROTECT_NORDWR(0x0ae50, 0x032f),
+   A6XX_PROTECT_NORDWR(0x0b604, 0x),
+   A6XX_PROTECT_NORDWR(0x0b608, 0x0007),
+   A6XX_PROTECT_NORDWR(0x0be02, 0x0001),
+   A6XX_PROTECT_NORDWR(0x0be20, 0x17df),
+   A6XX_PROTECT_NORDWR(0x0f000, 0x0bff),
+   A6XX_PROTECT_RDONLY(0x0fc00, 0x1fff),
+   A6XX_PROTECT_NORDWR(0x18400, 0x1fff),
+   A6XX_PROTECT_NORDWR(0x1a800, 0x1fff),
+   A6XX_PROTECT_NORDWR(0x1f400, 0x0443),
+   A6XX_PROTECT_RDONLY(0x1f844, 0x007b),
+   A6XX_PROTECT_NORDWR(0x1f887, 0x001b),
+   A6XX_PROTECT_NORDWR(0x1f8c0, 0x), /* note: infinite range */
+};
+DECLARE_ADRENO_PROTECT(a650_protect, 48);
+
+/* These are for a635 and a660 */
+static const u32 a660_protect_regs[] = {
+   A6XX_PROTECT_RDONLY(0x0, 0x04ff),
+   A6XX_PROTECT_RDONLY(0x00501, 0x0005),
+   A6XX_PROTECT_RDONLY(0x0050b, 0x02f4),
+   A6XX_PROTECT_NORDWR(0x0050e, 0x),
+   A6XX_PROTECT_NORDWR(0x00510, 0x),
+   A6XX_PROTECT_NORDWR(0x00534, 0x),
+   A6XX_PROTECT_NORDWR(0x00800, 0x0082),
+   A6XX_PROTECT_NORDWR(0x008a0, 0x0008),
+   A6XX_PROTECT_NORDWR(0x008ab, 0x0024),
+   A6XX_PROTECT_RDONLY(0x008de, 0x00ae),
+   A6XX_PROTECT_NORDWR(0x00900, 0x004d),
+   A6XX_PROTECT_NORDWR(0x0098d, 0x0272),
+   A6XX_PROTECT_NORDWR(0x00e00, 0x0001

[PATCH v3 3/5] drm/msm/adreno: Move hwcg regs to a6xx hw catalog

2024-06-17 Thread Rob Clark
From: Rob Clark 

Move the hwcg tables into the hw catalog.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/adreno/a6xx_catalog.c | 619 ++
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 617 -
 drivers/gpu/drm/msm/adreno/adreno_gpu.h   |   3 -
 3 files changed, 619 insertions(+), 620 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_catalog.c 
b/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
index 10a92eab0232..bcc2f4d8cfc6 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
@@ -7,6 +7,451 @@
  */
 
 #include "adreno_gpu.h"
+#include "a6xx.xml.h"
+#include "a6xx_gmu.xml.h"
+
+static const struct adreno_reglist a612_hwcg[] = {
+   {REG_A6XX_RBBM_CLOCK_CNTL_SP0, 0x},
+   {REG_A6XX_RBBM_CLOCK_CNTL2_SP0, 0x0220},
+   {REG_A6XX_RBBM_CLOCK_DELAY_SP0, 0x0081},
+   {REG_A6XX_RBBM_CLOCK_HYST_SP0, 0xf3cf},
+   {REG_A6XX_RBBM_CLOCK_CNTL_TP0, 0x},
+   {REG_A6XX_RBBM_CLOCK_CNTL2_TP0, 0x},
+   {REG_A6XX_RBBM_CLOCK_CNTL3_TP0, 0x},
+   {REG_A6XX_RBBM_CLOCK_CNTL4_TP0, 0x0002},
+   {REG_A6XX_RBBM_CLOCK_DELAY_TP0, 0x},
+   {REG_A6XX_RBBM_CLOCK_DELAY2_TP0, 0x},
+   {REG_A6XX_RBBM_CLOCK_DELAY3_TP0, 0x},
+   {REG_A6XX_RBBM_CLOCK_DELAY4_TP0, 0x0001},
+   {REG_A6XX_RBBM_CLOCK_HYST_TP0, 0x},
+   {REG_A6XX_RBBM_CLOCK_HYST2_TP0, 0x},
+   {REG_A6XX_RBBM_CLOCK_HYST3_TP0, 0x},
+   {REG_A6XX_RBBM_CLOCK_HYST4_TP0, 0x0007},
+   {REG_A6XX_RBBM_CLOCK_CNTL_RB0, 0x},
+   {REG_A6XX_RBBM_CLOCK_CNTL2_RB0, 0x0120},
+   {REG_A6XX_RBBM_CLOCK_CNTL_CCU0, 0x2220},
+   {REG_A6XX_RBBM_CLOCK_HYST_RB_CCU0, 0x00040f00},
+   {REG_A6XX_RBBM_CLOCK_CNTL_RAC, 0x05522022},
+   {REG_A6XX_RBBM_CLOCK_CNTL2_RAC, 0x},
+   {REG_A6XX_RBBM_CLOCK_DELAY_RAC, 0x0011},
+   {REG_A6XX_RBBM_CLOCK_HYST_RAC, 0x00445044},
+   {REG_A6XX_RBBM_CLOCK_CNTL_TSE_RAS_RBBM, 0x0422},
+   {REG_A6XX_RBBM_CLOCK_MODE_VFD, 0x},
+   {REG_A6XX_RBBM_CLOCK_MODE_GPC, 0x0222},
+   {REG_A6XX_RBBM_CLOCK_DELAY_HLSQ_2, 0x0002},
+   {REG_A6XX_RBBM_CLOCK_MODE_HLSQ, 0x},
+   {REG_A6XX_RBBM_CLOCK_DELAY_TSE_RAS_RBBM, 0x4000},
+   {REG_A6XX_RBBM_CLOCK_DELAY_VFD, 0x},
+   {REG_A6XX_RBBM_CLOCK_DELAY_GPC, 0x0200},
+   {REG_A6XX_RBBM_CLOCK_DELAY_HLSQ, 0x},
+   {REG_A6XX_RBBM_CLOCK_HYST_TSE_RAS_RBBM, 0x},
+   {REG_A6XX_RBBM_CLOCK_HYST_VFD, 0x},
+   {REG_A6XX_RBBM_CLOCK_HYST_GPC, 0x04104004},
+   {REG_A6XX_RBBM_CLOCK_HYST_HLSQ, 0x},
+   {REG_A6XX_RBBM_CLOCK_CNTL_UCHE, 0x},
+   {REG_A6XX_RBBM_CLOCK_HYST_UCHE, 0x0004},
+   {REG_A6XX_RBBM_CLOCK_DELAY_UCHE, 0x0002},
+   {REG_A6XX_RBBM_ISDB_CNT, 0x0182},
+   {REG_A6XX_RBBM_RAC_THRESHOLD_CNT, 0x},
+   {REG_A6XX_RBBM_SP_HYST_CNT, 0x},
+   {REG_A6XX_RBBM_CLOCK_CNTL_GMU_GX, 0x0222},
+   {REG_A6XX_RBBM_CLOCK_DELAY_GMU_GX, 0x0111},
+   {REG_A6XX_RBBM_CLOCK_HYST_GMU_GX, 0x0555},
+   {},
+};
+
+/* For a615 family (a615, a616, a618 and a619) */
+static const struct adreno_reglist a615_hwcg[] = {
+   {REG_A6XX_RBBM_CLOCK_CNTL_SP0,  0x0222},
+   {REG_A6XX_RBBM_CLOCK_CNTL2_SP0, 0x0220},
+   {REG_A6XX_RBBM_CLOCK_DELAY_SP0, 0x0080},
+   {REG_A6XX_RBBM_CLOCK_HYST_SP0,  0xF3CF},
+   {REG_A6XX_RBBM_CLOCK_CNTL_TP0,  0x0222},
+   {REG_A6XX_RBBM_CLOCK_CNTL_TP1,  0x0222},
+   {REG_A6XX_RBBM_CLOCK_CNTL2_TP0, 0x},
+   {REG_A6XX_RBBM_CLOCK_CNTL2_TP1, 0x},
+   {REG_A6XX_RBBM_CLOCK_CNTL3_TP0, 0x},
+   {REG_A6XX_RBBM_CLOCK_CNTL3_TP1, 0x},
+   {REG_A6XX_RBBM_CLOCK_CNTL4_TP0, 0x0002},
+   {REG_A6XX_RBBM_CLOCK_CNTL4_TP1, 0x0002},
+   {REG_A6XX_RBBM_CLOCK_HYST_TP0,  0x},
+   {REG_A6XX_RBBM_CLOCK_HYST_TP1,  0x},
+   {REG_A6XX_RBBM_CLOCK_HYST2_TP0, 0x},
+   {REG_A6XX_RBBM_CLOCK_HYST2_TP1, 0x},
+   {REG_A6XX_RBBM_CLOCK_HYST3_TP0, 0x},
+   {REG_A6XX_RBBM_CLOCK_HYST3_TP1, 0x},
+   {REG_A6XX_RBBM_CLOCK_HYST4_TP0, 0x0007},
+   {REG_A6XX_RBBM_CLOCK_HYST4_TP1, 0x0007},
+   {REG_A6XX_RBBM_CLOCK_DELAY_TP0, 0x},
+   {REG_A6XX_RBBM_CLOCK_DELAY_TP1, 0x},
+   {REG_A6XX_RBBM_CLOCK_DELAY2_TP0, 0x},
+   {REG_A6XX_RBBM_CLOCK_DELAY2_TP1, 0x},
+   {REG_A6XX_RBBM_CLOCK_DELAY3_TP0, 0x},
+   {REG_A6XX_RBBM_CLOCK_DELAY3_TP1, 0x},
+   {REG_A6XX_RBBM_CLOCK_DELAY4_TP0, 0x0001},
+   {REG_A6XX_RBBM_CLOCK_DELAY4_TP1, 0x0001},
+   {REG_A6XX_RBBM_CLOCK_CNTL_UCHE,  0x},
+   {REG_A6XX_RBBM_CLOCK_CNTL2_UCHE, 0x},
+   {REG_A6XX_RBB

[PATCH v3 2/5] drm/msm/adreno: Split catalog into separate files

2024-06-17 Thread Rob Clark
From: Rob Clark 

Split each gen's gpu table into its own file.  Only code-motion, no
functional change.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/Makefile   |   5 +
 drivers/gpu/drm/msm/adreno/a2xx_catalog.c  |  52 ++
 drivers/gpu/drm/msm/adreno/a3xx_catalog.c  |  81 +++
 drivers/gpu/drm/msm/adreno/a4xx_catalog.c  |  50 ++
 drivers/gpu/drm/msm/adreno/a5xx_catalog.c  | 148 +
 drivers/gpu/drm/msm/adreno/a6xx_catalog.c  | 338 +++
 drivers/gpu/drm/msm/adreno/adreno_device.c | 625 +
 7 files changed, 680 insertions(+), 619 deletions(-)
 create mode 100644 drivers/gpu/drm/msm/adreno/a2xx_catalog.c
 create mode 100644 drivers/gpu/drm/msm/adreno/a3xx_catalog.c
 create mode 100644 drivers/gpu/drm/msm/adreno/a4xx_catalog.c
 create mode 100644 drivers/gpu/drm/msm/adreno/a5xx_catalog.c
 create mode 100644 drivers/gpu/drm/msm/adreno/a6xx_catalog.c

diff --git a/drivers/gpu/drm/msm/Makefile b/drivers/gpu/drm/msm/Makefile
index eb788921ff4f..f5e2838c6a76 100644
--- a/drivers/gpu/drm/msm/Makefile
+++ b/drivers/gpu/drm/msm/Makefile
@@ -8,13 +8,18 @@ ccflags-$(CONFIG_DRM_MSM_DP) += -I $(src)/dp
 adreno-y := \
adreno/adreno_device.o \
adreno/adreno_gpu.o \
+   adreno/a2xx_catalog.o \
adreno/a2xx_gpu.o \
adreno/a2xx_gpummu.o \
+   adreno/a3xx_catalog.o \
adreno/a3xx_gpu.o \
+   adreno/a4xx_catalog.o \
adreno/a4xx_gpu.o \
+   adreno/a5xx_catalog.o \
adreno/a5xx_gpu.o \
adreno/a5xx_power.o \
adreno/a5xx_preempt.o \
+   adreno/a6xx_catalog.o \
adreno/a6xx_gpu.o \
adreno/a6xx_gmu.o \
adreno/a6xx_hfi.o \
diff --git a/drivers/gpu/drm/msm/adreno/a2xx_catalog.c 
b/drivers/gpu/drm/msm/adreno/a2xx_catalog.c
new file mode 100644
index ..9ddb7b31fd98
--- /dev/null
+++ b/drivers/gpu/drm/msm/adreno/a2xx_catalog.c
@@ -0,0 +1,52 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2013-2014 Red Hat
+ * Author: Rob Clark 
+ *
+ * Copyright (c) 2014,2017 The Linux Foundation. All rights reserved.
+ */
+
+#include "adreno_gpu.h"
+
+static const struct adreno_info a2xx_gpus[] = {
+   {
+   .chip_ids = ADRENO_CHIP_IDS(0x0200),
+   .family = ADRENO_2XX_GEN1,
+   .revn  = 200,
+   .fw = {
+   [ADRENO_FW_PM4] = "yamato_pm4.fw",
+   [ADRENO_FW_PFP] = "yamato_pfp.fw",
+   },
+   .gmem  = SZ_256K,
+   .inactive_period = DRM_MSM_INACTIVE_PERIOD,
+   .init  = a2xx_gpu_init,
+   }, { /* a200 on i.mx51 has only 128kib gmem */
+   .chip_ids = ADRENO_CHIP_IDS(0x0201),
+   .family = ADRENO_2XX_GEN1,
+   .revn  = 201,
+   .fw = {
+   [ADRENO_FW_PM4] = "yamato_pm4.fw",
+   [ADRENO_FW_PFP] = "yamato_pfp.fw",
+   },
+   .gmem  = SZ_128K,
+   .inactive_period = DRM_MSM_INACTIVE_PERIOD,
+   .init  = a2xx_gpu_init,
+   }, {
+   .chip_ids = ADRENO_CHIP_IDS(0x0202),
+   .family = ADRENO_2XX_GEN2,
+   .revn  = 220,
+   .fw = {
+   [ADRENO_FW_PM4] = "leia_pm4_470.fw",
+   [ADRENO_FW_PFP] = "leia_pfp_470.fw",
+   },
+   .gmem  = SZ_512K,
+   .inactive_period = DRM_MSM_INACTIVE_PERIOD,
+   .init  = a2xx_gpu_init,
+   }
+};
+DECLARE_ADRENO_GPULIST(a2xx);
+
+MODULE_FIRMWARE("qcom/leia_pfp_470.fw");
+MODULE_FIRMWARE("qcom/leia_pm4_470.fw");
+MODULE_FIRMWARE("qcom/yamato_pfp.fw");
+MODULE_FIRMWARE("qcom/yamato_pm4.fw");
diff --git a/drivers/gpu/drm/msm/adreno/a3xx_catalog.c 
b/drivers/gpu/drm/msm/adreno/a3xx_catalog.c
new file mode 100644
index ..0de8465b6cf0
--- /dev/null
+++ b/drivers/gpu/drm/msm/adreno/a3xx_catalog.c
@@ -0,0 +1,81 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2013-2014 Red Hat
+ * Author: Rob Clark 
+ *
+ * Copyright (c) 2014,2017 The Linux Foundation. All rights reserved.
+ */
+
+#include "adreno_gpu.h"
+
+static const struct adreno_info a3xx_gpus[] = {
+   {
+   .chip_ids = ADRENO_CHIP_IDS(0x03000512),
+   .family = ADRENO_3XX,
+   .fw = {
+   [ADRENO_FW_PM4] = "a330_pm4.fw",
+   [ADRENO_FW_PFP] = "a330_pfp.fw",
+   },
+   .gmem  = SZ_128K,
+   .inactive_period = DRM_MSM_INACTIVE_PERIOD,
+   .init  = a3xx_gpu_init,
+   }, {
+   .chip_ids = ADRENO_CHIP_IDS(0x03000520),
+   .family = ADRENO_3XX,
+   .revn  = 305,
+   .fw = {
+   [ADRENO_FW_PM4] = "a

[PATCH v3 1/5] drm/msm/adreno: Split up giant device table

2024-06-17 Thread Rob Clark
From: Rob Clark 

Split into a separate table per generation, in preparation to move each
gen's device table to its own file.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/adreno/adreno_device.c | 67 +-
 drivers/gpu/drm/msm/adreno/adreno_gpu.h| 10 
 2 files changed, 63 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/adreno_device.c 
b/drivers/gpu/drm/msm/adreno/adreno_device.c
index c3703a51287b..a57659eaddc2 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_device.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_device.c
@@ -20,7 +20,7 @@ bool allow_vram_carveout = false;
 MODULE_PARM_DESC(allow_vram_carveout, "Allow using VRAM Carveout, in place of 
IOMMU");
 module_param_named(allow_vram_carveout, allow_vram_carveout, bool, 0600);
 
-static const struct adreno_info gpulist[] = {
+static const struct adreno_info a2xx_gpus[] = {
{
.chip_ids = ADRENO_CHIP_IDS(0x0200),
.family = ADRENO_2XX_GEN1,
@@ -54,7 +54,12 @@ static const struct adreno_info gpulist[] = {
.gmem  = SZ_512K,
.inactive_period = DRM_MSM_INACTIVE_PERIOD,
.init  = a2xx_gpu_init,
-   }, {
+   }
+};
+DECLARE_ADRENO_GPULIST(a2xx);
+
+static const struct adreno_info a3xx_gpus[] = {
+   {
.chip_ids = ADRENO_CHIP_IDS(0x03000512),
.family = ADRENO_3XX,
.fw = {
@@ -116,7 +121,12 @@ static const struct adreno_info gpulist[] = {
.gmem  = SZ_1M,
.inactive_period = DRM_MSM_INACTIVE_PERIOD,
.init  = a3xx_gpu_init,
-   }, {
+   }
+};
+DECLARE_ADRENO_GPULIST(a3xx);
+
+static const struct adreno_info a4xx_gpus[] = {
+   {
.chip_ids = ADRENO_CHIP_IDS(0x04000500),
.family = ADRENO_4XX,
.revn  = 405,
@@ -149,7 +159,12 @@ static const struct adreno_info gpulist[] = {
.gmem  = (SZ_1M + SZ_512K),
.inactive_period = DRM_MSM_INACTIVE_PERIOD,
.init  = a4xx_gpu_init,
-   }, {
+   }
+};
+DECLARE_ADRENO_GPULIST(a4xx);
+
+static const struct adreno_info a5xx_gpus[] = {
+   {
.chip_ids = ADRENO_CHIP_IDS(0x05000600),
.family = ADRENO_5XX,
.revn = 506,
@@ -274,7 +289,12 @@ static const struct adreno_info gpulist[] = {
.quirks = ADRENO_QUIRK_LMLOADKILL_DISABLE,
.init = a5xx_gpu_init,
.zapfw = "a540_zap.mdt",
-   }, {
+   }
+};
+DECLARE_ADRENO_GPULIST(a5xx);
+
+static const struct adreno_info a6xx_gpus[] = {
+   {
.chip_ids = ADRENO_CHIP_IDS(0x0601),
.family = ADRENO_6XX_GEN1,
.revn = 610,
@@ -520,7 +540,12 @@ static const struct adreno_info gpulist[] = {
.zapfw = "a690_zap.mdt",
.hwcg = a690_hwcg,
.address_space_size = SZ_16G,
-   }, {
+   }
+};
+DECLARE_ADRENO_GPULIST(a6xx);
+
+static const struct adreno_info a7xx_gpus[] = {
+   {
.chip_ids = ADRENO_CHIP_IDS(0x07000200),
.family = ADRENO_6XX_GEN1, /* NOT a mistake! */
.fw = {
@@ -582,7 +607,17 @@ static const struct adreno_info gpulist[] = {
.init = a6xx_gpu_init,
.zapfw = "gen70900_zap.mbn",
.address_space_size = SZ_16G,
-   },
+   }
+};
+DECLARE_ADRENO_GPULIST(a7xx);
+
+static const struct adreno_gpulist *gpulists[] = {
+   &a2xx_gpulist,
+   &a3xx_gpulist,
+   &a4xx_gpulist,
+   &a5xx_gpulist,
+   &a6xx_gpulist,
+   &a7xx_gpulist,
 };
 
 MODULE_FIRMWARE("qcom/a300_pm4.fw");
@@ -617,13 +652,17 @@ MODULE_FIRMWARE("qcom/yamato_pm4.fw");
 static const struct adreno_info *adreno_info(uint32_t chip_id)
 {
/* identify gpu: */
-   for (int i = 0; i < ARRAY_SIZE(gpulist); i++) {
-   const struct adreno_info *info = &gpulist[i];
-   if (info->machine && !of_machine_is_compatible(info->machine))
-   continue;
-   for (int j = 0; info->chip_ids[j]; j++)
-   if (info->chip_ids[j] == chip_id)
-   return info;
+   for (int i = 0; i < ARRAY_SIZE(gpulists); i++) {
+   for (int j = 0; j < gpulists[i]->gpus_count; j++) {
+   const struct adreno_info *info = &gpulists[i]->gpus[j];
+
+   if (info->machine && 
!of_machine_is_compatible(info->machine))
+   continue;
+
+   for (int k = 0; info->chip_ids[k]; k++)
+   if (info->chip_ids[k] == chip_id)
+   return info;
+   }
}
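
The DECLARE_ADRENO_GPULIST() helper used above is not visible in this
hunk; a minimal sketch of what it plausibly expands to, inferred from
the gpus/gpus_count accesses in adreno_info() (the names here are read
off the usage, not copied from the real header), is:

	struct adreno_gpulist {
		const struct adreno_info *gpus;
		unsigned int gpus_count;
	};

	#define DECLARE_ADRENO_GPULIST(name)				\
	static const struct adreno_gpulist name ## _gpulist = {		\
		name ## _gpus, ARRAY_SIZE(name ## _gpus)		\
	}

With that shape, each per-gen file only has to define its name ## _gpus
array and invoke the macro once, which is how the catalog files added
in patch 2/5 use it.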
 
 

[PATCH v3 0/5] drm/msm/adreno: Introduce/rework device hw catalog

2024-06-17 Thread Rob Clark
From: Rob Clark 

Split the single flat gpulist table into per-gen tables that exist in
their own per-gen files, and start moving more info into the device
table.  This at least gets all the big tables of register settings out
of the heart of the a6xx_gpu code.  Probably more could be moved, to
remove at least some of the per-gen if/else ladders, but this seemed
like a reasonably good start.

v2: Drop sentinel table entries
v3: Fix typo

Rob Clark (5):
  drm/msm/adreno: Split up giant device table
  drm/msm/adreno: Split catalog into separate files
  drm/msm/adreno: Move hwcg regs to a6xx hw catalog
  drm/msm/adreno: Move hwcg table into a6xx specific info
  drm/msm/adreno: Move CP_PROTECT settings to hw catalog

 drivers/gpu/drm/msm/Makefile   |5 +
 drivers/gpu/drm/msm/adreno/a2xx_catalog.c  |   52 +
 drivers/gpu/drm/msm/adreno/a3xx_catalog.c  |   81 ++
 drivers/gpu/drm/msm/adreno/a4xx_catalog.c  |   50 +
 drivers/gpu/drm/msm/adreno/a5xx_catalog.c  |  148 +++
 drivers/gpu/drm/msm/adreno/a6xx_catalog.c  | 1239 
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c  |  880 +-
 drivers/gpu/drm/msm/adreno/a6xx_gpu.h  |   11 +
 drivers/gpu/drm/msm/adreno/adreno_device.c |  624 +-
 drivers/gpu/drm/msm/adreno/adreno_gpu.h|   32 +-
 10 files changed, 1648 insertions(+), 1474 deletions(-)
 create mode 100644 drivers/gpu/drm/msm/adreno/a2xx_catalog.c
 create mode 100644 drivers/gpu/drm/msm/adreno/a3xx_catalog.c
 create mode 100644 drivers/gpu/drm/msm/adreno/a4xx_catalog.c
 create mode 100644 drivers/gpu/drm/msm/adreno/a5xx_catalog.c
 create mode 100644 drivers/gpu/drm/msm/adreno/a6xx_catalog.c

-- 
2.45.2



[PATCH v2 5/5] drm/msm/adreno: Move CP_PROTECT settings to hw catalog

2024-06-17 Thread Rob Clark
From: Rob Clark 

Move the CP_PROTECT settings into the hw catalog.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/adreno/a6xx_catalog.c | 247 +
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 257 +-
 drivers/gpu/drm/msm/adreno/a6xx_gpu.h |   2 +
 drivers/gpu/drm/msm/adreno/adreno_gpu.h   |  13 ++
 4 files changed, 268 insertions(+), 251 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_catalog.c 
b/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
index da49589f82d0..89e7feffcbe3 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
@@ -454,6 +454,173 @@ static const struct adreno_reglist a690_hwcg[] = {
{}
 };
 
+/* For a615, a616, a618, a619, a630, a640 and a680 */
+static const u32 a630_protect_regs[] = {
+   A6XX_PROTECT_RDONLY(0x0, 0x04ff),
+   A6XX_PROTECT_RDONLY(0x00501, 0x0005),
+   A6XX_PROTECT_RDONLY(0x0050b, 0x02f4),
+   A6XX_PROTECT_NORDWR(0x0050e, 0x),
+   A6XX_PROTECT_NORDWR(0x00510, 0x),
+   A6XX_PROTECT_NORDWR(0x00534, 0x),
+   A6XX_PROTECT_NORDWR(0x00800, 0x0082),
+   A6XX_PROTECT_NORDWR(0x008a0, 0x0008),
+   A6XX_PROTECT_NORDWR(0x008ab, 0x0024),
+   A6XX_PROTECT_RDONLY(0x008de, 0x00ae),
+   A6XX_PROTECT_NORDWR(0x00900, 0x004d),
+   A6XX_PROTECT_NORDWR(0x0098d, 0x0272),
+   A6XX_PROTECT_NORDWR(0x00e00, 0x0001),
+   A6XX_PROTECT_NORDWR(0x00e03, 0x000c),
+   A6XX_PROTECT_NORDWR(0x03c00, 0x00c3),
+   A6XX_PROTECT_RDONLY(0x03cc4, 0x1fff),
+   A6XX_PROTECT_NORDWR(0x08630, 0x01cf),
+   A6XX_PROTECT_NORDWR(0x08e00, 0x),
+   A6XX_PROTECT_NORDWR(0x08e08, 0x),
+   A6XX_PROTECT_NORDWR(0x08e50, 0x001f),
+   A6XX_PROTECT_NORDWR(0x09624, 0x01db),
+   A6XX_PROTECT_NORDWR(0x09e70, 0x0001),
+   A6XX_PROTECT_NORDWR(0x09e78, 0x0187),
+   A6XX_PROTECT_NORDWR(0x0a630, 0x01cf),
+   A6XX_PROTECT_NORDWR(0x0ae02, 0x),
+   A6XX_PROTECT_NORDWR(0x0ae50, 0x032f),
+   A6XX_PROTECT_NORDWR(0x0b604, 0x),
+   A6XX_PROTECT_NORDWR(0x0be02, 0x0001),
+   A6XX_PROTECT_NORDWR(0x0be20, 0x17df),
+   A6XX_PROTECT_NORDWR(0x0f000, 0x0bff),
+   A6XX_PROTECT_RDONLY(0x0fc00, 0x1fff),
+   A6XX_PROTECT_NORDWR(0x11c00, 0x), /* note: infinite range */
+};
+DECLARE_ADRENO_PROTECT(a630_protect, 32);
+
+/* These are for a620 and a650 */
+static const u32 a650_protect_regs[] = {
+   A6XX_PROTECT_RDONLY(0x0, 0x04ff),
+   A6XX_PROTECT_RDONLY(0x00501, 0x0005),
+   A6XX_PROTECT_RDONLY(0x0050b, 0x02f4),
+   A6XX_PROTECT_NORDWR(0x0050e, 0x),
+   A6XX_PROTECT_NORDWR(0x00510, 0x),
+   A6XX_PROTECT_NORDWR(0x00534, 0x),
+   A6XX_PROTECT_NORDWR(0x00800, 0x0082),
+   A6XX_PROTECT_NORDWR(0x008a0, 0x0008),
+   A6XX_PROTECT_NORDWR(0x008ab, 0x0024),
+   A6XX_PROTECT_RDONLY(0x008de, 0x00ae),
+   A6XX_PROTECT_NORDWR(0x00900, 0x004d),
+   A6XX_PROTECT_NORDWR(0x0098d, 0x0272),
+   A6XX_PROTECT_NORDWR(0x00e00, 0x0001),
+   A6XX_PROTECT_NORDWR(0x00e03, 0x000c),
+   A6XX_PROTECT_NORDWR(0x03c00, 0x00c3),
+   A6XX_PROTECT_RDONLY(0x03cc4, 0x1fff),
+   A6XX_PROTECT_NORDWR(0x08630, 0x01cf),
+   A6XX_PROTECT_NORDWR(0x08e00, 0x),
+   A6XX_PROTECT_NORDWR(0x08e08, 0x),
+   A6XX_PROTECT_NORDWR(0x08e50, 0x001f),
+   A6XX_PROTECT_NORDWR(0x08e80, 0x027f),
+   A6XX_PROTECT_NORDWR(0x09624, 0x01db),
+   A6XX_PROTECT_NORDWR(0x09e60, 0x0011),
+   A6XX_PROTECT_NORDWR(0x09e78, 0x0187),
+   A6XX_PROTECT_NORDWR(0x0a630, 0x01cf),
+   A6XX_PROTECT_NORDWR(0x0ae02, 0x),
+   A6XX_PROTECT_NORDWR(0x0ae50, 0x032f),
+   A6XX_PROTECT_NORDWR(0x0b604, 0x),
+   A6XX_PROTECT_NORDWR(0x0b608, 0x0007),
+   A6XX_PROTECT_NORDWR(0x0be02, 0x0001),
+   A6XX_PROTECT_NORDWR(0x0be20, 0x17df),
+   A6XX_PROTECT_NORDWR(0x0f000, 0x0bff),
+   A6XX_PROTECT_RDONLY(0x0fc00, 0x1fff),
+   A6XX_PROTECT_NORDWR(0x18400, 0x1fff),
+   A6XX_PROTECT_NORDWR(0x1a800, 0x1fff),
+   A6XX_PROTECT_NORDWR(0x1f400, 0x0443),
+   A6XX_PROTECT_RDONLY(0x1f844, 0x007b),
+   A6XX_PROTECT_NORDWR(0x1f887, 0x001b),
+   A6XX_PROTECT_NORDWR(0x1f8c0, 0x), /* note: infinite range */
+};
+DECLARE_ADRENO_PROTECT(a650_protect, 48);
+
+/* These are for a635 and a660 */
+static const u32 a660_protect_regs[] = {
+   A6XX_PROTECT_RDONLY(0x0, 0x04ff),
+   A6XX_PROTECT_RDONLY(0x00501, 0x0005),
+   A6XX_PROTECT_RDONLY(0x0050b, 0x02f4),
+   A6XX_PROTECT_NORDWR(0x0050e, 0x),
+   A6XX_PROTECT_NORDWR(0x00510, 0x),
+   A6XX_PROTECT_NORDWR(0x00534, 0x),
+   A6XX_PROTECT_NORDWR(0x00800, 0x0082),
+   A6XX_PROTECT_NORDWR(0x008a0, 0x0008),
+   A6XX_PROTECT_NORDWR(0x008ab, 0x0024),
+   A6XX_PROTECT_RDONLY(0x008de, 0x00ae),
+   A6XX_PROTECT_NORDWR(0x00900, 0x004d),
+   A6XX_PROTECT_NORDWR(0x0098d, 0x0272),
+   A6XX_PROTECT_NORDWR(0x00e00, 0x0001
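
The companion DECLARE_ADRENO_PROTECT() helper lands in adreno_gpu.h per
the diffstat; a sketch of its likely shape, inferred from the
DECLARE_ADRENO_PROTECT(a630_protect, 32) usage above (the field names
are assumptions, not the upstream definition):

	struct adreno_protect {
		const u32 *regs;	/* packed CP_PROTECT entries */
		u32 count;		/* entries in the table */
		u32 count_max;		/* CP_PROTECT regs this GPU has */
	};

	#define DECLARE_ADRENO_PROTECT(name, __count_max)		\
	static const struct adreno_protect name = {			\
		.regs = name ## _regs,					\
		.count = ARRAY_SIZE(name ## _regs),			\
		.count_max = __count_max,				\
	}

The second argument would let the hw-init code program any CP_PROTECT
registers beyond the table to a safe default, since different GPUs
expose a different number of them.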

[PATCH v2 3/5] drm/msm/adreno: Move hwcg regs to a6xx hw catalog

2024-06-17 Thread Rob Clark
From: Rob Clark 

Move the hwcg tables into the hw catalog.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/adreno/a6xx_catalog.c | 619 ++
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 617 -
 drivers/gpu/drm/msm/adreno/adreno_gpu.h   |   3 -
 3 files changed, 619 insertions(+), 620 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_catalog.c 
b/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
index 899fc5fb5184..b991d3646722 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
@@ -7,6 +7,451 @@
  */
 
 #include "adreno_gpu.h"
+#include "a6xx.xml.h"
+#include "a6xx_gmu.xml.h"
+
+static const struct adreno_reglist a612_hwcg[] = {
+   {REG_A6XX_RBBM_CLOCK_CNTL_SP0, 0x},
+   {REG_A6XX_RBBM_CLOCK_CNTL2_SP0, 0x0220},
+   {REG_A6XX_RBBM_CLOCK_DELAY_SP0, 0x0081},
+   {REG_A6XX_RBBM_CLOCK_HYST_SP0, 0xf3cf},
+   {REG_A6XX_RBBM_CLOCK_CNTL_TP0, 0x},
+   {REG_A6XX_RBBM_CLOCK_CNTL2_TP0, 0x},
+   {REG_A6XX_RBBM_CLOCK_CNTL3_TP0, 0x},
+   {REG_A6XX_RBBM_CLOCK_CNTL4_TP0, 0x0002},
+   {REG_A6XX_RBBM_CLOCK_DELAY_TP0, 0x},
+   {REG_A6XX_RBBM_CLOCK_DELAY2_TP0, 0x},
+   {REG_A6XX_RBBM_CLOCK_DELAY3_TP0, 0x},
+   {REG_A6XX_RBBM_CLOCK_DELAY4_TP0, 0x0001},
+   {REG_A6XX_RBBM_CLOCK_HYST_TP0, 0x},
+   {REG_A6XX_RBBM_CLOCK_HYST2_TP0, 0x},
+   {REG_A6XX_RBBM_CLOCK_HYST3_TP0, 0x},
+   {REG_A6XX_RBBM_CLOCK_HYST4_TP0, 0x0007},
+   {REG_A6XX_RBBM_CLOCK_CNTL_RB0, 0x},
+   {REG_A6XX_RBBM_CLOCK_CNTL2_RB0, 0x0120},
+   {REG_A6XX_RBBM_CLOCK_CNTL_CCU0, 0x2220},
+   {REG_A6XX_RBBM_CLOCK_HYST_RB_CCU0, 0x00040f00},
+   {REG_A6XX_RBBM_CLOCK_CNTL_RAC, 0x05522022},
+   {REG_A6XX_RBBM_CLOCK_CNTL2_RAC, 0x},
+   {REG_A6XX_RBBM_CLOCK_DELAY_RAC, 0x0011},
+   {REG_A6XX_RBBM_CLOCK_HYST_RAC, 0x00445044},
+   {REG_A6XX_RBBM_CLOCK_CNTL_TSE_RAS_RBBM, 0x0422},
+   {REG_A6XX_RBBM_CLOCK_MODE_VFD, 0x},
+   {REG_A6XX_RBBM_CLOCK_MODE_GPC, 0x0222},
+   {REG_A6XX_RBBM_CLOCK_DELAY_HLSQ_2, 0x0002},
+   {REG_A6XX_RBBM_CLOCK_MODE_HLSQ, 0x},
+   {REG_A6XX_RBBM_CLOCK_DELAY_TSE_RAS_RBBM, 0x4000},
+   {REG_A6XX_RBBM_CLOCK_DELAY_VFD, 0x},
+   {REG_A6XX_RBBM_CLOCK_DELAY_GPC, 0x0200},
+   {REG_A6XX_RBBM_CLOCK_DELAY_HLSQ, 0x},
+   {REG_A6XX_RBBM_CLOCK_HYST_TSE_RAS_RBBM, 0x},
+   {REG_A6XX_RBBM_CLOCK_HYST_VFD, 0x},
+   {REG_A6XX_RBBM_CLOCK_HYST_GPC, 0x04104004},
+   {REG_A6XX_RBBM_CLOCK_HYST_HLSQ, 0x},
+   {REG_A6XX_RBBM_CLOCK_CNTL_UCHE, 0x},
+   {REG_A6XX_RBBM_CLOCK_HYST_UCHE, 0x0004},
+   {REG_A6XX_RBBM_CLOCK_DELAY_UCHE, 0x0002},
+   {REG_A6XX_RBBM_ISDB_CNT, 0x0182},
+   {REG_A6XX_RBBM_RAC_THRESHOLD_CNT, 0x},
+   {REG_A6XX_RBBM_SP_HYST_CNT, 0x},
+   {REG_A6XX_RBBM_CLOCK_CNTL_GMU_GX, 0x0222},
+   {REG_A6XX_RBBM_CLOCK_DELAY_GMU_GX, 0x0111},
+   {REG_A6XX_RBBM_CLOCK_HYST_GMU_GX, 0x0555},
+   {},
+};
+
+/* For a615 family (a615, a616, a618 and a619) */
+static const struct adreno_reglist a615_hwcg[] = {
+   {REG_A6XX_RBBM_CLOCK_CNTL_SP0,  0x0222},
+   {REG_A6XX_RBBM_CLOCK_CNTL2_SP0, 0x0220},
+   {REG_A6XX_RBBM_CLOCK_DELAY_SP0, 0x0080},
+   {REG_A6XX_RBBM_CLOCK_HYST_SP0,  0xF3CF},
+   {REG_A6XX_RBBM_CLOCK_CNTL_TP0,  0x0222},
+   {REG_A6XX_RBBM_CLOCK_CNTL_TP1,  0x0222},
+   {REG_A6XX_RBBM_CLOCK_CNTL2_TP0, 0x},
+   {REG_A6XX_RBBM_CLOCK_CNTL2_TP1, 0x},
+   {REG_A6XX_RBBM_CLOCK_CNTL3_TP0, 0x},
+   {REG_A6XX_RBBM_CLOCK_CNTL3_TP1, 0x},
+   {REG_A6XX_RBBM_CLOCK_CNTL4_TP0, 0x0002},
+   {REG_A6XX_RBBM_CLOCK_CNTL4_TP1, 0x0002},
+   {REG_A6XX_RBBM_CLOCK_HYST_TP0,  0x},
+   {REG_A6XX_RBBM_CLOCK_HYST_TP1,  0x},
+   {REG_A6XX_RBBM_CLOCK_HYST2_TP0, 0x},
+   {REG_A6XX_RBBM_CLOCK_HYST2_TP1, 0x},
+   {REG_A6XX_RBBM_CLOCK_HYST3_TP0, 0x},
+   {REG_A6XX_RBBM_CLOCK_HYST3_TP1, 0x},
+   {REG_A6XX_RBBM_CLOCK_HYST4_TP0, 0x0007},
+   {REG_A6XX_RBBM_CLOCK_HYST4_TP1, 0x0007},
+   {REG_A6XX_RBBM_CLOCK_DELAY_TP0, 0x},
+   {REG_A6XX_RBBM_CLOCK_DELAY_TP1, 0x},
+   {REG_A6XX_RBBM_CLOCK_DELAY2_TP0, 0x},
+   {REG_A6XX_RBBM_CLOCK_DELAY2_TP1, 0x},
+   {REG_A6XX_RBBM_CLOCK_DELAY3_TP0, 0x},
+   {REG_A6XX_RBBM_CLOCK_DELAY3_TP1, 0x},
+   {REG_A6XX_RBBM_CLOCK_DELAY4_TP0, 0x0001},
+   {REG_A6XX_RBBM_CLOCK_DELAY4_TP1, 0x0001},
+   {REG_A6XX_RBBM_CLOCK_CNTL_UCHE,  0x},
+   {REG_A6XX_RBBM_CLOCK_CNTL2_UCHE, 0x},
+   {REG_A6XX_RBB
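
Each hwcg table is a list of {offset, value} pairs terminated by the
empty {} entry. The consumer side, a6xx_set_hwcg() in a6xx_gpu.c, walks
it roughly like this; a simplified sketch, not the verbatim driver
code:

	const struct adreno_reglist *reg;

	/* Program each clock-gating register, or zero it to disable HWCG */
	for (reg = adreno_gpu->info->hwcg; reg && reg->offset; reg++)
		gpu_write(gpu, reg->offset, state ? reg->value : 0);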

[PATCH v2 4/5] drm/msm/adreno: Move hwcg table into a6xx specific info

2024-06-17 Thread Rob Clark
From: Rob Clark 

Introduce a6xx_info where we can stash gen specific stuff without
polluting the toplevel adreno_info struct.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/adreno/a6xx_catalog.c | 65 +--
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c |  6 +--
 drivers/gpu/drm/msm/adreno/a6xx_gpu.h |  9 
 drivers/gpu/drm/msm/adreno/adreno_gpu.h   |  6 ++-
 4 files changed, 67 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_catalog.c 
b/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
index b991d3646722..da49589f82d0 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_catalog.c
@@ -7,6 +7,7 @@
  */
 
 #include "adreno_gpu.h"
+#include "a6xx_gpu.h"
 #include "a6xx.xml.h"
 #include "a6xx_gmu.xml.h"
 
@@ -465,7 +466,9 @@ static const struct adreno_info a6xx_gpus[] = {
.inactive_period = DRM_MSM_INACTIVE_PERIOD,
.init = a6xx_gpu_init,
.zapfw = "a610_zap.mdt",
-   .hwcg = a612_hwcg,
+   .a6xx = &(struct a6xx_info) {
+   .hwcg = a612_hwcg,
+   },
/*
 * There are (at least) three SoCs implementing A610: SM6125
 * (trinket), SM6115 (bengal) and SM6225 (khaje). Trinket does
@@ -493,7 +496,9 @@ static const struct adreno_info a6xx_gpus[] = {
.quirks = ADRENO_QUIRK_HAS_CACHED_COHERENT,
.init = a6xx_gpu_init,
.zapfw = "a615_zap.mbn",
-   .hwcg = a615_hwcg,
+   .a6xx = &(struct a6xx_info) {
+   .hwcg = a615_hwcg,
+   },
.speedbins = ADRENO_SPEEDBINS(
{ 0,   0 },
{ 128, 1 },
@@ -513,6 +518,8 @@ static const struct adreno_info a6xx_gpus[] = {
.inactive_period = DRM_MSM_INACTIVE_PERIOD,
.quirks = ADRENO_QUIRK_HAS_CACHED_COHERENT,
.init = a6xx_gpu_init,
+   .a6xx = &(struct a6xx_info) {
+   },
.speedbins = ADRENO_SPEEDBINS(
{ 0,   0 },
{ 169, 1 },
@@ -531,7 +538,9 @@ static const struct adreno_info a6xx_gpus[] = {
.inactive_period = DRM_MSM_INACTIVE_PERIOD,
.init = a6xx_gpu_init,
.zapfw = "a615_zap.mdt",
-   .hwcg = a615_hwcg,
+   .a6xx = &(struct a6xx_info) {
+   .hwcg = a615_hwcg,
+   },
.speedbins = ADRENO_SPEEDBINS(
{ 0,   0 },
{ 138, 1 },
@@ -550,7 +559,9 @@ static const struct adreno_info a6xx_gpus[] = {
.inactive_period = DRM_MSM_INACTIVE_PERIOD,
.init = a6xx_gpu_init,
.zapfw = "a615_zap.mdt",
-   .hwcg = a615_hwcg,
+   .a6xx = &(struct a6xx_info) {
+   .hwcg = a615_hwcg,
+   },
.speedbins = ADRENO_SPEEDBINS(
{ 0,   0 },
{ 190, 1 },
@@ -569,7 +580,9 @@ static const struct adreno_info a6xx_gpus[] = {
.quirks = ADRENO_QUIRK_HAS_CACHED_COHERENT,
.init = a6xx_gpu_init,
.zapfw = "a615_zap.mdt",
-   .hwcg = a615_hwcg,
+   .a6xx = &(struct a6xx_info) {
+   .hwcg = a615_hwcg,
+   },
.speedbins = ADRENO_SPEEDBINS(
{ 0,   0 },
{ 120, 4 },
@@ -593,7 +606,9 @@ static const struct adreno_info a6xx_gpus[] = {
.quirks = ADRENO_QUIRK_HAS_CACHED_COHERENT,
.init = a6xx_gpu_init,
.zapfw = "a630_zap.mdt",
-   .hwcg = a630_hwcg,
+   .a6xx = &(struct a6xx_info) {
+   .hwcg = a630_hwcg,
+   },
}, {
.chip_ids = ADRENO_CHIP_IDS(0x06040001),
.family = ADRENO_6XX_GEN2,
@@ -607,7 +622,9 @@ static const struct adreno_info a6xx_gpus[] = {
.quirks = ADRENO_QUIRK_HAS_CACHED_COHERENT,
.init = a6xx_gpu_init,
.zapfw = "a640_zap.mdt",
-   .hwcg = a640_hwcg,
+   .a6xx = &(struct a6xx_info) {
+   .hwcg = a640_hwcg,
+   },
.speedbins = ADRENO_SPEEDBINS(
{ 0, 0 },
{ 1, 1 },
@@ -626,7 +643,9 @@ static const struct adreno_info a6xx_gpus[] = {
ADRENO_QUIRK_HAS_HW_APRIV,
.init = a6xx_gpu_init,
.zapfw = "a650_zap.mdt",
-   .hwcg = a650_hwcg,
+   .a6xx = &(struct a6xx_info)
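
The a6xx_info container itself starts out minimal; a sketch consistent
with the initializers above (only the .hwcg member is visible in this
patch, so any further fields would be an assumption):

	/* a6xx_gpu.h */
	struct a6xx_info {
		const struct adreno_reglist *hwcg;
	};

Hiding gen-specific data behind a single const pointer keeps the
toplevel adreno_info generic while letting later patches, such as the
CP_PROTECT move in 5/5, grow a6xx-only fields without touching it.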

[PATCH v2 2/5] drm/msm/adreno: Split catalog into separate files

2024-06-17 Thread Rob Clark
From: Rob Clark 

Split each gen's gpu table into its own file.  Only code-motion, no
functional change.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/Makefile   |   5 +
 drivers/gpu/drm/msm/adreno/a2xx_catalog.c  |  52 ++
 drivers/gpu/drm/msm/adreno/a3xx_catalog.c  |  81 +++
 drivers/gpu/drm/msm/adreno/a4xx_catalog.c  |  50 ++
 drivers/gpu/drm/msm/adreno/a5xx_catalog.c  | 148 +
 drivers/gpu/drm/msm/adreno/a6xx_catalog.c  | 338 +++
 drivers/gpu/drm/msm/adreno/adreno_device.c | 625 +
 7 files changed, 680 insertions(+), 619 deletions(-)
 create mode 100644 drivers/gpu/drm/msm/adreno/a2xx_catalog.c
 create mode 100644 drivers/gpu/drm/msm/adreno/a3xx_catalog.c
 create mode 100644 drivers/gpu/drm/msm/adreno/a4xx_catalog.c
 create mode 100644 drivers/gpu/drm/msm/adreno/a5xx_catalog.c
 create mode 100644 drivers/gpu/drm/msm/adreno/a6xx_catalog.c

diff --git a/drivers/gpu/drm/msm/Makefile b/drivers/gpu/drm/msm/Makefile
index eb788921ff4f..f5e2838c6a76 100644
--- a/drivers/gpu/drm/msm/Makefile
+++ b/drivers/gpu/drm/msm/Makefile
@@ -8,13 +8,18 @@ ccflags-$(CONFIG_DRM_MSM_DP) += -I $(src)/dp
 adreno-y := \
adreno/adreno_device.o \
adreno/adreno_gpu.o \
+   adreno/a2xx_catalog.o \
adreno/a2xx_gpu.o \
adreno/a2xx_gpummu.o \
+   adreno/a3xx_catalog.o \
adreno/a3xx_gpu.o \
+   adreno/a4xx_catalog.o \
adreno/a4xx_gpu.o \
+   adreno/a5xx_catalog.o \
adreno/a5xx_gpu.o \
adreno/a5xx_power.o \
adreno/a5xx_preempt.o \
+   adreno/a6xx_catalog.o \
adreno/a6xx_gpu.o \
adreno/a6xx_gmu.o \
adreno/a6xx_hfi.o \
diff --git a/drivers/gpu/drm/msm/adreno/a2xx_catalog.c 
b/drivers/gpu/drm/msm/adreno/a2xx_catalog.c
new file mode 100644
index ..5c6afd1291ba
--- /dev/null
+++ b/drivers/gpu/drm/msm/adreno/a2xx_catalog.c
@@ -0,0 +1,52 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2013-2014 Red Hat
+ * Author: Rob Clark 
+ *
+ * Copyright (c) 2014,2017 The Linux Foundation. All rights reserved.
+ */
+
+#include "adreno_gpu.h"
+
+static const struct adreno_info a2xx_gpus[] = {
+   {
+   .chip_ids = ADRENO_CHIP_IDS(0x0200),
+   .family = ADRENO_2XX_GEN1,
+   .revn  = 200,
+   .fw = {
+   [ADRENO_FW_PM4] = "yamato_pm4.fw",
+   [ADRENO_FW_PFP] = "yamato_pfp.fw",
+   },
+   .gmem  = SZ_256K,
+   .inactive_period = DRM_MSM_INACTIVE_PERIOD,
+   .init  = a2xx_gpu_init,
+   }, { /* a200 on i.mx51 has only 128kib gmem */
+   .chip_ids = ADRENO_CHIP_IDS(0x0201),
+   .family = ADRENO_2XX_GEN1,
+   .revn  = 201,
+   .fw = {
+   [ADRENO_FW_PM4] = "yamato_pm4.fw",
+   [ADRENO_FW_PFP] = "yamato_pfp.fw",
+   },
+   .gmem  = SZ_128K,
+   .inactive_period = DRM_MSM_INACTIVE_PERIOD,
+   .init  = a2xx_gpu_init,
+   }, {
+   .chip_ids = ADRENO_CHIP_IDS(0x0202),
+   .family = ADRENO_2XX_GEN2,
+   .revn  = 220,
+   .fw = {
+   [ADRENO_FW_PM4] = "leia_pm4_470.fw",
+   [ADRENO_FW_PFP] = "leia_pfp_470.fw",
+   },
+   .gmem  = SZ_512K,
+   .inactive_period = DRM_MSM_INACTIVE_PERIOD,
+   .init  = a2xx_gpu_init,
+   }
+};
+DECLARE_ADRENO_GPULIST(a2xx);
+
+MODULE_FIRMWARE("qcom/leia_pfp_470.fw");
+MODULE_FIRMWARE("qcom/leia_pm4_470.fw");
+MODULE_FIRMWARE("qcom/yamato_pfp.fw");
+MODULE_FIRMWARE("qcom/yamato_pm4.fw");
diff --git a/drivers/gpu/drm/msm/adreno/a3xx_catalog.c 
b/drivers/gpu/drm/msm/adreno/a3xx_catalog.c
new file mode 100644
index ..7962ce0f3516
--- /dev/null
+++ b/drivers/gpu/drm/msm/adreno/a3xx_catalog.c
@@ -0,0 +1,81 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2013-2014 Red Hat
+ * Author: Rob Clark 
+ *
+ * Copyright (c) 2014,2017 The Linux Foundation. All rights reserved.
+ */
+
+#include "adreno_gpu.h"
+
+static const struct adreno_info a3xx_gpus[] = {
+   {
+   .chip_ids = ADRENO_CHIP_IDS(0x03000512),
+   .family = ADRENO_3XX,
+   .fw = {
+   [ADRENO_FW_PM4] = "a330_pm4.fw",
+   [ADRENO_FW_PFP] = "a330_pfp.fw",
+   },
+   .gmem  = SZ_128K,
+   .inactive_period = DRM_MSM_INACTIVE_PERIOD,
+   .init  = a3xx_gpu_init,
+   }, {
+   .chip_ids = ADRENO_CHIP_IDS(0x03000520),
+   .family = ADRENO_3XX,
+   .revn  = 305,
+   .fw = {
+   [ADRENO_FW_PM4] = "a
