Re: [PATCH v5 5/8] drm/xe: Add helper to accumulate exec queue runtime

2024-05-17 Thread Matt Roper
On Fri, May 17, 2024 at 01:43:07PM -0700, Lucas De Marchi wrote:
> From: Umesh Nerlige Ramappa 
> 
> Add a helper to accumulate per-client runtime of all its
> exec queues. This is called every time a sched job is finished.
> 
> v2:
>   - Use guc_exec_queue_free_job() and execlist_job_free() to accumulate
> runtime when job is finished since xe_sched_job_completed() is not a
> notification that job finished.
>   - Stop trying to update runtime from xe_exec_queue_fini() - that is
> redundant and may happen after xef is closed, leading to a
> use-after-free
>   - Do not special case the first timestamp read: the default LRC sets
> CTX_TIMESTAMP to zero, so even the first sample should be a valid
> one.
>   - Handle the parallel submission case by multiplying the runtime by
> width.
> v3: Update comments
> 
> Signed-off-by: Umesh Nerlige Ramappa 
> Signed-off-by: Lucas De Marchi 
> ---
>  drivers/gpu/drm/xe/xe_device_types.h |  3 +++
>  drivers/gpu/drm/xe/xe_exec_queue.c   | 37 
>  drivers/gpu/drm/xe/xe_exec_queue.h   |  1 +
>  drivers/gpu/drm/xe/xe_execlist.c |  1 +
>  drivers/gpu/drm/xe/xe_guc_submit.c   |  2 ++
>  5 files changed, 44 insertions(+)
> 
> diff --git a/drivers/gpu/drm/xe/xe_device_types.h 
> b/drivers/gpu/drm/xe/xe_device_types.h
> index 5c5e36de452a..bc97990fd032 100644
> --- a/drivers/gpu/drm/xe/xe_device_types.h
> +++ b/drivers/gpu/drm/xe/xe_device_types.h
> @@ -555,6 +555,9 @@ struct xe_file {
>   struct mutex lock;
>   } exec_queue;
>  
> + /** @runtime: hw engine class runtime in ticks for this drm client */
> + u64 runtime[XE_ENGINE_CLASS_MAX];
> +
>   /** @client: drm client */
>   struct xe_drm_client *client;
>  };
> diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c 
> b/drivers/gpu/drm/xe/xe_exec_queue.c
> index 395de93579fa..fa6dc996eca8 100644
> --- a/drivers/gpu/drm/xe/xe_exec_queue.c
> +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
> @@ -769,6 +769,43 @@ bool xe_exec_queue_is_idle(struct xe_exec_queue *q)
>   q->lrc[0].fence_ctx.next_seqno - 1;
>  }
>  
> +/**
> + * xe_exec_queue_update_runtime() - Update runtime for this exec queue from 
> hw
> + * @q: The exec queue
> + *
> + * Update the timestamp saved by HW for this exec queue and save runtime
> + * calculated by using the delta from last update. On multi-lrc case, only 
> the
> + * first is considered.
> + */
> +void xe_exec_queue_update_runtime(struct xe_exec_queue *q)
> +{
> + struct xe_file *xef;
> + struct xe_lrc *lrc;
> + u32 old_ts, new_ts;
> +
> + /*
> +  * Jobs that are run during driver load may use an exec_queue, but are
> +  * not associated with a user xe file, so avoid accumulating busyness
> +  * for kernel specific work.
> +  */
> + if (!q->vm || !q->vm->xef)
> + return;
> +
> + xef = q->vm->xef;
> +
> + /*
> +  * Only sample the first LRC. For parallel submission, all of them are
> +  * scheduled together and we compensate that below by multiplying by
> +  * width - this may introduce errors if that premise is not true and
> +  * they don't exit 100% aligned. On the other hand, looping through
> +  * the LRCs and reading them in different time could also introduce
> +  * errors.

At the time we're executing this function, those LRCs aren't executing
on the hardware anymore and their timestamps aren't continuing to move,
right?  I don't see where error could creep in from just looping over
each of them?

I guess parallel submission is mostly just used by media these days,
where the batches submitted in parallel are nearly identical and
expected to run the same amount of time, right?  Do we have any
userspace (or potential future userspace) that might submit
heterogeneous batches in parallel, which would make this inaccurate?

I'm not very familiar with the use cases of parallel submission, so I'll
trust that you've got a better understanding of the userspace usage than
I do; everything else here looks fine to me.
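
To make the trade-off concrete, here is a minimal userspace sketch (not
the xe driver code) of the two accounting strategies discussed above:
sampling only the first LRC and multiplying by the queue width, versus
summing the delta of every LRC. The struct and function names are
hypothetical; only the idea of a 32-bit timestamp delta scaled by width
comes from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one LRC's saved context timestamp state. */
struct fake_lrc {
	uint32_t old_ts; /* value sampled at the previous update */
	uint32_t new_ts; /* value saved by the hardware at switch-out */
};

/* Patch approach: sample only lrc[0] and scale by the queue width. */
static uint64_t runtime_first_times_width(const struct fake_lrc *lrc,
					  size_t width)
{
	/* Unsigned 32-bit subtraction survives one counter wraparound. */
	uint32_t delta = lrc[0].new_ts - lrc[0].old_ts;

	return (uint64_t)delta * width;
}

/* Alternative raised in review: sum the delta of every LRC. */
static uint64_t runtime_sum_all(const struct fake_lrc *lrc, size_t width)
{
	uint64_t total = 0;

	for (size_t i = 0; i < width; i++)
		total += (uint32_t)(lrc[i].new_ts - lrc[i].old_ts);

	return total;
}
```

With identical per-LRC runtimes both functions agree; they diverge only
when the batches submitted in parallel run for different amounts of time.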

Reviewed-by: Matt Roper 


Matt

> +  */
> + lrc = &q->lrc[0];
> + new_ts = xe_lrc_update_timestamp(lrc, &old_ts);
> + xef->runtime[q->class] += (new_ts - old_ts) * q->width;
> +}
> +
>  void xe_exec_queue_kill(struct xe_exec_queue *q)
>  {
>   struct xe_exec_queue *eq = q, *next;
> diff --git a/drivers/gpu/drm/xe/xe_exec_queue.h 
> b/drivers/gpu/drm/xe/xe_exec_queue.h
> index 48f6da53a292..e0f07d28ee1a 100644
> --- a/drivers/gpu/drm/xe/xe_exec_queue.h
> +++ b/drivers/gpu/drm/xe/xe_exec_queue.h
> @@ -75,5 +75,6 @@ struct dma_fence *xe_exec_queue_last_fence_get(struct 
> xe_exec_queu

Re: [PATCH v7 2/3] drm/i915/gt: Do not generate the command streamer for all the CCS

2024-03-27 Thread Matt Roper
On Wed, Mar 27, 2024 at 04:56:18PM +0100, Andi Shyti wrote:
> We want a fixed load CCS balancing consisting in all slices
> sharing one single user engine. For this reason do not create the
> intel_engine_cs structure with its dedicated command streamer for
> CCS slices beyond the first.
> 
> Fixes: d2eae8e98d59 ("drm/i915/dg2: Drop force_probe requirement")
> Signed-off-by: Andi Shyti 
> Cc: Chris Wilson 
> Cc: Joonas Lahtinen 
> Cc: Matt Roper 
> Cc:  # v6.2+
> Acked-by: Michal Mrozek 
> ---
>  drivers/gpu/drm/i915/gt/intel_engine_cs.c | 15 +++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c 
> b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> index f553cf4e6449..47c4a69e854c 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> @@ -908,6 +908,21 @@ static intel_engine_mask_t init_engine_mask(struct 
> intel_gt *gt)
>   info->engine_mask &= ~BIT(GSC0);
>   }
>  
> + /*
> +  * Do not create the command streamer for CCS slices beyond the first.
> +  * All the workload submitted to the first engine will be shared among
> +  * all the slices.
> +  *
> +  * Once the user will be allowed to customize the CCS mode, then this
> +  * check needs to be removed.
> +  */
> + if (IS_DG2(gt->i915)) {
> + intel_engine_mask_t first_ccs = BIT((CCS0 + __ffs(CCS_MASK(gt))));
> + intel_engine_mask_t all_ccs = CCS_MASK(gt) << CCS0;
> +
> + info->engine_mask &= ~(all_ccs &= ~first_ccs);

Shouldn't the second "&=" just be an "&" since there's no need to modify
the all_ccs variable that never gets used again?

In fact since this is DG2-specific, it seems like it might be more
intuitive to just write the whole thing more directly as

if (IS_DG2(gt->i915)) {
int first_ccs = __ffs(CCS_MASK(gt));

info->engine_mask &= ~GENMASK(CCS3, CCS0);
info->engine_mask |= BIT(_CCS(first_ccs));
}

But up to you; if you just want to remove the unnecessary "=" that's
fine too.  Either way,

Reviewed-by: Matt Roper 
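
For readers who want to convince themselves that the two forms give the
same result, here is a small userspace sketch. The bit position of CCS0,
the helper names, and the use of the GCC/Clang builtin __builtin_ctz in
place of __ffs() are assumptions made for illustration; this is not the
i915 code.

```c
#include <stdint.h>

/* Assumed layout: four CCS bits starting at an illustrative position. */
#define CCS0_BIT 16u
#define MAX_CCS  4u
#define BIT(n)   (1u << (n))
#define ALL_CCS  (((1u << MAX_CCS) - 1u) << CCS0_BIT)

/* Index of the lowest set bit; the argument must be non-zero. */
static unsigned int first_set(uint32_t v)
{
	return (unsigned int)__builtin_ctz(v);
}

/* As posted, with "&" instead of "&=": clear all CCS bits but the first. */
static uint32_t keep_first_ccs_posted(uint32_t engine_mask)
{
	uint32_t ccs_mask = (engine_mask & ALL_CCS) >> CCS0_BIT;
	uint32_t first_ccs = BIT(CCS0_BIT + first_set(ccs_mask));
	uint32_t all_ccs = ccs_mask << CCS0_BIT;

	return engine_mask & ~(all_ccs & ~first_ccs);
}

/* As suggested: clear the whole CCS field, then set the first one back. */
static uint32_t keep_first_ccs_suggested(uint32_t engine_mask)
{
	uint32_t ccs_mask = (engine_mask & ALL_CCS) >> CCS0_BIT;
	unsigned int first_ccs = first_set(ccs_mask);

	engine_mask &= ~ALL_CCS;
	engine_mask |= BIT(CCS0_BIT + first_ccs);

	return engine_mask;
}
```

Both helpers assume at least one CCS bit is set, which mirrors the
requirement that __ffs() is never called on zero.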


Matt

> + }
> +
>   return info->engine_mask;
>  }
>  
> -- 
> 2.43.0
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


Re: [PATCH v6 2/3] drm/i915/gt: Do not generate the command streamer for all the CCS

2024-03-26 Thread Matt Roper
On Tue, Mar 26, 2024 at 07:42:34PM +0100, Andi Shyti wrote:
> Hi Matt,
> 
> On Tue, Mar 26, 2024 at 09:03:10AM -0700, Matt Roper wrote:
> > On Wed, Mar 13, 2024 at 09:19:50PM +0100, Andi Shyti wrote:
> > > + /*
> > > +  * Do not create the command streamer for CCS slices
> > > +  * beyond the first. All the workload submitted to the
> > > +  * first engine will be shared among all the slices.
> > > +  *
> > > +  * Once the user will be allowed to customize the CCS
> > > +  * mode, then this check needs to be removed.
> > > +  */
> > > + if (IS_DG2(i915) &&
> > > + class == COMPUTE_CLASS &&
> > > + ccs_instance++)
> > > + continue;
> > 
> > Wouldn't it be more intuitive to drop the non-lowest CCS engines in
> > init_engine_mask() since that's the function that's dedicated to
> > building the list of engines we'll use?  Then we don't need to kill the
> > assertion farther down either.
> 
> Because we don't check the result of init_engine_mask() while
> creating the engine's structure. We check it only after and
> indeed I removed the drm_WARN_ON() check.
> 
> I think the whole process of creating the engine's structure in
> the intel_engines_init_mmio() can be simplified, but this goes
> beyond the scope of the series.
> 
> Or am I missing something?

The important part of init_engine_mask isn't the return value, but
rather that it's what sets up gt->info.engine_mask.  The HAS_ENGINE()
check that intel_engines_init_mmio() uses is based on the value stored
there, so updating that function will also ensure that we skip the
engines we don't want in the loop.
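
A minimal sketch of that relationship, assuming a simplified mask-based
check: once the stored mask has been trimmed, a loop that tests it per
engine creates exactly the engines left in the mask, so a later
"created mask == stored mask" assertion can stay. All names here are
hypothetical stand-ins, not the i915 definitions.

```c
#include <stdint.h>

#define NUM_ENGINES 8u

/* Hypothetical stand-in for the gt->info.engine_mask field. */
struct fake_gt {
	uint32_t engine_mask;
};

/* Mirrors the idea of HAS_ENGINE(): test one bit of the stored mask. */
static int has_engine(const struct fake_gt *gt, unsigned int id)
{
	return (gt->engine_mask >> id) & 1u;
}

/* Stand-in for the setup loop; returns the mask of engines created. */
static uint32_t setup_engines(const struct fake_gt *gt)
{
	uint32_t created = 0;

	for (unsigned int i = 0; i < NUM_ENGINES; i++) {
		if (!has_engine(gt, i))
			continue; /* already filtered out of the mask */
		created |= 1u << i;
	}

	return created;
}
```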


Matt

> 
> Thanks,
> Andi



Re: [PATCH v6 3/3] drm/i915/gt: Enable only one CCS for compute workload

2024-03-26 Thread Matt Roper
On Wed, Mar 13, 2024 at 09:19:51PM +0100, Andi Shyti wrote:
> Enable only one CCS engine by default with all the compute slices
> allocated to it.
> 
> While generating the list of UABI engines to be exposed to the
> user, exclude any additional CCS engines beyond the first
> instance.
> 
> This change can be tested with igt i915_query.
> 
> Fixes: d2eae8e98d59 ("drm/i915/dg2: Drop force_probe requirement")
> Signed-off-by: Andi Shyti 
> Cc: Chris Wilson 
> Cc: Joonas Lahtinen 
> Cc: Matt Roper 
> Cc:  # v6.2+

Reviewed-by: Matt Roper 

> ---
>  drivers/gpu/drm/i915/Makefile   |  1 +
>  drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c | 39 +
>  drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h | 13 +++
>  drivers/gpu/drm/i915/gt/intel_gt_regs.h |  5 +++
>  drivers/gpu/drm/i915/gt/intel_workarounds.c |  7 
>  5 files changed, 65 insertions(+)
>  create mode 100644 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
>  create mode 100644 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h
> 
> diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> index 3ef6ed41e62b..a6885a1d41a1 100644
> --- a/drivers/gpu/drm/i915/Makefile
> +++ b/drivers/gpu/drm/i915/Makefile
> @@ -118,6 +118,7 @@ gt-y += \
>   gt/intel_ggtt_fencing.o \
>   gt/intel_gt.o \
>   gt/intel_gt_buffer_pool.o \
> + gt/intel_gt_ccs_mode.o \
>   gt/intel_gt_clock_utils.o \
>   gt/intel_gt_debugfs.o \
>   gt/intel_gt_engines_debugfs.o \
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c 
> b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
> new file mode 100644
> index 000000000000..044219c5960a
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
> @@ -0,0 +1,39 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright © 2024 Intel Corporation
> + */
> +
> +#include "i915_drv.h"
> +#include "intel_gt.h"
> +#include "intel_gt_ccs_mode.h"
> +#include "intel_gt_regs.h"
> +
> +void intel_gt_apply_ccs_mode(struct intel_gt *gt)
> +{
> + int cslice;
> + u32 mode = 0;
> + int first_ccs = __ffs(CCS_MASK(gt));
> +
> + if (!IS_DG2(gt->i915))
> + return;
> +
> + /* Build the value for the fixed CCS load balancing */
> + for (cslice = 0; cslice < I915_MAX_CCS; cslice++) {
> + if (CCS_MASK(gt) & BIT(cslice))
> + /*
> +  * If available, assign the cslice
> +  * to the first available engine...
> +  */
> + mode |= XEHP_CCS_MODE_CSLICE(cslice, first_ccs);
> +
> + else
> + /*
> +  * ... otherwise, mark the cslice as
> +  * unavailable if no CCS dispatches here
> +  */
> + mode |= XEHP_CCS_MODE_CSLICE(cslice,
> +  XEHP_CCS_MODE_CSLICE_MASK);
> + }
> +
> + intel_uncore_write(gt->uncore, XEHP_CCS_MODE, mode);
> +}
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h 
> b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h
> new file mode 100644
> index 000000000000..9e5549caeb26
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h
> @@ -0,0 +1,13 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2024 Intel Corporation
> + */
> +
> +#ifndef __INTEL_GT_CCS_MODE_H__
> +#define __INTEL_GT_CCS_MODE_H__
> +
> +struct intel_gt;
> +
> +void intel_gt_apply_ccs_mode(struct intel_gt *gt);
> +
> +#endif /* __INTEL_GT_CCS_MODE_H__ */
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h 
> b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> index 31b102604e3d..743fe3566722 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> @@ -1480,6 +1480,11 @@
>  #define   XEHP_RCU_MODE_FIXED_SLICE_CCS_MODE REG_BIT(1)
>  #define   GEN12_RCU_MODE_CCS_ENABLE  REG_BIT(0)
>  
> +#define XEHP_CCS_MODE _MMIO(0x14804)
> +#define   XEHP_CCS_MODE_CSLICE_MASK  REG_GENMASK(2, 0) /* CCS0-3 + rsvd */
> +#define   XEHP_CCS_MODE_CSLICE_WIDTH ilog2(XEHP_CCS_MODE_CSLICE_MASK + 1)
> +#define   XEHP_CCS_MODE_CSLICE(cslice, ccs)  (ccs << (cslice * XEHP_CCS_MODE_CSLICE_WIDTH))
> +
>  #define CHV_FUSE_GT  _MMIO(VLV_GUNIT_BASE + 0x2168)
>  #define   CHV_FGT_DISABLE_SS0 (1 << 10)
>  #define   CHV_FGT_DISABLE_SS1 (1 << 11)
> diff --git a/drivers/gpu/drm/i915/gt/intel_workar

Re: [PATCH v6 2/3] drm/i915/gt: Do not generate the command streamer for all the CCS

2024-03-26 Thread Matt Roper
On Wed, Mar 13, 2024 at 09:19:50PM +0100, Andi Shyti wrote:
> We want a fixed load CCS balancing consisting in all slices
> sharing one single user engine. For this reason do not create the
> intel_engine_cs structure with its dedicated command streamer for
> CCS slices beyond the first.
> 
> Fixes: d2eae8e98d59 ("drm/i915/dg2: Drop force_probe requirement")
> Signed-off-by: Andi Shyti 
> Cc: Chris Wilson 
> Cc: Joonas Lahtinen 
> Cc: Matt Roper 
> Cc:  # v6.2+
> ---
>  drivers/gpu/drm/i915/gt/intel_engine_cs.c | 20 
>  1 file changed, 16 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c 
> b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> index f553cf4e6449..c4fb31bb6e72 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> @@ -966,6 +966,7 @@ int intel_engines_init_mmio(struct intel_gt *gt)
>   const unsigned int engine_mask = init_engine_mask(gt);
>   unsigned int mask = 0;
>   unsigned int i, class;
> + u8 ccs_instance = 0;
>   u8 logical_ids[MAX_ENGINE_INSTANCE + 1];
>   int err;
>  
> @@ -986,6 +987,19 @@ int intel_engines_init_mmio(struct intel_gt *gt)
>   !HAS_ENGINE(gt, i))
>   continue;
>  
> + /*
> +  * Do not create the command streamer for CCS slices
> +  * beyond the first. All the workload submitted to the
> +  * first engine will be shared among all the slices.
> +  *
> +  * Once the user will be allowed to customize the CCS
> +  * mode, then this check needs to be removed.
> +  */
> + if (IS_DG2(i915) &&
> + class == COMPUTE_CLASS &&
> + ccs_instance++)
> + continue;

Wouldn't it be more intuitive to drop the non-lowest CCS engines in
init_engine_mask() since that's the function that's dedicated to
building the list of engines we'll use?  Then we don't need to kill the
assertion farther down either.


Matt

> +
>   err = intel_engine_setup(gt, i,
>logical_ids[instance]);
>   if (err)
> @@ -996,11 +1010,9 @@ int intel_engines_init_mmio(struct intel_gt *gt)
>   }
>  
>   /*
> -  * Catch failures to update intel_engines table when the new engines
> -  * are added to the driver by a warning and disabling the forgotten
> -  * engines.
> +  * Update the intel_engines table.
>*/
> - if (drm_WARN_ON(&i915->drm, mask != engine_mask))
> + if (mask != engine_mask)
>   gt->info.engine_mask = mask;
>  
>   gt->info.num_engines = hweight32(mask);
> -- 
> 2.43.0
> 



Re: [PATCH] drm/i915/guc: Update w/a 14019159160

2024-03-14 Thread Matt Roper
On Tue, Mar 12, 2024 at 04:43:06PM -0700, John Harrison wrote:
> On 3/12/2024 09:24, Matt Roper wrote:
> > On Thu, Mar 07, 2024 at 06:01:29PM -0800, john.c.harri...@intel.com wrote:
> > > From: John Harrison 
> > > 
> > > An existing workaround has been extended in both platforms affected
> > > and implementation complexity.
> > > 
> > > Signed-off-by: John Harrison 
> > > ---
> > >   drivers/gpu/drm/i915/gt/uc/abi/guc_klvs_abi.h |  3 ++-
> > >   drivers/gpu/drm/i915/gt/uc/intel_guc.c|  3 ++-
> > >   drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c| 21 ++-
> > >   3 files changed, 15 insertions(+), 12 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_klvs_abi.h 
> > > b/drivers/gpu/drm/i915/gt/uc/abi/guc_klvs_abi.h
> > > index bebf28e3c4794..3e7060e859794 100644
> > > --- a/drivers/gpu/drm/i915/gt/uc/abi/guc_klvs_abi.h
> > > +++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_klvs_abi.h
> > > @@ -105,7 +105,8 @@ enum {
> > >* Workaround keys:
> > >*/
> > >   enum {
> > > - GUC_WORKAROUND_KLV_SERIALIZED_RA_MODE   = 
> > > 0x9001,
> > > + GUC_WORKAROUND_KLV_SERIALIZED_RA_MODE   = 
> > > 0x9001,   /* Wa_14019159160 */
> > > + GUC_WORKAROUND_KLV_AVOID_GFX_CLEAR_WHILE_ACTIVE = 
> > > 0x9006,   /* Wa_14019159160 */
> > >   };
> > >   #endif /* _ABI_GUC_KLVS_ABI_H */
> > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c 
> > > b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> > > index 0c67d674c94de..4c3dae98656af 100644
> > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> > > @@ -296,7 +296,8 @@ static u32 guc_ctl_wa_flags(struct intel_guc *guc)
> > >   /* Wa_16019325821 */
> > >   /* Wa_14019159160 */
> > > - if (IS_GFX_GT_IP_RANGE(gt, IP_VER(12, 70), IP_VER(12, 71)))
> > > + if (IS_GFX_GT_IP_RANGE(gt, IP_VER(12, 70), IP_VER(12, 71)) ||
> >  From what I can see, this workaround is also needed on Xe_LPG+ (12.74)
> Isn't that an Xe platform? Or is 12.74 just ARL?

12.74 / Xe_LPG+ is used in some ARL, which is being officially supported
by i915.
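
As a rough illustration of why 12.74 is not covered by the check in the
patch, the sketch below models an inclusive IP version range test. The
encoding (major version in the high byte, release in the low byte) and
the helper name are assumptions for illustration, not a copy of the i915
macros.

```c
#include <stdint.h>

/* Assumed encoding: major version in the high byte, release in the low. */
#define IP_VER(ver, rel) (((ver) << 8) | (rel))

/* Inclusive range test, modelling what an IP range check would do. */
static int ip_in_range(unsigned int ip, unsigned int from,
		       unsigned int until)
{
	return ip >= from && ip <= until;
}
```

Under these assumptions, 12.74 falls outside the [12.70, 12.71] range, so
it would need its own condition or a second range check.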


Matt

> 
> John.
> 
> > now.
> > 
> > 
> > Matt
> > 
> > > + IS_DG2(gt->i915))
> > >   flags |= GUC_WA_RCS_CCS_SWITCHOUT;
> > >   /*
> > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c 
> > > b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
> > > index 5c9908b56616e..00fe3c21a9b1c 100644
> > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
> > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
> > > @@ -815,23 +815,23 @@ guc_capture_prep_lists(struct intel_guc *guc)
> > >   return PAGE_ALIGN(total_size);
> > >   }
> > > -/* Wa_14019159160 */
> > > -static u32 guc_waklv_ra_mode(struct intel_guc *guc, u32 offset, u32 
> > > remain)
> > > +static void guc_waklv_enable_simple(struct intel_guc *guc, u32 *offset, 
> > > u32 *remain, u32 klv_id)
> > >   {
> > >   u32 size;
> > >   u32 klv_entry[] = {
> > >   /* 16:16 key/length */
> > > - FIELD_PREP(GUC_KLV_0_KEY, 
> > > GUC_WORKAROUND_KLV_SERIALIZED_RA_MODE) |
> > > + FIELD_PREP(GUC_KLV_0_KEY, klv_id) |
> > >   FIELD_PREP(GUC_KLV_0_LEN, 0),
> > >   /* 0 dwords data */
> > >   };
> > >   size = sizeof(klv_entry);
> > > - GEM_BUG_ON(remain < size);
> > > + GEM_BUG_ON(*remain < size);
> > > - iosys_map_memcpy_to(>ads_map, offset, klv_entry, size);
> > > + iosys_map_memcpy_to(>ads_map, *offset, klv_entry, size);
> > > - return size;
> > > + *offset += size;
> > > + *remain -= size;
> > >   }
> > >   static void guc_waklv_init(struct intel_guc *guc)
> > > @@ -850,10 +850,11 @@ static void guc_waklv_init(struct intel_guc *guc)
> > >       remain = guc_ads_waklv_size(guc);
> > >   /* Wa_14019159160 */
> > > - if (IS_GFX_GT_IP_RANGE(gt, IP_VER(12, 70), IP_VER(12, 71))) {
> > > - size = guc_waklv_ra_mode(guc, offset, remain);
> > > - offset += size;
> > > - remain -= size;
> > > + if (IS_GFX_GT_IP_RANGE(gt, IP_VER(12, 70), IP_VER(12, 71)) || 
> > > IS_DG2(gt->i915)) {
> > > + guc_waklv_enable_simple(guc, , ,
> > > + GUC_WORKAROUND_KLV_SERIALIZED_RA_MODE);
> > > + guc_waklv_enable_simple(guc, , ,
> > > + 
> > > GUC_WORKAROUND_KLV_AVOID_GFX_CLEAR_WHILE_ACTIVE);
> > >   }
> > >   size = guc_ads_waklv_size(guc) - remain;
> > > -- 
> > > 2.43.0
> > > 
> 



Re: [PATCH] drm/i915: remove platform checks in platform-specific handlers

2024-03-13 Thread Matt Roper
On Wed, Mar 13, 2024 at 07:27:36PM +0300, Nikita Kiryushin wrote:
> 
> Remove IS_KABYLAKE and IS_SKYLAKE in special handlers for
> skylake and kabylake: the checks are done at hook initialization and are
> always true in corresponding handlers.
> 
> Signed-off-by: Nikita Kiryushin 

Reviewed-by: Matt Roper 

> ---
>  drivers/gpu/drm/i915/gt/intel_workarounds.c | 6 +++---
>  drivers/gpu/drm/i915/intel_clock_gating.c   | 4 ++--
>  2 files changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c
> b/drivers/gpu/drm/i915/gt/intel_workarounds.c
> index 3eacbc50caf8..8eff6be9d74c 100644
> --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
> +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
> @@ -601,7 +601,7 @@ static void kbl_ctx_workarounds_init(struct
> intel_engine_cs *engine,
>   gen9_ctx_workarounds_init(engine, wal);
>   /* WaToEnableHwFixForPushConstHWBug:kbl */
> - if (IS_KABYLAKE(i915) && IS_GRAPHICS_STEP(i915, STEP_C0, STEP_FOREVER))
> + if (IS_GRAPHICS_STEP(i915, STEP_C0, STEP_FOREVER))
>   wa_masked_en(wal, COMMON_SLICE_CHICKEN2,
>GEN8_SBE_DISABLE_REPLAY_BUF_OPTIMIZATION);
>  @@ -1169,7 +1169,7 @@ skl_gt_workarounds_init(struct intel_gt *gt, struct
> i915_wa_list *wal)
>   GEN8_EU_GAUNIT_CLOCK_GATE_DISABLE);
>   /* WaInPlaceDecompressionHang:skl */
> - if (IS_SKYLAKE(gt->i915) && IS_GRAPHICS_STEP(gt->i915, STEP_A0, 
> STEP_H0))
> + if (IS_GRAPHICS_STEP(gt->i915, STEP_A0, STEP_H0))
>   wa_write_or(wal,
>   GEN9_GAMT_ECO_REG_RW_IA,
>   GAMT_ECO_ENABLE_IN_PLACE_DECOMPRESS);
> @@ -1181,7 +1181,7 @@ kbl_gt_workarounds_init(struct intel_gt *gt, struct
> i915_wa_list *wal)
>   gen9_gt_workarounds_init(gt, wal);
>   /* WaDisableDynamicCreditSharing:kbl */
> - if (IS_KABYLAKE(gt->i915) && IS_GRAPHICS_STEP(gt->i915, 0, STEP_C0))
> + if (IS_GRAPHICS_STEP(gt->i915, 0, STEP_C0))
>   wa_write_or(wal,
>   GAMT_CHKN_BIT_REG,
>   GAMT_CHKN_DISABLE_DYNAMIC_CREDIT_SHARING);
> diff --git a/drivers/gpu/drm/i915/intel_clock_gating.c
> b/drivers/gpu/drm/i915/intel_clock_gating.c
> index 9c21ce69bd98..977251bcbf42 100644
> --- a/drivers/gpu/drm/i915/intel_clock_gating.c
> +++ b/drivers/gpu/drm/i915/intel_clock_gating.c
> @@ -413,12 +413,12 @@ static void kbl_init_clock_gating(struct
> drm_i915_private *i915)
>   intel_uncore_rmw(&i915->uncore, FBC_LLC_READ_CTRL, 0, FBC_LLC_FULLY_OPEN);
>   /* WaDisableSDEUnitClockGating:kbl */
> - if (IS_KABYLAKE(i915) && IS_GRAPHICS_STEP(i915, 0, STEP_C0))
> + if (IS_GRAPHICS_STEP(i915, 0, STEP_C0))
>   intel_uncore_rmw(&i915->uncore, GEN8_UCGCTL6,
>0, GEN8_SDEUNIT_CLOCK_GATE_DISABLE);
>   /* WaDisableGamClockGating:kbl */
> - if (IS_KABYLAKE(i915) && IS_GRAPHICS_STEP(i915, 0, STEP_C0))
> + if (IS_GRAPHICS_STEP(i915, 0, STEP_C0))
>   intel_uncore_rmw(&i915->uncore, GEN6_UCGCTL1,
>0, GEN6_GAMUNIT_CLOCK_GATE_DISABLE);
>  -- 2.34.1
> 



Re: [PATCH 2/5] drm/i915: Drop dead code for xehpsdv

2024-03-12 Thread Matt Roper
On Wed, Mar 06, 2024 at 11:36:40AM -0800, Lucas De Marchi wrote:
> PCI IDs for XEHPSDV were never added and platform always marked with
> force_probe. Drop what's not used and rename some places to either be
> xehp or dg2, depending on the platform/IP checks.
> 
> The registers not used anymore are also removed.
> 
> Signed-off-by: Lucas De Marchi 
> ---
> 
> Potential problem here that needs a deeper look, the changes in
> __gen12_fw_ranges. Some ranges had comments saying they were XEHPSDV so
> I removed them, but it needs to be double checked with spec and CI
> results.
> 
...
> diff --git a/drivers/gpu/drm/i915/intel_uncore.c 
> b/drivers/gpu/drm/i915/intel_uncore.c
> index 76400e9c40f0..4f1e56187442 100644
> --- a/drivers/gpu/drm/i915/intel_uncore.c
> +++ b/drivers/gpu/drm/i915/intel_uncore.c
> @@ -1536,17 +1536,12 @@ static const struct intel_forcewake_range 
> __gen12_fw_ranges[] = {
>   GEN_FW_RANGE(0x13200, 0x13fff, FORCEWAKE_MEDIA_VDBOX2), /*  
> \
>   0x13200 - 0x133ff: VD2 (DG2 only)   
> \
>   0x13400 - 0x13fff: reserved */  
> \
> - GEN_FW_RANGE(0x14000, 0x141ff, FORCEWAKE_MEDIA_VDBOX0), /* XEHPSDV only 
> */  \
> - GEN_FW_RANGE(0x14200, 0x143ff, FORCEWAKE_MEDIA_VDBOX2), /* XEHPSDV only 
> */  \
> - GEN_FW_RANGE(0x14400, 0x145ff, FORCEWAKE_MEDIA_VDBOX4), /* XEHPSDV only 
> */  \
> - GEN_FW_RANGE(0x14600, 0x147ff, FORCEWAKE_MEDIA_VDBOX6), /* XEHPSDV only 
> */  \

We can't just remove ranges in the middle of the table since that breaks
the "watertight" table requirement that our selftests check for.  We
need to either roll the now-unused ranges into an adjacent range, or add
a new "reserved" range.
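
A minimal sketch of what a "watertight" check amounts to, assuming the
table is sorted by start offset: every range must begin exactly one past
the end of the previous one. The struct and function below are
hypothetical illustrations, not the i915 selftest itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified forcewake range entry. */
struct fw_range {
	uint32_t start;
	uint32_t end;
	unsigned int domains; /* 0 means no forcewake needed (reserved) */
};

/* Returns 1 if each range begins right after the previous one ends. */
static int is_watertight(const struct fw_range *r, size_t n)
{
	for (size_t i = 1; i < n; i++) {
		if (r[i].start != r[i - 1].end + 1)
			return 0; /* gap or overlap between neighbours */
	}

	return 1;
}
```

Dropping the 0x14000 - 0x147ff entries outright leaves a hole between
0x13fff and 0x14800; extending the previous range to 0x147ff or adding a
single reserved range with no domains closes it again.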

>   GEN_FW_RANGE(0x14800, 0x14fff, FORCEWAKE_RENDER),   
> \
>   GEN_FW_RANGE(0x15000, 0x16dff, FORCEWAKE_GT), /*
> \
>   0x15000 - 0x15fff: gt (DG2 only)
> \
>   0x16000 - 0x16dff: reserved */  
> \
>   GEN_FW_RANGE(0x16e00, 0x1ffff, FORCEWAKE_RENDER),   
> \
> - GEN_FW_RANGE(0x20000, 0x21fff, FORCEWAKE_MEDIA_VDBOX0), /*  
> \
> - 0x20000 - 0x20fff: VD0 (XEHPSDV only)   
> \
> + GEN_FW_RANGE(0x21000, 0x21fff, FORCEWAKE_MEDIA_VDBOX0), /*  
> \
>   0x21000 - 0x21fff: reserved */  
> \
>   GEN_FW_RANGE(0x22000, 0x23fff, FORCEWAKE_GT),   
> \
>   GEN_FW_RANGE(0x24000, 0x2417f, 0), /*   
> \
> @@ -1627,10 +1622,6 @@ static const struct intel_forcewake_range 
> __gen12_fw_ranges[] = {
>   0x1f6e00 - 0x1f7fff: reserved */
> \
>   GEN_FW_RANGE(0x1f8000, 0x1fa0ff, FORCEWAKE_MEDIA_VEBOX3),
>  
> -static const struct intel_forcewake_range __xehp_fw_ranges[] = {
> - XEHP_FWRANGES(FORCEWAKE_GT)
> -};
> -
>  static const struct intel_forcewake_range __dg2_fw_ranges[] = {
>   XEHP_FWRANGES(FORCEWAKE_RENDER)

We can drop the macro here now and just make this a normal table like
everything else.


Matt



Re: [PATCH 1/5] drm/i915: Drop WA 16015675438

2024-03-12 Thread Matt Roper
On Wed, Mar 06, 2024 at 11:36:39AM -0800, Lucas De Marchi wrote:
> With dynamic load-balancing disabled on the compute side, there's no
> reason left to enable WA 16015675438. Drop it from both PVC and DG2.
> Note that this can be done because now the driver always set a fixed
> partition of EUs during initialization via the ccs_mode configuration.
> 
> The flag to GuC is still needed because of 18020744125, so update
> the comment accordingly.
> 
> Cc: Mateusz Jablonski 
> Cc: Michal Mrozek 
> Cc: Rodrigo Vivi 
> Signed-off-by: Lucas De Marchi 

Dynamic load-balancing disable hasn't landed in i915 yet (although it
probably will soon).  Assuming we wait for that to happen first before
applying this,

Reviewed-by: Matt Roper 


Matt

> ---
>  drivers/gpu/drm/i915/gt/intel_workarounds.c | 6 +-
>  drivers/gpu/drm/i915/gt/uc/intel_guc.c  | 2 +-
>  2 files changed, 2 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c 
> b/drivers/gpu/drm/i915/gt/intel_workarounds.c
> index d67d44611c28..7f812409c30a 100644
> --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
> +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
> @@ -2928,14 +2928,10 @@ general_render_compute_wa_init(struct intel_engine_cs 
> *engine, struct i915_wa_li
>   wa_mcr_write_or(wal, LSC_CHICKEN_BIT_0, 
> DISABLE_D8_D16_COASLESCE);
>   }
>  
> - if (IS_PONTEVECCHIO(i915) || IS_DG2(i915)) {
> + if (IS_PONTEVECCHIO(i915) || IS_DG2(i915))
>   /* Wa_14015227452:dg2,pvc */
>   wa_mcr_masked_en(wal, GEN9_ROW_CHICKEN4, XEHP_DIS_BBL_SYSPIPE);
>  
> - /* Wa_16015675438:dg2,pvc */
> - wa_masked_en(wal, FF_SLICE_CS_CHICKEN2, 
> GEN12_PERF_FIX_BALANCING_CFE_DISABLE);
> - }
> -
>   if (IS_DG2(i915)) {
>   /*
>* Wa_16011620976:dg2_g11
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c 
> b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> index d2b7425bbdcc..c6603793af89 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> @@ -315,7 +315,7 @@ static u32 guc_ctl_wa_flags(struct intel_guc *guc)
>   if (IS_DG2_G11(gt->i915))
>   flags |= GUC_WA_CONTEXT_ISOLATION;
>  
> - /* Wa_16015675438 */
> + /* Wa_18020744125 */
>   if (!RCS_MASK(gt))
>   flags |= GUC_WA_RCS_REGS_IN_CCS_REGS_LIST;
>  
> -- 
> 2.43.0
> 



Re: [PATCH v5 2/4] drm/i915/gt: Refactor uabi engine class/instance list creation

2024-03-12 Thread Matt Roper
On Fri, Mar 08, 2024 at 09:22:17PM +0100, Andi Shyti wrote:
> For the upcoming changes we need a cleaner way to build the list
> of uabi engines.
> 
> Suggested-by: Tvrtko Ursulin 
> Signed-off-by: Andi Shyti 
> Cc:  # v6.2+

I don't really see why we need patches 2 & 3 in this series.  If we want
to restrict the platform to a single CCS engine for now (and give that
single engine access to all of the cslices), it would be much simpler to
only create a single intel_engine_cs which would then cause both
i915 and userspace to only consider a single engine, even if more than
one is physically present.  That could be done with a simple adjustment
to engine_mask_apply_compute_fuses() to mask off extra bits from the
engine mask such that only a single CCS can get returned rather than the
mask of all CCSs that are present.

Managing all of the engines in the KMD but only exposing one (some) of
them to userspace might be something we need if you want to add extra
functionality down the road to "hotplug" extra engines, or to allow
userspace to explicitly request multi-CCS mode.  But none of that seems
necessary for this series, especially for something you're backporting
to stable kernels.


Matt

> ---
>  drivers/gpu/drm/i915/gt/intel_engine_user.c | 29 -
>  1 file changed, 17 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c 
> b/drivers/gpu/drm/i915/gt/intel_engine_user.c
> index 833987015b8b..11cc06c0c785 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_user.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c
> @@ -203,7 +203,7 @@ static void engine_rename(struct intel_engine_cs *engine, 
> const char *name, u16
>  
>  void intel_engines_driver_register(struct drm_i915_private *i915)
>  {
> - u16 name_instance, other_instance = 0;
> + u16 class_instance[I915_LAST_UABI_ENGINE_CLASS + 2] = { };
>   struct legacy_ring ring = {};
>   struct list_head *it, *next;
>   struct rb_node **p, *prev;
> @@ -214,6 +214,8 @@ void intel_engines_driver_register(struct 
> drm_i915_private *i915)
>   prev = NULL;
>   p = &i915->uabi_engines.rb_node;
>   list_for_each_safe(it, next, ) {
> + u16 uabi_class;
> +
>   struct intel_engine_cs *engine =
>   container_of(it, typeof(*engine), uabi_list);
>  
> @@ -222,15 +224,14 @@ void intel_engines_driver_register(struct 
> drm_i915_private *i915)
>  
>   GEM_BUG_ON(engine->class >= ARRAY_SIZE(uabi_classes));
>   engine->uabi_class = uabi_classes[engine->class];
> - if (engine->uabi_class == I915_NO_UABI_CLASS) {
> - name_instance = other_instance++;
> - } else {
> - GEM_BUG_ON(engine->uabi_class >=
> -ARRAY_SIZE(i915->engine_uabi_class_count));
> - name_instance =
> - 
> i915->engine_uabi_class_count[engine->uabi_class]++;
> - }
> - engine->uabi_instance = name_instance;
> +
> + if (engine->uabi_class == I915_NO_UABI_CLASS)
> + uabi_class = I915_LAST_UABI_ENGINE_CLASS + 1;
> + else
> + uabi_class = engine->uabi_class;
> +
> + GEM_BUG_ON(uabi_class >= ARRAY_SIZE(class_instance));
> + engine->uabi_instance = class_instance[uabi_class]++;
>  
>   /*
>* Replace the internal name with the final user and log facing
> @@ -238,11 +239,15 @@ void intel_engines_driver_register(struct 
> drm_i915_private *i915)
>*/
>   engine_rename(engine,
> intel_engine_class_repr(engine->class),
> -   name_instance);
> +   engine->uabi_instance);
>  
> - if (engine->uabi_class == I915_NO_UABI_CLASS)
> + if (uabi_class > I915_LAST_UABI_ENGINE_CLASS)
>   continue;
>  
> + GEM_BUG_ON(uabi_class >=
> +ARRAY_SIZE(i915->engine_uabi_class_count));
> + i915->engine_uabi_class_count[uabi_class]++;
> +
>   rb_link_node(&engine->uabi_node, prev, p);
>   rb_insert_color(&engine->uabi_node, &i915->uabi_engines);
>  
> -- 
> 2.43.0
> 



Re: [PATCH v5 1/4] drm/i915/gt: Disable HW load balancing for CCS

2024-03-12 Thread Matt Roper
On Fri, Mar 08, 2024 at 09:22:16PM +0100, Andi Shyti wrote:
> The hardware should not dynamically balance the load between CCS
> engines. Wa_14019159160 recommends disabling it across all
> platforms.
> 
> Fixes: d2eae8e98d59 ("drm/i915/dg2: Drop force_probe requirement")
> Signed-off-by: Andi Shyti 
> Cc: Chris Wilson 
> Cc: Joonas Lahtinen 
> Cc: Matt Roper 
> Cc:  # v6.2+
> ---
>  drivers/gpu/drm/i915/gt/intel_gt_regs.h |  1 +
>  drivers/gpu/drm/i915/gt/intel_workarounds.c | 23 +++--
>  2 files changed, 22 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h 
> b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> index 50962cfd1353..cf709f6c05ae 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> @@ -1478,6 +1478,7 @@
>  
>  #define GEN12_RCU_MODE   _MMIO(0x14800)
>  #define   GEN12_RCU_MODE_CCS_ENABLE  REG_BIT(0)
> +#define   XEHP_RCU_MODE_FIXED_SLICE_CCS_MODE REG_BIT(1)

Nitpick: we usually order register bits in descending order.  Aside from
that,

Reviewed-by: Matt Roper 

although I still hope our architects will push through a formal
documentation update for this.


Matt

>  
>  #define CHV_FUSE_GT  _MMIO(VLV_GUNIT_BASE + 0x2168)
>  #define   CHV_FGT_DISABLE_SS0(1 << 10)
> diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c 
> b/drivers/gpu/drm/i915/gt/intel_workarounds.c
> index 25413809b9dc..4865eb5ca9c9 100644
> --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
> +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
> @@ -51,7 +51,8 @@
>   *   registers belonging to BCS, VCS or VECS should be implemented in
>   *   xcs_engine_wa_init(). Workarounds for registers not belonging to a 
> specific
>   *   engine's MMIO range but that are part of of the common RCS/CCS reset 
> domain
> - *   should be implemented in general_render_compute_wa_init().
> + *   should be implemented in general_render_compute_wa_init(). The settings
> + *   about the CCS load balancing should be added in ccs_engine_wa_mode().
>   *
>   * - GT workarounds: the list of these WAs is applied whenever these 
> registers
>   *   revert to their default values: on GPU reset, suspend/resume [1]_, etc.
> @@ -2854,6 +2855,22 @@ add_render_compute_tuning_settings(struct intel_gt *gt,
>   wa_write_clr(wal, GEN8_GARBCNTL, GEN12_BUS_HASH_CTL_BIT_EXC);
>  }
>  
> +static void ccs_engine_wa_mode(struct intel_engine_cs *engine, struct 
> i915_wa_list *wal)
> +{
> + struct intel_gt *gt = engine->gt;
> +
> + if (!IS_DG2(gt->i915))
> + return;
> +
> + /*
> +  * Wa_14019159160: This workaround, along with others, leads to
> +  * significant challenges in utilizing load balancing among the
> +  * CCS slices. Consequently, an architectural decision has been
> +  * made to completely disable automatic CCS load balancing.
> +  */
> + wa_masked_en(wal, GEN12_RCU_MODE, XEHP_RCU_MODE_FIXED_SLICE_CCS_MODE);
> +}
> +
>  /*
>   * The workarounds in this function apply to shared registers in
>   * the general render reset domain that aren't tied to a
> @@ -3004,8 +3021,10 @@ engine_init_workarounds(struct intel_engine_cs 
> *engine, struct i915_wa_list *wal
>* to a single RCS/CCS engine's workaround list since
>* they're reset as part of the general render domain reset.
>*/
> - if (engine->flags & I915_ENGINE_FIRST_RENDER_COMPUTE)
> + if (engine->flags & I915_ENGINE_FIRST_RENDER_COMPUTE) {
>   general_render_compute_wa_init(engine, wal);
> + ccs_engine_wa_mode(engine, wal);
> + }
>  
>   if (engine->class == COMPUTE_CLASS)
>   ccs_engine_wa_init(engine, wal);
> -- 
> 2.43.0
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


Re: [PATCH] drm/i915/guc: Update w/a 14019159160

2024-03-12 Thread Matt Roper
On Thu, Mar 07, 2024 at 06:01:29PM -0800, john.c.harri...@intel.com wrote:
> From: John Harrison 
> 
> An existing workaround has been extended in both platforms affected
> and implementation complexity.
> 
> Signed-off-by: John Harrison 
> ---
>  drivers/gpu/drm/i915/gt/uc/abi/guc_klvs_abi.h |  3 ++-
>  drivers/gpu/drm/i915/gt/uc/intel_guc.c|  3 ++-
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c| 21 ++-
>  3 files changed, 15 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_klvs_abi.h 
> b/drivers/gpu/drm/i915/gt/uc/abi/guc_klvs_abi.h
> index bebf28e3c4794..3e7060e859794 100644
> --- a/drivers/gpu/drm/i915/gt/uc/abi/guc_klvs_abi.h
> +++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_klvs_abi.h
> @@ -105,7 +105,8 @@ enum {
>   * Workaround keys:
>   */
>  enum {
> - GUC_WORKAROUND_KLV_SERIALIZED_RA_MODE   = 
> 0x9001,
> + GUC_WORKAROUND_KLV_SERIALIZED_RA_MODE   = 
> 0x9001,   /* Wa_14019159160 */
> + GUC_WORKAROUND_KLV_AVOID_GFX_CLEAR_WHILE_ACTIVE = 
> 0x9006,   /* Wa_14019159160 */
>  };
>  
>  #endif /* _ABI_GUC_KLVS_ABI_H */
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c 
> b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> index 0c67d674c94de..4c3dae98656af 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> @@ -296,7 +296,8 @@ static u32 guc_ctl_wa_flags(struct intel_guc *guc)
>  
>   /* Wa_16019325821 */
>   /* Wa_14019159160 */
> - if (IS_GFX_GT_IP_RANGE(gt, IP_VER(12, 70), IP_VER(12, 71)))
> + if (IS_GFX_GT_IP_RANGE(gt, IP_VER(12, 70), IP_VER(12, 71)) ||

From what I can see, this workaround is also needed on Xe_LPG+ (12.74)
now.
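
For illustration, a minimal standalone sketch of how the condition would
behave once 12.74 is included.  The version encoding (major in the high
byte, release in the low byte) and all sketch_* names are assumptions made
for this example; this is not the driver's IS_GFX_GT_IP_RANGE() itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed encoding: major version in bits 15:8, release in bits 7:0. */
#define SKETCH_IP_VER(ver, rel) ((uint16_t)(((ver) << 8) | (rel)))

/* True if 'ip' falls inside the inclusive range [from, until]. */
static bool sketch_ip_in_range(uint16_t ip, uint16_t from, uint16_t until)
{
	assert(from <= until);
	return ip >= from && ip <= until;
}

/* Hypothetical check: 12.70-12.71, plus 12.74, plus DG2. */
static bool sketch_needs_switchout_wa(uint16_t ip, bool is_dg2)
{
	return sketch_ip_in_range(ip, SKETCH_IP_VER(12, 70),
				  SKETCH_IP_VER(12, 71)) ||
	       ip == SKETCH_IP_VER(12, 74) ||
	       is_dg2;
}
```

Keeping 12.74 as a separate comparison rather than widening the range to
12.70-12.74 avoids silently matching versions in between that were not
mentioned here.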


Matt

> + IS_DG2(gt->i915))
>   flags |= GUC_WA_RCS_CCS_SWITCHOUT;
>  
>   /*
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c 
> b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
> index 5c9908b56616e..00fe3c21a9b1c 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
> @@ -815,23 +815,23 @@ guc_capture_prep_lists(struct intel_guc *guc)
>   return PAGE_ALIGN(total_size);
>  }
>  
> -/* Wa_14019159160 */
> -static u32 guc_waklv_ra_mode(struct intel_guc *guc, u32 offset, u32 remain)
> +static void guc_waklv_enable_simple(struct intel_guc *guc, u32 *offset, u32 
> *remain, u32 klv_id)
>  {
>   u32 size;
>   u32 klv_entry[] = {
>   /* 16:16 key/length */
> - FIELD_PREP(GUC_KLV_0_KEY, 
> GUC_WORKAROUND_KLV_SERIALIZED_RA_MODE) |
> + FIELD_PREP(GUC_KLV_0_KEY, klv_id) |
>   FIELD_PREP(GUC_KLV_0_LEN, 0),
>   /* 0 dwords data */
>   };
>  
>   size = sizeof(klv_entry);
> - GEM_BUG_ON(remain < size);
> + GEM_BUG_ON(*remain < size);
>  
> - iosys_map_memcpy_to(&guc->ads_map, offset, klv_entry, size);
> + iosys_map_memcpy_to(&guc->ads_map, *offset, klv_entry, size);
>  
> - return size;
> + *offset += size;
> + *remain -= size;
>  }
>  
>  static void guc_waklv_init(struct intel_guc *guc)
> @@ -850,10 +850,11 @@ static void guc_waklv_init(struct intel_guc *guc)
>   remain = guc_ads_waklv_size(guc);
>  
>   /* Wa_14019159160 */
> - if (IS_GFX_GT_IP_RANGE(gt, IP_VER(12, 70), IP_VER(12, 71))) {
> - size = guc_waklv_ra_mode(guc, offset, remain);
> - offset += size;
> - remain -= size;
> + if (IS_GFX_GT_IP_RANGE(gt, IP_VER(12, 70), IP_VER(12, 71)) || 
> IS_DG2(gt->i915)) {
> + guc_waklv_enable_simple(guc, &offset, &remain,
> +     GUC_WORKAROUND_KLV_SERIALIZED_RA_MODE);
> + guc_waklv_enable_simple(guc, &offset, &remain,
> +     GUC_WORKAROUND_KLV_AVOID_GFX_CLEAR_WHILE_ACTIVE);
>   }
>  
>   size = guc_ads_waklv_size(guc) - remain;
> -- 
> 2.43.0
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


Re: [PATCH v4 3/3] drm/i915/gt: Enable only one CCS for compute workload

2024-03-06 Thread Matt Roper
On Wed, Mar 06, 2024 at 02:22:47AM +0100, Andi Shyti wrote:
> Enable only one CCS engine by default with all the compute slices
> allocated to it.
> 
> While generating the list of UABI engines to be exposed to the
> user, exclude any additional CCS engines beyond the first
> instance.
> 
> This change can be tested with igt i915_query.
> 
> Fixes: d2eae8e98d59 ("drm/i915/dg2: Drop force_probe requirement")
> Requires: 97aba5e46038 ("drm/i915/gt: Refactor uabi engine class/instance 
> list creation")
> Signed-off-by: Andi Shyti 
> Cc: Chris Wilson 
> Cc: Joonas Lahtinen 
> Cc: Matt Roper 
> Cc:  # v6.2+
> ---
>  drivers/gpu/drm/i915/gt/intel_engine_user.c | 11 ++
>  drivers/gpu/drm/i915/gt/intel_gt.c  | 23 +
>  drivers/gpu/drm/i915/gt/intel_gt_regs.h |  5 +
>  3 files changed, 39 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c 
> b/drivers/gpu/drm/i915/gt/intel_engine_user.c
> index 11cc06c0c785..9ef1c4ce252d 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_user.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c
> @@ -208,6 +208,7 @@ void intel_engines_driver_register(struct 
> drm_i915_private *i915)
>   struct list_head *it, *next;
>   struct rb_node **p, *prev;
>   LIST_HEAD(engines);
> + u16 uabi_ccs = 0;
>  
>   sort_engines(i915, &engines);
>  
> @@ -244,6 +245,16 @@ void intel_engines_driver_register(struct 
> drm_i915_private *i915)
>   if (uabi_class > I915_LAST_UABI_ENGINE_CLASS)
>   continue;
>  
> + /*
> +  * The load is balanced among all the available compute
> +  * slices. Expose only the first instance of the compute
> +  * engine.
> +  */
> + if (IS_DG2(i915) &&
> + uabi_class == I915_ENGINE_CLASS_COMPUTE &&
> + uabi_ccs++)
> + continue;
> +
>   GEM_BUG_ON(uabi_class >=
>  ARRAY_SIZE(i915->engine_uabi_class_count));
>   i915->engine_uabi_class_count[uabi_class]++;
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c 
> b/drivers/gpu/drm/i915/gt/intel_gt.c
> index a425db5ed3a2..0aac97439552 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> @@ -168,6 +168,26 @@ static void init_unused_rings(struct intel_gt *gt)
>   }
>  }
>  
> +static void intel_gt_apply_ccs_mode(struct intel_gt *gt)
> +{
> + u32 mode;
> + int cslice;
> +
> + if (!IS_DG2(gt->i915))
> + return;
> +
> + /* Set '0' as a default CCS id to all the cslices */
> + mode = 0;
> +
> + for (cslice = 0; cslice < hweight32(CCS_MASK(gt)); cslice++)
> + /* Write 0x7 if no CCS context dispatches to this cslice */
> + if (!(CCS_MASK(gt) & BIT(cslice)))
> + mode |= XEHP_CCS_MODE_CSLICE(cslice,
> +  XEHP_CCS_MODE_CSLICE_MASK);
> +
> + intel_uncore_write(gt->uncore, XEHP_CCS_MODE, mode);

This is still going to hook all available cslices up to hardware engine
ccs0.  But what you actually want is to hook them all up to what
userspace sees as CCS0 (i.e., the first CCS engine that wasn't fused
off).  Hardware's engine numbering and userspace's numbering aren't the
same.

Also, if you have a part that only has hardware ccs1/cslice1 for
example, you're not going to set cslices 2 & 3 to 0x7 properly.

So probably what you want is something like this (untested):

static void intel_gt_apply_ccs_mode(struct intel_gt *gt)
{
        u32 mode = 0;
        int first_ccs = __ffs(CCS_MASK(gt));

        /*
         * Re-assign every present cslice to the first available CCS
         * engine; mark unavailable cslices as unused.
         */
        for (int cslice = 0; cslice < 4; cslice++) {
                if (CCS_MASK(gt) & BIT(cslice))
                        mode |= XEHP_CCS_MODE_CSLICE(cslice, first_ccs);
                else
                        mode |= XEHP_CCS_MODE_CSLICE(cslice,
                                                     XEHP_CCS_MODE_CSLICE_MASK);
        }

        intel_uncore_write(gt->uncore, XEHP_CCS_MODE, mode);
}
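
To make the resulting register value concrete, here is a standalone sketch
of the same packing that can be compiled outside the driver.  It assumes a
3-bit field per cslice (consistent with 0x7 meaning "unused") and, like the
function above, uses one 4-bit mask for both the present CCS engines and
the present cslices.  The sketch_* names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_CSLICES   4
#define SKETCH_CSLICE_WIDTH  3    /* assumed: 3-bit field per cslice */
#define SKETCH_CSLICE_UNUSED 0x7u /* no CCS dispatches to this cslice */

/* Place 'ccs' into the field that belongs to 'cslice'. */
static uint32_t sketch_ccs_mode_cslice(unsigned int cslice, uint32_t ccs)
{
	return ccs << (cslice * SKETCH_CSLICE_WIDTH);
}

/* Build the mode value from a 4-bit mask of present CCS engines/cslices. */
static uint32_t sketch_ccs_mode(uint32_t ccs_mask)
{
	uint32_t mode = 0;
	unsigned int first_ccs = 0;

	assert(ccs_mask != 0 && ccs_mask < (1u << SKETCH_MAX_CSLICES));

	/* Index of the lowest set bit, i.e. the first CCS that is present. */
	while (!(ccs_mask & (1u << first_ccs)))
		first_ccs++;

	for (unsigned int cslice = 0; cslice < SKETCH_MAX_CSLICES; cslice++) {
		if (ccs_mask & (1u << cslice))
			mode |= sketch_ccs_mode_cslice(cslice, first_ccs);
		else
			mode |= sketch_ccs_mode_cslice(cslice,
						       SKETCH_CSLICE_UNUSED);
	}

	return mode;
}
```

With only ccs1/cslice1 present (mask 0x2) this gives 0xfcf: cslice1 is
routed to CCS1 and the other three fields are 0x7.  With everything present
(mask 0xf) it gives 0, which is the only case where writing a plain 0 is
right.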

> +}
> +
>  int intel_gt_init_hw(struct intel_gt *gt)
>  {
>   struct drm_i915_private *i915 = gt->i915;
> @@ -195,6 +215,9 @@ int intel_gt_init_hw(struct intel_gt *gt)
>  
>   intel_gt_init_swizzling(gt);
>  
> + /* Configure CCS mode */
> + intel_gt_apply_ccs_mode(gt);

This is only setting this once during init.  The value gets lost on
every RCS/CCS reset, so we need to make sure it ge

Re: [PATCH v4 1/3] drm/i915/gt: Disable HW load balancing for CCS

2024-03-06 Thread Matt Roper
On Wed, Mar 06, 2024 at 02:22:45AM +0100, Andi Shyti wrote:
> The hardware should not dynamically balance the load between CCS
> engines. Wa_14019159160 recommends disabling it across all
> platforms.
> 
> Fixes: d2eae8e98d59 ("drm/i915/dg2: Drop force_probe requirement")
> Signed-off-by: Andi Shyti 
> Cc: Chris Wilson 
> Cc: Joonas Lahtinen 
> Cc: Matt Roper 
> Cc:  # v6.2+
> ---
>  drivers/gpu/drm/i915/gt/intel_gt_regs.h | 1 +
>  drivers/gpu/drm/i915/gt/intel_workarounds.c | 5 +
>  2 files changed, 6 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h 
> b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> index 50962cfd1353..cf709f6c05ae 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> @@ -1478,6 +1478,7 @@
>  
>  #define GEN12_RCU_MODE   _MMIO(0x14800)
>  #define   GEN12_RCU_MODE_CCS_ENABLE  REG_BIT(0)
> +#define   XEHP_RCU_MODE_FIXED_SLICE_CCS_MODE REG_BIT(1)
>  
>  #define CHV_FUSE_GT  _MMIO(VLV_GUNIT_BASE + 0x2168)
>  #define   CHV_FGT_DISABLE_SS0(1 << 10)
> diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c 
> b/drivers/gpu/drm/i915/gt/intel_workarounds.c
> index d67d44611c28..a2e78cf0b5f5 100644
> --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
> +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
> @@ -2945,6 +2945,11 @@ general_render_compute_wa_init(struct intel_engine_cs 
> *engine, struct i915_wa_li
>  
>   /* Wa_18028616096 */
>   wa_mcr_write_or(wal, LSC_CHICKEN_BIT_0_UDW, 
> UGM_FRAGMENT_THRESHOLD_TO_3);
> +
> + /*
> +  * Wa_14019159160: disable the automatic CCS load balancing

I'm still a bit concerned that this doesn't really match what this
specific workaround is asking us to do.  There seems to be an agreement
on various internal email threads that we need to disable load
balancing, but there's no single specific workaround that officially
documents that decision.

This specific workaround asks us to do a bunch of different things, and
the third item it asks for is to disable load balancing in very specific
cases (i.e., while the RCS is active at the same time as one or more CCS
engines).  Taking this workaround in isolation, it would be valid to
keep load balancing active if you were just using the CCS engines and
leaving the RCS idle, or if balancing was turned on/off by the GuC
scheduler according to engine use at the moment, as the documented
workaround seems to assume will be the case.

So in general I think we do need to disable load balancing based on
other offline discussion, but blaming that entire change on
Wa_14019159160 seems a bit questionable since it's not really what this
specific workaround is asking us to do and someone may come back and try
to "correct" the implementation of this workaround in the future without
realizing there are other factors too.  It would be great if we could
get hardware teams to properly document this expectation somewhere
(either in a separate dedicated workaround, or in the MMIO tuning guide)
so that we'll have a more direct and authoritative source for such a
large behavioral change.


Matt

> +  */
> +     wa_masked_en(wal, GEN12_RCU_MODE, 
> XEHP_RCU_MODE_FIXED_SLICE_CCS_MODE);
>   }
>  
>   if (IS_DG2_G11(i915)) {
> -- 
> 2.43.0
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


Re: [PATCH v3 2/4] drm/i915/gt: Do not exposed fused off engines.

2024-03-05 Thread Matt Roper
On Fri, Mar 01, 2024 at 12:28:57AM +0100, Andi Shyti wrote:
> Some of the CCS engines are disabled. They should not be listed
> in the uabi_engine list, that is the list of engines that the
> user can see.

Fused off engines already aren't visible to userspace (or to the kernel
for that matter).  For CCS engines, engine_mask_apply_compute_fuses()
removes the fused off engines from the runtime engine mask; other engine
types are handled in similar functions.  Any engine that doesn't appear
in the filtered down engine_mask won't even have a 'struct
intel_engine_cs' allocated for it.


Matt

> 
> Fixes: d2eae8e98d59 ("drm/i915/dg2: Drop force_probe requirement")
> Requires: 4e4f77d74878 ("drm/i915/gt: Refactor uabi engine class/instance 
> list creation")
> Signed-off-by: Andi Shyti 
> ---
>  drivers/gpu/drm/i915/gt/intel_engine_user.c | 12 
>  1 file changed, 12 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c 
> b/drivers/gpu/drm/i915/gt/intel_engine_user.c
> index cf8f24ad88f6..ec5bcd1c1ec4 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_user.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c
> @@ -244,6 +244,18 @@ void intel_engines_driver_register(struct 
> drm_i915_private *i915)
>   if (uabi_class > I915_LAST_UABI_ENGINE_CLASS)
>   continue;
>  
> + /*
> +  * If the CCS engine is fused off, the corresponding bit
> +  * in the engine mask is disabled. Do not expose it
> +  * to the user.
> +  *
> +  * By default at least one engine is enabled (check
> +  * the engine_mask_apply_compute_fuses() function.
> +  */
> + if (!(engine->gt->info.engine_mask &
> +   BIT(_CCS(engine->uabi_instance))))
> + continue;
> +
>   GEM_BUG_ON(uabi_class >=
>      ARRAY_SIZE(i915->engine_uabi_class_count));
>   i915->engine_uabi_class_count[uabi_class]++;
> -- 
> 2.43.0
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


Re: [PATCH v2 2/2] drm/i915/gt: Enable only one CCS for compute workload

2024-02-22 Thread Matt Roper
On Thu, Feb 22, 2024 at 11:03:27PM +0100, Andi Shyti wrote:
> Hi Matt,
> 
> first of all thanks a lot for the observations you are raising.
> 
> On Wed, Feb 21, 2024 at 12:51:04PM -0800, Matt Roper wrote:
> > On Wed, Feb 21, 2024 at 01:12:18AM +0100, Andi Shyti wrote:
> > > On Tue, Feb 20, 2024 at 03:39:18PM -0800, Matt Roper wrote:
> > > > On Tue, Feb 20, 2024 at 03:35:26PM +0100, Andi Shyti wrote:
> 
> ...
> 
> > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c 
> > > > > b/drivers/gpu/drm/i915/gt/intel_gt.c
> > > > > index a425db5ed3a2..e19df4ef47f6 100644
> > > > > --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> > > > > +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> > > > > @@ -168,6 +168,14 @@ static void init_unused_rings(struct intel_gt 
> > > > > *gt)
> > > > >   }
> > > > >  }
> > > > >  
> > > > > +static void intel_gt_apply_ccs_mode(struct intel_gt *gt)
> > > > > +{
> > > > > + if (!IS_DG2(gt->i915))
> > > > > + return;
> > > > > +
> > > > > + intel_uncore_write(gt->uncore, XEHP_CCS_MODE, 0);
> > > > 
> > > > This doesn't look right to me.  A value of 0 means every cslice gets
> > > > associated with CCS0.
> > > 
> > > Yes, that's what I'm trying to do. The behavior I'm looking for
> > > is this one:
> > > 
> > >/*
> > > ...
> > >   * With 1 engine (ccs0):
> > >   *   slice 0, 1, 2, 3: ccs0
> > >   *
> > >   * With 2 engines (ccs0, ccs1):
> > >   *   slice 0, 2: ccs0
> > >   *   slice 1, 3: ccs1
> > >   *
> > >   * With 4 engines (ccs0, ccs1, ccs2, ccs3):
> > >   *   slice 0: ccs0
> > >   *   slice 1: ccs1
> > >   *   slice 2: ccs2
> > >   *   slice 3: ccs3
> > > ...
> > > */
> > > 
> > > > where the user can configure the mode at runtime, making sure that
> > > > no client is connected to i915.
> > > 
> > > But, this needs to be written 
> > > 
> > > As we are now forcing mode '1', then all cslices are connected
> > > with ccs0.
> > 
> > Right --- and that's what I'm pointing out as illegal.  I think that
> > code comment above was taken out of context from a different RFC series;
> > that's not an accurate description of the behavior we want here.
> > 
> > First, the above comment is using ccs# to refer to userspace engines,
> > not hardware engines.  As a simple example, DG2-G11 only ever has a
> > single CCS which userspace sees as "instance 0" but which is actually
> > CCS1 at the hardware level.  If you try to follow the comment above when
> > programming CCS_MODE, you've assigned all of the cslices to a
> > non-existent engine and assigned no cslices to the CCS engine that
> > actually exists.  For DG2-G10 (and I think DG2-G12), there are different
> > combinations of fused-off / not-fused-off engines that will always show
> > up in userspace as CCS0-CCSn, even if those don't match the hardware
> > IDs.
> > 
> > Second, the above comment is assuming that you have a part with a
> > maximum fusing config (i.e., all cslices present).  Using DG2-G11 again
> > as an example, there's also only a single cslice (cslice1), so if you
> > tell CCS1 that it's allowed to use EUs from non-existent cslice0,
> > cslice2, and cslice3, you might not get the behavior you were hoping
> > for.
> 
> if the hardware slices are fused off we wouldn't see them in a
> first place, right? And that's anyway a permanent configuration
> that wouldn't affect the patch.

There are physically four possible cslices in the IP design.  The
presence/absence of each of those cslices can vary both by SKU and by
part-specific fusing.  Some SKUs (DG2-G11) wind up only ever having a
single possible configuration as far as I know, but the larger SKUs have
more part-to-part variation in terms of exactly which specific subset of
DSS (and by extension cslices) are present/absent.  The KMD determines
the configuration at runtime by reading the DSS fuse registers and
deriving the cslice presence/absence from that.
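
As a rough sketch of that derivation (not the driver's actual helper): if
the DSS are assumed to be grouped contiguously, with a fixed number of DSS
per cslice, a cslice is present whenever at least one DSS in its group
survived fusing.  The grouping, the parameters and the function name below
are assumptions for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Derive a cslice presence mask from a DSS fuse mask, assuming the DSS are
 * grouped contiguously with 'dss_per_cslice' DSS per cslice.  A cslice
 * counts as present if at least one of its DSS is enabled.
 */
static uint32_t sketch_cslice_mask_from_dss(uint32_t dss_mask,
					    unsigned int dss_per_cslice,
					    unsigned int num_cslices)
{
	uint32_t cslice_mask = 0;
	uint32_t group_bits;

	assert(dss_per_cslice > 0 && dss_per_cslice * num_cslices <= 32);

	/* Avoid an undefined 32-bit shift when one group spans the mask. */
	group_bits = dss_per_cslice == 32 ? ~0u : (1u << dss_per_cslice) - 1;

	for (unsigned int i = 0; i < num_cslices; i++) {
		/* Bits of dss_mask that belong to cslice i. */
		uint32_t group = (dss_mask >> (i * dss_per_cslice)) & group_bits;

		if (group)
			cslice_mask |= 1u << i;
	}

	return cslice_mask;
}
```

For example, with 8 DSS per cslice and only DSS 8-15 enabled, the result
is 0x2: cslice1 is the only one present.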

The register you're writing in this patch tells the CCS engine which
cslice(s) it can use to execute work.  If the KMD already knows that
cslice doesn't exist, but it tells CCS that it can go ahead and
use it anyway, things probably won't work properly.  That's why the spec

Re: [PATCH v2 2/2] drm/i915/gt: Enable only one CCS for compute workload

2024-02-21 Thread Matt Roper
On Wed, Feb 21, 2024 at 01:12:18AM +0100, Andi Shyti wrote:
> Hi Matt,
> 
> thanks a lot for looking into this.
> 
> On Tue, Feb 20, 2024 at 03:39:18PM -0800, Matt Roper wrote:
> > On Tue, Feb 20, 2024 at 03:35:26PM +0100, Andi Shyti wrote:
> 
> [...]
> 
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c 
> > > b/drivers/gpu/drm/i915/gt/intel_engine_user.c
> > > index 833987015b8b..7041acc77810 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_engine_user.c
> > > +++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c
> > > @@ -243,6 +243,15 @@ void intel_engines_driver_register(struct 
> > > drm_i915_private *i915)
> > >   if (engine->uabi_class == I915_NO_UABI_CLASS)
> > >   continue;
> > >  
> > > + /*
> > > +  * Do not list and do not count CCS engines other than the first
> > > +  */
> > > + if (engine->uabi_class == I915_ENGINE_CLASS_COMPUTE &&
> > > + engine->uabi_instance > 0) {
> > > + i915->engine_uabi_class_count[engine->uabi_class]--;
> > > + continue;
> > > + }
> > 
> > Wouldn't it be simpler to just add a workaround to the end of
> > engine_mask_apply_compute_fuses() if we want to ensure only a single
> > compute engine gets exposed?  Then both the driver internals and uapi
> > > will agree that there's just one CCS (and on which one it is).
> > 
> > If we want to do something fancy with "hotplugging" a new engine later
> > on or whatever, that can be handled in the future series (although as
> > noted on the previous patch, it sounds like these changes might not
> > actually be aligned with the workaround we were trying to address).
> 
> The hotplugging capability is one of the features I was looking
> for, actually.
> 
> I have done some more refactoring in this piece of code in
> upcoming patches.
> 
> Will check, though, if I can do something with compute_fuses(),
> even though, the other cslices are not really fused off (read
> below).
> 
> > > +
> > >   rb_link_node(&engine->uabi_node, prev, p);
> > >   rb_insert_color(&engine->uabi_node, &i915->uabi_engines);
> > >  
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c 
> > > b/drivers/gpu/drm/i915/gt/intel_gt.c
> > > index a425db5ed3a2..e19df4ef47f6 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> > > +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> > > @@ -168,6 +168,14 @@ static void init_unused_rings(struct intel_gt *gt)
> > >   }
> > >  }
> > >  
> > > +static void intel_gt_apply_ccs_mode(struct intel_gt *gt)
> > > +{
> > > + if (!IS_DG2(gt->i915))
> > > + return;
> > > +
> > > + intel_uncore_write(gt->uncore, XEHP_CCS_MODE, 0);
> > 
> > This doesn't look right to me.  A value of 0 means every cslice gets
> > associated with CCS0.
> 
> Yes, that's what I'm trying to do. The behavior I'm looking for
> is this one:
> 
>/*
> ...
>   * With 1 engine (ccs0):
>   *   slice 0, 1, 2, 3: ccs0
>   *
>   * With 2 engines (ccs0, ccs1):
>   *   slice 0, 2: ccs0
>   *   slice 1, 3: ccs1
>   *
>   * With 4 engines (ccs0, ccs1, ccs2, ccs3):
>   *   slice 0: ccs0
>   *   slice 1: ccs1
>   *   slice 2: ccs2
>   *   slice 3: ccs3
> ...
> */
> 
> where the user can configure the mode at runtime, making sure that
> no client is connected to i915.
> 
> But, this needs to be written 
> 
> As we are now forcing mode '1', then all cslices are connected
> with ccs0.

Right --- and that's what I'm pointing out as illegal.  I think that
code comment above was taken out of context from a different RFC series;
that's not an accurate description of the behavior we want here.

First, the above comment is using ccs# to refer to userspace engines,
not hardware engines.  As a simple example, DG2-G11 only ever has a
single CCS which userspace sees as "instance 0" but which is actually
CCS1 at the hardware level.  If you try to follow the comment above when
programming CCS_MODE, you've assigned all of the cslices to a
non-existent engine and assigned no cslices to the CCS engine that
actually exists.  For DG2-G10 (and I think DG2-G12), there are different
combinations of fused-off / not-fused-off engines that will always show
up in userspace as CCS0-CCSn, even if those don't match the hardware
IDs.
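
A small sketch of the numbering mismatch, assuming userspace instances are
handed out 0..n-1 in ascending order of the hardware engines that survived
fusing.  The function name and the 4-engine limit are illustrative, not
taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the hardware CCS index behind userspace instance 'uabi_instance',
 * or -1 if fewer engines are present.
 */
static int sketch_hw_ccs_for_uabi_instance(uint32_t hw_ccs_mask,
					   unsigned int uabi_instance)
{
	unsigned int seen = 0;

	assert(hw_ccs_mask < 16u); /* at most four CCS engines */

	for (int hw = 0; hw < 4; hw++) {
		if (!(hw_ccs_mask & (1u << hw)))
			continue;	/* fused off, gets no instance number */
		if (seen++ == uabi_instance)
			return hw;
	}

	return -1;
}
```

With a hardware mask of 0x2 (the DG2-G11 case described above), userspace
"instance 0" resolves to hardware CCS1, so a CCS_MODE value that points
cslices at hardware CCS0 targets an engine that isn't there.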

Second, the ab

Re: [PATCH v2 2/2] drm/i915/gt: Enable only one CCS for compute workload

2024-02-20 Thread Matt Roper
On Tue, Feb 20, 2024 at 03:35:26PM +0100, Andi Shyti wrote:
> Enable only one CCS engine by default with all the compute slices
> allocated to it.
> 
> While generating the list of UABI engines to be exposed to the
> user, exclude any additional CCS engines beyond the first
> instance.
> 
> This change can be tested with igt i915_query.
> 
> Fixes: d2eae8e98d59 ("drm/i915/dg2: Drop force_probe requirement")
> Signed-off-by: Andi Shyti 
> Cc: Chris Wilson 
> Cc: Joonas Lahtinen 
> Cc: Matt Roper 
> Cc:  # v6.2+
> ---
>  drivers/gpu/drm/i915/gt/intel_engine_user.c |  9 +
>  drivers/gpu/drm/i915/gt/intel_gt.c  | 11 +++
>  drivers/gpu/drm/i915/gt/intel_gt_regs.h |  2 ++
>  drivers/gpu/drm/i915/i915_query.c   |  1 +
>  4 files changed, 23 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c 
> b/drivers/gpu/drm/i915/gt/intel_engine_user.c
> index 833987015b8b..7041acc77810 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_user.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c
> @@ -243,6 +243,15 @@ void intel_engines_driver_register(struct 
> drm_i915_private *i915)
>   if (engine->uabi_class == I915_NO_UABI_CLASS)
>   continue;
>  
> + /*
> +  * Do not list and do not count CCS engines other than the first
> +  */
> + if (engine->uabi_class == I915_ENGINE_CLASS_COMPUTE &&
> + engine->uabi_instance > 0) {
> + i915->engine_uabi_class_count[engine->uabi_class]--;
> + continue;
> + }

Wouldn't it be simpler to just add a workaround to the end of
engine_mask_apply_compute_fuses() if we want to ensure only a single
compute engine gets exposed?  Then both the driver internals and uapi
will agree that there's just one CCS (and on which one it is).

If we want to do something fancy with "hotplugging" a new engine later
on or whatever, that can be handled in the future series (although as
noted on the previous patch, it sounds like these changes might not
actually be aligned with the workaround we were trying to address).

> +
>   rb_link_node(&engine->uabi_node, prev, p);
>   rb_insert_color(&engine->uabi_node, &i915->uabi_engines);
>  
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c 
> b/drivers/gpu/drm/i915/gt/intel_gt.c
> index a425db5ed3a2..e19df4ef47f6 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> @@ -168,6 +168,14 @@ static void init_unused_rings(struct intel_gt *gt)
>   }
>  }
>  
> +static void intel_gt_apply_ccs_mode(struct intel_gt *gt)
> +{
> + if (!IS_DG2(gt->i915))
> + return;
> +
> + intel_uncore_write(gt->uncore, XEHP_CCS_MODE, 0);

This doesn't look right to me.  A value of 0 means every cslice gets
associated with CCS0.  On a DG2-G11 platform, that will flat out break
compute since CCS0 is never present (G11 only has a single CCS and it's
always the hardware's CCS1).  Even on a G10 or G12 this could also break
things depending on the fusing of your card if the hardware CCS0 happens
to be missing.

Also, the register says that we need a field value of 0x7 for each
cslice that's fused off.  By passing 0, we're telling the CCS engine
that it can use cslices that may not actually exist.
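
The two constraints above can be written down as a quick sanity check on a
candidate CCS_MODE value.  This is only a sketch: it assumes a 3-bit field
per cslice and a single 4-bit mask describing which CCS engines/cslices
are present, and the function name is made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Check a candidate mode value against the mask of present CCS/cslices:
 * every present cslice must point at a CCS that exists, and every
 * fused-off cslice must carry 0x7.
 */
static bool sketch_ccs_mode_is_valid(uint32_t mode, uint32_t present_mask)
{
	assert(present_mask < 16u);

	for (unsigned int cslice = 0; cslice < 4; cslice++) {
		uint32_t field = (mode >> (cslice * 3)) & 0x7u;

		if (present_mask & (1u << cslice)) {
			/* A present cslice must target an existing CCS. */
			if (field > 3 || !(present_mask & (1u << field)))
				return false;
		} else if (field != 0x7u) {
			/* A fused-off cslice must be marked unused. */
			return false;
		}
	}

	return true;
}
```

Under these assumptions a mode of 0 only passes when all four are present;
with just ccs1/cslice1 (mask 0x2) it fails on the first field, while 0xfcf
passes.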

> +}
> +
>  int intel_gt_init_hw(struct intel_gt *gt)
>  {
>   struct drm_i915_private *i915 = gt->i915;
> @@ -195,6 +203,9 @@ int intel_gt_init_hw(struct intel_gt *gt)
>  
>   intel_gt_init_swizzling(gt);
>  
> + /* Configure CCS mode */
> + intel_gt_apply_ccs_mode(gt);
> +
>   /*
>* At least 830 can leave some of the unused rings
>* "active" (ie. head != tail) after resume which
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h 
> b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> index cf709f6c05ae..c148113770ea 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> @@ -1605,6 +1605,8 @@
>  #define   GEN12_VOLTAGE_MASK REG_GENMASK(10, 0)
>  #define   GEN12_CAGF_MASKREG_GENMASK(19, 11)
>  
> +#define XEHP_CCS_MODE  _MMIO(0x14804)

Nitpick:  this doesn't seem to be in the proper place and also breaks
the file's convention of using tabs to move over to column 48 for the
definition value.


Matt

> +
>  #define GEN11_GT_INTR_DW(x)  _MMIO(0x190018 + ((x) * 4))
>  #define   GEN11_CSME (31)
>  #define   GEN12_HECI_2   (30)
> diff --git a/drivers/gpu/drm/i915/i915_query.c 
> b/drivers/gpu/drm/

Re: [PATCH v2 1/2] drm/i915/gt: Disable HW load balancing for CCS

2024-02-20 Thread Matt Roper
On Tue, Feb 20, 2024 at 03:35:25PM +0100, Andi Shyti wrote:
> The hardware should not dynamically balance the load between CCS
> engines. Wa_14019159160 recommends disabling it across all
> platforms.
> 
> Fixes: d2eae8e98d59 ("drm/i915/dg2: Drop force_probe requirement")
> Signed-off-by: Andi Shyti 
> Cc: Chris Wilson 
> Cc: Joonas Lahtinen 
> Cc: Matt Roper 
> Cc:  # v6.2+
> ---
>  drivers/gpu/drm/i915/gt/intel_gt_regs.h | 1 +
>  drivers/gpu/drm/i915/gt/intel_workarounds.c | 6 ++
>  2 files changed, 7 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h 
> b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> index 50962cfd1353..cf709f6c05ae 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> @@ -1478,6 +1478,7 @@
>  
>  #define GEN12_RCU_MODE   _MMIO(0x14800)
>  #define   GEN12_RCU_MODE_CCS_ENABLE  REG_BIT(0)
> +#define   XEHP_RCU_MODE_FIXED_SLICE_CCS_MODE REG_BIT(1)
>  
>  #define CHV_FUSE_GT  _MMIO(VLV_GUNIT_BASE + 0x2168)
>  #define   CHV_FGT_DISABLE_SS0(1 << 10)
> diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c 
> b/drivers/gpu/drm/i915/gt/intel_workarounds.c
> index d67d44611c28..9126b37186fc 100644
> --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
> +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
> @@ -2988,6 +2988,12 @@ general_render_compute_wa_init(struct intel_engine_cs 
> *engine, struct i915_wa_li
>   wa_mcr_masked_en(wal, GEN8_HALF_SLICE_CHICKEN1,
>GEN7_PSD_SINGLE_PORT_DISPATCH_ENABLE);
>   }
> +
> + /*
> +  * Wa_14019159160: disable the CCS load balancing
> +  * indiscriminately for all the platforms

The database's description of this workaround is a bit confusing since
it's been modified a few times, but if I'm reading it correctly it
doesn't sound like this is what it's asking us to do.  What I see says
that load balancing shouldn't be allowed specifically while the RCS is
active.  If the RCS is sitting idle, I believe you're free to use as
many CCS engines as you like, with load balancing still active.

We already have other workarounds that prevent different address spaces
from executing on the RCS/CCS engines at the same time, so the part
about "same address space" in the description should already be
satisfied.  It sounds like the issues now are if 2+ CCS's are in use and
something new shows up to run on the previously-idle RCS, or if
something's already running on the RCS and 1 CCS, and something new
shows up to run on an additional CCS.  The workaround details make it
sound like it's supposed to be the GuC's responsibility to prevent the
new work from getting scheduled onto the additional engine while we're
already in one of those two situations, so I don't see anything asking
us to change the hardware-level load balance enable/disable (actually
the spec specifically tells us *not* to do this).  Aren't we supposed to
be just setting a GuC workaround flag for this?


Matt

> +  */
> + wa_masked_en(wal, GEN12_RCU_MODE, XEHP_RCU_MODE_FIXED_SLICE_CCS_MODE);
>  }
>  
>  static void
> -- 
> 2.43.0
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


Re: [PATCH 1/2] drm/i915/gt: Disable HW load balancing for CCS

2024-02-15 Thread Matt Roper
On Thu, Feb 15, 2024 at 02:59:23PM +0100, Andi Shyti wrote:
> The hardware should not dynamically balance the load between CCS
> engines. Wa_16016805146 recommends disabling it across all

Is this the right workaround number?  When I check the database, this
workaround was rejected on both DG2-G10 and DG2-G11, and doesn't even
have an entry for DG2-G12.

There are other workarounds that sound somewhat related to load
balancing (e.g., part 3 of Wa_14019159160), but what's asked there is
more involved than just setting one register bit and conflicts a bit
with the second patch of this series.


Matt

> platforms.
> 
> Fixes: d2eae8e98d59 ("drm/i915/dg2: Drop force_probe requirement")
> Signed-off-by: Andi Shyti 
> Cc: Chris Wilson 
> Cc: Joonas Lahtinen 
> Cc: Matt Roper 
> Cc:  # v6.2+
> ---
>  drivers/gpu/drm/i915/gt/intel_gt_regs.h | 1 +
>  drivers/gpu/drm/i915/gt/intel_workarounds.c | 6 ++
>  2 files changed, 7 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h 
> b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> index 50962cfd1353..cf709f6c05ae 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> @@ -1478,6 +1478,7 @@
>  
>  #define GEN12_RCU_MODE   _MMIO(0x14800)
>  #define   GEN12_RCU_MODE_CCS_ENABLE  REG_BIT(0)
> +#define   XEHP_RCU_MODE_FIXED_SLICE_CCS_MODE REG_BIT(1)
>  
>  #define CHV_FUSE_GT  _MMIO(VLV_GUNIT_BASE + 0x2168)
>  #define   CHV_FGT_DISABLE_SS0(1 << 10)
> diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c 
> b/drivers/gpu/drm/i915/gt/intel_workarounds.c
> index d67d44611c28..7f42c8015f71 100644
> --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
> +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
> @@ -2988,6 +2988,12 @@ general_render_compute_wa_init(struct intel_engine_cs 
> *engine, struct i915_wa_li
>   wa_mcr_masked_en(wal, GEN8_HALF_SLICE_CHICKEN1,
>GEN7_PSD_SINGLE_PORT_DISPATCH_ENABLE);
>   }
> +
> + /*
> +  * Wa_16016805146: disable the CCS load balancing
> +  * indiscriminately for all the platforms
> +  */
> + wa_masked_en(wal, GEN12_RCU_MODE, XEHP_RCU_MODE_FIXED_SLICE_CCS_MODE);
>  }
>  
>  static void
> -- 
> 2.43.0
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


Re: [PATCH] drm/xe: Remove PVC from xe_wa kunit tests

2024-01-23 Thread Matt Roper
On Mon, Jan 22, 2024 at 07:12:42PM -0800, Lucas De Marchi wrote:
> Since the PCI IDs for PVC were added to the xe driver, the xe_wa tests

This first line doesn't seem to be worded right.  I think you meant
either "weren't added" or "were only added to topic/xe-for-CI."

Assuming you can reword that,

Reviewed-by: Matt Roper 

> should not try to create a fake PVC device since they can't find
> the right PCI ID. Fix bugs when running kunit:
> 
>   # xe_wa_gt: ASSERTION FAILED at 
> drivers/gpu/drm/xe/tests/xe_wa_test.c:111
>   Expected ret == 0, but
>   ret == -19 (0xffed)
>   [FAILED] PVC (B0)
>   # xe_wa_gt: ASSERTION FAILED at 
> drivers/gpu/drm/xe/tests/xe_wa_test.c:111
>   Expected ret == 0, but
>   ret == -19 (0xffed)
>   [FAILED] PVC (B1)
>   # xe_wa_gt: ASSERTION FAILED at 
> drivers/gpu/drm/xe/tests/xe_wa_test.c:111
>   Expected ret == 0, but
>   ret == -19 (0xffed)
>   [FAILED] PVC (C0)
> 
> Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
> Signed-off-by: Lucas De Marchi 
> ---
>  drivers/gpu/drm/xe/tests/xe_wa_test.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/tests/xe_wa_test.c 
> b/drivers/gpu/drm/xe/tests/xe_wa_test.c
> index 439477593faf..44570d888355 100644
> --- a/drivers/gpu/drm/xe/tests/xe_wa_test.c
> +++ b/drivers/gpu/drm/xe/tests/xe_wa_test.c
> @@ -69,9 +69,6 @@ static const struct platform_test_case cases[] = {
>   SUBPLATFORM_CASE(DG2, G10, C0),
>   SUBPLATFORM_CASE(DG2, G11, B1),
>   SUBPLATFORM_CASE(DG2, G12, A1),
> - PLATFORM_CASE(PVC, B0),
> - PLATFORM_CASE(PVC, B1),
> - PLATFORM_CASE(PVC, C0),
>   GMDID_CASE(METEORLAKE, 1270, A0, 1300, A0),
>   GMDID_CASE(METEORLAKE, 1271, A0, 1300, A0),
>   GMDID_CASE(LUNARLAKE, 2004, A0, 2000, A0),
> -- 
> 2.43.0
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


Re: [PATCH] drm/i915/mtl: Wake GT before sending H2G message

2024-01-18 Thread Matt Roper
On Thu, Jan 18, 2024 at 05:21:23PM -0800, Belgaumkar, Vinay wrote:
> 
> On 1/18/2024 3:50 PM, Matt Roper wrote:
> > On Thu, Jan 18, 2024 at 03:17:28PM -0800, Vinay Belgaumkar wrote:
> > > Instead of waiting until the interrupt reaches GuC, we can grab a
> > > forcewake while triggering the H2G interrupt. GEN11_GUC_HOST_INTERRUPT
> > > is inside an "always on" domain with respect to RC6. However, there
> > A bit of a nitpick, but technically "always on" is a description of GT
> > register ranges that never get powered down.  GEN11_GUC_HOST_INTERRUPT
> > isn't inside the GT at all, but rather is an sgunit register and thus
> > isn't affected by forcewake.  This is just a special case where the
> > sgunit register forwards a message back to the GT's GuC, and the
> > workaround wants us to make sure the GT is awake before that message
> > gets there.
> True, can modify the description to reflect this.
> > 
> > > could be some delays when platform is entering/exiting some higher
> > > level platform sleep states and a H2G is triggered. A forcewake
> > > ensures those sleep states have been fully exited and further
> > > processing occurs as expected.
> > Based on this description, is adding implicit forcewake to this register
> > really enough?  Implicit forcewake powers up before a read/write, but
> > also allows it to power back down as soon as the MMIO operation is
> > complete.  If the GuC is a bit slow to notice the interrupt, then we
> > could wind up with a sequence like
> > 
> >   - Driver grabs forcewake and GT powers up
> >   - Driver writes 0x1901f0 to trigger GuC interrupt
> >   - Driver releases forcewake and GT powers down
> >   - GuC notices interrupt (or maybe fails to notice it because the GT
> > powered down before it had a chance to process it?)
> > 
> > which I'm guessing isn't actually going to satisfy this workaround.  Do
> > we actually need to keep the GT awake not just through the register
> > operation, but also through the GuC's processing of the interrupt?  If
> > so, then we probably want to do an explicit forcewake get/put to ensure
> > the hardware stays powered up long enough.
> 
> The issue being addressed here is not GT entering C6, but the higher
> platform sleep states. Once we force wake GT while writing to the H2G
> register, that should bring us out of sleep. After clearing the forcewake
> (which would happen after the write for 0x1901f0 goes through), we still
> have C6 hysteresis and the hysteresis counters for the higher platform sleep
> states which should give GuC enough time to process the interrupt before we
> enter C6 and then subsequently these higher sleep states.

Okay, makes sense.  Hopefully they finalize the workaround details and
documentation soon, but this looks reasonable with the information we
have at the moment.

Reviewed-by: Matt Roper 


Matt

> 
> Thanks,
> 
> Vinay.
> 
> > 
> > 
> > Matt
> > 
> > > This will have an official WA soon so adding a FIXME in the comments.
> > > 
> > > Signed-off-by: Vinay Belgaumkar 
> > > ---
> > >   drivers/gpu/drm/i915/intel_uncore.c | 5 -
> > >   1 file changed, 4 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/intel_uncore.c 
> > > b/drivers/gpu/drm/i915/intel_uncore.c
> > > index dfefad5a5fec..121458a31886 100644
> > > --- a/drivers/gpu/drm/i915/intel_uncore.c
> > > +++ b/drivers/gpu/drm/i915/intel_uncore.c
> > > @@ -1800,7 +1800,10 @@ static const struct intel_forcewake_range 
> > > __mtl_fw_ranges[] = {
> > >   GEN_FW_RANGE(0x24000, 0x2ffff, 0), /*
> > >   0x24000 - 0x2407f: always on
> > >   0x24080 - 0x2: reserved */
> > > - GEN_FW_RANGE(0x3, 0x3, FORCEWAKE_GT)
> > > + GEN_FW_RANGE(0x3, 0x3, FORCEWAKE_GT),
> > > + GEN_FW_RANGE(0x4, 0x1901ec, 0),
> > > + GEN_FW_RANGE(0x1901f0, 0x1901f0, FORCEWAKE_GT)
> > > + /* FIXME: WA to wake GT while triggering H2G */
> > >   };
> > >   /*
> > > -- 
> > > 2.38.1
> > > 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


Re: [PATCH] drm/i915/mtl: Wake GT before sending H2G message

2024-01-18 Thread Matt Roper
On Thu, Jan 18, 2024 at 03:17:28PM -0800, Vinay Belgaumkar wrote:
> Instead of waiting until the interrupt reaches GuC, we can grab a
> forcewake while triggering the H2G interrupt. GEN11_GUC_HOST_INTERRUPT
> is inside an "always on" domain with respect to RC6. However, there

A bit of a nitpick, but technically "always on" is a description of GT
register ranges that never get powered down.  GEN11_GUC_HOST_INTERRUPT
isn't inside the GT at all, but rather is an sgunit register and thus
isn't affected by forcewake.  This is just a special case where the
sgunit register forwards a message back to the GT's GuC, and the
workaround wants us to make sure the GT is awake before that message
gets there.

> could be some delays when platform is entering/exiting some higher
> level platform sleep states and a H2G is triggered. A forcewake
> ensures those sleep states have been fully exited and further
> processing occurs as expected.

Based on this description, is adding implicit forcewake to this register
really enough?  Implicit forcewake powers up before a read/write, but
also allows it to power back down as soon as the MMIO operation is
complete.  If the GuC is a bit slow to notice the interrupt, then we
could wind up with a sequence like

 - Driver grabs forcewake and GT powers up
 - Driver writes 0x1901f0 to trigger GuC interrupt
 - Driver releases forcewake and GT powers down
 - GuC notices interrupt (or maybe fails to notice it because the GT
   powered down before it had a chance to process it?)

which I'm guessing isn't actually going to satisfy this workaround.  Do
we actually need to keep the GT awake not just through the register
operation, but also through the GuC's processing of the interrupt?  If
so, then we probably want to do an explicit forcewake get/put to ensure
the hardware stays powered up long enough.


Matt

> 
> This will have an official WA soon so adding a FIXME in the comments.
> 
> Signed-off-by: Vinay Belgaumkar 
> ---
>  drivers/gpu/drm/i915/intel_uncore.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_uncore.c 
> b/drivers/gpu/drm/i915/intel_uncore.c
> index dfefad5a5fec..121458a31886 100644
> --- a/drivers/gpu/drm/i915/intel_uncore.c
> +++ b/drivers/gpu/drm/i915/intel_uncore.c
> @@ -1800,7 +1800,10 @@ static const struct intel_forcewake_range 
> __mtl_fw_ranges[] = {
>   GEN_FW_RANGE(0x24000, 0x2ffff, 0), /*
>   0x24000 - 0x2407f: always on
>   0x24080 - 0x2: reserved */
> - GEN_FW_RANGE(0x3, 0x3, FORCEWAKE_GT)
> + GEN_FW_RANGE(0x3, 0x3, FORCEWAKE_GT),
> + GEN_FW_RANGE(0x4, 0x1901ec, 0),
> + GEN_FW_RANGE(0x1901f0, 0x1901f0, FORCEWAKE_GT)
> + /* FIXME: WA to wake GT while triggering H2G */
>  };
>  
>  /*
> -- 
> 2.38.1
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


Re: [PATCH v2 5/5] drm/xe: Enable 32bits build

2024-01-17 Thread Matt Roper
On Tue, Jan 16, 2024 at 09:40:50AM -0800, Lucas De Marchi wrote:
> Now that all the issues with 32bits are fixed, enable it again.
> 
> Signed-off-by: Lucas De Marchi 

I didn't test locally, but assuming you confirmed all the warnings are
gone now,

Reviewed-by: Matt Roper 

> ---
>  drivers/gpu/drm/xe/Kconfig | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/xe/Kconfig b/drivers/gpu/drm/xe/Kconfig
> index 1b57ae38210d..1b0ef91a5d2c 100644
> --- a/drivers/gpu/drm/xe/Kconfig
> +++ b/drivers/gpu/drm/xe/Kconfig
> @@ -1,7 +1,7 @@
>  # SPDX-License-Identifier: GPL-2.0-only
>  config DRM_XE
>   tristate "Intel Xe Graphics"
> - depends on DRM && PCI && MMU && (m || (y && KUNIT=y)) && 64BIT
> + depends on DRM && PCI && MMU && (m || (y && KUNIT=y))
>   select INTERVAL_TREE
>   # we need shmfs for the swappable backing store, and in particular
>   # the shmem_readpage() which depends upon tmpfs
> -- 
> 2.40.1
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


Re: [PATCH v2 4/5] drm/xe: Fix cast on trace variable

2024-01-17 Thread Matt Roper
On Tue, Jan 16, 2024 at 09:40:49AM -0800, Lucas De Marchi wrote:
> Cast the pointer to unsigned long and let it be implicitly extended to
> u64. This fixes the build on 32bits arch.
> 
> Cc: Matthew Brost 
> Cc: Niranjana Vishwanathapura 
> Cc: Rodrigo Vivi 
> Signed-off-by: Lucas De Marchi 

Reviewed-by: Matt Roper 
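
For anyone wondering why the intermediate cast matters: on a 32-bit build
a pointer is narrower than u64, so a direct (u64) cast of a pointer warns
about a cast to an integer of a different size.  Going through a
pointer-sized integer first and widening afterwards avoids that.  A
user-space sketch of the same idea follows; ptr_to_u64() is a made-up
helper name, and uintptr_t plays the role that unsigned long plays in the
kernel.

```c
#include <assert.h>
#include <stdint.h>

/* The widening below only makes sense if pointers fit in 64 bits. */
static_assert(sizeof(uintptr_t) <= sizeof(uint64_t),
	      "pointer-sized integers must fit in 64 bits");

/* Hypothetical helper: convert a pointer to a 64-bit value for logging. */
static uint64_t ptr_to_u64(const void *p)
{
	/*
	 * First cast to a pointer-sized integer (no size mismatch), then
	 * widen to 64 bits. The widening is a no-op on 64-bit builds.
	 */
	return (uint64_t)(uintptr_t)p;
}
```

The stored value is only meant as an identifier in trace output, so the
zero-extension on 32-bit is harmless.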

> ---
>  drivers/gpu/drm/xe/xe_trace.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_trace.h b/drivers/gpu/drm/xe/xe_trace.h
> index 95163c303f3e..e4e7262191ad 100644
> --- a/drivers/gpu/drm/xe/xe_trace.h
> +++ b/drivers/gpu/drm/xe/xe_trace.h
> @@ -31,7 +31,7 @@ DECLARE_EVENT_CLASS(xe_gt_tlb_invalidation_fence,
>),
>  
>   TP_fast_assign(
> -__entry->fence = (u64)fence;
> +__entry->fence = (unsigned long)fence;
>  __entry->seqno = fence->seqno;
>  ),
>  
> -- 
> 2.40.1
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


Re: [PATCH v2 3/5] drm/xe/display: Avoid calling readq()

2024-01-17 Thread Matt Roper
On Tue, Jan 16, 2024 at 09:40:48AM -0800, Lucas De Marchi wrote:
> readq() is not available in 32bits. iosys-map already has the logic in
> place to use read u64 in all cases, so simply add a helper variable for
> using that.
> 
> Fixes: 44e694958b95 ("drm/xe/display: Implement display support")
> Signed-off-by: Lucas De Marchi 
> ---
>  .../gpu/drm/xe/compat-i915-headers/gem/i915_gem_object.h   | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_object.h 
> b/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_object.h
> index 5f19550cc845..6739dadaf1a9 100644
> --- a/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_object.h
> +++ b/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_object.h
> @@ -7,6 +7,7 @@
>  #define _I915_GEM_OBJECT_H_
>  
>  #include 
> +#include 
>  
>  #include "xe_bo.h"
>  
> @@ -36,6 +37,7 @@ static inline int i915_gem_object_read_from_page(struct 
> xe_bo *bo,
>  {
>   struct ttm_bo_kmap_obj map;
>   void *virtual;
> + struct iosys_map vaddr;
>   bool is_iomem;
>   int ret;
>  
> @@ -52,10 +54,11 @@ static inline int i915_gem_object_read_from_page(struct 
> xe_bo *bo,
>   ofs &= ~PAGE_MASK;
>   virtual = ttm_kmap_obj_virtual(, _iomem);
>   if (is_iomem)
> - *ptr = readq((void __iomem *)(virtual + ofs));
> + iosys_map_set_vaddr_iomem(, (void __iomem *)(virtual));

Should we just use a memcpy_fromio (and memcpy in the else branch) and
pass the size actually requested rather than hardcoding it to a u64?  At
the moment the only callsite happens to want a u64, and thus the Xe
compat header has an XE_WARN_ON that complains if any other size is
requested, but in theory this function is supposed to be general purpose
and take any size.
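
Roughly what I have in mind, as a user-space sketch rather than a drop-in
patch: read_from_mapping() and model_memcpy_fromio() are made-up names,
the latter standing in for the kernel's memcpy_fromio(), and the explicit
map_len bounds check is an assumption of the sketch, not something the
current helper does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical user-space stand-in for the kernel's memcpy_fromio(). */
static void model_memcpy_fromio(void *dst, const volatile void *src, size_t n)
{
	const volatile uint8_t *s = src;
	uint8_t *d = dst;

	while (n--)
		*d++ = *s++;
}

/*
 * Copy 'size' bytes starting at 'ofs' out of a mapping of 'map_len' bytes.
 * The caller's size is honoured instead of hardcoding a u64 read.
 * Returns 0 on success, -1 if the request does not fit in the mapping.
 */
static int read_from_mapping(const void *virt, size_t map_len, bool is_iomem,
			     size_t ofs, void *dst, size_t size)
{
	assert(virt && dst);

	if (ofs > map_len || size > map_len - ofs)
		return -1;

	if (is_iomem)
		model_memcpy_fromio(dst, (const volatile uint8_t *)virt + ofs, size);
	else
		memcpy(dst, (const uint8_t *)virt + ofs, size);

	return 0;
}
```

With something like this the XE_WARN_ON about the size could go away, since
a u32 or u64 caller would both be served by the same path.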


Matt

>   else
> - *ptr = *(u64 *)(virtual + ofs);
> + iosys_map_set_vaddr(, virtual);
>  
> + *ptr = iosys_map_rd(, ofs, u64);
>   ttm_bo_kunmap();
>  out_unlock:
>   xe_bo_unlock(bo);
> -- 
> 2.40.1
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


Re: [PATCH v2 2/5] drm/xe/mmio: Cast to u64 when printing

2024-01-17 Thread Matt Roper
On Tue, Jan 16, 2024 at 09:40:47AM -0800, Lucas De Marchi wrote:
> resource_size_t uses %pa format in printk since the size varies
> depending on build options. However to keep the io_size/physical_size
> addition in the same call we can't pass the address without adding yet
> another variable in these function. Simply cast it to u64 and keep using
> %llx.
> 
> Fixes: 286089ce6929 ("drm/xe: Improve vram info debug printing")
> Cc: Oak Zeng 
> Cc: Michael J. Ruhl 
> Cc: Matthew Brost 
> Cc: Rodrigo Vivi 
> Signed-off-by: Lucas De Marchi 

Reviewed-by: Matt Roper 
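
To illustrate the approach in the commit message: casting one operand to a
64-bit type widens the whole sum, so the result always matches %llx no
matter how wide resource_size_t is on a given build.  A user-space sketch,
where res_size_t is a made-up 32-bit stand-in for resource_size_t and
format_range() is a hypothetical helper (user-space printf has no %pa, so
both ends are printed with %llx here):

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for resource_size_t on a build where it is 32 bits. */
typedef uint32_t res_size_t;

/* Format "[start-end]" in hex, with end = start + size computed in 64 bits. */
static int format_range(char *buf, size_t len, res_size_t start, res_size_t size)
{
	assert(buf && len > 0);

	/* The cast widens the addition, so it cannot wrap at 32 bits. */
	unsigned long long end = start + (unsigned long long)size;

	return snprintf(buf, len, "[%llx-%llx]", (unsigned long long)start, end);
}
```

For example, start 0xfff00000 with size 0x200000 prints an end of
0x100100000 instead of wrapping around to 0x100000.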

> ---
>  drivers/gpu/drm/xe/xe_mmio.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_mmio.c b/drivers/gpu/drm/xe/xe_mmio.c
> index c8c5d74b6e90..5f6b53ea5528 100644
> --- a/drivers/gpu/drm/xe/xe_mmio.c
> +++ b/drivers/gpu/drm/xe/xe_mmio.c
> @@ -272,8 +272,8 @@ int xe_mmio_probe_vram(struct xe_device *xe)
>   drm_info(>drm, "VRAM[%u, %u]: Actual physical size %pa, 
> usable size exclude stolen %pa, CPU accessible size %pa\n", id,
>tile->id, >mem.vram.actual_physical_size, 
> >mem.vram.usable_size, >mem.vram.io_size);
>   drm_info(>drm, "VRAM[%u, %u]: DPA range: [%pa-%llx], io 
> range: [%pa-%llx]\n", id, tile->id,
> -  >mem.vram.dpa_base, tile->mem.vram.dpa_base + 
> tile->mem.vram.actual_physical_size,
> -  >mem.vram.io_start, tile->mem.vram.io_start + 
> tile->mem.vram.io_size);
> +  >mem.vram.dpa_base, tile->mem.vram.dpa_base + 
> (u64)tile->mem.vram.actual_physical_size,
> +  >mem.vram.io_start, tile->mem.vram.io_start + 
> (u64)tile->mem.vram.io_size);
>  
>   /* calculate total size using tile size to get the correct HW 
> sizing */
>   total_size += tile_size;
> -- 
> 2.40.1
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


Re: [PATCH v2 1/5] drm/xe: Use _ULL for u64 division

2024-01-17 Thread Matt Roper
On Tue, Jan 16, 2024 at 09:40:46AM -0800, Lucas De Marchi wrote:
> Use DIV_ROUND_UP_ULL() so it also works on 32bit build.
> 
> Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
> Signed-off-by: Lucas De Marchi 

Reviewed-by: Matt Roper 
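
For context on why the _ULL variant is needed: on 32-bit builds a plain
'/' with a 64-bit dividend makes the compiler emit a call to a libgcc
helper (e.g. __udivdi3) that the kernel does not provide, so
DIV_ROUND_UP() on a u64 fails to link.  DIV_ROUND_UP_ULL() goes through
do_div() instead.  The arithmetic itself is unchanged; a user-space sketch
of it follows (div_round_up_u64() is a made-up name, and plain division is
fine here because user space links against libgcc):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round-up division of a 64-bit value by a 32-bit divisor.
 * Assumes n + d - 1 does not overflow 64 bits.
 */
static uint64_t div_round_up_u64(uint64_t n, uint32_t d)
{
	assert(d != 0);
	return (n + d - 1) / d;
}
```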

> ---
>  drivers/gpu/drm/xe/xe_device.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> index a94c0b27f04e..5f5e3c7132d3 100644
> --- a/drivers/gpu/drm/xe/xe_device.c
> +++ b/drivers/gpu/drm/xe/xe_device.c
> @@ -624,7 +624,7 @@ void xe_device_wmb(struct xe_device *xe)
>  u32 xe_device_ccs_bytes(struct xe_device *xe, u64 size)
>  {
>   return xe_device_has_flat_ccs(xe) ?
> - DIV_ROUND_UP(size, NUM_BYTES_PER_CCS_BYTE(xe)) : 0;
> + DIV_ROUND_UP_ULL(size, NUM_BYTES_PER_CCS_BYTE(xe)) : 0;
>  }
>  
>  bool xe_device_mem_access_ongoing(struct xe_device *xe)
> -- 
> 2.40.1
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


Re: [Intel-gfx] [PATCH v3 1/1] drm/i915/pxp: Add missing tag for Wa_14019159160

2023-11-28 Thread Matt Roper
On Mon, Nov 27, 2023 at 12:11:50PM -0800, Alan Previn wrote:
> Add missing tag for "Wa_14019159160 - Case 2" (for existing
> PXP code that ensures run alone mode bit is set to allow
> PxP-decryption).
> 
>  v3: - Check targeted platforms using IP_VAL. (John Harrison)
>  v2: - Fix WA id number (John Harrison).
>  - Improve comments and code to be specific
>for the targeted platforms (John Harrison)
> 
> Signed-off-by: Alan Previn 
> ---
>  drivers/gpu/drm/i915/gt/intel_lrc.c | 8 +---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c 
> b/drivers/gpu/drm/i915/gt/intel_lrc.c
> index 7c367ba8d9dc..1152cf25d578 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> @@ -863,10 +863,12 @@ static bool ctx_needs_runalone(const struct 
> intel_context *ce)
>   bool ctx_is_protected = false;
>  
>   /*
> -  * On MTL and newer platforms, protected contexts require setting
> -  * the LRC run-alone bit or else the encryption will not happen.
> +  * Wa_14019159160 - Case 2: mtl
> +  * On some platforms, protected contexts require setting
> +  * the LRC run-alone bit or else the encryption/decryption will not 
> happen.
> +  * NOTE: Case 2 only applies to PXP use-case of said workaround.
>*/
> - if (GRAPHICS_VER_FULL(ce->engine->i915) >= IP_VER(12, 70) &&
> + if (GRAPHICS_VER_FULL(ce->engine->i915) == IP_VER(12, 70) &&

The workaround database lists this as being needed on both 12.70 and
12.71.  Should this be a

IS_GFX_GT_IP_RANGE(gt, IP_VER(12, 70), IP_VER(12, 71))

check instead?

The workaround is also listed in the database as applying to DG2; is
this "case 2" subset of the workaround not relevant to that platform?
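
To spell out the difference between the checks, here is a small user-space
sketch.  IP_VER() follows the same packing idea as the i915 macro (version
in the high bits, release in the low byte); ip_in_range() is a made-up
stand-in for IS_GFX_GT_IP_RANGE(), which takes a gt rather than a raw
version and is assumed here to be inclusive at both ends.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Version in the high bits, release in the low byte. */
#define IP_VER(ver, rel) ((uint32_t)(ver) << 8 | (uint32_t)(rel))

/* Hypothetical stand-in for IS_GFX_GT_IP_RANGE(); both bounds inclusive. */
static bool ip_in_range(uint32_t ip, uint32_t from, uint32_t until)
{
	assert(from <= until);
	return ip >= from && ip <= until;
}

/* The check as written in this patch: matches 12.70 only. */
static bool patch_check(uint32_t ip)
{
	return ip == IP_VER(12, 70);
}

/* The range check suggested above: matches 12.70 and 12.71. */
static bool range_check(uint32_t ip)
{
	return ip_in_range(ip, IP_VER(12, 70), IP_VER(12, 71));
}
```

Neither form matches DG2 (12.55), so if "case 2" does turn out to apply
there, it would need its own condition.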


Matt

>   (ce->engine->class == COMPUTE_CLASS || ce->engine->class == 
> RENDER_CLASS)) {
>   rcu_read_lock();
>   gem_ctx = rcu_dereference(ce->gem_context);
> 
> base-commit: 5429d55de723544dfc0630cf39d96392052b27a1
> -- 
> 2.39.0
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


Re: [PATCH v2] drm/i915: Flush WC GGTT only on required platforms

2023-10-13 Thread Matt Roper
On Fri, Oct 13, 2023 at 03:44:39PM +0200, Nirmoy Das wrote:
> gen8_ggtt_invalidate() is only needed for limited set of platforms
> where GGTT is mapped as WC otherwise this can cause unwanted
> side-effects on XE_HP platforms where GFX_FLSH_CNTL_GEN6 is not
> valid.
> 
> v2: Add a func to detect wc ggtt detection (Ville)
> 
> Fixes: d2eae8e98d59 ("drm/i915/dg2: Drop force_probe requirement")
> Cc: Rodrigo Vivi 
> Cc: Tvrtko Ursulin 
> Cc: Joonas Lahtinen 
> Cc: Jani Nikula 
> Cc: Jonathan Cavitt 
> Cc: John Harrison 
> Cc: Andi Shyti 
> Cc: Ville Syrjälä 
> Cc:  # v6.2+
> Suggested-by: Matt Roper 
> Signed-off-by: Nirmoy Das 
> Acked-by: Andi Shyti 

Reviewed-by: Matt Roper 

Interestingly, bspec 151 indicates that we probably shouldn't have been
using a CPU:WC mapping for the GGTT on gen9bc platforms either (i.e.,
the GTT part of the GTTMMADR has the same "64-bits or less" restriction
listed as later platforms).  But we've been using WC without issue for
the last 8 years, so I guess it's not worth changing it now.


Matt

> ---
>  drivers/gpu/drm/i915/gt/intel_ggtt.c | 35 +++-
>  1 file changed, 24 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c 
> b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> index 4d7d88b92632..401667f83f96 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> @@ -195,6 +195,21 @@ void gen6_ggtt_invalidate(struct i915_ggtt *ggtt)
>   spin_unlock_irq(>lock);
>  }
>  
> +static bool needs_wc_ggtt_mapping(struct drm_i915_private *i915)
> +{
> + /*
> +  * On BXT+/ICL+ writes larger than 64 bit to the GTT pagetable range
> +  * will be dropped. For WC mappings in general we have 64 byte burst
> +  * writes when the WC buffer is flushed, so we can't use it, but have to
> +  * resort to an uncached mapping. The WC issue is easily caught by the
> +  * readback check when writing GTT PTE entries.
> +  */
> + if (!IS_GEN9_LP(i915) && GRAPHICS_VER(i915) < 11)
> + return true;
> +
> + return false;
> +}
> +
>  static void gen8_ggtt_invalidate(struct i915_ggtt *ggtt)
>  {
>   struct intel_uncore *uncore = ggtt->vm.gt->uncore;
> @@ -202,8 +217,12 @@ static void gen8_ggtt_invalidate(struct i915_ggtt *ggtt)
>   /*
>* Note that as an uncached mmio write, this will flush the
>* WCB of the writes into the GGTT before it triggers the invalidate.
> +  *
> +  * Only perform this when GGTT is mapped as WC, see ggtt_probe_common().
>*/
> - intel_uncore_write_fw(uncore, GFX_FLSH_CNTL_GEN6, GFX_FLSH_CNTL_EN);
> + if (needs_wc_ggtt_mapping(ggtt->vm.i915))
> + intel_uncore_write_fw(uncore, GFX_FLSH_CNTL_GEN6,
> +   GFX_FLSH_CNTL_EN);
>  }
>  
>  static void guc_ggtt_invalidate(struct i915_ggtt *ggtt)
> @@ -1126,17 +1145,11 @@ static int ggtt_probe_common(struct i915_ggtt *ggtt, 
> u64 size)
>   GEM_WARN_ON(pci_resource_len(pdev, GEN4_GTTMMADR_BAR) != 
> gen6_gttmmadr_size(i915));
>   phys_addr = pci_resource_start(pdev, GEN4_GTTMMADR_BAR) + 
> gen6_gttadr_offset(i915);
>  
> - /*
> -  * On BXT+/ICL+ writes larger than 64 bit to the GTT pagetable range
> -  * will be dropped. For WC mappings in general we have 64 byte burst
> -  * writes when the WC buffer is flushed, so we can't use it, but have to
> -  * resort to an uncached mapping. The WC issue is easily caught by the
> -  * readback check when writing GTT PTE entries.
> -  */
> - if (IS_GEN9_LP(i915) || GRAPHICS_VER(i915) >= 11)
> - ggtt->gsm = ioremap(phys_addr, size);
> - else
> + if (needs_wc_ggtt_mapping(i915))
>   ggtt->gsm = ioremap_wc(phys_addr, size);
> + else
> + ggtt->gsm = ioremap(phys_addr, size);
> +
>   if (!ggtt->gsm) {
>   drm_err(>drm, "Failed to map the ggtt page table\n");
>   return -ENOMEM;
> -- 
> 2.41.0
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


Re: [Intel-gfx] [RFC PATCH] drm/i915/gt: Do not treat MCR locking timeouts as errors

2023-10-04 Thread Matt Roper
On Wed, Oct 04, 2023 at 10:58:32PM +0200, Andi Shyti wrote:
> Hi Matt,
> 
> > > > > > > The MCR steering semaphore is a shared lock entry between i915
> > > > > > > and various firmware components.
> > > > > > > 
> > > > > > > > Getting the lock might synchronize on some shared resources.
> > > > > > > Sometimes though, it might happen that the firmware forgets to
> > > > > > > unlock causing unnecessary noise in the driver which keeps doing
> > > > > > > what was supposed to do, ignoring the problem.
> > > > > > > 
> > > > > > > Do not consider this failure as an error, but just print a debug
> > > > > > > message stating that the MCR locking has been skipped.
> > > > > > > 
> > > > > > > On the driver side we still have spinlocks that make sure that
> > > > > > > the access to the resources is serialized.
> > > > > > > 
> > > > > > > Signed-off-by: Andi Shyti 
> > > > > > > Cc: Jonathan Cavitt 
> > > > > > > Cc: Matt Roper 
> > > > > > > Cc: Nirmoy Das 
> > > > > > > ---
> > > > > > >drivers/gpu/drm/i915/gt/intel_gt_mcr.c | 6 ++
> > > > > > >1 file changed, 2 insertions(+), 4 deletions(-)
> > > > > > > 
> > > > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_gt_mcr.c 
> > > > > > > b/drivers/gpu/drm/i915/gt/intel_gt_mcr.c
> > > > > > > index 326c2ed1d99b..51eb693df39b 100644
> > > > > > > --- a/drivers/gpu/drm/i915/gt/intel_gt_mcr.c
> > > > > > > +++ b/drivers/gpu/drm/i915/gt/intel_gt_mcr.c
> > > > > > > @@ -395,10 +395,8 @@ void intel_gt_mcr_lock(struct intel_gt *gt, 
> > > > > > > unsigned long *flags)
> > > > > > >* would indicate some hardware/firmware is misbehaving 
> > > > > > > and not
> > > > > > >* releasing it properly.
> > > > > > >*/
> > > > > > > - if (err == -ETIMEDOUT) {
> > > > > > > - gt_err_ratelimited(gt, "hardware MCR steering semaphore 
> > > > > > > timed out");
> > > > > > > - add_taint_for_CI(gt->i915, TAINT_WARN);  /* CI is now 
> > > > > > > unreliable */
> > > > > > > - }
> > > > > > > + if (err == -ETIMEDOUT)
> > > > > > > + gt_dbg(gt, "hardware MCR steering semaphore timed out");
> > > > > > >}
> > > > > > >/**
> > > > > > Are we sure this does not warrant a level higher than dbg, such as
> > > > > > notice/warn?
> > > > > We might make it a warn, but this doesn't change much the economy
> > > > > of the driver as we will keep doing what we were supposed to do.
> > > > > 
> > > > > > Because how can we be sure the two entities will not stomp on
> > > > > > each other toes if we failed to obtain lock?
> > > > > So far, in all the research I've done, no one looks like using
> > > > > MCR lock, but yet someone is stuck in it.
> > > > 
> > > > > If someone has the lock then that someone thinks they are using it. Just
> > > > > because you can't see what some piece of IFWI is doing doesn't mean it
> > > > > isn't doing it. And if it is a genuinely missing unlock then it needs to be
> > > > > tracked down and fixed with an IFWI update otherwise the system is going to
> > > > > be unstable from that point on.
> > > 
> > > But I'm not changing here the behavior of the driver. The driver
> > > will keep doing what was doing before.
> > > 
> > > Because this most probably won't be noticed by the user, then I
> > > don't see why it should shout out loud that the system is
> > > unusable when most probably it is.
> > 
> > That's like saying that any random race condition isn't likely to be
> > noticed by the user so it's not a big deal if we're missing a few
> > mutexes or spinlocks somewhere...even though there's likely to be no
> > user-visible impact to any race condition 99% of the time, it's the 1%
> > that winds up bei

Re: [Intel-gfx] [RFC PATCH] drm/i915/gt: Do not treat MCR locking timeouts as errors

2023-10-04 Thread Matt Roper
On Wed, Oct 04, 2023 at 09:35:27PM +0200, Andi Shyti wrote:
> Hi John,
> 
> > > > > The MCR steering semaphore is a shared lock entry between i915
> > > > > and various firmware components.
> > > > > 
> > > > > Getting the lock might sinchronize on some shared resources.
> > > > > Sometimes though, it might happen that the firmware forgets to
> > > > > unlock causing unnecessary noise in the driver which keeps doing
> > > > > what was supposed to do, ignoring the problem.
> > > > > 
> > > > > Do not consider this failure as an error, but just print a debug
> > > > > message stating that the MCR locking has been skipped.
> > > > > 
> > > > > On the driver side we still have spinlocks that make sure that
> > > > > the access to the resources is serialized.
> > > > > 
> > > > > Signed-off-by: Andi Shyti 
> > > > > Cc: Jonathan Cavitt 
> > > > > Cc: Matt Roper 
> > > > > Cc: Nirmoy Das 
> > > > > ---
> > > > >drivers/gpu/drm/i915/gt/intel_gt_mcr.c | 6 ++
> > > > >1 file changed, 2 insertions(+), 4 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_gt_mcr.c 
> > > > > b/drivers/gpu/drm/i915/gt/intel_gt_mcr.c
> > > > > index 326c2ed1d99b..51eb693df39b 100644
> > > > > --- a/drivers/gpu/drm/i915/gt/intel_gt_mcr.c
> > > > > +++ b/drivers/gpu/drm/i915/gt/intel_gt_mcr.c
> > > > > @@ -395,10 +395,8 @@ void intel_gt_mcr_lock(struct intel_gt *gt, 
> > > > > unsigned long *flags)
> > > > >* would indicate some hardware/firmware is misbehaving and not
> > > > >* releasing it properly.
> > > > >*/
> > > > > - if (err == -ETIMEDOUT) {
> > > > > - gt_err_ratelimited(gt, "hardware MCR steering semaphore 
> > > > > timed out");
> > > > > - add_taint_for_CI(gt->i915, TAINT_WARN);  /* CI is now 
> > > > > unreliable */
> > > > > - }
> > > > > + if (err == -ETIMEDOUT)
> > > > > + gt_dbg(gt, "hardware MCR steering semaphore timed out");
> > > > >}
> > > > >/**
> > > > Are we sure this does not warrant a level higher than dbg, such as
> > > > notice/warn?
> > > We might make it a warn, but this doesn't change much the economy
> > > of the driver as we will keep doing what we were supposed to do.
> > > 
> > > > Because how can we be sure the two entities will not stomp on
> > > > each other toes if we failed to obtain lock?
> > > So far, in all the research I've done, no one looks like using
> > > MCR lock, but yet someone is stuck in it.
> > 
> > If someone has the lock then that someone thinks they are using it. Just
> > > because you can't see what some piece of IFWI is doing doesn't mean it
> > isn't doing it. And if it is a genuinely missing unlock then it needs to be
> > tracked down and fixed with an IFWI update otherwise the system is going to
> > be unstable from that point on.
> 
> But I'm not changing here the behavior of the driver. The driver
> will keep doing what was doing before.
> 
> Because this most probably won't be noticed by the user, then I
> don't see why it should shout out loud that the system is
> unusable when most probably it is still usable.

That's like saying that any random race condition isn't likely to be
noticed by the user so it's not a big deal if we're missing a few
mutexes or spinlocks somewhere...even though there's likely to be no
user-visible impact to any race condition 99% of the time, it's the 1%
that winds up being absolutely catastrophic.

If we're not obtaining the MCR lock as expected and are simply moving
forward to force our own steering (possibly at the same time firmware is
programming steering to a different value), you probably won't actually
see a problem either because the operations won't wind up interleaving
in a problematic order, or because the driver and the firmware both
happen to be trying to steer to the same instance (e.g., instance #0 is
a quite common target).  But even if they're hard to hit, the
possibility for a major problem is still there and basically we need to
consider the whole platform to be in an unknown, unstable state once
we've detected one of these issues.
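
To make the interleaving concrete, here is a minimal userspace sketch of
the race described above. It is only a model: struct mcr_model, steer(),
steered_read() and driver_read_racing() are hypothetical names invented
for illustration and are not i915 functions. It assumes a single steering
selector shared by driver and firmware, with no lock taken.

```c
/* Userspace model only; every name here is hypothetical, not i915 code. */
#include <assert.h>
#include <stdint.h>

#define NUM_INSTANCES 4

struct mcr_model {
	uint32_t steering;            /* instance currently selected */
	uint32_t regs[NUM_INSTANCES]; /* one register copy per instance */
};

/* Program the shared steering selector. */
static void steer(struct mcr_model *m, uint32_t instance)
{
	m->steering = instance;
}

/* A read returns the copy that steering selects at the time of the read. */
static uint32_t steered_read(const struct mcr_model *m)
{
	return m->regs[m->steering];
}

/*
 * The driver wants instance 'want'. Firmware re-steers to 'fw' between the
 * driver's steering write and its read, which is the unlucky ordering.
 */
static uint32_t driver_read_racing(struct mcr_model *m, uint32_t want,
				   uint32_t fw)
{
	steer(m, want);
	steer(m, fw); /* firmware sneaks in because no lock was held */
	return steered_read(m);
}
```

With per-instance values {10, 11, 12, 13}, a driver read of instance 2
that races with firmware steering to instance 0 returns 10 rather than
12, while the same race with both sides targeting instance 0 is
invisible. That matches the point above: the common case looks fine and
the rare case silently reads the wrong instance.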

Based on some earlier experiments, it sounds like the problem at the
moment is that we've ju

Re: [Intel-gfx] [PATCH v7 4/4] drm/i915/mtl: Skip MCR ops for ring fault register

2023-09-28 Thread Matt Roper
On Fri, Sep 29, 2023 at 12:14:37AM +0200, Andrzej Hajda wrote:
> On 28.09.2023 15:00, Nirmoy Das wrote:
> > On MTL GEN12_RING_FAULT_REG is not replicated so don't
> > do mcr based operation for this register.
> > 
> > v2: use MEDIA_VER() instead of GRAPHICS_VER()(Matt).
> > v3: s/"MEDIA_VER(i915) == 13"/"MEDIA_VER(i915) >= 13"(Matt)
> >  improve comment.
> > v4: improve the comment further(Andi)
> > 
> > Signed-off-by: Nirmoy Das 
> > Reviewed-by: Matt Roper 
> > Reviewed-by: Andi Shyti 
> > Reviewed-by: Andrzej Hajda 
> > ---
> >   drivers/gpu/drm/i915/gt/intel_gt.c  | 13 -
> >   drivers/gpu/drm/i915/gt/intel_gt_regs.h |  1 +
> >   drivers/gpu/drm/i915/i915_gpu_error.c   | 11 ++-
> >   3 files changed, 23 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c 
> > b/drivers/gpu/drm/i915/gt/intel_gt.c
> > index 93062c35e072..dff8bba1f5d4 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> > @@ -262,10 +262,21 @@ intel_gt_clear_error_registers(struct intel_gt *gt,
> >I915_MASTER_ERROR_INTERRUPT);
> > }
> > -   if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 50)) {
> > +   /*
> > +* For the media GT, this ring fault register is not replicated,
> > +* so don't do multicast/replicated register read/write operation on it.
> > +*/
> > +   if (MEDIA_VER(i915) >= 13 && gt->type == GT_MEDIA) {
> > +   intel_uncore_rmw(uncore, XELPMP_RING_FAULT_REG,
> > +RING_FAULT_VALID, 0);
> > +   intel_uncore_posting_read(uncore,
> > + XELPMP_RING_FAULT_REG);
> > +
> > +   } else if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 50)) {
> 
> WA 14017387313 suggests to "remove Semaphore acquisition steps for all GAM
> ranges" (XELPMP_RING_FAULT_REG is in GAM range), just FYI.

We've actually looked at that workaround before and decided that it
doesn't make sense to implement it on Linux.  The background for that
workaround is due to Windows driver design; their driver potentially
tries to access some MCR registers from within an interrupt handler,
which would cause problems if non-IRQ code grabs the semaphore, gets
interrupted, and then the interrupt handler deadlocks while also trying
to acquire it.  On Linux, we never access MCR registers from an
interrupt handler, so we're not susceptible to that issue.
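
A rough userspace sketch of the deadlock pattern described above, under
the assumption that the semaphore is not recursive. All names (struct
hw_sem, sem_try_acquire(), and so on) are hypothetical and only model
the behaviour; a real handler would spin instead of returning false.

```c
/* Userspace model only; every name here is hypothetical. */
#include <assert.h>
#include <stdbool.h>

/* A non-recursive semaphore: a second acquire fails while it is held. */
struct hw_sem {
	bool held;
};

static bool sem_try_acquire(struct hw_sem *s)
{
	if (s->held)
		return false;
	s->held = true;
	return true;
}

static void sem_release(struct hw_sem *s)
{
	s->held = false;
}

/* Returns true if an interrupt handler could take the semaphore right now. */
static bool irq_handler_can_acquire(struct hw_sem *s)
{
	bool ok = sem_try_acquire(s);

	if (ok)
		sem_release(s);
	return ok;
}

/*
 * Non-IRQ code takes the semaphore and is interrupted before releasing it.
 * Returns whether the handler could make progress at that point.
 */
static bool interrupted_while_holding(struct hw_sem *s)
{
	bool irq_ok;

	if (!sem_try_acquire(s))
		return false;
	irq_ok = irq_handler_can_acquire(s); /* interrupt arrives here */
	sem_release(s);
	return irq_ok;
}
```

In this model interrupted_while_holding() returns false: the handler
cannot obtain the semaphore until the interrupted code resumes, and the
interrupted code cannot resume until the handler returns. If no handler
ever touches the semaphore, that path does not exist.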


Matt

> 
> Regards
> Andrzej
> 
> 
> > intel_gt_mcr_multicast_rmw(gt, XEHP_RING_FAULT_REG,
> >RING_FAULT_VALID, 0);
> > intel_gt_mcr_read_any(gt, XEHP_RING_FAULT_REG);
> > +
> > } else if (GRAPHICS_VER(i915) >= 12) {
> > intel_uncore_rmw(uncore, GEN12_RING_FAULT_REG, 
> > RING_FAULT_VALID, 0);
> > intel_uncore_posting_read(uncore, GEN12_RING_FAULT_REG);
> > diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h 
> > b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> > index cca4bac8f8b0..eecd0a87a647 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> > @@ -1084,6 +1084,7 @@
> >   #define GEN12_RING_FAULT_REG  _MMIO(0xcec4)
> >   #define XEHP_RING_FAULT_REG   MCR_REG(0xcec4)
> > +#define XELPMP_RING_FAULT_REG  _MMIO(0xcec4)
> >   #define   GEN8_RING_FAULT_ENGINE_ID(x)(((x) >> 12) & 0x7)
> >   #define   RING_FAULT_GTTSEL_MASK  (1 << 11)
> >   #define   RING_FAULT_SRCID(x) (((x) >> 3) & 0xff)
> > diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c 
> > b/drivers/gpu/drm/i915/i915_gpu_error.c
> > index f4ebcfb70289..b4e31e59c799 100644
> > --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> > +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> > @@ -1234,7 +1234,16 @@ static void engine_record_registers(struct 
> > intel_engine_coredump *ee)
> > if (GRAPHICS_VER(i915) >= 6) {
> > ee->rc_psmi = ENGINE_READ(engine, RING_PSMI_CTL);
> > -   if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 50))
> > +   /*
> > +* For the media GT, this ring fault register is not replicated,
> > +* so don't do multicast/replicated register read/write
> > +* operation on it.
> > +    */
> > +   if (MEDIA_VER(i915) >= 13 && engine->gt->type == GT_MEDIA)
> > +   ee->fault_reg = intel_uncore_read(engine->uncore,
> > + 
> > XELPMP_RING_FAULT_REG);
> > +
> > +   else if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 50))
> > ee->fault_reg = intel_gt_mcr_read_any(engine->gt,
> >   
> > XEHP_RING_FAULT_REG);
> > else if (GRAPHICS_VER(i915) >= 12)
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


Re: [PATCH v7 1/4] drm/i915: Introduce intel_gt_mcr_lock_sanitize()

2023-09-28 Thread Matt Roper
On Thu, Sep 28, 2023 at 03:00:12PM +0200, Nirmoy Das wrote:
> Implement intel_gt_mcr_lock_sanitize() to provide a mechanism
> for cleaning the steer semaphore when absolutely necessary.
> 
> v2: remove unnecessary lock(Andi, Matt)
> improve the kernel doc(Matt)
> s/intel_gt_mcr_lock_clear/intel_gt_mcr_lock_sanitize
> 
> Signed-off-by: Nirmoy Das 

Reviewed-by: Matt Roper 

> ---
>  drivers/gpu/drm/i915/gt/intel_gt_mcr.c | 22 ++
>  drivers/gpu/drm/i915/gt/intel_gt_mcr.h |  1 +
>  2 files changed, 23 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_mcr.c 
> b/drivers/gpu/drm/i915/gt/intel_gt_mcr.c
> index bf4a933de03a..326c2ed1d99b 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_mcr.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_mcr.c
> @@ -419,6 +419,28 @@ void intel_gt_mcr_unlock(struct intel_gt *gt, unsigned 
> long flags)
>   intel_uncore_write_fw(gt->uncore, MTL_STEER_SEMAPHORE, 0x1);
>  }
>  
> +/**
> + * intel_gt_mcr_lock_sanitize - Sanitize MCR steering lock
> + * @gt: GT structure
> + *
> + * This will be used to sanitize the initial status of the hardware lock
> + * during driver load and resume since there won't be any concurrent access
> + * from other agents at those times, but it's possible that boot firmware
> + * may have left the lock in a bad state.
> + *
> + */
> +void intel_gt_mcr_lock_sanitize(struct intel_gt *gt)
> +{
> + /*
> +  * This gets called at load/resume time, so we shouldn't be
> +  * racing with other driver threads grabbing the mcr lock.
> +  */
> + lockdep_assert_not_held(&gt->mcr_lock);
> +
> + if (GRAPHICS_VER_FULL(gt->i915) >= IP_VER(12, 70))
> + intel_uncore_write_fw(gt->uncore, MTL_STEER_SEMAPHORE, 0x1);
> +}
> +
>  /**
>   * intel_gt_mcr_read - read a specific instance of an MCR register
>   * @gt: GT structure
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_mcr.h 
> b/drivers/gpu/drm/i915/gt/intel_gt_mcr.h
> index 41684495b7da..01ac565a56a4 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_mcr.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_mcr.h
> @@ -11,6 +11,7 @@
>  void intel_gt_mcr_init(struct intel_gt *gt);
>  void intel_gt_mcr_lock(struct intel_gt *gt, unsigned long *flags);
>  void intel_gt_mcr_unlock(struct intel_gt *gt, unsigned long flags);
> +void intel_gt_mcr_lock_sanitize(struct intel_gt *gt);
>  
>  u32 intel_gt_mcr_read(struct intel_gt *gt,
> i915_mcr_reg_t reg,
> -- 
> 2.41.0
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


Re: [PATCH v6 3/4] drm/i915: Reset steer semaphore for media GT on resume

2023-09-27 Thread Matt Roper
On Wed, Sep 27, 2023 at 11:03:56PM +0200, Nirmoy Das wrote:
> During resume, the steer semaphore on GT1 was observed to be held. The
> hardware team has confirmed the safety of clearing the steer semaphore
> during driver load/resume, as no lock acquisitions can occur in this
> process by other agents.

I guess the question is whether we just want to handle the one case
where we've already seen a BIOS snapshot screw up (i.e., specifically on
GT1 during resume), or do we want to make this a general sanitization
that we do on both GTs at both load and resume, just to be safe?  Given
that the hardware team has indicated no external agents would be
expected to be using steering at the point the driver is
loading/resuming, maybe it's best to always do the sanitization on
platforms that have a hardware semaphore?
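
As a sketch of what "always sanitize on platforms with a hardware
semaphore" could look like, here is a small userspace model. It is not
the driver code: struct gt_model, MODEL_IP_VER() and the helper names
are hypothetical, and it assumes (following the unlock path quoted in
this series) that writing 0x1 releases the semaphore and that the
semaphore exists from graphics version 12.70 on.

```c
/* Userspace model only; every name here is hypothetical, not i915 code. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_IP_VER(ver, rel) ((ver) << 8 | (rel))

struct gt_model {
	uint32_t graphics_ver_full; /* e.g. MODEL_IP_VER(12, 70) */
	uint32_t steer_semaphore;   /* assumed: 1 = free, 0 = held */
};

/* Assumption: the hardware steering semaphore exists from 12.70 onward. */
static bool has_hw_steer_semaphore(const struct gt_model *gt)
{
	return gt->graphics_ver_full >= MODEL_IP_VER(12, 70);
}

/* Force the semaphore back to "free"; a no-op on older platforms. */
static void mcr_lock_sanitize(struct gt_model *gt)
{
	if (has_hw_steer_semaphore(gt))
		gt->steer_semaphore = 0x1;
}

/* Called at both load and resume, for every GT rather than only GT1. */
static void sanitize_all_gts(struct gt_model *gts, int count)
{
	for (int i = 0; i < count; i++)
		mcr_lock_sanitize(&gts[i]);
}
```

The point of the sketch is that the platform check lives inside the
sanitize helper, so callers at load and resume need no GT-type
condition of their own.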


Matt

> 
> v2: reset on resume not in intel_gt_init().
> v3: do the reset on intel_gt_resume_early()
> 
> Signed-off-by: Nirmoy Das 
> ---
>  drivers/gpu/drm/i915/gt/intel_gt_pm.c | 12 
>  1 file changed, 12 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c 
> b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> index dab73980c9f1..59cebf205b72 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> @@ -13,6 +13,7 @@
>  #include "intel_engine_pm.h"
>  #include "intel_gt.h"
>  #include "intel_gt_clock_utils.h"
> +#include "intel_gt_mcr.h"
>  #include "intel_gt_pm.h"
>  #include "intel_gt_print.h"
>  #include "intel_gt_requests.h"
> @@ -218,6 +219,17 @@ void intel_gt_pm_fini(struct intel_gt *gt)
>  
>  void intel_gt_resume_early(struct intel_gt *gt)
>  {
> + /*
> +  * Reset the steer semaphore on GT1, as we have observed it
> +  * remaining held after a suspend operation. Confirmation
> +  * from the hardware team ensures the safety of resetting
> +  * the steer semaphore during driver load/resume, as there
> +  * are no lock acquisitions during this process by other
> +  * agents.
> +  */
> + if (MEDIA_VER(gt->i915) >= 13 && gt->type == GT_MEDIA)
> + intel_gt_mcr_lock_reset(gt);
> +
>   intel_uncore_resume_early(gt->uncore);
>   intel_gt_check_and_clear_faults(gt);
>  }
> -- 
> 2.41.0
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


Re: [PATCH v6 1/4] drm/i915: Introduce intel_gt_mcr_lock_reset()

2023-09-27 Thread Matt Roper
On Wed, Sep 27, 2023 at 11:03:54PM +0200, Nirmoy Das wrote:
> Implement intel_gt_mcr_lock_reset() to provide a mechanism
> for resetting the steer semaphore when absolutely necessary.
> 
> Signed-off-by: Nirmoy Das 
> ---
>  drivers/gpu/drm/i915/gt/intel_gt_mcr.c | 29 ++
>  drivers/gpu/drm/i915/gt/intel_gt_mcr.h |  1 +
>  2 files changed, 30 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_mcr.c 
> b/drivers/gpu/drm/i915/gt/intel_gt_mcr.c
> index bf4a933de03a..d98e0d2fc2ee 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_mcr.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_mcr.c
> @@ -419,6 +419,35 @@ void intel_gt_mcr_unlock(struct intel_gt *gt, unsigned 
> long flags)
>   intel_uncore_write_fw(gt->uncore, MTL_STEER_SEMAPHORE, 0x1);
>  }
>  
> +/**
> + * intel_gt_mcr_lock_reset - Reset MCR steering lock
> + * @gt: GT structure
> + *
> + * Performs a steer semaphore reset operation. On MTL and beyond, a hardware
> + * lock will also be taken to serialize access not only for the driver,
> + * but also for external hardware and firmware agents.

The text here makes it sound like this reset function is going to take
the lock.  Since we have the same language in the lock() function's
kerneldoc, I think you can just delete this whole sentence.

> + * However, there may be situations where the driver must reset the semaphore
> + * but only when it is absolutely certain that no other agent should own the
> + * lock at that given time.

This part leads to questions about what such situations would be and how
we'd know it's safe to use.  Maybe it's best to just say something like
"This will be used to sanitize the initial status of the hardware lock
during driver load and resume since there won't be any concurrent access
from other agents at those times, but it's possible that boot firmware
may have left the lock in a bad state."

> + *
> + * Context: Takes gt->mcr_lock.  uncore->lock should *not* be held when this
> + *  function is called, although it may be acquired after this
> + *  function call.
> + */
> +void intel_gt_mcr_lock_reset(struct intel_gt *gt)
> +{
> + unsigned long __flags;
> +
> + lockdep_assert_not_held(&gt->uncore->lock);
> +
> + spin_lock_irqsave(&gt->mcr_lock, __flags);

If we're doing this to sanitize at load/resume, then presumably we
shouldn't ever be racing with other driver threads either, right?  If it
was possible for some other thread to already be grabbing the MCR lock,
then that would mean it also isn't safe for us to reset it here either.


Matt

> +
> + if (GRAPHICS_VER_FULL(gt->i915) >= IP_VER(12, 70))
> + intel_uncore_write_fw(gt->uncore, MTL_STEER_SEMAPHORE, 0x1);
> +
> + spin_unlock_irqrestore(&gt->mcr_lock, __flags);
> +}
> +
>  /**
>   * intel_gt_mcr_read - read a specific instance of an MCR register
>   * @gt: GT structure
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_mcr.h 
> b/drivers/gpu/drm/i915/gt/intel_gt_mcr.h
> index 41684495b7da..485c7711f2e8 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_mcr.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_mcr.h
> @@ -11,6 +11,7 @@
>  void intel_gt_mcr_init(struct intel_gt *gt);
>  void intel_gt_mcr_lock(struct intel_gt *gt, unsigned long *flags);
>  void intel_gt_mcr_unlock(struct intel_gt *gt, unsigned long flags);
> +void intel_gt_mcr_lock_reset(struct intel_gt *gt);
>  
>  u32 intel_gt_mcr_read(struct intel_gt *gt,
> i915_mcr_reg_t reg,
> -- 
> 2.41.0
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


Re: [RFC PATCH] drm/i915/gt: Apply Workaround 22016122933 to all the GT's

2023-09-27 Thread Matt Roper
On Wed, Sep 27, 2023 at 05:18:39PM +0200, Andi Shyti wrote:
> From: Nirmoy Das 
> 
> Commit f1530f912ed8 ("drm/i915/gt: Apply workaround 22016122933
> correctly") adds the workaround only in non media GT's, which is

This is backwards; the workaround is applied only to the media GT and
not to the primary GT.

> GT-0 in case of MTL. It turns out that we need to apply it in
> both the GT's.

The workaround database indicates this should only be applied to the media
IP, not to the render IP, and the internal details further confirm that
this is not necessary on the primary GT.  Is there some other workaround
(with a different lineage number) that asks us to do the same thing on
the primary GT?


Matt

> 
> Signed-off-by: Nirmoy Das 
> Signed-off-by: Andi Shyti 
> Cc: Jonathan Cavitt 
> Cc: Matt Roper 
> ---
>  drivers/gpu/drm/i915/gt/intel_gt.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c 
> b/drivers/gpu/drm/i915/gt/intel_gt.c
> index 93062c35e072..7f7af1d4dc10 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> @@ -1022,5 +1022,5 @@ enum i915_map_type intel_gt_coherent_map_type(struct 
> intel_gt *gt,
>  
>  bool intel_gt_needs_wa_22016122933(struct intel_gt *gt)
>  {
> - return MEDIA_VER_FULL(gt->i915) == IP_VER(13, 0) && gt->type == 
> GT_MEDIA;
> + return MEDIA_VER_FULL(gt->i915) == IP_VER(13, 0);
>  }
> -- 
> 2.40.1
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


Re: [PATCH v2] drm/i915/mtl: Skip MCR ops for ring fault register

2023-09-26 Thread Matt Roper
On Tue, Sep 26, 2023 at 11:58:02PM +0200, Nirmoy Das wrote:
> On MTL GEN12_RING_FAULT_REG is not replicated so don't
> do mcr based operation for this register.
> 
> v2: use MEDIA_VER() instead of GRAPHICS_VER()(Matt).
> 
> Signed-off-by: Nirmoy Das 
> ---
>  drivers/gpu/drm/i915/gt/intel_gt.c  | 13 -
>  drivers/gpu/drm/i915/gt/intel_gt_regs.h |  1 +
>  drivers/gpu/drm/i915/i915_gpu_error.c   | 10 +-
>  3 files changed, 22 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c 
> b/drivers/gpu/drm/i915/gt/intel_gt.c
> index 93062c35e072..430738607f61 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> @@ -262,10 +262,21 @@ intel_gt_clear_error_registers(struct intel_gt *gt,
>  I915_MASTER_ERROR_INTERRUPT);
>   }
>  
> - if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 50)) {
> + /*
> +  * for media tile this ring fault register is not replicated

Nitpicks:  s/tile/gt/ and either write it as "For the media GT..."
(singular) or "For media GTs..." (plural).  Same with the other copy of
this comment farther down.

> +  * so skip doing mcr ops on it.
> +  */
> + if (MEDIA_VER(i915) == 13 && gt->type == GT_MEDIA) {

I guess for now we should probably make this (and the one farther down)
a ">=" instead of "==" under the assumption future media versions will
do the same in case we get some kind of refresh platform down the road
with a slightly higher version number.

Aside from those minor tweaks,

Reviewed-by: Matt Roper 
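
For reference, the branch ordering being suggested can be sketched as a
small userspace model. It only illustrates the ">=" check and the
media-first ordering; struct dev_model, the enum values and
pick_fault_reg_path() are hypothetical names, and the pre-Gen12
branches of the real function are collapsed into one "legacy" case.

```c
/* Userspace model only; every name here is hypothetical, not i915 code. */
#include <assert.h>

#define MODEL_IP_VER(ver, rel) ((ver) << 8 | (rel))

enum gt_type_model { GT_PRIMARY_MODEL, GT_MEDIA_MODEL };

enum fault_reg_path {
	PATH_LEGACY,      /* pre-Gen12 handling, collapsed in this model */
	PATH_GEN12_MMIO,  /* plain MMIO access */
	PATH_XEHP_MCR,    /* multicast/replicated access */
	PATH_XELPMP_MMIO, /* media GT: register is not replicated */
};

struct dev_model {
	unsigned int graphics_ver_full;
	unsigned int media_ver;
};

static enum fault_reg_path
pick_fault_reg_path(const struct dev_model *d, enum gt_type_model type)
{
	/* Media check first, using ">=" so later media versions match too. */
	if (d->media_ver >= 13 && type == GT_MEDIA_MODEL)
		return PATH_XELPMP_MMIO;
	if (d->graphics_ver_full >= MODEL_IP_VER(12, 50))
		return PATH_XEHP_MCR;
	if (d->graphics_ver_full >= MODEL_IP_VER(12, 0))
		return PATH_GEN12_MMIO;
	return PATH_LEGACY;
}
```

With "==" in place of ">=", a device reporting media version 14 would
fall through to the MCR path on its media GT; the version numbers used
to exercise that case are made up for the model.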

> + intel_uncore_rmw(uncore, XELPMP_RING_FAULT_REG,
> +  RING_FAULT_VALID, 0);
> + intel_uncore_posting_read(uncore,
> +   XELPMP_RING_FAULT_REG);
> +
> + } else if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 50)) {
>   intel_gt_mcr_multicast_rmw(gt, XEHP_RING_FAULT_REG,
>  RING_FAULT_VALID, 0);
>   intel_gt_mcr_read_any(gt, XEHP_RING_FAULT_REG);
> +
>   } else if (GRAPHICS_VER(i915) >= 12) {
>   intel_uncore_rmw(uncore, GEN12_RING_FAULT_REG, 
> RING_FAULT_VALID, 0);
>   intel_uncore_posting_read(uncore, GEN12_RING_FAULT_REG);
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h 
> b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> index cca4bac8f8b0..eecd0a87a647 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> @@ -1084,6 +1084,7 @@
>  
>  #define GEN12_RING_FAULT_REG _MMIO(0xcec4)
>  #define XEHP_RING_FAULT_REG  MCR_REG(0xcec4)
> +#define XELPMP_RING_FAULT_REG _MMIO(0xcec4)
>  #define   GEN8_RING_FAULT_ENGINE_ID(x)   (((x) >> 12) & 0x7)
>  #define   RING_FAULT_GTTSEL_MASK (1 << 11)
>  #define   RING_FAULT_SRCID(x)(((x) >> 3) & 0xff)
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c 
> b/drivers/gpu/drm/i915/i915_gpu_error.c
> index f4ebcfb70289..f0b691ea3a6e 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -1234,7 +1234,15 @@ static void engine_record_registers(struct 
> intel_engine_coredump *ee)
>   if (GRAPHICS_VER(i915) >= 6) {
>   ee->rc_psmi = ENGINE_READ(engine, RING_PSMI_CTL);
>  
> - if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 50))
> + /*
> +  * for media tile this ring fault register is not replicated
> +  * so skip doing mcr ops on it.
> +  */
> + if (MEDIA_VER(i915) == 13 && engine->gt->type == GT_MEDIA)
> + ee->fault_reg = intel_uncore_read(engine->uncore,
> +   
> XELPMP_RING_FAULT_REG);
> +
> + else if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 50))
>   ee->fault_reg = intel_gt_mcr_read_any(engine->gt,
> 
> XEHP_RING_FAULT_REG);
>   else if (GRAPHICS_VER(i915) >= 12)
> -- 
> 2.41.0
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


Re: [PATCH] drm/i915: Don't set PIPE_CONTROL_FLUSH_L3 for aux inval

2023-09-26 Thread Matt Roper
On Tue, Sep 26, 2023 at 04:24:01PM +0200, Nirmoy Das wrote:
> PIPE_CONTROL_FLUSH_L3 is not needed for aux invalidation
> so don't set that.
> 
> Fixes: 78a6ccd65fa3 ("drm/i915/gt: Ensure memory quiesced before 
> invalidation")
> Cc: Jonathan Cavitt 
> Cc: Andi Shyti 
> Cc:  # v5.8+
> Cc: Andrzej Hajda 
> Cc: Tvrtko Ursulin 
> Cc: Matt Roper 
> Cc: Tejas Upadhyay 
> Cc: Lucas De Marchi 
> Cc: Prathap Kumar Valsan 
> Cc: Tapani Pälli 
> Cc: Mark Janes 
> Cc: Rodrigo Vivi 
> Signed-off-by: Nirmoy Das 

Acked-by: Matt Roper 

> ---
>  drivers/gpu/drm/i915/gt/gen8_engine_cs.c | 11 ++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c 
> b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
> index 0143445dba83..ba4c2422b340 100644
> --- a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
> +++ b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
> @@ -271,8 +271,17 @@ int gen12_emit_flush_rcs(struct i915_request *rq, u32 
> mode)
>   if (GRAPHICS_VER_FULL(rq->i915) >= IP_VER(12, 70))
>   bit_group_0 |= PIPE_CONTROL_CCS_FLUSH;
>  
> + /*
> +  * L3 fabric flush is needed for AUX CCS invalidation
> +  * which happens as part of pipe-control so we can
> +  * ignore PIPE_CONTROL_FLUSH_L3. Also PIPE_CONTROL_FLUSH_L3
> +  * deals with Protected Memory which is not needed for
> +  * AUX CCS invalidation and lead to unwanted side effects.
> +  */
> + if (mode & EMIT_FLUSH)
> + bit_group_1 |= PIPE_CONTROL_FLUSH_L3;
> +
>   bit_group_1 |= PIPE_CONTROL_TILE_CACHE_FLUSH;
> - bit_group_1 |= PIPE_CONTROL_FLUSH_L3;
>   bit_group_1 |= PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH;
>   bit_group_1 |= PIPE_CONTROL_DEPTH_CACHE_FLUSH;
>   /* Wa_1409600907:tgl,adl-p */
> -- 
> 2.41.0
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


Re: [PATCH] drm/i915/mtl: Skip MCR ops for ring fault register

2023-09-26 Thread Matt Roper
On Tue, Sep 26, 2023 at 04:18:42PM +0200, Nirmoy Das wrote:
> On MTL GEN12_RING_FAULT_REG is not replicated so don't
> do mcr based operation for this register.
> 
> Signed-off-by: Nirmoy Das 
> ---
>  drivers/gpu/drm/i915/gt/intel_gt.c  | 14 +-
>  drivers/gpu/drm/i915/gt/intel_gt_regs.h |  1 +
>  drivers/gpu/drm/i915/i915_gpu_error.c   | 11 ++-
>  3 files changed, 24 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c 
> b/drivers/gpu/drm/i915/gt/intel_gt.c
> index 93062c35e072..d4de692e8be1 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> @@ -262,10 +262,22 @@ intel_gt_clear_error_registers(struct intel_gt *gt,
>  I915_MASTER_ERROR_INTERRUPT);
>   }
>  
> - if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 50)) {
> + /*
> +  * for media tile this ring fault register is not replicated
> +  * so skip doing mcr ops on it.
> +  */
> + if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 50) &&

This should be checking the media version rather than the graphics
version.  I.e., "MEDIA_VER(i915) == 13" since it's possible future
versions of the media IP may change the behavior (independently of the
graphics IP versions).


Matt

> + gt->type == GT_MEDIA) {
> + intel_uncore_rmw(uncore, XELPMP_RING_FAULT_REG,
> +  RING_FAULT_VALID, 0);
> + intel_uncore_posting_read(uncore,
> +   XELPMP_RING_FAULT_REG);
> +
> + } else if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 50)) {
>   intel_gt_mcr_multicast_rmw(gt, XEHP_RING_FAULT_REG,
>  RING_FAULT_VALID, 0);
>   intel_gt_mcr_read_any(gt, XEHP_RING_FAULT_REG);
> +
>   } else if (GRAPHICS_VER(i915) >= 12) {
>   intel_uncore_rmw(uncore, GEN12_RING_FAULT_REG, 
> RING_FAULT_VALID, 0);
>   intel_uncore_posting_read(uncore, GEN12_RING_FAULT_REG);
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h 
> b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> index cca4bac8f8b0..eecd0a87a647 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> @@ -1084,6 +1084,7 @@
>  
>  #define GEN12_RING_FAULT_REG _MMIO(0xcec4)
>  #define XEHP_RING_FAULT_REG  MCR_REG(0xcec4)
> +#define XELPMP_RING_FAULT_REG _MMIO(0xcec4)
>  #define   GEN8_RING_FAULT_ENGINE_ID(x)   (((x) >> 12) & 0x7)
>  #define   RING_FAULT_GTTSEL_MASK (1 << 11)
>  #define   RING_FAULT_SRCID(x)(((x) >> 3) & 0xff)
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c 
> b/drivers/gpu/drm/i915/i915_gpu_error.c
> index f4ebcfb70289..83f1a729da8b 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -1234,7 +1234,16 @@ static void engine_record_registers(struct 
> intel_engine_coredump *ee)
>   if (GRAPHICS_VER(i915) >= 6) {
>   ee->rc_psmi = ENGINE_READ(engine, RING_PSMI_CTL);
>  
> - if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 50))
> + /*
> +  * for media tile this ring fault register is not replicated
> +  * so skip doing mcr ops on it.
> +  */
> + if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 50) &&
> + engine->gt->type == GT_MEDIA)
> + ee->fault_reg = intel_uncore_read(engine->uncore,
> +   
> XELPMP_RING_FAULT_REG);
> +
> + else if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 50))
>   ee->fault_reg = intel_gt_mcr_read_any(engine->gt,
> 
> XEHP_RING_FAULT_REG);
>   else if (GRAPHICS_VER(i915) >= 12)
> -- 
> 2.41.0
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


Re: [PATCH] drm/i915: Remove unnecessary memory quiescing for aux inval

2023-09-20 Thread Matt Roper
On Wed, Sep 20, 2023 at 01:11:31PM +0200, Nirmoy Das wrote:
> i915 already does memory quiesce before signaling
> breadcrumb so remove extra memory quiescing for aux
> invalidation which can cause unnecessary side effects.

This explanation seems confusing to me.  If we've already performed the
necessary flushing and quiesced all cache<->memory traffic, then
performing another flush should just be a noop, right?  If we're seeing
side effects, then doesn't that imply that there was still something in
the cache that hadn't made its way to memory yet?

Is the breadcrumb code flushing all the necessary bits?  What about
PIPE_CONTROL_CCS_FLUSH?


Matt

> 
> Fixes: 78a6ccd65fa3 ("drm/i915/gt: Ensure memory quiesced before 
> invalidation")
> Cc: Jonathan Cavitt 
> Cc: Andi Shyti 
> Cc:  # v5.8+
> Cc: Andrzej Hajda 
> Cc: Tvrtko Ursulin 
> Cc: Matt Roper 
> Cc: Tejas Upadhyay 
> Cc: Lucas De Marchi 
> Cc: Prathap Kumar Valsan 
> Cc: Tapani Pälli 
> Cc: Mark Janes 
> Cc: Rodrigo Vivi 
> Signed-off-by: Nirmoy Das 
> ---
>  drivers/gpu/drm/i915/gt/gen8_engine_cs.c | 50 
>  1 file changed, 26 insertions(+), 24 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c 
> b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
> index 0143445dba83..5001670046a0 100644
> --- a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
> +++ b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
> @@ -248,11 +248,7 @@ int gen12_emit_flush_rcs(struct i915_request *rq, u32 
> mode)
>  {
>   struct intel_engine_cs *engine = rq->engine;
>  
> - /*
> -  * On Aux CCS platforms the invalidation of the Aux
> -  * table requires quiescing memory traffic beforehand
> -  */
> - if (mode & EMIT_FLUSH || gen12_needs_ccs_aux_inv(engine)) {
> + if (mode & EMIT_FLUSH) {
>   u32 bit_group_0 = 0;
>   u32 bit_group_1 = 0;
>   int err;
> @@ -264,13 +260,6 @@ int gen12_emit_flush_rcs(struct i915_request *rq, u32 
> mode)
>  
>   bit_group_0 |= PIPE_CONTROL0_HDC_PIPELINE_FLUSH;
>  
> - /*
> -  * When required, in MTL and beyond platforms we
> -  * need to set the CCS_FLUSH bit in the pipe control
> -  */
> - if (GRAPHICS_VER_FULL(rq->i915) >= IP_VER(12, 70))
> - bit_group_0 |= PIPE_CONTROL_CCS_FLUSH;
> -
>   bit_group_1 |= PIPE_CONTROL_TILE_CACHE_FLUSH;
>   bit_group_1 |= PIPE_CONTROL_FLUSH_L3;
>   bit_group_1 |= PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH;
> @@ -800,14 +789,15 @@ u32 *gen12_emit_fini_breadcrumb_rcs(struct i915_request 
> *rq, u32 *cs)
>  {
>   struct drm_i915_private *i915 = rq->i915;
>   struct intel_gt *gt = rq->engine->gt;
> - u32 flags = (PIPE_CONTROL_CS_STALL |
> -  PIPE_CONTROL_TLB_INVALIDATE |
> -  PIPE_CONTROL_TILE_CACHE_FLUSH |
> -  PIPE_CONTROL_FLUSH_L3 |
> -  PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH |
> -  PIPE_CONTROL_DEPTH_CACHE_FLUSH |
> -  PIPE_CONTROL_DC_FLUSH_ENABLE |
> -  PIPE_CONTROL_FLUSH_ENABLE);
> + u32 bit_group_0 = PIPE_CONTROL0_HDC_PIPELINE_FLUSH;
> + u32 bit_group_1 = (PIPE_CONTROL_CS_STALL |
> +PIPE_CONTROL_TLB_INVALIDATE |
> +PIPE_CONTROL_TILE_CACHE_FLUSH |
> +PIPE_CONTROL_FLUSH_L3 |
> +PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH |
> +PIPE_CONTROL_DEPTH_CACHE_FLUSH |
> +PIPE_CONTROL_DC_FLUSH_ENABLE |
> +PIPE_CONTROL_FLUSH_ENABLE);
>  
>   /* Wa_14016712196 */
>   if (IS_GFX_GT_IP_RANGE(gt, IP_VER(12, 70), IP_VER(12, 71)) || 
> IS_DG2(i915))
> @@ -817,14 +807,26 @@ u32 *gen12_emit_fini_breadcrumb_rcs(struct i915_request 
> *rq, u32 *cs)
>  
>   if (GRAPHICS_VER(i915) == 12 && GRAPHICS_VER_FULL(i915) < IP_VER(12, 
> 50))
>   /* Wa_1409600907 */
> - flags |= PIPE_CONTROL_DEPTH_STALL;
> + bit_group_1 |= PIPE_CONTROL_DEPTH_STALL;
>  
>   if (!HAS_3D_PIPELINE(rq->i915))
> - flags &= ~PIPE_CONTROL_3D_ARCH_FLAGS;
> + bit_group_1 &= ~PIPE_CONTROL_3D_ARCH_FLAGS;
>   else if (rq->engine->class == COMPUTE_CLASS)
> - flags &= ~PIPE_CONTROL_3D_ENGINE_FLAGS;
> + bit_group_1 &= ~PIPE_CONTROL_3D_ENGINE_FLAGS;
> +
> + /*
> +  * On Aux CCS platforms the invalidation of the Aux
> +  * table requires quiescing memory traffic beforehand.
> +  * 

Re: [RFC 4/8] drm/i915: Refactor PAT/object cache handling

2023-07-28 Thread Matt Roper
On Fri, Jul 28, 2023 at 01:39:06PM +0100, Tvrtko Ursulin wrote:
> 
> Forgot one part of your reply:
> 
> On 28/07/2023 00:57, Matt Roper wrote:
> > On Thu, Jul 27, 2023 at 03:55:00PM +0100, Tvrtko Ursulin wrote:
> > > From: Tvrtko Ursulin 
> > > 
> > > Commit 9275277d5324 ("drm/i915: use pat_index instead of cache_level") has
> > > introduced PAT indices to i915 internal APIs, partially replacing the
> > > usage of driver internal cache_level, but has also added a few sub-
> > > optimal design decisions which this patch tries to improve upon.
> > > 
> > > Principal change here is to invert the per platform cache level to PAT
> > > index table which was added by the referenced commit, and by doing so
> > > enable i915 to understand the cache mode between PAT indices, changing
> > > them from opaque to transparent.
> > > 
> > > Once we have the inverted table we are able to remove the hidden false
> > > "return true" from i915_gem_object_has_cache_level and make the involved
> > > code path clearer.
> > > 
> > > To achieve this we replace the enum i915_cache_level with i915_cache_t,
> > > composed of a more detailed representation of each cache mode (base mode
> > > plus flags).
> > > 
> > > In this way we are able to express the differences between different
> > > write-back mode coherency settings on Meteorlake, which in turn enables us
> > > to map the i915 "cached" mode to the correct Meteorlake PAT index.
> > > 
> > > We can also replace the platform dependent cache mode to string code in
> > > debugfs and elsewhere by the single implementation based on i915_cache_t.
> > > 
> > > v2:
> > >   * Fix PAT-to-cache-mode table for PVC. (Fei)
> > >   * Cache display caching mode too. (Fei)
> > >   * Improve and document criteria in i915_gem_object_can_bypass_llc() 
> > > (Matt)
> > > 
> > > v3:
> > >   * Checkpatch issues.
> > >   * Cache mode flags check fixed.
> > > 
> > > v4:
> > >   * Fix intel_device_info->cache_modes array size. (Matt)
> > >   * Boolean cache mode and flags query. (Matt)
> > >   * Reduce number of cache macros with some macro magic.
> > >   * One more checkpatch fix.
> > >   * Tweak tables to show legacy and Gen12 WB is fully coherent.
> > > 
> > > Signed-off-by: Tvrtko Ursulin 
> > > References: 9275277d5324 ("drm/i915: use pat_index instead of 
> > > cache_level")
> > > Cc: Chris Wilson 
> > > Cc: Fei Yang 
> > > Cc: Andi Shyti 
> > > Cc: Matt Roper 
> > > ---
> > >   drivers/gpu/drm/i915/gem/i915_gem_domain.c|  60 +
> > >   drivers/gpu/drm/i915/gem/i915_gem_domain.h|   5 +-
> > >   .../gpu/drm/i915/gem/i915_gem_execbuffer.c|   3 +-
> > >   drivers/gpu/drm/i915/gem/i915_gem_internal.c  |   2 +-
> > >   drivers/gpu/drm/i915/gem/i915_gem_mman.c  |   4 +-
> > >   drivers/gpu/drm/i915/gem/i915_gem_object.c| 117 ++
> > >   drivers/gpu/drm/i915/gem/i915_gem_object.h|  11 +-
> > >   .../gpu/drm/i915/gem/i915_gem_object_types.h  | 116 +
> > >   drivers/gpu/drm/i915/gem/i915_gem_shmem.c |   8 +-
> > >   drivers/gpu/drm/i915/gem/i915_gem_stolen.c|   2 +-
> > >   drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c  |  20 +--
> > >   drivers/gpu/drm/i915/gem/i915_gem_userptr.c   |   2 +-
> > >   .../drm/i915/gem/selftests/huge_gem_object.c  |   2 +-
> > >   .../gpu/drm/i915/gem/selftests/huge_pages.c   |   3 +-
> > >   drivers/gpu/drm/i915/gt/gen8_ppgtt.c  |  10 +-
> > >   drivers/gpu/drm/i915/gt/intel_engine_cs.c |   2 +-
> > >   drivers/gpu/drm/i915/gt/intel_ggtt.c  |  25 ++--
> > >   drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c |   4 +-
> > >   drivers/gpu/drm/i915/gt/intel_gtt.c   |   2 +-
> > >   drivers/gpu/drm/i915/gt/intel_gtt.h   |   3 +-
> > >   drivers/gpu/drm/i915/gt/intel_ppgtt.c |   6 +-
> > >   .../gpu/drm/i915/gt/intel_ring_submission.c   |   4 +-
> > >   drivers/gpu/drm/i915/gt/intel_timeline.c  |   2 +-
> > >   drivers/gpu/drm/i915/gt/selftest_hangcheck.c  |   2 +-
> > >   .../gpu/drm/i915/gt/selftest_workarounds.c|   2 +-
> > >   drivers/gpu/drm/i915/i915_cache.c |  89 +++--
> > >   drivers/gpu/drm/i915/i915_cache.h |  70 ++-
> > >   drivers/gpu/drm/i915/i91

Re: [Intel-gfx] [RFC 2/8] drm/i915: Split PTE encode between Gen12 and Meteorlake

2023-07-28 Thread Matt Roper
On Fri, Jul 28, 2023 at 09:18:36AM +0100, Tvrtko Ursulin wrote:
> 
> On 27/07/2023 23:25, Matt Roper wrote:
> > On Thu, Jul 27, 2023 at 03:54:58PM +0100, Tvrtko Ursulin wrote:
> > > From: Tvrtko Ursulin 
> > > 
> > > No need to run extra instructions which will never trigger on platforms
> > > before Meteorlake.
> > > 
> > > Signed-off-by: Tvrtko Ursulin 
> > > ---
> > >   drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 26 ++
> > >   1 file changed, 26 insertions(+)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c 
> > > b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> > > index c8568e5d1147..862ac1d2de25 100644
> > > --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> > > +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> > > @@ -63,6 +63,30 @@ static u64 gen12_pte_encode(dma_addr_t addr,
> > >   {
> > >   gen8_pte_t pte = addr | GEN8_PAGE_PRESENT | GEN8_PAGE_RW;
> > > + if (unlikely(flags & PTE_READ_ONLY))
> > > + pte &= ~GEN8_PAGE_RW;
> > > +
> > > + if (flags & PTE_LM)
> > > + pte |= GEN12_PPGTT_PTE_LM;
> > > +
> > > + if (pat_index & BIT(0))
> > > + pte |= GEN12_PPGTT_PTE_PAT0;
> > > +
> > > + if (pat_index & BIT(1))
> > > + pte |= GEN12_PPGTT_PTE_PAT1;
> > > +
> > > + if (pat_index & BIT(2))
> > > + pte |= GEN12_PPGTT_PTE_PAT2;
> > > +
> > > + return pte;
> > > +}
> > > +
> > > +static u64 mtl_pte_encode(dma_addr_t addr,
> > > +   unsigned int pat_index,
> > > +   u32 flags)
> > > +{
> > > + gen8_pte_t pte = addr | GEN8_PAGE_PRESENT | GEN8_PAGE_RW;
> > > +
> > 
> > Would it be more readable to start with
> > 
> >  gen8_pte_t pte = gen12_pte_encode(addr, pat_index, flags);
> > 
> > and then |-in only the MTL-specific bit(s) as appropriate?
> > 
> > >   if (unlikely(flags & PTE_READ_ONLY))
> > >   pte &= ~GEN8_PAGE_RW;
> > > @@ -995,6 +1019,8 @@ struct i915_ppgtt *gen8_ppgtt_create(struct intel_gt 
> > > *gt,
> > >*/
> > >   ppgtt->vm.alloc_scratch_dma = alloc_pt_dma;
> > > + if (GRAPHICS_VER_FULL(gt->i915) >= IP_VER(12, 70))
> > > + ppgtt->vm.pte_encode = mtl_pte_encode;
> > >   if (GRAPHICS_VER(gt->i915) >= 12)
> > >   ppgtt->vm.pte_encode = gen12_pte_encode;
> > 
> > I think you wanted 'else if' here.  Otherwise you clobber the MTL
> > function pointer.
> 
> Doh this was a strong fail.. Yes and yes.. I even had it like you suggest in
> that patch I mentioned to you earlier..
> https://patchwork.freedesktop.org/patch/546013/?series=120341&rev=2.
> 
> Do you have an opinion on that one perhaps?

Yeah, I overlooked that patch before, but it looks good to me.


Matt


> 
> Thanks,
> 
> Tvrtko

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


Re: [RFC 4/8] drm/i915: Refactor PAT/object cache handling

2023-07-27 Thread Matt Roper
On Thu, Jul 27, 2023 at 04:57:53PM -0700, Matt Roper wrote:
> On Thu, Jul 27, 2023 at 03:55:00PM +0100, Tvrtko Ursulin wrote:
> > From: Tvrtko Ursulin 
> > 
> > Commit 9275277d5324 ("drm/i915: use pat_index instead of cache_level") has
> > introduced PAT indices to i915 internal APIs, partially replacing the
> > usage of driver internal cache_level, but has also added a few sub-
> > optimal design decisions which this patch tries to improve upon.
> > 
> > Principal change here is to invert the per platform cache level to PAT
> > index table which was added by the referenced commit, and by doing so
> > enable i915 to understand the cache mode between PAT indices, changing
> > them from opaque to transparent.
> > 
> > Once we have the inverted table we are able to remove the hidden false
> > "return true" from i915_gem_object_has_cache_level and make the involved
> > code path clearer.
> > 
> > To achieve this we replace the enum i915_cache_level with i915_cache_t,
> > composed of a more detailed representation of each cache mode (base mode
> > plus flags).
> > 
> > In this way we are able to express the differences between different
> > write-back mode coherency settings on Meteorlake, which in turn enables us
> > to map the i915 "cached" mode to the correct Meteorlake PAT index.
> > 
> > We can also replace the platform dependent cache mode to string code in
> > debugfs and elsewhere by the single implementation based on i915_cache_t.
> > 
> > v2:
> >  * Fix PAT-to-cache-mode table for PVC. (Fei)
> >  * Cache display caching mode too. (Fei)
> >  * Improve and document criteria in i915_gem_object_can_bypass_llc() (Matt)
> > 
> > v3:
> > >  * Checkpatch issues.
> >  * Cache mode flags check fixed.
> > 
> > v4:
> >  * Fix intel_device_info->cache_modes array size. (Matt)
> >  * Boolean cache mode and flags query. (Matt)
> >  * Reduce number of cache macros with some macro magic.
> >  * One more checkpatch fix.
> >  * Tweak tables to show legacy and Gen12 WB is fully coherent.
> > 
> > Signed-off-by: Tvrtko Ursulin 
> > References: 9275277d5324 ("drm/i915: use pat_index instead of cache_level")
> > Cc: Chris Wilson 
> > Cc: Fei Yang 
> > Cc: Andi Shyti 
> > Cc: Matt Roper 
> > ---
> >  drivers/gpu/drm/i915/gem/i915_gem_domain.c|  60 +
> >  drivers/gpu/drm/i915/gem/i915_gem_domain.h|   5 +-
> >  .../gpu/drm/i915/gem/i915_gem_execbuffer.c|   3 +-
> >  drivers/gpu/drm/i915/gem/i915_gem_internal.c  |   2 +-
> >  drivers/gpu/drm/i915/gem/i915_gem_mman.c  |   4 +-
> >  drivers/gpu/drm/i915/gem/i915_gem_object.c| 117 ++
> >  drivers/gpu/drm/i915/gem/i915_gem_object.h|  11 +-
> >  .../gpu/drm/i915/gem/i915_gem_object_types.h  | 116 +
> >  drivers/gpu/drm/i915/gem/i915_gem_shmem.c |   8 +-
> >  drivers/gpu/drm/i915/gem/i915_gem_stolen.c|   2 +-
> >  drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c  |  20 +--
> >  drivers/gpu/drm/i915/gem/i915_gem_userptr.c   |   2 +-
> >  .../drm/i915/gem/selftests/huge_gem_object.c  |   2 +-
> >  .../gpu/drm/i915/gem/selftests/huge_pages.c   |   3 +-
> >  drivers/gpu/drm/i915/gt/gen8_ppgtt.c  |  10 +-
> >  drivers/gpu/drm/i915/gt/intel_engine_cs.c |   2 +-
> >  drivers/gpu/drm/i915/gt/intel_ggtt.c  |  25 ++--
> >  drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c |   4 +-
> >  drivers/gpu/drm/i915/gt/intel_gtt.c   |   2 +-
> >  drivers/gpu/drm/i915/gt/intel_gtt.h   |   3 +-
> >  drivers/gpu/drm/i915/gt/intel_ppgtt.c |   6 +-
> >  .../gpu/drm/i915/gt/intel_ring_submission.c   |   4 +-
> >  drivers/gpu/drm/i915/gt/intel_timeline.c  |   2 +-
> >  drivers/gpu/drm/i915/gt/selftest_hangcheck.c  |   2 +-
> >  .../gpu/drm/i915/gt/selftest_workarounds.c|   2 +-
> >  drivers/gpu/drm/i915/i915_cache.c |  89 +++--
> >  drivers/gpu/drm/i915/i915_cache.h |  70 ++-
> >  drivers/gpu/drm/i915/i915_debugfs.c   |  53 ++--
> >  drivers/gpu/drm/i915/i915_driver.c|   4 +-
> >  drivers/gpu/drm/i915/i915_gem.c   |  13 --
> >  drivers/gpu/drm/i915/i915_pci.c   |  84 +++--
> >  drivers/gpu/drm/i915/i915_perf.c  |   2 +-
> >  drivers/gpu/drm/i915/intel_device_info.h  |   6 +-
> >  .../gpu/drm/i915/selftests/i915_gem_evict.c   |   4 +-
> >  drivers/gpu/drm/i915/selftests/igt_spinner.c  |   2 +-
> >  .../gpu/drm/i915/selft

Re: [RFC 7/8] drm/i915: Lift the user PAT restriction from use_cpu_reloc

2023-07-27 Thread Matt Roper
On Thu, Jul 27, 2023 at 03:55:03PM +0100, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin 
> 
> Now that i915 understands the caching modes behind PAT indices, we can
> refine the check in use_cpu_reloc() to not reject the uncached PAT if it
> was set by userspace.
> 
> Instead it can decide based on the presence of full coherency which
> should be functionally equivalent on legacy platforms. We can ignore WT
> since it is only used by the display, and we can ignore Meteorlake since
> it will fail on the existing "has_llc" condition before the object cache
> mode check.
> 
> Signed-off-by: Tvrtko Ursulin 
> Cc: Fei Yang 
> Cc: Matt Roper 
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 9 +
>  1 file changed, 1 insertion(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> index 9d6e49c8a4c6..f74b33670bad 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> @@ -640,16 +640,9 @@ static inline int use_cpu_reloc(const struct reloc_cache 
> *cache,
>   if (DBG_FORCE_RELOC == FORCE_GTT_RELOC)
>   return false;
>  
> - /*
> -  * For objects created by userspace through GEM_CREATE with pat_index
> -  * set by set_pat extension, i915_gem_object_has_cache_level() always
> -  * return true, otherwise the call would fall back to checking whether
> -  * the object is un-cached.
> -  */
>   return (cache->has_llc ||
>   obj->cache_dirty ||
> - !(obj->pat_set_by_user ||
> -   i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_UC)));
> + i915_gem_object_has_cache_flag(obj, I915_CACHE_FLAG_COH2W));

My understanding of relocations is minimal, but does 2W actually matter
here (CPU snooping GPU caches)?  I would have expected only 1W coherency
to be necessary (GPU snooping CPU caches)?
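
To make the question concrete, here is a minimal userspace sketch of the check, assuming a cache descriptor made of a base mode plus independent coherency flag bits. The type cache_t, the CACHE_* names, the bit layout and the extra required_flag parameter are all made up for illustration; only the shape of the has_llc / cache_dirty / flag test comes from the hunk above. It does not settle which flag is the right one, it only shows what changes for an object that is 1-way coherent only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical descriptor: low byte is the base mode, high byte holds flags. */
typedef uint16_t cache_t;

#define CACHE_MODE_UC    1u
#define CACHE_MODE_WB    2u
#define CACHE_FLAG_COH1W (1u << 8) /* assumed: GPU snoops CPU caches */
#define CACHE_FLAG_COH2W (1u << 9) /* assumed: CPU also snoops GPU caches */

/* True when the given flag bit is set in the descriptor. */
static bool has_cache_flag(cache_t cache, unsigned int flag)
{
	return (cache & flag) != 0;
}

/*
 * Same structure as the patched use_cpu_reloc(), with the coherency flag
 * made a parameter so both candidate requirements can be compared.
 */
static bool use_cpu_reloc(bool has_llc, bool cache_dirty, cache_t cache,
			  unsigned int required_flag)
{
	return has_llc || cache_dirty || has_cache_flag(cache, required_flag);
}
```

Under these assumptions a write-back object carrying only the 1-way flag is accepted when 1W is required and rejected when 2W is required, while any object on an LLC platform passes either way.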


Matt

>  }
>  
>  static int eb_reserve_vma(struct i915_execbuffer *eb,
> -- 
> 2.39.2
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


Re: [RFC 6/8] drm/i915: Lift the user PAT restriction from gpu_write_needs_clflush

2023-07-27 Thread Matt Roper
On Thu, Jul 27, 2023 at 03:55:02PM +0100, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin 
> 
> Now that i915 understands the caching modes behind PAT indices, and having
> also special cased the Meteorlake snooping fully coherent mode, we can
> remove the user PAT check from gpu_write_needs_clflush().
> 
> Signed-off-by: Tvrtko Ursulin 
> Cc: Fei Yang 
> Cc: Matt Roper 

Reviewed-by: Matt Roper 

> ---
>  drivers/gpu/drm/i915/gem/i915_gem_domain.c | 6 --
>  1 file changed, 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> index c15f83de33af..bf3a2fa0e539 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> @@ -41,12 +41,6 @@ static bool gpu_write_needs_clflush(struct 
> drm_i915_gem_object *obj)
>   if (IS_METEORLAKE(i915))
>   return false;
>  
> - /*
> -  * Always flush cache for UMD objects with PAT index set.
> -  */
> - if (obj->pat_set_by_user)
> - return true;
> -
>   /*
>* Fully coherent cached access may end up with data in the CPU cache
>* which hasn't hit memory yet.
> -- 
> 2.39.2
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


Re: [RFC 5/8] drm/i915: Improve the vm_fault_gtt user PAT index restriction

2023-07-27 Thread Matt Roper
On Thu, Jul 27, 2023 at 03:55:01PM +0100, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin 
> 
> Now that i915 understands the caching modes behind PAT indices, we can
> refine the check in vm_fault_gtt() to not reject the uncached PAT if it
> was set by userspace on a snoopable platform.
> 
> Signed-off-by: Tvrtko Ursulin 
> Cc: Fei Yang 
> Cc: Matt Roper 
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_mman.c | 14 +++---
>  1 file changed, 3 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> index cd7f8ded0d6f..9aa6ecf68432 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> @@ -382,17 +382,9 @@ static vm_fault_t vm_fault_gtt(struct vm_fault *vmf)
>   goto err_reset;
>   }
>  
> - /*
> -  * For objects created by userspace through GEM_CREATE with pat_index
> -  * set by set_pat extension, coherency is managed by userspace, make
> -  * sure we don't fail handling the vm fault by calling
> -  * i915_gem_object_has_cache_level() which always return true for such
> -  * objects. Otherwise this helper function would fall back to checking
> -  * whether the object is un-cached.
> -  */
> - if (!((obj->pat_set_by_user ||
> -i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_UC)) ||
> -   HAS_LLC(i915))) {
> + /* Access to snoopable pages through the GTT is incoherent. */

This comment was removed in the previous patch, but now it came back
here.  Should we have just left it be in the previous patch?

I'm not really clear on what it means either.  Are we using "GTT" as
shorthand to refer to the aperture here?


Matt

> + if (!i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_UC) &&
> + !HAS_LLC(i915)) {
>   ret = -EFAULT;
>   goto err_unpin;
>   }
> -- 
> 2.39.2
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


Re: [RFC 4/8] drm/i915: Refactor PAT/object cache handling

2023-07-27 Thread Matt Roper
On Thu, Jul 27, 2023 at 03:55:00PM +0100, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin 
> 
> Commit 9275277d5324 ("drm/i915: use pat_index instead of cache_level") has
> introduced PAT indices to i915 internal APIs, partially replacing the
> usage of driver internal cache_level, but has also added a few sub-
> optimal design decisions which this patch tries to improve upon.
> 
> Principal change here is to invert the per platform cache level to PAT
> index table which was added by the referenced commit, and by doing so
> enable i915 to understand the cache mode between PAT indices, changing
> them from opaque to transparent.
> 
> Once we have the inverted table we are able to remove the hidden false
> "return true" from i915_gem_object_has_cache_level and make the involved
> code path clearer.
> 
> To achieve this we replace the enum i915_cache_level with i915_cache_t,
> composed of a more detailed representation of each cache mode (base mode
> plus flags).
> 
> In this way we are able to express the differences between different
> write-back mode coherency settings on Meteorlake, which in turn enables us
> to map the i915 "cached" mode to the correct Meteorlake PAT index.
> 
> We can also replace the platform dependent cache mode to string code in
> debugfs and elsewhere by the single implementation based on i915_cache_t.
> 
> v2:
>  * Fix PAT-to-cache-mode table for PVC. (Fei)
>  * Cache display caching mode too. (Fei)
>  * Improve and document criteria in i915_gem_object_can_bypass_llc() (Matt)
> 
> v3:
>  * Checkpatch issues.
>  * Cache mode flags check fixed.
> 
> v4:
>  * Fix intel_device_info->cache_modes array size. (Matt)
>  * Boolean cache mode and flags query. (Matt)
>  * Reduce number of cache macros with some macro magic.
>  * One more checkpatch fix.
>  * Tweak tables to show legacy and Gen12 WB is fully coherent.
> 
> Signed-off-by: Tvrtko Ursulin 
> References: 9275277d5324 ("drm/i915: use pat_index instead of cache_level")
> Cc: Chris Wilson 
> Cc: Fei Yang 
> Cc: Andi Shyti 
> Cc: Matt Roper 
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_domain.c|  60 +
>  drivers/gpu/drm/i915/gem/i915_gem_domain.h|   5 +-
>  .../gpu/drm/i915/gem/i915_gem_execbuffer.c|   3 +-
>  drivers/gpu/drm/i915/gem/i915_gem_internal.c  |   2 +-
>  drivers/gpu/drm/i915/gem/i915_gem_mman.c  |   4 +-
>  drivers/gpu/drm/i915/gem/i915_gem_object.c| 117 ++
>  drivers/gpu/drm/i915/gem/i915_gem_object.h|  11 +-
>  .../gpu/drm/i915/gem/i915_gem_object_types.h  | 116 +
>  drivers/gpu/drm/i915/gem/i915_gem_shmem.c |   8 +-
>  drivers/gpu/drm/i915/gem/i915_gem_stolen.c|   2 +-
>  drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c  |  20 +--
>  drivers/gpu/drm/i915/gem/i915_gem_userptr.c   |   2 +-
>  .../drm/i915/gem/selftests/huge_gem_object.c  |   2 +-
>  .../gpu/drm/i915/gem/selftests/huge_pages.c   |   3 +-
>  drivers/gpu/drm/i915/gt/gen8_ppgtt.c  |  10 +-
>  drivers/gpu/drm/i915/gt/intel_engine_cs.c |   2 +-
>  drivers/gpu/drm/i915/gt/intel_ggtt.c  |  25 ++--
>  drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c |   4 +-
>  drivers/gpu/drm/i915/gt/intel_gtt.c   |   2 +-
>  drivers/gpu/drm/i915/gt/intel_gtt.h   |   3 +-
>  drivers/gpu/drm/i915/gt/intel_ppgtt.c |   6 +-
>  .../gpu/drm/i915/gt/intel_ring_submission.c   |   4 +-
>  drivers/gpu/drm/i915/gt/intel_timeline.c  |   2 +-
>  drivers/gpu/drm/i915/gt/selftest_hangcheck.c  |   2 +-
>  .../gpu/drm/i915/gt/selftest_workarounds.c|   2 +-
>  drivers/gpu/drm/i915/i915_cache.c |  89 +++--
>  drivers/gpu/drm/i915/i915_cache.h |  70 ++-
>  drivers/gpu/drm/i915/i915_debugfs.c   |  53 ++--
>  drivers/gpu/drm/i915/i915_driver.c|   4 +-
>  drivers/gpu/drm/i915/i915_gem.c   |  13 --
>  drivers/gpu/drm/i915/i915_pci.c   |  84 +++--
>  drivers/gpu/drm/i915/i915_perf.c  |   2 +-
>  drivers/gpu/drm/i915/intel_device_info.h  |   6 +-
>  .../gpu/drm/i915/selftests/i915_gem_evict.c   |   4 +-
>  drivers/gpu/drm/i915/selftests/igt_spinner.c  |   2 +-
>  .../gpu/drm/i915/selftests/mock_gem_device.c  |  14 +--
>  36 files changed, 391 insertions(+), 367 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> index 57db9c581bf6..c15f83de33af 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> @@ -8,6 +8,7 @@
>  #include "display/intel_frontbuffer.h"
>  #include "gt/intel_gt.h"
>  
> +#include 

Re: [RFC 3/8] drm/i915: Cache PAT index used by the driver

2023-07-27 Thread Matt Roper
On Thu, Jul 27, 2023 at 03:54:59PM +0100, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin 
> 
> Eliminate a bunch of runtime calls to i915_gem_get_pat_index() by caching
> the interesting PAT indices in struct drm_i915_private. They are static
> per platform so no need to consult a function every time.
> 
> Signed-off-by: Tvrtko Ursulin 
> Cc: Matt Roper 
> Cc: Fei Yang 
> ---
>  drivers/gpu/drm/i915/Makefile |  1 +
>  .../gpu/drm/i915/gem/i915_gem_execbuffer.c|  3 +--
>  drivers/gpu/drm/i915/gem/i915_gem_stolen.c|  7 ++---
>  drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c  | 26 ---
>  .../gpu/drm/i915/gem/selftests/huge_pages.c   |  2 +-
>  drivers/gpu/drm/i915/gt/gen6_ppgtt.c  |  4 +--
>  drivers/gpu/drm/i915/gt/gen8_ppgtt.c  |  4 +--
>  drivers/gpu/drm/i915/gt/intel_ggtt.c  |  8 ++
>  drivers/gpu/drm/i915/gt/intel_migrate.c   | 11 +++-
>  drivers/gpu/drm/i915/gt/selftest_migrate.c|  9 +++
>  drivers/gpu/drm/i915/gt/selftest_reset.c  | 14 +++---
>  drivers/gpu/drm/i915/gt/selftest_tlb.c|  5 ++--
>  drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c  |  8 ++
>  drivers/gpu/drm/i915/i915_cache.c | 18 +
>  drivers/gpu/drm/i915/i915_cache.h | 13 ++
>  drivers/gpu/drm/i915/i915_driver.c|  3 +++
>  drivers/gpu/drm/i915/i915_drv.h   |  2 ++
>  drivers/gpu/drm/i915/i915_gem.c   |  8 ++
>  drivers/gpu/drm/i915/i915_gpu_error.c |  8 ++
>  drivers/gpu/drm/i915/selftests/i915_gem.c |  5 +---
>  .../gpu/drm/i915/selftests/i915_gem_evict.c   |  4 +--
>  drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 11 +++-
>  .../drm/i915/selftests/intel_memory_region.c  |  4 +--
>  .../gpu/drm/i915/selftests/mock_gem_device.c  |  2 ++
>  24 files changed, 89 insertions(+), 91 deletions(-)
>  create mode 100644 drivers/gpu/drm/i915/i915_cache.c
>  create mode 100644 drivers/gpu/drm/i915/i915_cache.h
> 
> diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> index c5fc91cd58e7..905a51a16588 100644
> --- a/drivers/gpu/drm/i915/Makefile
> +++ b/drivers/gpu/drm/i915/Makefile
> @@ -35,6 +35,7 @@ subdir-ccflags-y += -I$(srctree)/$(src)
>  # core driver code
>  i915-y += i915_driver.o \
> i915_drm_client.o \
> +   i915_cache.o \
> i915_config.o \
> i915_getparam.o \
> i915_ioctl.o \
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> index 5a687a3686bd..0a1d40220020 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> @@ -1330,8 +1330,7 @@ static void *reloc_iomap(struct i915_vma *batch,
>   ggtt->vm.insert_page(&ggtt->vm,
>i915_gem_object_get_dma_address(obj, page),
>offset,
> -  i915_gem_get_pat_index(ggtt->vm.i915,
> - I915_CACHE_NONE),
> +  eb->i915->pat_uc,
>0);
>   } else {
>   offset += page << PAGE_SHIFT;
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
> index 5b0a5cf9a98a..1c8eb806b7d3 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
> @@ -563,11 +563,8 @@ static void dbg_poison(struct i915_ggtt *ggtt,
>   while (size) {
>   void __iomem *s;
>  
> - ggtt->vm.insert_page(&ggtt->vm, addr,
> -  ggtt->error_capture.start,
> -  i915_gem_get_pat_index(ggtt->vm.i915,
> - I915_CACHE_NONE),
> -  0);
> + ggtt->vm.insert_page(&ggtt->vm, addr, ggtt->error_capture.start,
> +  ggtt->vm.i915->pat_uc, 0);
>   mb();
>  
>   s = io_mapping_map_wc(&ggtt->iomap,
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
> index 7078af2f8f79..6bd6c239f4ac 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
> @@ -58,6 +58,16 @@ i915_ttm_cache_level(struct drm_i915_private *i915, struct 
> ttm_resource *res,
>   I915_CACHE_NONE;
>  }
>  
> +static unsigned int
> +i915_ttm_cache_pat(struct drm_i915_private *i915, struct ttm_reso

Re: [Intel-gfx] [RFC 2/8] drm/i915: Split PTE encode between Gen12 and Meteorlake

2023-07-27 Thread Matt Roper
On Thu, Jul 27, 2023 at 03:54:58PM +0100, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin 
> 
> No need to run extra instructions which will never trigger on platforms
> before Meteorlake.
> 
> Signed-off-by: Tvrtko Ursulin 
> ---
>  drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 26 ++
>  1 file changed, 26 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c 
> b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> index c8568e5d1147..862ac1d2de25 100644
> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> @@ -63,6 +63,30 @@ static u64 gen12_pte_encode(dma_addr_t addr,
>  {
>   gen8_pte_t pte = addr | GEN8_PAGE_PRESENT | GEN8_PAGE_RW;
>  
> + if (unlikely(flags & PTE_READ_ONLY))
> + pte &= ~GEN8_PAGE_RW;
> +
> + if (flags & PTE_LM)
> + pte |= GEN12_PPGTT_PTE_LM;
> +
> + if (pat_index & BIT(0))
> + pte |= GEN12_PPGTT_PTE_PAT0;
> +
> + if (pat_index & BIT(1))
> + pte |= GEN12_PPGTT_PTE_PAT1;
> +
> + if (pat_index & BIT(2))
> + pte |= GEN12_PPGTT_PTE_PAT2;
> +
> + return pte;
> +}
> +
> +static u64 mtl_pte_encode(dma_addr_t addr,
> +   unsigned int pat_index,
> +   u32 flags)
> +{
> + gen8_pte_t pte = addr | GEN8_PAGE_PRESENT | GEN8_PAGE_RW;
> +

Would it be more readable to start with

gen8_pte_t pte = gen12_pte_encode(addr, pat_index, flags);

and then |-in only the MTL-specific bit(s) as appropriate?
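
Roughly what I have in mind, as a userspace sketch rather than a drop-in replacement. The typedefs and BIT macros are stand-ins for the kernel ones, every bit position is a placeholder, and MTL_PPGTT_PTE_PAT3 is a hypothetical name for whatever MTL-only bit(s) the real encoder needs; the gen12 body follows the hunk above.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel types and helpers used by the patch. */
typedef uint64_t dma_addr_t;
typedef uint64_t gen8_pte_t;
#define BIT(n)     (1u << (n))
#define BIT_ULL(n) (1ull << (n))

/* Placeholder bit positions, chosen for the sketch only. */
#define GEN8_PAGE_PRESENT    BIT_ULL(0)
#define GEN8_PAGE_RW         BIT_ULL(1)
#define GEN12_PPGTT_PTE_PAT0 BIT_ULL(3)
#define GEN12_PPGTT_PTE_PAT1 BIT_ULL(4)
#define GEN12_PPGTT_PTE_PAT2 BIT_ULL(7)
#define GEN12_PPGTT_PTE_LM   BIT_ULL(11)
#define MTL_PPGTT_PTE_PAT3   BIT_ULL(62) /* hypothetical MTL-only bit */

#define PTE_READ_ONLY BIT(0)
#define PTE_LM        BIT(1)

/* Gen12 encoder, mirroring the body added by the patch. */
static uint64_t gen12_pte_encode(dma_addr_t addr, unsigned int pat_index,
				 uint32_t flags)
{
	gen8_pte_t pte = addr | GEN8_PAGE_PRESENT | GEN8_PAGE_RW;

	if (flags & PTE_READ_ONLY)
		pte &= ~GEN8_PAGE_RW;
	if (flags & PTE_LM)
		pte |= GEN12_PPGTT_PTE_LM;
	if (pat_index & BIT(0))
		pte |= GEN12_PPGTT_PTE_PAT0;
	if (pat_index & BIT(1))
		pte |= GEN12_PPGTT_PTE_PAT1;
	if (pat_index & BIT(2))
		pte |= GEN12_PPGTT_PTE_PAT2;

	return pte;
}

/* MTL encoder: reuse the Gen12 result and OR in only what is new. */
static uint64_t mtl_pte_encode(dma_addr_t addr, unsigned int pat_index,
			       uint32_t flags)
{
	gen8_pte_t pte = gen12_pte_encode(addr, pat_index, flags);

	/* Sketch-only sanity check: four PAT index bits are modelled here. */
	assert(pat_index < 16);

	if (pat_index & BIT(3))
		pte |= MTL_PPGTT_PTE_PAT3;

	return pte;
}
```

The trade-off is one extra call on MTL in exchange for not duplicating the read-only, LM and PAT0-2 handling in two places.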

>   if (unlikely(flags & PTE_READ_ONLY))
>   pte &= ~GEN8_PAGE_RW;
>  
> @@ -995,6 +1019,8 @@ struct i915_ppgtt *gen8_ppgtt_create(struct intel_gt *gt,
>*/
>   ppgtt->vm.alloc_scratch_dma = alloc_pt_dma;
>  
> + if (GRAPHICS_VER_FULL(gt->i915) >= IP_VER(12, 70))
> + ppgtt->vm.pte_encode = mtl_pte_encode;
>   if (GRAPHICS_VER(gt->i915) >= 12)
>   ppgtt->vm.pte_encode = gen12_pte_encode;

I think you wanted 'else if' here.  Otherwise you clobber the MTL
function pointer.
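
A small standalone model of the two variants, in case it helps. The enum and function names are made up for the sketch, and it assumes the full version is packed as (ver << 8 | rel), which is how I understand IP_VER() to work.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed packing of the full IP version: major in the high bits. */
#define IP_VER(ver, rel) ((ver) << 8 | (rel))

/* Hypothetical labels standing in for the three pte_encode callbacks. */
enum pte_encoder { ENC_GEN8, ENC_GEN12, ENC_MTL };

/* As posted: the second 'if' runs on MTL too and overwrites the choice. */
static enum pte_encoder pick_as_posted(unsigned int ver_full)
{
	enum pte_encoder enc = ENC_GEN8;

	if (ver_full >= IP_VER(12, 70))
		enc = ENC_MTL;
	if ((ver_full >> 8) >= 12)
		enc = ENC_GEN12;
	else
		enc = ENC_GEN8;

	return enc;
}

/* With 'else if': the first matching branch wins. */
static enum pte_encoder pick_with_else_if(unsigned int ver_full)
{
	enum pte_encoder enc;

	if (ver_full >= IP_VER(12, 70))
		enc = ENC_MTL;
	else if ((ver_full >> 8) >= 12)
		enc = ENC_GEN12;
	else
		enc = ENC_GEN8;

	return enc;
}
```

With the posted ordering a 12.70 device ends up on the Gen12 encoder; with the chained form it keeps the MTL one and older platforms are unaffected.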


Matt

>   else
> -- 
> 2.39.2
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


Re: [RFC 1/8] drm/i915: Skip clflush after GPU writes on Meteorlake

2023-07-27 Thread Matt Roper
On Thu, Jul 27, 2023 at 03:54:57PM +0100, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin 
> 
> On Meteorlake CPU cache will not contain stale data after GPU access since
> write-invalidate protocol is used, which means there is no need to flush
> before potentially transitioning the buffer to a non-coherent domain.
> 
> Use the opportunity to document the situation on discrete too.
> 
> Signed-off-by: Tvrtko Ursulin 
> Cc: Matt Roper 
> Cc: Fei Yang 
> Cc: Matthew Auld 
> Cc: Thomas Hellström 

Reviewed-by: Matt Roper 

> ---
>  drivers/gpu/drm/i915/gem/i915_gem_domain.c | 13 +
>  1 file changed, 13 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> index ffddec1d2a76..57db9c581bf6 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> @@ -24,9 +24,22 @@ static bool gpu_write_needs_clflush(struct 
> drm_i915_gem_object *obj)
>  {
>   struct drm_i915_private *i915 = to_i915(obj->base.dev);
>  
> + /*
> +  * Discrete GPUs never dirty the CPU cache.
> +  */
>   if (IS_DGFX(i915))
>   return false;
>  
> + /*
> +  * Cache snooping on Meteorlake is using write-invalidate so GPU writes
> +  * never end up in the CPU cache.
> +  *
> +  * QQQ: Do other snooping platforms behave identically and could we
> +  *  therefore write this as "if !HAS_LLC(i915) && HAS_SNOOP(i915)"?
> +  */
> + if (IS_METEORLAKE(i915))
> + return false;
> +
>   /*
>* For objects created by userspace through GEM_CREATE with pat_index
>* set by set_pat extension, i915_gem_object_has_cache_level() will
> -- 
> 2.39.2
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


Re: [PATCH v8 5/9] drm/i915/gt: Enable the CCS_FLUSH bit in the pipe control

2023-07-21 Thread Matt Roper
On Fri, Jul 21, 2023 at 06:15:10PM +0200, Andi Shyti wrote:
> Enable the CCS_FLUSH bit 13 in the control pipe for render and
> compute engines in platforms starting from Meteor Lake (BSPEC
> 43904 and 47112).
> 
> Fixes: 972282c4cf24 ("drm/i915/gen12: Add aux table invalidate for all 
> engines")
> Signed-off-by: Andi Shyti 
> Cc: Jonathan Cavitt 
> Cc: Nirmoy Das 
> Cc:  # v5.8+

Reviewed-by: Matt Roper 

> ---
>  drivers/gpu/drm/i915/gt/gen8_engine_cs.c | 7 +++
>  drivers/gpu/drm/i915/gt/intel_gpu_commands.h | 1 +
>  2 files changed, 8 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c 
> b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
> index 5d2175e918dd2..139a7e69f5c4d 100644
> --- a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
> +++ b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
> @@ -230,6 +230,13 @@ int gen12_emit_flush_rcs(struct i915_request *rq, u32 
> mode)
>  
>   bit_group_0 |= PIPE_CONTROL0_HDC_PIPELINE_FLUSH;
>  
> + /*
> +  * When required, in MTL and beyond platforms we
> +  * need to set the CCS_FLUSH bit in the pipe control
> +  */
> + if (GRAPHICS_VER_FULL(rq->i915) >= IP_VER(12, 70))
> + bit_group_0 |= PIPE_CONTROL_CCS_FLUSH;
> +
>   bit_group_1 |= PIPE_CONTROL_TILE_CACHE_FLUSH;
>   bit_group_1 |= PIPE_CONTROL_FLUSH_L3;
>   bit_group_1 |= PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH;
> diff --git a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h 
> b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
> index 5d143e2a8db03..5df7cce23197c 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
> @@ -299,6 +299,7 @@
>  #define   PIPE_CONTROL_QW_WRITE  (1<<14)
>  #define   PIPE_CONTROL_POST_SYNC_OP_MASK(3<<14)
>  #define   PIPE_CONTROL_DEPTH_STALL   (1<<13)
> +#define   PIPE_CONTROL_CCS_FLUSH (1<<13) /* MTL+ */
>  #define   PIPE_CONTROL_WRITE_FLUSH   (1<<12)
>  #define   PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH (1<<12) /* gen6+ */
>  #define   PIPE_CONTROL_INSTRUCTION_CACHE_INVALIDATE  (1<<11) /* MBZ on ILK */
> -- 
> 2.40.1
> 
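
For reference, the version gate in the first hunk can be modelled in userspace as below. This is only a sketch: flush_bit_group_0() is a made-up helper, the HDC flush bit position is a placeholder, and IP_VER() is assumed to pack the version as (ver << 8 | rel). Only the CCS_FLUSH bit number (13) is taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed packing of the full IP version: major in the high bits. */
#define IP_VER(ver, rel) ((ver) << 8 | (rel))

#define PIPE_CONTROL0_HDC_PIPELINE_FLUSH (1u << 9)  /* placeholder position */
#define PIPE_CONTROL_CCS_FLUSH           (1u << 13) /* MTL+, from the patch */

/* Build the first PIPE_CONTROL flag group for a flush on a given IP version. */
static uint32_t flush_bit_group_0(unsigned int ver_full)
{
	uint32_t bit_group_0 = PIPE_CONTROL0_HDC_PIPELINE_FLUSH;

	/* Only 12.70 and later also request the CCS flush. */
	if (ver_full >= IP_VER(12, 70))
		bit_group_0 |= PIPE_CONTROL_CCS_FLUSH;

	return bit_group_0;
}
```

Because the comparison is on the packed value, anything below 12.70 (12.55, say) leaves the bit clear and every later release sets it.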

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


Re: [PATCH v8 2/9] drm/i915: Add the gen12_needs_ccs_aux_inv helper

2023-07-21 Thread Matt Roper
On Fri, Jul 21, 2023 at 06:15:07PM +0200, Andi Shyti wrote:
> We always assumed that a device might either have AUX or FLAT
> CCS, but this is an approximation that is not always true, e.g.
> PVC represents an exception.
> 
> Set the basis for future finer selection by implementing a
> boolean gen12_needs_ccs_aux_inv() function that tells whether aux
> invalidation is needed or not.
> 
> Currently PVC is the only exception to the above mentioned rule.
> 
> Signed-off-by: Andi Shyti 
> Cc: Matt Roper 
> Cc: Jonathan Cavitt 
> Cc:  # v5.8+

Reviewed-by: Matt Roper 

> ---
>  drivers/gpu/drm/i915/gt/gen8_engine_cs.c | 18 +++---
>  1 file changed, 15 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c 
> b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
> index 563efee055602..460c9225a50fc 100644
> --- a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
> +++ b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
> @@ -165,6 +165,18 @@ static u32 preparser_disable(bool state)
>   return MI_ARB_CHECK | 1 << 8 | state;
>  }
>  
> +static bool gen12_needs_ccs_aux_inv(struct intel_engine_cs *engine)
> +{
> + if (IS_PONTEVECCHIO(engine->i915))
> + return false;
> +
> + /*
> +  * so far platforms supported by i915 having
> +  * flat ccs do not require AUX invalidation
> +  */
> + return !HAS_FLAT_CCS(engine->i915);
> +}
> +
>  u32 *gen12_emit_aux_table_inv(struct intel_gt *gt, u32 *cs, const i915_reg_t 
> inv_reg)
>  {
>   u32 gsi_offset = gt->uncore->gsi_offset;
> @@ -267,7 +279,7 @@ int gen12_emit_flush_rcs(struct i915_request *rq, u32 
> mode)
>   else if (engine->class == COMPUTE_CLASS)
>   flags &= ~PIPE_CONTROL_3D_ENGINE_FLAGS;
>  
> - if (!HAS_FLAT_CCS(rq->engine->i915))
> + if (gen12_needs_ccs_aux_inv(rq->engine))
>   count = 8 + 4;
>   else
>   count = 8;
> @@ -285,7 +297,7 @@ int gen12_emit_flush_rcs(struct i915_request *rq, u32 
> mode)
>  
>   cs = gen8_emit_pipe_control(cs, flags, LRC_PPHWSP_SCRATCH_ADDR);
>  
> - if (!HAS_FLAT_CCS(rq->engine->i915)) {
> + if (gen12_needs_ccs_aux_inv(rq->engine)) {
>   /* hsdes: 1809175790 */
>   cs = gen12_emit_aux_table_inv(rq->engine->gt, cs,
> GEN12_CCS_AUX_INV);
> @@ -307,7 +319,7 @@ int gen12_emit_flush_xcs(struct i915_request *rq, u32 
> mode)
>   if (mode & EMIT_INVALIDATE) {
>   cmd += 2;
>  
> - if (!HAS_FLAT_CCS(rq->engine->i915) &&
> + if (gen12_needs_ccs_aux_inv(rq->engine) &&
>   (rq->engine->class == VIDEO_DECODE_CLASS ||
>rq->engine->class == VIDEO_ENHANCEMENT_CLASS)) {
>   aux_inv = rq->engine->mask &
> -- 
> 2.40.1
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


Re: [PATCH v3] drm/i915: Refactor PAT/object cache handling

2023-07-21 Thread Matt Roper
> >>>>> managed
> >>>>> - * by userspace. Otherwise the call here would fall back to 
> >>>>> checking
> >>>>> - * whether the object is un-cached or write-through.
> >>>>> - */
> >>>>> -return !(i915_gem_object_has_cache_level(obj, I915_CACHE_NONE) ||
> >>>>> - i915_gem_object_has_cache_level(obj, I915_CACHE_WT));
> >>>>> +return i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_UC) != 
> >>>>> 1 &&
> >>>>> +   i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WT) != 
> >>>>> 1;
> >>>>>   }
> >>>
> >>> [snip]
> >>>>> @@ -640,15 +640,9 @@ static inline int use_cpu_reloc(const struct 
> >>>>> reloc_cache *cache,
> >>>>>   if (DBG_FORCE_RELOC == FORCE_GTT_RELOC)
> >>>>>   return false;
> >>>>>
> >>>>> -/*
> >>>>> - * For objects created by userspace through GEM_CREATE with 
> >>>>> pat_index
> >>>>> - * set by set_pat extension, i915_gem_object_has_cache_level() 
> >>>>> always
> >>>>> - * return true, otherwise the call would fall back to checking 
> >>>>> whether
> >>>>> - * the object is un-cached.
> >>>>> - */
> >>>>>   return (cache->has_llc ||
> >>>>>   obj->cache_dirty ||
> >>>>> -!i915_gem_object_has_cache_level(obj, I915_CACHE_NONE));
> >>>>> +i915_gem_object_has_cache_mode(obj,
> >>>>> + I915_CACHE_MODE_UC) != 1);
> >>>>
> >>>> Platforms with relocations and platforms with user-specified PAT
> >>>> have no overlap, right?  So a -1 return should be impossible here
> >>>> and this is one case where we could just treat the return value as
> >>>> a boolean, right?
> >>>
> >
> > Hm no, or maybe. My thinking behind tri-state is to allow a safe option
> > for "don't know". In case PAT index to cache mode table is not fully
> > populated on some future platform.
> 
> That would be a problem in the cache mode table. At least max_pat_index
> should have guaranteed the PAT index is sane.
> 
> -Fei
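
For reference, the tri-state return being discussed can be sketched
roughly as below. This is only an illustration under stated assumptions:
the fake_* names and the table contents are made up, and the only thing
taken from the patch is the convention of 1 for "yes", -1 for "don't
know" and callers comparing against 1.

```c
#include <stdbool.h>
#include <stddef.h>

enum fake_cache_mode {
	FAKE_MODE_UNKNOWN = 0,	/* table slot was never filled in */
	FAKE_MODE_UC,
	FAKE_MODE_WT,
	FAKE_MODE_WB,
};

/* Hypothetical PAT index -> cache mode table with one unpopulated slot. */
static const enum fake_cache_mode fake_pat_table[] = {
	FAKE_MODE_WB,
	FAKE_MODE_WT,
	FAKE_MODE_UC,
	FAKE_MODE_UNKNOWN,
};

/* Returns 1 for "yes", 0 for "no", -1 for "don't know". */
static int fake_has_cache_mode(unsigned int pat_index,
			       enum fake_cache_mode mode)
{
	size_t n = sizeof(fake_pat_table) / sizeof(fake_pat_table[0]);

	if (pat_index >= n || fake_pat_table[pat_index] == FAKE_MODE_UNKNOWN)
		return -1;

	return fake_pat_table[pat_index] == mode;
}

/* "!= 1" treats "don't know" the same as "no". */
static bool fake_not_known_uc(unsigned int pat_index)
{
	return fake_has_cache_mode(pat_index, FAKE_MODE_UC) != 1;
}

/* "== 0" only accepts a definite "no". */
static bool fake_known_not_uc(unsigned int pat_index)
{
	return fake_has_cache_mode(pat_index, FAKE_MODE_UC) == 0;
}
```

The two predicates only disagree when the lookup returns -1, so on
platforms where that cannot happen the result can be treated as a plain
boolean.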

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


Re: [PATCH v6 7/9] drm/i915/gt: Enable the CCS_FLUSH bit in the pipe control

2023-07-20 Thread Matt Roper
On Wed, Jul 19, 2023 at 01:07:27PM +0200, Andi Shyti wrote:
> Enable the CCS_FLUSH bit 13 in the pipe control for render and
> compute engines in platforms starting from Meteor Lake (BSPEC
> 43904 and 47112). The VE and BCS engines need to add the flush
> part in their command streamer.
> 
> Fixes: 972282c4cf24 ("drm/i915/gen12: Add aux table invalidate for all 
> engines")
> Signed-off-by: Andi Shyti 
> Cc: Jonathan Cavitt 
> Cc: Nirmoy Das 
> Cc:  # v5.8+
> ---
>  drivers/gpu/drm/i915/gt/gen8_engine_cs.c | 31 
>  drivers/gpu/drm/i915/gt/intel_gpu_commands.h |  1 +
>  2 files changed, 32 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c 
> b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
> index 3bedab8d61db1..78bbd55262a2d 100644
> --- a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
> +++ b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
> @@ -225,6 +225,13 @@ int gen12_emit_flush_rcs(struct i915_request *rq, u32 
> mode)
>  
>   bit_group_0 |= PIPE_CONTROL0_HDC_PIPELINE_FLUSH;
>  
> + /*
> +  * When required, in MTL+ platforms we need to

Nitpick:  let's avoid using "FOO+" as "FOO and beyond."  We already have
formal IP names that include + signs (Xe_LPM+, Xe_LPD+, etc.), so using
it this way can cause confusion.

> +  * set the CCS_FLUSH bit in the pipe control
> +  */
> + if (GRAPHICS_VER_FULL(rq->i915) >= IP_VER(12, 70))
> + bit_group_0 |= PIPE_CONTROL_CCS_FLUSH;
> +
>   bit_group_1 |= PIPE_CONTROL_TILE_CACHE_FLUSH;
>   bit_group_1 |= PIPE_CONTROL_FLUSH_L3;
>   bit_group_1 |= PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH;
> @@ -309,6 +316,7 @@ int gen12_emit_flush_rcs(struct i915_request *rq, u32 
> mode)
>  int gen12_emit_flush_xcs(struct i915_request *rq, u32 mode)
>  {
>   intel_engine_mask_t aux_inv = 0;
> + u32 cmd_flush = 0;
>   u32 cmd = 4;
>   u32 *cs;
>  
> @@ -339,6 +347,13 @@ int gen12_emit_flush_xcs(struct i915_request *rq, u32 
> mode)
>   bit_group_1 |= PIPE_CONTROL_FLUSH_L3;
>   bit_group_1 |= PIPE_CONTROL_CS_STALL;
>  
> + /*
> +  * When required, in MTL+ platforms we need to
> +  * set the CCS_FLUSH bit in the pipe control
> +  */
> + if (GRAPHICS_VER_FULL(rq->i915) >= IP_VER(12, 70))
> + bit_group_0 |= PIPE_CONTROL_CCS_FLUSH;
> +
>   intel_emit_pipe_control_cs(rq, bit_group_0, bit_group_1,
>  LRC_PPHWSP_SCRATCH_ADDR);
>  
> @@ -346,7 +361,18 @@ int gen12_emit_flush_xcs(struct i915_request *rq, u32 
> mode)
>  
>   case VIDEO_ENHANCEMENT_CLASS:
>   case COMPUTE_CLASS:
> + cmd += 2;
> + cmd_flush = MI_FLUSH_DW;
> +
> + break;
> +

It looks like some of these changes wound up in the wrong patch?
And as Nirmoy pointed out on the other patch, some of the functions and
engine instructions are mixed around here too.


Matt

>   case COPY_ENGINE_CLASS:
> + cmd += 2;
> + /*
> +  * When required, in MTL+ platforms we need to
> +  * set the CCS_FLUSH bit in the pipe control
> +  */
> + cmd_flush = MI_FLUSH_DW | MI_FLUSH_DW_CCS;
>   break;
>   }
>   }
> @@ -355,6 +381,11 @@ int gen12_emit_flush_xcs(struct i915_request *rq, u32 
> mode)
>   if (IS_ERR(cs))
>   return PTR_ERR(cs);
>  
> + if (cmd_flush) {
> + *cs++ = cmd_flush;
> + *cs++ = 0;
> + }
> +
>   if (mode & EMIT_INVALIDATE)
>   *cs++ = preparser_disable(true);
>  
> diff --git a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h 
> b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
> index 5d143e2a8db03..5df7cce23197c 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
> @@ -299,6 +299,7 @@
>  #define   PIPE_CONTROL_QW_WRITE  (1<<14)
>  #define   PIPE_CONTROL_POST_SYNC_OP_MASK(3<<14)
>  #define   PIPE_CONTROL_DEPTH_STALL   (1<<13)
> +#define   PIPE_CONTROL_CCS_FLUSH (1<<13) /* MTL+ */
>  #define   PIPE_CONTROL_WRITE_FLUSH   (1<<12)
>  #define   PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH (1<<12) /* gen6+ */
>  #define   PIPE_CONTROL_INSTRUCTION_CACHE_INVALIDATE  (1<<11) /* MBZ on ILK */
> -- 
> 2.40.1
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


Re: [PATCH v6 5/9] drm/i915/gt: Refactor intel_emit_pipe_control_cs() in a single function

2023-07-20 Thread Matt Roper
On Wed, Jul 19, 2023 at 01:07:25PM +0200, Andi Shyti wrote:
> Just a trivial refactoring to reduce code duplication. This will
> come in handy in the next commits.
> 
> Signed-off-by: Andi Shyti 
> ---
>  drivers/gpu/drm/i915/gt/gen8_engine_cs.c | 44 +---
>  1 file changed, 23 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c 
> b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
> index 7566c89d9def3..1b1dadacfbf42 100644
> --- a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
> +++ b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
> @@ -177,23 +177,31 @@ u32 *gen12_emit_aux_table_inv(struct intel_gt *gt, u32 
> *cs, const i915_reg_t inv
>   return cs;
>  }
>  
> +static u32 *intel_emit_pipe_control_cs(struct i915_request *rq, u32 
> bit_group_0,

This is another case where it gets confusing because this function name
sounds like it's something generic, but it actually only applies to a
small subset of platforms (gen12).

> +u32 bit_group_1, u32 offset)
> +{
> + u32 *cs;
> +
> + cs = intel_ring_begin(rq, 6);
> + if (IS_ERR(cs))
> + return cs;

We're not actually checking for this error at the callsites.  Should we
be checking for it and propagating it farther up the call stack?

> +
> + cs = gen12_emit_pipe_control(cs, bit_group_0, bit_group_1,
> +  LRC_PPHWSP_SCRATCH_ADDR);
> + intel_ring_advance(rq, cs);
> +
> + return cs;

This cursor never gets used for anything.  We can probably just make
this function return an int error code.
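
As a rough sketch of the int-returning shape (stand-in types so it
builds on its own; fake_request and fake_ring_begin() are made up for
illustration, and the dwords written are placeholders rather than the
real PIPE_CONTROL encoding):

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct i915_request: a small ring plus a fill level. */
struct fake_request {
	uint32_t ring[16];
	size_t used;
};

/* Stand-in for intel_ring_begin(): reserve n dwords or fail. */
static int fake_ring_begin(struct fake_request *rq, size_t n, uint32_t **cs)
{
	if (rq->used + n > sizeof(rq->ring) / sizeof(rq->ring[0]))
		return -ENOSPC;

	*cs = &rq->ring[rq->used];
	return 0;
}

/* The helper returns an error code; the cursor never leaves it. */
static int emit_pipe_control_cs(struct fake_request *rq, uint32_t bit_group_0,
				uint32_t bit_group_1, uint32_t offset)
{
	uint32_t *cs;
	int err;

	err = fake_ring_begin(rq, 6, &cs);
	if (err)
		return err;

	*cs++ = bit_group_0;	/* placeholder for the header dword */
	*cs++ = bit_group_1;
	*cs++ = offset;
	*cs++ = 0;
	*cs++ = 0;
	*cs++ = 0;
	rq->used += 6;		/* stand-in for intel_ring_advance() */

	return 0;
}

/* A caller propagates the error instead of dropping it. */
static int dummy_pipe_control(struct fake_request *rq, int needs_wa)
{
	if (!needs_wa)
		return 0;

	return emit_pipe_control_cs(rq, 0, 1u << 0, 0);
}
```

With that shape the callers can just return the helper's result, so a
failed ring reservation is no longer silently ignored.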


Matt

> +}
> +
>  static int mtl_dummy_pipe_control(struct i915_request *rq)
>  {
>   /* Wa_14016712196 */
>   if (IS_MTL_GRAPHICS_STEP(rq->engine->i915, M, STEP_A0, STEP_B0) ||
> - IS_MTL_GRAPHICS_STEP(rq->engine->i915, P, STEP_A0, STEP_B0)) {
> - u32 *cs;
> -
> - /* dummy PIPE_CONTROL + depth flush */
> - cs = intel_ring_begin(rq, 6);
> - if (IS_ERR(cs))
> - return PTR_ERR(cs);
> - cs = gen12_emit_pipe_control(cs,
> -  0,
> -  PIPE_CONTROL_DEPTH_CACHE_FLUSH,
> -  LRC_PPHWSP_SCRATCH_ADDR);
> - intel_ring_advance(rq, cs);
> - }
> + IS_MTL_GRAPHICS_STEP(rq->engine->i915, P, STEP_A0, STEP_B0))
> + intel_emit_pipe_control_cs(rq,
> +0,
> +PIPE_CONTROL_DEPTH_CACHE_FLUSH,
> +LRC_PPHWSP_SCRATCH_ADDR);
>  
>   return 0;
>  }
> @@ -210,7 +218,6 @@ int gen12_emit_flush_rcs(struct i915_request *rq, u32 
> mode)
>   u32 bit_group_0 = 0;
>   u32 bit_group_1 = 0;
>   int err;
> - u32 *cs;
>  
>   err = mtl_dummy_pipe_control(rq);
>   if (err)
> @@ -237,13 +244,8 @@ int gen12_emit_flush_rcs(struct i915_request *rq, u32 
> mode)
>   else if (engine->class == COMPUTE_CLASS)
>   bit_group_1 &= ~PIPE_CONTROL_3D_ENGINE_FLAGS;
>  
> - cs = intel_ring_begin(rq, 6);
> - if (IS_ERR(cs))
> - return PTR_ERR(cs);
> -
> - cs = gen12_emit_pipe_control(cs, bit_group_0, bit_group_1,
> -  LRC_PPHWSP_SCRATCH_ADDR);
> - intel_ring_advance(rq, cs);
> + intel_emit_pipe_control_cs(rq, bit_group_0, bit_group_1,
> +LRC_PPHWSP_SCRATCH_ADDR);
>   }
>  
>   if (mode & EMIT_INVALIDATE) {
> -- 
> 2.40.1
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


Re: [PATCH v6 2/9] drm/i915: Add the has_aux_ccs device property

2023-07-20 Thread Matt Roper
On Wed, Jul 19, 2023 at 01:07:22PM +0200, Andi Shyti wrote:
> We always assumed that a device might either have AUX or FLAT
> CCS, but this is an approximation that is not always true as it
> requires some further per device checks.
> 
> Add the "has_aux_ccs" flag to the intel_device_info structure in
> order to have a per-device flag indicating the presence of AUX CCS.

I think this flag is a bit misnamed/inaccurate at the moment.  AuxCCS in
general has been around for ages.  Bspec 14276 indicates the GT side of
the hardware has had AuxCCS since at least SNB (gen6).  You seem to just
be setting this flag on the platforms where we need to do TLB
invalidation for the AUX (gen12), which is a small subset of the
platforms that had this compression in general.

I kind of feel like the helper function approach might still be simpler
than using a device flag, but if you want to stick with the flag it's
probably best to rename it slightly so that it more accurately reflects
what we're using it for.


Matt

> 
> Signed-off-by: Andi Shyti 
> Cc: Matt Roper 
> Cc: Jonathan Cavitt 
> Cc:  # v5.8+
> ---
>  drivers/gpu/drm/i915/gt/gen8_engine_cs.c | 4 ++--
>  drivers/gpu/drm/i915/i915_drv.h  | 1 +
>  drivers/gpu/drm/i915/i915_pci.c  | 5 -
>  drivers/gpu/drm/i915/intel_device_info.h | 1 +
>  4 files changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c 
> b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
> index 563efee055602..0d4d5e0407a2d 100644
> --- a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
> +++ b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
> @@ -267,7 +267,7 @@ int gen12_emit_flush_rcs(struct i915_request *rq, u32 
> mode)
>   else if (engine->class == COMPUTE_CLASS)
>   flags &= ~PIPE_CONTROL_3D_ENGINE_FLAGS;
>  
> - if (!HAS_FLAT_CCS(rq->engine->i915))
> + if (HAS_AUX_CCS(rq->engine->i915))
>   count = 8 + 4;
>   else
>   count = 8;
> @@ -307,7 +307,7 @@ int gen12_emit_flush_xcs(struct i915_request *rq, u32 
> mode)
>   if (mode & EMIT_INVALIDATE) {
>   cmd += 2;
>  
> - if (!HAS_FLAT_CCS(rq->engine->i915) &&
> + if (HAS_AUX_CCS(rq->engine->i915) &&
>   (rq->engine->class == VIDEO_DECODE_CLASS ||
>rq->engine->class == VIDEO_ENHANCEMENT_CLASS)) {
>   aux_inv = rq->engine->mask &
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 682ef2b5c7d59..e9cc048b5727a 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -848,6 +848,7 @@ IS_SUBPLATFORM(const struct drm_i915_private *i915,
>   * stored in lmem to support the 3D and media compression formats.
>   */
>  #define HAS_FLAT_CCS(i915)   (INTEL_INFO(i915)->has_flat_ccs)
> +#define HAS_AUX_CCS(i915)(INTEL_INFO(i915)->has_aux_ccs)
>  
>  #define HAS_GT_UC(i915)  (INTEL_INFO(i915)->has_gt_uc)
>  
> diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
> index fcacdc21643cf..c9ff1d11a9fce 100644
> --- a/drivers/gpu/drm/i915/i915_pci.c
> +++ b/drivers/gpu/drm/i915/i915_pci.c
> @@ -643,7 +643,8 @@ static const struct intel_device_info jsl_info = {
>   TGL_CACHELEVEL, \
>   .has_global_mocs = 1, \
>   .has_pxp = 1, \
> - .max_pat_index = 3
> + .max_pat_index = 3, \
> + .has_aux_ccs = 1
>  
>  static const struct intel_device_info tgl_info = {
>   GEN12_FEATURES,
> @@ -775,6 +776,7 @@ static const struct intel_device_info dg2_info = {
>  
>  static const struct intel_device_info ats_m_info = {
>   DG2_FEATURES,
> + .has_aux_ccs = 1,
>   .require_force_probe = 1,
>   .tuning_thread_rr_after_dep = 1,
>  };
> @@ -827,6 +829,7 @@ static const struct intel_device_info mtl_info = {
>   .__runtime.media.ip.ver = 13,
>   PLATFORM(INTEL_METEORLAKE),
>   .extra_gt_list = xelpmp_extra_gt,
> + .has_aux_ccs = 1,
>   .has_flat_ccs = 0,
>   .has_gmd_id = 1,
>   .has_guc_deprivilege = 1,
> diff --git a/drivers/gpu/drm/i915/intel_device_info.h 
> b/drivers/gpu/drm/i915/intel_device_info.h
> index dbfe6443457b5..93485507506cc 100644
> --- a/drivers/gpu/drm/i915/intel_device_info.h
> +++ b/drivers/gpu/drm/i915/intel_device_info.h
> @@ -151,6 +151,7 @@ enum intel_ppgtt_type {
>   func(has_reset_engine); \
>   func(has_3d_pipeline); \
>   func(has_4tile); \
> + func(has_aux_ccs); \
>   func(has_flat_ccs); \
>   func(has_global_mocs); \
>   func(has_gmd_id); \
> -- 
> 2.40.1
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


Re: [PATCH v3] drm/i915: Refactor PAT/object cache handling

2023-07-19 Thread Matt Roper
> Even if the GPU access is WB, it's still non-coherent,
> thus CPU cache could be out-dated.

My point is that this is relocation code --- it should be impossible to
get here on MTL and beyond, right?  So user-provided PAT isn't a
consideration.

> 
> [snip]
> >> @@ -208,12 +230,6 @@ bool i915_gem_object_can_bypass_llc(struct 
> >> drm_i915_gem_object *obj)
> >>  if (!(obj->flags & I915_BO_ALLOC_USER))
> >>  return false;
> >>
> >> -/*
> >> - * Always flush cache for UMD objects at creation time.
> >> - */
> >> -if (obj->pat_set_by_user)
> >> -return true;
> >> -
> 
> I'm still worried that the removal of these lines would cause the
> MESA failure seen before. I know you are checking pat index below, but
> that is only about GPU access. It doesn't tell you how CPU is going to
> access the memory. If user space is setting an uncached PAT, then use
> copy engine to zero out the memory, but on the CPU side the mapping is
> cacheable, you could still be seeing garbage data.
> 
> I agree the lines don't belong here because it doesn't have anything
> to do with LLC, but they need to be moved to the right location instead
> of being removed.

These lines got replaced with a check for the specific PAT indices that
are problematic rather than just assuming any user-provided PAT might
cause problems.  But I had some concerns about the specific logic there
in my review as well.


Matt

> 
> >>  /*
> >>   * EHL and JSL add the 'Bypass LLC' MOCS entry, which should make it
> >>   * possible for userspace to bypass the GTT caching bits set by the
> >> @@ -226,7 +242,21 @@ bool i915_gem_object_can_bypass_llc(struct 
> >> drm_i915_gem_object *obj)
> >>   * it, but since i915 takes the stance of always zeroing memory before
> >>   * handing it to userspace, we need to prevent this.
> >>   */
> >> -return IS_JSL_EHL(i915);
> >> +if (IS_JSL_EHL(i915))
> >> +return true;
> >> +

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


Re: [PATCH v3] drm/i915: Refactor PAT/object cache handling

2023-07-19 Thread Matt Roper
On Wed, Jul 19, 2023 at 01:37:30PM +0100, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin 
> 
> Commit 9275277d5324 ("drm/i915: use pat_index instead of cache_level") has
> introduced PAT indices to i915 internal APIs, partially replacing the
> usage of driver internal cache_level, but has also added a few
> questionable design decisions which this patch tries to improve upon.
> 
> Principal change is to invert the per platform cache level to PAT index
> table which was added by the referenced commit, and by doing so enable
> i915 to understand the cache mode between PAT indices, changing them from
> opaque to transparent.
> 
> Once we have the inverted table we are able to remove the hidden false
> "return true" from i915_gem_object_has_cache_level.
> 
> Other changes/fixes/improvements we are able to do:
> 
> 1)
> Replace the enum i915_cache_level with i915_cache_t, composed of a more
> detailed representation of each cache mode (base mode plus flags).
> 
> For instance this way we are able to express the difference between WB and
> 1-way coherent WB on Meteorlake. Which in turn enables us to map the i915
> "cached" mode to the correct Meteorlake PAT index.
> 
> 2)
> We can cache PAT indices of the caching modes used by the driver itself in
> struct drm_i915_private, which eliminates the runtime calls to
> i915_gem_get_pat_index from both high- and low-level i915 components.
> 
> 3)
> We can also cache the caching modes used by the driver for coherent
> access and for display buffers.
> 
> 4)
> Remove the incorrect references to enum i915_cache_level from low level
> PTE encode vfuncs, since those are actually given PAT indices by their
> callers.
> 
> 5)
> Because i915 now understands PAT indices, we can remove the overly
> aggressive flushing triggered from i915_gem_object_can_bypass_llc() and
> limit it to non-coherent write-back mode only.
> 
> 6)
> Finally we are able to replace the platform dependent cache mode to string
> code in debugfs and elsewhere by the single implementation based on
> i915_cache_t.
> 
> v2:
>  * Fix PAT-to-cache-mode table for PVC. (Fei)
>  * Cache display caching mode too. (Fei)
>  * Improve and document criteria in i915_gem_object_can_bypass_llc() (Matt)
> 
> v3:
>  * Checkpatch issues.
>  * Cache mode flags check fixed.
> 
> Signed-off-by: Tvrtko Ursulin 
> Fixes: 9275277d5324 ("drm/i915: use pat_index instead of cache_level")
> Cc: Chris Wilson 
> Cc: Fei Yang 
> Cc: Andi Shyti 
> Cc: Matt Roper 
> ---
>  drivers/gpu/drm/i915/Makefile |   1 +
>  .../drm/i915/display/intel_plane_initial.c|   3 +-
>  drivers/gpu/drm/i915/gem/i915_gem_domain.c|  56 ---
>  drivers/gpu/drm/i915/gem/i915_gem_domain.h|   5 +-
>  .../gpu/drm/i915/gem/i915_gem_execbuffer.c|  13 +-
>  drivers/gpu/drm/i915/gem/i915_gem_internal.c  |   4 +-
>  drivers/gpu/drm/i915/gem/i915_gem_mman.c  |  12 +-
>  drivers/gpu/drm/i915/gem/i915_gem_object.c| 152 +++---
>  drivers/gpu/drm/i915/gem/i915_gem_object.h|  11 +-
>  .../gpu/drm/i915/gem/i915_gem_object_types.h  | 116 +
>  drivers/gpu/drm/i915/gem/i915_gem_shmem.c |   8 +-
>  drivers/gpu/drm/i915/gem/i915_gem_stolen.c|  11 +-
>  drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c  |  44 ++---
>  drivers/gpu/drm/i915/gem/i915_gem_userptr.c   |   2 +-
>  .../drm/i915/gem/selftests/huge_gem_object.c  |   4 +-
>  .../gpu/drm/i915/gem/selftests/huge_pages.c   |   6 +-
>  drivers/gpu/drm/i915/gt/gen6_ppgtt.c  |   4 +-
>  drivers/gpu/drm/i915/gt/gen8_ppgtt.c  |  19 +--
>  drivers/gpu/drm/i915/gt/intel_engine_cs.c |   2 +-
>  drivers/gpu/drm/i915/gt/intel_ggtt.c  |  33 ++--
>  drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c |   4 +-
>  drivers/gpu/drm/i915/gt/intel_gtt.c   |   2 +-
>  drivers/gpu/drm/i915/gt/intel_gtt.h   |   3 +-
>  drivers/gpu/drm/i915/gt/intel_migrate.c   |  11 +-
>  drivers/gpu/drm/i915/gt/intel_ppgtt.c |   6 +-
>  .../gpu/drm/i915/gt/intel_ring_submission.c   |   4 +-
>  drivers/gpu/drm/i915/gt/intel_timeline.c  |   2 +-
>  drivers/gpu/drm/i915/gt/selftest_hangcheck.c  |   2 +-
>  drivers/gpu/drm/i915/gt/selftest_migrate.c|   9 +-
>  drivers/gpu/drm/i915/gt/selftest_reset.c  |  14 +-
>  drivers/gpu/drm/i915/gt/selftest_tlb.c|   5 +-
>  .../gpu/drm/i915/gt/selftest_workarounds.c|   2 +-
>  drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c  |   8 +-
>  drivers/gpu/drm/i915/i915_cache.c |  91 +++
>  drivers/gpu/drm/i915/i915_cache.h |  60 +++
>  drivers/gpu/drm/i915/i915_debugfs.c   |  53 +-
>  drivers/gpu/drm/i915/i91

Re: [PATCH v4 2/6] drm/i915/gt: Ensure memory quiesced before invalidation

2023-07-18 Thread Matt Roper
On Tue, Jul 18, 2023 at 02:28:26AM +0200, Andi Shyti wrote:
> Hi Matt,
> 
> > > > > > +   /*
> > > > > > +* Aux invalidations on Aux CCS platforms require
> > > > > > +* memory traffic is quiesced prior.
> > > > > > +*/
> > > > > > +   if ((mode & EMIT_INVALIDATE) && !HAS_FLAT_CCS(engine->i915))
> > > > > 
> > > > > It's a pre-existing mistake in drm-tip at the moment, but we shouldn't
> > > > > assume !flatccs always implies auxccs.  PVC has neither, and there may
> > > > > be other similar platforms in the future.  We should probably add a
> > > > > helper function for AuxCCS, similar to what we added to the Xe driver
> > > > > recently:
> > > > > 
> > > > > > https://patchwork.freedesktop.org/patch/539304/?series=118334&rev=1
> > > 
> > > Currently that is done in patch 6...
> > 
> > Are you sure?  Patch #6 consolidates things a bit, but is still incorrectly
> > assuming flatccs = !auxccs:
> > 
> > >    if (HAS_FLAT_CCS(engine->i915))
> > >            return _MMIO(0);
> 
> But isn't it the same the patch you linked is doing?
> 
>   return !xe->info.has_flat_ccs;

No, that's just the end of the function.  The important
platform-specific checks come before that point (at the moment we only
have PVC, but we expect more platforms to be added there very soon too).
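
To illustrate the shape of such a helper, here is a minimal sketch with
stand-in types. The fake_* names are hypothetical and this is not a copy
of the Xe function; the only details taken from this thread are that PVC
has neither kind of compression and that the !has_flat_ccs test comes
last.

```c
#include <stdbool.h>

/* Hypothetical platform list, just enough for the example. */
enum fake_platform {
	FAKE_PLATFORM_OTHER,
	FAKE_PLATFORM_PVC,
};

/* Stand-in for the device info structure. */
struct fake_device {
	enum fake_platform platform;
	bool has_flat_ccs;
};

static bool fake_has_aux_ccs(const struct fake_device *dev)
{
	/* Platforms with no compression of either type are handled first. */
	if (dev->platform == FAKE_PLATFORM_PVC)
		return false;

	/* Only after those checks does "no FlatCCS" imply AuxCCS. */
	return !dev->has_flat_ccs;
}
```

New platforms that lack both kinds of compression would get their own
check ahead of the final return.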


Matt

> 
> And

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


Re: [PATCH v4 2/6] drm/i915/gt: Ensure memory quiesced before invalidation

2023-07-17 Thread Matt Roper
On Mon, Jul 17, 2023 at 11:52:25PM +0200, Andi Shyti wrote:
> Hi Matt,
> 
> On Mon, Jul 17, 2023 at 01:31:03PM -0700, Matt Roper wrote:
> > On Mon, Jul 17, 2023 at 10:54:37AM -0700, Matt Roper wrote:
> > > On Mon, Jul 17, 2023 at 07:30:55PM +0200, Andi Shyti wrote:
> > > > From: Jonathan Cavitt 
> > > > 
> > > > All memory traffic must be quiesced before requesting
> > > > an aux invalidation on platforms that use Aux CCS.
> > > > 
> > > > Fixes: 972282c4cf24 ("drm/i915/gen12: Add aux table invalidate for all 
> > > > engines")
> > > > Signed-off-by: Jonathan Cavitt 
> > > > Signed-off-by: Andi Shyti 
> > > > Cc:  # v5.8+
> > > > ---
> > > >  drivers/gpu/drm/i915/gt/gen8_engine_cs.c | 7 +++
> > > >  1 file changed, 7 insertions(+)
> > > > 
> > > > diff --git a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c 
> > > > b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
> > > > index 563efee055602..bee3b7dc595cf 100644
> > > > --- a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
> > > > +++ b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
> > > > @@ -202,6 +202,13 @@ int gen12_emit_flush_rcs(struct i915_request *rq, 
> > > > u32 mode)
> > > >  {
> > > > struct intel_engine_cs *engine = rq->engine;
> > > >  
> > > > +   /*
> > > > +* Aux invalidations on Aux CCS platforms require
> > > > +* memory traffic is quiesced prior.
> > > > +*/
> > > > +   if ((mode & EMIT_INVALIDATE) && !HAS_FLAT_CCS(engine->i915))
> > > 
> > > It's a pre-existing mistake in drm-tip at the moment, but we shouldn't
> > > assume !flatccs always implies auxccs.  PVC has neither, and there may
> > > be other similar platforms in the future.  We should probably add a
> > > helper function for AuxCCS, similar to what we added to the Xe driver
> > > recently:
> > > 
> > > > https://patchwork.freedesktop.org/patch/539304/?series=118334&rev=1
> 
> Currently that is done in patch 6...

Are you sure?  Patch #6 consolidates things a bit, but is still incorrectly
assuming flatccs = !auxccs:

    if (HAS_FLAT_CCS(engine->i915))
            return _MMIO(0);

> 
> > BTW, since this patch didn't handle it I was expecting to see another
> > patch in the series that quiesces memory for the non-RCS/CCS engines,
> > but it looks like there isn't one yet.  So we should probably add the
> > necessary MI_FLUSH_DW logic for the other engines to this patch as well.
> 
> ... where the other engines are handled as well. I left this
> patch as it is in order to preserve the authorship and its
> original form.

I don't see it being handled in patch #6.  That performs invalidation on
more engines than we were before, but it doesn't add the missing quiesce
logic as far as I can see.


Matt

> 
> Maybe in patch 6 I can add the extra check for PVC as you did for
> XE.
> 
> Thanks,
> Andi

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


Re: [PATCH v4 2/6] drm/i915/gt: Ensure memory quiesced before invalidation

2023-07-17 Thread Matt Roper
On Mon, Jul 17, 2023 at 10:54:37AM -0700, Matt Roper wrote:
> On Mon, Jul 17, 2023 at 07:30:55PM +0200, Andi Shyti wrote:
> > From: Jonathan Cavitt 
> > 
> > All memory traffic must be quiesced before requesting
> > an aux invalidation on platforms that use Aux CCS.
> > 
> > Fixes: 972282c4cf24 ("drm/i915/gen12: Add aux table invalidate for all 
> > engines")
> > Signed-off-by: Jonathan Cavitt 
> > Signed-off-by: Andi Shyti 
> > Cc:  # v5.8+
> > ---
> >  drivers/gpu/drm/i915/gt/gen8_engine_cs.c | 7 +++
> >  1 file changed, 7 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c 
> > b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
> > index 563efee055602..bee3b7dc595cf 100644
> > --- a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
> > +++ b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
> > @@ -202,6 +202,13 @@ int gen12_emit_flush_rcs(struct i915_request *rq, u32 
> > mode)
> >  {
> > struct intel_engine_cs *engine = rq->engine;
> >  
> > +   /*
> > +* Aux invalidations on Aux CCS platforms require
> > +* memory traffic is quiesced prior.
> > +*/
> > +   if ((mode & EMIT_INVALIDATE) && !HAS_FLAT_CCS(engine->i915))
> 
> It's a pre-existing mistake in drm-tip at the moment, but we shouldn't
> assume !flatccs always implies auxccs.  PVC has neither, and there may
> be other similar platforms in the future.  We should probably add a
> helper function for AuxCCS, similar to what we added to the Xe driver
> recently:
> 
> https://patchwork.freedesktop.org/patch/539304/?series=118334&rev=1
> 

BTW, since this patch didn't handle it I was expecting to see another
patch in the series that quiesces memory for the non-RCS/CCS engines,
but it looks like there isn't one yet.  So we should probably add the
necessary MI_FLUSH_DW logic for the other engines to this patch as well.


Matt

> 
> Matt
> 
> 
> > +   mode |= EMIT_FLUSH;
> > +
> > if (mode & EMIT_FLUSH) {
> > u32 flags = 0;
> > int err;
> > -- 
> > 2.40.1
> > 
> 
> -- 
> Matt Roper
> Graphics Software Engineer
> Linux GPU Platform Enablement
> Intel Corporation

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


Re: [PATCH v4 6/6] drm/i915/gt: Support aux invalidation on all engines

2023-07-17 Thread Matt Roper
}
> + if (intel_engine_has_aux_inv(rq->engine))
> + cmd += 10;
>   }
>  
>   cs = intel_ring_begin(rq, cmd);
> @@ -371,14 +395,7 @@ int gen12_emit_flush_xcs(struct i915_request *rq, u32 
> mode)
>   *cs++ = 0; /* upper addr */
>   *cs++ = 0; /* value */
>  
> - if (aux_inv) { /* hsdes: 1809175790 */
> - if (rq->engine->class == VIDEO_DECODE_CLASS)
> - cs = gen12_emit_aux_table_inv(rq->engine->gt,
> -   cs, GEN12_VD0_AUX_INV);
> - else
> - cs = gen12_emit_aux_table_inv(rq->engine->gt,
> -   cs, GEN12_VE0_AUX_INV);
> - }
> + cs = intel_emit_aux_table_inv(rq->engine, cs);
>  
>   if (mode & EMIT_INVALIDATE)
>   *cs++ = preparser_disable(false);
> diff --git a/drivers/gpu/drm/i915/gt/gen8_engine_cs.h 
> b/drivers/gpu/drm/i915/gt/gen8_engine_cs.h
> index 655e5c00ddc27..d938c94524510 100644
> --- a/drivers/gpu/drm/i915/gt/gen8_engine_cs.h
> +++ b/drivers/gpu/drm/i915/gt/gen8_engine_cs.h
> @@ -13,6 +13,7 @@
>  #include "intel_gt_regs.h"
>  #include "intel_gpu_commands.h"
>  
> +struct intel_engine_cs;
>  struct intel_gt;
>  struct i915_request;
>  
> @@ -46,7 +47,7 @@ u32 *gen8_emit_fini_breadcrumb_rcs(struct i915_request *rq, 
> u32 *cs);
>  u32 *gen11_emit_fini_breadcrumb_rcs(struct i915_request *rq, u32 *cs);
>  u32 *gen12_emit_fini_breadcrumb_rcs(struct i915_request *rq, u32 *cs);
>  
> -u32 *gen12_emit_aux_table_inv(struct intel_gt *gt, u32 *cs, const i915_reg_t 
> inv_reg);
> +u32 *intel_emit_aux_table_inv(struct intel_engine_cs *engine, u32 *cs);
>  
>  static inline u32 *
>  __gen8_emit_pipe_control(u32 *batch, u32 flags0, u32 flags1, u32 offset)
> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c 
> b/drivers/gpu/drm/i915/gt/intel_lrc.c
> index 235f3fab60a98..70054767c88c3 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> @@ -1371,10 +1371,7 @@ gen12_emit_indirect_ctx_rcs(const struct intel_context 
> *ce, u32 *cs)
>   IS_DG2_G11(ce->engine->i915))
>   cs = gen8_emit_pipe_control(cs, 
> PIPE_CONTROL_INSTRUCTION_CACHE_INVALIDATE, 0);
>  
> - /* hsdes: 1809175790 */
> - if (!HAS_FLAT_CCS(ce->engine->i915))
> - cs = gen12_emit_aux_table_inv(ce->engine->gt,
> -   cs, GEN12_CCS_AUX_INV);
> + cs = intel_emit_aux_table_inv(ce->engine, cs);
>  
>   /* Wa_16014892111 */
>   if (IS_MTL_GRAPHICS_STEP(ce->engine->i915, M, STEP_A0, STEP_B0) ||
> @@ -1399,17 +1396,7 @@ gen12_emit_indirect_ctx_xcs(const struct intel_context 
> *ce, u32 *cs)
>   
> PIPE_CONTROL_INSTRUCTION_CACHE_INVALIDATE,
>   0);
>  
> - /* hsdes: 1809175790 */
> - if (!HAS_FLAT_CCS(ce->engine->i915)) {
> - if (ce->engine->class == VIDEO_DECODE_CLASS)
> - cs = gen12_emit_aux_table_inv(ce->engine->gt,
> -   cs, GEN12_VD0_AUX_INV);
> - else if (ce->engine->class == VIDEO_ENHANCEMENT_CLASS)
> - cs = gen12_emit_aux_table_inv(ce->engine->gt,
> -   cs, GEN12_VE0_AUX_INV);
> - }
> -
> - return cs;
> + return intel_emit_aux_table_inv(ce->engine, cs);
>  }
>  
>  static void
> -- 
> 2.40.1
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


Re: [PATCH v4 5/6] drm/i915/gt: Poll aux invalidation register bit on invalidation

2023-07-17 Thread Matt Roper
On Mon, Jul 17, 2023 at 07:30:58PM +0200, Andi Shyti wrote:
> From: Jonathan Cavitt 
> 
> For platforms that use Aux CCS, wait for aux invalidation to
> complete by checking the aux invalidation register bit is
> cleared.
> 
> Fixes: 972282c4cf24 ("drm/i915/gen12: Add aux table invalidate for all 
> engines")
> Signed-off-by: Jonathan Cavitt 
> Signed-off-by: Andi Shyti 
> Cc:  # v5.8+
> Reviewed-by: Nirmoy Das 
> ---
>  drivers/gpu/drm/i915/gt/gen8_engine_cs.c | 17 +
>  drivers/gpu/drm/i915/gt/intel_gpu_commands.h |  1 +
>  2 files changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c 
> b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
> index aa2fb9d72745a..fbc70f3b7f2fd 100644
> --- a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
> +++ b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
> @@ -174,6 +174,16 @@ u32 *gen12_emit_aux_table_inv(struct intel_gt *gt, u32 
> *cs, const i915_reg_t inv
>   *cs++ = AUX_INV;
>   *cs++ = MI_NOOP;

We only need qword alignment for sequences of commands, not each
individual command, right?  So technically we could drop this noop...

>  
> + *cs++ = MI_SEMAPHORE_WAIT_TOKEN |
> + MI_SEMAPHORE_REGISTER_POLL |
> + MI_SEMAPHORE_POLL |
> + MI_SEMAPHORE_SAD_EQ_SDD;
> + *cs++ = 0;
> + *cs++ = i915_mmio_reg_offset(inv_reg) + gsi_offset;
> + *cs++ = 0;
> + *cs++ = 0;
> + *cs++ = MI_NOOP;

...and then we wouldn't need an extra one here.

If we drop the pair of noops, that would also change the # of dwords
farther down too.
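
The dword arithmetic, as a quick sketch. The counts are read off the
quoted hunks (three dwords for the register write, five for the
semaphore wait, one MI_NOOP after each); the names are made up for the
example.

```c
#include <stdbool.h>

#define LRI_DWORDS	3	/* load-register header, register offset, AUX_INV */
#define SEM_WAIT_DWORDS	5	/* semaphore wait header plus four operand dwords */

/* Dwords emitted by the aux invalidation sequence. */
static unsigned int aux_inv_dwords(bool pad_each_command)
{
	unsigned int n = LRI_DWORDS + SEM_WAIT_DWORDS;

	if (pad_each_command)
		n += 2;		/* one MI_NOOP after each command */

	return n;
}

/* A sequence is qword aligned when its dword count is even. */
static bool is_qword_aligned(unsigned int dwords)
{
	return dwords % 2 == 0;
}
```

With the padding the sequence is 10 dwords, matching the "+= 10" in the
callers; without it the sequence is 8 dwords and still qword aligned, so
the callers would add 8 instead.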

> +
>   return cs;
>  }
>  
> @@ -284,10 +294,9 @@ int gen12_emit_flush_rcs(struct i915_request *rq, u32 
> mode)
>   else if (engine->class == COMPUTE_CLASS)
>   flags &= ~PIPE_CONTROL_3D_ENGINE_FLAGS;
>  
> + count = 8;
>   if (!HAS_FLAT_CCS(rq->engine->i915))

As noted on the earlier patch, we should probably make this check that
the platform actually has AuxCCS.  

Anyway, up to you whether you want to make that change or not.  The
extra noops don't actually hurt anything.

Reviewed-by: Matt Roper 

> - count = 8 + 4;
> - else
> - count = 8;
> + count += 10;
>  
>   cs = intel_ring_begin(rq, count);
>   if (IS_ERR(cs))
> @@ -330,7 +339,7 @@ int gen12_emit_flush_xcs(struct i915_request *rq, u32 
> mode)
>   aux_inv = rq->engine->mask &
>   ~GENMASK(_BCS(I915_MAX_BCS - 1), BCS0);
>   if (aux_inv)
> - cmd += 4;
> + cmd += 10;
>   }
>   }
>  
> diff --git a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h 
> b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
> index 5df7cce23197c..2bd8d98d21102 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
> @@ -121,6 +121,7 @@
>  #define   MI_SEMAPHORE_TARGET(engine)  ((engine)<<15)
>  #define MI_SEMAPHORE_WAIT  MI_INSTR(0x1c, 2) /* GEN8+ */
>  #define MI_SEMAPHORE_WAIT_TOKEN  MI_INSTR(0x1c, 3) /* GEN12+ */
> +#define   MI_SEMAPHORE_REGISTER_POLL (1 << 16)
>  #define   MI_SEMAPHORE_POLL  (1 << 15)
>  #define   MI_SEMAPHORE_SAD_GT_SDD  (0 << 12)
>  #define   MI_SEMAPHORE_SAD_GTE_SDD   (1 << 12)
> -- 
> 2.40.1
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


Re: [PATCH v4 4/6] drm/i915/gt: Enable the CCS_FLUSH bit in the pipe control

2023-07-17 Thread Matt Roper
On Mon, Jul 17, 2023 at 07:30:57PM +0200, Andi Shyti wrote:
> Enable the CCS_FLUSH bit 13 in the pipe control for render and
> compute engines in platforms starting from Meteor Lake (BSPEC
> 43904 and 47112).
> 
> Fixes: 972282c4cf24 ("drm/i915/gen12: Add aux table invalidate for all 
> engines")
> Signed-off-by: Andi Shyti 
> Cc: Jonathan Cavitt 
> Cc: Nirmoy Das 
> Cc:  # v5.8+
> ---
>  drivers/gpu/drm/i915/gt/gen8_engine_cs.c | 10 +-
>  drivers/gpu/drm/i915/gt/intel_engine_types.h |  1 +
>  drivers/gpu/drm/i915/gt/intel_gpu_commands.h |  1 +
>  3 files changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c 
> b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
> index 3c935d6b68bf0..aa2fb9d72745a 100644
> --- a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
> +++ b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
> @@ -207,7 +207,7 @@ int gen12_emit_flush_rcs(struct i915_request *rq, u32 
> mode)
>* memory traffic is quiesced prior.
>*/
>   if ((mode & EMIT_INVALIDATE) && !HAS_FLAT_CCS(engine->i915))
> - mode |= EMIT_FLUSH;
> + mode |= EMIT_FLUSH | EMIT_CCS_FLUSH;

Do we even really need the extra EMIT_* flag?  It seems like just doing
the CCS flush on graphics 12.70 and beyond would probably be fine since
EMIT_FLUSH is only used in two places on those platforms:  an
EMIT_BARRIER in intel_engine_emit_ctx_wa (which happens during device
init, before we've had an opportunity to use CCS for anything) and the
new flush we're adding here in aux invalidation.  All other uses of
EMIT_FLUSH in the driver are specific to non-GuC execlist submission or
to the old ringbuffer-based submission on pre-gen8 platforms.

Anyway, adding the extra condition shouldn't really hurt anything
either, so up to you whether you want to drop it or not.

Reviewed-by: Matt Roper 

>  
>   if (mode & EMIT_FLUSH) {
>   u32 bit_group_0 = 0;
> @@ -221,6 +221,14 @@ int gen12_emit_flush_rcs(struct i915_request *rq, u32 
> mode)
>  
>   bit_group_0 |= PIPE_CONTROL0_HDC_PIPELINE_FLUSH;
>  
> + /*
> +  * When required, in MTL+ platforms we need to
> +  * set the CCS_FLUSH bit in the pipe control
> +  */
> + if (mode & EMIT_CCS_FLUSH &&
> + GRAPHICS_VER_FULL(rq->i915) >= IP_VER(12, 70))
> + bit_group_0 |= PIPE_CONTROL_CCS_FLUSH;
> +
>   bit_group_1 |= PIPE_CONTROL_TILE_CACHE_FLUSH;
>   bit_group_1 |= PIPE_CONTROL_FLUSH_L3;
>   bit_group_1 |= PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH;
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h 
> b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> index e99a6fa03d453..e2cae9d02bd62 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> @@ -514,6 +514,7 @@ struct intel_engine_cs {
>   int (*emit_flush)(struct i915_request *request, u32 mode);
>  #define EMIT_INVALIDATE  BIT(0)
>  #define EMIT_FLUSH   BIT(1)
> +#define EMIT_CCS_FLUSH   BIT(2) /* MTL+ */
>  #define EMIT_BARRIER (EMIT_INVALIDATE | EMIT_FLUSH)
>   int (*emit_bb_start)(struct i915_request *rq,
>u64 offset, u32 length,
> diff --git a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h 
> b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
> index 5d143e2a8db03..5df7cce23197c 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
> @@ -299,6 +299,7 @@
>  #define   PIPE_CONTROL_QW_WRITE  (1<<14)
>  #define   PIPE_CONTROL_POST_SYNC_OP_MASK  (3<<14)
>  #define   PIPE_CONTROL_DEPTH_STALL   (1<<13)
> +#define   PIPE_CONTROL_CCS_FLUSH     (1<<13) /* MTL+ */
>  #define   PIPE_CONTROL_WRITE_FLUSH   (1<<12)
>  #define   PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH (1<<12) /* gen6+ */
>  #define   PIPE_CONTROL_INSTRUCTION_CACHE_INVALIDATE  (1<<11) /* MBZ on ILK */
> -- 
> 2.40.1
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


Re: [PATCH v4 3/6] drm/i915/gt: Rename flags with bit_group_X according to the datasheet

2023-07-17 Thread Matt Roper
On Mon, Jul 17, 2023 at 07:30:56PM +0200, Andi Shyti wrote:
> In preparation of the next patch allign with the datasheet (BSPEC

s/allign/align/

Otherwise,

Reviewed-by: Matt Roper 

> 47112) with the naming of the pipe control set of flag values.
> The variable "flags" in gen12_emit_flush_rcs() is applied as a
> set of flags called Bit Group 1.
> 
> Define also the Bit Group 0 as bit_group_0 where currently only
> PIPE_CONTROL0_HDC_PIPELINE_FLUSH bit is set.
> 
> Signed-off-by: Andi Shyti 
> Cc:  # v5.8+
> ---
>  drivers/gpu/drm/i915/gt/gen8_engine_cs.c | 34 +---
>  1 file changed, 18 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c 
> b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
> index bee3b7dc595cf..3c935d6b68bf0 100644
> --- a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
> +++ b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
> @@ -210,7 +210,8 @@ int gen12_emit_flush_rcs(struct i915_request *rq, u32 
> mode)
>   mode |= EMIT_FLUSH;
>  
>   if (mode & EMIT_FLUSH) {
> - u32 flags = 0;
> + u32 bit_group_0 = 0;
> + u32 bit_group_1 = 0;
>   int err;
>   u32 *cs;
>  
> @@ -218,32 +219,33 @@ int gen12_emit_flush_rcs(struct i915_request *rq, u32 
> mode)
>   if (err)
>   return err;
>  
> - flags |= PIPE_CONTROL_TILE_CACHE_FLUSH;
> - flags |= PIPE_CONTROL_FLUSH_L3;
> - flags |= PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH;
> - flags |= PIPE_CONTROL_DEPTH_CACHE_FLUSH;
> + bit_group_0 |= PIPE_CONTROL0_HDC_PIPELINE_FLUSH;
> +
> + bit_group_1 |= PIPE_CONTROL_TILE_CACHE_FLUSH;
> + bit_group_1 |= PIPE_CONTROL_FLUSH_L3;
> + bit_group_1 |= PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH;
> + bit_group_1 |= PIPE_CONTROL_DEPTH_CACHE_FLUSH;
>   /* Wa_1409600907:tgl,adl-p */
> - flags |= PIPE_CONTROL_DEPTH_STALL;
> - flags |= PIPE_CONTROL_DC_FLUSH_ENABLE;
> - flags |= PIPE_CONTROL_FLUSH_ENABLE;
> + bit_group_1 |= PIPE_CONTROL_DEPTH_STALL;
> + bit_group_1 |= PIPE_CONTROL_DC_FLUSH_ENABLE;
> + bit_group_1 |= PIPE_CONTROL_FLUSH_ENABLE;
>  
> - flags |= PIPE_CONTROL_STORE_DATA_INDEX;
> - flags |= PIPE_CONTROL_QW_WRITE;
> + bit_group_1 |= PIPE_CONTROL_STORE_DATA_INDEX;
> + bit_group_1 |= PIPE_CONTROL_QW_WRITE;
>  
> - flags |= PIPE_CONTROL_CS_STALL;
> + bit_group_1 |= PIPE_CONTROL_CS_STALL;
>  
>   if (!HAS_3D_PIPELINE(engine->i915))
> - flags &= ~PIPE_CONTROL_3D_ARCH_FLAGS;
> + bit_group_1 &= ~PIPE_CONTROL_3D_ARCH_FLAGS;
>   else if (engine->class == COMPUTE_CLASS)
> - flags &= ~PIPE_CONTROL_3D_ENGINE_FLAGS;
> + bit_group_1 &= ~PIPE_CONTROL_3D_ENGINE_FLAGS;
>  
>   cs = intel_ring_begin(rq, 6);
>   if (IS_ERR(cs))
>   return PTR_ERR(cs);
>  
> - cs = gen12_emit_pipe_control(cs,
> -  PIPE_CONTROL0_HDC_PIPELINE_FLUSH,
> -  flags, LRC_PPHWSP_SCRATCH_ADDR);
> + cs = gen12_emit_pipe_control(cs, bit_group_0, bit_group_1,
> +  LRC_PPHWSP_SCRATCH_ADDR);
>   intel_ring_advance(rq, cs);
>   }
>  
> -- 
> 2.40.1
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


Re: [PATCH v4 2/6] drm/i915/gt: Ensure memory quiesced before invalidation

2023-07-17 Thread Matt Roper
On Mon, Jul 17, 2023 at 07:30:55PM +0200, Andi Shyti wrote:
> From: Jonathan Cavitt 
> 
> All memory traffic must be quiesced before requesting
> an aux invalidation on platforms that use Aux CCS.
> 
> Fixes: 972282c4cf24 ("drm/i915/gen12: Add aux table invalidate for all 
> engines")
> Signed-off-by: Jonathan Cavitt 
> Signed-off-by: Andi Shyti 
> Cc:  # v5.8+
> ---
>  drivers/gpu/drm/i915/gt/gen8_engine_cs.c | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c 
> b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
> index 563efee055602..bee3b7dc595cf 100644
> --- a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
> +++ b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
> @@ -202,6 +202,13 @@ int gen12_emit_flush_rcs(struct i915_request *rq, u32 
> mode)
>  {
>   struct intel_engine_cs *engine = rq->engine;
>  
> + /*
> +  * Aux invalidations on Aux CCS platforms require
> +  * memory traffic is quiesced prior.
> +  */
> + if ((mode & EMIT_INVALIDATE) && !HAS_FLAT_CCS(engine->i915))

It's a pre-existing mistake in drm-tip at the moment, but we shouldn't
assume !flatccs always implies auxccs.  PVC has neither, and there may
be other similar platforms in the future.  We should probably add a
helper function for AuxCCS, similar to what we added to the Xe driver
recently:

https://patchwork.freedesktop.org/patch/539304/?series=118334=1
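
As a rough illustration of why a dedicated AuxCCS check differs from
!HAS_FLAT_CCS, here is a minimal standalone sketch. Everything in it is
hypothetical (the enum, the struct and the helper names are not i915 or
Xe definitions); it only models the three cases described above.

```c
#include <stdbool.h>

/* Hypothetical stand-in for per-platform compression info; not an i915 type. */
enum ccs_mode {
	CCS_NONE,	/* neither flat CCS nor AuxCCS (PVC, per the note above) */
	CCS_AUX,	/* aux-table based CCS */
	CCS_FLAT,	/* flat CCS */
};

struct platform_model {
	enum ccs_mode ccs;
};

static inline bool model_has_flat_ccs(const struct platform_model *p)
{
	return p->ccs == CCS_FLAT;
}

/* The explicit check: true only when the platform really uses AuxCCS. */
static inline bool model_has_aux_ccs(const struct platform_model *p)
{
	return p->ccs == CCS_AUX;
}

/* Quiescing before invalidation is only relevant on AuxCCS platforms. */
static inline bool model_needs_quiesce(const struct platform_model *p,
				       bool invalidate)
{
	return invalidate && model_has_aux_ccs(p);
}
```

In this model a CCS_NONE platform fails both checks, so testing
!model_has_flat_ccs() alone would wrongly treat it as an AuxCCS platform.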


Matt


> + mode |= EMIT_FLUSH;
> +
>   if (mode & EMIT_FLUSH) {
>   u32 flags = 0;
>   int err;
> -- 
> 2.40.1
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


Re: [RFC 2/2] drm/i915: Remove PAT hack from i915_gem_object_can_bypass_llc

2023-07-14 Thread Matt Roper
On Fri, Jul 14, 2023 at 11:11:30AM +0100, Tvrtko Ursulin wrote:
> 
> On 14/07/2023 06:43, Yang, Fei wrote:
> > > From: Tvrtko Ursulin 
> > > 
> > > According to the comment in i915_gem_object_can_bypass_llc the
> > > purpose of the function is to return false if the platform/object
> > > has a caching mode where GPU can bypass the LLC.
> > > 
> > > So far the only platforms which allegedly can do this are Jasperlake
> > > and Elkhartlake, and that via MOCS (not PAT).
> > > 
> > > > Instead of blindly assuming that objects where userspace has set the
> > > > PAT index can (bypass the LLC), the question is whether there is such a
> > > > PAT index on a platform. Probably starting with Meteorlake since that one
> > > > is the only one where set PAT extension can be currently used. Or if
> > > > there is a MOCS entry which can achieve the same thing on Meteorlake.
> > > 
> > > If there is such PAT, now that i915 can be made to understand them
> > > better, we can make the check more fine grained. Or if there is a MOCS
> > > entry then we probably should apply the blanket IS_METEORLAKE condition.
> > > 
> > > Signed-off-by: Tvrtko Ursulin 
> > > Fixes: 9275277d5324 ("drm/i915: use pat_index instead of cache_level")
> > > Cc: Chris Wilson 
> > > Cc: Fei Yang 
> > > Cc: Andi Shyti 
> > > Cc: Matt Roper 
> > > ---
> > >   drivers/gpu/drm/i915/gem/i915_gem_object.c | 6 --
> > >   1 file changed, 6 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c 
> > > b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> > > index 33a1e97d18b3..1e34171c4162 100644
> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> > > @@ -229,12 +229,6 @@ bool i915_gem_object_can_bypass_llc(struct 
> > > drm_i915_gem_object *obj)
> > >if (!(obj->flags & I915_BO_ALLOC_USER))
> > >return false;
> > > 
> > > - /*
> > > -  * Always flush cache for UMD objects at creation time.
> > > -  */
> > > - if (obj->pat_set_by_user)
> > 
> > I'm afraid this is going to break MESA. Can we run MESA tests with this 
> > patch?
> 
> I can't, but question is why it would break Mesa which would need a nice
> comment here?
> 
> For instance should the check be IS_METEORLAKE?
> 
> Or should it be "is wb" && "not has 1-way coherent"?
> 
> Or both?
> 
> Or, given how Meteorlake does not have LLC, how can anything bypass it
> there? Or is it about snooping on Meteorlake and how?

I think the "LLC" in the function name is a bit misleading since this is
really all just about the ability to avoid coherency (which might come
from an LLC on some platforms or from snooping on others).

The concern is that the CPU writes to the buffer and those writes sit in
a CPU cache without making it to RAM immediately.  If the GPU then
reads the object with any of the non-coherent PAT settings that were
introduced in Xe_LPG, it will not snoop the CPU cache and will read old,
stale data from RAM.

So I think we'd want a condition like ("Xe_LPG or later" && "any non
coherent PAT").  The WB/WT/UC status of the GPU behavior shouldn't
matter here, just the coherency setting.
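
A minimal sketch of that condition, as a standalone model rather than
driver code. The struct, its two fields and the function name are all
hypothetical; how the driver would actually derive "coherent" from a PAT
index is not shown here.

```c
#include <stdbool.h>

/* Hypothetical model: only the two inputs named above are kept. */
struct bo_model {
	bool xe_lpg_or_later;	/* platform is Xe_LPG or newer */
	bool pat_is_coherent;	/* selected PAT index snoops the CPU cache */
};

/*
 * True when CPU writes still sitting in a CPU cache could be missed by
 * the GPU, i.e. when the object needs a flush. The WB/WT/UC setting is
 * deliberately not an input.
 */
static inline bool gpu_may_read_stale(const struct bo_model *bo)
{
	return bo->xe_lpg_or_later && !bo->pat_is_coherent;
}
```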


Matt

> 
> Regards,
> 
> Tvrtko
> 
> > 
> > >/*
> > > * EHL and JSL add the 'Bypass LLC' MOCS entry, which should make 
> > > it
> > > * possible for userspace to bypass the GTT caching bits set by the
> > > --
> > > 2.39.2

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


Re: [PATCH v2 2/6] drm/i915/gt: Clear all bits from GEN12_FF_MODE2

2023-06-25 Thread Matt Roper
On Sat, Jun 24, 2023 at 10:17:53AM -0700, Lucas De Marchi wrote:
> Right now context workarounds don't do a rmw and instead only write to
> the register. Since 2 separate programmings to the same register are
> coalesced into a single write, this is not problematic for
> GEN12_FF_MODE2 since both TDS and GS timer are going to be written
> together and the other remaining bits be zeroed.
> 
> However in order to fix other workarounds that may want to preserve the
> unrelated bits in the same register, context workarounds need to
> be changed to a rmw. To prepare for that, move the programming of
> GEN12_FF_MODE2 to a single place so the value passed for "clear" can
> be all the bits. Otherwise the second workaround would be dropped as
> it'd be detected as overwriting a previously programmed workaround.
> 
> Signed-off-by: Lucas De Marchi 

Reviewed-by: Matt Roper 
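
For reference, the effect of an all-ones clear mask in a read-modify-write
can be sketched as below. This is a generic illustration with a
hypothetical helper name, not the wa_add() implementation.

```c
#include <stdint.h>

/* Generic read-modify-write: clear the bits in clr, then set the bits in set. */
static inline uint32_t rmw_value(uint32_t old, uint32_t clr, uint32_t set)
{
	return (old & ~clr) | set;
}
```

With clr = ~0 the result is always just "set", whatever the readback
was, which is the point when the read value can't be trusted.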

> ---
>  drivers/gpu/drm/i915/gt/intel_workarounds.c | 51 +++--
>  1 file changed, 17 insertions(+), 34 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c 
> b/drivers/gpu/drm/i915/gt/intel_workarounds.c
> index 8f8346df3c18..7d48bd57b6ef 100644
> --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
> +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
> @@ -693,40 +693,11 @@ static void dg2_ctx_gt_tuning_init(struct 
> intel_engine_cs *engine,
>  0, false);
>  }
>  
> -/*
> - * These settings aren't actually workarounds, but general tuning settings 
> that
> - * need to be programmed on several platforms.
> - */
> -static void gen12_ctx_gt_tuning_init(struct intel_engine_cs *engine,
> -  struct i915_wa_list *wal)
> -{
> - /*
> -  * Although some platforms refer to it as Wa_1604555607, we need to
> -  * program it even on those that don't explicitly list that
> -  * workaround.
> -  *
> -  * Note that the programming of this register is further modified
> -  * according to the FF_MODE2 guidance given by Wa_1608008084:gen12.
> -  * Wa_1608008084 tells us the FF_MODE2 register will return the wrong
> -  * value when read. The default value for this register is zero for all
> -  * fields and there are no bit masks. So instead of doing a RMW we
> -  * should just write TDS timer value. For the same reason read
> -  * verification is ignored.
> -  */
> - wa_add(wal,
> -GEN12_FF_MODE2,
> -FF_MODE2_TDS_TIMER_MASK,
> -FF_MODE2_TDS_TIMER_128,
> -0, false);
> -}
> -
>  static void gen12_ctx_workarounds_init(struct intel_engine_cs *engine,
>  struct i915_wa_list *wal)
>  {
>   struct drm_i915_private *i915 = engine->i915;
>  
> - gen12_ctx_gt_tuning_init(engine, wal);
> -
>   /*
>* Wa_1409142259:tgl,dg1,adl-p
>* Wa_1409347922:tgl,dg1,adl-p
> @@ -748,15 +719,27 @@ static void gen12_ctx_workarounds_init(struct 
> intel_engine_cs *engine,
>   GEN9_PREEMPT_GPGPU_THREAD_GROUP_LEVEL);
>  
>   /*
> -  * Wa_16011163337
> +  * Wa_16011163337 - GS_TIMER
> +  *
> +  * TDS_TIMER: Although some platforms refer to it as Wa_1604555607, we
> +  * need to program it even on those that don't explicitly list that
> +  * workaround.
> +  *
> +  * Note that the programming of GEN12_FF_MODE2 is further modified
> +  * according to the FF_MODE2 guidance given by Wa_1608008084.
> +  * Wa_1608008084 tells us the FF_MODE2 register will return the wrong
> +  * value when read from the CPU.
>*
> -  * Like in gen12_ctx_gt_tuning_init(), read verification is ignored due
> -  * to Wa_1608008084.
> +  * The default value for this register is zero for all fields.
> +  * So instead of doing a RMW we should just write the desired values
> +  * for TDS and GS timers. Note that since the readback can't be trusted,
> +  * the clear mask is just set to ~0 to make sure other bits are not
> +  * inadvertently set. For the same reason read verification is ignored.
>*/
>   wa_add(wal,
>  GEN12_FF_MODE2,
> -FF_MODE2_GS_TIMER_MASK,
> -FF_MODE2_GS_TIMER_224,
> +~0,
> +FF_MODE2_TDS_TIMER_128 | FF_MODE2_GS_TIMER_224,
>  0, false);
>  
>   if (!IS_DG1(i915)) {
> -- 
> 2.40.1
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


Re: [PATCH 2/3] drm/i915/gt: Fix context workarounds with non-masked regs

2023-06-23 Thread Matt Roper
On Fri, Jun 23, 2023 at 02:05:20PM -0700, Lucas De Marchi wrote:
> On Fri, Jun 23, 2023 at 12:48:13PM -0700, Kenneth Graunke wrote:
> > On Friday, June 23, 2023 8:49:05 AM PDT Lucas De Marchi wrote:
> > > On Thu, Jun 22, 2023 at 04:37:21PM -0700, Kenneth Graunke wrote:
> > > >On Thursday, June 22, 2023 11:27:30 AM PDT Lucas De Marchi wrote:
> > > > >> Most of the context workarounds tweak masked registers, but not all.
> > > > >> For masked registers, when writing the value it's sufficient to just
> > > > >> write the wa->set_bits since that will take care of both the clr and
> > > > >> set bits as well as not overwriting other bits.
> > > > >>
> > > > >> However, for some workarounds the registers are non-masked. Up
> > > > >> until now the driver was simply emitting a MI_LOAD_REGISTER_IMM with
> > > > >> the set_bits to program the register via the GPU in the WA bb. This has
> > > > >> the side effect of overwriting the content of the register outside of
> > > > >> bits that should be set and also doesn't handle the bits that should be
> > > > >> cleared.
> > > >> Kenneth reported that on DG2, mesa was seeing a weird behavior due to
> > > >> the kernel programming of L3SQCREG5 in dg2_ctx_gt_tuning_init(). With
> > > >> the GPU idle, that register could be read via intel_reg as 0x00e001ff,
> > > >> but during a 3D workload it would change to 0x007f. So the
> > > >> programming of that tuning was affecting more than the bits in
> > > >> L3_PWM_TIMER_INIT_VAL_MASK. Matt Roper noticed the lack of rmw for the
> > > >> context workarounds due to the use of MI_LOAD_REGISTER_IMM.
> > > >>
> > > > >> So, for registers that are not masked, read its value via mmio, modify
> > > > >> and then set it in the buffer to be written by the GPU. This should
> > > > >> take care in a simple way of programming just the bits required by the
> > > > >> tuning/workaround. If in the future there are registers involved that
> > > > >> can't be read by the CPU, a more complex approach may be required like
> > > > >> a) issuing additional instructions to read and modify; or b) scan the
> > > > >> golden context and patch it in place before saving it; or something
> > > > >> else. But for now this should suffice.
> > > >>
> > > >> Scanning the context workarounds for all platforms, these are the
> > > >> impacted ones with the respective registers
> > > >>
> > > >>mtl: DRAW_WATERMARK
> > > >>mtl/dg2: XEHP_L3SQCREG5, XEHP_FF_MODE2
> > > >>gen12: GEN12_FF_MODE2
> > > >
> > > >Speaking of GEN12_FF_MODE2...there's a big scary comment above that
> > > >workaround write which says that register "will return the wrong value
> > > >when read."  I think with this patch, we'll start doing a RMW cycle for
> > > >the register, which could mix in some of this "wrong value".  The
> > > >comment mentions that the intention is to write the whole register,
> > > >as the default value is 0 for all fields.
> > > 
> > > Good point. That also means we don't need to backport this patch to
> > > > stable kernel to any gen12, since overwriting the other bits is
> > > actually the intended behavior.
> > > 
> > > >
> > > >Maybe what we want to do is change gen12_ctx_gt_tuning_init to do
> > > >
> > > >wa_write(wal, GEN12_FF_MODE2, FF_MODE2_TDS_TIMER_128);
> > > >
> > > >so it has a clear mask of ~0 instead of FF_MODE2_TDS_TIMER_MASK, and
> > > 
> > > In order to ignore read back when verifying, we would still need to use
> > > wa_add(), but changing the mask. We don't have a wa_write() that ends up
> > > with { .clr = ~0, .read_mask = 0 }.
> > > 
> > >   wa_add(wal,
> > >  GEN12_FF_MODE2,
> > >  ~0, FF_MODE2_TDS_TIMER_128,
> > >  0, false);
> > 
> > Good point!  Though, I just noticed another bug here:
> > 
> > gen12_ctx_workarounds_init sets FF_MODE2_GS_TIMER_224 to avoid hangs
> > in the HS/DS unit, after gen12_ctx_gt_tuning_init set TDS_TIMER_128
> > for performance.  One of those is going to clobber the other; we're
> > likely losing the TDS tuning today.  Combi

Re: [Intel-xe] [RFC PATCH 1/1] drm/xe: Introduce function pointers for MMIO functions

2023-06-15 Thread Matt Roper
On Thu, Jun 15, 2023 at 04:04:18PM +0300, Oded Gabbay wrote:
> On Thu, Jun 15, 2023 at 3:01 AM Matt Roper  wrote:
> >
> > On Mon, Jun 12, 2023 at 06:31:57PM +0200, Francois Dugast wrote:
> > > On Thu, Jun 08, 2023 at 10:35:29AM -0700, Lucas De Marchi wrote:
> > > > On Fri, Jun 02, 2023 at 02:25:01PM +, Francois Dugast wrote:
> > > > > A local structure of function pointers is used as a minimal hardware
> > > > > abstraction layer to prepare for platform independent MMIO calls.
> > > > >
> > > > > Cc: Oded Gabbay 
> > > > > Cc: Ofir Bitton 
> > > > > Cc: Ohad Sharabi 
> > > > > Signed-off-by: Francois Dugast 
> > > > > ---
> > > > > drivers/gpu/drm/xe/xe_device_types.h |  3 ++
> > > > > drivers/gpu/drm/xe/xe_mmio.c | 81 
> > > > > drivers/gpu/drm/xe/xe_mmio.h | 35 ++--
> > > > > 3 files changed, 99 insertions(+), 20 deletions(-)
> > > > >
> > > > > diff --git a/drivers/gpu/drm/xe/xe_device_types.h 
> > > > > b/drivers/gpu/drm/xe/xe_device_types.h
> > > > > index 17b6b1cc5adb..3f8fd0d8129b 100644
> > > > > --- a/drivers/gpu/drm/xe/xe_device_types.h
> > > > > +++ b/drivers/gpu/drm/xe/xe_device_types.h
> > > > > @@ -378,6 +378,9 @@ struct xe_device {
> > > > >   /** @d3cold_allowed: Indicates if d3cold is a valid device state */
> > > > >   bool d3cold_allowed;
> > > > >
> > > > > + /** @mmio_funcs: function pointers for MMIO related functions */
> > > > > + const struct xe_mmio_funcs *mmio_funcs;
> > > > > +
> > > > >   /* private: */
> > > > >
> > > > > #if IS_ENABLED(CONFIG_DRM_XE_DISPLAY)
> > > > > diff --git a/drivers/gpu/drm/xe/xe_mmio.c 
> > > > > b/drivers/gpu/drm/xe/xe_mmio.c
> > > > > index 475b14fe4356..f3d08676a77a 100644
> > > > > --- a/drivers/gpu/drm/xe/xe_mmio.c
> > > > > +++ b/drivers/gpu/drm/xe/xe_mmio.c
> > > > > @@ -25,6 +25,62 @@
> > > > >
> > > > > #define BAR_SIZE_SHIFT 20
> > > > >
> > > > > +static void xe_mmio_write32_device(struct xe_gt *gt,
> > > > > +struct xe_reg reg, u32 val);
> > > > > +static u32 xe_mmio_read32_device(struct xe_gt *gt, struct xe_reg 
> > > > > reg);
> > > > > +static void xe_mmio_write64_device(struct xe_gt *gt,
> > > > > +struct xe_reg reg, u64 val);
> > > > > +static u64 xe_mmio_read64_device(struct xe_gt *gt, struct xe_reg 
> > > > > reg);
> > > > > +
> > > > > +static const struct xe_mmio_funcs xe_mmio_funcs_device = {
> > > > > + .write32 = xe_mmio_write32_device,
> > > > > + .read32 = xe_mmio_read32_device,
> > > > > + .write64 = xe_mmio_write64_device,
> > > > > + .read64 = xe_mmio_read64_device,
> > > > > +};
> > > > > +
> > > > > +static inline void xe_mmio_write32_device(struct xe_gt *gt,
> > > > > +struct xe_reg reg, u32 val)
> > > > > +{
> > > > > + struct xe_tile *tile = gt_to_tile(gt);
> > > > > +
> > > > > + if (reg.addr < gt->mmio.adj_limit)
> > > > > + reg.addr += gt->mmio.adj_offset;
> > > > > +
> > > > > + writel(val, tile->mmio.regs + reg.addr);
> > > > > +}
> > > > > +
> > > > > +static inline u32 xe_mmio_read32_device(struct xe_gt *gt, struct 
> > > > > xe_reg reg)
> > > > > +{
> > > > > + struct xe_tile *tile = gt_to_tile(gt);
> > > > > +
> > > > > + if (reg.addr < gt->mmio.adj_limit)
> > > > > + reg.addr += gt->mmio.adj_offset;
> > > > > +
> > > > > + return readl(tile->mmio.regs + reg.addr);
> > > > > +}
> > > > > +
> > > > > +static inline void xe_mmio_write64_device(struct xe_gt *gt,
> > > > > +struct xe_reg reg, u64 val)
> > > > > +{
> > > > > + struct xe_tile *tile = gt_to_tile(gt);
> > > > > +
> > > > > + if (reg.addr < gt->mmio.adj_limit)
> > > > > + reg.addr += gt-

Re: [PATCH v5 2/3] drm/i915: use pat_index instead of cache_level

2023-05-08 Thread Matt Roper
On Sun, May 07, 2023 at 11:39:18PM -0700, Yang, Fei wrote:
>> On Wed, May 03, 2023 at 03:50:59PM -0700, fei.y...@intel.com wrote:
>>> From: Fei Yang 
>>>
>>> Currently the KMD is using enum i915_cache_level to set caching policy for
>>> buffer objects. This is flaky because the PAT index which really controls
>>> the caching behavior in PTE has far more levels than what's defined in the
>>> enum. In addition, the PAT index is platform dependent, having to translate
>>> between i915_cache_level and PAT index is not reliable, and makes the code
>>> more complicated.
>>>
>>> From UMD's perspective there is also a necessity to set caching policy for
>>> performance fine tuning. It's much easier for the UMD to directly use PAT
>>> index because the behavior of each PAT index is clearly defined in Bspec.
>>> Having the abstracted i915_cache_level sitting in between would only cause
>>> more ambiguity.
>>
>> It may be worth mentioning here that PAT is expected to work much like
>> MOCS already works today --- the contract on the exact platform-specific
>> meaning of each index is documented in the hardware spec and userspace
>> is expected to select the index that exactly matches the behavior it
>> desires.
>will update.
>>>
>>> For these reasons this patch replaces i915_cache_level with PAT index. Also
>>> note, the cache_level is not completely removed yet, because the KMD still
>>> has the need of creating buffer objects with simple cache settings such as
>>> cached, uncached, or writethrough. For such simple cases, using cache_level
>>> would help simplify the code.
>>
>> See my comment farther down, but the implementation of
>> i915_gem_object_has_cache_level() seems a bit confusing and you may want
>> to elaborate on it here.
>>
>> Also somewhat confusing from a high-level skim is if/how we're still
>> using obj->cache_coherent once userspace has taken direct control of the
>> cache behavior.  Some PAT indices give coherency and others don't (and
>> the combinations will likely get more complicated on future platforms).
>> If obj->cache_coherent is still being considered even once PAT indices
>> are being controlled by userspace, I think we need some explanation of
>> how that works in the commit message (and likely in the kerneldoc for
>> that field too).
>For the objects with pat_index set by user space, coherency is managed by
>user space. Things like obj->cache_coherent and obj->cache_dirty are still
>there for objects created by kernel.

Right, that's the challenge --- userspace is taking over control of this
stuff, but the fields are still around and still used internally within
the driver.  How we reconcile those two things needs to be clearly
explained in the commit message and kerneldoc, otherwise we're going to
wind up with confusing code that's very difficult to maintain down the
road.

All the cache behavior is complicated enough that it probably wouldn't
be a bad idea to have a dedicated section of the kerneldoc focused on
cache behavior.

>>>
>>> Cc: Chris Wilson 
>>> Cc: Matt Roper 
>>> Signed-off-by: Fei Yang 
>>> Reviewed-by: Andi Shyti 
>>> ---
>>>  drivers/gpu/drm/i915/display/intel_dpt.c      | 12 +--
>>>  drivers/gpu/drm/i915/gem/i915_gem_domain.c    | 45 ++
>>>  .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 10 ++-
>>>  drivers/gpu/drm/i915/gem/i915_gem_mman.c      |  3 +-
>>>  drivers/gpu/drm/i915/gem/i915_gem_object.c    | 51 +++-
>>>  drivers/gpu/drm/i915/gem/i915_gem_object.h    |  4 +
>>>  .../gpu/drm/i915/gem/i915_gem_object_types.h  | 25 +-
>>>  drivers/gpu/drm/i915/gem/i915_gem_stolen.c    |  4 +-
>>>  drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c  | 16 ++--
>>>  .../gpu/drm/i915/gem/selftests/huge_pages.c   |  2 +-
>>>  .../drm/i915/gem/selftests/i915_gem_migrate.c |  2 +-
>>>  .../drm/i915/gem/selftests/i915_gem_mman.c    |  2 +-
>>>  drivers/gpu/drm/i915/gt/gen6_ppgtt.c          | 10 ++-
>>>  drivers/gpu/drm/i915/gt/gen8_ppgtt.c          | 71 
>>>  drivers/gpu/drm/i915/gt/gen8_ppgtt.h          |  3 +-
>>>  drivers/gpu/drm/i915/gt/intel_ggtt.c          | 82 +

Re: [PATCH v5 3/3] drm/i915: make sure correct pte encode is used

2023-05-04 Thread Matt Roper
On Wed, May 03, 2023 at 03:51:00PM -0700, fei.y...@intel.com wrote:
> From: Fei Yang 
> 
> PTE encode is platform dependent. After replacing cache_level with
> pat_index, the newly introduced mtl_pte_encode is actually generic
> for all gen12 platforms, thus rename it to gen12_pte_encode and
> apply it to all gen12 platforms.
> 
> Cc: Chris Wilson 
> Cc: Matt Roper 
> Signed-off-by: Fei Yang 
> Reviewed-by: Andi Shyti 

Bspec: 63019
Reviewed-by: Matt Roper 

I think it's important to include the bspec reference here since we have
so much trouble finding proper documentation on PTE formats (other pages
like 45030, 45040, etc. never got updated in a sensible way).


Matt

> ---
>  drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c 
> b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> index f2334a713c4e..d1e3d3b90e95 100644
> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> @@ -55,9 +55,9 @@ static u64 gen8_pte_encode(dma_addr_t addr,
>   return pte;
>  }
>  
> -static u64 mtl_pte_encode(dma_addr_t addr,
> -   unsigned int pat_index,
> -   u32 flags)
> +static u64 gen12_pte_encode(dma_addr_t addr,
> + unsigned int pat_index,
> + u32 flags)
>  {
>   gen8_pte_t pte = addr | GEN8_PAGE_PRESENT | GEN8_PAGE_RW;
>  
> @@ -995,8 +995,8 @@ struct i915_ppgtt *gen8_ppgtt_create(struct intel_gt *gt,
>*/
>   ppgtt->vm.alloc_scratch_dma = alloc_pt_dma;
>  
> - if (GRAPHICS_VER_FULL(gt->i915) >= IP_VER(12, 70))
> - ppgtt->vm.pte_encode = mtl_pte_encode;
> + if (GRAPHICS_VER(gt->i915) >= 12)
> + ppgtt->vm.pte_encode = gen12_pte_encode;
>   else
>   ppgtt->vm.pte_encode = gen8_pte_encode;
>  
> -- 
> 2.25.1
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


Re: [PATCH v5 2/3] drm/i915: use pat_index instead of cache_level

2023-05-04 Thread Matt Roper
On Wed, May 03, 2023 at 03:50:59PM -0700, fei.y...@intel.com wrote:
> From: Fei Yang 
> 
> Currently the KMD is using enum i915_cache_level to set caching policy for
> buffer objects. This is flaky because the PAT index which really controls
> the caching behavior in PTE has far more levels than what's defined in the
> enum. In addition, the PAT index is platform dependent, having to translate
> between i915_cache_level and PAT index is not reliable, and makes the code
> more complicated.
> 
> From UMD's perspective there is also a necessity to set caching policy for
> performance fine tuning. It's much easier for the UMD to directly use PAT
> index because the behavior of each PAT index is clearly defined in Bspec.
> Having the abstracted i915_cache_level sitting in between would only cause
> more ambiguity.

It may be worth mentioning here that PAT is expected to work much like
MOCS already works today --- the contract on the exact platform-specific
meaning of each index is documented in the hardware spec and userspace
is expected to select the index that exactly matches the behavior it
desires.

> 
> For these reasons this patch replaces i915_cache_level with PAT index. Also
> note, the cache_level is not completely removed yet, because the KMD still
> has the need of creating buffer objects with simple cache settings such as
> cached, uncached, or writethrough. For such simple cases, using cache_level
> would help simplify the code.

See my comment farther down, but the implementation of
i915_gem_object_has_cache_level() seems a bit confusing and you may want
to elaborate on it here.

Also somewhat confusing from a high-level skim is if/how we're still
using obj->cache_coherent once userspace has taken direct control of the
cache behavior.  Some PAT indices give coherency and others don't (and
the combinations will likely get more complicated on future platforms).
If obj->cache_coherent is still being considered even once PAT indices
are being controlled by userspace, I think we need some explanation of
how that works in the commit message (and likely in the kerneldoc for
that field too).

> 
> Cc: Chris Wilson 
> Cc: Matt Roper 
> Signed-off-by: Fei Yang 
> Reviewed-by: Andi Shyti 
> ---
>  drivers/gpu/drm/i915/display/intel_dpt.c  | 12 +--
>  drivers/gpu/drm/i915/gem/i915_gem_domain.c| 45 ++
>  .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 10 ++-
>  drivers/gpu/drm/i915/gem/i915_gem_mman.c  |  3 +-
>  drivers/gpu/drm/i915/gem/i915_gem_object.c| 51 +++-
>  drivers/gpu/drm/i915/gem/i915_gem_object.h|  4 +
>  .../gpu/drm/i915/gem/i915_gem_object_types.h  | 25 +-
>  drivers/gpu/drm/i915/gem/i915_gem_stolen.c|  4 +-
>  drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c  | 16 ++--
>  .../gpu/drm/i915/gem/selftests/huge_pages.c   |  2 +-
>  .../drm/i915/gem/selftests/i915_gem_migrate.c |  2 +-
>  .../drm/i915/gem/selftests/i915_gem_mman.c|  2 +-
>  drivers/gpu/drm/i915/gt/gen6_ppgtt.c  | 10 ++-
>  drivers/gpu/drm/i915/gt/gen8_ppgtt.c  | 71 
>  drivers/gpu/drm/i915/gt/gen8_ppgtt.h  |  3 +-
>  drivers/gpu/drm/i915/gt/intel_ggtt.c  | 82 +--
>  drivers/gpu/drm/i915/gt/intel_gtt.h   | 20 ++---
>  drivers/gpu/drm/i915/gt/intel_migrate.c   | 47 ++-
>  drivers/gpu/drm/i915/gt/intel_migrate.h   | 13 ++-
>  drivers/gpu/drm/i915/gt/intel_ppgtt.c |  6 +-
>  drivers/gpu/drm/i915/gt/selftest_migrate.c| 47 ++-
>  drivers/gpu/drm/i915/gt/selftest_reset.c  |  8 +-
>  drivers/gpu/drm/i915/gt/selftest_timeline.c   |  2 +-
>  drivers/gpu/drm/i915/gt/selftest_tlb.c|  4 +-
>  drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c  | 10 ++-
>  drivers/gpu/drm/i915/i915_debugfs.c   | 52 +---
>  drivers/gpu/drm/i915/i915_gem.c   | 16 +++-
>  drivers/gpu/drm/i915/i915_gpu_error.c |  8 +-
>  drivers/gpu/drm/i915/i915_vma.c   | 16 ++--
>  drivers/gpu/drm/i915/i915_vma.h   |  2 +-
>  drivers/gpu/drm/i915/i915_vma_types.h |  2 -
>  drivers/gpu/drm/i915/selftests/i915_gem.c |  5 +-
>  .../gpu/drm/i915/selftests/i915_gem_evict.c   |  4 +-
>  drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 15 ++--
>  .../drm/i915/selftests/intel_memory_region.c  |  4 +-
>  drivers/gpu/drm/i915/selftests/mock_gtt.c |  8 +-
>  36 files changed, 391 insertions(+), 240 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/display/intel_dpt.c 
> b/drivers/gpu/drm/i915/display/intel_dpt.c
> index c5eacfdba1a5..7c5fddb203ba 100644
> --- a/drivers/gpu/drm/i915/display/intel_dpt.c
> +++ b/drivers/gpu/drm/i915/display/intel_dpt.c
> @@ -43,24 +43,24 @@ static void gen8_set_pte(void __iomem *addr, gen8_

Re: [Intel-gfx] [PATCH 3/8] drm/i915/mtl: Add PTE encode function

2023-04-24 Thread Matt Roper
On Sun, Apr 23, 2023 at 12:37:27AM -0700, Yang, Fei wrote:
> > On Fri, Apr 21, 2023 at 10:27:22AM -0700, Yang, Fei wrote:
> >>> On Wed, Apr 19, 2023 at 04:00:53PM -0700, fei.y...@intel.com wrote:
> >>>> From: Fei Yang 
> >>>>
> >>>> PTE encode functions are platform dependent. This patch implements
> >>>> PTE functions for MTL, and ensures the correct PTE encode function
> >>>> is used by calling pte_encode function pointer instead of the
> >>>> hardcoded gen8 version of PTE encode.
> >>>>
> >>>> Signed-off-by: Fei Yang 
> >>>> Reviewed-by: Andrzej Hajda 
> >>>> Reviewed-by: Andi Shyti 
> >>>> Acked-by: Nirmoy Das 
> >>>
> >>> Bspec: 45015, 45040
> >>>
> >>>> ---
> >>>>  drivers/gpu/drm/i915/display/intel_dpt.c |  2 +-
> >>>>  drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 45 
> >>>>  drivers/gpu/drm/i915/gt/intel_ggtt.c | 36 +--
> >>>>  3 files changed, 72 insertions(+), 11 deletions(-)
> >>>>
> >>>> diff --git a/drivers/gpu/drm/i915/display/intel_dpt.c
> >>b/drivers/gpu/drm/i915/display/intel_dpt.c
> >>>> index b8027392144d..c5eacfdba1a5 100644
> >>>> --- a/drivers/gpu/drm/i915/display/intel_dpt.c
> >>>> +++ b/drivers/gpu/drm/i915/display/intel_dpt.c
> >>>> @@ -300,7 +300,7 @@ intel_dpt_create(struct intel_framebuffer *fb)
> >>>>vm->vma_ops.bind_vma= dpt_bind_vma;
> >>>>vm->vma_ops.unbind_vma  = dpt_unbind_vma;
> >>>>
> >>>> - vm->pte_encode = gen8_ggtt_pte_encode;
> >>>> + vm->pte_encode = vm->gt->ggtt->vm.pte_encode;
> >>>>
> >>>>dpt->obj = dpt_obj;
> >>>>dpt->obj->is_dpt = true;
> >>>> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> >>>>  b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> >>>> index 4daaa6f55668..11b91e0453c8 100644
> >>>> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> >>>> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> >>>> @@ -55,6 +55,34 @@ static u64 gen8_pte_encode(dma_addr_t addr,
> >>>>return pte;
> >>>>  }
> >>>>
> >>>> +static u64 mtl_pte_encode(dma_addr_t addr,
> >>>> +   enum i915_cache_level level,
> >>>> +   u32 flags)
> >>>> +{
> >>>> + gen8_pte_t pte = addr | GEN8_PAGE_PRESENT | GEN8_PAGE_RW;
> >>>> +
> >>>> + if (unlikely(flags & PTE_READ_ONLY))
> >>>> + pte &= ~GEN8_PAGE_RW;
> >>>> +
> >>>> + if (flags & PTE_LM)
> >>>> + pte |= GEN12_PPGTT_PTE_LM | GEN12_PPGTT_PTE_NC;
> >>>
> >>> GEN12_PPGTT_PTE_NC got defined in the previous patch as BIT(5).  But
> >>> according to bspec 45040, bit 5 is ignored in the PTE encoding.  What is
> >>> this trying to do?
> >>
> >> This takes effect only for PTE_LM, doesn't affect MTL.
> >> PTE_NC is needed for PVC (use of access counter).
> >> I believe this function was written based on the one for PVC. And this
> >> function did get extended to cover all gen12 in a later patch.
> >
> > Even though MTL doesn't have local memory, PTE_LM is supposed to be
> > used on MTL for access to BAR2 stolen memory.
> 
> You were right, but I still think this code is fine because this bit is
> ignored for MTL anyway and it is needed for other platforms with LMEM.
> Otherwise this code would have some sort of platform checking which is
> hard to do because we don't have platform info here.
> Or we would have to define another PTE encode function for platforms
> needing PTE_NC just for this one difference, then manage the function
> pointer correctly.

MTL is the only platform that uses this function right now:

   +   if (GRAPHICS_VER_FULL(gt->i915) >= IP_VER(12, 70))
   +   ppgtt->vm.pte_encode = mtl_pte_encode;
   +   else
   +   ppgtt->vm.pte_encode = gen8_pte_encode;

If this is intended for PVC, then you have it in the wrong function to
begin with (and it also shouldn't be in a patch labelled "mtl").  If
you're trying to future-proof for some post-MTL discrete platform, then
such code should be saved until we enable that platform so that it can
be properly reviewed.
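
To make the version check concrete, here is a standalone sketch (not
driver code).  It assumes IP_VER() packs the value as (ver << 8 | rel)
and that the graphics IP versions are 12.55 for DG2, 12.60 for PVC and
12.70 for MTL; the enum and select_pte_encoder() are hypothetical names
used only for illustration.

```c
#include <stdint.h>

/* Assumed to mirror the driver's packing: major version in the high byte. */
#define IP_VER(ver, rel) ((ver) << 8 | (rel))

/* Hypothetical stand-ins for the two encoders named in the patch. */
enum pte_encoder { ENCODER_GEN8, ENCODER_MTL };

/* Same comparison as the gen8_ppgtt_create() hunk quoted above. */
static enum pte_encoder select_pte_encoder(uint32_t graphics_ver_full)
{
	if (graphics_ver_full >= IP_VER(12, 70))
		return ENCODER_MTL;

	return ENCODER_GEN8;
}
```

Under those assumptions both DG2 (12.55) and PVC (12.60) end up with the
gen8 encoder, so a PVC-only bit set inside mtl_pte_encode() would never
take effect on PVC.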


Matt

> 
> -Fei
> 
> > Matt
> >
> >> -Fei
> >>> Matt
> >>>
> >>>> +
> >>>> + switch (level) {
> >>>> + case I915_CACHE_NONE:
> >>>> + pte |= GEN12_PPGTT_PTE_PAT1;
> >>>> + break;

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


Re: [PATCH v4 8/8] drm/i915: Allow user to set cache at BO creation

2023-04-21 Thread Matt Roper
On Fri, Apr 21, 2023 at 10:38:01AM -0700, fei.y...@intel.com wrote:
> From: Fei Yang 
> 
> To comply with the design that buffer objects shall have immutable
> cache setting throughout their life cycle, {set, get}_caching ioctl's
> are no longer supported from MTL onward. With that change caching
> policy can only be set at object creation time. The current code
> applies a default (platform dependent) cache setting for all objects.
> However this is not optimal for performance tuning. The patch extends
> the existing gem_create uAPI to let user set PAT index for the object
> at creation time.
> The new extension is platform independent, so UMD's can switch to using
> this extension for older platforms as well, while {set, get}_caching are
> still supported on these legacy platforms for compatibility reasons.
> 
> Cc: Chris Wilson 
> Cc: Matt Roper 
> Cc: Andi Shyti 
> Signed-off-by: Fei Yang 
> Reviewed-by: Andi Shyti 

This still needs links to the opensource userspace pull requests
(which must be fully reviewed, approved, and ready to merge by those
projects before we can apply the kernel changes), Cc's for the relevant
userspace developers (we need their ack on this as well), and a
Testcase: trailer indicating what IGT test(s) cover this new uapi.


Matt

> ---
>  drivers/gpu/drm/i915/gem/i915_gem_create.c | 36 ++
>  drivers/gpu/drm/i915/gem/i915_gem_object.c |  6 
>  include/uapi/drm/i915_drm.h| 36 ++
>  tools/include/uapi/drm/i915_drm.h  | 36 ++
>  4 files changed, 114 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_create.c
> index bfe1dbda4cb7..723c3ddd6c74 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
> @@ -245,6 +245,7 @@ struct create_ext {
>   unsigned int n_placements;
>   unsigned int placement_mask;
>   unsigned long flags;
> + unsigned int pat_index;
>  };
>  
>  static void repr_placements(char *buf, size_t size,
> @@ -394,11 +395,39 @@ static int ext_set_protected(struct i915_user_extension 
> __user *base, void *data
>   return 0;
>  }
>  
> +static int ext_set_pat(struct i915_user_extension __user *base, void *data)
> +{
> + struct create_ext *ext_data = data;
> + struct drm_i915_private *i915 = ext_data->i915;
> + struct drm_i915_gem_create_ext_set_pat ext;
> + unsigned int max_pat_index;
> +
> + BUILD_BUG_ON(sizeof(struct drm_i915_gem_create_ext_set_pat) !=
> +  offsetofend(struct drm_i915_gem_create_ext_set_pat, rsvd));
> +
> + if (copy_from_user(&ext, base, sizeof(ext)))
> + return -EFAULT;
> +
> + max_pat_index = INTEL_INFO(i915)->max_pat_index;
> +
> + if (ext.pat_index > max_pat_index) {
> + drm_dbg(&i915->drm, "PAT index is invalid: %u\n",
> + ext.pat_index);
> + return -EINVAL;
> + }
> +
> + ext_data->pat_index = ext.pat_index;
> +
> + return 0;
> +}
> +
>  static const i915_user_extension_fn create_extensions[] = {
>   [I915_GEM_CREATE_EXT_MEMORY_REGIONS] = ext_set_placements,
>   [I915_GEM_CREATE_EXT_PROTECTED_CONTENT] = ext_set_protected,
> + [I915_GEM_CREATE_EXT_SET_PAT] = ext_set_pat,
>  };
>  
> +#define PAT_INDEX_NOT_SET0x
>  /**
>   * i915_gem_create_ext_ioctl - Creates a new mm object and returns a handle 
> to it.
>   * @dev: drm device pointer
> @@ -418,6 +447,7 @@ i915_gem_create_ext_ioctl(struct drm_device *dev, void 
> *data,
>   if (args->flags & ~I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS)
>   return -EINVAL;
>  
> + ext_data.pat_index = PAT_INDEX_NOT_SET;
>   ret = i915_user_extensions(u64_to_user_ptr(args->extensions),
>  create_extensions,
>  ARRAY_SIZE(create_extensions),
> @@ -454,5 +484,11 @@ i915_gem_create_ext_ioctl(struct drm_device *dev, void 
> *data,
>   if (IS_ERR(obj))
>   return PTR_ERR(obj);
>  
> + if (ext_data.pat_index != PAT_INDEX_NOT_SET) {
> + i915_gem_object_set_pat_index(obj, ext_data.pat_index);
> + /* Mark pat_index is set by UMD */
> + obj->cache_level = I915_CACHE_INVAL;
> + }
> +
>   return i915_gem_publish(obj, file, &args->size, &args->handle);
>  }
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> index 27c948350b5b..61651f7e5806 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
> +++ b/drivers/gpu/drm/

Re: [PATCH v4 2/8] drm/i915/mtl: fix mocs selftest

2023-04-21 Thread Matt Roper
On Fri, Apr 21, 2023 at 10:37:55AM -0700, fei.y...@intel.com wrote:
> From: Fei Yang 
> 
> Media GT has a different base for MOCS register, need to apply
> gsi_offset to the mmio address if not using the intel_uncore_r/w
> functions for register access.
> 
> Cc: Matt Roper 
> Signed-off-by: Fei Yang 

Reviewed-by: Matt Roper 

> ---
>  drivers/gpu/drm/i915/gt/selftest_mocs.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/selftest_mocs.c 
> b/drivers/gpu/drm/i915/gt/selftest_mocs.c
> index ca009a6a13bd..a8446ab82501 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_mocs.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_mocs.c
> @@ -131,13 +131,14 @@ static int read_mocs_table(struct i915_request *rq,
>  const struct drm_i915_mocs_table *table,
>  u32 *offset)
>  {
> + struct intel_gt *gt = rq->engine->gt;
>   u32 addr;
>  
>   if (!table)
>   return 0;
>  
>   if (HAS_GLOBAL_MOCS_REGISTERS(rq->engine->i915))
> - addr = global_mocs_offset();
> + addr = global_mocs_offset() + gt->uncore->gsi_offset;
>   else
>   addr = mocs_offset(rq->engine);
>  
> -- 
> 2.25.1
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


Re: [Intel-gfx] [PATCH 3/8] drm/i915/mtl: Add PTE encode function

2023-04-21 Thread Matt Roper
On Fri, Apr 21, 2023 at 10:27:22AM -0700, Yang, Fei wrote:
>> On Wed, Apr 19, 2023 at 04:00:53PM -0700, fei.y...@intel.com wrote:
>>> From: Fei Yang 
>>>
>>> PTE encode functions are platform dependent. This patch implements
>>> PTE functions for MTL, and ensures the correct PTE encode function
>>> is used by calling pte_encode function pointer instead of the
>>> hardcoded gen8 version of PTE encode.
>>>
>>> Signed-off-by: Fei Yang 
>>> Reviewed-by: Andrzej Hajda 
>>> Reviewed-by: Andi Shyti 
>>> Acked-by: Nirmoy Das 
>>
>> Bspec: 45015, 45040
>>
>>> ---
>>>  drivers/gpu/drm/i915/display/intel_dpt.c |  2 +-
>>>  drivers/gpu/drm/i915/gt/gen8_ppgtt.c     | 45 
>>>  drivers/gpu/drm/i915/gt/intel_ggtt.c     | 36 +--
>>>  3 files changed, 72 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/display/intel_dpt.c
>b/drivers/gpu/drm/i915/display/intel_dpt.c
>>> index b8027392144d..c5eacfdba1a5 100644
>>> --- a/drivers/gpu/drm/i915/display/intel_dpt.c
>>> +++ b/drivers/gpu/drm/i915/display/intel_dpt.c
>>> @@ -300,7 +300,7 @@ intel_dpt_create(struct intel_framebuffer *fb)
>>>        vm->vma_ops.bind_vma    = dpt_bind_vma;
>>>        vm->vma_ops.unbind_vma  = dpt_unbind_vma;
>>>
>>> -     vm->pte_encode = gen8_ggtt_pte_encode;
>>> +     vm->pte_encode = vm->gt->ggtt->vm.pte_encode;
>>>
>>>        dpt->obj = dpt_obj;
>>>        dpt->obj->is_dpt = true;
>>> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>>> index 4daaa6f55668..11b91e0453c8 100644
>>> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>>> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>>> @@ -55,6 +55,34 @@ static u64 gen8_pte_encode(dma_addr_t addr,
>>>        return pte;
>>>  }
>>>
>>> +static u64 mtl_pte_encode(dma_addr_t addr,
>>> +                       enum i915_cache_level level,
>>> +                       u32 flags)
>>> +{
>>> +     gen8_pte_t pte = addr | GEN8_PAGE_PRESENT | GEN8_PAGE_RW;
>>> +
>>> +     if (unlikely(flags & PTE_READ_ONLY))
>>> +             pte &= ~GEN8_PAGE_RW;
>>> +
>>> +     if (flags & PTE_LM)
>>> +             pte |= GEN12_PPGTT_PTE_LM | GEN12_PPGTT_PTE_NC;
>>
>> GEN12_PPGTT_PTE_NC got defined in the previous patch as BIT(5).  But
>> according to bspec 45040, bit 5 is ignored in the PTE encoding.  What is
>> this trying to do?
>This takes effect only for PTE_LM, doesn't affect MTL.
>PTE_NC is needed for PVC (use of access counter).
>I believe this function was written based on the one for PVC. And this
>function
>did get extended to cover all gen12 in a later patch.

Even though MTL doesn't have local memory, PTE_LM is supposed to be used
on MTL for access to BAR2 stolen memory.


Matt

>-Fei
>> Matt
>>
>>> +
>>> +     switch (level) {
>>> +     case I915_CACHE_NONE:
>>> +             pte |= GEN12_PPGTT_PTE_PAT1;
>>> +             break;

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


Re: [Intel-gfx] [PATCH] drm/i915/mtl: workaround coherency issue for Media

2023-04-20 Thread Matt Roper
side
> +  */
> + if (IS_METEORLAKE(gt->i915))
> + i915_gem_object_set_cache_coherency(obj, I915_CACHE_NONE);
> +
>   vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
>   if (IS_ERR(vma))
>   goto err;
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c 
> b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index 1803a633ed64..99a0a89091e7 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -902,6 +902,12 @@ static int ct_read(struct intel_guc_ct *ct, struct 
> ct_incoming_msg **msg)
>   /* now update descriptor */
>   WRITE_ONCE(desc->head, head);
>  
> + /*
> +  * Wa_22016122933: Making sure the head update is
> +  * visible to GuC right away
> +  */
> + intel_guc_write_barrier(ct_to_guc(ct));
> +
>   return available - len;
>  
>  corrupted:
> -- 
> 2.39.0
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


Re: [Intel-gfx] [PATCH] drm/i915/mtl: Define MOCS and PAT tables for MTL

2023-04-20 Thread Matt Roper
On Thu, Apr 20, 2023 at 11:13:52PM +0200, Andi Shyti wrote:
> From: Madhumitha Tolakanahalli Pradeep 
> 
> 
> On MTL, GT can no longer allocate on LLC - only the CPU can.
> This, along with addition of support for L4 cache calls for
> a MOCS/PAT table update.
> Also the PAT index registers are multicasted for primary GT,
> and there is an address jump from index 7 to 8. This patch
> makes sure that these registers are programmed in the proper
> way.
> 
> BSpec: 44509, 45101, 44235
> 
> Cc: Matt Roper 
> Cc: Lucas De Marchi 
> Signed-off-by: Madhumitha Tolakanahalli Pradeep 
> 
> Signed-off-by: Aravind Iddamsetty 
> Signed-off-by: Nirmoy Das 
> Signed-off-by: Fei Yang 
> Reviewed-by: Andrzej Hajda 
> Reviewed-by: Nirmoy Das 
> Reviewed-by: Andi Shyti 
> Signed-off-by: Andi Shyti 
> ---
> Hi,
> 
> just extracting this patch from Fei's series.

I just posted some feedback on Fei's series about an hour ago:

https://lore.kernel.org/intel-gfx/20230420202904.gy4085...@mdroper-desk1.amr.corp.intel.com/

Basically there's extra stuff in this patch that doesn't relate to the
primary topic of defining the MOCS and PAT tables for MTL.  E.g., the
PTE bits aren't used in this patch and should be moved to the following
patch that deals with page table encoding (and at least one of those
bits likely isn't correct from what I see in the bspec).

Also the GSI changes at the bottom seem to be trying to work around a
shortcoming of the selftest; it would likely be better to handle that in
the selftest itself (which can probably be a separate patch).


Matt

> 
> Andi
> 
>  drivers/gpu/drm/i915/gt/intel_gt_regs.h |  6 +-
>  drivers/gpu/drm/i915/gt/intel_gtt.c | 47 ++-
>  drivers/gpu/drm/i915/gt/intel_gtt.h | 20 ++-
>  drivers/gpu/drm/i915/gt/intel_mocs.c| 76 +++--
>  drivers/gpu/drm/i915/gt/selftest_mocs.c |  2 +-
>  5 files changed, 143 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h 
> b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> index fd1f9cd35e9d7..e8c3b762a92a3 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> @@ -356,7 +356,11 @@
>  #define GEN7_TLB_RD_ADDR _MMIO(0x4700)
>  
>  #define GEN12_PAT_INDEX(index)   _MMIO(0x4800 + (index) * 4)
> -#define XEHP_PAT_INDEX(index)    MCR_REG(0x4800 + (index) * 4)
> +#define _PAT_INDEX(index)        _PICK_EVEN_2RANGES(index, 8, \
> +                                 0x4800, 0x4804, \
> +                                 0x4848, 0x484c)
> +#define XEHP_PAT_INDEX(index)    MCR_REG(_PAT_INDEX(index))
> +#define XELPMP_PAT_INDEX(index)  _MMIO(_PAT_INDEX(index))
>  
>  #define XEHP_TILE0_ADDR_RANGE MCR_REG(0x4900)
>  #define   XEHP_TILE_LMEM_RANGE_SHIFT 8
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c 
> b/drivers/gpu/drm/i915/gt/intel_gtt.c
> index 4f436ba7a3c83..2f6a9be0ffe61 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
> @@ -468,6 +468,44 @@ void gtt_write_workarounds(struct intel_gt *gt)
>   }
>  }
>  
> +static void xelpmp_setup_private_ppat(struct intel_uncore *uncore)
> +{
> + intel_uncore_write(uncore, XELPMP_PAT_INDEX(0),
> +MTL_PPAT_L4_0_WB);
> + intel_uncore_write(uncore, XELPMP_PAT_INDEX(1),
> +MTL_PPAT_L4_1_WT);
> + intel_uncore_write(uncore, XELPMP_PAT_INDEX(2),
> +MTL_PPAT_L4_3_UC);
> + intel_uncore_write(uncore, XELPMP_PAT_INDEX(3),
> +MTL_PPAT_L4_0_WB | MTL_2_COH_1W);
> + intel_uncore_write(uncore, XELPMP_PAT_INDEX(4),
> +MTL_PPAT_L4_0_WB | MTL_3_COH_2W);
> +
> + /*
> +  * Remaining PAT entries are left at the hardware-default
> +  * fully-cached setting
> +  */
> +}
> +
> +static void xelpg_setup_private_ppat(struct intel_gt *gt)
> +{
> + intel_gt_mcr_multicast_write(gt, XEHP_PAT_INDEX(0),
> +  MTL_PPAT_L4_0_WB);
> + intel_gt_mcr_multicast_write(gt, XEHP_PAT_INDEX(1),
> +  MTL_PPAT_L4_1_WT);
> + intel_gt_mcr_multicast_write(gt, XEHP_PAT_INDEX(2),
> +  MTL_PPAT_L4_3_UC);
> + intel_gt_mcr_multicast_write(gt, XEHP_PAT_INDEX(3),
> +  MTL_PPAT_L4_0_WB | MTL_2_COH_1W);
> + intel_gt_mcr_multicast_write(gt, XEHP_PAT_INDEX(4),
> +  MT

Re: [Intel-gfx] [PATCH 6/8] drm/i915: preparation for using PAT index

2023-04-20 Thread Matt Roper
On Wed, Apr 19, 2023 at 04:00:56PM -0700, fei.y...@intel.com wrote:
> From: Fei Yang 
> 
> This patch is a preparation for replacing enum i915_cache_level with PAT
> index. Caching policy for buffer objects is set through the PAT index in
> PTE, the old i915_cache_level is not sufficient to represent all caching
> modes supported by the hardware.
> 
> Preparing the transition by adding some platform dependent data structures
> and helper functions to translate the cache_level to pat_index.
> 
> cachelevel_to_pat: a platform dependent array mapping cache_level to
>pat_index.
> 
> max_pat_index: the maximum PAT index supported by the hardware. Needed for
>validating the PAT index passed in from user space.

The description here doesn't quite match how it's being used.  For
platforms like MTL, the hardware supports PAT indices 0-15.  The bspec
only gives us values to program for the first 5 of those entries and we
leave the rest at their hardware default (fully cached).  In the code
below, you're setting max_pat_index to the size of the bspec-defined
table (i.e., max=4 on MTL).  That's fine, but it means the description
here ("maximum...supported by hardware") is inaccurate.


Matt

> 
> i915_gem_get_pat_index: function to convert cache_level to PAT index.
> 
> obj_to_i915(obj): macro moved to header file for wider usage.
> 
> I915_MAX_CACHE_LEVEL: upper bound of i915_cache_level for the
>       convenience of coding.
> 
> Cc: Chris Wilson 
> Cc: Matt Roper 
> Cc: Andi Shyti 
> Signed-off-by: Fei Yang 
> Reviewed-by: Andi Shyti 
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_object.c|  9 +++
>  drivers/gpu/drm/i915/gem/i915_gem_object.h|  4 +
>  .../gpu/drm/i915/gem/i915_gem_object_types.h  |  1 +
>  drivers/gpu/drm/i915/gem/i915_gem_shrinker.c  |  2 -
>  drivers/gpu/drm/i915/gt/gen8_ppgtt.c  |  6 ++
>  drivers/gpu/drm/i915/gt/intel_ggtt.c  |  6 ++
>  drivers/gpu/drm/i915/i915_pci.c   | 75 +--
>  drivers/gpu/drm/i915/intel_device_info.h  |  5 ++
>  .../gpu/drm/i915/selftests/mock_gem_device.c  |  9 +++
>  9 files changed, 107 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> index 4666bb82f312..8c70a0ec7d2f 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> @@ -45,6 +45,15 @@ static struct kmem_cache *slab_objects;
>  
>  static const struct drm_gem_object_funcs i915_gem_object_funcs;
>  
> +unsigned int i915_gem_get_pat_index(struct drm_i915_private *i915,
> + enum i915_cache_level level)
> +{
> + if (drm_WARN_ON(&i915->drm, level >= I915_MAX_CACHE_LEVEL))
> + return 0;
> +
> + return INTEL_INFO(i915)->cachelevel_to_pat[level];
> +}
> +
>  struct drm_i915_gem_object *i915_gem_object_alloc(void)
>  {
>   struct drm_i915_gem_object *obj;
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h 
> b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> index 885ccde9dc3c..4c92e17b4337 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> @@ -20,6 +20,8 @@
>  
>  enum intel_region_id;
>  
> +#define obj_to_i915(obj__) to_i915((obj__)->base.dev)
> +
>  static inline bool i915_gem_object_size_2big(u64 size)
>  {
>   struct drm_i915_gem_object *obj;
> @@ -30,6 +32,8 @@ static inline bool i915_gem_object_size_2big(u64 size)
>   return false;
>  }
>  
> +unsigned int i915_gem_get_pat_index(struct drm_i915_private *i915,
> + enum i915_cache_level level);
>  void i915_gem_init__objects(struct drm_i915_private *i915);
>  
>  void i915_objects_module_exit(void);
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h 
> b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> index 830c11431ee8..41b35abccf88 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> @@ -194,6 +194,7 @@ enum i915_cache_level {
>* engine.
>*/
>   I915_CACHE_WT,
> + I915_MAX_CACHE_LEVEL,
>  };
>  
>  enum i915_map_type {
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c
> index b1672e054b21..214763942aa2 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c
> @@ -460,8 +460,6 @@ void i915_gem_shrinker_taints_mutex(struct 
> drm_i915_private *i915,
>   fs_reclaim_release(GFP_KERNEL);
>  }
>  
> -

Re: [Intel-gfx] [PATCH 5/8] drm/i915/mtl: end support for set caching ioctl

2023-04-20 Thread Matt Roper
On Wed, Apr 19, 2023 at 04:00:55PM -0700, fei.y...@intel.com wrote:
> From: Fei Yang 
> 
> The design is to keep Buffer Object's caching policy immutable
> throughout its life cycle. This patch ends the support for set caching ioctl
> from MTL onward. While doing that we also set BO's to be 1-way coherent
> at creation time because GPU is no longer automatically snooping CPU
> cache. For UMD's need to fine tune the caching policy for BO's, a follow
> up patch will extend the GEM_CREATE uAPI to allow UMD's specify caching
> mode at BO creation time.

Nitpick:  I don't think "UMD" is a term that anyone really uses outside
of Intel.  It's probably better to just say "userspace" instead of
"UMD" since that's more accurate anyway.


Matt

> 
> Signed-off-by: Fei Yang 
> Reviewed-by: Andi Shyti 
> Reviewed-by: Andrzej Hajda 
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_domain.c | 3 +++
>  drivers/gpu/drm/i915/gem/i915_gem_shmem.c  | 9 -
>  2 files changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> index d2d5a24301b2..bb3575b1479f 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> @@ -337,6 +337,9 @@ int i915_gem_set_caching_ioctl(struct drm_device *dev, 
> void *data,
>   if (IS_DGFX(i915))
>   return -ENODEV;
>  
> + if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 70))
> + return -EOPNOTSUPP;
> +
>   switch (args->caching) {
>   case I915_CACHING_NONE:
>   level = I915_CACHE_NONE;
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
> index 37d1efcd3ca6..cad4a6017f4b 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
> @@ -601,7 +601,14 @@ static int shmem_object_init(struct intel_memory_region 
> *mem,
>   obj->write_domain = I915_GEM_DOMAIN_CPU;
>   obj->read_domains = I915_GEM_DOMAIN_CPU;
>  
> - if (HAS_LLC(i915))
> + /*
> +  * MTL doesn't snoop CPU cache by default for GPU access (namely
> +  * 1-way coherency). However some UMD's are currently depending on
> +  * that. Make 1-way coherent the default setting for MTL. A follow
> +  * up patch will extend the GEM_CREATE uAPI to allow UMD's specify
> +  * caching mode at BO creation time
> +  */
> + if (HAS_LLC(i915) || (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 70)))
>   /* On some devices, we can have the GPU use the LLC (the CPU
>* cache) for about a 10% performance improvement
>* compared to uncached.  Graphics requests other than
> -- 
> 2.25.1
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


Re: [Intel-gfx] [PATCH 4/8] drm/i915/mtl: workaround coherency issue for Media

2023-04-20 Thread Matt Roper
On Wed, Apr 19, 2023 at 04:00:54PM -0700, fei.y...@intel.com wrote:
> From: Fei Yang 
> 
> This patch implements Wa_22016122933.
> 
> In MTL, memory writes initiated by Media tile update the whole
> cache line even for partial writes. This creates a coherency
> problem for cacheable memory if both CPU and GPU are writing data
> to different locations within a single cache line. CTB communication
> is impacted by this issue because the head and tail pointers are
> adjacent words within a cache line (see struct guc_ct_buffer_desc),
> where one is written by GuC and the other by the host.
> This patch circumvents the issue by making CPU/GPU shared memory
> uncacheable (WC on CPU side, and PAT index 2 for GPU). Also for
> CTB which is being updated by both CPU and GuC, mfence instruction
> is added to make sure the CPU writes are visible to GPU right away
> (flush the write combining buffer).

Is this description accurate?  This patch doesn't insert an mfence
instruction itself, it just calls intel_guc_write_barrier().  On
platforms like MTL that aren't using local memory, that issues a wmb()
barrier, which I believe is implemented as an sfence, not mfence.  You'd
need to be doing a mb() call to get an mfence.
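
For reference, a userspace approximation of the difference (a sketch
only; the function names are hypothetical, and the x86-64 mapping of
wmb() to sfence and mb() to mfence is my understanding of the kernel's
barrier definitions rather than something taken from this patch):

```c
/*
 * Userspace stand-ins for the kernel barriers.  On x86-64 they emit the
 * instructions discussed above; elsewhere they fall back to a full
 * compiler-provided barrier.
 */
#if defined(__x86_64__)
static inline void sketch_wmb(void) { __asm__ __volatile__("sfence" ::: "memory"); }
static inline void sketch_mb(void)  { __asm__ __volatile__("mfence" ::: "memory"); }
#else
static inline void sketch_wmb(void) { __sync_synchronize(); }
static inline void sketch_mb(void)  { __sync_synchronize(); }
#endif

/* Store a new head value, then order that store before later writes. */
static unsigned int publish_head(volatile unsigned int *head, unsigned int value)
{
	*head = value;
	sketch_wmb();	/* write barrier only, not a full mb() */
	return *head;
}
```

The write barrier is enough to push out the write-combining buffer, so
the commit message can simply say that without naming an instruction.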

I think in general this level of explanation is unnecessary; you can
just give a high-level description indicating that we force the
write-combine buffer to be flushed and not give the low-level specifics
of what instruction that translates to at the x86 level.

Aside from simplifying the commit message,

Reviewed-by: Matt Roper 

> 
> While fixing the CTB issue, we noticed some random GSC firmware
> loading failure because the share buffers are cacheable (WB) on CPU
> side but uncached on GPU side. To fix these issues we need to map
> such shared buffers as WC on CPU side. Since such allocations are
> not all done through GuC allocator, to avoid too many code changes,
> the i915_coherent_map_type() is now hard coded to return WC for MTL.
> 
> BSpec: 45101
> 
> Signed-off-by: Fei Yang 
> Reviewed-by: Andi Shyti 
> Acked-by: Nirmoy Das 
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_pages.c |  5 -
>  drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c | 13 +
>  drivers/gpu/drm/i915/gt/uc/intel_guc.c|  7 +++
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c |  6 ++
>  4 files changed, 30 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> index ecd86130b74f..89fc8ea6bcfc 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> @@ -469,7 +469,10 @@ enum i915_map_type i915_coherent_map_type(struct 
> drm_i915_private *i915,
> struct drm_i915_gem_object *obj,
> bool always_coherent)
>  {
> - if (i915_gem_object_is_lmem(obj))
> + /*
> +  * Wa_22016122933: always return I915_MAP_WC for MTL
> +  */
> + if (i915_gem_object_is_lmem(obj) || IS_METEORLAKE(i915))
>   return I915_MAP_WC;
>   if (HAS_LLC(i915) || always_coherent)
>   return I915_MAP_WB;
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c 
> b/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c
> index 1d9fdfb11268..236673c02f9a 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c
> @@ -110,6 +110,13 @@ static int gsc_fw_load_prepare(struct intel_gsc_uc *gsc)
>   if (obj->base.size < gsc->fw.size)
>   return -ENOSPC;
>  
> + /*
> +  * Wa_22016122933: For MTL the shared memory needs to be mapped
> +  * as WC on CPU side and UC (PAT index 2) on GPU side
> +  */
> + if (IS_METEORLAKE(i915))
> + i915_gem_object_set_cache_coherency(obj, I915_CACHE_NONE);
> +
>   dst = i915_gem_object_pin_map_unlocked(obj,
>  i915_coherent_map_type(i915, obj, true));
>   if (IS_ERR(dst))
> @@ -125,6 +132,12 @@ static int gsc_fw_load_prepare(struct intel_gsc_uc *gsc)
>   memset(dst, 0, obj->base.size);
>   memcpy(dst, src, gsc->fw.size);
>  
> + /*
> +  * Wa_22016122933: Making sure the data in dst is
> +  * visible to GSC right away
> +  */
> + intel_guc_write_barrier(&gt->uc.guc);
> +
>   i915_gem_object_unpin_map(gsc->fw.obj);
>   i915_gem_object_unpin_map(obj);
>  
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c 
> b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> index e89f16ecf1ae..c9f20385f6a0 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> @@ -744,

Re: [Intel-gfx] [PATCH 3/8] drm/i915/mtl: Add PTE encode function

2023-04-20 Thread Matt Roper
(px_dma(vm->scratch[0]),
> - I915_CACHE_NONE, pte_flags);
> + vm->pte_encode(px_dma(vm->scratch[0]),
> +I915_CACHE_NONE, pte_flags);
>  
>   for (i = 1; i <= vm->top; i++) {
>   struct drm_i915_gem_object *obj;
> @@ -963,7 +991,10 @@ struct i915_ppgtt *gen8_ppgtt_create(struct intel_gt *gt,
>*/
>   ppgtt->vm.alloc_scratch_dma = alloc_pt_dma;
>  
> - ppgtt->vm.pte_encode = gen8_pte_encode;
> + if (GRAPHICS_VER_FULL(gt->i915) >= IP_VER(12, 70))
> + ppgtt->vm.pte_encode = mtl_pte_encode;
> + else
> + ppgtt->vm.pte_encode = gen8_pte_encode;
>  
>   ppgtt->vm.bind_async_flags = I915_VMA_LOCAL_BIND;
>   ppgtt->vm.insert_entries = gen8_ppgtt_insert;
> diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c 
> b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> index 3c7f1ed92f5b..20915edc8bd9 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> @@ -220,6 +220,33 @@ static void guc_ggtt_invalidate(struct i915_ggtt *ggtt)
>   }
>  }
>  
> +static u64 mtl_ggtt_pte_encode(dma_addr_t addr,
> +enum i915_cache_level level,
> +u32 flags)
> +{
> + gen8_pte_t pte = addr | GEN8_PAGE_PRESENT;
> +
> + WARN_ON_ONCE(addr & ~GEN12_GGTT_PTE_ADDR_MASK);
> +
> + if (flags & PTE_LM)
> + pte |= GEN12_GGTT_PTE_LM;
> +
> + switch (level) {
> + case I915_CACHE_NONE:
> + pte |= MTL_GGTT_PTE_PAT1;
> + break;
> + case I915_CACHE_LLC:
> + case I915_CACHE_L3_LLC:
> + pte |= MTL_GGTT_PTE_PAT0 | MTL_GGTT_PTE_PAT1;
> + break;
> + case I915_CACHE_WT:
> + pte |= MTL_GGTT_PTE_PAT0;
> + break;
> + }
> +
> + return pte;
> +}
> +
>  u64 gen8_ggtt_pte_encode(dma_addr_t addr,
>enum i915_cache_level level,
>u32 flags)
> @@ -247,7 +274,7 @@ static void gen8_ggtt_insert_page(struct 
> i915_address_space *vm,
>   gen8_pte_t __iomem *pte =
>   (gen8_pte_t __iomem *)ggtt->gsm + offset / I915_GTT_PAGE_SIZE;
>  
> - gen8_set_pte(pte, gen8_ggtt_pte_encode(addr, level, flags));
> + gen8_set_pte(pte, ggtt->vm.pte_encode(addr, level, flags));
>  
>   ggtt->invalidate(ggtt);
>  }
> @@ -257,8 +284,8 @@ static void gen8_ggtt_insert_entries(struct 
> i915_address_space *vm,
>enum i915_cache_level level,
>u32 flags)
>  {
> - const gen8_pte_t pte_encode = gen8_ggtt_pte_encode(0, level, flags);
>   struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
> + const gen8_pte_t pte_encode = ggtt->vm.pte_encode(0, level, flags);
>   gen8_pte_t __iomem *gte;
>   gen8_pte_t __iomem *end;
>   struct sgt_iter iter;
> @@ -981,7 +1008,10 @@ static int gen8_gmch_probe(struct i915_ggtt *ggtt)
>   ggtt->vm.vma_ops.bind_vma    = intel_ggtt_bind_vma;
>   ggtt->vm.vma_ops.unbind_vma  = intel_ggtt_unbind_vma;
>  
> - ggtt->vm.pte_encode = gen8_ggtt_pte_encode;
> + if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 70))
> + ggtt->vm.pte_encode = mtl_ggtt_pte_encode;
> + else
> + ggtt->vm.pte_encode = gen8_ggtt_pte_encode;
>  
>   return ggtt_probe_common(ggtt, size);
>  }
> -- 
> 2.25.1
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


Re: [Intel-gfx] [PATCH 2/8] drm/i915/mtl: Define MOCS and PAT tables for MTL

2023-04-20 Thread Matt Roper
On Wed, Apr 19, 2023 at 04:00:52PM -0700, fei.y...@intel.com wrote:
> From: Madhumitha Tolakanahalli Pradeep 
> 
> 
> On MTL, GT can no longer allocate on LLC - only the CPU can.
> This, along with addition of support for L4 cache calls for
> a MOCS/PAT table update.
> Also the PAT index registers are multicasted for primary GT,
> and there is an address jump from index 7 to 8. This patch
> makes sure that these registers are programmed in the proper
> way.
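
For illustration, the address jump from index 7 to 8 can be written out as a small standalone sketch. The helper name pat_index_offset() is hypothetical and not part of i915; the two base offsets (0x4800 and 0x4848) and the 4-byte stride are taken from the _PAT_INDEX() definition in this patch, and the sketch assumes the caller passes a valid index.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not an i915 function): returns the MMIO offset of
 * PAT index register 'index'. Indices 0-7 are assumed to start at 0x4800
 * and indices 8 and up at 0x4848, each register being 4 bytes wide.
 */
static uint32_t pat_index_offset(unsigned int index)
{
	if (index < 8)
		return 0x4800 + index * 4;

	/* Second range: index 8 maps to the first register after the jump */
	return 0x4848 + (index - 8) * 4;
}
```

With these assumptions index 7 lands on 0x481c and index 8 on 0x4848, so the two ranges are not contiguous and a plain "0x4800 + index * 4" would be wrong from index 8 onward.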
> 
> BSpec: 44509, 45101, 44235
> 
> Cc: Matt Roper 
> Cc: Lucas De Marchi 
> Signed-off-by: Madhumitha Tolakanahalli Pradeep 
> 
> Signed-off-by: Aravind Iddamsetty 
> Signed-off-by: Nirmoy Das 
> Signed-off-by: Fei Yang 
> Reviewed-by: Andrzej Hajda 
> Reviewed-by: Nirmoy Das 
> Reviewed-by: Andi Shyti 
> ---
>  drivers/gpu/drm/i915/gt/intel_gt_regs.h |  6 +-
>  drivers/gpu/drm/i915/gt/intel_gtt.c | 47 ++-
>  drivers/gpu/drm/i915/gt/intel_gtt.h | 20 ++-
>  drivers/gpu/drm/i915/gt/intel_mocs.c| 76 +++--
>  drivers/gpu/drm/i915/gt/selftest_mocs.c |  2 +-
>  5 files changed, 143 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h 
> b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> index fd1f9cd35e9d..e8c3b762a92a 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> @@ -356,7 +356,11 @@
>  #define GEN7_TLB_RD_ADDR _MMIO(0x4700)
>  
>  #define GEN12_PAT_INDEX(index)          _MMIO(0x4800 + (index) * 4)
> -#define XEHP_PAT_INDEX(index)           MCR_REG(0x4800 + (index) * 4)
> +#define _PAT_INDEX(index)               _PICK_EVEN_2RANGES(index, 8, \
> +                                                           0x4800, 0x4804, \
> +                                                           0x4848, 0x484c)
> +#define XEHP_PAT_INDEX(index)           MCR_REG(_PAT_INDEX(index))
> +#define XELPMP_PAT_INDEX(index)         _MMIO(_PAT_INDEX(index))
>  
>  #define XEHP_TILE0_ADDR_RANGE           MCR_REG(0x4900)
>  #define   XEHP_TILE_LMEM_RANGE_SHIFT 8
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c 
> b/drivers/gpu/drm/i915/gt/intel_gtt.c
> index 4f436ba7a3c8..2f6a9be0ffe6 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
> @@ -468,6 +468,44 @@ void gtt_write_workarounds(struct intel_gt *gt)
>   }
>  }
>  
> +static void xelpmp_setup_private_ppat(struct intel_uncore *uncore)
> +{
> + intel_uncore_write(uncore, XELPMP_PAT_INDEX(0),
> +MTL_PPAT_L4_0_WB);
> + intel_uncore_write(uncore, XELPMP_PAT_INDEX(1),
> +MTL_PPAT_L4_1_WT);
> + intel_uncore_write(uncore, XELPMP_PAT_INDEX(2),
> +MTL_PPAT_L4_3_UC);
> + intel_uncore_write(uncore, XELPMP_PAT_INDEX(3),
> +MTL_PPAT_L4_0_WB | MTL_2_COH_1W);
> + intel_uncore_write(uncore, XELPMP_PAT_INDEX(4),
> +MTL_PPAT_L4_0_WB | MTL_3_COH_2W);
> +
> + /*
> +  * Remaining PAT entries are left at the hardware-default
> +  * fully-cached setting
> +  */
> +}
> +
> +static void xelpg_setup_private_ppat(struct intel_gt *gt)
> +{
> + intel_gt_mcr_multicast_write(gt, XEHP_PAT_INDEX(0),
> +  MTL_PPAT_L4_0_WB);
> + intel_gt_mcr_multicast_write(gt, XEHP_PAT_INDEX(1),
> +  MTL_PPAT_L4_1_WT);
> + intel_gt_mcr_multicast_write(gt, XEHP_PAT_INDEX(2),
> +  MTL_PPAT_L4_3_UC);
> + intel_gt_mcr_multicast_write(gt, XEHP_PAT_INDEX(3),
> +  MTL_PPAT_L4_0_WB | MTL_2_COH_1W);
> + intel_gt_mcr_multicast_write(gt, XEHP_PAT_INDEX(4),
> +  MTL_PPAT_L4_0_WB | MTL_3_COH_2W);
> +
> + /*
> +  * Remaining PAT entries are left at the hardware-default
> +  * fully-cached setting
> +  */
> +}
> +
>  static void tgl_setup_private_ppat(struct intel_uncore *uncore)
>  {
>   /* TGL doesn't support LLC or AGE settings */
> @@ -603,7 +641,14 @@ void setup_private_pat(struct intel_gt *gt)
>  
>   GEM_BUG_ON(GRAPHICS_VER(i915) < 8);
>  
> - if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 50))
> + if (gt->type == GT_MEDIA) {
> + xelpmp_setup_private_ppat(gt->uncore);
> + return;
> + }
> +
> + if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 70))
> + xelpg_setup_private_ppat(gt);
> + else if (GRAPHICS_VER_FULL(i915) >= IP_V

Re: [PATCH v4] drm/i915: Make IRQ reset and postinstall multi-gt aware

2023-04-17 Thread Matt Roper
On Tue, Apr 18, 2023 at 12:34:43AM +0200, Andi Shyti wrote:
> In multi-gt systems IRQs need to be reset and enabled per GT.
> 
> This might add some redundancy when handling interrupts for
> engines that might not exist in every tile, but helps to keep the
> code cleaner and more understandable.
> 
> Signed-off-by: Andi Shyti 
> Cc: Tvrtko Ursulin 
> ---
> Hi,
> 
> The results of this patch are more than promising as we are able
> to have MTL booting and executing basic tests.(*)
> 
> Thank you Daniele and Matt for the valuable exchange of opinions.
> 
> Andi
> 
> (*) https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_115465v5/index.html?
> 
> Changelog
> =
> v3 -> v4
>  - do not change the initial gt and uncore initialization in
>order to gain a better understanding at a glance of the
>purpose of all the local variables.

I think I may not have explained myself well in the previous feedback
here.  What I meant was that rather than doing

        struct intel_uncore *uncore = to_gt(dev_priv)->uncore;

as you were in the previous rev, you can simply do

        struct intel_uncore *uncore = &dev_priv->uncore;

because gt0's uncore pointer is always the same as dev_priv's.  Since
we're using the uncore variable to access display-specific gunit stuff, I
figured it was slightly clearer to the reader to take the
device-level pointer rather than grabbing it from any of the GTs.

That said, using "uncore = gt->uncore" as you have in this version
doesn't cause any real problems since the actual registers being
accessed are sgunit registers and thus don't get translated by GSI
offset.  You still wind up at the same sgunit register offsets on MTL no
matter which GT you grab an uncore from, and display/gunit isn't
something that PVC even needs to worry about.  So

Reviewed-by: Matt Roper 


> v2 -> v3
>  - keep GUnit irq initialization out of the for_each_gt() loop as
>the media GT doesn't have a GUnit.
> v1 -> v2
>  - improve description in the commit log.
> 
>  drivers/gpu/drm/i915/i915_irq.c | 9 +++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index dea1a117f3fa1..c027fd5189b85 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -2858,10 +2858,13 @@ static void dg1_irq_reset(struct drm_i915_private 
> *dev_priv)
>  {
>   struct intel_gt *gt = to_gt(dev_priv);
>   struct intel_uncore *uncore = gt->uncore;
> + unsigned int i;
>  
>   dg1_master_intr_disable(dev_priv->uncore.regs);
>  
> - gen11_gt_irq_reset(gt);
> + for_each_gt(gt, dev_priv, i)
> + gen11_gt_irq_reset(gt);
> +
>   gen11_display_irq_reset(dev_priv);
>  
>   GEN3_IRQ_RESET(uncore, GEN11_GU_MISC_);
> @@ -3646,8 +3649,10 @@ static void dg1_irq_postinstall(struct 
> drm_i915_private *dev_priv)
>   struct intel_gt *gt = to_gt(dev_priv);
>   struct intel_uncore *uncore = gt->uncore;
>   u32 gu_misc_masked = GEN11_GU_MISC_GSE;
> + unsigned int i;
>  
> - gen11_gt_irq_postinstall(gt);
> + for_each_gt(gt, dev_priv, i)
> + gen11_gt_irq_postinstall(gt);
>  
>   GEN3_IRQ_INIT(uncore, GEN11_GU_MISC_, ~gu_misc_masked, gu_misc_masked);
>  
> -- 
> 2.39.2
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


Re: [Intel-gfx] [PATCH v2] drm/i915: Make IRQ reset and postinstall multi-gt aware

2023-04-13 Thread Matt Roper
On Thu, Apr 13, 2023 at 06:19:16PM +0200, Andi Shyti wrote:
> On Thu, Apr 13, 2023 at 09:03:29AM -0700, Ceraolo Spurio, Daniele wrote:
> > 
> > 
> > On 4/13/2023 8:52 AM, Matt Roper wrote:
> > > On Thu, Apr 13, 2023 at 03:56:21PM +0200, Andi Shyti wrote:
> > > > Hi Tvrtko,
> > > > 
> > > > (I forgot to CC Daniele)
> > > > 
> > > > On Thu, Apr 13, 2023 at 11:41:28AM +0100, Tvrtko Ursulin wrote:
> > > > > On 13/04/2023 10:20, Andi Shyti wrote:
> > > > > > From: Paulo Zanoni 
> > > > > > 
> > > > > > In multitile systems IRQ need to be reset and enabled per GT.
> > > > > > 
> > > > > > Although in MTL the GUnit misc interrupts register set are
> > > > > > available only in GT-0, we need to loop through all the GT's
> > > > > > in order to initialize the media engine which lies on a different
> > > > > > GT.
> > > > > > 
> > > > > > Signed-off-by: Paulo Zanoni 
> > > > > > Cc: Tvrtko Ursulin 
> > > > > > Signed-off-by: Andi Shyti 
> > > > > > ---
> > > > > > Hi,
> > > > > > 
> > > > > > proposing again this patch, apparently GuC needs this patch to
> > > > > > initialize the media GT.
> > > > > What is the resolution for Matt's concern that this is wrong for MTL?
> > > > There are two explanations, one easy and one less easy.
> > > > 
> > > > The easy one: without this patch i915 doesn't boot on MTL!(*)
> > > > 
> > > > The second explanation is that in MTL the media engine has its
> > > > own set of misc irq registers and those are on a different GT
> > > > (Daniele pointed this out).
> > > Assuming you're talking about MTL_GUC_MGUC_INTR_MASK, that's not true;
> > > it's just a single sgunit register (0x1900e8) that has different
> > > bitfields for the primary GuC and the media GuC.  So I still think we
> > > should avoid looping over GTs; it's actually much simpler to handle
> > > things in a single pass since we can just determine the single register
> > > value once (all fields) and write it directly, instead of doing two
> > > separate RMW updates to the same register to try to avoid clobbering
> > > the other GuC's settings.
> 
> if we handle exceptions in a single pass wouldn't we have many
> exceptions to handle in the long run?

I don't think so, it basically boils down to something along the lines
of

        if (MEDIA_VER(i915) >= 13)
                val = HIGH_BITS | LOW_BITS;
        else
                val = HIGH_BITS;

        ...

        intel_uncore_write(val);

which isn't really any more complicated than today's logic:

        called for each gt {
                ...

                if (gt is MEDIA)
                        bits = LOW_BITS;
                else
                        bits = HIGH_BITS;

                ...

                intel_uncore_rmw(bits);
        }
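
To make the comparison concrete, here is a minimal standalone sketch of the two approaches against a simulated 32-bit register. Everything in it is an assumption for illustration: the HIGH_BITS/LOW_BITS values are placeholders rather than the real bitfield layout, has_media_guc stands in for the MEDIA_VER() check, GT1 is assumed to be the media GT, and none of the function names exist in i915.

```c
#include <stdint.h>

/* Placeholder bitfields; the real register layout is not given here. */
#define HIGH_BITS	0xffff0000u	/* stands in for the primary GuC field */
#define LOW_BITS	0x0000ffffu	/* stands in for the media GuC field */

/* Single pass: decide every field once, then do one plain write. */
static uint32_t single_pass(int has_media_guc)
{
	uint32_t val = HIGH_BITS;

	if (has_media_guc)
		val |= LOW_BITS;

	return val;
}

/* Read-modify-write: clear 'clear', then set 'set', keep all other bits. */
static uint32_t rmw(uint32_t reg, uint32_t clear, uint32_t set)
{
	return (reg & ~clear) | set;
}

/* Per-GT: one RMW per GT, each touching only that GT's own field. */
static uint32_t per_gt(uint32_t reg, int num_gts)
{
	int gt;

	for (gt = 0; gt < num_gts; gt++) {
		/* Assumption: GT1 is the media GT, every other GT is primary */
		uint32_t bits = (gt == 1) ? LOW_BITS : HIGH_BITS;

		reg = rmw(reg, bits, bits);
	}

	return reg;
}
```

Starting from a zeroed register, per_gt(0, 2) and single_pass(1) end up with the same value; the single-pass version just gets there with one write and no dependence on what the other GT programmed earlier.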


Matt

> 
> > > For pre-MTL platforms, it's the same register, except that the bitfield
> > > now devoted to the media GuC was previously used for something else
> > > (scatter/gather).
> > 
> > It's not just the GuC, the VCS/VECS engine programming is also tied to the
> > media GT (via the HAS_ENGINE checks). It looks like we unconditionally
> > program VCS 0 and 2, so it'll still work for MTL, but if we get a device
> > with more VCS engines it'll break. Maybe we can add a MTL version of the
> > function that just programs everything unconditionally? Going forward it
> > should be ok to program things for engines that don't exist, but I'm not
> > sure we can do that for older platforms that came before the extra engines
> > were ever defined in HW.
> 
> This is more or less what Tvrtko has suggested, as well. Looks to
> me like replicating some code... anyway, I will try and see how
> it looks like.
> 
> Andi
> 
> PS Thanks Matt, Daniele and Tvrtko for the feedback.
> 
> > Daniele
> > 
> > > 
> > > 
> > > Matt
> > > 
> > > > I sent this patch not to bypass any review, but to restart the
> > > > discussion as this patch was just dropped.
> > > > 
> > > > Thanks,
> > > > Andi
> > > > 
> > > > 
> > > > (*)
> > > > [drm] *ERROR* GT1: GUC: CT: No response for request 0x550a (fence 7)
> > > > [drm] *ERROR* GT1: GUC: CT: Sending action 0x550a failed (-ETIMEDOUT) 
> > > > status=0X0
> > > > [drm] *ERROR* GT1: GUC: Failed to enable usage stats: -ETIMEDOUT
> > > > [drm] *ERROR* GT1: GuC initialization failed -ETIMEDOUT
> > > > [drm] *ERROR* GT1: Enabling uc failed (-5)
> > > > [drm] *ERROR* GT1: Failed to initialize GPU, declaring it wedged!

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


Re: [Intel-gfx] [PATCH v2] drm/i915: Make IRQ reset and postinstall multi-gt aware

2023-04-13 Thread Matt Roper
On Thu, Apr 13, 2023 at 09:03:29AM -0700, Ceraolo Spurio, Daniele wrote:
> 
> 
> On 4/13/2023 8:52 AM, Matt Roper wrote:
> > On Thu, Apr 13, 2023 at 03:56:21PM +0200, Andi Shyti wrote:
> > > Hi Tvrtko,
> > > 
> > > (I forgot to CC Daniele)
> > > 
> > > On Thu, Apr 13, 2023 at 11:41:28AM +0100, Tvrtko Ursulin wrote:
> > > > On 13/04/2023 10:20, Andi Shyti wrote:
> > > > > From: Paulo Zanoni 
> > > > > 
> > > > > In multitile systems IRQ need to be reset and enabled per GT.
> > > > > 
> > > > > Although in MTL the GUnit misc interrupts register set are
> > > > > available only in GT-0, we need to loop through all the GT's
> > > > > in order to initialize the media engine which lies on a different
> > > > > GT.
> > > > > 
> > > > > Signed-off-by: Paulo Zanoni 
> > > > > Cc: Tvrtko Ursulin 
> > > > > Signed-off-by: Andi Shyti 
> > > > > ---
> > > > > Hi,
> > > > > 
> > > > > proposing again this patch, apparently GuC needs this patch to
> > > > > initialize the media GT.
> > > > What is the resolution for Matt's concern that this is wrong for MTL?
> > > There are two explanations, one easy and one less easy.
> > > 
> > > The easy one: without this patch i915 doesn't boot on MTL!(*)
> > > 
> > > The second explanation is that in MTL the media engine has its
> > > own set of misc irq registers and those are on a different GT
> > > (Daniele pointed this out).
> > Assuming you're talking about MTL_GUC_MGUC_INTR_MASK, that's not true;
> > it's just a single sgunit register (0x1900e8) that has different
> > bitfields for the primary GuC and the media GuC.  So I still think we
> > should avoid looping over GTs; it's actually much simpler to handle
> > things in a single pass since we can just determine the single register
> > value once (all fields) and write it directly, instead of doing two
> > separate RMW updates to the same register to try to avoid clobbering
> > the other GuC's settings.
> > 
> > For pre-MTL platforms, it's the same register, except that the bitfield
> > now devoted to the media GuC was previously used for something else
> > (scatter/gather).
> 
> It's not just the GuC, the VCS/VECS engine programming is also tied to the
> media GT (via the HAS_ENGINE checks). It looks like we unconditionally
> program VCS 0 and 2, so it'll still work for MTL, but if we get a device
> with more VCS engines it'll break. Maybe we can add a MTL version of the
> function that just programs everything unconditionally? Going forward it
> should be ok to program things for engines that don't exist, but I'm not
> sure we can do that for older platforms that came before the extra engines
> were ever defined in HW.

Right, so I think the engine handling is already correct for MTL today;
the main concern would be how it might need to change for other future
platforms if more media engines show back up on a media GT.  I think we
can wait and cross that bridge if/when we get to it.  With focus moving
over to the Xe KMD, we might be on a completely different driver by the
time the hardware adds back in more media engines that aren't already
covered unconditionally.


Matt

> 
> Daniele
> 
> > 
> > 
> > Matt
> > 
> > > I sent this patch not to bypass any review, but to restart the
> > > discussion as this patch was just dropped.
> > > 
> > > Thanks,
> > > Andi
> > > 
> > > 
> > > (*)
> > > [drm] *ERROR* GT1: GUC: CT: No response for request 0x550a (fence 7)
> > > [drm] *ERROR* GT1: GUC: CT: Sending action 0x550a failed (-ETIMEDOUT) 
> > > status=0X0
> > > [drm] *ERROR* GT1: GUC: Failed to enable usage stats: -ETIMEDOUT
> > > [drm] *ERROR* GT1: GuC initialization failed -ETIMEDOUT
> > > [drm] *ERROR* GT1: Enabling uc failed (-5)
> > > [drm] *ERROR* GT1: Failed to initialize GPU, declaring it wedged!
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


Re: [Intel-gfx] [PATCH v2] drm/i915: Make IRQ reset and postinstall multi-gt aware

2023-04-13 Thread Matt Roper
On Thu, Apr 13, 2023 at 03:56:21PM +0200, Andi Shyti wrote:
> Hi Tvrtko,
> 
> (I forgot to CC Daniele)
> 
> On Thu, Apr 13, 2023 at 11:41:28AM +0100, Tvrtko Ursulin wrote:
> > 
> > On 13/04/2023 10:20, Andi Shyti wrote:
> > > From: Paulo Zanoni 
> > > 
> > > In multitile systems IRQ need to be reset and enabled per GT.
> > > 
> > > Although in MTL the GUnit misc interrupts register set are
> > > available only in GT-0, we need to loop through all the GT's
> > > in order to initialize the media engine which lies on a different
> > > GT.
> > > 
> > > Signed-off-by: Paulo Zanoni 
> > > Cc: Tvrtko Ursulin 
> > > Signed-off-by: Andi Shyti 
> > > ---
> > > Hi,
> > > 
> > > proposing again this patch, apparently GuC needs this patch to
> > > initialize the media GT.
> > 
> > What is the resolution for Matt's concern that this is wrong for MTL?
> 
> There are two explanations, one easy and one less easy.
> 
> The easy one: without this patch i915 doesn't boot on MTL!(*)
> 
> The second explanation is that in MTL the media engine has its
> own set of misc irq registers and those are on a different GT
> (Daniele pointed this out).

Assuming you're talking about MTL_GUC_MGUC_INTR_MASK, that's not true;
it's just a single sgunit register (0x1900e8) that has different
bitfields for the primary GuC and the media GuC.  So I still think we
should avoid looping over GTs; it's actually much simpler to handle
things in a single pass since we can just determine the single register
value once (all fields) and write it directly, instead of doing two
separate RMW updates to the same register to try to avoid clobbering
the other GuC's settings.

For pre-MTL platforms, it's the same register, except that the bitfield
now devoted to the media GuC was previously used for something else
(scatter/gather).


Matt

> 
> I sent this patch not to bypass any review, but to restart the
> discussion as this patch was just dropped.
> 
> Thanks,
> Andi
> 
> 
> (*)
> [drm] *ERROR* GT1: GUC: CT: No response for request 0x550a (fence 7)
> [drm] *ERROR* GT1: GUC: CT: Sending action 0x550a failed (-ETIMEDOUT) 
> status=0X0
> [drm] *ERROR* GT1: GUC: Failed to enable usage stats: -ETIMEDOUT
> [drm] *ERROR* GT1: GuC initialization failed -ETIMEDOUT
> [drm] *ERROR* GT1: Enabling uc failed (-5)
> [drm] *ERROR* GT1: Failed to initialize GPU, declaring it wedged!

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


Re: [Intel-gfx] [PATCH 1/8] drm/i915/mtl: Define MOCS and PAT tables for MTL

2023-04-11 Thread Matt Roper
On Mon, Apr 10, 2023 at 08:55:16PM -0700, Yang, Fei wrote:
...
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h
>>> b/drivers/gpu/drm/i915/gt/intel_gtt.h
>>> index 69ce55f517f5..b632167eaf2e 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
>>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
>>> @@ -88,9 +88,18 @@ typedef u64 gen8_pte_t;
>>>  #define BYT_PTE_SNOOPED_BY_CPU_CACHES REG_BIT(2)
>>>  #define BYT_PTE_WRITEABLE    REG_BIT(1)
>>>
>>> +#define GEN12_PPGTT_PTE_PAT3    BIT_ULL(62)
>>>  #define GEN12_PPGTT_PTE_LM  BIT_ULL(11)
>>> +#define GEN12_PPGTT_PTE_PAT2    BIT_ULL(7)
>>> +#define GEN12_PPGTT_PTE_NC  BIT_ULL(5)
>>> +#define GEN12_PPGTT_PTE_PAT1    BIT_ULL(4)
>>> +#define GEN12_PPGTT_PTE_PAT0    BIT_ULL(3)
>>
>> Which bspec page is this from?  The PPGTT descriptions in
>> the bspec are kind of hard to track down.
>
> https://gfxspecs.intel.com/Predator/Home/Index/53521

The bspec tagging is a bit bizarre in this area, but I don't believe
this page is intended to apply to MTL.  Note that this page is inside a
section specifically listed as "57b VA Support" --- i.e., this general
section is for platforms like PVC rather than MTL.  MTL only has 48b
virtual address space (bspec 55416), so I think one of the pages in the
48b sections is what we should be referencing instead.

If they screwed up and put MTL-specific details only on a PVC-specific
page of the bspec, we should probably file a bspec issue about that to
get it fixed.

>
> PAT_Index[2:0] = {PAT, PCD, PWT} = BIT(7, 4, 3)
> PAT_Index[3] = BIT(62)
> PAT_Index[4] = BIT(61)
>
>> But if these only apply to MTL, why are they labelled as "GEN12?"
>
> These apply to GEN12plus.

That's not what your patch is doing; you're using them in a function
that only gets called on MTL.  So the question is whether these
definitions truly applied to older platforms like TGL/RKL/ADL/etc too
(and we need to go back and fix that code), or whether they're
Xe_LPG-specific, in which case the "GEN12_" prefix is incorrect.

Also, handling the MTL-specific PTE encoding later in the series, after
the transition from cache_level to PAT index, might be best since then
you can just implement it correctly at the time the code is introduced;
no need to add the cache_level implementation first (which can't even
use all the bits) just to come back a few patches later and replace it
all with PAT code.
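
As a rough sketch of what a PAT-index-based encoder could look like, the following takes the bit positions quoted above at face value (whether they really apply to MTL is the open question in this thread). The function name and the PTE_PAT* macro names are hypothetical; only the bit numbers 3, 4, 7 and 62 come from the patch, and PAT_Index[4] is left out because the patch defines no macro for it.

```c
#include <stdint.h>

#define BIT_ULL(n)	(1ULL << (n))

/* Bit positions as quoted: PAT_Index[2:0] -> bits 7, 4, 3; PAT_Index[3] -> bit 62 */
#define PTE_PAT0	BIT_ULL(3)
#define PTE_PAT1	BIT_ULL(4)
#define PTE_PAT2	BIT_ULL(7)
#define PTE_PAT3	BIT_ULL(62)

/*
 * Hypothetical helper: scatters the low four bits of 'pat_index' into
 * the non-contiguous PAT bits of a PPGTT PTE and returns the new PTE.
 */
static uint64_t pte_encode_pat_index(uint64_t pte, unsigned int pat_index)
{
	if (pat_index & 1)
		pte |= PTE_PAT0;
	if (pat_index & 2)
		pte |= PTE_PAT1;
	if (pat_index & 4)
		pte |= PTE_PAT2;
	if (pat_index & 8)
		pte |= PTE_PAT3;

	return pte;
}
```

Unlike the cache_level switch, this form can express every index the defined bits allow, which is the reason for deferring the MTL encoding until the PAT index conversion is in place.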

>
>>>
>>> -#define GEN12_GGTT_PTE_LM   BIT_ULL(1)
>>> +#define GEN12_GGTT_PTE_LM BIT_ULL(1)
>>> +#define MTL_GGTT_PTE_PAT0  BIT_ULL(52)
>>> +#define MTL_GGTT_PTE_PAT1  BIT_ULL(53)
>>> +#define GEN12_GGTT_PTE_ADDR_MASK   GENMASK_ULL(45, 12)
>>> +#define MTL_GGTT_PTE_PAT_MASK  GENMASK_ULL(53, 52)
>>>
>>>  #define GEN12_PDE_64K BIT(6)
>>>  #define GEN12_PTE_PS64 BIT(8)
>>> @@ -147,6 +156,15 @@ typedef u64 gen8_pte_t;
>>>  #define GEN8_PDE_IPS_64K BIT(11)
>>>  #define GEN8_PDE_PS_2M   BIT(7)
>>>
>>> +#define MTL_PPAT_L4_CACHE_POLICY_MASK REG_GENMASK(3, 2)
>>> +#define MTL_PAT_INDEX_COH_MODE_MASK  REG_GENMASK(1, 0)
>>> +#define MTL_PPAT_L4_3_UC  REG_FIELD_PREP(MTL_PPAT_L4_CACHE_POLICY_MASK, 3)
>>> +#define MTL_PPAT_L4_1_WT  REG_FIELD_PREP(MTL_PPAT_L4_CACHE_POLICY_MASK, 1)
>>> +#define MTL_PPAT_L4_0_WB  REG_FIELD_PREP(MTL_PPAT_L4_CACHE_POLICY_MASK, 0)
>>> +#define MTL_3_COH_2W REG_FIELD_PREP(MTL_PAT_INDEX_COH_MODE_MASK, 3)
>>> +#define MTL_2_COH_1W REG_FIELD_PREP(MTL_PAT_INDEX_COH_MODE_MASK, 2)
>>> +#define MTL_0_COH_NON   REG_FIELD_PREP(MTL_PAT_INDEX_COH_MODE_MASK, 0)
>>
>> The values for these definitions don't seem to be aligned.
>
> These are aligned with
> https://gfxspecs.intel.com/Predator/Home/Index/45101

I mean spacing aligned.  If your tabstops are set to 8, then the values
don't line up visually.


Matt

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


Re: [Intel-gfx] [PATCH 1/8] drm/i915/mtl: Define MOCS and PAT tables for MTL

2023-04-10 Thread Matt Roper
On Fri, Apr 07, 2023 at 12:12:29AM -0700, fei.y...@intel.com wrote:
> From: Fei Yang 
> 
> On MTL, GT can no longer allocate on LLC - only the CPU can.
> This, along with addition of support for ADM/L4 cache calls for a
> MOCS/PAT table update. Also defines PTE encode functions for
> MTL as it has different PAT index definition than previous
> platforms.

It might be best to keep the PTE encoding as a separate patch from the
MOCS/PAT tables.  It's a different enough topic that it probably
deserves a patch of its own.

> 
> BSpec: 44509, 45101, 44235
> 
> Cc: Matt Roper 
> Cc: Lucas De Marchi 
> Signed-off-by: Madhumitha Tolakanahalli Pradeep 
> 
> Signed-off-by: Aravind Iddamsetty 
> Signed-off-by: Fei Yang 
> ---
>  drivers/gpu/drm/i915/gt/gen8_ppgtt.c| 28 +
>  drivers/gpu/drm/i915/gt/gen8_ppgtt.h|  3 +
>  drivers/gpu/drm/i915/gt/intel_ggtt.c| 27 +
>  drivers/gpu/drm/i915/gt/intel_gtt.c | 23 +++-
>  drivers/gpu/drm/i915/gt/intel_gtt.h | 20 ++-
>  drivers/gpu/drm/i915/gt/intel_mocs.c| 76 +++--
>  drivers/gpu/drm/i915/gt/selftest_mocs.c |  2 +-
>  drivers/gpu/drm/i915/i915_pci.c |  1 +
>  8 files changed, 173 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c 
> b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> index 4daaa6f55668..df4073d32114 100644
> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> @@ -55,6 +55,34 @@ static u64 gen8_pte_encode(dma_addr_t addr,
>   return pte;
>  }
>  
> +static u64 mtl_pte_encode(dma_addr_t addr,
> +   enum i915_cache_level level,
> +   u32 flags)
> +{
> + gen8_pte_t pte = addr | GEN8_PAGE_PRESENT | GEN8_PAGE_RW;
> +
> + if (unlikely(flags & PTE_READ_ONLY))
> + pte &= ~GEN8_PAGE_RW;
> +
> + if (flags & PTE_LM)
> + pte |= GEN12_PPGTT_PTE_LM | GEN12_PPGTT_PTE_NC;
> +
> + switch (level) {
> + case I915_CACHE_NONE:
> + pte |= GEN12_PPGTT_PTE_PAT1;
> + break;
> + case I915_CACHE_LLC:
> + case I915_CACHE_L3_LLC:
> + pte |= GEN12_PPGTT_PTE_PAT0 | GEN12_PPGTT_PTE_PAT1;
> + break;
> + case I915_CACHE_WT:
> + pte |= GEN12_PPGTT_PTE_PAT0;
> + break;
> + }
> +
> + return pte;
> +}
> +
>  static void gen8_ppgtt_notify_vgt(struct i915_ppgtt *ppgtt, bool create)
>  {
>   struct drm_i915_private *i915 = ppgtt->vm.i915;
> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.h 
> b/drivers/gpu/drm/i915/gt/gen8_ppgtt.h
> index f541d19264b4..6b8ce7f4d25a 100644
> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.h
> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.h
> @@ -18,5 +18,8 @@ struct i915_ppgtt *gen8_ppgtt_create(struct intel_gt *gt,
>  u64 gen8_ggtt_pte_encode(dma_addr_t addr,
>enum i915_cache_level level,
>u32 flags);
> +u64 mtl_ggtt_pte_encode(dma_addr_t addr,
> + unsigned int pat_index,
> + u32 flags);
>  
>  #endif
> diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c 
> b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> index 3c7f1ed92f5b..4a16bfcde1de 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> @@ -220,6 +220,33 @@ static void guc_ggtt_invalidate(struct i915_ggtt *ggtt)
>   }
>  }
>  
> +u64 mtl_ggtt_pte_encode(dma_addr_t addr,
> + enum i915_cache_level level,
> + u32 flags)
> +{
> + gen8_pte_t pte = addr | GEN8_PAGE_PRESENT;
> +
> + GEM_BUG_ON(addr & ~GEN12_GGTT_PTE_ADDR_MASK);
> +
> + if (flags & PTE_LM)
> + pte |= GEN12_GGTT_PTE_LM;
> +
> + switch (level) {
> + case I915_CACHE_NONE:
> + pte |= MTL_GGTT_PTE_PAT1;
> + break;
> + case I915_CACHE_LLC:
> + case I915_CACHE_L3_LLC:
> + pte |= MTL_GGTT_PTE_PAT0 | MTL_GGTT_PTE_PAT1;
> + break;
> + case I915_CACHE_WT:
> + pte |= MTL_GGTT_PTE_PAT0;
> + break;
> + }
> +
> + return pte;
> +}
> +
>  u64 gen8_ggtt_pte_encode(dma_addr_t addr,
>enum i915_cache_level level,
>u32 flags)
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c 
> b/drivers/gpu/drm/i915/gt/intel_gtt.c
> index 4f436ba7a3c8..1e1b34e22cf5 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
> @@ -468,6 +468,25 @@ void gtt_write_workarounds(struct intel_gt *gt)
>   }
>  }
> 

Re: [PATCH] drm/i915/guc: Don't capture Gen8 regs on Gen12 devices

2023-04-05 Thread Matt Roper
On Wed, Apr 05, 2023 at 02:13:31PM -0700, John Harrison wrote:
> On 4/3/2023 17:34, Matt Roper wrote:
> > On Mon, Apr 03, 2023 at 02:33:34PM -0700, john.c.harri...@intel.com wrote:
> > > From: John Harrison 
> > > 
> > > A pair of pre-Gen12 registers were being included in the Gen12 capture
> > > list. GuC was rejecting those as being invalid and logging errors
> > > about them. So, stop doing it.
> > Looks like these registers existed from gen8-gen11.  With this change,
> > it looks like they also won't be included in the GuC error capture for
> > gen11 (ICL and EHL/JSL) since those platforms return xe_lpd_lists [1]
> > rather than default_lists; do we care about that?  I assume not (since
> > those platforms don't use GuC submission unless you force it with the
> > enable_guc modparam and taint your kernel), but I figured I should point
> > it out.
> Yeah, I think the code is treating Gen11 as Gen12 rather than Gen9 or its
> own thing. I hadn't spotted that before. It certainly seems incorrect.
> 
> > 
> > Reviewed-by: Matt Roper 
> > 
> > 
> > [1] Why is the main list we use called xe_lpd (i.e., the name of ADL-P's
> >  display IP)?  It doesn't seem like we're doing anything with display
> >  registers here so using display IP naming seems really confusing.
> I think because no-one has a clue what name refers to what hardware any more
> :(.
> 
> What are the official names for IP_VER 9, 11, 12.00, 12.50 and 12.55?

Yeah, the naming is a real mess.  :-(  For graphics IP, the official
terms are supposed to be:

12.00 = Xe_LP
12.10 = Xe_LP+ (basically the same as Xe_LP except for interrupts)
12.50 = Xe_HP
12.55 = Xe_HPG (it's nearly identical to Xe_HP)
12.7x = Xe_LPG

There are separate names for media, although we didn't really start
using them anywhere in the i915 until the separation of IPs started
becoming more important with MTL:

12.00 = Xe_M (or Xe_M+ for DG1, but we treat it the same in the KMD)
12.50 = Xe_XPM
12.55 = Xe_HPM
12.60 = Xe_XPM+
13.00 = Xe_LPM+

and display:

12.00 = Xe_D
13.00 = Xe_LPD (ADL-P) or Xe_HPD (DG2)
14.00 = Xe_LPD+


The pre-12 stuff predates the fancy new marketing-mandated names.  Even
though we're not using "gen" terminology going forward, those old ones
are grandfathered in, so it's still okay to refer to them as gen9,
gen11, etc.
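
If it helps, the graphics list above can be written down as a small
lookup table.  This is only an illustrative sketch: graphics_ip_name()
and the version encoding (major * 100 + minor) are made up for this
example and are not existing i915 helpers.

```c
#include <stddef.h>

struct ip_name {
	unsigned int ver;	/* major * 100 + minor, e.g. 12.55 -> 1255 */
	const char *name;
};

/* Graphics IP names exactly as listed above (hypothetical table) */
static const struct ip_name graphics_ip_names[] = {
	{ 1200, "Xe_LP" },
	{ 1210, "Xe_LP+" },
	{ 1250, "Xe_HP" },
	{ 1255, "Xe_HPG" },
};

static const char *graphics_ip_name(unsigned int ver)
{
	size_t i;

	for (i = 0; i < sizeof(graphics_ip_names) / sizeof(graphics_ip_names[0]); i++)
		if (graphics_ip_names[i].ver == ver)
			return graphics_ip_names[i].name;

	/* "12.7x" covers a range of minor versions */
	if (ver >= 1270 && ver <= 1279)
		return "Xe_LPG";

	/* No Xe_* name in the list above; pre-12 IP keeps the "gen" terms */
	return NULL;
}
```

A caller would fall back to "gen9", "gen11", etc. when this returns
NULL for a pre-12 version.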


Matt

> 
> John.
> 
> > 
> > 
> > Matt
> > 
> > > Signed-off-by: John Harrison 
> > > Fixes: dce2bd542337 ("drm/i915/guc: Add Gen9 registers for GuC error 
> > > state capture.")
> > > Cc: Alan Previn 
> > > Cc: Umesh Nerlige Ramappa 
> > > Cc: Lucas De Marchi 
> > > Cc: John Harrison 
> > > Cc: Jani Nikula 
> > > Cc: Matt Roper 
> > > Cc: Balasubramani Vivekanandan 
> > > Cc: Daniele Ceraolo Spurio 
> > > ---
> > >   drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c | 7 +--
> > >   1 file changed, 5 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c 
> > > b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
> > > index cf49188db6a6e..e0e793167d61b 100644
> > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
> > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
> > > @@ -31,12 +31,14 @@
> > >   { FORCEWAKE_MT, 0,  0, "FORCEWAKE" }
> > >   #define COMMON_GEN9BASE_GLOBAL \
> > > - { GEN8_FAULT_TLB_DATA0, 0,  0, "GEN8_FAULT_TLB_DATA0" }, \
> > > - { GEN8_FAULT_TLB_DATA1, 0,  0, "GEN8_FAULT_TLB_DATA1" }, \
> > >   { ERROR_GEN6,   0,  0, "ERROR_GEN6" }, \
> > >   { DONE_REG, 0,  0, "DONE_REG" }, \
> > >   { HSW_GTT_CACHE_EN, 0,  0, "HSW_GTT_CACHE_EN" }
> > > +#define GEN9_GLOBAL \
> > > + { GEN8_FAULT_TLB_DATA0, 0,  0, "GEN8_FAULT_TLB_DATA0" }, \
> > > + { GEN8_FAULT_TLB_DATA1, 0,  0, "GEN8_FAULT_TLB_DATA1" }
> > > +
> > >   #define COMMON_GEN12BASE_GLOBAL \
> > > >   { GEN12_FAULT_TLB_DATA0,0,  0, "GEN12_FAULT_TLB_DATA0" }, \
> > > >   { GEN12_FAULT_TLB_DATA1,0,  0, "GEN12_FAULT_TLB_DATA1" }, \
> > > @@ -142,6 +144,7 @@ static const struct __guc_mmio_reg_descr 
> > > xe_lpd_gsc_inst_regs[] = {
> > >   static const struct __guc_mmio_reg_descr default_global_regs[] = {
> > >   COMMON_BASE_GLOBAL,
> > >   COMMON_GEN9BASE_GLOBAL,
> > > + GEN9_GLOBAL,
> > >   };
> > >   static const struct __guc_mmio_reg_descr default_rc_class_regs[] = {
> > > -- 
> > > 2.39.1
> > > 
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


Re: [PATCH] drm/i915/guc: Don't capture Gen8 regs on Gen12 devices

2023-04-03 Thread Matt Roper
On Mon, Apr 03, 2023 at 02:33:34PM -0700, john.c.harri...@intel.com wrote:
> From: John Harrison 
> 
> A pair of pre-Gen12 registers were being included in the Gen12 capture
> list. GuC was rejecting those as being invalid and logging errors
> about them. So, stop doing it.

Looks like these registers existed from gen8-gen11.  With this change,
it looks like they also won't be included in the GuC error capture for
gen11 (ICL and EHL/JSL) since those platforms return xe_lpd_lists [1]
rather than default_lists; do we care about that?  I assume not (since
those platforms don't use GuC submission unless you force it with the
enable_guc modparam and taint your kernel), but I figured I should point
it out.

Reviewed-by: Matt Roper 


[1] Why is the main list we use called xe_lpd (i.e., the name of ADL-P's
display IP)?  It doesn't seem like we're doing anything with display
registers here so using display IP naming seems really confusing.


Matt

> 
> Signed-off-by: John Harrison 
> Fixes: dce2bd542337 ("drm/i915/guc: Add Gen9 registers for GuC error state 
> capture.")
> Cc: Alan Previn 
> Cc: Umesh Nerlige Ramappa 
> Cc: Lucas De Marchi 
> Cc: John Harrison 
> Cc: Jani Nikula 
> Cc: Matt Roper 
> Cc: Balasubramani Vivekanandan 
> Cc: Daniele Ceraolo Spurio 
> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c 
> b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
> index cf49188db6a6e..e0e793167d61b 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
> @@ -31,12 +31,14 @@
>   { FORCEWAKE_MT, 0,  0, "FORCEWAKE" }
>  
>  #define COMMON_GEN9BASE_GLOBAL \
> - { GEN8_FAULT_TLB_DATA0, 0,  0, "GEN8_FAULT_TLB_DATA0" }, \
> - { GEN8_FAULT_TLB_DATA1, 0,  0, "GEN8_FAULT_TLB_DATA1" }, \
>   { ERROR_GEN6,   0,  0, "ERROR_GEN6" }, \
>   { DONE_REG, 0,  0, "DONE_REG" }, \
>   { HSW_GTT_CACHE_EN, 0,  0, "HSW_GTT_CACHE_EN" }
>  
> +#define GEN9_GLOBAL \
> + { GEN8_FAULT_TLB_DATA0, 0,  0, "GEN8_FAULT_TLB_DATA0" }, \
> + { GEN8_FAULT_TLB_DATA1, 0,  0, "GEN8_FAULT_TLB_DATA1" }
> +
>  #define COMMON_GEN12BASE_GLOBAL \
>   { GEN12_FAULT_TLB_DATA0,0,  0, "GEN12_FAULT_TLB_DATA0" }, \
>   { GEN12_FAULT_TLB_DATA1,0,  0, "GEN12_FAULT_TLB_DATA1" }, \
> @@ -142,6 +144,7 @@ static const struct __guc_mmio_reg_descr 
> xe_lpd_gsc_inst_regs[] = {
>  static const struct __guc_mmio_reg_descr default_global_regs[] = {
>   COMMON_BASE_GLOBAL,
>   COMMON_GEN9BASE_GLOBAL,
> + GEN9_GLOBAL,
>  };
>  
>  static const struct __guc_mmio_reg_descr default_rc_class_regs[] = {
> -- 
> 2.39.1
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


Re: [PATCH 7/7] drm/i915: Allow user to set cache at BO creation

2023-04-03 Thread Matt Roper
On Mon, Apr 03, 2023 at 07:02:08PM +0300, Ville Syrjälä wrote:
> On Fri, Mar 31, 2023 at 11:38:30PM -0700, fei.y...@intel.com wrote:
> > From: Fei Yang 
> > 
> > To comply with the design that buffer objects shall have immutable
> > cache setting throughout its life cycle, {set, get}_caching ioctl's
> > are no longer supported from MTL onward. With that change caching
> > policy can only be set at object creation time. The current code
> > applies a default (platform dependent) cache setting for all objects.
> > However this is not optimal for performance tuning. The patch extends
> > the existing gem_create uAPI to let user set PAT index for the object
> > at creation time.
> 
> This is missing the whole justification for the new uapi.
> Why is MOCS not sufficient?

PAT and MOCS are somewhat related, but they're not the same thing.  The
general direction of the hardware architecture recently has been to
slowly dumb down MOCS and move more of the important memory/cache
control over to the PAT instead.  On current platforms there is some
overlap (and MOCS has an "ignore PAT" setting that makes the MOCS "win"
for the specific fields that both can control), but MOCS doesn't have a
way to express things like snoop/coherency mode (on MTL), or class of
service (on PVC).  And if you check some of the future platforms, the
hardware design starts packing even more stuff into the PAT (not just
cache behavior) which will never be handled by MOCS.

Also keep in mind that MOCS generally applies at the GPU instruction
level; although a lot of instructions have a field to provide a MOCS
index, or can use a MOCS already associated with a surface state, there
are still some that don't.  PAT is the source of memory access
characteristics for anything that can't provide a MOCS directly.
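
As a rough mental model of the two points above, here is a standalone
sketch.  It is not driver or hardware code: the structs, the enum, and
effective_attrs() are all hypothetical, and the only rules it encodes
are the ones described in this mail (a MOCS with "ignore PAT" wins for
the fields both can control; anything without a MOCS takes its
characteristics from the PAT).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified attribute set */
enum cache_policy { CACHE_UC, CACHE_WT, CACHE_WB };

struct pat_entry {
	enum cache_policy cache;	/* a field MOCS can also control */
	bool coherent;			/* snoop/coherency: PAT only */
};

struct mocs_entry {
	bool ignore_pat;		/* the "ignore PAT" setting */
	enum cache_policy cache;
};

/*
 * mocs == NULL models an access that cannot provide a MOCS index.
 * What the hardware does when ignore_pat is clear is not described
 * above, so this sketch simply keeps the PAT value in that case.
 */
static struct pat_entry effective_attrs(struct pat_entry pat,
					const struct mocs_entry *mocs)
{
	struct pat_entry out = pat;

	if (mocs && mocs->ignore_pat)
		out.cache = mocs->cache;

	return out;
}
```

Note that the coherency field never changes in this model, which is the
reason MOCS alone can't cover what the new uapi is asking for.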


Matt

> 
> > The new extension is platform independent, so UMD's can switch to using
> > this extension for older platforms as well, while {set, get}_caching are
> > still supported on these legacy platforms for compatibility reasons.
> > 
> > Cc: Chris Wilson 
> > Cc: Matt Roper 
> > Signed-off-by: Fei Yang 
> > Reviewed-by: Andi Shyti 
> > ---
> >  drivers/gpu/drm/i915/gem/i915_gem_create.c | 33 
> >  include/uapi/drm/i915_drm.h| 36 ++
> >  tools/include/uapi/drm/i915_drm.h  | 36 ++
> >  3 files changed, 105 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c 
> > b/drivers/gpu/drm/i915/gem/i915_gem_create.c
> > index e76c9703680e..1c6e2034d28e 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
> > @@ -244,6 +244,7 @@ struct create_ext {
> > unsigned int n_placements;
> > unsigned int placement_mask;
> > unsigned long flags;
> > +   unsigned int pat_index;
> >  };
> >  
> >  static void repr_placements(char *buf, size_t size,
> > @@ -393,11 +394,39 @@ static int ext_set_protected(struct 
> > i915_user_extension __user *base, void *data
> > return 0;
> >  }
> >  
> > +static int ext_set_pat(struct i915_user_extension __user *base, void *data)
> > +{
> > +   struct create_ext *ext_data = data;
> > +   struct drm_i915_private *i915 = ext_data->i915;
> > +   struct drm_i915_gem_create_ext_set_pat ext;
> > +   unsigned int max_pat_index;
> > +
> > +   BUILD_BUG_ON(sizeof(struct drm_i915_gem_create_ext_set_pat) !=
> > +offsetofend(struct drm_i915_gem_create_ext_set_pat, rsvd));
> > +
> > > +   if (copy_from_user(&ext, base, sizeof(ext)))
> > +   return -EFAULT;
> > +
> > +   max_pat_index = INTEL_INFO(i915)->max_pat_index;
> > +
> > +   if (ext.pat_index > max_pat_index) {
> > > +   drm_dbg(&i915->drm, "PAT index is invalid: %u\n",
> > +   ext.pat_index);
> > +   return -EINVAL;
> > +   }
> > +
> > +   ext_data->pat_index = ext.pat_index;
> > +
> > +   return 0;
> > +}
> > +
> >  static const i915_user_extension_fn create_extensions[] = {
> > [I915_GEM_CREATE_EXT_MEMORY_REGIONS] = ext_set_placements,
> > [I915_GEM_CREATE_EXT_PROTECTED_CONTENT] = ext_set_protected,
> > +   [I915_GEM_CREATE_EXT_SET_PAT] = ext_set_pat,
> >  };
> >  
> > > +#define PAT_INDEX_NOT_SET  0xffffffff
> >  /**
> >   * Creates a new mm object and returns a handle to it.
> >   * @dev: drm device pointer
> > @@ -417,6 +446,7 @@ i915_gem_create_ext_ioctl(struct drm_device *dev, void 
> > *data,
> > if

Re: [Intel-xe] [PATCH 2/3] drm/xe: Fix platform order

2023-03-31 Thread Matt Roper
On Fri, Mar 31, 2023 at 07:22:06AM -0600, Lucas De Marchi wrote:
> On Mon, Mar 27, 2023 at 10:02:38AM -0700, Matt Roper wrote:
> > On Thu, Mar 23, 2023 at 10:17:53PM -0700, Lucas De Marchi wrote:
> > > Platform order is important when looping through the list of guc
> > > firmware blobs since we use it to prevent loading a blob for a newer
> > > platform onto an older one. Move PVC after ADL.
> > 
> > Shouldn't we be moving the ADL platforms (graphics versions 12.0) higher
> > than DG1 (12.10) and DG2 (12.50) too?
> 
> question then would be:  would we be ordering them by gt
> version?  Or by when they were introduced?

Since all of the platforms here have the GuC inside the graphics IP[*],
ordering by graphics IP version seems natural to me.

"When they were introduced" would be identical for all of these
platforms for the Xe driver (since we just dumped a big megapatch that
contained all of these platforms at once).  But if you want to match
when they were introduced *in i915* that would be reasonable too,
although the ADLs would still need to come before DG2 in that case.


Matt

[*] MTL has a GuC in both the graphics IP and the media IP.  One of our
questions early on was whether the GuC IP itself would differ between
the two GTs (requiring different firmwares for each).  The response that
came back from the hardware team was that that's technically possible
with standalone media, but at least for MTL they'd keep them identical.
So for now, just basing 100% on the graphics IP version seems fine.  In
the future we may need to stop tying GuC to platform at all and instead
match on the appropriate IP version for whichever GT we're loading on.
But that's a problem for the future...


> 
> I think it makes more sense to be by when they were introduced as a
> platform in the driver.
> 
>   1) what about media/display?
>   2) allow us to always be appending in the enum and elsewhere in
>   the driver.
> 
> Lucas De Marchi
> 
> > 
> > 
> > Matt
> > 
> > > 
> > > Signed-off-by: Lucas De Marchi 
> > > ---
> > >  drivers/gpu/drm/xe/xe_platform_types.h | 3 +--
> > >  drivers/gpu/drm/xe/xe_uc_fw.c  | 2 +-
> > >  2 files changed, 2 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/xe/xe_platform_types.h 
> > > b/drivers/gpu/drm/xe/xe_platform_types.h
> > > index 72612c832e88..10367f6cc75a 100644
> > > --- a/drivers/gpu/drm/xe/xe_platform_types.h
> > > +++ b/drivers/gpu/drm/xe/xe_platform_types.h
> > > @@ -9,14 +9,13 @@
> > >  /* Keep in gen based order, and chronological order within a gen */
> > >  enum xe_platform {
> > >   XE_PLATFORM_UNINITIALIZED = 0,
> > > - /* gen12 */
> > >   XE_TIGERLAKE,
> > >   XE_ROCKETLAKE,
> > >   XE_DG1,
> > >   XE_DG2,
> > > - XE_PVC,
> > >   XE_ALDERLAKE_S,
> > >   XE_ALDERLAKE_P,
> > > + XE_PVC,
> > >   XE_METEORLAKE,
> > >  };
> > > 
> > > diff --git a/drivers/gpu/drm/xe/xe_uc_fw.c b/drivers/gpu/drm/xe/xe_uc_fw.c
> > > index e2c982b37e87..174c42873ebb 100644
> > > --- a/drivers/gpu/drm/xe/xe_uc_fw.c
> > > +++ b/drivers/gpu/drm/xe/xe_uc_fw.c
> > > @@ -43,9 +43,9 @@ static struct xe_device *uc_fw_to_xe(struct xe_uc_fw 
> > > *uc_fw)
> > >   */
> > >  #define XE_GUC_FIRMWARE_DEFS(fw_def, guc_def) \
> > >   fw_def(METEORLAKE,   guc_def(mtl,  70, 5, 2)) \
> > > + fw_def(PVC,  guc_def(pvc,  70, 5, 2)) \
> > >   fw_def(ALDERLAKE_P,  guc_def(adlp,  70, 5, 2)) \
> > >   fw_def(ALDERLAKE_S,  guc_def(tgl,  70, 5, 2)) \
> > > - fw_def(PVC,  guc_def(pvc,  70, 5, 2)) \
> > >   fw_def(DG2,  guc_def(dg2,  70, 5, 2)) \
> > >   fw_def(DG1,  guc_def(dg1,  70, 5, 2)) \
> > >   fw_def(TIGERLAKE,guc_def(tgl,  70, 5, 2))
> > > --
> > > 2.39.0
> > > 
> > 
> > -- 
> > Matt Roper
> > Graphics Software Engineer
> > Linux GPU Platform Enablement
> > Intel Corporation

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


Re: [Intel-xe] [PATCH 2/3] drm/xe: Fix platform order

2023-03-27 Thread Matt Roper
On Thu, Mar 23, 2023 at 10:17:53PM -0700, Lucas De Marchi wrote:
> Platform order is important when looping through the list of guc
> firmware blobs since we use it to prevent loading a blob for a newer
> platform onto an older one. Move PVC after ADL.

Shouldn't we be moving the ADL platforms (graphics versions 12.0) higher
than DG1 (12.10) and DG2 (12.50) too?
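
To make the concern concrete, here is a small standalone sketch (not
xe code).  The struct, the tables, and first_out_of_order() are
hypothetical; the versions for ADL, DG1 and DG2 are the ones quoted
above, and 12.00 for TGL/RKL is my own addition.

```c
#include <stddef.h>

struct platform_entry {
	const char *name;
	unsigned int graphics_ver;	/* major * 100 + minor, 12.10 -> 1210 */
};

/* Enum order after this patch, up to the ADL entries */
static const struct platform_entry patched_order[] = {
	{ "TIGERLAKE",   1200 },	/* assumed 12.00 */
	{ "ROCKETLAKE",  1200 },	/* assumed 12.00 */
	{ "DG1",         1210 },
	{ "DG2",         1250 },
	{ "ALDERLAKE_S", 1200 },
	{ "ALDERLAKE_P", 1200 },
};

/* The same platforms sorted by graphics IP version */
static const struct platform_entry ip_version_order[] = {
	{ "TIGERLAKE",   1200 },
	{ "ROCKETLAKE",  1200 },
	{ "ALDERLAKE_S", 1200 },
	{ "ALDERLAKE_P", 1200 },
	{ "DG1",         1210 },
	{ "DG2",         1250 },
};

/* Index of the first entry older than its predecessor, or -1 if sorted */
static int first_out_of_order(const struct platform_entry *list, size_t count)
{
	size_t i;

	for (i = 1; i < count; i++)
		if (list[i].graphics_ver < list[i - 1].graphics_ver)
			return (int)i;

	return -1;
}
```

With the patched order the check trips at ALDERLAKE_S (index 4), since
it follows DG2 despite having an older graphics IP.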


Matt

> 
> Signed-off-by: Lucas De Marchi 
> ---
>  drivers/gpu/drm/xe/xe_platform_types.h | 3 +--
>  drivers/gpu/drm/xe/xe_uc_fw.c  | 2 +-
>  2 files changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_platform_types.h 
> b/drivers/gpu/drm/xe/xe_platform_types.h
> index 72612c832e88..10367f6cc75a 100644
> --- a/drivers/gpu/drm/xe/xe_platform_types.h
> +++ b/drivers/gpu/drm/xe/xe_platform_types.h
> @@ -9,14 +9,13 @@
>  /* Keep in gen based order, and chronological order within a gen */
>  enum xe_platform {
>   XE_PLATFORM_UNINITIALIZED = 0,
> - /* gen12 */
>   XE_TIGERLAKE,
>   XE_ROCKETLAKE,
>   XE_DG1,
>   XE_DG2,
> - XE_PVC,
>   XE_ALDERLAKE_S,
>   XE_ALDERLAKE_P,
> + XE_PVC,
>   XE_METEORLAKE,
>  };
>  
> diff --git a/drivers/gpu/drm/xe/xe_uc_fw.c b/drivers/gpu/drm/xe/xe_uc_fw.c
> index e2c982b37e87..174c42873ebb 100644
> --- a/drivers/gpu/drm/xe/xe_uc_fw.c
> +++ b/drivers/gpu/drm/xe/xe_uc_fw.c
> @@ -43,9 +43,9 @@ static struct xe_device *uc_fw_to_xe(struct xe_uc_fw *uc_fw)
>   */
>  #define XE_GUC_FIRMWARE_DEFS(fw_def, guc_def) \
>   fw_def(METEORLAKE,   guc_def(mtl,  70, 5, 2)) \
> + fw_def(PVC,  guc_def(pvc,  70, 5, 2)) \
>   fw_def(ALDERLAKE_P,  guc_def(adlp,  70, 5, 2)) \
>   fw_def(ALDERLAKE_S,  guc_def(tgl,  70, 5, 2)) \
> - fw_def(PVC,  guc_def(pvc,  70, 5, 2)) \
>   fw_def(DG2,  guc_def(dg2,  70, 5, 2)) \
>   fw_def(DG1,  guc_def(dg1,  70, 5, 2)) \
>   fw_def(TIGERLAKE,guc_def(tgl,  70, 5, 2))
> -- 
> 2.39.0
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


Re: [Intel-xe] [PATCH 1/3] drm/xe: Remove unused revid from firmware name

2023-03-27 Thread Matt Roper
On Thu, Mar 23, 2023 at 10:17:52PM -0700, Lucas De Marchi wrote:
> The rev field is always 0 so it ends up never used. In i915 it was
> introduced because of CML: up to rev 5 it reuses the guc and huc
> firmware blobs from KBL. After that there is a specific firmware for
> that platform.  This can be reintroduced later if ever needed.

I doubt we'd ever need the revid again; more likely we'd want a way to
select different firmwares for a given subplatform (which is something I
think we need to add anyway for ADL-N).

Reviewed-by: Matt Roper 


Matt

> 
> With the removal of revid the packed attribute in
> uc_fw_platform_requirement, which is there only for reducing the space
> these tables take, can also be removed since it has even more limited
> usefulness: currently there's only padding of 2 bytes. Remove the
> attribute to avoid the unaligned access.
> 
>   $ pahole -C uc_fw_platform_requirement build64/drivers/gpu/drm/xe/xe_uc_fw.o
>   struct uc_fw_platform_requirement {
>           enum xe_platform           p;            /*     0     4 */
>           const struct uc_fw_blob    blob;         /*     4    10 */
> 
>           /* size: 16, cachelines: 1, members: 2 */
>           /* padding: 2 */
>           /* last cacheline: 16 bytes */
>   };
> 
> Signed-off-by: Lucas De Marchi 
> ---
>  drivers/gpu/drm/xe/xe_uc_fw.c | 33 +++--
>  1 file changed, 15 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_uc_fw.c b/drivers/gpu/drm/xe/xe_uc_fw.c
> index e9b30e620fd9..e2c982b37e87 100644
> --- a/drivers/gpu/drm/xe/xe_uc_fw.c
> +++ b/drivers/gpu/drm/xe/xe_uc_fw.c
> @@ -39,21 +39,21 @@ static struct xe_device *uc_fw_to_xe(struct xe_uc_fw 
> *uc_fw)
>  
>  /*
>   * List of required GuC and HuC binaries per-platform.
> - * Must be ordered based on platform + revid, from newer to older.
> + * Must be ordered based on platform, from newer to older.
>   */
>  #define XE_GUC_FIRMWARE_DEFS(fw_def, guc_def) \
> - fw_def(METEORLAKE,   0, guc_def(mtl,  70, 5, 2)) \
> - fw_def(ALDERLAKE_P,  0, guc_def(adlp,  70, 5, 2)) \
> - fw_def(ALDERLAKE_S,  0, guc_def(tgl,  70, 5, 2)) \
> - fw_def(PVC,  0, guc_def(pvc,  70, 5, 2)) \
> - fw_def(DG2,  0, guc_def(dg2,  70, 5, 2)) \
> - fw_def(DG1,  0, guc_def(dg1,  70, 5, 2)) \
> - fw_def(TIGERLAKE,0, guc_def(tgl,  70, 5, 2))
> + fw_def(METEORLAKE,   guc_def(mtl,  70, 5, 2)) \
> + fw_def(ALDERLAKE_P,  guc_def(adlp,  70, 5, 2)) \
> + fw_def(ALDERLAKE_S,  guc_def(tgl,  70, 5, 2)) \
> + fw_def(PVC,  guc_def(pvc,  70, 5, 2)) \
> + fw_def(DG2,  guc_def(dg2,  70, 5, 2)) \
> + fw_def(DG1,  guc_def(dg1,  70, 5, 2)) \
> + fw_def(TIGERLAKE,guc_def(tgl,  70, 5, 2))
>  
>  #define XE_HUC_FIRMWARE_DEFS(fw_def, huc_def, huc_ver) \
> - fw_def(ALDERLAKE_S,  0, huc_def(tgl)) \
> - fw_def(DG1,  0, huc_def(dg1)) \
> - fw_def(TIGERLAKE,0, huc_def(tgl))
> + fw_def(ALDERLAKE_S, huc_def(tgl)) \
> + fw_def(DG1, huc_def(dg1)) \
> + fw_def(TIGERLAKE,   huc_def(tgl))
>  
>  #define __MAKE_HUC_FW_PATH(prefix_, name_) \
>  "i915/" \
> @@ -82,7 +82,7 @@ static struct xe_device *uc_fw_to_xe(struct xe_uc_fw *uc_fw)
>  
>  
>  /* All blobs need to be declared via MODULE_FIRMWARE() */
> -#define XE_UC_MODULE_FW(platform_, revid_, uc_) \
> +#define XE_UC_MODULE_FW(platform_, uc_) \
>   MODULE_FIRMWARE(uc_);
>  
>  XE_GUC_FIRMWARE_DEFS(XE_UC_MODULE_FW, MAKE_GUC_FW_PATH)
> @@ -109,16 +109,14 @@ struct __packed uc_fw_blob {
>   UC_FW_BLOB(major_, minor_, \
>  MAKE_HUC_FW_PATH_FULL_VER(prefix_, major_, minor_, bld_num_))
>  
> -struct __packed uc_fw_platform_requirement {
> +struct uc_fw_platform_requirement {
>   enum xe_platform p;
> - u8 rev; /* first platform rev using this FW */
>   const struct uc_fw_blob blob;
>  };
>  
> -#define MAKE_FW_LIST(platform_, revid_, uc_) \
> +#define MAKE_FW_LIST(platform_, uc_) \
>  { \
>   .p = XE_##platform_, \
> - .rev = revid_, \
>   .blob = uc_, \
>  },
>  
> @@ -143,7 +141,6 @@ uc_fw_auto_select(struct xe_device *xe, struct xe_uc_fw 
> *uc_fw)
>   static const struct uc_fw_platform_requirement *fw_blobs;
>   enum xe_platform p = xe->info.platform;
>   u32 fw_count;
> - u8 rev = xe->info.revid;
>   int i;
>  
>   XE_BUG_ON(uc_fw->type >= ARRAY_SIZE(blobs_all));
> @@ -151,7 +148,7 @@ uc_fw_auto_select(struct xe_device *xe, struct xe_uc_fw 
> *uc_fw)
>   fw_count = blobs_all[uc_fw->typ

Re: [Intel-gfx] [PATCH] drm/i915: Make IRQ reset and postinstall multi-gt aware

2023-03-21 Thread Matt Roper
On Wed, Mar 22, 2023 at 12:20:09AM +0100, Andi Shyti wrote:
> From: Paulo Zanoni 
> 
> In multitile systems, IRQs need to be reset and enabled per GT.

We aren't enabling multi-tile support on any platforms
yet.  Xe_HP SDV has pretty much already served its purpose as an early
Xe_HP test platform, and most PVC effort is refocusing on the Xe KMD
right now.

Note that we don't want/need changes like this on non-tile multi-gt
platforms like MTL.  The interrupt registers you're accessing here are
sgunit registers so there's only ever a single copy of the register on
such platforms; looping around and processing the same register two
times in a row doesn't accomplish anything that just processing them a
single time doesn't.


Matt

> 
> Signed-off-by: Paulo Zanoni 
> Cc: Tvrtko Ursulin 
> Signed-off-by: Andi Shyti 
> ---
>  drivers/gpu/drm/i915/i915_irq.c | 28 ++--
>  1 file changed, 18 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 31271c30a8cf4..ee4530ec14de3 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -2762,14 +2762,19 @@ static void dg1_irq_reset(struct drm_i915_private 
> *dev_priv)
>  {
>   struct intel_gt *gt = to_gt(dev_priv);
>   struct intel_uncore *uncore = gt->uncore;
> + unsigned int i;
>  
>   dg1_master_intr_disable(dev_priv->uncore.regs);
>  
> - gen11_gt_irq_reset(gt);
> - gen11_display_irq_reset(dev_priv);
> + for_each_gt(gt, dev_priv, i) {
> + gen11_gt_irq_reset(gt);
>  
> - GEN3_IRQ_RESET(uncore, GEN11_GU_MISC_);
> - GEN3_IRQ_RESET(uncore, GEN8_PCU_);
> + uncore = gt->uncore;
> + GEN3_IRQ_RESET(uncore, GEN11_GU_MISC_);
> + GEN3_IRQ_RESET(uncore, GEN8_PCU_);
> + }
> +
> + gen11_display_irq_reset(dev_priv);
>  }
>  
>  void gen8_irq_power_well_post_enable(struct drm_i915_private *dev_priv,
> @@ -3423,13 +3428,16 @@ static void gen11_irq_postinstall(struct 
> drm_i915_private *dev_priv)
>  
>  static void dg1_irq_postinstall(struct drm_i915_private *dev_priv)
>  {
> - struct intel_gt *gt = to_gt(dev_priv);
> - struct intel_uncore *uncore = gt->uncore;
>   u32 gu_misc_masked = GEN11_GU_MISC_GSE;
> + struct intel_gt *gt;
> + unsigned int i;
>  
> - gen11_gt_irq_postinstall(gt);
> + for_each_gt(gt, dev_priv, i) {
> + gen11_gt_irq_postinstall(gt);
>  
> - GEN3_IRQ_INIT(uncore, GEN11_GU_MISC_, ~gu_misc_masked, gu_misc_masked);
> + GEN3_IRQ_INIT(gt->uncore, GEN11_GU_MISC_, ~gu_misc_masked,
> +   gu_misc_masked);
> + }
>  
>   if (HAS_DISPLAY(dev_priv)) {
>   icp_irq_postinstall(dev_priv);
> @@ -3438,8 +3446,8 @@ static void dg1_irq_postinstall(struct drm_i915_private 
> *dev_priv)
>  GEN11_DISPLAY_IRQ_ENABLE);
>   }
>  
> - dg1_master_intr_enable(uncore->regs);
> - intel_uncore_posting_read(uncore, DG1_MSTR_TILE_INTR);
> + dg1_master_intr_enable(to_gt(dev_priv)->uncore->regs);
> + intel_uncore_posting_read(to_gt(dev_priv)->uncore, DG1_MSTR_TILE_INTR);
>  }
>  
>  static void cherryview_irq_postinstall(struct drm_i915_private *dev_priv)
> -- 
> 2.39.2
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


Re: [PATCH v2 1/2] drm/i915: Sanitycheck MMIO access early in driver load

2023-03-21 Thread Matt Roper
On Tue, Mar 21, 2023 at 06:09:35PM +0100, Andi Shyti wrote:
> From: Matt Roper 
> 
> We occasionally see the PCI device in a non-accessible state at the
> point the driver is loaded.  When this happens, all BAR accesses will
> read back as 0xFFFFFFFF.  Rather than reading registers and
> misinterpreting their (invalid) values, let's specifically check for
> 0xFFFFFFFF in a register that cannot have that value to see if the
> device is accessible.
> 
> Signed-off-by: Matt Roper 
> Cc: Mika Kuoppala 
> Signed-off-by: Andi Shyti 
> ---
>  drivers/gpu/drm/i915/intel_uncore.c | 35 +
>  1 file changed, 35 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/intel_uncore.c 
> b/drivers/gpu/drm/i915/intel_uncore.c
> index e1e1f34490c8e..0b69081d6d285 100644
> --- a/drivers/gpu/drm/i915/intel_uncore.c
> +++ b/drivers/gpu/drm/i915/intel_uncore.c
> @@ -2602,11 +2602,46 @@ static int uncore_forcewake_init(struct intel_uncore 
> *uncore)
>   return 0;
>  }
>  
> +static int sanity_check_mmio_access(struct intel_uncore *uncore)
> +{
> + struct drm_i915_private *i915 = uncore->i915;
> + int ret;
> +
> + if (GRAPHICS_VER(i915) < 8)
> + return 0;
> +
> + /*
> +  * Sanitycheck that MMIO access to the device is working properly.  If
> +  * the CPU is unable to communicate with a PCI device, BAR reads will
> +  * return 0xFFFFFFFF.  Let's make sure the device isn't in this state
> +  * before we start trying to access registers.
> +  *
> +  * We use the primary GT's forcewake register as our guinea pig since
> +  * it's been around since HSW and it's a masked register so the upper
> +  * 16 bits can never read back as 1's if device access is operating
> +  * properly.
> +  *
> +  * If MMIO isn't working, we'll wait up to 2 seconds to see if it
> +  * recovers, then give up.
> +  */
> + ret = intel_wait_for_register_fw(uncore, FORCEWAKE_MT, 0, 0, 200);

It looks like you lost the check for 0xFFFFFFFF specifically.  In fact,
with a mask/value of 0, isn't this just going to pass immediately every
time?

We don't know what the value of this register will be (there may or may
not be some bits set), but we need to make sure that it isn't 0xFFFFFFFF
because that means we're not even truly accessing the register, just
hitting a PCI BAR read failure.
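
To spell out the problem, here is a standalone sketch (not driver
code).  Both helper names are made up, and I'm assuming the wait helper
succeeds as soon as (reg & mask) == value:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the condition polled by a register wait:
 * the wait completes once (reg & mask) == value.
 */
static bool wait_cond_met(uint32_t reg, uint32_t mask, uint32_t value)
{
	return (reg & mask) == value;
}

/*
 * Hypothetical explicit check: an all-ones readback (0xFFFFFFFF) means
 * the BAR read itself failed, so the register was never really read.
 */
static bool mmio_read_is_valid(uint32_t reg)
{
	return reg != 0xFFFFFFFFu;
}
```

With mask == 0 and value == 0 the first condition is true for every
possible readback, including all-ones, so the wait can never time out.
Only something like the second check actually distinguishes a dead
device from a live one.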


Matt

> + if (ret == -ETIMEDOUT) {
> + drm_err(&i915->drm, "Device is non-operational; MMIO access 
> returns 0xFFFFFFFF!\n");
> + return -EIO;
> + }
> +
> + return 0;
> +}
> +
>  int intel_uncore_init_mmio(struct intel_uncore *uncore)
>  {
>   struct drm_i915_private *i915 = uncore->i915;
>   int ret;
>  
> + ret = sanity_check_mmio_access(uncore);
> + if (ret)
> +     return ret;
> +
>   /*
>* The boot firmware initializes local memory and assesses its health.
>* If memory training fails, the punit will have been instructed to
> -- 
> 2.39.2
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


Re: [PATCH] drm/i915/selftests: keep same cache settings as timeline

2023-03-17 Thread Matt Roper
On Thu, Mar 16, 2023 at 08:43:46PM -0700, Yang, Fei wrote:
> >> From: Fei Yang 
> >>
> >> On MTL, objects allocated through i915_gem_object_create_internal() are
> >> mapped as uncached in GPU by default because HAS_LLC is false. However
> >> in the live_hwsp_read selftest these watcher objects are mapped as WB
> >> on CPU side. The consequence is that the updates done by the GPU are not
> >> immediately visible to CPU, thus the selftest is randomly failing due to
> >> the stale data in CPU cache. Solution can be either setting WC for CPU +
> >> UC for GPU, or WB for CPU + 1-way coherent WB for GPU.
> >> To keep the consistency, let's simply inherit the same cache settings
> >> from the timeline, which is WB for CPU + 1-way coherent WB for GPU,
> >> because this test is supposed to emulate the behavior of the timeline
> >> anyway.
> >>
> >> Signed-off-by: Fei Yang 
> >
> > It looks like there might be an indentation mistake on the second line
> > of the i915_gem_object_pin_map_unlocked() call, but we can fix that up
> > while applying; no need to re-send.
> >
> > Reviewed-by: Matt Roper 
> 
> Thanks for reviewing.
> 
> > Is there an FDO issue # for the random failures that were being seen?
> > If so, we should add a Closes: or References: tag here.
> 
> I'm not aware of an FDO issue filed for this failure. That might be because the
> issue is reproduced on MTL which might not be widely available to the
> community yet.

Yeah, I was thinking CI would have filed some, but I just remembered we
don't have public CI setup yet for MTL, so no automated bugs are coming
in yet.

Applied to drm-intel-gt-next.  Thanks for the patch.


Matt

> 
> > Matt
> >> ---
> >>  drivers/gpu/drm/i915/gt/selftest_timeline.c | 14 +++---
> >>  1 file changed, 11 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/i915/gt/selftest_timeline.c 
> >> b/drivers/gpu/drm/i915/gt/selftest_timeline.c
> >> index 522d0190509c..631aaed9bc3d 100644
> >> --- a/drivers/gpu/drm/i915/gt/selftest_timeline.c
> >> +++ b/drivers/gpu/drm/i915/gt/selftest_timeline.c
> >> @@ -825,7 +825,8 @@ static bool cmp_gte(u32 a, u32 b)
> >>return a >= b;
> >>  }
> >>
> >> -static int setup_watcher(struct hwsp_watcher *w, struct intel_gt *gt)
> >> +static int setup_watcher(struct hwsp_watcher *w, struct intel_gt *gt,
> >> +  struct intel_timeline *tl)
> >>  {
> >>struct drm_i915_gem_object *obj;
> >>struct i915_vma *vma;
> >> @@ -834,7 +835,10 @@ static int setup_watcher(struct hwsp_watcher *w, 
> >> struct intel_gt *gt)
> >>if (IS_ERR(obj))
> >>return PTR_ERR(obj);
> >>
> >> - w->map = i915_gem_object_pin_map_unlocked(obj, I915_MAP_WB);
> >> + /* keep the same cache settings as timeline */
> >> + i915_gem_object_set_cache_coherency(obj, 
> >> tl->hwsp_ggtt->obj->cache_level);
> >> + w->map = i915_gem_object_pin_map_unlocked(obj,
> >> + page_unmask_bits(tl->hwsp_ggtt->obj->mm.mapping));
> >>if (IS_ERR(w->map)) {
> >>i915_gem_object_put(obj);
> >>return PTR_ERR(w->map);
> >> @@ -1004,8 +1008,10 @@ static int live_hwsp_read(void *arg)
> >>if (!tl->has_initial_breadcrumb)
> >>goto out_free;
> >>
> >> + selftest_tl_pin(tl);
> >> +
> >>for (i = 0; i < ARRAY_SIZE(watcher); i++) {
> >> - err = setup_watcher(&watcher[i], gt);
> >> + err = setup_watcher(&watcher[i], gt, tl);
> >>if (err)
> >>goto out;
> >>}
> >> @@ -1160,6 +1166,8 @@ static int live_hwsp_read(void *arg)
> >>for (i = 0; i < ARRAY_SIZE(watcher); i++)
> >>cleanup_watcher(&watcher[i]);
> >>
> >> + intel_timeline_unpin(tl);
> >> +
> >>if (igt_flush_test(gt->i915))
> >>err = -EIO;
> >>
> >> --
> >> 2.25.1

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation

