Re: [Intel-gfx] [PATCH] drm/i915/selftests: Drop igt_cs_tlb

2023-02-17 Thread Chris Wilson
Quoting Jonathan Cavitt (2023-02-17 17:20:19)
> The gt_tlb live selftest has the same code coverage as the
> igt_cs_tlb subtest of gtt.

True, the intent of the code is the same, but gt_tlb has had a much higher
success rate at hitting TLB bugs, so it is my preferred test.

> However, igt_cs_tlb is hitting
> an issue in that we are updating PTE in gt0->vm but
> executing on the media tile without updating gt1->vm.

I'm no longer convinced this is a good explanation of the issue, as unlike
the i915_requests selftest this is using a GEM context and not the local
kernel contexts. The GEM context->vm should work equally on the mtl
render and media tiles, afaict.

The failure occurs very early in running on the media tile (after running on the
render tile), so I think it should be easy enough to reproduce in a
simpler test to narrow down the cause.

> This issue is corrected in gt_tlb, and thus igt_cs_tlb
> is obsolete and should be removed.

gt_tlb supersedes igt_cs_tlb, that I can agree on,
Acked-by: Chris Wilson 
-Chris


Re: [Intel-gfx] [PATCH] drm/i915: avoid concurrent writes to aux_inv

2022-03-02 Thread Chris Wilson
Quoting fei.y...@intel.com (2022-03-02 18:26:57)
> From: Fei Yang 
> 
> GPU hangs have been observed when multiple engines write to the
> same aux_inv register at the same time. To avoid this each engine
> should only invalidate its own auxiliary table. The function
> gen12_emit_flush_xcs() currently invalidates the auxiliary table for
> all engines because the rq->engine is not necessarily the engine
> eventually carrying out the request, and potentially the engine
> could even be a virtual one (with engine->instance being -1).
> With this patch, auxiliary table invalidation is done only for the
> engine executing the request. And the mmio address for the aux_inv
> register is set after the engine instance becomes certain.
> 
> Signed-off-by: Chris Wilson 
> Signed-off-by: Fei Yang 
> ---
>  drivers/gpu/drm/i915/gt/gen8_engine_cs.c  | 41 ---
>  .../drm/i915/gt/intel_execlists_submission.c  | 38 +
>  drivers/gpu/drm/i915/i915_request.h   |  2 +
>  3 files changed, 47 insertions(+), 34 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c 
> b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
> index b1b9c3fd7bf9..af62e2bc2c9b 100644
> --- a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
> +++ b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
> @@ -165,30 +165,6 @@ static u32 preparser_disable(bool state)
> return MI_ARB_CHECK | 1 << 8 | state;
>  }
>  
> -static i915_reg_t aux_inv_reg(const struct intel_engine_cs *engine)
> -{
> -   static const i915_reg_t vd[] = {
> -   GEN12_VD0_AUX_NV,
> -   GEN12_VD1_AUX_NV,
> -   GEN12_VD2_AUX_NV,
> -   GEN12_VD3_AUX_NV,
> -   };
> -
> -   static const i915_reg_t ve[] = {
> -   GEN12_VE0_AUX_NV,
> -   GEN12_VE1_AUX_NV,
> -   };
> -
> -   if (engine->class == VIDEO_DECODE_CLASS)
> -   return vd[engine->instance];
> -
> -   if (engine->class == VIDEO_ENHANCEMENT_CLASS)
> -   return ve[engine->instance];
> -
> -   GEM_BUG_ON("unknown aux_inv reg\n");
> -   return INVALID_MMIO_REG;
> -}
> -
>  static u32 *gen12_emit_aux_table_inv(const i915_reg_t inv_reg, u32 *cs)
>  {
> *cs++ = MI_LOAD_REGISTER_IMM(1);
> @@ -288,7 +264,7 @@ int gen12_emit_flush_xcs(struct i915_request *rq, u32 
> mode)
> if (mode & EMIT_INVALIDATE)
> aux_inv = rq->engine->mask & ~BIT(BCS0);
> if (aux_inv)
> -   cmd += 2 * hweight32(aux_inv) + 2;
> +   cmd += 4;
>  
> cs = intel_ring_begin(rq, cmd);
> if (IS_ERR(cs))
> @@ -319,16 +295,13 @@ int gen12_emit_flush_xcs(struct i915_request *rq, u32 
> mode)
> *cs++ = 0; /* value */
>  
> if (aux_inv) { /* hsdes: 1809175790 */
> -   struct intel_engine_cs *engine;
> -   unsigned int tmp;
> -
> -   *cs++ = MI_LOAD_REGISTER_IMM(hweight32(aux_inv));
> -   for_each_engine_masked(engine, rq->engine->gt, aux_inv, tmp) {
> -   *cs++ = i915_mmio_reg_offset(aux_inv_reg(engine));
> -   *cs++ = AUX_INV;
> -   }
> +   *cs++ = MI_LOAD_REGISTER_IMM(1);
> +   rq->vd_ve_aux_inv = cs;
> +   *cs++ = 0; /* address to be set at submission to HW */
> +   *cs++ = AUX_INV;
> *cs++ = MI_NOOP;
> -   }
> +   } else
> +   rq->vd_ve_aux_inv = NULL;
>  
> if (mode & EMIT_INVALIDATE)
> *cs++ = preparser_disable(false);
> diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
> b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> index 1c602d4ae297..a018de6dcac5 100644
> --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> @@ -1258,6 +1258,34 @@ static bool completed(const struct i915_request *rq)
> return __i915_request_is_complete(rq);
>  }
>  
> +static i915_reg_t aux_inv_reg(const struct intel_engine_cs *engine)
> +{
> +   static const i915_reg_t vd[] = {
> +   GEN12_VD0_AUX_NV,
> +   GEN12_VD1_AUX_NV,
> +   GEN12_VD2_AUX_NV,
> +   GEN12_VD3_AUX_NV,
> +   };
> +
> +   static const i915_reg_t ve[] = {
> +   GEN12_VE0_AUX_NV,
> +   GEN12_VE1_AUX_NV,
> +   };
> +
> +   if (engine->class == VIDEO_DECODE_CLASS) {
> +   GEM_BUG_ON(engine->instance >= ARRAY_SIZE(vd));
> +   return vd[engine->insta
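For illustration, a rough sketch of the submission-side half of this approach:
once the request has been assigned to a physical engine, the reserved LRI dword
is pointed at that engine's own AUX_NV register. The helper name and its
placement in the execlists submit path are assumptions; vd_ve_aux_inv and
aux_inv_reg() come from the patch itself.

    /* Patch the reserved LRI slot now that the physical engine is known. */
    static void fixup_vd_ve_aux_inv(struct i915_request *rq,
                                    const struct intel_engine_cs *engine)
    {
            if (!rq->vd_ve_aux_inv)
                    return; /* no aux invalidation emitted for this request */

            *rq->vd_ve_aux_inv = i915_mmio_reg_offset(aux_inv_reg(engine));
    }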

Re: [Intel-gfx] [igt-dev] [PATCH i-g-t] lib/igt_device: Add support for accessing unbound VF PCI devices

2022-02-18 Thread Chris Wilson
Quoting Janusz Krzysztofik (2022-02-18 17:08:41)
> Hi Chris,
> 
> On Friday, 18 February 2022 17:03:01 CET Chris Wilson wrote:
> > Quoting Janusz Krzysztofik (2022-02-18 15:19:35)
> > > @@ -206,15 +229,19 @@ static struct pci_device 
> > > *__igt_device_get_pci_device(int fd)
> > > igt_warn("Couldn't find PCI device %04x:%02x:%02x:%02x\n",
> > >  pci_addr.domain, pci_addr.bus,
> > >  pci_addr.device, pci_addr.function);
> > > -   return NULL;
> > > +   goto cleanup;
> > > }
> > >  
> > > if (pci_device_probe(pci_dev)) {
> > > igt_warn("Couldn't probe PCI device\n");
> > > -   return NULL;
> > > +   goto cleanup;
> > > }
> > >  
> > > return pci_dev;
> > > +
> > > +cleanup:
> > > +   pci_system_cleanup();
> > 
> > This is a global cleanup of libpciaccess iirc, such that if anyone else
> > was using the library they would be affected.
> 
> Right, but shouldn't we also drop pci_system_init() from here and request 
> users to manage initialization and cleanup of that data themselves?  On each 
> call pci_system_init() abandons existing data and overwrites a pointer to it 
> with that of newly allocated memory, so tests calling 
> igt_device_get_pci_device() multiple times are going to suffer from 
> significant memory leaks.

Right, I thought it only inited once -- I just remember the issue with
calling pci_system_cleanup() while others were still using it.

Stick the call to init in an __attribute__((constructor)) or pthread_once.
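A minimal sketch of the two options (pick one), assuming the one-time init lives
alongside __igt_device_get_pci_device() and that callers no longer tear down the
global libpciaccess state themselves; helper names are illustrative:

    #include <pthread.h>
    #include <stdio.h>
    #include <pciaccess.h>

    static void igt_pci_system_init(void)
    {
            if (pci_system_init())
                    fprintf(stderr, "Couldn't initialise libpciaccess\n");
    }

    /* Option 1: initialise once at library load time. */
    static void __attribute__((constructor)) igt_pci_system_ctor(void)
    {
            igt_pci_system_init();
    }

    /* Option 2: initialise lazily on first use. */
    static pthread_once_t igt_pci_once = PTHREAD_ONCE_INIT;

    static void igt_pci_system_init_once(void)
    {
            pthread_once(&igt_pci_once, igt_pci_system_init);
    }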
-Chris


Re: [Intel-gfx] [igt-dev] [PATCH i-g-t] lib/igt_device: Add support for accessing unbound VF PCI devices

2022-02-18 Thread Chris Wilson
Quoting Janusz Krzysztofik (2022-02-18 15:19:35)
> @@ -206,15 +229,19 @@ static struct pci_device 
> *__igt_device_get_pci_device(int fd)
> igt_warn("Couldn't find PCI device %04x:%02x:%02x:%02x\n",
>  pci_addr.domain, pci_addr.bus,
>  pci_addr.device, pci_addr.function);
> -   return NULL;
> +   goto cleanup;
> }
>  
> if (pci_device_probe(pci_dev)) {
> igt_warn("Couldn't probe PCI device\n");
> -   return NULL;
> +   goto cleanup;
> }
>  
> return pci_dev;
> +
> +cleanup:
> +   pci_system_cleanup();

This is a global cleanup of libpciaccess iirc, such that if anyone else
was using the library they would be affected.

> +   return NULL;
>  }


Re: [Intel-gfx] [PATCH 0/2] Backport upstream commit e49a8b2cc852

2021-12-03 Thread Chris Wilson
Quoting Janusz Krzysztofik (2021-12-03 13:21:06)
> diff --git a/0001-drm-i915-gt-Cleanup-partial-engine-discovery-failure.patch 
> b/0001-drm-i915-gt-Cleanup-partial-engine-discovery-failure.patch
> index efadd30d8cad..62b0a41d4aa4 100644
> --- a/0001-drm-i915-gt-Cleanup-partial-engine-discovery-failure.patch
> +++ b/0001-drm-i915-gt-Cleanup-partial-engine-discovery-failure.patch
> @@ -8,20 +8,20 @@ some engines will be fully setup and some not. Those 
> incompletely setup
>  engines only have 'engine->release == NULL' and so will leak any of the
>  common objects allocated.
>  
> -Incorporates some suggestions from Janusz for handling pinned context
> -cleanup.
> +v2: no longer incorporates suggestions from Janusz for handling pinned
> +context cleanup since upstream version has been backported

Noting the absence of something not explained just adds confusion.

You've handled this appropriately as a standalone patch, no more needs
to be said here (and the chunk doesn't even appear in the patch
anymore).

The covernote explained why this chunk disappears, so we have a story
for the pile, which is a different story from that told by the series of
individual patches.

Reviewed-by: Chris Wilson 
-Chris


Re: [Intel-gfx] [PATCH 2/2] drm/i915: Rename gt to gt0

2021-11-17 Thread Chris Wilson
Quoting Andi Shyti (2021-11-17 13:34:56)
> diff --git a/drivers/gpu/drm/i915/display/intel_atomic_plane.c 
> b/drivers/gpu/drm/i915/display/intel_atomic_plane.c
> index 089fb4658b216..0bbf8c0c42eac 100644
> --- a/drivers/gpu/drm/i915/display/intel_atomic_plane.c
> +++ b/drivers/gpu/drm/i915/display/intel_atomic_plane.c
> @@ -817,7 +817,7 @@ intel_prepare_plane_fb(struct drm_plane *_plane,
>  * maximum clocks following a vblank miss (see do_rps_boost()).
>  */
> if (!state->rps_interactive) {
> -   intel_rps_mark_interactive(&dev_priv->gt.rps, true);
> +   intel_rps_mark_interactive(&dev_priv->gt0.rps, true);

This should be across all gt, so probably wants a fresh interface that
takes i915 and does for_each_gt in a later patch. (Since we could be
rendering on a remote tile to present on a display.)
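For illustration, the shape such an interface might take; the wrapper name and
the for_each_gt() iterator are assumptions of the multi-gt rework rather than
existing code:

    /* Mark rps interactive on every tile, not just the primary gt. */
    static void intel_rps_mark_interactive_all(struct drm_i915_private *i915,
                                               bool interactive)
    {
            struct intel_gt *gt;
            unsigned int i;

            for_each_gt(gt, i915, i)
                    intel_rps_mark_interactive(&gt->rps, interactive);
    }

The display code would then call the wrapper instead of reaching into gt0
directly.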

> state->rps_interactive = true;
> }
>  
> @@ -851,7 +851,7 @@ intel_cleanup_plane_fb(struct drm_plane *plane,
> return;
>  
> if (state->rps_interactive) {
> -   intel_rps_mark_interactive(&dev_priv->gt.rps, false);
> +   intel_rps_mark_interactive(&dev_priv->gt0.rps, false);
> state->rps_interactive = false;
> }
>  
> diff --git a/drivers/gpu/drm/i915/display/intel_display.c 
> b/drivers/gpu/drm/i915/display/intel_display.c
> index 0ceee8ac66717..d4fcd8f236476 100644
> --- a/drivers/gpu/drm/i915/display/intel_display.c
> +++ b/drivers/gpu/drm/i915/display/intel_display.c
> @@ -838,7 +838,7 @@ __intel_display_resume(struct drm_device *dev,
>  static bool gpu_reset_clobbers_display(struct drm_i915_private *dev_priv)
>  {
> return (INTEL_INFO(dev_priv)->gpu_reset_clobbers_display &&
> -   intel_has_gpu_reset(&dev_priv->gt));
> +   intel_has_gpu_reset(&dev_priv->gt0));

All these display consumers probably want to use
dev_priv->ggtt->vm.gt, since the scanout capable GGTT would seem to be
the defining feature.

to_scanout_gt(i915) ?
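For example, a sketch of that helper; it assumes dev_priv->ggtt is (or becomes)
a pointer and that the GGTT keeps a backpointer to its gt via ggtt->vm.gt, as
described above:

    /* The gt that owns the scanout-capable GGTT. */
    static inline struct intel_gt *to_scanout_gt(struct drm_i915_private *i915)
    {
            return i915->ggtt->vm.gt;
    }

so gpu_reset_clobbers_display() and friends would test
intel_has_gpu_reset(to_scanout_gt(dev_priv)) rather than naming gt0.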

>  static bool pxp_is_borked(struct drm_i915_gem_object *obj)
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> index ebd775cb1661c..c62253d0af044 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> @@ -237,7 +237,7 @@ static int proto_context_set_persistence(struct 
> drm_i915_private *i915,
>  * colateral damage, and we should not pretend we can by
>  * exposing the interface.
>  */
> -   if (!intel_has_reset_engine(&i915->gt))
> +   if (!intel_has_reset_engine(&i915->gt0))
> return -ENODEV;

Prep for all gt. A lot of these need an all-gt interface so we don't
have for_each_gt spread all over the place.

> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
> index ef22d4ed66ad6..69ad407eb15f3 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
> @@ -166,7 +166,7 @@ static struct dma_fence *i915_ttm_accel_move(struct 
> ttm_buffer_object *bo,
> enum i915_cache_level src_level, dst_level;
> int ret;
>  
> -   if (!i915->gt.migrate.context || intel_gt_is_wedged(&i915->gt))
> +   if (!i915->gt0.migrate.context || intel_gt_is_wedged(&i915->gt0))

This should already be looking at lmem->gt

> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c 
> b/drivers/gpu/drm/i915/gt/intel_engine_user.c
> index 8f8bea08e734d..176ea5c7d422f 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_user.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c
> @@ -116,7 +116,7 @@ static void set_scheduler_caps(struct drm_i915_private 
> *i915)
> disabled |= (I915_SCHEDULER_CAP_ENABLED |
>  I915_SCHEDULER_CAP_PRIORITY);
>  
> -   if (intel_uc_uses_guc_submission(&i915->gt.uc))
> +   if (intel_uc_uses_guc_submission(&i915->gt0.uc))

This shouldn't be looking at gt at all, but if it must, that information
must be coming via engine->gt. Kind of renders the mapping moot
currently.

> diff --git a/drivers/gpu/drm/i915/gt/intel_rps.c 
> b/drivers/gpu/drm/i915/gt/intel_rps.c
> index 07ff7ba7b2b71..63089e671a242 100644
> --- a/drivers/gpu/drm/i915/gt/intel_rps.c
> +++ b/drivers/gpu/drm/i915/gt/intel_rps.c
> @@ -2302,7 +2302,7 @@ unsigned long i915_read_mch_val(void)
> return 0;
>  
> with_intel_runtime_pm(&i915->runtime_pm, wakeref) {
> -   struct intel_ips *ips = &i915->gt.rps.ips;
> +   struct intel_ips *ips = &i915->gt0.rps.ips;

Make mchdev_get() return the gt or rps, at the slight cost of making the
drm_dev_put() more complicated (but can be pushed into a mchdev_put for
symmetry).
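A rough sketch of that shape, reusing the existing rcu-protected i915_mch_dev
lookup in intel_rps.c (the get/put split returning the rps, and the
rps_to_i915() backpointer, are assumptions of the suggestion, not existing
code):

    static struct intel_rps *mchdev_get(void)
    {
            struct drm_i915_private *i915;

            rcu_read_lock();
            i915 = rcu_dereference(i915_mch_dev);
            if (i915 && !kref_get_unless_zero(&i915->drm.ref))
                    i915 = NULL;
            rcu_read_unlock();

            return i915 ? &i915->gt0.rps : NULL;
    }

    static void mchdev_put(struct intel_rps *rps)
    {
            if (rps)
                    drm_dev_put(&rps_to_i915(rps)->drm);
    }

Then i915_read_mch_val() and the other ips exports only ever see the rps.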

> diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c 
> b/drivers/gpu/drm/i915/gt/intel_workarounds.c
> index a9727447c0379..4bfedc04f5c70 100644

Re: [Intel-gfx] [PATCH] drm/i915: Avoid div-by-zero on gen2

2021-03-23 Thread Chris Wilson
Quoting Ville Syrjälä (2021-03-22 14:48:44)
> On Sun, Mar 21, 2021 at 04:30:32PM +0000, Chris Wilson wrote:
> > Quoting Chris Wilson (2021-03-21 16:28:07)
> > > Quoting Ville Syrjala (2021-03-21 16:10:38)
> > > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c 
> > > > b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> > > > index ec28a6cde49b..0b2434e29d00 100644
> > > > --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> > > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> > > > @@ -189,7 +189,7 @@ compute_partial_view(const struct 
> > > > drm_i915_gem_object *obj,
> > > > struct i915_ggtt_view view;
> > > >  
> > > > if (i915_gem_object_is_tiled(obj))
> > > > -   chunk = roundup(chunk, tile_row_pages(obj));
> > > > +   chunk = roundup(chunk, tile_row_pages(obj) ?: 1);
> > > 
> > > I was thinking the answer would be to align to the next page, and hey
> > > presto!
> > 
> > Wait, the tile row cannot be a single page. Something else is zero that
> > should not be.
> 
> How come? At least i915_tiling_ok() doesn't enforce any
> bigger lower bound.

This may be the trap I'm falling into, thinking that all arch have at
least 4K tile rows. Some might say, "shouldn't the chunk be aligned to an
even tile row" as well, but I was never certain about that.
-Chris


Re: [Intel-gfx] [PATCH] drm/i915: Avoid div-by-zero on gen2

2021-03-21 Thread Chris Wilson
Quoting Chris Wilson (2021-03-21 16:30:32)
> Quoting Chris Wilson (2021-03-21 16:28:07)
> > Quoting Ville Syrjala (2021-03-21 16:10:38)
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c 
> > > b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> > > index ec28a6cde49b..0b2434e29d00 100644
> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> > > @@ -189,7 +189,7 @@ compute_partial_view(const struct drm_i915_gem_object 
> > > *obj,
> > > struct i915_ggtt_view view;
> > >  
> > > if (i915_gem_object_is_tiled(obj))
> > > -   chunk = roundup(chunk, tile_row_pages(obj));
> > > +   chunk = roundup(chunk, tile_row_pages(obj) ?: 1);
> > 
> > I was thinking the answer would be to align to the next page, and hey
> > presto!
> 
> Wait, the tile row cannot be a single page. Something else is zero that
> should not be.

Which fortunately does not matter here, as we start with a 2MiB chunk
and want to align that to a multiple of tile rows. Still, as you said,
something stinks.
-Chris


Re: [Intel-gfx] [PATCH] drm/i915: Avoid div-by-zero on gen2

2021-03-21 Thread Chris Wilson
Quoting Chris Wilson (2021-03-21 16:28:07)
> Quoting Ville Syrjala (2021-03-21 16:10:38)
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c 
> > b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> > index ec28a6cde49b..0b2434e29d00 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> > @@ -189,7 +189,7 @@ compute_partial_view(const struct drm_i915_gem_object 
> > *obj,
> > struct i915_ggtt_view view;
> >  
> > if (i915_gem_object_is_tiled(obj))
> > -   chunk = roundup(chunk, tile_row_pages(obj));
> > +   chunk = roundup(chunk, tile_row_pages(obj) ?: 1);
> 
> I was thinking the answer would be to align to the next page, and hey
> presto!

Wait, the tile row cannot be a single page. Something else is zero that
should not be.
-Chris


Re: [Intel-gfx] [PATCH] drm/i915: Avoid div-by-zero on gen2

2021-03-21 Thread Chris Wilson
Quoting Ville Syrjala (2021-03-21 16:10:38)
> From: Ville Syrjälä 
> 
> Gen2 tiles are 2KiB in size so i915_gem_object_get_tile_row_size()
> can in fact return <4KiB, which leads to div-by-zero here.
> Avoid that.

So long as we overestimate it is fine, since we only care to find a
suitably small chunk that is large enough. I thought it was
overestimating, oh well.
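For reference, the arithmetic behind the trap (tile_row_pages() as in
i915_gem_mman.c, with the 2KiB gen2 tile row from the commit message):

    tile_row_pages(obj) = i915_gem_object_get_tile_row_size(obj) >> PAGE_SHIFT
                        = 2048 >> 12
                        = 0

    chunk = roundup(chunk, tile_row_pages(obj) ?: 1); /* 0 falls back to 1 page */

and roundup(x, 0) is a division by zero, so the ?: 1 fallback merely over-aligns
the partial view instead of oopsing.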
 
> Not sure i915_gem_object_get_tile_row_size() is entirely
> sane anyway since it doesn't account for the different tile
> layouts on i8xx/i915...

It should not matter so long as we pick a common divisor, suitable for
the fence register.
 
> I'm not able to hit this before commit 6846895fde05 ("drm/i915:
> Replace PIN_NONFAULT with calls to PIN_NOEVICT") and it looks
> like I also need to run recent version of Mesa. With those in
> place xonotic trips on this quite easily on my 85x.

NOEVICT will make it much less eager to remove older bindings, with the
preference then to use smaller views of objects. The theory being that
the workingset is less than the whole object, so we can fit more active
pages in and cause less thrashing when moving the unused pages around
in the GTT.

> Cc: Chris Wilson 
> Signed-off-by: Ville Syrjälä 
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_mman.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> index ec28a6cde49b..0b2434e29d00 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> @@ -189,7 +189,7 @@ compute_partial_view(const struct drm_i915_gem_object 
> *obj,
> struct i915_ggtt_view view;
>  
> if (i915_gem_object_is_tiled(obj))
> -   chunk = roundup(chunk, tile_row_pages(obj));
> +   chunk = roundup(chunk, tile_row_pages(obj) ?: 1);

I was thinking the answer would be to align to the next page, and hey
presto!

Reviewed-by: Chris Wilson 
-Chris


Re: [Intel-gfx] [PATCH] i915: Drop relocation support on all new hardware (v3)

2021-03-11 Thread Chris Wilson
Quoting Zbigniew Kempczyński (2021-03-11 11:44:32)
> On Wed, Mar 10, 2021 at 03:50:07PM -0600, Jason Ekstrand wrote:
> > The Vulkan driver in Mesa for Intel hardware never uses relocations if
> > it's running on a version of i915 that supports at least softpin which
> > all versions of i915 supporting Gen12 do.  On the OpenGL side, Gen12+ is
> > only supported by iris which never uses relocations.  The older i965
> > driver in Mesa does use relocations but it only supports Intel hardware
> > through Gen11 and has been deprecated for all hardware Gen9+.  The
> > compute driver also never uses relocations.  This only leaves the media
> > driver which is supposed to be switching to softpin going forward.
> > Making softpin a requirement for all future hardware seems reasonable.
> > 
> > Rejecting relocations starting with Gen12 has the benefit that we don't
> > have to bother supporting it on platforms with local memory.  Given how
> > much CPU touching of memory is required for relocations, not having to
> > do so on platforms where not all memory is directly CPU-accessible
> > carries significant advantages.
> > 
> > v2 (Jason Ekstrand):
> >  - Allow TGL-LP platforms as they've already shipped
> > 
> > v3 (Jason Ekstrand):
> >  - WARN_ON platforms with LMEM support in case the check is wrong
> 
> I was asked to review of this patch. It works along with expected
> IGT check https://patchwork.freedesktop.org/patch/423361/?series=82954=25
> 
> Before I'll give you r-b - isn't i915_gem_execbuffer2_ioctl() better place
> to do for loop just after copy_from_user() and check relocation_count?
> We have an access to exec2_list there, we know the gen so we're able to say
> relocations are not supported immediate, without entering 
> i915_gem_do_execbuffer().

There's a NORELOC flag you can enforce as mandatory. That's trivial for
userspace to set, really makes sure they are aware of the change afoot,
and i915_gem_check_execbuffer() will perform the validation upfront with
the other flag checks.
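For instance, a sketch of what enforcing that might look like next to the other
flag checks (the helper name, placement and gen cutoff mirror the discussion,
not a specific upstream commit):

    static int check_relocations_allowed(struct drm_i915_private *i915,
                                         const struct drm_i915_gem_execbuffer2 *args)
    {
            /* Gen12+ must declare softpin use; TGL-LP is excused as it already shipped. */
            if (INTEL_GEN(i915) < 12 || IS_TIGERLAKE(i915))
                    return 0;

            return args->flags & I915_EXEC_NO_RELOC ? 0 : -EINVAL;
    }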
-Chris


Re: [Intel-gfx] [PATCH] Revert "drm/i915: Propagate errors on awaiting already signaled fences"

2021-03-11 Thread Chris Wilson
Quoting Daniel Vetter (2021-03-11 16:01:46)
> On Fri, Mar 05, 2021 at 11:05:46AM -0600, Jason Ekstrand wrote:
> > This reverts commit 9e31c1fe45d555a948ff66f1f0e3fe1f83ca63f7.  Ever
> > since that commit, we've been having issues where a hang in one client
> > can propagate to another.  In particular, a hang in an app can propagate
> > to the X server which causes the whole desktop to lock up.
> > 
> > Signed-off-by: Jason Ekstrand 
> > Reported-by: Marcin Slusarz 
> > Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/3080
> > Fixes: 9e31c1fe45d5 ("drm/i915: Propagate errors on awaiting already 
> > signaled fences")
> 
> Yeah I suggested to just go with the revert, so I guess on my to give you
> the explainer to be added to the commit message.

If you took the patch this was copied from and only reverted the
propagation along dma-resv, things do not break horribly.

> Error propagation along fences sound like a good idea, but as your bug
> shows, surprising consequences, since propagating errors across security
> boundaries is not a good thing.
> 
> What we do have is track the hangs on the ctx, and report information to
> userspace using RESET_STATS.

And by the fence, which is far more precise.
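For context, the userspace side of that reporting path is roughly the following
(a sketch of the RESET_STATS query as consumed by GL robustness implementations;
error handling elided):

    #include <stdint.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <drm/i915_drm.h>

    /* Per-context hang statistics, as queried by arb_robustness implementations. */
    static int query_reset_stats(int drm_fd, uint32_t ctx_id,
                                 struct drm_i915_reset_stats *stats)
    {
            memset(stats, 0, sizeof(*stats));
            stats->ctx_id = ctx_id;

            /* reset_count, batch_active and batch_pending are filled on success. */
            return ioctl(drm_fd, DRM_IOCTL_I915_GET_RESET_STATS, stats);
    }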

> That's how arb_robustness works. Also, if my
> understanding is still correct, the EIO from execbuf is when your context
> is banned (because not recoverable or too many hangs). And in all these
> cases it's up to userspace to figure out what is all impacted and should
> be reported to the application, that's not on the kernel to guess and
> automatically propagate.
> 
> What's more, we're also building more features on top of ctx error
> reporting with RESET_STATS ioctl: Encrypted buffers use the same, and the
> userspace fence wait also relies on that mechanism. So it is the path
> going forward for reporting gpu hangs and resets to userspace.

That ioctl is not a solid basis, it never did quite work as expected and
we kept realising the limitations of the information and the accuracy
that it could report.
 
> So all together that's why I think we should just bury this idea again as
> not quite the direction we want to go to, hence why I think the revert is
> the right option here.

No, as shown by igt it's a critical issue that we have to judiciously
choose which errors to ignore. And it's not just the ability to subvert
gen7 and gen9, it's the error tracking employed for preempting contexts
among others.  Hence go with the original patch to undo the propagation
along dma-resv.
-Chris


Re: [Intel-gfx] [PATCH v3] drm/i915: Enable WaProgramMgsrForCorrectSliceSpecificMmioReads for Gen9

2021-03-10 Thread Chris Wilson
Quoting Tvrtko Ursulin (2021-03-10 10:19:12)
> 
> Hi,
> 
> On 08/03/2021 17:32, Chiou, Cooper wrote:
> > I've tested on GLK, KBL, CFL Intel NUC devices and got the following 
> > performance results, there is no performance regression per my testing.
> > 
> > Patch: [v5] drm/i915: Enable WaProgramMgsrForCorrectSliceSpecificMmioReads 
> > for Gen9
> > Test suite: 
> > phoronix-test-suite.supertuxkart.1024x768.Fullscreen.Ultimate.1.GranParadisoIsland.frames_per_second
> > Kernel version: 5.12.0-rc1 (drm-tip)
> > 
> > a. Device: Intel NUC kit NUC7JY Gemini Lake Celeron J4005 @2.7GHz (2 Cores)
> >  Without patch, fps=57.45
> >  With patch, fps=57.49
> > b. Device: Intel NUC kit NUC8BEH Coffee Lake Core i3-8109U @3.6GHz(4 Cores)
> >  Without patch, fps=117.23
> >  With patch, fps=117.27
> > c. Device: Intel NUC kit NUC7i3BNH Kaby Lake Core i3-7100U @2.4GHz(4 Cores)
> >  Without patch, fps=114.05
> >  With patch, fps=114.34
> > 
> > Meanwhile, Intel lkp team has validated performance on lkp-kbl-nuc1 and no 
> > regression.
> > f69d02e37a85645a  d912096c40cdc3bc9364966971 testcase/testparams/testbox
> >   -- ---
> >%stddev  change %stddev
> >\  |\
> >29.79   29.67
> > phoronix-test-suite/performance-true-Fullscreen-Ultimate-1-Gran_Paradiso_Island__Approxima-supertuxkart-1.5.2-ucode=0xde/lkp-kbl-nuc1
> >29.79   29.67GEO-MEAN 
> > phoronix-test-suite.supertuxkart.1280x1024.Fullscreen.Ultimate.1.GranParadisoIsland.frames_per_second
> > 
> 
> CI results are green so that is good.
> 
> Do the machines used for performance testing include unusual fusing? 
> Worrying thing is that we were never able to reproduce the reported 
> regression in house due to lack of an identical machine, right? Although I 
> guess avoiding hangs trumps performance.

The issue is that if the regression is reproducible it means that the
broadcast mask is no longer correct (or never was, one or the other ;)
And another w/a is going astray because it depends on the previous
undefined value of the mcr.

Which raises the question as to whether the hang prevention seen here is
also because some other w/a (or other mmio) is not being applied to the
relevant units. Or vice versa.

Either way there remains an underlying issue in that some register
writes for gen9 require the mcr to be set in a way that we are not handling
correctly. Changing the mask here and seeing results change elsewhere indicates
that the issues are not fully addressed, and raises the fear that undoing some
other mmio is going to introduce other subtle hangs. And we are all
blindly poking at the issue as no one has access to the affected skus.

What would be useful is if we print the value before changing it so that
we can see if we have any machines in CI where we are making significant
changes to the broadcast mask.
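A sketch of what that debug aid could look like inside gen9_wa_init_mcr(), just
before the wa_write_clr_set() (placement assumed):

    drm_dbg(&i915->drm, "MCR was %08x, setting slice:%d/subslice:%d -> %08x\n",
            intel_uncore_read(&i915->uncore, GEN8_MCR_SELECTOR),
            slice, subslice, mcr);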
-Chris


Re: [Intel-gfx] [PATCH] gpu: drm: i915: fix error return code of igt_buddy_alloc_smoke()

2021-03-08 Thread Chris Wilson
Quoting Jia-Ju Bai (2021-03-08 08:59:52)
> When i915_random_order() returns NULL to order, no error return code of
> igt_buddy_alloc_smoke() is assigned.
> To fix this bug, err is assigned with -EINVAL in this case.

It would not be EINVAL since that is used for a reference failure, but
in this case the idea was to return 0 as no testing was done and the
ENOMEM was raised before testing began i.e. not an internal and
unexpected driver allocation failure.
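The convention being described, sketched as a selftest skeleton (the names are
illustrative, not the exact igt_buddy_alloc_smoke() code):

    static int example_smoke(void *arg)
    {
            struct rnd_state prng;
            unsigned int *order;

            prandom_seed_state(&prng, i915_selftest.random_seed);

            order = i915_random_order(512, &prng);
            if (!order)
                    return 0; /* could not even start: skip, not a driver bug */

            /* ... exercise the driver; its own failures do return errors ... */

            kfree(order);
            return 0;
    }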
-Chris


Re: [Intel-gfx] [PATCH] gpu: drm: i915: fix error return code of igt_threaded_blt()

2021-03-08 Thread Chris Wilson
Quoting Jia-Ju Bai (2021-03-08 09:07:22)
> When kcalloc() returns NULL to tsk or thread, no error code of 
> igt_threaded_blt() is returned.
> To fix this bug, -ENOMEM is returned as error code.

Because we decided to skip the test if it could not be run due to
insufficient memory, as opposed to an internal allocation failure from
the driver.
-Chris


Re: [Intel-gfx] [PATCH] Revert "drm/i915: Propagate errors on awaiting already signaled fences"

2021-03-05 Thread Chris Wilson
Quoting Jason Ekstrand (2021-03-05 17:05:46)
> This reverts commit 9e31c1fe45d555a948ff66f1f0e3fe1f83ca63f7.  Ever
> since that commit, we've been having issues where a hang in one client
> can propagate to another.  In particular, a hang in an app can propagate
> to the X server which causes the whole desktop to lock up.

The fence error handling is required to prevent users from circumventing
incomplete work, such as security validation or escaping isolation.
-Chris


Re: [Intel-gfx] [PATCH v3] drm/i915: Enable WaProgramMgsrForCorrectSliceSpecificMmioReads for Gen9

2021-03-05 Thread Chris Wilson
Quoting Chris Wilson (2021-03-05 12:20:45)
> Quoting Tvrtko Ursulin (2021-03-05 09:23:02)
> > I am not sure if PC8 and DMC could also be involved from what Cooper was 
> > saying in a different thread. Maybe another CI run without the DMC, both 
> > ffs and fls. Another for limiting cstates.
> 
> Disabling the dmc leaves the display code in an inconsistent state so we
> don't complete a BAT run; but since the warnings are thrown during boot
> we can say that disabling the dmc does clear up the workaround issues on
> ehl/jsl:
> 
> https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_7612/fi-ehl-2/boot0.txt
> https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_7612/fi-jsl-1/boot0.txt

Fwiw, disabling the dmc and using fls for gen9 is not enough to avoid
the warnings though:

https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_7614/fi-cfl-8109u/igt@i915_selftest@live@memory_region.html

So far only ffs works for gen9 mcr.
-Chris


Re: [Intel-gfx] [PATCH v3] drm/i915: Enable WaProgramMgsrForCorrectSliceSpecificMmioReads for Gen9

2021-03-05 Thread Chris Wilson
Quoting Tvrtko Ursulin (2021-03-05 09:23:02)
> I am not sure if PC8 and DMC could also be involved from what Cooper was 
> saying in a different thread. Maybe another CI run without the DMC, both 
> ffs and fls. Another for limiting cstates.

Disabling the dmc leaves the display code in an inconsistent state so we
don't complete a BAT run; but since the warnings are thrown during boot
we can say that disabling the dmc does clear up the workaround issues on
ehl/jsl:

https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_7612/fi-ehl-2/boot0.txt
https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_7612/fi-jsl-1/boot0.txt
-Chris


Re: [Intel-gfx] [PATCH v2 6/7] drm/i915: rename DISP_STEPPING->DISPLAY_STEP and GT_STEPPING->GT_STEP

2021-03-05 Thread Chris Wilson
Quoting Jani Nikula (2021-02-24 08:46:55)
> On Tue, 23 Feb 2021, Lucas De Marchi  wrote:
> > On Tue, Feb 23, 2021 at 05:35:11PM +0200, Jani Nikula wrote:
> >>Matter of taste. STEP matches the enums.
> >>
> >>Signed-off-by: Jani Nikula 
> >>---
> >> drivers/gpu/drm/i915/display/intel_display_power.c |  2 +-
> >> drivers/gpu/drm/i915/display/intel_psr.c   |  4 ++--
> >> drivers/gpu/drm/i915/display/skl_universal_plane.c |  2 +-
> >> drivers/gpu/drm/i915/gt/intel_workarounds.c| 10 +-
> >> drivers/gpu/drm/i915/i915_drv.h| 10 +-
> >> drivers/gpu/drm/i915/intel_device_info.c   |  2 +-
> >> drivers/gpu/drm/i915/intel_pm.c|  2 +-
> >> 7 files changed, 16 insertions(+), 16 deletions(-)
> >>
> >>diff --git a/drivers/gpu/drm/i915/display/intel_display_power.c 
> >>b/drivers/gpu/drm/i915/display/intel_display_power.c
> >>index f00c1750febd..1f7b2700947a 100644
> >>--- a/drivers/gpu/drm/i915/display/intel_display_power.c
> >>+++ b/drivers/gpu/drm/i915/display/intel_display_power.c
> >>@@ -5349,7 +5349,7 @@ static void tgl_bw_buddy_init(struct drm_i915_private 
> >>*dev_priv)
> >>
> >>  if (IS_ALDERLAKE_S(dev_priv) ||
> >>  IS_DG1_REVID(dev_priv, DG1_REVID_A0, DG1_REVID_A0) ||
> >>- IS_TGL_DISP_STEPPING(dev_priv, STEP_A0, STEP_B0))
> >>+ IS_TGL_DISPLAY_STEP(dev_priv, STEP_A0, STEP_B0))
> >>  /* Wa_1409767108:tgl,dg1,adl-s */
> >>  table = wa_1409767108_buddy_page_masks;
> >>  else
> >>diff --git a/drivers/gpu/drm/i915/display/intel_psr.c 
> >>b/drivers/gpu/drm/i915/display/intel_psr.c
> >>index 7c6e561f86c1..da5084b54eb6 100644
> >>--- a/drivers/gpu/drm/i915/display/intel_psr.c
> >>+++ b/drivers/gpu/drm/i915/display/intel_psr.c
> >>@@ -548,7 +548,7 @@ static void hsw_activate_psr2(struct intel_dp *intel_dp)
> >>
> >>  if (intel_dp->psr.psr2_sel_fetch_enabled) {
> >>  /* WA 1408330847 */
> >>- if (IS_TGL_DISP_STEPPING(dev_priv, STEP_A0, STEP_A0) ||
> >>+ if (IS_TGL_DISPLAY_STEP(dev_priv, STEP_A0, STEP_A0) ||
> >
> > I always hated the DISP vs DISPLAY. It should be in the commit message.
> >
> > But if you are doing the s/STEPPING/STEP/, shouldn't the filename also use
> > step and all the functions/structs?
> 
> To be honest, the rename came as an afterthought, after Aditya (I think)
> added the STEP_X enums.
> 
> For me step everywhere sounds good, I wonder what the native speakers
> think.

IS_DISPLAY_STEPPING(STEP_X) is more flamboyant than
IS_DISPLAY_STEP(STEP_X), but we often make the concession for brevity
and in this case the consistency between macro and enum beats the
inconsistency in English. So STEP reads as a perfectly acceptable synonym
for STEPPING.
-Chris


Re: [Intel-gfx] [PATCH v3] drm/i915: Enable WaProgramMgsrForCorrectSliceSpecificMmioReads for Gen9

2021-03-04 Thread Chris Wilson
Quoting Chris Wilson (2021-03-04 11:56:16)
> Quoting Chris Wilson (2021-03-04 09:19:24)
> > Quoting Tvrtko Ursulin (2021-03-04 09:12:26)
> > > 
> > > On 02/03/2021 06:27, Cooper Chiou wrote:
> > > > WaProgramMgsrForCorrectSliceSpecificMmioReads applies for Gen9 to
> > > > resolve VP8 hardware encoding system hang up on GT1 sku for
> > > > ChromiumOS projects
> > > > 
> > > > Slice specific MMIO reads are inaccurate, so MGSR needs to be programmed
> > > > appropriately to get correct reads from these slice-related MMIOs.
> > > > 
> > > > It dictates that before any MMIO read into Slice/Subslice specific
> > > > registers, MCR packet control register(0xFDC) needs to be programmed
> > > > to point to any enabled slice/subslice pair, especially GT1 fused sku
> > > > since this issue can be reproduced on VP8 hardware encoding via ffmpeg
> > > > on ChromiumOS devices.
> > > > When exit PC7, MGSR will reset so that we have to skip fused subslice 
> > > > ID.
> > > > 
> > > > Reference: HSD#1508045018,1405586840, BSID#0575
> > > > 
> > > > Cc: Ville Syrjälä 
> > > > Cc: Rodrigo Vivi 
> > > > Cc: Jani Nikula 
> > > > Cc: Chris Wilson 
> > > > Cc: Tvrtko Ursulin 
> > > > Cc: William Tseng 
> > > > Cc: Lee Shawn C 
> > > > 
> > > > Signed-off-by: Cooper Chiou 
> > > > ---
> > > >   drivers/gpu/drm/i915/gt/intel_workarounds.c | 38 +
> > > >   1 file changed, 38 insertions(+)
> > > > 
> > > > diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c 
> > > > b/drivers/gpu/drm/i915/gt/intel_workarounds.c
> > > > index 3b4a7da60f0b..4ad598a727a6 100644
> > > > --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
> > > > +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
> > > > @@ -878,9 +878,47 @@ hsw_gt_workarounds_init(struct drm_i915_private 
> > > > *i915, struct i915_wa_list *wal)
> > > >   wa_write_clr(wal, GEN7_FF_THREAD_MODE, GEN7_FF_VS_REF_CNT_FFME);
> > > >   }
> > > >   
> > > > +static void
> > > > +gen9_wa_init_mcr(struct drm_i915_private *i915, struct i915_wa_list 
> > > > *wal)
> > > > +{
> > > > + const struct sseu_dev_info *sseu = &i915->gt.info.sseu;
> > > > + unsigned int slice, subslice;
> > > > + u32 mcr, mcr_mask;
> > > > +
> > > > + GEM_BUG_ON(INTEL_GEN(i915) < 9);
> > > > +
> > > > + /*
> > > > +  * WaProgramMgsrForCorrectSliceSpecificMmioReads:glk,kbl,cml
> > > > +  * Before any MMIO read into slice/subslice specific registers, 
> > > > MCR
> > > > +  * packet control register needs to be programmed to point to any
> > > > +  * enabled s/ss pair. Otherwise, incorrect values will be 
> > > > returned.
> > > > +  * This means each subsequent MMIO read will be forwarded to an
> > > > +  * specific s/ss combination, but this is OK since these registers
> > > > +  * are consistent across s/ss in almost all cases. In the rare
> > > > +  * occasions, such as INSTDONE, where this value is dependent
> > > > +  * on s/ss combo, the read should be done with read_subslice_reg.
> > > > +  */
> > > > + slice = fls(sseu->slice_mask) - 1;
> > > > + GEM_BUG_ON(slice >= ARRAY_SIZE(sseu->subslice_mask));
> > > > + subslice = fls(intel_sseu_get_subslices(sseu, slice));
> > > > + GEM_BUG_ON(!subslice);
> > > > + subslice--;
> > > > +
> > > > + mcr = GEN8_MCR_SLICE(slice) | GEN8_MCR_SUBSLICE(subslice);
> > > > + mcr_mask = GEN8_MCR_SLICE_MASK | GEN8_MCR_SUBSLICE_MASK;
> > > > +
> > > > + drm_dbg(&i915->drm, "MCR slice:%d/subslice:%d = %x\n", slice, 
> > > > subslice, mcr);
> > > > +
> > > > + wa_write_clr_set(wal, GEN8_MCR_SELECTOR, mcr_mask, mcr);
> > > > +}
> > > 
> > > Have you considered reusing existing wa_init_mcr? Just needs the 
> > > top-level assert changed and otherwise it looks it would do the right 
> > > thing for Gen9. Advantage being a smaller patch and less code to carry.
> > 
> > That was the first patch, and fails for the same reason. The problem
> > would appear to be in the mcr_mask for gt3.
> 
> For the record, it appears to be an issue with fls vs ffs. Switching to
> ffs also removes the warnings for workaround failures on ehl/jsl.

Of course the icl in farm2 started spewing warnings, but the other icl
in farm1 were happy.
-Chris


Re: [Intel-gfx] [PATCH v3] drm/i915: Enable WaProgramMgsrForCorrectSliceSpecificMmioReads for Gen9

2021-03-04 Thread Chris Wilson
Quoting Chris Wilson (2021-03-04 09:19:24)
> Quoting Tvrtko Ursulin (2021-03-04 09:12:26)
> > 
> > On 02/03/2021 06:27, Cooper Chiou wrote:
> > > WaProgramMgsrForCorrectSliceSpecificMmioReads applies for Gen9 to
> > > resolve VP8 hardware encoding system hang up on GT1 sku for
> > > ChromiumOS projects
> > > 
> > > Slice specific MMIO reads are inaccurate, so MGSR needs to be programmed
> > > appropriately to get correct reads from these slice-related MMIOs.
> > > 
> > > It dictates that before any MMIO read into Slice/Subslice specific
> > > registers, MCR packet control register(0xFDC) needs to be programmed
> > > to point to any enabled slice/subslice pair, especially GT1 fused sku
> > > since this issue can be reproduced on VP8 hardware encoding via ffmpeg
> > > on ChromiumOS devices.
> > > When exit PC7, MGSR will reset so that we have to skip fused subslice ID.
> > > 
> > > Reference: HSD#1508045018,1405586840, BSID#0575
> > > 
> > > Cc: Ville Syrjälä 
> > > Cc: Rodrigo Vivi 
> > > Cc: Jani Nikula 
> > > Cc: Chris Wilson 
> > > Cc: Tvrtko Ursulin 
> > > Cc: William Tseng 
> > > Cc: Lee Shawn C 
> > > 
> > > Signed-off-by: Cooper Chiou 
> > > ---
> > >   drivers/gpu/drm/i915/gt/intel_workarounds.c | 38 +
> > >   1 file changed, 38 insertions(+)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c 
> > > b/drivers/gpu/drm/i915/gt/intel_workarounds.c
> > > index 3b4a7da60f0b..4ad598a727a6 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
> > > +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
> > > @@ -878,9 +878,47 @@ hsw_gt_workarounds_init(struct drm_i915_private 
> > > *i915, struct i915_wa_list *wal)
> > >   wa_write_clr(wal, GEN7_FF_THREAD_MODE, GEN7_FF_VS_REF_CNT_FFME);
> > >   }
> > >   
> > > +static void
> > > +gen9_wa_init_mcr(struct drm_i915_private *i915, struct i915_wa_list *wal)
> > > +{
> > > + const struct sseu_dev_info *sseu = &i915->gt.info.sseu;
> > > + unsigned int slice, subslice;
> > > + u32 mcr, mcr_mask;
> > > +
> > > + GEM_BUG_ON(INTEL_GEN(i915) < 9);
> > > +
> > > + /*
> > > +  * WaProgramMgsrForCorrectSliceSpecificMmioReads:glk,kbl,cml
> > > +  * Before any MMIO read into slice/subslice specific registers, MCR
> > > +  * packet control register needs to be programmed to point to any
> > > +  * enabled s/ss pair. Otherwise, incorrect values will be returned.
> > > +  * This means each subsequent MMIO read will be forwarded to an
> > > +  * specific s/ss combination, but this is OK since these registers
> > > +  * are consistent across s/ss in almost all cases. In the rare
> > > +  * occasions, such as INSTDONE, where this value is dependent
> > > +  * on s/ss combo, the read should be done with read_subslice_reg.
> > > +  */
> > > + slice = fls(sseu->slice_mask) - 1;
> > > + GEM_BUG_ON(slice >= ARRAY_SIZE(sseu->subslice_mask));
> > > + subslice = fls(intel_sseu_get_subslices(sseu, slice));
> > > + GEM_BUG_ON(!subslice);
> > > + subslice--;
> > > +
> > > + mcr = GEN8_MCR_SLICE(slice) | GEN8_MCR_SUBSLICE(subslice);
> > > + mcr_mask = GEN8_MCR_SLICE_MASK | GEN8_MCR_SUBSLICE_MASK;
> > > +
> > > + drm_dbg(&i915->drm, "MCR slice:%d/subslice:%d = %x\n", slice, 
> > > subslice, mcr);
> > > +
> > > + wa_write_clr_set(wal, GEN8_MCR_SELECTOR, mcr_mask, mcr);
> > > +}
> > 
> > Have you considered reusing existing wa_init_mcr? Just needs the 
> > top-level assert changed and otherwise it looks it would do the right 
> > thing for Gen9. Advantage being a smaller patch and less code to carry.
> 
> That was the first patch, and fails for the same reason. The problem
> would appear to be in the mcr_mask for gt3.

For the record, it appears to be an issue with fls vs ffs. Switching to
ffs also removes the warnings for workaround failures on ehl/jsl.
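For anyone following along, the difference in question (the mask value is a
made-up example of a part with subslice 0 fused off; both helpers are 1-based,
hence the -1 in the patch):

    #include <linux/bitops.h>

    static unsigned int pick_subslice(u32 subslice_mask, bool lowest)
    {
            /* e.g. subslice_mask == 0x6: subslices 1 and 2 enabled, 0 fused off */
            return (lowest ? ffs(subslice_mask) : fls(subslice_mask)) - 1;
            /* fls picks 2 (highest enabled), ffs picks 1 (lowest enabled) */
    }

Both answers name an enabled subslice, but they steer subsequent MMIO reads at
different hardware units, which appears to be what the ehl/jsl warnings above
are sensitive to.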
-Chris


Re: [Intel-gfx] [PATCH] i915/query: Correlate engine and cpu timestamps with better accuracy

2021-03-04 Thread Chris Wilson
Quoting Lionel Landwerlin (2021-03-04 09:45:47)
> On 04/03/2021 10:58, Chris Wilson wrote:
> > Quoting Lionel Landwerlin (2021-03-04 08:28:59)
> >> On 04/03/2021 02:09, Chris Wilson wrote:
> >>> Quoting Umesh Nerlige Ramappa (2021-03-03 21:28:00)
> >>>> Perf measurements rely on CPU and engine timestamps to correlate
> >>>> events of interest across these time domains. Current mechanisms get
> >>>> these timestamps separately and the calculated delta between these
> >>>> timestamps lack enough accuracy.
> >>>>
> >>>> To improve the accuracy of these time measurements to within a few us,
> >>>> add a query that returns the engine and cpu timestamps captured as
> >>>> close to each other as possible.
> >>>>
> >>>> v2: (Tvrtko)
> >>>> - document clock reference used
> >>>> - return cpu timestamp always
> >>>> - capture cpu time just before lower dword of cs timestamp
> >>>>
> >>>> v3: (Chris)
> >>>> - use uncore-rpm
> >>>> - use __query_cs_timestamp helper
> >>>>
> >>>> v4: (Lionel)
> >>>> - Kernel perf subsystem allows users to specify the clock id to be used
> >>>> in perf_event_open. This clock id is used by the perf subsystem to
> >>>> return the appropriate cpu timestamp in perf events. Similarly, let
> >>>> the user pass the clockid to this query so that cpu timestamp
> >>>> corresponds to the clock id requested.
> >>>>
> >>>> v5: (Tvrtko)
> >>>> - Use normal ktime accessors instead of fast versions
> >>>> - Add more uApi documentation
> >>>>
> >>>> v6: (Lionel)
> >>>> - Move switch out of spinlock
> >>>>
> >>>> v7: (Chris)
> >>>> - cs_timestamp is a misnomer, use cs_cycles instead
> >>>> - return the cs cycle frequency as well in the query
> >>>>
> >>>> v8:
> >>>> - Add platform and engine specific checks
> >>>>
> >>>> v9: (Lionel)
> >>>> - Return 2 cpu timestamps in the query - captured before and after the
> >>>> register read
> >>>>
> >>>> Signed-off-by: Umesh Nerlige Ramappa 
> >>>> ---
> >>>>drivers/gpu/drm/i915/i915_query.c | 144 ++
> >>>>include/uapi/drm/i915_drm.h   |  47 ++
> >>>>2 files changed, 191 insertions(+)
> >>>>
> >>>> diff --git a/drivers/gpu/drm/i915/i915_query.c 
> >>>> b/drivers/gpu/drm/i915/i915_query.c
> >>>> index fed337ad7b68..acca22ee6014 100644
> >>>> --- a/drivers/gpu/drm/i915/i915_query.c
> >>>> +++ b/drivers/gpu/drm/i915/i915_query.c
> >>>> @@ -6,6 +6,8 @@
> >>>>
> >>>>#include 
> >>>>
> >>>> +#include "gt/intel_engine_pm.h"
> >>>> +#include "gt/intel_engine_user.h"
> >>>>#include "i915_drv.h"
> >>>>#include "i915_perf.h"
> >>>>#include "i915_query.h"
> >>>> @@ -90,6 +92,147 @@ static int query_topology_info(struct 
> >>>> drm_i915_private *dev_priv,
> >>>>   return total_length;
> >>>>}
> >>>>
> >>>> +typedef u64 (*__ktime_func_t)(void);
> >>>> +static __ktime_func_t __clock_id_to_func(clockid_t clk_id)
> >>>> +{
> >>>> +   /*
> >>>> +* Use logic same as the perf subsystem to allow user to select 
> >>>> the
> >>>> +* reference clock id to be used for timestamps.
> >>>> +*/
> >>>> +   switch (clk_id) {
> >>>> +   case CLOCK_MONOTONIC:
> >>>> +   return &ktime_get_ns;
> >>>> +   case CLOCK_MONOTONIC_RAW:
> >>>> +   return &ktime_get_raw_ns;
> >>>> +   case CLOCK_REALTIME:
> >>>> +   return &ktime_get_real_ns;
> >>>> +   case CLOCK_BOOTTIME:
> >>>> +   return &ktime_get_boottime_ns;
> >>>> +   case CLOCK_TAI:
> >>>> +   return &ktime_get_clocktai_ns;
> >>>> +   default:
> >>>> +   

Re: [Intel-gfx] [PATCH v3] drm/i915: Enable WaProgramMgsrForCorrectSliceSpecificMmioReads for Gen9

2021-03-04 Thread Chris Wilson
Quoting Tvrtko Ursulin (2021-03-04 09:12:26)
> 
> On 02/03/2021 06:27, Cooper Chiou wrote:
> > WaProgramMgsrForCorrectSliceSpecificMmioReads applies for Gen9 to
> > resolve VP8 hardware encoding system hang up on GT1 sku for
> > ChromiumOS projects
> > 
> > Slice specific MMIO reads are inaccurate, so MGSR needs to be programmed
> > appropriately to get correct reads from these slice-related MMIOs.
> > 
> > It dictates that before any MMIO read into Slice/Subslice specific
> > registers, MCR packet control register(0xFDC) needs to be programmed
> > to point to any enabled slice/subslice pair, especially GT1 fused sku
> > since this issue can be reproduced on VP8 hardware encoding via ffmpeg
> > on ChromiumOS devices.
> > When exit PC7, MGSR will reset so that we have to skip fused subslice ID.
> > 
> > Reference: HSD#1508045018,1405586840, BSID#0575
> > 
> > Cc: Ville Syrjälä 
> > Cc: Rodrigo Vivi 
> > Cc: Jani Nikula 
> > Cc: Chris Wilson 
> > Cc: Tvrtko Ursulin 
> > Cc: William Tseng 
> > Cc: Lee Shawn C 
> > 
> > Signed-off-by: Cooper Chiou 
> > ---
> >   drivers/gpu/drm/i915/gt/intel_workarounds.c | 38 +
> >   1 file changed, 38 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c 
> > b/drivers/gpu/drm/i915/gt/intel_workarounds.c
> > index 3b4a7da60f0b..4ad598a727a6 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
> > @@ -878,9 +878,47 @@ hsw_gt_workarounds_init(struct drm_i915_private *i915, 
> > struct i915_wa_list *wal)
> >   wa_write_clr(wal, GEN7_FF_THREAD_MODE, GEN7_FF_VS_REF_CNT_FFME);
> >   }
> >   
> > +static void
> > +gen9_wa_init_mcr(struct drm_i915_private *i915, struct i915_wa_list *wal)
> > +{
> > + const struct sseu_dev_info *sseu = &i915->gt.info.sseu;
> > + unsigned int slice, subslice;
> > + u32 mcr, mcr_mask;
> > +
> > + GEM_BUG_ON(INTEL_GEN(i915) < 9);
> > +
> > + /*
> > +  * WaProgramMgsrForCorrectSliceSpecificMmioReads:glk,kbl,cml
> > +  * Before any MMIO read into slice/subslice specific registers, MCR
> > +  * packet control register needs to be programmed to point to any
> > +  * enabled s/ss pair. Otherwise, incorrect values will be returned.
> > +  * This means each subsequent MMIO read will be forwarded to an
> > +  * specific s/ss combination, but this is OK since these registers
> > +  * are consistent across s/ss in almost all cases. In the rare
> > +  * occasions, such as INSTDONE, where this value is dependent
> > +  * on s/ss combo, the read should be done with read_subslice_reg.
> > +  */
> > + slice = fls(sseu->slice_mask) - 1;
> > + GEM_BUG_ON(slice >= ARRAY_SIZE(sseu->subslice_mask));
> > + subslice = fls(intel_sseu_get_subslices(sseu, slice));
> > + GEM_BUG_ON(!subslice);
> > + subslice--;
> > +
> > + mcr = GEN8_MCR_SLICE(slice) | GEN8_MCR_SUBSLICE(subslice);
> > + mcr_mask = GEN8_MCR_SLICE_MASK | GEN8_MCR_SUBSLICE_MASK;
> > +
> > + drm_dbg(&i915->drm, "MCR slice:%d/subslice:%d = %x\n", slice, 
> > subslice, mcr);
> > +
> > + wa_write_clr_set(wal, GEN8_MCR_SELECTOR, mcr_mask, mcr);
> > +}
> 
> Have you considered reusing existing wa_init_mcr? Just needs the 
> top-level assert changed and otherwise it looks it would do the right 
> thing for Gen9. Advantage being a smaller patch and less code to carry.

That was the first patch, and fails for the same reason. The problem
would appear to be in the mcr_mask for gt3.
-Chris


Re: [Intel-gfx] [PATCH] i915/query: Correlate engine and cpu timestamps with better accuracy

2021-03-04 Thread Chris Wilson
Quoting Lionel Landwerlin (2021-03-04 08:28:59)
> On 04/03/2021 02:09, Chris Wilson wrote:
> > Quoting Umesh Nerlige Ramappa (2021-03-03 21:28:00)
> >> Perf measurements rely on CPU and engine timestamps to correlate
> >> events of interest across these time domains. Current mechanisms get
> >> these timestamps separately and the calculated delta between these
> >> timestamps lack enough accuracy.
> >>
> >> To improve the accuracy of these time measurements to within a few us,
> >> add a query that returns the engine and cpu timestamps captured as
> >> close to each other as possible.
> >>
> >> v2: (Tvrtko)
> >> - document clock reference used
> >> - return cpu timestamp always
> >> - capture cpu time just before lower dword of cs timestamp
> >>
> >> v3: (Chris)
> >> - use uncore-rpm
> >> - use __query_cs_timestamp helper
> >>
> >> v4: (Lionel)
> >> - Kernel perf subsystem allows users to specify the clock id to be used
> >>in perf_event_open. This clock id is used by the perf subsystem to
> >>return the appropriate cpu timestamp in perf events. Similarly, let
> >>the user pass the clockid to this query so that cpu timestamp
> >>corresponds to the clock id requested.
> >>
> >> v5: (Tvrtko)
> >> - Use normal ktime accessors instead of fast versions
> >> - Add more uApi documentation
> >>
> >> v6: (Lionel)
> >> - Move switch out of spinlock
> >>
> >> v7: (Chris)
> >> - cs_timestamp is a misnomer, use cs_cycles instead
> >> - return the cs cycle frequency as well in the query
> >>
> >> v8:
> >> - Add platform and engine specific checks
> >>
> >> v9: (Lionel)
> >> - Return 2 cpu timestamps in the query - captured before and after the
> >>register read
> >>
> >> Signed-off-by: Umesh Nerlige Ramappa 
> >> ---
> >>   drivers/gpu/drm/i915/i915_query.c | 144 ++
> >>   include/uapi/drm/i915_drm.h   |  47 ++
> >>   2 files changed, 191 insertions(+)
> >>
> >> diff --git a/drivers/gpu/drm/i915/i915_query.c 
> >> b/drivers/gpu/drm/i915/i915_query.c
> >> index fed337ad7b68..acca22ee6014 100644
> >> --- a/drivers/gpu/drm/i915/i915_query.c
> >> +++ b/drivers/gpu/drm/i915/i915_query.c
> >> @@ -6,6 +6,8 @@
> >>   
> >>   #include 
> >>   
> >> +#include "gt/intel_engine_pm.h"
> >> +#include "gt/intel_engine_user.h"
> >>   #include "i915_drv.h"
> >>   #include "i915_perf.h"
> >>   #include "i915_query.h"
> >> @@ -90,6 +92,147 @@ static int query_topology_info(struct drm_i915_private 
> >> *dev_priv,
> >>  return total_length;
> >>   }
> >>   
> >> +typedef u64 (*__ktime_func_t)(void);
> >> +static __ktime_func_t __clock_id_to_func(clockid_t clk_id)
> >> +{
> >> +   /*
> >> +* Use logic same as the perf subsystem to allow user to select the
> >> +* reference clock id to be used for timestamps.
> >> +*/
> >> +   switch (clk_id) {
> >> +   case CLOCK_MONOTONIC:
> >> +   return &ktime_get_ns;
> >> +   case CLOCK_MONOTONIC_RAW:
> >> +   return &ktime_get_raw_ns;
> >> +   case CLOCK_REALTIME:
> >> +   return &ktime_get_real_ns;
> >> +   case CLOCK_BOOTTIME:
> >> +   return &ktime_get_boottime_ns;
> >> +   case CLOCK_TAI:
> >> +   return &ktime_get_clocktai_ns;
> >> +   default:
> >> +   return NULL;
> >> +   }
> >> +}
> >> +
> >> +static inline int
> >> +__read_timestamps(struct intel_uncore *uncore,
> >> + i915_reg_t lower_reg,
> >> + i915_reg_t upper_reg,
> >> + u64 *cs_ts,
> >> + u64 *cpu_ts,
> >> + __ktime_func_t cpu_clock)
> >> +{
> >> +   u32 upper, lower, old_upper, loop = 0;
> >> +
> >> +   upper = intel_uncore_read_fw(uncore, upper_reg);
> >> +   do {
> >> +   cpu_ts[0] = cpu_clock();
> >> +   lower = intel_uncore_read_fw(uncore, lower_reg);
> >> +   cpu_ts[1] = cpu_clock();
> >> +   old_upper = upper;
> 

Re: [Intel-gfx] [PATCH] i915/query: Correlate engine and cpu timestamps with better accuracy

2021-03-03 Thread Chris Wilson
Quoting Umesh Nerlige Ramappa (2021-03-03 21:28:00)
> Perf measurements rely on CPU and engine timestamps to correlate
> events of interest across these time domains. Current mechanisms get
> these timestamps separately and the calculated delta between these
> timestamps lack enough accuracy.
> 
> To improve the accuracy of these time measurements to within a few us,
> add a query that returns the engine and cpu timestamps captured as
> close to each other as possible.
> 
> v2: (Tvrtko)
> - document clock reference used
> - return cpu timestamp always
> - capture cpu time just before lower dword of cs timestamp
> 
> v3: (Chris)
> - use uncore-rpm
> - use __query_cs_timestamp helper
> 
> v4: (Lionel)
> - Kernel perf subsystem allows users to specify the clock id to be used
>   in perf_event_open. This clock id is used by the perf subsystem to
>   return the appropriate cpu timestamp in perf events. Similarly, let
>   the user pass the clockid to this query so that cpu timestamp
>   corresponds to the clock id requested.
> 
> v5: (Tvrtko)
> - Use normal ktime accessors instead of fast versions
> - Add more uApi documentation
> 
> v6: (Lionel)
> - Move switch out of spinlock
> 
> v7: (Chris)
> - cs_timestamp is a misnomer, use cs_cycles instead
> - return the cs cycle frequency as well in the query
> 
> v8:
> - Add platform and engine specific checks
> 
> v9: (Lionel)
> - Return 2 cpu timestamps in the query - captured before and after the
>   register read
> 
> Signed-off-by: Umesh Nerlige Ramappa 
> ---
>  drivers/gpu/drm/i915/i915_query.c | 144 ++
>  include/uapi/drm/i915_drm.h   |  47 ++
>  2 files changed, 191 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_query.c 
> b/drivers/gpu/drm/i915/i915_query.c
> index fed337ad7b68..acca22ee6014 100644
> --- a/drivers/gpu/drm/i915/i915_query.c
> +++ b/drivers/gpu/drm/i915/i915_query.c
> @@ -6,6 +6,8 @@
>  
>  #include 
>  
> +#include "gt/intel_engine_pm.h"
> +#include "gt/intel_engine_user.h"
>  #include "i915_drv.h"
>  #include "i915_perf.h"
>  #include "i915_query.h"
> @@ -90,6 +92,147 @@ static int query_topology_info(struct drm_i915_private 
> *dev_priv,
> return total_length;
>  }
>  
> +typedef u64 (*__ktime_func_t)(void);
> +static __ktime_func_t __clock_id_to_func(clockid_t clk_id)
> +{
> +   /*
> +* Use logic same as the perf subsystem to allow user to select the
> +* reference clock id to be used for timestamps.
> +*/
> +   switch (clk_id) {
> +   case CLOCK_MONOTONIC:
> +   return &ktime_get_ns;
> +   case CLOCK_MONOTONIC_RAW:
> +   return &ktime_get_raw_ns;
> +   case CLOCK_REALTIME:
> +   return &ktime_get_real_ns;
> +   case CLOCK_BOOTTIME:
> +   return &ktime_get_boottime_ns;
> +   case CLOCK_TAI:
> +   return &ktime_get_clocktai_ns;
> +   default:
> +   return NULL;
> +   }
> +}
> +
> +static inline int
> +__read_timestamps(struct intel_uncore *uncore,
> + i915_reg_t lower_reg,
> + i915_reg_t upper_reg,
> + u64 *cs_ts,
> + u64 *cpu_ts,
> + __ktime_func_t cpu_clock)
> +{
> +   u32 upper, lower, old_upper, loop = 0;
> +
> +   upper = intel_uncore_read_fw(uncore, upper_reg);
> +   do {
> +   cpu_ts[0] = cpu_clock();
> +   lower = intel_uncore_read_fw(uncore, lower_reg);
> +   cpu_ts[1] = cpu_clock();
> +   old_upper = upper;
> +   upper = intel_uncore_read_fw(uncore, upper_reg);

Both register reads comprise the timestamp returned to userspace, so
presumably you want cpu_ts[] to wrap both.

   do {
   old_upper = upper;

   cpu_ts[0] = cpu_clock();
   lower = intel_uncore_read_fw(uncore, lower_reg);
   upper = intel_uncore_read_fw(uncore, upper_reg);
   cpu_ts[1] = cpu_clock();
   } while (upper != old_upper && loop++ < 2);
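
Putting that together with the rest of the hunk, the helper might end up
as below (just a sketch; the u64 assembly at the end is the obvious
combination of the two halves):

static int
__read_timestamps(struct intel_uncore *uncore,
		  i915_reg_t lower_reg,
		  i915_reg_t upper_reg,
		  u64 *cs_ts,
		  u64 *cpu_ts,
		  __ktime_func_t cpu_clock)
{
	u32 upper, lower, old_upper, loop = 0;

	upper = intel_uncore_read_fw(uncore, upper_reg);
	do {
		old_upper = upper;

		cpu_ts[0] = cpu_clock();
		lower = intel_uncore_read_fw(uncore, lower_reg);
		upper = intel_uncore_read_fw(uncore, upper_reg);
		cpu_ts[1] = cpu_clock();
	} while (upper != old_upper && loop++ < 2);

	*cs_ts = (u64)upper << 32 | lower;

	return 0;
}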


Re: [Intel-gfx] [PATCH v2 13/16] drm/i915/pxp: User interface for Protected buffer

2021-03-03 Thread Chris Wilson
Quoting Daniele Ceraolo Spurio (2021-03-01 19:31:57)
> From: Bommu Krishnaiah 
> 
> This API allows user mode to create Protected buffers. Only contexts
> marked as protected are allowed to operate on protected buffers.
> 
> We only allow setting the flags at creation time.
> 
> All protected objects that have backing storage will be considered
> invalid when the session is destroyed and they won't be usable anymore.
> 
> This is a rework of the original code by Bommu Krishnaiah. I've
> authorship unchanged since significant chunks have not been modified.
> 
> v2: split context changes, fix defines and improve documentation (Chris),
> add object invalidation logic
> 
> Signed-off-by: Bommu Krishnaiah 
> Signed-off-by: Daniele Ceraolo Spurio 
> Cc: Telukuntla Sreedhar 
> Cc: Kondapally Kalyan 
> Cc: Gupta Anshuman 
> Cc: Huang Sean Z 
> Cc: Chris Wilson 
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_create.c| 27 +++--
>  .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 10 +
>  drivers/gpu/drm/i915/gem/i915_gem_object.c|  6 +++
>  drivers/gpu/drm/i915/gem/i915_gem_object.h| 12 ++
>  .../gpu/drm/i915/gem/i915_gem_object_types.h  | 13 ++
>  drivers/gpu/drm/i915/pxp/intel_pxp.c  | 40 +++
>  drivers/gpu/drm/i915/pxp/intel_pxp.h  | 13 ++
>  drivers/gpu/drm/i915/pxp/intel_pxp_types.h|  5 +++
>  include/uapi/drm/i915_drm.h   | 22 ++
>  9 files changed, 145 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_create.c
> index 3ad3413c459f..d02e5938afbe 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
> @@ -5,6 +5,7 @@
>  
>  #include "gem/i915_gem_ioctls.h"
>  #include "gem/i915_gem_region.h"
> +#include "pxp/intel_pxp.h"
>  
>  #include "i915_drv.h"
>  #include "i915_user_extensions.h"
> @@ -13,7 +14,8 @@ static int
>  i915_gem_create(struct drm_file *file,
> struct intel_memory_region *mr,
> u64 *size_p,
> -   u32 *handle_p)
> +   u32 *handle_p,
> +   u64 user_flags)
>  {
> struct drm_i915_gem_object *obj;
> u32 handle;
> @@ -35,12 +37,17 @@ i915_gem_create(struct drm_file *file,
>  
> GEM_BUG_ON(size != obj->base.size);
>  
> +   obj->user_flags = user_flags;
> +
> ret = drm_gem_handle_create(file, &obj->base, &handle);
> /* drop reference from allocate - handle holds it now */
> i915_gem_object_put(obj);
> if (ret)
> return ret;
>  
> +   if (user_flags & I915_GEM_OBJECT_PROTECTED)
> +   intel_pxp_object_add(obj);
> +
> *handle_p = handle;
> *size_p = size;
> return 0;
> @@ -89,11 +96,12 @@ i915_gem_dumb_create(struct drm_file *file,
> return i915_gem_create(file,
>intel_memory_region_by_type(to_i915(dev),
>mem_type),
> -  &args->size, &args->handle);
> +  &args->size, &args->handle, 0);
>  }
>  
>  struct create_ext {
> struct drm_i915_private *i915;
> +   unsigned long user_flags;
>  };
>  
>  static int __create_setparam(struct drm_i915_gem_object_param *args,
> @@ -104,6 +112,19 @@ static int __create_setparam(struct 
> drm_i915_gem_object_param *args,
> return -EINVAL;
> }
>  
> +   switch (lower_32_bits(args->param)) {
> +   case I915_OBJECT_PARAM_PROTECTED_CONTENT:
> +   if (!intel_pxp_is_enabled(&ext_data->i915->gt.pxp))
> +   return -ENODEV;
> +   if (args->size) {
> +   return -EINVAL;
> +   } else if (args->data) {
> +   ext_data->user_flags |= I915_GEM_OBJECT_PROTECTED;
> +   return 0;
> +   }
> +   break;
> +   }
> +
> return -EINVAL;
>  }
>  
> @@ -148,5 +169,5 @@ i915_gem_create_ioctl(struct drm_device *dev, void *data,
> return i915_gem_create(file,
>intel_memory_region_by_type(i915,
>
> INTEL_MEMORY_SYSTEM),
> -  &args->size, &args->handle);
> +  &args->size, &args->handle,
> ext_data.user_flags);
>  }
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
> b/drivers/gpu/drm/i915/ge

Re: [Intel-gfx] [PATCH v2 11/16] drm/i915/pxp: interface for creation of protected contexts

2021-03-03 Thread Chris Wilson
Quoting Daniele Ceraolo Spurio (2021-03-01 19:31:55)
> Usage of protected objects, coming in a follow-up patch, will be
> restricted to protected contexts. Contexts can only be marked as
> protected at creation time and they must be both bannable and not
> recoverable.
> 
> When a PXP teardown occurs, all gem contexts marked as protected that
> have been used at least once will be marked as invalid and all new
> submissions using them will be rejected. All intel contexts within the
> invalidated gem contexts will be marked banned.
> A new flag has been added to the RESET_STATS ioctl to report the
> invalidation to userspace.
> 
> v2: split to its own patch and improve doc (Chris), invalidate contexts
> on teardown
> 
> Signed-off-by: Daniele Ceraolo Spurio 
> Cc: Chris Wilson 
> Signed-off-by: Daniele Ceraolo Spurio 
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_context.c   | 59 ++-
>  drivers/gpu/drm/i915/gem/i915_gem_context.h   | 18 ++
>  .../gpu/drm/i915/gem/i915_gem_context_types.h |  2 +
>  .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 13 
>  drivers/gpu/drm/i915/pxp/intel_pxp.c  | 38 
>  drivers/gpu/drm/i915/pxp/intel_pxp.h  |  1 +
>  drivers/gpu/drm/i915/pxp/intel_pxp_session.c  |  3 +
>  include/uapi/drm/i915_drm.h   | 19 ++
>  8 files changed, 150 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> index ca37d93ef5e7..19ac24a3c42c 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> @@ -76,6 +76,8 @@
>  #include "gt/intel_gpu_commands.h"
>  #include "gt/intel_ring.h"
>  
> +#include "pxp/intel_pxp.h"
> +
>  #include "i915_drm_client.h"
>  #include "i915_gem_context.h"
>  #include "i915_globals.h"
> @@ -2006,6 +2008,40 @@ static int set_priority(struct i915_gem_context *ctx,
> return 0;
>  }
>  
> +static int set_protected(struct i915_gem_context *ctx,
> +const struct drm_i915_gem_context_param *args)
> +{
> +   int ret = 0;
> +
> +   if (!intel_pxp_is_enabled(&ctx->i915->gt.pxp))
> +   ret = -ENODEV;
> +   else if (ctx->client) /* can't change this after creation! */
> +   ret = -EEXIST;
> +   else if (args->size)
> +   ret = -EINVAL;
> +   else if (!args->value)
> +   clear_bit(UCONTEXT_PROTECTED, &ctx->user_flags);
> +   else if (i915_gem_context_is_recoverable(ctx) ||
> +!i915_gem_context_is_bannable(ctx))
> +   ret = -EPERM;
> +   else
> +   set_bit(UCONTEXT_PROTECTED, &ctx->user_flags);
> +
> +   return ret;
> +}
> +
> +static int get_protected(struct i915_gem_context *ctx,
> +struct drm_i915_gem_context_param *args)
> +{
> +   if (!intel_pxp_is_enabled(&ctx->i915->gt.pxp))
> +   return -ENODEV;
> +
> +   args->size = 0;
> +   args->value = i915_gem_context_can_use_protected_content(ctx);
> +
> +   return 0;
> +}
> +
>  static int ctx_setparam(struct drm_i915_file_private *fpriv,
> struct i915_gem_context *ctx,
> struct drm_i915_gem_context_param *args)
> @@ -2038,6 +2074,8 @@ static int ctx_setparam(struct drm_i915_file_private 
> *fpriv,
> ret = -EPERM;
> else if (args->value)
> i915_gem_context_set_bannable(ctx);
> +   else if (i915_gem_context_can_use_protected_content(ctx))
> +   ret = -EPERM; /* can't clear this for protected 
> contexts */
> else
> i915_gem_context_clear_bannable(ctx);
> break;
> @@ -2045,10 +2083,12 @@ static int ctx_setparam(struct drm_i915_file_private 
> *fpriv,
> case I915_CONTEXT_PARAM_RECOVERABLE:
> if (args->size)
> ret = -EINVAL;
> -   else if (args->value)
> -   i915_gem_context_set_recoverable(ctx);
> -   else
> +   else if (!args->value)
> i915_gem_context_clear_recoverable(ctx);
> +   else if (i915_gem_context_can_use_protected_content(ctx))
> +   ret = -EPERM; /* can't set this for protected 
> contexts */
> +   else
> +   i915_gem_context_set_recoverable(ctx);
> break;
>  
> case I915_CONTE

Re: [Intel-gfx] [PATCH v2 10/16] drm/i915/pxp: Enable PXP power management

2021-03-03 Thread Chris Wilson
Quoting Daniele Ceraolo Spurio (2021-03-01 19:31:54)
> +int intel_pxp_runtime_resume(struct intel_pxp *pxp)
> +{
> +   struct intel_gt *gt = pxp_to_gt(pxp);
> +   int ret;
> +
> +   if (!intel_pxp_is_enabled(pxp))
> +   return 0;
> +
> +   intel_pxp_irq_enable(pxp);
> +   pxp->global_state_in_suspend = false;
> +
> +   /*
> +* if the display loses power during runtime suspend it will cause the
> +* session to become invalid, so to be safe we always re-create it. 
> The
> +* MEI module is still bound, so this is the same as a teardown event,
> +* hence we can just pretend we received the irq.
> +*/
> +   intel_pxp_mark_termination_in_progress(pxp);
> +
> +   spin_lock_irq(&pxp->irq_lock);
> +   intel_pxp_irq_handler(pxp, GEN12_DISPLAY_PXP_STATE_TERMINATED_INTERRUPT);
> +   spin_unlock_irq(&pxp->irq_lock);
> +
> +   return ret;

return random() ?

> diff --git a/drivers/gpu/drm/i915/pxp/intel_pxp_session.c 
> b/drivers/gpu/drm/i915/pxp/intel_pxp_session.c
> index 488006a0cf39..bb981d38c2fe 100644
> --- a/drivers/gpu/drm/i915/pxp/intel_pxp_session.c
> +++ b/drivers/gpu/drm/i915/pxp/intel_pxp_session.c
> @@ -25,8 +25,14 @@ static bool intel_pxp_session_is_in_play(struct intel_pxp 
> *pxp, u32 id)
> intel_wakeref_t wakeref;
> u32 sip = 0;
>  
> -   with_intel_runtime_pm(gt->uncore->rpm, wakeref)
> -   sip = intel_uncore_read(gt->uncore, GEN12_KCR_SIP);
> +   /* if we're suspended the session is considered off */
> +   wakeref = intel_runtime_pm_get_if_in_use(gt->uncore->rpm);
> +   if (!wakeref)
> +   return false;
> +
> +   sip = intel_uncore_read(gt->uncore, GEN12_KCR_SIP);
> +
> +   intel_runtime_pm_put(gt->uncore->rpm, wakeref);

with_intel_runtime_pm_if_in_use(gt->uncore->rpm, wakeref)
sip = intel_uncore_read(gt->uncore, GEN12_KCR_SIP);

>  
> return sip & BIT(id);
>  }
> @@ -43,12 +49,18 @@ static int pxp_wait_for_session_state(struct intel_pxp 
> *pxp, u32 id, bool in_pla
> u32 mask = BIT(id);
> int ret;
>  
> -   with_intel_runtime_pm(gt->uncore->rpm, wakeref)
> -   ret = intel_wait_for_register(gt->uncore,
> - GEN12_KCR_SIP,
> - mask,
> - in_play ? mask : 0,
> - 100);
> +   /* if we're suspended the session is considered off */
> +   wakeref = intel_runtime_pm_get_if_in_use(gt->uncore->rpm);
> +   if (!wakeref)
> +   return in_play ? -ENODEV : 0;
> +
> +   ret = intel_wait_for_register(gt->uncore,
> + GEN12_KCR_SIP,
> + mask,
> + in_play ? mask : 0,
> + 100);
> +
> +   intel_runtime_pm_put(gt->uncore->rpm, wakeref);

Ditto
-Chris


Re: [Intel-gfx] [PATCH v2 09/16] drm/i915/pxp: Implement PXP irq handler

2021-03-03 Thread Chris Wilson
Quoting Daniele Ceraolo Spurio (2021-03-01 19:31:53)
> +static int pxp_terminate(struct intel_pxp *pxp)
> +{
> +   int ret = 0;
> +
> +   mutex_lock(&pxp->mutex);
> +
> +   pxp->global_state_attacked = true;

global_state_attacked is serialised by pxp->work

> +
> +   ret = intel_pxp_arb_terminate_session_with_global_terminate(pxp);
> +
> +   mutex_unlock(&pxp->mutex);
> +
> +   return ret;

Fake error returns.

> +}
-Chris


Re: [Intel-gfx] [PATCH v2 09/16] drm/i915/pxp: Implement PXP irq handler

2021-03-03 Thread Chris Wilson
Quoting Daniele Ceraolo Spurio (2021-03-01 19:31:53)
> From: "Huang, Sean Z" 
> 
> The HW will generate a teardown interrupt when session termination is
> required, which requires i915 to submit a terminating batch. Once the HW
> is done with the termination it will generate another interrupt, at
> which point it is safe to re-create the session.

Why do we do the auto recreation after the teardown interrupt?

> 
> v2: use struct completion instead of bool (Chris)
> 
> Signed-off-by: Huang, Sean Z 
> Signed-off-by: Daniele Ceraolo Spurio 
> Cc: Chris Wilson 
> ---
>  drivers/gpu/drm/i915/Makefile|   1 +
>  drivers/gpu/drm/i915/gt/intel_gt_irq.c   |   7 +
>  drivers/gpu/drm/i915/i915_reg.h  |   1 +
>  drivers/gpu/drm/i915/pxp/intel_pxp.c |  34 +
>  drivers/gpu/drm/i915/pxp/intel_pxp.h |  16 ++
>  drivers/gpu/drm/i915/pxp/intel_pxp_irq.c | 151 +++
>  drivers/gpu/drm/i915/pxp/intel_pxp_irq.h |  33 
>  drivers/gpu/drm/i915/pxp/intel_pxp_session.c |   9 +-
>  drivers/gpu/drm/i915/pxp/intel_pxp_session.h |   1 +
>  drivers/gpu/drm/i915/pxp/intel_pxp_tee.c |  10 +-
>  drivers/gpu/drm/i915/pxp/intel_pxp_types.h   |   8 +
>  11 files changed, 268 insertions(+), 3 deletions(-)
>  create mode 100644 drivers/gpu/drm/i915/pxp/intel_pxp_irq.c
>  create mode 100644 drivers/gpu/drm/i915/pxp/intel_pxp_irq.h
> 
> diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> index 8b605f326039..5e9bd34dec38 100644
> --- a/drivers/gpu/drm/i915/Makefile
> +++ b/drivers/gpu/drm/i915/Makefile
> @@ -274,6 +274,7 @@ i915-y += i915_perf.o
>  i915-$(CONFIG_DRM_I915_PXP) += \
> pxp/intel_pxp.o \
> pxp/intel_pxp_cmd.o \
> +   pxp/intel_pxp_irq.o \
> pxp/intel_pxp_session.o \
> pxp/intel_pxp_tee.o
>  
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_irq.c 
> b/drivers/gpu/drm/i915/gt/intel_gt_irq.c
> index d29126c458ba..0d3585efe2b8 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_irq.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_irq.c
> @@ -13,6 +13,7 @@
>  #include "intel_lrc_reg.h"
>  #include "intel_uncore.h"
>  #include "intel_rps.h"
> +#include "pxp/intel_pxp_irq.h"
>  
>  static void guc_irq_handler(struct intel_guc *guc, u16 iir)
>  {
> @@ -64,6 +65,9 @@ gen11_other_irq_handler(struct intel_gt *gt, const u8 
> instance,
> if (instance == OTHER_GTPM_INSTANCE)
> return gen11_rps_irq_handler(&gt->rps, iir);
>  
> +   if (instance == OTHER_KCR_INSTANCE)
> +   return intel_pxp_irq_handler(&gt->pxp, iir);
> +
> WARN_ONCE(1, "unhandled other interrupt instance=0x%x, iir=0x%x\n",
>   instance, iir);
>  }
> @@ -190,6 +194,9 @@ void gen11_gt_irq_reset(struct intel_gt *gt)
> intel_uncore_write(uncore, GEN11_GPM_WGBOXPERF_INTR_MASK,  ~0);
> intel_uncore_write(uncore, GEN11_GUC_SG_INTR_ENABLE, 0);
> intel_uncore_write(uncore, GEN11_GUC_SG_INTR_MASK,  ~0);
> +
> +   intel_uncore_write(uncore, GEN11_CRYPTO_RSVD_INTR_ENABLE, 0);
> +   intel_uncore_write(uncore, GEN11_CRYPTO_RSVD_INTR_MASK,  ~0);
>  }
>  
>  void gen11_gt_irq_postinstall(struct intel_gt *gt)
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index e5dd0203991b..97a6d0c638ec 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -7958,6 +7958,7 @@ enum {
>  /* irq instances for OTHER_CLASS */
>  #define OTHER_GUC_INSTANCE 0
>  #define OTHER_GTPM_INSTANCE 1
> +#define OTHER_KCR_INSTANCE 4
>  
>  #define GEN11_INTR_IDENTITY_REG(x) _MMIO(0x190060 + ((x) * 4))
>  
> diff --git a/drivers/gpu/drm/i915/pxp/intel_pxp.c 
> b/drivers/gpu/drm/i915/pxp/intel_pxp.c
> index cbec9395bde9..0ca1c2c16972 100644
> --- a/drivers/gpu/drm/i915/pxp/intel_pxp.c
> +++ b/drivers/gpu/drm/i915/pxp/intel_pxp.c
> @@ -2,7 +2,9 @@
>  /*
>   * Copyright(c) 2020 Intel Corporation.
>   */
> +#include 
>  #include "intel_pxp.h"
> +#include "intel_pxp_irq.h"
>  #include "intel_pxp_tee.h"
>  #include "gt/intel_context.h"
>  #include "i915_drv.h"
> @@ -67,12 +69,23 @@ void intel_pxp_init(struct intel_pxp *pxp)
>  
> mutex_init(&pxp->mutex);
>  
> +   /*
> +* we'll use the completion to check if there is a termination 
> pending,
> +* so we start it as completed and we reinit it when a termination
> +* is triggered.
> +*/
> +   init_completion(&pxp->termination);
> +   complete_all(&pxp->termination);
> +
> kcr_p

Re: [Intel-gfx] [PATCH v2 04/16] drm/i915/pxp: allocate a vcs context for pxp usage

2021-03-03 Thread Chris Wilson
Quoting Daniele Ceraolo Spurio (2021-03-01 19:31:48)
> @@ -232,6 +235,13 @@ ktime_t intel_engine_get_busy_time(struct 
> intel_engine_cs *engine,
>  
>  u32 intel_engine_context_size(struct intel_gt *gt, u8 class);
>  
> +struct intel_context *
> +intel_engine_pinned_context_create(struct intel_engine_cs *engine,
> +  unsigned int hwsp,
> +  struct lock_class_key *key,
> +  const char *name);

intel_engine_create_pinned_context()

Other users would want to pass in vm and ring size (ring size may be
useful here as well for larger SESSION_TERMINATE_LEN())
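
i.e. something along the lines of (a sketch only, exact parameter order
up to you):

struct intel_context *
intel_engine_create_pinned_context(struct intel_engine_cs *engine,
				   struct i915_address_space *vm,
				   unsigned int ring_size,
				   unsigned int hwsp,
				   struct lock_class_key *key,
				   const char *name);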
-Chris


Re: [Intel-gfx] [PATCH v2 07/16] drm/i915/pxp: Create the arbitrary session after boot

2021-03-03 Thread Chris Wilson
Quoting Daniele Ceraolo Spurio (2021-03-01 19:31:51)
> +static inline bool intel_pxp_is_active(const struct intel_pxp *pxp)
> +{
> +   return pxp->arb_is_in_play;
> +}

> +static bool intel_pxp_session_is_in_play(struct intel_pxp *pxp, u32 id)
> +{
> +   struct intel_gt *gt = pxp_to_gt(pxp);
> +   intel_wakeref_t wakeref;
> +   u32 sip = 0;
> +
> +   with_intel_runtime_pm(gt->uncore->rpm, wakeref)
> +   sip = intel_uncore_read(gt->uncore, GEN12_KCR_SIP);
> +
> +   return sip & BIT(id);
> +}
> +
> +bool intel_pxp_arb_session_is_in_play(struct intel_pxp *pxp)
> +{
> +   return intel_pxp_session_is_in_play(pxp, ARB_SESSION);
> +}

So pxp->arb_is_in_play is not the same as intel_pxp_arb_session_is_in_play().

That's confusing.
-Chris


Re: [Intel-gfx] [PATCH v2 08/16] drm/i915/pxp: Implement arb session teardown

2021-03-03 Thread Chris Wilson
Quoting Daniele Ceraolo Spurio (2021-03-01 19:31:52)
> From: "Huang, Sean Z" 
> 
> Teardown is triggered when the display topology changes and no
> longer meets the secure playback requirement, and hardware trashes
> all the encryption keys for display. Additionally, we want to emit a
> teardown operation to make sure we're clean on boot and resume.
> 
> v2: emit in the ring, use high prio request (Chris)
> 
> Signed-off-by: Huang, Sean Z 
> Signed-off-by: Daniele Ceraolo Spurio 
> Cc: Chris Wilson 
> ---
>  drivers/gpu/drm/i915/Makefile|   1 +
>  drivers/gpu/drm/i915/pxp/intel_pxp_cmd.c | 166 +++
>  drivers/gpu/drm/i915/pxp/intel_pxp_cmd.h |  15 ++
>  drivers/gpu/drm/i915/pxp/intel_pxp_session.c |  38 +
>  drivers/gpu/drm/i915/pxp/intel_pxp_session.h |   1 +
>  drivers/gpu/drm/i915/pxp/intel_pxp_tee.c |   5 +-
>  6 files changed, 225 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/gpu/drm/i915/pxp/intel_pxp_cmd.c
>  create mode 100644 drivers/gpu/drm/i915/pxp/intel_pxp_cmd.h
> 
> diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> index d6d510e4875e..8b605f326039 100644
> --- a/drivers/gpu/drm/i915/Makefile
> +++ b/drivers/gpu/drm/i915/Makefile
> @@ -273,6 +273,7 @@ i915-y += i915_perf.o
>  # Protected execution platform (PXP) support
>  i915-$(CONFIG_DRM_I915_PXP) += \
> pxp/intel_pxp.o \
> +   pxp/intel_pxp_cmd.o \
> pxp/intel_pxp_session.o \
> pxp/intel_pxp_tee.o
>  
> diff --git a/drivers/gpu/drm/i915/pxp/intel_pxp_cmd.c 
> b/drivers/gpu/drm/i915/pxp/intel_pxp_cmd.c
> new file mode 100644
> index ..ffab09839cd3
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/pxp/intel_pxp_cmd.c
> @@ -0,0 +1,166 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright(c) 2020, Intel Corporation. All rights reserved.
> + */
> +
> +#include "intel_pxp.h"
> +#include "intel_pxp_session.h"
> +#include "gt/intel_context.h"
> +#include "gt/intel_engine_pm.h"
> +#include "gt/intel_gpu_commands.h"
> +#include "gt/intel_ring.h"
> +
> +#include "i915_trace.h"
> +
> +/* PXP GPU command definitions */
> +
> +/* MI_SET_APPID */
> +#define   MI_SET_APPID_SESSION_ID(x)((x) << 0)
> +
> +/* MI_FLUSH_DW */
> +#define   MI_FLUSH_DW_DW0_PROTECTED_MEMORY_ENABLE   BIT(22)
> +
> +/* MI_WAIT */
> +#define   MFX_WAIT_DW0_PXP_SYNC_CONTROL_FLAG BIT(9)
> +#define   MFX_WAIT_DW0_MFX_SYNC_CONTROL_FLAG  BIT(8)

We've been using REG_BIT() for the explicit (u32) casting.
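
i.e. (sketch)

#define   MFX_WAIT_DW0_PXP_SYNC_CONTROL_FLAG	REG_BIT(9)
#define   MFX_WAIT_DW0_MFX_SYNC_CONTROL_FLAG	REG_BIT(8)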

> +/* CRYPTO_KEY_EXCHANGE */
> +#define CRYPTO_KEY_EXCHANGE ((0x3 << 29) | (0x01609 << 16))

#define __INSTR(client) ((client) << INSTR_CLIENT_SHIFT)

#define MI_INSTR(opcode, flags) \
(__INSTR(INSTR_MI_CLIENT) | (opcode) << 23 | (flags))
#define RC_INSTR(foo) (__INSTR(INSTR_RC_CLIENT) | (foo) << 16)

#define CRYPTO_KEY_EXCHANGE RC_INSTR(0x1609)

With a better (foo).

> +
> +/* stall until prior PXP and MFX/HCP/HUC objects are completed */
> +#define MFX_WAIT_PXP \
> +   MFX_WAIT | \
> +   MFX_WAIT_DW0_PXP_SYNC_CONTROL_FLAG | \
> +   MFX_WAIT_DW0_MFX_SYNC_CONTROL_FLAG;
> +
> +static u32 *pxp_emit_session_selection(u32 *cs, u32 idx)
> +{
> +   *cs++ = MFX_WAIT_PXP;

One day someone will proofread bspec.

> +   /* pxp off */
> +   *cs++ = MI_FLUSH_DW;
> +   *cs++ = 0;
> +   *cs++ = 0;

Hmm. Can the immediate data be dropped? TIL.

> +   /* select session */
> +   *cs++ = MI_SET_APPID | MI_SET_APPID_SESSION_ID(idx);
> +
> +   *cs++ = MFX_WAIT_PXP;
> +
> +   /* pxp on */
> +   *cs++ = MI_FLUSH_DW | MI_FLUSH_DW_DW0_PROTECTED_MEMORY_ENABLE;
> +   *cs++ = 0;
> +   *cs++ = 0;

Bspec says "after completion of the flush...", which says to me we
should not initiate the wait until after the flush, so we would need a
post-sync op here to stall the CS (or else we may complete the wait
before the operation is begun). I don't see any programming notes to
that effect, so could just be my paranoia from handling atomics.
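
If we did want to be cautious, a sketch of making the pxp-on flush
synchronous with a store-dw post-sync op (reusing the per-context
scratch the way gen8_emit_flush_xcs() does; address/value purely
illustrative):

	*cs++ = MI_FLUSH_DW | MI_FLUSH_DW_DW0_PROTECTED_MEMORY_ENABLE |
		MI_FLUSH_DW_OP_STOREDW | MI_FLUSH_DW_STORE_INDEX;
	*cs++ = LRC_PPHWSP_SCRATCH_ADDR;
	*cs++ = 0;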

> +   *cs++ = MFX_WAIT_PXP;

Fwiw, the bspec language would seem to imply that nothing should happen
with this wait at this point. Perhaps more reason to make the pxp-on
MI_FLUSH_DW be synchronous. (Though we will have a sync point at the
breadcrumb, so meh.)

> +
> +   return cs;
> +}
> +
> +static u32 *pxp_emit_inline_termination(u32 *cs)
> +{
> +   /* session inline termination */
> +   *cs++ = CRYPTO_KEY_EXCHANGE;
> +   *cs++ = 0;
> +
> +   return cs;
> +}
> +
> +static u32 *pxp_emit_wait(u32 *cs)
> +{
> +   /* wait 

Re: [Intel-gfx] [PATCH v2 09/16] drm/i915/pxp: Implement PXP irq handler

2021-03-03 Thread Chris Wilson
Quoting Daniele Ceraolo Spurio (2021-03-01 19:31:53)
> From: "Huang, Sean Z" 
> 
> The HW will generate a teardown interrupt when session termination is
> required, which requires i915 to submit a terminating batch. Once the HW
> is done with the termination it will generate another interrupt, at
> which point it is safe to re-create the session.
> 
> v2: use struct completion instead of bool (Chris)
> 
> Signed-off-by: Huang, Sean Z 
> Signed-off-by: Daniele Ceraolo Spurio 
> Cc: Chris Wilson 
> ---
>  drivers/gpu/drm/i915/Makefile|   1 +
>  drivers/gpu/drm/i915/gt/intel_gt_irq.c   |   7 +
>  drivers/gpu/drm/i915/i915_reg.h  |   1 +
>  drivers/gpu/drm/i915/pxp/intel_pxp.c |  34 +
>  drivers/gpu/drm/i915/pxp/intel_pxp.h |  16 ++
>  drivers/gpu/drm/i915/pxp/intel_pxp_irq.c | 151 +++
>  drivers/gpu/drm/i915/pxp/intel_pxp_irq.h |  33 
>  drivers/gpu/drm/i915/pxp/intel_pxp_session.c |   9 +-
>  drivers/gpu/drm/i915/pxp/intel_pxp_session.h |   1 +
>  drivers/gpu/drm/i915/pxp/intel_pxp_tee.c |  10 +-
>  drivers/gpu/drm/i915/pxp/intel_pxp_types.h   |   8 +
>  11 files changed, 268 insertions(+), 3 deletions(-)
>  create mode 100644 drivers/gpu/drm/i915/pxp/intel_pxp_irq.c
>  create mode 100644 drivers/gpu/drm/i915/pxp/intel_pxp_irq.h
> 
> diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> index 8b605f326039..5e9bd34dec38 100644
> --- a/drivers/gpu/drm/i915/Makefile
> +++ b/drivers/gpu/drm/i915/Makefile
> @@ -274,6 +274,7 @@ i915-y += i915_perf.o
>  i915-$(CONFIG_DRM_I915_PXP) += \
> pxp/intel_pxp.o \
> pxp/intel_pxp_cmd.o \
> +   pxp/intel_pxp_irq.o \
> pxp/intel_pxp_session.o \
> pxp/intel_pxp_tee.o
>  
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_irq.c 
> b/drivers/gpu/drm/i915/gt/intel_gt_irq.c
> index d29126c458ba..0d3585efe2b8 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_irq.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_irq.c
> @@ -13,6 +13,7 @@
>  #include "intel_lrc_reg.h"
>  #include "intel_uncore.h"
>  #include "intel_rps.h"
> +#include "pxp/intel_pxp_irq.h"
>  
>  static void guc_irq_handler(struct intel_guc *guc, u16 iir)
>  {
> @@ -64,6 +65,9 @@ gen11_other_irq_handler(struct intel_gt *gt, const u8 
> instance,
> if (instance == OTHER_GTPM_INSTANCE)
> return gen11_rps_irq_handler(&gt->rps, iir);
>  
> +   if (instance == OTHER_KCR_INSTANCE)
> +   return intel_pxp_irq_handler(&gt->pxp, iir);
> +
> WARN_ONCE(1, "unhandled other interrupt instance=0x%x, iir=0x%x\n",
>   instance, iir);
>  }
> @@ -190,6 +194,9 @@ void gen11_gt_irq_reset(struct intel_gt *gt)
> intel_uncore_write(uncore, GEN11_GPM_WGBOXPERF_INTR_MASK,  ~0);
> intel_uncore_write(uncore, GEN11_GUC_SG_INTR_ENABLE, 0);
> intel_uncore_write(uncore, GEN11_GUC_SG_INTR_MASK,  ~0);
> +
> +   intel_uncore_write(uncore, GEN11_CRYPTO_RSVD_INTR_ENABLE, 0);
> +   intel_uncore_write(uncore, GEN11_CRYPTO_RSVD_INTR_MASK,  ~0);
>  }
>  
>  void gen11_gt_irq_postinstall(struct intel_gt *gt)
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index e5dd0203991b..97a6d0c638ec 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -7958,6 +7958,7 @@ enum {
>  /* irq instances for OTHER_CLASS */
>  #define OTHER_GUC_INSTANCE 0
>  #define OTHER_GTPM_INSTANCE 1
> +#define OTHER_KCR_INSTANCE 4
>  
>  #define GEN11_INTR_IDENTITY_REG(x) _MMIO(0x190060 + ((x) * 4))
>  
> diff --git a/drivers/gpu/drm/i915/pxp/intel_pxp.c 
> b/drivers/gpu/drm/i915/pxp/intel_pxp.c
> index cbec9395bde9..0ca1c2c16972 100644
> --- a/drivers/gpu/drm/i915/pxp/intel_pxp.c
> +++ b/drivers/gpu/drm/i915/pxp/intel_pxp.c
> @@ -2,7 +2,9 @@
>  /*
>   * Copyright(c) 2020 Intel Corporation.
>   */
> +#include 
>  #include "intel_pxp.h"
> +#include "intel_pxp_irq.h"
>  #include "intel_pxp_tee.h"
>  #include "gt/intel_context.h"
>  #include "i915_drv.h"
> @@ -67,12 +69,23 @@ void intel_pxp_init(struct intel_pxp *pxp)
>  
> mutex_init(&pxp->mutex);
>  
> +   /*
> +* we'll use the completion to check if there is a termination 
> pending,
> +* so we start it as completed and we reinit it when a termination
> +* is triggered.
> +*/
> +   init_completion(&pxp->termination);
> +   complete_all(&pxp->termination);
> +
> kcr_pxp_enable(gt);
>  
> ret = create_v

Re: [Intel-gfx] [PATCH] drm/i915: Verify dma_addr in gen8_ggtt_pte_encode

2021-03-02 Thread Chris Wilson
Quoting Piorkowski, Piotr (2021-02-24 15:29:25)
> From: Piotr Piórkowski 
> 
> Until now, the gen8_ggtt_pte_encode function, responsible for the preparation
> of GGTT PTE, has not verified in any way whether the address given as the
> parameter is correct.
> By adding a GGTT address mask, we can easily detect when a dma_addr will not
> fit in the PTE field.
> While at it, clean up the place where we hold all GEN12 GGTT PTE masks,
> and add a description of the PTE layout.
> 
> Bspec: 45015
> 
> Signed-off-by: Piotr Piórkowski 
> Cc: Matthew Auld 
> Cc: Michal Wajdeczko 
> Cc: Michal Winiarski 
> ---
>  drivers/gpu/drm/i915/gt/intel_ggtt.c |  2 ++
>  drivers/gpu/drm/i915/gt/intel_gtt.h  | 13 -
>  2 files changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c 
> b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> index b0b8ded834f0..52b2428da431 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> @@ -193,6 +193,8 @@ static u64 gen8_ggtt_pte_encode(dma_addr_t addr,
>  {
> gen8_pte_t pte = addr | _PAGE_PRESENT;
>  
> +   GEM_BUG_ON(addr & ~GEN12_GGTT_PTE_ADDR_MASK);

You can also check the dma_get_mask() doesn't exceed the addr mask.
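
e.g. at probe time, something like (a sketch; placement and device
pointer illustrative, GEN12_GGTT_PTE_ADDR_MASK being the new mask from
this patch):

	/* the dma mask must not exceed what the PTE can address */
	GEM_BUG_ON(fls64(dma_get_mask(i915->drm.dev)) >
		   fls64(GEN12_GGTT_PTE_ADDR_MASK));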
-Chris


Re: [Intel-gfx] [PATCH] drm/i915: Wedge the GPU if command parser setup fails

2021-03-02 Thread Chris Wilson
Quoting Tvrtko Ursulin (2021-02-26 09:51:54)
> From: Tvrtko Ursulin 
> 
> Commit 311a50e76a33 ("drm/i915: Add support for mandatory cmdparsing")
> introduced mandatory command parsing but setup failures were not
> translated into wedging the GPU which was probably the intent.
> 
> Possible errors come in two categories. Either the sanity check on
> internal tables has failed, which should be caught in CI unless an
> affected platform would be missed in testing; or memory allocation failure
> happened during driver load, which should be extremely unlikely but for
> correctness should still be handled.
> 
> Signed-off-by: Tvrtko Ursulin 
> Fixes: 311a50e76a33 ("drm/i915: Add support for mandatory cmdparsing")
> Cc: Jon Bloomfield 
> Cc: Joonas Lahtinen 
> Cc: Chris Wilson 
> ---
> To catch up with the referenced patch we'd need to copy stable from v5.4+.
> Failures are very unlikely but I think it would still be prudent to do so.
> ---
>  drivers/gpu/drm/i915/gt/intel_engine_cs.c |  8 +++-
>  drivers/gpu/drm/i915/i915_cmd_parser.c| 19 +--
>  drivers/gpu/drm/i915/i915_drv.h   |  2 +-
>  3 files changed, 21 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c 
> b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> index 577ebd4a324f..ac6478c4ede4 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> @@ -696,6 +696,11 @@ static int engine_setup_common(struct intel_engine_cs 
> *engine)
> goto err_status;
> }
>  
> +   err = intel_engine_init_cmd_parser(engine);
> +   if (err) {
> +   goto err_cmd_parser;
> +   }

-{}

> +
> i915_sched_init(>sched,
> engine->i915->drm.dev,
> engine->name,
> @@ -703,7 +708,6 @@ static int engine_setup_common(struct intel_engine_cs 
> *engine)
> ENGINE_PHYSICAL);
>  
> intel_engine_init_execlists(engine);
> -   intel_engine_init_cmd_parser(engine);
> intel_engine_init__pm(engine);
> intel_engine_init_retire(engine);
>  
> @@ -720,6 +724,8 @@ static int engine_setup_common(struct intel_engine_cs 
> *engine)
>  
> return 0;
>  
> +err_cmd_parser:
> +   intel_breadcrumbs_free(engine->breadcrumbs);
>  err_status:
> cleanup_status_page(engine);
> return err;
> diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c 
> b/drivers/gpu/drm/i915/i915_cmd_parser.c
> index ced9a96d7c34..13905891aff7 100644
> --- a/drivers/gpu/drm/i915/i915_cmd_parser.c
> +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
> @@ -940,7 +940,7 @@ static void fini_hash_table(struct intel_engine_cs 
> *engine)
>   * struct intel_engine_cs based on whether the platform requires software
>   * command parsing.
>   */
> -void intel_engine_init_cmd_parser(struct intel_engine_cs *engine)
> +int intel_engine_init_cmd_parser(struct intel_engine_cs *engine)
>  {
> const struct drm_i915_cmd_table *cmd_tables;
> int cmd_table_count;
> @@ -948,7 +948,7 @@ void intel_engine_init_cmd_parser(struct intel_engine_cs 
> *engine)
>  
> if (!IS_GEN(engine->i915, 7) && !(IS_GEN(engine->i915, 9) &&
>   engine->class == COPY_ENGINE_CLASS))
> -   return;
> +   return 0;
>  
> switch (engine->class) {
> case RENDER_CLASS:
> @@ -1013,19 +1013,19 @@ void intel_engine_init_cmd_parser(struct 
> intel_engine_cs *engine)
> break;
> default:
> MISSING_CASE(engine->class);
> -   return;
> +   goto out;
> }
>  
> if (!validate_cmds_sorted(engine, cmd_tables, cmd_table_count)) {
> drm_err(&engine->i915->drm,
> "%s: command descriptions are not sorted\n",
> engine->name);
> -   return;
> +   goto out;
> }
> if (!validate_regs_sorted(engine)) {
> drm_err(&engine->i915->drm,
> "%s: registers are not sorted\n", engine->name);
> -   return;
> +   goto out;
> }
>  
> ret = init_hash_table(engine, cmd_tables, cmd_table_count);
> @@ -1033,10 +1033,17 @@ void intel_engine_init_cmd_parser(struct 
> intel_engine_cs *engine)
> drm_err(&engine->i915->drm,
> "%s: initialised failed!\n", engine-&

[Intel-gfx] [PATCH] drm/i915: Refine VT-d scanout workaround

2021-02-20 Thread Chris Wilson
VT-d may cause overfetch of the scanout PTE, both before and after the
vma (depending on the scanout orientation). bspec recommends that we
provide a tile-row in either direction, and suggests using 168 PTE,
warning that the accesses will wrap around the ends of the GGTT.
Currently, we fill the entire GGTT with scratch pages when using VT-d to
always ensure there are valid entries around every vma, including
scanout. However, writing every PTE is slow as on recent devices we
perform 8MiB of uncached writes, incurring an extra 100ms during resume.

If instead we focus on only putting guard pages around scanout, we can
avoid touching the whole GGTT. To avoid having to introduce extra nodes
around each scanout vma, we adjust the scanout drm_mm_node to be smaller
than the allocated space, and fixup the extra PTE during dma binding.

v2: Move the guard from modifying drm_mm_node.start which is still used
by the drm_mm itself, into an adjustment of node.start at the point of
use.

v3: Pass the requested guard padding from the caller, so we can drop the
VT-d w/a knowledge from the i915_vma allocator.

v4: Bump minimum padding to 168 PTE and cautiously ensure that a full
tile row around the vma is included with the guard.

Signed-off-by: Chris Wilson 
Cc: Ville Syrjälä 
Cc: Matthew Auld 
Cc: Imre Deak 
Reviewed-by: Matthew Auld 
---
 drivers/gpu/drm/i915/gem/i915_gem_domain.c | 13 +++
 drivers/gpu/drm/i915/gt/intel_ggtt.c   | 25 +-
 drivers/gpu/drm/i915/i915_gem_gtt.h|  1 +
 drivers/gpu/drm/i915/i915_vma.c|  8 +++
 4 files changed, 23 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c 
b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
index 0478b069c202..32b13af0d3df 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
@@ -16,6 +16,8 @@
 #include "i915_gem_lmem.h"
 #include "i915_gem_mman.h"
 
+#define VTD_GUARD (168u * I915_GTT_PAGE_SIZE) /* 168 or tile-row PTE padding */
+
 static bool gpu_write_needs_clflush(struct drm_i915_gem_object *obj)
 {
return !(obj->cache_level == I915_CACHE_NONE ||
@@ -345,6 +347,17 @@ i915_gem_object_pin_to_display_plane(struct 
drm_i915_gem_object *obj,
if (ret)
goto err;
 
+   /* VT-d may overfetch before/after the vma, so pad with scratch */
+   if (intel_scanout_needs_vtd_wa(i915)) {
+   unsigned int guard = VTD_GUARD;
+
+   if (i915_gem_object_is_tiled(obj))
+   guard = max(guard,
+   i915_gem_object_get_tile_row_size(obj));
+
+   flags |= PIN_OFFSET_GUARD | guard;
+   }
+
/*
 * As the user may map the buffer once pinned in the display plane
 * (e.g. libkms for the bootup splash), we have to ensure that we
diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c 
b/drivers/gpu/drm/i915/gt/intel_ggtt.c
index 6b326138e765..251b50884d1c 100644
--- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
@@ -319,27 +319,6 @@ static void nop_clear_range(struct i915_address_space *vm,
 {
 }
 
-static void gen8_ggtt_clear_range(struct i915_address_space *vm,
- u64 start, u64 length)
-{
-   struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
-   unsigned int first_entry = start / I915_GTT_PAGE_SIZE;
-   unsigned int num_entries = length / I915_GTT_PAGE_SIZE;
-   const gen8_pte_t scratch_pte = vm->scratch[0]->encode;
-   gen8_pte_t __iomem *gtt_base =
-   (gen8_pte_t __iomem *)ggtt->gsm + first_entry;
-   const int max_entries = ggtt_total_entries(ggtt) - first_entry;
-   int i;
-
-   if (WARN(num_entries > max_entries,
-"First entry = %d; Num entries = %d (max=%d)\n",
-first_entry, num_entries, max_entries))
-   num_entries = max_entries;
-
-   for (i = 0; i < num_entries; i++)
-   gen8_set_pte(&gtt_base[i], scratch_pte);
-}
-
 static void bxt_vtd_ggtt_wa(struct i915_address_space *vm)
 {
/*
@@ -907,8 +886,6 @@ static int gen8_gmch_probe(struct i915_ggtt *ggtt)
ggtt->vm.cleanup = gen6_gmch_remove;
ggtt->vm.insert_page = gen8_ggtt_insert_page;
ggtt->vm.clear_range = nop_clear_range;
-   if (intel_scanout_needs_vtd_wa(i915))
-   ggtt->vm.clear_range = gen8_ggtt_clear_range;
 
ggtt->vm.insert_entries = gen8_ggtt_insert_entries;
 
@@ -1054,7 +1031,7 @@ static int gen6_gmch_probe(struct i915_ggtt *ggtt)
ggtt->vm.alloc_pt_dma = alloc_pt_dma;
 
ggtt->vm.clear_range = nop_clear_range;
-   if (!HAS_FULL_PPGTT(i915) || intel_scanout_needs_vtd_wa(i915))
+   if (!HAS_FULL_PPGTT(i915))
ggtt->vm.clear_range = gen6_ggtt_clear_range;
ggtt->vm.insert_page = gen6_ggtt_insert_page;
ggtt-

Re: [Intel-gfx] [PATCH] drm/i915: Wait for scanout to stop when sanitizing planes

2021-02-17 Thread Chris Wilson
Quoting Ville Syrjala (2021-02-17 16:20:50)
> From: Ville Syrjälä 
> 
> When we sanitize planes let's wait for the scanout to stop
> before we let the subsequent code tear down the ggtt mappings
> and whatnot. Cures an underrun on my ivb when I boot with
> VT-d enabled and the BIOS fb gets thrown out due to stolen
> being considered unusable with VT-d active.
> 
> Signed-off-by: Ville Syrjälä 
> ---
>  drivers/gpu/drm/i915/display/intel_display.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/gpu/drm/i915/display/intel_display.c 
> b/drivers/gpu/drm/i915/display/intel_display.c
> index acade004e8b1..3e2c192ec708 100644
> --- a/drivers/gpu/drm/i915/display/intel_display.c
> +++ b/drivers/gpu/drm/i915/display/intel_display.c
> @@ -2403,6 +2403,7 @@ static void intel_plane_disable_noatomic(struct 
> intel_crtc *crtc,
> intel_set_cpu_fifo_underrun_reporting(dev_priv, crtc->pipe, 
> false);
>  
> intel_disable_plane(plane, crtc_state);
> +   intel_wait_for_vblank(dev_priv, crtc->pipe);

I could only find paths here from sanitize, so it looks safe with
respect to adding random delays to userspace.

Makes sense so,
Reviewed-by: Chris Wilson 
-Chris


Re: [Intel-gfx] [PATCH i-g-t] tests/i915/perf_pmu: Subtest to measure sampling error for 100% busy

2021-02-16 Thread Chris Wilson
Quoting Tvrtko Ursulin (2021-02-16 15:59:33)
> 
> On 16/02/2021 12:49, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2021-02-16 10:50:50)
> >> From: Tvrtko Ursulin 
> >>
> >> Test that periodic reads of engine busyness against a constant 100% load
> >> are within the 5000ppm tolerance when comparing perf timestamp versus
> >> counter values.
> >>
> >> Signed-off-by: Tvrtko Ursulin 
> >> ---
> >>   tests/i915/perf_pmu.c | 46 ++-
> >>   1 file changed, 41 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/tests/i915/perf_pmu.c b/tests/i915/perf_pmu.c
> >> index 50b5c82bc472..728312be5293 100644
> >> --- a/tests/i915/perf_pmu.c
> >> +++ b/tests/i915/perf_pmu.c
> >> @@ -26,6 +26,7 @@
> >>   #include 
> >>   #include 
> >>   #include 
> >> +#include 
> >>   #include 
> >>   #include 
> >>   #include 
> >> @@ -46,6 +47,7 @@
> >>   #include "igt_perf.h"
> >>   #include "igt_sysfs.h"
> >>   #include "igt_pm.h"
> >> +#include "igt_stats.h"
> >>   #include "sw_sync.h"
> >>   
> >>   IGT_TEST_DESCRIPTION("Test the i915 pmu perf interface");
> >> @@ -278,8 +280,11 @@ static void end_spin(int fd, igt_spin_t *spin, 
> >> unsigned int flags)
> >>   static void
> >>   single(int gem_fd, const struct intel_execution_engine2 *e, unsigned int 
> >> flags)
> >>   {
> >> +   unsigned int loops = flags & FLAG_LONG ? 20 : 1;
> >> +   double err_min = DBL_MAX, err_max = -DBL_MAX;
> >>  unsigned long slept;
> >>  igt_spin_t *spin;
> >> +   igt_stats_t s;
> >>  uint64_t val;
> >>  int fd;
> >>   
> >> @@ -290,11 +295,40 @@ single(int gem_fd, const struct 
> >> intel_execution_engine2 *e, unsigned int flags)
> >>  else
> >>  spin = NULL;
> >>   
> >> -   val = pmu_read_single(fd);
> >> -   slept = measured_usleep(batch_duration_ns / 1000);
> >> -   if (flags & TEST_TRAILING_IDLE)
> >> -   end_spin(gem_fd, spin, flags);
> >> -   val = pmu_read_single(fd) - val;
> >> +   igt_stats_init_with_size(&s, loops);
> >> +
> >> +   while (--loops) {
> > 
> > while (loops--)
> > 
> > /o\
> 
> Yeah.. At least I know the oddity is related to sampling. Since even on 
> Haswell:
> 
> (perf_pmu:1591) DEBUG: time=500207720 busy=500037022 error=-341.25ppm
> (perf_pmu:1591) DEBUG: time=500252520 busy=500033517 error=-437.78ppm
> (perf_pmu:1591) DEBUG: time=500187490 busy=49817 error=-375.21ppm
> (perf_pmu:1591) DEBUG: time=500244871 busy=49837 error=-489.83ppm
> (perf_pmu:1591) DEBUG: time=500268670 busy=49477 error=-538.10ppm
> (perf_pmu:1591) DEBUG: time=500245246 busy=50432 error=-489.39ppm
> (perf_pmu:1591) DEBUG: time=500245735 busy=49306 error=-492.62ppm
> (perf_pmu:1591) DEBUG: time=500270045 busy=51747 error=-536.31ppm
> (perf_pmu:1591) DEBUG: time=500254286 busy=48162 error=-511.99ppm
> (perf_pmu:1591) DEBUG: time=500247790 busy=50347 error=-494.64ppm
> (perf_pmu:1591) DEBUG: time=500250261 busy=50257 error=-499.76ppm
> (perf_pmu:1591) DEBUG: time=500250005 busy=58177 error=-483.41ppm
> (perf_pmu:1591) DEBUG: time=500249065 busy=41867 error=-514.14ppm
> (perf_pmu:1591) DEBUG: time=500249725 busy=50371 error=-498.46ppm
> (perf_pmu:1591) DEBUG: time=500250335 busy=49772 error=-500.88ppm
> (perf_pmu:1591) DEBUG: time=500258691 busy=49937 error=-517.24ppm
> (perf_pmu:1591) DEBUG: time=500239980 busy=51037 error=-477.66ppm
> (perf_pmu:1591) DEBUG: time=500240791 busy=504999361 error=9512.56ppm
> 
> And this last one is way more than one sampling period. I'll be thinking 
> about this in the background.

One thing to add would be the cumulative error. It does feel like a
corrective factor is applied to the sampling period.
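
Something like (a sketch, names illustrative), kept alongside the
per-iteration error:

	total_busy += val;
	total_time += slept;
	igt_debug("cumulative error=%.2fppm\n",
		  (total_busy - (double)total_time) * 1e6 / total_time);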
-Chris


Re: [Intel-gfx] [PATCH] drm/i915: Readout conn_state->max_bpc

2021-02-16 Thread Chris Wilson
Quoting Ville Syrjala (2021-02-16 16:00:35)
> From: Ville Syrjälä 
> 
> Populate conn_state->max_bpc with something sensible from the start.
> Otherwise it's possible that we get to compute_sink_pipe_bpp() with
> max_bpc==0.
> 
> The specific scenario goes as follows:
> 1. Initial connector state allocated with max_bpc==0
> 2. Trigger a modeset on the crtc feeding the connector, without
>actually adding the connector to the commit
> 3. drm_atomic_connector_check() is skipped because the
>connector has not yet been added, hence conn_state->max_bpc
>retains its current value
> 4. drm_atomic_helper_check_modeset() ->
>drm_atomic_add_affected_connectors() -> the connector
>is now part of the commit
> 5. compute_baseline_pipe_bpp() -> MISSING_CASE(max_bpc==0)
> 
> Note that pipe_bpp itself may not be populated on pre-g4x machines,
> in which case we just fall back to max_bpc==8 and let .compute_config()
> limit the resulting pipe_bpp further if necessary.
> 
> Cc: Daniel Vetter 
> Reported-by: Chris Wilson 
> Signed-off-by: Ville Syrjälä 
Tested-by: Chris Wilson 
-Chris


[Intel-gfx] [PATCH i-g-t] i915/gem_exec_reloc: Verify relocations with pinned scanout framebuffers

2021-02-16 Thread Chris Wilson
In light of the VT-d workarounds, we may introduce padding around the
scanout vma. This should not affect relocations referencing the scanout
on !full-ppgtt where we leak the GGTT address of scanout to users.

Signed-off-by: Chris Wilson 
Cc: Matthew Auld 
Reviewed-by: Matthew Auld 
---
 tests/i915/gem_exec_reloc.c | 102 
 1 file changed, 102 insertions(+)

diff --git a/tests/i915/gem_exec_reloc.c b/tests/i915/gem_exec_reloc.c
index cc9b8cd6d..23057f76e 100644
--- a/tests/i915/gem_exec_reloc.c
+++ b/tests/i915/gem_exec_reloc.c
@@ -26,7 +26,9 @@
 
 #include "i915/gem.h"
 #include "igt.h"
+#include "igt_device.h"
 #include "igt_dummyload.h"
+#include "igt_kms.h"
 #include "sw_sync.h"
 
 IGT_TEST_DESCRIPTION("Basic sanity check of execbuf-ioctl relocations.");
@@ -1286,6 +1288,83 @@ static void concurrent(int i915, int num_common)
igt_assert_eq(result, 0);
 }
 
+static uint32_t
+pin_scanout(igt_display_t *dpy, igt_output_t *output, struct igt_fb *fb)
+{
+   drmModeModeInfoPtr mode = igt_output_get_mode(output);
+   igt_plane_t *primary;
+
+   igt_create_pattern_fb(dpy->drm_fd, mode->hdisplay, mode->vdisplay,
+ DRM_FORMAT_XRGB8888,
+ LOCAL_I915_FORMAT_MOD_X_TILED, fb);
+
+   primary = igt_output_get_plane_type(output, DRM_PLANE_TYPE_PRIMARY);
+   igt_plane_set_fb(primary, fb);
+
+   igt_display_commit2(dpy, COMMIT_LEGACY);
+
+   return fb->gem_handle;
+}
+
+static void scanout(int i915,
+   igt_display_t *dpy,
+   const struct intel_execution_engine2 *e)
+{
+   struct drm_i915_gem_relocation_entry reloc;
+   struct drm_i915_gem_exec_object2 obj[2] = {
+   [1] = { .handle = batch_create(i915) },
+   };
+   struct drm_i915_gem_execbuffer2 execbuf = {
+   .buffers_ptr = to_user_pointer(obj),
+   .buffer_count = 2,
+   .flags = e->flags,
+   };
+   igt_output_t *output;
+   struct igt_fb fb;
+   uint64_t *map;
+
+   igt_display_reset(dpy);
+
+   output = igt_get_single_output_for_pipe(dpy, PIPE_A);
+   igt_require(output);
+   igt_output_set_pipe(output, PIPE_A);
+
+   /*
+* Find where the scanout is in our GTT; on !full-ppgtt this will be
+* the actual GGTT address of the scanout.
+*/
+   obj[0].handle = pin_scanout(dpy, output, &fb);
+   gem_execbuf(i915, &execbuf);
+   igt_info("Scanout GTT address: %#llx\n", obj[0].offset);
+
+   /* Relocations should match the scanout address */
+   reloc.target_handle = obj[0].handle;
+   reloc.delta = 0;
+   reloc.presumed_offset = -1;
+   reloc.offset = 4000;
+   obj[1].relocation_count = 1;
+   obj[1].relocs_ptr = to_user_pointer(&reloc);
+   gem_execbuf(i915, &execbuf);
+   igt_info("Reloc address: %#llx\n", reloc.presumed_offset);
+   igt_assert_eq_u64(reloc.presumed_offset, obj[0].offset);
+
+   /* The address written into the batch should match the relocation */
+   gem_sync(i915, obj[1].handle);
+   map = gem_mmap__device_coherent(i915, obj[1].handle,
+   0, 4096, PROT_WRITE);
+   igt_assert_eq_u64(map[500], obj[0].offset);
+   munmap(map, 4096);
+
+   /* And finally softpinning with the scanout address should work */
+   obj[0].flags |= EXEC_OBJECT_PINNED;
+   obj[1].relocation_count = 0;
+   gem_execbuf(i915, &execbuf);
+   igt_assert_eq_u64(obj[0].offset, reloc.presumed_offset);
+
+   gem_close(i915, obj[1].handle);
+   igt_remove_fb(dpy->drm_fd, &fb);
+}
+
 #define I915_GEM_GPU_DOMAINS \
(I915_GEM_DOMAIN_RENDER | \
 I915_GEM_DOMAIN_SAMPLER | \
@@ -1511,6 +1590,29 @@ igt_main
igt_subtest("invalid-domains")
invalid_domains(fd);
 
+   igt_subtest_group {
+   igt_display_t display = {
+   .drm_fd = fd,
+   .n_pipes = IGT_MAX_PIPES
+   };
+
+   igt_fixture {
+   igt_device_set_master(fd);
+   kmstest_set_vt_graphics_mode();
+   igt_display_require(&display, fd);
+   }
+
+   igt_subtest_with_dynamic("basic-scanout") {
+   __for_each_physical_engine(fd, e) {
+   igt_dynamic_f("%s", e->name)
+   scanout(fd, &display, e);
+   }
+   }
+
+   igt_fixture
+   igt_display_fini();
+   }
+
igt_fixture
close(fd);
 }
-- 
2.30.0



Re: [Intel-gfx] [igt-dev] [PATCH i-g-t] i915/gem_exec_reloc: Verify relocations with pinned scanout framebuffers

2021-02-16 Thread Chris Wilson
Quoting Matthew Auld (2021-02-16 16:49:28)
> On Tue, 16 Feb 2021 at 14:32, Chris Wilson  wrote:
> >
> > In light of the VT-d workarounds, we may introduce padding around the
> > scanout vma. This should not affect relocations referencing the scanout
> > on !full-ppgtt where we leak the GGTT address of scanout to users.
> >
> > Signed-off-by: Chris Wilson 
> > Cc: Matthew Auld 
> > ---
> >  tests/i915/gem_exec_reloc.c | 102 
> >  1 file changed, 102 insertions(+)
> >
> > diff --git a/tests/i915/gem_exec_reloc.c b/tests/i915/gem_exec_reloc.c
> > index cc9b8cd6d..98960bb84 100644
> > --- a/tests/i915/gem_exec_reloc.c
> > +++ b/tests/i915/gem_exec_reloc.c
> > @@ -26,7 +26,9 @@
> >
> >  #include "i915/gem.h"
> >  #include "igt.h"
> > +#include "igt_device.h"
> >  #include "igt_dummyload.h"
> > +#include "igt_kms.h"
> >  #include "sw_sync.h"
> >
> >  IGT_TEST_DESCRIPTION("Basic sanity check of execbuf-ioctl relocations.");
> > @@ -1286,6 +1288,83 @@ static void concurrent(int i915, int num_common)
> > igt_assert_eq(result, 0);
> >  }
> >
> > +static uint32_t
> > +pin_scanout(igt_display_t *dpy, igt_output_t *output, struct igt_fb *fb)
> > +{
> > +   drmModeModeInfoPtr mode;
> > +   igt_plane_t *primary;
> > +
> > +   mode = igt_output_get_mode(output);
> > +
> > +   igt_create_pattern_fb(dpy->drm_fd, mode->hdisplay, mode->vdisplay,
> > + DRM_FORMAT_XRGB8888,
> > + LOCAL_I915_FORMAT_MOD_X_TILED, fb);
> > +
> > +   primary = igt_output_get_plane_type(output, DRM_PLANE_TYPE_PRIMARY);
> > +   igt_plane_set_fb(primary, fb);
> > +
> > +   igt_display_commit2(dpy, COMMIT_LEGACY);
> > +
> > +   return fb->gem_handle;
> > +}
> > +
> > +static void scanout(int i915,
> > +   igt_display_t *dpy,
> > +   const struct intel_execution_engine2 *e)
> 
> Missing feeding the engine into the execbuf?

Oops. Pretty pointless in making it loop over the engines without
selecting one!
-Chris


[Intel-gfx] [PATCH i-g-t] i915/gem_exec_reloc: Verify relocations with pinned scanout framebuffers

2021-02-16 Thread Chris Wilson
In light of the VT-d workarounds, we may introduce padding around the
scanout vma. This should not affect relocations referencing the scanout
on !full-ppgtt where we leak the GGTT address of scanout to users.

Signed-off-by: Chris Wilson 
Cc: Matthew Auld 
---
 tests/i915/gem_exec_reloc.c | 102 
 1 file changed, 102 insertions(+)

diff --git a/tests/i915/gem_exec_reloc.c b/tests/i915/gem_exec_reloc.c
index cc9b8cd6d..98960bb84 100644
--- a/tests/i915/gem_exec_reloc.c
+++ b/tests/i915/gem_exec_reloc.c
@@ -26,7 +26,9 @@
 
 #include "i915/gem.h"
 #include "igt.h"
+#include "igt_device.h"
 #include "igt_dummyload.h"
+#include "igt_kms.h"
 #include "sw_sync.h"
 
 IGT_TEST_DESCRIPTION("Basic sanity check of execbuf-ioctl relocations.");
@@ -1286,6 +1288,83 @@ static void concurrent(int i915, int num_common)
igt_assert_eq(result, 0);
 }
 
+static uint32_t
+pin_scanout(igt_display_t *dpy, igt_output_t *output, struct igt_fb *fb)
+{
+   drmModeModeInfoPtr mode;
+   igt_plane_t *primary;
+
+   mode = igt_output_get_mode(output);
+
+   igt_create_pattern_fb(dpy->drm_fd, mode->hdisplay, mode->vdisplay,
+ DRM_FORMAT_XRGB8888,
+ LOCAL_I915_FORMAT_MOD_X_TILED, fb);
+
+   primary = igt_output_get_plane_type(output, DRM_PLANE_TYPE_PRIMARY);
+   igt_plane_set_fb(primary, fb);
+
+   igt_display_commit2(dpy, COMMIT_LEGACY);
+
+   return fb->gem_handle;
+}
+
+static void scanout(int i915,
+   igt_display_t *dpy,
+   const struct intel_execution_engine2 *e)
+{
+   struct drm_i915_gem_relocation_entry reloc;
+   struct drm_i915_gem_exec_object2 obj[2] = {
+   [1] = { .handle = batch_create(i915) },
+   };
+   struct drm_i915_gem_execbuffer2 execbuf = {
+   .buffers_ptr = to_user_pointer(obj),
+   .buffer_count = 2,
+   };
+   igt_output_t *output;
+   struct igt_fb fb;
+   uint64_t *map;
+
+   igt_display_reset(dpy);
+
+   output = igt_get_single_output_for_pipe(dpy, PIPE_A);
+   igt_require(output);
+   igt_output_set_pipe(output, PIPE_A);
+
+   /*
+* Find where the scanout is in our GTT; on !full-ppgtt this will be
+* the actual GGTT address of the scanout.
+*/
+   obj[0].handle = pin_scanout(dpy, output, &fb);
+   gem_execbuf(i915, &execbuf);
+   igt_info("Scanout GTT address: %#llx\n", obj[0].offset);
+
+   /* Relocations should match the scanout address */
+   reloc.target_handle = obj[0].handle;
+   reloc.delta = 0;
+   reloc.presumed_offset = -1;
+   reloc.offset = 4000;
+   obj[1].relocation_count = 1;
+   obj[1].relocs_ptr = to_user_pointer(&reloc);
+   gem_execbuf(i915, &execbuf);
+   igt_info("Reloc address: %#llx\n", reloc.presumed_offset);
+   igt_assert_eq_u64(reloc.presumed_offset, obj[0].offset);
+
+   gem_sync(i915, obj[1].handle);
+   map = gem_mmap__device_coherent(i915, obj[1].handle,
+   0, 4096, PROT_WRITE);
+   igt_assert_eq_u64(map[500], obj[0].offset);
+   munmap(map, 4096);
+
+   /* And finally softpinning with the scanout address should work */
+   obj[0].flags |= EXEC_OBJECT_PINNED;
+   obj[1].relocation_count = 0;
+   gem_execbuf(i915, &execbuf);
+   igt_assert_eq_u64(obj[0].offset, reloc.presumed_offset);
+
+   gem_close(i915, obj[1].handle);
+   igt_remove_fb(dpy->drm_fd, &fb);
+}
+
 #define I915_GEM_GPU_DOMAINS \
(I915_GEM_DOMAIN_RENDER | \
 I915_GEM_DOMAIN_SAMPLER | \
@@ -1511,6 +1590,29 @@ igt_main
igt_subtest("invalid-domains")
invalid_domains(fd);
 
+   igt_subtest_group {
+   igt_display_t display = {
+   .drm_fd = fd,
+   .n_pipes = IGT_MAX_PIPES
+   };
+
+   igt_fixture {
+   igt_device_set_master(fd);
+   kmstest_set_vt_graphics_mode();
+   igt_display_require(&display, fd);
+   }
+
+   igt_subtest_with_dynamic("basic-scanout") {
+   __for_each_physical_engine(fd, e) {
+   igt_dynamic_f("%s", e->name)
+   scanout(fd, &display, e);
+   }
+   }
+
+   igt_fixture
+   igt_display_fini(&display);
+   }
+
igt_fixture
close(fd);
 }
-- 
2.30.0



Re: [Intel-gfx] [PATCH i-g-t] tests/i915/perf_pmu: Subtest to measure sampling error for 100% busy

2021-02-16 Thread Chris Wilson
Quoting Tvrtko Ursulin (2021-02-16 10:50:50)
> From: Tvrtko Ursulin 
> 
> Test that periodic reads of engine busyness against a constant 100% load
> are within the 5000ppm tolerance when comparing perf timestamp versus
> counter values.
> 
> Signed-off-by: Tvrtko Ursulin 
> ---
>  tests/i915/perf_pmu.c | 46 ++-
>  1 file changed, 41 insertions(+), 5 deletions(-)
> 
> diff --git a/tests/i915/perf_pmu.c b/tests/i915/perf_pmu.c
> index 50b5c82bc472..728312be5293 100644
> --- a/tests/i915/perf_pmu.c
> +++ b/tests/i915/perf_pmu.c
> @@ -26,6 +26,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -46,6 +47,7 @@
>  #include "igt_perf.h"
>  #include "igt_sysfs.h"
>  #include "igt_pm.h"
> +#include "igt_stats.h"
>  #include "sw_sync.h"
>  
>  IGT_TEST_DESCRIPTION("Test the i915 pmu perf interface");
> @@ -278,8 +280,11 @@ static void end_spin(int fd, igt_spin_t *spin, unsigned 
> int flags)
>  static void
>  single(int gem_fd, const struct intel_execution_engine2 *e, unsigned int 
> flags)
>  {
> +   unsigned int loops = flags & FLAG_LONG ? 20 : 1;
> +   double err_min = DBL_MAX, err_max = -DBL_MAX;
> unsigned long slept;
> igt_spin_t *spin;
> +   igt_stats_t s;
> uint64_t val;
> int fd;
>  
> @@ -290,11 +295,40 @@ single(int gem_fd, const struct intel_execution_engine2 
> *e, unsigned int flags)
> else
> spin = NULL;
>  
> -   val = pmu_read_single(fd);
> -   slept = measured_usleep(batch_duration_ns / 1000);
> -   if (flags & TEST_TRAILING_IDLE)
> -   end_spin(gem_fd, spin, flags);
> -   val = pmu_read_single(fd) - val;
> +   igt_stats_init_with_size(&s, loops);
> +
> +   while (--loops) {

while (loops--)
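
(With the non-FLAG_LONG default of loops == 1, "while (--loops)" never
enters the body at all, whereas "while (loops--)" runs exactly once.)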

/o\
-Chris


Re: [Intel-gfx] [PATCH i-g-t] tests/i915/perf_pmu: Subtest to measure sampling error for 100% busy

2021-02-16 Thread Chris Wilson
Quoting Tvrtko Ursulin (2021-02-16 10:50:50)
> From: Tvrtko Ursulin 
> 
> Test that periodic reads of engine busyness against a constant 100% load
> are within the 5000ppm tolerance when comparing perf timestamp versus
> counter values.
> 
> Signed-off-by: Tvrtko Ursulin 
> ---
>  tests/i915/perf_pmu.c | 46 ++-
>  1 file changed, 41 insertions(+), 5 deletions(-)
> 
> diff --git a/tests/i915/perf_pmu.c b/tests/i915/perf_pmu.c
> index 50b5c82bc472..728312be5293 100644
> --- a/tests/i915/perf_pmu.c
> +++ b/tests/i915/perf_pmu.c
> @@ -26,6 +26,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -46,6 +47,7 @@
>  #include "igt_perf.h"
>  #include "igt_sysfs.h"
>  #include "igt_pm.h"
> +#include "igt_stats.h"
>  #include "sw_sync.h"
>  
>  IGT_TEST_DESCRIPTION("Test the i915 pmu perf interface");
> @@ -278,8 +280,11 @@ static void end_spin(int fd, igt_spin_t *spin, unsigned 
> int flags)
>  static void
>  single(int gem_fd, const struct intel_execution_engine2 *e, unsigned int 
> flags)
>  {
> +   unsigned int loops = flags & FLAG_LONG ? 20 : 1;
> +   double err_min = DBL_MAX, err_max = -DBL_MAX;
> unsigned long slept;
> igt_spin_t *spin;
> +   igt_stats_t s;
> uint64_t val;
> int fd;

Something to record is that TEST_TRAILING_IDLE and TEST_LONG are mutually
exclusive.

So assert(igt_hweight(flags & (TEST_TRAILING_IDLE | TEST_LONG)) <= 1); ?
-Chris


Re: [Intel-gfx] [PATCH] drm/i915/gem: Add a check for object size for corner cases

2021-02-16 Thread Chris Wilson
Quoting Ram Moon, AnandX (2021-02-16 12:05:23)
> Hi Chris,
> 
> -Original Message-
> From: dri-devel  On Behalf Of Chris 
> Wilson
> Sent: Monday, February 15, 2021 6:10 PM
> To: Auld, Matthew ; Ram Moon, AnandX 
> ; Surendrakumar Upadhyay, TejaskumarX 
> ; Ursulin, Tvrtko 
> ; Jani Nikula ; 
> dri-de...@lists.freedesktop.org; intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH] drm/i915/gem: Add a check for object size 
> for corner cases
> 
> Quoting Ram Moon, AnandX (2021-02-15 12:29:17)
> > Hi Chris,
> > 
> > -----Original Message-
> > From: dri-devel  On Behalf Of 
> > Chris Wilson
> > Sent: Wednesday, February 10, 2021 4:15 PM
> > To: Ram Moon, AnandX ; Jani Nikula 
> > ; Auld, Matthew ; 
> > Surendrakumar Upadhyay, TejaskumarX 
> > ; Ursulin, Tvrtko 
> > ; dri-de...@lists.freedesktop.org; 
> > intel-gfx@lists.freedesktop.org
> > Cc: Ram Moon, AnandX 
> > Subject: Re: [Intel-gfx] [PATCH] drm/i915/gem: Add a check for object 
> > size for corner cases
> > 
> > Quoting Anand Moon (2021-02-10 07:59:29)
> > > Add check for object size to return appropriate error -E2BIG or 
> > > -EINVAL to avoid WARM_ON and successful return for some testcase.
> > 
> > No. You miss the point of having those warnings. We need to inspect the 
> > code to remove the last remaining "int pagenum", and then we can remove the 
> > GEM_WARN_ON((sz) >> PAGE_SHIFT > INT_MAX). These are not emitted for users, 
> > only for us to motivate us into finally fixing the code.
> > -Chris
> > 
> > Yes, I got your point these check are meant to catch up integer overflow.
> > 
> > I have check with the IGT testcase case  _gem_create_ and 
> > _gem_userptr_blits_ which fails pass size *-1ull << 32*  left shift 
> > and *0~* which leads to integer overflow ie  _negative_ size passed to 
> > create  i915_gem_create via ioctl  this leads to GM_WARN_ON.
> > 
> > Can we drop these testcase so that we don't break the kernel ?
> 
> The kernel rejects the ioctl with the expected errno. We leave a warning 
> purely for the benefit of CI, only when compiled for debugging and not by 
> default, so that we have a persistent reminder to do the code review.
> What's broken?
> -Chris
> 
> All though the testcase return with appropriate error we observe kernel taint 
> see below.

Which is an intentional taint added for CI so that we get a warning and
a visible bug so that we can allocate resources to _fix_ the underlying
problems in the code.
-Chris


Re: [Intel-gfx] [PATCH i-g-t] tests/i915/perf_pmu: Subtest to measure sampling error for 100% busy

2021-02-16 Thread Chris Wilson
Quoting Tvrtko Ursulin (2021-02-16 10:50:50)
> From: Tvrtko Ursulin 
> 
> Test that periodic reads of engine busyness against a constant 100% load
> are within the 5000ppm tolerance when comparing perf timestamp versus
> counter values.

We've previously only included the accuracy tests on platforms that
claim to be accurate. Can we rephrase the goal of the test such that it
sounds more like a universal truth that should also be valid for the
sampling backends? "The busyness should never exceed 100% (within a
margin of error)..." And since accuracy is currently protected by
has_busy_stats, let's try to find a different name.
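
For reference, the tolerance check itself is just relative error against
the sampling window -- a rough sketch, purely illustrative (names made
up, not the patch code):

	#include <math.h>
	#include <stdint.h>

	/* parts-per-million error between PMU busy time and elapsed time */
	static double busy_error_ppm(uint64_t busy_ns, uint64_t elapsed_ns)
	{
		return 1e6 * fabs((double)busy_ns - (double)elapsed_ns) /
		       (double)elapsed_ns;
	}

i.e. the assertion would be along the lines of
igt_assert(busy_error_ppm(val, slept) < 5000), however we end up naming
the helper.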
-Chris


[Intel-gfx] [PATCH] drm/i915: Introduce guard pages to i915_vma

2021-02-15 Thread Chris Wilson
Introduce the concept of padding the i915_vma with guard pages before
and aft. The major consequence is that all ordinary uses of i915_vma
must use i915_vma_offset/i915_vma_size and not i915_vma.node.start/size
directly, as the drm_mm_node will include the guard pages that surround
our object.

The biggest conundrum is how exactly to mix requesting a fixed address
with guard pages, particularly through the existing uABI. The user does
not know about guard pages, so such must be transparent to the user, and
so the execobj.offset must be that of the object itself excluding the
guard. So a PIN_OFFSET_FIXED must then be exclusive of the guard pages.
The caveat is that some placements will be impossible with guard pages,
as wrap arounds need to be avoided, and the vma itself will require a
larger node. We must not report EINVAL but ENOSPC as these are
unavailable locations within the GTT rather than conflicting user
requirements.

In the next patch, we start using guard pages for scanout objects. While
these are limited to GGTT vma, on a few platforms these vma (or at least
an alias of the vma) is shared with userspace, so we may leak the
existence of such guards if we are not careful to ensure that the
execobj.offset is transparent and excludes the guards. (On such platforms
like ivb, without full-ppgtt, userspace has to use relocations so the
presence of more untouchable regions within its GTT should be of no further
issue.)

v2: Include the guard range in the overflow checks and placement
restrictions.

v3: Fix the check on the placement upper bound. The requested user offset
is relative to the guard offset (not the node.start) and so we should
not include the initial guard offset again when computing the upper
bound of the node.
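
For reviewers, the intended arithmetic boils down to something like this
(a sketch of the semantics only, not the literal patch code):

	/* execobj.offset excludes the guard; the drm_mm_node includes it */
	node.start = execobj.offset - guard;     /* node shifted down */
	node.size >= object_size + 2 * guard;    /* room for both guards */
	execobj.offset == node.start + guard;    /* what the user sees */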

Signed-off-by: Chris Wilson 
Cc: Matthew Auld 
Reviewed-by: Matthew Auld 
---
 drivers/gpu/drm/i915/gt/intel_ggtt.c  | 12 ++--
 drivers/gpu/drm/i915/i915_vma.c   | 28 ++-
 drivers/gpu/drm/i915/i915_vma.h   |  5 +++--
 drivers/gpu/drm/i915/i915_vma_types.h |  3 ++-
 4 files changed, 38 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c 
b/drivers/gpu/drm/i915/gt/intel_ggtt.c
index c5803c434d33..6b326138e765 100644
--- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
@@ -238,8 +238,12 @@ static void gen8_ggtt_insert_entries(struct 
i915_address_space *vm,
 
gte = (gen8_pte_t __iomem *)ggtt->gsm;
gte += vma->node.start / I915_GTT_PAGE_SIZE;
-   end = gte + vma->node.size / I915_GTT_PAGE_SIZE;
 
+   end = gte + vma->guard / I915_GTT_PAGE_SIZE;
+   while (gte < end)
+   gen8_set_pte(gte++, vm->scratch[0]->encode);
+
+   end += (vma->node.size - vma->guard) / I915_GTT_PAGE_SIZE;
for_each_sgt_daddr(addr, iter, vma->pages)
gen8_set_pte(gte++, pte_encode | addr);
GEM_BUG_ON(gte > end);
@@ -289,8 +293,12 @@ static void gen6_ggtt_insert_entries(struct 
i915_address_space *vm,
 
gte = (gen6_pte_t __iomem *)ggtt->gsm;
gte += vma->node.start / I915_GTT_PAGE_SIZE;
-   end = gte + vma->node.size / I915_GTT_PAGE_SIZE;
 
+   end = gte + vma->guard / I915_GTT_PAGE_SIZE;
+   while (gte < end)
+   gen8_set_pte(gte++, vm->scratch[0]->encode);
+
+   end += (vma->node.size - vma->guard) / I915_GTT_PAGE_SIZE;
for_each_sgt_daddr(addr, iter, vma->pages)
iowrite32(vm->pte_encode(addr, level, flags), gte++);
GEM_BUG_ON(gte > end);
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 748f5ea1ba04..01e62edf4704 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -623,7 +623,7 @@ bool i915_gem_valid_gtt_space(struct i915_vma *vma, 
unsigned long color)
 static int
 i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
 {
-   unsigned long color;
+   unsigned long color, guard;
u64 start, end;
int ret;
 
@@ -631,7 +631,7 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 
alignment, u64 flags)
GEM_BUG_ON(drm_mm_node_allocated(&vma->node));
 
size = max(size, vma->size);
-   alignment = max(alignment, vma->display_alignment);
+   alignment = max_t(typeof(alignment), alignment, vma->display_alignment);
if (flags & PIN_MAPPABLE) {
size = max_t(typeof(size), size, vma->fence_size);
alignment = max_t(typeof(alignment),
@@ -642,6 +642,9 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 
alignment, u64 flags)
GEM_BUG_ON(!IS_ALIGNED(alignment, I915_GTT_MIN_ALIGNMENT));
GEM_BUG_ON(!is_power_of_2(alignment));
 
+   guard = vma->guard; /* retain guard across rebinds */
+   guard = ALIGN(guard, alignment);
+
start = flags & PIN_OFFSET_BIAS ? flags & PIN_OFFSET_MASK : 0;

Re: [Intel-gfx] [PATCH 2/3] drm/i915: Introduce guard pages to i915_vma

2021-02-15 Thread Chris Wilson
Quoting Matthew Auld (2021-02-15 19:31:39)
> On Mon, 15 Feb 2021 at 18:15, Chris Wilson  wrote:
> >
> > Quoting Matthew Auld (2021-02-15 18:04:08)
> > > On Mon, 15 Feb 2021 at 15:56, Chris Wilson  
> > > wrote:
> > > >
> > > > Introduce the concept of padding the i915_vma with guard pages before
> > > > and aft. The major consequence is that all ordinary uses of i915_vma
> > > > must use i915_vma_offset/i915_vma_size and not i915_vma.node.start/size
> > > > directly, as the drm_mm_node will include the guard pages that surround
> > > > our object.
> > > >
> > > > The biggest conundrum is how exactly to mix requesting a fixed address
> > > > with guard pages, particularly through the existing uABI. The user does
> > > > not know about guard pages, so such must be transparent to the user, and
> > > > so the execobj.offset must be that of the object itself excluding the
> > > > guard. So a PIN_OFFSET_FIXED must then be exclusive of the guard pages.
> > > > The caveat is that some placements will be impossible with guard pages,
> > > > as wrap arounds need to be avoided, and the vma itself will require a
> > > > larger node. We must not report EINVAL but ENOSPC as these are
> > > > unavailable locations within the GTT rather than conflicting user
> > > > requirements.
> > > >
> > > > In the next patch, we start using guard pages for scanout objects. While
> > > > these are limited to GGTT vma, on a few platforms these vma (or at least
> > > > an alias of the vma) is shared with userspace, so we may leak the
> > > > existence of such guards if we are not careful to ensure that the
> > > > execobj.offset is transparent and excludes the guards. (On such 
> > > > platforms
> > > > like ivb, without full-ppgtt, userspace has to use relocations so the
> > > > presence of more untouchable regions within its GTT should be of no 
> > > > further
> > > > issue.)
> > > >
> > > > v2: Include the guard range in the overflow checks and placement
> > > > restrictions.
> > > >
> > > > Signed-off-by: Chris Wilson 
> > > > Cc: Matthew Auld 
> > > > ---
> > > >  drivers/gpu/drm/i915/gt/intel_ggtt.c  | 12 ++--
> > > >  drivers/gpu/drm/i915/i915_vma.c   | 28 ++-
> > > >  drivers/gpu/drm/i915/i915_vma.h   |  5 +++--
> > > >  drivers/gpu/drm/i915/i915_vma_types.h |  3 ++-
> > > >  4 files changed, 38 insertions(+), 10 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c 
> > > > b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> > > > index c5803c434d33..6b326138e765 100644
> > > > --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
> > > > +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> > > > @@ -238,8 +238,12 @@ static void gen8_ggtt_insert_entries(struct 
> > > > i915_address_space *vm,
> > > >
> > > > gte = (gen8_pte_t __iomem *)ggtt->gsm;
> > > > gte += vma->node.start / I915_GTT_PAGE_SIZE;
> > > > -   end = gte + vma->node.size / I915_GTT_PAGE_SIZE;
> > > >
> > > > +   end = gte + vma->guard / I915_GTT_PAGE_SIZE;
> > > > +   while (gte < end)
> > > > +   gen8_set_pte(gte++, vm->scratch[0]->encode);
> > > > +
> > > > +   end += (vma->node.size - vma->guard) / I915_GTT_PAGE_SIZE;
> > > > for_each_sgt_daddr(addr, iter, vma->pages)
> > > > gen8_set_pte(gte++, pte_encode | addr);
> > > > GEM_BUG_ON(gte > end);
> > > > @@ -289,8 +293,12 @@ static void gen6_ggtt_insert_entries(struct 
> > > > i915_address_space *vm,
> > > >
> > > > gte = (gen6_pte_t __iomem *)ggtt->gsm;
> > > > gte += vma->node.start / I915_GTT_PAGE_SIZE;
> > > > -   end = gte + vma->node.size / I915_GTT_PAGE_SIZE;
> > > >
> > > > +   end = gte + vma->guard / I915_GTT_PAGE_SIZE;
> > > > +   while (gte < end)
> > > > +   gen8_set_pte(gte++, vm->scratch[0]->encode);
> > > > +
> > > > +   end += (vma->node.size - vma->guard) / I915_GTT_PAGE_SIZE;
> > > > for_each_sgt_daddr(addr, iter, vma->pages)
> > > >

Re: [Intel-gfx] [PATCH 2/3] drm/i915: Introduce guard pages to i915_vma

2021-02-15 Thread Chris Wilson
Quoting Matthew Auld (2021-02-15 18:04:08)
> On Mon, 15 Feb 2021 at 15:56, Chris Wilson  wrote:
> >
> > Introduce the concept of padding the i915_vma with guard pages before
> > and aft. The major consequence is that all ordinary uses of i915_vma
> > must use i915_vma_offset/i915_vma_size and not i915_vma.node.start/size
> > directly, as the drm_mm_node will include the guard pages that surround
> > our object.
> >
> > The biggest conundrum is how exactly to mix requesting a fixed address
> > with guard pages, particularly through the existing uABI. The user does
> > not know about guard pages, so such must be transparent to the user, and
> > so the execobj.offset must be that of the object itself excluding the
> > guard. So a PIN_OFFSET_FIXED must then be exclusive of the guard pages.
> > The caveat is that some placements will be impossible with guard pages,
> > as wrap arounds need to be avoided, and the vma itself will require a
> > larger node. We must not report EINVAL but ENOSPC as these are
> > unavailable locations within the GTT rather than conflicting user
> > requirements.
> >
> > In the next patch, we start using guard pages for scanout objects. While
> > these are limited to GGTT vma, on a few platforms these vma (or at least
> > an alias of the vma) is shared with userspace, so we may leak the
> > existence of such guards if we are not careful to ensure that the
> > execobj.offset is transparent and excludes the guards. (On such platforms
> > like ivb, without full-ppgtt, userspace has to use relocations so the
> > presence of more untouchable regions within its GTT should be of no further
> > issue.)
> >
> > v2: Include the guard range in the overflow checks and placement
> > restrictions.
> >
> > Signed-off-by: Chris Wilson 
> > Cc: Matthew Auld 
> > ---
> >  drivers/gpu/drm/i915/gt/intel_ggtt.c  | 12 ++--
> >  drivers/gpu/drm/i915/i915_vma.c   | 28 ++-
> >  drivers/gpu/drm/i915/i915_vma.h   |  5 +++--
> >  drivers/gpu/drm/i915/i915_vma_types.h |  3 ++-
> >  4 files changed, 38 insertions(+), 10 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c 
> > b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> > index c5803c434d33..6b326138e765 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> > @@ -238,8 +238,12 @@ static void gen8_ggtt_insert_entries(struct 
> > i915_address_space *vm,
> >
> > gte = (gen8_pte_t __iomem *)ggtt->gsm;
> > gte += vma->node.start / I915_GTT_PAGE_SIZE;
> > -   end = gte + vma->node.size / I915_GTT_PAGE_SIZE;
> >
> > +   end = gte + vma->guard / I915_GTT_PAGE_SIZE;
> > +   while (gte < end)
> > +   gen8_set_pte(gte++, vm->scratch[0]->encode);
> > +
> > +   end += (vma->node.size - vma->guard) / I915_GTT_PAGE_SIZE;
> > for_each_sgt_daddr(addr, iter, vma->pages)
> > gen8_set_pte(gte++, pte_encode | addr);
> > GEM_BUG_ON(gte > end);
> > @@ -289,8 +293,12 @@ static void gen6_ggtt_insert_entries(struct 
> > i915_address_space *vm,
> >
> > gte = (gen6_pte_t __iomem *)ggtt->gsm;
> > gte += vma->node.start / I915_GTT_PAGE_SIZE;
> > -   end = gte + vma->node.size / I915_GTT_PAGE_SIZE;
> >
> > +   end = gte + vma->guard / I915_GTT_PAGE_SIZE;
> > +   while (gte < end)
> > +   gen8_set_pte(gte++, vm->scratch[0]->encode);
> > +
> > +   end += (vma->node.size - vma->guard) / I915_GTT_PAGE_SIZE;
> > for_each_sgt_daddr(addr, iter, vma->pages)
> > iowrite32(vm->pte_encode(addr, level, flags), gte++);
> > GEM_BUG_ON(gte > end);
> > diff --git a/drivers/gpu/drm/i915/i915_vma.c 
> > b/drivers/gpu/drm/i915/i915_vma.c
> > index 748f5ea1ba04..31d0f8b64ec0 100644
> > --- a/drivers/gpu/drm/i915/i915_vma.c
> > +++ b/drivers/gpu/drm/i915/i915_vma.c
> > @@ -623,7 +623,7 @@ bool i915_gem_valid_gtt_space(struct i915_vma *vma, 
> > unsigned long color)
> >  static int
> >  i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
> >  {
> > -   unsigned long color;
> > +   unsigned long color, guard;
> > u64 start, end;
> > int ret;
> >
> > @@ -631,7 +631,7 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 
> > alignment, u64 flags)
> > GEM_BUG_ON(drm_mm_node_allocated(&vma->node));

Re: [Intel-gfx] [PATCH] drm/i915/selftest: Fix an error code in mock_context_barrier()

2021-02-15 Thread Chris Wilson
Quoting Dan Carpenter (2021-02-15 15:58:27)
> If the igt_request_alloc() call fails then this should return a
> negative error code, but currently it returns success.
> 
> Fixes: 85fddf0b0027 ("drm/i915: Introduce a context barrier callback")
> Signed-off-by: Dan Carpenter 

Reviewed-by: Chris Wilson 

Looks to be the only missing one in that file; smatch will no doubt find
more.
-Chris


[Intel-gfx] [PATCH 3/3] drm/i915: Refine VT-d scanout workaround

2021-02-15 Thread Chris Wilson
VT-d may cause overfetch of the scanout PTE, both before and after the
vma (depending on the scanout orientation). bspec recommends that we
provide a tile-row in either directions, and suggests using 160 PTE,
warning that the accesses will wrap around the ends of the GGTT.
Currently, we fill the entire GGTT with scratch pages when using VT-d to
always ensure there are valid entries around every vma, including
scanout. However, writing every PTE is slow as on recent devices we
perform 8MiB of uncached writes, incurring an extra 100ms during resume.
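(For scale: assuming a 4GiB GGTT with 4KiB pages and 8-byte PTEs, that
is ~1M entries, which is where the 8MiB of uncached writes and the extra
~100ms come from.)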

If instead we focus on only putting guard pages around scanout, we can
avoid touching the whole GGTT. To avoid having to introduce extra nodes
around each scanout vma, we adjust the scanout drm_mm_node to be smaller
than the allocated space, and fixup the extra PTE during dma binding.

v2: Move the guard from modifying drm_mm_node.start which is still used
by the drm_mm itself, into an adjustment of node.start at the point of
use.

v3: Pass the requested guard padding from the caller, so we can drop the
VT-d w/a knowledge from the i915_vma allocator.

Signed-off-by: Chris Wilson 
Cc: Ville Syrjälä 
Cc: Matthew Auld 
Reviewed-by: Matthew Auld  # v2
---
 drivers/gpu/drm/i915/gem/i915_gem_domain.c |  6 ++
 drivers/gpu/drm/i915/gt/intel_ggtt.c   | 25 +-
 drivers/gpu/drm/i915/i915_gem_gtt.h|  1 +
 drivers/gpu/drm/i915/i915_vma.c|  8 +++
 4 files changed, 16 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c 
b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
index 0478b069c202..4b577cf7ec7e 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
@@ -16,6 +16,8 @@
 #include "i915_gem_lmem.h"
 #include "i915_gem_mman.h"
 
+#define VTD_GUARD (160 * I915_GTT_PAGE_SIZE) /* 160 PTE padding */
+
 static bool gpu_write_needs_clflush(struct drm_i915_gem_object *obj)
 {
return !(obj->cache_level == I915_CACHE_NONE ||
@@ -345,6 +347,10 @@ i915_gem_object_pin_to_display_plane(struct 
drm_i915_gem_object *obj,
if (ret)
goto err;
 
+   /* VT-d may overfetch before/after the vma, so pad with scratch */
+   if (intel_scanout_needs_vtd_wa(i915))
+   flags |= PIN_OFFSET_GUARD | VTD_GUARD;
+
/*
 * As the user may map the buffer once pinned in the display plane
 * (e.g. libkms for the bootup splash), we have to ensure that we
diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c 
b/drivers/gpu/drm/i915/gt/intel_ggtt.c
index 6b326138e765..251b50884d1c 100644
--- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
@@ -319,27 +319,6 @@ static void nop_clear_range(struct i915_address_space *vm,
 {
 }
 
-static void gen8_ggtt_clear_range(struct i915_address_space *vm,
- u64 start, u64 length)
-{
-   struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
-   unsigned int first_entry = start / I915_GTT_PAGE_SIZE;
-   unsigned int num_entries = length / I915_GTT_PAGE_SIZE;
-   const gen8_pte_t scratch_pte = vm->scratch[0]->encode;
-   gen8_pte_t __iomem *gtt_base =
-   (gen8_pte_t __iomem *)ggtt->gsm + first_entry;
-   const int max_entries = ggtt_total_entries(ggtt) - first_entry;
-   int i;
-
-   if (WARN(num_entries > max_entries,
-"First entry = %d; Num entries = %d (max=%d)\n",
-first_entry, num_entries, max_entries))
-   num_entries = max_entries;
-
-   for (i = 0; i < num_entries; i++)
-   gen8_set_pte(&gtt_base[i], scratch_pte);
-}
-
 static void bxt_vtd_ggtt_wa(struct i915_address_space *vm)
 {
/*
@@ -907,8 +886,6 @@ static int gen8_gmch_probe(struct i915_ggtt *ggtt)
ggtt->vm.cleanup = gen6_gmch_remove;
ggtt->vm.insert_page = gen8_ggtt_insert_page;
ggtt->vm.clear_range = nop_clear_range;
-   if (intel_scanout_needs_vtd_wa(i915))
-   ggtt->vm.clear_range = gen8_ggtt_clear_range;
 
ggtt->vm.insert_entries = gen8_ggtt_insert_entries;
 
@@ -1054,7 +1031,7 @@ static int gen6_gmch_probe(struct i915_ggtt *ggtt)
ggtt->vm.alloc_pt_dma = alloc_pt_dma;
 
ggtt->vm.clear_range = nop_clear_range;
-   if (!HAS_FULL_PPGTT(i915) || intel_scanout_needs_vtd_wa(i915))
+   if (!HAS_FULL_PPGTT(i915))
ggtt->vm.clear_range = gen6_ggtt_clear_range;
ggtt->vm.insert_page = gen6_ggtt_insert_page;
ggtt->vm.insert_entries = gen6_ggtt_insert_entries;
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h 
b/drivers/gpu/drm/i915/i915_gem_gtt.h
index c9b0ee5e1d23..f3ae9afdee15 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -41,6 +41,7 @@ int i915_gem_gtt_insert(struct i915_address_space *vm,
 #define PIN_HIGH   BIT_ULL(5

[Intel-gfx] [PATCH 1/3] drm/i915: Wrap all access to i915_vma.node.start|size

2021-02-15 Thread Chris Wilson
We already wrap i915_vma.node.start for use with the GGTT, as there we
can perform additional sanity checks that the node belongs to the GGTT
and fits within the 32b registers. In the next couple of patches, we
will introduce guard pages around the objects _inside_ the drm_mm_node
allocation. That is we will offset the vma->pages so that the first page
is at drm_mm_node.start + vma->guard (not 0 as is currently the case).
All users must then not use i915_vma.node.start directly, but compute
the guard offset; thus all users are converted to use an
i915_vma_offset() wrapper.

The notable exceptions are the selftests that are testing exact
behaviour of i915_vma_pin/i915_vma_insert.
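
For readers following along, the wrapper amounts to something like this
(a sketch of the intent only; the final helper may differ in detail):

	static inline u64 i915_vma_offset(const struct i915_vma *vma)
	{
		/* the object begins one guard length into the drm_mm_node */
		return vma->node.start + vma->guard;
	}

with i915_vma_size() correspondingly reporting the object extent minus
the surrounding guard pages.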

Signed-off-by: Chris Wilson 
Reviewed-by: Matthew Auld 
---
 drivers/gpu/drm/i915/display/intel_fbdev.c|  6 +--
 .../gpu/drm/i915/gem/i915_gem_client_blt.c|  3 +-
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 41 ++-
 drivers/gpu/drm/i915/gem/i915_gem_mman.c  |  2 +-
 .../gpu/drm/i915/gem/i915_gem_object_blt.c| 13 +++---
 drivers/gpu/drm/i915/gem/i915_gem_shrinker.c  |  2 +-
 drivers/gpu/drm/i915/gem/i915_gem_tiling.c|  4 +-
 .../gpu/drm/i915/gem/selftests/huge_pages.c   |  2 +-
 .../i915/gem/selftests/i915_gem_client_blt.c  | 15 +++
 .../drm/i915/gem/selftests/i915_gem_context.c | 19 ++---
 .../drm/i915/gem/selftests/i915_gem_mman.c|  2 +-
 .../drm/i915/gem/selftests/igt_gem_utils.c|  7 ++--
 drivers/gpu/drm/i915/gt/gen6_ppgtt.c  |  2 +-
 drivers/gpu/drm/i915/gt/gen7_renderclear.c|  2 +-
 drivers/gpu/drm/i915/gt/gen8_ppgtt.c  |  8 ++--
 drivers/gpu/drm/i915/gt/intel_ggtt.c  |  5 ++-
 drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c  |  3 +-
 drivers/gpu/drm/i915/gt/intel_ppgtt.c |  7 +++-
 drivers/gpu/drm/i915/gt/intel_renderstate.c   |  2 +-
 .../gpu/drm/i915/gt/intel_ring_submission.c   |  2 +-
 drivers/gpu/drm/i915/gt/selftest_engine_cs.c  | 12 +++---
 drivers/gpu/drm/i915/gt/selftest_execlists.c  | 18 
 drivers/gpu/drm/i915/gt/selftest_hangcheck.c  | 17 
 drivers/gpu/drm/i915/gt/selftest_lrc.c| 16 
 .../drm/i915/gt/selftest_ring_submission.c|  2 +-
 drivers/gpu/drm/i915/gt/selftest_rps.c| 12 +++---
 .../gpu/drm/i915/gt/selftest_workarounds.c|  9 ++--
 drivers/gpu/drm/i915/i915_cmd_parser.c|  4 +-
 drivers/gpu/drm/i915/i915_debugfs.c   |  3 +-
 drivers/gpu/drm/i915/i915_gpu_error.c |  4 +-
 drivers/gpu/drm/i915/i915_perf.c  |  2 +-
 drivers/gpu/drm/i915/i915_scheduler.c |  4 +-
 drivers/gpu/drm/i915/i915_vma.c   | 21 +-
 drivers/gpu/drm/i915/i915_vma.h   | 23 +--
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 12 +-
 drivers/gpu/drm/i915/selftests/i915_request.c | 20 -
 .../gpu/drm/i915/selftests/i915_scheduler.c   |  4 +-
 drivers/gpu/drm/i915/selftests/igt_spinner.c  | 11 +++--
 38 files changed, 196 insertions(+), 145 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_fbdev.c 
b/drivers/gpu/drm/i915/display/intel_fbdev.c
index 07db8e83f98e..2a72fb3c8416 100644
--- a/drivers/gpu/drm/i915/display/intel_fbdev.c
+++ b/drivers/gpu/drm/i915/display/intel_fbdev.c
@@ -237,8 +237,8 @@ static int intelfb_create(struct drm_fb_helper *helper,
 
/* Our framebuffer is the entirety of fbdev's system memory */
info->fix.smem_start =
-   (unsigned long)(ggtt->gmadr.start + vma->node.start);
-   info->fix.smem_len = vma->node.size;
+   (unsigned long)(ggtt->gmadr.start + i915_ggtt_offset(vma));
+   info->fix.smem_len = vma->size;
 
vaddr = i915_vma_pin_iomap(vma);
if (IS_ERR(vaddr)) {
@@ -248,7 +248,7 @@ static int intelfb_create(struct drm_fb_helper *helper,
goto out_unpin;
}
info->screen_base = vaddr;
-   info->screen_size = vma->node.size;
+   info->screen_size = vma->size;
 
drm_fb_helper_fill_info(info, &ifbdev->helper, sizes);
 
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c 
b/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c
index 44821d94544f..864510b3a7fd 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c
@@ -218,7 +218,8 @@ static void clear_pages_worker(struct work_struct *work)
}
 
err = rq->engine->emit_bb_start(rq,
-   batch->node.start, batch->node.size,
+   i915_vma_offset(batch),
+   i915_vma_size(batch),
0);
 out_request:
if (unlikely(err)) {
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index fe170186dd42..7f985902c24a 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_exec

[Intel-gfx] [PATCH 2/3] drm/i915: Introduce guard pages to i915_vma

2021-02-15 Thread Chris Wilson
Introduce the concept of padding the i915_vma with guard pages before
and aft. The major consequence is that all ordinary uses of i915_vma
must use i915_vma_offset/i915_vma_size and not i915_vma.node.start/size
directly, as the drm_mm_node will include the guard pages that surround
our object.

The biggest conundrum is how exactly to mix requesting a fixed address
with guard pages, particularly through the existing uABI. The user does
not know about guard pages, so such must be transparent to the user, and
so the execobj.offset must be that of the object itself excluding the
guard. So a PIN_OFFSET_FIXED must then be exclusive of the guard pages.
The caveat is that some placements will be impossible with guard pages,
as wrap arounds need to be avoided, and the vma itself will require a
larger node. We must not report EINVAL but ENOSPC as these are
unavailable locations within the GTT rather than conflicting user
requirements.

In the next patch, we start using guard pages for scanout objects. While
these are limited to GGTT vma, on a few platforms these vma (or at least
an alias of the vma) is shared with userspace, so we may leak the
existence of such guards if we are not careful to ensure that the
execobj.offset is transparent and excludes the guards. (On such platforms
like ivb, without full-ppgtt, userspace has to use relocations so the
presence of more untouchable regions within its GTT should be of no further
issue.)

v2: Include the guard range in the overflow checks and placement
restrictions.

Signed-off-by: Chris Wilson 
Cc: Matthew Auld 
---
 drivers/gpu/drm/i915/gt/intel_ggtt.c  | 12 ++--
 drivers/gpu/drm/i915/i915_vma.c   | 28 ++-
 drivers/gpu/drm/i915/i915_vma.h   |  5 +++--
 drivers/gpu/drm/i915/i915_vma_types.h |  3 ++-
 4 files changed, 38 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c 
b/drivers/gpu/drm/i915/gt/intel_ggtt.c
index c5803c434d33..6b326138e765 100644
--- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
@@ -238,8 +238,12 @@ static void gen8_ggtt_insert_entries(struct 
i915_address_space *vm,
 
gte = (gen8_pte_t __iomem *)ggtt->gsm;
gte += vma->node.start / I915_GTT_PAGE_SIZE;
-   end = gte + vma->node.size / I915_GTT_PAGE_SIZE;
 
+   end = gte + vma->guard / I915_GTT_PAGE_SIZE;
+   while (gte < end)
+   gen8_set_pte(gte++, vm->scratch[0]->encode);
+
+   end += (vma->node.size - vma->guard) / I915_GTT_PAGE_SIZE;
for_each_sgt_daddr(addr, iter, vma->pages)
gen8_set_pte(gte++, pte_encode | addr);
GEM_BUG_ON(gte > end);
@@ -289,8 +293,12 @@ static void gen6_ggtt_insert_entries(struct 
i915_address_space *vm,
 
gte = (gen6_pte_t __iomem *)ggtt->gsm;
gte += vma->node.start / I915_GTT_PAGE_SIZE;
-   end = gte + vma->node.size / I915_GTT_PAGE_SIZE;
 
+   end = gte + vma->guard / I915_GTT_PAGE_SIZE;
+   while (gte < end)
+   gen8_set_pte(gte++, vm->scratch[0]->encode);
+
+   end += (vma->node.size - vma->guard) / I915_GTT_PAGE_SIZE;
for_each_sgt_daddr(addr, iter, vma->pages)
iowrite32(vm->pte_encode(addr, level, flags), gte++);
GEM_BUG_ON(gte > end);
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 748f5ea1ba04..31d0f8b64ec0 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -623,7 +623,7 @@ bool i915_gem_valid_gtt_space(struct i915_vma *vma, 
unsigned long color)
 static int
 i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
 {
-   unsigned long color;
+   unsigned long color, guard;
u64 start, end;
int ret;
 
@@ -631,7 +631,7 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 
alignment, u64 flags)
GEM_BUG_ON(drm_mm_node_allocated(&vma->node));
 
size = max(size, vma->size);
-   alignment = max(alignment, vma->display_alignment);
+   alignment = max_t(typeof(alignment), alignment, vma->display_alignment);
if (flags & PIN_MAPPABLE) {
size = max_t(typeof(size), size, vma->fence_size);
alignment = max_t(typeof(alignment),
@@ -642,6 +642,9 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 
alignment, u64 flags)
GEM_BUG_ON(!IS_ALIGNED(alignment, I915_GTT_MIN_ALIGNMENT));
GEM_BUG_ON(!is_power_of_2(alignment));
 
+   guard = vma->guard; /* retain guard across rebinds */
+   guard = ALIGN(guard, alignment);
+
start = flags & PIN_OFFSET_BIAS ? flags & PIN_OFFSET_MASK : 0;
GEM_BUG_ON(!IS_ALIGNED(start, I915_GTT_PAGE_SIZE));
 
@@ -651,12 +654,13 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 
alignment, u64 flags)
if (flags & PIN_ZONE_4G)
end = min_t(u64, end, (1ULL << 32) 

Re: [Intel-gfx] [PATCH] drm/i915/gem: Add a check for object size for corner cases

2021-02-15 Thread Chris Wilson
Quoting Ram Moon, AnandX (2021-02-15 12:29:17)
> Hi Chris,
> 
> -Original Message-
> From: dri-devel  On Behalf Of Chris 
> Wilson
> Sent: Wednesday, February 10, 2021 4:15 PM
> To: Ram Moon, AnandX ; Jani Nikula 
> ; Auld, Matthew ; 
> Surendrakumar Upadhyay, TejaskumarX 
> ; Ursulin, Tvrtko 
> ; dri-de...@lists.freedesktop.org; 
> intel-gfx@lists.freedesktop.org
> Cc: Ram Moon, AnandX 
> Subject: Re: [Intel-gfx] [PATCH] drm/i915/gem: Add a check for object size 
> for corner cases
> 
> Quoting Anand Moon (2021-02-10 07:59:29)
> > Add check for object size to return appropriate error -E2BIG or 
> > -EINVAL to avoid WARM_ON and successful return for some testcase.
> 
> No. You miss the point of having those warnings. We need to inspect the code 
> to remove the last remaining "int pagenum", and then we can remove the 
> GEM_WARN_ON((sz) >> PAGE_SHIFT > INT_MAX). These are not emitted for users, 
> only for us to motivate us into finally fixing the code.
> -Chris
> 
> Yes, I got your point these check are meant to catch up integer overflow.
> 
> I have check with the IGT testcase case  _gem_create_ and _gem_userptr_blits_ 
>  
> which fails pass size *-1ull << 32*  left shift and *0~* which leads to 
> integer overflow 
> ie  _negative_ size passed to create  i915_gem_create via ioctl  this leads 
> to GM_WARN_ON. 
> 
> Can we drop these testcase so that we don't break the kernel ?

The kernel rejects the ioctl with the expected errno. We leave a warning
purely for the benefit of CI, only when compiled for debugging and not by
default, so that we have a persistent reminder to do the code review.
What's broken?
-Chris


Re: [Intel-gfx] [CI 4/4] drm/i915: move intel_init_audio_hooks inside display

2021-02-13 Thread Chris Wilson
Quoting Lucas De Marchi (2021-02-13 04:27:56)
> intel_init_audio_hooks() sets up hooks in the display struct and only
> makes sense when we have display. Move it inside
> intel_init_display_hooks() so it isn't called when we don't have
> display.
> 
> Signed-off-by: Lucas De Marchi 

And patch 2 answered the question of what about intel_audio_init().

Reviewed-by: Chris Wilson 
-Chris


Re: [Intel-gfx] [RFC 3/3] drm/i915/gt: Export device and per-process runtimes via procfs

2021-02-12 Thread Chris Wilson
Quoting Emil Velikov (2021-02-12 15:45:04)
> On Fri, 12 Feb 2021 at 15:16, Chris Wilson  wrote:
> >
> > Quoting Emil Velikov (2021-02-12 14:57:56)
> > > Hi Chris,
> > >
> > > On Thu, 4 Feb 2021 at 12:11, Chris Wilson  
> > > wrote:
> > > >
> > > > Register with /proc/gpu to provide the client runtimes for generic
> > > > top-like overview, e.g. gnome-system-monitor can use this information to
> > > > show the per-process multi-GPU usage.
> > > >
> > > Exposing this information to userspace sounds great IMHO and like the
> > > proposed "channels" for the device engines.
> > > If it were me, I would have the channel names a) exposed to userspace
> > > and b) be a "fixed set".
> >
> > - Total
> > - Graphics
> > - Compute
> > - Unified
> > - Video
> > - Copy
> > - Display
> > - Other
> >
> > Enough versatility for the foreseeable future?
> > But plan for extension.
> >
> With a bit of documentation about "unified" (is it a metric also
> counted towards any of the rest) it would be perfect.

With unified I was trying to find a place to put things that are neither
wholly graphics nor compute, as some may prefer not to categorise
themselves as one or the other. Also whether or not some cores are more
compute than others (so should there be an AI/RT/ALU?)

> For future extension one might consider splitting video into
> encoder/decoder/post-processing.

Ok, I wasn't sure how commonly those functions were split on different
HW.

> > The other aspect then is the capacity of each channel. We can keep it
> > simple as the union/average (whichever the driver has to hand) runtime in
> > nanoseconds over all IP blocks within a channel.
> 
> Not sure what you mean with capacity. Are you referring to having
> multiple instances of the same engine (say 3 separate copy engines)?
> Personally I'm inclined to keep these separate entries, since some
> hardware can have multiple ones.
> 
> For example - before the latest changes nouveau had 8 copy engines,
> 3+3 video 'generic' video (enc,dec)oder engines, amongst others.

Yes, most HW have multiple engines within a family. Trying to keep it
simple, I thought of presenting just one runtime metric for the whole
channel. Especially for the single-line per device format I had picked :)

If we switch to a more extensible format,

-'$device0' : 
-$channel0 : {
Total : $total # avg/union over all engines
Engines : [ $0, $1, ... ]
}
...

-'$device1' : 
...

Using the same fixed channel names, and dev_name(), pesky concerns such
as keeping it as a simple scanf can be forgotten.
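
Purely as an illustration (device name and numbers made up), an entry in
that shape might read:

	card0:
	  Graphics : { Total : 123456789, Engines : [ 61728394, 61728395 ] }
	  Video    : { Total : 987654,    Engines : [ 987654 ] }
	  Copy     : { Total : 0,         Engines : [ 0 ] }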
-Chris


Re: [Intel-gfx] [PATCH] Revert "HAX sound: Disable probing snd_hda with DG1"

2021-02-12 Thread Chris Wilson
Quoting Chris Wilson (2021-02-12 15:22:13)
> Quoting Kai Vehmanen (2021-02-12 14:53:02)
> > This reverts commit 3632610d38316bca9b0cd9d649ce3cefab58520a.
> > 
> > DG1 has been supported in upstream since v5.10 with commit
> > 69b08bdfa818 ("ALSA: hda - add Intel DG1 PCI and HDMI ids").
> 
> Exactly, otherwise this patch wouldn't have been required to stop CI
> from timing out upon snd probing the hdmi components. You need the other
> half to be supported as well before CI is ready.

Just in case it isn't clear, this patch is only in the CI topic branch
to hide known issues with CI that we simply aren't ready for.
-Chris


Re: [Intel-gfx] [PATCH] Revert "HAX sound: Disable probing snd_hda with DG1"

2021-02-12 Thread Chris Wilson
Quoting Kai Vehmanen (2021-02-12 14:53:02)
> This reverts commit 3632610d38316bca9b0cd9d649ce3cefab58520a.
> 
> DG1 has been supported in upstream since v5.10 with commit
> 69b08bdfa818 ("ALSA: hda - add Intel DG1 PCI and HDMI ids").

Exactly, otherwise this patch wouldn't have been required to stop CI
from timing out upon snd probing the hdmi components. You need the other
half to be supported as well before CI is ready.
-Chris


Re: [Intel-gfx] [RFC 3/3] drm/i915/gt: Export device and per-process runtimes via procfs

2021-02-12 Thread Chris Wilson
Quoting Emil Velikov (2021-02-12 14:57:56)
> Hi Chris,
> 
> On Thu, 4 Feb 2021 at 12:11, Chris Wilson  wrote:
> >
> > Register with /proc/gpu to provide the client runtimes for generic
> > top-like overview, e.g. gnome-system-monitor can use this information to
> > show the per-process multi-GPU usage.
> >
> Exposing this information to userspace sounds great IMHO and like the
> proposed "channels" for the device engines.
> If it were me, I would have the channel names a) exposed to userspace
> and b) be a "fixed set".

- Total
- Graphics
- Compute
- Unified
- Video
- Copy
- Display
- Other

Enough versatility for the foreseeable future?
But plan for extension.

The other aspect then is the capacity of each channel. We can keep it
simple as the union/average (whichever the driver has to hand) runtime in
nanoseconds over all IP blocks within a channel.
-Chris


[Intel-gfx] [PATCH i-g-t] i915/gem_userptr_blts: Move readonly test into common block

2021-02-12 Thread Chris Wilson
Since we only run test_readonly for a single sync-flag, place it in the
common block.

Signed-off-by: Chris Wilson 
---
 tests/i915/gem_userptr_blits.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tests/i915/gem_userptr_blits.c b/tests/i915/gem_userptr_blits.c
index 1bc2d3600..a4f137f93 100644
--- a/tests/i915/gem_userptr_blits.c
+++ b/tests/i915/gem_userptr_blits.c
@@ -2474,6 +2474,9 @@ igt_main_args("c:", NULL, help_str, opt_handler, NULL)
igt_subtest("set-cache-level")
test_set_caching(fd);
 
+   igt_subtest("readonly")
+   test_readonly(fd);
+
igt_subtest("userfault")
test_userfault(fd);
 
@@ -2515,9 +2518,6 @@ igt_main_args("c:", NULL, help_str, opt_handler, NULL)
igt_subtest("dmabuf-unsync")
test_dmabuf();
 
-   igt_subtest("readonly-unsync")
-   test_readonly(fd);
-
igt_describe("Examine mmap-offset mapping to read-only 
userptr");
igt_subtest_with_dynamic("readonly-mmap-unsync")
for_each_mmap_offset_type(fd, t)
-- 
2.30.0



Re: [Intel-gfx] [PATCH 2/3] drm/i915: Introduce guard pages to i915_vma

2021-02-12 Thread Chris Wilson
Quoting Matthew Auld (2021-02-12 13:43:46)
> On Fri, 12 Feb 2021 at 10:22, Chris Wilson  wrote:
> >
> > Introduce the concept of padding the i915_vma with guard pages before
> > and aft. The major consequence is that all ordinary uses of i915_vma
> > must use i915_vma_offset/i915_vma_size and not i915_vma.node.start/size
> > directly, as the drm_mm_node will include the guard pages that surround
> > our object.
> >
> > So in this patch, we look for all uses of i915_vma->node.start that
> > instead need to include the guard offset and switch them to
> > i915_vma_offset(), and in a few cases to i915_ggtt_offset(). Notable
> > exceptions are the selftests, which expect exact behaviour.
> >
> > The biggest conundrum is how exactly to mix requesting a fixed address
> > with guard pages, particularly through the existing uABI. The user does
> > not know about guard pages, so such must be transparent to the user, and
> > so the execobj.offset must be that of the object itself excluding the
> > guard. So a PIN_OFFSET_FIXED must then be exclusive of the guard pages.
> >
> > In the next patch, we start using guard pages for scanout objects. While
> > these are limited to GGTT vma, on a few platforms these vma (or at least
> > an alias of the vma) is shared with userspace, so we may leak the
> > existence of such guards if we are not careful to ensure that the
> > execobj.offset is transparent and excludes the guards. (On such platforms,
> > without full-ppgtt, userspace has to use relocations so the presence of
> > more untouchable regions within its GTT should be of no further issue.)
> >
> > Signed-off-by: Chris Wilson 
> > Cc: Matthew Auld 
> > ---
> >  drivers/gpu/drm/i915/gt/intel_ggtt.c  | 12 ++--
> >  drivers/gpu/drm/i915/i915_vma.c   | 10 +++---
> >  drivers/gpu/drm/i915/i915_vma.h   |  8 
> >  drivers/gpu/drm/i915/i915_vma_types.h |  3 ++-
> >  4 files changed, 23 insertions(+), 10 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c 
> > b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> > index c5803c434d33..6b326138e765 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> > @@ -238,8 +238,12 @@ static void gen8_ggtt_insert_entries(struct 
> > i915_address_space *vm,
> >
> > gte = (gen8_pte_t __iomem *)ggtt->gsm;
> > gte += vma->node.start / I915_GTT_PAGE_SIZE;
> > -   end = gte + vma->node.size / I915_GTT_PAGE_SIZE;
> >
> > +   end = gte + vma->guard / I915_GTT_PAGE_SIZE;
> > +   while (gte < end)
> > +   gen8_set_pte(gte++, vm->scratch[0]->encode);
> > +
> > +   end += (vma->node.size - vma->guard) / I915_GTT_PAGE_SIZE;
> > for_each_sgt_daddr(addr, iter, vma->pages)
> > gen8_set_pte(gte++, pte_encode | addr);
> > GEM_BUG_ON(gte > end);
> > @@ -289,8 +293,12 @@ static void gen6_ggtt_insert_entries(struct 
> > i915_address_space *vm,
> >
> > gte = (gen6_pte_t __iomem *)ggtt->gsm;
> > gte += vma->node.start / I915_GTT_PAGE_SIZE;
> > -   end = gte + vma->node.size / I915_GTT_PAGE_SIZE;
> >
> > +   end = gte + vma->guard / I915_GTT_PAGE_SIZE;
> > +   while (gte < end)
> > +   gen8_set_pte(gte++, vm->scratch[0]->encode);
> > +
> > +   end += (vma->node.size - vma->guard) / I915_GTT_PAGE_SIZE;
> > for_each_sgt_daddr(addr, iter, vma->pages)
> > iowrite32(vm->pte_encode(addr, level, flags), gte++);
> > GEM_BUG_ON(gte > end);
> > diff --git a/drivers/gpu/drm/i915/i915_vma.c 
> > b/drivers/gpu/drm/i915/i915_vma.c
> > index 17fe455bd770..155f510b4cc6 100644
> > --- a/drivers/gpu/drm/i915/i915_vma.c
> > +++ b/drivers/gpu/drm/i915/i915_vma.c
> > @@ -623,7 +623,7 @@ bool i915_gem_valid_gtt_space(struct i915_vma *vma, 
> > unsigned long color)
> >  static int
> >  i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
> >  {
> > -   unsigned long color;
> > +   unsigned long color, guard;
> > u64 start, end;
> > int ret;
> >
> > @@ -631,13 +631,16 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 
> > alignment, u64 flags)
> > GEM_BUG_ON(drm_mm_node_allocated(&vma->node));
> >
> > size = max(size, vma->size);
> > -   alignment = max(alignment, vma->display_alignment);
> > +   alignment = max_t(typ

[Intel-gfx] [PATCH v6] drm/i915/gt: Ratelimit heartbeat completion probing

2021-02-12 Thread Chris Wilson
The heartbeat runs through a few phases that we expect to complete
within a certain number of heartbeat intervals. First we must submit the
heartbeat to the queue, and if the queue is occupied it may take a
couple of intervals before the heartbeat preempts the workload and is
submitted to HW. Once running on HW, completion is not instantaneous as
it may have to first reset the current workload before it itself runs
through the empty request and signals completion. As such, we know that
the heartbeat must take at least the preempt reset timeout, and before we
have had a chance to reset the engine, we do not want to issue a global
reset ourselves (simply so that we only try to do one reset at a time
and not confuse ourselves by resetting twice and hitting an innocent.)

So by taking into consideration that once running the request must take
a finite amount of time, we can delay the final completion check to
accommodate that and avoid checking too early (before we've had a chance
to handle any engine resets required).
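(As a worked example: assuming, say, a 640ms preempt-reset timeout, the
final completion check is deferred to at least ~690ms after submission
to HW -- the preempt timeout plus the 50ms completion window below --
rather than firing at the bare heartbeat interval.)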

v3: Now with more tracking for selftests and detection of
false/unexpected hang reports.

v2,4,5: Add a completion handler for the heartbeat to speed up discovery
of a successful heartbeat. Remove it, but then put it back to handle
large mismatches between the sysfs properties and reality.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2853
Suggested-by: CQ Tang 
Signed-off-by: Chris Wilson 
---
 .../gpu/drm/i915/gt/intel_engine_heartbeat.c  | 107 +++---
 drivers/gpu/drm/i915/gt/intel_engine_types.h  |   8 ++
 .../drm/i915/gt/selftest_engine_heartbeat.c   | 103 +++--
 drivers/gpu/drm/i915/gt/selftest_execlists.c  |   5 +-
 4 files changed, 173 insertions(+), 50 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c 
b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
index 0b062fad1837..7e7d2ff2299b 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
@@ -20,6 +20,18 @@
  * issue a reset -- in the hope that restores progress.
  */
 
+#define HEARTBEAT_COMPLETION 50u /* milliseconds */
+
+static long completion_timeout(const struct intel_engine_cs *engine)
+{
+   long timeout = HEARTBEAT_COMPLETION;
+
+   if (intel_engine_has_preempt_reset(engine))
+   timeout += READ_ONCE(engine->props.preempt_timeout_ms);
+
+   return msecs_to_jiffies(timeout);
+}
+
 static bool next_heartbeat(struct intel_engine_cs *engine)
 {
long delay;
@@ -29,6 +41,26 @@ static bool next_heartbeat(struct intel_engine_cs *engine)
return false;
 
delay = msecs_to_jiffies_timeout(delay);
+
+   /*
+* Once we submit a heartbeat to the HW, we know that it will take
+* at least a certain amount of time to complete. On a hanging system
+* it will first have to wait for the preempt reset timeout, and
+* then it will take some time for the reset to resume with the
+* heartbeat and for it to complete. So once we have submitted the
+* heartbeat to HW, we can wait a while longer before declaring the
+* engine stuck and forcing a reset ourselves. If we do a reset
+* and the engine is also doing a reset, it is possible that we
+* reset the engine twice, harming an innocent.
+*
+* Before we have submitted the heartbeat, we do not want to change
+* the interval as we want to promote the heartbeat and trigger
+* preemption in a deterministic time frame.
+*/
+   if (engine->heartbeat.systole &&
+   i915_request_is_active(engine->heartbeat.systole))
+   delay = max(delay, completion_timeout(engine));
+
if (delay >= HZ)
delay = round_jiffies_up_relative(delay);
mod_delayed_work(system_highpri_wq, &engine->heartbeat.work, delay + 1);
@@ -48,12 +80,52 @@ heartbeat_create(struct intel_context *ce, gfp_t gfp)
return rq;
 }
 
+static void
+untrack_heartbeat(struct intel_engine_cs *engine)
+{
+   struct i915_request *rq;
+
+   rq = engine->heartbeat.systole;
+   if (!rq)
+   return;
+
+   ENGINE_TRACE(engine, "heartbeat " RQ_FMT " completed\n", RQ_ARG(rq));
+   I915_SELFTEST_ONLY(engine->heartbeat.completed++);
+
+   dma_fence_remove_callback(&rq->fence, &engine->heartbeat.cb);
+   WRITE_ONCE(engine->heartbeat.systole, NULL);
+
+   i915_request_put(rq);
+}
+
+static void defibrillator(struct dma_fence *f, struct dma_fence_cb *cb)
+{
+   struct intel_engine_cs *engine =
+   container_of(cb, typeof(*engine), heartbeat.cb);
+
+   if (READ_ONCE(engine->heartbeat.systole))
+   mod_delayed_work(system_highpri_wq, &engine->heartbeat.work, 0);
+}
+
+static void
+track_heartbeat(struct intel_engine_cs *engine, struct i915_request *rq)
+{
+   ENGINE_TRACE(engine, "heartbeat 

[Intel-gfx] [PATCH 3/3] drm/i915: Refine VT-d scanout workaround

2021-02-12 Thread Chris Wilson
VT-d may cause overfetch of the scanout PTE, both before and after the
vma (depending on the scanout orientation). bspec recommends that we
provide a tile-row in either directions, and suggests using 160 PTE,
warning that the accesses will wrap around the ends of the GGTT.
Currently, we fill the entire GGTT with scratch pages when using VT-d to
always ensure there are valid entries around every vma, including
scanout. However, writing every PTE is slow as on recent devices we
perform 8MiB of uncached writes, incurring an extra 100ms during resume.

If instead we focus on only putting guard pages around scanout, we can
avoid touching the whole GGTT. To avoid having to introduce extra nodes
around each scanout vma, we adjust the scanout drm_mm_node to be smaller
than the allocated space, and fixup the extra PTE during dma binding.

v2: Move the guard from modifying drm_mm_node.start which is still used
by the drm_mm itself, into an adjustment of node.start at the point of
use.

Signed-off-by: Chris Wilson 
Cc: Ville Syrjälä 
Cc: Matthew Auld 
---
 drivers/gpu/drm/i915/gem/i915_gem_domain.c |  3 +++
 drivers/gpu/drm/i915/gt/intel_ggtt.c   | 25 +-
 drivers/gpu/drm/i915/i915_gem_gtt.h|  1 +
 drivers/gpu/drm/i915/i915_vma.c| 10 +
 4 files changed, 15 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c 
b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
index 0478b069c202..9f2ccc255ca1 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
@@ -345,6 +345,9 @@ i915_gem_object_pin_to_display_plane(struct 
drm_i915_gem_object *obj,
if (ret)
goto err;
 
+   if (intel_scanout_needs_vtd_wa(i915))
+   flags |= PIN_VTD;
+
/*
 * As the user may map the buffer once pinned in the display plane
 * (e.g. libkms for the bootup splash), we have to ensure that we
diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c 
b/drivers/gpu/drm/i915/gt/intel_ggtt.c
index 6b326138e765..251b50884d1c 100644
--- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
@@ -319,27 +319,6 @@ static void nop_clear_range(struct i915_address_space *vm,
 {
 }
 
-static void gen8_ggtt_clear_range(struct i915_address_space *vm,
- u64 start, u64 length)
-{
-   struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
-   unsigned int first_entry = start / I915_GTT_PAGE_SIZE;
-   unsigned int num_entries = length / I915_GTT_PAGE_SIZE;
-   const gen8_pte_t scratch_pte = vm->scratch[0]->encode;
-   gen8_pte_t __iomem *gtt_base =
-   (gen8_pte_t __iomem *)ggtt->gsm + first_entry;
-   const int max_entries = ggtt_total_entries(ggtt) - first_entry;
-   int i;
-
-   if (WARN(num_entries > max_entries,
-"First entry = %d; Num entries = %d (max=%d)\n",
-first_entry, num_entries, max_entries))
-   num_entries = max_entries;
-
-   for (i = 0; i < num_entries; i++)
-   gen8_set_pte(&gtt_base[i], scratch_pte);
-}
-
 static void bxt_vtd_ggtt_wa(struct i915_address_space *vm)
 {
/*
@@ -907,8 +886,6 @@ static int gen8_gmch_probe(struct i915_ggtt *ggtt)
ggtt->vm.cleanup = gen6_gmch_remove;
ggtt->vm.insert_page = gen8_ggtt_insert_page;
ggtt->vm.clear_range = nop_clear_range;
-   if (intel_scanout_needs_vtd_wa(i915))
-   ggtt->vm.clear_range = gen8_ggtt_clear_range;
 
ggtt->vm.insert_entries = gen8_ggtt_insert_entries;
 
@@ -1054,7 +1031,7 @@ static int gen6_gmch_probe(struct i915_ggtt *ggtt)
ggtt->vm.alloc_pt_dma = alloc_pt_dma;
 
ggtt->vm.clear_range = nop_clear_range;
-   if (!HAS_FULL_PPGTT(i915) || intel_scanout_needs_vtd_wa(i915))
+   if (!HAS_FULL_PPGTT(i915))
ggtt->vm.clear_range = gen6_ggtt_clear_range;
ggtt->vm.insert_page = gen6_ggtt_insert_page;
ggtt->vm.insert_entries = gen6_ggtt_insert_entries;
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h 
b/drivers/gpu/drm/i915/i915_gem_gtt.h
index c9b0ee5e1d23..8a2dfc7144cf 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -41,6 +41,7 @@ int i915_gem_gtt_insert(struct i915_address_space *vm,
 #define PIN_HIGH   BIT_ULL(5)
 #define PIN_OFFSET_BIASBIT_ULL(6)
 #define PIN_OFFSET_FIXED   BIT_ULL(7)
+#define PIN_VTDBIT_ULL(8)
 
 #define PIN_GLOBAL BIT_ULL(10) /* I915_VMA_GLOBAL_BIND */
 #define PIN_USER   BIT_ULL(11) /* I915_VMA_LOCAL_BIND */
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 155f510b4cc6..929d2a1a20b8 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -38,6 +38,8 @@
 #include "i915_trace.h"
 #include "i

[Intel-gfx] [PATCH 2/3] drm/i915: Introduce guard pages to i915_vma

2021-02-12 Thread Chris Wilson
Introduce the concept of padding the i915_vma with guard pages before
and aft. The major consequence is that all ordinary uses of i915_vma
must use i915_vma_offset/i915_vma_size and not i915_vma.node.start/size
directly, as the drm_mm_node will include the guard pages that surround
our object.

So in this patch, we look for all uses of i915_vma->node.start that
instead need to include the guard offset and switch them to
i915_vma_offset(), and in a few cases to i915_ggtt_offset(). Notable
exceptions are the selftests, which expect exact behaviour.

The biggest conundrum is how exactly to mix requesting a fixed address
with guard pages, particularly through the existing uABI. The user does
not know about guard pages, so they must be transparent to the user, and
the execobj.offset must be that of the object itself, excluding the
guard. A PIN_OFFSET_FIXED must therefore be exclusive of the guard pages.

In the next patch, we start using guard pages for scanout objects. While
these are limited to GGTT vma, on a few platforms such a vma (or at least
an alias of it) is shared with userspace, so we may leak the existence of
the guards if we are not careful to ensure that the execobj.offset is
transparent and excludes them. (On such platforms, without full-ppgtt,
userspace has to use relocations, so the presence of more untouchable
regions within its GTT should be of no further issue.)

Signed-off-by: Chris Wilson 
Cc: Matthew Auld 
---
 drivers/gpu/drm/i915/gt/intel_ggtt.c  | 12 ++--
 drivers/gpu/drm/i915/i915_vma.c   | 10 +++---
 drivers/gpu/drm/i915/i915_vma.h   |  8 
 drivers/gpu/drm/i915/i915_vma_types.h |  3 ++-
 4 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c 
b/drivers/gpu/drm/i915/gt/intel_ggtt.c
index c5803c434d33..6b326138e765 100644
--- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
@@ -238,8 +238,12 @@ static void gen8_ggtt_insert_entries(struct 
i915_address_space *vm,
 
gte = (gen8_pte_t __iomem *)ggtt->gsm;
gte += vma->node.start / I915_GTT_PAGE_SIZE;
-   end = gte + vma->node.size / I915_GTT_PAGE_SIZE;
 
+   end = gte + vma->guard / I915_GTT_PAGE_SIZE;
+   while (gte < end)
+   gen8_set_pte(gte++, vm->scratch[0]->encode);
+
+   end += (vma->node.size - vma->guard) / I915_GTT_PAGE_SIZE;
for_each_sgt_daddr(addr, iter, vma->pages)
gen8_set_pte(gte++, pte_encode | addr);
GEM_BUG_ON(gte > end);
@@ -289,8 +293,12 @@ static void gen6_ggtt_insert_entries(struct 
i915_address_space *vm,
 
gte = (gen6_pte_t __iomem *)ggtt->gsm;
gte += vma->node.start / I915_GTT_PAGE_SIZE;
-   end = gte + vma->node.size / I915_GTT_PAGE_SIZE;
 
+   end = gte + vma->guard / I915_GTT_PAGE_SIZE;
+   while (gte < end)
+   gen8_set_pte(gte++, vm->scratch[0]->encode);
+
+   end += (vma->node.size - vma->guard) / I915_GTT_PAGE_SIZE;
for_each_sgt_daddr(addr, iter, vma->pages)
iowrite32(vm->pte_encode(addr, level, flags), gte++);
GEM_BUG_ON(gte > end);
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 17fe455bd770..155f510b4cc6 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -623,7 +623,7 @@ bool i915_gem_valid_gtt_space(struct i915_vma *vma, 
unsigned long color)
 static int
 i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
 {
-   unsigned long color;
+   unsigned long color, guard;
u64 start, end;
int ret;
 
@@ -631,13 +631,16 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 
alignment, u64 flags)
GEM_BUG_ON(drm_mm_node_allocated(&vma->node));
 
size = max(size, vma->size);
-   alignment = max(alignment, vma->display_alignment);
+   alignment = max_t(typeof(alignment), alignment, vma->display_alignment);
if (flags & PIN_MAPPABLE) {
size = max_t(typeof(size), size, vma->fence_size);
alignment = max_t(typeof(alignment),
  alignment, vma->fence_alignment);
}
 
+   guard = 0;
+   size += 2 * guard;
+
GEM_BUG_ON(!IS_ALIGNED(size, I915_GTT_PAGE_SIZE));
GEM_BUG_ON(!IS_ALIGNED(alignment, I915_GTT_MIN_ALIGNMENT));
GEM_BUG_ON(!is_power_of_2(alignment));
@@ -674,7 +677,7 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 
alignment, u64 flags)
return -EINVAL;
 
ret = i915_gem_gtt_reserve(vma->vm, &vma->node,
-  size, offset, color,
+  size, offset - guard, color,
   flags);
if (ret)
return ret;
@@ -725,6 +72

[Intel-gfx] [PATCH 1/3] drm/i915: Wrap all access to i915_vma.node.start|size

2021-02-12 Thread Chris Wilson
We already wrap i915_vma.node.start for use with the GGTT, as there we
can perform additional sanity checks that the node belongs to the GGTT
and fits within the 32b registers. In the next couple of patches, we
will introduce guard pages around the objects _inside_ the drm_mm_node
allocation. That is we will offset the vma->pages so that the first page
is at drm_mm_node.start + vma->guard (not 0 as is currently the case).
All users must then avoid using i915_vma.node.start directly and instead
compute the guard offset, so all users are converted to use the
i915_vma_offset() wrapper.

The notable exceptions are the selftests that are testing exact
behaviour of i915_vma_pin/i915_vma_insert.
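
For reference, a minimal sketch (not part of this patch, and the exact
arithmetic is an assumption) of how the wrappers could fold in the
vma->guard introduced by patch 2/3, with node.start/node.size remaining
the raw drm_mm reservation:

static inline u64 i915_vma_offset(const struct i915_vma *vma)
{
	/* vma->pages begin after the front guard inside the node */
	return vma->node.start + vma->guard;
}

static inline u64 i915_vma_size(const struct i915_vma *vma)
{
	/* the usable size excludes the padding at both ends */
	return vma->node.size - 2 * vma->guard;
}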

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/display/intel_fbdev.c|  6 ++--
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 29 ++-
 drivers/gpu/drm/i915/gem/i915_gem_mman.c  |  2 +-
 .../gpu/drm/i915/gem/i915_gem_object_blt.c|  2 +-
 drivers/gpu/drm/i915/gem/i915_gem_shrinker.c  |  2 +-
 drivers/gpu/drm/i915/gem/i915_gem_tiling.c|  4 +--
 .../gpu/drm/i915/gem/selftests/huge_pages.c   |  2 +-
 .../i915/gem/selftests/i915_gem_client_blt.c  | 11 +++
 .../drm/i915/gem/selftests/i915_gem_context.c | 16 ++
 .../drm/i915/gem/selftests/i915_gem_mman.c|  2 +-
 .../drm/i915/gem/selftests/igt_gem_utils.c|  4 +--
 drivers/gpu/drm/i915/gt/gen6_ppgtt.c  |  2 +-
 drivers/gpu/drm/i915/gt/gen7_renderclear.c|  2 +-
 drivers/gpu/drm/i915/gt/gen8_ppgtt.c  |  8 ++---
 drivers/gpu/drm/i915/gt/intel_ggtt.c  |  5 ++--
 drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c  |  3 +-
 drivers/gpu/drm/i915/gt/intel_ppgtt.c |  7 +++--
 drivers/gpu/drm/i915/gt/intel_renderstate.c   |  2 +-
 .../gpu/drm/i915/gt/intel_ring_submission.c   |  2 +-
 drivers/gpu/drm/i915/gt/selftest_execlists.c  |  8 ++---
 drivers/gpu/drm/i915/gt/selftest_hangcheck.c  | 14 +
 .../drm/i915/gt/selftest_ring_submission.c|  2 +-
 drivers/gpu/drm/i915/gt/selftest_rps.c| 12 
 drivers/gpu/drm/i915/i915_debugfs.c   |  3 +-
 drivers/gpu/drm/i915/i915_gpu_error.c |  4 +--
 drivers/gpu/drm/i915/i915_perf.c  |  2 +-
 drivers/gpu/drm/i915/i915_vma.c   | 21 +++---
 drivers/gpu/drm/i915/i915_vma.h   | 18 ++--
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 12 ++--
 drivers/gpu/drm/i915/selftests/i915_request.c |  8 ++---
 .../gpu/drm/i915/selftests/i915_scheduler.c   |  4 +--
 drivers/gpu/drm/i915/selftests/igt_spinner.c  |  9 --
 32 files changed, 134 insertions(+), 94 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_fbdev.c 
b/drivers/gpu/drm/i915/display/intel_fbdev.c
index 07db8e83f98e..2a72fb3c8416 100644
--- a/drivers/gpu/drm/i915/display/intel_fbdev.c
+++ b/drivers/gpu/drm/i915/display/intel_fbdev.c
@@ -237,8 +237,8 @@ static int intelfb_create(struct drm_fb_helper *helper,
 
/* Our framebuffer is the entirety of fbdev's system memory */
info->fix.smem_start =
-   (unsigned long)(ggtt->gmadr.start + vma->node.start);
-   info->fix.smem_len = vma->node.size;
+   (unsigned long)(ggtt->gmadr.start + i915_ggtt_offset(vma));
+   info->fix.smem_len = vma->size;
 
vaddr = i915_vma_pin_iomap(vma);
if (IS_ERR(vaddr)) {
@@ -248,7 +248,7 @@ static int intelfb_create(struct drm_fb_helper *helper,
goto out_unpin;
}
info->screen_base = vaddr;
-   info->screen_size = vma->node.size;
+   info->screen_size = vma->size;
 
drm_fb_helper_fill_info(info, &ifbdev->helper, sizes);
 
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index fe170186dd42..f75f6751cf3f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -368,22 +368,23 @@ eb_vma_misplaced(const struct drm_i915_gem_exec_object2 
*entry,
 const struct i915_vma *vma,
 unsigned int flags)
 {
-   if (vma->node.size < entry->pad_to_size)
+   const u64 start = i915_vma_offset(vma);
+   const u64 size = i915_vma_size(vma);
+
+   if (size < entry->pad_to_size)
return true;
 
-   if (entry->alignment && !IS_ALIGNED(vma->node.start, entry->alignment))
+   if (entry->alignment && !IS_ALIGNED(start, entry->alignment))
return true;
 
-   if (flags & EXEC_OBJECT_PINNED &&
-   vma->node.start != entry->offset)
+   if (flags & EXEC_OBJECT_PINNED && start != entry->offset)
return true;
 
-   if (flags & __EXEC_OBJECT_NEEDS_BIAS &&
-   vma->node.start < BATCH_OFFSET_BIAS)
+   if (flags & __EXEC_OBJECT_NEEDS_BIAS && start < BATCH_OFFSET_BIAS)
 

[Intel-gfx] [PATCH v5] drm/i915/gt: Ratelimit heartbeat completion probing

2021-02-11 Thread Chris Wilson
The heartbeat runs through a few phases that we expect to complete
within a certain number of heartbeat intervals. First we must submit the
heartbeat to the queue, and if the queue is occupied it may take a
couple of intervals before the heartbeat preempts the workload and is
submitted to HW. Once running on HW, completion is not instantaneous as
it may have to first reset the current workload before it itself runs
through the empty request and signals completion. As such, we know that
the heartbeat must take at least the preempt reset timeout to complete,
and we do not want to issue a global reset ourselves before that engine
reset has had a chance to run (simply so that we only try to do one reset
at a time and do not confuse ourselves by resetting twice and hitting an
innocent).

So by taking into consideration that once running the request must take
a finite amount of time, we can delay the final completion check to
accommodate that and avoid checking too early (before we've had a chance
to handle any engine resets required).
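
To put numbers on it (illustrative only, using the Kconfig defaults of a
2500ms heartbeat interval and a 640ms preempt timeout, plus the 50ms
completion allowance added here): before the heartbeat is running on HW,
the next check fires after the usual interval; once it is active, the
check waits at least

    max(interval, preempt_timeout + 50ms) = max(2500, 640 + 50) = 2500ms

so the completion allowance only stretches the delay when the preempt
timeout is long or the interval is short.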

v3: Now with more tracking for selftests and detection of
false/unexpected hang reports.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2853
Suggested-by: CQ Tang 
Signed-off-by: Chris Wilson 
---
 .../gpu/drm/i915/gt/intel_engine_heartbeat.c  | 92 +++---
 drivers/gpu/drm/i915/gt/intel_engine_types.h  |  7 ++
 .../drm/i915/gt/selftest_engine_heartbeat.c   | 93 ---
 drivers/gpu/drm/i915/gt/selftest_execlists.c  |  5 +-
 4 files changed, 148 insertions(+), 49 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c 
b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
index 0b062fad1837..e14f9eab5779 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
@@ -20,6 +20,18 @@
  * issue a reset -- in the hope that restores progress.
  */
 
+#define HEARTBEAT_COMPLETION 50u /* milliseconds */
+
+static long completion_timeout(const struct intel_engine_cs *engine)
+{
+   long timeout = HEARTBEAT_COMPLETION;
+
+   if (intel_engine_has_preempt_reset(engine))
+   timeout += READ_ONCE(engine->props.preempt_timeout_ms);
+
+   return msecs_to_jiffies(timeout);
+}
+
 static bool next_heartbeat(struct intel_engine_cs *engine)
 {
long delay;
@@ -29,6 +41,26 @@ static bool next_heartbeat(struct intel_engine_cs *engine)
return false;
 
delay = msecs_to_jiffies_timeout(delay);
+
+   /*
+* Once we submit a heartbeat to the HW, we know that it will take
+* at least a certain amount of time to complete. On a hanging system
+* it will first have to wait for the preempt reset timeout, and
+* then it will take some time for the reset to resume with the
+* heartbeat and for it to complete. So once we have submitted the
+* heartbeat to HW, we can wait a while longer before declaring the
+* engine stuck and forcing a reset ourselves. If we do a reset
+* and the engine is also doing a reset, it is possible that we
+* reset the engine twice, harming an innocent.
+*
+* Before we have submitted the heartbeat, we do not want to change
+* the interval as we want to promote the heartbeat and trigger
+* preemption in a deterministic time frame.
+*/
+   if (engine->heartbeat.systole &&
+   i915_request_is_active(engine->heartbeat.systole))
+   delay = max(delay, completion_timeout(engine));
+
if (delay >= HZ)
delay = round_jiffies_up_relative(delay);
mod_delayed_work(system_highpri_wq, &engine->heartbeat.work, delay + 1);
@@ -48,12 +80,39 @@ heartbeat_create(struct intel_context *ce, gfp_t gfp)
return rq;
 }
 
+static void
+untrack_heartbeat(struct intel_engine_cs *engine)
+{
+   struct i915_request *rq;
+
+   rq = engine->heartbeat.systole;
+   if (!rq)
+   return;
+
+   ENGINE_TRACE(engine, "heartbeat " RQ_FMT " completed\n", RQ_ARG(rq));
+   I915_SELFTEST_ONLY(engine->heartbeat.completed++);
+
+   WRITE_ONCE(engine->heartbeat.systole, NULL);
+   i915_request_put(rq);
+}
+
+static void
+track_heartbeat(struct intel_engine_cs *engine, struct i915_request *rq)
+{
+   ENGINE_TRACE(engine, "heartbeat " RQ_FMT " started\n", RQ_ARG(rq));
+   I915_SELFTEST_ONLY(engine->heartbeat.submitted++);
+
+   WRITE_ONCE(engine->heartbeat.systole, i915_request_get(rq));
+   if (!next_heartbeat(engine))
+   untrack_heartbeat(engine);
+}
+
 static void idle_pulse(struct intel_engine_cs *engine, struct i915_request *rq)
 {
engine->wakeref_serial = READ_ONCE(engine->serial) + 1;
i915_request_add_active_barriers(rq);
if (!engine->heartbeat.systole && intel_engine_has_heartbeat(engine))
-   engine->heartb

[Intel-gfx] [PATCH v4] drm/i915/gt: Ratelimit heartbeat completion probing

2021-02-11 Thread Chris Wilson
The heartbeat runs through a few phases that we expect to complete
within a certain number of heartbeat intervals. First we must submit the
heartbeat to the queue, and if the queue is occupied it may take a
couple of intervals before the heartbeat preempts the workload and is
submitted to HW. Once running on HW, completion is not instantaneous as
it may have to first reset the current workload before it itself runs
through the empty request and signals completion. As such, we know that
the heartbeat must take at least the preempt reset timeout to complete,
and we do not want to issue a global reset ourselves before that engine
reset has had a chance to run (simply so that we only try to do one reset
at a time and do not confuse ourselves by resetting twice and hitting an
innocent).

So by taking into consideration that once running the request must take
a finite amount of time, we can delay the final completion check to
accommodate that and avoid checking too early (before we've had a chance
to handle any engine resets required).

v2: Attach a callback to flush the work immediately upon the heartbeat
completion and insert the delay before the next.

v3: Now with more tracking for selftests and detection of
false/unexpected hang reports.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2853
Suggested-by: CQ Tang 
Signed-off-by: Chris Wilson 
---
 .../gpu/drm/i915/gt/intel_engine_heartbeat.c  | 107 +++---
 drivers/gpu/drm/i915/gt/intel_engine_types.h  |   8 ++
 .../drm/i915/gt/selftest_engine_heartbeat.c   |  93 +--
 drivers/gpu/drm/i915/gt/selftest_execlists.c  |   5 +-
 4 files changed, 163 insertions(+), 50 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c 
b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
index 0b062fad1837..351f58def216 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
@@ -20,6 +20,18 @@
  * issue a reset -- in the hope that restores progress.
  */
 
+#define HEARTBEAT_COMPLETION 50u /* milliseconds */
+
+static long completion_timeout(const struct intel_engine_cs *engine)
+{
+   long timeout = HEARTBEAT_COMPLETION;
+
+   if (intel_engine_has_preempt_reset(engine))
+   timeout += READ_ONCE(engine->props.preempt_timeout_ms);
+
+   return msecs_to_jiffies(timeout);
+}
+
 static bool next_heartbeat(struct intel_engine_cs *engine)
 {
long delay;
@@ -29,6 +41,26 @@ static bool next_heartbeat(struct intel_engine_cs *engine)
return false;
 
delay = msecs_to_jiffies_timeout(delay);
+
+   /*
+* Once we submit a heartbeat to the HW, we know that it will take
+* at least a certain amount of time to complete. On a hanging system
+* it will first have to wait for the preempt reset timeout, and
+* then it will take some time for the reset to resume with the
+* heartbeat and for it to complete. So once we have submitted the
+* heartbeat to HW, we can wait a while longer before declaring the
+* engine stuck and forcing a reset ourselves. If we do a reset
+* and the engine is also doing a reset, it is possible that we
+* reset the engine twice, harming an innocent.
+*
+* Before we have submitted the heartbeat, we do not want to change
+* the interval as we want to promote the heartbeat and trigger
+* preemption in a deterministic time frame.
+*/
+   if (engine->heartbeat.systole &&
+   i915_request_is_active(engine->heartbeat.systole))
+   delay = max(delay, completion_timeout(engine));
+
if (delay >= HZ)
delay = round_jiffies_up_relative(delay);
mod_delayed_work(system_highpri_wq, &engine->heartbeat.work, delay + 1);
@@ -48,12 +80,51 @@ heartbeat_create(struct intel_context *ce, gfp_t gfp)
return rq;
 }
 
+static void defibrillator(struct dma_fence *f, struct dma_fence_cb *cb)
+{
+   struct intel_engine_cs *engine =
+   container_of(cb, typeof(*engine), heartbeat.cb);
+
+   if (READ_ONCE(engine->heartbeat.systole))
+   mod_delayed_work(system_highpri_wq, &engine->heartbeat.work, 0);
+}
+
+static void
+untrack_heartbeat(struct intel_engine_cs *engine)
+{
+   struct i915_request *rq;
+
+   rq = fetch_and_zero(&engine->heartbeat.systole);
+   if (!rq)
+   return;
+
+   ENGINE_TRACE(engine, "heartbeat " RQ_FMT " completed\n", RQ_ARG(rq));
+   I915_SELFTEST_ONLY(engine->heartbeat.completed++);
+
+   dma_fence_remove_callback(&rq->fence, &engine->heartbeat.cb);
+   i915_request_put(rq);
+}
+
+static void
+track_heartbeat(struct intel_engine_cs *engine, struct i915_request *rq)
+{
+   ENGINE_TRACE(engine, "heartbeat " RQ_FMT " started\n", RQ_ARG(rq));
+   I915_SELFTEST_ONLY(engine->heartbeat.submitted++);
+

Re: [Intel-gfx] [PATCH] drm/i915: Refine VT-d scanout workaround

2021-02-11 Thread Chris Wilson
Quoting Matthew Auld (2021-02-11 17:00:20)
> Throwing some color_adjust at it might be another option to consider. 
> Maybe something like:
> 
> +static void i915_ggtt_color_adjust_vdt(const struct drm_mm_node *node,
> +  unsigned long color,
> +  u64 *start,
> +  u64 *end)
> +{
> +   if (color == COLOR_VDT) {
> +   *start += VDT_PADDING;
> +   *end -= VDT_PADDING;
> +   return;
> +   }
> +
> +   if (node->allocated && node->color == COLOR_VDT)
> +   *start += VDT_PADDING;
> +
> +   node = list_next_entry(node, node_list);
> +   if (node->allocated && node->color == COLOR_VDT)
> +   *end -= VDT_PADDING;
> +}
> 
> But not sure if I would call that simpler. Plus we need to add more 
> special casing in the eviction code. And I guess the cache coloring 
> might also be active here, which might be nasty. Also we end up with a 
> bunch of holes in the address space that are unusable, yet the mm search 
> will keep hitting them anyway. Ok, I guess that's what you meant with 
> not introducing "extra nodes". Hmmm.

I considered trying to use coloring, but the problem I found was in
knowing whether or not to fill outside of the vma with scratch pages. We
would have to lookup each PTE to check it is not in use, then fill with
scratch. And if we removed a neighbour, it would have to check to see if
it should replace the guard pages. (And it's more complicated by the
wrap-around of the VT-d, an object at beginning of the GGTT would
overfetch into the pages at the other end of the GGTT.)

I felt that making an explicit reservation and accounting the VT-d
overfetch to the scanout vma would save a lot of hassle. The PTE are
accounted for (will not be reused, and safe to clear after resume),
dedicated as scratch for the overfetch and the overfetch cannot wrap
around as we force the vma to be away from the edges.

Packing the information into the single scanout i915_vma so that we do a
single padded i915_vma_insert seemed to be much easier to manage than
having to do additional i915_vma_inserts on either side (and so we have
to somehow manage searching for enough space for all 3 in the first call
etc). Plus i915_vma is already massive :|
-Chris


Re: [Intel-gfx] [PATCH] drm/i915: Refine VT-d scanout workaround

2021-02-11 Thread Chris Wilson
Quoting Ville Syrjälä (2021-02-11 16:05:59)
> On Wed, Feb 10, 2021 at 11:39:46PM +0000, Chris Wilson wrote:
> > @@ -637,6 +642,13 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 
> > alignment, u64 flags)
> > alignment, vma->fence_alignment);
> >   }
> >  
> > + /* VT-d requires padding before/after the vma */
> > + if (flags & PIN_VTD) {
> > + alignment = max_t(typeof(alignment), alignment, VTD_GUARD);
> > + vma->guard = alignment;
> > + size += 2 * vma->guard;
> > + }
> > +
> >   GEM_BUG_ON(!IS_ALIGNED(size, I915_GTT_PAGE_SIZE));
> >   GEM_BUG_ON(!IS_ALIGNED(alignment, I915_GTT_MIN_ALIGNMENT));
> >   GEM_BUG_ON(!is_power_of_2(alignment));
> > @@ -725,6 +737,11 @@ i915_vma_insert(struct i915_vma *vma, u64 size,
> u64 alignment, u64 flags)
> >  
> >   list_add_tail(&vma->vm_link, &vma->vm->bound_list);
> >  
> > + if (flags & PIN_VTD) {
> > + vma->node.start += vma->guard;
> 
> Was a bit worried for a second that this might give the display
> a potentially misaligned vma start. But looks like you did consider
> all that: VTD_GUARD==POT, alignment + guard both get bumped
> to the max(). So AFAICS should guarantee everyone is happy.
> 
> I guess we're now wasting a lot more ggtt address space though?
> Not sure if anyone has ever been at risk of running out though.
> And DPT should help with this on new platforms.

Definitely this is a considerable bloat to most scanout buffers, which
for the sake of argument let's say are 8MiB. Still enough room for a flip
chain within the mappable portion, and when we get to scanouts that are
large enough to consume the majority of the GGTT, the fixed 2MiB of
padding is lost in the noise.

So handwaving it shouldn't lead to noticeably more thrashing of the
GGTT for existing platforms. There's too much recycling and too little
reuse of scanouts in current display systems for my liking, so the extra
25% overhead in GGTT updates is more likely to be a concern. (Though it
does balance out in that we now skip the clear after use.)
-Chris


Re: [Intel-gfx] [PATCH] drm/i915: Refine VT-d scanout workaround

2021-02-11 Thread Chris Wilson
Quoting Matthew Auld (2021-02-11 14:25:41)
> On 10/02/2021 23:39, Chris Wilson wrote:
> > VT-d may cause overfetch of the scanout PTE, both before and after the
> > vma (depending on the scanout orientation). bspec recommends that we
> > provide a tile-row in either directions, and suggests using 160 PTE,
> > warning that the accesses will wrap around the ends of the GGTT.
> > Currently, we fill the entire GGTT with scratch pages when using VT-d to
> > always ensure there are valid entries around every vma, including
> > scanout. However, writing every PTE is slow as on recent devices we
> > perform 8MiB of uncached writes, incurring an extra 100ms during resume.
> > 
> > If instead we focus on only putting guard pages around scanout, we can
> > avoid touching the whole GGTT. To avoid having to introduce extra nodes
> > around each scanout vma, we adjust the scanout drm_mm_node to be smaller
> > than the allocated space, and fixup the extra PTE during dma binding.
> > 
> > Signed-off-by: Chris Wilson 
> > Cc: Ville Syrjälä 
> > Cc: Matthew Auld 
> > ---
> >   drivers/gpu/drm/i915/gem/i915_gem_domain.c |  3 ++
> >   drivers/gpu/drm/i915/gt/intel_ggtt.c   | 37 --
> >   drivers/gpu/drm/i915/i915_gem_gtt.h|  1 +
> >   drivers/gpu/drm/i915/i915_vma.c| 23 ++
> >   drivers/gpu/drm/i915/i915_vma_types.h  |  1 +
> >   5 files changed, 41 insertions(+), 24 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c 
> > b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> > index 0478b069c202..9f2ccc255ca1 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> > @@ -345,6 +345,9 @@ i915_gem_object_pin_to_display_plane(struct 
> > drm_i915_gem_object *obj,
> >   if (ret)
> >   goto err;
> >   
> > + if (intel_scanout_needs_vtd_wa(i915))
> > + flags |= PIN_VTD;
> > +
> >   /*
> >* As the user may map the buffer once pinned in the display plane
> >* (e.g. libkms for the bootup splash), we have to ensure that we
> > diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c 
> > b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> > index b0b8ded834f0..416f77f48561 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> > @@ -238,6 +238,11 @@ static void gen8_ggtt_insert_entries(struct 
> > i915_address_space *vm,
> >   
> >   gte = (gen8_pte_t __iomem *)ggtt->gsm;
> >   gte += vma->node.start / I915_GTT_PAGE_SIZE;
> > +
> > + end = gte - vma->guard / I915_GTT_PAGE_SIZE;
> > + while (end < gte)
> > + gen8_set_pte(end++, vm->scratch[0]->encode);
> > +
> >   end = gte + vma->node.size / I915_GTT_PAGE_SIZE;
> >   
> >   for_each_sgt_daddr(addr, iter, vma->pages)
> > @@ -245,6 +250,7 @@ static void gen8_ggtt_insert_entries(struct 
> > i915_address_space *vm,
> >   GEM_BUG_ON(gte > end);
> >   
> >   /* Fill the allocated but "unused" space beyond the end of the buffer 
> > */
> > + end += vma->guard / I915_GTT_PAGE_SIZE;
> >   while (gte < end)
> >   gen8_set_pte(gte++, vm->scratch[0]->encode);
> >   
> > @@ -289,6 +295,11 @@ static void gen6_ggtt_insert_entries(struct 
> > i915_address_space *vm,
> >   
> >   gte = (gen6_pte_t __iomem *)ggtt->gsm;
> >   gte += vma->node.start / I915_GTT_PAGE_SIZE;
> > +
> > + end = gte - vma->guard / I915_GTT_PAGE_SIZE;
> > + while (end < gte)
> > + gen8_set_pte(end++, vm->scratch[0]->encode);
> > +
> >   end = gte + vma->node.size / I915_GTT_PAGE_SIZE;
> >   
> >   for_each_sgt_daddr(addr, iter, vma->pages)
> > @@ -296,6 +307,7 @@ static void gen6_ggtt_insert_entries(struct 
> > i915_address_space *vm,
> >   GEM_BUG_ON(gte > end);
> >   
> >   /* Fill the allocated but "unused" space beyond the end of the buffer 
> > */
> > + end += vma->guard / I915_GTT_PAGE_SIZE;
> >   while (gte < end)
> >   iowrite32(vm->scratch[0]->encode, gte++);
> >   
> > @@ -311,27 +323,6 @@ static void nop_clear_range(struct i915_address_space 
> > *vm,
> >   {
> >   }
> >   
> > -static void gen8_ggtt_clear_range(struct i915_address_space *vm,
> > -   u64 st

Re: [Intel-gfx] [PATCH] drm/i915: Probe system memory regions before enabling HW

2021-02-11 Thread Chris Wilson
Quoting Matthew Auld (2021-02-11 12:27:56)
> On 11/02/2021 11:20, Chris Wilson wrote:
> > If we want to track system/stolen as memory regions, we need to setup
> > our bookkeeping *before* we want to start allocating and reserving
> > objects in those regions. In particular, in setup up the GGTT we will
> > try to preallocate stolen regions configured by the BIOS, and so should
> > prepare the system-stolen memory region beforehand.
> > 
> > Signed-off-by: Chris Wilson 
> > Cc: Matthew Auld 
> > ---
> >   drivers/gpu/drm/i915/i915_drv.c | 8 
> >   1 file changed, 4 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_drv.c 
> > b/drivers/gpu/drm/i915/i915_drv.c
> > index b708517d3972..139cea4443fd 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.c
> > +++ b/drivers/gpu/drm/i915/i915_drv.c
> > @@ -557,6 +557,10 @@ static int i915_driver_hw_probe(struct 
> > drm_i915_private *dev_priv)
> >   
> >   i915_perf_init(dev_priv);
> >   
> > + ret = intel_memory_regions_hw_probe(dev_priv);
> > + if (ret)
> > + goto err_ggtt;
> 
> Hmmm, adjust_stolen is also peeking at ggtt_total_entries(ggtt) on some 
> old platforms.

A nice catch-22. More splitting required. Maybe a fixup pass from
init_ggtt would work best.
-Chris


[Intel-gfx] [PATCH v3] drm/i915/gt: Ratelimit heartbeat completion probing

2021-02-11 Thread Chris Wilson
The heartbeat runs through a few phases that we expect to complete
within a certain number of heartbeat intervals. First we must submit the
heartbeat to the queue, and if the queue is occupied it may take a
couple of intervals before the heartbeat preempts the workload and is
submitted to HW. Once running on HW, completion is not instantaneous as
it may have to first reset the current workload before it itself runs
through the empty request and signals completion. As such, we know that
the heartbeat must take at least the preempt reset timeout to complete,
and we do not want to issue a global reset ourselves before that engine
reset has had a chance to run (simply so that we only try to do one reset
at a time and do not confuse ourselves by resetting twice and hitting an
innocent).

So by taking into consideration that once running the request must take
a finite amount of time, we can delay the final completion check to
accommodate that and avoid checking too early (before we've had a chance
to handle any engine resets required).

v2: Attach a callback to flush the work immediately upon the heartbeat
completion and insert the delay before the next.

v3: Now with more tracking for selftests and detection of
false/unexpected hang reports.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2853
Suggested-by: CQ Tang 
Signed-off-by: Chris Wilson 
---
 .../gpu/drm/i915/gt/intel_engine_heartbeat.c  | 104 +++---
 drivers/gpu/drm/i915/gt/intel_engine_types.h  |   8 ++
 .../drm/i915/gt/selftest_engine_heartbeat.c   |  92 ++--
 drivers/gpu/drm/i915/gt/selftest_execlists.c  |   5 +-
 4 files changed, 159 insertions(+), 50 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c 
b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
index 0b062fad1837..71aa02995417 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
@@ -20,6 +20,18 @@
  * issue a reset -- in the hope that restores progress.
  */
 
+#define HEARTBEAT_COMPLETION 50u /* milliseconds */
+
+static long completion_timeout(const struct intel_engine_cs *engine)
+{
+   long timeout = HEARTBEAT_COMPLETION;
+
+   if (intel_engine_has_preempt_reset(engine))
+   timeout += READ_ONCE(engine->props.preempt_timeout_ms);
+
+   return msecs_to_jiffies(timeout);
+}
+
 static bool next_heartbeat(struct intel_engine_cs *engine)
 {
long delay;
@@ -29,6 +41,26 @@ static bool next_heartbeat(struct intel_engine_cs *engine)
return false;
 
delay = msecs_to_jiffies_timeout(delay);
+
+   /*
+* Once we submit a heartbeat to the HW, we know that it will take
+* at least a certain amount of time to complete. On a hanging system
+* it will first have to wait for the preempt reset timeout, and
+* then it will take some time for the reset to resume with the
+* heartbeat and for it to complete. So once we have submitted the
+* heartbeat to HW, we can wait a while longer before declaring the
+* engine stuck and forcing a reset ourselves. If we do a reset
+* and the engine is also doing a reset, it is possible that we
+* reset the engine twice, harming an innocent.
+*
+* Before we have submitted the heartbeat, we do not want to change
+* the interval as we want to promote the heartbeat and trigger
+* preemption in a deterministic time frame.
+*/
+   if (engine->heartbeat.systole &&
+   i915_request_is_active(engine->heartbeat.systole))
+   delay = max(delay, completion_timeout(engine));
+
if (delay >= HZ)
delay = round_jiffies_up_relative(delay);
mod_delayed_work(system_highpri_wq, &engine->heartbeat.work, delay + 1);
@@ -48,12 +80,51 @@ heartbeat_create(struct intel_context *ce, gfp_t gfp)
return rq;
 }
 
+static void defibrillator(struct dma_fence *f, struct dma_fence_cb *cb)
+{
+   struct intel_engine_cs *engine =
+   container_of(cb, typeof(*engine), heartbeat.cb);
+
+   if (READ_ONCE(engine->heartbeat.systole))
+   mod_delayed_work(system_highpri_wq, &engine->heartbeat.work, 0);
+}
+
+static void
+untrack_heartbeat(struct intel_engine_cs *engine)
+{
+   struct i915_request *rq;
+
+   rq = fetch_and_zero(&engine->heartbeat.systole);
+   if (!rq)
+   return;
+
+   ENGINE_TRACE(engine, "heartbeat " RQ_FMT " completed\n", RQ_ARG(rq));
+   I915_SELFTEST_ONLY(engine->heartbeat.completed++);
+
+   dma_fence_remove_callback(&rq->fence, &engine->heartbeat.cb);
+   i915_request_put(rq);
+}
+
+static void
+track_heartbeat(struct intel_engine_cs *engine, struct i915_request *rq)
+{
+   ENGINE_TRACE(engine, "heartbeat " RQ_FMT " started\n", RQ_ARG(rq));
+   I915_SELFTEST_ONLY(engine->heartbeat.submitted++);
+

[Intel-gfx] [PATCH] lib: Add a YAML emitter

2021-02-11 Thread Chris Wilson
Provide a library to generate correct YAML for use in structured debugfs
or similar information dumps. This will be useful to pull our large
information dumps into a forwards compatible, parse-able file-format by
forcing some structure upon ourselves!

Originally from https://github.com/yaml/libyaml/blob/master/src/emitter.c
under a permissive MIT licence.

Signed-off-by: Chris Wilson 
Cc: Jani Nikula 
Cc: Joonas Lahtinen 

---
Converting to kerneldoc is about the last major task. It uses an
opencoded stack struct which might be nice to convert to a library,
maybe just use XArray?

It has been dogfooded to convert i915_engine_info (and friends)
https://patchwork.freedesktop.org/patch/353209/?series=73403=1
---
 include/linux/yaml-events.h |  259 
 include/linux/yaml.h|  190 +++
 lib/Kconfig |   11 +
 lib/Makefile|2 +
 lib/yaml/Makefile   |8 +
 lib/yaml/test-yaml.c|  123 ++
 lib/yaml/yaml-emitter.c | 2539 +++
 lib/yaml/yaml-emitter.h |  140 ++
 lib/yaml/yaml-events.c  |  424 ++
 lib/yaml/yaml-simple.c  |  227 
 10 files changed, 3923 insertions(+)
 create mode 100644 include/linux/yaml-events.h
 create mode 100644 include/linux/yaml.h
 create mode 100644 lib/yaml/Makefile
 create mode 100644 lib/yaml/test-yaml.c
 create mode 100644 lib/yaml/yaml-emitter.c
 create mode 100644 lib/yaml/yaml-emitter.h
 create mode 100644 lib/yaml/yaml-events.c
 create mode 100644 lib/yaml/yaml-simple.c

diff --git a/include/linux/yaml-events.h b/include/linux/yaml-events.h
new file mode 100644
index ..33d1bb227e65
--- /dev/null
+++ b/include/linux/yaml-events.h
@@ -0,0 +1,259 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright (c) 2006-2016 Kirill Simonov
+ * Copyright (c) 2017-2019 Ingy döt Net
+ * Copyright (c) 2020 Intel Corporation
+ */
+
+#ifndef __LINUX_YAML_EVENTS_H__
+#define __LINUX_YAML_EVENTS_H__
+
+#include 
+#include 
+#include 
+
+/** The version directive data. */
+struct yaml_version_directive {
+   int major;
+   int minor;
+};
+
+/** The tag directive data. */
+struct yaml_tag_directive {
+   char *handle;
+   char *prefix;
+};
+
+/** The pointer position. */
+struct yaml_mark {
+   size_t index;
+   size_t line;
+   size_t column;
+};
+
+enum yaml_event_type {
+   YAML_NO_EVENT = 0,
+
+   YAML_STREAM_START_EVENT,
+   YAML_STREAM_END_EVENT,
+
+   YAML_DOCUMENT_START_EVENT,
+   YAML_DOCUMENT_END_EVENT,
+
+   YAML_ALIAS_EVENT,
+   YAML_SCALAR_EVENT,
+
+   YAML_SEQUENCE_START_EVENT,
+   YAML_SEQUENCE_END_EVENT,
+
+   YAML_MAPPING_START_EVENT,
+   YAML_MAPPING_END_EVENT
+};
+
+struct yaml_event {
+   enum yaml_event_type type;
+   struct list_head link;
+
+   union {
+   /** The stream parameters (for YAML_STREAM_START_EVENT). */
+   struct {
+   } stream_start;
+
+   /** The document parameters (for YAML_DOCUMENT_START_EVENT). */
+   struct {
+   struct yaml_version_directive *version;
+
+   struct {
+   struct yaml_tag_directive *start;
+   struct yaml_tag_directive *end;
+   } tags;
+
+   int implicit;
+   } document_start;
+
+   /** The document end parameters (for YAML_DOCUMENT_END_EVENT). 
*/
+   struct {
+   int implicit;
+   } document_end;
+
+   /** The alias parameters (for YAML_ALIAS_EVENT). */
+   struct {
+   char *anchor;
+   } alias;
+
+   /** The scalar parameters (for YAML_SCALAR_EVENT). */
+   struct {
+   char *anchor;
+   char *tag;
+   char *value;
+   size_t length;
+   int plain_implicit;
+   int quoted_implicit;
+   enum yaml_scalar_style style;
+   } scalar;
+
+   /** The sequence parameters (for YAML_SEQUENCE_START_EVENT). */
+   struct {
+   char *anchor;
+   char *tag;
+   int implicit;
+   enum yaml_sequence_style style;
+   } sequence_start;
+
+   /** The mapping parameters (for YAML_MAPPING_START_EVENT). */
+   struct {
+   char *anchor;
+   char *tag;
+   int implicit;
+   enum yaml_mapping_style style;
+   } mapping_start;
+   };
+
+   struct yaml_mark start_mark;
+   struct yaml_mark end_mark;
+};
+
+struct yaml_event *yaml_stream_start_event_create(gfp_t gfp);
+struct yaml_event *yaml_stream_end_event_create(gfp_t gfp

[Intel-gfx] [PATCH] drm/i915: Defer object allocation from GGTT probe to GGTT init

2021-02-11 Thread Chris Wilson
Rather than try and allocate objects as we perform our early HW probes,
defer the allocation for GGTT objects (such as the scratch page) to later
in the initialisation.

Signed-off-by: Chris Wilson 
Cc: Matthew Auld 
---
 drivers/gpu/drm/i915/gt/intel_ggtt.c | 38 +++-
 1 file changed, 15 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c 
b/drivers/gpu/drm/i915/gt/intel_ggtt.c
index b0b8ded834f0..23c7eb462b2f 100644
--- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
@@ -78,19 +78,29 @@ static int ggtt_init_hw(struct i915_ggtt *ggtt)
  */
 int i915_ggtt_init_hw(struct drm_i915_private *i915)
 {
+   struct i915_ggtt *ggtt = &i915->ggtt;
+   u32 pte_flags;
int ret;
 
+   ret = setup_scratch_page(&ggtt->vm);
+   if (ret)
+   return ret;
+
+   pte_flags = 0;
+   if (i915_gem_object_is_lmem(ggtt->vm.scratch[0]))
+   pte_flags |= PTE_LM;
+
+   ggtt->vm.scratch[0]->encode =
+   ggtt->vm.pte_encode(px_dma(ggtt->vm.scratch[0]),
+   I915_CACHE_NONE, 0);
+
/*
 * Note that we use page colouring to enforce a guard page at the
 * end of the address space. This is required as the CS may prefetch
 * beyond the end of the batch buffer, across the page boundary,
 * and beyond the end of the GTT if we do not provide a guard.
 */
-   ret = ggtt_init_hw(&i915->ggtt);
-   if (ret)
-   return ret;
-
-   return 0;
+   return ggtt_init_hw(ggtt);
 }
 
 /*
@@ -803,8 +813,6 @@ static int ggtt_probe_common(struct i915_ggtt *ggtt, u64 
size)
struct drm_i915_private *i915 = ggtt->vm.i915;
struct pci_dev *pdev = to_pci_dev(i915->drm.dev);
phys_addr_t phys_addr;
-   u32 pte_flags;
-   int ret;
 
/* For Modern GENs the PTEs and register space are split in the BAR */
phys_addr = pci_resource_start(pdev, 0) + pci_resource_len(pdev, 0) / 2;
@@ -825,22 +833,6 @@ static int ggtt_probe_common(struct i915_ggtt *ggtt, u64 
size)
return -ENOMEM;
}
 
-   ret = setup_scratch_page(&ggtt->vm);
-   if (ret) {
-   drm_err(>drm, "Scratch setup failed\n");
-   /* iounmap will also get called at remove, but meh */
-   iounmap(ggtt->gsm);
-   return ret;
-   }
-
-   pte_flags = 0;
-   if (i915_gem_object_is_lmem(ggtt->vm.scratch[0]))
-   pte_flags |= PTE_LM;
-
-   ggtt->vm.scratch[0]->encode =
-   ggtt->vm.pte_encode(px_dma(ggtt->vm.scratch[0]),
-   I915_CACHE_NONE, pte_flags);
-
return 0;
 }
 
-- 
2.20.1



[Intel-gfx] [PATCH] drm/i915: Probe system memory regions before enabling HW

2021-02-11 Thread Chris Wilson
If we want to track system/stolen as memory regions, we need to setup
our bookkeeping *before* we want to start allocating and reserving
objects in those regions. In particular, in setup up the GGTT we will
try to preallocate stolen regions configured by the BIOS, and so should
prepare the system-stolen memory region beforehand.

Signed-off-by: Chris Wilson 
Cc: Matthew Auld 
---
 drivers/gpu/drm/i915/i915_drv.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index b708517d3972..139cea4443fd 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -557,6 +557,10 @@ static int i915_driver_hw_probe(struct drm_i915_private 
*dev_priv)
 
i915_perf_init(dev_priv);
 
+   ret = intel_memory_regions_hw_probe(dev_priv);
+   if (ret)
+   goto err_ggtt;
+
ret = i915_ggtt_probe_hw(dev_priv);
if (ret)
goto err_perf;
@@ -569,10 +573,6 @@ static int i915_driver_hw_probe(struct drm_i915_private 
*dev_priv)
if (ret)
goto err_ggtt;
 
-   ret = intel_memory_regions_hw_probe(dev_priv);
-   if (ret)
-   goto err_ggtt;
-
intel_gt_init_hw_early(&dev_priv->gt, &dev_priv->ggtt);
 
ret = intel_gt_probe_lmem(&dev_priv->gt);
-- 
2.20.1



Re: [Intel-gfx] [5.10.y regression] i915 clear-residuals mitigation is causing gfx issues

2021-02-11 Thread Chris Wilson
Quoting Hans de Goede (2021-02-11 10:36:13)
> Hi,
> 
> On 2/10/21 1:48 PM, Chris Wilson wrote:
> > Quoting Hans de Goede (2021-02-10 10:37:19)
> >> Hi,
> >>
> >> On 2/10/21 12:07 AM, Chris Wilson wrote:
> >>> Quoting Hans de Goede (2021-02-09 11:46:46)
> >>>> Hi,
> >>>>
> >>>> On 2/9/21 12:27 AM, Chris Wilson wrote:
> >>>>> Quoting Hans de Goede (2021-02-08 20:38:58)
> >>>>>> Hi All,
> >>>>>>
> >>>>>> We (Fedora) have been receiving reports from multiple users about gfx 
> >>>>>> issues / glitches
> >>>>>> stating with 5.10.9. All reporters are users of Ivy Bridge / Haswell 
> >>>>>> iGPUs and all
> >>>>>> reporters report that adding i915.mitigations=off to the cmdline fixes 
> >>>>>> things, see:
> >>>>>
> >>>>> I tried to reproduce this on the w/e on hsw-gt1, to no avail; and piglit
> >>>>> did not report any differences with and without mitigations. I have yet
> >>>>> to test other platforms. So I don't yet have an alternative.
> >>>>
> >>>> Note the original / first reporter of:
> >>>>
> >>>> https://bugzilla.redhat.com/show_bug.cgi?id=1925346
> >>>>
> >>>> Is using hsw-gt2, so it seems that the problem is not just the enabling 
> >>>> of
> >>>> the mitigations on ivy-bridge / bay-trail but that there actually is
> >>>> a regression on devices where the WA worked fine before...
> >>>
> >>> There have been 3 crashes uploaded related to v5.10.9, and in all 3
> >>> cases the ACTHD has been in the first page. This strongly suggests that
> >>> the w/a is scribbling over address 0. And there's then a very good
> >>> chance that
> >>>
> >>> commit 29d35b73ead4e41aa0d1a954c9bfbdce659ec5d6
> >>> Author: Chris Wilson 
> >>> Date:   Mon Jan 25 12:50:33 2021 +
> >>>
> >>> drm/i915/gt: Always try to reserve GGTT address 0x0
> >>> 
> >>> commit 489140b5ba2e7cc4b853c29e0591895ddb462a82 upstream.
> >>>
> >>> in v5.10.14 is sufficient to hide the issue.
> >>
> >> That one actually is already in v5.10.13 and the various reportes of these
> >> issues have already tested 5.10.13. They did mention that it took longer
> >> to reproduce with 5.10.13 then with 5.10.10, but that could also be due to:
> >>
> >> "drm/i915/gt: Clear CACHE_MODE prior to clearing residuals"
> >> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.10.y=520d05a77b2866eb4cb9e548e1d8c8abcfe60ec5
> > 
> > Started looking for scratch page overwrites, and found this little gem:
> > https://patchwork.freedesktop.org/patch/420436/?series=86947=1
> > 
> > Looks promising wrt the cause of overwriting random addresses -- and
> > I hope that is the explanation for the glitches/hangs. I have a hsw gt2
> > with gnome shell, piglit is happy, but I suspect it is all due to
> > placement and so will only occur at random.
> 
> If you can give me a list of commits to cherry-pick then I can prepare
> a Fedora 5.10.y kernel which those added for the group of Fedora users
> who are hitting this to test.

e627d5923cae ("drm/i915/gt: One more flush for Baytrail clear residuals")
d30bbd62b1bf ("drm/i915/gt: Flush before changing register state")
1914911f4aa0 ("drm/i915/gt: Correct surface base address for renderclear")

are missing from v5.10.15
-Chris


Re: [Intel-gfx] [PATCH 1/6] drm/i915/gt: Sanitize GPU during prepare-to-suspend

2021-02-11 Thread Chris Wilson
Quoting Rodrigo Vivi (2021-02-11 04:25:17)
> On Wed, Feb 10, 2021 at 10:19:50PM +0000, Chris Wilson wrote:
> > After calling intel_gt_suspend_prepare(), the driver starts to turn off
> > various subsystems, such as clearing the GGTT, before calling
> > intel_gt_suspend_late() to relinquish control over the GT. However, if
> > we still have internal GPU state active as we clear the GGTT, the GPU
> > may write back its internal state to the residual GGTT addresses that
> > are now pointing into scratch. Let's reset the GPU to clear that
> > internal state as soon we have idled the GPU in prepare-to-suspend.
> > 
> > Signed-off-by: Chris Wilson 
> > ---
> >  drivers/gpu/drm/i915/gt/intel_gt_pm.c | 5 -
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c 
> > b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> > index 0bd303d2823e..f41612faa269 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> > @@ -295,6 +295,9 @@ void intel_gt_suspend_prepare(struct intel_gt *gt)
> >   wait_for_suspend(gt);
> 
> you just wedged the gpu here...

Potentially. As a means to clear a stuck GPU and force it to idle.
 
> >   intel_uc_suspend(&gt->uc);
> > +
> > + /* Flush all the contexts and internal state before turning off GGTT 
> > */
> > + gt_sanitize(gt, false);
> 
> and now we are unsetting wedge here...
> 
> is this right?

But irrelevant, since it is undone on any of the resume pathways which
must be taken by this point.

Resume has been for many years the method to unwedge a GPU; with the
presumption being that the intervening PCI level reset would be enough
to recover the GPU. Otherwise, it would presumably quite quickly go back
into the wedged state.

The wedging on suspend is just there to cancel outstanding work. Which
is not what we want (we just want to remove work), but is what we have
for the moment. The sanitize is to make sure we don't leak our state
beyond our control of the HW.
-Chris
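
As an illustrative summary of the resulting ordering (paraphrasing the
patch quoted above; the surrounding steps are assumptions about the
existing suspend flow):

	intel_gt_suspend_prepare(gt)
		wait_for_suspend(gt);		/* idle, or wedge a stuck GPU */
		intel_uc_suspend(&gt->uc);
		gt_sanitize(gt, false);		/* reset, dropping internal GPU state */
	/* ... the driver then tears down, e.g. clears the GGTT ... */
	intel_gt_suspend_late(gt)		/* relinquish control of the GT */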


Re: [Intel-gfx] [PATCH 2/6] drm/i915: Clear internal GT state on hibernate

2021-02-11 Thread Chris Wilson
Quoting Rodrigo Vivi (2021-02-11 04:28:41)
> On Wed, Feb 10, 2021 at 10:19:51PM +0000, Chris Wilson wrote:
> > Call intel_gt_suspend_late() to disable the GT before hibernation.
> > 
> > Signed-off-by: Chris Wilson 
> > ---
> >  drivers/gpu/drm/i915/gem/i915_gem_pm.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pm.c 
> > b/drivers/gpu/drm/i915/gem/i915_gem_pm.c
> > index 000e1cd8e920..da0ef483ad6b 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_pm.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_pm.c
> > @@ -115,6 +115,8 @@ int i915_gem_freeze_late(struct drm_i915_private *i915)
> >* the objects as well, see i915_gem_freeze()
> >*/
> >  
> > + intel_gt_suspend_late(&i915->gt);
> > +
> 
> why are you calling this here instead of directly in i915_drm_suspend_late ?

For symmetry with the current code paths.
-Chris


Re: [Intel-gfx] [5.10.y regression] i915 clear-residuals mitigation is causing gfx issues

2021-02-10 Thread Chris Wilson
Quoting Hans de Goede (2021-02-10 10:37:19)
> Hi,
> 
> On 2/10/21 12:07 AM, Chris Wilson wrote:
> > Quoting Hans de Goede (2021-02-09 11:46:46)
> >> Hi,
> >>
> >> On 2/9/21 12:27 AM, Chris Wilson wrote:
> >>> Quoting Hans de Goede (2021-02-08 20:38:58)
> >>>> Hi All,
> >>>>
> >>>> We (Fedora) have been receiving reports from multiple users about gfx 
> >>>> issues / glitches
> >>>> stating with 5.10.9. All reporters are users of Ivy Bridge / Haswell 
> >>>> iGPUs and all
> >>>> reporters report that adding i915.mitigations=off to the cmdline fixes 
> >>>> things, see:
> >>>
> >>> I tried to reproduce this on the w/e on hsw-gt1, to no avail; and piglit
> >>> did not report any differences with and without mitigations. I have yet
> >>> to test other platforms. So I don't yet have an alternative.
> >>
> >> Note the original / first reporter of:
> >>
> >> https://bugzilla.redhat.com/show_bug.cgi?id=1925346
> >>
> >> Is using hsw-gt2, so it seems that the problem is not just the enabling of
> >> the mitigations on ivy-bridge / bay-trail but that there actually is
> >> a regression on devices where the WA worked fine before...
> > 
> > There have been 3 crashes uploaded related to v5.10.9, and in all 3
> > cases the ACTHD has been in the first page. This strongly suggests that
> > the w/a is scribbling over address 0. And there's then a very good
> > chance that
> > 
> > commit 29d35b73ead4e41aa0d1a954c9bfbdce659ec5d6
> > Author: Chris Wilson 
> > Date:   Mon Jan 25 12:50:33 2021 +
> > 
> > drm/i915/gt: Always try to reserve GGTT address 0x0
> > 
> > commit 489140b5ba2e7cc4b853c29e0591895ddb462a82 upstream.
> > 
> > in v5.10.14 is sufficient to hide the issue.
> 
> That one actually is already in v5.10.13 and the various reportes of these
> issues have already tested 5.10.13. They did mention that it took longer
> to reproduce with 5.10.13 then with 5.10.10, but that could also be due to:
> 
> "drm/i915/gt: Clear CACHE_MODE prior to clearing residuals"
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.10.y=520d05a77b2866eb4cb9e548e1d8c8abcfe60ec5

There's also another pair of pipecontrols required for that:

d30bbd62b1bf ("drm/i915/gt: Flush before changing register state")

which didn't get picked up for stable.

https://intel-gfx-ci.01.org/tree/linux-stable/index-alt.html?

shows that we are still missing a couple of fixes for the w/a, at least
compare to BAT on drm-tip.
-Chris


[Intel-gfx] [PATCH] drm/i915: Refine VT-d scanout workaround

2021-02-10 Thread Chris Wilson
VT-d may cause overfetch of the scanout PTE, both before and after the
vma (depending on the scanout orientation). bspec recommends that we
provide a tile-row in either directions, and suggests using 160 PTE,
warning that the accesses will wrap around the ends of the GGTT.
Currently, we fill the entire GGTT with scratch pages when using VT-d to
always ensure there are valid entries around every vma, including
scanout. However, writing every PTE is slow as on recent devices we
perform 8MiB of uncached writes, incurring an extra 100ms during resume.

If instead we focus on only putting guard pages around scanout, we can
avoid touching the whole GGTT. To avoid having to introduce extra nodes
around each scanout vma, we adjust the scanout drm_mm_node to be smaller
than the allocated space, and fixup the extra PTE during dma binding.

Signed-off-by: Chris Wilson 
Cc: Ville Syrjälä 
Cc: Matthew Auld 
---
 drivers/gpu/drm/i915/gem/i915_gem_domain.c |  3 ++
 drivers/gpu/drm/i915/gt/intel_ggtt.c   | 37 --
 drivers/gpu/drm/i915/i915_gem_gtt.h|  1 +
 drivers/gpu/drm/i915/i915_vma.c| 23 ++
 drivers/gpu/drm/i915/i915_vma_types.h  |  1 +
 5 files changed, 41 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c 
b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
index 0478b069c202..9f2ccc255ca1 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
@@ -345,6 +345,9 @@ i915_gem_object_pin_to_display_plane(struct 
drm_i915_gem_object *obj,
if (ret)
goto err;
 
+   if (intel_scanout_needs_vtd_wa(i915))
+   flags |= PIN_VTD;
+
/*
 * As the user may map the buffer once pinned in the display plane
 * (e.g. libkms for the bootup splash), we have to ensure that we
diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c 
b/drivers/gpu/drm/i915/gt/intel_ggtt.c
index b0b8ded834f0..416f77f48561 100644
--- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
@@ -238,6 +238,11 @@ static void gen8_ggtt_insert_entries(struct 
i915_address_space *vm,
 
gte = (gen8_pte_t __iomem *)ggtt->gsm;
gte += vma->node.start / I915_GTT_PAGE_SIZE;
+
+   end = gte - vma->guard / I915_GTT_PAGE_SIZE;
+   while (end < gte)
+   gen8_set_pte(end++, vm->scratch[0]->encode);
+
end = gte + vma->node.size / I915_GTT_PAGE_SIZE;
 
for_each_sgt_daddr(addr, iter, vma->pages)
@@ -245,6 +250,7 @@ static void gen8_ggtt_insert_entries(struct 
i915_address_space *vm,
GEM_BUG_ON(gte > end);
 
/* Fill the allocated but "unused" space beyond the end of the buffer */
+   end += vma->guard / I915_GTT_PAGE_SIZE;
while (gte < end)
gen8_set_pte(gte++, vm->scratch[0]->encode);
 
@@ -289,6 +295,11 @@ static void gen6_ggtt_insert_entries(struct 
i915_address_space *vm,
 
gte = (gen6_pte_t __iomem *)ggtt->gsm;
gte += vma->node.start / I915_GTT_PAGE_SIZE;
+
+   end = gte - vma->guard / I915_GTT_PAGE_SIZE;
+   while (end < gte)
+   gen8_set_pte(end++, vm->scratch[0]->encode);
+
end = gte + vma->node.size / I915_GTT_PAGE_SIZE;
 
for_each_sgt_daddr(addr, iter, vma->pages)
@@ -296,6 +307,7 @@ static void gen6_ggtt_insert_entries(struct 
i915_address_space *vm,
GEM_BUG_ON(gte > end);
 
/* Fill the allocated but "unused" space beyond the end of the buffer */
+   end += vma->guard / I915_GTT_PAGE_SIZE;
while (gte < end)
iowrite32(vm->scratch[0]->encode, gte++);
 
@@ -311,27 +323,6 @@ static void nop_clear_range(struct i915_address_space *vm,
 {
 }
 
-static void gen8_ggtt_clear_range(struct i915_address_space *vm,
- u64 start, u64 length)
-{
-   struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
-   unsigned int first_entry = start / I915_GTT_PAGE_SIZE;
-   unsigned int num_entries = length / I915_GTT_PAGE_SIZE;
-   const gen8_pte_t scratch_pte = vm->scratch[0]->encode;
-   gen8_pte_t __iomem *gtt_base =
-   (gen8_pte_t __iomem *)ggtt->gsm + first_entry;
-   const int max_entries = ggtt_total_entries(ggtt) - first_entry;
-   int i;
-
-   if (WARN(num_entries > max_entries,
-"First entry = %d; Num entries = %d (max=%d)\n",
-first_entry, num_entries, max_entries))
-   num_entries = max_entries;
-
-   for (i = 0; i < num_entries; i++)
-   gen8_set_pte(&gtt_base[i], scratch_pte);
-}
-
 static void bxt_vtd_ggtt_wa(struct i915_address_space *vm)
 {
/*
@@ -898,8 +889,6 @@ static int gen8_gmch_probe(struct i915_ggtt *ggtt)
ggtt->vm.cleanup = gen6_gmch_remove;
ggtt->vm.insert_page = gen8_ggtt_insert_page;
 

[Intel-gfx] [PATCH 4/6] drm/i915/selftests: Restrict partial-tiling to write into the object

2021-02-10 Thread Chris Wilson
Check that the address we are about to write into maps into the object
to avoid stray writes into the scratch page.

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c 
b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
index 39293d98f34d..df558ce95a94 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
@@ -104,9 +104,12 @@ static int check_partial_mapping(struct 
drm_i915_gem_object *obj,
GEM_BUG_ON(i915_gem_object_get_tiling(obj) != tile->tiling);
GEM_BUG_ON(i915_gem_object_get_stride(obj) != tile->stride);
 
-   page = i915_prandom_u32_max_state(npages, prng);
-   view = compute_partial_view(obj, page, MIN_CHUNK_PAGES);
+   do {
+   page = i915_prandom_u32_max_state(npages, prng);
+   offset = tiled_offset(tile, page << PAGE_SHIFT);
+   } while (offset >= obj->base.size);
 
+   view = compute_partial_view(obj, page, MIN_CHUNK_PAGES);
vma = i915_gem_object_ggtt_pin(obj, &view, 0, 0, PIN_MAPPABLE);
if (IS_ERR(vma)) {
pr_err("Failed to pin partial view: offset=%lu; err=%d\n",
@@ -129,10 +132,6 @@ static int check_partial_mapping(struct 
drm_i915_gem_object *obj,
iowrite32(page, io + n * PAGE_SIZE / sizeof(*io));
i915_vma_unpin_iomap(vma);
 
-   offset = tiled_offset(tile, page << PAGE_SHIFT);
-   if (offset >= obj->base.size)
-   goto out;
-
intel_gt_flush_ggtt_writes(&to_i915(obj->base.dev)->gt);
 
p = i915_gem_object_get_page(obj, offset >> PAGE_SHIFT);
-- 
2.20.1
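
The new loop can be modelled on its own: keep drawing random pages until the
tiled offset lands inside the object, so the later write can never reach the
scratch page. A rough stand-alone sketch, with tiled_offset() as a placeholder
for the selftest helper and 4KiB pages assumed:

#include <stdlib.h>

/* Pick a page whose tiled (swizzled) offset still lands inside the object;
 * pages near the end of the mapping may otherwise resolve to an offset
 * beyond obj_size, i.e. into the scratch page. */
unsigned long pick_page(unsigned long npages, unsigned long obj_size,
			unsigned long (*tiled_offset)(unsigned long addr))
{
	unsigned long page;

	do {
		page = (unsigned long)rand() % npages;
	} while (tiled_offset(page << 12) >= obj_size);

	return page;
}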



[Intel-gfx] [PATCH 3/6] drm/i915/selftests: Declare suspend_state before testing suspend

2021-02-10 Thread Chris Wilson
As we mock the suspend routines to exercise suspending the driver and
manipulating backing storage across the suspend, declare the suspend
target just as a real suspend would.

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/selftests/i915_gem.c | 40 +--
 1 file changed, 30 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/selftests/i915_gem.c 
b/drivers/gpu/drm/i915/selftests/i915_gem.c
index dc394fb7ccfa..6c764bcfe512 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem.c
@@ -5,6 +5,7 @@
  */
 
 #include <linux/random.h>
+#include <linux/suspend.h>
 
 #include "gem/selftests/igt_gem_utils.h"
 #include "gem/selftests/mock_context.h"
@@ -87,25 +88,38 @@ static void simulate_hibernate(struct drm_i915_private 
*i915)
intel_runtime_pm_put(&i915->runtime_pm, wakeref);
 }
 
-static int pm_prepare(struct drm_i915_private *i915)
+static int do_prepare(struct drm_i915_private *i915)
 {
i915_gem_suspend(i915);
 
return 0;
 }
 
-static void pm_suspend(struct drm_i915_private *i915)
+static suspend_state_t set_pm_target(suspend_state_t target)
 {
+#ifdef CONFIG_PM_SLEEP
+   return xchg(&pm_suspend_target_state, target);
+#else
+   return PM_SUSPEND_ON;
+#endif
+}
+
+static suspend_state_t do_suspend(struct drm_i915_private *i915)
+{
+   suspend_state_t old = set_pm_target(PM_SUSPEND_MEM);
intel_wakeref_t wakeref;
 
with_intel_runtime_pm(&i915->runtime_pm, wakeref) {
i915_ggtt_suspend(&i915->ggtt);
i915_gem_suspend_late(i915);
}
+
+   return old;
 }
 
-static void pm_hibernate(struct drm_i915_private *i915)
+static suspend_state_t do_hibernate(struct drm_i915_private *i915)
 {
+   suspend_state_t old = set_pm_target(PM_SUSPEND_MAX);
intel_wakeref_t wakeref;
 
with_intel_runtime_pm(&i915->runtime_pm, wakeref) {
@@ -114,9 +128,11 @@ static void pm_hibernate(struct drm_i915_private *i915)
i915_gem_freeze(i915);
i915_gem_freeze_late(i915);
}
+
+   return old;
 }
 
-static void pm_resume(struct drm_i915_private *i915)
+static void do_resume(struct drm_i915_private *i915, suspend_state_t saved)
 {
intel_wakeref_t wakeref;
 
@@ -128,12 +144,15 @@ static void pm_resume(struct drm_i915_private *i915)
i915_ggtt_resume(&i915->ggtt);
i915_gem_resume(i915);
}
+
+   set_pm_target(saved);
 }
 
 static int igt_gem_suspend(void *arg)
 {
struct drm_i915_private *i915 = arg;
struct i915_gem_context *ctx;
+   suspend_state_t saved;
struct file *file;
int err;
 
@@ -148,16 +167,16 @@ static int igt_gem_suspend(void *arg)
if (err)
goto out;
 
-   err = pm_prepare(i915);
+   err = do_prepare(i915);
if (err)
goto out;
 
-   pm_suspend(i915);
+   saved = do_suspend(i915);
 
/* Here be dragons! Note that with S3RST any S3 may become S4! */
simulate_hibernate(i915);
 
-   pm_resume(i915);
+   do_resume(i915, saved);
 
err = switch_to_context(ctx);
 out:
@@ -169,6 +188,7 @@ static int igt_gem_hibernate(void *arg)
 {
struct drm_i915_private *i915 = arg;
struct i915_gem_context *ctx;
+   suspend_state_t saved;
struct file *file;
int err;
 
@@ -183,16 +203,16 @@ static int igt_gem_hibernate(void *arg)
if (err)
goto out;
 
-   err = pm_prepare(i915);
+   err = do_prepare(i915);
if (err)
goto out;
 
-   pm_hibernate(i915);
+   saved = do_hibernate(i915);
 
/* Here be dragons! */
simulate_hibernate(i915);
 
-   pm_resume(i915);
+   do_resume(i915, saved);
 
err = switch_to_context(ctx);
 out:
-- 
2.20.1
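
The save/swap/restore of the suspend target reduces to a small pattern:
exchange the global target before the mocked suspend and put the old value
back on resume. A stand-alone model with simplified names (pm_target_state
standing in for pm_suspend_target_state):

#include <stdio.h>

enum pm_target { PM_ON, PM_MEM, PM_MAX };

static enum pm_target pm_target_state = PM_ON;

/* Exchange the advertised suspend target, returning the previous value
 * so the caller can restore it after the mocked suspend completes. */
static enum pm_target set_pm_target(enum pm_target target)
{
	enum pm_target old = pm_target_state;

	pm_target_state = target;
	return old;
}

int main(void)
{
	enum pm_target saved = set_pm_target(PM_MEM);

	/* ... mocked suspend/resume would run here ... */

	set_pm_target(saved);
	printf("restored target: %d\n", pm_target_state);
	return 0;
}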



[Intel-gfx] [PATCH 5/6] drm/i915: Check for scratch page scribbling

2021-02-10 Thread Chris Wilson
Periodically check, for example when idling and upon closing user
contexts, whether some client has written into unallocated PTEs in
their ppGTT.

Signed-off-by: Chris Wilson 
Reviewed-by: Mika Kuoppala 
---
 .../drm/i915/gem/selftests/i915_gem_context.c | 24 +++---
 .../drm/i915/gem/selftests/i915_gem_mman.c| 32 -
 drivers/gpu/drm/i915/gt/intel_engine_cs.c | 31 +
 drivers/gpu/drm/i915/gt/intel_gt_pm.c | 10 +
 drivers/gpu/drm/i915/gt/intel_gtt.c   | 45 +++
 drivers/gpu/drm/i915/gt/intel_gtt.h   |  1 +
 drivers/gpu/drm/i915/i915_scheduler.c | 33 +-
 drivers/gpu/drm/i915/i915_utils.c | 29 
 drivers/gpu/drm/i915/i915_utils.h |  3 ++
 9 files changed, 141 insertions(+), 67 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
index df949320f2b5..5a9128dd3979 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
@@ -1737,7 +1737,7 @@ static int read_from_scratch(struct i915_gem_context *ctx,
return err;
 }
 
-static int check_scratch_page(struct i915_gem_context *ctx, u32 *out)
+static int check_ctx_scratch(struct i915_gem_context *ctx, u32 *out)
 {
struct i915_address_space *vm;
struct page *page;
@@ -1770,6 +1770,17 @@ static int check_scratch_page(struct i915_gem_context 
*ctx, u32 *out)
return err;
 }
 
+static void reset_ctx_scratch(struct i915_gem_context *ctx, u32 value)
+{
+   struct i915_address_space *vm = ctx_vm(ctx);
+   struct page *page = __px_page(vm->scratch[0]);
+   u32 *vaddr;
+
+   vaddr = kmap(page);
+   memset32(vaddr, value, PAGE_SIZE / sizeof(value));
+   kunmap(page);
+}
+
 static int igt_vm_isolation(void *arg)
 {
struct drm_i915_private *i915 = arg;
@@ -1816,11 +1827,11 @@ static int igt_vm_isolation(void *arg)
goto out_file;
 
/* Read the initial state of the scratch page */
-   err = check_scratch_page(ctx_a, &expected);
+   err = check_ctx_scratch(ctx_a, &expected);
if (err)
goto out_file;
 
-   err = check_scratch_page(ctx_b, &expected);
+   err = check_ctx_scratch(ctx_b, &expected);
if (err)
goto out_file;
 
@@ -1855,7 +1866,7 @@ static int igt_vm_isolation(void *arg)
err = read_from_scratch(ctx_b, engine,
offset, &value);
if (err)
-   goto out_file;
+   goto out_scratch;
 
if (value != expected) {
pr_err("%s: Read %08x from scratch (offset 
0x%08x_%08x), after %lu reads!\n",
@@ -1864,7 +1875,7 @@ static int igt_vm_isolation(void *arg)
   lower_32_bits(offset),
   this);
err = -EINVAL;
-   goto out_file;
+   goto out_scratch;
}
 
this++;
@@ -1875,6 +1886,9 @@ static int igt_vm_isolation(void *arg)
pr_info("Checked %lu scratch offsets across %lu engines\n",
count, num_engines);
 
+out_scratch:
+   /* As we deliberately write into scratch, cover up our tracks */
+   reset_ctx_scratch(ctx_a, expected);
 out_file:
if (igt_live_test_end())
err = -EIO;
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c 
b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
index df558ce95a94..b7f41f230c8f 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
@@ -155,6 +155,18 @@ static int check_partial_mapping(struct 
drm_i915_gem_object *obj,
drm_clflush_virt_range(cpu, sizeof(*cpu));
kunmap(p);
 
+   if (check_scratch_page(vma->vm)) {
+   pr_err("Partial view for %lu [%u] (offset=%llu, size=%u [%llu, 
row size %u], fence=%d, tiling=%d, stride=%d) overwrote scratch\n",
+  page, n,
+  view.partial.offset,
+  view.partial.size,
+  vma->size >> PAGE_SHIFT,
+  tile->tiling ? tile_row_pages(obj) : 0,
+  vma->fence ? vma->fence->id : -1,
+  tile->tiling, tile->stride);
+   err = -EIO;
+   }
+
 out:
__i915_vma_put(vma);
return err;
@@ -250,6 +262,9 @@ static int check_partial_mappings(struct 
drm_i915_gem_object *obj,
return -EINTR;
}
 
+   if (check_scratch_page(&to_i915(obj->base.dev)->ggtt.vm))
+   return -EIO;
+
return
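
The checker itself is not part of the hunks shown above; the idea is that the
scratch page is poisoned with a known value at creation, so any other dword
found later means a stray write went through an unallocated PTE. A rough
user-space model of such a scan (the real check_scratch_page() may differ):

#include <stdint.h>
#include <stdio.h>

#define SCRATCH_PAGE_SIZE 4096

/* Return 1 if every dword still holds the poison pattern, 0 if some
 * stray write has landed in the scratch page. */
static int scratch_is_clean(const uint32_t *scratch, uint8_t poison)
{
	uint32_t expect = 0x01010101u * poison;
	size_t i;

	for (i = 0; i < SCRATCH_PAGE_SIZE / sizeof(*scratch); i++) {
		if (scratch[i] != expect) {
			fprintf(stderr, "scratch dirty at dword %zu: %08x\n",
				i, scratch[i]);
			return 0;
		}
	}

	return 1;
}

int main(void)
{
	uint32_t page[SCRATCH_PAGE_SIZE / sizeof(uint32_t)];
	size_t i;

	for (i = 0; i < SCRATCH_PAGE_SIZE / sizeof(uint32_t); i++)
		page[i] = 0x6b6b6b6b; /* a POISON_INUSE-style fill */

	printf("clean=%d\n", scratch_is_clean(page, 0x6b));
	return 0;
}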

[Intel-gfx] [PATCH 6/6] drm/i915: Remove unused debug functions

2021-02-10 Thread Chris Wilson
Remove unused debug functions, or hide them from clang, or else it moans.

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/i915_sw_fence.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c 
b/drivers/gpu/drm/i915/i915_sw_fence.c
index dfabf291e5cd..566bc48e5b0c 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence.c
+++ b/drivers/gpu/drm/i915/i915_sw_fence.c
@@ -49,10 +49,12 @@ static inline void debug_fence_init(struct i915_sw_fence 
*fence)
debug_object_init(fence, &i915_sw_fence_debug_descr);
 }
 
+#if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
 static inline void debug_fence_init_onstack(struct i915_sw_fence *fence)
 {
debug_object_init_on_stack(fence, &i915_sw_fence_debug_descr);
 }
+#endif
 
 static inline void debug_fence_activate(struct i915_sw_fence *fence)
 {
@@ -92,9 +94,11 @@ static inline void debug_fence_init(struct i915_sw_fence 
*fence)
 {
 }
 
+#if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
 static inline void debug_fence_init_onstack(struct i915_sw_fence *fence)
 {
 }
+#endif
 
 static inline void debug_fence_activate(struct i915_sw_fence *fence)
 {
@@ -113,10 +117,6 @@ static inline void debug_fence_destroy(struct 
i915_sw_fence *fence)
 {
 }
 
-static inline void debug_fence_free(struct i915_sw_fence *fence)
-{
-}
-
 static inline void debug_fence_assert(struct i915_sw_fence *fence)
 {
 }
-- 
2.20.1



[Intel-gfx] [PATCH 2/6] drm/i915: Clear internal GT state on hibernate

2021-02-10 Thread Chris Wilson
Call intel_gt_suspend_late() to disable the GT before hibernation.

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/gem/i915_gem_pm.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pm.c 
b/drivers/gpu/drm/i915/gem/i915_gem_pm.c
index 000e1cd8e920..da0ef483ad6b 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_pm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_pm.c
@@ -115,6 +115,8 @@ int i915_gem_freeze_late(struct drm_i915_private *i915)
 * the objects as well, see i915_gem_freeze()
 */
 
+   intel_gt_suspend_late(&i915->gt);
+
with_intel_runtime_pm(&i915->runtime_pm, wakeref)
i915_gem_shrink(i915, -1UL, NULL, ~0);
i915_gem_drain_freed_objects(i915);
-- 
2.20.1



[Intel-gfx] [PATCH 1/6] drm/i915/gt: Sanitize GPU during prepare-to-suspend

2021-02-10 Thread Chris Wilson
After calling intel_gt_suspend_prepare(), the driver starts to turn off
various subsystems, such as clearing the GGTT, before calling
intel_gt_suspend_late() to relinquish control over the GT. However, if
we still have internal GPU state active as we clear the GGTT, the GPU
may write back its internal state to the residual GGTT addresses that
are now pointing into scratch. Let's reset the GPU to clear that
internal state as soon as we have idled the GPU in prepare-to-suspend.

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/gt/intel_gt_pm.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c 
b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
index 0bd303d2823e..f41612faa269 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
@@ -295,6 +295,9 @@ void intel_gt_suspend_prepare(struct intel_gt *gt)
wait_for_suspend(gt);
 
intel_uc_suspend(&gt->uc);
+
+   /* Flush all the contexts and internal state before turning off GGTT */
+   gt_sanitize(gt, false);
 }
 
 static suspend_state_t pm_suspend_target(void)
@@ -337,7 +340,7 @@ void intel_gt_suspend_late(struct intel_gt *gt)
intel_llc_disable(&gt->llc);
}
 
-   gt_sanitize(gt, false);
+   gt_sanitize(gt, false); /* Be paranoid, remove all residual GPU state */
 
GT_TRACE(gt, "\n");
 }
-- 
2.20.1



Re: [Intel-gfx] [5.10.y regression] i915 clear-residuals mitigation is causing gfx issues

2021-02-10 Thread Chris Wilson
Quoting Hans de Goede (2021-02-10 10:37:19)
> Hi,
> 
> On 2/10/21 12:07 AM, Chris Wilson wrote:
> > Quoting Hans de Goede (2021-02-09 11:46:46)
> >> Hi,
> >>
> >> On 2/9/21 12:27 AM, Chris Wilson wrote:
> >>> Quoting Hans de Goede (2021-02-08 20:38:58)
> >>>> Hi All,
> >>>>
> >>>> We (Fedora) have been receiving reports from multiple users about gfx 
> >>>> issues / glitches
> >>>> starting with 5.10.9. All reporters are users of Ivy Bridge / Haswell 
> >>>> iGPUs and all
> >>>> reporters report that adding i915.mitigations=off to the cmdline fixes 
> >>>> things, see:
> >>>
> >>> I tried to reproduce this on the w/e on hsw-gt1, to no avail; and piglit
> >>> did not report any differences with and without mitigations. I have yet
> >>> to test other platforms. So I don't yet have an alternative.
> >>
> >> Note the original / first reporter of:
> >>
> >> https://bugzilla.redhat.com/show_bug.cgi?id=1925346
> >>
> >> Is using hsw-gt2, so it seems that the problem is not just the enabling of
> >> the mitigations on ivy-bridge / bay-trail but that there actually is
> >> a regression on devices where the WA worked fine before...
> > 
> > There have been 3 crashes uploaded related to v5.10.9, and in all 3
> > cases the ACTHD has been in the first page. This strongly suggests that
> > the w/a is scribbling over address 0. And there's then a very good
> > chance that
> > 
> > commit 29d35b73ead4e41aa0d1a954c9bfbdce659ec5d6
> > Author: Chris Wilson 
> > Date:   Mon Jan 25 12:50:33 2021 +
> > 
> > drm/i915/gt: Always try to reserve GGTT address 0x0
> > 
> > commit 489140b5ba2e7cc4b853c29e0591895ddb462a82 upstream.
> > 
> > in v5.10.14 is sufficient to hide the issue.
> 
> That one actually is already in v5.10.13 and the various reporters of these
> issues have already tested 5.10.13. They did mention that it took longer
> to reproduce with 5.10.13 than with 5.10.10, but that could also be due to:
> 
> "drm/i915/gt: Clear CACHE_MODE prior to clearing residuals"
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.10.y&id=520d05a77b2866eb4cb9e548e1d8c8abcfe60ec5

Started looking for scratch page overwrites, and found this little gem:
https://patchwork.freedesktop.org/patch/420436/?series=86947&rev=1

Looks promising wrt the cause of overwriting random addresses -- and
I hope that is the explanation for the glitches/hangs. I have a hsw gt2
with gnome shell, piglit is happy, but I suspect it is all due to
placement and so will only occur at random.
-Chris
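
For reference, the "reserve GGTT address 0x0" mitigation mentioned above
amounts to pre-claiming a node at offset 0 so a stray write to address 0 lands
in driver-owned memory rather than a user buffer. A rough sketch using the
drm_mm API (the actual i915 commit differs in detail, and error handling is
elided):

#include <linux/string.h>

#include <drm/drm_mm.h>

/* Claim the first GTT page so nothing else can be allocated at 0x0 and a
 * stray write to address 0 hits a page the driver owns. */
static int reserve_ggtt_page_zero(struct drm_mm *mm, struct drm_mm_node *node)
{
	memset(node, 0, sizeof(*node));
	node->start = 0;
	node->size = 4096; /* one GTT page */

	return drm_mm_reserve_node(mm, node);
}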


[Intel-gfx] [CI] drm/i915/gt: Correct surface base address for renderclear

2021-02-10 Thread Chris Wilson
The surface_state_base is an offset into the batch, so we need to pass
the correct batch address for STATE_BASE_ADDRESS.

Fixes: 47f8253d2b89 ("drm/i915/gen7: Clear all EU/L3 residual contexts")
Signed-off-by: Chris Wilson 
Cc: Mika Kuoppala 
Cc: Prathap Kumar Valsan 
Cc: Akeem G Abodunrin 
Cc: Hans de Goede 
Reviewed-by: Mika Kuoppala 
Cc:  # v5.7+
---
 drivers/gpu/drm/i915/gt/gen7_renderclear.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/gen7_renderclear.c 
b/drivers/gpu/drm/i915/gt/gen7_renderclear.c
index e403eb046a43..de575fdb033f 100644
--- a/drivers/gpu/drm/i915/gt/gen7_renderclear.c
+++ b/drivers/gpu/drm/i915/gt/gen7_renderclear.c
@@ -240,7 +240,7 @@ gen7_emit_state_base_address(struct batch_chunk *batch,
/* general */
*cs++ = batch_addr(batch) | BASE_ADDRESS_MODIFY;
/* surface */
-   *cs++ = batch_addr(batch) | surface_state_base | BASE_ADDRESS_MODIFY;
+   *cs++ = (batch_addr(batch) + surface_state_base) | BASE_ADDRESS_MODIFY;
/* dynamic */
*cs++ = batch_addr(batch) | BASE_ADDRESS_MODIFY;
/* indirect */
-- 
2.20.1
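
The reason OR is wrong here: when the batch's GGTT address and the
surface-state offset share set bits, OR-ing them is not the same as adding
them. A small stand-alone illustration with arbitrarily chosen addresses:

#include <stdint.h>
#include <stdio.h>

#define BASE_ADDRESS_MODIFY 0x1

int main(void)
{
	uint32_t batch_addr = 0x000a1000;     /* hypothetical GGTT address */
	uint32_t surface_state_base = 0x3000; /* offset into the batch */

	uint32_t with_or  = batch_addr | surface_state_base | BASE_ADDRESS_MODIFY;
	uint32_t with_add = (batch_addr + surface_state_base) | BASE_ADDRESS_MODIFY;

	/* 0xa1000 | 0x3000 == 0xa3000, but the surface state actually lives at
	 * 0xa1000 + 0x3000 == 0xa4000; OR only happens to work when the offset
	 * bits do not overlap the base address bits. */
	printf("or=%#x add=%#x\n", with_or, with_add);
	return 0;
}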



Re: [Intel-gfx] [PATCH 2/2] drm/i915/gt: Correct surface base address

2021-02-10 Thread Chris Wilson
Quoting Mika Kuoppala (2021-02-10 10:50:18)
> Chris Wilson  writes:
> 
> > The surface_state_base is an offset into the batch, so we need to pass
> > the correct batch address for STATE_BASE_ADDRESS.
> >
> > Fixes: 47f8253d2b89 ("drm/i915/gen7: Clear all EU/L3 residual contexts")
> > Signed-off-by: Chris Wilson 
> > Cc: Mika Kuoppala 
> > Cc: Prathap Kumar Valsan 
> > Cc: Akeem G Abodunrin 
> 
> Reviewed-by: Mika Kuoppala 

Compared against https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_7571/index.html
I think we've found our suspect.
-Chris


Re: [Intel-gfx] [PATCH 1/2] drm/i915: Check for scratch page scribbling

2021-02-10 Thread Chris Wilson
Quoting Mika Kuoppala (2021-02-10 10:49:55)
> Chris Wilson  writes:
> > diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c 
> > b/drivers/gpu/drm/i915/gt/intel_gtt.c
> > index d34770ae4c9a..5ac9eb4a3a92 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
> > @@ -158,10 +158,49 @@ static void poison_scratch_page(struct 
> > drm_i915_gem_object *scratch)
> >  
> >   vaddr = kmap(page);
> >   memset(vaddr, val, PAGE_SIZE);
> > + set_page_dirty(page); /* keep the poisoned contents */
> 
> Should we use locked version in here?

We don't hold the page-lock here, so no. If this is not an anonymous
page, something is very wrong :p
-Chris


Re: [Intel-gfx] [PATCH i-g-t 2/3] intel_gpu_top: Aggregate clients by PID by default

2021-02-10 Thread Chris Wilson
Quoting Tvrtko Ursulin (2021-02-10 10:53:43)
> +static struct clients *display_clients(struct clients *clients)
> +{
> +   struct client *ac, *c, *cp = NULL;
> +   struct clients *aggregated;
> +   int tmp, num = 0;
> +
> +   if (!aggregate_pids)
> +   return sort_clients(clients, client_cmp);

Still two calls to return sort_clients(foo, client_cmp) in this function
:) [a clients = aggregated; after processing would merge the two paths].

Reviewed-by: Chris Wilson 
-Chris


Re: [Intel-gfx] [igt-dev] [PATCH i-g-t 2/3] intel_gpu_top: Aggregate clients by PID by default

2021-02-10 Thread Chris Wilson
Quoting Tvrtko Ursulin (2021-02-10 10:55:44)
> 
> On 10/02/2021 10:35, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2021-02-10 09:37:55)
> > Ok, that works very well. Hmm. The sort order does seem a little jumpy
> > though. May I suggest ac->id = -c->pid; instead of num;
> 
> Done it although I thought 1st pass being sort by pid already, num as id 
> would follow a stable order. I guess your point was inversion to 
> preserve order when cycling sort modes?

I thought that made more sense for 'aggregate-by-pid': when idle, it
should be in either ascending or descending pid order.

It just felt very jumpy; it may have been a tiny amount of %busy, but I
suspected it was slightly unstable sorting with changing clients.
-Chris


Re: [Intel-gfx] [PATCH] drm/i915/gem: Add a check for object size for corner cases

2021-02-10 Thread Chris Wilson
Quoting Anand Moon (2021-02-10 07:59:29)
> Add a check for object size to return the appropriate error (-E2BIG or
> -EINVAL) and avoid a WARN_ON and a successful return for some test cases.

No. You miss the point of having those warnings. We need to inspect the
code to remove the last remaining "int pagenum", and then we can remove
the GEM_WARN_ON((sz) >> PAGE_SHIFT > INT_MAX). These are not emitted for
users, only for us to motivate us into finally fixing the code.
-Chris
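
The warning exists because an object larger than INT_MAX pages (8TiB with 4KiB
pages) no longer fits in an int page index. A quick stand-alone illustration
of the overflow the remaining "int pagenum" would hit:

#include <limits.h>
#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT 12

int main(void)
{
	uint64_t npages = (uint64_t)INT_MAX + 1; /* first "too big" page count */
	int pagenum = (int)npages;               /* what an int index would see */

	printf("object size : %llu bytes\n",
	       (unsigned long long)(npages << PAGE_SHIFT));
	printf("page count  : %llu\n", (unsigned long long)npages);
	printf("as int index: %d (overflowed)\n", pagenum);
	return 0;
}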


Re: [Intel-gfx] [PATCH i-g-t 1/3] intel_gpu_top: Wrap interactive header

2021-02-10 Thread Chris Wilson
Quoting Tvrtko Ursulin (2021-02-10 09:37:54)
> From: Tvrtko Ursulin 
> 
> Slight improvement with regards to wrapping header components to fit
> console width. If a single element is wider than max it can still
> overflow but it should now work better for practical console widths.

I'm not fond of this layout, but that's just a personal preference.
Reviewed-by: Chris Wilson 
-Chris


Re: [Intel-gfx] [igt-dev] [PATCH i-g-t 3/3] intel_gpu_top: Interactive help screen

2021-02-10 Thread Chris Wilson
Quoting Tvrtko Ursulin (2021-02-10 09:37:56)
> From: Tvrtko Ursulin 
> 
> Show a list of supported interactive commands when pressing 'h'.
> 
> Signed-off-by: Tvrtko Ursulin 
Reviewed-by: Chris Wilson 
-Chris


Re: [Intel-gfx] [igt-dev] [PATCH i-g-t 2/3] intel_gpu_top: Aggregate clients by PID by default

2021-02-10 Thread Chris Wilson
Quoting Tvrtko Ursulin (2021-02-10 09:37:55)
> +static struct clients *aggregated_clients(struct clients *clients)
> +{
> +   struct client *ac, *c, *cp = NULL;
> +   struct clients *aggregated;
> +   int tmp, num = 0;
> +
> +   /* Sort by pid first to make it easy to aggregate while walking. */
> +   sort_clients(clients, client_pid_cmp);

You could eliminate this tiny bit of duplication by always calling
aggregated_clients() and returning here for !aggregate_pids.

> +   aggregated = calloc(1, sizeof(*clients));
> +   assert(aggregated);
> +
> +   ac = calloc(clients->num_clients, sizeof(*c));
> +   assert(ac);
> +
> +   aggregated->num_classes = clients->num_classes;
> +   aggregated->class = clients->class;
> +   aggregated->client = ac;
> +
> +   for_each_client(clients, c, tmp) {
> +   unsigned int i;
> +
> +   if (c->status == FREE)
> +   break;
> +
> +   assert(c->status == ALIVE);
> +
> +   if ((cp && c->pid != cp->pid) || !cp) {
> +   ac = &aggregated->client[num];
> +
> +   /* New pid. */
> +   ac->clients = aggregated;
> +   ac->status = ALIVE;
> +   ac->id = ++num;
> +   ac->pid = c->pid;
> +   strcpy(ac->name, c->name);
> +   strcpy(ac->print_name, c->print_name);
> +   ac->engines = c->engines;
> +   ac->val = calloc(clients->num_classes,
> +sizeof(ac->val[0]));
> +   assert(ac->val);
> +   ac->samples = 1;
> +   }
> +
> +   cp = c;
> +
> +   if (c->samples < 2)
> +   continue;
> +
> +   ac->samples = 2; /* All what matters for display. */
> +   ac->total_runtime += c->total_runtime;
> +   ac->last_runtime += c->last_runtime;
> +
> +   for (i = 0; i < clients->num_classes; i++)
> +   ac->val[i] += c->val[i];
> +   }
> +
> +   aggregated->num_clients = num;
> +   aggregated->active_clients = num;
> +
> +   return sort_clients(aggregated, client_cmp);
>  }

Ok, that works very well. Hmm. The sort order does seem a little jumpy
though. May I suggest ac->id = -c->pid; instead of num;
-Chris
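
The jumpiness comes from equally-idle clients having no stable tie-break;
deriving the id from the pid (or breaking ties on pid in the comparator) keeps
the order fixed between refreshes. A simplified model of such a comparator,
not the actual igt code:

#include <stdio.h>
#include <stdlib.h>

struct client_model {
	long pid;
	double busy;
};

/* Busiest first; break ties on pid so idle clients keep a fixed order. */
static int client_cmp_stable(const void *a, const void *b)
{
	const struct client_model *ca = a, *cb = b;

	if (ca->busy != cb->busy)
		return ca->busy < cb->busy ? 1 : -1;

	return (ca->pid > cb->pid) - (ca->pid < cb->pid);
}

int main(void)
{
	struct client_model c[] = {
		{ 1401, 0.0 }, { 1032, 12.5 }, { 977, 0.0 },
	};
	unsigned int i;

	qsort(c, sizeof(c) / sizeof(c[0]), sizeof(c[0]), client_cmp_stable);
	for (i = 0; i < sizeof(c) / sizeof(c[0]); i++)
		printf("%ld %.1f\n", c[i].pid, c[i].busy);
	return 0;
}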


[Intel-gfx] [PATCH 2/2] drm/i915/gt: Correct surface base address

2021-02-10 Thread Chris Wilson
The surface_state_base is an offset into the batch, so we need to pass
the correct batch address for STATE_BASE_ADDRESS.

Fixes: 47f8253d2b89 ("drm/i915/gen7: Clear all EU/L3 residual contexts")
Signed-off-by: Chris Wilson 
Cc: Mika Kuoppala 
Cc: Prathap Kumar Valsan 
Cc: Akeem G Abodunrin 
---
 drivers/gpu/drm/i915/gt/gen7_renderclear.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/gen7_renderclear.c 
b/drivers/gpu/drm/i915/gt/gen7_renderclear.c
index e403eb046a43..de575fdb033f 100644
--- a/drivers/gpu/drm/i915/gt/gen7_renderclear.c
+++ b/drivers/gpu/drm/i915/gt/gen7_renderclear.c
@@ -240,7 +240,7 @@ gen7_emit_state_base_address(struct batch_chunk *batch,
/* general */
*cs++ = batch_addr(batch) | BASE_ADDRESS_MODIFY;
/* surface */
-   *cs++ = batch_addr(batch) | surface_state_base | BASE_ADDRESS_MODIFY;
+   *cs++ = (batch_addr(batch) + surface_state_base) | BASE_ADDRESS_MODIFY;
/* dynamic */
*cs++ = batch_addr(batch) | BASE_ADDRESS_MODIFY;
/* indirect */
-- 
2.20.1


