Re: [Intel-gfx] [PATCH] drm/i915/selftests: Drop igt_cs_tlb
Quoting Jonathan Cavitt (2023-02-17 17:20:19)
> The gt_tlb live selftest has the same code coverage as the
> igt_cs_tlb subtest of gtt.

True, the intent of the code is the same, but gt_tlb has had a much
higher success rate at hitting TLB bugs, so is my preferred test.

> However, igt_cs_tlb is hitting
> an issue in that we are updating PTE in gt0->vm but
> executing on the media tile without updating gt1->vm.

I'm no longer convinced this is a good explanation of the issue, as
unlike the i915_requests selftest this is using a GEM context and not
the local kernel contexts. The GEM context->vm should work equally on
the mtl render and media tiles, afaict.

The failure is very early in running on the media tile (after running
on the render tile) so I think it should be easy enough to reproduce in
a simpler test to narrow down the cause.

> This issue is corrected in gt_tlb, and thus igt_cs_tlb
> is obsolete and should be removed.

gt_tlb supersedes igt_cs_tlb, that I can agree on,
Acked-by: Chris Wilson
-Chris
Re: [Intel-gfx] [PATCH] drm/i915: avoid concurrent writes to aux_inv
Quoting fei.y...@intel.com (2022-03-02 18:26:57)
> From: Fei Yang
>
> GPU hangs have been observed when multiple engines write to the
> same aux_inv register at the same time. To avoid this each engine
> should only invalidate its own auxiliary table. The function
> gen12_emit_flush_xcs() currently invalidate the auxiliary table for
> all engines because the rq->engine is not necessarily the engine
> eventually carrying out the request, and potentially the engine
> could even be a virtual one (with engine->instance being -1).
> With this patch, auxiliary table invalidation is done only for the
> engine executing the request. And the mmio address for the aux_inv
> register is set after the engine instance becomes certain.
>
> Signed-off-by: Chris Wilson
> Signed-off-by: Fei Yang
> ---
>  drivers/gpu/drm/i915/gt/gen8_engine_cs.c      | 41 ---
>  .../drm/i915/gt/intel_execlists_submission.c  | 38 +
>  drivers/gpu/drm/i915/i915_request.h           |  2 +
>  3 files changed, 47 insertions(+), 34 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
> index b1b9c3fd7bf9..af62e2bc2c9b 100644
> --- a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
> +++ b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
> @@ -165,30 +165,6 @@ static u32 preparser_disable(bool state)
>  	return MI_ARB_CHECK | 1 << 8 | state;
>  }
>
> -static i915_reg_t aux_inv_reg(const struct intel_engine_cs *engine)
> -{
> -	static const i915_reg_t vd[] = {
> -		GEN12_VD0_AUX_NV,
> -		GEN12_VD1_AUX_NV,
> -		GEN12_VD2_AUX_NV,
> -		GEN12_VD3_AUX_NV,
> -	};
> -
> -	static const i915_reg_t ve[] = {
> -		GEN12_VE0_AUX_NV,
> -		GEN12_VE1_AUX_NV,
> -	};
> -
> -	if (engine->class == VIDEO_DECODE_CLASS)
> -		return vd[engine->instance];
> -
> -	if (engine->class == VIDEO_ENHANCEMENT_CLASS)
> -		return ve[engine->instance];
> -
> -	GEM_BUG_ON("unknown aux_inv reg\n");
> -	return INVALID_MMIO_REG;
> -}
> -
>  static u32 *gen12_emit_aux_table_inv(const i915_reg_t inv_reg, u32 *cs)
>  {
>  	*cs++ = MI_LOAD_REGISTER_IMM(1);
> @@ -288,7 +264,7 @@ int gen12_emit_flush_xcs(struct i915_request *rq, u32 mode)
>  	if (mode & EMIT_INVALIDATE)
>  		aux_inv = rq->engine->mask & ~BIT(BCS0);
>  	if (aux_inv)
> -		cmd += 2 * hweight32(aux_inv) + 2;
> +		cmd += 4;
>
>  	cs = intel_ring_begin(rq, cmd);
>  	if (IS_ERR(cs))
> @@ -319,16 +295,13 @@ int gen12_emit_flush_xcs(struct i915_request *rq, u32 mode)
>  		*cs++ = 0; /* value */
>
>  	if (aux_inv) { /* hsdes: 1809175790 */
> -		struct intel_engine_cs *engine;
> -		unsigned int tmp;
> -
> -		*cs++ = MI_LOAD_REGISTER_IMM(hweight32(aux_inv));
> -		for_each_engine_masked(engine, rq->engine->gt, aux_inv, tmp) {
> -			*cs++ = i915_mmio_reg_offset(aux_inv_reg(engine));
> -			*cs++ = AUX_INV;
> -		}
> +		*cs++ = MI_LOAD_REGISTER_IMM(1);
> +		rq->vd_ve_aux_inv = cs;
> +		*cs++ = 0; /* address to be set at submission to HW */
> +		*cs++ = AUX_INV;
>  		*cs++ = MI_NOOP;
> -	}
> +	} else
> +		rq->vd_ve_aux_inv = NULL;
>
>  	if (mode & EMIT_INVALIDATE)
>  		*cs++ = preparser_disable(false);
> diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> index 1c602d4ae297..a018de6dcac5 100644
> --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> @@ -1258,6 +1258,34 @@ static bool completed(const struct i915_request *rq)
>  	return __i915_request_is_complete(rq);
>  }
>
> +static i915_reg_t aux_inv_reg(const struct intel_engine_cs *engine)
> +{
> +	static const i915_reg_t vd[] = {
> +		GEN12_VD0_AUX_NV,
> +		GEN12_VD1_AUX_NV,
> +		GEN12_VD2_AUX_NV,
> +		GEN12_VD3_AUX_NV,
> +	};
> +
> +	static const i915_reg_t ve[] = {
> +		GEN12_VE0_AUX_NV,
> +		GEN12_VE1_AUX_NV,
> +	};
> +
> +	if (engine->class == VIDEO_DECODE_CLASS) {
> +		GEM_BUG_ON(engine->instance >= ARRAY_SIZE(vd));
> +		return vd[engine->insta
Re: [Intel-gfx] [igt-dev] [PATCH i-g-t] lib/igt_device: Add support for accessing unbound VF PCI devices
Quoting Janusz Krzysztofik (2022-02-18 17:08:41)
> Hi Chris,
>
> On Friday, 18 February 2022 17:03:01 CET Chris Wilson wrote:
> > Quoting Janusz Krzysztofik (2022-02-18 15:19:35)
> > > @@ -206,15 +229,19 @@ static struct pci_device *__igt_device_get_pci_device(int fd)
> > >  		igt_warn("Couldn't find PCI device %04x:%02x:%02x:%02x\n",
> > >  			 pci_addr.domain, pci_addr.bus,
> > >  			 pci_addr.device, pci_addr.function);
> > > -		return NULL;
> > > +		goto cleanup;
> > >  	}
> > >
> > >  	if (pci_device_probe(pci_dev)) {
> > >  		igt_warn("Couldn't probe PCI device\n");
> > > -		return NULL;
> > > +		goto cleanup;
> > >  	}
> > >
> > >  	return pci_dev;
> > > +
> > > +cleanup:
> > > +	pci_system_cleanup();
> >
> > This is a global cleanup of libpciaccess iirc, such that if anyone else
> > was using the library they would be affected.
>
> Right, but shouldn't we also drop pci_system_init() from here and request
> users to manage initialization and cleanup of that data themselves? On each
> call pci_system_init() abandons existing data and overwrites a pointer to it
> with that of newly allocated memory, then tests calling
> igt_device_get_pci_device() multiple times are going to suffer from
> significant memory leaking.

Right, I thought it only inited once -- I just remember the issue with
calling pci_system_cleanup() while others were still using it.

Stick the call to init in an __attribute__((constructor)) or
pthread_once.
-Chris
Re: [Intel-gfx] [igt-dev] [PATCH i-g-t] lib/igt_device: Add support for accessing unbound VF PCI devices
Quoting Janusz Krzysztofik (2022-02-18 15:19:35)
> @@ -206,15 +229,19 @@ static struct pci_device *__igt_device_get_pci_device(int fd)
>  		igt_warn("Couldn't find PCI device %04x:%02x:%02x:%02x\n",
>  			 pci_addr.domain, pci_addr.bus,
>  			 pci_addr.device, pci_addr.function);
> -		return NULL;
> +		goto cleanup;
>  	}
>
>  	if (pci_device_probe(pci_dev)) {
>  		igt_warn("Couldn't probe PCI device\n");
> -		return NULL;
> +		goto cleanup;
>  	}
>
>  	return pci_dev;
> +
> +cleanup:
> +	pci_system_cleanup();

This is a global cleanup of libpciaccess iirc, such that if anyone else
was using the library they would be affected.

> +	return NULL;
>  }
Re: [Intel-gfx] [PATCH 0/2] Backport upstream commit e49a8b2cc852
Quoting Janusz Krzysztofik (2021-12-03 13:21:06)
> diff --git a/0001-drm-i915-gt-Cleanup-partial-engine-discovery-failure.patch b/0001-drm-i915-gt-Cleanup-partial-engine-discovery-failure.patch
> index efadd30d8cad..62b0a41d4aa4 100644
> --- a/0001-drm-i915-gt-Cleanup-partial-engine-discovery-failure.patch
> +++ b/0001-drm-i915-gt-Cleanup-partial-engine-discovery-failure.patch
> @@ -8,20 +8,20 @@ some engines will be fully setup and some not. Those incompletely setup
>  engines only have 'engine->release == NULL' and so will leak any of the
>  common objects allocated.
>
> -Incorporates some suggestions from Janusz for handling pinned context
> -cleanup.
> +v2: no longer incorporates suggestions from Janusz for handling pinned
> +context cleanup since upstream version has been backported

Noting the absence of something not explained just adds confusion.
You've handled this appropriately as a standalone patch, no more needs
to be said here (and the chunk doesn't even appear in the patch
anymore).

The covernote explained why this chunk disappears, so we have a story
for the pile, which is a different story from that told by the series
of individual patches.

Reviewed-by: Chris Wilson
-Chris
Re: [Intel-gfx] [PATCH 2/2] drm/i915: Rename gt to gt0
Quoting Andi Shyti (2021-11-17 13:34:56)
> diff --git a/drivers/gpu/drm/i915/display/intel_atomic_plane.c b/drivers/gpu/drm/i915/display/intel_atomic_plane.c
> index 089fb4658b216..0bbf8c0c42eac 100644
> --- a/drivers/gpu/drm/i915/display/intel_atomic_plane.c
> +++ b/drivers/gpu/drm/i915/display/intel_atomic_plane.c
> @@ -817,7 +817,7 @@ intel_prepare_plane_fb(struct drm_plane *_plane,
>  	 * maximum clocks following a vblank miss (see do_rps_boost()).
>  	 */
>  	if (!state->rps_interactive) {
> -		intel_rps_mark_interactive(&dev_priv->gt.rps, true);
> +		intel_rps_mark_interactive(&dev_priv->gt0.rps, true);

This should be across all gt, so probably wants a fresh interface that
takes i915 and does for_each_gt in a later patch. (Since we could be
rendering on a remote tile to present on a display.)

>  		state->rps_interactive = true;
>  	}
>
> @@ -851,7 +851,7 @@ intel_cleanup_plane_fb(struct drm_plane *plane,
>  		return;
>
>  	if (state->rps_interactive) {
> -		intel_rps_mark_interactive(&dev_priv->gt.rps, false);
> +		intel_rps_mark_interactive(&dev_priv->gt0.rps, false);
>  		state->rps_interactive = false;
>  	}
>
> diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c
> index 0ceee8ac66717..d4fcd8f236476 100644
> --- a/drivers/gpu/drm/i915/display/intel_display.c
> +++ b/drivers/gpu/drm/i915/display/intel_display.c
> @@ -838,7 +838,7 @@ __intel_display_resume(struct drm_device *dev,
>  static bool gpu_reset_clobbers_display(struct drm_i915_private *dev_priv)
>  {
>  	return (INTEL_INFO(dev_priv)->gpu_reset_clobbers_display &&
> -		intel_has_gpu_reset(&dev_priv->gt));
> +		intel_has_gpu_reset(&dev_priv->gt0));

All these display consumers probably want to use dev_priv->ggtt->vm.gt,
since the scanout capable GGTT would seem to be the defining feature.

to_scanout_gt(i915) ?

>  static bool pxp_is_borked(struct drm_i915_gem_object *obj)
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> index ebd775cb1661c..c62253d0af044 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> @@ -237,7 +237,7 @@ static int proto_context_set_persistence(struct drm_i915_private *i915,
>  	 * colateral damage, and we should not pretend we can by
>  	 * exposing the interface.
>  	 */
> -	if (!intel_has_reset_engine(&i915->gt))
> +	if (!intel_has_reset_engine(&i915->gt0))
>  		return -ENODEV;

Prep for all gt. A lot of these need an all-gt interface so we don't
have for_each_gt spread all over the place.

> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
> index ef22d4ed66ad6..69ad407eb15f3 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
> @@ -166,7 +166,7 @@ static struct dma_fence *i915_ttm_accel_move(struct ttm_buffer_object *bo,
>  	enum i915_cache_level src_level, dst_level;
>  	int ret;
>
> -	if (!i915->gt.migrate.context || intel_gt_is_wedged(&i915->gt))
> +	if (!i915->gt0.migrate.context || intel_gt_is_wedged(&i915->gt0))

This should already be looking at lmem->gt

> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c b/drivers/gpu/drm/i915/gt/intel_engine_user.c
> index 8f8bea08e734d..176ea5c7d422f 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_user.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c
> @@ -116,7 +116,7 @@ static void set_scheduler_caps(struct drm_i915_private *i915)
>  		disabled |= (I915_SCHEDULER_CAP_ENABLED |
>  			     I915_SCHEDULER_CAP_PRIORITY);
>
> -	if (intel_uc_uses_guc_submission(&i915->gt.uc))
> +	if (intel_uc_uses_guc_submission(&i915->gt0.uc))

This shouldn't be looking at gt at all, but if it must, that
information must be coming via engine->gt. Kind of renders the mapping
moot currently.

> diff --git a/drivers/gpu/drm/i915/gt/intel_rps.c b/drivers/gpu/drm/i915/gt/intel_rps.c
> index 07ff7ba7b2b71..63089e671a242 100644
> --- a/drivers/gpu/drm/i915/gt/intel_rps.c
> +++ b/drivers/gpu/drm/i915/gt/intel_rps.c
> @@ -2302,7 +2302,7 @@ unsigned long i915_read_mch_val(void)
>  		return 0;
>
>  	with_intel_runtime_pm(&i915->runtime_pm, wakeref) {
> -		struct intel_ips *ips = &i915->gt.rps.ips;
> +		struct intel_ips *ips = &i915->gt0.rps.ips;

Make mchdev_get() return the gt or rps, at the slight cost of making
the drm_dev_put() more complicated (but can be pushed into a mchdev_put
for symmetry).

> diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c
> b/drivers/gp
Re: [Intel-gfx] [PATCH] drm/i915: Avoid div-by-zero on gen2
Quoting Ville Syrjälä (2021-03-22 14:48:44)
> On Sun, Mar 21, 2021 at 04:30:32PM +0000, Chris Wilson wrote:
> > Quoting Chris Wilson (2021-03-21 16:28:07)
> > > Quoting Ville Syrjala (2021-03-21 16:10:38)
> > > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> > > > index ec28a6cde49b..0b2434e29d00 100644
> > > > --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> > > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> > > > @@ -189,7 +189,7 @@ compute_partial_view(const struct drm_i915_gem_object *obj,
> > > >  	struct i915_ggtt_view view;
> > > >
> > > >  	if (i915_gem_object_is_tiled(obj))
> > > > -		chunk = roundup(chunk, tile_row_pages(obj));
> > > > +		chunk = roundup(chunk, tile_row_pages(obj) ?: 1);
> > >
> > > I was thinking the answer would be to align to the next page, and hey
> > > presto!
> >
> > Wait, the tile row cannot be a single page. Something else is zero that
> > should not be.
>
> How come? At least i915_tiling_ok() doesn't enforce any
> bigger lower bound.

This may be the trap I'm falling into, thinking that all arch have at
least 4K tile rows. Some might say, "shouldn't the chunk be aligned to
an even tile row" as well, but I was never certain about that.
-Chris
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/i915: Avoid div-by-zero on gen2
Quoting Chris Wilson (2021-03-21 16:30:32)
> Quoting Chris Wilson (2021-03-21 16:28:07)
> > Quoting Ville Syrjala (2021-03-21 16:10:38)
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> > > index ec28a6cde49b..0b2434e29d00 100644
> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> > > @@ -189,7 +189,7 @@ compute_partial_view(const struct drm_i915_gem_object *obj,
> > >  	struct i915_ggtt_view view;
> > >
> > >  	if (i915_gem_object_is_tiled(obj))
> > > -		chunk = roundup(chunk, tile_row_pages(obj));
> > > +		chunk = roundup(chunk, tile_row_pages(obj) ?: 1);
> >
> > I was thinking the answer would be to align to the next page, and hey
> > presto!
>
> Wait, the tile row cannot be a single page. Something else is zero that
> should not be.

Which fortunately does not matter here, as we start with a 2MiB chunk
and want to align that to a multiple of tile rows. Still, as you said,
something stinks.
-Chris
Re: [Intel-gfx] [PATCH] drm/i915: Avoid div-by-zero on gen2
Quoting Chris Wilson (2021-03-21 16:28:07)
> Quoting Ville Syrjala (2021-03-21 16:10:38)
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> > index ec28a6cde49b..0b2434e29d00 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> > @@ -189,7 +189,7 @@ compute_partial_view(const struct drm_i915_gem_object *obj,
> >  	struct i915_ggtt_view view;
> >
> >  	if (i915_gem_object_is_tiled(obj))
> > -		chunk = roundup(chunk, tile_row_pages(obj));
> > +		chunk = roundup(chunk, tile_row_pages(obj) ?: 1);
>
> I was thinking the answer would be to align to the next page, and hey
> presto!

Wait, the tile row cannot be a single page. Something else is zero that
should not be.
-Chris
Re: [Intel-gfx] [PATCH] drm/i915: Avoid div-by-zero on gen2
Quoting Ville Syrjala (2021-03-21 16:10:38)
> From: Ville Syrjälä
>
> Gen2 tiles are 2KiB in size so i915_gem_object_get_tile_row_size()
> can in fact return <4KiB, which leads to div-by-zero here.
> Avoid that.

So long as we overestimate it is fine, since we only care to find a
suitably small chunk that is large enough. I thought it was
overestimating, oh well.

> Not sure i915_gem_object_get_tile_row_size() is entirely
> sane anyway since it doesn't account for the different tile
> layouts on i8xx/i915...

It should not matter so long as we pick a common divisor, suitable for
the fence register.

> I'm not able to hit this before commit 6846895fde05 ("drm/i915:
> Replace PIN_NONFAULT with calls to PIN_NOEVICT") and it looks
> like I also need to run recent version of Mesa. With those in
> place xonotic trips on this quite easily on my 85x.

NOEVICT will make it much less eager to remove older bindings, with the
preference then to use smaller views of objects. The theory being that
the workingset is less than the whole object, so we can fit more active
pages in and cause less thrashing when moving the unused pages around
in the GTT.

> Cc: Chris Wilson
> Signed-off-by: Ville Syrjälä
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_mman.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> index ec28a6cde49b..0b2434e29d00 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> @@ -189,7 +189,7 @@ compute_partial_view(const struct drm_i915_gem_object *obj,
>  	struct i915_ggtt_view view;
>
>  	if (i915_gem_object_is_tiled(obj))
> -		chunk = roundup(chunk, tile_row_pages(obj));
> +		chunk = roundup(chunk, tile_row_pages(obj) ?: 1);

I was thinking the answer would be to align to the next page, and hey
presto!

Reviewed-by: Chris Wilson
-Chris
Re: [Intel-gfx] [PATCH] i915: Drop relocation support on all new hardware (v3)
Quoting Zbigniew Kempczyński (2021-03-11 11:44:32)
> On Wed, Mar 10, 2021 at 03:50:07PM -0600, Jason Ekstrand wrote:
> > The Vulkan driver in Mesa for Intel hardware never uses relocations if
> > it's running on a version of i915 that supports at least softpin which
> > all versions of i915 supporting Gen12 do. On the OpenGL side, Gen12+ is
> > only supported by iris which never uses relocations. The older i965
> > driver in Mesa does use relocations but it only supports Intel hardware
> > through Gen11 and has been deprecated for all hardware Gen9+. The
> > compute driver also never uses relocations. This only leaves the media
> > driver which is supposed to be switching to softpin going forward.
> > Making softpin a requirement for all future hardware seems reasonable.
> >
> > Rejecting relocations starting with Gen12 has the benefit that we don't
> > have to bother supporting it on platforms with local memory. Given how
> > much CPU touching of memory is required for relocations, not having to
> > do so on platforms where not all memory is directly CPU-accessible
> > carries significant advantages.
> >
> > v2 (Jason Ekstrand):
> >   - Allow TGL-LP platforms as they've already shipped
> >
> > v3 (Jason Ekstrand):
> >   - WARN_ON platforms with LMEM support in case the check is wrong
>
> I was asked to review of this patch. It works along with expected
> IGT check https://patchwork.freedesktop.org/patch/423361/?series=82954&rev=25
>
> Before I'll give you r-b - isn't i915_gem_execbuffer2_ioctl() better place
> to do for loop just after copy_from_user() and check relocation_count?
> We have an access to exec2_list there, we know the gen so we're able to say
> relocations are not supported immediate, without entering
> i915_gem_do_execbuffer().

There's a NORELOC flag you can enforce as mandatory. That's trivial for
userspace to set, really makes sure they are aware of the change afoot,
and i915_gem_check_execbuffer() will perform the validation upfront
with the other flag checks.
-Chris
Re: [Intel-gfx] [PATCH] Revert "drm/i915: Propagate errors on awaiting already signaled fences"
Quoting Daniel Vetter (2021-03-11 16:01:46)
> On Fri, Mar 05, 2021 at 11:05:46AM -0600, Jason Ekstrand wrote:
> > This reverts commit 9e31c1fe45d555a948ff66f1f0e3fe1f83ca63f7. Ever
> > since that commit, we've been having issues where a hang in one client
> > can propagate to another. In particular, a hang in an app can propagate
> > to the X server which causes the whole desktop to lock up.
> >
> > Signed-off-by: Jason Ekstrand
> > Reported-by: Marcin Slusarz
> > Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/3080
> > Fixes: 9e31c1fe45d5 ("drm/i915: Propagate errors on awaiting already
> > signaled fences")
>
> Yeah I suggested to just go with the revert, so I guess on my to give you
> the explainer to be added to the commit message.

If you took the patch this was copied from and only revert on the
dma-resv, things do not break horribly.

> Error propagation along fences sound like a good idea, but as your bug
> shows, surprising consequences, since propagating errors across security
> boundaries is not a good thing.
>
> What we do have is track the hangs on the ctx, and report information to
> userspace using RESET_STATS.

And by the fence, which is far more precise.

> That's how arb_robustness works. Also, if my
> understanding is still correct, the EIO from execbuf is when your context
> is banned (because not recoverable or too many hangs). And in all these
> cases it's up to userspace to figure out what is all impacted and should
> be reported to the application, that's not on the kernel to guess and
> automatically propagate.
>
> What's more, we're also building more features on top of ctx error
> reporting with RESET_STATS ioctl: Encrypted buffers use the same, and the
> userspace fence wait also relies on that mechanism. So it is the path
> going forward for reporting gpu hangs and resets to userspace.

That ioctl is not a solid basis, it never did quite work as expected
and we kept realising the limitations of the information and the
accuracy that it could report.

> So all together that's why I think we should just bury this idea again as
> not quite the direction we want to go to, hence why I think the revert is
> the right option here.

No, as shown by igt it's a critical issue that we have to judiciously
choose which errors to ignore. And it's not just the ability to subvert
gen7 and gen9, it's the error tracking employed for preempting contexts
among others. Hence go with the original patch to undo the propagation
along dma-resv.
-Chris
Re: [Intel-gfx] [PATCH v3] drm/i915: Enable WaProgramMgsrForCorrectSliceSpecificMmioReads for Gen9
Quoting Tvrtko Ursulin (2021-03-10 10:19:12)
>
> Hi,
>
> On 08/03/2021 17:32, Chiou, Cooper wrote:
> > I've tested on GLK, KBL, CFL Intel NUC devices and got the following
> > performance results, there is no performance regression per my testing.
> >
> > Patch: [v5] drm/i915: Enable WaProgramMgsrForCorrectSliceSpecificMmioReads for Gen9
> > Test suite: phoronix-test-suite.supertuxkart.1024x768.Fullscreen.Ultimate.1.GranParadisoIsland.frames_per_second
> > Kernel version: 5.12.0-rc1 (drm-tip)
> >
> > a. Device: Intel NUC kit NUC7JY Gemini Lake Celeron J4005 @2.7GHz (2 Cores)
> >    Without patch, fps=57.45
> >    With patch, fps=57.49
> > b. Device: Intel NUC kit NUC8BEH Coffee Lake Core i3-8109U @3.6GHz (4 Cores)
> >    Without patch, fps=117.23
> >    With patch, fps=117.27
> > c. Device: Intel NUC kit NUC7i3BNH Kaby Lake Core i3-7100U @2.4GHz (4 Cores)
> >    Without patch, fps=114.05
> >    With patch, fps=114.34
> >
> > Meanwhile, Intel lkp team has validated performance on lkp-kbl-nuc1 and no
> > regression.
> > f69d02e37a85645a  d912096c40cdc3bc9364966971  testcase/testparams/testbox
> > --  ---
> >     %stddev      change      %stddev
> >         \           |           \
> >     29.79         29.67  phoronix-test-suite/performance-true-Fullscreen-Ultimate-1-Gran_Paradiso_Island__Approxima-supertuxkart-1.5.2-ucode=0xde/lkp-kbl-nuc1
> >     29.79         29.67  GEO-MEAN phoronix-test-suite.supertuxkart.1280x1024.Fullscreen.Ultimate.1.GranParadisoIsland.frames_per_second
>
> CI results are green so that is good.
>
> Do the machines used for performance testing include unusual fusing?
> Worrying thing is that we were never able to reproduce the reported
> regression in house due lack of identical machine, right? Although I
> guess avoiding hangs trumps performance.

The issue is that if the regression is reproducible it means that the
broadcast mask is no longer correct (or never was, one or the other ;)
And another w/a is going astray because it depends on the previous
undefined value of the mcr.

Which raises the question as to whether the hang prevention seen here
is also because some other w/a (or other mmio) is not being applied to
the relevant units. Or vice versa. Either way there remains an
underlying issue in that some register writes for gen9 require mcr
being set that we are not handling correctly. Changing the mask here
changing results elsewhere indicates that the issues are not fully
addressed, and the fear is that undoing some other mmio is going to
introduce other subtle hangs. And we are all blindly poking at the
issue as no one has access to the affected skus.

What would be useful is if we print the value before changing it so
that we can see if we have any machines in CI where we are making
significant changes to the broadcast mask.
-Chris
Re: [Intel-gfx] [PATCH] gpu: drm: i915: fix error return code of igt_buddy_alloc_smoke()
Quoting Jia-Ju Bai (2021-03-08 08:59:52)
> When i915_random_order() returns NULL to order, no error return code of
> igt_buddy_alloc_smoke() is assigned.
> To fix this bug, err is assigned with -EINVAL in this case.

It would not be EINVAL since that is used for a reference failure, but
in this case the idea was to return 0 as no testing was done and the
ENOMEM was raised before testing began, i.e. not an internal and
unexpected driver allocation failure.
-Chris
Re: [Intel-gfx] [PATCH] gpu: drm: i915: fix error return code of igt_threaded_blt()
Quoting Jia-Ju Bai (2021-03-08 09:07:22)
> When kcalloc() returns NULL to tsk or thread, no error code of
> igt_threaded_blt() is returned.
> To fix this bug, -ENOMEM is returned as error code.

Because we decided to skip the test if it could not be run due to
insufficient memory, as opposed to an internal allocation failure from
the driver.
-Chris
Re: [Intel-gfx] [PATCH] Revert "drm/i915: Propagate errors on awaiting already signaled fences"
Quoting Jason Ekstrand (2021-03-05 17:05:46)
> This reverts commit 9e31c1fe45d555a948ff66f1f0e3fe1f83ca63f7. Ever
> since that commit, we've been having issues where a hang in one client
> can propagate to another. In particular, a hang in an app can propagate
> to the X server which causes the whole desktop to lock up.

The fence error handling is required to prevent users circumventing
incomplete work, such as security validation or escaping isolation.
-Chris
Re: [Intel-gfx] [PATCH v3] drm/i915: Enable WaProgramMgsrForCorrectSliceSpecificMmioReads for Gen9
Quoting Chris Wilson (2021-03-05 12:20:45)
> Quoting Tvrtko Ursulin (2021-03-05 09:23:02)
> > I am not sure if PC8 and DMC could also be involved from what Cooper was
> > saying in a different thread. Maybe another CI run without the DMC, both
> > ffs and fls. Another for limiting cstates.
>
> Disabling the dmc leaves the display code in an inconsistent state so we
> don't complete a BAT run; but since the warnings are thrown during boot
> we can say that disabling the dmc does clear up the workaround issues on
> ehl/jsl:
>
> https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_7612/fi-ehl-2/boot0.txt
> https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_7612/fi-jsl-1/boot0.txt

Fwiw, disabling the dmc and using fls for gen9 is not enough to avoid
the warnings though:

https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_7614/fi-cfl-8109u/igt@i915_selftest@live@memory_region.html

So far only ffs works for gen9 mcr.
-Chris
Re: [Intel-gfx] [PATCH v3] drm/i915: Enable WaProgramMgsrForCorrectSliceSpecificMmioReads for Gen9
Quoting Tvrtko Ursulin (2021-03-05 09:23:02)
> I am not sure if PC8 and DMC could also be involved from what Cooper was
> saying in a different thread. Maybe another CI run without the DMC, both
> ffs and fls. Another for limiting cstates.

Disabling the dmc leaves the display code in an inconsistent state so
we don't complete a BAT run; but since the warnings are thrown during
boot we can say that disabling the dmc does clear up the workaround
issues on ehl/jsl:

https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_7612/fi-ehl-2/boot0.txt
https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_7612/fi-jsl-1/boot0.txt
-Chris
Re: [Intel-gfx] [PATCH v2 6/7] drm/i915: rename DISP_STEPPING->DISPLAY_STEP and GT_STEPPING->GT_STEP
Quoting Jani Nikula (2021-02-24 08:46:55)
> On Tue, 23 Feb 2021, Lucas De Marchi wrote:
> > On Tue, Feb 23, 2021 at 05:35:11PM +0200, Jani Nikula wrote:
> >> Matter of taste. STEP matches the enums.
> >>
> >> Signed-off-by: Jani Nikula
> >> ---
> >>  drivers/gpu/drm/i915/display/intel_display_power.c |  2 +-
> >>  drivers/gpu/drm/i915/display/intel_psr.c           |  4 ++--
> >>  drivers/gpu/drm/i915/display/skl_universal_plane.c |  2 +-
> >>  drivers/gpu/drm/i915/gt/intel_workarounds.c        | 10 +-
> >>  drivers/gpu/drm/i915/i915_drv.h                    | 10 +-
> >>  drivers/gpu/drm/i915/intel_device_info.c           |  2 +-
> >>  drivers/gpu/drm/i915/intel_pm.c                    |  2 +-
> >>  7 files changed, 16 insertions(+), 16 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/i915/display/intel_display_power.c b/drivers/gpu/drm/i915/display/intel_display_power.c
> >> index f00c1750febd..1f7b2700947a 100644
> >> --- a/drivers/gpu/drm/i915/display/intel_display_power.c
> >> +++ b/drivers/gpu/drm/i915/display/intel_display_power.c
> >> @@ -5349,7 +5349,7 @@ static void tgl_bw_buddy_init(struct drm_i915_private *dev_priv)
> >>
> >>  	if (IS_ALDERLAKE_S(dev_priv) ||
> >>  	    IS_DG1_REVID(dev_priv, DG1_REVID_A0, DG1_REVID_A0) ||
> >> -	    IS_TGL_DISP_STEPPING(dev_priv, STEP_A0, STEP_B0))
> >> +	    IS_TGL_DISPLAY_STEP(dev_priv, STEP_A0, STEP_B0))
> >>  		/* Wa_1409767108:tgl,dg1,adl-s */
> >>  		table = wa_1409767108_buddy_page_masks;
> >>  	else
> >> diff --git a/drivers/gpu/drm/i915/display/intel_psr.c b/drivers/gpu/drm/i915/display/intel_psr.c
> >> index 7c6e561f86c1..da5084b54eb6 100644
> >> --- a/drivers/gpu/drm/i915/display/intel_psr.c
> >> +++ b/drivers/gpu/drm/i915/display/intel_psr.c
> >> @@ -548,7 +548,7 @@ static void hsw_activate_psr2(struct intel_dp *intel_dp)
> >>
> >>  	if (intel_dp->psr.psr2_sel_fetch_enabled) {
> >>  		/* WA 1408330847 */
> >> -		if (IS_TGL_DISP_STEPPING(dev_priv, STEP_A0, STEP_A0) ||
> >> +		if (IS_TGL_DISPLAY_STEP(dev_priv, STEP_A0, STEP_A0) ||
> >
> > I always hated the DISP vs DISPLAY. It should be in the commit message.
> >
> > But if you are doing the s/STEPPING/STEP/, shouldn't the filename also use
> > step and all the functions/structs?
>
> To be honest, the rename came as an afterthought, after Aditya (I think)
> added the STEP_X enums.
>
> For me step everywhere sounds good, I wonder what the native speakers
> think.

IS_DISPLAY_STEPPING(STEP_X) is more flamboyant than
IS_DISPLAY_STEP(STEP_X), but we often make the concession for brevity
and in this case the consistency between macro and enum beats the
inconsistency in English. So STEP reads as a perfectly acceptable
synonym for STEPPING.
-Chris
Re: [Intel-gfx] [PATCH v3] drm/i915: Enable WaProgramMgsrForCorrectSliceSpecificMmioReads for Gen9
Quoting Chris Wilson (2021-03-04 11:56:16) > Quoting Chris Wilson (2021-03-04 09:19:24) > > Quoting Tvrtko Ursulin (2021-03-04 09:12:26) > > > > > > On 02/03/2021 06:27, Cooper Chiou wrote: > > > > WaProgramMgsrForCorrectSliceSpecificMmioReads applies for Gen9 to > > > > resolve VP8 hardware encoding system hang up on GT1 sku for > > > > ChromiumOS projects > > > > > > > > Slice specific MMIO read inaccurate so MGSR needs to be programmed > > > > appropriately to get correct reads from these slicet-related MMIOs. > > > > > > > > It dictates that before any MMIO read into Slice/Subslice specific > > > > registers, MCR packet control register(0xFDC) needs to be programmed > > > > to point to any enabled slice/subslice pair, especially GT1 fused sku > > > > since this issue can be reproduced on VP8 hardware encoding via ffmpeg > > > > on ChromiumOS devices. > > > > When exit PC7, MGSR will reset so that we have to skip fused subslice > > > > ID. > > > > > > > > Reference: HSD#1508045018,1405586840, BSID#0575 > > > > > > > > Cc: Ville Syrjälä > > > > Cc: Rodrigo Vivi > > > > Cc: Jani Nikula > > > > Cc: Chris Wilson > > > > Cc: Tvrtko Ursulin > > > > Cc: William Tseng > > > > Cc: Lee Shawn C > > > > > > > > Signed-off-by: Cooper Chiou > > > > --- > > > > drivers/gpu/drm/i915/gt/intel_workarounds.c | 38 + > > > > 1 file changed, 38 insertions(+) > > > > > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c > > > > b/drivers/gpu/drm/i915/gt/intel_workarounds.c > > > > index 3b4a7da60f0b..4ad598a727a6 100644 > > > > --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c > > > > +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c > > > > @@ -878,9 +878,47 @@ hsw_gt_workarounds_init(struct drm_i915_private > > > > *i915, struct i915_wa_list *wal) > > > > wa_write_clr(wal, GEN7_FF_THREAD_MODE, GEN7_FF_VS_REF_CNT_FFME); > > > > } > > > > > > > > +static void > > > > +gen9_wa_init_mcr(struct drm_i915_private *i915, struct i915_wa_list > > > > *wal) > > > > +{ > > > > + 
const struct sseu_dev_info *sseu = &i915->gt.info.sseu; > > > > + unsigned int slice, subslice; > > > > + u32 mcr, mcr_mask; > > > > + > > > > + GEM_BUG_ON(INTEL_GEN(i915) < 9); > > > > + > > > > + /* > > > > + * WaProgramMgsrForCorrectSliceSpecificMmioReads:glk,kbl,cml > > > > + * Before any MMIO read into slice/subslice specific registers, > > > > MCR > > > > + * packet control register needs to be programmed to point to any > > > > + * enabled s/ss pair. Otherwise, incorrect values will be > > > > returned. > > > > + * This means each subsequent MMIO read will be forwarded to an > > > > + * specific s/ss combination, but this is OK since these registers > > > > + * are consistent across s/ss in almost all cases. In the rare > > > > + * occasions, such as INSTDONE, where this value is dependent > > > > + * on s/ss combo, the read should be done with read_subslice_reg. > > > > + */ > > > > + slice = fls(sseu->slice_mask) - 1; > > > > + GEM_BUG_ON(slice >= ARRAY_SIZE(sseu->subslice_mask)); > > > > + subslice = fls(intel_sseu_get_subslices(sseu, slice)); > > > > + GEM_BUG_ON(!subslice); > > > > + subslice--; > > > > + > > > > + mcr = GEN8_MCR_SLICE(slice) | GEN8_MCR_SUBSLICE(subslice); > > > > + mcr_mask = GEN8_MCR_SLICE_MASK | GEN8_MCR_SUBSLICE_MASK; > > > > + > > > > + drm_dbg(&i915->drm, "MCR slice:%d/subslice:%d = %x\n", slice, > > > > subslice, mcr); > > > > + > > > > + wa_write_clr_set(wal, GEN8_MCR_SELECTOR, mcr_mask, mcr); > > > > +} > > > > > > Have you considered reusing existing wa_init_mcr? Just needs the > > > top-level assert changed and otherwise it looks it would do the right > > > thing for Gen9. Advantage being a smaller patch and less code to carry. > > > > That was the first patch, and fails for the same reason. The problem > > would appear to be in the mcr_mask for gt3. > > For the record, it appears to be an issue with fls vs ffs. Switching to > ffs also removes the warnings for workaround failures on ehl/jsl. 
Of course the icl in farm2 started spewing warnings, but the other icl in farm1 were happy. -Chris
Re: [Intel-gfx] [PATCH v3] drm/i915: Enable WaProgramMgsrForCorrectSliceSpecificMmioReads for Gen9
Quoting Chris Wilson (2021-03-04 09:19:24) > Quoting Tvrtko Ursulin (2021-03-04 09:12:26) > > > > On 02/03/2021 06:27, Cooper Chiou wrote: > > > WaProgramMgsrForCorrectSliceSpecificMmioReads applies for Gen9 to > > > resolve VP8 hardware encoding system hang up on GT1 sku for > > > ChromiumOS projects > > > > > > Slice specific MMIO read inaccurate so MGSR needs to be programmed > > > appropriately to get correct reads from these slicet-related MMIOs. > > > > > > It dictates that before any MMIO read into Slice/Subslice specific > > > registers, MCR packet control register(0xFDC) needs to be programmed > > > to point to any enabled slice/subslice pair, especially GT1 fused sku > > > since this issue can be reproduced on VP8 hardware encoding via ffmpeg > > > on ChromiumOS devices. > > > When exit PC7, MGSR will reset so that we have to skip fused subslice ID. > > > > > > Reference: HSD#1508045018,1405586840, BSID#0575 > > > > > > Cc: Ville Syrjälä > > > Cc: Rodrigo Vivi > > > Cc: Jani Nikula > > > Cc: Chris Wilson > > > Cc: Tvrtko Ursulin > > > Cc: William Tseng > > > Cc: Lee Shawn C > > > > > > Signed-off-by: Cooper Chiou > > > --- > > > drivers/gpu/drm/i915/gt/intel_workarounds.c | 38 + > > > 1 file changed, 38 insertions(+) > > > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c > > > b/drivers/gpu/drm/i915/gt/intel_workarounds.c > > > index 3b4a7da60f0b..4ad598a727a6 100644 > > > --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c > > > +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c > > > @@ -878,9 +878,47 @@ hsw_gt_workarounds_init(struct drm_i915_private > > > *i915, struct i915_wa_list *wal) > > > wa_write_clr(wal, GEN7_FF_THREAD_MODE, GEN7_FF_VS_REF_CNT_FFME); > > > } > > > > > > +static void > > > +gen9_wa_init_mcr(struct drm_i915_private *i915, struct i915_wa_list *wal) > > > +{ > > > + const struct sseu_dev_info *sseu = &i915->gt.info.sseu; > > > + unsigned int slice, subslice; > > > + u32 mcr, mcr_mask; > > > + > > > + 
GEM_BUG_ON(INTEL_GEN(i915) < 9); > > > + > > > + /* > > > + * WaProgramMgsrForCorrectSliceSpecificMmioReads:glk,kbl,cml > > > + * Before any MMIO read into slice/subslice specific registers, MCR > > > + * packet control register needs to be programmed to point to any > > > + * enabled s/ss pair. Otherwise, incorrect values will be returned. > > > + * This means each subsequent MMIO read will be forwarded to an > > > + * specific s/ss combination, but this is OK since these registers > > > + * are consistent across s/ss in almost all cases. In the rare > > > + * occasions, such as INSTDONE, where this value is dependent > > > + * on s/ss combo, the read should be done with read_subslice_reg. > > > + */ > > > + slice = fls(sseu->slice_mask) - 1; > > > + GEM_BUG_ON(slice >= ARRAY_SIZE(sseu->subslice_mask)); > > > + subslice = fls(intel_sseu_get_subslices(sseu, slice)); > > > + GEM_BUG_ON(!subslice); > > > + subslice--; > > > + > > > + mcr = GEN8_MCR_SLICE(slice) | GEN8_MCR_SUBSLICE(subslice); > > > + mcr_mask = GEN8_MCR_SLICE_MASK | GEN8_MCR_SUBSLICE_MASK; > > > + > > > + drm_dbg(&i915->drm, "MCR slice:%d/subslice:%d = %x\n", slice, > > > subslice, mcr); > > > + > > > + wa_write_clr_set(wal, GEN8_MCR_SELECTOR, mcr_mask, mcr); > > > +} > > > > Have you considered reusing existing wa_init_mcr? Just needs the > > top-level assert changed and otherwise it looks it would do the right > > thing for Gen9. Advantage being a smaller patch and less code to carry. > > That was the first patch, and fails for the same reason. The problem > would appear to be in the mcr_mask for gt3. For the record, it appears to be an issue with fls vs ffs. Switching to ffs also removes the warnings for workaround failures on ehl/jsl. -Chris ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
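The fls()/ffs() distinction behind the fix above is easy to trip over: both return 1-based bit positions (and 0 for a zero argument), but fls() picks the most-significant set bit while ffs() picks the least-significant. A small user-space sketch of that difference — my_ffs()/my_fls() are illustrative stand-ins built on GCC builtins, not the kernel helpers or the i915 code itself:

```c
/* Stand-ins for the kernel's ffs()/fls(): 1-based bit positions, 0 if no bit is set. */
static int my_ffs(unsigned int x)
{
	return x ? __builtin_ffs((int)x) : 0;
}

static int my_fls(unsigned int x)
{
	/* __builtin_clz() is undefined for 0, so guard it explicitly. */
	return x ? 32 - (int)__builtin_clz(x) : 0;
}
```

With a fuse mask of 0x5 (bits 0 and 2 set), my_fls(mask) - 1 selects index 2 while my_ffs(mask) - 1 selects index 0 — exactly the difference between latching onto the highest versus the lowest enabled slice/subslice pair.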
Re: [Intel-gfx] [PATCH] i915/query: Correlate engine and cpu timestamps with better accuracy
Quoting Lionel Landwerlin (2021-03-04 09:45:47) > On 04/03/2021 10:58, Chris Wilson wrote: > > Quoting Lionel Landwerlin (2021-03-04 08:28:59) > >> On 04/03/2021 02:09, Chris Wilson wrote: > >>> Quoting Umesh Nerlige Ramappa (2021-03-03 21:28:00) > >>>> Perf measurements rely on CPU and engine timestamps to correlate > >>>> events of interest across these time domains. Current mechanisms get > >>>> these timestamps separately and the calculated delta between these > >>>> timestamps lack enough accuracy. > >>>> > >>>> To improve the accuracy of these time measurements to within a few us, > >>>> add a query that returns the engine and cpu timestamps captured as > >>>> close to each other as possible. > >>>> > >>>> v2: (Tvrtko) > >>>> - document clock reference used > >>>> - return cpu timestamp always > >>>> - capture cpu time just before lower dword of cs timestamp > >>>> > >>>> v3: (Chris) > >>>> - use uncore-rpm > >>>> - use __query_cs_timestamp helper > >>>> > >>>> v4: (Lionel) > >>>> - Kernel perf subsytem allows users to specify the clock id to be used > >>>> in perf_event_open. This clock id is used by the perf subsystem to > >>>> return the appropriate cpu timestamp in perf events. Similarly, let > >>>> the user pass the clockid to this query so that cpu timestamp > >>>> corresponds to the clock id requested. 
> >>>> > >>>> v5: (Tvrtko) > >>>> - Use normal ktime accessors instead of fast versions > >>>> - Add more uApi documentation > >>>> > >>>> v6: (Lionel) > >>>> - Move switch out of spinlock > >>>> > >>>> v7: (Chris) > >>>> - cs_timestamp is a misnomer, use cs_cycles instead > >>>> - return the cs cycle frequency as well in the query > >>>> > >>>> v8: > >>>> - Add platform and engine specific checks > >>>> > >>>> v9: (Lionel) > >>>> - Return 2 cpu timestamps in the query - captured before and after the > >>>> register read > >>>> > >>>> Signed-off-by: Umesh Nerlige Ramappa > >>>> --- > >>>>drivers/gpu/drm/i915/i915_query.c | 144 ++ > >>>>include/uapi/drm/i915_drm.h | 47 ++ > >>>>2 files changed, 191 insertions(+) > >>>> > >>>> diff --git a/drivers/gpu/drm/i915/i915_query.c > >>>> b/drivers/gpu/drm/i915/i915_query.c > >>>> index fed337ad7b68..acca22ee6014 100644 > >>>> --- a/drivers/gpu/drm/i915/i915_query.c > >>>> +++ b/drivers/gpu/drm/i915/i915_query.c > >>>> @@ -6,6 +6,8 @@ > >>>> > >>>>#include > >>>> > >>>> +#include "gt/intel_engine_pm.h" > >>>> +#include "gt/intel_engine_user.h" > >>>>#include "i915_drv.h" > >>>>#include "i915_perf.h" > >>>>#include "i915_query.h" > >>>> @@ -90,6 +92,147 @@ static int query_topology_info(struct > >>>> drm_i915_private *dev_priv, > >>>> return total_length; > >>>>} > >>>> > >>>> +typedef u64 (*__ktime_func_t)(void); > >>>> +static __ktime_func_t __clock_id_to_func(clockid_t clk_id) > >>>> +{ > >>>> + /* > >>>> +* Use logic same as the perf subsystem to allow user to select > >>>> the > >>>> +* reference clock id to be used for timestamps. > >>>> +*/ > >>>> + switch (clk_id) { > >>>> + case CLOCK_MONOTONIC: > >>>> + return &ktime_get_ns; > >>>> + case CLOCK_MONOTONIC_RAW: > >>>> + return &ktime_get_raw_ns; > >>>> + case CLOCK_REALTIME: > >>>> + return &ktime_get_real_ns; > >>>> + case CLOCK_BOOTTIME: > >>>> + return &ktime_get_boottime_ns; > >>>> + case CLOCK_TAI: > >>>> + return &ktime_get_clocktai_ns; > >>&
Re: [Intel-gfx] [PATCH v3] drm/i915: Enable WaProgramMgsrForCorrectSliceSpecificMmioReads for Gen9
Quoting Tvrtko Ursulin (2021-03-04 09:12:26) > > On 02/03/2021 06:27, Cooper Chiou wrote: > > WaProgramMgsrForCorrectSliceSpecificMmioReads applies for Gen9 to > > resolve VP8 hardware encoding system hang up on GT1 sku for > > ChromiumOS projects > > > > Slice specific MMIO read inaccurate so MGSR needs to be programmed > > appropriately to get correct reads from these slicet-related MMIOs. > > > > It dictates that before any MMIO read into Slice/Subslice specific > > registers, MCR packet control register(0xFDC) needs to be programmed > > to point to any enabled slice/subslice pair, especially GT1 fused sku > > since this issue can be reproduced on VP8 hardware encoding via ffmpeg > > on ChromiumOS devices. > > When exit PC7, MGSR will reset so that we have to skip fused subslice ID. > > > > Reference: HSD#1508045018,1405586840, BSID#0575 > > > > Cc: Ville Syrjälä > > Cc: Rodrigo Vivi > > Cc: Jani Nikula > > Cc: Chris Wilson > > Cc: Tvrtko Ursulin > > Cc: William Tseng > > Cc: Lee Shawn C > > > > Signed-off-by: Cooper Chiou > > --- > > drivers/gpu/drm/i915/gt/intel_workarounds.c | 38 + > > 1 file changed, 38 insertions(+) > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c > > b/drivers/gpu/drm/i915/gt/intel_workarounds.c > > index 3b4a7da60f0b..4ad598a727a6 100644 > > --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c > > +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c > > @@ -878,9 +878,47 @@ hsw_gt_workarounds_init(struct drm_i915_private *i915, > > struct i915_wa_list *wal) > > wa_write_clr(wal, GEN7_FF_THREAD_MODE, GEN7_FF_VS_REF_CNT_FFME); > > } > > > > +static void > > +gen9_wa_init_mcr(struct drm_i915_private *i915, struct i915_wa_list *wal) > > +{ > > + const struct sseu_dev_info *sseu = &i915->gt.info.sseu; > > + unsigned int slice, subslice; > > + u32 mcr, mcr_mask; > > + > > + GEM_BUG_ON(INTEL_GEN(i915) < 9); > > + > > + /* > > + * WaProgramMgsrForCorrectSliceSpecificMmioReads:glk,kbl,cml > > + * Before any MMIO read into slice/subslice 
specific registers, MCR > > + * packet control register needs to be programmed to point to any > > + * enabled s/ss pair. Otherwise, incorrect values will be returned. > > + * This means each subsequent MMIO read will be forwarded to an > > + * specific s/ss combination, but this is OK since these registers > > + * are consistent across s/ss in almost all cases. In the rare > > + * occasions, such as INSTDONE, where this value is dependent > > + * on s/ss combo, the read should be done with read_subslice_reg. > > + */ > > + slice = fls(sseu->slice_mask) - 1; > > + GEM_BUG_ON(slice >= ARRAY_SIZE(sseu->subslice_mask)); > > + subslice = fls(intel_sseu_get_subslices(sseu, slice)); > > + GEM_BUG_ON(!subslice); > > + subslice--; > > + > > + mcr = GEN8_MCR_SLICE(slice) | GEN8_MCR_SUBSLICE(subslice); > > + mcr_mask = GEN8_MCR_SLICE_MASK | GEN8_MCR_SUBSLICE_MASK; > > + > > + drm_dbg(&i915->drm, "MCR slice:%d/subslice:%d = %x\n", slice, > > subslice, mcr); > > + > > + wa_write_clr_set(wal, GEN8_MCR_SELECTOR, mcr_mask, mcr); > > +} > > Have you considered reusing existing wa_init_mcr? Just needs the > top-level assert changed and otherwise it looks it would do the right > thing for Gen9. Advantage being a smaller patch and less code to carry. That was the first patch, and fails for the same reason. The problem would appear to be in the mcr_mask for gt3. -Chris ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] i915/query: Correlate engine and cpu timestamps with better accuracy
Quoting Lionel Landwerlin (2021-03-04 08:28:59) > On 04/03/2021 02:09, Chris Wilson wrote: > > Quoting Umesh Nerlige Ramappa (2021-03-03 21:28:00) > >> Perf measurements rely on CPU and engine timestamps to correlate > >> events of interest across these time domains. Current mechanisms get > >> these timestamps separately and the calculated delta between these > >> timestamps lack enough accuracy. > >> > >> To improve the accuracy of these time measurements to within a few us, > >> add a query that returns the engine and cpu timestamps captured as > >> close to each other as possible. > >> > >> v2: (Tvrtko) > >> - document clock reference used > >> - return cpu timestamp always > >> - capture cpu time just before lower dword of cs timestamp > >> > >> v3: (Chris) > >> - use uncore-rpm > >> - use __query_cs_timestamp helper > >> > >> v4: (Lionel) > >> - Kernel perf subsytem allows users to specify the clock id to be used > >>in perf_event_open. This clock id is used by the perf subsystem to > >>return the appropriate cpu timestamp in perf events. Similarly, let > >>the user pass the clockid to this query so that cpu timestamp > >>corresponds to the clock id requested. 
> >> > >> v5: (Tvrtko) > >> - Use normal ktime accessors instead of fast versions > >> - Add more uApi documentation > >> > >> v6: (Lionel) > >> - Move switch out of spinlock > >> > >> v7: (Chris) > >> - cs_timestamp is a misnomer, use cs_cycles instead > >> - return the cs cycle frequency as well in the query > >> > >> v8: > >> - Add platform and engine specific checks > >> > >> v9: (Lionel) > >> - Return 2 cpu timestamps in the query - captured before and after the > >>register read > >> > >> Signed-off-by: Umesh Nerlige Ramappa > >> --- > >> drivers/gpu/drm/i915/i915_query.c | 144 ++ > >> include/uapi/drm/i915_drm.h | 47 ++ > >> 2 files changed, 191 insertions(+) > >> > >> diff --git a/drivers/gpu/drm/i915/i915_query.c > >> b/drivers/gpu/drm/i915/i915_query.c > >> index fed337ad7b68..acca22ee6014 100644 > >> --- a/drivers/gpu/drm/i915/i915_query.c > >> +++ b/drivers/gpu/drm/i915/i915_query.c > >> @@ -6,6 +6,8 @@ > >> > >> #include > >> > >> +#include "gt/intel_engine_pm.h" > >> +#include "gt/intel_engine_user.h" > >> #include "i915_drv.h" > >> #include "i915_perf.h" > >> #include "i915_query.h" > >> @@ -90,6 +92,147 @@ static int query_topology_info(struct drm_i915_private > >> *dev_priv, > >> return total_length; > >> } > >> > >> +typedef u64 (*__ktime_func_t)(void); > >> +static __ktime_func_t __clock_id_to_func(clockid_t clk_id) > >> +{ > >> + /* > >> +* Use logic same as the perf subsystem to allow user to select the > >> +* reference clock id to be used for timestamps. 
> >> +*/ > >> + switch (clk_id) { > >> + case CLOCK_MONOTONIC: > >> + return &ktime_get_ns; > >> + case CLOCK_MONOTONIC_RAW: > >> + return &ktime_get_raw_ns; > >> + case CLOCK_REALTIME: > >> + return &ktime_get_real_ns; > >> + case CLOCK_BOOTTIME: > >> + return &ktime_get_boottime_ns; > >> + case CLOCK_TAI: > >> + return &ktime_get_clocktai_ns; > >> + default: > >> + return NULL; > >> + } > >> +} > >> + > >> +static inline int > >> +__read_timestamps(struct intel_uncore *uncore, > >> + i915_reg_t lower_reg, > >> + i915_reg_t upper_reg, > >> + u64 *cs_ts, > >> + u64 *cpu_ts, > >> + __ktime_func_t cpu_clock) > >> +{ > >> + u32 upper, lower, old_upper, loop = 0; > >> + > >> + upper = intel_uncore_read_fw(uncore, upper_reg); > >> + do { > >> + cpu_ts[0] = cpu_clock(); > >> + lower = intel_uncore_read_fw(uncore, lower_reg); > >> + cpu_ts[1] = cpu_clock(); > >
Re: [Intel-gfx] [PATCH] i915/query: Correlate engine and cpu timestamps with better accuracy
Quoting Umesh Nerlige Ramappa (2021-03-03 21:28:00) > Perf measurements rely on CPU and engine timestamps to correlate > events of interest across these time domains. Current mechanisms get > these timestamps separately and the calculated delta between these > timestamps lack enough accuracy. > > To improve the accuracy of these time measurements to within a few us, > add a query that returns the engine and cpu timestamps captured as > close to each other as possible. > > v2: (Tvrtko) > - document clock reference used > - return cpu timestamp always > - capture cpu time just before lower dword of cs timestamp > > v3: (Chris) > - use uncore-rpm > - use __query_cs_timestamp helper > > v4: (Lionel) > - Kernel perf subsytem allows users to specify the clock id to be used > in perf_event_open. This clock id is used by the perf subsystem to > return the appropriate cpu timestamp in perf events. Similarly, let > the user pass the clockid to this query so that cpu timestamp > corresponds to the clock id requested. 
> > v5: (Tvrtko) > - Use normal ktime accessors instead of fast versions > - Add more uApi documentation > > v6: (Lionel) > - Move switch out of spinlock > > v7: (Chris) > - cs_timestamp is a misnomer, use cs_cycles instead > - return the cs cycle frequency as well in the query > > v8: > - Add platform and engine specific checks > > v9: (Lionel) > - Return 2 cpu timestamps in the query - captured before and after the > register read > > Signed-off-by: Umesh Nerlige Ramappa > --- > drivers/gpu/drm/i915/i915_query.c | 144 ++ > include/uapi/drm/i915_drm.h | 47 ++ > 2 files changed, 191 insertions(+) > > diff --git a/drivers/gpu/drm/i915/i915_query.c > b/drivers/gpu/drm/i915/i915_query.c > index fed337ad7b68..acca22ee6014 100644 > --- a/drivers/gpu/drm/i915/i915_query.c > +++ b/drivers/gpu/drm/i915/i915_query.c > @@ -6,6 +6,8 @@ > > #include > > +#include "gt/intel_engine_pm.h" > +#include "gt/intel_engine_user.h" > #include "i915_drv.h" > #include "i915_perf.h" > #include "i915_query.h" > @@ -90,6 +92,147 @@ static int query_topology_info(struct drm_i915_private > *dev_priv, > return total_length; > } > > +typedef u64 (*__ktime_func_t)(void); > +static __ktime_func_t __clock_id_to_func(clockid_t clk_id) > +{ > + /* > +* Use logic same as the perf subsystem to allow user to select the > +* reference clock id to be used for timestamps. 
> +*/ > + switch (clk_id) { > + case CLOCK_MONOTONIC: > + return &ktime_get_ns; > + case CLOCK_MONOTONIC_RAW: > + return &ktime_get_raw_ns; > + case CLOCK_REALTIME: > + return &ktime_get_real_ns; > + case CLOCK_BOOTTIME: > + return &ktime_get_boottime_ns; > + case CLOCK_TAI: > + return &ktime_get_clocktai_ns; > + default: > + return NULL; > + } > +} > + > +static inline int > +__read_timestamps(struct intel_uncore *uncore, > + i915_reg_t lower_reg, > + i915_reg_t upper_reg, > + u64 *cs_ts, > + u64 *cpu_ts, > + __ktime_func_t cpu_clock) > +{ > + u32 upper, lower, old_upper, loop = 0; > + > + upper = intel_uncore_read_fw(uncore, upper_reg); > + do { > + cpu_ts[0] = cpu_clock(); > + lower = intel_uncore_read_fw(uncore, lower_reg); > + cpu_ts[1] = cpu_clock(); > + old_upper = upper; > + upper = intel_uncore_read_fw(uncore, upper_reg); Both register reads comprise the timestamp returned to userspace, so presumably you want cpu_ts[] to wrap both. do { old_upper = upper; cpu_ts[0] = cpu_clock(); lower = intel_uncore_read_fw(uncore, lower_reg); upper = intel_uncore_read_fw(uncore, upper_reg); cpu_ts[1] = cpu_clock(); } while (upper != old_upper && loop++ < 2); - Intel Corporation (UK) Limited Registered No. 1134945 (England) Registered Office: Pipers Way, Swindon SN3 1RJ VAT No: 860 2173 47 This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or distribution by others is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies. ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
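The loop Chris suggests is the standard pattern for sampling a 64-bit counter exposed as a pair of 32-bit registers: re-read the upper half after the lower half and retry if it moved, because a carry between the two reads means the sample is torn. A user-space sketch of the shape — read_upper()/read_lower() and their backing variables are toy stand-ins for the MMIO accessors, and the cpu_ts[] capture is elided:

```c
#include <stdint.h>

/* Toy backing store standing in for the split upper/lower timestamp registers. */
static uint32_t reg_upper, reg_lower;

static uint32_t read_upper(void) { return reg_upper; }
static uint32_t read_lower(void) { return reg_lower; }

/*
 * Sample lower, then re-read upper; if upper changed, lower wrapped between
 * the two reads and the sample is retried (bounded, as in the patch).
 */
static uint64_t read_counter64(void)
{
	uint32_t upper, lower, old_upper;
	int loop = 0;

	upper = read_upper();
	do {
		old_upper = upper;
		lower = read_lower();
		upper = read_upper();
	} while (upper != old_upper && loop++ < 2);

	return ((uint64_t)upper << 32) | lower;
}
```

In the kernel version the two cpu_clock() captures would sit just inside the loop, bracketing both register reads, so that the CPU samples bound the whole 64-bit timestamp returned to userspace — which is the point of the rearrangement.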
Re: [Intel-gfx] [PATCH v2 13/16] drm/i915/pxp: User interface for Protected buffer
Quoting Daniele Ceraolo Spurio (2021-03-01 19:31:57) > From: Bommu Krishnaiah > > This api allow user mode to create Protected buffers. Only contexts > marked as protected are allowed to operate on protected buffers. > > We only allow setting the flags at creation time. > > All protected objects that have backing storage will be considered > invalid when the session is destroyed and they won't be usable anymore. > > This is a rework of the original code by Bommu Krishnaiah. I've > authorship unchanged since significant chunks have not been modified. > > v2: split context changes, fix defines and improve documentation (Chris), > add object invalidation logic > > Signed-off-by: Bommu Krishnaiah > Signed-off-by: Daniele Ceraolo Spurio > Cc: Telukuntla Sreedhar > Cc: Kondapally Kalyan > Cc: Gupta Anshuman > Cc: Huang Sean Z > Cc: Chris Wilson > --- > drivers/gpu/drm/i915/gem/i915_gem_create.c| 27 +++-- > .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 10 + > drivers/gpu/drm/i915/gem/i915_gem_object.c| 6 +++ > drivers/gpu/drm/i915/gem/i915_gem_object.h| 12 ++ > .../gpu/drm/i915/gem/i915_gem_object_types.h | 13 ++ > drivers/gpu/drm/i915/pxp/intel_pxp.c | 40 +++ > drivers/gpu/drm/i915/pxp/intel_pxp.h | 13 ++ > drivers/gpu/drm/i915/pxp/intel_pxp_types.h| 5 +++ > include/uapi/drm/i915_drm.h | 22 ++ > 9 files changed, 145 insertions(+), 3 deletions(-) > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c > b/drivers/gpu/drm/i915/gem/i915_gem_create.c > index 3ad3413c459f..d02e5938afbe 100644 > --- a/drivers/gpu/drm/i915/gem/i915_gem_create.c > +++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c > @@ -5,6 +5,7 @@ > > #include "gem/i915_gem_ioctls.h" > #include "gem/i915_gem_region.h" > +#include "pxp/intel_pxp.h" > > #include "i915_drv.h" > #include "i915_user_extensions.h" > @@ -13,7 +14,8 @@ static int > i915_gem_create(struct drm_file *file, > struct intel_memory_region *mr, > u64 *size_p, > - u32 *handle_p) > + u32 *handle_p, > + u64 user_flags) > { > struct 
drm_i915_gem_object *obj; > u32 handle; > @@ -35,12 +37,17 @@ i915_gem_create(struct drm_file *file, > > GEM_BUG_ON(size != obj->base.size); > > + obj->user_flags = user_flags; > + > ret = drm_gem_handle_create(file, &obj->base, &handle); > /* drop reference from allocate - handle holds it now */ > i915_gem_object_put(obj); > if (ret) > return ret; > > + if (user_flags & I915_GEM_OBJECT_PROTECTED) > + intel_pxp_object_add(obj); > + > *handle_p = handle; > *size_p = size; > return 0; > @@ -89,11 +96,12 @@ i915_gem_dumb_create(struct drm_file *file, > return i915_gem_create(file, >intel_memory_region_by_type(to_i915(dev), >mem_type), > - &args->size, &args->handle); > + &args->size, &args->handle, 0); > } > > struct create_ext { > struct drm_i915_private *i915; > + unsigned long user_flags; > }; > > static int __create_setparam(struct drm_i915_gem_object_param *args, > @@ -104,6 +112,19 @@ static int __create_setparam(struct > drm_i915_gem_object_param *args, > return -EINVAL; > } > > + switch (lower_32_bits(args->param)) { > + case I915_OBJECT_PARAM_PROTECTED_CONTENT: > + if (!intel_pxp_is_enabled(&ext_data->i915->gt.pxp)) > + return -ENODEV; > + if (args->size) { > + return -EINVAL; > + } else if (args->data) { > + ext_data->user_flags |= I915_GEM_OBJECT_PROTECTED; > + return 0; > + } > + break; > + } > + > return -EINVAL; > } > > @@ -148,5 +169,5 @@ i915_gem_create_ioctl(struct drm_device *dev, void *data, > return i915_gem_create(file, >intel_memory_region_by_type(i915, > > INTEL_MEMORY_SYSTEM), > - &args->size, &args->handle); > + &args->size, &args->handle, >
Re: [Intel-gfx] [PATCH v2 11/16] drm/i915/pxp: interface for creation of protected contexts
Quoting Daniele Ceraolo Spurio (2021-03-01 19:31:55) > Usage of protected objects, coming in a follow-up patch, will be > restricted to protected contexts. Contexts can only be marked as > protected at creation time and they must be both bannable and not > recoverable. > > When a PXP teardown occurs, all gem contexts marked as protected that > have been used at least once will be marked as invalid and all new > submissions using them will be rejected. All intel contexts within the > invalidated gem contexts will be marked banned. > A new flag has been added to the RESET_STATS ioctl to report the > invalidation to userspace. > > v2: split to its own patch and improve doc (Chris), invalidate contexts > on teardown > > Signed-off-by: Daniele Ceraolo Spurio > Cc: Chris Wilson > Signed-off-by: Daniele Ceraolo Spurio > --- > drivers/gpu/drm/i915/gem/i915_gem_context.c | 59 ++- > drivers/gpu/drm/i915/gem/i915_gem_context.h | 18 ++ > .../gpu/drm/i915/gem/i915_gem_context_types.h | 2 + > .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 13 > drivers/gpu/drm/i915/pxp/intel_pxp.c | 38 > drivers/gpu/drm/i915/pxp/intel_pxp.h | 1 + > drivers/gpu/drm/i915/pxp/intel_pxp_session.c | 3 + > include/uapi/drm/i915_drm.h | 19 ++ > 8 files changed, 150 insertions(+), 3 deletions(-) > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c > b/drivers/gpu/drm/i915/gem/i915_gem_context.c > index ca37d93ef5e7..19ac24a3c42c 100644 > --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c > +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c > @@ -76,6 +76,8 @@ > #include "gt/intel_gpu_commands.h" > #include "gt/intel_ring.h" > > +#include "pxp/intel_pxp.h" > + > #include "i915_drm_client.h" > #include "i915_gem_context.h" > #include "i915_globals.h" > @@ -2006,6 +2008,40 @@ static int set_priority(struct i915_gem_context *ctx, > return 0; > } > > +static int set_protected(struct i915_gem_context *ctx, > +const struct drm_i915_gem_context_param *args) > +{ > + int ret = 0; > + > + if 
(!intel_pxp_is_enabled(&ctx->i915->gt.pxp)) > + ret = -ENODEV; > + else if (ctx->client) /* can't change this after creation! */ > + ret = -EEXIST; > + else if (args->size) > + ret = -EINVAL; > + else if (!args->value) > + clear_bit(UCONTEXT_PROTECTED, &ctx->user_flags); > + else if (i915_gem_context_is_recoverable(ctx) || > +!i915_gem_context_is_bannable(ctx)) > + ret = -EPERM; > + else > + set_bit(UCONTEXT_PROTECTED, &ctx->user_flags); > + > + return ret; > +} > + > +static int get_protected(struct i915_gem_context *ctx, > +struct drm_i915_gem_context_param *args) > +{ > + if (!intel_pxp_is_enabled(&ctx->i915->gt.pxp)) > + return -ENODEV; > + > + args->size = 0; > + args->value = i915_gem_context_can_use_protected_content(ctx); > + > + return 0; > +} > + > static int ctx_setparam(struct drm_i915_file_private *fpriv, > struct i915_gem_context *ctx, > struct drm_i915_gem_context_param *args) > @@ -2038,6 +2074,8 @@ static int ctx_setparam(struct drm_i915_file_private > *fpriv, > ret = -EPERM; > else if (args->value) > i915_gem_context_set_bannable(ctx); > + else if (i915_gem_context_can_use_protected_content(ctx)) > + ret = -EPERM; /* can't clear this for protected > contexts */ > else > i915_gem_context_clear_bannable(ctx); > break; > @@ -2045,10 +2083,12 @@ static int ctx_setparam(struct drm_i915_file_private > *fpriv, > case I915_CONTEXT_PARAM_RECOVERABLE: > if (args->size) > ret = -EINVAL; > - else if (args->value) > - i915_gem_context_set_recoverable(ctx); > - else > + else if (!args->value) > i915_gem_context_clear_recoverable(ctx); > + else if (i915_gem_context_can_use_protected_content(ctx)) > + ret = -EPERM; /* can't set this for protected > contexts */ > + else > + i915_gem_context_set_recoverable(ctx); > break; >
Re: [Intel-gfx] [PATCH v2 10/16] drm/i915/pxp: Enable PXP power management
Quoting Daniele Ceraolo Spurio (2021-03-01 19:31:54) > +int intel_pxp_runtime_resume(struct intel_pxp *pxp) > +{ > + struct intel_gt *gt = pxp_to_gt(pxp); > + int ret; > + > + if (!intel_pxp_is_enabled(pxp)) > + return 0; > + > + intel_pxp_irq_enable(pxp); > + pxp->global_state_in_suspend = false; > + > + /* > +* if the display loses power during runtime suspend it will cause the > +* session to become invalid, so to be safe we always re-create it. > The > +* MEI module is still bound, so this is the same as a teardown event, > +* hence we can just pretend we received the irq. > +*/ > + intel_pxp_mark_termination_in_progress(pxp); > + > + spin_lock_irq(>->irq_lock); > + intel_pxp_irq_handler(pxp, > GEN12_DISPLAY_PXP_STATE_TERMINATED_INTERRUPT); > + spin_unlock_irq(>->irq_lock); > + > + return ret; return random() ? > diff --git a/drivers/gpu/drm/i915/pxp/intel_pxp_session.c > b/drivers/gpu/drm/i915/pxp/intel_pxp_session.c > index 488006a0cf39..bb981d38c2fe 100644 > --- a/drivers/gpu/drm/i915/pxp/intel_pxp_session.c > +++ b/drivers/gpu/drm/i915/pxp/intel_pxp_session.c > @@ -25,8 +25,14 @@ static bool intel_pxp_session_is_in_play(struct intel_pxp > *pxp, u32 id) > intel_wakeref_t wakeref; > u32 sip = 0; > > - with_intel_runtime_pm(gt->uncore->rpm, wakeref) > - sip = intel_uncore_read(gt->uncore, GEN12_KCR_SIP); > + /* if we're suspended the session is considered off */ > + wakeref = intel_runtime_pm_get_if_in_use(gt->uncore->rpm); > + if (!wakeref) > + return false; > + > + sip = intel_uncore_read(gt->uncore, GEN12_KCR_SIP); > + > + intel_runtime_pm_put(gt->uncore->rpm, wakeref); with_intel_runtime_pm_if_in_use(gt->uncore->rpm, wakeref) sip = intel_uncore_read(gt->uncore, GEN12_KCR_SIP); > > return sip & BIT(id); > } > @@ -43,12 +49,18 @@ static int pxp_wait_for_session_state(struct intel_pxp > *pxp, u32 id, bool in_pla > u32 mask = BIT(id); > int ret; > > - with_intel_runtime_pm(gt->uncore->rpm, wakeref) > - ret = intel_wait_for_register(gt->uncore, > - 
GEN12_KCR_SIP, > - mask, > - in_play ? mask : 0, > - 100); > + /* if we're suspended the session is considered off */ > + wakeref = intel_runtime_pm_get_if_in_use(gt->uncore->rpm); > + if (!wakeref) > + return in_play ? -ENODEV : 0; > + > + ret = intel_wait_for_register(gt->uncore, > + GEN12_KCR_SIP, > + mask, > + in_play ? mask : 0, > + 100); > + > + intel_runtime_pm_put(gt->uncore->rpm, wakeref); Ditto -Chris ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
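Both review notes above point at the same shape: the open-coded get-if-in-use/put pair can be folded into a `with_*` scoped-acquire macro, which also makes an uninitialised-`ret` escape impossible because the release sits in the loop itself. A user-space sketch of that shape — the toy `rpm` type and helper names here are hypothetical stand-ins; `with_intel_runtime_pm_if_in_use()` is the real kernel macro this models:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a runtime-pm provider: a plain use-count. */
struct rpm { int usecount; };

static int rpm_get_if_in_use(struct rpm *rpm)
{
	if (rpm->usecount == 0)
		return 0;		/* device suspended: no wakeref taken */
	rpm->usecount++;
	return 1;		/* non-zero cookie == wakeref held */
}

static void rpm_put(struct rpm *rpm, int wakeref)
{
	if (wakeref)
		rpm->usecount--;
}

/*
 * One-shot for-loop so the body runs at most once, with the release
 * folded into the loop's continuation expression -- the shape used by
 * the kernel's with_intel_runtime_pm*() helpers. (Caveat shared with
 * the real macros: a break inside the body would skip the put.)
 */
#define with_rpm_if_in_use(rpm, wf) \
	for ((wf) = rpm_get_if_in_use(rpm); (wf); rpm_put((rpm), (wf)), (wf) = 0)

static bool read_session_bit(struct rpm *rpm, unsigned int hw_reg, unsigned int id)
{
	unsigned int sip = 0;	/* defaults to "session off" when suspended */
	int wakeref;

	with_rpm_if_in_use(rpm, wakeref)
		sip = hw_reg;	/* stands in for intel_uncore_read() */

	return sip & (1u << id);
}
```

With this shape there is no early-return path that can leak the wakeref or return an uninitialised value: if the device is suspended the body simply never runs and the "off" default is returned.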
Re: [Intel-gfx] [PATCH v2 09/16] drm/i915/pxp: Implement PXP irq handler
Quoting Daniele Ceraolo Spurio (2021-03-01 19:31:53) > +static int pxp_terminate(struct intel_pxp *pxp) > +{ > + int ret = 0; > + > + mutex_lock(&pxp->mutex); > + > + pxp->global_state_attacked = true; global_state_attacked is serialised by pxp->work > + > + ret = intel_pxp_arb_terminate_session_with_global_terminate(pxp); > + > + mutex_unlock(&pxp->mutex); > + > + return ret; Fake error returns. > +} -Chris
Re: [Intel-gfx] [PATCH v2 09/16] drm/i915/pxp: Implement PXP irq handler
Quoting Daniele Ceraolo Spurio (2021-03-01 19:31:53) > From: "Huang, Sean Z" > > The HW will generate a teardown interrupt when session termination is > required, which requires i915 to submit a terminating batch. Once the HW > is done with the termination it will generate another interrupt, at > which point it is safe to re-create the session. Why do we do the auto recreation after the teardown interrupt? > > v2: use struct completion instead of bool (Chris) > > Signed-off-by: Huang, Sean Z > Signed-off-by: Daniele Ceraolo Spurio > Cc: Chris Wilson > --- > drivers/gpu/drm/i915/Makefile| 1 + > drivers/gpu/drm/i915/gt/intel_gt_irq.c | 7 + > drivers/gpu/drm/i915/i915_reg.h | 1 + > drivers/gpu/drm/i915/pxp/intel_pxp.c | 34 + > drivers/gpu/drm/i915/pxp/intel_pxp.h | 16 ++ > drivers/gpu/drm/i915/pxp/intel_pxp_irq.c | 151 +++ > drivers/gpu/drm/i915/pxp/intel_pxp_irq.h | 33 > drivers/gpu/drm/i915/pxp/intel_pxp_session.c | 9 +- > drivers/gpu/drm/i915/pxp/intel_pxp_session.h | 1 + > drivers/gpu/drm/i915/pxp/intel_pxp_tee.c | 10 +- > drivers/gpu/drm/i915/pxp/intel_pxp_types.h | 8 + > 11 files changed, 268 insertions(+), 3 deletions(-) > create mode 100644 drivers/gpu/drm/i915/pxp/intel_pxp_irq.c > create mode 100644 drivers/gpu/drm/i915/pxp/intel_pxp_irq.h > > diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile > index 8b605f326039..5e9bd34dec38 100644 > --- a/drivers/gpu/drm/i915/Makefile > +++ b/drivers/gpu/drm/i915/Makefile > @@ -274,6 +274,7 @@ i915-y += i915_perf.o > i915-$(CONFIG_DRM_I915_PXP) += \ > pxp/intel_pxp.o \ > pxp/intel_pxp_cmd.o \ > + pxp/intel_pxp_irq.o \ > pxp/intel_pxp_session.o \ > pxp/intel_pxp_tee.o > > diff --git a/drivers/gpu/drm/i915/gt/intel_gt_irq.c > b/drivers/gpu/drm/i915/gt/intel_gt_irq.c > index d29126c458ba..0d3585efe2b8 100644 > --- a/drivers/gpu/drm/i915/gt/intel_gt_irq.c > +++ b/drivers/gpu/drm/i915/gt/intel_gt_irq.c > @@ -13,6 +13,7 @@ > #include "intel_lrc_reg.h" > #include "intel_uncore.h" > #include "intel_rps.h" 
> +#include "pxp/intel_pxp_irq.h" > > static void guc_irq_handler(struct intel_guc *guc, u16 iir) > { > @@ -64,6 +65,9 @@ gen11_other_irq_handler(struct intel_gt *gt, const u8 > instance, > if (instance == OTHER_GTPM_INSTANCE) > return gen11_rps_irq_handler(&gt->rps, iir); > > + if (instance == OTHER_KCR_INSTANCE) > + return intel_pxp_irq_handler(&gt->pxp, iir); > + > WARN_ONCE(1, "unhandled other interrupt instance=0x%x, iir=0x%x\n", > instance, iir); > } > @@ -190,6 +194,9 @@ void gen11_gt_irq_reset(struct intel_gt *gt) > intel_uncore_write(uncore, GEN11_GPM_WGBOXPERF_INTR_MASK, ~0); > intel_uncore_write(uncore, GEN11_GUC_SG_INTR_ENABLE, 0); > intel_uncore_write(uncore, GEN11_GUC_SG_INTR_MASK, ~0); > + > + intel_uncore_write(uncore, GEN11_CRYPTO_RSVD_INTR_ENABLE, 0); > + intel_uncore_write(uncore, GEN11_CRYPTO_RSVD_INTR_MASK, ~0); > } > > void gen11_gt_irq_postinstall(struct intel_gt *gt) > diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h > index e5dd0203991b..97a6d0c638ec 100644 > --- a/drivers/gpu/drm/i915/i915_reg.h > +++ b/drivers/gpu/drm/i915/i915_reg.h > @@ -7958,6 +7958,7 @@ enum { > /* irq instances for OTHER_CLASS */ > #define OTHER_GUC_INSTANCE 0 > #define OTHER_GTPM_INSTANCE1 > +#define OTHER_KCR_INSTANCE 4 > > #define GEN11_INTR_IDENTITY_REG(x) _MMIO(0x190060 + ((x) * 4)) > > diff --git a/drivers/gpu/drm/i915/pxp/intel_pxp.c > b/drivers/gpu/drm/i915/pxp/intel_pxp.c > index cbec9395bde9..0ca1c2c16972 100644 > --- a/drivers/gpu/drm/i915/pxp/intel_pxp.c > +++ b/drivers/gpu/drm/i915/pxp/intel_pxp.c > @@ -2,7 +2,9 @@ > /* > * Copyright(c) 2020 Intel Corporation.
> */ > +#include > #include "intel_pxp.h" > +#include "intel_pxp_irq.h" > #include "intel_pxp_tee.h" > #include "gt/intel_context.h" > #include "i915_drv.h" > @@ -67,12 +69,23 @@ void intel_pxp_init(struct intel_pxp *pxp) > > mutex_init(&pxp->mutex); > > + /* > +* we'll use the completion to check if there is a termination > pending, > +* so we start it as completed and we reinit it when a termination > +* is triggered. > +*/ > + init_completion(&pxp->termination); > + c
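The comment quoted above describes the trick of initialising the completion in the completed state, so that "not done" doubles as "termination pending" and the irq path only has to reinit it. A minimal user-space model of that usage — the one-field `struct completion` below is a deliberately simplified stand-in for `<linux/completion.h>`, and the `pxp_*` helper names are illustrative:

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal model of <linux/completion.h> as used by the PXP code. */
struct completion { bool done; };

static void init_completion(struct completion *c)   { c->done = false; }
static void reinit_completion(struct completion *c) { c->done = false; }
static void complete_all(struct completion *c)      { c->done = true; }
static bool completion_done(struct completion *c)   { return c->done; }

struct pxp { struct completion termination; };

/* Start "completed": no termination pending until an irq triggers one. */
static void pxp_init(struct pxp *pxp)
{
	init_completion(&pxp->termination);
	complete_all(&pxp->termination);
}

/* The teardown irq flips the completion back to "pending". */
static void pxp_mark_termination_in_progress(struct pxp *pxp)
{
	reinit_completion(&pxp->termination);
}

static bool pxp_termination_pending(struct pxp *pxp)
{
	return !completion_done(&pxp->termination);
}
```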
Re: [Intel-gfx] [PATCH v2 04/16] drm/i915/pxp: allocate a vcs context for pxp usage
Quoting Daniele Ceraolo Spurio (2021-03-01 19:31:48) > @@ -232,6 +235,13 @@ ktime_t intel_engine_get_busy_time(struct > intel_engine_cs *engine, > > u32 intel_engine_context_size(struct intel_gt *gt, u8 class); > > +struct intel_context * > +intel_engine_pinned_context_create(struct intel_engine_cs *engine, > + unsigned int hwsp, > + struct lock_class_key *key, > + const char *name); intel_engine_create_pinned_context() Other users would want to pass in vm and ring size (ring size may be useful here as well for larger SESSION_TERMINATE_LEN()) -Chris
Re: [Intel-gfx] [PATCH v2 07/16] drm/i915/pxp: Create the arbitrary session after boot
Quoting Daniele Ceraolo Spurio (2021-03-01 19:31:51) > +static inline bool intel_pxp_is_active(const struct intel_pxp *pxp) > +{ > + return pxp->arb_is_in_play; > +} > +static bool intel_pxp_session_is_in_play(struct intel_pxp *pxp, u32 id) > +{ > + struct intel_gt *gt = pxp_to_gt(pxp); > + intel_wakeref_t wakeref; > + u32 sip = 0; > + > + with_intel_runtime_pm(gt->uncore->rpm, wakeref) > + sip = intel_uncore_read(gt->uncore, GEN12_KCR_SIP); > + > + return sip & BIT(id); > +} > + > +bool intel_pxp_arb_session_is_in_play(struct intel_pxp *pxp) > +{ > + return intel_pxp_session_is_in_play(pxp, ARB_SESSION); > +} So pxp->arb_is_in_play is not the same as intel_pxp_arb_session_is_in_play(). That's confusing. -Chris
Re: [Intel-gfx] [PATCH v2 08/16] drm/i915/pxp: Implement arb session teardown
Quoting Daniele Ceraolo Spurio (2021-03-01 19:31:52) > From: "Huang, Sean Z" > > Teardown is triggered when the display topology changes and no > long meets the secure playback requirement, and hardware trashes > all the encryption keys for display. Additionally, we want to emit a > teardown operation to make sure we're clean on boot and resume > > v2: emit in the ring, use high prio request (Chris) > > Signed-off-by: Huang, Sean Z > Signed-off-by: Daniele Ceraolo Spurio > Cc: Chris Wilson > --- > drivers/gpu/drm/i915/Makefile| 1 + > drivers/gpu/drm/i915/pxp/intel_pxp_cmd.c | 166 +++ > drivers/gpu/drm/i915/pxp/intel_pxp_cmd.h | 15 ++ > drivers/gpu/drm/i915/pxp/intel_pxp_session.c | 38 + > drivers/gpu/drm/i915/pxp/intel_pxp_session.h | 1 + > drivers/gpu/drm/i915/pxp/intel_pxp_tee.c | 5 +- > 6 files changed, 225 insertions(+), 1 deletion(-) > create mode 100644 drivers/gpu/drm/i915/pxp/intel_pxp_cmd.c > create mode 100644 drivers/gpu/drm/i915/pxp/intel_pxp_cmd.h > > diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile > index d6d510e4875e..8b605f326039 100644 > --- a/drivers/gpu/drm/i915/Makefile > +++ b/drivers/gpu/drm/i915/Makefile > @@ -273,6 +273,7 @@ i915-y += i915_perf.o > # Protected execution platform (PXP) support > i915-$(CONFIG_DRM_I915_PXP) += \ > pxp/intel_pxp.o \ > + pxp/intel_pxp_cmd.o \ > pxp/intel_pxp_session.o \ > pxp/intel_pxp_tee.o > > diff --git a/drivers/gpu/drm/i915/pxp/intel_pxp_cmd.c > b/drivers/gpu/drm/i915/pxp/intel_pxp_cmd.c > new file mode 100644 > index ..ffab09839cd3 > --- /dev/null > +++ b/drivers/gpu/drm/i915/pxp/intel_pxp_cmd.c > @@ -0,0 +1,166 @@ > +// SPDX-License-Identifier: MIT > +/* > + * Copyright(c) 2020, Intel Corporation. All rights reserved. 
> + */ > + > +#include "intel_pxp.h" > +#include "intel_pxp_session.h" > +#include "gt/intel_context.h" > +#include "gt/intel_engine_pm.h" > +#include "gt/intel_gpu_commands.h" > +#include "gt/intel_ring.h" > + > +#include "i915_trace.h" > + > +/* PXP GPU command definitions */ > + > +/* MI_SET_APPID */ > +#define MI_SET_APPID_SESSION_ID(x)((x) << 0) > + > +/* MI_FLUSH_DW */ > +#define MI_FLUSH_DW_DW0_PROTECTED_MEMORY_ENABLE BIT(22) > + > +/* MI_WAIT */ > +#define MFX_WAIT_DW0_PXP_SYNC_CONTROL_FLAG BIT(9) > +#define MFX_WAIT_DW0_MFX_SYNC_CONTROL_FLAG BIT(8) We've been using REG_BIT() for the explicit (u32) casting. > +/* CRYPTO_KEY_EXCHANGE */ > +#define CRYPTO_KEY_EXCHANGE ((0x3 << 29) | (0x01609 << 16)) #define __INSTR(client) ((client) << INSTR_CLIENT_SHIFT) #define MI_INSTR(opcode, flags) \ (__INSTR(INSTR_MI_CLIENT) | (opcode) << 23 | (flags)) #define RC_INSTR(foo) (__INSTR(INSTR_RC_CLIENT) | (foo) << 16) #define CRYPTO_KEY_EXCHANGE RC_INSTR(0x1609) With a better (foo). > + > +/* stall until prior PXP and MFX/HCP/HUC objects are cmopleted */ > +#define MFX_WAIT_PXP \ > + MFX_WAIT | \ > + MFX_WAIT_DW0_PXP_SYNC_CONTROL_FLAG | \ > + MFX_WAIT_DW0_MFX_SYNC_CONTROL_FLAG; > + > +static u32 *pxp_emit_session_selection(u32 *cs, u32 idx) > +{ > + *cs++ = MFX_WAIT_PXP; One day someone will proofread bspec. > + /* pxp off */ > + *cs++ = MI_FLUSH_DW; > + *cs++ = 0; > + *cs++ = 0; Hmm. Can the immediate data be dropped? TIL. > + /* select session */ > + *cs++ = MI_SET_APPID | MI_SET_APPID_SESSION_ID(idx); > + > + *cs++ = MFX_WAIT_PXP; > + > + /* pxp on */ > + *cs++ = MI_FLUSH_DW | MI_FLUSH_DW_DW0_PROTECTED_MEMORY_ENABLE; > + *cs++ = 0; > + *cs++ = 0; Bspec says "after completion of the flush...", which says to me we should not initiate the wait until after the flush, so we would need a post-sync op here to stall the CS (or else we may complete the wait before the operation is begun).
I don't see any programming notes to that effect, so could just be my paranoia from handling atomics. > + *cs++ = MFX_WAIT_PXP; Fwiw, the bspec language would seem to imply that nothing should happen with this wait at this point. Perhaps more reason to make the pxp-on MI_FLUSH_DW be synchronous. (Though we will have a sync point at the breadcrumb, so meh.) > + > + return cs; > +} > + > +static u32 *pxp_emit_inline_termination(u32 *cs) > +{ > + /* session inline termination */ > + *cs++ = CRYPTO_KEY_EXCHANGE; > + *cs++ = 0; > + > + return cs; > +} > + > +static u32 *pxp_emit_wait(u32 *cs) > +{ > +
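The review suggests deriving `CRYPTO_KEY_EXCHANGE` from the instruction-header fields instead of magic shifts. A compilable sketch checking that the composed form equals the literal from the patch; the client codes and the bits-31:29 client shift follow i915's `MI_INSTR()` convention in `intel_gpu_commands.h`, but treat the exact values as assumptions here:

```c
#include <assert.h>
#include <stdint.h>

/* GPU command header layout: the client code lives in bits 31:29. */
#define INSTR_CLIENT_SHIFT 29
#define INSTR_MI_CLIENT    0x0
#define INSTR_RC_CLIENT    0x3

#define __INSTR(client)    ((uint32_t)(client) << INSTR_CLIENT_SHIFT)
#define MI_INSTR(opcode, flags) \
	(__INSTR(INSTR_MI_CLIENT) | (uint32_t)(opcode) << 23 | (flags))
#define RC_INSTR(opcode)   (__INSTR(INSTR_RC_CLIENT) | (uint32_t)(opcode) << 16)

/* The literal from the patch ... */
#define CRYPTO_KEY_EXCHANGE_LITERAL ((0x3u << 29) | (0x01609u << 16))
/* ... and the same encoding built from named fields, as suggested. */
#define CRYPTO_KEY_EXCHANGE RC_INSTR(0x1609)
```

The point of the named-field form is that the client code and opcode stop being anonymous shifts: a reader can see at a glance that this is a render-client instruction with opcode 0x1609.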
Re: [Intel-gfx] [PATCH] drm/i915: Verify dma_addr in gen8_ggtt_pte_encode
Quoting Piorkowski, Piotr (2021-02-24 15:29:25) > From: Piotr Piórkowski > > Until now, the gen8_ggtt_pte_encode function, responsible for the preparation > of GGTT PTE, has not verified in any way whether the address given as the > parameter is correct. > By adding a GGTT address mask, we can easily verify that dma_addr will not fit > in the PTE field. > While around, cleanup a place where we hold all GEN12 GGTT PTE masks, > and the addition of the PTE description. > > Bspec: 45015 > > Signed-off-by: Piotr Piórkowski > Cc: Matthew Auld > Cc: Michal Wajdeczko > Cc: Michal Winiarski > --- > drivers/gpu/drm/i915/gt/intel_ggtt.c | 2 ++ > drivers/gpu/drm/i915/gt/intel_gtt.h | 13 - > 2 files changed, 14 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c > b/drivers/gpu/drm/i915/gt/intel_ggtt.c > index b0b8ded834f0..52b2428da431 100644 > --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c > +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c > @@ -193,6 +193,8 @@ static u64 gen8_ggtt_pte_encode(dma_addr_t addr, > { > gen8_pte_t pte = addr | _PAGE_PRESENT; > > + GEM_BUG_ON(addr & ~GEN12_GGTT_PTE_ADDR_MASK); You can also check the dma_get_mask() doesn't exceed the addr mask. -Chris
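Chris's note splits the validation in two: the patch's per-PTE `GEM_BUG_ON()` catches individual bad addresses, while the DMA mask can be checked once at init so no encodable-range violation is possible in the first place. A standalone sketch using a hypothetical 39-bit address field (the real `GEN12_GGTT_PTE_ADDR_MASK` lives in `intel_gtt.h` and may cover different bits):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t dma_addr_t;

#define GTT_PAGE_SHIFT 12
#define GENMASK_ULL(h, l) (((~0ull) << (l)) & (~0ull >> (63 - (h))))
/* Hypothetical PTE layout: page address field in bits 38:12. */
#define PTE_ADDR_MASK GENMASK_ULL(38, GTT_PAGE_SHIFT)

/* Per-PTE sanity check, as in the patched gen8_ggtt_pte_encode(). */
static bool pte_addr_ok(dma_addr_t addr)
{
	return (addr & ~PTE_ADDR_MASK) == 0;
}

/*
 * One-time check Chris suggests: the device's DMA mask (the inclusive
 * upper bound on addresses the device may be handed) must not allow
 * any page the PTE address field cannot encode.
 */
static bool dma_mask_fits_pte(uint64_t dma_mask)
{
	return dma_mask <= (PTE_ADDR_MASK | GENMASK_ULL(GTT_PAGE_SHIFT - 1, 0));
}
```

Note the per-PTE check also rejects unaligned addresses, since the low 12 bits sit outside the address field.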
Re: [Intel-gfx] [PATCH] drm/i915: Wedge the GPU if command parser setup fails
Quoting Tvrtko Ursulin (2021-02-26 09:51:54) > From: Tvrtko Ursulin > > Commit 311a50e76a33 ("drm/i915: Add support for mandatory cmdparsing") > introduced mandatory command parsing but setup failures were not > translated into wedging the GPU which was probably the intent. > > Possible errors come in two categories. Either the sanity check on > internal tables has failed, which should be caught in CI unless an > affected platform would be missed in testing; or memory allocation failure > happened during driver load, which should be extremely unlikely but for > correctness should still be handled. > > Signed-off-by: Tvrtko Ursulin > Fixes: 311a50e76a33 ("drm/i915: Add support for mandatory cmdparsing") > Cc: Jon Bloomfield > Cc: Joonas Lahtinen > Cc: Chris Wilson > --- > To catchup with referenced patch we'd need to copy stable from v5.4+. > Failures are very unlikely but I think it would still be prudent to do so. > --- > drivers/gpu/drm/i915/gt/intel_engine_cs.c | 8 +++- > drivers/gpu/drm/i915/i915_cmd_parser.c| 19 +-- > drivers/gpu/drm/i915/i915_drv.h | 2 +- > 3 files changed, 21 insertions(+), 8 deletions(-) > > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c > b/drivers/gpu/drm/i915/gt/intel_engine_cs.c > index 577ebd4a324f..ac6478c4ede4 100644 > --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c > +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c > @@ -696,6 +696,11 @@ static int engine_setup_common(struct intel_engine_cs > *engine) > goto err_status; > } > > + err = intel_engine_init_cmd_parser(engine); > + if (err) { > + goto err_cmd_parser; > + } -{} > + > i915_sched_init(&engine->sched, > engine->i915->drm.dev, > engine->name, > @@ -703,7 +708,6 @@ static int engine_setup_common(struct intel_engine_cs > *engine) > ENGINE_PHYSICAL); > > intel_engine_init_execlists(engine); > - intel_engine_init_cmd_parser(engine); > intel_engine_init__pm(engine); > intel_engine_init_retire(engine); > > @@ -720,6 +724,8 @@ static int engine_setup_common(struct 
intel_engine_cs > *engine) > > return 0; > > +err_cmd_parser: > + intel_breadcrumbs_free(engine->breadcrumbs); > err_status: > cleanup_status_page(engine); > return err; > diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c > b/drivers/gpu/drm/i915/i915_cmd_parser.c > index ced9a96d7c34..13905891aff7 100644 > --- a/drivers/gpu/drm/i915/i915_cmd_parser.c > +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c > @@ -940,7 +940,7 @@ static void fini_hash_table(struct intel_engine_cs > *engine) > * struct intel_engine_cs based on whether the platform requires software > * command parsing. > */ > -void intel_engine_init_cmd_parser(struct intel_engine_cs *engine) > +int intel_engine_init_cmd_parser(struct intel_engine_cs *engine) > { > const struct drm_i915_cmd_table *cmd_tables; > int cmd_table_count; > @@ -948,7 +948,7 @@ void intel_engine_init_cmd_parser(struct intel_engine_cs > *engine) > > if (!IS_GEN(engine->i915, 7) && !(IS_GEN(engine->i915, 9) && > engine->class == COPY_ENGINE_CLASS)) > - return; > + return 0; > > switch (engine->class) { > case RENDER_CLASS: > @@ -1013,19 +1013,19 @@ void intel_engine_init_cmd_parser(struct > intel_engine_cs *engine) > break; > default: > MISSING_CASE(engine->class); > - return; > + goto out; > } > > if (!validate_cmds_sorted(engine, cmd_tables, cmd_table_count)) { > drm_err(&engine->i915->drm, > "%s: command descriptions are not sorted\n", > engine->name); > - return; > + goto out; > } > if (!validate_regs_sorted(engine)) { > drm_err(&engine->i915->drm, > "%s: registers are not sorted\n", engine->name); > - return; > + goto out; > } > > ret = init_hash_table(engine, cmd_tables, cmd_table_count); > @@ -1033,10 +1033,17 @@ void intel_engine_init_cmd_parser(struct > intel_engine_cs *engine) > drm_err(&engine->i915->drm, > "%s: initialised fail
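The fix above converts `intel_engine_init_cmd_parser()` to return an error and threads that through `engine_setup_common()`'s unwind ladder: each failure jumps to a label that tears down everything initialised so far, in reverse order. Reduced to a runnable sketch (all init/fini names are hypothetical):

```c
#include <assert.h>
#include <stdbool.h>

static int status_live, breadcrumbs_live, parser_live;

static int init_status(void)        { status_live = 1; return 0; }
static void fini_status(void)       { status_live = 0; }
static int init_breadcrumbs(void)   { breadcrumbs_live = 1; return 0; }
static void fini_breadcrumbs(void)  { breadcrumbs_live = 0; }
static int init_cmd_parser(bool fail)
{
	if (fail)
		return -12;	/* -ENOMEM */
	parser_live = 1;
	return 0;
}

/*
 * The unwind-ladder shape used by engine_setup_common(): labels stack
 * in reverse order of initialisation, so a failure at step N undoes
 * exactly steps N-1 .. 1 and nothing else.
 */
static int engine_setup(bool parser_fails)
{
	int err;

	err = init_status();
	if (err)
		return err;

	err = init_breadcrumbs();
	if (err)
		goto err_status;

	err = init_cmd_parser(parser_fails);
	if (err)
		goto err_breadcrumbs;

	return 0;

err_breadcrumbs:
	fini_breadcrumbs();
err_status:
	fini_status();
	return err;
}
```

This is also why the review drops the braces around the single-statement `goto`: the ladder reads as a flat sequence, and kernel style keeps single statements unbraced.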
[Intel-gfx] [PATCH] drm/i915: Refine VT-d scanout workaround
VT-d may cause overfetch of the scanout PTE, both before and after the vma (depending on the scanout orientation). bspec recommends that we provide a tile-row in either direction, and suggests using 168 PTE, warning that the accesses will wrap around the ends of the GGTT. Currently, we fill the entire GGTT with scratch pages when using VT-d to always ensure there are valid entries around every vma, including scanout. However, writing every PTE is slow as on recent devices we perform 8MiB of uncached writes, incurring an extra 100ms during resume. If instead we focus on only putting guard pages around scanout, we can avoid touching the whole GGTT. To avoid having to introduce extra nodes around each scanout vma, we adjust the scanout drm_mm_node to be smaller than the allocated space, and fixup the extra PTE during dma binding. v2: Move the guard from modifying drm_mm_node.start which is still used by the drm_mm itself, into an adjustment of node.start at the point of use. v3: Pass the requested guard padding from the caller, so we can drop the VT-d w/a knowledge from the i915_vma allocator. v4: Bump minimum padding to 168 PTE and cautiously ensure that a full tile row around the vma is included with the guard.
Signed-off-by: Chris Wilson Cc: Ville Syrjälä Cc: Matthew Auld Cc: Imre Deak Reviewed-by: Matthew Auld --- drivers/gpu/drm/i915/gem/i915_gem_domain.c | 13 +++ drivers/gpu/drm/i915/gt/intel_ggtt.c | 25 +- drivers/gpu/drm/i915/i915_gem_gtt.h| 1 + drivers/gpu/drm/i915/i915_vma.c| 8 +++ 4 files changed, 23 insertions(+), 24 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c index 0478b069c202..32b13af0d3df 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c @@ -16,6 +16,8 @@ #include "i915_gem_lmem.h" #include "i915_gem_mman.h" +#define VTD_GUARD (168u * I915_GTT_PAGE_SIZE) /* 168 or tile-row PTE padding */ + static bool gpu_write_needs_clflush(struct drm_i915_gem_object *obj) { return !(obj->cache_level == I915_CACHE_NONE || @@ -345,6 +347,17 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj, if (ret) goto err; + /* VT-d may overfetch before/after the vma, so pad with scratch */ + if (intel_scanout_needs_vtd_wa(i915)) { + unsigned int guard = VTD_GUARD; + + if (i915_gem_object_is_tiled(obj)) + guard = max(guard, + i915_gem_object_get_tile_row_size(obj)); + + flags |= PIN_OFFSET_GUARD | guard; + } + /* * As the user may map the buffer once pinned in the display plane * (e.g. 
libkms for the bootup splash), we have to ensure that we diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c index 6b326138e765..251b50884d1c 100644 --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c @@ -319,27 +319,6 @@ static void nop_clear_range(struct i915_address_space *vm, { } -static void gen8_ggtt_clear_range(struct i915_address_space *vm, - u64 start, u64 length) -{ - struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm); - unsigned int first_entry = start / I915_GTT_PAGE_SIZE; - unsigned int num_entries = length / I915_GTT_PAGE_SIZE; - const gen8_pte_t scratch_pte = vm->scratch[0]->encode; - gen8_pte_t __iomem *gtt_base = - (gen8_pte_t __iomem *)ggtt->gsm + first_entry; - const int max_entries = ggtt_total_entries(ggtt) - first_entry; - int i; - - if (WARN(num_entries > max_entries, -"First entry = %d; Num entries = %d (max=%d)\n", -first_entry, num_entries, max_entries)) - num_entries = max_entries; - - for (i = 0; i < num_entries; i++) - gen8_set_pte(&gtt_base[i], scratch_pte); -} - static void bxt_vtd_ggtt_wa(struct i915_address_space *vm) { /* @@ -907,8 +886,6 @@ static int gen8_gmch_probe(struct i915_ggtt *ggtt) ggtt->vm.cleanup = gen6_gmch_remove; ggtt->vm.insert_page = gen8_ggtt_insert_page; ggtt->vm.clear_range = nop_clear_range; - if (intel_scanout_needs_vtd_wa(i915)) - ggtt->vm.clear_range = gen8_ggtt_clear_range; ggtt->vm.insert_entries = gen8_ggtt_insert_entries; @@ -1054,7 +1031,7 @@ static int gen6_gmch_probe(struct i915_ggtt *ggtt) ggtt->vm.alloc_pt_dma = alloc_pt_dma; ggtt->vm.clear_range = nop_clear_range; - if (!HAS_FULL_PPGTT(i915) || intel_scanout_needs_vtd_wa(i915)) + if (!HAS_FULL_PPGTT(i915)) ggtt->vm.clear_range = gen6_ggtt_clear_range; ggtt->vm.insert_page = gen6_ggtt_insert_page; gg
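The guard sizing in `i915_gem_object_pin_to_display_plane()` above — at least 168 scratch PTEs, widened to a full tile row for tiled scanout — can be reduced to a small helper. `vtd_guard_bytes()` is an illustrative name, not the i915 function; the driver folds the result into the `PIN_OFFSET_GUARD` pin flags instead of returning it:

```c
#include <assert.h>
#include <stdint.h>

#define GTT_PAGE_SIZE 4096u
#define VTD_GUARD (168u * GTT_PAGE_SIZE)	/* bspec-recommended 168 PTEs */

static uint32_t max_u32(uint32_t a, uint32_t b) { return a > b ? a : b; }

/*
 * Pad a scanout vma with scratch entries on both sides so VT-d
 * overfetch in any orientation lands on valid PTEs. tile_row_size is
 * stride * tile height (e.g. 8 rows for X-tiling), or 0 for linear.
 */
static uint32_t vtd_guard_bytes(uint32_t tile_row_size)
{
	uint32_t guard = VTD_GUARD;

	if (tile_row_size)
		guard = max_u32(guard, tile_row_size);

	return guard;
}
```

The win described in the commit message follows directly: instead of writing every PTE in an 8MiB GGTT map on resume, only these few hundred KiB of guard entries around scanout need scratch fills.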
Re: [Intel-gfx] [PATCH] drm/i915: Wait for scanout to stop when sanitizing planes
Quoting Ville Syrjala (2021-02-17 16:20:50) > From: Ville Syrjälä > > When we sanitize planes let's wait for the scanout to stop > before we let the subsequent code tear down the ggtt mappings > and whatnot. Cures an underrun on my ivb when I boot with > VT-d enabled and the BIOS fb gets thrown out due to stolen > being considered unusable with VT-d active. > > Signed-off-by: Ville Syrjälä > --- > drivers/gpu/drm/i915/display/intel_display.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/gpu/drm/i915/display/intel_display.c > b/drivers/gpu/drm/i915/display/intel_display.c > index acade004e8b1..3e2c192ec708 100644 > --- a/drivers/gpu/drm/i915/display/intel_display.c > +++ b/drivers/gpu/drm/i915/display/intel_display.c > @@ -2403,6 +2403,7 @@ static void intel_plane_disable_noatomic(struct > intel_crtc *crtc, > intel_set_cpu_fifo_underrun_reporting(dev_priv, crtc->pipe, > false); > > intel_disable_plane(plane, crtc_state); > + intel_wait_for_vblank(dev_priv, crtc->pipe); I could only find paths here from sanitize, so it looks safe from the prospect of adding random delays to userspace. Makes sense so, Reviewed-by: Chris Wilson -Chris
Re: [Intel-gfx] [PATCH i-g-t] tests/i915/perf_pmu: Subtest to measure sampling error for 100% busy
Quoting Tvrtko Ursulin (2021-02-16 15:59:33) > > On 16/02/2021 12:49, Chris Wilson wrote: > > Quoting Tvrtko Ursulin (2021-02-16 10:50:50) > >> From: Tvrtko Ursulin > >> > >> Test that periodic reads of engine busyness against a constant 100% load > >> are within the 5000ppm tolerance when comparing perf timestamp versus > >> counter values. > >> > >> Signed-off-by: Tvrtko Ursulin > >> --- > >> tests/i915/perf_pmu.c | 46 ++- > >> 1 file changed, 41 insertions(+), 5 deletions(-) > >> > >> diff --git a/tests/i915/perf_pmu.c b/tests/i915/perf_pmu.c > >> index 50b5c82bc472..728312be5293 100644 > >> --- a/tests/i915/perf_pmu.c > >> +++ b/tests/i915/perf_pmu.c > >> @@ -26,6 +26,7 @@ > >> #include > >> #include > >> #include > >> +#include > >> #include > >> #include > >> #include > >> @@ -46,6 +47,7 @@ > >> #include "igt_perf.h" > >> #include "igt_sysfs.h" > >> #include "igt_pm.h" > >> +#include "igt_stats.h" > >> #include "sw_sync.h" > >> > >> IGT_TEST_DESCRIPTION("Test the i915 pmu perf interface"); > >> @@ -278,8 +280,11 @@ static void end_spin(int fd, igt_spin_t *spin, > >> unsigned int flags) > >> static void > >> single(int gem_fd, const struct intel_execution_engine2 *e, unsigned int > >> flags) > >> { > >> + unsigned int loops = flags & FLAG_LONG ? 20 : 1; > >> + double err_min = DBL_MAX, err_max = -DBL_MAX; > >> unsigned long slept; > >> igt_spin_t *spin; > >> + igt_stats_t s; > >> uint64_t val; > >> int fd; > >> > >> @@ -290,11 +295,40 @@ single(int gem_fd, const struct > >> intel_execution_engine2 *e, unsigned int flags) > >> else > >> spin = NULL; > >> > >> - val = pmu_read_single(fd); > >> - slept = measured_usleep(batch_duration_ns / 1000); > >> - if (flags & TEST_TRAILING_IDLE) > >> - end_spin(gem_fd, spin, flags); > >> - val = pmu_read_single(fd) - val; > >> + igt_stats_init_with_size(&s, loops); > >> + > >> + while (--loops) { > > > > while (loops--) > > > > /o\ > > Yeah.. At least I know the oddity is related to sampling. 
Since even on > Haswell: > > (perf_pmu:1591) DEBUG: time=500207720 busy=500037022 error=-341.25ppm > (perf_pmu:1591) DEBUG: time=500252520 busy=500033517 error=-437.78ppm > (perf_pmu:1591) DEBUG: time=500187490 busy=49817 error=-375.21ppm > (perf_pmu:1591) DEBUG: time=500244871 busy=49837 error=-489.83ppm > (perf_pmu:1591) DEBUG: time=500268670 busy=49477 error=-538.10ppm > (perf_pmu:1591) DEBUG: time=500245246 busy=50432 error=-489.39ppm > (perf_pmu:1591) DEBUG: time=500245735 busy=49306 error=-492.62ppm > (perf_pmu:1591) DEBUG: time=500270045 busy=51747 error=-536.31ppm > (perf_pmu:1591) DEBUG: time=500254286 busy=48162 error=-511.99ppm > (perf_pmu:1591) DEBUG: time=500247790 busy=50347 error=-494.64ppm > (perf_pmu:1591) DEBUG: time=500250261 busy=50257 error=-499.76ppm > (perf_pmu:1591) DEBUG: time=500250005 busy=58177 error=-483.41ppm > (perf_pmu:1591) DEBUG: time=500249065 busy=41867 error=-514.14ppm > (perf_pmu:1591) DEBUG: time=500249725 busy=50371 error=-498.46ppm > (perf_pmu:1591) DEBUG: time=500250335 busy=49772 error=-500.88ppm > (perf_pmu:1591) DEBUG: time=500258691 busy=49937 error=-517.24ppm > (perf_pmu:1591) DEBUG: time=500239980 busy=51037 error=-477.66ppm > (perf_pmu:1591) DEBUG: time=500240791 busy=504999361 error=9512.56ppm > > And this last one is way more than one sampling period. I'll be thinking > about this in the background. One thing to add would be the cumulative error. It does feel like a corrective factor is applied to the sampling period. -Chris ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/i915: Readout conn_state->max_bpc
Quoting Ville Syrjala (2021-02-16 16:00:35) > From: Ville Syrjälä > > Populate conn_state->max_bpc with something sensible from the start. > Otherwise it's possible that we get to compute_sink_pipe_bpp() with > max_bpc==0. > > The specific scenario goes as follows: > 1. Initial connector state allocated with max_bpc==0 > 2. Trigger a modeset on the crtc feeding the connector, without >actually adding the connector to the commit > 3. drm_atomic_connector_check() is skipped because the >connector has not yet been added, hence conn_state->max_bpc >retains its current value > 4. drm_atomic_helper_check_modeset() -> >drm_atomic_add_affected_connectors() -> the connector >is now part of the commit > 5. compute_baseline_pipe_bpp() -> MISSING_CASE(max_bpc==0) > > Note that pipe_bpp itself may not be populated on pre-g4x machines, > in which case we just fall back to max_bpc==8 and let .compute_config() > limit the resulting pipe_bpp further if necessary. > > Cc: Daniel Vetter > Reported-by: Chris Wilson > Signed-off-by: Ville Syrjälä Tested-by: Chris Wilson -Chris
[Intel-gfx] [PATCH i-g-t] i915/gem_exec_reloc: Verify relocations with pinned scanout framebuffers
In light of the VT-d workarounds, we may introduce padding around the scanout vma. This should not affect relocations referencing the scanout on !full-ppgtt where we leak the GGTT address of scanout to users. Signed-off-by: Chris Wilson Cc: Matthew Auld Reviewed-by: Matthew Auld --- tests/i915/gem_exec_reloc.c | 102 1 file changed, 102 insertions(+) diff --git a/tests/i915/gem_exec_reloc.c b/tests/i915/gem_exec_reloc.c index cc9b8cd6d..23057f76e 100644 --- a/tests/i915/gem_exec_reloc.c +++ b/tests/i915/gem_exec_reloc.c @@ -26,7 +26,9 @@ #include "i915/gem.h" #include "igt.h" +#include "igt_device.h" #include "igt_dummyload.h" +#include "igt_kms.h" #include "sw_sync.h" IGT_TEST_DESCRIPTION("Basic sanity check of execbuf-ioctl relocations."); @@ -1286,6 +1288,83 @@ static void concurrent(int i915, int num_common) igt_assert_eq(result, 0); } +static uint32_t +pin_scanout(igt_display_t *dpy, igt_output_t *output, struct igt_fb *fb) +{ + drmModeModeInfoPtr mode = igt_output_get_mode(output); + igt_plane_t *primary; + + igt_create_pattern_fb(dpy->drm_fd, mode->hdisplay, mode->vdisplay, + DRM_FORMAT_XRGB, + LOCAL_I915_FORMAT_MOD_X_TILED, fb); + + primary = igt_output_get_plane_type(output, DRM_PLANE_TYPE_PRIMARY); + igt_plane_set_fb(primary, fb); + + igt_display_commit2(dpy, COMMIT_LEGACY); + + return fb->gem_handle; +} + +static void scanout(int i915, + igt_display_t *dpy, + const struct intel_execution_engine2 *e) +{ + struct drm_i915_gem_relocation_entry reloc; + struct drm_i915_gem_exec_object2 obj[2] = { + [1] = { .handle = batch_create(i915) }, + }; + struct drm_i915_gem_execbuffer2 execbuf = { + .buffers_ptr = to_user_pointer(obj), + .buffer_count = 2, + .flags = e->flags, + }; + igt_output_t *output; + struct igt_fb fb; + uint64_t *map; + + igt_display_reset(dpy); + + output = igt_get_single_output_for_pipe(dpy, PIPE_A); + igt_require(output); + igt_output_set_pipe(output, PIPE_A); + + /* +* Find where the scanout is in our GTT; on !full-ppgtt this will be +* the 
actual GGTT address of the scanout. +*/ + obj[0].handle = pin_scanout(dpy, output, &fb); + gem_execbuf(i915, &execbuf); + igt_info("Scanout GTT address: %#llx\n", obj[0].offset); + + /* Relocations should match the scanout address */ + reloc.target_handle = obj[0].handle; + reloc.delta = 0; + reloc.presumed_offset = -1; + reloc.offset = 4000; + obj[1].relocation_count = 1; + obj[1].relocs_ptr = to_user_pointer(&reloc); + gem_execbuf(i915, &execbuf); + igt_info("Reloc address: %#llx\n", reloc.presumed_offset); + igt_assert_eq_u64(reloc.presumed_offset, obj[0].offset); + + /* The address written into the batch should match the relocation */ + gem_sync(i915, obj[1].handle); + map = gem_mmap__device_coherent(i915, obj[1].handle, + 0, 4096, PROT_WRITE); + igt_assert_eq_u64(map[500], obj[0].offset); + munmap(map, 4096); + + /* And finally softpinning with the scanout address should work */ + obj[0].flags |= EXEC_OBJECT_PINNED; + obj[1].relocation_count = 0; + gem_execbuf(i915, &execbuf); + igt_assert_eq_u64(obj[0].offset, reloc.presumed_offset); + + gem_close(i915, obj[1].handle); + igt_remove_fb(dpy->drm_fd, &fb); +} + #define I915_GEM_GPU_DOMAINS \ (I915_GEM_DOMAIN_RENDER | \ I915_GEM_DOMAIN_SAMPLER | \ @@ -1511,6 +1590,29 @@ igt_main igt_subtest("invalid-domains") invalid_domains(fd); + igt_subtest_group { + igt_display_t display = { + .drm_fd = fd, + .n_pipes = IGT_MAX_PIPES + }; + + igt_fixture { + igt_device_set_master(fd); + kmstest_set_vt_graphics_mode(); + igt_display_require(&display, fd); + } + + igt_subtest_with_dynamic("basic-scanout") { + __for_each_physical_engine(fd, e) { + igt_dynamic_f("%s", e->name) + scanout(fd, &display, e); + } + } + + igt_fixture + igt_display_fini(&display); + } + igt_fixture close(fd); } -- 2.30.0 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [igt-dev] [PATCH i-g-t] i915/gem_exec_reloc: Verify relocations with pinned scanout framebuffers
Quoting Matthew Auld (2021-02-16 16:49:28) > On Tue, 16 Feb 2021 at 14:32, Chris Wilson wrote: > > > > In light of the VT-d workarounds, we may introduce padding around the > > scanout vma. This should not affect relocations referencing the scanout > > on !full-ppgtt where we leak the GGTT address of scanout to users. > > > > Signed-off-by: Chris Wilson > > Cc: Matthew Auld > > --- > > tests/i915/gem_exec_reloc.c | 102 > > 1 file changed, 102 insertions(+) > > > > diff --git a/tests/i915/gem_exec_reloc.c b/tests/i915/gem_exec_reloc.c > > index cc9b8cd6d..98960bb84 100644 > > --- a/tests/i915/gem_exec_reloc.c > > +++ b/tests/i915/gem_exec_reloc.c > > @@ -26,7 +26,9 @@ > > > > #include "i915/gem.h" > > #include "igt.h" > > +#include "igt_device.h" > > #include "igt_dummyload.h" > > +#include "igt_kms.h" > > #include "sw_sync.h" > > > > IGT_TEST_DESCRIPTION("Basic sanity check of execbuf-ioctl relocations."); > > @@ -1286,6 +1288,83 @@ static void concurrent(int i915, int num_common) > > igt_assert_eq(result, 0); > > } > > > > +static uint32_t > > +pin_scanout(igt_display_t *dpy, igt_output_t *output, struct igt_fb *fb) > > +{ > > + drmModeModeInfoPtr mode; > > + igt_plane_t *primary; > > + > > + mode = igt_output_get_mode(output); > > + > > + igt_create_pattern_fb(dpy->drm_fd, mode->hdisplay, mode->vdisplay, > > + DRM_FORMAT_XRGB, > > + LOCAL_I915_FORMAT_MOD_X_TILED, fb); > > + > > + primary = igt_output_get_plane_type(output, DRM_PLANE_TYPE_PRIMARY); > > + igt_plane_set_fb(primary, fb); > > + > > + igt_display_commit2(dpy, COMMIT_LEGACY); > > + > > + return fb->gem_handle; > > +} > > + > > +static void scanout(int i915, > > + igt_display_t *dpy, > > + const struct intel_execution_engine2 *e) > > Missing feeding the engine into the execbuf? Oops. Pretty pointless in making it loop over the engines without selecting one! -Chris ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH i-g-t] i915/gem_exec_reloc: Verify relocations with pinned scanout framebuffers
In light of the VT-d workarounds, we may introduce padding around the scanout vma. This should not affect relocations referencing the scanout on !full-ppgtt where we leak the GGTT address of scanout to users. Signed-off-by: Chris Wilson Cc: Matthew Auld --- tests/i915/gem_exec_reloc.c | 102 1 file changed, 102 insertions(+) diff --git a/tests/i915/gem_exec_reloc.c b/tests/i915/gem_exec_reloc.c index cc9b8cd6d..98960bb84 100644 --- a/tests/i915/gem_exec_reloc.c +++ b/tests/i915/gem_exec_reloc.c @@ -26,7 +26,9 @@ #include "i915/gem.h" #include "igt.h" +#include "igt_device.h" #include "igt_dummyload.h" +#include "igt_kms.h" #include "sw_sync.h" IGT_TEST_DESCRIPTION("Basic sanity check of execbuf-ioctl relocations."); @@ -1286,6 +1288,83 @@ static void concurrent(int i915, int num_common) igt_assert_eq(result, 0); } +static uint32_t +pin_scanout(igt_display_t *dpy, igt_output_t *output, struct igt_fb *fb) +{ + drmModeModeInfoPtr mode; + igt_plane_t *primary; + + mode = igt_output_get_mode(output); + + igt_create_pattern_fb(dpy->drm_fd, mode->hdisplay, mode->vdisplay, + DRM_FORMAT_XRGB, + LOCAL_I915_FORMAT_MOD_X_TILED, fb); + + primary = igt_output_get_plane_type(output, DRM_PLANE_TYPE_PRIMARY); + igt_plane_set_fb(primary, fb); + + igt_display_commit2(dpy, COMMIT_LEGACY); + + return fb->gem_handle; +} + +static void scanout(int i915, + igt_display_t *dpy, + const struct intel_execution_engine2 *e) +{ + struct drm_i915_gem_relocation_entry reloc; + struct drm_i915_gem_exec_object2 obj[2] = { + [1] = { .handle = batch_create(i915) }, + }; + struct drm_i915_gem_execbuffer2 execbuf = { + .buffers_ptr = to_user_pointer(obj), + .buffer_count = 2, + }; + igt_output_t *output; + struct igt_fb fb; + uint64_t *map; + + igt_display_reset(dpy); + + output = igt_get_single_output_for_pipe(dpy, PIPE_A); + igt_require(output); + igt_output_set_pipe(output, PIPE_A); + + /* +* Find where the scanout is in our GTT; on !full-ppgtt this will be +* the actual GGTT address of the scanout. 
+*/ + obj[0].handle = pin_scanout(dpy, output, &fb); + gem_execbuf(i915, &execbuf); + igt_info("Scanout GTT address: %#llx\n", obj[0].offset); + + /* Relocations should match the scanout address */ + reloc.target_handle = obj[0].handle; + reloc.delta = 0; + reloc.presumed_offset = -1; + reloc.offset = 4000; + obj[1].relocation_count = 1; + obj[1].relocs_ptr = to_user_pointer(&reloc); + gem_execbuf(i915, &execbuf); + igt_info("Reloc address: %#llx\n", reloc.presumed_offset); + igt_assert_eq_u64(reloc.presumed_offset, obj[0].offset); + + gem_sync(i915, obj[1].handle); + map = gem_mmap__device_coherent(i915, obj[1].handle, + 0, 4096, PROT_WRITE); + igt_assert_eq_u64(map[500], obj[0].offset); + munmap(map, 4096); + + /* And finally softpinning with the scanout address should work */ + obj[0].flags |= EXEC_OBJECT_PINNED; + obj[1].relocation_count = 0; + gem_execbuf(i915, &execbuf); + igt_assert_eq_u64(obj[0].offset, reloc.presumed_offset); + + gem_close(i915, obj[1].handle); + igt_remove_fb(dpy->drm_fd, &fb); +} + #define I915_GEM_GPU_DOMAINS \ (I915_GEM_DOMAIN_RENDER | \ I915_GEM_DOMAIN_SAMPLER | \ @@ -1511,6 +1590,29 @@ igt_main igt_subtest("invalid-domains") invalid_domains(fd); + igt_subtest_group { + igt_display_t display = { + .drm_fd = fd, + .n_pipes = IGT_MAX_PIPES + }; + + igt_fixture { + igt_device_set_master(fd); + kmstest_set_vt_graphics_mode(); + igt_display_require(&display, fd); + } + + igt_subtest_with_dynamic("basic-scanout") { + __for_each_physical_engine(fd, e) { + igt_dynamic_f("%s", e->name) + scanout(fd, &display, e); + } + } + + igt_fixture + igt_display_fini(&display); + } + igt_fixture close(fd); } -- 2.30.0 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH i-g-t] tests/i915/perf_pmu: Subtest to measure sampling error for 100% busy
Quoting Tvrtko Ursulin (2021-02-16 10:50:50) > From: Tvrtko Ursulin > > Test that periodic reads of engine busyness against a constant 100% load > are within the 5000ppm tolerance when comparing perf timestamp versus > counter values. > > Signed-off-by: Tvrtko Ursulin > --- > tests/i915/perf_pmu.c | 46 ++- > 1 file changed, 41 insertions(+), 5 deletions(-) > > diff --git a/tests/i915/perf_pmu.c b/tests/i915/perf_pmu.c > index 50b5c82bc472..728312be5293 100644 > --- a/tests/i915/perf_pmu.c > +++ b/tests/i915/perf_pmu.c > @@ -26,6 +26,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -46,6 +47,7 @@ > #include "igt_perf.h" > #include "igt_sysfs.h" > #include "igt_pm.h" > +#include "igt_stats.h" > #include "sw_sync.h" > > IGT_TEST_DESCRIPTION("Test the i915 pmu perf interface"); > @@ -278,8 +280,11 @@ static void end_spin(int fd, igt_spin_t *spin, unsigned > int flags) > static void > single(int gem_fd, const struct intel_execution_engine2 *e, unsigned int > flags) > { > + unsigned int loops = flags & FLAG_LONG ? 20 : 1; > + double err_min = DBL_MAX, err_max = -DBL_MAX; > unsigned long slept; > igt_spin_t *spin; > + igt_stats_t s; > uint64_t val; > int fd; > > @@ -290,11 +295,40 @@ single(int gem_fd, const struct intel_execution_engine2 > *e, unsigned int flags) > else > spin = NULL; > > - val = pmu_read_single(fd); > - slept = measured_usleep(batch_duration_ns / 1000); > - if (flags & TEST_TRAILING_IDLE) > - end_spin(gem_fd, spin, flags); > - val = pmu_read_single(fd) - val; > + igt_stats_init_with_size(&s, loops); > + > + while (--loops) { while (loops--) /o\ -Chris ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH i-g-t] tests/i915/perf_pmu: Subtest to measure sampling error for 100% busy
Quoting Tvrtko Ursulin (2021-02-16 10:50:50) > From: Tvrtko Ursulin > > Test that periodic reads of engine busyness against a constant 100% load > are within the 5000ppm tolerance when comparing perf timestamp versus > counter values. > > Signed-off-by: Tvrtko Ursulin > --- > tests/i915/perf_pmu.c | 46 ++- > 1 file changed, 41 insertions(+), 5 deletions(-) > > diff --git a/tests/i915/perf_pmu.c b/tests/i915/perf_pmu.c > index 50b5c82bc472..728312be5293 100644 > --- a/tests/i915/perf_pmu.c > +++ b/tests/i915/perf_pmu.c > @@ -26,6 +26,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -46,6 +47,7 @@ > #include "igt_perf.h" > #include "igt_sysfs.h" > #include "igt_pm.h" > +#include "igt_stats.h" > #include "sw_sync.h" > > IGT_TEST_DESCRIPTION("Test the i915 pmu perf interface"); > @@ -278,8 +280,11 @@ static void end_spin(int fd, igt_spin_t *spin, unsigned > int flags) > static void > single(int gem_fd, const struct intel_execution_engine2 *e, unsigned int > flags) > { > + unsigned int loops = flags & FLAG_LONG ? 20 : 1; > + double err_min = DBL_MAX, err_max = -DBL_MAX; > unsigned long slept; > igt_spin_t *spin; > + igt_stats_t s; > uint64_t val; > int fd; Something to record is that TEST_TRAILING_IDLE and TEST_LONG are mutually exclusive. So assert(igt_hweight(flags & (TEST_TRAILING_IDLE | TEST_LONG)) <= 1); ? -Chris ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/i915/gem: Add a check for object size for corner cases
Quoting Ram Moon, AnandX (2021-02-16 12:05:23) > Hi Chris, > > -Original Message- > From: dri-devel On Behalf Of Chris > Wilson > Sent: Monday, February 15, 2021 6:10 PM > To: Auld, Matthew ; Ram Moon, AnandX > ; Surendrakumar Upadhyay, TejaskumarX > ; Ursulin, Tvrtko > ; Jani Nikula ; > dri-de...@lists.freedesktop.org; intel-gfx@lists.freedesktop.org > Subject: Re: [Intel-gfx] [PATCH] drm/i915/gem: Add a check for object size > for corner cases > > Quoting Ram Moon, AnandX (2021-02-15 12:29:17) > > Hi Chris, > > > > -----Original Message- > > From: dri-devel On Behalf Of > > Chris Wilson > > Sent: Wednesday, February 10, 2021 4:15 PM > > To: Ram Moon, AnandX ; Jani Nikula > > ; Auld, Matthew ; > > Surendrakumar Upadhyay, TejaskumarX > > ; Ursulin, Tvrtko > > ; dri-de...@lists.freedesktop.org; > > intel-gfx@lists.freedesktop.org > > Cc: Ram Moon, AnandX > > Subject: Re: [Intel-gfx] [PATCH] drm/i915/gem: Add a check for object > > size for corner cases > > > > Quoting Anand Moon (2021-02-10 07:59:29) > > > Add check for object size to return appropriate error -E2BIG or > > > -EINVAL to avoid WARM_ON and successful return for some testcase. > > > > No. You miss the point of having those warnings. We need to inspect the > > code to remove the last remaining "int pagenum", and then we can remove the > > GEM_WARN_ON((sz) >> PAGE_SHIFT > INT_MAX). These are not emitted for users, > > only for us to motivate us into finally fixing the code. > > -Chris > > > > Yes, I got your point these check are meant to catch up integer overflow. > > > > I have check with the IGT testcase case _gem_create_ and > > _gem_userptr_blits_ which fails pass size *-1ull << 32* left shift > > and *0~* which leads to integer overflow ie _negative_ size passed to > > create i915_gem_create via ioctl this leads to GM_WARN_ON. > > > > Can we drop these testcase so that we don't break the kernel ? > > The kernel rejects the ioctl with the expected errno. 
We leave a warning > purely for the benefit of CI, only when compiled for debugging and not by > default, so that we have a persistent reminder to do the code review. > What's broken? > -Chris > > All though the testcase return with appropriate error we observe kernel taint > see below. Which is an intentional taint added for CI so that we get a warning and a visible bug so that we can allocate resources to _fix_ the underlying problems in the code. -Chris ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH i-g-t] tests/i915/perf_pmu: Subtest to measure sampling error for 100% busy
Quoting Tvrtko Ursulin (2021-02-16 10:50:50) > From: Tvrtko Ursulin > > Test that periodic reads of engine busyness against a constant 100% load > are within the 5000ppm tolerance when comparing perf timestamp versus > counter values. We've previously only included the accuracy tests on platforms that claim to be accurate. Can we rephrase the goal of the test such that it sounds more like a universal truth that should also be valid for the sampling backends? "The busyness should never exceed 100% (within a margin of error)..." And since accuracy is currently protected by has_busy_stats, let's try to find a different name. -Chris
[Intel-gfx] [PATCH] drm/i915: Introduce guard pages to i915_vma
Introduce the concept of padding the i915_vma with guard pages before and aft. The major consequence is that all ordinary uses of i915_vma must use i915_vma_offset/i915_vma_size and not i915_vma.node.start/size directly, as the drm_mm_node will include the guard pages that surround our object. The biggest conundrum is how exactly to mix requesting a fixed address with guard pages, particularly through the existing uABI. The user does not know about guard pages, so such must be transparent to the user, and so the execobj.offset must be that of the object itself excluding the guard. So a PIN_OFFSET_FIXED must then be exclusive of the guard pages. The caveat is that some placements will be impossible with guard pages, as wrap arounds need to be avoided, and the vma itself will require a larger node. We must not report EINVAL but ENOSPC as these are unavailable locations within the GTT rather than conflicting user requirements. In the next patch, we start using guard pages for scanout objects. While these are limited to GGTT vma, on a few platforms these vma (or at least an alias of the vma) is shared with userspace, so we may leak the existence of such guards if we are not careful to ensure that the execobj.offset is transparent and excludes the guards. (On such platforms like ivb, without full-ppgtt, userspace has to use relocations so the presence of more untouchable regions within its GTT should be of no further issue.) v2: Include the guard range in the overflow checks and placement restrictions. v3: Fix the check on the placement upper bound. The requested user offset is relative to the guard offset (not the node.start) and so we should not include the initial guard offset again when computing the upper bound of the node.
Signed-off-by: Chris Wilson Cc: Matthew Auld Reviewed-by: Matthew Auld --- drivers/gpu/drm/i915/gt/intel_ggtt.c | 12 ++-- drivers/gpu/drm/i915/i915_vma.c | 28 ++- drivers/gpu/drm/i915/i915_vma.h | 5 +++-- drivers/gpu/drm/i915/i915_vma_types.h | 3 ++- 4 files changed, 38 insertions(+), 10 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c index c5803c434d33..6b326138e765 100644 --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c @@ -238,8 +238,12 @@ static void gen8_ggtt_insert_entries(struct i915_address_space *vm, gte = (gen8_pte_t __iomem *)ggtt->gsm; gte += vma->node.start / I915_GTT_PAGE_SIZE; - end = gte + vma->node.size / I915_GTT_PAGE_SIZE; + end = gte + vma->guard / I915_GTT_PAGE_SIZE; + while (gte < end) + gen8_set_pte(gte++, vm->scratch[0]->encode); + + end += (vma->node.size - vma->guard) / I915_GTT_PAGE_SIZE; for_each_sgt_daddr(addr, iter, vma->pages) gen8_set_pte(gte++, pte_encode | addr); GEM_BUG_ON(gte > end); @@ -289,8 +293,12 @@ static void gen6_ggtt_insert_entries(struct i915_address_space *vm, gte = (gen6_pte_t __iomem *)ggtt->gsm; gte += vma->node.start / I915_GTT_PAGE_SIZE; - end = gte + vma->node.size / I915_GTT_PAGE_SIZE; + end = gte + vma->guard / I915_GTT_PAGE_SIZE; + while (gte < end) + gen8_set_pte(gte++, vm->scratch[0]->encode); + + end += (vma->node.size - vma->guard) / I915_GTT_PAGE_SIZE; for_each_sgt_daddr(addr, iter, vma->pages) iowrite32(vm->pte_encode(addr, level, flags), gte++); GEM_BUG_ON(gte > end); diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c index 748f5ea1ba04..01e62edf4704 100644 --- a/drivers/gpu/drm/i915/i915_vma.c +++ b/drivers/gpu/drm/i915/i915_vma.c @@ -623,7 +623,7 @@ bool i915_gem_valid_gtt_space(struct i915_vma *vma, unsigned long color) static int i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags) { - unsigned long color; + unsigned long color, guard; u64 start, end; int ret; 
@@ -631,7 +631,7 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags) GEM_BUG_ON(drm_mm_node_allocated(&vma->node)); size = max(size, vma->size); - alignment = max(alignment, vma->display_alignment); + alignment = max_t(typeof(alignment), alignment, vma->display_alignment); if (flags & PIN_MAPPABLE) { size = max_t(typeof(size), size, vma->fence_size); alignment = max_t(typeof(alignment), @@ -642,6 +642,9 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags) GEM_BUG_ON(!IS_ALIGNED(alignment, I915_GTT_MIN_ALIGNMENT)); GEM_BUG_ON(!is_power_of_2(alignment)); + guard = vma->guard; /* retain guard across rebinds */ + guard = ALIGN(guard, alignment); + start = flags & PIN_OFFSET_BIAS ? flags &
Re: [Intel-gfx] [PATCH 2/3] drm/i915: Introduce guard pages to i915_vma
Quoting Matthew Auld (2021-02-15 19:31:39) > On Mon, 15 Feb 2021 at 18:15, Chris Wilson wrote: > > > > Quoting Matthew Auld (2021-02-15 18:04:08) > > > On Mon, 15 Feb 2021 at 15:56, Chris Wilson > > > wrote: > > > > > > > > Introduce the concept of padding the i915_vma with guard pages before > > > > and aft. The major consequence is that all ordinary uses of i915_vma > > > > must use i915_vma_offset/i915_vma_size and not i915_vma.node.start/size > > > > directly, as the drm_mm_node will include the guard pages that surround > > > > our object. > > > > > > > > The biggest connundrum is how exactly to mix requesting a fixed address > > > > with guard pages, particularly through the existing uABI. The user does > > > > not know about guard pages, so such must be transparent to the user, and > > > > so the execobj.offset must be that of the object itself excluding the > > > > guard. So a PIN_OFFSET_FIXED must then be exclusive of the guard pages. > > > > The caveat is that some placements will be impossible with guard pages, > > > > as wrap arounds need to be avoided, and the vma itself will require a > > > > larger node. We must we not report EINVAL but ENOSPC as these are > > > > unavailable locations within the GTT rather than conflicting user > > > > requirements. > > > > > > > > In the next patch, we start using guard pages for scanout objects. While > > > > these are limited to GGTT vma, on a few platforms these vma (or at least > > > > an alias of the vma) is shared with userspace, so we may leak the > > > > existence of such guards if we are not careful to ensure that the > > > > execobj.offset is transparent and excludes the guards. (On such > > > > platforms > > > > like ivb, without full-ppgtt, userspace has to use relocations so the > > > > presence of more untouchable regions within its GTT such be of no > > > > further > > > > issue.) > > > > > > > > v2: Include the guard range in the overflow checks and placement > > > > restrictions. 
> > > > > > > > Signed-off-by: Chris Wilson > > > > Cc: Matthew Auld > > > > --- > > > > drivers/gpu/drm/i915/gt/intel_ggtt.c | 12 ++-- > > > > drivers/gpu/drm/i915/i915_vma.c | 28 ++- > > > > drivers/gpu/drm/i915/i915_vma.h | 5 +++-- > > > > drivers/gpu/drm/i915/i915_vma_types.h | 3 ++- > > > > 4 files changed, 38 insertions(+), 10 deletions(-) > > > > > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c > > > > b/drivers/gpu/drm/i915/gt/intel_ggtt.c > > > > index c5803c434d33..6b326138e765 100644 > > > > --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c > > > > +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c > > > > @@ -238,8 +238,12 @@ static void gen8_ggtt_insert_entries(struct > > > > i915_address_space *vm, > > > > > > > > gte = (gen8_pte_t __iomem *)ggtt->gsm; > > > > gte += vma->node.start / I915_GTT_PAGE_SIZE; > > > > - end = gte + vma->node.size / I915_GTT_PAGE_SIZE; > > > > > > > > + end = gte + vma->guard / I915_GTT_PAGE_SIZE; > > > > + while (gte < end) > > > > + gen8_set_pte(gte++, vm->scratch[0]->encode); > > > > + > > > > + end += (vma->node.size - vma->guard) / I915_GTT_PAGE_SIZE; > > > > for_each_sgt_daddr(addr, iter, vma->pages) > > > > gen8_set_pte(gte++, pte_encode | addr); > > > > GEM_BUG_ON(gte > end); > > > > @@ -289,8 +293,12 @@ static void gen6_ggtt_insert_entries(struct > > > > i915_address_space *vm, > > > > > > > > gte = (gen6_pte_t __iomem *)ggtt->gsm; > > > > gte += vma->node.start / I915_GTT_PAGE_SIZE; > > > > - end = gte + vma->node.size / I915_GTT_PAGE_SIZE; > > > > > > > > + end = gte + vma->guard / I915_GTT_PAGE_SIZE; > > > > + while (gte < end) > > > > + gen8_set_pte(gte++, vm->scratch[0]->encode); > > > > + > > > > + end += (vma->node.size - vma->guard) / I915_GTT_PAGE_SIZE; > > > > for_each_sgt_daddr(addr, iter, vma->pages) > > > >
Re: [Intel-gfx] [PATCH 2/3] drm/i915: Introduce guard pages to i915_vma
Quoting Matthew Auld (2021-02-15 18:04:08) > On Mon, 15 Feb 2021 at 15:56, Chris Wilson wrote: > > > > Introduce the concept of padding the i915_vma with guard pages before > > and aft. The major consequence is that all ordinary uses of i915_vma > > must use i915_vma_offset/i915_vma_size and not i915_vma.node.start/size > > directly, as the drm_mm_node will include the guard pages that surround > > our object. > > > > The biggest connundrum is how exactly to mix requesting a fixed address > > with guard pages, particularly through the existing uABI. The user does > > not know about guard pages, so such must be transparent to the user, and > > so the execobj.offset must be that of the object itself excluding the > > guard. So a PIN_OFFSET_FIXED must then be exclusive of the guard pages. > > The caveat is that some placements will be impossible with guard pages, > > as wrap arounds need to be avoided, and the vma itself will require a > > larger node. We must we not report EINVAL but ENOSPC as these are > > unavailable locations within the GTT rather than conflicting user > > requirements. > > > > In the next patch, we start using guard pages for scanout objects. While > > these are limited to GGTT vma, on a few platforms these vma (or at least > > an alias of the vma) is shared with userspace, so we may leak the > > existence of such guards if we are not careful to ensure that the > > execobj.offset is transparent and excludes the guards. (On such platforms > > like ivb, without full-ppgtt, userspace has to use relocations so the > > presence of more untouchable regions within its GTT such be of no further > > issue.) > > > > v2: Include the guard range in the overflow checks and placement > > restrictions. 
> > > > Signed-off-by: Chris Wilson > > Cc: Matthew Auld > > --- > > drivers/gpu/drm/i915/gt/intel_ggtt.c | 12 ++-- > > drivers/gpu/drm/i915/i915_vma.c | 28 ++- > > drivers/gpu/drm/i915/i915_vma.h | 5 +++-- > > drivers/gpu/drm/i915/i915_vma_types.h | 3 ++- > > 4 files changed, 38 insertions(+), 10 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c > > b/drivers/gpu/drm/i915/gt/intel_ggtt.c > > index c5803c434d33..6b326138e765 100644 > > --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c > > +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c > > @@ -238,8 +238,12 @@ static void gen8_ggtt_insert_entries(struct > > i915_address_space *vm, > > > > gte = (gen8_pte_t __iomem *)ggtt->gsm; > > gte += vma->node.start / I915_GTT_PAGE_SIZE; > > - end = gte + vma->node.size / I915_GTT_PAGE_SIZE; > > > > + end = gte + vma->guard / I915_GTT_PAGE_SIZE; > > + while (gte < end) > > + gen8_set_pte(gte++, vm->scratch[0]->encode); > > + > > + end += (vma->node.size - vma->guard) / I915_GTT_PAGE_SIZE; > > for_each_sgt_daddr(addr, iter, vma->pages) > > gen8_set_pte(gte++, pte_encode | addr); > > GEM_BUG_ON(gte > end); > > @@ -289,8 +293,12 @@ static void gen6_ggtt_insert_entries(struct > > i915_address_space *vm, > > > > gte = (gen6_pte_t __iomem *)ggtt->gsm; > > gte += vma->node.start / I915_GTT_PAGE_SIZE; > > - end = gte + vma->node.size / I915_GTT_PAGE_SIZE; > > > > + end = gte + vma->guard / I915_GTT_PAGE_SIZE; > > + while (gte < end) > > + gen8_set_pte(gte++, vm->scratch[0]->encode); > > + > > + end += (vma->node.size - vma->guard) / I915_GTT_PAGE_SIZE; > > for_each_sgt_daddr(addr, iter, vma->pages) > > iowrite32(vm->pte_encode(addr, level, flags), gte++); > > GEM_BUG_ON(gte > end); > > diff --git a/drivers/gpu/drm/i915/i915_vma.c > > b/drivers/gpu/drm/i915/i915_vma.c > > index 748f5ea1ba04..31d0f8b64ec0 100644 > > --- a/drivers/gpu/drm/i915/i915_vma.c > > +++ b/drivers/gpu/drm/i915/i915_vma.c > > @@ -623,7 +623,7 @@ bool i915_gem_valid_gtt_space(struct i915_vma *vma, > > 
unsigned long color) > > static int > > i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags) > > { > > - unsigned long color; > > + unsigned long color, guard; > > u64 start, end; > > int ret; > > > > @@ -631,7 +631,7 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 > > alignment, u64 flags) > > GEM_BUG_ON(drm_mm_node_allocated(&vma-&
Re: [Intel-gfx] [PATCH] drm/i915/selftest: Fix an error code in mock_context_barrier()
Quoting Dan Carpenter (2021-02-15 15:58:27) > If the igt_request_alloc() call fails then this should return a > negative error code, but currently it returns success. > > Fixes: 85fddf0b0027 ("drm/i915: Introduce a context barrier callback") > Signed-off-by: Dan Carpenter Reviewed-by: Chris Wilson Looks to be the only missing one in that file; smatch will no doubt find more. -Chris ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH 3/3] drm/i915: Refine VT-d scanout workaround
VT-d may cause overfetch of the scanout PTE, both before and after the vma (depending on the scanout orientation). bspec recommends that we provide a tile-row in either direction, and suggests using 160 PTE, warning that the accesses will wrap around the ends of the GGTT. Currently, we fill the entire GGTT with scratch pages when using VT-d to always ensure there are valid entries around every vma, including scanout. However, writing every PTE is slow as on recent devices we perform 8MiB of uncached writes, incurring an extra 100ms during resume. If instead we focus on only putting guard pages around scanout, we can avoid touching the whole GGTT. To avoid having to introduce extra nodes around each scanout vma, we adjust the scanout drm_mm_node to be smaller than the allocated space, and fixup the extra PTE during dma binding. v2: Move the guard from modifying drm_mm_node.start, which is still used by the drm_mm itself, into an adjustment of node.start at the point of use. v3: Pass the requested guard padding from the caller, so we can drop the VT-d w/a knowledge from the i915_vma allocator. 
Signed-off-by: Chris Wilson Cc: Ville Syrjälä Cc: Matthew Auld Reviewed-by: Matthew Auld # v2 --- drivers/gpu/drm/i915/gem/i915_gem_domain.c | 6 ++ drivers/gpu/drm/i915/gt/intel_ggtt.c | 25 +- drivers/gpu/drm/i915/i915_gem_gtt.h| 1 + drivers/gpu/drm/i915/i915_vma.c| 8 +++ 4 files changed, 16 insertions(+), 24 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c index 0478b069c202..4b577cf7ec7e 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c @@ -16,6 +16,8 @@ #include "i915_gem_lmem.h" #include "i915_gem_mman.h" +#define VTD_GUARD (160 * I915_GTT_PAGE_SIZE) /* 160 PTE padding */ + static bool gpu_write_needs_clflush(struct drm_i915_gem_object *obj) { return !(obj->cache_level == I915_CACHE_NONE || @@ -345,6 +347,10 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj, if (ret) goto err; + /* VT-d may overfetch before/after the vma, so pad with scratch */ + if (intel_scanout_needs_vtd_wa(i915)) + flags |= PIN_OFFSET_GUARD | VTD_GUARD; + /* * As the user may map the buffer once pinned in the display plane * (e.g. 
libkms for the bootup splash), we have to ensure that we diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c index 6b326138e765..251b50884d1c 100644 --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c @@ -319,27 +319,6 @@ static void nop_clear_range(struct i915_address_space *vm, { } -static void gen8_ggtt_clear_range(struct i915_address_space *vm, - u64 start, u64 length) -{ - struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm); - unsigned int first_entry = start / I915_GTT_PAGE_SIZE; - unsigned int num_entries = length / I915_GTT_PAGE_SIZE; - const gen8_pte_t scratch_pte = vm->scratch[0]->encode; - gen8_pte_t __iomem *gtt_base = - (gen8_pte_t __iomem *)ggtt->gsm + first_entry; - const int max_entries = ggtt_total_entries(ggtt) - first_entry; - int i; - - if (WARN(num_entries > max_entries, -"First entry = %d; Num entries = %d (max=%d)\n", -first_entry, num_entries, max_entries)) - num_entries = max_entries; - - for (i = 0; i < num_entries; i++) - gen8_set_pte(&gtt_base[i], scratch_pte); -} - static void bxt_vtd_ggtt_wa(struct i915_address_space *vm) { /* @@ -907,8 +886,6 @@ static int gen8_gmch_probe(struct i915_ggtt *ggtt) ggtt->vm.cleanup = gen6_gmch_remove; ggtt->vm.insert_page = gen8_ggtt_insert_page; ggtt->vm.clear_range = nop_clear_range; - if (intel_scanout_needs_vtd_wa(i915)) - ggtt->vm.clear_range = gen8_ggtt_clear_range; ggtt->vm.insert_entries = gen8_ggtt_insert_entries; @@ -1054,7 +1031,7 @@ static int gen6_gmch_probe(struct i915_ggtt *ggtt) ggtt->vm.alloc_pt_dma = alloc_pt_dma; ggtt->vm.clear_range = nop_clear_range; - if (!HAS_FULL_PPGTT(i915) || intel_scanout_needs_vtd_wa(i915)) + if (!HAS_FULL_PPGTT(i915)) ggtt->vm.clear_range = gen6_ggtt_clear_range; ggtt->vm.insert_page = gen6_ggtt_insert_page; ggtt->vm.insert_entries = gen6_ggtt_insert_entries; diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h index c9b0ee5e1d23..f3ae9afdee15 100644 --- 
a/drivers/gpu/drm/i915/i915_gem_gtt.h +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h @@ -41,6 +41,7 @@ int i915_gem_gtt_insert(struct i915_address_space *vm, #define PIN_HIGH BIT_UL
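As an aside for readers following the flag plumbing above: `flags |= PIN_OFFSET_GUARD | VTD_GUARD` carries the requested padding in the same word as the pin mode bits, in the style of the existing PIN_OFFSET_BIAS/PIN_OFFSET_FIXED convention. A standalone sketch of that packing follows; the bit positions and helper names here are illustrative assumptions, not the driver's actual PIN_* layout:

```c
#include <assert.h>
#include <stdint.h>

#define GTT_PAGE_SIZE    4096ull
#define PIN_OFFSET_GUARD (1ull << 7)     /* illustrative mode bit */
#define PIN_OFFSET_MASK  (~0ull << 12)   /* page-aligned payload bits */

/* 160 PTEs of scratch on either side, as the VT-d w/a recommends. */
#define VTD_GUARD (160ull * GTT_PAGE_SIZE)

/* Combine a guard request with other pin flags; the guard size lives in
 * the page-aligned upper bits, so it cannot collide with the mode bits. */
static uint64_t pin_with_guard(uint64_t flags, uint64_t guard)
{
	assert((guard & ~PIN_OFFSET_MASK) == 0); /* must be page aligned */
	return flags | PIN_OFFSET_GUARD | guard;
}

/* Recover the requested guard on the consumer side of the interface. */
static uint64_t pin_guard(uint64_t flags)
{
	return flags & PIN_OFFSET_GUARD ? flags & PIN_OFFSET_MASK : 0;
}
```

The design point being modelled: one u64 of flags can carry both "a guard was requested" and "how much", because the payload is known to be page aligned.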
[Intel-gfx] [PATCH 1/3] drm/i915: Wrap all access to i915_vma.node.start|size
We already wrap i915_vma.node.start for use with the GGTT, as there we can perform additional sanity checks that the node belongs to the GGTT and fits within the 32b registers. In the next couple of patches, we will introduce guard pages around the objects _inside_ the drm_mm_node allocation. That is we will offset the vma->pages so that the first page is at drm_mm_node.start + vma->guard (not 0 as is currently the case). All users must then not use i915_vma.node.start directly, but compute the guard offset, thus all users are converted to use a i915_vma_offset() wrapper. The notable exceptions are the selftests that are testing exact behaviour of i915_vma_pin/i915_vma_insert. Signed-off-by: Chris Wilson Reviewed-by: Matthew Auld --- drivers/gpu/drm/i915/display/intel_fbdev.c| 6 +-- .../gpu/drm/i915/gem/i915_gem_client_blt.c| 3 +- .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 41 ++- drivers/gpu/drm/i915/gem/i915_gem_mman.c | 2 +- .../gpu/drm/i915/gem/i915_gem_object_blt.c| 13 +++--- drivers/gpu/drm/i915/gem/i915_gem_shrinker.c | 2 +- drivers/gpu/drm/i915/gem/i915_gem_tiling.c| 4 +- .../gpu/drm/i915/gem/selftests/huge_pages.c | 2 +- .../i915/gem/selftests/i915_gem_client_blt.c | 15 +++ .../drm/i915/gem/selftests/i915_gem_context.c | 19 ++--- .../drm/i915/gem/selftests/i915_gem_mman.c| 2 +- .../drm/i915/gem/selftests/igt_gem_utils.c| 7 ++-- drivers/gpu/drm/i915/gt/gen6_ppgtt.c | 2 +- drivers/gpu/drm/i915/gt/gen7_renderclear.c| 2 +- drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 8 ++-- drivers/gpu/drm/i915/gt/intel_ggtt.c | 5 ++- drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c | 3 +- drivers/gpu/drm/i915/gt/intel_ppgtt.c | 7 +++- drivers/gpu/drm/i915/gt/intel_renderstate.c | 2 +- .../gpu/drm/i915/gt/intel_ring_submission.c | 2 +- drivers/gpu/drm/i915/gt/selftest_engine_cs.c | 12 +++--- drivers/gpu/drm/i915/gt/selftest_execlists.c | 18 drivers/gpu/drm/i915/gt/selftest_hangcheck.c | 17 drivers/gpu/drm/i915/gt/selftest_lrc.c| 16 .../drm/i915/gt/selftest_ring_submission.c| 2 +- 
drivers/gpu/drm/i915/gt/selftest_rps.c| 12 +++--- .../gpu/drm/i915/gt/selftest_workarounds.c| 9 ++-- drivers/gpu/drm/i915/i915_cmd_parser.c| 4 +- drivers/gpu/drm/i915/i915_debugfs.c | 3 +- drivers/gpu/drm/i915/i915_gpu_error.c | 4 +- drivers/gpu/drm/i915/i915_perf.c | 2 +- drivers/gpu/drm/i915/i915_scheduler.c | 4 +- drivers/gpu/drm/i915/i915_vma.c | 21 +- drivers/gpu/drm/i915/i915_vma.h | 23 +-- drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 12 +- drivers/gpu/drm/i915/selftests/i915_request.c | 20 - .../gpu/drm/i915/selftests/i915_scheduler.c | 4 +- drivers/gpu/drm/i915/selftests/igt_spinner.c | 11 +++-- 38 files changed, 196 insertions(+), 145 deletions(-) diff --git a/drivers/gpu/drm/i915/display/intel_fbdev.c b/drivers/gpu/drm/i915/display/intel_fbdev.c index 07db8e83f98e..2a72fb3c8416 100644 --- a/drivers/gpu/drm/i915/display/intel_fbdev.c +++ b/drivers/gpu/drm/i915/display/intel_fbdev.c @@ -237,8 +237,8 @@ static int intelfb_create(struct drm_fb_helper *helper, /* Our framebuffer is the entirety of fbdev's system memory */ info->fix.smem_start = - (unsigned long)(ggtt->gmadr.start + vma->node.start); - info->fix.smem_len = vma->node.size; + (unsigned long)(ggtt->gmadr.start + i915_ggtt_offset(vma)); + info->fix.smem_len = vma->size; vaddr = i915_vma_pin_iomap(vma); if (IS_ERR(vaddr)) { @@ -248,7 +248,7 @@ static int intelfb_create(struct drm_fb_helper *helper, goto out_unpin; } info->screen_base = vaddr; - info->screen_size = vma->node.size; + info->screen_size = vma->size; drm_fb_helper_fill_info(info, &ifbdev->helper, sizes); diff --git a/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c b/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c index 44821d94544f..864510b3a7fd 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c @@ -218,7 +218,8 @@ static void clear_pages_worker(struct work_struct *work) } err = rq->engine->emit_bb_start(rq, - batch->node.start, batch->node.size, + i915_vma_offset(batch), + 
i915_vma_size(batch), 0); out_request: if (unlikely(err)) { diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index fe170186dd42..7f985902c24a 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/
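For reference, the accessor pattern this series converts callers to can be modelled in a few lines of plain C. This is a simplified stand-in, not the driver code: once guard pages land, the drm_mm node is bigger than the object, and only the wrappers yield the object's true GPU offset and size (the assumption below is symmetric padding of `guard` bytes at each end):

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for i915_vma with guard pages: the drm_mm node
 * covers [node_start, node_start + node_size), with `guard` bytes of
 * scratch at each end; the object's pages occupy the middle. */
struct vma {
	uint64_t node_start; /* drm_mm_node.start */
	uint64_t node_size;  /* drm_mm_node.size, includes both guards */
	uint64_t guard;      /* scratch padding per side, in bytes */
};

/* What execbuf and the GPU should use instead of vma->node.start. */
static uint64_t vma_offset(const struct vma *vma)
{
	return vma->node_start + vma->guard;
}

/* What to report as the object's size instead of vma->node.size. */
static uint64_t vma_size(const struct vma *vma)
{
	return vma->node_size - 2 * vma->guard;
}
```

With guard = 0 both wrappers degenerate to the raw node fields, which is why the conversion in this patch is behaviour-neutral on its own.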
[Intel-gfx] [PATCH 2/3] drm/i915: Introduce guard pages to i915_vma
Introduce the concept of padding the i915_vma with guard pages before and aft. The major consequence is that all ordinary uses of i915_vma must use i915_vma_offset/i915_vma_size and not i915_vma.node.start/size directly, as the drm_mm_node will include the guard pages that surround our object. The biggest conundrum is how exactly to mix requesting a fixed address with guard pages, particularly through the existing uABI. The user does not know about guard pages, so they must be transparent to the user, and so the execobj.offset must be that of the object itself excluding the guard. So a PIN_OFFSET_FIXED must then be exclusive of the guard pages. The caveat is that some placements will be impossible with guard pages, as wraparounds need to be avoided, and the vma itself will require a larger node. We must not report EINVAL but ENOSPC, as these are unavailable locations within the GTT rather than conflicting user requirements. In the next patch, we start using guard pages for scanout objects. While these are limited to GGTT vma, on a few platforms these vma (or at least an alias of the vma) are shared with userspace, so we may leak the existence of such guards if we are not careful to ensure that the execobj.offset is transparent and excludes the guards. (On platforms like ivb, without full-ppgtt, userspace has to use relocations, so the presence of more untouchable regions within its GTT should be of no further issue.) v2: Include the guard range in the overflow checks and placement restrictions. 
Signed-off-by: Chris Wilson Cc: Matthew Auld --- drivers/gpu/drm/i915/gt/intel_ggtt.c | 12 ++-- drivers/gpu/drm/i915/i915_vma.c | 28 ++- drivers/gpu/drm/i915/i915_vma.h | 5 +++-- drivers/gpu/drm/i915/i915_vma_types.h | 3 ++- 4 files changed, 38 insertions(+), 10 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c index c5803c434d33..6b326138e765 100644 --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c @@ -238,8 +238,12 @@ static void gen8_ggtt_insert_entries(struct i915_address_space *vm, gte = (gen8_pte_t __iomem *)ggtt->gsm; gte += vma->node.start / I915_GTT_PAGE_SIZE; - end = gte + vma->node.size / I915_GTT_PAGE_SIZE; + end = gte + vma->guard / I915_GTT_PAGE_SIZE; + while (gte < end) + gen8_set_pte(gte++, vm->scratch[0]->encode); + + end += (vma->node.size - vma->guard) / I915_GTT_PAGE_SIZE; for_each_sgt_daddr(addr, iter, vma->pages) gen8_set_pte(gte++, pte_encode | addr); GEM_BUG_ON(gte > end); @@ -289,8 +293,12 @@ static void gen6_ggtt_insert_entries(struct i915_address_space *vm, gte = (gen6_pte_t __iomem *)ggtt->gsm; gte += vma->node.start / I915_GTT_PAGE_SIZE; - end = gte + vma->node.size / I915_GTT_PAGE_SIZE; + end = gte + vma->guard / I915_GTT_PAGE_SIZE; + while (gte < end) + gen8_set_pte(gte++, vm->scratch[0]->encode); + + end += (vma->node.size - vma->guard) / I915_GTT_PAGE_SIZE; for_each_sgt_daddr(addr, iter, vma->pages) iowrite32(vm->pte_encode(addr, level, flags), gte++); GEM_BUG_ON(gte > end); diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c index 748f5ea1ba04..31d0f8b64ec0 100644 --- a/drivers/gpu/drm/i915/i915_vma.c +++ b/drivers/gpu/drm/i915/i915_vma.c @@ -623,7 +623,7 @@ bool i915_gem_valid_gtt_space(struct i915_vma *vma, unsigned long color) static int i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags) { - unsigned long color; + unsigned long color, guard; u64 start, end; int ret; @@ -631,7 +631,7 @@ 
i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags) GEM_BUG_ON(drm_mm_node_allocated(&vma->node)); size = max(size, vma->size); - alignment = max(alignment, vma->display_alignment); + alignment = max_t(typeof(alignment), alignment, vma->display_alignment); if (flags & PIN_MAPPABLE) { size = max_t(typeof(size), size, vma->fence_size); alignment = max_t(typeof(alignment), @@ -642,6 +642,9 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags) GEM_BUG_ON(!IS_ALIGNED(alignment, I915_GTT_MIN_ALIGNMENT)); GEM_BUG_ON(!is_power_of_2(alignment)); + guard = vma->guard; /* retain guard across rebinds */ + guard = ALIGN(guard, alignment); + start = flags & PIN_OFFSET_BIAS ? flags & PIN_OFFSET_MASK : 0; GEM_BUG_ON(!IS_ALIGNED(start, I915_GTT_PAGE_SIZE)); @@ -651,12 +654,13 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags) if (flags & PIN_ZONE_4G) end = min_t(u64, end, (1ULL &
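The ENOSPC-vs-EINVAL rule from the commit message is worth making concrete: a user-requested fixed offset is legal uABI, but once the node must also carry the guard on both sides, some offsets simply cannot be placed. Below is a userspace-only model of that check; the function name, the flat [0, gtt_end) range, and the symmetric guard are assumptions for illustration, not the driver's i915_vma_insert():

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Model of PIN_OFFSET_FIXED with guards: the user asks for the object at
 * `offset`, but the underlying node needs `guard` bytes on each side, so
 * the node must occupy [offset - guard, offset + size + guard). */
static int place_fixed(uint64_t offset, uint64_t size, uint64_t guard,
		       uint64_t gtt_end, uint64_t *node_start)
{
	/* The guard would wrap below 0 or run past the end of the GTT:
	 * the location exists but is unusable, so this is -ENOSPC (no
	 * room at that spot), not -EINVAL (malformed request). */
	if (offset < guard)
		return -ENOSPC;
	if (offset + size + guard > gtt_end)
		return -ENOSPC;

	*node_start = offset - guard;
	return 0;
}
```

Note how the reported node start is below the user's offset: the execobj.offset the user sees stays exactly where it asked, which is what keeps the guard transparent to the uABI.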
Re: [Intel-gfx] [PATCH] drm/i915/gem: Add a check for object size for corner cases
Quoting Ram Moon, AnandX (2021-02-15 12:29:17) > Hi Chris, > > -Original Message- > From: dri-devel On Behalf Of Chris > Wilson > Sent: Wednesday, February 10, 2021 4:15 PM > To: Ram Moon, AnandX ; Jani Nikula > ; Auld, Matthew ; > Surendrakumar Upadhyay, TejaskumarX > ; Ursulin, Tvrtko > ; dri-de...@lists.freedesktop.org; > intel-gfx@lists.freedesktop.org > Cc: Ram Moon, AnandX > Subject: Re: [Intel-gfx] [PATCH] drm/i915/gem: Add a check for object size > for corner cases > > Quoting Anand Moon (2021-02-10 07:59:29) > > Add check for object size to return appropriate error -E2BIG or > > -EINVAL to avoid WARN_ON and successful return for some testcase. > > No. You miss the point of having those warnings. We need to inspect the code > to remove the last remaining "int pagenum", and then we can remove the > GEM_WARN_ON((sz) >> PAGE_SHIFT > INT_MAX). These are not emitted for users, > only for us to motivate us into finally fixing the code. > -Chris > > Yes, I got your point: these checks are meant to catch integer overflow. > > I have checked with the IGT testcases _gem_create_ and _gem_userptr_blits_, > > which pass sizes of *-1ull << 32* (left shift) and *0~*, leading to > integer overflow, i.e. a _negative_ size passed to i915_gem_create > via ioctl; this triggers GEM_WARN_ON. > > Can we drop these testcases so that we don't break the kernel? The kernel rejects the ioctl with the expected errno. We leave a warning purely for the benefit of CI, only when compiled for debugging and not by default, so that we have a persistent reminder to do the code review. What's broken? -Chris
Re: [Intel-gfx] [CI 4/4] drm/i915: move intel_init_audio_hooks inside display
Quoting Lucas De Marchi (2021-02-13 04:27:56) > intel_init_audio_hooks() sets up hooks in the display struct and only > makes sense when we have display. Move it inside > intel_init_display_hooks() so it isn't called when we don't have > display. > > Signed-off-by: Lucas De Marchi And patch 2 answered the question of what about intel_audio_init(). Reviewed-by: Chris Wilson -Chris
Re: [Intel-gfx] [RFC 3/3] drm/i915/gt: Export device and per-process runtimes via procfs
Quoting Emil Velikov (2021-02-12 15:45:04) > On Fri, 12 Feb 2021 at 15:16, Chris Wilson wrote: > > > > Quoting Emil Velikov (2021-02-12 14:57:56) > > > Hi Chris, > > > > > > On Thu, 4 Feb 2021 at 12:11, Chris Wilson > > > wrote: > > > > > > > > Register with /proc/gpu to provide the client runtimes for generic > > > > top-like overview, e.g. gnome-system-monitor can use this information to > > > > show the per-process multi-GPU usage. > > > > > > > Exposing this information to userspace sounds great IMHO and like the > > > proposed "channels" for the device engines. > > > If it were me, I would have the channel names a) exposed to userspace > > > and b) be a "fixed set". > > > > - Total > > - Graphics > > - Compute > > - Unified > > - Video > > - Copy > > - Display > > - Other > > > > Enough versatility for the foreseeable future? > > But plan for extension. > > > With a bit of documentation about "unified" (is it a metric also > counted towards any of the rest) it would be perfect. With unified I was trying to find a place to things that are neither wholly graphics nor compute, as some may prefer not to categorise themselves as one or the other. Also whether or not some cores are more compute than others (so should there be an AI/RT/ALU?) > For future extension one might consider splitting video into > encoder/decoder/post-processing. Ok, I wasn't sure how commonly those functions were split on different HW. > > The other aspect then is the capacity of each channel. We can keep it > > simple as the union/average (whichever the driver has to hand) runtime in > > nanoseconds over all IP blocks within a channel. > > Not sure what you mean with capacity. Are you referring to having > multiple instances of the same engine (say 3 separate copy engines)? > Personally I'm inclined to keep these separate entries, since some > hardware can have multiple ones. 
> > For example - before the latest changes nouveau had 8 copy engines, > 3+3 video 'generic' video (enc,dec)oder engines, amongst others. Yes, most HW have multiple engines within a family. Trying to keep it simple, I thought presenting just one runtime metric for the whole channel. Especially for the single-line per device format I had picked :) If we switch to a more extensible format,

-'$device0' :
    -$channel0 : {
        Total : $total  # avg/union over all engines
        Engines : [ $0, $1, ... ]
    }
    ...
-'$device1' :
    ...

Using the same fixed channel names, and dev_name(), pesky concerns such as keeping it as a simple scanf can be forgotten. -Chris
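To make the proposed per-channel layout concrete, here is a small, purely illustrative C formatter for one channel entry. The struct fields and the exact punctuation follow the sketch in the thread, but nothing here is an actual kernel or procfs API:

```c
#include <stdio.h>
#include <string.h>

/* One entry in the proposed fixed channel set ("Total", "Graphics", ...). */
struct channel {
	const char *name;                  /* fixed, userspace-visible */
	unsigned long long total_ns;       /* avg/union runtime over engines */
	const unsigned long long *engines; /* per-instance runtimes, ns */
	unsigned int nengine;
};

/* Render one channel in the sketched extensible format:
 *   -$channel : { Total : $ns, Engines : [ $0, $1, ... ] } */
static int format_channel(char *buf, size_t len, const struct channel *ch)
{
	int n = snprintf(buf, len, "-%s : { Total : %llu, Engines : [",
			 ch->name, ch->total_ns);

	for (unsigned int i = 0; i < ch->nengine; i++)
		n += snprintf(buf + n, len - n, "%s %llu",
			      i ? "," : "", ch->engines[i]);

	return n + snprintf(buf + n, len - n, " ] }");
}
```

Keeping the channel names fixed means a consumer like gnome-system-monitor can match on strings without a per-driver table, which is the point being argued in the thread.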
Re: [Intel-gfx] [PATCH] Revert "HAX sound: Disable probing snd_hda with DG1"
Quoting Chris Wilson (2021-02-12 15:22:13) > Quoting Kai Vehmanen (2021-02-12 14:53:02) > > This reverts commit 3632610d38316bca9b0cd9d649ce3cefab58520a. > > > > DG1 has been supported in upstream since v5.10 with commit > > 69b08bdfa818 ("ALSA: hda - add Intel DG1 PCI and HDMI ids"). > > Exactly, otherwise this patch wouldn't have been required to stop CI > from timing out upon snd probing the hdmi components. You need the other > half to be supported as well before CI is ready. Just in case it isn't clear, this patch is only in the CI topic branch to hide known issues with CI that we simply aren't ready for. -Chris
Re: [Intel-gfx] [PATCH] Revert "HAX sound: Disable probing snd_hda with DG1"
Quoting Kai Vehmanen (2021-02-12 14:53:02) > This reverts commit 3632610d38316bca9b0cd9d649ce3cefab58520a. > > DG1 has been supported in upstream since v5.10 with commit > 69b08bdfa818 ("ALSA: hda - add Intel DG1 PCI and HDMI ids"). Exactly, otherwise this patch wouldn't have been required to stop CI from timing out upon snd probing the hdmi components. You need the other half to be supported as well before CI is ready. -Chris
Re: [Intel-gfx] [RFC 3/3] drm/i915/gt: Export device and per-process runtimes via procfs
Quoting Emil Velikov (2021-02-12 14:57:56) > Hi Chris, > > On Thu, 4 Feb 2021 at 12:11, Chris Wilson wrote: > > > > Register with /proc/gpu to provide the client runtimes for generic > > top-like overview, e.g. gnome-system-monitor can use this information to > > show the per-process multi-GPU usage. > > > Exposing this information to userspace sounds great IMHO and like the > proposed "channels" for the device engines. > If it were me, I would have the channel names a) exposed to userspace > and b) be a "fixed set".

- Total
- Graphics
- Compute
- Unified
- Video
- Copy
- Display
- Other

Enough versatility for the foreseeable future? But plan for extension. The other aspect then is the capacity of each channel. We can keep it simple as the union/average (whichever the driver has to hand) runtime in nanoseconds over all IP blocks within a channel. -Chris
[Intel-gfx] [PATCH i-g-t] i915/gem_userptr_blts: Move readonly test into common block
Since we only run test_readonly for a single sync-flag, place it in the common block.

Signed-off-by: Chris Wilson
---
 tests/i915/gem_userptr_blits.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tests/i915/gem_userptr_blits.c b/tests/i915/gem_userptr_blits.c
index 1bc2d3600..a4f137f93 100644
--- a/tests/i915/gem_userptr_blits.c
+++ b/tests/i915/gem_userptr_blits.c
@@ -2474,6 +2474,9 @@ igt_main_args("c:", NULL, help_str, opt_handler, NULL)
 	igt_subtest("set-cache-level")
 		test_set_caching(fd);
 
+	igt_subtest("readonly")
+		test_readonly(fd);
+
 	igt_subtest("userfault")
 		test_userfault(fd);
 
@@ -2515,9 +2518,6 @@ igt_main_args("c:", NULL, help_str, opt_handler, NULL)
 	igt_subtest("dmabuf-unsync")
 		test_dmabuf();
 
-	igt_subtest("readonly-unsync")
-		test_readonly(fd);
-
 	igt_describe("Examine mmap-offset mapping to read-only userptr");
 	igt_subtest_with_dynamic("readonly-mmap-unsync")
 		for_each_mmap_offset_type(fd, t)
-- 
2.30.0
Re: [Intel-gfx] [PATCH 2/3] drm/i915: Introduce guard pages to i915_vma
Quoting Matthew Auld (2021-02-12 13:43:46) > On Fri, 12 Feb 2021 at 10:22, Chris Wilson wrote: > > > > Introduce the concept of padding the i915_vma with guard pages before > > and aft. The major consequence is that all ordinary uses of i915_vma > > must use i915_vma_offset/i915_vma_size and not i915_vma.node.start/size > > directly, as the drm_mm_node will include the guard pages that surround > > our object. > > > > So in this patch, we look for all uses of i915_vma->node.start that > > instead need to include the guard offset and switch them to > > i915_vma_offset(), and in a few cases to i915_ggtt_offset(). Notable > > exceptions are the selftests, which expect exact behaviour. > > > > The biggest connundrum is how exactly to mix request a fixed address > > with guard pages, particular through the existing uABI. The user does > > not know about guard pages, so such must be transparent to the user, and > > so the execobj.offset must be that of the object itself excluding the > > guard. So a PIN_OFFSET_FIXED must then be exclusive of the guard pages. > > > > In the next patch, we start using guard pages for scanout objects. While > > these are limited to GGTT vma, on a few platforms these vma (or at least > > an alias of the vma) is shared with userspace, so we may leak the > > existence of such guards if we are not careful to ensure that the > > execobj.offset is transparent and excludes the guards. (On such platforms, > > without full-ppgtt, userspace has to use relocations so the presence of > > more untouchable regions within its GTT such be of no further issue.) 
> > > > Signed-off-by: Chris Wilson > > Cc: Matthew Auld > > --- > > drivers/gpu/drm/i915/gt/intel_ggtt.c | 12 ++-- > > drivers/gpu/drm/i915/i915_vma.c | 10 +++--- > > drivers/gpu/drm/i915/i915_vma.h | 8 > > drivers/gpu/drm/i915/i915_vma_types.h | 3 ++- > > 4 files changed, 23 insertions(+), 10 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c > > b/drivers/gpu/drm/i915/gt/intel_ggtt.c > > index c5803c434d33..6b326138e765 100644 > > --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c > > +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c > > @@ -238,8 +238,12 @@ static void gen8_ggtt_insert_entries(struct > > i915_address_space *vm, > > > > gte = (gen8_pte_t __iomem *)ggtt->gsm; > > gte += vma->node.start / I915_GTT_PAGE_SIZE; > > - end = gte + vma->node.size / I915_GTT_PAGE_SIZE; > > > > + end = gte + vma->guard / I915_GTT_PAGE_SIZE; > > + while (gte < end) > > + gen8_set_pte(gte++, vm->scratch[0]->encode); > > + > > + end += (vma->node.size - vma->guard) / I915_GTT_PAGE_SIZE; > > for_each_sgt_daddr(addr, iter, vma->pages) > > gen8_set_pte(gte++, pte_encode | addr); > > GEM_BUG_ON(gte > end); > > @@ -289,8 +293,12 @@ static void gen6_ggtt_insert_entries(struct > > i915_address_space *vm, > > > > gte = (gen6_pte_t __iomem *)ggtt->gsm; > > gte += vma->node.start / I915_GTT_PAGE_SIZE; > > - end = gte + vma->node.size / I915_GTT_PAGE_SIZE; > > > > + end = gte + vma->guard / I915_GTT_PAGE_SIZE; > > + while (gte < end) > > + gen8_set_pte(gte++, vm->scratch[0]->encode); > > + > > + end += (vma->node.size - vma->guard) / I915_GTT_PAGE_SIZE; > > for_each_sgt_daddr(addr, iter, vma->pages) > > iowrite32(vm->pte_encode(addr, level, flags), gte++); > > GEM_BUG_ON(gte > end); > > diff --git a/drivers/gpu/drm/i915/i915_vma.c > > b/drivers/gpu/drm/i915/i915_vma.c > > index 17fe455bd770..155f510b4cc6 100644 > > --- a/drivers/gpu/drm/i915/i915_vma.c > > +++ b/drivers/gpu/drm/i915/i915_vma.c > > @@ -623,7 +623,7 @@ bool i915_gem_valid_gtt_space(struct i915_vma *vma, > > unsigned 
long color) > > static int > > i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags) > > { > > - unsigned long color; > > + unsigned long color, guard; > > u64 start, end; > > int ret; > > > > @@ -631,13 +631,16 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 > > alignment, u64 flags) > > GEM_BUG_ON(drm_mm_node_allocated(&vma->node)); > > > > size = max(size, vma->size); > > - alignment = max(alignment, vma->display_alignment); > > + alignment =
[Intel-gfx] [PATCH v6] drm/i915/gt: Ratelimit heartbeat completion probing
The heartbeat runs through a few phases that we expect to complete within a certain number of heartbeat intervals. First we must submit the heartbeat to the queue, and if the queue is occupied it may take a couple of intervals before the heartbeat preempts the workload and is submitted to HW. Once running on HW, completion is not instantaneous as it may have to first reset the current workload before it itself runs through the empty request and signals completion. As such, we know that the heartbeat must take at least the preempt reset timeout, and before we have had a chance to reset the engine we do not want to issue a global reset ourselves (simply so that we only try to do one reset at a time and do not confuse ourselves by resetting twice and hitting an innocent). So by taking into consideration that, once running, the request must take a finite amount of time, we can delay the final completion check to accommodate that and avoid checking too early (before we've had a chance to handle any engine resets required). v3: Now with more tracking for selftests and detection of false/unexpected hang reports. v2,4,5: Add a completion handler for the heartbeat to speed up discovery of a successful heartbeat. Remove it, but then put it back to handle large mismatches between the sysfs properties and reality. 
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2853 Suggested-by: CQ Tang Signed-off-by: Chris Wilson --- .../gpu/drm/i915/gt/intel_engine_heartbeat.c | 107 +++--- drivers/gpu/drm/i915/gt/intel_engine_types.h | 8 ++ .../drm/i915/gt/selftest_engine_heartbeat.c | 103 +++-- drivers/gpu/drm/i915/gt/selftest_execlists.c | 5 +- 4 files changed, 173 insertions(+), 50 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c index 0b062fad1837..7e7d2ff2299b 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c @@ -20,6 +20,18 @@ * issue a reset -- in the hope that restores progress. */ +#define HEARTBEAT_COMPLETION 50u /* milliseconds */ + +static long completion_timeout(const struct intel_engine_cs *engine) +{ + long timeout = HEARTBEAT_COMPLETION; + + if (intel_engine_has_preempt_reset(engine)) + timeout += READ_ONCE(engine->props.preempt_timeout_ms); + + return msecs_to_jiffies(timeout); +} + static bool next_heartbeat(struct intel_engine_cs *engine) { long delay; @@ -29,6 +41,26 @@ static bool next_heartbeat(struct intel_engine_cs *engine) return false; delay = msecs_to_jiffies_timeout(delay); + + /* +* Once we submit a heartbeat to the HW, we know that it will take +* at least a certain amount of time to complete. On a hanging system +* it will first have to wait for the preempt reset timeout, and +* then it will take some time for the reset to resume with the +* heartbeat and for it to complete. So once we have submitted the +* heartbeat to HW, we can wait a while longer before declaring the +* engine stuck and forcing a reset ourselves. If we do a reset +* and the engine is also doing a reset, it is possible that we +* reset the engine twice, harming an innocent. 
+* +* Before we have submitted the heartbeat, we do not want to change +* the interval as we want to promote the heartbeat and trigger +* preemption in a deterministic time frame. +*/ + if (engine->heartbeat.systole && + i915_request_is_active(engine->heartbeat.systole)) + delay = max(delay, completion_timeout(engine)); + if (delay >= HZ) delay = round_jiffies_up_relative(delay); mod_delayed_work(system_highpri_wq, &engine->heartbeat.work, delay + 1); @@ -48,12 +80,52 @@ heartbeat_create(struct intel_context *ce, gfp_t gfp) return rq; } +static void +untrack_heartbeat(struct intel_engine_cs *engine) +{ + struct i915_request *rq; + + rq = engine->heartbeat.systole; + if (!rq) + return; + + ENGINE_TRACE(engine, "heartbeat " RQ_FMT " completed\n", RQ_ARG(rq)); + I915_SELFTEST_ONLY(engine->heartbeat.completed++); + + dma_fence_remove_callback(&rq->fence, &engine->heartbeat.cb); + WRITE_ONCE(engine->heartbeat.systole, NULL); + + i915_request_put(rq); +} + +static void defibrillator(struct dma_fence *f, struct dma_fence_cb *cb) +{ + struct intel_engine_cs *engine = + container_of(cb, typeof(*engine), heartbeat.cb); + + if (READ_ONCE(engine->heartbeat.systole)) + mod_delayed_work(system_highpri_wq, &engine->heartbeat.work, 0); +} + +static void +track_heartbeat(struct intel_engine_cs *engine, struct i915_req
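Stripped of jiffies conversion and request tracking, the delay policy that next_heartbeat() implements in the patch above reduces to a small pure function: keep the configured cadence until the heartbeat is actually running on HW, then never check again sooner than the preempt-reset timeout plus a completion grace. The names and the flat milliseconds arithmetic below are mine, for illustration only:

```c
#include <assert.h>

#define HEARTBEAT_COMPLETION_MS 50u /* grace for the empty request itself */

/* Once the heartbeat request is active on HW, a hung engine first has to
 * wait out the preempt-reset timeout before the heartbeat can complete,
 * so extend the next check instead of declaring a hang too early. */
static unsigned int next_heartbeat_delay_ms(unsigned int interval_ms,
					    unsigned int preempt_timeout_ms,
					    int heartbeat_active)
{
	unsigned int floor = HEARTBEAT_COMPLETION_MS + preempt_timeout_ms;

	if (heartbeat_active && interval_ms < floor)
		return floor;

	/* Pre-submission the cadence stays untouched, keeping heartbeat
	 * promotion and preemption deterministic in time. */
	return interval_ms;
}
```

This is why the patch only applies the max() once i915_request_is_active() is true: stretching the interval earlier would make the preemption trigger point non-deterministic.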
[Intel-gfx] [PATCH 3/3] drm/i915: Refine VT-d scanout workaround
VT-d may cause overfetch of the scanout PTE, both before and after the vma (depending on the scanout orientation). bspec recommends that we provide a tile-row in either direction, and suggests using 160 PTE, warning that the accesses will wrap around the ends of the GGTT. Currently, we fill the entire GGTT with scratch pages when using VT-d to always ensure there are valid entries around every vma, including scanout. However, writing every PTE is slow as on recent devices we perform 8MiB of uncached writes, incurring an extra 100ms during resume. If instead we focus on only putting guard pages around scanout, we can avoid touching the whole GGTT. To avoid having to introduce extra nodes around each scanout vma, we adjust the scanout drm_mm_node to be smaller than the allocated space, and fix up the extra PTE during dma binding. v2: Move the guard from modifying drm_mm_node.start, which is still used by the drm_mm itself, into an adjustment of node.start at the point of use. Signed-off-by: Chris Wilson Cc: Ville Syrjälä Cc: Matthew Auld --- drivers/gpu/drm/i915/gem/i915_gem_domain.c | 3 +++ drivers/gpu/drm/i915/gt/intel_ggtt.c | 25 +- drivers/gpu/drm/i915/i915_gem_gtt.h| 1 + drivers/gpu/drm/i915/i915_vma.c| 10 + 4 files changed, 15 insertions(+), 24 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c index 0478b069c202..9f2ccc255ca1 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c @@ -345,6 +345,9 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj, if (ret) goto err; + if (intel_scanout_needs_vtd_wa(i915)) + flags |= PIN_VTD; + /* * As the user may map the buffer once pinned in the display plane * (e.g.
libkms for the bootup splash), we have to ensure that we diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c index 6b326138e765..251b50884d1c 100644 --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c @@ -319,27 +319,6 @@ static void nop_clear_range(struct i915_address_space *vm, { } -static void gen8_ggtt_clear_range(struct i915_address_space *vm, - u64 start, u64 length) -{ - struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm); - unsigned int first_entry = start / I915_GTT_PAGE_SIZE; - unsigned int num_entries = length / I915_GTT_PAGE_SIZE; - const gen8_pte_t scratch_pte = vm->scratch[0]->encode; - gen8_pte_t __iomem *gtt_base = - (gen8_pte_t __iomem *)ggtt->gsm + first_entry; - const int max_entries = ggtt_total_entries(ggtt) - first_entry; - int i; - - if (WARN(num_entries > max_entries, -"First entry = %d; Num entries = %d (max=%d)\n", -first_entry, num_entries, max_entries)) - num_entries = max_entries; - - for (i = 0; i < num_entries; i++) - gen8_set_pte(&gtt_base[i], scratch_pte); -} - static void bxt_vtd_ggtt_wa(struct i915_address_space *vm) { /* @@ -907,8 +886,6 @@ static int gen8_gmch_probe(struct i915_ggtt *ggtt) ggtt->vm.cleanup = gen6_gmch_remove; ggtt->vm.insert_page = gen8_ggtt_insert_page; ggtt->vm.clear_range = nop_clear_range; - if (intel_scanout_needs_vtd_wa(i915)) - ggtt->vm.clear_range = gen8_ggtt_clear_range; ggtt->vm.insert_entries = gen8_ggtt_insert_entries; @@ -1054,7 +1031,7 @@ static int gen6_gmch_probe(struct i915_ggtt *ggtt) ggtt->vm.alloc_pt_dma = alloc_pt_dma; ggtt->vm.clear_range = nop_clear_range; - if (!HAS_FULL_PPGTT(i915) || intel_scanout_needs_vtd_wa(i915)) + if (!HAS_FULL_PPGTT(i915)) ggtt->vm.clear_range = gen6_ggtt_clear_range; ggtt->vm.insert_page = gen6_ggtt_insert_page; ggtt->vm.insert_entries = gen6_ggtt_insert_entries; diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h index c9b0ee5e1d23..8a2dfc7144cf 100644 ---
a/drivers/gpu/drm/i915/i915_gem_gtt.h +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h @@ -41,6 +41,7 @@ int i915_gem_gtt_insert(struct i915_address_space *vm, #define PIN_HIGH BIT_ULL(5) #define PIN_OFFSET_BIASBIT_ULL(6) #define PIN_OFFSET_FIXED BIT_ULL(7) +#define PIN_VTDBIT_ULL(8) #define PIN_GLOBAL BIT_ULL(10) /* I915_VMA_GLOBAL_BIND */ #define PIN_USER BIT_ULL(11) /* I915_VMA_LOCAL_BIND */ diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c index 155f510b4cc6..929d2a1a20b8 100644 --- a/drivers/gpu/drm/i915/i915_vma.c +++ b/drivers/gpu/drm/i915/i915_vma.c @@ -38,6 +38,8 @@ #include "i915_trace.h" #include &quo
[Intel-gfx] [PATCH 2/3] drm/i915: Introduce guard pages to i915_vma
Introduce the concept of padding the i915_vma with guard pages before and after. The major consequence is that all ordinary uses of i915_vma must use i915_vma_offset/i915_vma_size and not i915_vma.node.start/size directly, as the drm_mm_node will include the guard pages that surround our object. So in this patch, we look for all uses of i915_vma->node.start that instead need to include the guard offset and switch them to i915_vma_offset(), and in a few cases to i915_ggtt_offset(). Notable exceptions are the selftests, which expect exact behaviour. The biggest conundrum is how exactly to mix requesting a fixed address with guard pages, particularly through the existing uABI. The user does not know about guard pages, so they must be transparent to the user, and so the execobj.offset must be that of the object itself, excluding the guard. So a PIN_OFFSET_FIXED must then be exclusive of the guard pages. In the next patch, we start using guard pages for scanout objects. While these are limited to GGTT vma, on a few platforms these vma (or at least an alias of the vma) are shared with userspace, so we may leak the existence of such guards if we are not careful to ensure that the execobj.offset is transparent and excludes the guards. (On such platforms, without full-ppgtt, userspace has to use relocations, so the presence of more untouchable regions within its GTT should be of no further issue.)
Signed-off-by: Chris Wilson Cc: Matthew Auld --- drivers/gpu/drm/i915/gt/intel_ggtt.c | 12 ++-- drivers/gpu/drm/i915/i915_vma.c | 10 +++--- drivers/gpu/drm/i915/i915_vma.h | 8 drivers/gpu/drm/i915/i915_vma_types.h | 3 ++- 4 files changed, 23 insertions(+), 10 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c index c5803c434d33..6b326138e765 100644 --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c @@ -238,8 +238,12 @@ static void gen8_ggtt_insert_entries(struct i915_address_space *vm, gte = (gen8_pte_t __iomem *)ggtt->gsm; gte += vma->node.start / I915_GTT_PAGE_SIZE; - end = gte + vma->node.size / I915_GTT_PAGE_SIZE; + end = gte + vma->guard / I915_GTT_PAGE_SIZE; + while (gte < end) + gen8_set_pte(gte++, vm->scratch[0]->encode); + + end += (vma->node.size - vma->guard) / I915_GTT_PAGE_SIZE; for_each_sgt_daddr(addr, iter, vma->pages) gen8_set_pte(gte++, pte_encode | addr); GEM_BUG_ON(gte > end); @@ -289,8 +293,12 @@ static void gen6_ggtt_insert_entries(struct i915_address_space *vm, gte = (gen6_pte_t __iomem *)ggtt->gsm; gte += vma->node.start / I915_GTT_PAGE_SIZE; - end = gte + vma->node.size / I915_GTT_PAGE_SIZE; + end = gte + vma->guard / I915_GTT_PAGE_SIZE; + while (gte < end) + gen8_set_pte(gte++, vm->scratch[0]->encode); + + end += (vma->node.size - vma->guard) / I915_GTT_PAGE_SIZE; for_each_sgt_daddr(addr, iter, vma->pages) iowrite32(vm->pte_encode(addr, level, flags), gte++); GEM_BUG_ON(gte > end); diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c index 17fe455bd770..155f510b4cc6 100644 --- a/drivers/gpu/drm/i915/i915_vma.c +++ b/drivers/gpu/drm/i915/i915_vma.c @@ -623,7 +623,7 @@ bool i915_gem_valid_gtt_space(struct i915_vma *vma, unsigned long color) static int i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags) { - unsigned long color; + unsigned long color, guard; u64 start, end; int ret; @@ -631,13 +631,16 @@ 
i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags) GEM_BUG_ON(drm_mm_node_allocated(&vma->node)); size = max(size, vma->size); - alignment = max(alignment, vma->display_alignment); + alignment = max_t(typeof(alignment), alignment, vma->display_alignment); if (flags & PIN_MAPPABLE) { size = max_t(typeof(size), size, vma->fence_size); alignment = max_t(typeof(alignment), alignment, vma->fence_alignment); } + guard = 0; + size += 2 * guard; + GEM_BUG_ON(!IS_ALIGNED(size, I915_GTT_PAGE_SIZE)); GEM_BUG_ON(!IS_ALIGNED(alignment, I915_GTT_MIN_ALIGNMENT)); GEM_BUG_ON(!is_power_of_2(alignment)); @@ -674,7 +677,7 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags) return -EINVAL; ret = i915_gem_gtt_reserve(vma->vm, &vma->node, - size, offset, color, + size, offset - guard, color, flags); if (ret) return ret; @@ -7
[Intel-gfx] [PATCH 1/3] drm/i915: Wrap all access to i915_vma.node.start|size
We already wrap i915_vma.node.start for use with the GGTT, as there we can perform additional sanity checks that the node belongs to the GGTT and fits within the 32b registers. In the next couple of patches, we will introduce guard pages around the objects _inside_ the drm_mm_node allocation. That is we will offset the vma->pages so that the first page is at drm_mm_node.start + vma->guard (not 0 as is currently the case). All users must then not use i915_vma.node.start directly, but compute the guard offset, thus all users are converted to use a i915_vma_offset() wrapper. The notable exceptions are the selftests that are testing exact behaviour of i915_vma_pin/i915_vma_insert. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/display/intel_fbdev.c| 6 ++-- .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 29 ++- drivers/gpu/drm/i915/gem/i915_gem_mman.c | 2 +- .../gpu/drm/i915/gem/i915_gem_object_blt.c| 2 +- drivers/gpu/drm/i915/gem/i915_gem_shrinker.c | 2 +- drivers/gpu/drm/i915/gem/i915_gem_tiling.c| 4 +-- .../gpu/drm/i915/gem/selftests/huge_pages.c | 2 +- .../i915/gem/selftests/i915_gem_client_blt.c | 11 +++ .../drm/i915/gem/selftests/i915_gem_context.c | 16 ++ .../drm/i915/gem/selftests/i915_gem_mman.c| 2 +- .../drm/i915/gem/selftests/igt_gem_utils.c| 4 +-- drivers/gpu/drm/i915/gt/gen6_ppgtt.c | 2 +- drivers/gpu/drm/i915/gt/gen7_renderclear.c| 2 +- drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 8 ++--- drivers/gpu/drm/i915/gt/intel_ggtt.c | 5 ++-- drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c | 3 +- drivers/gpu/drm/i915/gt/intel_ppgtt.c | 7 +++-- drivers/gpu/drm/i915/gt/intel_renderstate.c | 2 +- .../gpu/drm/i915/gt/intel_ring_submission.c | 2 +- drivers/gpu/drm/i915/gt/selftest_execlists.c | 8 ++--- drivers/gpu/drm/i915/gt/selftest_hangcheck.c | 14 + .../drm/i915/gt/selftest_ring_submission.c| 2 +- drivers/gpu/drm/i915/gt/selftest_rps.c| 12 drivers/gpu/drm/i915/i915_debugfs.c | 3 +- drivers/gpu/drm/i915/i915_gpu_error.c | 4 +-- drivers/gpu/drm/i915/i915_perf.c | 2 +- 
drivers/gpu/drm/i915/i915_vma.c | 21 +++--- drivers/gpu/drm/i915/i915_vma.h | 18 ++-- drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 12 ++-- drivers/gpu/drm/i915/selftests/i915_request.c | 8 ++--- .../gpu/drm/i915/selftests/i915_scheduler.c | 4 +-- drivers/gpu/drm/i915/selftests/igt_spinner.c | 9 -- 32 files changed, 134 insertions(+), 94 deletions(-) diff --git a/drivers/gpu/drm/i915/display/intel_fbdev.c b/drivers/gpu/drm/i915/display/intel_fbdev.c index 07db8e83f98e..2a72fb3c8416 100644 --- a/drivers/gpu/drm/i915/display/intel_fbdev.c +++ b/drivers/gpu/drm/i915/display/intel_fbdev.c @@ -237,8 +237,8 @@ static int intelfb_create(struct drm_fb_helper *helper, /* Our framebuffer is the entirety of fbdev's system memory */ info->fix.smem_start = - (unsigned long)(ggtt->gmadr.start + vma->node.start); - info->fix.smem_len = vma->node.size; + (unsigned long)(ggtt->gmadr.start + i915_ggtt_offset(vma)); + info->fix.smem_len = vma->size; vaddr = i915_vma_pin_iomap(vma); if (IS_ERR(vaddr)) { @@ -248,7 +248,7 @@ static int intelfb_create(struct drm_fb_helper *helper, goto out_unpin; } info->screen_base = vaddr; - info->screen_size = vma->node.size; + info->screen_size = vma->size; drm_fb_helper_fill_info(info, &ifbdev->helper, sizes); diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index fe170186dd42..f75f6751cf3f 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -368,22 +368,23 @@ eb_vma_misplaced(const struct drm_i915_gem_exec_object2 *entry, const struct i915_vma *vma, unsigned int flags) { - if (vma->node.size < entry->pad_to_size) + const u64 start = i915_vma_offset(vma); + const u64 size = i915_vma_size(vma); + + if (size < entry->pad_to_size) return true; - if (entry->alignment && !IS_ALIGNED(vma->node.start, entry->alignment)) + if (entry->alignment && !IS_ALIGNED(start, entry->alignment)) return true; - if (flags & EXEC_OBJECT_PINNED && - 
vma->node.start != entry->offset) + if (flags & EXEC_OBJECT_PINNED && start != entry->offset) return true; - if (flags & __EXEC_OBJECT_NEEDS_BIAS && - vma->node.start < BATCH_OFFSET_BIAS) + if (flags & __EXEC_OBJECT_NEEDS_BIAS && start <
[Intel-gfx] [PATCH v5] drm/i915/gt: Ratelimit heartbeat completion probing
The heartbeat runs through a few phases that we expect to complete within a certain number of heartbeat intervals. First we must submit the heartbeat to the queue, and if the queue is occupied it may take a couple of intervals before the heartbeat preempts the workload and is submitted to HW. Once running on HW, completion is not instantaneous as it may have to first reset the current workload before it itself runs through the empty request and signals completion. As such, we know that the heartbeat must take at least the preempt reset timeout and before we have had a chance to reset the engine, we do not want to issue a global reset ourselves (simply so that we only try to do one reset at a time and not confuse ourselves by resetting twice and hitting an innocent.) So by taking into consideration that once running the request must take a finite amount of time, we can delay the final completion check to accommodate that and avoid checking too early (before we've had a chance to handle any engine resets required). v3: Now with more tracking for selftests and detection of false/unexpected hang reports. Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2853 Suggested-by: CQ Tang Signed-off-by: Chris Wilson --- .../gpu/drm/i915/gt/intel_engine_heartbeat.c | 92 +++--- drivers/gpu/drm/i915/gt/intel_engine_types.h | 7 ++ .../drm/i915/gt/selftest_engine_heartbeat.c | 93 --- drivers/gpu/drm/i915/gt/selftest_execlists.c | 5 +- 4 files changed, 148 insertions(+), 49 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c index 0b062fad1837..e14f9eab5779 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c @@ -20,6 +20,18 @@ * issue a reset -- in the hope that restores progress. 
*/ +#define HEARTBEAT_COMPLETION 50u /* milliseconds */ + +static long completion_timeout(const struct intel_engine_cs *engine) +{ + long timeout = HEARTBEAT_COMPLETION; + + if (intel_engine_has_preempt_reset(engine)) + timeout += READ_ONCE(engine->props.preempt_timeout_ms); + + return msecs_to_jiffies(timeout); +} + static bool next_heartbeat(struct intel_engine_cs *engine) { long delay; @@ -29,6 +41,26 @@ static bool next_heartbeat(struct intel_engine_cs *engine) return false; delay = msecs_to_jiffies_timeout(delay); + + /* +* Once we submit a heartbeat to the HW, we know that it will take +* at least a certain amount of time to complete. On a hanging system +* it will first have to wait for the preempt reset timeout, and +* then it will take some time for the reset to resume with the +* heartbeat and for it to complete. So once we have submitted the +* heartbeat to HW, we can wait a while longer before declaring the +* engine stuck and forcing a reset ourselves. If we do a reset +* and the engine is also doing a reset, it is possible that we +* reset the engine twice, harming an innocent. +* +* Before we have submitted the heartbeat, we do not want to change +* the interval as we want to promote the heartbeat and trigger +* preemption in a deterministic time frame.
+*/ + if (engine->heartbeat.systole && + i915_request_is_active(engine->heartbeat.systole)) + delay = max(delay, completion_timeout(engine)); + if (delay >= HZ) delay = round_jiffies_up_relative(delay); mod_delayed_work(system_highpri_wq, &engine->heartbeat.work, delay + 1); @@ -48,12 +80,39 @@ heartbeat_create(struct intel_context *ce, gfp_t gfp) return rq; } +static void +untrack_heartbeat(struct intel_engine_cs *engine) +{ + struct i915_request *rq; + + rq = engine->heartbeat.systole; + if (!rq) + return; + + ENGINE_TRACE(engine, "heartbeat " RQ_FMT " completed\n", RQ_ARG(rq)); + I915_SELFTEST_ONLY(engine->heartbeat.completed++); + + WRITE_ONCE(engine->heartbeat.systole, NULL); + i915_request_put(rq); +} + +static void +track_heartbeat(struct intel_engine_cs *engine, struct i915_request *rq) +{ + ENGINE_TRACE(engine, "heartbeat " RQ_FMT " started\n", RQ_ARG(rq)); + I915_SELFTEST_ONLY(engine->heartbeat.submitted++); + + WRITE_ONCE(engine->heartbeat.systole, i915_request_get(rq)); + if (!next_heartbeat(engine)) + untrack_heartbeat(engine); +} + static void idle_pulse(struct intel_engine_cs *engine, struct i915_request *rq) { engine->wakeref_serial = READ_ONCE(engine->serial) + 1; i915_request_add_active_barriers(rq); if (!engine->heartbeat.systole && intel_engine_has_heartbeat(engine)) - engine-&
[Intel-gfx] [PATCH v4] drm/i915/gt: Ratelimit heartbeat completion probing
The heartbeat runs through a few phases that we expect to complete within a certain number of heartbeat intervals. First we must submit the heartbeat to the queue, and if the queue is occupied it may take a couple of intervals before the heartbeat preempts the workload and is submitted to HW. Once running on HW, completion is not instantaneous as it may have to first reset the current workload before it itself runs through the empty request and signals completion. As such, we know that the heartbeat must take at least the preempt reset timeout and before we have had a chance to reset the engine, we do not want to issue a global reset ourselves (simply so that we only try to do one reset at a time and not confuse ourselves by resetting twice and hitting an innocent.) So by taking into consideration that once running the request must take a finite amount of time, we can delay the final completion check to accommodate that and avoid checking too early (before we've had a chance to handle any engine resets required). v2: Attach a callback to flush the work immediately upon the heartbeat completion and insert the delay before the next. v3: Now with more tracking for selftests and detection of false/unexpected hang reports. Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2853 Suggested-by: CQ Tang Signed-off-by: Chris Wilson --- .../gpu/drm/i915/gt/intel_engine_heartbeat.c | 107 +++--- drivers/gpu/drm/i915/gt/intel_engine_types.h | 8 ++ .../drm/i915/gt/selftest_engine_heartbeat.c | 93 +-- drivers/gpu/drm/i915/gt/selftest_execlists.c | 5 +- 4 files changed, 163 insertions(+), 50 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c index 0b062fad1837..351f58def216 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c @@ -20,6 +20,18 @@ * issue a reset -- in the hope that restores progress. 
*/ +#define HEARTBEAT_COMPLETION 50u /* milliseconds */ + +static long completion_timeout(const struct intel_engine_cs *engine) +{ + long timeout = HEARTBEAT_COMPLETION; + + if (intel_engine_has_preempt_reset(engine)) + timeout += READ_ONCE(engine->props.preempt_timeout_ms); + + return msecs_to_jiffies(timeout); +} + static bool next_heartbeat(struct intel_engine_cs *engine) { long delay; @@ -29,6 +41,26 @@ static bool next_heartbeat(struct intel_engine_cs *engine) return false; delay = msecs_to_jiffies_timeout(delay); + + /* +* Once we submit a heartbeat to the HW, we know that it will take +* at least a certain amount of time to complete. On a hanging system +* it will first have to wait for the preempt reset timeout, and +* then it will take some time for the reset to resume with the +* heartbeat and for it to complete. So once we have submitted the +* heartbeat to HW, we can wait a while longer before declaring the +* engine stuck and forcing a reset ourselves. If we do a reset +* and the engine is also doing a reset, it is possible that we +* reset the engine twice, harming an innocent. +* +* Before we have submitted the heartbeat, we do not want to change +* the interval as we want to promote the heartbeat and trigger +* preemption in a deterministic time frame.
+*/ + if (engine->heartbeat.systole && + i915_request_is_active(engine->heartbeat.systole)) + delay = max(delay, completion_timeout(engine)); + if (delay >= HZ) delay = round_jiffies_up_relative(delay); mod_delayed_work(system_highpri_wq, &engine->heartbeat.work, delay + 1); @@ -48,12 +80,51 @@ heartbeat_create(struct intel_context *ce, gfp_t gfp) return rq; } +static void defibrillator(struct dma_fence *f, struct dma_fence_cb *cb) +{ + struct intel_engine_cs *engine = + container_of(cb, typeof(*engine), heartbeat.cb); + + if (READ_ONCE(engine->heartbeat.systole)) + mod_delayed_work(system_highpri_wq, &engine->heartbeat.work, 0); +} + +static void +untrack_heartbeat(struct intel_engine_cs *engine) +{ + struct i915_request *rq; + + rq = fetch_and_zero(&engine->heartbeat.systole); + if (!rq) + return; + + ENGINE_TRACE(engine, "heartbeat " RQ_FMT "completed\n", RQ_ARG(rq)); + I915_SELFTEST_ONLY(engine->heartbeat.completed++); + + dma_fence_remove_callback(&rq->fence, &engine->heartbeat.cb); + i915_request_put(rq); +} + +static void +track_heartbeat(struct intel_engine_cs *engine, struct i915_request *rq) +{ + ENGINE_TRACE(engine, "heartbeat " RQ_FMT "started\n", RQ_ARG(rq)); + I915_SELFTEST_ONLY(engine-&g
Re: [Intel-gfx] [PATCH] drm/i915: Refine VT-d scanout workaround
Quoting Matthew Auld (2021-02-11 17:00:20) > Throwing some color_adjust at it might be another option to consider. > Maybe something like: > > +static void i915_ggtt_color_adjust_vdt(const struct drm_mm_node *node, > + unsigned long color, > + u64 *start, > + u64 *end) > +{ > + if (color == COLOR_VDT) { > + *start += VDT_PADDING; > + *end -= VDT_PADDING; > + return; > + } > + > + if (node->allocated && node->color == COLOR_VDT) > + *start += VDT_PADDING; > + > + node = list_next_entry(node, node_list); > + if (node->allocated && node->color == COLOR_VDT) > + *end -= VDT_PADDING; > +} > > But not sure if I would call that simpler. Plus we need to add more > special casing in the eviction code. And I guess the cache coloring > might also be active here, which might be nasty. Also we end up with a > bunch of holes in the address space that are unusable, yet the mm search > will keep hitting them anyway. Ok, I guess that's what you meant with > not introducing "extra nodes". Hmmm. I considered trying to use coloring, but the problem I found was in knowing whether or not to fill outside of the vma with scratch pages. We would have to lookup each PTE to check it is not in use, then fill with scratch. And if we removed a neighbour, it would have to check to see if it should replace the guard pages. (And it's more complicated by the wrap-around of the VT-d, an object at beginning of the GGTT would overfetch into the pages at the other end of the GGTT.) I felt that make an explicit reservation and accounting the VT-d overfetch to the scanout vma would save a lot of hassle. The PTE are accounted for (will not be reused, and safe to clear after resume), dedicated as scratch for the overfetch and the overfetch cannot wrap around as we force the vma to be away from the edges. 
Packing the information into the single scanout i915_vma so that we do a single padded i915_vma_insert seemed to be much easier to manage than having to do additional i915_vma_inserts on either side (and so we have to somehow manage searching for enough space for all 3 in the first call etc). Plus i915_vma is already massive :| -Chris ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/i915: Refine VT-d scanout workaround
Quoting Ville Syrjälä (2021-02-11 16:05:59) > On Wed, Feb 10, 2021 at 11:39:46PM +0000, Chris Wilson wrote: > > @@ -637,6 +642,13 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 > > alignment, u64 flags) > > alignment, vma->fence_alignment); > > } > > > > + /* VT-d requires padding before/after the vma */ > > + if (flags & PIN_VTD) { > > + alignment = max_t(typeof(alignment), alignment, VTD_GUARD); > > + vma->guard = alignment; > > + size += 2 * vma->guard; > > + } > > + > > GEM_BUG_ON(!IS_ALIGNED(size, I915_GTT_PAGE_SIZE)); > > GEM_BUG_ON(!IS_ALIGNED(alignment, I915_GTT_MIN_ALIGNMENT)); > > GEM_BUG_ON(!is_power_of_2(alignment)); > > @@ -725,6 +737,11 @@ i915_vma_insert(struct i915_vma *vma, u64 size, > u64 alignment, u64 flags) > > > > list_add_tail(&vma->vm_link, &vma->vm->bound_list); > > > > + if (flags & PIN_VTD) { > > + vma->node.start += vma->guard; > > Was a bit worried for a second that this might give the display > a potentially misaligned vma start. But looks like you did consider > all that: VTD_GUARD==POT, alignment + guard both get bumped > to the max(). So AFAICS should guarantee everyone is happy. > > I guess we're now wasting a lot more ggtt address space though? > Not sure if anyone has ever been at risk of running out though. > And DPT should help with this on new platforms.
-Chris
Re: [Intel-gfx] [PATCH] drm/i915: Refine VT-d scanout workaround
Quoting Matthew Auld (2021-02-11 14:25:41) > On 10/02/2021 23:39, Chris Wilson wrote: > > VT-d may cause overfetch of the scanout PTE, both before and after the > > vma (depending on the scanout orientation). bspec recommends that we > > provide a tile-row in either directions, and suggests using 160 PTE, > > warning that the accesses will wrap around the ends of the GGTT. > > Currently, we fill the entire GGTT with scratch pages when using VT-d to > > always ensure there are valid entries around every vma, including > > scanout. However, writing every PTE is slow as on recent devices we > > perform 8MiB of uncached writes, incurring an extra 100ms during resume. > > > > If instead we focus on only putting guard pages around scanout, we can > > avoid touching the whole GGTT. To avoid having to introduce extra nodes > > around each scanout vma, we adjust the scanout drm_mm_node to be smaller > > than the allocated space, and fixup the extra PTE during dma binding. > > > > Signed-off-by: Chris Wilson > > Cc: Ville Syrjälä > > Cc: Matthew Auld > > --- > > drivers/gpu/drm/i915/gem/i915_gem_domain.c | 3 ++ > > drivers/gpu/drm/i915/gt/intel_ggtt.c | 37 -- > > drivers/gpu/drm/i915/i915_gem_gtt.h| 1 + > > drivers/gpu/drm/i915/i915_vma.c| 23 ++ > > drivers/gpu/drm/i915/i915_vma_types.h | 1 + > > 5 files changed, 41 insertions(+), 24 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c > > b/drivers/gpu/drm/i915/gem/i915_gem_domain.c > > index 0478b069c202..9f2ccc255ca1 100644 > > --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c > > @@ -345,6 +345,9 @@ i915_gem_object_pin_to_display_plane(struct > > drm_i915_gem_object *obj, > > if (ret) > > goto err; > > > > + if (intel_scanout_needs_vtd_wa(i915)) > > + flags |= PIN_VTD; > > + > > /* > >* As the user may map the buffer once pinned in the display plane > >* (e.g. 
libkms for the bootup splash), we have to ensure that we > > diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c > > b/drivers/gpu/drm/i915/gt/intel_ggtt.c > > index b0b8ded834f0..416f77f48561 100644 > > --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c > > +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c > > @@ -238,6 +238,11 @@ static void gen8_ggtt_insert_entries(struct > > i915_address_space *vm, > > > > gte = (gen8_pte_t __iomem *)ggtt->gsm; > > gte += vma->node.start / I915_GTT_PAGE_SIZE; > > + > > + end = gte - vma->guard / I915_GTT_PAGE_SIZE; > > + while (end < gte) > > + gen8_set_pte(end++, vm->scratch[0]->encode); > > + > > end = gte + vma->node.size / I915_GTT_PAGE_SIZE; > > > > for_each_sgt_daddr(addr, iter, vma->pages) > > @@ -245,6 +250,7 @@ static void gen8_ggtt_insert_entries(struct > > i915_address_space *vm, > > GEM_BUG_ON(gte > end); > > > > /* Fill the allocated but "unused" space beyond the end of the buffer > > */ > > + end += vma->guard / I915_GTT_PAGE_SIZE; > > while (gte < end) > > gen8_set_pte(gte++, vm->scratch[0]->encode); > > > > @@ -289,6 +295,11 @@ static void gen6_ggtt_insert_entries(struct > > i915_address_space *vm, > > > > gte = (gen6_pte_t __iomem *)ggtt->gsm; > > gte += vma->node.start / I915_GTT_PAGE_SIZE; > > + > > + end = gte - vma->guard / I915_GTT_PAGE_SIZE; > > + while (end < gte) > > + gen8_set_pte(end++, vm->scratch[0]->encode); > > + > > end = gte + vma->node.size / I915_GTT_PAGE_SIZE; > > > > for_each_sgt_daddr(addr, iter, vma->pages) > > @@ -296,6 +307,7 @@ static void gen6_ggtt_insert_entries(struct > > i915_address_space *vm, > > GEM_BUG_ON(gte > end); > > > > /* Fill the allocated but "unused" space beyond the end of the buffer > > */ > > + end += vma->guard / I915_GTT_PAGE_SIZE; > > while (gte < end) > > iowrite32(vm->scratch[0]->encode, gte++); > > > > @@ -311,27 +323,6 @@ static void nop_clear_range(struct i915_address_space > > *vm, > > { > > } > > > > -static void gen8_ggtt_clear_range(struct i915_address_space *vm, > > - 
u64 st
Re: [Intel-gfx] [PATCH] drm/i915: Probe system memory regions before enabling HW
Quoting Matthew Auld (2021-02-11 12:27:56) > On 11/02/2021 11:20, Chris Wilson wrote: > > If we want to track system/stolen as memory regions, we need to set up > > our bookkeeping *before* we want to start allocating and reserving > > objects in those regions. In particular, in setting up the GGTT we will > > try to preallocate stolen regions configured by the BIOS, and so should > > prepare the system-stolen memory region beforehand. > > > > Signed-off-by: Chris Wilson > > Cc: Matthew Auld > > --- > > drivers/gpu/drm/i915/i915_drv.c | 8 > > 1 file changed, 4 insertions(+), 4 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/i915_drv.c > > b/drivers/gpu/drm/i915/i915_drv.c > > index b708517d3972..139cea4443fd 100644 > > --- a/drivers/gpu/drm/i915/i915_drv.c > > +++ b/drivers/gpu/drm/i915/i915_drv.c > > @@ -557,6 +557,10 @@ static int i915_driver_hw_probe(struct > > drm_i915_private *dev_priv) > > > > i915_perf_init(dev_priv); > > > > + ret = intel_memory_regions_hw_probe(dev_priv); > > + if (ret) > > + goto err_ggtt; > > Hmmm, adjust_stolen is also peeking at ggtt_total_entries(ggtt) on some > old platforms. A nice catch-22. More splitting required. Maybe a fixup pass from init_ggtt would work best. -Chris
[Intel-gfx] [PATCH v3] drm/i915/gt: Ratelimit heartbeat completion probing
The heartbeat runs through a few phases that we expect to complete within a certain number of heartbeat intervals. First we must submit the heartbeat to the queue, and if the queue is occupied it may take a couple of intervals before the heartbeat preempts the workload and is submitted to HW. Once running on HW, completion is not instantaneous as it may have to first reset the current workload before it itself runs through the empty request and signals completion. As such, we know that the heartbeat must take at least the preempt reset timeout and before we have had a chance to reset the engine, we do not want to issue a global reset ourselves (simply so that we only try to do one reset at a time and not confuse ourselves by resetting twice and hitting an innocent.) So by taking into consideration that once running the request must take a finite amount of time, we can delay the final completion check to accommodate that and avoid checking too early (before we've had a chance to handle any engine resets required). v2: Attach a callback to flush the work immediately upon the heartbeat completion and insert the delay before the next. v3: Now with more tracking for selftests and detection of false/unexpected hang reports. Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2853 Suggested-by: CQ Tang Signed-off-by: Chris Wilson --- .../gpu/drm/i915/gt/intel_engine_heartbeat.c | 104 +++--- drivers/gpu/drm/i915/gt/intel_engine_types.h | 8 ++ .../drm/i915/gt/selftest_engine_heartbeat.c | 92 ++-- drivers/gpu/drm/i915/gt/selftest_execlists.c | 5 +- 4 files changed, 159 insertions(+), 50 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c index 0b062fad1837..71aa02995417 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c @@ -20,6 +20,18 @@ * issue a reset -- in the hope that restores progress. 
*/ +#define HEARTBEAT_COMPLETION 50u /* milliseconds */ + +static long completion_timeout(const struct intel_engine_cs *engine) +{ + long timeout = HEARTBEAT_COMPLETION; + + if (intel_engine_has_preempt_reset(engine)) + timeout += READ_ONCE(engine->props.preempt_timeout_ms); + + return msecs_to_jiffies(timeout); +} + static bool next_heartbeat(struct intel_engine_cs *engine) { long delay; @@ -29,6 +41,26 @@ static bool next_heartbeat(struct intel_engine_cs *engine) return false; delay = msecs_to_jiffies_timeout(delay); + + /* +* Once we submit a heartbeat to the HW, we know that it will take +* at least a certain amount of time to complete. On a hanging system +* it will first have to wait for the preempt reset timeout, and +* then it will take some time for the reset to resume with the +* heartbeat and for it to complete. So once we have submitted the +* heartbeat to HW, we can wait a while longer before declaring the +* engine stuck and forcing a reset ourselves. If we do a reset +* and the engine is also doing a reset, it is possible that we +* reset the engine twice, harming an innocent. +* +* Before we have submitted the heartbeat, we do not want to change +* the interval as we want to promote the heartbeat and trigger +* preemption in a deterministic time frame. 
+*/ + if (engine->heartbeat.systole && + i915_request_is_active(engine->heartbeat.systole)) + delay = max(delay, completion_timeout(engine)); + if (delay >= HZ) delay = round_jiffies_up_relative(delay); mod_delayed_work(system_highpri_wq, &engine->heartbeat.work, delay + 1); @@ -48,12 +80,51 @@ heartbeat_create(struct intel_context *ce, gfp_t gfp) return rq; } +static void defibrillator(struct dma_fence *f, struct dma_fence_cb *cb) +{ + struct intel_engine_cs *engine = + container_of(cb, typeof(*engine), heartbeat.cb); + + if (READ_ONCE(engine->heartbeat.systole)) + mod_delayed_work(system_highpri_wq, &engine->heartbeat.work, 0); +} + +static void +untrack_heartbeat(struct intel_engine_cs *engine) +{ + struct i915_request *rq; + + rq = fetch_and_zero(&engine->heartbeat.systole); + if (!rq) + return; + + ENGINE_TRACE(engine, "heartbeat " RQ_FMT "completed\n", RQ_ARG(rq)); + I915_SELFTEST_ONLY(engine->heartbeat.completed++); + + dma_fence_remove_callback(&rq->fence, &engine->heartbeat.cb); + i915_request_put(rq); +} + +static void +track_heartbeat(struct intel_engine_cs *engine, struct i915_request *rq) +{ + ENGINE_TRACE(engine, "heartbeat " RQ_FMT "started\n", RQ_ARG(rq)); + I915_SELFTEST_ONLY(engine-&g
[Intel-gfx] [PATCH] lib: Add a YAML emitter
Provide a library to generate correct YAML for use in structured debugfs or similar information dumps. This will be useful to pull our large information dumps into a forwards compatible, parse-able file-format by forcing some structure upon ourselves! Originally from https://github.com/yaml/libyaml/blob/master/src/emitter.c under a permissive MIT licence. Signed-off-by: Chris Wilson Cc: Jani Nikula Cc: Joonas Lahtinen --- Converting to kerneldoc is about the last major task. It uses an opencoded stack struct which might be nice to convert to a library, maybe just use XArray? It has been dogfooded to convert i915_engine_info (and friends) https://patchwork.freedesktop.org/patch/353209/?series=73403&rev=1 --- include/linux/yaml-events.h | 259 include/linux/yaml.h| 190 +++ lib/Kconfig | 11 + lib/Makefile|2 + lib/yaml/Makefile |8 + lib/yaml/test-yaml.c| 123 ++ lib/yaml/yaml-emitter.c | 2539 +++ lib/yaml/yaml-emitter.h | 140 ++ lib/yaml/yaml-events.c | 424 ++ lib/yaml/yaml-simple.c | 227 10 files changed, 3923 insertions(+) create mode 100644 include/linux/yaml-events.h create mode 100644 include/linux/yaml.h create mode 100644 lib/yaml/Makefile create mode 100644 lib/yaml/test-yaml.c create mode 100644 lib/yaml/yaml-emitter.c create mode 100644 lib/yaml/yaml-emitter.h create mode 100644 lib/yaml/yaml-events.c create mode 100644 lib/yaml/yaml-simple.c diff --git a/include/linux/yaml-events.h b/include/linux/yaml-events.h new file mode 100644 index ..33d1bb227e65 --- /dev/null +++ b/include/linux/yaml-events.h @@ -0,0 +1,259 @@ +/* SPDX-License-Identifier: MIT */ +/* + * Copyright (c) 2006-2016 Kirill Simonov + * Copyright (c) 2017-2019 Ingy döt Net + * Copyright (c) 2020 Intel Corporation + */ + +#ifndef __LINUX_YAML_EVENTS_H__ +#define __LINUX_YAML_EVENTS_H__ + +#include +#include +#include + +/** The version directive data. */ +struct yaml_version_directive { + int major; + int minor; +}; + +/** The tag directive data. 
*/ +struct yaml_tag_directive { + char *handle; + char *prefix; +}; + +/** The pointer position. */ +struct yaml_mark { + size_t index; + size_t line; + size_t column; +}; + +enum yaml_event_type { + YAML_NO_EVENT = 0, + + YAML_STREAM_START_EVENT, + YAML_STREAM_END_EVENT, + + YAML_DOCUMENT_START_EVENT, + YAML_DOCUMENT_END_EVENT, + + YAML_ALIAS_EVENT, + YAML_SCALAR_EVENT, + + YAML_SEQUENCE_START_EVENT, + YAML_SEQUENCE_END_EVENT, + + YAML_MAPPING_START_EVENT, + YAML_MAPPING_END_EVENT +}; + +struct yaml_event { + enum yaml_event_type type; + struct list_head link; + + union { + /** The stream parameters (for YAML_STREAM_START_EVENT). */ + struct { + } stream_start; + + /** The document parameters (for YAML_DOCUMENT_START_EVENT). */ + struct { + struct yaml_version_directive *version; + + struct { + struct yaml_tag_directive *start; + struct yaml_tag_directive *end; + } tags; + + int implicit; + } document_start; + + /** The document end parameters (for YAML_DOCUMENT_END_EVENT). */ + struct { + int implicit; + } document_end; + + /** The alias parameters (for YAML_ALIAS_EVENT). */ + struct { + char *anchor; + } alias; + + /** The scalar parameters (for YAML_SCALAR_EVENT). */ + struct { + char *anchor; + char *tag; + char *value; + size_t length; + int plain_implicit; + int quoted_implicit; + enum yaml_scalar_style style; + } scalar; + + /** The sequence parameters (for YAML_SEQUENCE_START_EVENT). */ + struct { + char *anchor; + char *tag; + int implicit; + enum yaml_sequence_style style; + } sequence_start; + + /** The mapping parameters (for YAML_MAPPING_START_EVENT). */ + struct { + char *anchor; + char *tag; + int implicit; + enum yaml_mapping_style style; + } mapping_start; + }; + + struct yaml_mark start_mark; + struct yaml_mark end_mark; +}; + +struct yaml_event *yaml_stream_start_event_create(gfp_t gfp); +struct yaml_event *yaml_stream_end_event_create(gfp_t
[Intel-gfx] [PATCH] drm/i915: Defer object allocation from GGTT probe to GGTT init
Rather than try and allocate objects as we perform our early HW probes, defer the allocation for GGTT objects (such as the scratch page) to later in the initialisation. Signed-off-by: Chris Wilson Cc: Matthew Auld --- drivers/gpu/drm/i915/gt/intel_ggtt.c | 38 +++- 1 file changed, 15 insertions(+), 23 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c index b0b8ded834f0..23c7eb462b2f 100644 --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c @@ -78,19 +78,29 @@ static int ggtt_init_hw(struct i915_ggtt *ggtt) */ int i915_ggtt_init_hw(struct drm_i915_private *i915) { + struct i915_ggtt *ggtt = &i915->ggtt; + u32 pte_flags; int ret; + ret = setup_scratch_page(&ggtt->vm); + if (ret) + return ret; + + pte_flags = 0; + if (i915_gem_object_is_lmem(ggtt->vm.scratch[0])) + pte_flags |= PTE_LM; + + ggtt->vm.scratch[0]->encode = + ggtt->vm.pte_encode(px_dma(ggtt->vm.scratch[0]), + I915_CACHE_NONE, 0); + /* * Note that we use page colouring to enforce a guard page at the * end of the address space. This is required as the CS may prefetch * beyond the end of the batch buffer, across the page boundary, * and beyond the end of the GTT if we do not provide a guard. 
*/ - ret = ggtt_init_hw(&i915->ggtt); - if (ret) - return ret; - - return 0; + return ggtt_init_hw(ggtt); } /* @@ -803,8 +813,6 @@ static int ggtt_probe_common(struct i915_ggtt *ggtt, u64 size) struct drm_i915_private *i915 = ggtt->vm.i915; struct pci_dev *pdev = to_pci_dev(i915->drm.dev); phys_addr_t phys_addr; - u32 pte_flags; - int ret; /* For Modern GENs the PTEs and register space are split in the BAR */ phys_addr = pci_resource_start(pdev, 0) + pci_resource_len(pdev, 0) / 2; @@ -825,22 +833,6 @@ static int ggtt_probe_common(struct i915_ggtt *ggtt, u64 size) return -ENOMEM; } - ret = setup_scratch_page(&ggtt->vm); - if (ret) { - drm_err(&i915->drm, "Scratch setup failed\n"); - /* iounmap will also get called at remove, but meh */ - iounmap(ggtt->gsm); - return ret; - } - - pte_flags = 0; - if (i915_gem_object_is_lmem(ggtt->vm.scratch[0])) - pte_flags |= PTE_LM; - - ggtt->vm.scratch[0]->encode = - ggtt->vm.pte_encode(px_dma(ggtt->vm.scratch[0]), - I915_CACHE_NONE, pte_flags); - return 0; } -- 2.20.1 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH] drm/i915: Probe system memory regions before enabling HW
If we want to track system/stolen as memory regions, we need to setup our bookkeeping *before* we want to start allocating and reserving objects in those regions. In particular, in setup up the GGTT we will try to preallocate stolen regions configured by the BIOS, and so should prepare the system-stolen memory region beforehand. Signed-off-by: Chris Wilson Cc: Matthew Auld --- drivers/gpu/drm/i915/i915_drv.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index b708517d3972..139cea4443fd 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -557,6 +557,10 @@ static int i915_driver_hw_probe(struct drm_i915_private *dev_priv) i915_perf_init(dev_priv); + ret = intel_memory_regions_hw_probe(dev_priv); + if (ret) + goto err_ggtt; + ret = i915_ggtt_probe_hw(dev_priv); if (ret) goto err_perf; @@ -569,10 +573,6 @@ static int i915_driver_hw_probe(struct drm_i915_private *dev_priv) if (ret) goto err_ggtt; - ret = intel_memory_regions_hw_probe(dev_priv); - if (ret) - goto err_ggtt; - intel_gt_init_hw_early(&dev_priv->gt, &dev_priv->ggtt); ret = intel_gt_probe_lmem(&dev_priv->gt); -- 2.20.1 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [5.10.y regression] i915 clear-residuals mitigation is causing gfx issues
Quoting Hans de Goede (2021-02-11 10:36:13) > Hi, > > On 2/10/21 1:48 PM, Chris Wilson wrote: > > Quoting Hans de Goede (2021-02-10 10:37:19) > >> Hi, > >> > >> On 2/10/21 12:07 AM, Chris Wilson wrote: > >>> Quoting Hans de Goede (2021-02-09 11:46:46) > >>>> Hi, > >>>> > >>>> On 2/9/21 12:27 AM, Chris Wilson wrote: > >>>>> Quoting Hans de Goede (2021-02-08 20:38:58) > >>>>>> Hi All, > >>>>>> > >>>>>> We (Fedora) have been receiving reports from multiple users about gfx > >>>>>> issues / glitches > >>>>>> stating with 5.10.9. All reporters are users of Ivy Bridge / Haswell > >>>>>> iGPUs and all > >>>>>> reporters report that adding i915.mitigations=off to the cmdline fixes > >>>>>> things, see: > >>>>> > >>>>> I tried to reproduce this on the w/e on hsw-gt1, to no avail; and piglit > >>>>> did not report any differences with and without mitigations. I have yet > >>>>> to test other platforms. So I don't yet have an alternative. > >>>> > >>>> Note the original / first reporter of: > >>>> > >>>> https://bugzilla.redhat.com/show_bug.cgi?id=1925346 > >>>> > >>>> Is using hsw-gt2, so it seems that the problem is not just the enabling > >>>> of > >>>> the mitigations on ivy-bridge / bay-trail but that there actually is > >>>> a regression on devices where the WA worked fine before... > >>> > >>> There have been 3 crashes uploaded related to v5.10.9, and in all 3 > >>> cases the ACTHD has been in the first page. This strongly suggests that > >>> the w/a is scribbling over address 0. And there's then a very good > >>> chance that > >>> > >>> commit 29d35b73ead4e41aa0d1a954c9bfbdce659ec5d6 > >>> Author: Chris Wilson > >>> Date: Mon Jan 25 12:50:33 2021 + > >>> > >>> drm/i915/gt: Always try to reserve GGTT address 0x0 > >>> > >>> commit 489140b5ba2e7cc4b853c29e0591895ddb462a82 upstream. > >>> > >>> in v5.10.14 is sufficient to hide the issue. > >> > >> That one actually is already in v5.10.13 and the various reportes of these > >> issues have already tested 5.10.13. 
They did mention that it took longer > >> to reproduce with 5.10.13 then with 5.10.10, but that could also be due to: > >> > >> "drm/i915/gt: Clear CACHE_MODE prior to clearing residuals" > >> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.10.y&id=520d05a77b2866eb4cb9e548e1d8c8abcfe60ec5 > > > > Started looking for scratch page overwrites, and found this little gem: > > https://patchwork.freedesktop.org/patch/420436/?series=86947&rev=1 > > > > Looks promising wrt the cause of overwriting random addresses -- and > > I hope that is the explanation for the glitches/hangs. I have a hsw gt2 > > with gnome shell, piglit is happy, but I suspect it is all due to > > placement and so will only occur at random. > > If you can give me a list of commits to cherry-pick then I can prepare > a Fedora 5.10.y kernel which those added for the group of Fedora users > who are hitting this to test. e627d5923cae ("drm/i915/gt: One more flush for Baytrail clear residuals") d30bbd62b1bf ("drm/i915/gt: Flush before changing register state") 1914911f4aa0 ("drm/i915/gt: Correct surface base address for renderclear") are missing from v5.10.15 -Chris ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 1/6] drm/i915/gt: Sanitize GPU during prepare-to-suspend
Quoting Rodrigo Vivi (2021-02-11 04:25:17) > On Wed, Feb 10, 2021 at 10:19:50PM +0000, Chris Wilson wrote: > > After calling intel_gt_suspend_prepare(), the driver starts to turn off > > various subsystems, such as clearing the GGTT, before calling > > intel_gt_suspend_late() to relinquish control over the GT. However, if > > we still have internal GPU state active as we clear the GGTT, the GPU > > may write back its internal state to the residual GGTT addresses that > > are now pointing into scratch. Let's reset the GPU to clear that > > internal state as soon we have idled the GPU in prepare-to-suspend. > > > > Signed-off-by: Chris Wilson > > --- > > drivers/gpu/drm/i915/gt/intel_gt_pm.c | 5 - > > 1 file changed, 4 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c > > b/drivers/gpu/drm/i915/gt/intel_gt_pm.c > > index 0bd303d2823e..f41612faa269 100644 > > --- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c > > +++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c > > @@ -295,6 +295,9 @@ void intel_gt_suspend_prepare(struct intel_gt *gt) > > wait_for_suspend(gt); > > you just wedged the gpu here... Potentially. As a means to clear a stuck GPU and force it to idle. > > intel_uc_suspend(>->uc); > > + > > + /* Flush all the contexts and internal state before turning off GGTT > > */ > > + gt_sanitize(gt, false); > > and now we are unsetting wedge here... > > is this right? But irrelevant, since it is undone on any of the resume pathways which must be taken by this point. Resume has been for many years the method to unwedge a GPU; with the presumption being that the intervening PCI level reset would be enough to recover the GPU. Otherwise, it would presumably quite quickly go back into the wedged state. The wedging on suspend is just there to cancel outstanding work. Which is not what we want (we just want to remove work), but is what we have for the moment. The sanitize is to make sure we don't leak our state beyond our control of the HW. 
-Chris
Re: [Intel-gfx] [PATCH 2/6] drm/i915: Clear internal GT state on hibernate
Quoting Rodrigo Vivi (2021-02-11 04:28:41) > On Wed, Feb 10, 2021 at 10:19:51PM +0000, Chris Wilson wrote: > > Call intel_gt_suspend_late() to disable the GT before hibernation. > > > > Signed-off-by: Chris Wilson > > --- > > drivers/gpu/drm/i915/gem/i915_gem_pm.c | 2 ++ > > 1 file changed, 2 insertions(+) > > > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pm.c > > b/drivers/gpu/drm/i915/gem/i915_gem_pm.c > > index 000e1cd8e920..da0ef483ad6b 100644 > > --- a/drivers/gpu/drm/i915/gem/i915_gem_pm.c > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_pm.c > > @@ -115,6 +115,8 @@ int i915_gem_freeze_late(struct drm_i915_private *i915) > >* the objects as well, see i915_gem_freeze() > >*/ > > > > + intel_gt_suspend_late(&i915->gt); > > + > > why are you calling this here instead of directly in i915_drm_suspend_late ? For symmetry with the current code paths. -Chris ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [5.10.y regression] i915 clear-residuals mitigation is causing gfx issues
Quoting Hans de Goede (2021-02-10 10:37:19) > Hi, > > On 2/10/21 12:07 AM, Chris Wilson wrote: > > Quoting Hans de Goede (2021-02-09 11:46:46) > >> Hi, > >> > >> On 2/9/21 12:27 AM, Chris Wilson wrote: > >>> Quoting Hans de Goede (2021-02-08 20:38:58) > >>>> Hi All, > >>>> > >>>> We (Fedora) have been receiving reports from multiple users about gfx > >>>> issues / glitches > >>>> stating with 5.10.9. All reporters are users of Ivy Bridge / Haswell > >>>> iGPUs and all > >>>> reporters report that adding i915.mitigations=off to the cmdline fixes > >>>> things, see: > >>> > >>> I tried to reproduce this on the w/e on hsw-gt1, to no avail; and piglit > >>> did not report any differences with and without mitigations. I have yet > >>> to test other platforms. So I don't yet have an alternative. > >> > >> Note the original / first reporter of: > >> > >> https://bugzilla.redhat.com/show_bug.cgi?id=1925346 > >> > >> Is using hsw-gt2, so it seems that the problem is not just the enabling of > >> the mitigations on ivy-bridge / bay-trail but that there actually is > >> a regression on devices where the WA worked fine before... > > > > There have been 3 crashes uploaded related to v5.10.9, and in all 3 > > cases the ACTHD has been in the first page. This strongly suggests that > > the w/a is scribbling over address 0. And there's then a very good > > chance that > > > > commit 29d35b73ead4e41aa0d1a954c9bfbdce659ec5d6 > > Author: Chris Wilson > > Date: Mon Jan 25 12:50:33 2021 + > > > > drm/i915/gt: Always try to reserve GGTT address 0x0 > > > > commit 489140b5ba2e7cc4b853c29e0591895ddb462a82 upstream. > > > > in v5.10.14 is sufficient to hide the issue. > > That one actually is already in v5.10.13 and the various reportes of these > issues have already tested 5.10.13. 
They did mention that it took longer > to reproduce with 5.10.13 then with 5.10.10, but that could also be due to: > > "drm/i915/gt: Clear CACHE_MODE prior to clearing residuals" > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.10.y&id=520d05a77b2866eb4cb9e548e1d8c8abcfe60ec5 There's also another pair of pipecontrols required for that: d30bbd62b1bf ("drm/i915/gt: Flush before changing register state") which didn't get picked up for stable. https://intel-gfx-ci.01.org/tree/linux-stable/index-alt.html? shows that we are still missing a couple of fixes for the w/a, at least compare to BAT on drm-tip. -Chris ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH] drm/i915: Refine VT-d scanout workaround
VT-d may cause overfetch of the scanout PTE, both before and after the vma (depending on the scanout orientation). bspec recommends that we provide a tile-row in either directions, and suggests using 160 PTE, warning that the accesses will wrap around the ends of the GGTT. Currently, we fill the entire GGTT with scratch pages when using VT-d to always ensure there are valid entries around every vma, including scanout. However, writing every PTE is slow as on recent devices we perform 8MiB of uncached writes, incurring an extra 100ms during resume. If instead we focus on only putting guard pages around scanout, we can avoid touching the whole GGTT. To avoid having to introduce extra nodes around each scanout vma, we adjust the scanout drm_mm_node to be smaller than the allocated space, and fixup the extra PTE during dma binding. Signed-off-by: Chris Wilson Cc: Ville Syrjälä Cc: Matthew Auld --- drivers/gpu/drm/i915/gem/i915_gem_domain.c | 3 ++ drivers/gpu/drm/i915/gt/intel_ggtt.c | 37 -- drivers/gpu/drm/i915/i915_gem_gtt.h| 1 + drivers/gpu/drm/i915/i915_vma.c| 23 ++ drivers/gpu/drm/i915/i915_vma_types.h | 1 + 5 files changed, 41 insertions(+), 24 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c index 0478b069c202..9f2ccc255ca1 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c @@ -345,6 +345,9 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj, if (ret) goto err; + if (intel_scanout_needs_vtd_wa(i915)) + flags |= PIN_VTD; + /* * As the user may map the buffer once pinned in the display plane * (e.g. 
libkms for the bootup splash), we have to ensure that we diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c index b0b8ded834f0..416f77f48561 100644 --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c @@ -238,6 +238,11 @@ static void gen8_ggtt_insert_entries(struct i915_address_space *vm, gte = (gen8_pte_t __iomem *)ggtt->gsm; gte += vma->node.start / I915_GTT_PAGE_SIZE; + + end = gte - vma->guard / I915_GTT_PAGE_SIZE; + while (end < gte) + gen8_set_pte(end++, vm->scratch[0]->encode); + end = gte + vma->node.size / I915_GTT_PAGE_SIZE; for_each_sgt_daddr(addr, iter, vma->pages) @@ -245,6 +250,7 @@ static void gen8_ggtt_insert_entries(struct i915_address_space *vm, GEM_BUG_ON(gte > end); /* Fill the allocated but "unused" space beyond the end of the buffer */ + end += vma->guard / I915_GTT_PAGE_SIZE; while (gte < end) gen8_set_pte(gte++, vm->scratch[0]->encode); @@ -289,6 +295,11 @@ static void gen6_ggtt_insert_entries(struct i915_address_space *vm, gte = (gen6_pte_t __iomem *)ggtt->gsm; gte += vma->node.start / I915_GTT_PAGE_SIZE; + + end = gte - vma->guard / I915_GTT_PAGE_SIZE; + while (end < gte) + gen8_set_pte(end++, vm->scratch[0]->encode); + end = gte + vma->node.size / I915_GTT_PAGE_SIZE; for_each_sgt_daddr(addr, iter, vma->pages) @@ -296,6 +307,7 @@ static void gen6_ggtt_insert_entries(struct i915_address_space *vm, GEM_BUG_ON(gte > end); /* Fill the allocated but "unused" space beyond the end of the buffer */ + end += vma->guard / I915_GTT_PAGE_SIZE; while (gte < end) iowrite32(vm->scratch[0]->encode, gte++); @@ -311,27 +323,6 @@ static void nop_clear_range(struct i915_address_space *vm, { } -static void gen8_ggtt_clear_range(struct i915_address_space *vm, - u64 start, u64 length) -{ - struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm); - unsigned int first_entry = start / I915_GTT_PAGE_SIZE; - unsigned int num_entries = length / I915_GTT_PAGE_SIZE; - const gen8_pte_t scratch_pte = 
vm->scratch[0]->encode; - gen8_pte_t __iomem *gtt_base = - (gen8_pte_t __iomem *)ggtt->gsm + first_entry; - const int max_entries = ggtt_total_entries(ggtt) - first_entry; - int i; - - if (WARN(num_entries > max_entries, -"First entry = %d; Num entries = %d (max=%d)\n", -first_entry, num_entries, max_entries)) - num_entries = max_entries; - - for (i = 0; i < num_entries; i++) - gen8_set_pte(>t_base[i], scratch_pte); -} - static void bxt_vtd_ggtt_wa(struct i915_address_space *vm) { /* @@ -898,8 +889,6 @@ static int gen8_gmch_probe(struct i915_ggtt *ggtt) ggtt->vm.cleanup = gen6_gmch_remove; ggtt->vm.insert_page = gen8_ggtt_insert_page
[Intel-gfx] [PATCH 4/6] drm/i915/selftests: Restrict partial-tiling to write into the object
Check that the address we are about to write into maps into the object to avoid stray writes into the scratch page. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c | 11 +-- 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c index 39293d98f34d..df558ce95a94 100644 --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c @@ -104,9 +104,12 @@ static int check_partial_mapping(struct drm_i915_gem_object *obj, GEM_BUG_ON(i915_gem_object_get_tiling(obj) != tile->tiling); GEM_BUG_ON(i915_gem_object_get_stride(obj) != tile->stride); - page = i915_prandom_u32_max_state(npages, prng); - view = compute_partial_view(obj, page, MIN_CHUNK_PAGES); + do { + page = i915_prandom_u32_max_state(npages, prng); + offset = tiled_offset(tile, page << PAGE_SHIFT); + } while (offset >= obj->base.size); + view = compute_partial_view(obj, page, MIN_CHUNK_PAGES); vma = i915_gem_object_ggtt_pin(obj, &view, 0, 0, PIN_MAPPABLE); if (IS_ERR(vma)) { pr_err("Failed to pin partial view: offset=%lu; err=%d\n", @@ -129,10 +132,6 @@ static int check_partial_mapping(struct drm_i915_gem_object *obj, iowrite32(page, io + n * PAGE_SIZE / sizeof(*io)); i915_vma_unpin_iomap(vma); - offset = tiled_offset(tile, page << PAGE_SHIFT); - if (offset >= obj->base.size) - goto out; - intel_gt_flush_ggtt_writes(&to_i915(obj->base.dev)->gt); p = i915_gem_object_get_page(obj, offset >> PAGE_SHIFT); -- 2.20.1 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH 3/6] drm/i915/selftests: Declare suspend_state before testing suspend
As we mock the suspend routines to exercise suspending driver and manipulating backing storage across the suspend, declare the suspend target as we do. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/selftests/i915_gem.c | 40 +-- 1 file changed, 30 insertions(+), 10 deletions(-) diff --git a/drivers/gpu/drm/i915/selftests/i915_gem.c b/drivers/gpu/drm/i915/selftests/i915_gem.c index dc394fb7ccfa..6c764bcfe512 100644 --- a/drivers/gpu/drm/i915/selftests/i915_gem.c +++ b/drivers/gpu/drm/i915/selftests/i915_gem.c @@ -5,6 +5,7 @@ */ #include +#include #include "gem/selftests/igt_gem_utils.h" #include "gem/selftests/mock_context.h" @@ -87,25 +88,38 @@ static void simulate_hibernate(struct drm_i915_private *i915) intel_runtime_pm_put(&i915->runtime_pm, wakeref); } -static int pm_prepare(struct drm_i915_private *i915) +static int do_prepare(struct drm_i915_private *i915) { i915_gem_suspend(i915); return 0; } -static void pm_suspend(struct drm_i915_private *i915) +static suspend_state_t set_pm_target(suspend_state_t target) { +#ifdef CONFIG_PM_SLEEP + return xchg(&pm_suspend_target_state, target); +#else + return PM_SUSPEND_ON; +#endif +} + +static suspend_state_t do_suspend(struct drm_i915_private *i915) +{ + suspend_state_t old = set_pm_target(PM_SUSPEND_MEM); intel_wakeref_t wakeref; with_intel_runtime_pm(&i915->runtime_pm, wakeref) { i915_ggtt_suspend(&i915->ggtt); i915_gem_suspend_late(i915); } + + return old; } -static void pm_hibernate(struct drm_i915_private *i915) +static suspend_state_t do_hibernate(struct drm_i915_private *i915) { + suspend_state_t old = set_pm_target(PM_SUSPEND_MAX); intel_wakeref_t wakeref; with_intel_runtime_pm(&i915->runtime_pm, wakeref) { @@ -114,9 +128,11 @@ static void pm_hibernate(struct drm_i915_private *i915) i915_gem_freeze(i915); i915_gem_freeze_late(i915); } + + return old; } -static void pm_resume(struct drm_i915_private *i915) +static void do_resume(struct drm_i915_private *i915, suspend_state_t saved) { intel_wakeref_t 
wakeref; @@ -128,12 +144,15 @@ static void pm_resume(struct drm_i915_private *i915) i915_ggtt_resume(&i915->ggtt); i915_gem_resume(i915); } + + set_pm_target(saved); } static int igt_gem_suspend(void *arg) { struct drm_i915_private *i915 = arg; struct i915_gem_context *ctx; + suspend_state_t saved; struct file *file; int err; @@ -148,16 +167,16 @@ static int igt_gem_suspend(void *arg) if (err) goto out; - err = pm_prepare(i915); + err = do_prepare(i915); if (err) goto out; - pm_suspend(i915); + saved = do_suspend(i915); /* Here be dragons! Note that with S3RST any S3 may become S4! */ simulate_hibernate(i915); - pm_resume(i915); + do_resume(i915, saved); err = switch_to_context(ctx); out: @@ -169,6 +188,7 @@ static int igt_gem_hibernate(void *arg) { struct drm_i915_private *i915 = arg; struct i915_gem_context *ctx; + suspend_state_t saved; struct file *file; int err; @@ -183,16 +203,16 @@ static int igt_gem_hibernate(void *arg) if (err) goto out; - err = pm_prepare(i915); + err = do_prepare(i915); if (err) goto out; - pm_hibernate(i915); + saved = do_hibernate(i915); /* Here be dragons! */ simulate_hibernate(i915); - pm_resume(i915); + do_resume(i915, saved); err = switch_to_context(ctx); out: -- 2.20.1 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH 5/6] drm/i915: Check for scratch page scribbling
Periodically check, for example when idling and upon closing user contexts, whether or not some client has written into unallocated PTEs in their ppGTT.

Signed-off-by: Chris Wilson
Reviewed-by: Mika Kuoppala
---
 .../drm/i915/gem/selftests/i915_gem_context.c | 24 +++---
 .../drm/i915/gem/selftests/i915_gem_mman.c    | 32 -
 drivers/gpu/drm/i915/gt/intel_engine_cs.c     | 31 +
 drivers/gpu/drm/i915/gt/intel_gt_pm.c         | 10 +
 drivers/gpu/drm/i915/gt/intel_gtt.c           | 45 +++
 drivers/gpu/drm/i915/gt/intel_gtt.h           |  1 +
 drivers/gpu/drm/i915/i915_scheduler.c         | 33 +-
 drivers/gpu/drm/i915/i915_utils.c             | 29 
 drivers/gpu/drm/i915/i915_utils.h             |  3 ++
 9 files changed, 141 insertions(+), 67 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
index df949320f2b5..5a9128dd3979 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
@@ -1737,7 +1737,7 @@ static int read_from_scratch(struct i915_gem_context *ctx,
 	return err;
 }
 
-static int check_scratch_page(struct i915_gem_context *ctx, u32 *out)
+static int check_ctx_scratch(struct i915_gem_context *ctx, u32 *out)
 {
 	struct i915_address_space *vm;
 	struct page *page;
@@ -1770,6 +1770,17 @@ static int check_ctx_scratch(struct i915_gem_context *ctx, u32 *out)
 	return err;
 }
 
+static void reset_ctx_scratch(struct i915_gem_context *ctx, u32 value)
+{
+	struct i915_address_space *vm = ctx_vm(ctx);
+	struct page *page = __px_page(vm->scratch[0]);
+	u32 *vaddr;
+
+	vaddr = kmap(page);
+	memset32(vaddr, value, PAGE_SIZE / sizeof(value));
+	kunmap(page);
+}
+
 static int igt_vm_isolation(void *arg)
 {
 	struct drm_i915_private *i915 = arg;
@@ -1816,11 +1827,11 @@ static int igt_vm_isolation(void *arg)
 		goto out_file;
 
 	/* Read the initial state of the scratch page */
-	err = check_scratch_page(ctx_a, &expected);
+	err = check_ctx_scratch(ctx_a, &expected);
 	if (err)
 		goto out_file;
 
-	err = check_scratch_page(ctx_b, &expected);
+	err = check_ctx_scratch(ctx_b, &expected);
 	if (err)
 		goto out_file;
 
@@ -1855,7 +1866,7 @@ static int igt_vm_isolation(void *arg)
 			err = read_from_scratch(ctx_b, engine, offset, &value);
 			if (err)
-				goto out_file;
+				goto out_scratch;
 
 			if (value != expected) {
 				pr_err("%s: Read %08x from scratch (offset 0x%08x_%08x), after %lu reads!\n",
@@ -1864,7 +1875,7 @@ static int igt_vm_isolation(void *arg)
 				       lower_32_bits(offset), this);
 				err = -EINVAL;
-				goto out_file;
+				goto out_scratch;
 			}
 
 			this++;
@@ -1875,6 +1886,9 @@ static int igt_vm_isolation(void *arg)
 	pr_info("Checked %lu scratch offsets across %lu engines\n",
 		count, num_engines);
 
+out_scratch:
+	/* As we deliberately write into scratch, cover up our tracks */
+	reset_ctx_scratch(ctx_a, expected);
 out_file:
 	if (igt_live_test_end(&t))
 		err = -EIO;
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
index df558ce95a94..b7f41f230c8f 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
@@ -155,6 +155,18 @@ static int check_partial_mapping(struct drm_i915_gem_object *obj,
 	drm_clflush_virt_range(cpu, sizeof(*cpu));
 	kunmap(p);
 
+	if (check_scratch_page(vma->vm)) {
+		pr_err("Partial view for %lu [%u] (offset=%llu, size=%u [%llu, row size %u], fence=%d, tiling=%d, stride=%d) overwrote scratch\n",
+		       page, n,
+		       view.partial.offset,
+		       view.partial.size,
+		       vma->size >> PAGE_SHIFT,
+		       tile->tiling ? tile_row_pages(obj) : 0,
+		       vma->fence ? vma->fence->id : -1,
+		       tile->tiling, tile->stride);
+		err = -EIO;
+	}
+
 out:
 	__i915_vma_put(vma);
 	return err;
@@ -250,6 +262,9 @@ static int check_partial_mappings(struct drm_i915_gem_object *obj,
 		return -EINTR;
 	}
 
+	if (check_scratch_page(&to_i915(obj->
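The detection scheme above boils down to poisoning the scratch page with a known value and later verifying that every word still holds it. A minimal userspace sketch of that idea follows; `POISON` and the helper names are illustrative, not the driver's:

```c
#include <stddef.h>
#include <stdint.h>

#define POISON 0x6b6b6b6bu /* illustrative value, not the driver's */

/* Fill a buffer with the poison value, as poison_scratch_page() does
 * for the ppGTT scratch page. */
static void poison_buf(uint32_t *buf, size_t words)
{
	for (size_t i = 0; i < words; i++)
		buf[i] = POISON;
}

/* Return 0 if the poison is intact, -1 if someone scribbled on it. */
static int check_buf(const uint32_t *buf, size_t words)
{
	for (size_t i = 0; i < words; i++)
		if (buf[i] != POISON)
			return -1;
	return 0;
}
```

Any client write through a stale or unallocated PTE lands in scratch and disturbs the poison, which is exactly what the periodic check catches.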
[Intel-gfx] [PATCH 6/6] drm/i915: Remove unused debug functions
Remove or hide unused debug functions from clang, or else it moans.

Signed-off-by: Chris Wilson
---
 drivers/gpu/drm/i915/i915_sw_fence.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c b/drivers/gpu/drm/i915/i915_sw_fence.c
index dfabf291e5cd..566bc48e5b0c 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence.c
+++ b/drivers/gpu/drm/i915/i915_sw_fence.c
@@ -49,10 +49,12 @@ static inline void debug_fence_init(struct i915_sw_fence *fence)
 	debug_object_init(fence, &i915_sw_fence_debug_descr);
 }
 
+#if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
 static inline void debug_fence_init_onstack(struct i915_sw_fence *fence)
 {
 	debug_object_init_on_stack(fence, &i915_sw_fence_debug_descr);
 }
+#endif
 
 static inline void debug_fence_activate(struct i915_sw_fence *fence)
 {
@@ -92,9 +94,11 @@ static inline void debug_fence_init(struct i915_sw_fence *fence)
 {
 }
 
+#if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
 static inline void debug_fence_init_onstack(struct i915_sw_fence *fence)
 {
 }
+#endif
 
 static inline void debug_fence_activate(struct i915_sw_fence *fence)
 {
@@ -113,10 +117,6 @@ static inline void debug_fence_destroy(struct i915_sw_fence *fence)
 {
 }
 
-static inline void debug_fence_free(struct i915_sw_fence *fence)
-{
-}
-
 static inline void debug_fence_assert(struct i915_sw_fence *fence)
 {
 }
-- 
2.20.1

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH 2/6] drm/i915: Clear internal GT state on hibernate
Call intel_gt_suspend_late() to disable the GT before hibernation.

Signed-off-by: Chris Wilson
---
 drivers/gpu/drm/i915/gem/i915_gem_pm.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pm.c b/drivers/gpu/drm/i915/gem/i915_gem_pm.c
index 000e1cd8e920..da0ef483ad6b 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_pm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_pm.c
@@ -115,6 +115,8 @@ int i915_gem_freeze_late(struct drm_i915_private *i915)
 	 * the objects as well, see i915_gem_freeze()
 	 */
 
+	intel_gt_suspend_late(&i915->gt);
+
 	with_intel_runtime_pm(&i915->runtime_pm, wakeref)
 		i915_gem_shrink(i915, -1UL, NULL, ~0);
 	i915_gem_drain_freed_objects(i915);
-- 
2.20.1
[Intel-gfx] [PATCH 1/6] drm/i915/gt: Sanitize GPU during prepare-to-suspend
After calling intel_gt_suspend_prepare(), the driver starts to turn off
various subsystems, such as clearing the GGTT, before calling
intel_gt_suspend_late() to relinquish control over the GT. However, if we
still have internal GPU state active as we clear the GGTT, the GPU may
write back its internal state to the residual GGTT addresses that are now
pointing into scratch. Let's reset the GPU to clear that internal state as
soon as we have idled the GPU in prepare-to-suspend.

Signed-off-by: Chris Wilson
---
 drivers/gpu/drm/i915/gt/intel_gt_pm.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
index 0bd303d2823e..f41612faa269 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
@@ -295,6 +295,9 @@ void intel_gt_suspend_prepare(struct intel_gt *gt)
 	wait_for_suspend(gt);
 
 	intel_uc_suspend(&gt->uc);
+
+	/* Flush all the contexts and internal state before turning off GGTT */
+	gt_sanitize(gt, false);
 }
 
 static suspend_state_t pm_suspend_target(void)
@@ -337,7 +340,7 @@ void intel_gt_suspend_late(struct intel_gt *gt)
 		intel_llc_disable(&gt->llc);
 	}
 
-	gt_sanitize(gt, false);
+	gt_sanitize(gt, false); /* Be paranoid, remove all residual GPU state */
 
 	GT_TRACE(gt, "\n");
 }
-- 
2.20.1
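The ordering hazard described here is easy to model in miniature: a device with dirty internal state writes it back through whatever mapping it currently holds, so the mapping must not be retargeted at scratch until that state has been discarded. A hypothetical sketch, with all names invented and no relation to real driver code:

```c
#include <stdint.h>

struct toy_gpu {
	int dirty;       /* device still holds live internal state */
	uint32_t *map;   /* where a writeback would land */
};

/* Model of the device flushing its internal state. */
static void toy_writeback(struct toy_gpu *gpu)
{
	if (gpu->dirty)
		*gpu->map = 0xdeadbeef;
	gpu->dirty = 0;
}

/* Model of gt_sanitize(): a reset discards the internal state. */
static void toy_reset(struct toy_gpu *gpu)
{
	gpu->dirty = 0;
}

/* Run a toy suspend sequence in which the mapping already points into
 * scratch; return the final contents of scratch. */
static uint32_t toy_suspend(int reset_first)
{
	uint32_t scratch = 0;
	struct toy_gpu gpu = { .dirty = 1, .map = &scratch };

	if (reset_first)
		toy_reset(&gpu);
	toy_writeback(&gpu);
	return scratch;
}
```

Resetting first leaves scratch untouched; skipping the reset lets the residual state land in scratch, which is the corruption the patch avoids.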
Re: [Intel-gfx] [5.10.y regression] i915 clear-residuals mitigation is causing gfx issues
Quoting Hans de Goede (2021-02-10 10:37:19)
> Hi,
> 
> On 2/10/21 12:07 AM, Chris Wilson wrote:
> > Quoting Hans de Goede (2021-02-09 11:46:46)
> >> Hi,
> >>
> >> On 2/9/21 12:27 AM, Chris Wilson wrote:
> >>> Quoting Hans de Goede (2021-02-08 20:38:58)
> >>>> Hi All,
> >>>>
> >>>> We (Fedora) have been receiving reports from multiple users about gfx issues / glitches
> >>>> starting with 5.10.9. All reporters are users of Ivy Bridge / Haswell iGPUs and all
> >>>> reporters report that adding i915.mitigations=off to the cmdline fixes things, see:
> >>>
> >>> I tried to reproduce this on the w/e on hsw-gt1, to no avail; and piglit
> >>> did not report any differences with and without mitigations. I have yet
> >>> to test other platforms. So I don't yet have an alternative.
> >>
> >> Note the original / first reporter of:
> >>
> >> https://bugzilla.redhat.com/show_bug.cgi?id=1925346
> >>
> >> Is using hsw-gt2, so it seems that the problem is not just the enabling of
> >> the mitigations on ivy-bridge / bay-trail but that there actually is
> >> a regression on devices where the WA worked fine before...
> >
> > There have been 3 crashes uploaded related to v5.10.9, and in all 3
> > cases the ACTHD has been in the first page. This strongly suggests that
> > the w/a is scribbling over address 0. And there's then a very good
> > chance that
> >
> > commit 29d35b73ead4e41aa0d1a954c9bfbdce659ec5d6
> > Author: Chris Wilson
> > Date: Mon Jan 25 12:50:33 2021
> >
> >     drm/i915/gt: Always try to reserve GGTT address 0x0
> >
> >     commit 489140b5ba2e7cc4b853c29e0591895ddb462a82 upstream.
> >
> > in v5.10.14 is sufficient to hide the issue.
> 
> That one actually is already in v5.10.13 and the various reporters of these
> issues have already tested 5.10.13. They did mention that it took longer
> to reproduce with 5.10.13 than with 5.10.10, but that could also be due to:
> 
> "drm/i915/gt: Clear CACHE_MODE prior to clearing residuals"
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.10.y&id=520d05a77b2866eb4cb9e548e1d8c8abcfe60ec5

Started looking for scratch page overwrites, and found this little gem:
https://patchwork.freedesktop.org/patch/420436/?series=86947&rev=1

Looks promising wrt the cause of overwriting random addresses -- and I
hope that is the explanation for the glitches/hangs. I have a hsw gt2
with gnome shell, piglit is happy, but I suspect it is all due to
placement and so will only occur at random.
-Chris
[Intel-gfx] [CI] drm/i915/gt: Correct surface base address for renderclear
The surface_state_base is an offset into the batch, so we need to pass
the correct batch address for STATE_BASE_ADDRESS.

Fixes: 47f8253d2b89 ("drm/i915/gen7: Clear all EU/L3 residual contexts")
Signed-off-by: Chris Wilson
Cc: Mika Kuoppala
Cc: Prathap Kumar Valsan
Cc: Akeem G Abodunrin
Cc: Hans de Goede
Reviewed-by: Mika Kuoppala
Cc: # v5.7+
---
 drivers/gpu/drm/i915/gt/gen7_renderclear.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/gen7_renderclear.c b/drivers/gpu/drm/i915/gt/gen7_renderclear.c
index e403eb046a43..de575fdb033f 100644
--- a/drivers/gpu/drm/i915/gt/gen7_renderclear.c
+++ b/drivers/gpu/drm/i915/gt/gen7_renderclear.c
@@ -240,7 +240,7 @@ gen7_emit_state_base_address(struct batch_chunk *batch,
 	/* general */
 	*cs++ = batch_addr(batch) | BASE_ADDRESS_MODIFY;
 	/* surface */
-	*cs++ = batch_addr(batch) | surface_state_base | BASE_ADDRESS_MODIFY;
+	*cs++ = (batch_addr(batch) + surface_state_base) | BASE_ADDRESS_MODIFY;
 	/* dynamic */
 	*cs++ = batch_addr(batch) | BASE_ADDRESS_MODIFY;
 	/* indirect */
-- 
2.20.1
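The bug is worth spelling out: OR-ing an offset into a base address only equals addition when the base leaves the offset's bits clear, and a batch address carries no such alignment guarantee relative to surface_state_base. A small illustration, with arbitrary values that stand in for the real batch addresses:

```c
#include <stdint.h>

/* Compose a base address and an offset with bitwise OR, as the buggy
 * code did for the surface state base. */
static uint32_t compose_or(uint32_t base, uint32_t offset)
{
	return base | offset;
}

/* Compose them with addition, as the fix does. */
static uint32_t compose_add(uint32_t base, uint32_t offset)
{
	return base + offset;
}
```

With base 0x10000 and offset 0x200 the two agree (no overlapping bits), but with base 0x10300 the OR silently drops part of the offset and the surface state is fetched from the wrong address.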
Re: [Intel-gfx] [PATCH 2/2] drm/i915/gt: Correct surface base address
Quoting Mika Kuoppala (2021-02-10 10:50:18)
> Chris Wilson writes:
> 
> > The surface_state_base is an offset into the batch, so we need to pass
> > the correct batch address for STATE_BASE_ADDRESS.
> >
> > Fixes: 47f8253d2b89 ("drm/i915/gen7: Clear all EU/L3 residual contexts")
> > Signed-off-by: Chris Wilson
> > Cc: Mika Kuoppala
> > Cc: Prathap Kumar Valsan
> > Cc: Akeem G Abodunrin
> 
> Reviewed-by: Mika Kuoppala

Compared against
https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_7571/index.html
I think we've found our suspect.
-Chris
Re: [Intel-gfx] [PATCH 1/2] drm/i915: Check for scratch page scribbling
Quoting Mika Kuoppala (2021-02-10 10:49:55)
> Chris Wilson writes:
> > diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
> > index d34770ae4c9a..5ac9eb4a3a92 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
> > @@ -158,10 +158,49 @@ static void poison_scratch_page(struct drm_i915_gem_object *scratch)
> >
> >  	vaddr = kmap(page);
> >  	memset(vaddr, val, PAGE_SIZE);
> > +	set_page_dirty(page); /* keep the poisoned contents */
> 
> Should we use locked version in here?

We don't hold the page-lock here, so no. If this is not an anonymous
page, something is very wrong :p
-Chris
Re: [Intel-gfx] [PATCH i-g-t 2/3] intel_gpu_top: Aggregate clients by PID by default
Quoting Tvrtko Ursulin (2021-02-10 10:53:43)
> +static struct clients *display_clients(struct clients *clients)
> +{
> +	struct client *ac, *c, *cp = NULL;
> +	struct clients *aggregated;
> +	int tmp, num = 0;
> +
> +	if (!aggregate_pids)
> +		return sort_clients(clients, client_cmp);

Still two calls to return sort_clients(foo, client_cmp) in this
function :) [a clients = aggregated; after processing would merge the
two paths].

Reviewed-by: Chris Wilson
-Chris
Re: [Intel-gfx] [igt-dev] [PATCH i-g-t 2/3] intel_gpu_top: Aggregate clients by PID by default
Quoting Tvrtko Ursulin (2021-02-10 10:55:44)
> 
> On 10/02/2021 10:35, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2021-02-10 09:37:55)
> > Ok, that works very well. Hmm. The sort order does seem a little jumpy
> > though. May I suggest ac->id = -c->pid; instead of num;
> 
> Done it, although I thought that with the 1st pass already being a sort by
> pid, num as the id would follow a stable order. I guess your point was
> inversion to preserve order when cycling sort modes?

I thought that makes more sense for 'aggregate-by-pid', that it should
either be in ascending/descending pid order for idle. It just felt very
jumpy; it may have been a tiny amount of %busy, but I suspected it may
have been slightly unstable sorting with changing clients.
-Chris
Re: [Intel-gfx] [PATCH] drm/i915/gem: Add a check for object size for corner cases
Quoting Anand Moon (2021-02-10 07:59:29)
> Add a check for the object size to return the appropriate error, -E2BIG or
> -EINVAL, to avoid WARN_ON and a successful return for some testcases.

No. You miss the point of having those warnings. We need to inspect the
code to remove the last remaining "int pagenum", and then we can remove
the GEM_WARN_ON((sz) >> PAGE_SHIFT > INT_MAX).

These are not emitted for users, only for us to motivate us into finally
fixing the code.
-Chris
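For context, the GEM_WARN_ON quoted above fires when an object's size in pages no longer fits in an int, the range still assumed by the remaining "int pagenum" users. A hedged sketch of that same bound check, assuming 4KiB pages for illustration:

```c
#include <limits.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12 /* 4KiB pages assumed for illustration */

/* Return non-zero if every page index of an object of size sz fits in
 * an int, mirroring GEM_WARN_ON((sz) >> PAGE_SHIFT > INT_MAX). */
static int page_count_fits_int(uint64_t sz)
{
	return (sz >> TOY_PAGE_SHIFT) <= (uint64_t)INT_MAX;
}
```

The warning is thus a marker for a latent overflow in the driver's own bookkeeping, not an error path userspace should ever be able to reach, which is why papering over it with -E2BIG misses the point.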
Re: [Intel-gfx] [PATCH i-g-t 1/3] intel_gpu_top: Wrap interactive header
Quoting Tvrtko Ursulin (2021-02-10 09:37:54)
> From: Tvrtko Ursulin
> 
> Slight improvement with regards to wrapping header components to fit
> console width. If a single element is wider than max it can still
> overflow but it should now work better for practical console widths.

I'm not fond of this layout, but that's just a personal preference.
Reviewed-by: Chris Wilson
-Chris
Re: [Intel-gfx] [igt-dev] [PATCH i-g-t 3/3] intel_gpu_top: Interactive help screen
Quoting Tvrtko Ursulin (2021-02-10 09:37:56)
> From: Tvrtko Ursulin
> 
> Show a list of supported interactive commands when pressing 'h'.
> 
> Signed-off-by: Tvrtko Ursulin

Reviewed-by: Chris Wilson
-Chris
Re: [Intel-gfx] [igt-dev] [PATCH i-g-t 2/3] intel_gpu_top: Aggregate clients by PID by default
Quoting Tvrtko Ursulin (2021-02-10 09:37:55)
> +static struct clients *aggregated_clients(struct clients *clients)
> +{
> +	struct client *ac, *c, *cp = NULL;
> +	struct clients *aggregated;
> +	int tmp, num = 0;
> +
> +	/* Sort by pid first to make it easy to aggregate while walking. */
> +	sort_clients(clients, client_pid_cmp);

You could eliminate this tiny bit of duplication by always calling
aggregated_clients() and returning here for !aggregate_pids.

> +	aggregated = calloc(1, sizeof(*clients));
> +	assert(aggregated);
> +
> +	ac = calloc(clients->num_clients, sizeof(*c));
> +	assert(ac);
> +
> +	aggregated->num_classes = clients->num_classes;
> +	aggregated->class = clients->class;
> +	aggregated->client = ac;
> +
> +	for_each_client(clients, c, tmp) {
> +		unsigned int i;
> +
> +		if (c->status == FREE)
> +			break;
> +
> +		assert(c->status == ALIVE);
> +
> +		if ((cp && c->pid != cp->pid) || !cp) {
> +			ac = &aggregated->client[num];
> +
> +			/* New pid. */
> +			ac->clients = aggregated;
> +			ac->status = ALIVE;
> +			ac->id = ++num;
> +			ac->pid = c->pid;
> +			strcpy(ac->name, c->name);
> +			strcpy(ac->print_name, c->print_name);
> +			ac->engines = c->engines;
> +			ac->val = calloc(clients->num_classes,
> +					 sizeof(ac->val[0]));
> +			assert(ac->val);
> +			ac->samples = 1;
> +		}
> +
> +		cp = c;
> +
> +		if (c->samples < 2)
> +			continue;
> +
> +		ac->samples = 2; /* All what matters for display. */
> +		ac->total_runtime += c->total_runtime;
> +		ac->last_runtime += c->last_runtime;
> +
> +		for (i = 0; i < clients->num_classes; i++)
> +			ac->val[i] += c->val[i];
> +	}
> +
> +	aggregated->num_clients = num;
> +	aggregated->active_clients = num;
> +
> +	return sort_clients(aggregated, client_cmp);
> }

Ok, that works very well. Hmm. The sort order does seem a little jumpy
though. May I suggest ac->id = -c->pid; instead of num;
-Chris
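The pass under review is a classic sort-then-merge: once clients are ordered by pid, aggregation is a single walk that folds adjacent entries together. A stripped-down sketch of that idea, with the struct reduced to the essentials rather than intel_gpu_top's real types:

```c
#include <stddef.h>

struct toy_client {
	int pid;
	double busy; /* accumulated busyness for one fd */
};

/* Fold adjacent entries sharing a pid into one, assuming the input is
 * already sorted by pid (as the first pass in aggregated_clients()
 * guarantees).  Returns the number of aggregated entries written to
 * out, which must have room for n entries. */
static int aggregate_by_pid(const struct toy_client *in, int n,
			    struct toy_client *out)
{
	int num = 0;

	for (int i = 0; i < n; i++) {
		if (num == 0 || out[num - 1].pid != in[i].pid)
			out[num++] = in[i]; /* new pid starts a new entry */
		else
			out[num - 1].busy += in[i].busy; /* same pid: fold */
	}

	return num;
}
```

Each process then appears once in the display regardless of how many DRM fds it has open, with its per-engine busyness summed across them.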
[Intel-gfx] [PATCH 2/2] drm/i915/gt: Correct surface base address
The surface_state_base is an offset into the batch, so we need to pass
the correct batch address for STATE_BASE_ADDRESS.

Fixes: 47f8253d2b89 ("drm/i915/gen7: Clear all EU/L3 residual contexts")
Signed-off-by: Chris Wilson
Cc: Mika Kuoppala
Cc: Prathap Kumar Valsan
Cc: Akeem G Abodunrin
---
 drivers/gpu/drm/i915/gt/gen7_renderclear.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/gen7_renderclear.c b/drivers/gpu/drm/i915/gt/gen7_renderclear.c
index e403eb046a43..de575fdb033f 100644
--- a/drivers/gpu/drm/i915/gt/gen7_renderclear.c
+++ b/drivers/gpu/drm/i915/gt/gen7_renderclear.c
@@ -240,7 +240,7 @@ gen7_emit_state_base_address(struct batch_chunk *batch,
 	/* general */
 	*cs++ = batch_addr(batch) | BASE_ADDRESS_MODIFY;
 	/* surface */
-	*cs++ = batch_addr(batch) | surface_state_base | BASE_ADDRESS_MODIFY;
+	*cs++ = (batch_addr(batch) + surface_state_base) | BASE_ADDRESS_MODIFY;
 	/* dynamic */
 	*cs++ = batch_addr(batch) | BASE_ADDRESS_MODIFY;
 	/* indirect */
-- 
2.20.1