Re: [Intel-gfx] [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
Quoting Tvrtko Ursulin (2018-08-15 12:51:28) > > On 14/08/2018 16:22, Chris Wilson wrote: > > Quoting Tvrtko Ursulin (2018-08-14 15:40:58) > > Looks like we should be able to hook this up to a selftest to confirm > > the modification does land in the target context image, and a SRM to > > confirm it loaded. > > Lionel wrote an IGT which reads it back via SRM so that should be > covered. I will be posting the IGT counterpart at some point as well, > ideally when the agreement on the i915 side is there. Wouldn't you rather have both? I know I would :-p The emphasis is slightly different, IGT wants to cover the ABI behaviour, selftests looking towards the edge cases hard to reach otherwise. But even a trivial selftest would let us know the series functions as intended. -Chris ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
On 14/08/2018 16:22, Chris Wilson wrote: Quoting Tvrtko Ursulin (2018-08-14 15:40:58) +static int +i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx, + struct intel_engine_cs *engine, + struct intel_sseu sseu) +{ + struct drm_i915_private *i915 = ctx->i915; + struct i915_request *rq; + struct intel_ring *ring; + int ret; + + lockdep_assert_held(>drm.struct_mutex); + + /* Submitting requests etc needs the hw awake. */ + intel_runtime_pm_get(i915); + + i915_retire_requests(i915); + + /* Now use the RCS to actually reconfigure. */ + engine = i915->engine[RCS]; + + rq = i915_request_alloc(engine, i915->kernel_context); + if (IS_ERR(rq)) { + ret = PTR_ERR(rq); + goto out_put; + } + + ret = engine->emit_rpcs_config(rq, ctx, sseu); + if (ret) + goto out_add; + + /* Queue this switch after all other activity */ + list_for_each_entry(ring, >gt.active_rings, active_link) { + struct i915_request *prev; + + prev = last_request_on_engine(ring->timeline, engine); + if (prev) + i915_sw_fence_await_sw_fence_gfp(>submit, +>submit, +I915_FENCE_GFP); + } + + i915_gem_set_global_barrier(i915, rq); + +out_add: + i915_request_add(rq); +out_put: + intel_runtime_pm_put(i915); + + return ret; Looks like we should be able to hook this up to a selftest to confirm the modification does land in the target context image, and a SRM to confirm it loaded. Lionel wrote an IGT which reads it back via SRM so that should be covered. I will be posting the IGT counterpart at some point as well, ideally when the agreement on the i915 side is there. Regards, Tvrtko ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
On 14/08/2018 19:53, Chris Wilson wrote: Quoting Tvrtko Ursulin (2018-08-14 19:44:09) On 14/08/2018 15:59, Chris Wilson wrote: Quoting Tvrtko Ursulin (2018-08-14 15:40:58) From: Chris Wilson We want to allow userspace to reconfigure the subslice configuration for its own use case. To do so, we expose a context parameter to allow adjustment of the RPCS register stored within the context image (and currently not accessible via LRI). If the context is adjusted before first use, the adjustment is for "free"; otherwise if the context is active we flush the context off the GPU (stalling all users) and forcing the GPU to save the context to memory where we can modify it and so ensure that the register is reloaded on next execution. The overhead of managing additional EU subslices can be significant, especially in multi-context workloads. Non-GPGPU contexts should preferably disable the subslices it is not using, and others should fine-tune the number to match their workload. We expose complete control over the RPCS register, allowing configuration of slice/subslice, via masks packed into a u64 for simplicity. For example, struct drm_i915_gem_context_param arg; struct drm_i915_gem_context_param_sseu sseu = { .class = 0, .instance = 0, }; memset(, 0, sizeof(arg)); arg.ctx_id = ctx; arg.param = I915_CONTEXT_PARAM_SSEU; arg.value = (uintptr_t) if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, ) == 0) { sseu.packed.subslice_mask = 0; drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, ); } could be used to disable all subslices where supported. v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() (Lionel) v3: Add ability to program this per engine (Chris) v4: Move most get_sseu() into i915_gem_context.c (Lionel) v5: Validate sseu configuration against the device's capabilities (Lionel) v6: Change context powergating settings through MI_SDM on kernel context (Chris) v7: Synchronize the requests following a powergating setting change using a global dependency (Chris) Iterate timelines through dev_priv.gt.active_rings (Tvrtko) Disable RPCS configuration setting for non capable users (Lionel/Tvrtko) v8: s/union intel_sseu/struct intel_sseu/ (Lionel) s/dev_priv/i915/ (Tvrtko) Change uapi class/instance fields to u16 (Tvrtko) Bump mask fields to 64bits (Lionel) Don't return EPERM when dynamic sseu is disabled (Tvrtko) v9: Import context image into kernel context's ppgtt only when reconfiguring powergated slice/subslices (Chris) Use aliasing ppgtt when needed (Michel) Tvrtko Ursulin: v10: * Update for upstream changes. * Request submit needs a RPM reference. * Reject on !FULL_PPGTT for simplicity. * Pull out get/set param to helpers for readability and less indent. * Use i915_request_await_dma_fence in add_global_barrier to skip waits on the same timeline and avoid GEM_BUG_ON. * No need to explicitly assign a NULL pointer to engine in legacy mode. * No need to move gen8_make_rpcs up. * Factored out global barrier as prep patch. * Allow to only CAP_SYS_ADMIN if !Gen11. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899 Issue: https://github.com/intel/media-driver/issues/267 Signed-off-by: Chris Wilson Signed-off-by: Lionel Landwerlin Cc: Dmitry Rogozhkin Cc: Tvrtko Ursulin Cc: Zhipeng Gong Cc: Joonas Lahtinen Signed-off-by: Tvrtko Ursulin --- drivers/gpu/drm/i915/i915_gem_context.c | 187 +++- drivers/gpu/drm/i915/intel_lrc.c| 55 +++ drivers/gpu/drm/i915/intel_ringbuffer.h | 4 + include/uapi/drm/i915_drm.h | 43 ++ 4 files changed, 288 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c index 8a12984e7495..6d6220634e9e 100644 --- a/drivers/gpu/drm/i915/i915_gem_context.c +++ b/drivers/gpu/drm/i915/i915_gem_context.c @@ -773,6 +773,91 @@ int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data, return 0; } +static int +i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx, + struct intel_engine_cs *engine, + struct intel_sseu sseu) +{ + struct drm_i915_private *i915 = ctx->i915; + struct i915_request *rq; + struct intel_ring *ring; + int ret; + + lockdep_assert_held(>drm.struct_mutex); + + /* Submitting requests etc needs the hw awake. */ + intel_runtime_pm_get(i915); + + i915_retire_requests(i915); ? I wondered myself but did not make myself dig through all the history of the series. Cannot think that it does anything useful in the current design. + + /* Now use the RCS to actually reconfigure. */ +
Re: [Intel-gfx] [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
Quoting Tvrtko Ursulin (2018-08-14 19:44:09) > > On 14/08/2018 15:59, Chris Wilson wrote: > > Quoting Tvrtko Ursulin (2018-08-14 15:40:58) > >> From: Chris Wilson > >> > >> We want to allow userspace to reconfigure the subslice configuration for > >> its own use case. To do so, we expose a context parameter to allow > >> adjustment of the RPCS register stored within the context image (and > >> currently not accessible via LRI). If the context is adjusted before > >> first use, the adjustment is for "free"; otherwise if the context is > >> active we flush the context off the GPU (stalling all users) and forcing > >> the GPU to save the context to memory where we can modify it and so > >> ensure that the register is reloaded on next execution. > >> > >> The overhead of managing additional EU subslices can be significant, > >> especially in multi-context workloads. Non-GPGPU contexts should > >> preferably disable the subslices it is not using, and others should > >> fine-tune the number to match their workload. > >> > >> We expose complete control over the RPCS register, allowing > >> configuration of slice/subslice, via masks packed into a u64 for > >> simplicity. For example, > >> > >> struct drm_i915_gem_context_param arg; > >> struct drm_i915_gem_context_param_sseu sseu = { .class = 0, > >> .instance = 0, }; > >> > >> memset(, 0, sizeof(arg)); > >> arg.ctx_id = ctx; > >> arg.param = I915_CONTEXT_PARAM_SSEU; > >> arg.value = (uintptr_t) > >> if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, ) == 0) > >> { > >> sseu.packed.subslice_mask = 0; > >> > >> drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, ); > >> } > >> > >> could be used to disable all subslices where supported. > >> > >> v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() > >> (Lionel) > >> > >> v3: Add ability to program this per engine (Chris) > >> > >> v4: Move most get_sseu() into i915_gem_context.c (Lionel) > >> > >> v5: Validate sseu configuration against the device's capabilities (Lionel) > >> > >> v6: Change context powergating settings through MI_SDM on kernel context > >> (Chris) > >> > >> v7: Synchronize the requests following a powergating setting change using > >> a global > >> dependency (Chris) > >> Iterate timelines through dev_priv.gt.active_rings (Tvrtko) > >> Disable RPCS configuration setting for non capable users > >> (Lionel/Tvrtko) > >> > >> v8: s/union intel_sseu/struct intel_sseu/ (Lionel) > >> s/dev_priv/i915/ (Tvrtko) > >> Change uapi class/instance fields to u16 (Tvrtko) > >> Bump mask fields to 64bits (Lionel) > >> Don't return EPERM when dynamic sseu is disabled (Tvrtko) > >> > >> v9: Import context image into kernel context's ppgtt only when > >> reconfiguring powergated slice/subslices (Chris) > >> Use aliasing ppgtt when needed (Michel) > >> > >> Tvrtko Ursulin: > >> > >> v10: > >> * Update for upstream changes. > >> * Request submit needs a RPM reference. > >> * Reject on !FULL_PPGTT for simplicity. > >> * Pull out get/set param to helpers for readability and less indent. > >> * Use i915_request_await_dma_fence in add_global_barrier to skip waits > >> on the same timeline and avoid GEM_BUG_ON. > >> * No need to explicitly assign a NULL pointer to engine in legacy mode. > >> * No need to move gen8_make_rpcs up. > >> * Factored out global barrier as prep patch. > >> * Allow to only CAP_SYS_ADMIN if !Gen11. > >> > >> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899 > >> Issue: https://github.com/intel/media-driver/issues/267 > >> Signed-off-by: Chris Wilson > >> Signed-off-by: Lionel Landwerlin > >> Cc: Dmitry Rogozhkin > >> Cc: Tvrtko Ursulin > >> Cc: Zhipeng Gong > >> Cc: Joonas Lahtinen > >> Signed-off-by: Tvrtko Ursulin > >> --- > >> drivers/gpu/drm/i915/i915_gem_context.c | 187 +++- > >> drivers/gpu/drm/i915/intel_lrc.c| 55 +++ > >> drivers/gpu/drm/i915/intel_ringbuffer.h | 4 + > >> include/uapi/drm/i915_drm.h | 43 ++ > >> 4 files changed, 288 insertions(+), 1 deletion(-) > >> > >> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c > >> b/drivers/gpu/drm/i915/i915_gem_context.c > >> index 8a12984e7495..6d6220634e9e 100644 > >> --- a/drivers/gpu/drm/i915/i915_gem_context.c > >> +++ b/drivers/gpu/drm/i915/i915_gem_context.c > >> @@ -773,6 +773,91 @@ int i915_gem_context_destroy_ioctl(struct drm_device > >> *dev, void *data, > >> return 0; > >> } > >> > >> +static int > >> +i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx, > >> + struct intel_engine_cs *engine, > >> + struct intel_sseu sseu) > >> +{ > >> + struct drm_i915_private *i915 = ctx->i915; > >> + struct
Re: [Intel-gfx] [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
On 14/08/2018 15:59, Chris Wilson wrote: Quoting Tvrtko Ursulin (2018-08-14 15:40:58) From: Chris Wilson We want to allow userspace to reconfigure the subslice configuration for its own use case. To do so, we expose a context parameter to allow adjustment of the RPCS register stored within the context image (and currently not accessible via LRI). If the context is adjusted before first use, the adjustment is for "free"; otherwise if the context is active we flush the context off the GPU (stalling all users) and forcing the GPU to save the context to memory where we can modify it and so ensure that the register is reloaded on next execution. The overhead of managing additional EU subslices can be significant, especially in multi-context workloads. Non-GPGPU contexts should preferably disable the subslices it is not using, and others should fine-tune the number to match their workload. We expose complete control over the RPCS register, allowing configuration of slice/subslice, via masks packed into a u64 for simplicity. For example, struct drm_i915_gem_context_param arg; struct drm_i915_gem_context_param_sseu sseu = { .class = 0, .instance = 0, }; memset(, 0, sizeof(arg)); arg.ctx_id = ctx; arg.param = I915_CONTEXT_PARAM_SSEU; arg.value = (uintptr_t) if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, ) == 0) { sseu.packed.subslice_mask = 0; drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, ); } could be used to disable all subslices where supported. v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() (Lionel) v3: Add ability to program this per engine (Chris) v4: Move most get_sseu() into i915_gem_context.c (Lionel) v5: Validate sseu configuration against the device's capabilities (Lionel) v6: Change context powergating settings through MI_SDM on kernel context (Chris) v7: Synchronize the requests following a powergating setting change using a global dependency (Chris) Iterate timelines through dev_priv.gt.active_rings (Tvrtko) Disable RPCS configuration setting for non capable users (Lionel/Tvrtko) v8: s/union intel_sseu/struct intel_sseu/ (Lionel) s/dev_priv/i915/ (Tvrtko) Change uapi class/instance fields to u16 (Tvrtko) Bump mask fields to 64bits (Lionel) Don't return EPERM when dynamic sseu is disabled (Tvrtko) v9: Import context image into kernel context's ppgtt only when reconfiguring powergated slice/subslices (Chris) Use aliasing ppgtt when needed (Michel) Tvrtko Ursulin: v10: * Update for upstream changes. * Request submit needs a RPM reference. * Reject on !FULL_PPGTT for simplicity. * Pull out get/set param to helpers for readability and less indent. * Use i915_request_await_dma_fence in add_global_barrier to skip waits on the same timeline and avoid GEM_BUG_ON. * No need to explicitly assign a NULL pointer to engine in legacy mode. * No need to move gen8_make_rpcs up. * Factored out global barrier as prep patch. * Allow to only CAP_SYS_ADMIN if !Gen11. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899 Issue: https://github.com/intel/media-driver/issues/267 Signed-off-by: Chris Wilson Signed-off-by: Lionel Landwerlin Cc: Dmitry Rogozhkin Cc: Tvrtko Ursulin Cc: Zhipeng Gong Cc: Joonas Lahtinen Signed-off-by: Tvrtko Ursulin --- drivers/gpu/drm/i915/i915_gem_context.c | 187 +++- drivers/gpu/drm/i915/intel_lrc.c| 55 +++ drivers/gpu/drm/i915/intel_ringbuffer.h | 4 + include/uapi/drm/i915_drm.h | 43 ++ 4 files changed, 288 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c index 8a12984e7495..6d6220634e9e 100644 --- a/drivers/gpu/drm/i915/i915_gem_context.c +++ b/drivers/gpu/drm/i915/i915_gem_context.c @@ -773,6 +773,91 @@ int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data, return 0; } +static int +i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx, + struct intel_engine_cs *engine, + struct intel_sseu sseu) +{ + struct drm_i915_private *i915 = ctx->i915; + struct i915_request *rq; + struct intel_ring *ring; + int ret; + + lockdep_assert_held(>drm.struct_mutex); + + /* Submitting requests etc needs the hw awake. */ + intel_runtime_pm_get(i915); + + i915_retire_requests(i915); ? I wondered myself but did not make myself dig through all the history of the series. Cannot think that it does anything useful in the current design. + + /* Now use the RCS to actually reconfigure. */ + engine = i915->engine[RCS]; ? Modifying registers stored in another engine's context image. Well, I was hoping design was
Re: [Intel-gfx] [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
On 14/08/18 17:05, Lionel Landwerlin wrote: On 14/08/18 16:18, Chris Wilson wrote: Quoting Lionel Landwerlin (2018-08-14 16:11:36) On 14/08/18 15:59, Chris Wilson wrote: And I'd still recommend not using indirect access if we can apply the changes immediately. -Chris Hangs on Gen9 :( How does modifying the context image of an idle (unpinned) context cause a hang? -Chris I thought you meant a context modifying it's own RPCS register... no? - Lionel Oh I get it now... Sorry, forget me :) - Lionel ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
On 14/08/18 16:18, Chris Wilson wrote: Quoting Lionel Landwerlin (2018-08-14 16:11:36) On 14/08/18 15:59, Chris Wilson wrote: And I'd still recommend not using indirect access if we can apply the changes immediately. -Chris Hangs on Gen9 :( How does modifying the context image of an idle (unpinned) context cause a hang? -Chris I thought you meant a context modifying it's own RPCS register... no? - Lionel ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
Quoting Tvrtko Ursulin (2018-08-14 15:40:58) > +static int > +i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx, > + struct intel_engine_cs *engine, > + struct intel_sseu sseu) > +{ > + struct drm_i915_private *i915 = ctx->i915; > + struct i915_request *rq; > + struct intel_ring *ring; > + int ret; > + > + lockdep_assert_held(>drm.struct_mutex); > + > + /* Submitting requests etc needs the hw awake. */ > + intel_runtime_pm_get(i915); > + > + i915_retire_requests(i915); > + > + /* Now use the RCS to actually reconfigure. */ > + engine = i915->engine[RCS]; > + > + rq = i915_request_alloc(engine, i915->kernel_context); > + if (IS_ERR(rq)) { > + ret = PTR_ERR(rq); > + goto out_put; > + } > + > + ret = engine->emit_rpcs_config(rq, ctx, sseu); > + if (ret) > + goto out_add; > + > + /* Queue this switch after all other activity */ > + list_for_each_entry(ring, >gt.active_rings, active_link) { > + struct i915_request *prev; > + > + prev = last_request_on_engine(ring->timeline, engine); > + if (prev) > + i915_sw_fence_await_sw_fence_gfp(>submit, > +>submit, > +I915_FENCE_GFP); > + } > + > + i915_gem_set_global_barrier(i915, rq); > + > +out_add: > + i915_request_add(rq); > +out_put: > + intel_runtime_pm_put(i915); > + > + return ret; Looks like we should be able to hook this up to a selftest to confirm the modification does land in the target context image, and a SRM to confirm it loaded. -Chris ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
Quoting Lionel Landwerlin (2018-08-14 16:11:36) > On 14/08/18 15:59, Chris Wilson wrote: > > And I'd still recommend not using indirect access if we can apply the > > changes immediately. > > -Chris > > > > Hangs on Gen9 :( How does modifying the context image of an idle (unpinned) context cause a hang? -Chris ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
On 14/08/18 15:59, Chris Wilson wrote: And I'd still recommend not using indirect access if we can apply the changes immediately. -Chris Hangs on Gen9 :( - Lionel ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
Quoting Tvrtko Ursulin (2018-08-14 15:40:58) > From: Chris Wilson > > We want to allow userspace to reconfigure the subslice configuration for > its own use case. To do so, we expose a context parameter to allow > adjustment of the RPCS register stored within the context image (and > currently not accessible via LRI). If the context is adjusted before > first use, the adjustment is for "free"; otherwise if the context is > active we flush the context off the GPU (stalling all users) and forcing > the GPU to save the context to memory where we can modify it and so > ensure that the register is reloaded on next execution. > > The overhead of managing additional EU subslices can be significant, > especially in multi-context workloads. Non-GPGPU contexts should > preferably disable the subslices it is not using, and others should > fine-tune the number to match their workload. > > We expose complete control over the RPCS register, allowing > configuration of slice/subslice, via masks packed into a u64 for > simplicity. For example, > > struct drm_i915_gem_context_param arg; > struct drm_i915_gem_context_param_sseu sseu = { .class = 0, > .instance = 0, }; > > memset(, 0, sizeof(arg)); > arg.ctx_id = ctx; > arg.param = I915_CONTEXT_PARAM_SSEU; > arg.value = (uintptr_t) > if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, ) == 0) { > sseu.packed.subslice_mask = 0; > > drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, ); > } > > could be used to disable all subslices where supported. > > v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() (Lionel) > > v3: Add ability to program this per engine (Chris) > > v4: Move most get_sseu() into i915_gem_context.c (Lionel) > > v5: Validate sseu configuration against the device's capabilities (Lionel) > > v6: Change context powergating settings through MI_SDM on kernel context > (Chris) > > v7: Synchronize the requests following a powergating setting change using a > global > dependency (Chris) > Iterate timelines through dev_priv.gt.active_rings (Tvrtko) > Disable RPCS configuration setting for non capable users (Lionel/Tvrtko) > > v8: s/union intel_sseu/struct intel_sseu/ (Lionel) > s/dev_priv/i915/ (Tvrtko) > Change uapi class/instance fields to u16 (Tvrtko) > Bump mask fields to 64bits (Lionel) > Don't return EPERM when dynamic sseu is disabled (Tvrtko) > > v9: Import context image into kernel context's ppgtt only when > reconfiguring powergated slice/subslices (Chris) > Use aliasing ppgtt when needed (Michel) > > Tvrtko Ursulin: > > v10: > * Update for upstream changes. > * Request submit needs a RPM reference. > * Reject on !FULL_PPGTT for simplicity. > * Pull out get/set param to helpers for readability and less indent. > * Use i915_request_await_dma_fence in add_global_barrier to skip waits >on the same timeline and avoid GEM_BUG_ON. > * No need to explicitly assign a NULL pointer to engine in legacy mode. > * No need to move gen8_make_rpcs up. > * Factored out global barrier as prep patch. > * Allow to only CAP_SYS_ADMIN if !Gen11. > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899 > Issue: https://github.com/intel/media-driver/issues/267 > Signed-off-by: Chris Wilson > Signed-off-by: Lionel Landwerlin > Cc: Dmitry Rogozhkin > Cc: Tvrtko Ursulin > Cc: Zhipeng Gong > Cc: Joonas Lahtinen > Signed-off-by: Tvrtko Ursulin > --- > drivers/gpu/drm/i915/i915_gem_context.c | 187 +++- > drivers/gpu/drm/i915/intel_lrc.c| 55 +++ > drivers/gpu/drm/i915/intel_ringbuffer.h | 4 + > include/uapi/drm/i915_drm.h | 43 ++ > 4 files changed, 288 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/i915/i915_gem_context.c > b/drivers/gpu/drm/i915/i915_gem_context.c > index 8a12984e7495..6d6220634e9e 100644 > --- a/drivers/gpu/drm/i915/i915_gem_context.c > +++ b/drivers/gpu/drm/i915/i915_gem_context.c > @@ -773,6 +773,91 @@ int i915_gem_context_destroy_ioctl(struct drm_device > *dev, void *data, > return 0; > } > > +static int > +i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx, > + struct intel_engine_cs *engine, > + struct intel_sseu sseu) > +{ > + struct drm_i915_private *i915 = ctx->i915; > + struct i915_request *rq; > + struct intel_ring *ring; > + int ret; > + > + lockdep_assert_held(>drm.struct_mutex); > + > + /* Submitting requests etc needs the hw awake. */ > + intel_runtime_pm_get(i915); > + > + i915_retire_requests(i915); ? > + > + /* Now use the RCS to actually reconfigure. */ > + engine = i915->engine[RCS]; ? Modifying registers stored in another engine's context image. > + > + rq =
Re: [Intel-gfx] [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
On 08/05/18 21:56, Chris Wilson wrote: Quoting Tvrtko Ursulin (2018-05-03 18:18:43) On 25/04/2018 12:45, Lionel Landwerlin wrote: From: Chris WilsonWe want to allow userspace to reconfigure the subslice configuration for its own use case. To do so, we expose a context parameter to allow adjustment of the RPCS register stored within the context image (and currently not accessible via LRI). If the context is adjusted before first use, the adjustment is for "free"; otherwise if the context is active we flush the context off the GPU (stalling all users) and forcing the GPU to save the context to memory where we can modify it and so ensure that the register is reloaded on next execution. The overhead of managing additional EU subslices can be significant, especially in multi-context workloads. Non-GPGPU contexts should preferably disable the subslices it is not using, and others should fine-tune the number to match their workload. We expose complete control over the RPCS register, allowing configuration of slice/subslice, via masks packed into a u64 for simplicity. For example, struct drm_i915_gem_context_param arg; struct drm_i915_gem_context_param_sseu sseu = { .flags = I915_EXEC_RENDER }; memset(, 0, sizeof(arg)); arg.ctx_id = ctx; arg.param = I915_CONTEXT_PARAM_SSEU; arg.value = (uintptr_t) if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, ) == 0) { sseu.packed.subslice_mask = 0; drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, ); } could be used to disable all subslices where supported. v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() (Lionel) v3: Add ability to program this per engine (Chris) v4: Move most get_sseu() into i915_gem_context.c (Lionel) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899 Signed-off-by: Chris Wilson Signed-off-by: Lionel Landwerlin c: Dmitry Rogozhkin CC: Tvrtko Ursulin CC: Zhipeng Gong CC: Joonas Lahtinen --- drivers/gpu/drm/i915/i915_gem_context.c | 82 - drivers/gpu/drm/i915/intel_lrc.c| 55 + drivers/gpu/drm/i915/intel_lrc.h| 4 ++ include/uapi/drm/i915_drm.h | 28 + 4 files changed, 168 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c index bdf050beeb94..b97ddcf47514 100644 --- a/drivers/gpu/drm/i915/i915_gem_context.c +++ b/drivers/gpu/drm/i915/i915_gem_context.c @@ -740,6 +740,26 @@ int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data, return 0; } +static struct i915_gem_context_sseu +i915_gem_context_sseu_from_user_sseu(const struct sseu_dev_info *sseu, + const struct drm_i915_gem_context_param_sseu *user_sseu) +{ + struct i915_gem_context_sseu value = { + .slice_mask = user_sseu->packed.slice_mask == 0 ? + sseu->slice_mask : + (user_sseu->packed.slice_mask & sseu->slice_mask), + .subslice_mask = user_sseu->packed.subslice_mask == 0 ? + sseu->subslice_mask[0] : + (user_sseu->packed.subslice_mask & sseu->subslice_mask[0]), + .min_eus_per_subslice = min(user_sseu->packed.min_eus_per_subslice, + sseu->max_eus_per_subslice), + .max_eus_per_subslice = min(user_sseu->packed.max_eus_per_subslice, + sseu->max_eus_per_subslice), + }; + + return value; +} + int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data, struct drm_file *file) { @@ -777,6 +797,37 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data, case I915_CONTEXT_PARAM_PRIORITY: args->value = ctx->sched.priority; break; + case I915_CONTEXT_PARAM_SSEU: { + struct drm_i915_gem_context_param_sseu param_sseu; + struct intel_engine_cs *engine; + + if (copy_from_user(_sseu, u64_to_user_ptr(args->value), +sizeof(param_sseu))) { + ret = -EFAULT; + break; + } + + engine = i915_gem_engine_from_flags(to_i915(dev), file, + param_sseu.flags); + if (!engine) { + ret = -EINVAL; + break; + } + + param_sseu.packed.slice_mask = + ctx->engine[engine->id].sseu.slice_mask; + param_sseu.packed.subslice_mask = +
Re: [Intel-gfx] [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
Quoting Tvrtko Ursulin (2018-05-03 18:18:43) > > On 25/04/2018 12:45, Lionel Landwerlin wrote: > > From: Chris Wilson> > > > We want to allow userspace to reconfigure the subslice configuration for > > its own use case. To do so, we expose a context parameter to allow > > adjustment of the RPCS register stored within the context image (and > > currently not accessible via LRI). If the context is adjusted before > > first use, the adjustment is for "free"; otherwise if the context is > > active we flush the context off the GPU (stalling all users) and forcing > > the GPU to save the context to memory where we can modify it and so > > ensure that the register is reloaded on next execution. > > > > The overhead of managing additional EU subslices can be significant, > > especially in multi-context workloads. Non-GPGPU contexts should > > preferably disable the subslices it is not using, and others should > > fine-tune the number to match their workload. > > > > We expose complete control over the RPCS register, allowing > > configuration of slice/subslice, via masks packed into a u64 for > > simplicity. For example, > > > > struct drm_i915_gem_context_param arg; > > struct drm_i915_gem_context_param_sseu sseu = { .flags = > > I915_EXEC_RENDER }; > > > > memset(, 0, sizeof(arg)); > > arg.ctx_id = ctx; > > arg.param = I915_CONTEXT_PARAM_SSEU; > > arg.value = (uintptr_t) > > if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, ) == 0) { > > sseu.packed.subslice_mask = 0; > > > > drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, ); > > } > > > > could be used to disable all subslices where supported. > > > > v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() > > (Lionel) > > > > v3: Add ability to program this per engine (Chris) > > > > v4: Move most get_sseu() into i915_gem_context.c (Lionel) > > > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899 > > Signed-off-by: Chris Wilson > > Signed-off-by: Lionel Landwerlin > > c: Dmitry Rogozhkin > > CC: Tvrtko Ursulin > > CC: Zhipeng Gong > > CC: Joonas Lahtinen > > --- > > drivers/gpu/drm/i915/i915_gem_context.c | 82 - > > drivers/gpu/drm/i915/intel_lrc.c| 55 + > > drivers/gpu/drm/i915/intel_lrc.h| 4 ++ > > include/uapi/drm/i915_drm.h | 28 + > > 4 files changed, 168 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/gpu/drm/i915/i915_gem_context.c > > b/drivers/gpu/drm/i915/i915_gem_context.c > > index bdf050beeb94..b97ddcf47514 100644 > > --- a/drivers/gpu/drm/i915/i915_gem_context.c > > +++ b/drivers/gpu/drm/i915/i915_gem_context.c > > @@ -740,6 +740,26 @@ int i915_gem_context_destroy_ioctl(struct drm_device > > *dev, void *data, > > return 0; > > } > > > > +static struct i915_gem_context_sseu > > +i915_gem_context_sseu_from_user_sseu(const struct sseu_dev_info *sseu, > > + const struct > > drm_i915_gem_context_param_sseu *user_sseu) > > +{ > > + struct i915_gem_context_sseu value = { > > + .slice_mask = user_sseu->packed.slice_mask == 0 ? > > + sseu->slice_mask : > > + (user_sseu->packed.slice_mask & > > sseu->slice_mask), > > + .subslice_mask = user_sseu->packed.subslice_mask == 0 ? > > + sseu->subslice_mask[0] : > > + (user_sseu->packed.subslice_mask & > > sseu->subslice_mask[0]), > > + .min_eus_per_subslice = > > min(user_sseu->packed.min_eus_per_subslice, > > + sseu->max_eus_per_subslice), > > + .max_eus_per_subslice = > > min(user_sseu->packed.max_eus_per_subslice, > > + sseu->max_eus_per_subslice), > > + }; > > + > > + return value; > > +} > > + > > int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data, > > struct drm_file *file) > > { > > @@ -777,6 +797,37 @@ int i915_gem_context_getparam_ioctl(struct drm_device > > *dev, void *data, > > case I915_CONTEXT_PARAM_PRIORITY: > > args->value = ctx->sched.priority; > > break; > > + case I915_CONTEXT_PARAM_SSEU: { > > + struct drm_i915_gem_context_param_sseu param_sseu; > > + struct intel_engine_cs *engine; > > + > > + if (copy_from_user(_sseu, u64_to_user_ptr(args->value), > > +sizeof(param_sseu))) { > > + ret = -EFAULT; > > + break; > > + } > > + > > + engine =
Re: [Intel-gfx] [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
>> Context of the question was whether you need to change slice configuration >> for a single GEM context between submitting batch buffers? When we create a context we know the optimal slice count for it. And this optimal point does not change for this context in unperturbed conditions, i.e. when context runs alone. However, there are critical use cases when we never run single context, but instead we run few contexts in parallel and each of them has its own optimal slice count operating point. As a result, major question is: what is the cost of context switch if there is associated slice configuration change, powering on or off the slice(s)? In the ideal situation when the cost is 0, we don't need any change of slice configuration for single GEM context between submitting batch buffers. Problem is that cost is far from 0. And the cost is non-tolerable for the worst case when we will have a switch at each next batch buffer. As a result, we are forced to have some negotiation channel between different contexts and make them agree on some single slice configuration which will exist for reasonably long period of time to have associated cost negligible in genera. During this period we will submit a number of batch buffers before next reconfiguration attempt. So, that's the situation when we need to reconfigure slice configuration for a single GEM context between submitting batches. Dmitry. -Original Message- From: Tvrtko Ursulin [mailto:tvrtko.ursu...@linux.intel.com] Sent: Tuesday, May 8, 2018 1:25 AM To: Rogozhkin, Dmitry V <dmitry.v.rogozh...@intel.com>; Landwerlin, Lionel G <lionel.g.landwer...@intel.com>; intel-gfx@lists.freedesktop.org Subject: Re: [Intel-gfx] [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace On 08/05/2018 05:04, Rogozhkin, Dmitry V wrote: >>> I'm pretty sure Dmitry wants dynamic configurations. > > Yes, I afraid we really need dynamic slice configurations for media. Context of the question was whether you need to change slice configuration for a single GEM context between submitting batch buffers? Regards, Tvrtko > *From:*Landwerlin, Lionel G > *Sent:* Friday, May 4, 2018 9:25 AM > *To:* Tvrtko Ursulin <tvrtko.ursu...@linux.intel.com>; > intel-gfx@lists.freedesktop.org; Rogozhkin, Dmitry V > <dmitry.v.rogozh...@intel.com> > *Subject:* Re: [Intel-gfx] [PATCH 8/8] drm/i915: Expose RPCS (SSEU) > configuration to userspace > > On 03/05/18 18:18, Tvrtko Ursulin wrote: > > +int intel_lr_context_set_sseu(struct i915_gem_context *ctx, > + struct intel_engine_cs *engine, > + struct i915_gem_context_sseu *sseu) > +{ > + struct drm_i915_private *dev_priv = ctx->i915; > + struct intel_context *ce; > + enum intel_engine_id id; > + int ret; > + > + lockdep_assert_held(_priv->drm.struct_mutex); > + > + if (memcmp(sseu, >engine[engine->id].sseu, > sizeof(*sseu)) == 0) > + return 0; > + > + /* > + * We can only program this on render ring. > + */ > + ce = >engine[RCS]; > + > + if (ce->pin_count) { /* Assume that the context is active! */ > + ret = i915_gem_switch_to_kernel_context(dev_priv); > + if (ret) > + return ret; > + > + ret = i915_gem_wait_for_idle(dev_priv, > + I915_WAIT_INTERRUPTIBLE | > + I915_WAIT_LOCKED); > > > Could we consider the alternative of only allowing this to be > configured on context create? That way we would not need to idle the > GPU every time userspace decides to fiddle with it. It is > unprivileged so quite an easy way for random app to ruin GPU > performance for everyone. > > Regards, > > Tvrtko > > I'm pretty sure Dmitry wants dynamic configurations. > > Dmitry? > ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
On 08/05/2018 05:04, Rogozhkin, Dmitry V wrote: I'm pretty sure Dmitry wants dynamic configurations. Yes, I afraid we really need dynamic slice configurations for media. Context of the question was whether you need to change slice configuration for a single GEM context between submitting batch buffers? Regards, Tvrtko *From:*Landwerlin, Lionel G *Sent:* Friday, May 4, 2018 9:25 AM *To:* Tvrtko Ursulin <tvrtko.ursu...@linux.intel.com>; intel-gfx@lists.freedesktop.org; Rogozhkin, Dmitry V <dmitry.v.rogozh...@intel.com> *Subject:* Re: [Intel-gfx] [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace On 03/05/18 18:18, Tvrtko Ursulin wrote: +int intel_lr_context_set_sseu(struct i915_gem_context *ctx, + struct intel_engine_cs *engine, + struct i915_gem_context_sseu *sseu) +{ + struct drm_i915_private *dev_priv = ctx->i915; + struct intel_context *ce; + enum intel_engine_id id; + int ret; + + lockdep_assert_held(_priv->drm.struct_mutex); + + if (memcmp(sseu, >engine[engine->id].sseu, sizeof(*sseu)) == 0) + return 0; + + /* + * We can only program this on render ring. + */ + ce = >engine[RCS]; + + if (ce->pin_count) { /* Assume that the context is active! */ + ret = i915_gem_switch_to_kernel_context(dev_priv); + if (ret) + return ret; + + ret = i915_gem_wait_for_idle(dev_priv, + I915_WAIT_INTERRUPTIBLE | + I915_WAIT_LOCKED); Could we consider the alternative of only allowing this to be configured on context create? That way we would not need to idle the GPU every time userspace decides to fiddle with it. It is unprivileged so quite an easy way for random app to ruin GPU performance for everyone. Regards, Tvrtko I'm pretty sure Dmitry wants dynamic configurations. Dmitry? ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
>> I'm pretty sure Dmitry wants dynamic configurations. Yes, I afraid we really need dynamic slice configurations for media. From: Landwerlin, Lionel G Sent: Friday, May 4, 2018 9:25 AM To: Tvrtko Ursulin <tvrtko.ursu...@linux.intel.com>; intel-gfx@lists.freedesktop.org; Rogozhkin, Dmitry V <dmitry.v.rogozh...@intel.com> Subject: Re: [Intel-gfx] [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace On 03/05/18 18:18, Tvrtko Ursulin wrote: +int intel_lr_context_set_sseu(struct i915_gem_context *ctx, + struct intel_engine_cs *engine, + struct i915_gem_context_sseu *sseu) +{ +struct drm_i915_private *dev_priv = ctx->i915; +struct intel_context *ce; +enum intel_engine_id id; +int ret; + +lockdep_assert_held(_priv->drm.struct_mutex); + +if (memcmp(sseu, >engine[engine->id].sseu, sizeof(*sseu)) == 0) +return 0; + +/* + * We can only program this on render ring. + */ +ce = >engine[RCS]; + +if (ce->pin_count) { /* Assume that the context is active! */ +ret = i915_gem_switch_to_kernel_context(dev_priv); +if (ret) +return ret; + +ret = i915_gem_wait_for_idle(dev_priv, + I915_WAIT_INTERRUPTIBLE | + I915_WAIT_LOCKED); Could we consider the alternative of only allowing this to be configured on context create? That way we would not need to idle the GPU every time userspace decides to fiddle with it. It is unprivileged so quite an easy way for random app to ruin GPU performance for everyone. Regards, Tvrtko I'm pretty sure Dmitry wants dynamic configurations. Dmitry? ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
On 03/05/18 18:18, Tvrtko Ursulin wrote: +int intel_lr_context_set_sseu(struct i915_gem_context *ctx, + struct intel_engine_cs *engine, + struct i915_gem_context_sseu *sseu) +{ + struct drm_i915_private *dev_priv = ctx->i915; + struct intel_context *ce; + enum intel_engine_id id; + int ret; + + lockdep_assert_held(_priv->drm.struct_mutex); + + if (memcmp(sseu, >engine[engine->id].sseu, sizeof(*sseu)) == 0) + return 0; + + /* + * We can only program this on render ring. + */ + ce = >engine[RCS]; + + if (ce->pin_count) { /* Assume that the context is active! */ + ret = i915_gem_switch_to_kernel_context(dev_priv); + if (ret) + return ret; + + ret = i915_gem_wait_for_idle(dev_priv, + I915_WAIT_INTERRUPTIBLE | + I915_WAIT_LOCKED); Could we consider the alternative of only allowing this to be configured on context create? That way we would not need to idle the GPU every time userspace decides to fiddle with it. It is unprivileged so quite an easy way for random app to ruin GPU performance for everyone. Regards, Tvrtko I'm pretty sure Dmitry wants dynamic configurations. Dmitry? ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
On 25/04/2018 12:45, Lionel Landwerlin wrote: From: Chris WilsonWe want to allow userspace to reconfigure the subslice configuration for its own use case. To do so, we expose a context parameter to allow adjustment of the RPCS register stored within the context image (and currently not accessible via LRI). If the context is adjusted before first use, the adjustment is for "free"; otherwise if the context is active we flush the context off the GPU (stalling all users) and forcing the GPU to save the context to memory where we can modify it and so ensure that the register is reloaded on next execution. The overhead of managing additional EU subslices can be significant, especially in multi-context workloads. Non-GPGPU contexts should preferably disable the subslices it is not using, and others should fine-tune the number to match their workload. We expose complete control over the RPCS register, allowing configuration of slice/subslice, via masks packed into a u64 for simplicity. For example, struct drm_i915_gem_context_param arg; struct drm_i915_gem_context_param_sseu sseu = { .flags = I915_EXEC_RENDER }; memset(, 0, sizeof(arg)); arg.ctx_id = ctx; arg.param = I915_CONTEXT_PARAM_SSEU; arg.value = (uintptr_t) if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, ) == 0) { sseu.packed.subslice_mask = 0; drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, ); } could be used to disable all subslices where supported. v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() (Lionel) v3: Add ability to program this per engine (Chris) v4: Move most get_sseu() into i915_gem_context.c (Lionel) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899 Signed-off-by: Chris Wilson Signed-off-by: Lionel Landwerlin c: Dmitry Rogozhkin CC: Tvrtko Ursulin CC: Zhipeng Gong CC: Joonas Lahtinen --- drivers/gpu/drm/i915/i915_gem_context.c | 82 - drivers/gpu/drm/i915/intel_lrc.c| 55 + drivers/gpu/drm/i915/intel_lrc.h| 4 ++ include/uapi/drm/i915_drm.h | 28 + 4 files changed, 168 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c index bdf050beeb94..b97ddcf47514 100644 --- a/drivers/gpu/drm/i915/i915_gem_context.c +++ b/drivers/gpu/drm/i915/i915_gem_context.c @@ -740,6 +740,26 @@ int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data, return 0; } +static struct i915_gem_context_sseu +i915_gem_context_sseu_from_user_sseu(const struct sseu_dev_info *sseu, +const struct drm_i915_gem_context_param_sseu *user_sseu) +{ + struct i915_gem_context_sseu value = { + .slice_mask = user_sseu->packed.slice_mask == 0 ? + sseu->slice_mask : + (user_sseu->packed.slice_mask & sseu->slice_mask), + .subslice_mask = user_sseu->packed.subslice_mask == 0 ? +sseu->subslice_mask[0] : +(user_sseu->packed.subslice_mask & sseu->subslice_mask[0]), + .min_eus_per_subslice = min(user_sseu->packed.min_eus_per_subslice, + sseu->max_eus_per_subslice), + .max_eus_per_subslice = min(user_sseu->packed.max_eus_per_subslice, + sseu->max_eus_per_subslice), + }; + + return value; +} + int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data, struct drm_file *file) { @@ -777,6 +797,37 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data, case I915_CONTEXT_PARAM_PRIORITY: args->value = ctx->sched.priority; break; + case I915_CONTEXT_PARAM_SSEU: { + struct drm_i915_gem_context_param_sseu param_sseu; + struct intel_engine_cs *engine; + + if (copy_from_user(_sseu, u64_to_user_ptr(args->value), + sizeof(param_sseu))) { + ret = -EFAULT; + break; + } + + engine = i915_gem_engine_from_flags(to_i915(dev), file, + param_sseu.flags); + if (!engine) { + ret = -EINVAL; + break; + } + + param_sseu.packed.slice_mask = + ctx->engine[engine->id].sseu.slice_mask; + param_sseu.packed.subslice_mask = +
Re: [Intel-gfx] [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
On 03/05/2018 17:04, Joonas Lahtinen wrote: Quoting Lionel Landwerlin (2018-04-26 13:22:30) On 26/04/18 11:00, Joonas Lahtinen wrote: Quoting Lionel Landwerlin (2018-04-25 14:45:21) From: Chris WilsonWe want to allow userspace to reconfigure the subslice configuration for its own use case. To do so, we expose a context parameter to allow adjustment of the RPCS register stored within the context image (and currently not accessible via LRI). If the context is adjusted before first use, the adjustment is for "free"; otherwise if the context is active we flush the context off the GPU (stalling all users) and forcing the GPU to save the context to memory where we can modify it and so ensure that the register is reloaded on next execution. The overhead of managing additional EU subslices can be significant, especially in multi-context workloads. Non-GPGPU contexts should preferably disable the subslices it is not using, and others should fine-tune the number to match their workload. This hit a dead end last time due to the system wide policy needed to avoid two parties fighting over the slice count (and going back and forth between two slice counts would counter the benefits received from this). Do we now have a solution for the contention? I don't see code to negotiate a global value, just raw setter. Regards, Joonas I've tried to come up with some numbers about the cost of the back & forth (see igt series). Other than that, I don't think we can expect the kernel to workaround the inefficient use of the hardware by userspace. Well, I'm pretty sure we should not try to make the situation too miserable for the basic usecases. If we allow two contexts to fight over the slice count, countering any perceived benefit, I don't think that'll be a good default. My recollection of the last round of discussion was that reasonable thing to do would be to only disable slices when everyone is willing to let go of. Then it would become a system maintainer level decision to only run on two slices for when they see fit (configuring the userspace). Yes there was definitely talk about co-ordination between OpenCL and media stacks. Btw we mentioned on IRC a few days back that a sysfs knob like allow_media_priority, defaulting to 0, could be a good start. That way userspace could use the uAPI, but unless sysadmin allows it to, there will be no fighting over slice counts. Then at some future time, as long as the uAPI is sensible, more advanced policies could be explored. Regardless if in userspace or i915. Regards, Tvrtko More advanced tactics would include scheduling work so that we try to avoid the slice count changes and deduct the switching time from the execution budget of the app requesting less slices (if we had fair time slicing). Regards, Joonas We expose complete control over the RPCS register, allowing configuration of slice/subslice, via masks packed into a u64 for simplicity. For example, struct drm_i915_gem_context_param arg; struct drm_i915_gem_context_param_sseu sseu = { .flags = I915_EXEC_RENDER }; memset(, 0, sizeof(arg)); arg.ctx_id = ctx; arg.param = I915_CONTEXT_PARAM_SSEU; arg.value = (uintptr_t) if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, ) == 0) { sseu.packed.subslice_mask = 0; drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, ); } could be used to disable all subslices where supported. v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() (Lionel) v3: Add ability to program this per engine (Chris) v4: Move most get_sseu() into i915_gem_context.c (Lionel) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899 Signed-off-by: Chris Wilson Signed-off-by: Lionel Landwerlin c: Dmitry Rogozhkin CC: Tvrtko Ursulin CC: Zhipeng Gong CC: Joonas Lahtinen --- drivers/gpu/drm/i915/i915_gem_context.c | 82 - drivers/gpu/drm/i915/intel_lrc.c| 55 + drivers/gpu/drm/i915/intel_lrc.h| 4 ++ include/uapi/drm/i915_drm.h | 28 + 4 files changed, 168 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c index bdf050beeb94..b97ddcf47514 100644 --- a/drivers/gpu/drm/i915/i915_gem_context.c +++ b/drivers/gpu/drm/i915/i915_gem_context.c @@ -740,6 +740,26 @@ int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data, return 0; } +static struct i915_gem_context_sseu +i915_gem_context_sseu_from_user_sseu(const struct sseu_dev_info *sseu, +const struct drm_i915_gem_context_param_sseu *user_sseu) +{ + struct
Re: [Intel-gfx] [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
On 03/05/18 17:04, Joonas Lahtinen wrote: Quoting Lionel Landwerlin (2018-04-26 13:22:30) On 26/04/18 11:00, Joonas Lahtinen wrote: Quoting Lionel Landwerlin (2018-04-25 14:45:21) From: Chris WilsonWe want to allow userspace to reconfigure the subslice configuration for its own use case. To do so, we expose a context parameter to allow adjustment of the RPCS register stored within the context image (and currently not accessible via LRI). If the context is adjusted before first use, the adjustment is for "free"; otherwise if the context is active we flush the context off the GPU (stalling all users) and forcing the GPU to save the context to memory where we can modify it and so ensure that the register is reloaded on next execution. The overhead of managing additional EU subslices can be significant, especially in multi-context workloads. Non-GPGPU contexts should preferably disable the subslices it is not using, and others should fine-tune the number to match their workload. This hit a dead end last time due to the system wide policy needed to avoid two parties fighting over the slice count (and going back and forth between two slice counts would counter the benefits received from this). Do we now have a solution for the contention? I don't see code to negotiate a global value, just raw setter. Regards, Joonas I've tried to come up with some numbers about the cost of the back & forth (see igt series). Other than that, I don't think we can expect the kernel to workaround the inefficient use of the hardware by userspace. Well, I'm pretty sure we should not try to make the situation too miserable for the basic usecases. If we allow two contexts to fight over the slice count, countering any perceived benefit, I don't think that'll be a good default. My recollection of the last round of discussion was that reasonable thing to do would be to only disable slices when everyone is willing to let go of. Then it would become a system maintainer level decision to only run on two slices for when they see fit (configuring the userspace). How would you detect that everybody is willing to let go? We don't appear to have a mechanism to detect that. Wouldn't that require to teach all userspace drivers? More advanced tactics would include scheduling work so that we try to avoid the slice count changes and deduct the switching time from the execution budget of the app requesting less slices (if we had fair time slicing). That sounds more workable, although fairly complicated. Maybe we could tweak the priority based on the slice count and say : for the next (1second / number of slice config), priority bumped for the configuration with x slices and rotate every (1second / number of slice config) Would that resolve the back & forth issue completely though? Moving a particular context back & forth between different configurations is costly too. You need to drain the context, then pin the context before you can edit and resubmit. Thanks for your feedback, - Lionel Regards, Joonas ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
Quoting Joonas Lahtinen (2018-05-03 17:04:31) > More advanced tactics would include scheduling work so that we try to > avoid the slice count changes and deduct the switching time from the > execution budget of the app requesting less slices (if we had fair > time slicing). Have you been peeking at my reading material? ;) -Chris ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
Quoting Lionel Landwerlin (2018-04-26 13:22:30) > On 26/04/18 11:00, Joonas Lahtinen wrote: > > Quoting Lionel Landwerlin (2018-04-25 14:45:21) > >> From: Chris Wilson> >> > >> We want to allow userspace to reconfigure the subslice configuration for > >> its own use case. To do so, we expose a context parameter to allow > >> adjustment of the RPCS register stored within the context image (and > >> currently not accessible via LRI). If the context is adjusted before > >> first use, the adjustment is for "free"; otherwise if the context is > >> active we flush the context off the GPU (stalling all users) and forcing > >> the GPU to save the context to memory where we can modify it and so > >> ensure that the register is reloaded on next execution. > >> > >> The overhead of managing additional EU subslices can be significant, > >> especially in multi-context workloads. Non-GPGPU contexts should > >> preferably disable the subslices it is not using, and others should > >> fine-tune the number to match their workload. > > This hit a dead end last time due to the system wide policy needed to > > avoid two parties fighting over the slice count (and going back and > > forth between two slice counts would counter the benefits received from > > this). > > > > Do we now have a solution for the contention? I don't see code to > > negotiate a global value, just raw setter. > > > > Regards, Joonas > > I've tried to come up with some numbers about the cost of the back & > forth (see igt series). > > Other than that, I don't think we can expect the kernel to workaround > the inefficient use of the hardware by userspace. Well, I'm pretty sure we should not try to make the situation too miserable for the basic usecases. If we allow two contexts to fight over the slice count, countering any perceived benefit, I don't think that'll be a good default. My recollection of the last round of discussion was that reasonable thing to do would be to only disable slices when everyone is willing to let go of. Then it would become a system maintainer level decision to only run on two slices for when they see fit (configuring the userspace). More advanced tactics would include scheduling work so that we try to avoid the slice count changes and deduct the switching time from the execution budget of the app requesting less slices (if we had fair time slicing). Regards, Joonas > > > > >> We expose complete control over the RPCS register, allowing > >> configuration of slice/subslice, via masks packed into a u64 for > >> simplicity. For example, > >> > >> struct drm_i915_gem_context_param arg; > >> struct drm_i915_gem_context_param_sseu sseu = { .flags = > >> I915_EXEC_RENDER }; > >> > >> memset(, 0, sizeof(arg)); > >> arg.ctx_id = ctx; > >> arg.param = I915_CONTEXT_PARAM_SSEU; > >> arg.value = (uintptr_t) > >> if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, ) == 0) > >> { > >> sseu.packed.subslice_mask = 0; > >> > >> drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, ); > >> } > >> > >> could be used to disable all subslices where supported. > >> > >> v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() > >> (Lionel) > >> > >> v3: Add ability to program this per engine (Chris) > >> > >> v4: Move most get_sseu() into i915_gem_context.c (Lionel) > >> > >> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899 > >> Signed-off-by: Chris Wilson > >> Signed-off-by: Lionel Landwerlin > >> c: Dmitry Rogozhkin > >> CC: Tvrtko Ursulin > >> CC: Zhipeng Gong > >> CC: Joonas Lahtinen > >> --- > >> drivers/gpu/drm/i915/i915_gem_context.c | 82 - > >> drivers/gpu/drm/i915/intel_lrc.c| 55 + > >> drivers/gpu/drm/i915/intel_lrc.h| 4 ++ > >> include/uapi/drm/i915_drm.h | 28 + > >> 4 files changed, 168 insertions(+), 1 deletion(-) > >> > >> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c > >> b/drivers/gpu/drm/i915/i915_gem_context.c > >> index bdf050beeb94..b97ddcf47514 100644 > >> --- a/drivers/gpu/drm/i915/i915_gem_context.c > >> +++ b/drivers/gpu/drm/i915/i915_gem_context.c > >> @@ -740,6 +740,26 @@ int i915_gem_context_destroy_ioctl(struct drm_device > >> *dev, void *data, > >> return 0; > >> } > >> > >> +static struct i915_gem_context_sseu > >> +i915_gem_context_sseu_from_user_sseu(const struct sseu_dev_info *sseu, > >> +const struct > >> drm_i915_gem_context_param_sseu *user_sseu) > >> +{ > >> + struct i915_gem_context_sseu value = { > >> + .slice_mask = user_sseu->packed.slice_mask == 0 ? > >> + sseu->slice_mask : >
Re: [Intel-gfx] [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
On 26/04/18 11:00, Joonas Lahtinen wrote: Quoting Lionel Landwerlin (2018-04-25 14:45:21) From: Chris WilsonWe want to allow userspace to reconfigure the subslice configuration for its own use case. To do so, we expose a context parameter to allow adjustment of the RPCS register stored within the context image (and currently not accessible via LRI). If the context is adjusted before first use, the adjustment is for "free"; otherwise if the context is active we flush the context off the GPU (stalling all users) and forcing the GPU to save the context to memory where we can modify it and so ensure that the register is reloaded on next execution. The overhead of managing additional EU subslices can be significant, especially in multi-context workloads. Non-GPGPU contexts should preferably disable the subslices it is not using, and others should fine-tune the number to match their workload. This hit a dead end last time due to the system wide policy needed to avoid two parties fighting over the slice count (and going back and forth between two slice counts would counter the benefits received from this). Do we now have a solution for the contention? I don't see code to negotiate a global value, just raw setter. Regards, Joonas I've tried to come up with some numbers about the cost of the back & forth (see igt series). Other than that, I don't think we can expect the kernel to workaround the inefficient use of the hardware by userspace. We expose complete control over the RPCS register, allowing configuration of slice/subslice, via masks packed into a u64 for simplicity. For example, struct drm_i915_gem_context_param arg; struct drm_i915_gem_context_param_sseu sseu = { .flags = I915_EXEC_RENDER }; memset(, 0, sizeof(arg)); arg.ctx_id = ctx; arg.param = I915_CONTEXT_PARAM_SSEU; arg.value = (uintptr_t) if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, ) == 0) { sseu.packed.subslice_mask = 0; drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, ); } could be used to disable all subslices where supported. v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() (Lionel) v3: Add ability to program this per engine (Chris) v4: Move most get_sseu() into i915_gem_context.c (Lionel) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899 Signed-off-by: Chris Wilson Signed-off-by: Lionel Landwerlin c: Dmitry Rogozhkin CC: Tvrtko Ursulin CC: Zhipeng Gong CC: Joonas Lahtinen --- drivers/gpu/drm/i915/i915_gem_context.c | 82 - drivers/gpu/drm/i915/intel_lrc.c| 55 + drivers/gpu/drm/i915/intel_lrc.h| 4 ++ include/uapi/drm/i915_drm.h | 28 + 4 files changed, 168 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c index bdf050beeb94..b97ddcf47514 100644 --- a/drivers/gpu/drm/i915/i915_gem_context.c +++ b/drivers/gpu/drm/i915/i915_gem_context.c @@ -740,6 +740,26 @@ int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data, return 0; } +static struct i915_gem_context_sseu +i915_gem_context_sseu_from_user_sseu(const struct sseu_dev_info *sseu, +const struct drm_i915_gem_context_param_sseu *user_sseu) +{ + struct i915_gem_context_sseu value = { + .slice_mask = user_sseu->packed.slice_mask == 0 ? + sseu->slice_mask : + (user_sseu->packed.slice_mask & sseu->slice_mask), + .subslice_mask = user_sseu->packed.subslice_mask == 0 ? +sseu->subslice_mask[0] : +(user_sseu->packed.subslice_mask & sseu->subslice_mask[0]), + .min_eus_per_subslice = min(user_sseu->packed.min_eus_per_subslice, + sseu->max_eus_per_subslice), + .max_eus_per_subslice = min(user_sseu->packed.max_eus_per_subslice, + sseu->max_eus_per_subslice), + }; + + return value; +} + int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data, struct drm_file *file) { @@ -777,6 +797,37 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data, case I915_CONTEXT_PARAM_PRIORITY: args->value = ctx->sched.priority; break; + case I915_CONTEXT_PARAM_SSEU: { + struct drm_i915_gem_context_param_sseu param_sseu; + struct intel_engine_cs *engine; + + if (copy_from_user(_sseu,
Re: [Intel-gfx] [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
Quoting Lionel Landwerlin (2018-04-25 14:45:21) > From: Chris Wilson> > We want to allow userspace to reconfigure the subslice configuration for > its own use case. To do so, we expose a context parameter to allow > adjustment of the RPCS register stored within the context image (and > currently not accessible via LRI). If the context is adjusted before > first use, the adjustment is for "free"; otherwise if the context is > active we flush the context off the GPU (stalling all users) and forcing > the GPU to save the context to memory where we can modify it and so > ensure that the register is reloaded on next execution. > > The overhead of managing additional EU subslices can be significant, > especially in multi-context workloads. Non-GPGPU contexts should > preferably disable the subslices it is not using, and others should > fine-tune the number to match their workload. This hit a dead end last time due to the system wide policy needed to avoid two parties fighting over the slice count (and going back and forth between two slice counts would counter the benefits received from this). Do we now have a solution for the contention? I don't see code to negotiate a global value, just raw setter. Regards, Joonas > > We expose complete control over the RPCS register, allowing > configuration of slice/subslice, via masks packed into a u64 for > simplicity. For example, > > struct drm_i915_gem_context_param arg; > struct drm_i915_gem_context_param_sseu sseu = { .flags = > I915_EXEC_RENDER }; > > memset(, 0, sizeof(arg)); > arg.ctx_id = ctx; > arg.param = I915_CONTEXT_PARAM_SSEU; > arg.value = (uintptr_t) > if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, ) == 0) { > sseu.packed.subslice_mask = 0; > > drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, ); > } > > could be used to disable all subslices where supported. > > v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() (Lionel) > > v3: Add ability to program this per engine (Chris) > > v4: Move most get_sseu() into i915_gem_context.c (Lionel) > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899 > Signed-off-by: Chris Wilson > Signed-off-by: Lionel Landwerlin > c: Dmitry Rogozhkin > CC: Tvrtko Ursulin > CC: Zhipeng Gong > CC: Joonas Lahtinen > --- > drivers/gpu/drm/i915/i915_gem_context.c | 82 - > drivers/gpu/drm/i915/intel_lrc.c| 55 + > drivers/gpu/drm/i915/intel_lrc.h| 4 ++ > include/uapi/drm/i915_drm.h | 28 + > 4 files changed, 168 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/i915/i915_gem_context.c > b/drivers/gpu/drm/i915/i915_gem_context.c > index bdf050beeb94..b97ddcf47514 100644 > --- a/drivers/gpu/drm/i915/i915_gem_context.c > +++ b/drivers/gpu/drm/i915/i915_gem_context.c > @@ -740,6 +740,26 @@ int i915_gem_context_destroy_ioctl(struct drm_device > *dev, void *data, > return 0; > } > > +static struct i915_gem_context_sseu > +i915_gem_context_sseu_from_user_sseu(const struct sseu_dev_info *sseu, > +const struct > drm_i915_gem_context_param_sseu *user_sseu) > +{ > + struct i915_gem_context_sseu value = { > + .slice_mask = user_sseu->packed.slice_mask == 0 ? > + sseu->slice_mask : > + (user_sseu->packed.slice_mask & > sseu->slice_mask), > + .subslice_mask = user_sseu->packed.subslice_mask == 0 ? > +sseu->subslice_mask[0] : > +(user_sseu->packed.subslice_mask & > sseu->subslice_mask[0]), > + .min_eus_per_subslice = > min(user_sseu->packed.min_eus_per_subslice, > + sseu->max_eus_per_subslice), > + .max_eus_per_subslice = > min(user_sseu->packed.max_eus_per_subslice, > + sseu->max_eus_per_subslice), > + }; > + > + return value; > +} > + > int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data, > struct drm_file *file) > { > @@ -777,6 +797,37 @@ int i915_gem_context_getparam_ioctl(struct drm_device > *dev, void *data, > case I915_CONTEXT_PARAM_PRIORITY: > args->value = ctx->sched.priority; > break; > + case I915_CONTEXT_PARAM_SSEU: { > + struct drm_i915_gem_context_param_sseu param_sseu; > + struct intel_engine_cs *engine; > + > + if (copy_from_user(_sseu, u64_to_user_ptr(args->value), > + sizeof(param_sseu))) { > +